(Thread IKs: hot cocoa on the couch)
 
neato burrito
Aug 25, 2002

bitch better have my chex mix

Blink 182 is playing right now. The original members.

Luvcow
Jul 1, 2007

One day nearer spring

Saalkin posted:

Someone post a God dang line up

i know gorillaz is playing which i'd love to see, new album is good imo


neato burrito posted:

Blink 182 is playing right now. The original members.

worth seeing too

SIDS Vicious
Jan 1, 1970


I love tom delonge

SIDS Vicious
Jan 1, 1970


aliens exist

Zeluth
May 12, 2001

by Fluffdaddy
Coming up: Blondie

https://www.youtube.com/watch?v=pHCdS7O248g

Dixville
Nov 4, 2008

I don't think!
Ham Wrangler

El Jebus posted:

I'm 5 miles away, avoiding the people that go to it. Does that count?

Same. The 10 had so much more traffic than usual last night

Hey.

It's still Friday

Bad Purchase
Jun 17, 2019




piss skulled with the coworkers, now it's time for some gaming. friday, it's good.

jimmyjams
Jan 10, 2001


King Kong of Megadongs
Gobblin' them mega schlongs
Makin' sure they mega long
Stroke' 'em if they mega strong
fart

El Jebus
Jun 18, 2008

This avatar is paid for by "Avatars for improving Lowtax's spine by any means that doesn't result in him becoming brain dead by putting his brain into a cyborg body and/or putting him in a exosuit due to fears of the suit being hacked and crushing him during a cyberpunk future timeline" Foundation

Dixville posted:

Same. The 10 had so much more traffic than usual last night

Hey.

It's still Friday

Yeah, did all my grocery shopping for the weekend on Wednesday. I won't have to leave the house for food, beer, or weed.

Chief McHeath
Apr 23, 2002
has PISS SKULLER the boat happened yet what's the status

BAGS FLY AT NOON
Apr 6, 2011

A Soft Nylon Bag
Saturday morning’s alright for posting

fartknocker
Oct 28, 2012


Damn it, this always happens. I think I'm gonna score, and then I never score. It's not fair.



Wedge Regret

That’s the way to start Saturday

Pennywise the Frown
May 10, 2010

Upset Trowel
Actually went out with friends and had a fun night.

Looking for a movie to play while I start playing a game.

neato burrito
Aug 25, 2002

bitch better have my chex mix

One more for the road.

cumpantry
Dec 18, 2020

played a lot of minecraft and a little oblivion and blue dragon. kind of a psycho collection of games to run through in a day but im a crazy guy :hampants:

cumpantry
Dec 18, 2020

good friday overall

numberoneposter
Feb 19, 2014

How much do I cum? The answer might surprise you!

drank some beers and cooked some honey garlic sausages with a buddy

haven't really socialized too much with my bros lately so it was nice to catch up and talk some poo poo

solid friday

Chinatown
Sep 11, 2001

by Fluffdaddy
Fun Shoe
late nite burrito

Haptical Sales Slut
Mar 15, 2010

Age 18 to 49

Chinatown posted:

late nite burrito

A good username. And lifestyle.

Methanar
Sep 26, 2013

by the sex ghost
i woke up at 10 to find I missed a meeting with some intel guy at 9 am who was supposed to help us understand why we have garbage performance on our new servers. something something numa.

i think I did like 10 minutes of work to hand off a thing to another team who was trying to a/b test which of our new datacenters was worse than the others. this is somewhat complicated actually because graphite is a garbage tsdb and thats where application metrics are stored. We should be using prometheus at this point tbh, but years of mismanagement have prevented that lol oh well. anyway that was hard so I just left it to the other guys and went back to bed for a bit longer.

Eventually my phone started chiming so I had to go have some slack conversations and helped another guy with a terraform thing.
i lasted about 2 hours before I just decided I didn't really want to be awake so I went back to bed at 12 until about 2:30 according to my chrome history.



i think at this point I decided to follow up with the graphite metrics guys to see their conclusions. Had some more conversations around the interpretations of their findings.
helped a few other random people with random whatevers
did some more low-intensity testing and follow ups on the nightmare that is these stupid fuckin datacenter performance issues we've been having.
one thing that came up was that there was a strong correlation between packet loss and bad performance. It was actually the strongest correlation we've found after like 3 weeks of chasing red herrings.

On further investigation I found that we have all of our interrupt handling on a single core. That's apparently bad according to cloudflare, particularly with numa systems.
https://null.53bits.co.uk/index.php?page=numa-and-queue-affinity
https://blog.cloudflare.com/how-to-achieve-low-latency/
https://blog.cloudflare.com/how-to-receive-a-million-packets/
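
To check whether a NIC's interrupts really are all landing on one core, the per-CPU counters in /proc/interrupts can be summed per device. A minimal sketch in Python: the sample text, IRQ numbers and queue names (ice-eth0-TxRx-N) are made up for illustration, and on a real box you'd read /proc/interrupts itself and filter on whatever the interface is actually called.

```python
# Hypothetical /proc/interrupts excerpt; real output has many more rows.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 45:    9123456          0          0          0  IR-PCI-MSI 524288-edge      ice-eth0-TxRx-0
 46:     812345          0          0          0  IR-PCI-MSI 524289-edge      ice-eth0-TxRx-1
 47:         12          3          0          1  IR-PCI-MSI 524290-edge      nvme0q1
"""


def irq_counts_per_cpu(proc_interrupts_text, name_filter):
    """Sum interrupt counts per CPU for rows whose text contains name_filter."""
    lines = proc_interrupts_text.strip().splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        if name_filter not in line:
            continue
        fields = line.split()
        # fields[0] is the IRQ label ("45:"), then one counter per CPU
        for i, count in enumerate(fields[1:1 + len(cpus)]):
            if count.isdigit():
                totals[i] += int(count)
    return dict(zip(cpus, totals))


per_cpu = irq_counts_per_cpu(SAMPLE, "eth0")
for cpu, total in per_cpu.items():
    print(cpu, total)
```

If one CPU column carries basically the whole total and the rest sit at zero, that matches the "everything on a single core" picture.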

quote:

While a 10% penalty for running on a different NUMA node may not sound too bad, the problem only gets worse with scale. On some tests I was able to squeeze out only 250kpps per core. On all the cross-NUMA tests the variability was bad. The performance penalty across NUMA nodes is even more visible at higher throughput. In one of the tests I got a 4x penalty when running the receiver on a bad NUMA node.

After some reading, it's actually all kinda hosed. Anyway im updating the intel ice nic drivers from 0.8.1-k to 1.11.14 as an action item. supposedly this helps with irq balancing. Crazy how old the ubuntu included driver is.
im probably going to need to screw around with pinning of irqs to specific physical cores. idfk its all complicated and im way too tired to be dealing with this.
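
For the pinning part: the kernel takes a hex CPU bitmask written to /proc/irq/<irq>/smp_affinity, shown as comma-separated 32-bit words on bigger machines. A rough sketch of building that mask from a list of core numbers. The core lists here are hypothetical; which cores are actually local to the NIC's NUMA node has to come from the machine itself, and irqbalance may overwrite whatever gets written by hand.

```python
def smp_affinity_mask(cpus):
    """Build the hex bitmask string for a list of CPU indices.

    Output uses comma-separated 32-bit words, most significant word first,
    which is the layout /proc/irq/<irq>/smp_affinity displays.
    """
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    hex_digits = f"{mask:x}"
    # Pad up to a whole number of 8-hex-digit (32-bit) words.
    width = -(-len(hex_digits) // 8) * 8
    padded = hex_digits.zfill(width)
    return ",".join(padded[i:i + 8] for i in range(0, width, 8))


# Hypothetical examples: even cores 0-6, and a single core past the first word.
print(smp_affinity_mask([0, 2, 4, 6]))
print(smp_affinity_mask([32]))
```

The resulting string would then be written to the smp_affinity file of each queue's IRQ as root, one core (or small set of cores) per queue.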

there's also some poo poo about disabling c-states and modifying p-states and maybe the cpu frequency governor i probably need to pursue. I've been ignoring this angle for a while. i don't really think it's seriously the problem anymore after finding the packet drop stuff.
I have a PR for tuning tcp socket buffers. This is another one of those red herrings. I think there's still merit in it even if it's not my immediate problem.
https://en.wikipedia.org/wiki/Bandwidth-delay_product
Apparently this was a known bottleneck for kafka. It couldn't saturate its 10g nics until these were tuned. tbh my servers likely won't be pushing 10gbps on a single socket like kafka was, but i'll still tune it why not. these defaults were set in like 1999 for 10mbps nics on machines with 500mb of memory lol
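
The bandwidth-delay product itself is just link rate times round-trip time, i.e. how many bytes have to be in flight to keep the pipe full, which is the floor for a useful socket buffer. A quick sketch of the arithmetic; the RTT values are assumptions picked for illustration, not measurements from these datacenters.

```python
def bdp_bytes(bandwidth_bits_per_sec, rtt_microseconds):
    """Bandwidth-delay product in bytes, using integer math to avoid float drift."""
    # bits/s * microseconds = bits * 1e6, then divide by 8 to get bytes
    return bandwidth_bits_per_sec * rtt_microseconds // 8_000_000


TEN_GBIT = 10_000_000_000

# Assumed RTTs: 1 ms (nearby) and 10 ms (farther away).
for rtt_us in (1_000, 10_000):
    print(rtt_us, "us ->", bdp_bytes(TEN_GBIT, rtt_us), "bytes")
```

At 10 Gbit/s and an assumed 10 ms RTT that works out to 12,500,000 bytes, which fits under a 16777216-byte max buffer; a longer RTT would need more.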

code:
# https://www.ibm.com/docs/de/smpi/10.2?topic=mpi-tuning-your-linux-system
net.core.rmem_default = 10000000
net.core.wmem_default = 10000000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 10000000        10000000        16777216
net.ipv4.tcp_wmem = 10000000        10000000        16777216
net.core.netdev_max_backlog = 30000
I also got a report a few days ago that prometheus in the datacenter wasn't forwarding metrics to s3 so we only had 48h of metrics available. Lol this was broken for like 6 months, but we're only noticing now because I finally made recording rules earlier this week so we could actually meaningfully query out some data without just ooming. funny enough that revealed there was never more than 48h of data anyway.

this has turned into another whole fuckin mess because look at this poo poo. the prometheus operator doesn't support arbitrary environment variables being passed in. Which is what I need in order to do the oidc iam federation authentication thing. this thing https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/
https://github.com/prometheus-opera...efulset.go#L726

It was only recently that thanos-sidecar got support for using non-static credentials: v0.25.0 according to the tags here. I'm still running v0.21.0 from forever ago because neglect. so guess i need to update this while im at it.
https://github.com/thanos-io/thanos...8f8f64bb328R103

Even though the operator doesn't support what I want, that's fine because I can just cheat and template in my own stuff as a workaround and keep going. Except the way things are templated is all hosed so I've been wasting like 45 minutes unravelling this dogshit configuration. It's actually so bad. I've been underwater for the past 9 months so I haven't had time to really care about this, but man all of the new hires have hosed this up so bad. I hate it. I want to fix this, but i just don't have time for anything anymore and haven't in so long. and also actually i don't want to fix it, i just want it to not be wasting my time.

I ended up just going back to bed for another 2 hours before I was willing to come back to work for a little bit more.

quote:

level=error ts=2023-04-15T05:48:05.185550013Z caller=main.go:130 err="yaml: unmarshal errors:\n line 1: field aws_sdk_auth not found in type s3.Config\ncreate S3 client\ngithub.com/thanos-io/thanos/pkg/objstore/client.NewBucket\n\t/app/pkg/objstore/client/factory.go:77\nmain.runStore\n\t/app/cmd/thanos/store.go:243\nmain.registerStore.func1\n\t/app/cmd/thanos/store.go:188\nmain.main\n\t/app/cmd/thanos/main.go:128\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371\ncreate bucket client\nmain.runStore\n\t/app/cmd/thanos/store.go:245\nmain.registerStore.func1\n\t/app/cmd/thanos/store.go:188\nmain.main\n\t/app/cmd/thanos/main.go:128\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371\npreparing store command failed\nmain.main\n\t/app/cmd/thanos/main.go:130\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"

after more pissing around with this and reading about kernel networking details I decided to just go to the grocery store. I ran out of food a couple days ago. I bought veggies and eggs mostly. The pork loins I usually buy weren't on sale so I didn't bother. This was the first time I really left my house in more than 2 weeks I think. i honestly don't remember when I last did.

guess i'll just keep fixing this thanos sidecar auth thing so metrics work again for the rest of the night.

overall an extremely painfully normal day

MrQwerty
Apr 15, 2003

LOVE IS BEAUTIFUL
(づ ̄ ³ ̄)づ♥(‘∀’●)

this is what you get for not closing the friday thread hot coco on the clout

Chinatown
Sep 11, 2001

by Fluffdaddy
Fun Shoe

Chinatown posted:

hot cocoa passed out on the couch

MrQwerty
Apr 15, 2003

LOVE IS BEAUTIFUL
(づ ̄ ³ ̄)づ♥(‘∀’●)

lol

Bad Purchase
Jun 17, 2019




hot cocoa on the couch posted:

there's no doubt in my mind that the schedule will be perfect this friday

MrQwerty
Apr 15, 2003

LOVE IS BEAUTIFUL
(づ ̄ ³ ̄)づ♥(‘∀’●)

:owned:

Bad Purchase
Jun 17, 2019




also lol that the effort poster is very bad at their job

MrQwerty
Apr 15, 2003

LOVE IS BEAUTIFUL
(づ ̄ ³ ̄)づ♥(‘∀’●)

Bad Purchase posted:

also lol that the effort poster is very bad at their job

that kind of language earns you a paystub in your PMs buddy

Bad Purchase
Jun 17, 2019




looking forward to it, on a saturday

Methanar
Sep 26, 2013

by the sex ghost
im the technical team lead of a group of 10
im actually not bad at my job. i've just had an extremely difficult last 7 years of stressful project work and have started having severe sleeping problems again in the past couple weeks.

DeadFatDuckFat
Oct 29, 2012

This avatar brought to you by the 'save our dead gay forums' foundation.


Sounds like you should go back to work then and stop posting on the forums forever

MrQwerty
Apr 15, 2003

LOVE IS BEAUTIFUL
(づ ̄ ³ ̄)づ♥(‘∀’●)

I'm sure life is real stressful when you blind move to a new city every 4 months

Methanar
Sep 26, 2013

by the sex ghost
I give each city a full 12 months. I wish I went to knoxville instead though


im taking a break, it's been a hard day.

MrQwerty
Apr 15, 2003

LOVE IS BEAUTIFUL
(づ ̄ ³ ̄)づ♥(‘∀’●)

you're gonna be welcomed with open arms in the neo-Confederate South, I can feel it

Bad Purchase
Jun 17, 2019




Methanar posted:

im the technical team lead of a group of 10
im actually not bad at my job. i've just had an extremely difficult last 7 years of stressful project work and have started having severe sleeping problems again in the past couple weeks.

lol

Methanar
Sep 26, 2013

by the sex ghost
code:
level=warn ts=2023-04-15T05:57:10.971583454Z caller=sidecar.go:347 err="check exists: stat s3 object: Access Denied." uploaded=0
level=warn ts=2023-04-15T05:57:40.960132557Z caller=sidecar.go:347 err="check exists: stat s3 object: Access Denied." uploaded=0
level=warn ts=2023-04-15T05:58:10.956038044Z caller=sidecar.go:347 err="check exists: stat s3 object: Access Denied." uploaded=0
man. fuckin why.

https://github.com/thanos-io/thanos/issues/5929

apparently it broke after 0.28? come on.

Chief McHeath
Apr 23, 2002
I woke up and had to pisssssssssss

Methanar
Sep 26, 2013

by the sex ghost
hm nope still broken on 0.28.
code:
k logs -f prometheus-monitoring-prometheus-oper-prometheus-1 thanos-sidecar | grep sidecar
level=info ts=2023-04-15T07:17:21.497092722Z caller=sidecar.go:361 msg="starting sidecar"
level=info ts=2023-04-15T07:17:21.497752221Z caller=http.go:73 service=http/server component=sidecar msg="listening for requests and metrics" address=:10902
level=info ts=2023-04-15T07:17:21.498280506Z caller=tls_config.go:195 service=http/server component=sidecar msg="TLS is disabled." http2=false
level=info ts=2023-04-15T07:17:21.498959487Z caller=grpc.go:131 service=gRPC/server component=sidecar msg="listening for serving gRPC" address=:10901
level=warn ts=2023-04-15T07:17:21.501596394Z caller=sidecar.go:373 msg="failed to get Prometheus flags. Is Prometheus running? Retrying" err="got non-200 response code: 503, response: Service Unavailable"
level=info ts=2023-04-15T07:17:40.349525724Z caller=sidecar.go:179 msg="successfully loaded prometheus version"
level=warn ts=2023-04-15T07:17:41.716929422Z caller=sidecar.go:345 err="check exists: stat s3 object: Access Denied." uploaded=0
level=warn ts=2023-04-15T07:18:11.71567341Z caller=sidecar.go:345 err="check exists: stat s3 object: Access Denied." uploaded=0
code:
k8s-cluster-addons git:(main) ✗ pbpaste | base64 --decode
config:
  aws_sdk_auth: true
Okay. I think i see why. it's because my serviceAccount is wrong.

code:
k get pod prometheus-monitoring-prometheus-oper-prometheus-0 -o yaml | grep serviceAccount
  serviceAccount: monitoring-prometheus-oper-prometheus
  serviceAccountName: monitoring-prometheus-oper-prometheus
code:
k get pod thanos-storegateway-0 -o yaml | grep serviceAccount
  serviceAccount: thanos
  serviceAccountName: thanos
yeah that'll do it.

within the IAM trust of the role I want to assume, we pin to specific principals. The prometheus container obviously is using the prometheus SA.
I guess I'll need to figure out how to make this an OR and include prometheus here. That's a terraform change. whatever guess i'll do it.
https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_condition_operators.html
code:
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "CENSORED.s3-us-west-2.amazonaws.com/CENSORED/identifier:sub": "system:serviceaccount:monitoring:thanos"
                }
            }
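
For the simple case where only the `:sub` value differs, IAM evaluates multiple values listed under a single condition key as a logical OR, so a list of subjects may be enough. A sketch of generating that condition block in Python. The provider string is a placeholder for the censored one, and the `monitoring` namespace for the prometheus service account is an assumption based on the existing thanos subject. Conditions that span different keys or operators don't work this way.

```python
import json


def trust_condition(oidc_provider, service_accounts):
    """Build a StringEquals condition allowing any of the given service accounts.

    service_accounts is a list of (namespace, name) tuples.
    """
    key = f"{oidc_provider}:sub"
    subjects = [f"system:serviceaccount:{ns}:{name}" for ns, name in service_accounts]
    # Several values under one key are ORed together by IAM.
    return {"StringEquals": {key: subjects}}


condition = trust_condition(
    "OIDC_PROVIDER_PLACEHOLDER",  # stands in for the censored provider path
    [
        ("monitoring", "thanos"),
        # namespace assumed; the pod's namespace isn't shown in the output above
        ("monitoring", "monitoring-prometheus-oper-prometheus"),
    ],
)
print(json.dumps({"Action": "sts:AssumeRoleWithWebIdentity", "Condition": condition}, indent=4))
```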

MrQwerty
Apr 15, 2003

LOVE IS BEAUTIFUL
(づ ̄ ³ ̄)づ♥(‘∀’●)

Methanar posted:

im actually not bad at my job.

Methanar posted:

apparently it broke after 0.28? come on.

Methanar
Sep 26, 2013

by the sex ghost
wow its actually a pain in the rear end to do a logical OR in iam policies. thankfully somebody figured out the truth table for me
https://dev.to/himwad05/aws-iam-how-to-achieve-logical-or-effect-with-multiple-iam-condition-operators-2h0p

Chief McHeath
Apr 23, 2002
drat, dog, i sure give a gently caress about any of that!
