Methanar
Sep 26, 2013

by the sex ghost

i am a moron posted:

Anyone else wake up, realize they aren't doing poo poo today (and wouldn't have contributed much if you did?), and be happy about it? I honestly don't know why the companies I work for/with even stay open in December.

Same, except I got woken up by PagerDuty multiple times for gov cloud, and then several more times by some pretty questionable nightmares, and I'm not happy about it

Methanar
Sep 26, 2013

by the sex ghost

i am a moron posted:

oh my loving god we're merging with a public company

edit: sold. gently caress.

ask for a fat equity grant so you are motivated to share in the company's success

Methanar
Sep 26, 2013

by the sex ghost
Our security org is making several really stupid changes that dramatically increase risk and frustration for zero benefit to appear as though they're increasing security in light of the solarwinds fiasco.

code:
Shortening VPN connection time to 9 hours before having to re-authenticate
We have some extremely irritating mandatory release windows that happen at the end of the day. This on its own is dumb because having 20 different changes go out all at once every day sometimes makes it unclear which change was the problematic one. Let alone the disaster it is for east coast people who need to start their releases at 6pm when they're all tired and want to stop working. It's extremely likely there are now going to be several hundred engineers getting interrupted mid-release/mid-maintenance right around the 9 hour mark.

Live in Europe? Hope you like 11pm releases or 6 am releases

Good job guys

Methanar fucked around with this message at 22:23 on Dec 17, 2020

Methanar
Sep 26, 2013

by the sex ghost

CLAM DOWN posted:

Get Owned

Love, Security

You're probably not even joking.

this is what security people actually think it means to contribute value.

Methanar
Sep 26, 2013

by the sex ghost
I got paged 4 times and the smoke alarm woke me up once last night

Somebody actually manually created a PD incident to wake me at 3 am because their dev docker registry wasn't working. What.

Methanar
Sep 26, 2013

by the sex ghost

Biowarfare posted:

have you considered just hitting ack and ignoring it because you don't have a 3am response slo for dev

The timezone of PD is set to east coast for some reason. It was 2:43 AM.


I did just that. And the psychos immediately escalated to somebody else as soon as they noticed. If they had sent it to me again I would have gotten up and asked what the deal was. I was later told

quote:

dev is dev for our developers, but we really should treat it like production for us. If our dev environment is down at any hour of the day, dozens of engineers are impacted.

I ended up getting up at the 4 and 5AM pages anyway because of scary things like `40 pods of important app are unscheduled` and `ingress's conntrack table is full`

This whole day has been one giant unending trashfire of problems one after the other. And this is just what I was paged over.

Methanar fucked around with this message at 00:17 on Dec 22, 2020

Methanar
Sep 26, 2013

by the sex ghost

12 rats tied together posted:

It's true actually that dev is just production 2 for ops, but that just means that you should have ops engineers in every timezone that you have developers and have folks oncall outside of business hours.

Every ops person we have in Europe is out on PTO right now at the same time lol.

Methanar
Sep 26, 2013

by the sex ghost
Most of the ops issues this morning happened because I disabled the cluster autoscaler (CA) in a few places on Friday after a fire. It was something about a combination of unbounded connections being allowed app-side and a hilarious customer regex that ended up sending us 130x normal traffic, which saturated an ALB, exhausted ephemeral port ranges and caused a kafka rebalance storm. I disabled the CA because I had manually done some ASG tweaking as part of the remediation and was worried about the CA shuffling a bunch of pods around to re-optimize binpacking, potentially causing a huge mess of more kafka rebalances at a time we didn't want them.

After that Friday fire I turned the CA on again, but then another ops person told me no, we shouldn't, let's not rock the boat by causing a bunch of shuffling and re-binpacking on a Friday night. I argued that the CA is in place for a reason and leaving it off is more of a liability than turning it back on. I wasn't in the mood to argue after that fire so I relented and left it off. Which was really dumb, because I knew it was more of a liability, and if I'm on call it should be my judgement to do what is less risky since it falls on me when things blow up.

This morning the guy who said we should leave the CA off sent me what amounted to `lol we should turn the CA back on`.

Methanar
Sep 26, 2013

by the sex ghost

Biowarfare posted:

hey open question to anyone here that is new to kubernetes as an org:

- how long did it take for someone to complain that their pod rebooted, was rescheduled, autoscaled up or down, or moved to a different host node

this happened to me about 3 months ago because someone was running a singleton

Methanar
Sep 26, 2013

by the sex ghost

12 rats tied together posted:

1. Instantly, within the first 20 minutes, at both orgs where we were charged with implementing k8s clusters for the first time.
2. Most people were actually pretty understanding when I explained that this is just a thing with kubernetes, that the price we're paying for them being able to kubectl whatever the gently caress they want was that sometimes this is going to happen and that if they don't want to deal with it, they have access to normal infrastructure through the usual medium.
3. This has actually never happened to me but if you want to broaden to "doing stupid things with kubectl exec", also instantly, within the first 20 minutes, at both orgs.


So far I've managed to successfully argue against autoscaling anything to do with kafka every time it has come up, we've been able to plan for running the cluster 24/7 at peak capacity and we just leave it up like that all the time and eat the extra cost. This lets us run the actual kafka cluster nodes and various ancillary services (mirror maker, cli tools node, etc) on 3 year reservation ec2 instances which I would say is definitely a best practice.

Message workers, publishers, and friends run through kubernetes because we don't actually care if they start boot looping because the EKS CNI can't find an IP address for them or whatever the gently caress -- we get paged when average processing latency gets too high, and usually we show up, find an exciting new bug in whatever worker, and rollback to the last deployed version.

Why are y'all autoscaling the actual kafka nodes? Cost reasons?

We don't scale kafka brokers. We scale the consumers (95% of everything here is kafka-based one way or another). It's just that our kafka throughput is measured in petabytes per day, with many, many partitions, so shifting around a few hundred consumers all at once can sometimes hurt.

Methanar
Sep 26, 2013

by the sex ghost

uhhhhahhhhohahhh posted:

POV: people itt start talking about kubernetes
https://www.youtube.com/watch?v=y8OnoxKotPQ

this is now the kafka sensitivities thread

post all your kafka stories itt

Methanar
Sep 26, 2013

by the sex ghost

12 rats tied together posted:

Lyft's FlinkK8sOperator is worse than just defining a Deployment and rapidly degrades after deployment to the point where the operator state machine just totally breaks and your deployment gets stuck in SubmittingJob or JobCancelled or JobFailed forever, and the only way to recover is to kubectl exec into the job manager, run a cli command to push a savepoint (not a checkpoint) to s3, and then kill it.

If you take a checkpoint instead of a savepoint and kill it, your flink workers will come back up and start reprocessing events from whenever the last time you took a real savepoint was. This is very, very, extremely bad if you're using the operator to process financial information.

I don't know anything flinking lyfts but lmao

quote:

Project Status
Beta

Methanar
Sep 26, 2013

by the sex ghost

12 rats tied together posted:

I suppose it depends on your business and what your team does day to day -- even having an SLA for a dev environment speaks to a level of disassociation with feature development that is way different than anywhere I have worked in the past (almost entirely SaaS, startups, and enterprises that think they are still startups). It's not that there isn't an SLA necessarily it's just that the SLA is implied: the company writes and runs software. How much do I care about this? A lot -- it is 100% of my job to ensure that the company can continue to successfully write and run software.

Not to suggest that you are doing this but if your employer makes money by selling a software product it's reaaaally not a good idea to intentionally cultivate an us/them mentality or even imagine any sort of wall between ops/dev teams or reflect that into dev(them)/prod(us) environments. The ops team exists to operate the software in such a way that generates money for the company, blocking feature dev is blocking the pipeline that produces the requirements that your team exists to solve. This becomes really obvious when you work for a company as they are entering the "oh poo poo I guess we need an ops team" stage of life.

If your production environment is like an AD domain with file shares so that sales and accounting can all edit spreadsheets together, certainly that requires a different attitude, but also yeah you don't really have "developers" for that in any sense that matches an environment that you deploy a software product to. It is still called production though which is a great terminology collision for this line of work.

"Restrictions" is kind of a weird term here -- your dev environment definitely has a deploy mechanism. It's the same mechanism as production because that's the best way to write software. Since you're a good development shop and your ops teams have a healthy relationship with your dev teams, the mechanism is both the easiest and safest way to deploy software to any environment, so everyone uses it all the time for every deployment related task. Since anyone can safely deploy anything at any time, a dev deployment failure is a pending prod deployment failure, which is an ops concern. Additionally, anything that just broke dev would also break production, which is another ops concern.

If any of that isn't true for anyone the best path forward is to make all of it true, otherwise you're not really doing "dev ops", and none of the usual rituals or tools apply to your work as intended.

This is a good post and it's changed my opinion on the matter.

Methanar
Sep 26, 2013

by the sex ghost

i am a moron posted:

Frankly I think you’re fostering the us vs. them mentality to fail to empathize with why ops has to ration their time carefully. I get why developers would feel this way about a dev environment, but breaking the walls down would mean compromising on how the usually stretched thin ops teams handle their time the most effectively. That doesn’t usually involve treating dev environments like they are actually generating revenue right this very second and it can’t wait.

That being said I don’t work with product companies usually, but you definitely see the kind of mentality more often and for good reason from their perspective. I’d still regard it as bad practice though, and after interviewing at product focused companies a few times I decided it wasn’t for me so definitely probably personal preference/attitude needed for it.

What does it mean to be a non-product company?

Methanar
Sep 26, 2013

by the sex ghost

12 rats tied together posted:

"What if we just didn't do anything?"

Methanar
Sep 26, 2013

by the sex ghost

i am a moron posted:

By doing this, they’ve acknowledged they see you as a resource instead of a human being. Hiring someone is a crapshoot, and they picked someone with some obscure one-time useful skills that you assuredly could’ve figured out yourself instead of picking someone they know is successful and were (imo) cruelly suggesting should be advancing within your current org.

I might get too bent out of shape about this stuff, but it really pisses me off when IT departments do this kind of poo poo. Don’t bitch about people job hopping or lacking loyalty or whatever when you see them as skills with people attached to them. Specific domain knowledge is overrated to an extreme degree - you’re hiring people drat it.

tell the manager to promote you and bring in an SME as a contractor if necessary

Methanar
Sep 26, 2013

by the sex ghost

i am a moron posted:

Can anyone explain what possible reason you'd have for not taking the shortest TTLs possible? I don't even understand this convo.

I had code review comments recently that told me that 60 second TTLs were too long and I should instead be using 30. I had no reason to disagree.

Well-written software should be re-using sockets as often as possible, so it's not like halving the TTL doubles authoritative DNS hits.
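
A rough way to see the socket-reuse point is a toy simulation. This is only a sketch under assumptions: the client resolves a name only when it opens a new connection, there is a single local cache with a TTL, and the workload numbers (one new connection per second versus one every 10 minutes) are made up for illustration. `count_upstream_lookups` is a hypothetical helper, not part of any real resolver library.

```python
def count_upstream_lookups(ttl_seconds, connection_open_times):
    """Count cache misses (lookups that go upstream) for a client that
    resolves a name only when it opens a new connection.

    connection_open_times: sorted timestamps, in seconds, at which a new
    socket is opened.
    """
    lookups = 0
    expires_at = None  # time at which the cached answer stops being valid
    for t in connection_open_times:
        if expires_at is None or t >= expires_at:
            lookups += 1
            expires_at = t + ttl_seconds
    return lookups


# Hypothetical one-hour workloads (assumed numbers, not measurements).
no_reuse = list(range(0, 3600))         # a new connection every second
with_reuse = list(range(0, 3600, 600))  # a new socket only every 10 minutes

for ttl in (60, 30):
    print(
        f"ttl={ttl}s",
        "no reuse:", count_upstream_lookups(ttl, no_reuse),
        "with reuse:", count_upstream_lookups(ttl, with_reuse),
    )
```

In this toy model the client that opens a socket per request goes from 60 to 120 upstream lookups when the TTL drops from 60 to 30 seconds, while the client that keeps sockets open does 6 lookups either way, because it only resolves when it reconnects.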

Methanar fucked around with this message at 04:40 on Jan 3, 2021

Methanar
Sep 26, 2013

by the sex ghost
nvm you guys are right I walked into that one.

Methanar
Sep 26, 2013

by the sex ghost
Putting C on your resume is a good way to encourage somebody to interrogate you on it

Methanar
Sep 26, 2013

by the sex ghost

jaegerx posted:

I still google loving facls and quotas which is poo poo you don’t need in 2021. Auto mount too. Why is this still on your stupid test?

facl you

22 Eargesplitten posted:

I just mean like the most basic poo poo for someone that knows how to yum/apt-get/dnf (lol does anything use dnf anymore?)/cat/mkdir and pretty much nothing else.

there is nothing else to know

Methanar
Sep 26, 2013

by the sex ghost

22 Eargesplitten posted:

I just mean like the most basic poo poo for someone that knows how to yum/apt-get/dnf (lol does anything use dnf anymore?)/cat/mkdir and pretty much nothing else.

How do you find the different package versions available for haproxy on an Ubuntu system?

How do you install a specific version and make sure it doesn't automatically change on you?
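
One possible answer, sketched in Python as helpers that only build the commands and config text rather than running anything. The helper names and the `2.4.*` version glob are hypothetical placeholders. The underlying tools are the stock apt ones: `apt-cache madison` (or `apt-cache policy`) to list versions, `apt-get install pkg=version` to install one, and either `apt-mark hold` or a pin stanza under /etc/apt/preferences.d/ to keep it from changing.

```python
def list_versions_cmd(package):
    # `apt-cache madison` lists each available version and the repo it comes
    # from; `apt-cache policy <pkg>` also shows installed/candidate versions.
    return ["apt-cache", "madison", package]


def install_version_cmd(package, version):
    # apt accepts an exact version with the pkg=version syntax.
    return ["apt-get", "install", f"{package}={version}"]


def hold_cmd(package):
    # A held package is skipped by automatic upgrades until it is unheld.
    return ["apt-mark", "hold", package]


def pin_stanza(package, version_glob, priority=1001):
    # apt_preferences(5) stanza; a priority above 1000 makes apt keep this
    # version even if that means a downgrade.
    return (
        f"Package: {package}\n"
        f"Pin: version {version_glob}\n"
        f"Pin-Priority: {priority}\n"
    )


if __name__ == "__main__":
    print(" ".join(list_versions_cmd("haproxy")))
    print(" ".join(hold_cmd("haproxy")))
    print(pin_stanza("haproxy", "2.4.*"))  # "2.4.*" is a placeholder glob
```

Holding is the quick option, while a pin file is easier to manage through config management. Which one fits depends on how the rest of the fleet is handled.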

Methanar
Sep 26, 2013

by the sex ghost

12 rats tied together posted:

Yes, being able to ansible a windows node without having to mass GPO a bunch winrm/psrp nonsense and then figure out a way to locally store AD credentials in a way that doesn't piss off every AD admin to ever exist so I don't need to --ask-pass and type my 36 character full sentence password, which I still have to have and rotate every 90 days for some reason, to manage a win server node would be a good start.

12 rats tied together posted:

The only good things Microsoft of the post-2000s has done are WSL and .net core.

SQL Server is a bad, expensive database platform that encourages you to repeatedly invest in a widely agreed upon antipattern: putting business logic in the database. I've never owned an XBOX.

12 rats tied together posted:

Enter-PSSession is not the same as ssh, it is ~reasonably close in terms of what it allows a human to do, but the important part of supporting ssh is so you can integrate with the entire other half of production server ecosystems which have been using it for the last 25 years.

Internet Explorer posted:

*whispers* you're not the intended consumer

:hai:

I have not enjoyed my Azure experiences. It's sort of telling though in response to your several effortposts most of the retort is Clam-level 'nuh uhs'

CLAM DOWN posted:

Man, you are wrong and ignorant and outdated. Not worth getting into a debate about.

Methanar
Sep 26, 2013

by the sex ghost

jaegerx posted:

All cloud providers are poo poo and everyone should run openstack in their own DC.

What if I don't want to pretend I work at rackspace though

Methanar
Sep 26, 2013

by the sex ghost

jaegerx posted:

I run raid 0 on my gaming pc, AMA.

You probably run windows on it too

Methanar
Sep 26, 2013

by the sex ghost
I learned my lesson about trusting the cloud when tumblr decided to start censoring my art

Methanar
Sep 26, 2013

by the sex ghost
My company's bold new diversity initiative for the year is to reallocate the company holidays on December 24th and 31st to instead fall on MLK Jr. Day and Veterans Day.

Methanar
Sep 26, 2013

by the sex ghost
I made an actual big boy software architecture recommendation today for some team that is building a new feature in a zoom I got pulled into. That was cool, I don't usually do that.

I hope it works lol.

Methanar
Sep 26, 2013

by the sex ghost

DelphiAegis posted:

Apparently the full story is even more hilarious.

I've been laughing about this all morning.

https://imgur.com/gallery/oFMaMAI

https://www.reddit.com/r/ParlerWatch/comments/kv0jo6/psa_the_heavily_upvoted_description_of_the_parler/

I counter your unsubstantiated anonymous internet post with another unsubstantiated anonymous internet post

Methanar
Sep 26, 2013

by the sex ghost

The Fool posted:

I wrote a terraform module on Friday and it’s going in to production today.

It’s going to affect how VM’s are deployed in Azure across the entire organization.

:ohdear:

:hellyeah:

Methanar
Sep 26, 2013

by the sex ghost

turn your camera off

Methanar
Sep 26, 2013

by the sex ghost

jaegerx posted:

Turn your monitor on.

that was the joke I was making

Methanar
Sep 26, 2013

by the sex ghost

CLAM DOWN posted:

public anger

You're so angry. Are you ok

(USER WAS PUT ON PROBATION FOR THIS POST)

Methanar
Sep 26, 2013

by the sex ghost
If I have a BitLocker-encrypted disk and I take an image with dd and write that image back to another disk later, will I be able to decrypt it normally? I'm pretty sure the answer is yes because I'm not storing anything in a TPM.
