Continuous Integration/build engineering/devops thread

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Continuous Integration/build engineering/devops thread

prom candy: Dec 16, 2005; Only I may dance

luminalflux posted:

Another thing to consider is how you handle dropping tables. We had some severe issues with RDS, each time we'd reboot with failover - something that AWS claims should be "fairly quick", we'd end up having hours+ of downtime. Way back in the day someone dropped a big table and while it was dropping, it took down the site due to ... idk what was the actual issue. So they rebooted the database.

We only discovered years later that the database files were left around on the RDS instance's filesystem, and each time it booted it would say "huh, i have table files but no record of them in the metadata table. I have no idea what's real, so I need to go into crash recovery". AWS support would join our zoom bridge, but they basically could only troubleshoot as far as "well the instance is working normally, let me page the database team", taking 30min-1hr to get to a point where we had an expert online.

Learnings?

Even though RDS can do multi-AZ, don't rely on it. Have a replica you can switch to and promote instead. Ideally you have

Drop tables gracefully by emptying them first using pt-archiver.

AWS support isn't enough. Either you need in-house expertise on your primary datastore, or engage experts such as Percona or Pythian.

If you intend to scale (i e, you're a startup that's scaling up customers et c), build out best practices and tooling while you're small enough before you end up with huge amounts of inadvertent downtime

(i realize i'm basically drip-feeding my PerconaLive talk from 2020 i never gave because, uh, the 'rona)

Thanks, this is great stuff to think about. I have an inherited setup right now that imo is just stupidly overcomplicated for a company our size. I'm looking at moving everything to PaaS because I think it'll ultimately be cheaper than hiring the devops engineer that I'm going to need to hire otherwise but doing a cutover on infra at this point in our business seems crazy and terrifying. I'd love to just move the whole thing to PlanetScale and Render.com or something.

# ? Mar 5, 2023 06:03

Docjowles: Apr 9, 2009

This is all validating my career choices to intentionally get away from managing large, important data stores. Cause that poo poo is hard.

Super interesting stuff, luminalflux. I would come to your TED talk.

luminalflux posted:

AWS support isn't enough

Also, my god, so much this. They're trying their best but unless your monthly spend is measured in multi-millions good luck getting the right people engaged before you just figure it out yourself. I find myself routinely raising issues with our TAM and then explaining the solution we came up with ourselves to him a couple weeks later. The fact that we pay well into 6 figures a year just in support costs for this is comical

Docjowles fucked around with this message at 06:41 on Mar 5, 2023

# ? Mar 5, 2023 06:34

luminalflux: May 27, 2005

prom candy posted:

Thanks, this is great stuff to think about. I have an inherited setup right now that imo is just stupidly overcomplicated for a company our size. I'm looking at moving everything to PaaS because I think it'll ultimately be cheaper than hiring the devops engineer that I'm going to need to hire otherwise but doing a cutover on infra at this point in our business seems crazy and terrifying. I'd love to just move the whole thing to PlanetScale and Render.com or something.

Friend of mine just joined Render, curious what your experience is. PlanetScale is pretty cool too, we've been eyeing them for horizontal sharding of our mysql. Honestly a lot of places could just host on a PaaS and call it a day. Too bad Heroku isn't viable anymore.

Docjowles posted:

Super interesting stuff, luminalflux. I would come to your TED talk.

Thanks! I'm kinda kicking myself for not giving this online. I've given a version of this internally, but i've kinda moved on from MySQL since we hired 2 SREs that just focus on MySQL management.

quote:

I find myself routinely raising issues with our TAM and then explaining the solution we came up with ourselves to him a couple weeks later. The fact that we pay well into 6 figures a year just in support costs for this is comical

There's a couple levers to pull here, I think i've delved into this before. A big one is talking to your account manager's manager and explaining that you aren't getting good enough support. Your TAM's skillset might not be aligned with what you're doing. They can rotate the team for you - it's a big grenade to throw in there but I've done it. TAMs aren't be-all end-all though. They're good at routing your question to the correct service team, getting you in front of the service team's PM so you can kvetch at them for 50 minutes about why IRSA sucks rear end, and asking around among other TAMs stuff like "hey has anyone done <x>?"

# ? Mar 6, 2023 01:24

prom candy: Dec 16, 2005; Only I may dance

luminalflux posted:

Friend of mine just joined Render, curious what your experience is. PlanetScale is pretty cool too, we've been eyeing them for horizontal sharding of our mysql. Honestly a lot of places could just host on a PaaS and call it a day. Too bad Heroku isn't viable anymore.

It actually looks like you can do a zero downtime migration from RDS to PlanetScale so I might look more seriously at this option. I need to do the math a bit more carefully but it might actually also be cheaper than RDS for our needs.

Render I don't have any experience with but it does look like it makes a lot of the tricky parts much easier. Railway is another option in that space that I've looked at.

# ? Mar 6, 2023 03:13

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

As the SRE for a number of petabyte-scale-but-can't-afford-it DB scenarios, I was at an org that was one of the early adopters of AWS Aurora MySQL, and let me tell you how incredibly frustrating it was: AWS shipped a constant stream of silly bug fixes after the cluster buckled for ~2 hours during one of our "massive" load spikes of maybe 100k write requests / minute. One jaw-dropper was a note from the TAM that failover didn't work correctly because they were mistakenly caching DNS entries in their JVM, which I thought every senior Java dev knew is how the JVM has worked for decades? I really hope it was something custom, but the bugs really didn't give me much confidence that Aurora was crafted with the same kind of attention to detail that DynamoDB was.

# ? Mar 6, 2023 11:01

freeasinbeer: Mar 26, 2015; by Fluffdaddy

I think aws likes to hire well pedigreed but otherwise completely inexperienced folks and throw them at big problems. They could honestly use more ops folks all around and stop using baby programmers.

# ? Mar 7, 2023 17:01

drunk mutt: Jul 5, 2011; I just think they're neat

Is there any tooling for end-to-end testing with cloud resources?

Feels like Ansible would be a good fit, but feels like it's abusing the purpose of the tooling. The idea would be to not build a script, as that's just horrible to maintain, but to leverage tooling that makes it easy to define the resources to create, validate that they were created and can do what they're intended to do, and tear down the resources created for the tests.

Really trying to avoid having a co-worker go off and just write a kludge of a script that will just add more burden to our small team.

# ? Mar 7, 2023 17:18

The Fool: Oct 16, 2003

we use terratest

# ? Mar 7, 2023 17:20

drunk mutt: Jul 5, 2011; I just think they're neat

The Fool posted:

we use terratest

We were looking at that, but the AWS provider doesn't have the API calls built into their HCL for the resources we need. So it gave certain co-workers the great idea that we could just use null resources again, and well, I'd just rather not have to have that conversation all over again.

# ? Mar 7, 2023 17:28

Docjowles: Apr 9, 2009

One of my teammates is extremely hype about terratest backed by localstack. So I tried it out and literally the first test I tried to write proved to be impossible due to a bug around transit gateways in localstack, which was of course the thing I needed to test. So that was a pretty major buzzkill.

# ? Mar 7, 2023 17:34

Sylink: Apr 17, 2004

Is the job market messed up right now with the layoffs + copy cat layoffs?

I've been looking since the start of the year, and barely get any follow ups. I feel like previously I'd at least get lots of assessments and interviews. Now it's just a black hole I'm throwing applications into.

I have a decade+ of practical experience at this point, but it's not very traditional and leans towards sysadmin/operations, with heavy application support. So I apply to SRE/Devops roles as well that sometimes fit this bill since people don't know how to describe SRE.

# ? Mar 7, 2023 17:35

12 rats tied together: Sep 7, 2006

drunk mutt posted:

Feels like Ansible would be a good fit, but feels like it's abusing the purpose of the tooling.

ansible is good at this and it isn't an abuse of the tool, there are several built in utilities specifically for writing playbooks that are test suites, even

do not use molecule, however, it's bad.

# ? Mar 7, 2023 17:48

Methanar: Sep 26, 2013; by the sex ghost

Sylink posted:

Is the job market messed up right now with the layoffs + copy cat layoffs?

I've been looking since the start of the year, and barely get any follow ups. I feel like previously I'd at least get lots of assessments and interviews. Now it's just a black hole I'm throwing applications into.

I have a decade+ of practical experience at this point, but it's not very traditional and leans towards sysadmin/operations, with heavy application support. So I apply to SRE/Devops roles as well that sometimes fit this bill since people don't know how to describe SRE.

Yes. My org has been in a hiring freeze for months and I doubt we're alone.

# ? Mar 7, 2023 19:02

Sylink: Apr 17, 2004

Does your org do a freeze but still leave all the openings up ?

Cause I see what must amount to millions of positions that never get filled, lmao. Sorry if this is a derail for the thread.

# ? Mar 7, 2023 20:07

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

12 rats tied together posted:

do not use molecule, however, it's bad.

Molecule is intended for role testing. If you use it for testing roles, it's good. If you use it for running arbitrary playbooks, it's bad (because you're doing the wrong thing).

e: though I will note that I literally don't use driver functionality aside from establishing connections, and that we wrote a collection of Molecule helpers that we call from cookiecutter'd molecule/*/{create,destroy} playbooks. I recommend this approach for internal roles.

The Fool posted:

we use terratest

Terratest is mostly fine, but you can't escape it being written in Golang, which is just really bad at boilerplate/plumbing for comparing schemaless data structures (for example, making sure a Terraform-generated IAM policy has the expected contents). It does have all the good HCL-rewriting libraries, which is good if you need to mangle your examples to run integration tests in your environment.

Vulture Culture fucked around with this message at 20:18 on Mar 7, 2023

# ? Mar 7, 2023 20:12

Methanar: Sep 26, 2013; by the sex ghost

Sylink posted:

Does your org do a freeze but still leave all the openings up ?

Cause I see what must amount to millions of positions that never get filled, lmao. Sorry if this is a derail for the thread.

yes lol

# ? Mar 7, 2023 20:13

Sylink: Apr 17, 2004

welp then, cause I've done several interviews then seen the position still open months later :shrug:

I love that companies can throw like $20k in man hours on interviews then not fill a position.

# ? Mar 7, 2023 20:17

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Sylink posted:

welp then, cause I've done several interviews then seen the position still open months later

I love that companies can throw like $20k in man hours on interviews then not fill a position.

In such a company's defense, eating that sunk cost is cheaper than filling the position.

Imagine each person you employ has a fixed cost and generates some amount of value (which varies from person to person). Now imagine the macroeconomic environment works as a multiplier on that value; sometimes it's 2x, or 10x, and sometimes it's 0.5x in a contraction. There's little point hiring someone who's going to bleed money for reasons that they and you literally cannot do anything about.

# ? Mar 7, 2023 20:23

Docjowles: Apr 9, 2009

Methanar posted:

yes lol

Recruiting sucks so bad lol. I could believe any reason for this, including wanting a pipeline full of apps ready to go when hiring opens up again, thinking having a bunch of bogus openings posted makes the company look good, or simple incompetence.

# ? Mar 7, 2023 20:40

Sylink: Apr 17, 2004

The worst part is I get interviews when I've somehow reached the hiring manager/person I would be working for directly.

Whenever I deal with an HR middle-person, they just check off some bullet points and don't have the knowledge to extrapolate skillsets or see how they would apply.

Especially frustrating with computer touching, where most of the skill is just being able to read documentation for new things and work with it, because there is just so much to deal with now, imo.

# ? Mar 7, 2023 21:30

Wizard of the Deep: Sep 25, 2005; Another productive workday

There's also bureaucracy moving at the speed of bureaucracy, in that the hiring team may not get told they can't fill the role until they've finished interviewing and say they want to move forward with a particular person.

Communication is both hard and important, yo.

# ? Mar 7, 2023 22:21

YOLOsubmarine: Oct 19, 2004; When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Docjowles posted:

Recruiting sucks so bad lol. I could believe any reason for this, including wanting a pipeline full of apps ready to go when hiring opens up again, thinking having a bunch of bogus openings posted makes the company look good, or simple incompetence.

In a lot of cases it's probably just a result of people within the company attempting to navigate the bureaucracy.

- Manager gets approval to open a req for some job.

- Manager communicates to internal talent acquisition team the requirements for the job and they create the job posting.

- Company institutes hiring freeze

- Manager knows that if she closes the req she will have to go through the approvals all over again to get it re-opened, and it could be denied, so she decides to leave it open

- the talent acquisition folks leave it open because the hiring manager hasn't told them to close it, plus if the company isn't even pretending to hire people then they don't have a reason to remain employed

- since the job is still posted people still apply and the talent acquisition people still do their job and screen applicants and forward some on to the hiring manager who may interview them with an eye towards hiring later or trying to get an exception from the freeze

Plus hiring freezes are rarely universal. There's generally *some* level of hiring going on just to maintain staffing levels as people leave.

# ? Mar 8, 2023 06:09

Hadlock: Nov 9, 2004

Sylink posted:

Is the job market messed up right now with the layoffs + copy cat layoffs?

I've been looking since the start of the year, and barely get any follow ups. I feel like previously I'd at least get lots of assessments and interviews. Now it's just a black hole I'm throwing applications into.

I have a decade+ of practical experience at this point, but it's not very traditional and leans towards sysadmin/operations, with heavy application support. So I apply to SRE/Devops roles as well that sometimes fit this bill since people don't know how to describe SRE.

The market just fell off a cliff after the first half of January

I used to string along recruiters and they would email me back six weeks later. Now if I don't respond within 24 hours, recruiters are telling me they've already moved to final stage with an applicant if they even respond at all

Another factor is that Q1 hiring is done, we're three weeks from the end of Q1. Q2 headcount and payroll dollars are about to be approved/released by the CFO. If you're looking for a job right now, you'll be spinning your wheels until about March 28 and then March 30-April 14 expect the floodgates to open. You should be spending the intermediate time grinding leetcode + cramming weird trivia like Twitter for zombies SQL (this came up in a recent tech screen for some reason?)

# ? Mar 8, 2023 09:09

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Folks managing many AWS accounts at scale: how do you ensure your push-driven CI/CD systems (Jenkins, GitLab, etc.) don't exceed IAM policy size based on the number of roles they need to be permitted to assume?

# ? Mar 8, 2023 17:34

JehovahsWetness: Dec 9, 2005; bang that shit retarded

Vulture Culture posted:

Folks managing many AWS accounts at scale: how do you ensure your push-driven CI/CD systems (Jenkins, GitLab, etc.) don't exceed IAM policy size based on the number of roles they need to be permitted to assume?

We don't allow our pipelines to directly assume target IAM Roles, but use Vault as an IAM STS broker that gatekeeps the valid "target" roles for a given pipeline. So IAM Roles only need to trust Vault and we keep a map in Vault of pipeline -> role associations. Pipelines can then more easily use multiple IAM Roles and we don't have people loving around with trust policies.

e: it also centralizes the "this repo fucks with this account" list by forcing repo maintainers to explicitly register the association.
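To make the moving parts a bit more concrete, here is a toy sketch of the gatekeeping step only. It is not Vault's API and not the actual setup described above: `RoleBroker`, the pipeline name, and the role ARN are all hypothetical, and a real broker would go on to call STS and hand back short-lived credentials.

```python
# Toy model of a credential broker's allowlist. Everything here is
# hypothetical: it illustrates the pipeline -> role map, not Vault itself.


class RoleBroker:
    def __init__(self, associations: dict[str, set[str]]) -> None:
        # Map of pipeline identity -> IAM role ARNs registered for it.
        self._associations = associations

    def resolve(self, pipeline: str, role_arn: str) -> str:
        """Return the role ARN if the pipeline is registered for it."""
        allowed = self._associations.get(pipeline, set())
        if role_arn not in allowed:
            raise PermissionError(f"{pipeline} is not registered for {role_arn}")
        # A real broker would call sts:AssumeRole here using its own identity,
        # which is the only principal the target role's trust policy names.
        return role_arn


broker = RoleBroker({
    "repo-a/deploy": {"arn:aws:iam::111111111111:role/deploy"},
})
print(broker.resolve("repo-a/deploy", "arn:aws:iam::111111111111:role/deploy"))
```

The point of the shape is that target roles only ever trust the broker, so adding a pipeline means adding a map entry rather than editing a trust policy.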

JehovahsWetness fucked around with this message at 18:04 on Mar 8, 2023

# ? Mar 8, 2023 18:01

12 rats tied together: Sep 7, 2006

how on earth are you hitting policy size limit with just a series of allow -> sts:AssumeRole?

# ? Mar 8, 2023 18:19

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

12 rats tied together posted:

how on earth are you hitting policy size limit with just a series of allow -> sts:AssumeRole?

We aren't, but back of the napkin math on role ARN length says we'd hit this somewhere around 160-170 accounts

# ? Mar 8, 2023 18:42

12 rats tied together: Sep 7, 2006

~30 characters per account, (round up to 50)
~130 characters to set up the scaffolding for allowlisting "assume role with series of account targets" (round up to 150) * 11 = 1650 characters

You get 1 inline policy at 10,240 characters and 10 managed policies at 6,144 characters for a total of 71,680 characters. After subtracting the scaffolding, that leaves you with (round down) 70,000 characters / 50 characters per account+role id = 1400 accounts? give or take?

I don't think you should have more than like, 30 accounts, to be clear, but I think even if you're leaning really hard into the micro account strategy, this isn't a limit that matters usually.
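The napkin math above can be written out as a short calculation. This is only a sketch using the figures quoted in this post (policy sizes, 10 managed policies, the rounded-up per-policy scaffolding and per-account cost); treat all of them as assumptions and check the current IAM quotas before relying on the result.

```python
# Back-of-the-napkin budget for sts:AssumeRole targets on a single role.
# All constants are the figures quoted in the post, not verified quotas.

INLINE_POLICY_CHARS = 10_240       # one inline policy
MANAGED_POLICY_CHARS = 6_144       # per managed policy
MANAGED_POLICIES_PER_ROLE = 10
SCAFFOLDING_CHARS = 150            # per-policy boilerplate, rounded up from ~130
CHARS_PER_TARGET = 50              # per account+role entry, rounded up from ~30


def max_assume_role_targets() -> int:
    """Estimate how many account+role entries fit across all policies."""
    policy_count = 1 + MANAGED_POLICIES_PER_ROLE
    total = INLINE_POLICY_CHARS + MANAGED_POLICY_CHARS * MANAGED_POLICIES_PER_ROLE
    usable = total - SCAFFOLDING_CHARS * policy_count
    return usable // CHARS_PER_TARGET


print(max_assume_role_targets())
```

With the unrounded ~30 characters per entry the ceiling would be higher; the rounded figures are deliberately pessimistic.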

# ? Mar 8, 2023 18:50

luminalflux: May 27, 2005

Vulture Culture posted:

Folks managing many AWS accounts at scale: how do you ensure your push-driven CI/CD systems (Jenkins, GitLab, etc.) don't exceed IAM policy size based on the number of roles they need to be permitted to assume?

We use OIDC role assumption with CircleCI

# ? Mar 8, 2023 19:08

minato: Jun 7, 2004; cutty cain't hang, say 7-up.; Taco Defender

I'm about to run into this problem, and assumed I'd have to split the policy up into multiple policies. Which is do-able, it's just a hassle.

I'm intrigued about the Vault solution, but I'm bewildered about how that actually works. I grok'd all the nouns in that explanation, but I have no idea how they all connect together.

# ? Mar 8, 2023 20:27

12 rats tied together: Sep 7, 2006

if you're using ansible to handle your managed policies you can use jinja2's batch filter to express this pretty trivially

YAML code:

{% for chunk in aws_accounts | batch(200) %}
PushModePolicy{{ loop.index }}:
  Type: AWS::IAM::ManagedPolicy
  Properties:
    # ...
{%   for account in chunk %}
      - "{{ account }}"
{%   endfor %}
    # ...
{% endfor %}

You'll get one PushModePolicyN managed policy for every 200 accounts, with each chunk's accounts injected into that policy's PolicyDocument. This pattern is also useful for other things that are big sometimes like DNS records or external customers in a bucket policy.

You'll want to take care that you don't unnecessarily re-sort "aws_accounts", but even if you do, CloudFormation handles this update gracefully so there's no chance of outage.
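For anyone not templating with Ansible, the same chunking idea can be sketched in plain Python. The `batch` helper below mimics what Jinja2's batch filter does without a fill value; the `deploy` role name and the choice to put role ARNs in `Resource` are assumptions, since the template above elides that part. The chunk size of 200 is taken from the post and should be tuned to the actual policy size limits.

```python
from itertools import islice
from typing import Iterable, Iterator


def batch(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` items, preserving order."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def push_mode_policies(account_ids: Iterable[str],
                       role_name: str = "deploy",  # hypothetical role name
                       size: int = 200) -> dict[str, dict]:
    """Build one policy document per chunk of accounts."""
    policies = {}
    for index, chunk in enumerate(batch(account_ids, size), start=1):
        policies[f"PushModePolicy{index}"] = {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                # Assumed shape: one role ARN per target account.
                "Resource": [
                    f"arn:aws:iam::{account}:role/{role_name}"
                    for account in chunk
                ],
            }],
        }
    return policies
```

Input order is preserved on purpose: appending new accounts to the end only touches the last policy, whereas re-sorting can shuffle accounts between chunks.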

# ? Mar 8, 2023 20:50

crazypenguin: Mar 9, 2005; nothing witty here, move along

Vulture Culture posted:

Folks managing many AWS accounts at scale: how do you ensure your push-driven CI/CD systems (Jenkins, GitLab, etc.) don't exceed IAM policy size based on the number of roles they need to be permitted to assume?

Do you need to enumerate the account IDs?

This is not something I have set up, but thinking back, I'd assume we were using something that could assume a particular role name in any account. All the actual access control was within those accounts (e.g. trusting the CI/CD Service root account, and requiring, I think, an sts:ExternalId to help prevent any confused deputy type of attacks)

# ? Mar 8, 2023 22:21

luminalflux: May 27, 2005

Vulture Culture posted:

Folks managing many AWS accounts at scale: how do you ensure your push-driven CI/CD systems (Jenkins, GitLab, etc.) don't exceed IAM policy size based on the number of roles they need to be permitted to assume?

I guess i'm kinda struggling with the problem because we have a bunch of accounts and a lot of different CircleCI jobs in repos, but we use OIDC. We create an OIDC IdP in each account that trusts Circle's OIDC provider and use AssumeRoleWithWebIdentity (see Circle's docs). This also limits blast radius somewhat since you can lock down each role to a specific repo with conditions.

If instead you insist on having One Big Fat Jenkins Role that needs to assume roles in other accounts and you need to grant it a bajillion sts:AssumeRole permissions, can you do something like permitting it to assume arn:aws:iam::*:role/honkfarts, and in the trust policy for each role in the target accounts have a condition key for the organizational unit Jenkins lives in?
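A rough sketch of what that pair of policies could look like, built as Python dicts so they can be dumped to JSON. The account ID and organization path are placeholders, and using `aws:PrincipalOrgPaths` with the CI account root as principal is one assumed way to express "the OU Jenkins lives in", not a tested setup.

```python
import json


def jenkins_identity_policy(role_name: str) -> dict:
    """Identity policy for the CI role: assume `role_name` in any account."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": f"arn:aws:iam::*:role/{role_name}",
        }],
    }


def target_trust_policy(ci_account_id: str, org_path: str) -> dict:
    """Trust policy for the role in each target account.

    `org_path` is a placeholder AWS Organizations path such as
    "o-exampleorgid/r-ab12/ou-ab12-11111111/*".
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{ci_account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {
                # Multivalued key, so it needs a set operator.
                "ForAnyValue:StringLike": {"aws:PrincipalOrgPaths": [org_path]},
            },
        }],
    }


print(json.dumps(jenkins_identity_policy("honkfarts"), indent=2))
```

The size problem moves out of the CI role entirely: its policy stays one statement long, and each target account carries its own small trust policy.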

# ? Mar 9, 2023 01:01

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Are there any better LSM controllers for Kubernetes than KubeArmor right now? I have a need to restrict some behaviors of a badly-behaved vendor application, and the deny/allow policy behaviors of KubeArmor are killing me. I'd like something where I can write policy into K8s somewhere, but I'll compile eBPF bytecode if I have to.

# ? Mar 9, 2023 02:37

madmatt112: Jul 11, 2016; Is that a cat in your pants, or are you just a lonely excuse for an adult?

Vulture Culture posted:

... I'll compile eBPF bytecode if I have to.

What a flex

# ? Mar 9, 2023 03:01

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

madmatt112 posted:

What a flex

Unfortunately, there's no way for me to defend this by saying "I was writing dtrace probes 15 years ago" without it also sounding like a flex, so I think all I can say is "this tech all sounds a lot harder than it is"

# ? Mar 9, 2023 03:41

madmatt112: Jul 11, 2016; Is that a cat in your pants, or are you just a lonely excuse for an adult?

Vulture Culture posted:

Unfortunately, there's no way for me to defend this by saying "I was writing dtrace probes 15 years ago" without it also sounding like a flex, so I think all I can say is "this tech all sounds a lot harder than it is"

Bro come on now I know you IRL whether you're aware or not and that's all I'll say about that lmao. You are an outlier on the high end of the bell curve, but you still have a point that it all looks much more intimidating from the outside.

# ? Mar 9, 2023 05:35

drunk mutt: Jul 5, 2011; I just think they're neat

It gets real fun when you get hired in as a "junior" with 20+ years with the exact technologies under your belt.

It shouldn't happen, but hey sometimes someone just likes playing the miserable startup life for a majority of their life and it just doesn't look like they have experience on paper.

But seriously, there are kids way smarter than me these days on aspects that I give absolutely no poo poo about...Like I do not give a gently caress about your preferred javascript library framework thingy...I will absolutely, without doubt, stand my ground and say that I will not touch JS. My hatred towards this has blinded me, but at the same time, I accept that blindness towards the technology and delegate all of that poo poo to a person I trust can communicate that world to mine.

I guess a lot of words to say, there are a lot of us around that were doing poo poo above and beyond expectations of the modern day, but the only thing that makes us different is this era gets by with doing WAY loving LESS and it's loving awesome.

# ? Mar 9, 2023 06:07

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

madmatt112 posted:

Bro come on now I know you IRL whether you're aware or not and that's all I'll say about that lmao. You are an outlier on the high end of the bell curve, but you still have a point that it all looks much more intimidating from the outside.

Not particularly accomplished or anything but as someone that works with folks writing eBPF working in the domain of security and o11y for K8S including one of the kernel contributors a lot of these supposedly fancy things sound intimidating mostly because people simply aren't that familiar with it. Like how iteration got supported is kinda bonkers but honestly it's not the worst thing I've seen anyone do either (it's not that different to me than doing a validator with ye olde funrollloops). There's a lot of hard, exasperatingly stupid work necessary and lots of sharp edges to deal with bleeding edge stuff that kind of takes away from the glamour of it all though IMO. You would think most of us experienced folks wouldn't romanticize all this newfangled stuff much but I have the suspicion it's a combination of the hype-makers and less experienced engineers that aren't jaded enough that they might fall for hype once in a while.

# ? Mar 9, 2023 07:37

jaegerx: Sep 10, 2012; Maybe this post will get me on your ignore list!

necrobobsledder posted:

Not particularly accomplished or anything but as someone that works with folks writing eBPF working in the domain of security and o11y for K8S including one of the kernel contributors a lot of these supposedly fancy things sound intimidating mostly because people simply aren't that familiar with it. Like how iteration got supported is kinda bonkers but honestly it's not the worst thing I've seen anyone do either (it's not that different to me than doing a validator with ye olde funrollloops). There's a lot of hard, exasperatingly stupid work necessary and lots of sharp edges to deal with bleeding edge stuff that kind of takes away from the glamour of it all though IMO. You would think most of us experienced folks wouldn't romanticize all this newfangled stuff much but I have the suspicion it's a combination of the hype-makers and less experienced engineers that aren't jaded enough that they might fall for hype once in a while.

Oh. I work with ebpf. Are you a bee?

# ? Mar 9, 2023 07:50
