Continuous Integration/build engineering/devops thread


12 rats tied together: Sep 7, 2006

the problems with yaml, including every problem from that article, are best solved by using a version of the yaml spec that is newer than the one from 2006

# ? Dec 17, 2023 20:51


The Iron Rose: May 12, 2012; Cat Army

also by quoting strings

I will say my ugliest helm repo is for cert manager because I had to integrate existing deployments over ~50 clusters and ~50 repos with completely different naming schemes into a service-specific repo. This means I have like 8 environment variables per cluster to account for all the backwards compatibility issues, but sprig templating works well despite being ugly as hell.

For dev teams, having a cluster-oriented repo for deploying third party tooling is not a bad idea, but when you're a platform team it doesn't scale at all. Moving to service-specific repos is easily the biggest improvement we've made to code quality and consistent service delivery, with semantic versioning of our releases (and helm charts, libraries, etc.) coming up second.

I also really can't recommend helmfile (https://github.com/helmfile/helmfile) highly enough for orchestrating helm releases. Huge huge improvement over writing a bunch of helm apply commands in your CI pipelines.
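To make the "declare the releases once, generate the commands" idea concrete, here is a rough sketch. It is not how helmfile is implemented; the release list, chart names, and values paths are made up for illustration, and it only prints `helm upgrade --install` command lines rather than running them.

```python
import shlex

# Hypothetical releases; names, namespaces, and paths are illustrative only.
RELEASES = [
    {"name": "cert-manager", "chart": "jetstack/cert-manager",
     "namespace": "cert-manager", "values": ["values/cert-manager.yaml"]},
    {"name": "ingress", "chart": "ingress-nginx/ingress-nginx",
     "namespace": "ingress", "values": []},
]


def helm_command(release: dict) -> list:
    """Build one `helm upgrade --install` invocation for a release entry."""
    cmd = ["helm", "upgrade", "--install", release["name"], release["chart"],
           "--namespace", release["namespace"], "--create-namespace"]
    for path in release["values"]:
        cmd += ["--values", path]
    return cmd


# Print the commands a pipeline would otherwise have to spell out by hand.
for release in RELEASES:
    print(shlex.join(helm_command(release)))
```

The point is only that the list of releases becomes data, so adding a cluster or a chart is a one-line change instead of another hand-written pipeline step.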

# ? Dec 17, 2023 20:56

Falcon2001: Oct 10, 2004; Eat your hamburgers, Apollo.; Pillbug

12 rats tied together posted:

the problems with yaml, including every problem from that article, are best solved by using a version of the yaml spec that is newer than the one from 2006

True, but as it points out, PyYAML, the most common Python YAML library (which I use because I had never heard of this problem before!) still uses the 1.1 spec and therefore has a lot of the weird problems. This is probably not uncommon for other languages too.
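For anyone who hasn't hit this yet, a small stdlib-only sketch of the difference. The regexes are hand-transcribed approximations of the YAML 1.1 implicit typing rules (as I understand PyYAML's resolver to apply them) and of the YAML 1.2 core schema booleans; the function name `surprises` is made up. It does not call any YAML library.

```python
import re

# Unquoted scalars that YAML 1.1 style resolvers treat as booleans.
YAML_11_BOOL = re.compile(
    r"^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
# The YAML 1.2 core schema only keeps the true/false spellings.
YAML_12_BOOL = re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$")

# YAML 1.1 also reads colon-separated digits as base-60 integers.
YAML_11_SEXAGESIMAL = re.compile(r"^[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+$")


def surprises(scalar: str) -> list:
    """Return the ways an unquoted scalar changes type under YAML 1.1 only."""
    found = []
    if YAML_11_BOOL.match(scalar) and not YAML_12_BOOL.match(scalar):
        found.append("bool in 1.1, string in 1.2")
    if YAML_11_SEXAGESIMAL.match(scalar):
        found.append("base-60 int in 1.1, string in 1.2")
    return found


for value in ["no", "on", "true", "22:22", "hello"]:
    print(f"{value!r}: {surprises(value) or 'same in both'}")
```

Quoting the value (`"no"`, `"22:22"`) sidesteps both cases regardless of which spec version the parser follows.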

# ? Dec 17, 2023 22:23

vanity slug: Jul 20, 2010

ruamel.yaml does yaml 1.2 and has its own host of issues (like using sourceforge in tyool 2023)

# ? Dec 17, 2023 22:39

12 rats tied together: Sep 7, 2006

yep. if you're on python i recommend switching to ruamel which supports yaml 1.2 (from 2009), which suffers from 0 of the problems in "the yaml document from hell" article to my recollection

it also supports versions of the specification from this current decade as well. but even going to 1.2 is a huge upgrade for the average user of yaml

if you're on a language with a bad selection of yaml libraries (c#, go) i recommend using xml instead

# ? Dec 17, 2023 22:43

The Fool: Oct 16, 2003

I wouldn't use xml if it needs to be human readable

# ? Dec 17, 2023 22:52

Hadlock: Nov 9, 2004

xml has its place but uh i can't think of a polite way to say this but i don't think i've ever heard it suggested to be used for IaC before, and I certainly wouldn't use it for that purpose. maybe i misread your post

yaml works well for me for two reasons

1. it's a widely accepted, json-compatible global standard
2. it's human readable, particularly at 3am when production is down and you're squinting at a monitor in the dark

# ? Dec 17, 2023 22:59

Falcon2001: Oct 10, 2004; Eat your hamburgers, Apollo.; Pillbug

Yeah I'd be way more likely to use JSON over xml if it's supposed to be human readable, just format it properly and it's at least fine.
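"Format it properly" can be as little as this, assuming Python's standard `json` module; the config content is made up.

```python
import json

# A compact, machine-written document (hypothetical content).
compact = '{"replicas":3,"image":"nginx:1.25","ports":[80,443]}'

# Re-serialize with indentation and stable key order so diffs stay small.
pretty = json.dumps(json.loads(compact), indent=2, sort_keys=True)
print(pretty)
```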

# ? Dec 17, 2023 23:07

George Wright: Nov 20, 2005

I strongly prefer JSONx over XML, JSON, and YAML.

# ? Dec 17, 2023 23:26

12 rats tied together: Sep 7, 2006

JSON doesn't have integers so I would not use it for infrastructure as code.
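A quick sketch of what that means in practice. JSON only defines a "number" type, so what happens to a big integer depends on the parser. Python's `json` keeps it exact; the second call below simulates a parser that stores every number as a double by passing `parse_int=float`. The key name is made up.

```python
import json

raw = '{"max_connections": 9007199254740993}'  # 2**53 + 1

# Python's json module keeps integer literals as arbitrary-precision ints.
as_python = json.loads(raw)["max_connections"]

# Simulate a parser that stores every JSON number as an IEEE 754 double.
as_double = json.loads(raw, parse_int=float)["max_connections"]

print(as_python)
print(int(as_double))  # no longer the value that was written
```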

XML is very humanly readable, simply parse it in a good language first (python), do whatever you need to it, and then serialize it when you're done. :cheers:
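A minimal sketch of that workflow using the standard library's `xml.etree.ElementTree`. The document, element names, and attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory document; element and attribute names are made up.
source = """
<fleet>
  <host name="web-1" role="web" size="small"/>
  <host name="db-1" role="db" size="large"/>
</fleet>
""".strip()

root = ET.fromstring(source)

# Work with plain Python values instead of reading angle brackets.
for host in root.iter("host"):
    if host.get("role") == "web":
        host.set("size", "medium")

sizes = {host.get("name"): host.get("size") for host in root.iter("host")}
print(sizes)

# Serialize back to XML only at the very end.
print(ET.tostring(root, encoding="unicode"))
```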

# ? Dec 17, 2023 23:44

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Terraform fundamentally doesn't work as a deployer for Kubernetes applications because the Terraform and Kubernetes models are actually, in a great number of circumstances, completely incompatible.

A great example involves CRDs. If you're going to install a software release, you're going to want those CRDs installed too. If you uninstall the release, say to test an upgrade or troubleshoot an issue, in most cases you're going to leave the CRDs alone. In these cases, Terraform really dislikes the "ignore this on destroy" mindset. There are other cases where you do want Terraform to remove the CRDs. In these cases, Terraform wants to uninstall the Kubernetes resources in the opposite order from how they were created, which will probably also be incorrect: it will try to uninstall the controllers, then uninstall the CRDs; this will trigger a cascade delete of every extant resource of that type on the apiserver, and all of those deletes will time out because you've already removed the controller that's supposed to handle them.
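The ordering problem can be sketched in a few lines. This is a toy model, not Terraform's actual graph code: the resource names are hypothetical, and "destroy is the reverse of create" is the only behavior being illustrated.

```python
from graphlib import TopologicalSorter

# Each key depends on the resources in its set (hypothetical names).
depends_on = {
    "crds": set(),
    "controller": {"crds"},
}

create_order = list(TopologicalSorter(depends_on).static_order())
destroy_order = list(reversed(create_order))

print("create: ", create_order)
print("destroy:", destroy_order)

# Custom resources made by users after install; the IaC tool never saw them.
live_custom_resources = ["certificate/a", "certificate/b"]

removed = set()
stuck = []
for resource in destroy_order:
    if resource == "crds" and "controller" in removed:
        # Deleting the CRD cascades to every instance, but nothing is left
        # running to handle those deletes.
        stuck = list(live_custom_resources)
    removed.add(resource)

print("deletes left waiting on a missing controller:", stuck)
```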

Externalize your deployments. It doesn't matter if you're using Helm or wrapping Argo or Flux or whatever, and you can have those configurations managed in Terraform if you want. But in a model where one Kubernetes resource is represented by one Terraform resource, Terraform's lifecycle is just not capable of adding more value than it subtracts.

# ? Dec 18, 2023 15:28

22 Eargesplitten: Oct 10, 2010

Hadlock posted:

Edit: oh it's "you have to have already worked on a SAG film to get hired for future SAG films"

This seems to be a good comparison for a lot of IT stuff. For example, my most recent role was a junior Linux admin. At the job before that, I started as a NOC technician but my manager knew I was overqualified and pushed for me to shadow whichever department I wanted, I picked the Unix team. Then I got promoted to a role that was partially Linux partially Windows and learned enough Linux to get into the junior Linux admin role.

# ? Dec 18, 2023 16:37

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Does anyone have a Kubernetes workflow for continuous delivery onto ephemeral clusters (ex. PR/MR environments) that they're happy with? Argo and Flux both seem to suck badly enough at this because of GitOps Opinions that I'm not clear on whether or to what extent they add actual value.

# ? Dec 18, 2023 19:41

Hadlock: Nov 9, 2004

At my last job we allowed ephemeral environments of the monolith, and they were updated using flux and it was generally really reliable. The downside was that those environments managed their flux state inside the same repo so every push to that repo caused flux to poo poo an extra commit to the same branch and I forget how we repaired the "ephemeral environment flux state" folder when it got merged back to master. I did not like this system

# ? Dec 18, 2023 19:45

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Hadlock posted:

At my last job we allowed ephemeral environments of the monolith, and they were updated using flux and it was generally really reliable. The downside was that those environments managed their flux state inside the same repo so every push to that repo caused flux to poo poo an extra commit to the same branch and I forget how we repaired the "ephemeral environment flux state" folder when it got merged back to master. I did not like this system

To an outsider, this certainly looks like the main area of downsides as I'm considering switching to something like Flux for deploying K8s controllers/operators. It seems like reconciling the Git repository state the way that it wants you to carries way more overhead than reconciling the K8s deployments from some generic source of truth.

# ? Dec 18, 2023 19:50

Bruegels Fuckbooks: Sep 14, 2004; Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.

12 rats tied together posted:

yep. if you're on python i recommend switching to ruamel which supports yaml 1.2 (from 2009), which suffers from 0 of the problems in "the yaml document from hell" article to my recollection

it also supports versions of the specification from this current decade as well. but even going to 1.2 is a huge upgrade for the average user of yaml

if you're on a language with a bad selection of yaml libraries (c#, go) i recommend using xml instead

Shut up I'm trying to get my company to start using yaml and stop using loving powershell for iac, I already lived through the xml era and don't want to go back

# ? Dec 18, 2023 20:29

The Fool: Oct 16, 2003

like, are they writing custom powershell scripts to deploy infrastructure

or are you just talking about dsc and bicep/arm?

the latter is dumb, but the former is insane

# ? Dec 18, 2023 20:44

FISHMANPET: Mar 3, 2007; Sweet 'N Sour Can't Melt Steel Beams

I've written a poo poo ton of PowerShell code to provision infrastructure but I think it's really a stretch to call that "Infrastructure as Code".

# ? Dec 18, 2023 20:44

Hadlock: Nov 9, 2004

Bruegels Fuckbooks posted:

DevOps Thread: I already lived through the XML era and don't want to go back

New thread title

# ? Dec 18, 2023 22:14

Hadlock: Nov 9, 2004

Arguing with my non-ops minded VP today about why we should have a dedicated git account for automation purposes and not just use his personal account token everywhere

# ? Dec 18, 2023 22:16

The Fool: Oct 16, 2003

should have a few imo

# ? Dec 18, 2023 22:56

Bruegels Fuckbooks: Sep 14, 2004; Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.

The Fool posted:

like, are they writing custom powershell scripts to deploy infrastructure

or are you just talking about dsc and bicep/arm?

the latter is dumb, but the former is insane

Former

# ? Dec 18, 2023 23:43

Dukes Mayo Clinic: Aug 31, 2009

if i never again invoke xmlstarlet it will be too soon.

# ? Dec 19, 2023 18:31

i am a moron: Nov 12, 2020; "I think if there’s one thing we can all agree on it’s that Penn State and Michigan both suck and are garbage and it’s hilarious Michigan fans are freaking out thinking this is their natty window when they can’t even beat a B12 team in the playoffs lmao"

The Fool posted:

like, are they writing custom powershell scripts to deploy infrastructure

or are you just talking about dsc and bicep/arm?

the latter is dumb, but the former is insane

Ngl the latter and former in this case are both dumb af, currently consulting with a company using bicep and it is an absolute joke of Microsoft proportions.

# ? Dec 19, 2023 20:12

12 rats tied together: Sep 7, 2006

bicep seemed fine when i was looking at it but it's definitely one of those "who asked for this" types of projects.

i would prefer that they invest in actually documenting ARM instead of, when I ask what happens if I change a property of an ARM resource, forwarding my email to the engineer who implemented the resource :confused:

# ? Dec 19, 2023 20:20

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

12 rats tied together posted:

bicep seemed fine when i was looking at it but it's definitely one of those "who asked for this" types of projects.

i would prefer that they invest in actually documenting ARM instead of, when I ask what happens if I change a property of an ARM resource, forwarding my email to the engineer who implemented the resource

This reminds me of the time I filed a HashiCorp support ticket, got on a Zoom, and started complaining about a feature that was implemented badly, only to find out a minute later I was talking to the engineer who wrote the code

# ? Dec 19, 2023 21:56

The Fool: Oct 16, 2003

I've done that intentionally with both the engineer and the product manager

# ? Dec 19, 2023 22:01

TheBlackVegetable: Oct 29, 2006

Hadlock posted:

Arguing with my non-ops minded VP today about why we should have a dedicated git account for automation purposes and not just use his personal account token everywhere

Do you mean GitHub account? Because I think you're supposed to use the repository Deploy Key for that.

# ? Dec 19, 2023 22:14

Hadlock: Nov 9, 2004

TheBlackVegetable posted:

Do you mean GitHub account? Because I think you're supposed to use the repository Deploy Key for that.

Then you have to manage a key per repo, and you have to manually go generate a new one every time a repo is created/re-created

:suicide:

Fast forward 3 months from now:

Developer: hey I'm getting access denied when I try and setup this declarative ci/cd thing for < greenfield >
Me: oh yeah you need to go into the GitHub UI and generate a deploy key, then save it in the shared secrets manager, then reference it wherever you need to
Developer: oh but this is going to get automated? How do I login to the shared secrets thing again
Me: no you'll need to do this every time
Developer: is there documentation? This seems needlessly repetitive
Me: no just Google it
Developer + their manager: what the gently caress do we pay you for

Or with dedicated GitHub user:

Developer: I setup my thing today using the stored credential
Me: yay IaC
Developer: let me buy you a beer
Me: :guinness:

Hadlock fucked around with this message at 22:29 on Dec 19, 2023

# ? Dec 19, 2023 22:26

The Iron Rose: May 12, 2012; Cat Army

12 rats tied together posted:

bicep seemed fine when i was looking at it but it's definitely one of those "who asked for this" types of projects.

i would prefer that they invest in actually documenting ARM instead of, when I ask what happens if I change a property of an ARM resource, forwarding my email to the engineer who implemented the resource

I don't mind bicep, but mostly because I just increasingly dislike terraform. Statelessness is bliss.

# ? Dec 20, 2023 02:54

12 rats tied together: Sep 7, 2006

Bicep is just "they hold the state", to my recollection, right? Like cloudformation except instead of having you type yaml you type a DSL?

My usual bar for these things is I google "<name of tool>" + "AWS::NoValue" to get a gauge on whether anybody has actually used it to manage any infrastructure of note. It looks like Bicep finally started supporting conditionally passing nulls this year, in september! Huge congrats to the azure team responsible for this feature.

AWS::NoValue has existed since the year 2010.
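For reference, this is the shape of the feature being talked about, sketched as a Python dict that gets dumped to CloudFormation JSON. `Fn::If` and the `AWS::NoValue` pseudo parameter are real CloudFormation constructs; the parameter, condition, and resource names are made up, and the fragment is deliberately incomplete (a real DB instance needs more properties).

```python
import json


def db_instance(condition_name: str) -> dict:
    """Build a resource whose snapshot property is only set conditionally."""
    return {
        "Type": "AWS::RDS::DBInstance",
        "Properties": {
            "DBInstanceClass": "db.t3.micro",
            "DBSnapshotIdentifier": {
                "Fn::If": [
                    condition_name,
                    {"Ref": "SnapshotName"},
                    # Tells CloudFormation to omit the property entirely.
                    {"Ref": "AWS::NoValue"},
                ]
            },
        },
    }


template = {
    "Parameters": {"SnapshotName": {"Type": "String", "Default": ""}},
    "Conditions": {
        "UseSnapshot": {"Fn::Not": [{"Fn::Equals": [{"Ref": "SnapshotName"}, ""]}]}
    },
    "Resources": {"Database": db_instance("UseSnapshot")},
}

print(json.dumps(template, indent=2))
```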

# ? Dec 20, 2023 03:19

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

I think a lot about what "better than Terraform" looks like, and I've decided that one day we're going to blink and some event-driven enterprise SaaS orchestration system will have abruptly taken over everything we used to use infrastructure-as-code tools for

The distance between streaming ETL and Terraform's gaps is not nearly as big as people imagine it is

The second anyone figures out how close together the modern Identity and Access Management and ephemeral infrastructure problem spaces are, it's also going to cause a big disruption

Vulture Culture fucked around with this message at 20:11 on Dec 20, 2023

# ? Dec 20, 2023 20:08

12 rats tied together: Sep 7, 2006

Agreed and I think if you go into my post history ITT and scroll past the walls of text where I lose my mind about how bad terraform is, you will find me also pining for something like this.

I have a small handful of things at work running on top of ansible-rulebook and I liked it well enough to try and build something "real" with it, which is currently in the planning stages.

The documentation is pretty bad at the moment. I'm comfortable enough with the domain and the codebase that if redhat drops the project, I can support it as-currently-written while we rebuild parts of it onto a less insane technology stack.

I would not suggest taking it on as a keystone type dependency unless you feel the same way.

# ? Dec 20, 2023 20:39

The Fool: Oct 16, 2003

i'm just going to leave this here

https://radapp.io/

and back away slowly

# ? Dec 20, 2023 21:20

i am a moron: Nov 12, 2020; "I think if there’s one thing we can all agree on it’s that Penn State and Michigan both suck and are garbage and it’s hilarious Michigan fans are freaking out thinking this is their natty window when they can’t even beat a B12 team in the playoffs lmao"

I feel like there is very much a point of diminishing returns for provisioning solutions that exists for 99% of infrastructure out there that is being spun up with TF today. I got one text file that describes my infra, another one that records what it should look like currently down to a very nice level, and an app that helps me manage the difference between them. It's revelatory for a lot of orgs in that it adds visibility and repeatability that used to be harder to come by, but anything beyond that is kinda like okay... why? How much faster is deploying my app going to get when I can cut my infra provisioning times down as far as I already have? My company is trying to consult in the "better than TF" space and it's met with a lot of shrugging when you start talking to the eng teams.
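The "two text files and an app that manages the difference" model fits in a few lines. This is a toy sketch, not how Terraform computes a plan; the resource names and attributes are made up.

```python
def plan(desired: dict, recorded: dict) -> dict:
    """Compare desired config with recorded state and list the actions needed."""
    return {
        "create": sorted(desired.keys() - recorded.keys()),
        "delete": sorted(recorded.keys() - desired.keys()),
        "update": sorted(
            name for name in desired.keys() & recorded.keys()
            if desired[name] != recorded[name]
        ),
    }


# File one: what the infra should be. File two: what was last recorded.
desired = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "web": {"size": "large"},
    "queue": {"fifo": True},
}
recorded = {
    "vpc": {"cidr": "10.0.0.0/16"},
    "web": {"size": "small"},
    "cache": {"nodes": 1},
}

print(plan(desired, recorded))
```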

I really doubt people are going to pay for a SaaS solution for this stuff. They won't do it now. Even selling bolt-ons that make the ecosystem easier to manage doesn't really seem that fruitful IME. TF is cheap and easy and adds value and I can shove it into my DevOps or GH workflows and it barely costs anything.

# ? Dec 20, 2023 21:28

Hadlock: Nov 9, 2004

i am a moron posted:

but anything beyond that is kinda like okay... why? How much faster is deploying my app going to get when I can cut my infra provisioning times down as far as I already have? My company is trying to consult in the "better than TF" space and it's met with a lot of shrugging when you start talking to the eng teams.

I had lunch with an engineer at Twilio one day and he said something that really stuck with me. He gave me a story about doing something fulfilling like walking his dog after work.

He said something like

Twilio SRE posted:

if I'm going to have to learn something new, it ought to increase my free time by an order of magnitude, in order to get back the time I spent learning the new thing. If your thing isn't going to get me an order of magnitude of extra time back for effort put in, you're never going to get buy in. Show me how I'm going to get 30 minutes a day of extra time playing with my dog or kids, every day, not just once a week, by implementing this new technology, and I'll sign up right away. If not, don't bother me

Probably the best lesson I've ever had for introducing new technology, arguing the value proposition, and knowing to let go and not try and win every battle

Kubernetes + AWS + helm allows you to eliminate most of IT, network engineers, junior Linux admins, hardware purchasing planning/scheduling etc etc it's an easy sell to a legacy brick and mortar operation. It's that "order of magnitude" improvement

Terraform is marginally better than ansible. The argument for upgrading from ansible is a really loving hard sell. Terraform shops in general are better than ansible shops but only because they started post 2015 or more realistically post 2017.

Maybe some day someone will invent a ChatGPT where you say "setup a complete cicd for this repo", hand it your credit card, and it'll just do it, and everyone will move to that

Right now the only problem I have with terraform is that there's not a "Ruby on Rails" or Django style of highly opinionated infrastructure best practices and how to organize everything

i am a moron posted:

My company is trying to consult in the "better than TF" space

Good luck with that.

# ? Dec 20, 2023 21:50

i am a moron: Nov 12, 2020; "I think if there’s one thing we can all agree on it’s that Penn State and Michigan both suck and are garbage and it’s hilarious Michigan fans are freaking out thinking this is their natty window when they can’t even beat a B12 team in the playoffs lmao"

Hadlock posted:

Kubernetes + AWS + helm allows you to eliminate most of IT, network engineers, junior Linux admins, hardware purchasing planning/scheduling etc etc it's an easy sell to a legacy brick and mortar operation. It's that "order of magnitude" improvement

There are very large organizations using TF that also have fossilized stacks that can't move to this at all (SAP, ERPs, huge #'s of COTS deployments), but if it's feasible I totally agree. Maybe not on that specific stack, but anything close to it is so goddamn easy and moving back to anything else hurts (ask me how I know :cry:)

quote:

Good luck with that.

We're always trying to suss out the new stuff, and so far it mostly seems like recycled ideas and possibly cool but ultimately not that much better bullshit like Pulumi.

# ? Dec 20, 2023 23:52

12 rats tied together: Sep 7, 2006

The main sell for Pulumi is:

- Cross language functionality so nobody has to learn HCL or how to contort their infrastructure into json-or-worse. Everyone just has to learn the provider(s), which they had to do anyway.

- Automation API is a tighter integration between procedures and collections of resources.

It's like if you had your own terraform cloud except it was just a bash command that kicked off a rolling update that safely deploys a bunch of changes in sequence. Since it's written using a programming language it can also export prometheus metrics or push logs/events/slack messages, it can check safety valve metrics between each terraform stack it mutates, you could even spin up a temporary web server and do something like "react with stop sign emoji to pause this deployment" in slack.
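A sketch of that control flow, with no Pulumi imports at all. The `deploy`, `healthy`, and `notify` callables are hypothetical stand-ins for whatever the Automation API, your metrics backend, and your chat integration would provide; only the sequencing and the safety valve check between stacks are being shown.

```python
from typing import Callable, Iterable


def rolling_deploy(
    stacks: Iterable[str],
    deploy: Callable[[str], None],
    healthy: Callable[[], bool],
    notify: Callable[[str], None] = print,
) -> list:
    """Deploy stacks in order, checking a safety valve after each one."""
    done = []
    for stack in stacks:
        notify(f"deploying {stack}")
        deploy(stack)
        done.append(stack)
        if not healthy():
            notify(f"safety valve tripped after {stack}; stopping")
            break
    return done


# Stand-ins so the sketch runs on its own: the second stack breaks the metric.
deployed = []
result = rolling_deploy(
    ["us-east", "us-west", "eu-west"],
    deploy=deployed.append,
    healthy=lambda: "us-west" not in deployed,
)
print(result)
```

Because the loop is ordinary code, swapping `notify` for something that posts to chat or emits metrics is just another function argument.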

It's hugely better, it's basically a new paradigm. Instead of needing to use ansible to drive your terraform you have a tool that does both natively, and it doesn't suffer from ansible being a curses application with a bunch of nasty state.

# ? Dec 21, 2023 00:04

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

i am a moron posted:

I feel like there is very much a point of diminishing returns for provisioning solutions that exists for 99% of infrastructure out there that is being spun up with TF today. I got one text file that describes my infra, another one that records what it should look like currently down to a very nice level, and an app that helps me manage the difference between them. It's revelatory for a lot of orgs in that it adds visibility and repeatability that used to be harder to come by, but anything beyond that is kinda like okay... why? How much faster is deploying my app going to get when I can cut my infra provisioning times down as far as I already have? My company is trying to consult in the "better than TF" space and it's met with a lot of shrugging when you start talking to the eng teams.

I really doubt people are going to pay for a SaaS solution for this stuff. They won't do it now. Even selling bolt-ons that make the ecosystem easier to manage doesn't really seem that fruitful IME. TF is cheap and easy and adds value and I can shove it into my DevOps or GH workflows and it barely costs anything.

These tools are all fine for small orgs where you have an infra team or whatever. Once you have people looking to solve problems by injecting behaviors into each and every thing in the business, even something as small as a tag on a bunch of AWS resources, legacy configuration-as-code approaches slow you down at least as much as they speed you up

# ? Dec 21, 2023 00:18


i am a moron: Nov 12, 2020; "I think if there’s one thing we can all agree on it’s that Penn State and Michigan both suck and are garbage and it’s hilarious Michigan fans are freaking out thinking this is their natty window when they can’t even beat a B12 team in the playoffs lmao"

I don't work with small companies and I don't understand your point here:

quote:

Once you have people looking to solve problems by injecting behaviors into each and every thing in the business, even something as small as a tag on a bunch of AWS resources, legacy configuration-as-code approaches slow you down at least as much as they speed you up

You can just reference the same input variables over and over or set up guardrails to ensure the platform tags things anyways. Seems like a really specific and nitpicky thing. And if you need to tag something, how is your SaaS product going to figure that out without human input anyways?
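Roughly what those two options look like, as a language-neutral sketch in Python rather than real Terraform or cloud policy. The tag names and values are made up.

```python
COMMON_TAGS = {"team": "platform", "cost-center": "1234"}  # hypothetical values
REQUIRED_TAGS = {"team", "cost-center"}


def with_common_tags(resource: dict) -> dict:
    """Return a copy of the resource with the shared tags merged in."""
    merged = {**COMMON_TAGS, **resource.get("tags", {})}
    return {**resource, "tags": merged}


def missing_tags(resource: dict) -> set:
    """Guardrail: report required tags that a resource lacks."""
    return REQUIRED_TAGS - set(resource.get("tags", {}))


bucket = {"type": "bucket", "name": "logs", "tags": {"env": "prod"}}
tagged = with_common_tags(bucket)

print(tagged["tags"])
print(missing_tags(bucket), missing_tags(tagged))
```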

# ? Dec 21, 2023 00:24
