Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
IMO the primary benefit IS reproducibility rather than productivity; clicking is cancer. 100% agreed on the rest though.

I spent the entirety of 2019 trying to make extensible terraform modules for consumption by other groups, as described above, and it went exactly how you'd expect.

Bhodi fucked around with this message at 03:58 on Feb 2, 2021

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
Terraform is very bad at anything at the OS layer, so no, it's not the tool I'd use to deploy apps. I've had the most luck using terraform to stuff cloud-init with a bootstrap/install script of some kind, but SSH scripts are, in general, a bad idea during a terraform run. They're very, very brittle, because terraform has almost no error handling or ability to detect and recover from issues. Even the basic AWS delay in assigning a dynamic IP after an instance is provisioned can cause runs to fail, as can any connectivity issue between the host you're running terraform on and the new instance. It's such a problem that there's a giant "Provisioners are a Last Resort" banner in the terraform documentation trying to warn you away from this approach.
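
For reference, a minimal sketch of the cloud-init approach, assuming AWS; every name and URL here is illustrative, not from anyone's actual setup:

code:
# Hypothetical example: stuffing a bootstrap script into cloud-init via
# user_data instead of using an SSH provisioner. All names are made up.
variable "ami_id" { type = string }

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.small"

  # cloud-init runs this on first boot; terraform's job ends at provisioning,
  # so a slow IP assignment or a flaky SSH path can't fail the terraform run.
  user_data = <<-EOT
    #!/bin/bash
    set -euo pipefail
    curl -fsSL https://mirror.internal.example/bootstrap.sh -o /tmp/bootstrap.sh
    bash /tmp/bootstrap.sh
  EOT
}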

Because of this limitation, terraform's not a good tool for orchestrating software provisioning or updates - you're going to want to use something else. Once something is "provisioned", the expectation is that it doesn't change, which doesn't mesh with software updates unless you redeploy the entire thing from scratch, and terraform can't do that well either, since there's no good mechanism for rolling updates/redeploys. There are a few blogs that talk about approaches that get close, but those are more hacks that require manual steps (rather than something consistent, repeatable, and able to be put into an automated pipeline). Look elsewhere.
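
The closest thing terraform has built in is the create_before_destroy lifecycle flag, which gets you a crude replace-then-cutover rather than a real rolling update; a sketch, with illustrative names:

code:
# Hypothetical sketch: create_before_destroy builds the replacement before
# destroying the old resource. It approximates blue/green for one resource
# but is not a managed, health-checked rolling deploy.
variable "ami_id" { type = string }

resource "aws_instance" "app" {
  ami           = var.ami_id   # changing the AMI forces a full replacement
  instance_type = "t3.small"

  lifecycle {
    create_before_destroy = true
  }
}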

Bhodi fucked around with this message at 05:22 on Feb 2, 2021

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Vulture Culture posted:

My most hilarious abuse of nsenter so far has been prototyping an Ansible connection plugin so that Ansible can run containerized in a pull model and still manage its own host
This is the most galaxy brain thing I have ever heard of.

Container escape, but for justice!

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
I want someone to actually calculate how much it costs to run anti-virus on all our servers in terms of licensing and increased memory/disk/cpu usage and contrast that with any successfully detected viruses or worms.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
I'm not a super expert, but we (and many others, including SSL Labs and your customers' security scanning tools) follow NIST guidance on this; check out page 14. The tl;dr of that section is that to be up-to-date, modern, and as safe as possible, disable everything but the ciphers explicitly specified in the list on pages 16-18. But like Hadlock says, it might break some legacy things, because old clients are only going to support older, less secure protocols.

As to why Cloudflare allows older, less secure ciphers by default, the answer is "Cloudflare attempts to provide compatibility for as wide a range of user agents (browsers, API clients, etc.) as possible."

It sounds like you may need to strike a balance between supporting older, less secure customers and having more security-conscious customers be angry at you, though I can say that we've never lost customers by being "too secure" for their lovely IE 7 or whatever connections; we (politely) tell them to upgrade and point to our documentation that shows we only support a specific range of browsers. This kind of hardening isn't unreasonable, and the vast majority of connections should support the list of NIST-approved ciphers.
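
As a hypothetical example of what that hardening looks like mechanically when your edge is an AWS load balancer rather than Cloudflare: you pin the listener to one of AWS's stricter predefined TLS policies instead of the permissive default. The resource names below are made up (and assume the LB and target group are defined elsewhere); the ssl_policy value is a real AWS predefined policy that drops legacy protocols and ciphers.

code:
# Hypothetical sketch: restricting an ALB listener to TLS 1.2+/1.3 with a
# modern cipher list. Assumes aws_lb.edge and aws_lb_target_group.app exist.
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.edge.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"  # no legacy ciphers
  certificate_arn   = var.cert_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}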

Bhodi fucked around with this message at 16:50 on Oct 27, 2021

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
YMMV if you operate under PCI DSS, FedRAMP, or some other external ATO or security framework, which may mandate these sorts of things, but if that's the case, it should be coming from your security department or your own security tools, not from an external customer.

TBH I kinda disagree with it being a waste of time and just checking a box. Normally I advocate going with defaults, but security hardening is something everyone should be comfortable doing and something you should have already BEEN doing. Don't assume sane defaults when it comes to configuring external connections; a customer calling you out on it is a failure of your company's process, not just them being annoying.

Bhodi fucked around with this message at 16:59 on Oct 27, 2021

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

cum jabbar posted:

Anybody in here self-hosting their Terraform state? We work in a tough regulatory environment and basically need to keep it on our own servers. I'm not sure which storage is most ergonomic and safe.
It's not very large and it's just a plaintext file, so anything that supports file locking, that you back up appropriately, and that ideally has versioning will work fine.
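
One self-hosted option that checks the locking box natively, assuming you already run Postgres in-boundary, is terraform's pg backend; the connection string here is illustrative:

code:
# Hypothetical example: self-hosted terraform state in Postgres. The pg
# backend handles state locking itself; backups and versioning ride along
# with whatever you already do for the database.
terraform {
  backend "pg" {
    conn_str = "postgres://terraform@db.internal.example/terraform_state?sslmode=verify-full"
  }
}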

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
Last time I had to deal with that, we compromised with a local mirror that synced on a schedule to a staging area and went to our security team for scan/approval before going live (just symlinking to the staging directory). It made security and auditing happy because there were appropriate approvals, and the real security ended up being the ~1-2 week delay from live, which let us just not do the sync that week when something dumb hit the news. It didn't protect against the whole "this thing had been compromised for 6+ months" problem or npm dependency bloat, but it would catch the hacked-account uploads that get noticed and reverted quickly.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Blinkz0rz posted:

A 1-2 week delay from wanting to use a dependency to getting approval?

If so, that's nuts. Putting security as a primary blocker is fundamentally unscalable and only gives engineering orgs ammo to push back against better security measures.

That's why good security teams have increased focus on shifting security left to put that onus on tooling that developers work with directly like npm audit, snyk, etc. and adding gates in the CI/CD pipeline to prevent malicious packages from sneaking through.

The only downside is that it has to be implemented alongside decent endpoint visibility tools and a solid active response process in the event that a malicious package is identified on a dev's endpoint, but if you're putting up a local mirror hopefully you have enough opsec to have those tools and processes in place already.

Yeah, at least it was better than what came before, which was pushes only on quarterly releases or with emergency approval (two separate approvals, up to senior-director-level signoff). It was a compromise, and security inserting themselves as a gate on any code going into the isolated environment was a mandatory part of the process due to factors outside our control :smith:

only having a 1-2 week delay from the internet was considered a triumph

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Blinkz0rz posted:

You could run localstack on a server in your DC and make sure the disk is mirrored.

Reminds me. Remember openstack?

Lol, just lol. Flew too close to the overcomplex sun.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
Storing configuration? Yeah sure no problem just use a complex distributed messaging service with a bunch of moving parts like consul and hook it into everything, nothing will go wrong and this is a fantastic idea.

e: Alternatively, you can use salt, like RHN products do under the hood now, and get an underperforming multi-part eventual-consistency registration handshake powered by pixie dust and prayers.

Bhodi fucked around with this message at 16:10 on Jun 19, 2022

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Hadlock posted:

I legit can't tell if you're making fun of k8s and etcd or not
I mean, etcd is the most reliable and fastest of the bunch

FWIW

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
It's unlikely you're going to get cycles to re-architect at this late date, so the next best thing is to do some repo surgery as was suggested. In terms of work:reward you're probably looking at half a day's effort which is a pretty reasonable sell.

At least you can buy yourself some more time, as twerk says, by using bfg-repo-cleaner, git filter-repo, and git-sizer.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
That smells to me like deliberate obfuscation with the intention of selling proserv.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

New Yorp New Yorp posted:

Let's play Pattern or Anti-Pattern!

Something I see a lot is teams creating a module (either in Bicep or in Terraform) that wraps around a single resource. i.e. a module for an Azure App service plan. It just exposes a few properties of the app service plan as parameters.

I consider this an anti-pattern: A module should be a versioned unit of reuse that provides a template for a set of interrelated resources, not a thin wrapper around a single resource.

This is even worse when the single-resource modules aren't properly versioned, meaning all consumers are tightly coupled to a thin wrapper that is impossible to change because dozens of consumers need to be simultaneously updated otherwise their infrastructure all breaks unexpectedly.

This comes up a lot more in Bicep, because Bicep doesn't have the ability to split individual resources out into separate files. So people create a module just so they can have an "app_service.bicep" file. But I see it in Terraform too and it drives me crazy.
I think a lot of companies went this way (including mine) because there wasn't (maybe still isn't? I don't deal with it much anymore) a way of pulling in environment-specific variables to auto-populate resource fields for end users while also enforcing mandatory things like tags without that wrapper. You end up making a joe_web module because all joe webservers go in this subnet, which the app team has no idea about because it's abstracted from them; because adding 20 fields to every resource is messy, duplicates work, and is hard to keep updated; and because security/finance mandates everything gets these tags. So if you're a "frameworks" type group trying to provide structure to various app groups, unless you want to PR every single change by every group that uses terraform as well as "own" all the terraform files, modules are the only way to provide some structure and enforce standardization across those sister orgs.

With terraform's limitations, without the module wrapper layer there's very little abstraction, which just doesn't work very well in larger orgs unless you use some other tool to handle it, and down the path of templating out terraform files lies madness. You couldn't even use interpolation in variable declarations, which precluded importing some sort of universal standards config file you could manage.
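
For anyone who hasn't lived this, a sketch of the wrapper being described: joe_web is the example from the post above, and the subnet, tags, and AMI are placeholders I made up. The module pins the org-mandated bits so the app team only supplies what they actually care about.

code:
# Hypothetical "joe_web" wrapper module. The subnet and tags are the
# org-mandated parts the app team never sees; all values are placeholders.
variable "name" {
  type = string
}

variable "instance_type" {
  type    = string
  default = "t3.small"
}

locals {
  # security/finance-mandated tags, enforced once here instead of as
  # twenty duplicated fields in every consumer's resources
  mandatory_tags = {
    CostCenter = "web-1234"
    Owner      = "joe-team"
  }
}

resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0"    # placeholder baseline image
  instance_type = var.instance_type
  subnet_id     = "subnet-0123456789abcdef0" # "all joe webservers go here"

  tags = merge(local.mandatory_tags, { Name = var.name })
}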

Bhodi fucked around with this message at 22:11 on Dec 21, 2022

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Methanar posted:

That system kind of works and encourages teams to fix their terraform to include the tags otherwise they'll have constant state diffs reported by TF since the autotagger will just keep overwriting you for doing things wrong.
This is the lawful evil box of the devops D&D grid meme.

Force overwrite and let god other groups sort it out
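
For context on why the diff pressure works: terraform keeps trying to revert whatever the autotagger wrote, so teams either declare the tags properly or (the actual lawful evil counter-move) tell terraform to stop looking. A hypothetical sketch of both, with placeholder names:

code:
# Declaring the mandated tag yourself makes terraform and the autotagger
# agree, so the constant state diffs disappear.
resource "aws_s3_bucket" "data" {
  bucket = "example-team-data"
  tags = {
    CostCenter = "1234"
  }
}

# The counter-move: tell terraform to ignore tag drift entirely.
resource "aws_s3_bucket" "logs" {
  bucket = "example-team-logs"
  lifecycle {
    ignore_changes = [tags]
  }
}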

Bhodi fucked around with this message at 02:02 on Dec 22, 2022

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

xzzy posted:

My suspicion is Gitlab's last price increase caused enough people to jump ship to the community version that they gotta recoup that money somewhere.

I'm at a super low budget place and we were okay with paying but they priced us out a ways back. Obviously I got no data on their customers but if there's a measurable number of orgs like ours maybe Gitlab is feeling a pinch.
I haven't admined an instance in a hot minute but don't they gate federation/oauth behind enterprise licensing? Runners too, IIRC. You can live without runners but it'd be hard for a lot of people to live without federation of some kind.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
Yeah, that one component will be up while everything else in your entire office toolchain is broken

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
We just cleared up like 60TB of unattached azure storage from decomm'd hosts mid last year that no one had noticed was there. If you've got people counting dollars and looking at cost-effectiveness, you're doing alright.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
we asked them for YEARS to provide us a way of searching their issued ssl certificates by domain name rather than that crazy uuid issue number in the vault gui. they were absolutely not responsive to (paying) customer requests and getting features onto their roadmap.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
management uses that term exclusively and i hate it

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
“new and sexy” would not have described puppet or chef a decade ago, much less now

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
if you want me to adopt a new tool, offer something that doesn’t require me to switch my entire infra to a new managed source of truth that won’t interoperate with anything else and is so brittle that you have to go into disaster recovery if its internal model doesn’t match reality down to the decimal place.

every single new tool is always “this is amazing if you greenfield it from the ground up in a controlled environment”. yeah? when’s the last time you worked in a place that was willing to throw away / transition out of decades of developed business process?

I don’t need another finely tuned precision wondertool that jealously guards its domain, I need some contraption that runs on bunker oil and grit that brings modest benefit and runs everywhere with an eye to interoperation.

it’s rare that any dept can dictate the whole stack that a tool manages in anything but tiny companies, and there is rarely any consideration of this from the tool’s design perspective beyond “i guess we will allow it to ignore tags that have changed”

SSH being a ubiquitous channel into every server and ansible not requiring server agents are the two keys to its wide success compared to other, similar tools. It can be dropped into virtually any workplace and be immediately useful without onboarding agents or management servers.

Stop asking my hosts to register and be managed, no one wants hosts to be registered and managed by yet another system because there’s already a half-dozen separate agents brawling over /var/log file locks keeping systems “secure” and our performance team just changed the iops value on the database drives to prevent it from melting down while generating quarterly reports

Bhodi fucked around with this message at 13:56 on Dec 21, 2023

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
That’s hilarious. The terraform plan report is a masterclass in how not to relay useful/critical information. The apply claims another victim!

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
I’m 99% sure our policy of aggressively setting deletion protection flags has saved our infrastructure from rogue TF applies a number of times in the last year.
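
For the curious, "aggressively setting deletion protection flags" can happen at two layers, sketched here with placeholder names: terraform's own lifecycle guard, plus provider-side flags where they exist.

code:
# Hypothetical example. prevent_destroy makes any terraform plan that would
# destroy this resource fail outright, so a rogue apply dies at plan time.
resource "aws_s3_bucket" "tfstate" {
  bucket = "example-tfstate"

  lifecycle {
    prevent_destroy = true
  }
}
# Provider-side flags (e.g. deletion_protection on aws_db_instance) add a
# second guard that also blocks deletes from the console and CLI.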

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
without trying to be mean: a realistic look at your org’s willingness to adopt any of those policies, and at your personal resiliency, i.e. how hard and how long you’re willing to tilt at a windmill, and how you might take those docs and that training getting entirely ignored or actively resisted.

Bhodi fucked around with this message at 23:01 on Jan 19, 2024

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
To me it’s a sanitation problem. It transforms unsafe poop (say, plaintext passwords or root or admin access keys) into something that’s safe to touch and use. Which would be awesome, except you still need to get the poop in there, which requires touching the poop, updating the poop, and backing up the poop, and since it happily contaminates and gets into everything and is overall unpleasant to deal with, you can’t use the same tools and automation that you use to handle the safe stuff, so it has to be manual or have its own separate PPE systems, which themselves have to be managed, monitored, secured, and kept up to date. Oh, and you can’t isolate it, because by its very nature it needs to be globally accessible.

This ecosystem is by its nature isolated from the rest of your automation, is a self-contained system with well-defined boundaries (did the poop touch?), and doesn’t really need to interoperate, so it’s a naturally separate and isolated “problem space” ripe for a random product to come in and claim to “handle it” soup to nuts.

Bhodi fucked around with this message at 21:02 on Jan 22, 2024

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

fletcher posted:

This is the best analogy of it I've ever heard. Love it.
:yeshaha:

Vulture Culture posted:

Right, an application has some amount of bits where you don't want people with access to the source code to also have access to those bits. There's good use cases for sticking those things into data stores on some kind of need-to-know basis. What defines "secrets management" as distinct from handling the secrecy of any other production data (ex. PHI) enough to warrant the usage of separate products and platforms?
Scope and the ability to fully encapsulate it. You don't do provisioning with PII/PHI, and I've never seen a setup where architectural data like tags or network names or instance names was considered "in scope" for PII/PHI. PII/PHI can be encapsulated or isolated in a variety of ways up and down your stack, into specific systems/repos/APIs considered "in-boundary", so there's typically no chicken-and-egg or bootstrap problem with using those credentials: there is always greater access to encapsulate them with. But "secrets management" operates at a much more fundamental level, often the most fundamental level within your infra (most often provisioning of cross-boundary items such as VPCs, datastores, and various platform services and deployment tools), which fundamentally cannot be encapsulated.

Bhodi fucked around with this message at 22:41 on Jan 22, 2024

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Vulture Culture posted:

I had been thinking of a secrets store principally as some type of security tool, but hadn't considered its utility as a DR/BC or reliability tool. So I'll notch a point for secrets being centrally-managed rather than distributed, but I'm still unclear what, beyond just conceptual separation of duties, makes any of these tools better than files in object storage.
I won't say better, but like 12 rats says, it's a product space with a logical boundary that is similar if not identical across various companies, which makes it easy to develop and sell a product that fits the niche. If it's popular enough, it's a known quantity that auditors will expect, one that (when properly configured) has a baseline level of security competence and comes with its own inheritable controls/certifications/compliance for FIPS/FedRAMP/PCI. You can "roll your own" solution, but I know you just winced at that phrase, because everyone here knows how homegrown poo poo is inconsistent and gets hosed up in some subtle way, and now the onus is on you to convince your auditors that your process is just as good as an "industry established" one.

Or you can just buy cyberark, no one gets blasted for buying cyberark and you won't have to write a book of justification for it for external auditors who immediately break out the fine-toothed combs, magnifying glasses and guarded expressions.

If you don't have external auditors, though, the business case for just buying a solution is a lot less pressing.

Bhodi fucked around with this message at 16:48 on Jan 24, 2024

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
We know we're secure because we checked that "encrypt storage" checkbox in AWS

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
I'm a little confused but I hope this wasn't about my obvious single sentence shitpost/joke

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
Rundeck’s primary difference is organizational: all scripts are tied to and owned by a “project”, which also has a single pregenerated list of nodes, not unlike an ansible inventory hostfile. Scripts cannot be “shared” among projects (though they can be administratively moved/cloned/copied), and the nodelist must exist or be pregenerated before any script is run. You can run on a filtered subset of this nodelist, set as a job param, but a project and its nodelist are 1:1. Nodes themselves are not validated and are just strings passed to ssh, so they can be part of multiple projects’ nodelists; they’re typically generated via periodic inventory-scraping scripts or kept static, again not unlike ansible.

Bhodi fucked around with this message at 14:08 on Feb 1, 2024

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
Obviously you'd greenfield it better if possible, but since you're likely just shoving some already-written scripts into jenkins for an ease-of-execution GUI, RBAC, and logging, you work with what you're given.

The real crime here is how much pipeline development sucks as a workflow. There's no breakpointing, no IDE, a special one-off language, and no real way to troubleshoot beyond executing the script over and over again. Just getting a simple maintenance task done means covering everything in echoes and executing it 10 times (+1 if you're providing parameters in the pipeline which modify the job itself), hitting your shin on every single dereference failure and syntax error, each one read back through the job console log.

Good luck trying to develop a jenkins shared library, because the best our modern systems have to offer is an attached job-configuration window GUI, a separate execute-job window GUI, and clicking into the console every time to see what happened.

This is why most people recommend leaning more on shell scripts, not less, getting those working separately with a faster execute/fix loop and then using jenkins glue as the thinnest coordination/execution piece possible.

Bhodi fucked around with this message at 15:14 on Mar 18, 2024
