minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

LochNessMonster posted:

All these downstream pipelines are building container images based on customer specific configs.
The build is config-dependent, or the runtime is config-dependent? Because if it's the latter, you should absolutely be injecting the config at runtime, not during build. While it can be convenient to ship config baked into the container, it results in situations like this where now you need 1 container per config. If the config were injected at runtime, you'd only need 1 build.
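
As a minimal sketch of what runtime injection can look like (the entrypoint, the CONFIG_PATH variable, and the config schema are all made up for illustration): the app reads its config when it starts, so one image covers every customer.

code:

import json
import os

# One image, many configs: the config file is mounted (or pointed at via an
# env var) at container start, not baked in at build time.
CONFIG_PATH = os.environ.get("CONFIG_PATH", "/etc/myapp/config.json")

def load_config():
    with open(CONFIG_PATH) as f:
        return json.load(f)

if __name__ == "__main__":
    cfg = load_config()
    print(f"running with customer config: {cfg}")

Then each customer is just a different mount, e.g. docker run -v ./customerA.json:/etc/myapp/config.json myapp:latest.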


minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
The quality of your TAM may be heavily influenced by your total spend with the vendor. You may both be on Enterprise support, but the $100MM/year customer is going to get the velvet rope TAM treatment while the $1MM/year customer gets the intern.

At any rate, the worst support is from MindTree, Azure's default 3rd party support org. They never read the ticket and just chase the close. They don't know the answers to anything. They will keep the ticket open so long that the person assigned will leave the company and you'll have to start again from scratch. And this is all exacerbated by the terrible Azure support technology that shoves everything into email but doesn't handle images or threading or collapsing quoted text, so every conversation reads like a shredded dictionary.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

FISHMANPET posted:

I've actually had pretty good success with Microsoft support, both on the Azure side and non-Azure side.
There are multiple support tiers at Azure. The default $100/mo support from MindTree is the useless one. The expensive Unified Premier Support or whatever it's called gets you actual Microsoft staff and is (presumably) lightyears better.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
One approach is to offer a sliding scale with different expectations:
- DIY: Your teams build it in the cloud, you own it - with all that entails. Good for Need It Fast, proofs of concept, not much red tape, but you're on your own if you have a problem.
- "Basic" shared platform: easy to onboard, takes care of most of the details, but it's opinionated and support is not great. Good for teams who can't cloud very well.
- "S-tier" shared platform: high bar to onboard (must pass security / finance / legal reviews) but rock solid support and features. Good for critical apps.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
It helps if the platform team runs itself like a business rather than a government agency. If it's run like a business, they care about internal customers and want to meet their needs and add useful features. But if it's run like a government agency, and seen as a cost sink, it stagnates as a monopoly that makes the rules and has a captive (and unhappy) customer base.

In reality I suspect most platform teams are somewhere in the middle; or at least start out with ambitions of being the former, before legal / security / technical realities are imposed upon them and they end up being the bureaucratic monster they once professed to hate.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
There definitely need to be different approaches for different levels of app. For example, I have a half-page script that reads data from an internal API and squirts it into a human-readable Google Sheet. I'd run it off my laptop, but it needs to run when I'm on vacation or after I leave the company, so I need to find a proper home for it.

If I apply to the IT platform department, then they will ask:
- Security: what ports does it expose? What's your process for handling CVEs? Do you do static analysis & container scanning? When and how are you rotating credentials?
- GDPR: what personal data are you processing? Where is it stored? How long will you retain it? Is that policy published anywhere?
- Ops: what CPU / network / storage requirements? Where's your testing/staging environment? How critical is this app? Who do we escalate PagerDuty to? What health checks can we apply?
- Legal: is the source code open-sourced? If so, what license is it using? What's our exposure?
- Procurement/finance: is it using any 3rd party SaaS services? Are they onboarded into our vendor management system? Do we have an enterprise agreement with them?

All of these are very reasonable questions... but this is just a non-critical script, and there are thousands of similar things all over the company. If the bar to entry into a platform is this high, no-one is going to wade through all that red tape. They'll either hobble along running it on their laptop, or go rogue and procure an off-the-books cloud account to run it where the eagle eyes of infosec et al won't see it.

If the platform team is made up of "cover my own rear end" types, they don't care: they only care that they asked the right questions and got their boxes ticked. But if they actually care about internal customers, they'll recognize this friction for what it is, and make an easy path for them.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
It's all Goodhart's Law stuff, right? Infosec imposes onerous policies because bad number go down brrrrr.

Security incident quantity/cost is easy to track, but the negative effects of onerous policies are intangible and 2nd-order (e.g. people spinning up services on their credit card because it's easier than going through InfoSec), so it's really easy for InfoSec to slip into being a bureaucratic friction bottleneck that primarily benefits only themselves.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
I'm about to run into this problem, and assumed I'd have to split the policy up into multiple policies. Which is doable, it's just a hassle.

I'm intrigued by the Vault solution, but bewildered about how it actually works. I grok'd all the nouns in that explanation, but I have no idea how they all connect together.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

JehovahsWetness posted:

This setup works for us because pipelines are required to store any needed secrets in Vault anyway, so it's not a big leap to using it for STS creds.
Ah, that makes much more sense. Thanks for the explanation!

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
This guy: https://www.redhat.com/en/technologies/management/ansible

AWX is the free, open source upstream of the same (a web UI + REST API, same as Tower).

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Azure's APIs are a 7-layer dogshit lasagne:
- Unlike AWS, where you can just get an access key attached to your IAM account and start using boto3 immediately, Azure forces you to register a service principal. But it doesn't offer you a simple "look, I just want to give you a secret and then I make API calls", noooo, instead you have to define whether your app is mobile or does OAuth, and you need to supply a Redirect URI (!!), plus a zillion other confusing questions.
- You can't just pass in the secret key and get an API session, you have to go through the whole song and dance of using the key to get a session token, and then using the session token to make the API calls. And handle storing a refresh token in a file somewhere so that you don't have to re-auth the next time you run the app. (In fairness, I think recent versions of the Python API tried to emulate the ease of boto3 by hiding most of this away)
- Ok, so now you have a connection. What Python library do you want to use? Because there are multiple versions of each API, some with "preview" in the name despite being years old, and they're all simultaneously supported, so you need to pick a specific version. How you make sure the right version is installed and selected is basically magic.
- Now you have a client instance and want to make an API call. So you visit the docs to understand the parameters, and they offer such gems as "subscription_policies: the subscription policies". Thanks!
- More docs fun: inconsistent names! Tenant / Azure AD / Directory, they all mean the same thing and are used interchangeably. Sorry if ya didn't know!
- Everything in Azure has a unique Object ID. However, sometimes this is hidden (in the Portal) and they show you a separate ID, e.g. both subscriptions and tenants have IDs that are different from their object IDs. Some API calls use the object ID, some use the publicly visible ID. Trial and error till you figure out which one.
- Unlike AWS, many API writes don't immediately take effect. You may need to insert various waits or polls after you make a write to Azure, otherwise your program may hit a race condition. E.g. if you create a user and then try to grant that user a permission, the grant sometimes fails because the user hasn't finished being created in the background. So now your app is polluted with lots of polls and retries! (See the sketch after this list.)
- Don't even bother asking support unless you're on a premium support tier, because otherwise you'll get redirected to Azure's awful 3rd party MindTree support, which is actually less than useless.
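
To make the auth dance and the retry pollution concrete, here's a hedged sketch (assuming the azure-identity and azure-mgmt-resource libraries; the env var names and the retry helper are my own inventions, not anything Azure prescribes):

code:

import os
import time

from azure.identity import ClientSecretCredential
from azure.mgmt.resource import ResourceManagementClient

# The service principal's secret gets traded for tokens under the hood;
# recent SDK versions hide the token/refresh machinery behind this object.
credential = ClientSecretCredential(
    tenant_id=os.environ["AZURE_TENANT_ID"],       # a.k.a. Azure AD / Directory ID
    client_id=os.environ["AZURE_CLIENT_ID"],
    client_secret=os.environ["AZURE_CLIENT_SECRET"],
)
client = ResourceManagementClient(credential, os.environ["AZURE_SUBSCRIPTION_ID"])

# Hypothetical helper for the eventual-consistency problem: retry a write
# until whatever it depends on has finished materializing in the background.
def retry_until_consistent(fn, attempts=5, delay=2.0):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:  # in real code, narrow this to the SDK's error types
            if attempt == attempts - 1:
                raise
            time.sleep(delay * (attempt + 1))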

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
I have it on my ToDo list to look at StackSets, because they'd make it trivial for me to ensure various IAM users/roles/policies/groups exist in each linked account, e.g. stuff we need for basic admin tasks. Definitely beats writing some Ansible automation to STS:AssumeRole into each account and set it all up, especially given the number of accounts we manage.
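
In boto3 terms, the idea would look something like this (a hedged sketch; the stack set name, template file, and OU ID are placeholders, not our real setup):

code:

import boto3

cfn = boto3.client("cloudformation")

# Define the stack set once; the template holds the baseline IAM
# users/roles/policies/groups we want in every linked account.
cfn.create_stack_set(
    StackSetName="baseline-iam",
    TemplateBody=open("baseline_iam.yaml").read(),
    Capabilities=["CAPABILITY_NAMED_IAM"],
    PermissionModel="SERVICE_MANAGED",  # deploy via AWS Organizations
    AutoDeployment={"Enabled": True, "RetainStacksOnAccountRemoval": False},
)

# Push stack instances into every account under an OU, in one region.
cfn.create_stack_instances(
    StackSetName="baseline-iam",
    DeploymentTargets={"OrganizationalUnitIds": ["ou-xxxx-placeholder"]},
    Regions=["us-east-1"],
)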

But I don't think I'd want to touch it for any compute infra.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

The Iron Rose posted:

Before I do though, am I reinventing the wheel here?
Are you running vanilla K8s? I don't know too much about it, but if you're running OpenShift then middleware like HyperShift can run control planes at scale, i.e. 1 "super" control plane manages many k8s clusters.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
With Ansible, it's important to understand that it's basically 2 parts: (1) a way of maintaining an inventory of configuration (i.e. what data applies to what hosts, and how you organize it so that there's maximum sanity & minimal duplication), and (2) applying that configuration to a bunch of hosts, via the playbooks/roles/tasks/modules and whatnot. The first step ultimately renders a giant JSON data structure that contains the final per-host config, and then the 2nd step uses that data as input to the playbooks which apply it to all the hosts.

The thing that bugged me at first was that they didn't really prescribe a way to manage the inventory side of things; you're kind of left to feel that out for yourself, so it took a bit of trial and error. But since the end result of the first step is just "render a big data blob", you can replace their built-in way of doing things with your own code (so-called "inventory sources"; see the sketch below). This helped a lot because I no longer had to try to shoehorn my particular situation's weird structure into Ansible's config format, and I could also easily leverage oddball data sources like spreadsheets and databases.
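
For illustration, a minimal custom inventory source (the group/host names and vars are invented; the --list contract is Ansible's standard dynamic-inventory interface):

code:

#!/usr/bin/env python3
# Ansible calls this script with --list and expects a JSON blob of groups,
# hosts, and per-host vars on stdout.
import json
import sys

def build_inventory():
    # In reality this could read a spreadsheet, a database, a CMDB, etc.
    return {
        "webservers": {
            "hosts": ["web1.example.com", "web2.example.com"],
            "vars": {"http_port": 8080},
        },
        "_meta": {  # per-host vars, so Ansible skips per-host --host calls
            "hostvars": {
                "web1.example.com": {"rack": "A3"},
                "web2.example.com": {"rack": "B1"},
            }
        },
    }

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--list":
        json.dump(build_inventory(), sys.stdout)
    else:
        json.dump({}, sys.stdout)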

The "apply the data to the hosts" side of things also took a bit of getting used to. It's really an overly-simple programming language, and it's really not cut out for doing data manipulation, so it's best if all your data is rendered into an easy-to-consume form in the first step. But it is really good at applying the same steps in parallel to lots of hosts, gathering status & errors, retrying on failed hosts etc.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
It's a special case of config management. It needs controls & auditing far beyond what's needed for "regular" config. And as someone mentioned above, it's also about getting that data where it's used in a secure way. Most config management software is not overly concerned with the security of the config itself, because the config is often not sensitive info. But secrets are sensitive, and need to be distributed to the same areas as regular config, so either config management tooling clumsily integrates with secrets managers to do that, or the system is architected to bypass the config management tooling entirely and talk directly to the secrets managers.

(We had a system where our config management + deployment scripts were horribly integrated with our secrets manager, because it was the config management tool that was ultimately responsible for putting those secrets onto the hosts in a way that our programs could use them. We inverted the model and instead get our programs to reach out to the secrets manager on startup to retrieve whatever secrets they need, and this has vastly simplified (and secured) our deployment scripts, at the cost of coupling our programs with a secrets manager. Seems worth it, so far.)
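
The inverted model, as a hedged sketch (assuming Hashicorp Vault and the hvac Python client; the Vault address, auth method, secret path, and key names are placeholders):

code:

import os

import hvac

def fetch_secrets():
    # The program authenticates itself to Vault on startup...
    client = hvac.Client(
        url=os.environ.get("VAULT_ADDR", "https://vault.example.com:8200"),
        token=os.environ["VAULT_TOKEN"],  # or an AppRole/k8s auth login instead
    )
    # ...and pulls exactly the secrets it needs (KV v2 read; the payload
    # lives under resp["data"]["data"]).
    resp = client.secrets.kv.v2.read_secret_version(path="myapp/prod")
    return resp["data"]["data"]

secrets = fetch_secrets()
db_password = secrets["db_password"]  # hypothetical key

The deployment tooling never touches the secret values at all; it only needs to give the program a way to authenticate to Vault.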

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

Vulture Culture posted:

The only way to responsibly handle this is to use short-lived credentials, at which point you become a full identity broker. That's a great endgame! I'm unconvinced that in the space between "literally nothing" and "full identity broker", these products provide any value at all. They don't in any way touch the hard part of the problem.
We mostly use Hashicorp Vault as a KV store, and to be honest it doesn't provide significant value beyond what a generic KV store like Redis would provide if it had some authnz + HA + backup controls slapped around it. But it was nice that it came with all that out of the box.

Using Vault as a credentials minter for short-term creds is convenient because Vault is also a secure place to store the permanent "bootstrap" creds required by the minter. AFAICT even if you arranged ephemeral creds for all apps, there'll always be a need for perma-creds to bootstrap the creds minter. So the value proposition of a secrets manager is partly as a way to handle those necessary perma-creds.
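
For the minter case, a hedged sketch using Vault's AWS secrets engine via hvac (the role name is a placeholder, and this assumes the engine was already configured with the permanent bootstrap IAM creds):

code:

import os

import hvac

client = hvac.Client(url=os.environ["VAULT_ADDR"], token=os.environ["VAULT_TOKEN"])

# Vault holds the perma-creds; callers only ever see short-lived ones it
# mints on demand for a pre-configured role.
resp = client.secrets.aws.generate_credentials(name="ci-deployer")
creds = resp["data"]  # access_key / secret_key (/ security_token for STS roles)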

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
I don't know how useful it is for secrets managers to have comprehensive audit trails. Practically speaking, whenever I've seen a leaked credential, an audit trail would not have helped, because it's been either:
- a dipshit engineer treating a perma-secret as regular data (e.g. adding a credential to some dev / CI script that they mistakenly upload to a public Git repo)
- leaked in logs. CI is a common problem here because everyone dials their debug levels up to the max so more stuff gets logged, including "dump the environment to aid debugging". And some CI systems log to public buckets because "it's just build results, it's not sensitive, we don't want to make life harder for people debugging stuff by adding an authnz layer over it".

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

Docjowles posted:

Going off on a tangent but at a past job we totally had a careless dev commit admin IAM keys to public GitHub. It was impressive how quickly they were found and used to spin up crypto miners in every region.
This happens because GitHub (for reasons I'm not clear on) publishes a public event log of every public change that happens in every single GitHub repo, and hackers listen to it and scan for leaked credentials. Consequently, any leaked credentials get used almost instantly. AWS is thankfully aware of this, and now listens to the same event log. They will quickly quarantine any AWS credentials they find by applying an IAM policy that effectively makes it a read-only credential, and then inform the account owner.

When we do forensics on these kinds of incidents, it's common to find that the culprit was trying to cobble together some CI pipeline and had no idea how to inject credentials into it, so they just hardcoded them into the script.
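
The fix is usually trivial once someone shows them: let the CI system inject the keys as masked environment variables and keep them out of the source entirely. A sketch (the env vars are the standard AWS ones, which boto3 reads automatically):

code:

import boto3

# No keys in the script: boto3 picks up AWS_ACCESS_KEY_ID /
# AWS_SECRET_ACCESS_KEY from the environment, which the CI system
# injects as masked secrets at job runtime.
s3 = boto3.client("s3")
print([b["Name"] for b in s3.list_buckets()["Buckets"]])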

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Tests need known state in order to not be flaky, so it's useful to completely reset the DB between test runs. We have a containerized Postgres (vanilla docker.io/library/postgres) which auto-initializes and keeps its state in the container, so after a test run we just toss the whole container and make a new one.

But a fresh DB every test run also means you need to recreate the schema (at a minimum) and possibly inject some core data, and that can slow down your test runs when you have thousands of tests. So another way to do it is to use Postgres's ability to clone a DB from a template DB that's already set up with the right schema. Or, depending on the app, run the entire test in a transaction and roll it back at the end of the test.
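
Both tricks sketched with psycopg2 (the connection details, database names, and the users table are placeholders):

code:

import psycopg2

# Trick 1: clone a fresh test DB from a pre-built template.
# CREATE DATABASE can't run inside a transaction block, hence autocommit;
# the template must have no active connections during the clone.
admin = psycopg2.connect(host="localhost", user="postgres", dbname="postgres")
admin.autocommit = True
with admin.cursor() as cur:
    cur.execute("CREATE DATABASE test_run TEMPLATE test_template")

# Trick 2: run the whole test inside a transaction and never commit.
conn = psycopg2.connect(host="localhost", user="postgres", dbname="test_run")
try:
    with conn.cursor() as cur:
        cur.execute("INSERT INTO users (name) VALUES ('temp')")
        # ... exercise the app, make assertions ...
finally:
    conn.rollback()  # the DB snaps back to its pristine state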

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

my homie dhall posted:

it’s always funny to me when people say they are trying to solve the problem of groovy’s terrible syntax by introducing another flavor of turing complete yaml

The most damning criticism of Groovy came from its creator, who said

quote:

I can honestly say if someone had shown me the Programming in Scala book by Martin Odersky, Lex Spoon & Bill Venners back in 2003 I'd probably have never created Groovy.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Agreed it's bad, but I figure a lot of teams need to set up an automation platform for CICD and/or random jobs that need (a) non-technical people to launch them and (b) people to look at output logs with relative ease, and Jenkins is just simple enough that 1 person can get it set up "enough" for a team within a short time. The better alternatives (a k8s cluster? Ansible Automation Platform?) are complex enough that they need an Ops team.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

The Fool posted:

Azure DevOps, GitLab, GitHub Actions, or CircleCI?

All of those are super easy for a small team to set up
Yes, for CICD. But not for

quote:

random jobs that need (a) non-technical people to launch them and (b) people to look at output logs with relative ease

We did a survey of the hundreds of Jenkins instances we discovered running within the company, and many of them were being used for non-CICD purposes.


minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Hell, I've seen candidates accept the offer, go through a week of on-boarding, and then surprise everyone with "welp, actually I got a better offer from Another Corp, I'm gonna go with them, byeeeee". Can't do anything but shake your fist and "ignorelist" them if they ever try to get hired here again, but realistically no-one will remember.
