The Fool
Oct 16, 2003


The Iron Rose posted:

I just use gitlab pipelines/GitHub actions :v:

same but azure devops

Hadlock
Nov 9, 2004

Vulture Culture posted:

The documentation and UX for ArgoCD and Flux both paint a picture where ArgoCD is a lot more batteries-included than Flux is, and I was very surprised to find in practice that the opposite is true

Yeah this is rapidly turning out to be a disaster

ArgoCD does present a lot better though, my boss was a front end developer in a previous life and the UI is really nice for giving a visual representation of what's deployed and where

Junkiebev
Jan 18, 2002


Feel the progress.

is there a recommended starting point for tuning sysctl defaults for use on "general-purpose" kubernetes nodes, or any resources available for general consumption?

defaults like this are kinda garbage, but I'm sure I'm missing other stuff

code:
net.ipv4.tcp_rmem = 4096	131072	6291456
net.ipv4.tcp_wmem = 4096	16384	4194304
not looking for the complete solution (though I'd accept it), just looking for a starting point.

Hadlock
Nov 9, 2004

What OS are you using? Container Linux is going to be very different from say vanilla redhat. Also are you managed or unmanaged k8s

If you can control your node image, I'd look at rolling your own node image with your own bespoke defaults, and then tell the management plane to use your home rolled images. Automate the image rolling so that you pick up security updates every 30 days or whatever
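
For the bake-it-into-the-image route, the usual shape is just a sysctl.d drop-in that your Packer provisioner copies into the image and systemd-sysctl applies at boot. Values below are placeholders, not recommendations:
code:
# /etc/sysctl.d/90-k8s-node.conf -- applied at boot by systemd-sysctl
# Placeholder numbers; measure and tune per workload before committing.
net.ipv4.tcp_rmem = 4096 131072 16777216
net.ipv4.tcp_wmem = 4096 131072 16777216
net.core.somaxconn = 32768
# inotify limits are a common pain point on busy kubernetes nodes
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 8192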

Junkiebev
Jan 18, 2002


Feel the progress.

Hadlock posted:

What OS are you using? Container Linux is going to be very different from say vanilla redhat. Also are you managed or unmanaged k8s

If you can control your node image, I'd look at rolling your own node image with your own bespoke defaults, and then tell the management plane to use your home rolled images. Automate the image rolling so that you pick up security updates every 30 days or whatever


Ubuntu 22.04 (currently, but not married to it). Managed internally, a mixture of RKE and RKE2. vSphere and Azure.

It's my intent to cook the differences into a "golden image" that's periodically rebuilt with Packer.

(derp - thought that was @me.)

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
I mean, if races don't matter, the easiest option is to run some privileged DaemonSets that set your sysctls how you need them. If races do matter, you can still use this approach, you just need to set taints on your nodes and have the DaemonSets clear the taints when they're done with a first run
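
A rough sketch of that DaemonSet approach, minus the taint-clearing part (which needs a kubectl-capable image plus RBAC); the image and sysctl values here are placeholders, not a vetted config:
YAML code:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sysctl-tuner
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: sysctl-tuner
  template:
    metadata:
      labels:
        app: sysctl-tuner
    spec:
      hostNetwork: true            # net.* sysctls are per network namespace, so join the host's
      containers:
      - name: sysctl
        image: busybox:1.36
        securityContext:
          privileged: true         # required to write host sysctls
        command:
        - sh
        - -c
        - |
          set -e
          sysctl -w net.ipv4.tcp_rmem="4096 131072 16777216"
          sysctl -w net.ipv4.tcp_wmem="4096 131072 16777216"
          # ...whatever else you settle on...
          sleep 2147483647         # stay Running so the settings get reapplied after node reboots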

SurgicalOntologist
Jun 17, 2004

Hadlock posted:

I'm not super happy with ArgoCD but I'm too far along the implementation path to back out and switch to flux because I need to get this delivered

ArgoCD is pretty good for what it does. But then to update the image tag of the container you need to.... Install a third party plugin that's v0.12 and loudly points out that it could change at any time?

Looks like there's a PR ready to merge but the guy who maintains the plugin has abandoned it and wants someone else to take over the plugin, but doesn't offer any way to contact them :cripes: also a bunch of proceduralists are adding red tape

Third, there's no first-class support for AWS ECR, gently caress me, guys, come on. Ok fine, I'll install a weird third party helm chart to get the ECR login secret, I guess. Now I have to create a local fork of this third party chart to support my CD system

I'm all for "do one thing, and do it well" but it doesn't seem like these functions need to be independent of the main helm chart, you've already broken ArgoCD into five+ services

Of interest, it looks like the guys who started ArgoCD gave up on it, literally forked argocd-image-updater and built a new CD system on top of it, Kargo (although they've since fully rewritten the image updater code). Kargo is too new for my tastes, but I'm not loving this "band of merry helm charts" approach to building a functional CD system; flux would have been a very good choice at this point, I think.

I think most people aren't interested in something like argocd-image-updater but are editing the tag in a CI job. At least that's what we're doing. We have a <80 line bash script with usage like this:
Bash code:
update-image-tag REPO_URI MANIFEST_PATH TAG_JSONPATH_OR_ENV_VAR IMAGE_TAG [--dry-run] [--target-branch=BRANCH]
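
(Not our actual script, but a stripped-down sketch of the general shape — assumes mikefarah's yq v4 for the YAML edit, and the flag handling is elided:)
Bash code:
#!/usr/bin/env bash
# Hypothetical sketch: clone the manifest repo, rewrite one tag, push it back.
set -euo pipefail

REPO_URI=$1        # e.g. git@github.com:org/manifests.git
MANIFEST_PATH=$2   # e.g. apps/myapp/values.yaml
TAG_PATH=$3        # a yq path, e.g. .image.tag
IMAGE_TAG=$4       # usually the commit SHA
BRANCH=${TARGET_BRANCH:-main}

workdir=$(mktemp -d)
git clone --depth 1 --branch "$BRANCH" "$REPO_URI" "$workdir"

# Rewrite the tag in place at the given path.
TAG="$IMAGE_TAG" yq -i "$TAG_PATH = strenv(TAG)" "$workdir/$MANIFEST_PATH"

cd "$workdir"
git add "$MANIFEST_PATH"
# Only commit/push when something actually changed.
git diff --cached --quiet || {
  git commit -m "ci: bump image tag to $IMAGE_TAG"
  git push origin "$BRANCH"
}
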
Not sure about ECR; we use GCP Artifact Registry and we just had to set up the service account of one of the ArgoCD services to have permissions there (via Workload Identity). In other words, it's managed via GCP IAM, not anything within ArgoCD itself. Surely you could do something similar in AWS?

You reminded me I never answered your previous question, so here goes.

Our first approach to third-party apps was to put the third-party app in a subchart. This let us keep the values file in our manifest repo (I think there was a limitation, now solved, that the values file and helm templates had to be in the same repo? something like that), and let us add more resources if we wanted. However, we didn't have an easy way to override for example the chart version in a specific environment. The other option is just putting everything in the Application itself, but that's not very DRY across environments.
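
(For anyone following along, that wrapper/subchart pattern is basically just a Chart.yaml with a dependencies entry — hypothetical example, chart and version picked arbitrarily:)
YAML code:
# Chart.yaml of the wrapper chart; the values.yaml next to it carries the
# third-party chart's values nested under the "prometheus:" key.
apiVersion: v2
name: prometheus-wrapper
version: 0.1.0
dependencies:
- name: prometheus
  version: 15.7.1
  repository: https://prometheus-community.github.io/helm-charts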

Our new approach is to use kustomize, and reference the chart in HelmChartInflationGenerator (see here for example: https://medium.com/@brent.gruber77/the-power-of-kustomize-and-helm-5773d0f4d95e). This has several advantages:
  • We can still add resources
  • We can also patch resources even if the chart's values don't expose what we want to do.
  • We can patch the HelmChartInflationGenerator itself, i.e. the chart version, so we can properly promote the update across environments
  • Everything we install is in kustomize so the layout of base, envs, overlays, etc is identical between third-party apps and our own resources. So our tooling is always the same. The process of promoting a change across environments is always the same (just a cp). The process of refactoring an overlay that has been promoted across all environments into the base is always the same. The way to verify we don't push a change that doesn't follow the promotion rules is always the same. Etc.
  • Our ArgoCD Application resources are all the same, so we can replace a directory of dozens of them with a single ApplicationSet, since every app follows the same conventions
  • I have a feeling a lot more standardization/automation is possible. We've also embraced Crossplane so our IAC is also following the same protocols.

It was a bit hairy to set up, you have to get into kustomize pretty deep and understand generators, transformers, etc. (I don't quite get it, to be honest), but once we had it working for one app it was just cargo culting from there.
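
(A minimal sketch of what the kustomize side can look like, using the helmCharts field that kustomize inflates via HelmChartInflationGenerator — needs kustomize build --enable-helm. To let overlays patch the chart version, you'd move the chart block into a standalone generator file instead, as described above. Chart, values, and patch names here are made up:)
YAML code:
# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
helmCharts:
- name: prometheus
  repo: https://prometheus-community.github.io/helm-charts
  version: 15.7.1
  releaseName: prometheus
  namespace: monitoring
  valuesFile: values.yaml
resources:
- extra-servicemonitor.yaml   # "we can still add resources"
patches:
- path: patch-retention.yaml  # strategic-merge patch for things the values don't expose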

Also granted our system is pretty simple, we just have dev, staging and production, each with 2 clusters. Next step is replacing dev with on-demand environments (namespaces for some simple use cases and temporary clusters for others).

Hadlock
Nov 9, 2004

For whatever reason, my brain just absolutely rejects the notion of kustomize

The Iron Rose posted:

I just use gitlab pipelines/GitHub actions :v:

So I went down this rabbit hole

If I do this, I have to declare the value per branch, which is fine for long-lived environments like dev, staging, prod. This does follow true git-ops style.

Makes it hard to have an ephemeral app "subscribe" to a tag though, especially dynamic tags like branch names. Declarative tags per branch means creating a folder, creating a values file, and then when the branch is closed, some kind of cleanup process to drop that folder

Maybe I do github actions for dev/staging/prod now, and then roll out image-updater for ephemeral environments later, that can "subscribe" to a tag/tagging rules set using annotations at helm chart deploy-time. Subscription based, even with state stored in git, isn't super declarative, but for ephemeral non-prod environments, that level of slop is justifiable.

SurgicalOntologist
Jun 17, 2004

Haven't got there yet but I was thinking of trying this for ephemeral environments: https://argo-cd.readthedocs.io/en/stable/operator-manual/applicationset/Generators-Pull-Request/
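
Roughly what that looks like, if it helps — a hedged sketch, with org/repo/paths made up and the GitHub token secret assumed to exist:
YAML code:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: myapp-pr-previews
  namespace: argocd
spec:
  generators:
  - pullRequest:
      github:
        owner: my-org
        repo: my-app
        tokenRef:
          secretName: github-token
          key: token
      requeueAfterSeconds: 300        # how often to poll for open PRs
  template:
    metadata:
      name: 'myapp-pr-{{number}}'
    spec:
      project: default
      source:
        repoURL: 'https://github.com/my-org/my-app.git'
        targetRevision: '{{head_sha}}'
        path: deploy/
      destination:
        server: https://kubernetes.default.svc
        namespace: 'preview-{{branch_slug}}'
      syncPolicy:
        automated:
          prune: true                 # keep the preview in sync with the PR's head
        syncOptions:
        - CreateNamespace=true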

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

Hadlock posted:

For whatever reason, my brain just absolutely rejects the notion of kustomize

So I went down this rabbit hole

If I do this, I have to declare the value per branch, which is fine for long-lived environments like dev, staging, prod. This does follow true git-ops style.

Makes it hard to have an ephemeral app "subscribe" to a tag though, especially dynamic tags like branch names. Declarative tags per branch means creating a folder, creating a values file, and then when the branch is closed, some kind of cleanup process to drop that folder

Maybe I do github actions for dev/staging/prod now, and then roll out image-updater for ephemeral environments later, that can "subscribe" to a tag/tagging rules set using annotations at helm chart deploy-time. Subscription based, even with state stored in git, isn't super declarative, but for ephemeral non-prod environments, that level of slop is justifiable.

Tag your images with the commit SHA or branch name as a suffix after the semantic version, and set up lifecycle or cleanup rules in your registry to clean up unused images after a few months. You can then deploy to k8s namespaces using the branch/commit sha. Add cert manager and kube DNS for fully automated throwaway environments.
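
For example (tag format and registry name made up, just to show the shape):
Bash code:
# Build and push an image tagged semver + branch slug + short SHA.
SHORT_SHA=$(git rev-parse --short HEAD)
BRANCH_SLUG=$(git rev-parse --abbrev-ref HEAD | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g')
docker build -t "registry.example.com/myapp:1.4.2-${BRANCH_SLUG}-${SHORT_SHA}" .
docker push "registry.example.com/myapp:1.4.2-${BRANCH_SLUG}-${SHORT_SHA}"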

No clue how this integrates with argoCD because I’ve never used it, but the above happens automatically for us with no need for manual intervention other than clicking a manually triggered “deploy” pipeline stage after tests pass.

Hadlock
Nov 9, 2004

After talking to two different IRL people I now have an 8 line bash script that generates an imagetag.yaml in a folder named after a git branch (filtered through a magic git- and docker-safe regex) and pushes that to a dedicated image tag repo, and those value files can be referenced by ArgoCD by applying the same magic regex. The imagetag.yaml has a single key/value: tag: $GIT_SHA

This handles the use cases for both long lived environments and ephemeral environments, as I nuke the whole directory structure and regenerate it each time idempotently so I don't need to trim data for deleted branches
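
In case it's useful to anyone, the shape of it is roughly this (a reconstruction, not the actual script — repo URL, layout, and the sanitizing regex are placeholders):
Bash code:
#!/usr/bin/env bash
# Sketch: regenerate one imagetag.yaml per branch in a dedicated tag repo.
set -euo pipefail

TAG_REPO=git@git.example.com:org/image-tags.git   # hypothetical dedicated repo
workdir=$(mktemp -d)

git clone "$TAG_REPO" "$workdir/tags"
rm -rf "$workdir/tags/branches"        # nuke and regenerate idempotently
mkdir -p "$workdir/tags/branches"

# Walk every branch of the app repo (run from a clone with origin fetched);
# sanitize the name with the same git/docker-safe regex the CI tagging uses.
git for-each-ref --format='%(refname:short) %(objectname:short)' refs/remotes/origin/ |
while read -r branch sha; do
  [ "$branch" = "origin/HEAD" ] && continue
  safe=$(echo "${branch#origin/}" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g')
  mkdir -p "$workdir/tags/branches/$safe"
  printf 'tag: %s\n' "$sha" > "$workdir/tags/branches/$safe/imagetag.yaml"
done

cd "$workdir/tags"
git add -A
git diff --cached --quiet || { git commit -m "regenerate image tags"; git push; }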

The Iron Rose posted:

Tag your images with the commit SHA or branch name as a suffix after the semantic version, and set up lifecycle or cleanup rules in your registry to clean up unused images after a few months. You can then deploy to k8s namespaces using the branch/commit sha. Add cert manager and kube DNS for fully automated throwaway environments.

Way ahead of you, doing both of these things along with unix timestamp and truncated commit message. The only thing I've been struggling with today was providing an updated value for ArgoCD to ingest to update the helm release

I could put this nested ugly dir structure in the same IaC mono repo but then you end up with endless robot git commit spam where it's not wanted or needed

Hadlock fucked around with this message at 23:54 on Feb 8, 2024

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS
Thread opinion on buildpacks? There's some adoption discussion from our cloud security team and all I know about them is that they're hardened and opinionated container builds but beyond that I have no idea what footguns to look out for or broadly what they do that's not accomplished by well written dockerfiles.

SurgicalOntologist
Jun 17, 2004

Hadlock posted:

After talking to two different IRL people I now have an 8 line bash script that generates an imagetag.yaml in a folder named after a git branch (filtered through a magic git and docker safe regex) and pushes that to a dedicated image tag repo, and those value files can be referenced by ArgoCD by applying the same magic regex. The image tag.yaml has a single key value that is tag: $GIT_SHA

This handles the use cases for both long lived environments and ephemeral environments, as I nuke the whole directory structure and regenerate it each time idempotently so I don't need to trim data for deleted branches

Way ahead of you, doing both of these things along with unix timestamp and truncated commit message. The only thing I've been struggling with today was providing an updated value for ArgoCD to ingest to update the helm release

I could put this nested ugly dir structure in the same IaC mono repo but then you end up with endless robot git commit spam where it's not wanted or needed

Good idea putting the tags in a separate repo, may consider that.

Edit: wait, how do you actually include those in your manifests generated by ArgoCD?

Also did you explore the Pull Request Generator? If so I wonder why you rejected it, as we'll be facing that decision soon.

Hadlock
Nov 9, 2004

SurgicalOntologist posted:

Good idea putting the tags in a separate repo, may consider that.

Edit: wait, how do you actually include those in your manifests generated by ArgoCD?

https://argo-cd.readthedocs.io/en/stable/user-guide/multiple_sources/#helm-value-files-from-external-git-repository

YAML code:

apiVersion: argoproj.io/v1alpha1
kind: Application
spec:
  sources:
  - repoURL: 'https://prometheus-community.github.io/helm-charts'
    chart: prometheus
    targetRevision: 15.7.1
    helm:
      valueFiles:
      - $values/charts/prometheus/values.yaml
  - repoURL: 'https://git.example.com/org/value-files.git'
    targetRevision: dev
    ref: values

I had some regex tweaking to do and I'm just now getting the repo up to do my first test. I'll report back tomorrow if this rube goldberg machine actually works

Hadlock
Nov 9, 2004

Hadlock posted:

if this rube goldberg machine actually works

It works, two (three?) caveats

1) you have to use "sources" instead of "source" which means it breaks the initial setup UI, so you have to "manually generate" and kubectl apply the Application.yaml
2) you may or may not have to set up the tag repo under settings/repositories and provide an ssh key
3) I forget but not a big deal

It's all resolved using declarative yaml and maybe some bash one liners though

Going to stick with the git foreach parsing container tag thing to cover the ephemeral use case on the condition that the contract says these tags MAY reference a real container*, and then

Have a switch statement for "protected environments" and those will update in a different dir structure and the contract guarantees it references a real container

*Maybe only flag this on if the branch name ends in -env that's a future me problem

Hadlock
Nov 9, 2004

12 rats tied together posted:

in my current role it would take >1 year to redeploy our infrastructure from scratch for no benefit to the company or our team so we would never even consider it, but i'm on a dbre team now, so the usual advice doesn't apply

What's your disaster recovery/business continuity plan look like

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
Today I learned that all our AWS infra is built with Ansible. Which is obviously possible, because they've done it, but holy moly is it a mindfuck trying to actually figure out how anything is put together.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

FISHMANPET posted:

Today I learned that all our AWS infra is built with Ansible. Which is obviously possible, because they've done it, but holy moly is it a mindfuck trying to actually figure out how anything is put together.
It's not easier with Terraform written by lots of different teams

12 rats tied together
Sep 7, 2006

FISHMANPET posted:

Today I learned that all our AWS infra is built with Ansible. Which is obviously possible, because they've done it, but holy moly is it a mindfuck trying to actually figure out how anything is put together.

Why? Ansible is good at this.

The Fool
Oct 16, 2003


our bootstrap/onboarding automation is a combination of ansible and terraform

works well enough

Hadlock
Nov 9, 2004

If it takes a year to rebuild your infrastructure from scratch then no, ansible is not good at this

Hadlock
Nov 9, 2004

Hadlock posted:

What's your disaster recovery/business continuity plan look like

Is there a good boilerplate policy for a disaster recovery SLA with the C-suite? I was casually talking about 24 hours for basic functionality, and 7 days to return to full functionality

I think just spinning up a database server from scratch, unlocking the off site backup and doing a full database restore would take us ~4 hours

The Fool
Oct 16, 2003


Hadlock posted:

If it takes a year to rebuild your infrastructure from scratch then no, ansible is not good at this

the year to rebuild isnt a function of ansible

Hadlock posted:

I think just spinning up a database server from scratch, unlocking the off site backup and doing a full database restore would take us ~4 hours

"a database server"
adorbs

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Hadlock posted:

Is there a good boilerplate policy for disaster recovery SLA with the c suite? I was casually talking about 24 hours for basic functionality, and 7 days to return to full functionality

I think just spinning up a database server from scratch, unlocking the off site backup and doing a full database restore would take us ~4 hours

I've done some DR exercises before and I think the biggest point is that there isn't like a flat policy. DR is basically always going to be 'in scenario X, our time to recover is Y', and you have to pick how big of a problem you think you can solve.

For example, some of our DR SLAs were 5 minutes at that role, because it was a stateless frontend service with automatic error detection and failover, so for the scenario of 'what happens if the host pool in X region goes down' our DR was 5 minutes.

On the other hand, our DR for 'a meteor strikes {hq_city} wiping out the campus and all occupants', that was our actual line in the sand for 'anything this problematic or higher is out of scope and it would be disastrous to our business, it's close up shop time'. Most things fell somewhere in the middle and had scaling responses.

Most of the DR stuff comes down to 'if feature/functionality/service X goes down, do you have a plan to recover from it, and how long will it take to execute the plan?' so if you want a boilerplate, that's the most straightforward answer. You also need to test your DR - an untested DR plan is meaningless, even if you're just trying to execute the failover/etc steps. I would also note that it's not necessarily a bad thing to say 'this would require a manual rebuild of XYZ and would take an estimated Z months' because if that really is the answer then it's worth leadership understanding it, and then figuring out how much the work to offset it would be.

Especially if your budget is shoestring or the service isn't that important, there are lots of times where that line is pretty low down the list.

Edit: gently caress it, I'll keep going.

I'm not familiar with standards for this stuff, so there might be one I'm not aware of. I would say if there isn't, book dedicated time to sit down and start looking at your system. Identify all the most likely failure points, and then document how you'd approach recovering from it. This should include stuff like 'what if our main datacenter goes offline due to an idiot with a backhoe / cooling failure / etc', and stuff like that. Make sure there's documentation for that approach, and that the documentation has some sort of mechanism to stay up to date over time, so you don't go to enact it to find out you changed your networking stack since it was written, and now you're having to ad-hoc gently caress with DNS while leadership is trying to crawl all the way up your rear end in a top hat in hopes of puppeting your body to a faster mitigation time.

If the service you're investigating is some random feature that could go down for a while, be less worried about it. If it's the main service that keeps your company in the green financially, worry more about it.

Falcon2001 fucked around with this message at 05:01 on Feb 11, 2024

Hadlock
Nov 9, 2004

Falcon2001 posted:


Most of the DR stuff comes down to 'if feature/functionality/service X goes down, do you have a plan to recover from it, and how long will it take to execute the plan?' so if you want a boilerplate, that's the most straightforward answer. You also need to test your DR - an untested DR plan is meaningless, even if you're just trying to execute the failover/etc steps.

Edit: gently caress it, I'll keep going.

I'm not familiar with standards for this stuff, so there might be one I'm not aware of.

I worked at a place that did real time trading. You've never heard of them but it was a thing they offered. Anyways, as a result they were regulated by FINRA and had a full, manned DR site in some basically empty nondescript 7 story office building near a major interstate, full of decade old desktops that were powered on, just rotting, running a fully patched copy of Windows 7 Enterprise with dust covers on the keyboards, and big signs hanging from the ceiling saying "ACCOUNTING" and "CLEARING" and "TRADE DESK" etc. Full on "meteor strikes hq building" backup. Before I left I raided their office supply cabinet (which had probably never been opened) for a very nice collection of wilcott flexible stainless steel rulers. Very "liminal spaces" type space

ANYWAYS

Annually, on some three day trading weekend, the DR team would roll over to the B site. Usually it was the team lead, who had been there 15 years and had done this test 14 times with his blindfold on, basically. He was supposed to follow the printed manual in the binder. Well, this year they sent him home 5 minutes before the exercise happened and had my coworker do it. Instead of 4 hours it took them like 36 hours, and they uncovered all sorts of tribal knowledge that wasn't recorded anywhere. The CTO got involved at one point because they weren't sure they could fail back over to the A site before trading started Monday morning at the bell, and in fact I think we ran that whole week from the B site and then had to switch back over the next weekend. All software testing and deployment was halted for the whole week because even though we had testing servers at both sites, we lost the A servers and testing couldn't run in 24 hours on just the B servers

TL;DR yeah always test your DR plan

I don't like a whole lot of modules, but at every place I work I do try to set up a weekly job that will deploy a copy of the production framework (Kubernetes, ArgoCD, external-dns, cert-manager, RDS, etc), and then tear it back down, so that I have some level of confidence in our DR process, and then trust the IT guys are taking care of our off-site database backup and we can get it at some point

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Hadlock posted:

I worked at a place that did real time trading. You've never heard of them but it was a thing they offered. Anyways as a result they were regulated by FINRA and had a full, manned , DR site in some basically empty nondescript 7 story office building near a major interstate, full of decade old desktops that were powered on just rotting running a fully patched copy of Windows 7 enterprise and dust covers on the keyboards, and big signs hanging from the ceiling saying "ACCOUNTING" and "CLEARING" and "TRADE DESK" etc. full on "meteor strikes hq building" backup. Before I left I raided their office supply cabinet (that had probably never been opened) for a very nice collection of wilcott flexible stainless steel rulers. Very "liminal spaces" type space

ANYWAYS

Annually we would, on some three day trading weekend, the DR team would roll over to the B site. Usually it was the team lead, who had been there 15 years and done this test 14 times with his blindfold on basically. He was supposed to follow the printed manual in the binder. Well this year they sent him home 5 minutes before the exercise happened, and had my coworker do it. Instead of 4 hours it took them like 36 hours and they uncovered all sorts of tribal knowledge that wasn't recorded anywhere. The CTO got involved at one point because they weren't sure they could fail back over to the A site before trading started Monday morning at the bell and in fact I think we ran that whole week from the B site and then had to switch back over the next weekend. All software testing and deployment was halted for the whole week because even though we had testing servers at both sites, we lost the A servers and testing couldn't run in 24 hours on just the B servers

TL;DR yeah always test your DR plan

I don't like a whole lot of modules but I do at every place I work try to setup a weekly job that will deploy a copy of production framework (Kubernetes, ArgoCD, external dns, cert manager, RDS, etc), and then tear it back down, so that I have some level of confidence in our DR process, and then trust the IT guys are taking care of our off-site database backup and we can get it at some point

Earlier in my career, I went from working at a service that was almost entirely a query-based stateless service with a sub five minute failover time, to working in the org with payments and billing and stuff. The first job didn't even bother doing DR drills for the most part because every day we were constantly shifting traffic around (if we hadn't had an outage to test stuff like a region going offline, they would test that too periodically, we just had a lot of our DR systems baked into normal operation).

The new job I walked in and got an email talking about how excited they were that they were able to fail over to a secondary region and it only took seventy-two straight hours, handing off from person to person the whole goddamn weekend. At least some of those people were overseas, but yeah. Crazy weekend, and they were SO loving EXCITED. I was horrified.

Edit two: Oh yeah, any process that one person does is a huge red flag for weird hidden knowledge never documented. I'm handling something like that now and the amount of weird bullshit we've dug up is crazy.

Falcon2001 fucked around with this message at 10:55 on Feb 11, 2024

drunk mutt
Jul 5, 2011

I just think they're neat
ArgoCD wasn't really designed with Helm as a first consideration (Helm v2) and was angled towards the GitOps ideology, which works better with declarative patterns (e.g., kustomize). It's not a "magical workflow operator" and really was designed with the idea that teams would toss their k8s manifests in a Git repo and those changes would be realized into the cluster when they hit "trunk".

eta: It actually isn't really all that well designed to handle monorepos; you can very easily get these working by creating repository entries as ConfigMap resources, but that's still not the first-class approach to defining an ArgoCD "Application" resource.

Hadlock
Nov 9, 2004

How do you approach cloudfront distributions with IaC for k8s. I guess our front end gets compiled to a stack of files on S3 served via cloudfront

Spinning up the S3 bucket it points at is cake

I'm looking at doing an AWS ACK controller for both S3 and CloudFront (basically, AWS resources as k8s CRDs), but the way external-dns is wired is to look for ingress controllers and point DNS at a load balancer

https://github.com/aws-controllers-k8s

I could do this at the Terraform level but I'm super loath to do that because I lose a lot of flexibility for my front end team to do copies of prod in dev without a lot of manual drudgery

The big downside to ACK is that the documentation is extremely sparse beyond the simple S3 example, or maybe I'm not looking hard enough

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Hadlock posted:

How do you approach cloudfront distributions with IaC for k8s. I guess our front end gets compiled to a stack of files on S3 served via cloudfront

Spinning up the S3 bucket it points at is cake

I'm looking at doing an AWS ack controller for both S3 and cloudfront (basically , AWS resources as k8s crds) but the way external dns is wired, is to look for ingress controllers and point DNS to a load balancer

https://github.com/aws-controllers-k8s

I could do this at the terraform level but I'm super loathe to do that because I lose a lot of flexibility for my front end team to do copies of prod in dev without a lot of manual drudgery

The big downside to acks is documentation is extremely sparse besides the simple S3 example, or maybe I'm not looking hard enough
Are you doing some kind of micro-account segmentation for ephemeral environments? If not, why create separate buckets and distributions for each temporary deployment instead of prefixing assets on a Git SHA or whatever?

Hadlock
Nov 9, 2004

Right now I have a lovely container running vue in developer mode that the ingress points at, but that's not how prod is meant to be deployed

It's just a vue front end, it compiles a bunch of static assets and then they get s3 cp ./* s3//unique staticprefix-branchname/staticassets and cloudfront points at that

CD spins up namespace branchname, and the ingress declares the hostname the service will be available at. But with cloudfront, DNS points at... I'm guessing, the cname cloudfront is listening at

In theory with ack I just add the S3 resource template to the helm release, as well as the cloudfront resource template and when the GitHub action workflow runs it blindly copies to the agreed upon branchname S3 bucket and any branch that has the magic keyword in it gets the developer treatment and is deployed the same way as production
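
For scale, the ACK S3 resource itself is tiny — something like this, to the best of my reading of the s3-controller examples (name is hypothetical; the CloudFront Distribution CRD is a much bigger spec and worth checking against upstream docs):
YAML code:
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: myapp-frontend-feature-login    # one per branch namespace, hypothetically
spec:
  name: myapp-frontend-feature-login    # the actual S3 bucket name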

Extremely Penetrated
Aug 8, 2004
Hail Spwwttag.
We use Cloudflare instead of Cloudfront, but the idea is externaldns provisions Cloudflare DNS records for both static assets and APIs. We use a Cloudflare Worker (you'd use Lambda@Edge) to handle routing requests to static assets to the right path prefix for that ephemeral environment -- by default we want them using the prod assets, but folks can set a branch name in their HTTP header. So for S3 in particular you don't need to handle it from k8s, and if that lets you avoid running ACK or Crossplane or some poo poo then that's a big plus.

But if you do have other AWS resources that you must handle from k8s then I'd suggest looking at Crossplane's aws-provider-family over ACK.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
How are y'all working around AWS's design decision of RAM-shared resources not having any tags visible in other accounts?

12 rats tied together
Sep 7, 2006

can't say it's ever come up, tags are write-sometimes-read-never for me. every now and then i muck about with them for cost tracking purposes.

Docjowles
Apr 9, 2009

12 rats tied together posted:

can't say it's ever come up, tags are write-sometimes-read-never for me. every now and then i muck about with them for cost tracking purposes.

same

The Fool
Oct 16, 2003


same but we have a finops team so I never look at the tags, they have to

Docjowles
Apr 9, 2009

We went the route of many microsegmented accounts for better and worse. Whether an account is dev/stage/prod and who owns it and so on are known simply by some metadata on the account itself. So we have not found being militant about tagging to be useful at all. Instead we have many other problems :pseudo:

Really the only time lack of tags has come up is if we try to engage with a cloud vendor that expects there to be many, rigorously maintained tags and their software just can't handle any other asset tracking strategy.

The Fool
Oct 16, 2003


Docjowles posted:

We went the route of many microsegmented accounts for better and worse. Whether an account is dev/stage/prod and who owns it and so on are known simply by some metadata on the account itself. So we have not found being militant about tagging to be useful at all. Instead we have many other problems :pseudo:

this is the primary reason I think azure's subscription model is better

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Docjowles posted:

We went the route of many microsegmented accounts for better and worse. Whether an account is dev/stage/prod and who owns it and so on are known simply by some metadata on the account itself. So we have not found being militant about tagging to be useful at all. Instead we have many other problems :pseudo:

Really the only time lack of tags has come up is if we try to engage with a cloud vendor that expects there to be many, rigorously maintained tags and their software just can't handle any other asset tracking strategy.
Yeah this is really more for stuff like transit gateways and VPC Lattice service networks that are shared out to whole Organizations or OUs

vanity slug
Jul 20, 2010

We have a few standard tags but the most useful to me is linking to the repository in which the resource was created

Extremely Penetrated
Aug 8, 2004
Hail Spwwttag.

Docjowles posted:

We went the route of many microsegmented accounts for better and worse.
...
Instead we have many other problems :pseudo:

We're starting to implement this after our TAM pushed hard for it for 2 years. I feel like it's going to be a lot of effort for a sidegrade, but I don't care enough to fight it. Any advice on how to minimize the pain?

  • Reply