|
Is there any sort of consensus on IaC management platforms? We use ARM templates at this time, and I'm looking to improve not just our Azure capabilities but also the other platforms we use. Drift detection is a key feature for me. Terraform Cloud seems to be on the outs, while Spacelift is quite interesting. Pulumi is also interesting, but I am working with an infrastructure team that will have an easier time adopting terraform/opentofu vs the languages Pulumi works with.
|
# ? May 10, 2024 23:35 |
|
|
# ? Jun 9, 2024 08:10 |
|
It seems like most people are using GitHub Actions or Terraform Cloud, from what I can tell. You can self-host Atlantis but I've never tried it personally. Pulumi is something people like to talk about, but I don't know anyone using it in production as their primary go-to.
|
# ? May 11, 2024 00:50 |
|
Spacelift seems solid but I've never used it. If you're just doing this for your own team, or the infrastructure is going to be managed by a smaller group and you're not doing a self-service platform thing, I wouldn't bother: just run the terraform/opentofu in pipelines/actions and keep remote state in storage accounts/s3.
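For anyone new to the remote-state-in-a-bucket setup, a minimal sketch looks like the below. The bucket, key, and table names are hypothetical placeholders, and the azurerm backend (storage account + container) follows the same pattern:

```hcl
# Minimal remote-state backend sketch -- names here are hypothetical.
terraform {
  backend "s3" {
    bucket         = "example-org-tfstate"             # pre-created S3 bucket
    key            = "prod/network/terraform.tfstate"  # path to this stack's state
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"                 # optional: state locking table
  }
}
```

Running `terraform init` after adding this configures the backend and offers to migrate any existing local state into it.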
|
# ? May 11, 2024 00:53 |
|
Hadlock posted: It seems like most people are using GitHub Actions or Terraform Cloud, from what I can tell. You can self-host Atlantis but I've never tried it personally.

We run Atlantis integrated with GitLab for pushing out Terraform. It's worlds better than people yoloing poo poo from their workstations, obviously. There's locking, the opportunity to force code/plan reviews if desired, and a check that the branch you're about to deploy isn't a month behind main. It's an important piece, but I would not call it a full "IaC management platform" by any means. Atlantis IS free, though, and by god my company will tie itself into a pretzel to avoid buying software if at all possible.
|
# ? May 11, 2024 01:10 |
|
Hadlock posted: It seems like most people are using GitHub Actions or Terraform Cloud, from what I can tell. You can self-host Atlantis but I've never tried it personally.

We use Pulumi for isolated application stacks that devs work with frequently, and Terraform for shared services like networking, shared load balancers, etc. which belong to ops staff. We run purely in AWS, so all the relevant info like VPC IDs, cache endpoints, etc. is stored in SSM parameters by Terraform and sourced by Pulumi at runtime. It's fine for that use case, but I wouldn't want to go any bigger or more complex.

Part of why I picked it was the Automation API - it made it very easy to write deployment tooling around our ECS tasks. The smaller community means less knowledge available when something goes wrong, their docs are pretty tragic and badly organised, and a lot of the state manipulation stuff is either painful or non-existent, which makes importing unmanaged resources or refactoring a massive chore.

We might be migrating away from it to Terraform because that's what the rest of the dev teams seem to be consolidating around, and I certainly won't be shedding any tears. And it turns out the Automation API (for C# at least) just shells out to a local pulumi binary and loads the results into some models anyway, so for our use case, redoing that with Terraform is an option.

Terraform or Pulumi, we just use GitLab runners. When a commit is made to a branch other than main, it runs a plan without refreshing state. If a commit is made to main, it runs a plan, waits for manual approval, and then runs apply.

whats for dinner fucked around with this message at 01:42 on May 11, 2024
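The SSM handoff described there can be sketched in plain Terraform (the parameter path and resource names below are hypothetical, and the VPC resource is assumed to exist elsewhere in the shared stack); a Pulumi stack would read the same path with its SSM get-parameter lookup:

```hcl
# Shared-services stack (Terraform) publishes the value...
resource "aws_ssm_parameter" "vpc_id" {
  name  = "/shared/network/vpc_id"   # hypothetical parameter path
  type  = "String"
  value = aws_vpc.main.id            # assumes a VPC defined elsewhere in this stack
}

# ...and an application stack (Terraform or Pulumi) reads it at plan time.
data "aws_ssm_parameter" "vpc_id" {
  name = "/shared/network/vpc_id"
}
```

The nice property of this pattern is that the two stacks share nothing but a well-known parameter path, so either side can be rewritten in a different tool without touching the other.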
# ? May 11, 2024 01:26 |
|
I'm not arguing with you, but for a full "IaC management platform", what additional boxes would it need to tick for you to give it the green light?
|
# ? May 11, 2024 01:26 |
|
This overall effort is about modernizing our processes and ensuring compliance. Terraform as the input is required, but kubernetes is a plus, with OS-level config being a bonus. Config drift detection is the primary factor, though. My goal is to ensure standards compliance and to push back when our projects try to dig themselves into the holes that they do. We currently have a very flexible ARM template practice that cannot really be enforced, so state management is my main thing.

A proper repo strategy, which we lack today, will be a part of this effort. We use git as storage right now, not as any sort of change control mechanism. We have a solid devops practice for applications, but our infrastructure is lacking. I'm looking at products because I won't get developer hours to roll our own solution.

And no, please challenge what I'm looking for; if I'm off base, I'm interested.

Dyscrasia fucked around with this message at 04:02 on May 11, 2024
# ? May 11, 2024 03:36 |
|
Dyscrasia posted: This overall effort is about modernizing our processes and ensuring compliance. Terraform as the input is required, but kubernetes is a plus with OS level config being a bonus. Config drift detection is the primary factor though. My goal is to ensure standards compliance and defeat pushes by our projects to dig themselves into the holes that they do.

I ended up in my current field because I was looking for a way to solve configuration drift, and went to work for a company that monitored it, but ultimately they had to pivot out of the space because it was obvious that declarative config and IaC were going to destroy that market. Between gitops, IaC, and declarative config (k8s), you effectively eliminate the entire class of configuration drift errors instantly. I guess the point I'm trying to make is that by using gitops and declarative config there's really no need for configuration drift monitoring.

The favorite story I like to tell is that we used to do war games, where on Friday afternoon someone would do something nefarious like turn off nginx, comment out the database password in a config file, or rename the SSL cert. Then engineers would race to find and fix the problem. When we switched to Kubernetes and gitops we had to end that, because if you broke something it would launch a replacement, or if you made a change to the git repo, they would just revert the breaking change and it'd start working again.

You'll still probably need something to do scanning to show the auditors, but I guess my point is you're heading in the right direction if you want to solve configuration drift once and for all.
|
# ? May 11, 2024 04:53 |
|
Hadlock posted: Pulumi is something people like to talk about but I don't know anyone using it in production as their primary go to

Hi, hello.

Dyscrasia posted: the languages Pulumi works with.

It’s a little known language called Python. Shows real promise.
|
# ? May 11, 2024 06:47 |
|
i've used it in production in the past and at the current place i'll be switching us to it later. seems like OP has a pretty good handle on it. i would just note that pulumi has a yaml mode, which makes it head & shoulders the best choice for any greenfield iac application: it's feature-compatible with terraform at a minimum, likely better in some dimensions (aws, azure), you can write it in almost any relevant programming language + the only relevant markup language, and things you build in it are shareable as cross-language constructs.

i think the only reason to stay on terraform these days is if you have a specific need to consume an OSS thing that only works in terraform, like community modules, which is not a place i would really want to be, but im sure its reality for a bunch of folks and i wish them well.
|
# ? May 11, 2024 07:06 |
|
Hadlock posted: I ended up in my current field because I was looking for a way to solve configuration drift and went to work for a company that monitored it, but ultimately they had to pivot out of the space because it was obvious that declarative config and iac were going to destroy that market

I'm envious of your leadership's discipline that's allowed you to eliminate config drift. Everywhere I've worked since config management / iac / gitops have existed as concepts, the development side of the house has been able to successfully argue to senior management BUT WHAT IF AN OUTAGE??? I NEED A BACK DOOR JUST IN CASE. Which they then abuse to circumvent procedure and make random rear end changes when it suits them.

The one that made me want to keep a handle of booze in my desk at a past job was devs editing some config file then doing "chattr +i foo" to make it immutable so Chef couldn't revert changes. Why are you doing this instead of checking the change into version control and letting it roll out automatically??? We have made this extremely easy for you.

I don't want to complain too much about my current employer because overall I am very happy. But the ops and security leadership always get steamrolled politically, and we have to deal with stupid things that fall out of that sometimes. The company is pretty drat old in terms of businesses born on the web and turning the ship is a slow loving process.
|
# ? May 11, 2024 07:16 |
|
Docjowles posted: I'm envious of your leadership's discipline that's allowed you to eliminate config drift.

Yeah, developers have no access to production currently. It's pretty great. I'm gonna keep it on lockdown until (if) they fire me.
|
# ? May 11, 2024 08:01 |
|
I'm pretty happy with Spacelift, pricing's good (compared to Terraform Cloud) and their support is very helpful and responsive.
|
# ? May 11, 2024 13:28 |
|
12 rats tied together posted: i've used it in production in the past and at the current place i'll be switching us to it later.

We need to sit down and review it now that we’re “going legit” and also, you know, IBM, but it seems like an excellent option. I don’t love that it requires an account and phones home your state or whatever, but I’d presume that if you give them a bunch of money you can get an airgapped version or whatever. They’re going to be big mad when they find out what the guy that genned up our infrastructure is using it for.
|
# ? May 11, 2024 15:18 |
|
We have a free tier Terraform account for one project, with state stored in Terraform Cloud (I don't know if it's possible to set it up otherwise, it was here when I started). We're also on some old tier of the free plan that's limited by users rather than resources. We're over 500 resources, so we can't switch over to the "new" free tier.

I don't like how running a plan locally just triggers a plan in TFC, because it ends up being incredibly slow. I don't like how all it reports back to GitHub is a green or red check, and you have to actually go into TFC to see the plan output for a PR. Which requires an account (and I don't know if there are limits on the number of accounts anymore or not).

Doing a POC on Atlantis is a low-priority project for us. Actually a second POC, but the first was also before my time, so not sure where it went. We're interested in putting plan output in the PRs, so devs can see what their changes are going to do. Also I'd like for state to still be in s3 and to be able to run a plan locally in a reasonable amount of time.

Currently most of our infrastructure for various services is configured in a monorepo, but different modules for each service/environment. We recently had an outage caused because nobody knew whose job it was to actually apply terraform changes after a PR gets merged, and so we broke a service when we finally did run an apply. And so we're hoping that Atlantis is something we can drop into our environment to help clean things up a bit. It might be the first service we'd run that isn't one of our own products, but the alternative seems to be managing a bunch of custom configuration in our CI just to replicate what something like Atlantis does.
|
# ? May 11, 2024 15:45 |
|
We have Atlantis + GitLab set up such that you cannot merge a branch unless Atlantis cleanly ran terraform apply first. Then when it does, it gets merged automatically. I would be kind of shocked if TFC and GitHub cannot be configured similarly, but I’ve not actually used it so I guess it’s possible. But anyway, yeah, it sounds like Atlantis can do everything you want. “Someone” shouldn’t be responsible for applying; it should just happen in CI/CD when the MR is approved. Whether that’s a manual review or just when tests pass, if you’re confident enough.
Docjowles fucked around with this message at 16:24 on May 11, 2024 |
# ? May 11, 2024 16:20 |
|
The people creating the terraform PRs should be the ones who own, apply, and shepherd the changes. We also use Atlantis + Gitlab. It’s not perfect, but it’s free and it’s better than running apply from your laptop. I haven’t used TFC for years so I’m sure my opinion of it is out of date, hopefully.
|
# ? May 11, 2024 16:54 |
|
I recently set up Atlantis; it's definitely got some nice pros. I think my preference is still gonna be to roll my own pipeline if I'm on GitHub or GitLab, because you can use OIDC, and if you're using their runners you don't need any infra for the terraform pipeline. If you're using Bitbucket or something lovely like that, then Atlantis is the way to go. Atlantis is also probably a good option to get something deployed ASAP.

Definitely always set up apply-before-merge and add branch protection rules so you can't merge until apply. Agree with whoever said the person writing the terraform is responsible for applying it. Wild that people are pushing terraform PRs and just assuming someone else is gonna finish their work for them.
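The AWS side of that OIDC setup can itself be expressed in Terraform. A rough sketch, where the org/repo and role name are hypothetical and the thumbprint should be verified against GitHub's current certificate rather than copied:

```hcl
# GitHub Actions -> AWS via OIDC, so CI runners need no long-lived credentials.
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"] # verify current value
}

resource "aws_iam_role" "terraform_ci" {
  name = "terraform-ci" # hypothetical role name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = aws_iam_openid_connect_provider.github.arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = { "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com" }
        # hypothetical org/repo -- scope the subject claim as tightly as possible
        StringLike   = { "token.actions.githubusercontent.com:sub" = "repo:example-org/infra:*" }
      }
    }]
  })
}
```

The workflow then assumes this role at job start, which is what makes "no infra for the terraform pipeline" work: there's no runner host, agent, or static access key to manage.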
|
# ? May 11, 2024 17:04 |
|
We use Atlantis and have a pretty significant footprint. As a dev I was skeptical at first but it's pretty decent so far. State locking is incredibly useful and plan output in PRs is great. Only thing that's not ideal is just that it's still terraform under the hood so it's still all the sucky bits of hcl, etc.
|
# ? May 11, 2024 17:06 |
|
Our terraform is in a right state, we've got like 18 layers of "it shouldn't be that way" and yet... Thankfully they've brought me on, and I just keep finding things and fixing them.
|
# ? May 11, 2024 23:27 |
|
Seriously, thank you everyone, this is all extremely helpful information and I'll have to do some more consideration, particularly regarding terraform in general.
|
# ? May 12, 2024 00:13 |
|
Warbird posted: We need to sit down and review it now that we’re “going legit” and also, you know, IBM, but it seems like an excellent option. I don’t love that it requires an account and phones home your state or whatever but I’d presume that if you give them a bunch of money you can get an airgapped version or whatever. They’re going to be big mad when they find out what the guy that genned up our infrastructure is using it for.

It does let you use self-managed backends like S3: https://www.pulumi.com/docs/concepts/state/#using-a-self-managed-backend

That's what we do and it works just fine.
|
# ? May 12, 2024 00:30 |
|
Has anyone looked at using Digger instead of Atlantis? Seems potentially less clunky and with better PR output but I haven't bothered yet.
|
# ? May 12, 2024 04:24 |
|
Just FYI, apparently back in February argo-helm released v6.0.0 of their argo-cd chart. They're already on v6.9.2. The big breaking change is that you no longer need to do absurd backflips to get the standard helm chart to play nice with modern ingress controllers.
|
# ? May 15, 2024 23:04 |
|
I don't really understand Argo despite working with it in the past. Is my vague understanding that it's a fancy helm repo frontend that can also deploy the things as well more or less on point? My K8s ecosystem understanding is still laughably underdeveloped. I also made the extremely stupid mistake of volunteering to handle compliance stuff to get our exempted app, now in prod with customers, into internal line with infosec and other bureaucracy and have been in email hell for a couple of weeks now.
|
# ? May 17, 2024 02:47 |
|
Warbird posted: I don't really understand Argo despite working with it in the past. Is my vague understanding that it's a fancy helm repo frontend that can also deploy the things as well more or less on point? My K8s ecosystem understanding is still laughably underdeveloped.

What do you need help with, specifically? The way I explain it to internal customers is that it’s like helm apply but they don’t have to manage or automate it, and it will optionally prevent drift due to manual changes. That’s it.

People try it. Some like it because it’s less to think about. Some like it because of the UI. Some don’t like it. They probably don’t like it because it forces them to be honest about how often they need to make small, unplanned, and unannounced tweaks to keep things humming along.
|
# ? May 17, 2024 03:19 |
|
If you aren't using Argo Rollouts, it's debatable if you ever needed Argo at all. Like, seriously, unless you're doing something dumb with sync waves, the features in Rollouts are the whole reason you choose Argo over something with fewer moving parts.
|
# ? May 17, 2024 04:08 |
|
idk, I don't see a great deal of benefit there vs just having CI/CD call helm install/upgrade itself, but I fully admit my ignorance in the area. We don't have to dwell on it, I'm sure I'll get more contextual understanding in time.
|
# ? May 17, 2024 18:18 |
|
Argo is nice because it gives you a single pane of glass, a dedicated notification controller, a health check for the overall system, and a way to pause specific helm deployments
|
# ? May 17, 2024 18:49 |
|
that’s what we use it for, its a good watcher and has integration so you can see pod log messages and health, terminate and manage deployments and stuff like that from the console. there are lots of tools to get your code from git into active running but it’s that plus a pretty neat package that has some dashboards and usability tools built in. it’s also not just for helm, not all our appsets have charts and it works for them too.
|
# ? May 17, 2024 19:04 |
|
I have been poring over the helm options to deploy. I think:

Prometheus
Tempo
Loki
Grafana
Promtail/Alloy

Grafana offers no less than 5 helm charts for Loki, at least one is completely deprecated, and then there's a secret sixth one distributed with the main app. None of them seem to be updated with any particular regularity, every couple of months. Grafana went v11.0.0 over a week ago and the helm chart hasn't been updated despite having multiple RCs in the lead-up.

There's a helm chart called "lgtm-stack" that also includes their Thanos (Prometheus distribution proxy) alternative... Mirai or something.

Seems like the best option for deploying Prometheus/grafana is to just use the poorly named "kube-prometheus-stack" chart, which installs Prometheus, Thanos, and grafana. Then install... I guess, promtail or alloy (alloy just hit v1.0 and they're getting rid of promtail/grafana agent) helm chart, then install loki, then install tempo?

Looks like CNCF is producing perses as a CNCF alternative to grafana: https://github.com/perses/perses Looks like they hit v0.43 the other day; looks roughly on par with grafana 2/3 based on a single screenshot.
|
# ? May 18, 2024 05:18 |
|
my dirty k8s secret is that I’ve never actually deployed Prometheus/grafana in 5 years of kubernetesing. datadog and opentelemetry go a long way, yall
|
# ? May 18, 2024 05:36 |
|
datadog costs money though
|
# ? May 18, 2024 05:37 |
|
grafana and prometheus are popular because its the same garbage nerds are running in their home labs
|
# ? May 18, 2024 05:38 |
|
The Fool posted:grafana and prometheus are popular because its the same garbage nerds are running in their home labs
|
# ? May 18, 2024 05:50 |
|
I tried patching with kustomize, but I'm doing something wrong, 'cause I'm not getting any errors but the service isn't being patched. I might poke at it a little more before just disabling that chart's grafana and installing grafana on my own.

well I guess im dumb. looks like I could just add the service/annotations to the values even though the default values file didn't have that

Resdfru fucked around with this message at 07:30 on May 18, 2024
# ? May 18, 2024 06:04 |
|
I'm waiting for my company to XYZ and then we can move over to datadog or whatever, if necessary. There's a three-year-old jira ticket (which predates me by 2.5 years) that says "ship logging somewhere that engineers can see it" so that's where I'm coming from.

We have a pretty small stack with ~500 active 9-5 users, so "computer janitoring grafana" seems like decent make-work once everything is in place and stable, rather than farming it out to grafana cloud or datadog. There's no helm chart that formally supports grafana 11, which feels like a pretty big upgrade; plus I accidentally ran it so it already migrated my db, so I'm stuck with it, and I'm working on a solution for that.

In the mean time, having the "lgtm" stack (lol) is nice as I'm not dropping down to the command line constantly, and it helps build confidence with my boss that the added expense is worth it. Importantly, building/struggle-bussing through running your own monitoring + alerting stack also forces you to really look at what's going on, which is important if you're the principal over everything.
|
# ? May 18, 2024 07:28 |
|
The Fool posted: datadog costs money though

Yeah, this. At some point I'll make the financial case that datadog is cheaper than hiring a jr to janitor Prometheus/grafana. We just haven't crossed that bridge to operational maturity yet.
|
# ? May 18, 2024 07:45 |
|
Hadlock posted: At some point I'll make the financial case that datadog is cheaper than hiring a jr to janitor Prometheus/grafana. We just haven't crossed that bridge to operational maturity yet

Datadog has a great service, but holy poo poo it's expensive if you have even a modest amount of data you want to feed it. If you're in a multi-dev organization you really need to police what people are putting into it, because the cost can very quickly run away if people aren't careful, and if you're the one who championed Datadog, guess who gets the blame when that unexpectedly big invoice shows up?
|
# ? May 18, 2024 11:28 |
|
|
Prometheus was a huge upgrade for me because it finally weaned the nerds I work with off the feeling that they needed to store metrics going back to the birth of Christ. Every time a ganglia rrd got lost I heard about it, and it drove me nuts. But when our server count outgrew what ganglia could handle, I got to swap to Prometheus and was all "sorry, it can only handle 6 months, beyond that it tosses chunks." There was grumbling but they adapted. It's been pretty maintenance-free too. Wish I could say the same about grafana.
|
# ? May 18, 2024 12:37 |