Continuous Integration/build engineering/devops thread

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Continuous Integration/build engineering/devops thread

«‹›158 »

candide: Jun 16, 2002; The Tipping Point

I have a Docker best practices question and I didn't see a Docker thread anywhere.

code:

# `dev` Docker image
- /app    # source code, changes often, lives with the image (for better or worse)
- /data   # large training weights, datasets, what have you. size of /data >>>>>>>>>>>>>>>>>>> size of /app

Fake but representative version of our current image. Could be better but it's what we got for now. Compared to what we used to do, everyone prefers this because how easy it is to `docker pull dev` and hit the ground running. I hate it because `/data` rarely changes, and 98% of the `docker pull` command is spent downloading the same contents of `/data` for the umpteenth time. Optimizing the Dockerfile to minimize what layers gets downloaded helps, but I'd just like to decouple `/app` from `/data` completely.

I imagine Docker volumes are the way to go:

a) Remove `/data` from the `dev` image
b) Have users create a docker volume and populate `/data` in the volume
c) From now on, create containers of `dev` while volume mounting said volume
d) Updates to `dev` only concern things in `/app`, and the same volume persists `/data` from container to container

That seems straightforward (assuming I understand it right), but now my concern is version controlling the Docker volume. Everything I've read explains how you can export the volume as a means to backup the volume or migrate the volume from host to host. Is there a less manual way? Given only these two options, I'd rather spend time (re)downloading `/data` vs. holding someone's hand every time they fat finger `/ddata` or something. I'd rather not create any custom scripts to help this either, unless I have to. I'm hoping these aren't the only options? I think I'm asking if there's some sort of `docker volume pull` equivalent?

Another thought, why would this be terribad: Create a `data` Docker image that contains nothing but `/data`, and use Docker Compose to start the `dev/data` containers separately along with the proper volume mounts. Time passes, they update `dev`, time passes, someone hollers and says there are new weights available in the new `data` image, people update `data`, etc. My Google skills are fading because I can't seem to find a single example of anyone doing this so I'm assuming this is dumb and/or there's a more obvious solution.

I'm intentionally avoiding things like Kubernetes. All those features for scaling, deployment, etc. are nice but way overkill for our simple little development-not-deployment environment and also I had to google how to spell Kubernetes just now. We've simply containerized source code w/ the dev environment so people can jump right in and not have to deal with dependency hell.

And is there a Discord server for SH/SC or cobol nerd goons?

# ? Feb 2, 2021 02:33

Adbot: ADBOT LOVES YOU

# ? Jun 5, 2024 08:21

12 rats tied together: Sep 7, 2006

Hadlock posted:

It is so, so, so hosed

I used to get really confused when people would hesitate on cloudformation but the vast majority of cfn I've encountered in the wild has been absolutely deranged.

If it takes you longer than like, a single day to extend a template to support a new requirement that's a failure of the template and needs to be examined so we can find the root assumption that was incorrect and try to learn from it. Longer than a week is an ops disaster. At 6 months you might as well fire the whole team and start over because you'll probably have better odds with new staff.

There are knock-on benefits to infrastructure as code like readability, determinism, test suites, etc. but the primary reason we do it is based in producitvity, because it is faster than clicking. If it's ever consistently slower than just clicking there are some serious loving problems lurking around. Holy poo poo.

# ? Feb 2, 2021 03:47

Bhodi: Dec 9, 2007; Oh, it's just a cat.; Pillbug

IMO the primary benefit IS reproducibility rather than productivity; clicking is cancer. 100% agreed on the rest though.

I spent the entirety of 2019 trying to make extensible terraform modules for consumption by other groups, as described above, and it went absolutely how you expect.

Bhodi fucked around with this message at 03:58 on Feb 2, 2021

# ? Feb 2, 2021 03:54

Methanar: Sep 26, 2013; by the sex ghost

Hadlock posted:

It's taken us six months to roll out a postgres prometheus exporter

wtf

# ? Feb 2, 2021 04:02

minato: Jun 7, 2004; cutty cain't hang, say 7-up.; Taco Defender

candide posted:

Another thought, why would this be terribad: Create a `data` Docker image that contains nothing but `/data`, and use Docker Compose to start the `dev/data` containers separately along with the proper volume mounts.

Yeah I'd probably go with this, because then I'd retain the ability to easily retrieve the data via a docker pull. Otherwise I'd just use some scripts to install / update the data into some standard location, and bind-mount it read-only into any dev containers.

But before you dig into solving it, what problem exactly are you trying to solve? Seems like everyone else loves the convenience and is willing to put up with the extra build/pull times, but it only irks you. Of course there are multiple solutions, but there's no solution that doesn't negatively impact convenience for someone.

# ? Feb 2, 2021 04:24

Gyshall: Feb 24, 2009; Had a couple of drinks.
Saw a couple of things.

The best part about cloud formation is it being months behind the AWS terraform provider whenever I was on teams that were forced to use it

# ? Feb 2, 2021 04:45

Vanadium: Jan 8, 2005

Do normal people actually use terraform or cloudformation directly to deploy releases of user code/their own apps/w/e? It feels weird to leave the details of how a thing is rolled out to a generic aws resource provisioning tool that couldn't care less about app semantics.

Edit: For the purposes of this post I'm going to assume we're not the only ones who don't just live in a kubernetes cluster 24/7

# ? Feb 2, 2021 04:59

Bhodi: Dec 9, 2007; Oh, it's just a cat.; Pillbug

Terraform is very bad at anything at the OS layer so no, it's not the tool I'd use to deploy apps. I've had the most luck using terraform to stuff the cloud-init with a bootstrap/install script of some kind but SSH scripts are, in general, a bad idea during a terraform run. It's very very brittle because it has almost no error handling or ability to detect/recover from issues. Even the basic AWS delay of assigning a dynamic IP after an instance is provisioned can cause runs to fail, as can any issues with connectivity between the host that you're running terraform on. It's such an issue that there's a giant banner "Provisioners are a Last Resort" in the terraform documentation trying to warn you away from this approach.

Because of this limitation, terraform's not a good tool to orchestrate software provisions or updates - you're going to want to use something else. Once something is "provisioned" the expectation is that it doesn't change, which doesn't mesh with software updates unless you deploy the entire thing from scratch, which terraform is also not able to do as there's no good mechanisms rolling update/redeploys. There are a few blogs that talk about some approaches that get close but those are more hacks that require manual steps (rather than something that's more consistent, repeatable, and able to be put into a automated pipeline). Look elsewhere.

Bhodi fucked around with this message at 05:22 on Feb 2, 2021

# ? Feb 2, 2021 05:09

Hadlock: Nov 9, 2004

12 rats tied together posted:

I used to get really confused when people would hesitate on cloudformation but the vast majority of cfn I've encountered in the wild has been absolutely deranged.

The best part is that the cloud formation gets fed into a not-invented-here custom compiler that then feeds it to S3 that gets picked up by some other process which makes local dev impossible

There's only one diagram explaining the whole thing and it's by the guy who wrote the whole thing from scratch and the diagram is outdated and terrible

Also the custom compiler only speaks json (instead of yaml) and there are no comments on json so you sort of just have to divine what is happening, and sort of guess/memorize a bunch of default behavior that's only defined in the (not exaggerating) 8th nested layer of templates. The learning curve for our cloud formation is steep

Also also, at least two different times they decided to change the folder structure without moving the old files so templates are cross nested through different folder structures

12 rats tied together posted:

absolutely deranged.

Yes

Hadlock fucked around with this message at 06:04 on Feb 2, 2021

# ? Feb 2, 2021 06:01

Methanar: Sep 26, 2013; by the sex ghost

Hadlock posted:

8th nested layer of templates

start over from scratch

# ? Feb 2, 2021 06:05

candide: Jun 16, 2002; The Tipping Point

minato posted:

Yeah I'd probably go with this, because then I'd retain the ability to easily retrieve the data via a docker pull. Otherwise I'd just use some scripts to install / update the data into some standard location, and bind-mount it read-only into any dev containers.

Thanks, this helps.

quote:

.. no solution that doesn't negatively impact convenience for someone.

I failed to mention that I had issues with this initially but it was a tiny nuance and I'd rather please the team as a whole. This has come up again once the pandemic / working from home became the new norm and people are starting to get sick of hour long pull times over VPN.

# ? Feb 2, 2021 15:30

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

Cloud formation frameworks aren�t that bad with Sceptre. I purposefully kept it lightweight and that it�s just an easy way to roll a number of stacks together without having to setup stack sets like AWS wants you to do nowadays. With complex stacks I found that the export and parameter limits of Cloudformation make expanding them tougher than Terraform by raw design, but the tooling is overall easier and the DSL black magic for anyone that can�t understand why you need to have half a dozen Fn::Split statements everywhere to accept a parameter I had to make because we ran out of parameters and had to cram 12 fields into one.

# ? Feb 2, 2021 15:57

Votlook: Aug 20, 2005

Hadlock posted:

We have a team of four, soon to be six, I think I am the only person on the team who is not a dedicated infrastructure-as-code janitor, we're trying to bake everything into this twisted speghetti cloud formation and one of our new guys was so frustrated with the disaster he quit rather than keep working on it, and it's only going to get more weird and more twisted and more speghetti.

This new guy had the right idea, how the hell does it take 6 months to roll out postgres prometheus exporter?
CloudFormation can be a bitch but deploying a simple app should be doable in a few days.

Votlook fucked around with this message at 22:31 on Feb 2, 2021

# ? Feb 2, 2021 21:35

NihilCredo: Jun 6, 2011; iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

If it takes you two weeks to deploy a tool, your infrastructure is hosed up.

If it takes you six months to deploy a tool, your organisation is hosed up.

# ? Feb 2, 2021 21:45

Votlook: Aug 20, 2005

necrobobsledder posted:

Cloud formation frameworks aren�t that bad with Sceptre. I purposefully kept it lightweight and that it�s just an easy way to roll a number of stacks together without having to setup stack sets like AWS wants you to do nowadays. With complex stacks I found that the export and parameter limits of Cloudformation make expanding them tougher than Terraform by raw design, but the tooling is overall easier and the DSL black magic for anyone that can�t understand why you need to have half a dozen Fn::Split statements everywhere to accept a parameter I had to make because we ran out of parameters and had to cram 12 fields into one.

AWS recently increased the limits on parameters and outputs: https://aws.amazon.com/about-aws/whats-new/2020/10/aws-cloudformation-now-supports-increased-limits-on-five-service-quotas/

CloudFormation always felt a bit low level to me, are Pulumi or CDK any good?

# ? Feb 2, 2021 22:36

Hadlock: Nov 9, 2004

Votlook posted:

AWS recently increased the limits on parameters and outputs: https://aws.amazon.com/about-aws/whats-new/2020/10/aws-cloudformation-now-supports-increased-limits-on-five-service-quotas/

CloudFormation always felt a bit low level to me, are Pulumi or CDK any good?

Pulumi, last time I checked was just a code wrapper around terraform bindings, which I guess is sort of the community fork of terraform I was wishing for upthread

# ? Feb 2, 2021 22:44

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

Votlook posted:

AWS recently increased the limits on parameters and outputs: https://aws.amazon.com/about-aws/whats-new/2020/10/aws-cloudformation-now-supports-increased-limits-on-five-service-quotas/

CloudFormation always felt a bit low level to me, are Pulumi or CDK any good?

So, while those limits were raised the sad thing is that monolithic, monotonically growing stacks like at a previous company would still not have worked out with those resource limits. You still have to design a crapload upfront honestly if you�re going to grow a lot. 500 resources up from 200? I had over 1500 resources necessary to deploy a single environment before which at a certain velocity of changes is terribly slow. On the flipside, Terraform�s problem is that the cycle to perform plans is longer and longer.

CDK is fundamentally Moar CloudFormation and carries with it the same problems I�ve been hired to fix many times that none of the Infrastructure as Code tooling accomplishes - developers get roped into writing infrastructure code rather than feature code and make bad infrastructure decisions oftentimes unwittingly that are super difficult to transition away from without causing downtime that may not be affordable or negotiable. This isn�t as much of a death stroke like it used to be with VPC transit gateways making bad VPC subnetting and routing decisions a costly inconvenience that you can pay more to Amazon to hand-wave away turning your technical debt directly into Amazon�s revenue.

Pulumi�s what a lot of people that dislike HCL and want to really use code would prefer because you can use third party general purpose libraries again rather than having to resort to weird stuff like Terratest. I do appreciate the company having the balls to drop their launch party flyers at Hashiconf 2019, but them immediately going managed may hurt adoption more than if they launched as fully open first like the trend has been for like 15 years now with most of our commonly used tools. I�d hoped https://ascode.run/ took off which is about the right level of imperative code vs declarative BS but it doesn�t seem to have the kind of traction or ambition to even get funding.

There�s a lot of options honestly mostly extending into NIH solutions but the problem we have in infrastructure management is that we�re having problems managing complexity beyond a certain level with open OSS solutions before we dig in deep and spend the resources to do things on our terms. I think being conscientious at all times about how much time you spend working around your tools and processes vs actually getting work done with them is a conversation any healthy culture should regularly have. I just found out that my company spent a lot of resources moving to a new company-wide tool before pulling the plug on it, and that�s healthy to stop and avoid more sunk cost instead of trying to launch it to pad internal resumes for promotion bullet points.

# ? Feb 3, 2021 05:59

Mr Shiny Pants: Nov 12, 2012

The way I see it for our situation:

We are not a big company by any stretch of the imagination, but we do have a lot of "tenants" that we manage.
One of the things that I could see Terraform be very good at is the whole: "We have a blueprint of how we would like a certain environment to look" and have it take care of that.

These environments don't change that much but it would be really handy to be able to copy configs of certain resources between tenants and to do the low level security stuff on who gets to login where.

Another thing is having the ability to quickly spin up some test environments on whichever cloud provider makes sense so we can test new technologies without needing to manually install it. (which won't be done if you need to do it by hand)

# ? Feb 3, 2021 07:50

12 rats tied together: Sep 7, 2006

Mr Shiny Pants posted:

"We have a blueprint of how we would like a certain environment to look" and have it take care of that.

You can do this with Terraform, you'll run into some obnoxious problems eventually though unless you just go all-in in copy pasting between tenants. The main thing to watch here is modules, since they exist as this really obvious shiny "this will help me copy paste less code" object, but it can get messy quick if you start trying to handle things like "this tenant differs slightly, so we'll add a condition to the module". IMO: keep all of your conditional stuff in root state, modules should not contain any type of forking logic except things that always exist at least once, but might exist multiple times.

Once you introduce a resource into a module with a count that is 0-n you've opened the pandoras box of terraform bullshit, 1-n is fine though. Also unless something has radically changed since 0.11, you're going to be copy pasting a shitload of code anyway, so I would really de-emphasize trying to make it "DRY" or follow other types of "best practices". Write simple Terraform that does what you want, that's really the only point of the tool, there's no need to also involve the wide array of unsolved software engineering problems.

From purely a technical perspective (requirements, features, implementation details) CloudFormation is the better tool here. You will probably have better luck with Terraform though, at least at first.

# ? Feb 3, 2021 18:29

Erwin: Feb 17, 2006

12 rats tied together posted:

Also unless something has radically changed since 0.11

The entire syntax has radically changed, yeah. Terraform is a pain in the rear end, but it's often the best tool for the job. You're the biggest detractor of it in this thread, and that's totally fine because it deserves heavy criticism, but you often make authoritative statements about it that are incorrect. For instance 0-n resource counts works fine and I'm not sure what you're getting at there.

# ? Feb 3, 2021 18:39

12 rats tied together: Sep 7, 2006

Erwin posted:

The entire syntax has radically changed, yeah. Terraform is a pain in the rear end, but it's often the best tool for the job. You're the biggest detractor of it in this thread, and that's totally fine because it deserves heavy criticism, but you often make authoritative statements about it that are incorrect. For instance 0-n resource counts works fine and I'm not sure what you're getting at there.

Sorry, please feel free to implicitly add "IMO" statements to everything I post. It's true that 0-n counts are supported, but it's not a good (IMO) idea to put them into modules. Using modules successfully in a complex environment is like 10% "being able to write the language" and 90% keeping the dependency graph across module boundaries as simple as possible.

If you're calling a networking resources module and passing a flag that potentially creates either a nat gateway or an internet gateway, but never both, you're going to exponentially increase the complexity of dependent code because the route resource does not expose an abstract "target gateway" parameter, it accepts either nat_gateway_id or gateway_id.

In my experience, the more things start to conditionally exist, the more you run into this problem.

# ? Feb 3, 2021 18:44

drunk mutt: Jul 5, 2011; I just think they're neat

I'm really wondering what y'all are running into causing all this chaos.

I have abstracted multiple modules to do explicit types of configs, and most providers will allow additional resources to be attached to the identifier that's used as an output to the module.

So for example, don't do a dynamic block to handle storage for GCP compute, have a module that handles just propping up the instance and output the self_link URL which the implementation can then associate additional storage to.

From what I've seen, a module should do one thing and allow some slight configuration and anything that has more dynamic needs it's better to address elsewhere for the most part.

Also, I've changed my workflow to do Ansible playbooks that call a generic provision role, the provision role creates the plan to be applied does some checks and applies it if those checks succeed. I've found this to do well passing off the execution to front end developers as they'll break poo poo no matter what and this limits how much they actually break.

# ? Feb 3, 2021 19:45

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

Leaky abstractions are a problem in all forms of software and the difference with infrastructure code and application code differs in its impact. The criticisms I have of our systems management tooling comes down fundamentally to it being super duper slow and awkward to debug compared to what we do in application code that it destroys our feedback cycles. As a result, people that have given up on this and moved toward container orchestration may have traded off a different set of problems. But observability tooling is much easier when there's fewer components that don't emit tracing events. The closest thing we've got in the cloud realm tends to be more auditing oriented rather than systems management like Cloudtrail or Cloud Audit Logs.

# ? Feb 3, 2021 19:48

12 rats tied together: Sep 7, 2006

Follow up to this bad etiquette, unqualified statement I made:

12 rats tied together posted:

Also unless something has radically changed since 0.11, you're going to be copy pasting a shitload of code anyway, so

I downloaded the latest (? 0.14.5) terraform to check this and it seems like you still can't use variables in source or version directives

code:

Error: Variables not allowed

  on main.tf line 5, in terraform:
   5:       version = "${var.try_version}"

Variables may not be used here.

Error: Variables not allowed

  on main.tf line 11, in module "consul":
  11:   source  = "${var.try_source}"

Variables may not be used here.

Keeping these things accurate and up to date is mostly what I mean by copy pasting code around, especially if you version your external dependencies. The actual terraform dsl that writes out resources is usually handled fine by modules.

e: It's also possible I'm doing this wrong because this is way harder to test, but it seems like the "conditionally null parameter key and value" problem is still here:

code:

Error: Invalid index

  on main.tf line 17, in resource "null_resource" "test_nullable":
  17:     test = "${null_resource.is_null[*].id != "" ? null_resource.is_null[0].id : null}"
    |----------------
    | null_resource.is_null is empty tuple

If I set count=0 on "is_null" I get an invalid index error -- in a production scenario this would be a situation where count = 0 or 1 based on whether or not I want that aforementioned nat gateway or internet gateway, or "public vs private" in AWS terms. If I set count=1 this plans and applies just fine. Back in 0.11/early 0.12 days there was a github issue about how the ternary evaluates both sides of an expression, so "this or null" will never work because "this" must always exist.

The Terraform solution to this problem would probably be operator chaining so you'd have some awful like 200 character nested ternary/list coalesce nonsense. I imagine that _probably_ works now, but I don't care enough to find the github issue where apparentlymart explains it in a way that someone who doesn't work at hashicorp can understand.

12 rats tied together fucked around with this message at 20:18 on Feb 3, 2021

# ? Feb 3, 2021 19:57

Docjowles: Apr 9, 2009

necrobobsledder posted:

turning your technical debt directly into Amazon�s revenue

Just wanted to say I appreciate the occasional deeply cynical gems that appear in this thread. Going to steal this one to describe a...concerning number of Amazon services :v:

# ? Feb 3, 2021 20:23

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

You can't define source versions in most languages in general because it's a sort of very, very late binding if you think about it at a layer beyond shared library ABIs. The Terraform HCL graph is parsed multiple times and even the newer language constructs like for loops for resources and modules wind up being somewhere between an imperative language loop and a m4 macro preprocessor. The fact there's constructs like `try` https://www.terraform.io/docs/language/functions/try.html show a leak in the parsing and evaluation system. There's a lot of things I will rail on about HCL that comes directly from its roots in the Puppet graph declaration DSL, and making versioning of resources and modules look like the rest of the language is an extension of going too far with a leaky abstraction. We've had issues with this stuff for ages like in Puppetfiles, librarian files, etc. and it's making the whole infrastructure as code operations paradigm quickly the worst of coding with the worst parts of infrastructure.

# ? Feb 3, 2021 21:29

12 rats tied together: Sep 7, 2006

necrobobsledder posted:

The fact there's constructs like `try` https://www.terraform.io/docs/language/functions/try.html.

I actually did not know about this at all so I popped it open and:

quote:

try evaluates all of its argument expressions in turn and returns the result of the first one that does not produce any errors.

immediately starting laughing out loud. Wow.

For the rest of the post: I skipped puppet/chef early on in my career and landed in an ansible shop which does not suffer from this problem (and instead has a different set of self inflicted problems), so I appreciate the context.

# ? Feb 3, 2021 21:32

drunk mutt: Jul 5, 2011; I just think they're neat

I ask again, what the gently caress are y'all doing?

I have highly complex deployments and none of this poo poo has ever been an issue. See earlier, I also deal with front end developers.

I seriously want to know so I don't do that, you know.

# ? Feb 4, 2021 03:45

Gyshall: Feb 24, 2009; Had a couple of drinks.
Saw a couple of things.

Imagine wanting to use interpolation in providers/versions.

At some point you have to stop abstraction. Terraform began life as a declarative provisioning tool, it's miles ahead of where it was and is still loads better and accessible. I'm of the opinion that it doesn't need try/catch stuff and if you need that or backend interpolation then just write a thin wrapper for it or use something like terragrunt.

# ? Feb 4, 2021 04:21

Blinkz0rz: May 27, 2001; MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS

Gyshall posted:

Imagine wanting to use interpolation in providers/versions.

At some point you have to stop abstraction. Terraform began life as a declarative provisioning tool, it's miles ahead of where it was and is still loads better and accessible. I'm of the opinion that it doesn't need try/catch stuff and if you need that or backend interpolation then just write a thin wrapper for it or use something like terragrunt.

Yeah, dynamic versioning is a pretty awful smell that you're doing something fundamentally at odds with what terraform is designed to do.

12 rats tied together posted:

e: It's also possible I'm doing this wrong because this is way harder to test, but it seems like the "conditionally null parameter key and value" problem is still here:
code:
Error: Invalid index

  on main.tf line 17, in resource "null_resource" "test_nullable":
  17:     test = "${null_resource.is_null[*].id != "" ? null_resource.is_null[0].id : null}"
    |----------------
    | null_resource.is_null is empty tuple
If I set count=0 on "is_null" I get an invalid index error -- in a production scenario this would be a situation where count = 0 or 1 based on whether or not I want that aforementioned nat gateway or internet gateway, or "public vs private" in AWS terms. If I set count=1 this plans and applies just fine. Back in 0.11/early 0.12 days there was a github issue about how the ternary evaluates both sides of an expression, so "this or null" will never work because "this" must always exist.

The Terraform solution to this problem would probably be operator chaining so you'd have some awful like 200 character nested ternary/list coalesce nonsense. I imagine that _probably_ works now, but I don't care enough to find the github issue where apparentlymart explains it in a way that someone who doesn't work at hashicorp can understand.

The established pattern since forever is to have a boolean variable at the module level and then define a resource's count based on the variable's value. Of course it gets complicated with dependent or linked resources but that's kind of what you'd expect with a declarative graph.

# ? Feb 4, 2021 14:28

vanity slug: Jul 20, 2010

i stopped loving around with all that boolean bullshit inside modules when they added count to modules

# ? Feb 4, 2021 15:33

Erwin: Feb 17, 2006

Most of the Terraform problems that 12 rats is pointing out are only problems when you're writing the code, and have established solutions. Yes iteration in Terraform is awfully slow compared to application code, but you can test the results of convoluted logic and once it works, it works. You can even do (slow) TDD with tools like kitchen-terraform or Terratest. Terraform is often the best tool for infrastructure automation largely because of its wide use - you'd rarely be the first to run into a given issue, especially with the main heavily-used providers.

# ? Feb 4, 2021 15:35

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

We are literally using Terragrunt and I had to use try because an expression I was using for materializing a resource conditionally depends upon several variables, outputs, and locals. Conditional provisioning of a resource combined with null_resource and provisioners really mucks with a clean dependency graph and you wind up in the same hell as people that need several Chef / Puppet convergence runs, so yeah avoid provisioners like the Hashicorp people tell you to do (I�m porting all our provisioning code to sit as cloud-init and salt states to escape the provisioning blocks basically). I don�t think we have very complicated modules but trying to stitch together a lot of modules that have conditionally provisioned resources is problematic when mixing Terraform versions in your ecosystem. The count method vs for loops are mutually exclusive with each other and have ramifications to resource coordinates. Essentially, addressing for loops of resources versus count is a set because you want to use a string key for resource collections, count is for an array where you want stable ordering.

There�s a lot of footgun issues possible with Terraform and they can all be fixed with some time doing a bunch of import calls and some state file surgery fundamentally. I can�t say that kind of surgery exists for CloudFormation despite all the new features they�ve added to catch up with Terraform. When I�ve had to do anything like rename stuff in CF, that basically is a death sentence for the resource and I can�t change it for its lifetime in production.

Check out the CloudPosse Terraform modules for examples of complex modules that are probably better than what you�d cobble together. What sucks for me currently is our VPN configuration is wonky enough I can�t download modules in a clean repo from end to end unless I disconnect first to do an init, and the only other workaround is to recompile Terraform against a custom Go source that overrides all uses of net/http and their own custom dependency downloader package. So that�s my excuse for why we don�t use module registries at all

# ? Feb 4, 2021 16:55

New Yorp New Yorp: Jul 18, 2003; Only in Kenya.; Pillbug

I�m being pulled into an effort to upgrade a bunch of Terraform from 11 to 12. How bad is it likely to be? I have no idea what their stuff looks like right now so I'm hoping it's not a nightmare to begin with.

# ? Feb 4, 2021 20:16

Methanar: Sep 26, 2013; by the sex ghost

New Yorp New Yorp posted:

I�m being pulled into an effort to upgrade a bunch of Terraform from 11 to 12. How bad is it likely to be? I have no idea what their stuff looks like right now so I'm hoping it's not a nightmare to begin with.

It might be bad.

There is a auto syntax upgrade option, but it's not guaranteed to fix everything.

code:

    0.12upgrade        Rewrites pre-0.12 module source code for v0.12

# ? Feb 4, 2021 20:31

The Iron Rose: May 12, 2012; Cat Army

New Yorp New Yorp posted:

I�m being pulled into an effort to upgrade a bunch of Terraform from 11 to 12. How bad is it likely to be? I have no idea what their stuff looks like right now so I'm hoping it's not a nightmare to begin with.

it's not great. terraform version management in general is also not great, it's difficult to cleanly upgrade when working with many many different developers.

I might even go so far as to say terraform isn't great, but it does do many things well and it's better than cloudformation (lol parameters)

# ? Feb 4, 2021 20:41

12 rats tied together: Sep 7, 2006

Using variables in a terraform's version or source field is dynamic only in the sense that not manually updating module version across 300 module invocations every time you update a module is "dynamic". Having a "default module version" and a "not default module version" is extremely a best practice, even ignoring writing software in general we've been doing this in config files for decades. Same with provider, the PR to update the AWS provider across your terraform repository should be a one liner.

I'm extremely confused by this sentiment, if I can't specify a default configuration in a tool it's a busted tool, whether it was intentionally(?) designed that way or not.

New Yorp New Yorp posted:

I'm being pulled into an effort to upgrade a bunch of Terraform from 11 to 12. How bad is it likely to be? I have no idea what their stuff looks like right now so I'm hoping it's not a nightmare to begin with.

I had to relearn terraform from 0.11 for my "0.14.5 still sucks" post above and I thought it was fine. I would not use the auto updater though, but I always hate that kind of poo poo.

# ? Feb 4, 2021 20:47

12 rats tied together: Sep 7, 2006

necrobobsledder posted:

When I�ve had to do anything like rename stuff in CF, that basically is a death sentence for the resource and I can�t change it for its lifetime in production.
[...]
Check out the CloudPosse Terraform modules for examples of complex modules that are probably better than what you�d cobble together.

I did this because the official terraform example for "a vpc" is a literal nightmare, and while this stuff is "better", the terraform file in the subnets submodule that decides whether or not to create a nat gateway has 8 distinct ternaries in it, again, this is the dedicated file for only whether or not a nat gateway is created.

I wouldn't even say it's better than the Terraform I cobbled together to create subnets in the past, nevermind the tool that replaced Terraform. I also don't see a way to specify how many private subnets I want in a VPC, or any type of config for mapping private subnets to nat gateways, which is a red flag but could also just be user error on my part.

re: renaming stuff in CloudFormation, you can rename most resources at most times. The resources you can't rename are due to API limitations, which Terraform also suffers from. CloudFormation at least documents what happens when you change any property whereas with Terraform you'll have to run a plan instead.

If you're talking about CloudFormation's logical resource id, yes that used to actually be impossible to change. It's possible now but the only constraint on the logical resource id is that it needs to be unique, so having to change it in the first place is basically template author error. You might as well just generate uuids for every resource id if this is a consistent problem.

# ? Feb 4, 2021 21:00

vanity slug: Jul 20, 2010

I don't think that module is official. It's one of Anton Babenko's modules iirc and he has a bad case of overengineering.

# ? Feb 4, 2021 22:42

Adbot: ADBOT LOVES YOU

# ? Jun 5, 2024 08:21

New Yorp New Yorp: Jul 18, 2003; Only in Kenya.; Pillbug

Methanar posted:

It might be bad.

There is a auto syntax upgrade option, but it's not guaranteed to fix everything.
code:
    0.12upgrade        Rewrites pre-0.12 module source code for v0.12

Yeah, that's kind of what I figured. This organization is all sorts of messed up with the way they're doing Terraform, I know that from past experience working with them where it took 3 months to get an environment stood up because they wouldn't let us see or modify anything and had to communicate with them entirely via ServiceNow ticket. I'm going to peel back the covers in a week or two and see just how bad it really is. Coding horrors posts incoming...

# ? Feb 5, 2021 00:12

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Continuous Integration/build engineering/devops thread

«‹›158 »