Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Vulture Culture posted:

First, I'm going to challenge "no syncing of environmental state data in/out". You can use Consul, etcd, any S3-compatible datastore (e.g. Ceph or OpenStack Swift), local Artifactory, or a REST endpoint to handle first-party storage of the state information. Any of these options will work fine in an airgapped configuration. If you're doing cloud in an airgapped environment, I assume you're running in an OpenStack environment, so you should be able to just use whatever you're currently running for object storage.

(e: unless I misread you as "there's no conceivable way to share state data from one environment to another", in which case it sounds like you have very little managed infrastructure to share anyway.)

I have a few basic guiding principles for managing our Terraform configurations, which can be fairly complex at times:

  • Terraform remote state isn't just for synchronizing the state of a project between multiple computers or developers. You can also import that remote state with the terraform_remote_state data source, which allows you to abstract out even derived pieces of configuration data in a way where it can be shared between multiple projects. As an example: if you want multiple dev environments sharing common infrastructure in a common VPC, use a module to instantiate it, then export the IDs of important resources via remote state. You can import those IDs in your environment-specific projects.
  • As a corollary to the above, migrating resources between projects is still a work in progress [disaster] in Terraform, so it's better to add too many layers than too few.
  • Resist the temptation to build everything into Terraform. It shouldn't be a God Object for your infrastructure any more than Puppet or any other declarative configuration management tool. Tie into other tools everywhere they're appropriate.
  • Your projects may end up encapsulating thousands or tens of thousands of resources. The state of each resource needs to be queried every time you create a Terraform plan. Consider this when you figure out where the boundaries between projects should lie.
  • Modules are your friend. Use them freely, but don't try to stretch them too far or make their logic too complicated. If you need anything more complicated than count driving your conditional logic, make a different module. A little bit of copying and pasting is way better than a series of janky, completely broken abstractions.
  • Keep your module hierarchies flat. Include as many modules as you want from your project, but don't import modules from other modules. Pass data back down from your modules via exported variables, aggregate it together in your top-level project files, and pass that data to higher-level modules from there. You'll be much happier. Your code will also be much more composable.
  • Terraform has a Workspaces configuration, which used to be Environments. I've never used it. I use a separate project per environment to keep things completely, totally unambiguous and limit opportunities for pilot error.

For sure. In environments that aren't legally bound up by regulatory compliance requirements, it's sanest to push responsibility as close to the application owners as possible. You own it, you run it. It's your department's AWS account, do whatever you want. Use whatever tools, or no tools.
The airgap is a security control, not a technical one. The only way data crosses boundaries is through very specific channels, with nothing automated. It kinda sucks. We push a "release" bundle of git code, it goes through our security dept for examination, and then they place it in the git repo of the requested environment for us. We're on various vpcs within aws govcloud, running our own kubernetes in ec2 (eks not offered yet), with each site having separate accounts.

From what I was reading, I think I can use the same terraform config in our sync'd git repo to manage multiple environments with remote state, but pointing to a different s3 bucket per site, and then use workspaces to handle the prod/stage breakout within the site. Alternately, we could break the different prod/stage environments within sites out into their own modules.

Then again, limiting opportunities for pilot error is fairly important. We were worried that having a separate project per application would clutter up our repo too much, but that may be a better way to go - except that then there's no easy way to track overall what terraform has deployed, and if someone wants to make a one-off box for testing, they need to fork the primary project repo and make their own just for one box, which seems a little wasteful and annoying. Because of the sync issue - we have to release and package each repo as a zip separately - we'd really prefer a single repo with different terraform config files within it that you specify at runtime. Workspaces would probably solve that, except that we can't have workspaces be both the prod/stage differentiation and also the app, unless we want to name the workspaces app-stage and app-prod, which is uh. no.
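Roughly what I'm picturing, as a sketch (names made up, don't hold me to exact syntax): the backend block stays empty, each site supplies its own bucket at init time via -backend-config, and workspaces carry the prod/stage split inside the site.

code:
# backend stays empty; each site has its own small config file
terraform {
  backend "s3" {}
}

# terraform init -backend-config=site-a.config
# terraform workspace new stage ; terraform workspace select stage

resource "aws_instance" "app" {
  ami           = var.ami_id          # assumes var.ami_id is defined elsewhere
  instance_type = "t3.medium"

  tags = {
    Name = "app-${terraform.workspace}"   # "stage" or "prod"
  }
}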

I also need to find a way of giving terraform derived variables for the vpcs and such when creating hosts, and to figure out what makes sense in this context. I think modules are likely the answer here. Fortunately, we're really only looking at terraform to create random one-off machines for apps that don't belong in containers, and since we're using chef to manage apps, terraform is strictly our instance deployment manager.
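For the derived vpc variables, the remote state hand-off from the post above is probably the shape of it - a sketch with made-up names (exact attribute syntax depends on the terraform version):

code:
# network/shared project: export the interesting IDs
output "vpc_id" {
  value = aws_vpc.main.id
}

# app project: pull them back in via the remote state data source
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "site-a-terraform-state"
    key    = "network"
    region = "us-gov-west-1"
  }
}

resource "aws_security_group" "app" {
  name   = "app"
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}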

The hierarchy thing is a good thing to note, definitely. I can see how this can get complex, fast.

Bhodi fucked around with this message at 21:55 on Oct 27, 2018


Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
Anyone else going to re:invent? Can't wait to look at all the stuff I can't / won't use :confuoot:

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
Ah yeah, that one's next month. Our group split, half are going to reinvent and half are going there.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

toadoftoadhall posted:

When commissioning a new server, I run through a mental checklist I've cobbled together from linode and digital ocean tutorials for CentOS or Ubuntu machines. Eg:
(1) upload public key
(2) add non-root user
(3) add user to groups
(4) configure firewall
(5) ...
(6) N

Hasn't been an issue, because they've been my approaching-nil traffic personal VPSs. Weaknesses are obvious. What does "doing it right" involve?
Not doing this every time

Less pithy answer: pick your favorite STIG/hardening guide and turn it into an image / kickstart / autoinstall. Basically, don't do it by hand
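e.g. something like this with packer (a sketch in the newer HCL syntax, script paths made up) - the checklist gets baked into the image once instead of run by hand on every box:

code:
locals {
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}

source "amazon-ebs" "hardened_base" {
  region        = "us-east-1"
  instance_type = "t3.micro"
  ssh_username  = "ec2-user"
  ami_name      = "hardened-base-${local.timestamp}"

  source_ami_filter {
    filters = {
      name                = "amzn2-ami-hvm-*-x86_64-gp2"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
    owners      = ["amazon"]
    most_recent = true
  }
}

build {
  sources = ["source.amazon-ebs.hardened_base"]

  provisioner "shell" {
    scripts = [
      "hardening/add-users.sh",   # non-root user + groups
      "hardening/firewall.sh",    # firewall baseline
      "hardening/stig.sh",        # whatever hardening guide you follow
    ]
  }
}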

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Docjowles posted:

Current re:Invent status: Waiting in an hour long line to even register for the conference. Going to miss my first session. No food or coffee because everyplace that serves those also has an hour long line.

My coworker went to a different venue, was registered and had coffee in like 10 minutes. Currently researching the legality of murder in Nevada.
it's pretty nuts. I got in last night at 2am and got up early because I knew the badge line was gonna be long

the shuttle buses are the silliest thing to me, it takes an hour to get ferried across the street because this city is hell on earth. it's like everything is done and designed in the worst way possible, deliberately

it's 4pm and feels like 10pm. i spent all day getting pitched to, and what little I learned could have been conveyed in a 10 minute blog post; i want to pull the plug on this entire week

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

necrobobsledder posted:

How do you guys do continuous prod deployments to systems that have message-queue-based communication and handle heterogeneous application component versions consuming from shared queues? In a synchronous processing world that'd be an endpoint handled by different versions of the service. We have a web request frontend, clients upload large artifacts separately (S3, their own hosting service, etc.), reference them in their API request, and processing is picked up asynchronously via SQS queues serialized as < 1 KB XML messages across several upstream services that self-report the status of their tasks to the primary Aurora MySQL DB. I'm trying to set up an architecture in AWS with a canary / blue-green approach using environment-specific SQS queues, load balancers, and instances, but shared data stores like S3 buckets and DBs. DB updates to apps will be done by mutating their views, not by changing the actual underlying DB structures (the latency hit isn't measurable for us in tests so far). This would allow us to make a bunch of changes in production as necessary, cherry-pick messages from queues to run through a deployment candidate's queues, and roll back changes faster than we do now (a deployment process straight out of 1995 but in AWS and with 90% of our services that can't be shut down on demand without losing customer data, which really, really, really is a pain in the rear end)
LOL, we don't. One of our primary apps uses zookeeper, which is a mid-aughts piece of poo poo that requires every other node to be listed in the config file on startup and is brittle as hell - absolutely not designed for cloud or containers - and because everything relies on it, we have to bounce the entire thing every single time. There's no wedging some legacy apps into the blue/green canary deployment framework no matter how much you want to. They'd take a complete rewrite.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

StabbinHobo posted:

this somewhat impossible recursive chasing of a way to abstract away a state assumption is, in large part, why kafka was invented.

jury is still out on if thats a good thing (i lean yes).

sorry that doesn't really help you though because "rewrite everything to upgrade from rmq to kafka" is about as helpful as "install linux problem solved".
Funny joke, some java dev on the monolithic app I'm talking about decided this exact thing about two years ago, so now we run two problems. Only half of the app's been ported, so we have both zk and kafka for the foreseeable future during the "transition period"

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
it's all https://github.com/brandonhilkert/fucking_shell_scripts

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Blinkz0rz posted:

Hot take: deploying kubernetes (properly) and maintaining deployment systems on top of it takes more work (and reaps less rewards) than a mostly working existing system.

This side of the industry loves new toys but gently caress me if kubernetes adoption for its own sake is the loving dumbest thing I've ever seen.
For us, it's a way to get from a desired state (chef) environment to a baked image (containers, ami) environment and also a way to condense a huge dozen-server-spanning application into a smaller number of EC2 instances. I'll let you know next month if it actually works, but that's the theory.

Of course, we're not deploying kubernetes properly, we're literally throwing together whatever we can get into production as fast as possible because we were given 4 months to go from 0 to fully deployed, with December being one of those months. So it's a toss-up as to how well it actually performs.

We already ran into this unresolved issue running on m5s: https://github.com/kubernetes/kubernetes/issues/49926 and so I look forward to stepping on other land mines.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
We're using artifactory because it lets us host all of our other repos (rpm, nuget, gem, etc) at the same time, and because it's S3-backed it's fairly economical and we don't really have to worry about space issues.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

smackfu posted:

Ironically often the reason GitHub is blocked is because people check company code into it so they can work on it at home. Now that they have free private repos maybe this will be less of a problem.
people should know by now that you can't block stupid

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
Anyone have a good aws terraform example config for something like that? We're bringing up a new vpc from scratch and want to switch to autogenerating the subnets, sgs, iam roles and such, and I don't really want to fall into any obvious traps.

Any good whitepapers or blogs on this? Like, some vpcs are nearly permanent, so you might want a different state store for them than for your apps just to prevent accidents? Stuff like that.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
That terraform blog post was amazing, I need more words like that.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
The first two bullet points work fine with github; you're describing tags, and your test env is labeling all your commits to your feature branches with whether they passed your tests. You can absolutely restrict pull requests to only tags. Depending on the frequency of commits / size of your dev team you may not need the two-tiered approach that you laid out, and if you do, it's more commonly implemented as unit testing feature branches (your Test), merging into dev if passing, and then periodically tagging dev branch commits for integration testing (this would be manual in your case; sometimes it's weekly or daily or whatever) as a prerequisite for merging a release into master. If it ends up failing, you just do an additional commit into dev from your feature branch and kick the test off again - it's not really necessary to track it back to the commit on the feature branch like you're suggesting.

The benefit of doing it this way is that you can test multiple feature commits at the same time on a periodic basis, it conveniently follows common business requirements like sprints and quarterly releases, and if you have REALLY long tests you can tune the automated testing to fit them instead of having them queue behind each other as devs frantically try to get their features in at 3pm on a friday before the end of the sprint.

Bhodi fucked around with this message at 03:23 on Mar 13, 2019

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

FISHMANPET posted:

We're way more on the ops side than dev side, so we basically have zero formal software development process requirements. And generally the changes we're working on are small enough that only one person is working on them. We don't do "releases" we just push code when we write it. And we've never used tags (should we be?).

I'm sure this isn't unique but a lot of our code depends on a ton of other stuff so integration testing requires we have basically a mirror of a lot of our environment, and each runbook needs something radically different. Testing our self service server builds requires a separate form to accept submissions from. Testing code that runs during our server build requires modifications to our server build process. Testing code that updates our inventory requires a bunch of test Google documents, etc etc. So it would be nice to have all those environments setup so upon doing something (pushing to a specific test branch) the code gets deployed in whatever way is appropriate to test it so after that we can fill out the form and submit a server request, or build a server that will run the test code, or modify the test google documents instead of the prod...

Maybe we're small enough that I'm over thinking it. Maybe I should just start setting up those test scenarios and set up our deploy automation to start doing deploys when it sees commits to branches other than master.
Whoops, totally missed this, I kinda forgot this thread existed. Yeah, I think your gut is probably right; the best thing to do is set up a simple case and try it out. You can always add more complexity later, but it's very, very difficult to reduce complexity once it's been added. If you've got manual steps you're going to find it difficult to close the CI pipeline loop.

Tags are good, but only if you care about seeing whether a specific commit passed tests at a glance. Making releases in github does functionally the same thing.


For my own stuff (I may have asked this before): anyone have good terraform whitepapers on infra design for multiple environments and app deployment with a CI/CD pipeline? We're building out a new env from scratch and it's been decreed that we're not going to be using ansible and jenkins will be the orchestrator, so I need to figure out a way to wedge absolutely everything I can into a terraform git repo, including application configuration. Does it even have a templating feature? I'll probably be leveraging our existing chef infra for the hard stuff, but woof, it's going to suck to split code like that.
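e: to partially answer my own question - 0.12 has templatefile() and older versions have the template_file data source, so at minimum the config rendering can live in the repo. A sketch with made-up names:

code:
resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.medium"

  # render templates/app.conf.tpl with the given variables at plan time
  user_data = templatefile("${path.module}/templates/app.conf.tpl", {
    environment = var.environment
    db_host     = var.db_host
  })
}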

Bhodi fucked around with this message at 22:22 on Apr 21, 2019

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

chutwig posted:

I would recommend spending some time getting to know Packer so that you can build AMIs. There's a temptation to use Chef to both lay down your baseline and then configure instance-specific settings, which is problematic because it takes more time to run and you need to add in a reboot for security patches/kernel updates to take effect, which is annoying to deal with. Moving the baseline setup into Packer means you spend less time converging, don't need to reboot, minimize the complexity of the cookbooks, and have a known good AMI built on a schedule that you can use across the rest of your infrastructure.

Are you planning to use Chef in client-server mode or chef-zero mode? If client-server, have you considered how you plan to handle key distribution or reaping dead nodes from the Chef server? There are valid reasons to not use Ansible or SaltStack, but Jenkins probably isn't the right tool to replace them.
We're not planning to build a separate AMI per app. We've got two known-good, patched, and security-vetted AMIs generated through a different process: one for docker (contains a much larger partition for the image cache) and one for general use - both run chef-client on boot. We have several apps that are already using chef (server) to provision, so we'll be leveraging previously written cookbooks for those apps. We're planning to use only terraform plans to build out hosts for apps, so the decommissioning of both dns and the chef node+client will happen in the plan itself (through the chef provider?). Most plans will either build a box and launch a local docker container through the docker provider or run the desired chef app cookbook from the run list. My tentative plan is to have a separate testing tfvars file that we can hook into a ci pipeline in our test space for testing pull requests, and have the chef cookbook repo and the terraform repo both execute the same test via a Jenkinsfile in the root of the terraform repo.
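The instance side of that would look roughly like this (sketch only; AMI, cookbook names, and exact chef flags made up / dependent on your chef setup):

code:
resource "aws_instance" "app" {
  ami           = var.general_use_ami_id   # the vetted AMI that runs chef-client on boot
  instance_type = "t3.medium"

  # kick off the app cookbook via chef-client (exact flags depend on your chef setup)
  user_data = <<-EOF
    #!/bin/bash
    chef-client -r 'recipe[my_app]'
  EOF

  tags = {
    Name = "my-app"
  }
}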

Bhodi fucked around with this message at 20:43 on Apr 22, 2019

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Umbreon posted:

If anyone here has some spare time to answer:

What's a day in this career field like? What do you do every day, and what are some of the more difficult parts of your job?(and how do you handle them?)
If you could explain what your reason for asking is, people could reply with a more tailored focus on the things you are specifically interested in.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
I told the interns this year that my job is "computer janitor". I helpfully explained that I "Janitor the computers, you know, tidy up the cloud"

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
I'm getting really fed up with declarative poo poo for systems management and just want to go back to procedural

Things really do run in cycles, don't they. We're back to fancy shell scripts

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Necronomicon posted:

Can anybody provide some conventional wisdom re: Terraform backends in AWS? Specifically regarding things like multiple managed environments. Should each deployment have its own specific S3 bucket and DynamoDB table? For instance, I currently have four deployments - Company A Staging, Company A Production, Company B Staging, and Company B Production. Is there a clever way of keeping all of those state and lock files in the same location to keep things nice and clean, or is it better for them each to have their own isolated environment?

I started from scratch about a month ago, mostly working off the Gruntwork Terraform guides, so I might have missed some important bits along the way.
Absolutely use separate state files or you WILL be sorry. In AWS, use dynamo and S3. You can share dynamo tables (the key is per root module name) and share s3 buckets (the key is the directory name within the bucket) for cleanliness. All you need is

backend.config posted:

bucket = "my-s3-bucket-name"
dynamodb_table = "terraform-state-lock-dynamo"
key = "whatever-name-you-want-maybe-module-name-but-this-becomes-a-dirname-in-bucket"

backend.tf posted:

terraform {
  backend "s3" {
    encrypt = true
    region  = "your-favorite-aws-region"
  }
}
then you just "terraform init -backend-config backend.config -upgrade" as normal


In fact, I advocate for multiple state files within the env: if you are using terraform to deploy EVERYTHING, I highly, HIGHLY suggest you isolate your vpc, security group, and IAM stuff from ec2 instance deployment and management. Ignore this at your own peril.
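e.g. same bucket and table as above, just different keys per layer:

code:
# network/backend.config - vpc, security group, and IAM state only
bucket         = "my-s3-bucket-name"
dynamodb_table = "terraform-state-lock-dynamo"
key            = "network"

# compute/backend.config - instance deployment state, kept separate
bucket         = "my-s3-bucket-name"
dynamodb_table = "terraform-state-lock-dynamo"
key            = "compute"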

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Necronomicon posted:

...but Terraform yelled at me, since apparently you can't use variables or expressions within a backend config. So I'm stuck hard-coding (at the very least) the key for every single deployed environment, which annoys the hell out of me.
Hopefully your deployments are in different directories that only contain your environment variables, and then you import submodules that actually do the work... If that's the case, all you need is one separate backend file per directory. Which is, yeah, annoying, but it doesn't even rank in the top 10 terraform annoyances
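Something shaped like this is what I mean (made-up layout):

code:
environments/
  company-a-prod/
    backend.config   # bucket/table/key for this env only
    main.tf          # just module calls + env-specific variables
  company-a-staging/
    backend.config
    main.tf
modules/
  vpc/
  app/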

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
The problem with that sentence is that "should" has to be bolded, underlined, and in 24 point font

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Vulture Culture posted:

One extremely low-lift way to resolve this is to use remote state with S3 and object versioning on the bucket.
I really hope everyone who is using remote state on S3 already has versioning enabled; if not, go do this immediately!
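If the state bucket itself is managed in terraform, it's a couple of lines (sketch, bucket name made up):

code:
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-terraform-state-bucket"

  versioning {
    enabled = true
  }

  lifecycle {
    prevent_destroy = true
  }
}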

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
flux good tho?

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
Whereas I think of cfengine (of which he was the author), chef, and the application of promise theory to individual servers in general as fundamentally flawed and outdated ways of managing systems. But I'm not going to write a 500-page treatise about it

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Zorak of Michigan posted:

Re container chat, my org is still in its infancy in containerizing workloads. I've been advocating Kubernetes because, when I tinkered with Swarm, I couldn't imagine it scaling up to the number of different teams I would hope would eventually be using our container environment. Is there something easier to live with for an on-prem deployment than Kubernetes that can still support multiple siloed teams deploying to it?
Other than docker-compose? no. If you use docker swarm, you're going to regret it.

IMO docker-compose is good enough for a majority of stuff that doesn't aggressively autoscale. The last few pages talk a bit about this.

Bhodi fucked around with this message at 02:28 on Dec 18, 2019

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

12 rats tied together posted:

There will always be tools to glue together and you'll always have to glue them together with a mixture of automation and human process. With this in mind I judge a tool mostly by its ability to accomplish what I need it to, and for it to play nicely with other tools and arbitrary code. There's a sweet spot in there that obviously hugely varies based on your org, but in mine Terraform has a long way to go before it is "better enough" than a locally optimal piece of tech (ARM, Cloudformation, ROS, etc) to justify using, just as an example.
The giant bolded "Provisioners are a last resort" header paragraph in the terraform docs was added in 0.12.x only a few months ago, which should give you an idea of the maturity of the product.

We have a gently caress terraform "FTF" box on the whiteboard and everyone adds a tick mark when they discover something dumb, like the fact that you can't use index on modules, along with the auto-closed or stale-as-hell git issues full of people begging for support.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
I think fargate is what they're trying to sell in that niche but I've never used it. If you are trying to go from docker images -> :yaycloud: the gold standard is gke and everyone else has a long way to catch up. EKS is a pretty lovely offering because you still have to manage all the hard poo poo yourself, and they make you pay a premium for it on top of the hassle.

It's kind of like directory service which is another sub-par offering that is almost worse than just running the poo poo on your own.

Bhodi fucked around with this message at 23:19 on Feb 12, 2020

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
Remember: AWS hates you.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

CyberPingu posted:

Yeah I went down the output and dependency route. Cheers.
fyi we've got a massive list of terraform problems we've discovered and this one is at the top of it. The only other choice is using a remote state store rather than dependencies, but be aware that dependencies may not work as expected unless you create a dummy dependency variable within the module itself to force terraform to do the right thing.
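The dummy-variable trick looks roughly like this (a sketch, not our actual code; names made up):

code:
# module side (modules/app/variables.tf)
variable "ami_id" {}
variable "dependency_hook" {
  description = "pass any output from the upstream module here, purely to force ordering"
  default     = ""
}

# module side: reference the hook somewhere harmless so the graph edge actually exists
resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.medium"
  tags = {
    # the value doesn't matter; referencing it is what creates the dependency
    DependencyHook = var.dependency_hook
  }
}

# caller side
module "app" {
  source          = "./modules/app"
  ami_id          = var.ami_id
  dependency_hook = module.network.vpc_id
}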

here's the git issue - feel free to throw your pleas on top of the multiple year thread with hundreds of posts:
https://github.com/hashicorp/terraform/issues/17101

I've got a dozen git issues right behind that one that we've run into and documented, and the last time we had hashicorp on the phone for renewal talks we pinned them to the wall about their lack of responsiveness to these kinds of multi-year, architecture-destroying issues. I could write hundreds of words on the problems we've discovered and our lovely band-aid workarounds as we try to move to code-driven deployment.

I almost want to type it up and publish in blog format to at least make people aware of the absolutely massive list of gotchas that terraform has, things we wish we knew a year ago when we decided to use the new hotness (pronounced hot mess).

the tl;dr is modules suck, fail at encapsulation and only have partial functionality - critical things like looping and dependencies are missing or only partially implemented.

Bhodi fucked around with this message at 17:44 on Feb 27, 2020

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Blinkz0rz posted:

What's the current hotness for managing Jenkins job definitions in code? Is it still pipelines with Jenkinsfiles in the project repo or something else?
To go against the crowd, we (I) decided against jenkins job builder because we already have consistent repos for building docker containers, terraform, stuff like that, connected to github with a stub Jenkinsfile launching a shared library, and our "end users" of jenkins are developers somewhat unfamiliar with the garbage that is build engineering.

Blue Ocean's made pretty large strides with the ease of setting up new Jenkinsfile-style jobs; setting up a new repo is pretty much 4 clicks of the next button, which was a perfect low-effort solution to our end users' needs.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

necrobobsledder posted:

Shared libraries come with their own baggage of fun and while Jenkins is JVM based it’s never been easy for me to write pipelines or jobs in anything except Groovy. Like if I want to try Kotlin for our jobs, that’s not going to be convenient as I re-run jobs repeatedly to figure out another class loader problem when using anything other than plain old Java and Jenkins classes. It fits user hostile while being “user friendly” in its own universe of misery.

Seriously, what about this makes one think “easy to test CI”? https://medium.com/disney-streaming/testing-jenkins-shared-libraries-4d4939406fa2
gently caress that garbage. You CAN use shared libraries to import groovy and do native tests and go down the groovy rabbithole but we only use it for a "load and execute this shared declarative jenkinsfile from this one repo" stub.

The jenkinsfile itself is similar to the one below, with a bunch of predefined steps which lint branches, deploy and test a dev instance/container on PRs, and tag/upload on the master branch. It's just 200 lines of generic code that pulls in config files/secrets from vault and runs shell commands. You'd be insane to try and develop your actual tests in groovy. You can use conditional build steps to skip specific stages, but in the end jenkins is best used for just some really simple flow control goo that's executed through hooks from your source control, like line 80 in this:

https://github.com/jenkinsci/pipeline-examples/blob/master/declarative-examples/jenkinsfile-examples/mavenDocker.groovy

We use a jenkins docker image and launch all our jobs as docker containers. For our tests, all jenkins does is execute test/*.sh, and if anything returns non-zero the job fails. The scripts in the test directory can execute complex rspec code or sometimes something as basic as a curl test against a built container. This decoupling of tests has the advantage of being tool-agnostic and allows you to test outside of the jenkins pipeline with just a local docker daemon and a local checkout of the repo.
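Stripped way down, the stub-plus-shared-library split looks something like this (library and step names made up):

code:
// Jenkinsfile in each application repo: just load the shared library and go
@Library('ci-shared-lib') _
standardPipeline()

// vars/standardPipeline.groovy in the shared library repo (simplified)
def call() {
  pipeline {
    agent { docker { image 'build-tools:latest' } }
    stages {
      stage('Test') {
        steps {
          // every script under test/ must exit zero or the job fails
          sh 'set -e; for t in test/*.sh; do "$t"; done'
        }
      }
    }
  }
}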

All of this could easily be ported to concourse or whatever CI hotness you choose. It CAN be very complex but it really shouldn't be.

Bhodi fucked around with this message at 05:08 on Mar 3, 2020

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

The Fool posted:

Also known as “the real world” for a ton of enterprises
I've been screaming for 20 years and haven't stopped screaming yet

Vulture Culture posted:

Yeah, I've never been able to run a Helm chart in production without modifying something about it
Our k8s lead absolutely hates helm and after using it for a while I'm inclined to agree. It's billed as deployment-made-easy but it's really just a glorified templating engine that you end up having to endlessly customize anyway. And then there's tiller. That drat thing just will not cooperate long-term; it's almost magical how it manages to stop itself in various ways.

Flux seems to work better, at least for our needs.

Bhodi fucked around with this message at 22:10 on Apr 17, 2020

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
that one is garbage but this one is pretty good https://github.com/terraform-aws-modules/terraform-aws-vpc
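Typical usage is something like this (CIDRs and names made up):

code:
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "main"
  cidr = "10.10.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.10.1.0/24", "10.10.2.0/24", "10.10.3.0/24"]
  public_subnets  = ["10.10.101.0/24", "10.10.102.0/24", "10.10.103.0/24"]

  enable_nat_gateway = true

  tags = {
    Environment = "dev"
  }
}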

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

12 rats tied together posted:

I agree the various "with_", "loop_control", etc, features in ansible have all been really bad. J2's "{% for %}" though is totally fine and coincidentally it also hasn't changed since 2011 or whatever.

An ansible feature that a lot of people ignore because it sounds scary is custom filter plugins. Add a file to your repo under ./plugins/filter/ and define a very simple pluginbase subclass and you can pipe inputs from ansible directly into a full python interpreter and then get results back, instead of trying to juggle a series of "info modules -> set fact -> set fact transpose to list -> loop_control_double_nested_iterator_with_index" or whatever other nonsense exists today.

I've reviewed coworker PRs where we turn an entire tasks file in a role into a single item with a single filter call. Same deal if you're trying to get values from some external source and you don't want to chain together all of the various info modules: just write a lookup plugin, which you drop inside ./plugins/lookup.

I have a lookup plugin that turns the XML response from AWS' vpn connection service into a simple dictionary that I plug into a single downstream template call, that configures a full mesh sts vpn network on a bunch of vyos routers and a bunch of vpn gateways. It's like 15 lines of python, and I get to use an actual xml library, instead of 15 pages of module calls and trying to use "| from_xml" or whatever the gently caress.

The "stdlib" ansible-playbook interface is extremely bloated because they're trying to cater to all kinds of people with wildly variant levels of python experience. If you know what subclassing is, though, just pop the hood and go hog wild. It is extremely friendly to you doing this.

e: I am extremely interested in salt though (esp. salt reactor, which I think I discovered exists from you linking it in this very thread), unfortunately it seems like not a lot of people are actually using it so my chances of job hopping to get some experience with it are pretty slim.
I'm stuck in this hell right now, trying to find the best way to build a multi-platform file using yum_repository from a yaml structure, but apparently you can't have an inner and outer loop in ansible that traverses a tree with different-sized leaves without include_tasks? Because the jinja2 product filter only lets you reduce, not reference? Like, WTF?

I'm starting in on ansible hard for the first time and yeah, maybe i should just make a custom filter i guess? It shouldn't be this hard to parse this and end up with index names like 7.1_repo1, 7.1_repo2, and 7.2_repo3:
pre:
repos:
  rhel7:
    "7.1":
      - name: repo1
        baseurl: http://whatever1
      - name: repo2
        baseurl: http://whatever2
    "7.2":
      - name: repo3
        baseurl: http://whatever3
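e: for reference, the kind of filter I have in mind would be something like this (untested sketch; point filter_plugins at the directory in ansible.cfg):

code:
# plugins/filter/flatten_repos.py
def flatten_repos(repos):
    """Turn {"7.1": [{"name": "repo1", ...}], ...} into a flat list with
    ids like 7.1_repo1 so a single loop can feed yum_repository."""
    out = []
    for release, entries in repos.items():
        for entry in entries:
            item = dict(entry)
            item["id"] = "%s_%s" % (release, entry["name"])
            out.append(item)
    return out


class FilterModule(object):
    def filters(self):
        return {"flatten_repos": flatten_repos}

# usage in a task:
#   - yum_repository:
#       name: "{{ item.id }}"
#       description: "{{ item.id }}"
#       baseurl: "{{ item.baseurl }}"
#     loop: "{{ repos.rhel7 | flatten_repos }}"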

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

12 rats tied together posted:

I spent some time with this and yeah, this definitely sucks/is a good time to use a filter plugin. I would suggest that, in general, a yaml file (interacted w/ via single templated parameter value in ansible-playbook) is not the place for this data. If you asked me to do it from scratch:
code:
inventories/group_vars/rhel7/7.1/common.yaml:
---
repos:
  - name: repo1
    baseurl: http://whatever1
group_vars is good at this, ansible will discover the appropriate release as part of gathering node facts and your module invocation/template calls are simplified to only being with_items: "{{ repos }}" or {% for repo in repos %}. A good next step would be turning this into a rhel7.1 role, which gets you hooks into role defaults/role vars/role params inheritance structures, as well as role dependencies. You'd probably want your rhel7.1 role to depend on a rhel7 base role, and then you'd put repos common to every release of rhel7 in the base.

You can also totally do a plugin, quick example that worked for me. Add a section to your ansible.cfg to extend the lookup plugin path to somewhere inside your repo, and then drop this file into it. I was calling it like this:
code:
- debug:
    msg: "{{ lookup('saved_filename', repos.rhel7, inner_attr='name') }}"
It was a lookup plugin because I really wanted to do a "with_" expression in the playbook, but it could be either a lookup or a filter. The documentation for writing plugins is also much better than I remember, but feel free to PM me or just reply here if you have any issues. It's nice to see this thread get some activity.
Thanks for this; I ended up doing the known-and-ugly include_tasks approach just due to time constraints, but I'll definitely be looking at plugins next time I have an unusually shaped nail.


Followup question for ansible people: what's the best-practice way of using a local boto .aws/config profile to run aws commands on remote hosts? I'd think there would be a standard way of doing this, but the docs seem to imply copying your entire .aws/ over temporarily or exporting a mess of environment variables (which you'd need to parse from the local profile first). Has anyone developed a task block or plugin to streamline this, or do I get to reinvent the wheel?

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
In this particular case, I needed to proxy my own permissions to copy some stuff to an S3 bucket the instance normally doesn't have access to. I could pass them through environment variables, but to use the profile I'd need to first parse my local boto profile config for them through a localhost connection and register a variable, and I was just kinda hoping there was something built-in to do it all for me and make them ephemeral. Most of the aws-related tasks support profiles now; it's just that those have to be local to the instance they're being run on.
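e: for posterity, the sort of thing I was imagining - feed the remote task's environment from the local profile via lookups, which run on the control machine (untested sketch; profile and bucket names made up, and you may need to spell out the credentials path if ~ isn't expanded):

code:
- name: copy artifact to the restricted bucket
  command: aws s3 cp /tmp/artifact.tgz s3://restricted-bucket/
  environment:
    AWS_ACCESS_KEY_ID: "{{ lookup('ini', 'aws_access_key_id section=myprofile file=~/.aws/credentials') }}"
    AWS_SECRET_ACCESS_KEY: "{{ lookup('ini', 'aws_secret_access_key section=myprofile file=~/.aws/credentials') }}"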

The need was definitely an edge case, since most everything can be done through local commands as you noted. In the end I got lazy and gave the machine role additional creds for a few hours.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Vulture Culture posted:

Is it difficult to generate this via Jinja templates rather than doing all the weirdo machinations in Ansible YAML?
Mostly the directive to use yum_repository and to expand preexisting code. It was an interesting constraint that grew my knowledge, so a win I guess? Even if there were better ways of doing it, it's done now.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
IMO I heavily, heavily recommend and prefer that secrets (and environment-specific properties when possible) get pulled in as environment variables rather than via a command-line -e extra-vars flag or an ansible plugin.

It's more secure, extensible, and portable. It's straightforward and works with pretty much everything. It allows you to swap your hashicorp secrets solution for an LDAP one, it allows you to stick stuff in a docker image or kubernetes and "just work(tm)", it allows you to easily iterate and test locally without relying on outside servers and it allows you to override those secrets locally on a given run when needed. It also lets you hook into not-ansible for things like automated rspec/junit tests using the same method.
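A quick sketch of what that looks like in practice (variable and template names made up):

code:
# the playbook reads the secret from its own process environment, so the same
# repo works under jenkins, docker, kubernetes, or a local shell export
- name: render app config
  template:
    src: app.conf.j2
    dest: /etc/app/app.conf
    mode: "0600"
  vars:
    db_password: "{{ lookup('env', 'APP_DB_PASSWORD') }}"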

It's just a better, more flexible approach than locking yourself into only ansible with a community supported plugin. There are definitely use cases for ansible plugins in general though, it's just not my preferred method in this use case.

HOWEVER, all that said, putting secrets in -e flags is wildly insecure for a variety of reasons, so Do Not Do That regardless of what solution you finally decide on.

Bhodi fucked around with this message at 19:10 on Jan 20, 2021


Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
Using ansible as your secrets entrypoint definitely works as long as you buy into ansible as the wrapper for anything that would conceivably need to access those secrets. That hasn't been a good fit for me in the past, for example when you need some secrets to access or modify things that ansible isn't a good fit for - network hardware, AWS services, basically anything that isn't at the OS or application level.

Just as an example, if your CI/CD testing wants to stand up an entire stack including VPC and EBS teardown, well you're probably going to be running terraform or cloudformation. If you go that route you've also got to manage accessing the same secrets in multiple different ways or wrap the whole thing in ansible - you may find that to be one Matryoshka too deep.

It's better to have some sort of smaller wrapper to manage secrets outside ansible, something that's straightforward, relatively secure and broadly supported by literally every CI/CD tool - environment variables. Yes, it's definitely an extra step and probably not as clean.

For something much more contained such as a single repository application without any external dependencies and with a straightforward compile/test/deploy, ansible works great. It starts to work less great when ansible is only a small tool in your overall CI/CD box rather than your entrypoint and your task is to try and keep them all in sync within the same pipeline / process.

I completely agree that if you're baking in the secrets you probably don't need environment variables; you're replacing that with cloud-init metadata or some file on the system that's external to ansible in a similar way. From the ansible side you plug both in very similarly. Maybe a fact would be the right approach instead? I'd need to think on it and dig into the details.

Bhodi fucked around with this message at 19:51 on Jan 20, 2021
