|
IAmKale posted:Are there any good guides on best practices for capturing log output from containers? For the scale of what I’m supporting, it’d be great to get a robust local logging setup. I know at some point, though, I’ll need to look at services I can use to aggregate data. For now, though, I’m more interested in higher level fundamentals to gain more confidence in Docker. The way I do it is to have a shared volume mounted to each instance running Docker, and output logs from the containers to that mounted volume. Then you can run whatever your favourite log parser is over the shared volume and not worry about missing logs from each container host you have running.
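A minimal sketch of that shared-volume layout as a docker-compose fragment (service and image names are made up; the host path `/mnt/logs` is just an example):

```yaml
# Hypothetical compose fragment: every container writes its logs into the
# same host-mounted directory, so one log parser on the host sees them all.
services:
  web:
    image: my-app:latest        # placeholder image
    volumes:
      - /mnt/logs:/var/log/app  # shared volume; the app writes its logs here
  worker:
    image: my-worker:latest     # placeholder image
    volumes:
      - /mnt/logs:/var/log/app  # same mount, so logs end up side by side
```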
|
# ? Apr 17, 2018 04:31 |
|
|
# ? May 18, 2024 01:08 |
|
Just run another container with rsyslog
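One hedged way to wire that up with Compose (the rsyslog image name is an assumption; any rsyslog image listening on 514 works) is to point the other containers' Docker syslog logging driver at it:

```yaml
# Sketch: one rsyslog container receives syslog traffic from the others.
services:
  rsyslog:
    image: rsyslog/syslog_appliance_alpine  # assumed image; swap for your own
    ports:
      - "514:514/udp"
  app:
    image: my-app:latest                    # placeholder image
    logging:
      driver: syslog                        # Docker's built-in syslog driver
      options:
        syslog-address: "udp://127.0.0.1:514"
```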
|
# ? Apr 17, 2018 05:00 |
|
IAmKale posted:Are there any good guides on best practices for capturing log output from containers? For the scale of what I’m supporting, it’d be great to get a robust local logging setup. I know at some point, though, I’ll need to look at services I can use to aggregate data. For now, though, I’m more interested in higher level fundamentals to gain more confidence in Docker. Sidecar your logs to log management like ELK, GELF, Splunk, etc. Our legacy prod mission-critical stuff is in Splunk right now but it costs a fortune; we hope to be 100% Graylog by end of quarter. I haven't figured out what the magic way to collect logs from Kubernetes is. For stats monitoring, Prometheus is dead simple. Haven't seen a vendor-agnostic zero-config log solution on par with Prometheus yet.
|
# ? Apr 17, 2018 06:40 |
|
I need to automate schema changes to an AWS RDS database using VSTS. Any suggestions?
fluppet fucked around with this message at 12:14 on Apr 17, 2018 |
# ? Apr 17, 2018 11:27 |
|
Vendor agnosticism is a pipe dream held by those usually not involved in the financial decisions within a company. But outside of that, go hog wild! Certainly a good decision whilst in the start-up phase or early days.
|
# ? Apr 17, 2018 11:29 |
|
Hadlock posted:Sidecar your logs to log management like ELK, GELF, Splunk, etc. Our legacy prod mission-critical stuff is in Splunk right now but it costs a fortune; we hope to be 100% Graylog by end of quarter. I’m spoiled because with GKE it’s a checkbox that grabs stdout and stderr; behind the scenes it’s running fluentd and scraping the pod logs. Here’s the k8s documentation, and it looks like there are multiple sample daemonset implementations. https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level
|
# ? Apr 17, 2018 12:18 |
|
freeasinbeer posted:I’m spoiled because with GKE it’s a checkbox that grabs stout and stderror, behind the scenes it’s running fluentd and scraping the pod logs. Yeah this is what we do (self-managed cluster on AWS built with kops). Containers write to stdout/stderr, which kubernetes redirects to /var/log/containers/ on the node. There's a daemonset running fluentd on every node. It tails all the logs and sends them to elasticsearch. Not much to it.
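A heavily trimmed sketch of that node-level setup (image tag, namespace, and the Elasticsearch service name are all assumptions; the real manifests in the k8s docs linked above are more involved):

```yaml
# Trimmed DaemonSet sketch: one fluentd pod per node tails /var/log
# (where kubelet symlinks container logs) and ships to elasticsearch.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:elasticsearch  # assumed tag
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.logging.svc"  # assumed ES service name
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log  # node directory holding the container logs
```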
|
# ? Apr 17, 2018 15:07 |
|
Docjowles posted:Yeah this is what we do (self-managed cluster on AWS built with kops). Containers write to stdout/stderr, which kubernetes redirects to /var/log/containers/ on the node. There's a daemonset running fluentd on every node. It tails all the logs and sends them to elasticsearch. Not much to it. Yeah my last company was GKE; log management was magical with, what is it, Logstash? Super easy push button, loved it. Can you go into more detail of what you're doing that works with your kops implementation? Would love to hear more detail, since that's what we're doing but it's not coming together as smoothly as you're describing.
|
# ? Apr 17, 2018 16:57 |
|
Favorite secret store system? Our vault setup just rolled over and management doesn't trust it, also the guy who set it up didn't have any backups anywhere so looking for something else.
|
# ? Apr 20, 2018 19:21 |
|
Hadlock posted:Favorite secret store system? Our vault setup just rolled over and management doesn't trust it, also the guy who set it up didn't have any backups anywhere so looking for something else. Check out AWS's new Secrets Manager maybe? https://aws.amazon.com/blogs/aws/aws-secrets-manager-store-distribute-and-rotate-credentials-securely/ but we use Vault happily
|
# ? Apr 20, 2018 19:35 |
|
Secrets manager is too expensive for most places to warrant using. SSM also has a hidden 30 requests / second maximum transaction rate that is ok for smaller shops but not viable for most places at scale. So umm... comedy DynamoDB with KMS encryption option?
|
# ? Apr 20, 2018 21:00 |
|
People who have moved important things into containers: How do you handle doing hacky ad-hoc fixes live in prod in the event it seems necessary? Occasionally something goes wrong at peak traffic time and there is an incident that needs investigating. Sometimes as part of this there will be a modification made to the code that is running in prod just by opening up a file in vim, sticking a return statement at the top of a function, and then reloading the code to avoid a disaster. How do you do something like that when you're running in Kubernetes? Maybe you don't have time to properly go through a full commit/build/deploy cycle and you really just want to put a few extra lines of interpreted code into prod really quickly.
|
# ? Apr 21, 2018 00:48 |
|
kubectl exec E: more specifically `kubectl exec -n $namespace -it $pod_name -- /bin/bash` E2: Of course, this assumes your containers have an editor, which they shouldn't. For that and other reasons the real answer to your question is that your full build process should be fast and effortless enough that you don't question using it, even for one-line changes (and especially for interpreted languages). Doc Hawkins fucked around with this message at 00:59 on Apr 21, 2018 |
# ? Apr 21, 2018 00:53 |
|
Doc Hawkins posted:`kubectl exec` If my entrypoint/cmd for a container is that a particular process is running, restarting that process will kill the container.
|
# ? Apr 21, 2018 01:06 |
|
Right, all that was a bad answer. Okay, obviously I can't be trusted, but I think you could manually produce a new image with the line changed, push it to whatever image repository you're using, then change the deployment to point to that new image. You could pull the current image down, and run a container on it that you do your editing in, and then tag the image that results when you're done.
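A sketch of that emergency-patch flow (registry and image names are placeholders, and this is very much a break-glass move, not a recommended workflow):

```shell
# Edit inside a throwaway container, commit the result as a new image,
# push it, and point the deployment at it.
docker pull registry.example.com/my-app:current        # placeholder image
docker run -it --name hotfix registry.example.com/my-app:current /bin/sh
# ... edit the file inside the container, then exit ...
docker commit hotfix registry.example.com/my-app:hotfix-1
docker push registry.example.com/my-app:hotfix-1
kubectl set image deployment/my-app my-app=registry.example.com/my-app:hotfix-1
```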
|
# ? Apr 21, 2018 02:12 |
|
You roll back. Why are you yolo testing in prod?
|
# ? Apr 21, 2018 02:42 |
|
Imagine the following scenario. You're having record-breaking levels of traffic. As you approach your normal peak period, things start breaking in unusual ways. Human investigation reveals a contributing factor to be a particular redis command that is being run every few seconds to update a status dashboard. The redis call is complexity O(N). N is very large right now. This had been in the code basically forever, but didn't reach a threshold of causing issues until a few minutes ago; there isn't a version to roll back to that doesn't have this function. The executive decision is made to short-circuit the function that is responsible for issuing the redis call. Normal builds take up to 15 minutes to complete from clicking start. But you're not even sure if the changes you want to make will help, or make things worse, and waiting 15 minutes to try isn't going to fly. Letting Kubernetes take untested, emergency code into its own hands and start rolling it out prod-wide is also a recipe for disaster. The obvious lazy answer is just to say 'make sure your code is good before it goes to prod' or 'test harder', but the world is a complicated place and incidents happen no matter how hard you try to prevent them. It is guaranteed that novel incidents will continue to happen that require a human to intervene. That's kind of what I'm thinking about. Methanar fucked around with this message at 03:08 on Apr 21, 2018 |
# ? Apr 21, 2018 03:00 |
|
been there a hundred times. you don't get to have the process you didn't build before the emergency. if your pipeline takes 15 minutes, and you're not even sure if the first try will fix it, well then you're looking at a 30+ minute incident. thats what you built, thats what you get. this is where so many shops get stuck on ci/cd. they build a massive amount of rear end-covering junk into the build & deploy cycle, because thats what it took to get all the cowards and middle management onboard, and because most developers never met a widget they wouldn't gladly bolt onto the contraption. it leaves you incapable of timely responses to real world events. poo poo like extraneous tiers (stage/qa/uat), manual/human-intervention blocking, serialized tests, lots of fetches to external poo poo like github/apt/docker-hub/etc, slow ramps because no one ever bothers to profile & tune the startup/warmup phase. your options are to speed up your deploys to take seconds/minutes, or to religiously wrap every discrete feature in a flag so that it can be disabled live without a code deploy (in your case whatever sub-section of your status dashboard is making that expensive redis call). imho you should do both, but if you did neither there isn't really a workaround anyone can tell you other than "go full cowboy with whatever poo poo you got", and thats a bad plan. code is data. you can read and write to a db in milliseconds, if you cant deploy code within five orders of magnitude of that ur doin it rong. StabbinHobo fucked around with this message at 03:28 on Apr 25, 2018 |
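One hedged way to get that "disable it live without a code deploy" behaviour in Kubernetes (flag and ConfigMap names here are invented) is to keep flags in a ConfigMap the app re-reads:

```shell
# Hypothetical flag store the app polls (or mounts as a file).
kubectl create configmap feature-flags --from-literal=dashboard-redis-poll=on
# During the incident, flip the expensive call off with no rebuild:
kubectl patch configmap feature-flags \
  --type merge -p '{"data":{"dashboard-redis-poll":"off"}}'
```

One caveat: flags injected as environment variables won't update in running pods; a mounted ConfigMap file or an API read will pick up the change after the kubelet sync delay.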
# ? Apr 21, 2018 03:28 |
|
Feature flags aren’t going to help in that situation if you’re doing them right, because they’ll have been elided/removed after being stable for a while. For an incident response like that you’ll need to go on offense (like attacking the source of the problem) or go defensive, like adding capacity or doing a full rollback. The nature of said N is important though, and this is where thinking on your feet and the umm... “athletic” part of operations comes in. Honestly, I’d just stop those specific Redis calls or something else to let everyone make progress. And the time it would take to identify that root cause adds to the incident MTTR as well, so by that point it’s probably already 20 minutes in with degraded performance. Really, any service that has regular bursts of high traffic probably will have hit scaling factors quite early into the lifecycle in production, enough that there would be a feature flag associated with the lines in question or simply a way to discard non-essential writes. I know in our service we have pretty predictable diurnal patterns from our customers and we set up monitoring to look for falling behind in the work output, and besides issues related to a recent change we have historically known about every single problem that’s bit us in production (we have the tickets filed months before to prove it). For example, our services started taking 45 - 130 minutes to simply start up. Turns out it’s tangentially related to a bunch of DB locking failures we’ve experienced for months now.
|
# ? Apr 21, 2018 13:53 |
|
In Kubernetes I'd kubectl run a new pod with my entry point as /bin/sh, debug what I needed, and then build a new image. If it was a one-line config file change I could even just override it with a configmap and use the existing image in place.
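Sketched out, with a placeholder image name (the `--dry-run | apply` trick is one way to create-or-update the configmap):

```shell
# Throwaway debug pod: same image, shell entrypoint, deleted on exit.
kubectl run debug --rm -it --image=registry.example.com/my-app:current -- /bin/sh

# One-line config change without a rebuild: push the edited file in as a
# ConfigMap, then mount it over the original path in the pod spec.
kubectl create configmap app-config --from-file=app.conf \
  --dry-run -o yaml | kubectl apply -f -
```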
|
# ? Apr 21, 2018 13:54 |
|
Hadlock posted:Favorite secret store system? Our vault setup just rolled over and management doesn't trust it, also the guy who set it up didn't have any backups anywhere so looking for something else. Update: consul 0.8.x apparently leaked 100gb of disk over 300 days, the guy before me did not set up any kind of disk monitoring (or it got buried in the "notifications" noise - I'm not allowed to set up an actionable-alerts slack channel, pick your battles etc etc) and while vault was writing to the lease KV the encrypted string got truncated and wasn't able to be decrypted. This is not well described nor alluded to in the error messages and I fully expect my PR to be roundly ignored, but after deleting all the lease data, everything came back to life. Out of disk always fucks everything, but I expected the root key to at least be able to log in and do things. Que sera, sera
|
# ? Apr 24, 2018 08:24 |
|
Hadlock posted:I fully expect my PR to be roundly ignored
|
# ? Apr 24, 2018 18:49 |
|
Methanar posted:Imagine the following scenario.
|
# ? Apr 24, 2018 18:51 |
|
Hey, speaking of Docker, I'm using the official Nginx Docker image via Compose to host a really simple reverse proxy. Unfortunately I'm getting 502 errors, but when I run docker-compose logs nginx nothing gets output. All of the image's logging outputs are mapped to stdout and stderr, so I was expecting even Nginx initialization logging. However there's zero output of any kind from that command. Am I doing something wrong? Edit: it turns out I had set values for error_log and access_log in my nginx.conf, which prevented the logs from showing up in stdout and stderr IAmKale fucked around with this message at 21:44 on Apr 24, 2018 |
# ? Apr 24, 2018 20:47 |
|
If it was Java / JVM code, I’ve created Frankenwars that had a class file and property file shoved into it and deployed to production because the build and regression procedures took over 4 days (manual QA, yeah.....).
|
# ? Apr 24, 2018 22:43 |
|
IAmKale posted:Hey, speaking of Docker, I'm using the official Nginx Docker image via Compose to host a really simple reverse proxy. Unfortunately I'm getting 502 errors, but when I run docker-compose logs nginx nothing gets output. All of the image's logging outputs are mapped to stdout and stderr, so I was expecting even Nginx initialization logging. However there's zero output of any kind from that command. Am I doing something wrong? Check out the jwilder nginx reverse proxy container, once you realize you just have to point dns at the ip and add the env -e VIRTUAL_HOST=MY.COOL-DOMAIN.COM to the docker run command, and it takes care of everything else, it's just magic, zero config. Been using it for years and it's just bulletproof.
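A sketch of that pattern (backend image name is a placeholder; in the versions I've used, backends announce their hostname with the VIRTUAL_HOST env var and the proxy watches the Docker socket to regenerate its config):

```shell
# The proxy reads container events from the docker socket (read-only).
docker run -d -p 80:80 \
  -v /var/run/docker.sock:/tmp/docker.sock:ro jwilder/nginx-proxy
# Each backend declares the hostname it serves; no other config needed.
docker run -d -e VIRTUAL_HOST=my.cool-domain.com my-app-image  # placeholder
```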
|
# ? Apr 25, 2018 04:05 |
|
Hadlock posted:Check out the jwilder nginx reverse proxy container, once you realize you just have to point dns at the ip and add the env -e VIRTUAL_HOST=MY.COOL-DOMAIN.COM to the docker run command, and it takes care of everything else, it's just magic, zero config. Been using it for years and it's just bulletproof.
|
# ? Apr 25, 2018 20:53 |
|
Methanar posted:How does everyone do their source control for Kubernetes and interaction with Kubernetes. Mao Zedong Thot posted:makefiles and yaml Hey I'm here to post the same question as Methanar and see if anyone has a different answer We've been doing a POC with kubernetes and have determined that it owns. But going from "a few engineers dicking around with no revenue on the line" to "production environment shared by a bunch of devs across a bunch of disparate teams, some of which are subject to government regulations" is quite the leap. Even in our simple test environment we've had people accidentally apply changes to the "production which is thankfully not really production" cluster that were meant for staging. Or do a "kubectl apply -f" without having pulled the latest version of the repo, blowing away changes someone else made. This is completely untenable. We easily could have a Jenkins job (or hell even a commit hook) that does the apply command and that would cover most cases. There are certain changes that require extra actions. But we could special case those. But it seems like there has to be a tool for this already because doing it manually is so janky and horrible. And I know companies way bigger than mine are running Kubernetes in production. Is that tool Helm? Something else? I agree Helm doesn't sound ideal.
|
# ? May 2, 2018 04:19 |
|
Docjowles posted:Hey I'm here to post the same question as Methanar and see if anyone has a different answer We've been doing a POC with kubernetes and have determined that it owns. But going from "a few engineers dicking around with no revenue on the line" to "production environment shared by a bunch of devs across a bunch of disparate teams, some of which are subject to government regulations" is quite the leap. Even in our simple test environment we've had people accidentally apply changes to the "production which is thankfully not really production" cluster that were meant for staging. Or do a "kubectl apply -f" without having pulled the latest version of the repo, blowing away changes someone else made. This is completely untenable. Sadly I think you probably want openshift if you want developers going right into your cluster
|
# ? May 2, 2018 05:38 |
|
jaegerx posted:Sadly I think you probably want openshift if you want developers going right into your cluster Seeing as I know everyone hates Openshift... what would you recommend as an alternative approach? We have a bunch of dev teams who all want to be able to deploy apps whenever. These are not microservices, but are mostly at least web apps written in modern-ish Java and actively maintained. If app behavior needs to be changed to work in a containerized world, it can be. We aren't trying to forklift some awful 1980's ERP thing. We have a pretty reasonable setup today where devs can deploy their applications whenever onto traditional VM's without involving operations. And can make relevant configuration changes via pull requests against our Chef cookbooks, with permission to deploy them once merged to master. But it's still slow and clunky and a waste of resources and ops ends up as a bottleneck more than we'd like. So now we've set up kubernetes which has a lot of advantages. But somehow ops applying changes is still a loving bottleneck and we need to fix that. We've gotten to the point where devs can update their containers with new code whenever and deploy that no problem. But any configuration change (deployments, services, configmaps, etc) is still manual and error prone. We are still early enough on that we could change almost anything if it turns out what we are doing is terrible. Docjowles fucked around with this message at 06:19 on May 2, 2018 |
# ? May 2, 2018 06:07 |
|
I actually love openshift. It's probably the easiest implementation of k8s right now(I haven't seen rancher 2.0). It's extremely developer friendly and I'm sure redhat is gonna add the cool parts of tectonic into it soon. From what you're saying to me you need something that's developer friendly so ops doesn't have to do every little thing. That's pretty much openshift(or rancher, I honestly don't know)
|
# ? May 2, 2018 06:18 |
|
I can't stop laughing at your avatar.Docjowles posted:Hey I'm here to post the same question as Methanar and see if anyone has a different answer I still haven't figured this out. Docjowles posted:So now we've set up kubernetes which has a lot of advantages. But somehow ops applying changes is still a loving bottleneck and we need to fix that. We've gotten to the point where devs can update their containers with new code whenever and deploy that no problem. But any configuration change (deployments, services, configmaps, etc) is still manual and error prone. This is basically where I'm at. I can write helm charts so that updating code associated with a chart is easy, but creating new deployments or charts is where it's still hairy.
|
# ? May 2, 2018 06:59 |
|
Yeah I don’t like helm unless I am installing something we don’t maintain. But yeah we are in the same place. The best tool right now is probably spinnaker for actual deployments but I think it’s super overkill. As for secrets and configmaps, Terraform's k8s provider has pretty good support, and if you align workspaces to namespaces it’s pretty easy to keep sorted; it has a ton of features missing, like it can’t do deployments yet. Work is underway to change that (I have pretty good connections to hashicorp and my bitching made them refocus on it). Ksonnet exists but I hate reading/writing JSON. There isn’t a magic bullet here and I don’t like using something that isn’t mainline kubernetes in an ecosystem that moves as fast as k8s. In practice we keep dev and prod completely separate and make sure our deployments have sanity checks. We also bundle services and deployments kubespecs into the actual docker images at build time and apply them by pulling the target docker artifact and then applying them to the target cluster.
|
# ? May 2, 2018 12:28 |
|
The brand new Stackdriver k8s integration is amazing. It beats anything I’ve seen so far.
|
# ? May 2, 2018 17:51 |
|
Methanar posted:I can't stop laughing at your avatar. Haha. This is more or less what it used to be before some dipshit bought me that awful anime avatar last month. Finally decided to drop the :5bux: to fix it. Thanks for the input, everyone. It's both comforting and disturbing that everyone else has the same problems and no real solutions.
|
# ? May 2, 2018 19:16 |
|
Docjowles posted:Haha. This is more or less what it used to be before some dipshit bought me that awful anime avatar last month. Finally decided to drop the :5bux: to fix it. It's funny because it's just slightly bigger and higher res than the last one. Still laughing at it. freeasinbeer posted:In practice we keep dev and prod completely separate and make sure our deployments have sanity checks. We also bundle services and deployments kubespecs into the actual docker images at build time and apply them by pulling the target docker artifact and then applying them to the target cluster. Wait what? You embed your service/rbac/pod/deployment/etc yaml configs into a docker container entrypoint, and instantiate them by executing the docker container?
|
# ? May 2, 2018 20:16 |
|
Anybody at KubeCon? If so, what happens after the two measly beer tickets?
|
# ? May 2, 2018 22:13 |
|
Methanar posted:Its funny because its just slightly bigger and high res than the last one. Still laughing at it. Only service or deployment, but yes. We pull down the docker image locally, grab the kubespec and apply. Our sole artifact is the docker image.
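A sketch of that "spec travels with the image" flow (registry, paths, and the kubeconfig context name are all invented here):

```shell
# The docker image is the sole artifact; the kubespec rides inside it.
docker pull registry.example.com/my-app:1.2.3
docker create --name spec-tmp registry.example.com/my-app:1.2.3
docker cp spec-tmp:/kubespec/deployment.yaml ./deployment.yaml  # assumed path
docker rm spec-tmp
# Apply the extracted spec against the target cluster.
kubectl --context prod apply -f deployment.yaml
```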
|
# ? May 3, 2018 00:18 |
|
Docjowles posted:Hey I'm here to post the same question as Methanar and see if anyone has a different answer We've been doing a POC with kubernetes and have determined that it owns. But going from "a few engineers dicking around with no revenue on the line" to "production environment shared by a bunch of devs across a bunch of disparate teams, some of which are subject to government regulations" is quite the leap. Even in our simple test environment we've had people accidentally apply changes to the "production which is thankfully not really production" cluster that were meant for staging. Or do a "kubectl apply -f" without having pulled the latest version of the repo, blowing away changes someone else made. This is completely untenable. I was tasked with dealing with how to manage our Kubernetes clusters and deploy to them. I had to solve for a number of issues around bare metal deployments at the same time; the clusters I manage are primarily used for running machine learning jobs, so some of the nodes are GPU-equipped. I ended up writing a big pile of Ansible playbooks and a couple thousand lines of Groovy to glue it all together in Jenkins. The playbooks are grouped into different sets of roles and take care of infrastructure and application deployment, building up a multi-master Kubernetes cluster and deploying our applications into the cluster. They contain all the YAML manifests for everything as well as the information for the image tags, and any merged PR generates a new tag for that set of playbooks. To do the actual deployments, there's a repository that contains the configuration information for each managed cluster (per-cluster Ansible variables and inventory), and a metadata JSON file that declares all the playbook repositories used by the cluster, the version tag for each playbook repository, and the order to run stuff in. The end result is that the workflow for deploying applications is entirely driven by GitHub Enterprise and Jenkins. 
To get an application update out to live clusters, devs:
It's worked pretty well for us. There are obvious rough spots in the workflow; doing a lot of PRs is repetitive and boring (especially when you're driving the same set of changes to several clusters in order), Jenkins is a crappy UI, I get pulled in on any non-superficial Ansible changes, and devs are generally a little scared to drive the process because they don't want the buck to stop with them. Even so, I'm working on fixing the rough spots and a lot of teams in the company are interested in using it, not least because the versioning system means they can synchronously push app+config changes at the same time. We did look at Helm before I wrote any of this stuff, but Helm worked really poorly in our walled-off internal environment, and what Helm does is a subset of what I wrote.
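Purely to illustrate the shape of that per-cluster metadata file (every field name here is invented; the post doesn't show the real schema):

```json
{
  "playbook_repos": [
    { "repo": "infra-playbooks", "tag": "v12", "order": 1 },
    { "repo": "app-playbooks",   "tag": "v47", "order": 2 }
  ]
}
```

Jenkins would read this, check out each repo at its pinned tag, and run the playbooks in `order` against the cluster's inventory.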
|
# ? May 3, 2018 00:31 |
|
Jenkins Pipeline docs talk about this new buildingTag condition: https://jenkins.io/doc/book/pipeline/syntax/#built-in-conditions Looks like it was added very recently: https://github.com/jenkins-infra/jenkins.io/pull/1430/files Does that mean it's in one of the Weekly releases? https://jenkins.io/download/ I didn't see it in the changelog yet: https://jenkins.io/changelog/
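Per those docs, `buildingTag` is a declarative `when` condition, so the usage would look something like this (stage name and step are made up; this assumes a plugin version that ships the condition):

```groovy
// Declarative pipeline sketch using the buildingTag built-in condition.
pipeline {
    agent any
    stages {
        stage('Release') {
            when { buildingTag() }  // only runs when the build is for a tag
            steps {
                echo "Releasing ${env.TAG_NAME}"
            }
        }
    }
}
```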
|
|
# ? May 5, 2018 01:04 |