|
IAmKale posted:Are there any good guides on best practices for capturing log output from containers? For the scale of what I’m supporting, it’d be great to get a robust local logging setup. I know at some point, though, I’ll need to look at services I can use to aggregate data. For now, though, I’m more interested in higher level fundamentals to gain more confidence in Docker. The way I do it is to have a shared volume mounted to each instance running Docker, and output logs from the containers to that mounted volume. Then you can run whatever your favourite log parser is over the shared volume and not worry about missing logs from each container host you have running.
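A minimal sketch of that shared-volume layout as a docker-compose fragment (service and image names are made up; the host path `/mnt/logs` is just an example):

```yaml
# Hypothetical compose fragment: every container writes its logs into the
# same host-mounted directory, so one log parser on the host sees them all.
services:
  web:
    image: my-app:latest        # placeholder image
    volumes:
      - /mnt/logs:/var/log/app  # shared volume; the app writes its logs here
  worker:
    image: my-worker:latest     # placeholder image
    volumes:
      - /mnt/logs:/var/log/app  # same mount, so logs end up side by side
```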
|
# ? Apr 17, 2018 04:31 |
|
|
# ? May 18, 2024 01:08 |
|
Just run another container with rsyslog
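One hedged way to wire that up with Compose (the rsyslog image name is an assumption; any rsyslog image listening on 514 works) is to point the other containers' Docker syslog logging driver at it:

```yaml
# Sketch: one rsyslog container receives syslog traffic from the others.
services:
  rsyslog:
    image: rsyslog/syslog_appliance_alpine  # assumed image; swap for your own
    ports:
      - "514:514/udp"
  app:
    image: my-app:latest                    # placeholder image
    logging:
      driver: syslog                        # Docker's built-in syslog driver
      options:
        syslog-address: "udp://127.0.0.1:514"
```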
|
# ? Apr 17, 2018 05:00 |
|
IAmKale posted:Are there any good guides on best practices for capturing log output from containers? For the scale of what I’m supporting, it’d be great to get a robust local logging setup. I know at some point, though, I’ll need to look at services I can use to aggregate data. For now, though, I’m more interested in higher level fundamentals to gain more confidence in Docker. Sidecar your logs to log management like ELK, GELF, Splunk, etc. Our legacy prod mission-critical stuff is in Splunk right now but it costs a fortune; we hope to be 100% Graylog by end of quarter. I haven't figured out what the magic way to collect logs from Kubernetes is. For stats monitoring, Prometheus is dead simple. Haven't seen a vendor-agnostic zero-config log solution on par with Prometheus yet.
|
# ? Apr 17, 2018 06:40 |
|
I need to automate schema changes to an AWS RDS database using VSTS. Any suggestions?
fluppet fucked around with this message at 12:14 on Apr 17, 2018 |
# ? Apr 17, 2018 11:27 |
|
Vendor agnosticism is a pipe dream held by those usually not involved in the financial decisions within a company. But outside of that, go hog wild! Certainly a good decision whilst in the start-up phase or early days.
|
# ? Apr 17, 2018 11:29 |
|
Hadlock posted:Sidecar your logs to log management like ELK, GELF, Splunk, etc. Our legacy prod mission-critical stuff is in Splunk right now but it costs a fortune; we hope to be 100% Graylog by end of quarter. I’m spoiled because with GKE it’s a checkbox that grabs stdout and stderr; behind the scenes it’s running fluentd and scraping the pod logs. Here’s the k8s documentation, and it looks like there are multiple sample daemonset implementations. https://kubernetes.io/docs/concepts/cluster-administration/logging/#logging-at-the-node-level
|
# ? Apr 17, 2018 12:18 |
|
freeasinbeer posted:I’m spoiled because with GKE it’s a checkbox that grabs stout and stderror, behind the scenes it’s running fluentd and scraping the pod logs. Yeah this is what we do (self-managed cluster on AWS built with kops). Containers write to stdout/stderr, which kubernetes redirects to /var/log/containers/ on the node. There's a daemonset running fluentd on every node. It tails all the logs and sends them to elasticsearch. Not much to it.
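A heavily trimmed sketch of that node-level setup (image tag, namespace, and the Elasticsearch service name are all assumptions; the real manifests in the k8s docs linked above are more involved):

```yaml
# Trimmed DaemonSet sketch: one fluentd pod per node tails /var/log
# (where kubelet symlinks container logs) and ships to elasticsearch.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:elasticsearch  # assumed tag
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.logging.svc"  # assumed ES service name
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log  # node directory holding the container logs
```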
|
# ? Apr 17, 2018 15:07 |
|
Docjowles posted:Yeah this is what we do (self-managed cluster on AWS built with kops). Containers write to stdout/stderr, which kubernetes redirects to /var/log/containers/ on the node. There's a daemonset running fluentd on every node. It tails all the logs and sends them to elasticsearch. Not much to it. Yeah my last company was GKE; log management was magical with, what is it, Logstash? Super easy push button, loved it. Can you go into more detail of what you're doing that works with your kops implementation? Would love to hear more detail, since that's what we're doing but it's not coming together as smoothly as you're describing.
|
# ? Apr 17, 2018 16:57 |
|
Favorite secret store system? Our vault setup just rolled over and management doesn't trust it, also the guy who set it up didn't have any backups anywhere so looking for something else.
|
# ? Apr 20, 2018 19:21 |
|
Hadlock posted:Favorite secret store system? Our vault setup just rolled over and management doesn't trust it, also the guy who set it up didn't have any backups anywhere so looking for something else. Check out AWS's new Secrets Manager maybe? https://aws.amazon.com/blogs/aws/aws-secrets-manager-store-distribute-and-rotate-credentials-securely/ but we use Vault happily
|
# ? Apr 20, 2018 19:35 |
|
Secrets manager is too expensive for most places to warrant using. SSM also has a hidden 30 requests / second maximum transaction rate that is ok for smaller shops but not viable for most places at scale. So umm... comedy DynamoDB with KMS encryption option?
|
# ? Apr 20, 2018 21:00 |
|
People who have moved important things into containers: How do you handle doing hacky ad-hoc fixes live in prod in the event it seems necessary? Occasionally something goes wrong at peak traffic time and there is an incident that needs investigating. Sometimes as part of this there will be a modification made to the code that is running in prod just by opening up a file in vim, sticking a return statement at the top of a function, and then reloading the code to avoid a disaster. How do you do something like that when you're running in Kubernetes? Maybe you don't have time to properly go through a full commit/build/deploy cycle and you really just want to put a few extra lines of interpreted code into prod really quickly.
|
# ? Apr 21, 2018 00:48 |
|
kubectl exec E: more specifically `kubectl exec -n $namespace -it $pod_name -- /bin/bash` E2: Of course, this assumes your containers have an editor, which they shouldn't. For that and other reasons the real answer to your question is that your full build process should be fast and effortless enough that you don't question using it, even for one-line changes (and especially for interpreted languages). Doc Hawkins fucked around with this message at 00:59 on Apr 21, 2018 |
# ? Apr 21, 2018 00:53 |
|
Doc Hawkins posted:`kubectl exec` If my entrypoint/cmd for a container is that a particular process is running, restarting that process will kill the container.
|
# ? Apr 21, 2018 01:06 |
|
Right, all that was a bad answer. Okay, obviously I can't be trusted, but I think you could manually produce a new image with the line changed, push it to whatever image repository you're using, then change the deployment to point to that new image. You could pull the current image down, and run a container on it that you do your editing in, and then tag the image that results when you're done.
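A sketch of that emergency-patch flow (registry and image names are placeholders, and this is very much a break-glass move, not a recommended workflow):

```shell
# Edit inside a throwaway container, commit the result as a new image,
# push it, and point the deployment at it.
docker pull registry.example.com/my-app:current        # placeholder image
docker run -it --name hotfix registry.example.com/my-app:current /bin/sh
# ... edit the file inside the container, then exit ...
docker commit hotfix registry.example.com/my-app:hotfix-1
docker push registry.example.com/my-app:hotfix-1
kubectl set image deployment/my-app my-app=registry.example.com/my-app:hotfix-1
```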
|
# ? Apr 21, 2018 02:12 |
|
You roll back. Why are you yolo testing in prod?
|
# ? Apr 21, 2018 02:42 |
|
Imagine the following scenario. You're having record-breaking levels of traffic. As you approach your normal peak period, things start breaking in unusual ways. Human investigation reveals a contributing factor to be a particular redis command that is being run every few seconds to update a status dashboard. The redis call is complexity O(N). N is very large right now. This had been in the code basically forever, but didn't reach a threshold of causing issues until a few minutes ago; there isn't a version to roll back to that doesn't have this function. The executive decision is made to short-circuit the function that is responsible for issuing the redis call. Normal builds take up to 15 minutes to complete from clicking start. But you're not even sure if the changes you want to make will help, or make things worse, and waiting 15 minutes to try isn't going to fly. Letting Kubernetes take untested, emergency code into its own hands and start rolling it out prod-wide is also a recipe for disaster. The obvious lazy answer is just to say 'make sure your code is good before it goes to prod' or 'test harder', but the world is a complicated place and incidents happen no matter how hard you try to prevent them. It is guaranteed that novel incidents will continue to happen that require a human to intervene. That's kind of what I'm thinking about. Methanar fucked around with this message at 03:08 on Apr 21, 2018 |
# ? Apr 21, 2018 03:00 |
|
been there a hundred times. you don't get to have the process you didn't build before the emergency. if your pipeline takes 15 minutes, and you're not even sure if the first try will fix it, well then you're looking at a 30+ minute incident. thats what you built, thats what you get. this is where so many shops get stuck on ci/cd. they build a massive amount of rear end-covering junk into the build & deploy cycle, because thats what it took to get all the cowards and middle management onboard, and because most developers never met a widget they wouldn't gladly bolt onto the contraption. it leaves you incapable of timely responses to real world events. poo poo like extraneous tiers (stage/qa/uat), manual/human-intervention blocking, serialized tests, lots of fetches to external poo poo like github/apt/docker-hub/etc, slow ramps because no one ever bothers to profile & tune the startup/warmup phase. your options are to speed up your deploys to take seconds/minutes, or to religiously wrap every discrete feature in a flag so that it can be disabled live without a code deploy (in your case whatever sub-section of your status dashboard is making that expensive redis call). imho you should do both, but if you did neither there isn't really a workaround anyone can tell you other than "go full cowboy with whatever poo poo you got", and thats a bad plan. code is data. you can read and write to a db in milliseconds, if you cant deploy code within five orders of magnitude of that ur doin it rong. StabbinHobo fucked around with this message at 03:28 on Apr 25, 2018 |
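One hedged way to get that "disable it live without a code deploy" behaviour in Kubernetes (flag and ConfigMap names here are invented) is to keep flags in a ConfigMap the app re-reads:

```shell
# Hypothetical flag store the app polls (or mounts as a file).
kubectl create configmap feature-flags --from-literal=dashboard-redis-poll=on
# During the incident, flip the expensive call off with no rebuild:
kubectl patch configmap feature-flags \
  --type merge -p '{"data":{"dashboard-redis-poll":"off"}}'
```

One caveat: flags injected as environment variables won't update in running pods; a mounted ConfigMap file or an API read will pick up the change after the kubelet sync delay.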
# ? Apr 21, 2018 03:28 |
|
Feature flags aren’t going to help in that situation if you’re doing them right, because they’ll have been elided/removed after being stable for a while. For an incident response like that you’ll need to go on offense (like attacking the source of the problem) or go defensive, like adding capacity or doing a full rollback. The nature of said N is important though, and this is where thinking on your feet and the umm... “athletic” part of operations comes in. Honestly, I’d just stop those specific Redis calls or something else to let everyone make progress. And the time it would take to identify that root cause adds to the incident MTTR as well, so by that point it’s probably already 20 minutes in with degraded performance. Really, any service that has regular bursts of high traffic probably will have hit scaling factors quite early into the lifecycle in production, enough that there would be a feature flag associated with the lines in question or simply a way to discard non-essential writes. I know in our service we have pretty predictable diurnal patterns from our customers and we set up monitoring to look for falling behind in the work output, and besides issues related to a recent change we have historically known about every single problem that’s bit us in production (we have the tickets filed months before to prove it). For example, our services started taking 45 - 130 minutes to simply start up. Turns out it’s tangentially related to a bunch of DB locking failures we’ve experienced for months now.
|
# ? Apr 21, 2018 13:53 |
|
In Kubernetes I'd kubectl run a new pod with my entry point as /bin/sh, debug what I needed, and then build a new image. If it was a one-line config file change I could even just override it with a configmap and use the existing image in place.
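Sketched out, with a placeholder image name (the `--dry-run | apply` trick is one way to create-or-update the configmap):

```shell
# Throwaway debug pod: same image, shell entrypoint, deleted on exit.
kubectl run debug --rm -it --image=registry.example.com/my-app:current -- /bin/sh

# One-line config change without a rebuild: push the edited file in as a
# ConfigMap, then mount it over the original path in the pod spec.
kubectl create configmap app-config --from-file=app.conf \
  --dry-run -o yaml | kubectl apply -f -
```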
|
# ? Apr 21, 2018 13:54 |
|
Hadlock posted:Favorite secret store system? Our vault setup just rolled over and management doesn't trust it, also the guy who set it up didn't have any backups anywhere so looking for something else. Update: consul 0.8.x apparently leaked 100gb of disk over 300 days, the guy before me did not set up any kind of disk monitoring (or it got buried in the "notifications" noise - I'm not allowed to set up an actionable-alerts slack channel, pick your battles etc etc) and while vault was writing to the lease KV the encrypted string got truncated and wasn't able to be decrypted. This is not well described nor alluded to in the error messages and I fully expect my PR to be roundly ignored, but after deleting all the lease data, everything came back to life. Out of disk always fucks everything, but I expected the root key to at least be able to log in and do things. Que sera, sera
|
# ? Apr 24, 2018 08:24 |
|
Hadlock posted:I fully expect my PR to be roundly ignored
|
# ? Apr 24, 2018 18:49 |
|
Methanar posted:Imagine the following scenario.
|
# ? Apr 24, 2018 18:51 |
|
Hey, speaking of Docker, I'm using the official Nginx Docker image via Compose to host a really simple reverse proxy. Unfortunately I'm getting 502 errors, but when I run docker-compose logs nginx nothing gets output. All of the image's logging outputs are mapped to stdout and stderr, so I was expecting even Nginx initialization logging. However there's zero output of any kind from that command. Am I doing something wrong? Edit: it turns out I had set values for error_log and access_log in my nginx.conf, which prevented the logs from showing up in stdout and stderr IAmKale fucked around with this message at 21:44 on Apr 24, 2018 |
# ? Apr 24, 2018 20:47 |
|
If it was Java / JVM code, I’ve created Frankenwars that had a class file and property file shoved into it and deployed to production because the build and regression procedures took over 4 days (manual QA, yeah.....).
|
# ? Apr 24, 2018 22:43 |
|
IAmKale posted:Hey, speaking of Docker, I'm using the official Nginx Docker image via Compose to host a really simple reverse proxy. Unfortunately I'm getting 502 errors, but when I run docker-compose logs nginx nothing gets output. All of the image's logging outputs are mapped to stdout and stderr, so I was expecting even Nginx initialization logging. However there's zero output of any kind from that command. Am I doing something wrong? Check out the jwilder nginx reverse proxy container, once you realize you just have to point dns at the ip and add the env -e VIRTUAL_HOST=MY.COOL-DOMAIN.COM to the docker run command, and it takes care of everything else, it's just magic, zero config. Been using it for years and it's just bulletproof.
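A sketch of that pattern (backend image name is a placeholder; in the versions I've used, backends announce their hostname with the VIRTUAL_HOST env var and the proxy watches the Docker socket to regenerate its config):

```shell
# The proxy reads container events from the docker socket (read-only).
docker run -d -p 80:80 \
  -v /var/run/docker.sock:/tmp/docker.sock:ro jwilder/nginx-proxy
# Each backend declares the hostname it serves; no other config needed.
docker run -d -e VIRTUAL_HOST=my.cool-domain.com my-app-image  # placeholder
```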
|
# ? Apr 25, 2018 04:05 |
|
Hadlock posted:Check out the jwilder nginx reverse proxy container, once you realize you just have to point dns at the ip and add the env -e VIRTUAL_HOST=MY.COOL-DOMAIN.COM to the docker run command, and it takes care of everything else, it's just magic, zero config. Been using it for years and it's just bulletproof.
|
# ? Apr 25, 2018 20:53 |
|
Methanar posted:How does everyone do their source control for Kubernetes and interaction with Kubernetes. Mao Zedong Thot posted:makefiles and yaml Hey I'm here to post the same question as Methanar and see if anyone has a different answer We've been doing a POC with kubernetes and have determined that it owns. But going from "a few engineers dicking around with no revenue on the line" to "production environment shared by a bunch of devs across a bunch of disparate teams, some of which are subject to government regulations" is quite the leap. Even in our simple test environment we've had people accidentally apply changes to the "production which is thankfully not really production" cluster that were meant for staging. Or do a "kubectl apply -f" without having pulled the latest version of the repo, blowing away changes someone else made. This is completely untenable. We easily could have a Jenkins job (or hell even a commit hook) that does the apply command and that would cover most cases. There are certain changes that require extra actions. But we could special case those. But it seems like there has to be a tool for this already because doing it manually is so janky and horrible. And I know companies way bigger than mine are running Kubernetes in production. Is that tool Helm? Something else? I agree Helm doesn't sound ideal.
|
# ? May 2, 2018 04:19 |
|
Docjowles posted:Hey I'm here to post the same question as Methanar and see if anyone has a different answer We've been doing a POC with kubernetes and have determined that it owns. But going from "a few engineers dicking around with no revenue on the line" to "production environment shared by a bunch of devs across a bunch of disparate teams, some of which are subject to government regulations" is quite the leap. Even in our simple test environment we've had people accidentally apply changes to the "production which is thankfully not really production" cluster that were meant for staging. Or do a "kubectl apply -f" without having pulled the latest version of the repo, blowing away changes someone else made. This is completely untenable. Sadly I think you probably want openshift if you want developers going right into your cluster
|
# ? May 2, 2018 05:38 |
|
jaegerx posted:Sadly I think you probably want openshift if you want developers going right into your cluster Seeing as I know everyone hates Openshift... what would you recommend as an alternative approach? We have a bunch of dev teams who all want to be able to deploy apps whenever. These are not microservices, but are mostly at least web apps written in modern-ish Java and actively maintained. If app behavior needs to be changed to work in a containerized world, it can be. We aren't trying to forklift some awful 1980's ERP thing. We have a pretty reasonable setup today where devs can deploy their applications whenever onto traditional VM's without involving operations. And can make relevant configuration changes via pull requests against our Chef cookbooks, with permission to deploy them once merged to master. But it's still slow and clunky and a waste of resources and ops ends up as a bottleneck more than we'd like. So now we've set up kubernetes which has a lot of advantages. But somehow ops applying changes is still a loving bottleneck and we need to fix that. We've gotten to the point where devs can update their containers with new code whenever and deploy that no problem. But any configuration change (deployments, services, configmaps, etc) is still manual and error prone. We are still early enough on that we could change almost anything if it turns out what we are doing is terrible. Docjowles fucked around with this message at 06:19 on May 2, 2018 |
# ? May 2, 2018 06:07 |
|
I actually love openshift. It's probably the easiest implementation of k8s right now(I haven't seen rancher 2.0). It's extremely developer friendly and I'm sure redhat is gonna add the cool parts of tectonic into it soon. From what you're saying to me you need something that's developer friendly so ops doesn't have to do every little thing. That's pretty much openshift(or rancher, I honestly don't know)
|
# ? May 2, 2018 06:18 |
|
I can't stop laughing at your avatar.Docjowles posted:Hey I'm here to post the same question as Methanar and see if anyone has a different answer I still haven't figured this out. Docjowles posted:So now we've set up kubernetes which has a lot of advantages. But somehow ops applying changes is still a loving bottleneck and we need to fix that. We've gotten to the point where devs can update their containers with new code whenever and deploy that no problem. But any configuration change (deployments, services, configmaps, etc) is still manual and error prone. This is basically where I'm at. I can write helm charts so that updating code associated with a chart is easy, but creating new deployments or charts is where it's still hairy.
|
# ? May 2, 2018 06:59 |
|
Yeah I don’t like helm unless I am installing something we don’t maintain. But yeah we are in the same place. The best tool right now is probably spinnaker for actual deployments but I think it’s super overkill. As for secrets and configmaps, Terraform's k8s provider has pretty good support, and if you align workspaces to namespaces it’s pretty easy to keep sorted; it has a ton of features missing, like it can’t do deployments yet. Work is underway to change that (I have pretty good connections to hashicorp and my bitching made them refocus on it). Ksonnet exists but I hate reading/writing JSON. There isn’t a magic bullet here and I don’t like using something that isn’t mainline kubernetes in an ecosystem that moves as fast as k8s. In practice we keep dev and prod completely separate and make sure our deployments have sanity checks. We also bundle services and deployments kubespecs into the actual docker images at build time and apply them by pulling the target docker artifact and then applying them to the target cluster.
|
# ? May 2, 2018 12:28 |
|
The brand new Stackdriver k8s integration is amazing. It beats anything I’ve seen so far.
|
# ? May 2, 2018 17:51 |
|
Methanar posted:I can't stop laughing at your avatar. Haha. This is more or less what it used to be before some dipshit bought me that awful anime avatar last month. Finally decided to drop the :5bux: to fix it. Thanks for the input, everyone. It's both comforting and disturbing that everyone else has the same problems and no real solutions.
|
# ? May 2, 2018 19:16 |
|
Docjowles posted:Haha. This is more or less what it used to be before some dipshit bought me that awful anime avatar last month. Finally decided to drop the :5bux: to fix it. It's funny because it's just slightly bigger and higher res than the last one. Still laughing at it. freeasinbeer posted:In practice we keep dev and prod completely separate and make sure our deployments have sanity checks. We also bundle services and deployments kubespecs into the actual docker images at build time and apply them by pulling the target docker artifact and then applying them to the target cluster. Wait what? You embed your service/rbac/pod/deployment/etc yaml configs into a docker container entrypoint, and instantiate them by executing the docker container?
|
# ? May 2, 2018 20:16 |
|
Anybody at KubeCon? If so, what happens after the two measly beer tickets?
|
# ? May 2, 2018 22:13 |
|
Methanar posted:Its funny because its just slightly bigger and high res than the last one. Still laughing at it. Only service or deployment, but yes. We pull down the docker image locally, grab the kubespec and apply. Our sole artifact is the docker image.
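A sketch of that "spec travels with the image" flow (registry, paths, and the kubeconfig context name are all invented here):

```shell
# The docker image is the sole artifact; the kubespec rides inside it.
docker pull registry.example.com/my-app:1.2.3
docker create --name spec-tmp registry.example.com/my-app:1.2.3
docker cp spec-tmp:/kubespec/deployment.yaml ./deployment.yaml  # assumed path
docker rm spec-tmp
# Apply the extracted spec against the target cluster.
kubectl --context prod apply -f deployment.yaml
```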
|
# ? May 3, 2018 00:18 |
|
Docjowles posted:Hey I'm here to post the same question as Methanar and see if anyone has a different answer We've been doing a POC with kubernetes and have determined that it owns. But going from "a few engineers dicking around with no revenue on the line" to "production environment shared by a bunch of devs across a bunch of disparate teams, some of which are subject to government regulations" is quite the leap. Even in our simple test environment we've had people accidentally apply changes to the "production which is thankfully not really production" cluster that were meant for staging. Or do a "kubectl apply -f" without having pulled the latest version of the repo, blowing away changes someone else made. This is completely untenable. I was tasked with dealing with how to manage our Kubernetes clusters and deploy to them. I had to solve for a number of issues around bare metal deployments at the same time; the clusters I manage are primarily used for running machine learning jobs, so some of the nodes are GPU-equipped. I ended up writing a big pile of Ansible playbooks and a couple thousand lines of Groovy to glue it all together in Jenkins. The playbooks are grouped into different sets of roles and take care of infrastructure and application deployment, building up a multi-master Kubernetes cluster and deploying our applications into the cluster. They contain all the YAML manifests for everything as well as the information for the image tags, and any merged PR generates a new tag for that set of playbooks. To do the actual deployments, there's a repository that contains the configuration information for each managed cluster (per-cluster Ansible variables and inventory), and a metadata JSON file that declares all the playbook repositories used by the cluster, the version tag for each playbook repository, and the order to run stuff in. The end result is that the workflow for deploying applications is entirely driven by GitHub Enterprise and Jenkins. 
To get an application update out to live clusters, devs:
It's worked pretty well for us. There are obvious rough spots in the workflow; doing a lot of PRs is repetitive and boring (especially when you're driving the same set of changes to several clusters in order), Jenkins is a crappy UI, I get pulled in on any non-superficial Ansible changes, and devs are generally a little scared to drive the process because they don't want the buck to stop with them. Even so, I'm working on fixing the rough spots and a lot of teams in the company are interested in using it, not least because the versioning system means they can synchronously push app+config changes at the same time. We did look at Helm before I wrote any of this stuff, but Helm worked really poorly in our walled-off internal environment, and what Helm does is a subset of what I wrote.
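Purely to illustrate the shape of that per-cluster metadata file (every field name here is invented; the post doesn't show the real schema):

```json
{
  "playbook_repos": [
    { "repo": "infra-playbooks", "tag": "v12", "order": 1 },
    { "repo": "app-playbooks",   "tag": "v47", "order": 2 }
  ]
}
```

Jenkins would read this, check out each repo at its pinned tag, and run the playbooks in `order` against the cluster's inventory.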
|
# ? May 3, 2018 00:31 |
|
Jenkins Pipeline docs talk about this new buildingTag condition: https://jenkins.io/doc/book/pipeline/syntax/#built-in-conditions Looks like it was added very recently: https://github.com/jenkins-infra/jenkins.io/pull/1430/files Does that mean it's in one of the Weekly releases? https://jenkins.io/download/ I didn't see it in the changelog yet: https://jenkins.io/changelog/
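Per those docs, `buildingTag` is a declarative `when` condition, so the usage would look something like this (stage name and step are made up; this assumes a plugin version that ships the condition):

```groovy
// Declarative pipeline sketch using the buildingTag built-in condition.
pipeline {
    agent any
    stages {
        stage('Release') {
            when { buildingTag() }  // only runs when the build is for a tag
            steps {
                echo "Releasing ${env.TAG_NAME}"
            }
        }
    }
}
```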
|
|
# ? May 5, 2018 01:04 |