|
Thankfully I had a few boilerplate statements put into our corporate policies to beef up my "don't make me tap the sign" replies when I get these kinds of requests. That doesn't stop people from asking "why not", but at least I can just tell them I'm not entertaining re-litigating our corporate policies right this very second, in probably a more diplomatic way. I think half of my annual feedback for policy updates is based on stupid arguments I've had over the past year.
|
# ? Mar 14, 2022 18:41 |
|
|
# ? Jun 4, 2024 21:47 |
|
It seems like I always first come in contact with the finance team at about month 8 at a job. I never get to look at their entire system, but it's always a completely separate computing system: everything is still on the same version as it was when it was installed six years ago, thousand-line-long cmd (Windows) scripts of spaghetti, abject maximum clusterfuck. It only interfaces in a couple places with production, but when they do need access, it's 10 minutes before their year-long initiative is set to go live.
|
# ? Mar 16, 2022 18:03 |
|
Hadlock posted:It seems like I always first come in contact with the finance team at about month 8 at a job This is eerily close to my own experience. I'd also say it's the worst part of having to jump organizations every 12-36 months to right-size my compensation. Once I get those first connections on the finance team, I'll intentionally keep the channels open to head off those "hair-on-fire, ten minutes to launch" issues. It's like an isolated team doesn't realize that what we do isn't actually magic, and takes time to accomplish.
|
# ? Mar 16, 2022 18:16 |
|
12 rats tied together posted:I think there are a bunch of really good and intuitive "rules of thumb" that we all mostly agree on. The largest and most interesting factor I concern myself with is the productivity of my customers, for example, it's a low-leverage use of time for a sr. frontend engineer to be learning all of the ways Terraform can be bad.

Maybe I have really high standards because I've always tried to be a high level red mage, but I don't think any of these technologies are particularly hard to understand on their own. It's the integration points between technologies that are typically undocumented, often without best practice, and almost always learned through trial-and-error, and I don't expect teams to repeat those explorations in order to learn and figure it out. That's where platform teams should start with the smallest abstractions possible, making sophisticated joinery easy, and grow to encompass the full surface area of the toolset only if it's absolutely necessary.

(As a sidebar: this is actually one of the cooler things about Kubernetes from the standpoint of a developer in a midsize organization: you don't need to be concerned with where your logs go or how your service gets load balanced or put into DNS, or what agent is handling your service metrics, or maybe even where your secrets come from: the abstractions are designed precisely so that you can make this a cluster administrator's job, and even the team writing the deployment automation doesn't need to care too much. You have a lot of work to do writing the automations that are right for your business, but the platform has made most of the joinery really painless.)

12 rats tied together posted:I think it naturally follows, then, that infrastructure should be owned by infrastructure, feature owned by feature, and that the original OP was right to put themselves in front of the clearly arbitrary nomad deployment choice, not that we ever disagreed at least on this aspect of it.
I actually don't think that "infrastructure" is a word that's well-defined enough to generalize how it should apply to an organizational structure or workflow/process configuration. It's literally any dependency that your application needs to function that isn't part of your running application itself. Is a cloud resource infrastructure? Almost always. But that could be an EKS cluster, an EC2 instance, an RDS DB instance, an S3 bucket, a load balancer configuration, a DynamoDB table, a SendGrid sender identity, or a Datadog alert. I'd definitely expect some of these to be low-leverage for a typical engineering team, but for others, the turnaround in not having it be self-service would be even lower leverage (and the company investing into bespoke automations for each of them would be lowest still). Vulture Culture fucked around with this message at 18:08 on Mar 19, 2022 |
# ? Mar 19, 2022 17:55 |
|
Is it weird to think it's expected for engineers to understand their tools? If a backend engineer didn't understand roughly what Docker does for them or how Terraform works, I'd have concerns about their skillset and their tenure with any company that's not so enterprise-y that devs just throw code over the wall. Obviously different expectations for juniors vs. seniors, but one of the things that used to drive me nuts when I worked on an SRE team was product engineers that @ed my team whenever a Jenkins job failed instead of reading the logs and trying to figure out why it failed.

Vulture Culture posted:(As a sidebar: this is actually one of the cooler things about Kubernetes from the standpoint of a developer in a midsize organization: you don't need to be concerned with where your logs go or how your service gets load balanced or put into DNS, or what agent is handling your service metrics, or maybe even where your secrets come from: the abstractions are designed precisely so that you can make this a cluster administrator's job, and even the team writing the deployment automation doesn't need to care too much. You have a lot of work to do writing the automations that are right for your business, but the platform has made most of the joinery really painless.)

I agree with almost everything you said except for this. Kubernetes is fundamentally difficult without (and I'd argue even with) an extremely knowledgeable and robust platform engineering team that is maintaining that joinery close to the product teams that use it. All of the log management, cluster metrics, load balancing, DNS, and secrets are in and of themselves join points, because while they're usually well-written integrations, they are extra pieces of an already complicated puzzle, and when they fail (which they inevitably will) they always fail in weird ways that can have varying effects on other pieces.
For example, we once had a log outage that was caused by a failure in coredns, but we didn't realize where the problem was occurring until the host DNS cache for our services expired and clients couldn't do DNS resolution. Similarly, we once had downtime for a huge chunk of our product because we lost a NAT router in the AZ where the external-dns leader pod lived. Turns out that for some reason that service communicates directly with etcd to maintain the leader lock, but couldn't make requests to Route 53 to update target groups because the router went down. These issues can't be abstracted away behind tooling; they're the result of CNCF-owned architectural decisions at multiple points, wherein the composition of these points can easily lead to cascading failures.

This also causes funny organizational issues, because if you're a platform team who sets up and administers a Kubernetes cluster, you need to define the edges of all these tiny components that make everything else work and communicate that to product teams in a way that makes it very clear who owns each piece, how to figure that out, and how to diagnose and escalate issues. Now multiply that across however many product teams and clusters and AWS accounts your platform team supports and it gets untenable without a large enough team to support these product teams. Then you run into the issue that as your platform team expands, it becomes spiderman-pointing-at-other-spiderman.png to determine who is responsible and knowledgeable enough to resolve these issues if domain knowledge of these joins is siloed in smaller parts of your platform team, which always happens as a team grows quickly to support a more complex landscape.

Basically I miss golden image AMIs. The same level of abstraction existed, except with fewer moving pieces.
|
# ? Mar 19, 2022 20:27 |
|
Blinkz0rz posted:Obviously different expectations for juniors vs seniors but one of the things that used to drive me nuts when I worked on a SRE team were product engineers that @ed my team whenever a Jenkins job failed instead of reading the logs and trying to figure out why it failed.

Almost every Stack Overflow question goes like this:

"X failed because of Y. Change Z to correct the issue."
"HELP WITH THIS ERROR MESSAGE! HIGH PRIORITY URGENT"
"...did you change Z?"
"NO, HELP"
*changes Z, problem resolves*
|
# ? Mar 19, 2022 20:36 |
|
We just bootstrapped a new set of domain services using NixOS, and I'm pretty sure we'll replace a giant chunk of our toolchain with it eventually elsewhere. It gives me the same warm and fuzzies golden AMIs used to.
|
# ? Mar 19, 2022 20:37 |
|
New Yorp New Yorp posted:Almost every stack overflow question goes like this:

Yup, and because it's so loving easy to just google something, I'll never accept an engineer that's too lazy to try to help themselves. Not in my previous life as an SRE, and most definitely not in my current role with a product team.
|
# ? Mar 19, 2022 20:38 |
|
Slightly different subject: has anyone found a good "stack in a box" setup that's composable but also allows for easy service debugging? We played around with skaffold, but our services are written in Go and it was extremely painful to integrate with IntelliJ/GoLand, and running a debugger meant remotely connecting to a running instance of the service, so debugging startup issues was basically impossible. Also, maintaining Helm charts for our services when they're deployed to remote environments via raw Kubernetes manifests from Spinnaker was a huge pain. What I'd love to have is a way to mash different docker compose stacks together with shared dependencies, but I don't think that's possible.
|
# ? Mar 19, 2022 20:45 |
|
Blinkz0rz posted:I agree with almost everything you said except for this. Kubernetes is fundamentally difficult without (and I'd argue even with) an extremely knowledgeable and robust platform engineering team that is maintaining that joinery close to the product teams that use it.

Blinkz0rz posted:This also causes funny organizational issues because if you're a platform team who sets up and administers a kubernetes cluster you need to define the edges of all these tiny components that make everything else work and communicate that to product teams in a way that makes it very clear who owns each piece, how to figure that out, and how to diagnose and escalate issues. Now multiply that across however many product teams and clusters and AWS accounts your platform team supports and it gets untenable without a large enough team to support these product teams.

At the end of the day, it's not a fully-managed platform. It's Just Enough Platform to either solve a problem really elegantly or drop a shipping container on your foot. But you don't need to write the integrations yourself, or task yourself with figuring out precisely where the boundaries are supposed to be, or build your own control loops to run them.

Maybe the worst part of the dominant messaging is the idea that Kubernetes somehow makes you hybrid something-or-other, or that it allows you to hedge against competing cloud ecosystems by being some kind of abstraction layer that prevents you from having to make choices with tradeoffs. It isn't that. It's an OS for services, and it only adds value if you actually lean into its ecosystem. As a dumb analogy: you don't succeed at running Linux by only using the preinstalled packages and making sure to only touch the POSIX APIs.
Blinkz0rz posted:Then you run into the issue that as your platform team expands it becomes spiderman-pointing-at-other-spiderman.png to determine who is responsible and knowledgeable enough to resolve these issues if domain knowledge of these joins is siloed in smaller parts of your platform team which always happens as a team grows quickly to support a more complex landscape.

Blinkz0rz posted:Basically I miss golden image AMIs. The same level of abstraction existed except with fewer moving pieces.

On the other hand, don't add the complexity if you don't actually need it. I'm not advocating everyone, or even most people, should use it. But for people who keep finding themselves roping developers into year-long migrations between infrastructure dependencies, it might be time for them to look around at how other ecosystems are addressing that problem, because the value add of the platform isn't "runs container good".

Vulture Culture fucked around with this message at 21:20 on Mar 19, 2022 |
# ? Mar 19, 2022 21:08 |
|
Gyshall posted:We just bootstrapped a new set of domain services using nixos, and I'm pretty sure we'll replace a giant chunk of our tool chain with it eventually elsewhere. we're on someone else's hosted kubernetes service so we're not Going Full Nixos* for a long time, but i have made a proof of concept showing how much our pipelines could be sped up by doing container image builds with nix instead of docker. the caching is incredible, and the tooling around packaging different languages keeps getting better (partly i think because more and more language package managers are accepting the strategy of either fully vendoring dependencies or committing hashes for them) *: im using it on my personal laptop though so i am absolutely doomed
|
# ? Mar 19, 2022 21:33 |
|
Vulture Culture posted:Well, yeah. If you're going to build or integrate a platform, you need to support it. If you can't support it, it's a bad decision. The point is that these days, the ecosystem is so robust that a competent operations team can make Kubernetes provide a lot of value even if they just expose bare-rear end naked Kubernetes API to engineers. That wasn't the case a few years back.

This is some MBA-cum-technologist speak where you write a lot but say very little beyond "git gud." Of course you need an extremely competent platform organization with error budgets, SLAs, bulletproof tooling, deep technical knowledge, and near-perfect knowledge dissemination, as well as a business that recognizes the importance of this team as an enabler for product even if the financial outlay isn't realized until product grows and scales. How many of those do you think exist, especially at the size you seem to think Kubernetes is appropriate for?
|
# ? Mar 19, 2022 21:53 |
|
Blinkz0rz posted:What I'd love to have is a way to mash different docker compose stacks together with shared dependencies but I don't think that's possible. We end up using kustomize base+overlays for loads of stuff where helm charts are too much of a pain in the rear end for the value they provide.
|
# ? Mar 19, 2022 22:14 |
|
Junkiebev posted:We end up using kustomize base+overlays for loads of stuff where helm charts are too much of a pain in the rear end for the value they provide.

You're still running kubernetes on workstations, which makes my aforementioned debugging requirement way tougher than it needs to be.
|
# ? Mar 19, 2022 22:17 |
|
Blinkz0rz posted:What I'd love to have is a way to mash different docker compose stacks together with shared dependencies but I don't think that's possible.

This is, I guess, what I have been getting at. You can hand "I need a good way to manage the composition and orchestration of these yaml fragments" to an infrastructure team and that's like 20 minutes of ansible. Scaling docker compose files has been a solved problem since 2013. The kubernetes ecosystem and, like you said, the CNCF, is to blame for repeatedly un-and-re-solving this problem for nearly a decade now, because kubernetes is a psyop for selling kubernetes tooling and advertisement impressions for kubernetes-adjacent medium articles.
|
# ? Mar 19, 2022 23:22 |
|
Vulture Culture posted:[...] that could be an EKS cluster, an EC2 instance, an RDS DB instance, an S3 bucket, a load balancer configuration, a DynamoDB table, a SendGrid sender identity, or a Datadog alert. I'd definitely expect some of these to be low-leverage for a typical engineering team, but for others, the turnaround in not having it be self-service would be even lower leverage (and the company investing into bespoke automations for each of them would be lowest still).

I will take the time to fully read and respond to your post when it's not the weekend and I'm not phone posting, but I was on board until about here. My counter-opinion is going to be kind of low value for the discussion though: if you have to build bespoke automations for these things and it is of significant cost to your infrastructure teams, and despite being significant cost they do not actually provide value for >99% of use cases at your org, your infrastructure team is bad and needs to be replaced. That's not a reasonable thing to suggest, and it's not helpful to anyone experiencing the issue at their job, but there's no way around it IMO.

If you suggest that k8s is the best fit for your org because there's something fundamental about the concept of a "Deployment" API object, that's nonsense. If k8s is the best fit for your org because you got unlucky in devops hiring roulette and you have access to a team of people who can't build anything except modified examples of the terraform docs snippets? I totally agree. Probably there is a sliding scale here where you can argue that you should pick the deployment tech that is optimized for the least bad part of your engineering org. I guess you suggested this originally with patronage-based ops, even.
|
# ? Mar 19, 2022 23:32 |
|
How do I career transition to working as a gardening center associate
|
# ? Mar 20, 2022 00:05 |
|
12 rats tied together posted:This is, I guess, what I have been getting at. You can hand "I need a good way to manage the composition and orchestration of these yaml fragments" to an infrastructure team and that's like 20 minutes of ansible. Scaling docker compose files has been a solved problem since 2013.

So is there an answer here or what? Compose overrides aren't what I'm looking for because they require passing multiple compose files rather than understanding whether an existing container satisfies the service. Extends is gone in 3.x. I'd rather keep this simple and avoid writing orchestration wrappers around my dev containers, and this sort of thing seems like something compose would support out of the box.
|
# ? Mar 20, 2022 00:17 |
|
Blinkz0rz posted:This is some MBA-cum-technologist speak where you write a lot but say very little beyond "git gud." Of course you need an extremely competent platform organization with error budgets, SLAs, bulletproof tooling, deep technical knowledge, and near-perfect knowledge dissemination as well as a business that recognizes the importance of this team as an enabler for product even if the financial outlay isn't realized until product grows and scales. How many of those do you think exist, especially at the size you seem to think Kubernetes is appropriate for?
|
# ? Mar 20, 2022 23:00 |
|
I set up CI for a project using github actions, and it was really pleasant! The ease of direct integration is a huge plus for me, and the pricing is way easier to understand than whatever the gently caress CircleCI does. I uh, somehow missed that OSX costs 10x more, though. I thought it was maybe 3x? Gonna have to cut lots of unnecessary poo poo, cus there go this month's minutes.
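For anyone else hitting the macOS pricing surprise, one common shape (a sketch; the job names, `make test` target, and branch condition are all placeholders) is to run everything on Linux by default and gate the 10x-priced macOS runner behind a condition:

```yaml
name: ci
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: make test
  test-macos:
    # only burn macOS minutes on the main branch
    if: github.ref == 'refs/heads/main'
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v3
      - run: make test
```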
|
# ? Mar 21, 2022 02:35 |
|
We are getting ready to get rid of circleci and replace it with github actions
|
# ? Mar 21, 2022 02:37 |
|
I would not recommend doing that unless you have very modest requirements. Github actions is not very good at keeping their macOS builders up to date (e.g. they still haven't updated to macOS 12 and so can't run Xcode 13.3) and even on the Enterprise plan it took us ~6 months to successfully get bumped up 50 concurrent macOS jobs.
|
# ? Mar 21, 2022 02:52 |
|
OTOH, CircleCI still doesn't have support for arm64 with their docker executor and talks down at you if you dare ask for it.
|
# ? Mar 21, 2022 06:05 |
|
On Friday I had enough and finally just marked myself as ⛔️ with a message that I'm busy, DND, and away, and then closed Slack entirely. It was great and I did it again today after 10AM. Everyone can deal with their own poo poo for a few days. This is the first time I've gotten any non-interrupt work done in a month and it feels good. Highly recommend.
|
# ? Mar 21, 2022 20:26 |
|
luminalflux posted:OTOH, CircleCI still doesn't have support for arm64 with their docker executor and talks down at you if you dare ask for it. It's almost worth it to have a multi core physical machine on site for all the time circleci is down, just to keep up with the load
|
# ? Mar 22, 2022 01:39 |
|
Blinkz0rz posted:So is there an answer here or what? Compose overrides aren't what I'm looking for because they require passing multiple compose files rather than understanding whether an existing container satisfies the service. Extends is gone in 3.x.

I'm sorry, I didn't mean for that to come off as blithe as it did. The bad news is, I use orchestration tech for this. The good news is that it's been the same orchestration tech for almost a decade and it has only gotten better over time.

The gist of it is, a docker compose file is a config file that needs to exist on a server. Use ansible, put your compose fragments in a folder called "compose_fragments", write an ansible role for each compose file output you need, and use ansible's j2 integration to render arbitrarily complex compose files (or k8s manifests, or cloudformation templates, or terraform configs, etc.). Extract common functionality (e.g. logging sidecars, prometheus scrapers, whatever) into .j2 files and put them into "compose_fragments". In ansible, jinja2 includes evaluate from either your role_path or the path that your playbook executes from, so "global includes" go into compose_fragments, and "specific includes" go into the per-role folder structure.

Ansible also includes every variable "in scope" (e.g. loaded from group_vars membership), which means maintaining hierarchical data in ansible becomes an extremely high-leverage activity, since you can use it in a unified approach to managing every type of infrastructure, using any type of orchestrator or scheduler, from the same set of controls. In a situation where you need to be more dynamic than simple snippet loading, you can use jinja2 extend, but I've found that when people actually attempt to use this functionality they usually decide it's not worth the extra complexity and go back to snippet includes.
Slightly more useful is to define some set of jinja2 macros (or write an ansible plugin, all of which become available for use in any template) to DRY up your templates because they can get pretty messy. You could also build your own thing on top of jinja2, or whatever other templating engine, but in my experience whenever people do this it ends up being worse, sometimes alarmingly worse (helm+tiller), with no benefit. I do agree with you that ultimately compose should just support this out of the box. It's literally "step 2" on any project more complicated than a blog article snippet.
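To make the fragment-include idea concrete, here's a minimal pure-stdlib stand-in for the pattern (in real use this would be Ansible's template module driving jinja2's `{% include %}`; the fragment name, variable names, and `{{include:...}}` marker syntax here are all made up for illustration):

```python
import os
import re
import tempfile

# Set up a fake project layout: a shared fragment directory plus a template.
tmp = tempfile.mkdtemp()
frags = os.path.join(tmp, "compose_fragments")
os.makedirs(frags)

# A shared fragment: a logging sidecar any stack can pull in.
with open(os.path.join(frags, "log_sidecar.yml"), "w") as f:
    f.write("  logger:\n    image: fluentd:latest\n")

# A per-role compose template that references the shared fragment.
template = (
    "services:\n"
    "  app:\n"
    "    image: {app_image}\n"
    "{{include:log_sidecar.yml}}"
)

def render(tpl, frag_dir, **values):
    # Splice {{include:NAME}} markers with fragment file contents,
    # then fill in simple {var} placeholders.
    def splice(match):
        with open(os.path.join(frag_dir, match.group(1))) as f:
            return f.read()
    tpl = re.sub(r"\{\{include:([\w.]+)\}\}", splice, tpl)
    return tpl.format(**values)

out = render(template, frags, app_image="myapp:1.0")
print(out)
```

The output is a single merged compose file with the app service and the shared sidecar, which is the whole trick: common snippets live in one place, each role's template stays tiny.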
|
# ? Mar 22, 2022 02:30 |
|
12 rats tied together posted:I'm sorry, I didn't mean for that to come off as blithe as it did. The bad news is, I use orchestration tech for this. The good news is that it's been the same orchestration tech for almost a decade and it has only gotten better over time. The gist of it is, a docker compose file is a config file that needs to exist on a server. Use ansible, put your compose fragments in a folder called "compose_fragments", write an ansible role for each compose file output you need, use ansible's j2 integration to render arbitrarily complex compose files (or k8s manifests, or cloudformation templates, or terraform configs, etc.).

You...you are comparing compose manifests to other orchestrator configs? Are you using swarm or something? I just find it very hard to see how going this route makes more sense over something like Kustomize. Seems way more complex and likely to fail from a JR dev loving something up.
|
# ? Mar 22, 2022 03:49 |
|
Kustomize is alright, but the base/overlay/RFC 6902 stuff results in an extremely verbose and dense directory structure. It's my experience that deeply nested and confusing directory structures result in higher onboarding difficulties, burnout, and frustration with codebases than complicated files do.

Kustomize also only works on k8s, whereas with ansible, k8s is just an implementation detail. In fact, you could just use kustomize with ansible, and I've done this myself extensively at a previous job -- my takeaway was that it was worse than jinja2, but I almost always think that. Ansible's runtime safety is also, generally, much higher than kustomize's (at least last I looked, which was 2019ish) thanks to its long history of being a deployment orchestrator instead of a document generator. Any kind of preflight, inflight, or postflight safety check you can think of is likely already a directly supported feature, or worst case scenario, 20 lines of python or yaml away from being reality for you.

It hasn't been my experience that junior devs have a hard time learning ansible, in general, but I have a lot of experience with it and I like to think I'm a pretty OK teacher. Both ansible and kustomize are way better than helm in every way that they could be.
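For reference, the base/overlay layout under discussion usually ends up shaped something like this (all paths hypothetical):

```
app/
├── base/
│   ├── kustomization.yaml   # resources: [deployment.yaml, service.yaml]
│   ├── deployment.yaml
│   └── service.yaml
└── overlays/
    ├── staging/
    │   ├── kustomization.yaml   # resources: [../../base], patches: [replicas.yaml]
    │   └── replicas.yaml        # RFC 6902 patch bumping spec.replicas
    └── prod/
        ├── kustomization.yaml
        └── replicas.yaml
```

Every new environment or variant adds another directory with its own kustomization.yaml and patch files, which is the verbosity being complained about here.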
|
# ? Mar 22, 2022 06:56 |
|
You know you can apply to work at Red Hat to actually get paid for evangelizing Ansible in every situation, right?
|
# ? Mar 22, 2022 23:42 |
|
My current shop has a team where we spend 50%-60% of our dedicated ticket work doing Ansible support for them, and they're a combination of junior and senior devs (mostly Java and some C). I love ansible so much, but one of the big things I've found is that it isn't the most straight or efficient path to solving a problem, so traditional devs have a hard time grasping some of the abstractions.

w/r/t provisioning and managing your own k8s: it all sucks rear end
|
# ? Mar 23, 2022 10:48 |
|
Also worth noting that the problem was figuring out how to have a "stack in a box" for local dev work where different services in different repos might have different, conflicting, or the same infra dependencies while also being able to launch services with a debugger. Nowhere does Ansible remotely look like a solution.
|
# ? Mar 23, 2022 12:16 |
You’re probably struggling to find an answer because the solution is not to do that on a local machine, because why would you even want to
|
|
# ? Mar 23, 2022 12:19 |
|
This certainly sounds like a huge abuse of Ansible to me. I see things like that all the time. "We like and know X, so we just try to do everything with it" -- where X is Ansible, Terraform, GH Actions, Azure Pipelines, Chef, etc. Like, every additional layer of abstraction you add over tooling is one more place something can go wrong. The process described with Ansible sounds insanely complex with practically no benefit versus just authoring Helm charts or using Kustomize.
|
# ? Mar 23, 2022 12:22 |
|
i am a moron posted:You’re probably struggling to find an answer because the solution is not to do that on a local machine, because why would you even want to

You've never needed to spin up services for local dev? Say I have service A and service B and they both need a DB. The code for each service is in different repos. I don't want to run 2 DB containers but instead use a different database within the same postgres container. How do I ensure that the logic of "start the DB if it isn't started or use the existing DB container if it is" is reflected via docker compose? I know, the answer is that docker compose can't do this, but I don't want to run kubernetes locally just to avoid port collisions and duplicate infra for each service just to run them on a dev box.
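One way this is commonly worked around (a sketch; the file names, network name, and credentials are all made up) is to split the shared infra into its own compose file that owns a named external network, and have each repo's stack join it instead of declaring its own Postgres:

```yaml
# docker-compose.infra.yml -- started once, owns the shared Postgres:
services:
  postgres:
    image: postgres:14
    environment:
      POSTGRES_USER: dev
      POSTGRES_PASSWORD: dev
    volumes:
      # init scripts here create one database per service (service_a, service_b)
      - ./initdb:/docker-entrypoint-initdb.d
    networks: [devnet]
networks:
  devnet:
    name: devnet

# docker-compose.yml in service A's repo -- joins the already-running
# network instead of declaring its own Postgres:
services:
  service-a:
    build: .
    environment:
      DATABASE_URL: postgres://dev:dev@postgres:5432/service_a
    networks: [devnet]
networks:
  devnet:
    external: true
```

You'd run the infra file's `up -d` once, then each repo's stack after that; compose won't touch Postgres from the per-service stacks because it isn't defined there. It doesn't give you an automatic "start it if it isn't running" check, but it does kill the duplicate-DB and port-collision problems.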
|
# ? Mar 23, 2022 12:25 |
I’m out here wasting people's money like crazy; usually I just give developers a sandbox in [choose your favorite cloud platform] and all the tools they need to make it as close to when they push to dev as possible. I’m also extremely used to people having junky VDIs and whatnot and placing the localized developer workloads in the cloud even for the simplest of things.

Edit: which could defeat the entire purpose of what you’re saying, I dunno. Never thought about it much

i am a moron fucked around with this message at 12:37 on Mar 23, 2022 |
|
# ? Mar 23, 2022 12:34 |
|
Blinkz0rz posted:Also worth noting that the problem was figuring out how to have a "stack in a box" for local dev work where different services in different repos might have different, conflicting, or the same infra dependencies while also being able to launch services with a debugger.

The answer is mocking service calls and responses. We see the "local stack" as an anti-pattern because it stops scaling past a few domains and 10+ microservices/db backends/integrations etc.

That reminds me, though: does anyone have experience testing GCP serverless functions locally?
|
# ? Mar 23, 2022 13:46 |
|
I'm on a project using ansible for stuff that they could just put directly into their Jenkinsfiles. I don't really get it. My contract ends in a month and I can't wait to leave.
|
# ? Mar 23, 2022 14:06 |
|
asap-salafi posted:I'm on a project using ansible for stuff that they could just put directly into their Jenkinsfiles. I don't really get it. My contract ends in a month and I can't wait to leave.

I inherited a thing that is like this in Azure DevOps. Multiple pipelines that call multiple ansible playbooks, and more than half the time there are a couple layers of nested playbooks. Nightmare to maintain, even being familiar with ansible.
|
# ? Mar 23, 2022 15:39 |
|
My current project is completely removing ansible from a major workflow in favor of chef. The last thing I want in my life is another yaml dsl
|
# ? Mar 23, 2022 15:42 |
|
|
Hilariously, we use chef for our actual config management
|
# ? Mar 23, 2022 15:51 |