|
I will never forget the production deployment that failed to roll back my Kinesis changes because AWS has a default limit of n stream changes per 24 hours, such that I could make a mistake exactly once; on the second failure the stream could no longer be rolled back, and I had to sit and wait for AWS support to unlock me at 1 am. Who decides these limits exactly anyway?
|
# ? Feb 12, 2023 04:07 |
|
necrobobsledder posted:Who decides these limits exactly anyway? Somehow I feel the answer would only make us more unhappy.
|
|
# ? Feb 12, 2023 04:39 |
|
Resdfru posted:I refuse to acknowledge cloudformations existence New thread title?
|
# ? Feb 12, 2023 04:46 |
|
Hadlock posted:New thread title? tbh I think this is better madmatt112 posted:Somehow I feel the answer would only make us more unhappy.
|
# ? Feb 12, 2023 05:10 |
|
you can be forgiven for assuming cloudformation is bad if you come from azure or gcp where arm and gcp deployment manager are strictly worse than just using the platform normally. in aws, though, cloudformation is fantastic, absolutely a key service that should be on your resume/interview panel. even, and especially, if you consider yourself more of a terraform user -- terraform "vs" cfn is just marketing, you should be using both. why? let's do a quick feature review of things that should be especially relevant for any professional working with infrastructure as code:
it's a good practice to wrap anything that matters in a cfn stack. you can think about it like a database transaction but for infrastructure, which is an apt comparison because cloud iac is way closer to DBA than it is to SWE
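To make the transaction comparison concrete, here's a minimal sketch of a stack template; the resource names and types are arbitrary examples, not from the post. If any resource fails to create, CloudFormation rolls back everything in the stack.

```yaml
# Minimal CloudFormation stack sketch. The two resources below are
# arbitrary examples: if either fails to create, the whole stack
# rolls back, roughly like a database transaction aborting.
AWSTemplateFormatVersion: "2010-09-09"
Description: Example stack treated as one atomic unit of infrastructure
Resources:
  AppBucket:
    Type: AWS::S3::Bucket
  AppQueue:
    Type: AWS::SQS::Queue
Outputs:
  QueueUrl:
    Description: Handy for wiring this stack's outputs into another stack
    Value: !Ref AppQueue
```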
|
# ? Feb 12, 2023 17:46 |
|
12 rats tied together posted:you can be forgiven for assuming cloudformation is bad if you come from azure or gcp where arm and gcp deployment manager are strictly worse than just using the platform normally This is why it looks good on paper, but not so much in practice. I'll take poorly formed TF module structure bullshit over the nonsense I've had to deal with because of some bullshit resource graphing issue within CF
|
# ? Feb 12, 2023 23:08 |
|
My favorite thing about cloudformation back when I used it was when a stack deployment failed and you couldn't just edit the bad parameter and had to go delete and relaunch it. Man that was fun. I think they fixed that now but I'm never gonna use cloudformation to find out
|
# ? Feb 12, 2023 23:55 |
|
that happens when a stack fails to create from scratch, it rolls back to create_failed, and yes you must manually delete it because the alternative is that cloudformation authoritatively deletes your api objects without your consent which would be bad. you should not be experiencing create_failed very often, it's usually a sign of someone holding something wrong
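For reference, the cleanup flow being described looks roughly like this with the AWS CLI. The stack name is a placeholder, and this is a sketch that needs real AWS credentials to run.

```shell
# Find which resource actually caused the rollback ("my-stack" is a
# placeholder name), then delete the failed stack shell by hand.
aws cloudformation describe-stack-events \
  --stack-name my-stack \
  --query "StackEvents[?ResourceStatus=='CREATE_FAILED'].[LogicalResourceId,ResourceStatusReason]"

aws cloudformation delete-stack --stack-name my-stack
aws cloudformation wait stack-delete-complete --stack-name my-stack
```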
|
# ? Feb 13, 2023 00:10 |
|
12 rats tied together posted:that happens when a stack fails to create from scratch, it rolls back to create_failed, and yes you must manually delete it because the alternative is that cloudformation authoritatively deletes your api objects without your consent which would be bad No, I'm not saying I want cfn to delete it for me. I'm saying I want to edit it, because I fat fingered a single parameter and the whole thing failed, and instead of being able to just edit it I have to delete and start over. Call it someone doing something dumb/wrong that shouldn't happen much, sure, if you want, I'm not perfect. But it happens and it's annoying as hell. I'm sure it's some api limitation or something so it is what it is. I prefer terraform and that's what I use. If someone else wants to use cloudformation then that's cool
|
# ? Feb 13, 2023 02:01 |
|
Resdfru posted:I think they fixed that now but I'm never gonna use cloudformation to find out Cloudformation marketing slogan #1
|
# ? Feb 13, 2023 02:52 |
12 rats tied together posted:you can be forgiven for assuming cloudformation is bad if you come from azure or gcp where arm and gcp deployment manager are strictly worse than just using the platform normally Azure does all this too now but who cares? Why would anyone want vendor-locked IaC that sucks worse than TF?
|
|
# ? Feb 13, 2023 13:30 |
|
you should use the good parts of your platform. it's not good engineering to find the lowest common denominator and intentionally spread its failings to every other site you run
|
# ? Feb 13, 2023 13:45 |
|
it doesn't really do all that, anyway, and i'm mad you made me read azure documentation to see if they've improved that garbage in the past 3 years. not using arm is justifiable because it's bad, and gcp deployment manager is even worse. cloudformation is as valuable as RDS or ElastiCache, it should be on your toolbelt if you get paid to touch AWS.
|
# ? Feb 13, 2023 14:03 |
Lol it records stuff for sure and shows config changes. I might’ve glossed over the rest of your points. I wouldn’t even know anyways, I literally never use it
|
|
# ? Feb 13, 2023 14:06 |
|
Back when I used cloudformation more extensively it had a tendency to get into a create_failed then delete_failed state which required aws support to intervene and actually delete the stack and the resources associated with it. I’m sure it’s not as bad anymore but why even risk it?
|
# ? Feb 13, 2023 15:43 |
|
The one time I tried CloudFormation it happily created my resources but got stuck in a failure state trying to tear everything down again. Thanks, Amazon!
|
# ? Feb 13, 2023 15:54 |
|
When I last used cloudformation in 2018/2019 it sucked a lot more but these days I have such contempt for terraform and module related footguns I’m pretty open to alternatives
|
# ? Feb 13, 2023 16:24 |
|
Blinkz0rz posted:Back when I used cloudformation more extensively it had a tendency to get into a create_failed then delete_failed state which required aws support to intervene and actually delete the stack and the resources associated with it. It still does this on occasion but you need to pass the aforementioned compatibility flag with your API call to disengage the safety and then you're free to engage the footgun on full auto. The Iron Rose posted:When I last used cloudformation in 2018/2019 it sucked a lot more but these days I have such contempt for terraform and module related footguns I’m pretty open to alternatives Cloudformation is not, by itself, a replacement for terraform unless you really like clicking buttons and pasting values into UI prompts. You will want some kind of compositor, I have posted extensively about my preferences ITT and we don't need to go into them again. Terraform could be your compositor but it sucks for the reasons you hinted at. Pulumi is good. AWS CDK is good if you can't pitch Pulumi. As a last resort, you can use the CDKTF, which is like a store brand version of Pulumi except brought to you by the same people who brought you your first set of problems.
|
# ? Feb 13, 2023 17:16 |
|
12 rats tied together posted:As a last resort, you can use the CDKTF, which is like a store brand version of Pulumi except brought to you by the same people who brought you your first set of problems. The one and only actual use case I've found for it is just generating resources with absolutely random combinations of properties, to create piles of arbitrary garbage to test Sentinel rules against. It's good at layering in "here's a perfectly good and conformant resource, now gently caress it up in some prescribed way" Vulture Culture fucked around with this message at 22:10 on Feb 13, 2023 |
# ? Feb 13, 2023 22:05 |
|
I'm in the market for a new job. I'm currently a "devops engineer" but with skills gaps that you could drive a truck through. I'm a pretty quick learner, we just have pretty backwards infrastructure so I don't do a lot of modern stuff. I'm ~12 years into my IT career, so not a "junior" by any stretch, though I don't think I'd hit the bullet points for many "senior" devops engineer positions. Anyone have tips on the kinds of things I should be focused on learning to help me get my foot in the door? I've got lots of experience with Azure DevOps, some container experience, some Azure App Service experience, some other Azure services, and over a decade of more complex "sysadmin" experience. I started my career on Solaris and Linux, ended up deep in the Windows world, but am still pretty passable at Linux. I can pretty quickly learn just about anything technical thrown at me, I'm just not sure what I should be throwing at myself.
|
# ? Feb 13, 2023 22:33 |
|
If you replace all instances of Azure with AWS, that will probably increase your job pool by like 2 orders of magnitude, with everything managed by some sort of infrastructure-as-code tool such as Terraform. Sprinkle in some Kubernetes fundamentals and you're cooking.
|
# ? Feb 13, 2023 22:41 |
|
Based on some of the recent discussion here, is there much I should be looking at besides deploying to Kubernetes? Build some test clusters using AKS and/or EKS?
|
# ? Feb 13, 2023 22:43 |
|
FISHMANPET posted:Based on some of the recent discussion here, is there much I should be looking at besides deploying to Kubernetes? Build some test clusters using AKS and/or EKS? Think about how you might manage or own a release system to Kubernetes: Flux, Helmfile, ArgoCD, other. Be able to explain back the Kubernetes networking model and what all of the pieces do: what is a CNI, what is ipvs, what is a nodePort, what does kube-proxy do, what is containerd and what happened to docker. Talk about the financial benefits of Kubernetes - why is it good beyond resume-driven fad development: easier adoption of things like spot instances, HPA, immutable infrastructure; avoiding vendor lock-in by developing tooling against an OSS platform rather than proprietary AWS/Azure tech; parity of logging, timeseries collection, and release system infrastructure across AWS and a private datacenter. Be able to explain scaling Prometheus and log aggregation tools. People have been doing microservices since long before k8s - why bother? Methanar fucked around with this message at 22:52 on Feb 13, 2023 |
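As one concrete anchor for those networking questions, a NodePort Service is worth being able to explain field by field. All names and ports below are arbitrary examples.

```yaml
# A NodePort Service: kube-proxy programs every node to forward
# traffic arriving on nodePort 30080 to matching pods on port 8080.
# All names and ports here are made-up examples.
apiVersion: v1
kind: Service
metadata:
  name: demo-svc
spec:
  type: NodePort
  selector:
    app: demo
  ports:
    - port: 80         # cluster-internal Service port
      targetPort: 8080 # container port on the selected pods
      nodePort: 30080  # opened on every node (default range 30000-32767)
```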
# ? Feb 13, 2023 22:48 |
|
FISHMANPET posted:Based on some of the recent discussion here, is there much I should be looking at besides deploying to Kubernetes? Build some test clusters using AKS and/or EKS? You write your own paychecks by getting good at this.
|
# ? Feb 13, 2023 22:52 |
|
Vulture Culture posted:What's really nice about CDKTF is that the whole reason to not write Terraform is to avoid the inanity of its DAG and single-phase application. CDKTF wraps up the whole thing in a package that makes you feel like you have the flexibility of TypeScript or Python but actually ties your hands even harder the second you need to do something with a computed value, then fucks you with the same DAG over and over and over the exciting/interesting thing about pulumi's strategy (AFAIR) is that their code uses the terraform providers just as a source of free c/r/u/d behavior, since the providers are all open source, but the core terraform dag is not used so you aren't bound by its limitations. it also means any terraform provider is also a valid pulumi provider, so it's a source of free and perpetual market relevance there are some other weird things you might run into, it really does approximate local cloud formation in that there are "stacks", but overall it's really quite a nice product. i will be sad if they shutter it.
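A rough toy model of the "computed value" idea; nothing here is the real pulumi API, it's just an illustration of why deferring unknown values works: a value that doesn't exist until deploy time can still be transformed declaratively.

```python
# Toy model of a Pulumi-style "Output": a value unknown until the
# deployment engine resolves it. Purely illustrative -- none of these
# names match the real pulumi API.
class Output:
    def __init__(self, resolver):
        self._resolver = resolver  # callable that produces the value later

    def apply(self, fn):
        # Chain a transformation without forcing resolution yet
        return Output(lambda: fn(self._resolver()))

    def resolve(self):
        # The "engine" calls this once the real cloud value exists
        return self._resolver()

# e.g. a bucket name with a generated suffix, unknown until deploy time
bucket_name = Output(lambda: "my-bucket-4f3a9c")

# Build a derived value declaratively; nothing has resolved yet
policy_arn = bucket_name.apply(lambda n: f"arn:aws:s3:::{n}/*")

print(policy_arn.resolve())  # arn:aws:s3:::my-bucket-4f3a9c/*
```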
|
# ? Feb 13, 2023 23:02 |
|
Vulture Culture posted:The hardest problems being faced by any modern technology organization all revolve around the catastrophe that emerges at the intersection of highly autonomous teams, self-service cloud environments, and the across-the-board application of governance/standards (either for risk management or for scale). It's really a shame this is too long to be the thread title
|
# ? Feb 13, 2023 23:07 |
|
Docjowles posted:It's really a shame this is too long to be the thread title Devops: catastrophe of crossing autonomous teams&self-service cloud Is 68 characters. Not sure what the limit is. But this is a good quote and I'm stealing it. Perfectly sums up what I need to deal with these days. Vulture Culture posted:The hardest problems being faced by any modern technology organization all revolve around the catastrophe that emerges at the intersection of highly autonomous teams, self-service cloud environments, and the across-the-board application of governance/standards (either for risk management or for scale). Pretend you're someone, anyone, who needs to safely get the same change made across the same layer of multiple applications within a company. How do you do it?
|
# ? Feb 13, 2023 23:13 |
|
I think I'd like to say "Solve the combined technical debt of engineers and management for $$$$" Not sure if I'm going to get paid the 7 figgies like some of the folks I know that quit over pay were looking for here though so maybe I'm too stupid or inexperienced to know that I'm actually bad at it.
|
# ? Feb 14, 2023 01:17 |
|
Technical debt isn't real. All you have are choices with consequences, and that's a lot scarier.
|
# ? Feb 14, 2023 17:43 |
|
tech debt is just a miserable pile of consequences
|
# ? Feb 14, 2023 18:39 |
|
it's important to be able to distinguish between consequences that result in extra work (debt) and consequences that have downstream benefits (interest). ideally you are designing things that have self-reinforcing downstream benefits (compound interest) in american english we use money metaphors for these things (time is money but all you gotta pay is attention) because we are a deeply diseased society
|
# ? Feb 14, 2023 18:46 |
|
a big aspect of this that i see a lot of teams and people miss is "mean time to lessons learned". your ops organization grows and evolves as fast as it can ingest and react to a change - if you're building a new thing or implementing another tech stack or if you're ripping out vendor X and replacing it with vendor Y, organizational growth is on hold until you finish that project and everyone else learns about it

you can permanently silo entire arms of your organization by creating e.g. the monitoring team, with the ostensible responsibility of being the monitoring expertise group, but who largely exist in a vacuum and inject uncertainty and debt into the rest of the organization as they continue to iterate internally without the results of those iterations being acknowledged and understood by everyone else

sometimes this is desirable: specialized databases. persons under compliance controls, political firewalls, or other legal concerns. billing and payment processing. human resources. most of the time it's a bad idea: monitoring. cloud. kubernetes. no amount of documentation or "proactive guidance" is a replacement for local expertise, and choosing to permanently isolate that expertise into an echo chamber is a really hard thing to get right.
|
# ? Feb 14, 2023 19:01 |
|
So I’m at an organization that has whiplashed between the two extremes ad nauseam over the past few years. We went from having a devops person embedded in each dev team, which produced tons of technology sprawl because nobody coordinated: lots of poorly configured, poorly supported environments, with the attendant waste of endlessly rebuilding a better mousetrap, and too much focus on getting an app running without thinking about things like spot instances, managing logging, HA and scaling, etc.

Then we went to a centralized infrastructure team, which means we think about all those things but now act as a bottleneck for all dev work. This leads to much tossing of hand grenades over cubicle walls. People on all sides are bad at communicating and worse at understanding the broader environment. Also it’s staffed largely of sysadmins who don’t really understand dev work, and devs who are unfairly contemptuous of the sysadmins, with not so great blood as a result.

There’s the much beloved platform model, which in theory works but basically means making GBS threads out CI templates and o11y libraries, which is fine I guess, but then dev teams have to actually seek out and use and iterate on those templates and libraries to drive improvements, which won’t happen unless they’re forced to or aware of those templates existing in the first place. Management can’t help too much beyond really broad diktats like “use kubernetes”, which while obviously extremely limited in a lot of ways at least gets everyone on a shared lingua franca.

Is the solution to just hire SWEs for your platform teams so you get people speaking the same language? There’s only so much devolving of monitoring/observability you can really expect to get into individual teams of 5-10 developers. It’s not reasonable to ask that every dev becomes an expert on monitoring, cloud, and kubernetes - and let’s not even talk about networking and security on top of everything else.

Ultimately I feel the solution here has to be one that doesn’t rely so heavily on the skill levels of individual developers or on a rapid and responsive infra/devops/platform/sre/etc team (aka “devops bullshit”). This is a big part of what the big three cloud providers sell - but even if VPCs obviate the need for network engineers, you still need someone who knows what a route table is and who can configure peering, and that probably won’t be someone working on a feature overdue by two weeks.

What I am slowly experimenting with is the idea of a devops bullshit team that provides the basic building blocks and support for them within their area of expertise: basic CI templates, APM libraries, secrets management, and so on. To the extent devs need more than that, as they inevitably will, that’s something they can build out communally or internally. A dedicated team provides a floor, upon which teams can build scaffolding to suit their own needs. Then again I also just had to shoot down people on my team saying “people shouldn’t use cloud functions it should all be k8s!” so I don’t exactly want to presume the floor is very high either.

Tl;dr: the cloud is hard and people are bad at it. What is a modern theory of devops that can be applied from the perspective of an engineering organization at scale, now that we’ve seen a decade+ of people trying and failing to do devops bullshit at all sorts of organizations big and small? The Iron Rose fucked around with this message at 19:57 on Feb 14, 2023 |
# ? Feb 14, 2023 19:53 |
|
i think building an internal platform is almost always bad, and ultimately, this problem is structural and organizational rather than technical in nature. it's what C levels and directors and SVPs are supposed to be good at, so it's kind of funny that they are so observably and obviously bad at it

ideal scenario for me is that you run the <whatever> team like a support org. my hunch is that this is the ultimate problem hiding under all the turtles: the role is fundamentally support, and needs to be graded and incentivized as such. i could speculate as to why we keep calling it development -> we hire a bunch of developers -> they develop a useless amalgamation of external services, but worse, that we force everyone internally to use -> repeat. i've worked at one place where infrastructure was not "engineering" and had a lower pay cap -- i think a lot of businesses aren't ready to accept the idea that "cloud support" is worth 200k base pay.

ideally you find a bunch of people who are willing to work under a support umbrella, you grade them based on how long it takes to finish tickets, and you give them free rein to invest in their own tooling for finishing tickets as fast as possible. no sprints, no OKRs, just "how fast were you able to make us go" and "how much do other teams here hate you". compared to e.g. a huge backstage investment, this is a better strategy because it doesn't force the org into a "you have to use the new platform" migration phase. there are no unknown risks, you were already gonna get the tickets, and the tickets are because of a business need, so just be really good at the tickets, and that's the whole team.
|
# ? Feb 14, 2023 20:14 |
|
the other side of this strategy's coin is that you just get rid of people who want to work in the support org but just want to build stuff and don't care about tickets. the tickets are from the customers and the customers are the entire point of having a team and not just paying extra for premium support. if you don't want to be in the "have no tickets" pipeline, congrats, you've been promoted to internal customer, where you can work on features that generate revenue for the company instead of generating animosity from your coworkers
|
# ? Feb 14, 2023 20:20 |
|
I think that’s a fine idea and solves some problems, but it doesn’t really scale to solve the others. Let me reframe what I’m trying to get at here.

The cloud has democratized many of the services that used to require dedicated teams or in-house software. As a result, more and more work can be handled by developers without the need to venture externally. The problem is developers still aren’t good enough at doing this to be as effective as we want them to be: logging, monitoring, scaling, HA, o11y, database administration, effective use of compute, secure design, and so on. The real question is whether them not being good enough at it actually matters, and I’m not sure that it does. If we do think it matters, some approaches here are:

- central infrastructure team — default state, kinda awful in lots of ways. Bottlenecks, poor cross-functional relationships, treated as a cost centre, and scales poorly.
- create building blocks such that dev teams can create good-enough designs without the need to involve cross-functional business units — this is the platform approach, and sucks because you probably aren’t building better building blocks than BigCloud is.
- stick someone who knows how to cloud in every team — this requires someone who knows how to cloud, has a pretty high bus-factor risk, and results in chaotic design patterns that are a support challenge.
- accept that devs aren’t good enough at this, but so long as the business keeps running, not good enough is still fine / make infra a support org.
— I almost like this but it doesn’t solve for when the business needs better results than a devolved approach provides.
— are you really getting $X million in value from this approach versus what the actual cloud support contracts provide?
— this also doesn’t really solve for o11y and security, both of which require heavy development/admin work running your monitoring/security infrastructure. Elastic doesn’t baby itself and not every company can use a SaaS offering here.

You might devolve compute, but if security and logging still result in cross-functional friction, have you really solved the problem, or have you just solved infrastructure/compute sui generis? The Iron Rose fucked around with this message at 20:39 on Feb 14, 2023 |
# ? Feb 14, 2023 20:35 |
|
12 rats tied together posted:the other side of this strategy's coin is that you just get rid of people who want to work in the support org but just want to build stuff and don't care about tickets. Missed this when writing my reply, but I actually agree with this to an extent, which is that I almost think a business is better served by having no central devops/infra team at all. I’m not sure an internal support org is worth it either, but I’m also not sure the business is served by infrastructure existing at all. Maybe just fold it + networking into security. Shared services feels like a bitch to manage in general. The Iron Rose fucked around with this message at 20:40 on Feb 14, 2023 |
# ? Feb 14, 2023 20:38 |
|
i think "make infra a support org" is a version of the first bullet point you have, not the 3rd one: there is a central infra team _and_ it is a support org. for all of the downsides, getting better at them is the explicit goal of the org, it's the only incentive, the only reason to be promoted, and so on. it's not my opinion that there is a hidden better solution to this problem; the approaches are all trade-offs to a central issue that presents us with a bunch of sliders. IMO the winning strategy is to go really hard one way (centralized) but combat the problems inherent to it by shifting the thinking around the team and making "dont suck poo poo at this, your job" more of a focus than i have seen it at all of my employers.

regarding scale: sure, that's a fair point. but, taking some advice from the rails-era startup boom, design your processes for the size your org is right now, plus or minus 2x its size. if you get any bigger or any smaller than that, you're gonna have to redesign anyway; pay the cost of that redesign when it happens instead of right now.
|
# ? Feb 14, 2023 20:43 |
|
this topic kind of touches on a blog article that i linked in the coc discord cloud channel recently after it was shared at my employer: https://samnewman.io/blog/2023/02/08/dont-call-it-a-platform/ i think it's a pretty good read. you might enjoy it.
|
# ? Feb 14, 2023 20:49 |
|
|
One approach is to offer a sliding scale with different expectations:

- DIY: your teams build it in the cloud, you own it - with all that entails. Good for Need It Fast and proofs of concept, not much red tape, but you're on your own if you have a problem.
- "Basic" shared platform: easy to onboard, takes care of most of the details, but it's opinionated and support is not great. Good for teams who can't cloud very well.
- "S-tier" shared platform: high bar to onboard (must pass security / finance / legal reviews) but rock-solid support and features. Good for critical apps.
|
# ? Feb 14, 2023 20:49 |