LochNessMonster
Feb 3, 2005

I need about three fitty


Installing Prometheus/Grafana with a Helm chart sounds like the perfect way to start indeed. Great suggestion, thanks!

CD was not done with Argo, unfortunately. I've seen a pretty slick demo recently, so maybe it's something to move towards in the future.


The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:
Helmfile isn’t bad for orchestrating multiple releases in a cluster: https://github.com/helmfile/helmfile

Hooks in nicely with your secrets management tool of choice as well. Use it with any CI provider you want - GitHub, GitLab, Jenkins; anything that runs a script on a commit or merge will work here. Haven't used Argo CD but I've heard good things.

Hadlock
Nov 9, 2004

Oh, one feature that's not obvious: helm has a template subcommand that will output the completed template. This is handy in case you have pushback against using helm for some reason (a vocal minority loathe helm) or you just want to inspect the output. I've found it very helpful for debugging without having to update live resources on a real cluster; it takes all the mystery out of what's actually happening
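For reference, the subcommand in question - release and chart names below are placeholders, not anything from a real setup:

```shell
# Render the chart's manifests to stdout without contacting a cluster.
# "my-release" and "./my-chart" are hypothetical names.
helm template my-release ./my-chart --values values.yaml > rendered.yaml

# Inspect exactly what would be applied before doing a real install.
less rendered.yaml
```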

xzzy
Mar 5, 2009

2023 is gonna be the year of helm and ansible for me. We're a hardcore puppet shop and literally everything ties into it, so I've been on cruise control for several years, creating a huge gap in my knowledge.

But puppet is not as good with cloudy stuff, which is finally becoming A Thing here, so I'm gonna take that opportunity.

SurgicalOntologist
Jun 17, 2004

I was told in a meeting today, "I hear that Microsoft is the cheapest cloud provider these days, have we requested any proposal from them for how much it would cost to run our infrastructure?" Please lord, tell me my answer was satisfying to him ("it could be free and we would still lose money on the migration") and he drops it.

Methanar
Sep 26, 2013

by the sex ghost

SurgicalOntologist posted:

I was told in a meeting today, "I hear that Microsoft is the cheapest cloud provider these days, have we requested any proposal from them for how much it would cost to run our infrastructure?" Please lord, tell me my answer was satisfying to him ("it could be free and we would still lose money on the migration") and he drops it.

This is the best answer you could have given tbh.

luminalflux
May 27, 2005



Hadlock posted:

(a vocal minority loathe helm)

I've run into this and it's usually due to Helm 2 needing tiller installed and running in the cluster. Helm 3 does not require this. Other than that idk why people would loathe helm, beyond the godawful templating that makes ansible look palatable

12 rats tied together
Sep 7, 2006

if i get to pick, it's pulumi's k8s provider. if i just need to cram some values into some manifests, totally greenfield, it's kustomize

usually the values come from some other system and in that case it's ansible due to its propensity for existing already and for having access to all those systems

helm is generally the least good option out of all of them unless your needs are, exactly, "the public helm chart". in that case you're stuck with it

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
I dunno who the hell in TYOOL 2022 with at least a room-temperature IQ keeps thinking that cloud saves money, unless you're planning on growing like gangbusters and need a ton of capacity fast and have no idea how to run datacenters? It's like falling for a 419 scam at this point for CTOs and CIOs

Methanar
Sep 26, 2013

by the sex ghost
Current mood:

1password crashing instantly on me because IT did an update or some poo poo, preventing me from getting my AWS password to investigate an incident.
Non-AWS console tooling failing on me because of the incident. Nice.

Methanar fucked around with this message at 00:45 on Dec 21, 2022

luminalflux
May 27, 2005



i don't even have an aws password

all hail SSO

George Wright
Nov 20, 2005
Helm is great if you never have to look at it beyond helm install <some public chart>. The things people do in the helm templates can be breathtaking.

ArgoCD, on the other hand, is spectacular.

Junkiebev
Jan 18, 2002


Feel the progress.

Sylink posted:

in kubernetes, let's say you are trying to rolling-update pods whose resource requests limit how many pods fit per node, but you need to ignore that to update the pods. how do you get around that?

I'm in a situation where, on a rolling update, some of the new pods are stuck pending because the scheduler sees the existing pods eating the resources, when obviously in the final state it would replace those pods.

https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#max-unavailable

.spec.strategy.type: RollingUpdate

.spec.strategy.rollingUpdate.maxUnavailable: 1

I'm assuming your replicas are <4, because the default is 25% and the absolute number is calculated from the percentage, rounding down.
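As a sketch, those fields sit in the Deployment spec like so (the replica count and values here are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 3                 # 25% of 3 rounds *down* to 0 unavailable by default
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1       # let one old pod terminate before its replacement is Ready
      maxSurge: 1             # optionally allow one extra pod above the replica count
```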

k8s is weird

Junkiebev fucked around with this message at 05:49 on Dec 21, 2022

Junkiebev
Jan 18, 2002


Feel the progress.

depending on your workload, you might want to ask your doctor if StatefulSets are right for you!

Hadlock
Nov 9, 2004

luminalflux posted:

I've run into this and it's usually due to Helm 2 needing tiller installed and running in the cluster. Helm 3 does not require this. Other than that idk why people would loathe helm, beyond the godawful templating that makes ansible look palatable

I don't disagree, I think they're largely obtuse contrarians, but they do exist

I'm happy to adopt something better, but in the meantime at least a third of companies using k8s use helm in some fashion, and it seems to be keeping the internet running ok

Methanar
Sep 26, 2013

by the sex ghost
at least it's not ksonnet

Junkiebev
Jan 18, 2002


Feel the progress.


this word does not exist in the Quran, so I deny it!

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS
My rule tends to be “consume helm charts, don’t write them” and it’s worked out pretty well so far.

New Yorp New Yorp
Jul 18, 2003

Only in Kenya.
Pillbug
Let's play Pattern or Anti-Pattern!

Something I see a lot is teams creating a module (either in Bicep or in Terraform) that wraps around a single resource, e.g. a module for an Azure App Service plan. It just exposes a few properties of the app service plan as parameters.

I consider this an anti-pattern: A module should be a versioned unit of reuse that provides a template for a set of interrelated resources, not a thin wrapper around a single resource.

This is even worse when the single-resource modules aren't properly versioned, meaning all consumers are tightly coupled to a thin wrapper that is impossible to change because dozens of consumers need to be simultaneously updated otherwise their infrastructure all breaks unexpectedly.

This comes up a lot more in Bicep, because Bicep doesn't have the ability to split individual resources out into separate files. So people create a module just so they can have an "app_service.bicep" file. But I see it in Terraform too and it drives me crazy.

12 rats tied together
Sep 7, 2006

to give terraform a little bit of credit, they do explicitly tell you not to do that in the docs these days.

unfortunately they were 4-5 years too late and now it's endemic.

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:
It’s awful, especially when you have OPA rules to enforce naming patterns and whatnot. I had to kill the few single-resource modules I found at my current job - thankfully there were only a handful.

It adds a totally unnecessary layer of abstraction, maintenance overhead, and ties you to specific provider version semantics.

Modules should be created very rarely and used sparingly.

The Fool
Oct 16, 2003


we maintain a library of modules and publish them in a private registry. we need to have very strict policies and be diligent about PRs from outside the team, because people try to do all kinds of nonsense

Docjowles
Apr 9, 2009

The Fool posted:

we maintain a library of modules and publish them in a private registry. we need to have very strict policies and be diligent about PRs from outside the team, because people try to do all kinds of nonsense

This sort of reminds me of the bad old days when I had to work with Chef a lot. The community cookbooks for popular apps were absolute monstrosities, more complex than the actual app they were meant to manage. They tried to cover every possible use case and be all things to all people. Like, gently caress you, why do I have to load a bunch of windows and powershell cookbooks into my chef server (and keep them up to date when you randomly bump the required versions) when I run literally zero windows machines? I always want to reuse existing code if I can, but some of these just had way too many settings and layers of abstraction.

So I appreciate your dedication to keeping things focused rather than taking every half assed PR that adds some esoteric feature one user wants.

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

New Yorp New Yorp posted:

Let's play Pattern or Anti-Pattern!

Something I see a lot is teams creating a module (either in Bicep or in Terraform) that wraps around a single resource, e.g. a module for an Azure App Service plan. It just exposes a few properties of the app service plan as parameters.

I consider this an anti-pattern: A module should be a versioned unit of reuse that provides a template for a set of interrelated resources, not a thin wrapper around a single resource.

This is even worse when the single-resource modules aren't properly versioned, meaning all consumers are tightly coupled to a thin wrapper that is impossible to change because dozens of consumers need to be simultaneously updated otherwise their infrastructure all breaks unexpectedly.

This comes up a lot more in Bicep, because Bicep doesn't have the ability to split individual resources out into separate files. So people create a module just so they can have an "app_service.bicep" file. But I see it in Terraform too and it drives me crazy.
I think a lot of companies went this way (including mine) because there wasn't (maybe still isn't? I don't deal with it much anymore) a way of pulling in environment-specific variables to auto-populate resource fields for end users, while also enforcing mandatory things like tags, without that wrapper. You end up making a joe_web module because all joe webservers go in this subnet, which the app team has no idea about because it's abstracted from them; adding 20 fields to every resource is messy, duplicates work, and is hard to keep updated; and security/finance mandates everything gets these tags. So if you're a "frameworks" type group trying to provide structure to various app groups, unless you want to PR every single change by every group that uses terraform as well as "own" all the terraform files, modules are the only way to provide some structure and enforce standardization for those sister orgs.

With terraform's limitations, without the module wrapper layer there's very little abstraction, which just doesn't work very well in larger orgs unless you use some other tool to handle that - and down the path of templating out terraform files lies madness. You couldn't even use interpolation in other declared variables, which precluded importing some sort of universal standards config file you could manage.

Bhodi fucked around with this message at 22:11 on Dec 21, 2022

The Fool
Oct 16, 2003


We manage a self-service platform built around Terraform Enterprise and in addition to the private module registry we have an insane onboarding process and deployment pipeline to scaffold all of those environment settings that you can't trust anyone to ever do correctly if left to their own devices.

Methanar
Sep 26, 2013

by the sex ghost

Bhodi posted:

I think a lot of companies went this way (including mine) because there wasn't (maybe still isn't? I don't deal with it much anymore) a way of pulling in environment-specific variables to auto-populate resource fields for end users, while also enforcing mandatory things like tags, without that wrapper. You end up making a joe_web module because all joe webservers go in this subnet, which the app team has no idea about because it's abstracted from them; adding 20 fields to every resource is messy, duplicates work, and is hard to keep updated; and security/finance mandates everything gets these tags. So if you're a "frameworks" type group trying to provide structure to various app groups, unless you want to PR every single change by every group that uses terraform as well as "own" all the terraform files, modules are the only way to provide some structure and enforce standardization for those sister orgs.

With terraform's limitations, without the module wrapper layer there's very little abstraction, which just doesn't work very well in larger orgs unless you use some other tool to handle that - and down the path of templating out terraform files lies madness. You couldn't even use interpolation in other declared variables, which precluded importing some sort of universal standards config file you could manage.

I wonder if there is a way to mass-tag everything created by a given TF project with an intermediary processor hook of some kind that could inject information into Terraform's AST, without filling up the actual TF project with repetitive tag declarations on every resource.

12 rats tied together
Sep 7, 2006

Pulumi has exactly that and it's called (iirc) a Transformation. It's basically a callback that is applied to every resource, the example in the docs is an auto tagger, but it's a useful pattern in general.
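As a language-agnostic sketch of that callback pattern - note `ResourceArgs` and `register()` below are illustrative stand-ins, not Pulumi's actual API:

```python
# Sketch of the "transformation" idea: a callback the engine applies to every
# resource's properties before creation. ResourceArgs and register() are
# made-up names for illustration, NOT Pulumi's real API.
from dataclasses import dataclass, field

REQUIRED_TAGS = {"team": "datascience", "billing-domain": "analytics"}

@dataclass
class ResourceArgs:
    name: str
    props: dict = field(default_factory=dict)

def auto_tag(args: ResourceArgs) -> ResourceArgs:
    """Merge mandatory tags underneath any tags the resource already declares."""
    args.props["tags"] = {**REQUIRED_TAGS, **args.props.get("tags", {})}
    return args

def register(args: ResourceArgs, transformations=(auto_tag,)) -> ResourceArgs:
    # the engine would run every registered transformation on every resource
    for t in transformations:
        args = t(args)
    return args

bucket = register(ResourceArgs("logs-bucket", {"tags": {"team": "web"}}))
# explicit tags win; missing mandatory tags get filled in
```

The nice property is that the tagging policy lives in one place instead of being repeated on every resource declaration.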

Pulumi also has the ability to append an imported flag to a resource, so you can use the same skeleton for, e.g., every subnet - but this time all the resources should be imported from this list of ids that I got from cloudformation, and the other time they don't exist yet so you can create them.

It's good.

Hadlock
Nov 9, 2004

Bhodi posted:

I think a lot of companies went this way (including mine) because there wasn't (maybe still isn't? I don't deal with it much anymore) a way of pulling in environment-specific variables to auto-populate resource fields for end users, while also enforcing mandatory things like tags, without that wrapper. You end up making a joe_web module because all joe webservers go in this subnet, which the app team has no idea about because it's abstracted from them; adding 20 fields to every resource is messy, duplicates work, and is hard to keep updated; and security/finance mandates everything gets these tags. So if you're a "frameworks" type group trying to provide structure to various app groups, unless you want to PR every single change by every group that uses terraform as well as "own" all the terraform files, modules are the only way to provide some structure and enforce standardization for those sister orgs.

With terraform's limitations, without the module wrapper layer there's very little abstraction, which just doesn't work very well in larger orgs unless you use some other tool to handle that - and down the path of templating out terraform files lies madness. You couldn't even use interpolation in other declared variables, which precluded importing some sort of universal standards config file you could manage.

:five:

We had our main cloud services in AWS, but every project (GCP services namespace) in GCP needed an XPN connection to the AWS VPN, and needed all sorts of very specific VPN subnet settings etc. You could create a new project and k8s cluster using vanilla tf, but there's no way you would figure out how to do all the nuanced config, so instead you would just use hadlock_gcpproject.module and hadlock_k8scluster.module and skip three hours of reading documentation and trial and error. Also, when VPN settings changed you got that for free, rather than having to debug weird networking problems because joebob didn't get the memo to update the terraform VPN, or whatever

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

Hadlock posted:

:five:

We had our main cloud services in AWS, but every project (GCP services namespace) in GCP needed an XPN connection to the AWS VPN, and needed all sorts of very specific VPN subnet settings etc. You could create a new project and k8s cluster using vanilla tf, but there's no way you would figure out how to do all the nuanced config, so instead you would just use hadlock_gcpproject.module and hadlock_k8scluster.module and skip three hours of reading documentation and trial and error. Also, when VPN settings changed you got that for free, rather than having to debug weird networking problems because joebob didn't get the memo to update the terraform VPN, or whatever

We do the same thing for project creation (auto budgeting and audit logging) and VPC creation (auto VPN + DNS peering + private services connect), and it is exactly what you want from a module because it’s a tightly coupled set of resources your developers shouldn’t need to care about. Ours has a dozen or two resources and is easily invoked and extended from the module call.

What we do not want is “create this VM, you get these tags and this naming scheme.” The maintenance is not worth the extra layer of abstraction; use OPA rules or SCPs or whatever the gently caress - a terraform module isn’t the right way to solve it. AWS has an amazing default_tags provider feature that rules though.

In practice devs just say “give me a subnet” and we create it for them, but it still makes our lives easier.

Also death to terraform, long live pulumi

The Iron Rose fucked around with this message at 22:41 on Dec 21, 2022

Methanar
Sep 26, 2013

by the sex ghost
At some point we wrote an autotagger that looks at every object in a given AWS account, and maps the creator of the object to some set of tags like what team they are on and tags objects based on that.

Like bob is a member of datascience. So every resource created by bob gets datascience billing domain/team tags applied to it if bob didn't do it properly himself, or specify otherwise.

That system kind of works and encourages teams to fix their terraform to include the tags otherwise they'll have constant state diffs reported by TF since the autotagger will just keep overwriting you for doing things wrong.
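A minimal sketch of that idea, with a made-up membership table and field names (nothing here is the real system):

```python
# Infer team/billing tags from a resource's creator; hand-set tags win.
# TEAM_OF and the resource dict shape are invented for illustration.
TEAM_OF = {"bob": "datascience", "alice": "platform"}

def reconcile_tags(resource: dict) -> dict:
    """Return the tags the resource *should* carry."""
    team = TEAM_OF.get(resource.get("creator"), "unowned")
    inferred = {"team": team, "billing-domain": team}
    # tags the creator set explicitly take precedence over the inferred ones
    return {**inferred, **resource.get("tags", {})}

vm = {"creator": "bob", "tags": {"billing-domain": "ml-research"}}
desired = reconcile_tags(vm)
# the tagger would write `desired` back to the resource, which is why
# hand-written TF that omits these tags sees a permanent state diff
```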


If I ever get to green field infrastructure I'm going all in on pulumi

Methanar fucked around with this message at 22:42 on Dec 21, 2022

SurgicalOntologist
Jun 17, 2004

Methanar posted:

This is the best answer you could have given tbh.

I thought so but his answer was "if it's free then we didn't do a proper search choosing the provider and everyone involved should be fired." :smuggo:

gently caress, our spend is perfectly reasonable for our scale, and no one seems to understand that optimizing our costs when we haven't even validated product-market fit is the last thing we should be doing! Maybe $20k is the biggest cell on your stupid business spreadsheet, but getting that down to $15k is not what's going to save the business! God, why did I join a startup with zero tech experience on the management team, I'm such an idiot :emo:

/vent

PS what's the job market like these days, I want to build developer platforms on k8s, I make terrible personal decisions but my technical decisions are merely mediocre

Edit: the last line on my resume will be

my future LinkedIn posted:

  • Successfully optimized cloud costs, gaining the company an extra day of runway.
  • Spent that hard-earned day updating my resume

SurgicalOntologist fucked around with this message at 01:03 on Dec 22, 2022

SurgicalOntologist
Jun 17, 2004

For some better thread content, I think Crossplane has a lot of potential for solving those pain points with TF (see above about mediocre decisions though)

Zephirus
May 18, 2004

BRRRR......CHK

Blinkz0rz posted:

My rule tends to be “consume helm charts, don’t write them” and it’s worked out pretty well so far.

We're at about 4 vendors now where we have to maintain custom forks of their helm charts, because they don't consider customisations that might need to happen, or that it might not be running in EKS or GCP, or that maybe we don't want to run the db in k8s, or similar. The situation isn't ideal, to be honest.

12 rats tied together posted:

to give terraform a little bit of credit, they do explicitly tell you not to do that in the docs these days.

unfortunately they were 4-5 years too late and now it's endemic.

I'm a fair way through undoing this mentality on our tf estate. All our infra was in modules, and it just made it impossible to update providers without breaking everything.

I want to throw a hat into the TF deployment ring for Atlantis, which works really well for us.

Hadlock
Nov 9, 2004

Methanar posted:

At some point we wrote an autotagger that looks at every object in a given AWS account, and maps the creator of the object to some set of tags like what team they are on and tags objects based on that.


Yeah, I think terraform finally has a feature where everything that's taggable gets some custom auto tags now. It's not perfect, but the budget guys can pull it up and be like "yep, this is owned by DevOps", which is a start at least
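If this is the AWS provider's `default_tags` block (the post doesn't name it, so that's my assumption), it looks roughly like:

```hcl
provider "aws" {
  default_tags {
    tags = {
      Team  = "devops"    # illustrative values
      Owner = "platform"
    }
  }
}
```

Every taggable resource the provider creates then inherits these tags, and per-resource tags can still override them.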

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Methanar posted:

That system kind of works and encourages teams to fix their terraform to include the tags otherwise they'll have constant state diffs reported by TF since the autotagger will just keep overwriting you for doing things wrong.
This is the lawful evil box of the devops D&D grid meme.

Force overwrite and let god other groups sort it out

Bhodi fucked around with this message at 02:02 on Dec 22, 2022

chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

Zephirus posted:

We're at about 4 vendors now where we have to maintain custom forks of their helm charts, because they don't consider customisations that might need to happen, or that it might not be running in EKS or GCP, or that maybe we don't want to run the db in k8s, or similar. The situation isn't ideal, to be honest.

Whenever this happens to me I take the time to ensure my changes are polished and then I offer to send the patches to the vendor so they can roll it into the chart upstream and I stop having to carry the patches locally. I'm still waiting for any vendor to take me up on this offer.

Okita
Aug 31, 2004
King of toilets.

necrobobsledder posted:

I dunno who the hell in TYOOL 2022 with at least a room-temperature IQ keeps thinking that cloud saves money, unless you're planning on growing like gangbusters and need a ton of capacity fast and have no idea how to run datacenters? It's like falling for a 419 scam at this point for CTOs and CIOs

It's about taxes and accounting. Everything cloud is tax deductible in the year the expenses were incurred, whereas datacenter investments are deducted over multiple years based on depreciation and amortization.

The datacenter investment requires multi-year planning and up-front money and is much less flexible.
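Back-of-the-envelope version of that difference, using made-up numbers and simplified straight-line depreciation (real MACRS schedules are front-loaded):

```python
# $1.2M of cloud opex is deductible the year it's incurred; the same $1.2M
# of datacenter hardware, depreciated straight-line over 5 years, deducts
# only a fifth per year. Numbers are purely illustrative.
spend = 1_200_000
cloud_year1_deduction = spend        # opex: expensed as incurred
dc_year1_deduction = spend / 5       # capex: 5-year straight-line

print(cloud_year1_deduction - dc_year1_deduction)  # year-one deduction gap
```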

You've just landed your dream job as the CTO of a fortune 500 company - employee turnover averages maybe 2-3 years. You have no idea wtf is going on across the hundreds of kingdoms and fiefdoms in the software landscape at your company. Every team is using a different tech stack. Now you need to somewhat accurately predict what they will all need over the next few years, when most of the people there will have been replaced by completely different people.

Or you could say gently caress all that noise, tell them all to just use cloud, and package that up as a win for yourself to put on your resume before moving on to the next fortune 500 company.

Which one would you honestly choose? Do you just love Megacorp inc. sooo much that you're going to be putting in the hard legwork for them to keep shareholder pockets slightly more full than usual?

Methanar
Sep 26, 2013

by the sex ghost

Okita posted:

It's about taxes and accounting. Everything cloud is tax deductible in the year the expenses were incurred, whereas datacenter investments are deducted over multiple years based on depreciation and amortization.

The datacenter investment requires multi-year planning and up-front money and is much less flexible.

Nobody buys a datacenter with a big upfront capital expense. I'm pretty sure what happens is that you take out a loan and treat the payments on the loan as an operating expense in your non-GAAP earnings report.

In fact, when you buy your reserved instances for a year, you don't even really pay upfront for that either; you still take out a loan. In the end you always wind up in a similar situation of spending somebody else's money. At least you did before 2022, when interest rates were low.

Don't know if that's changed. Actually, it definitely changed. The fed raising interest rates works precisely by discouraging this practice of taking out a loan for everything as free money at sub-inflation interest rates. With off-the-charts interest rates it's no longer financially advantageous, which means there is less money overall to spend on growth, like buying a new datacenter with money you don't have yet.

Methanar fucked around with this message at 21:47 on Dec 26, 2022

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Methanar posted:

At some point we wrote an autotagger that looks at every object in a given AWS account, and maps the creator of the object to some set of tags like what team they are on and tags objects based on that.

Like bob is a member of datascience. So every resource created by bob gets datascience billing domain/team tags applied to it if bob didn't do it properly himself, or specify otherwise.

That system kind of works and encourages teams to fix their terraform to include the tags otherwise they'll have constant state diffs reported by TF since the autotagger will just keep overwriting you for doing things wrong.

If I ever get to green field infrastructure I'm going all in on pulumi
that works great for mythical unicorn modern companies in which a given person works on one and only one team

in other kinds of companies, you have to write off team membership entirely as a useful construct for everything except birthright licensing/permissions on common systems, and push stuff into system+role in increasingly convoluted ways

Vulture Culture fucked around with this message at 00:57 on Dec 27, 2022


necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost

Okita posted:

It's about taxes and accounting. Everything cloud is tax deductible in the year the expenses were incurred, whereas datacenter investments are deducted over multiple years based on depreciation and amortization.

The datacenter investment requires multi-year planning and up-front money and is much less flexible.
"Datacenter" is the tongue-in-cheek keyword here - physical / self-managed infrastructure is more correct, really. We have a colocation lease agreement in the same way that people can buy reserved capacity / instances, and many colo companies allow pay-as-you-go utilization models now (see: Equinix Metal, AKA Packet). The truth really is that most start-up companies are (rightfully so!) overweight on software engineers, and they're total garbage at basic sysadmin like package management, user access controls, and so forth, so you might as well pay someone else to decide this crap for you better. I don't ever want to have to manage ssh keys ever again in a job if at all possible; holy cow, why am I still doing this crap making 8x+ what I used to?

Also, MACRS depreciation for durable assets gets deducted over time depending upon the asset class, at a 5- or 10-year life of service, and shifting more to opex over capex has a trade-off: opex needs to be kept under better control, because wildly swinging opex makes investors nervous, which a lot of cloud spend is terrible for, while on-premise self-managed infrastructure is overall more stable. "Saving" 70% on capital commitments while opex jumps 200% YoY doesn't make any investor happier, which is roughly the kind of rates I've seen for almost every organization... until they fire their old graybeard sysadmins that couldn't make their networks and systems not suck after decades of fighting M&A debts. So while trying to tweak margins sorta works for not-very-savvy investors, anyone better than Jim Cramer knows these tricks for growth companies and how it doesn't really do jack poo poo for making a business more competitive or anything, which is why I'm in the Warren Buffett mode of thought here for tech overall, long-term.
