Love Stole the Day
Nov 4, 2012
Please give me free quality professional advice so I can be a baby about it and insult you
CDK and Pulumi are cool and good


12 rats tied together
Sep 7, 2006

:hai:

I don't have a ton of experience with the CDK but Pulumi is incredible, basically the same amount of freedom to solve problems that I got from Ansible in the mid 2010s and haven't seen since. There are some funky things about it though:

1, the resource objects are immutable once instantiated, so you can't define "the load balancer" in a shared library, import the library, and then change the health check timeout. Instead there are typed "thingArgs" objects in the library which you can do this with, for example in python you might create "the load balancer config" as a named tuple, attrs class, dataclass, or whatever, and then you can import that with all of its defaults and feed it into your load balancer resource after modifying the health check.

2, the pulumi engine magically scoops up all instances of Resource and orchestrates them, which is not idiomatic in most languages. I think most people expect that you would have to return the resource objects to some type of parent scope, or pass them to a handler, but they "just work".

3, even though the code looks intuitive and synchronous, the pulumi engine does some weird poo poo behind the scenes so that you (like terraform) don't have to tell it manually about the order for resolving every dependency. This means that even if it looks really trivial, you usually can't do "give me the ARN of this load balancer and HTTP POST to our inventory service" in an obvious way. The mechanism for this in pulumi is called apply and you have to feed it a callback function that will run once the value is known. If you're doing a lot of this, it makes sense to pick a language that is really good at callbacks (typescript), even if you might have preference for one of the other supported languages or even just the YAML mode.
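A rough sketch of points 1 and 3 in plain Python, with no Pulumi import at all. `LoadBalancerConfig` and `Deferred` are made-up names for illustration: the first stands in for a typed args object from a shared library, the second is a toy version of a value that is only known after deployment and must be consumed through an apply-style callback. Pulumi's real types behave differently in the details.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LoadBalancerConfig:
    """Hypothetical shared-library defaults (not a Pulumi type)."""
    name: str = "shared-lb"
    health_check_timeout_s: int = 5


class Deferred:
    """Toy stand-in for a value that is unknown until the engine resolves it."""

    def __init__(self):
        self._resolved = False
        self._value = None
        self._callbacks = []

    def apply(self, fn):
        """Register fn to run once the value is known; return a new Deferred."""
        result = Deferred()

        def run(value):
            result.resolve(fn(value))

        if self._resolved:
            run(self._value)
        else:
            self._callbacks.append(run)
        return result

    def resolve(self, value):
        """Called by the 'engine' once the real value exists."""
        self._resolved, self._value = True, value
        for callback in self._callbacks:
            callback(value)
        self._callbacks.clear()


# Point 1: take the library defaults, change one field, then build the resource.
config = replace(LoadBalancerConfig(), health_check_timeout_s=30)

# Point 3: the ARN does not exist yet, so the follow-up work is a callback.
inventory_calls = []


def notify_inventory(arn):
    # A real program would HTTP POST here; this only records the intent.
    inventory_calls.append(f"POST /inventory arn={arn}")
    return arn


load_balancer_arn = Deferred()
load_balancer_arn.apply(notify_inventory)
assert inventory_calls == []  # nothing runs until the value is known
load_balancer_arn.resolve("arn:aws:example-placeholder")  # done by the engine
```

The point of the second half is that `notify_inventory` cannot simply be called inline with the ARN, because at the time your program runs there is no ARN yet.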

But overall, it's a massive improvement over basic terraform IMO. The easiest thing to sell people on in my experience is the Transformations construct, which is basically an object that describes a set of changes to make to a stack (workspace). You can define these Transformations in a repeatable, DRY way and then very ergonomically apply them to a particular stack, or all stacks. A couple of obvious examples of this are: "We tag all resources in the PCI account with CONCERNS_PCI=True", and "every resource should have the pulumi tag". These are both quick tagger functions you can write in an "auto_taggers" module and then you just import and apply them to your stack, and then you get free tagging forever that is impossible to gently caress up or forget.
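To make the tagger idea concrete, here is a minimal sketch in plain Python rather than against the Pulumi SDK. Everything in it is an assumption for illustration: resources are modelled as `(type, props)` pairs, `TAGGABLE_TYPES` is an invented allow-list, and the `managed_by` tag key is a guess at what "the pulumi tag" might look like.

```python
# Hypothetical set of resource types that accept a "tags" property.
TAGGABLE_TYPES = {"bucket", "load_balancer"}


def tag_with(extra_tags):
    """Build a tagger function that merges extra_tags into taggable resources."""

    def transform(resource_type, props):
        if resource_type not in TAGGABLE_TYPES:
            return props  # leave untaggable resources alone
        merged = {**props.get("tags", {}), **extra_tags}
        return {**props, "tags": merged}

    return transform


# These two would live in the "auto_taggers" module.
pci_tagger = tag_with({"CONCERNS_PCI": "True"})
pulumi_tagger = tag_with({"managed_by": "pulumi"})


def apply_transformations(resources, transformations):
    """Run every transformation over every (type, props) pair in a stack."""
    result = []
    for resource_type, props in resources:
        for transform in transformations:
            props = transform(resource_type, props)
        result.append((resource_type, props))
    return result


stack = [
    ("bucket", {"name": "logs", "tags": {"team": "infra"}}),
    ("load_balancer", {"name": "edge"}),              # author forgot tags
    ("iam_policy_attachment", {"role": "app"}),       # not taggable
]
tagged = apply_transformations(stack, [pci_tagger, pulumi_tagger])
```

Because the taggers run over the whole stack, the load balancer whose author forgot tags still ends up tagged, which is the "impossible to forget" property.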

Docjowles
Apr 9, 2009

I am falling out of love with terraform more all the time. It was a revelation when it came out and I was a die hard Terraform fan for quite a while. But even after years of improvement and passing 1.0, MAN is HCL still kind of awkward and lovely to work with. Any kind of non trivial loops or conditionals are just the most tortured crap. You can “fix” it by wrapping it with something else but at some point, I would just rather be using something else entirely than papering over shortcomings.

I haven’t played with it much yet but I know a number of CDK true believers. Haven’t met any Pulumi users in the wild.

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
Terraform has all the sins and warts of Puppet and Chef combined, and it makes worse one of the better things about them: loops and conditionals being essentially regular old Ruby. In the future I suspect writing raw Terraform will be viewed the way we often treat JavaScript today: you write it in anything other than that, such as TypeScript.

Also seriously what’s with some of these infra folks using TypeScript for infrastructure code? Maybe I’m missing something important but I can’t think of a particularly solid construct that makes learning a whole rear end other language and tool chain on top of Terraform worth it unless the language and library ecosystem is more stable than the cesspool of technical debt that are both NPM and yarn.

kalel
Jun 19, 2012

what's the market share of pulumi vs. terraform and ansible? is it actually being used at scale? I haven't heard of it before lol, so I'm curious

Pile Of Garbage
May 28, 2007



It's hard to compare Ansible and Terraform as whilst they can do the same kind of things they do them in radically different ways.

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

Docjowles posted:

I am falling out of love with terraform more all the time. It was a revelation when it came out and I was a die hard Terraform fan for quite a while. But even after years of improvement and passing 1.0, MAN is HCL still kind of awkward and lovely to work with. Any kind of non trivial loops or conditionals are just the most tortured crap. You can “fix” it by wrapping it with something else but at some point, I would just rather be using something else entirely than papering over shortcomings.

I haven’t played with it much yet but I know a number of CDK true believers. Haven’t met any Pulumi users in the wild.

God yes. Terraform was revolutionary but it’s very quickly becoming too cumbersome to use. New and better abstractions are badly needed.

The main benefit of it is that it’s not really coding, so it’s not intimidating for devops peeps who don’t know how to code and it’s incredibly accessible as a result. That’s to its credit, but targeting that audience naturally comes with problems for those who do, especially because unlike ansible, it’s not as easily extensible.

I haven’t used Pulumi in a production environment but I’d dearly like to.

E: vvvvv lmao I thought we were in that thread, death to state

The Iron Rose fucked around with this message at 14:49 on Aug 19, 2022

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
Firstly, there's no real concept of an inventory in Terraform while Ansible is intimately tied to where code is executed and lives and communicates between different places while Terraform is implicitly locally executed for basically everything it does (its IPC calls are on the same box although there's not a great deal architecturally that would stop TF from having providers and plugins execute on remote systems eventually). Secondly, Ansible doesn't keep global state of a system as a rule and expects modules to resolve localized state themselves, Terraform maintains state and passes it (for better or worse) to the provider / plugin code. This drastically changes runtime behaviors and patterns.

Ansible is more comparable to stuff like Puppet, Chef, and Saltstack that are all designed around internal systems management constructs and needs rather than Terraform's constructs of materializing a global system state with a declaratively defined language wrapping a serialization format. The infra nerd CI/DevOps/SRE thread is a better place to go for deeper discussion

12 rats tied together
Sep 7, 2006

A couple of big orgs "use pulumi" but they tend to be the type of org that use a little bit of everything. It's not nearly as commonly used as terraform or ansible. Hashicorp's own CDK went GA earlier this month as well, CDKTF, which could be a useful introduction to working with this type of tool for anyone who is interested.

Notable differences from Pulumi are that the resource objects are mutable and the config holder objects are not, although the resource objects must be modified through explicit setter methods e.g. import CompanyNameEcsService -> set_load_balancer(). You have to instantiate a wrapper class for "resources in a stack" and you end up shoving a lot of stuff into class constructors for this reason, compared to Pulumi which handles the stack <- resource relationship more implicitly with some CLI commands and other exterior scaffolding.

CDKTF also has Pulumi's transformations construct but it calls them "aspects"; it's where your auto taggers and the like would go. It's not nearly as bad as I was expecting, but something happened to hashicorp in the last 5 years where they're just awful at documentation now. Pulumi's docs identify which resource properties will trigger a rebuild, and they host their own documentation, compared to the CDKTF which has them all up on constructs.dev

Both tools can interop with HCL Terraform but Pulumi's UX for reading terraform outputs is much better, while CDKTF has a unique(?) advantage in that you can invoke an HCL module directly from your python/ts/whatever CDKTF program.

12 rats tied together
Sep 7, 2006

necrobobsledder posted:

Also seriously what’s with some of these infra folks using TypeScript for infrastructure code? Maybe I’m missing something important but I can’t think of a particularly solid construct that makes learning a whole rear end other language and tool chain on top of Terraform worth it unless the language and library ecosystem is more stable than the cesspool of technical debt that are both NPM and yarn.

The CDK and CDKTF are TypeScript heavy because their cross-language capabilities come from jsii, an AWS library that lets any language interact with javascript stuff. Pulumi uses gRPC, which probably explains a lot of the other differences. I don't have a lot of experience using cross-language facilities in either tool and honestly I hope to never develop this experience either.

Pulumi was founded by Microsoft people, from what I understand, and that whole ecosystem loves TypeScript for various good and bad reasons. I can tell you that it is surprisingly ergonomic, if you don't mind dealing with the transpiler and the abundance of things called "interface" that are actually just data. I've written CDK-style code in C#, F#, TS, and Python and my preferences are basically that list in reverse order.

More generally: the dream of simply having the developers write the terraform was unattainable unless you were working with the most patient and motivated developers to ever exist. If you want the dev teams to seriously commit to writing their own infra code, you have to meet them where they're at (in their IDE, in their preferred language).

BaseballPCHiker
Jan 16, 2006

The Iron Rose posted:

God yes. Terraform was revolutionary but it’s very quickly becoming too cumbersome to use. New and better abstractions are badly needed.

The main benefit of it is that it’s not really coding, so it’s not intimidating for devops peeps who don’t know how to code and it’s incredibly accessible as a result. That’s to its credit, but targeting that audience naturally comes with problems for those who do, especially because unlike ansible, it’s not as easily extensible.

This is me 100%, I just started playing with it a while ago.

Being able to deploy a whole VPC with all the parameters I need for some poo poo vuln scanner I have to support has been a revelation. I started using it for a few things in my personal AWS account as well. But now you're telling me I am once again behind the times!?!

Internet Explorer
Jun 1, 2005





And here I am learning CloudFormation like an idiot.

Internet Explorer fucked around with this message at 20:42 on Aug 19, 2022

The Fool
Oct 16, 2003


gotta eat your dog food

12 rats tied together
Sep 7, 2006

CloudFormation is extremely good, and there are reasons to use it even from Terraform. Time you spend learning it is never wasted.

Vanadium
Jan 8, 2005

You're not gonna regret learning how cloudformation works if you end up using CDK. :v:

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
CloudFormation and Terraform wind up calling the same dang endpoints in the end so you’re really learning your cloud provider and AWS, Google, MS win either way.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
I like CloudFormation.

Just like I liked dropping acid and going to class back in the day: everything is chaotic and mysterious, the simplest task feels like an epic adventure, and days later you look back at the notes you took and can only think wtf.

vanity slug
Jul 20, 2010

The only reason people learn CloudFormation is to pass the exams.

MightyBigMinus
Jan 26, 2020

i've worked for a handful of b2b saas companies that deploy some stuff into the customers aws account (even if just an iam role/policy to assume) and my general breakdown of clients was:
80% - did not use infra as code at all
10% - used terraform and are very tribal/opinionated/social-signal-y about it
10% - just used cloudformation
.0001% - used cdk or pulumi or ansible

so for aws only environments you dealt with 90% of your customers by using cloudformation, because the web-UI and background state are handled by amazon. introducing tf to their env would create the prereq for "where does it run and save state" which is just a total gently caress-that conversation to be avoided.

Startyde
Apr 19, 2007

come post with us, forever and ever and ever
I’d shift that 80% up closer to 90 but otherwise concur.
The amount of very bad shell invoking the aws cli passing for IaC I’ve seen is harrowing, though that might be a local quirk from so many minicomputing grognards in the region.

ledge
Jun 10, 2003

Startyde posted:

I’d shift that 80% up closer to 90 but otherwise concur.
The amount of very bad shell invoking the aws cli passing for IaC I’ve seen is harrowing, though that might be a local quirk from so many minicomputing grognards in the region.

I'm not too proud to admit that I may have used Excel to create a bunch of cli commands to add users to Connect via the cli.

One thing I have found is that provisioning via cf seems slow as poo poo compared to doing the same via the sdks.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

ledge posted:

I'm not too proud to admit that I may have used Excel to create a bunch of cli commands to add users to Connect via the cli.

I’ve done this for adding peering connections and modifying route tables when adding a VPC to existing infrastructure. I could create CF statements for it, but copying and modifying a single line script twenty times is far, far less hassle.
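For anyone who wants the spreadsheet trick without the spreadsheet, a small sketch of the same idea: generate one `aws ec2 create-route` line per route table. The route table IDs, peering connection ID, and CIDR below are made-up placeholders, and the script only prints the commands rather than running them.

```python
import shlex

# Made-up values for illustration; substitute the real ones from your account.
PEERING_CONNECTION_ID = "pcx-11111111"
NEW_VPC_CIDR = "10.50.0.0/16"
ROUTE_TABLE_IDS = ["rtb-aaaa1111", "rtb-bbbb2222", "rtb-cccc3333"]


def create_route_command(route_table_id, cidr, peering_id):
    """Return one shell-safe `aws ec2 create-route` command line."""
    args = [
        "aws", "ec2", "create-route",
        "--route-table-id", route_table_id,
        "--destination-cidr-block", cidr,
        "--vpc-peering-connection-id", peering_id,
    ]
    return shlex.join(args)


commands = [
    create_route_command(rtb, NEW_VPC_CIDR, PEERING_CONNECTION_ID)
    for rtb in ROUTE_TABLE_IDS
]

# Print for review; paste into a shell only after checking each line.
print("\n".join(commands))
```

Printing instead of executing keeps the "copy, eyeball, paste" workflow from the post while removing the hand-editing step.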

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Cost savings and performance optimization have always been an interest of mine and as I've yet to actually do any professional AWS work, I'm curious: what are the big monthly cost items for a company when it comes to AWS? I'm guessing S3 and EC2? Watching EMR and Spark tutorials, it looks like you can quickly ring up eye-watering charges on clusters but I don't have much to compare it against.

For those of you working with this stuff daily, if you started at a company and were tasked to start saving them money on the monthly bill, where would be the first places you'd start looking?

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

Hughmoris posted:

Cost savings and performance optimization have always been an interest of mine and as I've yet to actually do any professional AWS work, I'm curious: what are the big monthly cost items for a company when it comes to AWS? I'm guessing S3 and EC2? Watching EMR and Spark tutorials, it looks like you can quickly ring up eye-watering charges on clusters but I don't have much to compare it against.

For those of you working with this stuff daily, if you started at a company and were tasked to start saving them money on the monthly bill, where would be the first places you'd start looking?

It’s EC2. EC2 is the biggest service on AWS and companies gather there before branching off into other services.

To mitigate this, there are a bunch of ways to cut costs, from reserved instances to savings plans to EDPs.

S3 is a big service but is relatively cheap and all you need to do there is turn on tiered storage and yer done.

When you get into millions, if not tens of millions, per month then you can negotiate highly customized contract billing with AWS for ridiculous savings. But plan on spending fifty million+ USD per year first.

12 rats tied together
Sep 7, 2006

The first place is always the bill, and it has everything you need generally. Everything comes with a big "it depends" asterisk because everything, well, depends.

Big cost drivers are, IME:

- S3 storage, there's usually at least one s3 bucket named after the org that has an order of magnitude more data than it should.
- RDS instance type tends to just get ratcheted up over time as more "database events" happen and are solved with vertical scaling.
- AWS' list price for data transfer is exorbitant, so if you have a chatty app on AWS, it tends to dominate your spend (adtech is really bad about this in particular).
- If you have a chatty app that is only chatty "privately", whatever ops team that exists has usually never done the work to optimize for that so you'll see a lot of inter-AZ bandwidth charges too.

EMR tends to be pretty cheap because you can just run it on spot. If there's no capacity, whatever, run the job again when there is. EC2 instance type is also kind of an obvious cost center so even the least responsible orgs have optimized around it to some degree.

Happiness Commando
Feb 1, 2002
$$ joy at gunpoint $$

EC2, S3, RDS are usually the big ones for most enterprises.

Cost savings measures:

* Are they using reserved instances or compute savings plans?
* How much inter-AZ or inter-region traffic is there, and can it be rearchitected around?
* VPC endpoints instead of making API calls over the public internet?
* GP3 vs. GP2 EBS volumes?
* too many old snapshots sticking around
* Are they using any S3 tiers other than the standard one?
* modern instances classes vs. old ones
* rightsizing with Compute Optimizer
* rightsizing EBS volume size
* are they using datacenter-like usage patterns (like querying an S3 API every second 24/7 to see if a new file has shown up vs using a message bus)
* transit gateways are expensive
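To put a rough number on the "poll S3 every second" item, a back-of-the-envelope sketch. The per-request price is an assumed placeholder passed in as a parameter, not a quoted AWS rate, and the poller count is invented; plug in the figures from your own bill.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_requests(interval_seconds, days=30):
    """Number of polls one client makes in a month at a fixed interval."""
    return SECONDS_PER_DAY * days // interval_seconds


def monthly_cost(requests, price_per_1000_requests):
    """Request charges only; ignores data transfer and storage."""
    return requests / 1000 * price_per_1000_requests


# Assumed placeholder price per 1,000 requests; check current pricing.
ASSUMED_PRICE_PER_1000 = 0.005
# Invented fleet size: every service instance runs its own poller.
ASSUMED_POLLERS = 200

requests_per_poller = monthly_requests(interval_seconds=1)
fleet_cost = monthly_cost(requests_per_poller * ASSUMED_POLLERS, ASSUMED_PRICE_PER_1000)

print(f"{requests_per_poller:,} requests per poller per month")
print(f"${fleet_cost:,.2f} per month for {ASSUMED_POLLERS} pollers")
```

One poller is cheap on its own; the cost shows up when the pattern is copied across every instance of every service, which is why an event notification or message bus tends to win.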


Finops tools:
CUDOS and other CID dashboards
Cost Anomaly detector
Cost Categories/Budgets when paired with intelligent tagging strategies

Thanks Ants
May 21, 2004

#essereFerrari


- Don't migrate your VMs to EC2 and wonder why :yaycloud: isn't giving you the savings that it was advertised as being able to bring

Hughmoris
Apr 21, 2007
Let's go to the abyss!
That is great insight, thank you. I've worked for several community hospitals where it was sometimes a struggle to keep the doors open, so I always enjoyed hunting for those easy wins when it came to costs savings.

Agrikk posted:

When you get into millions, if not tens of millions, per month then you can negotiate highly customized contract billing with AWS for ridiculous savings. But plan on spending fifty million+ USD per year first.

I was watching an AWS Event (I think) video where they were interviewing a data architect who spoke about his company utilizing Lambda for their ETL. He said the current pipeline cost $1k/day. That seems like a ton to me but I can imagine there are some absolute bonker monthly invoices out there.

*Funny enough, that data architect worked for an adtech company.

Thanks Ants
May 21, 2004

#essereFerrari


Also I think it's important to know when the cloud isn't the right option. If you need to run an enterprise app that is going to eat 16 cores 24x7 and make billions of storage transactions, has strict requirements in terms of what OS runs and the memory settings you use etc. then buy the Dell servers and an FC SAN and run it in your own data centres. Not every app is right for the cloud. It's something that can be changed when you go back out to tender for the software next time around, but there's no point fighting and trying to run something in AWS that the people writing the application expect to be on-prem with 2ms latency to your MRI scanner or whatever.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

Hughmoris posted:

He said the current pipeline cost $1k/day. That seems like a ton to me but I can imagine there are some absolute bonker monthly invoices out there.

One of my favorite conversations from a truculent customer went something like:

$them: “we need EC2 to do $thing,”

Me: “I have submitted your feature request and the EC2 team will give it consideration and prioritize it accordingly. In the mean time here is an architecture that not only solves your issue but follows best practices as well as allowing you to do thing1 and thing2.”

“Yeah but that’ll take work. We want you to do the work.”

“If you have constraints, perhaps we can create a ProServe engagement?”

“That is unacceptable. Do you know who we are?”

“Yes sir. You are $company and I really enjoy working with you.”

“We spend $15,000 a month. Surely that puts us in the top percentage of your customers?”

“Spend by other customers isn’t relevant here. I am trying to get you to green as easily and cheaply as possible.”

“(Smugly) How big are your other customers? Smaller AWS spend than us, right?”

“If this will help us move on, I have three other customers currently. With monthly spend of $600,000, $700,000 and $1.4 million.”

“Oh. Oh.”

The Fool
Oct 16, 2003


I was curious and it looks like our flagship application, off season, is at $50k/month

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS
For a peek at cost optimization in large spenders let me share. For reference, last I checked we had somewhere around 100-125 accounts with a monthly spend between $5 and $7 million. We are one of the customers that Agrikk was referring to wrt negotiated discounts but are absolutely nowhere near the actual big customers.

We have a team of 4 whose entire job is driving cost savings initiatives and working with product teams to rearchitect for cost.

Our most expensive services are far and away EC2, S3, and RDS.

Fun story, we did a cost savings activity where we expired a bunch of old objects in S3 and the S3 team reached out to check if something was wrong and to ask us to let them know if we were going to do it again because it affected storage allocation in some way that was affecting other customers.

Second biggest bit of cost savings I’m aware of that we did was batching payloads so we made fewer S3 API calls. It was a while ago but iirc it saved us like $20-30k per day.
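A minimal sketch of what "batching payloads" can look like, assuming the records are small JSON documents that can be combined into newline-delimited files. The `put` callable is a hypothetical stand-in for whatever performs the actual S3 PutObject call; here it is a fake that just counts calls.

```python
import json


def batch(items, size):
    """Yield consecutive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_individually(records, put):
    """One object, and therefore one API call, per record."""
    for index, record in enumerate(records):
        put(f"records/{index}.json", json.dumps(record))


def upload_batched(records, put, size):
    """One newline-delimited JSON object per batch of records."""
    for index, chunk in enumerate(batch(records, size)):
        body = "\n".join(json.dumps(record) for record in chunk)
        put(f"batches/{index}.jsonl", body)


class CountingPut:
    """Fake uploader that records keys instead of talking to S3."""

    def __init__(self):
        self.keys = []

    def __call__(self, key, body):
        self.keys.append(key)


records = [{"id": i} for i in range(1000)]

individual = CountingPut()
upload_individually(records, individual)

batched = CountingPut()
upload_batched(records, batched, size=100)

print(len(individual.keys), "calls unbatched vs", len(batched.keys), "batched")
```

The trade-off is that readers now have to fetch and split a batch to get at one record, so the batch size depends on how the data is consumed.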

Tearing down unused infrastructure has saved us hundreds of thousands a month. Moving from dedicated instances to spot for EKS and EMR workloads saved us a bunch. Rightsizing EC2 instances has probably saved us millions in total.

AWS at scale is an absolute trip.

Blinkz0rz fucked around with this message at 00:54 on Aug 23, 2022

freeasinbeer
Mar 26, 2015

by Fluffdaddy
My company is at about 20 million a year after discounts; our ec2 insofar as K8s is pretty lean; about $120k a month; we spend 3x that on ec2 for another part of the org I have little insight into. We are almost 100% spot or compute savings plans; plus get a pretty normal EDP discount.

I think our EDP requires us to spend $24 million this year; so I expect we’ll find something to blow it on; last year it was $2 million on Rekognition, which didn’t really do anything besides piss money. 🤷‍♀️

We spend $300k a month on s3; but that’s with deep discounts.

Another million a year in GCP and 200k a year in Azure.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

freeasinbeer posted:

I think our EDP requires us to spend $24 million this year; so I expect we’ll find something to blow it on; last year it was $2 million on Rekognition, which didn’t really do anything besides piss money. 🤷‍♀️

Two places often overlooked when trying to burn through EDP cash: training and ProServe.

AWS will happily put together custom training plans for your org- not just “here’s how to do AWS” but more like “here is how YOUR org does AWS (complete with AWS best practices meshed with your best practices and the reasons why you do things the way you do)”. I’ve set that up for customers in the past and it’s always a huge hit and a morale boost for your devs and engineers and architects.

And ProServe credits are great for getting rid of those head-knockers that you want to be rid of but don't have the time.

Arzakon
Nov 24, 2002

"I hereby retire from Mafia"
Please turbo me if you catch me in a game.

Agrikk posted:

Two places often overlooked when trying to burn through EDP cash: training and ProServe.

Third, Marketplace

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

Arzakon posted:

Third, Marketplace

The F5 appliance is super rad, and you’ll watch your spend ratchet up by the second!

Tell it to dump logs into an RDS Oracle instance for big money-sink fun!

Wizard of the Deep
Sep 25, 2005

Another productive workday

Agrikk posted:

The F5 appliance is super rad, and you’ll watch your spend ratchet up by the second!

Tell it to dump logs into an RDS Oracle instance for big money-sink fun!

"We've replaced your normal money-burning furnace with a new one made entirely of gold bricks and fueled by printer ink"

i am a moron
Nov 12, 2020

"I think if there’s one thing we can all agree on it’s that Penn State and Michigan both suck and are garbage and it’s hilarious Michigan fans are freaking out thinking this is their natty window when they can’t even beat a B12 team in the playoffs lmao"
Little late to the party but imo if you aren't learning Pulumi you done hosed up. TF CDK is whatever I'm joining some presentation on it tomorrow but honestly TF is beat old poo poo now (that I love and I will defend to the death but the hot newness is what's up)

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
Amusingly enough, I managed to waste thousands by using spot instances the other month, because until I checked my bill I didn’t know we already had savings plans that cover all the families of instances that I use and then some, at really steep discounts that were greater than spot rates in all the major regions. Like who the heck would think their organization would have bought both c6a and c6i savings plans and reservations covering thousands of them literally the week they were announced?


Docjowles
Apr 9, 2009

12 rats tied together posted:

- S3 storage, there's usually at least one s3 bucket named after the org that has an order of magnitude more data than it should.

I just had to pull this out for appreciation. There is always a bucket called $companyname and it is always like the first thing some rando dev ever did in AWS years before the rest of the org thought about using the cloud. It will be a giant dumping ground of poo poo with no lifecycle policy and probably leaking PII.

There will also be some horrible reason you can't just easily fix it.
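For the buckets where you can fix it, the missing piece is usually a lifecycle configuration. A hedged sketch of one, built as a Python dict in the shape the S3 lifecycle API accepts; the rule ID and all the day thresholds are placeholder assumptions, and nothing here is sent to AWS.

```python
import json


def lifecycle_configuration(ia_after_days=30, glacier_after_days=90,
                            expire_after_days=365):
    """Build a tier-then-expire rule covering the whole bucket."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("thresholds must increase: IA < Glacier < expiry")
    return {
        "Rules": [
            {
                "ID": "tier-then-expire",      # placeholder rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},      # empty prefix: every object
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
                # Also clean up uploads that were started and never finished.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    }


config = lifecycle_configuration()
print(json.dumps(config, indent=2))
```

The printed JSON is something to review with whoever owns the bucket before applying, since an expiration rule on the $companyname dumping ground will delete things somebody turns out to need.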
