|
CDK and Pulumi are cool and good
|
# ? Aug 18, 2022 18:01 |
|
I don't have a ton of experience with the CDK but Pulumi is incredible, basically the same amount of freedom to solve problems that I got from Ansible in the mid 2010s and haven't seen since. There are some funky things about it though:
1, the resource objects are immutable once instantiated, so you can't define "the load balancer" in a shared library, import the library, and then change the health check timeout. Instead there are typed "thingArgs" objects in the library which you can do this with. For example, in python you might create "the load balancer config" as a named tuple, attrs class, dataclass, or whatever, and then you can import that with all of its defaults and feed it into your load balancer resource after modifying the health check.
2, the pulumi engine magically scoops up all instances of Resource and orchestrates them, which is not idiomatic in most languages. I think most people expect that you would have to return the resource objects to some type of parent scope, or pass them to a handler, but they "just work".
3, even though the code looks intuitive and synchronous, the pulumi engine does some weird poo poo behind the scenes so that you (like terraform) don't have to tell it manually about the order for resolving every dependency. This means that even if it looks really trivial, you usually can't do "give me the ARN of this load balancer and HTTP POST to our inventory service" in an obvious way. The mechanism for this in pulumi is called apply and you have to feed it a callback function that will run once the value is known. If you're doing a lot of this, it makes sense to pick a language that is really good at callbacks (typescript), even if you might have a preference for one of the other supported languages or even just the YAML mode.
But overall, it's a massive improvement over basic terraform IMO.
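As a rough sketch of the "config object with defaults" idea from point 1, here is what the shared-library side might look like with plain dataclasses. Everything named here (HealthCheckConfig, LoadBalancerConfig, DEFAULT_LB, the field names) is hypothetical and stands in for whatever typed args object the real library provides; no Pulumi API is used.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class HealthCheckConfig:
    """Hypothetical health check settings; field names are illustrative."""
    path: str = "/healthz"
    interval_seconds: int = 30
    timeout_seconds: int = 5


@dataclass(frozen=True)
class LoadBalancerConfig:
    """Stand-in for a shared "the load balancer config" object."""
    name: str = "company-default-lb"
    internal: bool = True
    health_check: HealthCheckConfig = field(default_factory=HealthCheckConfig)


# What the shared library would export: the defaults, not a live resource.
DEFAULT_LB = LoadBalancerConfig()

# In the consuming program: copy the defaults and change only the timeout.
my_lb = replace(
    DEFAULT_LB,
    health_check=replace(DEFAULT_LB.health_check, timeout_seconds=10),
)

# my_lb would then be fed into the actual load balancer resource.
print(my_lb.health_check.timeout_seconds)
print(DEFAULT_LB.health_check.timeout_seconds)
```

The config is frozen on purpose in this sketch, so the shared defaults can't be changed by accident; each consumer gets a modified copy instead.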
The easiest thing to sell people on in my experience is the Transformations construct, which is basically an object that describes a set of changes to make to a stack (workspace). You can define these Transformations in a repeatable, DRY way and then very ergonomically apply them to a particular stack, or all stacks. A couple of obvious examples: "We tag all resources in the PCI account with CONCERNS_PCI=True", and "every resource should have the pulumi tag". These are both quick tagger functions you can write in an "auto_taggers" module; you just import and apply them to your stack, and then you get free tagging forever that is impossible to gently caress up or forget.
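A minimal sketch of what the tagger functions in an "auto_taggers" module could look like, written as plain functions over a resource's properties. The wiring into Pulumi's transformation hook is deliberately left out. TAGGABLE_TYPES, make_tagger, apply_taggers, and the tag value "true" are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

Props = Dict[str, Any]
Tagger = Callable[[str, Props], Props]

# Illustrative type tokens only; a real list would cover every taggable type.
TAGGABLE_TYPES = {"aws:s3/bucket:Bucket", "aws:ec2/instance:Instance"}


def make_tagger(extra_tags: Dict[str, str]) -> Tagger:
    """Build a tagger that merges extra_tags into taggable resources."""

    def tagger(resource_type: str, props: Props) -> Props:
        if resource_type not in TAGGABLE_TYPES:
            return props  # leave untaggable resources untouched
        merged = {**(props.get("tags") or {}), **extra_tags}
        return {**props, "tags": merged}

    return tagger


def apply_taggers(taggers: List[Tagger], resource_type: str, props: Props) -> Props:
    """Run every tagger in order; later taggers win on key conflicts."""
    for tagger in taggers:
        props = tagger(resource_type, props)
    return props


pci_tagger = make_tagger({"CONCERNS_PCI": "True"})
managed_tagger = make_tagger({"pulumi": "true"})

bucket_props = apply_taggers(
    [pci_tagger, managed_tagger],
    "aws:s3/bucket:Bucket",
    {"bucket": "example-bucket", "tags": {"team": "payments"}},
)
print(bucket_props["tags"])
```

Each tagger returns a new dict instead of mutating its input, so the same function can be reused across stacks without side effects.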
|
# ? Aug 18, 2022 18:29 |
|
I am falling out of love with terraform more all the time. It was a revelation when it came out and I was a die hard Terraform fan for quite a while. But even after years of improvement and passing 1.0, MAN is HCL still kind of awkward and lovely to work with. Any kind of non trivial loops or conditionals are just the most tortured crap. You can “fix” it by wrapping it with something else but at some point, I would just rather be using something else entirely than papering over shortcomings. I haven’t played with it much yet but I know a number of CDK true believers. Haven’t met any Pulumi users in the wild.
|
# ? Aug 19, 2022 04:17 |
|
Terraform has all the sins and warts of Puppet and Chef combined, and it made worse one of the better things about those tools: loops and conditionals being essentially regular old Ruby. In the future I suspect writing raw Terraform will be viewed the way writing raw JavaScript often is today: something you avoid by writing in anything else, such as TypeScript. Also seriously, what’s with some of these infra folks using TypeScript for infrastructure code? Maybe I’m missing something important but I can’t think of a particularly solid construct that makes learning a whole rear end other language and tool chain on top of Terraform worth it unless the language and library ecosystem is more stable than the cesspool of technical debt that are both NPM and yarn.
|
# ? Aug 19, 2022 10:18 |
|
what's the market share of pulumi vs. terraform and ansible? is it actually being used at scale? I haven't heard of it before lol, so I'm curious
|
# ? Aug 19, 2022 10:52 |
|
It's hard to compare Ansible and Terraform as whilst they can do the same kind of things they do them in radically different ways.
|
# ? Aug 19, 2022 13:31 |
|
Docjowles posted: I am falling out of love with terraform more all the time. It was a revelation when it came out and I was a die hard Terraform fan for quite a while. But even after years of improvement and passing 1.0, MAN is HCL still kind of awkward and lovely to work with. Any kind of non trivial loops or conditionals are just the most tortured crap. You can “fix” it by wrapping it with something else but at some point, I would just rather be using something else entirely than papering over shortcomings.
God yes. Terraform was revolutionary but it’s very quickly becoming too cumbersome to use. New and better abstractions are badly needed. The main benefit of it is that it’s not really coding, so it’s not intimidating for devops peeps who don’t know how to code and it’s incredibly accessible as a result. That’s to its credit, but targeting that audience naturally comes with problems for those who do, especially because unlike ansible, it’s not as easily extensible. I haven’t used Pulumi in a production environment but I’d dearly like to.
E: vvvvv lmao I thought we were in that thread, death to state
The Iron Rose fucked around with this message at 14:49 on Aug 19, 2022 |
# ? Aug 19, 2022 14:03 |
|
Firstly, there's no real concept of an inventory in Terraform. Ansible is intimately tied to where code is executed and lives and how it communicates between different places, while Terraform is implicitly locally executed for basically everything it does (its IPC calls are on the same box, although there's not a great deal architecturally that would stop TF from having providers and plugins execute on remote systems eventually).
Secondly, Ansible doesn't keep global state of a system as a rule and expects modules to resolve localized state themselves, while Terraform maintains state and passes it (for better or worse) to the provider / plugin code. This drastically changes runtime behaviors and patterns.
Ansible is more comparable to stuff like Puppet, Chef, and Saltstack, which are all designed around internal systems management constructs and needs, rather than Terraform's constructs of materializing a global system state with a declaratively defined language wrapping a serialization format.
The infra nerd CI/DevOps/SRE thread is a better place to go for deeper discussion.
|
# ? Aug 19, 2022 14:43 |
|
A couple of big orgs "use pulumi" but they tend to be the type of org that use a little bit of everything. It's not nearly as commonly used as terraform or ansible.
Hashicorp's own CDK went GA earlier this month as well, CDKTF, which could be a useful introduction to working with this type of tool for anyone who is interested. Notable differences from Pulumi are that the resource objects are mutable and the config holder objects are not, although the resource objects must be modified through explicit setter methods e.g. import CompanyNameEcsService -> set_load_balancer(). You have to instantiate a wrapper class for "resources in a stack" and you end up shoving a lot of stuff into class constructors for this reason, compared to Pulumi which handles the stack <- resource relationship more implicitly with some CLI commands and other exterior scaffolding. CDKTF also has Pulumi's transformations construct but it calls them "aspects"; it's where your auto taggers and the like would go.
It's not nearly as bad as I was expecting, but something happened to hashicorp in the last 5 years where they're just awful at documentation now. Pulumi's docs identify which resource properties will trigger a rebuild, and they host their own documentation, compared to the CDKTF which has them all up on constructs.dev.
Both tools can interop with HCL Terraform but Pulumi's UX for reading terraform outputs is much better, while CDKTF has a unique(?) advantage in that you can invoke an HCL module directly from your python/ts/whatever CDKTF program.
|
# ? Aug 19, 2022 15:32 |
|
necrobobsledder posted: Also seriously what’s with some of these infra folks using TypeScript for infrastructure code? Maybe I’m missing something important but I can’t think of a particularly solid construct that makes learning a whole rear end other language and tool chain on top of Terraform worth it unless the language and library ecosystem is more stable than the cesspool of technical debt that are both NPM and yarn.
The CDK and CDKTF are TypeScript heavy because their cross-language capabilities come from jsii, an AWS library that lets any language interact with javascript stuff. Pulumi uses grpc, which probably explains a lot of the other differences. I don't have a lot of experience using cross-language facilities in either tool and honestly I hope to never develop this experience either.
Pulumi was founded by Microsoft people, from what I understand, and that whole ecosystem loves TypeScript for various good and bad reasons. I can tell you that it is surprisingly ergonomic, if you don't mind dealing with the transpiler and the abundance of things called "interface" that are actually just data. I've written CDK-style code in C#, F#, TS, and Python and my preferences are basically that list in reverse order.
More generally: the dream of simply having the developers write the terraform was unattainable unless you were working with the most patient and motivated developers to ever exist. If you want the dev teams to seriously commit to writing their own infra code, you have to meet them where they're at (in their IDE, in their preferred language).
|
# ? Aug 19, 2022 15:51 |
|
The Iron Rose posted: God yes. Terraform was revolutionary but it’s very quickly becoming too cumbersome to use. New and better abstractions are badly needed.
This is me 100%, I just started playing with it a while ago. Being able to deploy a whole VPC with all the parameters I need for some poo poo vuln scanner I have to support has been a revelation. I started using it for a few things in my personal AWS account as well. But now you're telling me I am once again behind the times!?!
|
# ? Aug 19, 2022 16:56 |
|
And here I am learning CloudFormation like an idiot.
Internet Explorer fucked around with this message at 20:42 on Aug 19, 2022 |
# ? Aug 19, 2022 18:23 |
|
gotta eat your dog food
|
# ? Aug 19, 2022 18:25 |
|
CloudFormation is extremely good, and there are reasons to use it even from Terraform. Time you spend learning it is never wasted.
|
# ? Aug 19, 2022 18:36 |
|
You're not gonna regret learning how cloudformation works if you end up using CDK.
|
# ? Aug 19, 2022 18:38 |
|
CloudFormation and Terraform wind up calling the same dang endpoints in the end so you’re really learning your cloud provider and AWS, Google, MS win either way.
|
# ? Aug 19, 2022 19:01 |
|
I like CloudFormation. Just like I liked dropping acid and going to class back in the day: everything is chaotic and mysterious, the simplest task feels like an epic adventure, and days later you look back at the notes you took and can only think wtf.
|
# ? Aug 20, 2022 05:12 |
|
The only reason people learn CloudFormation is to pass the exams.
|
# ? Aug 20, 2022 12:53 |
|
i've worked for a handful of b2b saas companies that deploy some stuff into the customer's aws account (even if just an iam role/policy to assume) and my general breakdown of clients was:
80% - did not use infra as code at all
10% - used terraform and are very tribal/opinionated/social-signal-y about it
10% - just used cloudformation
.0001% - used cdk or pulumi or ansible
so for aws only environments you dealt with 90% of your customers by using cloudformation, because the web-UI and background state are handled by amazon. introducing tf to their env would create the prereq for "where does it run and save state" which is just a total gently caress-that conversation to be avoided.
|
# ? Aug 21, 2022 19:28 |
|
I’d shift that 80% up closer to 90 but otherwise concur. The amount of very bad shell invoking the aws cli passing for IaC I’ve seen is harrowing, though that might be a local quirk from so many minicomputing grognards in the region.
|
# ? Aug 22, 2022 06:24 |
|
Startyde posted: I’d shift that 80% up closer to 90 but otherwise concur.
I'm not too proud to admit that I may have used excel to create a bunch of cli commands to add users to Connect via the cli. One thing I have found is that provisioning via cf seems slow as poo poo compared to doing the same via the sdks.
|
# ? Aug 22, 2022 07:20 |
|
ledge posted: I'm not too proud to admit that I may have used excel to create a bunch of cli commands to add users to Connect via the cli.
I’ve done this for adding peering connections and modifying route tables when adding a VPC to existing infrastructure. I could create CF statements for it, but copying and modifying a single line script twenty times is far, far less hassle.
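For anyone who would rather skip the spreadsheet step, the "same one-liner twenty times" approach can be sketched in a few lines of Python that only prints the commands for review and runs nothing. The route table IDs, CIDR block, and peering connection ID below are made-up placeholders, and create_route_commands is a hypothetical helper; the flags are the standard ones for `aws ec2 create-route`.

```python
import shlex
from typing import List


def create_route_commands(
    route_table_ids: List[str],
    destination_cidr: str,
    peering_connection_id: str,
) -> List[str]:
    """Return one `aws ec2 create-route` command line per route table."""
    commands = []
    for route_table_id in route_table_ids:
        args = [
            "aws", "ec2", "create-route",
            "--route-table-id", route_table_id,
            "--destination-cidr-block", destination_cidr,
            "--vpc-peering-connection-id", peering_connection_id,
        ]
        # shlex.join quotes any argument that needs it for a POSIX shell.
        commands.append(shlex.join(args))
    return commands


# Placeholder identifiers, for illustration only.
for command in create_route_commands(
    ["rtb-0aaa", "rtb-0bbb"], "10.20.0.0/16", "pcx-0123"
):
    print(command)
```

Printing rather than executing keeps the review step that the copy-and-paste workflow gives you for free.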
|
# ? Aug 22, 2022 14:30 |
|
Cost savings and performance optimization have always been an interest of mine and as I've yet to actually do any professional AWS work, I'm curious: what are the big monthly cost items for a company when it comes to AWS? I'm guessing S3 and EC2? Watching EMR and Spark tutorials, it looks like you can quickly ring up eye-watering charges on clusters but I don't have much to compare it against. For those of you working with this stuff daily, if you started at a company and were tasked to start saving them money on the monthly bill, where would be the first places you'd start looking?
|
# ? Aug 22, 2022 21:09 |
|
Hughmoris posted: Cost savings and performance optimization have always been an interest of mine and as I've yet to actually do any professional AWS work, I'm curious: what are the big monthly cost items for a company when it comes to AWS? I'm guessing S3 and EC2? Watching EMR and Spark tutorials, it looks like you can quickly ring up eye-watering charges on clusters but I don't have much to compare it against.
It’s EC2. EC2 is the biggest service on AWS and companies gather there before branching off into other services. To mitigate this, there are a bunch of ways to cut costs, from reserved instances to savings plans to EDPs.
S3 is a big service but is relatively cheap and all you need to do there is turn on tiered storage and yer done.
When you get into millions, if not tens of millions, per month then you can negotiate highly customized contract billing with AWS for ridiculous savings. But plan on spending fifty million+ USD per year first.
|
# ? Aug 22, 2022 21:33 |
|
The first place is always the bill, and it has everything you need generally. Everything comes with a big "it depends" asterisk because everything, well, depends. Big cost drivers are, IME:
- S3 storage, there's usually at least one s3 bucket named after the org that has an order of magnitude more data than it should.
- RDS instance type tends to just get ratcheted up over time as more "database events" happen and are solved with vertical scaling.
- AWS' list price for data transfer is exorbitant, so if you have a chatty app on AWS, it tends to dominate your spend (adtech is really bad about this in particular).
- If you have a chatty app that is only chatty "privately", whatever ops team that exists has usually never done the work to optimize for that so you'll see a lot of inter-AZ bandwidth charges too.
EMR tends to be pretty cheap because you can just run it on spot. If there's no capacity, whatever, run the job again when there is. EC2 instance type is also kind of an obvious cost center so even the least responsible orgs have optimized around it to some degree.
|
# ? Aug 22, 2022 21:33 |
|
EC2, S3, RDS are usually the big ones for most enterprises.
Cost savings measures:
* Are they using reserved instances or compute savings plans?
* How much inter-AZ or inter-region traffic is there, and can it be rearchitected around?
* VPC endpoints instead of making API calls over the public internet?
* GP3 vs. GP2 EBS volumes?
* too many old snapshots sticking around
* Are they using any S3 tiers other than the standard one?
* modern instance classes vs. old ones
* rightsizing with Compute Optimizer
* rightsizing EBS volume size
* are they using datacenter-like usage patterns (like querying an S3 API every second 24/7 for if a new file has shown up vs using a message bus)
* transit gateways are expensive
Finops tools:
* CUDOS and other CID dashboards
* Cost Anomaly detector
* Cost Categories/Budgets when paired with intelligent tagging strategies
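To put a rough number on the "querying an S3 API every second 24/7" pattern, here is a back-of-the-envelope sketch. The request price is a placeholder assumption, not a quoted rate (check the current S3 pricing page for the region in question), and both function names are made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_polling_requests(polls_per_second: float, days: int = 30) -> int:
    """Requests made in one month by a single always-on poller."""
    return int(polls_per_second * SECONDS_PER_DAY * days)


def monthly_request_cost(requests: int, price_per_1000: float) -> float:
    """Request charges only; ignores data transfer and compute."""
    return requests / 1000 * price_per_1000


# Assumed placeholder price per 1,000 LIST requests, in USD.
ASSUMED_PRICE_PER_1000_LIST = 0.005

requests = monthly_polling_requests(1)
cost = monthly_request_cost(requests, ASSUMED_PRICE_PER_1000_LIST)
print(f"{requests:,} requests/month per poller, about ${cost:.2f}")
```

One poller is cheap under that assumed price; the cost grows linearly with the number of pollers, prefixes, and environments doing the same thing, whereas a notification-driven design only does work when a file actually arrives.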
|
# ? Aug 22, 2022 21:35 |
|
- Don't migrate your VMs to EC2 and wonder why it isn't giving you the savings that it was advertised as being able to bring
|
# ? Aug 22, 2022 21:48 |
|
That is great insight, thank you. I've worked for several community hospitals where it was sometimes a struggle to keep the doors open, so I always enjoyed hunting for those easy wins when it came to cost savings.
Agrikk posted: When you get into millions, if not tens of millions, per month then you can negotiate highly customized contract billing with AWS for ridiculous savings. But plan on spending fifty million+ USD per year first.
I was watching an AWS Event (I think) video where they were interviewing a data architect who spoke about his company utilizing Lambda for their ETL. He said the current pipeline cost $1k/day. That seems like a ton to me but I can imagine there are some absolute bonker monthly invoices out there.
*Funny enough, that data architect worked for an adtech company.
|
# ? Aug 22, 2022 22:10 |
|
Also I think it's important to know when the cloud isn't the right option. If you need to run an enterprise app that is going to eat 16 cores 24x7 and make billions of storage transactions, has strict requirements in terms of what OS runs and the memory settings you use etc. then buy the Dell servers and an FC SAN and run it in your own data centres. Not every app is right for the cloud, it's something that can be changed when you go back out to tender for the software next time around, but there's no point fighting and trying to run something in AWS that the people writing the application expect to be on-prem with 2ms latency to your MRI scanner or whatever.
|
# ? Aug 22, 2022 22:17 |
|
Hughmoris posted: He said the current pipeline cost $1k/day. That seems like a ton to me but I can imagine there are some absolute bonker monthly invoices out there.
One of my favorite conversations from a truculent customer went something like:
$them: “We need EC2 to do $thing.”
Me: “I have submitted your feature request and the EC2 team will give it consideration and prioritize it accordingly. In the mean time here is an architecture that not only solves your issue but follows best practices as well as allowing you to do thing1 and thing2.”
“Yeah but that’ll take work. We want you to do the work.”
“If you have constraints, perhaps we can create a ProServe engagement?”
“That is unacceptable. Do you know who we are?”
“Yes sir. You are $company and I really enjoy working with you.”
“We spend $15,000 a month. Surely that puts us in the top percentage of your customers?”
“Spend by other customers isn’t relevant here. I am trying to get you to green as easily and cheaply as possible.”
“(Smugly) How big are your other customers? Smaller AWS spend than us, right?”
“If this will help us move on, I have three other customers currently. With monthly spend of $600,000, $700,000, and $1.4 million.”
“Oh. Oh.”
|
# ? Aug 22, 2022 22:57 |
|
I was curious and it looks like our flagship application, off season, is at $50k/month
|
# ? Aug 22, 2022 23:06 |
|
For a peek at cost optimization in large spenders let me share. For reference, last I checked we had somewhere around 100-125 accounts with a monthly spend between $5 and $7 million. We are one of the customers that Agrikk was referring to wrt negotiated discounts but are absolutely nowhere near the actual big customers. We have a team of 4 whose entire job is driving cost savings initiatives and working with product teams to rearchitect for cost.
Our most expensive services are far and away EC2, S3, and RDS.
Fun story, we did a cost savings activity where we expired a bunch of old objects in S3 and the S3 team reached out to check if something was wrong and to ask us to let them know if we were going to do it again because it affected storage allocation in some way that was affecting other customers.
Second biggest bit of cost savings I’m aware of that we did was batching payloads so we made fewer S3 API calls. It was a while ago but iirc it saved us like $20-30k per day.
Tearing down unused infrastructure has saved us hundreds of thousands a month. Moving from dedicated instances to spot for EKS and EMR workloads saved us a bunch. Rightsizing EC2 instances has probably saved us millions in total.
AWS at scale is an absolute trip.
Blinkz0rz fucked around with this message at 00:54 on Aug 23, 2022 |
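The payload batching idea can be sketched without any AWS dependency: group records into fixed-size batches and serialize each batch as one newline-delimited JSON body, so one upload replaces many. This is only an illustration of the general technique, not how the poster's system worked; batched, to_object_body, the record shape, and the batch size of 500 are all assumptions.

```python
import json
from typing import Dict, Iterator, List, Sequence

Record = Dict[str, object]


def batched(records: Sequence[Record], batch_size: int) -> Iterator[List[Record]]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


def to_object_body(batch: List[Record]) -> bytes:
    """Serialize a batch as newline-delimited JSON, one record per line."""
    lines = (json.dumps(record, sort_keys=True) for record in batch)
    return "\n".join(lines).encode("utf-8")


# Hypothetical records; each body would become a single uploaded object.
records = [{"event_id": i} for i in range(10_000)]
bodies = [to_object_body(batch) for batch in batched(records, 500)]
print(len(records), "uploads unbatched vs", len(bodies), "batched")
```

The trade-off is that readers now have to fetch and split a larger object to get at one record, so the batch size depends on how the data is consumed.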
# ? Aug 23, 2022 00:52 |
|
My company is at about 20 million a year after discounts; our ec2 insofar as K8s is pretty lean; about $120k a month; we spend 3x that on ec2 for another part of the org I have little insight into. We are almost 100% spot or compute savings plans; plus get a pretty normal EDP discount. I think our EDP requires us to spend $24 million this year; so I expect we’ll find something to blow it on; last year it was $2 million on Rekognition, which didn’t really do anything besides piss money. 🤷♀️ We spend $300k a month on s3; but that’s with deep discounts. Another million a year in GCP and 200k a year in Azure.
|
# ? Aug 23, 2022 01:22 |
|
freeasinbeer posted: I think our EDP requires us to spend $24 million this year; so I expect we’ll find something to blow it on; last year it was $2 million on Rekognition, which didn’t really do anything besides piss money. 🤷♀️
Two places often overlooked when trying to burn through EDP cash: training and ProServe. AWS will happily put together custom training plans for your org- not just “here’s how to do AWS” but more like “here is how YOUR org does AWS (complete with AWS best practices meshed with your best practices and the reasons why you do things the way you do)”. I’ve set that up for customers in the past and it’s always a huge hit and a morale boost for your devs and engineers and architects. And ProServe credits are great for getting rid of those head-knockers that you want to be rid of but don’t have the time.
|
# ? Aug 23, 2022 02:14 |
|
Agrikk posted: Two places often overlooked when trying to burn through EDP cash: training and ProServe.
Third, Marketplace
|
# ? Aug 23, 2022 02:29 |
|
Arzakon posted: Third, Marketplace
The F5 appliance is super rad, and you’ll watch your spend ratchet up by the second! Tell it to dump logs into an RDS Oracle instance for big money-sink fun!
|
# ? Aug 23, 2022 05:40 |
|
Agrikk posted: The F5 appliance is super rad, and you’ll watch your spend ratchet up by the second!
"We've replaced your normal money-burning furnace with a new one made entirely of gold bricks and fueled by printer ink"
|
# ? Aug 23, 2022 06:08 |
|
Little late to the party but imo if you aren't learning Pulumi you done hosed up. TF CDK is whatever; I'm joining some presentation on it tomorrow, but honestly TF is beat old poo poo now (that I love and will defend to the death, but the hot newness is what's up)
|
# ? Aug 23, 2022 06:13 |
|
Amusingly enough, I managed to waste thousands by using spot instances the other month, because until I checked my bill I didn’t know we already had savings plans that cover all the families of instances that I use, and then some, at discounts steeper than spot pricing in all the major regions. Like who the heck would think their organization would have bought both c6a and c6i savings plans and reservations covering thousands of them literally the week they were announced?
|
# ? Aug 23, 2022 12:04 |
|
12 rats tied together posted: - S3 storage, there's usually at least one s3 bucket named after the org that has an order of magnitude more data than it should.
I just had to pull this out for appreciation. There is always a bucket called $companyname and it is always like the first thing some rando dev ever did in AWS years before the rest of the org thought about using the cloud. It will be a giant dumping ground of poo poo with no lifecycle policy and probably leaking PII. There will also be some horrible reason you can't just easily fix it.
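For reference, a lifecycle policy for a bucket like that might look roughly like the document built below. This is a sketch that only constructs and prints the JSON; it follows the general shape of an S3 lifecycle configuration (Rules, Filter, Transitions, Expiration), but the rule ID, the day counts, and the helper name build_lifecycle_configuration are placeholders. Expiration deletes data, so the real numbers need to be agreed with whoever owns whatever is in the bucket.

```python
import json
from typing import Any, Dict


def build_lifecycle_configuration(
    prefix: str = "",
    ia_after_days: int = 30,
    glacier_after_days: int = 90,
    expire_after_days: int = 365,
) -> Dict[str, Any]:
    """Build a one-rule lifecycle document: tier down, then expire."""
    if not 0 < ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("day counts must be positive and strictly increasing")
    rule = {
        "ID": "tier-then-expire",  # placeholder rule name
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},  # empty prefix matches every object
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
        # Clean up multipart uploads that were started but never finished.
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
    }
    return {"Rules": [rule]}


config = build_lifecycle_configuration()
print(json.dumps(config, indent=2))
```

Starting with a narrow prefix rather than the whole bucket is one way to find the "horrible reason" on a small blast radius first.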
|
# ? Aug 23, 2022 16:18 |