Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS
You're talking about running a stateless app that's backed by S3. You can do a stateless app using Lambda or an autoscaling group of instances provisioned with Elastic Beanstalk, OpsWorks, or CloudFormation.


Blinkz0rz
May 27, 2001


Pollyanna posted:

I have a question about EBS and baking AMIs. We're currently baking a new AMI for every new version of our app we want to deploy, and I'm wondering if there's a way around that? It takes 15-20 minutes to bake one, which means that every commit I push to Bitbucket takes half an hour to show up on the staging server. I have to debug some pipeline related poo poo and waiting that long to run into yet another bug is driving me crazy. What can I do to mitigate this?

Whatever tool you're using to bake has a lot of stuff going on. See if you can simplify the provisioning process, or bake a base AMI that contains components that rarely change and use that to bake new app AMIs.

Blinkz0rz
May 27, 2001

Baking is a solid concept in that it resets instance state back to a known condition so your failure cases are restricted to the app and other transitive dependencies. It makes deployments much easier.

Blinkz0rz
May 27, 2001

Beanstalk is just CloudFormation and a bunch of shell scripts. It's not terrible if you just want to deploy an app.

We'll need a little more info about what you're trying to do to be able to give you a good recommendation. Will you need to scale? What sort of volume of traffic are you looking at? Are there any persistence requirements or is the app stateless? How much time/money/energy do you have to throw at solving this?

Blinkz0rz
May 27, 2001

So the AMI baking before the Docker container is deployed is what takes a long time? What sort of provisioning tools are you using to provision the AMI? If Chef, you should look at the recipes being run and see if you can figure out where it's spending most of its time.

Blinkz0rz
May 27, 2001


Thanks Ants posted:

Does anybody on here log in to their AWS console through SAML? I'm looking to sort out our sprawl of independent accounts as the company grows. I have AWS linked to G Suite so everybody picks a role when they log in based on strings stored in the directory schema, and this works well. However, a legit issue that has been raised relates to generating access keys for services - since SAML just grants access rather than actually creating an account, there's no user object to add access keys to. If I make a SAML role that enables people to create users that they can then add access keys to, it defeats the purpose of using SAML in the first place, since there's extra workload created to audit these accounts and the permissions attached to them.

Is this a thing that anyone has solved, or are people just using something like Spinnaker / their own internal tools which use internal directory details, and not letting people touch the AWS console?

Don't generate access keys for services at all. Instead, use IAM roles and instance profiles and the aws-sdk to authenticate your service on-instance.

Blinkz0rz
May 27, 2001

Ok, I understand.

We've implemented this but I don't know what was done on the SAML side to get it working. Basically you use whatever SSO provider you're using to authenticate to AWS via sts:AssumeRoleWithSAML and then have some sort of service running locally that rotates your credentials accordingly.

The project is open source but I really don't want to doxx myself so if you're interested please PM me.

Blinkz0rz
May 27, 2001

Anyone have suggestions for the best way to analyze CloudTrail logs? We're getting rate limited on some of our EC2 API calls and it's unclear why at a glance. Happens most in eu-central-1 fwiw.

Blinkz0rz
May 27, 2001


Vanadium posted:

Hey, if I'm using lambda functions that sit idle during their invocation for like 5-10 minutes at a time, am I doing it wrong? I wanted to kick off redshift queries in a clever serverless way but I guess I'm not optimally using the pricing structure if my lambda function starts up, dials up redshift, sends a query and then just sits there until the query returns.

On the other hand the call volume is gonna be low so it doesn't really matter either way, probably. Just feels wrong?

Can you have redshift publish results to an SNS topic? If so, use 2 lambdas, one to kick off the query and the other to process the results when the data is published to the topic.
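The consumer side of that two-Lambda pattern is just a handler that pulls the published results out of the SNS event payload. A minimal sketch, assuming the first Lambda publishes the results as JSON (the processing step itself is a placeholder):

```python
import json

def handler(event, context):
    """Consumer Lambda: invoked by SNS when query results are published."""
    results = []
    for record in event["Records"]:
        # SNS delivers the published payload as a JSON string in Sns.Message
        message = json.loads(record["Sns"]["Message"])
        results.append(message)
    return results
```
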

Blinkz0rz
May 27, 2001


Thanks Ants posted:

Isn't that also what SQS is designed to manage?

It depends on what you're trying to do. If you want to persist the data until your Lambda dequeues it then yeah, use SQS. If you want your Lambda to be kicked off with the data available in the event context then you use SNS. It really all depends on what Redshift supports and how you can actually get your data out of it.

It may be that you have to get Redshift to publish results to an S3 bucket and set up bucket events to then publish to an SNS topic which notifies your Lambda. The Lambda will then retrieve the data from the bucket and operate on it.

Either way you choose, from a cost and architecture perspective it's a bad idea to leave a Lambda running for any period of time.
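If you go the bucket-events route, the Lambda's event tells it which object to fetch. A sketch of pulling bucket and key out of an S3 event notification, assuming the bucket notifies the Lambda directly (the actual retrieval via boto3 is left as a comment):

```python
def object_locations(event):
    """Extract (bucket, key) pairs from an S3 event notification."""
    locations = []
    for record in event["Records"]:
        s3 = record["s3"]
        locations.append((s3["bucket"]["name"], s3["object"]["key"]))
    # For each (bucket, key) you'd then call e.g.
    # boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return locations
```
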

Blinkz0rz
May 27, 2001

We wrote an on-instance sidecar app that provides tuneable properties for a service based on application and instance attributes and metadata. For example, we can provide a set of global defaults but then overrides for properties in a specific account, region, auto-scaling group, or for a specific instance with user-defined precedence levels. It's backed by S3 and Consul although Consul is purely optional. We have an extension system for it that handles decrypting properties encrypted either with KMS or Vault's transit encryption.
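The precedence part is basically a layered dict merge, most specific scope last. A rough sketch (the scope names and property keys here are made up for illustration):

```python
def resolve(layers):
    """Merge property layers in precedence order; later layers win."""
    merged = {}
    for layer in layers:  # e.g. [global, account, region, asg, instance]
        merged.update(layer)
    return merged
```
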

It's been almost perfect for us because we're still using immutable images but the team is starting to think about what it looks like when we migrate to Kubernetes and it's looking like we're going to have to do a near complete rewrite.

Blinkz0rz
May 27, 2001


Virigoth posted:

If you're in an immutable environment already with AMIs what/why is pushing you into looking at going to Kubernetes? We've got a fully immutable environment and have had a few meetings but can't come up with enough solid points to add it into our deployment pipeline for a PoC and it just seems like adding another layer of complexity to the environment. Most of our AMIs that are baked spin up super fast with the exception being our Jenkins executor slaves coming in at around 3 minutes right now.

We have a few problems to solve that would be simplified by trading them for the problem of solving k8s:

1. Our AMIs take too long to bake. A lot of that is down to how we assemble our base AMI and the number of things we provision. We have an engineering team in LA with some political cachet that has been clamoring for anything to speed up deployments, and baking is a pretty big chunk of that time.

2. Instances take too long to boot. We have chef doing a boot-time run to adjust settings based on region, ASG, chef environment, etc., and that takes a not insignificant amount of time. Coupled with how long it takes services and some of our bundled software, like Consul, to start, the average time between creating a new ASG and having a deployed service is on the order of 5+ minutes.

3. Multi-region deployments get gross when we have to copy an AMI across the world. Docker images are much easier in that regard.

4. Our chef recipes are an unmaintainable mess. There's 3+ years of bad decisions in there and while we've tried to make it better, a lot of the improvements are trying to shine poo poo. Moving to a different delivery method lets us sweep a lot of that away and makes deploying and maintaining k8s the big problem to focus on.

5. Resume driven development. Unfortunately. Part of it is promoted by that team in LA but a bunch of it is the general desire not to be stuck maintaining legacy software.

A lot of these problems came up pretty organically so I don't think there's really a specific thing to point to and try to resolve. It's just a whole mess that we need to wipe clean and start as close to fresh as we can.

Blinkz0rz
May 27, 2001


Vanadium posted:

Y'all are a lot fancier than we are I guess.

It's actually open source if you're interested. I'm reluctant to doxx myself so if you'd like more info PM me.

Blinkz0rz
May 27, 2001


Virigoth posted:

Ah ok I can see that. I'm having the political fight with Docker right now. We're getting ready to fix a problem with #3 that has been a big security bug for awhile so we'll see what that does to our multi-region deployments and time. For #2 I'm not a Chef guy but is there no way to set up your playbooks (we use Ansible) so that when your service does the "configure" playbook you can just run a quick set of scripts or invoke something you baked on there? I'm looking at this from the perspective of the Amazon Linux AMI we bake on top of.

This is what we're doing with chef. We have a bake cycle which does the initial provisioning and then a boot cycle which effectively does runtime provisioning. Most everything that's configured at runtime is already on the instance; it's just that actually running chef and going through the runlist takes time. The problem is that everything needs to be fully set up and configured before the actual application is started, so while there's not a lot done on boot, it still takes time. So does starting up most of the Spring Boot apps. The boot time on them is insane and they don't pass ELB health checks until the Spring Boot health check endpoint returns a value.

quote:

If I was you I wouldn't put any major cycles into Kubernetes until after re:Invent. Like I'd go full stop if you were thinking of starting right now. It just seems like Kubernetes is ripe enough that AWS might pick it up for some sort of support.

Oh for sure, everything is pointing to a managed solution this year so most of the team is just waiting. People did say the same thing about managed service discovery last year and nothing happened there so who knows. Either way this isn't a problem to solve in the short term.

Blinkz0rz fucked around with this message at 04:15 on Nov 11, 2017

Blinkz0rz
May 27, 2001

Anyone implemented something like cloud-custodian with any luck? We have something like 60 accounts and are struggling for a good auditing and remediation solution that won't cost an arm and a leg *cough* evident.io *cough*

Blinkz0rz
May 27, 2001

Yeah I didn't want to call that out but :ughh:

Blinkz0rz
May 27, 2001

I'm surprised so many folks have had trouble with Terraform. We went the CloudFormation route with our own custom DSL and it's been a nightmare to build out and maintain. We began migrating to Terraform and more than half of our accounts are switched over and man, it's been like night and day.

I understand some of the issues with Terraform but imo they kind of miss the point. HCL isn't really code. It's a config language like yaml or ini. You wouldn't make your yaml DRY or have a monolithic repo of all your config files because that gets awful and cumbersome.

We've got a great system where we have a repo of company-approved modules that implement best practices and patterns as well as a repo per account or grouping of accounts for infrastructure. Mostly we isolate workspaces by region although we have a "global" region for account configuration and non-region scoped IAM stuff.

I've never had a problem with the remote state loving itself, and while some of the resources have weirdly inconsistent inputs and outputs, that's a small price to pay to get the gently caress away from the awful dumpster fire that has been CloudFormation.

Blinkz0rz
May 27, 2001

Also why in God's name would you stand up individual EC2 instances with Terraform? Use ASGs and either control the deployment via a tool like Spinnaker or manage the group and not the individual instances.

Blinkz0rz
May 27, 2001


Idk maybe I've been lucky? I'm not sure what a remote state getting hosed up looks like.

Blinkz0rz
May 27, 2001


Vanadium posted:

This past page is like the most positive I've ever seen someone be about CloudFormation. I'm used to people complaining about CloudFormation getting your stack into a weird state that you can't get out of without petitioning AWS support and about not knowing what exactly applying a change is going to do when there's multiple stacks involved.

Does having a loosely connected set of terraform configs referencing each other's outputs really get so unwieldy compared to CloudFormation? I thought that was kind of comparable to how you manage multiple stacks there, and generally a good idea to ~limit blast radius~.

I guess it's hard for me to conceptualize what a comprehensive, nontrivial AWS account configuration looks like, and I've had a hard time getting into CloudFormation because it seems like everybody who talks about using it also has their own DSL they recommend, which doesn't make things less confusing. :shobon:

Fwiw our production load for one product is about 1000 instances across 80 or so microservices with backing persistence layers (RDS, ElastiCache, etc.), load balancers, IAM roles, etc. We're most of the way through a migration from CloudFormation to Terraform and it's night and day how much easier it is to manage now.

Echoing your sentiments; I've never seen people defend CloudFormation so vigorously as on this page.

Blinkz0rz
May 27, 2001

We deploy users during every chef run. The users are created from databags containing our dev's public keys. If a databag is deleted (someone leaves) all of our instances remove that user the next time chef runs. Same if a new user is onboarded.
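The convergence logic amounts to set arithmetic between the databag users and the users present on the instance. A toy sketch of the reconciliation step (function and argument names are hypothetical):

```python
def reconcile(databag_users, instance_users):
    """Return (to_add, to_remove): users to create and users to delete
    on this run, given the databags as the source of truth."""
    desired = set(databag_users)
    current = set(instance_users)
    return desired - current, current - desired
```

Deleting a databag drops that user from `desired`, so the next run puts them in the removal set on every instance.
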

Blinkz0rz
May 27, 2001

Iirc you need to make sure the logging module is logging to stdout or stderr.

Alternatively you could just use print()

e: you might be running into weirdness with the log level being overwritten somewhere else assuming your code is more complex than your example. The python logging module has some odd warts.
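A minimal version of that setup: point a handler at stdout and set an explicit level. Shown here with an injectable stream so the behavior is easy to verify; in a real Lambda you'd just pass `sys.stdout`:

```python
import logging
import sys

def configure_logger(name, stream=sys.stdout, level=logging.INFO):
    """Attach a stream handler so records actually reach stdout."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```
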

Blinkz0rz
May 27, 2001

Yeah the logging module maintains a tree of singletons that are retrieved by name. Theoretically your module-level logger should override the package- and root-level loggers but that's not always the case.

The best way to debug is to use logger.error and ensure it's actually being logged. Then, if it is, try to figure out what level the logger itself is set to and why.

e: might be worth investigating if the logger is flushing when your lambda exits. That'd be another reason it wouldn't output correctly.
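The singleton-tree behavior is easy to poke at directly: a named logger with no level of its own inherits its effective level from its nearest configured ancestor, and only an explicit level on the child overrides that.

```python
import logging

parent = logging.getLogger("myapp")
child = logging.getLogger("myapp.worker")

parent.setLevel(logging.ERROR)
# child has no level set, so it inherits ERROR from "myapp"
inherited = child.getEffectiveLevel()

child.setLevel(logging.DEBUG)
# an explicit level on the child overrides the ancestor
overridden = child.getEffectiveLevel()
```
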

Blinkz0rz fucked around with this message at 19:43 on Aug 11, 2019

Blinkz0rz
May 27, 2001


Agrikk posted:

Or you can do what a customer of mine did:

Be really clever and buy an iPhone, put all eighty of their accounts' root 2FA on an instance of Google Authenticator, and keep the iPhone in a bombproof safe.


They were all kinds of :smug: until someone dropped the phone.


I had to fly down there and get on a video call with our legal department and me sitting next to their leadership and vouch that their leadership was actually their leadership and we all had to present IDs and say who we were and that we were authorized to remove MFA from the account.

We got to do this eighty times.

We were about to do this for our root accounts but decided against phone-in-safe because it forced recovery responsibility on only one office when our platform engineering org is geographically distributed.

I asked our TAM and he never got back to me with suggestions so we ended up getting 1password solely for this scenario but I can't help but think there are better solutions out there.

Blinkz0rz
May 27, 2001


For 2fa tokens?

Blinkz0rz
May 27, 2001

Running vault on prem means we rely on IT to keep it running, always on, and always available while the folks who would actually need the service work in a different org.

If they ran it, it'd be in AWS which feels not awesome to run a recovery system in an environment that could need to be recovered.

We weren't able to find anything that split the uprights between always available, low touch, and secure.

Blinkz0rz
May 27, 2001

If nothing works, consider spinning up a new instance and mounting the old volume on it to make sure the right key is set in authorized_keys and the file has the right permissions.
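Once the volume is mounted, the checks are mechanical: `~/.ssh` should be 0700 and `authorized_keys` 0600, both owned by the user. A quick sketch of the permission-bits check (the mount paths in the comments are examples):

```python
import os
import stat

def perms_ok(path, expected_mode):
    """True if path exists and its permission bits match expected_mode."""
    if not os.path.exists(path):
        return False
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == expected_mode

# e.g. perms_ok("/mnt/recovered/home/ec2-user/.ssh", 0o700)
#      perms_ok("/mnt/recovered/home/ec2-user/.ssh/authorized_keys", 0o600)
```
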

Blinkz0rz
May 27, 2001

It's good and definitely worth using but there are some weird gotchas the further out from the popular services you get.

Blinkz0rz
May 27, 2001

Fwiw localstack is a wrapper over Moto.

To give the OP some perspective, our product relies heavily on S3, SQS, and SNS which work pretty well in localstack so we use it to spin up a solid approximation of our environment on each engineer's machine rather than having a dev AWS account for that kind of stuff.

Blinkz0rz
May 27, 2001


12 rats tied together posted:

terraform is the most popular tool in this space, yeah. i prefer pulumi to the aws cdk but they are both fine

e: if you decide you want to get a job touching terraform i would look up the following addon tools asap to make your eventual job suck less:

- terraform-landscape
- terragrunt

Landscape is pointless if you're using TF >= 0.12 (which you should be, unless you inherited an older codebase, in which case you should be upgrading asap)

Blinkz0rz
May 27, 2001

I’ll offer an alternative which is that SQS metrics don’t always align 1-1 with the way your business logic tracks job status and progression.

Imo it’s a better choice to surface metrics from your publishers and your consumers. Maybe run a Grafana instance and push custom metrics there rather than trying to map SQS metrics to your own logic.

Blinkz0rz
May 27, 2001


Just-In-Timeberlake posted:

I’ve never had to use a name before and ran into this the other day for the first time and was like wtf is this now

Laughs in queue url

Blinkz0rz
May 27, 2001

For a peek at cost optimization in large spenders let me share. For reference, last I checked we had somewhere around 100-125 accounts with a monthly spend between $5 and $7 million. We are one of the customers that Agrikk was referring to wrt negotiated discounts but are absolutely nowhere near the actual big customers.

We have a team of 4 whose entire job is driving cost savings initiatives and working with product teams to rearchitect for cost.

Our most expensive services are far and away EC2, S3, and RDS.

Fun story, we did a cost savings activity where we expired a bunch of old objects in S3 and the S3 team reached out to check if something was wrong and to ask us to let them know if we were going to do it again, because it affected storage allocation in a way that impacted other customers.

Second biggest bit of cost savings I’m aware of that we did was batching payloads so we made fewer S3 API calls. It was a while ago but iirc it saved us like $20-30k per day.
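The batching win is mostly arithmetic: fewer, larger objects means fewer billed API calls. A sketch of the batching step (the batch size and record shapes are made up; the actual upload of each batch via boto3's `put_object` is elided):

```python
def batch(records, size):
    """Group records into fixed-size batches; one API call per batch."""
    return [records[i:i + size] for i in range(0, len(records), size)]

# 1,000 individual puts collapse into 10 calls at a batch size of 100
calls_before = len(batch(list(range(1000)), 1))
calls_after = len(batch(list(range(1000)), 100))
```
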

Tearing down unused infrastructure has saved us hundreds of thousands a month. Moving from dedicated instances to spot for EKS and EMR workloads saved us a bunch. Rightsizing EC2 instances has probably saved us millions in total.

AWS at scale is an absolute trip.

Blinkz0rz fucked around with this message at 00:54 on Aug 23, 2022

Blinkz0rz
May 27, 2001


Falcon2001 posted:

Yeah it's still a bit of a pain in the rear end (does this channel exist and is it private, or does it not exist at all? WHO KNOWS) but I think I'm not much of a Slack poweruser. I'm sure the team running it is overloaded though.

Still way better than Teams.

That’s kind of the point of a private channel

Blinkz0rz
May 27, 2001


Falcon2001 posted:

Yeah to be clear my complaint isn't 'oh no my slack channel isn't actually private', more like 'people create private slack channels and then expect people to be able to find them', because Private breaks both discoverability and access, and in my experience most people actually care about the access restriction and don't really give a poo poo / actively dislike the behavior of the discoverability changes.

The intent is kind of in the name, isn't it? Sounds a lot like PEBCAK imo.

Blinkz0rz
May 27, 2001


jiffypop45 posted:

Is there a recommended way to execute intense/long running queries against an Aurora RDS cluster without impacting the cluster itself? When we did self hosted MySQL on EC2 we just booted a new box in the same vpc but just left it out of the load balancer.

Create a read replica and then run the query there

Blinkz0rz
May 27, 2001


Agrikk posted:

I don't know why I hate AI so much but I do.

Maybe that it’s been adopted by so many as the perfect cure-all for everything from procedure manuals to customer emails to doing our thinking for us.

Maybe people who have no actual idea of its capabilities use it like a general purpose “we’ll throw AI at it to fix it!” rallying cry.

Maybe because people use “AI” like they used “paradigm” in the 90s, “devops” in the 00s or “agile” in the teens.

But I hate it and it sucks.

it's because ai is championed by the same insufferable assholes who constantly sell the sizzle and refuse to acknowledge how lovely the steak is


Blinkz0rz
May 27, 2001


kalel posted:

For the past few weeks, my prod postgresql RDS instance's CPUUtilization metric rises steadily throughout the day to a max of ~8% and then drops instantly to ~2% at 00:00 UTC, every day, like clockwork. Any reason why that would be the case? Google is giving me nothing.

Autovacuuming?
