Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

StabbinHobo posted:

the only useful info in those answers was "hasn’t changed for like 3 years" (and therefore I should probably just follow the old blog posts).
Tech is hard; that's why they pay us to do it for a living, and why this is a different speciality. You basically asked "what language is the best to use to make an app," which is why you had people crawl out of the woodwork.

Just pick your favorite ci/cd tool and deployment tool and run with it. The devil is in the details though, and you are going to hit bumps no matter what you choose. For example, when using terraform for AWS specific stuff, the supporting modules are pretty good. Some bits are hard, like dynamically generating security groups and iam roles - basically any place you need to iterate through a data structure and apply it, squint real hard. Use a remote state file in s3. Do not use terraform to do anything on servers themselves. Use a diff tool.

If you don't know this up front you're going to struggle and be rewriting stuff, so once you decide on foundational tools you can ask more specifics like what to watch out for.


12 rats tied together
Sep 7, 2006

necrobobsledder posted:

Like 5 posts in we're at Defcon-1 CI / DevOps / K8S land rather than being AWS-specific

This is an interesting statement. IMO the process I outlined is a framework for sanely managing complex relationships between dependent resources that mutate over time. I think calling it CI / DevOps / K8S land is kind of dishonest given that it can manage AWS RDS instances, AWS EMR clusters, and other extremely stateful pieces of infrastructure that simply cannot exist in K8S, and have no place in a CI / "DevOps" toolchain except as an environment variable value.

I think, if you're doing IaC right, your AWS resource management is almost indistinguishable from your K8S resource management, but effectively managing an AWS account and effectively operating a K8S cluster are still two different things. IMO, anyway.

For what it's worth: k8s cluster provisioning, deployment, bootstrapping initial services, and applying addon services in my current role is exactly the same as provisioning an AWS resource stack except you replace instances of cloudformation with instances of k8s. It's a fantastic workflow and I highly recommend it.

StabbinHobo posted:

the only useful info in those answers was "hasn’t changed for like 3 years" (and therefore I should probably just follow the old blog posts).

Bhodi is absolutely right that you're going to have to pick a tool and get started before we can go into any more detail, but there is a lot more detail to get into. As a general rule, though, I would not recommend starting with blog articles.

12 rats tied together fucked around with this message at 15:43 on Aug 7, 2019

StabbinHobo
Oct 18, 2002

by Jeffrey of YOSPOS
goddamn yall love to post a lot of words without reading

i did pick a tool. i named it in my very short and specific question.

12 rats tied together
Sep 7, 2006

StabbinHobo posted:

whats the right ci/cd pipeline setup for IaC/cloudformation work?

You're right, you did name a tool, I'm sorry for missing that.

IMO, don't do CD with CloudFormation. My very specific toolchain works well by using a mixture of create_change_set and validate_template for CI along with the aforementioned ansible assert.

Change sets have an "execute" button you can click manually. IMO including a link to the change set with a pull request and clicking "execute" manually is superior to automatically executing your change set from a CD agent, but most CI/CD tools have a concept of build (create change set) vs deploy (execute change set) that you can use too. If you have a compliance thing involved here, you can restrict execute_change_set permissions to your CD agent and have a documented workflow where an engineer reviews a change set and approves a PR to kick it off; auditors are usually pretty happy with this.

Actually choosing a CI/CD tool depends more on what you're using for version control than anything else, in my experience. Gitlab runners are popular and work reasonably well; CircleCI on github is okay too. I've used TravisCI and I was not a fan, but it was functional. I don't think you can go wrong here unless you try to write a CI service yourself that manually consumes github webhooks.

e: If you haven't done cloudformation before I would recommend taking a look at stack policies at your earliest convenience and starting to think about them ASAP.

In my experience, "I would like to run tests on my IaC" is kind of a version of the XY problem: the actual desire is to be able to safely make changes to production resources, and it manifests as a stated desire for a comprehensive test suite.

You can skip a lot of the landmines involved in accurately writing tests for IaC by just outright blocking actions that would cause downtime. There's no situation where a routine update should take down a database, so everything involved in the access pattern for your database should have a stack policy that blocks Update:Replace and Update:Delete. Now you don't need to write a "the database should not become unreachable during updates" assertion and verify that it's true in a test environment every time you merge a pull request.
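For reference, a stack policy along those lines might look like this (the logical resource name `Database` is a placeholder); sketched here as a Python dict so it can be serialized straight into the stack-policy body:

```python
import json

# Allow routine updates everywhere, but deny replace/delete updates
# to the "Database" logical resource (placeholder name).
stack_policy = {
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "Update:*",
            "Principal": "*",
            "Resource": "*",
        },
        {
            "Effect": "Deny",
            "Action": ["Update:Replace", "Update:Delete"],
            "Principal": "*",
            "Resource": "LogicalResourceId/Database",
        },
    ]
}

print(json.dumps(stack_policy, indent=2))
```

With this in place, an update whose diff would replace or delete the database fails at the stack level instead of needing a test suite to catch it.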

12 rats tied together fucked around with this message at 19:02 on Aug 8, 2019

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

Adhemar posted:

For future reference, you should turn on logging for both APIG and Lambda and check the appropriate CloudWatch logs.

The first invocation after saving the function will always be a cold start, so the symptoms you were seeing definitely pointed towards something being not quite stateless.

I’m using Serverless to provision/deploy everything and am logging things with Python’s logging module but nothing seems to show up in CloudWatch except that the request started/finished and unhandled errors when they are thrown. I’m probably missing an option or something in my Serverless config because I thought it would have automatically done this for me too but I haven’t looked into it too much.

Cute n Popular
Oct 12, 2012
Lambda functions automatically configure their own logging module so you need to use that.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

Cute n Popular posted:

Lambda functions automatically configure their own logging module so you need to use that.

I’m using Python’s native logging module, ie

code:
import logging
logger = logging.getLogger(level=logging.DEBUG)

def handler(event, context):
    logger.debug('some stuff to log')
This should be logged into CloudWatch according to their docs. In CloudWatch I’m only seeing the entries that tell me when the lambda started, finished, and how much I was billed for.

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS
Iirc you need to make sure the logging module is logging to stdout or stderr.

Alternatively you could just use print()

e: you might be running into weirdness with the log level being overwritten somewhere else assuming your code is more complex than your example. The python logging module has some odd warts.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
I’ll take a look at the config for it cause I thought it’s going to stdout/stderr already but I’m not sure.

That’s probably the first time I’ve ever been told to “just use print” as well :v:

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
Last I remember, you need to give getLogger a positional argument. I've always used logging.getLogger(__name__, **somekwargs)

12F style applications (which a Lambda function falls pretty much into IMO) are all about logging to stdout and stderr, being basic is back in style, baby

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS
Yeah, the logging module maintains a tree of singletons that are retrieved by name. Theoretically your module-level logger should override the package- and root-level loggers, but that's not always the case.

The best way to debug is to use logger.error and ensure it's actually being logged. Then, if it is, try to figure out what level the logger itself is set to and why.

e: might be worth investigating if the logger is flushing when your lambda exits. That'd be another reason it wouldn't output correctly.
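Putting the thread's suggestions together, a minimal sketch that should show up in CloudWatch, assuming the standard Lambda Python runtime (which attaches its own handler to the root logger, so a per-module logger only needs its level lowered):

```python
import logging

# Lambda's Python runtime pre-configures a handler on the root logger;
# the common gotcha is that the effective level stays at WARNING, so
# debug/info records are dropped until you lower it explicitly.
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

def handler(event, context):
    logger.debug("some stuff to log")   # should now reach CloudWatch
    logger.error("errors always show")  # sanity check per the post above
    return {"ok": True}
```

Note that `getLogger()` takes the logger *name* as its positional argument; it does not accept a `level=` keyword, which is why the earlier snippet fails.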

Blinkz0rz fucked around with this message at 19:43 on Aug 11, 2019

PierreTheMime
Dec 9, 2004

Hero of hormagaunts everywhere!
Buglord
Any idea what could cause an AWS Batch job to stall in a queue in Runnable state for 5-10 minutes per execution? There are a ton of articles about a process being "hung" there, but the test jobs I've set up have valid environments and resources; they just take forever to run very simple commands.

Edit: It will also run instantaneously sometimes with absolutely no changes on my end.

PierreTheMime fucked around with this message at 18:54 on Aug 12, 2019

Startyde
Apr 19, 2007

come post with us, forever and ever and ever
If it's got to start a new instance to service the job, it can sometimes take that long. Be mindful of the memory requirements on the task, remember there's overhead, and add every drat instance that the job is capable of running on to the selection list. That's helped keep our stuff moving quickly.

PierreTheMime
Dec 9, 2004

Hero of hormagaunts everywhere!
Buglord
Another question on my journey to getting code to execute:

I built a docker container using Corretto and my code. If I run the container from an EC2 instance, I can run my code by passing my access key and secret as arguments and it works fine. Alternatively, if you don't pass anything it takes the user creds from the local environment, so if I run it from Lambda with no creds, it uses the permissions associated with the Lambda role and works fine.

Running the code from a Batch job using a role that has full permissions to S3 (what I'm accessing), it barfs on access (specifically heading the object to get the object size). If I run the command with my explicit access key, it similarly barfs after it confirms that it successfully created a connection to the S3 client. What am I missing in Batch that works everywhere else? :(

PierreTheMime fucked around with this message at 21:30 on Aug 14, 2019

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
You may not be properly overriding the instance-profile-based permissions in your job. What are the environment variables used to start up your Batch job, exactly? Also, if your code uses the Go SDK you may need to set another environment variable, AWS_SDK_LOAD_CONFIG, to have it read credentials from a ~/.aws/credentials file as well as environment variables.

PierreTheMime
Dec 9, 2004

Hero of hormagaunts everywhere!
Buglord

necrobobsledder posted:

You may not be properly overriding the instance-profile-based permissions in your job. What are the environment variables used to start up your Batch job, exactly? Also, if your code uses the Go SDK you may need to set another environment variable, AWS_SDK_LOAD_CONFIG, to have it read credentials from a ~/.aws/credentials file as well as environment variables.

I’m using the Java SDK. I’ll go dig through everything, but it’s configured to build a client connection with AWS credentials (key/secret) if provided, or default to a “standard” client connection that checks a few places for credentials before giving up.

I’ll go add some connection debug messages to confirm what account it’s actually using in a Batch run.

crazypenguin
Mar 9, 2005
nothing witty here, move along

PierreTheMime posted:

default to a “standard” client connection

Yeah, this (just use DefaultCredentialsProvider) in conjunction with assigning roles to things is the way to go.

Another thing you might check is whether the role is set up right so that batch (or ecs or whatever) can grant it properly. I don't know if this is different between ec2/lambda/ecs/whatever. But it's at least conceivable that ec2/lambda can grant the role to an instance, but ecs is unable to grant it to a container.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

I don't know AWS well and I've come in to admin a project that is currently hosted across multiple EC2 instances and I have questions.

1. One thing that I think could be better is that the project has a redis server acting as a task queue and python workers running on one instance. If i understand correctly, if I'm using AWS, I should probably move those python workers over to Lambda, no? Then I can just eliminate redis and replace the code that sends tasks to the workers via redis with code that starts lambda tasks (or whatever the lambda terminology is)?

2. Some of these instances call HTTP endpoints on other instances via public dns addresses...I should just use the VPC local address, right?

Startyde
Apr 19, 2007

come post with us, forever and ever and ever
Speaking of Go, it's coming up on two years since aws-sdk-go-v2 has been out but not really; from the git it feels like it's an OMA. Is that coming anytime soon?

Thermopyle posted:

I don't know AWS well and I've come in to admin a project that is currently hosted across multiple EC2 instances and I have questions.

1. One thing that I think could be better is that the project has a redis server acting as a task queue and python workers running on one instance. If i understand correctly, if I'm using AWS, I should probably move those python workers over to Lambda, no? Then I can just eliminate redis and replace the code that sends tasks to the workers via redis with code that starts lambda tasks (or whatever the lambda terminology is)?

2. Some of these instances call HTTP endpoints on other instances via public dns addresses...I should just use the VPC local address, right?

Yea lambda, SQS. Redis for a queue??
You can do that or use a private dns zone.

crazypenguin
Mar 9, 2005
nothing witty here, move along

Thermopyle posted:

I don't know AWS well and I've come in to admin a project that is currently hosted across multiple EC2 instances and I have questions.

1. One thing that I think could be better is that the project has a redis server acting as a task queue and python workers running on one instance. If i understand correctly, if I'm using AWS, I should probably move those python workers over to Lambda, no? Then I can just eliminate redis and replace the code that sends tasks to the workers via redis with code that starts lambda tasks (or whatever the lambda terminology is)?

2. Some of these instances call HTTP endpoints on other instances via public dns addresses...I should just use the VPC local address, right?

1. You're probably looking for: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html

2. I don't understand the question, are you looking for this? https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/hosted-zones-private.html

e:fb; at least I've got links!

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
How do I get a list of all resources (lambdas, API Gateways, buckets, etc) that are currently in use on my account across all regions? I just saw that I had things uploaded to an S3 bucket in the wrong/different region that I would never have noticed except by accident, and I'd like to just go through and get rid of everything I've ever set up through various tutorials and examples etc.

deedee megadoodoo
Sep 28, 2000
Two roads diverged in a wood, and I, I took the one to Flavortown, and that has made all the difference.


I am currently working on limiting access to resources (namely kms, ssm, and iam) by team. We're basically limiting users to only interact with resources where the arn matches a specific path. One of the problems I'm running into is that developers need to be able to create policies but there's nothing limiting what they can put in their policy. So it ends up being a security issue due to privilege escalation. Is there any way to mitigate this by limiting what a user can put into a policy or are we going to need to insert ourselves into the policy creation process?

PierreTheMime
Dec 9, 2004

Hero of hormagaunts everywhere!
Buglord

deedee megadoodoo posted:

I am currently working on limiting access to resources (namely kms, ssm, and iam) by team. We're basically limiting users to only interact with resources where the arn matches a specific path. One of the problems I'm running into is that developers need to be able to create policies but there's nothing limiting what they can put in their policy. So it ends up being a security issue due to privilege escalation. Is there any way to mitigate this by limiting what a user can put into a policy or are we going to need to insert ourselves into the policy creation process?

I'm not directly involved, but I know our team here uses a process that reads through policies on a regular interval and strips out access that the user isn't allowed to grant. In higher-level envs they typically have the infra team handle it explicitly, but in dev spaces it can speed things up and still make sure they haven't opened up giant security holes.

deedee megadoodoo
Sep 28, 2000
Two roads diverged in a wood, and I, I took the one to Flavortown, and that has made all the difference.


PierreTheMime posted:

I'm not directly involved, but I know our team here uses a process that reads through policies on a regular interval and strips out access that the user isn't allowed to grant. In higher-level envs they typically have the infra team handle it explicitly, but in dev spaces it can speed things up and still make sure they haven't opened up giant security holes.

Yeah, I guess that makes sense. I can probably pretty easily put in a lambda that triggers on policy creation to scan the policy for anything related to the specific services we're limiting access to. Then just generate an email to the user and to my team with the details of the policy if it's got dumb stuff in it.
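The scanning half of that Lambda can be a pure function along these lines; the restricted-service list and the policy shape here are illustrative, not anyone's real config:

```python
RESTRICTED_SERVICES = ("kms", "ssm", "iam")  # illustrative list

def flagged_actions(policy_document: dict) -> list:
    """Return the actions in an IAM policy that touch restricted services."""
    flagged = []
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # single-statement policies are legal
        statements = [statements]
    for stmt in statements:
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            service = action.split(":")[0].lower()
            if service in RESTRICTED_SERVICES:
                flagged.append(action)
    return flagged

# Example: this policy would trigger the email to the user and the team.
policy = {
    "Statement": [
        {"Effect": "Allow", "Action": ["iam:CreatePolicy", "s3:GetObject"], "Resource": "*"}
    ]
}
print(flagged_actions(policy))  # -> ['iam:CreatePolicy']
```

The Lambda wrapper around it would pull the policy document out of the CloudTrail/EventBridge event for CreatePolicy and pass it in; that plumbing is omitted here.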

Scrapez
Feb 27, 2004

I feel like I should be able to figure this out but I'm kind of stumped.

Trying to setup a NAT Gateway for a private subnet.

Have public subnet 10.10.1.0/24 which has an Internet Gateway attached. I've setup NAT Gateway 10.10.1.99 in this subnet.
Have private subnet 10.100.96.0/21. I'm trying to setup the route table for 0.0.0.0/0 to go through the NAT Gateway 10.10.1.99 but it is not listed in the NAT Gateway pulldown list.

The two VPCs that the two mentioned subnets are in have a VPC peering connection established between them, but I'm not sure that would have any impact on adding a route for the NAT gateway.

I'm guessing it's something glaringly obvious but I'm not seeing it. Anyone help me out?

Docjowles
Apr 9, 2009

The subnets are in different VPCs? I'm not sure you can do what you're talking about in that scenario, even if the VPCs are peered.

PierreTheMime
Dec 9, 2004

Hero of hormagaunts everywhere!
Buglord
Alright, my Batch job that I'm calling is listed as having a role assigned under the "Job role" area, which has full permissions to the S3 bucket it's trying to access. When the job runs, it builds an AmazonS3 client connection using the standard process and somehow assumes my identity (displays my AWS access key). I have no real idea why it's doing this, as there's no ~/.aws/credentials file set or aws config values in the container.

The even more irritating issue is that it's failing to read from S3, which both the role and my personal account have full control over. Even if it's pulling my ID somehow, it should not be failing to head an object to get its size. The same operation works from the command line on my workspace, from a generic EC2 instance, and from Lambda (it shows the correct assumed-role access key there). This is absolutely the most frustrating thing I've dealt with in a while - any ideas?

PierreTheMime fucked around with this message at 19:58 on Aug 15, 2019

Thanks Ants
May 21, 2004

#essereFerrari


deedee megadoodoo posted:

Yeah, I guess that makes sense. I can probably pretty easily put in a lambda that triggers on policy creation to scan the policy for anything related to the specific services we're limiting access to. Then just generate an email to the user and to my team with the details of the policy if it's got dumb stuff in it.

An explicit deny rule will always take precedence over anything else, maybe you can get creative with those.

12 rats tied together
Sep 7, 2006

Thermopyle posted:

1. One thing that I think could be better is that the project has a redis server acting as a task queue and python workers running on one instance. If i understand correctly, if I'm using AWS, I should probably move those python workers over to Lambda, no? Then I can just eliminate redis and replace the code that sends tasks to the workers via redis with code that starts lambda tasks (or whatever the lambda terminology is)?

2. Some of these instances call HTTP endpoints on other instances via public dns addresses...I should just use the VPC local address, right?

1) You can do this (replace an ec2 task worker with triggered lambda functions) but you don't have to. Make sure that none of your workers need to run a task for longer than the function timeout. While you're reading these docs, make sure that you won't exceed any of these other limits as they can't be increased. In particular, be aware that concurrent function executions are shared per-account, per-region, and by moving these workers to lambda you're sharing that constraint amongst them and all other functions that execute in your account.

A robust task/worker queue implemented in ec2 is functionally not very different from sqs/lambda, and there are some benefits to doing this yourself that can't be replicated on lambda. In the past I have never been able to responsibly justify a migration here. You can switch redis from ec2 to elasticache if it isn't already there, that's trivial to do and is a huge reduction in management burden without also possibly springing a fundamental tech change on a software team who doesn't particularly care and just needs to schedule business logic.

2) Yes, the private IP address of the instance or its private hostname if you prefer, they're basically the same thing. This will be faster, possibly cheaper, and definitely easier to maintain moving forward.

Scrapez posted:

[...]
The two VPCs that the two mentioned subnets are in have a VPC peering connection established between then but not sure that would have any impact on adding route for NAT gateway.

I'm guessing it's something glaringly obvious but I'm not seeing it. Anyone help me out?

Docjowles is correct in that you cannot use one VPC as a transit network for another. If you have two VPCs here, you need two NAT gateways. They describe this in the documentation but you kind of have to dig for it: link. If I'm understanding what you want to do correctly, you're trying to run the shown invalid edge-to-edge configuration. Please let me know if that's not the case though, you might indeed be missing something silly because the UI for nat gateways is kind of weird.

12 rats tied together fucked around with this message at 21:33 on Aug 15, 2019

Scrapez
Feb 27, 2004

You guys had it right. I was trying to share a NAT Gateway through a VPC peering connection which does not work. After I took a step back I kind of realized I didn't need multiple VPCs anyway. After consolidating into one, it works just fine. Thanks!

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

Boris Galerkin posted:

How do I get a list of all resources (lambdas, API Gateways, buckets, etc) that are currently in use on my account among all regions? I just saw that I had things uploaded to a S3 bucket in the wrong/different region that I would have never noticed except by accident and I'd like to just go through and get rid of everything I've ever set up through various tutorials and examples etc.

The easiest way will be through cost explorer. If you see spend for a service in a region, then you have stuff in that region.

FamDav
Mar 29, 2008

deedee megadoodoo posted:

I am currently working on limiting access to resources (namely kms, ssm, and iam) by team. We're basically limiting users to only interact with resources where the arn matches a specific path. One of the problems I'm running into is that developers need to be able to create policies but there's nothing limiting what they can put in their policy. So it ends up being a security issue due to privilege escalation. Is there any way to mitigate this by limiting what a user can put into a policy or are we going to need to insert ourselves into the policy creation process?

You want permissions boundaries. Specifically, you can require create-user/role to include a permissions boundary.
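The enforcement shape, roughly: the developers' own policy allows role creation only when a specific boundary is attached, via the `iam:PermissionsBoundary` condition key. Account ID, path, and policy name below are placeholders:

```python
import json

BOUNDARY_ARN = "arn:aws:iam::123456789012:policy/TeamBoundary"  # placeholder

# What the developers are granted: they can create roles under their
# team path, but only if the mandated permissions boundary is attached.
developer_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CreateRolesOnlyWithBoundary",
            "Effect": "Allow",
            "Action": ["iam:CreateRole"],
            "Resource": "arn:aws:iam::123456789012:role/team/*",
            "Condition": {
                "StringEquals": {"iam:PermissionsBoundary": BOUNDARY_ARN}
            },
        }
    ],
}

print(json.dumps(developer_policy, indent=2))
```

The boundary policy itself then caps effective permissions (e.g. denies kms/ssm/iam outside the team path), so whatever the developer writes into their new role's policy can never exceed it.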

JHVH-1
Jun 28, 2002
I've heard of some orgs that just give each team their own account so they are isolated. It also has the benefit that they get to pay the bill, so if they waste resources it comes out of their own budget.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

JHVH-1 posted:

I've heard of some orgs that just give each team their own account so they are isolated. It also has the benefit that they get to pay the bill, so if they waste resources it comes out of their own budget.

This happens a lot, and the multi-account strategy solves lots of billing and resource-tracking problems, but it can create huge problems as well. Before heading down a multi-account strategy, consider the following:

- how well will our processes around account creation scale? What works at 5 accounts won’t work at 50. Or 500.

- how will we manage account governance? All of these accounts will need to be secured somehow. What happens when someone switches teams? How will their access roles be moved?

- how will we manage data security? Will there be a central team with review permissions on every account? Will each team be responsible for implementing our security best practices?

- how will resources in these accounts talk to each other? Will they have a hub-and-spoke model or fully meshed? What do New VPCs look like in terms of peering relationships or VPN/DirectConnect access?

- how will resources be identified? Will there be any company-wide naming conventions and tags?

Etc.


Getting these questions wrong (as in wrong for your company, since there are no wrong answers in general) will cause massive headaches when you hit that landmine.

From countless experiences like this, I tell you that architectural redesigns are incredibly disruptive and painful when the tech-debt bill has to be paid. And 80% of them are avoidable with thorough analysis ahead of time.

Agrikk fucked around with this message at 18:19 on Aug 18, 2019

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
On a personal note, dealing with 2fa on 50 accounts is nightmarish.

I highly suggest, if you do have to go this route, you screenshot and print the QR codes and stick them all in a safe.

Internet Explorer
Jun 1, 2005





Bhodi posted:

On a personal note, dealing with 2fa on 50 accounts is nightmarish.

I highly suggest, if you do have to go this route, you screenshot and print the QR codes and stick them all in a safe.

Or use something like Authenticator+ that can back them up and restore them.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
Or you can do what a customer of mine did:

Be really clever and buy an iPhone and put all eighty of their accounts’ root 2FA on an instance of Google Authenticator and keep the iPhone in a bombproof safe.


They were all kinds of :smug: until someone dropped the phone.


I had to fly down there and get on a video call with our legal department and me sitting next to their leadership and vouch that their leadership was actually their leadership and we all had to present IDs and say who we were and that we were authorized to remove MFA from the account.

We got to do this eighty times.

Agrikk fucked around with this message at 02:35 on Aug 19, 2019

freeasinbeer
Mar 26, 2015

by Fluffdaddy

Agrikk posted:

Or you can do what a customer of mine did:

Be really clever and buy an iPhone and put all eighty of their accounts’ root 2FA on an instance of google Authenticator and keep the iPhone in a bombproof safe.


They were all kinds of :smug: until someone dropped the phone.


I had to fly down there and get on a video call with our legal department and me sitting next to their leadership and vouch that their leadership was actually their leadership and we all had to present IDs and say who we were and that we were authorized to remove MFA from the account.

We got to do this eighty times.

Hey, it’s real dumb that I can’t attach more than one U2F key to an AWS account.

Docjowles
Apr 9, 2009

Agrikk posted:

Or you can do what a customer of mine did:

Be really clever and buy an iPhone and put all eighty of their accounts’ root 2FA on an instance of google Authenticator and keep the iPhone in a bombproof safe.


They were all kinds of :smug: until someone dropped the phone.


I had to fly down there and get on a video call with our legal department and me sitting next to their leadership and vouch that their leadership was actually their leadership and we all had to present IDs and say who we were and that we were authorized to remove MFA from the account.

We got to do this eighty times.

:lol: holy poo poo :lol: I'm starting to see why you tout TAM as a fun and cool job so much.

For anyone struggling with 2FA, I strongly recommend ditching individual IAM accounts and just using your corporate SSO solution. Because yeah, dealing with 2FA loving sucks. If you are at a company of any size you hopefully already have some sort of SSO backed by 2FA and you can just reuse that instead of making every AWS user set up a second solution. And not hate your life twice as much every time someone drops their phone in the toilet.

This has the added benefit that engineers do not have permanent access keys. Can't upload your god-mode key to GitHub if you don't have a key :thunk: You can request temporary keys once you authenticate via SSO, and we make users do this. I wrote a lovely script that makes it very easy to authenticate to our SSO, pick which AWS account you want to work in (filtered to the set this user can access based on their Active Directory groups), and then dump the temp creds to their local environment. Some of the SSO vendors even provide this out of the box. Doing this has already paid un(?)expected dividends like devs coming to us saying "hey I run this production critical job from my laptop every day under my user, and now that's not possible, what gives?" and we can gently repoint them toward not loving running critical jobs from their laptops with admin access.

Apps running on EC2 instances should use IAM instance profiles to assume a role that can do what they need. There will always be service accounts that need an actual IAM user with a long-lived key. But that should be the last resort choice, IMO.

Actual human using AWS? Access via SSO with 2FA, get temp API keys if needed
Application running in AWS? Use IAM roles
App running elsewhere that needs to access AWS resources? OK fine, you get a key but it's restricted to the minimal set of features said app requires. And it's expiring on a set schedule.


Adhemar
Jan 21, 2004

Kellner, da ist ein scheussliches Biest in meiner Suppe.
Yep that’s spot on. In fact this is what we do internally at AWS. Using IAM users is strongly discouraged and there is a lot of work ongoing to get rid of existing ones.

  • Reply