12 rats tied together
Sep 7, 2006

Depends on the job and the org, but if you had to pick only one service to fully understand, it should be IAM (because it's the only hard part about all the other services).

12 rats tied together
Sep 7, 2006

i actually love it, sorry, and my suggestion was only half joking. you can google "iam actions conditions" + service name and the first result will be one of the best references in the docs for understanding a new service in my experience.

AWS IAM is also best in class among the various cloud providers, second only maybe to alibaba cloud, which is the exact same service except it's called RAM instead.

GCP's IAM is horrible by comparison, and Azure is also extremely bad unless you have an existing AD domain plus a team of active directory janitors to deal with it.

12 rats tied together
Sep 7, 2006

yea the problem with that one is that CopyObject isn't a real IAM permission. you need GetObject on the source and PutObject on the destination

the api docs kind of mention it obliquely: https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html
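
to spell that out, here's a minimal sketch of the policy you'd attach to whoever is doing the copy (bucket names are hypothetical, and the destination side may also need PutObjectAcl and friends depending on your ACL situation):

JSON code:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-source-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::example-dest-bucket/*"
    }
  ]
}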

12 rats tied together
Sep 7, 2006

an extra fun part of CopyObject is that you can end up with objects in your account that people with AdministratorAccess, also in your account, don't have access to

it happens if you allow a user in a different account to PutObject and they do a CopyObject, which as noted in the API docs does not preserve the Object ACL but replaces it with "private". "private" is a Canned ACL that basically grants the source account owner full control over the object, which is normally fine, except that because the user belongs to a different account, the source account owner is that different account, not the bucket owner account.

you can still HeadObject, pretty sure you can DeleteObject, but you can definitely be a full rear end admin and get an access denied trying to GetObject a thing in your own account. it rules

12 rats tied together
Sep 7, 2006

necrobobsledder posted:

Another fun part is when security groups reference each other, and lots of people intuitively think that resolution is fully transitive and resolves IP groups as a union when it's really just pointers to another group.

Yeah, this is an interesting thing in AWS: whenever you start dealing with groups, they become references-to-references instead of explicit objects themselves like you would find in active directory etc. Of particular note, an IAM group is not a valid value for use in a policy's principal field: https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_principal.html

This means, in a resource based policy, you can't be like "members of group ThingDoers are allowed to Do Thing on this resource", you have to be like "group ThingDoers are allowed to assume role ThingDoer, role ThingDoer is allowed to Do Thing".
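
Rough sketch of half of that pattern, the policy you'd attach to group ThingDoers (account id and role name are made up); the ThingDoer role then carries the actual "Do Thing" permissions plus a trust policy that trusts the account:

JSON code:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LetThingDoersBecomeThingDoer",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::111122223333:role/ThingDoer"
    }
  ]
}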

A principal based policy attached to a group is interesting in that the principal is implicitly "self", but that implicit-ness AFAICT is used as a pass-through: each user resolves the policy individually, where "self" becomes the IAM user "me". Basically, querying groups and resolving group membership is computationally expensive, and AWS' model dodges that problem by just not supporting it.

e: My biggest recommendation for OP is to actually read the documentation. They worked very hard on it, and it's very good.

12 rats tied together fucked around with this message at 18:37 on Sep 19, 2021

12 rats tied together
Sep 7, 2006

I'll take a guess:
- all the load balancers use round-robin DNS that you don't get to pick, so the IP will change out from under you over time and you will always have at least 2 IPs*
- NAT gateway only works for egress traffic since it's dynamic and not static NAT*

*- I think

OP I think we both know this already but the best answer would be to use software that isn't terrible garbage and will respect your configured route table. Since that's probably out of your control, if I had to deal with this, I'd probably bake something into cloud-init that disabled the problematic interface on startup.

12 rats tied together
Sep 7, 2006

for this scenario I use a single profile, and the credentials in that profile have only one permission: the ability to sts:AssumeRole into a management principal for each account/context. then i have a shell command for switching between those roles
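
roughly, the ~/.aws/config shape for that (profile names and ARNs are made up; the single set of long-lived keys lives under the base profile in ~/.aws/credentials):

code:
[profile base]
region = us-east-1

[profile prod-mgmt]
role_arn = arn:aws:iam::111122223333:role/Management
source_profile = base

[profile staging-mgmt]
role_arn = arn:aws:iam::444455556666:role/Management
source_profile = base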

12 rats tied together
Sep 7, 2006

The aws::ec2::vpc resource only accepts a primary cidr block: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ec2-vpc.html#cfn-aws-ec2-vpc-cidrblock

You would have to create an aws::ec2::vpccidrblock: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ec2-vpccidrblock.html which lets you bind a cidr block and a vpc together.
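
Rough sketch of how the two fit together in a template (the cidrs are obviously placeholders):

YAML code:
Resources:
  Vpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16

  SecondaryCidr:
    Type: AWS::EC2::VPCCidrBlock
    Properties:
      VpcId: !Ref Vpc
      CidrBlock: 10.1.0.0/16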

12 rats tied together
Sep 7, 2006

The autoscaling group resource has a field for specifying a launch template, the same kind that you would use for an EC2 instance. This is distinct from and mutually exclusive with the usual "ASG Launch Configuration" config item.

I don't have great access to the ASG web interface at the moment but this setting should be hiding in there somewhere.

12 rats tied together
Sep 7, 2006

I don't think that you can use any of the cfn intrinsic functions in Mappings, but it's a little hard to say if that is the exact issue here because I'm not super clear on where the Mappings key starts in your second example.

Without any other context, in this scenario, I would recommend two things:

1- If you can avoid prefixing your "production vpc stack" with a per-region name, you can just import it directly. Call it production, export it as production-VpcId, and then instead of using Mappings, just Fn::ImportValue production-VpcId (rough sketch after comedy option 4 below). Since the stack must exist in only a single region, the region is implied (and available elsewhere in the API), and you don't need the mapping.

2- Since you can't change the VpcId of a security group without deleting it, I would embed the security groups in the VPC stack and just use !Ref. In my experience it's a good idea to avoid introducing cross-stack references alongside a "replacement" update behavior, if you can.

Comedy option 3: If you use ansible for this, the "template_parameters" field is recursively parsed, so you can pass arbitrarily complex maps to cloudformation with it.

Comedy option 4: If I'm misunderstanding what "Production-OH-Network" means, and you do have this kind of double-dynamic relationship where any given VPC consumer stack needs to consume an output from a stack that you, for some reason, can't know the name of, I would probably use a nested stack instead and then pass the input params through AWS::CloudFormation::Stack.
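
Here's the sketch for option 1, names hypothetical -- the producing stack exports, any consumer stack in the same region imports:

YAML code:
# in the "production" vpc stack
Outputs:
  VpcId:
    Value: !Ref Vpc
    Export:
      Name: production-VpcId

# in a consumer stack, same region
Resources:
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: app traffic
      VpcId: !ImportValue production-VpcId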

12 rats tied together
Sep 7, 2006

If it makes the template cumbersome to read and understand, you're absolutely right to split it up like this. Security Groups are a super overloaded concept in AWS so what I generally prefer to see is that you make a distinction between "network" SGs and "membership" SGs.

Membership SGs are for when you have something like "the chat service" which is comprised of a bunch of other AWS crap. The chat service SG, which contains every applicable member of the chat service, lives in the chat service template just for convenience. You mostly use this SG for its members, for example, a load balancer config where you need to allow traffic to every member of the chat service.

Network SGs are for when you have something like "allow inbound traffic from the office". It's not tied to a particular service, so it doesn't have a service stack to live in, your options are basically to have a Network SG stack or to embed it somewhere that logically relates to things in AWS that have network connectivity to things not in AWS. I usually end up deciding that the vpc stack is the best place and I throw them all in there, but I rarely have more than like 5 of these "Network SGs" so it is not especially cumbersome.

If I had 50, I would absolutely put them in their own stack, and that stack would probably also be a good place for network ACLs to live if I had any.

12 rats tied together
Sep 7, 2006

With a resource-based policy such as the one attached to a secret, when you use Principal: AWS: "*", you're effectively applying s3-style public access to that resource. It configures access for all users, including anonymous users, and you probably shouldn't do it, in general.

It is still only 1/2 of the required permissions for cross account access, but it doesn't implicitly scope to "AWS accounts in my org" or anything (see link). A malicious actor will certainly configure the other half of the required permissions themselves and then there's nothing stopping this Secrets Manager config from allowing ListSecrets or whatever.

If your developer intends to allow access to the org, they'd want to layer a Condition block in there using one of the global condition keys appropriate for their intent, at the minimum. Better would be to explicitly enumerate the principals that should have access.
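
For the "scope it to the org" version, the usual shape is a Condition on aws:PrincipalOrgID, something like this (the org id and action are placeholders, and explicitly enumerated principals are still better):

JSON code:
{
  "Effect": "Allow",
  "Principal": { "AWS": "*" },
  "Action": "secretsmanager:GetSecretValue",
  "Resource": "*",
  "Condition": {
    "StringEquals": { "aws:PrincipalOrgID": "o-exampleorgid" }
  }
}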

12 rats tied together
Sep 7, 2006

Hughmoris posted:

How often do you use the AWS CLI versus CloudFormation versus Console? My goal is to build good practices while I'm self-learning, in eventual hopes of employment using AWS.

I use CFN for everything, the CLI for sts:AssumeRole and s3 actions, and the web interface only for cloudtrail, cloudwatch, and examining CFN changesets.

I would recommend building templates for everything.

12 rats tied together
Sep 7, 2006

if you're using, or can use, aurora serverless, you don't need a proxy or a bastion and you can use the "data API": https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html
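
e.g. straight from the CLI, no VPC connectivity required (the ARNs are placeholders):

code:
aws rds-data execute-statement \
  --resource-arn "arn:aws:rds:us-east-1:111122223333:cluster:example-cluster" \
  --secret-arn "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-db-creds" \
  --database "exampledb" \
  --sql "select count(*) from users"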

12 rats tied together
Sep 7, 2006

There are some per-AZ types of charge in AWS, and inter-AZ traffic costs more, but overall, no, that is entirely a stupid waste of time.

At least try to restrict to a subset of the "true AZs" regardless of account-letter mapping, because it can cause problems for some types of peering (esp. PrivateLink): the peering tech can rely on both sides of the relationship existing in the same AZ.

12 rats tied together
Sep 7, 2006

It's fair to say "do the math to find out if nat gateway makes sense for you"; I wouldn't go further than that in any direction for making any firm rules. If I spend more than 20 minutes per month thinking about the NAT instances in my employer's aws environment, I've wiped out all of the savings we might have seen for it just in labor, nevermind what would happen if we actually lost a NAT instance and had to deal with the production impact, all of the monitoring we'd need for it, the extra config management and orchestration, and like noted above, the extra impact the custom setup now has on downstream resources in this network.

Being able to simply declare "these instances go out through a nat device" in a yaml file and be charged $0.045/hr and $0.045/GB for it to just work, just scale automatically in the background, and to just get graceful failover for free is an extremely compelling pitch for a service to make.

In the event that you find yourself having to janitor this setup for whatever reason, I have done it in the past with keepalived and ENI juggling, and I did not find it to be especially cumbersome. If you need to update route tables when an instance fails, though, that's not really suitable for a production workload at an employer, IMO.

e: I would also add that the answer to this type of thing at scale is usually "avoid using NAT, especially source NAT, at all costs".

12 rats tied together fucked around with this message at 21:35 on Jun 3, 2022

12 rats tied together
Sep 7, 2006

I find pulumi to be the best of the "CDK" style products by a fair margin -- it uses terraform under the hood so you don't need an explicit "synth" step, but it's actually a competently developed set of libraries with some great documentation, unlike the CDKTF.

I definitely recommend checking out something in this area though. Please help bring the days of manually typing HCL or YAML closer to an end.

12 rats tied together
Sep 7, 2006

if you want to do "big data" in general, python and java are the languages for that, in that they're different enough from each other that you would benefit from learning both on purpose, and wouldn't necessarily be able to pick up one easily because you know the other

it's hard to recommend getting really into node.js for data stuff, but i don't pay a ton of attention to it or its ecosystem, i could be wrong

12 rats tied together
Sep 7, 2006

For that problem specifically you might consider adjusting the distribution's default root object to be index.html, which would save you some code and the complexity of running it.
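
In CFN terms it's one property on the distribution config. Very rough sketch with a made-up s3 origin, trimmed to the minimum required fields:

YAML code:
Resources:
  Cdn:
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        Enabled: true
        DefaultRootObject: index.html
        Origins:
          - Id: site-bucket
            DomainName: example-site-bucket.s3.amazonaws.com
            S3OriginConfig: {}
        DefaultCacheBehavior:
          TargetOriginId: site-bucket
          ViewerProtocolPolicy: redirect-to-https
          ForwardedValues:
            QueryString: false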

12 rats tied together
Sep 7, 2006

it's hard to say without benchmarking, especially if your flask app does a lot of lazy loading. my intuition is that most of your fears can be allayed by lambda's provisioned concurrency feature, which basically prewarms a bunch of executors for you.

since you'd be running flask on lambda, it shouldn't be too complicated to switch to flask in a container later, if you find that it's not working

12 rats tied together
Sep 7, 2006

CarForumPoster posted:

It’s really hard to see how the phrase brown bag as it applies to lunchtime learnings could have racial connotations even with the knowledge of a brown bag test apparently existing.

It would be more of a classism thing than a racism thing, IMO.

kalel posted:

I'm in the process of learning how to set up an EKS cluster for some microservices which need to process requests from the internet. I was considering using a Fargate profile for the workload, but it would require me to set up a NAT gateway to connect the private-subnet-only Fargate pods to public subnets. My question is, is there any appreciable advantage to using Fargate pods over a node group with EC2s? My impression is that it seems to be more setup and more cost for little gain

You're correct, if you're going to manage the networking you might as well use an ec2 nodegroup instead of a fargate profile for this use case. The gain appears later when you have more complicated needs: one thing people use k8s for is a junk drawer for web apps, but that's only one of many valid use cases, it's a container "scheduler" for a reason after all :).

If you're doing fancier stuff on it than mapping tcp 443 to tcp 8000, you might find that the extra flexibility with fargate (and fargate spot) is worth the extra setup cost. Maybe. It's a case by case thing.

12 rats tied together
Sep 7, 2006

Sure, you could also just lambda:InvokeFunction from your first lambda function for a slightly simpler option, and then for a more complex option you could create and use an "AWS Step Functions State Machine".

Something else to familiarize yourself with as you explore this type of tooling, if you want to do it for work, is a metadata catalogue and lineage tool like apache atlas. I've worked a lot of "data ops" jobs where I'll show up after like 3 or 4 years of people creating this sort of multi stage pipeline in s3 but never bothering to write down where the data comes from, or where it goes, or who uses it, or what's in it, and (usually most importantly) if we're legally allowed to use it because it isn't derived from PII or HIPAA, or maybe if we use it we're supposed to pay Nielsen or some other company an activation fee, and stuff like that.

It's really easy to create a bullshit mess when you start reacting to s3 objects by creating more objects or moving them around.

12 rats tied together fucked around with this message at 21:31 on Aug 6, 2022

12 rats tied together
Sep 7, 2006

:hai:

I don't have a ton of experience with the CDK but Pulumi is incredible, basically the same amount of freedom to solve problems that I got from Ansible in the mid 2010s and haven't seen since. There are some funky things about it though:

1, the resource objects are immutable once instantiated, so you can't define "the load balancer" in a shared library, import the library, and then change the health check timeout. Instead there are typed "thingArgs" objects in the library which you can do this with, for example in python you might create "the load balancer config" as a named tuple, attrs class, dataclass, or whatever, and then you can import that with all of its defaults and feed it into your load balancer resource after modifying the health check.

2, the pulumi engine magically scoops up all instances of Resource and orchestrates them, which is not idiomatic in most languages. I think most people expect that you would have to return the resource objects to some type of parent scope, or pass them to a handler, but they "just work".

3, even though the code looks intuitive and synchronous, the pulumi engine does some weird poo poo behind the scenes so that you (like terraform) don't have to tell it manually about the order for resolving every dependency. This means that even if it looks really trivial, you usually can't do "give me the ARN of this load balancer and HTTP POST to our inventory service" in an obvious way. The mechanism for this in pulumi is called apply and you have to feed it a callback function that will run once the value is known. If you're doing a lot of this, it makes sense to pick a language that is really good at callbacks (typescript), even if you might have preference for one of the other supported languages or even just the YAML mode.
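
A sketch of point 3 in typescript (resource args trimmed way down, the inventory POST left as a comment):

TypeScript code:
import * as aws from "@pulumi/aws";

const lb = new aws.lb.LoadBalancer("app-lb", { internal: false });

// lb.arn is an Output<string>, not a string; the callback runs later,
// once the engine actually knows the value
lb.arn.apply(arn => {
    // e.g. HTTP POST `arn` to your inventory service here
    console.log(`load balancer arn: ${arn}`);
});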

But overall, it's a massive improvement over basic terraform IMO. The easiest thing to sell people on in my experience is the Transformations construct which is basically an object that describes a set of changes to make to a stack (workspace). You can define these Transformations in a repeatable, DRY way and then very ergonomically apply them to a particular stack, or all stacks, a couple obvious examples of this are: "We tag all resources in the PCI account with CONCERNS_PCI=True", and "every resource should have the pulumi tag". These are both quick tagger functions you can write in an "auto_taggers" module and then you just import and apply them to your stack, and then you get free tagging forever that is impossible to gently caress up or forget.
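
A rough sketch of one of those auto-tagger functions, typescript this time (module and function names are made up):

TypeScript code:
import * as pulumi from "@pulumi/pulumi";

// hypothetical "auto_taggers" helper: merge the given tags into every
// resource registered in this stack
export function tagEverything(tags: Record<string, string>) {
    pulumi.runtime.registerStackTransformation(args => {
        // a real version would check args.type against a list of taggable resource types
        args.props["tags"] = { ...(args.props["tags"] ?? {}), ...tags };
        return { props: args.props, opts: args.opts };
    });
}

// in a stack's entrypoint:
tagEverything({ CONCERNS_PCI: "True", managed_by: "pulumi" });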

12 rats tied together
Sep 7, 2006

A couple of big orgs "use pulumi" but they tend to be the type of org that uses a little bit of everything. It's not nearly as commonly used as terraform or ansible. Hashicorp's own CDK, CDKTF, went GA earlier this month as well, which could be a useful introduction to working with this type of tool for anyone who is interested.

Notable differences from Pulumi are that the resource objects are mutable and the config holder objects are not, although the resource objects must be modified through explicit setter methods e.g. import CompanyNameEcsService -> set_load_balancer(). You have to instantiate a wrapper class for "resources in a stack" and you end up shoving a lot of stuff into class constructors for this reason, compared to Pulumi which handles the stack <- resource relationship more implicitly with some CLI commands and other exterior scaffolding.

CDKTF also has Pulumi's transformations construct, but it calls them "aspects"; it's where your auto taggers and the like would go. It's not nearly as bad as I was expecting, but something happened to hashicorp in the last 5 years where they're just awful at documentation now. Pulumi's docs identify which resource properties will trigger a rebuild, and they host their own documentation, compared to the CDKTF which has them all up on constructs.dev

Both tools can interop with HCL Terraform but Pulumi's UX for reading terraform outputs is much better, while CDKTF has a unique(?) advantage in that you can invoke an HCL module directly from your python/ts/whatever CDKTF program.

12 rats tied together
Sep 7, 2006

necrobobsledder posted:

Also seriously what’s with some of these infra folks using TypeScript for infrastructure code? Maybe I’m missing something important but I can’t think of a particularly solid construct that makes learning a whole rear end other language and tool chain on top of Terraform worth it unless the language and library ecosystem is more stable than the cesspool of technical debt that are both NPM and yarn.

The CDK and CDKTF are TypeScript heavy because their cross-language capabilities come from jsii, an AWS library that lets any language interact with javascript stuff. Pulumi uses grpc instead, which probably explains a lot of the other differences. I don't have a lot of experience using cross-language facilities in either tool, and honestly I hope to never develop this experience either.

Pulumi was founded by Microsoft people, from what I understand, and that whole ecosystem loves TypeScript for various good and bad reasons. I can tell you that it is surprisingly ergonomic, if you don't mind dealing with the transpiler and the abundance of things called "interface" that are actually just data. I've written CDK-style code in C#, F#, TS, and Python and my preferences are basically that list in reverse order.

More generally: the dream of simply having the developers write the terraform was unattainable unless you were working with the most patient and motivated developers to ever exist. If you want the dev teams to seriously commit to writing their own infra code, you have to meet them where they're at (in their IDE, in their preferred language).

12 rats tied together
Sep 7, 2006

CloudFormation is extremely good, and there are reasons to use it even from Terraform. Time you spend learning it is never wasted.

12 rats tied together
Sep 7, 2006

The first place to look is always the bill, and it generally has everything you need. Everything comes with a big "it depends" asterisk because everything, well, depends.

Big cost drivers are, IME:

- S3 storage, there's usually at least one s3 bucket named after the org that has an order of magnitude more data than it should.
- RDS instance type tends to just get ratcheted up over time as more "database events" happen and are solved with vertical scaling.
- AWS' list price for data transfer is exorbitant, so if you have a chatty app on AWS, it tends to dominate your spend (adtech is really bad about this in particular).
- If you have a chatty app that is only chatty "privately", whatever ops team exists has usually never done the work to optimize for that, so you'll see a lot of inter-AZ bandwidth charges too.

EMR tends to be pretty cheap because you can just run it on spot. If there's no capacity, whatever, run the job again when there is. EC2 instance type is also kind of an obvious cost center so even the least responsible orgs have optimized around it to some degree.

12 rats tied together
Sep 7, 2006

First time I used the ns1 provider it set "the entire contents of this zone" to the 1 record that I asked it to provision, deleting thousands of records without indication. Very cool thank you, I'll stick with ansible for DNS though.

12 rats tied together
Sep 7, 2006

Jeoh posted:

I wish Terraform had import blocks, so that I wouldn't have to do it manually (hello aws_system_linked_role my old friend)

Agreed, that is one of my favorite pulumi features: https://www.pulumi.com/docs/intro/concepts/resources/options/import/

TypeScript code:
let group = new aws.ec2.SecurityGroup("web-sg", {
    name: "web-sg-62a569b",
    ingress: [{ protocol: "tcp", fromPort: 80, toPort: 80, cidrBlocks: ["0.0.0.0/0"] }],
}, { import: "sg-04aeda9a214730248" });
imports are modifications, and a big benefit of infracode is being able to propose and review modifications using version control. it's crazy that your only option for almost a decade has been to put a date-stamped bash script in a "migrations/" folder somewhere in your terraform repo

Pile Of Garbage posted:

As far as AWS is concerned learn CFN to understand the platform and then look at TF and Ansible if you want to do IaC. Only look at CDK and Pulumi if you want to do tightly-integrated IaC. If your applications are expected to operate at a platform level and interact with cloud then yeah use them, otherwise please don't, you'll just make a nightmare for whoever picks up the pieces for support.

I don't necessarily disagree but what do you mean by "tightly-integrated?"

12 rats tied together
Sep 7, 2006

Cool, I broadly agree, there is a middle ground where the application does not need to specifically be orchestrating pulumi calls (e.g. what if the application does not run continuously), but I think we're on the same page.

Something that gets missed a lot is that CloudFormation is way more than just a templating tool: it has really granular update policies, it has rollback triggers, it has event notifications, deployment management policies for some types of resource, etc. Even if you're using Pulumi, if you have a bunch of really important resources that need to live together and share state/data/fate, you should put them in a cfn stack.

Firing off naked api calls in sequence to like, a billing database, has always been a worst practice. It doesn't matter what tool is turning what type of input into the api calls.

12 rats tied together
Sep 7, 2006

if you can pin it to 1000 requests exactly that feels like file descriptor limit or one of the other famous historic docker footguns

you're right that you shouldn't need to use a queue to serve this, but it's hard to offer much advice without knowing more about what "crashed" means. this was on ecs, right? the containers presumably went unhealthy and were reaped by the ecs scheduler, the main reasons this can happen are pid 1 exiting or a load balancer health check failing.

12 rats tied together
Sep 7, 2006

BaseballPCHiker posted:

The Lambda is in a different account. That gives me an avenue to go down, thank you!

Since the IAM service is account-scoped, "Cross Account" is a huge huge huge piece of extra added complexity. Definitely include it with all of your search terms, forum posts, etc. :)

From experience I would guess at one of two things being wrong: the first one is, like Docjowles was getting at, that you uploaded your object with the BUCKET_OWNER_FULL_CONTROL (nobody else has access) Canned ACL, which is interfering with your policy in some way.

The other thing would be missing account ids in your various permissions policies. The Lambda function needs its own account-local role to assume, and that role needs a policy that allows access to s3 ARNs that have the bucket-account account id in them. Similarly, the bucket-account bucket policy needs to trust the Lambda account role, probably also a good idea to include the account id in the role ARN as well.

Other than that, it seems like you did everything correct, so an AWS support ticket might be in order. They're usually pretty good at debugging cross account permissions gotchas, and they have slightly more access to your stuff than we do.

12 rats tied together
Sep 7, 2006

Every account should have exactly one VPC, of a suitable size for your business model and infrastructure complexity, per region. /16 is a pretty good default, but reserve an ipv4 cidr block from your corporate ipv4 address allocation matrix so you don't create routing conflicts down the line.

This strategy will scale with you up into the $100mm/month spend range in AWS without any significant problems. If permanently reserving an ipv4 cidr block and spinning up a VPC and subnets stack is too heavy of a lift for your thing, that's a really good heuristic for "this thing should live in a different account".

12 rats tied together
Sep 7, 2006

if you haven't already, check out the aws documentation page for IAM policy variables and tags. a big problem I often see people run into with terraform specifically is creating tons of policies with terraform interpolations in them that could actually just be one policy with an iam variable in it (typically aws:userid). sketch of that below.
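
Classic example of that collapse, one policy instead of one-per-user (bucket name is hypothetical, and aws:userid works along the same lines if you prefer it over aws:username):

JSON code:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::example-team-bucket/home/${aws:username}/*"
    }
  ]
}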

other than that, I'm not aware of any tricks. I use cloudformation templates for my IAM stuff and serialize a business-specific principal definition to yaml.

12 rats tied together
Sep 7, 2006

"AWS IAM Identity Center" is basically just an addon for regular IAM. It will create a bunch of IAM Roles and it contains mechanisms for verifying identity so that things can access assume the permissions of those roles without needing an IAM user -> the ability to perform the sts:AssumeRole API call.

It won't disable the normal IAM service, or anything like that. That would be pretty disastrous though so you should absolutely file a support ticket to get an official 2nd opinion.

12 rats tied together
Sep 7, 2006

Yes, sorry, I did misread the question at first.

IAM Identity Center will just create a normal role called like AWSReservedSSO_SomeGarbage_SomeOtherGarbage (+ it lets people from your Identity Source become this role without needing an AWS user). The role will have permissions like every other IAM object, but you're supposed to manage them through IAM Identity Center's permission set construct. This is where your policy JSON, managed policy attachments, etc., will live.

I believe the role is actually created per-permission-set which is kind of a garbage pattern because it forces you to maintain a many-many-many relationship between accounts and users and permissions. For your use case though (2 types of user, 1 account) it will probably be fine.

e: to be more helpful here, to implement this you would create 2 "IAM Identity Center Permission Sets" (above link) for "product" and "data". The permission sets can have either an attached policy that you write, a shared customer managed policy that you write, or they can use the AWS managed policies which is preferable. You can search the managed policy list in the console for like "quicksight" and "redshift" to see what already exists - AWS is usually pretty good about having ReadOnly vs PowerUser for their various services.

If you can't find anything in the managed policy list and you have to create a policy, google search "actions resources conditions" + the service name. For example: actions resources conditions quicksight. You can scroll through this documentation to examine the various types of actions, resources, and conditions (:)) that exist for a particular service so you can craft your desired "limited access to quicksight" policy. Probably create this as a Customer Managed Policy when you're done so you can re-use it later.

After you have your 2 permission sets with attached policies, you need to assign the permission sets to an account. When you do this, you also pick a principal. The exact principal you use depends on your identity store, but if you're using the built-in identity store, it should be pretty intuitive: you'd have a group for product and a group for data. If you're mapping identities from another identity store, you're on your own, and I wish you luck.

12 rats tied together fucked around with this message at 21:05 on Nov 28, 2022

12 rats tied together
Sep 7, 2006

IMHO It's much better UX to just use regular IAM roles:

- Create everyone an IAM user (trivial) and configure the user with an MFA device (<10 min per user, async slack message)
- Give them a policy that allows sts:AssumeRole on an arbitrarily complex mapping of AccountName + RoleName
- Add a condition that only allows sts:AssumeRole to succeed with a valid MFA code (rough sketch below)
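
The second and third bullets as a single policy statement, very roughly (account ids and role names are placeholders):

JSON code:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": [
        "arn:aws:iam::111122223333:role/Developer",
        "arn:aws:iam::444455556666:role/Developer"
      ],
      "Condition": {
        "Bool": { "aws:MultiFactorAuthPresent": "true" }
      }
    }
  ]
}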

That's it. If they want to use the CLI, assume a role. If they want to use the web UI, log in like normal -> assume a role. If you want to swap accounts in the web UI, hit the "Switch Role" drop down in the interface, which has a handy "recent roles" feature. If you want to swap accounts in the CLI, use a role profile, or juggle some vars.

From a management perspective this lets you implement "The product management role has RestartInstances permissions in the UAT accounts" by, very simply, adding a policy element that grants this action, to the product management role, in the UAT accounts.

To do this in AWS SSO you would need to either create a new permission set -> which results in a new option for the user to select, called like "ProductRoleUATAccount", that has different permissions from ProductRoleProductionAccount, or you would need to overload the policy to check for aws:PrincipalAccount -> ForAnyValue -> StringEquals -> each, UAT, account, ID.

If you do ForAnyValue + StringEquals you also have to explain to your auditor why your in-scope permissions policies have an "Allow Product" statement in them, but it's totally safe because of this nested JSON, and we ensure that changes to the nested JSON are safe because of [...]. It's bad.

12 rats tied together
Sep 7, 2006

Multiple roles per user is for when you have access to perform multiple types of job duty, which are meaningfully segregated (e.g. political firewall, merger nonsense, SOX controls, things of this nature). It's good that Product can't CloudFront:CreateInvalidation, even if some members of product are also developers and have this permission through a side channel. Decoupling the definition of a job duty from its performers is good because the lack of coupling reduces the scope. Kevin might be on a new team that has a hybrid PO/Developer team lead. Compliance shouldn't have to ingest this new organizational skeleton and produce new scaffolding for ensuring Kevin's actions are compliant, because Developer and PO already exist and already are compliant.

It's way easier to "prove it" when a thing is what it actually is, and proving it is really all that matters.

You can do this in AWS SSO, of course, but the UX on the administration and the user side is significantly worse, because of the aforementioned many-many-many relationship required by the principals -> permission sets -> accounts model.

12 rats tied together
Sep 7, 2006

luminalflux posted:

lol, lmao. Even with 50 engineers this is not trivial when you start running into security policies saying "IAM Credentials must be rotated every 90 days".

This is like a 10 line lambda function, but also, you shouldn't have to rotate every IAM user's credentials after 90 days. Splitting hairs over what is in-scope and what is technically subject to whatever type of credential rotation policy that needs to exist is your security team's job, but I do think that the hiring pool for "good cloud infosec" is basically nonexistent, so it's reasonable that you would be globally subject to a stupid control that shouldn't exist.

I would offer your security team the advice that technically the user doesn't have access to anything, in the proposed model, only the role does.

Arzakon posted:

“Don’t use IAM users” - everyone who works for AWS (for what that is worth)

AWS also created Control Tower. Guard Duty. The "private" s3 canned ACL + the feature for s3 access logs where they can recursively log themselves into their own logs and bill you for it. AWS' job is to have an answer for every question so you can bring your spend to their platform, and they're very good at it.

It's our job, not AWS', to consider our own specific security and management needs and create a set of platform objects that solves for them in the cheapest, easiest, etc., way.

12 rats tied together
Sep 7, 2006

I don't agree with the specific conclusion that management through Identity Center is smoother as a whole, but I think we're on the same page + it's really sad that this is even up for debate. "Use the thing that is fundamentally better" should be the easiest slam dunk of all time, but here we are.

IMO, my accounts are databases, so I don't want to SSO-map into them, in the same way that I wouldn't want to SSO-map into mysql. The graphs are in grafana, the deployments are in the deploy tool, the logs are shipped to sumo, etc.

The types of access that should be given out in this model are like "Edge team needs to manage EdgeApp which uses CloudFront" so the foremost concern is that there is a single entity that needs different permissions in every account, which SSO totally drops the ball on compared to the other solution which has been around for over a decade.
