Vanadium
Jan 8, 2005

Huh, so creating an AMI is more complicated than taking a tarball with a root fs in it, adding a dir with your app to it and uploading that to S3?

Vanadium
Jan 8, 2005

I'm having a really hard time with DynamoDB + throttling.

I'm continuously updating a couple thousand rows, with a couple dozen distinct partition keys. This works fine most of the time, but I get sporadically throttled bad enough that my writes back up and I need to drop updates. I'm not sure why that happens and I'd like it to stop.

Is this just a capacity thing? According to the write capacity graph in the DynamoDB console, I have over twice as much provisioned capacity as I'm using when averaged over a minute. But I tend to spend five seconds writing a lot and then twenty seconds not writing anything, so during those brief bursts I consume more write capacity units per second than I have provisioned. Concretely: I use x capacity units averaged over a minute, I have x*2.5 provisioned, and the actual usage pattern is a burst consuming x*7 to x*8 per second for a few seconds, then nothing for the rest of the time. During the update cycles where I get throttled, I only manage to consume about x*1.5 capacity units per second during the burst period, with no change in usage averaged over a minute unless it's bad enough that I end up dropping writes.

My assumption was that the provisioned capacity isn't literally "per second", so I can consume it unevenly across a short timespan like a minute or five. Is that wrong? Even if it is, I don't get my provisioned per-second capacity during the throttled periods, so I'm really confused about what I'm actually provisioning. (Edit: I think http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Bursting is what I was thinking of.)

Maybe it's just my partitions being very uneven, so the capacity is partially provisioned to partitions I'm not actually using very much? It's a small set of partition keys so I wouldn't be very surprised. On the other hand they never change and the periods in which I'm getting throttled always seem to be fairly short, with much longer periods (hours, maybe days) of things working smoothly, so I dunno what is changing. Is there a way to actually look at how many partitions exist and how load is distributed among them, or are they just an explanatory device and not how things actually work? But according to the illustrative numbers in the DynamoDB docs, my data is way too small and my writes are way too few on average to even get close to filling up a single partition, so I don't have much hope in this direction.

The throttling doesn't seem to coincide with, like, peak daily traffic or anything either. I'm just super mystified and basically resigned to magical thinking at this point. So far I've been looking at the CloudWatch metrics for provisioned capacity, capacity usage and write throttle events, is there anything else I should be looking at to figure out what I'm doing wrong?
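The burst behavior in the linked guidelines can be pictured as a token bucket: the docs describe DynamoDB banking up to 300 seconds of unused provisioned throughput, spendable on a best-effort basis. This is a hypothetical simulation, not AWS's actual algorithm, with illustrative numbers; it shows why a bursty writer can stay under its average provisioned rate and still run dry if the deficit per cycle outruns the bank:

```python
# Hypothetical token-bucket sketch of DynamoDB burst capacity.
# Not AWS's real algorithm; it only illustrates how a writer that is
# fine on a per-minute average can still exhaust banked capacity.

BURST_WINDOW = 300  # docs describe banking up to 300s of unused capacity

def simulate(provisioned_wcu, demand_per_second):
    """Return the total capacity units throttled over the run."""
    bank = provisioned_wcu * BURST_WINDOW  # start with a full bank
    throttled = 0
    for demand in demand_per_second:
        # refill at the provisioned rate, capped at 300s worth
        bank = min(bank + provisioned_wcu, provisioned_wcu * BURST_WINDOW)
        if demand <= bank:
            bank -= demand
        else:
            throttled += demand - bank
            bank = 0
    return throttled

# A writer bursting 16 WCU/s for 5s out of every 25s averages 3.2 WCU/s.
# With 8 WCU provisioned (2.5x the average) the bank never empties:
pattern = ([16] * 5 + [0] * 20) * 100
print(simulate(8, pattern))   # no throttling

# With only 3 WCU provisioned, each cycle drains the bank by a little,
# and after enough cycles the writes start throttling:
print(simulate(3, ([16] * 5 + [0] * 20) * 200) > 0)
```

Under this model the per-minute averages look fine right up until the bank empties, which matches throttling that appears "out of nowhere" after hours of smooth operation. Partition-level limits would be an additional per-partition version of the same bucket.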

Vanadium fucked around with this message at 13:48 on Jun 19, 2017

Vanadium
Jan 8, 2005

If banking unused capacity happens on a 1:1 basis, I shouldn't have any issues. Averaged over 30 seconds I'm consistently under the provisioned capacity. Getting throttled at that point makes sense to me if the banked capacity is only provided on a best-effort basis, if the underlying hardware has capacity to spare or whatever.

What I can't figure out is why I seem to get throttled below my provisioned capacity on a per-second basis, but maybe I'm actually measuring that wrong and averaging too much there.

I hadn't realized I could get numbers for my remaining capacity; that sounds a lot more useful than the consumed capacity I've been looking at. I'm not sure what I get out of throttling myself, though. Does getting throttled on the AWS side punish me by consuming even more capacity? Right now I just back off exponentially whenever at least one write in my batch gets throttled, but with how quickly a little bit of capacity refills that doesn't really do much, so maybe I need to be more aggressive about backing off.
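For the "more aggressive" version, capped exponential backoff with full jitter is the pattern AWS's own guidance recommends, so that a burst of throttled writers doesn't retry in lockstep and re-throttle itself. A minimal sketch; `write_batch` is a hypothetical stand-in for a real BatchWriteItem call (in practice you'd retry only the UnprocessedItems, not the whole batch):

```python
import random
import time

def backoff_delay(attempt, base=0.1, cap=20.0):
    """Capped exponential backoff with full jitter: sleep a random
    duration in [0, min(cap, base * 2**attempt)] so retries spread out
    instead of all landing on the same refill tick."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def drain(batches, write_batch, sleep=time.sleep):
    """Write batches, backing off while the service reports throttling.

    `write_batch` is a stand-in for the real call; it returns True when
    at least one item in the batch was throttled and needs a retry.
    """
    attempt = 0
    for batch in batches:
        while write_batch(batch):        # True -> some writes throttled
            attempt += 1
            sleep(backoff_delay(attempt))
        attempt = 0                      # a clean batch resets the backoff
```

The full-jitter variant deliberately sometimes sleeps almost nothing; the point is the *population* of retries spreads out, not that any single retry waits long.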

Vanadium fucked around with this message at 22:50 on Jun 19, 2017

Vanadium
Jan 8, 2005

Nah, I have one local secondary index and I'm pretty sure it gets included in the ConsumedCapacity CapacityUnits total.

Vanadium
Jan 8, 2005

Chances are we're going to drop DynamoDB, since our use-case is not only a poor fit but also doesn't justify how much time I've been sinking into it. It's really small amounts of data and I'm starting to think anything else would work better, even just periodically putting a file on S3. I've basically been trying to use DynamoDB as IPC with incidental persistence.

My backoff works out to multiple seconds as I get throttled, but when my throughput is limited by capacity I don't think I can optimize much with backoff here. I'm fairly sure I have two partitions right now. The partitioning of my data isn't great, so I did the thing where you just add random poo poo to your partition key to spread things out more, which I think should sort that out.
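The "add random poo poo to the key" trick is DynamoDB's documented write-sharding pattern. A sketch with hypothetical key names; the trade-off is that reads then have to fan out over every suffix and merge:

```python
import random

NUM_SHARDS = 10  # tune to how many partitions you want writes spread over

def shard_key(partition_key, num_shards=NUM_SHARDS):
    """Suffix a hot partition key with a random shard number so writes
    land on different partitions instead of hammering one."""
    return f"{partition_key}.{random.randrange(num_shards)}"

def all_shards(partition_key, num_shards=NUM_SHARDS):
    """Enumerate every sharded key, for the read-side fan-out."""
    return [f"{partition_key}.{i}" for i in range(num_shards)]

# Writes go to e.g. "sensor-42.7"; reads query sensor-42.0 .. sensor-42.9
# and merge the results.
```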

I don't think a cache helps either since my writer is already batching updates in 30 second windows and then only updating each key once.

Thanks for all the advice, I feel a bunch better prepared to use DynamoDB now if that starts seeming appropriate for another project.

Vanadium
Jan 8, 2005

Hey, if I'm using lambda functions that sit idle during their invocation for like 5-10 minutes at a time, am I doing it wrong? I wanted to kick off Redshift queries in a clever serverless way, but I guess I'm not optimally using the pricing structure if my lambda function starts up, dials up Redshift, sends a query, and then just sits there until the query returns.

On the other hand the call volume is gonna be low so it doesn't really matter either way, probably. Just feels wrong?

Vanadium
Jan 8, 2005

I don't know that you can have Redshift run queries without keeping something like a postgres session open to the cluster for the entire runtime. I forgot about bucket events; if that works, then that seems like the right way to do it.

Vanadium
Jan 8, 2005

My pal who actually works with redshift every day says you can't kick off queries asynchronously, so my schemes are probably dead in the water then.

Vanadium
Jan 8, 2005

My data is in redshift already and I don't wanna gently caress with the setup in general too much. I guess it's gonna be a cronjob on some random host to do the queries and post the results to S3 and the lambda then just verifies that everything went ok.

Is there a standard way to hack up ssh or DNS so you can ssh to instances by instance ID or stuff like that, without having to do the lookups yourself by hand?
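One way to avoid the lookups, assuming the instances run the SSM agent and you have the Session Manager plugin installed locally, is the ProxyCommand snippet from the Session Manager docs, which makes any `i-*` hostname resolve through SSM instead of DNS:

```
# ~/.ssh/config: route ssh to raw instance IDs through SSM Session Manager.
# Requires the session-manager-plugin locally and the SSM agent on the instance.
Host i-* mi-*
    ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
```

After that, `ssh ec2-user@i-0123456789abcdef0` works with no public IP or manual describe-instances step. Without SSM, a ProxyCommand that shells out to `aws ec2 describe-instances` to resolve the ID to an address is the usual fallback.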

Vanadium
Jan 8, 2005

People here have been polling a json config file on S3, ymmv.

Vanadium
Jan 8, 2005

Y'all are a lot fancier than we are I guess.

Vanadium
Jan 8, 2005

some student is gonna snipe your bucket names

Vanadium
Jan 8, 2005

I don't think you can use an IAM user ARN in a role's trust relationship. You need to add the account itself (arn:aws:iam::YOURACCOUNTIDHERE:root) as a trusted entity, and then your AdministratorAccess policy can allow users in that account to assume the role.

Edit: maybe not, I guess I confused myself by trying to assume the role too quickly after changing the trust relationships, when the change takes a bunch of seconds to take effect.
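Either way, the account-root trust pattern is the commonly recommended one, since a user ARN baked into a trust policy breaks if that user is ever deleted and recreated. A sketch of the trust document as a plain dict (the account ID is a placeholder):

```python
import json

ACCOUNT_ID = "123456789012"  # placeholder

# Trust policy letting any principal in the account assume the role,
# provided their own IAM policy also allows sts:AssumeRole on it.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT_ID}:root"},
        "Action": "sts:AssumeRole",
    }],
}

# With boto3 this would go into something like:
#   iam.create_role(RoleName="my-role",
#                   AssumeRolePolicyDocument=json.dumps(trust_policy))
print(json.dumps(trust_policy, indent=2))
```

Note the double gate: the `:root` principal doesn't grant anything by itself, it just delegates the decision to the account's own IAM policies, which is where AdministratorAccess comes in.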

Vanadium fucked around with this message at 12:15 on Apr 6, 2018

Vanadium
Jan 8, 2005

I'm working on a thing running in AWS that runs a Redshift UNLOAD query to dump a bunch of stuff to some S3 location. The app runs under an instance profile or whatever, granting it the required S3 permissions but also a bunch of other permissions; the Redshift cluster itself doesn't have any relevant permissions.

My app can pass AWS credentials as part of the UNLOAD query, so it could just pass its own temporary credentials and everything would work, because Redshift can then do everything the app can. I'm a bit hesitant to do that, because inserting unnecessarily powerful credentials into the query text seems like a super bad idea in general.

Is there a way to take my app's temporary credentials and turn them into a new set of temporary credentials with further restrictions on allowed actions (and maybe duration), so that my app can delegate just a subset of its own S3 permissions? The STS AssumeRole action allows specifying a policy to further restrict what the resulting set of credentials can do, but that requires me to configure a role and teach my app to assume that particular role in addition to its own role, and I'd like to get out of that somehow. I'd also like to avoid attaching a role to the whole Redshift cluster, since then I'd have to think about how to make it so only my app can take advantage of that and not other users of that Redshift cluster.

Since AssumeRole and other STS actions exist, this seems tantalizingly possible, but it's apparently not actually supported by any STS calls for some reason. Am I misunderstanding the permissions model? Should I just get over it and configure a bunch of super narrow extra roles? Or should I just let the Redshift cluster do whatever it wants on my AWS account and not worry about the query texts with the credentials showing up in the wrong place at some point?
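For the record, the closest supported mechanism is the one mentioned above: AssumeRole with an inline session policy, where the resulting credentials get the *intersection* of the role's permissions and the passed-in document. A sketch with hypothetical role, bucket, and prefix names; the boto3 call is left commented since it needs real credentials:

```python
import json

# Session policy: the effective permissions of the resulting credentials
# are the intersection of the role's attached policies and this document,
# so even a broad role is narrowed to one write-only prefix.
scope_down = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:PutObject"],
        "Resource": "arn:aws:s3:::my-unload-bucket/unload/*",  # hypothetical
    }],
}

# import boto3
# creds = boto3.client("sts").assume_role(
#     RoleArn="arn:aws:iam::123456789012:role/unload-role",  # hypothetical
#     RoleSessionName="redshift-unload",
#     Policy=json.dumps(scope_down),
#     DurationSeconds=900,  # the minimum; short-lived on purpose
# )["Credentials"]
print(json.dumps(scope_down))
```

The catch is exactly the one described above: it scopes down a *role's* permissions, not an arbitrary set of existing temporary credentials, so a dedicated narrow role (even if it feels like boilerplate) is the supported shape.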

Vanadium
Jan 8, 2005

This past page is like the most positive I've ever seen someone be about CloudFormation. I'm used to people complaining about CloudFormation getting your stack into a weird state that you can't get out of without petitioning AWS support and about not knowing what exactly applying a change is going to do when there's multiple stacks involved.

Does having a loosely connected set of terraform configs referencing each other's outputs really get so unwieldy compared to CloudFormation? I thought that was kind of comparable to how you manage multiple stacks there, and generally a good idea to ~limit blast radius~.

I guess it's hard for me to conceptualize what a comprehensive, nontrivial AWS account configuration looks like, and I've had a hard time getting into CloudFormation because it seems like everybody who talks about using it also has their own DSL they recommend, which doesn't make things less confusing. :shobon:

Vanadium
Jan 8, 2005

Is there really no way to find your current Redshift limits for nodes total/per cluster short of trying to exceed them?

Vanadium
Jan 8, 2005

I thought they only just increased lambda from 5 to 15 minutes max?

Vanadium
Jan 8, 2005

PierreTheMime posted:

I posted in the Java thread but I suppose I’ll ask here too: does anyone have any experience with the AWS SDK? I’m rewriting some code to read a file from an S3 bucket (related to my question previously) but instantiating the client connection takes literally 2+ minutes from an EC2 instance with direct access and nigh-instant CLI connectivity.

The library is a little big but I can’t imagine it should take as long as it does. Once the connection is made it works but, slow as molasses. The same code parsing a file from local/network drives takes milliseconds.

Is there some weird network setup that makes the SDK's attempt to fetch AWS credentials from the EC2 instance metadata endpoint time out before it falls back to some other source of creds?

Vanadium
Jan 8, 2005

Yeah, that sounds normal. I figured it might time out and then fall back to something else, but that doesn't sound like it.

Can you configure debug logging for the SDK and see if there's anything weird going on? I've mostly used the golang SDK, but there I can just tell it to log every request it does and then figure out what's taking all the time.
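For reference, in boto3-land the equivalent knob is one line; the Java SDK has the same idea via its own logging config. A stdlib-only sketch (the boto3 call is in a comment since it needs the library installed):

```python
import logging

# Turn on wire-level logging for an AWS SDK client. With boto3 installed
# the shortcut is:
#   import boto3; boto3.set_stream_logger("botocore", logging.DEBUG)
# which amounts to configuring the botocore logger by hand:
logging.basicConfig(format="%(asctime)s %(name)s %(message)s")
logging.getLogger("botocore").setLevel(logging.DEBUG)

# Every request, retry, and credential-provider lookup then gets a
# timestamped line, so a multi-minute stall (e.g. an instance-metadata
# credential probe timing out) shows up as an obvious gap in the log.
```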

Vanadium
Jan 8, 2005

Why are you using the Cloudwatch Logs interface when you could be using Cloudwatch Logs Insights!!

Vanadium
Jan 8, 2005

Thanks Ants posted:

I'm trying to convince our developers to move to AWS Organisations and put SSO in, authing against Azure AD. They seem to think that SSO is less secure because if you're already logged into Azure AD then you don't have to put a password in again to use AWS (I've already done a conditional access demo), so actually having all the accounts separate is better. The other push back I'm getting is that because they are only a small team that their manager is happy to create and turn off accounts as required.

If we were big enough to have a security team I'd get them to have a word. I'm also a bit confused at a software developer actively pushing back against automating a process (new starter when in the right group automatically pops up in AWS ready to be assigned permissions) but I think a part of it is empire building.

Speaking as a developer who's repeatedly been suffering through petty annoyances because of process improvements, it's probably just something like the azure ad login form loading slightly more slowly than the aws signin.

Vanadium
Jan 8, 2005

Yes, that's how a coworker implemented their wedding's website. :toot:

Maybe you can get somewhere with an ALB instead of API Gateway too? idk.

Vanadium
Jan 8, 2005

iirc there was a shitpost a while back where someone built a URL shortener as a lambda by hardcoding all the URLs, and having the lambda update itself when a new URL was to be shortened, imo that's important prior art

Vanadium
Jan 8, 2005

SurgicalOntologist posted:

Pretty specific question but does anyone know if one of the media services supports this use case? Frankly it is a bit of a mess deciphering all the different services available.

Basically, we are generating 2-minute long mp4s every 2 minutes and want to send this to a pseudo-live stream. Ideally this would also get saved as VOD for the future (we already have VOD working with concatenating the files locally then sending to MediaConvert, but want to switch to a as-soon-as-each-chunk-is-ready model).

The best I can parse the docs so far is that the closest service is MediaLive. It can turn one mp4 into a stream but for multiple files I can't find any relevant use case. I'm wondering if I'm missing something or we need to locally convert the files into RTP pushes or another supported ingest format.

What are you looking to do with the resulting livestream? Is someone gonna watch it in real time? How long is it gonna be, is it 2 minute chunks forever and ever?

Amazon Interactive Video Service (twitch-as-a-service) has "live streams" and "VOD" but I'm not sure if piping an indefinite number of mp4 files into an RTMP stream is gonna be very convenient.

Vanadium
Jan 8, 2005

Maybe the people asking for security alerts instead of locked down accounts misunderstand your stance as "I'd simply not get hacked"?

Vanadium
Jan 8, 2005

Does the ALB have a public IP address or is it internal? Do the subnets have internet gateways or nat gateways and stuff like that?

Did you configure a listener on the ALB or only a target group?

Vanadium
Jan 8, 2005

can you skip all the nat gateway stuff by using public subnets and using security groups to keep people from talking to you?

Vanadium
Jan 8, 2005

Pretty sure the west coast people I know with principal in the title at AWS are making significantly more than $300k total comp, but they're also mostly SDEs and idk what this job family is.

Vanadium
Jan 8, 2005

You're not gonna regret learning how cloudformation works if you end up using CDK. :v:

Vanadium
Jan 8, 2005

Is it okay to use the default VPC or should I always make a fresh VPC in my terraform or whatever? I need to put a few lambdas into a VPC because they need to talk to a VPC Endpoint to talk to a thing running in another account and welp it's just a lot of boilerplate isn't it.

Vanadium
Jan 8, 2005

Eh, they're trying to cut down on handing out routable CIDR ranges. I'm pretty sure asking for a /16 would raise eyebrows, for an actual "gonna run a bunch of computey things" account I think they just gave me a /22 or /21 or something.

So far this account is mostly metrics and IAM roles and that sort of thing, and these couple lambdas that are periodically doing things to CloudWatch alarms, based on data from this service I'm wanting to call via privatelink. Right now it calls the legacy version of the thing via API Gateway but the shiny new version isn't going to have an API Gateway.

Having a tiny VPC just for these lambdas and nothing else seems least likely to cause conflicts down the line, and in the worst case I can spin up this stuff somewhere else and delete the VPC again. If they get a lot of scope creep the lambdas should probably move to a better platform than this pile of terraform, too, but I shouldn't invest a lot of time in that right now.

Vanadium
Jan 8, 2005

Having an extra lambda or two always felt especially Rube Goldberg, because it's an extra thing that has to wire into the CI/CD stuff instead of applying some terraform straight out of the git snapshot or w/e. Our setup may have grown a little janky.

But "SNS -> SQS -> Lambda" really looks like a single step to me. That's just how you make a lambda, it has some goop in front of it to glue it to the rest of the diagram, that doesn't really count as a separate service.

Vanadium
Jan 8, 2005

Another thing I'd treat as sensitive is internal/private S3 bucket names, because people frequently give them formulaic names and/or construct bucket names dynamically in code. If it gets out that your bucket is named my-cool-app-prod-us-west-2, some clown might create my-cool-app-prod-eu-west-1 or something and give you an unnecessary headache when you want to deploy your stuff there too.
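The risk comes from how predictable the construction is; names here are made up for illustration. One common mitigation is salting the name with something hard to guess from the outside, like the account ID:

```python
def bucket_name(app, stage, region):
    """Formulaic naming like this is convenient, but the S3 namespace is
    global: anyone who learns one name can predict, and pre-register,
    the siblings for regions you haven't deployed to yet."""
    return f"{app}-{stage}-{region}"

def salted_bucket_name(app, stage, region, account_id):
    """Same scheme, but including the account ID makes the names
    unguessable to anyone who doesn't already know the account."""
    return f"{app}-{stage}-{region}-{account_id}"

# bucket_name("my-cool-app", "prod", "us-west-2")
#   -> "my-cool-app-prod-us-west-2"  (and "eu-west-1" is an obvious squat)
```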

Maybe that's just me being grumpy about shared namespaces.

And account IDs aren't inherently sensitive, and it wouldn't come up in a demo, but if, e.g., you're interacting with your customers' AWS accounts somehow, you probably do want to be careful not to leak their account IDs to other parties, even if you don't mind your own showing up in public. That's just for normal privacy reasons, though, not security reasons.

I got laid off from my AWS heavy job a few months ago and do not see a lot of AWS in my future but this stuff is still swirling around my head send help.

Vanadium
Jan 8, 2005

Hughmoris posted:

Are you just tired of AWS, or moving onto greener pastures?

No shade on AWS, just that the places I'm looking at either don't use it directly or are not super cloud adjacent in the first place. I wouldn't mind doing more AWS and tbh I'd enjoy getting more perspectives on how other orgs set up their stuff.

Vanadium
Jan 8, 2005

Can you make it run aws sts get-caller-identity?

Vanadium
Jan 8, 2005

Xerxes17 posted:

In SSH, .ebextensions, or both?

Like in the context where you're not sure what role is being used. So, sure, both, why not. It's cheap and easy and doesn't even require any permissions!

Vanadium
Jan 8, 2005

That's wild, do they come with creds for a role in this account that you apparently don't own? Any chance it's some implementation-detail AWS-internal account leaking through?
