Docjowles
Apr 9, 2009

Vanadium posted:

I got laid off from my AWS heavy job a few months ago and do not see a lot of AWS in my future but this stuff is still swirling around my head send help.

Today I was at a park with my kids and started chatting with another dad. Asked what he does for work and he loving works for AWS. Even on the weekend there is no escaping the shadow of the cloud.

Sorry to hear about the layoff that sucks.


Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

necrobobsledder posted:

Am in security and am a Cloud Guy and this is part of the pentest suites for discovery out there, so yes being predictable is probably a Bad Idea. Also yes, consider "prod" to be similar to using "password" in your passwords - they are directly being used in word lists as top priority right next to dev and stage.

Aside:

Jesus Christ, in TYOOL 2023 the SA CoC subforum doesn't support Markdown formatted posts or if it does it's certainly not helping you find out.

Nah, this forum's significantly older than Markdown's rise to prominence, you're not missing anything.

vanity slug
Jul 20, 2010

Use UUIDs for your bucket names >:)
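If you go that route, here's a minimal sketch in Python; the prefix handling and the validation regex are illustrative, just approximating S3's documented naming constraints:

```python
import re
import uuid

def uuid_bucket_name(prefix=""):
    """Generate an unguessable S3 bucket name from a random UUID.

    S3 bucket names must be 3-63 characters: lowercase letters,
    digits, hyphens (and dots), starting and ending with a letter
    or digit.
    """
    name = f"{prefix}-{uuid.uuid4()}" if prefix else str(uuid.uuid4())
    name = name.lower()
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name):
        raise ValueError(f"invalid S3 bucket name: {name}")
    return name
```

Worth noting this only helps against wordlist-style discovery; bucket names still leak through other channels (certificates, error messages), so it's no substitute for proper access policies.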

Hed
Mar 31, 2004

Fun Shoe
We have an application set running on EKS that seems to take way too much infrastructure. My 1.0 version (non-kubernetes) ran on 1 t2.medium and now the re-architected version takes 5 t3.larges. The CPU usage is very small across these, though... most of our workflows spend their time waiting for I/O. There's no memory or disk I/O pressure.

The problem is IP addresses. If we try to run too many pods on our EKS we get a "can't schedule pod" message and EKS complains that it's out of IP addresses to allocate. Right now we are running 67 pods and going much higher would require a 6th instance in the cluster.

Where should I look to research this problem and allow for more multiplexing on the same instances? I found this article about optimization, but things like network overlays don't seem like the first thing I'd want to reach for in terms of complexity.

jaegerx
Sep 10, 2012

Maybe this post will get me on your ignore list!


Hed posted:

We have an application set running on EKS that seems to take way too much infrastructure. My 1.0 version (non-kubernetes) ran on 1 t2.medium and now the re-architected version takes 5 t3.larges. The CPU usage is very small across these, though... most of our workflows spend their time waiting for I/O. There's no memory or disk I/O pressure.

The problem is IP addresses. If we try to run too many pods on our EKS we get a "can't schedule pod" message and EKS complains that it's out of IP addresses to allocate. Right now we are running 67 pods and going much higher would require a 6th instance in the cluster.

Where should I look to research this problem and allow for more multiplexing on the same instances? I found this article about optimization, but things like network overlays don't seem like the first thing I'd want to reach for in terms of complexity.

You're running 67 pods and running out of IPs? Are you using IPAM? What's your CNI? How many nodes? The default for EKS, I think, is 150 pods per node, but you're nowhere near hitting that.

Xerxes17
Feb 17, 2011

I have a dumb and small question for a personal project. I have a site hosted on Elastic Beanstalk which I will update, but it also needs a file that's too big to package with the rest of it, so I've been SSHing in to copy it over manually. At first I did this with WinSCP, but lately I've been loading it from an S3 bucket. The problem is that I've also got an .ebextensions script that should do this automatically, and while it does run, the file always ends up being size 0.

Docjowles
Apr 9, 2009

If I had to guess the instance doesn’t have permission to read the object. If you run the script manually as the same user it normally runs under does it work? Are there logs you could inspect or anything in cloudtrail?

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

jaegerx posted:

You're running 67 pods and running out of IPs? Are you using IPAM? What's your CNI? How many nodes? The default for EKS, I think, is 150 pods per node, but you're nowhere near hitting that.

It depends on how many network interfaces the underlying instance has. If you have small workloads and small nodes, you can’t run all that many pods on them at once. At least when using the default CNI. A t3.small supports a grand total of 11 pods, including system daemonsets.
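For reference, the default VPC CNI's ceiling follows a published formula: max pods = ENIs × (IPv4 addresses per ENI − 1) + 2. A quick sketch; the per-type ENI/IP numbers come from AWS's instance-type tables and are worth double-checking against current docs:

```python
# Max pods with the default AWS VPC CNI: each ENI's primary IP is
# reserved for the node itself, and EKS adds 2 for host-network pods
# (aws-node, kube-proxy).
def max_pods(enis, ips_per_eni):
    return enis * (ips_per_eni - 1) + 2

# ENI / IPv4-per-ENI limits from AWS's instance-type tables
# (verify against current docs before relying on these):
LIMITS = {
    "t3.small": (3, 4),
    "t3.large": (3, 12),
}

for itype, (enis, ips) in LIMITS.items():
    print(itype, max_pods(enis, ips))
```

On Nitro instance types, the VPC CNI's prefix delegation mode (`ENABLE_PREFIX_DELEGATION=true`) hands out /28 prefixes per ENI slot instead of single IPs, which raises these ceilings dramatically without needing an overlay network.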

The Iron Rose fucked around with this message at 01:42 on Sep 13, 2023

Xerxes17
Feb 17, 2011

Docjowles posted:

If I had to guess the instance doesn’t have permission to read the object. If you run the script manually as the same user it normally runs under does it work? Are there logs you could inspect or anything in cloudtrail?

Correct, when I run it manually in SSH it works. AFAIK, it is something to do with the permissions but I've found it rather difficult to find answers on that. I don't think I have cloud trail logs setup.

ledge
Jun 10, 2003

Xerxes17 posted:

Correct, when I run it manually in SSH it works. AFAIK, it is something to do with the permissions but I've found it rather difficult to find answers on that. I don't think I have cloud trail logs setup.

Does the EC2 instance created by Elastic Beanstalk have a role associated to it that has a policy granting access to the S3 bucket?

BaseballPCHiker
Jan 16, 2006

Once again I am struggling with cross account permissions.

I'm trying to create a Cloudformation Template that could be deployed in all of our accounts that will detect root user logins via EventBridge and targets a central SNS topic in another account.

The central SNS topic has an access policy allowing principal AWS * to perform sns:Publish on the condition that the PrincipalOrgID matches our AWS organization ID. No problems there as far as I can tell.

The CFT I'm writing keeps failing with this error:
code:
Access to the resource blahblahXYZ is denied. Reason: Adding cross-account target is not permitted. (Service: AmazonCloudWatchEvents; Status Code: 400. Error Code: AccessDeniedException. Request ID: Whatever. Proxy: null)
So then I tell myself OK, I need to define a policy in my CFT to give EventBridge rights to publish. But if I do that I get:
code:
"User: arn whatever" is not authorized to perform SNS:SetTopicAttributes on resource blahblahXYZ because no resource-based policy allows the SNS:SetTopicAttributes action. (Service: Sns, Status Code: 400. Request ID: whatever. RequestToken: whatever. AccessDenied)
Except that I have another Sid within the SNS access policy that says allow principal AWS * to perform sns:GetTopicAttributes, SetTopicAttributes, AddPermission, RemovePermission, DeleteTopic, Subscribe, ListSubscriptionsByTopic, AND Publish.

I had thought this would be relatively straightforward. The idea was I could use this as a template and just update events that we wanted to alert on and publish to a central Org topic. But once again I am banging my head against the wall when it comes to cross-account access.

Am I missing something obvious or is there a better way to go about this?

crazypenguin
Mar 9, 2005
nothing witty here, move along
Fast, possibly useless thought, but the error message "Adding cross-account target is not permitted." does not strike me as an IAM policy problem, but as a limitation of what the service will accept.

12 rats tied together
Sep 7, 2006

haven't thought about it too hard but i want to note that when you're doing x-account access you usually can't rely on asterisk, you have to supply the account id of the other account

edit to include docs link: https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_principal.html

highly recommend everyone who touches AWS at work read the full iam documentation, especially this part, whenever they have a moment
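A sketch of what that looks like for the cross-account SNS topic policy being discussed above; the account IDs, org ID, and topic name are all placeholders:

```python
import json

# Placeholder account IDs, org ID, and topic name for illustration.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowPublishFromOrgAccount",
        "Effect": "Allow",
        # Name the other account explicitly rather than "AWS": "*":
        "Principal": {"AWS": "arn:aws:iam::222222222222:root"},
        "Action": "sns:Publish",
        "Resource": "arn:aws:sns:us-east-1:111111111111:central-alerts",
        # Defense in depth: only honor principals inside the org.
        "Condition": {
            "StringEquals": {"aws:PrincipalOrgID": "o-exampleorgid"}
        },
    }],
}

print(json.dumps(policy, indent=2))
```

Note the PrincipalOrgID condition only works for account/role principals like this one; for service principals (e.g. events.amazonaws.com) you'd scope with aws:SourceAccount or aws:SourceArn instead.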

Scrapez
Feb 27, 2004

Question regarding RDS Proxy. Can you connect from a public source through an RDS Proxy to an RDS instance? Or are they only to be used to connect from within AWS/VPC?

If they're only for connecting from within AWS, what is the best method for secure connection of a client to a public RDS instance? Is it simply just locking down the security group to the source IP and providing credentials or is there a better, more secure way of doing this?

API Gateway/Lambda perhaps?

Scrapez fucked around with this message at 04:33 on Sep 20, 2023

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:
Unlike both azure and GCP, aws does not have a clean solution to zero trust public access to RDS instances! You can sorta approximate it by using SSM port forwarding to a bastion host. which sucks and you also have to handle timeouts. There’s really not a great out of the box service, especially compared to azure cosmos db’s inherent identity proxy and the GCP CloudSQL auth proxy.

AWS RDS proxy is a replacement for pgbouncer and other similar connection pooling services. It doesn’t do anything with regards to networking.

Sticking public IPs in an allowlist is neither particularly scalable nor especially secure. I really wouldn’t want to do this without an authentication proxy in front of the service.

In general AWS sucks on this front. They have a competitor service to GCP’s IAP/Azure AD App Proxy, but it costs a ridiculously huge amount of money. Something up 24/7 will cost you thousands of dollars a month, minimum. GCP/Azure’s offerings here are free!
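For the SSM port-forwarding approach, here's a sketch of the documented `AWS-StartPortForwardingSessionToRemoteHost` invocation, wrapped in Python purely for illustration; the instance ID, endpoint, and ports are placeholders, and it assumes the SSM agent on the bastion plus the session-manager-plugin installed locally:

```python
import json

def ssm_port_forward_cmd(instance_id, rds_endpoint,
                         remote_port=5432, local_port=15432):
    """Build the `aws ssm start-session` invocation for port
    forwarding to an RDS endpoint through a bastion instance."""
    params = {
        "host": [rds_endpoint],
        "portNumber": [str(remote_port)],
        "localPortNumber": [str(local_port)],
    }
    return [
        "aws", "ssm", "start-session",
        "--target", instance_id,
        "--document-name", "AWS-StartPortForwardingSessionToRemoteHost",
        "--parameters", json.dumps(params),
    ]

print(" ".join(ssm_port_forward_cmd(
    "i-0123456789abcdef0",
    "mydb.example.us-east-1.rds.amazonaws.com")))
```

Once the session is up you point your SQL client at localhost on the local port; as noted above, you still have to deal with session timeouts yourself.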

The Iron Rose fucked around with this message at 07:02 on Sep 20, 2023

I would blow Dane Cook
Dec 26, 2008
Is anyone here using Azure CDN from Akamai? How are you handling its impending retirement (October 31)?

Spookydonut
Sep 13, 2010

"Hello alien thoughtbeasts! We murder children!"
~our children?~
"Not recently, no!"
~we cool bro~

The Iron Rose posted:

Unlike both azure and GCP, aws does not have a clean solution to zero trust public access to RDS instances! You can sorta approximate it by using SSM port forwarding to a bastion host. which sucks and you also have to handle timeouts. There’s really not a great out of the box service, especially compared to azure cosmos db’s inherent identity proxy and the GCP CloudSQL auth proxy.

AWS RDS proxy is a replacement for pgbouncer and other similar connection pooling services. It doesn’t do anything with regards to networking.

Sticking public IPs in an allowlist is neither particularly scalable nor especially secure. I really wouldn’t want to do this without an authentication proxy in front of the service.

In general AWS sucks on this front. They have a competitor service to GCP’s IAP/Azure AD App Proxy, but it costs a ridiculously huge amount of money. Something up 24/7 will cost you thousands of dollars a month, minimum. GCP/Azure’s offerings here are free!

StrongDM has a pretty good solution for RDS access; Teleport has a less good solution that requires IAM fuckery.

MightyBigMinus
Jan 26, 2020

I would blow Dane Cook posted:

Is anyone here using Azure CDN from Akamai? How are you handling its impending retirement (October 31)?
wow imagine how embarrassing this would be to admit

BaseballPCHiker
Jan 16, 2006

crazypenguin posted:

Fast possibly useless thought, but the error message "Adding cross-account target is not permitted." does not strike me as a IAM policy problem, but a limitation of what the service will accept.


12 rats tied together posted:

haven't thought about it too hard but i want to note that when you're doing x-account access you usually can't rely on asterisk, you have to supply the account id of the other account

edit to include docs link: https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_principal.html

highly recommend everyone who touches AWS at work read the full iam documentation, especially this part, whenever they have a moment

Definitely not an IAM problem! I'm starting to think cross-account SNS isn't possible without SQS? Got to look into it more.

Also a callback to several months ago when I asked about finding API calls around Marketplace subscriptions. They did finally release cloudtrail logging for this about two months ago! - https://docs.aws.amazon.com/marketplace/latest/userguide/logging-aws-marketplace-api-calls-with-aws-cloudtrail.html
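For what it's worth, that reading matches the documented workaround: an EventBridge rule can target an event bus in another account, just not that account's SNS topic directly, so the usual pattern is spoke rule to central bus, then a rule in the central account that targets SNS. A sketch of the spoke-account side (the bus name, account ID, and detail pattern are placeholders, and the role resource is elided; it needs `events:PutEvents` on the central bus):

```yaml
Resources:
  RootLoginRule:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        detail-type:
          - AWS Console Sign In via CloudTrail
        detail:
          userIdentity:
            type:
              - Root
      Targets:
        - Id: CentralBus
          # Placeholder central-account bus ARN:
          Arn: arn:aws:events:us-east-1:111111111111:event-bus/security-events
          # Cross-account bus targets require a role that can PutEvents
          # on the central bus (role resource not shown here):
          RoleArn: !GetAtt PutEventsRole.Arn
```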

I would blow Dane Cook
Dec 26, 2008

MightyBigMinus posted:

wow imagine how embarrassing this would be to admit

I don't get it?

Hed
Mar 31, 2004

Fun Shoe

The Iron Rose posted:

It depends on how many network interfaces the underlying instance has. If you have small workloads and small nodes, you can’t run all that many pods on them at once. At least when using the default CNI. A t3.small supports a grand total of 11 pods, including system daemonsets.

Is there a way to get more pods on the same hardware? Currently we are just buying the "most economical" from a cost-per-interface standpoint, but it's really crummy, as our instances sit idle almost all of the time. Should we switch to Nitro?

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

Hed posted:

Is there a way to get more pods on the same hardware? Currently we are just buying the "most economical" from a cost-per-interface standpoint, but it's really crummy, as our instances sit idle almost all of the time. Should we switch to Nitro?

It sounds purpose fit to solve your problems, yes!

Xerxes17
Feb 17, 2011

Docjowles posted:

If I had to guess the instance doesn’t have permission to read the object. If you run the script manually as the same user it normally runs under does it work? Are there logs you could inspect or anything in cloudtrail?

ledge posted:

Does the EC2 instance created by Elastic Beanstalk have a role associated to it that has a policy granting access to the S3 bucket?

Sorry for not replying sooner, as full-stack development is now my day job, my hobby project needs to get by with the scraps of dev-energy I have left over. :v: Alas, the logs don't tell me much beyond claiming that the ebextensions script ran successfully.

So when I run it manually via SSH as ec2-user, it works. And I thought I'd added the EB role to the S3 bucket and so on, but I guess not? How would I find out which role it is? Looking at the IAM console, there are 3 roles with a "last activity" matching the last time I deployed the service (cdk-hnb{etc, etc}). Would these be the ones to add to the S3 bucket policy instead of "RecipeAppIAMRole" or "RecipeBeanstalkServiceRole"? Do I need to add anything to the IAM roles themselves, not just the S3 bucket?

code:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::~~~RecipeAppIAMRole~~~",
                    "arn:aws:iam::~~~RecipeBeanstalkServiceRole~~~"
                ]
            },
            "Action": [
                "s3:ListBucket",
                "s3:ListBucketVersions",
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::~~~bucket~~~",
                "arn:aws:s3:::~~~bucket~~~/*",
                "arn:aws:s3:::~~~bucket~~~/~~~file~~~"
            ]
        },
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::~~~bucket~~~/~~~file~~~"
        }
    ]
}

Vanadium
Jan 8, 2005

Can you make it run aws sts get-caller-identity?

whats for dinner
Sep 25, 2006

IT TURN OUT METAL FOR DINNER!

My new job has me janitoring some EB stuff and I think it's because the stuff that runs from .ebextensions is run via cfn-init which has different default creds that are more tightly scoped to the CloudFormation stack (which is what Elastic Beanstalk is under the hood). It's kinda obliquely mentioned in the docs here and they show what you're supposed to do for S3 in the EB docs. Looks like you've got to create an auth method for cfn-init to use (in this case, use the instance profile for the instance) and then tell it to use that when downloading the file from S3.
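Concretely, the documented shape for that looks something like the following .ebextensions config; the bucket, object URL, destination path, and role name are placeholders for whatever the project actually uses:

```yaml
Resources:
  AWSEBAutoScalingGroup:
    Metadata:
      AWS::CloudFormation::Authentication:
        S3Auth:
          type: S3
          # Placeholder: whichever instance-profile role EB is using
          roleName: aws-elasticbeanstalk-ec2-role
          buckets:
            - my-bucket

files:
  # Placeholder destination path and object URL
  "/opt/app/bigfile.bin":
    mode: "000644"
    owner: root
    group: root
    source: https://my-bucket.s3.amazonaws.com/bigfile.bin
    authentication: S3Auth
```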

Xerxes17
Feb 17, 2011

Vanadium posted:

Can you make it run aws sts get-caller-identity?

In SSH, .ebextensions, or both?

I'll try that as well whatsfordinner

Vanadium
Jan 8, 2005

Xerxes17 posted:

In SSH, .ebextensions, or both?

Like in the context where you're not sure what role is being used. So, sure, both, why not. It's cheap and easy and doesn't even require any permissions!

Necronomicon
Jan 18, 2004

I've got a question about monitoring my ECS services and I'm hoping y'all can help, apologies if this is overly simple, I just don't have a huge amount of experience with alarms/monitoring.

I have three ECS services (a queue service, an admin portal, and a customer-facing portal) and a CloudWatch alarm that compares the number of running tasks to the desired number of tasks for each service. If the number of running tasks dips below the desired number for a certain amount of time, it notifies an SNS topic, and invokes a Lambda function that sends an alarm to a Slack channel. The alarms work fine for the two portals, but the queue service is giving me trouble. Whenever we deploy, the service will shut down its one task (since there can only be one at a time), and then redeploy and pick up queue items that piled up in the meantime. It normally takes about 5-10 minutes. So while my task will report that it stopped, it didn't technically "fail". I'm having trouble distinguishing between a "stopped" task (which is expected during the deploy) and a "failed" task.

There are probably some underlying architectural issues here but I'm being told they're not able to be changed and I have to just make this work. I'm using Container Insights and the RunningTaskCount metric, but I think I'm just looking at this from the wrong angle. Does anybody have any advice?

Edit: From research, I think I probably need to create an EventBridge rule, something like...
code:
{
  "source": ["aws.ecs"],
  "detail-type": ["ECS Task State Change"],
  "detail": {
    "clusterArn": "arn:aws:ecs:us-east-1:xxxxxxx:cluster/$cluster",
    "containers": [{
      "containerArn": "arn:aws:ecs:us-east-1:xxxxxxx:container/$container",
      "lastStatus": "RUNNING",
      "name": "test",
      "taskArn": "arn:aws:ecs:us-east-1:xxxxxxx:task/$task"
    }],
    "eventType": ["WARN", "ERROR"]
  }
}
...pointed at my SNS topic, where this particular cluster contains basically just the one service that I need the specific alarm for. I'm still poking around with this and will need to trigger some deliberate task failures to test.

Necronomicon fucked around with this message at 21:39 on Oct 10, 2023

Necronomicon
Jan 18, 2004

Answering my own question here in case it helps anyone. This is the EventBridge Rule I created to catch the event I was looking for:

code:
{
  "detail": {
    "group": ["service:$serviceName"],
    "lastStatus": ["STOPPED"],
    "stoppedReason": [{
      "anything-but": {
        "prefix": "Scaling activity initiated by (deployment"
      }
    }]
  },
  "detail-type": ["ECS Task State Change"],
  "source": ["aws.ecs"]
}
Shamelessly stolen from here.
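To make the semantics concrete, here's a toy re-implementation of that pattern's matching logic in Python. `service:$serviceName` is kept as the placeholder from the rule above, and this is purely an illustration, not EventBridge's actual matcher:

```python
DEPLOY_PREFIX = "Scaling activity initiated by (deployment"

def matches_rule(event):
    """Toy version of the EventBridge pattern above: STOPPED tasks for
    the service, except stops caused by a deployment's scaling activity
    (the "anything-but"/"prefix" clause)."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.ecs"
        and event.get("detail-type") == "ECS Task State Change"
        and detail.get("group") == "service:$serviceName"
        and detail.get("lastStatus") == "STOPPED"
        and not str(detail.get("stoppedReason", "")).startswith(DEPLOY_PREFIX)
    )
```

So a task stopped by the deploy is filtered out, while a genuine failure (any other stoppedReason) still fires the alarm.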

Scrapez
Feb 27, 2004

I'm looking to use Amazon EventBridge to kick off a monthly script that gathers data about an RDS instance data transfer via resource metrics. Everything I'm seeing says to have EventBridge kick off SSM to execute the shell script on a specific instance. I can do it this way but I don't like the idea of it being tied to a single instance.

Is there a different way to do this where I can have a monthly job that executes
code:
aws pi get-resource-metrics --service-type RDS --identifier db-XXXXXXXXXX --start-time $weekagodate --end-time $todaydate --period-in-seconds 86400 --metric-queries '[{"Metric": "os.network.tx.sum"  }]'
and then passes the output to SNS to be emailed?

Docjowles
Apr 9, 2009

Scrapez posted:

I'm looking to use Amazon EventBridge to kick off a monthly script that gathers data about an RDS instance data transfer via resource metrics. Everything I'm seeing says to have EventBridge kick off SSM to execute the shell script on a specific instance. I can do it this way but I don't like the idea of it being tied to a single instance.

Is there a different way to do this where I can have a monthly job that executes
code:
aws pi get-resource-metrics --service-type RDS --identifier db-XXXXXXXXXX --start-time $weekagodate --end-time $todaydate --period-in-seconds 86400 --metric-queries '[{"Metric": "os.network.tx.sum"  }]'
and then passes the output to SNS to be emailed?

I would put this into a Lambda or an ECS task and have EventBridge run that. Rather than having a whole EC2 instance just to run one command once a month

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Docjowles posted:

I would put this into a Lambda or an ECS task and have EventBridge run that. Rather than having a whole EC2 instance just to run one command once a month

Same, pass it to Lambda. If what's needed is a series of functions that takes longer than the 15-minute timeout, use Step Functions to run subfunctions of the same code. If it sounds tricky, find a YouTube video on Step Functions; they're pretty nice for "I want to do something daily/monthly but it takes an hour for all the steps". If you're over the size limit of a Lambda, you can use Docker containers up to 10GB with Lambdas.

If you use AWS SAM CLI to set up this workflow you can set up scheduling in the template.yaml file so it all "just works"
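A sketch of what that Lambda might look like. The Performance Insights and SNS clients are passed as parameters purely so the handler can be exercised without AWS creds (in a real handler you'd default them to `boto3.client("pi")` and `boto3.client("sns")`), and the topic ARN is a placeholder; the DB identifier is left elided as in the original command:

```python
import datetime
import json

def handler(event, context, pi_client, sns_client,
            db_id="db-XXXXXXXXXX",
            topic_arn="arn:aws:sns:us-east-1:111111111111:rds-report"):
    """Scheduled report: pull a week of os.network.tx.sum from
    Performance Insights and publish the result to SNS."""
    end = datetime.datetime.now(datetime.timezone.utc)
    start = end - datetime.timedelta(days=7)
    resp = pi_client.get_resource_metrics(
        ServiceType="RDS",
        Identifier=db_id,
        StartTime=start,
        EndTime=end,
        PeriodInSeconds=86400,
        MetricQueries=[{"Metric": "os.network.tx.sum"}],
    )
    sns_client.publish(
        TopicArn=topic_arn,
        Subject="RDS data transfer report",
        Message=json.dumps(resp["MetricList"], default=str),
    )
    return resp["MetricList"]
```

Wire an EventBridge schedule rule (or a SAM `Schedule` event, per the post above) at the Lambda and the whole thing runs without any EC2 instance.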

Scrapez
Feb 27, 2004

Thank you, guys. Lambda was definitely the best fit here.

Scrapez
Feb 27, 2004

Aurora Postgresql

I'm trying to determine if there's a way, either with a built in AWS mechanism like Performance Insights or with something built into Postgresql, that will allow me to determine the total data transfer for a user within a given day.

The database has a DMS task that replicates data into it from a different database. There's a client that then connects to the database once per day to pull down new data for the previous 24 hours. I've tried using the instance Performance Insights to determine how much data transfer is associated with the queries they perform daily but the problem is that it seems to show transfer for everything the instance does which would include DMS, etc. I tried establishing a baseline where the client did not make queries against the database for a few days but the size of data transfer still fluctuated.

I've read about pg_stat_statements for Postgresql but it's still unclear to me whether I could extract size of data queried from that. It seems to only be related to number of queries, cost of queries, optimizing speed, etc.

Not sure how best to figure this out.

Xerxes17
Feb 17, 2011

Vanadium posted:

Like in the context where you're not sure what role is being used. So, sure, both, why not. It's cheap and easy and doesn't even require any permissions!

I finally got around to trying this in SSH and I found that it was being run by an 'assumed-role', which after I added it to the S3 permission config now seemingly allows it to work. :toot:

KS
Jun 10, 2003
Outrageous Lumpwad
Was happy to find the re:Invent talks get posted on YouTube, so posting the best one I attended this year:

https://www.youtube.com/watch?v=CowAYv3qNCs

3 very senior AWS folks talk about how to make apps more resilient, based on their experiences running core AWS services.

The Fool
Oct 16, 2003


How much of that is Amazon-specific vs. concepts that can translate to other services?

KS
Jun 10, 2003
Outrageous Lumpwad
Nothing Amazon specific there, just good practices for distributed apps.

BaseballPCHiker
Jan 16, 2006

Got a weird one today I've never seen.

A bunch of WorkSpaces instances are showing an account number via IMDS that is nowhere to be found in my AWS org. Like, I can view their VPC IDs and confirm they're in the right account, but the host itself reports a different account. Really throwing off inventory for me.


Vanadium
Jan 8, 2005

That's wild, do they come with creds for a role in this account that you apparently don't own? Any chance it's some implementation-detail AWS-internal account leaking through?
