mondomole
Jun 16, 2023

Falcon2001 posted:

Yeah, for the most part your EC2 instance getting ransomware'd isn't Amazon's problem, that's your problem. But uh...it is a problem. Go find some particularly juicy ransomware stories and start trotting it out every time someone balks at migration plans.

I found a better stick. Here's an article by a law firm claiming that running EOL Windows may violate Article 32 of the General Data Protection Regulation (GDPR). Ransomware is probably theoretical / low probability enough that the C suite may end up ignoring it, but GDPR violations have heavy fines and they are enforced relatively quickly. Plus, once you've mentioned GDPR compliance in writing, they probably don't want to be dismissing that in writing either... That said, the argument that EOL Windows should violate GDPR feels a bit weak to me, but I'm also not a lawyer.

The juicy fines stuff:

quote:

For especially severe violations, listed in Art. 83(5) GDPR, the fine framework can be up to 20 million euros, or in the case of an undertaking, up to 4 % of their total global turnover of the preceding fiscal year, whichever is higher. But even the catalogue of less severe violations in Art. 83(4) GDPR sets forth fines of up to 10 million euros, or, in the case of an undertaking, up to 2% of its entire global turnover of the preceding fiscal year, whichever is higher.
https://gdpr-info.eu/issues/fines-penalties/
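
The "whichever is higher" rule in the quote is easy to put in front of an exec as arithmetic. A minimal sketch, assuming the company counts as an undertaking and that turnover is given in whole euros; the function name and the example turnover figures are made up for illustration, and this is not legal advice:

```python
def gdpr_fine_cap(annual_turnover_eur: int, severe: bool = True) -> int:
    """Upper bound of the fine framework quoted above, in euros.

    severe=True  -> Art. 83(5): EUR 20M or 4% of global turnover
    severe=False -> Art. 83(4): EUR 10M or 2% of global turnover
    Integer math avoids float rounding on large turnover figures.
    """
    fixed_cap, percent = (20_000_000, 4) if severe else (10_000_000, 2)
    return max(fixed_cap, annual_turnover_eur * percent // 100)


# Hypothetical company with EUR 2bn turnover: the percentage wins.
print(gdpr_fine_cap(2_000_000_000))
# Hypothetical company with EUR 100M turnover, lesser tier: the fixed cap wins.
print(gdpr_fine_cap(100_000_000, severe=False))
```

These are ceilings, not predictions; actual fines depend on the regulator and the facts of the case.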


Internet Explorer
Jun 1, 2005





Also check out FTC vs. Wyndham. I've been able to use that to put the fear of god into execs quite a bit in my travels.

mondomole
Jun 16, 2023

Internet Explorer posted:

Also check out FTC vs. Wyndham. I've been able to use that to put the fear of god into execs quite a bit in my travels.

Oddly enough, the followup settlement had no fines: https://www.ftc.gov/news-events/new...nformation-risk

quote:

Under the terms of the settlement, the company will establish a comprehensive information security program designed to protect cardholder data – including payment card numbers, names and expiration dates. In addition, the company is required to conduct annual information security audits and maintain safeguards in connections to its franchisees’ servers.

I'm shocked they essentially got away with not having this to begin with.

Internet Explorer
Jun 1, 2005





Agreed, pretty amazing. But being bound to the settlement for 20 years isn't something most businesses are going to want to deal with.

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

mondomole posted:

I found a better stick. Here's an article by a law firm claiming that running EOL Windows may violate Article 32 of the General Data Protection Regulation (GDPR). Ransomware is probably theoretical / low probability enough that the C suite may end up ignoring it, but GDPR violations have heavy fines and they are enforced relatively quickly. Plus, once you've mentioned GDPR compliance in writing, they probably don't want to be dismissing that in writing either... That said, the argument that EOL Windows should violate GDPR feels a bit weak to me, but I'm also not a lawyer.

The juicy fines stuff:

https://gdpr-info.eu/issues/fines-penalties/

Hell yeah, get 'em. Lord knows we need more good reasons for businesses to actually keep up with poo poo.

Docjowles
Apr 9, 2009

I am in no way advocating this, but a lot of companies just give the bare minimum lip service to security and privacy, because the cost of actually doing it right is more than just paying a fine and eating some bad PR if/when a major breach occurs. I mean it’s not like Equifax or Target went down when they had massive incidents a while back.

Thankfully legislation like GDPR actually has some teeth and isn’t just comically fining a company the size of Google like $100k.

Docjowles fucked around with this message at 00:58 on Jun 18, 2023

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Docjowles posted:

I am in no way advocating this, but a lot of companies just give the bare minimum lip service to security and privacy, because the cost of actually doing it right is more than just paying a fine and eating some bad PR if/when a major breach occurs. I mean it’s not like Equifax or Target went down when they had massive incidents a while back.

Thankfully legislation like GDPR actually has some teeth and isn’t just comically fining a company the size of Google like $100k.

Not to get off topic too hard, but the part about this that irritates me the most is how much this fucks with the ability for teams to plan, because ignoring stuff like this just turns into a crisis later on that suddenly comes into your office, sweeps everything off your desk onto the floor, and takes a poo poo on your desk.

jiffypop45
Dec 30, 2011

Is there a recommended way to execute intense/long running queries against an Aurora RDS cluster without impacting the cluster itself? When we did self hosted MySQL on EC2 we just booted a new box in the same vpc but just left it out of the load balancer.

BaseballPCHiker
Jan 16, 2006

Thanks for all of the tips and links everyone!

I was saddened to learn that we could just move these instances over to Azure and apparently keep running old poo poo for as long as our hearts desired. For now I'm keeping this info to myself while hoping that people get their poo poo together. I've been in IT long enough to know better, but have also gotten much better at just getting a CYA email and letting this poo poo go when my shift is over for the day.

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS

jiffypop45 posted:

Is there a recommended way to execute intense/long running queries against an Aurora RDS cluster without impacting the cluster itself? When we did self hosted MySQL on EC2 we just booted a new box in the same vpc but just left it out of the load balancer.

Create a read replica and then run the query there
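
In Aurora terms, a read replica is one more DB instance attached to the existing cluster, which comes up as a reader on the shared storage volume. A minimal sketch with boto3, assuming an Aurora MySQL cluster; the identifiers, the instance class, and the helper names are placeholders, not anything from the posts above:

```python
def reader_instance_params(cluster_id: str, instance_id: str,
                           instance_class: str = "db.r6g.large",
                           engine: str = "aurora-mysql") -> dict:
    """Build the arguments for rds.create_db_instance.

    Adding an instance to a cluster that already has a writer
    creates an Aurora Replica (reader).
    """
    return {
        "DBClusterIdentifier": cluster_id,
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": instance_class,
        "Engine": engine,
    }


def add_reader(params: dict) -> dict:
    """Create the reader. boto3 is imported lazily so the helper above
    can be used and tested without AWS libraries or credentials."""
    import boto3
    return boto3.client("rds").create_db_instance(**params)


# Hypothetical names for an ad hoc reporting reader.
params = reader_instance_params("prod-cluster", "prod-cluster-adhoc")
print(params)
```

To mirror the old "leave it out of the load balancer" setup, point the heavy query at that instance's own endpoint rather than the cluster's reader endpoint, since the reader endpoint spreads connections across all replicas. Delete the instance when the query is done.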

jiffypop45
Dec 30, 2011

Blinkz0rz posted:

Create a read replica and then run the query there

I realized after reading the docs how trivial my question is. Thanks!

BaseballPCHiker
Jan 16, 2006

Has anyone played around with EIC much yet? https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-ec2-instance-connect-ssh-rdp-public-ip-address/

Thinking that, aside from some DB work that teams in my org are doing, this would be a great way to cut down on bastion hosts.

Hed
Mar 31, 2004

Fun Shoe
I see that SES can receive emails, has anyone used these as a forwarder too?

I interact with some companies that can only send us reports via email. I'd love to give them a statements@domain.com that kicks off a Lambda / posts the attachment to S3, but in the meantime (while transitioning) I'd like to have it forward to the usual recipients. I guess I could fire a lambda that does an SES forward, but it's starting to sound Rube Goldbergian at that point.

Arzakon
Nov 24, 2002

"I hereby retire from Mafia"
Please turbo me if you catch me in a game.

Hed posted:

I see that SES can receive emails, has anyone used these as a forwarder too?

I interact with some companies that can only send us reports via email. I'd love to give them a statements@domain.com that kicks off a Lambda / posts the attachment to S3, but in the meantime (while transitioning) I'd like to have it forward to the usual recipients. I guess I could fire a lambda that does an SES forward, but it's starting to sound Rube Goldbergian at that point.

Lambda to trigger a forward isn't Rube Goldbergian. My team project to take an incoming SES e-mail and pass it through SNS to SQS to Lambda to parse the subject line and run that as a query against an RDS database and drop the output into an S3 object was Rube Goldbergian. Mostly because it was part of a joke team building exercise to build a Rube Goldberg machine using as many AWS services as possible (back when there weren’t very many services).
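
For reference, the non-joke version is short. A minimal sketch, assuming an SES receipt rule that first stores the raw message in S3 (keyed by message ID, no prefix) and then invokes the Lambda, and assuming the forwarding address is a verified SES identity. The bucket name, addresses, and function names are placeholders:

```python
from email import message_from_bytes, policy

BUCKET = "example-inbound-mail"            # placeholder bucket name
FORWARD_FROM = "statements@domain.com"     # must be verified in SES
FORWARD_TO = ["alice@domain.com", "bob@domain.com"]  # placeholder recipients


def rewrite_for_forwarding(raw: bytes, forward_from: str,
                           forward_to: list[str]) -> bytes:
    """Re-address a raw email so SES will accept it for sending.

    SES rejects mail whose From is not a verified identity, so the
    original sender is moved to Reply-To and From is replaced.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    reply_to = msg["Reply-To"] or msg["From"]
    for header in ("From", "To", "Cc", "Reply-To", "Return-Path",
                   "Sender", "DKIM-Signature"):
        del msg[header]  # no error if the header is absent
    msg["From"] = forward_from
    msg["To"] = ", ".join(forward_to)
    if reply_to:
        msg["Reply-To"] = str(reply_to)
    return msg.as_bytes()


def handler(event, context):
    """Lambda entry point; boto3 is imported lazily so the rewrite
    function can be tested locally without AWS libraries."""
    import boto3
    message_id = event["Records"][0]["ses"]["mail"]["messageId"]
    raw = boto3.client("s3").get_object(Bucket=BUCKET, Key=message_id)["Body"].read()
    boto3.client("ses").send_raw_email(
        Source=FORWARD_FROM,
        Destinations=FORWARD_TO,
        RawMessage={"Data": rewrite_for_forwarding(raw, FORWARD_FROM, FORWARD_TO)},
    )
```

The S3 hop is there because the event handed to the Lambda carries headers and metadata but not the message body. Since the raw message already sits in S3, the later "post the attachment to S3" step can reuse the same object.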

Docjowles
Apr 9, 2009

Arzakon posted:

Lambda to trigger a forward isn't Rube Goldbergian. My team project to take an incoming SES e-mail and pass it through SNS to SQS to Lambda to parse the subject line and run that as a query against an RDS database and drop the output into an S3 object was Rube Goldbergian. Mostly because it was part of a joke team building exercise to build a Rube Goldberg machine using as many AWS services as possible (back when there weren’t very many services).

And then those jokes get published as official AWS Solutions. I swear some of those architecture diagrams have 50 unique shapes on them

ledge
Jun 10, 2003

Arzakon posted:

Lambda to trigger a forward isn't Rube Goldbergian. My team project to take an incoming SES e-mail and pass it through SNS to SQS to Lambda to parse the subject line and run that as a query against an RDS database and drop the output into an S3 object was Rube Goldbergian. Mostly because it was part of a joke team building exercise to build a Rube Goldberg machine using as many AWS services as possible (back when there weren’t very many services).

Two steps is not Rube Goldberg. Rube Goldberg is:
file arrives in S3,
triggers lambda to process file line by line into SQS queue, with a dummy entry to indicate the end of the file
SQS sends to Lambda which loads entries into one of two DynamoDBs (one active, one empty) based on environment variables in the lambda,
When the dummy eof entry is received call another lambda
this Lambda updates the environment variables on loading lambda and reading lambda and itself about what DynamoDB to target and then deletes and recreates what was previously the active dynamoDB table.

Which I have to do as loading into DynamoDB is slow as poo poo and the file is big enough to take over 15 minutes so I can't do it all in a single lambda.

Vanadium
Jan 8, 2005

Having an extra lambda or two always felt especially rube goldberg because it's like an extra thing that has to wire into the CI/CD stuff instead of applying some terraform straight out of the git snapshot or w/e. Our setup may have grown a little janky.

But "SNS -> SQS -> Lambda" really looks like a single step to me. That's just how you make a lambda, it has some goop in front of it to glue it to the rest of the diagram, that doesn't really count as a separate service.

Startyde
Apr 19, 2007

come post with us, forever and ever and ever

ledge posted:

Two steps is not Rube Goldberg. Rube Goldberg is:
file arrives in S3,
triggers lambda to process file line by line into SQS queue, with a dummy entry to indicate the end of the file
SQS sends to Lambda which loads entries into one of two DynamoDBs (one active, one empty) based on environment variables in the lambda,
When the dummy eof entry is received call another lambda
this Lambda updates the environment variables on loading lambda and reading lambda and itself about what DynamoDB to target and then deletes and recreates what was previously the active dynamoDB table.

Which I have to do as loading into DynamoDB is slow as poo poo and the file is big enough to take over 15 minutes so I can't do it all in a single lambda.

I know Batch is a bad word in lots of shops but this sounds like something I'd throw at batch if my org didn't like naked ec2s getting spun up, depending on frequency. At least a step function to avoid lambdas calling lambdas manipulating themselves.

Ajaxify
May 6, 2009

ledge posted:

Two steps is not Rube Goldberg. Rube Goldberg is:
file arrives in S3,
triggers lambda to process file line by line into SQS queue, with a dummy entry to indicate the end of the file
SQS sends to Lambda which loads entries into one of two DynamoDBs (one active, one empty) based on environment variables in the lambda,
When the dummy eof entry is received call another lambda
this Lambda updates the environment variables on loading lambda and reading lambda and itself about what DynamoDB to target and then deletes and recreates what was previously the active dynamoDB table.

Which I have to do as loading into DynamoDB is slow as poo poo and the file is big enough to take over 15 minutes so I can't do it all in a single lambda.

Hope you're using a FIFO queue

ledge
Jun 10, 2003

Startyde posted:

I know Batch is a bad word in lots of shops but this sounds like something I'd throw at batch if my org didn't like naked ec2s getting spun up, depending on frequency. At least a step function to avoid lambdas calling lambdas manipulating themselves.

The lambda manipulation has to happen anyway as I am swapping between two tables to avoid any downtime, and the call to look up against the table comes from Connect. The load only happens once a day. That said, I might try making a multithreaded lambda to load into DynamoDB and see if I can get rid of the SQS part.

Ajaxify posted:

Hope you're using a FIFO queue
:)
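
A multithreaded loader along the lines ledge mentions could look roughly like this. It is a sketch under assumptions, not ledge's code: the file is already parsed into item dicts, the worker count is a guess, and the table's write capacity (not the threads) may end up being the real ceiling. All function names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

BATCH_SIZE = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def batched(items, size=BATCH_SIZE):
    """Yield consecutive lists of up to `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def load_parallel(items, write_batch, workers=8):
    """Write chunks concurrently; returns the total number of items written.

    `write_batch` takes one chunk and returns how many items it wrote,
    so a fake writer can be swapped in for local testing.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(write_batch, batched(items)))


def make_dynamodb_writer(table_name):
    """Build a writer for a real table (requires boto3 and credentials)."""
    def write_batch(chunk):
        import boto3
        # Sessions and resources are not thread-safe, so each call builds its own.
        table = boto3.session.Session().resource("dynamodb").Table(table_name)
        with table.batch_writer() as writer:  # batches and retries unprocessed items
            for item in chunk:
                writer.put_item(Item=item)
        return len(chunk)
    return write_batch


# Dry run with a fake writer that only counts.
print(load_parallel([{"id": i} for i in range(60)], len))
```

Because the loader knows when the last chunk has finished, the end-of-file sentinel (and the ordering question that comes with it) goes away along with the queue.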

dads friend steve
Dec 24, 2004

Startyde posted:

I know Batch is a bad word in lots of shops

I didn’t know this. Any insights as to why? I’ve got a team looking to run, effectively, batch processing jobs and I’m sorely tempted to recommend they throw away the custom job allocation code they’re writing and just switch to Batch. Are there some catches or some other reason it doesn’t live up to how AWS portrays it in their doc?

Startyde
Apr 19, 2007

come post with us, forever and ever and ever
Latency on job begin. It’s such a stupid reason for how easy the service is to hold in your head, interact with the SDK, and build for BUT people get frustrated to the point of insanity that there can be minutes between job add and job begin. Crom help you if the service has to instantiate a new compute env, could be ten whole minutes.
I wish I were joking but it’s been a sticking point more than a few times. Use it! It’s actually a great choice if you don’t already have institutional knowledge of the step function DSL.

jiffypop45
Dec 30, 2011

Does anyone have resources for testing and upgrading from AL1 to AL23?

I can also ask in the Linux thread but I'm confused as to why the internet seems really lacking on this subject. I get AWS would want you to think it just works but that feels naive.

Docjowles
Apr 9, 2009

Are you talking about like building a new environment on AL2023 and what pitfalls you will encounter deploying your app that has been running on AL1 to it? Or trying to upgrade in place via yum or something? AL1 is ancient (in cloud terms), I would be surprised if the latter is even possible. I think AWS would tell you to build new AMIs and new hosts and cut over. AWS is not pet friendly.

jiffypop45
Dec 30, 2011

Docjowles posted:

Are you talking about like building a new environment on AL2023 and what pitfalls you will encounter deploying your app that has been running on AL1 to it? Or trying to upgrade in place via yum or something? AL1 is ancient (in cloud terms), I would be surprised if the latter is even possible. I think AWS would tell you to build new AMIs and new hosts and cut over. AWS is not pet friendly.

This would be deploying existing code to new al23 hosts.

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug
Honestly best way might just be deploy it and see what happens.

Hed
Mar 31, 2004

Fun Shoe
We have an Application Load Balancer that is ingress for a k8s application. Most of the time it works really well but occasionally it just times out. Running curl -v https://app.com doesn’t even negotiate TLS and then times out. Once it works it seems to be sticky. Intermittent so it’s hard to debug.
Looking at the health checks for the app it seems fine.

What should I be looking at to debug this? Shouldn’t the ALB negotiate TLS with a client first or is it “smart” and making sure the app is in a good state.

whats for dinner
Sep 25, 2006

IT TURN OUT METAL FOR DINNER!

Hed posted:

We have an Application Load Balancer that is ingress for a k8s application. Most of the time it works really well but occasionally it just times out. Running curl -v https://app.com doesn’t even negotiate TLS and then times out. Once it works it seems to be sticky. Intermittent so it’s hard to debug.
Looking at the health checks for the app it seems fine.

What should I be looking at to debug this? Shouldn’t the ALB negotiate TLS with a client first or is it “smart” and making sure the app is in a good state.

I'd start by looking at CloudWatch metrics and comparing load balancer 504s vs. backend 504s. That'll tell you pretty quick where AWS thinks the problem is, at least. If you're not even seeing the failed requests hit the load balancer, then you'll want to have a look at VPC flow logs to see if you get spikes of packets being rejected (which generally points toward something like a security group rule being deleted and recreated, with requests coming in while the rule is absent).
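
A minimal sketch of pulling those two series with boto3. The load balancer dimension value is a placeholder (it is the trailing part of the ALB's ARN), and note one assumption: as far as I know the target-side metric is only published per status class, so the backend side here is all 5XX rather than 504 specifically:

```python
from datetime import datetime, timedelta, timezone


def alb_504_queries(load_balancer: str, period: int = 60) -> list[dict]:
    """Build MetricDataQueries comparing ALB-generated 504s with target 5XXs."""
    def query(query_id: str, metric_name: str) -> dict:
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
                },
                "Period": period,
                "Stat": "Sum",
            },
        }
    return [
        query("elb_504", "HTTPCode_ELB_504_Count"),        # generated by the ALB itself
        query("target_5xx", "HTTPCode_Target_5XX_Count"),  # returned by the backends
    ]


def fetch(load_balancer: str, minutes: int = 60) -> dict:
    """Fetch the last `minutes` of data (requires boto3 and credentials)."""
    import boto3
    end = datetime.now(timezone.utc)
    return boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=alb_504_queries(load_balancer),
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
    )


# Placeholder dimension value for illustration.
for q in alb_504_queries("app/my-alb/0123456789abcdef"):
    print(q["Id"], q["MetricStat"]["Metric"]["MetricName"])
```

If both series stay flat while clients time out, the requests never reached the load balancer, which is the cue to move on to flow logs.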

Love Stole the Day
Nov 4, 2012
Please give me free quality professional advice so I can be a baby about it and insult you

Hed posted:

We have an Application Load Balancer that is ingress for a k8s application. Most of the time it works really well but occasionally it just times out. Running curl -v https://app.com doesn’t even negotiate TLS and then times out. Once it works it seems to be sticky. Intermittent so it’s hard to debug.
Looking at the health checks for the app it seems fine.

What should I be looking at to debug this? Shouldn’t the ALB negotiate TLS with a client first or is it “smart” and making sure the app is in a good state.

When I saw this happen for my NLB, the root cause was that the IP/CIDRs of the VPC subnet it was routing to were not correct for whatever reason.

Presumably, you have multiple subnets to which the ALB forwards, right? So, because it's intermittent... maybe one of them isn't working, and the timeouts are when the routing goes to the bad VPC subnet IP/CIDRs.

edit: since the ALB is supposed to terminate TLS, you are most likely routing to the ALB through some different VPC. I wonder if their VPC mapping rules' CIDR blocks are properly configured? Maybe one of their mappings isn't going to your ALB exactly, because of a bad CIDR block or whatever.

Love Stole the Day fucked around with this message at 20:16 on Jun 28, 2023

Docjowles
Apr 9, 2009

This is a total shot in the dark and probably not your issue, but we were running a very high traffic service behind ALB and dealt with the same thing where requests would randomly be slow or time out entirely at the load balancer even if the backing service was totally healthy. A couple findings

1. If the ALB has to scale up or down there can be brief periods of refusing connections. This was confirmed by AWS support. Not really anything you can do here. Their recommendation was "make sure clients implement retries" which doesn't really help if your client is Safari on some dude's iPhone.

2. Switching from round robin to Least Outstanding Requests was a massive performance win for our specific application. So try that maybe?

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

Hed posted:

We have an Application Load Balancer that is ingress for a k8s application. Most of the time it works really well but occasionally it just times out. Running curl -v https://app.com doesn’t even negotiate TLS and then times out. Once it works it seems to be sticky. Intermittent so it’s hard to debug.
Looking at the health checks for the app it seems fine.

What should I be looking at to debug this? Shouldn’t the ALB negotiate TLS with a client first or is it “smart” and making sure the app is in a good state.


First things first, double, triple, and then quadruple check your route tables, subnets, and firewall rules, and open an AWS support case.

Once you’ve done that, you need to start collecting data, which starts with “what % of requests fail vs succeed, and what is the customer/business impact of this failure rate?” If it’s negligible, it may be worth tossing it into the backlog graveyard. If it causes issues that cannot be auto recovered, or causes SLO breaches, then you need to invest the time required to collect more information.

If this is exhibited with curl and within your client applications, you’re unlikely to see issues solely on the client side, but that doesn’t mean request properties don’t influence the overall result. You need to start instrumenting both your calling service (whether that’s an API client, a mobile application, or a JS frontend) and your upstream k8s receiving service with telemetry events that tell you what combination of properties you can associate with your failing requests. For example, are your failing requests evenly distributed among geographic origins? Source subnets? Evenly distributed among destination handler routes? What about user agents, unique IDs, variable values? What combinations of feature flags are enabled? If this is sticky across the duration of a TCP session, what happens when you establish a new session within a process lifetime, and do you measure this information in your client? Do you see greater failure rates when your ALB is distributing requests to backend pods in subnet A vs subnet B? What about when your requests come from subnet A vs B? What HTTP methods are most commonly used and do the proportions differ between failed and successful calls? If you find that it’s mostly POST/PUT requests, what’s the size of your request body in bytes? Have you instrumented your nginx ingress controller with the available open source tracing plugins? How many HTTP connections did your service have open during the trailing one second period prior to the failed request?

Ultimately you need to identify interesting dimensions and find differences in those dimensions between successful and failed requests. If you are not able to see something obvious from the network or config layers, you need to proceed from first principles:

1. Think about what you are trying to understand
2. Visualize (and generate) telemetry data to find relevant performance anomalies
3. Search for common dimensions within your anomalous areas by grouping and filtering event attributes
4. Have you isolated likely dimensions that indicate possible sources of your anomalous behaviour? If not, repeat.
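
Step 3 of the loop above, grouping by attribute and comparing failure rates, can be prototyped before any tracing backend is in place. A minimal sketch, assuming each request has been logged as a dict with an "ok" flag plus whatever attributes were captured; the event shape, attribute names, and function names are invented for illustration:

```python
from collections import defaultdict


def failure_rate_by(events, dimension):
    """Map each value of `dimension` to the fraction of events that failed."""
    counts = defaultdict(lambda: [0, 0])  # value -> [failed, total]
    for event in events:
        bucket = counts[event.get(dimension)]
        bucket[0] += 0 if event["ok"] else 1
        bucket[1] += 1
    return {value: failed / total for value, (failed, total) in counts.items()}


def rank_dimensions(events, dimensions):
    """Order dimensions by how much the failure rate varies across their values.

    A large spread (max rate minus min rate) marks a dimension worth
    drilling into; a spread near zero suggests it is unrelated.
    """
    spreads = {}
    for dimension in dimensions:
        rates = failure_rate_by(events, dimension)
        spreads[dimension] = max(rates.values()) - min(rates.values()) if rates else 0.0
    return sorted(dimensions, key=lambda d: spreads[d], reverse=True)


# Invented sample: failures cluster on one target subnet, not on the HTTP method.
events = [
    {"ok": True, "target_subnet": "a", "method": "GET"},
    {"ok": True, "target_subnet": "a", "method": "POST"},
    {"ok": False, "target_subnet": "b", "method": "GET"},
    {"ok": False, "target_subnet": "b", "method": "POST"},
]
print(failure_rate_by(events, "target_subnet"))
print(rank_dimensions(events, ["method", "target_subnet"]))
```

With real traffic the sample sizes per value matter; a dimension with three events in one bucket will show a dramatic spread that means nothing.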

Now would be a good time to look into OpenTelemetry if you haven’t yet. You can get a lot of the above information from automatic instrumentation libraries and open source visualization backends like Jaeger.

Even if it is a network routing issue, or an ALB implementation issue, collecting this telemetry information is going to give you the essential data you need to work with your cloud provider’s support services, and will pay extraordinary dividends in the future.

The Iron Rose fucked around with this message at 06:18 on Jun 29, 2023

Hed
Mar 31, 2004

Fun Shoe
Thanks everyone who helped, I solved this using your help and realize I forgot to follow up.

We had 2/3 subnets (different AZs) that weren't plumbed correctly. It was hard to debug because of some sticky sessioning going on (so once it worked it was hard to reproduce). My general inexperience with Terraform didn't help, and neither did my inexperience attaching to k8s pods. Armed with your thoughts we sat down together and figured it out. Thanks for the assistance.

And in great news I now know how to use curl's -w flag to time the various parts of the handshakes/HTTP transaction. CF won't let me paste it, but using the -w flag to write out things like time_connect made this a lot easier to debug than my "curl from various endpoints upstream and downstream of the LB, and use a wall clock".
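
For anyone who wants to script that, here is a minimal sketch that wraps curl from Python. This is not the command Hed ran (that one was never posted); it only assumes curl is on the PATH and uses the standard -w timing variables. The function names are made up:

```python
import subprocess

# Standard curl --write-out timing variables, in the order they occur.
TIMING_FIELDS = (
    "time_namelookup",     # DNS resolution finished
    "time_connect",        # TCP connect finished
    "time_appconnect",     # TLS handshake finished
    "time_starttransfer",  # first byte of the response received
    "time_total",
)


def write_out_format(fields=TIMING_FIELDS) -> str:
    """Build the -w argument, one 'name=%{name}' pair per line."""
    return "".join(f"{name}=%{{{name}}}\\n" for name in fields)


def parse_timings(output: str) -> dict:
    """Turn curl's 'name=seconds' lines into a dict of floats."""
    timings = {}
    for line in output.splitlines():
        name, separator, value = line.partition("=")
        if separator and name in TIMING_FIELDS:
            timings[name] = float(value)
    return timings


def time_request(url: str) -> dict:
    """Run curl once and return the phase timings (needs network access)."""
    result = subprocess.run(
        ["curl", "-s", "-o", "/dev/null", "-w", write_out_format(), url],
        capture_output=True, text=True, check=True,
    )
    return parse_timings(result.stdout)


print(write_out_format(("time_connect", "time_total")))
```

All values are cumulative seconds from the start of the request, so the TLS handshake cost is time_appconnect minus time_connect. A request that stalls before time_connect is set never completed the TCP connection, which is what a blackholed subnet tends to look like.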

MrMoo
Sep 14, 2000

A long article from an S3 author:

https://www.allthingsdistributed.com/2023/07/building-and-operating-a-pretty-big-storage-system.html

Hughmoris
Apr 21, 2007
Let's go to the abyss!
A bit of an odd question for you all:

When watching AWS tutorials, or reading articles, I often see certain parts of the console or CLI being blurred. Clearly, there is sensitive info such as keys/passwords that the presenter doesn't want the viewer to see.

If I start putting out simple videos or blog posts on AWS services, what should I be blurring/hiding from the audience? I know not to show access keys and passwords. Is there anything else I should be cognizant of so I don't get hacked?

Thanks Ants
May 21, 2004

#essereFerrari


Build the environment for the purpose of taking screenshots or video but then tear it all down when you're done.

Happiness Commando
Feb 1, 2002
$$ joy at gunpoint $$

That should be enough. Sometimes ARNs or resource names can leak sensitive info if you're a corporation. Frequently account IDs are blurred out, even though they're not actually sensitive. Corey Quinn has a tweet somewhere of some AWS exec definitively stating that account IDs shouldn't be considered sensitive. But it's usually done anyway.

Vanadium
Jan 8, 2005

Another thing I'd treat as sensitive is internal/private S3 bucket names, because people frequently enough give them formulaic names and/or construct bucket names dynamically in code, and if it gets out your bucket is named my-cool-app-prod-us-west-2, some clown might create my-cool-app-prod-eu-west-1 or something and give you an unnecessary headache when you want to deploy your stuff there too.

Maybe that's just me being grumpy about shared namespaces.

And account IDs aren't inherently sensitive and it wouldn't come up in a demo, but if eg you're interacting with the AWS accounts of your customers somehow, you probably do want to be careful not to leak their AWS account IDs to other parties even if you don't mind your own showing up in public, but, like, just for normal privacy reasons and not security reasons.

I got laid off from my AWS heavy job a few months ago and do not see a lot of AWS in my future but this stuff is still swirling around my head send help.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Thanks Ants posted:

Build the environment for the purpose of taking screenshots or video but then tear it all down when you're done.

Happiness Commando posted:

That should be enough. Sometimes ARNs or resource names can leak sensitive info if you're a corporation. Frequently account IDs are blurred out, even though they're not actually sensitive. Corey Quinn has a tweet somewhere of some AWS exec definitively stating that account IDs shouldn't be considered sensitive. But it's usually done anyway.

Thanks! I went and found that Corey Quinn article on the sensitivity of Account Ids. Here is the meat of it:

quote:

So, settling this debate once and for all, I quote AWS’s Director of Worldwide Analyst Relations & Market Insight Steven Armstrong: “Account IDs are not considered sensitive. Based on your feedback, we’ve started updating our documentation to make this more clear.”


Vanadium posted:

...
I got laid off from my AWS heavy job a few months ago and do not see a lot of AWS in my future but this stuff is still swirling around my head send help.

Are you just tired of AWS, or moving onto greener pastures?

Vanadium
Jan 8, 2005

Hughmoris posted:

Are you just tired of AWS, or moving onto greener pastures?

No shade on AWS, just that the places I'm looking at either don't use it directly or are not super cloud adjacent in the first place. I wouldn't mind doing more AWS and tbh I'd enjoy getting more perspectives on how other orgs set up their stuff.


necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost

Vanadium posted:

Another thing I'd treat as sensitive is internal/private S3 bucket names, because people frequently enough give them formulaic names and/or construct bucket names dynamically in code, and if it gets out your bucket is named my-cool-app-prod-us-west-2, some clown might create my-cool-app-prod-eu-west-1 or something and give you an unnecessary headache when you want to deploy your stuff there too.
Am in security and am a Cloud Guy and this is part of the pentest suites for discovery out there, so yes being predictable is probably a Bad Idea. Also yes, consider "prod" to be similar to using "password" in your passwords - they are directly being used in word lists as top priority right next to dev and stage.
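
One common way to make names less guessable is a random suffix. A minimal sketch, assuming you are fine storing the generated name somewhere (Terraform state, a parameter store) instead of rebuilding it in code; the function name and the suffix length are arbitrary choices:

```python
import secrets


def bucket_name(app: str, env: str, region: str, suffix_bytes: int = 4) -> str:
    """Return '<app>-<env>-<region>-<random hex>' as a candidate bucket name.

    The random suffix means knowing one bucket's name does not reveal
    the names of its siblings in other regions or environments.
    """
    name = f"{app}-{env}-{region}-{secrets.token_hex(suffix_bytes)}".lower()
    if not 3 <= len(name) <= 63:  # S3 bucket names must be 3 to 63 characters
        raise ValueError(f"bucket name has invalid length {len(name)}: {name}")
    return name


print(bucket_name("my-cool-app", "prod", "us-west-2"))
```

The trade-off is exactly the one Vanadium describes: code can no longer construct the name from a formula, so every consumer has to look it up. Swapping "prod" for a less obvious environment label is a separate choice this sketch does not make for you.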

Aside:

Jesus Christ, in TYOOL 2023 the SA CoC subforum doesn't support Markdown formatted posts or if it does it's certainly not helping you find out.
