Volguus
Mar 3, 2009

Portland Sucks posted:

I was doing some machine learning classification work on my home PC and was sick of having my CPU tied up for days on end, so I jumped into the EC2 free tier without really reading anything about how it worked, and was stunned at how slow it was compared to my i7. I figure that was the t2.micro's burst-performance model working against me, since I needed something that could run at 100% for as long as necessary. Which EC2 instance types should I be looking at that won't scale back after a few hours of constant threaded CPU?

Last time I looked at their offerings (quite a few years back) they did have compute-intensive VMs to choose from. But they cost a pretty penny.

Volguus
Mar 3, 2009
What is the best way to run jobs/executables on demand in AWS? ECS/Fargate? AWS Batch? Some other mechanism?

Our problem:

Customer creates a project in our web application, which saves the project data in the database (PostgreSQL on RDS). Based on the data in the project, we need to train a model. The time the computation takes scales linearly with the amount of data in the project. Since the computation doesn't need to happen often (only when the data changes, which is relatively rare), we do not want to pay through the nose for a compute-optimized EC2 instance that mostly sits idle.
So I am looking at ways to do that calculation asynchronously. A Docker container that runs my application (C++ based) looks like the ideal candidate: it gets started when the data changes, runs, then dies.

As input it needs the project ID; the application then goes into the database to fetch the data, does the calculation, puts the result (the model) back in the database, and optionally notifies someone in some way that the job is complete.

Looking through the AWS documentation, it's quite confusing which way is best to go. How can I programmatically start the job in AWS? Some notification service? And how do I pass parameters (said project ID) to the Docker image/my executable? Also, where would it be best to store the database credentials? Baked into the Docker image when it is created? Passed as arguments when the task is created? Some other location?

All of this I would know how to solve using RabbitMQ and a pre-made EC2 machine that runs my application, but we don't want to pay for it to run all the time when not needed.

Thanks.

Volguus
Mar 3, 2009
As far as I can tell, Lambda supports JavaScript, C#, Java, that kind of language. Our thing is done in C++, so Lambda is kinda out of the question. I am studying Batch right now; it's just that it looks very confusing, which is why I thought I'd ask.

JHVH-1 posted:

Well, Lambda can be used to trigger a job: run something in Fargate, it does its thing, generates what you need, and then exits, so nothing is left running.

Example: https://serverless.com/blog/serverless-application-for-long-running-process-fargate-lambda/

Oh ... that's an interesting approach.
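If I'm reading that example right, the Lambda itself stays tiny. A boto3 sketch of what I think it would look like (cluster, task, and network names are all made up, and it assumes the task definition is registered beforehand):

code:

import boto3  # AWS SDK for Python

ecs = boto3.client("ecs")

def handler(event, context):
    # The event carries the project whose model needs (re)training.
    project_id = str(event["project_id"])

    ecs.run_task(
        cluster="training-cluster",    # placeholder names throughout
        taskDefinition="train-model",
        launchType="FARGATE",
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-aaaa1111"],
                "securityGroups": ["sg-bbbb2222"],
                "assignPublicIp": "DISABLED",
            }
        },
        # Pass the project ID as an environment variable instead of
        # baking it into the image.
        overrides={
            "containerOverrides": [{
                "name": "train-model",
                "environment": [
                    {"name": "PROJECT_ID", "value": project_id},
                ],
            }]
        },
    )

As for the credentials half of my question, I'm guessing the task's IAM role plus something like SSM Parameter Store is the usual answer rather than baking them into the image, but that's a guess.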

Volguus
Mar 3, 2009

Doc Hawkins posted:

You can run anything compiled in lambda if you compile it for the right environment, include the binary in the zip file, and include a wrapper in a supported language that shells out when invoked.

E: but if execution will take longer than 10 seconds you probably want something else

Yes, it takes around 45 seconds to complete right now on a medium 1-CPU AWS instance, and 20 seconds on our local workstations, under a reasonable (expected) load. So maybe AWS Batch, then?
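For reference, the wrapper trick described above would be something like this, if I understand it correctly (a sketch: it assumes the C++ binary is compiled for the Lambda environment and shipped in the zip as ./train, a made-up name):

code:

import subprocess

def handler(event, context):
    # Shell out to the native binary bundled in the deployment zip.
    result = subprocess.run(
        ["./train", str(event["project_id"])],
        stdout=subprocess.PIPE,
        check=True,  # raise if the binary exits non-zero
    )
    return {"stdout": result.stdout.decode()}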

Volguus
Mar 3, 2009

Rapner posted:

Lambda's max timeout is 5 minutes, so you could probably get it to work?

Hmm, potentially, yes. It would probably be easier for me to get my stuff compiled on/for a known Docker image, then execute it in Fargate via Lambda. I am essentially trying to replicate https://serverless.com/blog/serverless-application-for-long-running-process-fargate-lambda/.

If that won't work, then I'll look for alternatives.

Volguus
Mar 3, 2009
After fooling around with a Docker image that holds my application, launched from a Lambda into a Fargate-compatible cluster, I've decided to drop it for now. It just takes too drat long (sometimes even 4 minutes) for the drat image to get launched. Maybe that's the idea, I guess: save money on containers, but give up time waiting for the thing to execute.

So, for now, I went with the old (I believe) auto-scaling option in AWS. I made the launch configuration and the Auto Scaling group, set the scale-up and scale-down conditions, and DoS-ed my own application in AWS. It went up perfectly (more instances) and after a while scaled back down, just as expected, so for now I am very happy with this approach. It is probably more expensive than the other one, but ... oh well. We'll monitor cost closely over the next few weeks.

I do have a question though: if I want to update my application, it seems that I have to update the image, don't I? And if I update the image, it seems that I need a new template, a new launch configuration from that template, then to update the Auto Scaling group to use the new launch configuration, and then, to force it to use the new image, to detach the old instance and tell it to launch a new one.

This seems quite ... complicated and awful and long. Is there a better way? Can I automate it? I just need to update certain files in the image and restart 2 systemd services.

Volguus
Mar 3, 2009

freeasinbeer posted:

Netflix designed Spinnaker to do this; you could also use any of the config management tools and some combo of Packer.

Any reason why ECS (although I dislike it) doesn't work?

Jesus, why does everything have to be so complicated? I'll study Spinnaker more, but all I want is to update the image a launch configuration is using. Ideally, one command from my side would launch the build, execute the tests, and, if successful, deploy.
Why doesn't ECS work? Because, as I said before, it takes 5 minutes for the thing to start up and launch the application in the container, and that's when I manually execute the lambda that starts the task. With some notification system (SNS) in front it would potentially take even longer. Everything in AWS seems to just take a long time. Like the other day I created an IAM user to get the AWS ID and secret key for Docker deployment, and when I tried to log in with it I got an internal error for an hour, after which it magically just worked. Maybe stuff needs to propagate to places? No idea. I have absolutely no clue how this AWS monster works at all.

No wonder, with such complicated tooling, that there are people whose full-time job is to manage this cloud crap. I haven't tried any other cloud providers so I have no clue if the others are better or worse.

Volguus
Mar 3, 2009

StabbinHobo posted:

welcome to devops, the next step is to pick your self-medicating poison of choice

Which is why I don't wanna be a devops. When we grow (if we grow) from the 3 people we are now, I totally will hire a devops whose job will be solely to deal with this poo poo.

freeasinbeer posted:

I'm spitballing ideas here, but maybe an image that pulls a Docker container from ECR on startup, and you update the Docker container. You'd probably also want to update the launch config with an explicit version.

Now that you mention it, I just got an idea; please tell me if it's really dumb:
My application is one WAR file, one .so (native library), and potentially 2 configuration files. The WAR file is updated most often, the .so less often, and the configuration files probably never. What if (is it even possible?) I put them in S3 when they're built and ready for production, and have the image pull them from S3 on startup? Then I'd have only one AMI (that I prepare beforehand), which would always launch the latest version of the app, and when I do an update I'd only have to tell the autoscaling group to launch a new instance and destroy the old one?

Is that dumb? Can it work?
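Concretely, I'm picturing something like this running at instance boot, wired in as a systemd oneshot or user-data (a Python sketch; the bucket and paths are made up):

code:

import subprocess

import boto3

s3 = boto3.client("s3")

# Made-up bucket/keys; "latest/" would always point at the newest build.
ARTIFACTS = [
    ("my-release-bucket", "latest/app.war", "/opt/app/app.war"),
    ("my-release-bucket", "latest/libnative.so", "/opt/app/libnative.so"),
]

def bootstrap():
    # Pull the freshest build out of S3 ...
    for bucket, key, dest in ARTIFACTS:
        s3.download_file(bucket, key, dest)
    # ... then bounce the two services that use it.
    for svc in ("app.service", "worker.service"):
        subprocess.run(["systemctl", "restart", svc], check=True)

if __name__ == "__main__":
    bootstrap()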

freeasinbeer posted:

I mean, this is what Heroku is for. But I don't know a lot about the specifics here.

True, but we are in AWS now (we have our domains, DNS, and load balancing there) and Heroku looked a bit too opaque (we also have a native component to our app). I'm not married to AWS, but changing ... is hard.

Volguus
Mar 3, 2009

FamDav posted:

So I believe you said you were using Fargate for launching your task? I think you're hitting some issues around network interface connectivity when using public IPs. If you set up a NAT gateway in your VPC and don't enable public IP support for your Fargate task, it should start up much faster and much more consistently (depending on image size and application warmup).

Oh, that's interesting to hear. My application is not a service that listens for connections on a port; I have absolutely no need for a public IP, but without one it wouldn't work (according to some other posts I found on the internet, probably due to some other issues). Autoscaling does work, though, and I'm happy with it for now, except for the hassle of updating the image.

Arzakon posted:

Bootstrapping deployments from S3 on launch and just terminating your old instance on every new version is perfectly fine. You still have to deal with OS patching, because your AMI will be trapped in a past era. Creating a process for copying your launch configuration with the latest patch-level base AMI and replacing that in your ASG regularly is recommended, so you don't spend an hour patching the OS when the AMI gets very old. Certainly easier than having some AMI-baking process for every release you do.

freeasinbeer posted:

That S3 idea will work. Now someone should jump in and say it's a bit risky in theory, and that most huge places would avoid not explicitly pinning the downloaded artifact (because you don't know exactly what version is running), but in reality that should be fine for what you want.

Cool, thanks for the confirmation. All I need is something to hold me over until we can get a real devops guy on board. My only other worry about S3 is whether I can make the bucket private (that is, accessible only from my own AWS network).
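E: from a bit of digging, the usual answer seems to be an S3 VPC endpoint plus a bucket policy that denies everything else. Something like this boto3 sketch, if I'm reading the docs right (bucket and endpoint IDs are made up; a Deny this broad will also lock out console access, so be careful):

code:

import json

import boto3

s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyAllOutsideOurVpcEndpoint",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::my-release-bucket",
            "arn:aws:s3:::my-release-bucket/*",
        ],
        # Deny any request that doesn't arrive via our VPC endpoint.
        "Condition": {
            "StringNotEquals": {"aws:sourceVpce": "vpce-1a2b3c4d"}
        },
    }],
}

s3.put_bucket_policy(Bucket="my-release-bucket", Policy=json.dumps(policy))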

Volguus
Mar 3, 2009

Blinkz0rz posted:

Yeah I didn't want to call that out but :ughh:

What did I do? Or say? Or ... how wrong was that sentence?

Volguus
Mar 3, 2009

the talent deficit posted:

devops isn't a person, it's a methodology

you wouldn't hire an agile person to agile up your software. altho i guess tons of places do this too

Wait, devops is not a job title, a particular job description? "Devops guys" is totally a thing that I've heard of. And people do hire agile consultants, although I don't think many know what to actually expect of them.

To me (and I'm probably wrong) devops is the team/guy who manages the build infrastructure, artifacts, and deployment. Am I wrong to believe that?

Volguus
Mar 3, 2009
Ok, thanks for the explanations so far. So if I were to put up a job posting tomorrow for a person to help with the build and deployment infrastructure ... what title should I ask for? At the end of the day, I agree with you that the person needs to be a lot more involved in the development and planning aspects, but we're a 4-person startup: 2 of them scientists who shouldn't be trusted with a computer, much less with C++; me, the developer, architect, tester, build and release manager, and AWS expert (god help us); and a CEO who is ... a CEO.

AWS is so drat big that one needs to do this crap full time to even dream of getting anywhere, much less take advantage of it in the most efficient manner. And I would be very happy not to have to touch AWS. The only things I care about are that it's up and how much it costs.

Volguus
Mar 3, 2009

Methanar posted:

What are your scientists doing that doesn't involve computers.

That's the problem. It involves computers.

Volguus
Mar 3, 2009
Thanks for the info. You're right, maybe a contractor would be the best way to start here, then go from there.

Volguus
Mar 3, 2009
Got another question for you AWS gurus: how does one make an RDS database (PostgreSQL) available across regions?
I tried the VPC peering option, but it does not do DNS name resolution across regions. I would like to not have the database open to the internet (even if restricted to a single IP or range of IPs). Or is there a better option for this? Database replication, maybe?
My goal is to have the web application available in multiple regions (for latency purposes), currently looking at US-EAST (where we are now) and EU-WEST (Paris). The DB being across the ocean would be relatively fine, since it is not used that heavily by the latency-sensitive operations. But the web application itself should be in the region closest to the user.

Volguus
Mar 3, 2009

Extremely Penetrated posted:

If just using CloudFront for the web app in its original region isn't enough, RDS PostgreSQL supports cross-region read replicas. But then the app needs to be redesigned to send read-only queries to X and read-write queries to Y.

RDS Aurora was supposed to offer multi-region multi-master (i.e. full read/write nodes) capabilities by the end of this year.

You could also look into whether adding an ElastiCache layer would be a good fit for the workload.

Interesting, I didn't know about cross-region replicas. But the same issue stands: at the moment, the only way I know of for a web application (we're talking about an API here, not plain pages, so CloudFront doesn't really help) to talk to an RDS db in a different region is over the public internet, since VPC peering doesn't allow name resolution across regions (it works fine if I access the database by its internal 172.31.x.x IP). And I would very much like to not send db traffic over the internet. Is there a VPN service that AWS provides and manages, that just works so I don't have to worry about it, which I could use to connect two (or more) VPCs?

Volguus
Mar 3, 2009

Extremely Penetrated posted:

I haven't tried it, but I thought that Route53's Private Hosted Zones would let requests resolve to their internal IPs across any VPCs you associate them with.

I tried that, but it doesn't resolve the IP, it resolves the name. Essentially, for RDS you configure Route 53 private zones to resolve a pretty name (db.example.com) to the uglier and longer RDS-provided name. With VPC peering that works fine, except that the remote side gets the ugly name, which it then tries to resolve, and it only gets the external IP. Which takes me back to square one: I want the database available to the remote region via a private IP, communicating over a private network.

Volguus
Mar 3, 2009

Thanks Ants posted:

It's resolving the external IP, but is the traffic actually going externally? I know in Azure when you present a service endpoint into a vnet you still reference the 'public' DNS name but that traffic never actually leaves the private network.

Hmm, I don't know. I guess it does go externally, since the web application cannot connect to the database when it's given the name (which resolves to the external IP), but it connects fine when configured with the internal (172) IP.

Extremely Penetrated posted:

That sounded like maybe DNS resolution isn't enabled in the VPC Peering settings. Which led me to discover that "You cannot enable DNS resolution support for an inter-region VPC peering connection." lol so now you're looking at doing something like a TCP proxy in EC2 to forward stuff to RDS, or your own DNS. Hopefully someone else has less lovely ideas.

Yes, I saw that you can't; I was hoping people had run into this before (it should be a solved problem, right?) and had ideas. But with the TCP proxy ... you kinda lost me.

Edit again: I was already struggling when I set up this poo poo for the first time, in one region. I hate the drat CEO for not wanting to hire someone who knows this crap. He was happy, though, that I was able to put together some lovely (but one-button) solution for building and deploying the application automatically, and scaling it if needed when the traffic gets too high. But now that we need to expand, he still pulls that same poo poo, that I can come up with something. It doesn't look like I will this time.

Volguus fucked around with this message at 16:51 on Sep 21, 2018

Volguus
Mar 3, 2009
I'm looking at https://docs.aws.amazon.com/vpc/latest/peering/invalid-peering-configurations.html#transitive-peering and it shows 3 VPCs.
I don't have 3, I only have 2 VPCs, and according to https://docs.aws.amazon.com/vpc/latest/peering/vpc-peering-basics.html#vpc-peering-limitations I have these limitations:

quote:

An inter-region VPC peering connection has additional limitations:
You cannot create a security group rule that references a peer VPC security group.
You cannot enable support for an EC2-Classic instance that's linked to a VPC via ClassicLink to communicate with the peer VPC.
You cannot enable DNS resolution support (a VPC cannot resolve public IPv4 DNS hostnames to private IPv4 addresses when queried from instances in the peer VPC).
Communication over IPv6 is not supported.
The Maximum Transmission Unit (MTU) across the VPC peering connection is 1500 bytes (jumbo frames are not supported).

And indeed, I cannot enable DNS on the peering connection.

So, what am I missing here that would allow me from Region B to resolve "mydb.rds.amazon.com" to an internal address (172.xxx) instead of the public one?

Volguus
Mar 3, 2009

Thanks Ants posted:

It seems like you get a choice of having an RDS instance publicly accessible when you create it, and this changes how DNS behaves - if you don't have it publicly accessible then the DNS name will always resolve to a private address.

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.WorkingWithRDSInstanceinaVPC.html#USER_VPC.Hiding

Edit: You wrote that above, I missed it. I think this is the setting that you want, Volguus.

Oooh, yes, I guess, maybe. But I would like to be able to access the db from work from time to time (I update the security group to allow my IP, do my thing, then remove it). If I set it to private (assuming I'll even be able to change the setting), I presume that will be it: I'd have to go through an EC2 machine. Which may be fine, I guess.

Agrikk posted:

Let me see if I understand this:

- You have b1b1b1b1b.1a1a1a1a1a.us-east-1.rds.amazonaws.com that resolves to 10.1.0.23 in VPC1

- You have EC2 instance myinstance.us-west-2.ec2.amazon.com that resolves to 10.99.0.87 in VPC99

- When you try to ping b1b1b1b1b.1a1a1a1a1a.us-east-1.rds.amazonaws.com from myinstance.us-west-2.ec2.amazon.com it does not resolve, because VPC99 in us-west-2 does not know about what is in VPC1 in us-east-1.

Is this what you are saying?


If so: a workaround is to stand up a DNS server in each VPC with conditional forwarders to 10.1.0.2 and 10.99.0.2, and point all of your resources at your internal DNS servers. Each VPC has a set of AWS DNS servers that get queried by anything local to that VPC. The point is to collect all these disparate VPC namespaces into a single place that knows about all of them.


For your situation, though, the AWS-recommended solution is to set up a replica at the destination, and the stuff local to it queries the local instance.

Yes, that's what I'm saying. From region-15, when I ping db.aaa.long.name.RDS.amazon.com I get the public IP (18.x.y.117) instead of the internal IP (172.31.x.y) that I get when I ping from region-4. And peering the 2 VPCs from the 2 regions apparently doesn't do the name resolution correctly to give me the internal IP.
Is that AWS DNS a service that they provide?
About the replica: can PostgreSQL do that? Or is it an AWS service? What's the latency (synchronization time)? How would such a thing work? Who would know about this, a DBA?

I'm just trying to gather as much information as I can, to hopefully push the drat CEO to hire the right people for the job, as I have enough on my plate without having to worry about crap like AWS (as important as it is).
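E: partially answering my own question — cross-region read replicas appear to be an RDS feature (PostgreSQL included) rather than something you build yourself, and it looks like a single API call. A boto3 sketch with made-up identifiers, creating a replica in Paris from a us-east-1 source:

code:

import boto3

# The client targets the *destination* region (eu-west-3 is Paris).
rds = boto3.client("rds", region_name="eu-west-3")

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="mydb-paris-replica",  # made-up name
    # Cross-region sources are referenced by their full ARN.
    SourceDBInstanceIdentifier="arn:aws:rds:us-east-1:123456789012:db:mydb",
    DBInstanceClass="db.t2.medium",
)

Replication latency and the read/write split still sound like questions for a real DBA.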

Volguus
Mar 3, 2009

Agrikk posted:

The decision to go multi-region should not be taken lightly. As you have already discovered, the architectural decisions that must be made, and made correctly, determine the success of your workload.

Ideally you would have an AWS architect on staff as well as a DBA, and together they will explore the options AWS provides as well as the requirements of your application. There isn't a Right Way of doing Multi-Region. There is only the Right Way For You.

This, this 100 times. I have to shove these sentences down the throats of the "powers that be" until they "get it". My entire hope in all this is that the beers I'm having right now will knock me out and make me forget that AWS exists by tomorrow morning. The less I know about it, the happier I am.

Volguus
Mar 3, 2009

freeasinbeer posted:

Also, yeah, hire someone who knows AWS (I only charge $250 an hour, so feel free to call me).

I'd pay $2500/hour (not my money) to not have to deal with this. But yeah, thanks for the good advice. If I don't forget about AWS by Monday, I'll relay my findings and suggestions (thanks to you all) to the higher-ups who hold the purse (too tightly, in my opinion).

Edit: tested the accessibility setting in RDS and I can confirm that setting the database to "private" makes it work as expected (that is, from Region B it gets properly resolved to the internal IP in Region A, which makes it work fine over VPC peering). I will need to set up either a VPN or an SSH tunnel to access it from outside, but that's perfectly fine.
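For anyone hitting this later, the flip itself looks to be one API call (a boto3 sketch; the instance identifier is made up):

code:

import boto3

rds = boto3.client("rds", region_name="us-east-1")

# With PubliclyAccessible=False the endpoint name resolves to the
# private IP, including from peered VPCs.
rds.modify_db_instance(
    DBInstanceIdentifier="mydb",
    PubliclyAccessible=False,
    ApplyImmediately=True,
)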
In the next few days we also have a meeting with an AWS expert; curious what advice he/she will give regarding our AWS setup, both current and future.

Volguus fucked around with this message at 16:58 on Sep 24, 2018

Volguus
Mar 3, 2009

deedee megadoodoo posted:

It's never too early to start. And I am the one who created the confusion by not being clear about what exactly was causing my frustration. This change not only broke a lot of app code that wasn't pinned to a specific boto3 version, but it also broke a lot of infrastructure. Our startup scripts rely on being able to run the aws command to copy artifacts from s3. It was a fairly simple fix, but I am just flabbergasted that this made it out into the yum repo to begin with.

That's ok. Jeff will send his ? email and everything will be taken care of.

Volguus
Mar 3, 2009
Question for the AWS experts:

A company that I work with has the following scaling policy [screenshot omitted]:


Essentially, if the average CPU usage reaches 80%, increase the capacity by 1. All good, it looks like it works, everyone's happy.

I wanted to reduce that number to 65%. Editing it gives me this [screenshot omitted]:

And I cannot understand what it wants or why it does that. Reading the documentation left me even more confused, and adding more steps with various numbers in them helps even less. What's a negative lower bound and why do I need it? Actually, no, I don't really care what it is; how can I shove that 65 number down AWS's throat and make it leave me alone?

Thank you.
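E: writing it out as code made it a bit less cryptic. If I understand the docs, the console is building a step scaling policy, and the bounds are offsets from the CloudWatch alarm threshold, not absolute CPU numbers; negative lower bounds only come up for steps that sit below the threshold. A boto3 sketch of "add 1 instance at 65% and above" (group and policy names are made up):

code:

import boto3

autoscaling = boto3.client("autoscaling")

# Step scaling: the CloudWatch alarm fires at 65% average CPU, and the
# bounds below are offsets from that threshold (0 = "at the threshold").
autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",  # made-up name
    PolicyName="scale-up-at-65",
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    StepAdjustments=[{
        "MetricIntervalLowerBound": 0.0,  # from the threshold upward
        "ScalingAdjustment": 1,           # add one instance
    }],
)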

Volguus
Mar 3, 2009

Agrikk posted:


2. Learn the leadership principles. While plenty of companies have stuff like this, Amazon lives by them and the LPs govern everything we do. Think about how your previous job skills map to the LPs and be prepared with tons of examples for your interviews.


I googled them and they're quite insane, and there's a shitload of them (16 or whatever). To hear that they're not just PR bullshit :dogstare:.

Volguus
Mar 3, 2009

Arzakon posted:

the interviewers that aren’t robots.

This implies that there are interviewers that aren't robots. I find this very hard to believe.

Volguus
Mar 3, 2009

quote:

It was a dark and stormy night ...

  • Reply