xpander
Sep 2, 2004
This is the thread for all things AWS! If you've got a question about a particular service, post it here and hopefully someone will answer it so you aren't just shouting into the void!! Possible topics include:

-how to use a particular service
-architecture questions/best practices
-exams/certifications
-cool stuff you're working on

I posted this in CoC because I want the focus to be on "devops"/scripting/coding because you should be automating your infrastructure, as part of the larger umbrella of "everything". If not - feel free to cruise on outta here on your dinosaur!

Further reading:
AWS Security Whitepaper
Architecting for the Cloud: AWS Best Practices Whitepaper

I'll try to keep the OP updated with must-have knowledge as appropriate, just holla if you have something worthy of this honour(<-- yes I'm Canadian).

Who The Hell Is This Guy?
I just started working at Flux7, who are primarily an AWS consultancy and devops shop. My title is amorphous, but "Cloud Infrastructure Engineer" has many impressive wizard-like words, as well as the added bonus of making real engineers mad. I wrote and passed(just barely!) my AWS Developer Associate exam yesterday, and am currently studying for Solutions Architect. I plan on going the distance all the way to at least one Professional cert, but given how brutal the Developer exam was compared to the practice questions/quizzes I did, I plan on accumulating some hefty experience before potentially throwing away $150 on that. I'm pretty much all-in on AWS's poo poo - the stuff I've seen and done in a bit over a month here has been extremely cool and good, and Amazon's rate of development on their services means it's only going to get better as time goes on. Admittedly I'm hardly a master of this stuff, but I'm learning a ton!

Why Should I Care About The Cloud?
An excellent question. The short answer: because it will probably save you money, time and aggravation. The less snarky answer: it depends! It feels shortsighted to say that everything belongs "in the cloud" - almost certainly there exists a use case or twenty where it's just not a great answer. The reality is that there's a good chance you can re-architect some portion of your system to make better use of on-demand resources so that you aren't paying for full-time usage of, say, EC2 instances. Things like API Gateway and Lambda might obviate the need for certain fulltime-provisioned resources entirely! And that is all to say nothing about the fantastic failover/disaster recovery possibilities baked into the AWS infrastructure. It's certainly worth investigating to see what might be a good fit. Incidentally, this is exactly the kind of thing that we do at Flux7, and I'm happy to chat about it as a learning exercise for myself.

Are You Certifiable?
I think it's worth talking about becoming AWS Certified, and what that entails. I don't have much data on how valuable this is, but the exams themselves are comprehensive enough that I believe them to accurately demonstrate the knowledge required(and thus prove your skillz).

First up, getting some training - I really like A Cloud Guru(also it's the only training I've done, other than practical/on-the-job):

A Cloud Guru

They've got courses on pretty much everything - I bought the Associate package while on sale at Slashdot for I think $5, which was a steal. I think it's absolutely worth the full price, and will be getting the Professional pairing once I feel it's worth my time. I also bought their Lambda course this week, and while the quality wasn't quite the same(different instructor), it's a great hands-on demo of what's possible with the serverless approach(feels buzzwordy but man can you do some cool poo poo in this space!). Definitely post any feedback you have on other courses they offer, as I'd like to vet training materials so that people aren't wasting their time and money.

Speaking of which - DO NOT do the Webassessor practice exams. From all accounts(not my own) they are garbage, and I'll say that the Developer exam was WAY harder than the practice questions I got from A Cloud Guru. If that's the case for the "official" practice exams, then my assessment is that they lure you into a false sense of security and aren't even worth the paltry $15.

The exams themselves will run you $150 USD apiece, with the pass mark being approximately 65%(can confirm this to be true as of 09/21/2016, uuhh...from a friend). Reportedly it's on a bell curve, and they shift the pass mark based on real-world results, but who knows. Plan to need at least 36/55 correct answers for victory.

xtal
Jan 9, 2011

by Fluffdaddy
I highly recommend installing this addon before reading this thread: https://chrome.google.com/webstore/detail/cloud-to-butt-plus/apmlngnhgbnjpajelfkmabhkfapgnoai?hl=en

xpander
Sep 2, 2004

How could I have missed such low-hanging fruit?? Thanks!

Lutha Mahtin
Oct 10, 2010

Your brokebrain sin is absolved...go and shitpost no more!

I signed up for AWS a few days ago because I want to write a little Internet-using server program for a spare-time project, and the free tiers of various Amazon services look like they will be more than enough for it (I don't really have much of a budget for this). I don't really know where to get started, though. I've been reading through some of Amazon's documentation and marketing "get started" videos, but I'm curious if there are third-party articles or documentation that explain things better for someone (like me) who isn't super familiar with all of this stuff.

What I want to do is query some data over the web and store it somewhere for me to download later. The most important queries would be two things: one is a single query run once an hour, and the other would be up to a couple dozen queries that would be run once or twice a day. Neither of these is very high bandwidth or super complex, just a little parsing of the results to toss the data I don't need and store the rest.

There is another API I want to query that updates every minute, and while I don't know anything about cloud stuff, my intuition is to think that the tighter time constraint might produce more headaches. However, it's not critical at this point to hit this API every single minute, because I'm not sure if the data from it will be useful, and I'm pretty sure I can determine this usefulness just by having a few hours or days of data from it to combine with other sources, so I may end up not caring about it.

Right now I think what I need is an EC2 instance to load my program onto. Beyond that, I don't know what I'm doing. I need to figure out first of all how to set things up so that I get an instance and put my program onto it, of course. But I don't really understand how the lifecycle of my virtual server works, like if I need to watch for signals that my instance is going to be like, respawned or moved, or if these are even things that happen. I also don't know if I need to or should be making use of any other stuff, like the data store services. I'm going to keep reading, but any suggestions and resources are welcome!

xpander
Sep 2, 2004

Lutha Mahtin posted:

I signed up for AWS a few days ago because I want to write a little Internet-using server program for a spare-time project, and the free tiers of various Amazon services look like they will be more than enough for it (I don't really have much of a budget for this). I don't really know where to get started, though. I've been reading through some of Amazon's documentation and marketing "get started" videos, but I'm curious if there are third-party articles or documentation that explain things better for someone (like me) who isn't super familiar with all of this stuff.

What I want to do is query some data over the web and store it somewhere for me to download later. The most important queries would be two things: one is a single query run once an hour, and the other would be up to a couple dozen queries that would be run once or twice a day. Neither of these is very high bandwidth or super complex, just a little parsing of the results to toss the data I don't need and store the rest.

There is another API I want to query that updates every minute, and while I don't know anything about cloud stuff, my intuition is to think that the tighter time constraint might produce more headaches. However, it's not critical at this point to hit this API every single minute, because I'm not sure if the data from it will be useful, and I'm pretty sure I can determine this usefulness just by having a few hours or days of data from it to combine with other sources, so I may end up not caring about it.

Right now I think what I need is an EC2 instance to load my program onto. Beyond that, I don't know what I'm doing. I need to figure out first of all how to set things up so that I get an instance and put my program onto it, of course. But I don't really understand how the lifecycle of my virtual server works, like if I need to watch for signals that my instance is going to be like, respawned or moved, or if these are even things that happen. I also don't know if I need to or should be making use of any other stuff, like the data store services. I'm going to keep reading, but any suggestions and resources are welcome!

You're on the right track regarding EC2 - this is your basic server virtualization service. If you know anything about how this works outside of the cloud(i.e. traditional server hosting), it's much the same. At the end of the day, you're going to fire up an instance(server) and log in remotely just like a physical machine. I'm not sure what your background is, so if you need further explication on anything then just point it out. You'll definitely want some sort of monitoring on a running instance, for exactly the reasons you described. Typically it will only get moved if the underlying hardware fails - this happens, though it's fairly rare. If the instance does get terminated from their end, note that they will not recreate it for you - that's your job. Because of that, you'll certainly want to keep in mind how you might save program state(if any), configuration details, and data.

On that last note, there's a free tier offering of RDS, Amazon's relational database service. This is basically a managed EC2 instance running some flavour of SQL, where you don't have to worry about OS patching, software optimizations, etc - just the architecture of your actual tables. There are plenty of MySQL tutorials out there, and it's useful enough to know how to put together some basic queries. If you don't actually need a database, and instead can make use of file storage, S3 will be your best friend.

For monitoring, CloudWatch has you covered. You can set up alerts(i.e. emails) that get sent whenever a given metric crosses a certain threshold. Basic instance health checks don't really need configuration beyond where to send those emails. Do be sure to note the exact parameters of free tier coverage - it's pretty obvious with RDS, from what I recall, but I think any custom metrics with CloudWatch aren't free. Still, I guarantee you can do what you want without paying a cent.
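To make the RDS bit concrete, here's a rough sketch of talking to a free-tier MySQL instance from Python. The endpoint, credentials and table are all made up, and I'm assuming the third-party PyMySQL driver just because it's an easy install - any MySQL library works much the same way.
code:
# Hedged sketch only: hypothetical RDS endpoint, credentials and table name.
# Assumes the third-party PyMySQL driver (pip install pymysql).
import pymysql

conn = pymysql.connect(
    host="mydb.abc123xyz.us-east-1.rds.amazonaws.com",  # your RDS endpoint goes here
    user="admin",
    password="change-me",
    database="scraper",
)

try:
    with conn.cursor() as cur:
        # A tiny table for the hourly query results
        cur.execute(
            """CREATE TABLE IF NOT EXISTS readings (
                   fetched_at DATETIME NOT NULL,
                   payload    TEXT
               )"""
        )
        cur.execute(
            "INSERT INTO readings (fetched_at, payload) VALUES (NOW(), %s)",
            ("example payload",),
        )
    conn.commit()
finally:
    conn.close()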

On that note, the first thing I do when setting up a new AWS account is set up a billing alarm called "Cheapskate" where it emails me at >= $0.01 so I instantly know if I'm being charged for something. Check out this page for how to do that. Keep in mind that once it's triggered at that point in a given month, that alarm is now useless for additional charges. It will still show as being in the ALARM state, but won't send out additional notifications. So maybe set up 2-3 at different thresholds just in case. I'm on account #3 because I didn't make much use of those first two years!
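If you'd rather script that than click through the console, the boto3 version looks roughly like this. Billing metrics only live in us-east-1, you need "Receive Billing Alerts" turned on in the billing preferences first, and the SNS topic ARN below is a placeholder for one you've created and subscribed your email to.
code:
# Rough boto3 equivalent of the "Cheapskate" alarm - a sketch, not gospel.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # billing metrics live here

cloudwatch.put_metric_alarm(
    AlarmName="Cheapskate",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,              # the billing metric only updates a few times a day
    EvaluationPeriods=1,
    Threshold=0.01,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:cheapskate-alerts"],  # placeholder topic
)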

If you're querying an API and just storing the data, you might want to take a look at Lambda. It will let you run code without actually having to think about a server at all. The only runtimes it has available are Python, Node.js and Java, so if you're writing in another language then forget about it. But if not, it might be worth looking into. It too has a free tier(that you'll never exceed if you're at all minding how often you run functions). If it seems daunting, just go with whatever you feel you understand the best. I will say that it's nice to not have to think about what's executing my code, or do much setup to get it to that point.
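To make that concrete, here's a bare-bones sketch of what a Python Lambda for your hourly query could look like - you'd trigger it on a schedule, and it just grabs the API response and drops it in S3. The URL and bucket name are placeholders, obviously.
code:
# Bare-bones Lambda handler sketch: fetch an API, stash the raw response in S3.
# Hook it up to a scheduled trigger (e.g. rate(1 hour)) to run it hourly.
# API_URL and BUCKET are placeholders.
import datetime
import urllib.request

import boto3

API_URL = "https://example.com/api/data"   # hypothetical API endpoint
BUCKET = "my-scraper-results"              # hypothetical S3 bucket

s3 = boto3.client("s3")

def handler(event, context):
    with urllib.request.urlopen(API_URL) as resp:
        body = resp.read()

    # Key each run's output by timestamp so nothing gets overwritten
    key = "raw/{}.xml".format(datetime.datetime.utcnow().strftime("%Y-%m-%dT%H%M%S"))
    s3.put_object(Bucket=BUCKET, Key=key, Body=body)

    return {"stored": key, "bytes": len(body)}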

I hope this was informative and not terribly dense - I didn't want to go off the rails too badly for a high-level overview. Again, if you want more specific info regarding a certain topic, Just Post!

Lutha Mahtin
Oct 10, 2010

Your brokebrain sin is absolved...go and shitpost no more!

xpander posted:

I hope this was informative and not terribly dense - I didn't want to go off the rails too badly for a high-level overview. Again, if you want more specific info regarding a certain topic, Just Post!

Thanks for the reply. I have a bachelor's degree in CS, though I'm not very experienced, as I've never had a job in the field. I did take classes on databases, so I know SQL. And I know Java best out of general-purpose languages, so Lambda might work. Your post wasn't too dense, I understand basically what you are talking about; probably one of the big problems for me is just that I'm not really familiar with server administration in general.

The Lambda service might work for me, but again I don't really know. When I was just trying to learn about EC2, I had the vague outline of an idea for a setup where basically a server is spawned that has my code on it already, or it automatically loads it somehow, and if the server gets destroyed or shut down or whatever, a trigger is in place to spawn it again when possible. Data storage would be taken care of by moving my output (XML from APIs and maybe some text files) to a storage thing that doesn't care if my server gets nuked. The code on the server would also maybe read some from the data store to determine a few things, such as when the last queries were made, or which queries I am interested in at the moment; this would allow me to not be hard-coding state into the code or the server image. Is this sounding like something that makes some sort of sense, in terms of ~~the cloud~~?

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS
You're talking about running a stateless app that's backed by S3. You can do a stateless app using Lambda or an autoscaling group of instances provisioned with ElasticBeanstalk, OpsWorks, or CloudFormation.

Red Mike
Jul 11, 2011

Lutha Mahtin posted:

The Lambda service might work for me, but again I don't really know. When I was just trying to learn about EC2, I had the vague outline of an idea for a setup where basically a server is spawned that has my code on it already, or it automatically loads it somehow, and if the server gets destroyed or shut down or whatever, a trigger is in place to spawn it again when possible. Data storage would be taken care of by moving my output (XML from APIs and maybe some text files) to a storage thing that doesn't care if my server gets nuked. The code on the server would also maybe read some from the data store to determine a few things, such as when the last queries were made, or which queries I am interested in at the moment; this would allow me to not be hard-coding state into the code or the server image. Is this sounding like something that makes some sort of sense, in terms of ~~the cloud~~?

There are a number of ways to set it up so that you can instantly spawn a server that has/gets your code, and keeps at least one instance up. Here's two options:
  • Set up an EC2 instance, remote in to it, configure it the way you want (while making sure none of your code is using the exact public/private IP you have on that specific instance). Then you can go to the EC2 dashboard, and make an image of that instance (an AMI) so that you can then go to the AMI list and launch a new instance exactly like that one. Ideally, you'd want to create a launch configuration that uses that AMI now for the next step.
  • Make a script (sh script for Linux, powershell for Windows) that automatically downloads and configures the software you want on the machine. Upload it to S3 to a public folder/folder with permissions set up to allow EC2 to access it. Go to Launch Configurations and make a new one based off of an Amazon AMI preferably (only for Linux) or just whatever OS you want installed. In the 'details' section, there's a 'userdata' input, write a small script there (sh for Linux, powershell for Windows; google for info on how to format this) that downloads your script off of S3 and runs it. This will make it so whenever the launch configuration is used for an instance, it'll run this userdata script automatically.

Now once you've done this, go to Autoscaling and set up an autoscaling group with a min of 1 and a max of 1 with your created launch configurations. If the instance goes down for whatever reason, autoscaling will kick in and create a new one. If you ever need to re-deploy that instance, set max to 2 temporarily and desired to 2, then once the second instance is live, set it back down to 1. That means that if you have a load balancer attached to this autoscaling group, you have literally no downtime for the re-deploy.
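For reference, the autoscaling part of that is only a couple of API calls if you'd rather script it. This sketch assumes you've already made a launch configuration called "my-app-lc-v1", and the subnet ID is a placeholder.
code:
# Sketch of the min-1/max-1 autoscaling group described above, via boto3.
# Assumes an existing launch configuration named "my-app-lc-v1"; subnet ID is a placeholder.
import boto3

autoscaling = boto3.client("autoscaling")

# One instance, always: if it dies, autoscaling replaces it.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="my-app-asg",
    LaunchConfigurationName="my-app-lc-v1",
    MinSize=1,
    MaxSize=1,
    DesiredCapacity=1,
    VPCZoneIdentifier="subnet-0123456789abcdef0",
)

# Re-deploy trick: temporarily allow a second instance...
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="my-app-asg",
    MaxSize=2,
    DesiredCapacity=2,
)

# ...then, once the new instance is healthy, scale back down to one.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="my-app-asg",
    MaxSize=1,
    DesiredCapacity=1,
)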

Regarding data storage:

  • Set up an EC2 instance that is used for storage independently of your computing instances. Tie in their drives via nfs-server or similar. (Not recommended)
  • Use S3. Always save data to S3, always read data from S3.
  • Use RDS (as a database) and save to/read from there. (Recommended only if your data is a good fit for a database obviously.)

Regarding potential trip-ups with EC2:

  • A "VPC" is basically like a private LAN within EC2 that many instances can share. Nowadays with the new instance types (m4, etc) you require a VPC whether or not you want one. Not sure if this is true for t2 as well, but I'd expect so. Practically, the VPC means all your instances in the same VPC get the same sort of private IP (10.0.100.x) and can access each other over that network. This is useful for allowing, for example, a cluster of instances to talk to each other, but having a bunch of them not accessible from the internet.
  • When you first start an instance in a VPC, you'll get an 'auto-assign public IP' option, which effectively means 'this instance you're creating in this VPC, do you want it accessible from the internet as well'.
  • Security groups are annoying and you will have to check them any time you have connectivity issues. A security group is like a firewall on the machine, blocking input/output ports. By default, instances start out allowing SSH/RDP from anywhere, and not accepting any other input, while allowing any output. If you're hosting a webserver, you'll want to allow 80, 443 from anywhere (or just from your IP for testing) - see the sketch just after this list.
  • "Availability zones" are basically different datacentres. If you have an EC2 instance A in eu-west-a, and an EC2 instance B in eu-west-b, and an RDS instance they both talk to in eu-west-c, that means all communications between those are across availability zones. Some things start costing (more) money if you go out of your availability zone with your traffic, always check rates.
  • "Elastic IPs" are basically a static IP that you can re-assign from one machine to another. You would allocate one, and use it in your front-end/whatever. Then when you want to switch from instance A (which is out of date or has gone down) to your instance B (which is up to date or up) you take the elastic IP from A (or shut down A) and assign it to B (which needs to not already have an elastic IP). I don't recommend you rely on them for anything and instead add a domain name to AWS, which gives you way more control at the cost of a few minutes/hours latency on DNS updates, but that's me thinking of larger scale.
  • Load balancers do exactly what it says on the tin. If you have a load balancer that has 3 instances on it, then any request has a chance to go to any of those 3 instances. If you have it set to sticky connections, then your first request has that chance, then the same session will keep going to the same instance. If you don't need sticky connections, then you can fake a load balancer with Route 53 (the domain name service) by making a couple of DNS records in R53 with a 'weighted' routing policy.
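Here's the security group sketch mentioned above - opening 80/443 to the world on an existing group with boto3. The group ID is a placeholder.
code:
# Hedged sketch: allow HTTP/HTTPS from anywhere on an existing security group.
# The group ID is a placeholder; the default SSH rule is left alone.
import boto3

ec2 = boto3.client("ec2")

ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
)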

Summary: If you're trying to learn ~~the cloud~~ then look into Lambda/ElasticBeanstalk/etc, but be aware your knowledge will literally only be useful for AWS ever. If you're just trying to learn server development at small scale, maybe don't use AWS. If you're trying to learn mid/large-scale development, then use AWS, but also pretend that you have a large number of servers everywhere and you can't afford downtime. Otherwise everything you learn will have to be changed or re-learned when you actually have such a system in production.

e: And if you just want to make your app in peace, go for a VPS provider, preferably a very very cheap one. You have providers that offer year-long machines with 1 modest CPU, 256 MB RAM and enough storage, all for £5/year or similar. You can also try DigitalOcean, which constantly have 'promotions' where you can sign up with a promocode and get more than one month's credit when you start up. Working with a VPS provider will also tell you all you need about small scale development, and the knowledge transfers to some degree (depending on provider) over to EC2 once you learn the lingo.

Red Mike fucked around with this message at 12:07 on Oct 8, 2016

Lutha Mahtin
Oct 10, 2010

Your brokebrain sin is absolved...go and shitpost no more!

:worship:

Thanks for the replies, everybody! I think I understand much better how some of the different services and janitoring tools fit together. Now I just need to pick one of the strategies one of you has outlined and go for it... but I'm sure I'll be back with some more questions once I get underway :o:

One final thing I want to point out for now is that I'm already aware of how, depending on what design decisions I may ignorantly make, I could end up with something that can't just be copied and pasted to work on some other cloud service, VPS host, etc. My plan is to just write this small thing for now, and if I want to continue with it as a learning exercise, I might sign up for a different service and refactor the program to make it more platform-agnostic. This is just a fun and educational exercise for me, to increase my programming chops and maybe to have another thing to put on my resume when I start applying for software jobs. I'm already working on another project with the same idea of it being a resume-builder. Thanks again everyone!

xenilk
Apr 17, 2004

ERRYDAY I BE SPLIT-TONING! Honestly, its the only skill I got other than shooting the back of women and calling it "Editorial".
Yay! Perfect thread for this small question I'm having!

I'm running 5 rails apps that realistically don't get much traffic but need to be online and fast at all times. Right now I'm running the rails app / db through DigitalOcean $5 to $10 instances and have no plan to move to EC2 since it doesn't yet make sense for me.

BUT I'm debating moving all my postgres databases to a t2.micro RDS instance because it's fully managed AND properly backed up and the data is the core of my apps.

So my question, or questions, are:

- Does running a micro instance sound like a viable option for a set of 5 sites that get probably <10,000 views daily total and probably less than a GB of database storage?
- I noticed that the network is marked as "LOW" for t2.micro instances, does that mean that it will be slow as hell?
- For EC2, instances have CPU Credits for bursts, is it the same for RDS? Does it mean my instance can go down for any reason? That would blow.

Thanks!

Red Mike
Jul 11, 2011

xenilk posted:

- Does running a micro instance sound like a viable option for a set of 5 sites that get probably <10,000 views daily total and probably less than a GB of database storage?

t2.micro should be more than enough, assuming you don't do incredibly expensive queries. If you run the CPU up to 100% constantly because you're running a query with 3 dependent subqueries for every one of those daily views, then no it's not going to work.

xenilk posted:

- I noticed that the network is marked as "LOW" for t2.micro instances, does that mean that it will be slow as hell?

That "network" thing just relates to your bandwidth limits (this is how many bytes you can push into/out of the instance in a single instant). It's also misleading across different classes (t2, m4, m3, etc) since t2.micro actually has slightly higher bandwidth limits compared to the cheapest m4.

xenilk posted:

- For EC2, instances have CPU Credits for bursts, is it the same for RDS? Does it mean my instance can go down for any reason? That would blow.

CPU credits don't mean instances going down for either EC2 or RDS. You get baseline performance unless you have cpu credits to spend in order to burst up to higher performance. When you're not using up all your resources, you recover cpu credits slowly. This means that if you only end up needing to burst for 50% of the day, you should be able to maintain it permanently.

Ideally, you should be ignoring CPU credits entirely and making sure that your baseline performance is enough to handle whatever you're throwing at the machine. CPU credits are dumb and will cause you to lock up your machine at 100% CPU in the middle of the night because oops suddenly your machine became quite a bit slower at peak times.
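If you do end up on a t2 anyway, it's worth at least alarming on the CPUCreditBalance metric so you hear about the pool draining before you're stuck at baseline. A sketch - the instance ID, threshold and SNS topic are placeholders, and RDS t2 instances expose the same metric under AWS/RDS with a DBInstanceIdentifier dimension instead.
code:
# Sketch of a CloudWatch alarm on a t2 instance's CPU credit balance.
# Instance ID, threshold and SNS topic ARN are all placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="t2-credits-running-low",
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=20,                      # pick whatever headroom lets you react in time
    ComparisonOperator="LessThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:ops-alerts"],  # placeholder topic
)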

Unrelated to your questions: if you're not also moving to EC2, double check your traffic costs. Data transfer costs money unless your database is talking to an EC2 instance in the same availability zone using private IPs.

zerofunk
Apr 24, 2004
Keep in mind that having your database server in a separate location from your web servers is going to introduce some additional latency overhead in communication between the two. If every millisecond isn't critical, then it may not be a big deal. I imagine the latency between DO and AWS isn't even all that bad depending on what locations you're working with. Something worth considering though.

Red Mike brings up a good point about additional costs due to data transfer as well. If you kept it all within AWS, you wouldn't have that issue. Amazon just announced a new VPS product called Lightsail that is supposed to be more akin to Digital Ocean's offering (I haven't read too much about it myself) than EC2. It may be worth looking into that if you did want to move everything over, but keep a similar setup aside from database hosting.

Red Mike
Jul 11, 2011

zerofunk posted:

Red Mike brings up a good point about additional costs due to data transfer as well. If you kept it all within AWS, you wouldn't have that issue. Amazon just announced a new VPS product called Lightsail that is supposed to be more akin to Digital Ocean's offering (I haven't read too much about it myself) than EC2. It may be worth looking into that if you did want to move everything over, but keep a similar setup aside from database hosting.

Lightsail is looking competitively priced, although setting up a bridge between AWS and it (so you can access your RDS instance) will mean you'll have to learn about AWS specific notions (which is a good thing if you're trying to learn; annoying if not).

It does however seem to be US-only for now. Otherwise you'll need EC2 instances which end up costing more money.

Marvin K. Mooney
Jan 2, 2008

poop ship
destroyer
Hey I'm fuckin dumb and have no idea what I messed up, hopefully this is the right place to ask questions.
I'm making a test site using CloudFront/S3 and I can't get them to cooperate. I have my simple site data in an S3 bucket, I made sure to enable static website hosting, and I made sure to enable read permissions
code:
{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "AddPerm",
			"Effect": "Allow",
			"Principal": "*",
			"Action": "s3:GetObject",
			"Resource": "arn:aws:s3:::fart-bucket-content/*"
		}
	]
}
On the CloudFront side, I have a distribution with the origin
code:
fart-bucket-content.s3.amazonaws.com
, origin path blank, bucket access not restricted, default behavior is just "redirect HTTP to HTTPS" "GET, HEAD" everything else is default except I set compress objects to yes.
The issue I'm having is it seems to load up the main page fine, but when it accesses any directories in the fart bucket it comes back with Access Denied, even if there's nothing in the directory but an index.html file. What is going on? Is it a path problem? I tried appending /* to the origin path but then it wouldn't even load the main page. Sorry if this is a dumb question, I'm teaching myself as I go and it's a lot of trial and error.

Skier
Apr 24, 2003

Fuck yeah.
Fan of Britches
Try one moving part at a time. Go to your S3 bucket, open properties and show the static web hosting details. You'll see a link to the bucket there, it'll look like http://fart-bucket-content.s3-website-us-east-1.amazonaws.com/ . Open that URL in your browser. If you can get the contents of other directories okay, the problem is with CloudFront. Otherwise there's more settings to play with in S3, probably read related.

http://docs.aws.amazon.com/AmazonS3/latest/dev/website-hosting-custom-domain-walkthrough.html will help, plenty of official AWS docs to help with this.
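If clicking around gets tedious, a few lines of Python will hit the same paths against both endpoints and show you which side is throwing Access Denied. Both hostnames and the paths below are placeholders - use your bucket's website endpoint, your distribution's domain, and whatever paths are failing for you.
code:
# Tiny sketch: fetch the same paths from the S3 website endpoint and from CloudFront
# and compare status codes. Hostnames and paths are placeholders.
import urllib.error
import urllib.request

PATHS = ["/", "/some-directory/", "/some-directory/index.html"]
HOSTS = [
    "http://fart-bucket-content.s3-website-us-east-1.amazonaws.com",  # S3 website endpoint
    "https://d1234abcd5678.cloudfront.net",                           # CloudFront domain
]

for host in HOSTS:
    for path in PATHS:
        url = host + path
        try:
            with urllib.request.urlopen(url) as resp:
                print(resp.getcode(), url)
        except urllib.error.HTTPError as err:
            print(err.code, url)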

12 rats tied together
Sep 7, 2006

Marvin K. Mooney posted:

Hey I'm fuckin dumb and have no idea what I messed up, hopefully this is the right place to ask questions.
I'm making a test site using CloudFront/S3 and I can't get them to cooperate. I have my simple site data in an S3 bucket, I made sure to enable static website hosting, and I made sure to enable read permissions
code:
{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "AddPerm",
			"Effect": "Allow",
			"Principal": "*",
			"Action": "s3:GetObject",
			"Resource": "arn:aws:s3:::fart-bucket-content/*"
		}
	]
}
On the CloudFront side, I have a distribution with the origin
code:
fart-bucket-content.s3.amazonaws.com
, origin path blank, bucket access not restricted, default behavior is just "redirect HTTP to HTTPS" "GET, HEAD" everything else is default except I set compress objects to yes.
The issue I'm having is it seems to load up the main page fine, but when it accesses any directories in the fart bucket it comes back with Access Denied, even if there's nothing in the directory but an index.html file. What is going on? Is it a path problem? I tried appending /* to the origin path but then it wouldn't even load the main page. Sorry if this is a dumb question, I'm teaching myself as I go and it's a lot of trial and error.

This is kind of funky because directories aren't really a thing with S3. You could try changing s3:GetObject to s3:Get*, and then try copy pasting another statement section where the Resource is just the bucket ARN (without the /*)?

e: actually, I'm sorry, that fix is for an annoying gotcha when you are delegating s3 access to IAM users. It sounds like your problem is that you're trying to read a directory, when those aren't really a thing except as displayed through the s3 web interface. For one of the directories that just contain an empty index.html file, what happens if you try to just view index.html directly? Is that also an access denied?

12 rats tied together fucked around with this message at 04:18 on Feb 10, 2017

Marvin K. Mooney
Jan 2, 2008

poop ship
destroyer
So I know the S3 is fine because using the S3 URLs everything works as expected. It must be something with the CloudFront setup. Basically the site is totally barebones: an index.html for the homepage, then an "about" directory and a "contact" directory, each with an index.html file inside. Right now CloudFront can access the main page fine but gives Access Denied errors for everything else. https://www.fartexample.com works, https://www.fartexample.com/contact/ doesn't work, https://www.fartexample.com/contact/index.html doesn't work.

I'll try posting in the AWS forum too but I've tried changing every little setting one at a time and waiting for the full deployment and nothing seems to do anything.

Marvin K. Mooney
Jan 2, 2008

poop ship
destroyer
I figured it out, posting the answer in case anyone else has the same problem.
Turns out I was using the default S3 origin (the bucket name that popped up in the CloudFront Origin settings), but that bypasses the normal static-hosting behavior where it looks for index.html inside directories. By changing the origin to the full static website hosting endpoint shown under "Static Web Hosting" in the S3 control panel, it fixed the forwarding and it works normally now.

ickna
May 19, 2004

Just a friendly reminder to turn on MFA if you use AWS. My account was compromised yesterday morning and I wouldn't have caught it if Amazon didn't do checks on unusual activity (like maxing instances in every area). In 3 hours it racked up $2k in usage charges, which they are fortunately making a concession for.

I'm still not sure how they got my password, it was a pretty secure one (random alpha + numbers).

fluppet
Feb 10, 2009

ickna posted:

Just a friendly reminder to turn on MFA if you use AWS. My account was compromised yesterday morning and I wouldn't have caught it if Amazon didn't do checks on unusual activity (like maxing instances in every area). In 3 hours it racked up $2k in usage charges, which they are fortunately making a concession for.

I'm still not sure how they got my password, it was a pretty secure one (random alpha + numbers).

I'd be checking the iam keys rather than the password

ickna
May 19, 2004

fluppet posted:

I'd be checking the iam keys rather than the password

I'd checked those; the only one I had created was for one of my EC2 instances to be able to access S3. It was definitely compromised on the root account. It was certainly a wake-up call and I've gone to 2FA for as many of my other major accounts across the internet as a result.

Pollyanna
Mar 5, 2005

Milk's on them.


I have a question about EBS and baking AMIs. We're currently baking a new AMI for every new version of our app we want to deploy, and I'm wondering if there's a way around that? It takes 15~20 minutes to bake one, which means that every commit I push to Bitbucket takes half an hour to show up on its staging server. I have to debug some pipeline related poo poo, and waiting that long just to run into yet another bug is driving me crazy. What can I do to mitigate this?

Red Mike
Jul 11, 2011
Assuming I'm understanding correctly, the setup I've generally seen (especially with Windows servers that take ages to start up if it's a custom AMI) is that the AMI never changes (and preferably is an Amazon one, so that instances are brought up from the waiting pool of pre-instanced servers) but Launch Configurations are used instead, with a userdata script being passed in that downloads and sets up everything as needed. Launch Configurations are instantly created and available, and there's no overhead on bringing instances online from them.

Only downside: you're limited to 300 or so configurations at any one time, and I don't believe you can increase the limit.

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS

Pollyanna posted:

I have a question about EBS and baking AMIs. We're currently baking a new AMI for every new version of our app we want to deploy, and I'm wondering if there's a way around that? It takes 15~20 minutes to bake one, which means that every commit I push to Bitbucket takes half an hour to show up on its staging server. I have to debug some pipeline related poo poo, and waiting that long just to run into yet another bug is driving me crazy. What can I do to mitigate this?

Whatever tool you're using to bake has a lot of stuff going on. See if you can simplify the provisioning process, or bake a base AMI that contains components that rarely change and use that to bake new app AMIs.

the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture





Pollyanna posted:

I have a question about EBS and baking AMIs. We're currently baking a new AMI for every new version of our app we want to deploy, and I'm wondering if there's a way around that? It takes 15~20 minutes to bake one, which means that every commit I push to Bitbucket takes half an hour to show up on its staging server. I have to debug some pipeline related poo poo, and waiting that long just to run into yet another bug is driving me crazy. What can I do to mitigate this?

spinnaker or something like it can help here by building the ami state on a locally attached ebs volume instead of spinning up a new instance and building from scratch each time, but i think most organizations create amis that include everything but the application code/configuration and then use a minimal deploy tool + launch configurations

DICTATOR OF FUNK
Nov 6, 2007

aaaaaw yeeeeeah

Pollyanna posted:

I have a question about EBS and baking AMIs. We're currently baking a new AMI for every new version of our app we want to deploy, and I'm wondering if there's a way around that? It takes 15~20 minutes to bake one, which means that every commit I push to Bitbucket takes half an hour to show up on its staging server. I have to debug some pipeline related poo poo, and waiting that long just to run into yet another bug is driving me crazy. What can I do to mitigate this?
Docker + ECR works really well for this.

Pollyanna
Mar 5, 2005

Milk's on them.


Yeah, I'm really confused on why things are being made from scratch every time. I'll have to confirm that's actually happening, but since the only thing changing is pulling a different commit of the master branch at any point in time, then there's no reason to bake entire AMIs.

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS
Baking is a solid concept in that it resets instance state back to a known condition so your failure cases are restricted to the app and other transitive dependencies. It makes deployments much easier.

JehovahsWetness
Dec 9, 2005

bang that shit retarded
Per-release AMI baking is how Netflix does it, since they don't modify running instances and just phase-out their running fleets w/ new AMIs. They've got a couple of posts about their baking / release pipeline:

http://techblog.netflix.com/2016/03/how-we-build-code-at-netflix.html
http://techblog.netflix.com/2013/03/ami-creation-with-aminator.html

"The bakery reduced AMI creation time to under 5 minutes. This improvement led to further automation by engineers around Netflix who began scripting bakery calls in their Jenkins builds. Coupled with Asgard deployment scripts, by committing code to SCM, developers can have the latest build of their application running on an EC2 instance in as little as 15 minutes."

Hughlander
May 11, 2005

Pollyanna posted:

Yeah, I'm really confused on why things are being made from scratch every time. I'll have to confirm that's actually happening, but since the only thing changing is pulling a different commit of the master branch at any point in time, then there's no reason to bake entire AMIs.

It guarantees availability of the deployment at any point in time in the future. As long as the instance can launch, the app can be brought up. If you launch an instance and then deploy to it (for instance, as part of an autoscaling group), you now need to ensure that the second-step bootstrap is available as well. If you're pulling from git, you then need to ensure that your git host can sustain the pull of a mass deployment as well. If you have a large autoscale event and need to bootstrap 3000 instances, will your git server fall over?

I've been at a place that did something similar, and even just doing an instance launch of 400 servers had a non-zero failure rate. We eventually replaced that system with EFS.

As above though, Docker with a private container registry is probably the better approach, but from a cost perspective I have a small issue with Docker in AWS.

fluppet
Feb 10, 2009

Pollyanna posted:

Yeah, I'm really confused on why things are being made from scratch every time. I'll have to confirm that's actually happening, but since the only thing changing is pulling a different commit of the master branch at any point in time, then there's no reason to bake entire AMIs.

So it sounds as though you have a couple of options to speed things up

If you set up a common base ami with as much of your standard tooling installed on it as possible, then to prep a release all you need to do is run a git pull.

Also, depending on how many amis you bake, it may be worth getting packer to attach an EBS volume to an already-running instance and snapshot it, rather than waiting for a new instance to launch (t2.nanos are perfect for this).
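And for the really simple case, you can cut an AMI straight from an instance that already has everything on it, without launching a builder at all - a sketch, with placeholder IDs and names.
code:
# Simple-case sketch: create an AMI from an already-running instance.
# Instance ID and image name are placeholders. NoReboot skips the restart at the
# cost of filesystem consistency.
import datetime

import boto3

ec2 = boto3.client("ec2")

resp = ec2.create_image(
    InstanceId="i-0123456789abcdef0",
    Name="my-app-base-{}".format(datetime.datetime.utcnow().strftime("%Y%m%d-%H%M%S")),
    NoReboot=True,
)
print("New AMI:", resp["ImageId"])

# Optionally block until it's usable before pointing a launch config at it
ec2.get_waiter("image_available").wait(ImageIds=[resp["ImageId"]])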

StabbinHobo
Oct 18, 2002

by Jeffrey of YOSPOS
I'm curious what other people's real-world workflows are like.

For instance, do you work from a text editor editing cloudformation templates and then run an aws cli command on your laptop? Do you have an ssh bounce host? Do you use the web interface on a day to day basis? How do you really deeply browse/survey/crawl an account to make sure you haven't accidentally left some rds instances running in brazil for a month?

xpander
Sep 2, 2004

StabbinHobo posted:

I'm curious what other people's real-world workflows are like.

For instance, do you work from a text editor editing cloudformation templates and then run an aws cli command on your laptop? Do you have an ssh bounce host? Do you use the web interface on a day to day basis? How do you really deeply browse/survey/crawl an account to make sure you haven't accidentally left some rds instances running in brazil for a month?

I use the web interface the most because I work in devops consulting, and frequently run client sessions where I have to show them how to do stuff. So I want to be sure that I know where most things are. There are still some things you can't do through the console though - I couldn't find a way to pull the ARN of an SSL certificate managed via IAM through the web portal. I use PyCharm because I end up working with Python a lot in addition to YAML.
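For that specific one, the API will happily cough up the ARN even though the console won't - something like this (no assumptions beyond having IAM-managed certificates at all):
code:
# Sketch: list IAM-managed server certificates and their ARNs via boto3,
# since the console doesn't surface the ARN.
import boto3

iam = boto3.client("iam")

for cert in iam.list_server_certificates()["ServerCertificateMetadataList"]:
    print(cert["ServerCertificateName"], cert["Arn"])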

My workflow is: Make changes in PyCharm -> deploy using web console -> verify changes working correctly -> commit and push. For CFN-related work I think I'm going to start using the CLI more. The architectures we deploy frequently have a jump box, but it depends on the client's needs.

As for cost-related things, I'd recommend making some billing alarms. I have one called "cheapskate" in personal accounts still under the free tier, so that I know if I'm hitting any paid usage breakpoint(the threshold is set at $0.01). For your specific example of resources being used in other regions, you can filter by region in the Cost Explorer. That will let you chase down errant EC2/RDS instances or whatever. I don't have a better solution for that, but I don't really have to deal with that facet of the work.
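If you want a quick programmatic sweep instead of clicking through Cost Explorer, something like this will at least surface running EC2 and RDS instances in every region - a rough sketch only, with no pagination handling and nothing beyond those two services.
code:
# Rough "what's running where" sweep across all regions - a sketch, not a full
# inventory tool (no pagination, only EC2 and RDS).
import boto3

regions = [r["RegionName"] for r in boto3.client("ec2").describe_regions()["Regions"]]

for region in regions:
    ec2 = boto3.client("ec2", region_name=region)
    for reservation in ec2.describe_instances()["Reservations"]:
        for instance in reservation["Instances"]:
            print(region, "EC2", instance["InstanceId"], instance["State"]["Name"])

    rds = boto3.client("rds", region_name=region)
    for db in rds.describe_db_instances()["DBInstances"]:
        print(region, "RDS", db["DBInstanceIdentifier"], db["DBInstanceStatus"])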

2nd Rate Poster
Mar 25, 2004

i started a joke

StabbinHobo posted:

I'm curious what other people's real-world workflows are like.

For instance, do you work from a text editor editing cloudformation templates and then run an aws cli command on your laptop? Do you have an ssh bounce host? Do you use the web interface on a day to day basis? How do you really deeply browse/survey/crawl an account to make sure you haven't accidentally left some rds instances running in brazil for a month?

My team manages nearly all infra through terraform -- we're small (10 devs), and we optimize for avoiding aws lock-in and for auditability.

For new infrastructure, a checkout of the terraform repo is done locally, and after changes are made, updates to the state files are pushed to github.

Any post provision configuration is done through Ansible tower runs of our playbooks (also in github).

Local testing and development of Ansible stuff is just running the playbook against the container that development is done against. In some cases that's not enough so we keep a dev environment in terraform as well.

Longer term goal is to get to as much immutable infrastructure as possible, where the build process will bake app images completely. That too will be managed through terraform.

In the cases where we need ssh access we route everything through gravitational teleport. This gives us some central auditing of who ran what where and a level of access control.

The main drawback we've encountered so far is that we don't have a good way of managing terraform state changes, since you need to place your tfstate files centrally. In practice, though, we've not had any merge conflicts that caused problems.

We're a little weak on deeper audits -- I suspect most places will be? I think if cost became an issue we'd end up writing some scripts to true up reality to terraform.

StabbinHobo
Oct 18, 2002

by Jeffrey of YOSPOS
thank you. I guess I don't even mean as much from a cost perspective as a "sprawling long tail mess" perspective.

every AWS account I've inherited or jumped into has been riddled with things like s3 buckets nobody knows if they're still in use, iam roles likewise, lambda functions with similar names from when they were trying to get it working, etc etc.

how do you ride herd on this stuff programmatically? like I've been trying to clean out my account and start from scratch (it's just for labs anyway) and a week later i'm still stumbling on little things in random corners from years ago (ssh keys you've long lost), and I was never even a really active user.

is there like an "aws dump all" or an "aws reset account to scratch --dry-run" type way of auditing all the things? (not in like a logs/security audit but in like a dead-code-path audit sense).

FamDav
Mar 29, 2008

StabbinHobo posted:

thank you. I guess I don't even mean as much from a cost perspective as a "sprawling long tail mess" perspective.

every AWS account I've inherited or jumped into has been riddled with things like s3 buckets nobody knows if they're still in use, iam roles likewise, lambda functions with similar names from when they were trying to get it working, etc etc.

how do you ride herd on this stuff programmatically? like I've been trying to clean out my account and start from scratch (it's just for labs anyway) and a week later i'm still stumbling on little things in random corners from years ago (ssh keys you've long lost), and I was never even a really active user.

is there like an "aws dump all" or an "aws reset account to scratch --dry-run" type way of auditing all the things? (not in like a logs/security audit but in like a dead-code-path audit sense).

aws config exists, but it's mostly centered around resources that you actively pay for. there are also services like cloudcheckr that will do a more detailed inventory.

the talent deficit
Dec 20, 2003

self-deprecation is a very british trait, and problems can arise when the british attempt to do so with a foreign culture





i use this to track what exists where: https://github.com/Netflix/edda/wiki

Virigoth
Apr 28, 2009

Corona rules everything around me
C.R.E.A.M. get the virus
In the ICU y'all......




Edda is nice, but if you are in a large environment with lots of deploys and resources it can blow your API limits out of the water and cause throttling really easily. CloudWatch Logs should report it if you have alerting set up, or ship the logs somewhere like Sumo and have an alert there. We learned this because some dev deployed it to our dev environment without asking the Infrastructure team for some reason?

fluppet
Feb 10, 2009

2nd Rate Poster posted:

For new infrastructure, a checkout of the terraform repo is done locally, and after changes are made, updates to the state files are pushed to github.

Is there a particular reason you opted to have remote states pushed to git rather than a versioned s3 bucket, which seems to be the common practice?

oliveoil
Apr 22, 2016
Does Amazon have anything like Google's App Engine? I don't want to know how to set up a system out of multiple components. I just want to write some code and then upload it and magically have a working application, and never worry about virtual machines or how many instances of my code are running, etc.
