Scrapez
Feb 27, 2004

Pretty new to AWS and trying to understand autoscaling and how I can initiate an event to increase group size based upon a log file on one of my instances rather than a cloudwatch metric like CPU utilization, etc.

For instance, I would have an instance receiving calls that has a finite call capacity. Once the number of simultaneous calls reaches a certain value, I would like to spin up another EC2 instance. I have log files on the EC2 instance that I could monitor and use to trigger the event, but I'm not sure how exactly to do that.

The only thing I can think of would be to write a script that runs on the EC2 instance itself and once the max simultaneous calls were reached, it would spin up another EC2 instance via the AWS CLI.

Anyone doing something similar to this?

Edit: It looks like CloudWatch Agent may have the ability to do this...

Scrapez fucked around with this message at 21:42 on Jan 3, 2019

Scrapez
Feb 27, 2004

Arzakon posted:

Assuming each server has X slots available for calls and the slots are recycled at the end of each call, create a custom CloudWatch metric and have a script on each server report "Free Slots". Trigger your scale-up based off of the sum of Free Slots going below some number, and scale down based off of some other number. The latter gets a bit complicated because you don't want to terminate an instance that still has active calls, so you need to write a lifecycle hook and respond to it when the active calls fall to zero, or skip automated scale-down termination and manage deprecating and terminating the instances yourself.

Here is a blog post of someone triggering it based off of a Lambda function he has that queries active connections on his database.
https://blog.powerupcloud.com/aws-autoscaling-based-on-database-query-custom-metrics-f396c16e5e6a

That is very helpful, thank you. I think scaling up is all I will need, so managing the scale-down termination manually is perfectly fine. I expect the scaling to be slow and predictable, and I likely will never need to scale down as the platform will continue to grow and become busier over time.
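
For reference, a minimal sketch of the kind of reporting script Arzakon describes, run from cron on each instance. The namespace, metric name, ASG name, and the way active calls are counted are all assumptions, not anything from the thread:
code:
#!/bin/bash
# Report remaining call capacity as a custom CloudWatch metric (run every minute from cron).
MAX_SLOTS=500
ACTIVE_CALLS=$(grep -c "ACTIVE" /var/log/calls.log)   # stand-in for the real call counter
FREE_SLOTS=$((MAX_SLOTS - ACTIVE_CALLS))

# Publishing against the ASG name dimension lets an alarm aggregate (Sum) across the group.
aws cloudwatch put-metric-data \
  --region us-east-1 \
  --namespace "Telephony" \
  --metric-name FreeSlots \
  --dimensions AutoScalingGroupName=kamailio \
  --value "$FREE_SLOTS"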

Scrapez
Feb 27, 2004

Is there a way from the command line on an EC2 instance to retrieve just the public IP address associated with that instance based on the private IP?

I can do `aws ec2 --region us-east-1 describe-addresses`, which returns a list of all addresses, and I could parse out the PublicIp of the instance I'm looking for with a combination of grep and awk, but is there a better way of doing this?

I would be putting the private IP in a variable, since I can obtain that via ifconfig, and then I'd like to look up the public IP based on the private IP.

I'm writing a bootstrap script that will update a config file on the instance with the public IP of that instance. Thoughts?
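
One way to avoid the grep/awk combination is a JMESPath --query filter on describe-addresses; a rough sketch, where the region and how the private IP is collected are assumptions:
code:
# Look up the Elastic IP whose private address matches this instance's private IP.
PRIVATE_IP=$(hostname -I | awk '{print $1}')   # or parsed out of ifconfig
PUBLIC_IP=$(aws ec2 describe-addresses --region us-east-1 \
  --query "Addresses[?PrivateIpAddress=='$PRIVATE_IP'].PublicIp" \
  --output text)
echo "$PUBLIC_IP"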

Scrapez
Feb 27, 2004

2nd Rate Poster posted:

From within the instance you can use the metadata service to find the public ip.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html#instancedata-data-retrieval


code:
curl http://169.254.169.254/latest/meta-data/public-ipv4

JHVH-1 posted:

Also be aware if you use public-hostname it will resolve to the public IP or private IP depending on where it is resolved. That way you can do things like route the traffic internally for some systems with the same hostname.

Much easier way of doing it. Thank you!
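
A minimal bootstrap sketch along those lines, pulling both addresses from the metadata service; the config path and the placeholder tokens being substituted are made up for illustration:
code:
#!/bin/bash
# Grab this instance's private and public IPs from the metadata service
# and substitute them into the application config (paths/tokens are examples only).
PRIVATE_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
PUBLIC_IP=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)

sed -i "s/__PRIVATE_IP__/$PRIVATE_IP/g; s/__PUBLIC_IP__/$PUBLIC_IP/g" /etc/myapp/config.cfg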

Scrapez
Feb 27, 2004

Question on methodology. I want to create a cloudwatch event that will kick off when auto scaling launches a new instance successfully. Additionally, I want a script or a bunch of commands to be run on the ec2 instance that is launched.

I've created the cloudwatch event with the correct service, event and group name as the source. I've set the Target as SSM Run Command with Document AWS-RunShellScript (Linux). I have my Target key set to "tag:Server Type" and target value of <kamailio>. (I have the launch configuration of the autoscaling group set to tag new instances with tag Server Type and value kamailio.)

Is the above the proper way to say "execute the following commands on new instances with the tag Server Type and value kamailio"?

Additionally, is there a way to have it just execute a whole script rather than putting each command in separately as a Constant Configure Parameter?

I hope the above makes sense. Ultimately, if an instance crashes, I want the autoscaling group to launch a replacement, I then want the cloudwatch event to be triggered and run a script that will basically grab the local and public IP address of the instance, put them into variables and then write them out to application config files and start the applications.
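
The rule side of that can also be defined from the CLI; a rough sketch, assuming the group really is named kamailio (wiring the Run Command target with its document, IAM role and commands is done separately, via put-targets or the console):
code:
# Rule that fires when the kamailio Auto Scaling group successfully launches an instance.
aws events put-rule \
  --name kamailio-launch \
  --event-pattern '{
    "source": ["aws.autoscaling"],
    "detail-type": ["EC2 Instance Launch Successful"],
    "detail": { "AutoScalingGroupName": ["kamailio"] }
  }'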

Scrapez
Feb 27, 2004

Arzakon posted:

Have you looked into using user-data to execute the script on launch? You could bake the script into your AMI and just use the user-data to run the command, or put the whole script into the user-data.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html

There is a UserData field in the Launch Configuration you are defining for your auto-scaling group so you don't have to use a CWE or apply it to specific tags. It will just run on anything launched by that ASG.

I have successfully done it this way but was hoping to move it to a CloudWatch event as I'll have a subsequent Event that will need to happen when a new instance is launched as well. I thought it'd be better to have all the items together there for easier management.
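
For comparison, the user-data route Arzakon describes is just a script attached to the launch configuration; a minimal sketch (the baked-in script path is an assumption):
code:
#!/bin/bash
# Launch configuration user data: run the bootstrap script baked into the AMI on first boot.
# Attached with: aws autoscaling create-launch-configuration ... --user-data file://userdata.sh
/opt/bootstrap/configure-kamailio.sh >> /var/log/bootstrap.log 2>&1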

Scrapez
Feb 27, 2004

My understanding was that I could select AWS-RunShellScript (Linux) as the Document type and then, in the Commands section, just add commands to be run on the command line. Below is how I have it set up currently for testing. My Auto Scaling group, called kamailio, successfully launches a new EC2 instance when I terminate one, but either that is not triggering this event or, once triggered, the event just isn't executing the commands.

https://imgur.com/a/T7ZoqNN

Edit: I'm working from this tutorial: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EC2_Run_Command.html

Double Edit: I set up a CloudWatch alarm for invocation of my autoscaling group kamailio. Deleted my instance, which triggered the autoscale function, and I did get an alarm in CloudWatch. I'm stumped.

Scrapez fucked around with this message at 15:38 on Jan 22, 2019

Scrapez
Feb 27, 2004

JHVH-1 posted:

Can you manually run the SSM command to make sure it works? You have the instances set up with the agent and everything, right? (Depending on what your base image is, I think there is a chance it's not installed already.)

That could be the problem. I did not manually set up SSM at all on the image. I'll look into that. Thank you.

Edit: I made sure the SSM agent was running and took a new image. Confirmed that when it launches a new instance, SSM agent is running on startup. Made sure the IAM role for the CloudWatch event has all permissions for SSM. No clue why it isn't working.

2nd Edit: This is everything related to ssm I see in /var/log/messages on the launched EC2 instance:
code:
Jan 22 16:38:09 ip-10-100-10-55 systemd: Started amazon-ssm-agent.
Jan 22 16:38:09 ip-10-100-10-55 amazon-ssm-agent: 2019/01/22 16:38:09 Failed to load instance info from vault. RegistrationKey does not exist.
Jan 22 16:38:09 ip-10-100-10-55 amazon-ssm-agent: Error occurred fetching the seelog config file path:  open /etc/amazon/ssm/seelog.xml: no such file or directory
Jan 22 16:38:09 ip-10-100-10-55 amazon-ssm-agent: Initializing new seelog logger
Jan 22 16:38:09 ip-10-100-10-55 amazon-ssm-agent: New Seelog Logger Creation Complete
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO Create new startup processor
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [LongRunningPluginsManager] registered plugins: {}
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO Initializing bookkeeping folders
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO removing the completed state files
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO Initializing bookkeeping folders for long running plugins
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO Initializing replies folder for MDS reply requests that couldn't reach the service
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO Initializing healthcheck folders for long running plugins
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO Initializing locations for inventory plugin
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO Initializing default location for custom inventory
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO Initializing default location for file inventory
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO Initializing default location for role inventory
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO Init the cloudwatchlogs publisher
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO Starting Agent: amazon-ssm-agent - v2.3.372.0
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO OS: linux, Arch: amd64
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: datastore file /var/lib/amazon/ssm/i-030c68d4bb30a2241/longrunningplugins/datastore/store doesn't exist - no long running plugins to execute
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessagingDeliveryService] Starting document processing engine...
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessagingDeliveryService] [EngineProcessor] Starting
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessagingDeliveryService] [EngineProcessor] Initial processing
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessagingDeliveryService] Starting message polling
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessagingDeliveryService] Starting send replies to MDS
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [instanceID=i-030c68d4bb30a2241] Starting association polling
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessagingDeliveryService] [Association] [EngineProcessor] Starting
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessagingDeliveryService] [Association] Launching response handler
Jan 22 16:38:10 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessagingDeliveryService] [Association] [EngineProcessor] Initial processing
Jan 22 16:38:11 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessagingDeliveryService] [Association] Initializing association scheduling service
Jan 22 16:38:11 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessagingDeliveryService] [Association] Association scheduling service initialized
Jan 22 16:38:11 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessageGatewayService] Starting session document processing engine...
Jan 22 16:38:11 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessageGatewayService] [EngineProcessor] Starting
Jan 22 16:38:11 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessageGatewayService] [EngineProcessor] Initial processing
Jan 22 16:38:11 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessageGatewayService] SSM Agent is trying to setup control channel for Session Manager module.
Jan 22 16:38:11 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessageGatewayService] Setting up websocket for controlchannel for instance: i-030c68d4bb30a2241, requestId: e5ac230f-b6f4-43b5-a269-e0aac69a4076
Jan 22 16:38:11 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [OfflineService] Starting document processing engine...
Jan 22 16:38:11 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [OfflineService] [EngineProcessor] Starting
Jan 22 16:38:11 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [OfflineService] [EngineProcessor] Initial processing
Jan 22 16:38:12 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [OfflineService] Starting message polling
Jan 22 16:38:12 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [OfflineService] Starting send replies to MDS
Jan 22 16:38:12 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [LongRunningPluginsManager] starting long running plugin manager
Jan 22 16:38:12 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [LongRunningPluginsManager] there aren't any long running plugin to execute
Jan 22 16:38:12 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [HealthCheck] HealthCheck reporting agent health.
Jan 22 16:38:12 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessageGatewayService] listening reply.
Jan 22 16:38:12 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [LongRunningPluginsManager] There are no long running plugins currently getting executed - skipping their healthcheck
Jan 22 16:38:12 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [StartupProcessor] Executing startup processor tasks
Jan 22 16:38:12 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [StartupProcessor] Write to serial port: Amazon SSM Agent v2.3.372.0 is running
Jan 22 16:38:12 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [StartupProcessor] Write to serial port: OsProductName: CentOS Linux
Jan 22 16:38:13 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [StartupProcessor] Write to serial port: OsVersion: 7
Jan 22 16:38:13 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessageGatewayService] Opening websocket connection to: %!(EXTRA string=wss://ssmmessages.us-east-1.amazonaws.com/v1/control-channel/i-030c68d4bb30a2241?role=subscribe&stream=input)
Jan 22 16:38:13 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessageGatewayService] Successfully opened websocket connection to: %!(EXTRA string=wss://ssmmessages.us-east-1.amazonaws.com/v1/control-channel/i-030c68d4bb30a2241?role=subscribe&stream=input)
Jan 22 16:38:13 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessageGatewayService] Starting receiving message from control channel
Jan 22 16:38:13 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessageGatewayService] ssm-user already exists.
Jan 22 16:38:13 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessageGatewayService] File /etc/sudoers.d/ssm-agent-users already exists
Jan 22 16:38:13 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:10 INFO [MessageGatewayService] Successfully changed mode of /etc/sudoers.d/ssm-agent-users to 288
Jan 22 16:38:13 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:11 INFO [MessagingDeliveryService] [Association] No associations on boot. Requerying for associations after 30 seconds.
Jan 22 16:38:41 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:38:41 INFO [MessagingDeliveryService] [Association] Schedule manager refreshed with 0 associations, 0 new associations associated
Jan 22 16:40:51 ip-10-100-10-55 amazon-ssm-agent: 2019-01-22 16:40:51 INFO [HealthCheck] HealthCheck reporting agent health.

Scrapez fucked around with this message at 17:44 on Jan 22, 2019

Scrapez
Feb 27, 2004

Bit of an obscure question.

I'm trying to update a dns SRV record when a new instance is launched. I have a Lambda/python function that is performing a ChangeResourceRecordSets upsert and inserting the IP of the newly launched instance into the Value portion of the SRV record. The problem is that when I launch an additional instance, it replaces the value in the SRV record instead of appending the info for the new instance.

I thought with using UPSERT it was supposed to just update a record if it already exists. I'm assuming this somehow doesn't apply to SRV records or the value section specifically.

Is my only recourse to list-resource-record-sets for the record, throw the current value in a variable and then perform my ChangeResourceRecordSets adding the existing value and my new value?

Just trying to understand if UPSERT should be overwriting the value as I'm seeing or if it's something I'm doing incorrectly.

Scrapez
Feb 27, 2004

Docjowles posted:

https://docs.aws.amazon.com/Route53/latest/APIReference/API_ChangeResourceRecordSets.html


Upsert doesn't mean append. It means create if the record doesn't exist at all, or overwrite with the specified value if it does. So yes, you need to read it into a variable, append the string you want added, and then make an API call to set it to that new value.

Thanks. Makes sense. I wish they had something that could just append. Seems like a useful function that people would use.
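
A rough sketch of the read-append-upsert dance with the CLI and jq; the hosted zone ID, TTL and the new SRV value are placeholders:
code:
#!/bin/bash
# Append a new value to an existing SRV record: read the current values,
# add the new one, and UPSERT the full set back.
ZONE_ID="Z123EXAMPLE"
RECORD="_sip._udp.pool.production.instance."
NEW_VALUE="1 10 5060 ip-10-100-50-10.production.instance."

CURRENT=$(aws route53 list-resource-record-sets --hosted-zone-id "$ZONE_ID" \
  --query "ResourceRecordSets[?Name=='$RECORD' && Type=='SRV'].ResourceRecords[].Value" \
  --output json)

# Build a change batch containing both the old values and the new one.
BATCH=$(echo "$CURRENT" | jq --arg v "$NEW_VALUE" --arg name "$RECORD" '{
  Changes: [{
    Action: "UPSERT",
    ResourceRecordSet: {
      Name: $name,
      Type: "SRV",
      TTL: 300,
      ResourceRecords: ((. + [$v]) | map({Value: .}))
    }
  }]
}')

aws route53 change-resource-record-sets --hosted-zone-id "$ZONE_ID" --change-batch "$BATCH"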

Scrapez
Feb 27, 2004

What is the best method for triggering an autoscaling event based on output in a log file on the EC2 instances in the group? The use case is a SIP platform, and I'd like to be able to trigger a scale-out event when the number of calls on any given instance reaches X.

Scrapez
Feb 27, 2004

JHVH-1 posted:

You can create a custom CloudWatch metric and then use it as your scaling criteria.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html

Thank you. That is exactly what I was looking for but google searches had not gotten me to that.

Scrapez
Feb 27, 2004

JHVH-1 posted:

Probably have to play around with the alarms and get the right metrics so you have both scale-up and scale-down criteria based on something that covers the whole cluster.
At my last company we had a developer who was populating a metric in their code and never thought about creating a scale-down one, so the thing would get busy or a bug would scale it out like crazy and then never reduce it.

Definitely. I need to get my head around how I'm going to do that. I'll basically be starting with 3 instances, each having a capacity of 500 calls. So the thought on scale-up was just to spin up another instance if any of the 3 reach 450 concurrent calls. The scale-down side becomes a bit more difficult. I'm thinking I can publish the concurrent-calls value from all active instances, and if the average of that number falls below a certain value, terminate one of the instances. That will also require keeping track of how many of this type of instance are active.

The good thing about this platform, at least right now, is that it will be very predictable. There won't be huge peak calling events or anything like that. It likely will just continue to slowly scale up over time.

One of my fears is that I do this wrong and instances start launching willy nilly all crazy.
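
A sketch of how the alarm half of that plan might look, assuming each instance publishes a ConcurrentCalls metric against an AutoScalingGroupName dimension; the namespace, thresholds and periods are made-up examples, and the policy ARNs would come from aws autoscaling put-scaling-policy:
code:
# Scale out when the busiest instance crosses 450 concurrent calls.
aws cloudwatch put-metric-alarm --alarm-name kamailio-scale-out \
  --namespace Telephony --metric-name ConcurrentCalls \
  --dimensions Name=AutoScalingGroupName,Value=kamailio \
  --statistic Maximum --period 60 --evaluation-periods 2 \
  --threshold 450 --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions "$SCALE_OUT_POLICY_ARN"

# Scale in when the group average drops below 150 for a sustained period.
aws cloudwatch put-metric-alarm --alarm-name kamailio-scale-in \
  --namespace Telephony --metric-name ConcurrentCalls \
  --dimensions Name=AutoScalingGroupName,Value=kamailio \
  --statistic Average --period 300 --evaluation-periods 3 \
  --threshold 150 --comparison-operator LessThanThreshold \
  --alarm-actions "$SCALE_IN_POLICY_ARN"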

Scrapez
Feb 27, 2004

JHVH-1 posted:

You can do some tweaking based on time at least. I've done that before, reducing the capacity over the weekends when I knew it wouldn't be that important.

At least you can make the metrics and create alarms and see what they do before making them the scaling criteria. Like if you know you have 0 calls during long periods on a regular basis you could set an alarm to scale down then or something.

Yeah that makes sense. Overnight or on weekends potentially. At this point, I don't know exactly who the customer base is going to be, so it's possible that it will be largely international and there won't be a truly "slow time."

Being able to send anything you want back to CloudWatch via the CLI is really awesome, and I can use that for a whole host of other things, from alerting on problems to reporting on various things.

Scrapez
Feb 27, 2004

I'm sure there's a very reasonable explanation but why can't you set DHCP Options Sets at the subnet level?

I have different types of machines in a single VPC and was hoping to be able to give them hostnames that would identify which type they are.

Scrapez
Feb 27, 2004

Agrikk posted:

This is a perfectly reasonable request and one that I have heard countless times before.

At a very high level, it's a performance issue. Allowing DHCP option sets per VPC is one thing. Allowing option sets for subnets, which can exist at a ratio of several hundred to one, is something else.

But yeah, option sets for subnets would be awesome.

Yeah, I guess I understand that. Perhaps they could make it so that only subnets of a certain size would be allowed to have DHCP option sets.

Scrapez
Feb 27, 2004

Agrikk posted:

Speaking personally, I believe there should just be a fixed limit, say 5 or 10, of option sets per vpc. One could implement a default option set for the vpc and then allocate subnet option sets for special cases that could override the default.

But I don’t know the exact details from the networking guys, so I don’t know the real roadblocks to implementation.

Yeah that would be great. Second on my list of wants behind ELBs that can do UDP.

Scrapez
Feb 27, 2004

I'm trying to get my head around whether I need a Route 53 outbound endpoint, inbound endpoint, or both.

I have two VPCs. Each has its own DHCP option set associated, with hostnames and DNS resolution enabled.
VPC 1 has option set production.instance and VPC 2 has option set scrapez.com
I have a VPC peering connection setup between them.

I want to be able to resolve the records in my production.instance. hosted zone from my instance in the scrapez.com VPC.

i.e.:
I need Instance 1: ip-10-0-0-200.scrapez.com to be able to resolve SRV record _sip._udp.pool.production.instance. which has underlying values of: ip-10-100-73-19.production.instance. and ip-10-100-96-92.production.instance.

Is a Route 53 outbound endpoint from the originating VPC the way to accomplish this? Or an inbound endpoint to the target VPC? Or other?

Scrapez
Feb 27, 2004


I associated both VPCs with the private hosted zone production.instance but the following query still fails from the instances in the source VPC: nslookup -type=SRV _sip._udp.pool.production.instance

Server: 10.0.0.2
Address: 10.0.0.2#53

** server can't find _sip._udp.bvr.production.instance: NXDOMAIN

Scrapez
Feb 27, 2004

Docjowles posted:

Just to ask the super dummy question, that same query works fine within the other VPC?

Not dumb. I appreciate the feedback. It does work within the VPC.

Edit: Follow-up: associating the VPC with the private hosted zone DID resolve the issue. I just still had inbound and outbound endpoints set up that were breaking things. :negative:

Thanks, Docjowles!
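
For anyone else hitting this, the association itself is a single call per VPC; a sketch with placeholder IDs:
code:
# Let instances in the scrapez.com VPC resolve the production.instance private zone.
aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id Z123EXAMPLE \
  --vpc VPCRegion=us-east-1,VPCId=vpc-0abc123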

Scrapez fucked around with this message at 15:43 on Feb 13, 2019

Scrapez
Feb 27, 2004

When performing a describe-network-interfaces, is there a way to do wildcards in the description filter to return all matching ENIs?

For example, I have two ENIs with descriptions of: TestAdapter0 and TestAdapter1

Is there a way to do something like `aws ec2 describe-network-interfaces --filters Name=description,Values="TestAdapter*"`

Edit: Gosh I'm dumb... that does work. I just wasn't putting the double quotes around the value.

Scrapez
Feb 27, 2004

Is there a cost associated with Elastic Network Interfaces? I can't find anything that talks about pricing, so I think they're free to use, but I haven't found anything definitive.

Scrapez
Feb 27, 2004

Agrikk posted:

FYI-

Using a wildcard for the filter may result in multiple API calls being made in quick succession, which may result in RequestLimitExceeded errors depending on the number of entries returned, other filters, and other API activity in your account.

I'm not saying that it will happen, but it could happen depending on your use case.

So would it be better to set the description of all the ENIs to the same string (TestAdapter) and then instead do the query as: `aws ec2 describe-network-interfaces --filters Name=description,Values="TestAdapter"`

There's really no reason I need to do it as a wildcard. I had just planned to set descriptions as TestAdapter1, TestAdapter2, etc but it isn't really a requirement to do that.

Scrapez
Feb 27, 2004

Agrikk posted:

That is correct. If this process is going to be anything other than a one-off you should probably build a tagging scheme and do your search based on tags.

Right. Searching for them via tags does make much more sense. Then I can add unique descriptions. Thank you!
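
A sketch of the tag-based version; the tag key and value are just examples:
code:
# Tag the ENIs once, then filter on the tag instead of a wildcard description.
aws ec2 create-tags --resources eni-0abc123 eni-0def456 --tags Key=Role,Value=TestAdapter

aws ec2 describe-network-interfaces \
  --filters Name=tag:Role,Values=TestAdapter \
  --query 'NetworkInterfaces[].NetworkInterfaceId'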

Scrapez
Feb 27, 2004

Cloudformation's 200 resource limit is a real bummer. I wanted to use CloudFormer to create a cloudformation template with everything in my us-east-1 region to replicate it in us-west-2 but I have 337 resources. It would be nice if Cloudformer could recognize this and break the resources into multiple nested templates.

Scrapez
Feb 27, 2004

Internet Explorer posted:

For anyone who has gone through the AWS certification process, about how much studying did it take you? I see that it basically goes Foundational, Associate, Professional, with subcategories along the way. Would it be unrealistic to try to get an Associate cert in a month? Did you do the live trainings or were the self-paced materials enough? The recommended experience is 6 months for Foundational, 1 year for Associate, 2 years for Professional. Did you find that to be accurate or is it something you can pick up with only a surface level of AWS experience?

I just took and passed the certified solutions architect associate.

I used a combination of A Cloud Guru video courses and Whizlabs exams to study. It's hard to say how many days I studied, as life and work were so busy it was hard to dedicate a month continuously.

I think if you focused on studying every day for a month with those two, along with reading the white papers and doing the practice exercises, you'd have a decent chance of passing. It really just depends how quickly you learn and how well you retain knowledge.

The exam was hard and it isn't the type of exam where you can just memorize certain things and be good. You actually have to know what each AWS service does and how it can be used in conjunction with other services to solve an issue in the best and most cost effective manner.

Scrapez
Feb 27, 2004

CloudFormation drift detection... Does it just tell you that objects have changed since you launched your template, or is there a way for it to produce an edited CloudFormation template that includes the changes? Or a separate template that only includes the additions/changes?

Scrapez
Feb 27, 2004

Agrikk posted:

No.

CloudFormation launches itself and then is done. Any subsequent changes to the environment have to be monitored by other means.

Gotcha. It would be neat if they could sync up drift detection with cloudformer to have it automatically generate a replacement template.

As it is, it isn't possible to use CloudFormer to create a CloudFormation template of, say, your objects in us-east-1 and restore said template to us-east-2 without manually building some objects.
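
For what it's worth, what drift detection does give you from the CLI is a per-resource drift status rather than a corrected template; a sketch with a placeholder stack name:
code:
# Kick off drift detection, check its status, then list which resources have drifted.
DETECTION_ID=$(aws cloudformation detect-stack-drift --stack-name my-stack \
  --query StackDriftDetectionId --output text)

aws cloudformation describe-stack-drift-detection-status \
  --stack-drift-detection-id "$DETECTION_ID"

aws cloudformation describe-stack-resource-drifts --stack-name my-stack \
  --stack-resource-drift-status-filters MODIFIED DELETED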

Scrapez
Feb 27, 2004

I feel like I should be able to figure this out but I'm kind of stumped.

Trying to set up a NAT Gateway for a private subnet.

Have public subnet 10.10.1.0/24 which has an Internet Gateway attached. I've set up NAT Gateway 10.10.1.99 in this subnet.
Have private subnet 10.100.96.0/21. I'm trying to set up the route table for 0.0.0.0/0 to go through the NAT Gateway 10.10.1.99, but it is not listed in the NAT Gateway pulldown list.

The two VPCs that the two mentioned subnets are in have a VPC peering connection established between them, but I'm not sure that would have any impact on adding a route for the NAT gateway.

I'm guessing it's something glaringly obvious but I'm not seeing it. Anyone help me out?

Scrapez
Feb 27, 2004

You guys had it right. I was trying to share a NAT Gateway through a VPC peering connection which does not work. After I took a step back I kind of realized I didn't need multiple VPCs anyway. After consolidating into one, it works just fine. Thanks!
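
For reference, once everything is in one VPC the route itself is a single entry; a sketch with placeholder IDs:
code:
# Default route for the private subnet's route table through the NAT gateway.
aws ec2 create-route --route-table-id rtb-0abc123 \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-0def456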

Scrapez
Feb 27, 2004

Anyone ever run into the issue of receiving "Server refused our key" when attempting to log in to a machine? It worked fine a couple days ago, and everyone with access to the machine swears they've not been in it, but I receive that when trying to log in with the appropriate key now.

AWS provides this as a resolution: place code into user data to write the public key into the authorized_keys file:
https://aws.amazon.com/premiumsupport/knowledge-center/ec2-server-refused-our-key/

This does not work for me. I thought perhaps the server didn't have the cloud-init package installed, so I put `yum -y install cloud-init` into user data, started the instance, then stopped it and tried adding the key again, to no avail.

I'm at a bit of a loss here. The only thing I can think of is that the /home/centos/.ssh/authorized_keys file is somehow corrupt.

Anyone have any ideas? Of course I did not take an AMI of the machine when it was in a healthy state as I should have.

Scrapez
Feb 27, 2004

deedee megadoodoo posted:

You can take an image now then fire up an instance using that AMI and use a known good key.

I tried that and still got the same error. I did use the same key when I launched it. I guess I could try it with a different key.

Edit: Created a new key. Launched an instance using the AMI of the "bad server" and still receive "Server refused our key" when attempting to login.

I've tried using both cloud-init and a general bash script to copy the public key into the /home/centos/.ssh/authorized_keys file and neither seems to work. Would that indicate that the file could be corrupt? User data executes as root, right? So it shouldn't be a permissions issue? Not that I changed permissions on that file or directory structure anyway. This just happened out of the blue, seemingly for no reason.

Scrapez fucked around with this message at 03:03 on Aug 30, 2019

Scrapez
Feb 27, 2004

Docjowles posted:

Riffing on your corrupt file theory, openssh is (rightly) very paranoid about file permissions. So maybe the .ssh dir or authorized_keys file is being created with inappropriate ownership or permissions? It should be 700 / 600 respectively and owned by the same user as the parent home directory. It's easy for these to be set overly broad in provisioning scripts because the defaults are usually like 755 / 644. If those didn't exist at all and were created as root during cloud-init, they probably have the wrong ownership and permissions unless you are actually logging in directly as root. Which is a bad idea, and also you're probably getting blocked by your sshd_config denying root login.

Also a simple thing, but make sure you are using the right username for your AMI.

That makes sense. This issue, I've discovered, is actually impacting any machine that uses this particular key pair. I'm using centos to log in and my ppk file hasn't changed. I had someone else try to log in using their ppk from a completely different machine and they also get the same error. It makes me think that the key pair within AWS, where it shows the fingerprint, has become corrupt, or some sort of weirdness is going on.

For one of the machines, I heard from someone else that they still have an active SSH session up, so I'm asking them to send me the contents of the /home/centos/.ssh/authorized_keys file so I can compare it to the original public key.

Scrapez
Feb 27, 2004

So thankfully someone still had an SSH connection up to one of the impacted machines. I was able to jump onto her session and figure out that the permissions of /home/centos must have been changed. Once I changed them to 751, everything works fine on that machine.

Unfortunately, I still have one other machine in this same state. I've tried adding a script to user data to change permissions of /home/centos to 751 but that didn't seem to help. Will user data allow you to script the change of file permissions, or is it somehow restricted from doing that as it could be a security risk?
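
For the record, a sketch of the fuller reset this kind of fix usually needs, covering ownership as well as modes (run as root); note that user data only runs automatically on first boot unless cloud-init is told to re-run it:
code:
#!/bin/bash
# Reset ownership and permissions on the pieces sshd cares about for the centos user.
chown -R centos:centos /home/centos
chmod 751 /home/centos
chmod 700 /home/centos/.ssh
chmod 600 /home/centos/.ssh/authorized_keys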

Scrapez
Feb 27, 2004

AWS provided me with a script that had a bit more to it. Running that resolved the issue.

On another note, is there a way to return only Elastic IPs that are unattached through the CLI? I see where I can do an `aws ec2 describe-addresses` to return all Elastic IPs. If I have to, I can script around that, but thought perhaps I'm missing an easier way to only return unattached EIPs.

Writing a user-data script that will go find an unattached EIP and attach it to the EC2 instance when it starts.

Scrapez
Feb 27, 2004

This is a voip telephony application running on the ec2 instances and our outbound carrier has to whitelist IPs to allow them to make calls.

Scrapez
Feb 27, 2004

Cancelbot posted:

Based on this: https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-addresses.html

Could you apply a filter where association-id is null/empty string? Or pipe the JSON into jq where that attribute is missing?

This is what I ended up using:

aws ec2 describe-addresses --region us-east-2 --query 'Addresses[?AssociationId==null]' | jq -r '.[] | .PublicIp' | head -n 1
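
The rest of that user-data flow might look like this; the region is the one from the command above, and the instance ID comes from the metadata service (a naive sketch, with no protection against two instances grabbing the same EIP at once):
code:
#!/bin/bash
# Find a free Elastic IP and attach it to this instance at boot.
REGION=us-east-2
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

EIP_ALLOC=$(aws ec2 describe-addresses --region "$REGION" \
  --query 'Addresses[?AssociationId==null] | [0].AllocationId' --output text)

aws ec2 associate-address --region "$REGION" \
  --instance-id "$INSTANCE_ID" --allocation-id "$EIP_ALLOC"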

Scrapez
Feb 27, 2004

Is there a way to move files into an EFS directly from my desktop machine?

Right now, I have to SCP the files up to an EC2 instance and then copy them over to the mounted EFS.

Scrapez
Feb 27, 2004

I did see DataSync. That seems like the route I'll have to go.

To expand on my reason for needing this: I've set up our environment in the secure way that AWS suggests, with a bastion EC2 host in a public subnet and all of our EC2 machines in private subnets. The EFS storage is mounted on all the EC2 machines in the private subnets. So, if I want to transfer something up, I have to scp the files to the bastion host and then scp them from there over to the EC2 instance in the private subnets.

I don't want to attach the EFS to the bastion as the EFS may contain sensitive data that I wouldn't want accessible from a machine that's in a public subnet.

I'll give DataSync a try and see how that does. The crap part is that you have to pay for the service, but it does seem very cheap (4 cents per GB of transfer).

Scrapez
Feb 27, 2004

I have an environment setup in AWS where I have a bastion instance in a public subnet and multiple other ec2 instances in private subnets. I have an EFS setup and mounted on all the machines in the private subnets.

What is the best method for transferring files between my PC and the EFS? As it sits now, to get a file from the EFS to my local machine, I have to scp it out to the bastion instance and then scp it from the bastion back to my PC.

I noticed AWS DataSync but that seems to be for copying huge swaths of data to an EFS in one fell swoop rather than transferring individual log files from time to time like I'm trying to do.

Is there a better way than secure copying the file twice to get it back to my machine?
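
One option that avoids the double copy is letting scp tunnel through the bastion in a single command with ProxyJump; a sketch where the hostnames, user, and EFS mount path are all assumptions:
code:
# Pull a log file off the EFS mount on a private instance, hopping through the bastion.
scp -o ProxyJump=centos@bastion.example.com \
    centos@10.100.96.10:/mnt/efs/logs/app.log .

# Or push a file up to the EFS the same way.
scp -o ProxyJump=centos@bastion.example.com \
    ./config.tar.gz centos@10.100.96.10:/mnt/efs/uploads/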
