|
Yeah, we've been kicking that around too. My other idea is to use Terraform. Ideally this would be done serverless with Lambda and boto. Anyway, thanks for the input, everyone. I'm gonna go yell at R&D to get with the damn program. Or yell at someone else to stop calling our stuff cloud ready.
|
# ? Feb 19, 2020 02:41 |
|
|
|
Maybe look at AWS CDK. It's basically a tool to generate CloudFormation. It seems to me like you're going for ASGs because "that's the thing that spins up a lot of servers at once" but you can just write a loop in CDK and make whatever you want.
|
# ? Feb 19, 2020 04:45 |
|
Matt Zerella posted:The idea is to spin this up, do a bunch of processing and tear it down. The Linux to Windows link is due to a service that only runs on Windows and consumes data from the Linux machine and sends it back.

If this is a batch processing job, that makes it much easier and you don't need an ASG for this at all. This is a simple fixed infrastructure design. Your workflow looks something like this:

- A workload gets dumped into SQS
- A Lambda is triggered that spins up and tags a preset number of Linux boxes and the same number of Windows boxes
- Each Windows box finds its Linux "mate" via tagging and creates a secure connection
- The Linux boxes drain the queue and shove workloads to their Windows mates for processing
- Upon completion of the queue, instances shut down and terminate

Note that termination is optional. Since this appears to be a recurring task, you could just as easily save yourself some baking time by shutting the instances down but preserving them until they are powered up by the next batch.
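A rough sketch of that Lambda in Python with boto3 — the AMI IDs, instance type, and tag names here are all invented for illustration, not anything from the thread:

```python
import json

def pair_tags(batch_id, pair_count):
    """Build the tag sets for each Linux/Windows pair.

    Each pair shares a unique 'Mate' tag so the Windows box can
    find its Linux partner with a DescribeInstances tag filter.
    """
    tags = []
    for i in range(pair_count):
        mate = f"{batch_id}-pair-{i}"
        tags.append({"os": "linux", "Mate": mate})
        tags.append({"os": "windows", "Mate": mate})
    return tags

def launch_pairs(batch_id, pair_count=4):
    """Spin up tagged instance pairs. Called from the SQS-triggered Lambda."""
    import boto3  # available in the Lambda runtime
    ec2 = boto3.client("ec2")
    for t in pair_tags(batch_id, pair_count):
        # Placeholder AMI IDs -- substitute your baked images.
        ami = "ami-LINUX-PLACEHOLDER" if t["os"] == "linux" else "ami-WINDOWS-PLACEHOLDER"
        ec2.run_instances(
            ImageId=ami,
            InstanceType="c5.xlarge",
            MinCount=1,
            MaxCount=1,
            TagSpecifications=[{
                "ResourceType": "instance",
                "Tags": [{"Key": k, "Value": v} for k, v in t.items()],
            }],
        )

def handler(event, context):
    # SQS trigger: one batch per message
    for record in event["Records"]:
        body = json.loads(record["body"])
        launch_pairs(body["batch_id"])
```

The shared "Mate" tag is what lets each Windows box run DescribeInstances with a tag filter to locate its Linux partner.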
|
# ? Feb 19, 2020 05:22 |
|
Agrikk posted:If this is a batch processing job, it makes it much easier and you don't need an ASG for this at all. But this is a simple fixed infrastructure design:

The idea is that part of the trigger process will spin up multiple pairs based on time need. We currently have one node pair testing batch consumption to get a general feel for where we need resources (memory, disk IO, CPU) so we can give a general "if you need it this fast you need this many nodes" estimate. Though I guess we could set a ceiling and just manually spin up pairs. And yeah, shutting down could work too since we would be doing this on a weekly basis. Thanks for this. And thank you to everyone else who's chimed in.
|
# ? Feb 19, 2020 05:30 |
|
I can also think of shitty ways to do this with Mesos, Nomad, or Kubernetes if you want to get even weirder.
|
# ? Feb 19, 2020 05:49 |
|
Yeah if mangling your infrastructure is in scope, you just need two StatefulSets and a CronJob. Once your mixed-OS kube cluster is up and running.
|
# ? Feb 19, 2020 07:34 |
|
What's the general opinion on X-Ray? I'm looking at tracing and was leaning towards OpenTracing-compatible tools and vendors, but X-Ray is very attractive price-wise. Anyone have any experience with it and want to tell me why it's a bad idea?
|
# ? Feb 19, 2020 07:37 |
|
The devs at work are in love with tracing everything all the time. They're completely dependent on being able to poke at Jaeger to see what happened when shit goes wrong. So we ingest about a billion traces per day. Total spend is maybe 5% of what X-Ray would cost at that scale. If you're calculating the price assuming sampling, I recommend you don't. Trace everything, put it all in Jaeger or whatever, and enjoy life again. If you're going to have low volume even without sampling, take a look at https://github.com/opentracing-contrib/java-xray-tracer. I haven't used it but it's probably fine, whatever, and then you're not tied to X-Ray.
|
# ? Feb 19, 2020 09:04 |
|
I feel better about my harebrained design now that a TAM
|
# ? Feb 19, 2020 14:14 |
|
Docjowles posted:I feel better about my harebrained design now that a TAM Careful, though. My idea came while in full vacation mode while waiting for my buffalo wings to arrive. I reserve the right to make fun of my own idea when I
|
# ? Feb 20, 2020 05:01 |
|
I'm at work, and the design is good. It's not been compromised by buffalo wings hunger.
|
# ? Feb 20, 2020 14:23 |
|
Nomnom Cookie posted:The devs at work are in love with tracing everything all the time. They're completely dependent on being able to poke at Jaeger to see what happened when shit goes wrong. So we ingest about a billion traces per day. Total spend is maybe 5% of what X-Ray would cost at that scale. If you're calculating the price assuming sampling, I recommend you don't. Trace everything, put it all in Jaeger or whatever, and enjoy life again.

Thanks, that's good info. We're going to start with Jaeger anyway so we have some idea of the maintenance effort of doing it ourselves. The bigger concern is the maintenance required for the ES or Cassandra backend.
|
# ? Feb 20, 2020 21:12 |
|
Cerberus911 posted:Thanks, that's good info. We're going to start with Jaeger anyway so we have some idea of the maintenance effort of doing it ourselves. The bigger concern is the maintenance required for the ES or Cassandra backend.

Our experience with the C* backend was poor and I don't recommend it. Switching from C* to ES for span storage cut the storage CPU/memory usage approximately in half, cut collector CPU by about 90%, and fixed persistent congestion in the collectors' queues for our production Jaeger instance. Recent ES versions can do recovery in a reasonable way and clean up old indexes automatically, so in the few months since switching we've had zero problems after dialing in cluster and index settings. I can DM you the settings we ended up with (recommended if you haven't used ES before; the out-of-the-box experience with Jaeger is not good).
|
# ? Feb 21, 2020 05:52 |
|
Nomnom Cookie posted:Our experience with the C* backend was poor and I don't recommend it. Switching from C* to ES for span storage cut the storage CPU/memory usage approximately in half, cut collector CPU by about 90%, and fixed persistent congestion in the collectors' queues for our production Jaeger instance. Recent ES versions can do recovery in a reasonable way and clean up old indexes automatically, so in the few months since switching we've had zero problems after dialing in cluster and index settings. I can DM you the settings we ended up with.

Can you send me the ES settings as well? I'm always curious to know more about ES vs. C* activity.
|
# ? Feb 21, 2020 06:05 |
|
Nomnom Cookie posted:I can DM you the settings we ended up with (do recommend if you haven't used ES before, the out of the box experience with Jaeger is not good).

Thanks, would really appreciate that!
|
# ? Feb 21, 2020 17:51 |
|
I’m an ES stack contributor and would appreciate the settings to make the docs for tracing not suck
|
# ? Feb 22, 2020 16:57 |
Is this the place to ask about how to do something particular with Terraform?
|
|
# ? Feb 25, 2020 18:08 |
|
CyberPingu posted:Is this the place to ask about how to do something particular with Terraform?

Sure, why not.
|
# ? Feb 25, 2020 20:04 |
|
There's also a lot of people with
|
# ? Feb 25, 2020 20:21 |
Ah ok, might be better asking there but I'll pop it here too. We build our infrastructure as IaC, using Terragrunt/Terraform. I've just finished building a module that creates CloudTrail and associated logging for CT. If I wanted to use a log group that gets created by that module when running Terragrunt, how would I go about that? Would it need a dependency?
|
|
# ? Feb 25, 2020 20:56 |
|
Is there a way to nest customer and AWS managed IAM policies? We have a ton of default permissions that we need to add to every machine role to enable SSM, CloudWatch, and various other security logging and monitoring tools that write to S3, stuff like that, and I wanted to create a single customer-managed "default" policy that can then be attached to an instance profile / machine roles for various instances.

I could create a combined JSON of all of the access needed, but that doesn't allow us to leverage AWS managed policies and it'd be static (and much harder to read). I didn't see a way to embed the ARN of AWS managed policies, but it's possible I missed it. The alternative seems to be to attach maybe 6-7 different AWS managed policies directly to every single role we have, plus our custom ones, which seems a little cumbersome - maybe deliberately so, since extensive use of machine roles isn't exactly best practice? Could probably ask our TAM as well, but I don't really know the line between proserv and support.

Bhodi fucked around with this message at 00:16 on Feb 26, 2020 |
# ? Feb 26, 2020 00:12 |
|
We attach tons of customer managed and AWS managed policies to stuff all the time. It's better IMO because it decouples the creation of the policy from its application to a principal. For auditing, for example, you can have controls around the contents of the policies (especially restricting updates to them) and separate controls and permissions around attaching them. To verify that machines are meeting compliance, you just list the attached policies and compare. If you do any work with complex cross-account permissions you'll probably end up doing stuff like this anyway with permissions boundaries, so the potential for code re-use is pretty high.

Embedding policies is something we also do, but we do it with jinja2. It sucks in exactly the way you describe, and I found the list of attached managed policies a lot easier to work with, but it's very possible to just render a policy document from some other source.

CyberPingu posted:Ah ok might be better asking there but I'll pop it here too.

Just to double check, are you familiar with Terraform outputs and how you consume them from modules? I don't use Terragrunt, but I took a quick look at the docs, and it seems like consuming module outputs is basically just normal Terraform.
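The compliance check described above boils down to a set difference over attached policy ARNs. A sketch with boto3 — the role and policy names are placeholders:

```python
def missing_policies(attached_arns, required_arns):
    """Return the required policy ARNs that are not attached."""
    return sorted(set(required_arns) - set(attached_arns))

def audit_role(role_name, required_arns):
    """List a role's attached managed policies and compare against the
    required baseline. Paginates, since a role can have more attached
    policies than one API page returns."""
    import boto3
    iam = boto3.client("iam")
    attached = []
    for page in iam.get_paginator("list_attached_role_policies").paginate(RoleName=role_name):
        attached += [p["PolicyArn"] for p in page["AttachedPolicies"]]
    return missing_policies(attached, required_arns)
```

Run `audit_role` over every role and any non-empty result is a compliance gap.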
|
# ? Feb 26, 2020 00:23 |
|
Terragrunt is just a wrapper around Terraform that gives you some convenience features and syntactic sugar. And yeah, I think module outputs are the solution to the original question: make the ID of the resource you need an output, then refer to that in the resource that needs to use it as a parameter.
|
# ? Feb 26, 2020 01:05 |
Ah yeah, that would make sense. I thought about outputs but got a bit lost trying to tie it all together.
|
|
# ? Feb 26, 2020 08:30 |
|
If you are consuming Terraform output from some other application via the state file, I like using SSM parameters instead. We have some Terraform stuff that ultimately gets used by Python and bash scripts, and it's much easier for me to get it that way. We also do the opposite and use Terraform to consume SSM parameters from outside apps.
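A minimal sketch of the consuming side in Python, assuming Terraform published the value with an `aws_ssm_parameter` resource (the parameter name here is made up):

```python
def parameter_value(response):
    """Pull the value out of a GetParameter response."""
    return response["Parameter"]["Value"]

def read_output(name="/myapp/prod/db_endpoint"):
    """Read a value Terraform published to SSM Parameter Store."""
    import boto3
    ssm = boto3.client("ssm")
    # WithDecryption is harmless for String parameters and required
    # for SecureString ones.
    return parameter_value(ssm.get_parameter(Name=name, WithDecryption=True))
```

The script never touches the state file, so you can lock the state bucket down to Terraform alone.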
|
# ? Feb 26, 2020 13:43 |
deedee megadoodoo posted:If you are consuming terraform output via state file from some other application I like using ssm parameters. We have some terraform stuff that ultimately gets used by python and bash scripts and it’s much easier for me to get it that way. We also do the opposite and use terraform to consume ssm parameters from outside apps.

I found a way to do it via an output that is then fed into a dependency in the .hcl file.
|
|
# ? Feb 26, 2020 13:58 |
|
Bhodi posted:Could probably ask our TAM as well, but I don't really know the line between proserv and support.

Not responding directly to your question, but more generally: always open a support case. You pay for support, why not use it? Open a low-severity general information case and select the web option. Paste the body of your post into the case, click send, and lean back, smug in a job well done. In 1-2 days you will have a nicely formatted and annotated response with an answer to your question.

If you aren't sure whether you should open a support case, always err on the side of opening one. Please give us a chance to help you. We know a thing or two because we've seen a thing or two. Hell, I work here and I open support cases when I don't know something. Cloud Support Engineers are pretty good, yo.
|
# ? Feb 26, 2020 17:32 |
|
Always reach out to your TAM. You never know what fun toys are tucked away behind an NDA.
|
# ? Feb 26, 2020 18:15 |
|
I'm trying to help our development team structure their AWS setup a bit better, and have been reading up on AWS Organizations and SSO since it started supporting SAML and SCIM from external directories rather than having to run a managed AD or an AD connector. Currently everything happens in one AWS account and the dev team handle their own account creation/revocation, which isn't going to be an option as the team grows. From what I've read the way to go is:

- Enable AWS Organizations and enable SSO in the master account, link to Azure AD, and assign roles to AD groups
- Create a new AWS account for each purpose (dev, test, prod, anything being developed for a third party so the account can be passed over to them if required)
- Don't use IAM accounts any more; use temporary IAM role credentials with CLI tools

Is it recommended to create a new AWS account to use for the master account role in Organizations, and then invite the current AWS account as a member? I vaguely remember reading this somewhere but I can't find any reference to it in the AWS docs now.

Will this all end in tears, or is Organizations w/SSO a mature offering now? I know I am going to get a load of shit from the dev team the first time they're trying to follow some AWS docs and there's a disclaimer about it not working for SSO users (think of the amount of Google stuff that doesn't work with G Suite accounts). If I need to hold off to have a smoother implementation experience then I can do that; it's already going to be enough of a struggle to stop this team doing whatever the fuck they want, but it's achievable if the end result is positive.
|
# ? Feb 27, 2020 21:47 |
|
Sorry for the terse phone reply, but yes, all of what you posted is a good idea and the services are solid. That includes making a new account to serve as the organizational master that does nothing except authentication and billing. AWS promotes this pattern all over the place in white papers, re:Invent talks, the guidance we got from our account team when starting out, etc.
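For the "temporary role credentials with CLI tools" piece, the underlying call is STS AssumeRole. A sketch in Python — the role ARN and session name are placeholders:

```python
def session_kwargs(creds):
    """Map an STS Credentials dict to boto3.Session keyword arguments."""
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }

def assume_role(role_arn="arn:aws:iam::123456789012:role/dev-admin"):
    """Trade your current identity for short-lived credentials
    in a member account, and return a session scoped to them."""
    import boto3
    sts = boto3.client("sts")
    resp = sts.assume_role(RoleArn=role_arn, RoleSessionName="cli-session")
    return boto3.Session(**session_kwargs(resp["Credentials"]))
```

The credentials expire on their own, so nothing long-lived ever lands in the member accounts.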
|
# ? Feb 27, 2020 23:19 |
|
Really simple question for the experts here I'm sure, but I have an ECS cluster and the underlying EC2 instance (just one at the moment, it's an in-development project) is using an AMI that now has a more recent version available. I know how to change the AMI and so on, but what is the easiest way to quickly locate the AMI ID for the latest version of this AMI? Searching for the AMI brings up a bunch of different ones and there is no way to sort by version, or even creation date.
|
# ? Feb 27, 2020 23:54 |
|
a hot gujju bhabhi posted:Really simple question for the experts here I'm sure, but I have an ECS cluster and the underlying EC2 instance (just one at the moment, it's an in-development project) is using an AMI that now has a more recent version available. I know how to change the AMI and so on, but what is the easiest way to quickly locate the AMI ID for the latest version of this AMI? Searching for the AMI brings up a bunch of different ones and there is no way to sort by version, or even creation date.

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-ami-versions.html says you can ask SSM for the latest recommended ECS-optimized AMI:

code:
aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux-2/recommended

and you can pull out just the AMI ID with the image_id sub-parameter:

code:
aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id --query "Parameters[0].Value" --output text
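If you'd rather do the same lookup from boto3 instead of the CLI, something like this should work — the parameter path is the documented one for the Amazon Linux 2 ECS-optimized AMI, and the `recommended` value is a JSON blob that includes an `image_id` field:

```python
import json

def image_id(param_value):
    """The 'recommended' parameter's value is a JSON blob; pull out image_id."""
    return json.loads(param_value)["image_id"]

def latest_ecs_ami():
    """Fetch the current recommended ECS-optimized AMI ID from SSM."""
    import boto3
    ssm = boto3.client("ssm")
    resp = ssm.get_parameter(
        Name="/aws/service/ecs/optimized-ami/amazon-linux-2/recommended"
    )
    return image_id(resp["Parameter"]["Value"])
```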
|
# ? Mar 2, 2020 00:09 |
|
Skier posted:https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-ami-versions.html says Oh nice, thanks!
|
# ? Mar 2, 2020 01:12 |
|
What's a good way to automatically compress an output file from a Glue Python script? The files could run rather large (~100GB) and I'd prefer to stream the resulting .csv or .parquet or whatever straight into a .gz file. I've written Java code to take files the other way, extracting data in-memory without using permanent compute space, but streaming a zip and chunking it into a multipart upload to S3 properly is hurting my brain, so I figured I'd ask here. Obviously a Python solution would be preferable so I can just ball it into the Glue job, but at this point I'd take any example and work from there.

Edit: Apparently using S3DistCp against an EMR cluster is recommended. I looked into the backend code and it essentially just uses temporary disk space to compress the file(s) and then does a standard multipart upload. This works well enough, but my dream of having the whole thing done in memory to avoid the need for dedicated space continues.

PierreTheMime fucked around with this message at 14:41 on Mar 10, 2020 |
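For the fully in-memory version, one approach is to write gzip output into a BytesIO buffer and flush it to an S3 multipart upload whenever it crosses the 5 MB minimum part size. A sketch — the bucket and key are placeholders, and this hasn't been run inside Glue itself:

```python
import gzip
import io

MIN_PART = 5 * 1024 * 1024  # S3 multipart parts (except the last) must be >= 5 MB

def gzip_chunks(chunks, min_part=MIN_PART):
    """Compress an iterable of bytes chunks, yielding gzipped parts of
    at least min_part bytes (the final part may be smaller). The
    concatenation of all yielded parts is one valid gzip stream."""
    buf = io.BytesIO()
    gz = gzip.GzipFile(fileobj=buf, mode="wb")
    for chunk in chunks:
        gz.write(chunk)
        if buf.tell() >= min_part:
            yield buf.getvalue()
            buf.seek(0)
            buf.truncate()
    gz.close()  # flushes the gzip trailer into the buffer
    if buf.tell():
        yield buf.getvalue()

def upload_gzipped(chunks, bucket="my-bucket", key="out/data.csv.gz"):
    """Stream compressed parts into an S3 multipart upload,
    never holding more than one part in memory."""
    import boto3
    s3 = boto3.client("s3")
    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts = []
    for i, part in enumerate(gzip_chunks(chunks), start=1):
        r = s3.upload_part(Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                           PartNumber=i, Body=part)
        parts.append({"PartNumber": i, "ETag": r["ETag"]})
    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                                 MultipartUpload={"Parts": parts})
```

Peak memory stays around one part plus the compressor's internal buffer, regardless of the total file size.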
# ? Mar 9, 2020 18:11 |
|
Agrikk posted:What’s the position?

Persistence does seem to be the key, as I've now got an interview for a DevOps consultant role lined up.
|
# ? Mar 10, 2020 19:57 |
|
PierreTheMime posted:Edit:
|
# ? Mar 11, 2020 01:53 |
|
Did some Googling of this earlier today but couldn't find an answer: is it possible to delete AWS reserved tags from resources? I've got a bunch of EC2 instances that were spun-up from CFN templates that have since been deleted however the instances all still have the AWS reserved tags for CFN on them. If you try to delete the tags via the console it just spits an error about how you can't delete AWS reserved tags.
|
# ? Mar 13, 2020 13:04 |
|
Is cloudacademy a good resource for breaking into AWS cloud stuff? Or is it acloudguru all the way?
|
# ? Mar 17, 2020 01:02 |
|
Pile Of Garbage posted:Did some Googling of this earlier today but couldn't find an answer: is it possible to delete AWS reserved tags from resources? I've got a bunch of EC2 instances that were spun-up from CFN templates that have since been deleted however the instances all still have the AWS reserved tags for CFN on them. If you try to delete the tags via the console it just spits an error about how you can't delete AWS reserved tags.

I'm not an AWS certified professional certified community hero, but I think the error message answered your question.
|
# ? Mar 17, 2020 03:58 |
|
|
|
Nomnom Cookie posted:I'm not an AWS certified professional certified community hero

Send me your contact details, because I can't nominate a community hero with just an SA username.
|
# ? Mar 17, 2020 04:19 |