Continuous Integration/build engineering/devops thread

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Continuous Integration/build engineering/devops thread


Methanar: Sep 26, 2013; by the sex ghost

the elbv2 API is kind of a pain to work with compared to the old elb one.

I'm probably going to need to write Python to traverse the extra level of indirection that target groups add, whereas I did my thing easily in jq for the old ELB API.

code:

# list classic ELBs that have no registered instances
jq '.LoadBalancerDescriptions[]
    | select(.Instances | length == 0)
    | {LoadBalancerName, Instances}' elb.json
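For elbv2 the extra hop is LB -> target group -> targets. Here is a minimal Python sketch. It assumes you've already saved the `LoadBalancers`, `TargetGroups`, and per-target-group `TargetHealthDescriptions` lists from `aws elbv2 describe-load-balancers`, `describe-target-groups`, and `describe-target-health` (or from the boto3 `elbv2` client methods of the same names). The function name, the `health_by_tg` mapping, and the sample ARNs are hypothetical. It reports any load balancer whose attached target groups have no registered targets, which is the closest analogue of the jq filter above.

```python
from typing import Dict, List


def empty_load_balancers(
    load_balancers: List[dict],
    target_groups: List[dict],
    health_by_tg: Dict[str, List[dict]],
) -> List[str]:
    """Names of elbv2 load balancers with no registered targets behind them.

    load_balancers: "LoadBalancers" from describe-load-balancers
    target_groups:  "TargetGroups" from describe-target-groups
    health_by_tg:   TargetGroupArn -> "TargetHealthDescriptions" from
                    describe-target-health (hypothetical layout)
    """
    # Start every LB at zero, then add each target group's target count to
    # every LB it is attached to (a TG can be shared by several LBs).
    targets_per_lb = {lb["LoadBalancerArn"]: 0 for lb in load_balancers}
    for tg in target_groups:
        n_targets = len(health_by_tg.get(tg["TargetGroupArn"], []))
        for lb_arn in tg.get("LoadBalancerArns", []):
            if lb_arn in targets_per_lb:
                targets_per_lb[lb_arn] += n_targets

    return [
        lb["LoadBalancerName"]
        for lb in load_balancers
        if targets_per_lb[lb["LoadBalancerArn"]] == 0
    ]


if __name__ == "__main__":
    # Made-up, heavily abbreviated ARNs purely for illustration.
    lbs = [
        {"LoadBalancerArn": "arn:lb/a", "LoadBalancerName": "a"},
        {"LoadBalancerArn": "arn:lb/b", "LoadBalancerName": "b"},
    ]
    tgs = [{"TargetGroupArn": "arn:tg/1", "LoadBalancerArns": ["arn:lb/a"]}]
    health = {"arn:tg/1": [{"Target": {"Id": "i-0123", "Port": 80}}]}
    print(empty_load_balancers(lbs, tgs, health))  # ['b']
```

Load balancers with no target groups at all are also reported, since their count never leaves zero.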

# ? Jun 3, 2021 21:42


xpander: Sep 2, 2004

CyberPingu posted:

Hi all,

Don't suppose anyone has, or knows where to find, a handy script that checks for AWS creds that haven't been used in >90 days and disables them?

I'm trying to automate this into a Lambda function, as sack doing this manually.

Good news! There's a Config rule that'll do exactly this: https://docs.aws.amazon.com/config/latest/developerguide/iam-user-unused-credentials-check.html

# ? Jun 4, 2021 19:25

CyberPingu: Sep 15, 2013; If you're not striving to improve, you'll end up going backwards.

xpander posted:

Good news! There's a Config rule that'll do exactly this: https://docs.aws.amazon.com/config/latest/developerguide/iam-user-unused-credentials-check.html

Awesome. Now I need to turn that into Terraform code

# ? Jun 5, 2021 12:16

drunk mutt: Jul 5, 2011; I just think they're neat

CyberPingu posted:

Awesome. Now I need to turn that into Terraform code

Seems like it'd be something you'd want to do in Ansible and not Terraform.

The function is a configuration of a resource, not a provision of one; and I would see it as an anti-pattern to force TF to do it. Do you really care about managing state for user IAM policies?

Edit: To correct, user policies are weird in my head, but service policies are a different thing and would be handled during resource provisioning.

drunk mutt fucked around with this message at 00:45 on Jun 7, 2021

# ? Jun 7, 2021 00:43

Blinkz0rz: May 27, 2001; MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS

drunk mutt posted:

Seems like it'd be something you'd want to do in Ansible and not Terraform.

The function is a configuration of a resource, not a provision of one; and I would see it as an anti-pattern to force TF to do it. Do you really care about managing state for user IAM policies?

Edit: To correct, user policies are weird in my head, but service policies are a different thing and would be handled during resource provisioning.

Static configurations, such as how you'd configure an AWS Config rule, should probably be in TF in the same way that ALB SSL policies should be. No need to add additional overhead to something that's either on or off, and whose state shouldn't change unless the rule configuration changes.

# ? Jun 7, 2021 00:57

drunk mutt: Jul 5, 2011; I just think they're neat

Blinkz0rz posted:

Static configurations, such as how you'd configure an AWS Config rule, should probably be in TF in the same way that ALB SSL policies should be. No need to add additional overhead to something that's either on or off, and whose state shouldn't change unless the rule configuration changes.

For me this isn't a static configuration because it's directly related to a user's access policy.

I agree, though, that general IAM policies should be handled within TF, but associating users to them and whatnot becomes more of a configuration than a provision in my head.

# ? Jun 7, 2021 01:28

Blinkz0rz: May 27, 2001; MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS

drunk mutt posted:

For me this isn't a static configuration because it's directly related to a user's access policy.

I agree, though, that general IAM policies should be handled within TF, but associating users to them and whatnot becomes more of a configuration than a provision in my head.

No, what they were asking about, if I understood correctly, is static configuration: enabling an AWS Config rule with settings. You can do it in TF with the aws_config_config_rule resource, where the source block looks like this:

code:

source {
  owner             = "AWS"
  source_identifier = "IAM_USER_UNUSED_CREDENTIALS_CHECK"
}

Now, if you're referring to managing IAM users with TF, that's kind of a moot point, because really you shouldn't be using IAM users at all.

# ? Jun 7, 2021 02:22

CyberPingu: Sep 15, 2013; If you're not striving to improve, you'll end up going backwards.

Since we are a fully IaC house it needs to be done in TF if possible.

Thanks Blinkz0rz, I'll take a look at that.

We only have IAM users to set up creds for console access via cli.

I might hit a snag, as I'm not sure whether disabling a user in the console prevents their CLI access, since the "unused creds for 90 days" check only looks at when they logged into the console, not any CLI-based work.

# ? Jun 7, 2021 08:16

JehovahsWetness: Dec 9, 2005; bang that shit retarded

Just to be clear, IAM_USER_UNUSED_CREDENTIALS_CHECK / AWS Config only marks the IAM User as NON_COMPLIANT when their keys cross the maxCredentialUsageAge threshold. It won't disable the key or anything. If you want something to actually happen beyond seeing it turn red in the Config Dashboard you'll need to wire up the SNS Topic that Config notifies for changes w/ a Lambda and do it there.

We actually swapped a lot of the managed Config rules with Custom rules at the org level because the managed rules don't have exemptions or are too simple for a lot of our use cases. Custom rules also allow us to keep the exemptions in git w/ the rules and everything so we have change histories around them.
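Here is a minimal sketch of the "actually disable it" half. Instead of reacting to the Config/SNS notification, it assumes a scheduled Lambda that sweeps IAM directly with boto3 (`list_users`, `list_access_keys`, `get_access_key_last_used`, `update_access_key`). Several details are assumptions: the 90-day threshold, falling back to the key's `CreateDate` when it has never been used, the `DRY_RUN` environment variable, and all function names. It only deactivates access keys and leaves console passwords alone.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

MAX_UNUSED = timedelta(days=90)  # assumed threshold, mirroring the Config rule's 90 days


def is_stale(created: datetime, last_used: Optional[datetime], now: datetime,
             max_unused: timedelta = MAX_UNUSED) -> bool:
    """Stale if the last use (or creation, for never-used keys) is older than max_unused."""
    reference = last_used if last_used is not None else created
    return now - reference > max_unused


def stale_keys(keys: Iterable[dict], now: datetime) -> List[dict]:
    """Filter records shaped like list_access_keys' AccessKeyMetadata entries,
    with an optional "LastUsedDate" merged in from get_access_key_last_used."""
    return [
        k for k in keys
        if k["Status"] == "Active"
        and is_stale(k["CreateDate"], k.get("LastUsedDate"), now)
    ]


def disable_stale_keys(iam, now: Optional[datetime] = None,
                       dry_run: bool = True) -> List[str]:
    """Sweep all IAM users; `iam` is assumed to be boto3.client("iam")."""
    now = now or datetime.now(timezone.utc)
    flagged = []
    for page in iam.get_paginator("list_users").paginate():
        for user in page["Users"]:
            keys = iam.list_access_keys(UserName=user["UserName"])["AccessKeyMetadata"]
            for key in keys:
                info = iam.get_access_key_last_used(AccessKeyId=key["AccessKeyId"])
                # LastUsedDate is missing from the response when the key was never used.
                key["LastUsedDate"] = info["AccessKeyLastUsed"].get("LastUsedDate")
            for key in stale_keys(keys, now):
                if not dry_run:
                    iam.update_access_key(UserName=key["UserName"],
                                          AccessKeyId=key["AccessKeyId"],
                                          Status="Inactive")
                flagged.append(key["AccessKeyId"])
    return flagged


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    # Hypothetical DRY_RUN env var: anything other than "false" only reports.
    dry_run = os.environ.get("DRY_RUN", "true").lower() != "false"
    return {"flagged": disable_stale_keys(boto3.client("iam"), dry_run=dry_run)}
```

Starting in dry-run mode lets you compare the flagged list against what Config marks NON_COMPLIANT before anything gets switched off.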

# ? Jun 7, 2021 11:37

CyberPingu: Sep 15, 2013; If you're not striving to improve, you'll end up going backwards.

Yeah, that's kind of what I'm after.

I've written Lambda functions and SNS topics before. I was just wondering if someone here had already done it before I go reinventing the wheel.

# ? Jun 7, 2021 12:25

Methanar: Sep 26, 2013; by the sex ghost

Another day another incident response

My migration today, which had been carried out in 4 other environments with zero issues whatsoever, finally ran into a snag and caused an issue because of inconsistencies between environments. I actually knew the thing that caused it was a liability, and I've been trying to fix it for 2+ months, but being constantly sandbagged by everybody else has let it linger dangerously for obviously too long.

Even though this was in my hands when it blew up, I don't take responsibility for it because it would have blown up on anybody. I'm just tired of being the one holding the bag. The application in question that broke didn't even have any monitoring on it. It took 5 hours before anybody noticed the thing was broken. Once I got paged in I knew what happened pretty much immediately at least.

I've just written up the RCA and I'm pretty frustrated!

Methanar posted:

This has been an awful day in an awful week.

Someone implied to me that some issue from 2 months ago was my fault, which is total bullshit.

I can't wait for tomorrow when I get grilled on why I suck at my job!

Methanar fucked around with this message at 06:35 on Jun 8, 2021

# ? Jun 8, 2021 06:31

CyberPingu: Sep 15, 2013; If you're not striving to improve, you'll end up going backwards.

Oh hey I get that all the time. Don't worry about it too much.

I made a change last month that forced traffic to our cdn over https (because you know, it's loving 2021). I was told several times "Yeah it's grand this won't really cause any issues".

Cue pushing the changes to prod: 5 mins later, our PDF rendering service breaks. An incident gets raised and I get grilled for not testing it properly.

Funnily enough our front end team and QA team went very quiet when I asked them where the testing matrix was for this service.
Spoiler. It didn't exist and there was literally no way of me knowing this would break.

# ? Jun 8, 2021 07:48

whats for dinner: Sep 25, 2006; IT TURN OUT METAL FOR DINNER!

Yeah, blaming the person holding the bag for a change going wrong is shithouse management. Short of deliberately deciding to gently caress things up, a series of things has gone wrong to let it get to that point.

# ? Jun 8, 2021 07:55

BaseballPCHiker: Jan 16, 2006

Hello DevOps thread.

I took a position in InfoSec at a large corp about 6 months ago. One of my new responsibilities in this job is being on our cloud team in charge of incident response in AWS. I've quickly learned that aside from getting good at understanding GuardDuty alerts and reading JSON-formatted CloudTrail messages, I have a crap ton to learn about Ansible, CloudFormation, Terraform, etc.

My background is as an infrastructure guy, primarily enterprise networking with some VMware and Windows experience as well. Part of what made me a good fit for my InfoSec position initially is that I actually understood the underlying technology; I wasn't just reading some automatically generated report and telling someone to close a port or something. The other teams knew that I actually knew my stuff, and having done the work before made me much more effective in my role.

That long rambling is my way of saying that I need to start understanding DevOps now if I want to be good at securing our AWS environment. I just passed my AWS Security Specialty cert, and I have my Solutions Architect Associate cert. They did a good job giving me the basics and introducing me to a lot of AWS services, but I need to get better across the board. It's clear that SO MUCH MORE of AWS security is on the development side: proper IAM policies, Config rules, alerts based on CloudTrail logs, etc.

I'm hoping to learn DevOps within AWS now but am not sure where to start. I have an ACloudGuru subscription, and they have an online lab environment where I can run some labs and learn. I have my own AWS account and have set up some things for myself to learn. I made a home backup solution with S3 using lifecycle rules and cross-region replication, and I have a simple WordPress setup on an EC2 instance that I set up to automatically update and patch; just simple projects like that.

Does anyone here have any recommendations on projects I could start on my own to learn more? Any other learning resources? Any advice at all for the field?

# ? Jun 8, 2021 14:10

crazypenguin: Mar 9, 2005; nothing witty here, move along

A biggish new feature that might not show up in training yet is this sucker:

https://aws.amazon.com/blogs/aws/new-use-aws-iam-access-analyzer-in-aws-organizations/

It should immediately surface the worst IAM policies any dev in the org got wrong. I've tried it at the account level and it was great.

# ? Jun 8, 2021 16:58

LochNessMonster: Feb 3, 2005; I need about three fitty

BaseballPCHiker posted:

Does anyone here have any recommendations on projects I could start on my own to learn more? Any other learning resources? Any advice at all for the field?

This guy has nice training material for AWS and also has some demo labs you could try: https://github.com/acantril/learn-cantrill-io-labs

As for projects, you can basically do anything you find interesting or convenient. Some time ago there was a guy teaching AWS courses to people with little or no computer-touching background who did a write-up on how to build your resume as a website (https://forrestbrazeal.com/2020/04/23/the-cloud-resume-challenge/) that touched a lot of different aspects of AWS. I think it included S3, CloudFront, R53, and DynamoDB (to keep track of a visitor counter), and it used GitHub Actions for CI/CD.

These are just some options, but there are so many things you can do; it's best to have some broad idea of what tech you want to touch and come up with a use case for that. So if you'd like to learn about RDS/Kinesis/Glue, you'd have to come up with a completely different project than, say, working with Alexa.

# ? Jun 8, 2021 17:14

BaseballPCHiker: Jan 16, 2006

crazypenguin posted:

A biggish new feature that might not show up in training yet is this sucker:

https://aws.amazon.com/blogs/aws/new-use-aws-iam-access-analyzer-in-aws-organizations/

It should immediately surface the worst IAM policies any dev in the org got wrong. I've tried it at the account level and it was great.

Yeah, that is a fantastic tool! It didn't show up on my exam at all or in any of my testing materials.

LochNessMonster posted:

These are just some options, but there are so many things you can do; it's best to have some broad idea of what tech you want to touch and come up with a use case for that. So if you'd like to learn about RDS/Kinesis/Glue, you'd have to come up with a completely different project than, say, working with Alexa.

Thanks for the link and the advice. My place uses Ansible and CloudFormation quite a bit, so maybe I'll just get my feet wet with those two and start learning what I can.

# ? Jun 8, 2021 17:19

LochNessMonster: Feb 3, 2005; I need about three fitty

BaseballPCHiker posted:

Thanks for the link and the advice. My place uses Ansible and CloudFormation quite a bit, so maybe I'll just get my feet wet with those two and start learning what I can.

For Ansible, check out Jeff Geerling (@geerlingguy)

# ? Jun 8, 2021 23:03

deedee megadoodoo: Sep 28, 2000; Two roads diverged in a wood, and I, I took the one to Flavortown, and that has made all the difference.

LochNessMonster posted:

For Ansible, check out Jeff Geerling (@geerlingguy)

A lot of his stuff doesn't support Amazon Linux, so you'll run into issues if you're trying to run his code in AWS. I'm just mentioning it because we used a bunch of his code but kept hitting weird issues.

deedee megadoodoo fucked around with this message at 03:52 on Jun 10, 2021

# ? Jun 9, 2021 17:41

Gyshall: Feb 24, 2009; Had a couple of drinks.
Saw a couple of things.

Yeah I mean that's Amazon Linux and ansible in general IMHE

# ? Jun 9, 2021 20:54

Methanar: Sep 26, 2013; by the sex ghost

Anyone else eat poo poo today over eu-central-1c AZ going down?

# ? Jun 10, 2021 23:27

fletcher: Jun 27, 2003; ken park is my favorite movie; Cybernetic Crumb

Methanar posted:

Anyone else eat poo poo today over eu-central-1c AZ going down?

Our machines were in a and b but not c

# ? Jun 11, 2021 00:53

Blinkz0rz: May 27, 2001; MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS

Methanar posted:

Anyone else eat poo poo today over eu-central-1c AZ going down?

We lost a NAT router, and our EKS nodes in that AZ couldn't talk to the control plane. Ultimately we had to point our VPC route table at the ENI of our NAT router in 1a so that 1/3 of our traffic didn't fall into a black hole.

Fun times.
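The manual failover in that post can be scripted. Here is a minimal sketch. It assumes one private route table per AZ and self-managed NAT instances addressed by ENI, which is how the post reads; the function names and dict layouts are hypothetical. `plan_nat_failover` only computes which route tables to repoint, and `apply_plan` pushes that plan with boto3's `ec2.replace_route`.

```python
from typing import Dict, Set


def plan_nat_failover(
    route_tables: Dict[str, str],  # AZ -> private route table ID
    nat_enis: Dict[str, str],      # AZ -> ENI ID of that AZ's NAT instance
    healthy_nat_azs: Set[str],     # AZs whose NAT is currently passing traffic
) -> Dict[str, str]:
    """Return {route_table_id: eni_id} for every AZ whose default route must move.

    AZs with a healthy local NAT are left alone; every other AZ's route table is
    pointed at the NAT in the first healthy AZ (sorted, so the plan is stable).
    """
    healthy = sorted(az for az in healthy_nat_azs if az in nat_enis)
    if not healthy:
        raise RuntimeError("no healthy NAT to fail over to")
    target_eni = nat_enis[healthy[0]]
    return {
        rtb: target_eni
        for az, rtb in route_tables.items()
        if az not in healthy
    }


def apply_plan(ec2, plan: Dict[str, str]) -> None:
    """Push the plan; `ec2` is assumed to be boto3.client("ec2")."""
    for rtb, eni in plan.items():
        ec2.replace_route(
            RouteTableId=rtb,
            DestinationCidrBlock="0.0.0.0/0",
            NetworkInterfaceId=eni,
        )
```

Sending all failed-over traffic to a single NAT is the simplest choice. If that instance's bandwidth matters, spreading the broken AZs across the healthy ones is an easy tweak. Either way, the routes still need moving back once the AZ recovers.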

# ? Jun 11, 2021 00:58

post hole digger: Mar 21, 2011

Anyone able to help me debug an issue with cloud-init in AWS?

I have a file in /var/lib/cloud/scripts/per-instance that should run on first boot. Normally, when I spin up a new instance, it does. However, if I deploy an EC2 instance (a custom CentOS 7 image created with Packer, which is where the per-instance script gets added) with additional commands provided via AWS's 'user data' field, the user data script runs, but the per-instance script does not.

cloud-init analyze show -i /var/log/cloud-init.log doesn't show me anything obviously wrong. Is there anything I'm missing? I've had a hard time finding info on whether cloud-init has some sort of precedence rule, which I'm just missing, where per-instance scripts won't run if a user data script is provided.

Normally I'm pretty good at search-fu, but I'm having a hard time formulating the issue in a way that's turning up any useful results.

edit: this was caused by an engineer putting more into his user data script than he initially let on 🙃

post hole digger fucked around with this message at 21:57 on Jun 11, 2021

# ? Jun 11, 2021 02:09

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

Methanar posted:

Anyone else eat poo poo today over eu-central-1c AZ going down?

I basically got paged to stop and start an instance when it came back. Yay for no resilience for customers paying millions. But it's ok, I guess, because they agreed to the architecture and we're still within SLAs.

# ? Jun 11, 2021 05:13

LochNessMonster: Feb 3, 2005; I need about three fitty

Methanar posted:

Anyone else eat poo poo today over eu-central-1c AZ going down?

eu-west-1 for lyfe

# ? Jun 11, 2021 08:01

CyberPingu: Sep 15, 2013; If you're not striving to improve, you'll end up going backwards.

LochNessMonster posted:

eu-west-1 for lyfe

# ? Jun 11, 2021 08:25

Methanar: Sep 26, 2013; by the sex ghost

My life exists in a state of constant fire.

# ? Jun 15, 2021 01:31

madmatt112: Jul 11, 2016; Is that a cat in your pants, or are you just a lonely excuse for an adult?

Methanar posted:

My life exists in a state of constant fire.

Hail Satan - he invented networks.

# ? Jun 15, 2021 15:13

Hadlock: Nov 9, 2004

How would one break into part-time (4-12 hours a week) infrastructure planning consulting, or does that even exist?

I've talked to the big three consulting firms and they do this but typically they hire you out for 3-16 weeks at a time, and you're a full time employee with plenty of travel. They also have name recognition + a waiting list for their services.

Seems like there's plenty of room in the market for additional "please look at my infrastructure spaghetti and tell me how to ~~unfuck everything~~ fix it" consultancy

I get lots of warm leads for "we need your experience designing infrastructure, but to justify the rest of your time/salary here, you're gonna do toil and bitch at product owners until they follow best practices." The first part sounds interesting; the second half sounds like an entry-level job. But the recruiter only gets their bonus if they fill the job, and I don't think they've ever passed my suggested consulting ideas back to the hiring manager.

# ? Jun 21, 2021 01:08

in a well actually: Jan 26, 2011; dude, you gotta end it on the rhyme

Hadlock posted:

How would one break into part-time (4-12 hours a week) infrastructure planning consulting, or does that even exist?

I've talked to the big three consulting firms and they do this but typically they hire you out for 3-16 weeks at a time, and you're a full time employee with plenty of travel. They also have name recognition + a waiting list for their services.

Seems like there's plenty of room in the market for additional "please look at my infrastructure spaghetti and tell me how to ~~unfuck everything~~ fix it" consultancy

I get lots of warm leads for "we need your experience designing infrastructure, but to justify the rest of your time/salary here, you're gonna do toil and bitch at product owners until they follow best practices." The first part sounds interesting; the second half sounds like an entry-level job. But the recruiter only gets their bonus if they fill the job, and I don't think they've ever passed my suggested consulting ideas back to the hiring manager.

Corey Quinn had a thread on Twitter recently about independent consulting; the short answer is your personal network.

# ? Jun 21, 2021 01:32

Gyshall: Feb 24, 2009; Had a couple of drinks.
Saw a couple of things.

Yeah, it's tough. I have an LLC I do 1099 stuff through when I'm between jobs or just need extra cash. It's tough because most of my contract engagements want not just DevOps but full IT consulting, and most are not interested in finding out they can't just sprinkle DevOps and terraform on their existing landscape and solve all their problems.

If you're just interested in writing a bunch of YAML or whatever maybe look at Fiverr or similar sites and see what you can find there.

# ? Jun 21, 2021 03:16

Hadlock: Nov 9, 2004

Gyshall posted:

most are not interested in finding out they can't just sprinkle DevOps and terraform on their existing landscape and solve all their problems.

Sigh

PCjr sidecar posted:

thread on Twitter

Offtopic on my own topic, but this is the worst phrase to come out of the 2010s, after "Fake News" :argh:

# ? Jun 21, 2021 05:07

New Yorp New Yorp: Jul 18, 2003; Only in Kenya.; Pillbug

Hadlock posted:

Sigh

Offtopic on my own topic, but this is the worst phrase to come out of the 2010s, after "Fake News"

No, it's using "low-key" as a synonym for "minor" or "slightly"

# ? Jun 21, 2021 05:46

philihp: Jun 1, 2008

Hadlock posted:

Seems like there's plenty of room in the market for additional "please look at my infrastructure spaghetti and tell me how to ~~unfuck everything~~ fix it" consultancy

I get lots of warm leads for "we need your experience designing infrastructure, but to justify the rest of your time/salary here, you're gonna do toil and bitch at product owners until they follow best practices." The first part sounds interesting; the second half sounds like an entry-level job. But the recruiter only gets their bonus if they fill the job, and I don't think they've ever passed my suggested consulting ideas back to the hiring manager.

That's not an entry level job, that's the job of a tech lead, and it's not fun.

I think you might have some success with a consultancy that diagnoses problems with infra and then also implements the solutions. One strategy I've seen succeed at getting others to follow best practices is enforcing linter checks on pull requests, or code coverage bots that fail if coverage drops by some unreasonable amount. If your repo has more than a few dozen regular committers, this is a really good way to enforce code quality at scale.
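A coverage bot like that can be as small as a script that compares the base branch's total coverage with the PR's and fails the check past a tolerance. Here is a minimal sketch. The 1-percentage-point default, the function names, and passing the two percentages on the command line are all assumptions; in practice you'd read the numbers from your coverage tool's report.

```python
import sys
from typing import List


def coverage_gate(base_pct: float, head_pct: float, max_drop: float = 1.0) -> bool:
    """Pass unless coverage fell by more than `max_drop` percentage points."""
    return base_pct - head_pct <= max_drop


def main(argv: List[str]) -> int:
    """Usage: coverage_gate.py BASE_PCT HEAD_PCT; exit status 1 fails the CI check."""
    base, head = float(argv[1]), float(argv[2])
    ok = coverage_gate(base, head)
    print(f"coverage {base:.1f}% -> {head:.1f}%: {'ok' if ok else 'FAIL'}")
    return 0 if ok else 1


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

A tolerance rather than a strict "never go down" rule keeps the bot from failing PRs that delete dead code, which can nudge the percentage down slightly.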

# ? Jun 30, 2021 22:21

Methanar: Sep 26, 2013; by the sex ghost

I've turned off a million dollars' worth of annual spend on wasted EC2 instances this week.

Ridiculous.

# ? Jul 1, 2021 18:39

JehovahsWetness: Dec 9, 2005; bang that shit retarded

Methanar posted:

Ridiculous.

supposedly we own the highest number of 3-node GKE clusters in gcp

# ? Jul 1, 2021 18:48

Methanar: Sep 26, 2013; by the sex ghost

JehovahsWetness posted:

supposedly we own the highest number of 3-node GKE clusters in gcp

Impressive. What's the overhead ratio of running a million small GKE clusters? That is, (control planes + unused worker CPU) / what's actually used by the applications.

# ? Jul 1, 2021 18:51

The Iron Rose: May 12, 2012; Cat Army

Methanar posted:

I've turned off a million dollars' worth of annual spend on wasted EC2 instances this week.

Ridiculous.

we've spent five figures on a completely unnecessary hot standby VM instance that's not used and nobody will let me turn the drat thing off, even though everyone agrees it's unnecessary :negative:

Methanar posted:

Impressive. What's the overhead ratio of running a million small GKE clusters? That is, (control planes + unused worker CPU) / what's actually used by the applications.

i mean just running a cluster 24/7 is $800ish a year if i recall right, even before you add in the resource utilization. that adds up pretty quick!
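That back-of-envelope figure holds up if you assume GKE's cluster management fee of $0.10 per cluster per hour, announced in 2020: 0.10 × 24 × 365 = $876 per cluster per year. Check current pricing, and note that this ignores any free-tier credit. The sketch below also answers the overhead-ratio question. It expresses everything in dollars so control planes and idle CPU can be added together, and the worker cost and utilization are hypothetical inputs you'd pull from billing and monitoring.

```python
HOURS_PER_YEAR = 24 * 365
FEE_CENTS_PER_CLUSTER_HOUR = 10  # assumed GKE management fee: $0.10/cluster/hour


def annual_control_plane_cost(n_clusters: int) -> float:
    """Yearly control-plane fees in dollars (summed in integer cents to avoid float drift)."""
    return n_clusters * FEE_CENTS_PER_CLUSTER_HOUR * HOURS_PER_YEAR / 100


def overhead_ratio(n_clusters: int, worker_cost_per_year: float, utilization: float) -> float:
    """(control-plane fees + idle share of worker spend) / share actually used by apps.

    `utilization` is the fleet-wide fraction of worker CPU in use (0 < u <= 1).
    """
    used = worker_cost_per_year * utilization
    idle = worker_cost_per_year - used
    return (annual_control_plane_cost(n_clusters) + idle) / used


if __name__ == "__main__":
    print(annual_control_plane_cost(1))  # 876.0
```

For example, one cluster with a hypothetical $10,000/year of worker nodes at 10% CPU utilization gives (876 + 9,000) / 1,000 ≈ 9.9, i.e. almost ten dollars of overhead per dollar of useful compute.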

# ? Jul 1, 2021 19:09


JehovahsWetness: Dec 9, 2005; bang that shit retarded

gke starting to charge for the control plane really hosed us up. i've never bothered to figure it out but i'm betting total fleet cpu utilization is <10%.

https://www.youtube.com/watch?v=z0G04bgZHwc

# ? Jul 1, 2021 19:31
