Continuous Integration/build engineering/devops thread

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Continuous Integration/build engineering/devops thread

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

Elasticsearch has an official K8S operator. Works well for a lot of people in production use cases currently, in fact.

# ? Feb 25, 2023 06:20

Methanar: Sep 26, 2013; by the sex ghost

Docjowles posted:

It is really loving jarring for you to finally have an avatar, Methanar

If only mine was as good looking as yours

# ? Feb 25, 2023 06:30

jaegerx: Sep 10, 2012; Maybe this post will get me on your ignore list!

necrobobsledder posted:

Elasticsearch has an official K8S operator. Works well for a lot of people in production use cases currently, in fact.

Lol, no it doesn't.

# ? Feb 25, 2023 06:38

George Wright: Nov 20, 2005

Docjowles posted:

Yeah... I mean for both of us, StatefulSets exist. But why are you running this app in k8s at all except to say you can.

edit: I guess this is a good time to ask the audience if any of you are running important databases or elasticsearch clusters or something in k8s and are happy about it or doing it at gunpoint. We run a lot of k8s but really try to limit it to just stateless services here.

Our DBA team has been collectively dying for this resume bullet point.

# ? Feb 25, 2023 06:56

YOLOsubmarine: Oct 19, 2004; When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Vulture Culture posted:

Import-as-code is something Terraform definitely needs in order to work in the kinds of GitOps workflows people imagine Terraform would actually be good at

This is coming FWIW. It will probably make some people happy and some people mad and cause a lot of problems while everyone learns appropriate patterns.

# ? Feb 25, 2023 06:56

12 rats tied together: Sep 7, 2006

Import as code is strictly better than the alternative of having to janitor every single state file with the world's shittiest CLI. I can't imagine anyone being upset with this feature except in a "why did it take you 9 years to add this" sort of way, like me.

# ? Feb 25, 2023 08:26

Zapf Dingbat: Jan 9, 2001

The place I work for is a startup that hires all noobs so everyone is discovering how to do things for the first time.

I'm running into a problem where my IAM policy is very bottlenecked by the owner of the company. I can create anything I want but I can never delete it, or even make some changes after creation. So I have to give a ticket to my boss that will sit there for months. We have a $35,000/month AWS bill.

How do real companies handle this?

# ? Feb 27, 2023 15:35

FISHMANPET: Mar 3, 2007; Sweet 'N Sour
Can't
Melt
Steel Beams

OK, another probably dumb terraform question. Is there a way to reuse some settings definitions between resources of different types?

I'm setting up an Azure App Service app, and it has a concept of "slots". A slot is just another app service app, but it has a connection back to the original "app" and in terraform it's a different resource type. I'd like to keep them identical, especially because slots can swap around, and it's possible for the main app to "swap" with one of the slots, and then that could throw state into a tizzy. I suppose I could just do statements like "https_only = azurerm_linux_web_app.app.https_only" for all the statements. But is there some way where I can define a block outside of an individual resource and just "insert" it into my resource definitions?

# ? Feb 27, 2023 16:17

Docjowles: Apr 9, 2009

Zapf Dingbat posted:

The place I work for is a startup that hires all noobs so everyone is discovering how to do things for the first time.

I'm running into a problem where my IAM policy is very bottlenecked by the owner of the company. I can create anything I want but I can never delete it, or even make some changes after creation. So I have to give a ticket to my boss that will sit there for months. We have a $35,000/month AWS bill.

How do real companies handle this?

You treat the people responsible for the infrastructure like adults and give them the necessary permission (both in IAM and organizationally) to do their job.

FISHMANPET posted:

OK, another probably dumb terraform question. Is there a way to reuse some settings definitions between resources of different types?

I'm setting up an Azure App Service app, and it has a concept of "slots". A slot is just another app service app, but it has a connection back to the original "app" and in terraform it's a different resource type. I'd like to keep them identical, especially because slots can swap around, and it's possible for the main app to "swap" with one of the slots, and then that could throw state into a tizzy. I suppose I could just do statements like "https_only = azurerm_linux_web_app.app.https_only" for all the statements. But is there some way where I can define a block outside of an individual resource and just "insert" it into my resource definitions?

I don't think you can factor out the entire block of settings. But you could at least put the values you want in a set of locals and then reference those each time so the values are all declared in one place.

# ? Feb 27, 2023 16:24

12 rats tied together: Sep 7, 2006

this is the type of thing that yaml anchors and aliases were designed for

# ? Feb 27, 2023 18:30

The Fool: Oct 16, 2003

FISHMANPET posted:

OK, another probably dumb terraform question. Is there a way to reuse some settings definitions between resources of different types?

I'm setting up an Azure App Service app, and it has a concept of "slots". A slot is just another app service app, but it has a connection back to the original "app" and in terraform it's a different resource type. I'd like to keep them identical, especially because slots can swap around, and it's possible for the main app to "swap" with one of the slots, and then that could throw state into a tizzy. I suppose I could just do statements like "https_only = azurerm_linux_web_app.app.https_only" for all the statements. But is there some way where I can define a block outside of an individual resource and just "insert" it into my resource definitions?

app service is lovely to manage in terraform. Just deploy the base resource, certs and custom domain bindings if you need them, add app settings to lifecycle ignore, and for slots only create them, don't manage swapping them in terraform

deployments, slot management, and app settings should be managed out of band

# ? Feb 27, 2023 18:52

Zephirus: May 18, 2004; BRRRR......CHK

The Fool posted:

app service is lovely to manage in terraform. Just deploy the base resource, certs and custom domain bindings if you need them, add app settings to lifecycle ignore, and for slots only create them, don't manage swapping them in terraform

deployments, slot management, and app settings should be managed out of band

Seconding this, slot movement/promotion etc are deployment tasks. FYI if, like some of our devs, you're considering using slots for development be aware they'll all share the app service plan so if you cake the cpu or memory with your dev or uat slot it'll break your prod one too.

# ? Feb 27, 2023 21:11

FISHMANPET: Mar 3, 2007; Sweet 'N Sour
Can't
Melt
Steel Beams

I won't be managing the swapping of slots or the app settings via terraform, I just don't want to get into a situation where I've applied different settings to my staging slot and the "production" slot, and then they get swapped, and now terraform is all mad because the state's messed up.

Once I get this running the odds of me touching it again are also very slim. I'm mostly going down this path because I burnt myself not turning on "https_only" when I hand-created the app the first time (and also an opportunity to play around with terraform).

# ? Feb 27, 2023 21:43

The Fool: Oct 16, 2003

FISHMANPET posted:

I won't be managing the swapping of slots or the app settings via terraform, I just don't want to get into a situation where I've applied different settings to my staging slot and the "production" slot, and then they get swapped, and now terraform is all mad because the state's messed up.

Once I get this running the odds of me touching it again are also very slim. I'm mostly going down this path because I burnt myself not turning on "https_only" when I hand-created the app the first time (and also an opportunity to play around with terraform).

add the app settings block to lifecycle ignore and everything else should be the same, don't try to manage the differences in terraform

# ? Feb 27, 2023 22:41

The Fool: Oct 16, 2003

Zephirus posted:

be aware they'll all share the app service plan

this isn't necessarily true
it is the default behavior but both azure and terraform support moving slots between asps

one of the teams I support does this to manage down time when doing upgrades and pre-deploy sku changes

# ? Feb 27, 2023 22:43

crazypenguin: Mar 9, 2005; nothing witty here, move along

Zapf Dingbat posted:

The place I work for is a startup that hires all noobs so everyone is discovering how to do things for the first time.

I'm running into a problem where my IAM policy is very bottlenecked by the owner of the company. I can create anything I want but I can never delete it, or even make some changes after creation. So I have to give a ticket to my boss that will sit there for months. We have a $35,000/month AWS bill.

How do real companies handle this?

Docjowles posted:

You treat the people responsible for the infrastructure like adults and give them the necessary permission (both in IAM and organizationally) to do their job.

In addition to the above, which is completely correct, you use AWS Organizations.

Then every dev can have their own aws account to do whatever with, and sensitive stuff you need to be careful with can live in separate accounts, where you can appropriately add controls to prevent anyone from assuming a permissive role to begin with, much less fat fingering deletions of important stuff.

# ? Feb 27, 2023 22:46

luminalflux: May 27, 2005

crazypenguin posted:

In addition to the above, which is completely correct, you use AWS Organizations.

Then every dev can have their own aws account to do whatever with, and sensitive stuff you need to be careful with can live in separate accounts, where you can appropriately add controls to prevent anyone from assuming a permissive role to begin with, much less fat fingering deletions of important stuff.

if you do this don't provision them with a root account of devs.email@company.com, instead use a shared email address like devops+devs.email@company.com so when they leave/get fired you can close the account easily/go through pw reset flow without having to ask IT to re-enable their inbox.
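
A minimal sketch of that naming scheme, assuming your mail system supports plus addressing. The helper name `root_email` is made up for illustration; the addresses are the ones from the example above.

```python
def root_email(shared_mailbox: str, dev_email: str) -> str:
    """Build a plus-addressed root account email that lands in a shared inbox.

    Hypothetical helper: it only formats a string, it does not create anything.
    """
    local, _, domain = shared_mailbox.partition("@")
    # Use the dev's own local part as the "+tag" so each account stays unique.
    dev_local = dev_email.partition("@")[0]
    return f"{local}+{dev_local}@{domain}"


address = root_email("devops@company.com", "devs.email@company.com")
print(address)
```

The point is that password resets for every dev account go to a mailbox the team controls, not to a mailbox that disappears with the employee.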

# ? Feb 27, 2023 22:48

FISHMANPET: Mar 3, 2007; Sweet 'N Sour
Can't
Melt
Steel Beams

The Fool posted:

this isn't necessarily true
it is the default behavior but both azure and terraform support moving slots between asps

one of the teams I support does this to manage down time when doing upgrades and pre-deploy sku changes

Are you able to actually manage this via terraform? I think I can create a slot and specify a different app service plan at creation time, but if I specify the app service plan that the main app is running on, it seems to store that as null in the state. Then if you try and change the app service plan (even if the value is the one it's actually running on) the plan will fail because I think it tries to validate the old value of empty string and then fails. I'm considering filing a bug about it, unless there's some aspect I'm missing.

# ? Feb 27, 2023 23:24

The Fool: Oct 16, 2003

FISHMANPET posted:

Are you able to actually manage this via terraform? I think I can create a slot and specify a different app service plan at creation time, but if I specify the app service plan that the main app is running on, it seems to store that as null in the state. Then if you try and change the app service plan (even if the value is the one it's actually running on) the plan will fail because I think it tries to validate the old value of empty string and then fail. I'm considering filing a bug about it, unless there's some aspect I'm missing.

I was flying all day, but I can double check the behavior tomorrow. It for sure works on 2.x versions of the provider, was taken away with 3.x, then added back very recently.

# ? Feb 28, 2023 01:08

Zephirus: May 18, 2004; BRRRR......CHK

The Fool posted:

I was flying all day, but I can double check the behavior tomorrow. It for sure works on 2.x versions of the provider, was taken away with 3.x, then added back very recently.

I genuinely had no idea you could do this, that's actually really useful.

# ? Feb 28, 2023 01:55

prom candy: Dec 16, 2005; Only I may dance

I'm in an inherited infrastructure situation and not really a devops guy so looking for a sanity check. My prod database is getting pinged pretty hard and I need a zero(ish) downtime way of alleviating the load. It's MySQL 5.7 on RDS. I'm thinking what I may want to do is spin up a read replica of my database so that I can spread out the load. The DB is Multi-AZ so as far as I can tell that means AWS is capable of spinning up a replica without suspending I/O. Once that's done does RDS take care of routing read/writes or do I need to update my application to do that as well?

The real solution here is to add a couple of indices to a table that the last guy really should have added BEFORE it got to millions of rows but since it's MySQL 5.7 that's not an option.

Does this sound like a moderately sane (if expensive) thing to do?

# ? Mar 3, 2023 00:40

MightyBigMinus: Jan 26, 2020

5.7's not *that* old i'm pretty sure adding indexes is non-blocking

# ? Mar 3, 2023 01:11

Jabor: Jul 16, 2010; #1 Loser at SpaceChem

For a quick fix just slap a cache in front of it. Especially if the problem is the same expensive query being done over and over again.

# ? Mar 3, 2023 01:12

The Iron Rose: May 12, 2012; Cat Army

Your app needs to handle routing reads and writes. This came up a few times - you can check my post history for some good detailed commentary on it as I had a similar question a few months back and other posters had great advice.

You can sorta do it on a proxy level but there's no way to know from a networking perspective alone that an HTTP GET or even a SELECT won't have side effects or cause a write somewhere. The app needs to be aware of its data needs and where to fetch them from, especially as there's latency and data consistency to be aware of when you've a distributed system.

You can however easily load balance across multiple read replicas with RDS by hitting the reader endpoint rather than the individual RR endpoint if I recall right.
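
A rough sketch of what "the app handles routing" can look like, assuming the caller states its own needs instead of anything trying to guess from the SQL text. `EndpointRouter`, its parameters, and both hostnames are hypothetical; no real database driver is involved.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Iterator, List


@dataclass
class EndpointRouter:
    """Pick a database endpoint based on what the caller says it needs."""

    writer: str
    readers: List[str]
    _reader_cycle: Iterator[str] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Fall back to the writer when no replicas are configured.
        self._reader_cycle = cycle(self.readers or [self.writer])

    def endpoint_for(self, *, readonly: bool, tolerate_lag: bool = False) -> str:
        # Only use a replica when the caller has declared that the query is
        # read-only AND that slightly stale data is acceptable.
        if readonly and tolerate_lag:
            return next(self._reader_cycle)
        return self.writer


router = EndpointRouter(
    writer="writer.db.example.internal",     # hypothetical hostname
    readers=["reader.db.example.internal"],  # hypothetical hostname
)

print(router.endpoint_for(readonly=False))
print(router.endpoint_for(readonly=True, tolerate_lag=True))
```

Defaulting `tolerate_lag` to `False` means a write-then-read path stays on the writer unless someone explicitly opts in to replica lag.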

The Iron Rose fucked around with this message at 01:22 on Mar 3, 2023

# ? Mar 3, 2023 01:14

prom candy: Dec 16, 2005; Only I may dance

MightyBigMinus posted:

5.7's not *that* old i'm pretty sure adding indexes is non-blocking

Oh hey you're right! This actually should solve my issue then.

# ? Mar 3, 2023 02:02

luminalflux: May 27, 2005

The Iron Rose posted:

Your app needs to handle routing reads and writes. This came up a few times - you can check my post history for some good detailed commentary on it as I had a similar question a few months back and other posters had great advice.

If you can't fix it in the app because of Reasons, ProxySQL is what we've used to route queries between the leader and the replicas based on our knowledge of "this query can totally handle some replica lag". There's also GTID consistent reads where proxysql transparently knows which replicas are up to date or not if you're doing write-then-read.

But you should try to fix the app if you can.

MightyBigMinus posted:

5.7's not *that* old i'm pretty sure adding indexes is non-blocking

It Depends. On large and/or high traffic tables you can run into issues where you can't grab the metadata lock, or it causes massive performance issues while doing the online change. We use gh-ost for all schema migrations due to issues with online DDL changes causing performance degradation. It's also a lot nicer on the replication.

# ? Mar 3, 2023 03:40

The Iron Rose: May 12, 2012; Cat Army

luminalflux posted:

gh-ost

i just learned a ton about mysql, the documentation for this rocks

e: this is in fact possibly the best documented github for a tool i've ever seen.

The Iron Rose fucked around with this message at 05:56 on Mar 3, 2023

# ? Mar 3, 2023 05:48

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Charity Majors has also written a lot about lessons learned from doing online MySQL schema migrations at Facebook and Honeycomb. The main lesson at scale if you can't handle downtime is "don't modify table DDL", and many companies will add a new column, update code to write both columns, then start doing background migrations of the old data before cutting the application over to only use the new column. After that, the old column is left permanently.

If you don't want to do this, consider downtime.
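
The add-column, dual-write, backfill, cut-over sequence can be sketched roughly as follows. This uses Python's stdlib `sqlite3` purely as a stand-in so the example runs anywhere; it says nothing about MySQL locking behaviour. The `users` table, the `display_name` column, and the helper names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("ann",), ("bob",), ("cy",)])

# Step 1: add the new column next to the old one.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


# Step 2: application code writes both columns from now on.
def create_user(name: str) -> None:
    conn.execute(
        "INSERT INTO users (name, display_name) VALUES (?, ?)", (name, name)
    )
    conn.commit()


create_user("dee")


# Step 3: backfill the old rows in small batches in the background.
def backfill(batch_size: int = 2) -> int:
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET display_name = name "
            "WHERE id IN (SELECT id FROM users WHERE display_name IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


migrated = backfill()

# Step 4: cut reads over to the new column; the old column is simply left alone.
names = [row[0] for row in conn.execute("SELECT display_name FROM users ORDER BY id")]
print(migrated, names)
```

Small batches keep each transaction short, which is the whole reason for doing the copy in the background and not in one statement.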

e: poo poo, gh-ost looks awesome.

# ? Mar 3, 2023 17:08

Docjowles: Apr 9, 2009

Oh cool, Gitlab is raising their per-user cost by 50% out of nowhere. Surprise motherfucker!

I look forward to my company spending 10x that amount in engineer-hours to migrate to GitHub or something out of spite (I have no indication leadership are thinking about that, but it's absolutely the kind of thing we would do lol)

# ? Mar 3, 2023 17:53

vanity slug: Jul 20, 2010

github is still way more expensive than gitlab

# ? Mar 3, 2023 17:59

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

e: removed

Vulture Culture fucked around with this message at 21:51 on Mar 3, 2023

# ? Mar 3, 2023 18:07

Docjowles: Apr 9, 2009

vanity slug posted:

github is still way more expensive than gitlab

It's basically identical (before the stupid loving used car salesman song and dance of enterprise sales where you don't actually pay anything resembling list price) after this price increase

Actually, Github Enterprise is ~$21/user/month and Gitlab is about to be $29/user/month :confused:

Unless there's some massive gotcha in Github's pricing, which would not be at all surprising

Docjowles fucked around with this message at 18:35 on Mar 3, 2023

# ? Mar 3, 2023 18:32

xzzy: Mar 5, 2009

My suspicion is Gitlab's last price increase caused enough people to jump ship to the community version that they gotta recoup that money somewhere.

I'm at a super low budget place and we were okay with paying but they priced us out a ways back. Obviously I got no data on their customers but if there's a measurable number of orgs like ours maybe Gitlab is feeling a pinch.

# ? Mar 3, 2023 18:56

prom candy: Dec 16, 2005; Only I may dance

luminalflux posted:

It Depends. On large and/or high traffic tables you can run into issues where you can't grab the metadata lock, or it causes massive performance issues while doing the online change. We use gh-ost for all schema migrations due to issues with online DDL changes causing performance degradation. It's also a lot nicer on the replication.

What's a large table in your opinion?

# ? Mar 3, 2023 20:45

Bhodi: Dec 9, 2007; Oh, it's just a cat.; Pillbug

xzzy posted:

My suspicion is Gitlab's last price increase caused enough people to jump ship to the community version that they gotta recoup that money somewhere.

I'm at a super low budget place and we were okay with paying but they priced us out a ways back. Obviously I got no data on their customers but if there's a measurable number of orgs like ours maybe Gitlab is feeling a pinch.

I haven't admined an instance in a hot minute but don't they gate federation/oauth behind enterprise licensing? Runners too, IIRC. You can live without runners but it'd be hard for a lot of people to live without federation of some kind.

# ? Mar 3, 2023 20:54

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

One difference between cloud GitHub and cloud GitLab is that one of them will work if us-east-1 fails

# ? Mar 3, 2023 21:51

Bhodi: Dec 9, 2007; Oh, it's just a cat.; Pillbug

Yeah, that one component will be up while everything else in your entire office toolchain is broken

# ? Mar 3, 2023 21:57

luminalflux: May 27, 2005

Vulture Culture posted:

Charity Majors has also written a lot about lessons learned from doing online MySQL schema migrations at Facebook and Honeycomb. The main lesson at scale if you can't handle downtime is "don't modify table DDL", and many companies will add a new column, update code to write both columns, then start doing background migrations of the old data before cutting the application over to only use the new column. After that, the old column is left permanently.

If you don't want to do this, consider downtime.

e: poo poo, gh-ost looks awesome.

For background row deletions, there's pt-archiver that can background update or delete without bothering the database too much. It monitors replica lag and will throttle itself. gh-ost similarly is amazing at that, and even better you can run the data migration into the new table during the day, and wait to cut over to the new table in your traffic trough / maintenance window. It's a quantum leap past pt-osc (percona toolkit's online schema change) which is showing its age now.
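
The general "delete in batches, back off when replicas lag" idea looks roughly like this. It is not how pt-archiver or gh-ost are implemented, just a minimal illustration using stdlib `sqlite3`. `purge_in_batches`, `fake_lag`, and the `old_events` table are hypothetical, and the lag reading comes from a callback because how you measure replica lag depends on your setup.

```python
import sqlite3
import time
from typing import Callable


def purge_in_batches(
    conn: sqlite3.Connection,
    table: str,
    batch_size: int,
    replica_lag_seconds: Callable[[], float],
    max_lag: float = 1.0,
    pause: float = 0.5,
) -> int:
    """Empty a table a few rows at a time, waiting whenever lag is too high."""
    deleted = 0
    while True:
        # Throttle: do not start another batch while replicas are behind.
        while replica_lag_seconds() > max_lag:
            time.sleep(pause)
        # The table name is interpolated, so it must come from trusted code.
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN "
            f"(SELECT rowid FROM {table} LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            return deleted
        deleted += cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_events (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO old_events (id) VALUES (?)", [(i,) for i in range(10)])

readings = [5.0, 0.2]  # pretend the replica is lagging on the first check only


def fake_lag() -> float:
    return readings.pop(0) if readings else 0.0


removed = purge_in_batches(conn, "old_events", 4, fake_lag, pause=0.01)
print(removed)
```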

prom candy posted:

What's a large table in your opinion?

Anything over 10mm rows is where I start reaching for gh-ost, but the table can be smaller and have issues if it has high update traffic on it. We've now enforced that all migrations go through gh-ost to avoid having arguments over "well my table is small enough, it shouldn't matter!" and avoid having to try to make judgment calls or do stuff like `SELECT table_name,table_rows FROM information_schema.tables WHERE table_name = 'buttslol' ORDER BY table_rows DESC; ` whenever we get a migration PR.

# ? Mar 4, 2023 19:02

prom candy: Dec 16, 2005; Only I may dance

luminalflux posted:

Anything over 10mm rows is where I start reaching for gh-ost, but the table can be smaller and have issues if it has high update traffic on it. We've now enforced that all migrations go through gh-ost to avoid having arguments over "well my table is small enough, it shouldn't matter!" and avoid having to try to make judgment calls or do stuff like `SELECT table_name,table_rows FROM information_schema.tables WHERE table_name = 'buttslol' ORDER BY table_rows DESC; ` whenever we get a migration PR.

Thanks. I think I'm still in the zone where I can YOLO this during off peak hours but gh-ost is definitely great to know about.

# ? Mar 4, 2023 21:36

luminalflux: May 27, 2005

prom candy posted:

Thanks. I think I'm still in the zone where I can YOLO this during off peak hours but gh-ost is definitely great to know about.

Another thing to consider is how you handle dropping tables. We had some severe issues with RDS, each time we'd reboot with failover - something that AWS claims should be "fairly quick", we'd end up having hours+ of downtime. Way back in the day someone dropped a big table and while it was dropping, it took down the site due to ... idk what was the actual issue. So they rebooted the database.

We only discovered years later that the database files were left around on the RDS instance's filesystem, and each time it booted it would say "huh, i have table files but no record of them in the metadata table. I have no idea what's real, so I need to go into crash recovery". AWS support would join our zoom bridge, but they basically could only troubleshoot as far as "well the instance is working normally, let me page the database team", taking 30min-1hr to get to a point where we had an expert online.

Learnings?

Even though RDS can do multi-AZ, don't rely on it. Have a replica you can switch to and promote instead. Ideally you have
Drop tables gracefully by emptying them first using pt-archiver.
AWS support isn't enough. Either you need in-house expertise on your primary datastore, or engage experts such as Percona or Pythian.
If you intend to scale (i.e., you're a startup that's scaling up customers etc.), build out best practices and tooling while you're small enough, before you end up with huge amounts of inadvertent downtime.

(i realize i'm basically drip-feeding my PerconaLive talk from 2020 i never gave because, uh, the 'rona)

# ? Mar 5, 2023 00:43
