necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
Elasticsearch has an official K8S operator. Works well for a lot of people in production use cases currently, in fact.


Methanar
Sep 26, 2013

by the sex ghost

Docjowles posted:

It is really loving jarring for you to finally have an avatar, Methanar :glomp:

If only mine was as good looking as yours

jaegerx
Sep 10, 2012

Maybe this post will get me on your ignore list!


necrobobsledder posted:

Elasticsearch has an official K8S operator. Works well for a lot of people in production use cases currently, in fact.

Lol, no it doesn't.

George Wright
Nov 20, 2005

Docjowles posted:

Yeah... I mean for both of us, StatefulSets exist. But why are you running this app in k8s at all except to say you can.

edit: I guess this is a good time to ask the audience if any of you are running important databases or elasticsearch clusters or something in k8s and are happy about it or doing it at gunpoint. We run a lot of k8s but really try to limit it to just stateless services here.

Our DBA team has been collectively dying for this resume bullet point.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Vulture Culture posted:

Import-as-code is something Terraform definitely needs in order to work in the kinds of GitOps workflows people imagine Terraform would actually be good at

This is coming FWIW. It will probably make some people happy and some people mad and cause a lot of problems while everyone learns appropriate patterns.

12 rats tied together
Sep 7, 2006

Import as code is strictly better than the alternative of having to janitor every single state file with the world's shittiest CLI. I can't imagine anyone being upset with this feature except in a "why did it take you 9 years to add this" sort of way, like me.

Zapf Dingbat
Jan 9, 2001


The place I work for is a startup that hires all noobs so everyone is discovering how to do things for the first time.

I'm running into a problem where my IAM policy is very bottlenecked by the owner of the company. I can create anything I want but I can never delete it, or even make some changes after creation. So I have to give a ticket to my boss that will sit there for months. We have a $35,000/month AWS bill.

How do real companies handle this?

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
OK, another probably dumb terraform question. Is there a way to reuse some settings definitions between resources of different types?

I'm setting up an Azure App Service app, and it has a concept of "slots". A slot is just another app service app, but it has a connection back to the original "app" and in terraform it's a different resource type. I'd like to keep them identical, especially because slots can swap around, and it's possible for the main app to "swap" with one of the slots, and then that could throw state into a tizzy. I suppose I could just do statements like "https_only = azurerm_linux_web_app.app.https_only" for all the statements. But is there some way where I can define a block outside of an individual resource and just "insert" it into my resource definitions?

Docjowles
Apr 9, 2009

Zapf Dingbat posted:

The place I work for is a startup that hires all noobs so everyone is discovering how to do things for the first time.

I'm running into a problem where my IAM policy is very bottlenecked by the owner of the company. I can create anything I want but I can never delete it, or even make some changes after creation. So I have to give a ticket to my boss that will sit there for months. We have a $35,000/month AWS bill.

How do real companies handle this?

You treat the people responsible for the infrastructure like adults and give them the necessary permission (both in IAM and organizationally) to do their job.

FISHMANPET posted:

OK, another probably dumb terraform question. Is there a way to reuse some settings definitions between resources of different types?

I'm setting up an Azure App Service app, and it has a concept of "slots". A slot is just another app service app, but it has a connection back to the original "app" and in terraform it's a different resource type. I'd like to keep them identical, especially because slots can swap around, and it's possible for the main app to "swap" with one of the slots, and then that could throw state into a tizzy. I suppose I could just do statements like "https_only = azurerm_linux_web_app.app.https_only" for all the statements. But is there some way where I can define a block outside of an individual resource and just "insert" it into my resource definitions?

I don't think you can factor out the entire block of settings. But you could at least put the values you want in a set of locals and then reference those each time so the values are all declared in one place.

12 rats tied together
Sep 7, 2006

this is the type of thing that yaml anchors and aliases were designed for

The Fool
Oct 16, 2003


FISHMANPET posted:

OK, another probably dumb terraform question. Is there a way to reuse some settings definitions between resources of different types?

I'm setting up an Azure App Service app, and it has a concept of "slots". A slot is just another app service app, but it has a connection back to the original "app" and in terraform it's a different resource type. I'd like to keep them identical, especially because slots can swap around, and it's possible for the main app to "swap" with one of the slots, and then that could throw state into a tizzy. I suppose I could just do statements like "https_only = azurerm_linux_web_app.app.https_only" for all the statements. But is there some way where I can define a block outside of an individual resource and just "insert" it into my resource definitions?

app service is lovely to manage in terraform. Just deploy the base resource, certs and custom domain bindings if you need them, add app settings to lifecycle ignore, and for slots only create them, don't manage swapping them in terraform

deployments, slot management, and app settings should be managed out of band

Zephirus
May 18, 2004

BRRRR......CHK

The Fool posted:

app service is lovely to manage in terraform. Just deploy the base resource, certs and custom domain bindings if you need them, add app settings to lifecycle ignore, and for slots only create them, don't manage swapping them in terraform

deployments, slot management, and app settings should be managed out of band

Seconding this, slot movement/promotion etc are deployment tasks. FYI if, like some of our devs, you're considering using slots for development be aware they'll all share the app service plan so if you cake the cpu or memory with your dev or uat slot it'll break your prod one too.

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
I won't be managing the swapping of slots or the app settings via terraform, I just don't want to get into a situation where I've applied different settings to my staging slot and the "production" slot, and then they get swapped, and now terraform is all mad because the state's messed up.

Once I get this running the odds of me touching it again are also very slim. I'm mostly going down this path because I burnt myself not turning on "https_only" when I hand-created the app the first time (and also an opportunity to play around with terraform).

The Fool
Oct 16, 2003


FISHMANPET posted:

I won't be managing the swapping of slots or the app settings via terraform, I just don't want to get into a situation where I've applied different settings to my staging slot and the "production" slot, and then they get swapped, and now terraform is all mad because the state's messed up.

Once I get this running the odds of me touching it again are also very slim. I'm mostly going down this path because I burnt myself not turning on "https_only" when I hand-created the app the first time (and also an opportunity to play around with terraform).

add the app settings block to lifecycle ignore and everything else should be the same, don't try to manage the differences in terraform

The Fool
Oct 16, 2003


Zephirus posted:

be aware they'll all share the app service plan

this isn't necessarily true
it is the default behavior but both azure and terraform support moving slots between asps

one of the teams I support does this to manage down time when doing upgrades and pre-deploy sku changes

crazypenguin
Mar 9, 2005
nothing witty here, move along

Zapf Dingbat posted:

The place I work for is a startup that hires all noobs so everyone is discovering how to do things for the first time.

I'm running into a problem where my IAM policy is very bottlenecked by the owner of the company. I can create anything I want but I can never delete it, or even make some changes after creation. So I have to give a ticket to my boss that will sit there for months. We have a $35,000/month AWS bill.

How do real companies handle this?

Docjowles posted:

You treat the people responsible for the infrastructure like adults and give them the necessary permission (both in IAM and organizationally) to do their job.

In addition to the above, which is completely correct, you use AWS Organizations.

Then every dev can have their own AWS account to do whatever with, and sensitive stuff you need to be careful with can live in separate accounts, where you can add appropriate controls to prevent anyone from assuming a permissive role to begin with, much less fat-fingering deletions of important stuff.

luminalflux
May 27, 2005



crazypenguin posted:

In addition to the above, which is completely correct, you use AWS Organizations.

Then every dev can have their own AWS account to do whatever with, and sensitive stuff you need to be careful with can live in separate accounts, where you can add appropriate controls to prevent anyone from assuming a permissive role to begin with, much less fat-fingering deletions of important stuff.

if you do this don't provision them with a root account of devs.email@company.com, instead use a shared email address like devops+devs.email@company.com so when they leave/get fired you can close the account easily/go through pw reset flow without having to ask IT to re-enable their inbox.

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams

The Fool posted:

this isn't necessarily true
it is the default behavior but both azure and terraform support moving slots between asps

one of the teams I support does this to manage down time when doing upgrades and pre-deploy sku changes

Are you able to actually manage this via terraform? I think I can create a slot and specify a different app service plan at creation time, but if I specify the app service plan that the main app is running on, it seems to store that as null in the state. Then if you try to change the app service plan (even if the value is the one it's actually running on) the plan will fail, I think because it tries to validate the old value of empty string and that validation fails. I'm considering filing a bug about it, unless there's some aspect I'm missing.

The Fool
Oct 16, 2003


FISHMANPET posted:

Are you able to actually manage this via terraform? I think I can create a slot and specify a different app service plan at creation time, but if I specify the app service plan that the main app is running on, it seems to store that as null in the state. Then if you try to change the app service plan (even if the value is the one it's actually running on) the plan will fail, I think because it tries to validate the old value of empty string and that validation fails. I'm considering filing a bug about it, unless there's some aspect I'm missing.

I was flying all day, but I can double check the behavior tomorrow. It for sure works on 2.x versions of the provider, was taken away with 3.x, then added back very recently.

Zephirus
May 18, 2004

BRRRR......CHK

The Fool posted:

I was flying all day, but I can double check the behavior tomorrow. It for sure works on 2.x versions of the provider, was taken away with 3.x, then added back very recently.

I genuinely had no idea you could do this, that's actually really useful.

prom candy
Dec 16, 2005

Only I may dance
I'm in an inherited infrastructure situation and not really a devops guy so looking for a sanity check. My prod database is getting pinged pretty hard and I need a zero(ish) downtime way of alleviating the load. It's MySQL 5.7 on RDS. I'm thinking what I may want to do is spin up a read replica of my database so that I can spread out the load. The DB is Multi-AZ so as far as I can tell that means AWS is capable of spinning up a replica without suspending I/O. Once that's done does RDS take care of routing reads/writes or do I need to update my application to do that as well?

The real solution here is to add a couple of indices to a table that the last guy really should have added BEFORE it got to millions of rows but since it's MySQL 5.7 that's not an option.

Does this sound like a moderately sane (if expensive) thing to do?

MightyBigMinus
Jan 26, 2020

5.7's not *that* old i'm pretty sure adding indexes is non-blocking

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
For a quick fix just slap a cache in front of it. Especially if the problem is the same expensive query being done over and over again.
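For what "slap a cache in front of it" can look like at its simplest, here is a rough sketch of an in-process cache with a time-to-live. Everything in it (the `TTLCache` name, the 30 second TTL, the report loader) is made up for illustration, and a real setup would more likely use something shared like Redis or memcached; this only shows the idea of answering repeats of the same expensive query from memory.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical in-process cache: keeps each result for ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling loader() only on a miss."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: the database is not touched
        value = loader()  # miss or expired: run the expensive query once
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=30)

    def expensive_report() -> list:
        # Stand-in for the slow query; assumed, not from the thread.
        time.sleep(0.1)
        return ["row1", "row2"]

    cache.get_or_load("daily_report", expensive_report)  # hits the "database"
    cache.get_or_load("daily_report", expensive_report)  # served from memory
```

The trade-off is staleness: readers can see results up to `ttl_seconds` old, so this only fits queries that can tolerate that.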

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:
Your app needs to handle routing reads and writes. This came up a few times - you can check my post history for some good detailed commentary on it as I had a similar question a few months back and other posters had great advice.

You can sorta do it on a proxy level but there’s no way to know from a networking perspective alone that an HTTP GET or even a SELECT won’t have side effects or cause a write somewhere. The app needs to be aware of its data needs and where to fetch them from, especially as there’s latency and data consistency to be aware of when you’ve got a distributed system.

You can however easily load balance across multiple read replicas with RDS by hitting the reader endpoint rather than the individual RR endpoint if I recall right.
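A minimal sketch of what app-level routing might look like, assuming the app knows for each query whether it writes and whether it can tolerate replica lag. The `EndpointRouter` class and the hostnames are hypothetical, not anything RDS provides; the point is only that the decision lives in application code.

```python
import itertools
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class EndpointRouter:
    """Hypothetical helper that picks a database host per query."""

    writer: str
    readers: List[str]
    _cycle: Iterator[str] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # With no replicas configured, every query falls back to the writer.
        self._cycle = itertools.cycle(self.readers or [self.writer])

    def endpoint_for(self, *, writes: bool, needs_fresh_data: bool = False) -> str:
        """Writes and read-your-own-write queries go to the writer;
        everything else round-robins across the replicas."""
        if writes or needs_fresh_data:
            return self.writer
        return next(self._cycle)


if __name__ == "__main__":
    # Made-up hostnames for illustration only.
    router = EndpointRouter(
        writer="prod-db.example.internal",
        readers=["prod-db-ro-1.example.internal", "prod-db-ro-2.example.internal"],
    )
    print(router.endpoint_for(writes=True))
    print(router.endpoint_for(writes=False))
    print(router.endpoint_for(writes=False, needs_fresh_data=True))
```

The `needs_fresh_data` flag is the consistency part: a page that reads back something the user just saved should not be sent to a replica that may be lagging.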

The Iron Rose fucked around with this message at 01:22 on Mar 3, 2023

prom candy
Dec 16, 2005

Only I may dance

MightyBigMinus posted:

5.7's not *that* old i'm pretty sure adding indexes is non-blocking

Oh hey you're right! This actually should solve my issue then.

luminalflux
May 27, 2005



The Iron Rose posted:

Your app needs to handle routing reads and writes. This came up a few times - you can check my post history for some good detailed commentary on it as I had a similar question a few months back and other posters had great advice.

If you can't fix it in the app because of Reasons, ProxySQL is what we've used to route queries between the leader and the replicas based on our knowledge of "this query can totally handle some replica lag". There's also GTID consistent reads where proxysql transparently knows which replicas are up to date or not if you're doing write-then-read.

But you should try to fix the app if you can.


MightyBigMinus posted:

5.7's not *that* old i'm pretty sure adding indexes is non-blocking

It Depends. On large and/or high traffic tables you can run into issues where you can't grab the metadata lock, or it causes massive performance issues while doing the online change. We use gh-ost for all schema migrations due to issues with online DDL changes causing performance degradation. It's also a lot nicer on the replication.

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:


i just learned a ton about mysql, the documentation for this rocks


e: this is in fact possibly the best documented github for a tool i've ever seen.

The Iron Rose fucked around with this message at 05:56 on Mar 3, 2023

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
Charity Majors has also written a lot about lessons learned from doing online MySQL schema migrations at Facebook and Honeycomb. The main lesson at scale if you can't handle downtime is "don't modify table DDL", and many companies will add a new column, update code to write both columns, then start doing background migrations of the old data before cutting the application over to only use the new column. After that, the old column is left permanently.

If you don't want to do this, consider downtime.
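A rough sketch of the add-column, dual-write, background-backfill pattern described above. It uses an in-memory SQLite database purely so it runs anywhere; the `users` table and the `full_name` and `display_name` columns are invented for the example, and on a large MySQL table you would want to throttle between batches rather than loop flat out.

```python
import sqlite3


def add_new_column(conn: sqlite3.Connection) -> None:
    """Step 1: add the new nullable column alongside the old one."""
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    conn.commit()


def write_user(conn: sqlite3.Connection, user_id: int, name: str) -> None:
    """Step 2: application code writes both the old and the new column."""
    conn.execute(
        "INSERT INTO users (id, full_name, display_name) VALUES (?, ?, ?)",
        (user_id, name, name),
    )
    conn.commit()


def backfill_batch(conn: sqlite3.Connection, batch_size: int) -> int:
    """Step 3: copy old data into the new column, a small batch at a time."""
    cur = conn.execute(
        "UPDATE users SET display_name = full_name WHERE id IN ("
        " SELECT id FROM users"
        " WHERE display_name IS NULL AND full_name IS NOT NULL"
        " ORDER BY id LIMIT ?)",
        (batch_size,),
    )
    conn.commit()  # short transactions keep lock time per batch small
    return cur.rowcount


def backfill_all(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Run batches until nothing is left; returns the number of rows copied."""
    total = 0
    while True:
        changed = backfill_batch(conn, batch_size)
        if changed == 0:
            return total
        total += changed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO users (id, full_name) VALUES (?, ?)",
        [(i, f"user{i}") for i in range(1, 6)],
    )
    add_new_column(conn)
    write_user(conn, 6, "new signup")  # arrives mid-migration, already dual-written
    print(backfill_all(conn, batch_size=2), "old rows backfilled")
```

Only once the backfill reports nothing left would the application be cut over to read the new column.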

e: poo poo, gh-ost looks awesome.

Docjowles
Apr 9, 2009

Oh cool, Gitlab is raising their per-user cost by 50% out of nowhere. Surprise motherfucker!

I look forward to my company spending 10x that amount in engineer-hours to migrate to GitHub or something out of spite (I have no indication leadership are thinking about that, but it's absolutely the kind of thing we would do lol)

vanity slug
Jul 20, 2010

github is still way more expensive than gitlab

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
e: removed

Vulture Culture fucked around with this message at 21:51 on Mar 3, 2023

Docjowles
Apr 9, 2009

vanity slug posted:

github is still way more expensive than gitlab

It's basically identical (before the stupid loving used car salesman song and dance of enterprise sales where you don't actually pay anything resembling list price) after this price increase

Actually, Github Enterprise is ~$21/user/month and Gitlab is about to be $29/user/month :confused: Unless there's some massive gotcha in Github's pricing, which would not be at all surprising

Docjowles fucked around with this message at 18:35 on Mar 3, 2023

xzzy
Mar 5, 2009

My suspicion is Gitlab's last price increase caused enough people to jump ship to the community version that they gotta recoup that money somewhere.

I'm at a super low budget place and we were okay with paying but they priced us out a ways back. Obviously I got no data on their customers but if there's a measurable number of orgs like ours maybe Gitlab is feeling a pinch.

prom candy
Dec 16, 2005

Only I may dance

luminalflux posted:

It Depends. On large and/or high traffic tables you can run into issues where you can't grab the metadata lock, or it causes massive performance issues while doing the online change. We use gh-ost for all schema migrations due to issues with online DDL changes causing performance degradation. It's also a lot nicer on the replication.

What's a large table in your opinion?

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

xzzy posted:

My suspicion is Gitlab's last price increase caused enough people to jump ship to the community version that they gotta recoup that money somewhere.

I'm at a super low budget place and we were okay with paying but they priced us out a ways back. Obviously I got no data on their customers but if there's a measurable number of orgs like ours maybe Gitlab is feeling a pinch.
I haven't admined an instance in a hot minute but don't they gate federation/oauth behind enterprise licensing? Runners too, IIRC. You can live without runners but it'd be hard for a lot of people to live without federation of some kind.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
One difference between cloud GitHub and cloud GitLab is that one of them will work if us-east-1 fails

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
Yeah, that one component will be up while everything else in your entire office toolchain is broken

luminalflux
May 27, 2005



Vulture Culture posted:

Charity Majors has also written a lot about lessons learned from doing online MySQL schema migrations at Facebook and Honeycomb. The main lesson at scale if you can't handle downtime is "don't modify table DDL", and many companies will add a new column, update code to write both columns, then start doing background migrations of the old data before cutting the application over to only use the new column. After that, the old column is left permanently.

If you don't want to do this, consider downtime.

e: poo poo, gh-ost looks awesome.

For background row deletions, there's pt-archiver, which can update or delete in the background without bothering the database too much. It monitors replica lag and will throttle itself. gh-ost similarly is amazing at that, and even better you can run the data migration into the new table during the day, and wait to cut over to the new table in your traffic trough / maintenance window. It's a quantum leap past pt-osc (percona toolkit's online schema change), which is showing its age now.
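To make the "delete in the background and throttle on replica lag" idea concrete, here is a toy sketch. It is not how pt-archiver is implemented; it runs against SQLite so it is self-contained, and `purge_in_batches`, the `replica_lag_seconds` callback and the thresholds are all assumptions for illustration. On MySQL the lag check would be whatever replication metric you already trust.

```python
import sqlite3
import time
from typing import Callable


def purge_in_batches(
    conn: sqlite3.Connection,
    table: str,
    where: str,
    batch_size: int,
    replica_lag_seconds: Callable[[], float],
    max_lag_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Delete matching rows in small batches, pausing while replicas lag.

    table and where are interpolated into SQL, so they must come from
    trusted code, never from user input.
    """
    deleted = 0
    while True:
        # Back off until the (caller-supplied) lag measurement recovers.
        while replica_lag_seconds() > max_lag_seconds:
            sleep(1.0)
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN ("
            f" SELECT rowid FROM {table} WHERE {where} LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # one short transaction per batch
        if cur.rowcount == 0:
            return deleted
        deleted += cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, old INTEGER)")
    conn.executemany("INSERT INTO events (old) VALUES (?)", [(1,)] * 7 + [(0,)] * 3)
    # Pretend the replicas are always caught up.
    removed = purge_in_batches(conn, "events", "old = 1", 3, lambda: 0.0)
    print(removed, "rows purged")
```

The same loop with a `WHERE` that matches every row is roughly the "empty the table first, then drop it" approach to big tables.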


prom candy posted:

What's a large table in your opinion?

Anything over 10mm rows is where I start reaching for gh-ost, but the table can be smaller and have issues if it has high update traffic on it. We've now enforced that all migrations go through gh-ost to avoid having arguments over "well my table is small enough, it shouldn't matter!" and avoid having to try to make judgment calls or do stuff like `SELECT table_name,table_rows FROM information_schema.tables WHERE table_name = 'buttslol' ORDER BY table_rows DESC; ` whenever we get a migration PR.

prom candy
Dec 16, 2005

Only I may dance

luminalflux posted:

Anything over 10mm rows is where I start reaching for gh-ost, but the table can be smaller and have issues if it has high update traffic on it. We've now enforced that all migrations go through gh-ost to avoid having arguments over "well my table is small enough, it shouldn't matter!" and avoid having to try to make judgment calls or do stuff like `SELECT table_name,table_rows FROM information_schema.tables WHERE table_name = 'buttslol' ORDER BY table_rows DESC; ` whenever we get a migration PR.

Thanks. I think I'm still in the zone where I can YOLO this during off peak hours but gh-ost is definitely great to know about.


luminalflux
May 27, 2005



prom candy posted:

Thanks. I think I'm still in the zone where I can YOLO this during off peak hours but gh-ost is definitely great to know about.

Another thing to consider is how you handle dropping tables. We had some severe issues with RDS, each time we'd reboot with failover - something that AWS claims should be "fairly quick", we'd end up having hours+ of downtime. Way back in the day someone dropped a big table and while it was dropping, it took down the site due to ... idk what was the actual issue. So they rebooted the database.

We only discovered years later that the database files were left around on the RDS instance's filesystem, and each time it booted it would say "huh, i have table files but no record of them in the metadata table. I have no idea what's real, so I need to go into crash recovery". AWS support would join our zoom bridge, but they basically could only troubleshoot as far as "well the instance is working normally, let me page the database team", taking 30min-1hr to get to a point where we had an expert online.

Learnings?

  • Even though RDS can do multi-AZ, don't rely on it. Have a replica you can switch to and promote instead. Ideally you have
  • Drop tables gracefully by emptying them first using pt-archiver.
  • AWS support isn't enough. Either you need in-house expertise on your primary datastore, or engage experts such as Percona or Pythian.
  • If you intend to scale (i.e., you're a startup that's scaling up customers etc.), build out best practices and tooling while you're small enough before you end up with huge amounts of inadvertent downtime

(i realize i'm basically drip-feeding my PerconaLive talk from 2020 i never gave because, uh, the 'rona)
