Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Just-In-Timeberlake posted:

boy I sure do love trying to unfuck a query based on a dozen views that are all like

view_1
view_2
view_3
...
view_final

and each view builds on the previous one until you get to the Voltron final view

which one is the problem? have fun unraveling each one!

imma murder the fucker who wrote this if I ever find him.

I did that last year! Collapsing it all into one view helped performance quite a bit, and then we could have dropped all the sub-views, since nothing used them anymore. We didn't, because :effort:, but we could have :sigh:
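
A minimal sketch of the before/after, with invented names:

-- before: each view stacked on the last
CREATE VIEW view_1 AS
    SELECT order_id, customer_id, amount
    FROM orders;

CREATE VIEW view_2 AS
    SELECT v.order_id, v.amount, c.region
    FROM view_1 v
    JOIN customers c ON c.customer_id = v.customer_id;

CREATE VIEW view_final AS
    SELECT region, SUM(amount) AS total
    FROM view_2
    GROUP BY region;

-- after: one view that says what it actually does
-- (DROP the old chain or CREATE OR REPLACE in practice)
CREATE VIEW view_final AS
    SELECT c.region, SUM(o.amount) AS total
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.region;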

Hughmoris
Apr 21, 2007
Let's go to the abyss!
For those in the field, which way do you see the wind blowing for DBAs over the next few years? From the outside looking in, it looks like the cloud is being pushed more and more. I was watching a Microsoft video where a prominent leader of Azure SQL was talking about the future of DBAs, and how the role isn't going away but will likely morph into a solution architect.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
I don't buy it. They've pushed for a solid decade to get everyone to go NoSQL and it hasn't happened. Things need to be really, really large, or particularly susceptible to scalability outages, to be worth the hassle and expense of making it work. When I was at AWS I saw lots of complaints about relational databases, even principal-engineer mandates to avoid them, but there was zero guidance on how to make it happen. As a result, software developers were left to invent ways to store data, which usually meant ORM or WordPress-site-equivalent designs; i.e., the flattest, most inefficient models you can imagine.

Meanwhile, hardware is cheap and MTBF is relatively high. Relational databases are backed by a ton of theory, continue to evolve with increases in memory and CPU power, and are often less expensive than a bunch of custom software. Even with cloud solutions like Aurora, database performance requires attention, knowledge, experience, investigation, and adaptation. Sure, the cloud knows how much memory to give to buffers based on hardware settings; it's not telling you how to lay out your tables or optimize queries.

It also seems like cloud data tooling is in its infancy. Companies don't think about ETL and reporting automation until someone thinks "they need ML". More simply, engineers don't know those tools, while college graduates are more likely to have learned basic relational design. With more, more, more data these days, companies will increasingly field customer requests to extend data models and volume, so performance optimization and investigation will still be important.

Disclaimer: the world generally does the exact opposite of what (I think) makes sense.

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
Isn't NoSQL non-relational? How the gently caress is a push to a non-relational system supposed to work when an overwhelming majority of enterprise data is relational?

abelwingnut
Dec 23, 2002


D34THROW posted:

Isn't NoSQL non-relational? How the gently caress is a push to a non-relational system supposed to work when an overwhelming majority of enterprise data is relational?

it makes zero sense. the people who have pushed this obviously have never heard of a JOIN.

raminasi
Jan 25, 2005

a last drink with no ice
My team spoke with an AWS technical person about some trouble we were having wiring their managed GraphQL service up to a relational database (disclaimer: this was a prototype and we were exploring whether it made sense knowing that it might not) and the answer we got was “Why aren’t you using DynamoDB?” He did not seem at all prepared for our answer, which was “Because we have relational requirements on our data and nowhere near enough of it to feel any performance constraints from a relational system.”

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Yeap, it's basically all of those things. While solutions architects and cloud consultants won't universally bring up NoSQL, you'll hear an increasing number telling you that it's more horizontally scalable, vertically scalable, available, shardable, redundant, etcetera. While some of that is correct, it may also be the extent of their reasoning; i.e., they aren't solving your product's problem statement.

Take a look at this. It's not an awful design, but read between the lines: https://aws.amazon.com/blogs/database/options-for-legacy-application-modernization-with-amazon-aurora-and-amazon-dynamodb/. What is the author actually telling you about your relational database?

Or try the article from Azure.

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
Back in ye olde days of Access and SharePoint lists, the non-relational structure of the latter is specifically what prevented me from using it. What kind of datasets are going to bog down a relational DB, and what lack of indexing is occurring there :psyduck:

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


D34THROW posted:

Back in ye olde days of Access and SharePoint lists, the non-relational structure of the latter is specifically what prevented me from using it. What kind of datasets are going to bog down a relational DB, and what lack of indexing is occurring there :psyduck:

Lots of idiots don't do indexes. That's really what nosql abstracts away for them: index maintenance. Make your information a bit more hierarchical and store it as whole serialized objects with no referential integrity to any other object. It feels hella unnatural, and the only use case I've really found for it was storing, like, an enterprise cloud app's weblogs and poo poo: shittons of long text records that you want better free-text searching within.

Wipfmetz
Oct 12, 2007

If one or more Wipfe are sitting in a mine cart, you can see them peeking out over the rim of the cart.
Does "index maintenance" include the definition of primary keys? I'm always very suspicious of people who don't give their primary keys the love they deserve.

They're not a technical detail; "What's the PK here" is one of the first questions which anybody should ask themself when they start working with any form of dataset.

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
Perhaps it's a bit like Hungarian notation in practice, but my PKs and FKs are always blatantly obvious, named either "id" or "<foreign table>_id". Even when there is a field that should be unique between rows, I create an integer PK to avoid potential issues.

But i am babby at this so.
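
To make that concrete, a minimal sketch of the convention (invented tables; identity/autoincrement syntax varies by DBMS):

CREATE TABLE job (
    id INTEGER PRIMARY KEY,                -- surrogate PK, always "id"
    job_number VARCHAR(8) NOT NULL UNIQUE  -- the "should be unique" field, kept unique but not the PK
);

CREATE TABLE material_report (
    id INTEGER PRIMARY KEY,
    job_id INTEGER NOT NULL,               -- FK named "<foreign table>_id"
    FOREIGN KEY (job_id) REFERENCES job (id)
);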

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Wipfmetz posted:

Does "index maintenance" include the definition of primary keys? I'm always very suspicious of people who don't give their primary keys the love they deserve.

They're not a technical detail; "What's the PK here" is one of the first questions which anybody should ask themself when they start working with any form of dataset.

Defining indices on stupid columns, not maintaining indices, not having supporting indices on FK columns/commonly queried columns, all that poo poo here.


D34THROW posted:

Perhaps it's a bit like Hungarian notation in practice, but my PKs and FKs are always blatantly obvious, named either "id" or "<foreign table>_id". Even when there is a field that should be unique between rows, I create an integer PK to avoid potential issues.

But i am babby at this so.

This is 100% good practice from start to finish. Comparing ints is a single-step operation for a CPU, and yet some folks try to PK on a VARCHAR instead.

Wipfmetz
Oct 12, 2007

If one or more Wipfe are sitting in a mine cart, you can see them peeking out over the rim of the cart.

Nth Doctor posted:

This is 100% good practice from start to finish. Comparing ints is a single-step operation for a CPU, and yet some folks try to PK on a VARCHAR instead.
If you only consider the performance side of database operations, then sure.
And if you have a usage scenario where such performance benefits are required.

I'm usually working with engineering data, so it all lives in cute little tables where the typical dataset size rarely enters the millions.
I'll happily put PKs on VARCHARs whenever it reduces _my_ time, e.g. because I get obvious uniqueness guarantees on which I can build my queries.

I'm usually also adding an autoincrement field to direct my FKs to. No idea if that is stupid or not, performance-wise.

(I guess it would be better to put the PK on the autoincrement and ensure those "obvious uniqueness guarantees" with additional unique indices. That should make INSERTs easier.)
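
A sketch of the two layouts I mean (invented names; pick one, autoincrement syntax varies by DBMS):

-- option A: PK directly on the natural VARCHAR key
CREATE TABLE part (
    part_code VARCHAR(20) PRIMARY KEY,
    description VARCHAR(200)
);

-- option B: autoincrement surrogate PK, natural key demoted to a unique index
CREATE TABLE part (
    id INTEGER PRIMARY KEY,                 -- autoincrement/identity in practice
    part_code VARCHAR(20) NOT NULL UNIQUE,  -- the "obvious uniqueness guarantee"
    description VARCHAR(200)
);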

Wipfmetz fucked around with this message at 14:33 on Mar 7, 2022

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:

Nth Doctor posted:

This is 100% good practice from start to finish. Comparing ints is a single-step operation for a CPU, and yet some folks try to PK on a VARCHAR instead.

Case in point: in the application I'm (re)working on now, every material report is tied to a job. That job has a unique identifier in the format JJ######. I could either A) store the job number as an integer, with all the formatting headaches that brings (adding leading zeroes where necessary, adding the JJ prefix), or B) store the job number as a VARCHAR(8) and not have to gently caress with formatting. :effort:

A year ago, I would have tried to brute-force the more complicated solution as a chest-beating programming-man moment.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

D34THROW posted:

Isn't NoSQL non-relational? How the gently caress is a push to a non-relational system supposed to work when an overwhelming majority of enterprise data is relational?

As linked above, all of the major cloud vendors offer a much-more-complicated solution where you can effectively represent relational data in NoSQL databases that are basically just key/value stores with CQRS, as seen in Azure's docs here.

I've seen teams move applications to this. Engineers love it because it's a ton of new things to learn and study and concepts to reason about, and the cloud vendors love it because it moves the data storage into platform-specific solutions like DynamoDB, Lambdas, and all sorts of entirely proprietary things, which means the application has vendor-specific code all the way to the core. Also, once you've done it, you can scale up near-infinitely, at the cost of a higher cloud bill.

Done right, it really is more resilient, at the cost of doing something like 20x the total compute work and complexity, but all of that additional compute work is horizontally scalable, unlike a single master database node.

Edit: The biggest practical advantage I've seen from NoSQL is that you can give every team isolated database resources, preventing one team from pooping in the pool and impacting other teams. I spent 5 years working in a huge, many-team, microservices world where a given data model would sprawl across teams, but nobody wanted teams sharing a database, so it's not like SQL databases could be used effectively either.
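
For flavor, a hedged sketch of what "relational data in a key/value store" usually ends up as (DynamoDB-style single-table layout, names invented):

partition key    sort key            attributes
CUSTOMER#123     PROFILE             name, region
CUSTOMER#123     ORDER#2022-03-01    amount, status
CUSTOMER#123     ORDER#2022-03-05    amount, status

Every JOIN you used to write becomes an up-front key-design decision instead.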

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


D34THROW posted:

Case in point in the application I'm (re)working on now, every material report is tied to a job. That job has a unique identifier in the format JJ######. I could either A) store the job number directly as an integer with all the formatting headaches that brings (adding leading zeroes if necessary, adding the JJ prefix), or store the job number as a VARCHAR(8) and not have to gently caress with formatting. :effort:

A year ago, I would have tried to brute-force the more complicated solution as chest-beating programming-man moment.

You could even store the job number as a CHAR(8) if you knew it would never, ever change. Storing it as a VARCHAR(8) locks you into that lurking "what if we need a seventh numerical digit?" landmine just as easily.

I'm a big advocate for identity columns even when there's already a uniquely valued column in the table, if that column isn't comparable in a single CPU step. There's no reason to spend multiple cycles comparing J, J, 4, 2, 1, 2, 6, 3 to J, J, 4, 2, 1, 2, 6, 0 one by one when you can just as easily see that 34920 vs. 88382381 isn't a match in a single step.
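
And if the job number does stay as a plain character column, a CHECK can at least pin the format down. A T-SQL-flavored sketch (table/constraint names invented):

ALTER TABLE job
    ADD CONSTRAINT ck_job_number_format
    CHECK (job_number LIKE 'JJ[0-9][0-9][0-9][0-9][0-9][0-9]'); -- JJ + six digits, nothing else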

nielsm
Jun 1, 2009



Nth Doctor posted:

I'm a big advocate for identity columns even when there's already a uniquely valued column in the table, if that column isn't comparable in a single CPU step. There's no reason to spend multiple cycles comparing J, J, 4, 2, 1, 2, 6, 3 to J, J, 4, 2, 1, 2, 6, 0 one by one when you can just as easily see that 34920 vs. 88382381 isn't a match in a single step.

On the other hand, 8 bytes is the size of an int64, and your CPU can already compare two int64 values in a single instruction. Compilers for C++ and other languages can translate a comparison of an array of 8 bytes into a comparison of a single 64-bit value, depending on the exact context, so a database engine might be able to do the same.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


nielsm posted:

On the other hand, 8 bytes is the size of an int64, and your CPU can already compare two int64 values in a single instruction. Compilers for C++ and other languages can translate a comparison of an array of 8 bytes into a comparison of a single 64-bit value, depending on the exact context, so a database engine might be able to do the same.

You raise a decent point in the case where the strings are <= 8 characters long, that's true. I would also bet modern DBMS engines are smart enough to use that trivial escape hatch. You still have the collation layer to get through, though. Does the server/logical database/column have a collation set for case sensitivity vs. case insensitivity? That can mean extra work that the engine wouldn't necessarily know about a priori.
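
For instance, in T-SQL you can watch the collation change the answer (collation names are just examples):

SELECT CASE WHEN 'JJ421263' = 'jj421263' COLLATE Latin1_General_CI_AS
            THEN 'equal' ELSE 'not equal' END;  -- 'equal': case-insensitive matching costs case-folding work

SELECT CASE WHEN 'JJ421263' = 'jj421263' COLLATE Latin1_General_BIN2
            THEN 'equal' ELSE 'not equal' END;  -- 'not equal': plain byte-by-byte comparison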

Just-In-Timeberlake
Aug 18, 2003
pertinent to this conversation

https://www.youtube.com/watch?v=b2F-DItXtZs

Bruegels Fuckbooks
Sep 14, 2004

Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.

12 years later and it only gets worse

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Rookie question: Can someone explain, or point me towards a reference explaining, how log speed/size affects transaction speeds?

I saw a blog post from Brent Ozar where he points out that a $22k/mo Azure SQL Database is woefully slow at loading data.

https://www.brentozar.com/archive/2019/02/how-fast-can-a-21468-mo-azure-sql-db-load-data/

The way I understood it is that the log file speed limit was woefully low. I'm starting my SQL Server journey (beyond writing reports) and still not sure how the log controls the speed.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Wipfmetz posted:

Does "index maintenance" include the definition of primary keys? I'm always very suspicious of people who don't give their primary keys the love they deserve.

They're not a technical detail; "What's the PK here" is one of the first questions which anybody should ask themself when they start working with any form of dataset.

if your pk ain't a surrogate it's a shameful pk

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Hughmoris posted:

Rookie question: Can someone explain, or point me towards a reference explaining, how log speed/size affects transaction speeds?

I saw a blog post from Brent Ozar where he points out that a $22k/mo Azure SQL Database is woefully slow at loading data.

https://www.brentozar.com/archive/2019/02/how-fast-can-a-21468-mo-azure-sql-db-load-data/

The way I understood it is that the log file speed limit was woefully low. I'm starting my SQL Server journey (beyond writing reports) and still not sure how the log controls the speed.

I think you should start reading here. The MSDN pages about MSSQL are usually very good.

In brief though, the way transactional databases are typically implemented is that when you're changing data, you first record what you're going to do in a transaction log, and once that's been persisted you can go off and update the actual tables and indexes, usually asynchronously (you don't flush those to disk after every single transaction commit). That way, if you crash or lose power while you're updating the big table and index files (or before the changes have been flushed to disk), you can recover from the transaction log, which is cheaper to write to. In postgres this is called the write-ahead log (WAL); in MSSQL it's just the transaction log. Apparently what's going on with Azure SQL is that while you can get a shitton of disk space and very fast I/O for the actual tables and indexes, that doesn't change the I/O speed for the transaction log, so if you have a lot of small transactions you're bottlenecked by the log write speed.
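
To make the bottleneck concrete, a sketch (table invented): in autocommit mode every statement is its own transaction, so every row pays for a log flush, while batching pays once per COMMIT.

-- slow: one transaction (and one log flush) per row
INSERT INTO staging_rows (id, payload) VALUES (1, 'a');
INSERT INTO staging_rows (id, payload) VALUES (2, 'b');
-- ...thousands more, each waiting on the log

-- faster: one transaction, one log flush at COMMIT
BEGIN TRANSACTION;
INSERT INTO staging_rows (id, payload) VALUES (1, 'a');
INSERT INTO staging_rows (id, payload) VALUES (2, 'b');
-- ...
COMMIT;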

TheFluff fucked around with this message at 03:30 on Mar 8, 2022

Hughmoris
Apr 21, 2007
Let's go to the abyss!

TheFluff posted:

I think you should start reading here. The MSDN pages about MSSQL are usually very good.

In brief though, the way transactional databases are typically implemented is that when you're changing data, you first record what you're going to do in a transaction log, and once that's been persisted you can go off and update the actual tables and indexes, usually asynchronously (you don't flush those to disk after every single transaction commit). That way, if you crash or lose power while you're updating the big table and index files (or before the changes have been flushed to disk), you can recover from the transaction log, which is cheaper to write to. In postgres this is called the write-ahead log (WAL); in MSSQL it's just the transaction log. Apparently what's going on with Azure SQL is that while you can get a shitton of disk space and very fast I/O for the actual tables and indexes, that doesn't change the I/O speed for the transaction log, so if you have a lot of small transactions you're bottlenecked by the log write speed.

That makes sense, thanks for the explanation!

*Very timely video that YT recommended: Data Loading Best Practices on Azure SQL DB https://www.youtube.com/watch?v=WP-bqtVdJg4

Hughmoris fucked around with this message at 15:31 on Mar 8, 2022

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


TheFluff posted:

if your pk ain't a surrogate it's a shameful pk

:emptyquote:

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:

TheFluff posted:

if your pk ain't a surrogate it's a shameful pk

I don't think I've ever used anything but a surrogate PK. Just always made sense to me to let the DBMS do it.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Nth Doctor posted:

Defining indices on stupid columns, not maintaining indices, not having supporting indices on FK columns/commonly queried columns, all that poo poo here.

This is 100% good practice from start to finish. Comparing ints is a single-step operation for a CPU, and yet some folks try to PK on a VARCHAR instead.

I wouldn't use a varchar, but a uniqueidentifier or equivalent is a good idea if you think the keys might make it to the front end, because then they shouldn't be guessable.

I've seen it used as a way to make slamming a bunch of data into multiple tables a little easier, because you can make the PK/FK in code, but I'm not sure that was necessarily a good idea rather than an expedient one.

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003
turns out it actually doesn't matter for performance whether your primary key is an int or a varchar or a uuid. what matters is poo poo like n+1 query patterns and horrible joins produced by hibernate or equivalent garbage. you'll need someone who can read and interpret the output of explain long before you'll need to care about the number of cpu cycles per equality comparison
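
a sketch of the habit i mean (postgres flavor, names invented):

EXPLAIN ANALYZE
SELECT o.order_id, c.region
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE c.region = 'EMEA';
-- look for seq scans on big tables, row estimates wildly off from actuals,
-- and nested loops firing once per outer row (the n+1 smell)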

redleader
Aug 18, 2005

Engage according to operational parameters
i'm probably the horror, but i've never seen the point in having special guids or whatever so that ids can't be guessed on the front end. you should have authorization checks in place to prevent users accessing resources they're not allowed to regardless, so even if they changed id=69 to id=420 they should just get a 401 or 404 or whatever you like

maybe knowing the current id and growth rate could leak a bit of information? seems a bit of a niche problem

i guess it could be defence in depth if your authn/z checks are buggy?

plus copying/typing numeric ids is just so dang convenient

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
There are use cases for "anyone with the url is presumed to be allowed access", but you still don't want everything to be publicly browsable.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

redleader posted:

i'm probably the horror, but i've never seen the point in having special guids or whatever so that ids can't be guessed on the front end. you should have authorization checks in place to prevent users accessing resources they're not allowed to regardless, so even if they changed id=69 to id=420 they should just get a 401 or 404 or whatever you like

maybe knowing the current id and growth rate could leak a bit of information? seems a bit of a niche problem
It broadens your attack surface. There are a couple of good summaries of the various concerns and tradeoffs online.

If you have an API that can authorize an id+password, it will be much easier to break in with a scan of [0-65563]/password123 than by trying accounts individually. If IDs are sequential, it's fairly fast to find a reasonable upper limit and then try to hide the scan as normal traffic. For a random ID space, most results will show up as an invalid-id metric, which is much more detectable as an attack (even though you only respond to the user with Naup and no details).

Consider something like email. If an external mailer wants to attack a company, they need to find names. If addresses were all abc123@, they would be able to increase their true positive/negative rates.

For most cases it's there to protect against unforeseen API issues. Sure, your users are authorized on FaceGlimpse, but then a frontend team decides to make the messenger use user IDs. Since the site grants you access to your friends via their email accounts, the app can map their user IDs so you can message them. Yay! Then some kid in college skipping their art class decides... I can just message user IDs 1 through 99999 and see if they know how to answer this homework question. Everything is authorized, because those steps precede the message API, which can assume "they must be friends".

RandomBlue
Dec 30, 2012

hay guys!


Biscuit Hider
I would simply not have security holes in the first place, then my ids don't matter.

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003
feel free to have an internal-to-external id mapping table that is used in all api views. relish the joy of not giving a gently caress what underlying id type the dba/pm/idiot/yourself wanted to use at the time. know that performance is impacted approximately 0.00000%
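
a sketch (postgres flavor; gen_random_uuid() is built in from 13, pgcrypto before that):

create table api_id_map (
    external_id uuid primary key default gen_random_uuid(), -- what the api hands out
    internal_id bigint not null unique                      -- whatever id the tables actually use
);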

redleader
Aug 18, 2005

Engage according to operational parameters
half my job is just mapping ids around

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
What's the largest number of columns you've ever had to work with in one table? I have a table with 20 columns, 5 of which are FKs to lookup tables.

This is as concise as I think I can make it, and I still feel it's excessive. Someone tell me it's not :gonk:

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy

D34THROW posted:

What's the largest number of columns you've ever had to work with in one table? I have a table with 20 columns, 5 of which are FKs to lookup tables.

This is as concise as I think I can make it, and I still feel it's excessive. Someone tell me it's not :gonk:

at my current job there is a characteristics table that has 160 columns

Sir Bobert Fishbone
Jan 16, 2006

Beebort
Yeah, I regularly run across tables with 150-200 columns in them.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

raminasi posted:

My team spoke with an AWS technical person about some trouble we were having wiring their managed GraphQL service up to a relational database (disclaimer: this was a prototype and we were exploring whether it made sense knowing that it might not) and the answer we got was “Why aren’t you using DynamoDB?” He did not seem at all prepared for our answer, which was “Because we have relational requirements on our data and nowhere near enough of it to feel any performance constraints from a relational system.”

I’m the opposite of that guy: when I see someone using noSQL, my question is usually “so why aren’t you using Postgres?”.

RandomBlue
Dec 30, 2012

hay guys!


Biscuit Hider
Hate being forced to work with MySQL. It's such a garbage db. Has it improved? Sure. Is it good now? No.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



redleader posted:

maybe knowing the current id and growth rate could leak a bit of information? seems a bit of a niche problem

i guess it could be defence in depth if your authn/z checks are buggy?

Yep, both of these are decent reasons to use GUIDs IMO.

GUID reasons, even :dadjoke:

  • Reply