TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Guildenstern Mother posted:

Is it possible someone could explain composite tables and their use to me, in the kind of terms you'd use with a particularly slow learning child?

A composite table? I haven't heard that term used before. Composite keys, sure, but not composite tables. When I tried googling it I didn't really find anything relevant, but I did see some people on stackoverflow using the term to refer to join tables/junction tables that have only foreign key columns that all form a composite primary key together. I can explain composite keys, and then you can tell me if you're on a different track entirely.

Most tables in most SQL databases have what's called a surrogate primary key. In plain terms, that means that each row gets a unique identifier (a key) that doesn't mean anything (surrogate). In other words, we make up a value (commonly either a positive integer or a UUID) that's unique for this table when we insert a row, and the value has nothing to do with the data we actually want to store, it's just something we can use to identify the row. You can of course have more than one column in a table with a unique index on it, but in many database engines the primary key has a bit of a special status under the hood - we don't need to get into the details of that right now.
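To make that concrete, a minimal sketch (Postgres syntax, table and column names invented for illustration):

code:
-- Surrogate primary key: id is generated by the database and has no
-- meaning outside of identifying the row.
CREATE TABLE users (
    id    BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,  -- also unique, but not the primary key
    name  TEXT NOT NULL
);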

The opposite of a surrogate primary key is a natural primary key. What that means is that instead of making up a unique ID for each row, we look at the data we're going to store in each row and see that a-ha, here's something about that data that actually happens to be a unique identifier already, and so we use that as the primary key instead of making up our own ID for each row. For example, we could make a table of users and use the username (or email address) as the primary key. Now, the thing here is that indexes in relational databases can cover more than one column, and that extends to primary keys. To make up a contrived example, say we have a table of products from different manufacturers, and we want to use the model number as a primary key, but some manufacturers use the same model numbers - then we could use a primary key that consists of the two columns (manufacturer_id, model_number). This is called a composite primary key - in simpler terms it's just a multi-column primary key.
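The contrived example as DDL (a sketch, assuming a manufacturers table with a surrogate id already exists):

code:
-- Composite natural primary key: neither column is unique on its own,
-- but the combination is.
CREATE TABLE products (
    manufacturer_id BIGINT NOT NULL REFERENCES manufacturers (id),
    model_number    TEXT   NOT NULL,
    PRIMARY KEY (manufacturer_id, model_number)
);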

Now, say that you have a many-to-many relationship where the join table doesn't contain any information except the two foreign keys. For example, say we have teachers and classes, one teacher can teach many classes and one class can have many teachers, and we have a join table teachers_classes with the two foreign keys teacher_id, class_id. Since a teacher shouldn't be teaching the same class more than once, the combination (teacher_id, class_id) is unique and we can make these two columns the composite primary key of teachers_classes. Then we don't need to have any other columns in that table. I think this is what the people on stackoverflow I found mean by "a composite table".
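That join table as DDL (a sketch, assuming teachers and classes both have surrogate id columns):

code:
CREATE TABLE teachers_classes (
    teacher_id BIGINT NOT NULL REFERENCES teachers (id),
    class_id   BIGINT NOT NULL REFERENCES classes (id),
    -- the combination is unique, so it can be the primary key,
    -- and no other columns are needed
    PRIMARY KEY (teacher_id, class_id)
);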

As for when you should use composite primary keys, the answer is simple: you shouldn't. There are many reasons for this, but the most important one is that a primary key must be stable, by which is meant that it must never change. However, essentially all data that you put into a database will need to change at some point, if not now then at some later point when the requirements have changed. Additionally, composite primary keys have almost no meaningful benefits - a surrogate primary key is only a couple of extra bytes per row, and you can always put a unique index on your supposed composite primary key to ensure data integrity. There are of course some niche use cases, but they're so rare that in general it's very safe to say that you should always use a surrogate primary key.
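The recommended alternative, sketched out (same junction table, but with a surrogate key plus a separate unique constraint):

code:
CREATE TABLE teachers_classes (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    teacher_id BIGINT NOT NULL REFERENCES teachers (id),
    class_id   BIGINT NOT NULL REFERENCES classes (id),
    -- same data integrity guarantee, just not via the primary key
    UNIQUE (teacher_id, class_id)
);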


I realize this might've been an answer to something you didn't ask, and sorry if I over-explained. Still, I hope there was something useful in there.

TheFluff fucked around with this message at 00:37 on Jan 8, 2022


Guildenstern Mother
Mar 31, 2010

Why walk when you can ride?
No, that was fantastically helpful actually, thank you so much! This is in regards to my question upthread: I have the tables set up like abelwingnut suggested, but was advised to make a table with the composite ID between recipes and ingredients. While my buddy was helpful getting all this nonsense set up correctly in Spring, his explanation of what a composite ID was and why we were making it was a bit muddy. I'm still not exactly clear on what he intended for me to use it for, but it seems like it's not going to do anything that my junction table doesn't do already, so I'll probably go ahead and delete it.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Guildenstern Mother posted:

Is it possible someone could explain composite tables and their use to me, in the kind of terms you'd use with a particularly slow learning child?

"Remember when your teacher had every child get their favorite toy? That list, one item from the first collection of children matched with one item from the second collection of toys, forms the rows of the composite table."

The purpose is to relate two collections. Each relationship is a statement matching one item from the 'left' collection with one item from the 'right' collection. The objects in each collection have their own properties, so it doesn't make sense to just indicate that one is part of the other (with a simple foreign key, such as a book referencing its publisher). In fact it permits multiple relationships (one book can have many authors, yet many books can be associated with one author). Meanwhile, such a relationship could have properties of its own (a person on a movie has a role, but may have different or multiple roles on different movies).
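The movie example as a sketch (names invented; the role column is what stops this from being a plain two-column junction table):

code:
CREATE TABLE movie_credits (
    person_id BIGINT NOT NULL REFERENCES people (id),
    movie_id  BIGINT NOT NULL REFERENCES movies (id),
    role      TEXT   NOT NULL,  -- e.g. 'director', 'actor'
    PRIMARY KEY (person_id, movie_id, role)  -- one person, many roles
);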

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

TheFluff posted:

As for when you should use composite primary keys, the answer is simple: you shouldn't. There are many reasons for this, but the most important one is that a primary key must be stable, by which is meant that it must never change.

This argument doesn't make any sense. It applies to the creation of surrogate primary keys for tables that aren't junction tables ("composite tables"), but it doesn't apply to junction tables. Those are using the primary keys of your other, non-junction tables, which you've already very sensibly chosen to be meaningless outside the database. The composite primary keys are therefore also meaningless outside the database.

edit: I thought you were exclusively discussing junction tables in the passage I quoted, but re-reading the post, you might not be.

edit: also, I realise it's really irritating to type several paragraphs to help someone out and then a third person comes in and nitpicks a small part of it, so feel free to tell me to sod off if you're so inclined.

Hammerite fucked around with this message at 13:06 on Jan 8, 2022

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Hammerite posted:

This argument doesn't make any sense. It applies to the creation of surrogate primary keys for tables that aren't junction tables ("composite tables"), but it doesn't apply to junction tables. Those are using the primary keys of your other, non-junction tables, which you've already very sensibly chosen to be meaningless outside the database. The composite primary keys are therefore also meaningless outside the database.

edit: I thought you were exclusively discussing junction tables in the passage I quoted, but re-reading the post, you might not be.

edit: also, I realise it's really irritating to type several paragraphs to help someone out and then a third person comes in and nitpicks a small part of it, so feel free to tell me to sod off if you're so inclined.

Don't worry about it, I've been guilty of more than one nitpick myself :v:

Anyway, yeah, I was talking about composite keys in general. It's true that in the specific case of a junction table that has only the foreign keys, a composite primary key is fine. However, it doesn't really have much of a meaningful benefit, and as always, changing requirements might make it necessary to change it later. In some databases (e.g. MSSQL) composite primary keys also have some other caveats to them (since the on-disk structure of the table is dictated by the primary key and its sorting order). In general I think it's not worth trying to nuance the recommendation away from composite primary keys, especially not if you're just introducing the concept.

TheFluff fucked around with this message at 17:29 on Jan 8, 2022

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
I have a bunch of filepaths as rows, using postgres. For the row /tank/my/example/path I would like to generate a row for each parent in the hierarchy. So this would generate the output /tank/my/example/path, /tank/my/example, /tank/my, and /tank. It's reasonably fine to assume the paths are well-formed, ie there are no directories with a / in the name itself.

I guess one option might be to have a function that outputs an array, and then unroll it?

Paul MaudDib fucked around with this message at 02:56 on Jan 12, 2022

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a photograph of george w. bush in 2003

Paul MaudDib posted:

I have a bunch of filepaths as rows, using postgres. For the row /tank/my/example/path I would like to generate a row for each parent in the hierarchy. So this would generate the output /tank/my/example/path, /tank/my/example, /tank/my, and /tank. It's reasonably fine to assume the paths are well-formed, ie there are no directories with a / in the name itself.

I guess one option might be to have a function that outputs an array, and then unroll it?

https://pastebin.com/GkYRD850 :getin:

used pastebin because cloudflare blocks the forum post due to suspected sql injection :cawg:
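For anyone who can't open the link, a minimal sketch of the "array and unroll it" idea in Postgres (assuming a table paths(path text); not necessarily what the pastebin contains):

code:
-- Split each path into segments, then rebuild every prefix by slicing
-- the array at lengths 1..n and joining it back together.
SELECT p.path,
       '/' || array_to_string(p.segs[1:n], '/') AS ancestor
FROM  (SELECT path, string_to_array(ltrim(path, '/'), '/') AS segs
       FROM paths) AS p,
      generate_series(1, array_length(p.segs, 1)) AS n;
-- /tank/my/example/path yields /tank, /tank/my, /tank/my/example,
-- and /tank/my/example/path itself.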

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

DELETE CASCADE posted:

https://pastebin.com/GkYRD850 :getin:

used pastebin because cloudflare blocks the forum post due to suspected sql injection :cawg:

Thanks!

Baculus
Oct 25, 2007

I DID A BIG CACA IN MY DRUG STORE DIAPER
in ten years of shitposting it never occurred to me that SA would have forums for the poo poo I do to pay for new forum AVs

anyone have a lot of in-depth experience with ESRI geodatabases?

MREBoy
Mar 14, 2005

MREs - They're whats for breakfast, lunch AND dinner !
If I want a particular column in a table to hold only a 6-digit number, is that something that can be enforced by the SQL server/database, or do I have to do this farther upstream? I have effectively 0 SQL experience and I'm trying to write a small thing in PHP.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

MREBoy posted:

If I want a particular column in a table to hold only a 6-digit number, is that something that can be enforced by the SQL server/database, or do I have to do this farther upstream? I have effectively 0 SQL experience and I'm trying to write a small thing in PHP.

You can do this using check constraints, which are supported by most databases. If you are using MySQL, then unless you're using a pretty recent version, it's not supported and you will have to do it in the application layer.
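What that looks like in practice, as a sketch (standard SQL; assuming "6 digit number" means the range 100000-999999, and with a made-up table):

code:
CREATE TABLE things (
    id   INTEGER PRIMARY KEY,
    code INTEGER NOT NULL CHECK (code BETWEEN 100000 AND 999999)
);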

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
My first inclination was DECIMAL(6,0), if you need "0 up to 999999". See the Data Type support section. A regular integer with a check constraint, as Hammerite said, if you need "100000 up to 999999".

And varchar(6) or char(6) if you meant "numbers like 7f7f7f". (But sRGB8 can be converted to integers, which take less space, unless you need to support arbitrary css style colors and such which will take an arbitrary string.)

And if you mean "only numbers like 4.123456 and 789.654321" then... I think the only approach in SQL would require the check constraint, storing as integers, and interpreting as millionths.

MREBoy
Mar 14, 2005

MREs - They're whats for breakfast, lunch AND dinner !

PhantomOfTheCopier posted:

My first inclination was DECIMAL(6,0), if you need "0 up to 999999". See the Data Type support section. A regular integer with a check constraint, as Hammerite said, if you need "100000 up to 999999".

And varchar(6) or char(6) if you meant "numbers like 7f7f7f". (But sRGB8 can be converted to integers, which take less space, unless you need to support arbitrary css style colors and such which will take an arbitrary string.)

And if you mean "only numbers like 4.123456 and 789.654321" then... I think the only approach in SQL would require the check constraint, storing as integers, and interpreting as millionths.

Your first suggestion looks to be fairly close to what I need (the number in question could be 000000 to 999999, but it always has to have 6 digits), I will check this out, thanks! :tipshat:
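One caveat worth flagging: a numeric column stores 000123 as plain 123, so if the leading zeros have to survive, either pad on output or store the six characters themselves. A sketch of both routes (Postgres regex syntax; table name invented):

code:
-- Route 1: store a number, pad when displaying:
--   SELECT lpad(code::text, 6, '0') FROM things;
-- Route 2: store the six characters and enforce the shape:
CREATE TABLE things (
    id   INTEGER PRIMARY KEY,
    code CHAR(6) NOT NULL CHECK (code ~ '^[0-9]{6}$')
);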

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
Crossposting from the Python thread.

I'm working on a Flask app to improve workflows for me and as coding/webdev practice, using SQLite3/SQLAlchemy as the backend.

I have a table color with the fields id, name, abbrev. This is intended to be a lookup table and something that I could pull off easily in Access.

I also have a table storm_panel_header with fields id, customer, job_no, header_color, sill_color, line_items where line_items is the one side of a one-to-many relationship with StormPanelLines.

Where I am running into trouble is defining the type of relationship between color and storm_panel_header. Both header_color and sill_color need to get their values from color.id, but color is also intended, like I said, as a lookup table.

That is to say, storm_panel_header will not be the one and only table referring to color by color.id.

I cannot figure out how to assign the foreign key and backref between the two and it's pissing me off.


EDIT: If I specify a ForeignKey relationship on storm_panel_header.header_color and storm_panel_header.sill_color but no backref in color, does that essentially make it a one-way lookup? I don't ever foresee a situation in which I am going to need to look up what items across several tables have which color.

DELETE CASCADE posted:

alter table storm_panel_header add foreign key (header_color) references color (id);
alter table storm_panel_header add foreign key (sill_color) references color (id);

wouldn't this work?

SQLite's ALTER TABLE is very limited (it can rename things and add columns, but not add constraints to an existing table), which has caused me several migration headaches. If I want to change a field definition I have to delete and recreate the table :shepicide: Either way, I haven't actually created the tables yet, I'm just creating the models for flask db migrate to work with.
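Since the tables don't exist yet, the foreign keys can go straight into the CREATE TABLE, which sidesteps the ALTER TABLE limitation entirely. A sketch of the raw SQL (column types assumed; line_items lives on the many side, as a foreign key in the line-items table pointing back at this one):

code:
CREATE TABLE storm_panel_header (
    id           INTEGER PRIMARY KEY,
    customer     TEXT,
    job_no       TEXT,
    header_color INTEGER NOT NULL REFERENCES color (id),
    sill_color   INTEGER NOT NULL REFERENCES color (id)
);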

D34THROW fucked around with this message at 21:55 on Feb 8, 2022

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a photograph of george w. bush in 2003
alter table storm_panel_header add foreign key (header_color) references color (id);
alter table storm_panel_header add foreign key (sill_color) references color (id);

wouldn't this work?

Just-In-Timeberlake
Aug 18, 2003
I have 2 queries that differ only in the value being searched on, and they perform wildly differently. The good query runs in under a second; the bad query can take up to 40 seconds, even though its total result set can be much smaller. Here is the only difference (corpType):

good query:
code:
WHERE    ([i].[Invoice_PaidOn] IS NULL OR [i].[Invoice_PaidOn] IS NOT NULL) AND [invoice_paidon] BETWEEN '02/01/2022' AND '02/10/2022' AND (corpType = 'fsa')
bad query:
code:
WHERE    ([i].[Invoice_PaidOn] IS NULL OR [i].[Invoice_PaidOn] IS NOT NULL) AND [invoice_paidon] BETWEEN '02/01/2022' AND '02/10/2022' AND (corpType = 'mds')
Looking at the estimated query plan for each, for some reason they are different. This is where they diverge (the second image should start at the second nested loop, imgur is being pissy and won't let me upload a cropped version):

good query: [execution plan screenshot]

bad query: [execution plan screenshot]
So, aside from why it's doing something different for what is 99.99% the same query, is there a way to force the first plan?

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

D34THROW posted:

Crossposting from the Python thread.

I was about to suggest you take it to the Python thread, yeah.

"Backrefs" aren't a thing in SQL, I think they're a SQLAlchemy concept. In SQL you can join in any direction and on any column or arbitrary predicate, whether or not there's a foreign key is irrelevant. Foreign keys are a data integrity rule, they're not a data access tool.

A foreign key from the colour table to $whatever would mean "you can't create a colour until a corresponding record exists in $whatever". Which is obviously wrong, you want to create a bunch of colours upfront, and only then use them in other tables.

What's more, FKs should not form circular dependencies, and some implementations like SQL Server actually hard enforce this, so "backrefs" like you describe are forbidden.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Just-In-Timeberlake posted:

I have 2 queries that are only different by what the value being search on is and are performing wildly different. The good query runs in under a second, the bad query can take up to 40 seconds, even though the total result set can be much smaller. Here is the only difference (corpType):

good query:
code:
WHERE    ([i].[Invoice_PaidOn] IS NULL OR [i].[Invoice_PaidOn] IS NOT NULL) AND [invoice_paidon] BETWEEN '02/01/2022' AND '02/10/2022' AND (corpType = 'fsa')
bad query:
code:
WHERE    ([i].[Invoice_PaidOn] IS NULL OR [i].[Invoice_PaidOn] IS NOT NULL) AND [invoice_paidon] BETWEEN '02/01/2022' AND '02/10/2022' AND (corpType = 'mds')
Looking at the estimated query plan for each, for some reason they are different. This is where they diverge (the second image should start at the second nested loop, imgur is being pissy and won't let me upload a cropped version):

good query: [execution plan screenshot]

bad query: [execution plan screenshot]
So, aside from why it's doing something different for what is 99.99% the same query, is there a way to force the first plan?

Are these queries generated or obfuscated?
I ask because this is really weird and unnecessary:

code:

([i].[Invoice_PaidOn] IS NULL OR [i].[Invoice_PaidOn] IS NOT NULL)
Other assorted thoughts I have while looking at your query and execution plans:
Parameterizing your query would allow SQL Server to cache a generic plan it could use for both cases.

What are the estimated and actual rows coming back from the Invoice table in both cases? What does the join predicate look like in the plans?

I'm suspicious of the index scan on the invoice table in the bad query.
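For reference, the parameterized shape being suggested looks roughly like this in T-SQL (a sketch reusing column names from the thread; the real query is of course much bigger):

code:
EXEC sp_executesql
    N'SELECT i.InvoiceID
      FROM dbo.Invoice i
      JOIN dbo.Customer c ON c.CustomerID = i.CustomerID
      WHERE i.Invoice_PaidOn BETWEEN @from AND @to
        AND c.corpType = @corp',
    N'@from date, @to date, @corp varchar(10)',
    @from = '2022-02-01', @to = '2022-02-10', @corp = 'fsa';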

Just-In-Timeberlake
Aug 18, 2003

Nth Doctor posted:

Are these queries generated or obfuscated?

It is a dynamically created query with some, but not all, parameters. Due to the number of fields the user could potentially search on, by = or LIKE, while being able to search for multiple statuses and values, along with sorting and paging, using a purely parameter based solution + sp_executesql isn't possible. There is an OPTION (RECOMPILE) at the end of it all because this query will never be the same twice so caching will do more harm than good, I feel.

Nth Doctor posted:

I ask because this is really weird and unnecessary:

code:
([i].[Invoice_PaidOn] IS NULL OR [i].[Invoice_PaidOn] IS NOT NULL)

It's odd, but if the user was searching for both paid and unpaid invoices this catches them all. It's part of the first WHERE requirement, which needs something there since all the other dynamically created filters are prefaced with AND/OR.

quote:

What does the join predicate look like in the plans?

And as I just went to re-run it to get the full query plan, it starts loving working the way it should. Goddamn hoodoo database engines, I tell you what.

Thanks for taking the time to respond though.

Just-In-Timeberlake
Aug 18, 2003

Nth Doctor posted:

Are these queries generated or obfuscated?

It is a dynamically created query with some, but not all, parameters. Due to the number of fields the user could potentially search on, by = or LIKE, while being able to search for multiple statuses and values, along with sorting and paging, using a purely parameter based solution + sp_executesql isn't possible. There is an OPTION (RECOMPILE) at the end of it all because this query will never be the same twice so caching will do more harm than good, I feel.

Nth Doctor posted:

I ask because this is really weird and unnecessary:

code:
([i].[Invoice_PaidOn] IS NULL OR [i].[Invoice_PaidOn] IS NOT NULL)

It's odd, but if the user was searching for both paid and unpaid invoices this catches them all. It's part of the first WHERE requirement, which needs something there since all the other dynamically created filters are prefaced with AND/OR. As a test I changed it to [i].[InvoiceID] > 0 (primary key, guaranteed to be > 0), made no difference.

quote:

What does the join predicate look like in the plans?

This is the join for both

code:
FROM	[dbo].[Invoice] [i]
		INNER JOIN [dbo].[Customer] [c] ON [c].[CustomerID] = [i].[CustomerID] 
		INNER JOIN [dbo].[CorporationTypes] [ct] ON [ct].[CorporationPrefix] = [c].[corpType] 
		INNER JOIN [dbo].[Fleet] [f] ON [i].[FleetId] = [f].[FleetId]
		INNER JOIN [dbo].[Site] [s] ON [i].[SiteID] = [s].[SiteID]
		LEFT OUTER JOIN [dbo].[Driver] [d] ON [d].[DriverID] = [i].[DriverID]
		LEFT OUTER JOIN [dbo].[Vehicle] [v] ON [v].[VehicleID] = [i].[VehicleID]
		LEFT OUTER JOIN [dbo].[License] [l] ON [i].[LicenseID] = [l].[LicenseID]
When I run this on our sandbox DB (only ~500k rows) it works regardless of the corpType. The issue is running it on our production DB (~9 million rows). Each DB is on its own AWS RDS instance, and the sandbox is on a much smaller (performance-wise) instance. The sandbox instance was created by scripting the production DB, so the indexes, keys, etc. are all the same.

Searching for 'jli' (~6 million rows) is slow, but so is searching for 'mds', and that's only ~120k rows, while searching for a different type with ~1 million rows is almost instantaneous.

Weirdly, if I sort on the corpType field and filter by it, it is blazing fast no matter what the corpType filter is.

Just-In-Timeberlake fucked around with this message at 12:46 on Feb 9, 2022

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
Sounds very much like a statistics issue. The query planner decides on where to start filtering by looking at statistics, trying to guess how selective any given index is likely to be and then based on that guess it tries to filter out as many rows as early as possible in the query. (More selective index = more distinct values = fewer rows returned for a filter with given value). In this case the only filter you actually have is corpType so naturally you'd want to start with that (which it looks like it does in the good query), but it seems like if you choose one particular value for corpType the query planner thinks it's insufficiently selective and not worth starting with. I'd be curious to see what the number of rows returned are at each node in your plan - I'd assume the key lookup on invoice returns basically the whole table? If MSSQL's query stats output includes row count estimates those are interesting too.

Basically what I think is happening is that with the bad query plan it starts looking at all invoices in the entire system while with the good query plan it starts by filtering out only customers with the given corpType and then only getting invoices for those customers (presumably you have a lot more invoices than customers). There's probably a sampling bias in the stats that makes it think only some values for corpType are selective. In order to make it do the right thing you have to make it realize that corpType is actually decently selective, possibly through trickery. If you sort on corpType you're just giving it a very good reason to use that index so it'll almost certainly start there, which is what you want, but it's not useful as a general solution.

See further MSDN on query planner stats and maybe also this old post of mine where I talk about a similar but not identical problem in postgres.
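If stale or skewed statistics are the culprit, the blunt instruments in MSSQL look like this (a sketch; the index name is invented):

code:
-- Rebuild statistics from a full scan instead of a sample
UPDATE STATISTICS dbo.Customer WITH FULLSCAN;
UPDATE STATISTICS dbo.Invoice WITH FULLSCAN;
-- Inspect the histogram the planner is working from
DBCC SHOW_STATISTICS ('dbo.Customer', IX_Customer_CorpType);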

TheFluff fucked around with this message at 07:05 on Feb 10, 2022

Just-In-Timeberlake
Aug 18, 2003

TheFluff posted:

Sounds very much like a statistics issue. The query planner decides on where to start filtering by looking at statistics, trying to guess how selective any given index is likely to be and then based on that guess it tries to filter out as many rows as early as possible in the query. (More selective index = more distinct values = fewer rows returned for a filter with given value). In this case the only filter you actually have is corpType so naturally you'd want to start with that (which it looks like it does in the good query), but it seems like if you choose one particular value for corpType the query planner thinks it's insufficiently selective and not worth starting with. I'd be curious to see what the number of rows returned are at each node in your plan - I'd assume the key lookup on invoice returns basically the whole table? If MSSQL's query stats output includes row count estimates those are interesting too.

Basically what I think is happening is that with the bad query plan it starts looking at all invoices in the entire system while with the good query plan it starts by filtering out only customers with the given corpType and then only getting invoices for those customers (presumably you have a lot more invoices than customers). There's probably a sampling bias in the stats that makes it think only some values for corpType are selective. In order to make it do the right thing you have to make it realize that corpType is actually decently selective, possibly through trickery. If you sort on corpType you're just giving it a very good reason to use that index so it'll almost certainly start there, which is what you want, but it's not useful as a general solution.

See further MSDN on query planner stats and maybe also this old post of mine where I talk about a similar but not identical problem in postgres.

thanks, you put me on the right track with tricking the query into using the right plan. I created a temp table that holds the customers that match the corp type, then join the invoice table on that, and now the query plans are identical.
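The shape of that workaround, reconstructed as a sketch from the description (not the actual query):

code:
SELECT c.CustomerID
INTO #corp_customers
FROM dbo.Customer c
WHERE c.corpType = 'mds';

SELECT i.InvoiceID
FROM #corp_customers cc
JOIN dbo.Invoice i ON i.CustomerID = cc.CustomerID
WHERE i.Invoice_PaidOn BETWEEN '2022-02-01' AND '2022-02-10';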

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
That's... not the solution I was imagining but if it works for you then it works :v:

abelwingnut
Dec 23, 2002


where can i find side sql gigs? nothing permanent or anything. just looking to handle issues for side money. they can be major or minor.

Just-In-Timeberlake
Aug 18, 2003
Is there an equivalent to the SSMS activity monitor in Azure Data Studio? I feel like it should be staring me in the face but I can't find it.

joebuddah
Jan 30, 2005
Which would be more efficient?

Option A: Pruning data each month, with a few days as a buffer before and after, into tables named allYourBaseAreMMYYYY, then a view unioning all the pruned tables. (I have to keep 7 years' worth of data searchable for regulation reasons.)

Option B: A standard table with indexing.

In other words: a query with 84 unions vs one large table.

Just-In-Timeberlake
Aug 18, 2003

joebuddah posted:

Which would be more efficient?
...
I have to keep 7 years' worth of data searchable for regulation reasons.
...
A query with 84 unions vs one large table

We have a table with ~20 years of data in it that's approaching 9 million rows, and it's plenty fast to query with the right indexes in place.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
It's almost like you're asking about https://en.m.wikipedia.org/wiki/Partition_(database)

Much depends on your database, since different systems have different performance profiles for index insertions, data clustering, query optimization, etc. Partitions provide the benefit of not needing to start with the full data set when the time/date/partitioning range is known for the query.
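In Postgres, for example, declarative range partitioning looks roughly like this (a sketch; names invented):

code:
CREATE TABLE all_your_base (
    logged_at DATE NOT NULL,
    payload   TEXT
) PARTITION BY RANGE (logged_at);

-- one partition per month; a query constrained on logged_at only
-- touches the partitions that can match
CREATE TABLE all_your_base_012022 PARTITION OF all_your_base
    FOR VALUES FROM ('2022-01-01') TO ('2022-02-01');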

wolrah
May 8, 2006
what?

Just-In-Timeberlake posted:

We have a table with ~20 years of data in it that's approaching 9 million rows, and it's plenty fast to query with the right indexes in place.
9 million rows isn't exactly massive in the database world, especially if we're talking about 20 years of data. I work for a small VoIP provider, we generate a couple hundred thousand rows in our CDR table every month, and we're a fart in the breeze to our upstreams. When I set up our first CDR processing system 15 years ago I didn't even know what an index was, and it still ran fine on a random Pentium 4 desktop I had turned into a server, until the tables got large enough to fall into swap.

joebuddah
Jan 30, 2005
The issue is I've got about 3 years' worth of pruned tables, as my predecessor had a thing for pruned tables. I've got to pull about 30% of the data in all those tables and put it in a table by itself.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


PhantomOfTheCopier posted:

It's almost like you're asking about https://en.m.wikipedia.org/wiki/Partition_(database)

Much depends on your database, since different systems have different performance profiles for index insertions, data clustering, query optimization, etc. Partitions provide the benefit of not needing to start with the full data set when the time/date/partitioning range is known for the query.

It's this, as much as possible. The short-term heartache will end; the stupidity of stamping down identical tables every month won't end otherwise.

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
Using SQLAlchemy, so it's an ORM, but I'm trying to grasp the underlying SQL concepts so that I can get the right queries and function calls here. If anyone can directly help with the SQLAlchemy side, that's great too.

I have a table, report, which has FKs for job_id, user_id, and report_type_id. These tie into the job_card, user, and report_type tables, respectively. What I want to do is use a query to generate a table that has the meaningful fields for each report.

As an example, I want "report with user_id=1 and job_id=1 and report_type_id=1" to return "report with author=D34THROW and job_id=JJ123456 and report_type=Composite Roof". Obviously these aren't 100% the field names, but I think it's easy enough to figure out that instead of the FK, I want a different column of the parent table based on the value of the FK in the child table. I think.

If this were Access I could knock it up no worries. I might actually go see what SQL is generated by Access for a similar query in the meantime, but I appreciate the help.

nielsm
Jun 1, 2009



You make a JOIN in your FROM clause, joining the related tables to the report table, on the condition of each table's key being equal to the matching foreign key of the report table.
That gives you rows containing the data from the report table, the job_card table, the user table, and the report_type table, which you can then select as appropriate.
The report.job_id etc. foreign key columns will still be present in the joined result, but if you don't include them in the list of fields to select, you won't see them in the result set.
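In raw SQL, that's something like the following (a sketch against the table names described upthread; the display columns are assumptions):

code:
SELECT r.id,
       u.username AS author,
       j.job_no,
       rt.name    AS report_type
FROM report r
JOIN job_card    j  ON j.id  = r.job_id
JOIN "user"      u  ON u.id  = r.user_id
JOIN report_type rt ON rt.id = r.report_type_id;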

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
I think the fact that I'm using an ORM makes this a little harder. The JOIN query wasn't what was giving me trouble after all - it was that I had to do $query_name.$table_name.$field_name in the Jinja template to get what I wanted. That's 4 hours of work solved. Thanks! :cheers:

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
Are there any open source/freeware schema modeling/migration tools available? I'd be looking at Postgres primarily.

pgModeler seems like an option, "SQL Power Architect" was another that came up.

RandomBlue
Dec 30, 2012

hay guys!


Biscuit Hider

Paul MaudDib posted:

Are there any open source/freeware schema modeling/migration tools available? I'd be looking at Postgres primarily.

pgModeler seems like an option, "SQL Power Architect" was another that came up.

DBeaver has some schema modelling built into it and it's free.

Are you looking for schema migration or data migration? For data migration I'd use an ETL tool like Pentaho Data Integration. I think Pentaho has some schema modelling tools as well.

pipebomb
May 12, 2001

Dear God, what is it like in your funny little brains?
It must be so boring.
gently caress, I am an idiot. It's all in the table 'wp_blogs'.


I need some help - very rusty after a couple of years of pandemic-related unemployment.

I've got a db with 23 or so wordpress sites in it (multisite). I'd like to return a list of each with the 'table##' and the 'siteurl'. This works when I do a general search for 'siteurl', but I'd like to get a list of sorts so I know what's what (going in blind to replace someone who didn't do documentation).

The search returns every wp_##_options row containing 'siteurl', but I'd need to go through them one by one, as seen below.

Ideally, I'd like to return the tableID associated with each site.

table or wp_## | siteurl

If I choose one of the results, the string shown is:
code:
SELECT * FROM `wp_dbname`.`wp_##_options` WHERE (CONVERT(`option_id` USING utf8) LIKE 'siteurl' OR CONVERT(`option_name` USING utf8) LIKE 'siteurl' OR CONVERT(`option_value` USING utf8) LIKE 'siteurl' OR CONVERT(`autoload` USING utf8) LIKE 'siteurl')
I tried a wildcard for the table but no luck. What am I missing? Help?


[attachment: screenshot of the search results]

pipebomb fucked around with this message at 19:14 on Mar 1, 2022
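As the edit at the top of that post says, multisite keeps the per-site list in wp_blogs, so something like this returns it directly (a sketch, MySQL syntax):

code:
-- blog_id matches the ## in the per-site wp_##_* table names
SELECT blog_id, CONCAT(domain, path) AS siteurl
FROM wp_dbname.wp_blogs
ORDER BY blog_id;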

Just-In-Timeberlake
Aug 18, 2003
boy I sure do love trying to unfuck a query based on a dozen views that are all like

view_1
view_2
view_3
...
view_final

and each view builds on the previous one until you get to the Voltron final view

which one is the problem? have fun unraveling each one!

imma murder the fucker who wrote this if I ever find him.

abelwingnut
Dec 23, 2002


holy god that's awful

had they never heard of a cte?


Just-In-Timeberlake
Aug 18, 2003

abelwingnut posted:

holy god that's awful

had they never heard of a cte?

I am 100% sure the answer is no

  • Reply