Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
Thirteenth Step
Mar 3, 2004

I want to do some maths with the 2 tables, same as before 'staff' and 'staff_shift'.

The users rate of pay is found under the 'rate' field in the 'staff' table.
The number of shifts they work will be found by the number of times a certain 'staffid' occurs in the 'staff_shift' table. and then the whole lot needs to be multipled by 4 becuase that's how long each shift is in hours (in this case).

does anyone know how I can do this? I wasn't sure to put this in here or the PHP thread.

Wild guesses;

SELECT rate, COUNT(staffid) FROM staff,staff_shift WHERE staffid = 11

or...

SELECT rate FROM staff
SELECT COUNT(staffid) FROM staff_shift WHERE staffid = 11

:ohdear:

Adbot
ADBOT LOVES YOU

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
Well those should both work. Actually I'm not certain of the first one because I'm not sure whether MySQL would protest about not knowing which "staffid" you were COUNTing. In context of the query the ambiguity is resolved, but I don't know. If it works at all it'll give you the correct answer. If I had written that query I'd have written
code:
SELECT
    `staff`.`rate`,
    COUNT(`staff_shift`.`staffid`) AS `numshifts`
FROM
    `staff`
    JOIN `staff_shift` ON `staff`.`staffid` = `staff_shift`.`staffid`
WHERE
    `staff`.`staffid` = 11
because that's the way I write them, but maybe you would find that verbose.

Or if you were happy to do the calculation in the query it would be
code:
SELECT
    `staff`.`rate` * COUNT(`staff_shift`.`staffid`) * 4 AS `paymentdue`
FROM
    `staff`
    JOIN `staff_shift` ON `staff`.`staffid` = `staff_shift`.`staffid`
WHERE
    `staff`.`staffid` = 11
Or of course you could modify it further to do the calculation for all staff members at the same time.

Nebulis01
Dec 30, 2003
Technical Support Ninny

Victor posted:

You need commas at the end of your MOVE lines.

Thanks :)

Thirteenth Step
Mar 3, 2004

Hammerite posted:

Or of course you could modify it further to do the calculation for all staff members at the same time.

How would I do this? If you're thinking what I'm thinking and thats a table of users with the results of the query you posted next to them all telling them how much they're supposed to be paid then that sounds perfect.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
code:
SELECT
    `staff`.`staffid`,
    `staff`.`rate`,
    `SubqueryA`.`numshifts`,
    `staff`.`rate` * `SubqueryA`.`numshifts` * 4 AS `payment`
FROM
    `staff`
    JOIN (
        SELECT
            `staffid`,
            COUNT(*) AS `numshifts`
        FROM
            `staff_shift`
        GROUP BY
            `staffid`
    ) AS `SubqueryA` ON `staff`.`staffid` = `SubqueryA`.`staffid`
You could also get it to return rows for all staff, including those who have no shifts:

code:
SELECT
    `staff`.`staffid`,
    `staff`.`rate`,
    CASE
        WHEN `SubqueryA`.`numshifts` IS NULL THEN 0
        ELSE `SubqueryA`.`numshifts`
    END AS `numshifts`,
    CASE
        WHEN `SubqueryA`.`numshifts` IS NULL THEN 0
        ELSE `staff`.`rate` * `SubqueryA`.`numshifts` * 4
    END AS `payment`
FROM
    `staff`
    LEFT JOIN (
        SELECT
            `staffid`,
            COUNT(*) AS `numshifts`
        FROM
            `staff_shift`
        GROUP BY
            `staffid`
    ) AS `SubqueryA` ON `staff`.`staffid` = `SubqueryA`.`staffid`

Victor
Jun 18, 2004
Man, do you guys have any text left on your ` key, or has it all rubbed off?

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
billions of blue blistering backticks

Victor
Jun 18, 2004
Isn't it because MySQL's syntax errors are insanely useless? (Therefore, you're in constant fear of using a word that might be a keyword.) Or that's the story I've heard. Maybe I'm just too used to the modern luxury of syntax coloring...

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
The only serious reason I know why you would have to use them is if you have schema object names that contain special characters. Of course it would be much more sensible to stick to alphanumerics and underscores for names.

I don't think I've ever written a query that requires them in order to work. I have a habit of putting them in, though. To some extent I find it helpful when reading a query now.

Victor
Jun 18, 2004
Makes me think of Hungarian notation, except worse.

rugbert
Mar 26, 2003
yea, fuck you

Hammerite posted:

So are you saying that you're only interested in the values in post_id and you want to get distinct values back? Change SELECT * to SELECT DISTINCT post_id.

From your second paragraph it sounds like you want to identify post_ids found in rows satisfying particular criteria, then get back all of the rows in which those post_ids appear irrespective of whether they caused the post_id to be included. One way you could do this is by issuing

code:
SELECT * FROM table_name WHERE post_id IN (
    SELECT DISTINCT post_id FROM TableName WHERE [where conditions]
)

That almost worked, it grouped everything by post_id but the search conditions werent met. it pulled some columns where bedrooms is 2 or 3.

so I need it do grab the post_id if say, bedrooms=1 AND price>=900. I guess its the OR statement thats causing it, but adding another AND gives me zero results for some reason. hmmmm

Sub Par
Jul 18, 2001


Dinosaur Gum

rugbert posted:

That almost worked, it grouped everything by post_id but the search conditions werent met. it pulled some columns where bedrooms is 2 or 3.

so I need it do grab the post_id if say, bedrooms=1 AND price>=900. I guess its the OR statement thats causing it, but adding another AND gives me zero results for some reason. hmmmm

If you're using a modified version of these where conditions you posted on the last page:
code:
(meta_key = 'Price' AND meta_key BETWEEN 1 AND 900) AND (meta_key = 'Bedrooms' AND meta_key = '1')
note that you should be saying meta_key = 'Bedrooms' and meta_value = '1'. You're repeating meta_key instead. You're doing the same thing for price.

Thirteenth Step
Mar 3, 2004

Going back to my scenario mentioned above (sorry) with the staff table, the shift table and the staff_shift junction table.

How would I go about counting the number of staff (per department) currently occupied to shifts?

So I could output "there are 'x' 'department' staff assigned to this shift"?

Had a Google for it but im not really sure what im looking for.

Fruit Smoothies
Mar 28, 2004

The bat with a ZING
This is probably a really simple question:
I have two tables, an inventory and a shop. Assuming the shop and inventory are as simple as these tables (respectively)

code:
---------------------------------
| ItemID	| Cost		|
---------------------------------
| 1		| 100		|
---------------------------------
| 2		| 200		|
---------------------------------

-------------------------------------------------
| UserID	| ItemID	| Number	|
-------------------------------------------------
| 1		| 1		| 20		|
-------------------------------------------------
In order to get both the cost, and the number in one query, you could do a JOIN:

"SELECT * from `shop` JOIN `inventory` on `shop`.`ItemID` = `inventory`.`ItemID` WHERE `UserID` = 1";

This ignores the second item in the `shop` table because the user doesn't have any.
Basically, what I want, is the COMPLETE shop table, AND the user's respective item - should that item have a row. I can't help but think a sub-query is needed here, but I've no idea what to look up.

Thanks.

Sub Par
Jul 18, 2001


Dinosaur Gum
Edit: Nevermind, I think I misunderstood.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Fruit Smoothies posted:

This is probably a really simple question:
I have two tables, an inventory and a shop. Assuming the shop and inventory are as simple as these tables (respectively)

code:
---------------------------------
| ItemID	| Cost		|
---------------------------------
| 1		| 100		|
---------------------------------
| 2		| 200		|
---------------------------------

-------------------------------------------------
| UserID	| ItemID	| Number	|
-------------------------------------------------
| 1		| 1		| 20		|
-------------------------------------------------
In order to get both the cost, and the number in one query, you could do a JOIN:

"SELECT * from `shop` JOIN `inventory` on `shop`.`ItemID` = `inventory`.`ItemID` WHERE `UserID` = 1";

This ignores the second item in the `shop` table because the user doesn't have any.
Basically, what I want, is the COMPLETE shop table, AND the user's respective item - should that item have a row. I can't help but think a sub-query is needed here, but I've no idea what to look up.

Thanks.

Wouldn't this be an outer join? Inner join only 'works' if the id being joined on is present in both tables, for rows where that's not the case those rows are simply not included. I'm coming at this from an MSSQL perspective mind you.

rugbert
Mar 26, 2003
yea, fuck you

Sub Par posted:

note that you should be saying meta_key = 'Bedrooms' and meta_value = '1'. You're repeating meta_key instead. You're doing the same thing for price.

ugh, I dont know why all of my posts have had tons of typos these past few days.

but yea, heres what I have SELECT * FROM wp_postmeta WHERE post_id IN (SELECT DISTINCT post_id FROM wp_postmeta WHERE (meta_key = 'Price' AND meta_value BETWEEN 1 AND 900) OR (meta_key = 'Bedrooms' AND meta_value = '1'));


I would think that changing that Or and an AND would give me all posts where bedrooms=1 and price=900 but I get an empty set back.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Have you checked that your subquery actually returns a list of post IDs?

Also, formatting your query better would help with the readability...

Fruit Smoothies
Mar 28, 2004

The bat with a ZING

Scaramouche posted:

Wouldn't this be an outer join? Inner join only 'works' if the id being joined on is present in both tables, for rows where that's not the case those rows are simply not included. I'm coming at this from an MSSQL perspective mind you.

The issue is when I include "WHERE `UserID` = 1" at the end. In the case of a LEFT INNER and OUTER JOIN, the shop rows for which the user doesn't have a record are not included.
The WHERE needs to applied before the join, I think, not during...

rugbert
Mar 26, 2003
yea, fuck you

Triple Tech posted:

Have you checked that your subquery actually returns a list of post IDs?

Also, formatting your query better would help with the readability...

It does return the post_ids, BUT it doesnt follow the search rules. Its getting stuff that have two bedrooms as well.

here is my clean code!
code:
SELECT *
FROM   wp_postmeta
WHERE  post_id IN (SELECT DISTINCT post_id
                   FROM   wp_postmeta
                   WHERE  ( meta_key = 'Price'
                            AND meta_value BETWEEN 1 AND 900 )
                           OR ( meta_key = 'Bedrooms'
                                AND meta_value = '1' )); 
wow, first post magic. I like that tool.

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

rugbert posted:

It does return the post_ids, BUT it doesnt follow the search rules. Its getting stuff that have two bedrooms as well.

here is my clean code!
code:
SELECT *
FROM   wp_postmeta
WHERE  post_id IN (SELECT DISTINCT post_id
                   FROM   wp_postmeta
                   WHERE  ( meta_key = 'Price'
                            AND meta_value BETWEEN 1 AND 900 )
                           OR ( meta_key = 'Bedrooms'
                                AND meta_value = '1' )); 
wow, first post magic. I like that tool.
This is why EAV can be the devil. Try this:
code:
SELECT *
FROM   wp_postmeta
WHERE  post_id IN (SELECT post_id
                   FROM   wp_postmeta
                   WHERE  ( meta_key = 'Price'
                            AND meta_value BETWEEN 1 AND 900 ))
       and post_id IN (SELECT post_id
                       FROM   wp_postmeta
                       WHERE  ( meta_key = 'Bedrooms'
                                AND meta_value = '1' )); 

rugbert
Mar 26, 2003
yea, fuck you

Jethro posted:

This is why EAV can be the devil. Try this:

cooool yea I got it. All my search params work now! But only when Im in terminal throwing SQL querys manually. I tried to query from PHP and for some reason it doesnt like this part:
code:
	       AND post_id IN (SELECT post_id
                       FROM   wp_postmeta
                       WHERE  ( meta_key = 'Utilities'
                                AND meta_value LIKE '%Water%' ))
This will kill my search too where $searchAvail = '%-08-%'
code:
       AND post_id IN (SELECT post_id
                       FROM   wp_postmeta
                       WHERE  ( meta_key = 'Availability'
                                AND meta_value LIKE '%-$searchAvail-%' ))

But again, only when I query from the webpage. That exact code is fine form terminal. Is it the % signs? I just tried to escape out but no.

rugbert fucked around with this message at 21:08 on Apr 25, 2010

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Fruit Smoothies posted:

The issue is when I include "WHERE `UserID` = 1" at the end. In the case of a LEFT INNER and OUTER JOIN, the shop rows for which the user doesn't have a record are not included.
The WHERE needs to applied before the join, I think, not during...

When you do a LEFT JOIN, at least one row is produced for every row in the left-hand table - if there aren't any rows in the right-hand table that satisfy the join condition, then a single row is produced in which all of the columns from the right-hand table are NULL. When your query looks for rows where UserID = 1, those rows (of the joined table) that correspond to items not appearing for that user in the right-hand table aren't included because the UserID value is NULL and therefore it is not true that UserID = 1.

The solution here would be to make the check of the user ID part of the join condition. So change

"SELECT * from `shop` LEFT JOIN `inventory` ON `shop`.`ItemID` = `inventory`.`ItemID` WHERE `UserID` = 1";

to

"SELECT * from `shop` LEFT JOIN `inventory` ON `shop`.`ItemID` = `inventory`.`ItemID` AND `inventory`.`UserID` = 1";

tawxic
Jun 17, 2005

Growth
I have a table,
`purchases`:
-> 'id'
-> 'siteId'
-> 'userId'
-> 'itemId'

There is a situation where the data could be resubmitted, and I don't want a transaction to show up twice.

Can I make the pairing of 'siteId' 'userId' and 'itemId' unique? So they can have different values, but they can't have those same different values more than once?

Example:

INSERT INTO `purchases` (siteId, userId, itemId) VALUES(1,1,1);
// One row affected

INSERT INTO `purchases` (siteId, userId, itemId) VALUES(1,1,1);
// Zero Rows Affected

INSERT INTO `purchases` (siteId, userId, itemId) VALUES(2,1,4);
// One row affected

INSERT INTO `purchases` (siteId, userId, itemId) VALUES(2,1,4);
// Zero Rows Affected

tawxic fucked around with this message at 06:32 on Apr 26, 2010

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

tawxic posted:

I have a table,
`purchases`:
-> 'id'
-> 'siteId'
-> 'userId'
-> 'itemId'

There is a situation where the data could be resubmitted, and I don't want a transaction to show up twice.

Can I make the pairing of 'siteId' 'userId' and 'itemId' unique? So they can have different values, but they can't have those same different values more than once?

You certainly can enforce a uniqueness constraint at the database level.

I don't know what RDBMS you are using, but in MySQL you can add this uniqueness constraint using

code:
ALTER TABLE `purchases` ADD UNIQUE KEY `purchases_uk` (`siteId`, `userId`, `itemId`)
(probably similar for other DBMSs)

One thing to note: If you attempt to use the INSERT queries you posted to add rows to the table that violate the uniqueness contraint, MySQL won't silently add no rows at all (which I think is what you want), rather it will give an error message. To get the behaviour you want, use INSERT IGNORE.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
Also... I don't know the specifics of your database design, but is the "id" column in table "purchases" used for something? If it's not, then perhaps it would be better to ditch it and make (siteId, userId, itemId) the primary key.

Nebulis01
Dec 30, 2003
Technical Support Ninny
In case anyone needs it, here is a quick batch script to copy the latest .bak file and restore the database. Thanks for the help with the SQL portion of the program.

code:
@echo off

:Variables
SET DatabaseBackupPath=\\backupservernamegoeshere

echo.
echo Copy Latest .bak to D:\
FOR /F "delims=|" %%I IN ('DIR "%DatabaseBackupPath%\*.bak" /B /O:D') DO SET NewestFile=%%I
copy "%DatabaseBackupPath%\%NewestFile%" "D:\"

echo.
echo Restore Database Databasename
sqlcmd -d master -Q ^
"RESTORE DATABASE [Databasename] ^
From DISK = N'D:\%NewestFile%' ^
WITH File = 1, ^
MOVE N'Example' TO N'D:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\Example.mdf', ^
MOVE N'Example_log' TO N'D:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\Example_log.ldf', ^
NOUNLOAD, REPLACE, STATS = 10"

echo.
echo Delete Newest .bak from D:\
del D:\%Newestfile%

tawxic
Jun 17, 2005

Growth

Hammerite posted:

Also... I don't know the specifics of your database design, but is the "id" column in table "purchases" used for something? If it's not, then perhaps it would be better to ditch it and make (siteId, userId, itemId) the primary key.

I had no idea a primary key could be comprised of multiple columns!

Thanks for your help, Hammerite.

Lamb-Blaster 4000
Sep 20, 2007

is it possible to get a comma delimited list of values in a subquery that returns multiple results?

I thought maybe concat_ws() would be nice enough to accept a data set, but no such luck

code:
select 
  s.*, 
  concat_ws(',',
           (select 
             name 
            from evites_categories
            where id in 
                     (select 
                        categoryID 
                      from evites_users_to_categories 
                      where userID=s.id
                     )
             )) as categories 
from evites_subscribers s 
the ideal result being something like
code:
+---------+--------------------+
|name     | categories         |
+---------|--------------------|
|John Doe | Subscriber, Tester |
|Jane Doe | Subscriber, Client |
+---------|--------------------|

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?

tawxic posted:

I had no idea a primary key could be comprised of multiple columns!

Hmm... Just because something can be done doesn't mean it should be done. Generally, you don't want your primary keys to be made up of multiple columns...

More specifically, your primary keys should almost always be single column, synthetic integers...

RoadCrewWorker
Nov 19, 2007

camels aren't so great
I've got a very basic 3 tables setup:
code:
A: [A_id,A_data],~500 000 rows PK: A_id, Index on A_data, 
L: [A_id,B_id],~20 000 000 rows PK: (A_id,B_id), Index on B_id 
B: [B_id,B_data],~ 200 000 rows PK: B_id, UNIQUE on B_data. 
_ids are integers, _data are VARCHAR(20)
Out of curiosity i've imported this little test on both SQLite and MySQL. For a simple select count(A_id) from L natural join B where B_data='search'; both clock in equally at around 200ms.
This is the tricky part: I want to find all A_data (or just the count for benchmarking reasons) where L links A_id to a low (but arbitrary) number of 'B_data's specified by wildcard pattern matching, including the evil '%sea%rch%'.

Attempt 1:
code:
select count(A_data) from A 
natural join (select A_id from L natural join B where B_data like 'search1') ...
natural join (select A_id from L natural join B where B_data like 'searchN');
N=3: sqlite3.exe ~ 45 seconds, mysql.exe ~ 8 minutes.

Attempt 2:
code:
select count(A_data) from A where A_id in 
(select A_id from L natural join B where B_data like 'search1') and A_id in ...
(select A_id from L natural join B where B_data like 'searchN') and A_id in (A_id);
N=3: sqlite3.exe returns in <20 seconds, mysql.exe still runs for ~7 minutes.

:smith: Obviously i'm missing some vital part of the MySQL mindset, be it either an index or query pattern. Anyone got some obvious pointers?

Vanadium Dame
May 29, 2002

HELLO I WOULD LIKE TO TALK TO YOU ABOUT MY VERY STRONG OPINIONS

Triple Tech posted:

Hmm... Just because something can be done doesn't mean it should be done. Generally, you don't want your primary keys to be made up of multiple columns...

More specifically, your primary keys should almost always be single column, synthetic integers...

oh my god I am agreeing with triple tech what is going on

But yeah he's right, PK's should always be synthetic in my opinion, and if you need to enforce uniqueness across a set of columns just throw a unique index on them.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

RoadCrewWorker posted:

:smith: Obviously i'm missing some vital part of the MySQL mindset, be it either an index or query pattern. Anyone got some obvious pointers?

Have you looked at the EXPLAIN for the query?

Also, what storage engine are you using? InnoDB and MyISAM might behave slightly differently for your query because InnoDB will actually implement the index on B_id (in table L) as an index on (B_id, A_id), whereas MyISAM won't.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Markoff Chaney posted:

But yeah he's right, PK's should always be synthetic in my opinion, and if you need to enforce uniqueness across a set of columns just throw a unique index on them.

Can you elaborate on why you reckon PKs should always be synthetic integers? I don't see how this makes sense in a lot of contexts where you want to implement a linking table. For example, say I have tables Student, Course and Course_Registration and Course_Registration is just (Student, Course) (before we address primary keys), why would you need to add a synthetic column and put an additional index on the table rather than just make (Student, Course) the primary key?

RoadCrewWorker
Nov 19, 2007

camels aren't so great

Hammerite posted:

Have you looked at the EXPLAIN for the query?

Also, what storage engine are you using? InnoDB and MyISAM might behave slightly differently for your query because InnoDB will actually implement the index on B_id (in table L) as an index on (B_id, A_id), whereas MyISAM won't.
I've gone through EXPLAIN for pretty much every query and subquery, and everything but the final A_data select (which is the dreaded ALL :( ) uses appropriate indices - originally i had L with PK(A_id,B_id),IX(A_id),IX(B_id) but apparently IX(A_id) is already contained in the PK since it's a left prefix.

I guess i'll try adding more indices to A or switching to InnoDB.

I also though about compiling a temporary T[A_id,B_datas] table where B_datas is simply group_concat(B_data) from L natural join B group by A_id, then FULLTEXT B_datas and use MATCH AGAINST, but that seems like an awful fallback solution.

Vanadium Dame
May 29, 2002

HELLO I WOULD LIKE TO TALK TO YOU ABOUT MY VERY STRONG OPINIONS

Hammerite posted:

Can you elaborate on why you reckon PKs should always be synthetic integers? I don't see how this makes sense in a lot of contexts where you want to implement a linking table. For example, say I have tables Student, Course and Course_Registration and Course_Registration is just (Student, Course) (before we address primary keys), why would you need to add a synthetic column and put an additional index on the table rather than just make (Student, Course) the primary key?

What if the student has to take the same course twice?

This is something of a holy war among the RDBMS crowd on the internet so if you google the subject you'll find plenty of arguments on either side, but:

-an incremented integer for a PK will always be unique, any sort of composite key can have the rules change and welp now your data model doesn't work (a good example of this is SSNs, you might *think* they're unique but nope they get recycled and assigned to new people)

-a synthetic key lets you join to other tables on one column instead of more than one, a good example would be like a code table: with a synthetic pk you just say (oracle JOIN syntax)
code:
select <stuff_from_child_table>
     , <stuff_from_code_table>
  from <child_table>
     , <code_table>
 where <child_table>.<code_table_synthetic_key> = <code_table>.<synthetic_key>
instead of having to join across like 3 different columns. This makes FKing easier too.

-if you are hurting for disk so bad that adding one more index kills you, You Are Doing It Wrong (small caveat here, more indexes = more I/O on writes)

-it just makes query writing easier. you can get one specific row back by specifying the synthetic key in your WHERE instead of having to hit up multiple columns


I'm sure that I am missing some huge arguments for synthetic PKs here, but those are the reasons that popped into my brain in response to your question

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Why synthetic
- thinner joins, faster joins
- simpler joins
- automatically assigned
- never changes

The never changes thing is really important. Let's say you know today that my username is unique. It is, forreal. And then you litter all your XREF tables with my username hardcoded there. Then I ask for a username change. Woops, didn't see that coming, gotta modify all the tables now. If it was synthetic, it would never interrupt your domain data ever.

You never really appreciate these database rules of thumb until the otherwise bites you in the rear end. If you don't believe us, go ahead and make a three column PK and continue to use that three column PK in your XREF tables and joins.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
Well, okay. I guess I would need to work with more complicated schema designs than I have done before the kind of things you guys are talking about would have made a difference to me.

Having said that, re the faster joins thing: Is it in fact faster to join on a single 8-byte integer column than two 3-byte integer columns and one 2-byte column (say) in a particular order? I'm asking for information, I really don't know. Does it differ from implementation to implementation?

Obviously your SQL will be more verbose if you have to write JOIN ON a.x = A.x AND a.y = a.y AND a.z = a.z everywhere, but SQL is wordy anyway, so that doesn't really matter.

Vanadium Dame
May 29, 2002

HELLO I WOULD LIKE TO TALK TO YOU ABOUT MY VERY STRONG OPINIONS

Hammerite posted:

Well, okay. I guess I would need to work with more complicated schema designs than I have done before the kind of things you guys are talking about would have made a difference to me.

Having said that, re the faster joins thing: Is it in fact faster to join on a single 8-byte integer column than two 3-byte integer columns and one 2-byte column (say) in a particular order? I'm asking for information, I really don't know. Does it differ from implementation to implementation?

Obviously your SQL will be more verbose if you have to write JOIN ON a.x = A.x AND a.y = a.y AND a.z = a.z everywhere, but SQL is wordy anyway, so that doesn't really matter.

Always assume your schema will get more complicated dude, I've worked on my current db instance for 5 of the 11 years of its existence and it has grown substantially more complicated in that time (well over 1000 tables now :emo:).

I'd have to test the join speed thing on my particular oracle implementation to give you a firm answer there but my instinct is yeah, joining on one 8-byte column should be faster.

Verbose code is more of a pain to read and more likely to gently caress up a copy/paste on, synthetic keys keep things simple.

Really what it boils down to is that the only downside to having a synthetic pk and a unique index across the columns that you would have used as a natural key is 1) disk usage (negligible until your table get huge) and 2) I/O (also negligible until your table get huge). And by "huge" I mean the table is like 50 million rows and hundreds of columns.

The other thing it boils down to is if you have to add a synthetic PK *later* instead of just adding it *now* it's going to be a horrible long running script to do so instead of a quick little bit of DDL out the gate.

Adbot
ADBOT LOVES YOU

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?

Hammerite posted:

Having said that, re the faster joins thing: Is it in fact faster to join on a single 8-byte integer column than two 3-byte integer columns and one 2-byte column (say) in a particular order? I'm asking for information, I really don't know. Does it differ from implementation to implementation?

It probably does matter to some extent but to me it pales in comparison to other issues. And I don't know the answer. So to me it's like micro optimization. You (not you you) are trying to squeeze a couple micro seconds out of something when a much larger design related problem is looming over head.

For really small dinky databases, like when you're just starting out or it's a pet project, I say go for it. gently caress synthetic keys and do what you love. But as you start to get all big iron and enterprisey, you'll start to see the upsides and downsides to certain designs. As with nearly any system, doing it the most robust way has a very high setup cost. In this case, inventing synthetic keys for every first class object and maintaining XREF tables. On the other hand, you can just pump out a project really fast if you use your natural key as your PK. And, the tables will be human readable right off the bat.

The answer to all database related questions is IT DEPENDS.

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply