Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
Goat Bastard
Oct 20, 2004

Right, sorry the horrible column names made me not read the actual content of the querys too closely :).

You want to use the row_number() analytic function. It's like ROWNUM only you can specify how to number the rows.

code:
SELECT transactions.isbn13
      ,transactions.used
      ,transactions.new 
      ,prices.price
      ,prices.flag
FROM  (SELECT td_code       AS tide
             ,td_identifier AS isbn13
             ,used_on_hand  AS used
             ,new_on_hand   AS new
       FROM   f_title_description 
       JOIN   v_ad_on_hand 
              ON (td_code = itst_item_tide_code)
      ) items
JOIN   gb_transactions transactions
       ON (items.tide = transactions.gb_tide)
JOIN  (SELECT flag
             ,price
             ,tide 
       FROM  (SELECT pctx_itst_item_ittp_new_used_f AS flag
                    ,pctx_new_price                 AS price
                    ,pctx_itst_item_tide_code       AS tide
                    ,row_number() OVER (PARTITION BY pctx_itst_item_tide_code ORDER BY pctx_date_created DESC ) AS num
              FROM   f_price_change_txn 
             )
       WHERE  num <= 2
      ) prices ON (items.tide = prices.tide)
Edit: after some more thought that probably still isn't what you want as it will select two rows for every record, with one type of price on each row. I think that what you're probably looking for is
code:
SELECT transactions.isbn13
      ,transactions.used
      ,transactions.new 
      ,prices.used_price
      ,prices.new_price
FROM  (SELECT td_code       AS tide
             ,td_identifier AS isbn13
             ,used_on_hand  AS used
             ,new_on_hand   AS new
       FROM   f_title_description 
       JOIN   v_ad_on_hand 
              ON (td_code = itst_item_tide_code)
      ) items
JOIN   gb_transactions transactions
       ON (items.tide = transactions.gb_tide)
JOIN  (SELECT tide
             ,max(new_price)  AS new_price
             ,max(used_price) AS used_price
       FROM (SELECT CASE WHEN pctx_itst_item_ittp_new_used_f  = 'N' THEN pctx_new_price END AS new_price
                   ,CASE WHEN pctx_itst_item_ittp_new_used_f  = 'U' THEN pctx_new_price END AS used_price
                   ,pctx_itst_item_tide_code AS tide
                   ,row_number() OVER (PARTITION BY pctx_itst_item_tide_code
                                                   ,pctx_itst_item_ittp_new_used_f 
                                       ORDER BY     pctx_date_created DESC 
                                      ) AS num
             FROM   f_price_change_txn 
            )
       WHERE  num = 1
       GROUP BY tide            
      ) prices ON (items.tide = prices.tide)
I should point out that the only part of this I've tried to run is the prices sub-query, so the rest may be riddled with errors. Hopefully not though, as the only changes I made were formatting and adding aliases.

I still wasn't convinced about that rownum <= 2 thing, as it would have blown up if the two most recent rows for a TIDE had the same flag (maybe that's not possible, but better safe than sorry), so I've changed that a bit. It now selects the most recent of each type of price for each TIDE and puts them together in a row, which seems like what you were after.

Goat Bastard fucked around with this message at 20:24 on Mar 6, 2011

Adbot
ADBOT LOVES YOU

bone app the teeth
May 14, 2008

Goat Bastard posted:

Right, sorry the horrible column names made me not read the actual content of the querys too closely :).

You want to use the row_number() analytic function. It's like ROWNUM only you can specify how to number the rows.

code:
SELECT transactions.isbn13
      ,transactions.used
      ,transactions.new 
      ,prices.price
      ,prices.flag
FROM  (SELECT td_code       AS tide
             ,td_identifier AS isbn13
             ,used_on_hand  AS used
             ,new_on_hand   AS new
       FROM   f_title_description 
       JOIN   v_ad_on_hand 
              ON (td_code = itst_item_tide_code)
      ) items
JOIN   gb_transactions transactions
       ON (items.tide = transactions.gb_tide)
JOIN  (SELECT flag
             ,price
             ,tide 
       FROM  (SELECT pctx_itst_item_ittp_new_used_f AS flag
                    ,pctx_new_price                 AS price
                    ,pctx_itst_item_tide_code       AS tide
                    ,row_number() OVER (PARTITION BY pctx_itst_item_tide_code ORDER BY pctx_date_created DESC ) AS num
              FROM   f_price_change_txn 
             )
       WHERE  num <= 2
      ) prices ON (items.tide = prices.tide)
Edit: after some more thought that probably still isn't what you want as it will select two rows for every record, with one type of price on each row. I think that what you're probably looking for is
code:
SELECT transactions.isbn13
      ,transactions.used
      ,transactions.new 
      ,prices.used_price
      ,prices.new_price
FROM  (SELECT td_code       AS tide
             ,td_identifier AS isbn13
             ,used_on_hand  AS used
             ,new_on_hand   AS new
       FROM   f_title_description 
       JOIN   v_ad_on_hand 
              ON (td_code = itst_item_tide_code)
      ) items
JOIN   gb_transactions transactions
       ON (items.tide = transactions.gb_tide)
JOIN  (SELECT tide
             ,max(new_price)  AS new_price
             ,max(used_price) AS used_price
       FROM (SELECT CASE WHEN pctx_itst_item_ittp_new_used_f  = 'N' THEN pctx_new_price END AS new_price
                   ,CASE WHEN pctx_itst_item_ittp_new_used_f  = 'U' THEN pctx_new_price END AS used_price
                   ,pctx_itst_item_tide_code AS tide
                   ,row_number() OVER (PARTITION BY pctx_itst_item_tide_code
                                                   ,pctx_itst_item_ittp_new_used_f 
                                       ORDER BY     pctx_date_created DESC 
                                      ) AS num
             FROM   f_price_change_txn 
            )
       WHERE  num = 1
       GROUP BY tide            
      ) prices ON (items.tide = prices.tide)
I should point out that the only part of this I've tried to run is the prices sub-query, so the rest may be riddled with errors. Hopefully not though, as the only changes I made were formatting and adding aliases.

I still wasn't convinced about that rownum <= 2 thing, as it would have blown up if the two most recent rows for a TIDE had the same flag (maybe that's not possible, but better safe than sorry), so I've changed that a bit. It now selects the most recent of each type of price for each TIDE and puts them together in a row, which seems like what you were after.

I want to kiss you on the mouth.

But seriously, thanks. This worked with a few tweaks to the column names. I've been wrestling with this for over a week now. The whole DB schema is retarded but in my defense: it's not mine, I just deal with it. :(

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
My life-long struggle with MAX() continues.

Let's say I have the following data:

pre:
log_id | task_id | done_time           | done_by
--------------------------------------------------
1      | 1       | 2011-03-01 00:00:00 | 16
2      | 1       | 2011-03-06 11:08:51 | 15
3      | 1       | 2011-03-04 14:20:52 | 12
4      | 2       | 2011-02-27 02:22:22 | 20
5      | 2       | 2011-03-01 01:11:11 | 22
and I want to find out the most recent entries for each task. I could run SELECT task_id, done_by, MAX(done_time) FROM table GROUP BY task_id but as we all know that won't work. It will give me the max done time, but the first done_by in the table. In this case, it gives me March 6 for task 1 and done by user 16, and March 1 for task 2, done by user 20.

I hate MAX().

So I'm drawing a bit of a mental block here; how can I get what I desire? The only thing I can think of is joining in a select query where I select just the max date, grouped by ID, and join that on the log_id and say where done_time = inner_query.done_time. Is there a saner solution?

Edit: Left out 'done_by' in my 'could run' query.

Golbez fucked around with this message at 18:23 on Mar 7, 2011

butt dickus
Jul 7, 2007

top ten juiced up coaches
and the top ten juiced up players
You can use ORDER BY with a GROUP BY to do that. If you need the records in a different order, you can add to the ORDER BY part.

SELECT task_id, done_time FROM table GROUP BY task_id ORDER BY done_time DESC

edit:
VVVVVVVVVVVV
This won't work?

SELECT task_id, done_time, done_by FROM table GROUP BY task_id ORDER BY done_time DESC

butt dickus fucked around with this message at 18:24 on Mar 7, 2011

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Doctor rear end in a top hat posted:

You can use ORDER BY with a GROUP BY to do that. If you need the records in a different order, you can add to the ORDER BY part.

SELECT task_id, done_time FROM table GROUP BY task_id ORDER BY done_time DESC

Doh, I forgot to include in my query example below that I wanted the user as well. That still gives me the first user it finds in the table, which in this case is incorrect.

I guess I know I can do it with a joined select, I just wish I didn't have to. It would seem to make sense to have some way to just grab the rows that fit the criteria of having the highest date grouped by an ID, but HAVING MAX() doesn't work that way.

Edit: I accomplished it easily enough, with...

pre:
SELECT task_id, done_time, done_by
  FROM table LEFT JOIN (SELECT MAX(done_time) AS max_time, task_id 
                          FROM table GROUP BY task_id) AS inner_query USING (task_id)
 WHERE done_time = max_time
... but that doesn't mean I have to like it. As I said, it just seems like there should be a way without running a subquery.

Golbez fucked around with this message at 18:32 on Mar 7, 2011

Aredna
Mar 17, 2007
Nap Ghost
You have to use a subquery, but there is a better way than what you posted if your flavor of SQL supports analytical functions:

code:
select task_id, done_time, done_by
from (  select task_id, done_time, done_by,
               rank() over (partition by task_id order by done_time desc) as rank
        from table
      ) sq
where rank = 1
(insert bitching about analytical functions being invalid in the having clause)

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
This seems more of a database question than a PHP question, but... while examining my app, I noticed I'd made a few tables of static data. Like, a "statuses" table where it has all the statuses and what each one means - these statuses mean they're no longer working, these mean they get mailed monthly statements, these mean they're only charged half their premiums each month, etc.

And I was adding a new rule, and I realized a couple of things. One, this doesn't need to be editable, ever. The list of statuses and their abilities - apart from new ones we add - will not change.

I also realized that, why have a middle man? I can maintain this data (a dozen lines) in a PHP array, and keep all of the different settings in there. This would seem to have several benefits: No extra DB lookups to find out what status "8" means, and the data can't be written to, so why not hardcode it.

But I was suddenly wondering if this was a good idea when converting a much simpler table to this method. A category table, it consists of six rows with three columns: A category ID, category name, and one extra boolean used by two categories.

If I switch all of the entries to use this, they will go from using one byte each to 15, and six category names (instead of six category IDs) will be thrown into a table with 17000 rows.

This category list will never change, or if it does it will be rare enough that I'd rather have IT do it than anyone else. However, I wanted to inquire what y'all thought of the limitations of this form of ... I guess denormalization?

Bad Titty Puker
Nov 3, 2007
Soiled Meat

Golbez posted:

This seems more of a database question than a PHP question, but... while examining my app, I noticed I'd made a few tables of static data. Like, a "statuses" table where it has all the statuses and what each one means - these statuses mean they're no longer working, these mean they get mailed monthly statements, these mean they're only charged half their premiums each month, etc.

And I was adding a new rule, and I realized a couple of things. One, this doesn't need to be editable, ever. The list of statuses and their abilities - apart from new ones we add - will not change.

I also realized that, why have a middle man? I can maintain this data (a dozen lines) in a PHP array, and keep all of the different settings in there. This would seem to have several benefits: No extra DB lookups to find out what status "8" means, and the data can't be written to, so why not hardcode it.

But I was suddenly wondering if this was a good idea when converting a much simpler table to this method. A category table, it consists of six rows with three columns: A category ID, category name, and one extra boolean used by two categories.

If I switch all of the entries to use this, they will go from using one byte each to 15, and six category names (instead of six category IDs) will be thrown into a table with 17000 rows.

This category list will never change, or if it does it will be rare enough that I'd rather have IT do it than anyone else. However, I wanted to inquire what y'all thought of the limitations of this form of ... I guess denormalization?

For the simple case, instead of keeping a lookup table you would model it on the database side with a CHECK constraint on the column. Another design pattern that might be good for your second case, and perhaps for the first, is to keep the normalized tables in the database, and have the application read them once and cache the values on the client side.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe
The SQL Antipatterns book says that you should only use the lookup table. This avoids keeping track of the data in both places, but I don't agree 100% because it requires magic string comparisons in my application when an Enum type would be much better.

Right now what I'm doing in Java is create an Enum in my application that mirrors what's in the database so that way I can keep the referential integrity in the db and be able to do useful things with the Enum in my code instead of doing string comparisons. Like the book says, this is prone to stupid errors when the data is changed in one place and not the other, so what I want to do in the future is read the table and auto generate the Enum.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
I already do use an ENUM in that table, but for a simple three-state status (different from the "Statuses" I'm talking about). I can't use it for categories because I'd still need to record the boolean setting, and if I'm going to be doing that then I might as well either split it out to another table or store it in PHP.

I've already moved two of these tables to PHP and I'd rather not go back, one because it had dozens of different settings and would never change, and the second is the aforementioned statuses table. But I saw the categories and a couple of other tables and I was wondering if I was going too far. I too want the data in the database to be able to stand on its own, which is why I wanted to store the category names instead of the IDs (which, without a lookup table, would be useful only to the PHP app).

However, the idea of storing these - with their tons of settings - in the database and loading just once into an array is strangely alluring. That means the only thing I have to change in my code is in the loading, rather than the on-demand lookup, or having to recode writing...

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
I guess a good example of what I'm thinking is states. The list of state abbreviations in the US is reasonably static; best as I can tell, in the last century, it's changed a dozen times, so roughly once a decade (Though these tended to be clustered, of course). Now, I could store this in a MySQL table, or I could store it in a PHP array. The benefits being, I save one database lookup on each load, the detriments being, I create an array on each load. Now let's say I start needing to maintain various settings; these states, I don't charge tax in; these states, I don't deliver to (suck it, MO and AE!); etc.

So, the options appear to be:

1) Keep a lookup table in the database with all the selections, and join on it as needed.

2) Keep a lookup table in the database with all the selections, and pull from it as needed into an array, and lookup from the array thereafter in the request.

3) Keep it in a PHP array and lookup from that only.

The third option is by far the easiest to manage, IMO, when it comes to adding new settings, but it is the least semantic. The second seems the most complicated but might well be the best option. The first one is the most normalized but, for static tables like this, seems like a bit much.

Note that in the real-life example (as opposed to my farcical example of 'which states I don't ship to'), none of these settings are likely to change, ever, so there's not going to be any web-based facility to edit these tables, though there is one to view them.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe
The only downside to that is that when you make a change in the database you need some mechanism to repopulate the array. Restarting the application should work, but your requirements might be different.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

MEAT TREAT posted:

The only downside to that is that when you make a change in the database you need some mechanism to repopulate the array. Restarting the application should work, but your requirements might be different.

We're talking PHP here, so it would be repopulated on every request. No problem.

wonderboy
Aug 15, 2001
What is the secret of your power?

Aredna posted:

You have to use a subquery, but there is a better way than what you posted if your flavor of SQL supports analytical functions:

code:
select task_id, done_time, done_by
from (  select task_id, done_time, done_by,
               rank() over (partition by task_id order by done_time desc) as rank
        from table
      ) sq
where rank = 1
(insert bitching about analytical functions being invalid in the having clause)

This might also work (again, if you can use analytics):

code:
select task_id,
       max(done_time),
       max(done_by) keep (dense_rank first order by done_time desc, done_by)
  from table
 group by task_id

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe

Golbez posted:

We're talking PHP here, so it would be repopulated on every request. No problem.

If you are going to repopulate the lookup table on each request you might as well do the join then and only get the information that is relevant for that request. The point of using a lookup table to populate the array is so that you cache it for the rest of the requests.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
Well this brings me to re-examining the first time I did this, moving a very simple table from MySQL to a PHP array:

php:
<?
$aTypes = array();
$aTypes['E'] = 'foo';
$aTypes['S'] = 'bar';
$aTypes['A'] = 'Adjustment';
$aTypes['R'] = 'hi';
$aTypes['C'] = 'Payment Received';
$aTypes['P'] = 'Payment Sent';
$aTypes['O'] = 'stuff';
?>
And that's it - all this contains is the description. Now, there's tons of business logic associated with these, hardcoded into each function, it's far too much to keep in a central clearing house like a table or an array. So I could remake a table out of this, with either an ID or the type letter being in the main table, and join this in to find the description as needed. Or, I could just store the description straight in the main table and skip the lookup table, but that seems like a waste of bytes.

I think I've asked the same question three times in a row with only slight variations, so I apologize if I've repeated myself. :(

butt dickus
Jul 7, 2007

top ten juiced up coaches
and the top ten juiced up players

Golbez posted:

I think I've asked the same question three times in a row with only slight variations, so I apologize if I've repeated myself. :(

The job of a database is to store data. What if you need to access this data from multiple scripts? Copy the array code to each one? That's stupid to maintain. You could put the array in a file and include it in your scripts, I guess. What if you want to generate a report on the data outside of the script? Databases make joining, comparing and sorting data trivial.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Doctor rear end in a top hat posted:

The job of a database is to store data. What if you need to access this data from multiple scripts? Copy the array code to each one? That's stupid to maintain. You could put the array in a file and include it in your scripts, I guess. What if you want to generate a report on the data outside of the script? Databases make joining, comparing and sorting data trivial.

Well when you put it that way. :P You're right, of course. I hadn't even thought of having to copy it around.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe

Golbez posted:

I already do use an ENUM in that table, but for a simple three-state status (different from the "Statuses" I'm talking about). I can't use it for categories because I'd still need to record the boolean setting, and if I'm going to be doing that then I might as well either split it out to another table or store it in PHP.

I also wanted to comment on this because this is going to bite you in the rear end later. The problem with using an ENUM datatype in your database is that you cannot easily or portably query the database for the possible values of the ENUM. You might say, "Oh, I'll just do a SELECT DISTINCT on that column" but what if a value isn't being used in the entire table?

Any ENUM datatype you are using should be converted into a lookup table. This has the benefit of being able to add additional columns like you are doing in your categories table.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

MEAT TREAT posted:

I also wanted to comment on this because this is going to bite you in the rear end later. The problem with using an ENUM datatype in your database is that you cannot easily or portably query the database for the possible values of the ENUM. You might say, "Oh, I'll just do a SELECT DISTINCT on that column" but what if a value isn't being used in the entire table?

Any ENUM datatype you are using should be converted into a lookup table. This has the benefit of being able to add additional columns like you are doing in your categories table.

Even if it's something as simple as "Eligible", "Pending", "Denied"? (This is still an improvement from the Access DB I converted it from, where there were separate boolean columns for each status... which doesn't make too much sense when something can only be in one status)

Aredna
Mar 17, 2007
Nap Ghost
As someone who has to create reports off of databases with little to no documentation: Trying to reverse engineer enums is a pain in the rear end and any with a table set up have made my life 10x easier.

I guess what I'm trying to say is that if anyone ever wants any reports built off of your data then you need good documentation, but if there is any chance it is slacking at all then having a lookup table is extremely valuable.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe

Golbez posted:

Even if it's something as simple as "Eligible", "Pending", "Denied"? (This is still an improvement from the Access DB I converted it from, where there were separate boolean columns for each status... which doesn't make too much sense when something can only be in one status)

Yes because those aren't as simple as you think. At anytime in the future someone could decide to add "Conditioned" as a possible value. The only time it's valid is when every possible choice can be enumerated like "LEFT"/"RIGHT".

Another reason is that in the future you decide to deprecate the "Denied" value and replace it with "Bad_Credit" and "Liability", but you don't want to change all of the old records that have been set to Denied because you can't know the reason. So the easiest thing to do is add another boolean column that says if that value is still in use. But you can't do that with an Enum data type.

McGlockenshire
Dec 16, 2005

GOLLOCKS!

Aredna posted:

As someone who has to create reports off of databases with little to no documentation: Trying to reverse engineer enums is a pain in the rear end and any with a table set up have made my life 10x easier.

I guess what I'm trying to say is that if anyone ever wants any reports built off of your data then you need good documentation, but if there is any chance it is slacking at all then having a lookup table is extremely valuable.

Related: make use of COMMENT fields, available both at the column and table levels. One of the little things about MySQL that doesn't suck.

cforrester
Apr 3, 2007

We thought the Lord was gonna call Joel home until we got this card. Cambot, can we show a picture of this one?
I'm working with a MySQL database and I'm dealing with the same problem that's outlined here, except this solution is for MSSQL.

http://www.devx.com/getHelpOn/10MinuteSolution/16501

Is it possible to do something similar in MySQL?

illiniguy01
Feb 19, 2011

Sweat, Ubu. Sweat. Good paranoid schizophrenic.
I need help
Here are my tables:
code:
users:
id           pk
first_name
last_name
username
password

users_schedules:
id    pk
users_id      fk on users.id
weekstart

schedules:
id       pk
users_schedules_id     fk on users_schedules.id
schedule_type
day_number
day_in
day_out
lunch_in
lunch_out
I am trying this SQL:
code:
SELECT *
FROM   users,
       users_schedules,
       schedules
WHERE  users_schedules.users_id = 1
       AND users_schedules.weekstart = '2011-03-07' 



I want it to return only values whos user.id = 1, where users_schedules.users_id = 1, users_schedules.weekstart = '2011-03-07', and schedules.users_schedules_id = the users_schedules.id that matches the users_schedules.weekstart of '2011-03-07'

any help?

spiritual bypass
Feb 19, 2008

Grimey Drawer
Try an INNER JOIN

illiniguy01
Feb 19, 2011

Sweat, Ubu. Sweat. Good paranoid schizophrenic.
Ok, I came up with this:
code:
SELECT *
FROM   users,
       users_schedules,
       schedules
WHERE  users_schedules.users_id = 1
       AND users.id = users_schedules.users_id
       AND users_schedules.weekstart = '2011-03-07'
       AND schedules.users_schedules_id = users_schedules.id  
It seems to be working right. Does it look right?

kimbo305
Jun 9, 2007

actually, yeah, I am a little mad
The MySQL documentation say that for non-InnoDB engined tables, TRUNCATE performance is very fast compared to DELETEing everything. I've found that to be the case for a 500k MyISAM table I was testing stuff on. TRUNCATE is a locking operation, right?

Given that I don't want to lock up a database for long, is there any way to scout out the TRUNCATE speed before actually doing it?

Similarly, do people have experience with TRUNCATE performance for InnoDB tables, or at least those without foreign-key relationships?

McGlockenshire
Dec 16, 2005

GOLLOCKS!

quote:

Similarly, do people have experience with TRUNCATE performance for InnoDB tables, or at least those without foreign-key relationships?
The largest InnoDB table I've ever had to truncate was about 9 million rows and about 2.4 gigs. It took a lot longer than we expected, it really felt like it was identical to blind DELETE FROM.

We found it faster to do a CREATE TABLE new LIKE old to copy the structure of the table, then do a two way rename -- old => old_old, new => old. It should only take a split second, and then you can deal with the data in the old table as you'd like. I'd imagine this would completely break poo poo if you have foreign keys.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe
Try
code:
SET foreign_key_checks = 0;
before doing a truncate on an InnoDB table then set it back.

kimbo305
Jun 9, 2007

actually, yeah, I am a little mad

McGlockenshire posted:

The largest InnoDB table I've ever had to truncate was about 9 million rows and about 2.4 gigs. It took a lot longer than we expected, it really felt like it was identical to blind DELETE FROM.

We found it faster to do a CREATE TABLE new LIKE old to copy the structure of the table, then do a two way rename -- old => old_old, new => old. It should only take a split second, and then you can deal with the data in the old table as you'd like. I'd imagine this would completely break poo poo if you have foreign keys.

I'm fuzzier here, but doesn't MySQL naively do some row copying when doing renames? And even if that's not the case, what do I do with the big old_old table? If deletion/truncation performance ends up being bad, I'm still putting a big load on the database by getting rid of it, right?

Aredna
Mar 17, 2007
Nap Ghost

McGlockenshire posted:

The largest InnoDB table I've ever had to truncate was about 9 million rows and about 2.4 gigs. It took a lot longer than we expected, it really felt like it was identical to blind DELETE FROM.

We found it faster to do a CREATE TABLE new LIKE old to copy the structure of the table, then do a two way rename -- old => old_old, new => old. It should only take a split second, and then you can deal with the data in the old table as you'd like. I'd imagine this would completely break poo poo if you have foreign keys.

I've not used MySQL/InnoDB enough to know, but on Greenplum (a variant of PostgreSQL) I had views that updated to old_old during the rename process. I found this out when the DBA did it without informing the users and some data stopped working. I also had views deleted when they cascaded a delete operation on a renamed table in this manner.

I guess all I'm saying is there may be other unintended consequences of the rename solution that could cause issues.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
So while deciding if I should move my arrays back into the database (yes) and if I should use the short codes or an autoinc for the ID (autoinc), I started thinking about states. State abbreviations change, but not often; as I pointed out before, it happened 12 times in the last 100 years. However, I get the impression that most databases either just store the state directly and use program logic to confine it, or their lookup table is simply abbreviation and name - no ID. And it would look weird to browse the database and see someone living in "Cheyenne, 62".

So, how do y'all do it, if at all?

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe
Remember that each table must have a primary key defined to ensure that a particular row is unique, otherwise you could end up repeating data which you usually don't want. The auto increment field is just a convenient way of letting the database generate unique keys for each row, but now you have to add additional constraints to make sure there are no duplicates.

Usually each table has some natural key that can be used as a primary key, as in your case the state abbreviation. In the future it might be problem when you start supporting other countries and their states because the abbreviations might overlap. But if that's not a concern for the immediate future then by all means use the abbreviation as the primary key for your state table.

I've seen an argument for using a surrogate key (auto inc field) instead of a natural key that basically says you shouldn't use the natural key because what if it changes in the future? Like you said, the state abbreviations have changed 12 times in the last 100 years. But I think that this is a moot point because any database worth a drat supports ON UPDATE CASCADE which solves the issue.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
So it sounds like you're saying, use natural keys where they make sense (protip: SSN is not a natural key), so my little status codes might qualify, but on the other hand, only do this when you aren't saddled with MyISAM.

Goat Bastard
Oct 20, 2004

Primary key values should never change. 12 times in 100 years is more than never.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe

Goat Bastard posted:

Primary key values should never change. 12 times in 100 years is more than never.

What is the problem with ON UPDATE CASCADE and why must a primary key NEVER change?

McGlockenshire
Dec 16, 2005

GOLLOCKS!
WRT bizarreness when doing a table rename, I've never seen any sort of copying or *any* view reference updating... but then again, our data is stupid and we don't use views. The mere fact that you can turn off foreign key refs and drop a table full of keys referenced elsewhere pretty much says it all when the conversation turns to MySQL and data integrity.

Goat Bastard
Oct 20, 2004

MEAT TREAT posted:

What is the problem with ON UPDATE CASCADE and why must a primary key NEVER change?

I thought immutability was part of the formal definition of a primary key, but some quality time with google tells me I was wrong about that. I'd still suggest that if the values in your candidate key can change then it's probably better to use it as a unique key and have a surrogate for the primary key.

Adbot
ADBOT LOVES YOU

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
So, y'all have convinced me to move my tables out of PHP and back into MySQL. However, this one is simple.

Eligible
Pending
Denied

That's the extent of the data. This was formerly in the ENUM that people said was a bad idea.

Now, I could give it an ID and make that the primary key. However, there are parts in my code where I check "If not eligible, do this". So to do that I would either have to hard-code what ID means what in PHP (thus dulling the usefulness), or always run a "GetStatusID('Eligible')" every time I want to compare to eligible.

So, my other thought was, an identity table. A table consisting of only one column, and the main table just includes the whole words in its "status" column, rather than a number in a "status_id" column.

What would seem to make more sense?

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply