Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Munkeymon posted:

I'm doing this to an InnoDB table:
code:
INSERT INTO logins 
  (lo_cookie, lo_created, lo_ip)
  VALUES ("something","whatever","eyepee")
  ON DUPLICATE KEY UPDATE 
    lo_created = VALUES(lo_created), lo_ip = VALUES(lo_ip)
where lo_cookie is a unique key. However, on the first duplicate, the update clause is causing an insert with a blank lo_cookie. Every subsequent update causes that new second row to be updated.

It doesn't matter if I tell it to update the lo_cookie in the 'on duplicate' clause, it still creates and updates a new row with a blank lo_cookie.

That's not what's supposed to happen, right?

I was interested in this, so I went and looked at a couple of the documentation pages. I agree that it doesn't sound like the behaviour you see is the behaviour described by the documentation.

Out of curiosity: when you say "an insert with a blank lo_cookie", you mean lo_cookie is NULL, right? Have you tried making a similar table with lo_cookie declared NOT NULL, and seeing what happens in the same situation?


Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Hammerite posted:

I was interested in this, so I went and looked at a couple of the documentation pages. I agree that it doesn't sound like the behaviour you see is the behaviour described by the documentation.

Out of curiosity: when you say "an insert with a blank lo_cookie", you mean lo_cookie is NULL, right? Have you tried making a similar table with lo_cookie declared NOT NULL, and seeing what happens in the same situation?

It is NOT NULL, but it looks like I managed to omit that:

code:
mysql> desc logins;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| lo_id      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| lo_cookie  | varchar(40)      | NO   | UNI | NULL    |                |
| lo_created | datetime         | NO   |     | NULL    |                |
| lo_ip      | varchar(39)      | NO   |     | NULL    |                |
+------------+------------------+------+-----+---------+----------------+
I'm assuming a NOT NULL column that defaults to NULL will get either 0, blank, or whatever the equivalent of those is for its type when the value is unspecified.

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Munkeymon posted:

It is NOT NULL, but it looks like I managed to omit that:

code:
mysql> desc logins;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| lo_id      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| lo_cookie  | varchar(40)      | NO   | UNI | NULL    |                |
| lo_created | datetime         | NO   |     | NULL    |                |
| lo_ip      | varchar(39)      | NO   |     | NULL    |                |
+------------+------------------+------+-----+---------+----------------+
I'm assuming a NOT NULL column that defaults to NULL will get either 0, blank, or whatever the equivalent of those is for its type when the value is unspecified.
You didn't say that you also had a primary key on the table.
See this: it's not a bug.

The man page isn't totally clear, but I think you need to do this:
code:
INSERT INTO logins 
  (lo_cookie, lo_created, lo_ip)
  VALUES ("something","whatever","eyepee")
  ON DUPLICATE KEY UPDATE 
    lo_id = LAST_INSERT_ID(lo_id), lo_created = VALUES(lo_created), lo_ip = VALUES(lo_ip)
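The general upsert pattern being discussed can be sketched outside MySQL too. This is a toy sketch using Python's sqlite3 module and SQLite's ON CONFLICT clause (the rough analogue of MySQL's ON DUPLICATE KEY UPDATE); the `logins` table here mirrors the one in the thread, but the values are made up:

```python
import sqlite3

# Hypothetical logins table matching the thread's schema (integer id
# instead of MySQL's int unsigned; SQLite has no datetime type).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE logins (
        lo_id      INTEGER PRIMARY KEY AUTOINCREMENT,
        lo_cookie  TEXT NOT NULL UNIQUE,
        lo_created TEXT NOT NULL,
        lo_ip      TEXT NOT NULL
    )
""")

# SQLite's upsert: on a lo_cookie collision, update the existing row
# in place instead of inserting a second one.
upsert = """
    INSERT INTO logins (lo_cookie, lo_created, lo_ip)
    VALUES (?, ?, ?)
    ON CONFLICT(lo_cookie) DO UPDATE SET
        lo_created = excluded.lo_created,
        lo_ip      = excluded.lo_ip
"""
conn.execute(upsert, ("something", "2009-11-13", "1.2.3.4"))
conn.execute(upsert, ("something", "2009-11-14", "5.6.7.8"))  # duplicate cookie

rows = conn.execute(
    "SELECT lo_id, lo_cookie, lo_created FROM logins").fetchall()
# Exactly one row survives, updated in place -- no second blank-cookie row.
```

This is what the poster expected from the MySQL query; the LAST_INSERT_ID(lo_id) trick above is a MySQL-specific detail with no SQLite equivalent needed here.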

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Jethro posted:

You didn't say that you also had a primary key on the table.
See this: it's not a bug.

The man page isn't totally clear, but I think you need to do this:
code:
INSERT INTO logins 
  (lo_cookie, lo_created, lo_ip)
  VALUES ("something","whatever","eyepee")
  ON DUPLICATE KEY UPDATE 
    lo_id = LAST_INSERT_ID(lo_id), lo_created = VALUES(lo_created), lo_ip = VALUES(lo_ip)

I see what they meant now by "you can work around this by using LAST_INSERT_ID(expr)." It wasn't at all clear to me where the ID value was coming from or going to in their example.

Thanks!

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Munkeymon posted:

I see what they meant now by "you can work around this by using LAST_INSERT_ID(expr)." It wasn't at all clear to me where the ID value was coming from or going to in their example.

Thanks!
It wasn't really clear to me either, but if it works it would seem I guessed right.

thefncrow
Mar 14, 2001
I'm really at a loss as to how something like this might be done without serious overhead, so I thought I might lob it out here, see if anyone has a solid solution.

In SQL Server 2005, I'm working with a table that has a sequential-GUID primary key called ID and an attribute for a parent item, ParentID. The parent item is the key GUID of the item that this item refers to, and is NULL for root items. Additionally, there is a datetime of when the record was added to the table, DateProcessed. These parent items cascade, so that, for example, we might have one root item with two children, nodes 2 and 3, and then nodes 4 and 5 whose records reference nodes 2 and 3 as their parents, respectively. The table is very large, 200,000+ records, and the size of each grouping is relatively small (most groups don't reach 5 records, though they could), giving me a large number of these groups.

Given this, I want to return just the leaf node that was most recently inserted for each such grouping in the entire table, with a cap at 1000 of the most recent records returned.

The best I've been able to come up with on this has been to rely on the fact that, by using newsequentialid(), the items more recently inserted have GUID values greater than previous items. Based on that, I was able to generate this:

code:
WITH ItemTree AS 
(
	SELECT ID, ROW_NUMBER() OVER (ORDER BY ID ASC) AS ItemGroup 
	FROM Items 
	WHERE ParentID IS NULL 

		UNION ALL

	SELECT I1.ID, I2.ItemGroup 
	FROM Items I1
	INNER JOIN ItemTree I2 ON I1.ParentID = I2.ID
)
SELECT TOP 1000 * 
FROM Items 
WHERE ID IN 
(
	SELECT MAX(CONVERT(BINARY(16), ID)) 
	FROM ItemTree 
	GROUP BY ItemGroup
)
ORDER BY DateProcessed DESC
This gives me the results that I want, but the query takes 6 minutes to execute, and this is supposed to feed data to an ASP.NET web application, so 6 minutes to execute means that it's not a usable solution.

If I move the TOP 1000 inside the CTE to the root expression, I get incredible performance gains, as you might expect. However, the next step, once I have this, is to add some additional where clauses to the outer query in order to provide the ability to search for certain items, and so the filtering needs to occur outside, or else I'm just left doing my search inside that TOP 1000 set instead of searching the whole set and placing a 1000 record limit on the results.

Is there an easier way to do this that I'm just not seeing?

Saltin
Aug 20, 2003
Don't touch
I have a simple one. I'm not much of a SQL fellow, as you can probably guess from this question!

I have a view with a list of companies. Some companies have parent companies.

dbo.company_view

dbo.company_view.company_id = a numeric ID (unique)
dbo.company_view.company_name = the common name of the company
dbo.company_view.parent_company_id = the numeric ID of the parent (this keys off company_id)

All I want to do, is take the parent_company_id, lookup the company_name, and spit that out to a new column named "parent_company_name_proper" or something like that.

Any help would be greatly appreciated!

Saltin fucked around with this message at 22:45 on Nov 13, 2009

thefncrow
Mar 14, 2001

Saltin posted:

I have a simple one. I'm not much of a SQL fellow, as you can probably guess from this question!

I have a view with a list of companies. Some companies have parent companies.

dbo.company_view

dbo.company_view.company_id = a numeric ID (unique)
dbo.company_view.company_name = the common name of the company
dbo.company_view.parent_company_id = the numeric ID of the parent (this keys off company_id)

All I want to do, is take the parent_company_id, lookup the company_name, and spit that out to a new column named "parent_company_name_proper" or something like that.

Any help would be greatly appreciated!

If you just want the name of the parent company, then what I'd suggest is to do something like this:

code:
SELECT CV1.company_id, CV1.company_name, CV2.company_name AS parent_company_name
FROM dbo.company_view CV1
INNER JOIN dbo.company_view CV2 ON CV1.parent_company_id = CV2.company_id
Slap any WHERE clauses you might be using onto the end of that.
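The self-join above can be tried out on a toy dataset. This sketch uses Python's sqlite3 module, with a plain table and made-up company names standing in for dbo.company_view; note it uses a LEFT JOIN so companies with no parent still come back (the INNER JOIN version drops them):

```python
import sqlite3

# Toy stand-in for dbo.company_view; names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE company_view (
        company_id        INTEGER PRIMARY KEY,
        company_name      TEXT NOT NULL,
        parent_company_id INTEGER
    )
""")
conn.executemany(
    "INSERT INTO company_view VALUES (?, ?, ?)",
    [(1, "Acme Holdings", None), (2, "Acme Widgets", 1)],
)

# Join the view to itself: CV2 is the parent row looked up by
# CV1.parent_company_id. LEFT JOIN yields NULL for parentless companies.
rows = conn.execute("""
    SELECT CV1.company_name, CV2.company_name AS parent_company_name
    FROM company_view CV1
    LEFT JOIN company_view CV2 ON CV1.parent_company_id = CV2.company_id
    ORDER BY CV1.company_id
""").fetchall()
# [("Acme Holdings", None), ("Acme Widgets", "Acme Holdings")]
```

Whether INNER or LEFT is right depends on whether parentless companies should appear in the output at all.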

Saltin
Aug 20, 2003
Don't touch

thefncrow posted:

If you just want the name of the parent company, then what I'd suggest is to do something like this:
Hot drat, that's it exactly. Thanks a million, I really appreciate it.

Nurbs
Aug 31, 2001

Three fries short of a happy meal...Whacko!

thefncrow posted:

I'm really at a loss as to how something like this might be done without serious overhead, so I thought I might lob it out here, see if anyone has a solid solution.

In SQL Server 2005, I'm working with a table that has a sequential-GUID primary key called ID and an attribute for a parent item, ParentID. The parent item is the key GUID of the item that this item refers to, and is NULL for root items. Additionally, there is a datetime of when the record was added to the table, DateProcessed. These parent items cascade, so that, for example, we might have one root item with two children, nodes 2 and 3, and then nodes 4 and 5 whose records reference nodes 2 and 3 as their parents, respectively. The table is very large, 200,000+ records, and the size of each grouping is relatively small (most groups don't reach 5 records, though they could), giving me a large number of these groups.

Given this, I want to return just the leaf node that was most recently inserted for each such grouping in the entire table, with a cap at 1000 of the most recent records returned.
Is there an easier way to do this that I'm just not seeing?

I'm not up on performance, but couldn't you do something like this?

SELECT TOP 1000 * FROM Items WHERE ParentId IS NOT NULL AND Id NOT IN (SELECT ParentId FROM Items) ORDER BY DateProcessed

code:
WITH ParentIds AS
(
  SELECT DISTINCT ParentId FROM Items
  WHERE ParentId IS NOT NULL -- a NULL here would make NOT IN match nothing
)
SELECT TOP 1000 * FROM Items
WHERE ParentId IS NOT NULL AND Id NOT IN (SELECT ParentId FROM ParentIds)
ORDER BY DateProcessed DESC
Following your use of the WITH command..

Nurbs fucked around with this message at 05:38 on Nov 16, 2009

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

thefncrow posted:

I'm really at a loss as to how something like this might be done without serious overhead, so I thought I might lob it out here, see if anyone has a solid solution.

In SQL Server 2005, I'm working with a table that has a sequential-GUID primary key called ID and an attribute for a parent item, ParentID. The parent item is the key GUID of the item that this item refers to, and is NULL for root items. Additionally, there is a datetime of when the record was added to the table, DateProcessed. These parent items cascade, so that, for example, we might have one root item with two children, nodes 2 and 3, and then nodes 4 and 5 whose records reference nodes 2 and 3 as their parents, respectively. The table is very large, 200,000+ records, and the size of each grouping is relatively small (most groups don't reach 5 records, though they could), giving me a large number of these groups.

Given this, I want to return just the leaf node that was most recently inserted for each such grouping in the entire table, with a cap at 1000 of the most recent records returned.

The best I've been able to come up with on this has been to rely on the fact that, by using newsequentialid(), the items more recently inserted have GUID values greater than previous items. Based on that, I was able to generate this:

code:
WITH ItemTree AS 
(
	SELECT ID, ROW_NUMBER() OVER (ORDER BY ID ASC) AS ItemGroup 
	FROM Items 
	WHERE ParentID IS NULL 

		UNION ALL

	SELECT I1.ID, I2.ItemGroup 
	FROM Items I1
	INNER JOIN ItemTree I2 ON I1.ParentID = I2.ID
)
SELECT TOP 1000 * 
FROM Items 
WHERE ID IN 
(
	SELECT MAX(CONVERT(BINARY(16), ID)) 
	FROM ItemTree 
	GROUP BY ItemGroup
)
ORDER BY DateProcessed DESC
This gives me the results that I want, but the query takes 6 minutes to execute, and this is supposed to feed data to an ASP.NET web application, so 6 minutes to execute means that it's not a usable solution.

If I move the TOP 1000 inside the CTE to the root expression, I get incredible performance gains, as you might expect. However, the next step, once I have this, is to add some additional where clauses to the outer query in order to provide the ability to search for certain items, and so the filtering needs to occur outside, or else I'm just left doing my search inside that TOP 1000 set instead of searching the whole set and placing a 1000 record limit on the results.

Is there an easier way to do this that I'm just not seeing?
I don't know if this will be any faster, but why not have the ItemGroup in your CTE just be the ID of the top parent item? That way you don't have to do the analytic function.
code:
WITH ItemTree AS 
(
	SELECT ID, ID AS ItemGroup 
	FROM Items 
	WHERE ParentID IS NULL 

		UNION ALL

	SELECT I1.ID, I2.ItemGroup 
	FROM Items I1
	INNER JOIN ItemTree I2 ON I1.ParentID = I2.ID
)
SELECT TOP 1000 * 
FROM Items 
WHERE ID IN 
(
	SELECT MAX(CONVERT(BINARY(16), ID)) 
	FROM ItemTree 
	GROUP BY ItemGroup
)
ORDER BY DateProcessed DESC
Also, if you can be less picky about getting only one item per group, then maybe you could do:
code:
WITH ItemTree AS 
(
	SELECT ID, ID AS ItemGroup 
	FROM Items 
	WHERE ParentID IS NULL 

		UNION ALL

	SELECT I1.ID, I2.ItemGroup 
	FROM Items I1
	INNER JOIN ItemTree I2 ON I1.ParentID = I2.ID
)
SELECT TOP 1000 Items.* 
FROM Items INNER JOIN
ItemTree ON Items.ID = ItemTree.ID
ORDER BY row_number() over (partition by ItemGroup
                            order by DateProcessed DESC),
         DateProcessed DESC
which will give you the most recent items from each group, then the second most recent items from each group, etc. Then you don't have to rely on newsequentialid().

If you really need only the most recent item from each group, then you could do this:
code:
WITH ItemTree AS 
(
	SELECT ID, ID AS ItemGroup, DateProcessed
	FROM Items 
	WHERE ParentID IS NULL 

		UNION ALL

	SELECT I1.ID, I2.ItemGroup, DateProcessed 
	FROM Items I1
	INNER JOIN ItemTree I2 ON I1.ParentID = I2.ID
),
SortItemTree AS
(
	SELECT ID,
               ItemGroup,
               row_number() over (partition by ItemGroup
                                  order by DateProcessed desc) ord
	FROM ItemTree
)
SELECT TOP 1000 Items.* 
FROM Items INNER JOIN
SortItemTree SIT ON Items.ID = SIT.ID and SIT.ord = 1
ORDER BY DateProcessed DESC
This will only give you the most recent items from each group, and you don't have to rely on the behavior of newsequentialid(), but it probably won't perform any better.
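The ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) pattern in the queries above can be sketched on a toy table. This uses Python's sqlite3 (SQLite has supported window functions since 3.25), with integer group IDs standing in for the GUID-keyed ItemTree:

```python
import sqlite3

# Toy stand-in for the ItemTree CTE: (ID, ItemGroup, DateProcessed),
# with two groups of two items each. All values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ItemTree (
        ID INTEGER, ItemGroup INTEGER, DateProcessed TEXT
    )
""")
conn.executemany("INSERT INTO ItemTree VALUES (?, ?, ?)", [
    (1, 1, "2009-11-01"), (2, 1, "2009-11-03"),
    (3, 2, "2009-11-02"), (4, 2, "2009-11-05"),
])

# ROW_NUMBER() restarts at 1 within each ItemGroup, ordered newest first,
# so keeping ord = 1 selects exactly the most recent row per group.
rows = conn.execute("""
    SELECT ID FROM (
        SELECT ID,
               ROW_NUMBER() OVER (PARTITION BY ItemGroup
                                  ORDER BY DateProcessed DESC) AS ord
        FROM ItemTree
    )
    WHERE ord = 1
    ORDER BY ID
""").fetchall()
# [(2,), (4,)] -- the newest item in group 1 and in group 2
```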

Finally, if you have the ability to change the schema, maybe add the GroupId so you don't have to generate it in a view all the time, though if it's flexible that may not be worth the trouble.

thefncrow
Mar 14, 2001
I actually got this solved late on Friday, but I appreciate the help.

Nurbs posted:

I'm not up on performance, but couldn't you do something like this?

SELECT TOP 1000 * FROM Items WHERE ParentId IS NOT NULL AND Id NOT IN (SELECT ParentId FROM Items) ORDER BY DateProcessed

code:
WITH ParentIds AS
(
  SELECT DISTINCT ParentId FROM Items ORDER BY ParentId --Maybe order it descending?
)
SELECT TOP 1000 * FROM Items WHERE ParentId IS NOT NULL AND Id NOT IN ParentIds
Following your use of the WITH command..

That would pull back all leaf nodes on the tree. Imagine the circumstance Root -> Child -> (Grandchild1, Grandchild2). Grandchildren 1 and 2 both have ID values that don't appear as ParentID values, so they'd both be returned, instead of just returning the most recent of the two.

Jethro posted:

I don't know if this will be any faster, but why not have the ItemGroup in your CTE just be the ID of the top parent item? That way you don't have to do the analytic function.

This was actually part of my eventual solution.

I went with the group item as the ID value of the root node, and, sloppy as it is, I flattened out the CTE to rid myself of it entirely. That meant losing recursion and going to a fixed number of levels, but based on what the Items table does, it turns out it would already experience weirdness if we ever went beyond 3 levels of tree depth. So I compromised and went with 3 levels of depth instead of building it recursively, and I ended up with something like
code:
SELECT TOP 1000 *
FROM Items
WHERE ID IN 
(
	SELECT MAX(CONVERT(BINARY(16), ISNULL(I3.ID, ISNULL(I2.ID, I1.ID))))
	FROM Items I1
	LEFT OUTER JOIN Items I2 ON I2.ParentID = I1.ID
	LEFT OUTER JOIN Items I3 ON I3.ParentID = I2.ID
	WHERE I1.ParentID IS NULL
	GROUP BY I1.ID
)
That actually turns out to be a near instant query.
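The flattened three-level query can be checked on a small example. This sketch uses Python's sqlite3 with plain integer IDs in place of SQL Server's sequential GUIDs (so an ordinary MAX replaces the CONVERT(BINARY(16), ...) trick, and COALESCE replaces ISNULL); the tree data is invented:

```python
import sqlite3

# Toy Items table: root 1 -> child 2 -> grandchildren 3 and 4,
# plus a second root 5 with no children.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (ID INTEGER PRIMARY KEY, ParentID INTEGER)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?)",
    [(1, None), (2, 1), (3, 2), (4, 2), (5, None)],
)

# One row per root: the deepest/newest (highest-ID) descendant within
# three levels, falling back to the child, then the root itself.
leaves = conn.execute("""
    SELECT MAX(COALESCE(I3.ID, I2.ID, I1.ID)) AS LeafID
    FROM Items I1
    LEFT JOIN Items I2 ON I2.ParentID = I1.ID
    LEFT JOIN Items I3 ON I3.ParentID = I2.ID
    WHERE I1.ParentID IS NULL
    GROUP BY I1.ID
    ORDER BY LeafID
""").fetchall()
# [(4,), (5,)] -- newest grandchild of root 1, and the childless root 5
```

The GROUP BY I1.ID is what makes this one-row-per-root: each group is a root's whole (fixed-depth) subtree.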

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
I've some problems to do with table and index design.

As backstory: I have a web database application, the database side being MySQL with MyISAM tables. When I was designing the database schema I was inexperienced at working with databases, and so I made the odd suboptimal choice here and there. I now want to correct the design, but since the application is in use, this requires a certain amount of planning. I also intend to change to InnoDB tables. This is to be done after the schema redesign has gone into effect, but some of the redesign is with the switch to InnoDB in mind. I've read some articles and manual pages on how to design things for InnoDB but can't be 100% confident everything is shipshape in my design.

There is one specific table for which I'm not sure what the correct design choices are, though. This table is named PlayerGameRcd and it records player participation in a game. So I have a table User with autoincrementing primary key UserID, and a table Game with autoincrementing primary key GameID, and the primary key for PlayerGameRcd is (Game, User) where Game is a column referencing GameID and User is a column referencing UserID. This table currently has about 18,000 rows.

The source of difficulty is that I have a lot of things in mind to use this table for (that it's already being used for, in fact). I want to use it for:

- Querying to get a list of games that a player is playing in, is waiting to play in, or has played in (if a user "has played in" a game, then either the game is finished, or the user has quit out of it or been removed).
- Querying to find out how many games a user played, and how many he won. (When I do this, I want to grab some details about the games themselves, so this involves a join to Game.)
- Querying to find out what the best scores are in certain categories of game. In order to know whether I should count a row in a particular category, I need to know whether the game is marked as "doesn't count"; I may also need to know some combination of whether the game was won and whether it was lost. (These are multiplayer games; I use the word "lost" to mean that the player came in last place, as opposed to merely being a runner-up.)
- Querying to find a user's personal best scores, with the same restrictions as previously discussed.

This set of several different uses means that I have several different indexes in mind for the table, but I am worried I have planned too many indexes. Here is the planned schema for the table post-redesign and post-switch to InnoDB:

code:
CREATE TABLE PlayerGameRcd (
    User SMALLINT(5) UNSIGNED NOT NULL,
    Game MEDIUMINT(8) UNSIGNED NOT NULL,
    GameResult ENUM('Quit',
                    'Kicked by Vote',
                    'Kicked by Admin',
                    'Kicked by System',
                    'Finished 5th',
                    'Finished 4th',
                    'Finished 3rd',
                    'Finished 2nd',
                    'Finished 1st',
                    'Game Aborted',
                    'Playing',
                    'Hide'
                    ) NOT NULL DEFAULT 'Playing',
    Inherited TINYINT(1) NOT NULL,
    GameCounts TINYINT(1) NOT NULL,
    Colour TINYINT(1) UNSIGNED NOT NULL,
    Score SMALLINT(5) UNSIGNED NOT NULL DEFAULT 0,
    NumLongTurns TINYINT(1) UNSIGNED NOT NULL DEFAULT 0,
    Notes MEDIUMTEXT,
    CurrentOccupant TINYINT(1) UNSIGNED NOT NULL DEFAULT 0,
        -- CurrentOccupant is set to 0 if the user is currently playing in the game as
        -- the indicated colour. When a player leaves a game, the CurrentOccupant column
        -- is incremented for all records that match the appropriate Game and Colour
        -- values (including that of the departing player). This is to permit the unique
        -- key pgr_multi_uk.
    GameCompleted TINYINT(1) NOT NULL,
    GameWon TINYINT(1) NOT NULL,
    GameLost TINYINT(1) NOT NULL,
    PRIMARY KEY (Game, User),
    UNIQUE KEY pgr_multi_uk (Game, CurrentOccupant, Colour),
    KEY Stats_Won_ind (GameCounts, GameWon, User),
    KEY Stats_Lost_ind (GameCounts, GameLost, User),
    KEY GameList_ind (User, CurrentOccupant),
    CONSTRAINT Constr_PlayerGameRcd_User_fk
        FOREIGN KEY User_fk (User) REFERENCES User (UserID)
        ON DELETE CASCADE ON UPDATE CASCADE,
    CONSTRAINT Constr_PlayerGameRcd_Game_fk
        FOREIGN KEY Game_fk (Game) REFERENCES Game (GameID)
        ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=INNODB CHARACTER SET utf8 COLLATE utf8_general_ci
Now, if we ignore the column Notes which will usually be NULL, the length of a row of this table will be 16 bytes. The total amount of space taken up by the indexes that can be attributed to a single row, on the other hand, is 32. (Recall that where an index does not include all columns of the primary key, InnoDB actually implements that index with the missing primary key columns added at the end.) So the indexes would be taking up about twice as much space as the table itself, which seems like some kind of warning sign that something is wrong with this plan. Would it actually be faster to just let InnoDB do a table scan when it needs to do some of these queries? Should I throw out Stats_Won_ind and Stats_Lost_ind and index instead on GameResult, and let the queries that would use that index instead look for ranges of index values rather than precise ones? This is a very tough design situation for me.

I'm also kind of weak on just how MySQL makes use of indexes on tables when they can be used both to slim down the result set and to perform a join; I'm not sure where to learn about that. Nor am I sure whether it uses them when they can be of help in implementing an ORDER BY clause. (I am aware that only one index is used for each table occurring in a query.)

As an example, here is a query that I might use to rank losing scores in three-player games on a particular game board:

code:
SELECT
    ROUND(PlayerGameRcd.Score/100, 2) AS AdaptedScore,
    PlayerGameRcd.Game,
    PlayerGameRcd.User,
    User.Name
FROM
    PlayerGameRcd
    LEFT JOIN Game ON PlayerGameRcd.Game = Game.GameID
    LEFT JOIN User ON PlayerGameRcd.User = User.UserID
WHERE
    PlayerGameRcd.GameCounts = 1 AND
    PlayerGameRcd.GameLost = 1 AND
    Game.EffectiveNumPlayers = 3 AND
    Game.GVersion = 2
ORDER BY
    PlayerGameRcd.Score DESC,
    Game.GameID
LIMIT 100
;
(If I wanted a user's personal best scores of that type, I would add "AND PlayerGameRcd.User = X" in the WHERE clause.)

Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!
18,000 rows is really nothing; it shouldn't even matter, as long as you index the most common fields you'll use in your WHERE clauses.

Sprawl fucked around with this message at 02:07 on Nov 18, 2009

Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!
quote != edit

Vanadium Dame
May 29, 2002

HELLO I WOULD LIKE TO TALK TO YOU ABOUT MY VERY STRONG OPINIONS

If the cardinality of the indexed column is low, a b-tree index is actually going to hurt performance anyway, and what Sprawl said about 18k rows being small is absolutely true. If you're selecting based on a column with high cardinality (like a PK), then the opposite is true, of course.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Markoff Chaney posted:

If the cardinality of the indexed column is low, a b-tree index is actually going to hurt performance anyway, and what Sprawl said about 18k rows being small is absolutely true. If you're selecting based on a column with high cardinality (like a PK), then the opposite is true, of course.

Does this remain true if I have an index on multiple columns, a mixture of high-cardinality and low-cardinality?

Based on your and Sprawl's responses, it seems as though the columns and indices GameWon and GameLost (and the column GameCompleted) are not worthwhile.

The other indices are all ones that are either necessary (the PRIMARY KEY and the other UNIQUE KEY) or will be frequently-used; GameList_ind will be used to find all games a given user is involved in. (Also, if GameList_ind didn't exist then InnoDB would make a new index for the foreign key constraint on User, anyway.) But I'm unsure whether to reinstate an earlier idea, which was to have an index on (GameCounts, GameResult, User) [which in InnoDB is implicitly a key on (GameCounts, GameResult, User, Game)]. GameCounts and GameResult have middling to low cardinality (GameCounts is treated as a Boolean value), but the proposed index could nevertheless be used for some queries; particularly when the value of User is of interest too (looking for a user's personal data).

Actually, I think I just realised what I need to do: have an index on (GameCounts, GameResult, Score, User). This is assuming that indices on columns that will appear in an ORDER BY clause are useful; I should do some experiments to check this.

Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!

Hammerite posted:

Does this remain true if I have an index on multiple columns, a mixture of high-cardinality and low-cardinality?

Based on your and Sprawl's responses, it seems as though the columns and indices GameWon and GameLost (and the column GameCompleted) are not worthwhile.

The other indices are all ones that are either necessary (the PRIMARY KEY and the other UNIQUE KEY) or will be frequently-used; GameList_ind will be used to find all games a given user is involved in. (Also, if GameList_ind didn't exist then InnoDB would make a new index for the foreign key constraint on User, anyway.) But I'm unsure whether to reinstate an earlier idea, which was to have an index on (GameCounts, GameResult, User) [which in InnoDB is implicitly a key on (GameCounts, GameResult, User, Game)]. GameCounts and GameResult have middling to low cardinality (GameCounts is treated as a Boolean value), but the proposed index could nevertheless be used for some queries; particularly when the value of User is of interest too (looking for a user's personal data).

Actually, I think I just realised what I need to do: have an index on (GameCounts, GameResult, Score, User). This is assuming that indices on columns that will appear in an ORDER BY clause are useful; I should do some experiments to check this.

They might be, if your dataset were large enough, but I doubt you'd see anything but a few ms here and there. I wouldn't expect any noticeable difference unless you had lots of concats taking more than 100ms.

MoNsTeR
Jun 29, 2002

Hammerite posted:

... So the indexes would be taking up about twice as much space as the table itself, which seems like some kind of warning sign that something is wrong with this plan. ...

This is pretty common. Here is the size breakdown for one of my company's core tables (which mostly just associates two other things and tracks a status):
table: 10372MB
PK index: 2780MB
create date index: 3452MB
another date index: 3656MB
status date index: 3968MB
index on one associated entity id, and status: 4448MB
index on both associated entity ids: 3436MB
(index total): 21740MB

And of course at these sizes, we really do need these indexes for performance (the table is also partitioned, but that's another story).

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
Thanks guys.

More-or-less unrelated question: If I put the function UTC_TIMESTAMP into a WHERE clause, is it recalculated for each row (after all, it might have changed by milliseconds in the meantime!) or calculated only once for the query (as seems sensible)? So if I issue a query in which part of the WHERE clause reads, for example,
code:
... AND Game.LastMove > TIMESTAMPADD(MONTH, -5, UTC_TIMESTAMP()) AND ...
is it going to work that out on every row, as opposed to working out the value returned by TIMESTAMPADD once and just comparing each row to the (constant) result? I'm sure I would be able to find out what the answer to this question is normally, but it's been a long day.

EDIT: I found where in the documentation it says this, not a minute after typing this. (The answer is that it's evaluated only once.)

Hammerite fucked around with this message at 00:52 on Nov 19, 2009

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
What is the correct way to do what I am trying to do?

This obviously won't work, since I can only have 1 field in the subquery:
code:
SELECT col_a FROM table WHERE col_b IN (SELECT col_a, col_b FROM temp_table)
But this gives me a Can't reopen table: 'temp_table' error:
code:
SELECT col_a FROM table WHERE (col_b IN (SELECT col_a FROM temp_table) OR col_b IN (SELECT col_b FROM temp_table))
It's addressed in the temporary tables problems page, but there doesn't seem to be a solution.

Bad Titty Puker
Nov 3, 2007
Soiled Meat

fletcher posted:

What is the correct way to do what I am trying to do?

This obviously won't work, since I can only have 1 field in the subquery:
code:
SELECT col_a FROM table WHERE col_b IN (SELECT col_a, col_b FROM temp_table)
But this gives me a Can't reopen table: 'temp_table' error:
code:
SELECT col_a FROM table WHERE (col_b IN (SELECT col_a FROM temp_table) OR col_b IN (SELECT col_b FROM temp_table))
It's addressed in the temporary tables problems page, but there doesn't seem to be a solution.

Something like this (untested)?

code:
SELECT 
   col_a
FROM
   table
WHERE EXISTS
   (SELECT *
    FROM temp_table
    WHERE temp_table.col_a = table.col_b
    OR temp_table.col_b = table.col_b)
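
The EXISTS version only references temp_table once, which is why it dodges MySQL's can't-reopen-a-temporary-table limitation. A single JOIN does the same thing, if you prefer (untested, same hypothetical names; DISTINCT is needed because a row can match on both columns):

```sql
SELECT DISTINCT table.col_a
FROM table
JOIN temp_table
  ON temp_table.col_a = table.col_b
  OR temp_table.col_b = table.col_b
```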

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

camels posted:

Something like this (untested)?

code:
SELECT 
   col_a
FROM
   table
WHERE EXISTS
   (SELECT *
    FROM temp_table
    WHERE temp_table.col_a = table.col_b
    OR temp_table.col_b = table.col_b)

Worked great! Thanks!

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
I have a VARCHAR(1300) column that I want to sort my results by. There wasn't an index on it so it was pretty slow, but when I try to add an index to it I get: "Specified key was too long; max key length is 1000 bytes"

Is there another way to get the query to run faster?

Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!

fletcher posted:

I have a VARCHAR(1300) column that I want to sort my results by. There wasn't an index on it so it was pretty slow, but when I try to add an index to it I get: "Specified key was too long; max key length is 1000 bytes"

Is there another way to get the query to run faster?

use a fulltext index? or convert it to a text field and put a fulltext index on it?

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Sprawl posted:

use a fulltext index? or convert it to a text field and put a fulltext index on it?

I tried using a fulltext index, but it was still slow, and EXPLAIN said it wasn't using an index. I also tried specifying with USE INDEX() and it was still slow. Same results whether it's a VARCHAR or TEXT field.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
You could maybe create a column of much shorter length (anywhere < 1000, I'm guessing?) that will act as the sort key in lieu of the actual content column. Then cluster index by this and sort by it.

code:
Key  | Value
Flet | Fletcher sucks balls
Trip | Triple Tech is a homosexual
Spra | Sprawl needs an index or Scotch
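
In MySQL that might look something like this (untested sketch; table and column names are made up). Note that a plain prefix index like content(255) generally can't be used to satisfy an ORDER BY, which is why a real, separately indexed column helps here:

```sql
ALTER TABLE posts ADD COLUMN content_sort VARCHAR(100) NOT NULL DEFAULT '';
UPDATE posts SET content_sort = LEFT(content, 100);
ALTER TABLE posts ADD INDEX idx_content_sort (content_sort);

-- sort on the short indexed key, fall back to the full column for ties
SELECT * FROM posts ORDER BY content_sort, content;
```

You'd also need to keep content_sort in sync on INSERT/UPDATE (a trigger or application code).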

nbv4
Aug 21, 2002

by Duchess Gummybuns
nevermind

nbv4 fucked around with this message at 02:43 on Nov 20, 2009

KhanTusion
Mar 14, 2006
I'm new to SQL commands and I'm currently writing a VB program utilizing them...

I need to display rows of Product Name, Unit Price, Quantity, and Total (of unit price * quantity) and then at the very bottom a grand total which would be the SUM of the Totals which I gave the alias Product_Total

here's my code:

code:
SELECT Sum(product_total) AS grand_total
FROM   (SELECT [Order Details].unitprice,
               productname,
               quantity,
               ([Order Details].unitprice * quantity) AS product_total
        FROM   [Order Details],
               products
        WHERE  orderid = @OrderID
               AND [Order Details].productid = products.productid) 
If I remove the first line where I'm trying to select the SUM(Product_Total) everything runs smoothly. I just can't figure out how to get the aggregate function to work. With the code as is I get the error: "Incorrect Syntax Near ')'"

Any help on grasping how to accomplish this would be wondrous.

KhanTusion fucked around with this message at 10:02 on Nov 20, 2009

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
If you were using MySQL, I know for a fact that that query won't work as-is because a subquery must have an alias. If you take your query as it is and enter " AS SubqueryA" after the final bracket, then it would work in MySQL (with some trivial differences in notation). Could that be the problem here? MySQL would explicitly tell you that the subquery needs an alias, but maybe other DBMSs have the same requirement but no explicit error message about it.

Edit: Also, you seem to be SELECTing product_name and then not using it. In fact, you don't actually need to use a subquery at all; you can use
code:
SELECT Sum([Order Details].unitprice * quantity) AS grand_total
FROM   [Order Details],
       products
WHERE  orderid = @OrderID
       AND [Order Details].productid = products.productid

Hammerite fucked around with this message at 13:56 on Nov 20, 2009

Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!

KhanTusion posted:

I'm new to SQL commands and I'm currently writing a VB program utilizing them...

I need to display rows of Product Name, Unit Price, Quantity, and Total (of unit price * quantity) and then at the very bottom a grand total which would be the SUM of the Totals which I gave the alias Product_Total

here's my code:

code:
SELECT Sum(product_total) AS grand_total
FROM   (SELECT [Order Details].unitprice,
               productname,
               quantity,
               ([Order Details].unitprice * quantity) AS product_total
        FROM   [Order Details],
               products
        WHERE  orderid = @OrderID
               AND [Order Details].productid = products.productid) 
If I remove the first line where I'm trying to select the SUM(Product_Total) everything runs smoothly. I just can't figure out how to get the aggregate function to work. With the code as is I get the error: "Incorrect Syntax Near ')'"

Any help on grasping how to accomplish this would be wondrous.

code:
SELECT [Order Details].unitprice,
               productname,
               quantity,
               ([Order Details].unitprice * quantity) AS product_total
        FROM   [Order Details],
               products
        WHERE  orderid = @OrderID
               AND [Order Details].productid = products.productid
What I'm trying to figure out is why you need a grand total when all it is is an aggregate of the lines you're going to be selecting anyway. And you really shouldn't be doing it as a subquery; it should realistically be a separate query, or you can just add up the product_total lines. But this will give you what you are looking for.

code:
SELECT [Order Details].unitprice,
               products.productname,
               [Order Details].quantity,
               ([Order Details].unitprice * quantity) AS product_total,
	       ( select sum(t2.unitprice * t2.quantity) from [Order Details] as t2 where t2.orderid = [Order Details].orderid ) as grand_total
        FROM   [Order Details],
               products
        WHERE  [Order Details].orderid = @OrderID
               AND [Order Details].productid = products.productid

KhanTusion
Mar 14, 2006
Thanks for the help. From what you've said, I decided to just create a subroutine in VB to run a second query to aggregate the sum. Even with the code examples you gave me I couldn't get it to work all in one query, so I scrapped it. Working now though, thanks.

Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!

KhanTusion posted:

thanks for the help, from what you've said I decided to just create a subroutine in VB to run a second query to aggregate the sum. Even with those code examples you gave me I couldn't get it to work all in one query so I scrapped it. Working now though, thanks.

Really? That's odd, I have a working one here that I created to make sure it was gonna work before I pasted it, hth.

code:
select jobnumber,
	   ponumber,
	   linenumber,
	   amount,
	   qty,
	   (amount * Case qty when 0 then 1 else qty end) as total,
	   (Select sum(amount* Case qty When 0 then 1 else qty End)
	   		from amount_contract_detail as t2 
	   		where   t2.jobnumber = amount_contract_detail.jobnumber 
	   	     and t2.ponumber = amount_contract_detail.ponumber
	   ) as grand_total 
	   from amount_contract_detail
code:
XXXXXX	35	1	13720	0	13720	76974
XXXXXX	35	2	9000	0	9000	76974
XXXXXX	35	3	6600	0	6600	76974
XXXXXX	35	4	43624	0	43624	76974
XXXXXX	6	1	0	0	0	0
XXXXXX	30	1	230	0	230	1183.01
XXXXXX	30	2	953.01	0	953.01	1183.01

Vince McMahon
Dec 18, 2003
What about GROUP BY ... WITH ROLLUP?

That gives you an extra row at the end with the SUM of the numeric fields.
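
With KhanTusion's tables that would be something like this (untested, and the exact ROLLUP syntax varies a little between MySQL and SQL Server):

```sql
SELECT productid, SUM(unitprice * quantity) AS product_total
FROM   [Order Details]
WHERE  orderid = @OrderID
GROUP BY productid WITH ROLLUP
-- the final row comes back with productid = NULL and holds the grand total
```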

Dromio
Oct 16, 2002
Sleeper
I hope this is the right place to ask about this. I figure some folks here know a good bit about SQLServer diagnostics. I'm looking for some tips on how best to diagnose a problem...

We have an ASP.NET testing website (built using NHibernate) running against SQL Server Express on our QA box. It's got about 40 users right now.

Every once in a while (twice over the last 2 months), something locks up one of the tables in the db-- the "SiteUSERS" table. Any access to this table times out, even the simplest of queries directly from the Management Studio console.

I can go to the Activity monitor in management studio and see that there are two processes accessing the DB, and one is suspended, waiting on the other. If I kill that process, everything goes back to normal.

But how do I figure out the root cause? Is there some way to see why that process is locked? The table is used in a lot of places on the site. I don't have direct control over the SQL statements used by the site, NHibernate is generating those. I could enable full logging on NHibernate, but that's a LOT of information to sift through.

Next time this happens, what information should I gather to help find the root cause?

Bad Titty Puker
Nov 3, 2007
Soiled Meat

Dromio posted:

We have an ASP.NET testing website (built using NHibernate) running against SQL Server Express on our QA box. It's got about 40 users right now.

Every once in a while (twice over the last 2 months), something locks up one of the tables in the db-- the "SiteUSERS" table. Any access to this table times out, even the simplest of queries directly from the Management Studio console.

I can go to the Activity monitor in management studio and see that there are two processes accessing the DB, and one is suspended, waiting on the other. If I kill that process, everything goes back to normal.

But how do I figure out the root cause? Is there some way to see why that process is locked? The table is used in a lot of places on the site. I don't have direct control over the SQL statements used by the site, NHibernate is generating those. I could enable full logging on NHibernate, but that's a LOT of information to sift through.

1. Identify the SQL statement that is blocking. This approach is the same (I think) whether you are using SQL Server 2005 Express or SQL Server 2008 Express.

Use SQL Trace to capture the offending SQL statement. I'm pretty sure SQL Server Express does not support the GUI tool (Profiler), but it does support the system stored procedures for running a trace. Here's an example: http://www.mssqltips.com/tip.asp?tip=1035

Limiting the trace to long running queries where the SQL Text is like the table name will probably narrow it down enough.

2. With this info, perhaps you could start a support incident with NHibernate? I'm not familiar with them. I take it that you can't modify the SQL Statements or server settings.

Perhaps there are missing indexes that would solve the problem, assuming you can add indexes.

http://www.sql-server-performance.com/tips/blocking_p1.aspx
http://www.sql-server-performance.com/tips/reducing_locks_p1.aspx

edit
If you are at SSMS when the blocking is occurring, try this:

1. Get the object id of the table in question:
code:
SELECT OBJECT_ID('the table name')
2. View the blocking locks

code:
sp_lock
The output will look something like this:

code:
spid   dbid   ObjId       IndId  Type Resource                         Mode     Status
------ ------ ----------- ------ ---- -------------------------------- -------- ------
52     1      1115151018  0      TAB                                   IS       GRANT
And look for the object id in the ObjId column.

Then to get the offending SQL statement, use

code:
DBCC INPUTBUFFER( <the spid> )
I think these are all supported by SQL Server Express.
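
On 2005 and later you can also get the blocker and its SQL text in one shot from the DMVs (untested sketch, but sys.dm_exec_requests and sys.dm_exec_sql_text are standard):

```sql
-- every request that is currently being blocked, plus the text it's running
SELECT r.session_id, r.blocking_session_id, t.text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```

Then run DBCC INPUTBUFFER on the blocking_session_id to see what the blocker is doing.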

Bad Titty Puker fucked around with this message at 03:51 on Nov 22, 2009

Dromio
Oct 16, 2002
Sleeper
Thank you camels, this looks like it will help a lot! If only I could force the issue to happen again sooner :)

Gallipoli
Apr 27, 2004

Not the Italian city. No one cares about the Italian city.
I'm trying to pull data out of a dozen different tables or so and put them into a format that is actually part of the industry standard.

The standard in question is called Multispeak and, at the end of the day for this conversion, what I need is an XML file that conforms to the XML schema (.xsd) that I have for Multispeak.

Is there an easy way of pulling data from MS SQL into an XML file by mapping particular table fields to an XSD file?

So far, all I have come up with is this near-decade-old article on MSDN; the sample code doesn't appear to work when I try to modify the .xsd they provide to reflect the tables I am pulling from.

My other option is using a standard XML parsing library (.NET or Python's lxml) to write it out myself, but it feels like there is an easier way out there I am missing just using SQL Server and thought I would ask. Thanks.

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Gallipoli posted:

I'm trying to pull data out of a dozen different tables or so and put them into a format that is actually part of the industry standard.

The standard in question is called Multispeak and, at the end of the day for this conversion, what I need is an XML file that conforms to the XML schema (.xsd) that I have for Multispeak.

Is there an easy way of pulling data from MS SQL into an XML file by mapping particular table fields to an XSD file?

So far, all I have come up with is this near-decade-old article on MSDN; the sample code doesn't appear to work when I try to modify the .xsd they provide to reflect the tables I am pulling from.

My other option is using a standard XML parsing library (.NET or Python's lxml) to write it out myself, but it feels like there is an easier way out there I am missing just using SQL Server and thought I would ask. Thanks.
If you're using SQL Server 2005 or newer http://msdn.microsoft.com/en-us/library/ms190936%28SQL.90%29.aspx
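
The short version is the FOR XML clause; something like this (untested, and the element names here are made up, not actual Multispeak):

```sql
SELECT m.MeterID  AS 'meterID',
       m.SerialNo AS 'serialNumber'
FROM   Meters AS m
FOR XML PATH('meter'), ROOT('meters');
```

Validating the output against your .xsd would still be a separate step, but it beats hand-writing the XML.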


adamarama
Mar 20, 2009
I apologise if this isn't the right place to ask. I was wondering if anyone knows of any good resources (books or online) for learning SQL? There are so many books and sites available, I'm not sure which would be best for a beginner. I have never done any programming before; I'm just learning because I hear it's quite useful in business analysis. Should I focus on mastering SELECT, or is it also useful to know how to construct databases?
