Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


kajester posted:

I had to do something similar to this but for timeclock punches. The ROW_NUMBER window function in MS SQL proved useful: you can essentially create row numbers partitioned by driver and IN or OUT, and then match the INs to the OUTs. Look into something like
code:
;WITH EntriesByRow AS (
SELECT ROW_NUMBER() OVER(PARTITION BY Driver, Direction ORDER BY Time) as Row
  , *
FROM YourDataTable
)
SELECT *
FROM EntriesByRow a
INNER JOIN EntriesByRow b ON b.Driver = a.Driver and b.Row = a.Row and b.Direction = 'OUT'
WHERE a.Direction = 'IN'
This solution assumes every IN is followed by an OUT.

Good luck!

This is slick. Switching to a LOJ and guarding against b's date stamp being NULL means you can easily handle cases where an IN is not followed by an OUT.
I adapted my previous query and ended up with this:
code:
CREATE TABLE dbo.#driver_log
  (
     driverid  INT,
     add_date  DATETIME,
     direction BIT -- 0 = inbound, 1 = outbound
  )

-- Assumes:
-- - the count of inbound records is either equal to, or one greater than, the count of outbound records
-- - for each driver, every inbound record maps to a unique outbound record,
--   i.e. between any two inbound records for a driver there is exactly one outbound record
-- - the combination of all 3 columns is UNIQUE
INSERT dbo.#driver_log
       (driverid,
        add_date,
        direction)
VALUES (1,
        '1/1/2013',
        0),
       (1,
        '1/1/2013 1:00:00 AM',
        1),
       (1,
        '1/2/2013',
        0),
       (2,
        '1/1/2013',
        0)

;WITH EntriesByRow AS (
SELECT ROW_NUMBER() OVER(PARTITION BY driverid, Direction ORDER BY add_date) as RowNum,
  DL.DriverID,
  DL.Direction,
  DL.Add_Date
FROM dbo.#driver_log AS DL
)
SELECT DL1.DriverID, Datediff(minute, DL1.add_date, Isnull(DL2.add_date, Getdate())) AS
       [Minutes on site]
FROM EntriesByRow AS DL1
LEFT OUTER JOIN EntriesByRow AS DL2
  ON DL2.driverid = DL1.driverid and DL2.RowNum = DL1.RowNum and DL2.Direction = 1
WHERE DL1.Direction = 0

;WITH EntriesByRow AS (
SELECT ROW_NUMBER() OVER(PARTITION BY driverid, Direction ORDER BY add_date) as RowNum,
  DL.DriverID,
  DL.Direction,
  DL.Add_Date
FROM dbo.#driver_log AS DL
)
SELECT DL1.DriverID, Sum(Datediff(minute, DL1.add_date, Isnull(DL2.add_date, Getdate()))) AS
       [Total minutes on site]
FROM EntriesByRow AS DL1
LEFT OUTER JOIN EntriesByRow AS DL2
  ON DL2.driverid = DL1.driverid and DL2.RowNum = DL1.RowNum and DL2.Direction = 1
WHERE DL1.Direction = 0
GROUP BY DL1.DriverID

DROP TABLE dbo.#driver_log

Bad Titty Puker
Nov 3, 2007
Soiled Meat

Nth Doctor posted:

This is slick. Switching to a LOJ and guarding against b's date stamp being NULL means you can easily handle cases where an IN is not followed by an OUT.
I adapted my previous query and ended up with this:
code:
CREATE TABLE dbo.#driver_log
  (
     driverid  INT,
     add_date  DATETIME,
     direction BIT -- 0 = inbound, 1 = outbound
  )

-- Assumes:
-- - the count of inbound records is either equal to, or one greater than, the count of outbound records
-- - for each driver, every inbound record maps to a unique outbound record,
--   i.e. between any two inbound records for a driver there is exactly one outbound record
-- - the combination of all 3 columns is UNIQUE
INSERT dbo.#driver_log
       (driverid,
        add_date,
        direction)
VALUES (1,
        '1/1/2013',
        0),
       (1,
        '1/1/2013 1:00:00 AM',
        1),
       (1,
        '1/2/2013',
        0),
       (2,
        '1/1/2013',
        0)

;WITH EntriesByRow AS (
SELECT ROW_NUMBER() OVER(PARTITION BY driverid, Direction ORDER BY add_date) as RowNum,
  DL.DriverID,
  DL.Direction,
  DL.Add_Date
FROM dbo.#driver_log AS DL
)
SELECT DL1.DriverID, Datediff(minute, DL1.add_date, Isnull(DL2.add_date, Getdate())) AS
       [Minutes on site]
FROM EntriesByRow AS DL1
LEFT OUTER JOIN EntriesByRow AS DL2
  ON DL2.driverid = DL1.driverid and DL2.RowNum = DL1.RowNum and DL2.Direction = 1
WHERE DL1.Direction = 0

;WITH EntriesByRow AS (
SELECT ROW_NUMBER() OVER(PARTITION BY driverid, Direction ORDER BY add_date) as RowNum,
  DL.DriverID,
  DL.Direction,
  DL.Add_Date
FROM dbo.#driver_log AS DL
)
SELECT DL1.DriverID, Sum(Datediff(minute, DL1.add_date, Isnull(DL2.add_date, Getdate()))) AS
       [Total minutes on site]
FROM EntriesByRow AS DL1
LEFT OUTER JOIN EntriesByRow AS DL2
  ON DL2.driverid = DL1.driverid and DL2.RowNum = DL1.RowNum and DL2.Direction = 1
WHERE DL1.Direction = 0
GROUP BY DL1.DriverID

DROP TABLE dbo.#driver_log

This kind of query is usually much simpler and more straightforward when the DDL for the table includes a column for "time inbound" and one for "time outbound" instead of a single datetime column. It also prevents certain kinds of bad data from getting in the table, for example three "inbound" rows and no "outbound" rows. The time_inbound column would be NOT NULL and the time_outbound column would be nullable to get even more data integrity.
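Something like this, say (table and constraint names made up):
code:
-- one row per visit; NULL time_outbound = driver still on site
CREATE TABLE dbo.driver_visit
  (
     driverid      INT      NOT NULL,
     time_inbound  DATETIME NOT NULL,
     time_outbound DATETIME NULL,
     CONSTRAINT pk_driver_visit PRIMARY KEY (driverid, time_inbound),
     CONSTRAINT ck_out_after_in CHECK (time_outbound IS NULL
                                       OR time_outbound > time_inbound)
  )

-- "minutes on site" then needs no ROW_NUMBER and no self-join
SELECT driverid,
       Datediff(minute, time_inbound, Isnull(time_outbound, Getdate())) AS [Minutes on site]
FROM dbo.driver_visit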

kajester
Jan 17, 2013

camels posted:

This kind of query is usually much simpler and more straightforward when the DDL for the table includes a column for "time inbound" and one for "time outbound" instead of a single datetime column. It also prevents certain kinds of bad data from getting in the table, for example three "inbound" rows and no "outbound" rows. The time_inbound column would be NOT NULL and the time_outbound column would be nullable to get even more data integrity.

Agreed, though there are times when the table may actually be clock in/out transactions where you don't want to touch the original transaction, and instead let it live on its own. Consider a multi-publisher scenario where systems are all touching distributed data that needs to flow back to one central source. Having them all touch the same source table can lead to some hairy conflicts back at the central source.

Bad Titty Puker
Nov 3, 2007
Soiled Meat

kajester posted:

Agreed, though there are times when the table may actually be clock in/out transactions where you don't want to touch the original transaction, and instead let it live on its own. Consider a multi-publisher scenario where systems are all touching distributed data that needs to flow back to one central source. Having them all touch the same source table can lead to some hairy conflicts back at the central source.

I think it deals more with basic data modeling. The table design with one time imitates entries on a clipboard, while the design with in and out times models the entities more accurately. If the second design is used from the start, I don't see any problems caused by a distributed data system. If you mean that the start time can't be modified once set, that doesn't depend on how the data is modeled but on how that constraint is enforced. Whether or not the data is distributed is orthogonal to how the reality of the data is modeled in the table design.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Hey guys, long time no post. I've been so busy with other stuff I've barely stepped into SH/SC, but now that I have a problem with MySQL I am of course back with my tail between my legs. It's not an SQL-specific problem, but I've been reading this thread for quite a while and I'm pretty sure some of you are pretty sharp on the administration side.

I've been asked to fix a MySQL server that, for some reason, overnight decided that queries now take 10-30 seconds to execute instead of 0-1 seconds. It's hosted on a GoDaddy virtual dedicated server with these specs:
- 8 core Intel(R) Xeon(R) CPU L5410 @ 2.33GHz
- 2 gb RAM
- 60 gb disc
- MySQL 5.1.66-c11
- Apache/2.2.23 (Unix) mod_ssl/2.2.23 OpenSSL/1.0.0-fips mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
- PHP 5.3.18
- Kernel 2.6.18-028stab101.1
- 8 tiny databases; largest is about 10mb. This machine is way overspecced for what is on it. Tables are a mix of MyISAM and InnoDB (I hear this could be bad).
- osCommerce 2.3.x
- cPanel 11.34.1 (build 7)

So I've been wrestling with mysqltuner.pl for a lot of the last little while, and have added all kinds of neat stuff to my.cnf, which has had exactly zero effect (I usually wait 24 hours to see what a change does before trying something else).

I realize now, though, that I doubt it's a my.cnf/configuration problem, because the thing was working fine and then suddenly didn't. I can post the mysqltuner output and the my.cnf, but first I wanted to pick your brains to see if this situation seems familiar:

1. Server is running fine for about 6 months
2. On Monday, pageloads on the osCommerce pages suddenly go from 1 second to over 2 mins. CPU never gets above 5%, RAM never gets above 40%, disc is 80% free. Static pages (.html, and .php pages that don't hit the database) work fine, as do non-database things like .jpgs, .pdfs, etc. This leads me to suspect MySQL as the culprit.
3. I did a CHECK TABLE, ANALYZE TABLE, OPTIMIZE TABLE, and REPAIR TABLE on all the tables in question. I also did a mysqldump, drop, and re-import of the databases.
4. Various mysqltuner shenanigans ensue (setting max_connections, timeouts, query_cache_size, join_buffer_size, thread_cache_size, table_cache, changing innodb_buffer_pool_size/innodb_log_file_size) to no real effect (spread over 4 days so I could measure each change's effect).
5. Watched queries run using SHOW FULL PROCESSLIST; nothing really sticks out, they're all equally slow.

I realize this might be fallacious logic, but it feels like if this happened suddenly and out of the blue, the solution would also likely be something rather binary and simple (basically undoing whatever went wrong) instead of tweaking and prodding cache memory limits and the like. Does the above scenario ring any bells for anyone here?

McGlockenshire
Dec 16, 2005

GOLLOCKS!
Is the behavior consistent and repeatable? Does issuing the same query back to back result in roughly identical performance? What does EXPLAIN say when fed a slow query?

quote:

8 tiny databases; largest is about 10mb

So this'd place your memory use at what, 80-160 megs? In other words, each database fits entirely in memory, so there's no way that this is I/O contention.

Does it happen on both MyISAM and InnoDB tables? Both of those table types obey table locks, but the state of waiting for a lock to be released is pretty obvious on SHOW FULL PROCESSLIST.

What does SHOW INNODB STATUS look like? Look for lines about deadlocks.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

McGlockenshire posted:

Is the behavior consistent and repeatable? Does issuing the same query back to back result in roughly identical performance? What does EXPLAIN say when fed a slow query?

So this'd place your memory use at what, 80-160 megs? In other words, each database fits entirely in memory, so there's no way that this is I/O contention.

Does it happen on both MyISAM and InnoDB tables? Both of those table types obey table locks, but the state of waiting for a lock to be released is pretty obvious on SHOW FULL PROCESSLIST.

What does SHOW INNODB STATUS look like? Look for lines about deadlocks.

Thanks for your reply, you're one of the guys I had hoped would see this.

The overall behavior is inconsistent; page load times/queries can vary from over a minute, to roughly 2 seconds, to instantaneous like 'normal', at semi-random times, with the query dying 90% of the time (I have the timeout set to 20000). I have noticed watching PROCESSLIST, though, that queries seem to be equally affected, so it's consistent in a way: there might be 8 queries in the queue and they're all delayed by an equal amount, almost as if the db were 'clumping' them somehow.

Unfortunately right now it's being a jerk and working perfectly. I'm not sure if this is because I tried another mysqldump/drop/import and then restarted the server last night or not. I remain convinced it is going to slow down again in an hour or so; it usually has about 1 hour of normal operation out of 24. I had forgotten about EXPLAIN; I'm definitely going to check that out once it's slow again.

I don't actually see locks very often at all in PROCESSLIST; the most common statuses are 'freeing items' and 'opening/closing tables'.

Here's the INNODB STATUS, which looks pretty innocuous right now since the thing is working:
http://pastebin.com/rfp8mAbD

McGlockenshire
Dec 16, 2005

GOLLOCKS!

quote:

Thanks for your reply, you're one of the guys I had hoped would see this.
Crap, I've got a reputation now.

quote:

Number of rows inserted 5293, updated 982, deleted 380, read 257675
0.07 inserts/s, 0.00 updates/s, 0.09 deletes/s, 7.04 reads/s
That's a freaking pathetic number. There's no reason you should ever see any sort of performance problem out of this machine. If you had locking or deadlock issues, you'd see phrases like "LATEST DETECTED DEADLOCK".

The next time you feel like doing a dump/reload, turn on innodb_file_per_table and then convert *all* of the tables that you can over from MyISAM after the reload. The only tables that should remain MyISAM are those with fulltext indexes.
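The conversion itself is just an ALTER per table once the option is on; something like this (database and table names are hypothetical, obviously):
code:
-- in my.cnf under [mysqld], then restart mysqld:
--   innodb_file_per_table
-- then, for each MyISAM table without fulltext indexes:
ALTER TABLE yourdb.yourtable ENGINE=InnoDB;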

The intent here is only to concentrate the queries into an engine that stats can be pulled from. The next time you're seeing slow behavior, watch SHOW INNODB STATUS like a hawk. There's a list of open and ongoing transactions in there that might be revealing. If you're seeing multiple queries sitting and waiting, chances are that they're all blocking on one other query that's taking an abnormally long time with a resource.

There might be another thing going wrong here. This is totally speculation. What virtualization technology are you being subjected to? Many of them include the ability to throttle CPU and disk I/O once a threshold has been reached. Some of these throttles work by letting you have free rein for a certain amount of time, and then majorly clamping down on resources. Is it possible that you're bumping up against these limits?

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Huh, still going strong and fast. Here's my theory: last night I dropped the databases and then restored from a backup from the week before this started happening. It was still laggy as poo poo after I did that, and even first thing this morning, but for the last 5 hours it's been fine, which is the longest it's stayed fast since this started. I wonder if there was some index corruption/messup going on, but if that was the case you'd think it would have sped up last night when I imported the backup? Either that or queries took some time to get optimized properly, but other than that it is a mystery to me.

innodb_file_per_table was already enabled (one of the first things I did). I've been using SHOW PROCESSLIST as my primary diagnostic tool, but I didn't know about SHOW INNODB STATUS, which looks way better for this kind of thing.

I was (and still am) suspicious about the virtualization throttling as well, but it seems hard to credit when the databases are so small and the CPU load never gets above 5%. I could see it if this server was always running flat out, but their VM manager would have to be extremely screwed to redline the server over that. Another possibility I considered is that bad neighbors in the VM could be swamping the entire machine, but GoDaddy denies that.

Regardless, I am hesitantly, and tentatively, going to hope that this is now fixed. I've kind of told this client flat out 'I don't know enough about this stuff to be your on-call emergency guy for Linux/MySQL problems,' because I'm tired of getting midnight calls for questions I don't have the answer to. Sure, I charge them for the hours I'm researching and fixing, but it's just getting tiring, and to be frank, I have better things to do.

Sorry for the mini-rant; it's been quite a few sleepless nights over this problem. Thanks for your help though; I think what you've told me about SHOW INNODB STATUS will be key if the problem flares up again. Hopefully it doesn't (knock wood), but if it does I'll be back here again, like a dog returning to his sick.

Radbot
Aug 12, 2009
Probation
Can't post for 3 years!
I'm going to be interviewing for a data analyst position this week, and I just found out that SQL experience is considered a plus. Unfortunately, I don't have any experience with it.

I've previously worked with databases (SymphonyIRI and Nielsen NITRO POS) that handled syndicated consumer packaged goods sales information, but it's all done through either web or proprietary interfaces. Basically, I'm pretty familiar with setting up database queries, making pivot tables based on them, and all that basic analyst stuff, but I've never queried by hand with SQL before.

I've already run through the net tuts tutorial (http://net.tutsplus.com/tutorials/databases/sql-for-beginners-part-3/) but I'm wondering what else I should know for an analyst-level (as opposed to an administrator or developer level) understanding of SQL, at least enough to bluff my way through the interview while I continue to learn.

Radbot fucked around with this message at 02:58 on Jan 27, 2013

big trivia FAIL
May 9, 2003

"Jorge wants to be hardcore,
but his mom won't let him"

Radbot posted:

I'm going to be interviewing for a data analyst position this week, and I just found out that SQL experience is considered a plus. Unfortunately, I don't have any experience with it.

I've previously worked with databases (SymphonyIRI and Nielsen NITRO POS) that handled syndicated consumer packaged goods sales information, but it's all done through either web or proprietary interfaces. Basically, I'm pretty familiar with setting up database queries, making pivot tables based on them, and all that basic analyst stuff, but I've never queried by hand with SQL before.

I've already run through the net tuts tutorial (http://net.tutsplus.com/tutorials/databases/sql-for-beginners-part-3/) but I'm wondering what else I should know for an analyst-level (as opposed to an administrator or developer level) understanding of SQL, at least enough to bluff my way through the interview while I continue to learn.

Realistically, you should be able to get by if you know basic syntax, how joins work, and how to do grouping (GROUP BY). They might have you demonstrate a select statement after giving you example data from a couple of tables.
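For instance, a made-up exercise of the kind they might hand you:
code:
-- total sales per region, only for regions with at least 10 orders
SELECT r.region_name,
       COUNT(*)      AS order_count,
       SUM(o.amount) AS total_sales
FROM orders o
INNER JOIN regions r ON r.region_id = o.region_id
GROUP BY r.region_name
HAVING COUNT(*) >= 10
ORDER BY total_sales DESC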

Radbot
Aug 12, 2009
Probation
Can't post for 3 years!

-S- posted:

Realistically, you should be able to get by if you know basic syntax, how joins work, and how to do grouping (GROUP BY). They might have you demonstrate a select statement after giving you example data from a couple of tables.

After spending most of my weekend on SQLZoo and Learn SQL the Hard Way, I think I've got a decent handle on these concepts, including the different types of joins. Thanks for your response; I think I'll be going into this more relaxed than I had originally anticipated.

Ron Don Volante
Dec 29, 2012

Are there any good datasets online that SQL beginners can download and mess around with?

Hot Yellow KoolAid
Aug 17, 2012
I'm having a lot of trouble adding a column to a table and changing the NULL values inside to actual valid values. Here's my code so far:
code:
ALTER TABLE dept 
ADD status VARCHAR(20);

UPDATE dept
SET    status = 'OK'
WHERE  status = NULL;
The "Status" column gets added to the table just fine by the first statement, but I've been trying to tweak the second statement to update the null status for some time. Nothing has worked so far.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Hot Yellow KoolAid posted:

I'm having a lot of trouble adding a column to a table and changing the NULL values inside to actual valid values. Here's my code so far:
code:
ALTER TABLE dept 
ADD status VARCHAR(20);

UPDATE dept
SET    status = 'OK'
WHERE  status = NULL;
The "Status" column gets added to the table just fine by the first statement, but I've been trying to tweak the second statement to update the null status for some time. Nothing has worked so far.

NULL is not equal to anything; you need to use IS NULL and IS NOT NULL to test against NULL values.
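In other words, the second statement should read:
code:
UPDATE dept
SET    status = 'OK'
WHERE  status IS NULL;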

Beelzebozo
Nov 6, 2002

I don’t mean to sound bitter, cold, or cruel. But I am, so that’s how it comes out.

Ron Don Volante posted:

Are there any good datasets online that SQL beginners can download and mess around with?

You could always use TPC-H. That's the example schema with which academic database research is typically demonstrated. It deals with orders and line items and suppliers and customers, that sort of thing. A bit dry, but it provides a common example that would be good to learn anyway. Other than that, there is another example called Northwind that comes with MS SQL, of which you can download a free 'Express' version.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
MySQL: We have a table with start and end dates, and we want to be able to query this to collapse it into a single range. For example:

Row1: Start date 2011-01-01, end date 2011-12-31.
Row2: Start date 2012-02-01, end date 2012-03-31.
Row3: Start date 2012-04-01, end date 2012-12-31.
Row4: Start date 2013-01-01, end date null

I want to be able to query for today and say, "Get me the range in which this record is active." That would be Feb 1 2012 through the present (rows 2 through 4), but I don't just want to say "min(start_date) to max(end_date)" because that would include Row1 and thus the inactive range between Jan 1 and Jan 31 2012. Is this possible to do in MySQL?

salted hash browns
Mar 26, 2007
ykrop
What extensions are available for MySQL that enable active-active or active-passive clustering? So far I know of MySQL Cluster, Percona, and Galera. What else am I missing that I should consider?

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
code:
SELECT DISTINCT SomeColumn FROM SomeTable ORDER BY SomeColumn
Returns in 0.03s w/ 164 rows

code:
SELECT COUNT(*) FROM SomeTable WHERE SomeColumn = 'Particular Value'
Returns in 1.89s w/ a count of ~100k

code:
SELECT SomeColumn, COUNT(*) FROM SomeTable GROUP BY SomeColumn
Goes out to lunch, still nothing after 25 minutes

SomeTable is ~300m rows. If it can do query #2 so quickly, why is it having a hard time with query #3?

Also, there's an index on SomeColumn, but does it make sense to have one with so few distinct values? I guess it certainly helps query #1 perform better.

fletcher fucked around with this message at 20:54 on Feb 2, 2013

Pardot
Jul 25, 2001




Depends entirely on what database and version it is, and what plan your db is using. If your database offers it, look at the explain output.
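For the GROUP BY query above, that would be something like:
code:
EXPLAIN SELECT SomeColumn, COUNT(*) FROM SomeTable GROUP BY SomeColumn;
-- look at the type, key, and rows columns to see how the index is (or isn't) being used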

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Pardot posted:

Depends entirely on what database and version it is, and what plan your db is using. If your database offers it, look at the explain output.

MySQL 5.1

The explain says select_type=SIMPLE, type=index, possible_keys = null, key = SomeTable_SomeColumn_Index, key_len=243, rows=300000000

Aredna
Mar 17, 2007
Nap Ghost

Golbez posted:

MySQL: We have a table with start and end dates, and we want to be able to query this to collapse it into a single range. For example:

Row1: Start date 2011-01-01, end date 2011-12-31.
Row2: Start date 2012-02-01, end date 2012-03-31.
Row3: Start date 2012-04-01, end date 2012-12-31.
Row4: Start date 2013-01-01, end date null

I want to be able to query for today and say, "Get me the range in which this record is active." That would be Feb 1 2012 through the present (rows 2 through 4), but I don't just want to say "min(start_date) to max(end_date)" because that would include Row1 and thus the inactive range between Jan 1 and Jan 31 2012. Is this possible to do in MySQL?

What version of MySQL are you using? Do you know which window functions it supports (e.g. row_number, rank, lag, lead)? I'm normally an MS SQL guy, but the general idea here is that you want to find a way to compare each row to the prior or next row and determine whether they are in the same group or not. After that, you'll filter for just the group containing the current row, take the min start date from there, and use getdate() as your end date, since it's the current row.
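If it turns out you have no window functions at all, one common (if fragile) MySQL workaround is user variables. A rough sketch, assuming a table date_ranges(start_date, end_date) and leaning on MySQL's left-to-right variable evaluation and derived-table materialization, neither of which is strictly guaranteed:
code:
SELECT MIN(start_date) AS active_since
FROM (
  SELECT r.start_date,
         -- bump the group counter whenever a gap precedes this row
         @grp := @grp + (r.start_date > @prev_end + INTERVAL 1 DAY) AS grp,
         @prev_end := COALESCE(r.end_date, CURDATE())
  FROM date_ranges r
  CROSS JOIN (SELECT @grp := 0, @prev_end := DATE('1000-01-01')) init
  ORDER BY r.start_date
) numbered
WHERE grp = @grp  -- after materialization, @grp holds the last (current) group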

Polio Vax Scene
Apr 5, 2009



This weekend I requested an entire database be copied in SQL Server Management Studio and let it run over the weekend. When I went to check on it this morning, out of habit I clicked Close on the progress box before looking at why it failed. Now there's no copy and I don't know how to check why it failed. Is that info stored somewhere?

Atoramos
Aug 31, 2003

Jim's now a Blind Cave Salamander!


This is probably a stupid question, but I've banged against this query for an inordinate amount of time now.

I have a table set up like this:
code:
1	394	2013-01-16 00:00:00.000
2	394	2013-02-01 00:00:00.000
3	394	2013-02-04 00:00:00.000
4	60	2013-01-30 00:00:00.000
5	203	2013-01-22 00:00:00.000
6	203	2013-01-28 00:00:00.000
I want a select to return this:

code:
394	2013-02-04, 2013-02-01, 2013-01-16
60	2013-01-30
203	2013-01-28, 2013-01-22
I have no idea how to get a comma-delimited return like that though. I've heard PIVOT, CONCAT, and COALESCE mentioned as places to start, but I've been playing with those commands for a while now and haven't made much progress.

Pardot
Jul 25, 2001




select yoursecondcolumn, array_to_string(array_agg(yourthirdcolumn), ', ') from yourtable group by yoursecondcolumn;

Atoramos
Aug 31, 2003

Jim's now a Blind Cave Salamander!


Pardot posted:

select yoursecondcolumn, array_to_string(array_agg(yourthirdcolumn), ', ') from yourtable group by yoursecondcolumn;

Thanks a ton, but unfortunately I'm getting "'array_agg' is not a recognized built-in function name."

mobby_6kl
Aug 9, 2009

by Fluffdaddy
I think that's a DB2 and Postgres function, while Atoramos is probably using SQL Server (based on Pivot).

Is this a one-time thing? When I needed to do something like that, I just pulled the data into my local Postgres server, which can do that, or even use concat directly as an aggregate function. This ended up being much less painful than dicking around with some weird hacks.

kalthir
Mar 15, 2012

You can use FOR XML PATH. The example here should get you on the right track, although I'm pretty sure it's pretty slow and inefficient because of the subqueries.

http://sqlandme.com/2011/04/27/tsql-concatenate-rows-using-for-xml-path/
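Applied to your table, the pattern from that link looks roughly like this (column names guessed from your sample data):
code:
SELECT t.some_id,
       STUFF((SELECT ', ' + CONVERT(varchar(10), t2.some_date, 120)
              FROM YourTable t2
              WHERE t2.some_id = t.some_id
              ORDER BY t2.some_date DESC
              FOR XML PATH('')), 1, 2, '') AS dates  -- STUFF strips the leading ', '
FROM YourTable t
GROUP BY t.some_id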

Atoramos
Aug 31, 2003

Jim's now a Blind Cave Salamander!


mobby_6kl posted:

I think that's a DB2 and Postgres function, while Atoramos is probably using SQL Server (based on Pivot).

Is this a one-time thing? When I needed to do something like that, I just pulled the data into my local Postgres server which can do that, or even use concat directly as an aggregate function. This ended up being much less painful then dicking around with some weird hacks.

Sadly this isn't a one-time thing. Currently I've got this server-based code which turns a SQL query into an Excel table. Screwing with the query-to-table code is a nightmare, but I need to get a cell to have that comma-delimited date list. If I can manage to get it in a select statement, I should be all set, but you can see why I've been tearing my hair out to get this working just so.

Edit: I ended up using the FOR XML method. Thanks a lot!

Atoramos fucked around with this message at 14:41 on Feb 11, 2013

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


kalthir posted:

You can use FOR XML PATH. The example here should get you on the right track, although I'm pretty sure it's pretty slow and inefficient because of the subqueries.

http://sqlandme.com/2011/04/27/tsql-concatenate-rows-using-for-xml-path/

Came in to suggest this.

glompix
Jan 19, 2004

propane grill-pilled
You can also flatten data like that with a recursive CTE, but I can't remember how; recursive queries hurt my brain. I've always just used FOR XML PATH('') as well, but it's an option.

http://www.codeproject.com/Articles/21082/Concatenate-Field-Values-in-One-String-Using-CTE-i
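Something like this might be the shape of it, using the same guessed column names as in the FOR XML example above; it numbers rows per group and appends one value per recursion step (untested, so treat it as a sketch):
code:
WITH numbered AS (
  SELECT some_id, some_date,
         ROW_NUMBER() OVER (PARTITION BY some_id ORDER BY some_date DESC) AS rn,
         COUNT(*)     OVER (PARTITION BY some_id) AS cnt
  FROM YourTable
), built AS (
  -- anchor: start each group's string with its first date
  SELECT some_id, rn, cnt,
         CAST(CONVERT(varchar(10), some_date, 120) AS varchar(max)) AS dates
  FROM numbered
  WHERE rn = 1
  UNION ALL
  -- recursive step: append the next date in the group
  SELECT n.some_id, n.rn, n.cnt,
         b.dates + ', ' + CONVERT(varchar(10), n.some_date, 120)
  FROM numbered n
  INNER JOIN built b ON b.some_id = n.some_id AND n.rn = b.rn + 1
)
SELECT some_id, dates
FROM built
WHERE rn = cnt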

Chuu
Sep 11, 2004

Grimey Drawer
I've been experimenting with some of the open source column stores that use MySQL as a frontend to their engine (Infobright and InfiniDB mainly). Every once in a while I need to write a query that's literally "give me everything in the table" to load into an external analytic package. These queries have been shockingly slow: 17.5M rows x 60 columns (mostly Int with a couple of Long) takes about 8 minutes to transfer from the database to my C# application running locally on the server -- with all the data in the server's cache. This is much worse than MSSQL transferring over a network.

Does anyone know the fastest way to get an entire table dumped from MySQL into a custom application programmatically? Right now I'm just using Connector/NET and iterating with a DataReader, using the default settings on the client side.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

fletcher posted:

MySQL 5.1

The explain says select_type=SIMPLE, type=index, possible_keys = null, key = SomeTable_SomeColumn_Index, key_len=243, rows=300000000

Still trying to figure this one out. In my testing I noticed an ALTER TABLE xyz ENABLE KEYS takes 1.5 hours if the charset is utf8, and only 0.5 hours if the charset is latin1. It makes sense to me why it would take longer for utf8, but I wasn't sure if there are some parameters that can be tuned to get them to similar performance?

big trivia FAIL
May 9, 2003

"Jorge wants to be hardcore,
but his mom won't let him"

kalthir posted:

You can use FOR XML PATH. The example here should get you on the right track, although I'm pretty sure it's pretty slow and inefficient because of the subqueries.

http://sqlandme.com/2011/04/27/tsql-concatenate-rows-using-for-xml-path/

This is exactly right and I use that poo poo way more than is appropriate in an enterprise setting

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

kalthir posted:

You can use FOR XML PATH. The example here should get you on the right track, although I'm pretty sure it's pretty slow and inefficient because of the subqueries.

http://sqlandme.com/2011/04/27/tsql-concatenate-rows-using-for-xml-path/
If you are allowed to turn on CLR integration, you can create a CLR aggregate function.

I created a pair of CLR aggregate functions, one that used ', ' and one that used carriage returns.

In SQL 2008 you can create aggregate functions with multiple inputs, but you have to do a little jiggery-pokery to get it to deploy if you're still using Visual Studio 2008. It is apparently fixed in VS2010.
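The T-SQL side of wiring one up looks something like this (the assembly path and class name here are hypothetical):
code:
-- prerequisite: sp_configure 'clr enabled', 1; RECONFIGURE;
CREATE ASSEMBLY StringAggregates
FROM 'C:\clr\StringAggregates.dll'
GO
-- the class has to implement the SqlUserDefinedAggregate contract
CREATE AGGREGATE dbo.ConcatComma (@value nvarchar(max))
RETURNS nvarchar(max)
EXTERNAL NAME StringAggregates.Concatenate
GO
SELECT some_id, dbo.ConcatComma(CONVERT(varchar(10), some_date, 120)) AS dates
FROM YourTable
GROUP BY some_id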

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
I'm working on storing all iterations of something, for record keeping. So I have a record with an ID, information on the record, and finally start_date and end_date. Any time a change is made, what will really happen is that a new record will be created with the new information, a start_date of NOW(), and a null end_date. Once made, apart from the end_date, a record should never be altered.

But then I was thinking - these records aren't designed to be deleted or ended. There will never be an "end_date". The current record's end_date will always be null. The previous record's end_date, therefore, will always be the next record's start_date.

So I wondered: Do I even need an end_date column in this case? To get the current record I could just do SELECT ... WHERE NOW() > start_date ORDER BY start_date DESC LIMIT 1, right? I don't need to do ... WHERE end_date IS NULL. And, likewise, if I was trying to find what it was like on January 1, I would run ... WHERE start_date < "2013-01-01" ORDER BY start_date DESC LIMIT 1 instead of ... WHERE start_date < "2013-01-01" AND COALESCE(end_date, NOW()) > "2013-01-01". Right?

An additional question: should the creation of these records be handled in the PHP code or in a MySQL trigger, so that all the code thinks it's doing is running an update?

Also, to answer an earlier response:

Aredna posted:

What version of MySQL are you using? Do you know which window functions it supports (e.g. row_number, rank, lag, lead)? I'm normally an MS SQL guy, but the general idea here is that you want to find a way to compare each row to the prior or next row and determine whether they are in the same group or not. After that, you'll filter for just the group containing the current row, take the min start date from there, and use getdate() as your end date, since it's the current row.

Version 5.5. I don't know much about the window functions. One of my co-coders found a snippet that might work, but I haven't tested it yet.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Golbez posted:

So I wondered: Do I even need an end_date column in this case? To get the current record I could just do SELECT ... WHERE NOW() > start_date ORDER BY start_date DESC LIMIT 1, right? I don't need to do ... WHERE end_date IS NULL. And, likewise, if I was trying to find what it was like on January 1, I would run ... WHERE start_date < "2013-01-01" ORDER BY start_date DESC LIMIT 1 instead of ... WHERE start_date < "2013-01-01" AND COALESCE(end_date, NOW()) > "2013-01-01". Right?

Sounds good.

quote:

An additional question: should the creation of these records be handled in the PHP code, or in a MySQL trigger, and all the code thinks it's doing is running an update?

I'm inclined to say stored procedures for each possible change of attributes that might occur in a revision, but I may be overly fond of stored procedures, and it might not be practical if there are several attributes of the modelled object and several of them might change at a time. I guess triggers should work.

hewerman
Jun 5, 2008

Golbez posted:

So I wondered: Do I even need an end_date column in this case? To get the current record I could just do SELECT ... WHERE NOW() > start_date ORDER BY start_date DESC LIMIT 1, right? I don't need to do ... WHERE end_date IS NULL. And, likewise, if I was trying to find what it was like on January 1, I would run ... WHERE start_date < "2013-01-01" ORDER BY start_date DESC LIMIT 1 instead of ... WHERE start_date < "2013-01-01" AND COALESCE(end_date, NOW()) > "2013-01-01". Right?


Are you worried about storage or performance? I guess my question is why remove end_date if it doesn't greatly impact performance or storage. Will this data ever be shown in a report? For example, say you want to show each value of an iteration and the dates it was active. Displaying it in a report the way you are suggesting will require some extra query gymnastics to get the range, whereas if you just leave the end date in, it would be simple.

What you've described definitely sounds like it will work, and may be just what you need. I'm not sure how you will be using this data, but as a BI developer those would be my concerns.
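For what it's worth, the gymnastics aren't horrible: you derive each row's end date from the next row's start date. A sketch, assuming a history table revision(record_id, start_date, ...):
code:
SELECT r.record_id,
       r.start_date,
       (SELECT MIN(r2.start_date)
        FROM revision r2
        WHERE r2.record_id = r.record_id
          AND r2.start_date > r.start_date) AS derived_end_date  -- NULL for the current row
FROM revision r
ORDER BY r.record_id, r.start_date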

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

hewerman posted:

Are you worried about storage or performance? I guess my question is why remove end_date if it doesn't greatly impact performance or storage. Will this data ever be shown in a report? For example, say you want to show each value of an iteration and the dates it was active. Displaying it in a report the way you are suggesting will require some extra query gymnastics to get the range, whereas if you just leave the end date in, it would be simple.

What you've described definitely sounds like it will work, and may be just what you need. I'm not sure how you will be using this data, but as a BI developer those would be my concerns.

I like clean data. It has nothing to do with space or performance and everything to do with having a clean schema that won't be confusing or become inconsistent.

Good point, though - it requires more than a simple query to find the whole date range a record is active.

Beelzebozo
Nov 6, 2002

I don’t mean to sound bitter, cold, or cruel. But I am, so that’s how it comes out.
I normally recommend against separately storing an end_date in cases where a set of date ranges should form a full cover, such as an audit of status changes. Not because of any storage redundancy or performance concern, but because storing both dates means humans have to maintain the full-cover property, which opens up the possibility of somebody creating overlapping ranges, or parts of the time series not covered by any range. That creates an unknown-semantics scenario that software or reports might not expect.
