dwayne_dibbley
Nov 26, 2005
I have a table that stores the history of status changes for accounts but have some bad rows in there, here is a simplified example:
code:
account_no	status	valid_from	valid_to
1		A	01-JAN-2009	22-JAN-2009
1		B	23-JAN-2009	01-FEB-2009
1		C	02-FEB-2009	12-FEB-2009
1		B	13-FEB-2009	14-FEB-2009
1		B	15-FEB-2009	22-FEB-2009
1		B	23-FEB-2009	03-MAR-2009
1		A	04-MAR-2009	17-MAR-2009
Those 3 consecutive rows for status B should be combined into one row like so:
code:
account_no	status	valid_from	valid_to
1		A	01-JAN-2009	22-JAN-2009
1		B	23-JAN-2009	01-FEB-2009
1		C	02-FEB-2009	12-FEB-2009
1		B	13-FEB-2009	03-MAR-2009
1		A	04-MAR-2009	17-MAR-2009
(This problem applies to many account_no values and different statuses)

Anyone have any idea how to do this row combination in SQL? I can do it programmatically using row-by-row comparisons, but for a 22 million row table it will take many hours...

Thanks.

Vanadium Dame
May 29, 2002

HELLO I WOULD LIKE TO TALK TO YOU ABOUT MY VERY STRONG OPINIONS

What flavor of db are you using? If it's Oracle you could use the max() and min() partition analytic function dark magic on the valid_from and valid_to columns to get what you're looking for I think. http://psoug.org/reference/analytic_functions.html
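
For reference, the basic shape of those calls is something like the following (just a sketch, with the table name guessed from the sample data; on its own it still lumps non-consecutive runs of the same status together, so some extra trick such as a run counter is needed on top):
code:
SELECT account_no,
       status,
       MIN(valid_from) OVER (PARTITION BY account_no, status) AS first_valid_from,
       MAX(valid_to)   OVER (PARTITION BY account_no, status) AS last_valid_to
  FROM account_status;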

Also, nice user name.

Vanadium Dame fucked around with this message at 21:04 on Sep 5, 2009

Xae
Jan 19, 2005

dwayne_dibbley posted:

I have a table that stores the history of status changes for accounts but have some bad rows in there, here is a simplified example:
code:
account_no	status	valid_from	valid_to
1		A	01-JAN-2009	22-JAN-2009
1		B	23-JAN-2009	01-FEB-2009
1		C	02-FEB-2009	12-FEB-2009
1		B	13-FEB-2009	14-FEB-2009
1		B	15-FEB-2009	22-FEB-2009
1		B	23-FEB-2009	03-MAR-2009
1		A	04-MAR-2009	17-MAR-2009
Those 3 consecutive rows for status B should be combined into one row like so:
code:
account_no	status	valid_from	valid_to
1		A	01-JAN-2009	22-JAN-2009
1		B	23-JAN-2009	01-FEB-2009
1		C	02-FEB-2009	12-FEB-2009
1		B	13-FEB-2009	03-MAR-2009
1		A	04-MAR-2009	17-MAR-2009
(This problem applies to many account_no values and different statuses)

Anyone have any idea how to do this row combination in SQL? I can do it programmatically using row-by-row comparisons, but for a 22 million row table it will take many hours...

Thanks.

code:
SELECT account_no, status, min(valid_from), max(valid_to)
  FROM account_status
GROUP BY account_no, status;

dwayne_dibbley
Nov 26, 2005

Markoff Chaney posted:

What flavor of db are you using? If it's Oracle you could use the max() and min() partition analytic function dark magic

Yeah, Oracle. I've been trying to use the analytical functions but so far no success... :-(

dwayne_dibbley
Nov 26, 2005

Xae posted:

code:
SELECT account_no, status, min(valid_from), max(valid_to)
  FROM account_status
GROUP BY account_no, status;

Thanks, unfortunately this combines too many of the rows. The status A rows are combined when they shouldn't be, and the first B shouldn't be touched either.

ether
May 20, 2001

dwayne_dibbley posted:

...

Anyone have any idea how to do this row combination in SQL? I can do it programmatically using row-by-row comparisons, but for a 22 million row table it will take many hours...

Thanks.


Something like this might work; I haven't done any serious PL/SQL ever and have no Oracle nearby to test it.

code:
DECLARE last_status varchar(1);
DECLARE collapser number;

SELECT 
    account_no, 
    status, 
    min(valid_from), 
    max(valid_to)
  FROM 
  (
    SELECT 
      account_no, 
      status, 
      valid_from, 
      valid_to, 
      collapser := CASE WHEN last_status <> status THEN collapser + 1 ELSE collapser END,
      last_status := status
    FROM
    (
      SELECT 
        account_no, 
        status, 
        valid_from, 
        valid_to      
      FROM account_status
    )
    GROUP BY account_no, status, collapser
  )

Sneaking Mission
Nov 11, 2008

dwayne_dibbley posted:

Thanks, unfortunately this combines too many of the rows. The status A rows are combined when they shouldn't be, and the first B shouldn't be touched either.

This should work assuming that all rows have a non-null valid_from value. It makes liberal use of the lag and lead analytic functions to check the previous and next statuses to determine if the records are sequential.

code:
select account_number, status, valid_from, valid_to
from (
    select account_number, status, valid_from,
        case when valid_to is null then lead(valid_to, 1, null) over (partition by account_number order by valid_from)
        else valid_to end as valid_to
    from (
        select account_number, status,
            case when prev_status=status then null else valid_from end as valid_from,
            case when next_status=status then null else valid_to end as valid_to
        from (
            select account_number, status, valid_from, valid_to,
                lag(status, 1, NULL) over (partition by account_number order by valid_from) as prev_status,
                lead(status, 1, NULL) over (partition by account_number order by valid_from) as next_status
            from sa_test
        )
    )
    where valid_from is not null or valid_to is not null
)
where valid_from is not null

dwayne_dibbley
Nov 26, 2005

Clone5 posted:

<Snip very cool SQL>
Holy crap, that worked!

(I had to make one very small correction to the outer-most lead function: since the valid_from column now had nulls in it, I replaced the order by valid_from with an explicit row_number() ordering.)

code:
select account_no, status, valid_from, valid_to
from
(
    select account_no, status, valid_from,
        case when valid_to is null then lead(valid_to, 1, null) over (partition by account_no order by row_number)
        else valid_to end as valid_to
    from (
        select account_no, status,
            case when prev_status=status then null else valid_from end as valid_from,
            case when next_status=status then null else valid_to end as valid_to,
            row_number
        from (
            select account_no, valid_from, valid_to,
                lag(status, 1, NULL) over (partition by account_no order by valid_from) as prev_status,
                status,
                lead(status, 1, NULL) over (partition by account_no order by valid_from) as next_status,
                row_number() over (partition by account_no order by valid_from) as row_number
            from sa_test
             )
          )
    where valid_from is not null or valid_to is not null
)
where valid_from is not null
order by valid_from
Well, it worked on my little test case, I'll have a go at my real 22 million row table now.

But man... I am drat impressed. I had nearly given up!

ps. thanks to the previous 3 posters for taking the time to reply.

Beardless Woman
May 5, 2004

M for Mysterious
We're using MySQL and I'm having the hardest time getting it to work the way I want it to.

There's a table with 3 relevant fields: machineID, switch, and port.
code:
machineID    switch    port
590            506      0
593            506      0
594            506      0
595            506      0
596            506      1
I'm trying to check to see if any of these are recorded as being on the same switch and port.

I managed to hack together something using a temporary table, but I'd rather it be done all at once.
code:
CREATE TEMPORARY TABLE machine_tmp
  SELECT   machineID,
           switch,
           port
  FROM     machine
  GROUP BY switch,
           port
  HAVING   COUNT(*) > 1;

SELECT   machine.machineID,
         machine.switch,
         machine.port
FROM     machine_tmp,
         machine
WHERE    machine_tmp.switch = machine.switch
         AND machine_tmp.port = machine.port
ORDER BY switch,
         port; 

+-----------+--------+------+
| machineID | switch | port |
+-----------+--------+------+
| 590       |   506  |    0 | 
| 593       |   506  |    0 | 
| 594       |   506  |    0 | 
| 595       |   506  |    0 | 
+-----------+--------+------+
I was able to get it to work all in one statement, but I can't seem to get the output on it "right." I'd much rather have a list similar to the first one but how to get it there is completely evading me.
code:
SELECT   t1.machineID AS machine1,
         t2.machineID AS machine2,
         t2.switch,
         t2.port
FROM     machine AS t1,
         machine AS t2
WHERE    (t1.switch = t2.switch
          AND t1.port = t2.port
          AND t1.machineID < t2.machineID)
GROUP BY t1.machineID
ORDER BY t2.switch,
         t2.port,
         t1.machineID; 

+----------+----------+--------+------+
| machine1 | machine2 | switch | port |
+----------+----------+--------+------+
| 590      | 593      | 506    |    0 | 
| 590      | 594      | 506    |    0 | 
| 590      | 595      | 506    |    0 |
+----------+----------+--------+------+
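
For reference, the temporary table above can be folded into a single statement as a derived table. An untested sketch against the same machine table:
code:
SELECT m.machineID,
       m.switch,
       m.port
FROM   machine AS m
       JOIN (SELECT switch,
                    port
             FROM   machine
             GROUP  BY switch,
                       port
             HAVING COUNT(*) > 1) AS dupes
         ON dupes.switch = m.switch
            AND dupes.port = m.port
ORDER  BY m.switch,
          m.port,
          m.machineID;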

Beardless Woman fucked around with this message at 17:05 on Sep 8, 2009

MikeRabsitch
Aug 23, 2004

Show us what you got, what you got

Beardless Woman posted:

We're using MySQL and I'm having the hardest time getting it to work the way I want it to.

There's a table with 3 relevant fields: machineID, switch, and port.
code:
machineID    switch    port
590            506      0
593            506      0
594            506      0
595            506      0
596            506      1
I'm trying to check to see if any of these are recorded as being on the same switch and port.

I managed to hack together something using a temporary table, but I'd rather it be done all at once.
code:
CREATE TEMPORARY TABLE machine_tmp
  SELECT   machineID,
           switch,
           port
  FROM     machine
  GROUP BY switch,
           port
  HAVING   COUNT(*) > 1;

SELECT   machine.machineID,
         machine.switch,
         machine.port
FROM     machine_tmp,
         machine
WHERE    machine_tmp.switch = machine.switch
         AND machine_tmp.port = machine.port
ORDER BY switch,
         port; 

+-----------+--------+------+
| machineID | switch | port |
+-----------+--------+------+
| 590       |   506  |    0 | 
| 593       |   506  |    0 | 
| 594       |   506  |    0 | 
| 595       |   506  |    0 | 
+-----------+--------+------+
I was able to get it to work all in one statement, but I can't seem to get the output on it "right." I'd much rather have a list similar to the first one but how to get it there is completely evading me.
code:
SELECT   t1.machineID AS machine1,
         t2.machineID AS machine2,
         t2.switch,
         t2.port
FROM     machine AS t1,
         machine AS t2
WHERE    (AND t1.switch = t2.switch
          AND t1.port = t2.port
          AND t1.machineID < t2.machineID)
GROUP BY t1.machineID,
         t2.machineID
ORDER BY t2.switch,
         t2.port,
         t1.machineID; 

+----------+----------+--------+------+
| machine1 | machine2 | switch | port |
+----------+----------+--------+------+
| 590      | 593      | 506    |    0 | 
| 590      | 594      | 506    |    0 | 
| 590      | 595      | 506    |    0 |
+----------+----------+--------+------+

code:
SELECT   t1.machineID AS machine,
         t2.switch,
         t2.port
FROM     machine AS t1,
         machine AS t2
WHERE    (AND t1.switch = t2.switch
          AND t1.port = t2.port
          AND t1.machineID <> t2.machineID)
GROUP BY t1.machineID,
ORDER BY t2.switch,
         t2.port,
         t1.machineID; 
Can't test it, would that work?

blueberrypudding
Sep 3, 2009

Knightmare posted:

code:
WHERE    (AND t1.switch = t2.switch
          AND t1.port = t2.port
          AND t1.machineID <> t2.machineID)
Should the first "AND" be there?

blueberrypudding fucked around with this message at 14:59 on Sep 9, 2009

Beardless Woman
May 5, 2004

M for Mysterious
You are both absolutely correct. Thank you so much!

reworp
Apr 28, 2006

So I'm working with a medium-sized (~400k row) database of scores. The idea is that every x minutes, a PHP script inserts a bunch of new scores into the database, then calculates the new ranks of each score, handling ties, and inserts that value into the database. Here's the code I was using to do this:

code:
UPDATE scoretable
SET    rankscore = (SELECT rankscore
                    FROM   (SELECT   t1.id,
                                     Count(t2.score) AS rankscore
                            FROM     scoretable AS t1,
                                     scoretable AS t2
                            WHERE    t1.score < t2.score
                                      OR (t1.score = t2.score
                                          AND t1.id = t2.id)
                            GROUP BY t1.id,
                                     t1.score
                            ORDER BY t1.score ASC,
                                     t1.id ASC) AS derivedtable
                    WHERE  derivedtable.id = scoretable.id)
WHERE  EXISTS (SELECT rankscore
               FROM   (SELECT   t1.id,
                                Count(t2.score) AS rankscore
                       FROM     scoretable AS t1,
                                scoretable AS t2
                       WHERE    t1.score < t2.score
                                 OR (t1.score = t2.score
                                     AND t1.id = t2.id)
                       GROUP BY t1.id,
                                t1.score
                       ORDER BY t1.score ASC,
                                t1.id ASC) AS derivedtable
               WHERE  derivedtable.id = scoretable.id) 
This works great when I test it on a tiny table... however, when I tried to run it on a (copy) of the real score table with 400k rows, it ended up loving up the table and causing a whole mess of problems. Guessing the server ran out of memory because of those derived tables?

I have some other ideas on how to do this, like creating a temporary table then getting the new rank from there, but I'm a bit hesitant to do anything after the debacle that this code caused (it's not my server, I'm doing this as semi-contract work for a friend of a friend) without being 100% sure it'll at least not completely gently caress up a table... any ideas?

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
That query doesn't look so hot (given its length alone). Can you describe how your rank is calculated?

reworp
Apr 28, 2006

Yeah, I just tried running that query again (learning from my mistakes, I exported the database to my own XAMPP server so I had permission to kill the query after it stalled) and same thing happened, so that query's not usable.

The scores are times, so lower is better, but other than that it's what you'd expect: if A's time is 1.2, B's is 1.5, C's is 1.5, and D's is 2.0, A would be rank 1, B and C would both be rank 2, and D would be rank 4. The ranks themselves are calculated in the query, which I pulled from http://www.1keydata.com/sql/sql-rank.html. Again, works great for small tables, but not for something as large as this one.

edit: Decided to go the simple route and just use order by/variable incrementing:

code:
SET @rownum := 0;

UPDATE scoretable
SET    rankscore = (SELECT @rownum := @rownum + 1)
ORDER BY score ASC;
This approach is fast but doesn't give scores with equal values the same rank... any way to work around that?

reworp fucked around with this message at 21:58 on Sep 10, 2009

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

reworp posted:

The scores are times, so lower is better, but other than that it's what you'd expect: if A's time is 1.2, B's is 1.5, C's is 1.5, and D's is 2.0, A would be rank 1, B and C would both be rank 2, and D would be rank 4.

In other words, a person's rank is exactly equal to the number of people with strictly lower times, plus one.

code:
UPDATE scoretable
  SET rankscore = (SELECT COUNT(*)
                     FROM scoretable AS sub
                     WHERE sub.score < scoretable.score) + 1
Edit: Better standard SQL compliance.

ShoulderDaemon fucked around with this message at 22:39 on Sep 10, 2009

reworp
Apr 28, 2006

ShoulderDaemon posted:

In other words, a person's rank is exactly equal to the number of people with strictly lower times, plus one.

code:
UPDATE scoretable AS parent
  SET rankscore = (SELECT COUNT(*)
                     FROM scoretable AS sub
                     WHERE sub.score < parent.score) + 1

This is true, but that method is also very, very slow, since you're doing a SELECT COUNT(*) for 400k entries. It was actually how the rank was being calculated before I started working on this. Also, that query gives an error when I tried to run it, "#1093 - You can't specify target table 'parent' for update in FROM clause"

Edit: Your new SQL gives the same error. I believe in MySQL you aren't allowed to reference the table that you're updating in derived tables
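
(The usual dodge for error 1093 seems to be a multi-table UPDATE joining to a derived table, which MySQL materializes — an untested sketch below; it still performs the quadratic count, so it's unlikely to be any faster.)
code:
UPDATE scoretable AS s
       JOIN (SELECT t1.id,
                    COUNT(t2.id) + 1 AS rnk
             FROM   scoretable AS t1
                    LEFT JOIN scoretable AS t2
                           ON t2.score < t1.score
             GROUP  BY t1.id) AS ranked
         ON ranked.id = s.id
SET    s.rankscore = ranked.rnk;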

reworp fucked around with this message at 22:49 on Sep 10, 2009

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender
I just built a synthetic 0.5mil rowset in Postgres 8.3 to look at that. With an index on scores, the query planner is just touching each row in the score table once, at the top, and to determine the count it looks up the score of that row in the index, then uses a single index check to find the count of lower scores. That's 0.5mil index hits, and 0.5mil row hits, which isn't that bad.

You could do better by walking the index manually in order and maintaining your own next-rank counter, but I don't know if your database lets you write custom index queries. That doesn't actually reduce your hitcount at all, but it preserves locality on the index better, so there's some small constant factor speedup.

If you want to do it with a row walk instead of an index walk, it's similar to what you had before with the @rownum counter, but you need two state variables: The last rank assigned, and the score of that rank. You walk the table in score order, increment the rank variable if the score of the current row is greater than the previous score, then assign the rank. Probably clearest in a stored procedure. That gives you 0.5mil row hits and however many index hits are needed to sort your rows, which if you're clustered on the score index is cheaper than what I had, but otherwise is more expensive.

Edit: Oh, you're in MySQL, no wonder it's slow. You'll probably just want to cluster on a score index and do the row walk yourself as I wrote above; otherwise you'll just be fighting with the terrible query planner forever.
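
Very roughly, that row walk in MySQL user-variable form might look like the sketch below (untested; it leans on left-to-right evaluation of the variable assignments and on the derived table coming back in score order, neither of which is strictly guaranteed). The resulting rankscore values would then still have to be written back to the table.
code:
SET @row := 0, @rank := 0, @prev := NULL;

SELECT id,
       score,
       @row  := @row + 1                         AS row_num,    -- position in score order
       @rank := IF(score <=> @prev, @rank, @row) AS rankscore,  -- ties keep the previous rank
       @prev := score                            AS prev_score
FROM   (SELECT id, score FROM scoretable ORDER BY score) AS ordered;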

ShoulderDaemon fucked around with this message at 23:08 on Sep 10, 2009

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

reworp posted:

This is true, but that method is also very, very slow, since you're doing a SELECT COUNT(*) for 400k entries. It was actually how the rank was being calculated before I started working on this. Also, that query gives an error when I tried to run it, "#1093 - You can't specify target table 'parent' for update in FROM clause"

Edit: Your new SQL gives the same error. I believe in MySQL you aren't allowed to reference the table that you're updating in derived tables

code:
CREATE PROCEDURE CalculateRanks ()
LANGUAGE SQL
NOT DETERMINISTIC
MODIFIES SQL DATA
SQL SECURITY DEFINER
BEGIN
    DECLARE Finished TINYINT DEFAULT 0;
    DECLARE NewScore INT;
    DECLARE OldScore INT;
    DECLARE RowID INT;
    DECLARE RecordedRank INT DEFAULT 1;  -- rank written to the row (shared by tied scores)
    DECLARE RealRank INT DEFAULT 0;      -- running row count in score order
    DECLARE Cur CURSOR FOR SELECT score, id FROM scoretable ORDER BY score ASC;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET Finished = 1;
    OPEN Cur;
    REPEAT
        SET OldScore = NewScore;
        FETCH Cur INTO NewScore, RowID;
        IF Finished = 0 THEN
            SET RealRank = RealRank + 1;
            IF NewScore <> OldScore THEN
                SET RecordedRank = RealRank;  -- score changed, so the rank catches up to the row count
            END IF;
            UPDATE scoretable SET rankscore = RecordedRank WHERE id = RowID;
        END IF;
    UNTIL Finished = 1 END REPEAT;
    CLOSE Cur;
END
I have not tested this and am not certain of all the syntax, use at own risk

(Once defined it is executed by issuing CALL CalculateRanks())

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Also double check to make sure your engine doesn't already provide ranking functions (SQL Server does).
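
For example, with a windowed RANK() (SQL Server 2005+, Oracle, and others) the whole job reduces to roughly this, reusing the scoretable names from above:
code:
SELECT id,
       score,
       RANK() OVER (ORDER BY score ASC) AS rankscore
FROM   scoretable;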

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!
I was wondering how one might query for trends in data that has a date field on each record. For example, is it possible, in one query, to find all records that have shown a day-to-day net increase in a certain field for the past 3 days? And could such a query cope with holes in the date records, so that it skips over a gap and compares only the consecutive days actually recorded, rather than consecutive calendar days?

Perhaps an example:
code:
Name     Date          Data
A        09-01-2009    1
B        09-01-2009    10
A        09-02-2009    2
B        09-02-2009    12
A        09-03-2009    3
B        09-03-2009    9
A        09-07-2009    4
B        09-07-2009    20
A        09-08-2009    5
B        09-08-2009    6
I'd like to somehow form a query that'll let me know of a trend in A. I figured I'd just query the whole table in chunks and look through it manually, but I hoped for something on the DB end that could take care of the grunt work.

Squashy Nipples
Aug 18, 2007

Can someone please explain how I hosed up?

This works:

code:
SELECT
 MSE_SEC_NM AS SEC_NAME,
 COUNT(DISTINCT MALPHLMH.MAL_INVR_ID) AS PARTSBAL
 FROM MALPHLMH, MSECURTY
 WHERE ( MALPHLMH.MAL_SEC_ID = MSECURTY.MSE_SEC_ID )
 AND MALPHLMH.MAL_MTH_STR_DT = '03/01/2009' 
 AND MALPHLMH.MAL_DCP_NBR IN (planlist)
 AND MALPHLMH.MAL_HLDG_BL > 0
 GROUP BY MSE_SEC_NM
This does NOT:

code:
SELECT
 MSE_SEC_NM AS SEC_NAME,
 COUNT(DISTINCT MALPHLMH.MAL_INVR_ID) AS PARTSBAL
 FROM MALPHLMH, MSECURTY
 WHERE ( MALPHLMH.MAL_SEC_ID = MSECURTY.MSE_SEC_ID )
 AND MALPHLMH.MAL_MTH_STR_DT = '03/01/2009' 
 AND MALPHLMH.MAL_DCP_NBR IN (planlist)
 GROUP BY MSE_SEC_NM
 HAVING SUM(MALPHLMH.MAL_HLDG_BL) > 0
In the second case, PARTSBAL (the number of distinct participants with a balance in a specific Security) is overstated by almost double.

I mean, I understand that WHERE happens before the GROUPing and HAVING happens after, but I guess the order is hosed up in my head...

MoNsTeR
Jun 29, 2002

Rocko Bonaparte posted:

I was wondering how one might query for trends in data that has a date field on each record. For example, is it possible, in one query, to find all records that have shown a day-to-day net increase in a certain field for the past 3 days? And could such a query cope with holes in the date records, so that it skips over a gap and compares only the consecutive days actually recorded, rather than consecutive calendar days?

Perhaps an example:
code:
Name     Date          Data
A        09-01-2009    1
B        09-01-2009    10
A        09-02-2009    2
B        09-02-2009    12
A        09-03-2009    3
B        09-03-2009    9
A        09-07-2009    4
B        09-07-2009    20
A        09-08-2009    5
B        09-08-2009    6
I'd like to somehow form a query that'll let me know of a trend in A. I figured I'd just query the whole table in chunks and look through it manually, but I hoped for something on the DB end that could take care of the grunt work.
Analytic functions to the rescue once more:
code:
select *
  from (
        select name
             , date
             , data
             , lead(date,1) over(partition by name order by date) as next_date
             , lead(data,1) over(partition by name order by date) as next_data
          from rockos_table
       )
 where next_date >= trunc(sysdate)-3 --within the past 3 days
   and next_data-data > 0 --net day-to-day increase
This is written for Oracle. If you have Microsoft or Postgres the syntax is probably slightly different. If your implementation lacks analytic functions, you'll probably have to do something procedural or in multiple steps, as analytics cannot easily be simulated through SQL tricks.

I'm not sure it does exactly what you want, but it gets at the general idea. The main point is that lead() gets the next row as defined by the windowing clause, regardless of whether the dates are calendar-continuous.

inveratulo
May 14, 2002
wat
I'm curious about high-level database design using MySQL for a work project. I am designing an asset tracker which utilizes SNMP to gather device data, then slice and dice it various ways and finally display it on a webpage. I'm a Linux SA, so my strengths lie mostly in hardware, systems, coding, and not so much relational databases.

Is there a good starting point for intermediate users regarding database design best practices? I'm looking for a book or something. Budget is < $100

Kekekela
Oct 28, 2004

inveratulo posted:

I'm curious about high-level database design using MySQL for a work project. I am designing an asset tracker which utilizes SNMP to gather device data, then slice and dice it various ways and finally display it on a webpage. I'm a Linux SA, so my strengths lie mostly in hardware, systems, coding, and not so much relational databases.

Is there a good starting point for intermediate users regarding database design best practices? I'm looking for a book or something. Budget is < $100

If you just search for "data normalization" or "normal forms" (there are five but generally getting to third normal form is good enough) you can probably find a bunch of good tutorials/references. From there you can make intelligent decisions on what you want to denormalize for ease of reporting etc. (you may also find "OLAP" and "data cubing" to be useful topics, but I'd get comfortable with normalization first)

Sorry I don't have more specific recommendations but this was actually one of the only useful things I learned in college several decades ago, so I can't even remember what texts we used and I'm sure there is a bunch of better poo poo now anyway.

Isometric Bacon
Jul 24, 2004

Let's get naked!
I've been having troubles sorting by dates in my program.

We're using a Access database as a backend (yeah...), which is being accessed and controlled by ASP Classic with VBScript.

Now everything seems to act swimmingly, until you add dates into the equation. For whatever reason, it just refuses to properly display specific dates or date ranges, or to order by them.

The database has been imported from an excel spreadsheet where the dates are entered in as milestones are reached. Thus the date fields have many null values, as this indicates that there is no milestone for this date.

Part of the reason I think this may be happening is because Access seems to only want to read the dates in the US-specific "MM/DD/YYYY" format, whereas since we're in Australia, we write them as "DD/MM/YYYY". An Lcid statement in my ASP allowed the ASP pages to display the date correctly, but I still think it's registering them as numbers rather than actual 'dates'. I have tried using the CDate function in both SQL and ASP, with no luck.

Basically to break it down - a query of SELECT Date WHERE Date BETWEEN CDate("&StartDate&") AND CDate("&EndDate&") will return absolutely nothing, regardless of the values of start date and end date. However, if I add a "IS NOT BETWEEN" clause, it will return everything, making me think they're still recognised as number values.

Anyone ever experienced anything like this before? I'm at my wits end here. :( I'm starting to wonder if I could be able to reformat Access to recognise YYYY/MM/DD but it looks fairly difficult to get it both reading them in properly and ASP recognising it properly too.

Bhaal
Jul 13, 2001
I ain't going down alone
Dr. Infant, MD

inveratulo posted:

I'm curious about high-level database design using MySQL for a work project. I am designing an asset tracker which utilizes SNMP to gather device data, then slice and dice it various ways and finally display it on a webpage. I'm a Linux SA, so my strengths lie mostly in hardware, systems, coding, and not so much relational databases.

Is there a good starting point for intermediate users regarding database design best practices? I'm looking for a book or something. Budget is < $100
This collection of anti-patterns could be handy to avoid common pitfalls and get a sense of the reasons behind 'good' db practices.

(pdf link)
http://www.karwin.com/downloads/SQL_Antipatterns_MySQLUC2008.pdf

Begby
Apr 7, 2005

Light saber? Check. Black boots? Check. Codpiece? Check. He's more machine than kid now.

Isometric Bacon posted:

I've been having troubles sorting by dates in my program.

We're using a Access database as a backend (yeah...), which is being accessed and controlled by ASP Classic with VBScript.

Now everything seems to act swimmingly, until you add dates into the equation. For whatever reason, it just refuses to properly display specific dates or date ranges, or to order by them.

The database has been imported from an excel spreadsheet where the dates are entered in as milestones are reached. Thus the date fields have many null values, as this indicates that there is no milestone for this date.

Part of the reason I think this may be happening is because Access seems to only want to read the dates in the US-specific "MM/DD/YYYY" format, whereas since we're in Australia, we write them as "DD/MM/YYYY". An Lcid statement in my ASP allowed the ASP pages to display the date correctly, but I still think it's registering them as numbers rather than actual 'dates'. I have tried using the CDate function in both SQL and ASP, with no luck.

Basically to break it down - a query of SELECT Date WHERE Date BETWEEN CDate("&StartDate&") AND CDate("&EndDate&") will return absolutely nothing, regardless of the values of start date and end date. However, if I add a "IS NOT BETWEEN" clause, it will return everything, making me think they're still recognised as number values.

Anyone ever experienced anything like this before? I'm at my wits end here. :( I'm starting to wonder if I could be able to reformat Access to recognise YYYY/MM/DD but it looks fairly difficult to get it both reading them in properly and ASP recognising it properly too.

Is the name of your field actually called 'Date'? If so are escaping it in your sql statements properly? Like `Date`?

MoNsTeR
Jun 29, 2002

Dr.Khron posted:

Can someone please explain how I hosed up?
This works:
code:
 COUNT(DISTINCT MALPHLMH.MAL_INVR_ID) AS PARTSBAL
 AND MALPHLMH.MAL_HLDG_BL > 0
This does NOT:
code:
 COUNT(DISTINCT MALPHLMH.MAL_INVR_ID) AS PARTSBAL
 HAVING SUM(MALPHLMH.MAL_HLDG_BL) > 0
In the second case, PARTSBAL (the number of distinct participants with a balance in a specific Security) is overstated by almost double.

I mean, I understand that WHERE happens before the GROUPing and HAVING happens after, but I guess the order is hosed up in my head...
The count is overstated because in version #2 you are NOT, in fact, counting participants-with-a-balance-in-each-security by security; you are counting participants by security, for securities that have a balance. In other words, the SUM() in your HAVING clause does not describe the balance of a participant, it describes the balance of all participants for a security, because sec_name is what you are grouping by.

If you remove both >0 clauses and put the SUM() in your select, look at the output and do the filtering process mentally, the difference should be clear.
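
Roughly, that diagnostic version is your original query with the two balance filters removed and the SUM() exposed:
code:
SELECT
 MSE_SEC_NM AS SEC_NAME,
 COUNT(DISTINCT MALPHLMH.MAL_INVR_ID) AS PARTSBAL,
 SUM(MALPHLMH.MAL_HLDG_BL) AS TOTAL_BAL -- per-security total, not per-participant
 FROM MALPHLMH, MSECURTY
 WHERE ( MALPHLMH.MAL_SEC_ID = MSECURTY.MSE_SEC_ID )
 AND MALPHLMH.MAL_MTH_STR_DT = '03/01/2009'
 AND MALPHLMH.MAL_DCP_NBR IN (planlist)
 GROUP BY MSE_SEC_NM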

jimbroof
Sep 4, 2003

my wife drew this for me :]
I have two stored procedures that look like this:

code:
CREATE PROCEDURE CreateSubTable
AS
BEGIN
  DECLARE @temp TABLE(...)
  -- process some data, fill @temp
  SELECT * FROM @temp
  -- no DROP needed: the table variable goes out of scope automatically
END
and...

code:
CREATE PROCEDURE CreateMainTable
AS
BEGIN
  DECLARE @temp TABLE(...)
  DECLARE Iterator CURSOR FOR (...)
  FETCH NEXT FROM Iterator INTO @var
  WHILE @@FETCH_STATUS = 0
  BEGIN
    INSERT INTO @temp
    EXECUTE CreateSubTable
  END
END
I know I left a few pieces out, but you get the idea. I'm using a sub-procedure to fill a series of small tables and combining them into a larger table with the main procedure.

When I execute CreateSubTable manually, everything looks fine. When I execute CreateMainTable, no rows get inserted. The row count of @temp in CreateSubTable is correct when I execute it and send its output to the console, but when I send its output to another table (@, #, or a permanent table) its row count is 0.

What the gently caress is going on? Am I doing something really stupid? So confused. Please help.

Thanks!

Isometric Bacon
Jul 24, 2004

Let's get naked!

Begby posted:

Is the name of your field actually called 'Date'? If so are escaping it in your sql statements properly? Like `Date`?

Ah no it's not, I just used that as an example.

My actual code is a dynamically generated SQL string which is constructed from drop-down menus that the user selects from (i.e. they select specific WHERE values based on what they enter).

The SQL String generally ends up looking something like this once generated:

SELECT QRG.*, Admin.Admin_Name, Functional_Area.Functional_Area, Writer.Writer_Name FROM Writer INNER JOIN (Functional_Area INNER JOIN (Admin INNER JOIN QRG ON Admin.Admin_ID = QRG.Admin_ID) ON Functional_Area.ID = QRG.Functional_Area_ID) ON Writer.Writer_ID = QRG.Writer_ID WHERE QRG.QA_Finalised BETWEEN CDate(17/09/2009) AND CDate(17/09/2009) ORDER BY QA_Sent;

You can safely ignore most of that, except for the WHERE clause. Basically this will return nothing. If I modify the clause to say IS NOT BETWEEN it will return all the records.

I've tried everything I can think of. The date fields are being generated as ASP Strings (which i've convereted to CDate in both ASP and SQL) but still no luck...

Isometric Bacon
Jul 24, 2004

Let's get naked!
Fixed my issue in regards to the SQL string - turns out I needed to cast it into a string before calling CDate on it in the database. Confusing!

cDate('<%=Value%>')

Two simple quote marks solve everything.


Unfortunately it's still behaving rather strangely. If I do an "ORDER BY" on my database, it will order the values roughly in order of date, but not exactly. I have a feeling this has something to do with the MM/DD/YYYY to DD/MM/YYYY conversion that's done automatically in ASP on the server...
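
(One thing that might be worth trying, assuming QA_Finalised really is stored as an Access Date/Time field and not text: build the literals in Jet's #...# delimited format, which is read as US month/day/year or as an unambiguous ISO date, instead of pushing a regional string through CDate. Untested sketch:)
code:
SELECT QRG.*
FROM QRG
WHERE QRG.QA_Finalised BETWEEN #2009-09-17# AND #2009-09-30#
ORDER BY QRG.QA_Finalised;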

Squashy Nipples
Aug 18, 2007

MoNsTeR posted:

The count is overstated because in version #2 you are NOT, in fact, counting participants with a balance-in-each-security by security, you are counting participants by security for securities that have a balance. In other words, he SUM() in your HAVING clause does not describe the balance of a participant, it describes the balance of all participants for a security, because sec_name is what you are grouping by.

If you remove both >0 clauses and put the SUM() in your select, look at the output and do the filtering process mentally, the difference should be clear.

Ahhhhh... that is a very good explanation, THANK YOU.

Strong Sauce
Jul 2, 2003

You know I am not really your father.





In general is it faster to just use REPLACE and let it handle insertion and checking for duplicates (in unique keys) or is it better to do a SELECT to check for a row's existence and execute INSERT/UPDATE accordingly? I'm using SQLite3 in this instance if it matters.

jimbroof
Sep 4, 2003

my wife drew this for me :]
FYI I had a nested INSERT-EXEC inside of a try-catch. Whoops.

var1ety
Jul 26, 2004

Strong Sauce posted:

In general is it faster to just use REPLACE and let it handle insertion and checking for duplicates (in unique keys) or is it better to do a SELECT to check for a row's existence and execute INSERT/UPDATE accordingly? I'm using SQLite3 in this instance if it matters.

If you have a unique key policing your data it's probably faster to do a REPLACE.

In situations where duplicates are infrequent what I personally do is blindly try the insert. If the insert throws a uniqueness exception then I will update/ignore/whatever, as appropriate.
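
In SQLite specifically, both behaviours can be spelled inline with conflict clauses, so no separate SELECT is needed — a sketch against a made-up kv table:
code:
-- assuming something like: CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT);

-- REPLACE / INSERT OR REPLACE deletes the conflicting row and inserts the new one
INSERT OR REPLACE INTO kv (k, v) VALUES ('colour', 'blue');

-- INSERT OR IGNORE keeps the existing row and silently skips the duplicate
INSERT OR IGNORE INTO kv (k, v) VALUES ('colour', 'blue');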

Boz0r
Sep 7, 2006
The Rocketship in action.
I'm trying to learn triggers in SQL so I've gotten 4 crazy rear end constraints to set up. My problem is that we're usually using DB2 at school but I can't access it from home, and I'd like to know if these triggers would work and if they could be written better. I've also been told that you can't use subqueries in DB2 so I've tried using Views instead where I could.

This is the general scheme of the DB:
code:
Equipment (
			room VARCHAR(15), 
			type VARCHAR(20)
)
Meetings (
			meetid INT,
			date DATE,
			slot INT, --timeslot
			owner VARCHAR(15),
			what VARCHAR(40) --description
)
Participants (
			meetid INT,
			pid VARCHAR(15), --Can be a userid or room name
			status CHAR(1) --u(nknown), a(ccept), d(ecline)
)
People (
			name VARCHAR(40),
			office VARCHAR(15),
			userid VARCHAR(15),
			group CHAR(3)
)
Rooms (
			room VARCHAR(15),
			capacity INT
)
These are the triggers I'm trying to set up:

1* amoeller must be invited to any meeting that involves beer.
code:
CREATE VIEW moellerMeetings AS
	SELECT meetid 
	FROM Participants
	WHERE pid = 'amoeller';

CREATE TRIGGER BeerMeeting
	AFTER UPDATE ON Meetings
	REFERENCING NEW ROW AS n
	FOR EACH ROW
	WHEN n.what LIKE '%beer%'	--Beer is involved.
		AND
		n.meetid NOT IN moellerMeetings	--Meeting is not in his list of meetings already.
	INSERT INTO Participants
	VALUES(n.meetid,'amoeller','a') --Automatically accepts because it's beer.
2* People in group phd cannot block people in group vip from booking a room. (I figure this means a vip can override a phd booking of a room.)
code:
CREATE VIEW RoomList AS
	SELECT room
	FROM Rooms;

CREATE VIEW BookedRooms AS
	SELECT date, slot, pid
	FROM Meetings, Participants
	WHERE pid IN RoomList;

CREATE TRIGGER MeetingOverrider
	BEFORE INSERT ON Participants
	REFERENCING NEW ROW AS n
	FOR EACH STATEMENT
	WHEN 0 < 	(SELECT COUNT(pid)			--Room is booked
				FROM BookedRooms
				WHERE pid=n.pid
					AND slot = (SELECT slot				--Check timeslot
								FROM Meetings
								WHERE meetid=n.meetid)
					AND date = (SELECT date				--Check date
								FROM Meetings
								WHERE meetid=n.meetid)
				)
	DELETE FROM Participants
	WHERE pid = n.pid

3* People in group phd must always accept meetings with people in group vip.
code:
CREATE VIEW Vips AS
	SELECT userid 
	FROM People
	WHERE group = 'vip';

CREATE VIEW Phds AS
	SELECT userid 
	FROM People
	WHERE group = 'phd';

CREATE TRIGGER PhdAccept
	AFTER INSERT ON Participants
	REFERENCING NEW ROW AS n
	REFERENCING OLD ROW AS o
	FOR EACH ROW
	WHEN n.pid IN Vips		--vip is attending?
		AND
		o.pid IN Phds 		--Participant is phd
	UPDATE Participants
	SET status = 'a'
	WHERE pid = o.pid & pid IN Phds;
4* For any meeting, the number of participants cannot exceed the capacity of the room.
code:
CREATE VIEW FullMeetings AS
SELECT meetid
FROM Meetings
WHERE 	(SELECT COUNT(DISTINCT pid)
		FROM Participants
		WHERE meetid=Meetings.meetid AND
			status<>'d' AND
			pid NOT IN RoomList)
	=
		(SELECT capacity
		FROM Rooms, Participants
		WHERE room=pid AND meetid=Meetings.meetid);

CREATE TRIGGER CapacityChecker
	BEFORE INSERT ON Participants
	REFERENCING NEW ROW AS n
	FOR EACH STATEMENT
	WHEN (n.meetid IN FullMeetings)
		SET n.meetid = NULL;
Does this make sense or should I rewrite the whole thing?
Is there a better way to do these things?
Also, is there a better validator around than the built-in one in DB2? That thing throws all sorts of crazy poo poo at me when I write wrong stuff.

Atoramos
Aug 31, 2003

Jim's now a Blind Cave Salamander!


I'm sure this is probably an easy thing to do, and I don't have the experience to nab it easily.

code:
CITY	      COUNTY	
Accord	      Ulster
Acra	      Greene
Adams	      Jefferson
Adams Basin   Monroe
Adams Center  Jefferson
Adams Corners Putnam
Adams Cove    Jefferson
I want to make it so the COUNTY is distinct (I need one city from each county, I don't really care which is 'chosen' by SQL as the ones not to display)

Nevermind, got it. I forgot all about GROUP BY.

Atoramos fucked around with this message at 14:02 on Sep 18, 2009

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
That's not really... possible. If you group by county, you have to aggregate city on something, and since city is nominal, aggregation doesn't really make sense.

For scenarios like "pick random" or "pick one, I don't care which", SQL isn't ideal since it's based on defining sets and fetching them efficiently.

But it's true that it happens so often, there might as well be a provision to make random access more convenient and possibly more efficient. (other than making a random seed column and picking the min/max of it)
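
The usual compromise is to aggregate the name column itself, e.g. MIN(), which deterministically picks the alphabetically first city per county — a sketch with an assumed table name:
code:
SELECT county,
       MIN(city) AS sample_city
FROM   city_county
GROUP  BY county;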

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
I want to add the ability for users to specify a timezone so I can display timestamps with the appropriate offset in a MySQL based application. Is using set timezone = timezone a bad idea? Should I be doing this in my application code, rather than just setting it at the database level?

edit: actually, I'm not even sure that it's even necessary to display exact timestamps in this app. I think X hours/days/weeks/months/years ago is all people will need. Still curious about where in an application timezone conversion should be done though.
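
For reference, the MySQL-side options look roughly like this (a sketch; created_at and posts are made-up names, and the named time zones only work once MySQL's time zone tables have been loaded):
code:
-- per-connection setting: TIMESTAMP values are then shown in this zone
SET time_zone = 'America/New_York';  -- or an offset such as '+10:00'

-- or convert explicitly in the query and leave the session zone alone
SELECT CONVERT_TZ(created_at, 'UTC', 'America/New_York') AS local_created_at
FROM posts;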

fletcher fucked around with this message at 23:52 on Sep 22, 2009
