Zombywuf
Mar 29, 2008

pseudorandom posted:

What's the best way to go about handling items of information in a database that potentially have an unlimited number of attributes to each item?

A few examples of what I'm trying to describe would be:
  • A post with comments. How would comments best be implemented? Should all comments on the website be mashed into one large comments table? Perhaps each post gets its own comments table? Or is there a way to place comments inside each post's row as some sort of third dimension?
  • A rating system. I'd have to record each user who rates the item, and what the rating is, so that people can't rate more than once. Much like the first example, how is this information best handled?

When you say an unlimited number of attributes, do you mean an unlimited number of the same type of attribute, i.e. a post has an unlimited number of comments but has no other attributes?

If that is the case, then first make sure all your tables have a primary key which exists only to identify each record in the table. You can then associate any number of records from one table with the other as DankTamagachi describes. The association can be enforced in the DB with a foreign key relationship. Your schema for both the cases above would look something like this (if I recall MySQL syntax correctly):
code:
create table Users (
  user_id int not null primary key auto_increment,
  user_name varchar(255) not null
);

create table Posts (
  post_id int not null primary key auto_increment,
  fk_user_id int not null references Users (user_id),
  post text not null
);

create table Comments (
  comment_id int not null primary key auto_increment,
  fk_user_id int not null references Users (user_id),
  fk_post_id int not null references Posts(post_id),
  comment text not null
);

create table Ratings (
  rating_id int not null primary key auto_increment,
  fk_user_id int not null references Users(user_id),
  fk_post_id int not null references Posts(post_id),
  rating int not null,
  unique (fk_user_id, fk_post_id) -- This ensures a user can only rate a post once
);
Basically, each attribute you want to associate with posts should get a new table with a column referencing the post id. If you are looking up attributes by post a lot (which you probably will be), you will want to create an index on the attribute tables for the fk_post_id column.
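For example (MySQL syntax as above; the index names are just placeholders):

```sql
-- speed up "all comments / all ratings for post X" lookups
CREATE INDEX idx_comments_post ON Comments (fk_post_id);
CREATE INDEX idx_ratings_post  ON Ratings  (fk_post_id);
```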

EDIT: I think it should be obvious how you'd add contacts using this scheme: you'll need two columns referencing Users(user_id).


Che Delilas
Nov 23, 2009
FREE TIBET WEED

pseudorandom posted:

potentially unlimited number of attributes for each item?

Define "potentially unlimited attributes." Do you mean that any given item can have any number of user-defined or item-specific attributes ("attributes" in this case would be columns in your table)? Or do you mean that you as the designer are going to specify the attributes (again, columns) that are common to all items and there can be an unlimited number of entries that correspond to one unique item?

Your examples seem to indicate the latter. So say item "widget27" could have a million people all enter a comment about that item. If this is the case, I would do what you suggested: have one table for comments and have foreign keys for the item ID and the user ID who entered the comment (if you're storing user information).

For ratings you could basically do the same thing. Have a table with UserID, ItemID, and Rating. The first two columns are foreign keys to maintain referential integrity, and you can also make the two together the composite primary key for the table, which will make sure each user only rates each item once and avoids needing to create a separate ID for ratings.
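A sketch of that ratings table (generic SQL; it assumes Users and Items tables with those key columns already exist):

```sql
CREATE TABLE Ratings (
  UserID INT NOT NULL REFERENCES Users (UserID),
  ItemID INT NOT NULL REFERENCES Items (ItemID),
  Rating INT NOT NULL,
  -- composite PK: one rating per user per item, no separate rating ID needed
  PRIMARY KEY (UserID, ItemID)
);
```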

If, on the other hand, you want each item to have user-defined or item-specific attributes, that's tougher. You can look at the Entity-Attribute-Value (EAV) model. Be warned: EAV is hairy, hurts readability, and gives up some of the advantages of the relational database model.

pseudorandom
Jun 16, 2010



Yam Slacker

Che Delilas posted:

Define "potentially unlimited attributes." Do you mean that any given item can have any number of user-defined or item-specific attributes ("attributes" in this case would be columns in your table)? Or do you mean that you as the designer are going to specify the attributes (again, columns) that are common to all items and there can be an unlimited number of entries that correspond to one unique item?

Yeah, the latter is pretty much what I meant. Sorry I couldn't better explain it myself. Actually, I think the reason why I was confused is because several of the things I'm trying to do could be done both ways, so that's why I needed help.

And, thank you very much, Zombywuf. That seems to be perfect!

Thanks for your help, guys! :)

Super Delegate
Jan 20, 2005

ƃɐlɟ ǝɥʇ
I'm trying to import a flat file of people/employee details into a new MSSQL 2005 database. I'm using bulk insert on a view of a staging/temp table, but I'm not sure how to move this data into my relational tables.

My biggest problem is that each person has multiple IDs associated with them from other systems. I assign them a surrogate primary key "pID" (person ID) in my Person table. I then use the pID as a foreign key in a table that stores all the IDs each person could be referenced with.

Reo
Apr 11, 2003

That'll do, Carlos.
That'll do.


Ugh. I'm trying to build a view in MySQL based on different types of user permissions. There is a "friends" table that shows if two users are friends, and a "lock" table that shows if a photo has been given viewing permissions to a specific user.

tphoto.sec has security settings: 0 = anonymous viewing, 1 = friends only, 2 = specific users

User 54 is friends with user 2 and user 56.

There is a row in the thas table that gives permission on photo 21 to user ID 2.
code:
oid 	sid 	stype 		sort 	datecreated
21 	2 	pho-lck 	99 	2011-03-01 00:00:00
Here's the query I've written, with the relevant part that's not working marked:
code:
SELECT  tphoto.pid, tphoto.sec, tphoto.uid as ownerID, tuser.uid as ViewerID,

CASE tphoto.sec 
WHEN 0
	THEN 'Approved-Anon'
WHEN 1 and EXISTS (select * from tfriends where approved = 1 and (
tfriends.oid=tphoto.uid and tfriends.sid = tuser.uid) or 
(tfriends.oid=tuser.uid and tfriends.sid=tphoto.uid)
)
	THEN 'Approved-Friend'

------------------------------------
WHEN 2 and EXISTS (select * from thas where stype='pho-lck' and oid=tphoto.pid and thas.sid=tuser.uid)
	THEN 'Approved-Lock'
------------------------------------

else 'Denied'
END

as Approval
from tphoto, tuser 
where tphoto.pid in (12,21,26) and tuser.uid in (1,2,54,56)
order by tphoto.sec, tphoto.pid, tuser.uid
And here are the results I get:

code:
pid 	sec 	ownerID 	ViewerID 	Approval
12 	0 	54 	 	1 	Approved-Anon
12 	0 	54 	 	2 	Approved-Anon
12 	0 	54  		54 	Approved-Anon
12 	0 	54  		56 	Approved-Anon
26 	1 	54 	 	1 	Denied
26 	1 	54 	 	2 	Approved-Friend
26 	1 	54 	 	54 	Denied
26 	1 	54  		56 	Approved-Friend
21 	2 	58  		1 	Denied
21 	2 	58  		2 	Denied 	 	--------------THIS SHOULD SHOW 'Approved-Lock'
21 	2 	58 	 	54 	Denied
21 	2 	58  		56 	Denied
I can't figure out why that's not working, unless there's a problem or rule against multiple EXISTS checks in a CASE statement. Any help would be appreciated.

Reo fucked around with this message at 23:58 on Sep 25, 2011

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
WHEN 1 and EXISTS (...)

This does not do what you think it does. This case is testing whether tphoto.sec is equal to the value of (1 AND EXISTS(...)), not whether (tphoto.sec = 1) AND EXISTS(...). The expression (2 AND EXISTS(...)) always evaluates to either 1 or 0, never 2, so that branch can never match.

You need to rewrite the CASE statement as:

CASE WHEN tphoto.sec = 0 THEN 'Approved-Anon' WHEN tphoto.sec = 1 AND EXISTS ( ...
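Applied to the query above, the searched form would look something like this (a sketch; note the added parentheses around the OR in the friends check, since AND binds tighter than OR):

```sql
CASE
    WHEN tphoto.sec = 0 THEN 'Approved-Anon'
    WHEN tphoto.sec = 1 AND EXISTS (
            SELECT * FROM tfriends
            WHERE approved = 1
              AND ((tfriends.oid = tphoto.uid AND tfriends.sid = tuser.uid)
                OR (tfriends.oid = tuser.uid  AND tfriends.sid = tphoto.uid)))
        THEN 'Approved-Friend'
    WHEN tphoto.sec = 2 AND EXISTS (
            SELECT * FROM thas
            WHERE stype = 'pho-lck' AND oid = tphoto.pid AND thas.sid = tuser.uid)
        THEN 'Approved-Lock'
    ELSE 'Denied'
END AS Approval
```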

Edit: Got confused between AND and OR, same issue applies though.

Hammerite fucked around with this message at 00:10 on Sep 26, 2011

Reo
Apr 11, 2003

That'll do, Carlos.
That'll do.


I thought the same thing and changed the security value to a char column (A,B,C) instead of integers, and got vastly different results (the CASE statements now read WHEN tphoto.secB = 'A', etc.):
code:
pid 	secB 	ownerID 	ViewerID 	Approval
12 	A 	54	 	1 	Approved-Friend
12 	A 	54 		2 	Approved-Friend
12 	A 	54 		54 	Approved-Friend
12 	A 	54 		56 	Approved-Friend
26 	B 	54 		1 	Approved-Anon
26 	B 	54 		2 	Approved-Anon
26 	B 	54 		54 	Approved-Anon
26 	B 	54 		56 	Approved-Anon
21 	C 	58 		1 	Approved-Anon
21 	C 	58 		2 	Approved-Anon
21 	C 	58 		54 	Approved-Anon
21 	C 	58 		56 	Approved-Anon

Reo fucked around with this message at 00:13 on Sep 26, 2011

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Reo posted:

I thought the same thing and changed the security value to a char column (A,B,C) instead of integers, and got vastly different results - (the case statements now show WHEN tphoto.secB = 'A' etc.
code:
pid 	secB 	ownerID 	ViewerID 	Approval
12 	A 	54 	1 	Approved-Anon
12 	A 	54 		2 	Approved-Anon
12 	A 	54 		54 	Approved-Anon
12 	A 	54 		56 	Approved-Anon
26 	B 	54 		1 	Approved-Friend
26 	B 	54 		2 	Approved-Friend
26 	B 	54 		54 	Approved-Friend
26 	B 	54 		56 	Approved-Friend
21 	C 	58 		1 	Approved-Friend
21 	C 	58 		2 	Approved-Friend
21 	C 	58 		54 	Approved-Friend
21 	C 	58 		56 	Approved-Friend

post the amended query in full?

Reo
Apr 11, 2003

That'll do, Carlos.
That'll do.


I just updated the results; apparently it flips them if I use WHEN 'A' vs. WHEN tphoto.secB = 'A'. Anyway, here it is:

code:
SELECT   tphoto.pid,tphoto.secB,tphoto.uid as ownerID, tuser.uid as ViewerID,
CASE tphoto.secB
WHEN tphoto.secB = 'A'
THEN 'Approved-Anon'
WHEN  tphoto.secB = 'B' and EXISTS (select * from tfriends where approved = 1 and (
tfriends.oid=tphoto.uid and tfriends.sid = tuser.uid) or 
(tfriends.oid=tuser.uid and tfriends.sid=tphoto.uid)
)
THEN 'Approved-Friend'
WHEN  tphoto.secB = 'C' and EXISTS (select * from thas where stype='pho-lck' and oid=tphoto.pid and thas.sid=tuser.uid)
THEN 'Approved-Lock'
else 'Denied'
END
as Approval
from tphoto, tuser where tphoto.pid in (12,21,26) and tuser.uid in (1,2,54,56)
order by tphoto.sec, tphoto.pid, tuser.uid
code:
pid 	secB 	ownerID 	ViewerID 	Approval
12 	A 	54 		1 	Approved-Friend
12 	A 	54 		2 	Approved-Friend
12 	A 	54 		54 	Approved-Friend
12 	A 	54 		56 	Approved-Friend
26 	B 	54 		1 	Approved-Anon
26 	B 	54 		2 	Approved-Anon
26 	B 	54 		54 	Approved-Anon
26 	B 	54 		56 	Approved-Anon
21 	C 	58 		1 	Approved-Anon
21 	C 	58 		2 	Approved-Anon
21 	C 	58 		54 	Approved-Anon
21 	C 	58 		56 	Approved-Anon

Reo fucked around with this message at 00:16 on Sep 26, 2011

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
CASE tphoto.secB WHEN ...

This is not what you want. Change it to CASE WHEN ...

The workings of the CASE statement are explained on this page: http://dev.mysql.com/doc/refman/5.0/en/case-statement.html

Note that there are two different ways to use it.

Reo
Apr 11, 2003

That'll do, Carlos.
That'll do.


Aha! God drat it. That was it all along. Thank you!
code:
pid 	secB 	ownerID 	ViewerID 	Approval
12 	A 	54	 	1 	Approved-Anon
12 	A 	54 		2 	Approved-Anon
12 	A 	54 		54 	Approved-Anon
12 	A 	54 		56 	Approved-Anon
26 	B 	54 		1 	Denied
26 	B 	54 		2 	Approved-Friend
26 	B 	54 		54 	Denied
26 	B 	54 		56 	Approved-Friend
21 	C 	58 		1 	Denied
21 	C 	58 		2 	Approved-Lock
21 	C 	58 		54 	Denied
21 	C 	58 		56 	Denied

Super Delegate
Jan 20, 2005

ƃɐlɟ ǝɥʇ
I have to loop through a table and perform inserts/updates depending on whether a certain column exists. Is there any way to make variables that contain the first entry in a table?
I want to do something like the code below, but the code doesn't work.
code:
DECLARE @myvar nvarchar
@myvar = select top 1 table.column from table

Zombywuf
Mar 29, 2008

Super Delegate posted:

I have to loop through a table and perform inserts/updates depending on whether a certain column exists. Is there any way to make variables that contain the first entry in a table?
I want to do something like the code below, but the code doesn't work.
code:
DECLARE @myvar nvarchar
@myvar = select top 1 table.column from table

It sounds like what you're doing is horribly wrong, but to do it your code should look something like this:
code:
declare @myvar nvarchar;
select top 1 @myvar = table.column from table;
It sounds like you're trying to do an upsert though and thus should look at the "merge" statement. In general if you're trying to solve a DB problem and the word loop appears in your solution you are probably doing it very wrong.
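For reference, the general shape of a MERGE is something like this (a sketch; all the table and column names here are hypothetical, and note that MERGE only arrived in SQL Server 2008):

```sql
MERGE target AS t
USING source AS s
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.val = s.val        -- row exists: update it
WHEN NOT MATCHED THEN
    INSERT (id, val) VALUES (s.id, s.val);  -- row missing: insert it
```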

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!
If that's MSSQL you need to give the nvarchar a size when you declare it. So if the column is 50 characters wide, you'd do
code:
DECLARE @myvar nvarchar(50);
SET @myvar = (select top 1 table.column from table order by table.id);
Also notice I put the query in parentheses and also put in an order by. Without an explicit order, "first" is undefined. Sure it will probably return it in the order it was inserted/created, but maybe the DB feels coy some day and decides to return some other row first.

Zombywuf
Mar 29, 2008

Jethro posted:

If that's MSSQL you need to give the nvarchar a size when you declare it.

You don't have to. Interesting things happen when you don't though.

I like turtles
Aug 6, 2009

Alright, I'm wondering if anyone can help me figure out some FileMaker nonsense? I know, I know, but it's what the client wants/already has in place.
I have a data structure containing a non-unique ID and a dollar value. I want to have a field display the total dollar value for all instances of a given non-unique ID, across records. How can I do this?

No Safe Word
Feb 26, 2005

Zombywuf posted:

You don't have to. Interesting things happen when you don't though.

It doesn't just default to nvarchar(max)?

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!
Now that I think about it, I believe it defaults to nvarchar(40), but that may be context dependent.

Zombywuf
Mar 29, 2008

Jethro posted:

Now that I think about it, I believe it defaults to nvarchar(40), but that may be context dependent.

It can be very context dependent. I've seen it default to varchar(1), for example (and then not give warnings about string truncation).

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
We're slightly paranoid about Oracle's moves on MySQL; does anyone have any experience or comment on MariaDB?

Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!

Golbez posted:

We're slightly paranoid about Oracle's moves on MySQL; does anyone have any experience or comment on MariaDB?

The only things I've heard are that the people who tried it had a seamless transition if they were on 5.x. If you're on 4.x you have the same issues as when you're going to 5.x.

Reo
Apr 11, 2003

That'll do, Carlos.
That'll do.


General database design question;

I have lots of relationships that boil down to assigning object A to object B, of varying types. For example, descriptive tag A is tied to photo B, photo B is tied to photoset Q, etc. Is there any advantage/disadvantage to having one table thus:

Table [Assoc]
ParentID
ChildID
type

---versus N tables ---

Table [TagToPhoto]
TagID
PhotoID

Table [PhotoToSet]
PhotoID
SetID

...

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Reo posted:

General database design question;

I have lots of relationships that boil down to assigning object A to object B, of varying types. For example, descriptive tag A is tied to photo B, photo B is tied to photoset Q, etc. Is there any advantage/disadvantage to having one table thus:

Table [Assoc]
ParentID
ChildID
type

---versus N tables ---

Table [TagToPhoto]
TagID
PhotoID

Table [PhotoToSet]
PhotoID
SetID

Unless you need to have nested sets of photos, go for the second option. It will result in nicer queries and less banging your head against the wall.

Zombywuf
Mar 29, 2008

Reo posted:

General database design question;

I have lots of relationships that boil down to assigning object A to object B, of varying types. For example, descriptive tag A is tied to photo B, photo B is tied to photoset Q, etc. Is there any advantage/disadvantage to having one table thus:

<snipped due to traumatising memories>

Anything but the first one: you will not be able to create foreign key relationships to enforce integrity, the resulting queries will be terrible, maintenance will make you tear out your eyes, the dead will rise from their graves, etc...

What is the relationship between the objects? This should be considered for each pair in turn. For example if a photo always belongs to exactly one photoset (which I assume contains 0 or more photos) then each photo should have a photoset_id. If it's a generic n to m relationship as I would assume is the case between tags and photos then yes you want one table per relationship (and put references constraints on the drat columns).
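A sketch of the n-to-m case (generic SQL; it assumes Tags and Photos tables with those key columns):

```sql
CREATE TABLE TagToPhoto (
  TagID   INT NOT NULL REFERENCES Tags (TagID),
  PhotoID INT NOT NULL REFERENCES Photos (PhotoID),
  -- composite PK also prevents tagging the same photo twice with one tag
  PRIMARY KEY (TagID, PhotoID)
);
```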

Super Delegate
Jan 20, 2005

ƃɐlɟ ǝɥʇ

Zombywuf posted:

It sounds like what you're doing is horribly wrong, but to do it your code should look something like this:
code:
declare @myvar nvarchar;
select top 1 @myvar = table.column from table;
It sounds like you're trying to do an upsert though and thus should look at the "merge" statement. In general if you're trying to solve a DB problem and the word loop appears in your solution you are probably doing it very wrong.

I mostly create reports, so I don't have much experience outside of that. Merge sounds like it will be useful, but I'll explain my DB setup, and hopefully someone can confirm this can be done with merge.

My new database (SQL server 2005) stores information about employees and will be replacing several legacy databases. Unfortunately, the transition will take a long time, and I will have to keep my database in sync until it completely replaces the legacy systems.

I receive a CSV from each legacy system that looks similar to this:
code:
LegacysystemID,firstname,lastname,persontitle,task...
My new database:
[Person] - table for person information
COLUMNS
pID - PK/person ID number (auto increment)
firstname
lastname
...

[personAlias] - Each person has multiple IDs (each legacy system assigned them unrelated IDs, so this table will link all of their old IDs back to them.)

COLUMNS
PersonAliasKey - Table PK
pID - FK from person
LegacyID

[PersonTitle]
PersonTitleKey - Table PK
pID - FK from person
title
...
[PersonTask]
PersonTaskKey - table PK
pID - FK from person
task - each task is a note about a job an employee is working on.

This is the logic I want for the insert:

For each row of the CSV, Check to see if the legacyID exists in the personAlias table.

If the legacyID is present in my DB:
Grab the pID associated with the record and make an entry in PersonTitle along with the title. (This is an upsert: the person's title should be updated if an entry has already been made, or inserted if it was not previously present. This table is normalized because in the future we will associate multiple titles with people.)

Next, insert the pID and task into the PersonTask table. Each person will have multiple tasks and records will never be deleted/updated, only inserted. If the task data is an exact duplicate of an entry already associated with that pID, no changes are needed to this table.

If the legacyID is not present in my DB:
Insert firstname + lastname into the Person table, use the pID that was just created, and enter that pID + LegacyID into personAlias, and pID + task into PersonTask.

Zombywuf
Mar 29, 2008

Super Delegate posted:

For each row of the CSV, Check to see if the legacyID exists in the personAlias table.

If the legacyID is present in my DB:
Grab the pID associated with the record and make an entry in PersonTitle along with the title. (This is an upsert: the person's title should be updated if an entry has already been made, or inserted if it was not previously present. This table is normalized because in the future we will associate multiple titles with people.)

Next, insert the pID and task into the PersonTask table. Each person will have multiple tasks and records will never be deleted/updated, only inserted. If the task data is an exact duplicate of an entry already associated with that pID, no changes are needed to this table.

If the legacyID is not present in my DB:
Insert firstname + lastname into the Person table, use the pID that was just created, and enter that pID + LegacyID into personAlias, and pID + task into PersonTask.

Urgh, joined upserts. The general strategy I would take is to mirror the structure you want in temporary tables, then merge the changes in. So after reading the CSV into a temp table, do something like:
code:
select
  p.pID,
  c.persontitle,
  c.task
into
  #peopletoupdate
from
  #csv c
  inner join PersonAlias p
     on c.LegacySystemID = p.LegacyID
Then, for the people not in #peopletoupdate, add them to personAlias and Person, and then to #peopletoupdate using the new pID. From there you can use two merge statements to update PersonTitle and PersonTask from #peopletoupdate.
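For instance, the PersonTitle half could look something like this (a sketch only, assuming a SQL Server version with MERGE, i.e. 2008+; column names are taken from the posts above):

```sql
MERGE PersonTitle AS t
-- DISTINCT because MERGE errors out if the source has duplicate pIDs
USING (SELECT DISTINCT pID, persontitle FROM #peopletoupdate) AS s
    ON t.pID = s.pID
WHEN MATCHED THEN
    UPDATE SET t.title = s.persontitle
WHEN NOT MATCHED THEN
    INSERT (pID, title) VALUES (s.pID, s.persontitle);
```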

This is only a sketch solution; sorry if it's a little unclear, these kinds of tasks are often painful.

McGlockenshire
Dec 16, 2005

GOLLOCKS!

Golbez posted:

We're slightly paranoid about Oracle's moves on MySQL; does anyone have any experience or comment on MariaDB?

Also consider Percona's not-a-fork. I've used it as a drop-in replacement a few times, and it's worked well. The only problems I've had are the same problems you usually have when upgrading, like forgetting about old_passwords and client library versions.

Withnail
Feb 11, 2004
In Oracle 10

Does anyone have a good query solution for concatenating multiple rows into a single comma-delimited column?
I need to do this with a straight query with no custom functions (i.e. stragg), and it's on version 10, so no listagg.

So that:
code:
Group   CODE
10      CODE-A
10      CODE-B
20      CODE-A
20      CODE-D

returns

Group   CODES
10      CODE-A, CODE-B
20      CODE-A, CODE-D
Edit: It was a little more complicated as I had to join, but this is what I have so far. It seems to be working, and it's efficient enough as it's only a few thousand rows.



code:
SELECT a.supplier_id,
       a.NAME,
       b.classification_code
FROM   d0combus.mfg_supplier_info_vw a
       left join (SELECT b.supplier_id,
                         Substr (
                 Sys_connect_by_path(b.classification_code, ', '), 2)
                                                AS
                         classification_code
                  FROM   (SELECT supplier_id,
                                 classification_code,
                                 Row_number() over (PARTITION BY supplier_id
                                 ORDER BY
                                 classification_code
                                         DESC)
                                 AS val_index
                          FROM   d0combus.mfg_supplier_classification_vw) b
                  WHERE  connect_by_isleaf = 1
                  CONNECT BY b.val_index = PRIOR b.val_index + 1
                                                 AND b.supplier_id = PRIOR
                                                     b.supplier_id
                  START WITH b.val_index = 1) b
         ON a.supplier_id = b.supplier_id
WHERE  a.supplier_id IN (SELECT DISTINCT supplier_id
                         FROM   d0combus.mfg_supplier_info_address_vw
                         WHERE  Nvl (valid_to, SYSDATE) >= SYSDATE) 

Withnail fucked around with this message at 03:29 on Sep 29, 2011

Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!
Does Oracle not have a group concat function?

Goat Bastard
Oct 20, 2004

Yes, listagg, but not until 11g release 2. There's also stragg, which is a commonly used one written by Tom Kyte, but it's not built in.

The best "straight SQL" way I know of is that connect_by_path trick, but you have to be careful about nulls - if they can appear in the group and you don't want them in the concatenated string, then you'll need to do something ugly like regexp_replace(regexp_replace(concatenated_column, '(, )(, )+', ', '), '(^(, )+)|((, )+$)', '')

Gwaihir
Dec 8, 2009
Hair Elf
Possible silly question.

I'm running a report to compare two different counts:
code:
SELECT b.medcon,
       a.returncount,
       b.total,
       DEC(returncount) / DEC(total) AS percentage
FROM   (SELECT medcon,
               COUNT(*) returncount
        FROM   t_mreq
        WHERE  compdt BETWEEN To_yyyymmdd(CURRENT DATE - 6 MONTHS) AND
                              To_yyyymmdd(CURRENT DATE)
               AND reftypcd = 'RETURN'
        GROUP  BY medcon) a
RIGHT OUTER JOIN 
        (SELECT medcon, COUNT(*) total
         FROM   t_mreq
         WHERE  compdt BETWEEN To_yyyymmdd(CURRENT DATE - 6 MONTHS) AND
                               To_yyyymmdd(CURRENT DATE)
         GROUP  BY medcon) b
ON a.medcon = b.medcon  
This was just the first way to do it that occurred to me; however, it takes a while to run (10 minutes or so, with about 250,000 records in the table in the 6-month timeframe I've set). Normally, I would fire up my query analyzer and try to see where I'm doing something dumb - if I'm using a column that has no index, or what the bottleneck might be. In this case, however, the analyzer has no idea what the hell is going on, so it just chokes and says the query is doing a table scan of the temp tables that I assume it is building for my two sub-selects.

I'm using DB2 on an IBM iSeries machine (mainframe-lite midrange machine: 16 CPUs, 144 GB of RAM, 144 15k RPM SAS disks on FC-SAN storage). The to_yyyymmdd thing is a little function to convert real database date types to the yyyymmdd int that our vendor's tables use to store dates.

If that runtime seems about right, that's OK, since I can just schedule this thing to run once a day at 5 AM or something and park it in a table in my reports library, but the user in question is one of those types that really goes nuts over wanting things that are accurate as of Right Now, so I'd prefer being able to stick it in a view and get semi-decent response time.

e: Most of the time used comes from doing the general count without a filter on it, this one:
code:
SELECT medcon, COUNT(*) total
         FROM   t_mreq
         WHERE  compdt BETWEEN To_yyyymmdd(CURRENT DATE - 6 MONTHS) AND
                               To_yyyymmdd(CURRENT DATE)
         GROUP  BY medcon
That guy on its own takes 9 minutes, and there are not any table scans or obvious stuff in it.

Gwaihir fucked around with this message at 22:35 on Sep 29, 2011

Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!
what happens when you take out the group by?

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

That seems ludicrously long, though I am by no means an expert on IBM hardware. Is the 250,000 rows the result, or the size of the outer join table? Because the most expensive thing there seems to be that join, or potentially the group by depending on how many rows it's compressing.

Zombywuf
Mar 29, 2008

Gwaihir posted:

e: Most of the time used comes from doing the general count without a filter on it, this one:
code:
SELECT medcon, COUNT(*) total
         FROM   t_mreq
         WHERE  compdt BETWEEN To_yyyymmdd(CURRENT DATE - 6 MONTHS) AND
                               To_yyyymmdd(CURRENT DATE)
         GROUP  BY medcon
That guy on its own takes 9 minutes, and there are not any table scans or obvious stuff in it.

What indexes do you have and how wide is the table?

Gwaihir
Dec 8, 2009
Hair Elf

Scaramouche posted:

That seems ludicrously long, though I am by no means an expert on IBM hardware. Is the 250,000 rows the result, or the size of the outer join table? Because the most expensive thing there seems to be that join, or potentially the group by depending on how many rows it's compressing.

The outer join select is looking at about 250,000 records for its count operation. That table has probably 15+ indexes on it; I'll have to check exactly what columns are indexed, and how wide it is, tomorrow morning. I would guess about ~30 columns wide. I know the compdt column is included in at least one of the indexes, as well as medcon, but I'm largely hamstrung making any new useful index. These tables are all part of our vendor's software package, and I can't really touch/fix even the most bone-headed things. (The most often advised index from running the index adviser on their library has been logged as a suggestion approximately 67 million times!)

butt dickus
Jul 7, 2007

top ten juiced up coaches
and the top ten juiced up players

Gwaihir posted:

e: Most of the time used comes from doing the general count without a filter on it, this one:
code:
SELECT medcon, COUNT(*) total
         FROM   t_mreq
         WHERE  compdt BETWEEN To_yyyymmdd(CURRENT DATE - 6 MONTHS) AND
                               To_yyyymmdd(CURRENT DATE)
         GROUP  BY medcon
That guy on its own takes 9 minutes, and there are not any table scans or obvious stuff in it.

Store the To_yyyymmdd(CURRENT DATE - 6 MONTHS) and To_yyyymmdd(CURRENT DATE) as variables first maybe? A guy gave me a query the other day that took 20 seconds and I cut it down to a fraction of a second by doing that.
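One way to express that without a stored procedure is to evaluate the bounds in a one-row derived table (a sketch; sysibm.sysdummy1 is DB2's dummy table, and whether the optimizer actually evaluates the UDF only once is not guaranteed):

```sql
SELECT m.medcon, COUNT(*) AS total
FROM   t_mreq m
       -- one-row derived table: compute both date bounds up front
       CROSS JOIN (SELECT To_yyyymmdd(CURRENT DATE - 6 MONTHS) AS lo,
                          To_yyyymmdd(CURRENT DATE)            AS hi
                   FROM   sysibm.sysdummy1) b
WHERE  m.compdt BETWEEN b.lo AND b.hi
GROUP  BY m.medcon
```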

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

I can't really give much advice, but you're right to think something's wrong. I have a cross join on about 2 million rows for a table 90 columns wide (oh how I hate this table) and it's running on a Windows/MSSQL server that's a shadow of your IBM server, and it takes about 2 mins. My server has a ton of ram so it's rarely touching the disc, but that should be true of yours as well.

It could be the variable assignment as Doctor rear end in a top hat states; I'm not really familiar with how DB2 handles dynamic assignment like that. It's unlikely, but the function could just be very, very badly written as well.

Zombywuf
Mar 29, 2008

Gwaihir posted:

The outer join select is looking at about 250,000 records for its count operation. That table has probably 15+ indexes on it; I'll have to check exactly what columns are indexed, and how wide it is, tomorrow morning. I would guess about ~30 columns wide. I know the compdt column is included in at least one of the indexes, as well as medcon, but I'm largely hamstrung making any new useful index. These tables are all part of our vendor's software package, and I can't really touch/fix even the most bone-headed things. (The most often advised index from running the index adviser on their library has been logged as a suggestion approximately 67 million times!)

Well, the index you need is (compdt, medcon, t_mreq), possibly even with the first two columns swapped but that depends heavily on the density of medcon. Also, is your database one of those where count(1) is much faster than count(*)?

ZyrKx
Dec 10, 2006

Licorice Whip!

Goat Bastard posted:

Yes, listagg, but not until 11g release 2. There's also stragg, which is a commonly used one written by Tom Kyte, but it's not built in.

The best "straight SQL" way I know of is that connect_by_path trick, but you have to be careful about nulls - if they can appear in the group and you don't want them in the concatanated string then you'll need to do something ugly like regexp_replace(regexp_replace(concatenated_column, '(, )(, )+', ', '), '(^(, )+)|((, )+$)', '')

There is an undocumented function in 10g that will do this, wmsys.wm_concat.


Gwaihir
Dec 8, 2009
Hair Elf
Allll right.. Problem was totally in my user defined function. No difference in execution time when I replaced count(*) with count(compdt). There is an Index on T_MREQ with COMPDT, MEDCON, (Bunch of other columns I don't need). Taking the function out and changing it to variables chopped execution time from 563635ms (9.4 minutes) to 500ms, much better.

Guess I'm going to have to go back and see what junk my predecessor actually put in that function...
E: I wish I hadn't looked :( Augh string concatenation!
code:
CREATE FUNCTION TO_YYYYMMDD ( YDATE DATE )
BEGIN 
DECLARE MYDAY VARCHAR ( 10 ) ; 
DECLARE MYMONTH VARCHAR ( 10 ) ; 
DECLARE MYYEAR VARCHAR ( 10 ) ; 
DECLARE MYDATE VARCHAR ( 10 ) ; 
SET MYDAY = CAST ( DAY ( YDATE ) AS VARCHAR ( 10 ) ) ; 
SET MYMONTH = CAST ( MONTH ( YDATE ) AS VARCHAR ( 10 ) ) ; 
SET MYYEAR = CAST ( YEAR ( YDATE ) AS VARCHAR ( 10 ) ) ; 
IF LENGTH ( MYDAY ) < 2 THEN SET MYDAY = '0' || MYDAY ; END IF ; 
IF LENGTH ( MYMONTH ) < 2 THEN SET MYMONTH = '0' || MYMONTH ; END IF ; 
SET MYDATE = MYYEAR || MYMONTH || MYDAY ; 
RETURN ( CAST ( MYDATE AS DEC ( 10 , 0 ) ) ) ; 
END


Gonna replace that mess with
RETURN(int(replace(char(YDATE, ISO),'-',''))), which runs just about as fast as the static value variables.

Gwaihir fucked around with this message at 14:53 on Sep 30, 2011
