Fly
Nov 3, 2002

moral compass
What fill factor are you using when you reindex?

Is something changing the indexed columns?

edit: fill rate -> fill factor

cannibustacap
Jul 7, 2003

Brrrruuuuuiinnssss

cannibustacap posted:

I have XAMPP set up with MySQL working and everything, but I'd like to only deal with gzipped SQL files. I have a database that is 2 GB, but its gzipped SQL dump is closer to 100 MB.


Anyway, I read online that you can use the command:

C:\xampp\mysql\bin>gunzip < local_backup_20090120_newadmin.sql.gz | mysql -r root my_database

But I consistently get
'gunzip' is not recognized as an internal or external command, operable program or batch file.

out of it.

So, how do I enable my XAMPP 1.70's MYSQL to allow the use of GZIP?

Then...

oversteer posted:

Can you use pipes and redirects like that on Windows?

I'd try extracting the .gz file using something (gunzip, WinRAR etc) then do

type local_backup_20090120 | mysql blah blah

But that defeats the purpose of gzipping. My .sql file is 1900 MB, the .gz one is like 100 MB.

And besides, the remote server I use supports the gunzip command.

There must be some .dll or .bin file that I can put in my folder that will allow me to use gunzip. But I just don't know where to go with it.

oversteer
Jun 6, 2005

I don't think your server has gunzip installed, or it's not in your path.

I just tested with gzip, Windows version downloadable from https://www.gzip.org (put it in your Windows directory, it's just one single file)

gzip -d -c file.sql.gz

...this dumps the file to standard output, leaving the .gz file unchanged.

You should be able to pipe that to MySQL

gzip -d -c file.sql.gz | mysql -u root your_database

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

cannibustacap posted:

C:\xampp\mysql\bin>gunzip < local_backup_20090120_newadmin.sql.gz | mysql -r root my_database

But I consistently get
'gunzip' is not recognized as an internal or external command, operable program or batch file.
gunzip is a Linux utility. You need a Windows program which can read .gz files and also run from the command line.

Stephen
Feb 6, 2004

Stoned
In MySQL, is there a difference between these two queries?

SELECT * FROM a
LEFT JOIN b ON b.id = a.b_id AND b.something = 1

SELECT * FROM a
LEFT JOIN b ON b.id = a.b_id
WHERE b.something = 1

Atom
Apr 6, 2005

by Y Kant Ozma Post

Stephen posted:

In MySQL, is there a difference between these two queries?

SELECT * FROM a
LEFT JOIN b ON b.id = a.b_id AND b.something = 1

SELECT * FROM a
LEFT JOIN b ON b.id = a.b_id
WHERE b.something = 1

The second query behaves like an INNER JOIN b ON b.id = a.b_id AND b.something = 1

Stephen
Feb 6, 2004

Stoned

Atom posted:

The second query behaves like an INNER JOIN b ON b.id = a.b_id AND b.something = 1

That is, it won't display results where b.something = NULL.

Errr.. yeah I just re-read that and realized I'm an idiot.

I meant to write INNER instead of LEFT.
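
To make Atom's point concrete, here is a minimal sketch using the hypothetical a/b tables from the question. With the filter in the ON clause, every row of a survives and b's columns come back NULL where no b row matches; moving the filter to the WHERE clause throws those NULL-extended rows away, which is exactly what the INNER JOIN would have produced.

code:
-- Filter in the ON clause: all rows of a are kept.
SELECT a.*, b.something
FROM a
LEFT JOIN b ON b.id = a.b_id AND b.something = 1;

-- Filter in the WHERE clause: unmatched rows have b.something IS NULL,
-- fail the predicate, and are dropped (same result as the INNER JOIN).
SELECT a.*, b.something
FROM a
LEFT JOIN b ON b.id = a.b_id
WHERE b.something = 1;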

cannibustacap
Jul 7, 2003

Brrrruuuuuiinnssss

oversteer posted:

I don't think your server has gunzip installed, or it's not in your path.

I just tested with gzip, Windows version downloadable from https://www.gzip.org (put it in your Windows directory, it's just one single file)

gzip -d -c file.sql.gz

...this dumps the file to standard output, leaving the .gz file unchanged.

You should be able to pipe that to MySQL

gzip -d -c file.sql.gz | mysql -u root your_database

Cool thanks,

But which is the specific file I download and put in my windows directory? There seems to be a huge number of files to get on that site...

pikes
Aug 9, 2002
:o

Fly posted:

What fill factor are you using when you reindex?

Is something changing the indexed columns?

edit: fill rate -> fill factor

We figured it out. There was a job that was running dbcc shrinkfile on each database and that was fragmenting the hell out of it. At least it was easy!
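
For anyone chasing the same symptom: one way to confirm that a shrink job is the culprit is to check fragmentation directly before and after it runs. A minimal sketch for SQL Server 2005+, run from inside the affected database (the 30 percent threshold is just a common rule of thumb, not anything from this thread):

code:
-- LIMITED mode keeps the scan cheap; run inside the database in question.
SELECT OBJECT_NAME(ips.object_id)        AS table_name,
       i.name                            AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 30
ORDER BY ips.avg_fragmentation_in_percent DESC;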

oversteer
Jun 6, 2005

cannibustacap posted:

Cool thanks,

But which is the specific file I download and put in my windows directory? There seems to be a huge number of files to get on that site...

http://www.gzip.org/gzip124xN.zip

open that and put "gzip.exe" into c:\windows

Xae
Jan 19, 2005

It wasn't until I used Teradata that I realized how badly all other explain plans sucked.

code:
Explanation
 1) First, we lock ADHOC.widgets for access, we lock
    PROD.SALE_TRANS_LINE for access, we lock
    PROD.SALE_TRANS for access, we lock
    PROD.SALE_TRANS_PARTY for access, we lock
    PROD.PARTY_CUSTOMER for access, and we lock
    PROD.INVENTORY_ITEM for access.
 2) Next, we do an all-AMPs RETRIEVE step from
    ADHOC.widgets by way of an all-rows scan with a
    condition of ("(NOT (ADHOC.widgets.widget_ITEM_ID IS
    NULL )) AND ((NOT (ADHOC.widgets.SALES_ID IS NULL )) AND
    (NOT (ADHOC.widgets.SALE_DATE IS NULL )))") into
    Spool 2 (all_amps), which is built locally on the AMPs.  Then we
    do a SORT to partition Spool 2 by rowkey.  The size of Spool 2 is
    estimated with no confidence to be 368,466 rows.  The estimated
    time for this step is 0.03 seconds.
 3) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
    RowHash match scan, which is joined to PROD.SALE_TRANS_LINE by
    way of a RowHash match scan with a condition of ("NOT
    (PROD.SALE_TRANS_LINE.ITEM_ID IS NULL)").  Spool 2 and
    PROD.SALE_TRANS_LINE are joined using a rowkey-based
    sliding-window merge join (contexts = 1, 16), with a join
    condition of ("(SALES_ID = PROD.SALE_TRANS_LINE.SALES_ID) AND
    (SALE_DATE = PROD.SALE_TRANS_LINE.SALE_DATE)").  The input
    table PROD.SALE_TRANS_LINE will not be cached in memory.  The
    result goes into Spool 3 (all_amps), which is redistributed by
    hash code to all AMPs.  Then we do a SORT to order Spool 3 by row
    hash.  The size of Spool 3 is estimated with no confidence to be
    357,487 rows.  The estimated time for this step is 8.29 seconds.
....
 7) We execute the following steps in parallel.
      1) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of
         a RowHash match scan, which is joined to
         PROD.INVENTORY_ITEM by way of a RowHash match scan with
         no residual conditions locking PROD.INVENTORY_ITEM for
         access.  Spool 6 and PROD.INVENTORY_ITEM are joined using
         a merge join, with a join condition of (
         "PROD.INVENTORY_ITEM.ITEM_ID = widget_ITEM_ID").  The
         result goes into Spool 9 (all_amps), which is redistributed
         by hash code to all AMPs.  Then we do a SORT to order Spool 9
         by row hash.  The size of Spool 9 is estimated with no
         confidence to be 469,527 rows.  The estimated time for this
         step is 0.34 seconds.
      2) We do an all-AMPs RETRIEVE step from PROD.PARTY_CUSTOMER
         by way of an all-rows scan with no residual conditions into
         Spool 10 (all_amps), which is redistributed by hash code to
         all AMPs.  Then we do a SORT to order Spool 10 by row hash.
         The result spool file will not be cached in memory.  The size
         of Spool 10 is estimated with high confidence to be
         179,947,614 rows.  The estimated time for this step is 1
         minute and 27 seconds.
 8) We do an all-AMPs JOIN step from Spool 9 (Last Use) by way of a
    RowHash match scan, which is joined to Spool 10 (Last Use) by way
    of a RowHash match scan.  Spool 9 and Spool 10 are left outer
    joined using a merge join, with a join condition of ("(CUST_ID )=
    (TRANSLATE((PARTY_ID )USING LATIN_TO_UNICODE)(FLOAT, FORMAT
    '-9.99999999999999E-999'))").  The result goes into Spool 1
    (group_amps), which is built locally on the AMPs.  Then we do a
    SORT to order Spool 1 by the sort key in spool field1.  The size
    of Spool 1 is estimated with no confidence to be 469,527 rows.
    The estimated time for this step is 7.31 seconds.
 9) Finally, we send out an END TRANSACTION step to all AMPs involved
    in processing the request.
 -> The contents of Spool 1 are sent back to the user as the result of
    statement 1.  The total estimated time is 2 minutes and 10 seconds.
My Favorite part:

quote:

We do an all-AMPs RETRIEVE step from PROD.PARTY_CUSTOMER
by way of an all-rows scan with no residual conditions into
Spool 10 (all_amps), which is redistributed by hash code to
all AMPs. Then we do a SORT to order Spool 10 by row hash.
The result spool file will not be cached in memory. The size
of Spool 10 is estimated with high confidence to be
179,947,614 rows. The estimated time for this step is 1
minute and 27 seconds.


Retrieve 180,000,000 rows, hash and sort them in 90 seconds?

:hellyeah:

Compare with Oracle's explain plan (different query):
code:
Plan Table
------------------------------------------------------------------------------------------------------------ 
| Operation                  |  Name    |  Rows | Bytes|  Cost  |  TQ  |IN-OUT| PQ Distrib | Pstart| Pstop | 
------------------------------------------------------------------------------------------------------------ 
| SELECT STATEMENT           |          |    1  |  51  |      3 |      |      |            |       |       |
|  HASH JOIN                 |          |     1 |   51 |      3 | 0,01 | P->S | QC (RANDOM)|       |       | 
|   PARTITION RANGE ITERATOR |          |       |      |        | 0,01 | PCWP |            |     4 |     5 | 
|    PARTITION HASH ALL      |          |       |      |        | 0,01 | PCWP |            |     1 |     3 | 
|     TABLE ACCESS FULL      |EMP_COMP  |     3 |   87 |      1 | 0,01 | PCWP |            |    10 |    15 | 
|   TABLE ACCESS FULL        |DEPT      |    21 |  462 |      1 | 0,00 | P->P | PART (KEY) |       |       | 
------------------------------------------------------------------------------------------------------------ 
9 rows selected.

Xae fucked around with this message at 17:58 on Jan 23, 2009

epswing
Nov 4, 2003

Soiled Meat
I have Users, Events, and Roles.

Users control many Events, and Events can be controlled by many Users.

Roles restrict how much control a User has over a given Event. Users can have many roles (per Event). Roles are used by many Users.

The Roles table will have a mostly-unchanging list of Roles, for example: "view_widgets", "print_reports", "edit_attendants", basically an ACL of possible things a User can do to an Event.

So I want a User to be associated with many Events, and for each of those Events, have a set of Roles.

How best to describe this in terms of database table structure? A single join table, or two?

How best to describe this in Hibernate using annotations to describe the relationships between User, Event and Role?

Note: A User can only control one Event at a time, in the web interface.

Who needs to know what?

Users need to know
-- Which Events can I control?
-- Which Roles do I have for the Event I'm currently controlling?

Events need to know
-- Which Users can control me?

Roles don't need to know anything.

epswing fucked around with this message at 22:07 on Jan 23, 2009

Pizza Partisan
Sep 22, 2003

epswing posted:

I have Users, Events, and Roles.

Users control many Events, and Events can be controlled by many Users.

Roles restrict how much control a User has over a given Event. Users can have many roles (per Event). Roles are used by many Users.

The Roles table will have a mostly-unchanging list of Roles, for example, just 4 rows: "create", "view", "edit", "delete".

So I want a User to be associated with many Events, and for each of those Events, have a set of Roles.

How best to describe this in terms of database table structure? A single join table, or two?

How best to describe this in Hibernate using annotations to describe the relationships between User, Event and Role? I'm relatively new to Hibernate.

Your tables would look like this:
code:
User Table - UserID PK
Event Table - EventID PK
Role Table - RoleID PK
EventRole Table - EventRoleID PK, EventID, UserID, RoleID
As for expressing it in Hibernate, I don't work in Java, so I can't really begin to help you. :smith:
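
For what it's worth, a purely illustrative sketch of that structure as DDL (table names, column types, and the surrogate EventRoleID are assumptions based on the columns listed above); the unique key on (EventID, UserID, RoleID) keeps a user from holding the same role twice for one event:

code:
CREATE TABLE Users  (UserID  INT PRIMARY KEY, Name VARCHAR(100));
CREATE TABLE Events (EventID INT PRIMARY KEY, Name VARCHAR(100));
CREATE TABLE Roles  (RoleID  INT PRIMARY KEY, Name VARCHAR(50));

-- One row per (event, user, role) grant.
CREATE TABLE EventRole (
  EventRoleID INT PRIMARY KEY,
  EventID     INT NOT NULL REFERENCES Events(EventID),
  UserID      INT NOT NULL REFERENCES Users(UserID),
  RoleID      INT NOT NULL REFERENCES Roles(RoleID),
  UNIQUE (EventID, UserID, RoleID)
);
The surrogate key could also be dropped in favour of making that composite unique key the primary key.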

epswing
Nov 4, 2003

Soiled Meat

surrealcatalyst posted:

Your tables would look like this:
code:
User Table - UserID PK
Event Table - EventID PK
Role Table - RoleID PK
EventRole Table - EventRoleID PK, EventID, UserID, RoleID
As for expressing it in Hibernate, I don't work in Java, so I can't really begin to help you. :smith:

Yes, a single join table would indeed encapsulate all the necessary links.

I have no idea how to represent this in Hibernate. It's like a many-to-many-to-many join table. I'm not even sure how to best represent this in Java.

cannibustacap
Jul 7, 2003

Brrrruuuuuiinnssss

oversteer posted:

http://www.gzip.org/gzip124xN.zip

open that and put "gzip.exe" into c:\windows

Thanks! But one issue.

While it does work to do the decompression on the command line, it doesn't work with the mysql command.


This works: gzip -d beta_backup_20090115.sql.gz | mysql -u root DB_name

But all it does is decompress the file. It doesn't upload it to the MySQL database.

What I'd like to do, in one line, is:
1) Decompress the .gz file
2) Upload it to the MySQL server
3) Delete the .sql file

Is there any way to do that? Can't seem to figure it out.

epswing
Nov 4, 2003

Soiled Meat
Here's maybe a more direct (Hibernate) question. How can I model a Map which is keyed by an entity, and each value is a Set of other entities?

code:
public class User {
  private Map<Event, Set<Role>> roles;
}
So for a User u, and Event e,

code:
u.getRoles().keySet() // Set of events this User can control
u.getRoles().get(e) // Set of Roles a User has for a particular Event

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
I'm using MySQL and I was wondering if there is an easy way to do the following:

SELECT Product, SUM(B1) as group0, SUM(B2) as group1 FROM table WHERE Product IN ('a', 'b', 'c') GROUP BY Product

Now let's say there's also a Product 'd' which is defined as a family of products: the sum of 'a' and 'b' = 'd'. Is there a way to have MySQL calculate 'd' for me as well? I'm currently just doing it in java after the above query executes. Note that the product family is not defined in a MySQL table (retarded, yes).

fletcher fucked around with this message at 02:30 on Jan 24, 2009

Bad Titty Puker
Nov 3, 2007
Soiled Meat

fletcher posted:

I'm using MySQL and I was wondering if there is an easy way to do the following:

SELECT Product, SUM(B1) as group0, SUM(B2) as group1 FROM table WHERE Product IN ('a', 'b', 'c') GROUP BY Product

Now let's say there's also a Product 'd' which is defined as a family of products: the sum of 'a' and 'b' = 'd'. Is there a way to have MySQL calculate 'd' for me as well? I'm currently just doing it in java after the above query executes. Note that the product family is not defined in a MySQL table (retarded, yes).

something like this?

code:
SELECT
  Product,
  SUM(B1) AS group0,
  SUM(B2) AS group1
FROM
  table
WHERE
  Product IN ('a', 'b', 'c')
GROUP BY 
  Product

UNION

SELECT
  'd' AS Product,
  SUM(B1) AS group0,
  SUM(B2) AS group1
FROM
  table
WHERE
  Product IN ('a' , 'b')

Vince McMahon
Dec 18, 2003
I took your guys advice regarding the image library db. As a recap, the current database structure is [imagecode][title][keywords][etc], where keywords is a comma-delimited field (i.e. "dog,cat,fish,ape", up to 255 characters). Searching for a keyword is essentially doing a "LIKE" search with mysql, in the most simple case.

The advice given to me before was to structure it into three tables, like this:

Table1: [imagecode][title][image_idnum]
Table2: [image_idnum][keyword_idnum]
Table3: [keyword_idnum][keyword]

So, every unique keyword has an entry in table 3. Every new image is inserted into table 1, and then an entry is made in table 2 for every keyword for that image. So, the query to pull out the relevant info turns from:

code:
SELECT imagecode etc FROM oldtable WHERE keywords LIKE '%dog%'
to:

code:
SELECT t1.imagecode etc 
FROM t1 

INNER JOIN t2
ON t1.image_idnum = t2.image_idnum

INNER JOIN t3
ON t2.keyword_idnum = t3.keyword_idnum

WHERE t3.keyword = 'dog' 
Now, this current query works about 100x quicker than the old one, which is great. The problem however is that the query alone returns multiple hits for a particular imagecode. So I figured I'd just add a "GROUP BY t1.imagecode" to the end, but it's made it about 35 times slower than the old way. I'm basically wondering if I'm doing this the right way, or whether I've flubbed it somewhere (bear in mind that the library has 100,000 images).

I've indexed all the fields on t2 and t3, plus the imagecode and image_idnum from t1.

Any input is appreciated.

Sergeant Hobo
Jan 7, 2007

Zhu Li, do the thing!

Vince McMahon posted:


Now, this current query works about 100x quicker than the old one, which is great. The problem however is that the query alone returns multiple hits for a particular imagecode. So I figured I'd just add a "GROUP BY t1.imagecode" to the end, but it's made it about 35 times slower than the old way. I'm basically wondering if I'm doing this the right way, or whether I've flubbed it somewhere (bear in mind that the library has 100,000 images).

I've indexed all the fields on t2 and t3, plus the imagecode and image_idnum from t1.

Any input is appreciated.

I believe you're looking for the DISTINCT keyword, as in SELECT DISTINCT t1.imagecode etc.

megalodong
Mar 11, 2008

Vince McMahon posted:

Now, this current query works about 100x quicker than the old one, which is great. The problem however is that the query alone returns multiple hits for a particular imagecode. So I figured I'd just add a "GROUP BY t1.imagecode" to the end, but it's made it about 35 times slower than the old way. I'm basically wondering if I'm doing this the right way, or whether I've flubbed it somewhere (bear in mind that the library has 100,000 images).
That frequently means there are duplicated entries somewhere. Do you have a unique constraint on table 2?
Sounds like an image is listed with the same keyword more than once.

Vince McMahon
Dec 18, 2003

Sergeant Hobo posted:

I believe you're looking for the DISTINCT keyword, as in SELECT DISTINCT t1.imagecode etc.

Aha that seemed to work, cheers. I always thought DISTINCT was one of those words to avoid for some reason.

megalodong posted:

That frequently means there are duplicated entries somewhere. Do you have a unique constraint on table 2?
Sounds like an image is listed with the same keyword more than once.

Table 2 looks like this at the moment:

code:
ImageID    KeywordID
1          456
1          78
1          9
2          4
2          5145
3          672
3          4
So there is no unique constraint as such. For every keyword associated with an image, there is a row in this table. It is possible that there are a few duplicate entries but I don't think it would be enough to slow it down this much.
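
If duplicate rows do turn out to be the cause, one option (a sketch only, using the column names from the three-table layout described earlier; any existing duplicate pairs would need to be removed first) is a composite unique key on the link table, which also gives MySQL an index it can use for the join:

code:
-- Assumes t2 is the image/keyword link table and that duplicate
-- (image_idnum, keyword_idnum) pairs have already been cleaned out.
ALTER TABLE t2
  ADD UNIQUE KEY uq_image_keyword (image_idnum, keyword_idnum);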

oversteer
Jun 6, 2005

cannibustacap posted:

But all it does is decompress the file. It doesn't upload it to the MySQL database.



Try this: gzip -c -d beta_backup_20090115.sql.gz | mysql -u root DB_name

The -c flag tells gzip to write to standard output (allowing it to be piped to mysql using |).

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Just wanted a second opinion on this SO post. Converting data formats from one thing to another isn't ORM, right? Right? :psyduck:

Edit: Clarification for my homies in #cobol.

Imagine you had to make a master database of all the criminals in the United States. Your company's service is selling a product which sits atop and provides access to this data in a clean, unified format.

This data is provided publicly by the 50 states, but in wildly different formats. Some are one file of data, not normalized. Others are normalized tables in CSV format. Some are Excel documents. Some are TSVs. Some records are even provided in a form that isn't complete without manual intervention (other, manually created data sources).

The purpose of my project is to make a "driver" for each of the 50 states and make sure the end product of the process is a master database of criminals in a perfect, relational model. Everything keyed correctly, the schema in perfect shape, etc.

Triple Tech fucked around with this message at 21:10 on Jan 26, 2009

mister_gosh
May 24, 2002

I'm creating some tables to track the number of times users log in to my database and to track how often certain tools are executed.

I think these are my tables: User (includes location and the product they're affiliated with), Product, Tool, Login

A typical month may show:

John is located in Sarasota, FL, works with Acme FX (product), logged in to the database once every day in January (twice on 1/15), executed the Grow tool on 1/11, 1/14 and twice on 1/26, executed the Kill Whitey tool on 1/2 and five times on 1/25.

Jane is located in North Pole, Greenland and ...

Based on this stuff, I eventually want to write an application which can extract information such as usage from Florida, John's usage, a certain tool's usage, etc.

My question is, how do I model this? I want to track the Grow application on a day-to-day basis. I want to track that a certain user used it, on which days, etc. I want to pull up the number of logins for all users for 2008, or just John's for January 2009. Do I create another table called Usage with many date columns? Do I create a monthly table with rows for 1-31?

Any links to creating this type of database table?

mister_gosh fucked around with this message at 22:12 on Jan 26, 2009

Dromio
Oct 16, 2002
Sleeper
I'm having a hell of a time figuring out how to do this. I'm looking at the PIVOT command, but I'm pretty clueless.

With tables like this:
code:
Documents Table
+++++++++++++++
DocumentID
DocumentName

MetaData Table
++++++++++++++
MetaDataID
MetaDataName

DocumentsData Table
+++++++++++++++++++
DocumentID
MetaDataID
MetaDataValue
I know there are rows in the MetaData table that define "Title" and "Location" MetaDataIDs. I want to get those values for a specific document and create a view like this:

code:
DocumentID    Title              Location
57_312        "My doc"           "North America"
57_313        "Next doc"         "South America"
Is this what PIVOT is about, or am I barking up the wrong tree?

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
mister_gosh: Not sure what Tool is, but you probably want a table called "events" or something. Whatever your atomic unit of use is. Are you differentiating between an application login and the use of a specific tool? Or does the mere use of a tool imply an immediate login? Anyway, you'll have

events: id, user_id, product_id, date_time

This will tell you who used what product and when. The questions you ask on top of that (what events happened in January, who used it the most, who used it last year, etc.) sit on top of a table that looks like this. The table is simple; asking the questions and developing the application are the hard parts.
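
A sketch of what that could look like, with one example question layered on top (the table and column names follow the suggestion above; the types and the month-counting query are assumptions, not anything from mister_gosh's actual schema):

code:
-- Hypothetical DDL for the suggested events table.
CREATE TABLE events (
  id         INT PRIMARY KEY,
  user_id    INT NOT NULL,
  product_id INT NOT NULL,
  date_time  DATETIME NOT NULL
);

-- Example question: how many events did each user generate in January 2009?
SELECT user_id, COUNT(*) AS event_count
FROM events
WHERE date_time >= '2009-01-01'
  AND date_time <  '2009-02-01'
GROUP BY user_id;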

Dromio: Pivoting is like... When you want to have a value become a dimension. Let's say you have a table that models monthly sales...

code:
January  08 : 25
February 08 : 12
March    09 : 7
April    09 : 8
You can rotate or pivot that data so it looks like this:

code:
Year : January : February : March : April
2008 :      25 :       12 :       :
2009 :         :          :     7 :     8
See how the original left values became the column names? That's pivoting.

Dromio
Oct 16, 2002
Sleeper

Triple Tech posted:

Pivoting is like... When you want to have a value become a dimension.

That sounds like what I'm doing, making some of the MetaDataName values into columns. But every example I see ends up using aggregation functions within or something. Maybe I just haven't found a simple enough sample query.

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Dromio posted:

I'm having a hell of a time figuring out how to do this. I'm looking at the PIVOT command, but I'm pretty clueless.

With tables like this:
code:
Documents Table
+++++++++++++++
DocumentID
DocumentName

MetaData Table
++++++++++++++
MetaDataID
MetaDataName

DocumentsData Table
+++++++++++++++++++
DocumentID
MetaDataID
MetaDataValue
I know there are rows in the MetaData table that define "Title" and "Location" MetaDataIDs. I want to get those values for a specific document and create a view like this:

code:
DocumentID    Title              Location
57_312        "My doc"           "North America"
57_313        "Next doc"         "South America"
Is this what PIVOT is about, or am I barking up the wrong tree?
If I'm reading you correctly, you don't need to pivot, you just need to JOIN

EDIT:

Oh, I get it! You've got some EAV (Entity Attribute Value) going on. That is kinda like pivot, but not really. What you need to do is just join against the DocumentsData and MetaData tables multiple times. Something like

code:
select doc.DocumentId
      ,ttl.MetaDataValue Title
      ,loc.MetaDataValue Location
from Documents doc inner join
     DocumentsData ttl on ttl.DocumentId = doc.DocumentId inner join
     MetaData ttl_md on ttl.MetaDataID = ttl_md.MetaDataID inner join
     DocumentsData loc on loc.DocumentId = doc.DocumentId inner join
     MetaData loc_md on loc.MetaDataID = loc_md.MetaDataID
where ttl_md.MetaDataName = 'Title'
      and loc_md.MetaDataName = 'Location'

Jethro fucked around with this message at 23:20 on Jan 26, 2009

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?

Dromio posted:

That sounds like what I'm doing, making some of the MetaDataName values into columns. But every example I see ends up using aggregation functions within or something. Maybe I just haven't found a simple enough sample query.

By definition, they have to, because if multiple rows in my example were modeled for January 08, they would have to collapse into one cell somehow. I guess you could get away with this by enforcing uniqueness on the tuple of attribute-title and document-id, and then using MAX/MIN (the max and min of a one-element set return the same item). You seem to be doing EAV, which I'm not a fan of, but some say it's inescapable.
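
To illustrate the MAX trick against the Documents schema from a few posts up (a sketch only; it assumes each document has at most one value per metadata name, so MAX simply picks that single value):

code:
SELECT dd.DocumentID,
       MAX(CASE WHEN md.MetaDataName = 'Title'    THEN dd.MetaDataValue END) AS Title,
       MAX(CASE WHEN md.MetaDataName = 'Location' THEN dd.MetaDataValue END) AS Location
FROM DocumentsData dd
INNER JOIN MetaData md ON md.MetaDataID = dd.MetaDataID
GROUP BY dd.DocumentID;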

Vanadium Dame
May 29, 2002

HELLO I WOULD LIKE TO TALK TO YOU ABOUT MY VERY STRONG OPINIONS

Vince McMahon posted:

Aha that seemed to work, cheers. I always thought DISTINCT was one of those words to avoid for some reason.

DISTINCT is a sort operation, so using it when you don't need to should be avoided to reduce response time. But it's fine in your situation.

epswing
Nov 4, 2003

Soiled Meat

Triple Tech posted:

Imagine you had to make a master database of all the criminals in the United States. Your company's service is selling a product which sits atop and provides access to this data in a clean, unified format.

This data is provided publicly by the 50 states, but in wildly different formats. Some are one file of data, not normalized. Others are normalized tables in CSV format. Some are Excel documents. Some are TSVs. Some records are even provided in a form that isn't complete without manual intervention (other, manually created data sources).

The purpose of my project is to make a "driver" for each of the 50 states and make sure the end product of the process is a master database of criminals in a perfect, relational model. Everything keyed correctly, the schema in perfect shape, etc.

We have a similar can of worms at work. I work with a large set of XML files (~14 million) in a document-driven database (MarkLogic Server). The files are journal articles, sent to us by dozens of publishers, and we're loading about 10,000 articles per day (which includes both backfiles and newly released works). In our database, they all conform to the NLM XML DTD, but the publishers send us all sorts of crap: text (multiple articles per file), text (one article per file), CSV, their own mangled XML, etc.

So we have a couple of people on our team dedicated solely to writing loaders for each publisher, to get all the data into our format.

Needless to say, it's a gigantic pain in the rear end.

epswing fucked around with this message at 07:45 on Jan 27, 2009

Victor
Jun 18, 2004
Dromio, pivot is probably exactly what you want (and SQL2005 appears to execute it quite efficiently, better than the alternative joins). I do this in my historical EAV article.

mister_gosh
May 24, 2002

Are there any good free or cheap database modeling tools out there like Embarcadero or Toad? I can't justify spending close to or over $1000 on something like that.



Triple Tech posted:

mister_gosh: Not sure what Tool is, but you probably want a table called "events" or something. Whatever your atomic unit of use is. Are you differentiating between an application login and the use of a specific tool? Or does the mere use of a tool imply an immediate login? Anyway, you'll have

events: id, user_id, product_id, date_time

This will tell you who used what product and when. The questions you ask on top of that (what events happened in January, who used it the most, who used it last year, etc.) sit on top of a table that looks like this. The table is simple; asking the questions and developing the application are the hard parts.


Thanks, I'll try it out!

Dromio
Oct 16, 2002
Sleeper
Thanks to everyone who told me about EAV. That is exactly what the database contains, I just didn't even know the correct name for it.

Victor posted:

Dromio, pivot is probably exactly what you want (and SQL2005 appears to execute it quite efficiently, better than the alternative joins). I do this in my historical EAV article.

Victor, your article has me very close to what I need. But in your example you're only getting one jobrole (MAX) for the user that has two defined. I want to return a row for every jobrole so that the output would look like this:
code:
record_id first_name last_name job_title     
--------- ---------- --------- --------------
1         Larry      Smith     Senior Manager
2         Larry      Smith     Manager
3         Victor     NULL      NULL          
Is this possible?

EDIT: Nevermind, it looks like Jethro's post gets me what I need.

Dromio fucked around with this message at 17:38 on Jan 27, 2009

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
For SQL Server 2005, how can I update table A if the information that limits the query is in joinable table B?

Like, UPDATE goons SET avatar = gangsign WHERE threads this person has posted has thread.score < 2

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Triple Tech posted:

For SQL Server 2005, how can I update table A if the information that limits the query is in joinable table B?

Like, UPDATE goons SET avatar = gangsign WHERE threads this person has posted has thread.score < 2

code:
UPDATE g
   SET avatar = gangsign
  FROM goons g
 WHERE --conditions
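
Filling in the commented-out condition with a hypothetical threads table (the poster_id and score columns and the join itself are made up from the wording of the question, not from a real schema):

code:
UPDATE g
   SET avatar = gangsign
  FROM goons g
       INNER JOIN threads t
          ON t.poster_id = g.id   -- hypothetical link between goons and threads
 WHERE t.score < 2;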

Dromio
Oct 16, 2002
Sleeper
Jethro,

Thank you. I revisited your post with a clearer head and it turned out to be just what I needed.

oversteer
Jun 6, 2005

Indexes on tables in MySQL -

We have a relatively hefty table (75m rows, 4GB) which is currently split up into months using a set of merge tables.

It stores log data from sensors and is arranged as follows

ID (autoincrement)
Location (Foreign Key)
Point (Foreign Key)
Value
Timestamp

There's one index (Location, Point, Timestamp). Searching on those fields is nice and quick; however, sometimes we need to search by Location + Timestamp. Due to the input process we can't normalise this any further, but adding a new index (Location, Timestamp) will increase the table size massively, and I'm not sure of the benefit (will the 3 GB or so index that gets created be of any use if it won't be held in memory?). However, the query really can't take more than ~20s, as the whole table gets locked and inserts quickly fill up the connection pool...

Should I just add the index or is it time to look at a total reorganisation of this table? I'm not sure whether the Merge table complicates matters (especially the locking, as it's MyISAM)

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Does the first tuple need to be in that particular order? If you can rearrange it to match Location-Timestamp[-Point], then your two use cases can consume one index (at least for SQL Server).
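
A sketch of what that reordering might look like in MySQL (the table and index names here are made up; with a MERGE setup the index has to exist, with a matching definition, on each underlying monthly MyISAM table, and rebuilding indexes on a 75m-row table is a maintenance-window job):

code:
-- Hypothetical names; repeat for every underlying monthly table.
ALTER TABLE sensor_log_2009_01
  DROP INDEX idx_loc_point_ts,
  ADD INDEX idx_loc_ts_point (Location, Timestamp, Point);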
