NoSpoon
Jul 2, 2004
I have a table with some email addresses, dates and names.

I select the data ordered by email, then date and get
code:
blah@blah.com   Joe Bloggs       27 June
blah@blah.com   John Smith       29 June
what@ever.com   Jim Brown        25 June
what@ever.com   Jack Black       29 June
Which displays something like
code:
blah@blah.com
    Joe Bloggs       27 June
    John Smith       29 June

what@ever.com
    Jim Brown        25 June
    Jack Black       29 June
If I order by date then email...
code:
what@ever.com   Jim Brown        25 June
blah@blah.com   Joe Bloggs       27 June
what@ever.com   Jack Black       29 June
blah@blah.com   John Smith       29 June
Which displays something like
code:
what@ever.com
    Jim Brown        25 June

blah@blah.com
    Joe Bloggs       27 June

what@ever.com
    Jack Black       29 June

blah@blah.com
    John Smith       29 June
The display I want is:
code:
what@ever.com
    Jim Brown        25 June
    Jack Black       29 June

blah@blah.com
    Joe Bloggs       27 June
    John Smith       29 June
Which is ordered by date, but kept within the grouping of email addresses. This would be the result from the DB:
code:
what@ever.com   Jim Brown        25 June
what@ever.com   Jack Black       29 June
blah@blah.com   Joe Bloggs       27 June
blah@blah.com   John Smith       29 June
I can, of course, use PHP to grab the list of email addresses, then make a query for every single address, but one query is far preferable to hundreds (or thousands).

Any way to do this within MySQL?
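One way to get that ordering (a sketch, not from the thread) is to join each row to its group's earliest date and sort by that first, so whole email groups are ordered by their oldest entry while rows inside a group stay sorted by date. Shown here with Python's sqlite3 for a runnable demo; the table and column names are illustrative, but the same SQL works in MySQL versions with subquery support:

```python
import sqlite3

# Hypothetical table mirroring the post; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (email TEXT, name TEXT, d TEXT)")
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", [
    ("blah@blah.com", "Joe Bloggs", "2009-06-27"),
    ("blah@blah.com", "John Smith", "2009-06-29"),
    ("what@ever.com", "Jim Brown",  "2009-06-25"),
    ("what@ever.com", "Jack Black", "2009-06-29"),
])

# Join each row to its group's MIN(date); order by that first (so the group
# containing the oldest row comes first), keep the group together via email,
# then order within the group by date.
rows = conn.execute("""
    SELECT c.email, c.name, c.d
    FROM contacts c
    JOIN (SELECT email, MIN(d) AS first_d
          FROM contacts
          GROUP BY email) g ON g.email = c.email
    ORDER BY g.first_d, c.email, c.d
""").fetchall()
for r in rows:
    print(r)
# what@ever.com rows (earliest 25 June) come first, each group sorted by date
```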


Bad Titty Puker
Nov 3, 2007
Soiled Meat
SQL DBMSes use three-valued logic, or 3VL, and NULLs are placeholders: indeterminate values indicating that a data value is missing or unknown.

In 3VL:

code:
True & True   => True
True & False  => False
False & False => False
True & Null   => Unknown
False & Null  => False
Null & Null   => Unknown

True | True   => True
True | False  => True
True | Null   => True
False | False => False
False | Null  => Unknown
Null | Null   => Unknown

(x = NULL)    => Unknown
(x IS NULL)   => True if value x is NULL
code:
create table t(val int, ct int not null)

insert into t(val, ct) values(NULL, 10)
insert into t(val, ct) values(1, 20)
insert into t(val, ct) values(2, 30)


select * from t where val = NULL /* = select * from t where (unknown) */

val         ct
----------- -----------

(0 row(s) affected)


select * from t where val is NULL 

val         ct
----------- -----------
NULL        10


select sum(val), sum(ct) from t /* = sum of non-null values */
----------- -----------
3           60


select count(*) from t where val = -1 
/* COUNT() returns the number of rows in the result set */

-----------
0


select sum(val) from t where val = -1 
/* SUM of empty set is NULL in SQL database systems */

-----------
NULL
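The examples above can be reproduced from Python with sqlite3, which follows the same NULL rules; this is just the post's table re-run, not new material:

```python
import sqlite3

# Same table and data as in the post above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER, ct INTEGER NOT NULL)")
conn.executemany("INSERT INTO t(val, ct) VALUES (?, ?)",
                 [(None, 10), (1, 20), (2, 30)])

eq_null  = conn.execute("SELECT * FROM t WHERE val = NULL").fetchall()   # comparison is UNKNOWN
is_null  = conn.execute("SELECT * FROM t WHERE val IS NULL").fetchall()
sums     = conn.execute("SELECT SUM(val), SUM(ct) FROM t").fetchone()    # SUM skips NULLs
cnt      = conn.execute("SELECT COUNT(*) FROM t WHERE val = -1").fetchone()[0]
sum_none = conn.execute("SELECT SUM(val) FROM t WHERE val = -1").fetchone()[0]

print(eq_null)   # [] -- no row satisfies an UNKNOWN predicate
print(is_null)   # [(None, 10)]
print(sums)      # (3, 60)
print(cnt)       # 0
print(sum_none)  # None -- SUM over the empty set is NULL
```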

No Safe Word
Feb 26, 2005

Markoff Chaney posted:

NULL can be useful, just remember that it's strange, at least in Oracle. 1 = NULL and 1 != NULL both evaluate as false for one thing, and for another you can't index on null. If you've got a column with an index on it and there are null values in there you can use that index if you specify <column value> = <whatever>, but not <column value> IS NULL;

EDIT: and I just used a ; instead of a . at the end of a sentence, I've been doing this poo poo for too long

Basically never do equality tests with NULL on any platform ever, and always use IS NULL. Yes, it can bone up indexing, but it's better that it's right first ;)

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
When I go to phpMyAdmin and view the details of most of the tables in my database, I see something that looks like the attached image.

My database has 7 tables. Two of them have two columns as a Primary Key. Those tables do not exhibit this behaviour in phpMyAdmin. The other 5 each have a single column as Primary Key. The naming convention for each of these tables' Primary Key columns is (tablename)ID. So the User table has a Primary Key called UserID. Those tables all exhibit this behaviour. I don't know what PRIMARY is in the image; there is no column called that in the table, so I don't understand what phpMyAdmin is trying to tell me.


Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Hammerite posted:

When I go to phpMyAdmin and view the details of most of the tables in my database, I see something that looks like the attached image.

My database has 7 tables. Two of them have two columns as a Primary Key. Those tables do not exhibit this behaviour in phpMyAdmin. The other 5 each have a single column as Primary Key. The naming convention for each of these tables' Primary Key columns is (tablename)ID. So the User table has a Primary Key called UserID. Those tables all exhibit this behaviour. I don't know what PRIMARY is in the image; there is no column called that in the table, so I don't understand what phpMyAdmin is trying to tell me.
Some googling suggested that the problem is with something called an "index", which is related in a fashion I don't understand to the primary key for a table. The 5 tables with the warning message each have two indexes, one of which is named after the primary key column that I specified when I created the table, the other called "PRIMARY".

The attached image shows the results of asking MySQL to show me the indexes on two of my tables, one of which (Message) is one with duplicate indexes, the other of which (WatchedGame) has a single index involving two columns.

It becomes clear that there was some fault with the queries I used to create these tables. Something in said queries caused MySQL to get the wrong idea and add an additional index to each one. But I'm confused as to how a CREATE TABLE query should be written, because I didn't use different syntax for the different types of tables. I created WatchedGame using

code:
CREATE TABLE WatchedGame(
    User BIGINT(20) UNSIGNED NOT NULL,
    Game BIGINT(20) UNSIGNED NOT NULL,
    PRIMARY KEY(User, Game)
) CHARACTER SET utf8 COLLATE utf8_general_ci
I created Message using

code:
CREATE TABLE Message(
    MessageID SERIAL,
    User BIGINT(20) UNSIGNED NOT NULL,
    Thread BIGINT(20) UNSIGNED NOT NULL,
    PostDate DATETIME NOT NULL,
    MessageText TEXT NOT NULL,
    DeletedByUser TINYINT(1) NOT NULL DEFAULT 0,
    DeletedByAdmin TINYINT(1) NOT NULL DEFAULT 0,
    PRIMARY KEY(MessageID)
) CHARACTER SET utf8 COLLATE utf8_general_ci
Evidently there's something wrong with the second of these queries, but what is it? The syntax is identical so I don't understand why MySQL's behaviour was different.

Also, I guess I should delete one of the duplicate indexes, but does it matter which one? If I delete the wrong one, will MessageID cease to be my primary key and cause me problems, or any other wacky poo poo like that?

code:
DROP INDEX `PRIMARY` ON Message


Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
I just tested this out again by creating a throwaway table

code:
CREATE TABLE Bumblebee (
    BumblebeeID SERIAL,
    BeeName VARCHAR(20) NOT NULL,
    PRIMARY KEY (BumblebeeID)
)
It does indeed create the table with a primary key and a duplicate unique key. I've tried googling but I haven't found any pages explaining this behaviour. This seems bizarre.

Edit: The problem is with using SERIAL, which MySQL expands to BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE; that implicit UNIQUE key plus my explicit PRIMARY KEY is where the duplicate index comes from. I ought to use BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT in its place.

Hammerite fucked around with this message at 21:55 on Jul 3, 2009

Bad Titty Puker
Nov 3, 2007
Soiled Meat

Hammerite posted:


code:
CREATE TABLE WatchedGame(
    User BIGINT(20) UNSIGNED NOT NULL,
    Game BIGINT(20) UNSIGNED NOT NULL,
    PRIMARY KEY(User, Game)
) CHARACTER SET utf8 COLLATE utf8_general_ci
code:
CREATE TABLE Message(
    MessageID BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
    User BIGINT(20) UNSIGNED NOT NULL,
    Thread BIGINT(20) UNSIGNED NOT NULL,
    PostDate DATETIME NOT NULL,
    MessageText TEXT NOT NULL,
    DeletedByUser TINYINT(1) NOT NULL DEFAULT 0,
    DeletedByAdmin TINYINT(1) NOT NULL DEFAULT 0,
    PRIMARY KEY(MessageID)
) CHARACTER SET utf8 COLLATE utf8_general_ci

Where's the real key of the table Message? If you define MessageID as an autoincrementing number then it can be used as a surrogate key, and to simplify joins, but without a real relational key the table isn't normalized to first normal form and would allow duplicate rows to be inserted.

Table WatchedGame does not have any foreign keys defined that reference the tables User or Game. Similarly the table Message does not have any foreign keys, say to the User and/or Thread tables. Foreign keys are a form of declarative referential integrity (DRI), and without DRI it is possible for invalid data values to be written to the tables.
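As a rough sketch of what DRI buys you (shown with sqlite3, which enforces foreign keys only when the pragma is switched on per connection; in MySQL the tables would need to be InnoDB), a foreign key makes the bad insert fail instead of silently storing an orphan row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE User (UserID INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE WatchedGame (
    User INTEGER NOT NULL REFERENCES User(UserID),
    Game INTEGER NOT NULL,
    PRIMARY KEY (User, Game))""")

conn.execute("INSERT INTO User (UserID) VALUES (1)")
conn.execute("INSERT INTO WatchedGame VALUES (1, 100)")  # fine: user 1 exists

rejected = False
try:
    conn.execute("INSERT INTO WatchedGame VALUES (999, 100)")  # no user 999
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True -- the orphan row never reaches the table
```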

In table Message, the two "bit" or "Boolean" columns DeletedByUser and DeletedByAdmin could be replaced by a column named something like "DeleteCode". In the original design,
code:
    DeletedByUser TINYINT(1) NOT NULL DEFAULT 0,
    DeletedByAdmin TINYINT(1) NOT NULL DEFAULT 0,
could be improved by adding CHECK constraints like so:
code:
    DeletedByUser TINYINT(1) NOT NULL DEFAULT 0 CHECK (DeletedByUser IN (0, 1)),
    DeletedByAdmin TINYINT(1) NOT NULL DEFAULT 0 CHECK (DeletedByAdmin IN (0, 1))
and the new design could be something like:
code:
  DeleteCode CHAR(1) CHECK (DeleteCode IN ('U', 'A'))
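A runnable sketch of the CHECK idea, using sqlite3, which does enforce it. One caveat worth knowing: MySQL of this era parsed CHECK clauses but silently ignored them; enforcement only arrived in MySQL 8.0.16.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Message (
    MessageID INTEGER PRIMARY KEY,
    DeletedByUser INTEGER NOT NULL DEFAULT 0
        CHECK (DeletedByUser IN (0, 1)))""")

conn.execute("INSERT INTO Message (MessageID, DeletedByUser) VALUES (1, 1)")  # ok

rejected = False
try:
    # 5 is outside the allowed domain {0, 1}
    conn.execute("INSERT INTO Message (MessageID, DeletedByUser) VALUES (2, 5)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True -- the out-of-domain value is rejected
```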
Some of the columns may be sized too wide: The "ID" columns are declared as BIGINT(20), and column MessageText is declared as TEXT. Sizing columns to fit the data more closely is a good practice for data integrity and performance.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

camels posted:

Where's the real key of the table Message? If you define MessageID as an autoincrementing number then it can be used as a surrogate key, and to simplify joins, but without a real relational key then the table isn't normalized to first normal form and would allow duplicate rows to be inserted.

I don't understand! MessageID is the primary key of Message. What then constitutes a real key?

camels posted:

Table WatchedGame does not have any foreign keys defined that reference the tables User or Game. Similarly the table Message does not have any foreign keys, say to the User and/or Thread tables. Foreign keys are a form of declarative referential integrity (DRI), and without DRI it is possible for invalid data values to be written to the tables.

How do you mean - such as a record with a User value that doesn't match the UserID value of any row in table User? I suppose I was aware of the possibility but unconcerned, as I was not aware of an easy way to prevent it at the database level. I will look into the idea of defining some foreign keys for my tables.

camels posted:

Some of the columns may be sized too wide: The "ID" columns are declared as BIGINT(20), and column MessageText is declared as TEXT. Sizing columns to fit the data more closely is a good practice for data integrity and performance.

The reason I originally set some of my longer string type columns to TEXT was because I had the impression (I do not know where from) that the maximum size you could give to a VARCHAR column was 255, as in MessageText VARCHAR(255). Reading http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html I see that this impression was erroneous. Do you suggest (say) MessageText VARCHAR(10000) or similar, with the limit being some large number I don't want messages to exceed?

Bad Titty Puker
Nov 3, 2007
Soiled Meat

Hammerite posted:

I don't understand! MessageID is the primary key of Message. What then constitutes a real key?

A real key is one that is derived from the data, for example a candidate key of the table Message might be (User, Thread, PostDate). By adding a UNIQUE key constraint to Message on (User, Thread, PostDate) the table would be normalized to first normal form.

quote:

How do you mean - such as a record with a User value that doesn't match the UserID value of any row in table User? I suppose I was aware of the possibility but unconcerned, as I was not aware of an easy way to prevent it at the database level. I will look into the idea of defining some foreign keys for my tables.

I think I would recommend reading a book or article on database basics, especially an introduction to data modeling, keys, and normalization.

quote:

The reason I originally set some of my longer string type columns to TEXT was because I had the impression (I do not know where from) that the maximum size you could give to a VARCHAR column was 255, as in MessageText VARCHAR(255). Reading http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html I see that this impression was erroneous. Do you suggest (say) MessageText VARCHAR(10000) or similar, with the limit being some large number I don't want messages to exceed?

I would estimate the max size of the columns and maybe size it a bit wider, based on the business rules and domain.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
OK, thank you for your advice.

edit:

quote:

I think I would recommend reading a book or article on database basics, especially an introduction to data modeling, keys, and normalization.
Have you a particular text in mind?

Hammerite fucked around with this message at 01:47 on Jul 5, 2009

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
As an additional thought, googling tells me that in order to use foreign keys in MySQL I need to switch the tables that will use them from the MyISAM engine to InnoDB. Idle reading suggests that MyISAM is generally a somewhat higher-performance engine than InnoDB, but since some of my tables in particular are UPDATE'd quite often in comparison with how often they are read, InnoDB with foreign keys may or may not be superior. It is tough for me to know what is optimal.

Bad Titty Puker
Nov 3, 2007
Soiled Meat

Hammerite posted:

OK, thank you for your advice.

edit:

quote:

I think I would recommend reading a book or article on database basics, especially an introduction to data modeling, keys, and normalization.

Have you a particular text in mind?

I think The Manga Guide to Databases is a good intro book. I also like Fabian Pascal's introductory books; see my earlier post.

Some online articles:

http://databases.about.com/od/specificproducts/a/normalization.htm
Has good info and links

Wikipedia article on Database normalization.

http://filedb.experts-exchange.com/incoming/2009/02_w06/102071/rettigNormalizationPoster.pdf

For a quick starting point on database theory: http://en.wikipedia.org/wiki/Relational_model

McGlockenshire
Dec 16, 2005

GOLLOCKS!

Hammerite posted:

As an additional thought, googling tells me that in order to use foreign keys in MySQL I need to switch the tables that will use them from the MyISAM engine to InnoDB. Idle reading suggests that MyISAM is generally a somewhat higher-performance engine than InnoDB, but since some of my tables in particular are UPDATE'd quite often in comparison with how often they are read, InnoDB with foreign keys may or may not be superior. It is tough for me to know what is optimal.
InnoDB and MyISAM no longer have major performance differences when the server is properly configured, i.e. not using defaults. Using FKs always incurs a performance penalty due to the lookups required, but that's true with any database from any vendor. It's better to eat the possible performance penalty and only worry about it if it becomes a measurable issue.

The only legitimate reason to use MyISAM over InnoDB is for fulltext indexing, and there are plenty of other ways to go about searching (i.e. Sphinx, though that's another can of worms).

8ender
Sep 24, 2003

clown is watching you sleep

McGlockenshire posted:

Using FKs always incurs a performance penalty due to the lookups required, but that's true with any database from any vendor. It's better to eat the possible performance penalty and only worry about it if it becomes a measurable issue.

Not sure if MySQL is the same, but on Oracle, if you're doing SQL operations on two tables that should have a foreign key, they will actually be sped up by adding one. I suspect this is some sort of optimization trickery on Oracle's part.

Xae
Jan 19, 2005

In PL/SQL how do I use a collection in an "IN" clause?

I was certain I have done this before, but I can't remember or Google the syntax.

Backhand
Sep 25, 2008
I've got a painfully vague but hopefully easy question. This is actually a continuation of sorts for the problem I posted a few pages ago; the person the report was written for wants some things expanded and I've been having trouble with that.

I've already got a report written in Actuate that retrieves account numbers for accounts that have more than one active electrical service installed. It works great, no problems. Now however I have to display a subreport that pops up information on each individual service, information that is listed in various other tables. I'm pretty sure I've got a solid query there that does what I need it to. Only one problem: I can't find a good way to get at the data I need. A brief look at the initial query so you know what I'm talking about :

code:
SELECT   a.account_no, 
         Count(*), 
         c.account_full_name_lf, 
         c.address_line1, 
         c.cycle_code 
FROM     umsdata.um00252v a 
         INNER JOIN umsdata.um00140t b 
           ON a.location_no = b.location_no 
              AND a.service_seq = b.serv_seq 
              AND a.service_stat = 'CUTON' 
              AND b.service_tp = 'ELEC' 
         JOIN umsdata.l_um_account c 
           ON a.account_no = c.account_no 
GROUP BY a.account_no, 
         c.account_full_name_lf, 
         c.address_line1, 
         c.cycle_code 
HAVING   Count(*) > 1 
ORDER BY a.account_no 
This gives me the account number, the number of different active electrical services it has, and some account holder information. No big deal.

To definitively identify a given service and get the appropriate information for it I need two things: its location number and its service seq. I want to grab these from my initial query and use them as parameters in the next: just having the account number doesn't really help, since an account can have more services than just electrical and I don't want to display those. The SQL in my initial query clearly has to access that information at some point to count it up, but since I'm using an aggregate function in the form of COUNT I don't think there's any way to get at that information in my result set. My hope is that you all can tell me otherwise. For reference, here's the query I used in the nested report a bit further down:

code:
SELECT a.meter_no, 
       a.location_address, 
       b.bill_cd, 
       a.location_no 
FROM   umsdata.l_um_installed_meter a 
       INNER JOIN umsdata.um00162t b 
         ON a.location_no = b.location_no 
            AND a.service_seq = b.service_seq 
WHERE  a.location_no = :InnerLocationNo 
       AND a.service_seq = :InnerServiceSeq 
Lastly, I also tried just using the account number (which is easy to retrieve) as a parameter and using a couple of subqueries to retrieve the location_no and service_seq fields in the nested report, but it uh... didn't work too well. If you think that's the way I should go, any syntax recommendations would be appreciated.
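One generic way out (a sketch, not Actuate-specific, using a made-up single-table schema in sqlite3): instead of aggregating the detail columns away, keep them in the select list and filter on the set of accounts that the GROUP BY/HAVING query returns:

```python
import sqlite3

# Simplified stand-in for the real schema; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE services (
    account_no INTEGER, location_no INTEGER,
    service_seq INTEGER, service_tp TEXT)""")
conn.executemany("INSERT INTO services VALUES (?, ?, ?, ?)", [
    (1, 10, 1, 'ELEC'),   # account 1: two electrical services
    (1, 11, 1, 'ELEC'),
    (2, 20, 1, 'ELEC'),   # account 2: only one, should be excluded
])

# Detail rows (with location_no and service_seq intact) for accounts that
# have more than one active electrical service.
rows = conn.execute("""
    SELECT account_no, location_no, service_seq
    FROM services
    WHERE service_tp = 'ELEC'
      AND account_no IN (SELECT account_no FROM services
                         WHERE service_tp = 'ELEC'
                         GROUP BY account_no
                         HAVING COUNT(*) > 1)
    ORDER BY account_no, location_no
""").fetchall()
print(rows)  # [(1, 10, 1), (1, 11, 1)]
```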

MoNsTeR
Jun 29, 2002

Xae posted:

In PL/SQL how do I use a collection in an "IN" clause?

I was certain I have done this before, but I can't remember or Google the syntax.
code:
create or replace type myType as table of varchar2(256);

declare
  t_foo myType;
begin
  --one way to populate it
  t_foo := myType('A','B','C','D');
  --another way
  t_foo := myType();
  t_foo.extend;
  t_foo(1) := 'A';
  t_foo.extend;
  t_foo(2) := 'B';
  --yet another variation that can simplify programmatic population
  t_foo.extend;
  t_foo(t_foo.count) := 'C';
  t_foo.extend;
  t_foo(t_foo.count) := 'D';
  --
  for rec in 
  (  
    select *
      from whatever
     where something in (select * from table(t_foo))
  )
  loop
    --process your rows
  end loop;
end;

Xae
Jan 19, 2005

MoNsTeR posted:

where something in (select * from table(t_foo))
That was what I couldn't find, thanks.
I plan on doing a BULK COLLECT though. I'm rewriting it to only use one select instead of the N it currently uses.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
Was disappointed just now to find that you can't do

code:
SELECT Name,
       Data_length + Index_length AS Size,
       Rows,
       Row_format
FROM (SHOW TABLE STATUS) AS XXX
in MySQL. :saddowns:

McGlockenshire
Dec 16, 2005

GOLLOCKS!
You might have more luck attacking INFORMATION_SCHEMA. None of the special info MySQL provides can be treated like a table, which is very silly.

supster
Sep 26, 2003

I'M TOO FUCKING STUPID
TO READ A SIMPLE GRAPH

Hammerite posted:

Was disappointed just now to find that you can't do

code:
SELECT Name,
       Data_length + Index_length AS Size,
       Rows,
       Row_format
FROM (SHOW TABLE STATUS) AS XXX
in MySQL. :saddowns:
It's because SHOW doesn't return a proper result set so you can't use it in a subquery. You can accomplish what you are trying to do with something like this:
code:
select
    table_name as name,
    (data_length + index_length) as size,
    table_rows as rows,
    row_format
from information_schema.tables
where table_type = 'BASE TABLE'

dancavallaro
Sep 10, 2006
My title sucks
I'm trying to do geolocation by IP address, essentially. I have a geolocation table, called ip_group_city, which basically maps IP ranges to coordinates. The way you look up the coordinates for an IP address is:

code:
SELECT * FROM `ip_group_city` 
WHERE `ip_start` <= INET_ATON('xxx.xxx.xxx.xxx') 
ORDER BY `ip_start` DESC LIMIT 1
where xxx.xxx.xxx.xxx is the IP address you want to look up. Then I have another table, visitor_ips, that stores IP addresses and timestamps. I want to randomly choose 100 IP addresses from any time after the previous day, and for that I do

code:
SELECT * FROM `visitor_ips` 
WHERE `time` >= DATE_SUB( CURDATE( ) , INTERVAL 1 DAY ) 
ORDER BY RAND() LIMIT 100
But now how the gently caress do I combine these, to look up the coordinates corresponding to 100 randomly chosen IP addresses from the last day? This is making my head spin. I know I could easily do this in two separate queries, but I figure there has to be some magic (or retardedly easy) way to do this in one query.

What I need is a magical query that does something like this:

code:
SELECT * FROM `ip_group_city` 
WHERE `ip_start` <= INET_ATON(A_BUNCH_OF_FUCKING_IPS) 
ORDER BY `ip_start` DESC LIMIT 1 

WHERE A_BUNCH_OF_FUCKING_IPS 
IS EVERYTHING IN

(SELECT `ip_addr` FROM `visitor_ips` 
WHERE `time` >= DATE_SUB( CURDATE( ) , INTERVAL 1 DAY ) 
ORDER BY RAND() LIMIT 100)
but is actually valid, of course.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Use a sub-select?
code:
SELECT *
FROM `ip_group_city` 
   JOIN (SELECT `ip_addr`
         FROM `visitor_ips` 
         WHERE `time` >= DATE_SUB( CURDATE( ) , INTERVAL 1 DAY ) 
         ORDER BY RAND() LIMIT 100) AS vis_ips
WHERE `ip_start` <= INET_ATON(vis_ips.ip_addr) 
ORDER BY `ip_start` DESC LIMIT 1
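One thing to watch with the join version: the trailing LIMIT 1 applies to the whole combined result, so it can return a single row rather than one match per IP. A per-row correlated subquery is one alternative (a sketch with sqlite3 and pre-converted integer IPs standing in for INET_ATON; names illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ip_start marks the beginning of each range, as in the geolocation table.
conn.execute("CREATE TABLE ip_group_city (ip_start INTEGER, city TEXT)")
conn.executemany("INSERT INTO ip_group_city VALUES (?, ?)",
                 [(0, "A"), (1000, "B"), (2000, "C")])
conn.execute("CREATE TABLE visitor_ips (ip_num INTEGER)")  # already INET_ATON'd
conn.executemany("INSERT INTO visitor_ips VALUES (?)", [(1500,), (2500,)])

# For EACH visitor IP, the subquery finds the range row with the largest
# ip_start that is still <= the IP, so the LIMIT 1 runs once per row.
rows = conn.execute("""
    SELECT v.ip_num,
           (SELECT g.city FROM ip_group_city g
            WHERE g.ip_start <= v.ip_num
            ORDER BY g.ip_start DESC LIMIT 1) AS city
    FROM visitor_ips v
""").fetchall()
print(rows)  # [(1500, 'B'), (2500, 'C')]
```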

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!
A while back I posted about storing a huge pile of stock market data in a single table. The query times on it are pretty bad. Well, I don't know if I should actually complain. Imagine all symbols on the New York Stock Exchange, in some cases going back about 29 years, with their daily data. They're all smashed together in one table. I'm wondering if one could expect a speedup if I put each company in its own table, and had a master table storing the indices so I know which one to call up. Is this kind of the normal thing to do with huge piles of data? Assuming I need to look at all companies across a year, that means doing this lookup a few thousand times. In the end, could it still be faster than querying on the huge pile?

I also read about stored routines and wondered if the lookup for taking a stock symbol to a table index is the kind of thing for which I should use that.

Edit: Perhaps as a defense against being told to just try it--organizing a database to accommodate that layout doesn't really seem hard to me, but querying it is going to take a little bit of revamping of the program I'm using to call it. It's a Java application using JDBC; I'm pretty much firing SQL out so I don't think the program itself would be a factor.

Rocko Bonaparte fucked around with this message at 04:17 on Jul 17, 2009

Begby
Apr 7, 2005

Light saber? Check. Black boots? Check. Codpiece? Check. He's more machine than kid now.

Rocko Bonaparte posted:

A while back I posted about storing a huge pile of stock market data in a single table. The query times on it are pretty bad. Well, I don't know if I should actually complain. Imagine all symbols on the New York Stock Exchange, in some cases going back about 29 years, with their daily data. They're all smashed together in one table. I'm wondering if one could expect a speedup if I put each company in its own table, and had a master table storing the indices so I know which one to call up. Is this kind of the normal thing to do with huge piles of data? Assuming I need to look at all companies across a year, that means doing this lookup a few thousand times. In the end, could it still be faster than querying on the huge pile?

I also read about stored routines and wondered if the lookup for taking a stock symbol to a table index is the kind of thing for which I should use that.

No, do not break it into individual tables.

The root cause of your slowness is probably due to a bad index and/or bad table structure. Splitting into multiple tables is going to make it worse.

Post your sql, data structure (along with indexes), number of records, database type (mysql, sql server, etc.), and how long your statement is taking to execute.

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!

Begby posted:

No, do not break it into individual tables.

The root cause of your slowness is probably due to a bad index and/or bad table structure. Splitting into multiple tables is going to make it worse.

Post your sql, data structure (along with indexes), number of records, database type (mysql, sql server, etc.), and how long your statement is taking to execute.
Well, that's a fast response. I'm on the road right now and was pondering this again so I thought I'd post. So I'll try to be more precise in the next few days by adding more information but since you asked now, I wanted to tell you what I vaguely know off the top of my head.

I've basically created one table that is organized like so:

symbol, string--I think 8 characters
close date--date
opening price--decimal 8,4
high price--decimal 8,4
low price--decimal 8,4
closing price--decimal 8,4
volume--big integer, 20 digits

If any of that makes any sense; 8,4 means 8 digits in total, with 4 of them after the decimal point.

I added some symbols for companies that went out of business years ago, so I don't know the full record count anymore. I believe I have at least over 2 million records. New data for each symbol is added on a regular basis; daily at points. So on a given day, each symbol might get a new record added for it, one after the other. I tend to query a year's worth of data for all symbols within a date range. I'm running mysql, but I'd have to get the version string for you later. How should I measure the time for you? Should I use the low-level mysql client interface?

Acer Pilot
Feb 17, 2007
put the 'the' in therapist

:dukedog:

I've got about 5,000 integers that need to be checked if a user has seen before. These integers are grouped into different categories of varying size, so one category could contain 325 of the integers while another has 50 of them.

They're all unique integers and won't appear in any category more than once. Meaning that if it's in one category, it's only in that category.

Right now I have a table setup to contain the following:

user_id | category_1 | category_2 | category_3 | category_n... etc

In category_1 there's just a comma separated list of integers that the user has seen before.

eg. 1,2,3,5,6,7,111,9,10

This seems a bit clunky so I'm thinking about storing each individual integer seen in its own row. Kind of like this.

user_id | int_seen | category

The only problem I can think of is that there could be any number of user_ids... It'd probably be capped at 10,000 but that'd still be 50,000,000 entries in the table...

My solution to that would be clearing the table of a specific user_id's entries when they have seen all the integers and making another table that contains a list of users that have done this.

That and maybe having a group_id column in the original table so people in a group can share a result set instead.

What do you guys think? Any ideas or alternatives?

edit: hope that made sense

Zoracle Zed
Jul 10, 2001
Well that's certainly an improvement over your current setup. Is the category for a given number constant across user IDs?
My first thought would be something like:

CREATE TABLE Integers (Integer INT, CategoryID INT);
CREATE TABLE Users_Integers (UserID INT, Integer INT, UserHasSeenInteger BOOLEAN);

Acer Pilot
Feb 17, 2007
put the 'the' in therapist

:dukedog:

Zoracle Zed posted:

Well that's certainly an improvement over your current setup. Is the category for a given number constant across user IDs?
My first thought would be something like:

CREATE TABLE Integers (Integer INT, CategoryID INT);
CREATE TABLE Users_Integers (UserID INT, Integer INT, UserHasSeenInteger BOOLEAN);

Yes, the category for a number is constant across user IDs. I'll probably need to include the category id as well in the user's "seen" table as the integers are displayed by category to begin with.

This is on MySQL too by the way.

karms
Jan 22, 2006

by Nyc_Tattoo
Yam Slacker
I have a row in my MySQL table that I want to duplicate into the same table. I could do this with INSERT INTO :table SELECT * FROM :table WHERE id=:id, but since one of the columns is an auto-increment mysql balks at me for trying to have 2 rows with the same id.

Is there a real easy way to exclude/ignore the auto-increment column, or do I have to read in all column names and filter out the auto-incrementing ones?
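There's no INSERT syntax that excludes a single column, so the usual approach is to name every column except the auto-increment one and let the database assign a fresh id (the column list can be generated from SHOW COLUMNS or information_schema if you don't want to hard-code it). A sqlite3 sketch with a made-up table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE items (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT, qty INTEGER)""")
conn.execute("INSERT INTO items (name, qty) VALUES ('widget', 3)")

# Duplicate row id=1 by listing every column EXCEPT the auto-increment id;
# the database then assigns the next id itself.
conn.execute("""INSERT INTO items (name, qty)
                SELECT name, qty FROM items WHERE id = 1""")

rows = conn.execute("SELECT id, name, qty FROM items ORDER BY id").fetchall()
print(rows)  # [(1, 'widget', 3), (2, 'widget', 3)]
```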

dancavallaro
Sep 10, 2006
My title sucks

minato posted:

Use a sub-select?
code:
SELECT *
FROM `ip_group_city` 
   JOIN (SELECT `ip_addr`
         FROM `visitor_ips` 
         WHERE `time` >= DATE_SUB( CURDATE( ) , INTERVAL 1 DAY ) 
         ORDER BY RAND() LIMIT 100) AS vis_ips
WHERE `ip_start` <= INET_ATON(vis_ips.ip_addr) 
ORDER BY `ip_start` DESC LIMIT 1

Balls. From what I can tell, this is exactly what I need, but when I execute it it chugs for about 30 seconds and then returns

code:
#126 - Incorrect key file for table '#sql_fe6_1.MYI'; try to repair it
Any idea what the hell that means? I have no idea what that table might be. I repaired the tables in my geolocation database, but that didn't do anything.

Begby
Apr 7, 2005

Light saber? Check. Black boots? Check. Codpiece? Check. He's more machine than kid now.

Rocko Bonaparte posted:

Well, that's a fast response. I'm on the road right now and was pondering this again so I thought I'd post. So I'll try to be more precise in the next few days by adding more information but since you asked now, I wanted to tell you what I vaguely know off the top of my head.

I've basically created one table that is organized like so:

symbol, string--I think 8 characters
close date--date
opening price--decimal 8,4
high price--decimal 8,4
low price--decimal 8,4
closing price--decimal 8,4
volume--big integer, 20 digits

If any of that makes any sense; 8,4 means 8 total digits, with 4 of them after the decimal point.

I added some symbols for companies that went out of business years ago, so I don't know the full record count anymore. I believe I have at least 2 million records. New data for each symbol is added on a regular basis; daily at points. So on a given day, each symbol might get a new record added for it, one after the other. I tend to query a year's worth of data for all symbols within a date range. I'm running MySQL, but I'd have to get you the version string later. How should I measure the time for you? Should I use the low-level MySQL client interface?

This sounds like an issue with your indexes. What is your SQL statement and what indexes do you have?

As far as timing goes, you don't have to be exact, but any MySQL GUI should tell you how long a query takes to run.

Another thing that will help: put EXPLAIN in front of your SQL statement, run it, and post the results here. It will show what indexes are being used and whether any temporary tables are being created.
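
If you don't know offhand what indexes are there, something like this will list them (just swap in your real table name for my placeholder):

code:
SHOW INDEX FROM your_table;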

Zoracle Zed
Jul 10, 2001

drcru posted:

Yes, the category for a number is constant across user IDs. I'll probably need to include the category id in the user's "seen" table as well, since the integers are displayed by category to begin with.
What do you mean by "the integers are displayed by category" (give me an example query) and why can't you do it with an inner join between the two tables?

dancavallaro
Sep 10, 2006
My title sucks

dancavallaro posted:

Balls. From what I can tell, this is exactly what I need, but when I execute it, it chugs for about 30 seconds and then returns

code:
#126 - Incorrect key file for table '#sql_fe6_1.MYI'; try to repair it
Any idea what the hell that means? I have no idea what that table might be. I repaired the tables in my geolocation database, but that didn't do anything.

gently caress it. I just decided to break it up into separate queries, and it runs fast enough. It doesn't really matter, because the IP address data I'm using is only updated once a day anyway, so I only have to do this lookup once a day and then I can cache it for the next 24 hours.

Next step: server-side clustering of up to ~200,000 coordinate locations to display on a map via Google Maps API.

Sergeant Hobo
Jan 7, 2007

Zhu Li, do the thing!

Mercator posted:

I have a row in my MySQL table that I want to duplicate into the same table. I could do this with INSERT INTO :table SELECT * FROM :table WHERE id=:id, but since one of the columns is an auto-increment, MySQL balks at me for trying to have 2 rows with the same id.

Is there a real easy way to exclude/ignore the auto-increment column, or do I have to read in all column names and filter out the auto-incrementing ones?

I might just go ahead and filter out the column names. I can at least imagine how that code would look (albeit a bit on the convoluted side).
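
For what it's worth, a rough sketch of pulling the column list minus the auto-increment one out of information_schema (the database and table names here are made up):

code:
SELECT COLUMN_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME = 'mytable'
  AND EXTRA NOT LIKE '%auto_increment%';
Then glue those names into the column lists of the INSERT INTO ... SELECT ... statement.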

Acer Pilot
Feb 17, 2007
put the 'the' in therapist

:dukedog:

Zoracle Zed posted:

What do you mean by "the integers are displayed by category" (give me an example query) and why can't you do it with an inner join between the two tables?

They're called separately, one for the integer list and one to see which integers they've seen.

SELECT `integer` FROM `integers_list` WHERE `category_id` = ?

SELECT `integer_seen` FROM `integers_seen` WHERE `user_id` = ? AND `category_id` = ?

Squashy Nipples
Aug 18, 2007

Rocko Bonaparte posted:

Well, that's a fast response. I'm on the road right now and was pondering this again so I thought I'd post. So I'll try to be more precise in the next few days by adding more information but since you asked now, I wanted to tell you what I vaguely know off the top of my head.

I've basically created one table that is organized like so:

symbol, string--I think 8 characters
close date--date
opening price--decimal 8,4
high price--decimal 8,4
low price--decimal 8,4
closing price--decimal 8,4
volume--big integer, 20 digits

If any of that makes any sense; 8,4 means 8 total digits, with 4 of them after the decimal point.

I added some symbols for companies that went out of business years ago, so I don't know the full record count anymore. I believe I have at least 2 million records. New data for each symbol is added on a regular basis; daily at points. So on a given day, each symbol might get a new record added for it, one after the other. I tend to query a year's worth of data for all symbols within a date range. I'm running MySQL, but I'd have to get you the version string later. How should I measure the time for you? Should I use the low-level MySQL client interface?

Don't forget to include a column for Split/Reverse Split ratio (make it default to 1.000). This way, you can build the split math right into the SQL. This is SUPER important for data across long periods of time.

Otherwise, you'll have to adjust all the prices before you populate the DB.

Also, I agree that you're having setup problems. I have a similar DB to yours, but with only 5 years of data... and it's in freaking Access. You should be able to do what you want in MySQL with plenty of speed.
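
Something like this is what I mean by building the split math in (the table and column names are invented for the example, and it assumes the ratio column already holds the cumulative adjustment):

code:
SELECT close_date, close * split_ratio AS adjusted_close
FROM prices
WHERE symbol = 'XYZ'
ORDER BY close_date;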

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!
I believe my data is currently automatically compensated for splits, but that is still an important point that I should review.

Anyways, I don't know poo poo about indices, so there's a problem right there. I'll look it up and see what to make of it.

Anyways, the table is a slightly different format:
code:
+------------+---------------+------+-----+---------+-------+
| Field      | Type          | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| symbol     | varchar(8)    | YES  |     | NULL    |       | 
| close_date | date          | YES  |     | NULL    |       | 
| high       | decimal(14,4) | YES  |     | NULL    |       | 
| low        | decimal(14,4) | YES  |     | NULL    |       | 
| close      | decimal(14,4) | YES  |     | NULL    |       | 
| open       | decimal(14,4) | YES  |     | NULL    |       | 
| volume     | bigint(20)    | YES  |     | NULL    |       | 
+------------+---------------+------+-----+---------+-------+
7 rows in set (0.00 sec)
Apparently, I have 8,986,764 rows (yikes!).

A regular query for me is something like:
code:
select * from stockvalue where close_date >= '2007-01-01' and close_date < '2008-01-01';
It claims to take 16 seconds here, which I'm finding hard to believe. This system seems to be beefier. Actually, the big difference is that I'm using a 32-bit Linux kernel here, whereas at home I normally use a 64-bit one. I get a ton of Java errors doing these queries there, too, that I never see here.

Anyways the mysql version:
code:
mysql  Ver 14.12 Distrib 5.0.75, for debian-linux-gnu (i486) using readline 5.2
Update: Explain command:
code:
mysql> explain select * from stockvalue where close_date >= '2007-01-01' and close_date < '2008-01-01';
+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | stockvalue | ALL  | NULL          | NULL | NULL    | NULL | 8986764 | Using where | 
+----+-------------+------------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

Ummmmm wth?

Update 2: I just looked up indices quickly. I was running on the damaged notion that a database would take care of this for me already. That seems to not be the case. So if I'm prone to looking for data based on close_date, should I create an index for it? I would assume I should do something like:

code:
CREATE INDEX date_index ON stockvalue(close_date)
Update 3: Well I guess it did, since trying the query again took 3.70 seconds.
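
Update 4: If I end up filtering on symbol and date together, I'm guessing a composite index is the way to go (haven't actually tried this yet, so take it as a guess):

code:
CREATE INDEX symbol_date_index ON stockvalue(symbol, close_date)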

Rocko Bonaparte fucked around with this message at 04:57 on Jul 18, 2009

Zoracle Zed
Jul 10, 2001

drcru posted:

They're called separately, one for the integer list and one to see which integers they've seen.

SELECT `integer` FROM `integers_list` WHERE `category_id` = ?

SELECT `integer_seen` FROM `integers_seen` WHERE `user_id` = ? AND `category_id` = ?


Using the schema I suggested earlier, you could do:

code:
SELECT i.Integer
FROM Integers AS i INNER JOIN Users_Integers AS ui ON i.Integer = ui.Integer
WHERE ui.UserID = ? AND i.CategoryID = ? AND ui.UserHasSeenInteger = TRUE
It would save needless duplication of integer categories and would also avoid moving numbers back and forth between a seen and unseen table.
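
The unseen integers for a user fall out of a LEFT JOIN on the same tables (a sketch, where "unseen" means either no row for that user or the flag set to FALSE):

code:
SELECT i.Integer
FROM Integers AS i
LEFT JOIN Users_Integers AS ui
    ON i.Integer = ui.Integer AND ui.UserID = ?
WHERE i.CategoryID = ?
  AND (ui.Integer IS NULL OR ui.UserHasSeenInteger = FALSE)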


karms
Jan 22, 2006

by Nyc_Tattoo
Yam Slacker

Sergeant Hobo posted:

I might just go ahead and filter out the column names. I can at least imagine how that code would look (albeit a bit on the convoluted side).

Yeah, I figured as much. It's not real pretty, but it's still understandable.

:) thanks for looking though.
