salithus
Nov 18, 2005

Roundboy posted:

Did I explain this enough? What is the proper workaround?

Maybe this is too easy, but:

where (tableb.fax != tablea.fax1 and tableb.fax != tablea.fax2)

Should only return true when there's no match.

Roundboy
Oct 21, 2008
Tested.. no go.. what if fax1 or fax2 is null ?

here is a 'live' example

A
code:
   KEY1   KEY2     FAX1        FAX2
1. PICMM  4716804  xxxxxx1291  <null>
2. PICMM  4709590  xxxxxx1690  <null>
B
code:
   KEY1   KEY2     FAX
1. PICMM  4716804  xxxxxx6609
2. PICMM  4709590  xxxxxx1690
#1 above is a valid 'no match' and should be returned. But instead, the query returns no results

Squashy Nipples
Aug 18, 2007

How about a CASE statement?

salithus
Nov 18, 2005

Roundboy posted:

Tested.. no go.. what if fax1 or fax2 is null ?

Doh, knew it was too easy to work.

where (
(tableb.fax != tablea.fax1 or tablea.fax1 is null) and
(tableb.fax != tablea.fax2 or tablea.fax2 is null)
)

Not sure what it does performance-wise.
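
If the per-column NULL checks get unwieldy, a NOT EXISTS form expresses "no match" without any explicit IS NULL tests. A sketch only, assuming the two tables are correlated on KEY1/KEY2 as in the 'live' example above:

code:
select b.*
  from tableb b
 where not exists (
        select 1
          from tablea a
         where a.key1 = b.key1
           and a.key2 = b.key2
           and b.fax in (a.fax1, a.fax2)   -- a NULL fax1/fax2 simply never matches
       )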

Aredna
Mar 17, 2007
Nap Ghost
MS SQL Server 2000

I need to remove all non-digits at the beginning of a string, up to the first digit. I'd use a regex replace normally, but the standard REPLACE in SQL Server 2000 just doesn't seem to do what I need. Is there another simple function I'm overlooking?

edit: Yes, yes I was: PATINDEX. This solved my problem, where n is the field
code:
substring(n,patindex('%[0-9]%',n),len(n)-patindex('%[0-9]%',n)+1)
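
A quick sanity check of that expression, in case anyone wants to try it (SQL Server; the sample value is made up):

code:
declare @n varchar(50)
set @n = 'abc-123 Main St'
select substring(@n, patindex('%[0-9]%', @n), len(@n) - patindex('%[0-9]%', @n) + 1)
-- returns '123 Main St'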

Aredna fucked around with this message at 21:10 on May 20, 2009

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Could we put a huge note in the OP that's like stop worrying about performance. Unless your rows are measured in millions, at least... Something.

Roundboy
Oct 21, 2008
I should also note I have been trying to apply De Morgan's law here, with no good results.

EDIT: Oh jesus crap.. everything works out fine if I put a TO_NUMBER around each fax field. Apparently Oracle did not like matching a fax number stored in a VARCHAR2(10) against one stored in a VARCHAR2(20), even though they were identical numbers.
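
Sketched against the earlier WHERE clause, the fix looks something like this (Oracle; note TO_NUMBER will raise ORA-01722 if any stored fax contains non-digit characters):

code:
where (to_number(tableb.fax) != to_number(tablea.fax1) or tablea.fax1 is null)
  and (to_number(tableb.fax) != to_number(tablea.fax2) or tablea.fax2 is null)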

Roundboy fucked around with this message at 21:30 on May 20, 2009

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Triple Tech posted:

Could we put a huge note in the OP that's like stop worrying about performance. Unless your rows are measured in millions, at least... Something.

I had some queries that would take 2 minutes to execute. Often, the best optimization isn't in table structure - that's when you get into millions of rows - but in the query. Cut that motherfucker down to 1.3 seconds.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
It's poor understanding of queries combined with poor schema designs... I can't say I'm a pro and I probably have a lot to learn, but I've been working with SQL Server for the past couple of months and it's pretty eye-opening. I'm even considering looking for a way to specialize in data modeling and whatnot. I don't want to say DBA because that's more "database janitor", haha...

Bad Titty Puker
Nov 3, 2007
Soiled Meat
Sometimes one is just stuck with a lovely schema where refactoring it is painful and hard to get approval for. I've found that performance can often be improved dramatically on statements with sub-par performance just by index tuning and SQL statement tuning. It's fun and it's a bit of an art. Hey, if that's janitor work, bring me a mop.

Squashy Nipples
Aug 18, 2007

Golbez posted:

I had some queries that would take 2 minutes to execute. Often, the best optimization isn't in table structure - that's when you get into millions of rows - but in the query. Cut that motherfucker down to 1.3 seconds.

What DBMS do you work with?

I've found that it's pretty hard to fool the DB2 query optimizer... you have to start writing really ponderous, hosed up queries before it starts to affect execution time in a statistically significant way. Whenever I've tested queries overnight, the network throughput is always a much bigger source of variance.

I dunno, maybe this means my janitor is really good at indexing or something? ;)

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Dr.Khron posted:

What DBMS do you work with?

I've found that it's pretty hard to fool the DB2 query optimizer... you have to start writing really ponderous, hosed up queries before it starts to affect execution time in a statistically significant way. Whenever I've tested queries overnight, the network throughput is always a much bigger source of variance.

I dunno, maybe this means my janitor is really good at indexing or something? ;)

MySQL. I had queries with 20 joins and all sorts of crap that were making things horrible. A search engine rewrite later and the slow query log is so much shorter. :)

Vanadium Dame
May 29, 2002

HELLO I WOULD LIKE TO TALK TO YOU ABOUT MY VERY STRONG OPINIONS

Golbez posted:

MySQL. I had queries with 20 joins and all sorts of crap that were making things horrible. A search engine rewrite later and the slow query log is so much shorter. :)

I'm not surprised your optimizer was making bad choices with that many joins. When you start including more than 8 or so tables in a query (in Oracle at least) the optimizer begins to go into traveling salesman problem mode and hurfs some serious durfs. http://en.wikipedia.org/wiki/Travelling_salesman_problem

If your RDBMS has an equivalent to the Oracle ORDERED hint, you can force it not to stroke out and begin dribbling from the mouth when you need that many joins, provided you can figure out the proper table access order.
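
For reference, a rough sketch of the ORDERED hint (Oracle; the table and column names are made up for the example):

code:
select /*+ ORDERED */ o.id, li.sku, p.name
  from orders o
  join line_items li on li.order_id = o.id
  join products p    on p.id = li.product_id
 where o.created_at > date '2009-01-01';
-- with ORDERED, tables are joined in the order they appear in the FROM
-- clause rather than whatever order the optimizer dreams up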

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Or you can just be suspicious that you're doing that many joins.

Vanadium Dame
May 29, 2002

HELLO I WOULD LIKE TO TALK TO YOU ABOUT MY VERY STRONG OPINIONS

Triple Tech posted:

Or you can just be suspicious that you're doing that many joins.

Sometimes you gotta do what you gotta do and can't refactor the table structure into something less retarded. C'est la vie.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Triple Tech posted:

Or you can just be suspicious that you're doing that many joins.

Basically this. It was finally time to rip the life support off that search engine and begin anew. I also discovered the EXPLAIN command. :)
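
For anyone following along, EXPLAIN in MySQL is just prefixed onto the query (table names here are hypothetical):

code:
EXPLAIN SELECT p.id, p.name
FROM products p
JOIN categories c ON c.id = p.category_id
WHERE c.name = 'fiction';
-- the output lists, per table, the chosen index (if any) and a rough
-- estimate of how many rows will be examined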

stuph
Aug 31, 2004

donkey punching my way to the top

Markoff Chaney posted:

Sometimes you gotta do what you gotta do and can't refactor the table structure into something less retarded. C'est la vie.

Sometimes you solve this in queries, sometimes you solve this in code. I'm always tempted to do all of my work in the query, but more and more I'm pushing work into code - it's a hell of a lot more expensive to upgrade a DB server than it is to buy another webserver. Optimizing is always great, but when you've got a DB that JUST CAN'T EVER BE DOWN, adding another index to 50 million+ rows isn't something you can really do - so code changes it is. Took one big query, split it into 4 smaller ones and did the joining in arrays - went from taking up to 15 minutes to never taking more than 10 seconds, less than half of that being DB time.

Lamb-Blaster 4000
Sep 20, 2007

WARNING: UGLY QUERY AHEAD

I've inherited the maintenance duties for one of our company's older sites. It's a book store, and the massive search function has to check the user's query against holiday names, among other things; I've been tasked with adding that functionality.

The query was trudging ahead just fine without my addition, but somehow when I add the bit about the holiday names it returns every matched row 8 times...

Here's the full query (it was originally one line! :froggonk:); I've formatted it here. The part I added is at the end.

"summer fun" was my test query; when I run just my portion on the server I get 10 results. The entire query including my part returns the same 10 books, but 8 times each.

The SQL statement is cobbled together inside ColdFusion to deal with various filters like age and price limits, so any troubleshooting suggestions on why a statement would return the same row multiple times would be much appreciated.

code:
SELECT 
   DISTINCT p.ID, 
   p.*, 
   sub.mainCategoryID as catID 
FROM 
   products p, 
   subCategories sub, 
   subCategories sub2, 
   subCategories sub3 
where 
   sub.ID = p.subCategoryID and 
   (sub2.ID = p.subCategoryID2) and 
   (sub3.ID = p.subCategoryID3) and 
   p.visible=1 and 
   listingDate < now() and 
   p.upperAgeLimit >= 0 and 
   (
      p.price <= 100 OR 
      (p.salePrice > 0 and p.salePrice <= 100)
   ) and 
   (
      p.price >= 0 OR 
     (p.salePrice > 0 and p.salePrice >= 0)
   ) and 
   (
      lower(p.description) like "%summer%fun%" or 
      lower(p.name) like "%summer%fun%" or 
      lower(p.transliteratedName) like "%summer%fun%" or 
      lower(p.ukrainianName) like "%summer%fun%" or 
      lower(p.author) like "%summer%fun%" or 
      lower(p.ukrainianAuthor) like "%summer%fun%" or
      lower(p.ukrainianDescription) like "%summer%fun%" or 
      lower((select s.name from series s where id=p.seriesID)) like "%summer%fun%" or 
      lower((select s.ukrainianName from series s where id=p.seriesID)) like "%summer%fun%" or 
      lower((select s.transliteratedName from series s where id=p.seriesID)) like "%summer%fun%" or 
      lower((select s.description from series s where id=p.seriesID)) like "%summer%fun%" or 
      lower((select s.ukrainianDescription from series s where id=p.seriesID)) like "%summer%fun%"
   ) or 
   p.holidayID in (select h.id from holidays h where h.name like "%summer%fun%" or h.greeting like "%summer%fun%")

Popoi
Jul 23, 2000

I'm not familiar with the syntax, but my guess would be that the OR is overriding the earlier logic for the joins due to the way the logical operators are processed. Add some fields from sub2 and sub3 to the query to see if they are indeed identical rows, or if they're caused by bad joins.

If that turns out to be the case, all you should have to do is enclose the last two statements in parentheses so they're evaluated as X AND (Y OR Z) instead of (X AND Y) OR Z.
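
Concretely, applied to the query above, the tail would become something like this (the elided lines are the same LIKE conditions as before):

code:
   ...
   ) and 
   (
      (
         lower(p.description) like "%summer%fun%" or 
         /* ...the other LIKE conditions from above... */
         lower((select s.ukrainianDescription from series s where id=p.seriesID)) like "%summer%fun%"
      ) or 
      p.holidayID in (select h.id from holidays h where h.name like "%summer%fun%" or h.greeting like "%summer%fun%")
   )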

Lamb-Blaster 4000
Sep 20, 2007

AWESOME! Worked like a charm! Thanks so much!

checkeredshawn
Jul 16, 2007

Hey everyone, I'm writing a paper on SQL injection for a technical writing course at university, and I'm at the end of the paper where I'd like to explain how to protect yourself against attacks. I figured that escaping dangerous characters was a good way to do it, but after reading Wikipedia I saw that "parameterized statements" are suggested instead. I don't really understand them, though; the article says that user input must not be directly embedded in the SQL statement, but instead you should do something like this:

code:
PreparedStatement prep = conn.prepareStatement("SELECT * FROM USERS WHERE USERNAME=? AND PASSWORD=?");
prep.setString(1, username);
prep.setString(2, password);
How does this help, though? Why can't someone still just put something like foo for the username and '' OR 'x'='x' for the password? I'm probably overlooking something obvious and for that I apologize. Thanks.

Edit: is this because the escape-it-yourself method requires you to remember every dangerous character, whereas the parameterization method already knows what characters to escape?

checkeredshawn fucked around with this message at 02:57 on May 22, 2009

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Yes, to some extent. But depending on the implementation...

If you're doing the escaping yourself, you're sending one wholly constructed string with variables rendered as text.

But for some engines, like SQL Server, you can actually send a SQL command with parameters and pass the arguments in separately, which ties directly into the parameterization style. So, it's like... even safer. And it helps with caching and whatnot.

Basically, parameterization is superior if it applies to your queries, and it often will (unless you're trying to do some meta poo poo).
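
To make the "send a SQL command with parameters" point concrete, here's a sketch of what that looks like in SQL Server via sp_executesql (the table and values mirror the Java example above). The key point for checkeredshawn's question: the password arrives as a plain value, so the quotes inside it are never parsed as SQL, and the query just looks for a user whose password literally is ' OR 'x'='x.

code:
exec sp_executesql
    N'SELECT * FROM USERS WHERE USERNAME = @u AND PASSWORD = @p',
    N'@u nvarchar(50), @p nvarchar(50)',
    @u = N'foo',
    @p = N''' OR ''x''=''x';
-- returns no rows unless that exact string really is someone's password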

Sergeant Hobo
Jan 7, 2007

Zhu Li, do the thing!
So the idea is you capture the input as parameters and run checks as to what they are before putting them into the query string?

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Ideally, there should already be some managed workflow where you're not really checking the parameters, just submitting them to a library that already does this checking and/or uses them as parameters.

Any impulse to stitch query strings together should be redirected to finding out what the proper, safe, injection-free way of doing SQL is for your environment.

And if there is no answer on that (which is highly unlikely), the best thing to do would be to design your own library which does exactly this. It should encapsulate the concept of validating parameters and then seamlessly passing them to the DB.

epswing
Nov 4, 2003

Soiled Meat
MySQL:

code:
CREATE TABLE  `froshnet`.`test` (
  `id` int(11) NOT NULL,
  `owes` int(11) NOT NULL,
  `paid` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

insert into test (id, owes, paid) values
(3,  115,   80),
(4,  115,  100),
(6,  100,  100),
(7,   95,   95),
(89, 120, null);

SELECT id, owes, paid FROM test
WHERE owes != paid;
I get this:
code:
3  | 115 | 80
4  | 115 | 100
I want this:
code:
3  | 115 | 80
4  | 115 | 100
89 | 120 | 0
The table is the result of a couple joins, which is why that null is sitting there. I've tried with WHERE owes <> paid, which gives me the same output.

Help me understand how to deal with null.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Upgrade the value to not-a-null... unless there's a case where you want zero and null reported separately.

"where owes does not equal (paid as if null then 0 else paid)"

This won't work if owes is equal to zero and paid is null, because the effective value for paid in the equation will also be zero.

Vince McMahon
Dec 18, 2003
select id, owes, paid from test where owes != IF(paid is null,0,paid)

gives you the right results, but yeah it's a little trickier if owes = 0 like the guy above suggests
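
Another option in MySQL is the NULL-safe comparison operator <=>, which treats NULL as a comparable value, plus IFNULL in the select list to get the 0 into the output:

code:
SELECT id, owes, IFNULL(paid, 0) AS paid
FROM test
WHERE NOT (owes <=> paid);
-- row 89 (paid NULL) is included and displayed as 0; unlike coercing NULL
-- to 0 in the WHERE clause, a row with owes = 0 and paid NULL still counts
-- as "not equal"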

The Merkinman
Apr 22, 2007

I sell only quality merkins. What is a merkin you ask? Why, it's a wig for your genitals!
Anyone know what's going on in this thread I made?
Error when converting from YaBB to smf (to then later convert to phpBB)

code:
Successful.
Converting membergroups... Successful.
Converting members... Successful.
Converting settings... Successful.
Converting personal messages (part 1)... Successful.
Successful.
Converting boards and categories...
Notice: Undefined index: general_cat in /home/content/d/a/t/dataangel/html/yabb22_to_smf.php on line 782

Notice: Undefined index: general_cat in /home/content/d/a/t/dataangel/html/yabb22_to_smf.php on line 784
Successful.
Converting topics (part 1)... Successful.
Converting posts (part 1 - this may take some time)... Unsuccessful!
This query:

    UPDATE `smfzork`.smfzork_topics
    SET ID_FIRST_MSG = ,
    ID_LAST_MSG =
    WHERE ID_TOPIC = 7;

Caused the error:

    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version
 for the right syntax to use near '
    ID_LAST_MSG =
    WHERE ID_TOPIC = 7' at line 2 
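
The syntax error itself is just the two missing values after the equals signs (presumably the converter never computed them). A syntactically valid version would look like this, with placeholder numbers:

code:
UPDATE `smfzork`.smfzork_topics
SET ID_FIRST_MSG = 123,
    ID_LAST_MSG = 456
WHERE ID_TOPIC = 7;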

moostaffa
Apr 2, 2008

People always ask me about Toad, It's fantastic. Let me tell you about Toad. I do very well with Toad. I love Toad. No one loves Toad more than me, BELIEVE ME. Toad loves me. I have the best Toad.
I'm using MaxMind's GeoIP Org and GeoLite Country databases for ip lookups and I'm trying to optimize my SQL statements to improve performance, as we do thousands of lookups every day.

Currently I'm using
code:
SELECT cc.cc, INET_NTOA(o.start), INET_NTOA(o.end), o.org
FROM `geoip_org` o
JOIN `geoip_c` c ON o.end >= INET_ATON(@ip)
JOIN `geoip_cc` cc ON c.end >= INET_ATON(@ip)
LIMIT 1;
once for each IP
I'm wondering if there is a way to set this up so I could feed it an array as a parameter and make it look them all up, instead of calling the select statement thousands of times at once. Do I need to create a stored procedure for this or something?

Vince McMahon
Dec 18, 2003
slightly tangential but I found it quicker to convert the IP to an integer in ASP than doing it with mysql's inet_aton function. are both o.start and o.end indexed?

also why does geoip have 3 tables?

Squashy Nipples
Aug 18, 2007

The following SQL gets me a list of Corporate Actions (CAs) for an investment (Dividends, Capital Gains, and Accrual Payouts):


code:
SELECT DGH_PROC_REIN_DT AS CADATE,
 DGH_TYPE AS CATYPE,
 ( DGH_RATE_1 + DGH_RATE_2 ) AS CARATE

 FROM DGHIDIST

 WHERE DGH_MGT_CO = 'strMgt'
 AND DGH_EXT_INV_CD = 'strExt'
 AND DGH_PROC_REIN_DT BETWEEN 'FirstDate' AND 'LastDate'

 ORDER BY CADATE
The problem is, sometimes you have more than one CA for a given date, and this makes my code go crazy. It's probably better for me to re-write the code to handle this situation, but it's a bunch of nested loops, so fixing it by changing the SQL is an attractive option.

It's simple enough to make the above SQL a sub-table and just SUM up the RATEs, grouping by CADATE. BUT, I need to capture the DGH_TYPEs and the logic behind how they were combined!

Let's pretend that there are two CA/DGH_TYPEs, 'A' and 'B'. If one or more 'A's is grouped with one or more 'B's, then it's a 'C', as such:
- If any number of 'A's are grouped together, the result is an 'A'
- If any number of 'B's are grouped together, the result is a 'B'
- If 'A's and 'B's are grouped together, the result is a 'C'


code:
SELECT CADATE,
 (Super cool SQL magic tricks) AS CATYPE,
 SUM(CARATE) AS CARATE

 FROM ( 

SELECT DGH_PROC_REIN_DT AS CADATE,
 DGH_TYPE AS CATYPE,
 ( DGH_RATE_1 + DGH_RATE_2 ) AS CARATE

 FROM DGHIDIST

 WHERE DGH_MGT_CO = 'strMgt'
 AND DGH_EXT_INV_CD = 'strExt'
 AND DGH_PROC_REIN_DT BETWEEN 'FirstDate' AND 'LastDate'

 ) AS SUBTABLE

 GROUP BY CADATE

 ORDER BY CADATE
Help, please.
I was thinking of using powers of two to represent the two base states, but I'm pretty sure that only works with two states, not 1 to infinity.
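
For what it's worth, one way to fill in the "Super cool SQL magic tricks" slot under the A/B/C rule above is a MIN/MAX comparison per date. Sketch only; a different fix gets posted further down the thread:

code:
 ( CASE WHEN MIN(CATYPE) = MAX(CATYPE) THEN MIN(CATYPE)  -- all rows on this date are the same type
        ELSE 'C'                                         -- 'A's and 'B's mixed on one date
   END ) AS CATYPE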

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
I don't really understand the structure of the table, but you can definitely make it faster by indexing o.end and c.end because they are being joined on. But it's hard to say definitively without more information.

As for running it multiple times... Most often, you can make a single-column temporary table and write a new query that looks up all the stuff you need by joining off that table.

Need. More. Data.
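
A rough sketch of that temporary-table approach against the geoip tables from the query above (MySQL; the ip_batch table is invented for the example, the BETWEEN-style range join is an assumption rather than the original query's logic, and the ci join column comes from the schema posted below):

code:
CREATE TEMPORARY TABLE ip_batch (ip INT UNSIGNED NOT NULL);

INSERT INTO ip_batch (ip) VALUES
  (INET_ATON('1.2.3.4')),
  (INET_ATON('5.6.7.8'));   -- one row per address to look up

SELECT b.ip, cc.cc, o.org
FROM ip_batch b
JOIN geoip_org o ON b.ip BETWEEN o.start AND o.end
JOIN geoip_c   c ON b.ip BETWEEN c.start AND c.end
JOIN geoip_cc cc ON cc.ci = c.ci;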

supster
Sep 26, 2003

I'M TOO FUCKING STUPID
TO READ A SIMPLE GRAPH

Vince McMahon posted:

slightly tangential but I found it quicker to convert the IP to an integer in ASP than doing it with mysql's inet_aton function. are both o.start and o.end indexed?

also why does geoip have 3 tables?
You can also cache the integer value of the IPs in the table so that you're not doing the conversion on every single select.

moostaffa
Apr 2, 2008

People always ask me about Toad, It's fantastic. Let me tell you about Toad. I do very well with Toad. I love Toad. No one loves Toad more than me, BELIEVE ME. Toad loves me. I have the best Toad.

Vince McMahon posted:

slightly tangential but I found it quicker to convert the IP to an integer in ASP than doing it with mysql's inet_aton function. are both o.start and o.end indexed?

also why does geoip have 3 tables?

code:
geoip_c c:
  - start (int)
  - end   (int, index)
  - ci    (tinyint)

geoip_cc cc:
  - ci    (tinyint, primary)
  - cc    (char(2))
  - cn    (varchar(50))

geoip_org o:
  - start (int)
  - end   (int, index)
  - org   (varchar(50))

Mongolian Queef
May 6, 2004

I'm toying with the idea of indexing Usenet groups. To do this, newsgroup providers give you access to an overview format of messages in the range you specify. It looks like this:
code:
204733445	Subject	Poster	Date	<....@news.astraweb.com>		Bytes	Lines	+Additional keywords + content
The subject field looks something along the lines of this (if you're lucky):
code:
Release name [001/200] - "ReleaseName.part001.rar" yEnc (01/70)
Thus I need to strip out information from the subjects and group articles which are similar. This is already done with a regular expression.
I have an overview dataset with more than 50 million articles to play around with. It's actually already in a database, but I want advice on how to store this in a good way. Currently a row in the release-table has a huge field in which the articles are stored
After stripping away the different part specifiers and other junk which isn't necessary, I have a trimmed down subject. This is used to create a row in the releases-table. So, when scanning another round of articles, I need to be able to find a release by searching for the trimmed down subject.
Should I keep all the articles in an article-table? I will need to be able to load a release and add articles that aren't already in the release (it happens that I need to scan a range I've already been over and I want to avoid duplicates).
I currently use MySQL 5.1.30.
Any ideas on how to implement something like this?
Currently, a table named releases has columns much like a single article has, with an extra varchar which contains the trimmed subject. Its primary key is an integer.
To get the articles for a release, a row in the articles-table will have an entry for the release id which in turn has all the articles in one of its columns.
The articles-table has 525050 rows and is currently 45 GB. :) Releases has the same number of rows and is 120 MB.

Squashy Nipples
Aug 18, 2007

I'm pretty close to figuring out my problem... But stuck on a data-type conversion.

DB2 has no built-in data type for Boolean (true/false), and I don't have the access to make my own data types. Is there a built-in way to produce text-style Boolean output, or is a CASE statement the only way?

Here is my current workaround:

code:
( CASE WHEN SUM(CAAF) > 0 THEN 'TRUE' ELSE 'FALSE' END ) AS CAAF

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
tunah, if the data is already normalized, what more can you do to store it? Not sure how else you'd go about it.

Dr.Khron, yeah, that's probably the way to go. Maybe abstract it into a function or something.
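
If CREATE FUNCTION privileges are available (which may not be the case here), the CASE can be wrapped in a SQL scalar function, roughly like this (DB2 sketch; BOOL_TEXT is a made-up name):

code:
CREATE FUNCTION BOOL_TEXT (N INTEGER)
  RETURNS VARCHAR(5)
  LANGUAGE SQL
  DETERMINISTIC
  RETURN CASE WHEN N > 0 THEN 'TRUE' ELSE 'FALSE' END;

-- then: SELECT BOOL_TEXT(SUM(CAAF)) AS CAAF ...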

Vince McMahon
Dec 18, 2003

moostaffa posted:

code:
geoip_c c:
  - start (int)
  - end   (int, index)
  - ci    (tinyint)

geoip_cc cc:
  - ci    (tinyint, primary)
  - cc    (char(2))
  - cn    (varchar(50))

geoip_org o:
  - start (int)
  - end   (int, index)
  - org   (varchar(50))

Would it be possible to add a cc/cn column to the org table? Or are there overlapping issues there?

If you tried that, along with passing an integer instead of the IP to MySQL, it might be a bit quicker.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Still not totally understanding those geo tables but I will add this, in general: Most operations are I/O bound instead of CPU bound. So converting an IP to some normal form isn't the primary bottleneck of the lookup, unless the tables are trivially small or already stored in RAM. But testing and query plans will answer this definitively.

Squashy Nipples
Aug 18, 2007

All fixed!

What I realized is that the Accrual Flush is a boolean condition: you can pay out accruals regardless of whether or not there is also a fixed payout on a date. As long as I can insert zeros for the rate of an Accrual Flush, then I can just sum them up with AF = true. (this is because some of the Accrual payouts have specious rates listed... Thanks a lot, Putnam!)

Even though doing it this way means I have to change the code, too, it actually makes the code simpler, because it removes a branch and some logic.

Here is the SQL:

code:
SELECT CADATE,
 ( CASE WHEN SUM(CAAF) = 0 THEN 'FALSE' ELSE 'TRUE' END ) AS CAAF,
 SUM(CARATE) AS CARATE
 
 FROM (
        
SELECT DGH_PROC_REIN_DT AS CADATE,
 ( CASE WHEN DGH_TYPE = 'A' THEN 1 ELSE 0 END ) AS CAAF,
 ( CASE WHEN DGH_TYPE = 'A' THEN 0.0 ELSE ( DGH_RATE_1 + DGH_RATE_2 ) END ) AS CARATE
 
 FROM DGHIDIST
 WHERE DGH_MGT_CO = 'strMgt'
 AND DGH_EXT_INV_CD = 'strExt'
 AND DGH_PROC_REIN_DT BETWEEN 'FirstDate' AND 'LastDate'

 ) AS SUBTABLE
 GROUP BY CADATE
 ORDER BY CADATE
