Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
ssergE
Sep 10, 2001

No longer a stupid baby. Now a stupid teenager.
I have a problem, and I don't think I can solve it purely in SQL.

Just say I have a (simplified) table like so:
Date, Value
2008-07-13, 10
2008-07-12, 10
2008-07-11, 10
2008-07-10, 11
2008-07-09, 13
(etc)

I need a way to return the number of values that are the same from the first row, ie for the above data-set it would return 3. If there was an entry for the 2008-07-14 that was another number, such as 9, it would only return 1.

Any ideas? Otherwise I am going to have to resort to a perl script.

Running Sybase.

Adbot
ADBOT LOVES YOU

atomic johnson
Dec 7, 2000

peeping-tom techie with x-ray eyes

ssergE posted:

I have a problem, and I don't think I can solve it purely in SQL.

Just say I have a (simplified) table like so:
Date, Value
2008-07-13, 10
2008-07-12, 10
2008-07-11, 10
2008-07-10, 11
2008-07-09, 13
(etc)

I need a way to return the number of values that are the same from the first row, ie for the above data-set it would return 3. If there was an entry for the 2008-07-14 that was another number, such as 9, it would only return 1.

Any ideas? Otherwise I am going to have to resort to a perl script.

Running Sybase.

I'll freely admit that I don't know what Sybase might do differently, but what I'd do is

code:
select
   count(1)
from
   the_table
where
   Value = (
      select
         top 1
         Value
      from
         the_table
      order by
         Date desc
    )
(That is, select the number of entries from the table where the Value = the Value of the most recent Date).

ssergE
Sep 10, 2001

No longer a stupid baby. Now a stupid teenager.

atomic johnson posted:

I'll freely admit that I don't know what Sybase might do differently, but what I'd do is

code:
select
   count(1)
from
   the_table
where
   Value = (
      select
         top 1
         Value
      from
         the_table
      order by
         Date desc
    )
I am away from my database at the moment, but if I am reading that correctly, it isn't quite what I am after (I should have been more clear). The problem with that code is it returns a count of all rows that have the same value as the first row - I am after a count of how many rows have that value one-after-the-other. As in, if the first 2 days had a value of 1, day 3 had a value of 2, and day four had a value of 1, your code will return three, whereas I am after something that would return 2.

(That is, select the number of entries from the table where the Value = the Value of the most recent Date).

atomic johnson
Dec 7, 2000

peeping-tom techie with x-ray eyes

ssergE posted:

I am away from my database at the moment, but if I am reading that correctly, it isn't quite what I am after (I should have been more clear). The problem with that code is it returns a count of all rows that have the same value as the first row - I am after a count of how many rows have that value one-after-the-other. As in, if the first 2 days had a value of 1, day 3 had a value of 2, and day four had a value of 1, your code will return three, whereas I am after something that would return 2.

(That is, select the number of entries from the table where the Value = the Value of the most recent Date).

Oh, well no, it doesn't do that. You might want to look at using a cursor to iterate over the resultset for what you're looking to do.

DLCinferno
Feb 22, 2003

Happy

ssergE posted:

I am away from my database at the moment, but if I am reading that correctly, it isn't quite what I am after (I should have been more clear). The problem with that code is it returns a count of all rows that have the same value as the first row - I am after a count of how many rows have that value one-after-the-other. As in, if the first 2 days had a value of 1, day 3 had a value of 2, and day four had a value of 1, your code will return three, whereas I am after something that would return 2.

(That is, select the number of entries from the table where the Value = the Value of the most recent Date).

I know it doesn't help you, but it might help others. If you were using SQL Server 2005 or greater, you could do this in a single query using the ROW_NUMBER() function in conjunction with a PARTITION BY clause.

Foolio882
Jun 16, 2003

by Fistgrrl
Edit: LET THIS BE A LESSON. DONT USE KEYWORDS AS COLUMN NAMES AND I AM AN IDIOT

Solved :unsmith:

Okay, this has brought me to my knees in frustration, here's hoping someone can help.

I'm using PHP/MySQL, but I have a very specific problem that I cannot solve.

php:
<?
$result3 = mysql_query("UPDATE commuters SET delete='$delete' WHERE month='$month_number'")
or die(mysql_error());
?>
With the error being:
"You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'delete='1' WHERE month='July'' at line 1"

Now what boggles my mind is this was the finishing touch on a pretty extensive piece of poo poo that has lot's of database interaction. this is VERY basic and I know it, but something has gone wrong in my brain and I can't sort it out.

In all seriousness, I copied and pasted the code that updates "renew" instead of "delete" and changed the variables. It doesn't work. loving christ :smith:

Foolio882 fucked around with this message at 16:52 on Jul 22, 2008

DLCinferno
Feb 22, 2003

Happy

DLCinferno posted:

I know it doesn't help you, but it might help others. If you were using SQL Server 2005 or greater, you could do this in a single query using the ROW_NUMBER() function in conjunction with a PARTITION BY clause.

edit: Now that I'm at work, here's how:
code:
create table #t1 (dt datetime, val int)

insert	#t1 
select	'2008-07-13', 10
insert	#t1 
select	'2008-07-12', 10
insert	#t1 
select	'2008-07-11', 10
insert	#t1 
select	'2008-07-10', 11
insert	#t1 
select	'2008-07-09', 13

select	max(num)
from	(
	select	row_number() over (partition by val order by val) num, dt
	from	#temp 
	) as a
join	#temp t on t.dt = a.dt 
where	t.dt = (select max(dt) from #temp)

Zoracle Zed
Jul 10, 2001

ssergE posted:

I am away from my database at the moment, but if I am reading that correctly, it isn't quite what I am after (I should have been more clear). The problem with that code is it returns a count of all rows that have the same value as the first row - I am after a count of how many rows have that value one-after-the-other. As in, if the first 2 days had a value of 1, day 3 had a value of 2, and day four had a value of 1, your code will return three, whereas I am after something that would return 2.

(That is, select the number of entries from the table where the Value = the Value of the most recent Date).
Are you asking something like "For how many days has X had its current value?"

If so,

code:
select count(*)
from mytable as tnow inner join mytable as told
       on told.date <= tnow.date and told.val = tnow.val
where tnow.date = (select max(date) from mytable)
        and not exists( select * from mytable as other
                        where  other.val <> tnow.val
                               and other.date between told.date and tnow.date 
                      )

Zoracle Zed fucked around with this message at 02:58 on Jul 23, 2008

ssergE
Sep 10, 2001

No longer a stupid baby. Now a stupid teenager.
Thanks to all the sql noggins who responded - I shall try the various solutions tomorrow at work. PS: the table has close to a billion rows.

permanoob
Sep 28, 2004

Yeah it's a lot like that.
Would this be the place to ask questions about the SQL server package itself? I'm guessing not, but I figured it's safe ro post here first asking, than to make a new thread in SH/HC.

Squashy Nipples
Aug 18, 2007

Ug, having a brainfart, and there is no one left at work I can talk logic with.

Ok, so I have a table that tracks 401K contribution rates. Its set up as a 'profile' table; If you pull all the records for a specific participant, each row has:

-Data Field (the percentage contribution)
-Source Code
-Effective Date
-Expiration Date

If the record is currently active, the Expiration date is Null.

Ok, what I need, is a list of all relevant Source Codes, during a specific time period (between datFirst and datLast). The rest of the SQL is complicated, but it works, so here is the relevant part of the WHERE clause:

code:
        & " AND ( " _
& "   ( TABLE.EXPIRATION_DATE <= 'datLast'" _
& "   AND TABLE.EFFECTIVE_DATE >= 'datFirst' )" _
& "  OR" _
& "   ( ( TABLE.EXPIRATION_DATE >= 'datLast'" _
& "       OR TABLE.EXPIRATION_DATE IS NULL ) " _
& "   AND TABLE.EFFECTIVE_DATE < 'datLast' )" _
& "      )" _
(Query edited for clarity. Yes, I know that you can't put a variable directly into a literal date string, I just removed the text formatting to make it easier to read)

Does this capture EVERY code that was valid between datFirst and datLast, or am I missing something?

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?

permanoob posted:

Would this be the place to ask questions about the SQL server package itself? I'm guessing not, but I figured it's safe ro post here first asking, than to make a new thread in SH/HC.

You might as well try, it couldn't hurt. Worst case scenario, you make a thread. :)

Squashy Nipples
Aug 18, 2007

So I built a matrix to track all the possibilites, and figured out logic for each scenario... and then I realized that becuase Effective Date is always <= Expiration date, the logic collaspes down to this:

code:
& " AND TABLE.EFFECTIVE_DATE < 'datLast'" _
& " AND ( TABLE.EXPIRATION_DATE > 'datFirst'" _
& "    OR TABLE.EXPIRATION_DATE IS NULL )"
A bit simpler, eh?

If there is still a hole in my logic, please point it out.

clone
Apr 19, 2002
I am still having some troubles with my access database.

The query below selects the most recent location link created for a person. The problem is that sometimes there is more than one link created on the particular date, giving me two locations (I only want one).
code:
SELECT LINK_LOCATION_ID,
       PERSON_ID,
       LOCATION_ID,
       DATE_CREATED
FROM   LINK_LOCATION AS LL
WHERE  LL.DATE_CREATED = (SELECT MAX(DATE_CREATED)
                          FROM   LINK_LOCATION
                          WHERE  LL.PERSON_ID = PERSON_ID)


This is the output.
code:
link_location_id	person_id	location_id	date_created
5520	                1955	        1217	        2008-02-06
2622	                1963	        1402	        2007-10-01
5610	                1963	        1402	        2007-10-01
2635	                1963	        1406	        2007-10-01
5607	                1963	        1406	        2007-10-01
4283	                1996	        2386	        2008-04-07
5536	                1996	        2386	        2008-04-07
3842	                2001	        2114	        2007-01-15
5577	                2027	        1992	        2008-01-14
3625	                2027	        1992	        2008-01-14
5593	                2028	        1497	        2007-11-26
How would I go about re-jigging the above query so it only selects the top 1 row for each person in the table?

Squashy Nipples
Aug 18, 2007

Well, in Access, if the table is entered sequentially, you can use an Autonumber for your Primary Key, and then just ORDER BY Autnumber DESC to get the latest entry on top.

Then you can use 'SELECT TOP n', this allows you to specify any number of rows you need. (Sometimes this is good for testing huge Queries)


As far as I can tell, your Primary Key is NOT sequential. If so, your real problem is that there is no way to tell which one was the "last" location; using my above methods would essentially pick a random row (depending on ORDER BY). I think you don't have the data you actually need. Can you add a timestamp to the date created? That would make the dates unique, and then you could always find the "latest" with MAX(date).

DTMTCM
Feb 18, 2005
I misclicked post icon so I have this account now.
I'll mention this before I explain my problem: I don't have a terribly large amount of experience with databases but I understand the basic concepts of joins, aggregate functions, etc.

My problems involves an MSSQL database and a query that hit an infinite loop or appears to be an infinite loop. I took the query and broke it down to smaller and smaller pieces until I could do 6/7 of the linking in the query but as soon as I linked on the last table (CustodianID in table A to CustodianID in table B) the query falls apart. I first wrote the SQL code for this query and then debugged it in Access using the nice visualization to make sure I was an idiot and I ran into the same problem. I should mention the query is very basic and just links to unique key values in a linear fashion, if that makes any sense.

I know this problem is vague but if I manually create two queries, translating from CustodianID to CustodianName as a second query using the same logic, it works just fine. I'm curious as to how would I go about debugging this query and learning more about various configurations or setting on the server that could cause this to happen.

DTMTCM fucked around with this message at 04:38 on Jul 28, 2008

DLCinferno
Feb 22, 2003

Happy

urmomlolzinbed posted:

I'll mention this before I explain my problem: I don't have a terribly large amount of experience with databases but I understand the basic concepts of joins, aggregate functions, etc.

My problems involves an MSSQL database and a query that hit an infinite loop or appears to be an infinite loop. I took the query and broke it down to smaller and smaller pieces until I could do 6/7 of the linking in the query but as soon as I linked on the last table (CustodianID in table A to CustodianID in table B) the query falls apart. I first wrote the SQL code for this query and then debugged it in Access using the nice visualization to make sure I was an idiot and I ran into the same problem. I should mention the query is very basic and just links to unique key values in a linear fashion, if that makes any sense.

I know this problem is vague but if I manually create two queries, translating from CustodianID to CustodianName as a second query using the same logic, it works just fine. I'm curious as to how would I go about debugging this query and learning more about various configurations or setting on the server that could cause this to happen.
Well for starters I highly doubt you created an infinite loop in SQL using a select. Worse comes to worse, you probably need some indexes on your tables at the very least. Can you post the code you're having problems with?

clone
Apr 19, 2002

Dr.Khron posted:

Well, in Access, if the table is entered sequentially, you can use an Autonumber for your Primary Key, and then just ORDER BY Autnumber DESC to get the latest entry on top.

Then you can use 'SELECT TOP n', this allows you to specify any number of rows you need. (Sometimes this is good for testing huge Queries)


As far as I can tell, your Primary Key is NOT sequential. If so, your real problem is that there is no way to tell which one was the "last" location; using my above methods would essentially pick a random row (depending on ORDER BY). I think you don't have the data you actually need. Can you add a timestamp to the date created? That would make the dates unique, and then you could always find the "latest" with MAX(date).

It does not really matter which row it picks, the highest primary key would be enough.

My biggest problem is being clueless about SQL. I could figure it out programmatically without much trouble, which would be very clunky for the circumstances i am in.

I suppose what i would like is my original query with a bit added in so I am only getting one person_id per row using the highest link_location_id. I am not smart enough go the next step.

Squashy Nipples
Aug 18, 2007

clone posted:

It does not really matter which row it picks, the highest primary key would be enough.

My biggest problem is being clueless about SQL. I could figure it out programmatically without much trouble, which would be very clunky for the circumstances i am in.

I suppose what i would like is my original query with a bit added in so I am only getting one person_id per row using the highest link_location_id. I am not smart enough go the next step.

Ug, if the LINK_LOCATION_ID Primary Key is sequential, then I gave you everything you need in my first reply!

Here is how I would do it: if you use an IN LIST with a GROUP BY sub-query then you don't need a Self-Join or TOP N.

code:
SELECT LINK_LOCATION_ID,
       PERSON_ID,
       LOCATION_ID,
       DATE_CREATED
 FROM  LINK_LOCATION
 WHERE LINK_LOCATION_ID IN
(
SELECT MAX(LINK_LOCATION_ID)
 FROM LINK_LOCATION
 GROUP BY PERSON_ID
)

DTMTCM
Feb 18, 2005
I misclicked post icon so I have this account now.

DLCinferno posted:

Well for starters I highly doubt you created an infinite loop in SQL using a select. Worse comes to worse, you probably need some indexes on your tables at the very least. Can you post the code you're having problems with?

Sorry about not posting the code. I didn't have access to it at that moment but it was a problem that had been bugging me all day. The Custodians table is just a table with 10 unique entries for the CustodianIDs that all map to a name. It seems like a trivial link but as I said before, I'm not exactly well experienced in queries.

code:
 SELECT
     //Custodians.CustodianName,  
     DiscoveryJobs.CustodianID,                             
     ExportVolumes.VolumeName,                             
     ExportedItems.ExportJobID                             
 FROM                             
     ExportedItems,                             
     //Custodians,                             
     ExportVolumes,                             
     DataExtractResults,                             
     DiscoveryJobs                             
 WHERE                             
     ExportedItems.ExportJobID={0} AND                             
     DataExtractResults.DataExtractResultsID = ExportedItems.DataExtractResultsID AND    
     DiscoveryJobs.DiscoveryJobID = DataExtractResults.DiscoveryJobID AND                
     //Custodians.CustodianID = DiscoveryJobs.CustodianID AND                            
     ExportVolumes.ExportVolumeID = ExportedItems.ExportVolumeID;                        

crazypenguin
Mar 9, 2005
nothing witty here, move along
I'm trying to accomplish some sort of relevancy-ordering in a search feature. Basically, we have a lot of data and right now we allow the user to search through it to find records that satisfy ALL the conditions they specify. This has some usability problems, so obviously what we'd like to do is return results sorted by how many of the conditions match.

I'm not sure how to translate this to SQL. Off the top of my head, I'd like to do separate queries for rows that match each condition, then GROUP BY the row ID and sum() up a magic column that's always 1 to get a result back that says "this row matched 3 conditions," but I can't figure out how to translate this to SQL. My first thought was to do something like union each query together and then do the group, but that's not possible since group by is part of the select statement, not the union syntax, and even if it were, MySQL apparently always removes duplicates from union statements.

Any thoughts? I know I'm not the first to want to do this, I just haven't had any luck guessing what to call it for google.

MoNsTeR
Jun 29, 2002

crazypenguin posted:

I'm trying to accomplish some sort of relevancy-ordering in a search feature. Basically, we have a lot of data and right now we allow the user to search through it to find records that satisfy ALL the conditions they specify. This has some usability problems, so obviously what we'd like to do is return results sorted by how many of the conditions match.

I'm not sure how to translate this to SQL. Off the top of my head, I'd like to do separate queries for rows that match each condition, then GROUP BY the row ID and sum() up a magic column that's always 1 to get a result back that says "this row matched 3 conditions," but I can't figure out how to translate this to SQL. My first thought was to do something like union each query together and then do the group, but that's not possible since group by is part of the select statement, not the union syntax, and even if it were, MySQL apparently always removes duplicates from union statements.

Any thoughts? I know I'm not the first to want to do this, I just haven't had any luck guessing what to call it for google.

To avoid removing duplicates, use UNION ALL instead of UNION. And if you want to group the results of the unions, just do:
code:
select grouping_item
     , aggregate_function(whatever)
  from (
        select grouping_item, whatever from ...
        union all
        select grouping_item, whatever from ...
        union all
        select grouping_item, whatever from ...
       )
 group by grouping_item

Victor
Jun 18, 2004
crazypenguin, something like this is an option:
code:
select  id,
        whatever,
        case when first_condition then 1 else 0 end +
        case when second_condition then 1 else 0 end as MATCH_LEVEL
from    your_table
order by MATCH_LEVEL desc
Depending on how much you mean by "a lot of data", the above might not perform acceptably, but it's worth a shot.

Zoracle Zed
Jul 10, 2001

Victor posted:

crazypenguin, something like this is an option:
code:
select  id,
        whatever,
        case when first_condition then 1 else 0 end +
        case when second_condition then 1 else 0 end as MATCH_LEVEL
from    your_table
order by MATCH_LEVEL desc
Depending on how much you mean by "a lot of data", the above might not perform acceptably, but it's worth a shot.

Might be a little more efficient to drop the case

code:
select  id,
        whatever,
        -( (first_condition) + (second_condition)) as MATCH_LEVEL
from    your_table
where   MATCH_LEVEL > 0
order by MATCH_LEVEL desc
I think booleans are cast to numeric with TRUE as negative one and FALSE zero, so the negative sign switches to positive ones.

ExileStrife
Sep 12, 2004

Happy birthday to you!
Happy birthday to you!
I'm trying to make an Oracle function that returns an day to second interval of the "working hours" (or days if it's long enough) between two given timestamps. Our working hours are from 8:30am to 5:00pm, Monday through Friday. Intervals that go outside those times get clipped and only start counting again if it rolls over to the next day. This is what I would like to see:

Start | End | Working Hours Duration
07/30/2008 9:00:00 | 07/30/2008 13:00:00 | 0 4:00:00
07/30/2008 12:00:00 | 07/30/2008 20:00:00 | 0 5:00:00
07/30/2008 12:00:00 | 07/31/2008 10:00:00 | 0 6:30:00
07/25/2008 12:00:00 | 07/31/2008 17:00:00 | 1 15:00:00

Stephen
Feb 6, 2004

Stoned
I've got two MySQL tables:
canines(id, name, etc.)
votes (id, canine_id)

I want to select the dogs, ordered by the number of votes it has.

My query:
code:
SELECT c.*, v.votecount  
FROM canines c 
LEFT JOIN (
 SELECT canine_id, COUNT(*) AS votecount 
 FROM votes 
 GROUP BY canine_id
) v ON v.canine_id = c.id 
ORDER BY v.votecount DESC
This works perfectly, but it runs extremely slowly. Is there a faster way of doing this query?

Edit: I should mention that I have an index on 'canine_id' on the votes table.

Aredna
Mar 17, 2007
Nap Ghost

Stephen posted:

I've got two MySQL tables:
canines(id, name, etc.)
votes (id, canine_id)

I want to select the dogs, ordered by the number of votes it has.
...
Edit: I should mention that I have an index on 'canine_id' on the votes table.


See if this is any faster:
code:
SELECT c.canine_id, c.name, c.etc, count(v.id) as votecount
FROM canines c 
  LEFT JOIN votes v
    ON c.canine_id = c.canine_id
GROUP BY c.canine_id, c.name, c.etc
ORDER BY count(*) DESC
I'm not sure what results you want if there are no votes. I'm kinda fuzzy without some further testing, but I think this will return either 0 or NULL for rows with no votes. If you change the LEFT JOIN to an INNER JOIN then it will elminate those rows completely.

crazypenguin
Mar 9, 2005
nothing witty here, move along
Thanks monster, victor, and zoracle zed. That's exactly what I was looking for. :)

var1ety
Jul 26, 2004

Stephen posted:

I've got two MySQL tables:
canines(id, name, etc.)
votes (id, canine_id)

I want to select the dogs, ordered by the number of votes it has.

My query:
code:
SELECT c.*, v.votecount  
FROM canines c 
LEFT JOIN (
 SELECT canine_id, COUNT(*) AS votecount 
 FROM votes 
 GROUP BY canine_id
) v ON v.canine_id = c.id 
ORDER BY v.votecount DESC
This works perfectly, but it runs extremely slowly. Is there a faster way of doing this query?

Edit: I should mention that I have an index on 'canine_id' on the votes table.

Aggregating the votes table is probably what's taking so long. The query will out of necessity need to do a full table scan to satisfy the request, which can take a long time when the number of rows is large. You can run the query on its own to verify.

There are not a lot of things you can do to tune this passively. You can make sure the votes table is minimized in side (by restricting the maximum length of the field data types) so that more rows can be retrieved at a time, and you can try caching the aggregation operation in some fashion.

If you can do more drastic things I would encourage you to redefine the votes table so that it holds an aggregated total per canine instead of individual votes. The table is probably write few read many, so it makes sense to optimize for reading.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
Is there any way to keep mysqldump from killing server performance while it's running?

edit: It seems it is because it locks the tables. Can I do a live replication on the same machine?

fletcher fucked around with this message at 02:02 on Jul 31, 2008

Stephen
Feb 6, 2004

Stoned

var1ety posted:

Aggregating the votes table is probably what's taking so long. The query will out of necessity need to do a full table scan to satisfy the request, which can take a long time when the number of rows is large. You can run the query on its own to verify.

There are not a lot of things you can do to tune this passively. You can make sure the votes table is minimized in side (by restricting the maximum length of the field data types) so that more rows can be retrieved at a time, and you can try caching the aggregation operation in some fashion.

If you can do more drastic things I would encourage you to redefine the votes table so that it holds an aggregated total per canine instead of individual votes. The table is probably write few read many, so it makes sense to optimize for reading.
Yeah I'm thinking maybe a running tally in the canines table for when I need a count, and keep the votes table for when I need to verify who made a vote and when. I guess I've always just had database normalization pounded into my head so hard, it was tough to think of the alternative. http://www.codinghorror.com/blog/archives/001152.html is a great article.

stack
Nov 28, 2000
This might be better placed in the coding horrors thread but maybe the developer knows something I don't. A developer wrote a bunch of insert statements which could collide with other uniquely keyed rows and decided to craft up his inserts like so (mysql) :

code:
INSERT IGNORE ... ON DUPLICATE KEY ...
This doesn't even make sense to me. IGNORE means quietly drop the insert, while the ON DUPLICATE KEY bit means go ahead and update the existing row.

I think the application is working as expected and rows are being updated, that is I assume they are since he hasn't raised any issues regarding his data being updated properly.

Is this :downs: or :)?

edabcedbdfeb
Nov 11, 2005

I'm having a pretty newbie problem with a search module I'm writing. The schema is that there are sessions in one table, with multiple files accessed per each session that are stored in another table. I'm trying to get sessions that do not have a certain file accessed, and this is what I'm doing:
code:
SELECT * FROM session_records s
LEFT JOIN file_records f
ON f.ipid = s.id
WHERE f.file NOT LIKE '%pdf%'
This just flat out does not work at all, returning a lot of sessions that still accessed PDFs. The opposite operation (looking for sessions that did access PDFs with LIKE) works fine. What am I doing wrong?

stack
Nov 28, 2000

kirvett posted:

I'm having a pretty newbie problem with a search module I'm writing. The schema is that there are sessions in one table, with multiple files accessed per each session that are stored in another table. I'm trying to get sessions that do not have a certain file accessed, and this is what I'm doing:
code:
SELECT * FROM session_records s
LEFT JOIN file_records f
ON f.ipid = s.id
WHERE f.file NOT LIKE '%pdf%'
This just flat out does not work at all, returning a lot of sessions that still accessed PDFs. The opposite operation (looking for sessions that did access PDFs with LIKE) works fine. What am I doing wrong?

code:
SELECT * FROM session_records s
LEFT JOIN file_records f ON (f.ipid = s.id AND f.file LIKE '%pdf%')
WHERE f.ipid IS NULL
Would likely run slow as all hell though.

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

kirvett posted:

I'm having a pretty newbie problem with a search module I'm writing. The schema is that there are sessions in one table, with multiple files accessed per each session that are stored in another table. I'm trying to get sessions that do not have a certain file accessed, and this is what I'm doing:
code:
SELECT * FROM session_records s
LEFT JOIN file_records f
ON f.ipid = s.id
WHERE f.file NOT LIKE '%pdf%'
This just flat out does not work at all, returning a lot of sessions that still accessed PDFs. The opposite operation (looking for sessions that did access PDFs with LIKE) works fine. What am I doing wrong?
I think this is more what you are looking for:
code:
SELECT * FROM session_records
WHERE s.id NOT in (SELECT f.ipid FROM file_records f
WHERE f.file LIKE '%pdf%')

edabcedbdfeb
Nov 11, 2005

code:
SELECT * FROM session_records s
LEFT JOIN file_records f ON (f.ipid = s.id AND f.file LIKE '%pdf%')
WHERE f.ipid IS NULL
Since the SQL is dynamically generated and there could be other search terms for file accessed (either inclusive or exclusive), this probably wouldn't work.

code:
SELECT * FROM session_records 
WHERE s.id NOT in (SELECT f.ipid FROM file_records f
WHERE f.file LIKE '%pdf%')
I had previously considered doing something like this, although it wouldn't really work as is because MySQL considers all subqueries as dependent (see http://bugs.mysql.com/bug.php?id=9090). I can compile the subquery as a comma-separated list, which I've done before to get around this issue, but it's kind of a hassle. Looks like the only way though. There's also that wacky double subquery thing that somebody posted in that link that might also be worth a try.

Jazza
Nov 23, 2007
I have a small question, probably pretty easy to answer but my knowledge of SQL is pretty limited.

I have two tables - one called tbl_lunchorders which is basically a "container" table for a lunch ordering system which contains things like the user id (ie, who ordered it), what time they ordered it, etc. I then have another table called tbl_orderedfood which has entries for each food item ordered, with a foreign key (order_id) linking to tbl_lunchorders.

Basically what I want to do is to keep the last thirty COMPLETED orders in the database (tbl_lunchorders) and the ordered food items associated with it, and delete the older items.

Currently I've got this, which doesn't really help since while it clears the orders it doesn't clear the ordered food items.
mysql_query("DELETE FROM tbl_lunchorders WHERE filled = '1' AND id <= ('$latestorder[id]' - 30)") or die(mysql_error());

I made up something like:

mysql_query("DELETE FROM tbl_orderedfood f WHERE f.order_id = (SELECT o.id FROM tbl_lunchorders WHERE o.completed = '1' AND o.id <= ('$latestorder[id]' - 30))") or die(mysql_error());

Now this just throws a syntax error of:

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'WHERE f.order_id = (SELECT o.id FROM tbl_lunchorders WHERE o.completed = '1' AND o.' at line 1

I've fiddled with this a bit and written various versions of it but I honestly don't know what I'm doing with this double table stuff. Any ideas?

Squashy Nipples
Aug 18, 2007

Why all the deleting? Sort of the point of of a DB is to grow and grow, and still be able to find stuff. Can't you just make a View for the last 30 Rows?

If you really want to keep to table to 30 entries, Stored Procedures (maybe with a Trigger) are the way to go. So, every time you add a Row, the procedure deletes every Row that is not returned in a "SELECT TOP 30" Subquery.

Fastbreak
Jul 4, 2002
Don't worry, I had ten bucks.
I have a question about the XML column type in MSSQL. I am just a developer and know enough about SQL to just get myself in a little trouble, but not enough to know how fast I can expect things to run. We need an answer for architecture and both DBA's are on vacation.

We have a webservice that is a general datacapture tool, you send it what is basically a hash table, it formats it into XML:
code:
<data>
   <key1>value1</key1>
   <key2>value2</key2>
   <key3>value3</key3>
   <key4>value4</key4>
</date>
The tool is used for a bunch of different apps, in a bunch of different places, using the same database row structure for each; the apps are seperated by PromotionID:
(freehanded from memory)
code:
[RowID] [bigint] IDENTITY(1,1) NOT NULL,
[PromotionID] [int] NOT NULL,
[FormData] [xml] NOT NULL,
[TimeStamper] [datetime] NOT NULL
We will have upward of 2 mill + of these entries, with XML a lot larger and potentially complex than that. The problem comes in where we will want to calculate percentages of items. For instance, how many key1 values are "X" based off the total number of key1 values. We can assume a controlled list of possible answers, I think 6 max.

How expensive of an SQL operation would it be to go through every row's XML, of a certain promoID, and calculate the percentages? It would naturally be faster to just use another count table for each value and vote, but we are hoping to be able to keep everything in one place. I am sure that the Xpath operations against the XML data are fast, but it still seems like it would be very expensive.

Any thoughts?

Adbot
ADBOT LOVES YOU

mezz
Aug 12, 2004

oh polly

Jazza posted:


mysql_query("DELETE FROM tbl_orderedfood f WHERE f.order_id = (SELECT o.id FROM tbl_lunchorders WHERE o.completed = '1' AND o.id <= ('$latestorder[id]' - 30))") or die(mysql_error());

Now this just throws a syntax error
First off, what Dr. Khron said, why delete those records? You usually won't run in performance issues even if you keep a few million rows.

Second, if id is an autocalculated field there is no guarantee that keeping the rows max(id)-30 up to max(id) will always be 30 records, you might end up even deleting the latest entries.

To simply awnser you question though, SELECT o.id FROM tbl_lunchorders o where... will fix your problem.

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply