Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
kloa
Feb 14, 2007


My DBA experience since graduating college has been working as a Prod DBA (2 years) and I haven't really gotten into SQL development yet. Is there a recommended version control add-on for MS-SQL? Preferably free as we're mostly SaaS anyways and I don't need anything that robust, just for personal learning.

Adbot
ADBOT LOVES YOU

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

BabyFur Denny posted:

Hi, what tools do you use for mySQL clients?
Most of our work day is spent writing Queries, either live or in stored procedures. We're currently using SQL Explorer, an Eclipse plugin, which works nicely with our shared svn repository of SQL scripts, which we imported as project in Eclipse using SVNKit.
The issue we have is that Eclipse/ SQL Explorer crashes a lot whenever we click the wrong button at the wrong time and it gets really frustrating because usually quite some queries/code are just gone.

I've been looking at other tools, but I haven't found one yet that offers such a nice integration of browsing/committing/updating a svn repository.
At this point we would also easily consider a commercial solution.

Any recommendations here?

http://www.mysql.com/products/workbench/
?

Have no idea about it's SVN integration but it's what I've busted out when I've had to get deep into MySQL.

Sulla Faex
May 14, 2010

No man ever did me so much good, or enemy so much harm, but I repaid him with ENDLESS SHITPOSTING

Anaxandrides posted:

The EXPLAIN explains it all about your other table. Those indexes you're making aren't doing the trick for this thing -- SQL chose to use the PK for mdl_user rather than a specific index that includes those columns, and your other index (newindex1) isn't being used either. 99% bet that it's not being used because the column you're using as a join predicate-cum-WHERE clause isn't included in the index.

Replace CREATE INDEX newindex1 ON mdl_course ( shortname, startdate ); with CREATE INDEX IX_IDStartShortName ON mdl_course ( id, startdate, shortname ); and you should see some improvement. Similarly, CREATE INDEX IX_IDName ON mdl_user ( id, firstname, lastname ); should help. I'd be surprised if it didn't.

I tried this including several variations on FORCE INDEX (..) and sub-selects to pre-order the two sort tables but it didn't help. It's shaved off maybe 1 second on the normal query, but other forms are still around 25 seconds.

The combined index on one table is about 783 characters which is hosed but I tried to create an index using the first ten characters of the long column but that didn't help either, it wouldn't use the index then. I didn't spend too long trying to fix it (i.e. ORDER BY a substring) because after the initial quick sort I would still need to have a more exact sort. I don't know if doing a pre-sort sort (based on the first couple of letters of a column) would speed it up at all, for 650k rows...

Here's the new explain extended:

code:
+----+-------------+-------+--------+-------------------------------------------------------------------+---------------------------+---------+-----------------------------------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                                                     | key                       | key_len | ref                                                       | rows | filtered | Extra                                        |
+----+-------------+-------+--------+-------------------------------------------------------------------+---------------------------+---------+-----------------------------------------------------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | d     | index  | PRIMARY,ep_idshosta_ix                                            | ep_idshosta_ix            | 783     | NULL                                                      |  424 |   100.00 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | b     | ref    | PRIMARY,mdl_enro_cou_ix                                           | mdl_enro_cou_ix           | 8       | MYDATABASE.d.id                                   |    2 |   100.00 | Using index                                  |
|  1 | SIMPLE      | c     | ref    | mdl_userenro_enruse_uix,mdl_userenro_enr_ix,mdl_userenro_use_ix   | mdl_userenro_enruse_uix   | 8       | MYDATABASE.b.id                                   |  225 |   100.00 | Using index                                  |
|  1 | SIMPLE      | cc    | ref    | mdl_courcomp_use_ix,mdl_courcomp_cou_ix,mdl_course_completions_i2 | mdl_course_completions_i2 | 16      | MYDATABASE.b.courseid,MYDATABASE.c.userid |    1 |   100.00 | Using where                                  |
|  1 | SIMPLE      | a     | eq_ref | PRIMARY,ep_idlasfir_ix                                            | PRIMARY                   | 8       | MYDATABASE.cc.userid                              |    1 |   100.00 | Using where                                  |
+----+-------------+-------+--------+-------------------------------------------------------------------+---------------------------+---------+-----------------------------------------------------------+------+----------+----------------------------------------------+
And if I force indexes on the two tables:

(ep_idshosta_ix is id, shortname, startdate)
(ep_idlasfir_ix is id, lastname, firstname)

code:
+----+-------------+-------+-------+-------------------------------------------------------------------+---------------------------+---------+-----------------------------------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | type  | possible_keys                                                     | key                       | key_len | ref                                                       | rows | filtered | Extra                                        |
+----+-------------+-------+-------+-------------------------------------------------------------------+---------------------------+---------+-----------------------------------------------------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | d     | index | ep_idshosta_ix                                                    | ep_idshosta_ix            | 783     | NULL                                                      |  424 |   100.00 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | b     | ref   | PRIMARY,mdl_enro_cou_ix                                           | mdl_enro_cou_ix           | 8       | MYDATABASE.d.id                                   |    2 |   100.00 | Using index                                  |
|  1 | SIMPLE      | c     | ref   | mdl_userenro_enruse_uix,mdl_userenro_enr_ix,mdl_userenro_use_ix   | mdl_userenro_enruse_uix   | 8       | MYDATABASE.b.id                                   |  225 |   100.00 | Using index                                  |
|  1 | SIMPLE      | cc    | ref   | mdl_courcomp_use_ix,mdl_courcomp_cou_ix,mdl_course_completions_i2 | mdl_course_completions_i2 | 16      | MYDATABASE.b.courseid,MYDATABASE.c.userid |    1 |   100.00 | Using where                                  |
|  1 | SIMPLE      | a     | ref   | ep_idlasfir_ix                                                    | ep_idlasfir_ix            | 8       | MYDATABASE.cc.userid                              |    1 |   100.00 | Using where; Using index                     |
+----+-------------+-------+-------+-------------------------------------------------------------------+---------------------------+---------+-----------------------------------------------------------+------+----------+----------------------------------------------+
If I'm not doing anything pretty obviously wrong and the source of the issue is that super long index on one column which can't be overcome easily, I'll just drop it and leave it as-is. It's not a query or a DB structure I've created, I just noticed how broken it is and thought I'd have a try at improving it because at some point they're bound to notice the lag and then blame it on us because we're the last ones to touch it. But if it's not an easy fix it's not on my to-do list so they can sort out a DBA at some point if they want.

e: Just noticed that the last table references a different column than the index I've forced. Yet if I remove the FORCE INDEX for the 'userid' ref table, it uses the PRIMARY key and the query takes 26seconds instead of 20. Ugh

e2: If it helps clarify, mdl_course is about 443 rows, mdl_user is about 41923. If I order by:

ORDER BY LEFT(d.shortname, 4),d.startdate,a.lastname,a.firstname

The time drops down to 13 odd seconds, but if I do two passes: LEFT(d.shortname, 1),d.shortname it goes back up to 20 seconds.

Sulla Faex fucked around with this message at 10:42 on Jul 2, 2014

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Sulla-Marius 88 posted:

code:
SELECT c.id, 
       a.id             AS userid, 
       a.firstname, 
       a.lastname, 
       d.shortname, 
       d.startdate, 
       d.id             AS courseid, 
       cc.timecompleted AS status, 
       cc.timestarted 
FROM   mdl_user a, 
       mdl_enrol b, 
       mdl_user_enrolments c, 
       mdl_course d 
       LEFT JOIN mdl_course_completions cc 
              ON cc.course = d.id 
WHERE  a.id = cc.userid 
       AND a.id = c.userid 
       AND b.id = c.enrolid 
       AND b.courseid = d.id 
ORDER  BY d.shortname, 
          d.startdate, 
          a.lastname, 
          a.firstname 
LIMIT  0, 100; 
Are you left joining course completions (i.e. you want all courses and the users enrolled in them, regardless of whether or not the user completed the course), or are you inner joining it? Putting a.id = cc.userid means this will behave as an inner join despite the left outer join above.

If you want to use a left join, do this:
code:
SELECT c.id, 
       a.id             AS userid, 
       a.firstname, 
       a.lastname, 
       d.shortname, 
       d.startdate, 
       d.id             AS courseid, 
       cc.timecompleted AS status, 
       cc.timestarted 
FROM   mdl_user a, 
       mdl_enrol b, 
       mdl_user_enrolments c, 
       mdl_course d 
       LEFT JOIN mdl_course_completions cc 
              ON cc.course = b.courseid
                 AND cc.userid = c.userid
WHERE  a.id = c.userid 
       AND b.id = c.enrolid 
       AND b.courseid = d.id 
ORDER  BY d.shortname, 
          d.startdate, 
          a.lastname, 
          a.firstname 
LIMIT  0, 100; 

One Swell Foop
Aug 5, 2010

I'm afraid we have no time for codes and manners.

kloa posted:

My DBA experience since graduating college has been working as a Prod DBA (2 years) and I haven't really gotten into SQL development yet. Is there a recommended version control add-on for MS-SQL? Preferably free as we're mostly SaaS anyways and I don't need anything that robust, just for personal learning.

The official answer is that you should use Microsoft Team Foundation Server's database plugin, but that would require a TFS licence. The next option would be to use middleware from Red Gate, which is pricey but less than TFS. The third option is to automate with scripts, which I've never tried but is supposedly possible.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Is there a recommended dummies book for understanding the basics of SQL? I just started a new job in hospital clinical informatics. I doubt I'll ever write to a database but I'm sure it would be handy if I knew how to read and pull reports from them. I'm trying to teach myself VBA and Python to that end.

Sulla Faex
May 14, 2010

No man ever did me so much good, or enemy so much harm, but I repaid him with ENDLESS SHITPOSTING

Jethro posted:

Are you left joining course completions (i.e. you want all courses and the users enrolled in them, regardless of whether or not the user completed the course), or are you inner joining it? Putting a.id = cc.userid means this will behave as an inner join despite the left outer join above.

If you want to use a left join, do this:

That is a fair point, thank you. I think it should be an INNER JOIN but I'm not 100% sure, I'll have to find out at what point it was changed from the original code and check who did it and if possible ask them.

Sulla Faex fucked around with this message at 11:23 on Jul 3, 2014

FieryBalrog
Apr 7, 2010
Grimey Drawer
I prefer this style of syntax so that the type of joins you are doing becomes clearer:
code:
SELECT ue.id            AS user_enrollment_id,  
       u.id             AS userid, 
       u.firstname, 
       u.lastname, 
       c.shortname, 
       c.startdate, 
       c.id             AS courseid, 
       cc.timecompleted AS status, 
       cc.timestarted 
FROM   mdl_user u
JOIN   mdl_user_enrolments ue ON ue.userid = u.id
JOIN   mdl_enrol e ON  e.id = ue.enrolid
JOIN   mdl_course c ON c.id = e.courseid
LEFT JOIN mdl_course_completions cc 
       ON cc.course = c.id 
WHERE  u.id = cc.userid --preserves left join filter behavior
ORDER  BY c.shortname, 
          c.startdate, 
          u.lastname, 
          u.firstname 
LIMIT  0, 100; x 
This way you're not mixing and matching the two different join syntaxes.

I know the 'a,b,c,d' style of aliasing tables exists but I'm not a fan. It leads to more confusion about which table is what. Great example is that Course Completion is aliased to 'cc' but Course itself is aliased to 'd' while User Enrollments is aliased to 'c' instead.

FieryBalrog fucked around with this message at 15:21 on Jul 4, 2014

Sulla Faex
May 14, 2010

No man ever did me so much good, or enemy so much harm, but I repaid him with ENDLESS SHITPOSTING
Yeah the trick is deciding what to do. I'm managing/updating legacy code but what they really want to do is just get it working on a new version of the platform. I wouldn't have touched that query except that it was literally taking twenty full seconds to run. It still is, but it's not part of the work description so I'm leaving it. It's always tough figuring out a balance for time spent vs income vs clarity vs not modifying code that somebody else might be more familiar with. I don't know who'll be maintaining this in future, there's no indication it will be me, so I don't want to touch/clean up too much in case the people who normally maintain it are going to get cranky.

Suffice to say the rest of it is pretty ugly but the thing that concerns me most is the functionality of it. If the job/project description included an unspoken "You will be maintaining this code in future, please do whatever is required to ensure long-term comfort" then I'd start moving stuff around, but as it is I'm going with a light touch. The touch is so bloody light that even the twenty second query is going to be left as-is because nobody else has mentioned it so I'm assuming they'd care more if it broke (or returned different data) than about trying to speed it up.

Just makes me want to work harder on my own projects and business ideas so I don't have to spend the rest of my life figuring out the balance between the right thing and the cost effective thing on code so ugly you wonder how it could ever have passed review.

the
Jul 18, 2004

by Cowcaster
I need to grab a bunch of entries, which method is faster?

a. Grab entire list A, Grab entire list B, filter out on my end

b. Grab entire list A, Grab what I need from list B by doing a query on each entry of list A

I'm trying to run b right now, and it's taking forever. I'm basically doing something like this:

code:
SELECT Info FROM List WHERE Id = [id from row in list A]
Over the 20,000 or so rows in list A

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

the posted:

I need to grab a bunch of entries, which method is faster?

a. Grab entire list A, Grab entire list B, filter out on my end

b. Grab entire list A, Grab what I need from list B by doing a query on each entry of list A

I'm trying to run b right now, and it's taking forever. I'm basically doing something like this:

code:
SELECT Info FROM List WHERE Id = [id from row in list A]
Over the 20,000 or so rows in list A

Smells like a join to me

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Malcolm XML posted:

Smells like a join to me

Listen to this guy right here.

the
Jul 18, 2004

by Cowcaster

Malcolm XML posted:

Smells like a join to me

Go on....

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


the posted:

Go on....

It's tough without knowing a little bit more about the schema but here's a sketch:

code:
	SELECT
		A.Info,
		B.Info
	FROM Table_A AS A
	JOIN Table_B AS B
	  ON B.Id = A.Id
	WHERE
	  -- Filters

the
Jul 18, 2004

by Cowcaster
You can view my history, but I'm doing queries in Salesforce using Python. The SQL queries are basically no different, but they're embedded in Python code.

I'm grabbing one list with the following:

Python code:
opp_query = "SELECT AccountId, Id, StageName FROM Opportunity 
WHERE RecordType.Name = \'FD - Buying\'"
query_opps = svc.query(opp_query)
Then doing some stuff to it, and then doing a second query with:

Python code:
for row in filtered_opps:
	ocr_query = "SELECT ContactId FROM OpportunityContactRole WHERE IsPrimary = TRUE 
AND OpportunityId = " + '\'' + row[1] + '\''
	query = svc.query(ocr_query)
Where you can see here, I'm making about 20,000 individual queries (the size of filtered_opps).

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


the posted:

You can view my history, but I'm doing queries in Salesforce using Python. The SQL queries are basically no different, but they're embedded in Python code.

I'm grabbing one list with the following:

Python code:
opp_query = "SELECT AccountId, Id, StageName FROM Opportunity 
WHERE RecordType.Name = \'FD - Buying\'"
query_opps = svc.query(opp_query)
Then doing some stuff to it, and then doing a second query with:

Python code:
for row in filtered_opps:
	ocr_query = "SELECT ContactId FROM OpportunityContactRole WHERE IsPrimary = TRUE 
AND OpportunityId = " + '\'' + row[1] + '\''
	query = svc.query(ocr_query)
Where you can see here, I'm making about 20,000 individual queries (the size of filtered_opps).

I'd see if this works, because 20k individual queries is going to suck.
Python code:
test_query = "SELECT O.AccountId, O.Id, O.StageName, R.ContactId FROM Opportunity AS O
JOIN OpportunityContactRole AS R
  ON R.OpportunityId = O.Id
WHERE
  RecordType.Name = \'FD - Buying\' -- THIS MAY NEED AN ALIAS LIKE O.RecordType.Name
  AND R.IsPrimary = TRUE"
query = svc.query(test_query)

the
Jul 18, 2004

by Cowcaster
I have this exact line:

opp_query = "SELECT O.AccountId, O.Id, O.StageName, R.ContactId FROM Opportunity AS O JOIN OpportunityContactRole AS R ON R.OpportunityId = O.Id WHERE RecordType.Name = \'FD - Buying\' AND R.IsPrimary = TRUE"

And I'm getting an error:

code:
SoapFaultError: 'MALFORMED_QUERY' 'MALFORMED_QUERY: R.ContactId FROM 
Opportunity AS O JOIN OpportunityContactRole AS ^ ERROR at Row:1:Column:73 
unexpected token: JOIN'
So, it looks like there's an issue with the JOIN command?

edit: If I'm supposed to put it on multiple lines, I'm not sure how I can format that in Python.

the fucked around with this message at 18:59 on Jul 7, 2014

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


the posted:

I have this exact line:

opp_query = "SELECT O.AccountId, O.Id, O.StageName, R.ContactId FROM Opportunity AS O JOIN OpportunityContactRole AS R ON R.OpportunityId = O.Id WHERE RecordType.Name = \'FD - Buying\' AND R.IsPrimary = TRUE"

And I'm getting an error:

code:
SoapFaultError: 'MALFORMED_QUERY' 'MALFORMED_QUERY: R.ContactId FROM 
Opportunity AS O JOIN OpportunityContactRole AS ^ ERROR at Row:1:Column:73 
unexpected token: JOIN'
So, it looks like there's an issue with the JOIN command?

edit: If I'm supposed to put it on multiple lines, I'm not sure how I can format that in Python.

It looks like SOQL doesn't do explicit joins and that makes me a sad.
https://developer.salesforce.com/page/From_SQL_to_SOQL
Apparently, it does some subquery like syntax to make it all work in the example.

the
Jul 18, 2004

by Cowcaster
fletcher would know how to do this.

According to that page, I can link by simply going Blah.blah, but I'm getting errors when doing it.

For example, every Opportunity has an OpportunityContactRole associated with it, but I get errors when trying to reference that:

opp_query = "SELECT AccountId, Id, Stagename, OpportunityContactRole.ContactId FROM Opportunity WHERE RecordType.Name = \'FD - Buying\'"

Error:

SoapFaultError: 'INVALID_FIELD' "INVALID_FIELD: SELECT AccountId, Id, Stagename, OpportunityContactRole.ContactId ^ ERROR at Row:1:Column:34 Didn't understand relationship 'OpportunityContactRole' in field path. If you are attempting to use a custom relationship, be sure to append the '__r' after the custom relationship name. Please reference your WSDL or the describe call for the appropriate names."

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

the posted:

fletcher would know how to do this.

According to that page, I can link by simply going Blah.blah, but I'm getting errors when doing it.

For example, every Opportunity has an OpportunityContactRole associated with it, but I get errors when trying to reference that:

opp_query = "SELECT AccountId, Id, Stagename, OpportunityContactRole.ContactId FROM Opportunity WHERE RecordType.Name = \'FD - Buying\'"

Error:

SoapFaultError: 'INVALID_FIELD' "INVALID_FIELD: SELECT AccountId, Id, Stagename, OpportunityContactRole.ContactId ^ ERROR at Row:1:Column:34 Didn't understand relationship 'OpportunityContactRole' in field path. If you are attempting to use a custom relationship, be sure to append the '__r' after the custom relationship name. Please reference your WSDL or the describe call for the appropriate names."

Alternatively you can just do a hash join client side on the filtered tuples.

Make a dictionary keyed by the Id and just lookup manually

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

the posted:

You can view my history, but I'm doing queries in Salesforce using Python. The SQL queries are basically no different, but they're embedded in Python code.

I'm grabbing one list with the following:

Python code:
opp_query = "SELECT AccountId, Id, StageName FROM Opportunity 
WHERE RecordType.Name = \'FD - Buying\'"
query_opps = svc.query(opp_query)
Then doing some stuff to it, and then doing a second query with:

Python code:
for row in filtered_opps:
	ocr_query = "SELECT ContactId FROM OpportunityContactRole WHERE IsPrimary = TRUE 
AND OpportunityId = " + '\'' + row[1] + '\''
	query = svc.query(ocr_query)
Where you can see here, I'm making about 20,000 individual queries (the size of filtered_opps).

Use a subquery (assuming you can perfrom 'some stuff' where additional SOQL)

code:
ocr_query = "SELECT ContactId FROM OpportunityContactRole WHERE IsPrimary = TRUE and OpportunityId IN (SELECT Id FROM Opportunity WHERE RecordType.Name = 'FD - Buying'"
Or just build a list of OpportunityIds and query for them all at once like this (note that you will need to quote & comma your opportunity_ids here, I've omitted that part). Also make sure you split this up into multiple queries if necessary due to restrictions on how long your SOQL statement can be (20k characters I think?)

Python code:
opportunity_ids = []
# do stuff to retrieve opportunity_ids
ocr_query = "SELECT ContactId FROM OpportunityContactRole WHERE IsPrimary = TRUE and OpportunityId IN (" + opportunity_ids +")"

the posted:

fletcher would know how to do this.

According to that page, I can link by simply going Blah.blah, but I'm getting errors when doing it.

For example, every Opportunity has an OpportunityContactRole associated with it, but I get errors when trying to reference that:

opp_query = "SELECT AccountId, Id, Stagename, OpportunityContactRole.ContactId FROM Opportunity WHERE RecordType.Name = \'FD - Buying\'"

Error:

SoapFaultError: 'INVALID_FIELD' "INVALID_FIELD: SELECT AccountId, Id, Stagename, OpportunityContactRole.ContactId ^ ERROR at Row:1:Column:34 Didn't understand relationship 'OpportunityContactRole' in field path. If you are attempting to use a custom relationship, be sure to append the '__r' after the custom relationship name. Please reference your WSDL or the describe call for the appropriate names."

Maybe flip it around and query for Opportunity fields through OpportunityContactRole:

Python code:
opp_query = "SELECT Opportunity.AccountId, Opportunity.Id, Opportunity.StageName, ContactId FROM OpportunityContactRole WHERE Opportunity.RecordType.Name = 'FD - Buying'"
(I think that should work, I don't have RecordTypes on my Opportunity object to try it)

PapFinn
Jul 15, 2003

I am Ron Pearlman's illigitimate love child.
I'm pretty much a beginner with SQL and will need to be pulling some extracts from our HR system for yearly audits soon. Last year, I took full extracts and did what I needed to do in Excel. I'd like to build this all out in SQL Server this year so they can be run on demand.

I'm looking less for the exact syntax as an answer than I am looking for the concepts that I either don't know or I'm not thinking to utilize for this issue.

I need to select all the terminated records for the calender year, but exclude people who have at least one active job code. Some people have multiple jobs, usually with matching employee id's but sometimes not. I was thinking about attempting to join the table to itself and counting the instances of (first,last,last4SS#) but I'm just not sure how to do that and check for an active record within the query. Maybe using if statements would work, but I've really just begun the process. I'm still learning and would rather learn the correct way to do this than come to some convoluted solution.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

PapFinn posted:

I'm pretty much a beginner with SQL...

I need to select all the terminated records for the calender year, but exclude people who have at least one active job code. Some people have multiple jobs, usually with matching employee id's but sometimes not.
Without further information, it sounds like the use of an SQL EXCEPT clause. E.G., "select ssn from terminated except select ssn from active". I do hope that your SSNs are unique.

PapFinn
Jul 15, 2003

I am Ron Pearlman's illigitimate love child.
I haven't used Except, so I'll start reading up on it now. Thanks.

Out of curiosity, what are you getting at with the comment about SSN? There are certainly duplicate last 4 of SSN, but I don't think I've run into a duplicate "First,Last,Last4SSN". I'm only working with ~80,000 records though. Is that bad practice?

Edit: Maybe a better explanation of what I'm doing. With the data below, I want to return only User Two. User One does have a terminated record, but the second record for User One would not qualify him as a terminated user.

code:
EmpID  UserFirst  UserLast SSN   Job# Status      TermDate
12345  User       One      1234  0    Terminated  2014-01-05
12345  User       One      1234  1    Active      NULL
12346  User       Two      4321  0    Terminated  2014-02-05

PapFinn fucked around with this message at 15:28 on Jul 11, 2014

One Swell Foop
Aug 5, 2010

I'm afraid we have no time for codes and manners.
A key point of SQL records is that there should be a completely reliable way of tying an individual record to the object it represents (in this case, an employee). Your 'first,last,last4ssn' is only good enough up until the point when it isn't good enough. In this case, that's not your fault, but it could turn out to be your problem in the future.

That said,

code:
SELECT * FROM Employees E1 WHERE E1.Status = 'Terminated' AND NOT EXISTS
  (SELECT * FROM Employees E2 
    WHERE E1.First = E2.First
	AND E1.Last = E2.Last
	AND E1.Last4SSN = E2.Last4SSN
	AND E2.Status = 'Active')

One Swell Foop fucked around with this message at 15:52 on Jul 11, 2014

Workaday Wizard
Oct 23, 2009

by Pragmatica
Hello SQL thread, I have a PostgreSQL question:
In my project I am tracking the changes of a property x on a number of different targets.
I have a table called 'property_x' like this:
pre:
id: serial primary key
target: char[8]
value: double
update_time: timestamp
I want to get all the maximas for each target. Like so:


I don't care about the cases where the maxima is a plateau (just return any point on the plateau).

Now doing this in my application is easy enough (for each point assume it's a maxima unless we have a next point and that point is bigger), but this takes ages since the whole table need to be transmitted from the db to the application.

What's the proper way to do this in PgSQL?

Sedro
Dec 31, 2008
Something like this
SQL code:
select id, target, value, update_time
from (
    select *, lead(value) over(partition by target order by update_time) as next
    from property_x
) t
where value > next
(this will require small modification to work in the case where your local maxima is the last update_time)

Edit: attempted fix
SQL code:
select id, target, value, update_time
from (
    select *,
           lead(value) over(partition by target order by update_time) as next,
           lag(value)  over(partition by target order by update_time) as prev
    from property_x
) t
where value > next or (next is null and value >= prev)

Sedro fucked around with this message at 23:48 on Jul 11, 2014

Workaday Wizard
Oct 23, 2009

by Pragmatica

Thank you very much :tipshat:

Fluue
Jan 2, 2008
Need some quick logic help on this SQL query I'm refactoring.

I have a query that searches the database for several parameters, but there are 2 logic checks (WHERE) that I can't seem to write concisely.

My query:

code:
SELECT *
FROM :catalog
WHERE legend !=1  ## trouble spot 1
AND final_ed !=1  ## trouble spot 2
AND (
     id IN (
            SELECT id 
            FROM :catalog 
            WHERE type1 = :pokeType1 
            OR type2 = :pokeType1) 
     OR id IN (
               SELECT id 
               FROM :catalog
               WHERE type1 = :pokeType2 
               OR type2 = :pokeType2)
     )
ORDER BY rand() LIMIT :setLimit
My DB is currently in need of some fixing, but I need to do a bandaid fix for now while I rebuild it. That said, the columns for legend and final_ed are setup like this:

Legend: rows marked as legend have the value 1 in the Legend column, everything else has 0 in the legend column
Final_ed: Everything not considered "final_ed" is 1, rows that are "final_ed" have value of 0 in final_ed column

How can I modify my query so I can get results in these kind of scenarios:
1. Get me all results considered "legend" and "final_ed"
2. Get me all considered "not legend" and not "final_ed"
3. Get me all considered "not legend" and ARE "final_ed"

Using (stupidly) less-than-equal-to operands does not give me the result I want.

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!
code:
WHERE legend =1 AND final_ed =0

WHERE legend <> 1 AND final_ed <> 0

WHERE legend <> 1 AND final_ed = 0
?

Fluue
Jan 2, 2008

Jethro posted:

code:
WHERE legend =1 AND final_ed =0

WHERE legend <> 1 AND final_ed <> 0

WHERE legend <> 1 AND final_ed = 0
?

So there's no way to do this query without making 3 separate ones?

BabyFur Denny
Mar 18, 2003
where legend*final_ed = 0

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Fluue posted:

So there's no way to do this query without making 3 separate ones?
I think I mostly just didn't understand what you were asking.
If you want all three scenarios at once, then you'd do:
code:
WHERE ((legend =1 AND final_ed =0)
      OR (legend <> 1 AND final_ed <> 0)
      OR (legend <> 1 AND final_ed = 0))
though the three conditions you want could also be written as:
code:
WHERE (legend <> 1 OR final_ed =0)

Fluue
Jan 2, 2008

Jethro posted:

I think I mostly just didn't understand what you were asking.
If you want all three scenarios at once, then you'd do:
code:
WHERE ((legend =1 AND final_ed =0)
      OR (legend <> 1 AND final_ed <> 0)
      OR (legend <> 1 AND final_ed = 0))
though the three conditions you want could also be written as:
code:
WHERE (legend <> 1 OR final_ed =0)

Sorry for not being clear. I was a little vague on this, but the queries that do not say that final_ed must be '0' should also include rows that have final_ed as '1' (By this I mean, if we don't want only rows with the 0 attribute in final_ed, then we want both 0 and 1)

Since I can't do a wildcard on this (without using a LIKE, which leads to even bigger issues when dealing with INT) am I left dynamically building this query in my script before sending it to the DB?

e: it'd probably be easier to split legend off into another table and just do a join, right?

Fluue fucked around with this message at 01:13 on Jul 15, 2014

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Fluue posted:

Sorry for not being clear. I was a little vague on this, but the queries that do not say that final_ed must be '0' should also include rows that have final_ed as '1' (By this I mean, if we don't want only rows with the 0 attribute in final_ed, then we want both 0 and 1)

Since I can't do a wildcard on this (without using a LIKE, which leads to even bigger issues when dealing with INT) am I left dynamically building this query in my script before sending it to the DB?

e: it'd probably be easier to split legend off into another table and just do a join, right?

I think I know what you're after, but can't type a response on my phone. What SQL engine are you using?

Roundboy
Oct 21, 2008
Is an 'or' 100% avoidable in a where clause? Should it be used or avoided?

I have a criteria to pull records where:
- they are in state x,y,z
- they are active
- they are inactive and updated within 2 years ago

my first instinct is to union them:

code:
select ID from table1
where state in('x','y','z')
and status = 'A'
union
select ID from table1
where state in('x','y','z')
and status = 'I'
and UPDATEDATE >= DATEADD(month, -24, GETUTCDATE())
This seems .. verbose, but the alternative is :

code:
where state in ('x','y','z')
and (status = 'A' or (status = 'I' and UPDATEDATE >= DATEADD(month, -24, GETUTCDATE())))
sqlserver 2008 if it matters. The table in question holds millions of records, and i expect to get a decent chunk

Roundboy fucked around with this message at 02:04 on Jul 15, 2014

One Swell Foop
Aug 5, 2010

I'm afraid we have no time for codes and manners.
An 'or' is fine and is perfectly at home in a where clause. Generally, don't try and second-guess the query optimizers in a modern database system. Your second example is optimal.

Is this query running on a remote server or locally? You may find that data volume and transfer rates are a bigger bottleneck than the server performance; however if you do find that you need the server to perform better then you should talk to the DBA about your query, if one is available.

One Swell Foop fucked around with this message at 03:35 on Jul 15, 2014

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
So the trouble here is that you need to start by writing concisely what you want in non-contradictory, but clear English. If you have to use examples to explain your meaning, then perhaps you don't understand what it is you want, though there are cases where examples help clarify meaning. If you can describe what it is you want, then it may be possible to derive a query from it.

For starters, you're assuming that we know the meaning of your columns, and that's neither correct nor essential. A statement such as this, therefore, is rather non-sensical and confusing: "[T]he queries that do not say that final_ed must be '0' (By this I mean, if we don't want only rows with the 0 attribute in final_ed, then we want both 0 and 1)"... I'm still trying to understand what that even means. In particular, a WHERE clause adds a restriction to a query; it is not generally its purpose to remove a restriction. If you mean that you want all rows, independent of the value of final_ed, then it should not be restricted in your WHERE clause. If you mean, "only when final_ed is 0 or 1", then you want "WHERE (final_ed=0 OR final_ed=1)", which would seem strange since those are apparently the only two values anyway.

Please review what you actually need. Dispense with all statements of "rows that 'ARE' property-X" and please describe them specifically: final_ed=0, final_ed!=0, final_ed="watermelons", legend=7, "legend is an even number", and so forth. I think if you do this, you will find that the query starts to appear automatically.

Edit: It's doesn't have to be English, but you'll likely keep getting answers that are 'wrong' until such time as you can explain what it is you even want. What is the expected behavior? Include some sample rows and which would be returned (and not returned) by your query.

PhantomOfTheCopier fucked around with this message at 03:56 on Jul 15, 2014

Withnail
Feb 11, 2004
Can someone help explain why this query might run like poo poo but changing to select * runs fast


edit. missing index was the culprit

Withnail fucked around with this message at 21:03 on Jul 15, 2014

Adbot
ADBOT LOVES YOU

Essential
Aug 14, 2003
How important is it to have a CreatedDate & ModifiedDate columns on tables? I've got a guy here telling me that EVERY table, without exception should have those 2 columns. We're adding a couple tables and right now I don't care about CreatedDate/ModifiedDate on a few of the tables and I can't ever see it being needed on those tables. His argument is that what IF at some point in the future I want it and even if I don't, it isn't going to affect anything negatively.

I can see his point but I'd rather not add columns if I don't truly need them.

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply