Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

TheFluff posted:

[snip]
e: I guess in many cases the id in(subquery) ends up with the same query plan as where exists huh, so it doesn't really matter does it
id in (select id from subquery) probably matches the intent of the query writer (see if this id from the main query is in the list of ids returned by the subquery) more often than exists (select 0 from subquery s where s.id = m.id) (see if the subquery returns any results, which we don't actually care about so we put garbage in the select, when we plug the id from the main query into the subquery). RDBMS creators figured out a while ago that the former often makes more intuitive sense than the latter, so they made the plans identical so, at least in this case, you don't have to pick performance over clarity.

That said, I use exists pretty much 100% of the time. Mostly because I got most of my SQL training from someone who did most of their learning when that could make a huge difference.

Adbot
ADBOT LOVES YOU

Cold on a Cob
Feb 6, 2006

i've seen so much, i'm going blind
and i'm brain dead virtually

College Slice
EXISTS is safer than IN as well. Try writing a NOT IN statement and throw in a NULL....

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
NOT IN(subquery) is also far harder for the query planner to optimize than NOT EXISTS IIRC, although that might be Postgres-specific? Don't remember.

e: right, at least it used to be that if you selected a nullable column in the subquery the query planner would go bananas. Not sure if that's the case anymore, but as mentioned you gotta make sure you don't get nulls in a NOT IN anyway.

TheFluff fucked around with this message at 21:51 on Oct 1, 2019

abelwingnut
Dec 23, 2002


TheFluff posted:

NOT IN(subquery) is also far harder for the query planner to optimize than NOT EXISTS IIRC, although that might be Postgres-specific? Don't remember.

e: right, at least it used to be that if you selected a nullable column in the subquery the query planner would go bananas. Not sure if that's the case anymore, but as mentioned you gotta make sure you don't get nulls in a NOT IN anyway.

this is true in ms-sql as well.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
That was a really old version of PostgreSQL, probably 7.4. I had queries hanging on a server because of that and went thru a complete analysis and rewrite. We're talking ten orders of magnitude difference between the "obvious" solution and the eventual gyration I used to permit "not x" in a user interface.

I've analyzed since then and things went from 8hr runtime to 8ms, so yeah I think they fixed it. :)

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
Can someone help me untangle this WHERE clause?

I have daily batches of data loading into a database and in this process a new batch is loaded into a current table T1 and the current data gets moved to a previous table T2. I then do a difference between the two tables to determine activity. An issue I have is that sometimes the data are days or weeks apart and a the difference becomes artificially inflated due to the extended period between the two sequential data samples, so I'm trying to solve that by checking for the delta between two time stamps: what I'd like to happen is that if the elapsed time between two sequential files is greater than 45 hours, return zero for ppd and wupd.

code:
select t1.stamp as current, t2.stamp as previous, t1.username, t1.team, coalesce(t1.score-t2.score,0) as ppd, coalesce(t1.wu-t2.wu,0) as wupd
from public.user_temp_summary_current t1
left join public.user_temp_summary_previous t2 on t1.username=t2.username and t1.team=t2.team
where t2.stamp > (select distinct stamp - interval '45 hours' from public.user_temp_summary_current) 
The problem is that my check returns nothing for the entire record if the elapsed time is greater than 45 hours, when I want it to return
[username, team, stamp, 0, 0]

Clearly there is something wrong with my logic in the WHERE clause, but I can't figure out how to rephrase it. I'd thought about adding CASE statements for the calculation of ppd and wupd, but that seems kludgy and inefficient.

Vegetable
Oct 22, 2010

Quick question about combining left and right joins. Let’s say I write this:

SELECT something
FROM a
LEFT JOIN b ON a.id = b.id
RIGHT JOIN c ON b.id = c.id

Am I right to say a and b will be joined first, then the results of that gets joined with c?

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Vegetable posted:

Quick question about combining left and right joins. Let’s say I write this:

SELECT something
FROM a
LEFT JOIN b ON a.id = b.id
RIGHT JOIN c ON b.id = c.id

Am I right to say a and b will be joined first, then the results of that gets joined with c?

I don't know off the top of my head. RIGHT JOINs aren't very commonly used. Is this an academic question or do you have an actual problem you want to solve (in which case you'd be better off describing what the actual problem is)

My inclination was that it should work the way you described, but I tried it out: http://sqlfiddle.com/#!17/b2041f/4 and that doesn't appear to be correct, nor can I see that it follows any rule based on simple left- or right-associativity. But I haven't spent time trying to figure it out.

Vegetable
Oct 22, 2010

Basically, I want all rows from c, and I want to attach to them (some set of columns that combines a and b).

The logical thing is to start with c and left join to a subquery of a & b, but I figured if I can write the whole thing without a subquery it’s neater.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
I think it gets transformed into (something equivalent to) the following query involving only LEFT JOINs.

code:
SELECT
    subq.idA AS Id
FROM
    C
    LEFT JOIN (
        SELECT
            A.Id AS idA,
            B.Id AS idB
        FROM
            A
            LEFT JOIN B ON A.Id = B.Id
    ) AS subq ON C.Id = subq.idB;

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Agrikk posted:

Can someone help me untangle this WHERE clause?

I have daily batches of data loading into a database and in this process a new batch is loaded into a current table T1 and the current data gets moved to a previous table T2. I then do a difference between the two tables to determine activity. An issue I have is that sometimes the data are days or weeks apart and a the difference becomes artificially inflated due to the extended period between the two sequential data samples, so I'm trying to solve that by checking for the delta between two time stamps: what I'd like to happen is that if the elapsed time between two sequential files is greater than 45 hours, return zero for ppd and wupd.

code:
select t1.stamp as current, t2.stamp as previous, t1.username, t1.team, coalesce(t1.score-t2.score,0) as ppd, coalesce(t1.wu-t2.wu,0) as wupd
from public.user_temp_summary_current t1
left join public.user_temp_summary_previous t2 on t1.username=t2.username and t1.team=t2.team
where t2.stamp > (select distinct stamp - interval '45 hours' from public.user_temp_summary_current) 
The problem is that my check returns nothing for the entire record if the elapsed time is greater than 45 hours, when I want it to return
[username, team, stamp, 0, 0]

Clearly there is something wrong with my logic in the WHERE clause, but I can't figure out how to rephrase it. I'd thought about adding CASE statements for the calculation of ppd and wupd, but that seems kludgy and inefficient.

code:
select
    t1.stamp as current,
    t2.stamp as previous,
    t1.username,
    t1.team,
    coalesce(t1.score - t2.score, 0) as ppd,
    coalesce(t1.wu - t2.wu, 0) as wupd
from
    public.user_temp_summary_current t1
    left join public.user_temp_summary_previous t2
        on t1.username = t2.username and t1.team = t2.team
where
    t2.stamp > (select distinct stamp - interval '45 hours' from public.user_temp_summary_current) 
The purpose of the WHERE clause is to filter out rows. The purpose of an individual expression in the SELECT expression list is to specify a column that you want to appear in the result set. If you want to express the notion that such-and-such a column in the result-set should be conditionally this or conditionally that, then that needs to be in the expression in the SELECT expression list.

Alternatively you could come up with a round-the-houses way of avoiding repeating that logic by using a subquery... but that would be weird and less readable IMO.

Hammerite fucked around with this message at 10:12 on Oct 2, 2019

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!
Yeah, don't complain about your query not returning rows with an elapsed time >= 45 hours when you explicitly write the query to only include rows where the time is < 45 hours. Also, as written, the query will also exclude rows where there is no previous data, defeating the left outer join you presumably used for a reason. Also, why are you using a subquery in the where condition? Either the time stamps are all the same, in which case just use t2.stamp > (t1.stamp - interval '45 hours'), or they're not all the same and the subquery will return multiple values which your DB will (should?) complain about.

If the previous results are too old, can the previous column be empty (as if there were no previous results)? If so, just put the condition in the join.
SQL code:
select
    t1.stamp as current,
    t2.stamp as previous,
    t1.username,
    t1.team,
    coalesce(t1.score - t2.score, 0) as ppd,
    coalesce(t1.wu - t2.wu, 0) as wupd
from
    public.user_temp_summary_current t1
    left join public.user_temp_summary_previous t2
        on t1.username = t2.username and t1.team = t2.team
           and t2.stamp > (t1.stamp - interval '45 hours')
Otherwise, yeah, you'll have to do the timestamp comparison in a case statement in the ppd and wupd columns and you'll have to "duplicate" your logic. But that makes sense because you just want the logic in those two columns. You don't want to affect the whole query (otherwise you would have put the condition in the join).

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Vegetable posted:

Quick question about combining left and right joins. Let’s say I write this:

SELECT something
FROM a
LEFT JOIN b ON a.id = b.id
RIGHT JOIN c ON b.id = c.id

Am I right to say a and b will be joined first, then the results of that gets joined with c?
This depends on the database. PostgreSQL says:

https://www.postgresql.org/docs/9.0/sql-select.html posted:

A JOIN clause combines two FROM items. Use parentheses if necessary to determine the order of nesting. In the absence of parentheses, JOINs nest left-to-right. 

My question would be, If the subselect is easier to read and understand, why don't you trust your query planner and use it? Premature optimization etc.

Cuntpunch
Oct 3, 2003

A monkey in a long line of kings
I've been out of the database loop for a while, but have I missed a memory or is it right to be a little concerned that brand spanking new SQL stuff at work is relying very heavily on CURSOR?

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Cuntpunch posted:

I've been out of the database loop for a while, but have I missed a memory or is it right to be a little concerned that brand spanking new SQL stuff at work is relying very heavily on CURSOR?

I'd say you'd be right to be concerned.

Just-In-Timeberlake
Aug 18, 2003

Nth Doctor posted:

I'd say you'd be right to be concerned.

Maybe? The alternatives don't seem to have gotten much better since last I checked.

tyool 2019 and still no "for each row in query" loop method in TSQL, smdh.

Tax Oddity
Apr 8, 2007

The love cannon has a very short range, about 2 feet, so you're inevitably out of range. I have to close the distance.

Just-In-Timeberlake posted:

Maybe? The alternatives don't seem to have gotten much better since last I checked.

tyool 2019 and still no "for each row in query" loop method in TSQL, smdh.

The problem isn't cursors, specifically. The problem is looping in SQL. Don't do it.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Tax Oddity posted:

The problem isn't cursors, specifically. The problem is looping in SQL. Don't do it.

:hai:

I've only had to do a few loops in SQL and in those cases, I would always generate a temp table with the data to process and then run the cursor on the pre-gathered data. Typically for sproc executions per row.

Cursors can have locks maintained on the underlying data for the entire duration of the cursor.

Just-In-Timeberlake
Aug 18, 2003

Nth Doctor posted:

:hai:

I've only had to do a few loops in SQL and in those cases, I would always generate a temp table with the data to process and then run the cursor on the pre-gathered data. Typically for sproc executions per row.


Yeah, that's my usual method, but it's still poo poo.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

Jethro posted:

Yeah, don't complain about your query not returning rows with an elapsed time >= 45 hours when you explicitly write the query to only include rows where the time is < 45 hours. Also, as written, the query will also exclude rows where there is no previous data, defeating the left outer join you presumably used for a reason. Also, why are you using a subquery in the where condition? Either the time stamps are all the same, in which case just use t2.stamp > (t1.stamp - interval '45 hours'), or they're not all the same and the subquery will return multiple values which your DB will (should?) complain about.


Otherwise, yeah, you'll have to do the timestamp comparison in a case statement in the ppd and wupd columns and you'll have to "duplicate" your logic. But that makes sense because you just want the logic in those two columns. You don't want to affect the whole query (otherwise you would have put the condition in the join).

This makes perfect sense, and yeah my thinking was flawed. I ended up going with the two cases statements (one for each column) and chucking the subquery. The load time and subsequent processing increased by about 60 seconds after loading 100 files (of 2300) and hopefully it won't creep up incrementally much more than that over the whole data set, but the data is "right" and that's the important thing.

Thanks for your help!

Mycroft Holmes
Mar 26, 2010

by Azathoth
n/m

Mycroft Holmes fucked around with this message at 22:09 on Oct 7, 2019

Mycroft Holmes
Mar 26, 2010

by Azathoth
new question. I've got three problems i can't get to work.
3. Write a query to display empno column and ename column data from emp table. Display all employees who work in the same deptno with any employee whose ename contains at least one T.
5. Create a query to display empno column, ename column, sal column, and the dept’s average sal data from emp table. Display all employees who have earned more than the same deptno’s average sal.
8. Create a query to display empno column, ename column, and sal column data from emp table. Display all employees whose ename contain at least one T and have earned less than the average sal of the same job.

I can't seem to get the first problem to work
Select empno, ename
from emp
where deptno = (select deptno
from emp
where ename = '%T%');

this doesn't work.

I can't seem to figure out how to do the dept avg salary instead of the avg of all salaries.

Can't do 8 without 3 and 5

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Mycroft Holmes posted:

new question. I've got three problems i can't get to work.
3. Write a query to display empno column and ename column data from emp table. Display all employees who work in the same deptno with any employee whose ename contains at least one T.
5. Create a query to display empno column, ename column, sal column, and the dept’s average sal data from emp table. Display all employees who have earned more than the same deptno’s average sal.
8. Create a query to display empno column, ename column, and sal column data from emp table. Display all employees whose ename contain at least one T and have earned less than the average sal of the same job.

I can't seem to get the first problem to work
Select empno, ename
from emp
where deptno = (select deptno
from emp
where ename = '%T%');

this doesn't work.

I can't seem to figure out how to do the dept avg salary instead of the avg of all salaries.

Can't do 8 without 3 and 5

These look like homework problems.
If they are, have you covered JOINs yet? Because you need to JOIN emp back to itself on deptno so you can get both the T named employees, along with their departmental Co-workers in separate columns in the same data set.

Mycroft Holmes
Mar 26, 2010

by Azathoth

Nth Doctor posted:

These look like homework problems.
If they are, have you covered JOINs yet? Because you need to JOIN emp back to itself on deptno so you can get both the T named employees, along with their departmental Co-workers in separate columns in the same data set.

can't do joins, that was last lesson, we have to do subquiries.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
Your example is almost right. Change where deptno = (select ...) to where deptno in(select ...) and you should be in business. A select statement doesn't return a single value so you can't use the equality operator.

There was also some discussion the other week further up the page about WHERE IN() vs WHERE EXISTS() if you're curious.

redleader
Aug 18, 2005

Engage according to operational parameters
cursed_schema.sql

nielsm
Jun 1, 2009



Mycroft Holmes posted:

new question. I've got three problems i can't get to work.
3. Write a query to display empno column and ename column data from emp table. Display all employees who work in the same deptno with any employee whose ename contains at least one T.
5. Create a query to display empno column, ename column, sal column, and the dept’s average sal data from emp table. Display all employees who have earned more than the same deptno’s average sal.
8. Create a query to display empno column, ename column, and sal column data from emp table. Display all employees whose ename contain at least one T and have earned less than the average sal of the same job.

I can't seem to get the first problem to work
Select empno, ename
from emp
where deptno = (select deptno
from emp
where ename = '%T%');

this doesn't work.

I can't seem to figure out how to do the dept avg salary instead of the avg of all salaries.

Can't do 8 without 3 and 5

Keep in mind you need to use the LIKE operator to do wildcard matches, the = operator will do straight equals comparison.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Like the others mentioned you have two issues:

1. Use like instead of = for wildcard comparisons (empname like ‘%T%’).
2. Use in instead of = for multi-list filtering.

For #2 specifically you should also learn to filter via inner joined subquery (select... group by...) or (select distinct...) instead of in (select...). That will come in handy for some of your other problems.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!



Why would you inflict this on us?

NinpoEspiritoSanto
Oct 22, 2013




Nth Doctor posted:

Why would you inflict this on us?

Why would someone even admit to doing such a thing let alone blog about it...

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Bundy posted:

Why would someone even admit to doing such a thing let alone blog about it...

I mean, lots of people fail to understand normalization and how to apply it in an RDBMS. That isn't newsworthy.

I once got in a protracted discussion with a friend of a friend who was convinced that the way to maintain last read links in forums is to have a comma delimited list of post ids in the user table.

Even after I drew the diagrams out, and pointed out the maximum lengths that could fit in a column.

:downs:: well I guess the forum should stop keeping track at that point

nielsm
Jun 1, 2009



Nth Doctor posted:

I mean, lots of people fail to understand normalization and how to apply it in an RDBMS. That isn't newsworthy.

I once got in a protracted discussion with a friend of a friend who was convinced that the way to maintain last read links in forums is to have a comma delimited list of post ids in the user table.

Even after I drew the diagrams out, and pointed out the maximum lengths that could fit in a column.

:downs:: well I guess the forum should stop keeping track at that point

Some people don't understand that computers are actually very good at keeping track of thousands or millions of very similar-looking data rows. They can't imagine themselves doing it/doing it in an Excel sheet, so it can't possibly be a good idea to do it in an RDBMS either.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
They say that one of their goals is "avoid the need for joins as much as possible". Why do you suppose they have that as a goal? Given that they are working from seemingly bizarre premises, it is not that surprising that they have created something bizarre.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Hammerite posted:

They say that one of their goals is "avoid the need for joins as much as possible". Why do you suppose they have that as a goal? Given that they are working from seemingly bizarre premises, it is not that surprising that they have created something bizarre.

quote:

Avoid the need for joins as much as pos-
:commissar:

Just-In-Timeberlake
Aug 18, 2003

yeah, when it got to the part where they were rolling their own PKs I had to make sure I wasn't reading the Onion

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Okay I skimmed it but had to go back and read it.

quote:

custom JSON.NET Contract Resolver to ensure that properties that are already saved in other columns of the table aren't duplicated in the JSON
Way to rebuild a... database engine.

quote:

For a many-to-many, we do something a bit naughty: we store the values with | characters between each value:
Millennials? Sorry, I meant: Didn't have an intro database course in college?

quote:

Each time the row is written, we split the values, and use the MERGEstatement to keep a join table up to date: (trigger is delete then insert cross apply)
Durr you delete hence blow the eventid, opening yourself up to race conditions. And cross apply? Really? That should tell you something is wrong.

quote:

:eng99:
Generating primary keys in chunks manually.
no eventual consistency to worry about ... Instead of a unit of work, we simply abstract a SQL transaction/connection.
Fast queries ... (the whole doc)
Class constructors can set reasonable default values.
:suicide:

Why didn't they just use Access? Or Excel?

quote:

SQL Server 2016 is introducing support for JSON,... it'll take a while before we can rely on these features.
The gently caress?


Well thanks for sharing an article demonstrating how not to do many things. I'm gonna need a drink now.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


I realize this is an old post of theirs, but I'm most angry that I've used Octopus Deploy, though we're in the middle of phasing it out. I should share this around the office because every single one of our devs knows better.

redleader
Aug 18, 2005

Engage according to operational parameters

Hammerite posted:

They say that one of their goals is "avoid the need for joins as much as possible". Why do you suppose they have that as a goal? Given that they are working from seemingly bizarre premises, it is not that surprising that they have created something bizarre.

a lot of devs are really, really scared of joins for some reason. they often say "performance!" but i think they're really just nervous and intimidated by sql

Adbot
ADBOT LOVES YOU

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


redleader posted:

a lot of devs are really, really scared of joins for some reason. they often say "performance!" but i think they're really just nervous and intimidated by sql

It's this and never making indices

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply