Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


anthonypants posted:

Dunno what flavor of SQL you're in but you're probably looking at a Common Table Expression here. Something like this:
code:
WITH 1st_ekg ( ekg_pt, ekg_time )
AS
(
SELECT patient, lab_type, lab_time
FROM table
WHERE lab_type = 'ekg'
ORDER BY lab_time DESC
)

WITH 1st_mri ( mri_pt, mri_time )
AS
(
SELECT patient, lab_type, lab_time
FROM table
WHERE lab_type = 'mri'
ORDER BY lab_time DESC
)

SELECT table.patient, table.admission_time, 1st_ekg.ekg_time AS 1st_ekg_time, 1st_mri.mri_time AS 1st_mri_time
FROM table
INNER JOIN 1st_mri
ON table.patient = 1st_mri.mri_pt
INNER JOIN 1st_ekg
ON table.patient = 1st_ekg.ekg_pt

This won't quite do it; it still returns every lab row. You'd want something like:
code:
-- one WITH, CTEs chained with commas; identifiers can't start with a digit
WITH first_ekg ( ekg_pt, ekg_time )
AS
(
SELECT patient, MIN(lab_time)
FROM table
WHERE lab_type = 'ekg'
GROUP BY patient
),
first_mri ( mri_pt, mri_time )
AS
(
SELECT patient, MIN(lab_time)
FROM table
WHERE lab_type = 'mri'
GROUP BY patient
)
SELECT table.patient, table.admission_time, first_ekg.ekg_time AS first_ekg_time, first_mri.mri_time AS first_mri_time
FROM table
INNER JOIN first_mri
ON table.patient = first_mri.mri_pt
INNER JOIN first_ekg
ON table.patient = first_ekg.ekg_pt
Although I'm more used to Oracle's subquery factoring (WITH clause) formatting, so I'm not certain all the naming stuff is correct for SQL Server; it looks right following that doc.


Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


This should do it, aggregate your two subtotals separately and then join them on afterwards.

code:
SELECT
  PlayerName,
  Kills,
  Deaths
FROM
  Quake2Devastation.PlayerInfo LEFT JOIN
  (
    SELECT
      KillerName,
      count(*) as Kills
    FROM
      KillInfo
    GROUP BY
      KillerName
  ) KillCounts ON  PlayerName = KillerName LEFT JOIN
  (
    SELECT
      VictimName,
      count(*) as Deaths
    FROM
      KillInfo
    GROUP BY
      VictimName
  ) DeathCounts ON PlayerName = VictimName
If you still want it ordered you can put an ORDER BY on the end of the whole thing; don't order the subqueries.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


The reason your other way doesn't work is hard to show because the data explodes out, but having tried it on the actually super useful Oracle Live SQL (https://livesql.oracle.com): with 5 players who each kill each other once, you end up with 150 rows when joining for kills and deaths at the same time. For each kill row you get every possible death row, because they're independent rows being joined in.

Link to worksheet so you can play with it:
https://livesql.oracle.com/apex/livesql/s/guwqntt2shpfia6s7u1fbb5d3

Oracle's accounts are free for this.

Edit: They're expensive for everything though.

Also, on the topic of your query: if you have it, COALESCE or NVL saves you needing the players table, because you can join the two subqueries directly like so:
code:
SELECT
  NVL(KillerName, VictimName) PlayerName,
  NVL(Kills, 0) Kills,
  NVL(Deaths, 0) Deaths
FROM
  (
    SELECT
      KillerName,
      count(*) as Kills
    FROM
      KillInfo
    GROUP BY
      KillerName
  ) KillCounts FULL OUTER JOIN
  (
    SELECT
      VictimName,
      count(*) as Deaths
    FROM
      KillInfo
    GROUP BY
      VictimName
  ) DeathCounts ON KillerName= VictimName
Which saves a join.

Moonwolf fucked around with this message at 00:03 on Jun 20, 2018

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


anthonypants posted:

Let's say I have the following tables:
Control, with BatchID and Source columns
Translation, with Source and Translate columns
Raw, with BatchID and Data columns

I want the following logic, but in T-SQL:
code:
for each row in Translation {
  for each row in Control {
    if Translation.Source == Control.Source {
      UPDATE Raw SET Data = Translate WHERE Raw.BatchID = Control.BatchID
    }
  }
}
I know there's a FOR and an IF in T-SQL, but I know there's a better, SQL way to do this.
This is also what I am interested in.

You don't need procedural T-SQL for this; a joined UPDATE does it:
code:
UPDATE
 r1
SET
 r1.Data = t1.Translate
FROM
 Raw r1 JOIN
 Control c1 ON
 r1.BatchID = c1.BatchID JOIN
 Translation t1 ON
 c1.Source = t1.Source
This is joining Raw to Control on BatchId, and Control to Translation on Source, which matches your looping.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Yeah, if you don't have it then just left join the subqueries onto the core table.

On the cursors thing, I'd say only use them if you're going to do more than 1 discrete thing per output row. Anything where you're only generating one set of output can be done in a single statement and doesn't need them.
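
As a rough sketch of where I'd draw that line (T-SQL, with hypothetical orders/notifications tables): one output set needs no cursor, while two discrete actions per row is where one starts to earn its keep.
code:
-- set-based: a single output set, one statement, no cursor required
UPDATE orders SET status = 'late' WHERE due_date < GETDATE();

-- cursor territory: two discrete actions per row (names are hypothetical)
DECLARE @id int;
DECLARE late_orders CURSOR FOR
  SELECT order_id FROM orders WHERE due_date < GETDATE();
OPEN late_orders;
FETCH NEXT FROM late_orders INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
  UPDATE orders SET status = 'late' WHERE order_id = @id;          -- action one
  INSERT INTO notifications (order_id, msg) VALUES (@id, 'late');  -- action two
  FETCH NEXT FROM late_orders INTO @id;
END;
CLOSE late_orders;
DEALLOCATE late_orders;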

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Bobcats posted:

Is there a sql white space/tab standard that doesn’t cause suicidal ideation?

2-space tabs. Any serious reporting query will otherwise end up so deeply indented that it line-wraps.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!



Nah, he wants IFNULL (https://cloud.google.com/dataprep/docs/html/IFNULL-Function_66194064) on each column, which mirrors the more conventional SQL NVL.

So IFNULL(Unix1, 'Empty') - IFNULL(Unix2, 'Empty'), or however the actual transform works in BigQuery. COALESCE appears to be purely for columns, and doesn't look like it lets you fake it with a string literal.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


You're right, I'm not sure why I picked that; going to blame the heat. Either way, from the looks of the BigQuery COALESCE, I think it doesn't allow non-column values, where IFNULL does. Might just be a gap in my reading of the docs though; going by their own examples, BigQuery does a bunch of stuff quite differently from classic SQL style.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


For bulk loading, in almost everything you're best off stripping all the indexing and rebuilding it afterwards; building 2 giant indices once is vastly faster than keeping 2 b-trees balanced while you keep writing. The cost of constantly rebalancing them will be what's causing that fall-off in speed; early on they're still simple.

What problem is your second index meant to solve? An index that holds a subset of columns and completely fulfils a known query can be useful to save reading the row after the index lookup, but it will be murder on your writes. B-tree indices are normally there to let you get to the set of rows you want efficiently, so what are your expected access paths onto the data once it's all loaded? Most recent row for a customer? Today's data for all customers? All data for a customer? That determines what your leading keys should be.
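
As a hedged sketch, assuming the access path is "most recent rows for a customer" and entirely hypothetical names:
code:
-- leading key matches the equality lookup, second key matches the range/ordering
CREATE INDEX ix_readings_customer_time
  ON readings (customer_id, reading_time DESC);

-- this can then drive straight down the b-tree instead of scanning
SELECT *
FROM readings
WHERE customer_id = 42
ORDER BY reading_time DESC;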

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Hunt11 posted:

I am right now working with the sakila dataset and when I am trying to join two tables I keep on getting error message 1046.

This is the code that I have done so far.
code:
SELECT Address.Address, Address.address_id, Staff.first_name, Staff.last_name
FROM sakila.Address
INNER JOIN Staff ON  Address.address_id = Staff.CustomerID;

1046 means you don't have a database selected, possibly because you haven't specified which schema Staff is in. Also, that doesn't look like the join key you want; staff should have an address_id for that, not a CustomerID.
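
A sketch of both fixes, assuming the stock sakila schema (staff carries its own address_id):
code:
USE sakila;  -- or schema-qualify every table, and error 1046 goes away

SELECT a.address, a.address_id, s.first_name, s.last_name
FROM address a
INNER JOIN staff s ON s.address_id = a.address_id;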

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Hughmoris posted:

In TSQL, is there a difference between putting a clause in the JOIN statement versus in the WHERE statement?

I'm still learning SQL and I've seen it mentioned but I'm not sure I fully understand what occurs.

SQL code:
SELECT *
FROM T1
LEFT JOIN T2 ON T1.Key = T2.Key AND Event=AAA
SQL code:
SELECT *
FROM T1
LEFT JOIN T2 ON T1.Key = T2.Key
WHERE Event=AAA
Will those queries return different results? What's the fundamental difference of putting the clause in the JOIN statement versus the WHERE statement? Or is there none?

Well, that depends entirely on which table Event comes from. If it's in T1 then it doesn't matter; if it's in T2 then the second query stops being a left join, because the WHERE forces a value to be present. Version 1 gets T1 with all the columns from T2 where the keys match and Event = AAA, or NULLs. Version 2 gets T1 with all the columns from T2 where the keys match, or NULLs, and then demands that Event = AAA, which is impossible where it's not set.
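
To make that concrete, a sketch using the tables from the quote, assuming Event lives on T2:
code:
-- keeps the LEFTness: the filter participates in the join, unmatched T1 rows survive as NULLs
SELECT *
FROM T1
LEFT JOIN T2 ON T1.Key = T2.Key AND T2.Event = 'AAA';

-- filtering in WHERE instead: let the unmatched rows back in explicitly, or it becomes an inner join
SELECT *
FROM T1
LEFT JOIN T2 ON T1.Key = T2.Key
WHERE T2.Event = 'AAA' OR T2.Key IS NULL;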

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


If you reinsert with it ordered properly then that'll help reduce reads. If you're doing a ton of whole-day fuckery then you might be best served by partitioning, if it's available in your db, since that makes throwing away the old data fast, unless you're actually truncating the whole table.

Although if 10 days ago hasn't changed then how does binning it and reinserting help anything other than your storage vendor?
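
On the partitioning route, a minimal sketch assuming Oracle interval partitioning and hypothetical names (other engines spell this differently):
code:
-- one partition per day, so binning a day is a metadata operation, not a big delete
CREATE TABLE readings (
  reading_day DATE NOT NULL,
  payload     VARCHAR2(100)
)
PARTITION BY RANGE (reading_day)
INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
(PARTITION p_start VALUES LESS THAN (DATE '2018-01-01'));

-- throw away one old day near-instantly
ALTER TABLE readings DROP PARTITION FOR (DATE '2018-06-10');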

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


In SQL Server the frame defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW if you put an ORDER BY clause on. If you hadn't ordered it, range boundaries wouldn't make any sense, so it defaults to the entire dataset, because you haven't put a partitioning clause on.

Window functions work by effectively joining the data onto itself again, so it can be sorted and aggregated there. The ordering on the window applies only within the window, which matters for how the rows to aggregate are bounded, whereas a final ORDER BY clause orders the returned results.
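
A sketch making those defaults explicit (SQL Server syntax, hypothetical sales table):
code:
SELECT
  sale_date,
  -- with an ORDER BY, these two frames are the same: a running total
  SUM(amount) OVER (ORDER BY sale_date) AS running_total,
  SUM(amount) OVER (ORDER BY sale_date
                    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total_explicit,
  -- no ORDER BY (and no PARTITION BY): the frame is the entire dataset
  SUM(amount) OVER () AS grand_total
FROM sales;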

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Grump posted:

I need help creating a SQL query that will return a cocktail from a database, given that I have supplied all the ingredients that go into that cocktail

So for example, I want the "Gin and Tonic" row to return only if I have supplied the correct ids for Gin (id is 1) and Tonic (id is 2). I I only supply "Tonic," I shouldn't get back the row

I'm using SQLAlchemy and Flask, but I'm still having trouble wrapping my head around how the query will work altogether
<snip>

Ruggan's question's a good one. This is a really hard problem with a dynamic number of parameters though. What you need to get is:
a) Any cocktail that you have at least one ingredient for;
b) All their ingredients lists;
c) Compare your input list to those lists.

Auto-generated SQL doesn't stand a chance.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Grump posted:

Yes - Gin and Tonic should be returned if you included Whiskey


So you're saying I could do something like

1. Client asks for all liquors including Gin and Tonic
2. SQL statement returns "Gin and Tonic" and "Gin and Tonic and Lime"
3. Python compares the client's list of IDs with the IDs of each Cocktail.ingredients
4. If all the client IDs appear in each Cocktail.ingredients.id, add that cocktail to a result array
5. Return the result array to the client

???

Forums ate my query, will edit it in once I work out how to get it to let me post it.

How do people get Cloudflare to let you post SQL? It just claims it's an attack.

Moonwolf fucked around with this message at 00:14 on Jan 14, 2019

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Bruegels Fuckbooks posted:

what do you mean by "posting" sql- you mean in the post of a web request, perchance?

Yeah, just replying with it in a code block like everyone else does, so that I can reply to Grump's post with the example query. Although now it works, for no obvious reason. This one should be db-neutral.

code:
SELECT
	name
FROM
	cocktails JOIN
	(
	SELECT
		potential_cocktails.cocktail_id
	FROM
		(SELECT DISTINCT
			cocktail_id
		FROM
			ing_in_cocktails
		WHERE
			ing_in_cocktails.ing_id IN (%(ing_id_1)s)) potential_cocktails JOIN
		ing_in_cocktails needed ON needed.cocktail_id = potential_cocktails.cocktail_id LEFT JOIN
		-- present only exists where the needed ingredient is in the supplied list
		ing_in_cocktails present ON present.cocktail_id = needed.cocktail_id AND
			present.ing_id = needed.ing_id AND
			present.ing_id IN (%(ing_id_1)s)
	GROUP BY
		potential_cocktails.cocktail_id
	HAVING
		count(present.ing_id) = count(needed.ing_id)) complete_cocktails ON complete_cocktails.cocktail_id = cocktails.id

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


TheFluff posted:

If you have a halfway decent database it will have set operators (INTERSECT/EXCEPT), which can make this kind of thing far easier to read:

SQL code:
SELECT *
FROM cocktails
WHERE NOT EXISTS (
    SELECT ing_in_cocktails.ingredient_id
    FROM ing_in_cocktails
    WHERE ing_in_cocktails.cocktail_id = cocktails.id
    EXCEPT
    SELECT ingredients.id
    FROM ingredients
    WHERE ingredients.id IN (%parameter_list_of_ingredient_ids_goes_here%)
)
Rather than trying to find cocktails we can make by counting the number of necessary ingredients, this works by excluding cocktails we can't make because there's at least one ingredient missing. In the NOT EXISTS subquery we take the set of necessary ingredients for a given cocktail and then subtract the set of ingredients we have available (the EXCEPT operator returns rows that are output by the left query but not by the right query). If the result is the empty set, we know we have all ingredients we need for this cocktail, and so we exclude cocktails where this set is not empty.

I use the set operators exceedingly rarely, but sometimes they're just the perfect tool for the job.

e: I have not actually looked at the execution plan for this, but I doubt it's any slower than joining on an aggregating subquery

e2: the last bit after the EXCEPT (SELECT ingredients.id FROM ingredients...) doesn't even need to be a real query, you can use a table-valued parameter instead, or a VALUES list or w/e.

You can make that even simpler; you don't need the EXCEPT at all:
SQL code:
SELECT *
FROM cocktails
WHERE NOT EXISTS (
    SELECT ing_in_cocktails.ingredient_id
    FROM ing_in_cocktails
    WHERE ing_in_cocktails.cocktail_id = cocktails.id AND
    ing_in_cocktails.ingredient_id NOT IN (%parameter_list_of_ingredient_ids_goes_here%)
)
Of course, the actual performance of any of these will vary with the size of the tables, but it's going to be broadly similar at any conventionally sized cocktail list.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


I've only ever found them useful for reconciling data coming from two different sources that's meant to be the same, or for checking that a new query/ETL flow spits out the same data as the old one when you're refactoring for performance.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Those should plan to the same thing anyway. What could be faster, depending on the size of the data, would be an IN clause:

code:
UPDATE
  table
SET
  flag=1
WHERE
  id IN (
  SELECT
    id
  FROM
    table
  WHERE
    otherflag='Y'
)
If id is the primary key then this is total overkill anyway, and could just be
code:
UPDATE
  table
SET
  flag=1
WHERE
  otherflag='Y'

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Nth Doctor posted:

It's this and never making indices

Or covering everything in them 'for speed'

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


abelwingnut posted:

general sql question:

let's say i have:

code:
SELECT
	*
FROM
	tableA a
	JOIN tableB b ON udfBlah(a.x) = b.y
so in my mind, the way the engine works, it looks at the first row in tableA, evaluates a.x, then goes to tableB, then looks for b.y in the first row. whatever that result may be, true or false, what happens next?

my guess is it has held the result of udfBlah(a.x) in memory so that it doesn't have to reevaluate, and can then move on to row 2 in tableB, evaluate it, then row 3 in tableB, and so forth until it reaches the end of the table B. at that point it then reevaluates udfBlah(a.x) for row 2, then cycles through table B again while holding the new value of udfBlah(a.x) in memory.

do i have that correct, or does it reevaluate udfBlah(a.x) with each evaluation?

So there was some mention of join types, but pretty much any db is going to keep the result of your udfBlah(a.x) for the evaluation, because anything else costs more CPU. Even in a nested loop it will evaluate it once, then examine b.y for equality a row at a time. Merge join I'm not certain about, because of how it'd need to sort b.y to match the ordering of udfBlah(a.x); a hash join would just hash udfBlah(a.x) for every a row and then join to the matching ones on b.

The only place this wouldn't hold is if udfBlah is non-deterministic: if you've got a random or time function in there so the output can change, you'd need to really examine what it's doing; it could evaluate per incoming row or per join.
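
Relatedly, some engines let you promise determinism so the optimizer can reuse results; a sketch in Oracle PL/SQL, with a hypothetical body standing in for the real logic:
code:
-- DETERMINISTIC declares same input -> same output, so results may be cached and reused
CREATE OR REPLACE FUNCTION udfBlah (p_x IN NUMBER)
RETURN NUMBER DETERMINISTIC
IS
BEGIN
  RETURN MOD(p_x, 97);  -- hypothetical stand-in
END udfBlah;
/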

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Having just looked up that on Wikipedia, that looks hideous. Although it does more than SQL, it's still actually more unreadable than any of the DB specific procedural languages I've seen.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


TheFluff posted:

Speaking of which, if you have MSSQL, system-versioned temporal tables are great and I wish I had them in Postgres. You can implement something similar with triggers, but you can't get the query syntax and that's half the fun.

Oh, someone other than Oracle does native audited tables; that's good, we might see it spread, since it's actually really useful. Oracle's version for proper long-term history is called Flashback Data Archive; using undo to wind queries back in the near term has been there for a while. I believe Postgres is moving to an undo-area system, at least under EDB's work on it, which would let them do this stuff much more reasonably than the orphaned-rows-and-vacuuming system they have now.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Vegetable posted:

Quick question:

orders has columns buyer_id, seller_id, order_id, total_buyer_orders.

total_buyer_orders is the number of orders ever created by the buyer.

I’m trying to get a list of sellers whose orders comprise at least 80% of any buyer’s total_buyer_orders.

SELECT buyer_id, seller_id
FROM bookings
HAVING count(booking_id)/total_buyer_orders >= 0.8

Does this work? Specifically, is it okay to combine two columns in a HAVING clause like that?

That should work once it has a GROUP BY on it; HAVING is just like WHERE except it can also interrogate the output of aggregates, like you're doing. You could equally write it as:
code:
SELECT * FROM (
SELECT buyer_id, seller_id, count(booking_id)/total_buyer_orders perc
FROM bookings
GROUP BY buyer_id, seller_id, total_buyer_orders
) t
WHERE perc >= 0.8
which is what HAVING lets you avoid, in cleaner syntax.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


As far as I know there is no dialect of SQL that can do that in a single operation. The better way to model it would be a table like:
code:
transactionId  itemId   count
4              5        30
4              3        112
5              5        2
where you do one join to decorate the item id with the item name, and then pivot into your transaction-plus-items format.
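
A rough sketch of that decorate-then-pivot step in SQL Server's PIVOT syntax; the item names in the IN list are hypothetical and have to be known (or generated) up front:
code:
SELECT transactionId, [widget] AS widget_count, [gadget] AS gadget_count
FROM (
  SELECT ti.transactionId, i.itemName, ti.item_count
  FROM transaction_items ti
  JOIN items i ON i.itemId = ti.itemId  -- decorate the id with its name
) src
PIVOT (SUM(item_count) FOR itemName IN ([widget], [gadget])) p;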

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


That's fair, this sort of thing is often handed to us, but yeah, you're going to just have to join a bunch of times I'm afraid. It should be pretty cheap though, just ugly in the SQL.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


That'll probably be quite slow, but depending on which db Agrikk is using there might be UNPIVOT to do the same thing with fewer reads. A lot of this hinges on whether you're doing it for one transaction id or a huge group of them, though, Agrikk.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


It's also a guard against dynamic/constructed SQL with empty values in, which is generally a sign something's gone terribly wrong and terminating rather than continuing is the best bet.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


NPR Journalizard posted:

Is there an easy way to backup a DB, and then restore it to the same SQL server under a different name, but with all the DB objects like stored procs and views pointing at the new name?

I know I can generate scripts with a drop and create option, and I think that will cover a bunch of the items, but im not sure it will cover all of them.

Most engines will let you do a rename during restore and recovery; which database are you using for this?
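
In SQL Server, for example, the rename can happen in the restore itself (logical file names and paths here are hypothetical):
code:
RESTORE DATABASE SalesCopy
FROM DISK = N'C:\backups\Sales.bak'
WITH MOVE N'Sales'     TO N'C:\data\SalesCopy.mdf',
     MOVE N'Sales_log' TO N'C:\data\SalesCopy_log.ldf';
-- note: any hard-coded references to the old database name inside procs/views still need fixing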

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Combat Pretzel posted:

Just a quick check, because I have been writing a bunch of terrible SQL queries by hand, using subpar tools (MSQuery *cough*), against DB2 tables with expected lovely table and column naming schemes from the 80's, after ages of no SQL at all, while waiting to get proper gear at my new workplace (I'm currently using a terrible thin client with a restricted environment), so my head spinning and need a check whether I'm thinking right (not even sure why I'm doing it on a weekend).

I've been trying to identify what people performed what operations in the factory by matching dates and times against attendances. The query looks loosely like this, and takes it's sweet time, relatively, and probably expected considering it has to loop the joined table a shitload:

code:
SELECT *
FROM
    operation o
    LEFT OUTER JOIN presence p
    ON
        (o.date * 1000000 + o.time) >= (p.punch_in_date * 1000000 + p.punch_in_time) AND
        (o.date * 1000000 + o.time) <= (p.punch_out_date * 1000000 + p.punch_out_time) AND
WHERE o.something = blah
(This datetime terribleness is technical debt.)

I presume adding this to the ON statement should help a lot, because it can home in on the rows more easily? Or would DB2 optimize it that way internally regardless?
code:
(o.date = p.punch_in_date OR o.date = p.punch_out_date)
(Considering shifts shouldn't be longer than 11 hours, checking entry and exit dates for graveyard shifts.)

Also, DB2 has a BETWEEN operator, does it optimize queries better than writing it out by hand?

That join condition will probably help, because it gives the optimizer the potential to pick a better join type than endlessly looping. Because you're doing a LEFT join you should put it in the ON, since putting it in the WHERE would remove the LEFT-ness by demanding that an o date matches a punch date.

BETWEEN is normally just syntactic sugar for x >= foo AND x <= bar; it shouldn't make any difference to your results at all.
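
Concretely, a sketch of the quoted query with the date anchor folded into the ON (columns as in the quote):
code:
SELECT *
FROM
    operation o
    LEFT OUTER JOIN presence p
    ON
        (o.date = p.punch_in_date OR o.date = p.punch_out_date) AND  -- narrows the candidates first
        (o.date * 1000000 + o.time) >= (p.punch_in_date * 1000000 + p.punch_in_time) AND
        (o.date * 1000000 + o.time) <= (p.punch_out_date * 1000000 + p.punch_out_time)
WHERE o.something = 'blah'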

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Hammerite posted:

I became aware recently that in SQL Server you can use a VALUES table-expression ("Table Value Constructor") to specify a collection of values without having to create and populate a temporary table.

Suppose I have a similar use case to this but I have to use Oracle. What's the best I can do? I do not want to assume I am free to create temporary tables, either global or private. Assume I am going to left outer join from the pseudo-table to a real table, so a WHERE X IN (...) condition won't do.

There isn't, as far as I know, anything that nice in Oracle yet. If the set of pseudo-table data is fixed but you can't store it, you can put it in a WITH clause/CTE by UNION ALLing selects from dual. Handily stolen example:
code:
WITH t1 AS
  (SELECT 1 subtype, 123 dataid, 0 originaldataid, '01-Aug-2010' dte FROM dual
  UNION ALL
  SELECT 1, 145, 10, '01-Aug-2010' FROM dual
  UNION ALL
  SELECT 1, 12345, 475, '01-Aug-2010' FROM dual
  UNION ALL
  SELECT 1, 789, 0, '01-Aug-2010' FROM dual
  UNION ALL
  SELECT 2, 789456, 0, '01-Aug-2010' FROM dual
  UNION ALL
  SELECT 2, 789456, 0, '02-Aug-2010' FROM dual)
SELECT dataid, originaldataid, decode(originaldataid, 0, dataid, originaldataid) req
FROM t1
WHERE subtype = 1 AND dte = '01-Aug-2010'

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


abelwingnut posted:

i'm not really sure how to say this, but hopefully this works.

is there any way to run through a script line-by-line? like in a debug mode? i've seen this in other languages, where you can just say execute this line, and it'll show you how variables change with each line. anything like that without having to manually insert and remove selects with each line?

This only makes sense within the context of whatever procedural-language extension your database has (T-SQL, PL/SQL, PL/pgSQL, etc.). Within those there should be a debugger, like Sir Fishbone says, probably integrated into whatever IDE you use for it if it's one of the better-supported ones.

SQL proper is declarative; you can't step through it, because the entire statement is parsed and applied as one thing. There you're best off commenting out joins and reading the explain plans while you work out why it's not doing what you expect.
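
For the explain-plan route, a sketch in Oracle syntax (other engines have their own EXPLAIN / SET SHOWPLAN equivalents); the query itself is a hypothetical stand-in:
code:
EXPLAIN PLAN FOR
SELECT e.ename
FROM emp e
JOIN dept d ON d.deptno = e.deptno;

-- render the plan for the statement above
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);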

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Knot My President! posted:

Crossposting:

Is there some kind of magical galaxy brain way to do SQL that I'm missing? I learned to do SQL a certain way and I'm at a loss as to why I got rejected after a technical take-home. The thing is that I know the answers are correct because I uploaded the tables into BigQuery to test all of my answers there.

I generally use table aliases / common table expressions in order to avoid using subqueries, and I avoid using self joins to filter data since they're generally harder aesthetically to read (unless there's a parent-child relationship in the same table). My syntax and methods are "highest ranked answer on Stack Overflow". After inheriting a ton of bad queries from the senior analyst, I went through and organized everything in such a way so anyone from top level down could make sense of my code even if they had no idea how to write SQL, including documentation that explained what everything did. But how I functionally write code on a job is apparently different from what a recruiter wants in an interview so I'm at a loss here :shrug:

I'm in my final rounds with another job I am excited for but I want to cover my bases nonetheless. Any ideas?

If this is re: your post in poo poo that pisses me off, the hirers were insane. Doubly so if you were on new Postgres, because they've taken the pattern-matching feature that Oracle put in the spec and implemented it, so there's now a really performant way to do trend finding rather than hideous joins.

Edit: Also, table aliases are free; they just let you shorthand a name, exactly like renaming columns in a result set.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Malcolm XML posted:

Wait they implemented match recognize??? Holy poo poo

It's literally regex for everything

Hmm, I thought they had; it might just have been EDB or 2ndQuadrant talking about it in the roadmap, since I can't find it in the docs. It is row regex though, including recursion and greedy/non-greedy matching in the Oracle implementation.
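
For reference, a sketch of the Oracle version on the classic rising-run example (hypothetical ticker table):
code:
SELECT *
FROM ticker MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tstamp
  MEASURES FIRST(tstamp) AS run_start, LAST(tstamp) AS run_end
  ONE ROW PER MATCH
  PATTERN (strt up+)                        -- row regex: a start row, then one or more rises
  DEFINE up AS up.price > PREV(up.price)    -- strt has no DEFINE, so it matches any row
);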

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Nth Doctor posted:

As both a SQL and Regex guy, I love this. :peanut:


I'm just super sad that my company uses SQL Server. :smith:

Well, just badger Microsoft to implement it; it's in the SQL standard now (as of SQL:2016). I think Oracle got it put into the standard when they shipped it in 12.1.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


LostMy2010Accnt posted:

So following your advice I ran it in SQLlite and the output is definitely showing my tables are not merging properly. My lat and long columns are showing up as Null for all 1200 of my entries. I'm definitely getting two entries that are merging correctly, but the rest are Null. So clearly my issue is joining the two tables which is happening before I get to the dataframe.

I think the problem is that I'm trying to get two different sized tables to merge in the manner I want. I have a table with 1200 job postings that have a location and then I have a table of about 200 locations with the latitudes and longitudes. I got the locations table from the jobs table and now trying to join the lat and longs to the respective job locations.

Thank you for the advice; now I know I need to go back and fix my query.

Yeah, as above, the size of the tables doesn't matter for your results, as long as it's not generating results so large the query crashes on resources. The other thing I'd flag is that your query explicitly allows the results to be NULL, because you're left joining. In your end output, where you need a lat/long value, you should inner join (just JOIN; the INNER is optional everywhere) so that the join must complete to give results, otherwise you're going to get all the rows where it doesn't match too. Fixing those in development, or with another data-quality query, is probably going to be useful. A way to find all the rows where it's not giving you what you expect is:

code:
SELECT 
  *
FROM
  combined_jobs
LEFT OUTER JOIN
  location_cache on (combined_jobs.location = location_cache.city)
WHERE location_cache.city IS NULL
because this returns all jobs where you can't get a city to match.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Knot My President! posted:

Thanks man. Was driving me nuts.

Update to this:

I got a perfect score on the SQL technical interview at Facebook and the person clapped excitedly when I did. Ended up crushing the whole interview and got an offer three days later. Best part? I was able to turn down Facebook entirely for another job I also got a perfect score on. For this one, the data Big Boss for the entire company gave me the technical screening. People ops said he was blown away. Got a kickass team and a 30% pay bump when I start.

Very nice, well done.

quote:

gently caress Twitch for getting verbally upset and having one of the interviewers leave the Zoom because "CTEs aren't performant". Also gently caress Autodesk for making me do a six hour take-home SQL test and then ghosting me after I turned it in. (really???) So glad I'm done with this job hunting shitshow.

That's hilariously unprofessional interviewing, wtf? Also, CTEs are at worst identical to inline subqueries, but you give the db more chance to temp-table them and not have to run them again if you're reusing them. What the hell was that person on?

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


NihilCredo posted:

This one isn't wrong but, since it's logically equivalent to a plain JOIN, if I saw that in code review I'd flag the line and check if they intended the LEFT to be there and forgot to add "OR a.binary IS NULL" to the filter.

What? No it isn't. a is the left table; b is the table that can be missing and return NULLs. LEFT/RIGHT OUTER makes the named side required and the other side optional. If it were a FULL OUTER it'd end up acting as a LEFT because of the non-NULL condition specified on a. That's a correct join and filter on the driving table.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


abelwingnut posted:

so the query planner sees a difference between:

SELECT * FROM tableA a LEFT JOIN tableB b ON a.id = b.idForeign WHERE b.binary = 1

and

SELECT * FROM tableA a LEFT JOIN tableB b ON a.id = b.idForeign AND b.binary = 1

?

in any case, would it be best to put tableB in a cte with its predicates and LEFT JOIN that?

Those are different, because your WHERE clause is separate from your joining clauses and filters after the join. In most cases the result would be the same, and CTEing it wouldn't change anything either way. In all cases, check your explain plans on as live-like a system as you can, because behaviour can change based on what the db thinks is most effective with CTEs.


Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


prom candy posted:

I have some list views in my application that are just absolutely balls slow. In order to fulfill the UI requirements as well as the advanced filtering requirements we have to do a lot of joins with a lot of different tables and for customers whose base data set includes hundreds of thousands of items in a table that contains 1M+ items, loading a single page of 30 items can take about 15-20 seconds. I've mucked around with EXPLAIN quite a bit and although I've found a couple places where I can improve indices to shave a couple seconds here and there I can't seem to get the base queries down into the range where I'd like them to be. Is this about the time that I should be looking at caching this data in an elasticsearch (or other) store or should MySQL be able to do what I need it to do and it's worth continuing to try to press on and improve my queries?

On top of what the others have said: MySQL can only do nested loop joins, which are simple and the fastest way to handle small result sets, but will always suck for 10k+ result sets and be terrible for 100k+. Trimming unused joins will certainly help though.
