Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Malcolm XML posted:

Wait they implemented match recognize??? Holy poo poo

It's literally regex for everything

Hmm, I thought they had, it might have just been EDB or 2nd Quadrant talking about it in the roadmap, since I can't find it in the docs. It is row regex though, including recursion and greedy/non-greedy in the Oracle implementation.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Moonwolf posted:

Hmm, I thought they had, it might have just been EDB or 2nd Quadrant talking about it in the roadmap, since I can't find it in the docs. It is row regex though, including recursion and greedy/non-greedy in the Oracle implementation.

As both a SQL and Regex guy, I love this. :peanut:


I'm just super sad that my company uses SQL Server. :smith:

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Nth Doctor posted:

As both a SQL and Regex guy, I love this. :peanut:


I'm just super sad that my company uses SQL Server. :smith:

Well just badger Microsoft to implement it, it's in the SQL standard now (as of 2016), I think Oracle got it put in when they did it in 12.1.
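
For anyone who hasn't run into it, it looks something like this (a sketch along the lines of the stock-ticker example in the Oracle docs; the ticker table with symbol, tstamp and price columns is hypothetical). It finds V-shapes where a price falls and then recovers:

code:
SELECT *
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tstamp
  MEASURES
    strt.tstamp       AS start_tstamp,   -- first row of the match
    LAST(down.tstamp) AS bottom_tstamp,  -- last falling row
    LAST(up.tstamp)   AS end_tstamp      -- last rising row
  ONE ROW PER MATCH
  AFTER MATCH SKIP TO LAST up
  PATTERN (strt down+ up+)               -- a regex over rows, not characters
  DEFINE
    down AS down.price < PREV(down.price),
    up   AS up.price   > PREV(up.price)
);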

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a photograph of george w. bush in 2003

Knot My President! posted:

Crossposting:

Is there some kind of magical galaxy brain way to do SQL that I'm missing? I learned to do SQL a certain way and I'm at a loss as to why I got rejected after a technical take-home. The thing is that I know the answers are correct because I uploaded the tables into BigQuery to test all of my answers there.

I generally use table aliases / common table expressions in order to avoid using subqueries, and I avoid using self joins to filter data since they're generally aesthetically harder to read (unless there's a parent-child relationship in the same table). My syntax and methods are "highest ranked answer on Stack Overflow". After inheriting a ton of bad queries from the senior analyst, I went through and organized everything in such a way that anyone from top level down could make sense of my code even if they had no idea how to write SQL, including documentation that explained what everything did. But how I functionally write code on a job is apparently different from what a recruiter wants in an interview so I'm at a loss here :shrug:

I'm in my final rounds with another job I am excited for but I want to cover my bases nonetheless. Any ideas?

imo the bolded part is the galaxy brain way to do sql. i mean of course there's more to it, but it sounds to me like you know quite well what you're doing, and that interviewer sucked. the fact that you even know about common table expressions probably puts you in the 90th percentile
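
for reference, a minimal sketch of the pattern in question (made-up tables): the cte names the subquery up front, so the main query reads top-down instead of inside-out

code:
WITH recent_orders AS (
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    WHERE ordered_at >= DATE '2020-01-01'
    GROUP BY customer_id
)
SELECT c.name, r.order_count
FROM customers c
JOIN recent_orders r ON r.customer_id = c.id;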

LostMy2010Accnt
Dec 13, 2018

I'm hopefully in the right thread and if not, I'm sorry for bad posting. I wasn't sure if this was a SQL or Python question but I think it falls under SQL. I'm reworking an old project and I'm trying to create a dataframe that consists of a join of two tables. Both tables have a common field (location in one, city in the other). What I'm trying to do is a couple things and I know I'm boning it up. First I'm trying to merge two tables with a different number of entries. The first table (combined_jobs) is full of jobs at GitHub and Stack Overflow, consisting of fields like job title, company name, and location. The second table (location_cache) has three fields with the city (same as location), latitude, and longitude. What I'm trying to do is, for every entry in the location field of the first table, pull the latitude and longitude from the second table.

Here's how I'm trying to do it:

code:
def create_dataframe(connection: sqlite3.Connection) -> pd.DataFrame:
    """Create dataframe containing fields from combined_jobs and location_cache tables"""
    dataframe = pd.read_sql(f"""
    SELECT
        combined_jobs.id, combined_jobs.company, combined_jobs.link, combined_jobs.location,
        combined_jobs.date, combined_jobs.content, combined_jobs.title, location_cache.city,
        location_cache.latitude, location_cache.longitude
    FROM
        combined_jobs
    LEFT OUTER JOIN
        location_cache on (combined_jobs.location = location_cache.city)""",
                                  connection)
    return dataframe
The reason I'm asking is that when I print out the returned dataframe, it has the fields, but the latitude and longitude are showing up as Not a Number (NaN) and I'm not sure what I'm doing wrong. Any suggestions on how I should go about this? I couldn't find anything in searches or on Stack Overflow, but that could just be me not asking the right questions. Also I'm not sure if I've provided enough information; if needed I can provide more, I just didn't want to overload my first post.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Your query looks ok (caveat that we can’t see the data). Are you able to run it in your SQLite instance to verify the output?

Is it possible your latitude and longitude are defined as varchar or nvarchar columns in your sql data? If so they might be returned as a string representation of the number, a la “1.000” instead of 1.000.

If this is the case - ideally, change the types to numeric. If you can’t change the schema, try casting the columns in your query - CAST(longitude AS numeric(?,?)) - the ?s being dependent on what precision you need / want.
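
For example (a sketch - numeric(9,6) is just one plausible precision for coordinates):

code:
SELECT CAST(latitude AS numeric(9,6))  AS latitude,
       CAST(longitude AS numeric(9,6)) AS longitude
FROM location_cache;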

LostMy2010Accnt
Dec 13, 2018

Ruggan posted:

Your query looks ok (caveat that we can’t see the data). Are you able to run it in your SQLite instance to verify the output?

Is it possible your latitude and longitude are defined as varchar or nvarchar columns in your sql data? If so they might be returned as a string representation of the number, a la “1.000” instead of 1.000.

If this is the case - ideally, change the types to numeric. If you can’t change the schema, try casting the columns in your query - CAST(longitude AS numeric(?,?)) - the ?s being dependent on what precision you need / want.

So following your advice I ran it in SQLite and the output is definitely showing my tables are not merging properly. My lat and long columns are showing up as Null for all 1200 of my entries. I'm definitely getting two entries that are merging correctly, but the rest are Null. So clearly my issue is joining the two tables, which happens before I get to the dataframe.

I think the problem is that I'm trying to get two different sized tables to merge in the manner I want. I have a table with 1200 job postings that have a location and then I have a table of about 200 locations with the latitudes and longitudes. I got the locations table from the jobs table and now trying to join the lat and longs to the respective job locations.

Thank you for the advice; now I know I need to go back and fix my query.

LostMy2010Accnt fucked around with this message at 21:46 on Nov 1, 2020

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


LostMy2010Accnt posted:

So following your advice I ran it in SQLite and the output is definitely showing my tables are not merging properly. My lat and long columns are showing up as Null for all 1200 of my entries. I'm definitely getting two entries that are merging correctly, but the rest are Null. So clearly my issue is joining the two tables, which happens before I get to the dataframe.

I think the problem is that I'm trying to get two different sized tables to merge in the manner I want. I have a table with 1200 job postings that have a location and then I have a table of about 200 locations with the latitudes and longitudes. I got the locations table from the jobs table and now trying to join the lat and longs to the respective job locations.

Thank you for the advice; now I know I need to go back and fix my query.

Size of the tables you are joining is irrelevant. Lat and long are null because of some combination of:
1. Location_Cache.City never matches exactly combined_jobs.location
OR
2. Location_Cache.City matches exactly combined_jobs.location but Location_Cache.Lat and .Long are NULL for that city.

Pick a SINGLE KNOWN job in a SINGLE KNOWN city, and try to match it with the relevant row in Location_Cache. You can probably generalize the fine-tuning from there.

Use a WHERE clause to limit each table to the row you know you are looking for ahead of time.


At a guess: leading or trailing whitespace, case sensitive collation, or trying to match 'New York, NY' with 'New York' is happening.

Use LTRIM(), RTRIM(), and LOWER() as first steps.
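
Something like this, say (a sketch - the id value is made up, use a row you know):

code:
SELECT cj.location, lc.city, lc.latitude, lc.longitude
FROM combined_jobs cj
LEFT JOIN location_cache lc
  ON LOWER(LTRIM(RTRIM(cj.location))) = LOWER(LTRIM(RTRIM(lc.city)))
WHERE cj.id = 42;  -- a single known job row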

LostMy2010Accnt
Dec 13, 2018

Nth Doctor posted:

Size of the tables you are joining is irrelevant. Lat and long are null because of some combination of:
1. Location_Cache.City never matches exactly combined_jobs.location
OR
2. Location_Cache.City matches exactly combined_jobs.location but Location_Cache.Lat and .Long are NULL for that city.

Pick a SINGLE KNOWN job in a SINGLE KNOWN city, and try to match it with the relevant row in Location_Cache. You can probably generalize the fine-tuning from there.

Use a WHERE clause to limit each table to the row you know you are looking for ahead of time.


At a guess: leading or trailing whitespace, case sensitive collation, or trying to match 'New York, NY' with 'New York' is happening.

Use LTRIM(), RTRIM(), and LOWER() as first steps.

Thank you. I'll try that out.



I appreciate all the help, folks. Nice to have threads like this.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


LostMy2010Accnt posted:

So following your advice I ran it in SQLite and the output is definitely showing my tables are not merging properly. My lat and long columns are showing up as Null for all 1200 of my entries. I'm definitely getting two entries that are merging correctly, but the rest are Null. So clearly my issue is joining the two tables, which happens before I get to the dataframe.

I think the problem is that I'm trying to get two different sized tables to merge in the manner I want. I have a table with 1200 job postings that have a location and then I have a table of about 200 locations with the latitudes and longitudes. I got the locations table from the jobs table and now trying to join the lat and longs to the respective job locations.

Thank you for the advice; now I know I need to go back and fix my query.

Yeah, as above, the size of the tables doesn't matter for your results, as long as it's not generating a result set so large that the query is crashing on resources. The other thing I'd flag is that your query is explicitly allowing the results to be NULL, because you're left joining. In your end output where you need a lat/long value you should inner join (just JOIN, inner is optional everywhere) so that the join must complete to give results, otherwise you're going to get all the rows where it doesn't match. Fixing those in development or with another data-quality query is probably going to be useful; you can find all the rows where the join isn't giving you what you expect by doing

code:
SELECT 
  *
FROM
  combined_jobs
LEFT OUTER JOIN
  location_cache on (combined_jobs.location = location_cache.city)
WHERE location_cache.city IS NULL
because this returns all jobs where you can't get a city to match.

LostMy2010Accnt
Dec 13, 2018

Moonwolf posted:

Yeah, as above, the size of the tables doesn't matter for your results, as long as it's not generating a result set so large that the query is crashing on resources. The other thing I'd flag is that your query is explicitly allowing the results to be NULL, because you're left joining. In your end output where you need a lat/long value you should inner join (just JOIN, inner is optional everywhere) so that the join must complete to give results, otherwise you're going to get all the rows where it doesn't match. Fixing those in development or with another data-quality query is probably going to be useful; you can find all the rows where the join isn't giving you what you expect by doing

code:
SELECT 
  *
FROM
  combined_jobs
LEFT OUTER JOIN
  location_cache on (combined_jobs.location = location_cache.city)
WHERE location_cache.city IS NULL
because this returns all jobs where you can't get a city to match.

Thank you for the advice, I'm going to be working on it some more today so I'll put this advice to good use.

Again, I appreciate everyone's input; thank you for the help.

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy
here's some quirky poo poo i've never noticed before: if i have two columns:

FedTaxID (nvarchar(10))
SSNHashed (varbinary(8000))

and i do a

select coalesce(FedTaxID,SSNHashed)

this is what the output looks like when FedTaxID is NULL:

[screenshot: the SSNHashed bytes rendered as garbled text]

i assume this is due to some implicit cast forcing the hash to be interpreted as some bastardized ASCII?

this doesn't affect my results materially at all i just thought it was an interesting curiosity that i've never noticed before

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


kumba posted:

here's some quirky poo poo i've never noticed before: if i have two columns:

FedTaxID (nvarchar(10))
SSNHashed (varbinary(8000))

and i do a

select coalesce(FedTaxID,SSNHashed)

this is what the output looks like when FedTaxID is NULL:

[screenshot: the SSNHashed bytes rendered as garbled text]

i assume this is due to some implicit cast forcing the hash to be interpreted as some bastardized ASCII?

this doesn't affect my results materially at all i just thought it was an interesting curiosity that i've never noticed before

The Microsoft Documentation posted:

Returns the data type of expression with the highest data type precedence. If all expressions are nonnullable, the result is typed as nonnullable.
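
In other words, nvarchar sits above varbinary in SQL Server's precedence list, so it's the varbinary side that gets implicitly reinterpreted as UTF-16 text, not the other way around. A contrived sketch:

code:
-- 0x48006900 is the UTF-16LE encoding of N'Hi', so this returns N'Hi';
-- your hash bytes get the same treatment, hence the garbage characters
SELECT COALESCE(CAST(NULL AS nvarchar(10)),
                CAST(0x48006900 AS varbinary(8000)));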

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I'm an RN in an EHR (Epic) analyst position. I want to get more on the data side of the house. I have experience with querying the EDW, python+pandas, data viz etc... and am currently learning about virtualization and containers.

Are there any certifications I can work on over the next year that will make me more marketable for BI/ETL/EDW Developer positions? I've really enjoyed my exposure to this work and I think I need more tech-centric certs on my resume to balance out my clinical expertise and lack of a data-centric role.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

kumba posted:

i assume this is due to some implicit cast forcing the hash to be interpreted as some bastardized ASCII?

That's not ASCII. It's displaying it as Windows-1252, with box characters for control codes/invalid bytes.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Hughmoris posted:

I'm an RN in an EHR (Epic) analyst position. I want to get more on the data side of the house. I have experience with querying the EDW, python+pandas, data viz etc... and am currently learning about virtualization and containers.

Are there any certifications I can work on over the next year that will make me more marketable for BI/ETL/EDW Developer positions? I've really enjoyed my exposure to this work and I think I need more tech-centric certs on my resume to balance out my clinical expertise and lack of a data-centric role.

certifications are worthless as indicators of ability except as a gatekeeping mechanism or are so vendor specific as to be meaningless (most vendor certs are run by their marketing depts)

all the interesting medical data is locked inside the records, which, well, lmao you aren’t getting anything useful out of HL7 certs except as a means to get a job where you won’t do anything useful

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Moonwolf posted:

Hmm, I thought they had, it might have just been EDB or 2nd Quadrant talking about it in the roadmap, since I can't find it in the docs. It is row regex though, including recursion and greedy/non-greedy in the Oracle implementation.

im horny for match_recognize it’s so nice and not even that hard if u can handle general NFA/DFA

Knot My President!
Jan 10, 2005

DELETE CASCADE posted:

imo the bolded part is the galaxy brain way to do sql. i mean of course there's more to it, but it sounds to me like you know quite well what you're doing, and that interviewer sucked. the fact that you even know about common table expressions probably puts you in the 90th percentile

Thanks man. Was driving me nuts.

Update to this:

I got a perfect score on the SQL technical interview at Facebook and the person clapped excitedly when I did. Ended up crushing the whole interview and got an offer three days later. Best part? I was able to turn down Facebook entirely for another job I also got a perfect score on. For this one, the data Big Boss for the entire company gave me the technical screening. People ops said he was blown away. Got a kickass team and a 30% pay bump when I start.

gently caress Twitch for getting verbally upset and having one of the interviewers leave the Zoom because "CTEs aren't performant". Also gently caress Autodesk for making me do a six hour take-home SQL test and then ghosting me after I turned it in. (really???) So glad I'm done with this job hunting shitshow.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Knot My President! posted:

Thanks man. Was driving me nuts.

Update to this:

I got a perfect score on the SQL technical interview at Facebook and the person clapped excitedly when I did. Ended up crushing the whole interview and got an offer three days later. Best part? I was able to turn down Facebook entirely for another job I also got a perfect score on. For this one, the data Big Boss for the entire company gave me the technical screening. People ops said he was blown away. Got a kickass team and a 30% pay bump when I start.

Very nice, well done.

quote:

gently caress Twitch for getting verbally upset and having one of the interviewers leave the Zoom because "CTEs aren't performant". Also gently caress Autodesk for making me do a six hour take-home SQL test and then ghosting me after I turned it in. (really???) So glad I'm done with this job hunting shitshow.

That's hilariously unprofessional interviewing, wtf? Also CTEs are at worst identical to inline subqueries, but you give the db more chance to temp-table them and not have to run it again if you're reusing it. What the hell was that person on?

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Moonwolf posted:

Also CTEs are at worst identical to inline subqueries

That's not always guaranteed, depending on the vendor.

Notably, in Postgres before 12.0 subqueries could be inlined but CTEs couldn't (they were always materialized), so for simple filters and self-joins you might get better performance by skipping the CTE.
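
On Postgres 12+ you can choose per CTE (a sketch with a made-up hits table):

code:
-- NOT MATERIALIZED asks the planner to inline the CTE like a subquery;
-- plain MATERIALIZED forces the old pre-12 optimization-fence behaviour
WITH recent AS NOT MATERIALIZED (
    SELECT * FROM hits WHERE seen_at > now() - interval '1 day'
)
SELECT count(*) FROM recent WHERE ip = '10.0.0.1';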

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Moonwolf posted:

That's hilariously unprofessional interviewing, wtf?

Yeah, that's an absurd reaction to someone giving an answer you don't like to a technical question. A more reasonable reaction would be to say to the candidate something like "OK, but I see you used a CTE. Our house style says that we avoid using CTEs because we believe they're often unperformant. How would you do it without a CTE?" That lets you test the candidate on how cooperative they are, whether they can vary their approach when new requirements are presented, and so on.

If you don't have the time to engage with the candidate at that level then you don't have any reason to be in attendance at the interview. Either that or you have so many eager candidates you can afford to be an rear end in a top hat to them and reject many of them on unrigorous heuristic grounds. Of course for a very well-known company like Twitch this might be the case.

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Hammerite posted:

If you don't have the time to engage with the candidate at that level then you don't have any reason to be in attendance at the interview. Either that or you have so many eager candidates you can afford to be an rear end in a top hat to them and reject many of them on unrigorous heuristic grounds. Of course for a very well-known company like Twitch this might be the case.

Reminds me of the old joke "I discard half the resumes I receive at random. I don't want to hire unlucky people."

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Moonwolf posted:

That's hilariously unprofessional interviewing, wtf?

Yes, and this is the second person I've seen mentioning unprofessional interviews at Twitch in the last few months. The other was on Discord so I can't link it here.

Tea Bone
Feb 18, 2011

I'm going for gasps.
I have a table recording IPs and time (among some other things) when a page is hit.

I'd like to be able to run some queries against both the total hits and the unique hits. I had assumed grouping by IPs would do the trick but MySQL doesn't like that anymore and I get an error:
code:
Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column
I know I can turn ONLY_FULL_GROUP_BY off, but I figured it's there for a reason and I'm better off trying to write a better query.

I've added a 'unique' boolean column to the table which sets to true each time a new IP is entered. It works fine for now but it feels like this should be a fairly simple query but I'm just drawing a blank on it.

I suppose what I want is to be able to just select a row if it's the first time that IP shows up.

Just-In-Timeberlake
Aug 18, 2003

Tea Bone posted:

I have a table recording IPs and time (among some other things) when a page is hit.

I'd like to be able to run some queries against both the total hits and the unique hits. I had assumed grouping by IPs would do the trick but MySQL doesn't like that anymore and I get an error:
code:
Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column
I know I can turn ONLY_FULL_GROUP_BY off, but I figured it's there for a reason and I'm better off trying to write a better query.

I've added a 'unique' boolean column to the table which sets to true each time a new IP is entered. It works fine for now but it feels like this should be a fairly simple query but I'm just drawing a blank on it.

I suppose what I want is to be able to just select a row if it's the first time that IP shows up.

I’m not a MySQL expert by any means but I think

code:
set session sql_mode='';


is what you’re looking for

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Tea Bone posted:

I have a table recording IPs and time (among some other things) when a page is hit.

I'd like to be able to run some queries against both the total hits and the unique hits. I had assumed grouping by IPs would do the trick but MySQL doesn't like that anymore and I get an error:
code:
Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column
I know I can turn ONLY_FULL_GROUP_BY off, but I figured it's there for a reason and I'm better off trying to write a better query.

I've added a 'unique' boolean column to the table which sets to true each time a new IP is entered. It works fine for now but it feels like this should be a fairly simple query but I'm just drawing a blank on it.

I suppose what I want is to be able to just select a row if it's the first time that IP shows up.

I'm not sure what you're trying to do, but if you want a query that returns something like ip, seen_at, first_seen_at then you can use a window function for that:

code:
select
	ip,
	seen_at,
	first_value(seen_at) over(partition by ip order by seen_at) as first_seen_at
from hits;
This will give you all hits and each row will have a column first_seen_at that contains the timestamp for when that ip was first seen.
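
If all you're after is the two totals, though, a plain aggregate may be enough (same hypothetical hits table):

code:
SELECT count(*)           AS total_hits,
       count(DISTINCT ip) AS unique_hits
FROM hits;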

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
We have a select box that basically lets you do a search based on a user where some other records must exist for the user that owns that record. So something like:

"SELECT record WHERE user IN (SELECT user FROM othertable WHERE targetfield = a AND otherconditions) AND user IN (SELECT user FROM othertable WHERE targetfield = b AND otherconditions)..."

"otherconditions" are all the same for all the subqueries except targetfield changes. Is there a way to express that we want "select all users from othertable WHERE user has instances of records where targetfield=a,b,c exist"?

I'm coming up with some type of CTE maybe or group aggregation function? Like we LISTAGG(a,b,c) and then somehow check that all of a,b,c are contained in the aggregated string?

DB is Oracle 11g.

edit: maybe something along the lines of "SELECT record WHERE user IN (SELECT user FROM othertable WHERE otherconditions GROUP BY user HAVING COUNT(targetfield in (a,b,c)) > 3)"... but there can also be multiple rows where targetfield = a, so we really need the count of distinct targetfield values that are in (a,b,c).

Paul MaudDib fucked around with this message at 19:54 on Nov 16, 2020

Gatac
Apr 22, 2008

Fifty Cent's next biopic.

Paul MaudDib posted:

We have a select box that basically lets you do a search based on a user where some other records must exist for the user that owns that record. So something like:

"SELECT record WHERE user IN (SELECT user FROM othertable WHERE targetfield = a AND otherconditions) AND user IN (SELECT user FROM othertable WHERE targetfield = b AND otherconditions)..."

"otherconditions" are all the same for all the subqueries except targetfield changes. Is there a way to express that we want "select all users from othertable WHERE user has instances of records where targetfield=a,b,c exist"?

I'm coming up with some type of CTE maybe or group aggregation function? Like we LISTAGG(a,b,c) and then somehow check that all of a,b,c are contained in the aggregated string?

DB is Oracle 11g.

The actual column for 'targetfield' is the same, it's just the values a, b, c that differ? How about using a built-in collection type constructor?

SELECT user FROM othertable WHERE targetfield IN (SELECT column_value FROM TABLE(sys.odcivarchar2list(a, b, c))) AND otherconditions

There's also sys.odcinumberlist and sys.odcidatelist. (All three are VARRAYs, which are kinda eh but I guess if you only have a few elements the limitations won't matter.) If neither suffices, I suppose you could just create a custom collection type and use its default constructor.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
I think this works:

code:
SELECT record WHERE user IN (
  SELECT user
  FROM othertable
  WHERE otherconditions AND targetfield IN (a, b, c)
  GROUP BY user
  HAVING COUNT(DISTINCT targetfield) = 3
)

So: WHERE to select only rows with the desired targetfield values, group by user, use DISTINCT inside the COUNT to deduplicate the values of targetfield inside the group, and return the users where COUNT(DISTINCT targetfield) = 3 (or however many values are in the IN list).

Paul MaudDib fucked around with this message at 01:41 on Nov 17, 2020

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
It would be interesting to see the performance difference EXPLAINed between HAVING; or SELECT user, count(thing); or SELECT INTERSECT.

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a photograph of george w. bush in 2003
i don't touch oracle unless the task is to migrate away from it to postgres. so possibly my advice here will be incorrect. but here we go: i essentially NEVER use "select a from b where c in (select d from e where ...)", and instead i ALWAYS use the form "select a from b where exists (select 1 from e where c = d ...)" to express the semi-join. the ONLY time i use the "in (...)" is when i have a small list of constants. and i NEVER use "not in (...)" due to the issues that arise if you ever have a null in the list. so basically, in production queries, i effectively avoid this sort of thing completely.
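
to illustrate the null gotcha (contrived one-liner):

code:
-- 2 NOT IN (1, NULL) means 2 <> 1 AND 2 <> NULL; the second test is UNKNOWN,
-- so the whole predicate is never true and this returns zero rows
SELECT 'matched' WHERE 2 NOT IN (1, NULL);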

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
I'm not familiar with Oracle either but I'd go for what DELETE CASCADE mentions as well and just do

code:
SELECT record
FROM sometable
WHERE
  EXISTS(select 1 from othertable where sometable.user = othertable.user and targetfield = a and otherconditions)
  AND EXISTS(select 1 from othertable where sometable.user = othertable.user and targetfield = b and otherconditions)
  -- etc
I'd assume this would perform better than a solution based on distinct aggregation, and especially so if it's likely that there are many matching rows for each subquery.

e: I think this is actually relational division and could be done with nested NOT EXISTS clauses but I'm not smart enough to figure out how

TheFluff fucked around with this message at 12:58 on Nov 17, 2020

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
.

Paul MaudDib fucked around with this message at 07:17 on Nov 18, 2020

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
Couldn't let it go, and I figured out how to do it without a bunch of stacked exists queries:

SQL code:
select record
from sometable
where not exists(
  select 1
  from (values (a), (b), (c)) as required(targetfield)
  where not exists(
    select 1
    from othertable
    where
      othertable.targetfield = required.targetfield
      and sometable.user = othertable.user
      and otherconditions
  )
)
I think this is the relational division pattern (inverse of a cross product), or at least sort of. Horribly confusing to read with the double negative ("select records in sometable where there is no targetfield that has no matching row in othertable"). No idea if it works in Oracle but it does work in Postgres. I'm really not sure what the query planner will make of it though - I didn't try it on any real tables.

TheFluff fucked around with this message at 15:00 on Nov 18, 2020

kiwid
Sep 30, 2013

There must be something easy I'm missing about grouping data with non-aggregate columns.

My data looks like this:

code:
| employee_number | employee_first | employee_last | hours | type     | date       |
|-----------------|----------------|---------------|-------|----------|------------|
| 5               | John           | Smith         | 4     | regular  | 2020-11-02 |
| 5               | John           | Smith         | 8     | regular  | 2020-11-03 |
| 7               | Jane           | Doe           | 8     | regular  | 2020-11-05 |
| 5               | John           | Smith         | 2     | overtime | 2020-11-02 |
| 2               | Chris          | Simpson       | 8     | regular  | 2020-11-05 |
My query:

code:
select
    `employee_number`,
    sum(`hours`)
from `shifts`
where `type` = 'Regular'
group by `employee_number`
Is there any way to include the first and last name columns in an aggregate query like this?

Shy
Mar 20, 2010

MIN(employee_first), MIN(employee_last)

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

kiwid posted:

There must be something easy I'm missing about grouping data with non-aggregate columns.

My data looks like this:

code:
| employee_number | employee_first | employee_last | hours | type     | date       |
|-----------------|----------------|---------------|-------|----------|------------|
| 5               | John           | Smith         | 4     | regular  | 2020-11-02 |
| 5               | John           | Smith         | 8     | regular  | 2020-11-03 |
| 7               | Jane           | Doe           | 8     | regular  | 2020-11-05 |
| 5               | John           | Smith         | 2     | overtime | 2020-11-02 |
| 2               | Chris          | Simpson       | 8     | regular  | 2020-11-05 |
My query:

code:
select
    `employee_number`,
    sum(`hours`)
from `shifts`
where `type` = 'Regular'
group by `employee_number`
Is there any way to include the first and last name columns in an aggregate query like this?

code:
select
    `employee_number`,
    `employee_first`,
    `employee_last`,
    sum(`hours`)
from `shifts`
where `type` = 'Regular'
group by `employee_number`, `employee_first`, `employee_last`
now, if the employee first and employee last names aren't the same for a given employee_number (i.e. you have some data integrity problems) then that won't work directly but you can put max(employee_first), max(employee_last) and get at least something out of it.

kiwid
Sep 30, 2013

min and max are the perfect solution for my problem; I knew it'd be something simple. Much better than what I was doing, which was select distinct employee_number, first, last and then joining via a subquery in the from.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Not sure if this is more a DB question or a :cloud: question but I'm trying here first: we have a failover group for our production Azure SQL database that I usually connect to if I'm querying production, instead of connecting to the individual SQL Server instance. When I connect to the group, I can see our pre-production database, even though it only exists on one of the servers that the production database is on. While connected to that group, if I run

code:
SELECT OBJECT_NAME(s.OBJECT_ID), i.name, s.*
FROM sys.dm_db_index_physical_stats(DB_ID(N'our-preprod-db'), NULL, NULL, NULL , 'SAMPLED') as S
	inner join sys.indexes as I on S.index_id = I.index_id and i.object_id = s.object_id
ORDER BY avg_fragmentation_in_percent DESC
I get different results than when I'm connected to the server preprod lives on.

Thinking about it, it's weird that I can even see the preprod database in the tree when I connect to the group, since preprod isn't even in a failover group at all. Also weird to me that Azure has the concept of a server here, because the databases seem to be the level at which resource allocation (billing) happens.

Anyway, anyone know what the heck I'm even querying when I connect to the group that the DB doesn't participate in and then ask for that DB's physical stats?

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a photograph of george w. bush in 2003
switch to postgres, you'll save your company money on ms sql licenses, and then i might have the slightest idea of what you are even asking
