|
Malcolm XML posted:Wait they implemented match recognize??? Holy poo poo Hmm, I thought they had, it might have just been EDB or 2nd Quadrant talking about it in the roadmap, since I can't find it in the docs. It is row regex though, including recursion and greedy/non-greedy in the Oracle implementation.
|
# ? Oct 21, 2020 19:11 |
|
Moonwolf posted:Hmm, I thought they had, it might have just been EDB or 2nd Quadrant talking about it in the roadmap, since I can't find it in the docs. It is row regex though, including recursion and greedy/non-greedy in the Oracle implementation. As both a SQL and Regex guy, I love this. I'm just super sad that my company uses SQL Server.
|
# ? Oct 22, 2020 18:04 |
|
Nth Doctor posted:As both a SQL and Regex guy, I love this. Well, just badger Microsoft to implement it - it's in the SQL standard now (as of SQL:2016); I think Oracle got it put in when they did it in 12.1.
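For the curious, row pattern matching looks roughly like this; a minimal sketch in Oracle 12c+ MATCH_RECOGNIZE syntax against a hypothetical ticker table (the classic falling-then-rising price example):
SQL code:
-- sketch only: find each fall-then-rise run per symbol; table and columns are made up
SELECT *
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY trade_date
  MEASURES FIRST(trade_date) AS decline_start,
           LAST(trade_date)  AS recovery_end
  ONE ROW PER MATCH
  PATTERN (strt down+ up+)   -- strt is undefined, so it matches any starting row
  DEFINE
    down AS price < PREV(price),
    up   AS price > PREV(price)
);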
|
# ? Oct 23, 2020 13:53 |
|
Knot My President! posted:Crossposting: imo the bolded part is the galaxy brain way to do sql. i mean of course there's more to it, but it sounds to me like you know quite well what you're doing, and that interviewer sucked. the fact that you even know about common table expressions probably puts you in the 90th percentile
|
# ? Oct 25, 2020 01:10 |
|
I'm hopefully in the right thread and if not, I'm sorry for bad posting. I wasn't sure if this was a SQL or Python question but I think it falls under SQL. I'm reworking an old project and I'm trying to create a dataframe that consists of a join from two tables. Both tables have a common field (location and city). What I'm trying to do is a couple things and I know I'm boning it up. First I'm trying to merge two tables with a different amount of entries. The first table (combined_jobs) is full of jobs at GitHub and Stack Overflow consisting of fields like job title, company name, and location. The second table (location_cache) has three fields with the city (same as location), latitude, and longitude. What I'm trying to do here is for every entry in the location field of the first table, it will get the latitude and longitude from the second table. Here's how I'm trying to do it: code:
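For reference, the query shape that matches this description would be something like the following (a guess at the intended join, using the table and column names from the post):
SQL code:
-- pull lat/long for every job row by matching location to city
SELECT j.*,
       l.latitude,
       l.longitude
FROM combined_jobs AS j
LEFT JOIN location_cache AS l
  ON j.location = l.city;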
|
# ? Oct 31, 2020 17:25 |
|
Your query looks ok (caveat that we can’t see the data). Are you able to run it in your SQLite instance to verify the output? Is it possible your latitude and longitude are defined as varchar or nvarchar columns in your SQL data? If so they might be being returned as a string representation of the number a la “1.000” instead of 1.000. If this is the case - ideally, change the types to numeric. If you can’t change the schema, try casting the columns in your query - CAST(longitude AS numeric(?,?)) - the ?s being dependent on what precision you need / want.
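For illustration, the cast might look like this (numeric(9,6) is an assumption about the precision you'd want for coordinates):
SQL code:
-- force string-typed lat/long columns back to numbers
SELECT city,
       CAST(latitude  AS numeric(9,6)) AS latitude,
       CAST(longitude AS numeric(9,6)) AS longitude
FROM location_cache;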
|
# ? Nov 1, 2020 14:28 |
|
Ruggan posted:Your query looks ok (caveat that we can’t see the data). Are you able to run it in your SQLite instance to verify the output? So following your advice I ran it in SQLite and the output is definitely showing my tables are not merging properly. My lat and long columns are showing up as Null for all 1200 of my entries. I'm definitely getting two entries that are merging correctly, but the rest are Null. So clearly my issue is joining the two tables which is happening before I get to the dataframe. I think the problem is that I'm trying to get two different-sized tables to merge in the manner I want. I have a table with 1200 job postings that have a location and then I have a table of about 200 locations with the latitudes and longitudes. I got the locations table from the jobs table and now I'm trying to join the lat and longs to the respective job locations. Thank you for the advice; now I know I need to go back and fix my query. LostMy2010Accnt fucked around with this message at 21:46 on Nov 1, 2020 |
# ? Nov 1, 2020 21:40 |
|
LostMy2010Accnt posted:So following your advice I ran it in SQLite and the output is definitely showing my tables are not merging properly. My lat and long columns are showing up as Null for all 1200 of my entries. I'm definitely getting two entries that are merging correctly, but the rest are Null. So clearly my issue is joining the two tables which is happening before I get to the dataframe. Size of the tables you are joining is irrelevant. Lat and long are null because of some combination of: 1. Location_Cache.City never matches exactly combined_jobs.location OR 2. Location_Cache.City matches exactly combined_jobs.location but Location_Cache.Lat and .Long are NULL for that city. Pick a SINGLE KNOWN job in a SINGLE KNOWN city, and try to match it with the relevant row in Location_Cache. You can probably generalize the fine-tuning from there. Use a WHERE clause to limit each table to the row you know you are looking for ahead of time. At a guess: leading or trailing whitespace, case-sensitive collation, or trying to match 'New York, NY' with 'New York' is happening. Use LTRIM(), RTRIM(), and LOWER() as first steps.
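A sketch of that single-row check with the normalization applied (names from the thread; the LIKE filter is just a placeholder for whatever known city you pick):
SQL code:
-- compare one known job row against the cache with whitespace and case normalized
SELECT j.location, l.city, l.latitude, l.longitude
FROM combined_jobs AS j
LEFT JOIN location_cache AS l
  ON LOWER(LTRIM(RTRIM(j.location))) = LOWER(LTRIM(RTRIM(l.city)))
WHERE j.location LIKE '%New York%';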
|
# ? Nov 1, 2020 23:35 |
|
Nth Doctor posted:Size of the tables you are joining is irrelevant. Lat and long are null because of some combination of: Thank you. I'll try that out. I appreciate all the help, folks. Nice to have threads like this.
|
# ? Nov 1, 2020 23:55 |
|
LostMy2010Accnt posted:So following your advice I ran it in SQLite and the output is definitely showing my tables are not merging properly. My lat and long columns are showing up as Null for all 1200 of my entries. I'm definitely getting two entries that are merging correctly, but the rest are Null. So clearly my issue is joining the two tables which is happening before I get to the dataframe. Yeah, as above, the size of the tables doesn't matter for your results, as long as it's not generating results so large that the query is crashing on resources. The other thing I'd flag is that your query is explicitly allowing the results to be NULL, because you're left joining. In your end output where you need a lat/long value you should inner join (just JOIN, inner is optional everywhere) so that the join must complete to give results, otherwise you're going to get all the ones where it doesn't match. Fixing those in development or another data quality query is probably going to be useful; a way to find all the results where it's not giving you what you expect can be done by doing code:
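The unmatched-rows check being described would look something like this, using the thread's table names:
SQL code:
-- every distinct job location that has no matching city in the cache
SELECT DISTINCT j.location
FROM combined_jobs AS j
LEFT JOIN location_cache AS l
  ON j.location = l.city
WHERE l.city IS NULL;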
|
# ? Nov 2, 2020 01:35 |
|
Moonwolf posted:Yeah, as above, the size of the tables doesn't matter for your results, as long as it's not generating results so large that the query is crashing on resources. The other thing I'd flag is that your query is explicitly allowing the results to be NULL, because you're left joining. In your end output where you need a lat/long value you should inner join (just JOIN, inner is optional everywhere) so that the join must complete to give results, otherwise you're going to get all the ones where it doesn't match. Fixing those in development or another data quality query is probably going to be useful; a way to find all the results where it's not giving you what you expect can be done by doing Thank you for the advice, I'm going to be working on it some more today so I'll put this advice to good use. Again, I appreciate everyone's input; thank you for the help.
|
# ? Nov 2, 2020 12:51 |
|
here's some quirky poo poo i've never noticed before: if i have two columns: FedTaxID (nvarchar(10)) SSNHashed (varbinary(8000)) and i do a select coalesce(FedTaxID,SSNHashed) this is what the output looks like when FedTaxID is NULL i assume this is due to some implicit cast forcing the hash to be interpreted as some bastardized ASCII? this doesn't affect my results materially at all i just thought it was an interesting curiosity that i've never noticed before
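A hypothetical repro of the quirk (HASHBYTES standing in for whatever produced SSNHashed): by SQL Server's data type precedence, nvarchar outranks varbinary, so the result is typed nvarchar and the hash bytes come back reinterpreted as character junk:
SQL code:
-- sketch (SQL Server): COALESCE types its result by data type precedence,
-- so the varbinary hash is implicitly cast to nvarchar when FedTaxID is NULL
SELECT COALESCE(CAST(NULL AS nvarchar(10)),
                HASHBYTES('SHA2_256', N'123-45-6789')) AS mixed_result;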
|
# ? Nov 2, 2020 17:50 |
|
kumba posted:here's some quirky poo poo i've never noticed before: if i have two columns: The Microsoft Documentation posted:Returns the data type of expression with the highest data type precedence. If all expressions are nonnullable, the result is typed as nonnullable.
|
# ? Nov 2, 2020 18:16 |
|
I'm an RN in an EHR (Epic) analyst position. I want to get more on the data side of the house. I have experience with querying the EDW, python+pandas, data viz etc... and am currently learning about virtualization and containers. Are there any certifications I can work on over the next year that will make me more marketable for BI/ETL/EDW Developer positions? I've really enjoyed my exposure to this work and I think I need more tech-centric certs on my resume to balance out my clinical expertise and lack of a data-centric role.
|
# ? Nov 3, 2020 03:26 |
|
kumba posted:i assume this is due to some implicit cast forcing the hash to be interpreted as some bastardized ASCII? that's not ASCII. It's displaying it as Windows-1252, with box characters for control codes/invalid bytes.
|
# ? Nov 3, 2020 11:48 |
|
Hughmoris posted:I'm an RN in an EHR (Epic) analyst position. I want to get more on the data side of the house. I have experience with querying the EDW, python+pandas, data viz etc... and am currently learning about virtualization and containers. certifications are worthless as indicators of ability except as a gatekeeping mechanism or are so vendor specific as to be meaningless (most vendor certs are run by their marketing depts) all the interesting medical data is locked inside the records, which, well, lmao you aren’t getting anything useful out of HL7 certs except as a means to get a job where you won’t do anything useful
|
# ? Nov 4, 2020 01:18 |
|
Moonwolf posted:Hmm, I thought they had, it might have just been EDB or 2nd Quadrant talking about it in the roadmap, since I can't find it in the docs. It is row regex though, including recursion and greedy/non-greedy in the Oracle implementation. im horny for match_recognize it’s so nice and not even that hard if u can handle general NFA/DFA
|
# ? Nov 4, 2020 01:20 |
|
DELETE CASCADE posted:imo the bolded part is the galaxy brain way to do sql. i mean of course there's more to it, but it sounds to me like you know quite well what you're doing, and that interviewer sucked. the fact that you even know about common table expressions probably puts you in the 90th percentile Thanks man. Was driving me nuts. Update to this: I got a perfect score on the SQL technical interview at Facebook and the person clapped excitedly when I did. Ended up crushing the whole interview and got an offer three days later. Best part? I was able to turn down Facebook entirely for another job I also got a perfect score on. For this one, the data Big Boss for the entire company gave me the technical screening. People ops said he was blown away. Got a kickass team and a 30% pay bump when I start. gently caress Twitch for getting verbally upset and having one of the interviewers leave the Zoom because "CTEs aren't performant". Also gently caress Autodesk for making me do a six-hour take-home SQL test and then ghosting me after I turned it in. (really???) So glad I'm done with this job hunting shitshow.
|
# ? Nov 7, 2020 18:24 |
|
Knot My President! posted:Thanks man. Was driving me nuts. Very nice, well done. quote:gently caress Twitch for getting verbally upset and having one of the interviewers leave the Zoom because "CTEs aren't performant". Also gently caress Autodesk for making me do a six-hour take-home SQL test and then ghosting me after I turned it in. (really???) So glad I'm done with this job hunting shitshow. That's hilariously unprofessional interviewing, wtf? Also CTEs are at worst identical to inline subqueries, but you give the db more chance to temp-table them and not have to run it again if you're reusing it. What the hell was that person on?
|
# ? Nov 7, 2020 18:51 |
|
Moonwolf posted:Also CTEs are at worst identical to inline subqueries That's not always a guarantee depending on the vendor. Notably, in Postgres before 12.0 subqueries could be inlined but CTEs couldn't, so for simple filters and self-joins you might get better performance by skipping the CTE.
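Postgres 12 also lets you control the fence explicitly per CTE; a quick sketch (the hits table and ts column are made-up names):
SQL code:
-- PG 12+: NOT MATERIALIZED lets the planner inline the CTE;
-- MATERIALIZED forces the old fence behavior
WITH recent_hits AS NOT MATERIALIZED (
  SELECT * FROM hits WHERE ts > now() - interval '1 day'
)
SELECT count(*) FROM recent_hits;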
|
# ? Nov 7, 2020 19:13 |
|
Moonwolf posted:That's hillariously unprofessional interviewing, wtf? Yeah, that's an absurd reaction to someone giving an answer you don't like to a technical question. A more reasonable reaction would be to say to the candidate something like "OK, but I see you used a CTE. Our house style says that we avoid using CTEs because we believe they're often unperformant. How would you do it without a CTE?" That lets you test the candidate on how cooperative they are, whether they can vary their approach when new requirements are presented, and so on. If you don't have the time to engage with the candidate at that level then you don't have any reason to be in attendance at the interview. Either that or you have so many eager candidates you can afford to be an rear end in a top hat to them and reject many of them on unrigorous heuristic grounds. Of course for a very well-known company like Twitch this might be the case.
|
# ? Nov 8, 2020 14:21 |
|
Hammerite posted:If you don't have the time to engage with the candidate at that level then you don't have any reason to be in attendance at the interview. Either that or you have so many eager candidates you can afford to be an rear end in a top hat to them and reject many of them on unrigorous heuristic grounds. Of course for a very well-known company like Twitch this might be the case. Reminds me of the old joke "I discard half the resumes I receive at random. I don't want to hire unlucky people."
|
# ? Nov 9, 2020 09:06 |
|
Moonwolf posted:That's hilariously unprofessional interviewing, wtf? Yes, and this is the second person I've seen mentioning unprofessional interviews at Twitch in the last few months. The other was on Discord so I can't link it here.
|
# ? Nov 9, 2020 17:13 |
|
I have a table recording IPs and time (among some other things) when a page is hit. I'd like to be able to run some queries against both the total hits and the unique hits. I had assumed grouping by IPs would do the trick but MySQL doesn't like that anymore and I get an error: code:
I've added a 'unique' boolean column to the table which sets to true each time a new IP is entered. It works fine for now but it feels like this should be a fairly simple query but I'm just drawing a blank on it. I suppose what I want is to be able to just select a row if it's the first time that IP shows up.
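If the goal is just totals versus uniques, no flag column is needed at all; assuming a hits table with an ip column, something like:
SQL code:
-- total page hits and distinct IPs in one pass
SELECT COUNT(*)           AS total_hits,
       COUNT(DISTINCT ip) AS unique_hits
FROM hits;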
|
# ? Nov 13, 2020 16:58 |
|
Tea Bone posted:I have a table recording IPs and time (among some other things) when a page is hit. I’m not a MySQL expert by any means but I think code:
is what you’re looking for
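Presumably a grouped first-hit-per-IP query along these lines (a guess; hit_time is a stand-in column name):
SQL code:
-- one row per IP, keeping the earliest hit time (legal under ONLY_FULL_GROUP_BY)
SELECT ip, MIN(hit_time) AS first_seen
FROM hits
GROUP BY ip;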
|
# ? Nov 13, 2020 20:45 |
|
Tea Bone posted:I have a table recording IPs and time (among some other things) when a page is hit. I'm not sure what you're trying to do, but if you want a query that returns something like ip, seen_at, first_seen_at then you can use a window function for that: code:
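The window-function query would be roughly this (a sketch; hit_time stands in for the actual time column, and it needs MySQL 8+ or any DB with window functions):
SQL code:
-- every hit, annotated with that IP's first-ever hit time
SELECT ip,
       hit_time AS seen_at,
       MIN(hit_time) OVER (PARTITION BY ip) AS first_seen_at
FROM hits;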
|
# ? Nov 14, 2020 02:13 |
|
We have a select box that basically lets you do a search based on a user where some other records must exist for the user that owns that record. So something like: "SELECT record WHERE user IN (SELECT user FROM othertable WHERE targetfield = a AND otherconditions) AND user IN (SELECT user FROM othertable WHERE targetfield = b AND otherconditions)..." "otherconditions" are all the same for all the subqueries except targetfield changes. Is there a way to express that we want "select all users from othertable WHERE user has instances of records where targetfield=a,b,c exist"? I'm coming up with some type of CTE maybe or group aggregation function? Like we LISTAGG(a,b,c) and then somehow check that all of a,b,c are contained in the aggregated string? DB is Oracle 11g. edit: maybe something along the lines of "SELECT record WHERE user IN (SELECT user FROM othertable WHERE otherconditions GROUP BY user HAVING COUNT(targetfield in (a,b,c)) > 3"... but there can also be multiple rows where targetfield = a, so we really need the count of distinct targetfield values that are in (a,b,c). Paul MaudDib fucked around with this message at 19:54 on Nov 16, 2020 |
# ? Nov 16, 2020 19:35 |
|
Paul MaudDib posted:We have a select box that basically lets you do a search based on a user where some other records must exist for the user that owns that record. So something like: The actual column for 'targetfield' is the same, it's just the values a, b, c that differ? How about using a built-in collection type constructor? SELECT user FROM othertable WHERE targetfield IN sys.odcivarchar2list(a, b, c) AND otherconditions There's also sys.odcinumberlist and sys.odcidatelist. (All three are VARRAYs, which are kinda eh but I guess if you only have a few elements the limitations won't matter.) If neither suffices, I suppose you could just create a custom collection type and use its default constructor.
|
# ? Nov 16, 2020 19:52 |
|
I think this works: SELECT record WHERE user IN (SELECT user FROM othertable WHERE otherconditions AND targetfield in (a,b,c) GROUP BY user HAVING COUNT(distinct targetfield) = 3) So: WHERE to select only rows with the desired targetfield values, group by user, COUNT(DISTINCT targetfield) to count the unique targetfield values inside each group, and return the users where that count = 3 (or however many values are in the IN clause). Paul MaudDib fucked around with this message at 01:41 on Nov 17, 2020 |
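Spelled out with the missing FROM clauses it reads something like this (records, othertable, and user_id are stand-ins; user itself is a reserved word in Oracle):
SQL code:
-- keep records whose owner has at least one row for each required targetfield value
SELECT r.*
FROM records r
WHERE r.user_id IN (
  SELECT o.user_id
  FROM othertable o
  WHERE o.targetfield IN ('a', 'b', 'c')
    -- AND <otherconditions>
  GROUP BY o.user_id
  HAVING COUNT(DISTINCT o.targetfield) = 3
);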
# ? Nov 16, 2020 20:10 |
|
It would be interesting to see the performance difference EXPLAINed between HAVING; or SELECT user, count(thing); or SELECT INTERSECT.
|
# ? Nov 17, 2020 04:12 |
|
i don't touch oracle unless the task is to migrate away from it to postgres. so possibly my advice here will be incorrect. but here we go: i essentially NEVER use "select a from b where c in (select d from e where ...)", and instead i ALWAYS use the form "select a from b where exists (select 1 from e where c = d ...)" to express the semi-join. the ONLY time i use the "in (...)" is when i have a small list of constants. and i NEVER use "not in (...)" due to the issues that arise if you ever have a null in the list. so basically, in production queries, i effectively avoid this sort of thing completely.
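In the letters of that schema, the IN-to-EXISTS rewrite looks like this (schematic only; c is a column of b, d a column of e):
SQL code:
-- semi-join via EXISTS instead of IN, correlated on the join column
SELECT b.a
FROM b
WHERE EXISTS (
  SELECT 1
  FROM e
  WHERE e.d = b.c
    -- AND <other conditions on e>
);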
|
# ? Nov 17, 2020 07:26 |
|
I'm not familiar with Oracle either but I'd go for what DELETE CASCADE mentions as well and just do code:
e: I think this is actually relational division and could be done with nested NOT EXISTS clauses but I'm not smart enough to figure out how TheFluff fucked around with this message at 12:58 on Nov 17, 2020 |
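For the record, the textbook nested-NOT-EXISTS form of relational division would be something like this (Oracle-flavored sketch; the required values are inlined via dual, user_id is a stand-in):
SQL code:
-- users for whom no required targetfield value is missing from othertable
SELECT DISTINCT o1.user_id
FROM othertable o1
WHERE NOT EXISTS (
  SELECT 1
  FROM (SELECT 'a' AS v FROM dual
        UNION ALL SELECT 'b' FROM dual
        UNION ALL SELECT 'c' FROM dual) required
  WHERE NOT EXISTS (
    SELECT 1
    FROM othertable o2
    WHERE o2.user_id = o1.user_id
      AND o2.targetfield = required.v
  )
);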
# ? Nov 17, 2020 12:31 |
|
.
Paul MaudDib fucked around with this message at 07:17 on Nov 18, 2020 |
# ? Nov 18, 2020 00:01 |
|
Couldn't let it go, and I figured out how to do it without a bunch of stacked exists queries: SQL code:
TheFluff fucked around with this message at 15:00 on Nov 18, 2020 |
# ? Nov 18, 2020 12:37 |
|
There must be something easy I'm missing about grouping data with non-aggregate columns. My data looks like this: code:
code:
|
# ? Nov 18, 2020 21:16 |
MIN(employee_first), MIN(employee_last)
|
|
# ? Nov 18, 2020 21:34 |
|
kiwid posted:There must be something easy I'm missing about grouping data with non-aggregate columns. code:
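Presumably something like this (employee column names taken from the posts above; timesheet and hours are guesses at the rest of the schema):
SQL code:
-- group on the key, wrap the functionally dependent name columns in MIN()
SELECT employee_number,
       MIN(employee_first) AS employee_first,
       MIN(employee_last)  AS employee_last,
       SUM(hours)          AS total_hours
FROM timesheet
GROUP BY employee_number;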
|
# ? Nov 18, 2020 21:35 |
|
min and max are the perfect solutions for my problem, I knew it'd be something simple. Much better than what I was doing which was select distinct employee_number, first, last then joining via a subquery in the from.
|
# ? Nov 18, 2020 21:53 |
|
Not sure if this is more a DB question or an Azure question but I'm trying here first: we have a failover group for our production Azure SQL database that I usually connect to if I'm querying production, instead of connecting to the individual SQL Server instance. When I connect to the group, I can see our pre-production database, even though it only exists on one of the servers that the production database is on. While connected to that group, if I run code:
Thinking about it, it's weird that I can even see the preprod database in the tree when I connect to the group since preprod isn't even in a failover group at all. Also weird to me that Azure has the concept of a server here, because the database seems to be the level where resource allocation (billing) happens. Anyway, anyone know what the heck I'm even querying when I connect to the group that the DB doesn't participate in and then ask for that DB's physical stats?
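A plausible stand-in for the kind of stats query in question (a guess at the DMV being used; the actual query from the post is not shown):
SQL code:
-- hypothetical: per-index physical stats for the current database
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       ips.index_id,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips;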
|
# ? Nov 24, 2020 17:17 |
|
switch to postgres, you'll save your company money on ms sql licenses, and then i might have the slightest idea of what you are even asking
|
# ? Nov 24, 2020 18:47 |