Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
Hughmoris
Apr 21, 2007
Let's go to the abyss!
Is there a recommended dummies book for understanding the basics of SQL? I just started a new job in hospital clinical informatics. I doubt I'll ever write to a database but I'm sure it would be handy if I knew how to read and pull reports from them. I'm trying to teach myself VBA and Python to that end.

Adbot
ADBOT LOVES YOU

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I need to learn how to write SQL queries for work. I just stepped through the free Stanford Relational Algebra videos and they were surprisingly engaging. Next is the SQL section of videos.

I know that often times theory can be learned in the classroom but never knowingly utilized in the field. For those who write queries, how often (if ever) do you think about relational algebra?

Hughmoris fucked around with this message at 01:55 on Dec 31, 2017

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Is there a recommended formatting/style guide I should adhere to as I start learning to write SQL queries? I'm using SSMS but the computer is pretty locked down and I don't think I'll be able to use plugins to help.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Thanks for the answers. One more beginner question: Should I be spending my time learning how to use the query builder/designer in SSMS, or should I focus instead on writing queries by hand for a while?

Hughmoris
Apr 21, 2007
Let's go to the abyss!

NihilCredo posted:

I work in an office full of young support people who use SSMS four out of eight hours a day but who couldn't tell you what a trigger is. I warned them several times that by blindly following wizards and menus, instead of learning the actual SQL commands and how they work, they're practically leaving money on the table (in the form of valuable skills and future job prospects). Please don''t do the same.

Thanks for the advice. I'm not interested in taking short cuts if it'll possibly handicap me down the road.

I just finished up the Wise Owl: SQL Server Queries series, and am working my way through some practice problems. Are there any SQL rooms in IRC or Discord that people hang out in, to get feedback on some simple queries I'm writing?

For this particular problem: I have a Movies DB, and I want to know who the oldest actor is for each movie. My query returns the results but are there any easy improvements to be had?



SQL code:
WITH my_select
AS (SELECT
    FilmName,
    MIN(ActorDOB) AS BirthDate
FROM tblCast AS c
INNER JOIN tblActor AS a
    ON c.CastActorID = a.ActorID
INNER JOIN tblFilm AS f
    ON c.CastFilmID = f.FilmID
GROUP BY FilmName)

SELECT
    FilmName,
    ActorName,
    ActorDOB
FROM my_select
INNER JOIN tblActor
    ON my_select.BirthDate = tblActor.ActorDOB
ORDER BY FilmName

Hughmoris
Apr 21, 2007
Let's go to the abyss!

NihilCredo posted:

One mistake: your query will return, for each movie, all the actors that share the same birth date as the oldest actor in the movie, even if those actors didn't appear in the movie.

I've been trying my hand at fixing this for the past 45 minutes, can't figure it out. Any tips? I can explain what I need to do in pseudo code but I'm not sure how to tackle it in SQL.

Hughmoris fucked around with this message at 02:58 on Jan 2, 2018

Hughmoris
Apr 21, 2007
Let's go to the abyss!

kumba posted:

You want to rank the birth dates within each movie, and only select the person with the lowest birthdate for each movie, so I'd use a ranking function in your CTE:


SQL code:
WITH my_select
AS (SELECT
    FilmName,
    ActorName,
    ActorDOB AS BirthDate,
    ROW_NUMBER() OVER (PARTITION BY f.FilmID Order By a.ActorDOB asc) AS BirthDateRank
FROM tblCast AS c
INNER JOIN tblActor AS a
    ON c.CastActorID = a.ActorID
INNER JOIN tblFilm AS f
    ON c.CastFilmID = f.FilmID
)

SELECT
    FilmName,
    ActorName,
    ActorDOB
FROM my_select
WHERE BirthDateRank = 1
ORDER BY FilmName
Note: this will still run into a problem if two actors have the same birth date and they are the oldest, since this will result in something non-deterministic. You'll want to look at the various ranking functions (RANK, DENSE_RANK, ROW_NUMBER, etc) and what other criteria you'd want as a tiebreaker for the edge case.

Thank you. I haven't seen ROW_NUMBER() or PARTITION BY in my adventures yet, so I'll do some reading along with the ranking functions.

NihilCredo posted:

The subquery is fine.

You don't want to join on all the actors with the same birth date, but on all the actors with the same birth date in that movie.

Add a join on tblCast to the main query.

I'll give that a try. Thanks!

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Ruggan posted:

Windowed functions are really awesome and powerful, and I use them a lot. However, you need to be really careful with the window's frame. The frame of a window defines the subset of rows in which the function will work. The default frame is "BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW". That basically means, when analyzing this function across all your returned rows, only analyze rows that are between the beginning of your PARTITION and the row being calculated (not the end of your PARTITION).

This can introduce a very problematic "gotchya" where your window function isn't returning the data you'd expect because the frame isn't set appropriately. If you use the LAST_VALUE() function, for example, the last row will always be the current row:



Make sure you read up on windowed functions and frames before you start using code in production.

https://www.red-gate.com/simple-talk/sql/learn-sql-server/window-functions-in-sql-server-part-2-the-frame/

Thanks for the write up and link. What type of problems make you think "Hmmm, I need to use a window function here"?

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Ruggan posted:

The classic example is to find the value associated with a local min or max. Let's use an example - we'll do something intuitively simple: stock prices. First, let's create some data:

SQL stuff...

You're awesome. I really do appreciate you typing that up, and now windowed functions make much more sense.

Another question for you: I did notice that you're using the '#' symbol throughout your code. This is the first time I'm seeing that symbol. A quick google tells me that it is "Prefix local temporary table names". Are you using it because you need it for this example, or do you use it in your day to day work too?

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Nth Doctor posted:

# means a temp table, which is similar to a table variable except that it is actually created as a SQL object in the Temp database. I use then for scratch pad type stuff on sets of data, or for dashing up an example for this thread. They are basically scoped to one connection and should get explicitly cleaned up when you're done with them.

Thanks for the clarification.

I just wrote my first basic query that actually produces data I can use for my job. Feels good. Hopefully this is the beginning of the end of submitting data requests and waiting on others to create reports for me.

I have a question about how to think about a different problem in SQL. I've solved it in Python but I have to go through the headache of splitting queries in to different CSV files and writing more code.

I have two tables. TABLE A is occurrences, TABLE B is time windows. For each occurence in TABLE A, I need to see if it fell within any of its available time windows. I just need a Y/N or 0/1 to signify if it did or not. Any guidance on how to approach this is appreciated.

TABLE A:


TABLE B:


FINAL RESULT:

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Kuule hain nussivan posted:

My first, possibly inefficient mock for this.

SELECT a.ID, a.Occurence_time, CASE WHEN (ISNULL(b.ID, 0) = 0) THEN 0 ELSE 1 END AS Pass
FROM Table_A AS a
LEFT JOIN Table_B AS b
ON b.ID = a.ID AND a.Occurence_time >= b.Start_time AND a.Occurence_time <= b.Stop_time

I'll give this a shot. Thanks!

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Still learning my way through SQL, a bit stumped on how to tackle something with SQL Server.

I have a query that returns the results I want
code:
(department, datetime)
('AA', 2018-01-01 13:00)
('AA', 2018-01-01 20:00)
('BB', 2018-01-12 03:00)
('BB', 2018-01-12 14:55)
What I want is to add an extra column that calls out if the event occured on 'day' or 'night' shift. Day shift is defined at 07:00 - 18:59. Night shift 19:00 - 06:59.

Ideally, the end result is:
code:
(department, datetime, shift)
('AA', 2018-01-01 13:00, 'DAY')
('AA', 2018-01-01 20:00, 'NIGHT')
('BB', 2018-01-12 03:00, 'NIGHT)
('BB', 2018-01-12 14:55, 'DAY')
I'm reading up on CAST and CASE statements, not sure if I should be using one or both.

Any guidance is appreciated.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Nth Doctor posted:

code:
CASE
WHEN
DATEPART(HOUR, [datetime]) >= 7 AND
DATEPART(HOUR, [datetime]) < 19
THEN 'DAY'
ELSE 'NIGHT'
END AS Shift
The syntax may need a tweak here or there, but that's the gist.

As a stylistic thing, I would refrain from naming things reserved words if I we're you.

Good luck and welcome to SQL!


Edit: finally was able to get in front of a console and here's what works:
code:
DECLARE @Foo DATETIME = '1/1/2017 1:00 PM'

SELECT CASE
WHEN
DATEPART(HOUR, @Foo) >= 7 AND
DATEPART(HOUR, @Foo) < 19
THEN 'DAY'
ELSE 'NIGHT'
END AS [Shift]
TIL Shift is a SQL reserved word.

You're a saint. Thanks!

Hughmoris
Apr 21, 2007
Let's go to the abyss!
SQL rookie question:

I was given an excel report with 4,000 account IDs and they want me to look up the associated account names in SSMS. It's a single table that contains both but it is greater than 5 million rows. I initially tried exporting all account IDs and account names from SSMS to Excel and do a simple VLOOKUP but Excel had a heart attack.

What's good practice for something like this? My second thought was a simple:
SQL code:
SELECT AccountID
       AccountName
  FROM Accounts
 WHERE AccountID IN (
          ...list of 4,000
 )

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Thanks for the ideas. I did the Excel method mentioned above and it worked great.

As a learning exercise I'm going to try to create a table and figure out how to slurp in the Excel sheet, since I haven't dealt with creating tables yet.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I'm still learning the basics of SQL Server and need a little guidance on a problem:

I have a table of hospital admissions and I'm joining it to a table of lab procedures. I end up with something like this:

code:
patient, admission_time, lab_type, lab_time

john doe, 5/24/2018 10:00, ekg, 5/24/2018 10:15
john doe, 5/24/2018 10:00, ekg, 5/26/2018 23:00
john doe, 5/24/2018 10:00, mri, 5/24/2018 11:28
john doe, 5/24/2018 10:00, mri, 5/27/2018 14:50
sally field, 5/17/2018 09:40, ekg, 5/17/2018 09:50
sally field, 5/17/2018 09:40, ekg, 5/21/2018 13:22
sally field, 5/17/2018 09:40, mri, 5/17/2018 16:10
....
I'd like to end up with is a table that shows the patient, their admission time, and the first time they received an ekg and mri.

code:
patient, admission_time, 1st_ekg_time, 1st_mri_time

john doe, 5/24/2018 10:00, 5/24/2018 10:15, 5/24/2018 11:28
sally field, 5/17/2018 09:40, 5/17/2018 09:50, 5/17/2018 16:10
...
What is a good way to approach this?

Hughmoris fucked around with this message at 05:14 on Jun 19, 2018

Hughmoris
Apr 21, 2007
Let's go to the abyss!

anthonypants posted:

Dunno what flavor of SQL you're in but you're probably looking at a Common Table Expression here. Something like this:
code:
WITH 1st_ekg ( ekg_pt, ekg_time )
AS
(
SELECT patient, lab_type, lab_time
FROM table
WHERE lab_type = 'ekg'
ORDER BY lab_time DESC
)

WITH 1st_mri ( mri_pt, mri_time )
AS
(
SELECT patient, lab_type, lab_time
FROM table
WHERE lab_type = 'mri'
ORDER BY lab_time DESC
)

SELECT table.patient, table.admission_time, 1st_ekg.ekg_time AS 1st_ekg_time, 1st_mri.mri_time AS 1st_mri_time
FROM table
INNER JOIN 1st_mri
ON table.patient = 1st_mri.mri_pt
INNER JOIN 1st_ekg
ON table.patient = 1st_ekg.ekg_pt

Thanks for taking the time to type that up. I'm using SQL Server, and am starting to explore CTEs. I believe I see how your solution is working, and will give it a go in the morning. Thanks again!

Hughmoris fucked around with this message at 05:25 on Jun 19, 2018

Hughmoris
Apr 21, 2007
Let's go to the abyss!

quote:

Hospital Lab SQL stuff...

Thanks everyone for the tips, I was able to get the results I needed.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I have a beginner T-SQL question that is tripping me up:

I have a table:
code:
ORDER, PERSON, TRANSACTION
aaa, John, 2018-07-02 13:00
aaa, Bill, 2018-07-02 09:00
aaa,Carl, 2018-07-02 18:00
bbb, Sarah, 2018-07-03 19:00
bbb, Julie, 2018-07-03 22:00
bbb, Phil, 2018-07-03 20:15
What I want to end up with is a table that shows: for each order, what is the earliest transaction time and person, and what is the last transaction time and person.
So my end result is ideally something like this:
code:
ORDER, FIRST_TRANSACTION, FIRST_PERSON, LAST_TRANSACTION, LAST_PERSON
aaa, 2018-07-02 09:00, Bill, 2018-07-02 18:00, Carl
bbb, 2018-07-03 19:00, Sarah, 2018-07-03 22:00, Julie
I'm flailing a bit at this point. I've tried using MIN, MAX but I end up getting a row for each person. I remember seeing somewhere a method where you ORDER BY a sub query and LIMIT 1 to get the first or last result but I'm positive what that would look like.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Hammerite posted:

There are multiple ways of achieving this sort of result IIRC. One way to do it is to construct a subquery that gets the minimum and maximum for each order, then join it against two copies of the original table so as to get the PERSON data from that table.

code:
SELECT
    Subquery.ORDER,
    Subquery.FIRST_TRANSACTION,
    MyTable_First.PERSON AS FIRST_PERSON,
    Subquery.LAST_TRANSACTION,
    MyTable_Last.PERSON AS LAST_PERSON
FROM
    (
        SELECT
            ORDER,
            MIN(TRANSACTION) AS FIRST_TRANSACTION,
            MAX(TRANSACTION) AS LAST_TRANSACTION
        FROM
            MyTable
        GROUP BY
            ORDER
    ) AS Subquery
    JOIN MyTable AS MyTable_First ON
        Subquery.ORDER = MyTable_First.ORDER AND
        Subquery.FIRST_TRANSACTION = MyTable_First.TRANSACTION
    JOIN MyTable AS MyTable_Last ON
        Subquery.ORDER = MyTable_Last.ORDER AND
        Subquery.LAST_TRANSACTION = MyTable_Last.TRANSACTION

I was able to apply this concept to my problem and it worked like a champ. Thank you!

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I'm continuing my journey of learning SQL, specifically T-SQL. I'm starting to explore how to set a query to run weekly on a specific weekday.

I'm an analyst that is not part of the server or date warehouse team. Assuming my company has all the typical SQL reporting tools (SSRS?), what should I read up on for the simplest way to schedule a T-SQL query to run every Tuesday?

Ideal world, the query runs every Tuesday and it emails the resultant CSV file to somebody.

Hughmoris fucked around with this message at 05:39 on Aug 2, 2018

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Cold on a Cob posted:

Read up on the sql agent jobs. If all you want is csv results, you can use sp_send_dbmail from inside your sql job.

Relevant links:
Agent/Jobs: https://docs.microsoft.com/en-us/sql/ssms/agent/sql-server-agent?view=sql-server-2017
Sending Email from T-SQL: https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-send-dbmail-transact-sql?view=sql-server-2017

Thanks for the guidance and links!

Say I wanted to end up getting fancier than a CSV. I have an Excel workbook that I use as a reporting template. Basically I open the workbook template, refresh my data with the latest CSV file which in turn updates my pivot tables, then save and email out.

I'm all for automating as much as possible. Do you see a path to accomplish this given the need to run the query first? I've fairly comfortable with VBA and basic scripting (not T-SQL related) but my workstation isn't always on when it needs to run.


*Well that sucks. Looks like I dont have access to SQL Agent Jobs.

Hughmoris fucked around with this message at 16:00 on Aug 2, 2018

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Thanks for the ideas on how to schedule a weekly query and email.

I have an older coworker who is a Microsoft Excel and Access guru. He pretty much exclusively uses the Query Designer (visual designer) in Access, and he is struggling a bit in transitioning to T-SQL since the Query Designer in SSMS is poor compared to Access's (per him).

Are there any third party tools I can point him at that have a great visual query designer, and can hook in to SQL Server?

Hughmoris
Apr 21, 2007
Let's go to the abyss!
T-SQL novice, and I see a little guidance on how to approach a problem.

I have a table that contains patients and their ordered medications. I have a second table which contains those patients and their lab results. What I want to do is evaluate what impact a medication order can have one their follow-up lab value.

Medication_Table:
code:
patient, medication, administration_time
john, potassium supplement, 2018-08-01 08:30
john, magnesium supplement, 2018-08-02 09:00
mary, phosphorus supplement, 2018-08-05 15:00
phil, magnesium supplement, 2018-08-03 13:15
Lab Result Table:
code:
patient, lab_order, lab_order_time, lab_result
john, potassium, 2018-07-25 13:00, 4.3
john, potassium, 2018-08-01 10:30, 4.5
phil, magnesium, 2018-08-04 04:00, 1.8
Ideal Final Table:
code:
patient, medication, administration time, next_relevant_lab_order, next_relevant_lab_order_time, next_relevant_lab_result
john, potassium supplement, 2018-08-01 08:30, potassium, 2018-08-01 10:30, 4.5
phil, magnesium supplement, 2018-08-03 13:15, magnesium, 2018-08-04 04:00, 1.8
I guess the plain english version is that for every electrolyte supplement given, I want to look to that patient's closest subsequent electrolyte lab value to see what the effect is.

Is this something that Window functions / Lead functions are meant for?

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Ruggan posted:

Windowed functions are intended for doing some sort of partitioned or ordered calculation on your dataset. Lag and Lead are specifically for finding the last or next value of a row without needing to do some crazy joins and filtering. Usually you would use windowed functions on a single dataset (like an individual table or maybe a product of joins in the right situation). Let’s say you were trying to find the value of the prior lab result for the same lab: using Lag lets you skip hairy aggregations and self-joins.

If I were approaching this problem, I would probably start by trying the easiest logical solution: an outer apply select top 1 lab where lab_date > admin_date order by lab_date. Applies can have worse performance than joins because they’re basically evaluating each row individually, so that might not be your ideal solution.

If that performed poorly, then I might try left joining labs to admins on lab_date > admin_date and aggregating min lab_date and grouping on admin_id - effectively giving you the first lab after each admin - and using that to join out to the right data for every admin and it’s next lab. That’s the traditional join based solution. You might be able to further optimize by capping your join on something like lab_date < dateadd(hours, admin_date, @dosage_effect_duration) - that could reduce the number of rows joined (reads).

I learned something new today! I've never used the outer apply method before so I tried that out and it worked like a charm. To your point though, the performance is not great due to having to read through a lot of rows. I'll try my hand at the left join technique you spoke to. Thanks for the help!

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Hmm. Yeah, I think I'm going to have to figure out how to use the second method you recommended. The outer apply query is taking about 7 minutes for 1 month.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
In TSQL, is there a difference between putting a clause in the JOIN statement versus in the WHERE statement?

I'm still learning SQL and I've seen it mentioned but I'm not sure I fully understand what occurs.

SQL code:
SELECT *
FROM T1
LEFT JOIN T2 ON T1.Key = T2.Key AND Event=AAA
SQL code:
SELECT *
FROM T1
LEFT JOIN T2 ON T1.Key = T2.Key
WHERE Event=AAA
Will those queries return different results? What's the fundamental difference of putting the clause in the JOIN statement versus the WHERE statement? Or is there none?

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Thanks for the links, and the other helpful information.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Is there any recommended reading for SQL performance tuning for beginners?

I'm using SQL Server, and have a query that takes roughly 2 minutes to run. The query runs overnight and is not mission critical but I'd like to try my hand at speeding it up. I'm not quite sure where to start reading/digging.

*I'm not a DBA, just an analyst pulling data.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Ruggan posted:

I think you’re both expressing the same thing - that SQL is (mostly) declarative - and that you should generally write queries in the most human parseable format as the optimizer is usually pretty good at determining an efficient control flow.

Just a quick reminder though that a declarative language is only as efficient as its language’s implementation, which while for SQL is great, it’s not perfect. Especially for complex queries, syntax and order of joins can make a difference. Here’s a really good article on the subject: https://explainextended.com/2009/07/12/double-thinking-in-sql/

Back to the original topic, I disagree that all filtering conditions belong in the where clause. Even in a query fully composed of inner joins I often find the query easier to read when certain filter criteria of table B (in a A inner join B scenarios) are written as a part of the join condition. This makes it clear to me exactly which rows from table B are being considered for joining rather than needing to jump down to the where clause. But I acknowledge that this is all personal opinion.

Thanks for the link, was an interesting read.

rt4 posted:

The biggest thing for reads is to EXPLAIN and see if it does any table scans. Another important pitfall to beware of is correlated subqueries.

I'll take a look at EXPLAIN, thanks!

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Thanks for all the ideas. This is dealing with healthcare data so I err on the side of caution in regards to posting relevant details but I'll research the topics pointed out.

I do have a problem that has turned out to be a stumper for me and two others I work with:

I have a table that contains alarms and acknowledgements. An acknowledgement can clear multiple alarms. For a given alarm, when was the next acknowledgment that occurred?

Original Table
code:
Patient, Event, EventTime
Joe, Alarm, 12/10 1300
Sally, Alarm, 12/10 1305
Sally, Alarm, 12/10 1306
Sally, Acknowledgement,12/10 1308
Joe, Acknowledgment,12/10 1313
Ideal Endstate
code:
Patient,Alarm, AlarmTime, Acknowledgement, AcknowledgementTime
Joe, Alarm, 12/10 1300, Acknowledgement, 12/10 1313
Sally, Alarm, 12/10 1305, Acknowledgement, 12/10 1308
Sally, Alarm, 12/10 1306, Acknowledgement, 12/10 1308
I thought this would be easy when I first received the request but its stumping a few of us SQL rookies. Any suggestions on how to approach it?

I've started utilizing window functions more but mainly for first and last values. This feels like I could use LEAD but I'm not sure how to use the conditional of AcknowledgeTime > AlarmTime.

Hughmoris fucked around with this message at 00:49 on Dec 11, 2018

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Nth Doctor posted:

Good stuff...

Thanks for this!

Nth Doctor posted:

Ugh. I don't like schemas like this.

What filters are you getting for the query?
What indices are on the table?
Is it really a patient name or some numerical ID? Same question for event type?

I don't know on the first two questions. I need to read up on indices. Its a patient numerical ID, which is unique to the patient encounter. The event type is generic. The only unique things are the patient ID and the (relatively unique) timestamp.

The gist of the basic query that returns the original table above is:
SQL code:
 
SELECT * 
   FROM DocumentationTable DT
WHERE DocumenationID IN (1,2)  --1 being alarm, 2 being acknowledgement

Hughmoris
Apr 21, 2007
Let's go to the abyss!

downout posted:

Are you asking what query gets the results in "Ideal Endstate"? If so, you can join two data sets based on timestamps. The first data set is Alarms and the second is Acknowledgements. The join is pretty complex because you need the first row from the Acknowledgement set where the timestamp is greater than the Alarm timestamp. I can't remember exactly what the syntax was, but it's possible. Actually it might be cross apply.

Correct, looking for a query to get the Ideal EndState. I believe I figured it out using a correlated subquery. The performance is a hot garbage fire but it returns the correct results (that I've validated so far).

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I appreciate you take the time to type all that up. I need to work my way through understanding how it works. That type of performance is awesome, because my correlated subquery takes 8+ minutes for a month of data.


PhantomOfTheCopier posted:

At the risk of sounding prophetic, are you sure you really want the timestamps in the results? That is, are you really just looking for the final state for the user (a much easier problem), or do you truly need the alert and acknowledgement matched in that fashion?

And the obvious question, what happens if the sequence is (alert1,alert2,ack2,ack1)? Or (alert, alert,ack)?

My understanding of the data is that an acknowledgement will acknowledge all of the alerts that have fired since the last acknowledgement.

The timestamps are needed because the person requesting the report wants to the see stats around when during the day they typically fire, and how long on average until an alarm is acknowledged.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Ruggan posted:

code:
SELECT      *
FROM        (
            SELECT      Patient_No,
                        Event_ID,
                        Alarm_Date              =   Event_Date,
                        Acknowledgement_Date    =   MIN(CASE WHEN Event_ID = 2 THEN Event_Date END) 
                                                    OVER (
                                                        --Frame our aggregate by patient
                                                        PARTITION BY Patient_No 
                                                        --Order by event date
                                                        ORDER BY Event_Date 
                                                        --Look only at future rows
                                                        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
                                                    )
            FROM        #DocumentationTable
            ) DATA
WHERE       Event_ID = 1 --Alarms
ORDER BY    Patient_No,
            Alarm_Date asc

You are a wizard. Utilizing this method allowed me to query 11 months worth of data in about 10 seconds. My correlated subquery for 1 month of data took 8+ minutes.

Nth Doctor, thanks for taking another crack at. I need to rewrite the query using your approach to see how it compares. I also need to start working on understanding execution plans.

It's really great to see how other people approach a given problem, and storing those concepts for future problems.

Hughmoris fucked around with this message at 03:59 on Dec 13, 2018

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Vague question here: When writing queries, do you typically familiarize yourself with the indexes before you start, or only when the performance needs improvement?

I just learned about SP_HELPINDEX on SQL Server and am thinking I need to start using it.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Nth Doctor posted:

Only when the performance is a problem. Premature performance tweaks can have detrimental effects of you lock the query optimizer out of performing its magic.

Also making a good hygiene habit of joining on the PK / clustered index wherever possible.

Ruggan posted:

I never start by looking at the indexes. I write the query with the cleanest and most sensible syntax I can come up with, and 95% of the time that gives me a query that is good enough. At that point I’m done.

For the other 5% I’ll do some optimization work which usually starts with looking the query plan for things that are costing a lot: scans, key lookups, that sort of thing.

Rarely am I presented with a situation where query refactoring could lead to better performance via different index utilization, especially when you’ve written a clean query with simple joins. When this does happen a lot of it involves nudging the optimizer to select from the right index with little tricks like filtering a column in multiple joined tables explicitly (like a date range - only pulling data from that date range in each table aka two where between or on between statements).

Sometimes I find that indexes that are necessary or important (like an index on an important foreign key join column) are missing and I’ll either add it if I’m dbo or bug the right person to do it. In this case, having some clout is really helpful as I don’t often have to have the “yes I know indexes have a cost” conversation and rarely request useless / overly specific indexes. If you’re making this sort of request you can bolster your argument by looking at query stats in the query store - if you find a query is running hundreds of times a day and would largely benefit from this index then that helps you convince the dbo to make it.

Thanks for the insights. I recently took a SQL Assessment exam that my software vendor requires. Thanks to this thread, I aced it. In 2018 I've gone from barely being able to spell SQL, to creating executive facing dashboards.

In 2019, I look forward to deepening my knowledge of SQL beyond simple data retrieval. Thanks for the help everyone!

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Bobcats posted:

Are correlated sub queries bad practice? I’m trying to pull across a ton of tables(last events for whatever into their own column) and I see no other way to accomplish what I want to do.

Other people can speak to practice but they've been dirt slow when I've had to use them. I replaced a corellated subquery with a window function and dropped the query time from 8 minutes to 11 seconds.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Anyone here mess with Cerner and their CCL?

With the help of this thread, I was just getting comfortable with writing basic queries in SQL Server. Recently changed jobs and this new places uses Cerner, which sounds like its built on top of Oracle and uses a proprietary SQL layer called CCL.

One thing I've noticed immediately is how great SSMS is compared to this in-house Cerner tool.

Adbot
ADBOT LOVES YOU

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I might have an opportunity to take a BI/data warehouse job. I can write basic SQL and I've created data viz dashboards for leadership, both which I enjoy. The data warehouse part will be new to me. Stuff like ETL and SSIS etc...

For those that do this type of work, what are your thoughts? Do you enjoy it? Is it a snooze fest? Does being an administrator for a data warehouse and ETL stuff have a lot of variability and things to learn?

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply