Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

PhantomOfTheCopier posted:

Contrary to that, since outer joins are so special and inner joins are routine, I want outers to look different and will continue to write inners as straightforward queries with all the tables in one place, all the field equalities together, all the restrictions together, and let my query planner do its job.

"... and let your query planner do its job"? I don't follow what the significance is meant to be of this flourish at the end of your remark. The query planner will "do its job" equally well regardless of whether join conditions appear in one place or another. The difference is one of readability for a human, and it is for reasons of readability that join conditions belong in the ON clause, as I said. The reason why filtering conditions belong in the WHERE clause, unaccompanied by join conditions, is that this draws the reader's attention to them. They are much more likely to define what the query does. The join conditions are boilerplate - necessary boilerplate, but boilerplate all the same. That is why they should be filed away in the ON clause, where they can be checked if necessary but don't distract from the important filtering business taking place in the WHERE clause.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Thanks for the links, and the other helpful information.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


I think you’re both expressing the same thing - that SQL is (mostly) declarative - and that you should generally write queries in the most human-parseable format, since the optimizer is usually pretty good at determining an efficient control flow.

Just a quick reminder, though, that a declarative language is only as efficient as its implementation, which, while generally great for SQL, isn’t perfect. Especially for complex queries, syntax and the order of joins can make a difference. Here’s a really good article on the subject: https://explainextended.com/2009/07/12/double-thinking-in-sql/

Back to the original topic, I disagree that all filtering conditions belong in the where clause. Even in a query fully composed of inner joins I often find the query easier to read when certain filter criteria of table B (in an A inner join B scenario) are written as part of the join condition. This makes it clear to me exactly which rows from table B are being considered for joining, rather than needing to jump down to the where clause. But I acknowledge that this is all personal opinion.
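For contrast, a sketch of the style described here, with a table-B filter pulled up into the join condition (same hypothetical tables as the example above):
SQL code:
SELECT o.order_id, c.customer_name, o.order_total
FROM orders AS o
INNER JOIN customers AS c
    ON c.customer_id = o.customer_id
    AND c.region = 'EMEA'                 -- filter on the joined table, stated at the join
WHERE o.order_date >= '2018-01-01';
With inner joins both forms return the same rows and the planner treats them identically, so this really is a readability choice; with outer joins, moving a condition between ON and WHERE does change the result.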

Pollyanna
Mar 5, 2005

Milk's on them.


If we’re going to be selecting from a particular table by a range of the latest date to an earlier date e.g. 2 weeks prior, 8 weeks prior, quarter to date, etc. then the index I want is on the timestamp descending, right?

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Pollyanna posted:

If we’re going to be selecting from a particular table by a range of the latest date to an earlier date e.g. 2 weeks prior, 8 weeks prior, quarter to date, etc. then the index I want is on the timestamp descending, right?

It probably doesn't matter from a querying perspective if you do ASC or DESC, but it may be a hassle from a page management perspective on insert where every new row goes to the first entry of the index rather than the last. I'm not explaining it super well but I'd think your b trees would thank you for writing consistently to the tail of the index rather than the head.

Pollyanna
Mar 5, 2005

Milk's on them.


Nth Doctor posted:

It probably doesn't matter from a querying perspective if you do ASC or DESC, but it may be a hassle from a page management perspective on insert where every new row goes to the first entry of the index rather than the last. I'm not explaining it super well but I'd think your b trees would thank you for writing consistently to the tail of the index rather than the head.

So I won’t get an advantage on querying for data if I make a btree (dayof DESC) index? What should I use if I’m going to be regularly “looking backwards” on a date range?

We will also be regularly clearing and reinserting a certain number of rows (for a given uuid attribute). Basically we’re planning on dropping the past 13 days of data and re-inserting 14 days worth of data every day at 3am. What can I add to the database to make that easier for us?

Pollyanna fucked around with this message at 17:01 on Dec 4, 2018

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Pollyanna posted:

So I won’t get an advantage on querying for data if I make a btree (dayof DESC) index? What should I use if I’m going to be regularly “looking backwards” on a date range?

We will also be regularly clearing and reinserting a certain number of rows (for a given uuid attribute). Basically we’re planning on dropping the past 13 days of data and re-inserting 14 days worth of data every day at 3am. What can I add to the database to make that easier for us?

So I have questions about the delete 13 insert 14 process, but leaving those aside:
Using either a date asc or date desc should result in index seeks like you want which will give you the performance boost you're after.

Conceptualizing the structure of the index itself is where I'm making my argument for asc, but it presupposes you were going for an append-only type of insertion.

Here's where I'm coming from: think about the low-level memory manipulation that goes on when you allocate a range of addresses and use them to store a list of data. It's a cheaper operation to append to the end of the list, provided you're still within the allocated bounds, than to prepend an entry to the head of the list, because the head of the list is at the boundary of your memory allocation. Prepending means every entry in the list needs to be moved down one spot. Appending means maybe a pointer to the tail gets updated.

I'm still phone posting so I can't do fancy pictures to illustrate my point, but I think that adding new entries to the head of an index could have performance issues similar to prepending an entry to a low-level memory-allocated list.
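For what it's worth, a minimal sketch of the two index options under discussion, assuming Postgres syntax and a hypothetical events table (dayof is the column Pollyanna mentioned):
SQL code:
-- Either index supports a range predicate like
--   WHERE dayof >= now() - interval '14 days'
-- since a single-column B-tree can be scanned in either direction.
CREATE INDEX idx_events_dayof      ON events (dayof);       -- ascending (default)
CREATE INDEX idx_events_dayof_desc ON events (dayof DESC);  -- descending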

Pollyanna
Mar 5, 2005

Milk's on them.


Nth Doctor posted:

So I have questions about the delete 13 insert 14 process, but leaving those aside:

Oh man, trust me: I do too.

quote:

Using either a date asc or date desc should result in index seeks like you want which will give you the performance boost you're after.

Conceptualizing the structure of the index itself is where I'm making my argument for asc, but it presupposes you were going for an append-only type of insertion.

Here's where I'm coming from: think about the low-level memory manipulation that goes on when you allocate a range of addresses and use them to store a list of data. It's a cheaper operation to append to the end of the list, provided you're still within the allocated bounds, than to prepend an entry to the head of the list, because the head of the list is at the boundary of your memory allocation. Prepending means every entry in the list needs to be moved down one spot. Appending means maybe a pointer to the tail gets updated.

I'm still phone posting so I can't do fancy pictures to illustrate my point, but I think that adding new entries to the head of an index could have performance issues similar to prepending an entry to a low-level memory-allocated list.

Been a while since I had to think about this, but I get what you mean. If we were only ever appending, then having a tail of the most recent just means updating a pointer. Unfortunately we’re also doing some weird windowing stuff so we’re also going back and dropping 13, then appending 14, and I guess that might make it more annoying, especially since we’re also scoping that by another column rather than the whole table.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


If you reinsert with it ordered properly then that'll help with reducing reads. If you're doing a ton of whole-day fuckery then you might be best served with partitioning if it's available in your db, since that'll make throwing away the old data fast, unless you're actually truncating the whole table.

Although if 10 days ago hasn't changed then how does binning it and reinserting help anything other than your storage vendor?
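If partitioning is on the table, a rough sketch of what the daily drop-and-reload could look like, assuming Postgres 10+ declarative partitioning (table and column names besides dayof are hypothetical):
SQL code:
CREATE TABLE events (
    source_uuid uuid        NOT NULL,
    dayof       timestamptz NOT NULL,
    payload     text
) PARTITION BY RANGE (dayof);

-- One partition per day; the 3am job can then replace whole days cheaply.
CREATE TABLE events_2018_12_04 PARTITION OF events
    FOR VALUES FROM ('2018-12-04') TO ('2018-12-05');

-- Throwing away an old day is a metadata operation, not a row-by-row DELETE.
DROP TABLE events_2018_12_04;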

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Moonwolf posted:

If you reinsert with it ordered properly then that'll help with reducing reads. If you're doing a ton of whole-day fuckery then you might be best served with partitioning if it's available in your db, since that'll make throwing away the old data fast, unless you're actually truncating the whole table.

Although if 10 days ago hasn't changed then how does binning it and reinserting help anything other than your storage vendor?

Proper ordering on insert won't matter all that much if there's already the index. Unless the dates are themselves the clustered index, insertion order would only affect insertion time in any substantial way. You might get a tiny bit less fragmentation, but that's really all it would do.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Has anyone leveraged Change Tracking (MS SQL) on a linked server or through SSIS?

Looks like the CT functions can’t be accessed remotely which leaves me with doing some hacky OpenQuery thing or writing it all directly in SSIS. Just wondering if anyone has experience here.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Ruggan posted:

Has anyone leveraged Change Tracking (MS SQL) on a linked server or through SSIS?

Looks like the CT functions can’t be accessed remotely which leaves me with doing some hacky OpenQuery thing or writing it all directly in SSIS. Just wondering if anyone has experience here.

Sup. I rolled my own replication leveraging change tracking.
E:
code:
USE Scratch_DB;

-- Direct call: works fine locally
SELECT CHANGE_TRACKING_CURRENT_VERSION()

-- Same call wrapped in dynamic SQL
DECLARE @SQL NVARCHAR(MAX) = N'USE Scratch_DB;
SELECT CHANGE_TRACKING_CURRENT_VERSION()'

EXEC sp_executesql @SQL;

-- The workaround: pass the dynamic SQL to the linked server's sp_executesql,
-- so the change tracking function runs on the remote server itself
EXEC Linked_Server.master.sys.sp_executesql N'EXEC sp_executesql @SQL;', N'@SQL NVARCHAR(MAX)', @SQL;

Nth Doctor fucked around with this message at 16:51 on Dec 5, 2018

Pollyanna
Mar 5, 2005

Milk's on them.


Moonwolf posted:

Although if 10 days ago hasn't changed then how does binning it and reinserting help anything other than your storage vendor?

It might change, since the data we get each day may not be complete - part of the data takes time to be completely up to date. It might be finished the next day, so we can’t just get it once. I suggested making two tables for the different data, but it ended up being more complicated than it was worth.

It’s super stupid and I hate our reporting system and the less I think about it the saner I remain.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
Question:

What is the difference between creating a table with an autoincrementing primary key and a secondary index on the relevant columns, versus creating a primary key across the relevant columns?

Said another way: Given that I have a table (in MS SQL 2017) with five columns, three of which are the index columns, what's the difference between creating a PK on those three columns versus adding an autoincrementing column, making it the PK, and then creating a secondary index containing the three columns?

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Agrikk posted:

Question:

What is the difference between creating a table with an autoincrementing primary key and a secondary index on the relevant columns, versus creating a primary key across the relevant columns?

Said another way: Given that I have a table (in MS SQL 2017) with five columns, three of which are the index columns, what's the difference between creating a PK on those three columns versus adding an autoincrementing column, making it the PK, and then creating a secondary index containing the three columns?

Depending on what the other columns are, it's a hell of a lot easier on the server to compare straight integers rather than long varchars and the like.

Our typical PK is a tenant id and an identity column at a minimum. If there are more int columns as candidate keys we sometimes include them in the PK definition for clustered index benefits.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

Nth Doctor posted:

Depending on what the other columns are, it's a hell of a lot easier on the server to compare straight integers rather than long varchars and the like.

Our typical PK is a tenant id and an identity column at a minimum. If there are more int columns as candidate keys we sometimes include them in the PK definition for clustered index benefits.

The three columns are a user ID (varchar(50)), a group ID (varchar(20)), and a timestamp, making the row unique for those three columns. Typical queries include searching for all timestamps for a given user/group combo and searching for all users at a given timestamp.

If I were to use an autoincrementing primary key, I'd still have to have an index to make these use cases work properly, so an index would still need updating on every insert anyway.

What is a use case where an identity column makes sense?

Volguus
Mar 3, 2009

Agrikk posted:

The three columns are a user ID (varchar(50)), a group ID (varchar(20)), and a timestamp, making the row unique for those three columns. Typical queries include searching for all timestamps for a given user/group combo and searching for all users at a given timestamp.

If I were to use an autoincrementing primary key, I'd still have to have an index to make these use cases work properly, so an index would still need updating on every insert anyway.

What is a use case where an identity column makes sense?

Well, when you hit the limits, for example:
primary key - max 32 columns
index - 900 bytes for a clustered index. 1,700 for a nonclustered index.

In general if you have a bunch of columns that specify the "uniqueness" of a record, just give it a unique key (identity or whatever else) and just make appropriate indexes for the most common queries. If you have only 3 columns that determine the uniqueness of a record, then it should be fine to have those as a primary key.

Though, putting a timestamp in a unique constraint does seem a bit weird. It depends how it's used: does it ever get updated? Who inserts it? Is there a possibility of collision (the same timestamp being set from different clients/applications)? It can be fine though.
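To make the two options concrete, a hedged T-SQL sketch of both table shapes being weighed (the user/group/timestamp columns and sizes come from Agrikk's description; everything else is illustrative):
SQL code:
-- Option 1: natural composite primary key
CREATE TABLE dbo.UserGroupEvents_Natural (
    UserID    VARCHAR(50) NOT NULL,
    GroupID   VARCHAR(20) NOT NULL,
    EventTime DATETIME2   NOT NULL,
    OtherCol1 INT NULL,
    OtherCol2 INT NULL,
    CONSTRAINT PK_UserGroupEvents_Natural
        PRIMARY KEY CLUSTERED (UserID, GroupID, EventTime)
);

-- Option 2: surrogate identity key, with a unique index preserving the natural key
CREATE TABLE dbo.UserGroupEvents_Surrogate (
    RowID     INT IDENTITY(1,1) NOT NULL,
    UserID    VARCHAR(50) NOT NULL,
    GroupID   VARCHAR(20) NOT NULL,
    EventTime DATETIME2   NOT NULL,
    OtherCol1 INT NULL,
    OtherCol2 INT NULL,
    CONSTRAINT PK_UserGroupEvents_Surrogate PRIMARY KEY CLUSTERED (RowID),
    CONSTRAINT UQ_UserGroupEvents_NaturalKey UNIQUE (UserID, GroupID, EventTime)
);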

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

Volguus posted:

Well, when you hit the limits, for example:
primary key - max 32 columns
index - 900 bytes for a clustered index. 1,700 for a nonclustered index.

In general if you have a bunch of columns that specify the "uniqueness" of a record, just give it a unique key (identity or whatever else) and just make appropriate indexes for the most common queries. If you have only 3 columns that determine the uniqueness of a record, then it should be fine to have those as a primary key.

Though, putting a timestamp in a unique constraint does seem a bit weird. It depends how it's used: does it ever get updated? Who inserts it? Is there a possibility of collision (the same timestamp being set from different clients/applications)? It can be fine though.

Thank you for this explanation. This helps.

In my case, the data comes in for each user/group in batches, and the batch never changes. Thus there are never any updates and inserts happen once an hour for all user/group rows. The time of the insert is the timestamp.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Agrikk posted:

Question:

What is the difference between creating a table with an autoincrementing primary key and a secondary index on the relevant columns, versus creating a primary key across the relevant columns?

Said another way: Given that I have a table (in MS SQL 2017) with five columns, three of which are the index columns, what's the difference between creating a PK on those three columns versus adding an autoincrementing column, making it the PK, and then creating a secondary index containing the three columns?

If this data is actually relational and other tables might have foreign keys referencing rows in this table, then you should pretty much always use a surrogate primary key (i.e. an integer that doesn't mean anything). Natural primary keys are tempting but down that road lies madness.

Also, keep in mind that primary keys are clustered by default in MSSQL and if you're using a natural primary key then it's quite likely that it won't be a good clustering key. This is a really complex topic but you can check out this for a primer.

TheFluff fucked around with this message at 15:21 on Dec 7, 2018

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Yeah it's tempting to go "oh hey I don't need to waste space/resources on that primary key because I have uniqueness in (x)+(y)+(z)" but it almost never works out that way because of some of the reasons stated here.

Volguus
Mar 3, 2009
Oh yes, foreign keys. I forgot that one. A quite strong reason to not have a composite PK.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



I'd go so far as to put the 'natural' key in its own table and have the rows point to it with an FK. Better normalization, and the unique index wouldn't have to be nearly as big to cover the user+group+timestamp combination.
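A sketch of that arrangement (T-SQL, names hypothetical): the user/group combination becomes a row in its own table, and the event rows reference it by an integer FK, so the wide unique index only has to cover an INT plus a timestamp:
SQL code:
CREATE TABLE dbo.UserGroup (
    UserGroupID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    UserID      VARCHAR(50) NOT NULL,
    GroupID     VARCHAR(20) NOT NULL,
    CONSTRAINT UQ_UserGroup UNIQUE (UserID, GroupID)
);

CREATE TABLE dbo.UserGroupEvent (
    UserGroupID INT       NOT NULL REFERENCES dbo.UserGroup (UserGroupID),
    EventTime   DATETIME2 NOT NULL,
    CONSTRAINT UQ_UserGroupEvent UNIQUE (UserGroupID, EventTime)
);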

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Performance and viability may depend on careful optimizations within the database or the construction of the indexes when you're using tricky primary keys. When you come to realize that AbcSql no longer supports some of your other needs and you want to switch to SqlLmn++ you'll almost certainly get to rebuild your application to use a basic integer key. This problem will be worse if you have text as a key field since behavior is database-dependent.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Is there any recommended reading for SQL performance tuning for beginners?

I'm using SQL Server, and have a query that takes roughly 2 minutes to run. The query runs overnight and is not mission critical but I'd like to try my hand at speeding it up. I'm not quite sure where to start reading/digging.

*I'm not a DBA, just an analyst pulling data.

spiritual bypass
Feb 19, 2008

Grimey Drawer
The biggest thing for reads is to EXPLAIN and see if it does any table scans. Another important pitfall to beware of is correlated subqueries.
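Hughmoris is on SQL Server, where the EXPLAIN equivalent is the execution plan plus the STATISTICS settings; a quick hedged sketch (the table and query are placeholders):
SQL code:
SET STATISTICS IO ON;    -- logical reads per table; scans show up as big numbers
SET STATISTICS TIME ON;  -- CPU and elapsed time

SELECT TOP (100) *
FROM dbo.SomeBigTable    -- placeholder for the slow query
WHERE SomeColumn = 42;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
-- For the graphical plan, toggle "Include Actual Execution Plan" (Ctrl+M) in SSMS
-- and look for Table Scan / Clustered Index Scan operators.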

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Ruggan posted:

I think you’re both expressing the same thing - that SQL is (mostly) declarative - and that you should generally write queries in the most human-parseable format, since the optimizer is usually pretty good at determining an efficient control flow.

Just a quick reminder, though, that a declarative language is only as efficient as its implementation, which, while generally great for SQL, isn’t perfect. Especially for complex queries, syntax and the order of joins can make a difference. Here’s a really good article on the subject: https://explainextended.com/2009/07/12/double-thinking-in-sql/

Back to the original topic, I disagree that all filtering conditions belong in the where clause. Even in a query fully composed of inner joins I often find the query easier to read when certain filter criteria of table B (in an A inner join B scenario) are written as part of the join condition. This makes it clear to me exactly which rows from table B are being considered for joining, rather than needing to jump down to the where clause. But I acknowledge that this is all personal opinion.

Thanks for the link, was an interesting read.

rt4 posted:

The biggest thing for reads is to EXPLAIN and see if it does any table scans. Another important pitfall to beware of is correlated subqueries.

I'll take a look at EXPLAIN, thanks!

downout
Jul 6, 2009

Hughmoris posted:

Is there any recommended reading for SQL performance tuning for beginners?

I'm using SQL Server, and have a query that takes roughly 2 minutes to run. The query runs overnight and is not mission critical but I'd like to try my hand at speeding it up. I'm not quite sure where to start reading/digging.

*I'm not a DBA, just an analyst pulling data.

There are a couple of red flags, like functions in the WHERE clause or in JOIN conditions. Another consideration is the size of columns being used in comparisons, how many joins there are, the number of columns returned, and the count of rows in each table. There's also analyzing the query execution plan. Redgate and some other companies have tools that make analyzing the execution plan easier, which might be helpful for a beginner. If you could anonymize it, then it might be worth posting here or on dba.stackexchange to see if anyone can spot some obvious red flags.

When I did query analysis ~2 years ago it was always hard to find a quick beginners guide. It always felt a bit like an artform to get decent at identifying problems.
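An example of that first red flag: wrapping an indexed column in a function usually blocks index seeks, while rewriting the predicate as a bare range keeps it sargable (table, column, and index names here are hypothetical):
SQL code:
-- Likely a scan: the function hides the indexed column from the optimizer
SELECT OrderID, OrderTotal
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2018;

-- Likely a seek on an OrderDate index: the column is left bare, compared to a range
SELECT OrderID, OrderTotal
FROM dbo.Orders
WHERE OrderDate >= '2018-01-01'
  AND OrderDate <  '2019-01-01';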

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Hughmoris posted:

Is there any recommended reading for SQL performance tuning for beginners?

I'm using SQL Server, and have a query that takes roughly 2 minutes to run. The query runs overnight and is not mission critical but I'd like to try my hand at speeding it up. I'm not quite sure where to start reading/digging.

*I'm not a DBA, just an analyst pulling data.

You're probably going to end up using indexes, so you should understand them https://use-the-index-luke.com/sql/preface

SSMS has a plan viewer built in (a GUI for the EXPLAIN output) but it's kinda annoying to use if you've got a complicated query or stored procedure. I like https://www.sentryone.com/plan-explorer better. Both will show you where the server thinks you're missing an index, if it has that advice. Sometimes it's even good advice!

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Yeah it seems that it always comes down to:
- Slow reads: Not enough indexes, unforeseen joins or ORDER BYs on non-indexed columns, implicit cast/conversion in the WHERE, weird isolation levels set/unset
- Slow updates: Too many indexes, unknown triggers, table-wide locks, stalled jobs

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Depending on the DB, complicated queries can produce --- guess --- complicated query execution plans. Sometimes you can learn a lot by playing. Start with the individual pieces of the query, build them up progressively, and see how the performance degrades. When you hit an obvious bottleneck, use EXPLAIN to see what has been added or changed in the plan that's causing the misbehavior.

But yeah, there's no guide as such because queries are built from simpler forms. When you understand optimizing the basics, the rest will follow.

spiritual bypass
Feb 19, 2008

Grimey Drawer
I'd be curious to see your query and the DDL for the tables involved

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Thanks for all the ideas. This is dealing with healthcare data so I err on the side of caution in regards to posting relevant details but I'll research the topics pointed out.

I do have a problem that has turned out to be a stumper for me and two others I work with:

I have a table that contains alarms and acknowledgements. An acknowledgement can clear multiple alarms. For a given alarm, when was the next acknowledgment that occurred?

Original Table
code:
Patient, Event, EventTime
Joe, Alarm, 12/10 1300
Sally, Alarm, 12/10 1305
Sally, Alarm, 12/10 1306
Sally, Acknowledgement,12/10 1308
Joe, Acknowledgment,12/10 1313
Ideal Endstate
code:
Patient,Alarm, AlarmTime, Acknowledgement, AcknowledgementTime
Joe, Alarm, 12/10 1300, Acknowledgement, 12/10 1313
Sally, Alarm, 12/10 1305, Acknowledgement, 12/10 1308
Sally, Alarm, 12/10 1306, Acknowledgement, 12/10 1308
I thought this would be easy when I first received the request but it's stumping a few of us SQL rookies. Any suggestions on how to approach it?

I've started utilizing window functions more but mainly for first and last values. This feels like I could use LEAD but I'm not sure how to use the conditional of AcknowledgeTime > AlarmTime.

Hughmoris fucked around with this message at 00:49 on Dec 11, 2018

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Some of the general guidelines my company has used in performance tuning:

Examine the actual query plan as everyone else has said.

Using the handy SSMS visualization is super useful.

Look for thick arrows indicating a shitload of rows going through an operation that yields thin arrows indicating few rows. Those can probably be redone to filter better.

Look for unnecessary table and index scans. Look for spooling. Look for operations with drastic differences between the estimated number of rows and the actual number.

Though the query optimizer can and will rearrange the join orders on you, make an attempt to help it along by joining from tables with few rows to tables with many

List join predicates in the same order as a hypothetical index on the table you are joining

Try to leverage PK / clustered indices in join predicates

Keep index statistics relatively up to date

Rebuild the indices during low usage times if they need it

Avoid subqueries

Avoid UDFs

Avoid Outer Apply

If you can't bash the query into submission with those steps, try breaking out bigger guns
Use index hints

Use join type hints

Option (force order)

These things take flexibility away from the query optimizer, though, which isn't really great for the long term.
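For reference, roughly what those bigger guns look like in T-SQL (the table, index, and join choices here are purely illustrative):
SQL code:
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o WITH (INDEX (IX_Orders_CustomerID))  -- index hint
INNER HASH JOIN dbo.Customers AS c                        -- join type hint
    ON c.CustomerID = o.CustomerID
OPTION (FORCE ORDER);                                     -- keep the written join order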


I can probably come off more sensibly when not phone posting, but hopefully that, along with everything else people have said, will help.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Hughmoris posted:

Thanks for all the ideas. This is dealing with healthcare data so I err on the side of caution in regards to posting relevant details but I'll research the topics pointed out.

I do have a problem that has turned out to be a stumper for me and two others I work with:

I have a table that contains alarms and acknowledgements. An acknowledgement can clear multiple alarms. For a given alarm, when was the next acknowledgment that occurred?

Original Table
code:
Patient, Event, EventTime
Joe, Alarm, 12/10 1300
Sally, Alarm, 12/10 1305
Sally, Alarm, 12/10 1306
Sally, Acknowledgement,12/10 1308
Joe, Acknowledgment,12/10 1313
Ideal Endstate
code:
Patient,Alarm, AlarmTime, Acknowledgement, AcknowledgementTime
Joe, Alarm, 12/10 1300, Acknowledgement, 12/10 1313
Sally, Alarm, 12/10 1305, Acknowledgement, 12/10 1308
Sally, Alarm, 12/10 1306, Acknowledgement, 12/10 1308
I thought this would be easy when I first received the request but it's stumping a few of us SQL rookies. Any suggestions on how to approach it?

I've started utilizing window functions more but mainly for first and last values. This feels like I could use LEAD but I'm not sure how to use the conditional of AcknowledgeTime > AlarmTime.

Ugh. I don't like schemas like this.

What filters are you getting for the query?
What indices are on the table?
Is it really a patient name or some numerical ID? Same question for event type?

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Nth Doctor posted:

Good stuff...

Thanks for this!

Nth Doctor posted:

Ugh. I don't like schemas like this.

What filters are you getting for the query?
What indices are on the table?
Is it really a patient name or some numerical ID? Same question for event type?

I don't know the answers to the first two questions. I need to read up on indices. It's a patient numerical ID, which is unique to the patient encounter. The event type is generic. The only unique things are the patient ID and the (relatively unique) timestamp.

The gist of the basic query that returns the original table above is:
SQL code:
 
SELECT *
FROM DocumentationTable DT
WHERE DocumentationID IN (1, 2)  -- 1 being alarm, 2 being acknowledgement

downout
Jul 6, 2009

Hughmoris posted:

Thanks for all the ideas. This is dealing with healthcare data so I err on the side of caution in regards to posting relevant details but I'll research the topics pointed out.

I do have a problem that has turned out to be a stumper for me and two others I work with:

I have a table that contains alarms and acknowledgements. An acknowledgement can clear multiple alarms. For a given alarm, when was the next acknowledgment that occurred?

Original Table
code:
Patient, Event, EventTime
Joe, Alarm, 12/10 1300
Sally, Alarm, 12/10 1305
Sally, Alarm, 12/10 1306
Sally, Acknowledgement,12/10 1308
Joe, Acknowledgment,12/10 1313
Ideal Endstate
code:
Patient,Alarm, AlarmTime, Acknowledgement, AcknowledgementTime
Joe, Alarm, 12/10 1300, Acknowledgement, 12/10 1313
Sally, Alarm, 12/10 1305, Acknowledgement, 12/10 1308
Sally, Alarm, 12/10 1306, Acknowledgement, 12/10 1308
I thought this would be easy when I first received the request but it's stumping a few of us SQL rookies. Any suggestions on how to approach it?

I've started utilizing window functions more but mainly for first and last values. This feels like I could use LEAD but I'm not sure how to use the conditional of AcknowledgeTime > AlarmTime.

Are you asking what query gets the results in "Ideal Endstate"? If so, you can join two data sets based on timestamps. The first data set is Alarms and the second is Acknowledgements. The join is pretty complex because you need the first row from the Acknowledgement set where the timestamp is greater than the Alarm timestamp. I can't remember exactly what the syntax was, but it's possible. Actually it might be cross apply.
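A hedged sketch of the APPLY approach mentioned above, using the column names from Hughmoris's description (Patient, EventTime, and DocumentationID with 1 = alarm, 2 = acknowledgement; the real names may differ):
SQL code:
SELECT
    A.Patient,
    A.EventTime   AS AlarmTime,
    Ack.EventTime AS AcknowledgementTime
FROM dbo.DocumentationTable AS A
CROSS APPLY
(
    SELECT TOP (1) K.EventTime
    FROM dbo.DocumentationTable AS K
    WHERE K.Patient = A.Patient
      AND K.DocumentationID = 2        -- acknowledgements only
      AND K.EventTime >= A.EventTime   -- first ack at or after the alarm
    ORDER BY K.EventTime
) AS Ack
WHERE A.DocumentationID = 1;           -- alarms only
Note that CROSS APPLY drops alarms that never got acknowledged; OUTER APPLY would keep them with a NULL acknowledgement time.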

downout fucked around with this message at 02:51 on Dec 11, 2018

Hughmoris
Apr 21, 2007
Let's go to the abyss!

downout posted:

Are you asking what query gets the results in "Ideal Endstate"? If so, you can join two data sets based on timestamps. The first data set is Alarms and the second is Acknowledgements. The join is pretty complex because you need the first row from the Acknowledgement set where the timestamp is greater than the Alarm timestamp. I can't remember exactly what the syntax was, but it's possible. Actually it might be cross apply.

Correct, looking for a query to get the Ideal EndState. I believe I figured it out using a correlated subquery. The performance is a hot garbage fire but it returns the correct results (that I've validated so far).

spiritual bypass
Feb 19, 2008

Grimey Drawer
I don't think it's possible to get a correct result here without a correlated subquery
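The correlated-subquery version being discussed would look roughly like this (same hypothetical column names as the APPLY sketch above); it's easy to write, but as Hughmoris found it tends to perform poorly because the inner query is evaluated per alarm row:
SQL code:
SELECT
    A.Patient,
    A.EventTime AS AlarmTime,
    (
        SELECT MIN(K.EventTime)
        FROM dbo.DocumentationTable AS K
        WHERE K.Patient = A.Patient
          AND K.DocumentationID = 2
          AND K.EventTime >= A.EventTime
    ) AS AcknowledgementTime
FROM dbo.DocumentationTable AS A
WHERE A.DocumentationID = 1;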

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy
you could i think........ but it would involve a while loop :gerty:

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


I took a derived table approach with a window function and had something close to working:
code:
CREATE TABLE dbo.#DocumentationTable
(
  Patient_No INT,
  Event_ID INT,
  Event_Date DATETIME
);

INSERT dbo.#DocumentationTable
(
  Patient_No,
  Event_ID,
  Event_Date
)
/*
Removing data generation because I used our own data set to do this.
I took the IDENTITY column of a table with lots of rows, and Patient_No is that column % 10001
Event ID is:
  CASE (identity_column %5)
    WHEN 0 THEN 1
    WHEN 1 THEN 1
    WHEN 2 THEN 1
    WHEN 3 THEN 2
    WHEN 4 THEN 2
  END AS Event_ID

Event_Date is adding minutes to 1/1/1970 based on the identity column
*/

-- dbo.#DocumentationTable has 10k distinct patient no's, and more eventID 1's than eventID 2's.
SELECT
  COUNT(DISTINCT D.Patient_No) AS Distinct_Patients,
  COUNT(*) AS Total_Records
FROM dbo.#DocumentationTable AS D;

SELECT
  D.Event_ID,
  COUNT(*) AS Total
FROM dbo.#DocumentationTable AS D
GROUP BY D.Event_ID;


SELECT
  D2.Patient_No,
  D2.Event_ID AS Alarm,
  D2.Event_Date AS Alarm_Date,
  Ack_DT.Acknowledgement,
  Ack_DT.Acknowledgement_Date
FROM
(
  SELECT
    D.Patient_No,
    D.Event_ID AS Acknowledgement,
    D.Event_Date AS Acknowledgement_Date,
    LAG(D.Event_Date, 1, NULL) OVER(PARTITION BY D.Patient_No ORDER BY D.Patient_No, D.Event_Date) AS Previous_Acknowledgement_Date
  FROM dbo.#DocumentationTable AS D
  WHERE D.Event_ID = 2
) AS Ack_DT
JOIN dbo.#DocumentationTable AS D2
  ON D2.Patient_No = Ack_DT.Patient_No
  AND D2.Event_ID = 1
  AND D2.Event_Date <= Ack_DT.Acknowledgement_Date
  AND (Ack_DT.Previous_Acknowledgement_Date IS NULL OR D2.Event_Date > Ack_DT.Previous_Acknowledgement_Date)
ORDER BY D2.Patient_No, D2.Event_Date;

GO
DROP TABLE dbo.#DocumentationTable;
I generated some data and here are my results:
pre:
Distinct_Patients Total_Records
----------------- -------------
10001             1000000

(1 row(s) affected)

Event_ID    Total
----------- -----------
1           600373
2           399627

(2 row(s) affected)

Patient_No  Alarm       Alarm_Date              Acknowledgement Acknowledgement_Date
----------- ----------- ----------------------- --------------- -----------------------
1           1           1970-01-07 22:41:00.000 2               1970-02-25 13:28:00.000
1           1           1970-02-04 17:25:00.000 2               1970-02-25 13:28:00.000
1           1           1970-02-18 14:47:00.000 2               1970-02-25 13:28:00.000
1           1           1970-04-29 01:37:00.000 2               1970-06-09 17:43:00.000
...
Things get a little fuzzy around the first or last records for a patient, but that should be easy enough to fix. Hopefully this gives you a good starting point.
This runs in 9 seconds for a million row table, including the time taken to generate the data.

Nth Doctor fucked around with this message at 18:45 on Dec 11, 2018

  • Reply