|
PhantomOfTheCopier posted:Contrary to that, since outer joins are so special and inner joins are routine, I want outers to look different and will continue to write inners as straightforward queries with all the tables in one place, all the field equalities together, all the restrictions together, and let my query planner do its job. "... and let your query planner do its job"? I don't follow what the significance is meant to be of this flourish at the end of your remark. The query planner will "do its job" equally well regardless of whether join conditions appear in one place or another. The difference is one of readability for a human, and it is for reasons of readability that join conditions belong in the ON clause, as I said. The reason why filtering conditions belong in the WHERE clause, unaccompanied by join conditions, is that this draws the reader's attention to them. They are much more likely to define what the query does. The join conditions are boilerplate - necessary boilerplate, but boilerplate all the same. That is why they should be filed away in the ON clause, where they can be checked if necessary but don't distract from the important filtering business taking place in the WHERE clause.
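A minimal sketch of the style being argued for, with invented table and column names:
code:
```sql
-- Hypothetical tables for illustration only.
SELECT o.order_id, o.total
FROM orders AS o
INNER JOIN customers AS c
    ON c.customer_id = o.customer_id   -- join "boilerplate" filed away in ON
WHERE c.country = 'SE'                 -- the filtering that defines the query
  AND o.total > 100;
```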
|
# ? Dec 2, 2018 01:56 |
|
|
|
kumba posted:There can be a difference, yes. See: Thanks for the links, and the other helpful information.
|
# ? Dec 2, 2018 03:49 |
|
I think you’re both expressing the same thing - that SQL is (mostly) declarative - and that you should generally write queries in the most human-parseable format, as the optimizer is usually pretty good at determining an efficient control flow. Just a quick reminder, though, that a declarative language is only as efficient as its implementation, which, while great for SQL, isn't perfect. Especially for complex queries, syntax and the order of joins can make a difference. Here’s a really good article on the subject: https://explainextended.com/2009/07/12/double-thinking-in-sql/ Back to the original topic, I disagree that all filtering conditions belong in the where clause. Even in a query fully composed of inner joins, I often find the query easier to read when certain filter criteria on table B (in an A inner join B scenario) are written as part of the join condition. This makes it clear exactly which rows from table B are being considered for joining, rather than needing to jump down to the where clause. But I acknowledge that this is all personal opinion.
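For what it's worth, the style described here looks something like this (table and column names invented for illustration):
code:
```sql
-- Restricting B's rows as part of the join itself.
SELECT a.id, b.amount
FROM accounts AS a
INNER JOIN payments AS b
    ON b.account_id = a.id
   AND b.status = 'settled'   -- filter on B kept next to the join it narrows
WHERE a.region = 'EU';        -- filters on A stay in the WHERE clause
```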
|
# ? Dec 2, 2018 04:03 |
|
If we’re going to be selecting from a particular table by a range of the latest date to an earlier date e.g. 2 weeks prior, 8 weeks prior, quarter to date, etc. then the index I want is on the timestamp descending, right?
|
# ? Dec 4, 2018 16:19 |
|
Pollyanna posted:If we’re going to be selecting from a particular table by a range of the latest date to an earlier date e.g. 2 weeks prior, 8 weeks prior, quarter to date, etc. then the index I want is on the timestamp descending, right? It probably doesn't matter from a querying perspective if you do ASC or DESC, but it may be a hassle from a page management perspective on insert where every new row goes to the first entry of the index rather than the last. I'm not explaining it super well but I'd think your b trees would thank you for writing consistently to the tail of the index rather than the head.
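Either declaration typically serves the range query, since B-tree indexes can be scanned in both directions; a sketch (T-SQL flavored, names invented):
code:
```sql
-- Either of these supports "latest N weeks" range predicates;
-- the optimizer can walk the B-tree backwards if it needs to.
CREATE INDEX ix_events_dayof ON events (dayof DESC);
-- or simply:
-- CREATE INDEX ix_events_dayof ON events (dayof);

-- Typical "looking backwards" query the index serves:
SELECT *
FROM events
WHERE dayof >= DATEADD(WEEK, -2, CAST(GETDATE() AS date));
```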
|
# ? Dec 4, 2018 16:28 |
|
Nth Doctor posted:It probably doesn't matter from a querying perspective if you do ASC or DESC, but it may be a hassle from a page management perspective on insert where every new row goes to the first entry of the index rather than the last. I'm not explaining it super well but I'd think your b trees would thank you for writing consistently to the tail of the index rather than the head. So I won’t get an advantage on querying for data if I make a btree (dayof DESC) index? What should I use if I’m going to be regularly “looking backwards” on a date range? We will also be regularly clearing and reinserting a certain number of rows (for a given uuid attribute). Basically we’re planning on dropping the past 13 days of data and re-inserting 14 days worth of data every day at 3am. What can I add to the database to make that easier for us? Pollyanna fucked around with this message at 17:01 on Dec 4, 2018 |
# ? Dec 4, 2018 16:55 |
|
Pollyanna posted:So I won’t get an advantage on querying for data if I make a btree (dayof DESC) index? What should I use if I’m going to be regularly “looking backwards” on a date range? So I have questions about the delete 13 insert 14 process, but leaving those aside:

Using either date asc or date desc should result in the index seeks you want, which will give you the performance boost you're after. Conceptualizing the structure of the index itself is where I'm making my argument for asc, but it presupposed you were going for an append-only type of insertion.

Here's where I'm coming from: think about the low-level memory manipulation that goes on when you allocate a range of addresses and use them to store a list of data. It's cheaper to append to the end of the list, provided you're still within the allocated bounds, than to prepend an entry at the head, because the head of the list sits at the boundary of your memory allocation. Prepending means every entry in the list needs to be moved down one spot; appending means maybe a pointer to the tail gets updated.

I'm still phone posting so I can't do fancy pictures to illustrate my point, but I think adding new entries at the head of an index could have similar performance issues to prepending an entry in a low-level memory-allocated list.
|
# ? Dec 4, 2018 18:05 |
|
Nth Doctor posted:So I have questions about the delete 13 insert 14 process, but leaving those aside: Oh man, trust me: I do too. quote:Using either a date asc or date desc should result in index seeks like you want which will give you the performance boost you're after. Been a while since I had to think about this, but I get what you mean. If we were only ever appending, then having a tail of the most recent just means updating a pointer. Unfortunately we’re also doing some weird windowing stuff so we’re also going back and dropping 13, then appending 14, and I guess that might make it more annoying, especially since we’re also scoping that by another column rather than the whole table.
|
# ? Dec 4, 2018 22:55 |
|
If you reinsert with it ordered properly then that'll help with reducing reads. If you're doing a ton of whole-day fuckery then you might be best served with partitioning if it's available in your db, since that'll make throwing away the old data fast, unless you're actually truncating the whole table. Although if 10 days ago hasn't changed then how does binning it and reinserting help anything other than your storage vendor?
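A hedged sketch of what date partitioning could look like in SQL Server 2016+ (all names and boundary values invented); dropping a day's data then becomes a metadata operation rather than row-by-row deletes:
code:
```sql
-- Hypothetical daily partitioning scheme.
CREATE PARTITION FUNCTION pf_daily (date)
    AS RANGE RIGHT FOR VALUES ('2018-12-01', '2018-12-02', '2018-12-03');

CREATE PARTITION SCHEME ps_daily
    AS PARTITION pf_daily ALL TO ([PRIMARY]);

CREATE TABLE report_data (
    report_uuid uniqueidentifier NOT NULL,
    dayof       date             NOT NULL,
    value       int              NOT NULL
) ON ps_daily (dayof);

-- Throwing away one day's worth of rows is then nearly instant:
TRUNCATE TABLE report_data WITH (PARTITIONS (2));
```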
|
# ? Dec 5, 2018 00:53 |
|
Moonwolf posted:If you reinsert with it ordered properly then that'll help with reducing reads. If you're doing a ton of whole-day fuckery then you might be best served with partitioning if it's available in your db, since that'll make throwing away the old data fast, unless you're actually truncating the whole table. Proper ordering on insert won't matter all that much if there's already the index. Unless the dates are themselves the clustered index, insertion order would only affect insertion time in a substantial way. You might get a tiny bit less fragmentation but that's really all it would do.
|
# ? Dec 5, 2018 01:31 |
|
Has anyone leveraged Change Tracking (MS SQL) on a linked server or through SSIS? Looks like the CT functions can’t be accessed remotely which leaves me with doing some hacky OpenQuery thing or writing it all directly in SSIS. Just wondering if anyone has experience here.
|
# ? Dec 5, 2018 15:25 |
|
Ruggan posted:Has anyone leveraged Change Tracking (MS SQL) on a linked server or through SSIS? Sup. I rolled my own replication leveraging change tracking. E: code:
Nth Doctor fucked around with this message at 16:51 on Dec 5, 2018 |
# ? Dec 5, 2018 16:45 |
|
Moonwolf posted:Although if 10 days ago hasn't changed then how does binning it and reinserting help anything other than your storage vendor? It might change, since the data we get each day may not be complete - part of the data takes time to be completely up to date. It might be finished the next day, so we can’t just get it once. I suggested making two tables for the different data, but it ended up being more complicated than it was worth. It’s super stupid and I hate our reporting system and the less I think about it the saner I remain.
|
# ? Dec 6, 2018 03:13 |
|
Question: What is the difference between creating a table with an autoincrementing primary key plus a secondary index on the relevant columns, versus creating a primary key across the relevant columns? Said another way: given that I have a table (in MS SQL 2017) of five columns, and three of them are the index columns, what's the difference between creating a PK on the three columns versus adding an autoincrementing column, making it the PK, and then creating a secondary index containing the three columns?
|
# ? Dec 7, 2018 04:21 |
|
Agrikk posted:Question: Depending on what the other columns are, it's a hell of a lot easier on the server to compare straight integers rather than long varchars and the like. Our typical PK is a tenant id and an identity column at a minimum. If there are more int columns as candidate keys we sometimes include them in the PK definition for clustered index benefits.
|
# ? Dec 7, 2018 04:33 |
|
Nth Doctor posted:Depending on what the other columns are, it's a hell of a lot easier on the server to compare straight integers rather than long varchars and the like. The three columns are a user ID (varchar(50)), a group ID (varchar(20)) and a timestamp, making the row unique for those three columns. Typical queries include searching for all timestamps for a given user/group combo and searching for all users in a given timestamp. If I were to use an autoincrementing primary key, I'd still have to have an index to make these use cases work properly, meaning every insert also has to update that index. What is a use case where an identity column makes sense?
|
# ? Dec 7, 2018 05:22 |
|
Agrikk posted:The three columns are a user ID (varchar(50)), a group ID (varchar(20)) and a timestamp, making the row unique for those three columns. Typical queries include searching for all timestamps for a given user/group combo and searching for all users in a given timestamp. Well, when you hit the limits, for example:
- primary key: max 32 columns
- index: 900 bytes for a clustered index, 1,700 for a nonclustered index

In general, if you have a bunch of columns that specify the "uniqueness" of a record, just give it a unique key (identity or whatever else) and make appropriate indexes for the most common queries. If you have only 3 columns that determine the uniqueness of a record, then it should be fine to have those as a primary key. Though, putting a timestamp in a unique constraint does seem a bit weird. It depends how it's used, though: does it ever get updated? Who inserts it? Are there possibilities of collision (the same timestamp being set from different clients/applications)? It can be fine though.
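The two layouts under discussion, sketched with the column types Agrikk described (all table, constraint, and extra-column names are made up):
code:
```sql
-- Option 1: natural composite primary key.
CREATE TABLE user_group_event (
    user_id  varchar(50) NOT NULL,
    group_id varchar(20) NOT NULL,
    ts       datetime2   NOT NULL,
    col4     int         NULL,
    col5     int         NULL,
    CONSTRAINT pk_uge PRIMARY KEY (user_id, group_id, ts)
);

-- Option 2: surrogate identity PK plus a unique secondary index.
CREATE TABLE user_group_event2 (
    id       int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    user_id  varchar(50) NOT NULL,
    group_id varchar(20) NOT NULL,
    ts       datetime2   NOT NULL,
    col4     int         NULL,
    col5     int         NULL,
    CONSTRAINT uq_uge2 UNIQUE (user_id, group_id, ts)
);
```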
|
# ? Dec 7, 2018 05:49 |
|
Volguus posted:Well, when you hit the limits, for example: Thank you for this explanation. This helps. In my case, the data comes in for each user/group in batches, and the batch never changes. Thus there are never any updates and inserts happen once an hour for all user/group rows. The time of the insert is the timestamp.
|
# ? Dec 7, 2018 05:57 |
|
Agrikk posted:Question: If this data is actually relational and other tables might have foreign keys referencing rows in this table, then you should pretty much always use a surrogate primary key (i.e. an integer that doesn't mean anything). Natural primary keys are tempting but down that road lies madness. Also, keep in mind that primary keys are clustered by default in MSSQL and if you're using a natural primary key then it's quite likely that it won't be a good clustering key. This is a really complex topic but you can check out this for a primer. TheFluff fucked around with this message at 15:21 on Dec 7, 2018 |
# ? Dec 7, 2018 15:12 |
|
Yeah it's tempting to go "oh hey I don't need to waste space/resources on that primary key because I have uniqueness in (x)+(y)+(z)" but it almost never works out that way because of some of the reasons stated here.
|
# ? Dec 7, 2018 18:47 |
|
Oh yes, foreign keys. I forgot that one. A quite strong reason to not have a composite PK.
|
# ? Dec 7, 2018 19:31 |
|
I'd go so far as to put the 'natural' key in its own table and have the rows point to it with an FK. Better normalization, and a unique index wouldn't have to be nearly as big to cover the user+group+timestamp combination.
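That suggestion might look roughly like this (all names hypothetical):
code:
```sql
-- The user/group combination is stored once...
CREATE TABLE user_group (
    user_group_id int IDENTITY(1,1) PRIMARY KEY,
    user_id  varchar(50) NOT NULL,
    group_id varchar(20) NOT NULL,
    CONSTRAINT uq_user_group UNIQUE (user_id, group_id)
);

-- ...and the fact rows carry only a narrow int FK plus the timestamp,
-- so the unique index is 4 bytes + datetime2 instead of two varchars.
CREATE TABLE user_group_event (
    user_group_id int       NOT NULL
        REFERENCES user_group (user_group_id),
    ts            datetime2 NOT NULL,
    CONSTRAINT uq_uge UNIQUE (user_group_id, ts)
);
```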
|
# ? Dec 7, 2018 20:00 |
|
Performance and viability may depend on careful optimizations within the database or the construction of the indexes when you're using tricky primary keys. When you come to realize that AbcSql no longer supports some of your other needs and you want to switch to SqlLmn++ you'll almost certainly get to rebuild your application to use a basic integer key. This problem will be worse if you have text as a key field since behavior is database-dependent.
|
# ? Dec 8, 2018 14:39 |
|
Is there any recommended reading for SQL performance tuning for beginners? I'm using SQL Server, and have a query that takes roughly 2 minutes to run. The query runs overnight and is not mission critical but I'd like to try my hand at speeding it up. I'm not quite sure where to start reading/digging. *I'm not a DBA, just an analyst pulling data.
|
# ? Dec 8, 2018 17:10 |
|
The biggest thing for reads is to EXPLAIN and see if it does any table scans. Another important pitfall to beware of is correlated subqueries.
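(Worth noting that EXPLAIN is the MySQL/Postgres spelling; SQL Server exposes the same information through SSMS's "Include Actual Execution Plan" button or the SHOWPLAN/STATISTICS settings.) A sketch of the T-SQL equivalents:
code:
```sql
-- Show logical reads and CPU/elapsed time alongside results:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
-- Or return the estimated plan without executing the query
-- (must be the only statement in its batch):
SET SHOWPLAN_XML ON;
GO
SELECT ...  -- your slow query goes here
```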
|
# ? Dec 8, 2018 18:08 |
|
Ruggan posted:I think you’re both expressing the same thing - that SQL is (mostly) declarative - and that you should generally write queries in the most human parseable format as the optimizer is usually pretty good at determining an efficient control flow. Thanks for the link, was an interesting read. rt4 posted:The biggest thing for reads is to EXPLAIN and see if it does any table scans. Another important pitfall to beware of is correlated subqueries. I'll take a look at EXPLAIN, thanks!
|
# ? Dec 9, 2018 17:53 |
|
Hughmoris posted:Is there any recommended reading for SQL performance tuning for beginners? There are a couple of red flags like functions in the WHERE clause or in JOIN conditions. Another consideration is the size of columns being used in comparisons, how many joins, the number of columns returned, and the count of rows in each table. There's also analyzing the query execution plan. Redgate and some other companies have tools that make analyzing the execution plan easier that might be helpful as a beginner. If you could anonymize it, then it might be worth posting here or on dba.stackexchange to see if anyone can see some obvious red flags. When I did query analysis ~2 years ago it was always hard to find a quick beginners guide. It always felt a bit like an artform to get decent at identifying problems.
|
# ? Dec 10, 2018 14:07 |
|
Hughmoris posted:Is there any recommended reading for SQL performance tuning for beginners? You're probably going to end up using indexes, so you should understand them https://use-the-index-luke.com/sql/preface SSMS has a plan viewer built in (a GUI for the EXPLAIN output) but it's kinda annoying to use if you've got a complicated query or stored procedure. I like https://www.sentryone.com/plan-explorer better. Both will show you where the server thinks you're missing an index, if it has that advice. Sometimes it's even good advice!
|
# ? Dec 10, 2018 15:16 |
|
Yeah it seems that it always comes down to:
- Slow reads: not enough indexes, unforeseen joins on non-indexed columns / order by, implicit cast/conversion in the WHERE, weird isolation levels set/unset
- Slow updates: too many indexes, unknown triggers, table-wide locks, stalled jobs
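One concrete instance of the implicit cast/conversion red flag, on a hypothetical table with an index on a varchar column:
code:
```sql
-- Non-sargable: the N'' literal is nvarchar, which outranks varchar in
-- type precedence, so the column gets converted and the index seek is lost.
SELECT * FROM user_group_event WHERE user_id = N'alice';

-- Sargable: literal type matches the column, so an index seek is possible.
SELECT * FROM user_group_event WHERE user_id = 'alice';
```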
|
# ? Dec 10, 2018 19:57 |
|
Depending on the DB, complicated queries can produce --- guess --- complicated query execution plans. Sometimes you can learn a lot by playing. Start with the individual pieces of the query, build them up progressively, and see how the performance degrades. When you hit an obvious bottleneck, use EXPLAIN to see what has been added or changed in the plan that's causing the misbehavior. But yeah, there's no guide as such, because queries are built from simpler forms. Once you understand optimizing the basics, the rest will follow.
|
# ? Dec 11, 2018 00:16 |
|
I'd be curious to see your query and the DDL for the tables involved
|
# ? Dec 11, 2018 00:21 |
|
Thanks for all the ideas. This is dealing with healthcare data so I err on the side of caution in regards to posting relevant details but I'll research the topics pointed out. I do have a problem that has turned out to be a stumper for me and two others I work with: I have a table that contains alarms and acknowledgements. An acknowledgement can clear multiple alarms. For a given alarm, when was the next acknowledgment that occurred? Original Table code:
code:
I've started utilizing window functions more but mainly for first and last values. This feels like I could use LEAD but I'm not sure how to use the conditional of AcknowledgeTime > AlarmTime. Hughmoris fucked around with this message at 00:49 on Dec 11, 2018 |
# ? Dec 11, 2018 00:37 |
|
Some of the general guidelines my company has used in performance tuning:

Examine the actual query plan as everyone else has said. Using the handy SSMS visualization is super useful.
- Look for thick arrows indicating a shitload of rows going through an operation that yields thin arrows indicating few rows. Those can probably be redone to filter better.
- Look for unnecessary table and index scans.
- Look for spooling.
- Look for operations with drastic differences between the estimated number of rows and the actual number.

- Though the query optimizer can and will rearrange the join orders on you, make an attempt to help it along by joining from tables with few rows to tables with many
- List join predicates in the same order as a hypothetical index on the table you are joining
- Try to leverage PK / clustered indices in join predicates
- Keep index statistics relatively up to date
- Rebuild the indices during low usage times if they need it
- Avoid subqueries
- Avoid UDFs
- Avoid Outer Apply

If you can't bash the query into submission with those steps, try breaking out bigger guns:
- Use index hints
- Use join type hints
- Option (force order)

These things take flexibility away from the query optimizer, though, which isn't really great for the long term. I can probably come off more sensibly when not phone posting, but hopefully that with everything else people have said will help.
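Syntactically, those "bigger guns" look like this in T-SQL (table and index names invented); they pin the optimizer down, so use them sparingly:
code:
```sql
SELECT o.order_id
FROM orders AS o WITH (INDEX (ix_orders_customer))  -- index hint
INNER LOOP JOIN customers AS c                      -- join type hint
    ON c.customer_id = o.customer_id
OPTION (FORCE ORDER);                               -- keep the written join order
```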
|
# ? Dec 11, 2018 00:49 |
|
Hughmoris posted:Thanks for all the ideas. This is dealing with healthcare data so I err on the side of caution in regards to posting relevant details but I'll research the topics pointed out. Ugh. I don't like schemas like this. What filters are you getting for the query? What indices are on the table? Is it really a patient name or some numerical ID? Same question for event type?
|
# ? Dec 11, 2018 01:04 |
|
Nth Doctor posted:Good stuff... Thanks for this! Nth Doctor posted:Ugh. I don't like schemas like this. I don't know on the first two questions. I need to read up on indices. Its a patient numerical ID, which is unique to the patient encounter. The event type is generic. The only unique things are the patient ID and the (relatively unique) timestamp. The gist of the basic query that returns the original table above is: SQL code:
|
# ? Dec 11, 2018 01:13 |
|
Hughmoris posted:Thanks for all the ideas. This is dealing with healthcare data so I err on the side of caution in regards to posting relevant details but I'll research the topics pointed out. Are you asking what query gets the results in "Ideal Endstate"? If so, you can join two data sets based on timestamps. The first data set is Alarms and the second is Acknowledgements. The join is pretty complex because you need the first row from the Acknowledgement set where the timestamp is greater than the Alarm timestamp. I can't remember exactly what the syntax was, but it's possible. Actually it might be cross apply. downout fucked around with this message at 02:51 on Dec 11, 2018 |
# ? Dec 11, 2018 02:48 |
|
downout posted:Are you asking what query gets the results in "Ideal Endstate"? If so, you can join two data sets based on timestamps. The first data set is Alarms and the second is Acknowledgements. The join is pretty complex because you need the first row from the Acknowledgement set where the timestamp is greater than the Alarm timestamp. I can't remember exactly what the syntax was, but it's possible. Actually it might be cross apply. Correct, looking for a query to get the Ideal EndState. I believe I figured it out using a correlated subquery. The performance is a hot garbage fire but it returns the correct results (that I've validated so far).
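A correlated-subquery version of the "next acknowledgement after each alarm" lookup might look like this (table and column names guessed, not the actual schema):
code:
```sql
-- Guessed schema: EventID = 1 for alarms, 2 for acknowledgements.
SELECT a.PatientID,
       a.EventTime AS AlarmTime,
       (SELECT MIN(e.EventTime)
        FROM clinical_event AS e
        WHERE e.PatientID = a.PatientID
          AND e.EventID   = 2
          AND e.EventTime > a.EventTime) AS AckTime
FROM clinical_event AS a
WHERE a.EventID = 1;
```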
|
# ? Dec 11, 2018 03:03 |
|
I don't think it's possible to get a correct result here without a correlated subquery
|
# ? Dec 11, 2018 04:08 |
|
you could i think........ but it would involve a while loop
|
# ? Dec 11, 2018 04:11 |
|
|
|
I took a derived table approach with a window function and had something close to working:code:
pre:
Distinct_Patients Total_Records
----------------- -------------
10001             1000000

(1 row(s) affected)

Event_ID    Total
----------- -----------
1           600373
2           399627

(2 row(s) affected)

Patient_No  Alarm       Alarm_Date              Acknowledgement Acknowledgement_Date
----------- ----------- ----------------------- --------------- -----------------------
1           1           1970-01-07 22:41:00.000 2               1970-02-25 13:28:00.000
1           1           1970-02-04 17:25:00.000 2               1970-02-25 13:28:00.000
1           1           1970-02-18 14:47:00.000 2               1970-02-25 13:28:00.000
1           1           1970-04-29 01:37:00.000 2               1970-06-09 17:43:00.000
...

This runs in 9 seconds for a million row table, including the time taken to generate the data. Nth Doctor fucked around with this message at 18:45 on Dec 11, 2018 |
# ? Dec 11, 2018 18:36 |