PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
At the risk of sounding prophetic, are you sure you really want the timestamps in the results? That is, are you really just looking for the final state for the user (a much easier problem), or do you truly need the alert and acknowledgement matched in that fashion?

And the obvious question: what happens if the sequence is (alert1, alert2, ack2, ack1)? Or (alert, alert, ack)?


Hughmoris
Apr 21, 2007
Let's go to the abyss!
I appreciate you taking the time to type all that up. I need to work my way through understanding how it works. That type of performance is awesome, because my correlated subquery takes 8+ minutes for a month of data.


PhantomOfTheCopier posted:

At the risk of sounding prophetic, are you sure you really want the timestamps in the results? That is, are you really just looking for the final state for the user (a much easier problem), or do you truly need the alert and acknowledgement matched in that fashion?

And the obvious question: what happens if the sequence is (alert1, alert2, ack2, ack1)? Or (alert, alert, ack)?

My understanding of the data is that an acknowledgement will acknowledge all of the alerts that have fired since the last acknowledgement.

The timestamps are needed because the person requesting the report wants to see stats around when during the day they typically fire, and how long on average until an alarm is acknowledged.
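
Something like this is what I'm picturing for the stats side, once the alarms and acknowledgements are paired up. A minimal sketch, where Matched_Alarms is just a stand-in name for whatever the paired result set ends up being:

code:
-- Alarm volume by hour of day, plus average minutes until acknowledgement
SELECT
  DATEPART(HOUR, Alarm_Date) AS Alarm_Hour,
  COUNT(*) AS Alarms,
  AVG(DATEDIFF(MINUTE, Alarm_Date, Acknowledgement_Date)) AS Avg_Minutes_To_Ack
FROM Matched_Alarms  -- hypothetical: the alarm/acknowledgement pairs
GROUP BY DATEPART(HOUR, Alarm_Date)
ORDER BY Alarm_Hour;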

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
Is there a quick way to convert a string in the format

'2013.06.11.PM.09.48'

to datetime in SQL Server?

For clarity, that is yyyy.mm.dd.AM/PM.hh.mm

I could break it apart into components and do various conversions and then put them back together as datetime, but I'm hoping there is something obvious that I have missed.
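
Something like this fixed-width carve-up is the break-apart approach I mean, assuming the format never varies:

code:
DECLARE @s CHAR(19) = '2013.06.11.PM.09.48';

SELECT CAST(
      REPLACE(SUBSTRING(@s, 1, 10), '.', '-')  -- date part:  '2013-06-11'
    + ' '
    + REPLACE(SUBSTRING(@s, 15, 5), '.', ':')  -- time part:  '09:48'
    + ' '
    + SUBSTRING(@s, 12, 2)                     -- meridiem:   'PM'
    AS DATETIME) AS Converted;                 -- 2013-06-11 21:48:00.000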

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Hughmoris posted:

I appreciate you taking the time to type all that up. I need to work my way through understanding how it works. That type of performance is awesome, because my correlated subquery takes 8+ minutes for a month of data.


My understanding of the data is that an acknowledgement will acknowledge all of the alerts that have fired since the last acknowledgement.

The timestamps are needed because the person requesting the report wants to see stats around when during the day they typically fire, and how long on average until an alarm is acknowledged.

Because I can't leave well enough alone, I looked into the edge cases I mentioned before and fixed them:
code:
CREATE TABLE dbo.#DocumentationTable
(
  Patient_No INT,
  Event_ID INT,
  Event_Date DATETIME
);

INSERT dbo.#DocumentationTable
(
  Patient_No,
  Event_ID,
  Event_Date
)
VALUES
  (1, 1, '1/1/2010'),
  (1, 1, '1/2/2010'),
  (1, 1, '1/3/2010'),
  (1, 2, '1/4/2010'),
  (1, 1, '1/5/2010'),
  (1, 2, '1/6/2010'),
  (1, 1, '1/7/2010'),
  (2, 2, '12/31/2009'),
  (2, 1, '1/1/2010'),
  (2, 1, '1/2/2010'),
  (2, 1, '1/3/2010'),
  (2, 2, '1/4/2010'),
  (2, 1, '1/5/2010'),
  (2, 2, '1/6/2010'),
  (2, 1, '1/7/2010');

SELECT
  COUNT(DISTINCT D.Patient_No) AS Distinct_Patients,
  COUNT(*) AS Total_Records
FROM dbo.#DocumentationTable AS D;

SELECT
  D.Event_ID,
  COUNT(*) AS Total
FROM dbo.#DocumentationTable AS D
GROUP BY D.Event_ID;

SELECT
  D2.Patient_No,
  D2.Event_ID AS Alarm,
  D2.Event_Date AS Alarm_Date,
  Ack_DT.Acknowledgement,
  Ack_DT.Acknowledgement_Date
FROM dbo.#DocumentationTable AS D2
LEFT OUTER JOIN 
(
  SELECT
    D.Patient_No,
    D.Event_ID AS Acknowledgement,
    D.Event_Date AS Acknowledgement_Date,
    LAG(D.Event_Date, 1, NULL) OVER(PARTITION BY D.Patient_No ORDER BY D.Patient_No, D.Event_Date) AS Previous_Acknowledgement_Date
  FROM dbo.#DocumentationTable AS D
  WHERE D.Event_ID = 2
) AS Ack_DT
  ON Ack_DT.Patient_No = D2.Patient_No
  AND Ack_DT.Acknowledgement_Date >= D2.Event_Date
  AND (Ack_DT.Previous_Acknowledgement_Date IS NULL OR Ack_DT.Previous_Acknowledgement_Date < D2.Event_Date)
WHERE D2.Event_ID = 1
ORDER BY D2.Patient_No, D2.Event_Date;
GO
DROP TABLE dbo.#DocumentationTable;
This will return rows for patients who have had an alarm raised but no acknowledgement yet.
As you can see with Patient 2, if for some reason there is an acknowledgement before any alarms were found in the data set, those acknowledgements are ignored.
pre:
Patient_No  Alarm       Alarm_Date              Acknowledgement Acknowledgement_Date
----------- ----------- ----------------------- --------------- -----------------------
1           1           2010-01-01 00:00:00.000 2               2010-01-04 00:00:00.000
1           1           2010-01-02 00:00:00.000 2               2010-01-04 00:00:00.000
1           1           2010-01-03 00:00:00.000 2               2010-01-04 00:00:00.000
1           1           2010-01-05 00:00:00.000 2               2010-01-06 00:00:00.000
1           1           2010-01-07 00:00:00.000 NULL            NULL
2           1           2010-01-01 00:00:00.000 2               2010-01-04 00:00:00.000
2           1           2010-01-02 00:00:00.000 2               2010-01-04 00:00:00.000
2           1           2010-01-03 00:00:00.000 2               2010-01-04 00:00:00.000
2           1           2010-01-05 00:00:00.000 2               2010-01-06 00:00:00.000
2           1           2010-01-07 00:00:00.000 NULL            NULL

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


I think I may have a simpler query. Using your same base result set:

code:
DROP TABLE IF EXISTS #DocumentationTable

CREATE TABLE #DocumentationTable
(
    Patient_No INT,
    Event_ID INT,
    Event_Date DATETIME
);

INSERT INTO #DocumentationTable
(
    Patient_No,
    Event_ID,
    Event_Date
)
VALUES
    (1, 1, '1/1/2010'),
    (1, 1, '1/2/2010'),
    (1, 1, '1/3/2010'),
    (1, 2, '1/4/2010'),
    (1, 1, '1/5/2010'),
    (1, 2, '1/6/2010'),
    (1, 1, '1/7/2010'),
    (2, 2, '12/31/2009'),
    (2, 1, '1/1/2010'),
    (2, 1, '1/2/2010'),
    (2, 1, '1/3/2010'),
    (2, 2, '1/4/2010'),
    (2, 1, '1/5/2010'),
    (2, 2, '1/6/2010'),
    (2, 1, '1/7/2010');
Here's the query, using a windowed function with a defined frame, taking advantage of the MIN aggregate ignoring NULLs:

code:
SELECT      *
FROM        (
            SELECT      Patient_No,
                        Event_ID,
                        Alarm_Date              =   Event_Date,
                        Acknowledgement_Date    =   MIN(CASE WHEN Event_ID = 2 THEN Event_Date END) 
                                                    OVER (
                                                        --Frame our aggregate by patient
                                                        PARTITION BY Patient_No 
                                                        --Order by event date
                                                        ORDER BY Event_Date 
                                                        --Look only at future rows
                                                        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
                                                    )
            FROM        #DocumentationTable
            ) DATA
WHERE       Event_ID = 1 --Alarms
ORDER BY    Patient_No,
            Alarm_Date asc
Same result set, but the query plan is more efficient (~2/3 the cost) and a little cleaner, since it eliminates any need to self-join:

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Ruggan posted:

I think I may have a simpler query. Using your same base result set:

code:
DROP TABLE IF EXISTS #DocumentationTable

CREATE TABLE #DocumentationTable
(
    Patient_No INT,
    Event_ID INT,
    Event_Date DATETIME
);

INSERT INTO #DocumentationTable
(
    Patient_No,
    Event_ID,
    Event_Date
)
VALUES
    (1, 1, '1/1/2010'),
    (1, 1, '1/2/2010'),
    (1, 1, '1/3/2010'),
    (1, 2, '1/4/2010'),
    (1, 1, '1/5/2010'),
    (1, 2, '1/6/2010'),
    (1, 1, '1/7/2010'),
    (2, 2, '12/31/2009'),
    (2, 1, '1/1/2010'),
    (2, 1, '1/2/2010'),
    (2, 1, '1/3/2010'),
    (2, 2, '1/4/2010'),
    (2, 1, '1/5/2010'),
    (2, 2, '1/6/2010'),
    (2, 1, '1/7/2010');
Here's the query, using a windowed function with a defined frame, taking advantage of the MIN aggregate ignoring NULLs:

code:
SELECT      *
FROM        (
            SELECT      Patient_No,
                        Event_ID,
                        Alarm_Date              =   Event_Date,
                        Acknowledgement_Date    =   MIN(CASE WHEN Event_ID = 2 THEN Event_Date END) 
                                                    OVER (
                                                        --Frame our aggregate by patient
                                                        PARTITION BY Patient_No 
                                                        --Order by event date
                                                        ORDER BY Event_Date 
                                                        --Look only at future rows
                                                        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
                                                    )
            FROM        #DocumentationTable
            ) DATA
WHERE       Event_ID = 1 --Alarms
ORDER BY    Patient_No,
            Alarm_Date asc
Same result set, but the query plan is more efficient (~2/3 the cost) and a little cleaner, since it eliminates any need to self-join:



Ooh I like that! I don't use windowed functions terribly often in my day to day work, and that's a really nice application of them!

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Ruggan posted:

code:
SELECT      *
FROM        (
            SELECT      Patient_No,
                        Event_ID,
                        Alarm_Date              =   Event_Date,
                        Acknowledgement_Date    =   MIN(CASE WHEN Event_ID = 2 THEN Event_Date END) 
                                                    OVER (
                                                        --Frame our aggregate by patient
                                                        PARTITION BY Patient_No 
                                                        --Order by event date
                                                        ORDER BY Event_Date 
                                                        --Look only at future rows
                                                        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
                                                    )
            FROM        #DocumentationTable
            ) DATA
WHERE       Event_ID = 1 --Alarms
ORDER BY    Patient_No,
            Alarm_Date asc

You are a wizard. Utilizing this method allowed me to query 11 months' worth of data in about 10 seconds. My correlated subquery for 1 month of data took 8+ minutes.

Nth Doctor, thanks for taking another crack at it. I need to rewrite the query using your approach to see how it compares. I also need to start working on understanding execution plans.

It's really great to see how other people approach a given problem, and to store those concepts away for future problems.

Hughmoris fucked around with this message at 03:59 on Dec 13, 2018

Joda
Apr 24, 2010

When I'm off, I just like to really let go and have fun, y'know?

Fun Shoe
Are there any good guides on when to make something a new column on a table, and when to compose with auxiliary tables? I get that if you have a one-to-many relationship you need to compose, either directly or using mapping/indexing tables, but even if you use that as a guideline, if you just take everything that is 1-to-1 and not repeated in other tables and add those as new columns, you can end up with some absolute monoliths of tables in any application of scale. On the other hand, maintaining an application where every field of an object is added through composition sounds like hell on earth.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

You basically want a "when to normalize and how much" guide, which can take different approaches depending on the goal.

This article (it was the first result; I have no idea how good the site is) goes over the basics:
https://www.essentialsql.com/get-ready-to-learn-sql-database-normalization-explained-in-simple-english/

Joda
Apr 24, 2010

When I'm off, I just like to really let go and have fun, y'know?

Fun Shoe
That article looks very relevant. Just having the terminology is very useful as well.

unknown
Nov 16, 2002
Ain't got no stinking title yet!


SQL code so bad that Cloudflare refused to allow me to post it unedited.

Developer wondering why it wasn't working with any speed. :cripes:

code:

SLEECT tsl.*,
       jdgv.jdgv_name,
       jdgv.jdgv_id,
       deetz.deetz_name,
       tsl.tsl_site                                    AS site,
       tsl.tsl_plan                                    AS plan,
       Concat(ifec.firstname, Space(1), ifec.lastname) AS ifec_Name,
       Date_format(tsl.change_datetime, "%m-%d-%y")    AS LastUpdated,
       (SLEECT Count(*) allcheck
        FROM   tslinfo
        WHERE  tslinfo.tsl_id = tsl.tsl_id)            AS allCount,
       (SLEECT Count(DISTINCT ( visitor )) alluniquecheck
        FROM   tslinfo
        WHERE  tslinfo.tsl_id = tsl.tsl_id)            AS allUniqueCount,
       (SLEECT Group_concat('', ifec.firstname, Space(1), ifec.lastname)
        FROM   shared_tsl
               LEFT JOIN ifec
                      ON shared_tsl.ifec_id = ifec.ifec_id
        WHERE  shared_tsl.tsl_id = tsl.tsl_id)         AS borrow_ifec_name,
       (SLEECT image_name
        FROM   tsl_image
        WHERE  tsl_id = tsl.tsl_id
               AND is_main_image = 1
        LIMIT  1)                                      AS image_name,
       status.description                              AS status,
       ifec.firstname                                  AS ifecfirstname,
       ifec.lastname                                   AS ifeclastname,
       ifec2.firstname                                 AS ifec2firstname,
       ifec2.lastname                                  AS ifec2lastname
FROM   tsl
       LEFT JOIN status
              ON tsl.tsl_status = status.status_id,
       tsl AS tsl5
       LEFT JOIN jdgv
              ON tsl5.jdgv_id = jdgv.jdgv_id,
       tsl AS tsl4
       LEFT JOIN deetz
              ON tsl4.deetz_id = deetz.deetz_id,
       tsl AS tsl3
       LEFT JOIN ifec
              ON tsl3.tsl_ifec_id = ifec.ifec_id,
       tsl AS tsl2
       LEFT JOIN ifec AS ifec2
              ON tsl2.tsl_ifec2_id = ifec2.ifec_id
WHERE  tsl5.tsl_id = tsl.tsl_id
       AND tsl4.tsl_id = tsl.tsl_id
       AND tsl3.tsl_id = tsl.tsl_id
       AND tsl2.tsl_id = tsl.tsl_id
       AND ifec.office_id IN ( 77 )
       AND tsl.sharetsl = true
       AND ifec.activeifec = true
       AND tsl.tsl_ifec_id <> 1533
       AND tsl.tsl_site LIKE '%words%'
        OR jdgv.jdgv_name LIKE '%words%'
        OR tsl.mls_number LIKE '%words%'
           AND tsl.tsl_status NOT IN ( 1, 2, 7, 8, 13 )
GROUP  BY tsl.tsl_id
ORDER  BY tsl.tsl_status DESC,
          tsl.tsl_plan ASC; 

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


lol, that hurts my brain

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


unknown posted:

SQL code so bad that Cloudflare refused to allow me to post it unedited.

Developer wondering why it wasn't working with any speed. :cripes:

code:

SLEECT tsl.*,
       jdgv.jdgv_name,
       jdgv.jdgv_id,
       deetz.deetz_name,
       tsl.tsl_site                                    AS site,
       tsl.tsl_plan                                    AS plan,
       Concat(ifec.firstname, Space(1), ifec.lastname) AS ifec_Name,
       Date_format(tsl.change_datetime, "%m-%d-%y")    AS LastUpdated,
       (SLEECT Count(*) allcheck
        FROM   tslinfo
        WHERE  tslinfo.tsl_id = tsl.tsl_id)            AS allCount,
       (SLEECT Count(DISTINCT ( visitor )) alluniquecheck
        FROM   tslinfo
        WHERE  tslinfo.tsl_id = tsl.tsl_id)            AS allUniqueCount,
       (SLEECT Group_concat('', ifec.firstname, Space(1), ifec.lastname)
        FROM   shared_tsl
               LEFT JOIN ifec
                      ON shared_tsl.ifec_id = ifec.ifec_id
        WHERE  shared_tsl.tsl_id = tsl.tsl_id)         AS borrow_ifec_name,
       (SLEECT image_name
        FROM   tsl_image
        WHERE  tsl_id = tsl.tsl_id
               AND is_main_image = 1
        LIMIT  1)                                      AS image_name,
       status.description                              AS status,
       ifec.firstname                                  AS ifecfirstname,
       ifec.lastname                                   AS ifeclastname,
       ifec2.firstname                                 AS ifec2firstname,
       ifec2.lastname                                  AS ifec2lastname
FROM   tsl
       LEFT JOIN status
              ON tsl.tsl_status = status.status_id,
       tsl AS tsl5
       LEFT JOIN jdgv
              ON tsl5.jdgv_id = jdgv.jdgv_id,
       tsl AS tsl4
       LEFT JOIN deetz
              ON tsl4.deetz_id = deetz.deetz_id,
       tsl AS tsl3
       LEFT JOIN ifec
              ON tsl3.tsl_ifec_id = ifec.ifec_id,
       tsl AS tsl2
       LEFT JOIN ifec AS ifec2
              ON tsl2.tsl_ifec2_id = ifec2.ifec_id
WHERE  tsl5.tsl_id = tsl.tsl_id
       AND tsl4.tsl_id = tsl.tsl_id
       AND tsl3.tsl_id = tsl.tsl_id
       AND tsl2.tsl_id = tsl.tsl_id
       AND ifec.office_id IN ( 77 )
       AND tsl.sharetsl = true
       AND ifec.activeifec = true
       AND tsl.tsl_ifec_id <> 1533
       AND tsl.tsl_site LIKE '%words%'
        OR jdgv.jdgv_name LIKE '%words%'
        OR tsl.mls_number LIKE '%words%'
           AND tsl.tsl_status NOT IN ( 1, 2, 7, 8, 13 )
GROUP  BY tsl.tsl_id
ORDER  BY tsl.tsl_status DESC,
          tsl.tsl_plan ASC; 

So does your dev realize indenting ORs doesn't make them logically nest? Or anything else about query design?
I scrolled to the bottom in the hope of keeping my sanity and saw that.

Joda
Apr 24, 2010

When I'm off, I just like to really let go and have fun, y'know?

Fun Shoe
poo poo like that is normal at my current workplace. My old boss who started the company and his early devs were all ideas guys who fancied themselves programmers. You know when entering an SP or function that was made before 2015 that you're about to see some funky stuff. Only difference is aliases are either missing or, more often than not, just a single letter, and table and column names are rife with typos and typically mix more than one language IN THE SAME NAME. My favorite is finding an old view that is actively being used, which joins to a massive table to get a value that is never used, and without a clustered index to join them on.

unknown
Nov 16, 2002
Ain't got no stinking title yet!


Nth Doctor posted:

So does your dev realize indenting ORs doesn't make them logically nest? Or anything else about query design?
I scrolled to the bottom in the hope of keeping my sanity and saw that.

Not my dev, I just got called in to investigate wtf was going on (as other people are already on holidays) and discovered the monstrosity.

Indenting is from the query clean-up/formatting tool linked in the first post. I did have to change the variable names for obvious reasons - hence the junk character names, but the logic (ha!) is all the dev's own work.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Vague question here: When writing queries, do you typically familiarize yourself with the indexes before you start, or only when the performance needs improvement?

I just learned about SP_HELPINDEX on SQL Server and am thinking I need to start using it.
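
For anyone else who hasn't seen it, the call is just a one-liner; the table name here is only an example:

code:
-- Lists each index on the table with its type and key columns
EXEC sp_helpindex 'dbo.DocumentationTable';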

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Hughmoris posted:

Vague question here: When writing queries, do you typically familiarize yourself with the indexes before you start, or only when the performance needs improvement?

I just learned about SP_HELPINDEX on SQL Server and am thinking I need to start using it.

Only when the performance is a problem. Premature performance tweaks can have detrimental effects if you lock the query optimizer out of performing its magic.

Also, make a good hygiene habit of joining on the PK / clustered index wherever possible.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Hughmoris posted:

Vague question here: When writing queries, do you typically familiarize yourself with the indexes before you start, or only when the performance needs improvement?

I just learned about SP_HELPINDEX on SQL Server and am thinking I need to start using it.

I never start by looking at the indexes. I write the query with the cleanest and most sensible syntax I can come up with, and 95% of the time that gives me a query that is good enough. At that point I’m done.

For the other 5% I’ll do some optimization work, which usually starts with looking at the query plan for things that are costing a lot: scans, key lookups, that sort of thing.

Rarely am I presented with a situation where query refactoring could lead to better performance via different index utilization, especially when you’ve written a clean query with simple joins. When it does happen, a lot of it involves nudging the optimizer to select from the right index with little tricks like filtering a column in multiple joined tables explicitly (like a date range: only pulling data from that date range in each table, i.e. two WHERE ... BETWEEN or ON ... BETWEEN clauses).
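
A tiny sketch of what I mean, with made-up table and parameter names; the range is repeated on the joined table so the optimizer can seek that table’s date index directly instead of deriving the filter through the join:

code:
DECLARE @Start DATE = '2018-01-01', @End DATE = '2018-12-31';

SELECT O.Order_ID, OL.Line_Total
FROM dbo.Orders AS O
INNER JOIN dbo.Order_Lines AS OL
  ON OL.Order_ID = O.Order_ID
  AND OL.Created_Date BETWEEN @Start AND @End  -- filter repeated on the joined table
WHERE O.Created_Date BETWEEN @Start AND @End;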

Sometimes I find that indexes that are necessary or important (like an index on an important foreign key join column) are missing and I’ll either add it if I’m dbo or bug the right person to do it. In this case, having some clout is really helpful as I don’t often have to have the “yes I know indexes have a cost” conversation and rarely request useless / overly specific indexes. If you’re making this sort of request you can bolster your argument by looking at query stats in the query store - if you find a query is running hundreds of times a day and would largely benefit from this index then that helps you convince the dbo to make it.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Nth Doctor posted:

Only when the performance is a problem. Premature performance tweaks can have detrimental effects if you lock the query optimizer out of performing its magic.

Also, make a good hygiene habit of joining on the PK / clustered index wherever possible.

Ruggan posted:

I never start by looking at the indexes. I write the query with the cleanest and most sensible syntax I can come up with, and 95% of the time that gives me a query that is good enough. At that point I’m done.

For the other 5% I’ll do some optimization work, which usually starts with looking at the query plan for things that are costing a lot: scans, key lookups, that sort of thing.

Rarely am I presented with a situation where query refactoring could lead to better performance via different index utilization, especially when you’ve written a clean query with simple joins. When it does happen, a lot of it involves nudging the optimizer to select from the right index with little tricks like filtering a column in multiple joined tables explicitly (like a date range: only pulling data from that date range in each table, i.e. two WHERE ... BETWEEN or ON ... BETWEEN clauses).

Sometimes I find that indexes that are necessary or important (like an index on an important foreign key join column) are missing and I’ll either add it if I’m dbo or bug the right person to do it. In this case, having some clout is really helpful as I don’t often have to have the “yes I know indexes have a cost” conversation and rarely request useless / overly specific indexes. If you’re making this sort of request you can bolster your argument by looking at query stats in the query store - if you find a query is running hundreds of times a day and would largely benefit from this index then that helps you convince the dbo to make it.

Thanks for the insights. I recently took a SQL Assessment exam that my software vendor requires. Thanks to this thread, I aced it. In 2018 I've gone from barely being able to spell SQL to creating executive-facing dashboards.

In 2019, I look forward to deepening my knowledge of SQL beyond simple data retrieval. Thanks for the help everyone!

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Ok, I managed to cobble together this query using a window function to calculate a running total, but I'm not 100% understanding how it works.

code:
SELECT "transaction"."id",
       "transaction"."payee_id",
       "transaction"."date",
       "transaction"."amount",
       "transaction"."memo",
       "transaction"."category_id",
       SUM("transaction"."amount") OVER (
         ORDER BY "transaction"."date" ASC
         ) AS "cumsum"
FROM "transaction"
1. I'm not specifying the bounds of the window anywhere...is it just the default for the window to be from the beginning of the result set to the current row?
2. I don't understand why the ORDER BY is a part of the OVER instead of at the end...aka FROM "transaction" ORDER BY "transaction"."date". It seems like if you order the results, then your window will automatically be over an ordered set of transactions. Obviously I'm wrong; I just don't understand this well enough to understand why I'm wrong.

edit: The next post made me realize I didn't mention that I'm using Postgres 10.1

Thermopyle fucked around with this message at 21:19 on Dec 31, 2018

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


In SQL Server it defaults to a frame of UNBOUNDED PRECEDING to CURRENT ROW if you put an ORDER BY clause on. If you hadn't ordered it, then range boundaries wouldn't make any sense, so it'd default to the entire dataset, because you haven't put a partitioning clause on either.

Window functions work by effectively joining the data on again, to allow it to be sorted and aggregated there. Your ordering on the window applies only to the results of the window, which matters for choosing the bounds of the results to aggregate, whereas a final ORDER BY clause orders the returned results.
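
Spelled out, your cumsum with that implicit default frame made explicit, which should be equivalent in Postgres:

code:
SELECT "transaction"."id",
       "transaction"."amount",
       SUM("transaction"."amount") OVER (
         ORDER BY "transaction"."date" ASC
         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  -- the implicit default
       ) AS "cumsum"
FROM "transaction";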

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Moonwolf posted:

Window functions function by effectively joining the data on again, to allow it to be sorted and aggregated there. Your ordering on the window applies only to the results of the window, which matters for the choice of bounding the results to aggregate, whereas a final order by clause orders the returned results.

I’m reasonably sure the implementation of windowed functions like this tosses out set-based operations and takes a more algorithmic (a la a read head) approach. They’re beneficial precisely because they do not need to join the full set to itself. I think.

Shadow0
Jun 16, 2008


If to live in this style is to be eccentric, it must be confessed that there is something good in eccentricity.

Grimey Drawer
Is it possible to have a database (or the server hosting the database maybe) execute some simple code when a write is performed?

I started looking at using Firebase for an Android app I'm making. I'm trying to make a database to store high scores, and I'd like it to just keep the top 10 results in the database but I don't want the clients touching other people's records. However, I'm not sure if it's possible.

If not, oh well, I can just have the client compare and then send a result if it beats another score and then just store everything forever.

I guess the other thing that could be done is having the bottom record overwritten by the new record.

NinpoEspiritoSanto
Oct 22, 2013




There are things called triggers in SQL which might be what you're looking for. I'm on mobile at the moment but can follow up later if you're still stuck.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Are you saying you’re only storing high scores in the database or you’re storing all scores in one table and high scores in another? Or what?

As mentioned, you can do triggers. However triggers suck as they’re synchronous operations. There are other options that are more technically complex as well. But I’m not convinced you need them in the first place.

Shadow0
Jun 16, 2008


If to live in this style is to be eccentric, it must be confessed that there is something good in eccentricity.

Grimey Drawer

Bundy posted:

There are things called triggers in SQL which might be what you're looking for. I'm on mobile at the moment but can follow up later if you're still stuck.

Oh, that's a good starting point, thank you!

Ruggan posted:

Are you saying you’re only storing high scores in the database or you’re storing all scores in one table and high scores in another? Or what?

As mentioned, you can do triggers. However triggers suck as they’re synchronous operations. There are other options that are more technically complex as well. But I’m not convinced you need them in the first place.

Yeah, I don't really need to do anything fancy, I don't think the database will ever exceed the free-tier limits or anything. Of course I can just sort the results and return the top 5 or 10 or whatever. I'm just trying to learn a little bit more about using databases and whatnot since I know almost nothing at this point. I'm trying to do it "right" and learn a little bit, if that makes sense.

The idea was that the database just holds the top 10 scores. Then the client gets the top 5 or so and shows them on the high score screen; and any time a client scores a new high score, it gets sent off to the database, which then theoretically will update to reflect the new set of high scores, deleting the lowest result.

wolrah
May 8, 2006
what?
You definitely don't want any situations where the client has the ability to modify someone else's score data. Many years ago Super Meat Boy did that and it didn't take long before that bit them in the rear end.

https://forums.somethingawful.com/showthread.php?noseen=0&threadid=2803713&pagenumber=258#post398884189

I'm not familiar with Firebase so I'm not sure what parts of the equation it may handle for you, but the way I'd do this would be to throw together a quick web API that sat between the database and the clients, with an endpoint for submitting high scores and another for retrieving the leaderboard. That app could then do whatever other housekeeping work you wanted it to do.

Rule #1 of internet-facing services: Never trust the client. Assume any safeguards you put in to client code will be defeated. Validate everything they send you and don't let them interact with anything they don't need to.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Triggers aren’t for what you’re trying to do. A trigger is code that fires when a database table is updated in some way (record added, deleted, updated). They are typically used for dependent updates - for example, if you had a Scores table, you might have a trigger that updates the HighScores table whenever the Scores table changes. Like I said before, I strongly dislike triggers as they are synchronous operations - the transaction does not complete until the trigger does and so the user must wait for the triggered operation to complete before regaining control. When people misuse triggers for expensive operations (like doing some aggregate across an entire table) it can really suck.
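
A bare-bones sketch of that kind of Scores-to-HighScores trigger, with made-up names; note it runs inside the caller’s transaction, which is exactly the synchronous cost I’m complaining about:

code:
CREATE TRIGGER dbo.TR_Scores_Insert
ON dbo.Scores
AFTER INSERT
AS
BEGIN
  SET NOCOUNT ON;
  -- The inserting transaction waits for this to finish
  INSERT INTO dbo.HighScores (User_Name, Score)
  SELECT I.User_Name, I.Score
  FROM inserted AS I
  WHERE I.Score > (SELECT MIN(HS.Score) FROM dbo.HighScores AS HS);
END;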

Once again, you don’t need a trigger. Triggers are for performing a dependent operation. All you need is a stored procedure. This stored procedure should accept two parameters: user and score. It should take the score and update your HighScores table directly when appropriate. You can expose this to your application through a direct SQL call, or some sort of API as wolrah mentioned.
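
A minimal sketch of that procedure, again with made-up names: insert only when the score belongs in the top 10, then trim whatever fell out:

code:
CREATE PROCEDURE dbo.Submit_Score
  @User_Name NVARCHAR(50),
  @Score INT
AS
BEGIN
  SET NOCOUNT ON;

  -- Accept the score only if the board isn't full or it beats the current minimum
  INSERT INTO dbo.HighScores (User_Name, Score)
  SELECT @User_Name, @Score
  WHERE (SELECT COUNT(*) FROM dbo.HighScores) < 10
     OR @Score > (SELECT MIN(Score) FROM dbo.HighScores);

  -- Delete anything now below the 10th-highest score
  DELETE HS
  FROM dbo.HighScores AS HS
  WHERE HS.Score < (SELECT MIN(T.Score)
                    FROM (SELECT TOP (10) Score
                          FROM dbo.HighScores
                          ORDER BY Score DESC) AS T);
END;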

Shadow0
Jun 16, 2008


If to live in this style is to be eccentric, it must be confessed that there is something good in eccentricity.

Grimey Drawer

wolrah posted:

You definitely don't want any situations where the client has the ability to modify someone else's score data. Many years ago Super Meat Boy did that and it didn't take long before that bit them in the rear end.

https://forums.somethingawful.com/showthread.php?noseen=0&threadid=2803713&pagenumber=258#post398884189

I'm not familiar with Firebase so I'm not sure what parts of the equation it may handle for you, but the way I'd do this would be to throw together a quick web API that sat between the database and the clients, with an endpoint for submitting high scores and another for retrieving the leaderboard. That app could then do whatever other housekeeping work you wanted it to do.

Web is even more terrifying to me than the database; I'm not sure I'm up to that task.

wolrah posted:

Rule #1 of internet-facing services: Never trust the client. Assume any safeguards you put in to client code will be defeated. Validate everything they send you and don't let them interact with anything they don't need to.

Yeah, I definitely assumed that!
I don't know how I would go about making sure that scores sent over were valid (I know nothing about security [Actually, it seems like I might know nothing about anything]). Maybe whenever the value is increased naturally, I store the answer as a hash, then send that with the score and have the server hash the score and compare them? Does that sound right?
I'm pretty confident approximately 0 people will download my app, but I'm trying to get in good practice.

Ruggan posted:

Triggers aren’t for what you’re trying to do. A trigger is code that fires when a database table is updated in some way (record added, deleted, updated). They are typically used for dependent updates - for example, if you had a Scores table, you might have a trigger that updates the HighScores table whenever the Scores table changes. Like I said before, I strongly dislike triggers as they are synchronous operations - the transaction does not complete until the trigger does and so the user must wait for the triggered operation to complete before regaining control. When people misuse triggers for expensive operations (like doing some aggregate across an entire table) it can really suck.

Once again, you don’t need a trigger. Triggers are for performing a dependent operation. All you need is a stored procedure. This stored procedure should accept two parameters: user and score. It should take the score and update your HighScores table directly when appropriate. You can expose this to your application through a direct SQL call, or some sort of API as wolrah mentioned.

Stored procedures look interesting, thanks! I think that might work well. Firestore/Firebase doesn't seem to have exactly that, but it does have something called "Cloud Functions". It seems pretty similar to the web API idea... and to the thing you said not to do... I'll keep looking around, haha. :negative: I think they run in the background though, so they are non-blocking. If I'm reading it correctly (I'm sure I'm not).
I guess I can expose a method like "sendScoreToServer", and then it fires off and does its thing on the server. Seems like it'll be perfect. Then I'll just have to figure out how the security mess works and limit the database to only take that call and queries. Thanks for the help, guys!

Shadow0 fucked around with this message at 15:14 on Jan 2, 2019

wolrah
May 8, 2006
what?

Shadow0 posted:

Web is even more terrifying to me than the database; I'm not sure I'm up to that task.
Whatever programming language you're most comfortable with, there is almost certainly a widely used framework or two available which handles the hard parts of dealing with HTTP and just lets you write your code surrounded by a bit of boilerplate. Since it sounds like you're treating this more as a toy for now and presumably don't want to spend money on running it, I'd look into any "serverless" options. Those basically abstract all of the web parts, including the server operation, and let you just focus on the code.

quote:

Yeah, I definitely assumed that!
I don't know how I would go about making sure that scores sent over were valid (I know nothing about security [Actually, it seems like I might know nothing about anything]). Maybe whenever the value is increased naturally, I store the answer as a hash, then send that with the score and have the server hash the score and compare them? Does that sound right?
I'm pretty confident approximately 0 people will download my app, but I'm trying to get in good practice.
By validation I mean more of ensuring the data is correctly formatted and is not being directly fed in to the database in a way that might allow injection attacks.

If the game is running entirely on the client there's no way to actually ensure the scores are legitimate. Depending on your game's design there may be ways to sanity check it (like if a good high score should be ~30,000 then a client submitting 4.2 billion is probably doing something shady) but unless the server is involved in running the game you pretty much have to trust the client as far as the score is concerned.
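
In database terms that kind of check can be a one-line guard in whatever procedure accepts the score; a T-SQL sketch, and the ceiling here is made up:

code:
-- Reject values outside the plausible range before they ever touch the table
IF @Score < 0 OR @Score > 100000
  THROW 50001, 'Implausible score submitted.', 1;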

Shadow0 posted:

Stored procedures look interesting, thanks! I think that might work well. Firestore/Firebase doesn't seem to have exactly that, but it does have something called "Cloud Functions". It seems pretty similar to the web API idea... and to the thing you said not to do... I'll keep looking around, haha. :negative: I think they run in the background though, so they are non-blocking. If I'm reading it correctly (I'm sure I'm not).
I guess I can expose a method like "sendScoreToServer", and then it fires off and does all its thing on the server. Seems like it'll be perfect. Then I'll just have to figure out how the security mess works and limit the database to only take that call and queries. Thanks for the help, guys!

Cloud functions seem to be Google's form of the "serverless" idea I mentioned above. Use HTTP triggers for your "submit score" and "get leaderboard" functions that the client connects to, then the trusted code you've set up on the Google platform does the database interaction.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Moonwolf posted:

In SQL Server it defaults to a frame of UNBOUNDED PRECEDING to CURRENT ROW if you put an ORDER BY clause on. If you hadn't ordered it, then range boundaries wouldn't make any sense, so it'd default to the entire dataset, because you haven't put a partitioning clause on either.

Window functions work by effectively joining the data on again, to allow it to be sorted and aggregated there. Your ordering on the window applies only to the results of the window, which matters for choosing the bounds of the results to aggregate, whereas a final ORDER BY clause orders the returned results.

After thinking about it for a couple days I think I get this. Thanks!

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


One of our internal data warehouses has a shitton of truncate/reloads that occur nightly. I've engineered many of them myself because they're very fast and performance is an ETL bottleneck. However, this data warehouse does not really enforce any foreign key constraints, which I have always felt was bad design. Many of our thousands of queries rely on inner joins actually finding the matching record, and so there's a definite risk to not enforcing these foreign keys.

So with that said, do any of you have strong opinions between the two?
- Enforce FK constraints at the cost of ETL speed and necessitating Merge
- Stick with Truncate / Bulk Inserts and say "gently caress it" to the foreign key constraints as they come from other systems anyway

Also, does anyone know offhand whether a Merge operation beats a Truncate / Bulk Insert at a certain rowcount threshold (obviously variable based on width/datatypes, but even a rough ballpark number)? I imagine that Merge's overhead (comparing records) costs more at a low rowcount but might beat Truncate / Bulk Insert when a low-cost PK comparison can avoid inserting or updating millions of records. But I don't know that for sure.
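
For anyone not steeped in the two patterns, skeletal versions of what I’m comparing, with made-up warehouse table names:

code:
-- Truncate/reload: fast bulk path, but the table can't be referenced by a foreign key
TRUNCATE TABLE dbo.Dim_Customer;
INSERT INTO dbo.Dim_Customer (Customer_ID, Customer_Name)
SELECT Customer_ID, Customer_Name
FROM staging.Customer;

-- MERGE: rows (and any FKs pointing at them) stay in place; only changes are touched
MERGE dbo.Dim_Customer AS T
USING staging.Customer AS S
  ON T.Customer_ID = S.Customer_ID
WHEN MATCHED AND T.Customer_Name <> S.Customer_Name THEN
  UPDATE SET Customer_Name = S.Customer_Name
WHEN NOT MATCHED BY TARGET THEN
  INSERT (Customer_ID, Customer_Name) VALUES (S.Customer_ID, S.Customer_Name)
WHEN NOT MATCHED BY SOURCE THEN
  DELETE;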

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Ugh. Five minutes before I left for Christmas, one of my customers started complaining of timeouts when running arbitrary SQL against a read-only server via ODBC.

We give their poo poo up to five minutes to select their data.

Their guy supposedly attached their SQL but actually attached their result set.

"Look man, you're getting 60k rows of data with over 900k instances of NULL values in this 9.5MB text result set. Maybe you should select less at once? The whole reason we made this tool available was to let you do these queries yourself without involving a dev like me. I'm not paid to performance tune your lovely report."

Kuule hain nussivan
Nov 27, 2008

Hey, finally I get a chance to ask a question WHERE Type = 'Stupid'!

I'm sure there's some really easy way of doing this, but I've been pondering it for a while now and can't come up with anything foolproof.

Using T-SQL (SQL Server 14 to be exact), how can I extract all values from an NVARCHAR column that adhere to the pattern 'Firstname Surname'. It doesn't even have to be that restrictive, since I really only want all values that have exactly two strings separated by a space. No other details will matter.

NihilCredo
Jun 6, 2011

Restrain anger in every possible way:
one fit of it will defame you more than many virtues will commend you

Kuule hain nussivan posted:

Hey, finally I get a chance to ask a question WHERE Type = 'Stupid'!

I'm sure there's some really easy way of doing this, but I've been pondering it for a while now and can't come up with anything foolproof.

Using T-SQL (SQL Server 14 to be exact), how can I extract all values from an NVARCHAR column that adhere to the pattern 'Firstname Surname'. It doesn't even have to be that restrictive, since I really only want all values that have exactly two strings separated by a space. No other details will matter.

Probably horrible perf but w/e

code:
select fart from butt 

-- must contain two blocks with at least one non-space character each, separated by a space
where fart like '%[^ ]% %[^ ]%'

-- cannot have three blocks in the same pattern
and fart not like '%[^ ]% %[^ ]% %[^ ]%'
Note that this will accept Firstname  Lastname (multiple spaces in the middle). If you want it to fail, make the second pattern more generous, like '% % %'.

Kuule hain nussivan
Nov 27, 2008

NihilCredo posted:

Probably horrible perf but w/e

code:
select fart from butt 

-- must contain two blocks with at least one non-space character each, separated by a space
where fart like '%[^ ]% %[^ ]%'

-- cannot have three blocks in the same pattern
and fart not like '%[^ ]% %[^ ]% %[^ ]%'
Note that this will accept Firstname  Lastname (multiple spaces in the middle). If you want it to fail, make the second pattern more generous, like '% % %'.

This'll do me just fine. It's only 5000 rows and it's meant to be an ad-hoc query so performance can gently caress right off. Thanks for the help, knew it would be something nice and simple that I've just missed.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
I wanna learn about databases for web/app backends so what is a good and cheap/free place to run one? It’ll just be for playing around with stuff so preferably free is better.

The AWS stuff is confusing as hell because there are like 500 services with descriptions reading like they do the same things. (I’ve also made an AWS account ages ago, so I don’t think the free micro server tier is available to me anymore.)

Also, what database system should I learn? It probably doesn’t matter, but if everybody uses, say, MySQL, then I guess I would prefer that.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
Postgres, Heroku. There's a free hobby tier IIRC. Or possibly MSSQL on Azure if you really want to end up in a Microsoft shop. MySQL is bad.

NinpoEspiritoSanto
Oct 22, 2013




Boris Galerkin posted:

I wanna learn about databases for web/app backends so what is a good and cheap/free place to run one?

Your computer?

Also, for non-surprising, avoid-picking-up-stupid-habits purposes, learn SQL/relational theory with Postgres. MySQL is a lot better than it used to be but still isn't truly ACID and has some insane defaults. Also, its replication sucks.

Source: My experience wrangling the drat thing on big (800k+ tables) and small for the last two decades.

If your own computer isn't feasible, any free-tier AWS EC2 will do, or Digital Ocean and OVH do decently cheap VMs, etc.

NinpoEspiritoSanto fucked around with this message at 16:53 on Jan 11, 2019


Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


MS SQL is king imo, and Express is free, capped at 10GB databases last I checked.

https://www.microsoft.com/en-us/sql-server/sql-server-editions-express

Perfect for learning or ad hoc analysis.

  • Reply