PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
At the risk of sounding prophetic, are you sure you really want the timestamps in the results? That is, are you really just looking for the final state for the user (a much easier problem), or do you truly need the alert and acknowledgement matched in that fashion?

And the obvious question: what happens if the sequence is (alert1, alert2, ack2, ack1)? Or (alert, alert, ack)?


Hughmoris
Apr 21, 2007
Let's go to the abyss!
I appreciate you taking the time to type all that up. I need to work my way through understanding how it works. That type of performance is awesome, because my correlated subquery takes 8+ minutes for a month of data.


PhantomOfTheCopier posted:

At the risk of sounding prophetic, are you sure you really want the timestamps in the results? That is, are you really just looking for the final state for the user (a much easier problem), or do you truly need the alert and acknowledgement matched in that fashion?

And the obvious question: what happens if the sequence is (alert1, alert2, ack2, ack1)? Or (alert, alert, ack)?

My understanding of the data is that an acknowledgement will acknowledge all of the alerts that have fired since the last acknowledgement.

The timestamps are needed because the person requesting the report wants to see stats around when during the day they typically fire, and how long on average until an alarm is acknowledged.
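
Something like this is what I'm picturing for the stats side, once the alarms and acknowledgements are paired up. A minimal sketch, where Matched_Alarms is just a stand-in name for whatever the paired result set ends up being:

code:
-- Alarm volume by hour of day, plus average minutes until acknowledgement
SELECT
  DATEPART(HOUR, Alarm_Date) AS Alarm_Hour,
  COUNT(*) AS Alarms,
  AVG(DATEDIFF(MINUTE, Alarm_Date, Acknowledgement_Date)) AS Avg_Minutes_To_Ack
FROM Matched_Alarms  -- hypothetical: the alarm/acknowledgement pairs
GROUP BY DATEPART(HOUR, Alarm_Date)
ORDER BY Alarm_Hour;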

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
Is there a quick way to convert a string in the format

'2013.06.11.PM.09.48'

to datetime in SQL Server?

For clarity, that is yyyy.mm.dd.AM/PM.hh.mm

I could break it apart into components and do various conversions and then put them back together as datetime, but I'm hoping there is something obvious that I have missed.
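
Something like this fixed-width carve-up is the break-apart approach I mean, assuming the format never varies:

code:
DECLARE @s CHAR(19) = '2013.06.11.PM.09.48';

SELECT CAST(
      REPLACE(SUBSTRING(@s, 1, 10), '.', '-')  -- date part:  '2013-06-11'
    + ' '
    + REPLACE(SUBSTRING(@s, 15, 5), '.', ':')  -- time part:  '09:48'
    + ' '
    + SUBSTRING(@s, 12, 2)                     -- meridiem:   'PM'
    AS DATETIME) AS Converted;                 -- 2013-06-11 21:48:00.000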

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Hughmoris posted:

I appreciate you taking the time to type all that up. I need to work my way through understanding how it works. That type of performance is awesome, because my correlated subquery takes 8+ minutes for a month of data.


My understanding of the data is that an acknowledgement will acknowledge all of the alerts that have fired since the last acknowledgement.

The timestamps are needed because the person requesting the report wants to see stats around when during the day they typically fire, and how long on average until an alarm is acknowledged.

Because I can't leave well enough alone, I looked into the edge cases I mentioned before and fixed them:
code:
CREATE TABLE dbo.#DocumentationTable
(
  Patient_No INT,
  Event_ID INT,
  Event_Date DATETIME
);

INSERT dbo.#DocumentationTable
(
  Patient_No,
  Event_ID,
  Event_Date
)
VALUES
  (1, 1, '1/1/2010'),
  (1, 1, '1/2/2010'),
  (1, 1, '1/3/2010'),
  (1, 2, '1/4/2010'),
  (1, 1, '1/5/2010'),
  (1, 2, '1/6/2010'),
  (1, 1, '1/7/2010'),
  (2, 2, '12/31/2009'),
  (2, 1, '1/1/2010'),
  (2, 1, '1/2/2010'),
  (2, 1, '1/3/2010'),
  (2, 2, '1/4/2010'),
  (2, 1, '1/5/2010'),
  (2, 2, '1/6/2010'),
  (2, 1, '1/7/2010');

SELECT
  COUNT(DISTINCT D.Patient_No) AS Distinct_Patients,
  COUNT(*) AS Total_Records
FROM dbo.#DocumentationTable AS D;

SELECT
  D.Event_ID,
  COUNT(*) AS Total
FROM dbo.#DocumentationTable AS D
GROUP BY D.Event_ID;

SELECT
  D2.Patient_No,
  D2.Event_ID AS Alarm,
  D2.Event_Date AS Alarm_Date,
  Ack_DT.Acknowledgement,
  Ack_DT.Acknowledgement_Date
FROM dbo.#DocumentationTable AS D2
LEFT OUTER JOIN 
(
  SELECT
    D.Patient_No,
    D.Event_ID AS Acknowledgement,
    D.Event_Date AS Acknowledgement_Date,
    LAG(D.Event_Date, 1, NULL) OVER(PARTITION BY D.Patient_No ORDER BY D.Patient_No, D.Event_Date) AS Previous_Acknowledgement_Date
  FROM dbo.#DocumentationTable AS D
  WHERE D.Event_ID = 2
) AS Ack_DT
  ON Ack_DT.Patient_No = D2.Patient_No
  AND Ack_DT.Acknowledgement_Date >= D2.Event_Date
  AND (Ack_DT.Previous_Acknowledgement_Date IS NULL OR Ack_DT.Previous_Acknowledgement_Date < D2.Event_Date)
WHERE D2.Event_ID = 1
ORDER BY D2.Patient_No, D2.Event_Date;
GO
DROP TABLE dbo.#DocumentationTable;
This will return rows for patients who have had an alarm raised but no acknowledgement yet.
As you can see with Patient 2, if for some reason there is an acknowledgement before any alarms were found in the data set, those acknowledgements are ignored.
pre:
Patient_No  Alarm       Alarm_Date              Acknowledgement Acknowledgement_Date
----------- ----------- ----------------------- --------------- -----------------------
1           1           2010-01-01 00:00:00.000 2               2010-01-04 00:00:00.000
1           1           2010-01-02 00:00:00.000 2               2010-01-04 00:00:00.000
1           1           2010-01-03 00:00:00.000 2               2010-01-04 00:00:00.000
1           1           2010-01-05 00:00:00.000 2               2010-01-06 00:00:00.000
1           1           2010-01-07 00:00:00.000 NULL            NULL
2           1           2010-01-01 00:00:00.000 2               2010-01-04 00:00:00.000
2           1           2010-01-02 00:00:00.000 2               2010-01-04 00:00:00.000
2           1           2010-01-03 00:00:00.000 2               2010-01-04 00:00:00.000
2           1           2010-01-05 00:00:00.000 2               2010-01-06 00:00:00.000
2           1           2010-01-07 00:00:00.000 NULL            NULL

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


I think I may have a simpler query. Using your same base result set:

code:
DROP TABLE IF EXISTS #DocumentationTable

CREATE TABLE #DocumentationTable
(
    Patient_No INT,
    Event_ID INT,
    Event_Date DATETIME
);

INSERT INTO #DocumentationTable
(
    Patient_No,
    Event_ID,
    Event_Date
)
VALUES
    (1, 1, '1/1/2010'),
    (1, 1, '1/2/2010'),
    (1, 1, '1/3/2010'),
    (1, 2, '1/4/2010'),
    (1, 1, '1/5/2010'),
    (1, 2, '1/6/2010'),
    (1, 1, '1/7/2010'),
    (2, 2, '12/31/2009'),
    (2, 1, '1/1/2010'),
    (2, 1, '1/2/2010'),
    (2, 1, '1/3/2010'),
    (2, 2, '1/4/2010'),
    (2, 1, '1/5/2010'),
    (2, 2, '1/6/2010'),
    (2, 1, '1/7/2010');
Here's the query, using a windowed function with a defined frame, taking advantage of the MIN aggregate ignoring NULLs:

code:
SELECT      *
FROM        (
            SELECT      Patient_No,
                        Event_ID,
                        Alarm_Date              =   Event_Date,
                        Acknowledgement_Date    =   MIN(CASE WHEN Event_ID = 2 THEN Event_Date END) 
                                                    OVER (
                                                        --Frame our aggregate by patient
                                                        PARTITION BY Patient_No 
                                                        --Order by event date
                                                        ORDER BY Event_Date 
                                                        --Look only at future rows
                                                        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
                                                    )
            FROM        #DocumentationTable
            ) DATA
WHERE       Event_ID = 1 --Alarms
ORDER BY    Patient_No,
            Alarm_Date asc
Same result set, but the query plan is more efficient (~2/3 the cost) and a little cleaner, since it eliminates any need to self-join:

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Ruggan posted:

I think I may have a simpler query. Using your same base result set:

code:
DROP TABLE IF EXISTS #DocumentationTable

CREATE TABLE #DocumentationTable
(
    Patient_No INT,
    Event_ID INT,
    Event_Date DATETIME
);

INSERT INTO #DocumentationTable
(
    Patient_No,
    Event_ID,
    Event_Date
)
VALUES
    (1, 1, '1/1/2010'),
    (1, 1, '1/2/2010'),
    (1, 1, '1/3/2010'),
    (1, 2, '1/4/2010'),
    (1, 1, '1/5/2010'),
    (1, 2, '1/6/2010'),
    (1, 1, '1/7/2010'),
    (2, 2, '12/31/2009'),
    (2, 1, '1/1/2010'),
    (2, 1, '1/2/2010'),
    (2, 1, '1/3/2010'),
    (2, 2, '1/4/2010'),
    (2, 1, '1/5/2010'),
    (2, 2, '1/6/2010'),
    (2, 1, '1/7/2010');
Here's the query, using a windowed function with a defined frame, taking advantage of the MIN aggregate ignoring NULLs:

code:
SELECT      *
FROM        (
            SELECT      Patient_No,
                        Event_ID,
                        Alarm_Date              =   Event_Date,
                        Acknowledgement_Date    =   MIN(CASE WHEN Event_ID = 2 THEN Event_Date END) 
                                                    OVER (
                                                        --Frame our aggregate by patient
                                                        PARTITION BY Patient_No 
                                                        --Order by event date
                                                        ORDER BY Event_Date 
                                                        --Look only at future rows
                                                        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
                                                    )
            FROM        #DocumentationTable
            ) DATA
WHERE       Event_ID = 1 --Alarms
ORDER BY    Patient_No,
            Alarm_Date asc
Same result set, but the query plan is more efficient (~2/3 the cost) and a little cleaner, since it eliminates any need to self-join:



Ooh I like that! I don't use windowed functions terribly often in my day to day work, and that's a really nice application of them!

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Ruggan posted:

code:
SELECT      *
FROM        (
            SELECT      Patient_No,
                        Event_ID,
                        Alarm_Date              =   Event_Date,
                        Acknowledgement_Date    =   MIN(CASE WHEN Event_ID = 2 THEN Event_Date END) 
                                                    OVER (
                                                        --Frame our aggregate by patient
                                                        PARTITION BY Patient_No 
                                                        --Order by event date
                                                        ORDER BY Event_Date 
                                                        --Look only at future rows
                                                        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
                                                    )
            FROM        #DocumentationTable
            ) DATA
WHERE       Event_ID = 1 --Alarms
ORDER BY    Patient_No,
            Alarm_Date asc

You are a wizard. Utilizing this method allowed me to query 11 months' worth of data in about 10 seconds. My correlated subquery for 1 month of data took 8+ minutes.

Nth Doctor, thanks for taking another crack at it. I need to rewrite the query using your approach to see how it compares. I also need to start working on understanding execution plans.

It's really great to see how other people approach a given problem, and to store those concepts away for future problems.

Hughmoris fucked around with this message at 03:59 on Dec 13, 2018

Joda
Apr 24, 2010

When I'm off, I just like to really let go and have fun, y'know?

Fun Shoe
Are there any good guides on when to make something a new column on a table, and when to compose with auxiliary tables? I get that if you have a one-to-many relationship you need to compose, either directly or using mapping/indexing tables, but even if you use that as a guideline, if you just take everything that is 1-to-1 and not repeated in other tables and add those as new columns, you can end up with some absolute monoliths of tables in any application of scale. On the other hand, maintaining an application where every field of an object is added through composition sounds like hell on earth.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

You basically want a "when to normalize and how much" guide, which can take different approaches depending on the goal.

This article (it was the first result; I have no idea how good the site is) goes over the basics:
https://www.essentialsql.com/get-ready-to-learn-sql-database-normalization-explained-in-simple-english/

Joda
Apr 24, 2010

When I'm off, I just like to really let go and have fun, y'know?

Fun Shoe
That article looks very relevant. Just having the terminology is very useful as well.

unknown
Nov 16, 2002
Ain't got no stinking title yet!


SQL code so bad that Cloudflare refused to allow me to post it unedited.

Developer wondering why it wasn't working with any speed. :cripes:

code:

SLEECT tsl.*,
       jdgv.jdgv_name,
       jdgv.jdgv_id,
       deetz.deetz_name,
       tsl.tsl_site                                    AS site,
       tsl.tsl_plan                                    AS plan,
       Concat(ifec.firstname, Space(1), ifec.lastname) AS ifec_Name,
       Date_format(tsl.change_datetime, "%m-%d-%y")    AS LastUpdated,
       (SLEECT Count(*) allcheck
        FROM   tslinfo
        WHERE  tslinfo.tsl_id = tsl.tsl_id)            AS allCount,
       (SLEECT Count(DISTINCT ( visitor )) alluniquecheck
        FROM   tslinfo
        WHERE  tslinfo.tsl_id = tsl.tsl_id)            AS allUniqueCount,
       (SLEECT Group_concat('', ifec.firstname, Space(1), ifec.lastname)
        FROM   shared_tsl
               LEFT JOIN ifec
                      ON shared_tsl.ifec_id = ifec.ifec_id
        WHERE  shared_tsl.tsl_id = tsl.tsl_id)         AS borrow_ifec_name,
       (SLEECT image_name
        FROM   tsl_image
        WHERE  tsl_id = tsl.tsl_id
               AND is_main_image = 1
        LIMIT  1)                                      AS image_name,
       status.description                              AS status,
       ifec.firstname                                  AS ifecfirstname,
       ifec.lastname                                   AS ifeclastname,
       ifec2.firstname                                 AS ifec2firstname,
       ifec2.lastname                                  AS ifec2lastname
FROM   tsl
       LEFT JOIN status
              ON tsl.tsl_status = status.status_id,
       tsl AS tsl5
       LEFT JOIN jdgv
              ON tsl5.jdgv_id = jdgv.jdgv_id,
       tsl AS tsl4
       LEFT JOIN deetz
              ON tsl4.deetz_id = deetz.deetz_id,
       tsl AS tsl3
       LEFT JOIN ifec
              ON tsl3.tsl_ifec_id = ifec.ifec_id,
       tsl AS tsl2
       LEFT JOIN ifec AS ifec2
              ON tsl2.tsl_ifec2_id = ifec2.ifec_id
WHERE  tsl5.tsl_id = tsl.tsl_id
       AND tsl4.tsl_id = tsl.tsl_id
       AND tsl3.tsl_id = tsl.tsl_id
       AND tsl2.tsl_id = tsl.tsl_id
       AND ifec.office_id IN ( 77 )
       AND tsl.sharetsl = true
       AND ifec.activeifec = true
       AND tsl.tsl_ifec_id <> 1533
       AND tsl.tsl_site LIKE '%words%'
        OR jdgv.jdgv_name LIKE '%words%'
        OR tsl.mls_number LIKE '%words%'
           AND tsl.tsl_status NOT IN ( 1, 2, 7, 8, 13 )
GROUP  BY tsl.tsl_id
ORDER  BY tsl.tsl_status DESC,
          tsl.tsl_plan ASC; 

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


lol, that hurts my brain

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


unknown posted:

SQL code so bad that Cloudflare refused to allow me to post it unedited.

Developer wondering why it wasn't working with any speed. :cripes:

code:

SLEECT tsl.*,
       jdgv.jdgv_name,
       jdgv.jdgv_id,
       deetz.deetz_name,
       tsl.tsl_site                                    AS site,
       tsl.tsl_plan                                    AS plan,
       Concat(ifec.firstname, Space(1), ifec.lastname) AS ifec_Name,
       Date_format(tsl.change_datetime, "%m-%d-%y")    AS LastUpdated,
       (SLEECT Count(*) allcheck
        FROM   tslinfo
        WHERE  tslinfo.tsl_id = tsl.tsl_id)            AS allCount,
       (SLEECT Count(DISTINCT ( visitor )) alluniquecheck
        FROM   tslinfo
        WHERE  tslinfo.tsl_id = tsl.tsl_id)            AS allUniqueCount,
       (SLEECT Group_concat('', ifec.firstname, Space(1), ifec.lastname)
        FROM   shared_tsl
               LEFT JOIN ifec
                      ON shared_tsl.ifec_id = ifec.ifec_id
        WHERE  shared_tsl.tsl_id = tsl.tsl_id)         AS borrow_ifec_name,
       (SLEECT image_name
        FROM   tsl_image
        WHERE  tsl_id = tsl.tsl_id
               AND is_main_image = 1
        LIMIT  1)                                      AS image_name,
       status.description                              AS status,
       ifec.firstname                                  AS ifecfirstname,
       ifec.lastname                                   AS ifeclastname,
       ifec2.firstname                                 AS ifec2firstname,
       ifec2.lastname                                  AS ifec2lastname
FROM   tsl
       LEFT JOIN status
              ON tsl.tsl_status = status.status_id,
       tsl AS tsl5
       LEFT JOIN jdgv
              ON tsl5.jdgv_id = jdgv.jdgv_id,
       tsl AS tsl4
       LEFT JOIN deetz
              ON tsl4.deetz_id = deetz.deetz_id,
       tsl AS tsl3
       LEFT JOIN ifec
              ON tsl3.tsl_ifec_id = ifec.ifec_id,
       tsl AS tsl2
       LEFT JOIN ifec AS ifec2
              ON tsl2.tsl_ifec2_id = ifec2.ifec_id
WHERE  tsl5.tsl_id = tsl.tsl_id
       AND tsl4.tsl_id = tsl.tsl_id
       AND tsl3.tsl_id = tsl.tsl_id
       AND tsl2.tsl_id = tsl.tsl_id
       AND ifec.office_id IN ( 77 )
       AND tsl.sharetsl = true
       AND ifec.activeifec = true
       AND tsl.tsl_ifec_id <> 1533
       AND tsl.tsl_site LIKE '%words%'
        OR jdgv.jdgv_name LIKE '%words%'
        OR tsl.mls_number LIKE '%words%'
           AND tsl.tsl_status NOT IN ( 1, 2, 7, 8, 13 )
GROUP  BY tsl.tsl_id
ORDER  BY tsl.tsl_status DESC,
          tsl.tsl_plan ASC; 

So does your dev realize indenting ORs doesn't make them logically nest? Or anything else about query design?
I scrolled to the bottom in the hope of keeping my sanity and saw that.

Joda
Apr 24, 2010

When I'm off, I just like to really let go and have fun, y'know?

Fun Shoe
poo poo like that is normal at my current workplace. My old boss who started the company and his early devs were all ideas guys who fancied themselves programmers. You know when entering an SP or function that was made before 2015 that you're about to see some funky stuff. Only difference is aliases are either missing or, more often than not, just a single letter, and table and column names are rife with typos and typically mix more than one language IN THE SAME NAME. My favorite is finding an old view that is actively being used, which joins to a massive table to get a value that is never used, and without a clustered index to join them on.

unknown
Nov 16, 2002
Ain't got no stinking title yet!


Nth Doctor posted:

So does your dev realize indenting ORs doesn't make them logically nest? Or anything else about query design?
I scrolled to the bottom in the hope of keeping my sanity and saw that.

Not my dev, I just got called in to investigate wtf was going on (as other people are already on holidays) and discovered the monstrosity.

Indenting is from the query clean-up/formatting tool linked in the first post. I did have to change the variable names for obvious reasons - hence the junk character names, but the logic (ha!) is all the dev's own work.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Vague question here: When writing queries, do you typically familiarize yourself with the indexes before you start, or only when the performance needs improvement?

I just learned about SP_HELPINDEX on SQL Server and am thinking I need to start using it.
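
For anyone else who hasn't seen it, the call is just a one-liner; the table name here is only an example:

code:
-- Lists each index on the table with its type and key columns
EXEC sp_helpindex 'dbo.DocumentationTable';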

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Hughmoris posted:

Vague question here: When writing queries, do you typically familiarize yourself with the indexes before you start, or only when the performance needs improvement?

I just learned about SP_HELPINDEX on SQL Server and am thinking I need to start using it.

Only when the performance is a problem. Premature performance tweaks can have detrimental effects if you lock the query optimizer out of performing its magic.

Also, make a good hygiene habit of joining on the PK / clustered index wherever possible.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Hughmoris posted:

Vague question here: When writing queries, do you typically familiarize yourself with the indexes before you start, or only when the performance needs improvement?

I just learned about SP_HELPINDEX on SQL Server and am thinking I need to start using it.

I never start by looking at the indexes. I write the query with the cleanest and most sensible syntax I can come up with, and 95% of the time that gives me a query that is good enough. At that point I’m done.

For the other 5% I’ll do some optimization work, which usually starts with looking at the query plan for things that are costing a lot: scans, key lookups, that sort of thing.

Rarely am I presented with a situation where query refactoring could lead to better performance via different index utilization, especially when you’ve written a clean query with simple joins. When it does happen, a lot of it involves nudging the optimizer to select from the right index with little tricks like filtering a column in multiple joined tables explicitly (like a date range: only pulling data from that date range in each table, i.e. two WHERE ... BETWEEN or ON ... BETWEEN clauses).
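
A tiny sketch of what I mean, with made-up table and parameter names; the range is repeated on the joined table so the optimizer can seek that table’s date index directly instead of deriving the filter through the join:

code:
DECLARE @Start DATE = '2018-01-01', @End DATE = '2018-12-31';

SELECT O.Order_ID, OL.Line_Total
FROM dbo.Orders AS O
INNER JOIN dbo.Order_Lines AS OL
  ON OL.Order_ID = O.Order_ID
  AND OL.Created_Date BETWEEN @Start AND @End  -- filter repeated on the joined table
WHERE O.Created_Date BETWEEN @Start AND @End;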

Sometimes I find that indexes that are necessary or important (like an index on an important foreign key join column) are missing and I’ll either add it if I’m dbo or bug the right person to do it. In this case, having some clout is really helpful as I don’t often have to have the “yes I know indexes have a cost” conversation and rarely request useless / overly specific indexes. If you’re making this sort of request you can bolster your argument by looking at query stats in the query store - if you find a query is running hundreds of times a day and would largely benefit from this index then that helps you convince the dbo to make it.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Nth Doctor posted:

Only when the performance is a problem. Premature performance tweaks can have detrimental effects if you lock the query optimizer out of performing its magic.

Also, make a good hygiene habit of joining on the PK / clustered index wherever possible.

Ruggan posted:

I never start by looking at the indexes. I write the query with the cleanest and most sensible syntax I can come up with, and 95% of the time that gives me a query that is good enough. At that point I’m done.

For the other 5% I’ll do some optimization work, which usually starts with looking at the query plan for things that are costing a lot: scans, key lookups, that sort of thing.

Rarely am I presented with a situation where query refactoring could lead to better performance via different index utilization, especially when you’ve written a clean query with simple joins. When it does happen, a lot of it involves nudging the optimizer to select from the right index with little tricks like filtering a column in multiple joined tables explicitly (like a date range: only pulling data from that date range in each table, i.e. two WHERE ... BETWEEN or ON ... BETWEEN clauses).

Sometimes I find that indexes that are necessary or important (like an index on an important foreign key join column) are missing and I’ll either add it if I’m dbo or bug the right person to do it. In this case, having some clout is really helpful as I don’t often have to have the “yes I know indexes have a cost” conversation and rarely request useless / overly specific indexes. If you’re making this sort of request you can bolster your argument by looking at query stats in the query store - if you find a query is running hundreds of times a day and would largely benefit from this index then that helps you convince the dbo to make it.

Thanks for the insights. I recently took a SQL Assessment exam that my software vendor requires. Thanks to this thread, I aced it. In 2018 I've gone from barely being able to spell SQL to creating executive-facing dashboards.

In 2019, I look forward to deepening my knowledge of SQL beyond simple data retrieval. Thanks for the help everyone!

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Ok, I managed to cobble together this query using a window function to calculate a running total, but I'm not 100% understanding how it works.

code:
SELECT "transaction"."id",
       "transaction"."payee_id",
       "transaction"."date",
       "transaction"."amount",
       "transaction"."memo",
       "transaction"."category_id",
       SUM("transaction"."amount") OVER (
         ORDER BY "transaction"."date" ASC
         ) AS "cumsum"
FROM "transaction"
1. I'm not specifying the bounds of the window anywhere...is it just the default for the window to be from the beginning of the result set to the current row?
2. I don't understand why the ORDER BY is a part of the OVER instead of at the end...aka FROM "transaction" ORDER BY "transaction"."date". It seems like if you order the results, then your window will automatically be over an ordered set of transactions. Obviously I'm wrong; I just don't understand this well enough to understand why I'm wrong.

edit: The next post made me realize I didn't mention that I'm using Postgres 10.1

Thermopyle fucked around with this message at 21:19 on Dec 31, 2018

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


In SQL Server it defaults to a frame of UNBOUNDED PRECEDING to CURRENT ROW if you put an ORDER BY clause on. If you hadn't ordered it, then range boundaries wouldn't make any sense, so it'd default to the entire dataset, because you haven't put a partitioning clause on either.

Window functions work by effectively joining the data on again, to allow it to be sorted and aggregated there. Your ordering on the window applies only to the results of the window, which matters for choosing the bounds of the results to aggregate, whereas a final ORDER BY clause orders the returned results.
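
Spelled out, your cumsum with that implicit default frame made explicit, which should be equivalent in Postgres:

code:
SELECT "transaction"."id",
       "transaction"."amount",
       SUM("transaction"."amount") OVER (
         ORDER BY "transaction"."date" ASC
         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  -- the implicit default
       ) AS "cumsum"
FROM "transaction";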

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Moonwolf posted:

Window functions function by effectively joining the data on again, to allow it to be sorted and aggregated there. Your ordering on the window applies only to the results of the window, which matters for the choice of bounding the results to aggregate, whereas a final order by clause orders the returned results.

I’m reasonably sure the implementation of windowed functions like this tosses out set-based operations and takes a more algorithmic (a la a read head) approach. They’re beneficial precisely because they do not need to join the full set to itself. I think.

Shadow0
Jun 16, 2008


If to live in this style is to be eccentric, it must be confessed that there is something good in eccentricity.

Grimey Drawer
Is it possible to have a database (or the server hosting the database maybe) execute some simple code when a write is performed?

I started looking at using Firebase for an Android app I'm making. I'm trying to make a database to store high scores, and I'd like it to just keep the top 10 results in the database but I don't want the clients touching other people's records. However, I'm not sure if it's possible.

If not, oh well, I can just have the client compare and then send a result if it beats another score and then just store everything forever.

I guess the other thing that could be done is having the bottom record overwritten by the new record.

NinpoEspiritoSanto
Oct 22, 2013




There are things called triggers in SQL which might be what you're looking for. I'm on mobile at the moment but can follow up later if you're still stuck.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Are you saying you’re only storing high scores in the database or you’re storing all scores in one table and high scores in another? Or what?

As mentioned, you can do triggers. However triggers suck as they’re synchronous operations. There are other options that are more technically complex as well. But I’m not convinced you need them in the first place.

Shadow0
Jun 16, 2008


If to live in this style is to be eccentric, it must be confessed that there is something good in eccentricity.

Grimey Drawer

Bundy posted:

There are things called triggers in SQL which might be what you're looking for. I'm on mobile at the moment but can follow up later if you're still stuck.

Oh, that's a good starting point, thank you!

Ruggan posted:

Are you saying you’re only storing high scores in the database or you’re storing all scores in one table and high scores in another? Or what?

As mentioned, you can do triggers. However triggers suck as they’re synchronous operations. There are other options that are more technically complex as well. But I’m not convinced you need them in the first place.

Yeah, I don't really need to do anything fancy, I don't think the database will ever exceed the free-tier limits or anything. Of course I can just sort the results and return the top 5 or 10 or whatever. I'm just trying to learn a little bit more about using databases and whatnot since I know almost nothing at this point. I'm trying to do it "right" and learn a little bit, if that makes sense.

The idea was that the database just holds the top 10 scores. Then the client gets the top 5 or so and shows them on the high score screen; and any time a client scores a new high score, it gets sent off to the database, which then theoretically will update to reflect the new set of high scores, deleting the lowest result.

wolrah
May 8, 2006
what?
You definitely don't want any situations where the client has the ability to modify someone else's score data. Many years ago Super Meat Boy did that and it didn't take long before that bit them in the rear end.

https://forums.somethingawful.com/showthread.php?noseen=0&threadid=2803713&pagenumber=258#post398884189

I'm not familiar with Firebase so I'm not sure what parts of the equation it may handle for you, but the way I'd do this would be to throw together a quick web API that sat between the database and the clients, with an endpoint for submitting high scores and another for retrieving the leaderboard. That app could then do whatever other housekeeping work you wanted it to do.

Rule #1 of internet-facing services: Never trust the client. Assume any safeguards you put in to client code will be defeated. Validate everything they send you and don't let them interact with anything they don't need to.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Triggers aren’t for what you’re trying to do. A trigger is code that fires when a database table is updated in some way (record added, deleted, updated). They are typically used for dependent updates - for example, if you had a Scores table, you might have a trigger that updates the HighScores table whenever the Scores table changes. Like I said before, I strongly dislike triggers as they are synchronous operations - the transaction does not complete until the trigger does and so the user must wait for the triggered operation to complete before regaining control. When people misuse triggers for expensive operations (like doing some aggregate across an entire table) it can really suck.
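
A bare-bones sketch of that kind of Scores-to-HighScores trigger, with made-up names; note it runs inside the caller’s transaction, which is exactly the synchronous cost I’m complaining about:

code:
CREATE TRIGGER dbo.TR_Scores_Insert
ON dbo.Scores
AFTER INSERT
AS
BEGIN
  SET NOCOUNT ON;
  -- The inserting transaction waits for this to finish
  INSERT INTO dbo.HighScores (User_Name, Score)
  SELECT I.User_Name, I.Score
  FROM inserted AS I
  WHERE I.Score > (SELECT MIN(HS.Score) FROM dbo.HighScores AS HS);
END;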

Once again, you don’t need a trigger. Triggers are for performing a dependent operation. All you need is a stored procedure. This stored procedure should accept two parameters: user and score. It should take the score and update your HighScores table directly when appropriate. You can expose this to your application through a direct SQL call, or some sort of API as wolrah mentioned.
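
A minimal sketch of that procedure, again with made-up names: insert only when the score belongs in the top 10, then trim whatever fell out:

code:
CREATE PROCEDURE dbo.Submit_Score
  @User_Name NVARCHAR(50),
  @Score INT
AS
BEGIN
  SET NOCOUNT ON;

  -- Accept the score only if the board isn't full or it beats the current minimum
  INSERT INTO dbo.HighScores (User_Name, Score)
  SELECT @User_Name, @Score
  WHERE (SELECT COUNT(*) FROM dbo.HighScores) < 10
     OR @Score > (SELECT MIN(Score) FROM dbo.HighScores);

  -- Delete anything now below the 10th-highest score
  DELETE HS
  FROM dbo.HighScores AS HS
  WHERE HS.Score < (SELECT MIN(T.Score)
                    FROM (SELECT TOP (10) Score
                          FROM dbo.HighScores
                          ORDER BY Score DESC) AS T);
END;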

Shadow0
Jun 16, 2008


If to live in this style is to be eccentric, it must be confessed that there is something good in eccentricity.

Grimey Drawer

wolrah posted:

You definitely don't want any situations where the client has the ability to modify someone else's score data. Many years ago Super Meat Boy did that and it didn't take long before that bit them in the rear end.

https://forums.somethingawful.com/showthread.php?noseen=0&threadid=2803713&pagenumber=258#post398884189

I'm not familiar with Firebase so I'm not sure what parts of the equation it may handle for you, but the way I'd do this would be to throw together a quick web API that sat between the database and the clients, with an endpoint for submitting high scores and another for retrieving the leaderboard. That app could then do whatever other housekeeping work you wanted it to do.

Web is even more terrifying to me than the database; I'm not sure I'm up to that task.

wolrah posted:

Rule #1 of internet-facing services: Never trust the client. Assume any safeguards you put in to client code will be defeated. Validate everything they send you and don't let them interact with anything they don't need to.

Yeah, I definitely assumed that!
I don't know how I would go about making sure that scores sent over were valid (I know nothing about security [Actually, it seems like I might know nothing about anything]). Maybe whenever the value is increased naturally, I store the answer as a hash, then send that with the score and have the server hash the score and compare them? Does that sound right?
I'm pretty confident approximately 0 people will download my app, but I'm trying to get in good practice.

Ruggan posted:

Triggers aren’t for what you’re trying to do. A trigger is code that fires when a database table is updated in some way (record added, deleted, updated). They are typically used for dependent updates - for example, if you had a Scores table, you might have a trigger that updates the HighScores table whenever the Scores table changes. Like I said before, I strongly dislike triggers as they are synchronous operations - the transaction does not complete until the trigger does and so the user must wait for the triggered operation to complete before regaining control. When people misuse triggers for expensive operations (like doing some aggregate across an entire table) it can really suck.

Once again, you don’t need a trigger. Triggers are for performing a dependent operation. All you need is a stored procedure. This stored procedure should accept two parameters: user and score. It should take the score and update your HighScores table directly when appropriate. You can expose this to your application through a direct SQL call, or some sort of API as wolrah mentioned.

Stored procedures look interesting, thanks! I think that might work well. Firestore/Firebase doesn't seem to have exactly that, but it does have something called "Cloud Functions". It seems pretty similar to the web API idea... and to the thing you said not to do... I'll keep looking around, haha. :negative: I think they run in the background though, so they are non-blocking. If I'm reading it correctly (I'm sure I'm not).
I guess I can expose a method like "sendScoreToServer", and then it fires off and does its thing on the server. Seems like it'll be perfect. Then I'll just have to figure out how the security mess works and limit the database to only take that call and queries. Thanks for the help, guys!

Shadow0 fucked around with this message at 15:14 on Jan 2, 2019

wolrah
May 8, 2006
what?

Shadow0 posted:

Web is even more terrifying to me than the database; I'm not sure I'm up to that task.
Whatever programming language you're most comfortable with, there is almost certainly a widely used framework or two available which handles the hard parts of dealing with HTTP and just lets you write your code surrounded by a bit of boilerplate. Since it sounds like you're treating this more as a toy for now and presumably don't want to spend money on running it, I'd look into any "serverless" options. Those basically abstract all of the web parts, including the server operation, and let you just focus on the code.

quote:

Yeah, I definitely assumed that!
I don't know how I would go about making sure that scores sent over were valid (I know nothing about security [Actually, it seems like I might know nothing about anything]). Maybe whenever the value is increased naturally, I store the answer as a hash, then send that with the score and have the server hash the score and compare them? Does that sound right?
I'm pretty confident approximately 0 people will download my app, but I'm trying to get in good practice.
By validation I mean more of ensuring the data is correctly formatted and is not being directly fed in to the database in a way that might allow injection attacks.

If the game is running entirely on the client there's no way to actually ensure the scores are legitimate. Depending on your game's design there may be ways to sanity check it (like if a good high score should be ~30,000 then a client submitting 4.2 billion is probably doing something shady) but unless the server is involved in running the game you pretty much have to trust the client as far as the score is concerned.
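
In database terms that kind of check can be a one-line guard in whatever procedure accepts the score; a T-SQL sketch, and the ceiling here is made up:

code:
-- Reject values outside the plausible range before they ever touch the table
IF @Score < 0 OR @Score > 100000
  THROW 50001, 'Implausible score submitted.', 1;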

Shadow0 posted:

Stored procedures look interesting, thanks! I think that might work well. Firestore/Firebase doesn't seem to have exactly that, but it does have something called "Cloud Functions". It seems pretty similar to the web API idea... and to the thing you said not to do... I'll keep looking around, haha. :negative: I think they run in the background though, so they are non-blocking. If I'm reading it correctly (I'm sure I'm not).
I guess I can expose a method like "sendScoreToServer", and then it fires off and does all its thing on the server. Seems like it'll be perfect. Then I'll just have to figure out how the security mess works and limit the database to only take that call and queries. Thanks for the help, guys!

Cloud functions seem to be Google's form of the "serverless" idea I mentioned above. Use HTTP triggers for your "submit score" and "get leaderboard" functions that the client connects to, then the trusted code you've set up on the Google platform does the database interaction.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Moonwolf posted:

In SQL Server it defaults to a frame of UNBOUNDED PRECEDING to CURRENT ROW if you put an ORDER BY clause on. If you hadn't ordered it, then range boundaries wouldn't make any sense, so it'd default to the entire dataset, because you haven't put a partitioning clause on either.

Window functions work by effectively joining the data on again, to allow it to be sorted and aggregated there. Your ordering on the window applies only to the results of the window, which matters for choosing the bounds of the results to aggregate, whereas a final ORDER BY clause orders the returned results.

After thinking about it for a couple days I think I get this. Thanks!

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


One of our internal data warehouses has a shitton of truncate/reloads that occur nightly. I've engineered many of them myself because they're very fast and performance is an ETL bottleneck. However, this data warehouse does not really enforce any foreign key constraints, which I have always felt was bad design. Many of our thousands of queries rely on inner joins actually finding the matching record, and so there's a definite risk to not enforcing these foreign keys.

So with that said, do any of you have strong opinions between the two?
- Enforce FK constraints at the cost of ETL speed and necessitating Merge
- Stick with Truncate / Bulk Inserts and say "gently caress it" to the foreign key constraints as they come from other systems anyway

Also, does anyone know offhand whether a Merge operation beats a Truncate / Bulk Insert at a certain rowcount threshold (obviously variable based on width/datatypes, but even a rough ballpark number)? I imagine that Merge's overhead (comparing records) costs more at a low rowcount but might beat Truncate / Bulk Insert when a low-cost PK comparison can avoid inserting or updating millions of records. But I don't know that for sure.
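
For anyone not steeped in the two patterns, skeletal versions of what I’m comparing, with made-up warehouse table names:

code:
-- Truncate/reload: fast bulk path, but the table can't be referenced by a foreign key
TRUNCATE TABLE dbo.Dim_Customer;
INSERT INTO dbo.Dim_Customer (Customer_ID, Customer_Name)
SELECT Customer_ID, Customer_Name
FROM staging.Customer;

-- MERGE: rows (and any FKs pointing at them) stay in place; only changes are touched
MERGE dbo.Dim_Customer AS T
USING staging.Customer AS S
  ON T.Customer_ID = S.Customer_ID
WHEN MATCHED AND T.Customer_Name <> S.Customer_Name THEN
  UPDATE SET Customer_Name = S.Customer_Name
WHEN NOT MATCHED BY TARGET THEN
  INSERT (Customer_ID, Customer_Name) VALUES (S.Customer_ID, S.Customer_Name)
WHEN NOT MATCHED BY SOURCE THEN
  DELETE;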

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Ugh. Five minutes before I left for Christmas, one of my customers started complaining of timeouts when running arbitrary SQL against a read-only server via ODBC.

We give their poo poo up to five minutes to select their data.

Their guy supposedly attached their SQL but actually attached their result set.

"Look man, you're getting 60k rows of data with over 900k instances of NULL values in this 9.5MB text result set. Maybe you should select less at once? The whole reason we made this tool available was to let you do these queries yourself without involving a dev like me. I'm not paid to performance tune your lovely report."

Kuule hain nussivan
Nov 27, 2008

Hey, finally I get a chance to ask a question WHERE Type = 'Stupid'!

I'm sure there's some really easy way of doing this, but I've been pondering it for a while now and can't come up with anything foolproof.

Using T-SQL (SQL Server 14 to be exact), how can I extract all values from an NVARCHAR column that adhere to the pattern 'Firstname Surname'. It doesn't even have to be that restrictive, since I really only want all values that have exactly two strings separated by a space. No other details will matter.

NihilCredo
Jun 6, 2011

Restrain anger in every possible way:
one fit of it will defame you more than many virtues will commend you

Kuule hain nussivan posted:

Hey, finally I get a chance to ask a question WHERE Type = 'Stupid'!

I'm sure there's some really easy way of doing this, but I've been pondering it for a while now and can't come up with anything foolproof.

Using T-SQL (SQL Server 14 to be exact), how can I extract all values from an NVARCHAR column that adhere to the pattern 'Firstname Surname'. It doesn't even have to be that restrictive, since I really only want all values that have exactly two strings separated by a space. No other details will matter.

Probably horrible perf but w/e

code:
select fart from butt 

-- must contain two blocks with at least one non-space character each, separated by a space
where fart like '%[^ ]% %[^ ]%'

-- cannot have three blocks in the same pattern
and fart not like '%[^ ]% %[^ ]% %[^ ]%'
Note that this will accept Firstname  Lastname (multiple spaces in the middle). If you want it to fail, make the second pattern more generous, like '% % %'.

Kuule hain nussivan
Nov 27, 2008

NihilCredo posted:

Probably horrible perf but w/e

code:
select fart from butt 

-- must contain two blocks with at least one non-space character each, separated by a space
where fart like '%[^ ]% %[^ ]%'

-- cannot have three blocks in the same pattern
and fart not like '%[^ ]% %[^ ]% %[^ ]%'
Note that this will accept Firstname  Lastname (multiple spaces in the middle). If you want it to fail, make the second pattern more generous, like '% % %'.

This'll do me just fine. It's only 5000 rows and it's meant to be an ad-hoc query so performance can gently caress right off. Thanks for the help, knew it would be something nice and simple that I've just missed.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
I wanna learn about databases for web/app backends so what is a good and cheap/free place to run one? It’ll just be for playing around with stuff so preferably free is better.

The AWS stuff is confusing as hell because there are like 500 services with descriptions reading like they do the same things. (I’ve also made an AWS account ages ago, so I don’t think the free micro server tier is available to me anymore.)

Also, what database system should I learn? It probably doesn’t matter, but if everybody uses, say, MySQL, then I guess I would prefer that.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
Postgres, Heroku. There's a free hobby tier IIRC. Or possibly MSSQL on Azure if you really want to end up in a Microsoft shop. MySQL is bad.

NinpoEspiritoSanto
Oct 22, 2013




Boris Galerkin posted:

I wanna learn about databases for web/app backends so what is a good and cheap/free place to run one?

Your computer?

Also, for non-surprising, avoid-picking-up-stupid-habits purposes, learn SQL/relational theory with Postgres. MySQL is a lot better than it used to be but still isn't truly ACID and has some insane defaults. Also, its replication sucks.

Source: My experience wrangling the drat thing on big (800k+ tables) and small for the last two decades.

If your own computer isn't feasible, any free-tier AWS EC2 will do, or Digital Ocean and OVH do decently cheap VMs, etc.

NinpoEspiritoSanto fucked around with this message at 16:53 on Jan 11, 2019


Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


MS SQL is king imo, and Express is free, capped at 10GB databases last I checked.

https://www.microsoft.com/en-us/sql-server/sql-server-editions-express

Perfect for learning or ad hoc analysis.

  • Reply