Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy
Ah okay I understand now. Stab #2: use a CTE and partition by the type
code:
; 
WITH data 
     ( 
          id, 
          deviceid, 
          gatewayid, 
          type, 
          value, 
          receivedtimestamp, 
          timestamp, 
          rownum 
     ) 
     AS 
     ( 
              SELECT   *, 
                       row_number() OVER (partition BY type, order by timestamp DESC) 
              FROM     parameter 
              WHERE    deviceid = 1 
     ) 
SELECT   * 
FROM     data 
WHERE    rownum = 1 
ORDER BY timestamp ASC

Adbot
ADBOT LOVES YOU

go play outside Skyler
Nov 7, 2005


kumba posted:

Ah okay I understand now. Stab #2: use a CTE and partition by the type
code:
; 
WITH data 
     ( 
          id, 
          deviceid, 
          gatewayid, 
          type, 
          value, 
          receivedtimestamp, 
          timestamp, 
          rownum 
     ) 
     AS 
     ( 
              SELECT   *, 
                       row_number() OVER (partition BY type, order by timestamp DESC) 
              FROM     parameter 
              WHERE    deviceid = 1 
     ) 
SELECT   * 
FROM     data 
WHERE    rownum = 1 
ORDER BY timestamp ASC

Thanks a bunch! This looks like a perfect solution ... unfortunately I am using MySQL 5.6 which does not seem to support CTEs and the WITH keyword :(

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

go play outside Skyler posted:

Thanks a bunch! This looks like a perfect solution ... unfortunately I am using MySQL 5.6 which does not seem to support CTEs and the WITH keyword :(

CTEs should be logically equivalent to subqueries (though they may have different performance / caching depending on the vendor). Meaning that that query is the same as:

code:
SELECT   * 
FROM  ( 
   SELECT   
      id, 
      deviceid, 
      gatewayid, 
      type, 
      value, 
      receivedtimestamp, 
      timestamp, 
      row_number() OVER (partition BY type, order by timestamp DESC) AS rownum
   FROM  parameter 
   WHERE  deviceid = 1 
) AS data
WHERE    rownum = 1 
ORDER BY timestamp ASC
You might be able to skip the subquery by using HAVING rownum = 1 instead of WHERE, I'm not sure if HAVING works with partitions.

meanieface
Mar 27, 2012

During times of universal deceit, telling the truth becomes a revolutionary act.

NihilCredo posted:

CTEs should be logically equivalent to subqueries (though they may have different performance / caching depending on the vendor). Meaning that that query is the same as:

code:
SELECT   * 
FROM  ( 
   SELECT   
      id, 
      deviceid, 
      gatewayid, 
      type, 
      value, 
      receivedtimestamp, 
      timestamp, 
      row_number() OVER (partition BY type, order by timestamp DESC) AS rownum
   FROM  parameter 
   WHERE  deviceid = 1 
) AS data
WHERE    rownum = 1 
ORDER BY timestamp ASC
You might be able to skip the subquery by using HAVING rownum = 1 instead of WHERE, I'm not sure if HAVING works with partitions.

Can’t skip the sub query.
If you want to filter by a windowing function, you’ve got to do that outside of the query creating the windowing function.

SELECT *
FROM (SELECT ROW_NUMBER() OVER(PARTITION BY CoffeeUsage ORDER BY CoffeeFlavor) AS CoffeeRow, CoffeeFlavor
FROM INeedANap)
WHERE CoffeeRow > 1

(Silly query bc posting from phone.)

meanieface
Mar 27, 2012

During times of universal deceit, telling the truth becomes a revolutionary act.

Just-In-Timeberlake posted:

can someone help me out with this pivot, I'm a loving moron when it comes to this

I want to take these results:

code:
name	col	        value
---------------------------------------
guid	name	guid
guid	value	9e1cb05b-a7ba-4283-abb4-29d0212ee303
label	name	label
label	value	Invoice issues
email	name	email
email	value	abc<at>abc.com
format	name	format
format	value	html
and pivot them into this:

code:
guid       label       email      format
------------------------------------------------------
9e1...     Invoi....   abc@...  html

This request made me feel like god is dead just in time for Yule. Bravo on that timing.

Assuming that you are doing this as part of another, larger, process and you only want *one* row at the end --

SELECT email, [format], [guid], label
FROM
(SELECT r.*
FROM #temptbl r
WHERE col = 'value'
) r
pivot(MAX([value]) FOR r.name IN ([email],[format],[guid],[label])) p

Just-In-Timeberlake
Aug 18, 2003

meanieface posted:

This request made me feel like god is dead just in time for Yule. Bravo on that timing.

Assuming that you are doing this as part of another, larger, process and you only want *one* row at the end --

SELECT email, [format], [guid], label
FROM
(SELECT r.*
FROM #temptbl r
WHERE col = 'value'
) r
pivot(MAX([value]) FOR r.name IN ([email],[format],[guid],[label])) p

thanks, I'll give this a shot after the holidays.

Just a weird requirement, but sometimes you gotta do dirty things and cash that check.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

meanieface posted:

Can’t skip the sub query.
If you want to filter by a windowing function, you’ve got to do that outside of the query creating the windowing function.

SELECT *
FROM (SELECT ROW_NUMBER() OVER(PARTITION BY CoffeeUsage ORDER BY CoffeeFlavor) AS CoffeeRow, CoffeeFlavor
FROM INeedANap)
WHERE CoffeeRow > 1

(Silly query bc posting from phone.)

this is all immaterial because go play outside Skyler said they are on MySQL v5.6 and windowing functions were added to MySQL in v8.0.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
code:
SELECT
    p.*
FROM
    Parameter AS p
    JOIN (
        SELECT Type, MAX(Timestamp) AS Latest
        FROM Parameter
        WHERE DeviceId = 1
        GROUP BY Type
    ) AS subq ON p.Type = subq.Type AND p.Timestamp = subq.Latest
WHERE
    p.DeviceId = 1

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003

Hammerite posted:

this is all immaterial because go play outside Skyler said they are on MySQL v5.6 and windowing functions were added to MySQL in v8.0.

lmao get rekt idiot, use a real database :rolleyes:

Nolgthorn
Jan 30, 2001

The pendulum of the mind alternates between sense and nonsense
I'm using http://knexjs.org/ for MySQL manipulation which is fine. But migrations are failing when I use it with MariaDB, the string indexes are the wrong length. Is there a way I can write my migrations so that they set the index length? MariaDB is supposed to be a drop in replacement for MySQL.

The node adapter for MariaDB is abandoned for almost 2 years and doesn't compile on some computers using npm install.

PierreTheMime
Dec 9, 2004

Hero of hormagaunts everywhere!
Buglord
I'm having to deal with SSIS packages for a project and the supposed SME for their team knows absolutely nothing with regards to running in command line (which we need for automation). The problem is that I'm not sure if the package is properly setting all the variables I'm passing in.

Is there an easy way to get a list of "current" variables in the log during or post-execution? Currently the variables I'm passing are enough to allow the package to run, but the tables we're expecting to update aren't and it's a bit frustrating working with a blackbox where you're uncertain how the information you're handing a tool is being used internally. I've looked over the XML and ensured that all "evaluate as expression" flags are set to false, and searched over the whole doc to ensure the variables I'm setting are being used in the correct location.

The especially frustrating thing is that certain variables are definitely being evaluated, as it's dumping the log file in the location and filename that's only specified by a variable.

Edit: nevermind, turns out the bastards had the other variables being used inside a loop with a bad static value (:argh:). I manually edited their garbage and had to replace a half-dozen static values that they swore they’d updated to variable and now it’s working, at least until they change something again.

PierreTheMime fucked around with this message at 04:00 on Dec 29, 2017

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
I'd like to create an "overlay table", where I have some immutable base set of data and then I can create a mask over the top using a second table with the same PK. In other words the view select would be:


CREATE VIEW OVERLAY_VW AS

SELECT

CASE
IF OVERLAY_TBL.COL1 IS NOT NULL THEN OVERLAY_TBL.COL1
ELSE BASE_TBL.COL1 END AS COL1

(repeat for each column)

FROM BASE_TBL LEFT JOIN OVERLAY_TBL ON BASE_TBL.PK = OVERLAY_TBL.PK


Is there a more graceful way to do that?

The use-case here is that raw data is periodically imported from a dirty external source and I am doing both scripted and manual cleanup on top of it, so it would be nice to be able to be able to just truncate and reload that table when I get new data, while not losing any cleanup that I've already done.

Paul MaudDib fucked around with this message at 03:31 on Dec 29, 2017

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Paul MaudDib posted:

I'd like to create an "overlay table", where I have some immutable base set of data and then I can create a mask over the top using a second table with the same PK. In other words the view select would be:


CREATE VIEW OVERLAY_VW AS

SELECT

CASE
IF OVERLAY_TBL.COL1 IS NOT NULL THEN OVERLAY_TBL.COL1
ELSE BASE_TBL.COL1 END AS COL1

(repeat for each column)

FROM BASE_TBL LEFT JOIN OVERLAY_TBL ON BASE_TBL.PK = OVERLAY_TBL.PK


Is there a more graceful way to do that?

The use-case here is that raw data is periodically imported from a dirty external source and I am doing both scripted and manual cleanup on top of it, so it would be nice to be able to be able to just truncate and reload that table when I get new data, while not losing any cleanup that I've already done.

COALESCE(OVERLAY_TBL.COL1, BASE_TBL.COL1) or ISNULL if you're on MSSQL.

Just-In-Timeberlake
Aug 18, 2003

meanieface posted:

This request made me feel like god is dead just in time for Yule. Bravo on that timing.

Assuming that you are doing this as part of another, larger, process and you only want *one* row at the end --

SELECT email, [format], [guid], label
FROM
(SELECT r.*
FROM #temptbl r
WHERE col = 'value'
) r
pivot(MAX([value]) FOR r.name IN ([email],[format],[guid],[label])) p

This worked by the way, so thanks!

olives black
Nov 24, 2017


LENIN.
STILL.
WON'T.
FUCK.
ME.
Anyone got suggestions for books on proper relational database design? I'm thinking about starting a new project from scratch, and I want to avoid hitting the coding stage with a flawed design.

Amazon says this one's good, but if anyone has a better suggestion I'd appreciate it.

Sorry for asking this on page 227, but I saw nothing in the OP or the first few pages on this (just one or two posts on anti-patterns).

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I need to learn how to write SQL queries for work. I just stepped through the free Stanford Relational Algebra videos and they were surprisingly engaging. Next is the SQL section of videos.

I know that often times theory can be learned in the classroom but never knowingly utilized in the field. For those who write queries, how often (if ever) do you think about relational algebra?

Hughmoris fucked around with this message at 01:55 on Dec 31, 2017

olives black
Nov 24, 2017


LENIN.
STILL.
WON'T.
FUCK.
ME.

Hughmoris posted:

I need to learn how to write SQL queries for work. I just stepped through the free Stanford Relational Algebra videos and they were surprisingly engaging. Next is the SQL section of videos.

I know that often times theory can be learned in the classroom but never knowingly utilized in the field. For those who write queries, how often (if ever) do you think about relational algebra?

Checking this out, thanks.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Hughmoris posted:

For those who write queries, how often (if ever) do you think about relational algebra?

Never. Been a SQL developer for 13 years and I do schema design as a mostly intuitive thing, which is a lovely answer, I know, but seriously: never do I think explicitly in terms of the algebra.

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy
Been a SQL analyst for 3 years, have never thought about algebra once

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

olives black posted:

Anyone got suggestions for books on proper relational database design? I'm thinking about starting a new project from scratch, and I want to avoid hitting the coding stage with a flawed design.

Amazon says this one's good, but if anyone has a better suggestion I'd appreciate it.

Sorry for asking this on page 227, but I saw nothing in the OP or the first few pages on this (just one or two posts on anti-patterns).

The answer is going to be different for transactional schema design and analytic schema design. For OLAP, The Data Warehouse Toolkit (or most things by Ralph Kimbal) is generally considered the standard. For OLTP, I'd defer to someone else' opinion.

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

Nth Doctor posted:

Never. Been a SQL developer for 13 years and I do schema design as a mostly intuitive thing, which is a lovely answer, I know, but seriously: never do I think explicitly in terms of the algebra.

Relational algebra is irrelevant to query writing and has largely been replaced by convention for schema design. If you want to design a new database, especially the query engine, or understand why the conventions are what they are, understanding the algebra becomes important.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Is there a recommended formatting/style guide I should adhere to as I start learning to write SQL queries? I'm using SSMS but the computer is pretty locked down and I don't think I'll be able to use plugins to help.

Kuule hain nussivan
Nov 27, 2008

Hughmoris posted:

Is there a recommended formatting/style guide I should adhere to as I start learning to write SQL queries? I'm using SSMS but the computer is pretty locked down and I don't think I'll be able to use plugins to help.
There doesn't seem to be a lot of official SQL style guides. http://www.sqlstyle.guide/ seems like a pretty decent read. I personally don't sign off on all of it, but most of it is common sense stuff.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
The only times I've needed to think about relational algebra are when you're trying to read a query plan and figure out what it's doing that is taking so goddamn long, and really only in a tertiary sense. It's something that's hidden "under the hood", you really shouldn't need to think about it unless you are writing a database engine yourself.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Thanks for the answers. One more beginner question: Should I be spending my time learning how to use the query builder/designer in SSMS, or should I focus instead on writing queries by hand for a while?

spiritual bypass
Feb 19, 2008

Grimey Drawer
I've never used that particular query building tool, but I've also never wanted one. SQL is usually pretty simple. It's probably much faster just to type everything by hand once you remember all the keywords and concept.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
Virtually without exception, in any type of software development the bottleneck will never be "how fast your fingers can type" and therefore any sort of GUI tool (beyond a basic IDE) is doomed to fail.

Schema management tools like PowerDesigner can be really nice for handling boilerplate schema migrations but that's the only "visual" tool I use on any sort of a consistent basis. Otherwise I just use PGadmin and DBeaver.

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Hughmoris posted:

Thanks for the answers. One more beginner question: Should I be spending my time learning how to use the query builder/designer in SSMS, or should I focus instead on writing queries by hand for a while?

I work in an office full of young support people who use SSMS four out of eight hours a day but who couldn't tell you what a trigger is. I warned them several times that by blindly following wizards and menus, instead of learning the actual SQL commands and how they work, they're practically leaving money on the table (in the form of valuable skills and future job prospects). Please don''t do the same.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

NihilCredo posted:

I work in an office full of young support people who use SSMS four out of eight hours a day but who couldn't tell you what a trigger is. I warned them several times that by blindly following wizards and menus, instead of learning the actual SQL commands and how they work, they're practically leaving money on the table (in the form of valuable skills and future job prospects). Please don''t do the same.

Thanks for the advice. I'm not interested in taking short cuts if it'll possibly handicap me down the road.

I just finished up the Wise Owl: SQL Server Queries series, and am working my way through some practice problems. Are there any SQL rooms in IRC or Discord that people hang out in, to get feedback on some simple queries I'm writing?

For this particular problem: I have a Movies DB, and I want to know who the oldest actor is for each movie. My query returns the results but are there any easy improvements to be had?



SQL code:
WITH my_select
AS (SELECT
    FilmName,
    MIN(ActorDOB) AS BirthDate
FROM tblCast AS c
INNER JOIN tblActor AS a
    ON c.CastActorID = a.ActorID
INNER JOIN tblFilm AS f
    ON c.CastFilmID = f.FilmID
GROUP BY FilmName)

SELECT
    FilmName,
    ActorName,
    ActorDOB
FROM my_select
INNER JOIN tblActor
    ON my_select.BirthDate = tblActor.ActorDOB
ORDER BY FilmName

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Hughmoris posted:

For this particular problem: I have a Movies DB, and I want to know who the oldest actor is for each movie. My query returns the results but are there any easy improvements to be had?

One mistake: your query will return, for each movie, all the actors that share the same birth date as the oldest actor in the movie, even if those actors didn't appear in the movie.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

NihilCredo posted:

One mistake: your query will return, for each movie, all the actors that share the same birth date as the oldest actor in the movie, even if those actors didn't appear in the movie.

I've been trying my hand at fixing this for the past 45 minutes, can't figure it out. Any tips? I can explain what I need to do in pseudo code but I'm not sure how to tackle it in SQL.

Hughmoris fucked around with this message at 02:58 on Jan 2, 2018

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy

Hughmoris posted:

I've been trying my hand at fixing this for the past 45 minutes, can't figure it out. Any tips? I can explain what I need to do in pseudo code but I'm not sure how to tackle it in SQL.

You want to rank the birth dates within each movie, and only select the person with the lowest birthdate for each movie, so I'd use a ranking function in your CTE:


SQL code:
WITH my_select
AS (SELECT
    FilmName,
    ActorName,
    ActorDOB AS BirthDate,
    ROW_NUMBER() OVER (PARTITION BY f.FilmID Order By a.ActorDOB asc) AS BirthDateRank
FROM tblCast AS c
INNER JOIN tblActor AS a
    ON c.CastActorID = a.ActorID
INNER JOIN tblFilm AS f
    ON c.CastFilmID = f.FilmID
)

SELECT
    FilmName,
    ActorName,
    ActorDOB
FROM my_select
WHERE BirthDateRank = 1
ORDER BY FilmName
Note: this will still run into a problem if two actors have the same birth date and they are the oldest, since this will result in something non-deterministic. You'll want to look at the various ranking functions (RANK, DENSE_RANK, ROW_NUMBER, etc) and what other criteria you'd want as a tiebreaker for the edge case.

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Hughmoris posted:

I've been trying my hand at fixing this for the past 45 minutes, can't figure it out. Any tips? I can explain what I need to do in pseudo code but I'm not sure how to tackle it in SQL.

The subquery is fine.

You don't want to join on all the actors with the same birth date, but on all the actors with the same birth date in that movie.

Add a join on tblCast to the main query.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

kumba posted:

You want to rank the birth dates within each movie, and only select the person with the lowest birthdate for each movie, so I'd use a ranking function in your CTE:


SQL code:
WITH my_select
AS (SELECT
    FilmName,
    ActorName,
    ActorDOB AS BirthDate,
    ROW_NUMBER() OVER (PARTITION BY f.FilmID Order By a.ActorDOB asc) AS BirthDateRank
FROM tblCast AS c
INNER JOIN tblActor AS a
    ON c.CastActorID = a.ActorID
INNER JOIN tblFilm AS f
    ON c.CastFilmID = f.FilmID
)

SELECT
    FilmName,
    ActorName,
    ActorDOB
FROM my_select
WHERE BirthDateRank = 1
ORDER BY FilmName
Note: this will still run into a problem if two actors have the same birth date and they are the oldest, since this will result in something non-deterministic. You'll want to look at the various ranking functions (RANK, DENSE_RANK, ROW_NUMBER, etc) and what other criteria you'd want as a tiebreaker for the edge case.

Thank you. I haven't seen ROW_NUMBER() or PARTITION BY in my adventures yet, so I'll do some reading along with the ranking functions.

NihilCredo posted:

The subquery is fine.

You don't want to join on all the actors with the same birth date, but on all the actors with the same birth date in that movie.

Add a join on tblCast to the main query.

I'll give that a try. Thanks!

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Different movies may have the same title as well.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


One problem I haven't found a great solution to is returning nested data structures.

Let's say I have two tables: Task and TaskOwner. The relationship is one-to-many, and I'm consuming the output in C# to expose it via a web API endpoint as JSON. In the example below, I'll be returning 10 rows from the Task table and all related TaskOwners.

I've used two approaches in the past, and I don't love either of them:

  • Return 10 Task rows. Get distinct list of TaskOwnerIds and return them as a second query. Map the result to the Task rows.
  • Join Task to TaskOwner and return the result. De-duplicate Task rows in the ORM, some of which will be repeated due to the one-to-many join.

What I'd really like is some way of returning the nested data structure directly from SQL - in a way, composing the object with a single query without returning duplicated data. Maybe this is possible with XML PATH, but that also seems ugly. Any novel approaches people have taken for this?

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
The novel approach would be to trust in your ORM's cache. Pull the Task data once, and whenever you access the TaskOwner property your ORM should only actually hit the DB if the TaskOwner is not already cached. Even if not, primary key lookups should be extremely fast in any database. Or if the data is sparse enough that that's a bottleneck, you do the solution with the join.

Before you go implementing something crazy, are you sure that this is actually a bottleneck for you?

Paul MaudDib fucked around with this message at 18:10 on Jan 2, 2018

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Hughmoris posted:

Thank you. I haven't seen ROW_NUMBER() or PARTITION BY in my adventures yet, so I'll do some reading along with the ranking functions.


I'll give that a try. Thanks!

Windowed functions are really awesome and powerful, and I use them a lot. However, you need to be really careful with the window's frame. The frame of a window defines the subset of rows in which the function will work. The default frame is "BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW". That basically means, when analyzing this function across all your returned rows, only analyze rows that are between the beginning of your PARTITION and the row being calculated (not the end of your PARTITION).

This can introduce a very problematic "gotchya" where your window function isn't returning the data you'd expect because the frame isn't set appropriately. If you use the LAST_VALUE() function, for example, the last row will always be the current row:



Make sure you read up on windowed functions and frames before you start using code in production.

https://www.red-gate.com/simple-talk/sql/learn-sql-server/window-functions-in-sql-server-part-2-the-frame/

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Paul MaudDib posted:

The novel approach would be to trust in your ORM's cache. Pull the Task data once, and whenever you access the TaskOwner property your ORM should only actually hit the DB if the TaskOwner is not already cached.

Before you go implementing something crazy, are you sure that this is actually a bottleneck for you?

Yeah. It's a customer facing site where I work. The tasks are basically todo items that need to be done to complete a project. There are tens of thousands of tasks per project, and about 3 million in our system total. Each of these tasks has multiple one-to-many relationships. TaskOwner is a mapping table between the Tasks and Users table, of which we have about 50k.

We currently use Entity Framework, but it imposes a costly overhead, and does not integrate well with version control due to its graphical UI. I personally really dislike EF. It also encourages our developers to tightly couple to our database structure, as they can just hook into tables directly. Due to this tight coupling, I'm nervous to change the database's structure - I'm not sure if I'll inadvertently break some developer's bad code. I'd prefer all web app interaction to go through pre-defined SQL (stored procedures).

I want to move us to using something like Dapper, which I use in some personal projects - it's a Micro-ORM most notably in use at StackOverflow. Most of our queries that hit or pull from the database are already encapsulated in stored procedures, which that supports. I can really wring some great performance out of it. All I need to do is codify all the interactions as Stored Procs and force the transition. My biggest holdup is finding a way to represent these complex nested objects while minimizing the amount of data that needs to flow around. I can do it with one of the methods I mentioned, but I wish there was a better way.

Ruggan fucked around with this message at 18:20 on Jan 2, 2018

Adbot
ADBOT LOVES YOU

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Turns out SQL has a FOR JSON PATH formatter that can be used to make nested outputs:

code:
SELECT      T.TaskId as Id,
            T.TaskName as Name,
            Owners = (
                SELECT      U.UserId as Id,
                            U.UserName as Name
                FROM        TaskOwner TU
                INNER JOIN  User U
                ON          TF.OwnerId = F.OwnerId
                WHERE       TF.TaskId = T.TaskId
                FOR JSON PATH
            )
            
FROM        Task T

FOR JSON PATH
This will output tasks in the following format:

code:
[  
   {  
      "Id":1,
      "Name":"Test",
      "Owners":[  
         {  
            "Id":1,
            "Name":"A"
         },
         {  
            "Id":2,
            "Name":"B"
         },
         {  
            "Id":3,
            "Name":"C"
         }
      ]
   },
   ...
]
This is basically exactly what I'm looking for, but I want to understand more of how the engine is doing it behind the scenes before I actually use it. Also need to performance test it.

Anybody have experience with using FOR JSON PATH?

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply