Dead Man's Ham
Dec 2, 2002
I've been tasked with writing an update script and have hit the extent of my very limited SQL knowledge.

I have this dataset and I need to add either 1002 or T002 to the b_misc1 column for rows with NEWHSA in the b_benecode column, depending on whether there is another record in the table with the same b_empno value that has either MEDNATCARE or MEDPLUS in the b_benecode column.

In this example, line 2 would need to be set to T002 and lines 5, 9, 12, 14, 16, and 18 would be set to 1002.


I'm pretty sure I can get this to work with a cursor, but from what I read that is a no-no. Does anyone have any advice on how this can be done without using a cursor?


Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Couldn't you do it with CASE?

(not looking up syntax so this could be screwy)
CASE WHEN b_benecode = 'NEWHSA' AND (subquery finding the b_empno value) THEN 'T002'
ELSE '1002'
END

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy

Dead Man's Ham posted:

I've been tasked with writing an update script and have hit the extent of my very limited SQL knowledge.

I have this dataset and I need to add either 1002 or T002 to the b_misc1 column for rows with NEWHSA in the b_benecode column, depending on whether there is another record in the table with the same b_empno value that has either MEDNATCARE or MEDPLUS in the b_benecode column.

In this example, line 2 would need to be set to T002 and lines 5, 9, 12, 14, 16, and 18 would be set to 1002.


I'm pretty sure I can get this to work with a cursor, but from what I read that is a no-no. Does anyone have any advice on how this can be done without using a cursor?

i have a solution but cloudflare or whatever keeps blocking me from posting it because of sql injection

how do i get around this or email you

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy
code:
;WIT cte
     AS (SELET m.id_col,
                m.b_empno,
                CASE
                  WHEN m.b_benecode = 'MEDNATCARE' THEN 'T002'
                  WHEN m.b_benecode = 'MEDPLUS' THEN '1002'
                END AS targetvalue
         FRO    mytable m
         WHERE  m.b_benecode IN ( 'MEDNATCARE', 'MEDPLUS' )
                AND EXITS (SELET 1
                           FRO    mytable m2
                           WHERE  m.b_empno = m2.b_empno
                                  AND m2.b_benecode = 'NEWHSA'
                                  AND m2.b_misc1 = ''
                                  -- assuming from your picture this is an empty string and not NULL
                                  -- use the next line instead of the previous line if it's NULL
                                  -- and m2.b_misc1 is null
                          ))
UPDAT  mt
SET    b_misc1 = c.targetvalue
FRO    mytable mt
       JOIN cte c
         ON mt.b_empno = c.b_empno
WHERE  mt.b_benecode = 'NEWHSA'
replace "WIT" with "WITH", "SELET" with "SELECT", "FRO" with "FROM", "UPDAT" with "UPDATE", and "EXITS" with "EXISTS", and you should be golden

sorry for double post: gently caress you, firewall

e: i hosed it up, but i fixed it to where i think it will work. i should really install management studio on my home computer

kumba fucked around with this message at 00:50 on Oct 12, 2018

Dead Man's Ham
Dec 2, 2002
Thanks for the help, everyone! I got it working.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

LLSix posted:

What steps should I take to try to optimize this query? The execution plan shows that everything is either an index seek or a primary key lookup except for a single sort triggered by the one order by clause. The sort takes up 1/3 of the total query cost so that seems like a good place to start, but neither I nor anyone else at work knows a good way to speed it up.

code:
SELECT xl.xlinkid, 
       xl.entityid2, 
       xl.relationshipid1, 
       users.userid, 
       nc.namecardid, 
       nc.namefull, 
       users.loginid, 
       '', 
       '0', 
       userrate.ratetype, 
       userrate.amtrate, 
       xtable.description, 
       nc.namefirst, 
       nc.namelast 
FROM   xlink xl, 
       namecard nc, 
       users 
       LEFT OUTER JOIN userrate 
                    ON userrate.userid = users.userid 
                       AND userrate.siteid = 'C010' 
                       AND userrate.isdefault = 1 
       LEFT OUTER JOIN xtable 
                    ON userrate.ratetypeid = xtable.xtableid 
WHERE  xl.linkclass = 'MyUsers' 
       AND xl.parententitytypeid = '400' 
       AND xl.parententityid = '0000000001X000' 
       AND users.userid = xl.entityid2 
       AND nc.namecardid = users.namecardid 
       AND xl.siteid = 'C010' 
ORDER  BY users.loginid 
loginid is not a key or indexed in users. I tried ordering it by a couple of different keys that are indexed in users instead and they all performed the same as the current query so I don't think that making an index for loginid will have an impact.

I'm pretty new at this so any advice is appreciated.


Edit: Google Drive seems to think the execution plan is an audio file, but if you download it and open it, it behaves correctly. Here's a jpg of the same thing.


You've got a clustered index scan on the userrate table, which is pretty bad (it's basically scanning the entire table). Not sure how I'd approach fixing it though. You could try adding an index on (userrate.userid, userrate.siteid, userrate.isdefault, userrate.ratetypeid) but I'm not sure if it'd help since you're selecting more columns than that. You can do exotic tricks with including non-indexed columns in the index but at that point you're making the index extremely specialized for this particular query.
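For concreteness, the index I have in mind would be something like the sketch below (index name made up; the key and INCLUDE columns are taken from the query above, so the lookup can be answered from the index alone):

code:
CREATE NONCLUSTERED INDEX IX_userrate_covering
    ON userrate (userid, siteid, isdefault)
    INCLUDE (ratetype, amtrate, ratetypeid);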

Also, try profiling the query and see if the actual costs break down like the execution plan thinks they will.

Love Stole the Day
Nov 4, 2012
Please give me free quality professional advice so I can be a baby about it and insult you

Jethro posted:

If all the data you are interested in is from the people table, why is that the outer table in the join? I guess in theory every t1.c_id should have a corresponding p1.id, so it shouldn't make that much of a difference, but it certainly looks funny. You filter out null cast_ids, but are you sure none of the entries in the array are null (I have never used Postgres or array columns, so I don't know if that's even a sensible question)?

If I want "records from one table where the id is in another table", I write a query using an IN

SQL code:
select jsonb_build_object('id', p1.id, 'name', p1.name, 'biography', p1.biography) c1 
  from people p1
 where p1.id in (
        select unnest(m3.cast_ids) c_id
        from movies m3
        where m3.cast_ids is not null 
          and m3.id = 2)
Or am I misunderstanding what you want?

PhantomOfTheCopier posted:

There's no movie id in the output rows, nor any grouping by the movie id, so without a constraint you'd get a list of people ids that appear in any movie.

Start by getting (movieid,peopleid) rows from your cast unnest, join that to people names to get (movieid, peopleid, name) duplicates and nulls allowed. jsonify.

Thanks for the guidance! I was able to use PhantomOfTheCopier's advice to connect the movie and the people appropriately, as shown below, but I'm struggling to collate the data into the array that I need so that I can throw it into a single column.

SQL code:
select t1.m_id, t1.m_name, t1.c_id, p1.name c_name
from (
   select m1.id m_id, m1.name m_name, unnest(m1.cast_ids) c_id
   from movies m1
   order by m1.id asc
) t1
inner join people p1 on t1.c_id = p1.id


Arrays make this data stuff incredibly annoying, sadly, and it throws me for a loop. My intuition tells me that I'm dancing around a simple solution but I just don't know about the one simple trick.



Full disclosure: this was for a take-home project that I failed to complete some months ago. I'm revisiting it to try and be better for next time. There's more to the take-home project of course, but it was just this one part that I wasn't able to complete: taking an array of foreign keys and effectively replacing it inline with the corresponding row data.

Love Stole the Day fucked around with this message at 21:55 on Oct 17, 2018

LLSix
Jan 20, 2010

The real power behind countless overlords

Munkeymon posted:

An index covering all the fields of XLink you're using might help a bit by eliminating that key lookup.

Great idea! This cut the cost down by 11%.


TheFluff posted:

You've got a clustered index scan on the userrate table, which is pretty bad (it's basically scanning the entire table). Not sure how I'd approach fixing it though. You could try adding an index on (userrate.userid, userrate.siteid, userrate.isdefault, userrate.ratetypeid) but I'm not sure if it'd help since you're selecting more columns than that. You can do exotic tricks with including non-indexed columns in the index but at that point you're making the index extremely specialized for this particular query.

Also, try profiling the query and see if the actual costs break down like the execution plan thinks they will.

This also helped, though not as much because by this point in the execution the result set is very small.

Thank you both for your help!

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Love Stole the Day posted:

Thanks for the guidance! I was able to use PhantomOfTheCopier's advice to connect the movie and the people appropriately, as shown below, but I'm struggling to collate the data into the array that I need so that I can throw it into a single column.

SQL code:
select t1.m_id, t1.m_name, t1.c_id, p1.name c_name
from (
   select m1.id m_id, m1.name m_name, unnest(m1.cast_ids) c_id
   from movies m1
   order by m1.id asc
) t1
inner join people p1 on t1.c_id = p1.id


Arrays make this data stuff incredibly annoying, sadly, and it throws me for a loop. My intuition tells me that I'm dancing around a simple solution but I just don't know about the one simple trick.

https://www.postgresql.org/docs/9.5/static/functions-aggregate.html

Guessing...
SQL code:

select m1.name,array_agg(p1.name)
from (...)
inner join people...
group by m1.name

You have all the data, you can now group by movie (you may need to group by movieid,moviename to avoid grouping different movies with the same name) and aggregate the people into an array. If you need different sorting in the array, you'll have to order the results of your above query first, stuff it into a cte, and do the aggregate on that.
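Spelled out against the query quoted above, it'd be something like this (untested sketch):

SQL code:
select t1.m_id, t1.m_name, array_agg(p1.name order by p1.name) as cast_names
from (
   select m1.id m_id, m1.name m_name, unnest(m1.cast_ids) c_id
   from movies m1
) t1
inner join people p1 on t1.c_id = p1.id
group by t1.m_id, t1.m_name;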

Macichne Leainig
Jul 26, 2012

by VG
I have a coworker who is arguing in favor of the One True Lookup Table implementation, specifically for data-driven ListBox/combo box values, since in our systems the key-value for list/combo boxes is int-string (basically an enum).

I have argued that this is bad DB normalization, adds maintenance headaches and generally does not solve any problems with the DB, since the data tables utilizing the lookup table should have a good FK enforcing the referential integrity. Plus, any developer with a brain can find that relationship in the DB schema, so you don't save any time or effort.

He thinks the time saved in not having to create new tables in the future is worth it. I think it's a bad idea, but I'm not skilled enough of a SQL developer to know more reasons why. Is there actually a case for an OTLT here, or is my coworker just being stubborn and dumping OOP concepts into SQL?

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Protocol7 posted:

I have a coworker who is arguing in favor of the One True Lookup Table implementation, specifically for data-driven ListBox/combo box values, since in our systems the key-value for list/combo boxes is int-string (basically an enum).

I have argued that this is bad DB normalization, adds maintenance headaches and generally does not solve any problems with the DB, since the data tables utilizing the lookup table should have a good FK enforcing the referential integrity. Plus, any developer with a brain can find that relationship in the DB schema, so you don't save any time or effort.

He thinks the time saved in not having to create new tables in the future is worth it. I think it's a bad idea, but I'm not skilled enough of a SQL developer to know more reasons why. Is there actually a case for an OTLT here, or is my coworker just being stubborn and dumping OOP concepts into SQL?

Ask them if they can guarantee that no query will generate a plan that locks it while every single other query running wants a read.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

It's kind of dependent on what's being tracked/implemented. Yes, OTLT is often used to sidestep normalization, but there are other cases where it can make sense. Generally where I've seen it used is for lots of arbitrary strings being displayed in the presentation layer; usually things like multilingual applications. So in that case I've used OTLT for, say, units, with (type specifier),(language),(unit) returning "pounds", "libra", etc. depending on the language requested.

Macichne Leainig
Jul 26, 2012

by VG

Scaramouche posted:

It's kind of dependent on what's being tracked/implemented. Yes, OTLT is often used to sidestep normalization, but there are other cases where it can make sense. Generally where I've seen it used is for lots of arbitrary strings being displayed in the presentation layer; usually things like multilingual applications. So in that case I've used OTLT for, say, units, with (type specifier),(language),(unit) returning "pounds", "libra", etc. depending on the language requested.

That makes sense, but we only target the US due to the nature of our software so there would be no reason to have localization options.

Munkeymon posted:

Ask them if they can guarantee that no query will generate a plan that locks it while every single other query running wants a read.

Easy! Add with (nolock) to each query! :downs:

Gatac
Apr 22, 2008

Fifty Cent's next biopic.
(What I hope is) a quick Oracle question: I'm currently trying to MERGE into a table that contains time slices, as in an ID foreign-keyed to another table, START as a date, END as a date, then other columns not particularly relevant to the question. My problem is scenarios where multiple rows from the source query end up matching a single row in the destination table. In those cases, I'd really like it to only treat the "oldest" (as in, lowest START date) matching entry from the source as a match, but I haven't found a way to fit that into the ON conditions of the MERGE.

I suppose I could try to put some monstrosity of an analytic function into the source query to basically already do the matching in there and poo poo out some expression to use in the ON condition, but at that point I might as well just forget about MERGE and do it in PL/SQL where I have more options. Any ideas?
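To illustrate the shape of what I mean, a rough sketch (table and column names made up, START/END renamed, and assuming the match key is just the ID; source_rows stands in for the real source query):

code:
MERGE INTO timeslice dst
USING (SELECT id, start_dt, end_dt
       FROM (SELECT s.id, s.start_dt, s.end_dt,
                    ROW_NUMBER() OVER (PARTITION BY s.id
                                       ORDER BY s.start_dt) rn
             FROM source_rows s)
       WHERE rn = 1) src
ON (dst.id = src.id)
WHEN MATCHED THEN
  UPDATE SET dst.end_dt = src.end_dt;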

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Protocol7 posted:

That makes sense, but we only target the US due to the nature of our software so there would be no reason to have localization options.




Yeah, iunno then. We have a similar case where every location specifier has to end up mapped to an ISO code when possible. That could have easily turned into an OTLT kind of implementation, but we did the more standard joins/lookup table approach.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Protocol7 posted:

I have a coworker who is arguing in favor of the One True Lookup Table implementation, ..


He thinks the time saved in not having to create new tables in the future is worth it... or is my coworker just being stubborn and dumping OOP concepts into SQL?
Coworker is front loading: It's nice to avoid time spent creating new tables, but what about every other activity? If you drop a table or do a reorganization, you'll be impacting the global lookup. If you need to separate APIs, change ownership, access privileges, authorization for modifications, you have no data separation and will be stuck doing months of reorg work. If you need to shard (ugh), same story. Benefits of indexing on values will be lost since the app always has to obtain the full list per use case.

It doesn't seem very OOP to me. OTLT is closed for extension. That is, you can't alter or add to the lookup result, such as in the aforementioned "value/unit" scenario. If you have to i18n one of the users, you can't add those fields without forcing nulls on all the others.

Yeah, it's lazy. I see it more as an Agile practice: If we ever need a change, we'll just rebuild everything.

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Protocol7 posted:

I have a coworker who is arguing in favor of the One True Lookup Table implementation, specifically for data-driven ListBox/combo box values, since in our systems the key-value for list/combo boxes is int-string (basically an enum).

I have argued that this is bad DB normalization, adds maintenance headaches and generally does not solve any problems with the DB, since the data tables utilizing the lookup table should have a good FK enforcing the referential integrity. Plus, any developer with a brain can find that relationship in the DB schema, so you don't save any time or effort.

He thinks the time saved in not having to create new tables in the future is worth it. I think it's a bad idea, but I'm not skilled enough of a SQL developer to know more reasons why. Is there actually a case for an OTLT here, or is my coworker just being stubborn and dumping OOP concepts into SQL?

I can think of two other benefits to OTLT:
1) If users can create their own lookup types, it's probably easier to use OTLT than to try to write code to do the DDL. (Though maybe I'm wrong and there are ORMs that can handle that sort of thing well. I don't do a lot of application development, so maybe I have an incorrect conception of the state of tools out there.)
2) If you have some sort of pre-fetch-and-cache of your lookups (or any other kind of process that takes your data from the DB and puts it somewhere else, like some kind of reporting ETL), that process can pick up new lookups as soon as they're added to the table, instead of needing to edit the process, or remember to add the new table to a list somewhere, or make sure you use the correct naming convention to get it picked up automatically.

That said, how long does it take to create a new lookup table? You open up the ddl script for the last one you did, change the table name and maybe some column names if you're feeling ambitious, and hit execute. Is that 12 seconds you save not going to be given back instantly by the fact that you now need to manually document which lookups are used where? As you said, with separate tables and foreign keys, the relationships are documented by the db itself, so you don't have to go hunting for whatever other table or spreadsheet or word doc or whatever has the (almost certainly out of date) relationships in it.
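(For scale, the entire script in question, with made-up names:)

code:
CREATE TABLE dbo.OrderStatus (
    OrderStatusId int          NOT NULL PRIMARY KEY,
    Description   nvarchar(50) NOT NULL
);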

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Jethro posted:

That said, how long does it take to create a new lookup table? You open up the ddl script for the last one you did, change the table name and maybe some column names if you're feeling ambitious, and hit execute.

Pretty much this. If your coworker is worried about the creation and maintenance of basic Id-Name tables, their opinion is pretty much auto-invalidated.

This is exactly why every team project needs a Data Access Layer - abstract the implementation details of the database away from the people who don’t understand the how or why.

OTLT is trash, scales poorly, and is a good way to cause yourself a lot of pain in the future when you inevitably refactor code or change or retire a lookup type.

Macichne Leainig
Jul 26, 2012

by VG
His argument is "If you're using a template to create a lookup table and all you're doing is changing the column names but keeping the table definition effectively the same, why make another table?"

He also created an SSMS code snippet for the DDL to create the table... so, the level of effort he has created for himself to define a new table is even smaller... :rolleyes:

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Protocol7 posted:

His argument is "If you're using a template to create a lookup table and all you're doing is changing the column names but keeping the table definition effectively the same, why make another table?"

He also created an SSMS code snippet for the DDL to create the table... so, the level of effort he has created for himself to define a new table is even smaller... :rolleyes:

Concede that his idea is great and that you can take it a step further by reducing the whole database to

code:
CREATE TABLE Data (
   Id int IDENTITY NOT NULL,
   Value nvarchar(MAX)
)
and never make a new table again!

Macichne Leainig
Jul 26, 2012

by VG
Hey that sounds like a great idea! You could almost call each Value field a document, and call that a document-oriented table... Hell, go the full mile, and call it a document-oriented database. I think we're onto something... :thunk:

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Could be the next big thing - crazier things have happened

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Munkeymon posted:

Ask them if they can guarantee that no query will generate a plan that locks it while every single other query running wants a read.

:laugh:

Paper With Lines
Aug 21, 2013

The snozzberries taste like snozzberries!

Munkeymon posted:

Concede that his idea is great and that you can take it a step further by reducing whole database to

code:
CREATE TABLE Data (
   Id int IDENTITY NOT NULL,
   Value nvarchar(MAX)
)
and never make a new table again!

In the industry I work in, I am pretty sure that is how the software that runs our manufacturing plant (which is a mainstream piece of software) stores historical values of the plant's instruments. Whenever I want to look something up it is like:

code:
SELECT TagName, Value
FROM   AnalogHistory
WHERE  TagName = 'Instrument Tag Name'
There are other WHERE clauses too, for things like date ranges and stuff, but this is the gist of it. If you pull multiple tag names, the output is in long format. If you select too long of a date range, the admin can tell it is lagging the system.

disclaimer: I use SQL, but I am not a DBA or anything, so I could be missing some technical detail here; this is really what it looks like to me, though.

JHVH-1
Jun 28, 2002
I'm not a DBA but I know mysql and mongodb pretty well from an ops perspective... but now I have a postgres thing to deal with.

So basically we have a production environment and want to take that database and mirror it in staging. My plan is to come up with a docker image to run this job nightly. On mysql this would be easy, but I am not familiar with postgres, so there are different permissions, and tables get locked on database dump or import. The nodejs app stays connected while running, and that caused issues when I was trying to copy the database the other day.

Is there an easy way to handle this? My other alternative is to somehow automate using a snapshot and replacing the RDS endpoint with a new database instance, but then it also has the same grants and database names, so it wouldn't be perfect either.

Are there any articles or documents I should read up on to better understand this stuff, or maybe some tools other than what comes with pgsql?

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003
we just use pg_basebackup and run a sql script afterwards to rename and set permissions as necessary

Sistergodiva
Jan 3, 2006

I'm like you,
I have no shame.

Ok, so I have a list of skills as a csv file. Each row starts with a skill and then has 5-10ish related skills. These related skills also show up as the first entry in their own row, with the other skills in their group as related.

I am thinking I should have a table where I give each skill an id tied to the name. Then I am thinking a skill-to-skill table with two skill ids per row to set relations?

My biggest hurdle right now is figuring out how to import this csv. Getting the first skill in each row into a list with an id is no problem, but I can't think of a good way to create all the relationships.

This is mysql btw.

Kuule hain nussivan
Nov 27, 2008

Sistergodiva posted:

Ok, so I have a list of skills as a csv file. Each row starts with a skill and then has 5-10ish related skills. These related skills also show up as the first entry in their own row, with the other skills in their group as related.

I am thinking I should have a table where I give each skill an id tied to the name. Then I am thinking a skill-to-skill table with two skill ids per row to set relations?

My biggest hurdle right now is figuring out how to import this csv. Getting the first skill in each row into a list with an id is no problem, but I can't think of a good way to create all the relationships.

This is mysql btw.
Can you make a copy of the file, make a regex to change all instances of your separator after the first one on a line to something else, and then handle the related skills as a single string?

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Sistergodiva posted:

Ok, so I have a list of skills as a csv file. Each row starts with a skill and then has 5-10ish related skills. These related skills also show up as the first entry in their own row, with the other skills in their group as related.

I am thinking I should have a table where I give each skill an id tied to the name. Then I am thinking a skill-to-skill table with two skill ids per row to set relations?

My biggest hurdle right now is figuring out how to import this csv. Getting the first skill in each row into a list with an id is no problem, but I can't think of a good way to create all the relationships.

This is mysql btw.

So parsing the lines like Kuule hain nussivan suggests could work, but I would break the related skills up into rows as soon as you can rather than breaking first normal form.
Schema-wise, unless you want each skill relationship to have a primary skill and a secondary skill, you should probably do something like:
pre:
Skill
  Skill_Key INT IDENTITY,
  Skill VARCHAR(blah),
  Random_other_shit_1 DataType,
  Random_other_shit_2 DataType,
  ...
  
Skill_Group
  Skill_Group_Key INT IDENTITY,
  Skill_Group_Name VARCHAR(Blah) -- if you need to give a group of related skills a name
  
Skill_Group_Member
  Skill_Group_Key INT,
  Skill_Key INT

FKs should go where they appear natural. You could also put a unique index on Skill_Group_Member if you don't want to give the same skill membership in the group multiple times.
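For the import itself, a rough MySQL sketch of the staging approach (the file name and all table/column names are made up; it assumes at most 10 related skills per row, a pairwise relation table like you described, and that local_infile is enabled):

code:
-- made-up normalized tables
CREATE TABLE skill (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL UNIQUE
);

CREATE TABLE skill_relation (
  skill_id   INT NOT NULL,
  related_id INT NOT NULL,
  PRIMARY KEY (skill_id, related_id)
);

-- stage the raw CSV; short rows just leave the trailing rel columns NULL
CREATE TABLE skill_staging (
  skill VARCHAR(255),
  rel1 VARCHAR(255), rel2 VARCHAR(255), rel3 VARCHAR(255), rel4 VARCHAR(255), rel5  VARCHAR(255),
  rel6 VARCHAR(255), rel7 VARCHAR(255), rel8 VARCHAR(255), rel9 VARCHAR(255), rel10 VARCHAR(255)
);

LOAD DATA LOCAL INFILE 'skills.csv'
INTO TABLE skill_staging
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

-- every skill already appears as the first column of its own row
INSERT IGNORE INTO skill (name) SELECT DISTINCT skill FROM skill_staging;

-- unpivot the related columns into (skill, related) pairs and map names to ids
INSERT INTO skill_relation (skill_id, related_id)
SELECT s.id, r.id
FROM (
  SELECT skill, rel1 AS related FROM skill_staging
  UNION ALL SELECT skill, rel2  FROM skill_staging
  UNION ALL SELECT skill, rel3  FROM skill_staging
  UNION ALL SELECT skill, rel4  FROM skill_staging
  UNION ALL SELECT skill, rel5  FROM skill_staging
  UNION ALL SELECT skill, rel6  FROM skill_staging
  UNION ALL SELECT skill, rel7  FROM skill_staging
  UNION ALL SELECT skill, rel8  FROM skill_staging
  UNION ALL SELECT skill, rel9  FROM skill_staging
  UNION ALL SELECT skill, rel10 FROM skill_staging
) u
JOIN skill s ON s.name = u.skill
JOIN skill r ON r.name = u.related
WHERE u.related IS NOT NULL AND u.related <> '';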

JHVH-1
Jun 28, 2002

DELETE CASCADE posted:

we just use pg_basebackup and run a sql script afterwards to rename and set permissions as necessary

Unfortunately I don't think that is an option with RDS managed databases. I'm used to mysql land, where I can just dump and import no matter what is connected or what queries are active, and it handles it.

kloa
Feb 14, 2007


It's been a solid 2 years since I stepped away from being a DBA/BI analyst/report writer/CRM admin/full-stack webdev/bunch of other stuff not in my job description (the position started out as a Junior DBA :wtc:) to work on another project for a year, and I've taken the past 12+ months off to pursue other interests. I'm back on the job hunt and need to refresh my SQL/BI/everything a bit.

Is there a ProjectEuler/TopCoder for querying data? Or should I just find a dataset, import it, and play around with SQL/Tableau/new BI hotness? I was pretty heavy into CTEs/window functions, and trying to learn SSAS/data warehouses before I quit that job. Any advice appreciated.

spiritual bypass
Feb 19, 2008

Grimey Drawer
I think HackerRank has a SQL section

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


kloa posted:

a DBA/BI analyst/report writer/CRM admin/full-stack webdev/bunch of other stuff not in my job description

Sounds like my job

kloa
Feb 14, 2007


rt4 posted:

I think Hacker Rank has a SQL section

:yosnice:

Hunt11
Jul 24, 2013

Grimey Drawer
I am working with the sakila dataset right now, and when I try to join two tables I keep getting error message 1046.

This is the code I have so far.
code:
SELECT Address.Address, Address.address_id, Staff.first_name, Staff.last_name
FROM sakila.Address
INNER JOIN Staff ON  Address.address_id = Staff.CustomerID;

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Anyone have any suggestions for how to store custom form structure in SQL tables that isn’t E-A-V and that is still decently reportable?

Specifically referencing the traditional setup of the following tables:

Form
Question
QuestionOption
Response
QuestionResponse

I find the above structure a royal pain in the rear end to work with, especially when you include multi-response questions (e.g. select the top 3), multiple data types and formats as options (date, int, numeric), shared option lists across multiple questions, and things like conditionally required or visible questions.

Unfortunately I haven’t heard of anything that scales better. Not sure I’m willing to hardcode the structure of each form into the application and create a fully denormalized table to store the data in the database. I might do that in the short term but obviously that becomes painful after a few forms.

Anyone else face this problem and solve it in a creative way? Or am I destined to implement this lovely annoying EAV architecture?

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Hunt11 posted:

I am right now working with the sakila dataset and when I am trying to join two tables I keep on getting error message 1046.

This is the code that I have done so far.
code:
SELECT Address.Address, Address.address_id, Staff.first_name, Staff.last_name
FROM sakila.Address
INNER JOIN Staff ON  Address.address_id = Staff.CustomerID;

1046 means you don't have a database selected, possibly because you haven't specified where Staff is. Also, that doesn't look like the join key you want; Staff should have an address_id for that.
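Assuming the stock sakila schema (where staff really does carry an address_id), the join would look something like:

code:
USE sakila;

SELECT a.address, a.address_id, s.first_name, s.last_name
FROM address a
INNER JOIN staff s ON s.address_id = a.address_id;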

kloa
Feb 14, 2007


Ruggan posted:

Anyone have any suggestions for how to store custom form structure in SQL tables that isn’t E-A-V and that is still decently reportable?

Specifically referencing the traditional setup of the following tables:

Form
Question
QuestionOption
Response
QuestionResponse

I built a similar form for a previous company, but it was more strict and less open-ended (not having stuff like Choose Top 3). I ended up breaking everything into chunks like you have, mainly because it's easier to read/understand each piece separately, rather than trying to be smart/shortcut it with crazy logic.

Are you worried about scaling for reportability or just future sanity to maintain it? It sounds like users will be creating forms (or you're entering them for them? :yikes:), so you could create a JSON object from the users' created structure and dump it into a single column somewhere. Then you recreate the form with the JSON field data on the front-end. Not sure how you're going to report on it if it's able to have infinite questions and/or conditionals, unless you're just reporting on simple things like response rates (i.e. 80% of users answered at least 7 questions on a 10-question form).

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Let me give some more context. I work at a large software company and manage the internal development and reporting for our implementation division. Our division is responsible for installing the software at customer sites on a project basis. These installs are long - usually 6-18 months - and so there are a variety of things we track throughout the software implementation project both at a project level and also at a more granular product-by-product level. The website I'm working on is meant to consolidate a lot of disparate processes that have built up over time and live in InfoPath, Access, Excel, etc with the goal of making this data more reportable and more actionable by division leadership. I'm the gatekeeper for what process data is sane enough to keep versus what does not meet the threshold for broader adoption.

To answer your question, yes - I'm creating the forms for the users based on being the gatekeeper of deciding what is worth tracking. Part of why I'm not opening it up for users to self-create their own forms is because we have an internal tool that allows this, and it's a shitshow. Since we're a large company, we have lots of people creating garbage metrics for their own snowflake garbage processes and the vast majority end up being unused or quickly retired when their process is killed.

I'm responsible for both the web framework and the reporting, so I have a vested interest in building something that works well for both. For the development side, I want something composable - easy for me to add, remove, or change questions when necessary without rebuilding web code. For the reporting side, I want something easy to report on - allowing report writers to select the data elements from a table column or filter on question responses by a question ID. EAV meets both of those requirements, but it's a big upfront investment, and if I don't do it right the first time I'm going to need to worry about changes impacting everything that has been built on top of it.

For example, one place that EAV feels very painful is the handling of multiple data types. Let's say you need to store both text and int data. Which of these is better?

code:
-- Option 1: create a table to store response values and force all values into nvarchar.  
-- Web code needs to convert values to and from their appropriate type.  Not a big deal since we serialize it all to JSON anyway.
-- Reporting sucks because you need a ton of CONVERT or CAST statements to get the appropriate datatypes.

CREATE TABLE dbo.QuestionResponse (
    QuestionId  int NOT NULL,
    ResponseId  int NOT NULL,
    Value       nvarchar(MAX) NULL
)
GO

-- Option 2: create a table to store response values with columns for each datatype.  
-- Web code is more annoying because you need to grab the right column based on question datatype.  Maybe the DAL handles it - whatever.
-- Reporting is better because data is stored in its proper format, but you still need to know which column to select from.  CASE statements galore.

CREATE TABLE dbo.QuestionResponse (
    QuestionId  int NOT NULL,
    ResponseId  int NOT NULL,
    ValueInt    int NULL,
    ValueString nvarchar(MAX) NULL
)
GO


DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003
option 1, but add another column that tells you what the type of Value is, and build the reporting queries with dynamic sql casts :getin:
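i.e. something like this (a sketch: the type labels are made up, and the second query shows a static cast where the real reporting layer would generate them dynamically, as above):

code:
CREATE TABLE dbo.QuestionResponse (
    QuestionId int           NOT NULL,
    ResponseId int           NOT NULL,
    ValueType  varchar(10)   NOT NULL,  -- e.g. 'int', 'text', 'date' (labels made up)
    Value      nvarchar(MAX) NULL
);
GO

-- reporting side: filter on the type tag first, then the cast is safe
SELECT QuestionId, ResponseId, TRY_CONVERT(int, Value) AS ValueInt
FROM dbo.QuestionResponse
WHERE ValueType = 'int';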
