Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
galenanorth
May 19, 2016

For inheritance, how would you enforce the disjointness constraint, the total completeness constraint, and the subtype discriminator (e.g. a Pilot must have Flight_Hours > 1,500 in the Employee parent table)? None of my textbooks say it explicitly, but I'm guessing that if you're not using single-table inheritance, you'd use procedural programming like you do when enforcing cardinality. These all seem less vital to enforce than most constraints because they seem like they'd be harder to mess up during data entry, though. For example, it seems like it'd be easier for the data entry person to remember a Member someone can be a Volunteer or Staff, but not both, than remember, say, a "three items per customer per visit" cardinality constraint.

galenanorth fucked around with this message at 23:16 on May 24, 2021

Adbot
ADBOT LOVES YOU

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

galenanorth posted:

For inheritance, how would you enforce the disjointness constraint, the total completeness constraint, and the subtype discriminator (e.g. a Pilot must have Flight_Hours > 1,500 in the Employee parent table)? I'm guessing that if you're not using single-table inheritance, you'd use procedural programming like you do when enforcing cardinality. These all seem less vital to enforce than most constraints because they seem like they'd be harder to mess up during data entry, though. For example, it seems like it'd be easier for the data entry person to remember a Member someone can be a Volunteer or Staff, but not both, than remember, say, a "three items per customer per visit" cardinality constraint.

What? Which database are we even talking about here? At least in postgres table inheritance has some huge caveats that makes it almost useless in practice; see https://www.postgresql.org/docs/12/ddl-inherit.html

If you implement a type hierarchy yourself though without inheritance you can sort of hack up some of these with some combination of foreign key and check constraints, but it's gonna be hacky since check constraints can't reference other rows even in the same table, let alone others. To give you an idea of how hacky, let's say you want to implement a subtype discriminator. We have the tables animals (id, num_legs, name) and cats (id, animal_id, fluffiness). We want to ensure a cat must have 4 or less legs. We can do this by duplicating num_legs into the cats table and making it part of the foreign key, so we get FOREIGN KEY (animal_id, num_legs) REFERENCES animals (id, num_legs) ON UPDATE CASCADE, and then we add a check constraint on cats that says num_legs <= 4. That does implement a subtype discriminator but if you ever actually do this, you need therapy. Of course, I haven't actually tested this and I have no idea if check constraints are enforced when a row is cascade-updated via foreign key. I'd assume they do, but who knows. I don't think I've actually seen an ON UPDATE CASCADE foreign key in the wild, ever.

Anyway inheritance is pretty stupid as a way to do data modeling, and especially so in relational databases, so you probably shouldn't do it.


e: dang, my hilarious subtype constraint idea above doesn't actually work, foreign keys must reference unique or primary key columns. :smith: oh well just make num_legs part of a composite primary key then :unsmigghh:

btw you can do disjointness with foreign keys and unique indexes if you really want to, but total completeness is obnoxious. at the very least it would require deferred constraints (only checked on transaction commit) or it would be impossible to insert - parent couldn't exist without its children and vice versa.

TheFluff fucked around with this message at 00:16 on May 25, 2021

galenanorth
May 19, 2016

I see what mean. Thank you, I'll remember to try not to ever use inheritance when I'm doing database design. How do you do disjointness with foreign keys and unique indexes?

The context is that I was reading about the Extended Entity Relationship Model, in which the authors explain inheritance as being implemented using one-to-one relationships. I use postgres, but I plan on using it with Django's ORM, which ignores its INHERIT in favor of techniques used more generally by other DBMSes. It uses class table inheritance (the method with the foreign keys) or concrete table inheritance, but there are third-party extensions for single-table inheritance.

I'm guessing that for concrete inheritance, in which the common columns are copied over, Django and SQLAlchemy implement all the triggers and whatever's necessary to make things propagate automatically. I read that for inheritance of the "one" side in a one-to-many or one-to-one relationship, you have to make a link table with foreign keys to both the "one" table and the subtype entity's table, for each table, and though this is a lot of work, there isn't a performance hit as long as you don't query the supertype. Anyway, Django implements link tables for many-to-many relationships, so I'm guessing it generates link tables there, too.

Coronel C, Morris S. 2019. Database systems: design, implementation, and management. Boston, MA: Cengage Learning. 13th ed. p. 168-175.
Elmasri R, Navathe S. 2016. Fundamentals of Database Systems. Hoboken, NJ: Pearson. 7th ed., p. 107-120.
Fowler M, Rice D, Foemmel M, Hieatt E, Mee R, Stafford R. 2002. Patterns of Enterprise Application Architecture. Boston, MA: Addison Wesley. - source for the "class table inheritance, concrete table inheritance, single-table inheritance" thing, which squares with SQLAlchemy's support

I gotta finish this tomorrow, I'm synthesizing from textbook sources left and right and it feels like this should have all been in one place

edit: I meant "I see what you mean."

galenanorth fucked around with this message at 12:32 on May 25, 2021

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

galenanorth posted:

I see what mean. Thank you, I'll remember to try not to ever use inheritance when I'm doing database design. How do you do disjointness with foreign keys and unique indexes?

The context is that I was reading about the Extended Entity Relationship Model, in which the authors explain inheritance as being implemented using one-to-one relationships. I use postgres, but I plan on using it with Django's ORM, which ignores its INHERIT in favor of techniques used more generally by other DBMSes. It uses class table inheritance (the method with the foreign keys) or concrete table inheritance, but there are third-party extensions for single-table inheritance.

I'm guessing that for concrete inheritance, in which the common columns are copied over, Django and SQLAlchemy implement all the triggers and whatever's necessary to make things propagate automatically. I read that for inheritance of the "one" side in a one-to-many or one-to-one relationship, you have to make a link table with foreign keys to both the "one" table and the subtype entity's table, for each table, and though this is a lot of work, there isn't a performance hit as long as you don't query the supertype. Anyway, Django implements link tables for many-to-many relationships, so I'm guessing it generates link tables there, too.

Coronel C, Morris S. 2019. Database systems: design, implementation, and management. Boston, MA: Cengage Learning. 13th ed. p. 168-175.
Elmasri R, Navathe S. 2016. Fundamentals of Database Systems. Hoboken, NJ: Pearson. 7th ed., p. 107-120.
Fowler M, Rice D, Foemmel M, Hieatt E, Mee R, Stafford R. 2002. Patterns of Enterprise Application Architecture. Boston, MA: Addison Wesley. - source for the "class table inheritance, concrete table inheritance, single-table inheritance" thing, which squares with SQLAlchemy's support

I gotta finish this tomorrow, I'm synthesizing from textbook sources left and right and it feels like this should have all been in one place

I've used Django's inheritance and it's very simple, you just have a parent table and a number of child tables where the child tables have a parent_id as their primary key. There is nothing more complicated than that going on. Django is really not good about using database level constraints in general; it does a lot of things on the application level that the database really should be taking care of by itself - for example, Django implements ON DELETE CASCADE by itself with select and delete queries, instead of just letting the database do it on its own, and that significantly increases the risk of race conditions. Django doesn't use triggers at all either, it's all just implemented on the application side. It absolutely doesn't implement any of the constraints we've talked about. I don't recommend using inheritance in Django just as I don't recommend using it in general. It may seem tempting but it's generally a pretty bad abstraction.

The disjointness idea I had yesterday involved a separate table with a parent_id and one nullable foreign key column per potential child table plus a check constraint of some kind, but now that I think about it I doubt it'd actually be useful so never mind.

TheFluff fucked around with this message at 10:24 on May 25, 2021

kiwid
Sep 30, 2013

Been a while since I've had to write any SQL since being spoiled with an ORM.

I need to join a specific related row in a one-to-many relationship. For example:

orders
code:
| id |
|----|
| 1  |
| 2  |
| 3  |
instructions
code:
| order_id | type | notes |
|----------|------|-------|
| 1        | A    | blue   |
| 1        | B    | red   |
| 1        | C    | yellow   |
| 2        | A    | green   |
| 3        | C    | purple   |
I want a column in my result with only instructions with type A. For example:

code:
| id | notes |
|----|-------|
| 1  | blue  |
| 2  | green |
| 3  | null  |
Is the only way to do this by using a subquery on the join? Also type and order_id are a unique index and so there can't be multiple of the same type for an order.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

kiwid posted:

Been a while since I've had to write any SQL since being spoiled with an ORM.

I need to join a specific related row in a one-to-many relationship. For example:

orders
code:
| id |
|----|
| 1  |
| 2  |
| 3  |
instructions
code:
| order_id | type | notes |
|----------|------|-------|
| 1        | A    | blue   |
| 1        | B    | red   |
| 1        | C    | yellow   |
| 2        | A    | green   |
| 3        | C    | purple   |
I want a column in my result with only instructions with type A. For example:

code:
| id | notes |
|----|-------|
| 1  | blue  |
| 2  | green |
| 3  | null  |
Is the only way to do this by using a subquery on the join? Also type and order_id are a unique index and so there can't be multiple of the same type for an order.
If I'm understanding you correctly you can just use a left join and put the filter on instructions.type in the join clause instead of in the where clause, like so:

SQL code:
select
  orders.id,
  instructions.notes
from
  orders
  left join instructions on (
    orders.id = instructions.order_id
    and instructions.type = "A"
  )
For inner joins there is no difference between putting a condition in the join clause vs putting it in the where clause, but for left joins it matters - if you put instructions.type = "A" in the where clause instead you won't get order id 3 in the resultset.

kiwid
Sep 30, 2013

TheFluff posted:

If I'm understanding you correctly you can just use a left join and put the filter on instructions.type in the join clause instead of in the where clause, like so:

SQL code:
select
  orders.id,
  instructions.notes
from
  orders
  left join instructions on (
    orders.id = instructions.order_id
    and instructions.type = "A"
  )
For inner joins there is no difference between putting a condition in the join clause vs putting it in the where clause, but for left joins it matters - if you put instructions.type = "A" in the where clause instead you won't get order id 3 in the resultset.

Ah yes, this is exactly what I was looking for. Thanks.

meanieface
Mar 27, 2012

During times of universal deceit, telling the truth becomes a revolutionary act.

galenanorth posted:

For inheritance, how would you enforce the disjointness constraint, the total completeness constraint, and the subtype discriminator (e.g. a Pilot must have Flight_Hours > 1,500 in the Employee parent table)? None of my textbooks say it explicitly, but I'm guessing that if you're not using single-table inheritance, you'd use procedural programming like you do when enforcing cardinality. These all seem less vital to enforce than most constraints because they seem like they'd be harder to mess up during data entry, though. For example, it seems like it'd be easier for the data entry person to remember a Member someone can be a Volunteer or Staff, but not both, than remember, say, a "three items per customer per visit" cardinality constraint.

I.. just.. if someone tries to make you do this, side eye them hard and remind them that we don't put business rules in the database itself. We put business rules in the UI code. We check automatically when engineers are changing poo poo to ensure they aren't violating the requirements of the logging fields.

It's always, always easiest to ensure that the data entry person can only click volunteer or staff because it's a radio-style button than to have the person click save and then for nothing to save.

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003
if you really want to model non-relational constraints in the database, you can always use a check constraint that calls a function that does whatever you want

Tea Bone
Feb 18, 2011

I'm going for gasps.
I have a wordpress website I've taken over which has a ton of symbols in place of some characters (i.e instead of a single quote mark)

encoding settings on the database and wordpress config are all in order so what I think has happened is somewhere down the line the database has been exported using the wrong encoding then re-imported with the symbols.

I've been running a REPLACE across the affected columns as I find these symbols but really I'd like a better way to fix it across the entire database.

NinpoEspiritoSanto
Oct 22, 2013




Tea Bone posted:

I have a wordpress website I've taken over which has a ton of symbols in place of some characters (i.e instead of a single quote mark)

encoding settings on the database and wordpress config are all in order so what I think has happened is somewhere down the line the database has been exported using the wrong encoding then re-imported with the symbols.

I've been running a REPLACE across the affected columns as I find these symbols but really I'd like a better way to fix it across the entire database.

Ahh, mojibake, I'm assuming this is a MySQL backed DB?

I've dealt with this before, using the Python and the ftfy package and selecting all text/varchar type columns via SELECT HEX(table.columnname), passing into binascii.unhexlify then using a try decode("ascii") with a fallback of decode("utf-8") to find it all. The fix was either done in place with ftfy.fix_encoding and writing the data back to a correctly encoded table; or with my own issue dealing with incorrect utf8 saved to latin1 encoded tables, so things I couldn't encode back to cp1252 were written to a native utf8 table to be imported post table conversion to utf8mb4.

This isn't foolproof however, some things were lost/replaced with the utf8 unknown char or just stripped out as there was no way of knowing what the original value was supposed to be, but over a terabyte of text data was fixed with a handful of rows needing to be reconstructed by hand.

NinpoEspiritoSanto fucked around with this message at 11:53 on May 27, 2021

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
Seconding the ftfy recommendation, it's loving magical. Always extremely handy to have around to debug wacky unicode issues in almost any application that accepts user input, even if you're not otherwise using Python (ftfy.explain_unicode() and ftfy.fix_and_explain() are extremely good at figuring out a) what the heck is even this string, and b) what the heck happened to it).

Data Graham
Dec 28, 2009

📈📊🍪😋



:stare:

God I wish I'd had that when I was rebuilding the data of a text archive site that I'd been maintaining since 1994.

Tea Bone
Feb 18, 2011

I'm going for gasps.

Bundy posted:

Ahh, mojibake, I'm assuming this is a MySQL backed DB?

I've dealt with this before, using the Python and the ftfy package and selecting all text/varchar type columns via SELECT HEX(table.columnname), passing into binascii.unhexlify then using a try decode("ascii") with a fallback of decode("utf-8") to find it all. The fix was either done in place with ftfy.fix_encoding and writing the data back to a correctly encoded table; or with my own issue dealing with incorrect utf8 saved to latin1 encoded tables, so things I couldn't encode back to cp1252 were written to a native utf8 table to be imported post table conversion to utf8mb4.

This isn't foolproof however, some things were lost/replaced with the utf8 unknown char or just stripped out as there was no way of knowing what the original value was supposed to be, but over a terabyte of text data was fixed with a handful of rows needing to be reconstructed by hand.

Thank you that looks exactly like what I need.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
If it's utf-8-encoded data stored in a latin-1 column then you can also fix it using nothing but ALTER TABLE statements, by changing the column character set via changing it to binary and then to the correct encoding. That might lock the table while the changes take place though, I don't know.

NinpoEspiritoSanto
Oct 22, 2013




Hammerite posted:

If it's utf-8-encoded data stored in a latin-1 column then you can also fix it using nothing but ALTER TABLE statements, by changing the column character set via changing it to binary and then to the correct encoding. That might lock the table while the changes take place though, I don't know.

While this is true, if mysql can't work out what the character was in case of double encoding then it will either truncate the row from that point on, replace with the unk symbol or throw an error, depending on settings. Defaults don't throw the error.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Bundy posted:

Python and the ftfy package
This has got to be the most Pythonic thing I've ever heard.

Bundy posted:

if mysql can't ... Defaults don't throw the error.
And the most MySQLish.


Tallyho!!

kiwid
Sep 30, 2013

kiwid posted:

Been a while since I've had to write any SQL since being spoiled with an ORM.

I need to join a specific related row in a one-to-many relationship. For example:

orders
code:
| id |
|----|
| 1  |
| 2  |
| 3  |
instructions
code:
| order_id | type | notes |
|----------|------|-------|
| 1        | A    | blue   |
| 1        | B    | red   |
| 1        | C    | yellow   |
| 2        | A    | green   |
| 3        | C    | purple   |
I want a column in my result with only instructions with type A. For example:

code:
| id | notes |
|----|-------|
| 1  | blue  |
| 2  | green |
| 3  | null  |
Is the only way to do this by using a subquery on the join? Also type and order_id are a unique index and so there can't be multiple of the same type for an order.

So this query has evolved a bit and now I need a result that looks like this:

code:
| id | has_type_a      | has_type_b      | has_type_c      |
|----|-----------------|-----------------|-----------------|
| 1  | true            | true            | true            |
| 2  | true            | false (or null) | false (or null) |
| 3  | false (or null) | false (or null) | true            |
What would be the best way to achieve this now? I could join the instructions table for each column, I could do a subquery join, I also considered doing something like a group_concat() but the orders table has a large amount of non-aggregated columns I'm selecting that I haven't included in this example.

kiwid fucked around with this message at 18:43 on Jun 1, 2021

Heavy_D
Feb 16, 2002

"rararararara" contains the meaning of everything, kept in simple rectangular structures
Hi kiwid. If you were using T-SQL I'd probably suggest using PIVOT, but it looks like you're on MySQL. So I'd suggest starting with something like
code:
  SELECT order_id AS id
        ,MIN(IF(type='A','true',NULL)) AS has_type_a
        ,MIN(IF(type='B','true',NULL)) AS has_type_b
        ,MIN(IF(type='C','true',NULL)) AS has_type_c
    FROM instructions
GROUP BY order_id

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

kiwid posted:

So this query has evolved a bit and now I need a result that looks like this:

code:
| id | has_type_a      | has_type_b      | has_type_c      |
|----|-----------------|-----------------|-----------------|
| 1  | true            | true            | true            |
| 2  | true            | false (or null) | false (or null) |
| 3  | false (or null) | false (or null) | true            |
What would be the best way to achieve this now? I could join the instructions table for each column, I could do a subquery join, I also considered doing something like a group_concat() but the orders table has a large amount of non-aggregated columns I'm selecting that I haven't included in this example.
In postgres I would probably just do

SQL code:
select
  orders.id,
  exists(
    select 1
    from instructions
    where instructions.order_id = orders.id and instructions.type = "A"
  ) as has_type_a,
  exists(
    select 1
    from instructions
    where instructions.order_id = orders.id and instructions.type = "B"
  ) as has_type_b
  -- etc
from orders
because although it makes the select list look like a mess, it makes the intent very clear and in postgres the query planner is usually quite good at optimizing correlated subqueries. Your code smells like some kind of mysql variant though and I don't know how that handles it. The more conventional way is probably joining instructions multiple times, as you suggested:

SQL code:
select
  orders.id,
  not isnull(instructions_type_a.type) as has_type_a,
  not isnull(instructions_type_b.type) as has_type_b,
  -- etc
from
  orders
  left join instructions as instructions_type_a on (
    orders.id = instructions_type_a.order_id and instructions_type_a.type = "A")
  left join instructions as instructions_type_b on (
    orders.id = instructions_type_b.order_id and instructions_type_b.type = "B")
  -- etc
MSSQL has a pivot function that can be real handy for this sort of thing (aggregation by turning rows into columns) but I don't think you're using that?


edit: the above assumes you're selecting columns other than id from orders as well, otherwise you should just use some kind of aggregation on instructions as suggested by Heavy_D.

TheFluff fucked around with this message at 21:56 on Jun 1, 2021

kiwid
Sep 30, 2013

Sorry I should have mentioned that yes I'm using T-SQL. I will investigate the pivot option. I definitely can't group because there are a lot of non-aggregated columns. Thanks.

edit: This is the current mess of a query that is joining the table multiple times:

SQL code:
SELECT orderlog.orderlog_nbr AS id,
    orderlog.orderlog_status,
    orderlog.container_release_number,
    orderlog.ship_booking_number,
    orderlog.plc_id AS location_id,
    location.short_name AS location_short_name,
    location.full_name AS location_name,
    orderlog.container_park,
    orderlog.ecn AS edn,
    orderlog.order_quantity,
    orderlog.order_quantity_uom,
    orderlog.nbr_of_containers,
    orderlog.packaging_type,
    orderlog.bag_weight_uom,
    orderlog.bag_markings,
    orderlog_bag_markings.notes AS pack,
    CASE 
        WHEN orderlog_nqac.orderlog_nbr IS NOT NULL
            THEN 'true'
        ELSE 'false'
        END AS nqac,
    CASE 
        WHEN orderlog_bag_phyto.orderlog_nbr IS NOT NULL
            THEN 'true'
        ELSE 'false'
        END AS phyto,
    CASE 
        WHEN orderlog_bag_sgs.orderlog_nbr IS NOT NULL
            THEN 'true'
        ELSE 'false'
        END AS sgs,
    CASE 
        WHEN orderlog_bag_strip_test.orderlog_nbr IS NOT NULL
            THEN 'true'
        ELSE 'false'
        END AS strip_test,
    CASE 
        WHEN orderlog_bag_grade_cert.orderlog_nbr IS NOT NULL
            THEN 'true'
        ELSE 'false'
        END AS grade_cert,
    orderlog.comments,
    orderlog.delivery_docket_pig_no,
    orderlog.delivery_docket_type,
    orderlog.commodity_id,
    commodity.name AS commodity_name,
    orderlog.iv_product AS product_id,
    product.description AS product_name,
    orderlog.vendor_id AS customer_id,
    vendor.short_name AS customer_short_name,
    vendor.full_name AS customer_name,
    orderlog.vendor_reference AS customer_reference,
    orderlog.ship_from_id,
    ship_from.short_name AS ship_from_short_name,
    ship_from.full_name AS ship_from_name,
    orderlog.ship_to_id,
    ship_to.short_name AS ship_to_short_name,
    ship_to.full_name AS ship_to_name,
    orderlog.shipping_id AS shipping_line_id,
    shipping_line.short_name AS shipping_line_short_name,
    shipping_line.full_name AS shipping_line_name,
    orderlog.vessel,
    orderlog.voyage,
    orderlog.loading_port,
    orderlog.discharge_port,
    orderlog.destination,
    orderlog.freight_carrier_id,
    freight_carrier.short_name AS freight_carrier_short_name,
    freight_carrier.full_name AS freight_carrier_name,
    orderlog.add_date,
    orderlog.start_date,
    orderlog.train_date,
    orderlog.etd,
    orderlog.cut_off_date
FROM production.dba.s1_orderlog AS orderlog
INNER JOIN production.dba.s1_name_and_address AS location ON orderlog.plc_id = location.name_and_address_id
INNER JOIN production.dba.s1_commodity AS commodity ON orderlog.commodity_id = commodity.commodity_id
INNER JOIN production.dba.s1_name_and_address AS vendor ON orderlog.vendor_id = vendor.name_and_address_id
INNER JOIN production.dba.s1_name_and_address AS ship_from ON orderlog.ship_from_id = ship_from.name_and_address_id
INNER JOIN production.dba.s1_name_and_address AS ship_to ON orderlog.ship_to_id = ship_to.name_and_address_id
LEFT OUTER JOIN production.dba.s1_name_and_address AS shipping_line ON orderlog.shipping_id = shipping_line.name_and_address_id
LEFT OUTER JOIN production.dba.s1_name_and_address AS freight_carrier ON orderlog.freight_carrier_id = freight_carrier.name_and_address_id
LEFT OUTER JOIN production.dba.s1_commodity_iv_product AS product ON orderlog.commodity_id = product.commodity_id
    AND orderlog.iv_product = product.iv_product
LEFT OUTER JOIN production.dba.s1_orderlog_loading_instr AS orderlog_bag_markings ON orderlog.plc_id = orderlog_bag_markings.plc_id
    AND orderlog.orderlog_nbr = orderlog_bag_markings.orderlog_nbr
    AND orderlog_bag_markings.loading_instruction_nbr = '54'
LEFT OUTER JOIN production.dba.s1_orderlog_loading_instr AS orderlog_nqac ON orderlog.plc_id = orderlog_nqac.plc_id
    AND orderlog.orderlog_nbr = orderlog_nqac.orderlog_nbr
    AND orderlog_nqac.loading_instruction_nbr = '44'
LEFT OUTER JOIN production.dba.s1_orderlog_loading_instr AS orderlog_bag_phyto ON orderlog.plc_id = orderlog_bag_phyto.plc_id
    AND orderlog.orderlog_nbr = orderlog_bag_phyto.orderlog_nbr
    AND orderlog_bag_phyto.loading_instruction_nbr = '10'
LEFT OUTER JOIN production.dba.s1_orderlog_loading_instr AS orderlog_bag_sgs ON orderlog.plc_id = orderlog_bag_sgs.plc_id
    AND orderlog.orderlog_nbr = orderlog_bag_sgs.orderlog_nbr
    AND orderlog_bag_sgs.loading_instruction_nbr = '7'
LEFT OUTER JOIN production.dba.s1_orderlog_loading_instr AS orderlog_bag_strip_test ON orderlog.plc_id = orderlog_bag_strip_test.plc_id
    AND orderlog.orderlog_nbr = orderlog_bag_strip_test.orderlog_nbr
    AND orderlog_bag_strip_test.loading_instruction_nbr = '9'
LEFT OUTER JOIN production.dba.s1_orderlog_loading_instr AS orderlog_bag_grade_cert ON orderlog.plc_id = orderlog_bag_grade_cert.plc_id
    AND orderlog.orderlog_nbr = orderlog_bag_grade_cert.orderlog_nbr
    AND orderlog_bag_grade_cert.loading_instruction_nbr = '17'
WHERE (orderlog.orderlog_status <> 'x')
    AND (orderlog.plc_id = '01')
    AND (orderlog.orderlog_type_code = '3')

kiwid fucked around with this message at 01:51 on Jun 2, 2021

Heavy_D
Feb 16, 2002

"rararararara" contains the meaning of everything, kept in simple rectangular structures
In the pivot approach, if you add something like:
code:
LEFT JOIN
(
   (SELECT order_id, instruction 
      FROM instructions) AS instr_rows
     PIVOT
   (MIN(instruction) 
    FOR instruction IN ([A],[B],[C])) AS instr_cols
)
ON instr_cols.order_id = orderlog.orderlog_nbr
You should have access to columns instr_cols.[A], instr_cols.[B] and instr_cols.[C] in your main query, which you can test for being null/non-null using a case statement in the same manner as your strip_test column.

kiwid
Sep 30, 2013

Heavy_D posted:

In the pivot approach, if you add something like:
code:
LEFT JOIN
(
   (SELECT order_id, instruction 
      FROM instructions) AS instr_rows
     PIVOT
   (MIN(instruction) 
    FOR instruction IN ([A],[B],[C])) AS instr_cols
)
ON instr_cols.order_id = orderlog.orderlog_nbr
You should have access to columns instr_cols.[A], instr_cols.[B] and instr_cols.[C] in your main query, which you can test for being null/non-null using a case statement in the same manner as your strip_test column.

Hey thanks,

PIVOT ended up being exactly what I needed. It took me a little to figure out the syntax but I've modified the query and it's working.

kiwid
Sep 30, 2013

Alright, so I've finally launched my application into production which is simply a web dashboard with a read only connection to our ERP software's database for reporting purposes. Unfortunately I'm seeing deadlocks now which didn't occur during development and so my select queries being cancelled. I've never experienced deadlocking before so I'm not sure how to come up with a solution here. It appears that when the ERP software is making update transactions, it'll lock the rows and won't allow me to issue a select statement. The reporting needs to be as "real-time" as possible and so an ETL process into a data warehouse wasn't the solution I determined a while back, at least in this instance.

Here is the error I'm seeing:

quote:

[2021-06-07 15:44:27] production.ERROR: SQLSTATE[40001]: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Transaction (Process ID 75) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

I started Googling and found this "NOLOCK" hint (https://www.mssqltips.com/sqlservertip/2470/understanding-the-sql-server-nolock-hint/). Just wondering if any of you have any feedback on using something like this? I understand the possibility of getting a "dirty read", but I believe that's acceptable considering the dashboard is polling an updated query every 60 seconds and would correct itself.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


kiwid posted:

Alright, so I've finally launched my application into production which is simply a web dashboard with a read only connection to our ERP software's database for reporting purposes. Unfortunately I'm seeing deadlocks now which didn't occur during development and so my select queries being cancelled. I've never experienced deadlocking before so I'm not sure how to come up with a solution here. It appears that when the ERP software is making update transactions, it'll lock the rows and won't allow me to issue a select statement. The reporting needs to be as "real-time" as possible and so an ETL process into a data warehouse wasn't the solution I determined a while back, at least in this instance.

Here is the error I'm seeing:
I started Googling and found this "NOLOCK" hint (https://www.mssqltips.com/sqlservertip/2470/understanding-the-sql-server-nolock-hint/). Just wondering if any of you have any feedback on using something like this? I understand the possibility of getting a "dirty read", but I believe that's acceptable considering the dashboard is polling an updated query every 60 seconds and would correct itself.

Our app is built around accepting dirty reads as okay, and squeezing every ounce of performance out of the database server. For years, using (NOLOCK) was a near-mandatory standard. Several years back, we discovered SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED aka STILRU which is basically batch level NOLOCK and is the norm now.

We know dirty reads are a thing, but in the interest of deadlock dodging, we accept the risks. It's not the greatest because of the reads issue, and a slightly more conservative SET TRANSACTION ISOLATION LEVEL SNAPSHOT ISOLATION.

Here's an article I skimmed: https://www.sqlshack.com/snapshot-isolation-in-sql-server/

Instant edit:
I would also do some profiling to see why your CRUD is longrunning enough to cause deadlock issues. Maybe its nothing and has to do with the locks that SELECTs are taking on underlying rows, but it's worth a look.

kiwid
Sep 30, 2013

Nth Doctor posted:

Our app is built around accepting dirty reads as okay, and squeezing every ounce of performance out of the database server. For years, using (NOLOCK) was a near-mandatory standard. Several years back, we discovered SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED aka STILRU which is basically batch level NOLOCK and is the norm now.

We know dirty reads are a thing, but in the interest of deadlock dodging, we accept the risks. It's not the greatest because of the reads issue, and a slightly more conservative SET TRANSACTION ISOLATION LEVEL SNAPSHOT ISOLATION.

Here's an article I skimmed: https://www.sqlshack.com/snapshot-isolation-in-sql-server/

Instant edit:
I would also do some profiling to see why your CRUD is longrunning enough to cause deadlock issues. Maybe its nothing and has to do with the locks that SELECTs are taking on underlying rows, but it's worth a look.

Thanks for this information. I'll implement NOLOCK as a quick fix then and I'll start to investigate the SNAPSHOT ISOLATION. I'll also look into the profiling. This database is 65GB so I'm sure there are some performance issues there but I'm not sure if it's server related or just lovely ERP software related. As a side note, is 65GB considered big these days?

Star War Sex Parrot
Oct 2, 2003

kiwid posted:

As a side note, is 65GB considered big these days?
No, not really. That's easily an in-memory workload.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
You can get like half a terabyte of RAM on consumer hardware with :pcgaming: RGB LEDs :pcgaming: these days, for perspective. Heroku's standard postgres plans can get you up to 768 GB of RAM on a single instance. At my last job a Heroku postgres instance with 30.5 GB RAM - which is less than what I have in my gaming PC at home - easily handled a ~100GB database, and I don't consider that to be a big database either. There are exceedingly few businesses that can't be run on top of a single SQL database (c.f. LetsEncrypt or Stackoverflow). Imagine I linked that Gary Bernhardt "consulting service: I tell you your dataset fits in RAM; I charge you $10k for saving you $500k" tweet here.

TheFluff fucked around with this message at 00:47 on Jun 8, 2021

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


At my work we have 2 OLTP databases which are clocking in at ~70TB and 55TB, and it still drat works, although large table operations are getting hard so the business has finally really decided to invest in archiving/purging out the drat things.

Away all Goats
Jul 5, 2005

Goose's rebellion

newbie at SQL here. I've definitely got a stupid question.

code:
DECLARE
    pledges dd_pledge%ROWTYPE;
    pledge_amt dd_pledge.pledgeamt%TYPE;
    paid_amt dd_pledge.pledgeamt%TYPE;
    balance dd_pledge.pledgeamt%TYPE;
    months_paid NUMBER(5);
BEGIN
    SELECT *
    INTO pledges
    FROM dd_pledge
    WHERE idpledge = 105;
    
months_paid := pledges.paymonths;
pledge_amt := pledges.pledgeamt;

IF pledges.paymonths = 0 
    THEN paid_amt := pledges.pledgeamt;
    ELSE paid_amt := months_paid * (pledges.pledgeamt/pledges.paymonths);
END IF;  
balance := pledges.pledgeamt - paid_amt;
DBMS_OUTPUT.PUT_LINE('Pledge ID: ' || pledges.idpledge);
DBMS_OUTPUT.PUT_LINE('Donor ID: ' || pledges.iddonor);
DBMS_OUTPUT.PUT_LINE('Pledge Amount: ' || pledges.pledge_amt);
DBMS_OUTPUT.PUT_LINE('Amount Paid: ' || paid_amt);
DBMS_OUTPUT.PUT_LINE('Balance: ' || balance);
END;
I keep getting the error "PLS-00302: component 'PLEDGE_AMT' must be declared". However it doesn't have a problem with the extracting the paymonths / months_paid in the exact same method. How do I fix this?

Away all Goats fucked around with this message at 04:26 on Jun 14, 2021

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
code:

DBMS_OUTPUT.PUT_LINE('Pledge Amount: ' || pledges.pledge_amt);

This is the only place that uses pledges.pledge_amt instead of pledges.pledgeamt.

In general your naming schemes seem kind of inconsistent.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Moonwolf posted:

At my work we have 2 OLTP databases which are clocking in at ~70TB and 55TB, and it still drat works, although large table operations are getting hard so the business has finally really decided to invest in archiving/purging out the drat things.

Are those Oracle, SQL server, or did you do that on something else? That's pretty huge and I'm curious if you have a ton of tables and multiple applications sharing them, or your core data model is really that big.

Either way it's impressive, do your biggest indexes still fit in memory?

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Twerk from Home posted:

Are those Oracle, SQL server, or did you do that on something else? That's pretty huge and I'm curious if you have a ton of tables and multiple applications sharing them, or your core data model is really that big.

Either way it's impressive, do your biggest indexes still fit in memory?

That's on Informix, which probably limits the sphere of where I work noticeably. Yeah, the working dataset and indices all fit in memory because we have 3/4 TB of memory on them (mirrored out of 1.5TB), and we're testing some AMD boxes where we're going up to 1TB visible. Schemas are both 200+ tables + PKs, FKs and composites for specific queries, but the bulk is it being high transaction volumes going back about 10 years, although most of the data is in the last 1-2 years always from traffic growth.

worms butthole guy
Jan 29, 2021

by Fluffdaddy
hi goons i'm trying to run a sql command on wordpress where i want to filter the data by the where cost in the below, but still show all the other fields in meta_key and meta_value...i'm sure i'm just missing something

code:
SELECT *
FROM _JWR_users RIGHT JOIN _JWR_usermeta 
ON _JWR_users.ID = _JWR_usermeta.user_id 
WHERE (_JWR_usermeta.meta_key = 'artist_tour_status' AND _JWR_usermeta.meta_value != "Not on Tour")
ORDER BY _JWR_users.user_nicename
so I guess essentially i'm trying to SELECT * from those two tables, but I only want to see related data that happens to have a artist_tour_status that isn't Not on Tour

thanks

worms butthole guy fucked around with this message at 13:54 on Jun 17, 2021

Chin Strap
Nov 24, 2002

I failed my TFLC Toxx, but I no longer need a double chin strap :buddy:
Pillbug
Do the filter on the right join instead?

Select * from t1
Right join (select * from t2 where ...)
On ...

Phone posting so just giving the sketch

worms butthole guy
Jan 29, 2021

by Fluffdaddy
ooo I didn't know you could even do that

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Postgresql question: any way to give a user privileges such that they can run arbitrary SELECTs, but their SELECTs never create locks (at the cost of getting potentially dirty reads)? We've given our tech support staff read-only privileges, but they have still accidentally managed to cause a deadlock against a production query, probably by leaving multiple tabs running or whatever.

And yes, I know the only 100% safe way is to create a read-only replica, but I would first look for a more lightweight solution is possible.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

NihilCredo posted:

Postgresql question: any way to give a user privileges such that they can run arbitrary SELECTs, but their SELECTs never create locks (at the cost of getting potentially dirty reads)? We've given our tech support staff read-only privileges, but they have still accidentally managed to cause a deadlock against a production query, probably by leaving multiple tabs running or whatever.

And yes, I know the only 100% safe way is to create a read-only replica, but I would first look for a more lightweight solution is possible.

That sounds... very weird. AFAIK, normal SELECT queries only ever acquire an ACCESS SHARE lock on the referenced table(s), and the only lock which can conflict with that is ACCESS EXCLUSIVE. ACCESS EXCLUSIVE in turn is normally only acquired by DDL statements like DROP TABLE, ALTER TABLE, ALTER INDEX etc. The only way I can see this causing a deadlock outside of a schema migration is if you have a transaction that uses explicit locking and does LOCK TABLE without specifying the lock mode, because that gives you ACCESS EXCLUSIVE (and you should probably fix that transaction, in that case). Or, possibly, if their select queries are written as SELECT FOR SHARE or SELECT FOR UPDATE.

Either way, there is no way in postgres to completely disable locking; there's no equivalent to MSSQL's NOLOCK as far as I know.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Fixins posted:

hi goons i'm trying to run a sql command on wordpress where i want to filter the data by the where cost in the below, but still show all the other fields in meta_key and meta_value...i'm sure i'm just missing something

code:
SELECT *
FROM _JWR_users RIGHT JOIN _JWR_usermeta 
ON _JWR_users.ID = _JWR_usermeta.user_id 
WHERE (_JWR_usermeta.meta_key = 'artist_tour_status' AND _JWR_usermeta.meta_value != "Not on Tour")
ORDER BY _JWR_users.user_nicename
so I guess essentially i'm trying to SELECT * from those two tables, but I only want to see related data that happens to have a artist_tour_status that isn't Not on Tour

thanks
i'm a broken record that just keeps posting a "hey you can use where exists() here" solution to every query question itt but here you go

SQL code:
select *
from
  _jwr_users
  inner join _jwr_usermeta on _jwr_usermeta.user_id = _jwr_users.ID
where exists(
  select 1
  from _jwr_usermeta as usermeta_inner
  where
    usermeta_inner.meta_key = 'artist_tour_status'
    and usermeta_inner.meta_value != 'Not on Tour'
    and usermeta_inner.user_id = _jwr_users.ID
)
order by _jwr_users.user_nicename

TheFluff fucked around with this message at 16:10 on Jun 17, 2021

Adbot
ADBOT LOVES YOU

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

TheFluff posted:

That sounds... very weird. AFAIK, normal SELECT queries only ever acquire an ACCESS SHARE lock on the referenced table(s), and the only lock which can conflict with that is ACCESS EXCLUSIVE. ACCESS EXCLUSIVE in turn is normally only acquired by DDL statements like DROP TABLE, ALTER TABLE, ALTER INDEX etc. The only way I can see this causing a deadlock outside of a schema migration is if you have a transaction that uses explicit locking and does LOCK TABLE without specifying the lock mode, because that gives you ACCESS EXCLUSIVE (and you should probably fix that transaction, in that case). Or, possibly, if their select queries are written as SELECT FOR SHARE or SELECT FOR UPDATE.

Yeah, I too thought it was hard to get locks. But we had an ETL job query that normally takes ~20 seconds that had been waiting for the better part of an hour, so we ran the recommended lock monitoring queries and we saw something like this:

code:
blocked_user       | blocking_user      | blocked_statement | current_statement_in_blocking_process
----------------------------------------------------------------------------------------------------------------
application        | tech support guy 1 | etl job query     | the same etl query but ran manually from dbeaver
{ repeated for ~80 rows, with various users in both blocked and blocking roles, all running that same query }
The ETL query under exam is indeed just a SELECT, although a pretty complicated one with CTEs hitting views, materialized views, and so forth. Come to think of it, there was also a REFRESH MATERIALIZED VIEW among the lock mess, but again the materialized view is just a plain SELECT.

Unfortunately I was stupid and proceeded to immediately kill the human-user PIDs (which did clear the deadlock and the job started working in the usual ~20s again), but didn't think of first adding a few more columns to the pg_locks query to find out exactly which resources were being deadlocked.


quote:

Either way, there is no way in postgres to completely disable locking; there's no equivalent to MSSQL's NOLOCK as far as I know.

Darn.

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply