|
For inheritance, how would you enforce the disjointness constraint, the total completeness constraint, and the subtype discriminator (e.g. a Pilot must have Flight_Hours > 1,500 in the Employee parent table)? None of my textbooks say it explicitly, but I'm guessing that if you're not using single-table inheritance, you'd use procedural programming like you do when enforcing cardinality. These all seem less vital to enforce than most constraints, though, because they seem like they'd be harder to mess up during data entry. For example, it seems like it'd be easier for the data entry person to remember that a Member can be a Volunteer or Staff, but not both, than to remember, say, a "three items per customer per visit" cardinality constraint.
galenanorth fucked around with this message at 23:16 on May 24, 2021 |
# ? May 24, 2021 21:24 |
|
|
|
galenanorth posted:For inheritance, how would you enforce the disjointness constraint, the total completeness constraint, and the subtype discriminator (e.g. a Pilot must have Flight_Hours > 1,500 in the Employee parent table)? I'm guessing that if you're not using single-table inheritance, you'd use procedural programming like you do when enforcing cardinality. These all seem less vital to enforce than most constraints because they seem like they'd be harder to mess up during data entry, though. For example, it seems like it'd be easier for the data entry person to remember that a Member can be a Volunteer or Staff, but not both, than remember, say, a "three items per customer per visit" cardinality constraint.

What? Which database are we even talking about here? At least in Postgres, table inheritance has some huge caveats that make it almost useless in practice; see https://www.postgresql.org/docs/12/ddl-inherit.html

If you implement a type hierarchy yourself without inheritance, though, you can sort of hack up some of these with some combination of foreign key and check constraints, but it's gonna be hacky since check constraints can't reference other rows even in the same table, let alone other tables.

To give you an idea of how hacky, let's say you want to implement a subtype discriminator. We have the tables animals (id, num_legs, name) and cats (id, animal_id, fluffiness). We want to ensure a cat must have 4 or fewer legs. We can do this by duplicating num_legs into the cats table and making it part of the foreign key, so we get FOREIGN KEY (animal_id, num_legs) REFERENCES animals (id, num_legs) ON UPDATE CASCADE, and then we add a check constraint on cats that says num_legs <= 4. That does implement a subtype discriminator, but if you ever actually do this, you need therapy.

Of course, I haven't actually tested this, and I have no idea if check constraints are enforced when a row is cascade-updated via a foreign key. I'd assume they are, but who knows.
I don't think I've actually seen an ON UPDATE CASCADE foreign key in the wild, ever. Anyway, inheritance is a pretty stupid way to do data modeling, and especially so in relational databases, so you probably shouldn't do it.

e: dang, my hilarious subtype constraint idea above doesn't actually work as written; foreign keys must reference unique or primary key columns. Oh well, just make (id, num_legs) unique or part of a composite primary key then.

btw you can do disjointness with foreign keys and unique indexes if you really want to, but total completeness is obnoxious. At the very least it would require deferred constraints (only checked on transaction commit), or it would be impossible to insert anything: the parent couldn't exist without its children and vice versa.

TheFluff fucked around with this message at 00:16 on May 25, 2021 |
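Out of curiosity, the hack above (with the unique-constraint fix from the edit) can be sketched in SQLite, which is close enough to try at home. This is just a sketch of the idea using the hypothetical animals/cats schema from the post, and it deliberately tests only insert-time enforcement, leaving the cascade-update question open:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite

conn.executescript("""
CREATE TABLE animals (
    id INTEGER PRIMARY KEY,
    num_legs INTEGER NOT NULL,
    name TEXT,
    UNIQUE (id, num_legs)  -- composite FKs must target a unique column set
);
CREATE TABLE cats (
    id INTEGER PRIMARY KEY,
    animal_id INTEGER NOT NULL,
    num_legs INTEGER NOT NULL CHECK (num_legs <= 4),  -- the "discriminator"
    fluffiness INTEGER,
    FOREIGN KEY (animal_id, num_legs)
        REFERENCES animals (id, num_legs) ON UPDATE CASCADE
);
""")

conn.execute("INSERT INTO animals VALUES (1, 4, 'mittens'), (2, 8, 'spider')")
conn.execute("INSERT INTO cats VALUES (1, 1, 4, 11)")  # 4 legs: accepted

# An 8-legged "cat" is rejected; the CHECK fails before the FK even matters
try:
    conn.execute("INSERT INTO cats VALUES (2, 2, 8, 0)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Whether a cascade update re-runs the CHECK is left as an exercise, as in the original post.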
# ? May 24, 2021 23:29 |
|
I see what you mean. Thank you, I'll remember to try not to ever use inheritance when I'm doing database design. How do you do disjointness with foreign keys and unique indexes?

The context is that I was reading about the Extended Entity Relationship Model, in which the authors explain inheritance as being implemented using one-to-one relationships. I use Postgres, but I plan on using it with Django's ORM, which ignores Postgres's INHERITS in favor of techniques used more generally across other DBMSes. It uses class table inheritance (the method with the foreign keys) or concrete table inheritance, and there are third-party extensions for single-table inheritance. I'm guessing that for concrete table inheritance, in which the common columns are copied over, Django and SQLAlchemy implement all the triggers and whatever else is necessary to make changes propagate automatically.

I read that for inheritance on the "one" side of a one-to-many or one-to-one relationship, you have to make a link table with foreign keys to both the "one" table and each subtype entity's table, and though this is a lot of work, there isn't a performance hit as long as you don't query the supertype. Anyway, Django implements link tables for many-to-many relationships, so I'm guessing it generates link tables there, too.

Coronel C, Morris S. 2019. Database Systems: Design, Implementation, and Management. 13th ed. Boston, MA: Cengage Learning. p. 168-175.
Elmasri R, Navathe S. 2016. Fundamentals of Database Systems. 7th ed. Hoboken, NJ: Pearson. p. 107-120.
Fowler M, Rice D, Foemmel M, Hieatt E, Mee R, Stafford R. 2002. Patterns of Enterprise Application Architecture. Boston, MA: Addison-Wesley.
- the Fowler book is the source for the "class table inheritance, concrete table inheritance, single-table inheritance" thing, which squares with SQLAlchemy's support. I gotta finish this tomorrow; I'm synthesizing from textbook sources left and right and it feels like this should have all been in one place.

galenanorth fucked around with this message at 12:32 on May 25, 2021 |
# ? May 25, 2021 02:59 |
|
galenanorth posted:I see what you mean. Thank you, I'll remember to try not to ever use inheritance when I'm doing database design. How do you do disjointness with foreign keys and unique indexes?

I've used Django's inheritance and it's very simple: you just have a parent table and a number of child tables, where each child table has a parent_id as its primary key. There is nothing more complicated than that going on.

Django is really not good about using database-level constraints in general; it does a lot of things at the application level that the database really should be taking care of by itself. For example, Django implements ON DELETE CASCADE itself with select and delete queries, instead of just letting the database do it on its own, and that significantly increases the risk of race conditions. Django doesn't use triggers at all either; it's all just implemented on the application side. It absolutely doesn't implement any of the constraints we've talked about. I don't recommend using inheritance in Django, just as I don't recommend using it in general. It may seem tempting, but it's generally a pretty bad abstraction.

The disjointness idea I had yesterday involved a separate table with a parent_id and one nullable foreign key column per potential child table, plus a check constraint of some kind, but now that I think about it I doubt it'd actually be useful, so never mind.

TheFluff fucked around with this message at 10:24 on May 25, 2021 |
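For reference, the parent/child layout described above (a child table whose primary key doubles as the foreign key to the parent, making the relationship one-to-one) can be sketched like this. The employee/pilot names and the 1,500-hour rule are borrowed from the earlier example in the thread, not from anything Django actually generates, and SQLite is used just to make it runnable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Child table: its primary key IS the foreign key to the parent,
-- so an employee can have at most one pilot row (one-to-one).
CREATE TABLE pilot (
    employee_id INTEGER PRIMARY KEY REFERENCES employee (id),
    flight_hours INTEGER NOT NULL CHECK (flight_hours > 1500)
);
""")

conn.execute("INSERT INTO employee VALUES (1, 'Amelia'), (2, 'Bob')")
conn.execute("INSERT INTO pilot VALUES (1, 2000)")

# The discriminator is enforced at the subtype table: too few hours is rejected
try:
    conn.execute("INSERT INTO pilot VALUES (2, 1000)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# Querying the subtype is a plain join on the shared key
row = conn.execute("""
    SELECT e.name, p.flight_hours
    FROM pilot p JOIN employee e ON e.id = p.employee_id
""").fetchone()
```

Note this only covers the discriminator-on-the-subtype case; disjointness and total completeness still need the extra machinery discussed earlier in the thread.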
# ? May 25, 2021 09:55 |
|
Been a while since I've had to write any SQL since being spoiled with an ORM. I need to join a specific related row in a one-to-many relationship. For example: orders code:
code:
code:
|
# ? May 25, 2021 14:37 |
|
kiwid posted:Been a while since I've had to write any SQL since being spoiled with an ORM. SQL code:
|
# ? May 25, 2021 15:28 |
|
TheFluff posted:If I'm understanding you correctly you can just use a left join and put the filter on instructions.type in the join clause instead of in the where clause, like so: Ah yes, this is exactly what I was looking for. Thanks.
|
# ? May 25, 2021 16:02 |
|
galenanorth posted:For inheritance, how would you enforce the disjointness constraint, the total completeness constraint, and the subtype discriminator (e.g. a Pilot must have Flight_Hours > 1,500 in the Employee parent table)? None of my textbooks say it explicitly, but I'm guessing that if you're not using single-table inheritance, you'd use procedural programming like you do when enforcing cardinality. These all seem less vital to enforce than most constraints because they seem like they'd be harder to mess up during data entry, though. For example, it seems like it'd be easier for the data entry person to remember that a Member can be a Volunteer or Staff, but not both, than remember, say, a "three items per customer per visit" cardinality constraint.

I.. just.. if someone tries to make you do this, side-eye them hard and remind them that we don't put business rules in the database itself. We put business rules in the UI code, and we check automatically when engineers are changing poo poo to ensure they aren't violating the requirements of the logging fields. It's always, always easier to ensure that the data entry person can only click Volunteer or Staff, because it's a radio-style button, than to have the person click save and then have nothing save.
|
# ? May 26, 2021 07:43 |
|
if you really want to model non-relational constraints in the database, you can always use a check constraint that calls a function that does whatever you want
|
# ? May 26, 2021 19:45 |
|
I have a WordPress website I've taken over which has a ton of garbage symbols in place of some characters (e.g. ’ instead of a right single quote mark). Encoding settings on the database and the WordPress config are all in order, so what I think has happened is that somewhere down the line the database was exported using the wrong encoding and then re-imported with the mangled symbols. I've been running a REPLACE across the affected columns as I find these symbols, but really I'd like a better way to fix it across the entire database.
|
# ? May 27, 2021 11:21 |
Tea Bone posted:I have a wordpress website I've taken over which has a ton of symbols in place of some characters (i.e ’ instead of a single quote mark)

Ahh, mojibake. I'm assuming this is a MySQL-backed DB? I've dealt with this before, using Python and the ftfy package: selecting all text/varchar-type columns via SELECT HEX(table.columnname), passing the result into binascii.unhexlify, then trying decode("ascii") with a fallback of decode("utf-8") to find all the affected rows. The fix was either done in place with ftfy.fix_encoding and the data written back to a correctly encoded table; or, for my own issue with incorrect utf8 saved to latin1-encoded tables, anything I couldn't encode back to cp1252 was written to a native utf8 table to be imported after converting the table to utf8mb4.

This isn't foolproof, however; some things were lost, replaced with the utf8 unknown-character symbol, or just stripped out, as there was no way of knowing what the original value was supposed to be. But over a terabyte of text data was fixed with only a handful of rows needing to be reconstructed by hand.

NinpoEspiritoSanto fucked around with this message at 11:53 on May 27, 2021 |
|
# ? May 27, 2021 11:47 |
|
Seconding the ftfy recommendation, it's loving magical. Always extremely handy to have around to debug wacky unicode issues in almost any application that accepts user input, even if you're not otherwise using Python (ftfy.explain_unicode() and ftfy.fix_and_explain() are extremely good at figuring out a) what the heck is even this string, and b) what the heck happened to it).
|
# ? May 27, 2021 12:28 |
God I wish I'd had that when I was rebuilding the data of a text archive site that I'd been maintaining since 1994.
|
|
# ? May 27, 2021 12:43 |
|
Bundy posted:Ahh, mojibake, I'm assuming this is a MySQL backed DB? Thank you that looks exactly like what I need.
|
# ? May 27, 2021 14:12 |
|
If it's utf-8-encoded data stored in a latin-1 column then you can also fix it using nothing but ALTER TABLE statements, by changing the column character set via changing it to binary and then to the correct encoding. That might lock the table while the changes take place though, I don't know.
|
# ? May 27, 2021 22:36 |
Hammerite posted:If it's utf-8-encoded data stored in a latin-1 column then you can also fix it using nothing but ALTER TABLE statements, by changing the column character set via changing it to binary and then to the correct encoding. That might lock the table while the changes take place though, I don't know.

While this is true, if MySQL can't work out what a character was (in the case of double encoding), it will either truncate the row from that point on, replace the character with the unknown-character symbol, or throw an error, depending on settings. The defaults don't throw the error.
|
|
# ? May 27, 2021 23:38 |
|
Bundy posted:Python and the ftfy package Bundy posted:if mysql can't ... Defaults don't throw the error. Tallyho!!
|
# ? May 28, 2021 21:35 |
|
kiwid posted:Been a while since I've had to write any SQL since being spoiled with an ORM. So this query has evolved a bit and now I need a result that looks like this: code:
kiwid fucked around with this message at 18:43 on Jun 1, 2021 |
# ? Jun 1, 2021 18:41 |
|
Hi kiwid. If you were using T-SQL I'd probably suggest using PIVOT, but it looks like you're on MySQL, so I'd suggest starting with something like:

code:
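Heavy_D's code block got eaten, but the usual MySQL-flavoured stand-in for PIVOT is conditional aggregation: one row per key, one aggregated CASE expression per would-be column. Sketched here with guessed table and column names, using SQLite just to make it runnable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instructions (order_id INTEGER, type TEXT, body TEXT);
INSERT INTO instructions VALUES
    (1, 'shipping', 'leave at door'),
    (1, 'billing',  'net 30'),
    (2, 'billing',  'prepaid');
""")

# MAX() ignores NULLs, so each CASE picks out the one matching row
# per group, and groups with no matching row get NULL in that column.
rows = conn.execute("""
    SELECT order_id,
           MAX(CASE WHEN type = 'shipping' THEN body END) AS shipping,
           MAX(CASE WHEN type = 'billing'  THEN body END) AS billing
    FROM instructions
    GROUP BY order_id
    ORDER BY order_id
""").fetchall()

print(rows)  # [(1, 'leave at door', 'net 30'), (2, None, 'prepaid')]
```

Unlike T-SQL's PIVOT, the column list here is spelled out by hand, which is also why it ports to basically every DBMS.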
|
# ? Jun 1, 2021 21:35 |
|
kiwid posted:So this query has evolved a bit and now I need a result that looks like this: SQL code:
SQL code:
edit: the above assumes you're selecting columns other than id from orders as well, otherwise you should just use some kind of aggregation on instructions as suggested by Heavy_D. TheFluff fucked around with this message at 21:56 on Jun 1, 2021 |
# ? Jun 1, 2021 21:51 |
|
Sorry I should have mentioned that yes I'm using T-SQL. I will investigate the pivot option. I definitely can't group because there are a lot of non-aggregated columns. Thanks. edit: This is the current mess of a query that is joining the table multiple times: SQL code:
kiwid fucked around with this message at 01:51 on Jun 2, 2021 |
# ? Jun 2, 2021 01:46 |
|
In the pivot approach, if you add something like:

code:
|
# ? Jun 2, 2021 07:56 |
|
Heavy_D posted:In the pivot approach, if you add something like: Hey thanks, PIVOT ended up being exactly what I needed. It took me a little to figure out the syntax but I've modified the query and it's working.
|
# ? Jun 4, 2021 05:24 |
|
Alright, so I've finally launched my application into production. It's simply a web dashboard with a read-only connection to our ERP software's database, for reporting purposes. Unfortunately I'm seeing deadlocks now which didn't occur during development, and so my select queries are being cancelled. I've never experienced deadlocking before, so I'm not sure how to come up with a solution here. It appears that when the ERP software is making update transactions, it locks the rows and won't allow me to issue a select statement. The reporting needs to be as "real-time" as possible, so I determined a while back that an ETL process into a data warehouse wasn't the solution, at least in this instance.

Here is the error I'm seeing: quote:[2021-06-07 15:44:27] production.ERROR: SQLSTATE[40001]: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Transaction (Process ID 75) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

I started Googling and found this "NOLOCK" hint (https://www.mssqltips.com/sqlservertip/2470/understanding-the-sql-server-nolock-hint/). Just wondering if any of you have any feedback on using something like this? I understand the possibility of getting a "dirty read", but I believe that's acceptable considering the dashboard re-polls the query every 60 seconds and would correct itself.
|
# ? Jun 7, 2021 18:05 |
|
kiwid posted:Alright, so I've finally launched my application into production which is simply a web dashboard with a read only connection to our ERP software's database for reporting purposes. Unfortunately I'm seeing deadlocks now which didn't occur during development and so my select queries being cancelled. I've never experienced deadlocking before so I'm not sure how to come up with a solution here. It appears that when the ERP software is making update transactions, it'll lock the rows and won't allow me to issue a select statement. The reporting needs to be as "real-time" as possible and so an ETL process into a data warehouse wasn't the solution I determined a while back, at least in this instance.

Our app is built around accepting dirty reads as okay and squeezing every ounce of performance out of the database server. For years, using (NOLOCK) was a near-mandatory standard. Several years back, we discovered SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED, aka STILRU, which is basically batch-level NOLOCK and is the norm now. We know dirty reads are a thing, but in the interest of deadlock dodging, we accept the risks. It's not the greatest because of the dirty-reads issue; a slightly more conservative option is SET TRANSACTION ISOLATION LEVEL SNAPSHOT. Here's an article I skimmed: https://www.sqlshack.com/snapshot-isolation-in-sql-server/

Instant edit: I would also do some profiling to see why your CRUD is long-running enough to cause deadlock issues. Maybe it's nothing and has to do with the locks that SELECTs take on underlying rows, but it's worth a look.
|
# ? Jun 7, 2021 18:22 |
|
Nth Doctor posted:Our app is built around accepting dirty reads as okay, and squeezing every ounce of performance out of the database server. For years, using (NOLOCK) was a near-mandatory standard. Several years back, we discovered SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED aka STILRU which is basically batch level NOLOCK and is the norm now. Thanks for this information. I'll implement NOLOCK as a quick fix then and I'll start to investigate the SNAPSHOT ISOLATION. I'll also look into the profiling. This database is 65GB so I'm sure there are some performance issues there but I'm not sure if it's server related or just lovely ERP software related. As a side note, is 65GB considered big these days?
|
# ? Jun 7, 2021 19:10 |
|
kiwid posted:As a side note, is 65GB considered big these days?
|
# ? Jun 7, 2021 22:43 |
|
You can get like half a terabyte of RAM on consumer hardware with RGB LEDs these days, for perspective. Heroku's standard postgres plans can get you up to 768 GB of RAM on a single instance. At my last job, a Heroku postgres instance with 30.5 GB of RAM (less than what I have in my gaming PC at home) easily handled a ~100GB database, and I don't consider that to be a big database either. There are exceedingly few businesses that can't be run on top of a single SQL database (cf. Let's Encrypt or Stack Overflow). Imagine I linked that Gary Bernhardt "consulting service: I tell you your dataset fits in RAM; I charge you $10k for saving you $500k" tweet here.
TheFluff fucked around with this message at 00:47 on Jun 8, 2021 |
# ? Jun 8, 2021 00:42 |
|
At my work we have 2 OLTP databases which are clocking in at ~70TB and 55TB, and it still drat works, although large table operations are getting hard so the business has finally really decided to invest in archiving/purging out the drat things.
|
# ? Jun 9, 2021 15:54 |
|
Newbie at SQL here. I've definitely got a stupid question.

code:
Away all Goats fucked around with this message at 04:26 on Jun 14, 2021 |
# ? Jun 14, 2021 04:24 |
|
code:
This is the only place that uses pledges.pledge_amt instead of pledges.pledgeamt. In general your naming schemes seem kind of inconsistent.
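Assuming the table really is named pledges with a pledgeamt column (the original query got eaten, so this is a guess from the reply), the mismatched name fails immediately at query time rather than silently. A minimal reproduction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pledges (pledgeamt REAL)")

# Referencing the misspelled column is an error, not an empty result
try:
    conn.execute("SELECT pledge_amt FROM pledges")
except sqlite3.OperationalError as e:
    print(e)  # no such column: pledge_amt
```

The exact error text varies by DBMS, but every mainstream one rejects the unknown column the same way.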
|
# ? Jun 14, 2021 04:35 |
|
Moonwolf posted:At my work we have 2 OLTP databases which are clocking in at ~70TB and 55TB, and it still drat works, although large table operations are getting hard so the business has finally really decided to invest in archiving/purging out the drat things. Are those Oracle, SQL server, or did you do that on something else? That's pretty huge and I'm curious if you have a ton of tables and multiple applications sharing them, or your core data model is really that big. Either way it's impressive, do your biggest indexes still fit in memory?
|
# ? Jun 14, 2021 12:48 |
|
Twerk from Home posted:Are those Oracle, SQL server, or did you do that on something else? That's pretty huge and I'm curious if you have a ton of tables and multiple applications sharing them, or your core data model is really that big.

That's on Informix, which probably narrows down where I work noticeably. Yeah, the working dataset and indices all fit in memory because we have 3/4 TB of memory on those boxes (mirrored, out of 1.5TB), and we're testing some AMD boxes where we're going up to 1TB visible. Schemas are both 200+ tables plus PKs, FKs, and composite indexes for specific queries, but the bulk of it is high transaction volumes going back about 10 years, although most of the data is from the last 1-2 years, as always, thanks to traffic growth.
|
# ? Jun 14, 2021 13:03 |
|
Hi goons, I'm trying to run a SQL command on WordPress where I want to filter the data by the cost in the WHERE clause below, but still show all the other fields in meta_key and meta_value... I'm sure I'm just missing something.

code:
thanks worms butthole guy fucked around with this message at 13:54 on Jun 17, 2021 |
# ? Jun 17, 2021 13:51 |
|
Do the filter on the right join instead?

SELECT * FROM t1
RIGHT JOIN (SELECT * FROM t2 WHERE ...)
ON ...

Phone posting, so just giving the sketch.
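Fleshing out the phone-posted sketch with guessed WordPress-ish names (and an inner join standing in for the right join, since the filtered side drives the result anyway): filter the key/value rows down to the posts you want in a subquery, then join that back against the full meta table so every field for the matching posts survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT);
INSERT INTO postmeta VALUES
    (1, 'cost',  '10'),
    (1, 'color', 'red'),
    (2, 'cost',  '99');
""")

# Subquery picks the post_ids whose cost qualifies; joining it back
# against postmeta returns ALL meta rows for those posts, not just cost.
rows = conn.execute("""
    SELECT pm.post_id, pm.meta_key, pm.meta_value
    FROM (SELECT post_id FROM postmeta
          WHERE meta_key = 'cost'
            AND CAST(meta_value AS INTEGER) < 50) AS cheap
    JOIN postmeta pm ON pm.post_id = cheap.post_id
    ORDER BY pm.post_id, pm.meta_key
""").fetchall()

print(rows)  # [(1, 'color', 'red'), (1, 'cost', '10')]
```

The CAST is there because WordPress stores meta_value as text; comparing it numerically without the cast can give surprising results.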
|
# ? Jun 17, 2021 14:10 |
|
ooo I didn't know you could even do that
|
# ? Jun 17, 2021 14:14 |
|
Postgresql question: any way to give a user privileges such that they can run arbitrary SELECTs, but their SELECTs never create locks (at the cost of potentially getting dirty reads)? We've given our tech support staff read-only privileges, but they have still accidentally managed to cause a deadlock against a production query, probably by leaving multiple tabs running or whatever. And yes, I know the only 100% safe way is to create a read-only replica, but I would first like to look for a more lightweight solution if possible.
|
# ? Jun 17, 2021 14:31 |
|
NihilCredo posted:Postgresql question: any way to give a user privileges such that they can run arbitrary SELECTs, but their SELECTs never create locks (at the cost of getting potentially dirty reads)? We've given our tech support staff read-only privileges, but they have still accidentally managed to cause a deadlock against a production query, probably by leaving multiple tabs running or whatever.

That sounds... very weird. AFAIK, normal SELECT queries only ever acquire an ACCESS SHARE lock on the referenced table(s), and the only lock which can conflict with that is ACCESS EXCLUSIVE. ACCESS EXCLUSIVE in turn is normally only acquired by DDL statements like DROP TABLE, ALTER TABLE, ALTER INDEX etc.

The only way I can see this causing a deadlock outside of a schema migration is if you have a transaction that uses explicit locking and does LOCK TABLE without specifying the lock mode, because that gives you ACCESS EXCLUSIVE (and you should probably fix that transaction, in that case). Or, possibly, if their select queries are written as SELECT FOR SHARE or SELECT FOR UPDATE.

Either way, there is no way in postgres to completely disable locking; there's no equivalent to MSSQL's NOLOCK as far as I know.
|
# ? Jun 17, 2021 15:49 |
|
Fixins posted:hi goons i'm trying to run a sql command on wordpress where i want to filter the data by the where cost in the below, but still show all the other fields in meta_key and meta_value...i'm sure i'm just missing something SQL code:
TheFluff fucked around with this message at 16:10 on Jun 17, 2021 |
# ? Jun 17, 2021 16:08 |
|
|
|
TheFluff posted:That sounds... very weird. AFAIK, normal SELECT queries only ever acquire an ACCESS SHARE lock on the referenced table(s), and the only lock which can conflict with that is ACCESS EXCLUSIVE. ACCESS EXCLUSIVE in turn is normally only acquired by DDL statements like DROP TABLE, ALTER TABLE, ALTER INDEX etc. The only way I can see this causing a deadlock outside of a schema migration is if you have a transaction that uses explicit locking and does LOCK TABLE without specifying the lock mode, because that gives you ACCESS EXCLUSIVE (and you should probably fix that transaction, in that case). Or, possibly, if their select queries are written as SELECT FOR SHARE or SELECT FOR UPDATE. Yeah, I too thought it was hard to get locks. But we had an ETL job query that normally takes ~20 seconds that had been waiting for the better part of an hour, so we ran the recommended lock monitoring queries and we saw something like this: code:
Unfortunately I was stupid and proceeded to immediately kill the human-user PIDs (which did clear the deadlock and the job started working in the usual ~20s again), but didn't think of first adding a few more columns to the pg_locks query to find out exactly which resources were being deadlocked. quote:Either way, there is no way in postgres to completely disable locking; there's no equivalent to MSSQL's NOLOCK as far as I know. Darn.
|
# ? Jun 17, 2021 17:24 |