Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
Unfortunately it's not my table so I can't alter the design. I can only write queries against it.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


That's fair; this sort of thing is often handed to us. But yeah, you're going to just have to join a bunch of times, I'm afraid. It should be pretty cheap though, just ugly in the SQL.

nielsm
Jun 1, 2009



Agrikk posted:

Another question:

Given a table like so:
code:
TransactionID  Item1  Item1Count  Item2  Item2Count  Item3  Item3Count  Item4  Item4Count
abc123         4      10          5      20          6      100         7      50
Without doing four joins? Is there a lookup table function or technique that'll look up all the values in the whole row from one table at once?

Uh, untested stupid idea:

SQL code:
with TransactionItems as (
    -- unpivot the four item/count column pairs into rows;
    -- UNION ALL skips the duplicate-elimination sort a plain UNION would do
    select TransactionID, Item1 ItemID, Item1Count ItemCount from Transactions
    union all
    select TransactionID, Item2 ItemID, Item2Count ItemCount from Transactions
    union all
    select TransactionID, Item3 ItemID, Item3Count ItemCount from Transactions
    union all
    select TransactionID, Item4 ItemID, Item4Count ItemCount from Transactions
)
select *
from TransactionItems ti
join Items i on ti.ItemID = i.ItemID

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


That'll probably be quite slow, but depending on which db Agrikk is using, there might be an unpivot to do the same thing with fewer reads. A lot of that depends on whether you're doing this for one transaction ID or a huge group of them, though, Agrikk.
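
As a sketch of that single-scan idea in SQL Server syntax (table and column names assumed from the example above): strictly this uses CROSS APPLY (VALUES ...) rather than UNPIVOT, since UNPIVOT only handles one value column at a time, but it unpivots the item/count pairs in one read of the table.

SQL code:
-- one scan of Transactions instead of four; join Items afterwards as before
select t.TransactionID, v.ItemID, v.ItemCount
from Transactions t
cross apply (values
    (t.Item1, t.Item1Count),
    (t.Item2, t.Item2Count),
    (t.Item3, t.Item3Count),
    (t.Item4, t.Item4Count)
) v (ItemID, ItemCount)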

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
It's a MySQL database, and it's for an ad-hoc report run against about 40k rows.

As ugly as it is, I suppose I'll just go with the list of joins. Thanks y'all for your input.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
There have been way too many uses of the word "pivot" in the thread lately. This is not the Excel thread.

Please pivot to proper https://en.m.wikipedia.org/wiki/Database_normalization

:eng99:

Sir Bobert Fishbone
Jan 16, 2006

Beebort

PhantomOfTheCopier posted:

There have been way too many uses of the word "pivot" in the thread lately. This is not the Excel thread.

Please pivot to proper https://en.m.wikipedia.org/wiki/Database_normalization

:eng99:

Makin a lot of assumptions about how much control we have over db schemas here

Gatac
Apr 22, 2008

Fifty Cent's next biopic.
A project I recently switched to has a table that has no fewer than 25 varchar2(1) columns to be filled with Y/N in response to specific questions on a work order. (They weren't bloody-minded enough to have check constraints for all 25 columns, but I almost would have appreciated it more if they had.) And of course they're labelled Question1...Question25. To be fair, they started with only 15 of those and then added the rest in batches of 2 or 3 at a time as the need arose, apparently never once thinking about whether this was a good idea.

Forget database normalization, this is the blackest timeline.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

Gatac posted:

A project I recently switched to has a table that has no fewer than 25 varchar2(1) columns to be filled with Y/N in response to specific questions on a work order. (They weren't bloody-minded enough to have check constraints for all 25 columns, but I almost would have appreciated it more if they had.) And of course they're labelled Question1...Question25. To be fair, they started with only 15 of those and then added the rest in batches of 2 or 3 at a time as the need arose, apparently never once thinking about whether this was a good idea.

Forget database normalization, this is the blackest timeline.

You've sparked a question I've been meaning to ask for a while - what would be the proper database/application patterns for a multi-stage wizard-type workflow, where you have multiple "pages" of data to collect but still want to maintain constraints on your database/etc.?

What we currently do is tie it to your browser session at the application level. I suppose you could also have each "table" be a stage of the workflow and maintain constraints on each piece individually as it comes in; that way you could push it out of the application layer, for multi-server scaling or to allow server restarts, etc.

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Paul MaudDib posted:

You've sparked a question I've been meaning to ask for a while - what would be the proper database/application patterns for a multi-stage wizard-type workflow, where you have multiple "pages" of data to collect but still want to maintain constraints on your database/etc.?

What we currently do is tie it to your browser session at the application level. I suppose you could also have each "table" be a stage of the workflow and maintain constraints on each piece individually as it comes in; that way you could push it out of the application layer, for multi-server scaling or to allow server restarts, etc.

IMO, you should have one table per stage, which isn't necessarily the same as a page.

If two steps of the wizard are independent of each other, then they can share the same record in the same table - the fact that they are displayed on separate pages is purely a presentation concern. You'll be able to move fields between them, or reorder the pages, or split / merge them, without the need for a migration. Or you could have one big page on desktop, and many smaller ones on mobile.

However, if step 5 requires the user to have completed step 4 in order to be properly displayed, then they belong to different stages of the process, and step 5 should live in a table with a foreign key constraint on the step 4 table. The page order in that case isn't just presentation, it's a data-level constraint.
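
As a minimal sketch of that stage-per-table idea (table and column names here are invented for illustration):

SQL code:
-- hypothetical stage tables: a WizardStage5 row can only exist
-- once the matching WizardStage4 row does
create table WizardStage4 (
    ApplicationID int not null primary key,
    ShippingAddress varchar(200) not null
);

create table WizardStage5 (
    ApplicationID int not null primary key
        references WizardStage4 (ApplicationID),
    PaymentMethod varchar(50) not null
);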

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
I have a SQL Server 2017 table with data added in hourly batches: approximately 2.5 million small rows (5 columns, of types datetime, varchar, bigint, bigint, bigint) every 60 minutes. This table holds the trailing two weeks of data, so everything older than DATEADD(hour, -336, GETDATE()) gets deleted every hour.

I'm trying to speed up the deletion process and am thinking about implementing a sliding window partitioning scheme with partitions for every hour, merging a staging table onto the top of the table and truncating a staging table off the bottom.

Can someone check my thinking here? I'm a novice to partitioning and am happy to do the legwork to get it working. I just want to make sure this is the correct approach.

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Before you pull out the big guns, have you checked if a plain old clustered non-unique index on the datetime column might sufficiently speed up the deletion without significantly impacting write speeds (assuming the incoming data is usually chronological)?

edit: a columnstore index may also help you, depending on the use you normally make of the data
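
For illustration, a minimal sketch of that first suggestion (the table name is hypothetical):

SQL code:
-- a non-unique clustered index keeps rows in time order, so the hourly
-- purge deletes one contiguous range instead of scattered pages
create clustered index CIX_Stamp on dbo.HourlyStats (stamp);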

NihilCredo fucked around with this message at 08:19 on Apr 22, 2020

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


We do this for very high volume tables. Our approach is to use a partitioned index partitioned on the add_date column by week. To delete old data, I believe we move a partition out to a different table which we subsequently drop. This is all fuzzy recollection from when I owned the code that used the table a few years ago, but I can look up the method when I hop online to work later.

Just-In-Timeberlake
Aug 18, 2003
Not sure if this is a SQL solution or something else so asking for opinions.

Say you have 90k+ text files (~3gb), with several different file formats

What would be the best way to store and search them if you couldn't use a simple file search? (I wrote a file search app on steroids to do that and it works aces, but going forward this won't be an option)

I'm considering the following:

  • load each file into MSSQL using a FULLTEXT index (either the whole file per record or parse each row into its own record). Pro: I know how to do this already
  • Use ElasticSearch or Solr (or something similar). Con: I know gently caress all about these and would have to figure it out

Any opinions? Pros/Cons?
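
For the MSSQL option, a rough sketch of the moving parts (all names invented; assumes one row per file with the text in a Body column):

SQL code:
create table dbo.Documents (
    DocID int identity not null constraint PK_Documents primary key,
    FileName nvarchar(260) not null,
    Body nvarchar(max) not null
);

create fulltext catalog DocCatalog;

-- the full-text index hangs off a unique key, here the primary key
create fulltext index on dbo.Documents (Body)
    key index PK_Documents on DocCatalog;

-- searches then look like:
select DocID, FileName
from dbo.Documents
where contains(Body, '"connection timeout"');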

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Nth Doctor posted:

We do this for very high volume tables. Our approach is to use a partitioned index partitioned on the add_date column by week. To delete old data, I believe we move a partition out to a different table which we subsequently drop. This is all fuzzy recollection from when I owned the code that used the table a few years ago, but I can look up the method when I hop online to work later.

I went back and looked up our method:
code:
-- Staging table: must match the real table's schema and indexes, and live
-- on the same filegroup, or the partition switch will fail
CREATE TABLE dbo.To_Be_Deleted
(
  /* Columns matching your real table here */
  CONSTRAINT PK_For_To_Be_Deleted PRIMARY KEY CLUSTERED
  (
    /* PK columns here */
  )
);

-- Find the oldest populated partition number
DECLARE @MinimumPartition INT;
SELECT
  @MinimumPartition = MIN($PARTITION.Partition_Function(Partition_Column))
FROM dbo.Real_Table;

DECLARE @SQL NVARCHAR(MAX);
SET @SQL = N'
ALTER TABLE dbo.Real_Table
SWITCH PARTITION ' +
CAST(@MinimumPartition AS VARCHAR(10)) +
' TO dbo.To_Be_Deleted';

-- Move the partition out (a metadata-only operation, effectively instant)
EXEC (@SQL);

-- Drop the table holding the old partition
DROP TABLE dbo.To_Be_Deleted;
I'm disconnected from our VPN so I can't pull up the partition function as an example, but these are the broad strokes for dropping a large swath of date-related data.
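
For anyone following along, a generic sketch of what such a function and scheme might look like (illustrative only, not Nth Doctor's actual definition):

SQL code:
-- weekly RANGE RIGHT boundaries, everything on one filegroup
create partition function Partition_Function (datetime)
    as range right for values ('2020-04-06', '2020-04-13', '2020-04-20');

create partition scheme Partition_Scheme
    as partition Partition_Function all to ([PRIMARY]);

-- dbo.Real_Table (and the dbo.To_Be_Deleted clone) would be created
-- on Partition_Scheme(Partition_Column) so SWITCH PARTITION lines up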

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Multiple replies, and I'm half asleep after a long day, but I'll do my best...

Do you have functional indexes? Index on (epoch seconds modulo 3600)? Do you know if the bottleneck is finding the rows or removing them? Compare with a select runtime.

For full text search it depends on your expected request rate. It's not much data so storing it in-house in a database permits you more custom configuration, language support, scripting and so forth. Cloud solutions depend on how often you're expecting to conduct searches. Remember that you'll be paying to retrieve the full document from most cloud providers, so you have to benefit from their search architecture.

Most cloud providers can handle ad-hoc searching, say a few requests per day. That would be something like S3 storage plus an ETL job to rebuild indexes when documents are updated. If you expect lots of requests, then it becomes more efficient to permanently store the documents in a searchable fashion and leverage Elasticsearch, semantic analysis tools, and so forth.

I'm not the expert in setting it all up but your primary concerns are total storage size, expected document update rate, expected search rate, and total cost of operation for on-premises versus request and data transfer costs.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
Thanks for the replies, everyone. I'll back up to avoid solving for the wrong issue here.

It's basically an insert performance problem that I'm solving with a solution that generates a delete performance problem.

Like I said upthread, I have a table into which data is inserted in batches of around 2 million rows each hour. I retain that data for two weeks and then purge the oldest data. During that time I also perform rollups to a daily table, which in turn rolls up to a monthly table.

I would like to bring down the cost of inserts and deletes on the hourly table. The table is currently 106GB in size and contains around 780 million rows.

The table looks like this:

[username] [varchar](75) NOT NULL,
[team] [bigint] NOT NULL,
[stamp] [datetime] NOT NULL,
[score] [bigint] NULL,
[wu] [bigint] NULL,

with a clustered primary key on ([username], [team], [stamp]) and an additional non-clustered index on [stamp].

Every hour I perform a DELETE FROM table WHERE stamp < (dateadd(ww, -2, (SELECT MAX(stamp) FROM table))) to snip off the oldest rows.

I'm performing this trim because inserts are taking a long time and I've found that keeping the row count low is the best bang for the buck.
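
One cheap thing worth trying before partitioning, offered as a common pattern rather than a guaranteed fix: chunk the delete so each transaction holds fewer locks and writes less log at once. A sketch (table name hypothetical, batch size illustrative):

SQL code:
declare @cutoff datetime;
select @cutoff = dateadd(ww, -2, max(stamp)) from dbo.HourlyStats;

while 1 = 1
begin
    -- each batch commits separately, keeping lock and log pressure down
    delete top (50000) from dbo.HourlyStats where stamp < @cutoff;
    if @@ROWCOUNT = 0 break;
end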

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
Is it commonplace when designing a schema to find that you need to add relationships across a "diamond"?

Say I have table A which has foreign key constraints to tables B and C. Table D also has foreign keys to tables B and C. The purpose of table D is to dictate which pairs of (B, C) values can legally appear in table A. So, add a foreign key from A to D as well; simple enough.

But now say that instead of C on one side of the diamond, I have C1 and C2. A has a foreign key to C1 which has a foreign key to C2; D has foreign keys to B and C2. A still has its foreign key to B.

If I'm using a composite key for table C1 then I can still do what I did before, because A will have a column or columns which are guaranteed to match with the grandparent row in C2. But what if I'm using surrogate keys everywhere? I basically have to break that convention for C1.

In case this all made anyone's eyes glaze over, here are ASCII diagrams.

code:
---- D ----        ---- D ----
|         |        |         |
|         |        |         v
v         v        v         C2
B         C        B         ^
^         ^        ^         C1
|         |        |         ^
|         |        |         |
---- A ----        ---- A ----

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Hammerite posted:

Is it commonplace when designing a schema to find that you need to add relationships across a "diamond"?

...

code:
---- D ----        ---- D ----
|         |        |         |
|         |        |         v
v         v        v         C2
B         C        B         ^
^         ^        ^         C1
|         |        |         ^
|         |        |         |
---- A ----        ---- A ----
It doesn't look like a diamond pattern problem (to me). It's an explicit whitelist/blacklist problem, so B is kinda superfluous.

Order.itemid references whitelist.itemid. (A -> D)

Why should A reference B or C/C1 when the relevant restricted possibilities are contained in D? (I'll ponder c1 vs c2 while sleeping).

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

PhantomOfTheCopier posted:

It doesn't look like a diamond pattern problem (to me). It's an explicit whitelist/blacklist problem, so B is kinda superfluous.

Order.itemid references whitelist.itemid. (A -> D)

Why should A reference B or C/C1 when the relevant restricted possibilities are contained in D? (I'll ponder c1 vs c2 while sleeping).

D is a junction table between B and C2.

In greater detail than I gave before: a row in C2 represents a schema-version for documents managed by the system (a document-schema-version). A row in C1 represents such a document. C1 references C2 because a document has a document-schema-version. Each document can contain zero, one or many entities that come in several varieties (entity-varieties). Each entity-variety itself has schema-versions (entity-variety-schema-versions). A row in B represents an entity-schema-version. A row in A represents an entity within a document. Each document-schema-version may have associated to it zero or one entity-variety-schema-versions for each entity-variety. A row in D represents the possibility that entities of a specific entity-variety can be present in a document with a specific document-schema-version, and what the entity-schema-version of entities of that entity-variety must be if they are present. It follows that for any row in A you need to be able to chase around the diamond both ways and arrive at the same row in D.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


I guess reading that I still think Phantom may be right.

code:
          C1
          v
B         C2
v         v
---- D ----
     ^
     A 
?

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Ruggan posted:

I guess reading that I still think Phantom may be right.

code:
          C1
          v
B         C2
v         v
---- D ----
     ^
     A 
?

I was going to post and say this is wrong, but actually bits of it are right.

Like you and PhantomOfTheCopier were saying, A should have a foreign key reference to D. But it still doesn't make sense for B and C2 to have foreign key references to D as your diagram suggests; they should be the other way around. And A still needs a reference to C1. So it should look like:

code:
B <-- D --> C2
      ^     ^
      |     |
      A --> C1
There is still a diamond and a need to use composite keys; however, the arrows now go in consistent directions, which is the main thing that was bothering me before.
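
In DDL terms, that final shape could look something like this (a sketch; all column names invented):

SQL code:
create table B  (BID int not null primary key);
create table C2 (C2ID int not null primary key);

create table C1 (
    C1ID int not null,
    C2ID int not null references C2 (C2ID),
    primary key (C1ID, C2ID)  -- composite, so A can name the C2 grandparent
);

create table D (
    BID  int not null references B (BID),
    C2ID int not null references C2 (C2ID),
    primary key (BID, C2ID)
);

create table A (
    AID  int not null primary key,
    BID  int not null,
    C1ID int not null,
    C2ID int not null,
    foreign key (C1ID, C2ID) references C1 (C1ID, C2ID),
    foreign key (BID, C2ID)  references D (BID, C2ID)  -- both paths meet at the same D row
);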

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Looks more reasonable like that, and probably good enough as a first step on a free dead gay comedy forum. (Reading through, it looks reasonable; a few random, unchecked thoughts sprang to mind questioning whether it must be that way, but...)

There's certainly a good supply of graphs that can't be represented easily as relational data, with the associated key/trigger/app tradeoffs, rather dependent on the features of your database. But you knew that.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


PhantomOfTheCopier posted:

Looks more reasonable like that, and probably good enough as a first step on a free dead gay comedy forum. (Reading through, it looks reasonable; a few random, unchecked thoughts sprang to mind questioning whether it must be that way, but...)

There's certainly a good supply of graphs that can't be represented easily as relational data, with the associated key/trigger/app tradeoffs, rather dependent on the features of your database. But you knew that.

You got in for free?!

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Nth Doctor posted:

You got in for free?!
Please use full text search capabilities of your database to find a more precise word, because I'm forced to observe: He wasn't easy but he sure was cheap. :pervert:

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
Perplexed in another area of database design now. (This is SQL Server)

I was looking at the documentation on foreign keys and noted that the columns of a foreign key have to match a primary key or unique key in the referenced table. Well, that is perfectly reasonable. But surely it should be ok for the foreign key to reference a superset of the columns in a unique key? Uniqueness is guaranteed, after all. But no, it won't have it, even if you add a unique index ... include (...):

code:
drop table if exists Z;
drop table if exists Y;
drop table if exists X;

create table X (
	X0 int not null,
	X1 int not null,
	constraint pkX primary key (X0, X1)
)

create table Y (
	X00 int not null,
	X01 int not null,
	X10 int not null,
	X11 int not null,
	constraint pkY primary key (X00, X01, X10),
	constraint fkYX0 foreign key (X00, X01) references X (X0, X1),
	constraint fkYX1 foreign key (X10, X11) references X (X0, X1)
)

create unique index YIx on Y (X00, X01, X10) include (X11)

create table Z (
	X00 int not null,
	X01 int not null,
	X10 int not null,
	X11 int not null,
	constraint pkZ primary key (X00, X10),
	constraint fkZY foreign key (X00, X01, X10, X11) references Y (X00, X01, X10, X11) -- rejected: Msg 1776 below
)
Msg 1776, Level 16, State 0, Line 23
There are no primary or candidate keys in the referenced table 'Y' that match the referencing column list in the foreign key 'fkZY'.

If you declare YIx as being on (X00, X01, X10, X11) it accepts it just fine. So if you want to do this you have to add an entirely superfluous unique constraint/unique index... :psyduck:
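
For completeness, the variant it does accept:

SQL code:
-- widen the index key itself instead of using INCLUDE; fkZY is then accepted
create unique index YIx on Y (X00, X01, X10, X11)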

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
While it's true mathematically, the database has to implement the reference in a consistent and enforceable manner. Take a look at the system tables. My guess is you'll find that references are tied to the (single) foreign index, as opposed to a list of indexes. There's also a question of how much analysis should be needed for a foreign key relationship to be established as proper.

First question: Suppose you had 100 tables, then a table in the middle that referenced those 100, then the upper table that did this little trick. What would be required on an insert to ensure the foreign composite key was valid?

Second question: What should happen when you drop one of the foreign key constraints in the middle table? How should the database track that such a change will break the expectations of the upper table?


vvvv In that case you'd better follow the instructions and not worry about it much.

PhantomOfTheCopier fucked around with this message at 14:26 on Apr 30, 2020

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

PhantomOfTheCopier posted:

While it's true mathematically, the database has to implement the reference in a consistent and enforceable manner. Take a look at the system tables. My guess is you'll find that references are tied to the (single) foreign index, as opposed to a list of indexes. There's also a question of how much analysis should be needed for a foreign key relationship to be established as proper.

First question: Suppose you had 100 tables, then a table in the middle that referenced those 100, then the upper table that did this little trick. What would be required on an insert to ensure the foreign composite key was valid?

Second question: What should happen when you drop one of the foreign key constraints in the middle table? How should the database track that such a change will break the expectations of the upper table?

I'm sorry, I can't really interpret what you're saying here easily. I don't know what you mean to refer to by the phrase "this little trick". I'm also not sure what direction your height metaphor goes in (is "upper table" X or Z?)

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
There's a ton of theory here, but basically the database is ensuring efficient handling of foreign keys by requiring a (single) reference index. It could do a full foreign table scan to ensure that the entries are there, but then every delete operation would need to check all referencing tables to make sure that nothing breaks. Instead, deleting the key from the index permits a quick check of referencing tables so the correct action can be triggered.

Likewise, the database could permit a reference to multiple indexes. But then why not multiple indexes across multiple tables? Why not have (z.x00,z.x01,z.x10,z.x11) reference (y.x00,y.x01,y.x10,x.1)? (And now you know why I didn't bother with your names previously because who can keep that straight?) Or maybe it could reference (y.x00,y.x01,x.0,x.1)?

Basically they've chosen to disallow such things because that leads to madness and a bunch of superfluous operations. Storage is less expensive. (There may even be infinite trigger recursion hiding in the non-single-index scenario.)

Here's a good question to ask: Given the table layouts described, how could you convince someone that Z.(x00,x01,x10,x11) is unique? For that matter, what about Y.(x00,x01,x10,x11)? What are the mathematical and/or programmatic steps taken to ensure uniqueness? Those are the steps the DB would have to perform on DDL creation to handle a transitive tree of relationships, and it would have to be updated if someone tried to change one small part of it. Their design choice was, "Naup, single index reference".


ps. There's another cute thing you can do in Y as defined that doesn't alter the primary key entry but would require a change to Z. (Possibly db implementation specific, but logically valid.) We'll call it homework question two.

pps. Foreign keys don't really reference column values; they use row IDs / object IDs. It's a subtle point, but I suspect you already knew that.

PhantomOfTheCopier fucked around with this message at 23:42 on Apr 30, 2020

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
You seem convinced that there's a big problem with there being "multiple indexes". There isn't; that's a red herring as far as I can see. I overcomplicated the original example because I was unnecessarily replicating some aspects of the real-world problem I had. Here is a minimal example:

code:
drop table if exists z;
drop table if exists y;

create table Y(
    IdY int not null,
    Value int not null,
    constraint pky primary key(IdY)
)

create table Z(
    IdZ int not null,
    IdY int not null,
    Value int not null,
    constraint pkz primary key(IdZ),
    constraint fk foreign key (IdY, Value) references Y (IdY, Value) -- Msg 1776 There are no primary or candidate keys etc.
)
You're also telling me off as though I was imperiously demanding you studiously use my table names throughout your posts, but all I said was that I didn't know what up and down are in your metaphor. You could just clarify with a single sentence and then continue to use the metaphor; I didn't presume to forbid you that.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
You spent time on this and typed a bunch, so I'm going to do you the courtesy of responding properly to some of it.

PhantomOfTheCopier posted:

There's a ton of theory here, but basically the database is ensuring efficient handling of foreign keys by requiring a (single) reference index. It could do a full foreign table scan to ensure that the entries are there, but then every delete operation would need to check all referencing tables to make sure that nothing breaks. Instead, deleting the key from the index permits a quick check of referencing tables so the correct action can be triggered.

I don't think there's anything in my original example that implies the DB engine should do more than it currently already supports. SQL Server allows foreign keys to be on columns that aren't indexed in the referencing table. They do have to be indexed in the referenced table. I went to the trouble (as I said in my post) of seeing if the DB would allow what I was trying to do if I used the INCLUDE mechanism to include the additional column in the index, but it doesn't.

quote:

Likewise, the database could permit a reference to multiple indexes. But then why not multiple indexes across multiple tables? Why not have (z.x00,z.x01,z.x10,z.x11) reference (y.x00,y.x01,y.x10,x.1)? (And now you know why I didn't bother with your names previously because who can keep that straight?) Or maybe it could reference (y.x00,y.x01,x.0,x.1)?

Basically they've chosen to disallow such things because that leads to madness and a bunch of superfluous operations. Storage is less expensive. (There may even be infinite trigger recursion hiding in the non-single-index scenario.)

As I said in my previous post I don't see how "multiple indexes" applies, so I think these two paragraphs are erroneous. Perhaps you can tell me why they aren't really.

quote:

Here's a good question to ask: Given the table layouts described, how could you convince someone that Z.(x00,x01,x10,x11) is unique?

I could not convince anyone of that, because it is not true. The primary key on Z is (x00, x10) and it has no further unique indexes.

quote:

For that matter, what about Y.(x00,x01,x10,x11)?

Suppose for a contradiction that Y has two rows with the same values for (X00, X01, X10, X11). The primary key on Y is (X00, X01, X10). If Y has two rows that have the same values for (X00, X01, X10, X11) then in particular those two rows have the same values for (X00, X01, X10), which violates the primary key on Y. This is the desired contradiction. QED.

quote:

What are the mathematical and/or programmatic steps taken to ensure uniqueness?

It's identifying a subset of the columns (in the proposed foreign key on Z) that are collectively subject to a unique constraint in the referenced table. I don't think that's mathematically or programmatically intractable.

quote:

Those are the steps the DB would have to perform on DDL creation to handle a transitive tree of relationships, and it would have to be updated if someone tried to change one small part of it. Their design choice was, "Naup, single index reference".

The word "transitive" here is curious. Nothing need be transitive; see my minimal example.

quote:

ps. There's another cute thing you can do in Y as defined that doesn't alter the primary key entry but would require a change to Z. (Possibly db implementation specific, but logically valid.) We'll call it homework question two.

There are various indicators in your post that we are talking at least partially at cross purposes, so I'm not going to attempt to infer what you're driving at here.

nielsm
Jun 1, 2009



When IdY is unique in itself, why does your Z table need to reference more fields than that? I don't understand what that achieves. All I see is denormalized data, with the Value field duplicated needlessly.
Are you sure Z should contain Value at all, rather than having a view that joins Y and Z?
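
Something like this, as a sketch against the minimal example (with Z dropping its own Value column):

SQL code:
create view ZWithValue as
select z.IdZ, z.IdY, y.Value  -- Value comes from Y, not duplicated in Z
from Z z
join Y y on y.IdY = z.IdY;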

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

nielsm posted:

When IdY is unique in itself, why does your Z table need to reference more fields than that? I don't understand what that achieves. All I see is denormalized data, with the Value field duplicated needlessly.
Are you sure Z should contain Value at all, rather than having a view that joins Y and Z?

In the two-table example, yes, everything appears gratuitous, but that is in the nature of a minimal reproducible example.

In the first example I posed (with X, Y and Z) I cut out some of the detail from the real-world example (but left some in, and got the worst of both worlds - a non-minimal example and a lack of clarity).

Imagine that there is another table W with only X00, and that X references W. The real-world meaning of a row in Y is that two rows from X can be associated to one another, but asymmetrically (there is an "owning" row and an "owned" row); each row from X can "own" another row, but only one per W-value of the "owned" row. Z meanwhile is constrained by Y, but should reference X twice (missing from the example).

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Hammerite posted:

I don't think there's anything in my original example that implies the DB engine should do more than it currently already supports. SQL Server allows foreign keys to be on columns that aren't indexed in the referencing table. They do have to be indexed in the referenced table. I went to the trouble (as I said in my post) of seeing if the DB would allow what I was trying to do if I used the INCLUDE mechanism to include the additional column in the index, but it doesn't.

Is all of this just wanting an FK constraint to work (more) like an index?

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


I haven't thought too deeply about the problem, but are we looking for something like a CHECK constraint or possibly functionality like an indexed view to enforce schema rules not otherwise expressible by FKs?

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Nth Doctor posted:

I haven't thought too deeply about the problem, but are we looking for something like a CHECK constraint or possibly functionality like an indexed view to enforce schema rules not otherwise expressible by FKs?

I wasn't previously aware of indexed views. Playing around with it, I see that an alternative to the 3-column primary key + 4-column unique index (on Y in my original example) is to make all 4 columns the primary key and then define a unique index on a view. But it's not clear to me that this is an improvement.
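
Concretely, that alternative would look roughly like this (a sketch; indexed views require SCHEMABINDING and two-part names):

SQL code:
-- Y's primary key widened to all four columns so fkZY is accepted,
-- with the original three-column uniqueness enforced through the view
create view dbo.YKey with schemabinding as
select X00, X01, X10 from dbo.Y;
go
create unique clustered index UQ_YKey on dbo.YKey (X00, X01, X10);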

I don't think a CHECK constraint works, since it applies only to a single row of a single table, and this stuff inherently involves comparing with other rows and values in other tables.

Munkeymon posted:

Is all of this just wanting an FK constraint to work (more) like an index?

How do you mean

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Nth Doctor posted:

I haven't thought too deeply about the problem, but are we looking for something like a CHECK constraint or possibly functionality like an indexed view to enforce schema rules not otherwise expressible by FKs?

That’s what it sounds like to me. That given a FK reference you also want to verify a certain value is set a certain way. Sounds like a normalization issue to a degree, but sometimes this stuff gets super complex in the real world.

I do think hammer is right that it's not a technically impossible problem - when you have a unique index that includes your addl column, it is, by definition, still unique. Also, the supporting index means no additional reads are needed. I just don't think the engine supports it.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
I'm only referencing multiple indexes to raise the issue of a general implementation of what's proposed by the example. It's a very important design question: If the syntax supported This, what cases must be considered?

PhantomOfTheCopier posted:

ps. There's another cute thing you can do in Y as defined that doesn't alter the primary key entry but would require a change to Z. (Possibly db implementation specific, but logically valid.) We'll call it homework question two.
If Z(a,b,c,d) references a row of idxY(a,b,c) without 'd', then "UPDATE Y SET d='something else'" doesn't necessarily change the index in Y, hence doesn't trigger a change in Z, yet violates the implied foreign key relationship that's intended. Thus, Z(a,b,c,d) actually needs to reference idxY(a,b,c) and fkYX1, that is to idxY and deferred-idxX, in order to detect a change to any of (a,b,c,d). This is why I'm using the term "multiple indexes".

So we're not really talking cross purposes, but I may be thinking about the more general implementation issue. Nothing prevents you from building a database that supports multiple foreign index references in a foreign key implementation, but the implementation of triggering actions to maintain consistency is very important. If it becomes impossible or intractable to enforce consistency, then the implementation probably isn't going to make it into a general purpose database.


The database doesn't want to import a full logic engine to establish DDL, because there are cases that are hard. Building a system that supports "suppose for contradiction that..." is possible, but instead of being predicated on a simple "if exists / no longer exists" check, it would have to build mechanisms to check all potentially-related entities for each foreign column on every operation. I still expect there are infinite-regression cases hiding in there.

The case you've given is actually easy to solve with subsets... in hierarchical databases. :buddy: Sadly we don't get those here, but you can fake them with materialized views, partitions, and the like.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

PhantomOfTheCopier posted:

I'm only referencing multiple indexes to raise the issue of a general implementation of what's proposed by the example. It's a very important design question: If the syntax supported This, what cases must be considered?

If Z(a,b,c,d) references a row of idxY(a,b,c) without 'd', then "UPDATE Y SET d='something else'" doesn't necessarily change the index in Y, hence doesn't trigger a change in Z, yet violates the implied foreign key relationship that's intended. Thus, Z(a,b,c,d) actually needs to reference idxY(a,b,c) and fkYX1, that is to idxY and deferred-idxX, in order to detect a change to any of (a,b,c,d). This is why I'm using the term "multiple indexes".

Ok, I can see where you are coming from. I will grant you that it involves work on the part of the database engine to be able to tackle this problem in general. I still think that a "sufficiently clever" database engine could do it, but perhaps it is asking more than is reasonable.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Hammerite posted:

Ok, I can see where you are coming from. I will grant you that it involves work on the part of the database engine to be able to tackle this problem in general. I still think that a "sufficiently clever" database engine could do it, but perhaps it is asking more than is reasonable.
Agreed - it's definitely doable. I worry about the strange cases; none have sprung to mind, but I haven't given it much thought.

I think the key to the design is that foreign keys have to notify the referring table, and the columns tell it which ones matter.

Thanks for putting up with the walls of text above. I wasn't as descriptive as I could have been initially.
