Jaded Burnout
Jul 10, 2004


Just-In-Timeberlake posted:

Not sure if I entirely understand your explanation but if you have, for the sake of simplicity, 3 tables:

Orders
Order_Items
Shipped_Items

You could INNER JOIN Orders and Order_Items, and LEFT OUTER JOIN the Shipped_Items on the Order_Items table. If the item hasn't shipped, the value for any column in Shipped_Items will be NULL.

It'd be something like this; adjust as necessary for your situation:

code:
SELECT 	*
FROM 	[dbo].[Orders] [o]
		INNER JOIN [dbo].[Order_Items] [oi] ON [o].[OrderID] = [oi].[OrderID]
		LEFT OUTER JOIN [dbo].[Shipped_Items] [si] ON [si].[OrderItemID] = [oi].[OrderItemID]
WHERE	[si].[OrderItemID] IS NULL;

Ah, kinda hook round from the other side, nice.

I wound up with (effectively) this:
code:
        SELECT * FROM a
        INNER JOIN b ON b.c_id = a.c_id
        LEFT OUTER JOIN e ON e.d_id = b.d_id
          AND e.a_id = a.id
        WHERE e.id IS NULL
So pretty much your query, but with an extra constraint in the outer join to tie it back to the original table (since that link is, I think, lost a couple of steps in). Also trimmed out a few joins that could be skipped.

Thanks!

Edit: I also had to slap a DISTINCT in there otherwise I was getting back one copy of `a` for every missing entry.
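
A minimal sketch of that final shape with the DISTINCT added (same placeholder table names as above):
code:
SELECT DISTINCT a.*
FROM a
INNER JOIN b ON b.c_id = a.c_id
LEFT OUTER JOIN e ON e.d_id = b.d_id
  AND e.a_id = a.id
WHERE e.id IS NULL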

Jaded Burnout fucked around with this message at 23:33 on May 9, 2022


PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
SELECT * FROM questions WHERE type='homework'; :clint:

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
I like using an ORM because it'll make transitioning from a local SQLite database to PostgreSQL less painful (most likely), but I'm starting to understand why it's a good idea to at least have an idea of the underlying SQL syntax. It took an animated loving Venn diagram to help me understand what the hell that query meant.

Jaded Burnout
Jul 10, 2004


D34THROW posted:

I like using an ORM because it'll make transitioning from a local SQLite database to PostgreSQL less painful (most likely), but I'm starting to understand why it's a good idea to at least have an idea of the underlying SQL syntax. It took an animated loving Venn diagram to help me understand what the hell that query meant.

Yeah this is one of the few situations where I had to step outside the ORM.

I still only half understand the different join types after all these years.

Macichne Leainig
Jul 26, 2012

by VG
Trying to do something sneaky with a view.

Basically I have an app where you have datasets, each dataset can have N images, and each image can have M annotations.

I want to tally up the images, and the total annotations under each dataset.

Here's what I have so far for images, which works great:

SQL code:
SELECT 
    `datasets`.`ID` AS `ID`,
    `datasets`.`Name` AS `name`,
    `datasets`.`Desc` AS `desc`,
    COALESCE(`ic`.`image_count`, 0) AS `images`
FROM
    `datasets`
        LEFT JOIN
    (SELECT 
        `images`.`ID` AS `ID`,
            `images`.`DatasetID` AS `DatasetID`,
            COUNT(0) AS `image_count`
    FROM
        `images`
    GROUP BY `images`.`DatasetID`) `ic` ON ((`ic`.`DatasetID` = `datasets`.`ID`))
The problem is when I go a step further and try to tally up the annotations at the dataset level. My main problem is that I don't have a dataset ID on the annotation table, and adding one seems unnecessary. Is there a way I can shim it into the query here? I've tried to add a join like:

SQL code:
LEFT JOIN
    (SELECT 
        `annotations`.`ID` AS `ID`,
            `annotations`.`ImageID` AS `ImageID`,
            `ic`.`DatasetID` AS `DatasetID`,
            COUNT(0) AS `anno_count`
    FROM
        `annotations`
    GROUP BY `annotations`.`DatasetID`) `ac` ON ((`ac`.`ImageID` = `ic`.`ID`))
But it barfs on trying to add the column into that select (probably for good reason).

I'm sure I'm missing something silly with joins here, any help is appreciated.

kloa
Feb 14, 2007


Protocol7 posted:

Trying to do something sneaky with a view.

Basically I have an app where you have datasets, each dataset can have N images, and each image can have M annotations.

I want to tally up the images, and the total annotations under each dataset.

Here's what I have so far for images, which works great:

*snip*

But it barfs on trying to add the column into that select (probably for good reason).

I'm sure I'm missing something silly with joins here, any help is appreciated.

Assuming the images are zero-to-many on datasets, and annotations are zero-to-many on each image:

Sample data:
SQL code:
create table datasets (
    id integer,
    name varchar,
    description varchar
);

create table images (
    id integer,
    img blob,
    datasetid integer
);

create table annotations (
    id integer,
    imageid integer,
    value varchar
);


insert into datasets values (1, 'sa', 'something awful');
insert into datasets values (2, 're', 'reddit');
insert into datasets values (3, 'so', 'stack overflow');

insert into images values (1, '', 1);
insert into images values (2, '', 1);
insert into images values (3, '', 2);

insert into annotations values (1, 1, 'blah');
insert into annotations values (2, 1, 'some text');
insert into annotations values (3, 1, 'heh');
insert into annotations values (4, 3, 'wat');
Query:
SQL code:
select
    -- dataset cols
    ds.id
    ,ds.name
    ,ds.description
    
    -- image cols
    ,count(distinct i.id) as image_count
    
    -- annotation cols
    ,count(distinct a.id) as anno_count
from datasets ds
    left join images i
        on i.datasetid = ds.id
    left join annotations a
        on a.imageid = i.id
group by ds.id, ds.name, ds.description;
You get:
SQL code:
1|sa|something awful|2|3
2|re|reddit|1|1
3|so|stack overflow|0|0
Made a quick and dirty lunch break version here: https://www.mycompiler.io/view/CRvrVubMi4l

kloa fucked around with this message at 22:55 on May 12, 2022

Macichne Leainig
Jul 26, 2012

by VG
Oh my goodness I was just grouping wrong :doh:

Thank you very much kind goon!

kloa
Feb 14, 2007


Protocol7 posted:

Oh my goodness I was just grouping wrong :doh:

Thank you very much kind goon!

That's usually most people's issue when they pull data :thumbsup:

(or causing duplication themselves and then blaming the data quality :argh:)

kloa fucked around with this message at 22:58 on May 12, 2022

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003

Jaded Burnout posted:

Yeah this is one of the few situations where I had to step outside the ORM.

I still only half understand the different join types after all these years.

there's actually not that many

A join B: your standard join, workhorse of the sql query, friend to all
A left join B: the other join. you do an inner join but then also keep each row from A where no matching B row was found
A right join B: same as left join but backwards for no reason, nobody does this except to be cute/clever, ignore forever
A full join B: pretty rare but occasionally useful, it's basically both left and right joins at the same time, we're still in clever territory here tho
A cross join B: cartesian product. no join condition

there's also semi joins and anti joins, but sql doesn't have a special syntax for them. you either use a left join where b.id is/is not null, or a where [not] exists(...) clause

a semi join is like A left join B but you don't care about any of the columns of B, you just want to know if a matching row exists
an anti join is the same idea but inverted, you want to know if no matching row exists

if you see the keyword "lateral" in there somewhere, you should consider yourself to be in the presence of a wizard such as myself
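
for example, both spellings of an anti join, plus the semi join (hypothetical tables A and B, with a hypothetical B.a_id pointing at A.id):
SQL code:
-- anti join via left join: keep the A rows with no matching B row
SELECT A.*
FROM A
LEFT JOIN B ON B.a_id = A.id
WHERE B.id IS NULL;

-- anti join via NOT EXISTS: same result
SELECT A.*
FROM A
WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.a_id = A.id);

-- semi join: the A rows where at least one matching B row exists
SELECT A.*
FROM A
WHERE EXISTS (SELECT 1 FROM B WHERE B.a_id = A.id);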

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


DELETE CASCADE posted:

there's actually not that many

A join B: your standard join, workhorse of the sql query, friend to all
A left join B: the other join. you do an inner join but then also keep each row from A where no matching B row was found
A right join B: same as left join but backwards for no reason, nobody does this except to be cute/clever, ignore forever
A full join B: pretty rare but occasionally useful, it's basically both left and right joins at the same time, we're still in clever territory here tho
A cross join B: cartesian product. no join condition

there's also semi joins and anti joins, but sql doesn't have a special syntax for them. you either use a left join where b.id is/is not null, or a where [not] exists(...) clause

a semi join is like A left join B but you don't care about any of the columns of B, you just want to know if a matching row exists
an anti join is the same idea but inverted, you want to know if no matching row exists

if you see the keyword "lateral" in there somewhere, you should consider yourself to be in the presence of a wizard such as myself

Name/Post combo checks out

Kuule hain nussivan
Nov 27, 2008

Quick PostgreSQL question: I have a situation where row insertions and updates need to be passed on to a separate process. My initial plan was to use triggers along with NOTIFY / LISTEN to pass the records over to a message queue via a simple bridge service. However, from the docs I understood that postgres will simply discard all messages if nothing is listening on the channel. This is not good, since the bridge service crashing would lead to missed changes.

Anyone have any suggestions on an alternative method? The number of changes shouldn't be mindblowing, maybe a couple a second during peaks.

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Kuule hain nussivan posted:

Quick PostgreSQL question: I have a situation where row insertions and updates need to be passed on to a separate process. My initial plan was to use triggers along with NOTIFY / LISTEN to pass the records over to a message queue via a simple bridge service. However, from the docs I understood that postgres will simply discard all messages if nothing is listening on the channel. This is not good, since the bridge service crashing would lead to missed changes.

Anyone have any suggestions on an alternative method? The number of changes shouldn't be mindblowing, maybe a couple a second during peaks.

Use triggers to populate an audit table and also NOTIFY any listeners that a new audit record was inserted. The bridge service will LISTEN, grab the audit record and then delete it (or mark it as consumed if you want to keep a history); if the service crashes, on every startup it should check the audit table and consume whatever it missed.

If you don't care about real-time delivery, you can simply skip the LISTEN/NOTIFY mechanics entirely and just have the service poll the audit table.
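
A minimal PostgreSQL sketch of that pattern (all names here are placeholders, not your actual schema):
SQL code:
-- placeholder audit table; add whatever columns you need
CREATE TABLE audit_log (
    id          bigserial   PRIMARY KEY,
    table_name  text        NOT NULL,
    op          text        NOT NULL,
    row_data    jsonb       NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);

CREATE FUNCTION audit_notify() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (table_name, op, row_data)
    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
    -- the payload is just a hint; the listener re-reads audit_log anyway
    PERFORM pg_notify('audit_channel', TG_TABLE_NAME);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_table_audit
AFTER INSERT OR UPDATE ON my_table
FOR EACH ROW EXECUTE FUNCTION audit_notify();
Since the audit row is committed in the same transaction as the change itself, a missed notification only delays pickup; it never loses data.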

Kuule hain nussivan
Nov 27, 2008

Yeah, I figured something like this might be the way to go. Thanks for the confirmation.

I'll give it a ponder and see if I can come up with an alternative.

abelwingnut
Dec 23, 2002


has anyone come across a star schema, snowflake schema, or galaxy schema in a sql server environment? had an interview today and the dude was asking me about modeling a hypothetical local db in these formats, and he was wondering why i had never heard of them or done it. i've always designed stuff along similar lines, but always, always, always erred on the side of normalizing as much as possible.

was kind of surprised he was so adamant. researching them now, it seems like they could make a lot of sense for SELECT and UPDATE statements. obviously the more you normalize the dimension tables, the slower it gets, but they just feel more in the spirit of sql server? hard to explain.

what exactly are the downsides to employing these models in sql server? it violates my inherent sense for normalizing and avoiding redundancy, that's for sure. obviously i see the benefit for a warehouse environment where reporting is key.

the whole idea of dimensionalizing dates and times instead of using a timestamp in a transactional table...:psyduck:

but i also get it for the use case.

abelwingnut fucked around with this message at 06:39 on May 24, 2022

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Never have. I got the impression they were ultimately replaced by Mongo (web scale!), i.e. document stores, or key-value stores once they became viable enough in a distributed cloudy environment. Other than special use cases I'd imagine that most would proclaim YAGNI and stick with the easily understood and well-optimized relational query engines (when their business is primarily relational) or KV stores (when... ever).

On the other hand, I could see the concepts being relevant at the level of data lake/warehousing design where separating 1TB-per-day data benefits from dimensional partitioning and improves general queries and map-reduce operations. (Though I'd contend that's not something most companies need to handle.)

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

OLAP-style schemas also make things simpler for business analysts. Often they're far from SQL wizards (or they even rely entirely on graphical query builders), so asking them to properly handle aggregation and filtering from a 3NF schema can be a bit much.

The star/snowflake/galaxy schema, however, should still be a final, read-only target for data warehousing. If it's your only database, I'm going to assume that there is still a separate operational data store that isn't necessarily a relational database - perhaps you're using Redis or whatever.

King Burgundy
Sep 17, 2003

I am the Burgundy King,
I can do anything!

abelwingnut posted:

has anyone come across a star schema, snowflake schema, or galaxy schema in a sql server environment? had an interview today and the dude was asking me about modeling a hypothetical local db in these formats, and he was wondering why i had never heard of them or done it.

These are standard design patterns for things like data warehousing, especially if there will be a data cube of some sort sitting on top of it (SSAS, Azure Analysis Services, etc.).

Were you interviewing for a BI position? If not, not really sure why the question would be asked.

abelwingnut
Dec 23, 2002


no. just a hybrid sql dev/engineer position: i was told the primary objective was designing ETL pipelines for local imports and exports with some side jobs involving snowflake (the product, not the schema). but turns out he came in hot with all of this stuff.

and tbh, like i said, i do see some benefit here, especially with the snowflake schema. it is sort of how i’ve always designed the schemas i’ve needed—i just didn’t know the name of the concept. but i never went to the level of using date or time tables to normalize.

the star schema seems anathema entirely, though, given how many redundancies of data it can lead to.

that all said, worst interviewer of my life. dude only understood things by these terms. and when i asked for clarification as to ‘what is a star schema?’, just rolled his eyes. completely didn’t care about whatever logic about schema building/modeling i knew. just frustrating.

abelwingnut fucked around with this message at 13:45 on May 24, 2022

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Star schemas, as was pointed out, are good for doing business analytics and benefit more from column-store databases than from the row-oriented approach you would find in an OLTP database use case.

At my company, we siphon off data regularly from our traditional normalized schema that our customers use to run their businesses into a star-schema data warehouse through an ETL process. The T is transforming the normalized data into a star-schema form.

The whole idea is just to reform your data ahead of time in such a way that it is better suited for the types of queries you would run when doing analytics over the whole relevant dataset vs. for high throughput transaction processing.
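
For anyone unfamiliar, a toy sketch of the shape (hypothetical names; a real date dimension carries far more columns than this):
SQL code:
-- one central fact table keyed to deliberately denormalized dimension tables
CREATE TABLE dim_date (
    date_key    int PRIMARY KEY,  -- e.g. 20220524
    full_date   date,
    year        int,
    month       int,
    day_of_week int
);

CREATE TABLE dim_product (
    product_key int PRIMARY KEY,
    name        varchar(100),
    category    varchar(100)      -- redundancy is the point here
);

CREATE TABLE fact_sales (
    date_key    int REFERENCES dim_date (date_key),
    product_key int REFERENCES dim_product (product_key),
    quantity    int,
    amount      decimal(12,2)
);

-- the typical analytics query: aggregate the fact, slice by dimensions
SELECT d.year, p.category, SUM(f.amount) AS revenue
FROM fact_sales f
JOIN dim_date d ON d.date_key = f.date_key
JOIN dim_product p ON p.product_key = f.product_key
GROUP BY d.year, p.category;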

Bruegels Fuckbooks
Sep 14, 2004

Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.

abelwingnut posted:

no. just a hybrid sql dev/engineer position: i was told the primary objective was designing ETL pipelines for local imports and exports with some side jobs involving snowflake (the product, not the schema). but turns out he came in hot with all of this stuff.

and tbh, like i said, i do see some benefit here, especially with the snowflake schema. it is sort of how i’ve always designed the schemas i’ve needed—i just didn’t know the name of the concept. but i never went to the level of using date or time tables to normalize.

the star schema seems anathema entirely, though, given how many redundancies of data it can lead to.

that all said, worst interviewer of my life. dude only understood things by these terms. and when i asked for clarification as to ‘what is a star schema?’, just rolled his eyes. completely didn’t care about whatever logic about schema building/modeling i knew. just frustrating.

He was probably hiring for a data warehousing role. In data warehousing, you take a normalized database, and write an ETL script to transform the data into a cube that uses star schema (e.g. https://www.mssqltips.com/sqlservertip/5690/create-a-star-schema-data-model-in-sql-server-using-the-microsoft-toolset/). SQL Server Analysis services has a special query language called MDX that you use to query these cubes (e.g. https://docs.microsoft.com/en-us/analysis-services/multidimensional-models/mdx/mdx-query-the-basic-query?view=sql-analysis-services-2022). The ETL script will run at arbitrary times (say nightly) - the purpose of doing this is so your analysts can look at patterns in the data without impacting the performance of the production database, and also because it's much easier to aggregate/summarize the patterns in the data quickly.

D34THROW
Jan 29, 2012

RETAIL RETAIL LISTEN TO ME BITCH ABOUT RETAIL
:rant:
Alright, got my DB job table populated with some fake data after a day and a half of writing the functions to do so, now let's do a query.

SELECT * FROM job WHERE branch_id == 4 returns nothing! Why does this return nothing!

Half an hour later, I realize that the functions to fill the DB with fake data weren't populating job.branch_id :doh:

raminasi
Jan 25, 2005

a last drink with no ice

abelwingnut posted:

no. just a hybrid sql dev/engineer position: i was told the primary objective was designing ETL pipelines for local imports and exports with some side jobs involving snowflake (the product, not the schema). but turns out he came in hot with all of this stuff.

If I were told about a role that primarily built ETL pipelines and also worked with Snowflake (the product), I would 100% assume it was for a data warehousing engineering position, for which the patterns you've named are the bread and butter. The dude's an rear end for not stopping and making sure you two were on the same page about the thing you were interviewing for (and discussing the concepts specifically in the context of a local SQL Server data model is kind of weird) but his assumptions themselves don't sound unreasonable.

joebuddah
Jan 30, 2005
I feel like this should be straightforward, but I seem to be missing something.

My desired result is to get the list of all products and the max value sold for each.

Whenever I add a date filter to the WHERE clause, WidgetC goes away.

What am I missing?

code:
SELECT
    products.productName,
    max(saleInfo.saleTotal)
FROM products

left join sales
    on sales.productUUID = products.productUUID

left join saleInfo
    on saleInfo.saleUUID = sales.saleUUID

group by
    products.productName
Desired result:
code:
ID | Name    | maxSale
----------------------
1  | WidgetA | 3
2  | WidgetB | 4
3  | WidgetC | 0
Sample data:
code:
Products
ID | Name
---------
1  | WidgetA
2  | WidgetB
3  | WidgetC

SaleInfo
ID | saleTotal | saleID
-----------------------
1  | 4         | 1
2  | 2         | 2

Sales
SaleID | saleDate   | ProductID
-------------------------------
1      | 04/25/2022 | 1
2      | 04/24/2022 | 2

Update: I got it figured out.

code:
select
    a.productName,
    a.productUUID,
    max(a.saleTotal)
from (
    SELECT
        products.productName,
        sales.productUUID,
        saleInfo.saleTotal
    FROM products
    left join sales
        on sales.productUUID = products.productUUID
        and sales.timestamp >= dateadd(day,-4,getdate())
    left join saleInfo
        on sales.saleUUID = saleInfo.saleUUID
) a
group by
    a.productName, a.productUUID

joebuddah fucked around with this message at 05:35 on May 27, 2022

Kuule hain nussivan
Nov 27, 2008

joebuddah posted:

I feel like this should be straightforward, but I seem to be missing something.

You can make the query more readable (in my opinion) by adding the date filtering to the sales join of your original query. Like so...

code:
SELECT
    products.productName,
    COALESCE(max(saleInfo.saleTotal), 0) as maxSale
FROM products

left join sales
    on sales.productUUID = products.productUUID
    and sales.timestamp >= dateadd(day,-4,getdate())

left join saleInfo
    on saleInfo.saleUUID = sales.saleUUID

group by
    products.productName
I also added a COALESCE to turn any NULLs returned for products with no sales into zeroes. Give it a whirl.

joebuddah
Jan 30, 2005
Thanks, you're correct, the way you did it makes it more readable. The COALESCE makes it better.

Thanks

Tiny Bug Child
Sep 11, 2004

Avoid Symmetry, Allow Complexity, Introduce Terror
I'm puzzled at the difference between a query with a left join and what I thought was an equivalent query using a not exists subquery. Here's the first query. (Apologies for the pastebin link, Cloudflare was blocking me if I included the SQL in the post.) This query gives me the row I expect to see; what I'm trying to do is get a row from the table cam_grab, unless there is a row in cam_tag with a matching cam_uuid and the tag "noresize". In this case there is no such row, so I get the cam_grab row in the result.

However, this second query gives me no results. Isn't that the same thing? And the other thing is that if I change AND NOT EXISTS... to AND EXISTS... it still doesn't work. One of those two cases has to work, right? If I take the "NOT EXISTS(...)" out of the WHERE and put it into the SELECT, it's 1; I am 100% sure there is no row in cam_tag with that uuid and tag='noresize'. Is there something extremely dumb and obvious I'm missing here?

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
(scratch all that ...) It's a joke right? Your query planner is munging up things. What are the actual rows? And can you reproduce it with just those rows?

(Which is to say I kind of see it, and they aren't the same queries, but now I've gotten myself confused and I'm having a hard time coming up with an explanation.)

PhantomOfTheCopier fucked around with this message at 02:12 on May 29, 2022

Tiny Bug Child
Sep 11, 2004

Avoid Symmetry, Allow Complexity, Introduce Terror

PhantomOfTheCopier posted:

I don't believe that "there's no such row". You should actually look at the tags for that id.

Believe, friend:

code:
MariaDB> SELECT * FROM cam_tag WHERE cam_uuid=(SELECT cam_uuid FROM cam_grab WHERE cam_grab.uuid='6c47e303-38da-43a7-a53b-7412a133a8ae');
Empty set (0.000 sec)
I also ran this in a separate environment where I ran "TRUNCATE TABLE cam_tag" so that it's entirely empty in order to make absolutely sure, and I get the same results. The actual cam_grab uuid seems to be irrelevant; I get the same results with any row I pick out of that table.

Now, this is what really confuses me. How can neither of those first two queries work? And why won't the first one work when the NOT EXISTS()... evaluates to 1 when I put it in the select in the third query???

e: to pre-empt the obvious answer, I triple-checked that the row does actually exist--"SELECT * FROM cam_grab WHERE cam_grab.uuid='6c47e303-38da-43a7-a53b-7412a133a8ae'" shows it

Tiny Bug Child fucked around with this message at 02:30 on May 29, 2022

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
I don't understand why you need a subquery to find a row in tag with a given id. But whatever.

In any case everything works fine here with sample data and duplicate rows, null values, etcetera.

Fire up EXPLAIN. And what's your database+version. And get out of your current tables and do all your testing against a temp table with just these rows to see at what point things are falling apart. It just sounds like your database can't properly handle NOT EXISTS (which sheesh that was 10yr ago in PostgreSQL).


ps. You're crossing the streams. Just write SELECT from tag where id="your value" and tag="property".

pps. The way to figure out what a database is doing is to start removing things from your queries. Don't start stuffing more copies of misbehaving queries inside other queries. :) For example, just do select from grab where exists (...), then the not exists case. There shouldn't be overlap, as you rightly note, but it sure sounds like there is in this case. It's either the data or the database.


vvv Right but the crystal ball isn't uncovering it so you have to divide further. Create tables with only the single id and a non-matching tag and see what happens.

PhantomOfTheCopier fucked around with this message at 02:47 on May 29, 2022

Tiny Bug Child
Sep 11, 2004

Avoid Symmetry, Allow Complexity, Introduce Terror
This is a simplification of a way bigger query; in the actual query I don't have a given ID, I have a bunch of other conditions that all seem to be working properly. It's the smallest possible query I could get to that still shows the problem--just selecting one row that definitely exists, plus the NOT EXISTS condition. The version is MariaDB 10.7.3, so it's not that old.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
I suppose I'm saying I agree with you in that "They should work", but they clearly don't, so start small(er) and then add real rows of data to figure out when it falls apart (assuming it even works with sample data):
code:
CREATE TEMP TABLE cobj (id INTEGER);
CREATE TEMP TABLE ctag (id INTEGER, tag TEXT);

INSERT INTO cobj VALUES 
  (1)
 ,(2)
 ,(3)
 ,(4)
 ,(4)
 ,(5)
 ,(5)
 ,(6)
 ,(6)
 ,(null)
;
INSERT INTO ctag VALUES 
  (1,'pass')
 ,(2,'fail')
 ,(3,null)
 ,(4,'pass')
 ,(4,'fail')
 ,(5,'pass')
 ,(5,null)
 ,(6,'fail')
 ,(6,null)
 ,(null,'pass')
 ,(null,'fail')
 ,(null,null)
;


SELECT cobj.*,ctag.* FROM cobj LEFT JOIN ctag ON ctag.id=cobj.id AND ctag.tag='pass';
 id | id | tag
----+----+------
  1 |  1 | pass
  2 |    |
  3 |    |
  4 |  4 | pass
  4 |  4 | pass
  5 |  5 | pass
  5 |  5 | pass
  6 |    |
  6 |    |
    |    |
(10 rows)


SELECT cobj.* FROM cobj WHERE EXISTS (SELECT * FROM ctag WHERE ctag.id=cobj.id AND ctag.tag='pass');
 id
----
  1
  4
  4
  5
  5
(5 rows)


SELECT cobj.* FROM cobj WHERE NOT EXISTS (SELECT * FROM ctag WHERE ctag.id=cobj.id AND ctag.tag='pass');
 id
----
  2
  3
  6
  6

(5 rows)

Tiny Bug Child
Sep 11, 2004

Avoid Symmetry, Allow Complexity, Introduce Terror
A breakthrough, of sorts: I was trying to recreate the issue on a dbfiddle site, but had to change the tables because the latest version I could find was MariaDB 10.6, which doesn't support the uuid type. So I changed the uuid columns to varchars... and it works exactly as expected. And if I create the tables with varchar columns instead of uuids in my environment it works too, with no changes to the queries. Mystery partially solved, I guess--I still don't really understand why the uuid version doesn't work.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Very nice! Good job. (If it isn't behaving as expected, don't assume anything else will behave and start treating it like a black box, checking everything from the ground up.)

I used PostgreSQL back when NOT EXISTS wasn't working and figuring out why it was misbehaving was a similar process of digging apart everything.

So it's either the db version or the type. Very interesting.

ps Maybe explicitly cast the uuid back to a string, append a character or something, and see if that works. It'll be slow but at least tell you something. And is the id field in the tag table also a uuid? Perhaps it's matching in one case but not the other without an explicit cast. As a new feature in Maria, it might be broken.

(It's probably doing integer matching at some point of the query, and it'll cast '55ab-...' to 55 silently.)

PhantomOfTheCopier fucked around with this message at 04:03 on May 29, 2022

Tiny Bug Child
Sep 11, 2004

Avoid Symmetry, Allow Complexity, Introduce Terror
It really seems like something screwy is going on with the uuid type. All the uuid columns are the uuid type, and as written the query doesn't work, but if I change that where in the subquery to this...

WHERE CAST(`cam_tag`.`cam_uuid` AS BINARY) = CAST(`cam_grab`.`cam_uuid` AS BINARY)

it works exactly as expected. And CAST(... AS UUID) works too!
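
For the archives, a reconstructed sketch of the working shape (the original query lives in the pastebin; the table and column names here are taken from the posts above):
SQL code:
-- not the original pastebin query, just the pattern that ended up working
SELECT `cam_grab`.*
FROM `cam_grab`
WHERE `cam_grab`.`uuid` = '6c47e303-38da-43a7-a53b-7412a133a8ae'
  AND NOT EXISTS (
        SELECT 1
        FROM `cam_tag`
        WHERE CAST(`cam_tag`.`cam_uuid` AS UUID) = CAST(`cam_grab`.`cam_uuid` AS UUID)
          AND `cam_tag`.`tag` = 'noresize'
  );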

Dawncloack
Nov 26, 2007
ECKS DEE!
Nap Ghost
Once, while learning mysql, I came across a command-line command that would give you the original instruction used to create a specific table, just as you had it.

I am trying to find it, but my search terms are too generic and I am getting plenty of useless results, tutorials and stuff. Does anyone know what instruction I'm talking about?

Thanks in advance.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
https://dev.mysql.com/doc/refman/8.0/en/show-create-table.html ??
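
That is, with a placeholder table name:
SQL code:
-- prints the full CREATE TABLE statement for an existing table
SHOW CREATE TABLE my_table;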

Dawncloack
Nov 26, 2007
ECKS DEE!
Nap Ghost
Thanks! That is it.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
The advice I have frequently seen for audit logging in postgres and similar RDBMS is - “don’t”, use another store that can work async and won’t compete with OLTP for resources/IO.

Is that broadly good advice? If so, would that be a good use-case for something like mongo, or another similar document store? Append-only history where nothing is ever deleted seems like a good impedance match to that kind of “this is basically a fancy fwrite” category of store…

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
For what it's worth I've never heard that, but it depends on the definition and intended purpose of "audit log". If it means "a collection of rows affected by inserts, updates, or deletes in a source table in adherence with the transaction isolation expectations" then there's typically no choice but to put it in the database, since waiting for external call confirmation is slower.

For example, a database of monetary account balances may have an audit log and the information is sufficient to roll back or forward to any point in time. That's TRIGGER territory.

If it's still transaction safe but not necessarily "live data", in PostgreSQL there's still streaming and log replication that can offload logging to a separate system. Arguably even a WAL receiver has a justifiable cost beyond just the log, since the full copy of the data has other benefits.

If audit log means "some vague comments about data changes in a source table", then of course it can be offloaded and stored more cheaply. Usually the use case is a "handful of fields someone without database access once requested but which now supports notable business workflows and probably compliance, until that day when an emergency happens and the audit is insufficient/out of order/only eventually consistent/found to be incomplete and we realize we should have used triggers". :11tea:


Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
What are the implications of using a composite primary key versus creating an autoincrementing column for a PK and then creating an index on the columns? Why would one use one method over the other?

For example:

If I have data [colA, colB, colC, colD...] that is unique for [colA,colB] what are the implications of creating a table with a composite primary key of [colA, colB] versus creating an autoincrementing col0 and then creating a secondary index on [colA, colB]?
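
For concreteness, a sketch of the two layouts (placeholder types; autoincrement syntax varies by engine):
SQL code:
-- option 1: natural composite primary key
CREATE TABLE t_composite (
    colA int NOT NULL,
    colB int NOT NULL,
    colC int,
    colD int,
    PRIMARY KEY (colA, colB)
);

-- option 2: surrogate key, with the natural key enforced separately
CREATE TABLE t_surrogate (
    col0 int GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    colA int NOT NULL,
    colB int NOT NULL,
    colC int,
    colD int,
    UNIQUE (colA, colB)
);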
