|
Just-In-Timeberlake posted:Not sure if I entirely understand your explanation but if you have, for the sake of simplicity, 3 tables: Ah, kinda hook round from the other side, nice. I wound up with (effectively) this: code:
Thanks! Edit: I also had to slap a DISTINCT in there otherwise I was getting back one copy of `a` for every missing entry. Jaded Burnout fucked around with this message at 23:33 on May 9, 2022 |
# ? May 9, 2022 23:21 |
|
|
|
SELECT * FROM questions WHERE type='homework';
|
# ? May 12, 2022 02:31 |
|
I like using an ORM because it'll make transitioning from a local SQLite database to PostgreSQL less painful (most likely), but I'm starting to understand why it's a good idea to at least have an idea of the underlying SQL syntax. It took an animated loving Venn diagram to help me understand what the hell that query meant.
|
# ? May 12, 2022 13:57 |
|
D34THROW posted:I like using an ORM because it'll make transitioning from a local SQLite database to PostgreSQL less painful (most likely), but I'm starting to understand why it's a good idea to at least have an idea of the underlying SQL syntax. It took an animated loving Venn diagram to help me understand what the hell that query meant. Yeah this is one of the few situations where I had to step outside the ORM. I still only half understand the different join types after all these years.
|
# ? May 12, 2022 16:09 |
|
Trying to do something sneaky with a view. Basically I have an app where you have datasets, each dataset can have N images, and each image can have M annotations. I want to tally up the images, and the total annotations under each dataset. Here's what I have so far for images, which works great: SQL code:
SQL code:
I'm sure I'm missing something silly with joins here, any help is appreciated.
|
# ? May 12, 2022 19:15 |
|
Protocol7 posted:Trying to do something sneaky with a view. Assuming the images are zero-to-many on datasets, and annotations are zero-to-many on each image: Sample data: SQL code:
SQL code:
SQL code:
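The code from this post wasn't preserved, but the pattern it describes can be sketched like this (table and column names are invented; run through SQLite via Python just for illustration):

```python
import sqlite3

# Hypothetical schema: dataset -> image (N) -> annotation (M).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE image (id INTEGER PRIMARY KEY, dataset_id INTEGER);
CREATE TABLE annotation (id INTEGER PRIMARY KEY, image_id INTEGER);
INSERT INTO dataset VALUES (1, 'cats'), (2, 'dogs');
INSERT INTO image VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO annotation VALUES (1, 1), (2, 1), (3, 2);
""")

# Joining image and annotation fans out the rows, so a plain COUNT(*)
# would overcount images; COUNT(DISTINCT ...) collapses the duplicates.
rows = con.execute("""
SELECT d.name,
       COUNT(DISTINCT i.id) AS images,
       COUNT(DISTINCT a.id) AS annotations
FROM dataset d
LEFT JOIN image i      ON i.dataset_id = d.id
LEFT JOIN annotation a ON a.image_id = i.id
GROUP BY d.id, d.name
ORDER BY d.id
""").fetchall()
print(rows)  # [('cats', 2, 3), ('dogs', 1, 0)]
```

The LEFT JOINs keep datasets with zero images (or zero annotations) in the tally, which is usually what you want in a dashboard view.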
kloa fucked around with this message at 22:55 on May 12, 2022 |
# ? May 12, 2022 22:38 |
|
Oh my goodness, I was just grouping wrong. Thank you very much, kind goon!
|
# ? May 12, 2022 22:44 |
|
Protocol7 posted:Oh my goodness I was just grouping wrong That's usually most people's issue when they pull data (or causing duplication themselves and then blaming the data quality ) kloa fucked around with this message at 22:58 on May 12, 2022 |
# ? May 12, 2022 22:56 |
|
Jaded Burnout posted:Yeah this is one of the few situations where I had to step outside the ORM.

there's actually not that many

A join B: your standard join, workhorse of the sql query, friend to all
A left join B: the other join. you do an inner join but then also keep each row from A where no matching B row was found
A right join B: same as left join but backwards for no reason, nobody does this except to be cute/clever, ignore forever
A full join B: pretty rare but occasionally useful, it's basically both left and right joins at the same time, we're still in clever territory here tho
A cross join B: cartesian product. no join condition

there's also semi joins and anti joins, but sql doesn't have a special syntax for them. you either use a left join where b.id is/is not null, or a where [not] exists(...) clause. a semi join is like A left join B but you don't care about any of the columns of B, you just want to know if a matching row exists. an anti join is the same idea but inverted, you want to know if no matching row exists.

if you see the keyword "lateral" in there somewhere, you should consider yourself to be in the presence of a wizard such as myself
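That taxonomy is easy to poke at directly. A quick SQLite sketch (table names invented) of the inner and left joins plus the semi/anti patterns written with EXISTS:

```python
import sqlite3

# Toy tables: a(1,2,3); b has matches for a.id 1 and 2 but not 3.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY);
CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER);
INSERT INTO a VALUES (1), (2), (3);
INSERT INTO b VALUES (10, 1), (11, 1), (12, 2);
""")

inner = con.execute(
    "SELECT a.id, b.id FROM a JOIN b ON b.a_id = a.id "
    "ORDER BY a.id, b.id").fetchall()
left = con.execute(
    "SELECT a.id, b.id FROM a LEFT JOIN b ON b.a_id = a.id "
    "ORDER BY a.id, b.id").fetchall()
# semi join: rows of a that have at least one match in b
semi = con.execute(
    "SELECT id FROM a WHERE EXISTS "
    "(SELECT 1 FROM b WHERE b.a_id = a.id) ORDER BY id").fetchall()
# anti join: rows of a with no match in b
anti = con.execute(
    "SELECT id FROM a WHERE NOT EXISTS "
    "(SELECT 1 FROM b WHERE b.a_id = a.id) ORDER BY id").fetchall()

print(inner)  # [(1, 10), (1, 11), (2, 12)]
print(left)   # same, plus (3, None)
print(semi)   # [(1,), (2,)]
print(anti)   # [(3,)]
```

Note how the inner join both drops a.id=3 and duplicates a.id=1 — the two behaviors the LEFT JOIN and the semi join each fix in different ways.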
|
# ? May 14, 2022 05:40 |
|
DELETE CASCADE posted:there's actually not that many Name/Post combo checks out
|
# ? May 14, 2022 17:11 |
|
Quick PostgreSQL question: I have a situation where row insertions and updates need to be passed on to a separate process. My initial plan was to use triggers along with NOTIFY / LISTEN to pass the records over to a message queue via a simple bridge service. However, from the docs I understood that postgres will simply discard all messages if nothing is listening on the channel. This is not good, since the bridge service crashing would lead to missed changes. Anyone have any suggestions on an alternative method? The number of changes shouldn't be mindblowing, maybe a couple a second during peaks.
|
# ? May 16, 2022 13:52 |
|
Kuule hain nussivan posted:Quick PostgreSQL question: I have a situation where row insertions and updates need to be passed on to a separate process. My initial plan was to use triggers along with NOTIFY / LISTEN to pass the records overt to a message queue via a simple bridge service. However, from the docs I understood that postgres will simply discard all messages if nothing is listening on the channel. This is not good, since the bridge service crashing would lead to missed changes. Use triggers to populate an audit table and also NOTIFY any listeners that a new audit record was inserted. The bridge service will LISTEN, grab the audit record and then delete it (or mark it as consumed if you want to keep a history); if the service crashes, on every startup it should check the audit table and consume whatever it missed. If you don't care about real-time delivery, you can simply skip the LISTEN/NOTIFY mechanics entirely and just have the service poll the audit table.
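The "drain the audit table on startup" half of that pattern can be sketched like so — simulated here with SQLite and invented names, since in real Postgres you'd pair it with triggers that INSERT into the audit table and NOTIFY:

```python
import sqlite3

# Stand-in audit table; in Postgres a trigger on the source table
# would populate this and fire NOTIFY for any live listeners.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY, payload TEXT, consumed INTEGER DEFAULT 0)""")
con.executemany("INSERT INTO audit_log (payload) VALUES (?)",
                [("row 1 inserted",), ("row 2 updated",)])

def drain_audit(conn):
    """Consume every unconsumed audit record, oldest first.
    Run this once at startup and again on every notification."""
    recs = conn.execute(
        "SELECT id, payload FROM audit_log WHERE consumed = 0 ORDER BY id"
    ).fetchall()
    for rec_id, payload in recs:
        # forward payload to the message queue here, then mark it done
        conn.execute("UPDATE audit_log SET consumed = 1 WHERE id = ?",
                     (rec_id,))
    conn.commit()
    return [p for _, p in recs]

missed = drain_audit(con)  # picks up whatever happened while we were down
print(missed)              # ['row 1 inserted', 'row 2 updated']
print(drain_audit(con))    # [] -- nothing left once consumed
```

Because nothing is deleted until the consumer marks it done, a crash between NOTIFY and delivery just means the record sits in the table until the next startup drain.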
|
# ? May 16, 2022 14:08 |
|
Yeah, I figured something like this might be the way to go. Thanks for the confirmation. I'll give it a ponder and see if I can come up with an alternative.
|
# ? May 17, 2022 06:11 |
|
has anyone come across a star schema, snowflake schema, or galaxy schema in a sql server environment? had an interview today and the dude was asking me about modeling a hypothetical local db in these formats, and he was wondering why i had never heard of them or done it. i've always designed stuff in similar paths, but always, always, always sided to normalize as much as possible. was kind of surprised he was so adamant. researching them now, it seems like they could make a lot of sense for SELECT and UPDATE statements. obviously the more you normalize the dimension tables, the slower, but they just feel more in the spirit of sql server? hard to explain. what exactly are the downsides to employing these models in sql server? it violates my inherent sense for normalizing and avoiding redundancy, that's for sure. obviously i see the benefit for a warehouse environment where reporting is key. the whole idea of dimensionalizing dates and times instead of using a timestamp in a transactional table... but i also get it for the use case. abelwingnut fucked around with this message at 06:39 on May 24, 2022 |
# ? May 24, 2022 06:36 |
|
Never have. I got the impression they were ultimately replaced by Mongo (web scale!), ie document stores, or key value stores once they became viable enough in a distributed cloudy environment. Other than special use cases I'd imagine that most would proclaim YAGNI and stick with the easily understood and well optimized relational query engines (when their business is primarily relational) or KV stores (when... ever). On the other hand, I could see the concepts being relevant at the level of data lake/warehousing design where separating 1TB-per-day data benefits from dimensional partitioning and improves general queries and map-reduce operations. (Though I'd contend that's not something most companies need to handle.)
|
# ? May 24, 2022 07:16 |
|
OLAP-style schemas also make things simpler for business analysts. Often they're far from SQL wizards (or they even rely entirely on graphical query builders) so asking them to properly handle aggregation and filtering from a 3NF schema can be a bit much. The star/snowflake/galaxy schema however should still be a final, read-only target for data warehousing. If it's your only database, I'm going to assume that there is still a separate operational data store that isn't necessarily a relational database - perhaps you're using Redis or whatever.
|
# ? May 24, 2022 09:07 |
|
abelwingnut posted:has anyone come across a star schema, snowflake schema, or galaxy schema in a sql server environment? had an interview today and the dude was asking me about modeling a hypothetical local db in these formats, and he was wondering why i had never heard of them or done it. These are standard design patterns for things like data warehousing, especially if there will be a data cube of some sort sitting on top of it (SSAS or Azure Analysis Services/etc). Were you interviewing for a BI position? If not, not really sure why the question would be asked.
|
# ? May 24, 2022 09:07 |
|
no. just a hybrid sql dev/engineer position: i was told the primary objective was designing ETL pipelines for local imports and exports with some side jobs involving snowflake (the product, not the schema). but turns out he came in hot with all of this stuff. and tbh, like i said, i do see some benefit here, especially with the snowflake schema. it is sort of how i’ve always designed the schemas i’ve needed—i just didn’t know the name of the concept. but i never went to the level of using date or time tables to normalize. the star schema seems anathema entirely, though, given how many redundancies of data it can lead to. that all said, worst interviewer of my life. dude only understood things by these terms. and when i asked for clarification as to ‘what is a star schema?’, just rolled his eyes. completely didn’t care about whatever logic about schema building/modeling i knew. just frustrating. abelwingnut fucked around with this message at 13:45 on May 24, 2022 |
# ? May 24, 2022 13:43 |
|
Star Schemas as was pointed out are good for doing business analytics and benefit more from column-store databases rather than a row-oriented approach that you would find in an OLTP database use case. At my company, we siphon off data regularly from our traditional normalized schema that our customers use to run their businesses into a star-schema data warehouse through an ETL process. The T is transforming the normalized data into a star-schema form. The whole idea is just to reform your data ahead of time in such a way that it is better suited for the types of queries you would run when doing analytics over the whole relevant dataset vs. for high throughput transaction processing.
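A toy version of that T step — flattening a normalized sales model into a star-schema fact table — might look like this (schema and names are invented for illustration, run through SQLite):

```python
import sqlite3

# Normalized source: customers plus an orders table referencing them.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT, amount REAL);
INSERT INTO customers VALUES (1, 'east'), (2, 'west');
INSERT INTO orders VALUES (1, 1, '2022-05-01', 10.0),
                          (2, 1, '2022-05-01', 5.0),
                          (3, 2, '2022-05-02', 7.5);

-- warehouse side: a date dimension and one wide, denormalized fact table
CREATE TABLE dim_date (date_key TEXT PRIMARY KEY);
CREATE TABLE fact_sales (date_key TEXT, region TEXT, total REAL);
INSERT INTO dim_date SELECT DISTINCT order_date FROM orders;
INSERT INTO fact_sales
SELECT o.order_date, c.region, SUM(o.amount)
FROM orders o JOIN customers c ON c.id = o.customer_id
GROUP BY o.order_date, c.region;
""")

fact = con.execute(
    "SELECT * FROM fact_sales ORDER BY date_key").fetchall()
print(fact)  # [('2022-05-01', 'east', 15.0), ('2022-05-02', 'west', 7.5)]
```

The redundancy (region repeated per fact row) is deliberate: analysts can slice by date and region without touching the join-heavy operational schema.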
|
# ? May 24, 2022 14:31 |
|
abelwingnut posted:no. just a hybrid sql dev/engineer position: i was told the primary objective was designing ETL pipelines for local imports and exports with some side jobs involving snowflake (the product, not the schema). but turns out he came in hot with all of this stuff. He was probably hiring for a data warehousing role. In data warehousing, you take a normalized database, and write an ETL script to transform the data into a cube that uses star schema (e.g. https://www.mssqltips.com/sqlservertip/5690/create-a-star-schema-data-model-in-sql-server-using-the-microsoft-toolset/). SQL Server Analysis services has a special query language called MDX that you use to query these cubes (e.g. https://docs.microsoft.com/en-us/analysis-services/multidimensional-models/mdx/mdx-query-the-basic-query?view=sql-analysis-services-2022). The ETL script will run at arbitrary times (say nightly) - the purpose of doing this is so your analysts can look at patterns in the data without impacting the performance of the production database, and also because it's much easier to aggregate/summarize the patterns in the data quickly.
|
# ? May 24, 2022 20:24 |
|
Alright, got my DB job table populated with some fake data after a day and a half of writing the functions to do so, now let's do a query. SELECT * FROM job WHERE branch_id == 4 returns nothing! Why does this return nothing! Half an hour later, I realize that the functions to fill the DB with fake data weren't populating job.branch_id
|
# ? May 26, 2022 21:08 |
|
abelwingnut posted:no. just a hybrid sql dev/engineer position: i was told the primary objective was designing ETL pipelines for local imports and exports with some side jobs involving snowflake (the product, not the schema). but turns out he came in hot with all of this stuff. If I were told about a role that primarily built ETL pipelines and also worked with Snowflake (the product), I would 100% assume it was for a data warehousing engineering position, for which the patterns you've named are the bread and butter. The dude's an rear end for not stopping and making sure you two were on the same page about the thing you were interviewing for (and discussing the concepts specifically in the context of a local SQL Server data model is kind of weird) but his assumptions themselves don't sound unreasonable.
|
# ? May 26, 2022 21:25 |
|
I feel like this should be straightforward, but I seem to be missing something. My desired result is to get the list of all products and the max value sold. Whenever I add a date filter to the WHERE, WidgetC goes away. What am I missing? code:
code:
code:
joebuddah fucked around with this message at 05:35 on May 27, 2022 |
# ? May 26, 2022 23:22 |
|
joebuddah posted:I feel like this should be straight forward, but I seem to be missing something. You can make the query more readable (in my opinion) by adding the date filtering to the sales join of your original query. Like so... code:
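The joebuddah queries weren't preserved, but the underlying gotcha is easy to reproduce (names invented, via SQLite): filtering an outer-joined column in WHERE silently turns the LEFT JOIN back into an inner join, while moving the predicate into the ON clause keeps the unmatched products.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales (product_id INTEGER, sold_on TEXT, value REAL);
INSERT INTO products VALUES (1, 'WidgetA'), (2, 'WidgetC');
INSERT INTO sales VALUES (1, '2022-05-01', 9.0);  -- WidgetC never sold
""")

in_where = con.execute("""
SELECT p.name, MAX(s.value) FROM products p
LEFT JOIN sales s ON s.product_id = p.id
WHERE s.sold_on >= '2022-05-01'        -- NULL fails this test: row gone
GROUP BY p.name ORDER BY p.name""").fetchall()

in_on = con.execute("""
SELECT p.name, COALESCE(MAX(s.value), 0) FROM products p
LEFT JOIN sales s ON s.product_id = p.id AND s.sold_on >= '2022-05-01'
GROUP BY p.name ORDER BY p.name""").fetchall()

print(in_where)  # [('WidgetA', 9.0)] -- WidgetC dropped
print(in_on)     # [('WidgetA', 9.0), ('WidgetC', 0)]
```

The COALESCE mentioned in the next post is what turns the NULL max for never-sold products into a presentable 0.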
|
# ? May 27, 2022 06:13 |
|
Thanks, you're correct, the way you did it makes it more readable. The COALESCE makes it better. Thanks
|
# ? May 28, 2022 03:13 |
|
I'm puzzled at the difference between a query with a left join and what I thought was an equivalent query using a not exists subquery. Here's the first query. (Apologies for the pastebin link, Cloudflare was blocking me if I included the SQL in the post.) This query gives me the row I expect to see; what I'm trying to do is get a row from the table cam_grab, unless there is a row in cam_tag with a matching cam_uuid and the tag "noresize". In this case there is no such row, so I get the cam_grab row in the result. However, this second query gives me no results. Isn't that the same thing? And the other thing is that if I change AND NOT EXISTS... to AND EXISTS... it still doesn't work. One of those two cases has to work, right? If I take the "NOT EXISTS(...)" out of the WHERE and put it into the SELECT, it's 1; I am 100% sure there is no row in cam_tag with that uuid and tag='noresize'. Is there something extremely dumb and obvious I'm missing here?
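For reference, the two formulations should indeed be equivalent — a quick check against SQLite with stand-in tables (TEXT keys, invented data) shows both anti-join spellings agreeing:

```python
import sqlite3

# Stand-ins for cam_grab / cam_tag with plain TEXT keys.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE cam_grab (cam_uuid TEXT);
CREATE TABLE cam_tag (cam_uuid TEXT, tag TEXT);
INSERT INTO cam_grab VALUES ('u1'), ('u2');
INSERT INTO cam_tag VALUES ('u2', 'noresize');
""")

# anti join via LEFT JOIN ... IS NULL
left_join = con.execute("""
SELECT g.cam_uuid FROM cam_grab g
LEFT JOIN cam_tag t ON t.cam_uuid = g.cam_uuid AND t.tag = 'noresize'
WHERE t.cam_uuid IS NULL""").fetchall()

# anti join via NOT EXISTS
not_exists = con.execute("""
SELECT g.cam_uuid FROM cam_grab g
WHERE NOT EXISTS (SELECT 1 FROM cam_tag t
                  WHERE t.cam_uuid = g.cam_uuid
                    AND t.tag = 'noresize')""").fetchall()

print(left_join)                # [('u1',)]
print(left_join == not_exists)  # True
```

So when the two disagree on a real database, suspect the data, the types, or the engine rather than the query shape.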
|
# ? May 28, 2022 22:45 |
|
(scratch all that ...) It's a joke right? Your query planner is munging up things. What are the actual rows? And can you reproduce it with just those rows? (Which is to say I kind of see it, and they aren't the same queries, but now I've gotten myself confused and I'm having a hard time coming up with an explanation.) PhantomOfTheCopier fucked around with this message at 02:12 on May 29, 2022 |
# ? May 29, 2022 01:38 |
|
PhantomOfTheCopier posted:I don't believe that "there's no such row". You should actually look at the tags for that id. Believe, friend: code:
Now, this is what really confuses me. How can neither of those first two queries work? And why won't the first one work when the NOT EXISTS()... evaluates to 1 when I put it in the select in the third query??? e: to pre-empt the obvious answer, I triple-checked that the row does actually exist--"SELECT * FROM cam_grab WHERE cam_grab.uuid='6c47e303-38da-43a7-a53b-7412a133a8ae'" shows it Tiny Bug Child fucked around with this message at 02:30 on May 29, 2022 |
# ? May 29, 2022 02:22 |
|
I don't understand why you need a sub query to find a row in tag with a given id. But whatever. In any case everything works fine here with sample data and duplicate rows, null values, etcetera. Fire up EXPLAIN. And what's your database+version. And get out of your current tables and do all your testing against a temp table with just these rows to see at what point things are falling apart. It just sounds like your database can't properly handle NOT EXISTS (which sheesh that was 10yr ago in PostgreSQL).

ps. You're crossing the streams. Just write SELECT from tag where id="your value" and tag="property".

pps. The way to figure out what a database is doing is to start removing things from your queries. Don't start stuffing more copies of not behaving queries inside other queries. For example, just do select from grab where exists (...), then the not exists case. There shouldn't be overlap, as you rightly note, but it sure sounds like there is in this case. It's either the data or the database.

vvv Right but the crystal ball isn't uncovering it so you have to divide further. Create tables with only the single id and a not matching tag and see what happens. PhantomOfTheCopier fucked around with this message at 02:47 on May 29, 2022 |
# ? May 29, 2022 02:35 |
|
This is a simplification of a way bigger query; in the actual query I don't have a given ID, I have a bunch of other conditions that all seem to be working properly. It's the smallest possible query I could get to that still shows the problem--just selecting one row that definitely exists, plus the NOT EXISTS condition. The version is MariaDB 10.7.3, so it's not that old.
|
# ? May 29, 2022 02:42 |
|
I suppose I'm saying I agree with you in that "They should work", but they clearly don't so start small(er) then add real rows of data to figure out when it falls apart (assuming it even works with sample data):code:
|
# ? May 29, 2022 03:01 |
|
A breakthrough, of sorts: I was trying to recreate the issue on a dbfiddle site, but had to change the tables because the latest version I could find was MariaDB 10.6, which doesn't support the uuid type. So I changed the uuid columns to varchars... and it works exactly as expected. And if I create the tables with varchar columns instead of uuids in my environment it works too, with no changes to the queries. Mystery partially solved, I guess--I don't really understand why it's not working.
|
# ? May 29, 2022 03:14 |
|
Very nice! Good job. (If it isn't behaving as expected, don't assume anything else will behave and start treating it like a black box, checking everything from the ground up.) I used PostgreSQL back when NOT EXISTS wasn't working and figuring out why it was misbehaving was a similar process of digging apart everything. So it's either the db version or the type. Wery interesting. ps Maybe explicitly cast the uuid back to a string, append a character or something, and see if that works. It'll be slow but at least tell you something. And is the id field in the tag table also a uuid? Perhaps it's matching in one case but not the other without an explicit cast. As a new feature in Maria, it might be broken. (It's probably doing integer matching at some point of the query, and it'll cast '55ab-...' to 55 silently.) PhantomOfTheCopier fucked around with this message at 04:03 on May 29, 2022 |
# ? May 29, 2022 03:17 |
|
It really seems like something screwy is going on with the uuid type. All the uuid columns are the uuid type, and as written the query doesn't work, but if I change that where in the subquery to this... WHERE CAST(`cam_tag`.`cam_uuid` AS BINARY) = CAST(`cam_grab`.`cam_uuid` AS BINARY) it works exactly as expected. And CAST(... AS UUID) works too!
|
# ? May 29, 2022 05:39 |
|
Once, while learning mysql, I came across a command-line command that would give you the original instruction that would create a specific table, just as you had it. I am trying to find it, but my search terms are too generic and I am getting plenty of useless results, tutorials and stuff. Does anyone know what instruction I'm talking about? Thanks in advance.
|
# ? May 29, 2022 22:53 |
|
https://dev.mysql.com/doc/refman/8.0/en/show-create-table.html ??
|
# ? May 30, 2022 00:33 |
|
Thanks! That is it.
|
# ? May 30, 2022 09:13 |
|
The advice I have frequently seen for audit logging in postgres and similar RDBMS is - “don’t”, use another store that can work async and won’t compete with OLTP for resources/IO. Is that broadly good advice? If so, would that be a good use-case for something like mongo, or another similar document store? Append-only history where nothing is ever deleted seems like a good impedance match to that kind of “this is basically a fancy fwrite” category of store…
|
# ? Jun 3, 2022 06:14 |
|
For what it's worth I've never heard that, but it depends on the definition and intended purpose of "audit log".

If it means "a collection of rows affected by inserts, updates, or deletes in a source table in adherence with the transaction isolation expectations" then there's typically no choice but to put it in the database, since waiting for external call confirmation is slower. For example, a database of monetary account balances may have an audit log whose information is sufficient to roll back or forward to any point in time. That's TRIGGER territory.

If it's still transaction safe but not necessarily "live data", in PostgreSQL there's still streaming and log replication that can offload logging to a separate system. Arguably even a WAL receiver has a justifiable cost beyond just the log, since the full copy of the data has other benefits.

If audit log means "some vague comments about data changes in a source table", then of course it can be offloaded and stored more cheaply. Usually the use case is a "handful of fields someone without database access once requested but which now supports notable business workflows and probably compliance, until that day when an emergency happens and the audit is insufficient/out of order/only eventually consistent/found to be incomplete and we realize we should have used triggers".
|
# ? Jun 7, 2022 04:19 |
|
|
|
What are the implications of using a composite primary key versus creating an autoincrementing column for a PK and then creating an index on the columns? Why would one use one method over the other? For example: If I have data [colA, colB, colC, colD...] that is unique for [colA,colB] what are the implications of creating a table with a composite primary key of [colA, colB] versus creating an autoincrementing col0 and then creating a secondary index on [colA, colB]?
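The two layouts in the question can be put side by side like this (column names taken from the post, everything else assumed; SQLite syntax for illustration) — both enforce the same uniqueness on [colA, colB], while the surrogate version adds a compact integer key that's cheaper for other tables to reference:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- natural composite key: uniqueness enforced by the PK itself
CREATE TABLE t_composite (
    colA TEXT, colB TEXT, colC TEXT,
    PRIMARY KEY (colA, colB));

-- surrogate key plus a unique secondary index on the natural key
CREATE TABLE t_surrogate (
    col0 INTEGER PRIMARY KEY AUTOINCREMENT,
    colA TEXT, colB TEXT, colC TEXT);
CREATE UNIQUE INDEX ux_surrogate_ab ON t_surrogate (colA, colB);
""")

con.execute("INSERT INTO t_composite VALUES ('a', 'b', 'x')")
con.execute("INSERT INTO t_surrogate (colA, colB, colC) "
            "VALUES ('a', 'b', 'x')")

# both layouts reject a duplicate (colA, colB) pair
errors = 0
for sql in ("INSERT INTO t_composite VALUES ('a', 'b', 'y')",
            "INSERT INTO t_surrogate (colA, colB, colC) "
            "VALUES ('a', 'b', 'y')"):
    try:
        con.execute(sql)
    except sqlite3.IntegrityError as exc:
        errors += 1
        print("rejected:", exc)
print(errors)  # 2
```

The trade-offs beyond that are engine-specific (e.g. clustered-index width in SQL Server or InnoDB, and whether child tables must carry both natural-key columns as their foreign key).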
|
# ? Jun 10, 2022 17:54 |