|
PhantomOfTheCopier posted:Agreed. I definitely agree that it's doable. I worry about the strange cases; none have sprung to mind but I haven't given it much thought.

I think what I'm missing is why included columns can't count towards that requirement. Technically and hypothetically speaking. Assume table X with columns a, b, c and unique index idxX(a). If idxX(a) also includes b, it is still unique. A change to X.b will still trigger a change to a row in idxX if b is included. Isn't this sufficient to notify references, making a FK reference to X(a,b) technically viable?
|
# ? May 6, 2020 02:49 |
|
|
Ruggan posted:I think what I'm missing is why included columns can't count towards that requirement. Technically and hypothetically speaking.

* Historical: "INCLUDE" is new(ish). There's every reason to believe the feature was added without regard for triggering, since that can be achieved in different, simpler ways.
* Performance: You shouldn't use INCLUDE anyway, except in cases where you have stats showing that it would provide benefits.
* Purpose: INCLUDE specifically addresses an optimization concern about fetching data using an index-only scan. There's no reason a priori that it should be equivalent to an index that covers that field.
* Question: Will it let you INDEX (a,b,c) INCLUDE (c)?
* Design: It's unlikely that all index types supported by the database (i.e., not just btree) provide the same "relation-to-relation on fields F" support. Depending on implementation, INCLUDE may be nothing more than raw data, so it may not even have an equality operator. Converting lookups to "relation-to-relation on fields F, coupled with various possibly-wide fields scanned for various types of equality" requires more data structures in the index to support, and will blow performance out of the water.

Suppose you have your setup, idxX(a)+includes(b), and suppose you have two tables referencing it, Y(a) and Z(a,b). If a value of 'b' changes, how shall it determine which to update? If (a,b) are both "in the index", the tuple id in X could cause an indextuple id update, triggering a foreign tuple update in both Y and Z... but it's not needed in Y. And if only (a) is in there, then you get the current behavior. So now you need an extra field in your on-disk data structure to track different column purposes in your index... which is not impossible. There are open source databases, so you can certainly go add this functionality. Perhaps the mailing lists have information on the reasons, if any, that they haven't within their respective implementations.
That's all the time I have right now.
|
# ? May 6, 2020 19:57 |
|
PhantomOfTheCopier posted:Suppose you have your setup, idxX(a)+includes(b), and suppose you have two tables referencing it, Y(a) and Z(a,b). If a value of 'b' changes, how shall it determine which to update? If (a,b) are both "in the index", the tuple id in X could cause an indextuple id update, triggering a foreign tuple update in both Y and Z... but it's not needed in Y. And if only (a) is in there, then you get the current behavior. So now you need an extra field in your on-disk data structure to track different column purposes in your index... which is not impossible.

Thanks, this helps clarify it. I can see from this why it's sensible to require a full, exact index match for a FK. It gets even worse when you add more included columns and allow this: idxX(a) + include(b,c) with FKs A(a), B(a,b), C(a,c), D(a,b,c). Gets ugly. Thanks for the example.
|
# ? May 6, 2020 21:00 |
|
Am I the only one who thinks this is stupid? MSSQL:code:
code:
It's no different than filtering by WHERE 1=1, and most likely for the same reason (programmatically-generated code)
|
# ? May 7, 2020 09:33 |
|
Here's a WHERE Type = 'Stupid' question: Is there a functional/performance difference between these JOINs in terms of how I restrict Field2 on Table2? In practice they seem to have the same outcome, but I don't know which is 'proper'. code:
code:
mik fucked around with this message at 13:56 on May 7, 2020 |
# ? May 7, 2020 13:53 |
|
NihilCredo posted:Am I the only one who thinks this is stupid? MSSQL: But you're not filtering, you're sorting. What do you expect the DBMS to do when the sorting expression evaluates to the same value for every row?
|
# ? May 7, 2020 14:01 |
mik posted:Here's a WHERE Type = 'Stupid' question: Yes, there is a difference when you make a left join; if you did an (inner) join they would be equivalent.

The first will match every row in A with all rows in B where Field1 matches and B.Field2 has the value. If a row in A matches rows in B on Field1 but none of them have the value in B.Field2, you will still get a row, just without any data from B.

The second will match every row in A with all rows in B where Field1 matches. It will then filter out joined rows where B.Field2 does not have the value. This means rows from A where there were no matching rows from B will be left out. (It will effectively behave as a plain inner join.)
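A quick way to see the two behaviors side by side is with an in-memory SQLite database (Python here just so the example is self-contained; the table layout and the 'wanted' value are invented for illustration):

```python
import sqlite3

# Toy versions of the two tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Field1 INTEGER);
    CREATE TABLE B (Field1 INTEGER, Field2 TEXT);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1, 'wanted'), (1, 'other'), (2, 'other');
""")

# Predicate inside ON: every row of A survives; B's columns come back
# NULL when no B row satisfied both conditions.
on_rows = conn.execute("""
    SELECT A.Field1, B.Field2
    FROM A LEFT JOIN B
      ON A.Field1 = B.Field1 AND B.Field2 = 'wanted'
    ORDER BY A.Field1
""").fetchall()
print(on_rows)     # [(1, 'wanted'), (2, None)]

# Predicate in WHERE: applied after the join, so the NULL-extended row
# for A.Field1 = 2 is filtered out -- effectively an inner join.
where_rows = conn.execute("""
    SELECT A.Field1, B.Field2
    FROM A LEFT JOIN B
      ON A.Field1 = B.Field1
    WHERE B.Field2 = 'wanted'
    ORDER BY A.Field1
""").fetchall()
print(where_rows)  # [(1, 'wanted')]
```

Same rows joined in both cases; the only difference is whether the filter participates in the outer join or runs after it.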
|
|
# ? May 7, 2020 14:17 |
|
Gatac posted:But you're not filtering, you're sorting. What do you expect the DBMS to do when the sorting expression evaluates to the same value for every row? The same thing it already does when I ask it to ORDER BY Foo and Foo happens to have the same value in every row: return the equivalent rows in any order it likes, since any order will satisfy my request. I cannot think of a single scenario where I would want it to crash instead.
|
# ? May 7, 2020 15:00 |
|
NihilCredo posted:The same thing it already does when I ask it to ORDER BY Foo and Foo happens to have the same value in every row: return the equivalent rows in any order it likes, as it will satisfy my request. Yeah, I see no reason why it wouldn't do as you ask rather than try to be clever. It can recognise already that the expression is a constant, so why not just silently disregard that item in the ORDER BY list and come up with the same query plan it would do if that item wasn't in the list? (in this case, the same query plan as it would if there were no ORDER BY clause at all, since it's the only list item)
|
# ? May 7, 2020 15:35 |
|
My perspective on this is just the other way around: I'd like the DBMS to not silently disregard stuff like that, but to error out so I can fix my query. NihilCredo's first example, where an expression might just happen to evaluate to the same value for every row, is accepted as a sort criterion because the DBMS can't determine that it'll be constant across all rows before it actually executes the query. The second example clearly can only evaluate to a constant value of 0, so there's a rule to catch it before executing the query. I don't see this as the system trying to be clever; it's trying to save you from nonsense in your query. Apparently harmless nonsense in this case, granted, but I can see where the people who made it error out were coming from. Should it be a configurable warning rather than an error? Maybe.
|
# ? May 7, 2020 17:05 |
|
It's also a guard against dynamic/constructed SQL with empty values in, which is generally a sign something's gone terribly wrong and terminating rather than continuing is the best bet.
|
# ? May 7, 2020 23:07 |
|
I dunno. I don’t think it is the engine’s job to enforce coding style, just syntax. If someone hands the optimizer a query with dumb bullshit and it is valid SQL the engine should execute it. I would expect sorting by a constant to have no impact on the output sort order.
|
# ? May 8, 2020 04:14 |
|
NihilCredo posted:Am I the only one who thinks this is stupid? MSSQL: I found a line in a text copy of the ANSI standard from 1992 (because they make you pay hundreds of dollars for the current PDFs 🙄) that says quote:If a <sort specification> contains an <unsigned integer>, then the <unsigned integer> shall be greater than 0 and not greater than the degree of T. Munkeymon fucked around with this message at 14:35 on May 8, 2020 |
# ? May 8, 2020 14:32 |
|
Munkeymon posted:I found a line in a text copy of the ANSI standard from 1992 (because they make you pay hundreds of dollars for the current PDFs 🙄) that says
|
# ? May 8, 2020 14:35 |
|
Nope, still fails if you do 1 ELSE 0
|
# ? May 8, 2020 14:39 |
|
code:
Credit where credit's due.
|
# ? May 8, 2020 15:25 |
|
I've got a PostgreSQL 11 table that contains only a 1:N set of lookup codes of various types for a particular entity, like this:code:
If I want to improve the performance of such queries, what would it be best to add?
- The most obvious: a hash index on (code_source, code_type, code)
- A hash index on (code)
- A hash index on (code) INCLUDING (code_source, code_type)
- Other option?
|
# ? May 12, 2020 11:07 |
|
NihilCredo posted:I've got a PostgreSQL 11 table that contains only a 1:N set of lookup codes of various types for a particular entity, like this: Admittedly I'm a T-SQL/SQL Server guy and not well versed in the finer points of the differences between it and PostgreSQL, but the text columns being part of a PK is a smell to me. I would normalize them out to their own table, especially if the code sources and code types are going to be a fairly limited set.

Secondly, how does PostgreSQL do things like clustering? I saw from the documentation that defining the primary key creates its own btree index, but that appears to be somewhere SQL Server and PostgreSQL diverge.

Thirdly, I think uuids aren't a great fit for PKs compared to something like ints at the CPU level, if only because you can fit two entire ints into the ALU's inputs for comparison in a single instruction vs. 128 bits for a uuid, or even worse for string comparisons with the text columns.
|
# ? May 12, 2020 17:46 |
|
NihilCredo posted:I've got a PostgreSQL 11 table that contains only a 1:N set of lookup codes of various types for a particular entity, like this: I'll still point out, however, that btree indexes are more robust: anything other than direct equality (regular expressions, LIKE, ranges, inequality) won't use the hash index. The best way would be to create a (source,type,code) index, check the resulting size and performance improvement via EXPLAIN ANALYZE, and then compare it to other choices. It's possible a (source,type) index would be sufficient, so...

I'd argue that your lookupcodes table is in the wrong order. If instead "this table relates source/types to a set of IDs and their specific code", then the primary key would be source/type/id and the query planner could index scan on the first two fields. A standalone index on the id would help for thingy lookups.
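The suggested workflow, sketched in SQL (the index name, the entity_id column, and the literal values are placeholders; the (code_source, code_type, code) columns come from the question above):

```sql
-- Candidate composite btree index:
CREATE INDEX idx_lookupcodes_stc
    ON lookupcodes (code_source, code_type, code);

-- Check whether the planner uses it, and how it actually performs:
EXPLAIN ANALYZE
SELECT entity_id, code
FROM lookupcodes
WHERE code_source = 'some_source' AND code_type = 'some_type';

-- Compare the on-disk footprint of candidate indexes:
SELECT pg_size_pretty(pg_relation_size('idx_lookupcodes_stc'));
```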
|
# ? May 13, 2020 03:30 |
|
Postgres question: we have a table in production that has sneaked up to about 132M rows without anybody really noticing. It's a fairly simple table - integer primary key, three integer foreign keys (one of them nullable), two booleans, one integer and a timestamp. Each of the foreign key columns has its own index.

Normally querying this table isn't a problem, but we're starting to notice that postgres sometimes seems to do seq scans on it even though the query filters by the indexed foreign keys. For example, one particular query I've been looking at will do a seq scan if it uses a semi-join on one of the foreign key columns, but if you take the list of matching ID's and do a WHERE foreign_id IN (long_ass_list_of_ids) it'll use the index in a bitmap index scan. When I look at the query plan for one of the problematic queries where it's doing this, the planner is overestimating rows by a factor of 90, and when I look at the table stats the n_distinct estimate for some of the foreign key columns is off by about a factor of 100 compared to the actual value.

I've been doing some reading and I think I want to try raising the statistics_target for some of these columns (currently using the default 100), but what's a reasonable value to pick for a table of this size? Is there a downside to just setting it to the max value 1000? We're currently on PG 9 but are working on migrating our way to 12.

TheFluff fucked around with this message at 11:38 on May 15, 2020 |
# ? May 15, 2020 11:35 |
|
TheFluff posted:Postgres question The tradeoff in increasing the statistics target is that all query planning that touches that table will be a little slower. It's probably okay, but be aware of it.

However, it may just be that it hasn't had a chance to update the statistics in a while, especially if you haven't done any autovacuum tuning yet. By default autovacuum is set to be pretty timid and doesn't want to use too many system resources, but then it doesn't get around to actually working through the whole db, and this is when stats are updated. You can force an update with `vacuum (analyze, verbose) tablename;`. Give that a try first, then see if the queries use the indexes like you think they should before messing around with the targets.
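For reference, the stats refresh and the per-column target bump being discussed look something like this (table and column names are placeholders; run the VACUUM first and only reach for SET STATISTICS if the estimates stay bad):

```sql
-- Refresh statistics (and reclaim dead tuples) on the big table right now:
VACUUM (ANALYZE, VERBOSE) big_table;

-- If n_distinct is still badly off afterwards, raise the per-column
-- statistics target (default 100) and re-analyze:
ALTER TABLE big_table ALTER COLUMN foreign_id SET STATISTICS 1000;
ANALYZE big_table;
```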
|
# ? May 17, 2020 01:35 |
|
I'm mostly self-taught, so apologies if this is simple. I have a query I've built and I'm trying to figure out if there's a way to remove all entries that are tied to a field value that only appears once in the final result. I can do this easily enough in Excel with conditional formatting to highlight and remove those that only appear once, but I want to be able to scale it up if need be. So if I wanted to remove records with unique 'name' field information and my data looked like this: code:
code:
Chakan fucked around with this message at 20:23 on May 20, 2020 |
# ? May 20, 2020 20:16 |
|
Chakan posted:I'm mostly self-taught so apologies if this is simple. I have a query I've built and I'm trying to figure out if there's a way to remove all entries that are tied to a field value that only appear once in the final result. I can do this easy enough in Excel with conditional formatting to highlight and remove those that only appear once, but I want to be able to scale large it if need-be. Check this or similar: https://www.w3schools.com/sql/sql_having.asp
|
# ? May 20, 2020 20:20 |
|
Chakan posted:e: VV I don't think having will work for how I've set it up because I'm already grouping in a way that makes me think it'll give bunk data if I do that, but that's good to remember & I'll experiment. Thanks! it shouldn't matter what you're grouping by; you should be able to exclude names via a subselect in your where clause using group by and having, and then group the outer data by whatever
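A minimal sketch of that subselect, run through SQLite via Python just so it's self-contained (the table, columns, and data are invented; 'bob' appears only once and should be excluded):

```python
import sqlite3

# Toy data standing in for the original query's result set.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (name TEXT, amount INTEGER);
    INSERT INTO records VALUES
        ('alice', 10), ('alice', 20), ('bob', 5), ('carol', 7), ('carol', 3);
""")

# The outer query groups however it likes; the subselect in the WHERE
# clause independently keeps only names that occur more than once.
rows = conn.execute("""
    SELECT name, SUM(amount) AS total
    FROM records
    WHERE name IN (
        SELECT name FROM records GROUP BY name HAVING COUNT(*) > 1
    )
    GROUP BY name
    ORDER BY name
""").fetchall()
print(rows)  # [('alice', 30), ('carol', 10)]
```

The point being made above: because the duplicate check lives in its own subselect, it doesn't interact with whatever grouping the outer query does.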
|
# ? May 21, 2020 00:04 |
|
t-sql here, is it possible to have stored procedures that produce temp tables that are then joined within the scope of a grand stored procedure? like:

-- spOverall
--   spProducesTemp1
--   spProducesTemp2
--   spProducesTemp3
--   SELECT blah FROM #Temp1 JOIN #Temp2 JOIN #Temp3
--   <result is update to a non-temp table>
-- END spOverall

And yes, spProducesTemp2 could use the temp table from spProducesTemp1 to generate #Temp2.

abelwingnut fucked around with this message at 00:41 on May 21, 2020 |
# ? May 21, 2020 00:38 |
|
E: nm misunderstood question
|
# ? May 21, 2020 00:43 |
|
abelwingnut posted:t-sql here, is it possible to have stored procedures that produce temp tables that are then joined within the scope of a grand stored procedure? like: Yes, you can define a temp table to live in the global scope by using a double hash, a la ##globalTemp. https://docs.microsoft.com/en-us/sql/relational-databases/tables/tables?view=sql-server-ver15 Scroll down to the 2nd entry under table types.
|
# ? May 21, 2020 00:46 |
|
check out How to Share Data between Stored Procedures, by notable sql server greybeard erland sommarskog. you should check out his other articles too, just so you know what topics he's covered in enormous detail
|
# ? May 21, 2020 00:49 |
|
thanks, will check out all of this tonight or tomorrow. got a 4k-line single stored procedure that i should probably reduce down to specific jobs!
|
# ? May 21, 2020 00:53 |
|
kumba posted:it shouldn't matter what you're grouping by, you should be able to exclude names via subselect in your where clause using group by and having and then group the outer data by whatever Ah, thank you.
|
# ? May 21, 2020 18:13 |
|
Anyone have experience tuning configs for Postgres? We're moving to an Azure Postgres-as-a-service instance and the MS guys who are supposed to help us with this keep blowing us off. Right now we're on a 16 core server, but the config sets max_parallel_workers to 8 and max_parallel_workers_per_gather to 2, which seems to be why the server maxes out at 50% CPU usage under load. In general the server is just unbearably slow. Non-concurrent index builds for large (50-100gb) tables are taking 6-8 hours.
|
# ? May 24, 2020 13:54 |
|
I'd at least start from a config that PGTune spits out.
|
# ? May 26, 2020 17:37 |
|
creating an index uses maintenance_work_mem, so if you have a big one to create and your default from the config is not very high (which is generally advisable), you can increase that setting for your session to give it more memory to sort in
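A session-scoped bump along those lines might look like this (the 2GB figure and the names are illustrative, not a recommendation):

```sql
-- Only this session gets the extra sort memory for the build;
-- the config default stays low for everyone else.
SET maintenance_work_mem = '2GB';
CREATE INDEX idx_big_table_col ON big_table (col);
RESET maintenance_work_mem;
```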
|
# ? May 26, 2020 19:17 |
|
Star War Sex Parrot posted:I'd at least start from a config that PGTune spits out. That is the plan. The pg folks from MS are supposed to get back to us this week. Hopefully we get a technical person and not another drat salesman.
|
# ? May 28, 2020 03:53 |
Riddle me this, SQL fans: I've got a view that joins 8 tables, 7 inners and one left. Call these A-H. In production, this view returns 1.5m rows. I've got a query that left joins table C on this view again, groups by C's ID, and does a bunch of aggregations (sums, mostly). The result set in prod is about 4500 rows.

We discovered that, when run on production data, the query takes about six and a half minutes and was timing out our app requests. However, in staging, it took a significantly saner amount of time (30s) with *almost* identical data (couple items added by our QA team). Bad times, so we start looking into it.

What we found was that there were two columns in table F that were unpopulated (null values all the way down) in production, but had data in staging. This seems to be the exclusive difference. We haven't added any data to prod to test this, but running the query with the removal of those columns dropped the query time back down to 30 seconds.

Why the gently caress is this happening? Explain plan is not helpful, and these columns are strictly data holders with no indices or weird data types or anything.
|
|
# ? May 29, 2020 22:14 |
|
ChickenWing posted:Riddle me this, SQL fans: EXPLAIN ANALYZE
|
# ? May 29, 2020 22:17 |
|
Xae posted:That is the plan. We found our problem: min_parallel_table_scan_size . The Azure default is messed up. It thinks the size is in bytes, but it is in 8kb blocks. Meaning it was set to not parallelize a query unless the plan estimated 8,388,608 8kb blocks , roughly 70gb, to read.
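If anyone else runs into the same thing, the fix presumably looks something like this; spelling the value with an explicit unit sidesteps the bytes-vs-blocks ambiguity (set as a bare number, the setting is interpreted in 8kB blocks, and the stock PostgreSQL default is 8MB). On a managed service this may have to go through the provider's configuration interface instead:

```sql
ALTER SYSTEM SET min_parallel_table_scan_size = '8MB';
SELECT pg_reload_conf();
```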
|
# ? May 30, 2020 00:00 |
|
ChickenWing posted:Riddle me this, SQL fans: which rdbms? were those columns used in the query and/or view, and if so, how? i assume the result set is equivalent between staging and prod
|
# ? May 30, 2020 02:39 |
|
ChickenWing posted:Riddle me this, SQL fans: Are your offending columns potentially large data types that, when non null, might be causing out-of-row data reads? In MS SQL Server this includes varchar(max), nvarchar(max), varbinary(max), xml, text, ntext, image. Could also be contributing to a row not fitting on a single page.
|
# ? May 30, 2020 08:19 |
|
|
Malcolm XML posted:EXPLAIN ANALYZE well, that found the errant issue: a join with an expected cost of 9000 and an actual cost of about 7000000, with about 3 billion rows filtered out

redleader posted:which rdbms? were those columns used in the query and/or view, and if so, how? i assume the result set is equivalent between staging and prod postgres 11, columns are selected into the view, then used in calculations in the query output - the calculation is basically (col1 * (col2 - col3 + col4(!) + col5(!) + col6) / col7) where (!) indicates the fully-null columns

Ruggan posted:Are your offending columns potentially large data types that, when non null, might be causing out-of-row data reads? just NUMERICs
|
|
# ? May 30, 2020 14:42 |