|
Olly the Otter posted:I'd be astonished if it's actually running the subquery multiple times, with row deletion happening in between. Is there a query I could try that might confirm this behavior deterministically? I can read it and it doesn't match your example. There are 2260 rows here, not five, and it plans to delete 1 row, not all five. Note that it's locking all 2260 rows, which is probably not what you want. Meanwhile it's performing a subquery scan in a way that leaves the query planner free to make multiple passes. Run EXPLAIN ANALYZE to show the actual loops executed. You need to put it in a CTE to materialize the result and prevent the query planner from second-guessing what you want. You get different results in different cases because the optimizer uses updated statistics each time the query runs.
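A minimal sketch of that fix, with hypothetical column names (only the tasks table name comes from the plan in this thread). Note that AS MATERIALIZED is PostgreSQL 12+ syntax; on older versions a plain WITH is always materialized anyway:

```sql
-- The risky shape: the planner may re-evaluate the subquery per row
DELETE FROM tasks
WHERE id IN (SELECT id FROM tasks WHERE status = 'pending' LIMIT 5);

-- Materializing the id list in a CTE pins the subquery to one snapshot
WITH doomed AS MATERIALIZED (
    SELECT id FROM tasks WHERE status = 'pending' LIMIT 5
)
DELETE FROM tasks
WHERE id IN (SELECT id FROM doomed);
```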
|
# ? Jun 27, 2019 22:56 |
|
|
|
Ok, here's an EXPLAIN ANALYZE, in a case where it's definitely deleting 6 rows:code:
|
# ? Jun 27, 2019 23:07 |
|
Loops=6 all over the place. Look at these two lines:

Seq Scan on tasks (cost=0.00..1.00 rows=1 width=10) (actual time=0.020..0.023 rows=6 loops=1)
-> Subquery Scan on "ANY_subquery" (cost=1.01..1.03 rows=1 width=32) (actual time=0.013..0.013 rows=1 loops=6)

Read bottom up: it did the subquery scan six times and turned that into 6 rows. The simplest, and probably slightly incorrect, explanation I can give is this: it's forced to do the sequential scan to fulfill the IN operator because there's no index, and the stored statistics on the table are such that the query planner runs multiple loops to fulfill that condition. For example, there may actually be six stored data pages for the table (or the database may decide to do some of the evaluation in parallel), so it breaks the conditional validation into multiple pieces to save time and memory, or to avoid locking multiple pages at the same time. PhantomOfTheCopier fucked around with this message at 23:24 on Jun 27, 2019 |
# ? Jun 27, 2019 23:12 |
|
Ok, that helps, thanks.
|
# ? Jun 27, 2019 23:56 |
|
That almost sounds like a bug, particularly when the same query gives different results based on... something.
|
# ? Jun 28, 2019 00:40 |
|
Olly the Otter posted:Edit2: The EXPLAIN seems to have changed -- the previous EXPLAIN was before the table had become corrupted, so here's one from this apparently corrupted table: Have you ANALYZEd the table since the corruption?
|
# ? Jun 29, 2019 05:32 |
|
Hey dudes. How do you run PgAdmin4 in Ubuntu? In win, it's an installer. The official instructions are old/not good/don't work with new versions of Ubuntu. There's an approach where you can install as a python wheel, but I'm getting permission or package not installed errors even after setting up the wheel (the script attempts to edit something in the protected /etc directory, but if you sudo it, you then lose access to the packages the wheel installed). This is oddly difficult / not user-friendly.
Dominoes fucked around with this message at 12:52 on Jul 14, 2019 |
# ? Jul 14, 2019 12:45 |
|
Hi all. I have a date query question that I can't figure out this late in the day. Say I have a table that looks like code:
|
# ? Jul 17, 2019 20:53 |
|
SixPabst posted:Hi all. I have a date query question that I can't figure out this late in the day. code:
|
# ? Jul 17, 2019 22:30 |
|
Yeah not entirely clear. If window A overlaps B, and B overlaps C, but C doesn't overlap A, what is the desired result?
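For the simple pairwise case at least, the standard interval-overlap predicate (hypothetical table and column names; half-open windows assumed) looks like this — chained A-B-C merging is a separate gaps-and-islands problem:

```sql
-- Two windows overlap when each starts before the other ends
SELECT a.id, b.id
FROM windows a
JOIN windows b
  ON a.id <> b.id
 AND a.start_date < b.end_date
 AND b.start_date < a.end_date;
```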
|
# ? Jul 17, 2019 22:45 |
|
Our resident SQL performance guy shared this brain teaser in our internal forums this week (our database is multi-tenant): Performance Guy posted:In 5 seconds or less, identify the problem with this query: The problem being that the OR in the WHERE clause isn't properly wrapped in parentheses, so the WHERE clause is logically: code:
When it SHOULD be: code:
After half a dozen quick and correct analyses by engineers like me, a customer support guy chimes in: Not a developer guy posted:Well, to begin with there is no structure to the query. I was taught to focus on structure so that the query is easily readable. Even if your code ran 500 times faster but no structure, you failed. I can't even with that poo poo, man. The only changes I made were to rename the tables and columns. All formatting, capitalization, and aliasing are as they appeared. I want to light this dude's keyboard on fire. Another high-up engineer and performance guy had a very diplomatic response: Performance Guy 2 posted:Not a developer guy,
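The precedence trap spelled out with hypothetical names (in SQL, AND binds tighter than OR):

```sql
-- As written: the OR branch escapes the tenant filter
SELECT * FROM orders
WHERE tenant_id = 42 AND status = 'open' OR status = 'overdue';
-- parses as: (tenant_id = 42 AND status = 'open') OR status = 'overdue'
-- i.e. it returns EVERY tenant's overdue orders

-- As intended: parentheses keep both branches inside the tenant filter
SELECT * FROM orders
WHERE tenant_id = 42 AND (status = 'open' OR status = 'overdue');
```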
|
# ? Jul 19, 2019 16:24 |
|
Can someone explain the difference between SSMS, SSDT, SSRS, and SSIS to me like I'm an idiot. Because I thought I understood the difference but either I don't or my director doesn't, and I'm trying not to look like I'm an rear end in a top hat if I'm the one being the dumbass.
|
# ? Jul 19, 2019 16:53 |
500 times faster? I'd write the most unreadable, unmaintainable piece of poo poo you've ever seen, involving every bad practice you could think of and multiple fast_sqrt magic-number equivalents, if I could make a central, heavy and often-used query 500 times faster.
|
|
# ? Jul 19, 2019 17:13 |
Tibalt posted:Can someone explain the difference between SSMS, SSDT, SSRS, and SSIS to me like I'm an idiot. SSMS is a gui for the database. SSDT is a set of db development tools that integrate into VS, SSRS is tools for reports, SSIS is programmability/ETL. Which part is confusing? Just google any of those terms, MS docs explains it well enough. Shy fucked around with this message at 18:42 on Jul 19, 2019 |
|
# ? Jul 19, 2019 18:32 |
|
Shy posted:SSMS is a gui for the database. SSDT is a set of db development tools that integrate into VS, SSRS is tools for reports, SSIS is programmability/ETL. Which part is confusing? Just google any of those terms, MS docs explains it well enough. The reason for my frustration is that I've been hired onto this new startup, and things are a bit wacky. I'm being told that I have everything I need, but I only have SSMS installed. Sure, I can write scripts and use the Wizards to get by, but I need SSDT, right? Before I start pushing back on this, I really want to make sure I'm not making an rear end of myself.
|
# ? Jul 19, 2019 19:01 |
Tibalt posted:Well, here's my understanding: SSDT slots into Visual Studio, and SSRS/SSIS slot into SSDT. SSMS is a separate thing, right? You wouldn't use SSMS to create an SSRS project, right? If I'm supposed to be using SSRS/SSIS, I need SSDT and VS installed on my computer, right? Yes, you need VS and SSDT (the data storage and processing workload in the VS installer) for development, see https://docs.microsoft.com/en-us/sql/integration-services/integration-services-ssis-development-and-management-tools?view=sql-server-2017 SSMS is a management tool; if you're developing you'll need VS
|
|
# ? Jul 19, 2019 19:07 |
|
Shy posted:Yes you need VS and SSDT (data storage and processing workload in the VS installer) for development, see https://docs.microsoft.com/en-us/sql/integration-services/integration-services-ssis-development-and-management-tools?view=sql-server-2017
|
# ? Jul 19, 2019 20:04 |
|
I am trying to design a database schema and I have a question about the best way to do something. I want to have a ratings database that will have a few thousand items that can be rated by each user. Each user can only rate an item once. I need to be able to get an average rating for each item. New items will be periodically added to the database, which can then be rated. Which of the following scenarios will work best?

1) Have a separate table of ratings for each item -> store the userid that rated the item and the rating they gave. ex: item123rating -> userid, rating; item1234rating -> userid, rating

2) Have the rating tied to the user -> a field for each item attached to the user table. user123 -> item123rating, rating; user123 -> item1234rating, rating

3) ????? a better way that someone with more experience can recommend

I seem to think in my mind that the first method is the best way to do it, but I am no database expert.
|
# ? Jul 21, 2019 21:23 |
|
You need a table with 2 foreign keys - one to the user table and one to the "rateable items" table. This table will have either 3 or 4 columns: the 2 foreign keys, the rating given by the user, and optionally an artificial primary key. (If you do not use an artificial primary key, then the primary key is a composite key consisting of the two foreign keys in one order or the other.) Your first idea is incorrect because it proposes having "a table for each item" where the collection of items can be added to as part of ordinary operation of the software. That's always the wrong solution - categorically. Your second idea similarly is incorrect because it's also wrong to have "a column for each item" where the collection of items can be added to as part of ordinary operation of the software. Both of them are wrong not only conceptually, but also more practically because your RDBMS isn't designed to work like that, but it is designed to work with foreign key relationships and more-or-less normalised database table designs. Doing it the right way will also mean that you can more easily write code that does the very simple task you wanted to achieve - getting the average for each item.
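A DDL sketch of that junction table, using the composite-key variant and hypothetical table/column names:

```sql
-- One row per (user, item) pair; the primary key enforces "rate once"
CREATE TABLE user_item_rating (
    user_id INT NOT NULL REFERENCES users (id),
    item_id INT NOT NULL REFERENCES items (id),
    rating  INT NOT NULL,
    PRIMARY KEY (user_id, item_id)
);
```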
|
# ? Jul 21, 2019 22:05 |
|
Assuming the table is UserRating (UserID, ItemID, Rating):

I want to find the average rating for item 123:
SELECT AVG(Rating) FROM UserRating WHERE ItemID = 123

I want to find the average rating for each item that has at least one rating:
SELECT ItemID, AVG(Rating) FROM UserRating GROUP BY ItemID
|
# ? Jul 21, 2019 22:09 |
|
Hammerite posted:You need a table with 2 foreign keys - one to the user table and one to the "rateable items" table. This table will have either 3 or 4 columns: the 2 foreign keys, the rating given by the user, and optionally an artificial primary key. (If you do not use an artificial primary key, then the primary key is a composite key consisting of the two foreign keys in one order or the other.) Hammerite posted:Assuming the table is UserRating (UserID, ItemID, Rating) Thank you very much. This makes complete sense to me.
|
# ? Jul 22, 2019 01:28 |
|
Today I brought down our site for 30 minutes by running a count on a single table, and what I thought killed the query actually didn’t. How the hell would a count on a single table gently caress things up that badly?
|
# ? Jul 23, 2019 18:06 |
|
No idea. Never worked with DBs connected to websites but I would begin with: How big was the table? How did you run the count? How integral is the table to the website? How active was it at time of querying? Etc etc
|
# ? Jul 23, 2019 18:08 |
|
Table’s apparently pretty big, several gigs in size. Flat out select count(*) from table query. The table is pretty important to only half the site, most of the site doesn’t use it, but to the ones that do it’s integral. It gets queried whenever a fairly common page is loaded. I hate my life.
|
# ? Jul 23, 2019 18:18 |
|
Also depends on the default read isolation level*. See if you can give it a hint that you're OK with an inaccurate count. That'd be with(nolock) in TSQL but IDK how it'd look elsewhere. *might be transaction isolation level?
|
# ? Jul 23, 2019 18:35 |
|
I’m not even gonna bother. This DB and app is a train wreck as-is without me bringing it down.
|
# ? Jul 23, 2019 18:49 |
|
Yeah most likely you read locked the table. At least in MS SQL, reads usually issue shared locks which allow other reads but block writes. It’s to prevent you from reading data in an inconsistent state. If your table is huge and the site relies on inserting data into it or updating data in it, your long running query probably blocked that. Isolation levels or nolock hints are the way to avoid that with various associated cons.
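For example, in T-SQL (hypothetical table name), either of these skips taking shared locks, at the cost of dirty reads:

```sql
-- Per-query hint
SELECT COUNT(*) FROM BigTable WITH (NOLOCK);

-- Session-wide equivalent
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM BigTable;
```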
|
# ? Jul 23, 2019 19:40 |
|
Bad query created a table-level lock and dependent queries swamped active connections as it paged in-memory tables to disk. Happened to me way back when. Set a threshold for the maximum rows a join is allowed to examine in MySQL, for example "[mysqld] => max_join_size = 5000000", or however egregious you want your indecency
|
# ? Jul 23, 2019 21:47 |
|
It very very much depends on the database you're using. In any case, please use the information schema (with caveats). Edit: Indeed, anything higher than read committed is not typical, so while transaction isolation could cause it, if your app is changing the level you have... bigger concerns, let us say. Edit: A couple of gigs of data is nothing. That query should run without issue until you get to terabytes. ... which means I have a guess of what you're using, and that it's not tuned for cache and worker memory. This would still require some interesting interplay with other traffic, especially for a 30min outage. What's the nominal TPS otherwise? PhantomOfTheCopier fucked around with this message at 22:08 on Jul 23, 2019 |
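If the guess is PostgreSQL, the planner's stored estimate gives a rough count without scanning the table at all (hypothetical table name; it's only as fresh as the last ANALYZE or autovacuum run):

```sql
-- Approximate rowcount from catalog statistics, not a scan
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'big_table';
```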
# ? Jul 23, 2019 21:55 |
|
Oh boy, transaction isolation levels. Good stuff. If you're on SQL Server, there's a fast way to get a rough rowcount: code:
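One common version of that trick reads the row counts SQL Server already tracks in its partition metadata — a guess at the approach, with a hypothetical table name, not necessarily the original snippet:

```sql
-- Approximate rowcount from metadata; no table scan, no shared locks held long
SELECT SUM(p.rows) AS approx_rows
FROM sys.partitions p
WHERE p.object_id = OBJECT_ID('dbo.BigTable')
  AND p.index_id IN (0, 1);  -- heap (0) or clustered index (1) only
```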
|
# ? Jul 23, 2019 22:38 |
|
The data we’re trying to get doesn’t even belong in our database so it can go to hell as far as I’m concerned. Not even my half of the application. I try to debug it and it spits in my face. gently caress it.
|
# ? Jul 23, 2019 22:47 |
|
Pollyanna posted:Table’s apparently pretty big, several gigs in size. Flat out select count(*) from table query. The table is pretty important to only half the site, most of the site doesn’t use it, but to the ones that do it’s integral. It gets queried whenever a fairly common page is loaded. I don't know if this works outside of MSSQL but you might be able to do a select count(1) from table, that way it doesn't have to bother retrieving all of the data just to count rows. If I remember right it should just return a '1' for each row in the table. Been a hot minute since I've touched a real SQL database, so I'm probably wrong. Either way, the with (nolock) hint as suggested before should at least stop the query from blocking the table.
|
# ? Jul 23, 2019 22:49 |
|
Protocol7 posted:I don't know if this works outside of MSSQL but you might be able to do a select count(1) from table, that way it doesn't have to bother retrieving all of the data just to count rows. If I remember right it should just return a '1' for each row in the table. This was my thought as well. It sounds right but I can’t quite commit to it for some reason
|
# ? Jul 24, 2019 22:57 |
|
It depends on the database. Some actually select *, some prefer select(id), etc.
|
# ? Jul 25, 2019 00:10 |
|
What RDBMS?
|
# ? Jul 25, 2019 08:37 |
How come binary database interfaces aren't more common with ORMs? It seems a waste to communicate using massive strings if you, for instance, just want to select a tonne of data by IDs.
|
|
# ? Jul 26, 2019 12:57 |
|
I've inherited a script that I'm updating for the current year, and it takes forever, so I'm trying to optimize it. I've got three questions that I'm not quite sure about. Would using UNION ALL with two SELECT DISTINCT queries provide a benefit over UNION, if I can assume the two select queries aren't going to have any duplicate records? I've read that using INNER JOINs is more efficient. Does that rule still apply to UPDATE queries, like this? code:
code:
|
# ? Jul 30, 2019 20:04 |
|
Those should plan to the same anyway. What could be faster depending on the size of the data would be using an IN clause:code:
code:
|
# ? Jul 30, 2019 20:11 |
|
The answer to all your questions is “look at the query plan”. But I’ll try to address them.Tibalt posted:Would using UNION ALL with two SELECT DISTINCT queries provide a benefit over UNION, if I can assume the two select queries aren't going to have any duplicate records? UNION ALL is usually faster than UNION (though maybe not here, given the potential transmission of more records). I don’t know if your two DISTINCTs will be faster or slower than a single UNION. It’s gonna depend on your data and schema. Tibalt posted:I've read that using INNER JOINs is more efficient. Does that rule still apply to UPDATE queries, like this? INNER JOINs aren’t blanket more efficient. Also the comma operator is functionally equivalent to an inner join. Hard to say if getting rid of the DISTINCT will improve performance - gotta look at that plan.
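To make that equivalence concrete, here are the two UPDATE shapes in T-SQL form with hypothetical names — a reasonable optimizer should plan these identically:

```sql
-- Comma (old-style) join in the FROM clause
UPDATE t
SET t.flag = 1
FROM dbo.target t, dbo.source s
WHERE t.id = s.id;

-- Explicit INNER JOIN: same semantics, clearer intent
UPDATE t
SET t.flag = 1
FROM dbo.target t
INNER JOIN dbo.source s ON t.id = s.id;
```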
|
# ? Jul 30, 2019 20:20 |
|
|
|
Unfortunately, no, not a PK. It's more like an invoice number - I want to flag the whole invoice if one of the items meets the criteria. What size are we talking about for the IN clause? I'm looking at just over a million records here, with a couple hundred thousand ID numbers. Edit: Thanks for the reply as well, Ruggan. I'll take a look at the query plan. Tibalt fucked around with this message at 20:27 on Jul 30, 2019 |
# ? Jul 30, 2019 20:20 |