Xae posted:That is the plan.

We found our problem: min_parallel_table_scan_size. The Azure default is messed up: it was written as if the value were in bytes, but the unit is actually 8KB blocks. That means it was set to not parallelize a query unless the plan estimated 8,388,608 8KB blocks, roughly 70GB, to read.
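For reference, a sketch of the check and fix. The GUC name and its 8MB stock default are real Postgres; the "provisioning script wrote a raw byte count" scenario is my reading of the post:

```sql
-- Without a unit suffix, Postgres reads this GUC in 8KB blocks.
SHOW min_parallel_table_scan_size;  -- stock default: 8MB

-- A config that writes the raw number 8388608 (intending bytes)
-- actually sets 8,388,608 blocks = 64GiB (~69GB), so almost no
-- table ever qualifies for a parallel scan.
ALTER SYSTEM SET min_parallel_table_scan_size = '8MB';
SELECT pg_reload_conf();
```

Setting the value with an explicit unit string sidesteps the block-vs-byte ambiguity entirely.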
# May 30, 2020 00:00

# May 13, 2024 14:51
ChickenWing posted:Yeah I've been trying to reveal as much as I can without actually showing much in the way of schema details because proprietary code, etc.

Check any subqueries or CTEs you have. Postgres is pretty iffy about pushing filter criteria down into them. Also, sorting 4 billion records in under 500KB of memory sounds real fishy.

Xae fucked around with this message at 03:33 on Jun 8, 2020
# Jun 8, 2020 03:31
I've got two tables that are throwing problems at me. They're the same table, but we have to maintain US and EU data centers with different data in them. They're partitioned tables.

In the US, queries on the table use the indexes correctly except for one partition. Which sucks, because a seq scan can add 5-10 seconds. In the EU it will never use the index on any partition, which means our update process goes from minutes to ten or so hours as it full-table-scans a billion rows a few times.

I verified that the indexes show as valid. I've VACUUM ANALYZE'd the "bad" partitions. I've VACUUM ANALYZE'd the whole tables. I've REINDEXed and then VACUUM ANALYZE'd the tables AND the partitions. What's next? Drop the indexes on the partition and try again?
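A diagnostic sketch for this kind of per-partition misbehavior. The table and column names here are made up for illustration; the catalog views and EXPLAIN options are real Postgres:

```sql
-- Compare plans partition by partition to find the one that falls
-- back to a sequential scan:
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM events_us
WHERE ts >= '2020-06-01' AND ts < '2020-06-08';

-- Check whether the planner's statistics for each partition are stale:
SELECT relname, n_live_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname LIKE 'events_us%';
```

A partition whose last_analyze/last_autoanalyze are null or ancient is a good candidate for why the planner misestimates it relative to its siblings.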
# Jun 22, 2020 15:38
I've got a three day weekend to solve this problem in Postgres. code:

But if I do this: code:

As a bonus, if I do this: code:

We've got an implementation team at MS who is supposed to be helping with this poo poo, but because of the holiday they're kinda MIA. We've only gotten an acknowledgement that "That... shouldn't be happening". I've messed around and done more REINDEX, VACUUM, and ANALYZE commands than I can count. Hell, I even detached the partitions, clustered on the TimeStampA index, and reattached. Still does a sequential scan.
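The detach/cluster/reattach sequence described above looks roughly like this. All names and partition bounds are hypothetical; the statement forms are standard Postgres 11+ partition DDL:

```sql
-- Detach the partition, physically reorder it by the TimeStampA
-- index, refresh its stats, then reattach it.
ALTER TABLE events DETACH PARTITION events_2020_06;
CLUSTER events_2020_06 USING events_2020_06_timestampa_idx;
ANALYZE events_2020_06;
ALTER TABLE events ATTACH PARTITION events_2020_06
  FOR VALUES FROM ('2020-06-01') TO ('2020-07-01');
```

CLUSTER rewrites the heap in index order, which helps correlation-sensitive plans, but as the post shows it does nothing if the planner never chooses the index in the first place.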
# Jul 3, 2020 04:43
Pardot posted:Maybe if you pull the select max up into a cte?

No change. But that did get me thinking about encapsulating the subquery in a function, which seems to work? There is one sticky partition that won't use the index, but 5 minutes beats a few hours. Thanks!
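A guess at what the function trick looks like; the table, column, and function names are invented, and the planner explanation is my own reading rather than anything from the thread:

```sql
-- Hiding the max() subquery behind a STABLE function means the outer
-- query just sees a stable scalar call. The executor can evaluate it
-- once at scan start and use the result as an index condition,
-- instead of the planner folding the subquery into the outer plan.
CREATE FUNCTION latest_reading_ts() RETURNS timestamptz
LANGUAGE sql STABLE AS $$
  SELECT max(ts) FROM readings
$$;

SELECT * FROM readings WHERE ts = latest_reading_ts();
```

Runtime partition pruning (Postgres 11+) can also kick in for stable expressions like this, which would fit the "all but one sticky partition" result.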
# Jul 3, 2020 20:38
abelwingnut posted:is there a good free db or table for all postal/zip/mail codes in the world?

Addresses are the second biggest clusterfuck in computer science, right after time. Even if a specific country doesn't change its scheme often, there are nearly 200 countries, so as a whole they change all the time. Postal codes in many countries are unique or near-unique identifiers, so they'll change constantly.

Ruggan posted:Not a Postgres expert (I do mostly MS SQL), but maybe do something like this?

The query runs through an ETL tool that does not support that. It has to be a single self-contained query.
# Jul 4, 2020 04:48
CarForumPoster posted:I have a novice DB choice question. I'd like to choose something that will work for the next 2-3 years.

Look at moving the big ol' JSON string to a different table with a key to join them. You might see performance gains, you might not.

Can't there be multiple attorneys per case? How are you handling that in one row?

I would also look at creating partitions for the table based on the filing date, or some other date that never changes, then default to a time-limited search. This splits your indexes up by partition and avoids searching the entire history, since most of the time people should know roughly when a case was filed.
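A sketch of both suggestions in DDL. Every name here is hypothetical; the one real constraint worth noting is that a partitioned table's primary key must include the partition column:

```sql
-- Case table range-partitioned by a date that never changes.
CREATE TABLE cases (
    case_id     int NOT NULL,
    filing_date date NOT NULL,
    case_name   text,
    PRIMARY KEY (case_id, filing_date)
) PARTITION BY RANGE (filing_date);

CREATE TABLE cases_2020 PARTITION OF cases
  FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

-- The big JSON payload lives elsewhere, joined only when needed.
CREATE TABLE case_documents (
    case_id int NOT NULL,
    payload jsonb
);
```

Keeping the payload out of the hot table keeps the case rows narrow, so scans and index-only lookups touch far fewer pages.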
# Jul 16, 2020 17:15
CarForumPoster posted:Yes there can. Right now I have a bad system where I separate into a "lead attorney" and a list of other attorneys. This works for the cases I started this with, but with the subsequent data it's not a great system. I'm tempted to avoid having a bunch of tables that reference each other, though, if only because it makes the performance worse and the data analysis harder.

It depends on how you access things, but lookups might improve your performance. Names as strings are big; ints are small. The associative table between Case and Attorney will scan much faster than the Case table: TimestampTZ64(FilingDate) + Int32(CaseID) + Int32(AttorneyID) + Int32(AttorneyType) = 160 bits = 20 bytes per row.

Postgres is decent at generating plans, so it should do a scan on Attorney, then an index lookup on the Case/Attorney associative table, then a partition/PK lookup on Case. In that case the only table scan is on the Attorney table, which can't possibly be that big; it'll probably sit in memory most of the time. From there it uses indexes to avoid scanning the big table.

It seems dumb and extra effort, but warehouses are the one place where bit size and all that jazz still matters, because a hundred bytes here or there times a few million or a billion rows adds up fast. When you're running that fuzzy-logic search, it's better to run it on an Attorney table with 10,000 rows than a Case table with 10,000,000.
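The associative table being described, sketched with hypothetical names; the column widths match the 64 + 32 + 32 + 32 = 160-bit arithmetic above:

```sql
CREATE TABLE case_attorney (
    filing_date   timestamptz NOT NULL,  -- 64 bits
    case_id       int NOT NULL,          -- 32 bits
    attorney_id   int NOT NULL,          -- 32 bits
    attorney_type int NOT NULL,          -- 32 bits; e.g. 1 = lead, 2 = co-counsel
    PRIMARY KEY (case_id, attorney_id)
);

-- Supports the attorney -> cases direction of the lookup.
CREATE INDEX ON case_attorney (attorney_id);
```

The attorney_type column replaces the "lead attorney vs. list of others" split with one uniform row shape per attorney-on-case.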
# Jul 16, 2020 20:50
CarForumPoster posted:I'm worried that was the argument that led to this nearly unusable API layout: https://www.courtlistener.com/api/rest-info/ Take a look at that diagram and try to figure out how to find all of "Bob Loblaw's" cases in

Yeah, that is a bit over-designed for your purposes. Build the data model out a few ways, use a data generator to fill out the tables, and then start querying it. A lot of these questions can only be answered by experimenting.
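For the data-generator step, generate_series plus random() is the usual Postgres-native way to fabricate test rows. Table and column names here are made up:

```sql
-- A million synthetic rows for load-testing a candidate model.
CREATE TABLE test_cases AS
SELECT g AS case_id,
       date '2015-01-01' + (random() * 2000)::int AS filing_date,
       md5(g::text) AS case_name
FROM generate_series(1, 1000000) AS g;

-- Give the planner statistics before comparing query plans.
ANALYZE test_cases;
```

Repeating this for each candidate schema and running the real queries against them answers the "how many tables is too many" question empirically.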
# Jul 17, 2020 05:05
Mox Hombre posted:yo what's up YOSPOS long time no see. code:
https://www.postgresql.org/docs/11/tutorial-window.html
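Since the original code block wasn't captured, here is the canonical example from that linked tutorial page (the empsalary table is the docs' own sample):

```sql
-- Compare each salary with the average for that employee's
-- department, without collapsing rows the way GROUP BY would.
SELECT depname, empno, salary,
       avg(salary) OVER (PARTITION BY depname) AS dept_avg
FROM empsalary;
```

The OVER clause is what makes it a window function: the aggregate is computed per partition but every input row survives into the output.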
# Jul 24, 2020 18:36