PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Olly the Otter posted:

I'd be astonished if it's actually running the subquery multiple times, with row deletion happening in between. Is there a query I could try that might confirm this behavior deterministically?

I tried EXPLAIN, but I'm not very good at reading the output. I can tell that it's basically saying the query's inefficient due to the bad primary key, but not sure what else is helpful here.

code:
mydb=> explain delete from tasks where tid in (select tid from tasks order by tid limit 1 for update) returning dsid;
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Delete on tasks  (cost=43.94..82.48 rows=1 width=34)
   ->  Hash Semi Join  (cost=43.94..82.48 rows=1 width=34)
         Hash Cond: (tasks.tid = "ANY_subquery".tid)
         ->  Seq Scan on tasks  (cost=0.00..32.60 rows=2260 width=10)
         ->  Hash  (cost=43.92..43.92 rows=1 width=32)
               ->  Subquery Scan on "ANY_subquery"  (cost=43.90..43.92 rows=1 width=32)
                     ->  Limit  (cost=43.90..43.91 rows=1 width=10)
                           ->  LockRows  (cost=43.90..72.15 rows=2260 width=10)
                                 ->  Sort  (cost=43.90..49.55 rows=2260 width=10)
                                       Sort Key: tasks_1.tid
                                       ->  Seq Scan on tasks tasks_1  (cost=0.00..32.60 rows=2260 width=10)

I can read it and it doesn't match your example. There are 2260 rows here, not five, and it plans to delete 1 row, not all five. Note that it's locking all 2260 rows, which is probably not what you want. Meanwhile the subquery scan is left in a form that gives the query planner the freedom to make multiple passes.

Run EXPLAIN ANALYZE to see the actual number of loops executed. Put the subquery in a CTE to materialize it and keep the query planner from second-guessing what you want.
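Something along these lines (an untested sketch against your table) should do it; in older Postgres a CTE is an optimization fence, so the subselect gets materialized and evaluated exactly once (newer versions let you spell that out with AS MATERIALIZED):

code:
WITH doomed AS (
    SELECT tid FROM tasks ORDER BY tid LIMIT 1 FOR UPDATE
)
DELETE FROM tasks
WHERE tid IN (SELECT tid FROM doomed)
RETURNING dsid;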

You get different results in different cases because the optimizer uses updated statistics each time the query runs.


Olly the Otter
Jul 22, 2007
Ok, here's an EXPLAIN ANALYZE, in a case where it's definitely deleting 6 rows:

code:
tadpole=> explain analyze delete from tasks where tid in (select tid from tasks order by tid limit 1 for update) returning dsid;
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Delete on tasks  (cost=1.01..2.05 rows=1 width=34) (actual time=0.089..0.141 rows=6 loops=1)
   ->  Nested Loop Semi Join  (cost=1.01..2.05 rows=1 width=34) (actual time=0.076..0.112 rows=6 loops=1)
         Join Filter: (tasks.tid = "ANY_subquery".tid)
         ->  Seq Scan on tasks  (cost=0.00..1.00 rows=1 width=10) (actual time=0.020..0.023 rows=6 loops=1)
         ->  Subquery Scan on "ANY_subquery"  (cost=1.01..1.03 rows=1 width=32) (actual time=0.013..0.013 rows=1 loops=6)
               ->  Limit  (cost=1.01..1.02 rows=1 width=10) (actual time=0.011..0.011 rows=1 loops=6)
                     ->  LockRows  (cost=1.01..1.02 rows=1 width=10) (actual time=0.011..0.011 rows=1 loops=6)
                           ->  Sort  (cost=1.01..1.01 rows=1 width=10) (actual time=0.004..0.005 rows=4 loops=6)
                                 Sort Key: tasks_1.tid
                                 Sort Method: quicksort  Memory: 25kB
                                 ->  Seq Scan on tasks tasks_1  (cost=0.00..1.00 rows=1 width=10) (actual time=0.006..0.008 rows=6 loops=1)
 Planning time: 0.303 ms
 Execution time: 0.212 ms
(13 rows)

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Loops=6 all over the place.

Look at these two lines:
Seq Scan on tasks (cost=0.00..1.00 rows=1 width=10) (actual time=0.020..0.023 rows=6 loops=1)
-> Subquery Scan on "ANY_subquery" (cost=1.01..1.03 rows=1 width=32) (actual time=0.013..0.013 rows=1 loops=6)

Read bottom up. It did the sub-query scan six times and turned that into 6 rows.


The simplest, and probably slightly incorrect, explanation I can give: it's forced to do the sequential scan to fulfill the IN operator because there's no index, and the stored statistics on the table are such that the query planner runs multiple loops to fulfill that condition. For example, there may actually be six stored data pages for the table (or the database may decide to do some of the evaluation in parallel), so it breaks the conditional validation into multiple pieces to save time and memory, or to avoid locking multiple pages at the same time.

PhantomOfTheCopier fucked around with this message at 23:24 on Jun 27, 2019

Olly the Otter
Jul 22, 2007
Ok, that helps, thanks.

redleader
Aug 18, 2005

Engage according to operational parameters
That almost sounds like a bug, particularly when the same query gives different results based on... something.

McGlockenshire
Dec 16, 2005

GOLLOCKS!

Olly the Otter posted:

Edit2: The EXPLAIN seems to have changed -- the previous EXPLAIN was before the table had become corrupted, so here's one from this apparently corrupted table:

Have you ANALYZEd the table since the corruption?

Dominoes
Sep 20, 2007

Hey dudes. How do you run PgAdmin4 in Ubuntu? In win, it's an installer. The official instructions are old/not good/don't work with new versions of Ubuntu. There's an approach where you can install as a python wheel, but I'm getting permission or package not installed errors even after setting up the wheel (the script attempts to edit something in the protected /etc directory, but if you sudo it, you then lose access to the packages the wheel installed). This is oddly difficult / not user-friendly.

Dominoes fucked around with this message at 12:52 on Jul 14, 2019

SixPabst
Oct 24, 2006

Hi all. I have a date query question that I can't figure out this late in the day.

Say I have a table that looks like

code:
|   PK     |   CompanyId             |   StartDate  |   EndDate     |
---------------------------------------------------------------------------------
|   1      |   1                     |   1/1/2019   |   12/30/2019  |
|   2      |   1                     |   3/1/2019   |    2/29/2020  |
|   3      |   2                     |   2/1/2019   |   12/30/2019  |
|   4      |   2                     |   1/1/2020   |   12/30/2020  |
What I want to do is find all records for any company where start date and end date overlap or intersect. In this example, I'd get rows 1 and 2 back. This seems straightforward by joining the table to itself and doing some date comparison but I'm not getting the results I expect and feel like I'm missing something obvious.

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy

SixPabst posted:

Hi all. I have a date query question that I can't figure out this late in the day.

Say I have a table that looks like

code:
|   PK     |   CompanyId             |   StartDate  |   EndDate     |
---------------------------------------------------------------------------------
|   1      |   1                     |   1/1/2019   |   12/30/2019  |
|   2      |   1                     |   3/1/2019   |    2/29/2020  |
|   3      |   2                     |   2/1/2019   |   12/30/2019  |
|   4      |   2                     |   1/1/2020   |   12/30/2020  |
What I want to do is find all records for any company where start date and end date overlap or intersect. In this example, I'd get rows 1 and 2 back. This seems straightforward by joining the table to itself and doing some date comparison but I'm not getting the results I expect and feel like I'm missing something obvious.

code:
select
	*
	from @mytable m1
	join @mytable m2 on m1.CompanyID = m2.CompanyID and m1.ID > m2.ID
	where (m1.StartDate between m2.StartDate and m2.EndDate)
	or (m2.StartDate between m1.StartDate and m1.EndDate)
with more than 4 records in the table you'll probably end up with duplicates from this i think? don't have time to work that out right now but here's a starting point
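the plain interval-overlap test might be cleaner anyway (two ranges overlap iff each one starts on or before the other's end) - untested sketch against the same hypothetical @mytable:

code:
select
	m1.*
	from @mytable m1
	join @mytable m2 on m1.CompanyID = m2.CompanyID and m1.ID <> m2.ID
	where m1.StartDate <= m2.EndDate
	and m2.StartDate <= m1.EndDate
that returns every row that overlaps another row for the same company (rows 1 and 2 in your example); slap a DISTINCT on it if a row can overlap more than one other row.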

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Yeah not entirely clear. If window A overlaps B, and B overlaps C, but C doesn't overlap A, what is the desired result?

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Our resident SQL performance guy shared this brain teaser in our internal forums this week:
(our database is multi-tenant)

Performance Guy posted:

In 5 seconds or less, identify the problem with this query:
code:
SELECT DISTINCT
  O.Tenant_ID,
  O.Order_Key
FROM dbo.Shipment AS S
JOIN dbo.Shipment_Item AS SI
  ON SI.Tenant_ID = S.Tenant_ID 
  AND SI.Shipment_Key = S.Shipment_Key
JOIN dbo.Order_Item AS O
  ON O.Tenant_ID = SI.Tenant_ID
  AND O.Order_Item_Key = SI.Order_Item_Key
WHERE S.Tenant_ID = @Tenant_ID
  AND (@Shipment_Method_Key IS NOT NULL AND @Shipment_Method_Key = S.Shipment_Method_Key) OR (@Shipment_Key IS NOT NULL AND S.Shipment_Key = @Shipment_Key)

The problem being that the OR in the WHERE clause isn't properly wrapped in parentheses, so the WHERE clause is logically:
code:
WHERE
  (
    S.Tenant_ID = @Tenant_ID
    AND (@Shipment_Method_Key IS NOT NULL AND @Shipment_Method_Key = S.Shipment_Method_Key)
  )
  OR (@Shipment_Key IS NOT NULL AND S.Shipment_Key = @Shipment_Key)

When it SHOULD be:
code:
WHERE
  S.Tenant_ID = @Tenant_ID
  AND (
    (@Shipment_Method_Key IS NOT NULL AND @Shipment_Method_Key = S.Shipment_Method_Key)
    OR (@Shipment_Key IS NOT NULL AND S.Shipment_Key = @Shipment_Key)
  )
so that the data returned is all from the passed in tenant instead of from unrelated other customers of ours.

After half a dozen quick and correct analyses by engineers like me, a customer support guy chimes in:

Not a developer guy posted:

Well, to begin with there is no structure to the query. I was taught to focus on structure so that the query is easily readable. Even if your code ran 500 times faster but no structure, you failed.



I would have to give it structure before commenting.



This is an example of one I wrote.

code:
SELECT  distinct
   c.item_number,
   c.revision,
   e.Creation_Step_Codename,
   d.Price_Component,
   max(a.add_date)

from   Item_Price_Component a inner join Item_Price b
         on a.Item_Price = b.Item_Price
       Inner join Item_List c
        on b.Item_Key = c.Item_Key
       inner join Supplier_Item_Price_Component d
        on  a.Supplier_Item_Price_Component_Key = d.Supplier_Item_Price_Component_Key
       inner join Item_Creation_Step f
        on  f.Item_Key  = c.item_key
         and f.Item_Creation_Step_Key = b.Item_Creation_Step_Key       
       inner join Creation_Step e
          on  e.Creation_Step_Key = f.creation_step_key
         
           
         


where  e.material = '1' and b.active = 'True'

Group by c.item_number,c.revision,e.Creation_Step_Codename,d.Price_Component

:fuckoff: with that poo poo, man. The only changes I made were to rename the tables and columns. All formatting, capitalization, and aliasing are as they appeared.

I want to light this dude's keyboard on fire.

Another high up engineer and performance guy had a very diplomatic response:

Performance Guy 2 posted:

Not a developer guy,

Other than the poorly formed OR where condition that was the point of his example, Performance Guy's code correctly follows our SQL Standards and is laid out logically and consistently.



You may wish to review this document if you want to be able to write and format SQL in a way acceptable to our developers: <link to the SQL standards guide>

Tibalt
May 14, 2017

What, drawn, and talk of peace! I hate the word, As I hate hell, all Montagues, and thee

Can someone explain the difference between SSMS, SSDT, SSRS, and SSIS to me like I'm an idiot.

Because I thought I understood the difference but either I don't or my director doesn't, and I'm trying not to look like I'm an rear end in a top hat if I'm the one being the dumbass.

Joda
Apr 24, 2010

When I'm off, I just like to really let go and have fun, y'know?

Fun Shoe
500 times faster? I'd make the most unreadable, unmaintainable piece of poo poo you'd ever seen, involving every bad practice you could think of and multiple fast_sqrt magic-number equivalents, if I could make a central, heavy, and often-used query 500 times faster.

Shy
Mar 20, 2010

Tibalt posted:

Can someone explain the difference between SSMS, SSDT, SSRS, and SSIS to me like I'm an idiot.

Because I thought I understood the difference but either I don't or my director doesn't, and I'm trying not to look like I'm an rear end in a top hat if I'm the one being the dumbass.

SSMS is a gui for the database. SSDT is a set of db development tools that integrate into VS, SSRS is tools for reports, SSIS is programmability/ETL. Which part is confusing? Just google any of those terms, MS docs explains it well enough.

Shy fucked around with this message at 18:42 on Jul 19, 2019

Tibalt
May 14, 2017

What, drawn, and talk of peace! I hate the word, As I hate hell, all Montagues, and thee

Shy posted:

SSMS is a gui for the database. SSDT is a set of db development tools that integrate into VS, SSRS is tools for reports, SSIS is programmability/ETL. Which part is confusing? Just google any of those terms, MS docs explains it well enough.
Well, here's my understanding: SSDT slots into Visual Studio, and SSRS/SSIS slot into SSDT. SSMS is a separate thing, right? You wouldn't use SSMS to create an SSRS project, right? If I'm supposed to be using SSRS/SSIS, I need SSDT and VS installed on my computer, right?

The reason for my frustration is that I've been hired onto this new startup, and things are a bit wacky. I'm being told that I have everything I need, but I only have SSMS installed. Sure, I can write scripts and use the Wizards to get by, but I need SSDT, right? Before I start pushing back on this, I really want to make sure I'm not making an rear end of myself.

Shy
Mar 20, 2010

Tibalt posted:

Well, here's my understanding: SSDT slots into Visual Studio, and SSRS/SSIS slot into SSDT. SSMS is a separate thing, right? You wouldn't use SSMS to create an SSRS project, right? If I'm supposed to be using SSRS/SSIS, I need SSDT and VS installed on my computer, right?

The reason for my frustration is that I've been hired onto this new startup, and things are a bit wacky. I'm being told that I have everything I need, but I only have SSMS installed. Sure, I can write scripts and use the Wizards to get by, but I need SSDT, right? Before I start pushing back on this, I really want to make sure I'm not making an rear end of myself.

Yes you need VS and SSDT (data storage and processing workload in the VS installer) for development, see https://docs.microsoft.com/en-us/sql/integration-services/integration-services-ssis-development-and-management-tools?view=sql-server-2017
SSMS is a management tool, if you're developing you'll need VS

Tibalt
May 14, 2017

What, drawn, and talk of peace! I hate the word, As I hate hell, all Montagues, and thee

Shy posted:

Yes you need VS and SSDT (data storage and processing workload in the VS installer) for development, see https://docs.microsoft.com/en-us/sql/integration-services/integration-services-ssis-development-and-management-tools?view=sql-server-2017
SSMS is a management tool, if you're developing you'll need VS
Update: My director thought SSMS and Visual Studio were the same thing, and requested only SSMS for me. When he was telling me I had everything I needed, he was referring to the completed ticket to install SSMS, I don't think he read my request for VS. Thank you for assuring me I wasn't crazy.

ModeSix
Mar 14, 2009

I am trying to design a database schema and I have a question about the best way to do something.

I want to have a ratings database that will have a few thousand items that can be rated by each user. Each user can only rate an item once. I need to be able to get an average rating for each item. New items will be periodically added to the database which can then be rated.

Which of the following scenarios will work best?

1) Have a separate table of ratings for each item -> store the userid that rated the item and the rating they gave.
ex: item123rating -> userid, rating
item1234rating -> userid, rating

or

2) Have the rating tied to the user -> a field for each item attached to the user table.
user123 -> item123rating, rating
user123 -> item1234rating, rating

or

3) ????? a better way that someone with more experience can recommend

I seem to think the first method is the best way to do it, but I am no database expert.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
You need a table with 2 foreign keys - one to the user table and one to the "rateable items" table. This table will have either 3 or 4 columns: the 2 foreign keys, the rating given by the user, and optionally an artificial primary key. (If you do not use an artificial primary key, then the primary key is a composite key consisting of the two foreign keys in one order or the other.)

Your first idea is incorrect because it proposes having "a table for each item" where the collection of items can be added to as part of ordinary operation of the software. That's always the wrong solution - categorically.

Your second idea similarly is incorrect because it's also wrong to have "a column for each item" where the collection of items can be added to as part of ordinary operation of the software.

Both of them are wrong not only conceptually, but also more practically because your RDBMS isn't designed to work like that, but it is designed to work with foreign key relationships and more-or-less normalised database table designs. Doing it the right way will also mean that you can more easily write code that does the very simple task you wanted to achieve - getting the average for each item.
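A sketch of the table, with hypothetical names for your user and item tables - note that the composite primary key is what enforces "each user rates an item at most once":

code:
CREATE TABLE UserRating (
    UserID INT NOT NULL REFERENCES Users (UserID),
    ItemID INT NOT NULL REFERENCES Items (ItemID),
    Rating INT NOT NULL,
    PRIMARY KEY (UserID, ItemID) -- one rating per user per item
);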

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
Assuming the table is UserRating (UserID, ItemID, Rating)

I want to find the average rating for item 123:

SELECT AVG(Rating) FROM UserRating WHERE ItemID = 123

I want to find the average rating for each item that has at least one rating:

SELECT ItemID, AVG(Rating) FROM UserRating GROUP BY ItemID

ModeSix
Mar 14, 2009

Hammerite posted:

You need a table with 2 foreign keys - one to the user table and one to the "rateable items" table. This table will have either 3 or 4 columns: the 2 foreign keys, the rating given by the user, and optionally an artificial primary key. (If you do not use an artificial primary key, then the primary key is a composite key consisting of the two foreign keys in one order or the other.)

Your first idea is incorrect because it proposes having "a table for each item" where the collection of items can be added to as part of ordinary operation of the software. That's always the wrong solution - categorically.

Your second idea similarly is incorrect because it's also wrong to have "a column for each item" where the collection of items can be added to as part of ordinary operation of the software.

Both of them are wrong not only conceptually, but also more practically because your RDBMS isn't designed to work like that, but it is designed to work with foreign key relationships and more-or-less normalised database table designs. Doing it the right way will also mean that you can more easily write code that does the very simple task you wanted to achieve - getting the average for each item.


Hammerite posted:

Assuming the table is UserRating (UserID, ItemID, Rating)

I want to find the average rating for item 123:

SELECT AVG(Rating) FROM UserRating WHERE ItemID = 123

I want to find the average rating for each item that has at least one rating:

SELECT ItemID, AVG(Rating) FROM UserRating GROUP BY ItemID

Thank you very much. This makes complete sense to me.

Pollyanna
Mar 5, 2005

Milk's on them.


Today I brought down our site for 30 minutes by running a count on a single table, and what I thought killed the query actually didn’t. How the hell would a count on a single table gently caress things up that badly?

Furia
Jul 26, 2015

Grimey Drawer
No idea. Never worked with DBs connected to websites but I would begin with: How big was the table? How did you run the count? How integral is the table to the website? How active was it at time of querying? Etc etc

Pollyanna
Mar 5, 2005

Milk's on them.


Table’s apparently pretty big, several gigs in size. Flat out select count(*) from table query. The table is pretty important to only half the site, most of the site doesn’t use it, but to the ones that do it’s integral. It gets queried whenever a fairly common page is loaded.

:cripes: I hate my life.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Also depends on the default read isolation level*. See if you can give it a hint that you're OK with an inaccurate count. That'd be with(nolock) in TSQL but IDK how it'd look elsewhere.

*might be transaction isolation level?

Pollyanna
Mar 5, 2005

Milk's on them.


I’m not even gonna bother. This DB and app is a train wreck as-is without me bringing it down.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Yeah most likely you read locked the table. At least in MS SQL, reads usually issue shared locks which allow other reads but block writes. It’s to prevent you from reading data in an inconsistent state.

If your table is huge and the site relies on inserting data into it or updating data in it, your long running query probably blocked that. Isolation levels or nolock hints are the way to avoid that with various associated cons.
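In MS SQL those would look something like this (BigTable is a stand-in for your table; the tradeoff is that the count can be wrong while writes are in flight):

code:
-- per-query hint:
SELECT COUNT(*) FROM BigTable WITH (NOLOCK);

-- or for the whole session:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM BigTable;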

nem
Jan 4, 2003

panel.dev
apnscp: cPanel evolved
Bad query created a table-level lock, and dependent queries swamped the active connections as it paged in-memory tables to disk. Happened to me way back when. If you're on MySQL, set a threshold for the maximum join size, e.g. "[mysqld] => max_join_size = 5000000", or however egregious you want your indecency :whip:

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
It very, very much depends on the database you're using. In any case, please use the information schema for row estimates (with the usual caveats).

Edit: Indeed anything higher than read committed is not typical, so while transaction isolation could cause it, if your app is changing the level you have... bigger concerns, let us say.

Edit: A couple of gigs of data is nothing. That query should run without issue until you get to terabytes.

... which means I have a guess of what you're using, and that it's not tuned for cache and worker memory. This would still require some interesting interplay with other traffic, especially for a 30min outage. What's the nominal TPS otherwise?

PhantomOfTheCopier fucked around with this message at 22:08 on Jul 23, 2019

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Oh boy, transaction isolation levels. Good stuff.

If you're on SQL Server, there's a fast way to get a rough rowcount:
code:
USE Your_Database_Here;

SELECT
  I.[rows]
FROM sys.sysindexes AS I
WHERE
  I.id = OBJECT_ID('Your_Table_Name_Here')
  AND I.indid < 2; -- We're aiming for the PK index
YMMV in other servers.
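The newer catalog view version would be roughly this (same caveat: it's an approximation, not a transactionally consistent count):

code:
SELECT
  SUM(PS.row_count)
FROM sys.dm_db_partition_stats AS PS
WHERE
  PS.[object_id] = OBJECT_ID('Your_Table_Name_Here')
  AND PS.index_id < 2; -- heap or clustered index only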

Pollyanna
Mar 5, 2005

Milk's on them.


The data we’re trying to get doesn’t even belong in our database so it can go to hell as far as I’m concerned. Not even my half of the application. I try to debug it and it spits in my face. gently caress it.

Macichne Leainig
Jul 26, 2012

by VG

Pollyanna posted:

Table’s apparently pretty big, several gigs in size. Flat out select count(*) from table query. The table is pretty important to only half the site, most of the site doesn’t use it, but to the ones that do it’s integral. It gets queried whenever a fairly common page is loaded.

:cripes: I hate my life.

I don't know if this works outside of MSSQL but you might be able to do a select count(1) from table, that way it doesn't have to bother retrieving all of the data just to count rows. If I remember right it should just return a '1' for each row in the table.

Been a hot minute since I've touched a real SQL database, so I'm probably wrong. Either way the with (nolock) hints as suggested before should at least stop the query from deadlocking the table.

Furia
Jul 26, 2015

Grimey Drawer

Protocol7 posted:

I don't know if this works outside of MSSQL but you might be able to do a select count(1) from table, that way it doesn't have to bother retrieving all of the data just to count rows. If I remember right it should just return a '1' for each row in the table.

Been a hot minute since I've touched a real SQL database, so I'm probably wrong. Either way the with (nolock) hints as suggested before should at least stop the query from deadlocking the table.

This was my thought as well. It sounds right but I can’t quite commit to it for some reason

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
It depends on the database. Some actually select *, some prefer select(id), etc.

redleader
Aug 18, 2005

Engage according to operational parameters
What RDBMS?

Joda
Apr 24, 2010

When I'm off, I just like to really let go and have fun, y'know?

Fun Shoe
How come binary database interfaces aren't more common with ORMs? It seems a waste to communicate using massive strings if you, for instance, just want to select a tonne of data by IDs.

Tibalt
May 14, 2017

What, drawn, and talk of peace! I hate the word, As I hate hell, all Montagues, and thee

I've inherited a script that I'm updating for the current year, and it takes forever, so I'm trying to optimize it. I've got three questions that I'm not quite sure about.

Would using UNION ALL with two SELECT DISTINCT queries provide a benefit over UNION, if I can assume the two select queries aren't going to have any duplicate records?

I've read that using INNER JOINs is more efficient. Does that rule still apply to UPDATE queries, like this?
code:
 update a
 set
	a.Flag = '1'
from
	TABLE_1 a,
	(select distinct ID from TABLE_1 where Other_Flag = 'Y') b
where
	a.ID = b.ID
versus
code:
 update a
 set
	a.Flag = '1'
from
	TABLE_1 a
INNER JOIN
	(select distinct ID from TABLE_1 where Other_Flag = 'Y') b
ON
	a.ID = b.ID
Would getting rid of the DISTINCT in the SELECT subquery above help performance?

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


Those should plan to the same anyway. What could be faster depending on the size of the data would be using an IN clause:

code:
UPDATE
  table
SET
  flag=1
WHERE
  id IN (
  SELECT
    id
  FROM
    table
  WHERE
    otherflag='Y'
)
If id is the primary key then this is total overkill anyway, and could just be
code:
UPDATE
  table
SET
  flag=1
WHERE
  otherflag='Y'
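Or an EXISTS, which sidesteps the DISTINCT entirely since it only has to find one matching row per id - untested sketch in the same shape:

code:
UPDATE
  table
SET
  flag=1
WHERE
  EXISTS (
  SELECT
    1
  FROM
    table t2
  WHERE
    t2.id = table.id
    AND t2.otherflag='Y'
)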

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


The answer to all your questions is “look at the query plan”. But I’ll try to address them.

Tibalt posted:

Would using UNION ALL with two SELECT DISTINCT queries provide a benefit over UNION, if I can assume the two select queries aren't going to have any duplicate records?

UNION ALL is usually faster than UNION, since it skips the duplicate-elimination step (though maybe not here, given the potential transmission of more records). I don't know if your two DISTINCTs will be faster or slower than a single UNION. It's gonna depend on your data and schema.

Tibalt posted:

I've read that using INNER JOINs is more efficient. Does that rule still apply to UPDATE queries, like this?
code:
 update a
 set
	a.Flag = '1'
from
	TABLE_1 a,
	(select distinct ID from TABLE_1 where Other_Flag = 'Y') b
where
	a.ID = b.ID
versus
code:
 update a
 set
	a.Flag = '1'
from
	TABLE_1 a
INNER JOIN
	(select distinct ID from TABLE_1 where Other_Flag = 'Y') b
ON
	a.ID = b.ID
Would getting rid of the DISTINCT in the SELECT subquery above help performance?

INNER JOINs aren’t blanket more efficient. Also the comma operator is functionally equivalent to an inner join.

Hard to say if getting rid of the DISTINCT will improve performance - gotta look at that plan.


Tibalt
May 14, 2017

What, drawn, and talk of peace! I hate the word, As I hate hell, all Montagues, and thee

Unfortunately, no, not a PK. It's more like an invoice number - I want to flag the whole invoice if one of the items meets the criteria.

What size are we talking about for the IN clause? I'm looking at just over a million records here, with a couple hundred thousand ID numbers.

Edit: Thanks for the reply Ruggan as well. I'll take a look at the query plan as well.

Tibalt fucked around with this message at 20:27 on Jul 30, 2019
