|
OK, I've got an XQuery on SQL 2005 question here. I'm working with an XML structure that we store in a SQL Server 2005 database. The stored XML structures are large, and prohibitively expensive to display all at once via an XSLT transformation, which is the current display method. I'm looking to implement filtering, ordering, and paging on the XML data. I've got a basic XML structure that's something like this (though I've changed certain details): code:
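<!-- element names are stand-ins (details changed, as noted); Weight and the
     Shipper's City/State are the parts that matter for the sorting below -->
<Items>
  <Item>
    <ItemNumber>1</ItemNumber>
    <Weight>12.5</Weight>
    <Shipper>
      <City>Springfield</City>
      <State>IL</State>
    </Shipper>
    <!-- ...additional detail elements per item... -->
  </Item>
  <!-- ...potentially 25,000+ Item elements... -->
</Items>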
For performance testing, I ran my functions against a rather large XML structure in our database, which contains over 25,000 (25,057, to be exact) of those Item tags. This is an especially large structure, but it's not far out of line with the largest structures in production. My paging function can fetch this large structure and return, say, items 776-800 in about 25 seconds. However, if I need those same items returned, but with the data sorted by Weight, the OrderByWeight function takes six and a half minutes to execute the following code: code:
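-- (the rough shape, from memory; @ItemXml stands in for the stored XML, and
--  since SQL 2005 XQuery has no "at $pos" clause, the sort happens first and
--  the paging is applied by position afterward)
DECLARE @Sorted xml;
SET @Sorted = @ItemXml.query('
    for $i in /Items/Item
    order by xs:decimal(($i/Weight)[1]) ascending
    return $i');

-- then page out items 776-800 of the sorted sequence
SELECT @Sorted.query('(/Item)[position() >= 776 and position() <= 800]');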
Worse yet, I can apply other ordering methods that have been asked for, like sorting by "<City>, <State>" on the shipper end. I tried to apply this sort to my large XML file before I went for lunch, thinking I'd check the execution time when I returned. 30 minutes later, it was still executing, at which point I cancelled it. That code was very similar to the above, just with this instead: code:
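-- (same shape as above, just a compound sort; again, names changed)
SET @Sorted = @ItemXml.query('
    for $i in /Items/Item
    order by ($i/Shipper/City)[1], ($i/Shipper/State)[1]
    return $i');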
|
# ¿ Jan 14, 2008 22:03 |
|
I'd like to ask here and see if any of you guys have an idea of what I might be able to do to speed up an operation I'm working on. In SQL 2005, I've got a table with a large amount of data, millions of records. The table's primary key is a single ID column that's a uniqueidentifier, and those IDs have been filled, to this point, using newid(). The data I'm really interested in, though, is inside an untyped XML column, which has data something like this: code:
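<!-- a stand-in shape; like the query below, I'm writing this from memory -->
<Record>
  <Status>Closed</Status>
  <!-- ...assorted other elements... -->
</Record>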
The removal operation itself boils down to something like this: code:
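-- (the gist of it; table, column, and element names are stand-ins, and the
--  real version also checks two other, less significant columns)
DECLARE @Cutoff datetime;
SET @Cutoff = DATEADD(year, -1, GETDATE());

DELETE TOP (10000)
FROM dbo.BigTable
WHERE [Date] < @Cutoff
  AND XmlData.exist('/Record[Status = "Closed"]') = 1;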
Now, that's not exactly what I have (I'm running the operation in batches, I'm doing a lookup on two other columns that aren't nearly as significant, and I just rewrote that xquery off the top of my head), but that's the gist of it. I've got an index on the Date column, so that's not really the big issue. The query as it stands now is just entirely too slow: I'm batching in groups of 10,000 records, and removing one batch of 10,000 takes about 15 minutes. That isn't acceptable, because this bit of code has to run alongside several other routine maintenance and cleanup operations inside a short window of scheduled downtime. Outside of that window, the table requires high availability, and I can't be tying it up trying to wipe out large numbers of rows.

I've placed an XML index on the table to try to speed up my .exist() call, but in order to get that XML index on the table, I had to change the existing primary key index from non-clustered to clustered (since a clustered PK is required for the XML index), and I've been reading about how clustered indexes on GUID columns are a bad idea if the GUIDs were created using newid(). I can move us to newsequentialid() going forward, but it'll be several years before the old random data is all gone. However, if I'm going to have an XML index, which seems necessary for any sort of decent speed, I'm going to need that clustered key.

I'm out of ideas for speeding this up. It's unacceptably slow now, but if this is as fast as it gets without significantly restructuring the table somehow, then it's all I can do. I'm just hoping there's some avenue here that I've missed that I can use to increase performance. I do have options for different behavior that might be less costly, but I'd like to be sure first that I haven't missed a way to make the current approach perform, especially since all of those other options mean doing a system impact analysis to see how the other bits of the application riding on this database would react. Any thoughts?
|
# ¿ Sep 22, 2008 07:44 |
|
Jargon posted: Ok this worked:

I don't know Postgres, and I'm coming at this from a SQL 2005 background, so I can't claim to know that this works, but it's what I'd suggest: do the GROUP BY on the line_items table only, and then LEFT OUTER JOIN scenes to your grouped line_items table. Something like code:
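-- (untested sketch; I'm guessing at your column names)
SELECT s.id, s.filename_base, li.line_item_count
FROM scenes s
LEFT OUTER JOIN (
    SELECT scene_id, COUNT(*) AS line_item_count
    FROM line_items
    GROUP BY scene_id
) li ON li.scene_id = s.id;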
I know SQL Server is also picky like this, where a GROUP BY query requires that every column in the select list be either in the GROUP BY or inside an aggregate. Your other option here, if the grouping really only keys on one item like it seems to here (I'm assuming you don't have multiple scenes with the same ID), is to lean on the aggregate functions: select MAX(scenes.filename_base), say. That works because it's an aggregate, and since there's only one value of filename_base (just repeated several times by the join), MAX(filename_base) is simply that value. Again, though, I haven't used Postgres, so my suggestions are probably best taken with a strong dose of documentation. thefncrow fucked around with this message at 18:24 on Sep 22, 2008 |
# ¿ Sep 22, 2008 17:46 |
|
Begby posted: Secondly, is @Roles a string? If so, can't you do AND USER_ROLE_ID IN ( @Roles )? That alone should save some processing time, you would think. I am no SQL Server expert, but I would assume that parse function gets run for every record.

SQL Server will let you do "column IN ( @StringVar )", but when you do, it's equivalent to "column = @StringVar". If you have a comma-delimited list, you have to write something to parse out each item individually, which is what he's doing there.
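The usual SQL 2005 pattern for that looks something like this (a quick sketch off the top of my head, not necessarily the exact function he's using): code:
-- split a comma-delimited @Roles out into rows
DECLARE @Roles varchar(100);
SET @Roles = '1,2,3';

DECLARE @RolesXml xml;
SET @RolesXml = '<r>' + REPLACE(@Roles, ',', '</r><r>') + '</r>';

SELECT r.value('.', 'int') AS RoleID
FROM @RolesXml.nodes('/r') AS t(r);
-- ...and then you filter with USER_ROLE_ID IN (SELECT RoleID FROM ...)
-- instead of comparing against the whole string at once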
|
# ¿ Sep 26, 2008 20:24 |
|
I'm working in a SQL Server 2005 environment here. I have a large data table with a 3-part primary key, something like (CustomerNumber, BillID, LineItem). Changes to this table are difficult at best, because it's large and partitioned. I've got another table now where I'm working with the line items for a given (CustomerNumber, BillID) combination. Basically, this table is an identity int column, CustomerNumber, BillID, and some XML data about those line items and some associated data for each. What I want is a constraint on this table such that, for any inserted row, the (CustomerNumber, BillID) combination matches a record in my larger table. I can't do it as a foreign key, because of the third part of the key (and the PK without the third part is not unique to each record). How can I enforce this? It seems like I'm missing something obvious, but I can't see it.
|
# ¿ Nov 14, 2008 00:52 |
|
Sergeant Hobo posted: I think you're looking for a TSQL trigger. Basically, you would need to run a query on the 3-key table to search for the specific CustomerNumber and BillID combo that is coming into the database and, if it exists in the 3-key table, you go to the INSERT statement and, if not, throw back some kind of error message. Sorry, but I'm not familiar with TSQL; the limited experience I have with this stuff has been through Oracle's PL/SQL.

Yeah, I was just thinking about a trigger, and I think that's the way I'm going to want to go at this point. It just didn't occur to me at first because I kept thinking there was some sort of constraint I could apply to the table that would do the trick, but I think it'll just have to be a trigger. Thanks.
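It'll probably be shaped something like this (off the top of my head; I'm calling the big table BillDetail and the new one BillLineItemXml here): code:
CREATE TRIGGER trg_BillLineItemXml_CheckBill
ON dbo.BillLineItemXml
AFTER INSERT, UPDATE
AS
BEGIN
    -- reject the statement if any inserted row has no matching
    -- (CustomerNumber, BillID) in the big partitioned table
    IF EXISTS (
        SELECT 1
        FROM inserted i
        WHERE NOT EXISTS (
            SELECT 1
            FROM dbo.BillDetail b
            WHERE b.CustomerNumber = i.CustomerNumber
              AND b.BillID = i.BillID
        )
    )
    BEGIN
        RAISERROR('No matching (CustomerNumber, BillID) in BillDetail.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END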
|
# ¿ Nov 14, 2008 03:07 |
|
I'm really at a loss as to how something like this might be done without serious overhead, so I thought I might lob it out here and see if anyone has a solid solution. In SQL Server 2005, I'm working with a table that has a sequential GUID primary key called ID and an attribute for a parent item, ParentID. ParentID holds the key GUID of the item that this item refers to, and is NULL for root items. Additionally, there is a datetime of when the record was added to the table, DateProcessed. These parent references cascade, so that, for example, we might have one root item with two children, nodes 2 and 3, and then nodes 4 and 5, which are children of nodes 2 and 3, respectively. The table is very large, 200,000+ records, and each grouping is relatively small (most groups don't reach 5 records, though they could), giving me a large number of these groups. Given this, I want to return just the leaf node that was most recently inserted for each such grouping in the entire table, capped at the 1000 most recent records returned. The best I've been able to come up with relies on the fact that, with newsequentialid(), items more recently inserted have GUID values greater than previous items. Based on that, I was able to generate this: code:
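-- (the general shape of it, reconstructed from memory and simplified; the
--  analytic function picks the newest leaf per group via the sequential GUIDs)
;WITH Tree AS (
    SELECT ID, ParentID, DateProcessed, ID AS ItemGroup
    FROM dbo.Items
    WHERE ParentID IS NULL
    UNION ALL
    SELECT i.ID, i.ParentID, i.DateProcessed, t.ItemGroup
    FROM dbo.Items i
    INNER JOIN Tree t ON i.ParentID = t.ID
),
RankedLeaves AS (
    SELECT t.ID, t.ItemGroup, t.DateProcessed,
           ROW_NUMBER() OVER (PARTITION BY t.ItemGroup ORDER BY t.ID DESC) AS rn
    FROM Tree t
    -- a leaf is any node that no other row points at
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Items c WHERE c.ParentID = t.ID)
)
SELECT TOP 1000 ID, ItemGroup, DateProcessed
FROM RankedLeaves
WHERE rn = 1
ORDER BY ID DESC;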
If I move the TOP 1000 from the outer query into the CTE's root expression, I get incredible performance gains, as you might expect. However, the next step, once I have this, is to add some additional WHERE clauses to the outer query to provide the ability to search for certain items, and that filtering needs to occur outside, or else I'm left searching inside that TOP 1000 set instead of searching the whole set and placing a 1000 record limit on the results. Is there an easier way to do this that I'm just not seeing?
|
# ¿ Nov 13, 2009 22:33 |
|
Saltin posted: I have a simple one. I'm not much of a SQL fellow, as you can probably guess from this question!

If you just want the name of the parent company, then what I'd suggest is to do something like this: code:
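-- (guessing at your table and column names here)
SELECT c.CompanyName,
       p.CompanyName AS ParentCompanyName
FROM Companies c
LEFT OUTER JOIN Companies p ON p.CompanyID = c.ParentCompanyID;
-- the LEFT JOIN keeps companies with no parent, with a NULL parent name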
|
# ¿ Nov 13, 2009 22:49 |
|
I actually got this solved late on Friday, but I appreciate the help.

Nurbs posted: I'm not up on performance, but couldn't you do something like this?

That would pull back all leaf nodes on the tree. Imagine the circumstance Root -> Child -> (Grandchild1, Grandchild2). Grandchildren 1 and 2 both have ID values that don't appear as ParentID values, so they'd both be returned, instead of just the most recent of the two.

Jethro posted: I don't know if this will be any faster, but why not have the ItemGroup in your CTE just be the ID of the top parent item? That way you don't have to do the analytic function.

This was actually part of my eventual solution. I went with the group item as the ID value of the root node, and, sloppy as it is, I flattened out the CTE to rid myself of it entirely. That meant losing recursion and going to a fixed number of levels, but based on what the Items table does, it turned out the application would already experience weirdness if we ever went beyond 3 levels of tree depth. So I compromised on that and went with 3 levels of depth instead of building it recursively, and I ended up with something like code:
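-- (roughly; I'm trimming the column list, and this still leans on the
--  sequential GUIDs, where the biggest ID among a group's leaves is the newest)
;WITH Flattened AS (
    SELECT root.ID AS ItemGroup,
           COALESCE(gc.ID, child.ID, root.ID) AS LeafID,
           COALESCE(gc.DateProcessed, child.DateProcessed, root.DateProcessed) AS LeafDate,
           ROW_NUMBER() OVER (PARTITION BY root.ID
                              ORDER BY COALESCE(gc.ID, child.ID, root.ID) DESC) AS rn
    FROM dbo.Items root
    LEFT JOIN dbo.Items child ON child.ParentID = root.ID
    LEFT JOIN dbo.Items gc ON gc.ParentID = child.ID
    WHERE root.ParentID IS NULL
)
SELECT TOP 1000 ItemGroup, LeafID, LeafDate
FROM Flattened
WHERE rn = 1
ORDER BY LeafID DESC;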
|
# ¿ Nov 16, 2009 20:50 |
|
|
A question for you guys. I've been trying to google this, but I keep getting results that don't match what I'm looking for. I'm on SQL Server 2005 with an insert query where I need to fetch the ID of the row I just inserted. I know about scope_identity(), but the ID column on this table is a GUID filled in by a default of newid(), and the table has no identity column. My first attempt was that I wanted to do a query like this: code:
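-- (table and column names here are stand-ins; this is the shape of what I
--  tried, and SQL Server won't take a scalar variable after OUTPUT ... INTO)
DECLARE @NewID uniqueidentifier;

INSERT INTO dbo.MyTable (SomeColumn)
OUTPUT inserted.ID INTO @NewID
VALUES ('some value');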
Is it just not possible to stuff the OUTPUT values into a scalar variable directly, without using an intermediary table variable? I'll use the intermediate table variable if I have to, but it seems like there should be a way to do this without it. thefncrow fucked around with this message at 20:43 on Aug 30, 2010 |
# ¿ Aug 30, 2010 20:26 |