|
I have a table that stores the history of status changes for accounts, but there are some bad rows in there. Here is a simplified example: code:
code:
Anyone have any idea how to do this row combination in SQL? I can do it programmatically using row-by-row comparisons, but for a 22 million row table it will take many hours... Thanks.
|
# ? Sep 5, 2009 19:23 |
|
What flavor of db are you using? If it's Oracle you could use the max() and min() partition analytic function dark magic on the valid_from and valid_to columns to get what you're looking for I think. http://psoug.org/reference/analytic_functions.html Also, nice user name. Vanadium Dame fucked around with this message at 21:04 on Sep 5, 2009 |
# ? Sep 5, 2009 21:02 |
|
dwayne_dibbley posted:I have a table that stores the history of status changes for accounts but have some bad rows in there, here is a simplified example: code:
|
# ? Sep 6, 2009 05:02 |
|
Markoff Chaney posted:What flavor of db are you using? If it's Oracle you could use the max() and min() partition analytic function dark magic Yeah, Oracle. I've been trying to use the analytical functions but so far no success... :-(
|
# ? Sep 6, 2009 10:47 |
|
Xae posted:
Thanks, unfortunately this combines too many of the rows. The A statuses are combined when they shouldn't be, and the first B shouldn't be touched either.
|
# ? Sep 6, 2009 10:49 |
|
dwayne_dibbley posted:... Something like this might work, haven't done any serious PL/SQL ever and no Oracle nearby to test it. code:
|
# ? Sep 7, 2009 09:09 |
|
dwayne_dibbley posted:Thanks, unfortunately this combines too many of the rows. The A statuses are combined when they shouldn't be, and the first B shouldn't be touched either. This should work assuming that all rows have a non-null valid_from value. It makes liberal use of the lag and lead analytic functions to check the previous and next statuses to determine if the records are sequential. code:
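For anyone finding this later (the original code blocks didn't survive the archive), here's a sketch of the lag()-plus-running-sum version of that trick, tested against SQLite's window functions via Python. The table and column names are invented; the same shape should port to Oracle's analytics:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE status_hist (acct TEXT, status TEXT, valid_from TEXT, valid_to TEXT);
INSERT INTO status_hist VALUES
  ('1', 'A', '2009-01-01', '2009-02-01'),
  ('1', 'A', '2009-02-01', '2009-03-01'),  -- duplicate of the previous status: merge
  ('1', 'B', '2009-03-01', '2009-04-01'),
  ('1', 'A', '2009-04-01', '2009-05-01');  -- A again, but NOT mergeable with the first run
""")

rows = con.execute("""
WITH marked AS (
  SELECT acct, status, valid_from, valid_to,
         -- 1 when a row starts a new run of equal statuses, 0 when it continues one
         CASE WHEN status = LAG(status) OVER (PARTITION BY acct ORDER BY valid_from)
              THEN 0 ELSE 1 END AS is_start
  FROM status_hist
),
grouped AS (
  SELECT *,
         -- running sum of the start flags gives each run its own group number
         SUM(is_start) OVER (PARTITION BY acct ORDER BY valid_from) AS grp
  FROM marked
)
SELECT acct, status, MIN(valid_from) AS valid_from, MAX(valid_to) AS valid_to
FROM grouped
GROUP BY acct, grp
ORDER BY valid_from
""").fetchall()
```

The point is that the adjacent duplicate A rows collapse into one span while the later A run stays separate, which matches the behaviour asked for above.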
|
# ? Sep 7, 2009 18:29 |
|
Clone5 posted:<Snip very cool SQL> (I had to make one very very small correction on the outer-most lead function, in the order by valid_from we now had nulls in the column so I replaced that with an explicit row_number() ordering) code:
But man... I am drat impressed. I had nearly given up! ps. thanks to the previous 3 posters for taking the time to reply.
|
# ? Sep 8, 2009 01:03 |
|
We're using MySQL and I'm having the hardest time getting it to work the way I want it to. There's a table with 3 relevant fields: machineID, switch, and port. code:
I managed to hack together something using a temporary table, but I'd rather it be done all at once. code:
code:
Beardless Woman fucked around with this message at 17:05 on Sep 8, 2009 |
# ? Sep 8, 2009 15:31 |
|
Beardless Woman posted:We're using MySQL and I'm having the hardest time getting it to work the way I want it to. code:
|
# ? Sep 8, 2009 16:37 |
|
Knightmare posted:
blueberrypudding fucked around with this message at 14:59 on Sep 9, 2009 |
# ? Sep 8, 2009 16:46 |
|
You are both absolutely correct. Thank you so much!
|
# ? Sep 8, 2009 17:04 |
|
So I'm working with a medium-sized (~400k row) database of scores. The idea is that every x minutes, a PHP script inserts a bunch of new scores into the database, then calculates the new ranks of each score, handling ties, and inserts that value into the database. Here's the code I was using to do this:code:
I have some other ideas on how to do this, like creating a temporary table then getting the new rank from there, but I'm a bit hesitant to do anything after the debacle that this code caused (it's not my server, I'm doing this as semi-contract work for a friend of a friend) without being 100% sure it'll at least not completely gently caress up a table... any ideas?
|
# ? Sep 10, 2009 20:49 |
|
That query doesn't look so hot (given its length alone). Can you describe how your rank is calculated?
|
# ? Sep 10, 2009 21:18 |
|
Yeah, I just tried running that query again (learning from my mistakes, I exported the database to my own XAMPP server so I had permission to kill the query after it stalled) and the same thing happened, so that query's not usable. The scores are times, so lower is better, but other than that it's what you'd expect: if A's time is 1.2, B's is 1.5, C's is 1.5, and D's is 2.0, A would be rank 1, B and C would both be rank 2, and D would be rank 4. The ranks themselves are calculated in the query, which I pulled from http://www.1keydata.com/sql/sql-rank.html. Again, works great for small tables, but not for something as large as this one. edit: Decided to go the simple route and just use order by/variable incrementing: code:
reworp fucked around with this message at 21:58 on Sep 10, 2009 |
# ? Sep 10, 2009 21:34 |
|
reworp posted:The scores are times, so lower is better, but other than that it's what you'd expect: if A's time is 1.2, B's is 1.5, C's is 1.5, and D's is 2.0, A would be rank 1, B and C would both be rank 2 and D would be rank 4. In other words, a person's rank is exactly equal to the number of people with strictly lower times, plus one. code:
ShoulderDaemon fucked around with this message at 22:39 on Sep 10, 2009 |
# ? Sep 10, 2009 22:30 |
|
ShoulderDaemon posted:In other words, a person's rank is exactly equal to the number of people with strictly lower times, plus one. This is true, but that method is also very, very slow, since you're doing a SELECT COUNT(*) for 400k entries. It was actually how the rank was being calculated before I started working on this. Also, that query gives an error when I tried to run it, "#1093 - You can't specify target table 'parent' for update in FROM clause" Edit: Your new SQL gives the same error. I believe in MySQL you aren't allowed to reference the table that you're updating in derived tables reworp fucked around with this message at 22:49 on Sep 10, 2009 |
# ? Sep 10, 2009 22:38 |
|
I just built a synthetic 0.5mil rowset in Postgres 8.3 to look at that. With an index on scores, the query planner is just touching each row in the score table once, at the top, and to determine the count it looks up the score of that row in the index, then uses a single index check to find the count of lower scores. That's 0.5mil index hits, and 0.5mil row hits, which isn't that bad.

You could do better by walking the index manually in order and maintaining your own next-rank counter, but I don't know if your database lets you write custom index queries. That doesn't actually reduce your hitcount at all, but it preserves locality on the index better, so there's some small constant factor speedup.

If you want to do it with a row walk instead of an index walk, it's similar to what you had before with the @rownum counter, but you need two state variables: the last rank assigned, and the score of that rank. You walk the table in score order, increment the rank variable if the score of the current row is greater than the previous score, then assign the rank. Probably clearest in a stored procedure. That gives you 0.5mil row hits and however many index hits are needed to sort your rows, which if you're clustered on the score index is cheaper than what I had, but otherwise is more expensive.

Edit: Oh, you're in MySQL, no wonder it's slow. You'll probably just want to cluster on a score index and do the row walk yourself as I wrote above; otherwise you'll just be fighting with the terrible query planner forever. ShoulderDaemon fucked around with this message at 23:08 on Sep 10, 2009 |
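The two-state-variable row walk described above, sketched in Python (assuming the scores arrive already sorted ascending; in practice this would live in a stored procedure or the script doing the update):

```python
def assign_ranks(scores_sorted):
    """Walk rows in score order, keeping the last rank assigned and its score.

    Ties share a rank; the rank after a tie skips ahead (competition ranking).
    """
    ranks = []
    last_rank, last_score = 0, None
    for position, score in enumerate(scores_sorted, start=1):
        if score != last_score:          # new, strictly worse score: take its position as the rank
            last_rank, last_score = position, score
        ranks.append(last_rank)          # tied scores reuse the previous rank
    return ranks

# assign_ranks([1.2, 1.5, 1.5, 2.0]) -> [1, 2, 2, 4]
```

This is a single pass, so it's linear in the row count plus whatever the sort costs.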
# ? Sep 10, 2009 23:06 |
|
reworp posted:This is true, but that method is also very, very slow, since you're doing a SELECT COUNT(*) for 400k entries. It was actually how the rank was being calculated before I started working on this. Also, that query gives an error when I tried to run it, "#1093 - You can't specify target table 'parent' for update in FROM clause" code:
(Once defined it is executed by issuing CALL CalculateRanks())
|
# ? Sep 11, 2009 00:09 |
|
Also double check to make sure your engine doesn't already provide ranking functions (SQL Server does).
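For engines that do ship ranking functions (SQL Server then; Oracle, MySQL 8+, and SQLite 3.25+ today), the whole calculation collapses into one window function. A sketch run against SQLite from Python, with invented table and column names; RANK() leaves gaps after ties, matching the behaviour described earlier in the thread:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (player TEXT, time REAL)")
con.executemany("INSERT INTO scores VALUES (?, ?)",
                [("A", 1.2), ("B", 1.5), ("C", 1.5), ("D", 2.0)])

# Lower time is better, so rank ascending; ties share a rank and the next
# rank skips ahead (1, 2, 2, 4).
ranked = con.execute("""
SELECT player, RANK() OVER (ORDER BY time ASC) AS rnk
FROM scores
ORDER BY time, player
""").fetchall()
```

(DENSE_RANK() is the variant that doesn't skip after ties, if that's ever wanted instead.)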
|
# ? Sep 11, 2009 03:43 |
|
I was wondering how one might query for trends in some data where each record has a date field. For example, is it possible, in one query, to find all records which, for the past 3 days, have had a day-to-day net increase in a certain field? I also wondered if it was possible for this to follow date records where there are holes, such that it would jump over the hole and look just at the consecutive days it has recorded, not actual consecutive days in real time. Perhaps an example: code:
|
# ? Sep 13, 2009 06:37 |
|
Can someone please explain how I hosed up? This works: code:
code:
I mean, I understand that WHERE happens before the GROUPing, and HAVING happens after, but I guess I have the order hosed up in my head...
|
# ? Sep 15, 2009 20:32 |
|
Rocko Bonaparte posted:I was wondering how one might query for trends in some data where each record has a date field. For example, is it possible, in one query, to find all records which, for the past 3 days, have had a day-to-day net increase in a certain field? I also wondered if it was possible for this to follow date records where there are holes, such that it would jump over the hole and look just at the consecutive days it has recorded, not actual consecutive days in real time. code:
I'm not sure it does exactly what you want, but it gets at the general idea. The main point is that lead() gets the next row as defined by the windowing clause, regardless of whether the dates are calendar-continuous.
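A concrete version of that lag()/lead() idea, run in SQLite via Python (schema invented): id 'x' rises over its last three recorded days even though there's a calendar hole, while 'y' doesn't, so only 'x' comes back. The window ordering is by recorded date, which is exactly why gaps don't matter:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (id TEXT, d TEXT, val INTEGER)")
con.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    ("x", "2009-09-01", 1),
    ("x", "2009-09-02", 2),
    ("x", "2009-09-05", 3),  # calendar hole before this row, but still rising
    ("y", "2009-09-01", 1),
    ("y", "2009-09-02", 3),
    ("y", "2009-09-03", 2),  # dropped on the last recorded day
])

rising = con.execute("""
SELECT id FROM (
  SELECT id, val,
         LAG(val, 1) OVER (PARTITION BY id ORDER BY d) AS prev1,
         LAG(val, 2) OVER (PARTITION BY id ORDER BY d) AS prev2,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY d DESC) AS rn
  FROM readings
)
WHERE rn = 1              -- look only at each id's most recent recorded row
  AND val > prev1         -- ...and require the last three recorded values
  AND prev1 > prev2       -- to be strictly increasing
""").fetchall()
```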
|
# ? Sep 15, 2009 20:37 |
|
I'm curious about high-level database design using MySQL for a work project. I am designing an asset tracker which utilizes SNMP to gather device data, then slice and dice it various ways and finally display it on a webpage. I'm a Linux SA, so my strengths lie mostly in hardware, systems, coding, and not so much relational databases. Is there a good starting point for intermediate users regarding database design best practices? I'm looking for a book or something. Budget is < $100
|
# ? Sep 16, 2009 02:32 |
|
inveratulo posted:I'm curious about high-level database design using MySQL for a work project. I am designing an asset tracker which utilizes SNMP to gather device data, then slice and dice it various ways and finally display it on a webpage. I'm a Linux SA, so my strengths lie mostly in hardware, systems, coding, and not so much relational databases. If you just search for "data normalization" or "normal forms" (there are five but generally getting to third normal form is good enough) you can probably find a bunch of good tutorials/references. From there you can make intelligent decisions on what you want to denormalize for ease of reporting etc. (you may also find "OLAP" and "data cubing" to be useful topics, but I'd get comfortable with normalization first) Sorry I don't have more specific recommendations but this was actually one of the only useful things I learned in college several decades ago, so I can't even remember what texts we used and I'm sure there is a bunch of better poo poo now anyway.
|
# ? Sep 16, 2009 02:43 |
|
I've been having trouble sorting by dates in my program. We're using an Access database as a backend (yeah...), which is being accessed and controlled by ASP Classic with VBScript. Now everything seems to act swimmingly, until you add dates into the equation. For whatever reason, it just refuses to properly display specific dates and date ranges, or order by them.

The database has been imported from an Excel spreadsheet where the dates are entered as milestones are reached. Thus the date fields have many null values, as this indicates that there is no milestone for this date. Part of the reason I think this may be happening is because Access seems to only want to read the dates in the US-specific way of "MM/DD/YYYY", whereas since we're in Australia, we address it as "DD/MM/YYYY". An Lcid statement in my ASP allowed the ASP pages to display the date correctly, but I still think it's registering them as numbers, rather than actual 'dates'. I have tried using the CDate function both in SQL and in ASP with no luck.

Basically to break it down - a query of SELECT Date WHERE Date BETWEEN CDate("&StartDate&") AND CDate("&EndDate&") will return absolutely nothing, regardless of the values of start date and end date. However, if I add a "IS NOT BETWEEN" clause, it will return everything, making me think they're still recognised as number values. Anyone ever experienced anything like this before? I'm at my wits' end here. I'm starting to wonder if I could reformat Access to recognise YYYY/MM/DD, but it looks fairly difficult to get it both reading them in properly and ASP recognising it properly too.
|
# ? Sep 16, 2009 07:18 |
|
inveratulo posted:I'm curious about high-level database design using MySQL for a work project. I am designing an asset tracker which utilizes SNMP to gather device data, then slice and dice it various ways and finally display it on a webpage. I'm a Linux SA, so my strengths lie mostly in hardware, systems, coding, and not so much relational databases. (pdf link) http://www.karwin.com/downloads/SQL_Antipatterns_MySQLUC2008.pdf
|
# ? Sep 16, 2009 08:56 |
|
Isometric Bacon posted:I've been having trouble sorting by dates in my program. Is the name of your field actually called 'Date'? If so, are you escaping it in your SQL statements properly? Like `Date`?
|
# ? Sep 16, 2009 13:31 |
|
Dr.Khron posted:Can someone please explain how I hosed up? If you remove both >0 clauses and put the SUM() in your select, look at the output and do the filtering process mentally, the difference should be clear.
|
# ? Sep 16, 2009 23:13 |
|
I have two stored procedures that look like this:code:
code:
When I execute CreateSubTable manually, everything looks fine. When I execute CreateMainTable, no rows get inserted. The row count of @temp in CreateSubTable is correct when I execute it and send its output to the console, but when I send its output to another table (@, #, or a permanent table) its row count is 0. What the gently caress is going on? Am I doing something really stupid? So confused. Please help. Thanks!
|
# ? Sep 17, 2009 00:52 |
|
Begby posted:Is the name of your field actually called 'Date'? If so, are you escaping it in your SQL statements properly? Like `Date`? Ah no, it's not, I just used that as an example. My actual code is a dynamically generated SQL string which is constructed from drop-down menus that the user selects (i.e. they select specific WHERE values based on what they enter). The SQL string generally ends up looking something like this once generated: SELECT QRG.*, Admin.Admin_Name, Functional_Area.Functional_Area, Writer.Writer_Name FROM Writer INNER JOIN (Functional_Area INNER JOIN (Admin INNER JOIN QRG ON Admin.Admin_ID = QRG.Admin_ID) ON Functional_Area.ID = QRG.Functional_Area_ID) ON Writer.Writer_ID = QRG.Writer_ID WHERE QRG.QA_Finalised BETWEEN CDate(17/09/2009) AND CDate(17/09/2009) ORDER BY QA_Sent; You can safely ignore most of that, except for the WHERE clause. Basically this will return nothing. If I modify the clause to say IS NOT BETWEEN it will return all the records. I've tried everything I can think of. The date fields are being generated as ASP strings (which I've converted with CDate in both ASP and SQL) but still no luck...
|
# ? Sep 17, 2009 04:40 |
|
Fixed my issue in regards to the SQL string - turns out I needed to cast it into a string before calling CDate on it in the database, confusing! cDate('<%=Value%>') Two simple quotes solve everything. Unfortunately it's still behaving rather strangely. If I do an "ORDER BY" on my database, it will order the values roughly in order of date, but not exactly. I have a feeling this has something to do with the MM/DD/YYYY to DD/MM/YYYY conversion that's done automatically in ASP on the server...
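That "roughly in order" symptom is the classic sign the values are being compared as strings: DD/MM/YYYY text doesn't sort chronologically. A quick Python check of the idea (dates invented); converting to ISO YYYY-MM-DD makes string order and date order agree, which is the locale-proof way out:

```python
from datetime import datetime

dmy = ["10/01/2009", "02/05/2009"]  # 10 January and 2 May, as DD/MM/YYYY strings
string_order = sorted(dmy)           # lexicographic: May sorts before January!

# Reformatted to ISO 8601, lexicographic order IS chronological order.
iso = [datetime.strptime(d, "%d/%m/%Y").strftime("%Y-%m-%d") for d in dmy]
iso_order = sorted(iso)
```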
|
# ? Sep 17, 2009 06:21 |
|
MoNsTeR posted:The count is overstated because in version #2 you are NOT, in fact, counting participants with a balance-in-each-security by security, you are counting participants by security for securities that have a balance. In other words, the SUM() in your HAVING clause does not describe the balance of a participant, it describes the balance of all participants for a security, because sec_name is what you are grouping by. Ahhhhh... that is a very good explanation, THANK YOU.
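The difference shows up on a two-row toy table (SQLite via Python; table and column names invented). WHERE filters individual balances before grouping; HAVING filters the per-security totals after grouping, so a zero-balance participant slips into the second count:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bal (sec TEXT, participant TEXT, amount REAL)")
con.executemany("INSERT INTO bal VALUES (?, ?, ?)",
                [("S1", "p1", 5.0),
                 ("S1", "p2", 0.0)])  # p2 holds no balance in S1

# Version 1: WHERE drops p2's row before grouping -> counts 1 participant.
where_ver = con.execute(
    "SELECT sec, COUNT(*) FROM bal WHERE amount > 0 GROUP BY sec").fetchall()

# Version 2: HAVING only checks the security's TOTAL balance, so p2's
# zero row survives the grouping -> counts 2 participants (overstated).
having_ver = con.execute(
    "SELECT sec, COUNT(*) FROM bal GROUP BY sec HAVING SUM(amount) > 0").fetchall()
```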
|
# ? Sep 17, 2009 16:24 |
|
In general is it faster to just use REPLACE and let it handle insertion and checking for duplicates (in unique keys) or is it better to do a SELECT to check for a row's existence and execute INSERT/UPDATE accordingly? I'm using SQLite3 in this instance if it matters.
|
# ? Sep 17, 2009 20:45 |
|
FYI I had a nested INSERT-EXEC inside of a try-catch. Whoops.
|
# ? Sep 17, 2009 22:34 |
|
Strong Sauce posted:In general is it faster to just use REPLACE and let it handle insertion and checking for duplicates (in unique keys) or is it better to do a SELECT to check for a row's existence and execute INSERT/UPDATE accordingly? I'm using SQLite3 in this instance if it matters. If you have a unique key policing your data it's probably faster to do a REPLACE. In situations where duplicates are infrequent what I personally do is blindly try the insert. If the insert throws a uniqueness exception then I will update/ignore/whatever, as appropriate.
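In SQLite specifically, REPLACE is spelled INSERT OR REPLACE; a minimal sketch via Python (key/value table invented). One caveat worth knowing: SQLite implements it as delete-then-insert, so it fires any DELETE triggers and resets columns you don't list, which is part of why the try-insert-then-update approach above is sometimes preferable:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

# First call inserts; second call hits the unique key and replaces the row.
con.execute("INSERT OR REPLACE INTO kv VALUES ('a', '1')")
con.execute("INSERT OR REPLACE INTO kv VALUES ('a', '2')")

row = con.execute("SELECT v FROM kv WHERE k = 'a'").fetchone()
count = con.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
```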
|
# ? Sep 17, 2009 22:43 |
|
I'm trying to learn triggers in SQL so I've gotten 4 crazy rear end constraints to set up. My problem is that we're usually using DB2 at school but I can't access it from home, and I'd like to know if these triggers would work and if they could be written better. I've also been told that you can't use subqueries in DB2 so I've tried using Views instead where I could. This is the general scheme of the DB: code:
1* amoeller must be invited to any meeting that involves beer. code:
code:
3* People in group phd must always accept meetings with people in group vip. code:
code:
Is there a better way to do these things? Also, is there a better validator around than the built-in one in DB2? That thing throws all sorts of crazy poo poo at me when I write wrong stuff.
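The original schema and trigger code didn't survive the archive, but for constraint 1 the general shape is a BEFORE trigger that raises an error when the invariant fails. A hedged sketch in SQLite via Python — the tables here are invented stand-ins, and DB2's CREATE TRIGGER syntax (SIGNAL SQLSTATE instead of RAISE) differs:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE meeting (mid INTEGER PRIMARY KEY, has_beer INTEGER);
CREATE TABLE invite  (mid INTEGER, person TEXT);

-- Reject a beer meeting unless amoeller is already on the invite list.
CREATE TRIGGER beer_needs_amoeller
BEFORE INSERT ON meeting
WHEN NEW.has_beer = 1
  AND NOT EXISTS (SELECT 1 FROM invite
                  WHERE mid = NEW.mid AND person = 'amoeller')
BEGIN
  SELECT RAISE(ABORT, 'amoeller must be invited to any meeting with beer');
END;
""")

con.execute("INSERT INTO invite VALUES (1, 'amoeller')")
con.execute("INSERT INTO meeting VALUES (1, 1)")   # allowed: amoeller invited

try:
    con.execute("INSERT INTO meeting VALUES (2, 1)")  # no invite: trigger aborts
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```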
|
# ? Sep 18, 2009 00:53 |
I'm sure this is probably an easy thing to do, and I don't have the experience to nab it easily.code:
Nevermind, got it. I forgot all about GROUP BY. Atoramos fucked around with this message at 14:02 on Sep 18, 2009 |
|
# ? Sep 18, 2009 13:17 |
|
That's not really... possible. If you group by county, you have to aggregate city on something, and since city is nominal, aggregation doesn't really make sense. For scenarios like "pick random" or "pick one, I don't care which", SQL isn't ideal since it's based on defining sets and fetching them efficiently. But it's true that it happens so often, there might as well be a provision to make random access more convenient and possibly more efficient. (other than making a random seed column and picking the min/max of it)
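For the "pick one, I don't care which" case, the cheap deterministic trick is to aggregate the nominal column with MIN() (or MAX()). A sketch in SQLite via Python, with an invented table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE places (county TEXT, city TEXT)")
con.executemany("INSERT INTO places VALUES (?, ?)",
                [("Kent", "Margate"),
                 ("Kent", "Dover"),
                 ("Surrey", "Guildford")])

# One arbitrary-but-stable city per county: MIN() just picks the
# alphabetically first, which satisfies "any representative will do".
one_per_county = con.execute("""
SELECT county, MIN(city)
FROM places
GROUP BY county
ORDER BY county
""").fetchall()
```

For a genuinely random pick you'd need the random-seed-column approach mentioned above (or engine-specific ORDER BY RANDOM() tricks), both of which are more expensive.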
|
# ? Sep 18, 2009 14:12 |
|
I want to add the ability for users to specify a timezone so I can display timestamps with the appropriate offset in a MySQL based application. Is using set timezone = timezone a bad idea? Should I be doing this in my application code, rather than just setting it at the database level? edit: actually, I'm not even sure that it's even necessary to display exact timestamps in this app. I think X hours/days/weeks/months/years ago is all people will need. Still curious about where in an application timezone conversion should be done though. fletcher fucked around with this message at 23:52 on Sep 22, 2009 |
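For what it's worth, doing the conversion in application code keeps the stored timestamps in UTC and avoids per-connection session state entirely; the user's preference just becomes a zone name you apply at display time. A Python sketch of the idea (zoneinfo is stdlib in modern Python; the zone name is only an example):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Timestamp as stored in the database: always UTC.
stored_utc = datetime(2009, 9, 22, 23, 48, tzinfo=timezone.utc)

# At render time, shift into the zone the user picked in their settings.
user_zone = ZoneInfo("America/Los_Angeles")
local = stored_utc.astimezone(user_zone)
# On this date Los Angeles is on PDT (UTC-7), so this displays as 16:48.
```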
|
# ? Sep 22, 2009 23:48 |