|
Thank you very much! Worked perfectly.
|
# ? Dec 14, 2008 03:21 |
|
How does the running time of MySQL queries compare to retrieving information from a text file? That is, how much more or less expensive is it to run something like SELECT * FROM table_name versus reading all that information from a file without processing it?
|
# ? Dec 14, 2008 19:51 |
|
File I/O costs will most likely dwarf CPU costs. Network I/O costs might not need to be considered if you're on a 1Gbit LAN. If the table is badly fragmented, then reading from a file would probably be faster.
|
# ? Dec 14, 2008 20:34 |
|
Victor posted:File I/O costs will most likely dwarf CPU costs. Network I/O costs might not need to be considered if you're on a 1Gbit LAN. If the table is badly fragmented, then reading from a file would probably be faster.

Doesn't MySQL need to read from disk as well, though? Isn't it also affected by file I/O, or does it try to hold as much information in memory as possible?
|
# ? Dec 14, 2008 20:57 |
|
Well, it clearly depends on the table size compared to memory availability, how well MySQL manages its buffers, how often the table is accessed, etc. My point in my previous post was that while MySQL might be slightly slower, I/O costs dwarf CPU costs, so the difference would be small. Network I/O costs and fragmentation of the table are really the biggest factors. If the table happens to be entirely cached in memory, then you have no I/O costs and you just have to see how network I/O costs factor in.
|
# ? Dec 14, 2008 22:05 |
|
hexadecimal posted:Doesn't MySQL need to read from disk as well, though? Isn't it also affected by File I/O, or does it try to hold as much information in memory as possible?

The first time (provided you use a similar disk read method), the flat file will be faster; MySQL just has more overhead. Subsequent times, MySQL will be faster, assuming you've set up your buffers correctly.
|
# ? Dec 14, 2008 22:15 |
|
Pento McGreno posted:The first time, (provided you use a similar disk read method) flat file will be faster. There's just more overhead. Subsequent times, MySQL will be faster, assuming you've setup your buffers correctly.

Can you elaborate more on setting up the buffers?
|
# ? Dec 14, 2008 22:21 |
|
You don't use a database because it's fast, you use a database because it gives you the best interface to inserting, updating, and selecting data (with locking, rollbacks, all the other features that relational databases give you). If your app somehow just needs 1 huge text file to work, and you're not going to have to insert / update / delete data from the text file, then why use a database?
|
# ? Dec 15, 2008 19:34 |
|
In my database, I have a table called "flights" which has information about the flight, as well as a plane_id, which is connected to a row in a "planes" table that has information about the plane. I have a user who has like 900 planes in his database which are basically "orphan" planes. I want to remove these duplicate planes, which do not have a single flight connected to them. How do I do this? I came up with a query that pretty much does what I want, using LEFT JOINs and discarding all entries whose flight_id is not NULL, but for some reason this query takes over 40 seconds to run on normal user_ids, and for this particular user it apparently takes so long that it always returns a "1053" error. What's the best way to do this?

edit: here is the query that is taking too long to run: code:
nbv4 fucked around with this message at 20:04 on Dec 15, 2008 |
# ? Dec 15, 2008 19:49 |
|
nbv4 posted:In my database, I have a table called "flights" which has information about the flight, as well as a plane_id, which is connected to a row in a "planes" table which has information about the plane. I have a user who has like 900 planes in his database which are just basically "orphan" planes. I want to remove these duplicate planes, which do not have a single flight connected to them. How do I do this? I came up with a query which pretty much does what I want that uses LEFT JOIN's and discards all entries that have a flight_id value greater that's not NULL, but this query for some stupid reason takes like over 40 seconds to run on normal user_id's, and for this particular user, it I guess takes too long because it always returns a "1053" error. Whats the best way to do this?

What are your indexes like? Make sure you have an index on pilot_id, flight_id, and plane_id (on both tables!). How big is the flights table? For it to be taking this long, you'd have to have no indexes AND a table of more than 1,000,000 rows. Personally, I'd do: code:
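(Not the actual query from the elided code block above; just a sketch of the LEFT JOIN / IS NULL anti-join pattern being discussed, with assumed column names such as flights.id:)

```sql
-- Indexes first: without them, matching planes to flights
-- means repeated full table scans.
CREATE INDEX idx_flights_plane ON flights (plane_id);
CREATE INDEX idx_flights_pilot ON flights (pilot_id);

-- Orphan planes: LEFT JOIN flights and keep only the rows
-- where no matching flight was found.
SELECT p.id
FROM planes AS p
LEFT JOIN flights AS f ON f.plane_id = p.id
WHERE f.id IS NULL;

-- The same condition expressed as a MySQL multi-table DELETE.
DELETE p
FROM planes AS p
LEFT JOIN flights AS f ON f.plane_id = p.id
WHERE f.id IS NULL;
```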
|
# ? Dec 15, 2008 20:19 |
|
^^ actually I'm stupid. There were no indexes... I added the indexes and now it runs in under a second.
|
# ? Dec 15, 2008 20:29 |
|
What database system is this that something like that takes over 40 seconds?
|
# ? Dec 15, 2008 20:56 |
|
Triple Tech posted:What database system is this that something like that takes over 40 seconds?

MySQL 5 with about 14,000 rows total and no indexes lollers.
|
# ? Dec 15, 2008 21:04 |
|
I guess I'm still very new to databases, and I'm not exactly sure how powerful our hardware is (and it might be powerful, since THE MANAGERS just like to throw money at the problem instead of doing things The Right Way (tm)). But in my limited experience, anything that takes over a couple of seconds is suspect of being approached incorrectly, and I work with tables that are at least in the hundreds-of-thousands-of-rows range. Tens of seconds is just... crazy to me. Chalk one up for Microsoft and SQL Server, I guess. The longest part of the process I'm working on is selecting (almost) an entire table of data (like four million rows) and "translating" it to another format for insertion into a different table. That takes like... 13 minutes?
|
# ? Dec 15, 2008 21:24 |
|
Triple Tech posted:I guess I'm still very new to databases, and I'm not exactly how powerful our hardware is (and it might be since THE MANAGERS just like to throw money at the problem instead of make things The Right Way (tm)). But in my limited experience, anything that's over a couple seconds is suspect to being approached incorrectly, and I work with tables that are at least in the hundreds of thousands rows range. Tens of seconds is just... Crazy to me. Chalk one up for Microsoft and SQL Server, I guess.

I've had some stuff that was pretty well-optimized and still took over half a day to run, but it was Type 2 slowly-changing dimension stuff on a multi-billion-row database, so it's entirely understandable for it to take quite some time.
|
# ? Dec 15, 2008 21:38 |
|
What is type 2? Kind of difficult to Google for, I'm not sure what I'm looking at.
|
# ? Dec 15, 2008 22:11 |
|
Triple Tech posted:What is type 2? Kind of difficult to Google for, I'm not sure what I'm looking at.
|
# ? Dec 15, 2008 23:11 |
|
Triple Tech posted:What is type 2? Kind of difficult to Google for, I'm not sure what I'm looking at.

wikipedia posted:http://en.wikipedia.org/wiki/Slowly_Changing_Dimension#Type_2

Unfortunately sometimes it's appropriate and you just have a buttload of data to process.
|
# ? Dec 15, 2008 23:42 |
|
What are some good books or other resources to learn about SCD and other types of data modeling? When I had trouble reporting on when people moved from one supervisor to another or changed job functions I re-invented Type 3 without knowing what it was until today. Unfortunately I'm just a report bitch and never got my ticket pushed through to make it happen and have since moved on to another department that doesn't have the same problems.
|
# ? Dec 16, 2008 00:18 |
|
Aredna posted:What are some good books or other resources to learn about SCD and other types of data modeling?

"The Data Warehouse Toolkit" is pretty much the bible of dimensional data warehousing, but it is pretty long. I don't recommend it unless you actually want to have a career in the field.
|
# ? Dec 16, 2008 09:58 |
|
How can I insert large amounts of binary data via SQL without using load_file? (MySQL 5.0, C++)
|
# ? Dec 16, 2008 14:58 |
|
ODC posted:How can I insert large amounts of binary data via SQL without using load_file?

You could try using a tool like SyncSort (http://www.syncsort.com/products/dmx/home.htm; not free, you can't even download a trial), but I'm not sure it would be any faster than just writing your own import script which does the inserts line by line. In Oracle you could use SQL*Loader (sqlldr), which basically bypasses the normal Oracle interface for inserts and instead writes the data straight to the table. It also works with binary data.
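One load_file-free route, sketched here with a made-up table; for large payloads, a parameterized statement from the C++ side avoids building giant SQL strings:

```sql
-- Illustrative schema: a LONGBLOB column to receive the bytes.
CREATE TABLE blobs (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    data LONGBLOB NOT NULL
);

-- Small payloads can be written as hex literals directly in SQL.
INSERT INTO blobs (data) VALUES (x'89504E470D0A1A0A');

-- For large payloads, prepare "INSERT INTO blobs (data) VALUES (?)"
-- and bind the buffer from C++ (mysql_stmt_bind_param, or
-- mysql_stmt_send_long_data to stream it in chunks).
```

The comment names real MySQL C API calls, but the exact binding code depends on your client library.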
|
# ? Dec 17, 2008 02:41 |
|
hexadecimal posted:Can you elaborate more on setting up the buffers?

Use InnoDB, and set innodb_buffer_pool_size to something big (size of data + indexes + 10%). Data and indexes will be stored there in memory, rather than on disk.
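As a minimal sketch, that's a my.cnf setting (the value is illustrative; on MySQL 5.0 this variable is not dynamic, so a server restart is required):

```ini
[mysqld]
# Roughly: size of data + indexes + 10% headroom, within available RAM
innodb_buffer_pool_size = 512M
```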
|
# ? Dec 17, 2008 07:23 |
If there are two tables (mysql):

product
 -id
 -name

productvote
 -productId
 -timestamp

I'm trying to get a list of products ordered by the time each was last voted on.

SELECT product.id, product.name
FROM productvote
JOIN product ON product.id = productvote.productId
ORDER BY productvote.timestamp DESC;

That seemed to work, but it can show duplicate products. Doing a SELECT DISTINCT instead didn't seem to work like I thought it would; why not? Or should I do it like this? How do I maintain the order of the productIds inside the IN() in the result of the parent query?

SELECT product.id, product.name
FROM product
WHERE product.id IN (SELECT DISTINCT productvote.productId
                     FROM productvote
                     ORDER BY productvote.timestamp DESC);
|
|
# ? Dec 17, 2008 20:41 |
|
fletcher posted:If there are two tables (mysql):

Given that it's mysql, you probably want something like code:
|
# ? Dec 17, 2008 20:50 |
|
fletcher posted:If there are two tables (mysql):

Personally I'd do: code:
|
# ? Dec 17, 2008 22:44 |
|
Praetorian42 posted:Personally I'd do:

As someone who has written a poo poo load of sql, your solution offends me. I think 99% of the time this is a horrible way to do it. Every row in the product table is going to spawn one select against the productvote table, so it's going to take hundreds of times longer to complete. Sure, it's fine if your data set is small, but even so, you shouldn't encourage doing things the wrong way. You get the exact same result by creating a subquery over productvote of productId and max(timestamp) and joining that with your product table. This is how Jethro solved the problem. It's a 1-to-1 join at that point, it should be super fast, and it properly scales with data set size.
|
# ? Dec 17, 2008 23:28 |
Thanks for all the responses guys.

Jethro posted:given that it's mysql, you probably want something like

This query seems really slow (~2 sec for a pretty small data set). What indexes would help me with a query like this? code:
|
|
# ? Dec 18, 2008 03:46 |
|
tayl0r posted:As someone who has wrote a poo poo load of sql, your solution offends me.

I said it would be slow. I was going for something that was easy to understand; it doesn't help the person if you give them a query and they don't understand what makes it work. ORDER BY subqueries are conceptually very easy to understand (for me at least), while joining on a subquery might not be. If there wasn't already a fast solution posted, I would have given one.
|
# ? Dec 18, 2008 04:16 |
|
fletcher posted:Thanks for all the responses guys.

Make sure you have an index on productId. Also: is it just me, or do MySQL explain plans suck rear end compared to Oracle/Postgres ones? That poo poo is impossible to decode.

Operation Atlas fucked around with this message at 04:31 on Dec 18, 2008
# ? Dec 18, 2008 04:22 |
Yep, there is.

edit: ^ yeah, that's why I asked; I don't know how to make sense of it either.

fletcher fucked around with this message at 04:44 on Dec 18, 2008
|
# ? Dec 18, 2008 04:28 |
|
One of the most basic concepts of writing SQL, that I think a lot of people don't realize, is that any result set can be used just like a table. You can select from it, join it with other tables or result sets, etc.

The problem in this case is that you have a one-to-many relationship between the product and productVote tables. You want to get 1 row back per product. So, when you join those two tables together, you need to somehow get the productVote table down to only 1 row per product. The proper way to do that, in this case, is with the max() aggregation function, grouping by productId. That gives you back 1 row per product, with the most recent date that it was voted on. Then, join that result set with the product table and you are good to go.

Once you can understand SQL concepts like this, you can start writing some pretty awesome queries. This is how Jethro solved the problem originally.
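A sketch of that shape, using fletcher's table and column names (the alias names are made up):

```sql
-- One row per product from productvote: the latest vote time.
-- Join that derived table back to product, then sort by it.
SELECT p.id,
       p.name,
       pv.last_vote
FROM product AS p
JOIN (SELECT productId,
             MAX(timestamp) AS last_vote
      FROM productvote
      GROUP BY productId) AS pv
  ON pv.productId = p.id
ORDER BY pv.last_vote DESC;
```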
|
# ? Dec 18, 2008 04:51 |
|
You basically want to have an index on anything that is in your WHERE clause. The thing to keep in mind is that if you're going to be getting back a majority of the rows in the table, the optimizer probably thinks it's faster to just do a full table scan and not use any indexes at all. In this case, since all products probably have a rating, your database is probably not using any indexes, because it is accessing the majority of the records in both the product table and the productVote table. So it's probably doing two full table scans. You might be able to build an index on the product table on productId and productName; then it might realize it can get all the data it needs from the index and not have to hit the table at all.

One thing that I think is useful sometimes is just to build a shitload of indexes. Build indexes on every individual column, build indexes on multiple columns together, etc. Make sure you do an ANALYZE or a gather-stats or whatever you need to do to make your database regenerate its index and table stats. Then look at the explain plan for your query and see if it is using any of those indexes. Maybe try giving it some hints if it's not performing the way you expect. Then drop the indexes that it isn't using.
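For the product/productvote example, that experiment might look like this (the index names are invented; EXPLAIN shows which ones the optimizer actually picks):

```sql
-- Candidate indexes on the columns used for joining and filtering.
CREATE INDEX idx_pv_product ON productvote (productId);
CREATE INDEX idx_pv_prod_time ON productvote (productId, timestamp);
CREATE INDEX idx_p_id_name ON product (id, name);  -- possible covering index

-- Refresh optimizer statistics, then inspect the plan.
ANALYZE TABLE product, productvote;
EXPLAIN SELECT p.id, p.name
FROM product AS p
JOIN productvote AS pv ON pv.productId = p.id;

-- Drop whatever the plan shows is unused, e.g.:
DROP INDEX idx_pv_prod_time ON productvote;
```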
|
# ? Dec 18, 2008 05:00 |
|
First, let me reiterate that MySQL sucks rear end. I kind of wonder whether people tend not to know the better ways to formulate SQL (like the ones tayl0r mentioned) because of this. I am starting to wonder how much MySQL stunts people, kind of like how BASIC (VB, not as much) supposedly stunts people. tayl0r, it's freaking awesome to completely understand the relational nature of DB work, but do understand that it can be useful to have an equivalent (logic-wise) subquery version that you can point to and say, "these do the same thing, but this one is better and here is why". Don't require such a fricken abrupt transition between traditional imperative thought and relational-functional thought. And an aside, I find this syntax easier to read; does anyone agree with me? code:
|
# ? Dec 18, 2008 06:51 |
|
Good point, Victor. I should realize that it does take quite a mind-shift to think relationally instead of procedurally. Another thing I've noticed is that there are a ton of programming aids, frameworks, libraries, and tools out there devoted solely to keeping you from having to write SQL; it's really easy to write an application with a complicated database backend without writing a single line of SQL yourself. I don't like that syntax. I prefer to keep each query tabbed over the same amount, like this: code:
|
# ? Dec 18, 2008 07:47 |
|
Perhaps this might be good for the original post: The Database Programmer. It's a blog on database fundamentals, written by a guy who seems to understand databases well. I didn't have any issues with his first few entries (table of contents), but I haven't vetted all of them. No derived tables yet or in the plans, unfortunately. :-( IMHO, we need more web-accessible, free content on doing a better job of interacting with databases. From what I've seen, The Database Programmer blog shows promise.
|
# ? Dec 18, 2008 08:12 |
|
Victor posted:And an aside, I find this syntax easier to read; does anyone agree with me? code:
|
# ? Dec 18, 2008 09:30 |
|
Victor posted:Perhaps this might be good for the original post: The Database Programmer. It's a blog written by a guy who seems to understand databases well, on database fundamentals. I didn't have any issues with his first few entries (table of index), but I haven't vetted all of them. No derived tables yet or in the plans, unfortunately. :-(

This site looks awesome so far. Also, capitalizing keywords isn't really needed with good syntax highlighting.
|
# ? Dec 18, 2008 15:49 |
|
code:
2. SQL/86 joins > SQL/99 joins; also write your joins left-to-right as you go down your table list
3. align commas and the last character of standard keywords
4. all lower case (except top-level query column aliases, for clarity)
5. indent subqueries
6. always use the "as" keyword
7. use short table aliases, no more than 4 characters

I frequently write and maintain 300+ line queries, and strictly adhering to this system (plus more rules specific to case, decode, etc.) is what keeps me sane.
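As far as the visible rules go (rule 1 is lost in the quoted code block above), a query written in that style might look like this; the tables are the product/productvote pair from earlier in the thread, and the aliases are assumptions:

```sql
select   p.id            as ProductID
     ,   p.name          as ProductName
     ,   pv.last_vote    as LastVote
from     product         as p
     ,   (select   productId
               ,   max(timestamp) as last_vote
          from     productvote
          group by productId)    as pv
where    pv.productId = p.id
order by pv.last_vote desc
```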
|
# ? Dec 18, 2008 17:20 |
|
Do you guys use any tools like PL/SQL Developer, Oracle SQL Developer, Visual Studio, etc? PL/SQL Developer is by far the best tool that I've ever used for writing SQL and working with the database. It's specifically a PL/SQL development IDE but it also has a ton of things just for browsing database objects, looking at explain plans, writing SQL, importing & exporting data from tables, comparing table DDL in two different schemas or databases, etc. It's a shame it only works with Oracle.
|
# ? Dec 18, 2008 20:30 |