|
The events system was just an example to illustrate the general principle; I'm not actually writing an events system. In my particular case, the FKs will not introduce significant overhead. I was more concerned with what people thought of breaking third normal form to simplify referring to various different types, and whether people had actually used the EAV anti-pattern I mentioned in Option 3 and regretted it. Thanks for the comments so far!
|
# ? Jan 16, 2008 14:24 |
|
minato posted:In the past I've implemented Option 1, like a good little database normalizer. However, the clunkiness of doing so has led me to want to implement Option 2. The rest of my team (not very DB-design-savvy) is advocating Option 3, which to me is right out because it would take god himself to pry FK constraints from my dead hands. But I don't know, I've never had problems with it. The most common way I've used Option 3 is in a "files" table, where a file can be attached to any other item in the system. I have the file table itself, which only stores info pertaining to the file, then a join table I usually call file_item_jn - it has the file ID and two fields called foreign_table and foreign_key. So I haven't regretted it in the places I've used it. But then I did develop another system where I have separate join tables for each "type" - and then a view that unions them all and creates the type field for me (a "virtual Option 3"), since there are places where I have to provide a snapshot of all the data at once.
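For anyone following along, here's a minimal runnable sketch of that file_item_jn pattern, using Python's sqlite3 as a stand-in for MySQL. The file_id/foreign_table/foreign_key columns come from the post; the example item tables and their columns are made up.

```python
import sqlite3

# "Option 3": one generic join table whose foreign_key column can point
# into any table, so it carries no FK constraint of its own.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files    (file_id    INTEGER PRIMARY KEY, name    TEXT);
CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, total   REAL);
CREATE TABLE tickets  (ticket_id  INTEGER PRIMARY KEY, subject TEXT);

CREATE TABLE file_item_jn (
    file_id       INTEGER NOT NULL REFERENCES files(file_id),
    foreign_table TEXT    NOT NULL,  -- which table the row points at
    foreign_key   INTEGER NOT NULL   -- PK value within that table
);
""")
conn.execute("INSERT INTO files VALUES (1, 'scan.pdf')")
conn.execute("INSERT INTO invoices VALUES (10, 99.5)")
conn.execute("INSERT INTO file_item_jn VALUES (1, 'invoices', 10)")

# Fetching the files attached to one item filters on (table, key):
rows = conn.execute(
    "SELECT f.name FROM files f "
    "JOIN file_item_jn j ON f.file_id = j.file_id "
    "WHERE j.foreign_table = ? AND j.foreign_key = ?",
    ("invoices", 10)).fetchall()
print(rows)  # [('scan.pdf',)]
```

The trade-off the thread is arguing about is visible right in the schema: the database cannot enforce that foreign_key 10 actually exists in invoices.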
|
# ? Jan 16, 2008 14:35 |
|
minato posted:The events system was just an example to illustrate the general principle; I'm not actually writing an events system. In my particular case, the FKs will not introduce significant overhead. The EAV approach would be laughably inadequate at the scale my database operates at. The normalized version is a little tricky to deal with when a system wants to read through events, like the mail sender I mentioned. What it does is operate over new events in batches. Over each batch (a range of event IDs), it does a single query to get a count of events of each type that happen to be there. It can then construct a set of queries, one per type in the batch, to get the events of each type. The queries are built from SQL generated automatically from the XML. That gnarliness is all centralized in one piece of code, so nothing else in the system really needs to care about it. When loading just a single event, of course, there's no problem, because again the SQL is built straight from the event specification.
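The batching strategy described there can be sketched in a few lines; this is a toy version in Python's sqlite3, and the events table, its columns, and the data are all hypothetical (the real per-type SQL is generated from the XML spec).

```python
import sqlite3

# One query finds which event types occur in a batch of event IDs,
# then one query per type present fetches just that type's events.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_id INTEGER PRIMARY KEY, event_type TEXT);
INSERT INTO events VALUES (1,'mail'),(2,'mail'),(3,'signup'),(4,'mail');
""")

batch_start, batch_end = 1, 3  # process events 1..3

# Query 1: which types are in this batch, and how many of each?
type_counts = dict(conn.execute(
    "SELECT event_type, COUNT(*) FROM events "
    "WHERE event_id BETWEEN ? AND ? GROUP BY event_type",
    (batch_start, batch_end)))

# One query per type present (in the real system, this is where the
# generated per-type SQL would run against that type's tables):
by_type = {}
for event_type in type_counts:
    by_type[event_type] = conn.execute(
        "SELECT event_id FROM events WHERE event_type = ? "
        "AND event_id BETWEEN ? AND ? ORDER BY event_id",
        (event_type, batch_start, batch_end)).fetchall()
```

So a batch containing three types costs four queries total, regardless of how many events are in it.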
|
# ? Jan 16, 2008 14:48 |
|
Golbez posted:Elaborate please, are they getting rid of/drastically fixing fulltext in a few months? No, we'll just be migrating to a new solution.
|
# ? Jan 17, 2008 00:31 |
|
cletus42o posted:I've always used option 3, but then I never use constraints in my database - I handle all of that in my application. But maybe I should rethink that? Yes, yes you should. There's a reason millions of dollars go into R&D: the database will do constraints and transactions faster, more accurately, and just plain better than anything you could possibly conceive of.
|
# ? Jan 17, 2008 00:45 |
|
Xae posted:Yes, yes you should. Yes, but constraints are not free. Example: if you have a column in a table that references another table, and you put a FK constraint on it, you'd better put an index on it too - even if you have no other reason to do so. Why? Because if you don't, when you delete rows from the target table the server will have no choice but to do a table scan of the referencing table to make sure you're not violating the constraint. Another example: in the case discussed above, with the head table and a heterogeneous set of auxiliary tables, if each auxiliary table has a FK constraint back to the main table you'll have hell to pay when it's time to clean out a bunch of rows from the main table. (Actually what you'll have to do is drop all the constraints, clean out the crap, and then re-add the constraints - if you don't, you'll be waiting days. This has happened to us, and we now don't have those constraints.) I believe that FK constraints are a good idea when appropriate and when you can afford them, but blindly throwing them into your schema whenever you have a cross-table reference is not wise.
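The first example above is worth seeing concretely. Here's a sketch in Python's sqlite3 (table names invented): the child-side FK column gets its own explicit index, which is what spares the engine a full child-table scan every time a parent row is deleted. SQLite only enforces FKs when the pragma is on, which also lets the check be demonstrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite: per-connection opt-in
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent(id)
);
-- The index the post is talking about: without it, each DELETE on
-- parent must scan child to verify nothing still references the row.
CREATE INDEX idx_child_parent_id ON child(parent_id);

INSERT INTO parent VALUES (1), (2);
INSERT INTO child  VALUES (10, 1);
""")

conn.execute("DELETE FROM parent WHERE id = 2")  # fine: no child rows
try:
    conn.execute("DELETE FROM parent WHERE id = 1")  # still referenced
    blocked = False
except sqlite3.IntegrityError:
    blocked = True  # the constraint did its job
```

Most engines index the parent's PK automatically but do nothing for the child's FK column; that index is on you.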
|
# ? Jan 17, 2008 01:54 |
|
It's true that constraints of any sort make deletes more costly, but deletes are costly in the first place and should be avoided as a matter of design if possible. I would say start with FKs where they belong, and consider dropping them only where you absolutely need to in order to maintain performance.
|
# ? Jan 17, 2008 03:00 |
|
m5 posted:Example: if you have a column in a table that references another table, and you put a FK constraint on that, you'd better put an index on it too - even if you have no reason to do so. Why? Because if you don't, when you delete rows from the target table the server will have no choice but to do a table scan of the other table to make sure you're not violating the constraint. m5 posted:I believe that FK constraints are a good idea when appropriate and when you can afford them, but to blindly throw them into your schema whenever you have a cross-table reference is not wise. However, like all principles, they sometimes need to be violated in the name of performance.
|
# ? Jan 17, 2008 03:11 |
|
minato posted:Sure, but 99% of the time the FK points to the PK of the target table, and PKs should always have an index on them to ensure uniqueness. Foreign keys, by definition, point to a primary key (or a unique field, which could be a primary key). If it isn't a key, it's just a lovely denormalized field. Having primary keys doesn't hurt performance unless the developers are morons, so only semi-frequently does it hurt performance. Most databases will automatically create an index on a column once it is the primary key anyway. quote:Yes, but constraints are not free. Plus, if you can't be sure your data is accurate, why are you bothering to store it? A full scan (and only if you don't bother making an index) is required; otherwise you breach data integrity. You have orphaned records. You are now hoping that something else doesn't take the old PK and completely gently caress your system. I prefer to rely on something more than hope and dumb luck... Any system in which deletes would cause so much strain that it would be unworkable with today's hardware probably isn't deleting anyway. They simply end-date the record, set a delete flag, or otherwise execute a logical delete. Data has value, and you don't just throw it away. Xae fucked around with this message at 04:41 on Jan 17, 2008 |
# ? Jan 17, 2008 04:35 |
|
Xae posted:Plus, if you can't be sure you data is accurate why are you bothering to store it?
|
# ? Jan 17, 2008 04:59 |
|
minato posted:Sure, but 99% of the time the FK points to the PK of the target table, and PKs should always have an index on them to ensure uniqueness.
|
# ? Jan 17, 2008 05:01 |
|
Victor posted:I'm afraid I have a hard time seeing how you have the experience to back this up. At least, where I work, sometimes we go without constraints, but have nightly integrity checks that run to ensure there are no orphaned records. This doesn't catch the circumstance where orphan records are created, incorrectly, and then get deleted before this integrity check runs. However, from my small amount of experience, I do not see this special case as a bug pattern common enough to worry about. You are still running the checks - is your machine so pegged on CPU that you can't afford to ensure that the data you are bothering to store is being stored correctly? Let's face it, hardware is cheap; the programmer time to build and maintain your own custom system to keep integrity costs more than just buying a better box. Also, if you are that desperate for computing power, why are you deleting in the first place? That isn't even touching on the fact that a single failure on an enterprise-level system can cost serious cash.
|
# ? Jan 17, 2008 05:06 |
|
You're just assuming too much. The rule that programmer time is more valuable than computer cost is a good one, but it isn't all-covering, all-important, never-inapplicable. To give you an idea, checking a foreign key constraint can keep transactions open a tiny bit longer, which results in locks being held a bit longer, which can, if one is near a certain breaking point (think adding 1% to a fully packed highway), cause the system to go crazy. Moreover, dealing with constraints can be a royal PITA, due to having to delete in the right order, and having major issues with self-referencing tables.
|
# ? Jan 17, 2008 05:35 |
|
Victor posted:At least, where I work, sometimes we go without constraints, but have nightly integrity checks that run to ensure there are no orphaned records. Currently I'm not sure of either of those things in my own app, which is why I'd be reluctant to do it this way. Xae, I see your point of view but you seem to be under the assumption that you can throw enough hardware and money at any scalability problem in order to fix it. I don't think that's always the case, and isn't always practical. I'd love it if I could get better performance out of my app just by putting it on bigger iron.
|
# ? Jan 17, 2008 05:35 |
|
Victor posted:You're just assuming too much. The rule that programmer time is more valuable than computer cost is a good one, but it isn't all-covering, all-important, never-inapplicable. To give you an idea, checking a foreign key constraint can keep transactions open a tiny bit longer, which results in locks being held a bit longer, which can, if one is near a certain breaking point (think adding 1% to a fully packed highway), cause the system to go crazy. Moreover, dealing with constraints can be a royal PITA, due to having to delete in the right order, and having major issues with self-referencing tables. If your system is at capacity, why not execute an immediate logical delete, then come back for the "real" delete during a batch window? It will relieve stress on your system and keep data integrity. The database doesn't simply decide to do checks for a random reason. It does them because it needs to - it needs to make sure the integrity of the database is being kept. If you do not check for child records you will have problems. Someone running an OLAP system is now getting records which should have been deleted counted in their statistics. You don't need to throw big iron at things to solve performance problems. Even a $5K or $10K server is a hell of a machine these days. If an application takes a month of programmer time, it costs ~$10K between salary, benefits, and overhead. This is why throwing hardware at problems is popular: because it works. If you are working in a completely budget-constrained organization things may be different, but in that case you shouldn't delete immediately anyway, because it takes too many resources. Constraints are a tool, and a very useful one. They may force you to change how you normally operate, but they do so for good reasons.
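The logical-delete-then-batch-purge idea is easy to sketch; this uses Python's sqlite3, and the orders table, its columns, and the timestamp are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE orders (
    id         INTEGER PRIMARY KEY,
    status     TEXT NOT NULL,
    deleted_at INTEGER  -- NULL means the row is live
)""")
conn.executemany("INSERT INTO orders VALUES (?, ?, NULL)",
                 [(1, 'open'), (2, 'open'), (3, 'closed')])

# Cheap logical delete on the hot path: just stamp the row.
conn.execute("UPDATE orders SET deleted_at = ? WHERE id = ?",
             (1200841200, 2))

# Readers (and OLAP extracts) filter to live rows only:
live = conn.execute(
    "SELECT id FROM orders WHERE deleted_at IS NULL ORDER BY id").fetchall()

# Later, in the off-peak batch window, the expensive physical delete:
conn.execute("DELETE FROM orders WHERE deleted_at IS NOT NULL")
```

The FK checks and cascades still happen, but during the batch window instead of under peak load.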
|
# ? Jan 17, 2008 13:38 |
|
Xae posted:A full scan, only if you don't bother making an index, is a required thing. Otherwise you breach data integrity. You have orphaned records. You are now hoping that something else doesn't take the old PK and completely gently caress your system. I prefer to rely on something more than hope and dumb luck... You're not familiar with the real world. The tables I'm talking about have tens of millions of rows in them. We delete stuff because we need to curb infinite growth in the database - I think it's well over 100GB now in backup form. You only have orphaned records if the software has bugs in it, mostly. If there are a few orphaned records floating around among millions and millions of fine happy ones, I'm OK with that. I don't need my database to be a perfect shining jewel of integrity. I need it to mostly work, and to support the heavy load it's under with acceptable performance.
|
# ? Jan 17, 2008 15:04 |
|
Xae, one major problem with your argument is that you assume getting a new server is an easy thing to do. Combine legacy issues with politics, and it's not a simple choice. Yes, it would be nice if people were all-around competent and powerful hardware could simply be purchased. I know this because we recently got a $6K DB server and it screams. However, your blatant generalizations are... unsettling.
|
# ? Jan 17, 2008 20:12 |
|
Filash posted:No, we'll just be migrating to a new solution. Tease.
|
# ? Jan 17, 2008 21:09 |
|
m5 posted:You're not familiar with the real world. The tables I'm talking about have tens of millions of rows in them. We delete stuff because we need to curb infinite growth in the database - I think it's well over 100GB now in backup form. I have worked on systems ranging from a ~1995 Sun box to a 45-node Teradata array. I have worked on databases that were written before "relational" was an option. I am quite familiar with the real world. I mainly do remediation work. I come in when a company decides they want a data warehouse, only to realize that their OLTP side is a mess and can't be processed until they unfuck it. I see this crap about "oh, we can manage it better than the database" every goddamn time. Other popular excuses include "Well, it slows down the database!" or "We didn't want to write joins". quote:You only have orphaned records if the software has bugs in it, mostly. quote:If there are a few orphaned records floating around among millions and millions of fine happy ones, I'm OK with that. I don't need my database to be a perfect shining jewel of integrity. I need it to mostly work, and to support the heavy load it's under with acceptable performance. There is a cost associated with using bad data, and it is almost always far greater than the cost of fixing the problem.
|
# ? Jan 17, 2008 22:56 |
|
Look, I don't know how to explain the situation other than to say that maintenance of foreign key constraints in the tables we have has sometimes become untenable. I have not claimed that the code is perfect, nor that I expect no software errors. The issues are performance issues. We cannot buy a server much bigger than what we've got, and partitioning the database is more work than can be afforded now (possibly ever).
|
# ? Jan 18, 2008 00:07 |
|
Part of the thing is that he is asking for design advice. You need to catch things like "will we need to partition?" before you start building the system. If you don't, you end up in a situation where you become trapped and can't fix it. If he were running an existing system I wouldn't advise changing it, but since the system isn't yet built it is best to design it correctly, so that when the inevitable happens you are in the best position to deal with the situation. You design following "the rules", then break them where needed due to money, political, or other "real" constraints.
|
# ? Jan 18, 2008 00:24 |
|
Xae posted:Part of the thing is that he is asking for design advice. You need to catch things like "will we need to partition" before you start making the system. If you don't, then you end up in a situation where you become trapped and can't fix it. Oh we knew we would have to partition, and we definitely can, but it's not going to be a small project. There's more to it than just the database part. (The overall situation is not the best one could hope for; retards are now in control of what we do.) Note that what we've done is remove FK constraints where they became too burdensome. Where they're not, we still have them.
|
# ? Jan 18, 2008 01:22 |
|
So I am a terrible SQL noob and basically am trying to execute a reasonably complicated expression. I am importing some crap from some xml/txt files into a MySQL database through Python, and basically need help with an INSERT statement into a table that references information from 2 other tables. I have a table of users with attributes and a table of groups. I have a txt file for groups that lists all the users of each group, and I want to build a relationship table that has an id, a user_id, and a group_id. So I want to be able to write an INSERT statement to insert into relationship (user_id, group_id), but I need to look up the user_id and group_id from my other tables as I go. What I think I need guidance on the most is SQL user variables.
|
# ? Jan 18, 2008 19:43 |
|
It's typically best to load the data you have into some table created to hold said data, and then do the transformation with SQL statements.
|
# ? Jan 18, 2008 21:12 |
|
I'd like to include an example of what I did so someone can explain A) why this was so slow and B) what a better method would have been. INSERT INTO memberships (user_id, group_id) SELECT users.user_id, groups.group_id FROM users, groups WHERE users.user_name = '%PYTHONVAR' AND groups.name = '%PYTHONVAR' As I parsed through the txt file and found a group, and then as I found a user, I would generate a query like this. PS: overall it made like 70k writes into memberships.
|
# ? Jan 19, 2008 05:10 |
|
schzim posted:I'd like to include an example of what I did so some can explain You could have uploaded the (username, groupname) pairs into their own table and then run a single insert/select (which is what Victor was suggesting). code:
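A runnable sketch of that staging-table approach, in Python with sqlite3 standing in for MySQL (the users/groups/memberships names follow the thread; the staging table's name and the sample data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE groups (group_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE memberships (user_id INTEGER, group_id INTEGER);
CREATE TABLE staging_pairs (user_name TEXT, group_name TEXT);

INSERT INTO users  VALUES (1,'alice'),(2,'bob');
INSERT INTO groups VALUES (10,'admins'),(20,'editors');
""")

# Bulk-load the parsed file in one shot instead of one query per line:
pairs = [('alice', 'admins'), ('bob', 'admins'), ('bob', 'editors')]
conn.executemany("INSERT INTO staging_pairs VALUES (?, ?)", pairs)

# One set-based statement resolves all the IDs at once:
conn.execute("""
INSERT INTO memberships (user_id, group_id)
SELECT u.user_id, g.group_id
FROM staging_pairs s
JOIN users  u ON u.user_name = s.user_name
JOIN groups g ON g.name      = s.group_name
""")
n = conn.execute("SELECT COUNT(*) FROM memberships").fetchone()[0]
```

The 70k individual INSERT...SELECT round trips collapse into one bulk load plus one join, which is where the speedup comes from.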
|
# ? Jan 19, 2008 13:51 |
|
Is there any site or book that goes into the best way to construct good search engines in PHP/MySQL? Surely fulltext isn't the end-all be-all to it... right?
|
# ? Jan 19, 2008 21:42 |
|
Golbez posted:Is there any site or book that goes in to the best way to construct good search engines in PHP/MySQL? Surely fulltext isn't the end-all be-all to it... right? Step 1: use a text indexing search engine like Lucene
|
# ? Jan 19, 2008 22:23 |
|
code:
What sort of query could I run to return the row with the largest ID number? I know it has to do with MAX.. but I just can't quite seem to figure it out (yes I know) SELECT * FROM EXAMPLETABLE WHERE ID = (SELECT MAX(ID) FROM EXAMPLETABLE) (Edited to make a fix so that maybe this is actually a feasible query?) Or something like this? All IDs are unique. And I doubt the query proposed there would even execute. This isn't a real database, I'm just trying to make sure I understand some things at the moment. Walked fucked around with this message at 16:40 on Jan 21, 2008 |
# ? Jan 21, 2008 16:30 |
|
That's exactly how you'd do it. That query would work fine.
|
# ? Jan 21, 2008 16:46 |
|
minato posted:That's exactly how you'd do it. That query would work fine. Woah, raddical Thanks To bump it up, how would one get the most recent n entries? I know to script it with multiple recursive queries from C#, but is there a way to run a single query to get the two highest IDs?
|
# ? Jan 21, 2008 16:52 |
|
You could do a SELECT * FROM exampletable ORDER BY id DESC LIMIT 0,2
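That query checks out; here it is run under Python's sqlite3 (which accepts MySQL's comma form of LIMIT too - the table contents are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exampletable (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany("INSERT INTO exampletable VALUES (?, ?)",
                 [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')])

# The two highest IDs, newest first ("LIMIT 0,2" = offset 0, count 2):
top_two = conn.execute(
    "SELECT id FROM exampletable ORDER BY id DESC LIMIT 0,2").fetchall()
print(top_two)  # [(4,), (3,)]
```

Swapping the 2 for any n generalizes it to "the most recent n entries" - no recursive querying from C# needed.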
|
# ? Jan 21, 2008 17:06 |
|
Walked posted:Woah, raddical Some databases have pagination or "TOP" qualifiers. In SQL Server you can say code:
|
# ? Jan 21, 2008 17:06 |
|
In sql server 2000, is there any way to check if a Text or varchar parameter is a valid xml document or fragment? Sql 2005 is not available for financial and political reasons. Basically, calling sp_xml_preparedocument with inputs that aren't valid xml causes an error that appears to be un-trappable. I'd like to have this SP indicate that condition by return code instead of bailing during execution.
|
# ? Jan 23, 2008 17:22 |
|
ray2k posted:In sql server 2000, is there any way to check if a Text or varchar parameter is a valid xml document or fragment? Sql 2005 is not available for financial and political reasons. Basically, calling sp_xml_preparedocument with inputs that aren't valid xml causes an error that appears to be un-trappable. I'd like to have this SP indicate that condition by return code instead of bailing during execution. This should work: code:
|
# ? Jan 23, 2008 17:55 |
|
Not a question, just a potentially handy proc for folks using MS SQL Server (works in 2k5, though I think it should work in most versions). If you've ever wondered if a table/proc/view/whatever was referenced in one of your stored procedures and didn't want to rely on your RDBMS's dependency tracking (if any), here's a proc that will do a full-text search of each proc for whatever text you give it. This should help you hunt down any instances of usage of procs/tables/views, unless of course they are data-driven (i.e., you've got table names/SQL in actual tables and you execute those). code:
|
# ? Jan 23, 2008 18:14 |
|
I have a fun little query that I wanted to write, and I thought it would be really simple, but turns out that it may actually be impossible to do in one query. I wanted to get your guys' ideas about what may be the best way of doing this. I have two tables. One is a table of items. The other is a table of attributes, with multiple possible attributes of several attribute types possible for each item. So I have something like this: code:
I've been trying different group by and count combinations, but at this point I'm ready to give up and just give each tag type its own table. Is there a way of just making this one query or should I just separate the tag types into their own tables?
|
# ? Jan 24, 2008 23:12 |
|
Quick one, and the answer is probably "no", but is there a way to insert a row into a MySQL table with an autoincremented field and get the value of the autoincrement field, all in one step?
|
# ? Jan 24, 2008 23:32 |
|
If you're using php, you can call mysql_insert_id() after the insert.
|
# ? Jan 24, 2008 23:34 |
|
USSR Lives On posted:I have a fun little query that I wanted to write, and I thought it would be really simple, but turns out that it may actually be impossible to do in one query. I wanted to get your guys' ideas about what may be the best way of doing this. SELECT COUNT(attributes.item_id) AS item_attribute_type_count, attributes.tag_type, attributes.item_id FROM attributes GROUP BY attributes.tag_type, attributes.item_id Is that what you mean?
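To make the suggestion concrete, here's that GROUP BY run against a tiny version of the attributes table in Python's sqlite3 (column names follow the posts; the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attributes (item_id INTEGER, tag_type TEXT, value TEXT)")
conn.executemany("INSERT INTO attributes VALUES (?, ?, ?)", [
    (1, 'color', 'red'), (1, 'color', 'blue'), (1, 'size', 'L'),
    (2, 'color', 'green'),
])

# One row per (tag_type, item_id) pair, with the count of attributes:
rows = conn.execute("""
SELECT COUNT(attributes.item_id) AS item_attribute_type_count,
       attributes.tag_type, attributes.item_id
FROM attributes
GROUP BY attributes.tag_type, attributes.item_id
ORDER BY attributes.item_id, attributes.tag_type
""").fetchall()
```

So item 1 shows 2 color tags and 1 size tag, and item 2 shows 1 color tag - all from the single attributes table, no per-tag-type tables required.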
|
# ? Jan 25, 2008 01:01 |