|
jwnin posted: As a result, our development team thinks that the best thing to do is to split each customer out into its own database. The catch is that there are around 20,000 customers per year, and each customer database would be about 100 MB in size.

The more disparate databases you have, the more work you'll have to do when it comes to setup, backup and restore, and ensuring schema changes are propagated to each database. And it's utter hell to do any sort of report that requires data from multiple databases. Depending on the complexity, a cross-database report would have to extract the necessary info from each database, insert it into a temporary database, and then run the report. That's very slow. I speak from experience: our company had a database built around our product, and then management surprised us by announcing a new product. We didn't have time to rejig the database to support multiple products, so we simply duplicated the database schema and kept them separate, hoping that some day we'd be able to merge the two. It's caused us a load of grief and time and I wish we'd kept it all in one.
|
# ¿ Nov 9, 2007 02:19 |
|
|
Argue posted:I'm using Oracle. What's the best way to do something similar to WHERE name IN (a set of values), except instead of checking for equality, we use the LIKE keyword? It would end up being functionally the same as WHERE name LIKE val1 OR name LIKE val2 OR name LIKE val3, except the number of right hand values can change. Currently, what I'm doing is pre-preparing the SQL by iterating over the list of possible names (which are not retrieved from the DB) and appending "OR name LIKE ?" each time. code:
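The approach described — appending one "OR name LIKE ?" per value — can be sketched like this (a hypothetical illustration using Python's built-in sqlite3 rather than the thread's Oracle/PHP; table and column names are made up):

```python
import sqlite3

def find_names_like(conn, patterns):
    # One placeholder per pattern, so the list can change length
    # without splicing user-supplied values into the SQL string.
    clause = " OR ".join("name LIKE ?" for _ in patterns)
    sql = f"SELECT name FROM people WHERE {clause} ORDER BY name"
    return [row[0] for row in conn.execute(sql, patterns)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)",
                 [("alice",), ("bob",), ("carol",)])
matches = find_names_like(conn, ["a%", "b%"])  # alice and bob
```

If the pattern list could ever be empty you'd need a guard, since the generated WHERE clause would be empty too.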
|
# ¿ Nov 9, 2007 08:38 |
|
Imagine that you want to enter the user information into an Excel spreadsheet, like so: code:
So if you want to look up the information for user "bob", then you might use the SQL: code:
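As a sketch of that lookup (hypothetical column names; Python's sqlite3 used for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, realname TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('bob', 'Bob Smith', 'bob@example.com')")
# Each spreadsheet row becomes a table row; the WHERE clause picks out bob's.
row = conn.execute(
    "SELECT realname, email FROM users WHERE username = 'bob'").fetchone()
```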
minato fucked around with this message at 10:17 on Nov 12, 2007 |
# ¿ Nov 12, 2007 10:12 |
|
nbv4 posted: One more question. Is there a guide somewhere that helps you determine which data type is the best for certain kinds of data? For instance, what's the best type to use to store a user's email address? What if you have a value that only has a small number of values? Such as, for example, car manufacturer. Let's say there are only 5 possible manufacturers. Would it be best to store that data as either "1,2,3,4,5", then have each number correlate to a manufacturer, or would it be best to just store it as a string? Or what about setting up 5 different boolean rows for each manufacturer? Would it matter which way you did it?

To store a car manufacturer, there are lots of ways you could do it, and there are pros and cons to each way. If you chose to use a string and store it directly (e.g. "Chrysler"), then it's easy to just browse the car rows and see which car corresponds with which manufacturer. However most people will use integers because it's more space efficient, faster for the database to process, less prone to spelling/case errors, and less effort to update if the name ever changes (say, due to a typo). Most people would set it up like this: code:
|
# ¿ Nov 12, 2007 11:34 |
|
Not closing the </form>?
|
# ¿ Nov 14, 2007 07:32 |
|
You need to put the parameters into quotes. Try something like this:
php:
<?
$sql = sprintf(
    "SELECT * FROM pilots WHERE email='%s' AND password='%s'",
    mysql_real_escape_string($email),
    mysql_real_escape_string(md5($password))
);
$result = mysql_query($sql);
?>
|
# ¿ Nov 19, 2007 05:57 |
|
Opius posted:However, instead of just finding the items with the most 'yay' votes, I'd like to find the ones with the highest ratio of 'yay' votes to 'nay' votes. code:
Edit: Jethro's solution is better, replace my count(vote) with "sum(case when vote='yay' then 1 else 0 end)".
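Putting the two ideas together — Jethro's SUM(CASE...) counting the yays, divided by the total vote count — a sketch might look like this (sqlite3 with a made-up schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE votes (item_id INTEGER, vote TEXT);
INSERT INTO votes VALUES (1,'yay'),(1,'yay'),(1,'nay'),
                         (2,'yay'),(2,'nay'),(2,'nay');
""")
rows = conn.execute("""
    SELECT item_id,
           SUM(CASE WHEN vote = 'yay' THEN 1 ELSE 0 END) * 1.0
             / COUNT(vote) AS yay_ratio
    FROM votes
    GROUP BY item_id
    ORDER BY yay_ratio DESC
""").fetchall()
```

Dividing by the total count rather than the nay count sidesteps division by zero for items with no nays, while still sorting the same way.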
|
# ¿ Dec 3, 2007 01:33 |
|
That's what you get for choosing the Entity-Attribute-Value model, which many people consider to be a design anti-pattern because of its many drawbacks. You're either going to have to do a JOIN per column, or a sub-select like you're doing now.
|
# ¿ Dec 12, 2007 02:38 |
|
fletcher posted: So does that mean the views column shouldn't be updated every time a unique user views the article?

So there's a trade-off. If your "views" column is being updated very frequently (like, you're Slashdot or something), then adding an index to that column will slow down your DB a little bit, but the upside is that your report to get the top 100 articles sorted by views will be very quick. You'd have to have a pretty heavy-traffic site to notice any slowdown though. In the event that you did get unacceptable slowdown, the solution would be to sacrifice accuracy for speed. Either:
- Have two "views" columns, one indexed and one unindexed. Keep the unindexed one up-to-date and accurate, and periodically (via a cronjob) copy over this value to the indexed column.
- Have one unindexed "views" column, but cache the results of the top 100 articles so that you don't take the performance hit every time.
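The first option (two columns, with a cronjob doing the copy) could be sketched like so (sqlite3 for illustration; column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (
    id           INTEGER PRIMARY KEY,
    views        INTEGER NOT NULL DEFAULT 0, -- bumped on every hit, unindexed
    views_cached INTEGER NOT NULL DEFAULT 0  -- refreshed periodically, indexed
);
CREATE INDEX article_views_cached_idx ON article (views_cached);
INSERT INTO article (id, views) VALUES (1, 500), (2, 120);
""")
# What the cronjob would run: copy the live counter into the indexed column.
conn.execute("UPDATE article SET views_cached = views")
top = conn.execute(
    "SELECT id FROM article ORDER BY views_cached DESC LIMIT 100").fetchall()
```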
|
# ¿ Dec 12, 2007 10:16 |
|
Dakha posted: This is probably really easy to fix, but I'm trying to select a list of material numbers and their corresponding text descriptions. However, the table with the text descriptions has several entries for each material number, for different languages. I want to get the German description, unless it doesn't exist, in which case I want the English description. code:
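One way to express the "German, else English" fallback is a pair of outer joins plus COALESCE. A sketch with invented table names (sqlite3 here; the same shape carries over to other databases):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE material (matnr INTEGER PRIMARY KEY);
CREATE TABLE material_text (matnr INTEGER, lang TEXT, descr TEXT);
INSERT INTO material VALUES (1), (2);
INSERT INTO material_text VALUES
    (1, 'DE', 'Schraube'), (1, 'EN', 'screw'), (2, 'EN', 'widget');
""")
rows = conn.execute("""
    SELECT m.matnr,
           COALESCE(de.descr, en.descr) AS descr  -- German wins if present
    FROM material m
    LEFT JOIN material_text de ON de.matnr = m.matnr AND de.lang = 'DE'
    LEFT JOIN material_text en ON en.matnr = m.matnr AND en.lang = 'EN'
    ORDER BY m.matnr
""").fetchall()
```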
|
# ¿ Dec 12, 2007 19:45 |
|
I have a general SQL table design question. Let's say I have a table that stores "application events". The table needs to refer to the source of the event, and these sources may be of various different types, from various unrelated sub-systems of the application. E.g.
- Forum thread X was updated
- File Y was uploaded
- User Z was added
I'm trying to work out the best way of designing a table to store the reference to the source, while using a FK constraint as well (so that deleting the source of the event would also delete the event).

Option 1) 3rd-normal form ahoy!
-----------------------------
code:
Pros:
+ More space efficient (to an extent)
+ More correct database normalization
Cons:
- Suckier to work with (requires JOINs, more complex collation code)
- Small performance hit from the FK constraints on event.event_id
- A new table is required to add a new type.

Option 2) Use a single table:
-----------------------------
code:
Pros:
+ Simple to understand
+ Means only searching one table, less schema management
+ A new column is all that's required to add a new type.
Cons:
- Should implement a row "CHECK" to ensure that exactly 1 of the event-specific fields was chosen.
- A little more code required to identify which type of event this is.

Option 3) Store a "type" and ID
-----------------------------
code:
Pros:
+ Succinct and simple
Cons:
- Not possible to use FK constraints on "id".
- Alarmingly like the "Entity-Attribute-Value" anti-pattern.

In the past I've implemented Option 1, like a good little database normalizer. However the clunkiness of doing so has led me to want to implement Option 2. The rest of my team (not very DB-design-savvy) is advocating Option 3, which to me is right out because it would take God himself to pry FK constraints from my dead hands. Has anyone had any experience of which approach works best?
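For what it's worth, Option 2 with the row "CHECK" and cascading FKs could be sketched like this (sqlite3 for illustration; note sqlite only enforces FKs once the pragma is switched on, and all names here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite needs this per connection
conn.executescript("""
CREATE TABLE forum_thread (id INTEGER PRIMARY KEY);
CREATE TABLE file         (id INTEGER PRIMARY KEY);
CREATE TABLE app_user     (id INTEGER PRIMARY KEY);

-- Option 2: one event table with a nullable FK per source type,
-- plus a CHECK that exactly one of them is set.
CREATE TABLE event (
    id        INTEGER PRIMARY KEY,
    thread_id INTEGER REFERENCES forum_thread(id) ON DELETE CASCADE,
    file_id   INTEGER REFERENCES file(id)         ON DELETE CASCADE,
    user_id   INTEGER REFERENCES app_user(id)     ON DELETE CASCADE,
    CHECK ( (thread_id IS NOT NULL) + (file_id IS NOT NULL)
          + (user_id IS NOT NULL) = 1 )
);
""")
conn.execute("INSERT INTO forum_thread (id) VALUES (1)")
conn.execute("INSERT INTO event (thread_id) VALUES (1)")
# Deleting the source deletes the event via the FK cascade.
conn.execute("DELETE FROM forum_thread WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
```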
|
# ¿ Jan 16, 2008 05:28 |
|
The events system was actually just an example to couch the general principle in and get it across, I'm not actually writing an events system. In my particular case, the FKs will not introduce significant overhead. I was more concerned about what people thought of breaking 3rd-normal form in order to simplify the matter of referring to various different types, and whether people had actually used the EAV anti-design pattern I mentioned in Option 3 and regretted it. Thanks for the comments so far!
|
# ¿ Jan 16, 2008 14:24 |
|
m5 posted: Example: if you have a column in a table that references another table, and you put a FK constraint on that, you'd better put an index on it too - even if you have no reason to do so. Why? Because if you don't, when you delete rows from the target table the server will have no choice but to do a table scan of the other table to make sure you're not violating the constraint.

m5 posted: I believe that FK constraints are a good idea when appropriate and when you can afford them, but to blindly throw them into your schema whenever you have a cross-table reference is not wise.

However like all principles, they sometimes need to be violated in the name of performance.
|
# ¿ Jan 17, 2008 03:11 |
|
Victor posted: At least, where I work, sometimes we go without constraints, but have nightly integrity checks that run to ensure there are no orphaned records.

Currently I'm not sure of either of those things in my own app, which is why I'd be reluctant to do it this way. Xae, I see your point of view, but you seem to be under the assumption that you can throw enough hardware and money at any scalability problem in order to fix it. I don't think that's always the case, and it isn't always practical. I'd love it if I could get better performance out of my app just by putting it on bigger iron.
|
# ¿ Jan 17, 2008 05:35 |
|
That's exactly how you'd do it. That query would work fine.
|
# ¿ Jan 21, 2008 16:46 |
|
fletcher posted: I understand the idea of separating logic in a model-view-control system, but I don't understand the concept of separating the SQL with a templating system. I'm working in PHP, where can I find more info about how and why to do it?

What do you mean by SQL templates exactly? I can interpret what you said in 3 ways:
1) Making a parameterized SQL query, then using "prepare(query)" and "execute(query_handle, parameters)" to actually execute it.
2) A set of common-place SQL tasks which have already been written for you, where you just fill in the blanks. Kind of like a sample web-site layout.
3) Abstracting persistent record information away from the main code by using some sort of Persistence Layer (also known as ActiveRecord).
Or did you mean something else?
|
# ¿ Jan 30, 2008 01:34 |
|
jwnin posted: I cannot agree more with this. I get uncomfortable with ORM after a certain load.

We use ORM via a bespoke Persistence Layer mechanism, and for the most part it is great. It's so convenient to be able to write code like this:
php:
<?
$org_name = User::getByOID($user_id)->getOrganization()->getName();
?>
This kind of code is very clear, there's no SQL involved so it's more maintainable and there's less for the app developer to understand, and it's very easy to write. But where it falls down is that it's very difficult to optimize without breaking the paradigm. Say I wanted to do this kind of thing in a loop:
php:
<?
foreach ($user_id_array as $user_id) {
    $org_names[] = User::getByOID($user_id)->getOrganization()->getName();
    ...
}
?>
My experience seems to jibe with jwnin's above, in that some of my co-developers are only willing to use ORM and will go to any lengths to avoid writing SQL (because they're not very good at it), rather than seeing ORM as just a convenient tool that should be abandoned once performance gets ridiculously slow.
|
# ¿ Jan 30, 2008 04:42 |
|
fletcher posted: Let's say you have products in one table and votes for the products in another table. How do you get the products and total votes for each product? code:
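The usual shape is a LEFT JOIN plus GROUP BY — LEFT rather than inner so that products with zero votes still show up. A sketch with made-up columns (sqlite3 for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE vote (product_id INTEGER);
INSERT INTO product VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO vote VALUES (1), (1);
""")
rows = conn.execute("""
    SELECT p.id, p.name, COUNT(v.product_id) AS total_votes
    FROM product p
    LEFT JOIN vote v ON v.product_id = p.id
    GROUP BY p.id, p.name
    ORDER BY p.id
""").fetchall()
```

COUNT(v.product_id) rather than COUNT(*) is what makes the zero-vote row come out as 0 instead of 1.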
|
# ¿ Feb 4, 2008 06:05 |
|
nbv4 posted: I have a PHP script that makes a few hundred mysql queries, then executes them one by one. All of these queries are INSERT queries. The way I have it now, if one of those queries fails, it's a pain in the rear end to go through and fix whatever problem may have occurred. I want to have it so the system does a dry run through each query, and only if every single query is valid does it execute them all. Is there any way to test a query against the database? It's probably very simple, but Googling gives me nothing.

Do you mean "Don't fail if the record is already there"? Because to do that, you can always do: code:
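If that's the goal, the "don't fail on duplicates" form can be sketched like this (sqlite spells it INSERT OR IGNORE; MySQL's equivalent is INSERT IGNORE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'first')")
# The duplicate key is silently skipped instead of raising an error,
# and the existing row is left untouched.
conn.execute("INSERT OR IGNORE INTO t VALUES (1, 'duplicate')")
count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
val = conn.execute("SELECT val FROM t WHERE id = 1").fetchone()[0]
```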
minato fucked around with this message at 11:33 on Feb 7, 2008 |
# ¿ Feb 7, 2008 11:24 |
|
nbv4 posted: the user uploads a file that is transformed into about 600 queries. If one line of that data is not right for whatever reason, only 599 lines will get entered, and it'll be absolute hell to get things right. I need some way to ensure 100% of the queries are correct, or else it tells the user his data needs looking over.

You also sound like you might want to learn about transactions. That's where you run a "BEGIN" statement to take a snapshot of the database, run a bunch of SQL statements, and if any one of them fails you can "ROLLBACK" to the state of the database when you took the snapshot. If they all work and you're happy with it, you run "COMMIT" and it saves them all. (Edit: transactions are only supported in relatively recent versions of MySQL, I don't think 4 supports them.) But really, you should be validating all data before it gets anywhere near the database, before beginning any inserts.
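The BEGIN / ROLLBACK / COMMIT dance described above, sketched in Python's sqlite3 (a failing third row stands in for the one bad line out of 600):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage BEGIN/COMMIT/ROLLBACK by hand
conn.execute("CREATE TABLE upload (line INTEGER PRIMARY KEY)")
conn.execute("BEGIN")
try:
    for line in [1, 2, 2]:      # the third row violates the primary key
        conn.execute("INSERT INTO upload VALUES (?)", (line,))
    conn.execute("COMMIT")
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")    # none of the rows are kept
count = conn.execute("SELECT COUNT(*) FROM upload").fetchone()[0]
```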
|
# ¿ Feb 7, 2008 11:34 |
|
I've been using Postgres for the last 6-7 years, and I can't fault it. It's extremely robust, it's fully-featured, and the documentation is excellent. I can't compare performance but all the benchmarks I've seen put it equal to (if not better than) MySQL. My mind boggled when I found out that MySQL only relatively recently introduced transactions. There may be certain situations where MySQL is better, but I think most people would be very happy with Postgres.
|
# ¿ Mar 1, 2008 04:15 |
|
UltraRed posted:Why don't I want to store passwords using an md5() or SHA1() hash? What's insecure about them that makes everyone suggest going elsewhere?
|
# ¿ Mar 5, 2008 03:55 |
|
fletcher posted:Is there a way I can validate a user id actually exists before I do something like relating a row in a different table to it? In PostgreSQL it looks something like this: code:
The "ON UPDATE CASCADE" bit means that if the user ID gets updated to a different value, the photo table's user_id will be updated too. The "ON DELETE CASCADE" bit means that if a user ever gets deleted from the user table, their rows in the photo table will be automatically removed too.
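A sketch of such a constraint in action (sqlite3 for illustration; note sqlite only enforces FKs once the pragma is on, and the column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE photo (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL
        REFERENCES users(id) ON UPDATE CASCADE ON DELETE CASCADE
);
INSERT INTO users (id) VALUES (1);
INSERT INTO photo (user_id) VALUES (1);
""")
# Inserting a photo for a nonexistent user is rejected outright...
try:
    conn.execute("INSERT INTO photo (user_id) VALUES (99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
# ...and deleting the user removes their photos too.
conn.execute("DELETE FROM users WHERE id = 1")
photos = conn.execute("SELECT COUNT(*) FROM photo").fetchone()[0]
```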
|
# ¿ Mar 10, 2008 03:27 |
|
Orbis Tertius posted: I want to have a form that lets the user search a number of variables that are in several different tables, and I want the resulting table to only display columns for the fields that were filled in on the form.

This is not quite so simple, because it's necessary to keep track of which tables are required for the JOINs and SELECTs. Here's a very simple example, and it helps that we always use table "a" in the query:
php:
<?
$from_tables = array();
$selects     = array("a.*");
$filters     = array("true");
if ($_REQUEST['some_b_param']) {
    $from_tables[] = "JOIN b ON (a.id = b.a_id)";
    $selects[]     = "b.*";
    $filters[]     = "b.foo = ".escape_sql($_REQUEST['some_b_param']);
}
if ($_REQUEST['some_c_param']) {
    $from_tables[] = "JOIN c ON (a.id = c.a_id)";
    $selects[]     = "c.*";
    $filters[]     = "c.bar = ".escape_sql($_REQUEST['some_c_param']);
}
$sql = "SELECT ".implode(", ", $selects)
     . " FROM a ".implode(" ", $from_tables)
     . " WHERE ".implode(" AND ", $filters);
?>
|
# ¿ Mar 10, 2008 03:34 |
|
nbv4 posted: I have a SQL problem that I can't even figure out where to begin. I need to get the date of the third-from-last occurrence of a record. Say this is the table: code:
But that won't work in all cases: for example, if you wanted the 5th-to-last record (ID 1 in my example), you'll get nothing back because there's no total with id 5. What you want is the row where the total is either the position you want, or failing that, the next number after that. So let's tweak the SQL again: code:
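A sketch of the tweaked version — keep every row whose running total is at least the position you want, then take the smallest such total (sqlite3, with invented data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (id INTEGER PRIMARY KEY, date TEXT);
INSERT INTO records VALUES
    (1,'2008-03-01'),(2,'2008-03-02'),(3,'2008-03-03'),
    (4,'2008-03-04'),(5,'2008-03-05');
""")
n = 3  # we want the 3rd-from-last date
row = conn.execute("""
    SELECT r.date,
           (SELECT COUNT(*) FROM records r2 WHERE r2.date >= r.date) AS total
    FROM records r
    WHERE (SELECT COUNT(*) FROM records r2 WHERE r2.date >= r.date) >= ?
    ORDER BY total
    LIMIT 1
""", (n,)).fetchone()
```

The ">= n, order by total, limit 1" combination is what makes it fall through to the next available position when the exact one doesn't exist.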
|
# ¿ Mar 17, 2008 08:11 |
|
You might want to order by total, then by date, to avoid cases where there are some dates with 0 approaches.
|
# ¿ Mar 17, 2008 08:33 |
|
Isn't it as simple as: code:
|
# ¿ Mar 19, 2008 02:42 |
|
fletcher posted:Ah, that is what I needed. So what about the performance issue? It takes ~4s to run this query.
|
# ¿ Mar 21, 2008 06:46 |
|
Clobbersaurus posted:I have kind of a general but easy newb question -
|
# ¿ Mar 28, 2008 19:58 |
|
I can't think of a way to do it without a subquery: code:
|
# ¿ Apr 2, 2008 04:52 |
|
"FROM foo JOIN bar ON (condition)" is equivalent to "FROM foo, bar WHERE (condition)". The main reason for JOIN's existence is to help separate the table relationships from the rest of the filtering conditions. With regards to indexing, this is what the EXPLAIN statement is for (if you can interpret the often cryptic results). A general rule of thumb is that if the DB engine has to perform a sequential scan on a large table, then you probably want an index on the column(s) it's using. Don't forget about multi-column indexes. E.g. if you have a table like "student_class (student_id, class_id, ...)" and you're looking up by both student_id and class_id simultaneously, then it may be faster to add an index on (student_id, class_id) than to have two separate single-column indexes. Adding indexes on ORDER BY columns generally hasn't helped at all in my experience.
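A small illustration of the multi-column index plus EXPLAIN (sqlite3 here, where the statement is spelled EXPLAIN QUERY PLAN; table names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_class (
    student_id INTEGER,
    class_id   INTEGER,
    grade      TEXT
);
CREATE INDEX student_class_idx ON student_class (student_id, class_id);
INSERT INTO student_class VALUES (1, 10, 'A'), (1, 11, 'B'), (2, 10, 'C');
""")
# The plan output shows whether the composite index is used
# instead of a full table scan.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT grade FROM student_class
    WHERE student_id = 1 AND class_id = 11
""").fetchall()
plan_text = " ".join(row[-1] for row in plan)
```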
|
# ¿ Apr 3, 2008 02:52 |
|
fletcher posted:If you have a users table and an articles table, what do you guys name the column that links the article to the user? 'userId' or something more descriptive like 'authorId'?
|
# ¿ Apr 8, 2008 06:24 |
|
fletcher posted: Why does adding an index on one column speed up one SELECT query a lot while it slows down another SELECT query? Or should that not happen, and something else is causing it?

Indexes should only improve the speed of SELECTs (though they may have adverse effects on the speed of INSERTs, UPDATEs and DELETEs). If adding an index is slowing a SELECT down, then I would guess that the query planner is stupidly using the index when a sequential scan would be better (which is the case when the number of rows in the table is low). Not much you can do about that.
|
# ¿ Apr 15, 2008 08:00 |
|
Do not go for Option 1, ever. Horrible search performance, horrible update logic, no referential integrity, almost impossible to get aggregate statistics efficiently... the list goes on. Option 2 is normalized. It's only unnormalized if you're considering having an integer tag_id to go with every tag name, and there may not be much point in that since it's just a 1-1 relationship. So I don't see too much difference between Option 2 and 3. Are your tags case-sensitive? If not, you'll have to add "lower(tagname)" all over the place. This might affect indexing too - if your DB supports indexing on expressions, index on "lower(tagname)" rather than just tagname.
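Where the DB supports indexing on expressions, the lower(tagname) idea looks like this (a sqlite3 sketch with invented names; Postgres uses the same shape of CREATE INDEX):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (item_id INTEGER, tagname TEXT)")
# An expression index lets case-insensitive lookups use the index directly,
# instead of scanning and lowercasing every row.
conn.execute("CREATE INDEX tag_lower_idx ON tag (lower(tagname))")
conn.executemany("INSERT INTO tag VALUES (?, ?)",
                 [(1, "SQL"), (2, "sql"), (3, "Python")])
rows = conn.execute(
    "SELECT item_id FROM tag WHERE lower(tagname) = lower(?) ORDER BY item_id",
    ("Sql",)).fetchall()
```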
|
# ¿ Apr 30, 2008 04:49 |
|
It's kinda tricky, but it is possible. The way I'd do it is to assign each factor/item combination a rank, and then choose only those with a rank <= 5. Let's say for now that you don't want to randomize items, and that instead you want the 5 latest items (ordered by the "date" field). Your SQL might look like this: code:
I don't know if MySQL allows you to re-use a SELECT sub-query in the WHERE clause; you might have to duplicate the sub-query in the WHERE clause where I've put "rank". So that gets you 5 rows per factor, but they'll always be the same 5 rows. The next step is to randomize which rows in items are ranked. That's a tricky one, since SQL doesn't really work like that. You have no control over the order in which the SQL expressions will be evaluated over the rows. SQL usually relies on its expressions being deterministic (an input of X results in an output of Y no matter what point in time it's called). So I guess you really want a function such that, given an item and a random seed, it will hash the two together to generate a stable pseudo-random value that's "rankable", so you can draw out the top 5 items. Perhaps md5("random string" + item.id) might be enough (where "random string" is constant within the query, but changes every time you make the SQL call).
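A sketch of the whole scheme — rank within each factor by a seeded hash and keep rank <= N (here N=2, with the md5 supplied to sqlite3 as a custom SQL function since sqlite has no built-in md5; all names are invented):

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# seeded_md5(seed, id): stable within one query, reshuffled by a new seed.
conn.create_function(
    "seeded_md5", 2,
    lambda seed, item_id: hashlib.md5(f"{seed}{item_id}".encode()).hexdigest())
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, factor TEXT);
INSERT INTO items (id, factor) VALUES
    (1,'a'),(2,'a'),(3,'a'),(4,'b'),(5,'b');
""")
seed = "some random string"  # change on every call to reshuffle the picks
rows = conn.execute("""
    SELECT i.factor, i.id
    FROM items i
    WHERE (SELECT COUNT(*) FROM items i2
           WHERE i2.factor = i.factor
             AND seeded_md5(?, i2.id) <= seeded_md5(?, i.id)) <= 2
    ORDER BY i.factor, i.id
""", (seed, seed)).fetchall()
```

The correlated sub-query counts how many items in the same factor hash at or below this one, i.e. its rank under the current seed.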
|
# ¿ May 14, 2008 06:41 |
|
That works for me in PostgreSQL. Are you sure you've spelled the column name correctly? If the column name is the same as a Postgres keyword, try putting it in quotes (e.g. "A.columnname").
|
# ¿ Jun 17, 2009 19:06 |
|
What database are you using? If you're using Postgres you can use something like: code:
|
# ¿ Jun 29, 2009 22:26 |
|
What's that FROM clause doing on line 6?
|
# ¿ Jul 2, 2009 23:30 |
|
Use a sub-select? code:
|
# ¿ Jul 17, 2009 03:33 |
|
|
Use a sub-select, like: code:
If you're using Postgres 8.4 you can use Common Table Expressions, but that might be overkill.
|
# ¿ Aug 6, 2009 17:25 |