|
Mackerel, the Thief posted:They... don't... use objects?! Like, at all? Everything procedural?
|
# ¿ Aug 24, 2010 22:30 |
|
|
In a situation where transactional integrity isn't required (I'm not updating bank records here), is there any reason at all to be fancy and go with a multiple-table delete statement in MySQL over the much more understandable and sane method of using two individual delete statements?
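For reference, the two approaches being compared might look like this. Table and column names here are made up for illustration; the original queries didn't survive the archive.

```sql
-- Option 1: a single multiple-table DELETE
DELETE o, i
FROM orders AS o
JOIN order_items AS i ON i.order_id = o.order_id
WHERE o.order_id = 42;

-- Option 2: two plain DELETE statements. Without a transaction,
-- delete the child rows first, so a failure between the two
-- statements never leaves orphaned children.
DELETE FROM order_items WHERE order_id = 42;
DELETE FROM orders WHERE order_id = 42;
```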
|
# ¿ Aug 25, 2010 23:04 |
|
Hammerite posted:I would go for the single statement because to me it seems if anything simpler. (In most situations like this I would expect a single-table delete statement would suffice, as cascading deletes would take care of the rest.) The first one seems simpler than the second?
|
# ¿ Aug 25, 2010 23:35 |
|
Sprawl posted:Why wouldn't you just add a FOREIGN KEY with an on delete to remove the entries from the other table? I was going to say, "Never used it before, but I'll read up on it and find out if we can". Then I discovered it's only supported for InnoDB, and we use MyISAM exclusively. Golbez fucked around with this message at 14:29 on Aug 26, 2010 |
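For anyone following along, Sprawl's suggestion would look something like this (table names invented for the example). Note that MyISAM parses FOREIGN KEY clauses but silently ignores them, which is why the engine matters here.

```sql
-- Requires InnoDB; on MyISAM the constraint is accepted but not enforced.
CREATE TABLE orders (
    order_id INT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE order_items (
    item_id  INT NOT NULL PRIMARY KEY,
    order_id INT NOT NULL,
    FOREIGN KEY (order_id) REFERENCES orders (order_id)
        ON DELETE CASCADE
) ENGINE=InnoDB;

-- Deleting a parent row now removes its child rows automatically:
DELETE FROM orders WHERE order_id = 42;
```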
# ¿ Aug 26, 2010 14:25 |
|
Doctor rear end in a top hat posted:Is there a reason? We were using MyISAM and I switched the tables I use to InnoDB so I could use foreign keys. No problems so far except apparently InnoDB saves some kind of log file that can take up a lot of room if you leave it set at the defaults. 'For the types of data and queries we run, they perform the best'.
|
# ¿ Aug 26, 2010 14:54 |
|
I figure it exists for a reason and use datetime exclusively. The various functions make it worth it in my eyes, particularly the diff ones, though if things aren't set up right you could conceivably have time zone disagreement, which wouldn't be a problem with a timestamp.
|
# ¿ Aug 30, 2010 16:33 |
|
MySQL Table:

user_id | log_time            | status
----------------------------------------
15      | 2010-09-01 10:14:20 | 1
15      | 2010-09-01 10:15:30 | 0
15      | 2010-09-01 10:15:31 | 1
15      | 2010-09-01 10:17:45 | 1
15      | 2010-09-01 10:18:50 | 0

Well, when you do that... let's say you're at 1 at 10:15:30, then click at 10:15:35 (so status is 0), then again at 10:15:35 (so :35's status is now 1). You now have two rows in a row with status 1. I could trim this out in the reporting stage, I figure, but I was wondering if there was an easy way to trim out consecutive rows (like rows 3 and 4 in the above table) in a query?

1) Is there an easy query to trim out situations like this? Like, um... "DELETE FROM table WHERE available = 1 AND next_row.available = 1", and then delete the later row?

2) Is there an easy way to do an insert and a where? Like, "INSERT INTO ... WHERE (SELECT status from last entry) != STATUS", so that I simply don't make the duplicate record to begin with?
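An untested sketch of what question 2 could look like in MySQL, assuming a table named status_log with the columns shown above (the name is a guess, and the literal 15/1 values stand in for the new row):

```sql
-- Insert the new status only if it differs from this user's most
-- recent logged status. COALESCE supplies a sentinel (-1) so the
-- insert still happens when the user has no rows yet.
INSERT INTO status_log (user_id, log_time, status)
SELECT 15, NOW(), 1
FROM DUAL
WHERE 1 <> COALESCE(
    (SELECT status
     FROM status_log
     WHERE user_id = 15
     ORDER BY log_time DESC
     LIMIT 1),
    -1);
```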
|
# ¿ Sep 1, 2010 16:27 |
|
Hammerite posted:I'm confused about the "Status" column. Does it actually store any information beyond "how many times did this user log in during this second"? Maybe this is a case where INSERT IGNORE would be useful. Sorry, I wasn't clear; it's a boolean. i.e., "They marked themselves online at this timestamp" = 1. This is our way to generate a report showing when web chat operators are online.
|
# ¿ Sep 1, 2010 17:08 |
|
McGlockenshire posted:It honestly might be easier to separate the two pieces of data into two separate tables, one to store *only* first-time logins, and one to store *only* activity markers. You can then combine this with INSERT IGNORE and a unique key on the id/timestamp column to blindly fire updates without worrying about the state of the other column. It's simply a marker if they're available for chat, though... so it can only have two states. There's no 'initial' login that's logged. I mean, I figure I can work around how it is now in the reporting; simply find the next entry where it's different, rather than the next entry period. But I was hoping I could clean it up, if not in the insertion, then in a subsequent query.
|
# ¿ Sep 1, 2010 18:33 |
|
MoNsTeR posted:No, because you should use 1-3 character (4 at most) table aliases and refer to all columns as alias.columnname. For example "select u.user_id, e.employee_id from user u, employee e ...". Then standardize which aliases are used for which tables and you will be amazed at how easy it is to dive in and read queries from all across the system. Otherwise your expressions become hopelessly long e.g. some_really_long_table_name.super_long_column_name, etc. Not using aliases, inconsistently using aliases, and mixing aliases with full table name references are all horrible, horrible patterns that you WILL kick yourself for following. I've been told to stop using aliases. And I always use "AS" for column aliases, and always omit it for table aliases.
|
# ¿ Sep 4, 2010 05:15 |
|
They don't let me use nulls in our database because of the lead dev's OCD. Now, I can understand OCD (I hate wasting DB rows, for example), but is there any *real* reason to have every column be created NOT NULL?
|
# ¿ Sep 9, 2010 15:39 |
|
So, let's say an "Address2" field, the second line of an address (as much as I dislike such fields, an address belongs on one line dang it, unless you're separating out number and street into their own fields too in a postal DB) would be blank or null in the case it doesn't exist? Its value is known - their address only has one line, so the second line does not exist. In the past I would have checked this for null, but now you've got me realizing I was probably doing it wrong. On the other hand, in my chat app where it uses an "operator_id" field to see which operator said a line, and is blank if the user said it (since there can be multiple operators but only one user), that would be a valid usage of null.
|
# ¿ Sep 9, 2010 17:19 |
|
Hammerite posted:Or alternatively you could take the view that you need 2 extra tables. Addresses should never have more than one line.
|
# ¿ Sep 10, 2010 03:26 |
|
Hammerite posted:My address has 10 lines and half of them are NULL and half of the rest have newline characters in the middle.

A properly formed American address needs a name, an address (everything on one line - apartment, suite, etc), a city, state, and zipcode. Building name optional. The only time I can justify giving two lines to the user is if they are insistent on giving both a PO Box and a street address, which makes sense if they can't get US mail at their street address and UPS won't deliver to their post office. But in that case, make it clear which one you want.

The problem with giving multiple address lines is the first address line will cease to have semantic meaning - it could be an address, or it could be "Gelman Library", or it could be their backup PO Box, or it could be "Knock on door". Screw that. That's what a comment is for. Nothing should ever be mailed out to an address that hasn't been looked at by human eyes anyway, so. Sometimes a building/firm is useful because of a specific ZIP+4 code, but that should be in a separate field, not shlomped in with 'Address'.

Same rules apply to most foreign addresses as well, though I'll allow multiple lines/textarea entry for those if a format isn't strictly known. Canadian addresses are aggressively one-liners, whereas I remember having some British addresses take up two lines, and Korean addresses even more. But on the other hand, if you're an American company, you should always spend extra time vetting foreign addresses.

Basically, my complaint isn't so much using multiple columns for an address; it's that there are always multiple "address" columns, i.e. the actual street address. There's no reason that my apartment should be expected on a second line, etc. One line for the address, one line for the building/firm, one line for other instructions, etc. Keep the semantics. None of this "address1" "address2" stuff.
|
# ¿ Sep 10, 2010 14:37 |
|
MoNsTeR posted:Well, yes there is, mainly that that's how 99% of people in the US write addresses, and how 99% of web forms for addresses are structured. Line 1 is street address, Line 2 is optionally for apartment/suite number etc. I'm a big advocate of obsessive hyper-normalization, but there are some places where the data structure meets reality and you just have to go with reality.

quote:Addresses are actually a great example for another reason, namely zipcode. A zipcode exists in exactly one state, so your address data shouldn't even have a state field (and nor should your user form) since it's redundant. Zipcodes also partially determine your city, and if you stick to "preferred" records it fully determines your city, so your data and form shouldn't include city either. But how useful is that, really? Now you can't select from your address table and get useful data without joining to a zipcode lookup or invoking functions. And how big of a problem, honestly, is people entering cities and/or states that are inconsistent with their zipcode? So even though our instinct as good data architects is to Normalize Normalize Normalize!, doing so ends up producing a structure that is less usable, without compensating gains in data integrity.

There's the rare situation where someone doesn't know their ZIP code, so entering it should be optional so long as someone enters both city and state. And yes, storing only the ZIP code would require some pretty complex (and outsourced) tables explaining which zip goes to which city, etc. Not to mention the fun times when there's multiple 'acceptable' city names for a ZIP code. Also, ZIP code boundaries can change, so it helps to have the other data, as a proper postal system will figure it out. And ZIP codes can be phased out; looking at usps.com, it looks like the post office finally retired 10048, which was formerly the location of 1-7 World Trade Ctr.

So it makes sense to store all three pieces of data, because the ZIP code is not immutable and is sometimes not available. And if you were talking total normalization, you'd also store the street and address and apartment (and apartment type) separately, to more easily look up the +4. I still see no reason to give people two lines for an address.
|
# ¿ Sep 10, 2010 17:06 |
|
An actual database question, in MySQL. I'm loading an Access table (via CSV) into PHP, then modifying some things and putting it into a MySQL table. I want to change it from this...

log_id | user_id | date | status

So, when I do change the key (through an ALTER TABLE command, rather than dealing with it in the CREATE, then INSERT), there's a duplicate record - so much for my previous statement, right? I chalk it up to Access being a dick about data, and I'm wondering, is there some easy way to fix this, or do I have to manually prune out the duplicate records?
|
# ¿ Sep 10, 2010 17:38 |
|
Hammerite posted:Do you have an idea of how arduous it would be to weed out the duplicates? Find out how many there are and if there are only like half a dozen, gently caress it, do it manually. I think this (untested) should work: I'm assuming only a half dozen, but the problem is, this is somewhat live data. I'm designing this transition based on a snapshot of the Access database from a week ago, but when it comes to the final transition, we've been trying to design it as fire-and-forget, one command one conversion, rather than have to take it down for extra time. I'll look over that one; the idea that popped into my head was to basically wipe the table clean and replace it (via a temporary table) with the results of a SELECT DISTINCT user_id, date, status FROM whatever GROUP BY user_id, date, status.
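The wipe-and-replace idea at the end could be sketched like this (table and column names are illustrative; note that SELECT DISTINCT alone suffices, so the GROUP BY mentioned above is redundant):

```sql
-- Build a deduplicated copy, then atomically swap it into place.
CREATE TABLE log_dedup AS
SELECT DISTINCT user_id, date, status
FROM log;

-- RENAME TABLE swaps both names in a single atomic statement.
RENAME TABLE log TO log_old, log_dedup TO log;
DROP TABLE log_old;
```

One caveat: CREATE TABLE ... AS SELECT does not copy indexes, so any keys would need to be re-added to the new table afterward.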
|
# ¿ Sep 10, 2010 19:33 |
|
Hammerite posted:Actually, now that I think about it, couldn't you declare the three-column primary key to MySQL, leave the log_id column as indexed and auto_increment, then do the inserts using INSERT IGNORE and then drop log_id? Huh. That's simple enough that it might just work. I'll try it out when I touch that one next. (Though, why even keep log_id around? Why not just never create it?)
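Hammerite's suggestion, sketched out (names and types are guesses; log_raw stands for the imported Access data). INSERT IGNORE silently skips rows that would violate the composite primary key, which is what weeds out the duplicates:

```sql
CREATE TABLE log_clean (
    log_id  INT NOT NULL AUTO_INCREMENT,
    user_id INT NOT NULL,
    date    DATETIME NOT NULL,
    status  TINYINT NOT NULL,
    PRIMARY KEY (user_id, date, status),
    KEY (log_id)  -- AUTO_INCREMENT column must be indexed
);

-- Duplicate (user_id, date, status) rows are silently dropped:
INSERT IGNORE INTO log_clean (log_id, user_id, date, status)
SELECT log_id, user_id, date, status FROM log_raw;

-- Once the load succeeds, the surrogate key can go:
ALTER TABLE log_clean DROP COLUMN log_id;
```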
|
# ¿ Sep 10, 2010 19:43 |
|
Hammerite posted:Well, if you can eliminate it from the dump file too, go for it. I was just thinking that MySQL would complain about the extra column in the CSV that it doesn't know about. Oh. ... true. The things I forget when not looking smack at the code.
|
# ¿ Sep 10, 2010 20:08 |
|
What are the limitations of COALESCE? I have this query:

Even weirder(?), when I change it to COALESCE("a", SUM(amount)), Navicat tells me it's returning a blob, but when I save it to disk, the blob is an empty file.

Edit: I found a workaround, I suppose: wrapping the whole thing in COALESCE.
Of course, now I wonder if it's faster to do that (with the subquery), or just run a mysql_num_rows and return '0' if it's 0. And that reeks of premature optimization. But alas, I wonder. Golbez fucked around with this message at 22:32 on Sep 20, 2010 |
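The original queries were lost, but the two shapes being discussed were probably something like this (transactions/amount/user_id are stand-in names). The underlying gotcha: SUM() over zero rows yields NULL, and a grouped query can return no rows at all, in which case COALESCE inside the SELECT list never runs.

```sql
-- Handles "rows exist but amounts are NULL":
SELECT COALESCE(SUM(amount), 0) AS total
FROM transactions
WHERE user_id = 15;

-- Handles "the query returns no row at all" by wrapping the whole
-- thing, which is the workaround described above:
SELECT COALESCE(
    (SELECT SUM(amount)
     FROM transactions
     WHERE user_id = 15
     GROUP BY user_id),
    0) AS total;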
# ¿ Sep 20, 2010 22:10 |
|
In MySQL, I've been using FROM ... LEFT JOIN ... USING (...) style for a while, but I've become very paranoid about using multiple ones. An example:

So what happens if location_id also exists in users? Even if it's equal to the one in companies? Now, before, I would have done the first query above and it would have worked fine, completely unambiguous. But with USING I'm not entirely sure. In a case with any possible ambiguity, should I abandon USING, or is it safe when I know the data is the same?

Another question, less about paranoia and more about how this works. Let's say I'm pulling in things from two tables on the same ID:
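The lost example queries probably looked something like the following, assuming hypothetical tables users(user_id, company_id), companies(company_id, location_id), and locations(location_id):

```sql
-- Explicit ON: every column reference is qualified, so there is
-- no ambiguity even if location_id also appears in users.
SELECT u.user_id, l.location_id
FROM users AS u
JOIN companies AS c ON c.company_id = u.company_id
JOIN locations AS l ON l.location_id = c.location_id;

-- USING: shorter, but the join columns are coalesced into a single
-- output column, which gets murky once the same column name exists
-- in more than two of the joined tables.
SELECT user_id, location_id
FROM users
JOIN companies USING (company_id)
JOIN locations USING (location_id);
```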
|
# ¿ Sep 29, 2010 18:05 |
|
Avarkx posted:If this doesn't throw a syntax error (and I would be seriously surprised if it didn't), go for it. I don't have access to a MySQL instance right now so I can't check this myself or I would, just to see. The manual includes this: quote:This is a conservative extension if we consider each comma in a list of table_reference items as equivalent to an inner join. For example:
|
# ¿ Sep 30, 2010 16:18 |
|
Avarkx posted:USING is a clause with a specific application, namely auto-coalescing common columns for you. Use ON if you don't need the functionality of USING. I figure I should be specific and use ON rather than USING for multiple table issues. Edit: The manual continues, "In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause, CROSS JOIN is used otherwise."
|
# ¿ Sep 30, 2010 17:42 |
|
Avarkx posted:In simple cases, we are talking about the same thing, yes. I'm just attempting to illustrate that in a case involving a possibly ambiguous column, the value of the leading table is favored. That is, the ambiguity is resolved by taking the first non-null value in the common column of the referenced tables (if a value exists, of course). I'm also saying that the USING clause is specific enough for that very reason, but the majority of the time you will be doing yourself (and anybody reading your code later) a service by sticking to the ON clause when the case is in fact, simple. Thanks so much for your analysis.
|
# ¿ Sep 30, 2010 20:18 |
|
Dsan posted:I wanna do this: You could add \ to the double quotes, or just use http://us2.php.net/mysql_real_escape_string.
|
# ¿ Oct 5, 2010 15:56 |
|
Let's say I have a status history table, with statuses like available, not available, fired, etc. Now let's say that sometimes statuses are put in more than once in a row:

User ID | Status   | Date     | Notes
----------------------------------------------------------
1       | AVAIL    | 10/10/10 | Confirmed availability
1       | AVAIL    | 08/01/10 |
1       | NOTAVAIL | 07/02/10 | On extended vacation
1       | AVAIL    | 02/05/09 |

I wouldn't be averse to a solution that also involved PHP. I'm thinking, maybe ... select them ordered by date, then loop through PHP until I hit one that doesn't match the most recent status, then get the previous date? I'd like a SQL solution, but I guess a PHP one could work in the meantime.

Golbez fucked around with this message at 22:28 on Oct 19, 2010 |
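One possible SQL-only approach to "when did the user's current status run actually begin", sketched against a hypothetical status_history table matching the columns above. The idea: find the most recent row whose status differs from the latest status, then take the earliest row after it.

```sql
SELECT MIN(h.date) AS run_started
FROM status_history AS h
WHERE h.user_id = 1
  AND h.date > COALESCE(
      -- most recent row with a different status than the latest one
      (SELECT MAX(h2.date)
       FROM status_history AS h2
       WHERE h2.user_id = 1
         AND h2.status <> (SELECT h3.status
                           FROM status_history AS h3
                           WHERE h3.user_id = 1
                           ORDER BY h3.date DESC
                           LIMIT 1)),
      -- fallback so a user whose status never changed still matches
      '1000-01-01');
```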
# ¿ Oct 19, 2010 22:16 |
|
McGlockenshire posted:Totally doable. I think. This will use MySQL's don't-have-to-GROUP BY-all-the-columns logic, which may not work in not-MySQL databases. You did not specify which SQL server you're using, but you're using PHP, and PHP and MySQL go together like Dumb and Dumber. Thanks, though I'm not really getting it to work, but... immediate last status? What if there's three in a row? Yes, PHP/MySQL, I keep forgetting to mention my DB here.
|
# ¿ Oct 20, 2010 14:49 |
|
A couple of database design questions:

1) Phone numbers (American only, 10 digits) and Social Security Numbers. Should these be stored with or without dashes? Right now I'm doing it with dashes, validating input along the way.

2) Totals. Right now, we monitor transactions of adding or subtracting funds to a person's account. And right now, to get the total funds in an account, I sum up all of their transactions. And I was wondering if perhaps there was a better way of doing it: maintaining it in a field in the user's record. But then I worry about when a transaction is made and, for whatever reason, the user's record isn't updated. MyISAM tables in a MySQL database. Any thoughts, or should I just keep it how it is?

Edit: In a similar vein, the status updates I wrote about earlier. Right now, to find someone's status, I simply grab the most recent status message from the status history table. Should that be stored as well, or kept 'dynamic' like this?

Golbez fucked around with this message at 22:12 on Oct 21, 2010 |
# ¿ Oct 21, 2010 22:04 |
|
Doctor rear end in a top hat posted:I don't store any formatting in the database, and phone numbers are a pain in the rear end especially because there are so many goddamn ways that people format them. If you just store digits, you can preg_replace('/[^\d]/','',$phone) and validate it that way. As for the transactions, if the calculation is trivial, you might as well do it every time. You could make a view that has the calculated total as one of the columns. Internal app, maybe 2 people ever using this system, and I validate numbers on a strict regex, so that's actually not too big an issue. Though, one might say the fact that I have to validate them that strictly is indicative of an issue.
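The view suggested here could be as simple as this (transactions/amount/user_id are illustrative names); the balance stays "dynamic" but callers can query it like a stored column:

```sql
CREATE VIEW account_balances AS
SELECT user_id,
       COALESCE(SUM(amount), 0) AS balance
FROM transactions
GROUP BY user_id;

-- Callers never need to know it's computed:
SELECT balance FROM account_balances WHERE user_id = 15;
```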
|
# ¿ Oct 21, 2010 22:16 |
|
ShoulderDaemon posted:You probably shouldn't be tracking money in a MySQL database, especially one that isn't transactional. Does it make things better or worse that this is replacing an Access application?
|
# ¿ Oct 21, 2010 22:21 |
|
ShoulderDaemon posted:Better. If you're writing a new application, is there any reason you have to be using MyISAM? Could you switch to postgres, or at least InnoDB? MySQL is non-negotiable. InnoDB possibly, but house rules say use MyISAM. Fortunately, no money is moving through the system, so it doesn't have to be transactionally perfect; it's not like I'm debiting from one account and crediting another. I'm importing data from a weekly spreadsheet. It's record keeping and auditing.
|
# ¿ Oct 21, 2010 22:27 |
|
Another question, probably simple but I'm having a mindfart. Let's say I have...

User ID | Status | Date
--------------------------------
1       | AVAIL  | 1/1/2009
2       | AVAIL  | 1/5/2009
2       | NOT    | 2/2/2009
1       | NOT    | 3/3/2009
1       | AVAIL  | 4/4/2009
3       | NOT    | 5/5/2009
3       | AVAIL  | 6/6/2009
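If the goal is each user's most recent status row (which is what the follow-up posts suggest), the classic greatest-per-group pattern is one way to sketch it, with illustrative table names:

```sql
-- Join each user's rows against that user's maximum date.
-- Ties on date are not broken, so two rows with the same date
-- would both come back.
SELECT s.user_id, s.status, s.date
FROM statuses AS s
JOIN (SELECT user_id, MAX(date) AS max_date
      FROM statuses
      GROUP BY user_id) AS latest
  ON latest.user_id = s.user_id
 AND latest.max_date = s.date;
```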
|
# ¿ Oct 21, 2010 22:39 |
|
Hammerite posted:What's the reasoning behind that? AFAIK there are only a few special usage scenarios for which MyISAM is more appropriate than InnoDB. For something like what you are describing I really don't know why you would use MyISAM. Because that's how it's been done and I just got here a few months ago and am still technically a temp? vv Though, since this is a project that is 99% independent from the rest of the site, I could probably get away with it without anyone knowing.
|
# ¿ Oct 21, 2010 22:56 |
|
Sprawl, I have a question based on something from what you did, which looks like it might work but this bit is confusing me. Here's a query:

Edit: And your query isn't working because of said issue. Could it be MySQL's fault?

Edit 2: OK, I swapped the inner query for a simple "select ... order by desc limit 1", and added a "where status = 12" to the outer query, and it's working now. Thanks

Golbez fucked around with this message at 15:47 on Oct 22, 2010 |
# ¿ Oct 22, 2010 14:53 |
|
Sprawl posted:If it wasn't working you were doing it wrong and without a full explanation of what you were doing with exact query and table layout i can't do anything to fix your broken work. Literally copying your query and changing the field names. That was it. Oho, wait - I was misreading your final 'group by id'; I was taking that to mean employee_id, you meant row id. OK, running that... oh hey it works Sorry. So yes, your work was actually perfect and I thank you very much for it.
|
# ¿ Oct 22, 2010 15:58 |
|
MoNsTeR posted:The current status should be stored on the entity to which it pertains. The history of that status should be stored in a separate table, typically trigger-populated (though you could use a procedure if you are allergic to triggers). Why, though? I've generally thought that you shouldn't really store calculated values, only the stuff they were calculated from, and the 'latest' entry in a history could qualify as that. (Now, if I were storing every single change, yeah, that wouldn't make sense, but that's more of a "history of the record" thing, rather than a record of history. Does that make sense?)
|
# ¿ Oct 22, 2010 16:46 |
|
MoNsTeR posted:You're not storing every change? That doesn't make any sense. By every change, I meant along the lines of, "On timestamp X, they changed the address from this to this. On timestamp Y, they changed this minor field in this minor table to this", etc. It would make no sense to generate a record from that. But generate a sum from a list of sums, or generate "current status" from the most recent status update, where past statuses do matter? However, because of other new requirements, I have to end up generating dates from the most recent status, so the whole 'not storing a calculated value' thing is out the window.
|
# ¿ Oct 23, 2010 08:30 |
|
Best practices: Storing names. Right now, we store last name and first name. But I'm seeing some problems and inconsistencies. For example, sometimes there's a middle initial tacked to first name, and sometimes it even has a period after it. Sometimes "Jr." is after the last name, sometimes the first name. This obviously is not good. What's the best way of storing this stuff? I don't need too much, and only ever search on first and last name; middle, etc. is irrelevant for everything but reports. I was wondering if this seemed reasonable:

LastName
FirstName
FullName

It seems a little off to demand people enter or change a name twice, but on the other hand, a name is not something that changes often. People may still enter middle initials without a period but that's less a worry. And there may be inconsistency with how they handle Jr. (with or without a comma? with or without a period?) But, at least the searchable data itself is clean.

Another option is:

LastName
FirstName
MiddleName
Suffix

The middle name could enforce a period if there's only one letter in it, and the suffix could be populated from a dropdown which would ensure the proper formatting.

Then I wonder if I'm making a big deal out of nothing. As stated, the name outside of the basics is for display only, but I'd like our display to be nice.
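The second option as DDL, for concreteness (table name, types, and the suffix list are all guesses; the ENUM keeps suffix formatting consistent, matching the dropdown idea above):

```sql
CREATE TABLE people (
    person_id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    last_name   VARCHAR(50) NOT NULL,
    first_name  VARCHAR(50) NOT NULL,
    middle_name VARCHAR(50) NOT NULL DEFAULT '',
    -- dropdown-backed suffix; '' means no suffix
    suffix      ENUM('', 'Jr.', 'Sr.', 'II', 'III', 'IV')
                NOT NULL DEFAULT ''
);
```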
|
# ¿ Oct 28, 2010 15:54 |
|
Turned out that the double statuses were in fact an error and we'll be regenerating all 2200 records. Thanks for looking at it, though.
|
# ¿ Oct 29, 2010 14:40 |
|
|
Hammerite posted:What's the reasoning behind that? AFAIK there are only a few special usage scenarios for which MyISAM is more appropriate than InnoDB. For something like what you are describing I really don't know why you would use MyISAM. Coming back to this four months later after I tried again, it's mainly because all of our systems, such as backup, are based around MyISAM.
|
# ¿ Mar 4, 2011 21:37 |