Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Mackerel, the Thief posted:

They... don't... use objects?! Like, at all? Everything procedural? :aaa:

:smith:

Adbot
ADBOT LOVES YOU

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
In a situation where transactional integrity isn't required (I'm not updating bank records here), is there any reason at all to be fancy and go with a multiple-table delete statement in MySQL over the much more understandable and sane method of using two individual delete statements?
code:
DELETE lists
      ,list_contacts
  FROM lists
      ,list_contacts
 WHERE lists.list_id = 15
   AND list_contacts.list_id = 15;
versus
code:
DELETE FROM lists WHERE list_id = 15;
DELETE FROM list_contacts WHERE list_id = 15;

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Hammerite posted:

I would go for the single statement because to me it seems if anything simpler. (In most situations like this I would expect a single-table delete statement would suffice, as cascading deletes would take care of the rest.)

The first one seems simpler than the second? :confused:

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Sprawl posted:

Why wouldn't you just add a FOREIGN KEY with an on delete to remove the entires from the other table?

I was going to say, "Never used it before, but I'll read up on it and find out if we can".

Then I discovered it's only supported for InnoDB, and we use MyISAM exclusively.

Golbez fucked around with this message at 14:29 on Aug 26, 2010

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Doctor rear end in a top hat posted:

Is there a reason? We were using MyISAM and I switched the tables I use to InnoDB so I could use foreign keys. No problems so far except apparently InnoDB saves some kind of log file that can take up a lot of room if you leave it set at the defaults.

'For the types of data and queries we run, they perform the best'.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
I figure it exists for a reason and use datetime exclusively. The various functions make it worth it in my eyes, particularly the diff ones, though if things aren't set up right you could conceivably have time zone disagreement, which wouldn't be a problem with a timestamp.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
MySQL Table:
pre:
user_id   | log_time            | status
----------------------------------------
15        | 2010-09-01 10:14:20 | 1
15        | 2010-09-01 10:15:30 | 0
15        | 2010-09-01 10:15:31 | 1
15        | 2010-09-01 10:17:45 | 1
15        | 2010-09-01 10:18:50 | 0
The gist of it, it logs when someone logs in. Vastly simplified, but here we are. Previously, all three columns were the primary key, but I ran into a problem when someone clicked the button more than twice a second - primary key duplication. I debated using a REPLACE query or INSERT...ON DUPLICATE KEY UPDATE, and went with the latter. So, I've switched the key to just the first two columns and set the query so that, on duplicate key, it just changes the status. This has the effect of, if they've clicked the button twice in one second, it will overwrite the previous entry for that status. That is to say, if there was a 10:15:30 with status 0, and then they clicked again, 10:15:30's entry would be set to 1. I figure, you're losing nothing here; if someone clicks twice in one second, why store both of them, just store the final result?

Well, when you do that... let's say you're at 1 at 10:15:30, then click at 10:15:35 (so status is 0), then again at 10:15:35 (so :35's status is now 1). You now have two rows in a row with status 1. I could trim this out in the reporting stage, I figure, but I was wondering if there was an easy way to trim out consecutive rows (like 3 and 4 in the above table) in a query?

1) Is there an easy query to trim out situations like this? Like, um... "DELETE FROM table WHERE available = 1 AND next_row.available = 1", and then delete the later row?

2) Is there an easy way to do an insert and a where? Like, "INSERT INTO ... WHERE (SELECT status from last entry) != STATUS", so that I simply don't make the duplicate record to begin with?

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Hammerite posted:

I'm confused about the "Status" column. Does it actually store any information beyond "how many times did this user log in during this second"? Maybe this is a case where INSERT IGNORE would be useful.

Sorry, I wasn't clear; it's a boolean. i.e., "They marked themselves online at this timestamp" = 1. This is our way to generate a report showing when web chat operators are online.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

McGlockenshire posted:

It honestly might be easier to separate the two pieces of data into two separate tables, one to store *only* first-time logins, and one to store *only* activity markers. You can then combine this with INSERT IGNORE and a unique key on the id/timestamp column to blindly fire updates without worrying about the state of the other column.

It's a workaround and not an elegant solution. It reduces one area of complexity (entering data) while increasing another area of complexity (reporting on both sets).

It's simply a marker if they're available for chat, though... so it can only have two states. There's no 'initial' login that's logged.

I mean, I figure I can work around how it is now in the reporting; simply find the next entry where it's different, rather than the next entry period. But I was hoping I could clean it up, if not in the insertion, then in a subsequent query.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

MoNsTeR posted:

No, because you should use 1-3 character (4 at most) table aliases and refer to all columns as alias.columnname. For example "select u.user_id, e.employee_id from user u, employee e ...". Then standardize which aliases are used for which tables and you will be amazed at how easy it is to dive in and read queries from all across the system. Otherwise your expressions become hopelessly long e.g. some_really_long_table_name.super_long_column_name, etc. Not using aliases, inconsistently using aliases, and mixing alises with full table name references are all horrible, horrible patterns that you WILL kick yourself for following.

I've been told to stop using aliases. :eng99:

And I always use "AS" for column aliases, and always omit it for table aliases.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
They don't let me use nulls in our database because of the lead dev's OCD. Now, I can understand OCD, I hate wasting DB rows for example, but is there any *real* reason to have every column be created NOT NULL?

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
So, let's say an "Address2" field, the second line of an address (as much as I dislike such fields, an address belongs on one line dang it, unless you're separating out number and street into their own fields too in a postal DB) would be blank or null in the case it doesn't exist? Its value is known - their address only has one line, so the second line does not exist. In the past I would have checked this for null, but now you've got me realizing I was probably doing it wrong.

On the other hand, in my chat app where it uses an "operator_id" field to see which operator said a line, and is blank if the user said it (since there can be multiple operators but only one user), that would be a valid usage of null.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Hammerite posted:

Or alternatively you could take the view that you need 2 extra tables.

Let's say your Address and Address2 fields are in a table called Customer.

A Customer could have multiple addresses, so you make an extra table called Address, with a foreign key back to Customer. Now you put fields Address1, Address2, Address3 in table Address, right? You could do, or you could go the whole hog and make another table called AddressLine, with a foreign key back to Address.

Edit: An address probably should be at least one line long, so maybe you store the first line of the address in the Address table, and extra lines can go in the AddressLine table.

Addresses should never have more than one line. :colbert:

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Hammerite posted:

My address has 10 lines and half of them are NULL and half of the rest have newline characters in the middle.

A properly formed American address needs a name, an address (everything on one line - apartment, suite, etc), a city, state, and zipcode. Building name optional.

The only time I can justify giving two lines to the user is if they are insistent on giving both a PO Box and a street address, which makes sense if they can't get US mail at their street address and UPS won't deliver to their post office. But in that case, make it clear which one you want.

The problem with giving multiple address lines is the first address line will cease to have semantic meaning - it could be an address, or it could be "Gelman Library", or it could be their backup PO Box, or it could be "Knock on door". Screw that. That's what a comment is for. Nothing should ever be mailed out to an address that hasn't been looked at by human eyes anyway, so.

Sometimes a building/firm is useful because of a specific ZIP+4 code, but that should be in a separate field, not shlomped in with 'Address'.

Same rules apply to most foreign addresses as well, though I'll allow multiple lines/textarea entry for those if a format isn't strictly known. Canadian addresses are aggressively one-liners, whereas I remember having some British addresses take up two lines, and Korean addresses even more. But on the other hand, if you're an American company, you should always spend extra time vetting foreign addresses.

Basically, my complaint isn't so much using multiple columns for an address; it's that there are always multiple "address" columns, i.e. the actual street address. There's no reason that my apartment should be expected on a second line, etc. One line for the address, one line for the building/form, one line for other instructions, etc. Keep the semantics. None of this "address1" "address2" stuff.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

MoNsTeR posted:

Well, yes there is, mainly that that's how 99% of people in the US write addresses, and how 99% of web forms for addresses are structured. Line 1 is street address, Line 2 is optionally for apartment/suite number etc. I'm a big advocate of obsessive hyper-normalization, but there are some places where the data structure meets reality and you just have to go with reality.
They should learn to put apartment numbers on the same line as the address. Give them one line and they'll figure it out! Working in shipping for a few years has made me an addressing snob. :v:

quote:

Addresses are actually a great example for another reason, namely zipcode. A zipcode exists in exactly one state, so your address data shouldn't even have a state field (and nor should your user form) since it's redundant. Zipcodes also partially determine your city, and if you stick to "preferred" records it fully determines your city, so your data and form shouldn't include city either. But how useful is that, really? Now you can't select from your address table and get useful data without joining to a zipcode lookup or invoking functions. And how big of a problem, honestly, is people entering cities and/or states that are inconsistent with their zipcode? So even though our instinct as good data architects is to Normalize Normalize Normalize!, doing so ends up producing a structure that is less usable, without compensating gains in data integrity.

There's the rare situation where someone doesn't know their ZIP code, so entering it should be optional so long as someone enters both city and state. And yes, storing only the ZIP code would require some pretty complex (and outsourced) tables explaining which zip goes to which city, etc. Not to mention the fun times when there's multiple 'acceptable' city names for a ZIP code. Also, ZIP code boundaries can change, so it helps to have the other data, as a proper postal system will figure it out. And ZIP codes can be phased out; looking at usps.com, it looks like the post office finally retired 10048, which was formerly the location of 1-7 World Trade Ctr.

So it makes sense to store all three pieces of data, because the ZIP code is not immutable and is sometimes not available.

And if you were talking total normalization, you'd also store the street and address and apartment (and apartment type) separately, to more easily lookup the +4.

I still see no reason to give people two lines for an address. :colbert:

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
An actual database question, in MySQL. I'm loading an Access table (via CSV) into PHP, then modifying some things and putting it into a MySQL table. I want to change it from this...
pre:
log_id
user_id
date
status
to simply get rid of log_id (currently the primary key) and make the other three fields the primary key, since, the way this is set up, there can never be more than one entry of a particular status change per user per day.

So, when I do change the key (through an ALTER TABLE command, rather than dealing with it in the CREATE, then INSERT), there's a duplicate record - so much for my previous statement, right? I chalk it up to Access being a dick about data, and I'm wondering, is there some easy way to fix this, or do I have to manually prune out the duplicate records?

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Hammerite posted:

Do you have an idea of how arduous it would be to weed out the duplicates? Find out how many there are and if there are only like half a dozen, gently caress it, do it manually. I think this (untested) should work:

code:
SELECT
    GROUP_CONCAT(log_id) AS log_ids,
    user_id,
    date,
    status,
    COUNT(*) AS ct
FROM
    table
GROUP BY
    user_id,
    date,
    status
HAVING
    ct >= 2
LIMIT 25

I'm assuming only a half dozen, but the problem is, this is somewhat live data. I'm designing this transition based on a snapshot of the Access database from a week ago, but when it comes to the final transition, we've been trying to design it as fire-and-forget, one command one conversion, rather than have to take it down for extra time.

I'll look over that one; the idea that popped into my head was to basically wipe the table clean and replace it (via a temporary table) with the results of a SELECT DISTINCT user_id, date, status FROM whatever GROUP BY user_id, date, status.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Hammerite posted:

Actually, now that I think about it, couldn't you declare the three-column primary key to MySQL, leave the log_id column as indexed and auto_increment, then do the inserts using INSERT IGNORE and then drop log_id?

Huh. That's simple enough that it might just work. I'll try it out when I touch that one next.

(Though, why even keep log_id around? Why not just never create it?)

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Hammerite posted:

Well, if you can eliminate it from the dump file too, go for it. I was just thinking that MySQL would complain about the extra column in the CSV that it doesn't know about.

Oh. ... true. The things I forget when not looking smack at the code.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
What are the limitations of COALESCE? I have this query:
code:
SELECT SUM(amount) FROM payments WHERE payment_id = $i GROUP BY payment_id
This works, when there's a payment_id with that number. Otherwise, it delivers a null, because there's nothing to sum. I tried to change it to SELECT COALESCE(SUM(amount), 0) to force it to give some result, but no dice. Even when I change the 0 to a 1, it still gives no rows, leading me to think this is either some limitation of COALESCE, or I'm misunderstanding what's going on here.

Even weirder(?), when I change it to COALESCE("a", SUM(amount)), Navicat tells me it's returning a blob, but when I save it to disk, the blob is an empty file.

Edit: I found a workaround, I suppose: Wrapping the whole thing in COALESCE.
code:
SELECT COALESCE((SELECT SUM(amount) FROM payments WHERE payment_id = $i GROUP BY payment_id), 0)
It works, but I'm still wondering why the first one doesn't work. When I get rid of the GROUP BY, it works.

Of course, now I wonder if it's faster to do that (with the subquery), or just run a mysql_num_rows and return '0' if it's 0. And that reeks of premature optimization. But alas, I wonder.

Golbez fucked around with this message at 22:32 on Sep 20, 2010

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
In MySQL, I've been using FROM ... LEFT JOIN ... USING (...) style for a while, but I've become very paranoid about using multiple ones. An example:
code:
SELECT users.name, companies.company_name, locations.state
FROM users
LEFT JOIN companies ON users.company_id = companies.company_id
LEFT JOIN locations ON companies.location_d = locations.location_id
This, I have no problem with. When I convert it to USING, like so...
code:
SELECT users.name, companies.company_name, locations.state
FROM users
LEFT JOIN companies USING (company_id)
LEFT JOIN locations USING (location_id)
I'm assuming it's smart enough to know location_id is from companies, not from users. I'm assuming - I'm no expert on database design - that the right table on that second join is locations, and the left table is the up-to-now combination of users and companies. Right?

So what happens if location_id also exists in users? Even if it's equal to the one in companies? Now, before, I would have done the first query above and it would have worked fine, completely unambiguous. But with USING I'm not entirely sure. In a case with any possible ambiguity, should I abandon USING, or is it safe when I know the data is the same?

Another question, less about paranoia and more about how this works. Let's say I'm pulling in things from two tables on the same ID:
code:
SELECT users.name, userlog.last_entry, employees.foo
FROM users
LEFT JOIN userlog ON users.user_id = userlog.user_id
LEFT JOIN employees ON users.user_id = employees.user_id
If I wanted to convert this to USING, could I simply do...
code:
SELECT users.name, userlog.last_entry, employees.foo
FROM users
LEFT JOIN (userlog, employees) USING (user_id)
?

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Avarkx posted:

If this doesn't throw a syntax error (and I would be seriously surprised if it didn't), go for it. I don't have access to a MySQL instance right now so I can't check this myself or I would, just to see.

The manual includes this:

quote:

This is a conservative extension if we consider each comma in a list of table_reference items as equivalent to an inner join. For example:

SELECT * FROM t1 LEFT JOIN (t2, t3, t4)
ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

is equivalent to:

SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4)
ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

In MySQL, CROSS JOIN is a syntactic equivalent to INNER JOIN (they can replace each other).
So I was wondering if it was possible to combine that comma notation in the LEFT JOIN with USING. Though the whole cross/inner join thing scares me a little.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Avarkx posted:

USING is a clause with a specific application, namely auto-coalescing common columns for you. Use ON if you don't need the functionality of USING.
I thought that USING (c1) was functionally identical, in a simple situation, to "... ON t1.c1 = t2.c1". I'm a little sick at the moment so I might be missing the obvious here, but is what you're saying equal to what I'm saying? :)

I figure I should be specific and use ON rather than USING for multiple table issues.

Edit: The manual continues, "In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause, CROSS JOIN is used otherwise."

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Avarkx posted:

In simple cases, we are talking about the same thing, yes. I'm just attempting to illustrate that in a case involving a possibly ambiguous column, the value of the leading table is favored. That is, the ambiguity is resolved by taking the first non-null value in the common column of the referenced tables (if a value exists, of course). I'm also saying that the USING clause is specific enough for that very reason, but the majority of the time you will be doing yourself (and anybody reading your code later) a service by sticking to the ON clause when the case is in fact, simple.

Thanks so much for your analysis. :)

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Dsan posted:

I wanna do this:
code:
mysql_query("UPDATE variable SET value = 's:1:"0";'
WHERE value = 's:1:"1";' AND name = 'site_offline'");
but ofc. I get an error because of the " "

what is the work around here ?
It is how it stands in the database

You could add \ to the double quotes, or just use http://us2.php.net/mysql_real_escape_string.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
Let's say I have a status history table, with statuses like available, not available, fired, etc. Now let's say that sometimes statuses are put in more than once in a row:

pre:
User ID  |  Status   |  Date     |  Notes
----------------------------------------------------------
1        |  AVAIL    |  10/10/10 |  Confirmed availability
1        |  AVAIL    |  08/01/10 |  
1        |  NOTAVAIL |  07/02/10 |  On extended vacation
1        |  AVAIL    |  02/05/09 |
or whatever. I don't necessarily know why it happens, but I do know it happens, or at least did with the system I'm replacing. So what's the best way to get the oldest date for the most recent status? That is to say, I want to run a query that, run on the above data, will give me 08/01/10. Not 02/05/09, since that's not the most recent status block. It's like I want to find out the date of the next adjacent row iff said row's status is equal to the most recent one, all the way on down til we run into the last one.

I wouldn't be averse to a solution that also involved PHP. I'm thinking, maybe ... select them ordered by date, then loop through PHP until I hit one that doesn't match the most recent status, then get the previous date? I'd like a SQL solution, but I guess a PHP one could work in the meantime.

Golbez fucked around with this message at 22:28 on Oct 19, 2010

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

McGlockenshire posted:

Totally doable. I think. This will use MySQL's don't-have-to-GROUP BY-all-the-columns logic, which may not work in not-MySQL databases. You did not specify which SQL server you're using, but you're using PHP, and PHP and MySQL go together like Dumb and Dumber.

code:
SELECT `User ID`, `Status`, `Date`, Newest_Data.*
  FROM Status_History,
       (SELECT `User ID` AS nd_uid, `Status` AS nd_status, `Date` AS nd_date
          FROM Status_History
         GROUP BY `User ID`
         ORDER BY `Date` DESC
        ) AS Newest_Data
 WHERE Newest_Data.nd_uid = Status_History.`User ID`
   AND Newest_Data.nd_date > Status_History.`Date`
 GROUP BY `User ID`
 ORDER BY `Date` DESC
What this should give you is two sets of columns. The right-most set will be each user's newest status, and the date that status was given. The left-most set will be the user's immediate last status, and it's date. Once it's shown to work, you can add on a Status_History.`Status` = Newest_Data.nd_status in the WHERE clause to only catch those with two updates of the same status in a row.

Thanks, though I'm not really getting it to work, but... immediate last status? What if there's three in a row? :v: Yes, PHP/MySQL, I keep forgetting to mention my DB here.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
A couple of database design questions:

1) Phone numbers (American only, 10 digits) and Social Security Numbers. Should these be stored with or without dashes? Right now I'm doing it with dashes, validating input along the way.

2) Totals. Right now, we monitor transactions of adding or subtracting funds to a person's account. And right now, to get the total funds in an account, I sum up all of their transactions. And I was wondering if perhaps there was a better way of doing it. Maintaining it in a field in the user's record. But then I worry about when a transaction is made and, for whatever reason, the user's record isn't updated. MyISAM tables in a MySQL database. Any thoughts, or should I just keep it how it is?

Edit: In a similar vein, the status updates I wrote about earlier. Right now, to find someone's status, I simply grab the most recent status message from the status history table. Should that be stored as well, or keep it 'dynamic' like this?

Golbez fucked around with this message at 22:12 on Oct 21, 2010

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Doctor rear end in a top hat posted:

I don't store any formatting in the database, and phone numbers are a pain in the rear end especially because there are so many goddamn ways that people format them. If you just store digits, you can preg_replace('/[^\d]/','',$phone) and validate it that way. As for the transactions, if the calculation is trivial, you might as well do it every time. You could make a view that has the calculated total as one of the columns.

Internal app, maybe 2 people ever using this system, and I validate numbers on a strict regex, so that's actually not too big an issue. Though, one might say the fact that I have to validate them that strictly is indicative of an issue.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

ShoulderDaemon posted:

You probably shouldn't be tracking money in a MySQL database, especially one that isn't transactional.

Does it make things better or worse that this is replacing an Access application?

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

ShoulderDaemon posted:

Better. If you're writing a new application, is there any reason you have to be using MyISAM? Could you switch to postgres, or at least InnoDB?

MySQL is non-negotiable. InnoDB possibly, but house rules say use MyISAM.

Fortunately, no money is moving through the system, so it doesn't have to be transactionally perfect; it's not like I'm debiting from one account and crediting another. I'm importing data from a weekly spreadsheet. It's record keeping and auditing.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
Another question, probably simple but I'm having a mindfart.

Let's say I have...
pre:
User ID  | Status  | Date
--------------------------------
1        | AVAIL   | 1/1/2009
2        | AVAIL   | 1/5/2009
2        | NOT     | 2/2/2009
1        | NOT     | 3/3/2009
1        | AVAIL   | 4/4/2009
3        | NOT     | 5/5/2009
3        | AVAIL   | 6/6/2009
and I want to get the User IDs where AVAIL is the most recent status. So in this case, it would give me user IDs 1 and 3, but not 2.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Hammerite posted:

What's the reasoning behind that? AFAIK there are only a few special usage scenarios for which MyISAM is more appropriate than InnoDB. For something like what you are describing I really don't know why you would use MyISAM.

Because that's how it's been done and I just got here a few months ago and am still technically a temp? v:shobon:v

Though, since this is a project that is 99% independent from the rest of the site, I could probably get away with it without anyone knowing.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
Sprawl, I have a question based on something from what you did, which looks like it might work but this bit is confusing me. Here's a query:
code:
select status_id, Max(status_date) from table where user_id = 1326
Now, looking at the data, I can see that the most recent status id is 12, and date is 5/30/2007. However, whenever I run this, or any variation of it I can think of, the status id returned is 4 - the earliest one in the table, so the first one that comes up. I've tried various groupings, order bys, havings... I simply cannot get it to say 12. What am I doing wrong? :(

Edit: And your query isn't working because of said issue. Could it be MySQL's fault?

Edit 2: OK, I swapped the inner query for a simple "select ... order by desc limit 1", and added a "where status = 12" to the outer query, and it's working now. Thanks :)

Golbez fucked around with this message at 15:47 on Oct 22, 2010

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Sprawl posted:

If it wasn't working you were doing it wrong and without a full explanation of what you were doing with exact query and table layout i can't do anything to fix your broken work.

Literally copying your query and changing the field names. That was it. Oho, wait - I was misreading your final 'group by id'; I was taking that to mean employee_id, you meant row id. OK, running that... oh hey it works :saddowns: Sorry. So yes, your work was actually perfect and I thank you very much for it. :)

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

MoNsTeR posted:

The current status should be stored on the entity to which it pertains. The history of that status should be stored in a separate table, typically trigger-populated (though you could use a procedure if you are allergic to triggers).

Why, though? I've generally thought that you shouldn't really store calculated values, only the stuff they were calculated from, and the 'latest' entry in a history could qualify as that. (Now, if I were storing every single change, yeah, that wouldn't make sense, but that's more of a "history of the record" thing, rather than a record of history. Does that make sense?)

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

MoNsTeR posted:

You're not storing every change? That doesn't make any sense.

"Current state" is by no means a calculated value, it only seems that way because you're starting from a model of storing state changes as events rather than just... storing state. Now, sometimes an event model is appropriate. But if a thing has a state, just store it! It is "history of the record" because state is part of the record and that's what you're keeping history of.

9 times out of 10, in the actual transactional system (viz. reporting), you care about the current state. And when you do want the history, you just need to display it, not analyze or interact with it. Optimize for those use cases.

By every change, I meant along the lines of, "On timestamp X, they changed the address from this to this. On timestamp Y, they changed this minor field in this minor table to this", etc. It would make no sense to generate a record from that. But generate a sum from a list of sums, or generate "current status" from the most recent status update, where past statuses do matter?

However, because of other new requirements, I have to end up generating dates from the most recent status, so the whole 'not storing a calculated value' thing is out the window.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
Best practices: Storing names. Right now, we store last name and first name. But I'm seeing some problems and inconsistencies. For example, sometimes there's a middle initial tacked to first name, and sometimes it even has a period after it. Sometimes "Jr." is after the last name, sometimes the first name. This obviously is not good.

What's the best way of storing this stuff? I don't need too much, and only ever search on first and last name; middle, etc. is irrelevant for everything but reports. I was wondering if this seemed reasonable:
LastName
FirstName
FullName

It seems a little off to demand people enter or change a name twice, but on the other hand, a name is not something that changes often. People may still enter middle initials without a period but that's less a worry. And there may be inconsistency with how they handle Jr. (with or without a comma? with or without a period?) But, at least the searchable data itself is clean. Another option is:
LastName
FirstName
MiddleName
Suffix

The middle name could enforce a period if there's only one letter in it, and the suffix could be populated from a dropdown which would ensure the proper formatting.

Then I wonder if I'm making a big deal out of nothing. As stated, the name outside of the basics is for display only, but I'd like our display to be nice.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
Turned out that the double statuses were in fact an error and we'll be regenerating all 2200 records. :v: Thanks for looking at it, though. :)

Adbot
ADBOT LOVES YOU

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

Hammerite posted:

What's the reasoning behind that? AFAIK there are only a few special usage scenarios for which MyISAM is more appropriate than InnoDB. For something like what you are describing I really don't know why you would use MyISAM.

Coming back to this four months later after I tried again, it's mainly because all of our systems, such as backup, are based around MyISAM.

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply