Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
Suran37
Feb 28, 2009
Alright thanks for the feedback again.
Is this better now?

Adbot
ADBOT LOVES YOU

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe
Yeah that looks way better and is quite easy to understand.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
In MovieActor and MovieDirector you do not need an additional column to serve as the primary key. The primary key of MovieActor should be (MovieID, ActorID); the primary key of MovieDirector should be (MovieID, DirectorID).

Zombywuf
Mar 29, 2008

I'd be tempted to make the Actors just have a Name field rather than First Name, Last Name. Consider "Samual L. Jackson" for example.

EDIT: also have a People table instead of an Actors and Directors table, sometimes they are the same person. You have tables to establish the type of relation.

Zombywuf fucked around with this message at 13:13 on Nov 15, 2011

mister_gosh
May 24, 2002

I'm looking to create hot (and cold) backups of a production MySQL database. Are there any tools out there that are proven/standard?

Factor Mystic
Mar 20, 2006

Baby's First Post-Apocalyptic Fiction

Zombywuf posted:

EDIT: also have a People table instead of an Actors and Directors table, sometimes they are the same person. You have tables to establish the type of relation.

I agree with this, personally I view a superior design as something like 'MoviePerson' with the addition of 'MoviePersonRole' to store roles like actor, director, producer, etc. If you just want actors or just want directors, it's still a simple query and it would allow for an individual to be both an Actor and Director without duplicating the person instance, for example.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe
Keep in mind that you can still have duplicates depending on how you determine if a person is the same. First and Last name will obviously reject a lot of legitimate people from having their own record in the table. So then you try and insert their date of birth, but you realize that you can't always get it and you end up declaring the column as nullable and you're back at where you started. Table normalization is important but I'd argue that normalizing your data is more important the database. And then of course is the problem of some movies will have "Samuel L. Jackson" listed as the actor and other will just have "Samuel Jackson". Are they two different people? Will they have separate records in your table?

I'd like to hear what some of the other more experienced developers have to say with regard to designing a database schema that takes into account the practical problems that come from imperfect data.

subx
Jan 12, 2003

If we hit that bullseye, the rest of the dominoes should fall like a house of cards. Checkmate.

MEAT TREAT posted:

Keep in mind that you can still have duplicates depending on how you determine if a person is the same. First and Last name will obviously reject a lot of legitimate people from having their own record in the table. So then you try and insert their date of birth, but you realize that you can't always get it and you end up declaring the column as nullable and you're back at where you started. Table normalization is important but I'd argue that normalizing your data is more important the database. And then of course is the problem of some movies will have "Samuel L. Jackson" listed as the actor and other will just have "Samuel Jackson". Are they two different people? Will they have separate records in your table?

I'd like to hear what some of the other more experienced developers have to say with regard to designing a database schema that takes into account the practical problems that come from imperfect data.

A lot of it depends on where you are getting the data. If you can look at things like if the actor's name links to a page that can clear up ambiguity between two people. There's obviously a lot of other ways, that's just an example.

Even then some of the best designed tables suffer from user error and/or imperfect data. We had a huge events table at my last job, and when I imported data from the old system the locations for the events were all sorts of hosed. It's amazing how many ways someone can mistype or misspell the city they loving live in?!?!

subx fucked around with this message at 18:22 on Nov 15, 2011

butt dickus
Jul 7, 2007

top ten juiced up coaches
and the top ten juiced up players

subx posted:

Even then some of the best designed tables suffer from user error and/or imperfect data. We had a huge events table at my last job, and when I imported data from the old system the locations for the events were all sorts of hosed. It's amazing how many ways someone can mistype or misspell the city they loving live in?!?!

Freeform fields were invented by Satan.

McGlockenshire
Dec 16, 2005

GOLLOCKS!

mister_gosh posted:

I'm looking to create hot (and cold) backups of a production MySQL database. Are there any tools out there that are proven/standard?

Percona's Xtrabackup is pretty much the gold standard here and works both with MyISAM and InnoDB tables.

Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!

mister_gosh posted:

I'm looking to create hot (and cold) backups of a production MySQL database. Are there any tools out there that are proven/standard?

The standard mysqladmin dumps work fine for all backup purposes i find.

mister_gosh
May 24, 2002

Thanks for the recommendations!

mobby_6kl
Aug 9, 2009

by Fluffdaddy

Factor Mystic posted:

I agree with this, personally I view a superior design as something like 'MoviePerson' with the addition of 'MoviePersonRole' to store roles like actor, director, producer, etc. If you just want actors or just want directors, it's still a simple query and it would allow for an individual to be both an Actor and Director without duplicating the person instance, for example.

N-thing this, you never know when you'll need to store this movie and the related "talent" http://www.imdb.com/title/tt0366242/


subx posted:

A lot of it depends on where you are getting the data. If you can look at things like if the actor's name links to a page that can clear up ambiguity between two people. There's obviously a lot of other ways, that's just an example.

Even then some of the best designed tables suffer from user error and/or imperfect data. We had a huge events table at my last job, and when I imported data from the old system the locations for the events were all sorts of hosed. It's amazing how many ways someone can mistype or misspell the city they loving live in?!?!

That's my experience as well, often you just don't have enough data to uniquely, and correctly, identify an object, and then you can either save it as it is and deal with lovely data quality, or don't, and deal with missing data.

Say, john.smith@megacorp.com emailed you some questions about your products. Then a month later John Smith from Megacorp calls you about another product. Is this the same guy? What about Johnathan Smith who shows up at your event? :iiam: Of course, you can use similarity metrics to determine that John and Johnathan are possibly the same thing, but that still doesn't tell you if it's all the same physical person without any additional information.

To deal with this, I'd probably reference the original record in the suspected duplicates, along with a confidence level, perhaps, and collapse them into one once there's enough information to match them. I haven't actually done this IRL, so maybe that's a terrible idea. I suspect that IMDB just has people manually making sure that their poo poo is correct when adding new movies/people.


I do have a question of my own, though hopefully not as complicated. Is there a neat way to split a long text by length, and into a table, in Postgres (or T-SQL)?
code:
from
_______________________
| ID  | TXT            |
| 1   | Some crap here |
| 2   | Blah           |
|_____|________________|
to
______________________
| ID  | TXT          |
| 1   | Some         |
| 1   | crap         |
| 1   | here         |
| 2   | Blah         |
|_____|______________|
SELECT id, regexp_split_to_table(txt, '\\s') FROM tmp; 
That does almost exactly what I need, except with regexes instead of a specific length. There's string_to_array, but again, that's not it :(

JasonV
Dec 8, 2003
I'm wondering if there's a way to do this easily. My google-fu is coming up with nothing...

I've got a table of addresses that has duplicate entries I want to clean up:


code:
id   | street     | number | city 
---------------------------------------
155  | Main     | 123    | smallVille
187  | Main St  | 123    | SmallVill
256  | Main Str | 123    | Small Ville
Now, there's a bunch of other tables that reference the address table. Say, an order table (order_id, customer_id, address_id) and a customer table (customer_id, first_name, last_name, address_id), etc.

So, my first idea is to make a php script that takes the duplicates and iterates over them and updates any table that references the address table. That's fine, but I keep feeling I should be able to do this without php.. ?

It's MySQL and InnoDB. All the foreign key relationships are set up. I can set everything as 'ON UPDATE CASCADE' (or whatever might be needed)....

Is there no way to say to MySQL: "Hey, follow the relationships I've set up and switch address 155 and 256 to 187 everywhere." ?

Demonachizer
Aug 7, 2004
Just wondering if any of you know of a decent solution to monitor table sizes in a database and that can output reports and graphs so we can track growth. I just want to check to make sure we aren't reinventing the wheel by coding something.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

JasonV posted:

I'm wondering if there's a way to do this easily. My google-fu is coming up with nothing...

I've got a table of addresses that has duplicate entries I want to clean up:


code:
id   | street     | number | city 
---------------------------------------
155  | Main     | 123    | smallVille
187  | Main St  | 123    | SmallVill
256  | Main Str | 123    | Small Ville
Now, there's a bunch of other tables that reference the address table. Say, an order table (order_id, customer_id, address_id) and a customer table (customer_id, first_name, last_name, address_id), etc.

So, my first idea is to make a php script that takes the duplicates and iterates over them and updates any table that references the address table. That's fine, but I keep feeling I should be able to do this without php.. ?

It's MySQL and InnoDB. All the foreign key relationships are set up. I can set everything as 'ON UPDATE CASCADE' (or whatever might be needed)....

Is there no way to say to MySQL: "Hey, follow the relationships I've set up and switch address 155 and 256 to 187 everywhere." ?

The only way that I can think of to do this is to drop the primary key on the referenced table (but keep that column indexed); update the table to unify the PK column values for the rows that are duplicates of one another; then delete the duplicate rows and reinstate the primary key. This may not be acceptable to you. Otherwise I think you have to run a whole bunch of update statements against all the tables referring to this one.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

demonachizer posted:

Just wondering if any of you know of a decent solution to monitor table sizes in a database and that can output reports and graphs so we can track growth. I just want to check to make sure we aren't reinventing the wheel by coding something.

Solutions to this are likely going to be vendor-specific, so you should tell us which database software you are using in order to get helpful suggestions.

Aniki
Mar 21, 2001

Wouldn't fit...

Suran37 posted:

Alright thanks for the feedback again.
Is this better now?



Maybe I am misunderstanding what you are doing, but this is how I would lay out the Movie table:

MOVIE
MovieID (PK)
Title
RatingID (FK1)
GenreID (FK2)
ReleaseYear
StudioId (FK3)
Length

I replaced Rating, Genre, and Studio with RatingId, GenreId, and StudioId since storing those INT values should take up less space than storing the full Rating, Genre, or Studio name every time you create a movie record. I also got rid of the Actor(s) and Director fields, since the Movie Actors and Movie Director are already linked to the movie by the MovieID.

Demonachizer
Aug 7, 2004

Hammerite posted:

Solutions to this are likely going to be vendor-specific, so you should tell us which database software you are using in order to get helpful suggestions.

poo poo sorry. I started a thread with that question before and included that it is MSSQL in the subject.

mobby_6kl
Aug 9, 2009

by Fluffdaddy
I ended up reinventing the wheel for the same problem by using a cursor to loop over all tables fitting a criteria (name/user/date) and EXECing sp_spaceusd for each one, then joining the results. It works fine, except that it's not possible to have nested execs in table-valued functions, making this sp a bit awkward to use. If there's a better way, I'm all ears.

Zombywuf
Mar 29, 2008

Anyone know what Postgres means beyond "Your transaction failed, what the gently caress are you doing using triggers to update a table in a highly concurrent serializable query" when it says:

quote:

Canceled on identification as a pivot, during write.

I am very close to using manual table locks at this point, but I could really do with knowing what that error message means otherwise I'm probably just going to make the problem worse.

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Zombywuf posted:

Anyone know what Postgres means beyond "Your transaction failed, what the gently caress are you doing using triggers to update a table in a highly concurrent serializable query" when it says:


I am very close to using manual table locks at this point, but I could really do with knowing what that error message means otherwise I'm probably just going to make the problem worse.

http://wiki.postgresql.org/wiki/SSI

It looks like multiple transactions are running in parallel, and if they were run serially the results would be different depending upon the order.

JasonV
Dec 8, 2003

Hammerite posted:

The only way that I can think of to do this is to drop the primary key on the referenced table (but keep that column indexed); update the table to unify the PK column values for the rows that are duplicates of one another; then delete the duplicate rows and reinstate the primary key. This may not be acceptable to you. Otherwise I think you have to run a whole bunch of update statements against all the tables referring to this one.

Yeah, my php scripts run a whole bunch of update statements, and they work fine. I was just hoping this was a common problem with a simple solution I didn't know about.

Thanks, tho.

Zombywuf
Mar 29, 2008

Jethro posted:

http://wiki.postgresql.org/wiki/SSI

It looks like multiple transactions are running in parallel, and if they were run serially the results would be different depending upon the order.

Seems I've developed a blind spot at the top of google as I'm expecting ads to be there....

I suspect I know why this is happening in my db, have had to wrap everything in ugly retry loops. Any idea what the actual message means though? What the hell it means by identification as a pivot specifically?

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Zombywuf posted:

Seems I've developed a blind spot at the top of google as I'm expecting ads to be there....

I suspect I know why this is happening in my db, have had to wrap everything in ugly retry loops. Any idea what the actual message means though? What the hell it means by identification as a pivot specifically?
More googling leads me to believe it's just a technical term that means "Transaction A read something that will later be written to by Transaction B and vice-versa"

See this thing here

Quixzlizx
Jan 7, 2007
I'm not sure if this is the right thread for this query, but I'm looking for a newbie-level resource for getting into MS SQL. I've read the general SQL tutorial at w3schools.com and I think I have a pretty good handle on it, but I'd like a book/website that specifically teaches how to do things using the MS SQL environment. I've found https://fianga.com, but it seems to be a bit too newbie-friendly (there are instructions for turning on your computer and starting up MS SQL in each section I've read so far), and the writing isn't exactly the clearest.

mobby_6kl
Aug 9, 2009

by Fluffdaddy
Microsoft has their Books Online thing for SQL Server, have a look at it. I'm usually looking for something specific on MSDN, but this seems to have a good overview of what you might find useful.

Suran37
Feb 28, 2009
Ok, so I am setting up my database(from the pictures above) in SQL Developer, and I am at the point where I need to add Foreign Keys.

Does this look right?


If so, do I need to put the RatingID in the Movie Table for each movie?

Kekekela
Oct 28, 2004

Suran37 posted:

Ok, so I am setting up my database(from the pictures above) in SQL Developer, and I am at the point where I need to add Foreign Keys.

Does this look right?

No, you should have a RATINGID in both tables that make up the key. I'm not familiar with this particular GUI but it looks like you may be setting this up backwards in addition to mismatching columns. Intuitively it seems like you should be setting this up on the Movie table with Rating as the referenced table and RATING_PK as the referenced constraint. In any event, you definitely don't want the association to be between two different ID columns (unless they are actually the same and you've just got hosed up naming conventions, but I'm guessing this isn't the case)

Suran37
Feb 28, 2009

Kekekela posted:

No, you should have a RATINGID in both tables that make up the key. I'm not familiar with this particular GUI but it looks like you may be setting this up backwards in addition to mismatching columns. Intuitively it seems like you should be setting this up on the Movie table with Rating as the referenced table and RATING_PK as the referenced constraint. In any event, you definitely don't want the association to be between two different ID columns (unless they are actually the same and you've just got hosed up naming conventions, but I'm guessing this isn't the case)

Yeah, I realize I had them all backwards, and I changed some of the column names to make it easier, but I think I got it all figured out now.

Also I need to query all the movies that Clint Eastwood is involved in. One of the movies he is an Actor and the Director. Therefore, the movie gets listed twice. Is it possible to alter my query so it is only listed once? I am using the following query:

code:
SELECT TITLE
FROM MOVIE INNER JOIN PERSON ON MOVIE.MOVIEID = PERSON.MOVIEID INNER JOIN NAMES ON PERSON.NAMEID = NAMES.NAMEID
WHERE FIRST_NAME = 'Clint' AND LAST_NAME = 'Eastwood';

Suran37 fucked around with this message at 00:50 on Nov 21, 2011

Thel
Apr 28, 2010

Simplest way would be just to throw a 'DISTINCT' in, which will remove duplicates - viz.:

code:
SELECT DISTINCT TITLE
FROM MOVIE INNER JOIN PERSON ON MOVIE.MOVIEID = PERSON.MOVIEID INNER JOIN NAMES ON PERSON.NAMEID = NAMES.NAMEID
WHERE FIRST_NAME = 'Clint' AND LAST_NAME = 'Eastwood';

Powerful Two-Hander
Mar 10, 2004

Mods please change my name to "Tooter Skeleton" TIA.


Thel posted:

Simplest way would be just to throw a 'DISTINCT' in, which will remove duplicates - viz.:

code:
SELECT DISTINCT TITLE
FROM MOVIE INNER JOIN PERSON ON MOVIE.MOVIEID = PERSON.MOVIEID INNER JOIN NAMES ON PERSON.NAMEID = NAMES.NAMEID
WHERE FIRST_NAME = 'Clint' AND LAST_NAME = 'Eastwood';

Pros use FOR XML PATH to pull all the roles out as a comma separated string :smug:

I am not pro so cannot remember the syntax for this except that it's a pain in the arse to use :( .

herakles
Jan 17, 2009
I'm currently writing a mock filesystem to teach myself SQL, and I'm trying to write a stored procedure that will delete a folder and all sub-objects.

DB Schema is like so:
Owner (OwnerID, OwnerName, CreateDate)
Directory (DirectoryID, OwnerID, DirectoryName, ParentDirectoryID, CreateDate)
File (FileID, ParentDirectoryID, FileName, FileData, CreateDate)

Here's the procedure:
code:
CREATE PROC [dbo].[p_Directory_Delete]
@ID INT
AS
--Define variable table to hold child objects
DECLARE @ChildObjects TABLE
(DirectoryID INT,
 ParentDirectoryID INT NOT NULL)
;WITH cte_ChildDirectories (DirectoryID, ParentDirectoryID)
AS
 (SELECT DirectoryID, ParentDirectoryID
  FROM dbo.[Directory]
  WHERE ParentDirectoryID = @ID
  UNION ALL
  SELECT d.DirectoryID,
	d.ParentDirectoryID
  FROM cte_ChildDirectories ccd
  INNER JOIN dbo.[Directory] d
     ON ccf.DirectoryID = d.ParentDirectoryID
  )
--Insert the results of the CTE into the @ChildObjects variable table  
INSERT INTO @ChildObjects (DirectoryID, ParentDirectoryID)
SELECT DirectoryID, ParentDirectoryID
FROM cte_ChildDirectories
--Delete all files in the folder and its children
DELETE FROM dbo.[File]
WHERE FileID IN (SELECT DirectoryID, ParentDirectoryID FROM @ChildObjects)
DELETE FROM dbo.[Directory]
WHERE DirectoryID IN (SELECT DirectoryID, ParentDirectoryID FROM dbo.[Directory])
I'm obviously running into some problems here, anyone have any suggestions to make this work?

Powerful Two-Hander
Mar 10, 2004

Mods please change my name to "Tooter Skeleton" TIA.


herakles posted:

I'm currently writing a mock filesystem to teach myself SQL, and I'm trying to write a stored procedure that will delete a folder and all sub-objects.

DB Schema is like so:
Owner (OwnerID, OwnerName, CreateDate)
Directory (DirectoryID, OwnerID, DirectoryName, ParentDirectoryID, CreateDate)
File (FileID, ParentDirectoryID, FileName, FileData, CreateDate)

Here's the procedure:
code:
CREATE PROC [dbo].[p_Directory_Delete]
@ID INT
AS
--Define variable table to hold child objects
DECLARE @ChildObjects TABLE
(DirectoryID INT,
 ParentDirectoryID INT NOT NULL)
;WITH cte_ChildDirectories (DirectoryID, ParentDirectoryID)
AS
 (SELECT DirectoryID, ParentDirectoryID
  FROM dbo.[Directory]
  WHERE ParentDirectoryID = @ID
  UNION ALL
  SELECT d.DirectoryID,
	d.ParentDirectoryID
  FROM cte_ChildDirectories ccd
  INNER JOIN dbo.[Directory] d
     ON ccf.DirectoryID = d.ParentDirectoryID
  )
--Insert the results of the CTE into the @ChildObjects variable table  
INSERT INTO @ChildObjects (DirectoryID, ParentDirectoryID)
SELECT DirectoryID, ParentDirectoryID
FROM cte_ChildDirectories
--Delete all files in the folder and its children
DELETE FROM dbo.[File]
WHERE FileID IN (SELECT DirectoryID, ParentDirectoryID FROM @ChildObjects)
DELETE FROM dbo.[Directory]
WHERE DirectoryID IN (SELECT DirectoryID, ParentDirectoryID FROM dbo.[Directory])
I'm obviously running into some problems here, anyone have any suggestions to make this work?

Well one thing is that it'll only work for one level of directories as you're joining child directory to parent and not checking if there are any child directories under the child itself, you could add a while loop to go through all sub directories and add them to the delete list.

Also, the DELETE FROM statement will probably error because you've got multiple values going into the WHERE IN()(note: unless this is allowed in the version of SQL you're using), it should probably be something like:

DELETE FROM dbo.[File]
WHERE ParentDirectoryId in (select DirectoryID from @ChildObjects)

And similar for the second delete.

Edit: You could cut your table variable to one column and insert the directory ID specified as an input into it as well, that way you'd just have a list of directory IDs to be deleted.

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

Why doesn't using count(expr) work in MySQL?

select count(name = 'bob') from employees;

It returns the total amount of rows. This works:

select count(name) from employees where name = 'bob';

I tried to do something like:

select count(name = 'bob'), count(name = 'jeff') from employees;

And I noticed they were equal numbers.

ShoulderDaemon
Oct 9, 2003
support goon fund
Taco Defender

Bob Morales posted:

Why doesn't using count(expr) work in MySQL?

In Postgres, at least, count(expr) returns the number of rows for which expr is not null. The expression name = 'bob' is only null if name is null, which I assume doesn't ever happen in your table.

Aredna
Mar 17, 2007
Nap Ghost
What ShoulderDaemon said is correct. You can do the following though:

code:
select
   sum(case when name = 'Bob' then 1 else 0 end) as bobcount,
   sum(case when name = 'Jeff' then 1 else 0 end) as jeffcount
from employees

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!
Or just
code:
select
   count(case when name = 'Bob' then 1 end) as bobcount,
   count(case when name = 'Jeff' then 1 end) as jeffcount
from employees

Fruit Smoothies
Mar 28, 2004

The bat with a ZING
I've got a query
code:
INSERT INTO `table` (`field1`, `field2`, `field3`) VALUES (SELECT `value1`, `value2`, $array[$i] FROM `otherTable`)
Is there any way to get MySQL to loop through an array structure as it selects? To save me having to generate VALUES () () () for each row inserted?

Hope that makes sense.

Adbot
ADBOT LOVES YOU

Aredna
Mar 17, 2007
Nap Ghost
Using Oracle 10g Express (10.2.0.1.0)

I need to pull some data from CLOBs that are XML. I have it working using
code:
XMLType(fieldname).extract('/PATH/text()').getStringVal()
but I'm now finding that some of the XML records are not well formed.

Is there anyway I can default to NULL in these cases rather than have Oracle throw an error?

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply