Victor
Jun 18, 2004
Lone_Strider, please provide rowcounts for those tables. Is there any possibility of upgrading to SQL2005? Its pivot operator is exactly what you want.

MoNsTeR posted:

It is highly discouraged and very wrong and makes me want to drink hemlock. :emo:

Not that this is at all helpful to anyone who's stuck with one, but please please please if you're thinking about implementing one, don't!
Yet another person says EVA is bad for every single situation out there. Depending on how much data you have, it isn't that bad at all. It's especially useful for when attributes are very sparsely populated.
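For reference, a minimal, hypothetical sketch of the SQL 2005 pivot operator applied to an EVA layout (all table, column, and attribute names here are invented for illustration):

code:
-- invented EVA table: file_attrs(filename, attr, val)
SELECT filename, [author], [title], [subject]
FROM (SELECT filename, attr, val FROM file_attrs) AS src
PIVOT (MAX(val)
       FOR attr IN ([author], [title], [subject])) AS pvt;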

Potassium Problems
Sep 28, 2001

Victor posted:

Lone_Strider, please provide rowcounts for those tables. Is there any possibility of upgrading to SQL2005? Its pivot operator is exactly what you want.
It varies from customer to customer. The point at which one customer is experiencing slowness is 8 FileDataKeys records, 3862 FileData records, and 26768 FileDataValue records.

Funny story: we had plans to upgrade to SQL 2005 a year ago, but we work with the mortgage industry, and when it took a poo poo, any upgrade plans went with it. I can push for it by arguing how much of an impact it will make, though, and hopefully we can get it done.

var1ety
Jul 26, 2004

Triple Tech posted:

What is the alternative to a workflow that would like to support ad-hoc fields? It's still a pertinent, albeit ugly, question.

In Oracle you could speed up your query by using aggregate functions to avoid the multiple table scans forced by scalar subqueries. Something like this:

code:
select filename, max(case when attr = 'FOO' then val end) AS foo_val
from mytable
group by filename
I can't find the topic at the moment, but a best practice suggested by Tom Kyte for solving this problem in Oracle is to create, say, 20 custom attributes with a varchar2 type of suitable length at design time for your user to customize.

code:
create table t_files (filename varchar2(50) not null,
                      attr1 varchar2(100),
                      attr2 varchar2(100), ...)
You then wrap the table with a simple view.

code:
create view files as
select filename from t_files
When the user needs to create an attribute he sets aside one of the columns and populates it with data. For instance, he decides he needs to store a last modification time, so he sets aside attr1 and stores a date in it in YYYYMMDDHH24MISS format.

The user then modifies the view to include this new column, allowing himself to easily include it in his queries.

code:
create or replace view files as
select filename, to_date(attr1, 'YYYYMMDDHH24MISS') AS modified_date from t_files
If he needs to query the new column directly, he can put a function-based index on it.

code:
create index t_files_nk1 on t_files(to_date(attr1, 'YYYYMMDDHH24MISS'))
Edit: Table breaking.

Victor
Jun 18, 2004
Try the case when approach above, along with left joins instead of subqueries.

ray2k
Feb 7, 2004

Puppet scum beware!
So in finally moving to Sql 2005 at work I'm revisiting some queries we pushed off because they do deep aggregation and would benefit from ROW_NUMBER/RANK/PIVOT/etc. If I have a schema where let's say a Product table can have 0 to many ProductReviews, where ProductReviews.ProductId references Product.ProductId, how would I get the first 9 reviews in a column/cross-tab fashion?

I.e., instead of
ProductId  ReviewId  ReviewText
1          1         'lorem ipsum'
1          2         'lorem ipsum2'
...
1          9         'lorem ipsum9'

I would see
ProductId  Review1Id  Review1Text    Review2Id  Review2Text     ...  Review9Id  Review9Text
1          1          'lorem ipsum'  2          'lorem ipsum2'  ...  9          'lorem ipsum9'

I've read that the norm in Sql 2000 is using CASE statements in the select query, which works fine if the values are certain at query time, but a ProductReviewId clearly wouldn't be.
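A minimal sketch of one way to get that shape on SQL 2005, using ROW_NUMBER plus conditional aggregation (table and column names are assumed from the example above, and ReviewText is assumed to be a varchar; PIVOT could replace the CASE expressions once the rn column exists):

code:
WITH ranked AS (
    SELECT ProductId, ReviewId, ReviewText,
           ROW_NUMBER() OVER (PARTITION BY ProductId
                              ORDER BY ReviewId) AS rn
    FROM ProductReviews
)
SELECT ProductId,
       MAX(CASE WHEN rn = 1 THEN ReviewId END)   AS Review1Id,
       MAX(CASE WHEN rn = 1 THEN ReviewText END) AS Review1Text,
       MAX(CASE WHEN rn = 2 THEN ReviewId END)   AS Review2Id,
       MAX(CASE WHEN rn = 2 THEN ReviewText END) AS Review2Text
       -- ...and so on through rn = 9
FROM ranked
WHERE rn <= 9
GROUP BY ProductId;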

MoNsTeR
Jun 29, 2002

Victor posted:

Yet another person says EVA is bad for every single situation out there. Depending on how much data you have, it isn't that bad at all. It's especially useful for when attributes are very sparsely populated.
There's little that EVA can do that you can't do with real tables, in which case you're actually using the RDBMS for what it's for. If you have a problem that truly demands an EVA solution, don't use an RDBMS.

More to the point, most of the time someone thinks they need EVA, they don't actually need it, and they are underestimating the maintenance effort by an order of magnitude. Hence the exhortation to reconsider.

MoNsTeR fucked around with this message at 16:55 on Nov 5, 2008

Puddleboy
Feb 15, 2008

MoNsTeR posted:

There's little that EVA can do that you can't do with real tables, in which case you're actually using the RDBMS for what it's for. If you have a problem that truly demands an EVA solution, don't use an RDBMS.

More to the point, most of the time someone thinks they need EVA, they don't actually need it, and they are underestimating the maintenance effort by an order of magnitude. Hence the exhortation to reconsider.

We use an EVA successfully in our web-based application. The same code and database layout is used for a number of different customers and industries, allowing them to customize the data they capture about objects in their system.

Some of our customers have 20 attribute fields for each object, some of them have over 600. In the case of the 600-field client (with millions of objects tracked), only 20-40 of those attributes might be used by any given object, but they have many different departments that store different data.

To overcome the performance hits regarding data extraction/mining, we do a pivot of the metadata tables every few hours to a standard table that gets indexed. Search results are then based on this table.

EVA has its uses -- we couldn't replicate this in any simple fashion with a standard relational database table setup.
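A hypothetical sketch of that periodic flattening job (all names invented; in practice the column list would be generated from the attribute metadata):

code:
TRUNCATE TABLE search_cache;

INSERT INTO search_cache (object_id, color, serial_no)
SELECT entity_id,
       MAX(CASE WHEN attr_name = 'color'     THEN attr_value END),  -- one line per active attribute
       MAX(CASE WHEN attr_name = 'serial_no' THEN attr_value END)
FROM entity_attribute_values
GROUP BY entity_id;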

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
But why not model them "the normal way"? What's so insanely beneficial to this setup? It's like the cost of adding a new field completely outweighs the cost incurred by modeling it properly? And that includes maintenance costs and complexity for both?

Puddleboy
Feb 15, 2008

Triple Tech posted:

But why not model them "the normal way"? What's so insanely beneficial to this setup? It's like the cost of adding a new field completely outweighs the cost incurred by modeling it properly? And that includes maintenance costs and complexity for both?

I wouldn't call it "insanely" beneficial, but the structure of the database is not nearly as complex or difficult to maintain as one might think, and we don't have any tables in the system that contain 600 unused columns for a client that only needs 30 columns.

There also aren't any particular limits on the number of different columns one might try to track, which could easily go much higher than the 600 with some of the future proposed applications of the software.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
I'm not convinced that this is the "right" or most effective thing to do (just saying, no convincing needed). Having 600 attributes to one object just smells like an attempt to model what are really heterogeneous objects whose dimensions are skew and have nothing to do with one another.

Puddleboy
Feb 15, 2008

Triple Tech posted:

I'm not convinced that this is the "right" or most effective thing to do (just saying, no convincing needed). Having 600 attributes to one object just smells like an attempt to model what are really heterogeneous objects whose dimensions are skew and have nothing to do with one another.

A specific application is for the tracking of a piece of evidence. Let's say that the item is a gun.

Then the gun has a make, model, serial number, caliber.

Then we also want to know if the gun has been tested by a lab. Yes? What types of tests were done to the item? What were the results of those tests? Who performed them? Who found this gun? Where did they find it? And so on and so forth. Now imagine I am a police department with 100 different types of evidence (guns, knives, blood, saliva, clothes, found property, drugs, etc), and for each type of evidence, we might go through a totally different collection process with a whole new set of "questions" or attributes to answer about the item in question.

We want to treat the guns and drugs the same at a base level, because we will be performing other functions on these items (chain of custody, disposition, etc) in an identical manner.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
So you have a base table called "evidence" and then you have a knives table, a guns table, a clothes table, that all join off this evidence table. The only weakness to this system is the inability to address completely brand new types of evidence, which I feel is the only scenario where EVA shines, and it doesn't happen often enough, I feel, to merit using it anyway.
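A minimal sketch of that layout, using the gun example from above (all names assumed):

code:
create table evidence (
    evidence_id   int not null primary key,
    evidence_type varchar(30) not null,   -- 'gun', 'knife', 'drugs', ...
    found_by      varchar(50),
    found_where   varchar(100)
);

create table guns (
    evidence_id   int not null primary key
                  references evidence (evidence_id),
    make          varchar(50),
    model         varchar(50),
    serial_number varchar(50),
    caliber       varchar(20)
);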

Puddleboy
Feb 15, 2008

Triple Tech posted:

So you have a base table called "evidence" and then you have a knives table, a guns table, a clothes table, that all join off this evidence table. The only weakness to this system is the inability to address completely brand new types of evidence, which I feel is the only scenario where EVA shines, and it doesn't happen often enough, I feel, to merit using it anyway.

That's exactly what happens with our customer base. The customization of this structure is completely in the hands of the clients and they do, on a very regular basis, modify the types of evidence they are tracking -- especially forensic labs, who might deal with items from a crime scene one day and corporate information theft the next. We use the same software for any of the many industries that can benefit from its use, and it is flexible enough to allow them to track anything the way they want to.

I completely agree that for most (the vast majority) of purposes/applications EVA can make a mess of things quick, but the only claim I am making (from experience) is that it does, indeed, have non-abhorrent uses ;)

Victor
Jun 18, 2004

Triple Tech posted:

So you have a base table called "evidence" and then you have a knives table, a guns table, a clothes table, that all join off this evidence table. The only weakness to this system is the inability to address completely brand new types of evidence, which I feel is the only scenario where EVA shines, and it doesn't happen often enough, I feel, to merit using it anyway.
So, you require DDL to be run every time there's a new table or new column? We need metadata on the tables to indicate which tables are types of evidence, or we need an inviolable naming convention. Remember that any permissions settings probably need to get applied to all tables. Parameterized SQL is not allowed to parameterize the table name, so we're going to have to get into dynamic SQL. Any triggers on one table need to get replicated to all the others.

N.B. this only applies if one is not using table inheritance:
Views, if they exist, need replicating. Want to view "all items"? Then N tables need to be unioned together somehow. Whatever does that unioning needs to be kept up-to-date. Do you really want to require all this if we can make performance a non-issue?

Victor fucked around with this message at 02:26 on Nov 6, 2008

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
You don't union heterogeneous objects, that's what the evidence table is for. It has all of the pieces of evidence in the entire universe. The only thing in the other types is information specific to that piece of evidence.

It's the whole "let's treat things that are wildly different as the same without factoring out the similarities" angle that I find ridiculous.

Potassium Problems
Sep 28, 2001
Thanks for the help and recommendations, guys. I'm talking my boss into upgrading to SQL 2005, and I'm getting familiar with pivot in the meantime.

MoNsTeR
Jun 29, 2002

Triple Tech posted:

The only weakness to this system is the inability to address completely brand new types of evidence, which I feel is the only scenario where EVA shines, and it doesn't happen often enough, I feel, to merit using it anyway.
Make a new table. More specifically, enable your application to create new tables behind the scenes as necessary. You can (and should) even create the corresponding referential integrity.

There's nothing wrong with having thousands or even tens of thousands of tables in your application's schema, and there's nothing wrong with tables that have hundreds of columns. Intuitively, there's a point at which this becomes unmanageable; if nothing else, many implementations have hard limits on things like columns per table. But unless you're scratching at that limit, implementing an EVA is just an exercise in abstracting the RDBMS from itself, and re-inventing everything it normally does for you, like referential integrity, in ways that don't actually work.

The fallacy that leads people to create unnecessary EVAs is thinking that it's easier to maintain a set of abstracted, generic tables that use homegrown faux-RI (or none at all) than it is to create lots and lots of tables. It isn't.

Rumsfoord
Jan 29, 2005

Eat or Be Eaten

MoNsTeR posted:

Make a new table. More specifically, enable your application to create new tables behind the scenes as necessary. You can (and should) even create the corresponding referential integrity.

There's nothing wrong with having thousands or even tens of thousands of tables in your application's schema, and there's nothing wrong with tables that have hundreds of columns. Intuitively, there's a point at which this becomes unmanageable; if nothing else, many implementations have hard limits on things like columns per table. But unless you're scratching at that limit, implementing an EVA is just an exercise in abstracting the RDBMS from itself, and re-inventing everything it normally does for you, like referential integrity, in ways that don't actually work.

The fallacy that leads people to create unnecessary EVAs is thinking that it's easier to maintain a set of abstracted, generic tables that use homegrown faux-RI (or none at all) than it is to create lots and lots of tables. It isn't.


I have to agree with this guy... we use EAV at work and it is horrible. I want to punch the "architect" who came up with our schema... we are years into using the EAV schema -- and now we have a huge project to get out of it because of all of the problems we are having...


There is no referential integrity -- we have attributes of the same type for the same entity all over the place -- in one case I've seen 46 active attributes of the same type (with different values).

An example of this is a car -- we have one attribute for the car -- called color.

Imagine if you go in to query the attributes for CAR 123, and you get back 46 attributes -- some saying the car is blue, some saying the car is red, etc.

In addition, every time you want to display another attribute, you either join to the attribute table again, or use another call to a function to get the attribute's value.

You also have to deal with types. I don't know what the standard way to deal with types is for EAV models, but ours uses a god drat varchar to store every single type of data, and all kinds of crap is built in to convert it to a real type (for example, a money amount in our system is stored as a varchar with a precision and scale and a type id specifying what kind of value it is... when you go to retrieve the value, it reads the mask and scale and turns '010033' into 100.33).

Furthermore, there are attributes out there for certain customers that are the same as other previously created attributes -- back to the car example -- one customer stores his car colors as a 'color' attribute while another calls it a 'hue' attribute. So you want to know which customers of yours have blue cars, and now you are just supposed to know that you need to check for both 'hue' and 'color' attributes. The system sure as hell doesn't know the two attributes are synonymous (unless you program in more convoluted logic to tell it so).

Hey -- guess what -- you also get the fun of debugging 300-line queries full of magic numbers -- and I remember one day a report stopped working "all of a sudden." It was a query with 17 left joins to the attributes table -- it started with 2000 rows and due to data errors (this is where I found those 46 duplicate attributes for a given entity) it ballooned up to 16,000,000 rows (which a distinct clause would pare back down to a few thousand, albeit at huge expense).

I HATE EAV... every day I think of quitting because of the horrible poo poo queries I see and have to debug.... but for some reason I don't. Maybe I am a masochist.

Rumsfoord
Jan 29, 2005

Eat or Be Eaten

dancavallaro posted:

This man speaks the truth. It didn't work when I tried it in the WHERE clause, but with HAVING it worked fine.

One step further: how can I, instead of selecting the DonorID along with the sum, select the DonorName from the Donors table corresponding to that DonorID?

edit: This worked:

code:
SELECT (SELECT FullName FROM Donors WHERE Donors.DonorID = Donations.DonorID), SUM(DonationAmt)
FROM Donations
GROUP BY Donations.DonorID
HAVING SUM(DonationAmt) >= 250;

You can also do a join if you don't want to do a bunch of subselects... there are 2 ways you can do it:

I prefer:
code:
SELECT 
     D.DonorID,
     D.FullName,
     SUM(DN.DonationAmt) as TotalDonations
FROM
     Donations DN JOIN Donors D
          ON DN.DonorID = D.DonorID
GROUP BY D.DonorID, D.FullName
or

code:
SELECT 
     D.DonorID,
     D.FullName,
     SUM(DN.DonationAmt) as TotalDonations
FROM
     Donations DN, Donors D
WHERE          
     DN.DonorID = D.DonorID
GROUP BY D.DonorID, D.FullName

Acer Pilot
Feb 17, 2007
put the 'the' in therapist

:dukedog:

Oh god. Please help me figure out what index to put in for this. I see this in all my slow logs since it doesn't use indexes. It's called on pretty much every page on my forum.

code:
SELECT moderator.mid as is_mod, moderator.allow_warn, m.*, g.*
FROM ibf_members m
	LEFT JOIN ibf_groups g ON (g.g_id=m.mgroup)
	LEFT JOIN ibf_moderators moderator ON (moderator.member_id=m.id OR moderator.group_id=m.mgroup )
WHERE m.id=592;
code:
id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
1 	SIMPLE 	m 	const 	PRIMARY,id 	PRIMARY 	3 	const 	1 	 
1 	SIMPLE 	g 	const 	PRIMARY 	PRIMARY 	4 	const 	1 	 
1 	SIMPLE 	moderator 	ALL 	group_id,member_id 	NULL 	NULL 	NULL 	30
I can't, for the life of me, figure this out.

Victor
Jun 18, 2004

Triple Tech posted:

You don't union heterogeneous objects, that's what the evidence table is for. It has all of the pieces of evidence in the entire universe. The only thing in the other types is information specific to that piece of evidence.
You're correct -- I forgot you had stated the use of table inheritance. That addresses only part of my post. In particular, with the multi-table approach, you still need some metadata scheme to describe the tables and columns to users, and have the application itself metadata driven. It's not all peaches and cream. I have yet to be convinced that it is always a better approach than EVA.

Just to note, I have quite a bit of experience with using metadata to describe data structure and then automatically generate user interfaces from that metadata. In fact, my shop has an app that would very easily support the table inheritance model you propose. However, this is only because we have a pretty extraordinary metadata architecture -- something that took a lot of time to develop.

quote:

It's the whole "let's treat things that are wildly different as the same without factoring out the similarities" angle that I find ridiculous.
Hmmm, I could derive an attack against dynamic languages from that statement. :-p More to the point, you don't seem to be willing to do a solid comparison between table inheritance and EVA; instead, you argue on solidly ideological grounds. That's called ivory tower thinking in some places.

Rumsfoord, be careful of criticizing technique X because you have personal experience with technique X implemented in a particularly crappy way. Your multiple-values-for-a-given-attribute problem can be solved with a simple unique constraint. Trust me, duplicate, crappy data is in no way unique to EVA. Your performance worries about many joins or subqueries are alleviated by SQL2005's pivot operator, which I demonstrate in my short snippet on using EVA to store audit logs. Some of the crappier databases don't support pivot and it's a crying shame. I've written SQL2005 views that pivot over 25 columns, resulting in a view that has 2,000,000 rows, and performance was lightning-fast. Your color-vs-hue problem is merely a problem with enforcing semantically unique attribute names.
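A minimal sketch of that unique constraint fix (table and column names invented):

code:
-- at most one value per (entity, attribute) pair
alter table entity_attribute_value
    add constraint uq_entity_attr unique (entity_id, attribute_id);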

Rabbi Dan
Oct 26, 2005

ASK ME ABOUT MY CREEPY FACEBOOK APP FOR STALKING GIRLS I DON'T KNOW
I'm trying to write an SQL query for my .NET AJAX AutoCompleteExtender control.

I have a bunch of records in a column in my database which are class titles and I'm trying to write the query that will get the suggestion results for the auto-suggest box.

Example class records might be:

"Contemporary Documentary Films"

"Electromagnetic Fields"

"Intermediate Calculus"


So far all I have is: SELECT * FROM Classes WHERE Title Like 'Prefix%'

But that only matches the first word in the title string. I want to make it so that typing "Doc" would match "Contemporary Documentary Films" just like typing "Cont" and "Fil" would. I can't just do WHERE Title Like '%Prefix%' because then weird stuff like "cum" would match "Documentary" in "Contemporary Documentary Films" which is undesirable.

Could anyone please let me know how I can do this? Thanks!

Puddleboy
Feb 15, 2008

Rabbi Dan posted:

I'm trying to write an SQL query for my .NET AJAX AutoCompleteExtender control.

I have a bunch of records in a column in my database which are class titles and I'm trying to write the query that will get the suggestion results for the auto-suggest box.

Example class records might be:

"Contemporary Documentary Films"

"Electromagnetic Fields"

"Intermediate Calculus"


So far all I have is: SELECT * FROM Classes WHERE Title Like 'Prefix%'

But that only matches the first word in the title string. I want to make it so that typing "Doc" would match "Contemporary Documentary Films" just like typing "Cont" and "Fil" would. I can't just do WHERE Title Like '%Prefix%' because then weird stuff like "cum" would match "Documentary" in "Contemporary Documentary Films" which is undesirable.

Could anyone please let me know how I can do this? Thanks!


It looks like a simple solution to this would be to match only the beginning of each word, which you could do with:

WHERE Title LIKE 'Prefix%' OR Title LIKE '% Prefix%';

Be sure to note the space in the second condition.

Rabbi Dan
Oct 26, 2005

ASK ME ABOUT MY CREEPY FACEBOOK APP FOR STALKING GIRLS I DON'T KNOW

Puddleboy posted:

It looks like a simple solution to this would be to match only the beginning of each word, which you could do with:

WHERE Title LIKE 'Prefix%' OR Title LIKE '% Prefix%';

Be sure to note the space in the second condition.

Wow that works...and I get it...Thank you.

thefncrow
Mar 14, 2001
I'm working in a SQL Server 2005 environment here.

I have a large data table with a 3 part primary key. The 3 part primary key is something like CustomerNumber, BillID, LineItem. Changes to this table are difficult at best, because it's large and partitioned.

I've got another table, now, where I'm working with the line items for a given (CustomerNumber, BillID) combination. Basically, this table is an identity int column, CustomerNumber, BillID, and some XML data about those line items and some associated data for each. What I want is a constraint on this table that, for any inserted row, the (CustomerNumber, BillID) combination matches a record in my larger table.

I can't do it as a foreign key, because of the third part of the key (and the PK without the third part is not unique to each record). How can I enforce this? It seems like I'm missing something obvious, but I can't see it.

Sergeant Hobo
Jan 7, 2007

Zhu Li, do the thing!

thefncrow posted:

I'm working in a SQL Server 2005 environment here.

I have a large data table with a 3 part primary key. The 3 part primary key is something like CustomerNumber, BillID, LineItem. Changes to this table are difficult at best, because it's large and partitioned.

I've got another table, now, where I'm working with the line items for a given (CustomerNumber, BillID) combination. Basically, this table is an identity int column, CustomerNumber, BillID, and some XML data about those line items and some associated data for each. What I want is a constraint on this table that, for any inserted row, the (CustomerNumber, BillID) combination matches a record in my larger table.

I can't do it as a foreign key, because of the third part of the key (and the PK without the third part is not unique to each record). How can I enforce this? It seems like I'm missing something obvious, but I can't see it.

I think you're looking for a T-SQL trigger. Basically, you would need to run a query on the 3-key table to search for the specific CustomerNumber and BillID combo that is coming into the database; if it exists in the 3-key table, you go ahead with the INSERT and, if not, throw back some kind of error message. Sorry, but I'm not familiar with T-SQL; the limited experience I have with this stuff has been through Oracle's PL/SQL.
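For reference, a hypothetical T-SQL version of that trigger (table names invented; note that SQL Server triggers fire once per statement, so the check has to cover every row in the inserted pseudo-table):

code:
CREATE TRIGGER trg_CheckBillExists ON dbo.BillLineItems
AFTER INSERT
AS
BEGIN
    -- reject the insert if any new row lacks a matching (CustomerNumber, BillID)
    IF EXISTS (SELECT 1
               FROM inserted i
               WHERE NOT EXISTS (SELECT 1
                                 FROM dbo.BigBillTable b
                                 WHERE b.CustomerNumber = i.CustomerNumber
                                   AND b.BillID = i.BillID))
    BEGIN
        RAISERROR('No matching (CustomerNumber, BillID) in the bill table.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END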

thefncrow
Mar 14, 2001

Sergeant Hobo posted:

I think you're looking for a T-SQL trigger. Basically, you would need to run a query on the 3-key table to search for the specific CustomerNumber and BillID combo that is coming into the database; if it exists in the 3-key table, you go ahead with the INSERT and, if not, throw back some kind of error message. Sorry, but I'm not familiar with T-SQL; the limited experience I have with this stuff has been through Oracle's PL/SQL.

Yeah, I just was thinking about a trigger, and I think that's the way I'm going to want to go at this point. It just didn't occur to me at first because I kept thinking there was some sort of constraint I could apply to the table that would do the trick, but I think it'll just have to be a trigger.

Thanks.

Sergeant Hobo
Jan 7, 2007

Zhu Li, do the thing!
Before making that last post, I was thinking you might be able to do this with some kind of assertion on the tables, but I don't think that would've been correct, unless you can do something like:

code:
CHECK EXISTS(query to search 3-key table for new values of CustomerNumber and BillID);
And even then, I'm not sure if that'd be the best approach. The trigger might be better overall for this kind of thing.

Forzan
Mar 15, 2002

by Ozmaugh
I've finished the design work and I'm about to start coding a web application that will use about sixty tables with around twenty to one hundred columns in each, most of those integers and booleans. A few of the tables are going to be update intensive. I was thinking of using Postgres.

I looked for database comparisons using Google, but I can't find anything after 2005. In fact, all of the old news I read misled me to believe that MySQL couldn't perform stored procedures, or that Postgres had performance problems. But apparently Postgres is pretty fast and MySQL has had stored procedures built in since 2005. It's even worse if I limit my search results to within the last year - just a bunch of articles from 2001 and 2003 that have had website redesigns.

So what's the new consensus on Postgres for web applications? Familiarity isn't a problem, nor is hosting. Should anything deter me from using it? Does this come down to religion?

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

thefncrow posted:

I'm working in a SQL Server 2005 environment here.

I have a large data table with a 3 part primary key. The 3 part primary key is something like CustomerNumber, BillID, LineItem. Changes to this table are difficult at best, because it's large and partitioned.

I've got another table, now, where I'm working with the line items for a given (CustomerNumber, BillID) combination. Basically, this table is an identity int column, CustomerNumber, BillID, and some XML data about those line items and some associated data for each. What I want is a constraint on this table that, for any inserted row, the (CustomerNumber, BillID) combination matches a record in my larger table.

I can't do it as a foreign key, because of the third part of the key (and the PK without the third part is not unique to each record). How can I enforce this? It seems like I'm missing something obvious, but I can't see it.
Two sets of questions:
Why is there not a "bills" table which has a key of (CustomerNumber, BillId) instead of just a "bill line items" table? If there is such a table, then why aren't you using that for the foreign key constraint?

If your new table has XML data about the line items, why is LineItem not part of the key? If the XML data is about the bills and not the bill line items, see the above questions.

If you don't have any real ability to change the schema then neither of those questions is particularly helpful, but it seems like it points out why you're having trouble.

As for your problem, I just recently learned about check constraints, and they're my new best friend. To make one that does what you want, create a UDF that returns true or false depending on whether a given (CustomerNumber, BillId) is in the line item table, and then create a check constraint on the table which requires that the function return true for the (CustomerNumber, BillId) of the record in question.
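A sketch of that UDF-plus-check-constraint approach (table and function names invented; SQL Server has no boolean type, so the function returns a bit):

code:
CREATE FUNCTION dbo.fn_BillExists (@CustomerNumber int, @BillID int)
RETURNS bit
AS
BEGIN
    -- true iff the (CustomerNumber, BillID) pair exists in the big table
    IF EXISTS (SELECT 1 FROM dbo.BigBillTable
               WHERE CustomerNumber = @CustomerNumber
                 AND BillID = @BillID)
        RETURN 1;
    RETURN 0;
END
GO

ALTER TABLE dbo.BillLineItems WITH CHECK
    ADD CONSTRAINT ck_BillExists
    CHECK (dbo.fn_BillExists(CustomerNumber, BillID) = 1);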

pikes
Aug 9, 2002
:o
In SQL 2005 I'm writing a query that works fine as an inner select or a join but I know there's gotta be a way to do it without doing either of those, for obvious performance reasons. I just want to hit the database once.

Let's say you have a table with 2 columns, users and shipments. A user can have multiple shipments, but those shipments can only be inbound or outbound, so the table would look like this (I'm keeping it very basic for the sake of this example):

Shipments
code:
User            Shipment
1               Inbound
1               Inbound
1               Inbound
1               Inbound
2               Inbound
2               Outbound
2               Inbound
3               Inbound
3               Inbound
3               Inbound
3               Inbound
4               Outbound
4               Outbound
4               Outbound
I want to pull back all users that did not have any outbound shipments. I know I can do it with an inner select:
code:
select user
from users
where user not in
(
        select user
        from shipments
        where shipment = 'Outbound'
)
This seems so simple but I don't know how I'd do it without hitting the database twice. Any ideas?

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

pikes posted:

This seems so simple but I don't know how I'd do it without hitting the database twice. Any ideas?
The other way to write it would be

code:
select users.user
from users left outer join
     shipments on shipments.user = users.user and shipment = 'Outbound'
where shipments.user is null
You'll have to test to see which is better, assuming they don't perform the same.

Stephen
Feb 6, 2004

Stoned

Forzan posted:

I've finished the design work and I'm about to start coding a web application that will use about sixty tables with around twenty to one hundred columns in each, most of those integers and booleans. A few of the tables are going to be update intensive. I was thinking of using Postgres.

I looked for database comparisons using Google, but I can't find anything after 2005. In fact, all of the old news I read misled me to believe that MySQL couldn't perform stored procedures, or that Postgres had performance problems. But apparently Postgres is pretty fast and MySQL has had stored procedures built in since 2005. It's even worse if I limit my search results to within the last year - just a bunch of articles from 2001 and 2003 that have had website redesigns.

So what's the new consensus on Postgres for web applications? Familiarity isn't a problem, nor is hosting. Should anything deter me from using it? Does this come down to religion?
Wikipedia has a pretty good entry for this here. I don't know exactly how accurate it is, but it seems ok.

Victor
Jun 18, 2004

pikes posted:

I want to pull back all users that did not have any outbound shipments. I know I can do it with an inner select:
code:
select user
from users
where user not in
(
        select user
        from shipments
        where shipment = 'Outbound'
)
This seems so simple but I don't know how I'd do it without hitting the database twice. Any ideas?
This query does not hit the database twice. In fact, if you do a query plan on this, you will likely find that the query optimizer is performing a "Left Anti Semi Join". That's what Jethro's code does, explicitly. SQL Server 2005 almost definitely does the same thing for both queries.
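For completeness, the same anti-join can also be written with NOT EXISTS ([user] is bracketed here because it's a reserved word in SQL Server); all three forms typically compile to the same plan:

code:
select u.[user]
from users u
where not exists (select 1
                  from shipments s
                  where s.[user] = u.[user]
                    and s.shipment = 'Outbound')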

Vince McMahon
Dec 18, 2003
Got a bit of a potentially complex design/query issue..

I work for an image library, which has roughly 200k images on file. We're a small company and our system has grown organically which invariably means it's a bit of a mess. At the moment, image information is stored in a MySQL database along the lines of:

Reference code | Keywords | Cat1 | Cat2 | Cat3 | Cat4 | Cat5

The keywords field is a string of 20-30 keywords delimited by commas (e.g. "keyword1,keyword2,keyword3......"). There are tens of thousands of keywords (maybe more). There are maybe 50 different category types (stored in the 'cat' fields).

When people search at the moment, the query finds any image refs where the keywords are LIKE '%input%', or any of the 5 categories are like '%input%'. This means that when it's searching for say '%dog%', it goes through a string of up to 255 characters in the keywords field, plus each category field, for 100,000 records. It's painful to watch it struggle like this.

I'm wondering if you guys know of a better way to 1) store the information, and 2) search the information. I'm not particularly advanced with mysql, but I thought perhaps a relational database might be in order as I hate comma-delimited fields. Any input is appreciated.

Squashy Nipples
Aug 18, 2007

Wow, interesting problem, I've never dealt with anything quite like that. I'm curious to hear if there are "standard" solutions to this.

The one thing I do know is that putting any kind of delimited data into a DB field completely defeats the purpose of a relational DB. That could be most of your performance problems right there. To fix the delimited field, I would make a three-column table just for the keywords: unique key, picture ID, keyword. It will have a ton of rows, but you'll be able to properly join and filter on specific keywords.
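A minimal MySQL sketch of that keyword table (all names invented):

code:
CREATE TABLE picture_keywords (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    picture_id INT UNSIGNED NOT NULL,
    keyword    VARCHAR(64)  NOT NULL,
    INDEX idx_keyword (keyword),   -- filter on specific keywords
    INDEX idx_picture (picture_id) -- join back to the picture table
);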

Vince McMahon
Dec 18, 2003

Dr.Khron posted:

Wow, interesting problem, I've never dealt with anything quite like that. I'm curious to hear if there are "standard" solutions to this.

The one thing I do know is that putting any kind of delimited data into a DB field completely defeats the purpose of a relational DB. That could be most of your performance problems right there. To fix the delimited field, I would make a three-column table just for the keywords: unique key, picture ID, keyword. It will have a ton of rows, but you'll be able to properly join and filter on specific keywords.

Yeah sorry I didn't explain that very well. By relational DB I kind of meant what you described - basically a row per keyword per image. It concerns me that the rows in this database will be in the millions - will that slow things down or can mysql handle it?

toby
Dec 4, 2002

Is there any reason not to make a keywords table, with just unique id and keyword, a picture table, with picture info, and a table to tie the two together (keyword id, picture id)?
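A minimal MySQL sketch of that three-table layout (column names chosen to match the query Vince posts below):

code:
CREATE TABLE Kw_table (
    Kid     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Keyword VARCHAR(64) NOT NULL,
    UNIQUE (Keyword)               -- each keyword stored once
);

CREATE TABLE P_table (
    Pid     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Picture VARCHAR(255) NOT NULL  -- reference code / filename
);

CREATE TABLE Kw_p_table (
    Kid INT UNSIGNED NOT NULL,
    Pid INT UNSIGNED NOT NULL,
    PRIMARY KEY (Kid, Pid)         -- one row per keyword-per-picture
);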

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Vince McMahon posted:

Yeah sorry I didn't explain that very well. By relational DB I kind of meant what you described - basically a row per keyword per image. It concerns me that the rows in this database will be in the millions - will that slow things down or can mysql handle it?
Even MySQL will be able to handle millions of rows better than it can handle comma-delimited fields.

Vince McMahon
Dec 18, 2003

toby posted:

Is there any reason not to make a keywords table, with just unique id and keyword, a picture table, with picture info, and a table to tie the two together (keyword id, picture id)?

Would that be like this?

code:
SELECT   P_table.Picture
FROM     P_table
         INNER JOIN Kw_p_table
           ON P_table.Pid = Kw_p_table.Pid
         INNER JOIN Kw_table
           ON Kw_table.Kid = Kw_p_table.Kid
WHERE    Kw_table.Keyword LIKE '%input%'
GROUP BY P_table.Picture
Is that technically more efficient than having two tables - one for unique keywords and one for keywords + picture pairs, or one table (row for every picture & keyword)?
