Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!

namol posted:

Stupid SQL Question here.

System: SQL Server 2005 running on Server 2003 Enterprise 64bit.

Can you mix the LIKE clause of WHERE and the CONTAINS clause? I'm trying to implement a Google-like search feature on an internal site and I'm having issues searching for numerical values. The table being hit has been set up for full-text searching. Whenever I search for an exact numerical value it never returns any results, but if I search using a wildcard it will find it.

SELECT IDNum, QuoteNumber, DodgeNumber, JobName, Estimator, City, State, Rep, Rep2, ProjectType, Zip, County FROM estdodge WHERE CONTAINS(*, '"12345"') Order by QuoteNumber DESC

SELECT IDNum, QuoteNumber, DodgeNumber, JobName, Estimator, City, State, Rep, Rep2, ProjectType, Zip, County FROM estdodge WHERE CONTAINS(*, '"1234*"') Order by QuoteNumber DESC

E.g.:
Search for 12345 returns no results.
Search for 1234* returns results like 12341,12342,12343,12344,12345,123456 etc.

Is there some reason for this behavior that I'm missing?

Are all the columns string data types? I wonder if it's not matching them as non-wildcard because of string->number matching issues or something strange.

namol
Mar 21, 2007
I double checked this morning and the columns are all varchar data types except for the IDNum which is an integer type.

Can CONTAINS and LIKE be combined in the same WHERE clause?
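
From what I can tell they're both just predicates, so ANDing them together should be legal; a minimal sketch reusing the columns above:

code:
SELECT IDNum, QuoteNumber, JobName
FROM   estdodge
WHERE  CONTAINS(*, '"1234*"')
       AND QuoteNumber LIKE '12345%'
ORDER  BY QuoteNumber DESC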

Frozen Peach
Aug 25, 2004

garbage man from a garbage can
I know there's got to be a way to handle this in SQL but gently caress if I know how. My SQL-fu is weak. Hopefully someone can point me in the right direction.

We have a customer database. Customers are grouped by districts, and each district has its own customer entry along with any number of customers inside that district. It's all one big ugly table and I hate it. A customer number is 8 digits. The first 5 signify the district, the last 3 the customer within that district. The 000 record is the master record for the district.

Ex:

12345000 = District
12345001 = Customer #1
12345023 = Customer #23

Each customer has a value I want to group by.

code:
SELECT Substring(customer, 1, 5) + '000', 
       VALUE, 
       COUNT(*) 
FROM   customers 
GROUP  BY Substring(customer, 1, 5) + '000', 
          VALUE 

Customer        Value   Count
-----------------------------
12345000        120     5
12345000        123     2
12345000        997     1
98765000        120     5
11111000        390     1
22222000        120     1
So for district 12345 there are 5 customers with 120, 2 with 123, and 1 with 997. For 98765 there are 5 and they're all 120.

The problem is that for 12345 I need to know exactly which customers are the odd balls. The value with the highest count should be returned as customer 12345000, and any customer with a different value within that district should be listed individually.

I basically want my result set to look like this:

code:
Customer        Value
---------------------
12345000        120
12345001        123
12345002        123
12345023        997
98765000        120
11111000        390
22222000        120
I'm close to just giving up and doing it procedurally but I just have this sinking suspicion there's something stupid I'm missing that would let me do this all in SQL.

Edit: SQL Server 2000, if that makes a difference. :negative:

Frozen Peach fucked around with this message at 16:55 on Mar 23, 2011

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!
Upgrade to SQL 2005/8 and then use analytical functions.
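
Roughly, the 2005+ version could look like this (a sketch, assuming the customers(customer, value) layout from the post above): rank each district's values by how many customers share them, report the most common one as the 000 record, and list everyone who differs.

code:
WITH counted AS (
    SELECT SUBSTRING(customer, 1, 5) AS district,
           customer,
           value,
           COUNT(*) OVER (PARTITION BY SUBSTRING(customer, 1, 5), value) AS group_count
    FROM   customers
), ranked AS (
    SELECT *,
           RANK() OVER (PARTITION BY district ORDER BY group_count DESC) AS rnk
    FROM   counted
)
SELECT DISTINCT district + '000' AS customer, value
FROM   ranked
WHERE  rnk = 1    -- the district's most common value
UNION ALL
SELECT customer, value
FROM   ranked
WHERE  rnk > 1    -- everyone holding a less common value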

Molotov Cock Tale
Jun 30, 2010

Frozen-Solid posted:

I know there's got to be a way to handle this in SQL but gently caress if I know how. My SQL-fu is weak. Hopefully someone can point me in the right direction.
Edit: SQL Server 2000, if that makes a difference. :negative:

My SQL-Fu is strong :science: well strong-ish, but I'm not sure how you wanted duplicate groups handled, so this doesn't do that. I think I've got the test data roughly right, let me know if not...

code:

DECLARE @TABLE TABLE
(
	Customer CHAR(8) NOT NULL,
	Value SMALLINT NOT NULL
)
INSERT INTO @TABLE(Customer,Value)
SELECT '12345001', 123 UNION
SELECT '12345002', 123 UNION
SELECT '12345003', 120 UNION
SELECT '12345004', 120 UNION
SELECT '12345005', 120 UNION
SELECT '12345006', 120 UNION
SELECT '12345007', 120 UNION
SELECT '12345023', 997 UNION
SELECT '98765001', 120 UNION
SELECT '98765002', 120 UNION
SELECT '98765003', 120 UNION
SELECT '98765004', 120 UNION
SELECT '98765005', 120 UNION
SELECT '11111000', 390 UNION
SELECT '22222000', 120

SELECT A.AreaCode + '000' AS Customer, A.Value 
FROM
(
	SELECT SUBSTRING(customer, 1, 5) AS AreaCode, Value, COUNT(*) AS GroupCount
	FROM   @TABLE AS T
	GROUP  BY SUBSTRING(customer, 1, 5), Value
) AS A
INNER JOIN
(
	SELECT AreaCode, MAX(GroupCount) AS MaxGroupCount
	FROM
	(
		SELECT SUBSTRING(customer, 1, 5) AS AreaCode, Value, COUNT(*) AS GroupCount
		FROM   @TABLE 
		GROUP  BY SUBSTRING(customer, 1, 5), Value
	) AS Derived
	GROUP BY AreaCode
) AS B ON B.AreaCode = A.AreaCode AND B.MaxGroupCount = A.GroupCount

UNION

SELECT T.Customer, T.Value
FROM @TABLE AS T
INNER JOIN
(
	SELECT A.AreaCode, A.Value FROM
	(
		SELECT SUBSTRING(customer, 1, 5) AS AreaCode, Value, COUNT(*) AS GroupCount
		FROM   @TABLE AS T
		GROUP  BY SUBSTRING(customer, 1, 5), Value
	) AS A
	INNER JOIN
	(
		SELECT AreaCode, MAX(GroupCount) AS MaxGroupCount
		FROM
		(
			SELECT SUBSTRING(customer, 1, 5) AS AreaCode, Value, COUNT(*) AS GroupCount
			FROM   @TABLE 
			GROUP  BY SUBSTRING(customer, 1, 5), Value
		) AS Derived
		GROUP BY AreaCode
	) AS B ON B.AreaCode = A.AreaCode AND B.MaxGroupCount = A.GroupCount
) AS C ON SUBSTRING(T.customer, 1, 5) = C.AreaCode AND T.Value <> C.Value

And yes, this would have been massively easier in 2005 with windowing functions, but upgrading from 2000 -> 2005 is much harder than upgrading 2005->2008.

kimbo305
Jun 9, 2007

actually, yeah, I am a little mad

Golbez posted:

Let's say I have a nightly process that imports millions of rows from a file. We create a table and indexes, then load the file. This process is running so long, it's starting to run into the work day. So I was wondering: Do you think there'd be quantifiable optimization by not defining the indexes before we load the file (thus not having to reindex on every entry), and defining the indexes only after the file is loaded?

This is a higher-level response --
We went the way of delaying indexing and doing it on the backside to speed things up. Then our data went up by 10x. Then we rewrote our SLAs to push back data availability to 9am. Then our data went up 10x again. We had to switch to Vertica (which is no walk in the park itself).

There are two points I'd impart to you:
- consider if you really need to load/store everything, at least into MySQL. In our case, it's really a waste, since the customers look at maybe 1% of the data ever. The reason is that we were ambitious about what we wanted to store. And we decided to store everything, where our data was human-generated. So tons and tons of insignificant events. Instead of coming up with the right summary data to extract, we blindly loaded in everything. So when you're talking about hundreds of millions of different events, it's just not wise to be trying to store it into a relational database. And yet we still try :v:
- If you're having performance issues with MySQL, I'd say you're already on thin ice, in terms of disk space and indexing time. If your data is human-generated and could scale up quickly, be vigilant, I guess.
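
As a footnote, the delayed-indexing pattern itself is simple in MySQL; something like this (the DISABLE KEYS trick is MyISAM-only, and the table/file names here are made up):

code:
ALTER TABLE nightly_import DISABLE KEYS;  -- stop maintaining non-unique indexes
LOAD DATA INFILE '/data/nightly.csv' INTO TABLE nightly_import;
ALTER TABLE nightly_import ENABLE KEYS;   -- rebuild the indexes in one pass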

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

kimbo305 posted:

This is a higher-level response --
<snip>

It's a problem that will be solved within the year because we're moving that database in-house, and in the meantime, I managed to cut the replication time down by six hours, making me a local hero in the office. :v:

Frozen Peach
Aug 25, 2004

garbage man from a garbage can

Molotov Cock Tale posted:

My SQL-Fu is strong :science: well strong-ish, but I'm not sure how you wanted duplicate groups handled, so this doesn't do that. I think I've got the test data roughly right, let me know if not...

<snip>

And yes, this would have been massively easier in 2005 with windowing functions, but upgrading from 2000 -> 2005 is much harder than upgrading 2005->2008.

I've only gotten to the first half of it, but so far this looks awesome. It's taking some time to modify the queries to work with my data (there's all sorts of joins and where clauses and magic I'm doing just to get it in the format I posted). I'm not sure what you mean by duplicate groups, but it's early and my brain hasn't kicked in yet. This is a great start though and I should be able to handle the rest. Thank you so much!

Every problem I've run into with SQL in the last 2 years comes with "well if you were on 2005!!!" It's really depressing. We've looked at upgrading to 2005 or 2008, but we're a small business and the cost of upgrading (both in testing the migration and rewriting a ton of stored procedures that likely won't work... and the actual server software itself) is pretty staggering.

Edit: finished screwing around with the rest of the query and from what I can tell it's perfect. Send me an e-mail (frozen at frozen-solid.net) and I'll give you a plat upgrade certificate for your time.

Frozen Peach fucked around with this message at 15:08 on Mar 24, 2011

Molotov Cock Tale
Jun 30, 2010

Frozen-Solid posted:

Edit: finished screwing around with the rest of the query and from what I can tell it's perfect.

What I'd meant by duplicate groups was how it should be handled when two values in the same district have the same count. Looking at some test data I made up, it may go a bit wrong:

code:

DECLARE @TABLE TABLE
(
	Customer CHAR(8) NOT NULL,
	Value SMALLINT NOT NULL
)
INSERT INTO @TABLE(Customer,Value)
SELECT '11111001', 390 UNION
SELECT '11111002', 391 UNION
SELECT '11111003', 390 UNION
SELECT '11111004', 391 UNION
SELECT '11111005', 392 

--Query snipped

Customer Value
-------- ------
11111000 390
11111000 391
11111001 390
11111002 391
11111003 390
11111004 391
11111005 392

(7 row(s) affected)

So if the "top value" for the district is tied between two or more, you'll get the duplicates. I haven't done a rewrite as you may have already fixed this, or it may not be possible with your data?

If it's a bad/repeated query to get the data in the initial format, don't forget temp tables. Depends on data volume of course; could be faster or slower than querying many times... Test, test, test.

Dropped you an email - let me know if you need more help, or try stackoverflow.com, where people will fight to the death to get an accepted answer on TSQL :allears:

Lister_of_smeg
May 25, 2005
Hi guys,

I'm doing a course on relational database theory. I'm stuck on an assignment question and wondered if any of you fine goons could maybe give me some pointers.

The question is based around the three relations detailed below. I've cut out the stuff that isn't relevant to this question:

relation Event
primary key: EventCode
Date

relation Entertainer
primary key: EntertainerCode

relation EntertainedBy
primary key: (EventCode, EntertainerCode)


I have to define a constraint on these relations of the following nature: An entertainer cannot perform at more than one event on the same date.

The constraint must be defined using relational operators such as select, join, project, divide etc.

I know how to get the relevant data together (project (Event join (Entertainer join EntertainedBy)) over EventCode, Date, EntertainerCode) but can't figure out how to add the constraint.

I'm not looking for the answer as I do want to figure this out myself, just some pointers to hopefully get me in the right direction.

Thanks!

Thel
Apr 28, 2010

My relational theory-fu is a bit rusty, but wouldn't that just be a unique constraint on EntertainedBy (i.e. no duplicate entries)?

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe

Thel posted:

My relational theory-fu is a bit rusty, but wouldn't that just be a unique constraint on EntertainedBy (i.e. no duplicate entries)?

Isn't the unique constraint enforced by the way the Primary Key on EntertainedBy is defined?

Lister_of_smeg
May 25, 2005
Sorry, I'll clarify the question a little: Write a relational expression to express the following constraint: An entertainer cannot perform at more than one event on the same date.
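
One general pattern for "at most one per (EntertainerCode, Date)" constraints, without giving the assignment away: join the relation with a renamed copy of itself and require any mismatching tuples to be absent. An SQL analogue of the idea (translating it back into select/join/project is the exercise):

code:
-- the constraint holds iff this returns no rows
SELECT 1
FROM   EntertainedBy eb1
       JOIN Event e1 ON e1.EventCode = eb1.EventCode
       JOIN EntertainedBy eb2 ON eb2.EntertainerCode = eb1.EntertainerCode
       JOIN Event e2 ON e2.EventCode = eb2.EventCode
WHERE  e1.Date = e2.Date
       AND e1.EventCode <> e2.EventCode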

Bloody Antlers
Mar 27, 2010

by Jeffrey of YOSPOS
I'm making a thesaurus application as a self learning project, and I'm a little confused about how I should structure my database. I can think of a few ways to do it, but I am interested in knowing which approach is most effective/efficient. I'm working with SQLite.

I am assuming the thesaurus database will have ~4,000 unique word entries, with each entry having anywhere from 1 to 100 synonyms associated with it. Since the number of synonyms for each word will be variable, I'm not sure how to best arrange/query the stored data.

I can immediately think of two ways to build the database & query synonyms:

1) Three columns & one table approach:

Table name = allwords.

Column A: ID (primary key, auto increment)
Column B: Word (the word which we want synonyms for)
Column C: Synonym (one synonym for the Word in column B)

Query: SELECT * FROM "allwords" WHERE word = "fight"

Returns:

1 fight scrap
2 fight altercation
3 fight scuffle
4 fight brawl

2) Two columns & many tables approach:

Each Word has its own table, and each table has two columns:

Table name = fight

Column A: ID (primary key, auto increment)
Column B: Synonym

Query: Select * FROM "fight"

Returns:

1 scrap
2 altercation
3 scuffle
4 brawl

Which of the two approaches would be best? Am I on the right track with either of them?

Any thoughts or comments would be greatly appreciated.

McGlockenshire
Dec 16, 2005

GOLLOCKS!
Possibly-stupid drowsy-before-bed idea.

Are there ever conditions under which word A is synonymous with word B, but word B is not synonymous with A?

I'm pretty sure the answer is no.

Therefore, you can effectively group all of the related words together into synonym families.

Table Families ( family integer primary key )
Table Words ( word text, family integer references Families(family), unique(word, family) )

To get synonyms, SELECT word, family FROM Words WHERE family = ( SELECT family FROM Words WHERE word = ? )

Keep in mind that words can be multiple parts of speech and have a different meaning based on that context. This implies that each word can be a member of multiple families, which instantly breaks the query I just wrote. You'd need to add part-of-speech or context or something else to the Words table to differentiate different word meanings to tie them together with the appropriate family of synonyms, then find a way to query that one specific word+pos+context.

Given that there could be multiple entries for one word, you could end up returning words from the synonym query that belong to other families, which you could then crawl out to a certain extent to get related-but-not-synonymous words.

Anyway, if this seems stupid, pick option #1. Option #2, the creation of multiple tables for specific data, is an anti-pattern and you should forget you ever proposed it.

McGlockenshire fucked around with this message at 07:32 on Mar 28, 2011

namol
Mar 21, 2007
Dumb question here but in sql server 2005 can you store the results of a query temporarily and then perform a search on those results? If there's a better way of doing things, please let me know, I'm a clueless newbie when it comes to sql in general.

spiritual bypass
Feb 19, 2008

Grimey Drawer
A subquery perhaps?

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Depending on what you want to do, and how long you want to retain the information, there's a couple ways. If it's pretty short term (e.g. the lifespan of a stored proc/function) you could use temporary tables, and for longer term you could use a view, maybe.
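
To sketch both options (table and column names invented for illustration):

code:
-- temp table: lives for the session/stored proc, can be indexed and re-queried
SELECT OrderID, Total
INTO   #recent_orders
FROM   Orders
WHERE  OrderDate >= '20110101'

SELECT * FROM #recent_orders WHERE Total > 100
GO

-- view: a saved query definition, re-evaluated every time you select from it
CREATE VIEW dbo.RecentOrders AS
SELECT OrderID, Total FROM Orders WHERE OrderDate >= '20110101'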

Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!

namol posted:

Dumb question here but in sql server 2005 can you store the results of a query temporarily and then perform a search on those results? If there's a better way of doing things, please let me know, I'm a clueless newbie when it comes to sql in general.

Are you sure you need another query? How about you post the problem you're trying to solve?

Arcsech
Aug 5, 2008
I've read the first few and the last few pages of the thread, but what's a good SQL tutorial? Google turns up w3schools (which I know I shouldn't use) and some other sketchy looking sites.

bamhand
Apr 15, 2010
The first couple of searches that Google turns up are OK; you mainly just want to practice a lot and Google specific things you want to do.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Depends on the language too. T-SQL (e.g. Microsoft SQL Server) or PL/SQL (e.g. Oracle) or Postgres/MySQL?

(below assumes T-SQL)
This looks okay for learning basics though I'm sure there's better ones, since it seems like it's just scraped from somewhere else: http://www.sql-tutorial.net/ (just a little MFA)

Generally when I do quick searches and want specific info I'll click on a java2s or devguru link 90% of the time.

Peanut and the Gang
Aug 24, 2009

by exmarx
I keep adding "WHERE key=x AND closed=0" to a lot of my mysql queries. But the rows have `closed` set to 1 after they're a month old. So as time goes on, the percent of my table rows with closed=1 will end up as the majority of my table. Is it a good idea to switch my index from a single index (key) to multipart index (closed, key)? From what I've read, it sounds like when doing a multipart index (in mysql), it's ideal to use the least variable column as the first part of the index. So until I'm able to start purging old data, the first 90% of the index will be with closed=1, which I want to ignore, and the last 10% will be the same as I was querying earlier, but without having to worry about extra processing WHERE closed=1 anymore. Is this method more efficient than my old method of using the first index and then filtering the 10% with closed=0?

To add an example:
code:
threads(
  id
  userId
  deleted
)
Query: SELECT * FROM threads WHERE userId=17 AND deleted=0;
If 90% of threads have deleted=1, which index is better?:
ALTER TABLE threads ADD INDEX(userId)
ALTER TABLE threads ADD INDEX(deleted,userId)

Peanut and the Gang fucked around with this message at 03:19 on Apr 6, 2011

Aredna
Mar 17, 2007
Nap Ghost
My understanding is that if you ever need to filter or query on only one of the fields instead of both, then the index can only be used when that field is the first one listed in the index.

With that said I'm not sure the best way to set up your indexes and I think most people here will say to measure performance on it a few different ways to determine which way is fastest for your environment.
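
Measuring it is cheap in MySQL; something like this against the hypothetical threads table from the question:

code:
ALTER TABLE threads ADD INDEX idx_deleted_user (deleted, userId);
EXPLAIN SELECT * FROM threads WHERE userId = 17 AND deleted = 0;
-- note the key and rows columns, then swap in the other candidate
ALTER TABLE threads DROP INDEX idx_deleted_user;
ALTER TABLE threads ADD INDEX idx_user (userId);
EXPLAIN SELECT * FROM threads WHERE userId = 17 AND deleted = 0;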

Thel
Apr 28, 2010

(MS SQL Server 2005, will be going to 2008 later this year and can wait until then if needed.)

So, I have a table with {staff_ID,start_datetime,end_datetime} tuples (there's a bunch of other stuff, those are the needed fields for this though). There's no restriction either in any of the applications or in the database to make sure times don't overlap and for one of our reports I need to amalgamate any overlapping appointments in order to get an actual time spent. i.e. a staff member might see one client from 10-12, then another client wanders in at 11 and stays until 1. Sum up the appointment times and it'd come to four hours, but the actual time spent by the staff is 10-1, so three hours.

I get the feeling this should be an already-solved problem, but my google-fu is weak.

stuph
Aug 31, 2004

donkey punching my way to the top

barbarianbob posted:

I keep adding "WHERE key=x AND closed=0" to a lot of my mysql queries. But the rows have `closed` set to 1 after they're a month old. So as time goes on, the percent of my table rows with closed=1 will end up as the majority of my table. Is it a good idea to switch my index from a single index (key) to multipart index (closed, key)? From what I've read, it sounds like when doing a multipart index (in mysql), it's ideal to use the least variable column as the first part of the index. So until I'm able to start purging old data, the first 90% of the index will be with closed=1, which I want to ignore, and the last 10% will be the same as I was querying earlier, but without having to worry about extra processing WHERE closed=1 anymore. Is this method more efficient than my old method of using the first index and then filtering the 10% with closed=0?

To add an example:
code:
threads(
  id
  userId
  deleted
)
Query: SELECT * FROM threads WHERE userId=17 AND deleted=0;
If 90% of threads have deleted=1, which index is better?:
ALTER TABLE threads ADD INDEX(userId)
ALTER TABLE threads ADD INDEX(deleted,userId)

You're almost always going to be faster on the more unique columns. It may make the most sense to do:

alter table threads add index userdeleted (userid, deleted);

That gets you quickly to a userid, and then the optimizer might use the index to filter on the deleted column as well.

Whilst farting I
Apr 25, 2006

I'm trying to organize media and have a very simple question stumping me.

There's 3 tables:

MasterList
DVD
Digital

MasterList contains multiple descriptive fields like genre, medium, etc., and one unique (not in terms of data, just in terms of a way to identify it to any observer) string with which to identify it. The other two tables also contain that unique identifier. Let's say I have Seinfeld Season 7 on DVD: the identifier "SS7" will be in the DVD table, but also MasterList. Or if I have it in a digital non-disc format like from iTunes, it will be in the Digital table instead.

Let's say I want to pull everything from MasterList that's not in the other two tables. How do I do that? I'm using SQLite and every single query I try is returning almost a million rows when it should be returning like a few hundred. :psyduck:

Here's what I'm trying:

code:
SELECT masterlist.id,
       genre,
       medium,
       YEAR,
       country
FROM   masterlist,
       dvd,
       digital
WHERE  masterlist.id <> digital.id
        OR masterlist.id <> dvd.id  
For clarity, the only column in the other two tables is that "id" column.

Whilst farting I fucked around with this message at 03:29 on Apr 10, 2011

wellwhoopdedooo
Nov 23, 2007

Pound Trooper!

Whilst farting I posted:

code:
SELECT masterlist.id,
       genre,
       medium,
       YEAR,
       country
FROM   masterlist,
       dvd,
       digital
WHERE  masterlist.id <> digital.id
        OR masterlist.id <> dvd.id  

When you do from with multiple tables like that, you get every possible combination of every row in all the tables. So, that WHERE clause is just excluding the rows that have corresponding IDs, which is probably only a small subset, and returning everything else.

code:
SELECT
  M.id,
  M.genre,
  M.medium,
  M.year,
  M.country
  FROM      masterlist M
  LEFT JOIN dvd        DV ON DV.id = M.id
  LEFT JOIN digital    DI ON DI.id = M.id
  WHERE
    DV.id IS NULL AND
    DI.id IS NULL
What we're doing here, is selecting everything from masterlist and trying to match to the other two tables. Normally a JOIN doesn't let you fail--if it can't match up to the other table, it doesn't return the original row either, which is the exact opposite of what you want.

The LEFT part of the join allows us to fail. When a LEFT JOIN fails, it sets everything to NULL for that table in the returned row.

So, our query will be all rows from masterlist. Then, we say, both the join table's id fields must be NULL. Since a primary key can't be NULL, it must have failed to join.

As it is, the query will return all rows in masterlist that aren't in either table. If you want all the rows in masterlist that are missing from one of the two tables, change that AND to OR. Change the fields you select to * to see what's going on in probably a better way than I'm explaining.
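
The same anti-join can also be spelled with NOT EXISTS, which reads fairly literally as "not in the other tables" (same tables as above):

code:
SELECT m.id, m.genre, m.medium, m.year, m.country
FROM   masterlist m
WHERE  NOT EXISTS (SELECT 1 FROM dvd     d WHERE d.id = m.id)
  AND  NOT EXISTS (SELECT 1 FROM digital g WHERE g.id = m.id)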

wellwhoopdedooo fucked around with this message at 04:31 on Apr 10, 2011

Cock Democracy
Jan 1, 2003

Now that is the finest piece of chilean sea bass I have ever smelled

Bloody Antlers posted:

I'm making a thesaurus application as a self learning project
...
1) Three columns & one table approach:

Table name = allwords.

Column A: ID (primary key, auto increment)
Column B: Word (the word which we want synonyms for)
Column C: Synonym (one synonym for the Word in column B)
...
2) Two columns & many tables approach:

Each Word has its own table, and each table has two columns:

Table name = fight

Column A: ID (primary key, auto increment)
Column B: Synonym

I know this post is a few weeks old now but nobody replied to it. #1 is close and #2 is insane. You don't want to create a new table for each word. You don't say what data type you're planning for "Synonym", but I assume you're thinking varchar, which would work but it's kind of a bad idea. Here's what I would recommend:

Table: words
id: integer, primary, auto increment
word: varchar(255)

Table: word_synonyms
id: integer, primary, auto increment
word_id: integer
synonym_word_id: integer
(create index on word_id and synonym_word_id)
(create unique constraint on word_id, synonym_word_id)

Here's how you would query to see all the words and their synonyms:

SELECT w.id, w.word, GROUP_CONCAT(w2.word)
FROM (words w, word_synonyms ws, words w2)
WHERE w.id = ws.word_id
AND ws.synonym_word_id = w2.id
GROUP BY w.id

This way there's one and only one record per word. You're also recording the word->synonym relationships as integer to integer. You could even add a "status" field to both tables and you have the ability to enable/disable words and word->synonym relationships.

Cock Democracy fucked around with this message at 14:55 on Apr 10, 2011

Whilst farting I
Apr 25, 2006

wellwhoopdedooo posted:

When you do from with multiple tables like that, you get every possible combination of every row in all the tables. So, that WHERE clause is just excluding the rows that have corresponding IDs, which is probably only a small subset, and returning everything else.
<snip>

I had completely forgotten about left joins and what they do. :doh:

That explanation is probably much clearer than you think; messing around with the query didn't turn up much that wasn't already explained. :) Thanks so much! This has been a huge help. I'd have had no idea why the NULL clauses were necessary otherwise, especially.

Primpod
Dec 25, 2007

jamming on crusty white
Something odd I noticed earlier today in a query that's meant to look for a point inside a polygon on a SQL Server 2005 machine. I strongly suspect it doesn't actually do this correctly even when it works, but hey. This statement will crash with a divide-by-zero error, but only if the final WHERE clause is included; otherwise it will return the results. I've swapped out the calls to other tables with some inline selects that represent the data and show the problem.

From the results given without the where clause I don't think there's an issue, so my guess is that the query analyser is choking on it somewhere, either that or I've made a mistake and in the first case it's glossing over it.

My question is: does the query analyser make mistakes like this, or is it my fault? And is there any way to see in greater detail what SQL Server's doing with my query after I run it?

code:
WITH LngLats
     AS (
        --SELECT --List of Longitude and Latitudes for testing
        SELECT 451 as id, -2.9415893554653394 as longitude, 53.39873510548289  as latitude),
     --Brings out Path, with point and next point in path listed. 
     SelectPoly
     AS ( 
			Select '31441614-17EC-46D8-A1AD-9F796E202DB4' as IDArea , 53.3771954465433 as Latitude, -2.9740843392226 as longitude,	4 as sortorder,  0 as o
			UNION SELECT '31441614-17EC-46D8-A1AD-9F796E202DB4',	53.4230475647408,	-2.8697142220351,	3,	1
			UNION SELECT '31441614-17EC-46D8-A1AD-9F796E202DB4',	53.491733130322,	-2.91297288902729,	2,	2
			UNION SELECT '31441614-17EC-46D8-A1AD-9F796E202DB4',	53.5003110153049,	-2.9466185189101,	1,	3
			UNION SELECT '31441614-17EC-46D8-A1AD-9F796E202DB4',	53.4794759985868,	-2.96996446617572,	0,	4
		),
     Paths
     AS (SELECT v1.idarea,
                v1.latitude  v1lat,
                v1.longitude v1lng,
                v2.latitude  v2lat,
                v2.longitude v2lng,
                v1.sortorder v1sortorder
         FROM   selectpoly v1
                INNER JOIN selectpoly v2
                  ON v1.sortorder = ( v2.sortorder - 1 )
                      OR v2.sortorder = 0
                         AND v1.o = 0),
     NumMatches
     AS (SELECT latlng.longitude,
                latlng.latitude,
                ( v1lat + ( latlng.longitude - v1lng ) / ( v2lng - v1lng ) *
                          ( v2lat - v1lat ) )
                numcheck
         FROM   lnglats latlng
                CROSS APPLY (SELECT v2lat,
                                    v1lat,
                                    v2lng,
                                    v1lng
                             FROM   paths) TEMP)
SELECT *
FROM   nummatches
Where nummatches.numcheck = nummatches.numcheck 
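
On seeing what it's doing: SQL Server will show you the plan it compiles, and for what it's worth it makes no promise about the order predicates get evaluated in, so a plan change can surface a divide-by-zero that another plan happens to avoid. Wrapping the query like this (both are standard commands) shows the plans:

code:
SET SHOWPLAN_XML ON;        -- compile only: shows the estimated plan
GO
-- ... the query above ...
GO
SET SHOWPLAN_XML OFF;
GO
SET STATISTICS PROFILE ON;  -- or run it for real with per-operator row counts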

Thel
Apr 28, 2010

OK, another brainteaser: (I'm trimming to just the relevant fields)

Location: {loc_ID, loc_name, loc_capacity, loc_opendate}
Person: {pers_ID, pers_name}
EntryLog: {el_ID, el_loc_ID [FK to Location.loc_ID], el_pers_ID [FK to Person.pers_ID], el_startdate, el_enddate}

el_enddate is NULL for current entries (ones that haven't been ended yet). You can assume that data is well-formed and that a location never goes over capacity. Also, computation time isn't a big issue (the dataset is small, ~50 locations and ~1000 entry logs)

Given this, for all Locations:
if they're full (number of current entries = loc_capacity), return null.
If they're under capacity and have been full in the past: return the most recent exit date that put it under capacity (i.e. how long have they had a vacancy).
If they've never been full, return loc_opendate.

So: Location A has capacity 3, with three people currently in it: return null
Location B has capacity 2, only ever had one person in it: return loc_opendate
Location C has capacity 3:
- Person C1 was in from 01-jan-2011 to 01-feb-2011.
- Person C2 was in from 01-jan-2011 to 01-mar-2011.
- Person C3 was in from 01-jan-2011 to 01-feb-2011.

In this case you'd return '01-feb-2011', since that was the last time they were full (note that exit dates/times can collide - also they enter data down to half-hour precision - i.e. '03-mar-2011 15:30').

I can post up some more sample data if people need it.

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
This is not a SQL question, but a MySQL administrative question. We restarted our server last week (first time in months) and the reverse DNS is being weird:

quote:

# Time: 110412 14:39:01
# User@Host: php_r[php_r] @ www.myspace.com [192.168.1.31]
# Query_time: 1392 Lock_time: 0 Rows_sent: 0 Rows_examined: 1210071
[query goes here]

We have myspace, facebook, etc. redirect to this IP. (Incidentally SA is also on that list, but don't tell anyone :ssh:) So whenever the server first asks the DNS server for info on that IP, the DNS server seems to give it a random entry from the table. I've been told that, in the past, we've restarted Apache until we got the right answer. (Why Apache and not MySQL, I don't know; they could be wrong.) I was wondering:
1) Is this potentially even a problem?
2) How can I convince it to get something more meaningful than Myspace?
3) How can I easily get something like the quoted message without either running a long query, or modifying the ini file to shorten slow query time or to turn on the general query log?
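
On (2), one common way to sidestep the whole mess is to stop mysqld doing reverse DNS at all, so the logs record the client IP instead of whichever hostname the resolver coughs up. The catch is that any GRANTs written against hostnames then have to use IPs instead:

code:
# my.cnf
[mysqld]
skip-name-resolve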

Tad Naff
Jul 8, 2004

I told you you'd be sorry buying an emoticon, but no, you were hung over. Well look at you now. It's not catching on at all!
:backtowork:
I am bad at SQL, but our third-party people are worse. They've made a query with two subqueries in it that takes about a minute to run. I chopped it into three separate queries, and it runs much faster. Question: how can I achieve the following with a single (efficient) query?

Edit: it's to find "related items", i.e. random other items that share a subject with the initial item.

Excuse the PHP, it's a Drupal module:
code:
  $items=Array();
  $result=db_query("
            SELECT DISTINCT subject_id
            FROM item_subject
            WHERE item_id = '$id'
  ");
  $in=array();
  while($sid=db_fetch_object($result)){
    $in[]=$sid->subject_id;
  }
  if($in){
    $in=implode(',',$in);
    $result=db_query("
            SELECT DISTINCT item_id
            FROM item_subject
            WHERE subject_id IN(".$in.")
    ");
    $in=array();
    while($iid=db_fetch_object($result)){
      $in[]=$iid->item_id;
    }
    if($in){
      $in=implode(',',$in);
      $result=db_query("
        SELECT item_id AS ID, Title
        FROM item
        WHERE item_id
        IN (".$in.")
        AND
        item_id != ".$id."
        ORDER BY RAND()
        LIMIT ".$num_items."
      ");
      while ($row = db_fetch_object($result)) {
        $items[$row->ID] = $row->Title;
      }
    }
  }

Tad Naff fucked around with this message at 22:26 on Apr 12, 2011

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
code:
SELECT DISTINCT
    item.item_id AS ID,
    item.Title
FROM
    item_subject
    JOIN item_subject AS item_subject_2
        ON item_subject.subject_id = item_subject_2.subject_id AND
           item_subject_2.item_id <> '$id'
    JOIN item
        ON item_subject_2.item_id = item.item_id
WHERE
    item_subject.item_id = '$id'
ORDER BY
    RAND()
LIMIT $num_items

Tad Naff
Jul 8, 2004

I told you you'd be sorry buying an emoticon, but no, you were hung over. Well look at you now. It's not catching on at all!
:backtowork:

Hammerite posted:

Magic

Thanks man! Wallace and Madeline Chung's collection is more navigable now, thanks to you.

Goat Bastard
Oct 20, 2004

Thel posted:

OK, another brainteaser: (I'm trimming to just the relevant fields)
<snip>

I think this does what you want:

code:
WITH      has_been_full 
          AS (SELECT loc_id
                    ,enddate AS date_under_capacity
              FROM   (SELECT el.loc_id
                            ,el.enddate
                            ,row_number() OVER (PARTITION BY el.loc_id ORDER BY el.enddate DESC) AS most_recent
                      FROM   (SELECT DISTINCT 
                                     el.el_loc_id  AS loc_id
                                    ,el.el_enddate AS enddate
                                    ,(SELECT count(*)
                                      FROM   entrylog 
                                      WHERE  el_loc_id = el.el_loc_id
                                      AND    el_startdate < el.el_enddate
                                      AND    (el_enddate IS NULL OR el_enddate > el.el_enddate)
                                     ) AS count_remaining
                                    ,(SELECT count(*)
                                      FROM   entrylog 
                                      WHERE  el_loc_id = el.el_loc_id
                                      AND    el_startdate < el.el_enddate
                                      AND    (el_enddate IS NULL OR el_enddate >= el.el_enddate)
                                     ) AS previous_count
                              FROM   entrylog el
                              WHERE  el.el_enddate IS NOT NULL
                             ) el
                      JOIN   location loc
                             ON loc.loc_id = el.loc_id
                      WHERE  el.count_remaining < loc.loc_capacity
                      AND    el.previous_count >= loc.loc_capacity
                     )
              WHERE  most_recent = 1
             )
         ,currently_full 
          AS (SELECT   loc.loc_id 
              FROM     location loc
              JOIN     entrylog el
                       ON el.el_loc_id = loc.loc_id
              GROUP BY loc.loc_id
                      ,loc.loc_capacity
              HAVING   count(CASE 
                               WHEN el_startdate < getdate() 
                                    AND (el_enddate IS NULL OR el_enddate > getdate()) 
                               THEN
                                 1
                             END
                            ) >= loc.loc_capacity
             )
SELECT    loc.loc_id
         ,CASE
            WHEN cf.loc_id IS NOT NULL THEN
              NULL
            WHEN hbf.loc_id IS NOT NULL THEN
              hbf.date_under_capacity
            ELSE
              loc.loc_opendate
          END AS date_under_capacity
FROM      location loc
LEFT JOIN currently_full cf
          ON loc.loc_id = cf.loc_id
LEFT JOIN has_been_full hbf
          ON loc.loc_id = hbf.loc_id
I haven't tested it in SQL Server (as I don't have an SQL Server instance I can use), but it runs against Oracle 10g with current_date substituted for getdate(). This is also where you should substitute a bind variable if you want to run the query for a date other than today. Edit: actually you'd have to change the has_been_full query too; an AND el.enddate <= :report_date in the subquery with the row_number() should do it. And come to think of it, that line should be there anyway with getdate() if end dates can be in the future. I don't think anything else I've used is Oracle specific.

I've been looking at your other one too, but I don't have a solution for that yet, sorry, and I don't know how much help I'm going to be able to give you. I've got as far as working out that you need a recursive/hierarchical query to collapse the overlapping appointments into a single appointment block, and the syntax for doing that is very different in Oracle and SQL Server (and almost all of my experience is with Oracle). I'll keep going when I've got time though, because it's an interesting problem.
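
For what it's worth, the overlaps can also be collapsed without recursion, by pairing "island starts" (starts not covered by any other appointment) with the nearest qualifying end. A sketch in 2005-compatible T-SQL, assuming a table appointments(staff_ID, start_datetime, end_datetime) with no NULL end dates:

code:
WITH starts AS (
    -- starts that don't fall inside another appointment for the same staff member
    SELECT DISTINCT a.staff_ID, a.start_datetime AS s
    FROM   appointments a
    WHERE  NOT EXISTS (SELECT 1 FROM appointments b
                       WHERE  b.staff_ID = a.staff_ID
                       AND    b.start_datetime < a.start_datetime
                       AND    b.end_datetime  >= a.start_datetime)
), ends AS (
    -- ends that don't fall inside another appointment for the same staff member
    SELECT DISTINCT a.staff_ID, a.end_datetime AS e
    FROM   appointments a
    WHERE  NOT EXISTS (SELECT 1 FROM appointments b
                       WHERE  b.staff_ID = a.staff_ID
                       AND    b.start_datetime <= a.end_datetime
                       AND    b.end_datetime  >  a.end_datetime)
)
SELECT s.staff_ID,
       s.s AS block_start,
       (SELECT MIN(e.e) FROM ends e
        WHERE  e.staff_ID = s.staff_ID AND e.e >= s.s) AS block_end
FROM   starts s
-- summing DATEDIFF(minute, block_start, block_end) per staff_ID gives actual time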

Goat Bastard fucked around with this message at 22:03 on Apr 13, 2011

LLJKSiLk
Jul 7, 2005

by Athanatos
Current Query
code:
SELECT DISTINCT [dbo].[TABLE_ONE].ProviderID,[dbo].[TABLE_TWO].FacilityID,[dbo].[TABLE_TWO].Name
FROM [dbo].[TABLE_ONE]
INNER JOIN [dbo].[TABLE_TWO] ON [dbo].[TABLE_ONE].ProviderID=[dbo].[TABLE_TWO].ProviderID
WHERE [dbo].[TABLE_ONE].ImportID IN
(SELECT ImportedFileID
FROM [dbo].[ImportedFiles]
WHERE [State]=@State AND [ImportType]='Inp')
AND [TABLE_TWO].[State]=@State
What I would like to do is add 12 columns to the right, which would be basically returns of:
(SELECT COUNT([TABLE_ONE].Field1) WHERE Date BETWEEN @Date1 AND @Date2) AS Month1-12
I want to tie it to the [TABLE_ONE].ProviderID so that I can get counts of that specific ID for certain time periods.

How do I do this? I've tried adding columns, but it returns a count of ALL of the ProviderIDs. I need to tie to my result set.

raej
Sep 25, 2003

"Being drunk is the worst feeling of all. Except for all those other feelings."
I'm using Aginity with Netezza to run a query that will divide up a table into quartiles. The problem I'm having is that NTILE() is not supported. What I'd love to be able to do is utilize the same logic as this statement:

SELECT id, NTILE(4) OVER (ORDER BY id) AS Quartile FROM table;

I've tried replicating the logic of NTILE inline, but I'm getting an error:

SELECT id, (4*(Row_Number()-1)/COUNT(*))+1 OVER (ORDER BY id) AS Quartile FROM table;

Can anyone help me out? How can I replicate NTILE()?
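
Assuming Netezza supports ROW_NUMBER() and a windowed COUNT(*) (both standard OLAP functions), the missing piece in the attempt above is that each window function needs its own OVER clause. Something like this approximates NTILE(4); the remainder handling differs slightly from real NTILE, but the buckets come out near-equal:

code:
SELECT id,
       ((ROW_NUMBER() OVER (ORDER BY id) - 1) * 4
         / COUNT(*) OVER ()) + 1 AS Quartile
FROM   table;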

Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!

LLJKSiLk posted:

Current Query
<snip>
I want to tie it to the [TABLE_ONE].ProviderID so that I can get counts of that specific ID for certain time periods.

This is MS SQL Server 2005/8, right?
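
If it is, conditional aggregation is one way to hang the twelve counts off each ProviderID without twelve correlated subqueries. A sketch, with the date column name guessed from the post:

code:
SELECT t1.ProviderID,
       SUM(CASE WHEN t1.[Date] >= '20110101' AND t1.[Date] < '20110201'
                THEN 1 ELSE 0 END) AS Month1,
       SUM(CASE WHEN t1.[Date] >= '20110201' AND t1.[Date] < '20110301'
                THEN 1 ELSE 0 END) AS Month2
       -- ...and so on through Month12
FROM   [dbo].[TABLE_ONE] t1
GROUP  BY t1.ProviderID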
