|
namol posted:Stupid SQL question here. Are all the columns string data types? I wonder if it's not matching them as non-wildcard because of string->number matching issues or something strange.
|
# ? Mar 22, 2011 21:32 |
|
|
# ? Jun 7, 2024 22:51 |
|
I double-checked this morning and the columns are all varchar data types except for IDNum, which is an integer type. Can WHERE CONTAINS and LIKE be combined for use?
|
# ? Mar 23, 2011 14:56 |
|
I know there's got to be a way to handle this in SQL but gently caress if I know how. My SQL-fu is weak. Hopefully someone can point me in the right direction. We have a customer database. Customers are grouped by districts and each district has their own customer entry along with any number of customers inside of that district. It's all one big ugly table and I hate it. A customer number is 8 digits. The first 5 signify the district, the last 3 the customer within that district. The 000 record is the master record for the district. Ex:

12345000 = District
12345001 = Customer #1
12345023 = Customer #23

Each customer has a value I want to group by. code:
The problem is for 12345 I need to know the exact customers that are the odd balls. The value with the highest count should be returned as customer 12345000, and anyone that has a different value within that district should be included otherwise. I basically want my result set to look like this: code:
Edit: SQL Server 2000, if that makes a difference. Frozen Peach fucked around with this message at 16:55 on Mar 23, 2011 |
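Not a definitive answer, but one way this can be approached on SQL Server 2000 (derived tables and temp tables only, no window functions). Table and column names here are made up -- Customers(CustNum varchar(8), Val varchar(10)):

```sql
-- Count each value inside each district (excluding the 000 master row)
SELECT LEFT(CustNum, 5) AS District, Val, COUNT(*) AS Cnt
INTO #counts
FROM Customers
WHERE RIGHT(CustNum, 3) <> '000'
GROUP BY LEFT(CustNum, 5), Val

-- The "majority" value per district (MIN breaks ties arbitrarily)
SELECT c.District, MIN(c.Val) AS MajorityVal
INTO #majority
FROM #counts c
JOIN (SELECT District, MAX(Cnt) AS MaxCnt
      FROM #counts
      GROUP BY District) m
  ON m.District = c.District AND m.MaxCnt = c.Cnt
GROUP BY c.District

-- Master row gets the majority value; anyone who differs is listed individually
SELECT m.District + '000' AS CustNum, m.MajorityVal AS Val
FROM #majority m
UNION ALL
SELECT c.CustNum, c.Val
FROM Customers c
JOIN #majority m ON LEFT(c.CustNum, 5) = m.District
WHERE RIGHT(c.CustNum, 3) <> '000'
  AND c.Val <> m.MajorityVal
ORDER BY CustNum
```

How ties between two equally common values should break is a judgment call (MIN is arbitrary), which is the "duplicate groups" wrinkle discussed later in the thread.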
# ? Mar 23, 2011 16:46 |
|
Upgrade to SQL 2005/8 and then use analytical functions.
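For a sense of what that buys you, a rough sketch of the previous poster's district problem with 2005+ window functions (table and column names invented):

```sql
-- Rank each (district, value) pair by how often it occurs;
-- rk = 1 rows hold each district's most common value.
SELECT District, Val, Cnt,
       RANK() OVER (PARTITION BY District ORDER BY Cnt DESC) AS rk
FROM (SELECT LEFT(CustNum, 5) AS District, Val, COUNT(*) AS Cnt
      FROM Customers
      WHERE RIGHT(CustNum, 3) <> '000'
      GROUP BY LEFT(CustNum, 5), Val) t
-- Note: RANK() can return ties at rk = 1 when two values are equally common.
```

None of this syntax exists on SQL Server 2000, which is the whole point.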
|
# ? Mar 23, 2011 17:12 |
|
Frozen-Solid posted:I know there's got to be a way to handle this in SQL but gently caress if I know how. My SQL-fu is weak. Hopefully someone can point me in the right direction. My SQL-Fu is strong well strong-ish, but I'm not sure how you wanted duplicate groups handled, so this doesn't do that. I think I've got the test data roughly right, let me know if not... code:
|
# ? Mar 23, 2011 22:34 |
|
Golbez posted:Let's say I have a nightly process that imports millions of rows from a file. We create a table and indexes, then load the file. This process is running so long, it's starting to run into the work day. So I was wondering: Do you think there'd be quantifiable optimization by not defining the indexes before we load the file (thus not having to reindex on every entry), and defining the indexes only after the file is loaded? This is a higher-level response -- we went the way of delaying indexing and doing it on the backside to speed things up. Then our data went up by 10x. Then we rewrote our SLAs to push back data availability to 9am. Then our data went up 10x again. We had to switch to Vertica (which is no walk in the park itself). There are two points I'd impart to you:

- Consider whether you really need to load/store everything, at least into MySQL. In our case it's really a waste, since the customers only ever look at maybe 1% of the data. The reason is that we were ambitious about what we wanted to store, and we decided to store everything, even though our data was human-generated -- so tons and tons of insignificant events. Instead of coming up with the right summary data to extract, we blindly loaded in everything. When you're talking about hundreds of millions of different events, it's just not wise to try to store it all in a relational database. And yet we still try.

- If you're having performance issues with MySQL, I'd say you're already on thin ice in terms of disk space and indexing time. If your data is human-generated and could scale up quickly, be vigilant, I guess.
|
# ? Mar 23, 2011 22:52 |
|
kimbo305 posted:This is a higher-level response -- It's a problem that will be solved within the year because we're moving that database in-house, and in the meantime, I managed to cut the replication time down by six hours, making me a local hero in the office.
|
# ? Mar 23, 2011 23:00 |
|
Molotov Cock Tale posted:My SQL-Fu is strong well strong-ish, but I'm not sure how you wanted duplicate groups handled, so this doesn't do that. I think I've got the test data roughly right, let me know if not... I've only gotten to the first half of it, but so far this looks awesome. It's taking some time to modify the queries to work with my data (there are all sorts of joins and where clauses and magic I'm doing just to get it in the format I posted). I'm not sure what you mean by duplicate groups, but it's early and my brain hasn't kicked in yet. This is a great start though and I should be able to handle the rest. Thank you so much! Every problem I've run into with SQL in the last 2 years comes with "well, if you were on 2005!!!" It's really depressing. We've looked at upgrading to 2005 or 2008, but we're a small business and the cost of upgrading (both in testing the migration and rewriting a ton of stored procedures that likely won't work... and the actual server software itself) is pretty staggering. Edit: finished screwing around with the rest of the query and from what I can tell it's perfect. Send me an e-mail frozen at frozen-solid.net and I'll give you a plat upgrade certificate for your time. Frozen Peach fucked around with this message at 15:08 on Mar 24, 2011 |
# ? Mar 24, 2011 14:31 |
|
Frozen-Solid posted:Edit: finished screwing around with the rest of the query and from what I can tell it's perfect. What I'd meant by duplicate groups was how should it be handled when two values in the same district have the same count. Looking at some test data I made up, it may go a bit wrong: code:
If it's a bad/repeated query to get the data in the initial format, don't forget temp tables. Depends on data volume of course; it could be faster or slower than querying many times. Test, test, test. Dropped you an email - let me know if you need more help, or try stackoverflow.com, where people will fight to the death to get an accepted answer on TSQL.
|
# ? Mar 24, 2011 20:44 |
|
Hi guys, I'm doing a course on relational database theory. I'm stuck on an assignment question and wondered if any of you fine goons could maybe give me some pointers. The question is based around the three relations detailed below. I've cut out the stuff that isn't relevant to this question:

relation Event
  primary key: EventCode
  Date

relation Entertainer
  primary key: EntertainerCode

relation EntertainedBy
  primary key: (EventCode, EntertainerCode)

I have to define a constraint on these relations of the following nature: An entertainer cannot perform at more than one event on the same date. The constraint must be defined using relational operators such as select, join, project, divide etc. I know how to get the relevant data together (project (Event join (Entertainer join EntertainedBy)) over EventCode, Date, EntertainerCode) but can't figure out how to add the constraint. I'm not looking for the answer as I do want to figure this out myself, just some pointers to hopefully get me in the right direction. Thanks!
|
# ? Mar 26, 2011 19:47 |
|
My relational theory-fu is a bit rusty, but wouldn't that just be a unique constraint on EntertainedBy (i.e. no duplicate entries)?
|
# ? Mar 26, 2011 23:06 |
|
Thel posted:My relational theory-fu is a bit rusty, but wouldn't that just be a unique constraint on EntertainedBy (i.e. no duplicate entries)? Isn't the unique constraint enforced by the way the Primary Key on EntertainedBy is defined?
|
# ? Mar 27, 2011 01:35 |
|
Sorry, I'll clarify the question a little: Write a relational expression to express the following constraint: An entertainer cannot perform at more than one event on the same date.
|
# ? Mar 27, 2011 15:21 |
|
I'm making a thesaurus application as a self-learning project, and I'm a little confused about how I should structure my database. I can think of a few ways to do it, but I am interested in knowing which approach is most effective/efficient. I'm working with SQLite. I am assuming the thesaurus database will have ~4,000 unique word entries, with each entry having anywhere from 1 to 100 synonyms associated with it. Since the number of synonyms for each word will be variable, I'm not sure how to best arrange/query the stored data. I can immediately think of two ways to build the database & query synonyms:

1) Three columns & one table approach: Table name = allwords.
Column A: ID (primary key, auto increment)
Column B: Word (the word which we want synonyms for)
Column C: Synonym (one synonym for the Word in column B)
Query: SELECT * FROM "allwords" WHERE word = "fight"
Returns:
1 fight scrap
2 fight altercation
3 fight scuffle
4 fight brawl

2) Two columns & many tables approach: Each Word has its own table, and each table has two columns. Table name = fight.
Column A: ID (primary key, auto increment)
Column B: Synonym
Query: SELECT * FROM "fight"
Returns:
1 scrap
2 altercation
3 scuffle
4 brawl

Which of the two approaches would be best? Am I on the right track with either of them? Any thoughts or comments would be greatly appreciated.
|
# ? Mar 28, 2011 00:08 |
|
Possibly-stupid drowsy-before-bed idea. Are there ever conditions under which word A is synonymous with word B, but word B is not synonymous with A? I'm pretty sure the answer is no. Therefore, you can effectively group all of the related words together into synonym families.

Table Families (
    family integer primary key
)

Table Words (
    word text,
    family integer references Families(family),
    unique(word, family)
)

To get synonyms:

SELECT word, family
FROM Words
WHERE family = (SELECT family FROM Words WHERE word = ?)

Keep in mind that words can be multiple parts of speech and have a different meaning based on that context. This implies that each word can be a member of multiple families, which instantly breaks the query I just wrote. You'd need to add part-of-speech or context or something else to the Words table to differentiate different word meanings to tie them together with the appropriate family of synonyms, then find a way to query that one specific word+pos+context. Given that there could be multiple entries for one word, you could end up returning words from the synonym query that belong to other families, which you could then crawl out to a certain extent to get related-but-not-synonymous words. Anyway, if this seems stupid, pick option #1. Option #2, the creation of multiple tables for specific data, is an anti-pattern and you should forget you ever proposed it. McGlockenshire fucked around with this message at 07:32 on Mar 28, 2011 |
# ? Mar 28, 2011 07:29 |
|
Dumb question here but in sql server 2005 can you store the results of a query temporarily and then perform a search on those results? If there's a better way of doing things, please let me know, I'm a clueless newbie when it comes to sql in general.
|
# ? Mar 31, 2011 16:49 |
|
A subquery perhaps?
|
# ? Apr 1, 2011 01:04 |
|
Depending on what you want to do, and how long you want to retain the information, there's a couple ways. If it's pretty short term (e.g. the lifespan of a stored proc/function) you could use temporary tables, and for longer term you could use a view, maybe.
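For example, in T-SQL (table and column names invented for illustration):

```sql
-- Short-term: a #temp table, scoped to the session / stored proc
SELECT CustomerID, SUM(Total) AS Spend
INTO #spend
FROM Orders
GROUP BY CustomerID

-- ...then search the saved results as many times as you like
SELECT * FROM #spend WHERE Spend > 1000

DROP TABLE #spend

-- Longer-term: a view, which is re-evaluated each time it's queried
CREATE VIEW dbo.CustomerSpend AS
SELECT CustomerID, SUM(Total) AS Spend
FROM Orders
GROUP BY CustomerID
```

On 2005 there's also WITH ... AS (common table expressions) for the single-statement version of the same idea.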
|
# ? Apr 1, 2011 20:31 |
|
namol posted:Dumb question here but in sql server 2005 can you store the results of a query temporarily and then perform a search on those results? If there's a better way of doing things, please let me know, I'm a clueless newbie when it comes to sql in general. Are you sure you need another query? How about you post the problem you're trying to solve?
|
# ? Apr 2, 2011 07:19 |
|
I've read the first few and the last few pages of the thread, but what's a good SQL tutorial? Google turns up w3schools (which I know I shouldn't use) and some other sketchy looking sites.
|
# ? Apr 4, 2011 20:06 |
|
The first couple searches that Google turns up are ok, you mainly just want to practice a lot and google specific things you want to do.
|
# ? Apr 4, 2011 21:05 |
|
Depends on the language too. T-SQL (e.g. Microsoft SQL Server) or PL/SQL (e.g. Oracle) or Postgres/MySQL? (below assumes T-SQL) This looks okay for learning the basics, though I'm sure there are better ones, since it seems like it's just scraped from somewhere else: http://www.sql-tutorial.net/ (just a little MFA) Generally when I do quick searches and want specific info I'll click on a java2s or devguru link 90% of the time.
|
# ? Apr 4, 2011 23:23 |
|
I keep adding "WHERE key=x AND closed=0" to a lot of my mysql queries. But the rows have `closed` set to 1 after they're a month old. So as time goes on, the percent of my table rows with closed=1 will end up as the majority of my table. Is it a good idea to switch my index from a single index (key) to multipart index (closed, key)? From what I've read, it sounds like when doing a multipart index (in mysql), it's ideal to use the least variable column as the first part of the index. So until I'm able to start purging old data, the first 90% of the index will be with closed=1, which I want to ignore, and the last 10% will be the same as I was querying earlier, but without having to worry about extra processing WHERE closed=1 anymore. Is this method more efficient than my old method of using the first index and then filtering the 10% with closed=0? To add an example: code:
If 90% of threads have deleted=1, which index is better?: ALTER TABLE threads ADD INDEX(userId) ALTER TABLE threads ADD INDEX(deleted,userId) Peanut and the Gang fucked around with this message at 03:19 on Apr 6, 2011 |
# ? Apr 6, 2011 03:13 |
|
My understanding is that if you ever need to filter or query on only one of the fields instead of both, then only the first field in the index can actually be used. With that said, I'm not sure of the best way to set up your indexes; I think most people here will say to measure performance a few different ways to determine which is fastest for your environment.
|
# ? Apr 6, 2011 17:45 |
|
(MS SQL Server 2005, will be going to 2008 later this year and can wait until then if needed.) So, I have a table with {staff_ID, start_datetime, end_datetime} tuples (there's a bunch of other stuff, those are the needed fields for this though). There's no restriction either in any of the applications or in the database to make sure times don't overlap, and for one of our reports I need to amalgamate any overlapping appointments in order to get an actual time spent. I.e. a staff member might see one client from 10-12, then another client wanders in at 11 and stays until 1. Sum up the appointment times and it'd come to four hours, but the actual time spent by the staff is 10-1, so three hours. I get the feeling this should be an already-solved problem, but my google-fu is failing me.
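In case it helps, here's a hedged sketch of the classic pre-window-function "islands" technique for merging overlapping intervals. The table name Appointments is an assumption, and so is the choice that back-to-back appointments (one ends exactly when the next starts) count as one block:

```sql
-- A block start is a start time not covered by any other appointment for the
-- same staff member; each is paired with the nearest valid block end.
SELECT s.staff_ID,
       s.start_datetime AS block_start,
       (SELECT MIN(e.end_datetime)
        FROM Appointments e
        WHERE e.staff_ID = s.staff_ID
          AND e.end_datetime >= s.start_datetime
          -- an end time qualifies only if no appointment overlaps it and runs later
          AND NOT EXISTS (SELECT *
                          FROM Appointments x
                          WHERE x.staff_ID = e.staff_ID
                            AND x.start_datetime <= e.end_datetime
                            AND x.end_datetime > e.end_datetime)) AS block_end
FROM Appointments s
WHERE NOT EXISTS (SELECT *
                  FROM Appointments x
                  WHERE x.staff_ID = s.staff_ID
                    AND x.start_datetime < s.start_datetime
                    AND x.end_datetime >= s.start_datetime)
```

For the 10-12 / 11-1 example this returns one block per staff member, 10 to 1; summing DATEDIFF(minute, block_start, block_end) over the result gives actual time spent. It's quadratic-ish, but fine at small row counts.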
|
# ? Apr 7, 2011 23:57 |
|
barbarianbob posted:I keep adding "WHERE key=x AND closed=0" to a lot of my mysql queries. But the rows have `closed` set to 1 after they're a month old. So as time goes on, the percent of my table rows with closed=1 will end up as the majority of my table. Is it a good idea to switch my index from a single index (key) to multipart index (closed, key)? From what I've read, it sounds like when doing a multipart index (in mysql), it's ideal to use the least variable column as the first part of the index. So until I'm able to start purging old data, the first 90% of the index will be with closed=1, which I want to ignore, and the last 10% will be the same as I was querying earlier, but without having to worry about extra processing WHERE closed=1 anymore. Is this method more efficient than my old method of using the first index and then filtering the 10% with closed=0? You're almost always going to be faster indexing on the more unique columns first. It may make the most sense to do: alter table threads add index userdeleted (userid, deleted); That gets you quickly to a userid and then (might) use the optimizer on the deleted column.
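Either way it's cheap to test both orderings empirically (index names and the query shape below are made up):

```sql
-- MySQL: create both candidate indexes, then compare plans
ALTER TABLE threads ADD INDEX user_deleted (userId, deleted);
ALTER TABLE threads ADD INDEX deleted_user (deleted, userId);

-- EXPLAIN shows which key the optimizer picks and its estimated row count
EXPLAIN SELECT threadId
FROM threads
WHERE userId = 42 AND deleted = 0;
```

The one with fewer estimated rows examined for your real query shape wins; drop the loser so writes don't pay for both.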
|
# ? Apr 8, 2011 16:05 |
|
I'm trying to organize media and have a very simple question stumping me. There are 3 tables: MasterList, DVD, Digital. MasterList contains multiple descriptive fields like genre, medium, etc., and one unique (not in terms of data, just in terms of a way to identify it to any observer) string with which to identify it. The other two tables also contain that unique identifier - let's say I have Seinfeld Season 7 on DVD: the identifier "SS7" will be in the DVD table, but also MasterList. Or if I have it in a digital non-disc format like from iTunes, it will be in the Digital table instead. Let's say I want to pull everything from MasterList that's not in the other two tables. How do I do that? I'm using SQLite and every single query I try is returning almost a million rows when it should be returning like a few hundred. Here's what I'm trying: code:
Whilst farting I fucked around with this message at 03:29 on Apr 10, 2011 |
# ? Apr 10, 2011 03:27 |
|
Whilst farting I posted:
When you do from with multiple tables like that, you get every possible combination of every row in all the tables. So, that WHERE clause is just excluding the rows that have corresponding IDs, which is probably only a small subset, and returning everything else. code:
The LEFT part of the join allows us to fail. When a LEFT JOIN fails, it sets everything to NULL for that table in the returned row. So, our query will be all rows from masterlist. Then, we say, both the join table's id fields must be NULL. Since a primary key can't be NULL, it must have failed to join. As it is, the query will return all rows in masterlist that aren't in either table. If you want all the rows in masterlist that are missing from one of the two tables, change that AND to OR. Change the fields you select to * to see what's going on in probably a better way than I'm explaining. wellwhoopdedooo fucked around with this message at 04:31 on Apr 10, 2011 |
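A sketch of the LEFT JOIN approach described above (the id/name column names are guesses, not the actual schema):

```sql
-- SQLite: rows in MasterList with no match in either DVD or Digital.
-- A failed LEFT JOIN leaves the joined table's columns NULL, and a key
-- column can't legitimately be NULL, so NULL means "not present".
SELECT m.*
FROM MasterList m
LEFT JOIN DVD d     ON d.id = m.id
LEFT JOIN Digital g ON g.id = m.id
WHERE d.id IS NULL
  AND g.id IS NULL   -- change AND to OR for "missing from at least one"
```

Contrast with the comma-separated FROM: that builds the full cross product of all three tables first, which is where the near-million-row result came from.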
# ? Apr 10, 2011 04:29 |
|
Bloody Antlers posted:I'm making a thesaurus application as a self learning project

Table: words
id: integer, primary, auto increment
word: varchar(255)

Table: word_synonyms
id: integer, primary, auto increment
word_id: integer
synonym_word_id: integer
(create index on word_id and synonym_word_id)
(create unique constraint on word_id, synonym_word_id)

Here's how you would query to see all the words and their synonyms:

SELECT w.id, w.word, GROUP_CONCAT(w2.word)
FROM (words w, word_synonyms ws, words w2)
WHERE w.id = ws.word_id
AND ws.synonym_word_id = w2.id
GROUP BY w.id

This way there's one and only one record per word. You're also recording the word->synonym relationships as integer to integer. You could even add a "status" field to both tables and you have the ability to enable/disable words and word->synonym relationships. Cock Democracy fucked around with this message at 14:55 on Apr 10, 2011 |
# ? Apr 10, 2011 14:52 |
|
wellwhoopdedooo posted:When you do from with multiple tables like that, you get every possible combination of every row in all the tables. So, that WHERE clause is just excluding the rows that have corresponding IDs, which is probably only a small subset, and returning everything else. I had completely forgotten about left joins and what they do. That explanation is probably much clearer than you think, messing around with the query didn't produce that much more that wasn't already explained. Thanks so much! This has been a huge help. I'd have no idea why the null clauses would be necessary otherwise, especially.
|
# ? Apr 11, 2011 05:16 |
|
Something odd I noticed earlier today in a query that's meant to look for a point inside a polygon on a SQL Server 2005 machine. I strongly suspect it doesn't currently actually do this even if it did work, but hey. This statement will crash with a divide-by-zero error, but only if the final where clause is included; otherwise it will return the results. I've swapped out the calls to other tables with some inline selects that represent the data and show the problem. From the results given without the where clause I don't think there's an issue, so my guess is that the query analyser is choking on it somewhere -- either that or I've made a mistake and in the first case it's glossing over it. My question is: does the query analyser make mistakes like this, or is it my fault? And is there any way to see in greater detail what SQL Server is doing with my query after I run it? code:
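For what it's worth, SQL Server makes no promise about the order in which predicates are evaluated, so a WHERE clause that "filters out" zero denominators doesn't guarantee the division is never attempted once the plan changes. A common defensive pattern (dx/dy are stand-ins for whatever is being divided in the real query):

```sql
-- Guard the division itself instead of relying on the WHERE clause:
SELECT dy / NULLIF(dx, 0) AS slope          -- NULL instead of a crash when dx = 0
FROM SomePoints
-- or, with explicit control over the zero case:
-- SELECT CASE WHEN dx <> 0 THEN dy / dx END AS slope FROM SomePoints
```

As for seeing what the server is doing: in Management Studio you can display the estimated or actual execution plan, or run SET SHOWPLAN_TEXT ON before the statement to get the plan as text.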
|
# ? Apr 11, 2011 17:53 |
|
OK, another brainteaser: (I'm trimming to just the relevant fields) Location: {loc_ID, loc_name, loc_capacity, loc_opendate} Person: {pers_ID, pers_name} EntryLog: {el_ID, el_loc_ID [FK to Location.loc_ID], el_pers_ID [FK to Person.pers_ID], el_startdate, el_enddate} el_enddate is NULL for current entries (ones that haven't been ended yet). You can assume that data is well-formed and that a location never goes over capacity. Also, computation time isn't a big issue (the dataset is small, ~50 locations and ~1000 entry logs) Given this, for all Locations: if they're full (number of current entries = loc_capacity), return null. If they're under capacity and have been full in the past: return the most recent exit date that put it under capacity (i.e. how long have they had a vacancy). If they've never been full, return loc_opendate. So: Location A has capacity 3, with three people currently in it: return null Location B has capacity 2, only ever had one person in it: return loc_opendate Location C has capacity 3: - Person C1 was in from 01-jan-2011 to 01-feb-2011. - Person C2 was in from 01-jan-2011 to 01-mar-2011. - Person C3 was in from 01-jan-2011 to 01-feb-2011. In this case you'd return '01-feb-2011', since that was the last time they were full (note that exit dates/times can collide - also they enter data down to half-hour precision - i.e. '03-mar-2011 15:30'). I can post up some more sample data if people need it.
|
# ? Apr 12, 2011 00:42 |
|
This is not a SQL question, but a MySQL administrative question. We restarted our server last week (first time in months) and the reverse DNS is being weird:

quote:
# Time: 110412 14:39:01

We have myspace, facebook, etc. redirect to this IP. (incidentally SA is also on that list but don't tell anyone ) So whenever the server first asks the DNS server for info on that IP, the DNS server seems to give it a random entry from the table. I've been told that, in the past, we've restarted Apache until we got the right answer. (Why Apache and not MySQL, I don't know... they could be wrong) I was wondering, 1) Is this potentially even a problem? 2) How can I convince it to get something more meaningful than Myspace? 3) How can I easily get something like the quoted message without either running a long query, or modifying the ini file to shorten slow query time or to turn on the general query log?
|
# ? Apr 12, 2011 21:07 |
|
I am bad at SQL, but our third-party people are worse. They've made a query with two subqueries in it that takes about a minute to run. I chopped it into three separate queries, and it runs much faster. Question: how can I achieve the following with a single (efficient) query? Edit: it's to find "related items", i.e. random other items that share a subject with the initial item. Excuse the PHP, it's a Drupal module: code:
Tad Naff fucked around with this message at 22:26 on Apr 12, 2011 |
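One way the three queries might collapse into a single statement, assuming the usual Drupal-era term_node(nid, tid) and node(nid, title, status) tables (:nid stands for the current item's node id; this is an untested sketch, not the module's actual query):

```sql
-- All other published nodes sharing any taxonomy term with node :nid,
-- in random order, capped at 5 (MySQL).
SELECT DISTINCT n.nid, n.title
FROM term_node tn1
JOIN term_node tn2 ON tn2.tid = tn1.tid AND tn2.nid <> tn1.nid
JOIN node n        ON n.nid = tn2.nid AND n.status = 1
WHERE tn1.nid = :nid
ORDER BY RAND()
LIMIT 5
```

Caveat: ORDER BY RAND() sorts the whole candidate set, which is fine for small result sets but is itself a known performance trap on big ones.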
# ? Apr 12, 2011 22:23 |
|
code:
|
# ? Apr 12, 2011 23:27 |
|
Hammerite posted:Magic Thanks man! Wallace and Madeline Chung's collection is more navigable now, thanks to you.
|
# ? Apr 12, 2011 23:51 |
|
Thel posted:OK, another brainteaser: (I'm trimming to just the relevant fields) I think this does what you want: code:
I've been looking at your other one too but I don't have a solution for that yet, sorry, and I don't know how much help I'm going to be able to give you - I've got as far as working out that you need a recursive/hierarchical query to collapse the overlapping appointments into a single appointment block, and the syntax for doing that is very different in Oracle and SQL Server (and almost all of my experience is with Oracle). I'll keep going when I've got time though, because it's an interesting problem. Goat Bastard fucked around with this message at 22:03 on Apr 13, 2011 |
# ? Apr 13, 2011 10:58 |
|
Current Query code:
(SELECT COUNT([TABLE_ONE].Field1) WHERE Date BETWEEN @Date1 AND @Date2) AS Month1-12
I want to tie it to the [TABLE_ONE].ProviderID so that I can get counts of that specific ID for certain time periods. How do I do this? I've tried adding columns, but it returns a count of ALL of the ProviderIDs. I need to tie it to my result set.
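If I'm reading this right, conditional aggregation should do it: group by the provider and turn each date range into its own SUM(CASE ...) column. The @M*Start/@M*End parameter names are invented for the sketch:

```sql
-- One row per provider, one column per month, T-SQL
SELECT t.ProviderID,
       SUM(CASE WHEN t.Date BETWEEN @M1Start AND @M1End THEN 1 ELSE 0 END) AS Month1,
       SUM(CASE WHEN t.Date BETWEEN @M2Start AND @M2End THEN 1 ELSE 0 END) AS Month2
       -- ...repeat for the remaining months...
FROM TABLE_ONE t
GROUP BY t.ProviderID
```

Each CASE contributes 1 only when the row's date falls in that range, so every count is automatically tied to the ProviderID of its group.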
|
# ? Apr 13, 2011 16:38 |
|
I'm using Aginity with Netezza to run a query that will divide up a table into quartiles. The problem I'm having is that NTILE() is not supported. What I'd love to be able to do is utilize the same logic as this statement: SELECT id, NTILE(4) OVER (ORDER BY id) AS Quartile FROM table; I've tried replicating the logic of NTILE inline, but I'm getting an error: SELECT id, (4*(Row_Number()-1)/COUNT(*))+1 OVER (ORDER BY id) AS Quartile FROM table; Can anyone help me out? How can I replicate NTILE()?
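Not sure about Netezza specifics, but two hedged variants of the same idea. The immediate error is that ROW_NUMBER() needs its own OVER clause; the total row count can then come from either a window aggregate (if Netezza accepts COUNT(*) OVER ()) or a cross-joined subquery:

```sql
-- Variant 1: window aggregate for the total (assumes COUNT(*) OVER () works)
SELECT id,
       (4 * (ROW_NUMBER() OVER (ORDER BY id) - 1)) / COUNT(*) OVER () + 1 AS Quartile
FROM table;

-- Variant 2: precompute the total with a cross join
SELECT t.id,
       (4 * (ROW_NUMBER() OVER (ORDER BY t.id) - 1)) / c.cnt + 1 AS Quartile
FROM table t
CROSS JOIN (SELECT COUNT(*) AS cnt FROM table) c;
```

With integer division this yields buckets 1-4 and, like NTILE, puts the leftover rows in the lower-numbered buckets (e.g. 5 rows split 2/1/1/1).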
|
# ? Apr 13, 2011 22:22 |
|
|
|
LLJKSiLk posted:Current Query This is MS SQL Server 2005/8, right?
|
# ? Apr 13, 2011 22:34 |