|
My DBA experience since graduating college has been working as a Prod DBA (2 years) and I haven't really gotten into SQL development yet. Is there a recommended version control add-on for MS-SQL? Preferably free as we're mostly SaaS anyways and I don't need anything that robust, just for personal learning.
|
# ? Jul 1, 2014 22:23 |
|
|
# ? May 30, 2024 10:56 |
|
BabyFur Denny posted:Hi, what tools do you use for mySQL clients? http://www.mysql.com/products/workbench/ ? Have no idea about it's SVN integration but it's what I've busted out when I've had to get deep into MySQL.
|
# ? Jul 1, 2014 23:59 |
Anaxandrides posted:The EXPLAIN explains it all about your other table. Those indexes you're making aren't doing the trick for this thing -- SQL chose to use the PK for mdl_user rather than a specific index that includes those columns, and your other index (newindex1) isn't being used either. 99% bet that it's not being used because the column you're using as a join predicate-cum-WHERE clause isn't included in the index. I tried this including several variations on FORCE INDEX (..) and sub-selects to pre-order the two sort tables but it didn't help. It's shaved off maybe 1 second on the normal query, but other forms are still around 25 seconds. The combined index on one table is about 783 characters which is hosed but I tried to create an index using the first ten characters of the long column but that didn't help either, it wouldn't use the index then. I didn't spend too long trying to fix it (i.e. ORDER BY a substring) because after the initial quick sort I would still need to have a more exact sort. I don't know if doing a pre-sort sort (based on the first couple of letters of a column) would speed it up at all, for 650k rows... Here's the new explain extended: code:
(ep_idshosta_ix is id, shortname, startdate) (ep_idlasfir_ix is id, lastname, firstname) code:
e: Just noticed that the last table references a different column than the index I've forced. Yet if I remove the FORCE INDEX for the 'userid' ref table, it uses the PRIMARY key and the query takes 26seconds instead of 20. Ugh e2: If it helps clarify, mdl_course is about 443 rows, mdl_user is about 41923. If I order by: ORDER BY LEFT(d.shortname, 4),d.startdate,a.lastname,a.firstname The time drops down to 13 odd seconds, but if I do two passes: LEFT(d.shortname, 1),d.shortname it goes back up to 20 seconds. Sulla Faex fucked around with this message at 10:42 on Jul 2, 2014 |
|
# ? Jul 2, 2014 08:59 |
|
Sulla-Marius 88 posted:
If you want to use a left join, do this: code:
|
# ? Jul 2, 2014 15:16 |
|
kloa posted:My DBA experience since graduating college has been working as a Prod DBA (2 years) and I haven't really gotten into SQL development yet. Is there a recommended version control add-on for MS-SQL? Preferably free as we're mostly SaaS anyways and I don't need anything that robust, just for personal learning. The official answer is that you should use Microsoft Team Foundation Server's database plugin, but that would require a TFS licence. The next option would be to use middleware from Red Gate, which is pricey but less than TFS. The third option is to automate with scripts, which I've never tried but is supposedly possible.
|
# ? Jul 2, 2014 16:19 |
|
Is there a recommended dummies book for understanding the basics of SQL? I just started a new job in hospital clinical informatics. I doubt I'll ever write to a database but I'm sure it would be handy if I knew how to read and pull reports from them. I'm trying to teach myself VBA and Python to that end.
|
# ? Jul 3, 2014 03:54 |
Jethro posted:Are you left joining course completions (i.e. you want all courses and the users enrolled in them, regardless of whether or not the user completed the course), or are you inner joining it? Putting a.id = cc.userid means this will behave as an inner join despite the left outer join above. That is a fair point, thank you. I think it should be an INNER JOIN but I'm not 100% sure, I'll have to find out at what point it was changed from the original code and check who did it and if possible ask them. Sulla Faex fucked around with this message at 11:23 on Jul 3, 2014 |
|
# ? Jul 3, 2014 11:20 |
|
I prefer this style of syntax so that the type of joins you are doing becomes clearer:code:
I know the 'a,b,c,d' style of aliasing tables exists but I'm not a fan. It leads to more confusion about which table is what. Great example is that Course Completion is aliased to 'cc' but Course itself is aliased to 'd' while User Enrollments is aliased to 'c' instead. FieryBalrog fucked around with this message at 15:21 on Jul 4, 2014 |
# ? Jul 4, 2014 15:14 |
Yeah the trick is deciding what to do. I'm managing/updating legacy code but what they really want to do is just get it working on a new version of the platform. I wouldn't have touched that query except that it was literally taking twenty full seconds to run. It still is, but it's not part of the work description so I'm leaving it. It's always tough figuring out a balance for time spent vs income vs clarity vs not modifying code that somebody else might be more familiar with. I don't know who'll be maintaining this in future, there's no indication it will be me, so I don't want to touch/clean up too much in case the people who normally maintain it are going to get cranky. Suffice to say the rest of it is pretty ugly but the thing that concerns me most is the functionality of it. If the job/project description included an unspoken "You will be maintaining this code in future, please do whatever is required to ensure long-term comfort" then I'd start moving stuff around, but as it is I'm going with a light touch. The touch is so bloody light that even the twenty second query is going to be left as-is because nobody else has mentioned it so I'm assuming they'd care more if it broke (or returned different data) than about trying to speed it up. Just makes me want to work harder on my own projects and business ideas so I don't have to spend the rest of my life figuring out the balance between the right thing and the cost effective thing on code so ugly you wonder how it could ever have passed review.
|
|
# ? Jul 4, 2014 15:48 |
|
I need to grab a bunch of entries, which method is faster? a. Grab entire list A, Grab entire list B, filter out on my end b. Grab entire list A, Grab what I need from list B by doing a query on each entry of list A I'm trying to run b right now, and it's taking forever. I'm basically doing something like this: code:
|
# ? Jul 7, 2014 17:53 |
|
the posted:I need to grab a bunch of entries, which method is faster? Smells like a join to me
|
# ? Jul 7, 2014 17:59 |
|
Malcolm XML posted:Smells like a join to me Listen to this guy right here.
|
# ? Jul 7, 2014 18:16 |
|
Malcolm XML posted:Smells like a join to me Go on....
|
# ? Jul 7, 2014 18:21 |
|
the posted:Go on.... It's tough without knowing a little bit more about the schema but here's a sketch: code:
|
# ? Jul 7, 2014 18:25 |
|
You can view my history, but I'm doing queries in Salesforce using Python. The SQL queries are basically no different, but they're embedded in Python code. I'm grabbing one list with the following: Python code:
Python code:
|
# ? Jul 7, 2014 18:27 |
|
the posted:You can view my history, but I'm doing queries in Salesforce using Python. The SQL queries are basically no different, but they're embedded in Python code. I'd see if this works, because 20k individual queries is going to suck. Python code:
|
# ? Jul 7, 2014 18:37 |
|
I have this exact line: opp_query = "SELECT O.AccountId, O.Id, O.StageName, R.ContactId FROM Opportunity AS O JOIN OpportunityContactRole AS R ON R.OpportunityId = O.Id WHERE RecordType.Name = \'FD - Buying\' AND R.IsPrimary = TRUE" And I'm getting an error: code:
edit: If I'm supposed to put it on multiple lines, I'm not sure how I can format that in Python. the fucked around with this message at 18:59 on Jul 7, 2014 |
# ? Jul 7, 2014 18:55 |
|
the posted:I have this exact line: It looks like SOQL doesn't do explicit joins and that makes me a sad. https://developer.salesforce.com/page/From_SQL_to_SOQL Apparently, it does some subquery like syntax to make it all work in the example.
|
# ? Jul 7, 2014 19:04 |
|
fletcher would know how to do this. According to that page, I can link by simply going Blah.blah, but I'm getting errors when doing it. For example, every Opportunity has an OpportunityContactRole associated with it, but I get errors when trying to reference that: opp_query = "SELECT AccountId, Id, Stagename, OpportunityContactRole.ContactId FROM Opportunity WHERE RecordType.Name = \'FD - Buying\'" Error: SoapFaultError: 'INVALID_FIELD' "INVALID_FIELD: SELECT AccountId, Id, Stagename, OpportunityContactRole.ContactId ^ ERROR at Row:1:Column:34 Didn't understand relationship 'OpportunityContactRole' in field path. If you are attempting to use a custom relationship, be sure to append the '__r' after the custom relationship name. Please reference your WSDL or the describe call for the appropriate names."
|
# ? Jul 7, 2014 19:34 |
|
the posted:fletcher would know how to do this. Alternatively you can just do a hash join client side on the filtered tuples. Make a dictionary keyed by the Id and just lookup manually
|
# ? Jul 7, 2014 20:24 |
the posted:You can view my history, but I'm doing queries in Salesforce using Python. The SQL queries are basically no different, but they're embedded in Python code. Use a subquery (assuming you can perfrom 'some stuff' where additional SOQL) code:
Python code:
the posted:fletcher would know how to do this. Maybe flip it around and query for Opportunity fields through OpportunityContactRole: Python code:
|
|
# ? Jul 8, 2014 16:23 |
|
I'm pretty much a beginner with SQL and will need to be pulling some extracts from our HR system for yearly audits soon. Last year, I took full extracts and did what I needed to do in Excel. I'd like to build this all out in SQL Server this year so they can be run on demand. I'm looking less for the exact syntax as an answer than I am looking for the concepts that I either don't know or I'm not thinking to utilize for this issue. I need to select all the terminated records for the calender year, but exclude people who have at least one active job code. Some people have multiple jobs, usually with matching employee id's but sometimes not. I was thinking about attempting to join the table to itself and counting the instances of (first,last,last4SS#) but I'm just not sure how to do that and check for an active record within the query. Maybe using if statements would work, but I've really just begun the process. I'm still learning and would rather learn the correct way to do this than come to some convoluted solution.
|
# ? Jul 11, 2014 14:11 |
|
PapFinn posted:I'm pretty much a beginner with SQL...
|
# ? Jul 11, 2014 14:29 |
|
I haven't used Except, so I'll start reading up on it now. Thanks. Out of curiosity, what are you getting at with the comment about SSN? There are certainly duplicate last 4 of SSN, but I don't think I've run into a duplicate "First,Last,Last4SSN". I'm only working with ~80,000 records though. Is that bad practice? Edit: Maybe a better explanation of what I'm doing. With the data below, I want to return only User Two. User One does have a terminated record, but the second record for User One would not qualify him as a terminated user. code:
PapFinn fucked around with this message at 15:28 on Jul 11, 2014 |
# ? Jul 11, 2014 15:07 |
|
A key point of SQL records is that there should be a completely reliable way of tying an individual record to the object it represents (in this case, an employee). Your 'first,last,last4ssn' is only good enough up until the point when it isn't good enough. In this case, that's not your fault, but it could turn out to be your problem in the future. That said, code:
One Swell Foop fucked around with this message at 15:52 on Jul 11, 2014 |
# ? Jul 11, 2014 15:50 |
|
Hello SQL thread, I have a PostgreSQL question: In my project I am tracking the changes of a property x on a number of different targets. I have a table called 'property_x' like this: pre:id: serial primary key target: char[8] value: double update_time: timestamp I don't care about the cases where the maxima is a plateau (just return any point on the plateau). Now doing this in my application is easy enough (for each point assume it's a maxima unless we have a next point and that point is bigger), but this takes ages since the whole table need to be transmitted from the db to the application. What's the proper way to do this in PgSQL?
|
# ? Jul 11, 2014 23:05 |
|
Something like thisSQL code:
Edit: attempted fix SQL code:
Sedro fucked around with this message at 23:48 on Jul 11, 2014 |
# ? Jul 11, 2014 23:39 |
|
Sedro posted:... Thank you very much
|
# ? Jul 11, 2014 23:56 |
|
Need some quick logic help on this SQL query I'm refactoring. I have a query that searches the database for several parameters, but there are 2 logic checks (WHERE) that I can't seem to write concisely. My query: code:
Legend: rows marked as legend have the value 1 in the Legend column, everything else has 0 in the legend column Final_ed: Everything not considered "final_ed" is 1, rows that are "final_ed" have value of 0 in final_ed column How can I modify my query so I can get results in these kind of scenarios: 1. Get me all results considered "legend" and "final_ed" 2. Get me all considered "not legend" and not "final_ed" 3. Get me all considered "not legend" and ARE "final_ed" Using (stupidly) less-than-equal-to operands does not give me the result I want.
|
# ? Jul 14, 2014 00:45 |
|
code:
|
# ? Jul 14, 2014 16:04 |
|
Jethro posted:
So there's no way to do this query without making 3 separate ones?
|
# ? Jul 14, 2014 19:19 |
|
where legend*final_ed = 0
|
# ? Jul 14, 2014 19:28 |
|
Fluue posted:So there's no way to do this query without making 3 separate ones? If you want all three scenarios at once, then you'd do: code:
code:
|
# ? Jul 14, 2014 19:28 |
|
Jethro posted:I think I mostly just didn't understand what you were asking. Sorry for not being clear. I was a little vague on this, but the queries that do not say that final_ed must be '0' should also include rows that have final_ed as '1' (By this I mean, if we don't want only rows with the 0 attribute in final_ed, then we want both 0 and 1) Since I can't do a wildcard on this (without using a LIKE, which leads to even bigger issues when dealing with INT) am I left dynamically building this query in my script before sending it to the DB? e: it'd probably be easier to split legend off into another table and just do a join, right? Fluue fucked around with this message at 01:13 on Jul 15, 2014 |
# ? Jul 15, 2014 00:52 |
|
Fluue posted:Sorry for not being clear. I was a little vague on this, but the queries that do not say that final_ed must be '0' should also include rows that have final_ed as '1' (By this I mean, if we don't want only rows with the 0 attribute in final_ed, then we want both 0 and 1) I think I know what you're after, but can't type a response on my phone. What SQL engine are you using?
|
# ? Jul 15, 2014 01:44 |
|
Is an 'or' 100% avoidable in a where clause? Should it be used or avoided? I have a criteria to pull records where: - they are in state x,y,z - they are active - they are inactive and updated within 2 years ago my first instinct is to union them: code:
code:
Roundboy fucked around with this message at 02:04 on Jul 15, 2014 |
# ? Jul 15, 2014 01:58 |
|
An 'or' is fine and is perfectly at home in a where clause. Generally, don't try and second-guess the query optimizers in a modern database system. Your second example is optimal. Is this query running on a remote server or locally? You may find that data volume and transfer rates are a bigger bottleneck than the server performance; however if you do find that you need the server to perform better then you should talk to the DBA about your query, if one is available. One Swell Foop fucked around with this message at 03:35 on Jul 15, 2014 |
# ? Jul 15, 2014 03:32 |
|
So the trouble here is that you need to start by writing concisely what you want in non-contradictory, but clear English. If you have to use examples to explain your meaning, then perhaps you don't understand what it is you want, though there are cases where examples help clarify meaning. If you can describe what it is you want, then it may be possible to derive a query from it. For starters, you're assuming that we know the meaning of your columns, and that's neither correct nor essential. A statement such as this, therefore, is rather non-sensical and confusing: "[T]he queries that do not say that final_ed must be '0' (By this I mean, if we don't want only rows with the 0 attribute in final_ed, then we want both 0 and 1)"... I'm still trying to understand what that even means. In particular, a WHERE clause adds a restriction to a query; it is not generally its purpose to remove a restriction. If you mean that you want all rows, independent of the value of final_ed, then it should not be restricted in your WHERE clause. If you mean, "only when final_ed is 0 or 1", then you want "WHERE (final_ed=0 OR final_ed=1)", which would seem strange since those are apparently the only two values anyway. Please review what you actually need. Dispense with all statements of "rows that 'ARE' property-X" and please describe them specifically: final_ed=0, final_ed!=0, final_ed="watermelons", legend=7, "legend is an even number", and so forth. I think if you do this, you will find that the query starts to appear automatically. Edit: It's doesn't have to be English, but you'll likely keep getting answers that are 'wrong' until such time as you can explain what it is you even want. What is the expected behavior? Include some sample rows and which would be returned (and not returned) by your query. PhantomOfTheCopier fucked around with this message at 03:56 on Jul 15, 2014 |
# ? Jul 15, 2014 03:50 |
|
Can someone help explain why this query might run like poo poo but changing to select * runs fast edit. missing index was the culprit Withnail fucked around with this message at 21:03 on Jul 15, 2014 |
# ? Jul 15, 2014 18:25 |
|
|
# ? May 30, 2024 10:56 |
|
How important is it to have a CreatedDate & ModifiedDate columns on tables? I've got a guy here telling me that EVERY table, without exception should have those 2 columns. We're adding a couple tables and right now I don't care about CreatedDate/ModifiedDate on a few of the tables and I can't ever see it being needed on those tables. His argument is that what IF at some point in the future I want it and even if I don't, it isn't going to affect anything negatively. I can see his point but I'd rather not add columns if I don't truly need them.
|
# ? Jul 16, 2014 17:34 |