nielsm
Jun 1, 2009



I can't think straight today so help me out. T-SQL.

I have three tables: Cases, CasesResumed, CasesClosed.
Cases has a CaseID, an OpenDate, and a nullable CloseDate. The OpenDate is always filled; the CloseDate is filled if the case is currently closed and null if it is open.
CasesResumed has a CaseID and a ResumeDate, indicating the case was re-opened on that date.
CasesClosed has a CaseID and a CloseDate, indicating the case was closed on that date.

I need to get historical data for the total number of open and closed cases as of a series of month starts.
My current strategy is to select all cases with an OpenDate on or before each month start. For each of these cases, find the last resume date and last close date on or before the month start, then use the open, resume, and close dates to determine the status of the case on that month start date.
With that, count the number of open and closed cases at each month start.

The major roadblock I'm having is getting the last resume/close date for each case and coalescing that with the case open date, to determine the case status. Any hints?


nielsm
Jun 1, 2009



Managed to build a solution. I'm not sure if this is the best approach but at least it works.

SQL code:
DECLARE @CutoffDate AS DATE;
SET @CutoffDate = '2018-01-01';

DECLARE @Stats TABLE (
	[Date] DATE NOT NULL,
	[Closed] INT NOT NULL,
	[Open] INT NOT NULL
);

WHILE @CutoffDate < '2019-01-01'
BEGIN
	INSERT INTO @Stats
	SELECT @CutoffDate, SUM([Closed]), SUM([Open])
	FROM (
		SELECT
			c.CaseID,
			-- Closed if the latest close is more recent than the latest resume (or the open date)
			CASE WHEN MAX(cc.CloseDate) > COALESCE(MAX(cr.ResumeDate), c.OpenDate) THEN 1 ELSE 0 END [Closed],
			CASE WHEN MAX(cc.CloseDate) > COALESCE(MAX(cr.ResumeDate), c.OpenDate) THEN 0 ELSE 1 END [Open]
		FROM Cases c
		-- Date filters live in the ON clauses so a case whose only close/resume events
		-- fall after the cutoff still counts, instead of being dropped by the WHERE
		LEFT JOIN CasesClosed cc ON c.CaseID = cc.CaseID AND cc.CloseDate < @CutoffDate
		LEFT JOIN CasesResumed cr ON c.CaseID = cr.CaseID AND cr.ResumeDate < @CutoffDate
		WHERE c.OpenDate < @CutoffDate
		GROUP BY c.CaseID
		) a;

	SET @CutoffDate = DATEADD(month, 1, @CutoffDate);
END

SELECT [Date], [Closed], [Open] FROM @Stats;

nielsm
Jun 1, 2009



A SQL script that drops the existing procedure and creates the new one? Or one of those migration systems often bundled with ORMs?

nielsm
Jun 1, 2009



Mycroft Holmes posted:

new question. I've got three problems i can't get to work.
3. Write a query to display empno column and ename column data from emp table. Display all employees who work in the same deptno with any employee whose ename contains at least one T.
5. Create a query to display empno column, ename column, sal column, and the dept’s average sal data from emp table. Display all employees who have earned more than the same deptno’s average sal.
8. Create a query to display empno column, ename column, and sal column data from emp table. Display all employees whose ename contain at least one T and have earned less than the average sal of the same job.

I can't seem to get the first problem to work
Select empno, ename
from emp
where deptno = (select deptno
from emp
where ename = '%T%');

this doesn't work.

I can't seem to figure out how to do the dept avg salary instead of the avg of all salaries.

Can't do 8 without 3 and 5

Keep in mind you need to use the LIKE operator to do wildcard matches; the = operator does a straight equality comparison.
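For example, a minimal sketch of problem 3 with LIKE in the subquery (assuming the classic emp table). Note the subquery can return several departments, so IN is safer than =:

SQL code:
-- Employees in any department that has at least one employee whose name contains a T
SELECT empno, ename
FROM emp
WHERE deptno IN (SELECT deptno
                 FROM emp
                 WHERE ename LIKE '%T%');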

nielsm
Jun 1, 2009



Nth Doctor posted:

I mean, lots of people fail to understand normalization and how to apply it in an RDBMS. That isn't newsworthy.

I once got in a protracted discussion with a friend of a friend who was convinced that the way to maintain last read links in forums is to have a comma delimited list of post ids in the user table.

Even after I drew the diagrams out, and pointed out the maximum lengths that could fit in a column.

:downs:: well I guess the forum should stop keeping track at that point

Some people don't understand that computers are actually very good at keeping track of thousands or millions of very similar-looking data rows. They can't imagine doing it themselves, or doing it in an Excel sheet, so they assume it can't possibly be a good idea to do it in an RDBMS either.
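For the forum example, the normalized version is just one row per user per thread; a minimal sketch with invented names:

SQL code:
-- One row per (user, thread), instead of a delimited list in the user table
CREATE TABLE UserThreadLastRead (
    UserID INT NOT NULL,
    ThreadID INT NOT NULL,
    LastReadPostID INT NOT NULL,
    PRIMARY KEY (UserID, ThreadID)
);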

nielsm
Jun 1, 2009



SQL code:
select CONCAT('a', 'b')
pre:
Msg 195, Level 15, State 10, Line 1
'CONCAT' is not a recognized built-in function name.
WTF?
It even autocompletes CONCAT in the SSMS editor.
MS SQL Server 10.50.6000.

Edit: Apparently it was only added in SQL Server 2012 and this is 2008 R2.
I thought CONCAT was supposed to be ancient ANSI SQL.

nielsm
Jun 1, 2009



Agrikk posted:

All my tables are set to latin1_swedish_ci so for whatever reason @externalstring is being recognized as cp850_general_ci during the updates. I tried setting the table to be cp850 but either I did it wrong (likely) or this wasn't the problem.

Help, please?

Try the CONVERT(a USING b) function to convert your input string to the database's encoding.
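Something along these lines, as a sketch (the table and column names here are placeholders; @externalstring is your input variable, and latin1 is assumed from your latin1_swedish_ci tables):

SQL code:
-- Convert the incoming string to the table's character set before using it
UPDATE my_table
SET some_column = CONVERT(@externalstring USING latin1)
WHERE id = @some_id;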

nielsm
Jun 1, 2009



If you can modify the schema and get everyone to agree on it, handle the addresses as 32-bit integers: just store the first and last address of each range, and do a simple BETWEEN check.
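A rough sketch of what that could look like (names invented; BIGINT used so the full unsigned 32-bit range fits):

SQL code:
CREATE TABLE NetworkRanges (
    RangeStart BIGINT NOT NULL,        -- first address of the range, as an integer
    RangeEnd   BIGINT NOT NULL,        -- last address of the range, as an integer
    Label      NVARCHAR(100) NOT NULL
);

-- @address is the IP to look up, already converted to its integer form
SELECT Label
FROM NetworkRanges
WHERE @address BETWEEN RangeStart AND RangeEnd;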

nielsm
Jun 1, 2009



PhantomOfTheCopier posted:

Next thought, the issue is there speed will suck because the join condition is "inside out". IE, you're trying to find a value between two calculated values, and that'll be slow, maybe even O(mn) depending on the database. So, flip the problem around. What if you wrote a big florping case statement that checked all possible netmasks for a given ip? Then you could do a direct equality match with B.

This is not too bad an idea, if you're stuck with strings:
On the client performing the query, generate all possible netmasks the IP could match; this should be limited to around 25 possibilities. Then query for an exact string match for each of those.

E.g. if your address to search for is 15.56.204.85, then the possible netmasks would be:
15.56.204.85/32, 15.56.204.84/31, 15.56.204.84/30, 15.56.204.80/29, 15.56.204.80/28, 15.56.204.64/27, ... 15.56.204.0/22, 15.56.200.0/21, 15.56.192.0/20, ..., 15.0.0.0/8, 14.0.0.0/7
If any of those exist in the database, you have a match.
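The query itself then becomes a plain IN list over the precomputed candidates, roughly (table and column names are made up):

SQL code:
SELECT *
FROM Networks
WHERE Cidr IN ('15.56.204.85/32', '15.56.204.84/31', '15.56.204.84/30',
               '15.56.204.80/29', /* ... */ '15.0.0.0/8', '14.0.0.0/7');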

nielsm
Jun 1, 2009



Agrikk posted:

Yeah. I’m loading 2200 CSV files [each with 1.5 million rows of data consisting of five columns with a total character width of no more than 86] into a single table with an auto-incrementing rowID and an index on three of the five columns.

I recognize that I could drop the index and recreate it after, but that won’t help me in the long run since I’ll be adding data to this table every night and the table has to be useable as fresh data is added.

Just to get the obvious questions out of the way: You are doing the inserts inside a transaction, right? I'm pretty sure batching them into one transaction means the engine only has to flush the log at commit, instead of after every single insert.
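As a sketch, wrapping each file's load in a single transaction (the actual loading statement depends on your setup; BULK INSERT here is just an example, MySQL would use LOAD DATA INFILE and START TRANSACTION instead):

SQL code:
BEGIN TRANSACTION;

-- load one CSV file's worth of rows here, e.g.:
-- BULK INSERT dbo.BigTable FROM 'C:\data\file0001.csv' WITH (FIELDTERMINATOR = ',', FIRSTROW = 2);

COMMIT TRANSACTION;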

nielsm
Jun 1, 2009



Agrikk posted:

How do I add columns dynamically to a table based on incoming data?

I'm envisioning a table that has columns [date][serviceAspend][serviceBspend]...[serviceXspend]

and every week I get a report that contains a date column and some combination of serviceSpend columns that I won't know ahead of time.

How can I build a script that loads the data from an incoming file and if a new service comes in on a report that hasn't been "seen" before, the new service gets added as a new column to the existing table? and data loaded?

Normalize that poo poo.

code:
Reports:
  ReportID (PK)
  ReportDate
  ReportReceivedTimestamp

ReportableServices:
  ServiceID (PK)
  ServiceName

ReportedSpends:
  ReportID (FK to Reports)
  ServiceID (FK to ReportableServices)
  SpendAmount
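Loading then just inserts one row per service seen in the file (creating the ReportableServices row first if it's a new service), and the wide view comes back out with a plain join, roughly:

SQL code:
SELECT r.ReportDate, s.ServiceName, d.SpendAmount
FROM Reports r
JOIN ReportedSpends d ON d.ReportID = r.ReportID
JOIN ReportableServices s ON s.ServiceID = d.ServiceID
ORDER BY r.ReportDate, s.ServiceName;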

nielsm
Jun 1, 2009



Paul MaudDib posted:

I did stack machines and regular languages and all that stuff back in college, but we never really got into some of the more esoteric features of regexes. I’m under the impression that some of those actually take regexes past the power of a regular language into context sensitive languages, if I’m remembering the terms right.

Yes, basically any look-ahead or look-behind assertion features that don't consume input are outside regular-language power, at least as a rule of thumb. Some regex implementations may degrade from an FSM implementation with O(n) time to match, where n is the length of the haystack, to a much slower backtracking algorithm that can be quadratic in the length of the input plus the complexity of the expression.
All of this is from memory, might have some details wrong.

nielsm
Jun 1, 2009



I'm getting a T-SQL task dumped on me that's (probably) too complex/weird to be solved by a straight sequence of insert/update/delete statements, and I have almost zero knowledge of the procedural language.
Can anyone recommend some books/other resources for learning procedural T-SQL?

nielsm
Jun 1, 2009



Yeah that's totally a pivot. Either do the straight group by query and get your four columns (f1, f2, year_month, count) and format the data at the client side, or do the same query and use the database's pivot features on that year_month column. Maybe have a dimension table of date ranges and the year-month label to simplify the join condition.
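A hedged, T-SQL-flavored sketch of the conditional-aggregation version, assuming a base query or view that already yields (f1, f2, year_month) per row:

SQL code:
SELECT f1, f2,
       COUNT(CASE WHEN year_month = '2020-01' THEN 1 END) AS [2020-01],
       COUNT(CASE WHEN year_month = '2020-02' THEN 1 END) AS [2020-02]
       -- one column per month you want in the report
FROM MyBaseQuery
GROUP BY f1, f2;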

nielsm
Jun 1, 2009



Use a TABLESAMPLE query, but repeat it until it returns a non-empty result.
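Roughly like this, as a T-SQL sketch (table and column names are placeholders; the loop guards against the sample landing only on empty pages):

SQL code:
DECLARE @pick TABLE (ID INT, Payload NVARCHAR(100));

WHILE NOT EXISTS (SELECT 1 FROM @pick)
BEGIN
    INSERT INTO @pick
    SELECT TOP (1) ID, Payload
    FROM dbo.BigTable TABLESAMPLE (1 PERCENT);
END

SELECT ID, Payload FROM @pick;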

nielsm
Jun 1, 2009



Check the collation set for the database and tables. Perhaps also the default collation for the connection? I don't remember if that's a thing.
Collations with a _ci suffix are case insensitive; the _cs and _bin suffixes should be case sensitive.


Edit: beaten and lol holy poo poo mysql how insane can you be

nielsm
Jun 1, 2009



I have a large-ish table (8 million records) in SQL Server and I need to do a simple update via a translation table.

I currently have:
SQL code:
declare @mapping as table (A nvarchar(50), B nvarchar(50), C nvarchar(50));
insert into @mapping values
('abc', '123', 'xxx'),
('def', '234', 'xxx'),
('ghi', '987', 'yyy');
-- about 12 rows in @mapping

create index LargeTable_FieldA on LargeTable (FieldA);

update t set FieldA = m.B, FieldB = m.C, FieldX = m.A
from LargeTable t join @mapping m on t.FieldA = m.A;
The execution plan I get scans @mapping and, for each row, finds all rows in LargeTable that match, then updates those rows. The index helps this, but it still means seeking around LargeTable a whole bunch, and creating the index just for this update takes a good while.

If I were implementing this against a flat file store in a C program, I'd scan LargeTable, look up the matching mapping row, then update. That only performs a single scan of the large table and doesn't really need an index at all. Is there any way to force SQL Server to do the update that way instead?

I really only need to run this update once ever, except that the process I'm working on has a bunch of development cycles, so an extra 40 minutes of execution time isn't that much of an issue in the big picture. Mostly asking for curiosity's sake.

nielsm
Jun 1, 2009



Thanks, I'll try out those methods.

Right now I get a Nested Loops (Inner Join) using Hash Match (Aggregate), with a table scan of the mapping table at top and an index seek on the nonclustered index of the large table at the bottom. This has an actual cost of 48% on the index seek, 19% on the nested loops, 31% on the hash match, and 1% on the final clustered index update.

nielsm
Jun 1, 2009



Okay I experimented a bit, two things made a difference:

First, using a tempdb table ##mapping instead of a table variable @mapping.
Second, creating a clustered index on ##mapping.A allowed the execution plan to prefer a nested loops join with a table scan of LargeTable at the top, doing a stream aggregate match on the mapping table.

I didn't need any index hints or to change the join type on the update statement after those two changes.

The actual cost moved to 85% table update, 9% table scan on LargeTable, 6% index seek on ##mapping, and zero on the join. Execution time improved by a factor of 16, from about 12 minutes down to 45 seconds with a LargeTable dataset of 800k rows.
Edit: And yes, those numbers are much closer to what I had expected the costs for the operation to be.
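For reference, the revised setup as a sketch (same column names as before):

SQL code:
CREATE TABLE ##mapping (A NVARCHAR(50), B NVARCHAR(50), C NVARCHAR(50));
INSERT INTO ##mapping VALUES
('abc', '123', 'xxx'),
('def', '234', 'xxx'),
('ghi', '987', 'yyy');

-- A clustered index on the join column lets the optimizer scan LargeTable once
-- and seek into the small mapping table for each row
CREATE CLUSTERED INDEX IX_mapping_A ON ##mapping (A);

UPDATE t SET FieldA = m.B, FieldB = m.C, FieldX = m.A
FROM LargeTable t JOIN ##mapping m ON t.FieldA = m.A;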

nielsm fucked around with this message at 14:22 on Apr 1, 2020

nielsm
Jun 1, 2009



Agrikk posted:

Another question:

Given a table like so:
code:
TransactionID  Item1  Item1Count  Item2  Item2Count  Item3  Item3Count  Item4  Item4Count
abc123         4      10          5      20          6      100         7      50
Without doing four joins? Is there a lookup table function or technique that'll look up all the values in the whole row from one table at once?

Uh, untested stupid idea:

SQL code:
with TransactionItems as (
-- union all: the four slices don't need to be deduplicated against each other
select TransactionID, Item1 ItemID, Item1Count ItemCount from Transactions
union all
select TransactionID, Item2 ItemID, Item2Count ItemCount from Transactions
union all
select TransactionID, Item3 ItemID, Item3Count ItemCount from Transactions
union all
select TransactionID, Item4 ItemID, Item4Count ItemCount from Transactions
)
select *
from TransactionItems ti join Items i on ti.ItemID = i.ItemID

nielsm
Jun 1, 2009



When IdY is unique in itself, why does your Z table need to reference more fields than that? I don't understand what that achieves. All I see is denormalized data with the Value field duplicated needlessly.
Are you sure Z shouldn't leave out the Value and instead have a view that joins Y and Z?

nielsm
Jun 1, 2009



mik posted:

Here's a WHERE Type = 'Stupid' question:

Is there a functional/performance difference between these JOINs in terms of how I restrict Field2 on Table2? In practice they seem to have the same outcome, but I don't know which is 'proper'.

code:
SELECT 
	* 
FROM Table1 A

LEFT JOIN Table2 B 
	ON B.Field1 = A.Field1 
	AND B.Field2 = 'ABC'
and

code:
SELECT 
	* 
FROM Table1 A

LEFT JOIN Table2 B 
	ON B.Field1 = A.Field1 
	
WHERE B.Field2 = 'ABC'

Yes, there is a difference when you make a LEFT JOIN. If you did an (inner) JOIN they would be equivalent.

The first will match every row in A with all rows in B where Field1 matches and B.Field2 has the value. If no row in B satisfies both conditions, you still get the row from A, just without any data from B.
The second will match every row in A with all rows in B where Field1 matches, and then filter out joined rows where B.Field2 does not have the value. That means rows from A that had no matching rows in B are dropped too (their B.Field2 is NULL, which fails the WHERE), so it effectively behaves like a plain inner join.
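A tiny runnable illustration of the difference (table variables just for demonstration):

SQL code:
-- One A row with a partial match (Field1 matches, Field2 doesn't), one with no match at all
DECLARE @A TABLE (Field1 INT);
DECLARE @B TABLE (Field1 INT, Field2 VARCHAR(10));
INSERT INTO @A VALUES (1), (2);
INSERT INTO @B VALUES (1, 'XYZ');

-- Filter in the ON clause: returns both A rows, with NULLs in the B columns
SELECT * FROM @A A LEFT JOIN @B B ON B.Field1 = A.Field1 AND B.Field2 = 'ABC';

-- Filter in the WHERE clause: returns nothing, because the NULLs fail the WHERE
SELECT * FROM @A A LEFT JOIN @B B ON B.Field1 = A.Field1 WHERE B.Field2 = 'ABC';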

nielsm
Jun 1, 2009



Actual max concurrent, or number of distinct users seen within each interval? The answer is probably going to be different, with the latter probably being easier.

nielsm
Jun 1, 2009



kumba posted:

if I'm reading this right, I could imagine a setup where you populate a table with each datetime interval you want (e.g. 1/1/2020 00:00:00, 1/1/2020 00:30:00, etc), then left join that table to your user login tracking table on that using BETWEEN and get a count

Yeah, something like that will probably work, but I think you should use count(distinct userid) so the same user logging in and out repeatedly isn't counted more than once.

nielsm
Jun 1, 2009



Assign +1 and -1 values to the log in and log out events. Make sure you have one row per event, not per session; you may have to union the table with itself.
Use a window function to keep a running sum of log in/out events. Join that with your time dimension to take a max of every 30 minute interval.
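A hedged sketch of the running-sum part (T-SQL syntax; the Sessions table and its columns are assumptions):

SQL code:
WITH Events AS (
    SELECT LoginTime  AS EventTime,  1 AS Delta FROM Sessions
    UNION ALL
    SELECT LogoutTime AS EventTime, -1 AS Delta FROM Sessions
)
SELECT EventTime,
       SUM(Delta) OVER (ORDER BY EventTime
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ConcurrentUsers
FROM Events;
-- then join against the 30-minute dimension and take MAX(ConcurrentUsers) per interval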

Your RDBMS probably has a way to make temporary tables, use that. Temporary tables are private to the connection and disappear when you disconnect.

nielsm
Jun 1, 2009



Can anyone confirm if this way of securing an application sounds correct?

MS SQL Server 2019
The application has a couple data tables, containing sensitive data, and a table for logging access to the application and data accessed.
A selection of domain users needs access to the application, other users have no access.
Write a set of stored procedures to do all data access required by users; these sprocs validate parameters and write to the access log table. The sprocs are marked as EXECUTE AS OWNER (and owned by dbo).
Create an AD group (CONTOSO\AppUsers) and make the users members of this group.
The AD group is added as a login on the server and mapped to a database user, with no permissions of its own.
Create a database-level role "app_user" and add the user "CONTOSO\AppUsers" as a member of that role.
Grant the "app_user" role EXECUTE on the sprocs.
In the sprocs, the ORIGINAL_LOGIN() function can then be used to obtain the user's own account name (CONTOSO\jsmith) for access logging.
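A sketch of the setup in T-SQL, using the names from above (the sproc names are placeholders):

SQL code:
-- Server-level login for the AD group, mapped to a database user with no direct table permissions
CREATE LOGIN [CONTOSO\AppUsers] FROM WINDOWS;
CREATE USER [CONTOSO\AppUsers] FOR LOGIN [CONTOSO\AppUsers];

-- Database role that only gets EXECUTE on the application's sprocs
CREATE ROLE app_user;
ALTER ROLE app_user ADD MEMBER [CONTOSO\AppUsers];

GRANT EXECUTE ON dbo.usp_GetCaseData TO app_user;    -- placeholder sproc names
GRANT EXECUTE ON dbo.usp_WriteCaseData TO app_user;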

nielsm
Jun 1, 2009



Ruggan posted:

Why EXECUTE AS OWNER?

You don’t really need that. As long as you can deploy the SP permissions are implicit. If the user can execute the SP they will not be blocked on access to tables they otherwise would not have permission against.

I'm not entirely sure I understand what you're arguing here, but I also don't understand the authorization model entirely.

Do you mean that SPs have permissions on tables (etc.) checked at creation time, and then after creation, no checks are made at execution? Other than user having permission to execute the SP itself.

So in theory, I could create an SP that accesses a table I have permissions on, such that I'm the owner of that SP, then have my permissions on that table revoked, and then still be able to execute the SP to access the table in that way?

Are there any disadvantages or risks involved in using EXECUTE AS OWNER?

nielsm
Jun 1, 2009



Try this:

SQL code:
SELECT COALESCE(a1.p_key, a2.p_key) p_key, COALESCE(a1.f_key, a2.f_key) f_key, COALESCE(a1.data, a2.data) data
FROM tbl2 b
LEFT OUTER JOIN tbl1 a1 ON b.fkey=a1.f_key
LEFT OUTER JOIN tbl3 c ON b.p_key=c.f_key_to_tbl1
LEFT OUTER JOIN tbl1 a2 ON c.pkey=a2.f_key
WHERE (a1.p_key IS NULL) <> (a2.p_key IS NULL)
The query joins in both directions, and the WHERE condition selects only the rows where exactly one of the two directions produced a join (not both, and not neither).

Actually, the other solution that may be simpler is to just write two separate queries and use UNION ALL between them.

nielsm
Jun 1, 2009



Hammerite posted:

If you need to represent "data is missing" or "column is not applicable", then a nullable column is indicated. If you don't need to represent either of these situations, then a nullable column is not indicated. I've never been in a situation where it's a matter of personal stylistic choice whether to make a column nullable or not, it's always been clear from the context whether it would or wouldn't be correct to make the column nullable. I suppose that to the extent that your observation boils down to "if the column shouldn't be nullable, then don't make it nullable", I agree with it.

I believe the idea is that if you have data that can be absent, then those data should live in a separate table that joins with the one you'd otherwise put the nullable column in.

nielsm
Jun 1, 2009



Generate the number of IDs you need from the sequence, assign them manually to the data you insert into the first table, and then you know which IDs correspond to which data when inserting into the second table.

nielsm
Jun 1, 2009



The data will go through the machine the import/export process is run on. In general, database servers from different vendors can't connect to each other for direct transfers; they use different protocols and all. All that wizard is doing for you is hooking up a SELECT query on the source to fetch all the data, and then sending INSERT queries to the destination to load it back in.

nielsm
Jun 1, 2009



I can't write the SQL off the top of my head, but you can probably write a CTE that ranks the items for each transaction with ordinal 1, 2, 3, 4, 5,... and then select a five-way self join on the CTE to construct rows with up to five ranked items from each.
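An untested sketch of that (T-SQL; TransactionItems and its columns are assumptions):

SQL code:
WITH Ranked AS (
    SELECT TransactionID, ItemID,
           ROW_NUMBER() OVER (PARTITION BY TransactionID ORDER BY ItemID) AS rn
    FROM TransactionItems
)
SELECT r1.TransactionID,
       r1.ItemID AS Item1, r2.ItemID AS Item2, r3.ItemID AS Item3,
       r4.ItemID AS Item4, r5.ItemID AS Item5
FROM Ranked r1
LEFT JOIN Ranked r2 ON r2.TransactionID = r1.TransactionID AND r2.rn = 2
LEFT JOIN Ranked r3 ON r3.TransactionID = r1.TransactionID AND r3.rn = 3
LEFT JOIN Ranked r4 ON r4.TransactionID = r1.TransactionID AND r4.rn = 4
LEFT JOIN Ranked r5 ON r5.TransactionID = r1.TransactionID AND r5.rn = 5
WHERE r1.rn = 1;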

nielsm
Jun 1, 2009



Consider if you should normalize your schema.

I think it would look something like

Host (id , name)
Pop (id, name)
Measurement (id, time)
ResponseTime (measurement_id, host_id, pop_id, response_time)

nielsm
Jun 1, 2009



You obviously need to have ingredients that have a recipe of their own too. Like you have a recipe for mayonnaise and then a recipe for a mayo-based burger dressing, and then a recipe for a hamburger using that dressing.

nielsm
Jun 1, 2009



You can maybe sign up for a virtual desktop on Azure. Not sure if you can get those as an individual or only as an enterprise customer.

Or go install Windows on the Mac; I don't think you can run new macOS versions on it anyway, and you'll get locked out of more and more software as developers update their SDKs.

nielsm
Jun 1, 2009



You make a JOIN in your FROM clause, joining the related tables to the report table, on the condition of each table's key being equal to the matching foreign key of the report table.
That gives you rows containing the data from the report table, the job_card table, the user table, and the report_type table, which you can then select as appropriate.
The report.job_id etc. foreign key columns will still be present in the joined result, but if you don't include them in the list of fields to select, you won't see them in the result set.
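Roughly like this, guessing at the key column names (only the report.job_id foreign key is named in your description, the rest are placeholders; how you quote the user table depends on your RDBMS):

SQL code:
SELECT r.id, r.created_at,        -- whichever report columns you want
       j.job_number,              -- placeholder column from job_card
       u.user_name,               -- placeholder column from user
       t.type_name                -- placeholder column from report_type
FROM report r
JOIN job_card j ON j.id = r.job_id
JOIN [user] u ON u.id = r.user_id
JOIN report_type t ON t.id = r.report_type_id;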

nielsm
Jun 1, 2009



Nth Doctor posted:

I'm a big advocate for identity columns even when there's already a uniquely valued column in the table, if the column isn't comparable as a single CPU step. There's no reason you should spend multiple cycles comparing J, J, 4, 2, 1, 2, 6, 3 to J, J, 4, 2, 1, 2, 6, 0 one by one when you can just as easily see 34920 vs. 88382381 isn't a match with a single step.

On the other hand, 8 bytes is the size of an int64, and your CPU can already compare two int64 values in a single instruction. Compilers for C++ and other languages can translate a comparison of an array of 8 bytes into a comparison of a single 64-bit value, depending on the exact context, so a database engine might also be able to.

nielsm
Jun 1, 2009



You'll have to generate dynamic SQL. The language doesn't allow parameterizing table names anyway, as far as I know.
If you want a "nice" result, create a temporary table, generate SELECT INTO statements for all the source tables, and store the results in the temporary table. Then run the report on the temp table.
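A hedged sketch of that pattern (T-SQL; the column layout, the #results table, and the way the source tables are enumerated are all placeholders):

SQL code:
CREATE TABLE #results (SomeKey INT, SomeValue NVARCHAR(100));  -- shape of the report data, assumed

DECLARE @tbl SYSNAME, @sql NVARCHAR(MAX);

DECLARE tbl_cursor CURSOR FOR
    SELECT name FROM sys.tables WHERE name LIKE N'SourceTable%';  -- however you pick the source tables

OPEN tbl_cursor;
FETCH NEXT FROM tbl_cursor INTO @tbl;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- INSERT ... SELECT rather than SELECT INTO, so everything lands in the one temp table
    SET @sql = N'INSERT INTO #results (SomeKey, SomeValue) SELECT SomeKey, SomeValue FROM ' + QUOTENAME(@tbl) + N';';
    EXEC sp_executesql @sql;
    FETCH NEXT FROM tbl_cursor INTO @tbl;
END
CLOSE tbl_cursor;
DEALLOCATE tbl_cursor;

-- then run the report against #results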

nielsm
Jun 1, 2009



https://www.postgresql.org/docs/current/datatype-datetime.html says the timestamp type has 1 microsecond resolution, but your textual representation only has 1 millisecond resolution. The stored values probably have more precision than gets displayed, so they don't produce an exact match with the copy-pasted textual value.
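One workaround is to compare at the precision you can actually see, e.g. (assuming a column named ts; date_trunc is the Postgres function):

SQL code:
-- Truncate the stored microsecond-precision values to milliseconds before comparing
SELECT *
FROM my_table
WHERE date_trunc('milliseconds', ts) = TIMESTAMP '2020-04-01 12:34:56.789';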


nielsm
Jun 1, 2009



prom candy posted:

How do I write a query to return all users that have a Honda AND a Toyota?

Use an EXISTS in the WHERE clause instead of joining the tables twice. That's also easier to chain when you need to check for three or four makes.
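Something along these lines, guessing at a users table and a cars table with a make column:

SQL code:
SELECT u.*
FROM users u
WHERE EXISTS (SELECT 1 FROM cars c WHERE c.user_id = u.id AND c.make = 'Honda')
  AND EXISTS (SELECT 1 FROM cars c WHERE c.user_id = u.id AND c.make = 'Toyota');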
