mortarr
Apr 28, 2005

frozen meat at high speed

RandomBlue posted:

I loving hate working with dates and times because there's always some new way poo poo gets hosed up. There are a shitload of formats even if timezones aren't an issue, but they always are. Storing it in UTC is the easiest part.

If you control the input and can limit things there it's not so bad, but I work on a product that works with user-provided CSVs, spreadsheets, DB sources, CRMs, etc., and of course we work with international customers, so all that poo poo has to work together seamlessly.

I had a fun one where all dates were initially NZST, then for a year or three were a mix of NZST and NZDT with no marker for which was which, so there were gaps when setting the time forward and double-ups when setting back, and then it went back to NZST. The incoming data was from SCADA and IoT, so the raw source data was long gone by the time I found the fault, and there was a lot of it - thousands of devices, some posting new info at 5-minute intervals.

I ended up adding columns with the raw timezone, the UTC conversion, and a flag to highlight the faulty data - it was only over the daylight saving crossovers that it was an issue, but there was pretty much nothing we could do so long after the fact.


redleader
Aug 18, 2005

Engage according to operational parameters

kiwid posted:

How come a MERGE is so fast?

I'm not working with a lot of data, but copying 5000 rows is about 300ms.

query planner

MERGE sucks, but it can optimise merges just like it can INSERTs, UPDATEs, etc.

a much less fun question is "how come a MERGE is so slow?"



mortarr posted:

I just want to say I have done all this converting from datetime to datetimeoffset and feel your pain. I was the timezone "expert" for a couple of companies in the past, and I hope I never have to dick about with timezones ever again; they are by far the most complex and error-prone calculations I've ever had to work with.

we have utc, datetimes in the client's tz, plain dates, and quite possibly some other thing



RandomBlue posted:

I loving hate working with dates and times because there's always some new way poo poo gets hosed up. There are a shitload of formats even if timezones aren't an issue, but they always are. Storing it in UTC is the easiest part.

If you control the input and can limit things there it's not so bad, but I work on a product that works with user-provided CSVs, spreadsheets, DB sources, CRMs, etc., and of course we work with international customers, so all that poo poo has to work together seamlessly.

for a long time, we had a time zone selector on our csv upload page. it defaulted to the browser's time zone. we had contractors in a random country doing a lot of uploads for our clients. they very often did not select the correct time zone for the client they were uploading data for

TasogareNoKagi
Jul 11, 2013

RandomBlue posted:

I loving hate working with dates and times because there's always some new way poo poo gets hosed up. There are a shitload of formats even if timezones aren't an issue, but they always are. Storing it in UTC is the easiest part.

If you control the input and can limit things there it's not so bad, but I work on a product that works with user-provided CSVs, spreadsheets, DB sources, CRMs, etc., and of course we work with international customers, so all that poo poo has to work together seamlessly.

I don't care if the old system was happy with it, "17:04 AM" is not a valid time!

RandomBlue
Dec 30, 2012

hay guys!


Biscuit Hider
e: misread that post like an idiot and created 12 new datetime bugs

e: One of the more fun ones I ran into was Java parsing a dd-mm-yyyy value as mm-dd-yyyy, and when the month value was > 12, instead of throwing an error it would just wrap the year and subtract 12 from the month.

RandomBlue fucked around with this message at 00:29 on Mar 12, 2024

kiwid
Sep 30, 2013

RandomBlue posted:

e: One of the more fun ones I ran into was Java parsing a dd-mm-yyyy value as mm-dd-yyyy, and when the month value was > 12, instead of throwing an error it would just wrap the year and subtract 12 from the month.

:catstare:

Seventh Arrow
Jan 26, 2005

"And that, my son, is when mankind discovered time travel."

A Festivus Miracle
Dec 19, 2012

I have come to discourse on the profound inequities of the American political system.

this is the wrong thread, lol.

A Festivus Miracle fucked around with this message at 21:16 on Mar 13, 2024

kiwid
Sep 30, 2013

Ugh, more datetime timezone bullshit.

I have a table that looks like this:



I want to combine these three columns into one in a new view and convert from EST to UTC.

It looks like I can simply ignore the 3rd column because the time is already in 24-hour time, right?

Is there a straightforward way to combine the 1st and 2nd columns into a datetime?

edit: the 2nd column is a varchar(6)

edit2: This is what I've come up with but it's so inelegant:

SQL code:
cast(
    concat_ws(
        ' ',
        cast(scale_ticket_trans_date as date),
        timefromparts(
            left(scale_ticket_trans_time, 2),         -- HH
            substring(scale_ticket_trans_time, 3, 2), -- MM
            right(scale_ticket_trans_time, 2),        -- SS
            0,                                        -- fractional seconds
            0                                         -- fractional precision
        )
    )
as datetime)
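If the varchar(6) ever holds junk, that TIMEFROMPARTS call will blow up on the implicit string-to-int conversion. Here's a rough sketch of the same HHMMSS split in Python, handy for validating the column outside the database (the column format is from this post; the function name is made up):

```python
from datetime import date, datetime, time

def combine(trans_date: date, trans_time: str) -> datetime:
    """Mirror the SQL: chars 1-2 = hour, 3-4 = minute, 5-6 = second."""
    if len(trans_time) != 6 or not trans_time.isdigit():
        raise ValueError(f"expected HHMMSS, got {trans_time!r}")
    hh, mm, ss = int(trans_time[:2]), int(trans_time[2:4]), int(trans_time[4:6])
    return datetime.combine(trans_date, time(hh, mm, ss))

print(combine(date(2024, 3, 18), "170432"))  # 2024-03-18 17:04:32
```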

kiwid fucked around with this message at 17:16 on Mar 18, 2024

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy

kiwid posted:

Ugh, more datetime timezone bullshit.

I have a table that looks like this:



I want to combine these three columns into one in a new view and convert from EST to UTC.

It looks like I can simply ignore the 3rd column because the time is already in 24-hour time, right?

Is there a straightforward way to combine the 1st and 2nd columns into a datetime?

edit: the 2nd column is a varchar(6)

edit2: This is what I've come up with but it's so inelegant:

SQL code:
cast(
    concat_ws(
        ' ',
        cast(scale_ticket_trans_date as date),
        timefromparts(
            left(scale_ticket_trans_time, 2),
            substring(scale_ticket_trans_time, 3, 2),
            right(scale_ticket_trans_time, 2),
            0,
            0
        )
    )
as datetime)

i can't think of a way to do this without looking at the parts of the string like you're already doing; unfortunately i think you're stuck doing that


e: lol this does not work at all but keeping it here for posterity cause drat that was dumb

SELECT
DATEADD(second,CAST(scale_ticket_trans_time AS INT),CAST(scale_ticket_trans_date AS DATE))
FROM mytable
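(for what it's worth, the scrapped version can't work even in principle: casting an HHMMSS string like '170432' to INT gives a count of 170,432 seconds, not a time of day. a throwaway Python check, assuming the HHMMSS layout from the post above:)

```python
from datetime import timedelta

raw = "170432"  # HHMMSS, i.e. 17:04:32
as_seconds = timedelta(seconds=int(raw))
print(as_seconds)  # 1 day, 23:20:32 -- nowhere near 17:04:32
```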


kumba fucked around with this message at 17:31 on Mar 18, 2024

kiwid
Sep 30, 2013

kumba posted:

i can't think of a way to do this without looking at the parts of the string like you're already doing, unfortunately i think you're stuck doing that


e: lol this does not work at all but keeping it here for posterity cause drat that was dumb

SELECT
DATEADD(second,CAST(scale_ticket_trans_time AS INT),CAST(scale_ticket_trans_date AS DATE))
FROM mytable


Thanks :D

Unrelated follow-up: what is the point of declaring a column as DECIMAL(10, 0)? Is this ever not just an integer?

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy

kiwid posted:

Thanks :D

Unrelated follow-up: what is the point of declaring a column as DECIMAL(10, 0)? Is this ever not just an integer?

there exist datetime2(0) fields in a database i have to contend with at work and i have wondered the same thing :D
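(one plausible answer: DECIMAL(10, 0) holds any 10-digit value, which overflows a 32-bit INT, so it isn't always interchangeable with one. a quick range check in plain Python, nothing database-specific:)

```python
int32_max = 2**31 - 1          # upper bound of a typical 32-bit SQL INT
decimal_10_0_max = 10**10 - 1  # ten nines, the DECIMAL(10, 0) maximum

print(int32_max)                     # 2147483647
print(decimal_10_0_max)              # 9999999999
print(decimal_10_0_max > int32_max)  # True
```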

Oysters Autobio
Mar 13, 2017
Question: Does the line comment syntax ("--") in most SQL dialects (in this case, impala) guarantee that all text after it will always be read in as a comment? i.e. if you were to enter a variable into a SQL query behind a line comment, is there some combination of characters that could escape that comment back into a normal query statement? I know SQL sanitization is important when it comes to user input because of SQL injection attacks, but what if user input is limited to only behind SQL comment syntax?

for example, say the SQL query was SELECT * FROM mytable -- {variable}, so that when a user types in 'foo', the end query becomes SELECT * FROM mytable -- foo. Without any sanitization on the input, can a SQL line comment be escaped back into regular SQL commands? I know if this were a multi-line comment then users could enter '*/' to end the comment and then enter other commands afterwards, but are there escapes available for single-line comment syntax?

Here's the problem: For stupid reasons, mgmt wants every query on our database to contain a comment with a Jira ticket identifier (i.e. "DBT-123", "DBT-3234", "DBT-3433") which has a bunch of info on the query (i.e. purpose of the query, etc.). This is so that when they do random audits of the database logs, it's easy for them to match up the query and why it was run based on the ticket. This is stupid and tedious but fine with our DBAs; they just put a comment at the end of the line (e.g. SELECT * FROM mystupidtable -- DBT-123) when doing manual queries, and they have a bunch of open tickets for generic testing purposes they just reuse over and over.

Mgmt is adamant that users must also have a Jira ticket and add that ticket identifier in a comment too, so that every query they make will include the comment in the impala database logs for auditing purposes.

The trouble we're running into is that we also manage user-facing dashboards which are live-connected to the impala database with kerberos delegation, which means that when a user loads a dashboard, the server uses the credentials of that user to send the request rather than some sort of server-account credentials. This means results have row-level security applied anyway, so users can do all they want with dashboards/vizzes in Tableau without risk of exposing data they aren't allowed to see.

The only way that I know of doing this in Tableau is through custom SQL queries and parameters. Generally this feature is used when users can be presented with a dropdown menu of options which are entered into a custom parameterized query (SELECT * FROM mytable WHERE region = <region_parameter>). We can enable Parameters in Tableau which allow anything rather than a fixed list, but for obvious reasons (i.e. SQL injection) this is turned off by default in admin settings.

If we were to turn that setting on, what's the security risk of creating a parameterized custom SQL query in Tableau like SELECT * FROM mytable -- <parameter> and instructing users to enter their Jira ticket number? Is there any way here that a user could escape the comment field? If yes, what if we instead used a parameterized query which only allowed a user-inputted integer value (i.e. the user can only enter integers between a min and max value), like SELECT * FROM mytable -- DBT-<parameter> (Tableau lets you validate against different types of parameters like this), and prompted them to enter only the number portion of their ticket (i.e. their ticket is "DBT-1234", they would enter "1234" when prompted)?

Bearing in mind in the above scenario, the database is protected by kerberos auth which is mapped to user LDAP profiles, so even if they could escape the comment, the results would be filtered out for any data they're not allowed to access anyways (i.e. row-level security).

How dumb of an idea is either of those two? (Ignoring the fact that there's row-level security anyway, so why mgmt is adamant about mandating this stupid unvalidated comment text to begin with, i don't know.)

nielsm
Jun 1, 2009



If the variables are handled by the DBMS and not an ORM (or other access layer) doing some kind of textual substitution, then the values assigned to the variables should never become part of the query text as such.

If instead the variables are in fact just an ORM/access layer doing textual substitution, then it all depends on that library/framework and has nothing to do with SQL as such.
(SQL comments go from the -- to the next newline, so it depends on whether the user could pass in a value containing newlines, and on whether the substituting code escapes those.)

nielsm fucked around with this message at 21:58 on Mar 23, 2024
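nielsm's newline caveat is easy to demonstrate. A sketch using Python's sqlite3 as a stand-in for impala (naive string substitution, hypothetical table name; the --‑to‑end‑of‑line comment grammar is the same):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (region TEXT)")
con.execute("INSERT INTO mytable VALUES ('east')")

template = "SELECT * FROM mytable -- {ticket}"

# Well-behaved input: everything after -- up to end of line is a comment.
safe_rows = con.execute(template.format(ticket="DBT-123")).fetchall()
print(safe_rows)  # [('east',)]

# A newline terminates the comment, so whatever follows is live SQL again.
evil = template.format(ticket="DBT-123\n; DROP TABLE mytable")
con.executescript(evil)
tables = con.execute(
    "SELECT name FROM sqlite_master WHERE name = 'mytable'"
).fetchall()
print(tables)  # [] -- the table is gone
```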

Oysters Autobio
Mar 13, 2017
Ah, there it is. I didn't realize the dashes only comment out text until the next newline, so yeah, unsanitized, someone could escape it.

Yeah another reason to prefer open source. I have no idea what Tableau uses here for their custom SQL option, so I don't know how that query is actually materialized before it goes into impala. I imagine they must be using some kind of JavaScript ORM to handle passing the values from the GUI to the SQL.

I'm assuming the same issue applies even if I use Tableau's built-in numeric parameter? i.e. because we don't know how these Tableau params are being entered into the SQL, it's probably a bad idea to assume that SQL injection wouldn't be possible if I simply used a numeric parameter only?

That being said, I might be overthinking this depending on how Kerberos manages user auth in impala, i.e. whether sending DDL is restricted to certain ACLs only. Because if at least DDL is protected, then even if a malicious actor escaped the comment, the most they could do is what? Read all the data they're entitled to read anyway?

This exercise is purely for "auditing" purposes, and management has at least accepted the risk that this method (i.e. entering comments) has nothing enforcing it (we'll prob add in some passive monitoring of the logs to flag queries without it, though).

maybe I should just use their enumerated parameters with a lookup list containing every number 1 - 9999 lmao. I would love to see the look on someone's face.

Oysters Autobio fucked around with this message at 13:13 on Mar 27, 2024

kiwid
Sep 30, 2013

Is there a way to select MAX in a join?

For example, I have two tables with a many-to-many pivot that has a revision field. I'd like to join on the max revision number if possible, or do I need to filter this in the WHERE, and how would I do that?

code:
+------------+
| Table 1    |
+----+-------+
| id | name  |
+----+-------+
| 1  | John  |
+----+-------+
| 2  | Berry |
+----+-------+
| 3  | Sarah |
+----+-------+
| 4  | Amy   |
+----+-------+
code:
+----------------------------------+
| Pivot Table                      |
+-----------+-----------+----------+
| table1_id | table2_id | revision |
+-----------+-----------+----------+
| 1         | 100       | 1        |
+-----------+-----------+----------+
| 1         | 100       | 2        |
+-----------+-----------+----------+
| 1         | 101       | 3        |
+-----------+-----------+----------+
| 1         | 100       | 4        |
+-----------+-----------+----------+
| 2         | 105       | 1        |
+-----------+-----------+----------+
| 2         | 106       | 2        |
+-----------+-----------+----------+
| 3         | 100       | 1        |
+-----------+-----------+----------+
I want results that look like this:

code:
+-----------+-------------+-----------+
| table1_id | table1_name | table2_id |
+-----------+-------------+-----------+
| 1         | John        | 100       |
+-----------+-------------+-----------+
| 2         | Berry       | 106       |
+-----------+-------------+-----------+
| 3         | Sarah       | 100       |
+-----------+-------------+-----------+

kiwid fucked around with this message at 19:48 on Apr 23, 2024

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy

i think this is what you're looking for

code:
SELECT
	t1.id AS table1_id, t1.name AS table1_name, t2.table2_id
	FROM table1 t1
	JOIN pivot_table t2 ON t1.id = t2.table1_id
				   AND t2.revision = (
							SELECT MAX(t2sub.revision)
							FROM pivot_table t2sub
							WHERE t2sub.table1_id = t2.table1_id
							)
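(a quick sanity check of the correlated-subquery approach against the sample data from the question, sketched in Python's sqlite3 - sqlite standing in for the real server; table and column names are from the post above:)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (id INTEGER, name TEXT);
INSERT INTO table1 VALUES (1, 'John'), (2, 'Berry'), (3, 'Sarah'), (4, 'Amy');
CREATE TABLE pivot_table (table1_id INTEGER, table2_id INTEGER, revision INTEGER);
INSERT INTO pivot_table VALUES
    (1, 100, 1), (1, 100, 2), (1, 101, 3), (1, 100, 4),
    (2, 105, 1), (2, 106, 2), (3, 100, 1);
""")

# Join each table1 row to the pivot row holding its highest revision.
rows = con.execute("""
SELECT t1.id, t1.name, p.table2_id
FROM table1 t1
JOIN pivot_table p
  ON t1.id = p.table1_id
 AND p.revision = (SELECT MAX(ps.revision)
                   FROM pivot_table ps
                   WHERE ps.table1_id = p.table1_id)
ORDER BY t1.id
""").fetchall()
print(rows)  # [(1, 'John', 100), (2, 'Berry', 106), (3, 'Sarah', 100)]
```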


nielsm
Jun 1, 2009



SQL code:
WITH ranked_pivot AS (
	SELECT
		table1_id,
		table2_id,
		RANK() OVER (PARTITION BY table1_id ORDER BY revision DESC) recentness
	FROM pivot_table
)
SELECT
	t1.id AS table1_id,
	t1.name AS table1_name,
	p.table2_id
FROM table1 AS t1
JOIN ranked_pivot AS p ON t1.id = p.table1_id AND p.recentness = 1
I think this should do it.
For every grouping of table1_id's, rank the revisions by most recent (largest) and join only to the most recent one.

Things you can read up on:
Common Table Expressions (CTE, the WITH clause)
Window functions (the aggregate() OVER (PARTITION BY ... ORDER BY ...) part)
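The CTE + window-function version checks out the same way against the sample data; a sketch in Python's sqlite3 (sqlite has supported window functions since 3.25, so this assumes a reasonably recent Python build):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table1 (id INTEGER, name TEXT);
INSERT INTO table1 VALUES (1, 'John'), (2, 'Berry'), (3, 'Sarah'), (4, 'Amy');
CREATE TABLE pivot_table (table1_id INTEGER, table2_id INTEGER, revision INTEGER);
INSERT INTO pivot_table VALUES
    (1, 100, 1), (1, 100, 2), (1, 101, 3), (1, 100, 4),
    (2, 105, 1), (2, 106, 2), (3, 100, 1);
""")

# Rank each table1_id's pivot rows by revision, then keep only the newest.
rows = con.execute("""
WITH ranked_pivot AS (
    SELECT table1_id, table2_id,
           RANK() OVER (PARTITION BY table1_id ORDER BY revision DESC) AS recentness
    FROM pivot_table
)
SELECT t1.id, t1.name, p.table2_id
FROM table1 AS t1
JOIN ranked_pivot AS p ON t1.id = p.table1_id AND p.recentness = 1
ORDER BY t1.id
""").fetchall()
print(rows)  # [(1, 'John', 100), (2, 'Berry', 106), (3, 'Sarah', 100)]
```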
