|
RandomBlue posted:I loving hate working with dates and times because there's always some new way poo poo gets hosed up. There are a shitload of formats even if timezones aren't an issues but they always are. Storing it in UTC is the easiest part. I had a fun one where all dates were initially NZST, then for a year or three were a mix of NZST and NZDT with no marker which was which, so there were gaps when setting the time forward and double-ups when setting back, and then it went back to NZST. The incoming data was from scada and IoT, so the raw source data was long gone by the time I found the fault, and there was a lot - 1000's of devices, with some posting new info at 5min intervals. I ended up adding columns with raw timezone, UTC conversion and a flag to highlight the faulty data - it was only over the daylight savings crossovers that it was an issue, but there was pretty much nothing we could do so long after the fact.
|
# ? Mar 11, 2024 02:02 |
|
|
# ? May 4, 2024 08:58 |
|
kiwid posted:How come a MERGE is so fast? query planner MERGE sucks, but it can optimise merges just like it can INSERTs and UPDATEs and etc a much less fun question is "how come a MERGE is so slow?" mortarr posted:I just want to say I have done all this converting from datetime to datetimeoffset and feel your pain. I was the timezone "expert" for a couple of companies in the past, and I hope I never have to dick about with timezones ever again, they are by far the most complex and error-prone kind of calculations I've ever had to work with. we have utc, datetimes in the client's tz, plain dates, and quite possibly some other thing RandomBlue posted:I loving hate working with dates and times because there's always some new way poo poo gets hosed up. There are a shitload of formats even if timezones aren't an issues but they always are. Storing it in UTC is the easiest part. for a long time, we had a time zone selector on our csv upload page. it defaulted to the browser's time zone. we had contractors in a random country doing a lot of uploads for our clients. they very often did not select the correct time zone for the client they were uploading data for
|
# ? Mar 11, 2024 07:31 |
|
RandomBlue posted:I loving hate working with dates and times because there's always some new way poo poo gets hosed up. There are a shitload of formats even if timezones aren't an issues but they always are. Storing it in UTC is the easiest part. I don't care if the old system was happy with it, "17:04 AM" is not a valid time!
|
# ? Mar 12, 2024 00:12 |
|
e: misread that post like an idiot and created 12 new datetime bugs e: One of the more fun ones I ran into was Java parsing a dd-mm-yyyy value as mm-dd-yyyy and when the moth value was > 12 instead of throwing an error it would just wrap the year and subtract 12 from the month. RandomBlue fucked around with this message at 00:29 on Mar 12, 2024 |
# ? Mar 12, 2024 00:27 |
|
RandomBlue posted:e: One of the more fun ones I ran into was Java parsing a dd-mm-yyyy value as mm-dd-yyyy and when the moth value was > 12 instead of throwing an error it would just wrap the year and subtract 12 from the month.
|
# ? Mar 12, 2024 01:46 |
|
"And that, my son, is when mankind discovered time travel."
|
# ? Mar 12, 2024 03:54 |
|
this is the wrong thread,lol.
A Festivus Miracle fucked around with this message at 21:16 on Mar 13, 2024 |
# ? Mar 13, 2024 21:13 |
|
Ugh, more datetime timezone bullshit. I have a table that looks like this: I want to combine these three columns into one in a new view and convert from EST to UTC. It looks like I can simply ignore the 3rd column because the time is already in 24-hour time, right? Is there a straight forward way to combine the 1st and 2nd columns into a datetime? edit: the 2nd column is a varchar(6) edit2: This is what I've come up with but it's so inelegant: SQL code:
kiwid fucked around with this message at 17:16 on Mar 18, 2024 |
# ? Mar 18, 2024 17:05 |
|
kiwid posted:Ugh, more datetime timezone bullshit. i can't think of a way to do this without looking at the parts of the string like you're already doing, unfortunately i think you're stuck doing that e: lol this does not work at all but keeping it here for posterity cause drat that was dumb SELECT DATEADD(second,CAST(scale_ticket_trans_time AS INT),CAST(scale_ticket_trans_date AS DATE)) FROM mytable kumba fucked around with this message at 17:31 on Mar 18, 2024 |
# ? Mar 18, 2024 17:21 |
|
kumba posted:i can't think of a way to do this without looking at the parts of the string like you're already doing, unfortunately i think you're stuck doing that Thanks Unrelated follow up, what it the point in declaring a column as DECIMAL(10, 0)? Is this ever not just an integer?
|
# ? Mar 18, 2024 17:59 |
|
kiwid posted:Thanks there exist datetime2(0) fields in a database i have to contend with at work and i have wondered the same thing
|
# ? Mar 18, 2024 18:21 |
|
Question: Does the line comment syntax ("--") in most SQL dialects (in this case, impala) guarantee that every text after it will always be read in as a comment? i.e. if you were to enter in a variable in a SQL query behind a line comment, is there any way that there's some combination of characters that could escape that comment field into a normal query statement? I know SQL sanitization for query statements is important when it comes to user input because of SQL injection attacks, but what if user input is limited to only behind SQL comment syntax? for example, say the SQL query was SELECT * FROM mytable -- {variable}, so that when user types in 'foo', the end query becomes SELECT * FROM mytable -- foo . Without any sanitization on the input, can a SQL line comment be escaped back into regular SQL commands? I know if this was a multi-line comment then users could enter in '*/' to end the comment and then enter in other commands afterwards, but are there escapes available for a single-line comment syntax? Here's the problem: For stupid reasons, mgmt wants every query on our database to contain a comment with a Jira ticket identifier (i.e. "DBT-123", "DBT-3234", "DBT-3433") which has a bunch of info on the query (i.e. purpose of query, etc.). This is so that when they do random audits of the database logs, its easy for them to match up the query and why they were doing it based on the ticket. This is stupid and tedious but fine with our DBAs, they just put in a comment at the end of the line (e.g. SELECT * FROM mystupidtable -- DBT-123) when doing manual queries, and they have a bunch of open tickets for generic testing purposes they just reuse over and over. Mgmt is adamant that users also must also have a Jira ticket and they add that ticket identifier in a comment query too so that every query they make will include the comments in the impala database logs for auditing purposes The trouble we're running into is that we also manage user-facing dashboards which are live connected to impala database with kerberos delegation which means that when a user loads in a dashboard, the server uses the credentials of the user to send in the request rather than some sort of server-account credentials. This means that results have row-level security implemented anyways so users can do all they want with dashboards/vizzes in Tableau without risk of exposing data they aren't allowed to see. The only way that I know of doing this in Tableau is through custom SQL queries and parameters. Generally this is a feature used for when users can be presented with a dropdown menu of options which are entered into a custom parameterized query (SELECT * FROM mytable WHERE region = <region_parameter>. We can enable Parameters in Tableau which allow anything rather than a fixed list, but for obvious reasons (i.e. SQL injection) this is turned off by default in admin settings. If we were to turn that setting off, and create an integer parameter (i.e. user can only enter in integers between a min and max value)), whats the security risk like here for us to create a parameterized custom SQL query in Tableau like SELECT * FROM mytable -- <parameter> and instruct their Jira ticket number. Is there anyway here that a user could escape the comment field? If yes, what if we instead used a paramaterized query which only allowed to a user inputted integer value, like SELECT * FROM mytable -- DBT-<parameter> (Tableau lets you validate against different types of parameters like this) and prompt them to only enter in the number portion of their ticket (i.e. their ticket is "DBT-1234", they would enter in "1234" when prompted). Bearing in mind in the above scenario, the database is protected by kerberos auth which is mapped to user LDAP profiles, so even if they could escape the comment, the results would be filtered out for any data they're not allowed to access anyways (i.e. row-level security). How dumb of an idea is either of those two? (Ignoring the fact that there's row-level security anyways so why mgmt is adamant to mandate these stupid unvalidated comment text to begin with i dont know).
|
# ? Mar 23, 2024 20:42 |
If the variables are handled by the DBMS and not an ORM (or other access layer) doing some kind of textual substitution, then the values assigned to the variables should never become part of the query text as such. If instead the variables are in fact just an ORM/access layer doing textual substitution, then it all depends on that library/framework and has nothing to do with SQL as such. (SQL comments go from the -- to the next newline, so it depends on whether the user could pass in a value containing newlines, and the substituting code not doing any escaping of those.) nielsm fucked around with this message at 21:58 on Mar 23, 2024 |
|
# ? Mar 23, 2024 21:56 |
|
Ah there it is, I didn't realize the dashes only allowed commenting until a newline, so yeah unsanitized someone could escape it. Yeah another reason to prefer open source. I have no idea what Tableau uses here for their custom SQL option, so I don't know how that query is actually materialized before it goes into impala. I imagine they must be using some kind of JavaScript ORM to handle passing the values from the GUI to the SQL. I'm assuming the same issue applies even if I use Tableau's built-in numeric parameter? IE because we don't know how these Tableau params are being entered into the SQL, it's probably a bad idea to assume that it wouldn't be possible to do SQL injection if I simply used numeric parameter only? That being said might be overthinking this depending on how Kerberos manages user auth in impala, ie does it with being able to send DDL to certain ACLs only. Because if at least DDL is protected, then even if a malicious actor escaped the string, the most they could do is what? Read all the data they're entitled to read anyways? This exercise is purely for "auditing" purposes and management has at least accepted the risk that this method (ie entering comments) has nothing enforcing it (we'll prob add in some passive monitoring of the logs to flag queries without it though). maybe I should just use their enumerated parameters with a lookup list containing every number 1 - 9999 lmao. I would love to see the look on someone's face. Oysters Autobio fucked around with this message at 13:13 on Mar 27, 2024 |
# ? Mar 27, 2024 12:22 |
|
Is there a way to select MAX in a join? For example, I have two tables with a many-to-many pivot that has a revision field. I'd like join on the max revision number if possible, or do I need to filter this in the where and how would I do this? code:
code:
code:
kiwid fucked around with this message at 19:48 on Apr 23, 2024 |
# ? Apr 23, 2024 19:44 |
|
i think this is what you're looking for code:
|
# ? Apr 23, 2024 19:59 |
SQL code:
For every grouping of table1_id's, rank the revisions by most recent (largest) and join only to the most recent one. Things you can read up on: Common Table Expressions (CTE, the WITH clause) Window functions (the aggretate() OVER (PARTITION BY ORDER BY) part)
|
|
# ? Apr 23, 2024 20:36 |
|
Thanks, I used a mix of both. Opted for the cte but using max() instead of rank(). Before I was loving around with group by which wasn't it.
|
# ? Apr 24, 2024 16:53 |
|
|
# ? May 4, 2024 08:58 |
|
I hosed up, I ended up needing to use RANK() window function. Still new to window functions but this solved my question better. Thanks!
|
# ? May 3, 2024 15:15 |