ChickenWing
Jul 22, 2010

:v:

uh

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
You keep feeding us little pieces, almost like a sequential scan that has to calculate every single result before it can perform the sort. The table stats might not be sufficient to know that N null rows will actually induce so many calculations. (Versus histogram stats of values, which are easier to plan.)

One trouble with that pseudoview is that it can short-circuit if col1 is null, but there's no way to know that until the rows have been joined.

This is like almost every other problem in computer science. Minimal. Working. Example. Start ripping out join tables and columns. Where does it break down? Is it linear (with the number of joined columns, tables)? Are there certain pairs that misbehave? Is it fast with a query but slow with a view? What about EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)? docs. What do the perf tables say about memory consumption? Does it have enough work_mem? :f5: There's tonnes of things to check.
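
For example, a minimal sketch against made-up names (problem_view / col1), since we haven't seen the real schema:
code:
-- Hypothetical view and predicate; substitute the real ones.
EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
SELECT *
FROM   problem_view
WHERE  col1 IS NULL;

-- How much memory does a sort get before it spills to disk?
SHOW work_mem;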

ChickenWing
Jul 22, 2010

:v:

Yeah I've been trying to reveal as much as I can without actually showing much in the way of schema details because proprietary code, etc.

Based on experimentation so far it looks like the answer is that when the query hits a certain size it just rolls over and dies. It has nothing to do with the columns I called out specifically, we've found other bits and bobs that, when removed, result in a sane query plan.

abelwingnut
Dec 23, 2002


alright, another question.

what version of sql handles decentralization/distribution best? like i could see needing to have this one db i'm working with on 5-10 servers, maybe more. what techniques are best? is there sort of a primer on this? i've only managed centralized databases so this is all new to me.

and yea, i know most versions of nosql are all about this, but the data we're dealing with is pretty straightforward. the structure won't change much once we get it finalized.

or maybe i'm thinking of some form of instant replication? like, i'm imagining having two servers, say in nyc and california, and for prime speed, i'd want west coast users to hit the ca server and east coast users to hit the nyc server. but i'd want to make sure the information inserted into one server is replicated over fairly quickly to the other.

or is it better to just have one centralized server that is juiced up? thinking we'd ultimately have a ton of connections and calls so just trying to lighten the load based on location. or am i overthinking this?

abelwingnut fucked around with this message at 17:45 on Jun 4, 2020

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

abelwingnut posted:

or is it better to just have one centralized server that is juiced up? thinking we'd ultimately have a ton of connections and calls so just trying to lighten the load based on location. or am i overthinking this?

if you can solve your problems with one server it's a lot easier, yeah. Or at least a master/slave replication setup with failover redundancy.

true distributed databases are hard, especially geographically separated ones. What happens when two applications modify the same record at the same time? You have to either wait for the other server to acknowledge every lock, or allow that at an application level some writes are going to fail. Sidestepping that question entirely by having a single-point-of-truth master server is way easier and way faster.

With a master/slave replication setup you can use something like Postgres and it's free. A distributed database is one of the few use cases where you really need to shell out for a commercial database like Oracle, which is both expensive and risky from a business standpoint, and the performance probably will not be what you are hoping for.
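
Once a streaming replica is up, checking on it is a one-liner. A rough sketch, assuming stock Postgres streaming replication (column names are the Postgres 10+ ones):
code:
-- On the primary: one row per connected standby, showing how far each has replayed.
SELECT client_addr, state, sent_lsn, replay_lsn
FROM   pg_stat_replication;

-- On a standby: approximate replication lag.
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;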

Paul MaudDib fucked around with this message at 20:50 on Jun 4, 2020

Impotence
Nov 8, 2010
Lipstick Apathy

Paul MaudDib posted:

What happens when two applications modify the same record at the same time?

Pretty much this: 120-200ms+ minimum network latency to acknowledge that the other side has gotten a write sounds painful.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

abelwingnut posted:

or maybe i'm thinking of some form of instant replication? like, i'm imagining having two servers, say in nyc and california, and for prime speed, i'd want west coast users to hit the ca server and east coast users to hit the nyc server. but i'd want to make sure the information inserted into one server is replicated over fairly quickly to the other.
Distributed data gives you lots of different options. Read replicas are easy for query offloading, but the product has to be able to work despite replication delay. In PostgreSQL land, you can try synchronous replication if you want stronger guarantees that data has been committed.
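
A minimal sketch of flipping that on, assuming Postgres 10+ and a standby that registers itself as standby1 (the name is whatever application_name the standby connects with):
code:
-- On the primary; takes effect after a config reload.
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby1)';
ALTER SYSTEM SET synchronous_commit = 'remote_apply';
SELECT pg_reload_conf();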

The trouble is you're up against Brewer ("CAP theorem"), and most solutions drop 'partition tolerance' (and you actually can't just "choose two of three"), so you're going to have to rebuild and backfill servers after events. Failback automation is rather difficult. If you're going to go there, you need a reason because it'll cost money.

(I've never seen a "multi master" solution work on a scaled production system. The support overhead increases exponentially. To be fair, I haven't tried it or researched it for a few years. You can build such a system on top of most existing databases.)

Xae
Jan 19, 2005

ChickenWing posted:

Yeah I've been trying to reveal as much as I can without actually showing much in the way of schema details because proprietary code, etc.

Based on experimentation so far it looks like the answer is that when the query hits a certain size it just rolls over and dies. It has nothing to do with the columns I called out specifically, we've found other bits and bobs that, when removed, result in a sane query plan.

Check any subqueries or CTEs you have. Postgres is pretty iffy on pushing down filter criteria into them.

Also, sorting 4 billion records in under 500kB of memory sounds real fishy.
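
If you're on Postgres 12+, you can at least tell the planner not to treat a CTE as an optimization fence. A sketch with made-up names:
code:
-- Before PG 12, WITH was always materialized, so outer WHERE clauses never
-- got pushed into it. On 12+ you can ask for inlining explicitly:
WITH recent AS NOT MATERIALIZED (
    SELECT id, created_at
    FROM   big_table
)
SELECT *
FROM   recent
WHERE  created_at > now() - interval '1 day';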

Xae fucked around with this message at 03:33 on Jun 8, 2020

Spikes32
Jul 25, 2013

Happy trees
I've got a user login tracking table that tracks when a user logs in and logs out. I want to figure out the max concurrent users for every thirty minute interval going back six months. I'm new to all this still, can anyone shoot out some keywords I can use to figure out how to do this?

nielsm
Jun 1, 2009



Actual max concurrent, or number of distinct users seen within each interval? The answer is probably going to be different, with the latter probably being easier.

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy

Spikes32 posted:

I've got a user login tracking table that tracks when a user logs in and logs out. I want to figure out the max concurrent users for every thirty minute interval going back six months. I'm new to all this still, can anyone shoot out some keywords I can use to figure out how to do this?

if I'm reading this right, I could imagine a setup where you populate a table with each datetime interval you want (e.g. 1/1/2020 00:00:00, 1/1/2020 00:30:00, etc), then left join that table to your user login tracking table on that using BETWEEN and get a count

e.g.

SELECT d.timestamp, count(distinct u.userID)
FROM dateTable d
LEFT JOIN userLoginTrackingTable u ON d.TimeStamp BETWEEN u.userLoginTime AND u.userLogoutTime
WHERE d.TimeStamp >= dateadd(month,-6,cast(current_timestamp as date))
GROUP BY d.timestamp
ORDER BY d.timestamp

using a left join so you don't throw out intervals where there's no one logged in i think works here? no idea how performant that would be but that's where i'd start probably

edit: vvvvv good call! updated the above to account for that

kumba fucked around with this message at 22:37 on Jun 18, 2020

nielsm
Jun 1, 2009



kumba posted:

if I'm reading this right, I could imagine a setup where you populate a table with each datetime interval you want (e.g. 1/1/2020 00:00:00, 1/1/2020 00:30:00, etc), then left join that table to your user login tracking table on that using BETWEEN and get a count

Yeah something like that will probably work, but I think you should change to count(distinct userid) to not get the same user logging in/out repeatedly.

Spikes32
Jul 25, 2013

Happy trees

nielsm posted:

Actual max concurrent, or number of distinct users seen within each interval? The answer is probably going to be different, with the latter probably being easier.

I need max concurrent not number of unique users per interval sadly. I'm figuring out if we need to buy more licenses for this piece of software. And unfortunately the database I'm working with is validated, so I can't go adding another table into it willy nilly. Is there a way to do this with just a query? I can export the data into excel and kludge something together if needed, but that sounds even more painful.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Spikes32 posted:

I need max concurrent not number of unique users per interval sadly. I'm figuring out if we need to buy more licenses for this piece of software. And unfortunately the database I'm working with is validated, so I can't go adding another table into it willy nilly. Is there a way to do this with just a query? I can export the data into excel and kludge something together if needed, but that sounds even more painful.

You could generate the timestamps on the fly in the query with like a CTE or something.

Calculate the number of buckets in the time period you care about, and the earliest time for the series. Then do something like DATEADD(bucket unit, bucket size in unit, previous bucket timestamp) to get the next bucket.

I'm phone posting and can't get to a PC to sketch this out.
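
Roughly this shape, anyway (untested sketch typed from memory; assumes SQL Server and 30-minute buckets):
code:
-- Generate one row per 30-minute bucket on the fly; no permanent table needed.
DECLARE @start DATETIME = DATEADD(MONTH, -6, CAST(CURRENT_TIMESTAMP AS DATE));
DECLARE @end   DATETIME = CURRENT_TIMESTAMP;

;WITH Buckets AS
(
    SELECT @start AS BucketStart
    UNION ALL
    SELECT DATEADD(MINUTE, 30, BucketStart)
    FROM   Buckets
    WHERE  BucketStart < @end
)
SELECT BucketStart
FROM   Buckets
OPTION (MAXRECURSION 0);  -- default recursion limit is 100, far too low here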

nielsm
Jun 1, 2009



Assign +1 and -1 values to the log in and log out events. Make sure you have one row per event, not per session; you may have to union the table with itself.
Use a window function to keep a running sum of log in/out events. Join that with your time dimension to take the max of every 30-minute interval.

Your RDBMS probably has a way to make temporary tables, use that. Temporary tables are private to the connection and disappear when you disconnect.
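
A sketch of that shape, assuming SQL Server 2012+ and guessing at the table/column names:
code:
-- One row per event: +1 for a login, -1 for a logout (skip missing logouts).
WITH Events AS (
    SELECT LOGIN_TIME  AS event_time, +1 AS delta FROM LOGIN_LOG
    UNION ALL
    SELECT LOGOUT_TIME AS event_time, -1 AS delta FROM LOGIN_LOG
    WHERE  LOGOUT_TIME IS NOT NULL
),
Running AS (
    -- Running total of who is logged in as of each event.
    SELECT event_time,
           SUM(delta) OVER (ORDER BY event_time
                            ROWS UNBOUNDED PRECEDING) AS concurrent_users
    FROM Events
)
-- Collapse to the max per 30-minute bucket. Buckets with no events won't
-- appear; join to a bucket list (or your time dimension) if you need them all.
SELECT DATEADD(MINUTE, (DATEDIFF(MINUTE, 0, event_time) / 30) * 30, 0) AS bucket_start,
       MAX(concurrent_users) AS max_concurrent
FROM   Running
GROUP  BY DATEADD(MINUTE, (DATEDIFF(MINUTE, 0, event_time) / 30) * 30, 0)
ORDER  BY bucket_start;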

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


nielsm posted:

Assign +1 and -1 values to the log in and log out events. Make sure you have one row per event, not per session; you may have to union the table with itself.
Use a window function to keep a running sum of log in/out events. Join that with your time dimension to take the max of every 30-minute interval.

Your RDBMS probably has a way to make temporary tables, use that. Temporary tables are private to the connection and disappear when you disconnect.

I’d do it this way too.

Union the table to itself to separate log in / log out into two rows, as nielsm mentioned.

Spikes32
Jul 25, 2013

Happy trees
Thanks everyone! When I'm at work tomorrow I'll start working with these ideas.

Spikes32
Jul 25, 2013

Happy trees
So after staring at window functions and UNION ALL for a while, and temporary tables not working with the access I have to the DB, this is what I ended up doing. Temporary tables kinda worked: I could create them, just not see them or access them after the fact for some dumb reason. Major thanks to everyone who helped out. It took about 3 mins to execute, which was fine for the one-time use I needed it for.

I was surprised to find that the user count was much lower than expected, and way lower than I knew it to be for this past Wednesday. After looking more at the raw data, I found that about half of the logout entries had null values. Turns out the program enters null if the user X's out with the upper right button instead of clicking the logout button or going file logout :argh:. So now I gotta decide what the average user session is and substitute that for the null values.

DECLARE @DateFrom DATETIME
SET @DateFrom = '2020-04-20 00:00'

DECLARE @DateTo DATETIME
SET @DateTo = '2020-06-19'

;WITH DateRanges AS
(
SELECT @DateFrom AS 'DateValue'
UNION ALL
SELECT DATEADD(MINUTE, 10, DateValue)
FROM DateRanges
WHERE DateValue < @DateTo
)
SELECT d.Datevalue, COUNT(distinct L.user_name)
FROM DateRanges d
LEFT JOIN LOGIN_LOG L
ON d.DateValue BETWEEN l.LOGIN_TIME AND l.LOGOUT_TIME
GROUP BY d.DateValue
ORDER BY d.DateValue
option (maxrecursion 0)
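
For the null-logout substitution, presumably it's just a COALESCE in the join condition, something like this (45 minutes standing in for whatever the average session turns out to be):
code:
ON d.DateValue BETWEEN l.LOGIN_TIME
                   AND COALESCE(l.LOGOUT_TIME, DATEADD(MINUTE, 45, l.LOGIN_TIME))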

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Spikes32 posted:

So after staring at window functions and UNION ALL for a while, and temporary tables not working with the access I have to the DB, this is what I ended up doing. Temporary tables kinda worked: I could create them, just not see them or access them after the fact for some dumb reason. Major thanks to everyone who helped out. It took about 3 mins to execute, which was fine for the one-time use I needed it for.

I was surprised to find that the user count was much lower than expected, and way lower than I knew it to be for this past Wednesday. After looking more at the raw data, I found that about half of the logout entries had null values. Turns out the program enters null if the user X's out with the upper right button instead of clicking the logout button or going file logout :argh:. So now I gotta decide what the average user session is and substitute that for the null values.

DECLARE @DateFrom DATETIME
SET @DateFrom = '2020-04-20 00:00'

DECLARE @DateTo DATETIME
SET @DateTo = '2020-06-19'

;WITH DateRanges AS
(
SELECT @DateFrom AS 'DateValue'
UNION ALL
SELECT DATEADD(MINUTE, 10, DateValue)
FROM DateRanges
WHERE DateValue < @DateTo
)
SELECT d.Datevalue, COUNT(distinct L.user_name)
FROM DateRanges d
LEFT JOIN LOGIN_LOG L
ON d.DateValue BETWEEN l.LOGIN_TIME AND l.LOGOUT_TIME
GROUP BY d.DateValue
ORDER BY d.DateValue
option (maxrecursion 0)

My spider sense is telling me you want to add a TOP 1 / ORDER BY DateValue DESC to the recursive half of your CTE.
It probably doesn't matter but it feels necessary.

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Spikes32 posted:

So now I gotta decide what the average user session is and substitute that for the null values.

The session cookies should be set to expire after some time and/or there should be a max idle time for users

Xae
Jan 19, 2005

I've got two tables that are throwing problems at me.

They're the same table, but we have to maintain US and EU data centers with different data in them.

They're partitioned tables. In the US queries on the table will use the indexes correctly except for one partition. Which sucks, because a seq scan can add 5-10 seconds.

In the EU it will never use the index on any partition. Which means our update process goes from minutes to ten or so hours as it full table scans a billion rows a few times.

I verified that the indexes show as valid.
I've VACUUM ANALYZE'ed the "bad" partitions.
I've VACUUM ANALYZE'ed the whole tables.
I've REINDEXed and then VACUUM ANALYZE'ed the tables AND partitions.


What's next? Drop the indexes on the partition and try again?

Spikes32
Jul 25, 2013

Happy trees

Munkeymon posted:

The session cookies should be set to expire after some time and/or there should be a max idle time for users

I would love for that to be the case but it's not. The users stay logged in indefinitely to the program, because any info that's been entered can't be lost, and it would be if the user hadn't saved prior to leaving the computer. Yay GMP labs!

ModeSix
Mar 14, 2009

So, kind of an SQL Newbie here and I am trying to get something to work on Microsoft SQL Server 2005 (yay for old outdated poo poo).

I can get it to work properly on my local machine (running SQL Server 2019) but when I run it on the 2005 server it errors out.

Query:
code:
MERGE CustomDB.[dbo].StockCounts AS [Target] 
USING (SELECT ID, 
              ProductNo 
       FROM   LabelTraxxReplication.[dbo].STOCKPRODUCT) AS [Source] (ID, 
      ProductNo) 
ON [Target].ID = [Source].ID 
WHEN NOT MATCHED THEN 
  INSERT (id, 
          ProductNo, 
          CountDate, 
          CountID) 
  VALUES ([Source].ID, 
          [Source].ProductNo, 
          NULL, 
          NULL); 
Error:
code:
Msg 102, Level 15, State 1, Line 1
Incorrect syntax near '.'.
Msg 156, Level 15, State 1, Line 4
Incorrect syntax near the keyword 'AS'.
Now I don't know enough about the differences here to see why this would error out, nor how I would go about searching for this (I really don't do SQL very often and sort of had to Google this to make it work in the first place).

Basically I want to copy/merge items from a source database into the target database and add new ones that might get added to the source if they are not found in the target.

If someone can help me either unfuck this one to work on SQL Server 2005 or propose/give me an example of a different solution that will accomplish the same thing and work on SQL Server 2005 that would be awesome and I would forever be indebted.

ModeSix fucked around with this message at 03:49 on Jun 23, 2020

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


ModeSix posted:

So, kind of an SQL Newbie here and I am trying to get something to work on Microsoft SQL Server 2005 (yay for old outdated poo poo).

I can get it to work properly on my local machine (running SQL Server 2019) but when I run it on the 2005 server it errors out.

Query:
code:
MERGE CustomDB.[dbo].StockCounts AS [Target] 
USING (SELECT ID, 
              ProductNo 
       FROM   LabelTraxxReplication.[dbo].STOCKPRODUCT) AS [Source] (ID, 
      ProductNo) 
ON [Target].ID = [Source].ID 
WHEN NOT MATCHED THEN 
  INSERT (id, 
          ProductNo, 
          CountDate, 
          CountID) 
  VALUES ([Source].ID, 
          [Source].ProductNo, 
          NULL, 
          NULL); 
Error:
code:
Msg 102, Level 15, State 1, Line 1
Incorrect syntax near '.'.
Msg 156, Level 15, State 1, Line 4
Incorrect syntax near the keyword 'AS'.
Now I don't know enough about the differences here to see why this would error out, nor how I would go about searching for this (I really don't do SQL very often and sort of had to Google this to make it work in the first place).

Basically I want to copy/merge items from a source database into the target database and add new ones that might get added to the source if they are not found in the target.

If someone can help me either unfuck this one to work on SQL Server 2005 or propose/give me an example of a different solution that will accomplish the same thing and work on SQL Server 2005 that would be awesome and I would forever be indebted.
I don't recall if SQL server 2005 has the MERGE statement.

You could do something like
INSERT Target
(
Columns to insert...
)
SELECT
Source.column,
...
FROM LabelTraxxReplication.[dbo].STOCKPRODUCT AS Source
LEFT OUTER JOIN CustomDB.[dbo].StockCounts AS [Target]
ON ...
WHERE Target.id IS NULL;

Sorry for the formatting and not spelling all of the columns out. I'm on my phone and also exhausted.

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

MERGE was introduced with SQL Server 2008, can confirm.

ModeSix
Mar 14, 2009

Thanks for that clarification guys.

So this is what I got and it works!

code:
INSERT INTO CustomDB.[dbo].StockCounts (
    id, 
    ProductNo, 
    CountDate, 
    CountID
)
SELECT        
    [Source].ID, 
    [Source].ProductNo, 
    NULL, 
    NULL
FROM Replication.[dbo].STOCKPRODUCT AS [Source]
LEFT JOIN CustomDB.[dbo].StockCounts AS [Target] ON [Target].ID = [Source].ID
WHERE [TARGET].Id IS NULL;

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


ModeSix posted:

Thanks for that clarification guys.

So this is what I got and it works!

code:
INSERT INTO CustomDB.[dbo].StockCounts (
    id, 
    ProductNo, 
    CountDate, 
    CountID
)
SELECT        
    [Source].ID, 
    [Source].ProductNo, 
    NULL, 
    NULL
FROM Replication.[dbo].STOCKPRODUCT AS [Source]
LEFT JOIN CustomDB.[dbo].StockCounts AS [Target] ON [Target].ID = [Source].ID
WHERE [TARGET].Id IS NULL;

:krad:

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I have an upcoming interview for a Data Warehouse Specialist position in a health system. My background is nursing and clinical informatics.

I realize this is broad but any recommendations on questions to ask during the interview, specific to data warehouse work? The job description is pretty vague about specific technologies used. I have experience with T-SQL, Excel, Tableau etc... but I haven't had any exposure to ETL, Hadoop, Spark, that type stuff.

I've been doing healthcare reporting, workflows, and analytics for 6 years so I'm comfortable with the work (or my ability to google stuff I don't know). I'm just not too familiar with the more IT side of this position.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Ask if the ETL team is separate from the reporting team. Important to know if you’re going to be doing SSRS or Crystal or just the data warehouse work.

Ask if you’ll be working on their clinical system data or if the team is focused on internal data (employees, payroll, revenue, etc)? Typically very different stakeholders and systems.

Ask what healthcare software they use? Epic or Cerner? If you will be doing healthcare DW you should know if your work is focused on one of the enterprise software vendors' products (I work at Epic) as this may highly impact your focus (they'd want you to be certified and it's all proprietary stuff).

Ask what the ETL pipeline looks like. What tech stack? What timing? Nightly ETL only or mid day or some sort of streaming solution? Good to know for what to expect, especially when it comes to off-hours work expectations / on call shifts.

Ask what version of the tech stack they are on. Are they keeping up to date with releases? Can you use the newest tech or are you shackled to decades-old releases? I wouldn't want to work somewhere that refused to upgrade past SQL Server 2008, for example.

Ask the size of the team, and especially if you are interviewing with one of them, ask their career path and how they got to where they are. Ask what projects they work on that motivate or excite them. I wouldn’t want to work somewhere where someone couldn’t at least bullshit a convincing answer.

Ask what their WFH policy is during this period. Make sure you don’t sign up for a company with a terrible stance towards coronavirus.

That’s all I got in the moment but happy to answer other questions if you got em.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Ruggan posted:

Good stuff...

You are a saint. This is exactly what I was looking for, thank you!

NPR Journalizard
Feb 14, 2008

Is there an easy way to back up a DB, and then restore it to the same SQL server under a different name, but with all the DB objects like stored procs and views pointing at the new name?

I know I can generate scripts with a drop and create option, and I think that will cover a bunch of the items, but I'm not sure it will cover all of them.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


NPR Journalizard posted:

Is there an easy way to back up a DB, and then restore it to the same SQL server under a different name, but with all the DB objects like stored procs and views pointing at the new name?

I know I can generate scripts with a drop and create option, and I think that will cover a bunch of the items, but I'm not sure it will cover all of them.

Most engines will let you do a rename during restore and recovery. Which database are you using here?

NPR Journalizard
Feb 14, 2008

Moonwolf posted:

Most engines will let you do a rename during restore and recovery. Which database are you using here?

MSSQL 2012 or above.

I can rename the database during the restore, no problems. It's the stored procs and views, of which there are quite a lot, that are the pressing concern. Can't have them reference back to the original DB name, because that's going to break a lot of stuff in a live environment.
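
If it helps, a sketch of how I'd at least find the offenders after the restore ('OldDbName' is a placeholder for the original database name):
code:
-- Run in the restored copy: lists procs/views/functions whose definition
-- still mentions the old database by name.
SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id)        AS object_name
FROM   sys.sql_modules
WHERE  definition LIKE '%OldDbName%';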

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



I think you want https://docs.microsoft.com/en-us/sql/relational-databases/data-tier-applications/export-a-data-tier-application

NPR Journalizard
Feb 14, 2008


That looks exactly like what I want, thank you very much.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Ruggan posted:

Ask if the ETL team is separate from the reporting team. Important to know if you’re going to be doing SSRS or Crystal or just the data warehouse work.

Ask if you’ll be working on their clinical system data or if the team is focused on internal data (employees, payroll, revenue, etc)? Typically very different stakeholders and systems.

Ask what healthcare software they use? Epic or Cerner? If you will be doing healthcare DW you should know if your work is focused on one of the enterprise software vendors' products (I work at Epic) as this may highly impact your focus (they'd want you to be certified and it's all proprietary stuff).

Ask what the ETL pipeline looks like. What tech stack? What timing? Nightly ETL only or mid day or some sort of streaming solution? Good to know for what to expect, especially when it comes to off-hours work expectations / on call shifts.

Ask what version of the tech stack they are on. Are they keeping up to date with releases? Can you use the newest tech or are you shackled to decades-old releases? I wouldn't want to work somewhere that refused to upgrade past SQL Server 2008, for example.

Ask the size of the team, and especially if you are interviewing with one of them, ask their career path and how they got to where they are. Ask what projects they work on that motivate or excite them. I wouldn’t want to work somewhere where someone couldn’t at least bullshit a convincing answer.

Ask what their WFH policy is during this period. Make sure you don’t sign up for a company with a terrible stance towards coronavirus.

That’s all I got in the moment but happy to answer other questions if you got em.

Had the interview. Couple of thought/questions:
  • Sounds like ETL team and reporting team are the same. I'd be doing a mixture of ETL work and reporting/visualizations
  • They are a Cerner shop, and I'd be working a lot with clinical data due to my clinical background
  • For ETL pipeline they said a lot of stuff. Sounds like they use Hadoop (I think she said that), and a Cerner EDW.
  • Anyone use Vertica? Sounds like that is their flavor of SQL.

Overall, sounded like it could be a cool gig. I'd have to brush up on my tech skills a little more. No on-call which is nice. 100% remote right now, will revisit options down the road as COVID progresses.

Xae
Jan 19, 2005

I've got a three day weekend to solve this problem in Postgres

code:
SELECT *
  FROM TableA
 WHERE TimeStampA > (SELECT max(TimestampB) FROM TableB)
Runs in 8 hours, because TableA is partitioned and Postgres will not use the index on TimestampA on most of the partitions.

But if I do this:
code:
SELECT *
  FROM TableA
 WHERE TimeStampA > ${CONSTANT}
Uses the index and runs in seconds.

As a bonus if I do this
code:
SELECT *
  FROM TableA_PartitionThatNormallyWontUseAnIndex
 WHERE TimeStampA> (SELECT max(TimestampB) FROM TableB)
It uses the index and finishes instantly.

We've got an implementation team at MS who is supposed to be helping with this poo poo, but because of the holiday they're kinda MIA. We've only gotten an acknowledgement that "That... shouldn't be happening".

I've messed around and done more REINDEX, VACUUM and ANALYZE commands than I can count. Hell, I even detached the partitions, clustered on the index on TimeStampA and reattached. Still does a sequential scan.

Pardot
Jul 25, 2001




Maybe if you pull the select max up into a cte?

Xae
Jan 19, 2005

Pardot posted:

Maybe if you pull the select max up into a cte?

No change.

But that did get me to think about encapsulating the subquery in a function, which seems to work? There is one sticky partition that won't use the index, but 5 minutes beats a few hours.

Thanks!
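
For reference, this is roughly what the function wrapper looks like (a sketch with the made-up names from above; the return type is a guess):
code:
-- STABLE tells the planner the value can't change within a statement, which
-- is what lets runtime partition pruning kick in (PG 11+).
CREATE FUNCTION max_timestamp_b() RETURNS timestamptz AS $$
    SELECT max(TimestampB) FROM TableB;
$$ LANGUAGE sql STABLE;

SELECT *
FROM   TableA
WHERE  TimeStampA > max_timestamp_b();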

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Not a Postgres expert (I do mostly MS SQL), but maybe do something like this?

code:
DECLARE @maxDt as datetime = (SELECT max(TimestampB) FROM TableB)

SELECT *
  FROM TableA
 WHERE TimeStampA > @maxDt
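
I think on the Postgres side there's no bare DECLARE outside a function, but from psql the same effect is possible by capturing the value client-side first (a sketch, same made-up names):
code:
-- psql only: \gset stores the query result in a psql variable named maxdt.
SELECT max(TimestampB) AS maxdt FROM TableB \gset

SELECT *
FROM   TableA
WHERE  TimeStampA > :'maxdt';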

  • Reply