uh
|
|
# ? Jun 1, 2020 20:44 |
|
|
# ? Jun 8, 2024 09:13 |
|
You keep feeding us little pieces, almost like a sequence scan that has to calculate every single result before it can perform the sort. The table stats might not be sufficient to know that N null rows will actually induce so many calculations (versus histogram stats of values, which are easier to plan). One trouble with that pseudoview is that it can short-circuit if col1 is null, but there's no way to know that until the rows have been joined.

This is like almost every other problem in computer science: Minimal. Working. Example. Start ripping out join tables and columns. Where does it break down? Is it linear (with the number of joined columns, tables)? Are there certain pairs that misbehave? Is it fast as a query but slow as a view? What does EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING) show (see the docs)? What do the perf tables say about memory consumption? Does it have enough work_mem? There's tonnes of things to check.
|
# ? Jun 4, 2020 06:07 |
Yeah I've been trying to reveal as much as I can without actually showing much in the way of schema details because proprietary code, etc. Based on experimentation so far it looks like the answer is that when the query hits a certain size it just rolls over and dies. It has nothing to do with the columns I called out specifically, we've found other bits and bobs that, when removed, result in a sane query plan.
|
|
# ? Jun 4, 2020 14:30 |
|
alright, another question. what version of sql handles decentralization/distribution best? like i could see needing to have this one db i'm working with on 5-10 servers, maybe more. what techniques are best? is there sort of a primer on this? i've only managed centralized databases so this is all new to me. and yea, i know most versions of nosql are all about this, but the data we're dealing with is pretty straightforward. the structure won't change much once we get it finalized.

or maybe i'm thinking of some form of instant replication? like, i'm imagining having two servers, say in nyc and california, and for prime speed, i'd want west coast users to hit the ca server and east coast users to hit the nyc server. but i'd want to make sure the information inserted into one server is replicated over fairly quickly to the other.

or is it better to just have one centralized server that is juiced up? thinking we'd ultimately have a ton of connections and calls so just trying to lighten the load based on location. or am i overthinking this?

abelwingnut fucked around with this message at 17:45 on Jun 4, 2020 |
# ? Jun 4, 2020 17:40 |
|
abelwingnut posted:or is better to just have one centralized server that is juiced up? thinking we'd ultimately have a ton of connections and calls so just trying to lighten the load based on location. or am i overthinking this?

if you can solve your problems with one server it's a lot easier, yeah. Or at least a master/slave replication setup with failover redundancy.

true distributed databases are hard, especially geographically separated ones. What happens when two applications modify the same record at the same time? You have to either wait for the other server to acknowledge every lock, or accept at the application level that some writes are going to fail. Sidestepping that question entirely by having a single-point-of-truth master server is way easier and way faster.

With a master/slave replication setup you can use something like Postgres and it's free. A distributed database is one of the few use-cases you really need to shell out for a commercial database like Oracle for, and that is both expensive and risky from a business perspective, and the performance probably will not be what you are hoping for.

Paul MaudDib fucked around with this message at 20:50 on Jun 4, 2020 |
# ? Jun 4, 2020 18:19 |
|
Paul MaudDib posted:What happens when two applications modify the same record at the same time? Pretty much this: 120-200ms+ minimum network latency to acknowledge that the other side has gotten a write sounds painful.
|
# ? Jun 4, 2020 20:39 |
|
abelwingnut posted:or maybe i'm thinking of some form of instant replication? like, i'm imagining having two servers, say in nyc and california, and for prime speed, i'd want west coast users to hit the ca server and east coast users to hit the nyc server. but i'd want to make sure the information inserted into one server is replicated over fairly quickly to the other.

The trouble is you're up against Brewer ("CAP theorem"), and most solutions drop 'partition tolerance' (and you actually can't just "choose two of three"), so you're going to have to rebuild and backfill servers after events. Failback automation is rather difficult. If you're going to go there, you need a reason, because it'll cost money.

(I've never seen a "multi-master" solution work on a scaled production system. The support overhead increases exponentially. To be fair, I haven't tried it or researched it for a few years. You can build such a system on top of most existing databases.)
|
# ? Jun 4, 2020 23:39 |
|
ChickenWing posted:Yeah I've been trying to reveal as much as I can without actually showing much in the way of schema details because proprietary code, etc.

Check any subqueries or CTEs you have. Postgres is pretty iffy about pushing filter criteria down into them. Also, sorting 4 billion records in under 500kb of memory sounds real fishy.

Xae fucked around with this message at 03:33 on Jun 8, 2020 |
# ? Jun 8, 2020 03:31 |
|
I've got a user login tracking table that tracks when a user logs in and logs out. I want to figure out the max concurrent users for every thirty minute interval going back six months. I'm new to all this still, can anyone shoot out some keywords I can use to figure out how to do this?
|
# ? Jun 18, 2020 22:15 |
Actual max concurrent, or number of distinct users seen within each interval? The answer is probably going to be different, with the latter probably being easier.
|
|
# ? Jun 18, 2020 22:27 |
|
Spikes32 posted:I've got a user login tracking table that tracks when a user logs in and logs out. I want to figure out the max concurrent users for every thirty minute interval going back six months. I'm new to all this still, can anyone shoot out some keywords I can use to figure out how to do this?

if I'm reading this right, I could imagine a setup where you populate a table with each datetime interval you want (e.g. 1/1/2020 00:00:00, 1/1/2020 00:30:00, etc), then left join that table to your user login tracking table using BETWEEN and get a count, e.g.

code:
SELECT d.TimeStamp, COUNT(DISTINCT u.userID)
FROM dateTable d
LEFT JOIN userLoginTrackingTable u
    ON d.TimeStamp BETWEEN u.userLoginTime AND u.userLogoutTime
WHERE d.TimeStamp >= DATEADD(MONTH, -6, CAST(current_timestamp AS DATE))
GROUP BY d.TimeStamp
ORDER BY d.TimeStamp

using a left join so you don't throw out intervals where there's no one logged in. i think that works here? no idea how performant it would be but that's where i'd start probably

edit: vvvvv good call! updated the above to account for that

kumba fucked around with this message at 22:37 on Jun 18, 2020 |
# ? Jun 18, 2020 22:28 |
kumba posted:if I'm reading this right, I could imagine a setup where you populate a table with each datetime interval you want (e.g. 1/1/2020 00:00:00, 1/1/2020 00:30:00, etc), then left join that table to your user login tracking table on that using BETWEEN and get a count Yeah something like that will probably work, but I think you should change to count(distinct userid) to not get the same user logging in/out repeatedly.
|
|
# ? Jun 18, 2020 22:31 |
|
nielsm posted:Actual max concurrent, or number of distinct users seen within each interval? The answer is probably going to be different, with the latter probably being easier. I need max concurrent not number of unique users per interval sadly. I'm figuring out if we need to buy more licenses for this piece of software. And unfortunately the database I'm working with is validated, so I can't go adding another table into it willy nilly. Is there a way to do this with just a query? I can export the data into excel and kludge something together if needed, but that sounds even more painful.
|
# ? Jun 18, 2020 22:56 |
|
Spikes32 posted:I need max concurrent not number of unique users per interval sadly. I'm figuring out if we need to buy more licenses for this piece of software. And unfortunately the database I'm working with is validated, so I can't go adding another table into it willy nilly. Is there a way to do this with just a query? I can export the data into excel and kludge something together if needed, but that sounds even more painful. You could generate the timestamps on the fly in the query with like a CTE or something. Calculate the number of buckets in the time period you care about, and the earliest time for the series. Then do something like DATEADD(bucket unit, bucket size in unit, previous bucket timestamp) to get the next bucket. I'm phone posting and can't get to a PC to sketch this out.
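A minimal sketch of that on-the-fly bucket generation, since phone-posting got in the way: this uses SQLite's recursive CTE and datetime() standing in for T-SQL's DATEADD, and the date range is made up just to show the shape.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Build 30-minute buckets on the fly with a recursive CTE instead of a
# permanent calendar table. SQLite's datetime(ts, '+30 minutes') stands
# in for T-SQL's DATEADD(MINUTE, 30, ts).
rows = con.execute("""
    WITH RECURSIVE buckets(ts) AS (
        SELECT '2020-01-01 00:00:00'
        UNION ALL
        SELECT datetime(ts, '+30 minutes') FROM buckets
        WHERE ts < '2020-01-01 02:00:00'
    )
    SELECT ts FROM buckets
""").fetchall()

for (ts,) in rows:
    print(ts)
```

Each bucket timestamp can then be left-joined against the login table exactly as in the earlier suggestion.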
|
# ? Jun 18, 2020 23:11 |
Assign +1 and -1 values to the log in and log out events. Make sure you have one row per event, not per session, you may have to union the table with itself. Use a window function to keep a running sum of log in/out events. Join that with your time dimension to take a max of every 30 minute interval. Your RDBMS probably has a way to make temporary tables, use that. Temporary tables are private to the connection and disappear when you disconnect.
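The running-sum idea above, sketched in plain Python with made-up sessions (times as minutes since midnight). It's the same +1/-1 event stream and cumulative sum a window function would compute; in SQL you'd then group the running sum by 30-minute bucket and take MAX per bucket.

```python
# One event row per login (+1) and per logout (-1), sorted by time; the
# running sum is the concurrency at each event, and its max is the peak.
# Note: sorting tuples puts (t, -1) before (t, +1) at equal times, so a
# logout at the same instant as a login doesn't inflate the peak.
sessions = [(0, 45), (10, 30), (20, 90), (50, 70)]  # (login, logout), made up

events = sorted(
    [(t_in, +1) for t_in, _ in sessions] +
    [(t_out, -1) for _, t_out in sessions]
)

concurrent = 0
peak = 0
for _, delta in events:
    concurrent += delta
    peak = max(peak, concurrent)

print(peak)  # prints 3: sessions 1-3 all overlap between t=20 and t=30
```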
|
|
# ? Jun 18, 2020 23:16 |
|
nielsm posted:Assign +1 and -1 values to the log in and log out events. Make sure you have one row per event, not per session, you may have to union the table with itself. I’d do it this way too. Union itself to itself to separate log in, log out into two rows as nielsm mentioned.
|
# ? Jun 18, 2020 23:37 |
|
Thanks everyone! When I'm at work tomorrow I'll start working with these ideas.
|
# ? Jun 18, 2020 23:58 |
|
So after staring at window functions and UNION ALL for a while, and temporary tables not working with the access I have to the DB, this is what I ended up doing. Temporary tables kinda worked, I could create them, just not see them or access them after the fact for some dumb reason. Major thanks to everyone who helped out. It took about 3 mins to execute, which was fine for the one-time use I needed it for.

I was surprised to find that the user count was much lower than expected, and way lower than I knew it to be for this past Wednesday. After looking more at the raw data, I found that about half of the logout entries had null values. Turns out the program enters null if the user X's out with the upper right button instead of clicking the logout button or going File > Logout. So now I gotta decide what the average user session is and substitute that for the null values.

code:
DECLARE @DateFrom DATETIME
SET @DateFrom = '2020-04-20 00:00'
DECLARE @DateTo DATETIME
SET @DateTo = '2020-06-19'

;WITH DateRanges AS (
    SELECT @DateFrom AS 'DateValue'
    UNION ALL
    SELECT DATEADD(MINUTE, 10, DateValue)
    FROM DateRanges
    WHERE DateValue < @DateTo
)
SELECT d.DateValue, COUNT(DISTINCT L.user_name)
FROM DateRanges d
LEFT JOIN LOGIN_LOG L ON d.DateValue BETWEEN L.LOGIN_TIME AND L.LOGOUT_TIME
GROUP BY d.DateValue
ORDER BY d.DateValue
OPTION (MAXRECURSION 0)
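For the null-logout substitution, a hedged sketch of one way to do it in-query rather than hardcoding a number: compute the average session length from the rows that do have a logout, then COALESCE the missing ones to login + that average. SQLite is standing in for the real engine here; the table and column names echo the query above, and the data is made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE login_log (user_name TEXT, login_time TEXT, logout_time TEXT)")
con.executemany(
    "INSERT INTO login_log VALUES (?, ?, ?)",
    [
        ("alice", "2020-06-01 09:00:00", "2020-06-01 10:00:00"),
        ("bob",   "2020-06-01 09:30:00", "2020-06-01 10:30:00"),
        ("carol", "2020-06-01 09:45:00", None),  # X'd out, logout never written
    ],
)

# Average session length in minutes over rows with a real logout,
# then COALESCE each missing logout to login + that average.
rows = con.execute("""
    WITH avg_len AS (
        SELECT AVG((julianday(logout_time) - julianday(login_time)) * 24 * 60) AS mins
        FROM login_log
        WHERE logout_time IS NOT NULL
    )
    SELECT user_name,
           COALESCE(
               logout_time,
               datetime(login_time,
                        '+' || (SELECT CAST(ROUND(mins) AS INTEGER) FROM avg_len) || ' minutes')
           ) AS logout_filled
    FROM login_log
    ORDER BY user_name
""").fetchall()

for name, logout in rows:
    print(name, logout)
```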
|
# ? Jun 19, 2020 18:44 |
|
Spikes32 posted:So after staring at windows function and union all for a while, and temporary tables not working with the access I have to the dB, this is what I ended up doing. Temporary tables kinda worked, I could create them just not see them or access them after the fact for some dumb reason. Major thanks to everyone who helped out. It took about 3 mins to execute, which was fine for the one time use I needed it for. My spider sense is telling me you want to add a TOP 1 / ORDER BY DateValue DESC to the recursive half of your CTE. It probably doesn't matter but it feels necessary.
|
# ? Jun 19, 2020 19:08 |
|
Spikes32 posted:So now I gotta decide what the average user session is and substitute that for the null values. The session cookies should be set to expire after some time and/or there should be a max idle time for users
|
# ? Jun 22, 2020 13:53 |
|
I've got two tables that are throwing problems at me. They're the same table, but we have to maintain US and EU data centers with different data in them. They're partitioned tables.

In the US, queries on the table will use the indexes correctly except for one partition. Which sucks, because a seq scan can add 5-10 seconds. In the EU it will never use the index on any partition. Which means our update process goes from minutes to ten or so hours as it full-table-scans a billion rows a few times.

I verified that the indexes show as valid. I've VACUUM ANALYZE'd the "bad" partitions. I've VACUUM ANALYZE'd the whole tables. I've REINDEXed and then VACUUM ANALYZE'd the tables AND the partitions.

What's next? Drop the indexes on the partition and try again?
|
# ? Jun 22, 2020 15:38 |
|
Munkeymon posted:The session cookies should be set to expire after some time and/or there should be a max idle time for users

I would love for that to be the case, but it's not. The users stay logged in to the program indefinitely, because anything entered can't be lost, and it would be if the user hadn't saved before leaving the computer. Yay GMP labs!
|
# ? Jun 22, 2020 16:49 |
|
So, kind of an SQL Newbie here and I am trying to get something to work on Microsoft SQL Server 2005 (yay for old outdated poo poo). I can get it to work properly on my local machine (running SQL Server 2019) but when I run it on the 2005 server it errors out. Query: code:
code:
Basically I want to copy/merge items from a source database into the target database and add new ones that might get added to the source if they are not found in the target. If someone can help me either unfuck this one to work on SQL Server 2005 or propose/give me an example of a different solution that will accomplish the same thing and work on SQL Server 2005 that would be awesome and I would forever be indebted. ModeSix fucked around with this message at 03:49 on Jun 23, 2020 |
# ? Jun 23, 2020 03:26 |
|
ModeSix posted:So, kind of an SQL Newbie here and I am trying to get something to work on Microsoft SQL Server 2005 (yay for old outdated poo poo).

You could do something like

code:
INSERT Target ( Columns to insert... )
SELECT Source.column, ...
FROM LabelTraxxReplication.[dbo].STOCKPRODUCT AS Source
LEFT OUTER JOIN CustomDB.[dbo].StockCounts AS [Target]
    ON ...
WHERE [Target].id IS NULL;

Sorry for the formatting and not spelling all of the columns out. I'm on my phone and also exhausted.
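The shape of that pre-MERGE two-step, spelled out against toy tables (SQLite here, and all the table/column names are invented since the real schema wasn't shown): UPDATE the rows that match, then INSERT whatever the LEFT JOIN flags as missing.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO source VALUES (?, ?)", [(1, "a2"), (2, "b"), (3, "c")])
con.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a")])  # stale copy of row 1

# Step 1: update rows that already exist in the target.
con.execute("""
    UPDATE target
    SET name = (SELECT s.name FROM source s WHERE s.id = target.id)
    WHERE id IN (SELECT id FROM source)
""")

# Step 2: insert rows the LEFT JOIN shows as missing (t.id IS NULL).
con.execute("""
    INSERT INTO target (id, name)
    SELECT s.id, s.name
    FROM source s
    LEFT JOIN target t ON t.id = s.id
    WHERE t.id IS NULL
""")

rows = con.execute("SELECT id, name FROM target ORDER BY id").fetchall()
print(rows)
```

Two statements instead of one MERGE, but it runs on anything, 2005 included.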
|
# ? Jun 23, 2020 05:30 |
|
MERGE was introduced with SQL Server 2008, can confirm.
|
# ? Jun 23, 2020 09:34 |
|
Thanks for that clarification guys. So this is what I got and it works! code:
|
# ? Jun 23, 2020 21:35 |
|
ModeSix posted:Thanks for that clarification guys.
|
# ? Jun 23, 2020 23:49 |
|
I have an upcoming interview for a Data Warehouse Specialist position in a health system. My background is nursing and clinical informatics. I realize this is broad but any recommendations on questions to ask during the interview, specific to data warehouse work? The job description is pretty vague about specific technologies used. I have experience with T-SQL, Excel, Tableau etc... but I haven't had any exposure to ETL, Hadoop, Spark, that type stuff. I've been doing healthcare reporting, workflows, and analytics for 6 years so I'm comfortable with the work (or my ability to google stuff I don't know). I'm just not too familiar with the more IT side of this position.
|
# ? Jun 28, 2020 01:50 |
|
Ask if the ETL team is separate from the reporting team. Important to know if you're going to be doing SSRS or Crystal or just the data warehouse work.

Ask if you'll be working on their clinical system data or if the team is focused on internal data (employees, payroll, revenue, etc). Typically very different stakeholders and systems.

Ask what healthcare software they use. Epic or Cerner? If you will be doing healthcare DW you should know if your work is focused in one of the enterprise software vendors' products (I work at Epic) as this may highly impact your focus (they'd want you to be certified and it's all proprietary stuff).

Ask what the ETL pipeline looks like. What tech stack? What timing? Nightly ETL only, or mid-day, or some sort of streaming solution? Good to know for what to expect, especially when it comes to off-hours work expectations / on-call shifts.

Ask what version of the tech stack they are on. Are they keeping up to date with releases? Can you use the newest tech or are you shackled to decades-old releases? I wouldn't want to work somewhere that refused to upgrade past SQL Server 2008, for example.

Ask the size of the team, and especially if you are interviewing with one of them, ask their career path and how they got to where they are. Ask what projects they work on that motivate or excite them. I wouldn't want to work somewhere where someone couldn't at least bullshit a convincing answer.

Ask what their WFH policy is during this period. Make sure you don't sign up for a company with a terrible stance towards coronavirus.

That's all I got in the moment, but happy to answer other questions if you got em.
|
# ? Jun 28, 2020 04:01 |
|
Ruggan posted:Good stuff... You are a saint. This is exactly what I was looking for, thank you!
|
# ? Jun 28, 2020 21:22 |
Is there an easy way to backup a DB, and then restore it to the same SQL server under a different name, but with all the DB objects like stored procs and views pointing at the new name? I know I can generate scripts with a drop and create option, and I think that will cover a bunch of the items, but I'm not sure it will cover all of them.
|
|
# ? Jul 1, 2020 04:51 |
|
NPR Journalizard posted:Is there an easy way to backup a DB, and then restore it to the same SQL server under a different name, but with all the DB objects like stored procs and views pointing at the new name? Most engines will let you do a rename during restore and recovery, which database are you using here for this?
|
# ? Jul 1, 2020 09:10 |
Moonwolf posted:Most engines will let you do a rename during restore and recovery, which database are you using here for this?

MSSQL 2012 or above. I can rename the database during the restore, no problem. It's the stored procs and views, of which there are quite a lot, that are the pressing concern. Can't have them reference back to the original DB name, because that's going to break a lot of stuff in a live environment.
|
|
# ? Jul 1, 2020 10:50 |
|
I think you want https://docs.microsoft.com/en-us/sql/relational-databases/data-tier-applications/export-a-data-tier-application
|
# ? Jul 1, 2020 13:36 |
Munkeymon posted:I think you want https://docs.microsoft.com/en-us/sql/relational-databases/data-tier-applications/export-a-data-tier-application That looks exactly like what I want, thank you very much.
|
|
# ? Jul 2, 2020 05:36 |
|
Ruggan posted:Ask if the ETL team is separate from the reporting team. Important to know if you’re going to be doing SSRS or Crystal or just the data warehouse work. Had the interview. Couple of thoughts/questions:
Overall, sounded like it could be a cool gig. I'd have to brush up on my tech skills a little more. No on-call which is nice. 100% remote right now, will revisit options down the road as COVID progresses.
|
# ? Jul 2, 2020 20:06 |
|
I've got a three day weekend to solve this problem in Postgres

code:
But if I do this: code:
As a bonus if I do this code:
We've got an implementation team at MS who is supposed to be helping with this poo poo, but because of the holiday they're kinda MIA. We've only gotten an acknowledgement that "That... shouldn't be happening". I've messed around and done more REINDEX, VACUUMs and ANALYZE commands than I can count. Hell, I even detached the partitions, clustered on the index on TimeStampA and reattached. Still does a sequence scan.
|
# ? Jul 3, 2020 04:43 |
|
Maybe if you pull the select max up into a cte?
|
# ? Jul 3, 2020 19:24 |
|
Pardot posted:Maybe if you pull the select max up into a cte? No change. But that did get me to think about encapsulating the subquery in a function, which seems to work? There is one sticky partition that won't use the index, but 5 minutes beats a few hours. Thanks!
|
# ? Jul 3, 2020 20:38 |
|
|
|
Not a Postgres expert (I do mostly MS SQL), but maybe do something like this?

code:
|
# ? Jul 4, 2020 03:04 |