|
Jabor posted:The use case for uuids is "I want to generate records in many disparate servers at the same time, and not have to deal with collisions when we eventually reconcile them". Note that you can replace "many disparate servers" with "many disparate clients" and the same principle holds, without any need for your application to be anything more than a boring client-server monolith. It's very nice when your mobile fart app can just locally generate a globally unique FartID and then keep POSTing your fart until success without any concerns about accidental double farts or whatever.
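The retry-until-success part works because the client-generated key makes the insert idempotent. A minimal sketch in Postgres-flavored SQL (table and column names invented for the example):

```sql
-- The client generates fart_id locally and retries the same INSERT
-- until it sees success. A replayed insert hits the primary key
-- and is skipped, so no accidental double farts.
CREATE TABLE farts (
    fart_id    uuid PRIMARY KEY,           -- generated on the client
    user_id    bigint NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

INSERT INTO farts (fart_id, user_id)
VALUES ('1f0c1e9e-4d2a-4b61-9c0f-2a8f8b3d5e77', 42)
ON CONFLICT (fart_id) DO NOTHING;          -- safe to retry
```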
|
# ? Oct 7, 2021 22:11 |
|
|
# ? Jun 10, 2024 18:41 |
|
there are people who don’t want surrogate keys? …why would one hate surrogate keys?
|
# ? Oct 8, 2021 09:33 |
|
There are other reasons to favour GUIDs as primary keys aside from distributed databases. https://en.wikipedia.org/wiki/German_tank_problem also: in some circumstances, the fact that primary key values in different tables cannot collide might provide greater confidence that you are not introducing bugs like joins to the wrong table.
|
# ? Oct 8, 2021 10:03 |
|
redleader posted:the only people who take offense at the idea of surrogate keys seem to be those who have crawled extremely up their own rear end using relational algebra Everyone else is talking about surrogate keys as the alternative to externally-meaningful keys, but I suspect you are talking about surrogate keys as the alternative to composite keys, which is a different dichotomy.
|
# ? Oct 8, 2021 10:08 |
|
Hammerite posted:Everyone else is talking about surrogate keys as the alternative to externally-meaningful keys, but I suspect you are talking about surrogate keys as the alternative to composite keys, which is a different dichotomy. nope, applies to both cases
|
# ? Oct 8, 2021 11:35 |
|
Well, what does the choice of real-world vs. surrogate keys have to do with relational algebra and one's view of it? The meaning of a key (or absence of meaning) is surely irrelevant to relational algebra.
|
# ? Oct 8, 2021 11:49 |
|
I'm happy to see there are finally some names given to some of the approaches I've advocated for instead of UUIDs in the past, and it seems that some databases have added functions to flop the high/low timestamps of UUIDs to reduce the btree insert performance issues. Maybe everyone will switch to ULIDs as the new hotness since there are libraries everywhere. Here's the best summary I stumbled across, a good blend of basic performance and summary of alternatives: https://brandur.org/nanoglyphs/026-ids
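The timestamp-first trick is also easy to approximate by hand if your database doesn't ship a helper for it. A sketch in Postgres, assuming the pgcrypto extension for gen_random_bytes(); it builds a ULID-style hex value (48-bit millisecond timestamp followed by 80 random bits), so lexicographic order roughly matches insertion order and new rows land at the right edge of the btree:

```sql
-- 12 hex chars of millisecond timestamp + 20 hex chars of randomness.
SELECT lpad(to_hex((extract(epoch FROM clock_timestamp()) * 1000)::bigint), 12, '0')
       || encode(gen_random_bytes(10), 'hex') AS ulid_hex;
```

(Real ULIDs use Crockford base32 rather than hex, but the ordering property is the same.)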
|
# ? Oct 9, 2021 15:51 |
|
gonna need some guidance on a stupid report i have to build. basically, i have to de-normalize? the data i have. so the data is of the form: code:
however, my coworker needs me to convert that into: code:
any help would be much appreciated. abelwingnut fucked around with this message at 19:11 on Oct 14, 2021 |
# ? Oct 14, 2021 19:08 |
I can't write the SQL off the top of my head, but you can probably write a CTE that ranks the items for each transaction with ordinal 1, 2, 3, 4, 5,... and then select a five-way self join on the CTE to construct rows with up to five ranked items from each.
|
|
# ? Oct 14, 2021 22:20 |
|
abelwingnut posted:gonna need some guidance on a stupid report i have to build. basically, i have to de-normalize? the data i have. What database vendor and version are you using? I actually posted some pivot code for a previous post recently. The numbered CTE is a good option if you have a set amount of columns, otherwise you'll probably need a dynamic approach.
|
# ? Oct 15, 2021 02:52 |
|
i’m using ms sql server.
|
# ? Oct 15, 2021 08:34 |
|
abelwingnut posted:i’m using ms sql server. This should work -- CTE method like nielsm suggested code:
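A sketch of the CTE method under assumed names (a Sales table with TransactionId and Item, pivoted to one row per transaction with up to five items):

```sql
WITH numbered AS (
    SELECT TransactionId,
           Item,
           ROW_NUMBER() OVER (PARTITION BY TransactionId
                              ORDER BY Item) AS rn
    FROM Sales
)
SELECT TransactionId,
       MAX(CASE WHEN rn = 1 THEN Item END) AS Item1,
       MAX(CASE WHEN rn = 2 THEN Item END) AS Item2,
       MAX(CASE WHEN rn = 3 THEN Item END) AS Item3,
       MAX(CASE WHEN rn = 4 THEN Item END) AS Item4,
       MAX(CASE WHEN rn = 5 THEN Item END) AS Item5
FROM numbered
GROUP BY TransactionId;
```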
|
# ? Oct 15, 2021 16:09 |
|
thanks! i'll check it out next week.
|
# ? Oct 16, 2021 15:21 |
|
got it working! thank you!
|
# ? Oct 18, 2021 21:25 |
|
|
I'm building a view-only dashboard that has related metadata on top of our main production database. I've created a separate database for the application with read/write, and I'm pulling in data from our production database using a view and a read-only connection. I was considering adding foreign keys in my app's database relating the metadata to the data in the view, but I started wondering if this would affect the production database in any way. Like, it wouldn't prevent a delete or something on the production database, would it?
|
# ? Nov 1, 2021 16:14 |
|
I'm new to MySQL did years and years of SQL Server. I want to turn off safe update mode on my dev db permanently on the stored connection level. Can I do that? Looking like I can't. Talking Workbench. Seems annoying. https://stackoverflow.com/questions/11448068/mysql-error-code-1175-during-update-in-mysql-workbench
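The session-level workaround, at least, is just a variable (and the Workbench-wide setting, if that's close enough, lives under Edit > Preferences > SQL Editor as a "Safe Updates" checkbox that takes effect on reconnect):

```sql
-- Per session, after connecting:
SET SQL_SAFE_UPDATES = 0;

-- ...run the UPDATE/DELETE that lacks a keyed WHERE clause...

-- and optionally turn it back on afterwards:
SET SQL_SAFE_UPDATES = 1;
```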
|
# ? Nov 1, 2021 16:58 |
|
I'm having a hell of a time trying to figure this out..if anyone has some pointers I'd be thrilled.. Let's say I have a table: code:
code:
Edit: Welp, I think I figured it out! I did an INNER JOIN on the same table then it generated two timestamp columns for me which I can use to get the diff. Huzzah! Bank fucked around with this message at 23:01 on Nov 3, 2021 |
# ? Nov 3, 2021 22:38 |
|
Bank posted:I'm having a hell of a time trying to figure this out..if anyone has some pointers I'd be thrilled.. So just to put it out there: EventStart and EventEnd are probably better suited to being two columns on a table representing the entire event rather than separate rows. Customer No, Event Key, EventStartTime, EventEndTime (nullable) Child records could be accessed by having the parent event key as a column on the child record's row
|
# ? Nov 4, 2021 22:19 |
|
Nth Doctor posted:So just to put it out there: I probably didn't explain it clearly, but the problem is each event is a bit flip. So for example: code:
|
# ? Nov 4, 2021 23:02 |
|
Bank posted:I probably didn't explain it clearly, but the problem is each event is a bit flip. So for example: Okay that helps give context. It still would simplify your queries if you at least had some sort of Application ID you could set to group all of those records that would be different from Customer 1 starting another application some other time. Feel free to ignore me
|
# ? Nov 5, 2021 01:12 |
|
No worries..in this case a customer should only go through the full process once. I didn't design this table, but probably would have done it differently if given the chance. I'm not a BSA/DBA by any measure though, I'm just trying to fix this stupid dashboard because someone decided to spend the past 3 months making excuses while waiting for their contract to end.
|
# ? Nov 5, 2021 01:25 |
|
Bank posted:No worries..in this case a customer should only go through the full process once. I can write the SQL better when I'm not on my phone, but one way to solve this problem is by joining to the table N times where N is the number of different steps.. So for example, select ... From table as t Inner join t as tp on t.customer=tp.customer Where t.type=submitted And tp.type =processed Etc. This will get you one row per customer where you have a column for submitted date times and processed date times,etc. From here it's easy to calculate the difference. Basically you are pivoting the data from rows to columns.
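Cleaned up into runnable form (table and column names assumed; DATEDIFF here is SQL Server's flavor, so adjust for your engine):

```sql
-- One row per customer with one timestamp column per step.
SELECT s.Customer,
       s.EventTimestamp AS SubmittedAt,
       p.EventTimestamp AS ProcessedAt,
       DATEDIFF(MINUTE, s.EventTimestamp, p.EventTimestamp) AS MinutesToProcess
FROM Events AS s
INNER JOIN Events AS p
        ON p.Customer = s.Customer
       AND p.Type = 'processed'
WHERE s.Type = 'submitted';
```

Add one more self-join per step you need a column for.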
|
# ? Nov 5, 2021 03:33 |
|
Condimentalist posted:I can write the SQL better when I'm not on my phone, but one way to solve this problem is by joining to the table N times where N is the number of different steps.. Yup, that's basically what I did (see the edit I had). Inner join on itself and I basically had access to twice the number of columns. It felt really hacky, but I guess other than re-designing the table this is probably the easiest way to do it.
|
# ? Nov 5, 2021 05:08 |
|
Bank posted:Yup, that's basically what I did (see the edit I had). Inner join on itself and I basically had access to twice the number of columns. It felt really hacky, but I guess other than re-designing the table this is probably the easiest way to do it. Depending on the engine you're using, you can probably use window functions and some filtering to do this too.
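A sketch of the window-function version (same assumed Events table; LEAD() works in Postgres, SQL Server, and MySQL 8+). It pairs every event with the next event's timestamp for that customer, so no self-join is needed and it handles any number of steps:

```sql
SELECT Customer,
       Type,
       EventTimestamp,
       LEAD(EventTimestamp) OVER (PARTITION BY Customer
                                  ORDER BY EventTimestamp) AS NextEventTimestamp
FROM Events;
-- NextEventTimestamp - EventTimestamp is the time spent in each step.
```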
|
# ? Nov 6, 2021 15:53 |
|
Hey all, hoping for some input on unfucking a Data Warehouse loading process that I'm about to inherit. Both live system and DWH are Oracle 19c databases. Right now, the intended process is that at 9 PM every day (well after anything happens in the live system), the DWH pulls in full copies of about a dozen "core" tables from the live system into its loading schema and processes from there. However, that's about all that's going right as of now. To wit:
I think what I want to achieve is a real-time DWH, but I freely admit I'm out of my depth here. How do I keep what's essentially copies of the live tables in the DWH loading schema with a "reasonable" lag? So far I've looked at:
Adding to the complications is that having real time (or near real time) data in the DWH means we would be loading while the live data is changing; how do I square having an always consistent overall version of all relevant tables in the DWH with the delays that arise from the replication?

And on the matter of data model changes in the source system: in a perfect world, they wouldn't happen outside of a coordinated release process, but we're still in the beginning stages of switching to that kind of forethought and planning. My pessimistic read is that the loading process needs to be as resilient as I can possibly make it because relying on anyone playing nice in this environment is a sucker's bet. Even right now, where there's way too much data lying around in the loading schema precisely because of this issue, that data is not complete as I see it because changes in the source objects mean new rows probably can't be inserted into the loading tables.

It'd be cool if it was able to replicate DDL on the source tables to the mirrored tables in the DWH's loading schema; alternatively I would accept if it dumped source rows transformed into JSON or XML into a CLOB column, deferring the issue of data model up to the point where I need to process data out of the loading schema.

Any pointers/opinions/relevant resources from you guys?
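One plain-Oracle option worth evaluating for the incremental part: materialized view logs on the live tables plus fast-refreshable materialized views in the loading schema. It does nothing for the DDL-drift problem, but it gives on-demand delta sync without extra licensing. A sketch with assumed schema and link names (fast refresh over a database link has documented restrictions, so check them against your tables):

```sql
-- On the live system: capture changes to the source table.
CREATE MATERIALIZED VIEW LOG ON live_app.orders
    WITH PRIMARY KEY
    INCLUDING NEW VALUES;

-- In the DWH loading schema, over a database link to the live system:
CREATE MATERIALIZED VIEW loading.orders_mv
    REFRESH FAST ON DEMAND
    AS SELECT * FROM live_app.orders@live_db_link;

-- Scheduled job pulls only the delta recorded in the MV log:
BEGIN
    DBMS_MVIEW.REFRESH('LOADING.ORDERS_MV', method => 'F');
END;
/
```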
|
# ? Nov 6, 2021 18:12 |
|
Do you store historical snapshots in the data warehouse, or load data from multiple sources into it? Or do you only really have a split between the live database and the data warehouse to prevent heavy analytical queries from impacting the live database? If it's the latter, it sounds like a read replica/standby would be the best and the simplest solution.
|
# ? Nov 9, 2021 19:31 |
|
We do store some historical data but I'm currently clearing up whether anyone actually needs it. Right now we're looking at the refreshable PDB method since Active Data Guard (the currently canonical "real-time read-only replica" method) costs $$$ in additional licensing.
|
# ? Nov 9, 2021 19:42 |
|
I got a promotion at work and will be doing SQL stuff but I don't have more than cursory knowledge (it's easier to learn how SQL works than our 25 year old mainframe which is why they hire internally lol). Is there a good online primer/intro course I can go through so I'm a bit ahead of things? e: SubponticatePoster fucked around with this message at 22:06 on Nov 11, 2021 |
# ? Nov 11, 2021 18:29 |
|
I personally like these two: https://www.codecademy.com/learn/learn-sql https://www.w3schools.com/sql/default.asp
|
# ? Nov 11, 2021 19:56 |
|
Once you've gotten more familiar with the syntax and want to understand a little better why some queries are slow and why some are fast, you can give this a shot. Beware: The formatting kind of hides it, but it's a whole-rear end book.
|
# ? Nov 11, 2021 23:13 |
|
How do I return the column names of the largest two values in a given row of data in an Aurora Postgres (12.7) table? I am building a CDN monitoring system that pings each of 60 POPs from 20 locations and dumps the data into a table like this: code:
host, pop1, pop2
host2, pop2, pop1
host3, pop60, pop1

Is this doable with straight SQL?
|
# ? Nov 12, 2021 20:58 |
|
group by hostname, pivot columns to rows within a group, take the max of the rows within the group? I’d have to look if there’s an ANSI-compliant way to do pivot (ANSI SQL is incredibly limited and annoying and even then there’s variations in implementation details) but every decent SQL DB is going to offer that functionality regardless of ANSI-ness, perhaps just with slightly different syntax.

remember that depending on what you are doing, if this is part of something bigger, it may be easier to pull the annoying bits out into a view - you can prove the annoying bits work as expected, and potentially you can come back later and tune that bit if you’ve found a different way you want it to work (especially if that particular DB offers some specific non-ANSI functionality that helps you). So you might have a view that gives you (hostname, minColumn, minValue, maxColumn, maxValue) and if you are using that as a part of a larger query you just join that.

That approach also happens to give you a way to “encapsulate” non-portable queries or non-ORM’able queries - if you have to support three DBs, you can just have different definitions of that view for each DB, and if you need to use it as part of an ORM query (despite ORMs not necessarily supporting the syntax you need) you can define the view as an immutable table/entity in Hibernate or whatever, and then just join against it from HQL like a normal entity - you can have it as a lazy-loaded entity relationship, or if you want to avoid polluting your entities just do the SQL-style “JOIN MyReportView ON entity.hostname = MyReportView.hostname”.

(people hate ORMs but I’m of the opinion that ORM doesn’t have to be all or nothing, and if you can pull specific bits back to native SQL but still use ORM generally it can often be convenient to do so. Using “native query” functionality inside the ORM framework is fine if you just want a specific tuple, but if you want the whole object/POJO/etc then you’re writing the boilerplate code, which needs to be maintained going forward, and that’s where “views as ORM entities to encapsulate lovely parts of the logic” work well.

It’s just that people drilling down to connection level and doing native poo poo there that makes me wary, we’ve had far too many fails from people who thought they were cute writing native JDBC queries but also didn’t understand the full contract of Connection/PreparedStatement/ResultSet/etc and ended up causing cursor leaks/etc, or not understanding the necessity of hibernate getting flushed before native queries leading to stuff being visible or not visible when it should/shouldn’t be. So in my book, if your complex query needs to pull back more than a tuple (i.e. needs full objects), and it can be made to work with the “join a view and pull back the rows that match some ORM query” pattern then I’ve generally leaned on people to do that just to avoid the hazards of things going on that Hibernate doesn’t know about. The ORM is bumper bowling for lovely developers: yeah you still suck but it keeps you going in the right direction and limits how bad the fallout can be.)

Paul MaudDib fucked around with this message at 22:02 on Nov 12, 2021 |
# ? Nov 12, 2021 21:06 |
Consider if you should normalize your schema. I think it would look something like Host (id , name) Pop (id, name) Measurement (id, time) ResponseTime (measurement_id, host_id, pop_id, response_time)
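With that shape, "top two per host" stops being a 60-column problem and becomes a window query. A Postgres sketch following nielsm's outline (the types and the response_ms column are assumptions):

```sql
CREATE TABLE host        (id serial PRIMARY KEY, name text NOT NULL);
CREATE TABLE pop         (id serial PRIMARY KEY, name text NOT NULL);
CREATE TABLE measurement (id serial PRIMARY KEY, time timestamptz NOT NULL);
CREATE TABLE response_time (
    measurement_id int REFERENCES measurement(id),
    host_id        int REFERENCES host(id),
    pop_id         int REFERENCES pop(id),
    response_ms    numeric NOT NULL,
    PRIMARY KEY (measurement_id, host_id, pop_id)
);

-- Two largest response times per host for one measurement:
SELECT h.name AS host, p.name AS pop, r.response_ms
FROM (
    SELECT rt.*,
           ROW_NUMBER() OVER (PARTITION BY rt.host_id
                              ORDER BY rt.response_ms DESC) AS rn
    FROM response_time rt
    WHERE rt.measurement_id = 1234
) r
JOIN host h ON h.id = r.host_id
JOIN pop  p ON p.id = r.pop_id
WHERE r.rn <= 2
ORDER BY h.name, r.rn;
```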
|
|
# ? Nov 13, 2021 00:02 |
|
Agrikk posted:How do I return the column names of the largest two values in a given row of data in an Aurora Postgres (12.7) table? Having 60 of them sucks. Adding one more will suck more. The system tables can get the column names so you could leverage that to build a query to reshape the data. Sorry, phone posting so that's the best I can do right now.
|
# ? Nov 13, 2021 00:05 |
|
nielsm posted:Consider if you should normalize your schema. This is absolutely the right approach and it's amazingly simple now that you've brought it up. When I create the table it seemed like a no brainer to make it an X-Y chart with Z as the time dimension, but then I got stuck in this situation and I realized that it didn't feel right without knowing why. I'm going to try rebuilding the app, which will require some re-coding of the host nodes, but I think it'll be worth it. Thanks!
|
# ? Nov 13, 2021 07:58 |
|
Agrikk posted:This is absolutely the right approach and it's amazingly simple now that you've brought it up. Normalization saves the day yet again
|
# ? Nov 13, 2021 15:22 |
|
What is a good primer on PostgreSQL roles and users and how to manage them? I know the basics of MySQL (as much as I need anyway) but I need to apply the same knowledge to extending a system to also record stuff in Postgres. Problem is, that drat access control! I haven't gotten even to the point of creating a database and inserting things. I want to do this through the command line, I mean, psql in Linux. And all the blog posts and explanations I find are either incomplete or waaaaaay over my head. Thanks in advance!
|
# ? Nov 23, 2021 15:19 |
|
Dawncloack posted:What is a good primer on postgresql roles and users and how to manage them? https://www.postgresql.org/docs/14/tutorial-createdb.html If you really think you need to be creating roles https://www.postgresql.org/docs/14/user-manag.html Or did you not configure the server? https://www.postgresql.org/docs/14/auth-pg-hba-conf.html It's easier to help if you describe what is working, and what it is actually doing if it seems wrong. PhantomOfTheCopier fucked around with this message at 23:20 on Nov 23, 2021 |
# ? Nov 23, 2021 23:14 |
|
|
|
Thank you for those! It's a starting question of sorts. I have SQL notions, but with the psql command line program I haven't even gotten to the point of creating a database and inserting. I keep getting "you don't have the privileges to do that ha ha!" errors.
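Those errors usually just mean you're connected as a role without CREATEDB or CREATE rights. A minimal sketch of the bootstrap sequence, assuming a stock Linux package install where the postgres OS user maps to the superuser role (role, database, and table names are placeholders):

```sql
-- Connect as the superuser first:   sudo -u postgres psql
CREATE ROLE appuser LOGIN PASSWORD 'changeme';
CREATE DATABASE appdb OWNER appuser;

-- Reconnect as the new role:        psql -U appuser -d appdb -h localhost
CREATE TABLE notes (id serial PRIMARY KEY, body text);
INSERT INTO notes (body) VALUES ('hello');
```

If the reconnect itself is refused, that's pg_hba.conf authentication territory (the third link above) rather than roles and privileges.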
|
# ? Nov 24, 2021 00:28 |