PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
I don't understand the problem. You have an image in front of you and the assignment is to make an image that looks just like that image. Are you supposed to put it into a Word document, embed that in PowerPoint, export it to PDF, print that, take a picture with your phone, and email that as your solution?

Or are you asking how to create entity relationship diagrams? https://stackoverflow.com/questions/22673352/colorize-table-name-in-graphviz-entity-relationship-diagram

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
is there a good solution for when you want a paginated result from a query, but also want to know the total number of results in the set?

the two naive solutions are to (a) pull everything and just paginate the results in the client, which is undesirable when the result set is very large, and (b) run two separate queries, one for the results and one for the count.

the "other" idea I had is to query everything, then use a scrollable resultset (JDBC/Hibernate) and manually scroll the results partway through, output a limited number of results, then close it?

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Well I suppose you could stick the COUNT(*) in an additional column in the result set, but if your database driver supports it and it's not a performance bottleneck I'd much rather run a combined query (eg. "SELECT COUNT(*) FROM FOO; SELECT * FROM FOO OFFSET @o LIMIT @l").

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Try a window function?

SELECT count(1) OVER (), column FROM ...

Then stuff that in a cursor or maybe offset/limit? (I haven't had a chance to try either.) Seems like this would be bad for performance eventually depending on indexes.
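
Untested, but the shape of it would be something like this (foo and its id column are just placeholders):

SQL code:
-- page 3 at 25 rows per page; total_rows repeats the full count on every row of the page
SELECT count(*) OVER () AS total_rows, f.*
FROM foo f
ORDER BY f.id
OFFSET 50 LIMIT 25;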

Skyarb
Sep 20, 2018

MMMPH MMMPPHH MPPPH GLUCK GLUCK OH SORRY I DIDNT SEE YOU THERE I WAS JUST CHOKING DOWN THIS BATTLEFIELD COCK DID YOU KNOW BATTLEFIELD IS THE BEST VIDEO GAME EVER NOW IF YOULL EXCUSE ME ILL GO BACK TO THIS BATTLECOCK
Working with postgres, and I am doing a bunch of batch inserts. I need to get the IDs generated from the inserts and use those IDs for another insert in another table.

Problem is, postgres does not guarantee the order of the returned IDs when doing batch inserts. How can I do a bulk insert, take those returned IDs, and use them elsewhere while being certain I don't lose referential order?

nielsm
Jun 1, 2009



Generate the number of IDs you need from the sequence, assign them manually to the data you insert into the first table; then you know which IDs correspond to which data when inserting into the second table.
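
Roughly like this (table, column, and sequence names are made up for the example):

SQL code:
-- reserve as many ids as you have rows to insert
SELECT nextval('parent_id_seq') FROM generate_series(1, 3);

-- suppose that returned 101, 102, 103: hand them out yourself,
-- so you know exactly which row got which id
INSERT INTO parent (id, name) VALUES (101, 'a'), (102, 'b'), (103, 'c');
INSERT INTO child (parent_id, note) VALUES (101, 'x'), (102, 'y'), (103, 'z');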

Skyarb
Sep 20, 2018

MMMPH MMMPPHH MPPPH GLUCK GLUCK OH SORRY I DIDNT SEE YOU THERE I WAS JUST CHOKING DOWN THIS BATTLEFIELD COCK DID YOU KNOW BATTLEFIELD IS THE BEST VIDEO GAME EVER NOW IF YOULL EXCUSE ME ILL GO BACK TO THIS BATTLECOCK

nielsm posted:

Generate the number of IDs you need from the sequence, assign them manually to the data you insert into the first table; then you know which IDs correspond to which data when inserting into the second table.

But I just want to use the automatically generated IDs (harumph)

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Return both the IDs and whatever column/s you need to line up the referenced object.

Skyarb
Sep 20, 2018

MMMPH MMMPPHH MPPPH GLUCK GLUCK OH SORRY I DIDNT SEE YOU THERE I WAS JUST CHOKING DOWN THIS BATTLEFIELD COCK DID YOU KNOW BATTLEFIELD IS THE BEST VIDEO GAME EVER NOW IF YOULL EXCUSE ME ILL GO BACK TO THIS BATTLECOCK

NihilCredo posted:

Return both the IDs and whatever column/s you need to line up the referenced object.

The extra information for table 2 isn't inserted into table 1.

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Skyarb posted:

The extra information for table 2 isn't inserted into table 1.

Return some key info from table 1, then, and follow the object link at the application level via a dictionary or something.

I am assuming you do have such a link (otherwise you couldn't know the data is matched correctly), but please let me know if that's not the case.
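
A sketch of what I mean (table and column names made up; 'name' stands in for whatever column from table 1 lets you match rows back to your in-memory objects):

SQL code:
-- build a name -> id dictionary from the returned pairs, then use it for the table 2 insert
INSERT INTO table1 (name, other_stuff)
VALUES ('foo', 1), ('bar', 2)
RETURNING id, name;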

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
In PostgreSQL, is there a way I can store the current search_path in a variable, change it, and then repopulate it with the old value later?

For example:

This has to be runnable with search_path set to an arbitrary value. After it is run, search_path should be the same as it was before the code ran. But the command to create MyTable must have caused it to be created within MySchema.

code:
DROP SCHEMA IF EXISTS MySchema CASCADE;
CREATE SCHEMA IF NOT EXISTS MySchema;

MYSTERY COMMAND 1;

SET search_path = MySchema;

CREATE TABLE MyTable (x int);

MYSTERY COMMAND 2;
Does there exist a MYSTERY COMMAND 1 and a MYSTERY COMMAND 2 that will make this work as described?

The goal here is that it should be possible to create a script that creates several objects in the same schema, but not have to make sure that each object's CREATE statement specifies the schema name correctly; they should automatically be created in the desired schema. With whatever boilerplate is necessary at the top and bottom of the file to store, set, and restore the search_path.

Ninja.Bob
Mar 31, 2005
Will these functions do what you need? You might be able to just use set_config because if the last parameter 'is_local' is true the setting is only applied to the current transaction.
https://www.postgresql.org/docs/current/functions-admin.html
SQL code:
select set_config('search_path', 'MySchema', true);
select current_setting('search_path');
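
If you really want to save and restore the old value explicitly instead of relying on the transaction-local setting, something like this might work (untested; 'myapp.saved_search_path' is just an arbitrary custom setting name used as a stash):

SQL code:
-- MYSTERY COMMAND 1: stash the current search_path in a custom setting
select set_config('myapp.saved_search_path', current_setting('search_path'), false);

-- ... SET search_path = MySchema; create your objects ...

-- MYSTERY COMMAND 2: put the old value back
select set_config('search_path', current_setting('myapp.saved_search_path'), false);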

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Ninja.Bob posted:

Will these functions do what you need? You might be able to just use set_config because if the last parameter 'is_local' is true the setting is only applied to the current transaction.
https://www.postgresql.org/docs/current/functions-admin.html
SQL code:
select set_config('search_path', 'MySchema', true);
select current_setting('search_path');

Sorry for taking a while to respond. It looks like that set_config(..., ..., true) is what I need, thanks.

So it should look like:

code:
DROP SCHEMA IF EXISTS MySchema CASCADE;
CREATE SCHEMA IF NOT EXISTS MySchema;

DO $$
BEGIN
    PERFORM set_config('search_path', 'MySchema', true);
    CREATE TABLE MyTable (x int);
END $$;
Something I wasn't expecting is that when I tried creating a stored procedure inside the anonymous block, it seems the stored procedure doesn't have the search_path from the context where it was defined, instead it has the default search_path. I tried creating a table and a stored procedure that inserts into that table, but it only worked if the table's name was schema-qualified in the stored procedure.

e: copied wrong line of code

Hammerite fucked around with this message at 15:10 on Aug 20, 2021

Ninja.Bob
Mar 31, 2005

Hammerite posted:

Something I wasn't expecting is that when I tried creating a stored procedure inside the anonymous block, it seems the stored procedure doesn't have the search_path from the context where it was defined, instead it has the default search_path. I tried creating a table and a stored procedure that inserts into that table, but it only worked if the table's name was schema-qualified in the stored procedure.

Yep that's correct, just add the set line to the end of the function definition.

SQL code:
create function test() returns void as 
$$
begin
  ...
end;
$$ language plpgsql set search_path = MySchema;
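
For instance, tying it to your MyTable example (just a sketch):

SQL code:
create function insert_row() returns void as
$$
begin
  -- the unqualified name resolves to MySchema.MyTable because of the SET clause below
  insert into MyTable (x) values (1);
end;
$$ language plpgsql set search_path = MySchema;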

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
On SOLR - we have a relatively trivial document (maybe 30 lines) but it's got some nested documents inside it (that are also relatively trivial - about 10 lines).

One of our team members has raised the point that the child document thing may be overkill and awkward to work with (and we don't have a problem with updating the whole document as a unit). This raises the question of whether we should bother with child documents at all, or whether we should just flatten the parent document and give it field names like "myChildDoc_fieldA", etc., so the child documents just get flattened onto the parent docs according to some scheme we set up.

If we go the child document route - right now our child documents don't have an actual ID in them. I suppose we could structure that so that they are related to the parent doc, like for document uid #123 we could give the child document a uid of "123.myChildDoc" or something like that.

Thoughts? It seems like small, relatively trivial nested docs like that should be well-supported/easy to use in the first place in a noSQL document store. But right now we don't have a ton of expertise on SOLR.

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003
based on the size of your documents, my thoughts are to throw the solr cluster away and use postgres

I would blow Dane Cook
Dec 26, 2008
I've got to copy the data from a relatively large postgresql database (1854 GB) to a new Azure Microsoft SQL Server Database.

My first idea was to use the "SQL Server import and export wizard" because it is relatively straightforward once you get the postgres driver set up. It can also create an SSIS package for what I want it to do.

Now if I select the postgres server as the source and the MSSQL server as the destination, will all the data have to come through the machine the wizard is running on? (I.E. my little laptop and residential internet connection), or will the two servers copy the data between themselves directly without having to go through my little laptop?

I need to figure this out, because if it's coming through my laptop at all then I need to spin up a big VM to run it on instead.

Or is my whole approach completely wrong?

MrKatharsis
Nov 29, 2003

feel the bern
If it's a one time thing, routing it through your home internet is fine.

Using a VM is a good idea so your coworkers can continue the task after you win the lottery this afternoon.

Or in case you lose power.

Or or or...

I would blow Dane Cook
Dec 26, 2008

MrKatharsis posted:

If it's a one time thing, routing it through your home internet is fine.

Using a VM is a good idea so your coworkers can continue the task after you win the lottery this afternoon.

Or in case you lose power.

Or or or...

You're saying it would go through my home pc?

nielsm
Jun 1, 2009



The data will go through the machine the import/export process is run on. In general, database servers from different vendors can't connect to each other for direct transfers; they use different protocols and all. All that wizard is doing for you is hooking up a SELECT query on the source to fetch all the data, and then sending INSERT queries to the destination to load it back in.

I would blow Dane Cook
Dec 26, 2008

nielsm posted:

The data will go through the machine the import/export process is run on. In general, database servers from different vendors can't connect to each other for direct transfers; they use different protocols and all. All that wizard is doing for you is hooking up a SELECT query on the source to fetch all the data, and then sending INSERT queries to the destination to load it back in.

Ah ok thanks. I might get a VM then.

Heavy_D
Feb 16, 2002

"rararararara" contains the meaning of everything, kept in simple rectangular structures
If you saved the package the wizard creates, and then skipped running it out of the wizard, you might be able to deploy it to the database. Then you could get the server to run it instead of your laptop. But a VM might be easier to organise.

I would blow Dane Cook
Dec 26, 2008

Heavy_D posted:

If you saved the package the wizard creates, and then skipped running it out of the wizard, you might be able to deploy it to the database. Then you could get the server to run it instead of your laptop. But a VM might be easier to organise.

Thanks for that, this is all going to Azure so I'll see what's easier.

Spikes32
Jul 25, 2013

Happy trees
Got a question on querying where the target has multiple custom fields where the custom fields aren't columns in the table but rows in another table, with a third table hosting all the values.

I want to be able to query the content table and also get the custom fields for that piece of content with their respective labels and values. A custom field will only ever have one type of data in it, but I can't figure out how to scale to n custom fields and only grab the data field that is not null. Any suggestions for where I could start working on this? A screenshot of my DB example, made in Excel, is attached; I hope it helps.

Kuule hain nussivan
Nov 27, 2008

Spikes32 posted:

Got a question on querying where the target has multiple custom fields where the custom fields aren't columns in the table but rows in another table, with a third table hosting all the values.

I want to be able to query the content table and also get the custom fields for that piece of content with their respective labels and values. A custom field will only ever have one type of data in it, but I can't figure out how to scale to n custom fields and only grab the data field that is not null. Any suggestions for where I could start working on this? A screenshot of my DB example, made in Excel, is attached; I hope it helps.
Oh wow, this monstrosity reminds me of the complete rear end database I have to work with.

Will it matter if the value fields you pull are all stringified? If I understand this correctly, you could join Content to CustomFieldValue using the fk connection and then convert all the value type columns to nvarchar and COALESCE them to get the one non-null value. Then you should probably join to CustomField so you can grab the field name for each value from there. Then you can use the PIVOT command to turn your field name - field value combinations into columns and you should be done. Let me know if this sounds ok to you and I'll try to whip up a query for you.
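
In the meantime, the join/coalesce half would be roughly this (untested, column names guessed from your screenshot, and the casts will vary by RDBMS); the pivot step is the part I'm less sure about:

SQL code:
-- one row per (content, custom field), with the single non-null typed value flattened to text
SELECT c.PK_Content,
       cf.cf_label,
       COALESCE(CAST(cfv.cfv_floatValue AS CHAR),
                CAST(cfv.cfv_intValue AS CHAR),
                cfv.cfv_stringValue,
                cfv.cfv_textValue,
                CAST(cfv.cfv_dateValue AS CHAR)) AS field_value
FROM Content c
JOIN CustomFieldValue cfv ON cfv.cfv_fk_content = c.PK_Content
JOIN CustomField cf ON cf.cf_pk = cfv.cfv_fk_customfield;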

Spikes32
Jul 25, 2013

Happy trees
That does sound like it would work, but if you don't feel like writing it up I can muddle through it too

Kuule hain nussivan
Nov 27, 2008

Spikes32 posted:

That does sound like it would work, but if you don't feel like writing it up I can muddle through it too

Hmmm... I tried mulling this over for a bit, and I don't think what I had in mind will actually work. I'll ponder it for a while and see if I can come up with something workable.

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
Where a good natural key exists, and you are willing to accept the business rules that go along with a natural primary key (eg guarantee key uniqueness, cannot change the natural key value, or must remap the natural key across the entire dataset), is it worthwhile to use a natural key, or should you pretty much always use a surrogate key?

The example I'm looking at is username: we guarantee username uniqueness, and we do not allow usernames to be changed ever. We do currently have a userUid value, but we generally want to encapsulate this and make it an implementation detail that users don't need to know about. In a lot of places, in order to present username rather than userUid inside an API response, we have to do a ton of joins in order to pull back the username (example: we have audit logging requirements and we track which user created a row and which user last modified it, so every single audit-logged row will join the user table at least twice).

Obviously this isn't the end of the world in performance terms, joins are fast in general since they're obviously primary-key-indexed and we don't have tons of users, but just as a matter of style, should we be using the natural key here, or should we just make a surrogate key and deal with it? Are there use-cases where a surrogate key would suddenly make sense again? Is it generally "bad style" to use a varchar/text column as a primary key due to that blowing out the size of indexes/etc, or is that interacted with via hash (Oracle 19c) rather than the value directly, or is that a "worry about it when you cross that bridge" problem?

The one area where I think surrogate does definitely make sense is working with composite keys (primary key is a tuple of multiple columns) via ORM, I find the process of managing "ID objects" painful in java/hibernate and I'd strongly rather just have a surrogate that is a single value.

Condimentalist
Jun 13, 2007

Spikes32 posted:

Got a question on querying where the target has multiple custom fields where the custom fields aren't columns in the table but rows in another table, with a third table hosting all the values.

I want to be able to query the content table and also get the custom fields for that piece of content with their respective labels and values. A custom field will only ever have one type of data in it, but I can't figure out how to scale to n custom fields and only grab the data field that is not null. Any suggestions for where I could start working on this? A screenshot of my DB example, made in Excel, is attached; I hope it helps.

Not that it necessarily matters, but what rmdbs are you using?

It's gross, but if I had to solve this problem I might try to tackle it using dynamic SQL // a dynamic pivot VIEW... I may noodle this some and see if I can throw a script together for you

Spikes32
Jul 25, 2013

Happy trees

Condimentalist posted:

Not that it necessarily matters, but what rmdbs are you using?

It's gross, but if I had to solve this problem I might try to tackle it using dynamic SQL // a dynamic pivot VIEW... I may noodle this some and see if I can throw a script together for you

MariaDB, but I don't have admin access to the DB, which makes things more complicated. A view might not work because any particular piece of content might have N different custom fields depending on content type and when the content was created / modified and what custom fields were associated with the content at the time. This is a LIMS database if it helps at all, so the software is intentionally configurable.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Your example doesn't really make sense.

- Weight should be a float value, but your weight extra actually has an int
- There's a row that claims to be a birth date, but contains a string instead of a date
- You want a query to divine that the aforementioned bogus birth date row is actually a supplier, and return that instead of the correct (uncorrupted) supplier value?

Assuming those are just errors in putting together the example and aren't actually in your database ... what's the point of this query? If you're looking up a single element, leaving it as rows seems just as useful (if not more so) as pivoting it into columns, given that I don't know (when running this query) what columns I'm going to have. If I'm querying a bunch of elements for further processing, getting back data of a totally unknown shape is pretty useless to me too. Do you just want to render in a UI all the values for every piece of data you have? Even that would be better served by getting the values as separate rows (ordered so that all the values for one piece of data are together), unless your "UI" is just an SQL shell you're pasting queries into.

Condimentalist
Jun 13, 2007

Spikes32 posted:

MariaDB, but I don't have admin access to the DB, which makes things more complicated. A view might not work because any particular piece of content might have N different custom fields depending on content type and when the content was created / modified and what custom fields were associated with the content at the time. This is a LIMS database if it helps at all, so the software is intentionally configurable.

Just for the heck of it, would you mind posting your sample data as text so I can SQL fiddle with it? This is slightly more interesting than my actual work.

Spikes32
Jul 25, 2013

Happy trees

Condimentalist posted:

Just for the heck of it, would you mind posting your sample data as text so I can SQL fiddle with it? This is slightly more interesting than my actual work.

Happy to. And to the poster above, thanks for catching those errors; I've fixed them. They would not normally happen in the database. As to the point of this query, one thing I'm doing is helping troubleshoot errors and bugs, and for that it really helps to compare good content with the bugged content. Seeing all the custom fields (which may be where the bug is) in a row helps with that. Yes, this query would not be helpful from a reporting standpoint, but it does help me otherwise.


Content
PK_Content,Stuff,Stuff 2
100,a,b

CustomFieldValue
cfv_pk,cfv_fk_customfield,cfv_fk_content,cfv_floatValue,cfv_intValue,cfv_stringValue,cfv_textValue,cfv_dateValue,cfv_modifiedOn
7,1,100,,1.00,,,,2021-01-01 12:30:00:00
8,2,100,,,riteAid,,,2021-01-01 12:30:00:00
9,3,100,,,,,2005-01-01 00:00:00:00,2021-01-01 12:30:00:00
10,3,100,,,,,2005-01-01 00:00:00:00,2021-01-02 11:30:00:00

CustomField
cf_pk,datatable_fk,cf_label,cf_fk_datatype
1,33,weight,11
2,33,supplier,12
3,33,birth date,13

DataType
dt_pk,dt_name
11,float
12,string
13,datetime

DataTable
DTB_PK,DTB_Name
33,Content

Desired Output Query
PK_Content,Stuff,Stuff 2,Weight,Supplier,Birth Date
100,a,b,1,Walmart,2005-01-01 00:00:00:00

Condimentalist
Jun 13, 2007

Spikes32 posted:

Happy to. And to the poster above, thanks for catching those errors; I've fixed them. They would not normally happen in the database. As to the point of this query, one thing I'm doing is helping troubleshoot errors and bugs, and for that it really helps to compare good content with the bugged content. Seeing all the custom fields (which may be where the bug is) in a row helps with that. Yes, this query would not be helpful from a reporting standpoint, but it does help me otherwise.


Content
PK_Content,Stuff,Stuff 2
100,a,b

CustomFieldValue
cfv_pk,cfv_fk_customfield,cfv_fk_content,cfv_floatValue,cfv_intValue,cfv_stringValue,cfv_textValue,cfv_dateValue,cfv_modifiedOn
7,1,100,,1.00,,,,2021-01-01 12:30:00:00
8,2,100,,,riteAid,,,2021-01-01 12:30:00:00
9,3,100,,,,,2005-01-01 00:00:00:00,2021-01-01 12:30:00:00
10,3,100,,,,,2005-01-01 00:00:00:00,2021-01-02 11:30:00:00

CustomField
cf_pk,datatable_fk,cf_label,cf_fk_datatype
1,33,weight,11
2,33,supplier,12
3,33,birth date,13

DataType
dt_pk,dt_name
11,float
12,string
13,datetime

DataTable
DTB_PK,DTB_Name
33,Content

Desired Output Query
PK_Content,Stuff,Stuff 2,Weight,Supplier,Birth Date
100,a,b,1,Walmart,2005-01-01 00:00:00:00

Well,

this is ugly and dumb but maybe it will help you out...

https://dbfiddle.uk/?rdbms=mariadb_10.6&fiddle=554bb3e0bbb5958e9bb24e967f3e68f9

Basically, you can manually pivot the rows into columns by entering the row datatype name (cf_label) by hand -- apparently there isn't a good way to do this dynamically in MySQL, MariaDB, etc. There is an easier way to do this type of pivoting in MSSQL, which you can see here http://sqlfiddle.com/#!18/3992c/3 -- I had to remove the dynamic pivot code for sqlfiddle, but it looks like this:

code:
/*Standard PIVOT with manual columns*/
SELECT *
FROM #combinedCustomData cf
INNER JOIN #Content c ON c.PK_Content = cf.cfv_fk_content
	PIVOT (
	max(cf.cfv_ValueCalculated)
FOR cf_label IN (weight,[birth date],supplier)
	) AS pvt;
/*Dynamic pivot*/
DECLARE @columns NVARCHAR(MAX) = N'', @sql NVARCHAR(MAX) = N'';

-- select the category names
SELECT @columns += QUOTENAME (cf.cf_label) + N','
FROM #CustomField cf
ORDER BY cf.cf_label;

-- remove the last comma
SET @columns = LEFT(@columns, LEN (@columns) - 1);
-- construct dynamic SQL
SET @sql
	= N'
		SELECT *
		INTO ##tmpoutput
		FROM (
			SELECT *
			FROM #combinedCustomData cf
			) t
		PIVOT(max(cfv_ValueCalculated) FOR t.cf_label IN (' + @columns + N')) AS pivot_table1;
	';

-- execute the dynamic SQL
EXECUTE sys.sp_executesql @sql;
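
For MariaDB itself, where PIVOT doesn't exist, the manual version comes down to conditional aggregation, roughly like this (a sketch using the column names from your sample data, not the exact fiddle code):

SQL code:
SELECT c.PK_Content, c.Stuff, c.`Stuff 2`,
       MAX(CASE WHEN cf.cf_label = 'weight'
                THEN COALESCE(CAST(cfv.cfv_floatValue AS CHAR), CAST(cfv.cfv_intValue AS CHAR)) END) AS weight,
       MAX(CASE WHEN cf.cf_label = 'supplier'   THEN cfv.cfv_stringValue END) AS supplier,
       MAX(CASE WHEN cf.cf_label = 'birth date' THEN CAST(cfv.cfv_dateValue AS CHAR) END) AS `birth date`
FROM Content c
JOIN CustomFieldValue cfv ON cfv.cfv_fk_content = c.PK_Content
JOIN CustomField cf ON cf.cf_pk = cfv.cfv_fk_customfield
GROUP BY c.PK_Content, c.Stuff, c.`Stuff 2`;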

Good luck. hah. I had to work with this type of customized data model years ago on some operating room scheduling software; basically the administrator could add new fields and values and then we had to manually pivot the data for reports. lol

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Paul MaudDib posted:

is it worthwhile to use a natural key, or should you pretty much always use a surrogate key?

The example I'm looking at is username: we guarantee username uniqueness, and we do not allow usernames to be changed ever.

The one area where I think surrogate does definitely make sense is working with composite keys (primary key is a tuple of multiple columns) via ORM, I find the process of managing "ID objects" painful in java/hibernate and I'd strongly rather just have a surrogate that is a single value.
Working backwards... Wow I'm always glad I don't use ORMs because every time there's this reversal oddity. The one time you want the natural key, composite keys, and you can't deal with them in the ORM? Sad.

Usernames being unique is different from being immutable, and if there's one thing you'll learn in business: nothing is immutable. Some derp will realize they meant "John Jane" instead of "JaneJohn", and since it's the VP's fourth cousin you'll have to fix it.

While joins in the database should be performant against row oids, that doesn't mean every application will properly leverage the database. The classic "paginated list of search results" is a good example where transmitting stringy keys is more expensive than integers.

There is one you missed: data security. There's an argument that keys should not be publicly visible, i.e. exposed as direct references. E.g., scanning for usernames against APIs is much easier than scanning UUIDs.

(Aside, I think UUID primary keys are about the dumbest thing ever, especially when they're permanent in the database. It's very rare for anyone to need that much cryptographic entropy, and you might as well just stick to a primary key of randomized-integer keys of decimal length 10--18 depending on your needs.)

Finally... test it! Some databases stash a crap load in buffers, which can kill performance, so avoiding string keys may yield performance benefits.

redleader
Aug 18, 2005

Engage according to operational parameters
nitpick: uuids aren't "cryptographic" in any sense of the word

even if you know your uuidv4 is generated with a csprng or true random source, you cannot guarantee that this will always be the case. it would be somewhere between easy and trivial to gently caress up id generation accidentally, and end up using regular uuids

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
The use case for uuids is "I want to generate records in many disparate servers at the same time, and not have to deal with collisions when we eventually reconcile them".

If you're not running a wildly distributed setup, and are perhaps doing something a bit more traditional with a beefy server plus some read replicas, then don't bother.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Agreed, UUIDs aren't used for data encryption, but the randomness provided by the system for cryptographic algorithms isn't the latent heat of radiation either. :hampants: If you want to use resistor white noise to populate your cryptographic entropy pool, I won't stop you.

Agreed on the second point as well, distributed id generation. I cannot count the number of occurrences I've seen of engineers centralizing UUID generation. Apparently randomness didn't exist before UUIDs.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Paul MaudDib posted:

Where a good natural key exists, and you are willing to accept the business rules that go along with a natural primary key (eg guarantee key uniqueness, cannot change the natural key value, or must remap the natural key across the entire dataset), is it worthwhile to use a natural key, or should you pretty much always use a surrogate key?

The example I'm looking at is username: we guarantee username uniqueness, and we do not allow usernames to be changed ever. We do currently have a userUid value, but we generally want to encapsulate this and make it an implementation detail that users don't need to know about. In a lot of places, in order to present username rather than userUid inside an API response, we have to do a ton of joins in order to pull back the username (example: we have audit logging requirements and we track which user created a row and which user last modified it, so every single audit-logged row will join the user table at least twice).

Obviously this isn't the end of the world in performance terms, joins are fast in general since they're obviously primary-key-indexed and we don't have tons of users, but just as a matter of style, should we be using the natural key here, or should we just make a surrogate key and deal with it? Are there use-cases where a surrogate key would suddenly make sense again? Is it generally "bad style" to use a varchar/text column as a primary key due to that blowing out the size of indexes/etc, or is that interacted with via hash (Oracle 19c) rather than the value directly, or is that a "worry about it when you cross that bridge" problem?

The one area where I think surrogate does definitely make sense is working with composite keys (primary key is a tuple of multiple columns) via ORM, I find the process of managing "ID objects" painful in java/hibernate and I'd strongly rather just have a surrogate that is a single value.

No, don't go there. Identifiers should never carry information. Always use a surrogate primary key. Always. Yes, it's possible to have a more nuanced take, but the case for surrogate keys is so strong that if you have to ask there's no need for nuance. As PhantomOfTheCopier points out, someone will come along at some point and change the requirements for your supposedly stable and globally unique natural key. If it carries information, it's essentially inevitable that someone will need to change it. If you're thinking "but we can refactor at that point", it's almost certainly not going to be that easy. Identifiers like primary keys tend to get stored in all kinds of bizarre places, even (and especially) outside of your own system, so once this happens and you have to change it, it's almost certainly going to have impact on systems and processes outside of your control.

By all means feel free to put unique indexes on your natural keys and expose them however you want, but each row should always have an identifier of its own that isn't tied to what that row represents.
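
In Postgres terms that pattern looks roughly like this (table and column names are just illustrative):

SQL code:
CREATE TABLE app_user (
    user_id  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate key, carries no meaning
    username text NOT NULL UNIQUE                               -- natural key, still enforced and indexed
);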

I'm not sure why people seem afraid of joins either. A non-duplicating inner join on a UUID or integer primary key is as close to a free lunch as you'll ever get in a database. It costs basically nothing. Having many joins is a sign of a nicely normalized schema, it's not a code smell. Joins are like the entire reason you're even using a relational database instead of something else. Wanting to avoid joins when using a relational database is like sitting down on a toilet and wanting to avoid taking a poo poo. Still, if it's bothering you, feel free to denormalize. If the username is actually as stable as you claim you can just store it in many places, wherever it's convenient, without any major downside - it's not like you'll ever have to update it in a bunch of places, is it...?

TheFluff fucked around with this message at 03:23 on Oct 7, 2021

redleader
Aug 18, 2005

Engage according to operational parameters
the only people who take offense at the idea of surrogate keys seem to be those who have crawled extremely up their own rear end using relational algebra
