PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
You should read this: https://www.postgresql.org/docs/9.1/manage-ag-createdb.html

The PostgreSQL user's Linux account (/etc/passwd) can connect to template1 to add database roles. createdb can specify which role owns the created database.

If you can't "psql template1" as the PG user, you have other issues to fix first.


Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
At least a while ago, the default postgres config was super locked down such that it was almost difficult to even log in, like you're mentioning. On Linux the solution was often to 'su' to the postgres user and run psql, since the default auth mode was system-user (peer) authentication. If that didn't work, or you were on Windows where that wasn't really a thing, you had to enable 'trust' mode auth (no authentication) for local connections.

Once you’re on the PG console then you run ALTER USER postgres WITH ENCRYPTED PASSWORD ‘xyz’;

(You should see an ALTER USER echoback, if not you probably forgot the semicolon and the command won’t have taken effect yet.)

Then after the postgres user is assigned a password, you basically just need to make sure auth mode 'md5' is enabled for whatever connection modality you want (tcp mode for network connections, local socket connections, etc) and that trust auth is turned off. To get network connections to work you also may need to set a 'listen address' in postgresql.conf. The address '*' is fine unless you have a reason not to be listening on some particular interface, but iirc this is not set by default, just like network logins are not enabled by default.

The relevant file is pg_hba.conf for auth settings, and postgresql.conf for listen address.
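For reference, the relevant pg_hba.conf rules might look something like this (a sketch only; exact defaults and file location vary by distro, and the addresses here are placeholders):

```
# pg_hba.conf - one rule per line: connection type, database, user, address, method
# TYPE   DATABASE  USER      ADDRESS        METHOD
local    all       postgres                 peer   # 'su postgres' works, no password
local    all       all                      md5    # everyone else needs a password
host     all       all       127.0.0.1/32   md5    # local TCP connections
host     all       all       10.0.0.0/8     md5    # LAN access - adjust the CIDR
```

Rules are matched top to bottom, so a stray `trust` line above these would still let everything through.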

Do also be aware that you probably want to run the pg_tune script asap as by default postgres is only configured to use like 256mb of memory so that it runs in toy environments, but if you want to do real stuff with it, increasing it will help performance a lot. Also do try to keep the number of connections reasonable - every single connection gets dedicated memory and you’ll have to balance between connection count and connection resources.
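The memory knobs live in postgresql.conf; a hand-tuned starting point looks roughly like the below (the numbers are illustrative for a ~8 GB box; use pgtune or similar to compute real values for your hardware):

```
# postgresql.conf - illustrative values for a machine with ~8 GB RAM
shared_buffers = 2GB          # ~25% of RAM is the usual rule of thumb
effective_cache_size = 6GB    # planner hint: roughly what the OS cache can hold
work_mem = 32MB               # per sort/hash, per connection - multiplies fast
maintenance_work_mem = 512MB  # vacuum, create index
max_connections = 100         # each one costs memory; use a pooler past this
```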

Paul MaudDib fucked around with this message at 01:50 on Nov 24, 2021

Just-In-Timeberlake
Aug 18, 2003
e: nevermind, figured it out

Is there a way to write a pivot table to pivot on a common value in a column in TSQL? For example, if I have this:

code:

col1 | col2
-----+--------
0    | value1
0    | value2
1    | value1
1    | value2

And I want to turn it into this based on the value in col1:

code:
col1 | col2   | col3
-----+--------+--------
0    | value1 | value2
1    | value1 | value2

Just-In-Timeberlake fucked around with this message at 14:48 on Dec 10, 2021

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Postgres 12 weird performance characteristic of the day:

I had an input array of child IDs and needed to find all the parent IDs (one-to-many relationship). Turns out that doing:

code:
select * from parents where parent_id in (select parent_id from children where child_id = any(:child_ids))
was somehow MUCH slower than:

code:
select * from parents where exists (select * from children where child_id = any(:child_ids) and parent_id = parents.parent_id)
The former for some reason expected to have a lot of matches and did a hash join over two sequential scans. The latter actually used the primary key indexes on both tables. Oh well, just two or three days wasted over this :shepicide:

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

NihilCredo posted:

Postgres 12 weird performance characteristic of the day:

I had an input array of child IDs and needed to find all the parent IDs (one-to-many relationship). Turns out that doing:

code:
select * from parents where parent_id in (select parent_id from children where child_id = any(:child_ids))
was somehow MUCH slower than:

code:
select * from parents where exists (select * from children where child_id = any(:child_ids) and parent_id = parents.parent_id)
The former for some reason expected to have a lot of matches and did a hash join over two sequential scans. The latter actually used the primary key indexes on both tables. Oh well, just two or three days wasted over this :shepicide:

yeah actually this is very normal, I’m not quite clear on what exactly the EXISTS syntax allows the planner to eliminate vs the IN syntax but the same is also true in Oracle, usually anywhere you use that WHERE id IN (subquery) syntax you should strongly consider using EXISTS instead

Just-In-Timeberlake posted:

e: nevermind, figured it out

what did you see, denvercoder9!?

Paul MaudDib fucked around with this message at 19:52 on Dec 10, 2021

Just-In-Timeberlake
Aug 18, 2003

Paul MaudDib posted:

what did you see, denvercoder9!?

figured out what terms to search by and followed this example:

https://www.sqlshack.com/dynamic-pivot-tables-in-sql-server/
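For the static case from the original question (a known number of values per group), the same idea works without dynamic SQL; a sketch, with the table name `my_table` assumed:

```sql
-- Number the rows within each col1 group, then fold them into columns.
WITH numbered AS (
    SELECT col1, col2,
           ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS rn
    FROM my_table
)
SELECT col1,
       MAX(CASE WHEN rn = 1 THEN col2 END) AS col2,
       MAX(CASE WHEN rn = 2 THEN col2 END) AS col3
FROM numbered
GROUP BY col1;
```

The dynamic-pivot approach in the linked article is what you need when the number of distinct values isn't known up front.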

abelwingnut
Dec 23, 2002


Paul MaudDib posted:

yeah actually this is very normal, I’m not quite clear on what exactly the EXISTS syntax allows the planner to eliminate vs the IN syntax but the same is also true in Oracle, usually anywhere you use that WHERE id IN (subquery) syntax you should strongly consider using EXISTS instead

what did you see, denvercoder9!?

from stack overflow:

--

EXISTS will be faster because once the engine has found a hit, it will quit looking as the condition has proved true.

With IN, it will collect all the results from the sub-query before further processing.

--

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
Yeah, EXISTS() tends to be really underused for some reason IME. At least in postgres, though, if you use IN(subquery) instead, I believe the query planner is capable of cheating by looking at what the subquery is doing, and in many simple cases it can optimize it into an actual semi-join (i.e. EXISTS()), and that's probably why NihilCredo got bitten: you can often get away with it. The one to really avoid though is NOT IN(subquery), because while it looks similar it's much harder to optimize. NOT EXISTS() isn't easy either, but at least it significantly reduces the risk of some really awful query plans (old article so some of the particulars may have changed, but I'm almost certain NOT IN still has a whole pile of caveats associated with it).

TheFluff fucked around with this message at 03:09 on Dec 11, 2021

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003
it's because of null. they should really just add semi join and anti join as sql keywords imo :colbert:
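To make the NULL point concrete, here's a sketch against the hypothetical parents/children tables from earlier in the thread:

```sql
-- If children.parent_id contains even one NULL, this returns NO rows:
SELECT * FROM parents
WHERE parent_id NOT IN (SELECT parent_id FROM children);
-- parent_id NOT IN (1, 2, NULL) expands to
--   parent_id <> 1 AND parent_id <> 2 AND parent_id <> NULL
-- and "<> NULL" evaluates to UNKNOWN, so no row can ever qualify.

-- NOT EXISTS has no such trap, and plans as a proper anti-join:
SELECT *
FROM parents p
WHERE NOT EXISTS (
    SELECT 1 FROM children c WHERE c.parent_id = p.parent_id
);
```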

abelwingnut
Dec 23, 2002


TheFluff posted:

Yeah, EXISTS() tends to be really underused for some reason IME.

yea, same. personally, i didn't use it for like six years because i didn't grasp the syntax. then it clicked and i started using it everywhere. it really does help.

Guildenstern Mother
Mar 31, 2010

Why walk when you can ride?
So I'm working on a group project and volunteered for the sql db parts because I make excellent choices. The DB is going to hold mostly recipes, which it turns out is more complicated than I thought it would be. I found this answer on stack overflow and it looks good, but I'm not seeing how the ingredients get matched with the recipe itself. Am I missing something super obvious? Do I need another field somewhere?

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Guildenstern Mother posted:

So I'm working on a group project and volunteered for the sql db parts because I make excellent choices. The DB is going to hold mostly recipes, which it turns out is more complicated than I thought it would be. I found this answer on stack overflow and it looks good, but I'm not seeing how the ingredients get matched with the recipe itself. Am I missing something super obvious? Do I need another field somewhere?
That linked answer is kinda messed up. The ingredient table should just be an ID and something identifying the ingredient (its name, possibly a list of alternative names, keywords, etc). It should NOT include a recipe; that defeats normalization (and you end up with 55 different spellings of buttre).

The "recipe" table is more like "metadata", but I'll stick with their name.

The "quantity" table in the first answer (+10pts) links the recipe and ingredient IDs.

Where you store the units is an interesting choice; most ingredients presume certain units. EG "milk" had better be a volume. The recipe "display units" might prefer something other than gallons, but the ingredient is still in the main volume unit. For internationalization, the quantity table in the answer therefore looks flubbed since you have to i18n each ingredient separately, whereby you may end up with mixed units.

abelwingnut
Dec 23, 2002


on mobile, so broad sketch here.

i would do a recipe table, an ingredients table, a units table (g, lb, tbsp, tsp, gal, et al), and a junction table to connect them. in the junction table, you’d have:

idRecipe (from recipe table)
idIngredient (from ingredient table)
idUnit (from units table)
amt (number)
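Spelled out as DDL, that sketch might look like the following (names and types are illustrative, not gospel):

```sql
-- The three lookup tables.
CREATE TABLE recipe (
    idRecipe   INT PRIMARY KEY,
    nameRecipe VARCHAR(200) NOT NULL
);

CREATE TABLE ingredient (
    idIngredient   INT PRIMARY KEY,
    nameIngredient VARCHAR(200) NOT NULL
);

CREATE TABLE unit (
    idUnit   INT PRIMARY KEY,
    nameUnit VARCHAR(50) NOT NULL
);

-- The junction: one row per (recipe, ingredient, unit, amount) pairing.
CREATE TABLE recipeIngredient (
    idJunction   INT PRIMARY KEY,
    idRecipe     INT NOT NULL REFERENCES recipe (idRecipe),
    idIngredient INT NOT NULL REFERENCES ingredient (idIngredient),
    idUnit       INT NOT NULL REFERENCES unit (idUnit),
    amt          NUMERIC NOT NULL
);
```

The surrogate idJunction key (rather than a composite PK on recipe+ingredient) leaves room for the same ingredient appearing twice in one recipe.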

abelwingnut fucked around with this message at 00:28 on Dec 16, 2021

Guildenstern Mother
Mar 31, 2010

Why walk when you can ride?
I just want to make sure I'm getting this right, the junction table would have an entry for each ingredient in each recipe? Something like:

recipe_id 1, ingredient_id 2, etc
recipe_id 1, ingredient_id 3, etc

Condimentalist
Jun 13, 2007
Yes, that's correct. You will also want to have a column for the quantity of that ingredient and any other metadata about it that pertains to the ingredient and the recipe.

Guildenstern Mother
Mar 31, 2010

Why walk when you can ride?
Gotcha. That makes sense and I like the simplicity. Keeps it easy to do searches by ingredient as well.

abelwingnut
Dec 23, 2002


yea, to clarify, you'd have like:

code:
recipe table

idRecipe	nameRecipe
1		lasagna
2		cacio e pepe
3		carbonara

--

ingredients table

idIngredients	nameIngredient
1		tomato
2		Lasagna Noodles
3		pecorino

--

units table

idUnit		nameUnit
1		cup
2		ounces
3		can

--

junction table

idJunction	idRecipe	idIngredient	idUnits	amount
1		1		1		3	1
2		1		2		2	16
3		2		3		2	8
4		1		3		2	4
so you'd have entries in the junction table that tell you:

lasagna needs a can of tomatoes, 16oz of lasagna noodles, and 4oz of pecorino
cacio e pepe needs 8oz of pecorino
forgot to add anything for carbonara but i think you see how this is working out.

and yea, feel free to add whatever other columns you need to each table. this is just how they all connect.
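Reading that back out is one three-way join against the junction; a sketch using the table and column names from the example above:

```sql
-- "what goes into lasagna?" - walk the junction back to its three lookup tables
SELECT r.nameRecipe, j.amount, u.nameUnit, i.nameIngredient
FROM   junction j
JOIN   recipe      r ON r.idRecipe      = j.idRecipe
JOIN   ingredients i ON i.idIngredients = j.idIngredient
JOIN   units       u ON u.idUnit        = j.idUnits
WHERE  r.nameRecipe = 'lasagna';
```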

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

PhantomOfTheCopier posted:

Where you store the units is an interesting choice; most ingredients presume certain units. EG "milk" had better be a volume. The recipe "display units" might prefer something other than gallons, but the ingredient is still in the main volume unit. For internationalization, the quantity table in the answer therefore looks flubbed since you have to i18n each ingredient separately, whereby you may end up with mixed units.

I would have three or more possible measures (volume, weight, count) and exactly one canonical unit for each measure (milliliter, gram, item). Ingredients should be associated with a measure, and each recipe-ingredient should have a quantity referring to the canonical unit.

Other units like gallons, pounds, teaspoons, cups etc., being fixed, should be hardcoded conversions to/from the base unit in the UI layer (have a dropdown menu to choose SI/imperial/"american household" units) and do not need to be handled in the database. For a recipe site, you can aggressively apply integer rounding so that you can freely convert back and forth between e.g. ounces and grams without loss of precision.

However, you should store an identifier of the units that the recipe-creator inputted as a safeguard against later changes, i.e. in case some psycho PM suddenly decides 'actually a teaspoon should be 4.97ml not 5ml, update all recipes!'.
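A sketch of that shape (every name here is invented for illustration):

```sql
-- Each measure has exactly one canonical unit; ingredients pick a measure.
CREATE TABLE measure (
    idMeasure     INT PRIMARY KEY,
    nameMeasure   VARCHAR(20) NOT NULL,  -- 'volume', 'weight', 'count'
    canonicalUnit VARCHAR(20) NOT NULL   -- 'ml', 'g', 'item'
);

CREATE TABLE ingredient (
    idIngredient INT PRIMARY KEY,
    name         VARCHAR(200) NOT NULL,
    idMeasure    INT NOT NULL REFERENCES measure (idMeasure)
);

CREATE TABLE recipeIngredient (
    idRecipe     INT NOT NULL,
    idIngredient INT NOT NULL REFERENCES ingredient (idIngredient),
    qtyCanonical NUMERIC NOT NULL,       -- always in the measure's canonical unit
    enteredUnit  VARCHAR(20) NOT NULL    -- what the author typed, as a safeguard
);
```

All the gallons/teaspoons/cups conversion then lives in the UI layer, never in the stored quantities.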

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE

NihilCredo posted:

I would have three or more possible measures (volume, weight, count) and exactly one canonical unit for each measure (milliliter, gram, item). Ingredients should be associated with a measure, and each recipe-ingredient should have a quantity referring to the canonical unit.

Other units like gallons, pounds, teaspoons, cups etc., being fixed, should be hardcoded conversions to/from the base unit in the UI layer (have a dropdown menu to choose SI/imperial/"american household" units) and do not need to be handled in the database. For a recipe site, you can aggressively apply integer rounding so that you can freely convert back and forth between e.g. ounces and grams without loss of precision.

However, you should store an identifier of the units that the recipe-creator inputted as a safeguard against later changes, i.e. in case some psycho PM suddenly decides 'actually a teaspoon should be 4.97ml not 5ml, update all recipes!'.

lol, have gnu units be the frontend for this and if it fails validation to your standardized volume/weight/count units just don't accept the recipe

pro enterprise strategies right here, "that's not my problem".

redleader
Aug 18, 2005

Engage according to operational parameters
don't forget that recipes can use the same ingredient multiple times - e.g. you use 1 cup water in step one, then 2 cups in step 3. personally i wouldn't fold those into one entry, since they're used for different purposes and it's convenient to see at a glance

if you want to make it more complicated, ingredients are often (but not always!) grouped into sections - one for the cake, one for the icing, etc

nielsm
Jun 1, 2009



You obviously need to have ingredients that have a recipe of their own too. Like you have a recipe for mayonnaise and then a recipe for a mayo-based burger dressing, and then a recipe for a hamburger using that dressing.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

NihilCredo posted:

I would have three or more possible measures (volume, weight, count) and exactly one canonical unit for each measure (milliliter, gram, item). Ingredients should be associated with a measure, and each recipe-ingredient should have a quantity referring to the canonical unit.

...
That's probably what I'd want to see as well. Having an endless supply of non-base units will lead to problems eventually, as well as more complicated data validation and sanity checking.

Guildenstern Mother
Mar 31, 2010

Why walk when you can ride?
So the data that's going to populate a table is (hopefully) going to come from an API that reads a recipe site and returns the recipe as json data. It will then display on an edit page before being saved to the database to minimize weird fuckery. Since this is just a minimum viable product I'm going to probably set that as a 'would be nice' goal and see how it's going before the assignment deadline, but I suppose I could just check the inputs against a list and if it's not on there demand it be fixed before saving. Sounds like a problem for the people doing the front end.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Guildenstern Mother posted:

So the data that's going to populate a table is (hopefully) going to come from an API that reads a recipe site and returns the recipe as json data. It will then display on an edit page before being saved to the database to minimize weird fuckery. Since this is just a minimum viable product I'm going to probably set that as a 'would be nice' goal and see how it's going before the assignment deadline, but I suppose I could just check the inputs against a list and if it's not on there demand it be fixed before saving. Sounds like a problem for the people doing the front end.

If that's the case, may be worthwhile to just copy the schema of the json.

Lumpy
Apr 26, 2002

La! La! La! Laaaa!



College Slice
Postgres 11.x dummy question. Given this table:

code:
CREATE TABLE public.thing_other_relationship
(
  id                VARCHAR(37) PRIMARY KEY,
  name              VARCHAR(100) NOT NULL,
  relationship_type VARCHAR(50)  NOT NULL,
  thing_id          VARCHAR(37)  NOT NULL,
  other_id          VARCHAR(37)  NOT NULL,
  FOREIGN KEY (other_id) REFERENCES other (id),
  FOREIGN KEY (thing_id) REFERENCES thing (id)
);
and the desire to change it so relationships are stored as a VARCHAR array (so there can be multiple relationship types instead of just one in the same record) while preserving the existing data is this sane / correct :

code:
ALTER TABLE   public.thing_other_relationship
  ADD COLUMN relationships VARCHAR(50) ARRAY USING array[relationship_type];

ALTER TABLE   public.thing_other_relationship
  DROP COLUMN relationship_type;

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

You can do it in a single statement via ALTER TABLE [table] ALTER COLUMN [column] TYPE [new type] USING [mapping].

However if you don't want to have any downtime, you should do it in three steps:

1) Add the new column and a trigger to keep it in sync with any changes to the old column

2) Update your clients to a new version that uses only the new column

3) Drop the old column and the trigger
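Applied to the table above, the single-statement version would be something like:

```sql
-- Convert in place: each existing value becomes a one-element array.
ALTER TABLE public.thing_other_relationship
    ALTER COLUMN relationship_type TYPE VARCHAR(50)[]
    USING ARRAY[relationship_type];

-- Then rename to reflect the plural meaning, if desired.
ALTER TABLE public.thing_other_relationship
    RENAME COLUMN relationship_type TO relationships;
```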

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
I have so many questions.

Is this really PostgreSQL (it sure looks like "it came from MySQL")?

Why are there. so. many. varchars?

If this is a thing-to-other relationship, is that single relationship of multiple types?

What are the storage and performance impacts of changing a field to an array? Arrays exist, but that doesn't mean they're great for indexing, querying, etc.

What are these types? Are they normalized?

Lumpy
Apr 26, 2002

La! La! La! Laaaa!



College Slice

PhantomOfTheCopier posted:

I have so many questions.

Is this really PostgreSQL (it sure looks like "it came from MySQL")?

Yes.

PhantomOfTheCopier posted:

Why are there. so. many. varchars?

Because all the IDs are UUIDs and The People Who Came Before™ use that to store them. :shrug:

PhantomOfTheCopier posted:

If this is a thing-to-other relationship, is that single relationship of multiple types?

It's a single relationship that can have different aspects. In practice, it is a relationship between an asset and an organization. They can be the owner, the operator, or both.

PhantomOfTheCopier posted:

What are the storage and performance impacts of changing a field to an array? Arrays exist, but that doesn't mean they're great for indexing, querying, etc.

We are in the process of changing the way data is kept: previously there was no idea of a relationship, each asset had a single organization ID on it. We now have assets that are owned by one customer, and operated by another, so we need to be able to tie the asset to both orgs. As for the need for the change in question: The table got made by another dev before the decision was made to keep the "type of relationship" in an array. There are only a few actual records in the table, and only on a staging DB, so the switch will have little to no impact. We have no real use case for querying based on that array field, so performance should be fine.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
But PostgreSQL has had a UUID data type for a decade and it's faster and more efficient than a toasted text field. https://www.postgresql.org/docs/14/datatype-uuid.html

:smithicide:

PhantomOfTheCopier fucked around with this message at 00:08 on Dec 19, 2021

Lumpy
Apr 26, 2002

La! La! La! Laaaa!



College Slice

PhantomOfTheCopier posted:

But PostgreSQL has had a UUID data type for a decade and it's faster and more efficient than a toasted text field. https://www.postgresql.org/docs/14/datatype-uuid.html

:smithicide:

I’ll just go back in time and have the people who did that not do it. Thanks!

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



PhantomOfTheCopier posted:

EG "milk" had better be a volume.

Might not be if you're baking

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

I'm a month late on this, but in case anyone else missed it: SQLite has finally added a STRICT mode with type checking. It still does the standard SQL type coercion unfortunately, but it's way better than the free-for-all it used to be.

Between this and supporting projects like Litestream, I would seriously explore running new backend projects on SQLite unless I needed access to specific tooling/features from the Postgres ecosystem.
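For anyone who hasn't seen it, a STRICT table (SQLite 3.37+) looks like this; table and column names are just for illustration:

```sql
-- Column types are enforced at insert time instead of silently coerced.
CREATE TABLE account (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    score REAL
) STRICT;

-- Plain SQLite would happily store the string in the REAL column;
-- a STRICT table raises a type error instead (numeric-looking text
-- like '12' is still coerced, which is the remaining caveat).
INSERT INTO account (id, name, score) VALUES (1, 'alice', 'oops');
```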

Happiness Commando
Feb 1, 2002
$$ joy at gunpoint $$

I'm doing a presentation at work using AWS Athena to query some structured CSVs. I know very little SQL past the thread title. Please help.

I have a table populated by synthetic health record data as such (the "description" field is doctors notes or equivalent):
code:
Patient Name 	Encounter Date		Encounter Type		Condition	Description
I have another table that used some natural language processing model to parse the description field and determine what the condition is
code:
Patient Name	Encounter Date		Encounter Type		Parsed Condition
I need to create a third table which shows all records where Table1.Condition = $null AND Table2.ParsedCondition != $null

I don't even know how to start thinking about translating my requirement into an actual query. What kind of join do I use?

Edit:
Documentation suggested FULL OUTER JOIN to mash different tables together. So starting with a create table that did
code:
SELECT t1.foo1, t1.foo2, t2.foo3 
FROM databasename.bar1 t1
FULL OUTER JOIN databasename.bar2 t2
ON bas1 = otherbas1
I changed it to
code:
SELECT t1.foo1, t1.foo2, t2.foo3, t3.foo4 FROM databasename.bar1 t1
FULL OUTER JOIN databasename.bar2 t2
ON bas1 = otherbas1
FULL OUTER JOIN databasename.bar3 t3
ON bas2 = otherbas2

WHERE t2.foo3 IS NULL AND t3.foo4 IS NOT NULL
The obfuscation doesn't make it easy to read, but the query certainly works, and then I can just prefix it with create table databasename.structured_vs_parsed as and that should be it, I think

Happiness Commando fucked around with this message at 21:25 on Dec 30, 2021

my homie dhall
Dec 9, 2010

honey, oh please, it's just a machine

NihilCredo posted:

I'm a month late on this, but in case anyone else missed it: SQLite has finally added a STRICT mode with type checking. It still does the standard SQL type coercion unfortunately, but it's way better than the free-for-all it used to be.

Between this and supporting projects like Litestream, I would seriously explore running new backend projects on SQLite unless I needed access to specific tooling/features from the Postgres ecosystem.

I don't really see the comparison; sqlite and postgres occupy two almost entirely disjoint niches, and the "killer feature" most postgres users need that sqlite does not have is "available for concurrent access over a network".

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Happiness Commando posted:

I need to create a third table which shows all records where Table1.Condition = $null AND Table2.ParsedCondition != $null

I dont even know how to start thinking about translating my requirement to an actual query. What kind of join do I use?
Maybe I'm asleep, but no special type of join. Select rows from table2 where all the fields of the "key" match and where table1 has a null condition. Sure it's a "join" in the sense that there are two tables, but it's inner.

select t2.* from t2,t1 where t2.patient=t1.patient and t2.date=t1.date and t2.type=t1.type and t1.condition is null;

(Everyone who didn't answer will now come back and complain about the lack of an explicit join.) (Let the database do its job.)

You can also look up the benefits of EXISTS for queries of this type. EG, where exists (select patient, date,type from t1 where condition is null).
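Spelled out against the two tables described above (table and column names shortened and assumed here), that inner-join version reads:

```sql
-- Rows where the structured Condition is missing but the NLP pass found one.
SELECT t2.patient_name, t2.encounter_date, t2.encounter_type, t2.parsed_condition
FROM   table2 t2
JOIN   table1 t1
  ON   t1.patient_name   = t2.patient_name
 AND   t1.encounter_date = t2.encounter_date
 AND   t1.encounter_type = t2.encounter_type
WHERE  t1.condition IS NULL
  AND  t2.parsed_condition IS NOT NULL;
```

Prefix it with `CREATE TABLE ... AS` to materialize the third table, as in the post above.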

abelwingnut
Dec 23, 2002


i'm trying to teach someone ssms and t-sql. what is the cheapest and fastest way to set them up for this?

an issue, though: they're on an old mac. like 2012 old. anyway, i know ssms is on mac, but i've heard...less than stellar things. so i was curious if ms had copied snowflake by now and set up their own online ssms platform. does this exist? if so, then this would be pretty easy. could just set up an AdventureWorks db on there and get on with teaching.

if not, then what would you all recommend? it's not out of the question we go and get a cheap pc for this, but i'd rather not buy anything. i know an online version of ssms isn't ideal, either, but this someone isn't looking to admin, just script. last resort, we just download old copies of ssms and express for her mac and go from there, but it seems like this approach might disrupt the teaching at points, so really looking for something other than this.

thanks.

abelwingnut fucked around with this message at 18:42 on Jan 6, 2022

nielsm
Jun 1, 2009



You can maybe sign up for a virtual desktop on Azure. Not sure if you can get those as an individual or only as an enterprise customer.

Or go install Windows on the Mac, I don't think you can run new macOS versions on it anyway, and will get locked out of more and more software as developers update their SDKs.

Just-In-Timeberlake
Aug 18, 2003

abelwingnut posted:

i'm trying to teach someone ssms and t-sql. what is the cheapest and fastest way to set them up for this?

an issue, though: they're on an old mac. like 2012 old. anyway, i know ssms is on mac, but i've heard...less than stellar things. so i was curious if ms had copied snowflake by now and set up their own online ssms platform. does this exist? if so, then this would be pretty easy. could just set up an AdventureWorks db on there and get on with teaching.

if not, then what would you all recommend? it's not out of the question we go and get a cheap pc for this, but i'd rather not buy anything. i know an online version of ssms isn't ideal, either, but this someone isn't looking to admin, just script. last resort, we just download old copies of ssms and express for her mac and go from there, but it seems like this approach might disrupt the teaching at points, so really looking for something other than this.

thanks.

sign up for an AWS account, you get one year of free tier stuff. Low specs but who cares, it'll just be running SSMS

create a free tier Server 201x instance and remote into that.

NPR Journalizard
Feb 14, 2008

nielsm posted:

You can maybe sign up for a virtual desktop on Azure. Not sure if you can get those as an individual or only as an enterprise customer.

Or go install Windows on the Mac, I don't think you can run new macOS versions on it anyway, and will get locked out of more and more software as developers update their SDKs.

Azure SQL VM instances can be free as long as they are a dev server and don't go into prod.


Guildenstern Mother
Mar 31, 2010

Why walk when you can ride?
Is it possible someone could explain composite tables and their use to me, in the kind of terms you'd use with a particularly slow learning child?
