DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003

Hughmoris posted:

Rookie question here:

I have a small sample database on SQL Server. I want to import those tables into Postgres for self-learning. In SQL Server, I exported each table as a CSV.

What is standard practice for importing data into a database, in regards to keys and constraints? Should I create new tables in PG, import the CSVs, then go back and ALTER TABLES to add constraints and the like?

*Also, is it better to generally dump a CSV copy into temp tables and then insert into the final tables, or does it not really matter in practice?

yes, it will be faster if you do the csv import before creating any constraints or indexes. otherwise the db has to check/modify those for each row while doing the import, which is slower than doing that work in bulk once the table is populated. but for a small sample db it may not make much difference

i don't think using a temp table will make it go any faster
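
to make that concrete, the order of operations would look roughly like this (just a sketch, made-up table/file names, adjust to your csv):

code:
-- create the bare table first, no keys/indexes yet
create table customer (
    customer_id int,
    lname       text
);

-- bulk load the csv from inside psql (reads the file client-side)
\copy customer from 'customer.csv' with (format csv, header true)

-- then add constraints and indexes in one pass over the populated table
alter table customer add primary key (customer_id);
create index customer_lname_idx on customer (lname);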


Hughmoris
Apr 21, 2007
Let's go to the abyss!

DELETE CASCADE posted:

yes, it will be faster if you do the csv import before creating any constraints or indexes. otherwise the db has to check/modify those for each row while doing the import, which is slower than doing that work in bulk once the table is populated. but for a small sample db it may not make much difference

i don't think using a temp table will make it go any faster

Thanks!

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Question about SQL Server and Transactions, as I seek to better understand them...

I was under the novice impression that changes to a table could not be queried before said transaction is complete?

SQL code:
CREATE TABLE test (
    id INT IDENTITY(1,1),
    lname NVARCHAR(20)
);

BEGIN TRANSACTION

INSERT INTO [dbo].[test] (lname) VALUES ('john');

SELECT * FROM TEST;  -- this returns John

ROLLBACK TRANSACTION;
Being able to query 'John' before the transaction is complete appears to be default behavior?

RandomBlue
Dec 30, 2012

hay guys!


Biscuit Hider

Hughmoris posted:

Question about SQL Server and Transactions, as I seek to better understand them...

I was under the novice impression that changes to a table could not be queried before said transaction is complete?

SQL code:
CREATE TABLE test (
    id INT IDENTITY(1,1),
    lname NVARCHAR(20)
);

BEGIN TRANSACTION

INSERT INTO [dbo].[test] (lname) VALUES ('john');

SELECT * FROM TEST;  -- this returns John

ROLLBACK TRANSACTION;
Being able to query 'John' before the transaction is complete appears to be default behavior?

They're part of the same transaction, so that select query reads the new state. Everything between BEGIN TRANSACTION and COMMIT or ROLLBACK TRANSACTION shares that transaction and the associated changes.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

RandomBlue posted:

They're part of the same transaction, so that select query reads the new state. Everything between BEGIN TRANSACTION and COMMIT or ROLLBACK TRANSACTION shares that transaction and the associated changes.

I am a big dumb-dumb. :doh:

That makes perfect sense. Thanks!

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Open another query window / client / terminal before doing the rollback, run the select query in there, and new rows won't show up. With uniqueness constraints an insert will block in the second client, if there's a conflict, until the first transaction commits or aborts (or something hits a timeout).

With the right combination you can even test deadlock behavior.
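
A rough two-session sketch using the test table from above (assuming the default READ COMMITTED level; with READ_COMMITTED_SNAPSHOT on, the second select returns right away without the new row instead of blocking):

SQL code:
-- session 1
BEGIN TRANSACTION;
INSERT INTO dbo.test (lname) VALUES ('john');
-- leave the transaction open

-- session 2, in a separate window/connection
SELECT * FROM dbo.test;   -- blocks on session 1's row lock until it commits or rolls back

-- back in session 1
ROLLBACK TRANSACTION;     -- session 2's select now completes, without the new row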

Seventh Arrow
Jan 26, 2005

I could be totally wrong on this, but wouldn't you also have to do a nolock on the SELECT query?

RandomBlue
Dec 30, 2012

hay guys!


Biscuit Hider

Seventh Arrow posted:

I could be totally wrong on this, but wouldn't you also have to do a nolock on the SELECT query?

Nope.

Condimentalist
Jun 13, 2007
Kind of random question: Has anyone taken the oracle database foundations 1z0-006 exam? I am trying to test out of some college classes; I have been using ANSI sql and t-sql for ten years but I haven’t touched oracle.

Wondering if I need to study pl/sql or if it’s pretty generic.

RandomBlue
Dec 30, 2012

hay guys!


Biscuit Hider

Condimentalist posted:

Kind of random question: Has anyone taken the oracle database foundations 1z0-006 exam? I am trying to test out of some college classes; I have been using ANSI sql and t-sql for ten years but I haven’t touched oracle.

Wondering if I need to study pl/sql or if it’s pretty generic.

Until you get to custom stored procs, functions and packages it's not drastically different from other SQL. Or wasn't when I worked with it for almost 10 years in the 90's and 00's and I doubt they've gotten much worse in that regard.

You might take a look at their basic syntax for those and do a trial test if there is one.

I haven't taken their exam though.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
It would be wise to check the expected content and format of their exam. If it's multiple choice, for example, the knowledge will transfer well. If it's a practical examination, you could fail by wasting all your time "trying to figure out what's with this crazy quoting and square bracket syntax :rant:".

The data types in plain SQL are also somewhat limited so you'll see a lot of subtle variation across databases. EG, how to store and index a network address (IPv4 or 6), or a time interval.


vvv Well you get 25% for free, so you only need to make up the 35% difference and you're there!

PhantomOfTheCopier fucked around with this message at 21:56 on Aug 10, 2022

Condimentalist
Jun 13, 2007
That is very helpful, thank you!

It is multiple choice and a passing score is only 60% so I think I’ll just give it a shot.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
If I have a table like so:


What is a clean way to go about returning a row with a count of Bob/Sarah for each column? Ideal result is below. Is this where I need to start utilizing unions and CTEs?



Name | A | B | C | D|
Bob | 1 | 2 | 1 | 1
Sarah | 5 | 4 | 5 | 5

Condimentalist
Jun 13, 2007
Something like this should work. Phone posting.
Select
Sum(Case when a = 'bob' then 1 else 0 end) as aBobCount,
Sum(Case when b = 'bob' then 1 else 0 end) as bBobCount
From table

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Hughmoris posted:

If I have a table like so:


What is a clean way to go about returning a row with a count of Bob/Sarah for each column? Ideal result is below. Is this where I need to start utilizing unions and CTEs?



Name | A | B | C | D|
Bob | 1 | 2 | 1 | 1
Sarah | 5 | 4 | 5 | 5

I'm assuming here that the distinct names need to be figured out by aggregating over all of the columns together. If the names can be hardcoded into the query as literals then you can just use filtered aggregation directly like in the post above.

The strategy here is to first unpivot the table so the columns become rows, and then we count with a group by as usual:

SQL code:
select
  pivoted_table.name,
  count(*) filter(where pivoted_table.col = 'a') as a_count,
  count(*) filter(where pivoted_table.col = 'b') as b_count,
  count(*) filter(where pivoted_table.col = 'c') as c_count,
  count(*) filter(where pivoted_table.col = 'd') as d_count
from (
  select a as name, 'a' as col from some_table
  union all
  select b as name, 'b' as col from some_table
  union all
  select c as name, 'c' as col from some_table
  union all
  select d as name, 'd' as col from some_table
) as pivoted_table
group by pivoted_table.name
The count with filter thing is postgres specific, but you can do the same thing with a case statement in any SQL dialect, it's just a bit more obnoxious (the post above mine uses exactly that technique). Also, if you're on MSSQL it's probably more idiomatic to use UNPIVOT on the table instead of manually unpivoting it with unions like I did here (postgres doesn't have unpivot). If you like you can also put the pivoted table in a CTE; I probably would have if the query had been any more complex, just because I think it's more readable that way.

Fiddle: https://www.db-fiddle.com/f/cUcSfoLR1ziHLpmhMDNdYL/0

There are other ways to accomplish this by the way, this is just what came first to my mind.
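
For reference, the MSSQL UNPIVOT version mentioned above would look roughly like this (untested sketch, same made-up some_table; note that UNPIVOT silently drops rows where the value is null):

SQL code:
select
  unpvt.name,
  sum(case when unpvt.col = 'a' then 1 else 0 end) as a_count,
  sum(case when unpvt.col = 'b' then 1 else 0 end) as b_count,
  sum(case when unpvt.col = 'c' then 1 else 0 end) as c_count,
  sum(case when unpvt.col = 'd' then 1 else 0 end) as d_count
from some_table
unpivot (name for col in (a, b, c, d)) as unpvt
group by unpvt.name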

TheFluff fucked around with this message at 02:27 on Aug 16, 2022

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Condimentalist posted:

Something like this should work. Phone posting.
Select
Sum(Case when a = 'bob' then 1 else 0 end) as aBobCount,
Sum(Case when b = 'bob' then 1 else 0 end) as bBobCount
From table

Thanks!

TheFluff posted:

I'm assuming here that the distinct names need to be figured out by aggregating over all of the columns together. If the names can be hardcoded into the query as literals then you can just use filtered aggregation directly like in the post above.

The strategy here is to first unpivot the table so the columns become rows, and then we count with a group by as usual:

SQL code:
select
  pivoted_table.name,
  count(*) filter(where pivoted_table.col = 'a') as a_count,
  count(*) filter(where pivoted_table.col = 'b') as b_count,
  count(*) filter(where pivoted_table.col = 'c') as c_count,
  count(*) filter(where pivoted_table.col = 'd') as d_count
from (
  select a as name, 'a' as col from some_table
  union all
  select b as name, 'b' as col from some_table
  union all
  select c as name, 'c' as col from some_table
  union all
  select d as name, 'd' as col from some_table
) as pivoted_table
group by pivoted_table.name
The count with filter thing is postgres specific, but you can do the same thing with a case statement in any SQL dialect, it's just a bit more obnoxious (the post above mine uses exactly that technique). If you're on MSSQL it's probably more idiomatic to use UNPIVOT on the table instead of manually unpivoting it with unions like I did here (postgres doesn't have unpivot). If you like you can also put the pivoted table in a CTE; I probably would have if the query had been any more complex, just because I think it's more readable that way.

Fiddle: https://www.db-fiddle.com/f/cUcSfoLR1ziHLpmhMDNdYL/0

This is terrific and fits exactly what I need. I didn't know such a thing as db-fiddle existed, seems like it'll come in very handy. Thanks!

Dawncloack
Nov 26, 2007
ECKS DEE!
Nap Ghost
I hope this is the right place to ask.

I have been learning sql for a while now for a volunteer project.

For the next iteration we are going to use Postgres. Looking at the database for the first iteration, well, it needs a lot of work. It reaches 1NF but certainly not 2NF. And now that I'm trying to get some number crunching done on the dataset of the first project, well, it ain't easy.

I have been learning about foreign keys, junction tables, joins, dependencies, etc., to propose something much better.

- what are some pitfalls that a beginner might fall into? What should I be looking at?

- Are model databases, or database standards, a thing? I would like to do something neat and then make a paper out of it, so that similar projects can just copy, modify and paste, and I don't know if it's done.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
Microsoft's SQL server docs all use the same example database that you can download and play around with, it's called AdventureWorks I think? But other than that not really.

The normal forms are important theoretical concepts but in day to day work I don't think I've ever heard them brought up. You normalize as much as you feel like and sometimes denormalize for performance or ease of use.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE
Beginner pitfalls, or I guess just a bunch of random Big Thinks from a number of years of experience:


- natural primary keys are neat theoretically but aren't really practical in a real system. always use surrogate primary keys (serial int or UUID, doesn't really matter). by extension, don't use composite PK's either.

- the theoretical foundation of SQL is really cool and good, and it's very good to be familiar with it, but in reality you'll inevitably end up in some kinda ugly places. sometimes that's fine, sometimes it's super obnoxious, and a big part of getting Good At Relational Databases is learning when you can get away with cheating.

- constraints are good. making invalid states unrepresentable is much better than trying to check for them at runtime.

- relatedly, NOT NULL should be the default. consider carefully before making a column nullable.

- the fact that right join exists doesn't mean you should use it, it just confuses people. rewrite as a left join.

- you don't need to treat CTE's as some special construct, you can (and probably should) just use them to name subqueries for readability.

- for some reason semi-joins and anti-joins don't seem to get taught very often, but they're very common in practice. the proper way to write them is with EXISTS()/NOT EXISTS() subqueries; prefer that over various shenanigans with IN(subquery) and left joins (quick sketch after this list).

- get a good query plan visualizer and start looking at query plans. https://explain.dalibo.com/ is a great visualizer for postgres plans. the sooner you start understanding what the database is actually doing the better. you might want a book to help you with this.

- ORM's are bad, query builders are good.
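
re: the semi-join/anti-join point above, a minimal sketch of the EXISTS()/NOT EXISTS() forms (made-up customer/orders tables):

SQL code:
-- semi-join: customers that have at least one order
select c.customer_id, c.name
from customer as c
where exists (
  select 1 from orders as o where o.customer_id = c.customer_id
);

-- anti-join: customers with no orders at all
select c.customer_id, c.name
from customer as c
where not exists (
  select 1 from orders as o where o.customer_id = c.customer_id
);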

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Speaking of anti-joins, I just read this very timely article on the topic: https://www.crunchydata.com/blog/rise-of-the-anti-join

Lonely Wolf
Jan 20, 2003

Will hawk false idols for heaps and heaps of dough.

TheFluff posted:

- natural primary keys are neat theoretically but aren't really practical in a real system. always use surrogate primary keys (serial int or UUID, doesn't really matter). by extension, don't use composite PK's either.

Natural primary keys are perfectly fine. The problem is that you don't actually have any. You have things that look like they might be but oops oh no haha oh poo poo. If you actually have them go ahead and use them. But you don't. So use surrogate primary keys.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

Hughmoris posted:

Speaking of anti-joins, I just read this very timely article on the topic: https://www.crunchydata.com/blog/rise-of-the-anti-join

Good article. Personally I prefer to always use the NOT EXISTS() form over the LEFT JOIN form, for a couple of reasons - mostly subjective and related to readability. Most important is the clarity of intent. NOT EXISTS() tells me immediately that here's an anti-join. A left join can be used for a whole bunch of things, and to figure out that it's actually an anti-join I need to look at the tbl.id IS NULL bit in the WHERE clause, where it might be easy to overlook in a complex query. To me, the FROM clause is where I look to figure out "okay, which tables are available in the SELECT and WHERE clauses", and if I'm not selecting anything from a table, I prefer to not have it there. In other words, in my mind, both semi-joins and anti-joins logically belong in the WHERE clause. This is also why I prefer EXISTS() to an inner join even in simple cases where they're equivalent, if I'm not selecting anything from the joined table.

More generally, I really like trying to keep the FROM clause as simple as possible. If I have to join on a subquery, I almost always put the subquery in a CTE instead of having it in the FROM clause. This is simply because the FROM clause is the starting point of the query, and almost always where I start reading. It's where all of your table aliases are, after all, so it's where you go to look to figure out where things are defined. Also, being able to quickly figure out e.g. where the duplicating joins are is very helpful to understanding what a query is doing.

Oh, and on that note, if you have a query with even a single join in it, for God's sake spell out all of your column references as tbl_name.col_name! ! ! I don't care if the column name is unambiguous, saving three seconds by omitting the table name when writing the query isn't worth making every single person reading the query have to double-check which table the column is coming from!
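
To spell out the two anti-join spellings being compared here (same made-up customer/orders tables as in my earlier sketch):

SQL code:
-- left join form: the intent only becomes clear at the o.order_id is null check
select c.customer_id
from customer as c
left join orders as o on o.customer_id = c.customer_id
where o.order_id is null;

-- not exists form: the anti-join is visible up front, and orders stays out of the from clause
select c.customer_id
from customer as c
where not exists (
  select 1 from orders as o where o.customer_id = c.customer_id
);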

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I swear I have no affiliation with Crunchy Data but they (literally) just released an online playground for Postgres.

Question for those who use Postgres, before I go off and make a terrible tool. Is there anything that will read a CSV or Parquet file and auto-generate a .pgsql script that will create the tables and schemas for said CSV, so I can import it? Right now for learning purposes, I'm having to type out all the columns and types of said CSV before I can copy it with psql. It gets a bit tiring with multiple CSVs, or lots of columns.

If there isn't a handy tool already built, I might put something together in Python.

lazerwolf
Dec 22, 2009

Orange and Black

TheFluff posted:


- ORM's are bad, query builders are good.

Can you expound on this? Why are ORMs bad

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

lazerwolf posted:

Can you expound on this? Why are ORMs bad

This post is more than 15 years old now, but still pretty relevant: https://blog.codinghorror.com/object-relational-mapping-is-the-vietnam-of-computer-science/.

ORMs are an extremely leaky abstraction. To efficiently use an ORM, you have to also understand the basics of how the ORM translates its own API to queries, how its generated tables will actually work on the database, and what effective queries will look like for your data.

That said, I start with an ORM most of the time on most applications, and will defend that decision. Every sane ORM has the ability to let you hand-write your own queries where you need to, and ORMs result in less boilerplate code than a query builder for all of the basic operations.

ORMs aren't bad, they're a trade off to move faster by accepting more complexity. Usually this is a trade-off that I personally am willing to make. They also let more people write data access code without confident SQL understanding, which is fine as long as other people who do understand the ORM / SQL are reviewing code.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

lazerwolf posted:

Can you expound on this? Why are ORMs bad

Interacting with a relational database from some other application is basically a two step process: generating the query (in most practical cases it contains some form of dynamically generated stuff), and translating the resulting rowset to some kind of native object that makes sense to work with in the application code. ORM's try to do both of these for you by starting from the end result: you declare an object hierarchy in the application (or get it generated for you based on your database schema), and from there the ORM creates an API that loads and stores objects in this hierarchy. This is the wrong way to do it and creates a lot of problems, and I will try to explain why.

The big problem here is that the abstraction ORM's start from is fundamentally flawed. Yes, tables and rows can be represented as classes and objects, but the point of having a relational database is having relationships, duh, and even something as simple as a many-to-many relationship with some kind of extra column on the join table gets pretty awkward to represent and work with in most ORM's. Many concepts that are simple and natural in SQL do not easily translate to operations on the kind of data structures ORM's tend to use - even relatively simple semi-joins are usually quite awkward to generate in many ORM's, because the fundamental assumption is that if you want to filter by a relation, then you must also want to fetch it (because touching a relationship translates to a left join). Queries that include aggregations or really any computed column that doesn't exist in the database are second class citizens that are usually really awkward to work with, both when it comes to actually writing the query and when it comes to getting the result into an application level object. Simple operations like a ".Save()" method on an object obscure the way the database actually updates things and makes it very easy to get data races. Most ORM's are also pretty bad at query generation, and a lot of pretty simple relationship loading operations tend to result in the "N+1 queries" problem - you get one query to load a bunch of rows, then one query per row in the resultset to load a bunch of related data, which is obviously a huge performance headache.
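
To make the N+1 problem concrete in terms of what actually hits the database (made-up author/book tables):

SQL code:
-- what naive lazy loading ends up issuing: one query for the parent rows...
select * from author;
-- ...then one query per parent row for the related rows
select * from book where author_id = 1;
select * from book where author_id = 2;
select * from book where author_id = 3;
-- (and so on, once per author)

-- what you actually wanted: a single round trip
select author.id, author.name, book.id, book.title
from author
left join book on book.author_id = author.id;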

A relational database is an immensely powerful and flexible tool, and even though SQL has a lot of warts as a programming language, it's still a very expressive DSL that there's no point in trying to replace. Trying to pretend the database is just a dumb object storage and abstracting away SQL in favor of a much more limited API that uses fundamentally different abstractions creates a lot of problems. The abstractions naturally become leaky while still obscuring a lot of what is actually going on, and they make it difficult to use the database for many tasks it is actually very good at. Sometimes you hear the counter-argument that almost all ORM's let you write raw SQL if you need to, but this almost always means you get to write the SQL as a string with no means to do dynamic query generation, which isn't good enough. There are upsides to ORM's - they make it very easy to do a lot of simple CRUD stuff, for example - but most of those things aren't actually hard problems to solve, especially not if you have a good query builder library.

So, what you should do instead is to approach the problem from the other end: start with the query generation. One of the main problems with SQL is that it really doesn't have any mechanisms for metaprogramming, code reuse, or composition, but by using a query builder library we can get those mechanisms. Also, starting from the query building end doesn't stop us from having sophisticated object mapping functionality, and most good query builders come with that too. It just tends to be a lot more flexible when you don't start with rigid assumptions about relationship hierarchies. Some query builders also end up with some kind of hybrid API, where you get ORM-like shortcut API's for simple functionality but also a full query builder for when you want to dynamically generate arbitrary SQL. Two that come to mind are objection.js and SQLAlchemy.


tl;dr: ORM's are bad because what they're trying to do is avoid writing SQL, which is bad because SQL is actually cool and good and you should be writing it. Thanks for coming to my TED talk.

TheFluff fucked around with this message at 21:09 on Aug 17, 2022

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Hughmoris posted:

Question for those who use Postgres, before I go off and make a terrible tool. Is there anything that will read a CSV or Parquet file and auto-generate a .pgsql script that will create the tables and schemas for said CSV, so I can import it? Right now for learning purposes, I'm having to type out all the columns and types of said CSV before I can copy it with psql. It gets a bit tiring with multiple CSVs, or lots of columns.
I almost always start with a blank file followed by BEGIN, table creation commands, possibly indexes, and maybe some initial data, followed by ROLLBACK. Then you just get in psql in another open terminal and use "\i filename" and watch it run and hopefully complete without errors.

Copy paste extend rinse repeat.

In some cases I'll do data tests with https://www.postgresql.org/docs/current/sql-savepoint.html so I can still create all the DDL stuff yet have empty tables at the end. Sometimes I'll leave off the ROLLBACK in the file so I can look around in psql before deciding what is needed. When done, update to COMMIT, input the file, done.

Before going off the deep end with endless contradictory information online, make sure you're comfortable with the introductions in the PostgreSQL documentation. Overall those docs are amazing, well written, excellent examples, lots of technical detail, and things follow best practice.

(Except maybe partitioning. Don't do that without a good reason.)
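
The skeleton file I start from looks roughly like this (made-up table, obviously):

code:
-- schema.sql, run from psql with:  \i schema.sql
BEGIN;

CREATE TABLE person (
    person_id  integer PRIMARY KEY,
    lname      text NOT NULL
);

CREATE INDEX person_lname_idx ON person (lname);

-- optional seed data for testing
INSERT INTO person (person_id, lname) VALUES (1, 'john');

ROLLBACK;  -- flip to COMMIT once the whole file runs clean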

Hughmoris
Apr 21, 2007
Let's go to the abyss!

PhantomOfTheCopier posted:

I almost always start with a blank file followed by BEGIN, table creation commands, possibly indexes, and maybe some initial data, followed by ROLLBACK. Then you just get in psql in another open terminal and use "\i filename" and watch it run and hopefully complete without errors.

Copy paste extend rinse repeat.

In some cases I'll do data tests with https://www.postgresql.org/docs/current/sql-savepoint.html so I can still create all the DDL stuff yet have empty tables at the end. Sometimes I'll leave off the ROLLBACK in the file so I can look around in psql before deciding what is needed. When done, update to COMMIT, input the file, done.

Before going off the deep end with endless contradictory information online, make sure you're comfortable with the introductions in the PostgreSQL documentation. Overall those docs are amazing, well written, excellent examples, lots of technical detail, and things follow best practice.

(Except maybe partitioning. Don't do that without a good reason.)

Thanks!

All of my experience has been writing reports in T-SQL so all this DDL stuff is new to me. Since this is all just personal projects and learning, I think I'm going to take a stab at combining Python + pandas to read in a CSV -> get column info such as min/max/types -> dynamically generate a .sql file that I can then run within psql.

We'll see how it goes!

redleader
Aug 18, 2005

Engage according to operational parameters

TheFluff posted:

Beginner pitfalls, or I guess just a bunch of random Big Thinks from a number of years of experience:


- ORM's are bad, query builders are good.

good stuff, i agree with it all

re: query builders: does anyone know of any good ones (or indeed any at all) for .net?

lazerwolf
Dec 22, 2009

Orange and Black
Thanks for the insight. I mainly use an ORM for data modeling and table building using Django. I’m interested in other ways to manage my data models as code.

Krakhan
Oct 13, 2005

I see you.

redleader posted:

good stuff, i agree with it all

re: query builders: does anyone know of any good ones (or indeed any at all) for .net?

Looking into this as someone who generally prefers to write their own queries too and not deal with ORMs unless I have to, and it appears that the main one is SqlKata. I have no idea how well this actually works, but seems intriguing. The most I use otherwise when I don't have to deal with ORMs like EF is something like Dapper.

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003

TheFluff posted:

Good article. Personally I prefer to always use the NOT EXISTS() form over the LEFT JOIN form, for a couple of reasons - mostly subjective and related to readability. Most important is the clarity of intent. NOT EXISTS() tells me immediately that here's an anti-join. A left join can be used for a whole bunch of things, and to figure out that it's actually an anti-join I need to look at the tbl.id IS NULL bit in the WHERE clause, where it might be easy to overlook in a complex query. To me, the FROM clause is where I look to figure out "okay, which tables are available in the SELECT and WHERE clauses", and if I'm not selecting anything from a table, I prefer to not have it there. In other words, in my mind, both semi-joins and anti-joins logically belong in the WHERE clause. This is also why I prefer EXISTS() to an inner join even in simple cases where they're equivalent, if I'm not selecting anything from the joined table.

More generally, I really like trying to keep the FROM clause as simple as possible. If I have to join on a subquery, I almost always put the subquery in a CTE instead of having it in the FROM clause. This is simply because the FROM clause is the starting point of the query, and almost always where I start reading. It's where all of your table aliases are, after all, so it's where you go to look to figure out where things are defined. Also, being able to quickly figure out e.g. where the duplicating joins are is very helpful to understanding what a query is doing.

Oh, and on that note, if you have a query with even a single join in it, for God's sake spell out all of your column references as tbl_name.col_name! ! ! I don't care if the column name is unambiguous, saving three seconds by omitting the table name when writing the query isn't worth making every single person reading the query have to double-check which table the column is coming from!

your post on orms is really good btw. but i wanted to ask a question in response to this one. what do you think about the placement of join conditions? i'm sure we all agree that joins should be written using the join clause, not by specifying the relationships in the where clause. constant conditions seem like they belong in the where clause, but for an inner join it's equivalent to write "a join b on a.id = b.id and b.col = 1". until you switch to a left join. and what about a relationship between tables that isn't a foreign key, but is relevant for the specific query? are there like, style guidelines for this? i try to do what feels right

Dawncloack
Nov 26, 2007
ECKS DEE!
Nap Ghost
Lots of awesome answers, thanks everyone!

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

DELETE CASCADE posted:

i'm sure we all agree that joins should be written using the join clause, not by specifying the relationships in the where clause.
I think that's a fairly stretched interpretation of what they said, namely that splitting things between various joins and wheres creates unnecessary confusion, and that in general they prefer simple from statements.

The trouble with joins is that they depend very much on the database in use. PostgreSQL has a query planner and optimizer that is second to none (or maybe one). Half the joins get converted into other forms and vice versa, so the correct approach is almost always to avoid premature optimization. Many databases are awful with most queries, however, so much of the world has learned to write all sorts of custom joins to try to trick the query into behaving, instead of first considering the logic of the query, its readability, or maintainability.

By example, suppose you have selected FROM book and you're going to join on author. Do you say "JOIN author ON author.id=book.author" or do you say "JOIN author ON book.author=author.id"? Or do you say, this is just an inner join, why confuse matters when this is simply "FROM book,author WHERE author.id=book.author"? Do you even know which is faster in your database, and when should you care?

Like most style guidelines, whatever is claimed as 'best' can easily and quickly be demonstrated as 'wrong'. Consistency is a fine goal, appropriate but concise naming another, but in the long term nothing beats an individual that can read queries. Ask them what never works across multiple databases. That's the style to avoid.

redleader
Aug 18, 2005

Engage according to operational parameters

DELETE CASCADE posted:

your post on orms is really good btw. but i wanted to ask a question in response to this one. what do you think about the placement of join conditions? i'm sure we all agree that joins should be written using the join clause, not by specifying the relationships in the where clause. constant conditions seem like they belong in the where clause, but for an inner join it's equivalent to write "a join b on a.id = b.id and b.col = 1". until you switch to a left join. and what about a relationship between tables that isn't a foreign key, but is relevant for the specific query? are there like, style guidelines for this? i try to do what feels right

the way i do it is: JOIN conditions only include columns involved in the relationship between the tables. all other conditions go in the WHERE clause. this fits in with how i think of the different operations: a JOIN introduces a related table, a WHERE filters the rows. i also find it's easier to scan a query when this is followed: if you know that filter clauses are in the WHERE, then you can just skim over the table names in the FROM and JOIN and ignore the boring parts of the ON clause. so in your example the b.col = 1 would go in the WHERE

left joins might be different, of course, but i try to follow the above (and usually can, far more often than not)
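
to make the left join caveat concrete (made-up tables, with b.col = 1 as the constant condition from your example):

SQL code:
-- inner join: these two are equivalent, so it's purely a style question
select * from a join b on b.a_id = a.id and b.col = 1;
select * from a join b on b.a_id = a.id where b.col = 1;

-- left join: NOT equivalent
select * from a left join b on b.a_id = a.id and b.col = 1;  -- keeps every row of a; b columns are null unless b.col = 1
select * from a left join b on b.a_id = a.id where b.col = 1; -- the where filter silently turns it back into an inner join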


PhantomOfTheCopier posted:

By example, suppose you have selected FROM book and you're going to join on author. Do you say "JOIN author ON author.id=book.author" or do you say "JOIN author ON book.author=author.id"? Or do you say, this is just an inner join, why confuse matters when this is simply "FROM book,author WHERE author.id=book.author"?

book.author = author.id and author.id = book.author are entirely equivalent, and which one comes first comes down purely to your coding standards (i prefer the former). i shudder to think of the dbms that treats these differently

the old-style join syntax should be entirely avoided unless you have a drat good reason to (the drat good reason is that you've profiled it and found that it's substantially better than writing out the joins explicitly, and you've commented it to that effect). again, any halfassed db should treat this the same as the other two, but it wouldn't surprise me to learn that some dbs are dumber about this construct

old style joins should be a syntax error tbh

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
FROM x JOIN y ON x.a = y.a -- fine (mention the tables in the same order as they were introduced in the FROM clause)
FROM x JOIN y ON y.a = x.a -- fine (mention y first because it's the one we're joining right now)
FROM x JOIN y ON x.a = y.a JOIN z on y.b = z.b -- fine (consistent)
FROM x JOIN y ON x.a = y.a JOIN z on z.b = y.b -- wrong but in a way most people won't notice (including me) (but once I notice it, it would annoy me)

redleader posted:

old style joins should be a syntax error tbh

Yes.

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

DELETE CASCADE posted:

your post on orms is really good btw. but i wanted to ask a question in response to this one. what do you think about the placement of join conditions? i'm sure we all agree that joins should be written using the join clause, not by specifying the relationships in the where clause. constant conditions seem like they belong in the where clause, but for an inner join it's equivalent to write "a join b on a.id = b.id and b.col = 1". until you switch to a left join. and what about a relationship between tables that isn't a foreign key, but is relevant for the specific query? are there like, style guidelines for this? i try to do what feels right

I think I'm pretty much with redleader on this, but yeah I too just go by feel for the most part. I've never really seen a SQL style guide.

PhantomOfTheCopier posted:

By example, suppose you have selected FROM book and you're going to join on author. Do you say "JOIN author ON author.id=book.author" or do you say "JOIN author ON book.author=author.id"? Or do you say, this is just an inner join, why confuse matters when this is simply "FROM book,author WHERE author.id=book.author"? Do you even know which is faster in your database, and when should you care?
if i had to work with a dbms where a = b produces a different query plan than b = a, i would quit the job, move to siberia to work as a lumberjack and never touch a computer again

like seriously, I have a very hard time imagining how this could ever be true even in a toy implementation of a query planner, it just makes absolutely no sense at all

like, yeah, i agree in principle that sometimes you can't write things quite the way you like for performance reasons. postgres versions that aren't that old would always materialize cte's for example, which is often not what you want. however, not even mysql 5.7 (which i currently have the grave misfortune of having to wrangle) is anywhere near as sensitive to minor syntax difference as you're implying.

redleader posted:

old style joins should be a syntax error tbh

:agreed:

I don't like natural joins either. i'm very stupid and lazy and don't want to have to think about what the join does, so just spell it out for me thanks

Dawncloack
Nov 26, 2007
ECKS DEE!
Nap Ghost
Hey friends

As part of the project I am working on I have to solve some problems which, in my humble opinion, have to do with a badly designed DB. It's ok, my colleagues are great professionals, they can't know *everything*.

One of the things they did was to make the time local time, as opposed to UTC, so one of my first tasks was to write a script that would update the times to UTC, aware of daylight saving time changes and all of that. This is for context.

So as part of solving the device problems, I selected for entries where recording time of an entry and insertion in the database matched. These don't always match because insertion is done on all the recorded packages only when the device is connected, which isn't always.

code:
SELECT ReportRID, RecTime, inserttime FROM RecData2 WHERE RecTime = inserttime ORDER BY ReportRID;
Nothing surprising in the results:
code:
|    842426 | 2018-02-19 20:42:54 | 2018-02-19 20:42:54 |
|    842454 | 2018-02-19 20:43:50 | 2018-02-19 20:43:50 |
|    843346 | 2018-02-20 21:58:06 | 2018-02-20 21:58:06 |
However, I repeated the same search with the corrected (UTC) recording time, contained in a column that I added.
code:
SELECT ReportRID, RecTime_UTC, inserttime FROM RecData2 WHERE RecTime = inserttime ORDER BY ReportRID;
code:
|    842426 | 2018-02-19 18:42:54 | 2018-02-19 20:42:54 |
|    842454 | 2018-02-19 18:43:50 | 2018-02-19 20:43:50 |
|    843346 | 2018-02-20 19:58:06 | 2018-02-20 21:58:06 |
I did it as a test; I was not expecting it to work at all, much less that every hit in the first search would also be a hit in the second search, despite the condition (RecTime = inserttime) being patently, visually wrong.

For my own enlightenment: what's going on here?

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
You're selecting RecTime_UTC and inserttime but in the WHERE clause you're comparing RecTime (not RecTime_UTC) with inserttime


redleader
Aug 18, 2005

Engage according to operational parameters
did you update inserttime, just add that new column RecTime_UTC, or is your second where clause wrong? the two queries are identical except for a column in the select

which rdbms? i think this might be somewhat db-dependent, although i can hazard a guess...

assuming you updated inserttime as well as creating RecTime_UTC, my guess is that inserttime was inserted with a data type that had a time zone offset. using the first row from your example, i'm guessing inserttime would have been inserted as 2018-02-19 20:42:54+02:00 (instead of a time with no offset like 2018-02-19 20:42:54). this time is equivalent to the utc time 2018-02-19 18:42:54+00:00. when comparing the date columns in the where clause, the db compares the dates in utc. these are the same, so you see the same rows appearing in both queries

for example, here's how sql server's datetimeoffset type works:

quote:

The data is stored in the database and processed, compared, sorted, and indexed in the server as in UTC. The time zone offset will be preserved in the database for retrieval.
i suspect other dbs would work similarly



this makes some assumptions about your schema - specifically that something in your schema is offset-aware. probably. i think. idk, times are a nightmare even with good libraries for working with them. and dbs do not have good datetime libraries
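
if inserttime really is a datetimeoffset (big if), you can see the utc-based comparison with a throwaway sketch like this:

SQL code:
-- sql server compares datetimeoffset values on their utc instant, not their local face value
declare @rec datetimeoffset = '2018-02-19 20:42:54 +02:00';
declare @ins datetimeoffset = '2018-02-19 18:42:54 +00:00';
select case when @rec = @ins then 'equal' else 'not equal' end as result;  -- returns 'equal'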
