teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
I need help creating a SQL query that will return a cocktail from a database, given that I have supplied all the ingredients that go into that cocktail

So for example, I want the "Gin and Tonic" row to return only if I have supplied the correct ids for Gin (id is 1) and Tonic (id is 2). If I only supply "Tonic," I shouldn't get back the row.

I'm using SQLAlchemy and Flask, but I'm still having trouble wrapping my head around how the query will work altogether

This is what my table structure looks like

code:
+-------------------+
| Tables_in_my_database |
+-------------------+
| cocktails         |
| ing_in_cocktails  |
| ingredients       |
+-------------------+
This is my cocktails table

code:
+----+----------------+-------+---------+
| id | name           | glass | finish  |
+----+----------------+-------+---------+
|  1 | white russian  | rocks | stirred |
|  2 | gin and tonic  | rocks | stirred |
+----+----------------+-------+---------+
This is my ingredients table

code:
+----+---------+----------+
| id | name    | ing_type |
+----+---------+----------+
|  1 | vodka   | fruit    |
|  2 | kahlua  | fruit    |
|  3 | gin     | fruit    |
|  4 | tonic   | fruit    |
+----+---------+----------+
and this is my relational table

code:
+----+-------------+--------+
| id | cocktail_id | ing_id |
+----+-------------+--------+
|  1 |           1 |      1 |
|  2 |           1 |      2 |
|  3 |           2 |      3 |
|  4 |           2 |      4 |
+----+-------------+--------+
Here are the corresponding SQLAlchemy models

code:
class Cocktail(db.Model):
    __tablename__ = 'cocktails'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(80), nullable=False)
    glass = db.Column(db.String(20), nullable=False)
    finish = db.Column(db.String(20), nullable=True)
    ingredients = db.relationship(
        'Ingredient',
        secondary=ing_in_cocktails,
        backref=db.backref('cocktails', lazy='dynamic')
    )

class Ingredient(db.Model):
    __tablename__ = 'ingredients'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(80), nullable=False)
    ing_type = db.Column(db.String(20), nullable=False)


ing_in_cocktails = db.Table(
    'ing_in_cocktails',
    db.Column('id', db.Integer, primary_key=True),
    db.Column('cocktail_id', db.Integer, db.ForeignKey('cocktails.id')),
    db.Column('ing_id', db.Integer, db.ForeignKey('ingredients.id'))
)
This query got me most of the way there, but the issue is that it's returning "Gin and Tonic" if I provide any one of the ingredients in the cocktail

code:
# problematic because this returns "Gin and Tonic," despite not passing
# all the ingredients
Cocktail.query.join(ing_in_cocktails).filter(ing_in_cocktails.columns.ing_id.in_([3]))
And the query above translates to this SQL

code:
SELECT cocktails.id AS cocktails_id, cocktails.name AS cocktails_name, cocktails.glass AS cocktails_glass, cocktails.finish AS cocktails_finish
FROM cocktails INNER JOIN ing_in_cocktails ON cocktails.id = ing_in_cocktails.cocktail_id
WHERE ing_in_cocktails.ing_id IN (%(ing_id_1)s)

teen phone cutie fucked around with this message at 21:20 on Jan 13, 2019


teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself

Ruggan posted:

Are you only looking for EXACT matches, or are you looking for any recipes you can make using your list?

For example, should an input of Gin, Whiskey, and Tonic return Gin & Tonic because you could make it? Or no because that isn’t the exact recipe?

Yes - Gin and Tonic should still be returned even if you also included Whiskey


Moonwolf posted:

Ruggan's question's a good one. This is a really hard problem with a dynamic number of parameters though. What you need to get is:
a) Any cocktail that you have at least one ingredient for;
b) All their ingredients lists;
c) Compare your input list to those lists.

Auto-generated SQL doesn't stand a chance.

So you're saying I could do something like

1. Client asks for all liquors including Gin and Tonic
2. SQL statement returns "Gin and Tonic" and "Gin and Tonic and Lime"
3. Python compares the client's list of IDs with the IDs of each Cocktail.ingredients
4. If every ingredient ID in Cocktail.ingredients appears in the client's list of IDs, add that cocktail to a result array (see the sketch below)
5. Return the result array to the client

???
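
Very roughly, the Python side of that would be something like this (supplied_ids is standing in for whatever the client actually sends):

Python code:
supplied_ids = {1, 2}  # whatever ingredient ids the client sent, e.g. Gin and Tonic

# step 2: any cocktail that uses at least one of the supplied ingredients
candidates = (
    Cocktail.query
    .join(ing_in_cocktails)
    .filter(ing_in_cocktails.columns.ing_id.in_(supplied_ids))
    .distinct()
    .all()
)

# steps 3-5: keep only cocktails whose entire ingredient list is covered by the client's ids
results = [
    c for c in candidates
    if {i.id for i in c.ingredients} <= supplied_ids
]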

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself

Moonwolf posted:

You can make that even simpler, you don't need the EXCEPT at all:
SQL code:
SELECT *
FROM cocktails
WHERE NOT EXISTS (
    SELECT ing_in_cocktails.ing_id
    FROM ing_in_cocktails
    WHERE ing_in_cocktails.cocktail_id = cocktails.id AND
    ing_id NOT IN (%parameter_list_of_ingredient_ids_goes_here%)
)
Of course, the actual performance of any of these will vary with the size of the tables, but it's going to be largely similar for any conventionally sized cocktail list.

I’ll try this and some of the others when I get home thanks!

Additionally, I’m still not great at converting SQL to SQLAlchemy so any help there would be great as well

e: looks like this one did the trick!

welp here's the SQLAlchemy after lots of trial and error

code:
from sqlalchemy import and_, exists

query = db.session.query(Cocktail).filter(
    ~exists().where(
        and_(
            CocktailIngredient.cocktail_id == Cocktail.id,
            ~CocktailIngredient.ing_id.in_([15, 17])
        )
    )
)
which translates to

code:
SELECT cocktails.id AS cocktails_id, cocktails.name AS cocktails_name, cocktails.glass AS cocktails_glass, cocktails.finish AS cocktails_finish
FROM cocktails
WHERE NOT (EXISTS (SELECT *
FROM ings_in_cocktail
WHERE ings_in_cocktail.cocktail_id = cocktails.id AND ings_in_cocktail.ing_id NOT IN (%(ing_id_1)s, %(ing_id_2)s)))
Not sure what I have to do to prevent the SELECT * FROM
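
e: re: the SELECT * -- it looks like building the EXISTS from a Query instead of a bare exists() renders it as EXISTS (SELECT 1 ...) instead. A rough sketch (I haven't double-checked that the correlation comes out identical):

Python code:
ings = db.session.query(CocktailIngredient).filter(
    CocktailIngredient.cocktail_id == Cocktail.id,
    ~CocktailIngredient.ing_id.in_([15, 17]),
)

# Query.exists() wraps the query above as EXISTS (SELECT 1 ...), so no SELECT *
query = db.session.query(Cocktail).filter(~ings.exists())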

teen phone cutie fucked around with this message at 03:17 on Jan 15, 2019

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
Lol wow. I wish I knew this days ago

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
Say I have a relational table called ingredients_in_cocktails that has 2 primary/foreign keys: Cocktail ID and Ingredient ID

so a row might look like:

code:
Ingredient ID | Cocktail ID | action
            1 |           1 | garnish
I want to add a new primary key just called ID (it won't be relational or anything). How do I accomplish this? My goal is to be able to add rows where the ingredient/cocktail pair is duplicated, such as this:

code:
ID | Ingredient ID | Cocktail ID | action
 1 |             1 |           1 | add
 2 |             1 |           1 | garnish

teen phone cutie fucked around with this message at 03:19 on May 26, 2019

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
Ok so i need to create the ID column as a normal column, THEN remove the existing unique index, THEN give my ID column the unique index?

PhantomOfTheCopier posted:

You don't really want to do that. Either ingredients are unique or recipes are ordered. In the first case, "lime juice" should be a different ingredient than "lime wedge".

In the latter case, much like with baking, "sugar" appears multiple times and the actual distinguishing characteristic is the 'order' of construction. Your composite key in that case will be recipe=(cocktail,step number), and just the foreign key constraint recipe.ingredient references ingredient.id.

As an aside, that table name is a tragedy. Try "cocktail", "ingredient", and "recipe".

I do have a "step" column in that table - I just didn't post it.

I actually am solving for the problem where you need to wipe the rim of a cocktail glass with a lemon twist and then throw the lemon twist into the glass, similar to your sugar example.

Scaramouche posted:

Yeah, just make ID the primary key and unique.

Then you can query like:
code:
select * from ingredients_in_cocktails
where cocktail_id = x
You can probably massage the output by building some assumptions into ORDER BY IngredientID (I've seen what you're using this for in the Post Screenshots thread).

EDIT-This assumes you know what the cocktail is. Upon review of your app I'm guessing you'll also want to query by ingredient but maybe not return the extra rows.

I'm assuming (and hoping) my existing queries should be fine

teen phone cutie fucked around with this message at 03:23 on May 26, 2019

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
hmmm I seem to be running into some errors while doing this. Mainly because both cocktail_id and ingredient_id are foreign keys.

When running something like:

code:
ALTER TABLE ings_in_cocktail DROP INDEX cocktail_id;
I get an error: "Cannot drop index 'cocktail_id': needed in a foreign key constraint"

And when trying

code:
ALTER TABLE `ings_in_cocktail` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY
I get an error: "Multiple primary key defined"

I am very much a SQL newbie, so I could use some hand-holding on this.
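
e: for my own notes, I think the order of operations has to be: drop the foreign keys first (they're what need those indexes), then drop the old primary key, then add the new id column as the primary key, then put the foreign keys back. The constraint names below are guesses -- SHOW CREATE TABLE ings_in_cocktail shows the real ones:

SQL code:
-- 1. drop the foreign keys that depend on the indexes (names are guesses)
ALTER TABLE ings_in_cocktail DROP FOREIGN KEY ings_in_cocktail_ibfk_1;
ALTER TABLE ings_in_cocktail DROP FOREIGN KEY ings_in_cocktail_ibfk_2;

-- 2. drop the existing composite primary key
ALTER TABLE ings_in_cocktail DROP PRIMARY KEY;

-- 3. add the surrogate id as the new primary key
ALTER TABLE ings_in_cocktail ADD id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

-- 4. put the foreign keys back (MySQL creates plain indexes for them)
ALTER TABLE ings_in_cocktail ADD FOREIGN KEY (cocktail_id) REFERENCES cocktails (id);
ALTER TABLE ings_in_cocktail ADD FOREIGN KEY (ing_id) REFERENCES ingredients (id);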

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself

Ruggan posted:

You don’t need the ordinal/step to have meaning or appear. The steps could just say “wipe the rim with the lemon and save the lemon for later” and “put the lemon you saved into the drink” or w/e.

This is a good point. I'd probably be better off just adding a "step text" column or something like that. Thanks for that. I think I'm over-complicating this.


PhantomOfTheCopier posted:

But this is all a trick. I'll bet you're really doing a homework problem to write instructions for making a peanut butter and jelly sandwich. :downsrim:

Or I'm a front-end developer trying to learn SQL and Python in my spare time jesus loving christ

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
I posted a redacted version of my tables. Here's what I actually have

Cocktails:

code:
ID | Name   | Glass | Finish
 1 | Mojito | Rocks | Shaken with Ice
Ingredients:

code:
ID | Name | Type
 1 | Rum  | Liquor
 2 | Mint | Garnish
CocktailIngredients:

code:
Ing_ID (FK) | Cocktail_ID (FK) | Unit   | Amount | Action  | Step
          1 |                1 | Ounce  |    1.5 | Add     |    1
          2 |                1 | Leaves |      5 | Garnish |    2

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
yeah these are all good points. tbh, I'm not really looking to over-engineer this thing. It's just a fun little portfolio project to prove I can work with backend tech.

They're probably all stretch goals I can accomplish if I ever get bored.

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
Not sure if there's a better thread for this, but I have a performance question re: a personal project I'm working on: What do you people think is the most performant way to do the "unread count" for threads that Something Awful has? Talking about this thing:

[image: the unread post count badge shown next to a thread title]

I'm building this feature into my project where I have "threads," "posts," and "users" and I'm thinking about making an association table to store maybe the thread ID, user ID, and just an arbitrary "number of posts read," but I also feel like that table could get really big, really quick and I wonder if there's a better way to handle something like this.

I've been poking around at other forums and SA seems to be one of the only forums that even try to show that count, so I wonder if other forums just said gently caress it, we're not doing that because of this exact performance concern.
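
The table I have in mind would be something roughly like this (names are just a first pass):

SQL code:
CREATE TABLE read_count (
    user_id           INT NOT NULL,
    thread_id         INT NOT NULL,
    last_read_post_id INT NULL,
    number_posts_read INT NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, thread_id),
    FOREIGN KEY (user_id) REFERENCES users (id),
    FOREIGN KEY (thread_id) REFERENCES threads (id)
);

One row per (user, thread) pair, which is exactly the "gets really big, really quick" worry.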

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself

PhantomOfTheCopier posted:

Hence all those algorithms courses you took on computing and managing lists of windows, date ranges, finding overlaps, etc. :science:

yeah right all those algorithm courses. I'm totally not just building this project by the seat of my pants :tinfoil:

okay but if I'm reading your response right, you're just saying store the number of posts read and the last post ID and keep it simple?

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself

teen phone cutie posted:

Not sure if there's a better thread for this, but I have a performance question re: a personal project I'm working on: What do you people think is the most performant way to do the "unread count" for threads that Something Awful has? Talking about this thing:



I'm building this feature into my project where I have "threads," "posts," and "users" and I'm thinking about making an association table to store maybe the thread ID, user ID, and just an arbitrary "number of posts read," but I also feel like that table could get really big, really quick and I wonder if there's a better way to handle something like this.

I've been poking around at other forums and SA seems to be one of the only forums that even try to show that count, so I wonder if other forums just said gently caress it, we're not doing that because of this exact performance concern.

so i posted in the python thread about this, but now that I have my table schemas for users, threads, and read_count, I'm looking for a query that will help me join the read count to the thread schema

https://forums.somethingawful.com/showthread.php?threadid=3812541&pagenumber=171#post522793982

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself

Jabor posted:

I'm reading the docs you linked and all I can think is that ORMs are a mistake.

The SQL you want is something like:

code:
SELECT id, title, is_sticky, last_read_post_id, number_posts_read
FROM threads t
LEFT JOIN read_count r ON r.user_id = ? AND t.id = r.thread_id
WHERE t.is_deleted = False AND t.forum_id = ?
ORDER BY t.is_sticky DESC, t.last_post_at DESC
Trivial to write, I have no idea how you would do that in your ORM.

this is great thank you!

I ended up translating it to the ORM by doing this:

Python code:
query = (
    db.session.query(Thread, ReadCount)
    .filter(and_(Thread.is_deleted == False, Thread.forum_id == int(forum_id)))
    .join(
        ReadCount,
        and_(ReadCount.thread_id == Thread.id, ReadCount.user_id == user.id),
        isouter=True,
    )
    .order_by(Thread.is_sticky.desc(), Thread.last_post_at.desc())
)
The only issue is that my serializer is getting rid of properties that don't belong to the "Thread" class, so I had to do some hacky stuff to keep them in the serialized response
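
(For reference, one way to keep them would be to zip the (Thread, ReadCount) tuples back into dicts by hand -- something like this, with thread_to_dict standing in for whatever the real serializer does):

Python code:
results = []
for thread, read_count in query.all():
    row = thread_to_dict(thread)  # stand-in for the real Thread serializer
    # bolt the ReadCount columns back on so they survive serialization
    row["number_posts_read"] = read_count.number_posts_read if read_count else 0
    row["last_read_post_id"] = read_count.last_read_post_id if read_count else None
    results.append(row)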

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
Does anyone have a good resource for learning about indexes: how they work, what they do, and how to use them effectively? I'm working on a personal project and am about done with the backend and DB, but I'm still a beginner with DBs, so I just learned that indexes are a thing and I'd love to read over a dummy's how-to guide if there's a good one out there.

Specifically for MySQL if it matters.
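
For context, the kind of thing i'm trying to wrap my head around is when something like this actually helps and how to tell (table names are just from my own project):

SQL code:
-- an index so looking up a thread's posts doesn't have to scan the whole posts table
CREATE INDEX idx_posts_thread_id ON posts (thread_id);

-- EXPLAIN shows whether a given query actually uses it
EXPLAIN SELECT * FROM posts WHERE thread_id = 11;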

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
hi i'm diving into the Redis docs for the first time ever. outside of understanding caching as a concept, i don't have much experience, and my goal is to cache some of my database queries. i'm sure i'm going to need more help in the future, but i have one initial question about using Redis in general:

should i be caching data that is subject to change a lot? for example, if i am making a query to get "page 1" of a bunch of data but some stuff might get moved to "page 2" in a short time, is that even worth caching, even if i bust the cache when new data is created?

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself

nielsm posted:

Caching search results like that can make sense from a user experience point of view. Query the entire result set, save in the cache with a key for the specific user and query, so the results don't move when the user browses the pages.

but what if there's hundreds of pages? what's the best way to cache a large amount of paginated data?

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
hi again i'm hoping someone can help me out with a really tricky mysql statement. i'm trying to get the latest Offense from an Offense table that is tied to a user, but the trick is offenses are tied to posts and private messages, not users, so i first need to join together Posts and PrivateMessagePosts before I can filter down to the user level

i have 4 tables: Posts, PrivateMessagePosts, Users, and Offenses. Offenses are tied to either a `Post.id` or `PrivateMessagePost.id` but not a user. they look like this:

[image: the Posts, PrivateMessagePosts, Users, and Offenses table schemas]

so far I have a SQLAlchemy query which will get a list of posts given a thread ID (that part is irrelevant), and return them with all of the offenses tied to them. it looks like this:

pay no mind to the existing offense join! that is just there to get a CSV of all offenses tied to each post we're querying. what i want additionally is the latest offense that is active for the user

Python code:
 db.session.query(
      Post,
      Thread.title,
      User.username,
      User.byline,
      User.avatar_url,
      User.registered_on,
      User.id,
      Forum.id,
      Forum.title,
      func.group_concat(offense.id),
      func.group_concat(offense.is_ban),
      func.group_concat(func.coalesce(offense.end_time, "None")),
      func.group_concat(offense.post_id),
      active_offense_one.id,
  )
  .join(Thread, base_post.thread_id == Thread.id)
  .join(Forum, Forum.id == Thread.forum_id)
  .join(some_user, some_user.id == base_post.author_id)
  .join(
      offense,
      and_(
          offense.post_id == base_post.id,
          offense.approved_user_id > 0,
          or_(offense.end_time > current_time, offense.is_ban == True),
      ),
      isouter=True,
  )  # this left join attaches all the offenses on the post, for the group_concat columns above
  .group_by(
      base_post.id,
      active_offense_one.id
  )
which translates to this SQL:

SQL code:
SELECT 
  posts_1.id AS posts_1_id, 
  posts_1.body AS posts_1_body, 
  posts_1.posted_at AS posts_1_posted_at, 
  posts_1.updated_at AS posts_1_updated_at, 
  posts_1.ip_address AS posts_1_ip_address, 
  posts_1.thread_id AS posts_1_thread_id, 
  posts_1.author_id AS posts_1_author_id, 
  threads.title AS threads_title, 
  users_1.username AS users_1_username, 
  users_1.byline AS users_1_byline, 
  users_1.avatar_url AS users_1_avatar_url, 
  users_1.registered_on AS users_1_registered_on, 
  forums.id AS forums_id, 
  forums.title AS forums_title, 
  """
  using group_concat to get a CSV of all the offenses tied to this specific post
  """
  group_concat(offenses_1.id) AS group_concat_1, 
  group_concat(offenses_1.is_ban) AS group_concat_2, 
  group_concat(coalesce(offenses_1.end_time, %(coalesce_1)s)) AS group_concat_3, 
  group_concat(offenses_1.post_id) AS group_concat_4, 
  offenses_2.id AS offenses_2_id, users_1.id AS users_1_id
FROM offenses AS offenses_2, posts AS posts_1 
INNER JOIN threads ON posts_1.thread_id = threads.id 
INNER JOIN forums ON forums.id = threads.forum_id 
INNER JOIN users AS users_1 ON users_1.id = posts_1.author_id 
LEFT OUTER JOIN offenses AS offenses_1 
  ON offenses_1.post_id = posts_1.id 
  AND offenses_1.approved_user_id > %(approved_user_id_1)s 
  AND (offenses_1.end_time > %(end_time_1)s OR offenses_1.is_ban = true) 
GROUP BY posts_1.id, offenses_2.id
That's all great and working fine, BUT I can't figure out a good way to get the latest active offense the user has. i've been bashing my head against the wall for 2 days on this

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
also i think i've posted about using ORMs before here and was told they suck, and now i'm finally understanding it. they generate some rear end queries, which is why i'm trying to manually write this query without letting the ORM get the relational data.

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
man i literally got there right before you posted that. thanks!!!! i have been ready to smash my keyboard looking for this solution

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
i previously posted ITT about wanting to implement Redis for some paginated SQL queries, and one suggestion was to request all the data, store it as one object, and just bust the cache when an item is updated or added (items cannot be removed at this time)

But i was thinking of another possible solution and i'd like to be sanity checked here. What I was thinking is:

1. create a factory function that will take page number and return a SELECT statement for a page of items
2. when calling my SELECT statements in whatever GET request, run the factory function with the requested page and store the returned query as the Redis key (if the Redis key/value doesn't already exist; if it does, just read from memory)
3. when adding/updating an item, run another SELECT query to get what page this item is going to be on and then execute the factory function again with the calculated page number to get what the SELECT statement is going to be and then nullify that Redis key/value

does this sound like it could work? am i overengineering this? i'm just really not sure how to properly use Redis with paginated data that can always be added to or updated
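
in code, the shape i'm imagining is roughly this (redis-py; run_query and find_page_for_item are stand-ins for the real database calls, and the items table is made up):

Python code:
import json
import redis

r = redis.Redis()

def make_page_query(page, per_page=25):
    # the factory: page number -> the SELECT for that page (it also doubles as the cache key)
    offset = (page - 1) * per_page
    return f"SELECT * FROM items ORDER BY created_at LIMIT {per_page} OFFSET {offset}"

def get_page(page):
    key = make_page_query(page)
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)       # cache hit, skip the database
    rows = run_query(key)               # stand-in for the real DB call
    r.set(key, json.dumps(rows))
    return rows

def on_item_added_or_updated(item):
    # figure out which page the item lands on, then nullify just that page's cache entry
    page = find_page_for_item(item)     # stand-in for the "what page is this on" query
    r.delete(make_page_query(page))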

teen phone cutie fucked around with this message at 23:11 on Mar 14, 2023

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself

nielsm posted:

I think you're thinking of a suggestion I made, but what I was thinking of was actually to use the cache to keep the result set constant while the user was browsing it. That is, nothing would invalidate or update the cached result set, other than a timeout or other indication that the user was done using that result set. In that model, you don't attempt to update the cached result set at all, but the user also won't see new items unless they explicitly invoke a new search or request a reload of the result set. In return, the user gets fully predictable paging with no results jumping around or getting skipped when switching page.

gotcha yeah i figured i was bastardizing the explanation a bit. though this solution still doesn't seem optimal for my use-case. the paginated items are comments in a thread, so the page results are going to be consistent other than a comment being edited or a new comment being added, which i would like to be instant with adds/updates busting the cache.

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself
i have a crazy query that I need a bit of help optimizing. most of it is fast, but there's one piece of it that brings the whole thing to a screeching halt

Basically I have an endpoint that is returning a paginated array of posts that belong to threads, which themselves belong to forums. I also want to get the user's active ban. This all works fine. The part that's slowing things down is "author_total_posts", because I want to return the count of every single post the user has made

JavaScript code:
Array<{
  "id": 123,
  "body": "hello world",
  "author_username": "teen phone cutie",
  "author_id": 1,
  "thread_id": 123,
  ...more_post_and_thread_info
  "active_offense": {
    "is_ban": true,
    "id": 123,
    ...more_info_about_offense
  },
  "author_total_posts": 400 // this is the data that I need that's causing the slowdown
}>
The SQL to generate me this response looks something like this:

SQL code:
SELECT   posts_1.id,
                posts_1.body,
                posts_1.thread_id,
                posts_1.author_id,
                threads.title,
                users.username,
                users.byline,
                forums.id,
                forums.title,
                Group_concat(offenses_1.id),
                Group_concat(offenses_1.is_ban),
                Group_concat(COALESCE(offenses_1.end_time, "2024-01-14T13:01:52.352914Z")),
                Group_concat(offenses_1.post_id),
                offense_lat.id,
                offense_lat.is_ban,
                offense_lat.end_time,
                post_count_lat.count_1
FROM            posts AS posts_1
INNER JOIN      threads
ON              posts_1.thread_id = threads.id
AND             threads.is_deleted = false
AND             posts_1.thread_id = 11
INNER JOIN      forums
ON              forums.id = threads.forum_id
INNER JOIN      users
ON              users.id = posts_1.author_id
LEFT OUTER JOIN offenses AS offenses_1
ON              offenses_1.post_id = posts_1.id
AND             offenses_1.approved_user_id > 0
LEFT OUTER JOIN lateral
                (
                         select   offenses_2.id                      AS id,
                                  offenses_2.end_time                AS end_time,
                                  offenses_2.is_ban                  AS is_ban,
                                  offenses_2.post_id                 AS post_id,
                                  offenses_2.private_message_post_id AS private_message_post_id
                         FROM     offenses                           AS offenses_2
                         WHERE    (
                                           offenses_2.is_ban = true
                                  OR       offenses_2.end_time > "2024-01-14T13:01:52.352914Z")
                         AND      offenses_2.approved_user_id > 0
                         AND      offenses_2.offender_id = posts_1.author_id
                         ORDER BY offenses_2.is_ban DESC,
                                  offenses_2.end_time DESC limit 1) AS offense_lat
ON              true = 1
LEFT OUTER JOIN lateral
                (
                           SELECT     count(posts_2.thread_id) AS count_1
                           FROM       posts                    AS posts_2
                           INNER JOIN threads
                           ON         threads.id = posts_2.thread_id
                           WHERE      posts_1.author_id = posts_2.author_id
                           AND        threads.is_deleted = false) AS post_count_lat
ON              true = 1
WHERE           threads.id = 11
GROUP BY        posts_1.id,
                offense_lat.id,
                post_count_lat.count_1
ORDER BY        posts_1.posted_at ASC
LIMIT 25
OFFSET 475
The problem I'm having is that despite this being a paginated request for 25 posts, that last LEFT OUTER JOIN takes like 2 seconds to process, and I can't figure out a more optimal way to get each author's total post count for the posts I'm returning.

Hopefully this makes sense. I can provide more information about the tables if need be but yeah the gist is just that last LEFT OUTER JOIN takes forever to get the total post count for each post author that's being returned in a list of posts

teen phone cutie fucked around with this message at 02:03 on Jan 15, 2024

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself

Hammerite posted:

is that absolutely the smallest example you can create that exhibits the undesired behaviour?

you refer to "that second LEFT OUTER JOIN" but then you refer to it as a query that counts posts. which appears to be the third LEFT OUTER JOIN in that query; the second "lateral" one.

I don't understand why you need to use this fancy LATERAL feature that I've never encountered before; why do you need to do that rather than just put the join logic in the ON clause? What, if anything, is different if you do? (this applies to both "lateral" joins)

yep sorry i meant the third. The lateral thing I picked up from stackoverflow, so I'm not sure if there's an easier way to accomplish what I want. SQL is not my biggest strength.

As I understand it, the reason I need the lateral subquery is because the main query is getting a page of posts.

So let's say for example the query is getting page 301 of the "SELECT * FROM Questions WHERE Type = 'Stupid'" thread. BUT I want to also get all the posts you've made on SA total and list it under your avatar. So in your case you have "14748" posts and I need to get that from all threads that are not deleted.

So the thread in the outer-most query is just this thread, but in the lateral subquery, it's all threads you've posted in that are not deleted

Hammerite posted:

In most cases where you refer to multiple instances of a table you give each instance a unique alias, but you didn't do that for "threads". It's referenced in the outermost query and in the counting query as "threads". I can't imagine that "threads" in the counting query could sensibly refer to anything but the most local one, but for consistency if nothing else you might as well make sure.

so you're saying make 2 different aliases, one for the outer query and one for the subquery?

teen phone cutie fucked around with this message at 02:22 on Jan 15, 2024

teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself

Hammerite posted:

is that absolutely the smallest example you can create that exhibits the undesired behaviour?

When I remove the last outer join, the query goes from 2 seconds to 100ms, so I'm 100% certain it's the problem. Unless you're suggesting I try something else?


teen phone cutie
Jun 18, 2012

last year i rewrote something awful from scratch because i hate myself

McGlockenshire posted:

What's EXPLAIN say?

Hammerite posted:

This is probably the most practical solution to your immediate problem. Denormalise by storing a user's number of posts as a column in the User table. If it's important that the number be accurate and you're worried that you'll miss cases where it should be updated, there are strategies for dealing with that (triggers, scheduling tasks to update the column)

I don't think so. I mean that where you have "INNER JOIN threads" in the outer query, to be consistent with your practice for other tables it should have been "INNER JOIN threads AS threads_1" and where you have "INNER JOIN threads" in the subquery it would be "INNER JOIN threads AS threads_2".

What happens if you replace the problematic JOIN with this:

code:
LEFT OUTER JOIN (
    SELECT
        "posts_2"."author_id",
        COUNT(*) AS "count_1"
    FROM
        "posts" AS "posts_2"
        INNER JOIN "threads" AS "threads_2" ON
            "threads_2"."id" = "posts_2"."thread_id"
    WHERE
        "threads_2"."is_deleted" = FALSE
    GROUP BY
        "posts_2"."author_id"
) AS "post_count_lat" ON
    "post_count_lat"."author_id" = "posts_1"."author_id"
 
I only just noticed that the outer query has a GROUP BY clause, and I don't understand why that is there or why those specific columns are the grouping ones. If it's necessitated by the uses of Group_concat in the outer query, doesn't it make more sense to put that grouping in a subquery defining offenses_1?

lemme give this stuff a shot when i get some time. will report back!
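
(noting for later: the denormalised count idea would be something like this in MySQL -- the post_count column and trigger are hypothetical, and deletes/soft-deleted threads would need their own handling):

SQL code:
-- hypothetical counter cache: keep each user's post count on the users table
ALTER TABLE users ADD COLUMN post_count INT NOT NULL DEFAULT 0;

-- backfill it once from the existing data
UPDATE users
SET post_count = (
    SELECT COUNT(*)
    FROM posts p
    INNER JOIN threads t ON t.id = p.thread_id
    WHERE p.author_id = users.id AND t.is_deleted = FALSE
);

-- keep it current on new posts (deletes / thread soft-deletes would need their own triggers)
CREATE TRIGGER posts_after_insert
AFTER INSERT ON posts
FOR EACH ROW
    UPDATE users SET post_count = post_count + 1 WHERE id = NEW.author_id;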
