|
wow that's a new one for me, mysql is even more of a piece of poo poo than i thought
|
# ? Dec 20, 2019 23:26 |
|
I definitely want to use this as an impetus to get on to postgres, but I can only imagine that having that many joins is probably also evidence that my query is a massive piece of poo poo, no?
|
# ? Dec 21, 2019 03:46 |
|
Maybe you’re just in 6NF, but 61 joins is generally a lot, yeah.
|
# ? Dec 21, 2019 04:14 |
|
prom candy posted:
Alright so I've been working on this query for the past week or so and I'm making a ton of progress. It's super gnarly but all my specs are passing which is really exciting after so much wheel-spinning. Then I ran it against some production data and ran into "MySQL can only use 61 tables in a join." I'm creating the query dynamically because a parent account needs to see data broken down by all their child accounts, so for a parent that has lots of children, I'm basically creating another join + several columns for each one like "child_1_stat_a", "child_1_stat_b", "child_2_stat_a", "child_2_stat_b", etc. I guess when we get to a parent that has a lot of children this has the effect of creating too many joins. Each of the child account's joins use a sub-query that also has 3-4 joins of its own (basically querying a generic table again like that "coins" table and then doing joins to limit it to just the ones that are owned by that child account.)

you need what's called a recursive query. Essentially: find me accounts, including all subaccounts, including all subaccounts of those subaccounts, etc etc. That's the CTE that gives you your "objects of interest". Then you INNER JOIN to account_table and SELECT WHERE condition = xyz AND sum(balance) = abc, perform any relevant GROUP BY or group summary calls, etc.

(This is also a thing in PG or most other relevant DBs)

Paul MaudDib fucked around with this message at 05:55 on Dec 21, 2019
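A minimal sketch of that shape, assuming a bare accounts(id, parent_id) table plus the "coins" table mentioned above (all names illustrative; WITH RECURSIVE needs MySQL 8.0+ and works the same in Postgres):

SQL code:
WITH RECURSIVE subaccounts AS (
    -- anchor: the parent account we start from
    SELECT id FROM accounts WHERE id = 42
    UNION ALL
    -- recursive step: pull in children of anything already found
    SELECT a.id
    FROM accounts a
    JOIN subaccounts s ON a.parent_id = s.id
)
SELECT a.id, SUM(c.balance) AS total_balance
FROM subaccounts s
JOIN accounts a ON a.id = s.id
JOIN coins c ON c.account_id = a.id
GROUP BY a.id
HAVING SUM(c.balance) > 0;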
# ? Dec 21, 2019 04:28 |
|
Paul MaudDib posted:
you need what's called a recursive query. Essentially find me accounts, including all subaccounts, including all subaccounts of those subaccounts, etc etc. That's the CTE that gives you your "objects of interest". Then you INNER JOIN to account_table and SELECT WHERE condition = xyz AND sum(balance) = abc, perform any relevant GROUP BY or group summary calls, etc

I started to type up the same suggestion, but stopped myself. Could you tell us a bit more about your schema if CTE doesn't do the trick?
|
# ? Dec 21, 2019 18:49 |
|
Nth Doctor posted:
I started to type up the same suggestion, but stopped myself. Could you tell us a bit more about your schema if CTE doesn't do the trick?

not knowing the schema is like flying blind. literally anything could be the issue, there's not enough info in the question beyond "mysql is poo poo"
|
# ? Dec 22, 2019 05:16 |
|
Yeah I'm going to come back with a fiddle after the holidays, last time I did that I got halfway through it and solved my own problem.
|
# ? Dec 23, 2019 05:17 |
|
prom candy posted:
Alright so I've been working on this query for the past week or so and I'm making a ton of progress. It's super gnarly but all my specs are passing which is really exciting after so much wheel-spinning. Then I ran it against some production data and ran into "MySQL can only use 61 tables in a join." I'm creating the query dynamically because a parent account needs to see data broken down by all their child accounts, so for a parent that has lots of children, I'm basically creating another join + several columns for each one like "child_1_stat_a", "child_1_stat_b", "child_2_stat_a", "child_2_stat_b", etc. I guess when we get to a parent that has a lot of children this has the effect of creating too many joins. Each of the child account's joins use a sub-query that also has 3-4 joins of its own (basically querying a generic table again like that "coins" table and then doing joins to limit it to just the ones that are owned by that child account.)

Why are you trying to force all of the children's attributes into a single row? Return a row per child and you won't have this ridiculous problem.

Are you also storing the children's IDs in numbered columns (or a single "list of values" column)? (You should not be)
|
# ? Dec 30, 2019 15:43 |
|
The reason I'm returning everything in one row is because that's how the data ends up getting displayed, and we also need to be able to order and paginate the results. Each row in the report shows the total value for the parent account, and then a breakdown by child.

It's entirely possible/extremely probable I'm doing everything all wrong. I've been working with SQL for like 15 years, but 99% of that has just been having my hand held by ORMs doing pretty basic stuff. This is definitely the most complicated report I've had to build.

Potentially I could strip out the child fields and then change how the report is displayed so that someone needs to expand a row to see the child breakdown, but I'm definitely going to get pushback on that as management wants it just-so.
|
# ? Jan 4, 2020 19:21 |
|
The CTO is encouraging everyone to get training in something they're interested in, so I'd like to ask for recommendations for books or (paid) online courses about becoming better at optimizing Postgres. At some kind of intermediate level, I guess? I like to think I'm pretty good at SQL in general and I can read and understand the gist of a query plan, so I can usually point at what is making a query slow, but when it comes to "how do I stop it from doing a sequential scan on this ginormous table" I'm sorta lost and usually reduced to loving around and throwing poo poo at the wall until something sticks. A book about data warehousing architecture in general would also be interesting, but there I have no idea where to even start.
|
# ? Jan 8, 2020 13:35 |
|
prom candy posted:
The reason I'm returning everything in one row is because that's how the data ends up getting displayed, and we also need to be able to order and paginate the results. Each row in the report shows the total value for the parent account, and then a breakdown by child.

SQL code:
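-- (a sketch of this approach with invented table/column names:
-- one row per child, with the parent's total repeated via a window
-- aggregate so the client can still order and paginate by it)
SELECT
    p.id          AS parent_id,
    c.id          AS child_id,
    SUM(x.amount) AS child_total,
    SUM(SUM(x.amount)) OVER (PARTITION BY p.id) AS parent_total
FROM accounts p
JOIN accounts c ON c.parent_id = p.id
JOIN coins x    ON x.account_id = c.id
GROUP BY p.id, c.id
ORDER BY parent_total DESC, parent_id, child_id;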
|
# ? Jan 8, 2020 15:15 |
|
Cool, thanks! I think I understand what's happening there. I'm going to try to come up with an SQL Fiddle that matches my schema/what I'm trying to do so that I can actually ask my question with a proper example.
|
# ? Jan 8, 2020 15:35 |
|
TheFluff posted:
...

PostgreSQL
Or something local: https://www.postgresql.org/about/events/
Or start here: https://www.postgresql.org/docs/books/

But seriously, the documentation is levels above and contains most of the info you need to get started for 90-95%+ of all use cases. Reading the documentation or the new-feature logs from recent versions should give you lots of ideas about how to improve things.
|
# ? Jan 8, 2020 15:57 |
|
We want to allow users to specify aliases for columns in one of our SQL queries, like so:

code:
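-- (illustrative reconstruction; the column and alias names are invented.
-- The alias comes straight from user input, so it has to be treated as
-- an identifier, not just spliced into the query text)
SELECT total_amount AS "Total Spent" FROM orders;

-- standard SQL identifier escaping: wrap the alias in double quotes and
-- double any double quotes inside the user's input
SELECT total_amount AS "She said ""spent""" FROM orders;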
|
# ? Jan 9, 2020 19:46 |
|
Pollyanna posted:
We want to allow users to specify aliases for columns in one of our SQL queries, like so:

use your ORM's programmatic/criteria-style query builder tool and add the user input as an alias. Then you can safely accept the query string through the normal mechanisms. in hibernate you'd do this with Criteria and criteria.alias().

Paul MaudDib fucked around with this message at 19:59 on Jan 9, 2020
# ? Jan 9, 2020 19:55 |
|
This isn’t going through an ORM - it’s a piece of SQL that gets submitted to an outside resource. As far as I can tell, ActiveRecord doesn’t really have sanitization functions specifically for this use case.
|
# ? Jan 9, 2020 22:16 |
|
Pollyanna posted:
This isn’t going through an ORM - it’s a piece of SQL that gets submitted to an outside resource. As far as I can tell, ActiveRecord doesn’t really have sanitization functions specifically for this use case.

so what you're saying is that you're building an ad-hoc, in-house ORM, right? it's OK, we've all been there. run it through an ORM, get it to give you the generated SQL, and submit the SQL to the outside resource.
|
# ? Jan 9, 2020 22:22 |
|
Why is a cosmetic user choice being propagated all the way to the database query in the first place? If you want to let them write their own column headers, change the header text in the grid visualization, or worst case map your object to a dictionary with the user's inputs as the keys.

The famous rule is "don't put business logic in your presentation code". Here it sounds like you're putting presentation code into your business logic, which is no less sloppy.

NihilCredo fucked around with this message at 22:31 on Jan 9, 2020
# ? Jan 9, 2020 22:29 |
|
We’re ultimately generating a CSV for them and offering to specify an arbitrary alias for the columns. Prior to this, we were generating a CSV ourselves, then opening it back up and renaming the columns before sending the file. Now we have a workflow where we send a SQL query to Snowflake, and it outputs the results as a CSV to S3, which we can then download and send. We were hoping to simply have Snowflake handle the column renaming for us by specifying aliases in the SQL query.

The more I think about this, the less I like it. This feels like a bad idea. We could simply prevent users from using double quotes in their aliases, but I don’t know if there are any other cases we’d miss by calling it done there. Maybe we should post-process the CSV instead... Should I push back on this?

Edit: Another engineer is unsure of this so we’ll hash it out from here.

Pollyanna fucked around with this message at 22:47 on Jan 9, 2020
# ? Jan 9, 2020 22:39 |
|
Doing a find/replace on the first line of a CSV is safely trivial and trivially safe, so yes, absolutely do that between the downloading and sending instead of mangling sql queries at runtime. poo poo, you even said you already had working code that did exactly that when the CSV came from a different place.
|
# ? Jan 9, 2020 22:52 |
|
Yeah, I know. I’ll figure it out from here.
|
# ? Jan 9, 2020 23:22 |
|
NihilCredo posted:
Doing a find/replace on the first line of a CSV is safely trivial and trivially safe, so yes, absolutely do that between the downloading and sending instead of mangling sql queries at runtime.

Would definitely recommend changing the header rather than trying to do the rename as part of the SQL statement, more so if you're not using an ORM.
|
# ? Jan 10, 2020 07:22 |
|
Anyone here mess with Cerner and their CCL? With the help of this thread, I was just getting comfortable with writing basic queries in SQL Server. Recently changed jobs, and this new place uses Cerner, which sounds like it's built on top of Oracle and uses a proprietary SQL layer called CCL. One thing I've noticed immediately is how great SSMS is compared to this in-house Cerner tool.
|
# ? Jan 22, 2020 00:19 |
|
Having just looked that up on Wikipedia, it looks hideous. Although it does more than SQL, it's still more unreadable than any of the DB-specific procedural languages I've seen.
|
# ? Jan 22, 2020 00:36 |
|
I'm trying to create a trigger to make a column a monotonically increasing ordering partitioned by user, and it's not going well. So far I have:

SQL code:
Also open to better ideas for populating this column in the DB, if any exist. I know it's possible to make a view that has that column on it, but I'm assuming making that work in Entity Framework would be painful and leave some automated footguns lying around.

Munkeymon fucked around with this message at 21:54 on Jan 22, 2020
# ? Jan 22, 2020 18:43 |
|
Maybe I'm missing something super obvious, but why does it have to be a trigger? Can't you just use INSERT INTO Products SELECT (query which calculates the appropriate index goes here)?
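For example, something shaped like this (the variable and extra column names are invented, and the locking hints are just one way to close the race between concurrent inserts for the same user):

SQL code:
-- compute the next per-user index at insert time;
-- UPDLOCK/HOLDLOCK serialize concurrent inserts for the same UserId
DECLARE @UserId int = 123, @Name nvarchar(50) = N'example';

INSERT INTO Products (UserId, [Index], Name)
SELECT @UserId,
       1 + COALESCE(
           (SELECT MAX([Index])
            FROM Products WITH (UPDLOCK, HOLDLOCK)
            WHERE UserId = @UserId), 0),
       @Name;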
|
# ? Jan 23, 2020 00:10 |
|
I'll make a few guesses and try not to be too offended.

Why are you trying to update the same row using an AFTER trigger? Seems like you'd get a race condition. Presumably you're trying to Do Instead.

Why are you trying to update a (temporary) subselect? Seems like you need to update the table itself, not some queried view of it.

What is that mess? How is it not just Select 1+coalesce(max(index),0) where user=old.user and product=old.product?

Does it work when the transaction inserts multiple rows in the same statement?
|
# ? Jan 23, 2020 04:51 |
|
it looks to me like the design goal is to insert a thing now and later to run the updates to the indexes that might reference it, since those would be slower.
|
# ? Jan 23, 2020 05:54 |
|
Paul MaudDib posted:
it looks to me like the design goal is to insert a thing now and later to run the updates to the indexes that might reference it, since those would be slower.

Fails at meeting that goal. Triggers are considered part of the same unit-of-work transaction as the query that triggers them. If a trigger fails, the triggering statement will too. Also, triggers are synchronous, so the process won't finish until the trigger does as well.

If that is truly the goal then you need some asynchronous method like a messaging queue via Service Broker, or a simple last-update timestamp and a recurring process.

But I think the real correct solution is what others posted above me (calc it for that record only as part of the insert). The only wrinkle is if on deleting a record you want to collapse that index down to fill that space, but the trigger didn't do that so...

Why do you want this field anyway?
|
# ? Jan 23, 2020 08:21 |
|
TheFluff posted:
Maybe I'm missing something super obvious, but why does it have to be a trigger? Can't you just use INSERT INTO Products SELECT (query which calculates the appropriate index goes here)?

Because EF writes the INSERT, not me. Now that you mention it, I should see if I can write a custom inserter for this one type. Avoiding a load-bearing trigger would probably be worth it.

PhantomOfTheCopier posted:
I'll make a few guesses and try not to be too offended. Why are you trying to update the same row using an AFTER trigger? Seems like you'd get a race condition. Presumably you're trying to Do Instead.

I'm trying to resolve a distributed concurrency race condition by using the transactional guarantees in SQL, actually. Ruggan pointed out how that'd work.

quote:
Why are you trying to update a (temporary) subselect? Seems like you need to update the table itself, not some queried view of it.

That updates the table just fine, which is a neat trick I just learned the other day. The subquery is there to generate row numbers for all the User's Products - inner joining on only the changed rows without the subquery just generates row numbers for the changed rows.

quote:
What is that mess? How is it not just Select 1+coalesce(max(index),0) where user=old.user and product=old.product?

max(Index) only generates one value, which breaks multi-row inserts. "That mess" is to deal with multi-row inserts that might even have more than one User. Now that I think about it, it's probably possible to start the row count at max(Index), but I kinda doubt that'd resolve the deadlock or make this appreciably more efficient, since any individual User isn't expected to have very many Products in our business model.

Ruggan posted:
Why do you want this field anyway?

Questionable API design. User+Index is basically a composite key for a Product, but only for the New API, and nobody wanted to make a link table to store the Index by User, I guess, so they halfassed the hell out of it and generated the Index in the app server. It works fine when you poke at it manually with Postman, so it ships lol

TBF, it also works fine with the normal traffic in Prod so far, too. However, automated scripts that hit the API as fast as possible make duplicate Indexes, and it's not totally unreasonable that one of our consumers might design their integration to batch calls at the end of a workflow, so I want to make this as bulletproof as possible just in case.

Anyway, this works:

SQL code:
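-- (reconstructed sketch of the trigger described above; column names
-- follow the fiddle schema later in the thread, so treat as approximate)
CREATE TRIGGER TR_Products_SetIndex
ON Products
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- regenerate row numbers for every user touched by this insert;
    -- updating through the CTE updates Products itself
    WITH n AS (
        SELECT [Index],
               ROW_NUMBER() OVER (PARTITION BY UserId
                                  ORDER BY ProductId) AS rn
        FROM Products
        WHERE UserId IN (SELECT UserId FROM inserted)
    )
    UPDATE n SET [Index] = rn;
END;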
|
# ? Jan 23, 2020 15:46 |
|
Is (UserId,ProductId) unique? If so, why not just make that the composite key? If not, are you sure you're OK with the key being something that could change based on the phase of the moon?
|
# ? Jan 23, 2020 17:37 |
|
Jethro posted:
Is (UserId,ProductId) unique? If so, why not just make that the composite key? If not, are you sure you're OK with the key being something that could change based on the phase of the moon?

Yes, they're IDENTITYs for their respective tables, and it's too late: the weird Index thing is in prod and I just have to deal.

e: just realized I didn't adequately answer the second question: the IDs are meant to be shown to people as well, apparently, so "9239843576" is not acceptable as a substitute for "1". Like I said: questionable API design

Munkeymon fucked around with this message at 17:53 on Jan 23, 2020
# ? Jan 23, 2020 17:43 |
|
I see two problems with your design.

1) As written in your post, your updated trigger will set the Index for all rows with the given ProductId, not just the row for that UserId, ProductId combo. So if ProductId 1234 is Index 5 for UserId 123, the trigger will set Index to 5 for ProductId 1234 for all users.

2) Are you sure you want it to have the same ordering as ProductId instead of something like insert order? If UserId 123 has ProductId 1234 and 3456, do you really want the Index on the 3456 row to change when someone inserts ProductId 2345? You said this is the composite key in the app. No offense to your coworkers, but I don't imagine the people who made that design decision were ready to deal with the key for a record changing under them.

If you've got a new enough version of MSSQL, you might be able to hack something together with a UDF that gets values from a set of sequences.
|
# ? Jan 23, 2020 19:10 |
|
Jethro posted:
I see two problems with your design.

I think I've miscommunicated something. Users → Products is one to many. One User gets as many entries in the Product table* as they want, and ProductId is just int IDENTITY and PK, so it is insert order. The Product table has an FK into users, so given a UserId and an Index, you should get exactly one Product row back, by the API design.

Thanks for pointing out that I was updating all Product rows for a User - that was not intentional. However, having had time to stress test it, I'm getting duplicate Indexes again, so (╯°□°)╯︵ ┻━┻ I have no flucking clue what to do anymore

*we only have one product we offer to end users, which is why I named it that - sorry for the confusion

http://sqlfiddle.com/#!18/9f8a9/6 is a fiddle that'll at least show the schema and the current trigger but won't run, but people can copy it into SSMS and... not see the problem, because I only get duplicate Indexes randomly when running the API service, so FML, I guess

Munkeymon fucked around with this message at 20:57 on Jan 23, 2020
# ? Jan 23, 2020 19:58 |
|
Munkeymon posted:
I think I've miscommunicated something. Users → Products is one to many. One User gets as many entries in the Product table* as they want, and ProductId is just int IDENTITY and PK, so it is insert order. The Product table has an FK into users, so given a UserId and an Index, you should get exactly one Product row back, by the API design.

Oh, I thought ProductId was a FK to another table where it was an IDENTITY column. In that case, my "bug report" was spurious.

quote:
Thanks for pointing out that I was updating all Product rows for a User - that was not intentional. However, having had time to stress test it, I'm getting duplicate Indexes again, so (╯°□°)╯︵ ┻━┻ I have no flucking clue what to do anymore

I'm thinking this is the kind of race condition that you can really only solve with something purpose-built like IDENTITY columns or sequences or the like.
|
# ? Jan 23, 2020 20:57 |
|
Jethro posted:
Oh, I thought ProductId was a FK to another table where it was an IDENTITY column. In that case, my "bug report" was spurious.

Yeah, I'm going to try to redo it with a sequence.
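One wrinkle: the numbering here is per-user, so plain sequences would mean creating one sequence per user dynamically. An alternative with the same atomicity is a counter table bumped with UPDATE ... OUTPUT - a rough sketch, with invented names:

SQL code:
-- one row per user holds the last index handed out
CREATE TABLE UserProductCounter (
    UserId    int NOT NULL PRIMARY KEY,
    LastIndex int NOT NULL DEFAULT 0
);

-- atomically claim the next index for a user; the row lock taken by
-- the UPDATE serializes concurrent callers for the same UserId
DECLARE @UserId int = 123;
DECLARE @NextIndex TABLE (Value int);

UPDATE UserProductCounter
SET LastIndex = LastIndex + 1
OUTPUT inserted.LastIndex INTO @NextIndex
WHERE UserId = @UserId;

SELECT Value FROM @NextIndex;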
|
# ? Jan 23, 2020 22:23 |
|
Regarding the above, max works just fine when you trigger for each row instead of for each statement. My primary point was, "Given all the joins and windowing you have going on, it's no wonder that it's deadlocking (and will probably have significant performance issues as well)". Coupled with a little bit of, "maybe go back and restart with a much simpler solution then build on top of that". But
|
# ? Jan 23, 2020 23:17 |
|
Munkeymon posted:
Because EF writes the INSERT, not me. Now that you mention it, I should see if I can write a custom inserter for this one type. Avoiding a load-bearing trigger would probably be worth it.

"load-bearing trigger" is a great bit, I'm stealing that

I ported an application to a different ORM and some of the things you can do with the mappings are subtly different at the margins, so to paper over the cracks we have a lot of "load-bearing triggers and views". My favorite one was this one-to-many relationship that we couldn't get to work right using a one-to-many mapping due to some issue with data shape, so I implemented it as a many-to-many table using a view to create a "join table" that's selected out of the child table and then using INSTEAD OF INSERT OR UPDATE OR DELETE DO BEGIN NULL END triggers to throw operations on the "join table" in the trash.

Paul MaudDib fucked around with this message at 00:43 on Jan 24, 2020
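A sketch of that trick in Oracle-style syntax, since that's where INSTEAD OF INSERT OR UPDATE OR DELETE triggers live (all names invented):

SQL code:
-- a fake "join table" that is really just a projection of the child table
CREATE OR REPLACE VIEW parent_child_link AS
SELECT parent_id, child_id
FROM child_table;

-- swallow every write to the view: the ORM believes it maintained the
-- join table, but the real relationship lives on child_table.parent_id
CREATE OR REPLACE TRIGGER parent_child_link_noop
INSTEAD OF INSERT OR UPDATE OR DELETE ON parent_child_link
FOR EACH ROW
BEGIN
  NULL;
END;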
# ? Jan 23, 2020 23:47 |
|
PhantomOfTheCopier posted:
Regarding the above, max works just fine when you trigger for each row instead of for each statement. My primary point was, "Given all the joins and windowing you have going on, it's no wonder that it's deadlocking (and will probably have significant performance issues as well)". Coupled with a little bit of, "maybe go back and restart with a much simpler solution then build on top of that". But

That'd be cool. How do I do that? I don't see anything in the spec to make a trigger per-row instead of per-statement, so I did what I had to do to make it work in the general case.

Paul MaudDib posted:
"load-bearing trigger" is a great bit, I'm stealing that

The possibility of the table becoming a (potentially materialized) view that redirects all operations via triggers unspooled in my mind the other day, and I took a little walk to the window to watch the snow fall to try to push the maintenance implications out of my head
|
# ? Jan 24, 2020 15:21 |
|
Munkeymon posted:
That'd be cool. How do I do that? I don't see anything in the spec to make a trigger per-row instead of per-statement, so I did what I had to do to make it work in the general case.
|
# ? Jan 24, 2020 20:07 |