Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Hughmoris posted:

Hmm. Yeah, I think I'm going to have to figure out how to use the second method you recommended. The outer apply query is taking about 7 minutes for 1 month.

What are the table names and relevant column names of the two tables?

anthonypants
May 6, 2007

by Nyc_Tattoo
Dinosaur Gum
Do any of you guys do version control with SQL code, and if so what do you use and how do you use it?

Kuule hain nussivan
Nov 27, 2008

anthonypants posted:

Do any of you guys do version control with SQL code, and if so what do you use and how do you use it?
I've just been pushing stuff to git. Individual .sql files for table creation, procedures, possible initialization data etc. I'm sure there's a better way, but I haven't had any major issues yet.

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy
i'm a fan of SVN but that's only because i mostly write ad hoc queries and have no reason to have the fancy deployment capabilities of git

NihilCredo
Jun 6, 2011

Suppress anger in every way you can:
one fit of it will defame you more than many virtues will commend you

anthonypants posted:

Do any of you guys do version control with SQL code, and if so what do you use and how do you use it?

Any version control system will handle sql files just fine. Even if you use a visual designer or something, just export the schema as a .sql file and VC that. There's no reason not to.

The more interesting question is whether to manage migrations manually or using some kind of declarative tool, which depends heavily on what your deployment story looks like.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Yeah, if you don't care about deployment, just about anything can handle versioning, even command-line Visual SourceSafe if you wanted (does that still exist? brrr)

anthonypants
May 6, 2007

by Nyc_Tattoo
Dinosaur Gum

NihilCredo posted:

Any version control system will handle sql files just fine. Even if you use a visual designer or something, just export the schema as a .sql file and VC that. There's no reason not to.

The more interesting question is whether to manage migrations manually or using some kind of declarative tool, which depends heavily on what your deployment story looks like.
Is "just export all your existing databases using 'script to create as' and then generate incremental .sql files for each change" really best practices

Kuule hain nussivan
Nov 27, 2008

anthonypants posted:

Is "just export all your existing databases using 'script to create as' and then generate incremental .sql files for each change" really best practices

Not trying to be snarky, but what are the major downsides of this simple method?

spiritual bypass
Feb 19, 2008

Grimey Drawer

anthonypants posted:

Is "just export all your existing databases using 'script to create as' and then generate incremental .sql files for each change" really best practices

It's better than nothing, which is the only alternative I've seen besides extremely expensive proprietary database versioning tools that don't work very well

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Definitely standard repository best practices, but it's more a matter of creating appropriate roll forward/backward SQL. Dumping the DDL is a start, but after that you'll want the specific ALTER TABLE script and corresponding rollback in the repo. The changes should obviously be paired with the test cases. ;)
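For example, a paired roll-forward/rollback migration could look something like this (table, column, and file names are made up; Postgres-flavored syntax):

code:
-- 0042_add_last_login_to_users.sql  (roll forward)
ALTER TABLE users ADD COLUMN last_login timestamp;

-- 0042_add_last_login_to_users_rollback.sql  (roll backward)
ALTER TABLE users DROP COLUMN last_login;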

thebigcow
Jan 3, 2001

Bully!
https://cloudblogs.microsoft.com/sqlserver/2017/11/15/announcing-sql-operations-studio-for-preview/

SQL Operations Studio has Git support built right in, if that interests you.

Cold on a Cob
Feb 6, 2006

i've seen so much, i'm going blind
and i'm brain dead virtually

College Slice

anthonypants posted:

Do any of you guys do version control with SQL code, and if so what do you use and how do you use it?

I use SQL Server Data Tools. It's a Visual Studio add-on, works for SQL Server and Azure databases, and since it's just a collection of files it works fine with any version control you want to use (we use TFS, but I've seen it used with Git and SVN). It lets you handle structure (tables, procs, schemas, etc.) and pre/post-deploy SQL scripts for seeding and manipulating data. It can also create your initial project by reading the existing schema of a database and reverse engineering it into a set of scripts for you (it obviously can't do seed data that way, though). It will also do some rudimentary validations on your procedures, functions, and views at "compile time".

If SSDT doesn't suit your needs or you want something that works with databases other than SQL Server, another option I've explored (but not yet used) is DbUp. It seems like a good choice if you want to write all the change scripts manually, whereas SSDT generates a lot of the scripts for you. Sometimes SSDT is way overkill for my needs or doesn't give me the fine-grained control I want, so I'm eager to try DbUp sometime.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe

rt4 posted:

It's better than nothing, which is the only alternative I've seen besides extremely expensive proprietary database versioning tools that don't work very well

Liquibase has worked extremely well for us. It was not a big hassle to export our existing DB; in fact, we used it as a first step to migrate from MySQL to Postgres while maintaining support for both in a single changelog file. I like that it keeps track of which changes have already been executed in each environment, which makes it safe to rerun all the changes on application start.
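For anyone curious, a minimal formatted-SQL changelog looks roughly like this (author tag and table name are made up); Liquibase records what has already run in its DATABASECHANGELOG table, which is what makes the rerun-on-start behaviour safe:

code:
--liquibase formatted sql

--changeset jp:1
CREATE TABLE widgets (
    id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
--rollback DROP TABLE widgets;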

Rick
Feb 23, 2004
When I was 17, my father was so stupid, I didn't want to be seen with him in public. When I was 24, I was amazed at how much the old man had learned in just 7 years.
sl;dfjkl;jsdfakd;jf kdsf kljfskdfj kl

Database type: MariaDB running on Windows via XAMPP

I was trying to remove privileges to a table that I don't want to be updated via PHP anymore and accidentally removed root's global permission to update. IT TURNS OUT YOU CAN'T DO A WHOLE LOT LIKE THIS. I've spent roughly three hours trying various fixes from Google but can't seem to fix it. Like, I try to load MySQL with the skip-grant-tables thing on and it still denies me permission to update. I've tried logging in as other versions of root and can't seem to do that right either.

Can I do anything at all? Should I just export the databases and start over? This is just a personal thing I run on my PC; if this was development for anyone else I'd have a backup global admin account and probably wouldn't be using root to begin with.

E: Solved. In XAMPP you have to actually create the my.cnf so that you can include skip-grant-tables. Once you do that, GUIs may still throw a fit when you try to update user permissions, but if you just update them directly in the mysql database, it works.
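For anyone who hits this later, the direct fix looks roughly like this once the server has been restarted with skip-grant-tables under [mysqld] in my.cnf (column names as in the stock mysql.user grant table):

code:
-- with skip-grant-tables active, edit the grant table directly
UPDATE mysql.user SET Update_priv = 'Y' WHERE User = 'root';
FLUSH PRIVILEGES;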

Rick fucked around with this message at 01:04 on Sep 9, 2018

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
How do I partition based on the first character of a column in a MySQL 5.6 database?

My column contains special characters and whatnot so I want to partition based on the ASCII value of the first character in the column.

I have the ASCII ranges I want to use for the eight partitions, but I don't actually know how to set up the partition:
1-37
38-42
43-47
48-53
54-67
68-73
74-81
82-255

How do I define these partitions at table creation?

edit: Or is there a better way of doing this?

Agrikk fucked around with this message at 01:56 on Sep 23, 2018

McGlockenshire
Dec 16, 2005

GOLLOCKS!
It sounds like you want partitioned tables? If so, 5.6 only supports checking the entire value of the field. Consider pre-processing your data to add another column that contains the numeric ASCII value of the first character in the magic column, and partition using PARTITION BY RANGE on that. It might be possible to do this automatically using a trigger and the ASCII builtin... might. I don't know how triggers and partitioning interact.
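A rough sketch of what that could look like with the ranges above (table and column names are made up; on 5.6 the extra column has to be filled in by the application or a trigger, since generated columns only arrived in 5.7):

code:
CREATE TABLE items (
  id INT NOT NULL,
  raw_value VARCHAR(255) NOT NULL,
  first_char_ascii TINYINT UNSIGNED NOT NULL,  -- e.g. ASCII(raw_value), set on insert
  PRIMARY KEY (id, first_char_ascii)           -- the partition column must be in every unique key
)
PARTITION BY RANGE (first_char_ascii) (
  PARTITION p0 VALUES LESS THAN (38),
  PARTITION p1 VALUES LESS THAN (43),
  PARTITION p2 VALUES LESS THAN (48),
  PARTITION p3 VALUES LESS THAN (54),
  PARTITION p4 VALUES LESS THAN (68),
  PARTITION p5 VALUES LESS THAN (74),
  PARTITION p6 VALUES LESS THAN (82),
  PARTITION p7 VALUES LESS THAN MAXVALUE
);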

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
Thanks for this link.

It actually led me to KEY PARTITIONING, which ultimately does what I was trying to accomplish with the first-character grouping: assigning data evenly across partitions based on one or more columns.

Cheers!

Pollyanna
Mar 5, 2005

Milk's on them.


We have a Postgres table with roughly 10 million to 15 million rows, and some of the columns in that table contain PII. We want to blank out that PII in those columns and I tried to do so via an “update_all” in Rails equivalent to “update <table> set <the columns = fake data>”, but it seems to take a very very long time. Why would this happen? Is there a way to figure out why it would be so slow? Is the time it takes to update an entire column in a table just that long?

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Pollyanna posted:

We have a Postgres table with roughly 10 million to 15 million rows, and some of the columns in that table contain PII. We want to blank out that PII in those columns and I tried to do so via an “update_all” in Rails equivalent to “update <table> set <the columns = fake data>”, but it seems to take a very very long time. Why would this happen? Is there a way to figure out why it would be so slow? Is the time it takes to update an entire column in a table just that long?

There are a lot of questions about the structure to answer, I think.

One is: what criteria are you updating by? If it's an indexed column (e.g. ID), it's faster than a string comparison on an unindexed column (e.g. WHERE ColumnPII LIKE '%x%'). If there are no criteria at all (i.e. it applies to the entire table), then what columns are being modified and how? Are those columns indexed?
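A quick way to see what the statement will actually do before running it for real is to EXPLAIN it (made-up table and column names, Postgres):

code:
EXPLAIN UPDATE users SET phone_number = '000-000-0000' WHERE id = 12345;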

Pollyanna
Mar 5, 2005

Milk's on them.


Scaramouche posted:

There are a lot of questions about the structure to answer, I think.

One is: what criteria are you updating by? If it's an indexed column (e.g. ID), it's faster than a string comparison on an unindexed column (e.g. WHERE ColumnPII LIKE '%x%'). If there are no criteria at all (i.e. it applies to the entire table), then what columns are being modified and how? Are those columns indexed?

Apply to entire table, yeah. The table has about 16 btree indexes applied to it (things like created_at, updated_at DESC, uuid, email, etc). It’s supposed to be flat-out just this:

code:
UPDATE table
SET phone_number = '000-000-0000';
where there is no index on phone number. I want to expand this to email too, but that definitely does have an index on it.

EDIT: I tried updating just one of them and it also takes forever. What the hell.

Pollyanna fucked around with this message at 20:53 on Sep 27, 2018

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Pollyanna posted:

We have a Postgres table with roughly 10 million to 15 million rows, and some of the columns in that table contain PII. We want to blank out that PII in those columns and I tried to do so via an “update_all” in Rails equivalent to “update <table> set <the columns = fake data>”, but it seems to take a very very long time. Why would this happen? Is there a way to figure out why it would be so slow? Is the time it takes to update an entire column in a table just that long?
A few questions:
Are you testing what happens when you do the update query directly in the DB, or are you only testing through Rails? Are you sure Rails is doing what you are expecting and not something stupid?
Is there a lot of activity happening on that table while you're running the update? Are you sure there's nothing else blocking you?
Maybe my opinions are skewed from being old now, but 10 million rows isn't nothing (though it's certainly not Big Data). Is there some way you could break things up so that you can 1) maybe do things in a more parallel fashion and 2) not have a lot of resources devoted to one big ongoing uncommitted transaction? I don't have any experience with Postgres, so I don't know how much issue 2 applies, nor do I know if there is some sort of hint or setting that could enable or force more parallelism.
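A batched version of the scrub could look roughly like this (made-up table and column names), committing between chunks so no single transaction has to hold everything:

code:
-- repeat with the next id range, committing between batches
UPDATE users
SET phone_number = '000-000-0000'
WHERE id BETWEEN 1 AND 100000;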

Pollyanna
Mar 5, 2005

Milk's on them.


I’m going through Rails - I’m pretty sure we don’t get direct RW access to the staging database (or if we do, I don’t have it). I don’t know if it’s Rails or the DB, but the fact that it’s hanging on even updating one row where the primary key is supplied implies that it’s not Rails. Right?

I’m gonna have to find a way to get closer to the metal to figure out what’s happening.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Failing on one row implies it's in the Rails layer to me; the database itself would be very straightforward about whether something passed or failed. However, if Rails is causing the problem (e.g. permissions, connection string, etc.) it could hang around indefinitely. If you can, maybe check the triggers to see if something weird is happening on update.

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
Slow to update a single row also reeks of Rails to me, buuuuut if the application is really dumb, nothing prevents that row from being locked and waiting that entire time. E.g., if 10k processes are all updating that row, then of course it will be slow. Now, if you can guarantee that nothing is happening on the database and it's still slow, then it's probably not the database.

DELETE CASCADE
Oct 25, 2017

i haven't washed my penis since i jerked it to a phtotograph of george w. bush in 2003
When you have a series of SQL statements (outside of a transaction) like:

update table set col = newval where id = 1;
update table set col = newval where id = 2;
...
update table set col = newval where id = n;

Then sure the first one will complete, the change will be visible, then the second will execute. (It will also probably be super slow, don't do this.) But when you write SQL of the form:

update table set col = newval;

You are modifying every row in one statement, therefore implicitly in one transaction. You won't see any changes until the entire statement finishes, not even to the first row.

Also, because of the way the Postgres storage backend works, an update is basically a delete plus an insert. If you have a bunch of indexes on the table, they also have to be updated for every single row, even if the modification you are making doesn't touch the indexed columns, because the internal id of the row has changed and the index entry is now out of date. Constraints may need to be checked similarly per row, depending on how they are set up. For large tables with many rows, if you can handle the downtime, it is generally fastest to drop all indexes/constraints on or referring to the table, make your modifications, then create the indexes/constraints again. Equivalently, you can create a new table that is identical except for those indexes/constraints, perform your modifications there, add the indexes/constraints back to the new table, and only at the very end do a transactional delete-and-rename to replace the previous table. That's normally what I do for bulk table loads or wholesale modifications like this.
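A rough sketch of that copy-and-swap approach in Postgres (table, column, and index names are all made up):

code:
BEGIN;

-- copy structure only; indexes and constraints come later
CREATE TABLE users_scrubbed (LIKE users INCLUDING DEFAULTS);

INSERT INTO users_scrubbed (id, email, phone_number, created_at)
SELECT id, email, '000-000-0000', created_at
FROM users;

-- recreate indexes/constraints only after the bulk load
CREATE INDEX users_scrubbed_email_idx ON users_scrubbed (email);

-- swap the tables
ALTER TABLE users RENAME TO users_old;
ALTER TABLE users_scrubbed RENAME TO users;

COMMIT;

DROP TABLE users_old;  -- once you're happy with the result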

Can't speak to the Rails part, but ultimately all Rails could possibly be doing is SQL statements like the above.

Edit: If you can't even update one row separately, then maybe there is a lock preventing it; check your query's status in pg_stat_activity to see.
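Something along these lines will show what's running and what it's waiting on (standard pg_stat_activity columns on 9.6 and up):

code:
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state <> 'idle';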

DELETE CASCADE fucked around with this message at 01:52 on Sep 28, 2018

nexxai
Jul 17, 2002

quack quack bjork
Fun Shoe
I have a very small public (non-revenue-generating) website that uses MariaDB (Server version: 10.2.18-MariaDB-10.2.18+maria~xenial-log mariadb.org binary distribution) as the database backend. I currently back up the DB with an every-15-minute cron job running mysqldump and then sending the resulting .sql file to a cloud storage provider. I was wondering if anyone has any recommendations for better backup software that's easier to use and restore from, but also that is free (or at least very inexpensive).

McGlockenshire
Dec 16, 2005

GOLLOCKS!
Your current solution is the "easiest" one. At some point, you'll find that mysqldump can be kind of a drag on the live database. It's perfectly safe to run your backups from a read-only replica on another machine.

Make sure that you are actively testing restores. Consider automating restores to a separate environment on a continual basis and inspecting the data from time to time. A place I've worked before imported the previous evening's backup into each developer's version of the site every morning, for example.

The "best" way to do MySQL backups is to use something that talks to the built in InnoDB hot backup bits, like Percona Xtrabackup. There are some caveats when using it with MariaDB 10.2 or higher, but it looks like they have a fork that'll work for you, the link is near the bottom of the page. It is absolutely not an easy solution and operating it requires knowledge of a bunch of MySQL jargon, including replication things, even if you don't use replication. Both backups and restores require extra steps and a thorough understanding of the process. Restores also can't be done piecemeal - it's effectively a transaction-safe copy of /var/lib/mysql.

Stick with mysqldump as long as it suits your use case. It's a good enough tool.

McGlockenshire fucked around with this message at 08:22 on Sep 28, 2018

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

McGlockenshire posted:

Your current solution is the "easiest" one. At some point, you'll find that mysqldump can be kind of a drag on the live database. It's perfectly safe to run your backups from a read-only replica on another machine.
Given the circumstances, indeed the best solution, and a very thorough writeup.

I've seen exceptions to that last sentence, however, because backup load on a read-only replica results in flooding of the master after the backup concludes. I haven't seen this performance dip on PostgreSQL, but it used to be very common on MySQL.

Regarding Percona tools, they are generally required for any production environment, so definitely plan on using them. As above, however, I've seen cases where xtrabackup fails under live load (a while ago, and IIRC it couldn't keep up, as there's no locking to prevent the history from going away mid-backup).

Pollyanna
Mar 5, 2005

Milk's on them.


Took a look at pg_stat_activity and there was a whole-table update query that had been running since yesterday afternoon on that very column. I killed it and all the rest of the queries stopped waiting. Now a single update works fine :dance:

So is it because of the indexes? Or just the sheer size of the table, and I need to batch it?

Edit: Hmm, even via psql I can’t do a “select * from big_table limit 1;” without it taking forever. The only thing using that table is an autovacuum on it. Is that what’s preventing me from getting even one record?

Edit 2: it looks like the autovacuum is indeed what’s locking it. There’s a bunch of RowExclusiveLocks associated with its PID, as well as an ExclusiveLock and a ShareUpdateExclusiveLock. Am I just supposed to wait until the vacuum finishes? It’s been going for like 3 hours now.

Edit 3: we ended up narrowing it down to the indexes. 10~15 million records and an index on id desc means that just doing a blind select limit takes ages. :bang: Problem solved, though.

Pollyanna fucked around with this message at 21:36 on Sep 28, 2018

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Pollyanna posted:

Edit 3: we ended up narrowing it down to the indexes. 10~15 million records and an index on id desc means that just doing a blind select limit takes ages. :bang: Problem solved, though.
This I don't quite follow. An unqualified SELECT should not invoke an index at all, and simply return rows in page order.

Also auto vacuum shouldn't be doing so much unless you have a large number of updates and deletes or something.
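If it happens again, running the statement under EXPLAIN (using the big_table name from the earlier post) would show what plan it's actually using and how many pages it touches:

code:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM big_table LIMIT 1;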

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


PhantomOfTheCopier posted:

Also auto vacuum shouldn't be doing so much unless you have a large number of updates and deletes or something.

Like updating PII on 15 million rows?

Love Stole the Day
Nov 4, 2012
Please give me free quality professional advice so I can be a baby about it and insult you
Need some SQL help. Postgres. I'm using a table that contains id references to another table and I am making a query to replace the keys with their corresponding row data.

For the id keys themselves it's fine, but I need to use an array and form a similar array of matching row data as JSON objects... and that array of integer ids ( int[] ) is what's throwing me off. So far, I'm able to get the matching content itself, but something about how I'm forming my select clause is causing it to return many arrays of one element each rather than everything contained in a single array.

Here's the query:
code:

select ..., json_build_object(...) as director, array[json_build_object(...)] as cast
from
     (people p1
     inner join movies m1 on p1.id = m1.director_id
     left join ( 
          (select unnest(m2.cast_ids) cast_member from movies m2) t2 
          left join people p2 on p2.id = cast_member
     ) on p2.id = any(m2.cast_ids)

What this query returns is everything correctly, except that the "cast" array contains only one cast member's data as JSON, and instead of having all cast members for each movie, I have multiple copies of the same movie, each with a different cast member associated with it via the cast_ids array in the movies table.

I'm trying to figure out how to make all of the cast member objects go into that array so that we have one result for each movie with an array of all cast members based on the id references.

I hope this sufficiently explains my issue. I'm sure I'm probably doing this in a very stupid and inefficient way. Thanks for reading!

Love Stole the Day fucked around with this message at 22:59 on Sep 29, 2018

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Nth Doctor posted:

Like updating PII on 15 million rows?
Not really, since that's not much data, though it's possible we're not getting the whole story. If your rows are 32 8-byte fields, it's only 4GB of data to move. Sure there are indexes to consider, but that's a far cry from TBs, so you've got something else going on.

First, the single-transaction update is a concern, because you're required to keep an entire copy of the data in case of a crash or rollback. This is why they recommend dropping and rebuilding indexes, dumping and restoring, changing transaction isolation, shared table locking, creating temporary or secondary tables for the new data, adding new null columns and migrating, and so forth. Second, your nominal traffic is important. I remember the word 'staging' somewhere up there, but if you can't update a single field you should really be asking how you got in this situation in the first place and how you're going to avoid it on the master.

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


PhantomOfTheCopier posted:

Not really, since that's not much data, though it's possible we're not getting the whole story. If your rows are 32 8-byte fields, it's only 4GB of data to move. Sure there are indexes to consider, but that's a far cry from TBs, so you've got something else going on.

First, the single-transaction update is a concern, because you're required to keep an entire copy of the data in case of a crash or rollback. This is why they recommend dropping and rebuilding indexes, dumping and restoring, changing transaction isolation, shared table locking, creating temporary or secondary tables for the new data, adding new null columns and migrating, and so forth. Second, your nominal traffic is important. I remember the word 'staging' somewhere up there, but if you can't update a single field you should really be asking how you got in this situation in the first place and how you're going to avoid it on the master.

I really appreciate a thorough answer in response to me being a dickish smartass, POTC. Cheers, buddy.

nem
Jan 4, 2003

panel.dev
apnscp: cPanel evolved
What are some recommended solutions for storing round-robin data in a database, preferably Postgres, indefinitely? I've got a crude setup that chunks bandwidth data into 5-minute buckets, then periodically averages it down to daily, weekly, and monthly as accounts age. It chokes as diversity scales, though, and I'd like to explore alternatives.

I've looked into InfluxDB. Pair it with Grafana, nice, and I can begin logging/displaying other server metrics, but it's another daemon to run. Alternatively, what about Timescale? It's a Postgres extension, nice, which partitions data automatically, double nice. With a little legwork it can also log server data too, so lend me some enlightenment. :unsmith:
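For what it's worth, the Timescale version of the 5-minute-bucket setup is pretty small (table and column names are made up; the extension has to be preloaded in postgresql.conf):

code:
CREATE EXTENSION IF NOT EXISTS timescaledb;

CREATE TABLE bandwidth_samples (
  ts      timestamptz NOT NULL,
  account text        NOT NULL,
  bytes   bigint      NOT NULL
);

SELECT create_hypertable('bandwidth_samples', 'ts');

-- roll the 5-minute samples up into daily averages per account
SELECT time_bucket('1 day', ts) AS day, account, avg(bytes) AS avg_bytes
FROM bandwidth_samples
GROUP BY day, account
ORDER BY day;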

Pollyanna
Mar 5, 2005

Milk's on them.


PhantomOfTheCopier posted:

Not really, since that's not much data, though it's possible we're not getting the whole story. If your rows are 32 8-byte fields, it's only 4GB of data to move. Sure there are indexes to consider, but that's a far cry from TBs, so you've got something else going on.

First, the single-transaction update is a concern, because you're required to keep an entire copy of the data in case of a crash or rollback. This is why they recommend dropping and rebuilding indexes, dumping and restoring, changing transaction isolation, shared table locking, creating temporary or secondary tables for the new data, adding new null columns and migrating, and so forth. Second, your nominal traffic is important. I remember the word 'staging' somewhere up there, but if you can't update a single field you should really be asking how you got in this situation in the first place and how you're going to avoid it on the master.

They're pretty large rows, with a large number of columns, one of which is an hstore holding arbitrary JSON data. That may affect things. As for staging vs. master, we're looking to wipe PII data only on staging, so it's not a major concern - it's considered tech debt that has a deadline for us (demos n poo poo for customers).

Truncation is still probably going to be our best bet, and we also need to figure out what the gently caress is going on re: indexes. I suspect that a lot of our problems stem from those 16 indexes...at least one of which is redundant.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
I have a Folding@Home table [user_hourly_summary] hosted on AWS Aurora using the MySQL 5.6 engine with the following columns:

-- username [varchar(100)] - team [int] - stamp [datetime] - score [bigint] - wu [bigint]


I am trying to write a query that will calculate the activity (net gain in work units and points) between a timestamp and the timestamp immediately prior for every timestamp for every user and insert it into a different table [user_hourly_activity]. A new set of data arrives every 60 minutes and inserts 1.8 million new rows into the database representing every user and their latest cumulative score and work units. The table user_hourly_summary has two indexes:

PRIMARY (UserName, team, stamp)
Index1 (username, team, stamp, score, wu)


This query will run immediately after the arrival of the latest batch of user data and needs to then calculate the latest user activity for the preceding 60 minutes.

This is what I have and it takes twenty minutes to run. I feel like there should be a more efficient way to run this than running three queries an hour over two million rows.

code:
INSERT INTO user_hourly_activity (username, team, stamp, pph, wuph)
	SELECT username, team, stamp, score - 
		(SELECT score FROM user_hourly_summary t2 
			WHERE t2.username = t1.username 
			AND t2.team = t1.team
			AND t2.stamp < t1.stamp
			ORDER BY t2.stamp DESC LIMIT 1) AS pph,
		wu-
		(SELECT wu from user_hourly_summary t3 
			WHERE t3.username = t1.username 
			AND t3.team = t1.team
			AND t3.stamp < t1.stamp
			ORDER BY t3.stamp DESC LIMIT 1) AS wuph
		FROM user_hourly_summary t1
Is there a better way of doing this?

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Agrikk posted:

I have a Folding@Home table [user_hourly_summary] hosted on AWS Aurora using the MySQL 5.6 engine with the following columns:

-- username [varchar(100)] - team [int] - stamp [datetime] - score [bigint] - wu [bigint]


I am trying to write a query that will calculate the activity (net gain in work units and points) between a timestamp and the timestamp immediately prior for every timestamp for every user and insert it into a different table [user_hourly_activity]. A new set of data arrives every 60 minutes and inserts 1.8 million new rows into the database representing every user and their latest cumulative score and work units. The table user_hourly_summary has two indexes:

PRIMARY (UserName, team, stamp)
Index1 (username, team, stamp, score, wu)


This query will run immediately after the arrival of the latest batch of user data and needs to then calculate the latest user activity for the preceding 60 minutes.

This is what I have and it takes twenty minutes to run. I feel like there should be a more efficient way to run this than running three queries an hour over two million rows.

code:
INSERT INTO user_hourly_activity (username, team, stamp, pph, wuph)
  SELECT username, team, stamp, score - 
    (SELECT score FROM user_hourly_summary t2 
      WHERE t2.username = t1.username 
      AND t2.team = t1.team
      AND t2.stamp < t1.stamp
      ORDER BY t2.stamp DESC LIMIT 1) AS pph,
    wu-
    (SELECT wu from user_hourly_summary t3 
      WHERE t3.username = t1.username 
      AND t3.team = t1.team
      AND t3.stamp < t1.stamp
      ORDER BY t3.stamp DESC LIMIT 1) AS wuph
    FROM user_hourly_summary t1
Is there a better way of doing this?

Subqueries aren't the greatest.

Here's a sketch just from noodling over it while my Outlook shits itself:
code:
SELECT
  t1.username,
  t1.team,
  t1.stamp,
  t1.score - t3.score AS pph,
  t1.wu - t3.wu AS wuph
FROM user_hourly_summary AS t1
LEFT OUTER JOIN (
  -- previous stamp per (username, team, stamp); a derived table can't be
  -- correlated with t1 in MySQL 5.6, so self-join and group instead
  SELECT
    a.username,
    a.team,
    a.stamp,
    MAX(b.stamp) AS prev_stamp
  FROM user_hourly_summary AS a
  JOIN user_hourly_summary AS b
    ON b.username = a.username
    AND b.team = a.team
    AND b.stamp < a.stamp
  GROUP BY a.username, a.team, a.stamp
) AS dt
  ON dt.username = t1.username
  AND dt.team = t1.team
  AND dt.stamp = t1.stamp
LEFT OUTER JOIN user_hourly_summary AS t3
  ON t3.username = dt.username
  AND t3.team = dt.team
  AND t3.stamp = dt.prev_stamp
 

I don't really like it, but MySQL didn't get window functions until version 8. What you really want here is the first lagging row from the latest for the user.

The derived table will hopefully be a little bit less stupid. If there is some way to know ahead of time which user/teams you are doing this query with, you can throw that list into the derived table query's WHERE clause (and the main query's WHERE clause too, for that matter) to limit the rows that you process.

That could also be a problem, come to think of it. Your query as presented may be recalculating the per-hour values EVERY TIME it runs for every row. Not just the latest ones.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

Nth Doctor posted:

Subqueries aren't the greatest.

That could also be a problem, come to think of it. Your query as presented may be recalculating the per-hour values EVERY TIME it runs for every row. Not just the latest ones.

My plan is to perform the query once over the existing data and dump the results into an activity table and then run it hourly thereafter, limiting via a WHERE clause to the previous 1-2 hours.

But I'll see how your query performs. Thank you for it.


And yeah, I was bummed to find that MySQL didn't have a lag function like MS-SQL. :(

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!
Does the data that comes in each hour (potentially) contain multiple records per user, or is this just one record per user per hour? If it's just one record per user per hour, do all the new records that come in have the same timestamp, or are all the timestamps different?

Is user_hourly_summary truncated each hour before the data comes in, or does it have all the data that has ever come in? If all the data stays in there, is there a where clause that you didn't show us that prevents you from recalculating all the user activity that you calculated in the previous hours?

If you're seeing multiple timestamps per hour and you're already only calculating based on the new data that came in, then you're probably already as good as you're going to get.
