Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

Jethro posted:

Does the data that comes in each hour (potentially) contain multiple records per user, or is this just one record per user per hour? If it's just one record per user per hour, do all the new records that come in have the same timestamp, or are all the timestamps different?

Is user_hourly_summary truncated each hour before the data comes in, or does it have all the data that has ever come in? If all the data stays in there, is there a where clause that you didn't show us that prevents you from recalculating all the user activity that you calculated in the previous hours?

If you're seeing multiple timestamps per hour and you're already only calculating based on the new data that came in, then you're probably already as good as you're going to get.

The data comes in a batch each hour. Each user appears only once and the timestamp is the same for all users. user_hourly_summary contains all the data for each user forever, and yes, there will be a where clause that processes the prior two updates (for thoroughness: the update from two hours ago should already be calculated, but sometimes it's missed due to patching or whatever):

code:
where stamp > convert_tz(date_add(now(), INTERVAL -2 HOUR), 'UTC', 'US/Pacific')
The work flow is:

[1] pull data from web site-> [2] push data into user_current_stats -> [3] push data into user_hourly_summary -> [4] calculate data for user_hourly_activity

1. Raw user data is pulled from the folding@home web site and some processing is done to clean up the data.
2. The cleaned data gets pushed into user_current_stats and is tagged with the timestamp of the download time. This table is truncated before every update. This data contains only the latest data available to speed the summary page of the web site and various back-end calculations.
3. user data then gets pushed into user_hourly_summary which contains all of the data for each user for each hour forever.
4. user_hourly_activity gets calculated based on the most recent two timestamps in user_hourly_summary

If there is a better workflow for this, I'm all ears. I'm more than happy to rebuild this database and reload the data however makes updates fast and the web page as responsive as possible.

Also:

https://folding.rainwalk.net/index.php is my web page currently running the data on a pair of SQL Server 2016 databases. These databases are independently processing data and are running behind a load balancer on pretty beefy AWS EC2 instances. I'm messing around with AWS Aurora (MySQL 5.6) and I want to see if I can make Aurora as responsive while adding resilience.

Agrikk fucked around with this message at 17:26 on Oct 1, 2018


PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
If you have the option, why not go with PostgreSQL in RDS and get the benefits of window functions (and the query optimizer)?

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

PhantomOfTheCopier posted:

If you have the option, why not go with PostgreSQL in RDS and get the benefits of window functions (and the query optimizer)?

That’s for another day.

I’ve been doing a tour of AWS databases and Aurora MySQL 5.6 is currently on my list.

Aurora Postgres is next, followed by the RDS versions of MySQL, Postgres and Oracle.

nem
Jan 4, 2003

panel.dev
apnscp: cPanel evolved

PhantomOfTheCopier posted:

If you have the option, why not go with PostgreSQL in RDS and get the benefits of window functions (and the query optimizer)?

Edit: derp not me :eng99:

Remora
Aug 15, 2010

Well, if this is the stupid question thread, this is probably where I belong.

I'm learning SQL (and VBA... guess why) for my job, and I am basically having to teach myself. My current question is - is there a really good book or site or blog post that just explains what the gently caress joins are *for* and why/when you use them? I've read a bunch of MS support articles and I am not loving getting it. "This is how to use a join" I am drowning in, "This is why joins are cool and useful and why you should know and use them" I am absolutely not finding.

Also... Is there a resource for Access 2013/16 that's better than the others? Book, site, whatever besides a real class? I don't get a choice about using Access, but I would like to be as good as I can be at it.

anthonypants
May 6, 2007

by Nyc_Tattoo
Dinosaur Gum

Remora posted:

Well, if this is the stupid question thread, this is probably where I belong.

I'm learning SQL (and VBA... guess why) for my job, and I am basically having to teach myself. My current question is - is there a really good book or site or blog post that just explains what the gently caress joins are *for* and why/when you use them? I've read a bunch of MS support articles and I am not loving getting it. "This is how to use a join" I am drowning in, "This is why joins are cool and useful and why you should know and use them" I am absolutely not finding.

Also... Is there a resource for Access 2013/16 that's better than the others? Book, site, whatever besides a real class? I don't get a choice about using Access, but I would like to be as good as I can be at it.
Microsoft has a series of videos on database fundamentals, which might help: https://mva.microsoft.com/en-us/training-courses/sql-database-fundamentals-16944

Basically, you have one table full of data, and you need to correlate that data with data in another table. Let's say you've got the following tables:
code:
Threads table:
-------------------------------------------------------------------------------
ThreadID	ThreadName						Author
2672629		'SELECT * FROM Questions WHERE Type = \'Stupid\''	37414
code:
Posts table:
-------------------------------------------------------------------------------
PostID		ParentThread	Author	Body
488532303	2672629		37414	'the body of the post you just made'
A Threads table contains only threads, and a Posts table contains only posts. So if you wanted to display all of the data on all of the posts in a specific thread, you'd have to make a join:
SQL code:
SELECT p.* FROM Posts AS p
INNER JOIN Threads AS t
ON p.ParentThread = t.ThreadID
WHERE t.ThreadID = 2672629;
The technical term for splitting data up into different tables is "database normalization", and learning more about the normal forms and why data is normalized in the first place might help.

anthonypants fucked around with this message at 07:25 on Oct 3, 2018

Remora
Aug 15, 2010

I suppose what I'm not getting is - and I haven't watched the videos yet, phoneposting before I leave for work - that's structured very similarly to a lot of examples I've seen, and I don't see why you wouldn't just SELECT FROM [Posts] WHERE [ParentThread]=whatever. The threads table doesn't look like it has anything you even want.

Tax Oddity
Apr 8, 2007

The love cannon has a very short range, about 2 feet, so you're inevitably out of range. I have to close the distance.
Well, what if you had a ReceiptHeader and a ReceiptRow table? The ReceiptHeader would tell you which store the receipt is from, the date of purchase, name of the seller etc. Each individual ReceiptRow would tell you which product you've bought and how much it cost, for a particular receipt. Does that make any more sense?
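For instance (just a sketch to make it concrete; these table and column names are made up for the example, not a real schema), pulling the rows of one receipt together with its header info needs a join, because the store and date only live on the header:
SQL code:
-- hypothetical columns: StoreName/PurchaseDate on the header, ProductName/Price on the rows
SELECT rh.StoreName,
       rh.PurchaseDate,
       rr.ProductName,
       rr.Price
  FROM ReceiptHeader AS rh
       INNER JOIN ReceiptRow AS rr ON rr.ReceiptHeaderID = rh.ReceiptHeaderID
 WHERE rh.ReceiptHeaderID = 12345;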

Tax Oddity fucked around with this message at 13:26 on Oct 3, 2018

Cold on a Cob
Feb 6, 2006

i've seen so much, i'm going blind
and i'm brain dead virtually

College Slice

Remora posted:

I suppose what I'm not getting is - and I haven't watched the videos yet, phoneposting before I leave for work - that's structured very similarly to a lot of examples I've seen, and I don't see why you wouldn't just SELECT FROM [Posts] WHERE [ParentThread]=whatever. The threads table doesn't look like it has anything you even want.

Let's say you were selecting all posts in every thread started by a particular author. You need to join the Threads table to the Posts table because you're working with data in two separate tables.

SQL code:
SELECT p.* 
FROM Posts AS p
INNER JOIN Threads AS t ON p.ParentThread = t.ThreadID
WHERE t.Author = 37414;
Edit: Maybe think of joins as a more succinct way to join arrays of data structures without having to write a bunch of loops.

Cold on a Cob fucked around with this message at 13:31 on Oct 3, 2018

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Remora posted:

I suppose what I'm not getting is - and I haven't watched the videos yet, phoneposting before I leave for work - that's structured very similarly to a lot of examples I've seen, and I don't see why you wouldn't just SELECT FROM [Posts] WHERE [ParentThread]=whatever. The threads table doesn't look like it has anything you even want.
Well, in the case of generating a list of posts, you are correct that you might not need anything from the thread if you know the thread id, but there are plenty of times when you do need the information from multiple tables in each row. Or maybe you only have the thread title, not the id:
SQL code:
Select pu.name poster,
       pu.regDate,
       pu.avatar,
       pu.title,
       p.postedDate,
       p.postContents,
       IIf(p.editFlag = 'Y',CONCATENATE(IIf(IsNull(peu.name),'Somebody',peu.name),' hosed around with this message at ',DateFormat(p.editDate,'Mon D, YYYY'),' around ',DateFormat(p.editDate,'HH:MI')),NULL) editMessage
  From [Threads] as [t]
       Inner Join [Posts] as [p] on t.threadId = p.threadId
       Inner Join [Users] as [pu] on p.postedBy = pu.userId
       Left Outer Join [Users] as [peu] on p.editedBy = peu.userId
 Where [t].Title = 'SELECT * FROM Questions WHERE Type = \'Stupid\''
So, to get the information about the poster and who (if anyone) edited the post, you need to join to the Users table two times.

Remora
Aug 15, 2010

Oh, cool, okay, that makes a lot more sense. Sorry for being dense.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

Agrikk posted:

The data comes in a batch each hour. Each user appears only once and the timestamp is the same for all users. user_hourly_summary contains all the data for each user forever, and yes, there will be a where clause that processes the prior two updates (for thoroughness: the update from two hours ago should already be calculated, but sometimes it's missed due to patching or whatever):

code:
where stamp > convert_tz(date_add(now(), INTERVAL -2 HOUR), 'UTC', 'US/Pacific')
The work flow is:

[1] pull data from web site-> [2] push data into user_current_stats -> [3] push data into user_hourly_summary -> [4] calculate data for user_hourly_activity

1. Raw user data is pulled from the folding@home web site and some processing is done to clean up the data.
2. The cleaned data gets pushed into user_current_stats and is tagged with the timestamp of the download time. This table is truncated before every update. This data contains only the latest data available to speed the summary page of the web site and various back-end calculations.
3. user data then gets pushed into user_hourly_summary which contains all of the data for each user for each hour forever.
4. user_hourly_activity gets calculated based on the most recent two timestamps in user_hourly_summary

If there is a better workflow for this, I'm all ears. I'm more than happy to rebuild this database and reload the data however makes updates fast and the web page as responsive as possible.

Also:

https://folding.rainwalk.net/index.php is my web page currently running the data on a pair of SQL Server 2016 databases. These databases are independently processing data and are running behind a load balancer on pretty beefy AWS EC2 instances. I'm messing around with AWS Aurora (MySQL 5.6) and I want to see if I can make Aurora as responsive while adding resilience.



I ended up changing the work flow slightly to get a better performing query that represents a lag function:

I push the data into current stats, then do a join using max(stamp) against the hourly_summary table. THEN I dump the current stats into the summary table. The processing of data end to end takes four minutes instead of twenty.
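Roughly, the lag-style step looks like this (a sketch from memory, not the exact production query; the hourly column names are guesses):

SQL code:
-- assumes user_current_stats holds only the newest batch and user_hourly_summary still
-- holds everything; column names (Score, WU, team, PointsPerHour, WUperHour) are illustrative
INSERT INTO user_hourly_activity (stamp, UserName, team, PointsPerHour, WUperHour)
SELECT cur.stamp,
       cur.UserName,
       cur.team,
       cur.Score - prev.Score,
       cur.WU - prev.WU
  FROM user_current_stats cur
       JOIN user_hourly_summary prev
         ON prev.UserName = cur.UserName
        AND prev.team = cur.team
        AND prev.stamp = (SELECT MAX(s.stamp) FROM user_hourly_summary s);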

Thanks all for your help!

Paul MaudDib
May 3, 2006

TEAM NVIDIA:
FORUM POLICE
We seem to be getting much slower performance using an inner join than a theta join using Oracle. i.e.

"SELECT a.uid from TableA a, TableB b WHERE b.parentFk = a.uid AND b.someVal = '1234' order by a.uid asc"

runs much faster than:

SELECT a.uid from TableA a INNER JOIN TableB on b.parentFk = a.uid WHERE b.someVal = '1234' group by a.uid"

Is there some intuitive reason why theta joins should be faster? You would think it would be slower given that theta joins can include arbitrary comparisons...

Nth Doctor
Sep 7, 2010

Darkrai used Dream Eater!
It's super effective!


Paul MaudDib posted:

We seem to be getting much slower performance using an inner join than a theta join using Oracle. i.e.

"SELECT a.uid from TableA a, TableB b WHERE b.parentFk = a.uid AND b.someVal = '1234' order by a.uid asc"

runs much faster than:

SELECT a.uid from TableA a INNER JOIN TableB on b.parentFk = a.uid WHERE b.someVal = '1234' group by a.uid"

Is there some intuitive reason why theta joins should be faster? You would think it would be slower given that theta joins can include arbitrary comparisons...

The... gently caress?

Bruegels Fuckbooks
Sep 14, 2004

Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.

Nth Doctor posted:

The... gently caress?

there's a reason why oracle is named after priestesses who huffed fumes in a temple before giving predictions. don't think about it too hard, if oracle does weird poo poo just run tkprof and figure out if something is indexed that shouldn't be or vice versa.

Gatac
Apr 22, 2008

Fifty Cent's next biopic.

Paul MaudDib posted:

We seem to be getting much slower performance using an inner join than a theta join using Oracle. i.e.

"SELECT a.uid from TableA a, TableB b WHERE b.parentFk = a.uid AND b.someVal = '1234' order by a.uid asc"

runs much faster than:

SELECT a.uid from TableA a INNER JOIN TableB on b.parentFk = a.uid WHERE b.someVal = '1234' group by a.uid"

Is there some intuitive reason why theta joins should be faster? You would think it would be slower given that theta joins can include arbitrary comparisons...

This may be just a typo, but you have an ORDER BY in the first and a GROUP BY in the second.

McGlockenshire
Dec 16, 2005

GOLLOCKS!
I assume Oracle either doesn't have EXPLAIN or makes you pay extra for it? If I saw this in MySQL or Postgres, I'd just assume the query planner was using bad data and run an ANALYZE.
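Concretely, that'd just be something like this (sketch; table names borrowed from the queries above):
SQL code:
-- show the plan the optimizer picked
EXPLAIN
SELECT a.uid
  FROM TableA a
       INNER JOIN TableB b ON b.parentFk = a.uid
 WHERE b.someVal = '1234';

-- refresh the statistics the planner is working from
ANALYZE TABLE TableA, TableB;        -- MySQL
-- ANALYZE TableA; ANALYZE TableB;   -- Postgres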

McGlockenshire fucked around with this message at 08:56 on Oct 4, 2018

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Paul MaudDib posted:

We seem to be getting much slower performance using an inner join than a theta join using Oracle. i.e.

"SELECT a.uid from TableA a, TableB b WHERE b.parentFk = a.uid AND b.someVal = '1234' order by a.uid asc"

runs much faster than:

SELECT a.uid from TableA a INNER JOIN TableB on b.parentFk = a.uid WHERE b.someVal = '1234' group by a.uid"

Is there some intuitive reason why theta joins should be faster? You would think it would be slower given that theta joins can include arbitrary comparisons...
First, as McGlockenshire said, check the plans to see what the db is doing.
My guess would be that the engine is giving more deference to your table order in the explicit join query than in the other one, and that is causing the difference. See what happens with
SQL code:
SELECT a.uid
  from TableB b
       INNER JOIN TableA a on b.parentFk = a.uid
 WHERE b.someVal = '1234'
group by a.uid
Finally, just a minor point, but those queries are both inner equijoins. One uses explicit (ANSI SQL) join syntax, and the other uses implicit join syntax, but they are otherwise the same join (except for maybe left table vs. right table). A theta join is just a join (as opposed to a natural join) whose condition uses =, <, ≤, >, or ≥. An equijoin is the special = case of a theta join. A theta join could potentially use comparisons other than = (and you can use any comparison you like with implicit or explicit join syntax), but no db is going to perform poorly just because you could have written < instead of =. You wrote = and the db can see that you wrote =, so it's going to be able to take that into account.
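(Purely to illustrate the terminology, reusing the columns from the queries above: the same explicit-join syntax with a non-equi comparison is still just a join as far as the db is concerned.)
SQL code:
-- a non-equi (theta) join: identical syntax, just a different comparison operator
SELECT a.uid, b.someVal
  FROM TableA a
       INNER JOIN TableB b ON b.parentFk < a.uid;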

Jethro fucked around with this message at 14:36 on Oct 4, 2018

Kashuno
Oct 9, 2012

Where the hell is my SWORD?
Grimey Drawer
hi all, I have a weird Microsoft SQL question. I have a view where, when I do a SELECT TOP (1000) or even just TOP 1, it will immediately spike the server CPU usage to 100% and never finish. However, if I right click and ALTER TO, but then get rid of the alter portion of the query and just run the underlying select on its own, the view returns 65000 rows in about 45 seconds. Any obvious reason why that would occur? Is there something I can do to troubleshoot the process a bit? I'm pretty new to SQL so hopefully I'm just missing something obvious

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

What are you using for sort criteria? e.g. SELECT TOP 1000 * FROM TABLE ORDER BY (indexed int) will operate very differently than ORDER BY (nvarchar)

If you're not providing any sort criteria, I think it'll use the first column

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.
I have a partitioning question:

How to partition? :D


I am loading data from a CSV into an Aurora MySQL 5.6 table set up like so:

user_daily_summary
- stamp (DATETIME)
- UserName (VARCHAR(100))
- Score (BIGINT)
- WU (BIGINT)
- team (BIGINT)

With a primary key on (UserName, Team, Stamp).

Every day, two million rows are added to the table from a CSV. The CSV contains one row per user/team-number combo (a user name must be unique within a team, but the same username can exist in multiple teams), and the whole batch is timestamped with the time the file is loaded.

A second table is then calculated from the first table such that the values for the previous stamp are subtracted from the values for the most recent stamp to get the activity for the last day:

user_daily_activity
- stamp (DATETIME)
- UserName (VARCHAR(100))
- PointsPerDay (BIGINT)
- WUperDay (BIGINT)
- team (BIGINT)

With a primary key on (UserName, Team, Stamp).


With all that said:

I am doing an initial load of data consisting of roughly 1900 files representing the previous 1900 days of data. Each file contains roughly 1.6 million lines of data.

I'm iterating over the list of files and loading the data one file at a time and the job is bogging down like crazy:



The job kicks off by loading 20 days per hour, but then quickly bogs down to 1-2 days loaded per hour. Loading 1800 files at 2 files per hour is unacceptable, because ongoing data loads will take too long.


So now all of this circles back to the basic question:


How do I speed up the data load? Is partitioning the way to go here? Since users will be queried by username (and pulling all data by user) partitioning on username seems to make sense. But what method? HASH or LINEAR HASH on UserName? KEY on Username/Team? KEY on UserName/Team/Stamp?
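For example, is something like this a sane starting point? (Rough sketch only; the partition count is arbitrary, and KEY is used rather than HASH because HASH needs an integer expression. MySQL requires the partitioning columns to be part of every unique key, which UserName satisfies since it's part of the primary key.)

code:
CREATE TABLE user_daily_summary (
  stamp    DATETIME     NOT NULL,
  UserName VARCHAR(100) NOT NULL,
  Score    BIGINT,
  WU       BIGINT,
  team     BIGINT       NOT NULL,
  PRIMARY KEY (UserName, team, stamp)
)
PARTITION BY KEY (UserName)
PARTITIONS 32;   -- partition count is a placeholder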

spiritual bypass
Feb 19, 2008

Grimey Drawer
How are you loading the data? Is it batched into transactions?

For reading, the best thing to do is usually to put an index on those two selection criteria if you haven't done it yet.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

rt4 posted:

How are you loading the data? Is it batched into transactions?

For reading, the best thing to do is usually to put an index on those two selection criteria if you haven't done it yet.

I'm loading using INSERT INTO... SELECT FROM... from a temp table on which I do some initial processing. I'm doing the initial load into the temp table with Aurora's LOAD DATA FROM S3:

code:
    LOAD DATA FROM S3 's3-us-west-2://<bucketname>/users.txt'
    INTO TABLE temp_user
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n'
I have two indexes on each table: the PRIMARY KEY and a second index across all fields.

Moonwolf
Jun 29, 2004

Flee from th' terrifyin' evil of "NHS"!


For bulk loading, in almost everything you're best off stripping all the indexing and rebuilding it afterwards: building 2 giant indices once is vastly faster than keeping 2 b-trees balanced while you keep writing. The cost of constantly maintaining them is what's causing that fall-off in speed; early on they're still small and simple.

What problem is your second index meant to solve? An index that holds a subset of columns that completely fulfils a known query can be useful to stop you having to read the row after the index lookup, but it will be murder on your writes. B-tree indices are normally there to let you get to the set of rows you want efficiently, so what are your expected access paths onto the data once it's all done? Most recent row for a customer? Today's data for all customers? All data for a customer? This would set what your leading keys should be.
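As a rough sketch of what that looks like (MySQL/Aurora syntax; the secondary index name is a stand-in for whatever yours is actually called, and the primary key stays put since InnoDB clusters on it):
SQL code:
-- before the historical load: drop the secondary index
ALTER TABLE user_daily_summary DROP INDEX idx_user_daily_summary_all;

-- ... loop over the files: LOAD DATA FROM S3 into the temp table,
--     then INSERT INTO user_daily_summary ... SELECT ... as before ...

-- after the load: rebuild the secondary index once
ALTER TABLE user_daily_summary
  ADD INDEX idx_user_daily_summary_all (stamp, UserName, Score, WU, team);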

Love Stole the Day
Nov 4, 2012
Please give me free quality professional advice so I can be a baby about it and insult you
(This is Postgres) I am using an array of foreign keys and creating a json object based on its corresponding rows in the reference table.

code:
		
    select jsonb_build_object('id', t2.id, 'name', t2.name, 'biography', t2.biography) c1 
    from (
      (
        select distinct unnest(m3.cast_ids) c_id
        from movies m3
        where m3.cast_ids is not null 
          and m3.id = 2                   <----------- issue
        order by c_id asc
      ) t1
      left join people p1 on p1.id = t1.c_id
    ) t2
The problem I'm trying to tackle at the moment is that I need to somehow add a condition check so that this returns not every row of the `people` table but rather just the relevant ones contained in that original `unnest(...) c1` list.

Whenever I try to do this the intuitive way, I run into endless "undefined value" problems due to context. I've been trying to untangle this for several days now and am having trouble getting it done. I wasn't able to find a way to make use of a `left join lateral` and I wasn't able to find a way to shoe-horn in the condition checks necessary anywhere. So I'm thinking I might need to just scrap all of this and start over with a new approach. I'm hoping someone here can point out my blind spot for me.

As you can see where I marked the issue, I am able to manually control which movie's cast data is returned, but I can't seem to figure out how to make it work generally so that each row will yield based on the given `movie` table's `cast_ids.` If I remove the line that I labeled with the above arrow, then for every movie I will get the entire `people` table as if everybody in the entire table was involved for each movie. So naturally I'm trying to figure out how to make this where clause to filter out the non-relevant things.

Love Stole the Day fucked around with this message at 04:43 on Oct 8, 2018

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

Love Stole the Day posted:

(This is Postgres) I am using an array of foreign keys and creating a json object based on its corresponding rows in the reference table.

code:
		
    select jsonb_build_object('id', t2.id, 'name', t2.name, 'biography', t2.biography) c1 
    from (
      (
        select distinct unnest(m3.cast_ids) c_id
        from movies m3
        where m3.cast_ids is not null 
          and m3.id = 2                   <----------- issue
        order by c_id asc
      ) t1
      left join people p1 on p1.id = t1.c_id
    ) t2
The problem I'm trying to tackle at the moment is that I need to somehow add a condition check so that this returns not every row of the `people` table but rather just the relevant ones contained in that original `unnest(...) c1` list.

Whenever I try to do this the intuitive way, I run into endless "undefined value" problems due to context. I've been trying to untangle this for several days now and am having trouble getting it done. I wasn't able to find a way to make use of a `left join lateral` and I wasn't able to find a way to shoe-horn in the condition checks necessary anywhere. So I'm thinking I might need to just scrap all of this and start over with a new approach. I'm hoping someone here can point out my blind spot for me.

As you can see where I marked the issue, I am able to manually control which movie's cast data is returned, but I can't seem to figure out how to make it work generally so that each row will yield based on the given `movie` table's `cast_ids.` If I remove the line that I labeled with the above arrow, then for every movie I will get the entire `people` table as if everybody in the entire table was involved for each movie. So naturally I'm trying to figure out how to make this where clause to filter out the non-relevant things.

If all the data you are interested in is from the people table, why is that the outer table in the join? I guess in theory every t1.c_id should have a corresponding p1.id, so it shouldn't make that much of a difference, but it certainly looks funny. You filter out null cast_ids, but are you sure none of the entries in the array are null (I have never used Postgres or array columns, so I don't know if that's even a sensible question)?

If I want "records from one table where the id is in another table", I write a query using an IN

SQL code:
select jsonb_build_object('id', p1.id, 'name', p1.name, 'biography', p1.biography) c1 
  from people p1
 where p1.id in (
        select unnest(m3.cast_ids) c_id
        from movies m3
        where m3.cast_ids is not null 
          and m3.id = 2)
Or am I misunderstanding what you want?

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!
There's no movie id in the output rows, nor any grouping by the movie id, so without a constraint you'd get a list of people ids that appear in any movie.

Start by getting (movieid, peopleid) rows from your cast unnest, then join that to people names to get (movieid, peopleid, name), duplicates and nulls allowed. Then jsonify.
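Something shaped like this, maybe (a sketch only; table and column names come from the post above, and the LATERAL unnest plus per-movie grouping are my guess at the output wanted):
SQL code:
SELECT m.id AS movie_id,
       -- one JSON array of cast objects per movie
       jsonb_agg(jsonb_build_object('id', p.id, 'name', p.name, 'biography', p.biography)) AS cast_json
  FROM movies m
       CROSS JOIN LATERAL unnest(m.cast_ids) AS c(c_id)
       LEFT JOIN people p ON p.id = c.c_id
 WHERE m.cast_ids IS NOT NULL
 GROUP BY m.id;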

Vegetable
Oct 22, 2010

This is the source table:



This is what I want:



Basically, for each row of Thing ID, I want a list of Group IDs that the Thing ID is not in.

I'm having trouble figuring out how to make this happen. Any ideas?

McGlockenshire
Dec 16, 2005

GOLLOCKS!
How large is the data set? Something like this will work but it won't be fast and it won't scale well. You can replace the inner SELECT DISTINCTs if you have other canonical sources of your Things and Groups, but it's the string concat of all of the data that will really kill you here.

code:
SELECT thing_id, group_id FROM (
    SELECT All_Things.thing_id,
           All_Groups.group_id,
           (All_Things.thing_id || ',' || All_Groups.group_id) AS thing_group
      FROM (SELECT DISTINCT thing_id FROM Thing_Groups) AS All_Things,
           (SELECT DISTINCT group_id FROM Thing_Groups) AS All_Groups
    ) AS All_Groups_And_Things
WHERE thing_group NOT IN( SELECT thing_id || ',' || group_id FROM Thing_Groups );

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

Has anyone used TeamSQL here (for MSSQL)? I grabbed it on Friday but it asked for JDBC, which I grudgingly installed Java for, and now it can't connect to even my local instance of MSSQL, let alone the real one I want to work with. Good app, undercooked (my impression), or total trash?

anthonypants
May 6, 2007

by Nyc_Tattoo
Dinosaur Gum
If you want something lighter than SSMS, Microsoft also makes SQL Operations Studio, which they recently renamed to Azure Data Studio. It's based on VS Code, but it's got some neat SQL gimmicks in it.

Shy
Mar 20, 2010

Lighter in what way? SSMS is very quick. What are people missing from it to try other clients?

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

I was hoping to try the share/collaboration stuff since we have a bunch of one off scripts to find things that aren't in the interface/presented to end users

Remora
Aug 15, 2010

Continuing my theme of embarrassing questions. I have two tables, WpnSalesGenStorage and CourierGenStorage. 

code:

WpnSalesGenStorage
ID, AssignedWarehouse, AssignedCourier
..., ..., ...
823, Zanesville OH, NULL
824, Rochester NH, NULL
825, Alexandria LA, NULL
..., ..., ...

CourierGenStorage
ID, FullName, Warehouse
..., ..., ...
140, Danielle Everett, Macon GA
141, Ron Dyson, Truth or Consequences NM
142, Shirley Sullivan, Meriden CT
..., ..., ...

Sorry if the formatting sucks, I am phoneposting. If it matters, there's 48 possible values for the warehouse fields (they both were randomly generated from the same table), and FullNames are all guaranteed to be unique values. There are other fields in each table but I don't think they matter.

What I want to do is check the AssignedWarehouse of each row of WpnSalesGenStorage, pull a list of FullName values that have that AssignedWarehouse value as Warehouse, randomly choose one of those FullNames, and update the AssignedCourier field with that randomly selected FullName value.

I kind of assume if I can figure out how to do this for one row, I can figure out how to loop it until it does all rows where AssignedCourier=NULL. I've done that before.

I think I have the UPDATE logic to select the single row, but I'm unsure how to do the rest of the query. I can do pieces in isolation - for example, I have a customized GetRandomInteger(LowLimit,HighLimit) function that will return a decent random number and I've used it to randomly select records before in this DB. Not sure how that's going to help here, though.

code:

UPDATE (SELECT TOP 1 * FROM WpnSalesGenStorage WHERE WpnSalesGenStorage.AssignedCourier IS NULL) SET WpnSalesGenStorage.AssignedCourier = 

After this, I'm guessing at best. I'm guessing it's going to be a fairly involved subquery (at least by my lovely baby standards) and it'll run like poo poo, but I should only have to do it once so performance is not a huge deal. Do I want to do some kind of join on the tables where p.AssignedWarehouse = t.Warehouse, or something?

If this thread is not really for crap this simple, I can try Stack Exchange or something, but I wanted to try to keep my shame to a single location.

TIA for any advice or proverbs you can offer.

kumba
Nov 8, 2003

I posted my food for USPOL Thanksgiving!

enjoy the ride

Lipstick Apathy

this is stupid and clunky but it should work

code:
DECLARE @counter INT = (SELECT Count(*) 
   FROM   wpnsalesgenstorage 
   WHERE  assignedcourier IS NULL) 
DECLARE @currentName NVARCHAR(100) 
DECLARE @currentWarehouse NVARCHAR(100) 
DECLARE @currentSalesID INT 

WHILE @counter > 0 
  BEGIN 
      SET @currentSalesID = (SELECT TOP 1 id 
                             FROM   wpnsalesgenstorage 
                             WHERE  assignedcourier IS NULL 
                             ORDER  BY id) 
      SET @currentWarehouse = (SELECT assignedwarehouse 
                               FROM   wpnsalesgenstorage 
                               WHERE  id = @currentSalesID) 
      SET @currentName = (SELECT TOP 1 x.fullname 
                          FROM   (SELECT fullname, 
                                         GetRandomInteger(0,1000) AS num 
                                  FROM   couriergenstorage 
                                  WHERE  warehouse = @currentWarehouse) x 
                          ORDER  BY x.num) 

      UPDATE wpnsalesgenstorage 
      SET    assignedcourier = @currentName 
      WHERE  id = @currentSalesID 

      SET @counter = @counter - 1 
  END 
there's probably a better way that does not involve a while loop or cursor but for a one-off i imagine this will be fine
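if the backend supports it, a set-based version would look something like this (assumes SQL Server-style T-SQL like the loop above; the CHECKSUM(NEWID(), w.ID) ordering is just there to force a fresh random pick per row):

SQL code:
-- pick one random courier from the matching warehouse for every unassigned sale
UPDATE w
   SET AssignedCourier = c.FullName
  FROM WpnSalesGenStorage AS w
       CROSS APPLY (SELECT TOP 1 cg.FullName
                      FROM CourierGenStorage AS cg
                     WHERE cg.Warehouse = w.AssignedWarehouse
                     ORDER BY CHECKSUM(NEWID(), w.ID)) AS c
 WHERE w.AssignedCourier IS NULL;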

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

Vegetable posted:

This is the source table:



This is what I want:



Basically, for each row of Thing ID, I want a list of Group IDs that the Thing ID is not in.

I'm having trouble figuring out how to make this happen. Any ideas?
Seems like one of those cases where EXCEPT might be useful, i.e. do the cross join of distinct pairs except the pairs from the inner join.
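Rough shape of that idea (borrowing the Thing_Groups table and column names from the post further up; EXCEPT works in SQL Server and Postgres, MySQL 5.x would need the join version instead):
SQL code:
-- every possible pair, minus the pairs that actually exist
SELECT t.thing_id, g.group_id
  FROM (SELECT DISTINCT thing_id FROM Thing_Groups) t
       CROSS JOIN (SELECT DISTINCT group_id FROM Thing_Groups) g
EXCEPT
SELECT thing_id, group_id FROM Thing_Groups;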

Remora
Aug 15, 2010

kumba posted:

this is stupid and clunky but it should work

Thanks a million. Dissecting that was very educational.

Ruggan
Feb 20, 2007
WHAT THAT SMELL LIKE?!


Vegetable posted:

This is the source table:



This is what I want:



Basically, for each row of Thing ID, I want a list of Group IDs that the Thing ID is not in.

I'm having trouble figuring out how to make this happen. Any ideas?

I hope your list of IDs isn’t just stored in your table. Assuming you’re normalized appropriately...

SELECT T.ThingId, G.GroupId
FROM Things T
CROSS JOIN Groups G
LEFT JOIN ThingGroups TG
ON TG.GroupId = G.GroupId
AND TG.ThingId = T.ThingId
WHERE TG.ThingId IS NULL

Then aggregate or whatever. The cross join gets you every combination of thing and group possible, and the left join and where will filter out all existing matches.
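And for the "aggregate or whatever" part, if you want one row per thing (STRING_AGG here assumes Postgres or SQL Server 2017+; MySQL would use GROUP_CONCAT instead):

SQL code:
-- roll the missing group ids up into a comma-separated list per thing
SELECT T.ThingId,
       STRING_AGG(CAST(G.GroupId AS varchar(20)), ',') AS MissingGroupIds
FROM Things T
CROSS JOIN Groups G
LEFT JOIN ThingGroups TG
  ON TG.GroupId = G.GroupId
 AND TG.ThingId = T.ThingId
WHERE TG.ThingId IS NULL
GROUP BY T.ThingId;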

LLSix
Jan 20, 2010

The real power behind countless overlords

What steps should I take to try to optimize this query? The execution plan shows that everything is either an index seek or a primary key lookup except for a single sort triggered by the one order by clause. The sort takes up 1/3 of the total query cost so that seems like a good place to start, but neither I nor anyone else at work knows a good way to speed it up.

code:
SELECT xl.xlinkid, 
       xl.entityid2, 
       xl.relationshipid1, 
       users.userid, 
       nc.namecardid, 
       nc.namefull, 
       users.loginid, 
       '', 
       '0', 
       userrate.ratetype, 
       userrate.amtrate, 
       xtable.description, 
       nc.namefirst, 
       nc.namelast 
FROM   xlink xl, 
       namecard nc, 
       users 
       LEFT OUTER JOIN userrate 
                    ON userrate.userid = users.userid 
                       AND userrate.siteid = 'C010' 
                       AND userrate.isdefault = 1 
       LEFT OUTER JOIN xtable 
                    ON userrate.ratetypeid = xtable.xtableid 
WHERE  xl.linkclass = 'MyUsers' 
       AND xl.parententitytypeid = '400' 
       AND xl.parententityid = '0000000001X000' 
       AND users.userid = xl.entityid2 
       AND nc.namecardid = users.namecardid 
       AND xl.siteid = 'C010' 
ORDER  BY users.loginid 
loginid is not a key or indexed in users. I tried ordering it by a couple of different keys that are indexed in users instead and they all performed the same as the current query so I don't think that making an index for loginid will have an impact.

I'm pretty new at this so any advice is appreciated.


Edit: GoogleDrive seems to think the execution plan is an audio file but if you download it and open it, it behaves correctly. Here's a jpg of the same thing.

LLSix fucked around with this message at 15:21 on Oct 11, 2018


Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



An index covering all the fields of XLink you're using might help a bit by eliminating that key lookup.
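Something along these lines, maybe (SQL Server syntax assumed from the plan terminology; the index name and the split between key columns and INCLUDE columns are guesses based on the query above):

SQL code:
-- equality-filtered columns as the key, selected columns carried in INCLUDE so the
-- index covers the query and the key lookup disappears
CREATE NONCLUSTERED INDEX IX_xlink_myusers_covering
    ON xlink (siteid, linkclass, parententitytypeid, parententityid)
    INCLUDE (entityid2, xlinkid, relationshipid1);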
