nescience
Jan 24, 2011

h'okay
php:
<?
14  mysql_query("INSERT INTO main 
15  (player_name, player_class, level) VALUES
16  ($_POST["ign"], $_POST["p_class"], $_POST["level"]) ")
17  or die(mysql_error());
?>
I'm getting the error:
Parse error: syntax error, unexpected '"', expecting T_STRING or T_VARIABLE or T_NUM_STRING in /directory/script.php on line 16

Did I write something wrong?


butt dickus
Jul 7, 2007

top ten juiced up coaches
and the top ten juiced up players

Userf4058 posted:

php:
<?
14  mysql_query("INSERT INTO main 
15  (player_name, player_class, level) VALUES
16  ($_POST["ign"], $_POST["p_class"], $_POST["level"]) ")
17  or die(mysql_error());
?>
I'm getting the error:
Parse error: syntax error, unexpected '"', expecting T_STRING or T_VARIABLE or T_NUM_STRING in /directory/script.php on line 16

Did I write something wrong?

1) You've got double quotes inside your double-quoted string. Change them to single quotes and put curly braces around your variables.
2) You're setting yourself up for SQL injection. You should be using PDO instead.
3) Using die() and showing the raw database error to your users is not very elegant.

e: with PDO you would do this
php:
<?
$statement = $mysql->prepare("INSERT INTO main (player_name, player_class, level) VALUES (:name, :class, :level)");
$statement->execute(array(":name" => $_POST['ign'], ":class" => $_POST['p_class'], ":level" => $_POST['level']));
?>

butt dickus fucked around with this message at 16:35 on Jul 14, 2011

nescience
Jan 24, 2011

h'okay

Doctor rear end in a top hat posted:

code
wow this is great. Thank you so much. Yeah, I'm a newbie.
Btw, how is using PDO different from something like this?

butt dickus
Jul 7, 2007

top ten juiced up coaches
and the top ten juiced up players

Userf4058 posted:

wow this is great. Thank you so much. Yeah, I'm a newbie.
Btw, how is using PDO different from something like this?

That's more of a question for the PHP thread, but PDO does several things for you: it works with other databases like MSSQL, and it supports prepared statements, transactions, and piles of other stuff.
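As a rough illustration of what a prepared statement buys you at the SQL level, here is MySQL's server-side PREPARE/EXECUTE, roughly what PDO does under the hood when emulation is off (the values are made up; the table is from the example above):

```sql
-- The query shape is parsed once; the values are bound separately
-- and never interpolated into the SQL string, so no injection.
PREPARE ins FROM 'INSERT INTO main (player_name, player_class, level) VALUES (?, ?, ?)';
SET @name = 'goonlord', @class = 'wizard', @lvl = 12;   -- hypothetical values
EXECUTE ins USING @name, @class, @lvl;
DEALLOCATE PREPARE ins;
```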

isnoop
Jan 9, 2001

I used to be an admin,
but then I took an arrow
to the knee.

revmoo posted:

I've just started working with mysql workbench, and if I'm logged into a sql server and I've selected my schema and I type:

DROP DATABASE 'nameofdb';

and hit run, it says there is an error in my syntax. Now I'm sure that syntax is correct. What is the problem here?

You're using single quotes instead of backticks. Try this:
DROP DATABASE `nameofdb`;

isnoop
Jan 9, 2001

I used to be an admin,
but then I took an arrow
to the knee.

Thel posted:

What's with fare2? What is it doing?

Also, when you're doing a composite query like this, it's probably easier to build it up in stages and test each stage. Just to make sure: you have a given origin (i.e. JFK), and want one result for each single outbound route, with the cheapest price in your date range?

Fare2 is a self-join used to make sure each returned row contains only the most recent fare for a given route. Joining fare2 to fare on the primary key being greater in fare2 means that I have the latest record in fare whenever all the fare2 values come back NULL.

Your restatement is precisely what I'm hoping to achieve.
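For anyone following along, the self-join pattern described above looks roughly like this (column names guessed from the description; assume fare has a surrogate primary key id):

```sql
-- A fare row survives only when no newer row exists for the same
-- flight, i.e. the left join to fare2 finds nothing (fare2.id IS NULL).
SELECT fare.*
FROM   fare
LEFT JOIN fare AS fare2
       ON  fare2.flightfk = fare.flightfk
       AND fare2.id       > fare.id
WHERE  fare2.id IS NULL;
```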

Goat Bastard posted:

If your database supports analytic functions then you probably want something like

code:
WITH   sorted_fare AS
      (SELECT flightfk
             ,price
             ,row_number() OVER (PARTITION BY flightfk ORDER BY price) AS sort_order
       FROM   fare fare
      )
SELECT *
FROM   route r
JOIN   flight flt
       ON flt.routefk = r.id
JOIN   sorted_fare fare
       ON  flt.id = fare.flightfk
       AND fare.sort_order = 1
WHERE  r.status = 'active'
AND    flt.takeoffdate > '2011-07-13'
AND    flt.takeoffdate <= '2011-07-25'
AND    r.fromiata = 'JFK'

MySQL does indeed support that functionality as a basic nested query. I reformed my SQL that way and now it works. Granted, it is slower than before, but it also appears to be returning the correct data for a change.

Goat Bastard
Oct 20, 2004

isnoop posted:

MySQL does indeed support that functionality as a basic nested query. I reformed my SQL that way and now it works. Granted, it is slower than before, but it also appears to be returning the correct data for a change.

It's hard to know without seeing the real query, but I suspect that the problem with your original one was the GROUP BY. I'm not a MySQL expert, but I think if your GROUP BY clause doesn't cover all the non-aggregate columns you're trying to use, then it just picks an arbitrary row that satisfies it. The databases I'm more used to wouldn't even compile that; you'd get an exception saying something like "Column not part of Group By".
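A minimal sketch of the hazard, borrowing the fare table from the earlier example: MySQL will accept this, but the bare price column isn't covered by the GROUP BY, so the row it comes from is effectively arbitrary.

```sql
-- price is a non-aggregate column missing from the GROUP BY; MySQL
-- (without ONLY_FULL_GROUP_BY) picks it from an arbitrary row, which
-- need not be the row holding MIN(price). Stricter databases reject this.
SELECT flightfk, price, MIN(price)
FROM   fare
GROUP BY flightfk;
```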

Goat Bastard fucked around with this message at 21:58 on Jul 14, 2011

MrDoDo
Jun 27, 2004

You better remember quick before we haul your sweet ass down to the precinct.
I am still a SQL newbie, so I'm looking for some help understanding how to do an update using STUFF in T-SQL. I have a query that logically does what I want, but it's kind of a mess. It seems that the STUFF function doesn't like taking multiple results from a subquery like this. Is there a better way I could return the results and then enumerate over them iteratively? I would greatly appreciate any input or criticism.

code:

UPDATE bib 
SET    TEXT = REPLACE(TEXT, (SELECT TEXT 
                             FROM   bib           -- Table that stores individual bibliographic records
                             WHERE  bib# IN (SELECT bib# 
                                             FROM   item          -- Table of individual item records, using bib# as a foreign key to bib
                                             WHERE  itype = 'a')  -- Grab all instances of a particular item type (book, DVD, etc.),
                                                                  -- then use STUFF to update the 7th character of the "text" string in bib
                                    AND tag = '000'), (SELECT 
                            Stuff((SELECT TEXT 
                                   FROM   bib 
                                   WHERE  bib# IN (SELECT bib# 
                                                   FROM   item 
                                                   WHERE  itype = 'a') 
                                          AND tag = '000'), 7, 1, 'a'))) 
WHERE  bib# IN (SELECT bib# 
                FROM   item 
                WHERE  itype = 'a') 
       AND tag = '000' 

EDIT: I tried to add a little more info about what I'm trying to accomplish and how the tables are structured.

MrDoDo fucked around with this message at 23:27 on Jul 14, 2011

Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!
Reading what you're trying to do, you have a few too many steps doing basically the same thing.

Is the STUFF command doing what it should? Because then you could be doing something much simpler.

Also, from the looks of it, the REPLACE is completely unnecessary, as it's always going to be the same as the field you're getting in the subquery.

As far as I can tell, this will give you the exact same results.

code:
UPDATE bib
SET    bib.TEXT = Stuff(bib.TEXT, 7, 1, 'a')
FROM   bib
       JOIN item ON bib.bib# = item.bib#
WHERE  item.itype = 'a'
       AND bib.tag = '000'

revmoo
May 25, 2006

#basta
Oh, I had no idea SQL used backticks. Learn something new every day.

McGlockenshire
Dec 16, 2005

GOLLOCKS!
No, SQL uses double quotes. MySQL uses backticks, but also accepts double quotes. Oh, and MSSQL uses loving square brackets, I am not making that up.

If you want to think cross-platform, think double quotes.
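For reference, the same identifier quoted per platform (the names here are just placeholders):

```sql
SELECT "order" FROM "my table";   -- ANSI SQL standard (and MySQL with ANSI_QUOTES)
SELECT `order` FROM `my table`;   -- MySQL default
SELECT [order] FROM [my table];   -- MSSQL, square brackets and all
```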

MrDoDo
Jun 27, 2004

You better remember quick before we haul your sweet ass down to the precinct.

Sprawl posted:

Reading what you're trying to do, you have a few too many steps doing basically the same thing.

Is the STUFF command doing what it should? Because then you could be doing something much simpler.

Also, from the looks of it, the REPLACE is completely unnecessary, as it's always going to be the same as the field you're getting in the subquery.

As far as I can tell, this will give you the exact same results.

code:
UPDATE bib
SET    bib.TEXT = Stuff(bib.TEXT, 7, 1, 'a')
FROM   bib
       JOIN item ON bib.bib# = item.bib#
WHERE  item.itype = 'a'
       AND bib.tag = '000'


Wow, that's awesome, thank you so much for clarifying. Unfortunately, I think my mind got a little too clouded about what exactly the conditions were, and I just threw them everywhere, making this way too messy. This gives me a great new understanding. Thanks again!

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

McGlockenshire posted:

No, SQL uses double quotes. MySQL uses backticks, but also accepts double quotes.

Just a quick note: you can make MySQL accept double quotes, but the sql_mode has to be set appropriately; otherwise it treats double quotes as plain string quotes.

420 set session sql_mode = 'ANSI_QUOTES,TRADITIONAL' every day
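Spelled out, for anyone who wants to try it (session-scoped, so it only affects the current connection):

```sql
SET SESSION sql_mode = 'ANSI_QUOTES,TRADITIONAL';
-- From here on, "player_name" is a quoted identifier, not a string literal.
SELECT "player_name" FROM main;
```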

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

Any tips on my.cnf settings for InnoDB?

Going from this site:

http://www.mysqlperformanceblog.com/2006/05/30/innodb-memory-usage/

I've decided to change the following settings on our server.
code:
innodb_buffer_pool_size=8096M
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=6
innodb_flush_log_at_trx_commit=2
innodb_log_buffer_size=4M
cat /proc/meminfo on the server shows:
code:
MemTotal: 12299960 kB
MemFree:   2467744 kB

Our ibdata1 file is 16G

The server has 3 vCPUs (Xeon L5520 @ 2.27GHz). The Ruby/Apache server has 5, and also has 12GB. Wouldn't it make more sense to give the DB server more RAM and maybe even out the CPUs? Not sure if it's a single quad with HT enabled, or a dual quad with HT disabled. Maybe we should enable HT and give each server 8 vCPUs?

ufarn
May 30, 2009
Cross-posting my problem(s) from the Django thread:

(I am going through the Models chapter of the Django Book, and I am trying to set up a database. I assume that I have to create any database I want through a separate MySQL console, and that I can't do this through any Python/Django command.)

For some reason both WAMP and XAMPP won't work on my Windows 7 installation; the first one throws a CLI error, and the latter won't give me access to the "admin" settings for my MySQL server.

With both, I get this in the MySQL console:

code:
mysql> CREATE DATABASE goondb;
ERROR 1044 (42000): Access denied for user ''@'localhost' to database 'goondb'
mysql>

Note that neither prompts me for a password at any time.

I have played around with WAMP on my computer before (a couple of months ago), and it's possible that something else has interfered with it meanwhile, but I have no idea how the hell to troubleshoot it. There seem to be some really high-brow Windows fixes for some problems like messing around with DLLs and the registry and whatnot, but they seem way too dangerous for my taste.

I have tried uninstalling and reinstalling the programs and deleting the folders in C:\WAMP and C:\XAMPP, but to no avail.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe
Why are you using WAMP for Django development? Just install the MySQL binaries and then use the MySQL Workbench tool to administer it.

ufarn
May 30, 2009

MEAT TREAT posted:

Why are you using WAMP for Django development? Just install the MySQL binaries and then use the MySQL Workbench tool to administer it.
How so? I am just getting into Django and general deployment.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe

ufarn posted:

How so? I am just getting into Django and general deployment.

Well, it includes PHP, which isn't what you use to program Django, as you know, so that's just extra crap. Apache is a full web server, which you really shouldn't mess with until you want to deploy your app in production. Django has a built-in web server, which you should use, because configuring Apache to work with Python is not that interesting and is full of pitfalls like the ones you're having.
Install MySQL from the binaries on their site and continue your examples.

ufarn
May 30, 2009

MEAT TREAT posted:

Well, it includes PHP, which isn't what you use to program Django, as you know, so that's just extra crap. Apache is a full web server, which you really shouldn't mess with until you want to deploy your app in production. Django has a built-in web server, which you should use, because configuring Apache to work with Python is not that interesting and is full of pitfalls like the ones you're having.
Install MySQL from the binaries on their site and continue your examples.
I've downloaded the Workbench and the Python MySQL driver, but I can't find a link for the binaries, although I'm sure I stumbled upon them before. Got a link?

EDIT: The server tool also fails to set up a MySQL server for the same reason:

quote:

Connecting to MySQL server localhost...

Can't connect to MySQL server on '127.0.0.1' (10061)

ufarn fucked around with this message at 10:19 on Jul 17, 2011

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe
You don't have the server installed. That's the most important part. When you install it make sure you install it as a windows service, otherwise you need to start mysql manually every time you reboot.

http://dev.mysql.com/downloads/mysql/

ufarn
May 30, 2009

MEAT TREAT posted:

You don't have the server installed. That's the most important part. When you install it make sure you install it as a windows service, otherwise you need to start mysql manually every time you reboot.

http://dev.mysql.com/downloads/mysql/
It works now, thanks. Don't know why the mysql PATH command gives me

code:
C:\Users\niclas\Dropbox\Programming Projects\Django Projects\mysite>mysql
ERROR 1045 (28000): Access denied for user 'ODBC'@'localhost' (using password: NO)

But I am able to CREATE DATABASE booksdb when I load the dedicated client prompt from the Start menu (which I can use to set up Django).

I should probably find out where the database is stored on my computer, too ...

angrytech
Jun 26, 2009
How do the OS numbers break down for *SQL development? Is there a general trend towards Windows? Obviously anyone doing MSSQL probably isn't going to be using Linux.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

This is off the cuff, but I'd guess that Windows is relatively dominant on the small-to-medium end and then nearly disappears on the big-iron/enterprise side. I'm guessing shops running Apache aren't using MSSQL, shops running IIS aren't using MySQL, and so on.

Can't really find much concrete on database server numbers, but it's probably an extrapolation of this, as I don't see too many shops mixing flavours:
http://greatstatistics.com/
You might be able to get more by clicking around in this site:
http://trends.builtwith.com/Web%20Server
But I was only able to find web servers, not database servers (though they have a separate application frameworks section as well).

benisntfunny
Dec 2, 2004
I'm Perfect.

angrytech posted:

How do the OS numbers break down for *SQL development? Is there a general trend towards windows? Obviously anyone doing MSSQL probably isn't going to be using linux.

Without getting too specific... I work for a large company. We use a handful of DB2 databases; couldn't tell you what they're on. There are probably only two or three big ones running DB2, thousands of SQL Server databases running on Windows Server, and thousands of Oracle DBs on Solaris.

For development I use SQL Server Management Studio, Toad, DB2 Command Editor and even for the sake of just having another editor, SQL Developer (actually because it's free and the company direction is to stop paying for Toad).

Anyway, we are generally moving more towards Oracle. Stability-wise, the Windows Server machines have been hit or miss, whereas the Oracle boxes don't really ever have many problems.

kimbo305
Jun 9, 2007

actually, yeah, I am a little mad
A while back I asked about a very related issue, and I'm back begging for more help.

I've got a very large source table that I want to filter into another table, by way of joining on another table, F.

F looks like this:
code:
folder subfolder key acct

acct is a bit misleading, as most of the time it's NULL, and only non-NULL in special cases. A key can be present in multiple folders and subfolders, which are both ints. Subfolder values can be shared across folders.

The target table A looks like this:
code:
key timestamp day metric
The source table S looks like this:
code:
key timestamp day metric dmetric

dmetric is the change in the metric (an int) since the last update, which is usually but not always the previous day.

Same as last time, we had originally designed everything around there being only a single row per key per day. But in fact, our upstream systems are sometimes spitting out multiple records a day. Something like this:
code:
key1   2011-07-21 8:01am   2011-07-21   100  20
key1   2011-07-21 2:29pm   2011-07-21   140  40
This was the old filtering query:
code:
insert into A
select distinct F.key, day, metric, dmetric
from S join F on S.key = F.key
Dead simple. The distinct was kinda cheesy, but it ran pretty fast so I had no reason to mess with it.

As it turns out, things get really screwy downstream because of the assumption of only 1 row per day per key.

I've been trying to figure out how to modify the query to basically turn the two example rows into this:

code:
key1   <some ts, don't care>   2011-07-21   140  60

So basically: the latest metric value, and the cumulative delta for all the redundant rows put together.

This is what I had at first:
code:
insert into A
  select key, day, metric, dmetric
  from (select F.key, day, RANK() OVER (PARTITION BY F.key, day ORDER BY timestamp DESC) as rank,
               FIRST_KEY(metric) OVER (PARTITION BY F.key, day),
               SUM(dmetric) OVER (PARTITION BY F.key, day)
        from S join F on S.key = F.key) as Z
  where rank=1;
The mistake I made was two-fold:
- joining just on key means multiple rows of S because of multiple rows with the same key in different folders and subfolders
- SUM oversumming in those cases

So this is the new version:
code:
insert into A
  select key, day, metric, dmetric
  from (select F.key, day, RANK() OVER (PARTITION BY F.key, day ORDER BY timestamp DESC, folder, subfolder) as rank,
               FIRST_KEY(metric) OVER (PARTITION BY F.key, day ORDER BY timestamp DESC),
               SUM(dmetric) OVER (PARTITION BY F.key, day ORDER BY timestamp DESC, folder, subfolder)
        from S join F on S.key = F.key) as Z
  where rank=1;

I think this is correct; I'm still testing the results. What gets me is that this seems quite contrived. Maybe it's just because of the bad data situation we're trying to fix. It's also 3 times slower than the original query. Does anyone have recommendations for other ways to accomplish the same thing?


e: one thing that's sort of inconsistent-looking is the FIRST_KEY. I don't specify the same long-winded ORDER BY for it because all of the dupe rows should have the same timestamp, which is the only thing I care about there. But that doesn't take away from my main goal, which is an overall cleaner way to solve the problem.

Oh, I tried this:
code:
select F.key, day,
       FIRST_KEY(metric) OVER (PARTITION BY F.key, day ORDER BY timestamp DESC),
       SUM(dmetric) OVER (PARTITION BY F.key, day ORDER BY timestamp DESC)
from (select distinct key from S) as S` join F on S`.key = F.key) as Z
But this seems much slower, just from letting it run and from looking at the query plan.

kimbo305 fucked around with this message at 23:31 on Jul 21, 2011

Aredna
Mar 17, 2007
Nap Ghost
I think this gives you what you want:

code:
insert into A
select S.key, S.day, max(S.metric), sum(S.dmetric)
from S join F on S.key = F.key
group by S.key, S.day

benisntfunny
Dec 2, 2004
I'm Perfect.
Anyone know of a way to select something like ROWNUM in DB2? I see a way to order by it, but I want to select it. I found a really old lovely database where the assholes have no PK and a lot of repeating records I want to delete.

kimbo305
Jun 9, 2007

actually, yeah, I am a little mad

Aredna posted:

I think this gives you what you want:

code:
insert into A
select S.key, S.day, max(S.metric), sum(S.dmetric)
from S join F on S.key = F.key
group by S.key, S.day


Hmmmm, that's really elegant, but the metric can increase or decrease. Is there a way to patch that in?

e: I suppose I could interpolate and use AVG for both... not quite the same behavior but probably good enough. I'll perf test that.

kimbo305 fucked around with this message at 05:18 on Jul 22, 2011

Aredna
Mar 17, 2007
Nap Ghost
I almost made a comment about assuming it was strictly increasing for that to work, too. Since it can go either way, you'll have to use RANK (or ROW_NUMBER) to get the last record for the day based on the timestamp, which is almost always going to run slower.

This should work too and should be faster if few keys are filtered out by the join to F.
code:
insert into A
select z.key, z.day, z.metric, z.dmetric
from (select
         S.key,
         S.day,
         S.metric,
         RANK() OVER (PARTITION BY S.key, S.day ORDER BY S.timestamp DESC) as rank,
         SUM(S.dmetric) OVER (PARTITION BY key, day) as dmetric
      from S ) as z
   inner join (select distinct key from F) as y on z.key = y.key
where z.rank = 1

If this also runs too slow, you may try posting the indexes on S and F. It would also be useful to know roughly what % of keys in S are filtered out by the join to F in your original query.

kimbo305
Jun 9, 2007

actually, yeah, I am a little mad
S is huge and F is tiny, as is the filtered result set, so that last query is not gonna be ideal. It seems similar to the 2nd variant I posted, which is very slow.

Let's see, S is 25 billion rows and 280 million distinct keys, F is 1.4 mil rows and 1.1 mil distinct keys.
The resulting filtered table is e: 360 mil rows.

We're using Vertica, so there's no indexes. F is an unsegmented table because it's so small, and S is a single super-projection that segments on key and orders by key, day.

kimbo305 fucked around with this message at 05:46 on Jul 22, 2011

Goat Bastard
Oct 20, 2004

benisntfunny posted:

Anyone know of a way to select something like ROWNUM in DB2? I see a way to order by it, but I want to select it. I found a really old lovely database where the assholes have no PK and a lot of repeating records I want to delete.

row_number() OVER (ORDER BY whatever)

Aredna
Mar 17, 2007
Nap Ghost

kimbo305 posted:

S is huge and F is tiny, as is the filtered result set, so that last query is not gonna be ideal. It seems similar to the 2nd variant I posted, which is very slow.

Let's see, S is 25 billion rows and 280 million distinct keys, F is 1.4 mil rows and 1.1 mil distinct keys.
The resulting filtered table is e: 360 mil rows.

We're using Vertica, so there's no indexes. F is an unsegmented table because it's so small, and S is a single super-projection that segments on key and orders by key, day.

That's a bit larger than I was expecting. I'm not familiar with Vertica, but the segmented setup sounds similar to Greenplum's distributed architecture, which I've used before.

With Greenplum I've found performance benefits in similar situations by breaking it up into two steps, where the first step finds the distinct keys in F and places them into a table segmented in the same manner as S.

Goat Bastard posted:

benisntfunny posted:

Anyone know of a way to select something like ROWNUM in DB2? I see a way to order by it, but I want to select it. I found a really old lovely database where the assholes have no PK and a lot of repeating records I want to delete.
row_number() OVER (ORDER BY whatever)

This may not work for deleting the duplicated records, but if all repeated records are 100% identical you can select all of the distinct ones into a new table, drop the old table, and then rename the new one. If you are worried about views or anything else breaking, a safer solution would be to delete all of the records out of the old table and insert everything from the new table back into the old one.

Both solutions are just taking a sledgehammer to the problem so if you need something more elegant you can modify the second suggestion to only select/delete/insert repeated records.
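The second, safer variant sketched in T-SQL (the table name is hypothetical):

```sql
-- Keep one copy of each fully-duplicated row, without dropping the
-- original table (so views and permissions on it survive).
SELECT DISTINCT * INTO #dedup FROM messy_table;

DELETE FROM messy_table;

INSERT INTO messy_table
SELECT * FROM #dedup;

DROP TABLE #dedup;
```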

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

kimbo305 posted:


So this is the new version:
code:
insert into A
  select key, day, metric, dmetric
  from (select F.key, day, RANK() OVER (PARTITION BY F.key, day ORDER BY timestamp DESC, folder, subfolder) as rank,
               FIRST_KEY(metric) OVER (PARTITION BY F.key, day ORDER BY timestamp DESC),
               SUM(dmetric) OVER (PARTITION BY F.key, day ORDER BY timestamp DESC, folder, subfolder)
        from S join F on S.key = F.key) as Z
  where rank=1;
I think this is correct. Am still testing the results. What gets me is that this seems quite contrived. Maybe it's just because of the bad data situation we're trying to fix. It's also 3 times slower than the original query. Does anyone have recommendations for other ways to accomplish the same thing?

Unless you mistyped your sum(dmetric), you're still "over-summing", since you didn't include the folder and subfolder. Also, most normal aggregates don't take an ORDER BY when you use them as analytic functions.

I would do something like this:
code:
insert into A
  select key, day, metric, dmetric
  from (select key, 
               day,
               metric,
               SUM(dmetric) OVER (PARTITION BY key, day) dmetric, 
               RANK() OVER (PARTITION BY key, day ORDER BY timestamp DESC) as rank
        from S
       where key in (select key from F)) as Z
  where rank=1;
You don't need the FIRST_KEY because the whole point of the RANK() function is to pick out the rows that have the "first key".

Alternatively:
code:
insert into A
  select key, day, metric, sum(dmetric) dmetric
  from (select key, 
               day,
               first_key(metric) over (PARTITION BY key, day ORDER BY timestamp DESC) metric,
               dmetric
        from S
       where key in (select key from F)) as Z
  group by key, day, metric;

In this case, I use the first_key to pull out which metric I want to keep, and then I just do a simple aggregation. In fact, this "feels" the cleanest to me; I just wouldn't do it myself, since I pretty much exclusively use MSSQL, which doesn't have anything like FIRST_KEY.

Both these queries assume that Vertica has in or exists or the equivalent (and if it has both, try using exists if in performs poorly). If it has neither, then I do think joining to (select distinct key from F) instead of just F is the way to go. If that really performs horribly, then Aredna is almost certainly on the right track in terms of creating a separate table that has just the distinct keys from F.

Also, part of the problem is that Vertica isn't a normal OLTP-style SQL database, so even though it uses SQL syntax, it behaves differently than what I think most people reading this thread are used to.

Jethro fucked around with this message at 17:40 on Jul 22, 2011

Rescue Toaster
Mar 13, 2003
If I want to select a set of specific records from a table (where you would normally use an IN clause), but the table has a composite primary key, is there any better way to do it than a mess like this:

code:
SELECT *
FROM mydb
WHERE (((PKey1 = x1) AND (PKey2 = y1)) OR 
       ((PKey1 = x2) AND (PKey2 = y2)) OR
       ((PKey1 = x3) AND (PKey2 = y3)) OR
       ((PKey1 = x4) AND (PKey2 = y4)))

This is on SQL Server 2000. So far I think my only 'cleaner' option is to create a temporary table and do a join against it, but I'm not sure that's actually faster. It may look slightly better in code simply because of less string building.


vvv EDIT: At first they all said 'y1', sorry! It's a matched x,y pair for each primary key 1 & primary key 2.

Rescue Toaster fucked around with this message at 19:51 on Jul 22, 2011

epswing
Nov 4, 2003

Soiled Meat

Rescue Toaster posted:

code:
SELECT *
FROM mydb
WHERE (((PKey1 = x1) AND (PKey2 = y1)) OR 
       ((PKey1 = x2) AND (PKey2 = y1)) OR
       ((PKey1 = x3) AND (PKey2 = y1)) OR
       ((PKey1 = x4) AND (PKey2 = y1)))

This?

code:
SELECT *
FROM mydb
WHERE PKey1 IN (x1, x2, x3, x4) AND PKey2 = y1
Or am I not understanding your query?

Rescue Toaster
Mar 13, 2003
Sorry, I had it written wrong. There's a matched pair of primary key 1 and primary key 2. It would look more like (not valid SQL obviously)

WHERE (Pkey1, Pkey2) IN ((x1,y1),(x2,y2),(x3,y3),(x4,y4))

Both values must be specific to return that row.

epswing
Nov 4, 2003

Soiled Meat

Rescue Toaster posted:

No, There's a matched pair of primary key 1 and primary key 2. It would look more like (not valid SQL obviously)

WHERE (Pkey1, Pkey2) IN ((x1,y1),(x2,y2),(x3,y3),(x4,y4))

Both values must be specific to return that row.

Oh. Your original post only contained y1.

E: I see...

butt dickus
Jul 7, 2007

top ten juiced up coaches
and the top ten juiced up players

Rescue Toaster posted:

Sorry, I had it written wrong. There's a matched pair of primary key 1 and primary key 2. It would look more like (not valid SQL obviously)

WHERE (Pkey1, Pkey2) IN ((x1,y1),(x2,y2),(x3,y3),(x4,y4))

Both values must be specific to return that row.

WHERE CONCAT(Pkey1, Pkey2) IN ('hurf','durf',etc...)

I think that will work. You might run into an issue where a row with Pkey1 = 'hu' and Pkey2 = 'rf' matches the same as a row with Pkey1 = 'h' and Pkey2 = 'urf'. You could cut down on this by concatenating a separator between them, like a pipe.

Rescue Toaster
Mar 13, 2003
I'll have to check, but I'm virtually certain that in the table of interest they're ints. I think I'll just say screw it and build the multiple AND statements; it won't be too hard, just a little silly.
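For the record, the temp-table variant mentioned earlier would look something like this on SQL Server 2000 (no row constructors there, so the pairs are inserted one by one; table and column names are from the posts above, the values are hypothetical):

```sql
-- Build the key-pair list once, then join on both halves of the key.
CREATE TABLE #keys (PKey1 int NOT NULL, PKey2 int NOT NULL)
INSERT INTO #keys VALUES (1, 10)
INSERT INTO #keys VALUES (2, 20)
INSERT INTO #keys VALUES (3, 30)

SELECT m.*
FROM   mydb m
       JOIN #keys k ON m.PKey1 = k.PKey1 AND m.PKey2 = k.PKey2

DROP TABLE #keys
```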


kimbo305
Jun 9, 2007

actually, yeah, I am a little mad

Jethro posted:

Unless you mis-typed your sum(dmetric), you're still "over summing" since you didn't include the folder and subfolder. Also, most normal aggregates don't take an order by when you use them as analytic functions.
I indeed mistyped it when I generalized it :3:

The correct version should be:
code:
insert into A
  select key, day, metric, dmetric
  from (select F.key, day, RANK() OVER (PARTITION BY F.key, day ORDER BY timestamp DESC, folder, subfolder) as rank,
               metric,
               SUM(dmetric) OVER (PARTITION BY F.key, day, folder, subfolder)
        from S join F on S.key = F.key) as Z
  where rank=1;
I wanted to partition things down to folder and subfolder, as you said.

quote:

You don't need the FIRST_KEY because the whole point of the RANK() function is to pick out the rows that have the "first key".

You're right. I tried it both ways and the FIRST_KEY way seems a bit faster. e: actually, measurably faster. The query planner estimates the non-FIRST_KEY way being slower at the analytic phase.

quote:

code:
insert into A
  select key, day, metric, dmetric
  from (select key, 
               day,
               metric,
               SUM(dmetric) OVER (PARTITION BY key, day) dmetric, 
               RANK() OVER (PARTITION BY key, day ORDER BY timestamp DESC) as rank
        from S
       where key in (select key from F)) as Z
  where rank=1;
code:
insert into A
  select key, day, metric, sum(dmetric) dmetric
  from (select key, 
               day,
               first_key(metric) over (PARTITION BY key, day ORDER BY timestamp DESC) metric,
               dmetric
        from S
       where key in (select key from F)) as Z
  group by key, day, metric;
In this case, I use the first_key to pull out which metric I want to keep, and then I just do a simple aggregation. In fact, this "feels" the cleanest to me, I just wouldn't do it myself since I pretty exclusively use MSSQL which doesn't have anything like a "FIRST_KEY"

I'll try these to see how Vertica's query planner estimates them. The temp table is probably still going to be faster, but I'm lazy and would rather not have to set that up.

kimbo305 fucked around with this message at 22:08 on Jul 22, 2011
