nescience
Jan 24, 2011

h'okay
php:
<?
14  mysql_query("INSERT INTO main 
15  (player_name, player_class, level) VALUES
16  ($_POST["ign"], $_POST["p_class"], $_POST["level"]) ")
17  or die(mysql_error());
?>
I'm getting the error:
Parse error: syntax error, unexpected '"', expecting T_STRING or T_VARIABLE or T_NUM_STRING in /directory/script.php on line 16

Did I write something wrong?


butt dickus
Jul 7, 2007

top ten juiced up coaches
and the top ten juiced up players

Userf4058 posted:

php:
<?
14  mysql_query("INSERT INTO main 
15  (player_name, player_class, level) VALUES
16  ($_POST["ign"], $_POST["p_class"], $_POST["level"]) ")
17  or die(mysql_error());
?>
I'm getting the error:
Parse error: syntax error, unexpected '"', expecting T_STRING or T_VARIABLE or T_NUM_STRING in /directory/script.php on line 16

Did I write something wrong?

1) You've got double quotes inside your double-quoted string. Change them to single quotes and put curly braces around your variables.
2) You're setting yourself up for SQL injection. You should be using PDO instead.
3) Using die() and showing the raw database error to your users is not very elegant.

e: with PDO you would do this
php:
<?
$statement = $mysql->prepare("INSERT INTO main (player_name, player_class, level) VALUES (:name, :class, :level)");
$statement->execute(array(":name" => $_POST['ign'], ":class" => $_POST['p_class'], ":level" => $_POST['level']));
?>

butt dickus fucked around with this message at 16:35 on Jul 14, 2011

nescience
Jan 24, 2011

h'okay

Doctor rear end in a top hat posted:

code
wow this is great. Thank you so much. Yeah, I'm a newbie.
Btw, how is using PDO different from something like this?

butt dickus
Jul 7, 2007

top ten juiced up coaches
and the top ten juiced up players

Userf4058 posted:

wow this is great. Thank you so much. Yeah, I'm a newbie.
Btw, how is using PDO different from something like this?

That's more of a question for the PHP thread, but PDO does several things for you: it works with other databases like MSSQL, and it supports prepared statements, transactions, and piles of other stuff.
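As a rough illustration of what a prepared statement buys you at the SQL level, here is MySQL's server-side PREPARE/EXECUTE, roughly what PDO does under the hood when emulation is off (the values are made up; the table is from the example above):

```sql
-- The query shape is parsed once; the values are bound separately
-- and never interpolated into the SQL string, so no injection.
PREPARE ins FROM 'INSERT INTO main (player_name, player_class, level) VALUES (?, ?, ?)';
SET @name = 'goonlord', @class = 'wizard', @lvl = 12;   -- hypothetical values
EXECUTE ins USING @name, @class, @lvl;
DEALLOCATE PREPARE ins;
```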

isnoop
Jan 9, 2001

I used to be an admin,
but then I took an arrow
to the knee.

revmoo posted:

I've just started working with mysql workbench, and if I'm logged into a sql server and I've selected my schema and I type:

DROP DATABASE 'nameofdb';

and hit run, it says there is an error in my syntax. Now I'm sure that syntax is correct. What is the problem here?

You're using single quotes instead of backticks. Try this:
DROP DATABASE `nameofdb`;

isnoop
Jan 9, 2001

I used to be an admin,
but then I took an arrow
to the knee.

Thel posted:

What's with fare2? What is it doing?

Also, when you're doing a composite query like this, it's probably easier to build it up in stages and test each stage. Just to make sure: you have a given origin (i.e. JFK), and want one result for each single outbound route, with the cheapest price in your date range?

Fare2 is a self-join used to make sure each returned row contains only the most recent fare for a given route. Joining fare2 to fare on the primary key being greater in fare2 means that I have the latest record in fare whenever all the fare2 values come back NULL.

Your restatement is precisely what I'm hoping to achieve.
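For anyone following along, the self-join pattern described above looks roughly like this (column names guessed from the description; assume fare has a surrogate primary key id):

```sql
-- A fare row survives only when no newer row exists for the same
-- flight, i.e. the left join to fare2 finds nothing (fare2.id IS NULL).
SELECT fare.*
FROM   fare
LEFT JOIN fare AS fare2
       ON  fare2.flightfk = fare.flightfk
       AND fare2.id       > fare.id
WHERE  fare2.id IS NULL;
```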

Goat Bastard posted:

If your database supports analytic functions then you probably want something like

code:
WITH   sorted_fare AS
      (SELECT flightfk
             ,price
             ,row_number() OVER (PARTITION BY flightfk ORDER BY price) AS sort_order
       FROM   fare fare
      )
SELECT *
FROM   route r
JOIN   flight flt
       ON flt.routefk = r.id
JOIN   sorted_fare fare
       ON  flt.id = fare.flightfk
       AND fare.sort_order = 1
WHERE  r.status = 'active'
AND    flt.takeoffdate > '2011-07-13'
AND    flt.takeoffdate <= '2011-07-25'
AND    r.fromiata = 'JFK'

MySQL does indeed support that functionality as a basic nested query. I reformed my SQL that way and now it works. Granted, it is slower than before, but it also appears to be returning the correct data for a change.

Goat Bastard
Oct 20, 2004

isnoop posted:

MySQL does indeed support that functionality as a basic nested query. I reformed my SQL that way and now it works. Granted, it is slower than before, but it also appears to be returning the correct data for a change.

It's hard to know without seeing the real query, but I suspect that the problem with your original one was the GROUP BY. I'm not a MySQL expert, but I think if your GROUP BY clause doesn't cover all the non-aggregate columns you're trying to use, then it just picks an arbitrary row that satisfies it. The databases I'm more used to wouldn't even compile that; you'd get an exception saying something like "Column not part of Group By".
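A minimal sketch of the hazard, borrowing the fare table from the earlier example: MySQL will accept this, but the bare price column isn't covered by the GROUP BY, so the row it comes from is effectively arbitrary.

```sql
-- price is a non-aggregate column missing from the GROUP BY; MySQL
-- (without ONLY_FULL_GROUP_BY) picks it from an arbitrary row, which
-- need not be the row holding MIN(price). Stricter databases reject this.
SELECT flightfk, price, MIN(price)
FROM   fare
GROUP BY flightfk;
```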

Goat Bastard fucked around with this message at 21:58 on Jul 14, 2011

MrDoDo
Jun 27, 2004

You better remember quick before we haul your sweet ass down to the precinct.
I am still a SQL newbie, so I'm looking for some help understanding how to do an update using STUFF in T-SQL. I have a query that logically does what I want, but it's kind of a mess. It seems that the STUFF function doesn't like taking multiple results from a subquery like this. Is there a better way I could return the results and then enumerate over them iteratively? I would greatly appreciate any input or criticism.

code:

UPDATE bib 
SET    TEXT = REPLACE(TEXT, (SELECT TEXT 
                             FROM   bib           -- Table that stores individual bibliographic records
                             WHERE  bib# IN (SELECT bib# 
                                             FROM   item          -- Table of individual item records, using bib# as a foreign key to bib
                                             WHERE  itype = 'a')  -- Grab all instances of a particular item type (book, DVD, etc.),
                                                                  -- then use STUFF to update the 7th character of the "text" string in bib
                                    AND tag = '000'), (SELECT 
                            Stuff((SELECT TEXT 
                                   FROM   bib 
                                   WHERE  bib# IN (SELECT bib# 
                                                   FROM   item 
                                                   WHERE  itype = 'a') 
                                          AND tag = '000'), 7, 1, 'a'))) 
WHERE  bib# IN (SELECT bib# 
                FROM   item 
                WHERE  itype = 'a') 
       AND tag = '000' 

EDIT: I tried to add a little more info about what I'm trying to accomplish and how the tables are structured.

MrDoDo fucked around with this message at 23:27 on Jul 14, 2011

Sprawl
Nov 21, 2005


I'm a huge retarded sperglord who can't spell, but Starfleet Dental would still take me and I love them for it!
Reading what you're trying to do, you have a few too many steps doing basically the same thing.

Is the STUFF command doing what it should? Because then you could be doing something much simpler.

Also, from the looks of it, the REPLACE is completely unnecessary, as it's always going to be the same as the field you're getting in the subquery.

As far as I can tell, this will give you the exact same results.

code:
UPDATE bib
SET    bib.TEXT = Stuff(bib.TEXT, 7, 1, 'a')
FROM   bib
       JOIN item ON bib.bib# = item.bib#
WHERE  item.itype = 'a'
       AND bib.tag = '000'

revmoo
May 25, 2006

#basta
Oh, I had no idea SQL used backticks. Learn something new every day.

McGlockenshire
Dec 16, 2005

GOLLOCKS!
No, SQL uses double quotes. MySQL uses backticks, but also accepts double quotes. Oh, and MSSQL uses loving square brackets, I am not making that up.

If you want to think cross-platform, think double quotes.
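For reference, the same identifier quoted per platform (the names here are just placeholders):

```sql
SELECT "order" FROM "my table";   -- ANSI SQL standard (and MySQL with ANSI_QUOTES)
SELECT `order` FROM `my table`;   -- MySQL default
SELECT [order] FROM [my table];   -- MSSQL, square brackets and all
```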

MrDoDo
Jun 27, 2004

You better remember quick before we haul your sweet ass down to the precinct.

Sprawl posted:

Reading what you're trying to do, you have a few too many steps doing basically the same thing.

Is the STUFF command doing what it should? Because then you could be doing something much simpler.

Also, from the looks of it, the REPLACE is completely unnecessary, as it's always going to be the same as the field you're getting in the subquery.

As far as I can tell, this will give you the exact same results.

code:
UPDATE bib
SET    bib.TEXT = Stuff(bib.TEXT, 7, 1, 'a')
FROM   bib
       JOIN item ON bib.bib# = item.bib#
WHERE  item.itype = 'a'
       AND bib.tag = '000'


Wow, that's awesome, thank you so much for clarifying. Unfortunately, I think my mind got a little too clouded about what exactly the conditions were, and I just threw them everywhere, making this way too messy. This gives me a great new understanding. Thanks again!

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

McGlockenshire posted:

No, SQL uses double quotes. MySQL uses backticks, but also accepts double quotes.

Just a quick note: you can make MySQL accept double quotes, but the sql_mode has to be set appropriately; otherwise it treats double quotes as plain string quotes.

420 set session sql_mode = 'ANSI_QUOTES,TRADITIONAL' every day
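Spelled out, for anyone who wants to try it (session-scoped, so it only affects the current connection):

```sql
SET SESSION sql_mode = 'ANSI_QUOTES,TRADITIONAL';
-- From here on, "player_name" is a quoted identifier, not a string literal.
SELECT "player_name" FROM main;
```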

Bob Morales
Aug 18, 2006


Just wear the fucking mask, Bob

I don't care how many people I probably infected with COVID-19 while refusing to wear a mask, my comfort is far more important than the health and safety of everyone around me!

Any tips on my.cnf settings for InnoDB?

Going from this site:

http://www.mysqlperformanceblog.com/2006/05/30/innodb-memory-usage/

I've decided to change the following settings on our server.
code:
innodb_buffer_pool_size=8096M
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=6
innodb_flush_log_at_trx_commit=2
innodb_log_buffer_size=4M
cat /proc/meminfo on the server shows:
code:
MemTotal: 12299960 kB
MemFree:   2467744 kB

Our ibdata1 file is 16G

The server has 3 vCPUs (Xeon L5520 @ 2.27GHz). The Ruby/Apache server has 5, and also has 12GB. Wouldn't it make more sense to give the DB server more RAM and maybe even out the CPUs? Not sure if it's a single quad with HT enabled, or a dual quad with HT disabled. Maybe we should enable HT and give each server 8 vCPUs?

ufarn
May 30, 2009
Cross-posting my problem(s) from the Django thread:

(I am going through the Models chapter of the Django Book, and I am trying to set up a database. I assume that I have to create any database I want through a separate MySQL console, and that I can't do this through any Python/Django command.)

For some reason both WAMP and XAMPP won't work on my Windows 7 installation; the first one throws a CLI error, and the latter won't give me access to the "admin" settings for my MySQL server.

With both, I get this in the MySQL console:

code:
mysql> CREATE DATABASE goondb;
ERROR 1044 (42000): Access denied for user ''@'localhost' to database 'goondb'
mysql>

Note that neither prompts me for a password at any time.

I have played around with WAMP on my computer before (a couple of months ago), and it's possible that something else has interfered with it meanwhile, but I have no idea how the hell to troubleshoot it. There seem to be some really high-brow Windows fixes for some problems like messing around with DLLs and the registry and whatnot, but they seem way too dangerous for my taste.

I have tried uninstalling and reinstalling the programs and deleting the folders in C:\WAMP and C:\XAMPP, but to no avail.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe
Why are you using WAMP for Django development? Just install the MySQL binaries and then use the MySQL Workbench tool to administer it.

ufarn
May 30, 2009

MEAT TREAT posted:

Why are you using WAMP for Django development? Just install the MySQL binaries and then use the MySQL Workbench tool to administer it.
How so? I am just getting into Django and general deployment.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe

ufarn posted:

How so? I am just getting into Django and general deployment.

Well, it includes PHP, which isn't what you use to program Django, as you know, so that's just extra crap. Apache is a full web server, which you really shouldn't mess with until you want to deploy your app in production. Django has a built-in web server, which you should use, because configuring Apache to work with Python is not that interesting and is full of pitfalls like the ones you're having.
Install MySQL from the binaries on their site and continue your examples.

ufarn
May 30, 2009

MEAT TREAT posted:

Well, it includes PHP, which isn't what you use to program Django, as you know, so that's just extra crap. Apache is a full web server, which you really shouldn't mess with until you want to deploy your app in production. Django has a built-in web server, which you should use, because configuring Apache to work with Python is not that interesting and is full of pitfalls like the ones you're having.
Install MySQL from the binaries on their site and continue your examples.
I've downloaded the Workbench and the Python MySQL driver, but I can't find a link for the binaries, although I'm sure I stumbled upon them before. Got a link?

EDIT: The server tool also fails to set up a MySQL server for the same reason:

quote:

Connecting to MySQL server localhost...

Can't connect to MySQL server on '127.0.0.1' (10061)

ufarn fucked around with this message at 10:19 on Jul 17, 2011

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe
You don't have the server installed. That's the most important part. When you install it make sure you install it as a windows service, otherwise you need to start mysql manually every time you reboot.

http://dev.mysql.com/downloads/mysql/

ufarn
May 30, 2009

MEAT TREAT posted:

You don't have the server installed. That's the most important part. When you install it make sure you install it as a windows service, otherwise you need to start mysql manually every time you reboot.

http://dev.mysql.com/downloads/mysql/
It works now, thanks. Don't know why the mysql PATH command gives me

code:
C:\Users\niclas\Dropbox\Programming Projects\Django Projects\mysite>mysql
ERROR 1045 (28000): Access denied for user 'ODBC'@'localhost' (using password: NO)

But I am able to CREATE DATABASE booksdb when I load the dedicated client prompt from the Start menu (which I can use to set up Django).

I should probably find out where the database is stored on my computer, too ...

angrytech
Jun 26, 2009
How do the OS numbers break down for *SQL development? Is there a general trend towards Windows? Obviously anyone doing MSSQL probably isn't going to be using Linux.

Scaramouche
Mar 26, 2001

SPACE FACE! SPACE FACE!

This is off the cuff, but I'd guess that Windows is relatively dominant on the small-to-medium end and then nearly disappears on the big-iron/enterprise side. I'm guessing shops running Apache aren't using MSSQL, shops running IIS aren't using MySQL, and so on.

Can't really find much concrete on database server numbers, but it's probably an extrapolation of this, as I don't see too many shops mixing flavours:
http://greatstatistics.com/
You might be able to get more by clicking around in this site:
http://trends.builtwith.com/Web%20Server
But I was only able to find web servers, not database servers (though they have a separate application frameworks section as well).

benisntfunny
Dec 2, 2004
I'm Perfect.

angrytech posted:

How do the OS numbers break down for *SQL development? Is there a general trend towards windows? Obviously anyone doing MSSQL probably isn't going to be using linux.

Without getting too specific... I work for a large company. We use a handful of DB2 databases; couldn't tell you what they're on. There are probably only two or three big ones running DB2, thousands of SQL Server databases running on Windows Server, and thousands of Oracle DBs on Solaris.

For development I use SQL Server Management Studio, Toad, DB2 Command Editor and even for the sake of just having another editor, SQL Developer (actually because it's free and the company direction is to stop paying for Toad).

Anyway, we are generally moving more towards Oracle. Stability-wise, the Windows Server machines have been hit or miss, whereas the Oracle boxes don't really ever have many problems.

kimbo305
Jun 9, 2007

actually, yeah, I am a little mad
A while back I asked about a very related issue, and I'm back begging for more help.

I've got a very large source table that I want to filter into another table, by way of joining on another table, F.

F looks like this:
code:
folder subfolder key acct

acct is a bit misleading, as most of the time it's NULL, and only non-NULL in special cases. A key can be present in multiple folders and subfolders, which are both ints. Subfolder values can be shared across folders.

The target table A looks like this:
code:
key timestamp day metric
The source table S looks like this:
code:
key timestamp day metric dmetric

dmetric is the change in the metric (an int) since the last update, which is usually but not always the previous day.

Same as last time, we had originally designed everything around there being only a single row per key per day. But in fact, our upstream systems are sometimes spitting out multiple records a day. Something like this:
code:
key1   2011-07-21 8:01am   2011-07-21   100  20
key1   2011-07-21 2:29pm   2011-07-21   140  40
This was the old filtering query:
code:
insert into A
select distinct F.key, day, metric, dmetric
from S join F on S.key = F.key
Dead simple. The distinct was kinda cheesy, but it ran pretty fast so I had no reason to mess with it.

As it turns out, things get really screwy downstream because of the assumption of only 1 row per day per key.

I've been trying to figure out how to modify the query to basically turn the two example rows into this:

code:
key1   <some ts, don't care>   2011-07-21   140  60

So basically: the latest metric value, and the cumulative delta for all the redundant rows put together.

This is what I had at first:
code:
insert into A
  select key, day, metric, dmetric
  from (select F.key, day, RANK() OVER (PARTITION BY F.key, day ORDER BY timestamp DESC) as rank,
               FIRST_KEY(metric) OVER (PARTITION BY F.key, day),
               SUM(dmetric) OVER (PARTITION BY F.key, day)
        from S join F on S.key = F.key) as Z
  where rank=1;
The mistake I made was two-fold:
- joining just on key means multiple rows of S because of multiple rows with the same key in different folders and subfolders
- SUM oversumming in those cases

So this is the new version:
code:
insert into A
  select key, day, metric, dmetric
  from (select F.key, day, RANK() OVER (PARTITION BY F.key, day ORDER BY timestamp DESC, folder, subfolder) as rank,
               FIRST_KEY(metric) OVER (PARTITION BY F.key, day ORDER BY timestamp DESC),
               SUM(dmetric) OVER (PARTITION BY F.key, day ORDER BY timestamp DESC, folder, subfolder)
        from S join F on S.key = F.key) as Z
  where rank=1;

I think this is correct; I'm still testing the results. What gets me is that this seems quite contrived. Maybe it's just because of the bad data situation we're trying to fix. It's also 3 times slower than the original query. Does anyone have recommendations for other ways to accomplish the same thing?


e: one thing that's sort of inconsistent-looking is the FIRST_KEY. I don't specify the same long-winded ORDER BY for it because all of the dupe rows should have the same timestamp, which is the only thing I care about there. But that doesn't take away from my main goal, which is an overall cleaner way to solve the problem.

Oh, I tried this:
code:
select F.key, day,
       FIRST_KEY(metric) OVER (PARTITION BY F.key, day ORDER BY timestamp DESC),
       SUM(dmetric) OVER (PARTITION BY F.key, day ORDER BY timestamp DESC)
from (select distinct key from S) as S` join F on S`.key = F.key) as Z
But this seems much slower, just from letting it run and from looking at the query plan.

kimbo305 fucked around with this message at 23:31 on Jul 21, 2011

Aredna
Mar 17, 2007
Nap Ghost
I think this gives you what you want:

code:
insert into A
select S.key, S.day, max(S.metric), sum(S.dmetric)
from S join F on S.key = F.key
group by S.key, S.day

benisntfunny
Dec 2, 2004
I'm Perfect.
Anyone know of a way to select something like ROWNUM in DB2? I see a way to order by it, but I want to select it. I found a really old lovely database where the assholes have no PK and a lot of repeating records I want to delete.

kimbo305
Jun 9, 2007

actually, yeah, I am a little mad

Aredna posted:

I think this gives you what you want:

code:
insert into A
select S.key, S.day, max(S.metric), sum(S.dmetric)
from S join F on S.key = F.key
group by S.key, S.day


Hmmmm, that's really elegant, but the metric can increase or decrease. Is there a way to patch that in?

e: I suppose I could interpolate and use AVG for both... not quite the same behavior but probably good enough. I'll perf test that.

kimbo305 fucked around with this message at 05:18 on Jul 22, 2011

Aredna
Mar 17, 2007
Nap Ghost
I almost made a comment about assuming it was strictly increasing for that to work, too. Since it can go either way, you'll have to use RANK (or ROW_NUMBER) to get the last record for the day based on the timestamp, which is almost always going to run slower.

This should work too and should be faster if few keys are filtered out by the join to F.
code:
insert into A
select z.key, z.day, z.metric, z.dmetric
from (select
         S.key,
         S.day,
         S.metric,
         RANK() OVER (PARTITION BY S.key, S.day ORDER BY S.timestamp DESC) as rank,
         SUM(S.dmetric) OVER (PARTITION BY key, day) as dmetric
      from S ) as z
   inner join (select distinct key from F) as y on z.key = y.key
where z.rank = 1

If this also runs too slow, you may try posting the indexes on S and F. It would also be useful to know roughly what % of keys in S are filtered out by the join to F in your original query.

kimbo305
Jun 9, 2007

actually, yeah, I am a little mad
S is huge and F is tiny, as is the filtered result set, so that last query is not gonna be ideal. It seems similar to the 2nd variant I posted, which is very slow.

Let's see, S is 25 billion rows and 280 million distinct keys, F is 1.4 mil rows and 1.1 mil distinct keys.
The resulting filtered table is e: 360 mil rows.

We're using Vertica, so there's no indexes. F is an unsegmented table because it's so small, and S is a single super-projection that segments on key and orders by key, day.

kimbo305 fucked around with this message at 05:46 on Jul 22, 2011

Goat Bastard
Oct 20, 2004

benisntfunny posted:

Anyone know of a way to select something like ROWNUM in DB2? I see a way to order by it, but I want to select it. I found a really old lovely database where the assholes have no PK and a lot of repeating records I want to delete.

row_number() OVER (ORDER BY whatever)

Aredna
Mar 17, 2007
Nap Ghost

kimbo305 posted:

S is huge and F is tiny, as is the filtered result set, so that last query is not gonna be ideal. It seems similar to the 2nd variant I posted, which is very slow.

Let's see, S is 25 billion rows and 280 million distinct keys, F is 1.4 mil rows and 1.1 mil distinct keys.
The resulting filtered table is e: 360 mil rows.

We're using Vertica, so there's no indexes. F is an unsegmented table because it's so small, and S is a single super-projection that segments on key and orders by key, day.

That's a bit larger than I was expecting. I'm not familiar with Vertica, but the segmented setup sounds similar to Greenplum's distributed architecture, which I've used before.

With Greenplum I've found performance benefits in similar situations by breaking it up into two steps, where the first step finds the distinct keys in F and places them into a table segmented in the same manner as S.

Goat Bastard posted:

benisntfunny posted:

Anyone know of a way to select something like ROWNUM in DB2? I see a way to order by it, but I want to select it. I found a really old lovely database where the assholes have no PK and a lot of repeating records I want to delete.
row_number() OVER (ORDER BY whatever)

This may not work for deleting the duplicated records, but if all repeated records are 100% identical you can select all of the distinct ones into a new table, drop the old table, and then rename the new one. If you are worried about views or anything else breaking, a safer solution would be to delete all of the records out of the old table and insert everything from the new table back into the old one.

Both solutions are just taking a sledgehammer to the problem so if you need something more elegant you can modify the second suggestion to only select/delete/insert repeated records.
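The second, safer variant sketched in T-SQL (the table name is hypothetical):

```sql
-- Keep one copy of each fully-duplicated row, without dropping the
-- original table (so views and permissions on it survive).
SELECT DISTINCT * INTO #dedup FROM messy_table;

DELETE FROM messy_table;

INSERT INTO messy_table
SELECT * FROM #dedup;

DROP TABLE #dedup;
```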

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

kimbo305 posted:


So this is the new version:
code:
insert into A
  select key, day, metric, dmetric
  from (select F.key, day, RANK() OVER (PARTITION BY F.key, day ORDER BY timestamp DESC, folder, subfolder) as rank,
               FIRST_KEY(metric) OVER (PARTITION BY F.key, day ORDER BY timestamp DESC),
               SUM(dmetric) OVER (PARTITION BY F.key, day ORDER BY timestamp DESC, folder, subfolder)
        from S join F on S.key = F.key) as Z
  where rank=1;
I think this is correct. Am still testing the results. What gets me is that this seems quite contrived. Maybe it's just because of the bad data situation we're trying to fix. It's also 3 times slower than the original query. Does anyone have recommendations for other ways to accomplish the same thing?

Unless you mistyped your sum(dmetric), you're still "over-summing", since you didn't include the folder and subfolder. Also, most normal aggregates don't take an ORDER BY when you use them as analytic functions.

I would do something like this:
code:
insert into A
  select key, day, metric, dmetric
  from (select key, 
               day,
               metric,
               SUM(dmetric) OVER (PARTITION BY key, day) dmetric, 
               RANK() OVER (PARTITION BY key, day ORDER BY timestamp DESC) as rank
        from S
       where key in (select key from F)) as Z
  where rank=1;
You don't need the FIRST_KEY because the whole point of the RANK() function is to pick out the rows that have the "first key".

Alternatively:
code:
insert into A
  select key, day, metric, sum(dmetric) dmetric
  from (select key, 
               day,
               first_key(metric) over (PARTITION BY key, day ORDER BY timestamp DESC) metric,
               dmetric
        from S
       where key in (select key from F)) as Z
  group by key, day, metric;

In this case, I use the first_key to pull out which metric I want to keep, and then I just do a simple aggregation. In fact, this "feels" the cleanest to me; I just wouldn't do it myself, since I pretty much exclusively use MSSQL, which doesn't have anything like FIRST_KEY.

Both these queries assume that Vertica has in or exists or the equivalent (and if it has both, try using exists if in performs poorly). If it has neither, then I do think joining to (select distinct key from F) instead of just F is the way to go. If that really performs horribly, then Aredna is almost certainly on the right track in terms of creating a separate table that has just the distinct keys from F.

Also, part of the problem is that Vertica isn't a normal OLTP-style SQL database, so even though it uses SQL syntax, it behaves differently than what I think most people reading this thread are used to.

Jethro fucked around with this message at 17:40 on Jul 22, 2011

Rescue Toaster
Mar 13, 2003
If I want to select a set of specific records from a table (where you would normally use an IN clause), but the table has a composite primary key, is there any better way to do it than a mess like this:

code:
SELECT *
FROM mydb
WHERE (((PKey1 = x1) AND (PKey2 = y1)) OR 
       ((PKey1 = x2) AND (PKey2 = y2)) OR
       ((PKey1 = x3) AND (PKey2 = y3)) OR
       ((PKey1 = x4) AND (PKey2 = y4)))

This is on SQL Server 2000. So far I think my only 'cleaner' option is to create a temporary table and do a join against it, but I'm not sure that's actually faster. It may look slightly better in code simply because of less string building.


vvv EDIT: At first they all said 'y1', sorry! It's a matched x,y pair for each primary key 1 & primary key 2.

Rescue Toaster fucked around with this message at 19:51 on Jul 22, 2011

epswing
Nov 4, 2003

Soiled Meat

Rescue Toaster posted:

code:
SELECT *
FROM mydb
WHERE (((PKey1 = x1) AND (PKey2 = y1)) OR 
       ((PKey1 = x2) AND (PKey2 = y1)) OR
       ((PKey1 = x3) AND (PKey2 = y1)) OR
       ((PKey1 = x4) AND (PKey2 = y1)))

This?

code:
SELECT *
FROM mydb
WHERE PKey1 IN (x1, x2, x3, x4) AND PKey2 = y1
Or am I not understanding your query?

Rescue Toaster
Mar 13, 2003
Sorry, I had it written wrong. There's a matched pair of primary key 1 and primary key 2. It would look more like (not valid SQL obviously)

WHERE (Pkey1, Pkey2) IN ((x1,y1),(x2,y2),(x3,y3),(x4,y4))

Both values must be specific to return that row.

epswing
Nov 4, 2003

Soiled Meat

Rescue Toaster posted:

No, There's a matched pair of primary key 1 and primary key 2. It would look more like (not valid SQL obviously)

WHERE (Pkey1, Pkey2) IN ((x1,y1),(x2,y2),(x3,y3),(x4,y4))

Both values must be specific to return that row.

Oh. Your original post only contained y1.

E: I see...

butt dickus
Jul 7, 2007

top ten juiced up coaches
and the top ten juiced up players

Rescue Toaster posted:

Sorry, I had it written wrong. There's a matched pair of primary key 1 and primary key 2. It would look more like (not valid SQL obviously)

WHERE (Pkey1, Pkey2) IN ((x1,y1),(x2,y2),(x3,y3),(x4,y4))

Both values must be specific to return that row.

WHERE CONCAT(Pkey1, Pkey2) IN ('hurf','durf',etc...)

I think that will work. You might run into an issue where a row with Pkey1 = 'hu' and Pkey2 = 'rf' matches the same as a row with Pkey1 = 'h' and Pkey2 = 'urf'. You could cut down on this by concatenating a separator between them, like a pipe.

Rescue Toaster
Mar 13, 2003
I'll have to check, but I'm virtually certain that in the table of interest they're ints. I think I'll just say screw it and build the multiple AND statements; it won't be too hard, just a little silly.
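For the record, the temp-table variant mentioned earlier would look something like this on SQL Server 2000 (no row constructors there, so the pairs are inserted one by one; table and column names are from the posts above, the values are hypothetical):

```sql
-- Build the key-pair list once, then join on both halves of the key.
CREATE TABLE #keys (PKey1 int NOT NULL, PKey2 int NOT NULL)
INSERT INTO #keys VALUES (1, 10)
INSERT INTO #keys VALUES (2, 20)
INSERT INTO #keys VALUES (3, 30)

SELECT m.*
FROM   mydb m
       JOIN #keys k ON m.PKey1 = k.PKey1 AND m.PKey2 = k.PKey2

DROP TABLE #keys
```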


kimbo305
Jun 9, 2007

actually, yeah, I am a little mad

Jethro posted:

Unless you mis-typed your sum(dmetric), you're still "over summing" since you didn't include the folder and subfolder. Also, most normal aggregates don't take an order by when you use them as analytic functions.
I indeed mistyped it when I generalized it :3:

The correct version should be:
code:
insert into A
  select key, day, metric, dmetric
  from (select F.key, day, RANK() OVER (PARTITION BY F.key, day ORDER BY timestamp DESC, folder, subfolder) as rank,
               metric,
               SUM(dmetric) OVER (PARTITION BY F.key, day, folder, subfolder)
        from S join F on S.key = F.key) as Z
  where rank=1;
I wanted to partition things down to folder and subfolder, as you said.

quote:

You don't need the FIRST_KEY because the whole point of the RANK() function is to pick out the rows that have the "first key".

You're right. I tried it both ways and the FIRST_KEY way seems a bit faster. e: actually, measurably faster. The query planner estimates the non-FIRST_KEY way being slower at the analytic phase.

quote:

code:
insert into A
  select key, day, metric, dmetric
  from (select key, 
               day,
               metric,
               SUM(dmetric) OVER (PARTITION BY key, day) dmetric, 
               RANK() OVER (PARTITION BY key, day ORDER BY timestamp DESC) as rank
        from S
       where key in (select key from F)) as Z
  where rank=1;
code:
insert into A
  select key, day, metric, sum(dmetric) dmetric
  from (select key, 
               day,
               first_key(metric) over (PARTITION BY key, day ORDER BY timestamp DESC) metric,
               dmetric
        from S
       where key in (select key from F)) as Z
  group by key, day, metric;
In this case, I use the first_key to pull out which metric I want to keep, and then I just do a simple aggregation. In fact, this "feels" the cleanest to me, I just wouldn't do it myself since I pretty exclusively use MSSQL which doesn't have anything like a "FIRST_KEY"

I'll try these to see how Vertica's query planner estimates them. The temp table is probably still going to be faster, but I'm lazy and would rather not have to set that up.

kimbo305 fucked around with this message at 22:08 on Jul 22, 2011
