var1ety
Jul 26, 2004
Oracle has a CONNECT BY operator that lets you do this.

code:

select c
from foo
connect by (foo.p = prior foo.c)
start with foo.p = 123
order by 1

code:

124
125
126
127
128
129
130
131
188

If I couldn't do that I would write a recursive row-returning function to crawl the tree.

http://dev.mysql.com/tech-resources/articles/hierarchical-data.html

The above link has info on MySQL approaches to handling nested data.

var1ety
Jul 26, 2004
Selecting non-grouped columns in a GROUP BY statement returns inconsistent results.

Take the MAX of your user's post dates and order by the result of this aggregate function.
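
Something like this, assuming a posts table with user_id and post_date columns (names made up for illustration):

code:
select user_id, max(post_date) AS last_post
  from posts
 group by user_id
 order by max(post_date) desc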

var1ety
Jul 26, 2004
Oracle 10G XE is free, with the following limitations:

4GB user data
1GB RAM
1 CPU

http://www.oracle.com/technology/products/database/xe/index.html

var1ety
Jul 26, 2004

fletcher posted:

Great info about postgres guys, thanks. If I was going to try something new, should I go postgres or oracle 10g? I don't think I will be going beyond their computer limitations any time soon.

I would use Oracle 10g XE or Microsoft SQL Server XE if I was learning something new. They're both free and mature, with Microsoft SQL Server XE having the advantage of integrating with Visual Studio Express out of the box. My preference is Oracle.

I believe that PostgreSQL is still lacking analytic functions with windowing, a feature I would have a very hard time living without.
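
For example, a per-account running total without a self join looks like this in Oracle (table and columns made up):

code:
select account_id,
       txn_date,
       amount,
       sum(amount) over (partition by account_id
                             order by txn_date) AS running_total
  from txn
 order by account_id, txn_date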

var1ety
Jul 26, 2004

Sergeant Hobo posted:

I am taking a class where we are using Oracle 10g. I want to know if there is a way to do one command that can be used to drop all the tables at once (mainly because I don't feel like typing drop table tablename; a billion times, not that I'm lazy or anything). A quick Google search came up with this command:

code:
select 'drop table ', table_name, 'cascade constraints \;' from user_tables;
but that doesn't seem to work, as it only shows the tables (though it did lead to a nifty command to show the constraints in a table).

Run something like this. You can instantiate it as a procedure for easy reuse.

code:
begin
for row in (select table_name from user_tables)
loop
    execute immediate 'drop table ' || row.table_name;
end loop;
end;

var1ety
Jul 26, 2004
No forward slash at the end, and a space before the word "cascade". Yes, adding "cascade constraints" will handle foreign keys.
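
So the loop body from above becomes (untested, but the same anonymous block as before):

code:
begin
for row in (select table_name from user_tables)
loop
    -- cascade constraints drops the referencing foreign keys as well
    execute immediate 'drop table ' || row.table_name || ' cascade constraints';
end loop;
end;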

var1ety
Jul 26, 2004

Charun posted:

I have a date related problem.

I would create a table of months with their beginning date (00:00:00) and ending date (23:59:59) and join against that. You'd want to get the month the event started in, the month the event ended in, then all the months in between.

The most straightforward solution in Oracle syntax follows. MySQL 5.0 syntax should be very close to this.

code:
select fy_m, sum(days_in_month)*24
  from (select t.a,
               t.b,
               d3.fy_m,
               case
               -- event starts before, ends after
                 when t.a < d3.begin_date and t.b > d3.end_date then
                  d3.end_date - d3.begin_date
               -- event starts before, ends during
                 when t.a < d3.begin_date and t.b between d3.begin_date and
                      d3.end_date then
                  t.b - d3.begin_date
               -- event starts during, ends after
                 when t.a between d3.begin_date and d3.end_date and
                      t.b > d3.end_date then
                  d3.end_date - t.a
               -- starts during, ends during
                 when t.a between d3.begin_date and d3.end_date and
                      t.b between d3.begin_date and d3.end_date then
                  t.b - t.a
               end AS days_in_month
          from foo_down_event t
         inner join foo_month d1 on (t.a between d1.begin_date and
                                    d1.end_date)
         inner join foo_month d2 on (t.b between d2.begin_date and
                                    d2.end_date)
         inner join foo_month d3 on (d3.end_date <= d2.end_date and
                                    d3.begin_date >= d1.begin_date))
 group by fy_m
 order by 1
code:
FY_M	SUM(DAYS_IN_MONTH)*24
2008_M01	13.9997222222222
2008_M02	4

var1ety
Jul 26, 2004

Charun posted:

and it works perfectly! Do you want plat or archives?

No thanks, I just contribute for fun.

var1ety
Jul 26, 2004

nbv4 posted:

hmm, that's not working either. Here is my exact query:

Just compute your sorting column in the inner query and sort on it in the outer select directly.
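
Something along these lines (I don't have your query handy, so the table and columns are made up):

code:
select *
  from (select t.id, t.price * t.qty AS line_total
          from order_line t) x
 order by x.line_total desc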

var1ety
Jul 26, 2004

drcru posted:

Why is this:
code:
SELECT DISTINCT(post_id), tag_id
FROM `tagged`
ORDER BY `id` ASC, `post_id` ASC
LIMIT 5
Not showing unique values of post_id?

Yet this:
code:
SELECT DISTINCT(post_id)
FROM `tagged`
ORDER BY `id` ASC, `post_id` ASC
LIMIT 5
does..?

I'm trying to get it to display only one instance of each post_id but my first query only returns all the tag_id's for one post_id.

DISTINCT is not a function, so it cannot actually be applied to a single return column, despite how it looks.

code:
select (1), (2) from dual
That is valid in Oracle.

You'll need to reword what you're looking for - maybe "for each post_id, the tag from the row with the largest date". noonches posted an example of a solution to this kind of problem higher up on this page.
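
For example, "for each post_id, the tag_id from the row with the largest id" would be a self join against an aggregate (assuming id is what you mean by latest):

code:
SELECT t.post_id, t.tag_id
FROM `tagged` t
INNER JOIN (
	SELECT post_id, MAX(id) AS max_id
	FROM `tagged`
	GROUP BY post_id
) m ON m.post_id = t.post_id AND m.max_id = t.id
ORDER BY t.post_id ASC
LIMIT 5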

var1ety
Jul 26, 2004

Code Jockey posted:

Real quick general question, not specific to MS SQL.

I had heard at one point that specifying a number of returned records is faster than just doing a select, in cases such as knowing, say, there'll be no more than a few thousand and doing a select top 10000. Now ignoring the "you might not know how many are actually there and might miss some", is it faster to explicitly state the max number of rows to return? If I had a non-specific query return 400 rows, and I ran that query specifying I only wanted 400, would the specific query return faster?

You can't say it would always be faster, because there are a lot of "it depends".

With a "top N" query you need to sort before you can return rows, but if you know you only want 400 results the database can use less sort space (only need to keep a window of the top 400, instead of sorting every row returned), which would make the query more efficient, and faster with very large result sets because you would not need to use temp space.

If your query is doing a full table scan it could be faster because it could stop earlier, and it could similarly end earlier with some indexed access paths.

Oracle can, based on statistics, shuffle around some joins and access paths based on the FIRST_ROWS_N (N=1,10,100,1000) hint.
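
For example, hinting the optimizer toward returning the first rows quickly (Oracle syntax, table made up):

code:
select /*+ FIRST_ROWS(100) */ *
  from orders
 order by order_date desc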

var1ety
Jul 26, 2004

Grigori Rasputin posted:

I'm pretty new to MySQL and am wondering how best to enforce a set of data rules at the table level. The basic idea is that I want to link like data together, but never repeat a combination of it. I can deal with it in code, but I'd rather prevent it as low as possible.

Assume the following table:
first_id INT
second_id INT

the following rules:
1. first_id and second_id cannot be equal.
2. No combination of first_id and second_id can repeat. For example, you can't have the following records:
code:
first_id   | 0 | 1
second_id  | 1 | 0
I'm trying to link records in a table to records in the same table and never want an item to link to itself. How do I do it? Is there a better structure?

It sounds like you want to store and enforce integrity on a directed acyclic graph in the database. Here's a good Google result talking about DAGs and databases:

http://69.10.233.10/KB/database/Modeling_DAGs_on_SQL_DBs.aspx

I didn't read the whole article, but you could handle (1) with a direct BEFORE INSERT trigger, and you could handle (2) in a similar fashion by making sure that when a user tries to add edge e' (n->n') to an existing DAG G that edge e'' (n'->n) is not in the transitive closure of G (which would be pre-computed and stored in a table). This table would grow quite large in the case that you had a large number of vertices.

If an edge e' introduces a cycle you would be able to tell very quickly by examining the transitive closure of G and raising an error. Inserting edges that do not introduce cycles (or removing edges) would be at worst O(V^3), where V is the number of vertices in G (in the case you choose to fully recompute it after each operation).

You'd be doing this in code in a similar fashion anyway, and it will be faster in the database since you will have persistent, pre-computed information about your graphs, but it's probably going to be easier to keep the logic in your code.
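
As a minimal sketch of (1), assuming a links(first_id, second_id) table and a MySQL version with SIGNAL support (5.5+); a plain unique key covers exact duplicates, and the reversed-pair/cycle check against the transitive closure table would go in the same trigger:

code:
ALTER TABLE links ADD UNIQUE KEY uq_pair (first_id, second_id);

DELIMITER //
CREATE TRIGGER links_bi BEFORE INSERT ON links
FOR EACH ROW
BEGIN
  -- rule (1): a record may not link to itself
  IF NEW.first_id = NEW.second_id THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'first_id and second_id cannot be equal';
  END IF;
END//
DELIMITER ;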

var1ety
Jul 26, 2004

indigoe posted:

Should I be indexing more fields that are included in my WHERE clauses? I'm concerned I will end up with an index that's near as big as the database itself. Is there some other way to approach this?

Indexes are not always good. A common rule of thumb is that if you are pulling out less than 15% of the rows in the table, then an index on a column (or group of columns) with a lot of distinct combinations is worth evaluating. With a table that large, any indexes will necessarily be large and cumbersome.

In my own experience I have found that the problem can usually be reduced by pre-computing intermediary values and storing them as additional columns, or in a secondary table. The strategies MySQL has available to it for reporting tasks are very limited, as compared to SQL Server and Oracle, but you want to do as much as you can in the database before you offload it to another system for processing. It will almost *never* be faster to offload calculations to another language if these same calculations can be done directly (through direct SQL or through procedures) in the database.

var1ety
Jul 26, 2004

nbv4 posted:

I have a table that's about to be populated with a few hundred thousand records. The table has about 34 columns. Some of these columns are boolean columns that could be combined with one another. For instance one column called "is_car" and another "is_motorcycle", which could be combined as "vehicle_type" with a value of either NULL, "m", or "c".

The way it is now works fine, I'm just wondering if it's worth the trouble to get a little more performance (if any performance at all) by restructuring the table. Will it matter at all in the long run?

It doesn't matter with small columns like that. If you consolidated them it would just reduce the width of the row in the raw data file by a few bytes.

Just make sure you don't turn it into an attribute table of (src_id, attrname, attrval) and store your attributes in a 1:N relationship like that.

var1ety
Jul 26, 2004

Begby posted:

Why not?

Space is effectively unlimited on databases, but CPU power is not. You should design your database to answer the questions you plan to ask quickly and efficiently.

People model their database like this for a lot of reasons, but it's usually to avoid making concrete design decisions or to get around their DBAs (if it's for space it's an even worse reason).

There's no question that simple questions are faster to answer when the data is pre-flattened, but you'll experience decreased concurrency through this system, and complex questions will be even harder for the database to answer quickly. God help you if you're using self joins to answer these questions.

var1ety
Jul 26, 2004

Zedlic posted:

Thanks for this.

My basic idea was to do something like this, but on a table that will contain hundreds of thousands of rows and grows by about 5,000 rows a day this query will get pretty expensive.

The main problem is that the ACK_number isn't unique (loops from 0 to 999) so joining on that will return a lot of unwanted rows. So my fix was to only join on ACK_numbers that are less than 10 minutes apart, but that's a pretty cheap fix that made the query even more needlessly complicated and slow.

And as a side effect this approach pairs every incoming message with every other incoming message as well, since they all have an ACK_number of 0.

If an UP/DN ACK pair is *always* present (the other end always replies), they are logged reasonably in order, there are no duplicates, and you're using Oracle or SQL Server, you could use an analytic function like FIRST_VALUE to compute the final table efficiently.

Otherwise, the query is going to be necessarily slow unless you help stage the join condition. One way you could do this is translating this rolling 0-999 value into a real value in the range 0-999,999,999 (a big number you won't ever reach) and storing this in another table, or computing it on the fly with a function. You could then index this function/column, and joins would become very fast.

var1ety
Jul 26, 2004

Zedlic posted:

I've thought about mapping the ACK_number to something bigger that won't realistically loop ever, but the problem is finding a mapping function that maps each pair of ACK_numbers to a unique number. Any ideas on that are appreciated.

If you created a new column ack_number_noloop then you could count the number of ACKs with that sequence number and a smaller timestamp, multiply that count by 1000, and add the ack_number. This is a way to link the two records together as yaoi prophet recommended. It will work as long as, by the time you get message 500, the ACK for the previous message 500 has already been committed to the database.
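
A sketch of that calculation, with made-up table and column names:

code:
select a.ack_number
       + 1000 * (select count(*)
                   from messages m
                  where m.ack_number = a.ack_number
                    and m.msg_timestamp < a.msg_timestamp) AS ack_number_noloop
  from messages a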

var1ety
Jul 26, 2004
I'd do this

code:
select s.name AS size, coalesce(t.ct, 0) AS ct
  from (select sid, count(*) AS ct from person group by sid) t
 right outer join size s on (s.id = t.sid)
 order by 1

var1ety
Jul 26, 2004

epswing posted:

Can you tell me why this is the same/better/worse?

I think it's easier to understand what the query is doing when you break the query up. Additionally, while using the case function in that way is convenient, it is hard on the database because Oracle has to evaluate it for every row.

I benchmarked the two original queries and found that my query ran in 50% of the time.

I then benchmarked my original query against a modified version that uses the case method, and found that my query ran in 70% of the time.

code:
select sum(ct)
  from (select s.name, coalesce(t.ct, 0) AS ct
          from (select sid,
                       sum(case
                             when sid is not null then
                              1
                             else
                              0
                           end) AS ct
                  from p2
                 group by sid) t
         right outer join s on (s.id = t.sid));

var1ety
Jul 26, 2004

Stephen posted:

code:
SELECT * 
FROM autos a 
LEFT JOIN (
	SELECT 
		v.auto_id, 
		COUNT(v.id) as view_count 
	FROM views v 
	WHERE v.view_type_id = '3' 
	GROUP BY v.auto_id 
) vc ON vc.auto_id = a.id 
LEFT JOIN ( 
	SELECT * 
	FROM (
		SELECT id, auto_id, photo_format
		FROM photos 
		ORDER BY orderid ASC 
	) as test 
	GROUP BY auto_id 
) p ON p.auto_id = a.id 
LEFT JOIN makes ON a.make_id = makes.id 
LEFT JOIN models ON a.model_id = models.id 
INNER JOIN auto_upsales au ON au.auto_id = a.id 
WHERE (a.auto_deleted='0000-00-00' OR a.auto_deleted >=NOW())
	AND au.upsale_end > NOW() 
	AND au.upsale_id = 1 
ORDER BY vc.view_count ASC 
Is there anything I can do to improve the efficiency of this query? The views table is pushing 5 million records, and the photos table is up to near 200 000 so it's really starting to slow down. I have views.auto_id and views.view_type_id both indexed but it's still taking quite awhile.

The query is trying to return, for each make/model, the make, the model, the number of times it has been viewed, one photo, and something from auto_upsales?

var1ety
Jul 26, 2004

Aredna posted:

Right off you can change this:
code:
LEFT JOIN ( 
	SELECT * 
	FROM (
		SELECT id, auto_id, photo_format
		FROM photos 
		ORDER BY orderid ASC 
	) as test 
	GROUP BY auto_id 
) p ON p.auto_id = a.id 
to
code:
LEFT JOIN ( SELECT id, auto_id, photo_format
	    FROM photos 
	    ORDER BY orderid ASC 	
) p ON p.auto_id = a.id 
and achieve the same result. The group by isn't doing anything because you are selecting other fields that are not in a sum/min/max etc.

The GROUP BY will still force the query to return only one row per auto_id from the photos table, and the ORDER BY clause in the original sub-select will probably make that row the one with the largest or smallest orderid for each auto_id. Taking the GROUP BY out of the sub-select will probably cripple performance, since you'd be doing an outer join against the full 200k row table.

You gotta feel bad for MySQL users. Without having analytics you really have to bend over backwards to avoid self joins.

var1ety
Jul 26, 2004

Stephen posted:

How do you get around something like this in Oracle? If you aren't allowed to reference columns that aren't in the GROUP BY, how do you output them with the rest of the row? Is this a MySQL habit that I should kick right now before I get into Oracle/SQLServer?

In older versions you would need to self join (get the id and maximum value and then join back to the original table). In newer versions you can use analytics such as row_number() or rank() to compute an ordering which you then filter on.
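
Using the photos example from earlier, the analytic version would look like this (Oracle syntax):

code:
select id, auto_id, photo_format
  from (select p.id,
               p.auto_id,
               p.photo_format,
               row_number() over (partition by p.auto_id
                                      order by p.orderid) AS rn
          from photos p)
 where rn = 1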

var1ety
Jul 26, 2004

Stephen posted:

Is there anything I can do to improve the efficiency of this query? The views table is pushing 5 million records, and the photos table is up to near 200 000 so it's really starting to slow down. I have views.auto_id and views.view_type_id both indexed but it's still taking quite awhile.

There aren't any great approaches to solve this problem in a generic way, since at the end of the day you still need to aggregate 5 million records and compute a possibly large outer join. The best thing to do is probably to asynchronously pre-compute part or all of the query and refresh/update on insert or on a timer, depending on your needs and your workload. You're probably insert-few, read-many, so the additional time added to insert operations is a worthwhile investment.

Make sure you look at MySQL concurrency control and make adequate accommodations so that queries don't block while the tables are refreshing.

var1ety
Jul 26, 2004

MrHyde posted:

Thanks for the help. I wasn't really clear. I don't want a list of integers, I just want a list of comma-separated values I have; a better example would be ('firstVal','2ndval','3rdval')

I guess the actual problem is I have a list of values from an outside source and I have a table. I want to figure out which values are in the list, but not in the table.

A pattern I've used with Oracle in the past, which should be portable, is to parse the string with substr/instr against a dummy table that is known to have more rows than there are elements in your input string, using the current row number as an offset. If you do not have a rownum pseudo-column you will need to find another way to number the rows, possibly using row_number() or rank().

See the following (the input string 'a,b,c,d' would be passed in and duplicated in several spots in the query).

code:
select case
         when instr(base_str, ',') = 0 then
          base_str
         else
          substr(base_str, 1, instr(base_str, ',') - 1)
       end AS part
  from (select substr('a,b,c,d', instr('a,b,c,d', ',', 1, rownum) + 1) AS base_str
          from all_objects
         where rownum <=
               length('a,b,c,d') - length(replace('a,b,c,d', ',')) + 1)
code:
PART
b
c
d
a

var1ety
Jul 26, 2004

crazypenguin posted:

(mysql 5) I have a list of names (application side). I need to populate a table with the ids that refer to rows in the names table. Except some of these names may not yet exist in the names table and should be created.

What I'd like to do is:
1. Figure out what names are in my list (application side) that are NOT in the names table yet.
2. Add them to the names table.
3. Do the relatively simple query to add the ids corresponding the my list of names to the other table.

I can't figure out how to do (1) without either creating a temporary table to hold my list of names that I have application side (how do you select data from a database that ISN'T in the database?), or by finding a list that IS in the database, and computing the inverse application side.

So is there as way of doing (1) easily? Should I go with a temporary table? Or should I do it application side? Or is there some other method of accomplishing my goal that I'm not thinking of? (making the name column UNIQUE and just inserting everything does not count!)

If your names list is small you can just do the check application side. If it's large then a good solution might be to wrap the checking/creation SQL in a function which does nothing if the name exists and otherwise creates it. Then you would call this function N times, once for each name in your list.
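
A rough sketch of that routine in MySQL 5 (procedure and column names made up, untested):

code:
DELIMITER //
CREATE PROCEDURE ensure_name(IN p_name VARCHAR(100))
BEGIN
  -- do nothing if the name is already there, otherwise create it
  IF NOT EXISTS (SELECT 1 FROM names WHERE name = p_name) THEN
    INSERT INTO names (name) VALUES (p_name);
  END IF;
END//
DELIMITER ;
Call it once per name in your list, then run the simple id look-up query for step (3).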

var1ety
Jul 26, 2004

epswing posted:

(Discussed on synirc, posting for posterity: )

code:
select o.id, n1.name, n2.name, n3.name
from offers o
    inner join names n1 on o.nid1=n1.id
    inner join names n2 on o.nid2=n2.id
    inner join names n3 on o.nid3=n3.id
A little ugly, I'm not sure if there's a way to write one join to snag multiple fields like that though.

Anyone?

Using 3 joins is, in my opinion, the appropriate solution to this problem (you will need to use outer joins since the offers table can have nulls). With indexes the look-up against the names table should be very efficient.

Here's a single table join Tom Kyte has suggested for "attribute" tables (http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:31263576751669) that will work in this situation.

code:
select o.id,
       max(case when o.name_id1 is not null and n.id = o.name_id1 then n.name end) AS name_1,
       max(case when o.name_id2 is not null and n.id = o.name_id2 then n.name end) AS name_2,
       max(case when o.name_id3 is not null and n.id = o.name_id3 then n.name end) AS name_3
  from offers o
 inner join names n on (n.id = ANY(o.name_id1, o.name_id2, o.name_id3))
 where o.id = 3
 group by o.id
The query plan transform on this query is pretty nasty in Oracle, and it's more expensive than doing 3 joins.

var1ety
Jul 26, 2004

fletcher posted:

What's the most efficient way to query a table for timestamps that are < 15 minutes old in MySQL?

code:
SELECT field FROM table WHERE (TIME_TO_SEC(TIMEDIFF(NOW(), timestamp))/60) < 15
or something like this

code:
SELECT field FROM table WHERE timestamp > UNIX_TIMESTAMP()-15*60
or something else?

In the first expression you are converting a date column from its native format into an integer for every row just to do a comparison, which also keeps an index on the column from being used. Stick with the second expression. Although the two are meant to be equivalent, the optimizer will more than likely not rewrite the first one into the second.
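
If the column is an actual DATETIME/TIMESTAMP (rather than an integer of epoch seconds), keeping the column bare on the left side of the comparison lets MySQL use an index on it:

code:
SELECT field FROM `table` WHERE `timestamp` > NOW() - INTERVAL 15 MINUTE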

var1ety
Jul 26, 2004

MoNsTeR posted:

More or less. I would write it as:
code:
select a.*
  from table_a a
     , table_b b
 where a.key = b.key(+)
   and b.key is null
I find the JOIN keyword and all its SQL99 buddies completely and utterly unreadable, and I think they make learning SQL much, much more difficult than it needs to be.

I'll line up on the other side. ANSI joins are the only thing that makes large statements understandable and readable. The (+) operator makes life miserable, has limitations, and even Oracle recommends you get on with your life.

10.2 SQL Reference posted:

Oracle recommends that you use the FROM clause OUTER JOIN syntax rather than the Oracle join operator. Outer join queries that use the Oracle join operator (+) are subject to the following rules and restrictions, which do not apply to the FROM clause OUTER JOIN syntax:

var1ety
Jul 26, 2004

Alex The Great posted:

I have a SQLite database that I'm trying to search through, and firstly LIKE doesn't use indexing, so I'm having some really slow results.

So I'm trying to move to using >=, as I'm doing a search where I just want all values that are bigger than the first part of the string - if I type Al I want it to list Alabama, and for strings in general, it seems to be using the index with >=.

However, there are also some things I'd like to search that are numeric columns that it treats strictly as numeric columns. So if I search for "5" I'll get the results I want "50, 51, etc" but I'll also get 60, which isn't what I want.

My query started simple, and has evolved to this mess which still doesn't do what I want:

SELECT Report.NumColumn FROM Report WHERE +CAST(Report.NumColumn AS text) >=CAST(5 AS text)

Anyone ever had to do something like this before?

If I understand you correctly you could solve your problem by also adding a substr() predicate. Something like this:

code:
substr(col, 1, length('Al')) = 'Al'
Edit:

A poster named D. Richard Hipp posted some advice for a similar question at http://www.nabble.com/SQLite-Like-Query-Optimization-td15511886.html.

D. Richard Hipp posted:

LIKE operators cannot use indices unless the index is case insensitive. Use GLOB for case sensitive fields.

LIKE and GLOB operators cannot use indices if the pattern begins with a wildcard.

Nothing in SQLite will use an index if you are connecting terms using OR.

It looks like what you really want to use here is a full-text index. Please read about the FTS3 support in SQLite. That seems to be what you are trying to accomplish.

var1ety
Jul 26, 2004

Stephen posted:

I've got two MySQL tables:
canines(id, name, etc.)
votes (id, canine_id)

I want to select the dogs, ordered by the number of votes it has.

My query:
code:
SELECT c.*, v.votecount  
FROM canines c 
LEFT JOIN (
 SELECT canine_id, COUNT(*) AS votecount 
 FROM votes 
 GROUP BY canine_id
) v ON v.canine_id = c.id 
ORDER BY v.votecount DESC
This works perfectly, but it runs extremely slowly. Is there a faster way of doing this query?

Edit: I should mention that I have an index on 'canine_id' on the votes table.

Aggregating the votes table is probably what's taking so long. The query will out of necessity need to do a full table scan to satisfy the request, which can take a long time when the number of rows is large. You can run the query on its own to verify.

There are not a lot of things you can do to tune this passively. You can make sure the votes table is minimized in size (by restricting the maximum length of the field data types) so that more rows can be retrieved at a time, and you can try caching the aggregation operation in some fashion.

If you can do more drastic things I would encourage you to redefine the votes table so that it holds an aggregated total per canine instead of individual votes. The table is probably write few read many, so it makes sense to optimize for reading.
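
One way to set that up (table and column names made up), keeping a running total and bumping it on every vote:

code:
CREATE TABLE vote_totals (
  canine_id INT NOT NULL PRIMARY KEY,
  votecount INT NOT NULL DEFAULT 0
);

-- on each new vote
INSERT INTO vote_totals (canine_id, votecount)
VALUES (42, 1)
ON DUPLICATE KEY UPDATE votecount = votecount + 1;
The read side then becomes a cheap join against a table with one row per canine.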

var1ety
Jul 26, 2004

Scarboy posted:

Oracle, I'm not sure about the server version though.

There's a pretty comprehensive thread on the topic on Ask Tom at the following URL:

http://asktom.oracle.com/pls/asktom/f?p=100:11:2397328357508087::::P11_QUESTION_ID:229614022562

var1ety
Jul 26, 2004

Golbez posted:

I'm somewhat of a noob to normalization, so which would be the best option here?

The options:

code:
A: Use numbers, and tell the database what they mean
tblemails
  Frequency: 1, 2, or 3
tblfrequencies
  ID: 1, 2, or 3
  Name: Daily, Weekly, or Monthly
OR

code:
B: Don't use numbers.
tblemails
  Frequency: Daily, Weekly, or Monthly
OR

code:
C: Use numbers, but don't tell the database what they mean; handle the meaning in code, since the three possibilities will never change.
tblemails
  Frequency: 1, 2, or 3
PHP: "if 1, frequency = 'Daily'"
Which is the best practice? Right now, for simplicity, I'm going with #3...

Nobody should ever use C. Ever.

A or B are fine. I would personally use A. Make sure you size your number column appropriately or you will reduce the benefit.

var1ety
Jul 26, 2004

Golbez posted:

How do you mean? I figure a Tinyint would be perfectly fine.

Thanks for the advice about skipping C. I was just hoping to avoid making another table (and I see no point in storing "Daily", etc. in the main table, when a single byte can handle it), but then again, I'm sure a basic lookup table of frequencies will be useful in more than one application.

It wasn't a deep insight, I just meant to choose a datatype appropriate for the expected cardinality of the numeric column. Oracle only has the NUMBER data type, and if you do not size it then it allows the maximum precision and can take up to 22 bytes of storage.

var1ety
Jul 26, 2004

Carabus posted:

I would appreciate some advice on function/algorithm design, no code yet. I'm interested in data mining a table recording web page views. There are nearly 1 million rows (don't ask) but I am only interested in the 80k or so which have nonempty user columns. Another relevant column is page_id.

In order to do sane analysis of this data I need to count records for each user, for each page; then I can record in another table the user, page, and the total number of views.

However, with several thousand pages and users, doing this would seem to require several million queries with MySQL and other horrors. Is there a more sane way?

Isn't this

code:
select user, page_id, count(*) AS ct
  from webstats
 where user is not null
 group by user, page_id
Depending on the proportion of null/non-null user values it might make sense to add an index on the column.

var1ety
Jul 26, 2004

Aredna posted:

Why is IFNULL() superior to COALESCE?

In a mixed-version environment where one version does not support COALESCE, it might make sense to standardize on operators available on all production versions of the database. In some situations we are forced to ignore new features in our databases so that scripts will run with minimal modification on older versions.

My personal preference is to use COALESCE/CASE in all situations and ignore our database's large number of null-handling functions.

var1ety
Jul 26, 2004

SHODAN posted:

Quick question: I'm using the MySQL .NET connector in C# and trying to add parameters to a command using something like the following for the command text:

....STR_TO_DATE('?value1/?value2/?value3 8:00 PM','%c/%e/%Y %h:%i %p',?value4);

?value4 comes across fine when I use mysqlcommand.Parameters.AddWithValue, but the other 3 don't. I assume it's because of the single quotes, what's the proper way to use a parameter within single quotes? Escaping it somehow? I've already tried \? with no success.

In Oracle you would use concatenation.

code:
select to_date(:yyyymmdd || '235959', 'YYYYMMDDHH24MISS') from dual;
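
The same idea in MySQL syntax would be to build the string with CONCAT so the parameters sit outside the quotes (a sketch; the exact parameter marker depends on the connector):

code:
SELECT STR_TO_DATE(CONCAT(?value1, '/', ?value2, '/', ?value3, ' 8:00 PM'),
                   '%c/%e/%Y %h:%i %p');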

var1ety
Jul 26, 2004

LightI3ulb posted:

I want to build a query that will use three tables, and I know the speed of each table individually, so I want to maximize the speed with the joins. That being said, does the order of events in the query have an effect on which one is completed first?

So, lets say table A is super fast, B is not so fast, and C is dead slow. I want A to execute first and knock out as many rows as possible before it tries to join with B, and then B to C.

As an example:
select * from A a, B b, C c where a.this > # and a.date > curdate() and a.element=b.element and b.id=c.id

Would this execute in the order I want it to?

It depends on your database vendor and version. Many modern databases leverage index and table statistics to generate their own query plan for answering your question in the way they believe will be the most efficient, such that the order in which you specify your tables and predicates does not matter.

var1ety
Jul 26, 2004

Uziel posted:

I need some help querying some counts from Oracle.
I need to count the records for about 80 tables, but I want to generate that into one query if possible. I don't want the results as one big count, but each table's count preserved.

Is this possible?

Example of this with one table:
code:
select count(*) as Work Order History
from work_order_history
OK, that returns this:
code:
Work Order History
------------------
65,000,000
But I need to have a column for each table return, with their respective counts, like this:
code:
Work Order History | Subscriber Base
------------------   ----------------
65,000,000            123,000
Any ideas?

We generally use Forest&Trees reporting software to interface with the Oracle tables.

There's no intrinsic way to do this in Oracle until 11g (which adds pivot and unpivot). If you're using an older version you'll need to create a query with 80 columns, or create a filter outside of Oracle that will pivot your data set.

var1ety
Jul 26, 2004

Uziel posted:

The server is running Oracle 10.

Do you have an example of the query if it were just two columns instead of 80?

The simplest way is like this:

code:
select (select count(*) from dual) AS sys_dual,
       (select count(*) from SYSTEM_PRIVILEGE_MAP) AS sys_system_privilege_map
  from dual
If you can live with estimates and are auto-analyzing your tables you could query the data dictionary like this:

code:
select max(case
             when owner = 'SYS' and table_name = 'DUAL' then
              num_rows
           end) AS sys_dual,
       max(case
             when owner = 'SYS' and table_name = 'SYSTEM_PRIVILEGE_MAP' then
              num_rows
           end) AS sys_system_privilege_map
  from all_tables

var1ety
Jul 26, 2004

Victor posted:

Have you tried the following?

code:
select (select count(*) from dual) AS sys_dual,
       (select count(*) from SYSTEM_PRIVILEGE_MAP) AS sys_system_privilege_map
Including from dual caused you to run the above for every row in dual, which is clearly not what you wanted.

Dual is a dummy table in Oracle with exactly one row. You must select from a table in Oracle, for better or for worse.
