Victor
Jun 18, 2004
Order doesn't matter. Just reference the proper table name or alias (and the alias if the table appears multiple times). You can even update through CTE names, probably as long as the corresponding view would be updatable.
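
A sketch of both cases (T-SQL-flavoured and untested; the people/orders tables and every column name are invented for illustration):

code:
-- same table twice: update through the alias, not the table name
UPDATE kid
SET    parent_name = p.name
FROM   people kid
       JOIN people p ON p.id = kid.parent_id

-- updating through a CTE name (only if the CTE would be an updatable view)
;WITH recent AS (
    SELECT status FROM orders WHERE placed >= '20090101'
)
UPDATE recent SET status = 'archived'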

Safety Shaun
Oct 20, 2004
the INTERNET!!!1
Instead of using a shedload of individual queries, how would I form a query to pull back the category name and how many posts are in each category as such

code:
CATEGORIES table        POSTS table
category_id             post_id
category_name           category_id
                        <snip>

Vanadium Dame
May 29, 2002

HELLO I WOULD LIKE TO TALK TO YOU ABOUT MY VERY STRONG OPINIONS

Safety Shaun posted:

Instead of using a shedload of individual queries, how would I form a query to pull back the category name and how many posts are in each category as such

code:
CATEGORIES table        POSTS table
category_id             post_id
category_name           category_id
                        <snip>
code:
select c.category_name
     , count(p.post_id)
  from CATEGORIES c
     , POSTS p
 where p.category_id = c.category_id
 group by c.category_name
Should work if I understand your question, Oracle syntax.

megalodong
Mar 11, 2008

Safety Shaun posted:

Instead of using a shedload of individual queries, how would I form a query to pull back the category name and how many posts are in each category as such

code:
CATEGORIES table        POSTS table
category_id             post_id
category_name           category_id
                        <snip>
code:
select max(c.category_name), count(p.post_id)
from categories c
inner join posts p
on c.category_id = p.category_id
group by c.category_id;
The max() is there to avoid things crying about category_name not being in "group by". It doesn't need to be, since it's functionally dependent on category_id, but only mysql (that I know of) will allow it.
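
If you're on something stricter than mysql, the portable version just lists both columns in the group by (same result, since the name is unique per id):

code:
select c.category_name, count(p.post_id)
from categories c
inner join posts p
on c.category_id = p.category_id
group by c.category_id, c.category_name;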

Safety Shaun
Oct 20, 2004
the INTERNET!!!1

Markoff Chaney posted:

Should work if I understand your question, Oracle syntax.

Thank you, that worked a treat. Page load has increased dramatically.

if wishes were knishes
Mar 5, 2006

Hi I'm Buddy-dot-gif
Is there TSQL for finding the Volume the db file is on?

Without using xp_cmdshell.

Bonus points for also being able to determine the disk usage of said volume.

Wazzerphuk
Feb 9, 2001

Hating Chelsea before it was cool
Winner of the PWM POTM for September
Winner of the PWM POTM for January
Co-Winner of the PWM POTM for March
code:
SELECT substring(filename, 1, 1) FROM master..sysaltfiles WHERE name = DB_NAME() AND dbid = DB_ID()
for the drive, and

code:
EXEC xp_fixeddrives
for the free space. You'll need to dump the output of xp_fixeddrives into a table variable/whatever if you want to tie the two of them together.

Edit: This is based on 2005 - not sure how it'll migrate up or down from there.
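
Tying them together would look something like this (untested sketch; xp_fixeddrives returns a drive letter and free MB, but the column names in the table variable are my own):

code:
DECLARE @drives TABLE (drive char(1), free_mb int)
INSERT INTO @drives EXEC xp_fixeddrives

SELECT f.drive, d.free_mb
FROM   (SELECT substring(filename, 1, 1) AS drive
        FROM   master..sysaltfiles
        WHERE  dbid = DB_ID()) f
       JOIN @drives d ON d.drive = f.drive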

mister_gosh
May 24, 2002

mister_gosh posted:

I'm creating some tables to track the number of times users log in to my database and to track how often certain tools are executed.

I think these are my tables: User (includes location and product they're affiliated to), Product, Tool, Login

A typical month may show:

John is located in Sarasota, FL, works with Acme FX (product), logged in to the database once every day in January (twice on 1/15), executed the Grow tool on 1/11, 1/14 and twice on 1/26, executed the Kill Whitey tool on 1/2 and five times on 1/25.

Jane is located in North Pole, Greenland and ...

Based on this stuff, I eventually want to write an application which can extract information such as usage from Florida, John's usage, a certain tools usage, etc.

My question is, how do I model this? I want to track the Grow application on a day to day basis. I want to track that a certain user used it, on which days, etc. I want to pull up the number of logins for all users for 2008, or just John and just for January 2009. Do I create another table called Usage with many date types? Do I create a monthly table with rows for 1-31?

Any links to creating this type of database table?


Ok, Triple Tech, et al, this is what I came up with. I'm not completely sure if these are technically foreign keys or not (I'm self taught and sort of just beginning in RDBMS stuff).

Anyways, based on some analysis, I think I will average about 12,000 events a month. Does this model seem to support that? I'm guessing by everything being a char or int type that this won't take up much space, but I would like to avoid creating any bottlenecks or other problems. Any guess as to how big the initial allotment of space should be? How does a table with eventually 500,000 rows with simple types sound? Normal?

The user table will max out at about 100 rows, the location table will be about 20 rows, the product table about 20 rows and the tool table about 30 rows, so it's just the events I'm concerned with.

As each event occurs, I will create a new row in the usage_event table, fill out the date and get the user id, product id and tool id based on who executed what. The usage_event_id will be an auto-incrementing unique id; not sure how to do that yet, but I'm still at the planning stage.


mister_gosh fucked around with this message at 21:42 on Jan 28, 2009

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...
More of a database design question than SQL strictly speaking but here it is:

I have a database (MySQL, but I mostly interact with it using an ORM) that holds biological data. Several of the objects realized from the db have fields that are similar in meaning, and currently identical in implementation. For example:

There is a table of countries. Each country can have several synonyms.
There is a table of organism names (taxa). Each organism can have several synonyms.

There is a table of samples. Each sample can have several external references (URNs).
There is a table of molecular sequences. Each sequence can have several external references.

So there will be a table of synonyms and a table for external references. The big question is: should these secondary tables be shared by the parent primary tables? That is, should the country and organism synonyms be in one table ("synonyms") or two ("country_synonyms", "organism_synonyms")?

Arguments for the one table solution:

* Why the hell not?
* It's simpler
* It's fewer tables (and fewer classes derived from the tables)
* The implementation for the two fields is identical.

Arguments for the two table solution:

* "synonym" and "external reference" can mean slightly different things for the different parent objects. (The external reference for a molecular sequence points at its id or location in another database system, the external reference for a sample means the number scribbled on a physical tag hung on the sample when collected.)
* Is the number of tables really a problem?
* The implementation for the two fields is identical _currently_ but what if it changes?

Any ideas, or design principles to consider in these cases?

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

outlier posted:

More of a database design question than SQL strictly speaking but here it is:

I have a database (MySQL, but I mostly interact with it using an ORM) that holds biological data. Several of the objects realized from the db have fields that are similar in meaning, and currently identical in implementation. For example:

There is a table of countries. Each country can have several synonyms.
There is a table of organism names (taxa). Each organism can have several synonyms.

There is a table of samples. Each sample can have several external references (URNs).
There is a table of molecular sequences. Each sequence can have several external references.

So there will be a table of synonyms and a table for external references. The big question is: should these secondary tables be shared by the parent primary tables? That is, should the country and organism synonyms be in one table ("synonyms") or two ("country_synonyms", "organism_synonyms")?

Arguments for the one table solution:

* Why the hell not?
* It's simpler
* It's fewer tables (and fewer classes derived from the tables)
* The implementation for the two fields is identical.

Arguments for the two table solution:

* "synonym" and "external reference" can mean slightly different things for the different parent objects. (The external reference for a molecular sequence points at its id or location in another database system, the external reference for a sample means the number scribbled on a physical tag hung on the sample when collected.)
* Is the number of tables really a problem?
* The implementation for the two fields is identical _currently_ but what if it changes?

Any ideas, or design principles to consider in these cases?
The single synonym solution is simpler until you end up with an organism and country that have the same id, and then your DB tells you that 'Blue breasted tit warbler' is a synonym for Tunisia.

And as you point out, they aren't really the same thing. You're thinking of maybe keeping them in the same table because you happened to give them the same name when you set up the tables. If you had called them organism_common_names and country_alternate_designations or sequence_external_reference and sample_tag_number you probably wouldn't be asking the question in the first place.

Jethro fucked around with this message at 18:38 on Jan 29, 2009

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

Jethro posted:

The single synonym solution is simpler until you end up with an organism and country that have the same id, and then your DB tells you that 'Blue breasted tit warbler' is a synonym for Tunisia.

Bam. Hadn't thought of that. By and large, for tables that represent fields I use autogenerated IDs, but it's not a decision I'd thought deeply about. Thanks.

quote:

And as you point out, they aren't really the same thing. You're thinking of maybe keeping them in the same table because you happened to give them the same name when you set up the tables. If you had called them organism_common_names and country_alternate_designations or sequence_external_reference and sample_tag_number you probably wouldn't be asking the question in the first place.

For the sake of argument (and better understanding of design), if they were the same and would always be implemented the same, would they belong in the same table together? To cook up an artificial example, say I have two tables that are countries and ecosystems. The boundaries of a country are represented by a polygon that is stored in another table. The boundaries of an ecosystem are also represented by a polygon. Leaving aside the valid point about colliding identifiers as above, do the two sets of polygons belong in the same table?

var1ety
Jul 26, 2004

outlier posted:

For the sake of argument (and better understanding of design), if they were the same and would always be implemented the same, would they belong in the same table together? To cook up an artificial example, say I have two tables that are countries and ecosystems. The boundaries of a country are represented by a polygon that is stored in another table. The boundaries of an ecosystem are also represented by a polygon. Leaving aside the valid point about colliding identifiers as above, do the two sets of polygons belong in the same table?

In my experience the query convenience and the time saved by putting two pieces of data that are only marginally similar into the same table is almost always outweighed by the grief it causes you later in the application's life.

The biggest downside of the model is the inability to effectively use database constraints to police your data, which forces you to do constraint checking in your application, which is almost certainly a mistake.

MrMoo
Sep 14, 2000

Is it possible to SEO-style normalise a URI in a MySQL stored function? I'm thinking of a function that converts "Lingerie Sets" into "lingerie-sets", and an appropriate method of querying the database for reverse lookup.

Normalising is easy on the command line:
code:
echo -n "Lingerie Sets" | tr "A-Z" "a-z" | tr -cs "a-z" "-"
lingerie-sets
And the SQL for the reverse lookup seems simple:
code:
SELECT *
FROM   `CATEGORIES`
WHERE  `NAME` REGEXP REPLACE('lingerie-sets','-','.+')
(edit) Maybe better not to strip characters to support other languages, hence something like this:

code:
DROP FUNCTION `sf_uri_normalise`//
CREATE DEFINER=`root`@`localhost` FUNCTION `sf_uri_normalise`(s VARCHAR(100))
    RETURNS varchar(100) CHARSET utf8
    NO SQL
    DETERMINISTIC
    COMMENT 'normalise a string to a SEO-friendly uri'
BEGIN
        RETURN LOWER(REPLACE(s, ' ', '-'));
END

MrMoo fucked around with this message at 20:05 on Jan 30, 2009

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...

var1ety posted:

In my experience the query convenience and the time saved by putting two pieces of data that are only marginally similar into the same table is almost always outweighed by the grief it causes you later in the application's life.

The biggest downside of the model is the inability to effectively use database constraints to police your data, which forces you to do constraint checking in your application, which is almost certainly a mistake.

That's the sort of advice I was looking for. Thanks.

Zoracle Zed
Jul 10, 2001

outlier posted:

Bam. Hadn't thought of that. By and large, for tables that represent fields I use autogenerated IDs, but it's not a decision I'd thought deeply about. Thanks.


For the sake of argument (and better understanding of design), if they were the same and would always be implemented the same, would they belong in the same table together? To cook up an artificial example, say I have two tables that are countries and ecosystems. The boundaries of a country are represented by a polygon that is stored in another table. The boundaries of an ecosystem are also represented by a polygon. Leaving aside the valid point about colliding identifiers as above, do the two sets of polygons belong in the same table?

You could do something like: Polygons (PolygonID, shape parameters), CountryShape (CountryID, PolygonID), and EcosystemShape (EcosystemID, PolygonID). Whether that's more convenient than just having two separate polygon tables would depend on the application.
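
In rough DDL that's (all names invented; the shape column stands in for whatever the geometry actually is):

code:
CREATE TABLE Polygons (
    polygon_id INT PRIMARY KEY,
    shape      TEXT NOT NULL          -- shape parameters
);
CREATE TABLE CountryShape (
    country_id INT NOT NULL REFERENCES Countries,
    polygon_id INT NOT NULL REFERENCES Polygons
);
CREATE TABLE EcosystemShape (
    ecosystem_id INT NOT NULL REFERENCES Ecosystems,
    polygon_id   INT NOT NULL REFERENCES Polygons
);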

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

camels posted:

something like this?

code:
SELECT
  Product,
  SUM(B1) AS group0,
  SUM(B2) AS group1
FROM
  table
WHERE
  Product IN ('a', 'b', 'c')
GROUP BY 
  Product

UNION

SELECT
  'd' AS Product,
  SUM(B1) AS group0,
  SUM(B2) AS group1
FROM
  table
WHERE
  Product IN ('a' , 'b')

Yes! How come the portion after UNION won't properly GROUP BY the Product though?

Bad Titty Puker
Nov 3, 2007
Soiled Meat

fletcher posted:

Yes! How come the portion after UNION won't properly GROUP BY the Product though?

The result set in effect only has one Product, so the SUM() acts on the whole result set, and doesn't need a GROUP BY statement, if that makes sense.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

camels posted:

The result set in effect only has one Product, so the SUM() acts on the whole result set, and doesn't need a GROUP BY statement, if that makes sense.

Oh I was trying to be lazy and reuse most of the SQL statement already built since it's huge. I don't know why I didn't realize that. I need some sleep I think. Thanks!

Bad Titty Puker
Nov 3, 2007
Soiled Meat

fletcher posted:

Oh I was trying to be lazy and reuse most of the SQL statement already built since it's huge. I don't know why I didn't realize that. I need some sleep I think. Thanks!

Unless there's some compelling reason not to calculate "d" on the client side, I think doing so would be simpler, clearer, easier to maintain, and likely to perform better. Or you could use a temp table (untested):

code:
SELECT 
  Product, SUM(b1) AS sum_b1, SUM(b2) AS sum_b2
INTO 
  temp_table
FROM
  a_table
WHERE
  Product IN ('a', 'b', 'c')
GROUP BY 
  Product

SELECT 
  Product, sum_b1, sum_b2
FROM
  temp_table
UNION
SELECT 
  'd' AS Product, SUM(sum_b1) AS sum_b1, SUM(sum_b2) AS sum_b2
FROM
  temp_table
WHERE
  Product IN ('a', 'b')
Anyway, you would definitely want to look at the query execution plan of the 'do it all in one query' approach.

Bad Titty Puker
Nov 3, 2007
Soiled Meat

var1ety posted:

In my experience the query convenience and the time saved by putting two pieces of data that are only marginally similar into the same table is almost always outweighed by the grief it causes you later in the application's life.

The biggest downside of the model is the inability to effectively use database constraints to police your data, which forces you to do constraint checking in your application, which is almost certainly a mistake.

Also it violates first normal form. A table models an entity, so you would not want to store different entities in the same table if they are only marginally similar. Besides that you would end up with a table where most of the column values are NULL.

Forkboy
Mar 2, 2005
LORD OF THE PANTS
Can someone explain aggregate functions and GROUP BY to me? I've read a bunch of simple tutorials and can't seem to make sense of this.

All this code has to do is, for each product we sell, find the latest purchase order (PO) that was generated (the MAX(po date)) and return some basic info about it.

code:
SELECT

  PO_Items.POI_ProductCode  AS 'Item Product Code',
  PO_Items.PO_Num AS 'PO Number',
  MAX( POs.PO_Date ) AS 'PO Date',
  PO_Items.POI_Quantity AS 'Qty. Ordered',
  PO_Items.POI_QtyReceived AS 'Qty. Received'

FROM PO_Items, POs

WHERE PO_Items.PO_Num = POs.PO_Num

GROUP BY PO_Items.POI_ProductCode

ORDER BY PO_Items.POI_ProductCode

This gives me invalid SQL errors, telling me that every other item in my SELECT statement should be in its own aggregate function (which returns inaccurate data) or in the GROUP BY clause (which returns all rows, not just the one with the latest date).

What am I missing here? It seems like it should be this simple.

Pizza Partisan
Sep 22, 2003

Forkboy posted:

Can someone explain aggregate functions and GROUP BY to me? I've read a bunch of simple tutorials and can't seem to make sense of this.

All this code has to do is, for each product we sell, find the latest purchase order (PO) that was generated (the MAX(po date)) and return some basic info about it.

code:
SELECT

  PO_Items.POI_ProductCode  AS 'Item Product Code',
  PO_Items.PO_Num AS 'PO Number',
  MAX( POs.PO_Date ) AS 'PO Date',
  PO_Items.POI_Quantity AS 'Qty. Ordered',
  PO_Items.POI_QtyReceived AS 'Qty. Received'

FROM PO_Items, POs

WHERE PO_Items.PO_Num = POs.PO_Num

GROUP BY PO_Items.POI_ProductCode

ORDER BY PO_Items.POI_ProductCode

This gives me invalid SQL errors, telling me that every other item in my SELECT statement should be in its own aggregate function (which returns inaccurate data) or in the GROUP BY clause (which returns all rows, not just the one with the latest date).

What am I missing here? It seems like it should be this simple.

It's exactly what the error message says. Either add the other fields to the GROUP BY or contain them in aggregate functions.

Edit: I'm sure it's not the way to do it, but I'd do a subquery. I'm sure you've got some sort of order ID, right?

code:
SELECT

  PO_Items.POI_ProductCode  AS 'Item Product Code',
  PO_Items.PO_Num AS 'PO Number',
  POs.PO_Date AS 'PO Date',
  PO_Items.POI_Quantity AS 'Qty. Ordered',
  PO_Items.POI_QtyReceived AS 'Qty. Received'

FROM PO_Items, POs

WHERE PO_Items.PO_Num = POs.PO_Num
  AND POs.OrderID IN

(SELECT po2.OrderID
 FROM POs po2
 JOIN PO_Items pi2 ON pi2.PO_Num = po2.PO_Num
 WHERE po2.PO_Date = (SELECT MAX(po3.PO_Date)
                      FROM POs po3
                      JOIN PO_Items pi3 ON pi3.PO_Num = po3.PO_Num
                      WHERE pi3.POI_ProductCode = pi2.POI_ProductCode))

ORDER BY PO_Items.POI_ProductCode

Pizza Partisan fucked around with this message at 19:25 on Feb 3, 2009

Forkboy
Mar 2, 2005
LORD OF THE PANTS

surrealcatalyst posted:

It's exactly what the error message says. Either add the other fields to the GROUP BY or contain them in aggregate functions.

This is what I'm not getting about the GROUP BY clause... adding every other field to the GROUP BY returns more than one record for each product code. I just need the one with the latest date.

rjmccall
Sep 7, 2007

no worries friend
Fun Shoe

Forkboy posted:

This is what I'm not getting about the GROUP BY clause... adding every other field to the GROUP BY returns more than one record for each product code. I just need the one with the latest date.

GROUP BY foo, bar, baz sorts the input relation into groups for all the unique combinations of values from foo, bar, and baz; its output relation then has one row summarizing each group. So the columns in the output relation have to summarize (potentially) multiple rows; therefore they must be either (1) aggregate functions (like AVG or MAX) over some input column or (2) one of the group-by columns, which are guaranteed to have a unique value within each group. What you want is an aggregate function which says "pick the row with the maximum date, then return such-and-such column from that row", which is not an a priori unreasonable request, but sadly there's just no such function built in to any databases that I know of. So you'll basically need a subquery.
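
The usual shape of that subquery, using Forkboy's tables (untested; note it can still return two rows for a product if two POs tie on the max date):

code:
SELECT pi.POI_ProductCode, pi.PO_Num, po.PO_Date,
       pi.POI_Quantity, pi.POI_QtyReceived
FROM   PO_Items pi
       JOIN POs po ON po.PO_Num = pi.PO_Num
WHERE  po.PO_Date = (SELECT MAX(po2.PO_Date)
                     FROM   PO_Items pi2
                            JOIN POs po2 ON po2.PO_Num = pi2.PO_Num
                     WHERE  pi2.POI_ProductCode = pi.POI_ProductCode)
ORDER BY pi.POI_ProductCode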

rjmccall fucked around with this message at 20:07 on Feb 3, 2009

deimos
Nov 30, 2006

Forget it man this bat is whack, it's got poobrain!
I present The Ark of the Covenant (SQL 2005):
code:
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[SYS_getBody]
	@id numeric
AS
SET NOCOUNT ON
BEGIN TRANSACTION
DECLARE @strSQL varchar(4000)
  SET @strSQL=' SELECT strBody as Body '
  SET @strSQL=@strSQL+ ' FROM tblBodies '        
  SET @strSQL=@strSQL+ ' WHERE 1=1 '      
  SET @strSQL=@strSQL+ ' and intId in 
                            (select intContentItemId from tblcontent 
                            where intcontenttypeid=5 AND
                            intCollectionId = (select intRelatedCollectionId from tblstory 
                                               where intStoryId = ' + convert(varchar,@id) + '))'
  SET @strSQL= @strSQL+ ' ORDER BY intElementId asc' 

  exec (@strSQL)
  IF (@@error <> 0)
  BEGIN
      ROLLBACK TRANSACTION
      RETURN @@error
  END
  ELSE
  BEGIN
    COMMIT
    RETURN @@error
  END
This, my fellow goons, is what I have to deal with on a system that I have no control over. This is about the BEST code out of all the stored procedures, I don't want you guys to melt when you see the really bad code. I am not even a good DB programmer.

We complained to the provider of the above code that if we hired a people farm they could search our database manually faster than this stored procedure. To which they replied "do you have any pointers on how to make it faster?".

I am extremely frustrated right now and can't verbalize any reply to the question. Short of telling them to gently caress off and fire all their staff that has worked on the stored procedures, what resources can I give them on how not to be a worthless loving SQL programmer?

edit: table breakage fix

Victor
Jun 18, 2004
Make them read these books. Maybe look at some of my sql scripts such as historical data via EAV. Demand that they perform better, and when they say they can't, prove them wrong. Maybe if they were to read BetterSQL, they could understand some of what they're missing?

Bad Titty Puker
Nov 3, 2007
Soiled Meat
I would write a query that doesn't use dynamic SQL and is optimized, then run it on a test database where you can do this:

code:
DBCC DROPCLEANBUFFERS

GO

DBCC FREEPROCCACHE

GO

/* your query here */
and use a SQL Trace if you can, and then run the same test calling that stored proc. That way you can tell your boss: "see, it runs 300x faster this way."

Something tells me you won't get anyone to see the light about EAV. Celko has some good stories to tell on USENET. He worked for a company in Austin that used EAV, and over time data integrity, performance, and maintainability got worse and worse and worse until the company went under.

Bad Titty Puker
Nov 3, 2007
Soiled Meat
Also, even if you can't make any changes to the stored proc or to the table design,
I wouldn't be surprised if there are missing, unneeded, and suboptimal indexes on the tables. Again it might be the kind of thing where you could use a test database to make a case for some changes, if you can get it to run much faster.

Khatib
Nov 12, 2007
How do I assign a single returned number from a select statement to a variable?

Basically I'm looking at something like this:


code:
DECLARE @SenID1 INTEGER
SET @SenID1 =
SELECT speedsensorid
FROM   (SELECT rn,
               speedsensorid
        FROM   (SELECT ROW_NUMBER()
                         OVER(ORDER BY speedsensorid) rn,
                       speedsensorid
                FROM   tblspeedsensor ss
                       INNER JOIN tblsite s
                         ON s.siteid = ss.siteid
                WHERE  s.siteid = @SITE_ID) x
        WHERE  rn = 1) z

I don't know jack about variables, and I'm having a hard time googling info on it because once you combine SQL and variables, it starts pointing you to all kinds of .ASP and VB guides it seems like.

And then I guess as a follow up if anyone knows, I'm going to have to do incremental values of that variable, based on rn + 1 to do this with multiple rows.


Unless there's an easier way to do all this...

Basically I need to run a case when speedsensorid = something in particular, and I'm using this to try and get those IDs and then put them into the case statement, because it seems like I can't use subselects in my case statements, unless I'm just doing that all wrong.

Pizza Partisan
Sep 22, 2003

Khatib posted:

How do I assign a single returned number from a select statement to a variable?

Basically I'm looking at something like this:


code:
DECLARE @SenID1 INTEGER
SET @SenID1 =
SELECT speedsensorid
FROM   (SELECT rn,
               speedsensorid
        FROM   (SELECT ROW_NUMBER()
                         OVER(ORDER BY speedsensorid) rn,
                       speedsensorid
                FROM   tblspeedsensor ss
                       INNER JOIN tblsite s
                         ON s.siteid = ss.siteid
                WHERE  s.siteid = @SITE_ID) x
        WHERE  rn = 1) z

I don't know jack about variables, and I'm having a hard time googling info on it because once you combine SQL and variables, it starts pointing you to all kinds of .ASP and VB guides it seems like.

And then I guess as a follow up if anyone knows, I'm going to have to do incremental values of that variable, based on rn + 1 to do this with multiple rows.


Unless there's an easier way to do all this...

Basically I need to run a case when speedsensorid = something in particular, and I'm using this to try and get those IDs and then put them into the case statement, because it seems like I can't use subselects in my case statements, unless I'm just doing that all wrong.

What you're looking for are SQL cursors. I don't have enough confidence to write out the code for you, but I can warn you to make sure they properly deallocate themselves so they don't kill your SQL server.

As for part one,
code:
Select @SenID1 = speedsensorid
should work.

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
Let's say I'm getting a large (2 million rows?) amount of data that's supposed to be static and unchanging. Supposed to be. And this data gets republished monthly. What methods are available to 1) be aware of what data points have changed from month to month and 2) consume the data given a point in time?

Solution 1) Naively save every snapshot of data, annotated by date. Diff awareness is handled by some in-house program, but consumption of the data by date is trivial. Cons, space requirements balloon by an order of magnitude.

Solution 2A) Using an in-house program, track when the diffs happen and store them in an EAV table, annotated by date. Space requirements are low, but consumption integrated with the original data becomes unwieldy.

Solution 2B) Using an in-house program, track when the diffs happen and store them in a sparsely filled table that looks much like the original table, filled only with the data that's changed and the date when changed. Cons, model is sparse and consumption integrated with the original data is non-trivial.

I guess, basically, how do I integrate the dimension of time into a relational database, keeping in mind both the viewing of the data and awareness of differences between time periods?

Does this relate to data warehousing at all?

Smells like... Slowly changing dimension?

Triple Tech fucked around with this message at 23:23 on Feb 4, 2009

Bad Titty Puker
Nov 3, 2007
Soiled Meat

Khatib posted:

How do I assign a single returned number from a select statement to a variable?

Basically I'm looking at something like this:


code:
DECLARE @SenID1 INTEGER
SET @SenID1 =
SELECT speedsensorid
FROM   (SELECT rn,
               speedsensorid
        FROM   (SELECT ROW_NUMBER()
                         OVER(ORDER BY speedsensorid) rn,
                       speedsensorid
                FROM   tblspeedsensor ss
                       INNER JOIN tblsite s
                         ON s.siteid = ss.siteid
                WHERE  s.siteid = @SITE_ID) x
        WHERE  rn = 1) z

I don't know jack about variables, and I'm having a hard time googling info on it because once you combine SQL and variables, it starts pointing you to all kinds of .ASP and VB guides it seems like.

And then I guess as a follow up if anyone knows, I'm going to have to do incremental values of that variable, based on rn + 1 to do this with multiple rows.


Unless there's an easier way to do all this...

Basically I need to run a case when speedsensorid = something in particular, and I'm using this to try and get those IDs and then put them into the case statement, because it seems like I can't use subselects in my case statements, unless I'm just doing that all wrong.

Can you use a lookup table? Cursors and row-at-a-time processing are slow as poo poo. I would look for a way to deal with whole sets of data instead of using procedural, one row at a time programming.

Bad Titty Puker
Nov 3, 2007
Soiled Meat

Triple Tech posted:

Let's say I'm getting a large (2 million rows?) amount of data that's supposed to be static and unchanging. Supposed to be. And this data gets republished monthly. What methods are available to 1) be aware of what data points have changed from month to month and 2) consume the data given a point in time?

Solution 1) Naively save every snapshot of data, annotated by date. Diff awareness is handled by some in-house program, but consumption of the data by date is trivial. Cons, space requirements balloon by an order of magnitude.

Solution 2A) Using an in-house program, track when the diffs happen and store them in an EAV table, annotated by date. Space requirements are low, but consumption integrated with the original data becomes unwieldy.

Solution 2B) Using an in-house program, track when the diffs happen and store them in a sparsely filled table that looks much like the original table, filled only with the data that's changed and the date when changed. Cons, model is sparse and consumption integrated with the original data is non-trivial.

I guess, basically, how do I integrate the dimension of time into a relational database, keeping in mind both the viewing of the data and awareness of differences between time periods?

Does this relate to data warehousing at all?

Smells like... Slowly changing dimension?
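A Type 2 slowly changing dimension would look roughly like this; every table and column name here is invented for the sketch:

code:
-- Each monthly republish closes out changed rows and inserts new versions.
CREATE TABLE DataHistory (
    row_key     INT IDENTITY(1,1) PRIMARY KEY,
    natural_key INT NOT NULL,       -- key carried over from the feed
    payload     VARCHAR(255) NULL,  -- the attributes being tracked
    valid_from  DATETIME NOT NULL,  -- load date this version appeared
    valid_to    DATETIME NULL       -- NULL means current version
)
Point-in-time reads then become a range filter:

code:
SELECT natural_key, payload
  FROM DataHistory
 WHERE @as_of >= valid_from
   AND (valid_to IS NULL OR @as_of < valid_to)
Only changed rows get new versions, so space stays close to solution 2B without the sparse-table consumption headache.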

What RDBMS are you using? I know SQL Server has database diff tools that might give you what you need. Here's one: Apex SQL Data Diff. It can diff database backups.

Victor
Jun 18, 2004
Triple Tech, how often do you have to query the 2 million rows and how often are you looking at data that isn't the most current? The specific usage of this table will almost definitely determine how you proceed. You might want to look into table-valued UDFs and see if they are smart about not fully "rendering" the table returned. If SQL Server is this smart, then you can abstract the implementation from the interface in a nice way. Otherwise, you might have to hard-code stuff and have it be relatively ugly.
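For what it's worth, an inline table-valued UDF is the kind SQL Server expands into the calling query instead of fully materializing first, which is the "smart" behavior Victor is talking about. A sketch, with all names assumed:

code:
CREATE FUNCTION dbo.DataAsOf (@as_of DATETIME)
RETURNS TABLE
AS RETURN
(
    SELECT natural_key, payload
      FROM dbo.VersionedData
     WHERE @as_of >= valid_from
       AND (valid_to IS NULL OR @as_of < valid_to)
)
Callers just do SELECT ... FROM dbo.DataAsOf('20080101') and never see how the history is actually stored.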

Bad Titty Puker
Nov 3, 2007
Soiled Meat
Triple Tech,

Not sure what the specifications are, but I'm guessing that you can do what you need on the database side. What exactly do you need to do?

The diff tool I mentioned can compare two databases (or backups) and generate a .csv file of the data differences. There are also tools that can analyze and manipulate transaction log data; actually it looks like Apex SQL makes one. I've used one before but can't remember who makes it.

If there is a tool that can analyze a differential database backup that would be p. sweet.

Another approach, if the data changes from month to month are relatively small, would be to use triggers to write to auditing tables that you could query to see all the data that changed in that month.

I don't really see a need to write much code on the client side, or to use table-valued UDFs here. If you log changes to auditing tables then you can just use a view or straight query the data. If you use SQL Server then you can use a tool that you can buy for 500 bucks.
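The trigger approach might look something like this (SQL Server syntax; the table and column names are placeholders):

code:
CREATE TRIGGER trg_SourceData_Audit
ON SourceData
AFTER INSERT, UPDATE
AS
BEGIN
    INSERT INTO SourceData_Audit (natural_key, payload, changed_at)
    SELECT natural_key, payload, GETDATE()
      FROM inserted
END
A month's worth of changes is then just a WHERE changed_at >= @month_start AND changed_at < @month_end query against the audit table.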

Triple Tech
Jul 28, 2006

So what, are you quitting to join Homo Explosion?
It's SQL Server 2005. The hosed up part is that neither my boss nor I know the use-case for this data, they just want it logged. Personally I think this whole venture is retarded and time-consuming but my boss doesn't seem to want to budge. He's just out to fulfill this business directive.

And without getting into the text space, how do you diff two rows? How do I know that one whole row of data is equal to another row, in SQL space?

Bad Titty Puker
Nov 3, 2007
Soiled Meat

Triple Tech posted:

It's SQL Server 2005. The hosed up part is that neither my boss nor I know the use-case for this data, they just want it logged. Personally I think this whole venture is retarded and time-consuming but my boss doesn't seem to want to budge. He's just out to fulfill this business directive.

And without getting into the text space, how do you diff two rows? How do I know that one whole row of data is equal to another row, in SQL space?

I think it would make more sense to capture all the data changes over the one-month-period than diffing all the data in the two databases by hand. First I would consider using a tool to analyze the transaction logs, or using database triggers to write modified rows to audit tables. Second, if you absolutely have to diff all the data I would use a third-party tool instead of rolling your own code.
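If you do end up diffing rows yourself, SQL Server 2005's EXCEPT compares entire rows at once (and unlike =, it treats two NULLs as matching, so you don't have to write NULL-safe comparisons per column). Table names here are assumed:

code:
-- Rows that are new or changed in the February snapshot:
SELECT * FROM Snapshot_Feb
EXCEPT
SELECT * FROM Snapshot_Jan

-- Rows deleted since January:
SELECT * FROM Snapshot_Jan
EXCEPT
SELECT * FROM Snapshot_Feb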

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
My web host provides PHPMyAdmin as one of the tools that can be used to manage MySQL databases. I was idly looking through just now and noticed that a warning message of the following form appears on the page for each of the tables in my database (except for one of them which has two columns for its primary key):

quote:

The following indexes appear to be equal and one of them should be removed: PRIMARY, UserID

Is this something I should be concerned about?

The tables were created using commands of the following form:

quote:

CREATE TABLE User(UserID SERIAL, some other columns, PRIMARY KEY(UserID))
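If it helps anyone else hitting this warning: SERIAL in MySQL is shorthand for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE, so that statement builds both a UNIQUE index and the primary key on UserID, which is exactly the duplication PHPMyAdmin is flagging. It's harmless for correctness but wastes space and write time, and the extra index can be dropped:

code:
ALTER TABLE User DROP INDEX UserID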

Pizza Partisan
Sep 22, 2003

Hammerite posted:

My web host provides PHPMyAdmin as one of the tools that can be used to manage MySQL databases. I was idly looking through just now and noticed that a warning message of the following form appears on the page for each of the tables in my database (except for one of them which has two columns for its primary key):


Is this something I should be concerned about?

The tables were created using commands of the following form:

Why create tables with commands rather than using PHPMyAdmin, if you've got access to it?

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

surrealcatalyst posted:

Why create tables with commands rather than using PHPMyAdmin, if you've got access to it?

I didn't realise at the time that there was a tool like PHPMyAdmin available. I was learning PHP and MySQL out of a book and pretty much did what the book said to do. It gave the PHP code for a page that could be used for sending queries to the database. I still use that page when I want to query the database, except for security it now needs the MySQL username, password and database name to be entered by the user.
