|
Okay so I've got this question in Oracle that I can't figure out an answer to. It's probably not too complicated, but I just lack general knowledge about this stuff. So I've got some tables, for example: code:
code:
|
# ? May 27, 2009 21:21 |
|
Triple Tech posted:tunah, if the data is already normalized, what more can you do to store it? Not sure how else you'd go about it. Would this be completely unusable: table articles primary key id | release_id | ... subject, date, and so on for a single article Wouldn't it completely kill the database to select every row with a specified release_id out of the 45 million rows? Can I make the release_id a key even though there will be several articles that share that same release_id? Edit: What I mean to ask is, since a release is really a group of several articles, is it better to have all of the articles in a column for the release, or can it be laid out like above?
|
# ? May 27, 2009 21:31 |
|
morpheous: select province, count(*) from table a group by province; select province, count(*) from table b group by province. And then just join the second statement onto the first, by province. It's like joining onto a table, but one declared on the spot. The complexity comes in the following scenario. A: distinct provinces: A, B, C. B: distinct provinces: C, D, E, F. You have to full join the two tables together and then coalesce a display column from A.province and B.province. Or, make a CTE that's a master list of all provinces (distinct A union distinct B) and then left join both tables against that master list (not against each other).
|
# ? May 27, 2009 21:36 |
|
Hmm, not totally clear on what you're saying... A release is one to many with articles, in that a set of articles composes a release, right? So you're saying maybe... The metadata of an article doesn't matter? That there's some way of stitching the body of an article (a data point which is some sort of varchar or blob?) into one huge data point of a row in the release table? I would say... If the metadata of articles or their atomicity is still important to you, then keep them in a separate table and do the stitching elsewhere. Also, if you know that they will always be fetched by release id, then put a clustered index on (release_id, article_id). I don't know enough about databases and performance to comment on how large streams are stored in a DB. But if they're a property of an article, then technically I guess they'd be in the article table. Your release table should be extremely thin, since it's generated summary/metadata mined from the articles, if I'm understanding your scenario correctly.
|
# ? May 27, 2009 21:41 |
|
Triple Tech posted:Hmm, not totally clear on what you're saying... A release is one to many with articles, in that a set of articles composes a release, right? So you're saying maybe... The metadata of an article doesn't matter? That there's some way of stitching the body of an article (a data point which is some sort of varchar or blob?) into one huge data point of a row in the release table? The shortened subject can perhaps be hashed and stored as that instead, rather than using the actual shortened subject when selecting from releases. My first try was to have a single table for releases. All articles would be stored in a column for their release. Selecting by "name" from that took quite some time, probably because the articles column grew extremely large, so a lot of data sat between the rows where "name" was stored.
|
# ? May 27, 2009 21:56 |
|
Well, I mean, technically you could get around that by indexing, but having very wide tables precludes you from serendipitous page reads... Essentially the smaller the better. I was sort of going to advocate storing the streams on the filesystem and having no article body data in the database, but again, I cannot comment on the performance and use of such data, having never done it myself. But since articles are so frequently talked about as first class objects, I think that alone merits them being in a separate table. There's always growth and optimization available there. Whereas if you use a mega-release table, not only is it huge and bulky, it's denormalized, in a way, robbing you of that flexibility.
|
# ? May 27, 2009 22:31 |
|
Morpheus: A couple of options...code:
code:
|
# ? May 27, 2009 22:36 |
|
I don't have a specific SQL question, but I'm looking for a book recommendation. Specifically, I'm looking to do development in C#. Anyone have a favorite, or are they all pretty much the same?
|
# ? May 28, 2009 01:16 |
|
Well, SQL and C# are two different realms of development. And I think with the advent of LINQ in the C# world, you won't really be writing SQL much. Consider looking that up before you start learning SQL that you might not need (even though SQL is p tits)
|
# ? May 28, 2009 03:36 |
|
Does anyone have experience with postgresql? I'm transferring a database from SQL Server to Postgres and I'm just trying to get the schema down now. In SQL Server I would do something like code:
code:
|
# ? May 28, 2009 04:29 |
|
Oh man, thanks. I finally put together the query that I needed, and it is a monster. Basically, I used MoNsTeR's first example. I needed to select from five different tables (and a table for the province list), each created with subqueries, and each making their own subquery to join tables and values and good lord. But it produces beautiful results, and I learned about the (+) thing today, which is delightful. This is what one of my five subqueries looks like:

(select PROVINCE, count(*) as COUNT
 from (select INFO_TABLE.PROV as PROVINCE, COLLECTION_TABLE.TYPE as APP_TYPE
       from INFO_TABLE, COLLECTION_TABLE
       where INFO_TABLE.ID = COLLECTION_TABLE.ID)
 where APP_TYPE like '%New%'
 group by PROVINCE) NEW_APPS

Ah, the sweet programming high of getting a problem solved.
|
# ? May 28, 2009 16:59 |
|
nevermind.
ilikechapstick fucked around with this message at 15:59 on May 29, 2009 |
# ? May 29, 2009 15:52 |
|
ilikechapstick posted:Having trouble with this question here...maybe I'm just not understanding it. Maybe something like:

select email, count(*)
from users
group by 1
having count(*) > 1;
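A quick sqlite3 sketch of that duplicate check, with an invented users table. Note that "group by 1" just means "group by the first selected column" (email here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO users (email) VALUES
        ('a@example.com'), ('b@example.com'), ('a@example.com');
""")

# HAVING filters groups after aggregation, keeping only emails
# that appear two or more times.
dupes = conn.execute("""
    SELECT email, COUNT(*)
    FROM users
    GROUP BY 1
    HAVING COUNT(*) > 1
""").fetchall()
```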
|
# ? May 29, 2009 15:59 |
|
Numbnuts posted:
You're on the right track, but saying PRIMARY KEY is the same as saying UNIQUE and NOT NULL. What version of postgres are you using? I think WITH (OIDS=FALSE) is the default for more recent versions. code:
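A tiny sqlite3 demo of that equivalence, using an invented table: a UNIQUE NOT NULL column rejects the same inserts a PRIMARY KEY would. (sqlite itself has a legacy quirk where non-INTEGER PRIMARY KEY columns can contain NULLs, which is exactly why spelling out both constraints is instructive.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'code' spelled out as UNIQUE NOT NULL, the two constraints that
# together make up primary-key behavior.
conn.execute("CREATE TABLE t (code TEXT UNIQUE NOT NULL)")
conn.execute("INSERT INTO t (code) VALUES ('x')")

errors = []
for value in ('x', None):  # a duplicate, then a NULL
    try:
        conn.execute("INSERT INTO t (code) VALUES (?)", (value,))
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))
```

Both inserts fail with an IntegrityError: one for the UNIQUE constraint, one for NOT NULL.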
|
# ? May 31, 2009 17:20 |
|
Land Mime posted:You're on the right track, but saying PRIMARY KEY is the same as saying UNIQUE and NOT NULL. What version of postgres are you using? I think WITH (OIDS=FALSE) is the default for more recent versions. Thanks, I'm using 8.3.7. After I created a table, pgAdmin gave me a SQL pane that shows how it would have created the table, and it's pretty much exactly what you posted there. Now I just need to finish converting all the stored procedures.
|
# ? May 31, 2009 22:40 |
|
I asked a question about this problem a few pages ago and got a response from Triple Tech, but didn't ask any of the followup questions that have occurred to me since (mostly since I wrote it off as something I would deal with later on). I will restate the problem for convenience. My database (MySQL) stores information about play-by-email board games (one particular board game) that people are playing against each other. Each game allows for up to 4 players, and there are 5 colours that the users can play as (allowing for future functionality that is anticipated to be necessary). When a user starts or joins a game, he chooses what colour he will be. I have a table Game and a table User. The table Game, in a poor piece of design, carries all of the information about who is playing in the game, in 5 columns: UserR, UserY, UserG, UserP, UserH (for Red, Yellow, Green, Purple and Grey). (These columns are NULL if there is nobody playing as the corresponding colour.) When running database queries, it is presently necessary to perform 5 LEFT JOINs, as in code:
Sometimes even 6 LEFT JOINs are needed, as there is a separate column for "game creator" and it is sometimes desirable to have this user's information joined as UserC. (The game's creator is usually one of the players, but need not always be.) This is evidently not good design and should be restructured by DROPping the columns UserR, UserY, UserG, UserP and UserH and implementing a new table, GamePlayer, that has three columns: Game, User and Colour. Then, the above query becomes code:
code:
Suppose I want to fetch information about the games that user x is playing, the creators of those games, and the players who are playing in the games. Is the following query the best way of doing this? code:
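A runnable sqlite3 sketch of the proposed GamePlayer layout (Game, User, Colour). The user columns and the sample data are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (UserID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Game (GameID INTEGER PRIMARY KEY, Creator INTEGER);
    CREATE TABLE GamePlayer (
        Game   INTEGER REFERENCES Game(GameID),
        User   INTEGER REFERENCES User(UserID),
        Colour TEXT,
        PRIMARY KEY (Game, Colour)   -- at most one player per colour per game
    );
    INSERT INTO User VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO Game VALUES (10, 1);
    INSERT INTO GamePlayer VALUES (10, 1, 'R'), (10, 2, 'G');
""")

# One row per (game, colour, player) instead of one wide row that needs
# five LEFT JOINs against UserR/UserY/UserG/UserP/UserH.
rows = conn.execute("""
    SELECT g.GameID, gp.Colour, u.Name
    FROM Game g
    JOIN GamePlayer gp ON gp.Game = g.GameID
    JOIN User u ON u.UserID = gp.User
    ORDER BY gp.Colour
""").fetchall()
```

Empty colour slots simply have no row, so the NULL columns disappear, and adding a sixth colour later is a data change rather than a schema change.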
|
# ? Jun 1, 2009 13:48 |
|
Common problem, at least for me. You could also separate out the two queries (games per user, users per game) and loop through the list of games individually. The overhead for the entire process goes up, but it's more functional and logical from a coder's point of view.
|
# ? Jun 1, 2009 14:21 |
Is it possible to gain information from 2 different tables in 1 column? Example: You want to pull reviews for games into a table, where you have the game's keys already stored, such as this:

Act101031 48
Act030122 78
Pla012394 92
Pla302144 67

However, the ratings for Pla* games are in table Pla_Score and the ratings for Act* games are in Act_Score. All you have is the game number to act as a key.
|
|
# ? Jun 1, 2009 16:24 |
|
Look up UNION as a way to stack SELECT statements together, vertically. Keep in mind that it may automatically DISTINCT the results.
|
# ? Jun 1, 2009 16:33 |
Triple Tech posted:Look up UNION as a way to stack SELECT statements together, vertically. Keep in mind that it may automatically DISTINCT the results. This looks good, but I'm not sure how to actually tie this together in a statement. Here's my problem, a bit more fleshed out: code:
|
|
# ? Jun 1, 2009 16:43 |
|
Oh I see what your thing is now... You're doing inheritance, and the part of the key that dictates the subtable is mashed into that one key... Ideally that key in the GAMELIST table would be a two-column composite key. So consider that. (pseudo)

select g.game_number, a.rank
from gamelist g
left join adv_list a on a.game_number = g.game_number
where g.game_number looks like act/adv

UNION ALL <-- dis ting right hurr

select g.game_number, p.rank
from gamelist g
left join pla_list p on p.game_number = g.game_number
where g.game_number looks like pla

Inheritance... Never pretty...
|
# ? Jun 1, 2009 17:05 |
|
Triple Tech posted:Common problem, at least for me. You could also separate out the two queries (games per user, users per game) and loop through the list of games individually. The overhead for the entire process goes up, but it's more functional and logical from a coder's point of view. (Please ignore my failure to use a database framework; that's an entirely separate issue that I may address at another time.) 1) Query to get all the games user x is playing in ... code:
php:
<?
$GamesArray = array();
while ($row = mysqli_fetch_assoc($QueryResult)) {
    $GamesArray[] = $row;
}
$NumGames = count($GamesArray);
$GameIDList = array();
for ($i = 0; $i < $NumGames; $i++) {
    $GameIDList[] = $GamesArray[$i]['Game'];
}
$GameIDListString = implode(', ', $GameIDList);
?>
code:
php:
<?
$Counter = 0;
$ReadPlayerYet = 0;
while ($row = mysqli_fetch_assoc($QueryResult)) {
    if ($row['Game'] != $GamesArray[$Counter]['Game']) {
        if ($ReadPlayerYet) {
            $Counter++;
            if ($row['Game'] != $GamesArray[$Counter]['Game']) {
                $error = 1;
            } else {
                $error = 0;
            }
        } else {
            $error = 1;
        }
        if ($error) {
            // handle broken games that have no players
        }
    } else {
        $ReadPlayerYet = 1;
    }
    foreach ($row as $key => $value) {
        $GamesArray[$Counter][$key][$row['Colour']] = $value;
    }
}
?>
php:
<?
$GamesArray = array();
while ($row = mysqli_fetch_assoc($QueryResult)) {
    $QueryResultX = mysqli_query($link,
        'SELECT GamePlayer.Colour, User.(some columns)
         FROM GamePlayer
         LEFT JOIN User ON GamePlayer.User = User.UserID
         WHERE GamePlayer.Game = ' . $row['Game']);
    while ($rowX = mysqli_fetch_assoc($QueryResultX)) {
        foreach ($rowX as $key => $value) {
            $row[$key][$rowX['Colour']] = $value;
        }
    }
    $GamesArray[] = $row;
}
?>

What's the difference in overhead likely to be between doing these things? 1) looks the tidiest, in that it involves doing two queries of limited complexity instead of n+1 - what's the difference in overhead likely to be between doing that and just using a single query to get all the data, as at the end of my previous post? I guess I'm asking for opinions: which way of doing it is "least bad"? What happens if I'm looking to get information about a single game (including information about its creator and all of its players)? I imagine I'd use the following: code:
code:
code:
|
# ? Jun 1, 2009 18:04 |
|
Well, this is an implementation detail/engineering concern. I think you should do the n + 1 version just to get it out of the way since it's simpler and more obvious. The other one, while it minimizes the amount of queries, makes your code more complex. Just solve your workflow first and worry about the queries later, I say. Just to be clear, we're choosing one of code:
|
# ? Jun 1, 2009 19:48 |
|
This code is for entertainment purposes only. This demonstrates what happens when prolonged sleep deprivation meets stimulants of various forms. I like the idea of using a query that returns one row per seat, instead of returning everything in one denormalized row, thus avoiding unions, left outer joins, self joins, etc. I also like the idea of a Seats table (game, player, seat). Something like this? code:
Bad Titty Puker fucked around with this message at 20:03 on Jun 1, 2009 |
# ? Jun 1, 2009 19:54 |
|
OK, well, thank you both for your suggestions! I will give some more thought to which approach to take.
|
# ? Jun 1, 2009 22:28 |
|
This is long, so please bear with me. I need some advice to try to ease a problem we have with our Oracle database. Right now we have a database schema that is organized roughly like this:

Organization
Site
Service
ND_Service
Service_Hours
ND_Service_Hours
service_status
ND_service_status
service_requirements
ND_service_requirements
etc.

Every row in the site table has an organization_id, and every row in the two service tables has a site_id. The problem is that our schema evolved oddly and we've ended up with those two sets of service tables to represent data for two different service types. Both sets of tables are almost identical except for a few differences here and there. This worked alright until we needed to get data out of both sets of service tables at once. To pull this off we needed to do a lot of UNION SQL queries, and the performance and complexity penalties from doing this suck. I've tried to pitch merging the two tables, but I'm not getting buy-in because of how much software would need to be refactored.

Last week I pitched the idea of trying out some materialized views to our DBA. We set up some materialized views that merge the two sets of tables, leaving us with something like this:

mv_service
mv_service_hours
mv_service_status
mv_service_requirements
etc.

These views are great, but I'm finding that we take a nasty hit in performance when I join, say, mv_service with mv_service_hours, because it's doing full table scans on each materialized view. I'm at a loss as to how to ease this problem. I think at this point we may just use some regular old views to at least help us with the complexity issue, but that doesn't help us on the performance front. Any ideas?
|
# ? Jun 3, 2009 14:53 |
|
Is your data indexed properly?
|
# ? Jun 3, 2009 15:08 |
|
Triple Tech posted:Is your data indexed properly? Yeah, the tables themselves are indexed properly. The problem is that once the two sets of tables are merged into materialized views, I'm unable to join between the different materialized views without lots of full table scans.
|
# ? Jun 3, 2009 15:11 |
|
Is your view indexed? I don't know what else to say other than that you shouldn't ever be getting full table scans. That and I don't have any experience with materialized views, so I'm not sure what the implementation details/limits are.
|
# ? Jun 3, 2009 15:41 |
|
8ender posted:Yeah the tables themselves are indexed properly. The problem is that once the two sets of tables are a materialized view I'm unable to join between the different materialized views without lots of full table scans. You will need to create indexes and constraints (such as a primary key) on the materialized views consistent with the queries you are running against them.
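sqlite3 has no materialized views, but the underlying point here, that an index turns a full scan into an indexed search, can be seen with a plain table standing in for one (all names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mv_service (service_id INTEGER, name TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
    # concatenate the detail strings for easy inspection.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM mv_service WHERE service_id = 42"
before = plan(query)   # no index yet: the planner reports a full table SCAN
conn.execute("CREATE INDEX ix_service_id ON mv_service (service_id)")
after = plan(query)    # now a SEARCH using ix_service_id
```

In Oracle the same diagnosis is done with EXPLAIN PLAN / AUTOTRACE, and the fix var1ety describes is to create the indexes (and constraints) directly on the materialized views.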
|
# ? Jun 3, 2009 15:42 |
|
var1ety posted:You will need to create indexes and constraints (such as a primary key) on the materialized views consistent with the queries you are running against them. I wasn't aware this was possible with materialized views.
|
# ? Jun 3, 2009 15:51 |
|
8ender posted:I wasn't aware this was possible with materialized views. You can find more information on their feature set in the Data Warehousing Guide and in the SQL Reference, but for the most part you can think of them as tables. Data Warehousing Guide - 8 Basic Materialized Views http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/basicmv.htm#i1007299 SQL Reference - CREATE MATERIALIZED VIEW http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_6002.htm#i2063793 SQL Reference - CREATE MATERIALIZED VIEW LOG http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_6003.htm#i2064649 If the underlying data changes with any regularity and response time is sensitive you will want to read about fast refreshing, which uses materialized view logs on the base tables to intelligently update only rows which have changed since the last rebuild. Data Warehousing Guide - 8 Basic Materialized Views - General Restrictions on Fast Refresh http://download.oracle.com/docs/cd/B19306_01/server.102/b14223/basicmv.htm#sthref534
|
# ? Jun 3, 2009 16:19 |
|
var1ety posted:If the underlying data changes with any regularity and response time is sensitive you will want to read about fast refreshing, which uses materialized view logs on the base tables to intelligently update only rows which have changed since the last rebuild. We've got fast refreshing setup and running well right now. If I can get our DBA to put some indexes and primary keys on these views then we might be in business. Now the only thing left to sort out is making Toplink work well with the views.
|
# ? Jun 3, 2009 16:34 |
|
Fight to merge the tables now before it's too late. The difficulty, perceived and actual, of refactoring is only going to get worse as the application and schema continue to evolve.
|
# ? Jun 3, 2009 21:04 |
|
MoNsTeR posted:Fight to merge the tables now before it's too late. The difficulty, perceived and actual, of refactoring is only going to get worse as the application and schema continue to evolve. Believe me, I know. I've fought the good fight and lost on that one, unfortunately. I've gotten the indexes set up on the materialized views and that's solved the performance problem, but now I have another. There doesn't seem to be a primary key defined in any of these views. It should be the column SERVICE_ID, which both tables used to create the view have, but it's not being carried into the view. Unfortunately this is making Toplink gently caress up when I try to create objects from tables and include the new views. Any ideas?
|
# ? Jun 4, 2009 15:47 |
|
8ender posted:Believe me, I know. I've fought the good fight and lost on that one unfortunately. Manually add a primary key constraint on the materialized views.
|
# ? Jun 4, 2009 16:47 |
|
I've got a Table that gets updated monthly, usually on the 5th or 6th business day after month-end. I need to be able to do a quick check to make sure that the required data is there before the rest of code executes. (bear in mind that this is a pretty big table, with several thousand rows per month, for the last 100-150 months.) What is the fastest way to do this? code:
code:
Maybe it will depend on whether the cursor/recordset is client-side or server-side?
|
# ? Jun 4, 2009 20:20 |
|
I don't know what language you're using but a count would be faster than selecting an entire set of records. Also, make sure you index on the field you're partitioning on (which seems to be the date). Even faster would be a clustered index, but that depends entirely on how else the table is used.
|
# ? Jun 4, 2009 20:25 |
|
Triple Tech posted:I don't know what language you're using Oops, this is DB2. I only work with DB2 and JET. Triple Tech posted:a count would be faster than selecting an entire set of records. Ok, that's what I thought, but I was getting confused by trying to think through client/server (I do almost everything with a static, client-side cursor). Triple Tech posted:Even faster would be a clustered index, but that depends entirely on how else the table is used. Eh, I'm just an analyst, I have no powers over that stuff. I can't even create/update Views. If I want to make a change like that, the process is loving byzantine. Anyway, thanks for the help. Squashy Nipples fucked around with this message at 00:46 on Jun 5, 2009
# ? Jun 4, 2009 20:39 |
|
If you do not need a count, it's better to just check for a single row meeting your criteria. Two mostly equivalent Oracle versions follow. code:
code:
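As a sqlite3 analog of the single-row check (the Oracle rownum/EXISTS syntax differs; the table name and data here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_data (load_month TEXT, value REAL);
    INSERT INTO monthly_data VALUES ('2009-05', 1.0), ('2009-05', 2.0);
""")

# EXISTS can stop at the first qualifying row; COUNT(*) has to visit
# every qualifying row before it can return an answer.
loaded = conn.execute("""
    SELECT EXISTS (
        SELECT 1 FROM monthly_data WHERE load_month = '2009-05'
    )
""").fetchone()[0]
```

With an index on the month column, this answers "is the month loaded yet?" after touching a single index entry, regardless of how many thousand rows that month contains.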
|
# ? Jun 4, 2009 20:48 |