PotatoJudge
May 22, 2004

Tell me about the rabbits, George
I'm not sure if this is possible, but I need to add an ad-hoc serial field to a Postgres query. The query below is what I'm working with; the geocode function creates the fields by running the supplied address against a couple of tables of TIGER data. I need to pass these results on to something else, and that something else requires a unique id field.




code:
SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat,
       (addy).address As stno, (addy).streetname As street,
       (addy).streettypeabbrev As styp, (addy).location As city,
       (addy).stateabbrev As st, (addy).zip, g.geomout As the_geom
FROM geocode('123 Main St, Anytown, OH, 99999') As g;


Pardot
Jul 25, 2001




PotatoJudge posted:

I need to pass these results on to something else and that something else requires a unique id field.


You should either use UUIDs

quote:

create extension "uuid-ossp";
select uuid_generate_v4() as id, g.rating, ST_X...

or make a new sequence and use that: http://www.postgresql.org/docs/current/static/sql-createsequence.html. These can be temporary for just the session if you want.

quote:

create sequence whatever;
select nextval('whatever') as id, g.rating, ST_X...

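Either way the synthetic id gets bolted onto each result row without touching the underlying tables. If you want to sanity-check the pattern outside Postgres, here's a rough stand-in using Python and sqlite3 (no uuid-ossp there, so Python's uuid module plays the part of uuid_generate_v4(); the table and data are made up):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geocoded (rating INTEGER, lon REAL, lat REAL)")
conn.executemany(
    "INSERT INTO geocoded VALUES (?, ?, ?)",
    [(0, -82.9, 40.0), (1, -83.1, 39.9)],
)

# Tack a fresh v4 UUID onto every result row, the way
# uuid_generate_v4() would inside the SELECT itself
rows = [(str(uuid.uuid4()),) + row for row in conn.execute("SELECT * FROM geocoded")]
ids = [r[0] for r in rows]
assert len(set(ids)) == len(rows)  # every row got a unique id
```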
PotatoJudge
May 22, 2004

Tell me about the rabbits, George
UUIDs did the trick. Thanks!

Doctor Malaver
May 23, 2007

Ce qui s'est passé t'a rendu plus fort
If there are any goons with zero to very little SQL experience browsing this thread, I'm looking for reviewers for this book:

Some IT pros choose to be DBAs, getting the required training and certification before tackling the challenge of maintaining a SQL Server database. Many others, however, find themselves thrust into the role of database administrator out of necessity. These accidental DBAs need to quickly ramp up and concentrate on the concepts and techniques that will give them the biggest payoff in the shortest time.
Learn SQL Server 2012 in a Month of Lunches is perfect for someone who wants—or needs—to learn SQL Server on the fly. Just set aside an hour a day—lunchtime would be perfect—and you'll be up and running faster than you thought possible.


More details in this thread: http://forums.somethingawful.com/showthread.php?threadid=3463489&userid=0&perpage=40&pagenumber=1

Bhaal
Jul 13, 2001
I ain't going down alone
Dr. Infant, MD
Alright, I'm working on a particular family of queries for a large aggregate report. Our database (MySQL) has some historical problems that have yet to be fully fixed and cleaned, so the following query will run into minor cartesians (duplicate user accounts, basically) which is throwing off my results.

I should probably get into what I'm trying to accomplish, then:

The Schema (egregiously simplified)
code:
# Master users table, for our app
Table `users`
--------------
userid
username

# User table of 3rd party app that integrates into our system
# This is linked by username and should be 1:1 with our user table.  But..
# 1) I know, it should be FK'd by userid but it's not, I'm trying to fix that but it's 1 part politics and 9 parts complicated technical roadblocks
# 2) It's not 1:1, a nominal but not immense % of our users have 2 accounts in zusers each matching on username (I'm trying to fix that, too)
Table `zusers`
-------------
zuserid #NOT the same as users' userid
username #matches the user's username

# This is actually a collection of tables for the 3rd party app but just to finish the thought there's useful data I want to probe here, 
# linked via the zuserid
Table `zdata`
--------------
id
zuserid
data

Re: "Just get that zuser table normalized!" - I've simplified the real problem(s) in this post, there's actually about 3 columns on users/zusers each that are used to reference each other in clever (read: idiotic) ways. User propagation to this table is a (needlessly) complicated process written in super terrible ancient 3rd party php with multiple points of entry. I've spent a non-trivial amount of time in there and have not definitively identified why or how the duplicates come about. Other than sometimes it thinks they don't yet have an account and makes another one.

The real solution I'm working on is getting the functionality we use out of this 3rd party app cloned in-house so that we don't have parallel user tables and can sever from the app entirely. It's got benefits outside of the scope of this problem and our app will feel a lot less like it's shackled to a corpse. I hate this 3rd party app, it fits one of those "80% of your problems come from 20% of your system" idioms perfectly.

Okay, so what I need (in this instance) is to get headcounts of people who match various criteria. I accomplish this all over the place with queries like so:

The Query
code:
select
sum(1) as total,
sum(if(...,1,0)) as criteria1,
sum(if(...,1,0)) as criteria2,
...
from users u
[joins, left-joins, and WHERE, all crafted to only get one row per user back...]
What I get is a single row and in each column is a headcount matching X criteria, along with a total headcount which I can use to pull percentages and so on. But when it comes to probing the 3rd party app I invariably run into duplicate zuser accounts (each potentially with valid data in them so we need to evaluate both accounts and set a criteria match for the user on any hits--kill me now). So some people get counted twice and that blows up the accuracy of our report (which is of course client facing; these reports are printed and passed around in their shareholder meetings, so inaccuracy is not an option). That query looks like the above with one specific addition:
code:
select
sum(1) as total,
sum(if(...,1,0)) as criteria1,
sum(if(...,1,0)) as criteria2,
...
from users u
join zusers zu on zu.username = u.username
[joins, left-joins, and so on ....]
My question: is there any query cleverness to clamp down on the duplicate head counts? I've tried a variety of things but can't seem to fall into what I need. The best I can do is a simple "group by u.userid", which gives me back a row for each user. I can certainly roll through that and accumulate my sums that way, but I'm really looking to get this all handled in MySQL, and in one query if at all possible. This is a very large aggregate report and the zuser stuff comes into play for roughly half the queries. It runs at acceptable speeds but our business growth is going to introduce data sets 1-2 orders of magnitude larger in the 1-2 year horizon, so I'm trying to keep things tightened up now.

It feels like there's something I can do with DISTINCT and GROUP BY magic but I haven't been able to really find anything beyond a simple group-by and application-side walkthrough of all the rows like I mentioned above. I could go with staging tables (I'm already caching the final calculated results so the report only gets computed once) but I'll admit I'm not enough of a db expert to know if just throwing more meta tables at the problem is really the way to go. It feels like a cop-out to move in that direction when my query is so close to what I need.

My other best guess, which I haven't tried implementing yet, is to left join twice into zusers and arbitrarily arrange them. I'm guaranteed to only have one duplicate. Due to the nature of the duplicate-user-generating mindfuck code that is our 3rd party app, users will never get 3 zuser accounts, only one or two. So, maybe something like this:

code:
select
sum(1) as total,
sum(if(...,1,0)) as criteria1,
sum(if(...,1,0)) as criteria2,
...
from users u
left join zusers zu on zu.username = u.username
-- ordering the dupe inside the join (rather than a WHERE) keeps users whose
-- single zuser row leaves zu_dupe null
left join zusers zu_dupe on zu_dupe.username = u.username and zu.zuserid < zu_dupe.zuserid
[joins, left-joins, and so on but all matches to zu.zuserid are OR'd with matches to zu_dupe.zuserid (or use IN)]
Which I think would do it. It's ugly and hackish and I'm not quite sure how big of a performance impact we'll see. But I can't figure out any other way.

Any ideas? Reminder: What I'm after for reporting purposes is a headcount of users, total and matching on various criteria. If they have duplicate accounts that's fine (ugh, for this at least) I just need them counted once even though both their accounts are traversed.

Bhaal fucked around with this message at 22:43 on Apr 19, 2012

Bhaal
Jul 13, 2001
I ain't going down alone
Dr. Infant, MD
After writing all that, I went with adding another left join to zusers, turning all matches to zusers's zuserid in zdata into an "IN (zu.zuserid, zu_dupe.zuserid)", and adding some bits in my where clause to bring the rows per user down to 1 across the board. It's ugly but it works, and I'm going to use the nominal performance hit as ammo for allocating dev time to getting rid of this 3rd party app as well as fixing some db design misses in our own app that wind up bloating out queries when you want to probe data meaningfully. If this thing had proper indices and FKs from the start I would have put this whole reporting suite to bed months ago, and with turnkey automation for our account managers & clients, instead of having to sap dev time to babysit the process each time. Not to mention less anxiety for me when I don't have my reports using these big dumb queries that look less like data mining and more like reading tea leaves.

tl;dr: a 12+ year old LAMP app with only the barest notion of things like RDBMS and MVC causes much pain when you try to actually use it. News at 11.
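For anyone who hits the same duplicate-join mess later, the collapse-then-count shape can be sketched like this, with a toy schema standing in for ours and sqlite (via Python) standing in for MySQL, so CASE replaces MySQL's IF():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (userid INTEGER, username TEXT);
CREATE TABLE zusers (zuserid INTEGER, username TEXT);
CREATE TABLE zdata  (zuserid INTEGER, data TEXT);

INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
-- bob has the dreaded duplicate zuser account
INSERT INTO zusers VALUES (10, 'alice'), (20, 'bob'), (21, 'bob');
-- a criteria hit on alice's account and on bob's *second* account
INSERT INTO zdata  VALUES (10, 'hit'), (21, 'hit');
""")

# Inner query: collapse to exactly one row per user, taking MAX over the
# criteria flag so a hit on either duplicate account still counts.
# Outer query: sum the per-user flags, so nobody gets counted twice.
total, hits = conn.execute("""
SELECT COUNT(*), SUM(hit) FROM (
    SELECT u.userid,
           MAX(CASE WHEN zd.data = 'hit' THEN 1 ELSE 0 END) AS hit
    FROM users u
    JOIN zusers zu ON zu.username = u.username
    LEFT JOIN zdata zd ON zd.zuserid = zu.zuserid
    GROUP BY u.userid
)
""").fetchone()
assert (total, hits) == (2, 2)  # both users counted once, both matched
```

The inner query guarantees one row per user no matter how many zuser rows match, so the outer sums can't double count.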

Bhaal fucked around with this message at 02:06 on Apr 20, 2012

opie
Nov 28, 2000
Check out my TFLC Excuse Log!
It's 4am and my head hurts. I don't do a lot of sql queries, and I have no idea how to write one that will do this thing. I have a table called OrderReceived with four columns that I care about:

ReceivedDate - date
TransactionType - varchar, this corresponds with the table I want to join
TransactionNum - varchar, this is the column I want to join on
CustomerNum - varchar

I want to find all the transactions for a particular CustomerNum where it took longer than 10 days to receive the order. Here's where it gets tricky. There are several tables that contain transaction dates. And the corresponding TransactionNum and date of transaction columns aren't named the same in each table (I did not design this database). So let's say there are three tables (it's more like 8 that I have to deal with):

Shoes
S_Num - varchar
S_Date - date of sale

Pants
P_Number - varchar
P_Date - date

Hats
Hat_Num - varchar
Hat_Date - date

If TransactionType in the OrderReceived table equals 'Shoes', then I want to join on TransactionNum and S_Num to get S_Date (it is a 1-1 relationship). The same logic follows for the other tables. I want to subtract the S_Date (or whatever) from the ReceivedDate, and if the difference is more than 10 days, then add the record to the record set.

I'm writing this in javascript which is also a huge pain in the rear end and I am very limited in what I can do within our infrastructure (not allowed to compile the c#). I don't see a way to just use an array that I can filter as I want because all the output wants a recordset that apparently executes the sql several times.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

opie posted:

It's 4am and my head hurts. I don't do a lot of sql queries, and I have no idea how to write one that will do this thing. I have a table called OrderReceived with four columns that I care about:

ReceivedDate - date
TransactionType - varchar, this corresponds with the table I want to join
TransactionNum - varchar, this is the column I want to join on
CustomerNum - varchar

I want to find all the transactions for a particular CustomerNum where it took longer than 10 days to receive the order. Here's where it gets tricky. There are several tables that contain transaction dates. And the corresponding TransactionNum and date of transaction columns aren't named the same in each table (I did not design this database). So let's say there are three tables (it's more like 8 that I have to deal with):

Shoes
S_Num - varchar
S_Date - date of sale

Pants
P_Number - varchar
P_Date - date

Hats
Hat_Num - varchar
Hat_Date - date

If TransactionType in the OrderReceived table equals 'Shoes', then I want to join on TransactionNum and S_Num to get S_Date (it is a 1-1 relationship). The same logic follows for the other tables. I want to subtract the S_Date (or whatever) from the ReceivedDate, and if the difference is more than 10 days, then add the record to the record set.

I'm writing this in javascript which is also a huge pain in the rear end and I am very limited in what I can do within our infrastructure (not allowed to compile the c#). I don't see a way to just use an array that I can filter as I want because all the output wants a recordset that apparently executes the sql several times.

Well, I guess the easiest way to do this is to use a UNION query. You also have the option of doing a bunch of LEFT JOINs, but then you have to test which table a result came from.

code:
SELECT
    OrderReceived.*
FROM
    OrderReceived
    JOIN Shoes ON
        OrderReceived.TransactionNum = Shoes.S_Num AND
        -- date comparison here --
WHERE
    OrderReceived.TransactionType = 'Shoes' -- or whatever
UNION ALL
SELECT
    OrderReceived.*
FROM
    OrderReceived
    JOIN Pants ON
        OrderReceived.TransactionNum = Pants.P_Number AND
        -- date comparison here --
WHERE
    OrderReceived.TransactionType = 'Pants'
UNION ALL
SELECT
    OrderReceived.*
FROM
    OrderReceived
    JOIN Hats ON
        OrderReceived.TransactionNum = Hats.Hat_Num AND
        -- date comparison here --
WHERE
    OrderReceived.TransactionType = 'Hats'
You have the option of cleverly constructing the query from an array of table names, column names and so on, rather than writing it out by hand.

By the way, this is server-side JavaScript or something, right? You're not writing SQL in the browser and then submitting it to a web server to execute.
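If it helps to see the UNION shape with an actual date comparison filled in, here's a toy run in Python with sqlite. The names and dates below are made up, and julianday() is sqlite's date arithmetic; on your real server you'd swap in its own function (e.g. DATEDIFF on SQL Server):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderReceived (ReceivedDate TEXT, TransactionType TEXT,
                            TransactionNum TEXT, CustomerNum TEXT);
CREATE TABLE Shoes (S_Num TEXT, S_Date TEXT);
CREATE TABLE Pants (P_Number TEXT, P_Date TEXT);

INSERT INTO OrderReceived VALUES
  ('2012-04-20', 'Shoes', 'S1', 'C1'),  -- 15 days after sale: should match
  ('2012-04-06', 'Shoes', 'S2', 'C1'),  -- 1 day after sale: should not
  ('2012-04-30', 'Pants', 'P1', 'C1');  -- 25 days after sale: should match
INSERT INTO Shoes VALUES ('S1', '2012-04-05'), ('S2', '2012-04-05');
INSERT INTO Pants VALUES ('P1', '2012-04-05');
""")

# One SELECT per transaction table, each with its own join columns,
# glued together with UNION ALL
late = conn.execute("""
SELECT o.TransactionNum FROM OrderReceived o
JOIN Shoes s ON o.TransactionNum = s.S_Num
           AND julianday(o.ReceivedDate) - julianday(s.S_Date) > 10
WHERE o.TransactionType = 'Shoes' AND o.CustomerNum = 'C1'
UNION ALL
SELECT o.TransactionNum FROM OrderReceived o
JOIN Pants p ON o.TransactionNum = p.P_Number
           AND julianday(o.ReceivedDate) - julianday(p.P_Date) > 10
WHERE o.TransactionType = 'Pants' AND o.CustomerNum = 'C1'
""").fetchall()
assert sorted(t[0] for t in late) == ['P1', 'S1']  # only the >10-day orders
```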

opie
Nov 28, 2000
Check out my TFLC Excuse Log!
Thanks for the response. This is all server-side. I think what you have makes sense, it's just a bit daunting because there are about 8 tables and at least 40 possibilities for TransactionType - it doesn't actually map to the table names the way I thought. An example would be that a value of "SHOE", "BOOT", "HEEL", etc, would point to the Shoes table. And naturally there is one mapping that has a totally different way of doing everything. Every time I dig into this stuff I find something even more idiotic than what I found before.

What I'm trying to accomplish with this is to avoid using headers for sections that will not have visible rows. So if they can only see records with the 10 day limitation, and there aren't any, then the header shouldn't be shown. Currently the code will show a placeholder with an empty line for each "hidden" record, and really I have no idea why it was done that way. What they now want is to not show the placeholders at all, or the headers if there is no visible data. The most efficient way of doing this is with the complicated query. Otherwise, I can query all the rows, go through each one until I find a visible record and print the header, and then do it again to display the rows (the code to show just the placeholder is in the c# which I can't change or compile). Ugly, but it'll work.

I don't expect this to make much sense - the system is goofy to begin with and not my area of expertise, and I'm sick and haven't slept in over 30 hours.

Loutre
Jan 14, 2004

✓COMFY
✓CLASSY
✓HORNY
✓PEPSI
I'm trying to write a query that has as its columns,

Course, # Of Tickets in table Tickets with this course name, # of Meetings in table Meetings with this course name, all where a USERID = some input gotten from my JSP page.

What can I do here?

I can get Course, # Tickets or Course, # Meetings where USERID = 'whatever' easy as pie. Getting # Tickets, # Meetings as their own columns is driving me mad. I feel like there's an easy solution I'm missing, but all my fiddling with unions and joins has gotten me nowhere.

Bhaal
Jul 13, 2001
I ain't going down alone
Dr. Infant, MD
I'm not certain where you mean userid fits in that, but I'm assuming it's on the course table. If so, you could use something like this:
code:
select 
c.coursename as "Course",
-- count(distinct ...) instead of per-row sums: joining both one-to-many
-- tables at once multiplies rows, so plain sums would over-count
count(distinct t.id) as "# of tickets",
count(distinct m.id) as "# of meetings"
from course c
left join tickets t on t.coursename=c.coursename
left join meetings m on m.coursename=c.coursename
where c.userid='whatever'
group by c.coursename
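One thing worth knowing about this shape: the two left joins multiply rows (2 tickets and 3 meetings make 6 joined rows), so per-row sums will over-count whenever a course has more than one of each, and COUNT(DISTINCT ...) sidesteps that. A quick check with made-up data, sqlite via Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course   (coursename TEXT, userid TEXT);
CREATE TABLE tickets  (id INTEGER, coursename TEXT);
CREATE TABLE meetings (id INTEGER, coursename TEXT);

INSERT INTO course   VALUES ('SQL 101', 'whatever');
INSERT INTO tickets  VALUES (1, 'SQL 101'), (2, 'SQL 101');
INSERT INTO meetings VALUES (1, 'SQL 101'), (2, 'SQL 101'), (3, 'SQL 101');
""")

row = conn.execute("""
SELECT c.coursename,
       COUNT(DISTINCT t.id) AS tickets,
       COUNT(DISTINCT m.id) AS meetings
FROM course c
LEFT JOIN tickets  t ON t.coursename = c.coursename
LEFT JOIN meetings m ON m.coursename = c.coursename
WHERE c.userid = 'whatever'
GROUP BY c.coursename
""").fetchone()
# a plain per-row SUM over this join would report 6 and 6
assert row == ('SQL 101', 2, 3)
```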

Doc Faustus
Sep 6, 2005

Philippe is such an angry eater
Not really a SQL question, but I also don't think there's an Oracle DB thread, so here we go!

Trying to load a bunch of data into a table using Oracle's SQLDeveloper. Everything's fine for quite a while, until I get the error seen below. Any idea what could be prompting this error while just running insert statements? Any idea how to avoid them?


Fcdts26
Mar 18, 2009
I'm taking an Oracle class and I have a question about adding a NOT NULL constraint to an existing column. I wrote this up and it works:

alter table MEMBER
MODIFY (ZIP varchar2(10) default 85233
CONSTRAINT mem_zip_nn NOT NULL)

Is there a shorter way of doing this? Thanks

var1ety
Jul 26, 2004

Fcdts26 posted:

I'm taking an Oracle class and I have a question about adding a NOT NULL constraint to an existing column. I wrote this up and it works:

alter table MEMBER
MODIFY (ZIP varchar2(10) default 85233
CONSTRAINT mem_zip_nn NOT NULL)

Is there a shorter way of doing this? Thanks

code:
alter table member modify (zip not null);
You do not need to re-specify the datatype and default value when modifying a column, unless you are changing them.

Fcdts26
Mar 18, 2009

var1ety posted:

code:
alter table member modify (zip not null);
You do not need to re-specify the datatype and default value when modifying a column, unless you are changing them.

Thanks! Worked perfect!

Goat Bastard
Oct 20, 2004

Doc Faustus posted:

Not really a SQL question, but I also don't think there's an Oracle DB thread, so here we go!

Trying to load a bunch of data into a table using Oracle's SQLDeveloper. Everything's fine for quite a while, until I get the error seen below. Any idea what could be prompting this error while just running insert statements? Any idea how to avoid them?



If you're absolutely certain that the .csv lines in question are valid (it's worth checking), then the first thing I would do is make sure you're using the latest version of SQL Developer, and if you are then investigate the feasibility of using a different tool (Toad, SQL Loader, impdp/imp, roll something yourself (not quite as stupid as it sounds if you have clobs or something complicated like that involved)). I've had nothing but problems with SQL Developer when importing data, but upgrading to the latest version quite often fixes whatever I've hit.

It's a buggy piece of poo poo for anything more advanced than ad-hoc queries/inserts. Although so are most similar tools to be fair, and at least you're not paying for it.

Goat Bastard fucked around with this message at 14:39 on Apr 23, 2012

Powdered Toast Man
Jan 25, 2005

TOAST-A-RIFIC!!!
Having a bit of difficulty; I'm new to T-SQL and my general SQL skills are a bit rusty after a long time not being used...

I'm dealing with AD objects, specifically groups (both security and distribution). To ensure accuracy, I am using their SIDs as the primary key. The group display name is still important, though, because the results need to be readable by a person, and also because I have a bunch of data that is organized by the display name rather than the SID of the group.

Essentially I need to build a link table relating each job title in the company to its associated security and distribution groups. I have taken this data out of a previously used spreadsheet and it is formatted something like this (where each item is a column):

JobTitle, Group1, Group2, Group3, ...

The contents of each group column differ depending on the job title. Some jobs have only a few groups, some have as many as 30. So, each JobTitle/Groupn pair is a relationship that needs to go into my link table. I have the groups associated to their SIDs in another table. I have each job title associated with an ID in another table. I want to take the information in the GroupsToTitles table and turn it into a link table associating the job title ID with the SID for the group.

I know how to do joins, but I'm really struggling with the mechanics of this. Would it be better to try to process one column at a time in the GroupsToTitles table or can it be done all at once? I just need a nudge in the right direction, here. Sorry if that was a bit rambling, I hope it made sense.

Fcdts26
Mar 18, 2009
Do most of you do SQL-type programming as a full-time job, or is it usually in addition to another job? I took my first class this semester and plan to continue taking all of the other SQL-related courses in the next few semesters. It seems to me at least that there is a definite job opportunity with DB stuff. Anything extra that you guys think is important, I'd love to hear about it. One of my other teachers offered me a position as an entry-level Linux sysadmin with a huge company that handles IT for almost every large company where I live. I'm fairly sure I would be able to transition to more DB stuff with them in the future and just want to make sure I am handling it all the best way possible. My job history as far as IT stuff goes is very limited, as I was in public service for most of my adult life before starting on this journey. Anyway, thanks!

DankTamagachi
Jan 20, 2005
Tamagachi to your throat!

Fcdts26 posted:

Do most of you do SQL-type programming as a full-time job, or is it usually in addition to another job? I took my first class this semester and plan to continue taking all of the other SQL-related courses in the next few semesters. It seems to me at least that there is a definite job opportunity with DB stuff. Anything extra that you guys think is important, I'd love to hear about it. One of my other teachers offered me a position as an entry-level Linux sysadmin with a huge company that handles IT for almost every large company where I live. I'm fairly sure I would be able to transition to more DB stuff with them in the future and just want to make sure I am handling it all the best way possible. My job history as far as IT stuff goes is very limited, as I was in public service for most of my adult life before starting on this journey. Anyway, thanks!

In my experience, the best place to use SQL in a non-entirely technical position (aka not a DBA or anything like that) is in finance. Many companies need financial analysts with SQL skills to build reports for account management and the like.

DukAmok
Sep 21, 2006

Using drugs will kill. So be for real.

DankTamagachi posted:

In my experience, the best place to use SQL in a non-entirely technical position (aka not a DBA or anything like that) is in finance. Many companies need financial analysts with SQL skills to build reports for account management and the like.

Agreed, I use SQL in an analytics type role currently, strikes me as much more interesting than pure tech/DBA side of it.

Fcdts26
Mar 18, 2009

DukAmok posted:

Agreed, I use SQL in an analytics type role currently, strikes me as much more interesting than pure tech/DBA side of it.

DankTamagachi posted:

In my experience, the best place to use SQL in a non-entirely technical position (aka not a DBA or anything like that) is in finance. Many companies need financial analysts with SQL skills to build reports for account management and the like.

This is exactly what I would like to do, purely an analyst-type thing. My SQL teacher works for a bank here in town, and building reports is all he does. I didn't realize until now that my wife's company (insurance) has strictly analyst jobs also. Any tips on what I should learn for real-world applications? I'd like to use the summer to beef up my skills.

Fcdts26 fucked around with this message at 22:47 on Apr 25, 2012

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
MySQL. I have a logs table, 9 million rows dating back to 2004. We need to be able to do wildcard searches on it. However, this not being natural text, using a FULLTEXT index won't cut it, and MySQL can't use indexes in a LIKE "%foo" search. What are my options beyond cutting the log off and archiving the old stuff?

McGlockenshire
Dec 16, 2005

GOLLOCKS!
MySQL sucks at the thing you're trying to do. Try an external fulltext solution, like Sphinx, or maybe a Lucene-based thing like Solr or ElasticSearch. ElasticSearch will be the easiest of these to prototype.

DukAmok
Sep 21, 2006

Using drugs will kill. So be for real.

Fcdts26 posted:

This is exactly what I would like to do, purely an analyst-type thing. My SQL teacher works for a bank here in town, and building reports is all he does. I didn't realize until now that my wife's company (insurance) has strictly analyst jobs also. Any tips on what I should learn for real-world applications? I'd like to use the summer to beef up my skills.

I knew nothing about SQL about two months ago, and everything I needed to know for analysis purposes I was able to learn on the job. The most important things for my purposes were learning the structures of the databases already in place, and how to efficiently join in only what is needed for larger data sets. A basic book on the subject got me 20% there as far as syntax and such goes, but the rest was all specific to the company.

Sub Par
Jul 18, 2001


Dinosaur Gum

Fcdts26 posted:

This is exactly what I would like to do, purely an analyst-type thing. My SQL teacher works for a bank here in town, and building reports is all he does. I didn't realize until now that my wife's company (insurance) has strictly analyst jobs also. Any tips on what I should learn for real-world applications? I'd like to use the summer to beef up my skills.

Ranking, sorting, and the like are important for that kind of work - I'd pay special attention to some of the analytical functions (lead, lag, rank, row_number, etc.) and get comfortable using them.

I also had fun with the SQL Cookbook by Anthony Molinaro. You can read the examples in there as "challenges" and try to do them on your own. Then if you fail, you can read the "correct" answer and figure out what you did wrong. If you got it right, it is still helpful to read because there are some creative solutions in there. The appendix of the book, about window functions and the concept of a group in SQL, was also extremely valuable to me.

Pardot
Jul 25, 2001




If any of you use OS X and want to try Postgres, I'd appreciate it if you checked out our new beta postgres.app, and please report any issues you have on the github tracker.

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe

Pardot posted:

If any of you use OS X and want to try Postgres, I'd appreciate it if you checked out our new beta postgres.app, and please report any issues you have on the github tracker.

That's a pretty :3: elephant. Also that seems really neat and useful, did you do it yourself?

InAndOutBrennan
Dec 11, 2008

DankTamagachi posted:

In my experience, the best place to use SQL in a non-entirely technical position (aka not a DBA or anything like that) is in finance. Many companies need financial analysts with SQL skills to build reports for account management and the like.

This is basically what I do. But add marketing, product development, executives, production, everybody basically to the internal client list.

The role here is called "Business analyst" but since 95% of the things people would like to know about is not implemented (if it were, they'd already know) it's a mix between implementation and analytics. Great fun with access to most of the company. And I'm not placed with IT but actually with CRM which is a huge plus to me.

Pardot
Jul 25, 2001




Janitor Prime posted:

That's a pretty :3: elephant. Also that seems really neat and useful, did you do it yourself?

No, another guy on my team, just spreading the word :tipshat:

Chuu
Sep 11, 2004

Grimey Drawer
(note: when answering this question feel free to point me to some basic tutorials on how to do this sort of stuff, I have a feeling this is a common problem but I am not good with database work)

I need to create a database to store a bunch of security (as in futures, options, etc.) definitions. They are composed of four "types" of fields. An example:

SecurityID: 12345
SecurityName: CLM2
SettlementPrice: 1234.56
BreachedLimits: Y

SecurityID is guaranteed to be unique and never change. SecurityName is guaranteed never to change. SettlementPrice changes every day. BreachedLimits changes every once in an unpredictable while.

I am trying to create a database where I essentially can perform 3 types of operations on these: (1) get the current values, (2) get the values as of a specific date, and (3) get the variables over a range of dates. #3 is really a subset of #2 (i.e. perform #2 over every date in a range specified for #3).

I'm not sure what the best way to store this sort of data in a relational way is. There is a very natural primary key and a ton of fields that do not change which provide a natural "master" table -- but I am unsure how the indexing and storage would work for all the fields that change over time.

I am guessing this is a common problem but google is failing me. Any help?

Goat Bastard
Oct 20, 2004

Sounds like you need two tables.

One has SecurityId and SecurityName, and the other has SettlementPrice, BreachedLimits, a new field SettlementDate, and SecurityId, which is a foreign key to the first table.

The second table would get a new row every day (or every time a new SettlementPrice is created or whatever), and the primary key would either be a composite of (SecurityId, SettlementDate) or a new synthetic field, depending on your preference/conventions.

Your date range queries would then be in the form
code:
SELECT SettlementPrice 
FROM   settlements
JOIN   securities 
       ON securities.SecurityId = settlements.SecurityId
WHERE  settlements.SettlementDate BETWEEN :some_date AND :some_other_date

Zombywuf
Mar 29, 2008

Full text search in Postgres question. I need to add key/value tags for specific metadata to an existing full text search system. My current thinking is that the ugly but fast way to do it would be to add the tags to the full text record in some escaped form, i.e. "tag1:value tag2:value" and then I can mash the tags into the query making it a single scan to do the filtering. Am I a bad person?

baquerd
Jul 2, 2007

by FactsAreUseless

Chuu posted:

I need to create a database to store a bunch of security (as in futures, options, etc.) definitions. They are composed of four "types" of fields. An example:

SecurityID is guaranteed to be unique and never change. SecurityName is guaranteed never to change.

You may be able to say this of SecurityID, but symbols change from time to time.

Benson Cunningham
Dec 9, 2006

Chief of J.U.N.K.E.R. H.Q.
Quick question, is there a standard naming convention for a binary column in a db? Like if you were going to make a ticket tracker, would you call the open/closed value open? Do you just pick one of the options and have that be 1 and the opposite be 0?

Pardot
Jul 25, 2001




Zombywuf posted:

Full text search in Postgres question. I need to add key/value tags for specific metadata to an existing full text search system. My current thinking is that the ugly but fast way to do it would be to add the tags to the full text record in some escaped form, i.e. "tag1:value tag2:value" and then I can mash the tags into the query making it a single scan to do the filtering. Am I a bad person?

Have you looked at hstore?

http://www.postgresql.org/docs/9.1/static/hstore.html
https://postgres.heroku.com/blog/past/2012/3/14/introducing_keyvalue_data_storage_in_heroku_postgres/
http://robots.thoughtbot.com/post/13829210385/the-durable-document-store-you-didnt-know-you-had-but
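A minimal hstore sketch (9.1 syntax; the table and column names here are made up for illustration):

```sql
CREATE EXTENSION hstore;  -- pre-9.1: load contrib/hstore.sql instead

CREATE TABLE docs (
    id   serial PRIMARY KEY,
    body text,
    tags hstore
);

INSERT INTO docs (body, tags)
VALUES ('some document', 'tag1 => foo, tag2 => bar');

-- @> is containment: rows whose tags include this key/value pair
SELECT id FROM docs WHERE tags @> 'tag1 => foo';
```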

Goat Bastard
Oct 20, 2004

Benson Cunningham posted:

Quick question, is there a standard naming convention for a binary column in a db? Like if you were going to make a ticket tracker, would you call the open/closed value open? Do you just pick one of the options and have that be 1 and the opposite be 0?

1/0 is better for internationalisation than character flags like 'Y'/'N', and makes perfect sense for boolean columns. In your example I'd call the column open or is_open or something, with 1 for true and 0 for false. I'm assuming that your DB doesn't have a boolean datatype - if it does then you should just use that.

FYI, you should never "just pick one of the options and have that be 1" - in a boolean context 1 means true. Always. I've worked on code where the original developer didn't understand that and it was just awful. Don't be that developer, please.
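If you're stuck without a boolean type, a sketch of what that looks like with a CHECK constraint (names are just illustrative):

```sql
CREATE TABLE tickets (
    id      integer PRIMARY KEY,
    -- 1 = true (open), 0 = false (closed)
    is_open smallint NOT NULL DEFAULT 1 CHECK (is_open IN (0, 1))
);
```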

Zombywuf
Mar 29, 2008


Ah, forgot about that. Will multi-column GIN indexes over tsvector and hstore columns work? I've had to do a bunch of really evil poo poo to get full text search working efficiently and I want to try and add features as cleanly as possible.

EDIT, hmmm:
code:
> create index on fulltext using gin (tv, tags);
CREATE INDEX
>  explain select id from fulltext where tv @@ to_tsquery('hello') and tags @> 'tag => value';
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Seq Scan on fulltext  (cost=0.00..20.85 rows=1 width=4)
   Filter: ((tags @> '"tag"=>"value"'::hstore) AND (tv @@ to_tsquery('hello'::text)))
(2 rows)
Postgres, why you seq scan everything!

Zombywuf fucked around with this message at 11:06 on May 4, 2012

Pardot
Jul 25, 2001




Zombywuf posted:

Ah, forgot about that. Will multi-column GIN indexes over tsvector and hstore columns work? I've had to do a bunch of really evil poo poo to get full text search working efficiently and I want to try and add features as cleanly as possible.

EDIT, hmmm:
code:
> create index on fulltext using gin (tv, tags);
CREATE INDEX
>  explain select id from fulltext where tv @@ to_tsquery('hello') and tags @> 'tag => value';
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Seq Scan on fulltext  (cost=0.00..20.85 rows=1 width=4)
   Filter: ((tags @> '"tag"=>"value"'::hstore) AND (tv @@ to_tsquery('hello'::text)))
(2 rows)
Postgres, why you seq scan everything!

Any small table (rows=1 ?) will just be seq scanned, as that's faster than using an index. Or the query planner could have a bad estimate; you can run analyze or vacuum analyze to get better stats.

Perhaps as a cheap and easy way to figure out if any other plans are even possible, try setting enable_seqscan = off.

And also keep in mind postgres can deal with more than one index on a query, you might want to try having separate ones and letting postgres figure it out.
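Putting that together against the fulltext table from the post above, a sketch:

```sql
-- Separate single-column GIN indexes; the planner can combine them
CREATE INDEX ON fulltext USING gin (tv);
CREATE INDEX ON fulltext USING gin (tags);

-- Refresh planner statistics
ANALYZE fulltext;

-- This session only: rule out the seq scan to see what else the planner can do
SET enable_seqscan = off;
EXPLAIN SELECT id FROM fulltext
WHERE  tv @@ to_tsquery('hello') AND tags @> 'tag => value';
SET enable_seqscan = on;
```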

Zombywuf
Mar 29, 2008

Pardot posted:

Any small table (rows=1 ?) will just be seq scanned, as that's faster than using an index. Or the query planner could have a bad estimate; you can run analyze or vacuum analyze to get better stats.

Perhaps as a cheap and easy way to figure out if any other plans are even possible, try setting enable_seqscan = off.

And also keep in mind postgres can deal with more than one index on a query, you might want to try having separate ones and letting postgres figure it out.

I've generally found Postgres's handling of more than one index per query to leave a lot to be desired. Yeah, disabling seqscan showed it's able to do an index lookup, though with a certain amount of time not well accounted for, since the query plan doesn't show the time for the heap lookup.
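(EXPLAIN (ANALYZE, BUFFERS) -- in 9.0 and later -- at least shows actual per-node times and buffer hits, which accounts for some of the heap-fetch cost:)

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM fulltext
WHERE  tv @@ to_tsquery('hello') AND tags @> 'tag => value';
```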

I may have asked this before, is there any way to gain access to Postgres's plan cache? Especially for plpgsql functions.


Sab669
Sep 24, 2009

I've never been great with multi-table statements. How would I write something that pulls everything from 3 different tables?

Table1
Table2 has Table1_PK
Table3 has Table2_PK

I tried this:
code:
SELECT * FROM Table1, Table2, Table3
WHERE Table1.PK = Table2.Table1_PK
OR Table2.PK = Table3.Table2_PK
It returned a few results, but I'm not sure if it's the "proper" result set? I've always hated database stuff :argh:

e; I guess it actually couldn't possibly be the right results -- there aren't enough records in the DB for that result set to make sense.
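For the record, chaining the conditions with AND (or explicit JOIN syntax) ties each Table3 row back through Table2 to Table1; the OR version pairs up unrelated rows, which is why the counts look wrong:

```sql
SELECT *
FROM   Table1
JOIN   Table2 ON Table2.Table1_PK = Table1.PK
JOIN   Table3 ON Table3.Table2_PK = Table2.PK;
```

(Use LEFT JOIN instead if rows without children should still show up.)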

Sab669 fucked around with this message at 16:17 on May 9, 2012
