stuxracer
May 4, 2006

Edit: Missed your edit. I should refresh the page next time :)

Do not use the alias names in your GROUP BY/ORDER BY clauses. Use the same expression/reference you used in the SELECT.

stuxracer fucked around with this message at 21:56 on Jan 23, 2014


Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

xenilk posted:

I am getting an "Invalid use of group function" error, not sure why tho?
It's because SUM((COUNT(ME.Info1) * 2) + COUNT(ME2.Info1)) makes no sense. The SUM function is an aggregate function (just like COUNT) that sums all the rows together, and you can't nest one aggregate inside another. If you're just adding columns (well, technically expressions), the + is all you need.

As for the different kinds of counts, if you join to the events to get goals, and then join again to get assists, you'll essentially end up multiplying goals and assists (i.e., for each goal, you'll join to all assists). So there are a couple of ways to do it. Either do the aggregation before the join:
code:
SELECT P.ID, COALESCE(ME_G.NumGoals,0) AS NumGoals, COALESCE(ME_A.NumAssists,0) AS NumAssists,
COALESCE(ME_G.NumGoals,0) * 2 + COALESCE(ME_A.NumAssists,0) AS `NumPoints`
FROM Players AS P
-- Goals
LEFT OUTER JOIN (
    SELECT ME1.Info1, COUNT(ME1.Info1) AS NumGoals
    FROM Match_Events AS ME1
    WHERE ME1.Type = 1
    GROUP BY ME1.Info1) AS ME_G
  ON ME_G.Info1 = P.ID
-- Assists
LEFT OUTER JOIN (
    SELECT ME2.Info1, COUNT(ME2.Info1) AS NumAssists
    FROM Match_Events AS ME2
    WHERE ME2.Type = 2
    GROUP BY ME2.Info1) AS ME_A
  ON ME_A.Info1 = P.ID
-- No GROUP BY needed: the aggregation already happened inside the subqueries
ORDER BY NumPoints DESC, NumGoals DESC, NumAssists DESC
Or, because in this particular instance, both things are coming from the same table, you can just join once and use CASE statements and SUM to act like a selective count.

code:
SELECT P.ID, SUM(CASE ME.Type WHEN 1 THEN 1 END) AS NumGoals, SUM(CASE ME.Type WHEN 2 THEN 1 END) AS NumAssists,
SUM(CASE ME.Type WHEN 1 THEN 2 WHEN 2 THEN 1 END) AS `NumPoints`
FROM Players AS P
-- Goals and Assists
LEFT OUTER JOIN Match_Events AS ME
  ON ME.Info1 = P.ID
  AND ME.Type IN (1, 2)
GROUP BY P.ID
ORDER BY NumPoints DESC, NumGoals DESC, NumAssists DESC
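
One caveat with the SUM(CASE) version: SUM ignores NULLs, and a SUM over nothing but NULLs is itself NULL, so a player with no goals or assists comes back as NULL rather than 0. If zeros matter, wrapping the sums in COALESCE fixes it (a sketch against the same assumed schema):

code:
SELECT P.ID,
       COALESCE(SUM(CASE ME.Type WHEN 1 THEN 1 END), 0) AS NumGoals,
       COALESCE(SUM(CASE ME.Type WHEN 2 THEN 1 END), 0) AS NumAssists,
       COALESCE(SUM(CASE ME.Type WHEN 1 THEN 2 WHEN 2 THEN 1 END), 0) AS NumPoints
FROM Players AS P
LEFT OUTER JOIN Match_Events AS ME
  ON ME.Info1 = P.ID
  AND ME.Type IN (1, 2)
GROUP BY P.ID
ORDER BY NumPoints DESC, NumGoals DESC, NumAssists DESC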

Jethro fucked around with this message at 23:12 on Jan 23, 2014

PhantomOfTheCopier
Aug 13, 2008

Pikabooze!

xenilk posted:

I would like, for example, to compile all the goals for all the players. I was thinking of doing something with this logic:

Everyone Else posted:

:words:
:psyduck: This was the first thing that ran through my mind. Now I'm terribly afraid that I'm missing something in the original question. Fix table names as you will.
code:
SELECT 
  p.id,e.type,count(e.type)
FROM player p, events e
WHERE
  e.player=p.id
GROUP BY
  p.id,e.type
ORDER BY
  p.id,e.type;

BabyFur Denny
Mar 18, 2003

Jethro posted:

Or, because in this particular instance, both things are coming from the same table, you can just join once and use CASE statements and SUM to act like a selective count.



A bit more elegant (in my opinion) than
code:
SUM(CASE ME.Type WHEN 1 THEN 1 END) AS NumGoals,
is
code:
SUM(ME.Type=1) AS NumGoals,
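
Applied to the whole query, that shorthand would look something like this (MySQL-specific, since it relies on a comparison evaluating to 0/1; same assumed schema as above, and the NULL caveat from earlier applies here too):

code:
SELECT P.ID,
       SUM(ME.Type = 1) AS NumGoals,
       SUM(ME.Type = 2) AS NumAssists,
       SUM(ME.Type = 1) * 2 + SUM(ME.Type = 2) AS NumPoints
FROM Players AS P
LEFT OUTER JOIN Match_Events AS ME
  ON ME.Info1 = P.ID
  AND ME.Type IN (1, 2)
GROUP BY P.ID
ORDER BY NumPoints DESC, NumGoals DESC, NumAssists DESC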

xenilk
Apr 17, 2004

ERRYDAY I BE SPLIT-TONING! Honestly, its the only skill I got other than shooting the back of women and calling it "Editorial".
Thanks guys! All the calculations are done within the same table so those examples help a lot! I ended up using the SUM(CASE) Jethro proposed; it seems to work very nicely!

Centripetal Horse
Nov 22, 2009

Fuck money, get GBS

This could have bought you a half a tank of gas, lmfao -
Love, gromdul
So, I am doing something in a simple but sloppy way, and I am wondering if there is a better alternative.

Let's say I have a column `Average Price` which I want to break down into arbitrary increments for grouping purposes. This is easy with:

... case when p.price between 0 and 0.10 then '0 - .10' when p.price between 0.11 and 0.20 then '.11 - .20' end as `Average Price` ...

However, this obviously gets a bit long-winded if I am dealing with hundreds of possible ranges, or if I want to change the increments/ranges. Is there an easier way to do this, or am I better off just writing a script to go through the database calculating this and adding it to a table?

Edit: Percona MySQL 5.5

BabyFur Denny
Mar 18, 2003
CONCAT(FLOOR(p.price*10)/10, ' - ', CEIL(p.price*10)/10) will give you e.g. '4.1 - 4.2' for all prices in that range.
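
If the increment needs to change often, one option is to pull it out into a user variable so there's only one place to edit (a sketch; the products table name and the 0.10 bucket size are just assumptions):

code:
SET @bucket := 0.10;

-- Label each row with its bucket and count how many fall into each one
SELECT CONCAT(FLOOR(p.price / @bucket) * @bucket,
              ' - ',
              (FLOOR(p.price / @bucket) + 1) * @bucket) AS `Average Price`,
       COUNT(*) AS NumProducts
FROM products AS p
GROUP BY 1;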

vulgey
Aug 2, 2004

Covered in blood and without any clothes. Where is my mother?
Not sure this is the right thread for this kind of question but are there any recommended books or online courses for getting into Oracle?

I'm currently a Unix Sys Admin just finishing up studying for my RHCE, but there's potentially a new role coming up as sort of a "Sorcerer's Apprentice" to our current Oracle wizard. At the moment, if this guy is off sick or on holiday and something goes wrong with our DB, we are hosed. Yeah, this isn't the greatest position to be in, I know. We had another guy that worked with him, but he's been on long-term sick leave for 8+ months; we never got a replacement and now it looks like he's not coming back.

My boss is a firm believer in recruiting from within and moving people around different teams, and has already expressed a desire to move one of us to this new role. I haven't done any DB work since Access at school, and I don't think that really counts, so it needs to be something I can start from the bottom with.

Cheers :)

e. To clarify I'd be doing my usual job but with the extra pay and responsibility of being the backup database guy. I could get by with reading our documentation and just going from that but I'd rather know WHY I'm doing things.

vulgey fucked around with this message at 13:30 on Jan 28, 2014

SheepNameKiller
Jun 19, 2004

I've never liked books or courses; chances are pretty good you'll be better off trying to mentor with the person currently working at your company and supplementing that by looking up terms you don't understand on the internet as you go.

That said, there have been one or two books I've gotten some benefit from in my career; unfortunately they're not specific to Oracle.

fluppet
Feb 10, 2009
I'm just setting up replication on one of our SQL Server DBs and it's breaking our deploy tools, as in the past someone set up the stored procedures to be dropped and recreated every time we deploy any change scripts.

Is there any reasoning behind doing it this way?

Night Shade
Jan 13, 2013

Old School

fluppet posted:

I'm just setting up replication on one of our SQL Server DBs and it's breaking our deploy tools, as in the past someone set up the stored procedures to be dropped and recreated every time we deploy any change scripts.

Is there any reasoning behind doing it this way?

Guessing, but whoever set it up was dumb and didn't know about sp_recompile or DBCC FREEPROCCACHE. The only reason I can think of to drop/create all procedures is to validate that they still actually compile instead of referencing things that no longer exist, and to flush out their query plans. The former should be picked up in build before you even generate a change script, and the latter can be handled for individual procs with sp_recompile and for an entire instance with DBCC FREEPROCCACHE.

Aside: SQL Server is really good at working out when plans need to be dumped and recompiled... within a single database. Had to track down a bug recently where a stored proc in one database, selecting * from a table in a second database, didn't return new columns from the table after a change got deployed. The server was still using the old query plan.
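
For reference, the one-proc and whole-cache versions look like this (a sketch; dbo.SomeProc is a hypothetical name):

code:
-- Mark a single procedure so its plan is recompiled on next execution
EXEC sp_recompile N'dbo.SomeProc';

-- Throw out every cached plan on the instance (heavy-handed; use sparingly)
DBCC FREEPROCCACHE;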

aBagorn
Aug 26, 2004
Ok SQL experts (specifically MS-SQL):

I currently have a Schema with the following tables/columns (relevant columns only):

code:
PriceAgreement
---------------
AgreementID (PK, int)
AgreementName (varchar(200))
StartDate (datetime2(7))
EndDate (datetime2(7))
PriceGuaranteeDate (datetime2(7))
isPriceGuaranteed (bit)


Items
--------------
ProductSKU (PK, varchar(20))


AgreementsItems
-------------------
AgreementID (FK)
ProductSKU (FK)
ItemPrice (money)
StartDate (datetime2(7))
CancelEffectiveDate (datetime2(7))
PastItemPrice (money)
PastItemStartDate (datetime2(7))
Ok, so many to many with items to agreements (they can have different price points based on the agreement).

My conundrum (and I can't mess with the structure of the data unfortunately) is that I need some way to import new agreementitem data. This would involve setting PastItemPrice = ItemPrice and PastItemStartDate = StartDate and then putting in the new values, as well as updating the info in the PriceAgreement table. There's the potential for additions or deletions to the AgreementsItems table. This ideally needs to be automated from what we get from our vendors (excel) and I am completely lost.

ManoliIsFat
Oct 4, 2002

I've never used it personally, but I believe you want a CHECKSUM_AGG http://technet.microsoft.com/en-us/library/ms188920.aspx or CHECKSUM http://technet.microsoft.com/en-us/library/ms189788.aspx to detect changes to the table, if you don't wanna have an "old snapshot value" column or something. But you'll still need some way of saving the previous values of the checksum. Some table that lists AgreementID and Checksum, and then join against that to tell which checksums have changed.

tynam
May 14, 2007

aBagorn posted:

My conundrum (and I can't mess with the structure of the data unfortunately) is that I need some way to import new agreementitem data. This would involve setting PastItemPrice = ItemPrice and PastItemStartDate = StartDate and then putting in the new values, as well as updating the info in the PriceAgreement table. There's the potential for additions or deletions to the AgreementsItems table. This ideally needs to be automated from what we get from our vendors (excel) and I am completely lost.

So you're inserting new records into the AgreementsItems table, and updating the existing data based on what I assume is a unique key pairing of AgreementID and ProductSKU in AgreementsItems?

Off the top of my head...
Throw the Excel data into a temp table, then left join the two tables on AgreementID and ProductSKU (you might have to get the ID based off the name first, depending on whether the ID is stored in the Excel file, but that part is trivial). NULL values on the joined columns will indicate new records to insert. Then do an inner join to get the existing records and just update from the new table to the old. You can use the same code for both insert and update with a change to NULL/NOT NULL, but since you'll need a separate update anyway you might as well reduce the data being pulled.

Not sure what you mean by automation, but you can just run the script whenever you import the new Excel data into the temp table, or handle it any of a variety of ways.
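
Since this is MS-SQL, the insert/update/delete cases on AgreementsItems can also be handled in one statement with MERGE. A sketch along those lines, where #VendorImport is a hypothetical staging table loaded from the Excel file:

code:
MERGE AgreementsItems AS tgt
USING #VendorImport AS src
   ON tgt.AgreementID = src.AgreementID
  AND tgt.ProductSKU = src.ProductSKU
WHEN MATCHED THEN
    -- shift the current values into the Past* columns, then take the new ones
    UPDATE SET tgt.PastItemPrice     = tgt.ItemPrice,
               tgt.PastItemStartDate = tgt.StartDate,
               tgt.ItemPrice         = src.ItemPrice,
               tgt.StartDate         = src.StartDate
WHEN NOT MATCHED BY TARGET THEN
    INSERT (AgreementID, ProductSKU, ItemPrice, StartDate)
    VALUES (src.AgreementID, src.ProductSKU, src.ItemPrice, src.StartDate)
WHEN NOT MATCHED BY SOURCE THEN
    -- assumes the staging table is a full snapshot; scope this if it isn't
    DELETE;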

mobby_6kl
Aug 9, 2009

by Fluffdaddy
Are there any good books (or other resources) focused on designing/programming larger database projects, specifically in T-SQL? My relatively simple procedure that allowed users to filter on some criteria and choose from a couple of possible outputs is slowly turning into an unholy mess of dynamic SQL generation and other crap that I'm afraid will bite me in the rear end in terms of maintenance later. I'm not looking to replicate FactoryFactory patterns or anything, but I'd like to get a better idea of the existing best practices I could use.

A Tartan Tory
Mar 26, 2010

You call that a shotgun?!
Hello, I am an idiot newbie with a very small, yet confusing, question.

quote:

"SELECT RoomNumber, RoomName FROM Vault"

This is my current query. There are 11 rows in the results, but only three different rooms: 1 (named one), 2 (named two) and 3 (named three). I want it to show only unique results. Is there any way to make this query return just three rows (the Room Name and Number for each of the three rooms) instead of 11?

Basically, is there a way to display only unique results using MySQL? I have tried LIMIT with very little success.

Edit: Read through the book again and found the totally forgotten about DISTINCT keyword. Thanks for reading my idiocy!

A Tartan Tory fucked around with this message at 19:06 on Feb 7, 2014

Erwin
Feb 17, 2006

Select distinct:
code:
select distinct roomnumber, roomname from vault
This gives you one row for each unique name/number combo, so if you added a room named "one" with a room number of 2, it would show up as a separate result.

tynam
May 14, 2007

A Tartan Tory posted:

Hello, I am an idiot newbie with a very small, yet confusing, question.


This is my current query. There are 11 rows in the results, but only three different rooms: 1 (named one), 2 (named two) and 3 (named three). I want it to show only unique results. Is there any way to make this query return just three rows (the Room Name and Number for each of the three rooms) instead of 11?

Basically, is there a way to display only unique results using MySQL? I have tried LIMIT with very little success.

code:
SELECT DISTINCT RoomNumber, RoomName FROM Vault
Should work, unless MySQL doesn't use DISTINCT? I only use MS SQL but I would think DISTINCT is ubiquitous enough.

A Tartan Tory
Mar 26, 2010

You call that a shotgun?!
Thanks guys, literally completely forgot about DISTINCT and found it in the documentation just as you mentioned it.

Kilson
Jan 16, 2003

I EAT LITTLE CHILDREN FOR BREAKFAST !!11!!1!!!!111!
I'm using unixODBC and inserting some data where one of the fields has double quotes in it. When the record is inserted, the double quotes have been stripped out, and I can't figure out why. Is there some kind of setting for this, or do I have to escape the double quotes in some particular way?

I'm nearly 100% sure this worked before, but I don't know what (if anything) has changed, besides that this is now running on a new machine. Any ideas?

Deep Dish Fuckfest
Sep 6, 2006

Advanced
Computer Touching


Toilet Rascal
I've got a MySQL/InnoDB question that's half work-related, half for my own knowledge. Here goes:

Whenever you define a primary key in InnoDB, it makes that index a clustered index. Now suppose I insert something in the middle of that index. Then the engine needs to actually move all records that follow the one I'm now inserting if there isn't any "hole" where I'm trying to insert, although realistically I expect it to move all records on the page where it's trying to insert, and then insert a new page between the current one and the next where records that overflow from the current page will go. I haven't been able to find more info about this, though, so I might already be wrong.

If I'm right, then suppose I have a primary key of the form (MainID, SubID), and that I will be inserting rows that have an existing MainID, but a new SubID. This is pretty much the situation I described above where you're inserting things in the middle of a clustered index. If inserts are the dominant operation on the table, then would it ever be worth it to make the primary key/clustered index a dummy auto-incremented value, and make a secondary index on (MainID, SubID) so that all new records are physically inserted after all existing records? On the one hand, this would remove the need to move records, but on the other hand there's another index that needs to be modified whenever inserts happen.

Another thing I'm wondering about in this case is whether or not always inserting at the end of the clustered index would increase lock contention. On inserts, InnoDB locks the value after wherever it's trying to insert, plus the gap between that value and the previous one. In this case, this means each insert would lock the gap at the end of the index and the "+infinity" value, effectively serializing the inserts, whereas if things are being inserted all over the clustered index, I imagine that lock contention would be pretty low. Then again, repeated inserts with an incremented primary key seem like a pretty common pattern, so I'd expect that people who write databases would make sure it performs well, so I don't know if it would really matter.

Obviously, I'm not going to try this without a very good reason to do so, which I'm unlikely to have, and a whole lot of profiling and measurements, but I'd still be interested to know if it makes any sense. Or if there's something completely wrong with my understanding of how databases work.

Bognar
Aug 4, 2011

I am the queen of France
Hot Rope Guy
Quick question: in SQL Server, why does 0xFF = 0xFF00? More importantly, how can I make that not true?

Test case: http://sqlfiddle.com/#!6/948b9/1

Pardot
Jul 25, 2001




Bognar posted:

Quick question: in SQL Server, why does 0xFF = 0xFF00? More importantly, how can I make that not true?

Test case: http://sqlfiddle.com/#!6/948b9/1

Seems like signed integer overflow. I have no idea what SQL Server is doing, but: http://sqlfiddle.com/#!6/948b9/14

code:
SELECT 0x7F    => 127
union all
SELECT 0x80    => -128
union all
SELECT 0x7f80  => 127,-128
union all
SELECT 0xFFFF  => -1,-1

roomforthetuna
Mar 22, 2005

I don't need to know anything about virii! My CUSTOM PROGRAM keeps me protected! It's not like they'll try to come in through the Internet or something!
Not a question at all, but kind of related to that last thing: I recently moved an application to a newer server and discovered it was behaving a bit weirdly; it turns out that the old MySQL treated this situation differently from the new one.

code:
CREATE TABLE bla (x TINYINT UNSIGNED);
INSERT INTO bla SET x=1;
SELECT x-2 FROM bla;
The old one would say the answer is -1 (presumably because x being unsigned doesn't mean x-2 has to be unsigned), but the new one says that x-2 is 18446744073709551615.

Luckily I didn't get any data horribly destroyed by this change; I just had to change some columns from unsigned tinyints to signed smallints to get the behavior back how it was supposed to be. The new way seems rather horrible and dangerous though - if I had been relying on e.g. "SET x=x-1" doing its own clamping at 0, and top-clamping at a value of my choice, I'd have ended up going from an x behaving as expected to an x that wraps around when decrementing but not when incrementing. Yuck.
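
For what it's worth, casting inside the query also sidesteps the unsigned promotion without touching the column definitions (same bla table as above):

code:
-- Returns -1 regardless of how the server promotes unsigned types
SELECT CAST(x AS SIGNED) - 2 FROM bla;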

EAT THE EGGS RICOLA
May 29, 2008

What is the easiest way to read/recover a Text field in SQL Server 2000? It's long enough that Enterprise Manager is showing it as <long text> and Query Analyzer truncates it.
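
One low-tech approach, if you just need to see the contents: SUBSTRING works on TEXT columns in SQL Server 2000 and returns up to 8000 characters per call, so you can page through the value in chunks (a sketch; MyTable, MyTextCol and the ID filter are hypothetical names):

code:
-- DATALENGTH tells you how many 8000-byte chunks you need in total
SELECT DATALENGTH(MyTextCol) AS TotalBytes,
       SUBSTRING(MyTextCol, 1, 8000) AS Chunk1,
       SUBSTRING(MyTextCol, 8001, 8000) AS Chunk2
FROM MyTable
WHERE ID = 42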

PleasantDilemma
Dec 5, 2006

The Last Hope for Peace
I'm having trouble writing a query. I'm not even sure how to start. This is in MySQL 5.6.

Say I've got tables A and B. A has an ID field and the field bCount. For a set of records in table A I want to get the bCount most recent rows in table B. A and B have a one-to-many relationship.

Usually in my code I already have a record from table A and then I can just read the bCount and do a new query like so:
SQL code:
SELECT b.* 
FROM   table_b AS b 
WHERE  b.a_id = id_from_a 
ORDER  BY b.b_date 
LIMIT  count_from_a 
I'm trying to expand this to be based on many rows from A. I don't see how I can dynamically set the limit since it will be different for every row in A.

Let me know if I am explaining this poorly.

mobby_6kl
Aug 9, 2009

by Fluffdaddy
You probably want a window function and a subquery. The former gives you the row number per ID; the latter lets you filter out rows where it's greater than bCount:

code:
SELECT * FROM
(
  SELECT *, ROW_NUMBER() OVER (PARTITION BY a_id ORDER BY b_date DESC) rn
  FROM table_b
) foo JOIN table_a ON whatever
WHERE rn <= bCount
That's T-SQL syntax but there's probably something similar available for you as well.

megalodong
Mar 11, 2008

Bognar posted:

Quick question: in SQL Server, why does 0xFF = 0xFF00? More importantly, how can I make that not true?

Test case: http://sqlfiddle.com/#!6/948b9/1

In T-SQL, constants of the form 0xXXXX... are binary constants, which are varbinary values. You can see this if you try subtracting two of them: http://sqlfiddle.com/#!6/948b9/17

If you have two binary values X and Y, and
  • X is shorter than Y
  • the bytes in X correspond to the bytes in the same position in Y
  • the "extra" bytes in Y are all 0x00 bytes
then it's up to the implementation whether they're considered equal or not. SQL Server considers them equal.

To avoid that, cast them to pretty much any other data type that can hold the values (int, bigint, varchar...) and you'll get the right answer:

http://sqlfiddle.com/#!6/948b9/19
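
Concretely, with the values from the original question (any integer-capable type shows the same thing):

code:
-- varbinary comparison treats the trailing 0x00 as insignificant: 'equal'
SELECT CASE WHEN 0xFF = 0xFF00 THEN 'equal' ELSE 'not equal' END;

-- casting first compares the actual numbers, 255 vs 65280: 'not equal'
SELECT CASE WHEN CAST(0xFF AS int) = CAST(0xFF00 AS int) THEN 'equal' ELSE 'not equal' END;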

PleasantDilemma
Dec 5, 2006

The Last Hope for Peace

mobby_6kl posted:

You probably want a window function and a subquery. The former gives you the row number per ID; the latter lets you filter out rows where it's greater than bCount:

code:
SELECT * FROM
(
  SELECT *, ROW_NUMBER() OVER (PARTITION BY a_id ORDER BY b_date DESC) rn
  FROM table_b
) foo JOIN table_a ON whatever
WHERE rn <= bCount
That's T-SQL syntax but there's probably something similar available for you as well.

Thanks for your help buddy. I've never heard of window functions before. Quick google-fu shows that MySQL does not support window functions :(. I've talked it over with my boss and we went with a query that will get slightly more results than we need and then we filter the results in code. Not the best solution but the data set is expected to be small for this feature so we will roll with it for now. Also now I get to read up on window functions because they sound useful for when I work on projects with a better database.

WHERE MY HAT IS AT
Jan 7, 2011
Is anyone able to tell me where the issue is here? MySQL is giving me a syntax error on the first DECLARE statement for some reason (unexpected declare symbol). It looks fine to me, but I am bad at databases.

code:
DELIMITER //
CREATE PROCEDURE Run_Kanban(durationSeconds INT)
BEGIN
CALL Reset_Kanban();

DECLARE employeeCursor CURSOR FOR SELECT (Employee_ID, Skill_Level) FROM Employee;
DECLARE ID, Skill INT;
OPEN employeeCursor;
...
Any help here would be appreciated! It's probably something embarrassing that I'm missing as is usually the case when I have to ask here for help.
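
(For what it's worth: in MySQL, DECLARE is only allowed at the very top of a BEGIN ... END block, before any other statement such as CALL, and variable declarations must precede cursor declarations. The parentheses around the cursor's SELECT list also shouldn't be there, since they turn it into a row constructor. A reordered sketch of the same procedure:)

code:
DELIMITER //
CREATE PROCEDURE Run_Kanban(durationSeconds INT)
BEGIN
-- All DECLAREs first: variables, then cursors
DECLARE ID, Skill INT;
DECLARE employeeCursor CURSOR FOR SELECT Employee_ID, Skill_Level FROM Employee;

CALL Reset_Kanban();
OPEN employeeCursor;
...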

WHERE MY HAT IS AT fucked around with this message at 03:04 on Feb 15, 2014

Hanpan
Dec 5, 2004

What is the best way to query a table for column changes? I am storing product data (price, stock status, promotion status, etc.) and in order to keep a history I am adding a new row per product whenever the data changes. For example:

code:
Product                Price              In Stock              On Promotion              Timestamp
Some product      £1.99             Yes                     Yes                              Jan 29th 2014
Some Product      £1.99             No                      Yes                              Jan 20th 2014
Some Product      £2.99             Yes                     Yes                              Jan 19th 2014
I want to find out the last time a particular attribute changed, i.e. when was the last time "Some product" went out of stock, or when was the last time "Some product" changed price? I'd also need to be able to run this same query within set time brackets, so: when was the last time "Some product" changed price between 1st Dec 2013 and 20th Jan 2014?

I was thinking of perhaps creating a separate table that logs changes by attribute, but am conscious of duplicating data. Is there a way to accomplish this by querying this table instead, without massive overhead?

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!
There are two ways to do it that I can think of:

In the first way, you give each row an ordinal number on a per-product basis (either when you create the row, or in your query by some method that depends on what RDBMS you're using), and then you join the table to itself by product and previous row. Return the maximum timestamp where the column you're looking for doesn't match the value in the previous row.
code:
select A.Product,
       max(A.Timestamp) LastChange
  from product_data A
       inner join product_data B on A.Product = B.Product
                                         and A.Row = B.Row + 1
 where A.InStock <> B.InStock
group by A.Product
Alternatively, join the table to itself by product and later timestamp. Return the maximum timestamp where the column doesn't match (to get the last entry before a change), and then again join the table to that and get the minimum timestamp that is greater than the timestamp we just got.

code:
select A.Product,
       min(A.Timestamp) LastChange
  from product_data A
       inner join (select C.Product,
                          max(C.Timestamp) MaxTimeStamp
                     from product_data C
                          inner join product_data D on C.Product = D.Product
                                                       and C.Timestamp < D.Timestamp
                                                       and C.InStock <> D.InStock
                   group by C.Product) B on A.Product = B.Product
                                            and A.Timestamp > B.MaxTimeStamp
group by A.Product

big trivia FAIL
May 9, 2003

"Jorge wants to be hardcore,
but his mom won't let him"

Hanpan posted:

What is the best way to query a table for column changes? I am storing product data (price, stock status, promotion status, etc.) and in order to keep a history I am adding a new row per product whenever the data changes. For example:

code:
Product                Price              In Stock              On Promotion              Timestamp
Some product      £1.99             Yes                     Yes                              Jan 29th 2014
Some Product      £1.99             No                      Yes                              Jan 20th 2014
Some Product      £2.99             Yes                     Yes                              Jan 19th 2014
I want to find out the last time a particular attribute changed, i.e. when was the last time "Some product" went out of stock, or when was the last time "Some product" changed price? I'd also need to be able to run this same query within set time brackets, so: when was the last time "Some product" changed price between 1st Dec 2013 and 20th Jan 2014?

I was thinking of perhaps creating a separate table that logs changes by attribute, but am conscious of duplicating data. Is there a way to accomplish this by querying this table instead, without massive overhead?

Turn on and use Change Data Capture (assuming SQL Server). Alternatively, create a ProductStatusAudit table that mirrors the ProductStatus table and insert a row via trigger when something is changed on the ProductStatus table. This is not "duplicating data."

Edit: I didn't realize you were storing all changes in an ordinal fashion in that table, so it becomes a Status and an Audit table in one. That's gross and you shouldn't do that.
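
A minimal version of the trigger approach might look like this (a sketch assuming SQL Server; the table and column names are made up to match the example data):

code:
CREATE TRIGGER trg_ProductStatus_Audit
ON ProductStatus
AFTER UPDATE
AS
BEGIN
    -- 'deleted' holds the pre-update rows, i.e. the values being replaced
    INSERT INTO ProductStatusAudit (Product, Price, InStock, OnPromotion, ChangedAt)
    SELECT d.Product, d.Price, d.InStock, d.OnPromotion, GETDATE()
    FROM deleted AS d;
END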

big trivia FAIL fucked around with this message at 19:25 on Feb 22, 2014

jdi448hsnvb
Dec 6, 2006

asfd
Hey code goons. I'm in a first year comp sci class in which I have to create a program that does cool things. Without going into details, I plan on using my home PC as a server that the program will query. I want to use SQL to manage the data. It'll be between 1-2 gigs, with somewhere in the neighborhood of 1 million rows and 6 columns, all strings. I definitely want to index, as I need fast queries, but I'm a bit confused about it. Assuming I won't need to update anything (once the table is made it will not need to be changed), can/should I index every column? Does this at some point become counterproductive, and if so, when?

Also, I'm using Python to code the entire project. My plan is to use SQLite/Python to build the database and then install the MySQL community server on my desktop to actually make the database accessible over the internet. Is this going to work or am I totally confused?

Thanks in advance

Quebec Bagnet
Apr 28, 2009

mess with the honk
you get the bonk
Lipstick Apathy

dammitcharlie posted:

Hey code goons. I'm in a first year comp sci class in which I have to create a program that does cool things. Without going into details, I plan on using my home PC as a server that the program will query. I want to use SQL to manage the data. It'll be between 1-2 gigs, with somewhere in the neighborhood of 1 million rows and 6 columns, all strings. I definitely want to index, as I need fast queries, but I'm a bit confused about it. Assuming I won't need to update anything (once the table is made it will not need to be changed), can/should I index every column? Does this at some point become counterproductive, and if so, when?

Also, I'm using Python to code the entire project. My plan is to use SQLite/Python to build the database and then install the MySQL community server on my desktop to actually make the database accessible over the internet. Is this going to work or am I totally confused?

Thanks in advance

1) Once the database is built with SQLite why can't you query it locally instead of having to go out over the Internet?
2) Never ever ever ever ever ever ever ever expose an SQL server to the Internet. Build some kind of service (HTTP or otherwise) to act as a frontend if you really need to go over the internet.
3) I don't understand why you want to involve both SQLite and MySQL. Pick one.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

chmods please posted:

1) Once the database is built with SQLite why can't you query it locally instead of having to go out over the Internet?
2) Never ever ever ever ever ever ever ever expose an SQL server to the Internet. Build some kind of service (HTTP or otherwise) to act as a frontend if you really need to go over the internet.
3) I don't understand why you want to involve both SQLite and MySQL. Pick one.

I'd say pick SQLite; no need to mess with MySQL right now. 1 million rows is nothing. If you manage to get a slow query, then figure out how to optimize that specific one. Don't add indexes to everything, or you'll be worse off.

fletcher fucked around with this message at 09:16 on Feb 25, 2014

jdi448hsnvb
Dec 6, 2006

asfd
Sorry, maybe I don't understand quite how this all works, but I can't do it locally because I don't want the program to be so massive; I suppose it could be but my professor said that she'd prefer we keep the programs relatively small.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

dammitcharlie posted:

Sorry, maybe I don't understand quite how this all works, but I can't do it locally because I don't want the program to be so massive; I suppose it could be but my professor said that she'd prefer we keep the programs relatively small.

Your data shouldn't be stored with your code. Here are a couple options I would explore rather than trying to go client/server with MySQL:

* Can the data be generated by a small script so you can populate your database when needed?
* If it's some existing dataset and you can't generate it on the fly, can you just store a CSV of it on a thumbdrive/dropbox/ftp/whatever, and import into SQLite as needed?

jdi448hsnvb
Dec 6, 2006

asfd
The data will be generated by scraping websites over the next few days. So I guess I should scrape the data into a CSV file and then store that CSV with the program, which will load it into SQLite upon starting?


fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

dammitcharlie posted:

The data will be generated by scraping websites over the next few days. So I guess I should scrape the data into a CSV file and then store that CSV with the program, which will load it into SQLite upon starting?

Yup I'd go with something just like that. Bonus points if you can simply specify a URL for the data source and if the data hasn't been downloaded before, it downloads it for you. That way the same steps for running it the first time can be used for running it a second time.

If you have time you might also want to explore having your scraper automatically compress the csv into a zip file, and then your program can uncompress it after it downloads it. I haven't used it before but it looks like Python has a nice interface for working with zip files.
