Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
blacksun
Mar 16, 2006
I told Cwapface not to register me with a title that said I am a faggot but he did it anyway because he likes to tell the truth.
So I have a question for the DB experts here.

I'm a web developer and have been tasked with finding a way to export our stock and quantities from our MS SQL Server 2005 Express database and importing them into our online store (Magento) running on MySQL (in a LAMP stack).

Sadly as Magento's MySQL database has a retarded EAV table structure using a database replication program is likely going to be more trouble than its worth, especially with great tools like this: http://sourceforge.net/apps/mediawiki/magmi/index.php?title=Main_Page

Essentially I'm envisioning the project going something like:

SSIS or ETL software massages and exports needed data into a CSV, which is then uploaded onto our web server, MAGMI is called and the stock or quantities updated. This needs to be both able to be executed in an automated manner for quantity updates, and on call for stock updates (assuming about 1.5 million entries will be too many to be updated on an automated basis, which is probably more a restriction of Magento than SSIS or any ETL software).

Am I better off using a third party ETL or upgrading our SQL Server to 2005 Standard? Upgrading to 2008 probably isn't an option as our server with the DB uses Windows Small Business Server 2003 and I don't think 2008 is supported.

I've so far managed to create almost all of the SQL needed to gather and link all the data I need, however SSIS or any ETL program is rather new to me. Are there any resources I can use to get a better handle on it?

Is it worth forking out the extra money for SSIS over a third party ETL software package? I'll need to sell the bosses on this, so please explain your reasoning! For a third party ETL package I was looking at: http://www.dbsoftlab.com/etl-tools/advanced-etl-processor-enterprise/overview.html

Adbot
ADBOT LOVES YOU

Zombywuf
Mar 29, 2008

blacksun posted:

I'm a web developer and have been tasked with finding a way to export our stock and quantities from our MS SQL Server 2005 Express database and importing them into our online store (Magento) running on MySQL (in a LAMP stack).

The first tool I would reach for is a scripting language (probably Python), bash, awk and cron.

I know people who like SSIS, I'm not one of them. It seems to be more complex than any of the problems it claims to solve.

Cold on a Cob
Feb 6, 2006

i've seen so much, i'm going blind
and i'm brain dead virtually

College Slice
I'm not an ETL guy but we've used SSIS and it works ok for one-shot data migrations (especially if they involve XLS or Access data inputs), but I have some serious issues with it for being part of an ongoing process that will need to be maintained. My biggest issues are it's impossible to merge and diff the files even though it's XML, error handling is annoying, drag and drop programming is SLOW and hides a lot of configuration/process/etc, etc. But if your process is relatively simple it's worth a look.

I'd love to hear more from some ETL guys because I'd like to know what other options are worth looking at in the future.

McGlockenshire
Dec 16, 2005

GOLLOCKS!
Always do the simplest thing that could possibly work. You probably don't need an entire ETL suite to pull this off.

SQL Server Express supports triggers and stored procedures, right?

It should be simple enough to monitor records for the products you care about, stick their identifiers in a queue, then have a script routinely push the changed data upstream.

blacksun
Mar 16, 2006
I told Cwapface not to register me with a title that said I am a faggot but he did it anyway because he likes to tell the truth.
I have very limited experience in Python (I work mostly in PHP) and my SQL usage has been limited at best, hence why a GUI would be preferable for building this package.

Pilsner
Nov 23, 2002

JeansW posted:

Wait, so the "Registered Servers" window from previous years no longer exists? Also, Intellisense will always be poo poo for SQL until the language itself gets rewritten to be similar to LINQ.
I actually just found out about the Registered Servers feature. :downs: It's there, don't worry.

It's not really about the Intellisense being impossible for SQL, it's just Microsoft's version that's crap. For example, when I write "select * from Customer where ", it lists tables, columns, databases, functions and all sorts of stuff equally. 99.9% of the time you want columns there.

Oh well, I installed a free addon called dbForge SQL Complete Express that has amazing, reliable and fast Intellisense, so it's all good now.

blacksun
Mar 16, 2006
I told Cwapface not to register me with a title that said I am a faggot but he did it anyway because he likes to tell the truth.
So we've decided to head in a slightly different direction with the project (for the time being at least).

Instead of synchronising all the stock details etc, we were instead just going to sync quantities in one direction from a MS SQL Server 2005 Express database to a MySQL 5.0.95 database on our hosted web server.

For this project I was looking at OmegaSync to connect, compare and propogate changes from SQL Server to MySQL. However as OmegaSync doesn't allow the execution of SQL to generate the data to compare and synchronise, I need to create a table in our SQL Server database that will be updated with the relevant data as changes occur in the system.

For this sync to occur we need to concatenate two columns together and transform the transactional column that records stock movements in our database into a flat stock holding column.

I have created queries for both of these functions below however I'm unsure as to how I would go about automating these to insert (when new stock is added) and update (when quantities change) to a table.

If I understand this correctly (which I very well may not), this would be best accomplished through some combination of stored procedures and/or triggers?
Am I on the right path regarding this plan?

Could someone point me in the direction of some material for developing a table that is updated based on changes to another table?


Relevant Table Structure

MS-SQL Server 2005 Express

Table
[Style Colour Size]
Columns
[ID] (Primary Key)
[Style ID]
[Size ID]

Table
[Stock Style]
Columns
[ID] (Style Colour Size.Style ID)
[Code] (Needs to be concatenated with Stock Size.Code)

Table
[Stock Size]
Columns
[ID] (Style Colour Size.Size ID)
[Code] (Needs to be concatenated with Stock Style. Code)

Table
[Stock On Hand]
Columns
[Style/Colour/Size ID] (Style Colour Size.ID)
[Quantity] (Transactional column, shows positive and negative stock movements in and out of our system)

View (made by me but potentially unsuitable for final setup)
[stock_on_hand]
Columns
[Style/Colour/Size ID] (Style Colour Size.ID)
[qty] (Shows total stock holding for any given Style/Colour/Size combination)


Query to generate view:

SELECT TOP (100) PERCENT [Style/Colour/Size ID], SUM(Quantity) AS qty
FROM dbo.[Stock On Hand]
GROUP BY [Style/Colour/Size ID]
ORDER BY [Style/Colour/Size ID]


Query to select and concatenate the data from tables and view:

SELECT [ID], [qty],
(SELECT [Stock Style].[Code] + '-' + [Stock Size].[Code]
FROM [CI Fallow].[dbo].[Stock Style], [CI Fallow].[dbo].[Stock Size]
WHERE [Style Colour Size].[Style ID] = [Stock Style].[ID] AND
[Style Colour Size].[Size ID] = [Stock Size].[ID]
FOR XML PATH('') ) AS [SKU]
FROM [CI Fallow].[dbo].[Style Colour Size], [stock_on_hand]
WHERE [Style Colour Size].[ID] = [stock_on_hand].[Style/Colour/Size ID]

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Golbez posted:

Something like ... "SELECT IF(DAYOFWEEK(date) = 7, WEEK(date, 3) - 1, WEEK(date, 3)) AS week_number" ? And use that idea to populate where clauses and the like.

I like the simplicity of this...but it seems too good to be true. Anything involving dates and times inevitably makes my brain hurt. I mean, with a standard for week numbering like ISO-8601, should anything else really be used in the business world?

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine

fletcher posted:

I like the simplicity of this...but it seems too good to be true. Anything involving dates and times inevitably makes my brain hurt. I mean, with a standard for week numbering like ISO-8601, should anything else really be used in the business world?

Though I just realized you'd have to account for the first week of the year; don't want to deal with a 0 week.

mindphlux
Jan 8, 2004

by R. Guyovich
ok, this is really doing my head in. I'm trying to do a very simple query joining two tables where my sort column is an nvarchar that contains 9 digit numbers. So, I'm converting the column to an integer, but it's not sorting. I've tried cast, I've tried convert, I've tried doing the conversion in the ORDER BY code, I just can't figure it out.

code:
SELECT TOP (100) PERCENT dbo.ownersdata.flag,
                         dbo.ownersdata.fein,
                         dbo.storedata.id,
                         dbo.storedata.division,
                         dbo.storedata.market,
                         dbo.storedata.store,
                         dbo.storedata.alpha,
                         dbo.storedata.fein AS StoreFEIN,
                         dbo.storedata.entity,
                         dbo.storedata.address,
                         dbo.storedata.city,
                         dbo.storedata.state,
                         dbo.storedata.zip,
                         dbo.storedata.phone,
                         dbo.storedata.recdweeklymail,
                         dbo.storedata.test,
                         dbo.storedata.ssma_timestamp
FROM   dbo.ownersdata
       RIGHT OUTER JOIN dbo.storedata
                     ON dbo.ownersdata.fein = dbo.storedata.fein
WHERE  ( dbo.ownersdata.flag = 1 )
ORDER  BY Cast('dbo.ownersdata.FEIN' AS INT) DESC

glompix
Jan 19, 2004

propane grill-pilled

mindphlux posted:

ok, this is really doing my head in. I'm trying to do a very simple query joining two tables where my sort column is an nvarchar that contains 9 digit numbers. So, I'm converting the column to an integer, but it's not sorting. I've tried cast, I've tried convert, I've tried doing the conversion in the ORDER BY code, I just can't figure it out.

code:
ORDER  BY Cast('dbo.ownersdata.FEIN' AS INT) DESC

Try removing the quotes from this order by clause. I think you're trying to cast that literal string to an int, which should actually throw a runtime error. At least in SQL Server, you can do the cast in the order by clause, or you can declare a column that does the cast, name it, and then sort it by that name. I.e.:

code:
select balls from farts order by cast(balls as int)
or

code:
select balls, cast(balls as int) as balls_i from farts order by balls_i

mindphlux
Jan 8, 2004

by R. Guyovich
I've done both those approaches - selecting a casted column as an expr, and just doing orderby as a cast without the quotes. I just added the quotes just incase I was being dumb - but both ways it returns an unsorted list. :confused:

banging my head against a wall. it is SQL server I'm using...

Jethro
Jun 1, 2000

I was raised on the dairy, Bitch!

mindphlux posted:

ok, this is really doing my head in. I'm trying to do a very simple query joining two tables where my sort column is an nvarchar that contains 9 digit numbers. So, I'm converting the column to an integer, but it's not sorting. I've tried cast, I've tried convert, I've tried doing the conversion in the ORDER BY code, I just can't figure it out.

code:
SELECT TOP (100) PERCENT dbo.ownersdata.flag,
                         dbo.ownersdata.fein,
                         dbo.storedata.id,
                         dbo.storedata.division,
                         dbo.storedata.market,
                         dbo.storedata.store,
                         dbo.storedata.alpha,
                         dbo.storedata.fein AS StoreFEIN,
                         dbo.storedata.entity,
                         dbo.storedata.address,
                         dbo.storedata.city,
                         dbo.storedata.state,
                         dbo.storedata.zip,
                         dbo.storedata.phone,
                         dbo.storedata.recdweeklymail,
                         dbo.storedata.test,
                         dbo.storedata.ssma_timestamp
FROM   dbo.ownersdata
       RIGHT OUTER JOIN dbo.storedata
                     ON dbo.ownersdata.fein = dbo.storedata.fein
WHERE  ( dbo.ownersdata.flag = 1 )
ORDER  BY Cast(dbo.ownersdata.FEIN AS INT) DESC
Two things:

1) Are you running this query directly, or are you creating a view based on this query and then querying that? If it's the latter, SQL Server will ignore the ORDER BY that is in the view except to determine which rows to return if you use TOP.

2) This has nothing to do with sorting, but either you don't really mean to do a RIGHT OUTER JOIN there, or you need to tweak WHERE ( dbo.ownersdata.flag = 1 ). If you have a condition in the WHERE clause that references the outer table, you effectively turn it into an INNER JOIN because the "missing" data from the outer table will always be null, and null = 1 is not true.

mindphlux
Jan 8, 2004

by R. Guyovich

Jethro posted:

Two things:

1) Are you running this query directly, or are you creating a view based on this query and then querying that? If it's the latter, SQL Server will ignore the ORDER BY that is in the view except to determine which rows to return if you use TOP.

2) This has nothing to do with sorting, but either you don't really mean to do a RIGHT OUTER JOIN there, or you need to tweak WHERE ( dbo.ownersdata.flag = 1 ). If you have a condition in the WHERE clause that references the outer table, you effectively turn it into an INNER JOIN because the "missing" data from the outer table will always be null, and null = 1 is not true.

Thanks for the feedback. I was doing a query of a view, working in SSMS. I'll try it another way and see what happens.

Pilsner
Nov 23, 2002

mindphlux posted:

ok, this is really doing my head in. I'm trying to do a very simple query joining two tables where my sort column is an nvarchar that contains 9 digit numbers. So, I'm converting the column to an integer, but it's not sorting. I've tried cast, I've tried convert, I've tried doing the conversion in the ORDER BY code, I just can't figure it out.

code:
SELECT TOP (100) PERCENT dbo.ownersdata.flag,
                         dbo.ownersdata.fein,
                         dbo.storedata.id,
                         dbo.storedata.division,
                         dbo.storedata.market,
                         dbo.storedata.store,
                         dbo.storedata.alpha,
                         dbo.storedata.fein AS StoreFEIN,
                         dbo.storedata.entity,
                         dbo.storedata.address,
                         dbo.storedata.city,
                         dbo.storedata.state,
                         dbo.storedata.zip,
                         dbo.storedata.phone,
                         dbo.storedata.recdweeklymail,
                         dbo.storedata.test,
                         dbo.storedata.ssma_timestamp
FROM   dbo.ownersdata
       RIGHT OUTER JOIN dbo.storedata
                     ON dbo.ownersdata.fein = dbo.storedata.fein
WHERE  ( dbo.ownersdata.flag = 1 )
ORDER  BY Cast('dbo.ownersdata.FEIN' AS INT) DESC
Have you just tried sorting on the column with no fuss? Integers stored as strings will sort the same as when stored as ints. It will be slow though.

Otherwise, try seeing this post, there must be a usable solution in there:

http://stackoverflow.com/questions/119730/how-do-i-sort-a-varchar-column-in-sql-server-that-contains-numbers

aBagorn
Aug 26, 2004
What's the best way to strip a time from a datetime field to compare to today's date?

We have a logging db that records start time and end time every activity a user performs on our intranet applications and they just asked me to keep a count of each daily, as well as a count of when end time is NULL (in order to keep count of errors). I'm not too good in SQL and I've heard that stripping time is tricky.

Ideally I'd throw this into a stored procedure with some parameters to be able to do all the activities and then use an external app with our custom notification DLL to send emails to users with all the counts

aBagorn fucked around with this message at 14:02 on Oct 23, 2012

JeansW
Aug 4, 2004
Yes. Like the jeans...

aBagorn posted:

What's the best way to strip a time from a datetime field to compare to today's date?

We have a logging db that records start time and end time every activity a user performs on our intranet applications and they just asked me to keep a count of each daily, as well as a count of when end time is NULL (in order to keep count of errors). I'm not too good in SQL and I've heard that stripping time is tricky.

Ideally I'd throw this into a stored procedure with some parameters to be able to do all the activities and then use an external app with our custom notification DLL to send emails to users with all the counts

Which DBMS? If it's SQL Server 2008 (or later) you can simply CAST(date_column AS DATE). If it's SQL Server < 2008 the most efficient way is to use DATEADD(DAY, DATEDIFF(DAY, 0, date_column), 0)

If it's not SQL Server then you need to specify which it is so someone else can chime in.

aBagorn
Aug 26, 2004

JeansW posted:

Which DBMS? If it's SQL Server 2008 (or later) you can simply CAST(date_column AS DATE). If it's SQL Server < 2008 the most efficient way is to use DATEADD(DAY, DATEDIFF(DAY, 0, date_column), 0)

If it's not SQL Server then you need to specify which it is so someone else can chime in.

It's SQL 2008

Am I on the right track with this, then? My SQL is terrible (for what a .NET developer should be)

SQL code:
CREATE PROCEDURE [Logging].[DailyCount]

@pActivity varchar(250)

AS
BEGIN

	SELECT COUNT(*) from
	Logging.ProcessLogging
	where CAST(StartTime AS DATE) = CAST(GETDATE() AS DATE)
	and Activity like '%'+@pActivity+'%'
END
GO

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
And for MySQL all you need, I believe, is DATE(date_column).

Aredna
Mar 17, 2007
Nap Ghost

aBagorn posted:

It's SQL 2008

Am I on the right track with this, then? My SQL is terrible (for what a .NET developer should be)

SQL code:
CREATE PROCEDURE [Logging].[DailyCount]

@pActivity varchar(250)

AS
BEGIN

	SELECT COUNT(*) from
	Logging.ProcessLogging
	where CAST(StartTime AS DATE) = CAST(GETDATE() AS DATE)
	and Activity like '%'+@pActivity+'%'
END
GO

What you have works. I tend to try to avoid modifying the columns in the table for comparison purposes and instead move all of that work to the other side so it's one calculation. Sometimes it doesn't matter, but sometimes it's quite a bit quicker. In this case the change I made below will be logically equivalent and does that.
SQL code:
CREATE PROCEDURE [Logging].[DailyCount]

@pActivity varchar(250)

AS
BEGIN

	SELECT COUNT(*) from
	Logging.ProcessLogging
	where StartTime >= CAST(GETDATE() AS DATE)
	  and StartTime <  CAST(DATEADD(DAY,1,GETDATE()) AS DATE)
	  and Activity like '%'+@pActivity+'%'
END
GO

kalthir
Mar 15, 2012

Not sure if it's any faster than what's already been suggested, but DATEDIFF does what you want.

SQL code:
CREATE PROCEDURE [Logging].[DailyCount]

@pActivity varchar(250)

AS
BEGIN

	SELECT COUNT(*) from
	Logging.ProcessLogging
	where DATEDIFF(day,StartTime, GETDATE()) = 0
	  and Activity like '%'+@pActivity+'%'
END
GO

JeansW
Aug 4, 2004
Yes. Like the jeans...

kalthir posted:

Not sure if it's any faster than what's already been suggested, but DATEDIFF does what you want.

SQL code:
CREATE PROCEDURE [Logging].[DailyCount]

@pActivity varchar(250)

AS
BEGIN

	SELECT COUNT(*) from
	Logging.ProcessLogging
	where DATEDIFF(day,StartTime, GETDATE()) = 0
	  and Activity like '%'+@pActivity+'%'
END
GO

By doing that you lose out on any performance increases from indexes, because your index will be built on the StartTime column, but not the DATEDIFF() function applied to the StartTime column.

What Aredna posted is the more proper way of doing it. Don't apply a function to the column on the table, instead, compare the column on the table to whatever you're doing.

big trivia FAIL
May 9, 2003

"Jorge wants to be hardcore,
but his mom won't let him"

This is driving me bonkers. I'm trying to reconcile a Source->Target migration that has about 5 steps in between and I have to reconcile the target view with the source table structure. Basically I want to see that a table from source has the same # of records as a view from target (given certain conditions and all). I set up a linked server to try to find offending records.

I have 1,672,494 records in my Target view. I have 1,672,522 records in my source. There are 28 source records that are not turning up.

When I try to do some poo poo like

SQL code:
SELECT
  t.col1,
  t.col2,
  t.col3,
  t.col4
FROM source_server.sourcedb.schema.table
WHERE NOT EXISTS(
   SELECT
      v.col1,
      v.col2,
      v.col3,
      v.col4
   FROM target_server.targetdb.schema.view
)

0 results found
I get no results. Maybe this belongs in the coding horrors thread? The actuality of the matter is that I simply cannot reconcile the records in the target view with the source DB, and I'm beginning to think that our developer may have messed up the ETL, because the target view is not built incorrectly.

big trivia FAIL fucked around with this message at 01:47 on Oct 25, 2012

Aredna
Mar 17, 2007
Nap Ghost
Is it possible you have duplicate records in your source table that are being eliminated in the view?
SQL code:
SELECT
  t.col1,
  t.col2,
  t.col3,
  t.col4,
  count(*)
FROM source_server.sourcedb.schema.table
GROUP BY
  t.col1,
  t.col2,
  t.col3,
  t.col4
HAVING count(*) > 1
This would be my first quick check if I'm not expecting duplicates in either data source.

Edit: This should show you exact differences if it's due to dupes in either data source.
SQL code:
WITH t AS
(
  SELECT
    t.col1,
    t.col2,
    t.col3,
    t.col4,
    count(*) as tblcount
  FROM source_server.sourcedb.schema.table t
  GROUP BY
    t.col1,
    t.col2,
    t.col3,
    t.col4
)
,v as
(
  SELECT
    v.col1,
    v.col2,
    v.col3,
    v.col4,
    count(*) as viewcount
  FROM target_server.targetdb.schema.view v
  GROUP BY
    v.col1,
    v.col2,
    v.col3,
    v.col4
)
SELECT *
FROM t
  FULL OUTER JOIN v
     ON t.col1 = v.col1
    AND t.col2 = v.col2
    AND t.col3 = v.col3
    AND t.col4 = v.col4
    AND t.tblcount <> v.viewcount

Aredna fucked around with this message at 02:05 on Oct 25, 2012

big trivia FAIL
May 9, 2003

"Jorge wants to be hardcore,
but his mom won't let him"

Aredna posted:

Is it possible you have duplicate records in your source table that are being eliminated in the view?
SQL code:
SELECT
  t.col1,
  t.col2,
  t.col3,
  t.col4,
  count(*)
FROM source_server.sourcedb.schema.table
GROUP BY
  t.col1,
  t.col2,
  t.col3,
  t.col4
HAVING count(*) > 1
This would be my first quick check if I'm not expecting duplicates in either data source.

I thought of that, but no. It doesn't appear to be so.

[edited b/c i dont know who reads this]

big trivia FAIL fucked around with this message at 02:24 on Oct 25, 2012

kimbo305
Jun 9, 2007

actually, yeah, I am a little mad
I have a query that I think is fairly common. It certainly seems so with our applications.

code:
id | val | ts
---+-----+-------
 a |  10 | 12:01
 a |  12 | 12:05
 a |   9 | 12:15
 b |  30 | 12:03
I want to get the latest timestamped row for each id. The two ways I've done it are:

code:
-- ranking
select
id,
val
from
  (select
   row_number() over (partition by id order by ts desc) as rank,
   id,
   first_value(val) over (partition by id order by ts desc) as val
   from t) ranked
where rank = 1;

-- distincting
select
distinct id, val
from
  (select
   id,
   first_value(val) over (partition by id order by ts desc) as val
   from t) ranked;
I have no idea how well these perform in traditional databases, but they are pretty good in Vertica. So the problem isn't performance, but aesthetics.

When tables get wider, having to write out all the columns with the window clauses is really tedious (even with a window alias) and leads to extremely messy SQL. It just bothers me. Is there some form of the first query that I'm not aware of?

Goat Bastard
Oct 20, 2004

I've never felt the need, and I don't know if this is ok in Vertica, but in Oracle you can do

code:
select
id,
val
from
  (select 
   t.*,
   row_number() over (partition by id order by ts desc) as rank
   from t) ranked
where rank = 1;
if you want. But tbh I'd prefer typing the rows once then copy/pasting, as long as the query is properly formatted and performs well.

I don't have a good reason as to why, but seeing a * in a query that is intended to be more than just scratch makes me uneasy.

InAndOutBrennan
Dec 11, 2008
You could also do:
code:
SELECT id,
	val
FROM t
	INNER JOIN (SELECT id
			,MAX(ts) AS last_ts
			FROM t
			GROUP BY id) t2
		ON t2.last_ts = t.ts
			AND t2.id = t.id
You will get duplicates if you have doubles of the timestamps though. The other solutions will give you one value even if the order is undefined in that case.

kimbo305 posted:

I have a query that I think is fairly common. It certainly seems so with our applications.

edit: loving group by

hangry jeep
Apr 27, 2004

I'm still learning SQL and having a bit of trouble with trying to figure out if this is the right query for this and if it's the most efficient:

code:
SELECT COUNT (bodytext) FROM text_table
WHERE bodytext like '%Television%'
OR bodytext like '%television%'
OR bodytext like '%tv%'
OR bodytext like '%TV%'
What I want to do is find the number of times the word Television or television or tv or TV appears anywhere in the column 'bodytext'. It can be at the beginning of a sentence, in the middle, at the end, or really anywhere. I don't think this is the most efficient way of doing this (or even if this is the right way)? I read something about using regex for it, but I'm not really sure how that would be used either.

glompix
Jan 19, 2004

propane grill-pilled

quimbee posted:

I'm still learning SQL and having a bit of trouble with trying to figure out if this is the right query for this and if it's the most efficient:

code:
SELECT COUNT (bodytext) FROM text_table
WHERE bodytext like '%Television%'
OR bodytext like '%television%'
OR bodytext like '%tv%'
OR bodytext like '%TV%'
What I want to do is find the number of times the word Television or television or tv or TV appears anywhere in the column 'bodytext'. It can be at the beginning of a sentence, in the middle, at the end, or really anywhere. I don't think this is the most efficient way of doing this (or even if this is the right way)? I read something about using regex for it, but I'm not really sure how that would be used either.

What database engine are you using?

In SQL Server, LIKE clauses perform terribly. I think LIKE is case insensitive too, so you're probably doing more work than you need for rows that don't match. If you're using SQL Server, you can add an assembly to your database that has a regex function, and create a scalar function from that. You could also implement full text search, which I think would be most preferable.

Begby
Apr 7, 2005

Light saber? Check. Black boots? Check. Codpiece? Check. He's more machine than kid now.

quimbee posted:

I'm still learning SQL and having a bit of trouble with trying to figure out if this is the right query for this and if it's the most efficient:

code:
SELECT COUNT (bodytext) FROM text_table
WHERE bodytext like '%Television%'
OR bodytext like '%television%'
OR bodytext like '%tv%'
OR bodytext like '%TV%'
What I want to do is find the number of times the word Television or television or tv or TV appears anywhere in the column 'bodytext'. It can be at the beginning of a sentence, in the middle, at the end, or really anywhere. I don't think this is the most efficient way of doing this (or even if this is the right way)? I read something about using regex for it, but I'm not really sure how that would be used either.

The way you have this is not all that efficient as already stated. Also, this is not searching for words but rather any text that has a 't' followed by 'v', you need to change it to '% tv %' to actually get words, then you need to account for stuff like '% TV.%' where it could show up at the end of a sentence, then what if it has a comma after it. Right now you will get false positives on stuff like 'www.somesite.tv' or 'outvalue' or 'postvasectomy' etc.

Is there any way you could select all the records and do a loop and a regular expression? That might be a bit quicker if you can't get your db optimized or if your db does not have regex. Regular expressions will be a lot better at handling a case where you want to capture a tv or television with a space in front, followed by a space, or period, or comma, or semicolon, etc. With LIKE you will have to account for every case separately.

butt dickus
Jul 7, 2007

top ten juiced up coaches
and the top ten juiced up players
Not sure what you're using, but this works in MySQL:
SQL code:
SELECT COUNT(*) FROM text_table
WHERE
  bodytext RLIKE '[[:<:]]television[[:>:]]'
OR
  bodytext RLIKE '[[:<:]]tv[[:>:]]';
the [[:<:]] and [[:>:]] are word boundaries like \b in standard regex.

Cold on a Cob
Feb 6, 2006

i've seen so much, i'm going blind
and i'm brain dead virtually

College Slice
If you are on SQL Server and need high performance string matching, I agree that CLR is generally the way to go and it will scale much better as well. Don't use a cursor though, and don't expect miracles as I believe your code will still be doing a full table scan.

I haven't tried this package out yet but I plan on trying this to do some performance optimizations on the applications I am currently supporting, rather than writing the CLR assemblies myself:
http://www.sqlsharp.com/

If you use a full-text index in SQL Server you need to use the proper full-text searching functionality instead of LIKE (i.e. CONTAINS, FREETEXT, etc), so if this is something that needs to perform well and you have the disk to burn on a full text index it's worth a shot.

hangry jeep
Apr 27, 2004

The database is hive. I don't really have the option of changing how it's setup, I'm just suppose to run the query for it. :/

Newf
Feb 14, 2006
I appreciate hacky sack on a much deeper level than you.
*homework time*

I've got a database table called 'Department' and a text file called 'Department_data.txt'. I want to import the data from the text file with a single command.

code:
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| dname        | varchar(20) | YES  |     | NULL    |       |
| dnumber      | int(11)     | NO   | PRI | 0       |       |
| mgrsin       | int(11)     | YES  | MUL | NULL    |       |
| mgrstartdate | varchar(20) | YES  |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+
code:
Research 5 333445555 1995-05-22 
Administration 4 987654321 1995-01-01 
Headquarters 1 888665555 1981-06-19 
What I've got so far is:

code:
LOAD DATA LOCAL INFILE 'Department_data.txt' INTO TABLE Department LINES TERMINATED BY '\r\n';
which yields:
code:
SELECT * FROM Department;[s][/s]
+----------------------+---------+--------+--------------+
| dname                | dnumber | mgrsin | mgrstartdate |
+----------------------+---------+--------+--------------+
| Research 5 333445555 |       0 |   NULL | NULL         |
+----------------------+---------+--------+--------------+
1 row in set (0.00 sec)
So it's taking the first 20 characters and assigning them to dname, instead of taking each 'word' and assigning them to each value. I've been poking around google, but I think this is probably one of those things that's easier to ask a person about.

Help me?

Golbez
Oct 9, 2002

1 2 3!
If you want to take a shot at me get in line, line
1 2 3!
Baby, I've had all my shots and I'm fine
Try adding FIELDS TERMINATED BY ' ' to it.

kimbo305
Jun 9, 2007

actually, yeah, I am a little mad

Goat Bastard posted:


code:
select
id,
val
from
  (select 
   t.*,
   row_number() over (partition by id order by ts desc) as rank
   from t) ranked
where rank = 1;
I don't have a good reason as to why, but seeing a * in a query that is intended to be more than just scratch makes me uneasy.
Interesting -- I guess I don't need to do FIRST_VALUE for any of the value columns. Just rank is fine. After reviewing the query plan for this new instance, I don't think I'd be gaining much, since the partitions are already sorted the right way. But I'll give that a try (removing first_value, not going to .*).

To give back a little bit -- Vertica's #1 mission is load speed and storage performance. So it implements UPDATE in a terrible way that totally unsuited for OLTP work. A single-row update on a biggish table might take the better half of a second. Otherwise, we'd just be UPDATEing primary key id rows.

kimbo305 fucked around with this message at 07:16 on Oct 26, 2012

Newf
Feb 14, 2006
I appreciate hacky sack on a much deeper level than you.

Golbez posted:

Try adding FIELDS TERMINATED BY ' ' to it.

Ace, thanks. Ended up having to remove the lines terminated by clause too, as it turns out.

lord funk
Feb 16, 2004

Is there a way to introduce a generic object in Java, later decide what it is, then call functions to it? Here is the use case I'm trying to figure out:


Java code:
void drawLotsOfThings() {
  
  Object obj;
  
  int whichOne = int(random(3));
  switch(whichOne) {
    case 0:
      obj = new Circle(10);
    break;
    case 1:
      obj = new Triangle(10);
    break;
    case 2:
      obj = new Rectangle(10);
    break;
    default:
    break;
  }
  
  obj.drawInFrame();
}
(I'm working in Processing, in case that helps / matters).

Adbot
ADBOT LOVES YOU

Janitor Prime
Jan 22, 2004

PC LOAD LETTER

What da fuck does that mean

Fun Shoe
That's a basic OOP principal called Polymorphism. The idea is that you don't use a generic Object but rather a specific type with an abstract method (called drawInFrame in your case)that each subtype (Circle, Triangle, Rectangle in your case) implements.

Java code:
void drawLotsOfThings() {
  
  Shape shape;
  
  int whichOne = int(random(3));
  switch(whichOne) {
    case 0:
      shape= new Circle(10);
    break;
    case 1:
      shape= new Triangle(10);
    break;
    case 2:
      shape= new Rectangle(10);
    break;
    default:
    break;
  }
  
  shape.drawInFrame();
}

public interface Shape {
    public void drawInFrame();
}
public class Circle implements Shape{
    @Override
    public void drawInFrame(){
      do some drawing stuff;
    }
}

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply