Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
Little Brittle
Nov 1, 2004

Come visit me dawg
I have a table set up like this:

code:
--
-- Table structure for table `product_feed`
--

`product_feed` (
  `sku` varchar(24) NOT NULL,
  `brand` varchar(48) NOT NULL,
  `prodname` varchar(48) NOT NULL,
  `link` text NOT NULL,
  `keywords` text NOT NULL,
  `date` varchar(24) NOT NULL,
  `category` varchar(24) NOT NULL,
  UNIQUE KEY `sku` (`sku`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
How would I create efficient queries and indexes for partial matches? Say I wanted to search the 'brand,prodname,keywords' fields for keyword 'Sony'. Ideally, it would return all SKUs that contain that keyword in any of those three fields. I can't find a nice way to get partial matches without doing a full text search (MATCH AGAINST). Is there a more efficient way of doing it? How would I index the fields to put the least strain on the server?

Adbot
ADBOT LOVES YOU

Little Brittle
Nov 1, 2004

Come visit me dawg
I'm trying to build a search based on tags, but I'm having trouble running queries on blog posts matching multiple tags while returning the total matches. Here is my table layout and some sample data:

code:
TABLE "posts"

id  title             content           time
--  --------          -----------       ----
12  my blog title 1   bla bla bla...   1218826136
13  my blog title 2   bla bla bla...   1218826136
14  my blog title 3   bla bla bla...   1218826136


TABLE "tags"

id  name
--  -----
45  cars  
46  trucks   
47  motorcycles 


TABLE "tagmap"

id   tag_id   post_id
--   ------   -------
156  46       12
157  46       14
158  45       12
159  47       13
Say someone does a tag search for "cars trucks". How can I run a query that a) returns a limit of 20 posts that are tagged with both tags, and b) return the total number of results for pagination purposes? Is there a way to do this in a single query? Right now I'm using a query like this, but I can't find a way to both limit the results and count the total possible rows:

code:
Query for "bookmark+webservice+semweb"
SELECT posts.*
FROM tagmap, posts, tags
WHERE tagmap.tag_id = tags.id
AND (tags.name IN ('bookmark', 'webservice', 'semweb'))
AND posts.id = tagmap.post_id
GROUP BY posts.id
HAVING COUNT( posts.id )=3

Little Brittle
Nov 1, 2004

Come visit me dawg
I'm having trouble doing this query in MySQL via PHP. I keep getting this error:
#1054 - Unknown column 'posts.user' in 'on clause',
The query works if I put single quotes around 'posts.user' in the LEFT JOIN, but it no longer pulls in the correct value. Could someone help me out with this?

code:
SELECT SQL_CALC_FOUND_ROWS posts.*, users.username
FROM posts, tagmap, tags
LEFT JOIN users ON users.id = posts.user
WHERE tagmap.tag_id = tags.id
AND tags.name = 'funny'
AND posts.id = tagmap.post_id
GROUP BY posts.id
LIMIT 20 

Little Brittle
Nov 1, 2004

Come visit me dawg

Victor posted:

Do you dislike inner join for some reason? Strictly speaking, you could swap posts with tags in the from clause, but that's hacky and makes things brittle. Just use inner joins.
Thanks for the help. I only used LEFT JOIN because it's what I know. If inner join is better, I'd gladly use that. What would be the most efficient way to write this query, if I threw mine in the trash?

Little Brittle
Nov 1, 2004

Come visit me dawg
How would you design a MySQL table to deal with a Twitter-style follow relationship? I already have a users table, so I imagine a table like this mapping two user IDs together:

Table: followers
code:
  following_user, followed_user
  18              349
  265             48             
  4668            681
  48              265
Is it efficient to insert a new row for every follow, even if it is just reciprocating an existing one (see row 2 and 4)? If so, how would you query a mutual relationship?

Little Brittle
Nov 1, 2004

Come visit me dawg

Triple Tech posted:

I wouldn't be so concerned about efficiency... (optimizing prematurely?) And how else would you do it relationally?

At the very least there would be two indexes defined. 1) A,B 2) B,A (to make reverse lookups as fast)
That makes sense. I'm not overly concerned about efficiency, I just want to make sure there isn't a better way to do it. If I do go with that format, is there a way to query all a user's followers while having a way to determine whether or not it is mutual? For example, I want to create a page displaying all of my followers and have a "follow" link only on users that I am not following.

My Followers:
Frank (you are following Frank)
Tom (you are following Tom)
Bill (click here to follow Bill)
Mike (you are following Mike)

Little Brittle
Nov 1, 2004

Come visit me dawg

Triple Tech posted:

code:
select -- untested
  fan          = a.following_user,
  is_following = b.following_user is not null

from followers a

left join followers b
 on a.following_user = b.followed_user
and b.following_user = 12345

where a.followed_user = 12345 -- the combination to the airlock and luggage
That did the job, thanks for the help.

Little Brittle
Nov 1, 2004

Come visit me dawg
I'm trying to figure out a query to pull similar articles based on the highest number of matching tags.

Here is some example MYSQL table data:
code:
   TABLE posts
id       | title 
-----------------------------
6        |  Amazing Stories
7        |  Interesting Finds
8        |  This Car is Cool

  TABLE tags
id       | Name
-----------------------------
12       |  weird
48       |  funny
49       |  interesting
50       |  sad
51       |  video
123      |  image

  TABLE tagmap
post_id  | tag_id
-----------------------------
6        |  48
6        |  49
7        |  123
7        |  50
7        |  51
7        |  12
8        |  48
8        |  49
8        |  50
Let's say posts.id = 8. How could I create a query to return 3 similar posts that share tags with posts.id 8? I would like to order results by the highest number of shared tags first. I would also like to exclude posts.id 8 from the results. Could anyone help with this? It seems easy enough, but I can't get the results I want from my testing.

Little Brittle fucked around with this message at 21:46 on Jun 13, 2009

Little Brittle
Nov 1, 2004

Come visit me dawg

camels posted:

this query (untested) retrieves all the posts that share tags with postid 8:

This part varies depending on what DBMS and version is used.
Thanks for the help. I'm using MySQL 5.0.22

Little Brittle
Nov 1, 2004

Come visit me dawg

camels posted:

Works perfectly, thanks for the help.

Little Brittle
Nov 1, 2004

Come visit me dawg
How can I run a MySQL select that searches a column for names that do not start with an alphanumeric character? Here is a terrible example of what I'm trying to do:
code:
SELECT * FROM things WHERE name !LIKE '%(a-zA-Z0-9)'

Little Brittle
Nov 1, 2004

Come visit me dawg
All of the above solutions work, thanks. I'll do some benchmarking to see which one is quickest.

Little Brittle
Nov 1, 2004

Come visit me dawg
I am helping with an ecommerce site that will be using a drop-ship supplier. This supplier offers a product feed with 60,000 SKUs for inventory data that is updated every 10 minutes. When a product is discontinued, it disappears from the feed. There are no timestamps for updated products.

I'm trying to figure out the best MySQL database design that allows for quick inventory updates every 10 minutes, and also has some sort of mechanism for removing items that no longer appear in the feed. The bad way would be to set all stock to zero before importing and all items that exist in the feed would overwrite with their stock quantities. The problem with that is there are 60k SKUs, and anyone trying to place an order while the feed is processing would get a cart error during checkout.

What is the proper way to do this? My goal is to get the lowest possible processing time and avoid 120k SQL queries every 10 minutes, while making sure that discontinued items are removed ASAP.

Adbot
ADBOT LOVES YOU

Little Brittle
Nov 1, 2004

Come visit me dawg

Triple Tech posted:

Why don't you just delete the item from the table? If you want to keep a table of product information and also keep track of stock (which may be zero), make a new table.

What exactly is the bottleneck here? That seems like such a small table.
I need to keep old products in the database for reporting purposes. I'm only looking to change the product status to unavailable if it doesn't appear in the feed. My problem is finding the most logical way to select which products exist in the database but do not exist in the latest feed. I realize this isn't a huge amount of data, but we are working with limited server resources and I'd rather not be running hundreds of thousands of queries every 10 minutes.

Sprawl posted:

I assume you have some sort of transactional table, where you would be able to do something like a lock and a set of crap no?

You could start with a lock then and update setting all the inventories to 0 and then run a set of updates for each inventory item with the correct levels and then unlock the table.
That was my first thought, I was just hoping for ways to achieve the same effect without having to run an update query for every SKU in the feed. Ideally, I'd like to find out a way to do this that only ran an update on SKUs whose inventory has changed and products which no longer exist in the feed. I'm not too hot with the ins and outs of table locking. If someone attempts to place an order while the inventory table is locked, won't that trigger an error?

dcallen posted:

If you want to avoid recalculating all of the entries you can have a delta table with all of the inventory changes since the last update. If you only sold 3 units of item 'A', you would be able to process the delta table and only update that single row.
That gets me thinking. I could create 2 tables in addition to the master product table, called 'inventory_in' and 'inventory_out'. I could populate the tables with the most recent feed and the one previous to it. I can select inventory changes, new items, and removed items easily by comparing the two, while not touching the master product table with hundreds of thousands of old SKUs.

Or am I just retarded and way over-thinking this?

Little Brittle fucked around with this message at 00:18 on Dec 15, 2009

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply