  • Locked thread
Jo
Jan 24, 2005

:allears:
Soiled Meat
EDIT: I am dumb.


Jo
Jan 24, 2005

:allears:
Soiled Meat

King Gonorrhea posted:

There may be a better thread for this but I'm hoping to just have a question answered and I think experienced app builders might know best.

My Dad wants to build/commission an image recognition app where you would take a picture, and then your phone serves up a bunch of information on what was in your picture. He has asked me to research how an app could access current databases of knowledge to assist in this, what is required etc.

I've attempted looking around but I haven't found enough information to tell me whether he's even asking the right question or not, so, at a very high level, is there any special access required for an app to consult google or wiki for information? What other storehouses of knowledge are there that may be useful in this?

Thanks for your time, a high level quick answer is fine.

Let's break this problem into parts. I'm assuming that you're speaking of an Android app because I don't know the iPhone process very well, but there will be minimal differences in the grand scheme of things. If you mean a desktop app, that will change things. If you mean a web app, that will change things.

Your app will probably...
Take a picture.
Probably the easiest part. On Android, you specify an intent and it acquires a picture using the camera. Easy peasy.

Perform an object selection in the scene.
This requires either a user interaction to specify the items in the scene via touching/clicking, or solving the figure-ground and image segmentation problems. Clicking on an image is pretty easy and not very error prone. If he _doesn't_ want that, you have to deal with figure-ground/image segmentation. Both of these are open problems in computer science and have been for probably 40 years. To give some sense of their difficulty, a team of three postdocs, a professor, and I spent well over a year on the problem with minimal success. This will pretty much require a heuristic, but what that heuristic is depends on what the ultimate goal is. If you just want one specific object in the scene and info about it, then user input is the way to go. There is more recent research on multi-object classification, and it is currently regarded as state of the art. I'll dig up some citations if I can. This will _not_ run on a phone. At best, you can push the image to a web server and have that process the image, then return some classes for items in the scene along with bounding boxes. Hinton and LeCun are currently world leaders in this. You're not going to get any sort of universality here. Every item will be in one of perhaps 128 categories. I'll talk more about this at the end of my post.
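To make the "user input plus server-side bounding boxes" idea concrete, here is a minimal sketch in Python. It assumes the server has already returned labelled boxes; the `Detection` type, its field names, and `pick_detection` are all hypothetical names made up for illustration, not part of any real API. Given a tap, it picks the smallest box containing that point, on the assumption that the tightest box is the object the user meant.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    """One labelled bounding box, as a (hypothetical) server might return it."""
    label: str
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        # True if the tapped pixel falls inside this box.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)

    def area(self) -> int:
        return self.width * self.height


def pick_detection(detections: List[Detection],
                   tap_x: int, tap_y: int) -> Optional[Detection]:
    """Return the smallest box containing the tap, or None if the tap misses."""
    hits = [d for d in detections if d.contains(tap_x, tap_y)]
    if not hits:
        return None
    return min(hits, key=lambda d: d.area())
```

Preferring the smallest box is only a heuristic: a tap on a bicycle that a person is riding lands inside both boxes, and the smaller one is usually the better guess.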
For each object, perform either...
- An exact visual search
which would be useful if you wanted to identify one _exact_ object, like, for instance, the Mona Lisa.
- OR perform a category lookup from the label
on Wikipedia, which would be useful if you want to know about an object with many different appearances, like a bicycle or a monkey or a sea-shell. This one would require some amount of image understanding, which is a more difficult and still unsolved problem. (See the Hinton paper at the end.)
This is the second easiest part. Wikipedia allows you to download the entire text corpus. The full revision history comes to multiple terabytes uncompressed, but I think the current article text alone compresses down to around ten gigs. You're best off just doing a search using one of the many publicly available APIs. I'd recommend a Google search for [source of information] API. If that's not available, check whether the source has a feed available, like an RSS feed. That will be easier to parse.
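As a rough illustration of the category lookup, here is a minimal Python sketch that asks the public MediaWiki API on English Wikipedia for the plain-text intro of the article matching a label. The helper names (`build_summary_url`, `parse_summary`, `lookup`) and the User-Agent string are my own assumptions for the example. It also assumes the classifier's label matches an article title closely enough for the `redirects` option to resolve it, which will not always hold.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_summary_url(label: str) -> str:
    """Build a query for the plain-text intro of the article named `label`."""
    params = {
        "action": "query",
        "prop": "extracts",   # article text extracts
        "exintro": 1,         # intro section only
        "explaintext": 1,     # plain text instead of HTML
        "redirects": 1,       # follow redirects such as "Bike" -> "Bicycle"
        "format": "json",
        "titles": label,
    }
    return API_URL + "?" + urlencode(params)


def parse_summary(payload: dict) -> Optional[str]:
    """Pull the first non-empty extract out of an API response, if any."""
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        if "missing" not in page and page.get("extract"):
            return page["extract"]
    return None


def lookup(label: str) -> Optional[str]:
    """Fetch the intro text for `label` over the network (not run here)."""
    request = Request(build_summary_url(label),
                      headers={"User-Agent": "image-info-sketch/0.1 (example)"})
    with urlopen(request, timeout=10) as response:
        return parse_summary(json.load(response))
```

Calling `lookup("Bicycle")` should return the opening paragraphs of that article as one string, or `None` when no matching page exists.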

EDIT:
Here's a list of standard image classification databases used in academic research. Many of them are free to download. http://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm
Note that not all of them have ground-truth information available, and many of them are either gigantic or feature specific. There's not a lot of cross-domain success right now. If you train on one style of data, that's the style that gets used.

I've used the PASCAL and CalTech-256 databases before, and I think they're probably closest to what you have in mind. Take a look at the papers from the winners of yesteryear. http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2012/index.html and http://www.vision.caltech.edu/Image_Datasets/Caltech256/

This paper from Geoff Hinton's group (Krizhevsky, Sutskever, and Hinton) is a turning point for modern computer vision. It really spurred a huge resurgence of interest in the industry. http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf Skip to the end and check out the results.

EDIT 2:
This app http://camfindapp.com/ is actually probably along the lines of what you want. I've not tried it, but it may suit what your father is looking for.

Jo fucked around with this message at 21:47 on Jun 12, 2014

Jo
Jan 24, 2005

:allears:
Soiled Meat

ElPipiripau posted:

Hi! I'm a Premiere Pro user. As you may know, sometimes you have to render what you have already edited in your timeline. This can take a few minutes and, here's the request, I would like a tiny app to watch CPU usage when it is more than, say, 80% [user specified] and if it is this way for more than %seconds%, chime a %sound% when CPU goes back down to < 10% of usage (so i can know when rendering is finished).

It would be wonderful if I could define which applications (like Premiere or After Effects) must be running for this app to play the sound.

Thanks in advance!

Are you a Windows user? What version? Do you have Java or Python installed, or do you want a native app? Can it run from the console or do you want something with a GUI?

Jo
Jan 24, 2005

:allears:
Soiled Meat

ElPipiripau posted:

Sorry, yes, i'm a Windows 7 user. I have Java installed but a native app would be better. A simple GUI would be great and able to reside on traybar (where Windows clock is located). Thanks.

I haven't had the time to take a pass at this yet, but I still would like to. (Not sure if I'll get to it tonight.) If the Shutter setup doesn't work for you and you still want a half-assed solution, PM me.
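For reference, the logic in the request boils down to a small state machine: arm once CPU usage has stayed above a high threshold for long enough, then chime the first time it falls below a low threshold. Below is a minimal Python sketch of just that logic, not a finished tray app. `RenderWatcher` and `watch` are hypothetical names; the polling loop assumes the third-party `psutil` package is installed and uses the Windows-only `winsound` module, and it watches overall CPU rather than a specific application.

```python
from typing import Optional


class RenderWatcher:
    """Signals once when CPU drops low after a sustained busy period."""

    def __init__(self, high: float = 80.0, low: float = 10.0,
                 hold_seconds: float = 30.0) -> None:
        self.high = high                  # "busy" threshold, percent
        self.low = low                    # "finished" threshold, percent
        self.hold_seconds = hold_seconds  # how long busy must last to arm
        self._busy_since: Optional[float] = None
        self._armed = False

    def update(self, cpu_percent: float, now: float) -> bool:
        """Feed one sample; return True when the chime should play."""
        if cpu_percent >= self.high:
            if self._busy_since is None:
                self._busy_since = now
            if now - self._busy_since >= self.hold_seconds:
                self._armed = True
            return False
        # Usage is below the high threshold, so the busy streak is over.
        self._busy_since = None
        if self._armed and cpu_percent < self.low:
            self._armed = False
            return True
        return False


def watch(poll_seconds: float = 1.0) -> None:
    """Poll system-wide CPU forever and beep when a render seems done."""
    import time
    import psutil    # third-party: pip install psutil
    import winsound  # standard library, Windows only

    watcher = RenderWatcher()
    while True:
        # cpu_percent blocks for `interval` seconds, then returns a percentage.
        cpu = psutil.cpu_percent(interval=poll_seconds)
        if watcher.update(cpu, time.monotonic()):
            winsound.MessageBeep()
```

Keeping the thresholds in a class separate from the polling loop means the arm/fire behaviour can be checked with made-up samples, without rendering anything.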

Jo
Jan 24, 2005

:allears:
Soiled Meat

Heran Bago posted:

I'm looking for a software solution to listen for a specific sound clip, and if detected perform another action such as open a file.

This isn't a "tiny" app by any means from a programming standpoint, I just wonder if something like this is already out there. Googling is coming up dry.

If this is a game I have, I might be able to look for the trigger. What game and what event? Otherwise, I might be able to make something that monitors the system audio out. What OS?
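If it does come down to monitoring audio, one simple approach (my assumption, not something tested against any particular game) is normalized cross-correlation: slide the reference clip along the captured samples and look for a window whose shape matches regardless of volume. The sketch below is plain Python over lists of samples; `best_match`, `clip_detected`, and the 0.9 threshold are hypothetical choices for illustration, and capturing the system audio itself is not covered.

```python
import math
from typing import Sequence, Tuple


def best_match(signal: Sequence[float],
               clip: Sequence[float]) -> Tuple[int, float]:
    """Return (offset, score) of the window in `signal` most similar to `clip`.

    The score is the normalized cross-correlation, in the range [-1, 1].
    """
    n = len(clip)
    if n == 0 or len(signal) < n:
        raise ValueError("clip must be non-empty and no longer than signal")

    # Remove the mean so a constant offset does not affect the match.
    clip_mean = sum(clip) / n
    clip_c = [c - clip_mean for c in clip]
    clip_norm = math.sqrt(sum(c * c for c in clip_c))

    best_offset, best_score = 0, -1.0
    for offset in range(len(signal) - n + 1):
        window = signal[offset:offset + n]
        w_mean = sum(window) / n
        w_c = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(w * w for w in w_c))
        if clip_norm == 0 or w_norm == 0:
            score = 0.0  # silence or a flat clip cannot match meaningfully
        else:
            dot = sum(a * b for a, b in zip(w_c, clip_c))
            score = dot / (w_norm * clip_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


def clip_detected(signal: Sequence[float], clip: Sequence[float],
                  threshold: float = 0.9) -> bool:
    """True if some window of `signal` correlates with `clip` above threshold."""
    return best_match(signal, clip)[1] >= threshold
```

This brute-force loop is far too slow for long recordings at real sample rates. A practical version would likely use FFT-based correlation or spectral fingerprints, and would need to tolerate background noise mixed over the clip.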
