Discendo Vox
Mar 21, 2013

I'm a complete outsider interested in gaining an overview of OCR methods and applications; there's a specific OCR application that I think would be unusually straightforward and doable, and which I also think has already been accomplished with high fidelity in a nearly identical setting. I unfortunately can't be specific, but it's along the lines of "right now there are tools out there for reading product names and prices from grocery receipts, and we want a tool that can do the same for bookstore receipts." The real applications are actually much closer together and more stable than that example. There's also a preexisting, fully labeled dataset, which I believe is quite large (six figures), that could be used to train and refine such an application. The endpoint would be a site where users upload an image of the text, the output is generated immediately for their review/error checking, and then the result is stored.

Everything I see and read makes this seem like it would be (relatively) simple, but at the same time, I know I'm close to clueless about any of this stuff, and I'm really just trying to get a sense of how costly/feasible actually doing it would be. Is there a good resource for really intro-level stuff that would give me a sense of how difficult building this out would actually be?
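For a rough sense of what that upload-and-review endpoint could look like, here is a minimal sketch assuming a Python stack (Flask for the web layer, Pillow and pytesseract for the OCR step). The route name, form field, and file handling are invented for illustration, and the storage step is left out.

```python
# Hypothetical sketch: accept an uploaded image, run Tesseract over it,
# and return the raw text for the user to review before anything is stored.
# Assumes Flask, Pillow, and pytesseract are installed; names are illustrative.
from flask import Flask, request, jsonify
from PIL import Image
import pytesseract

app = Flask(__name__)

@app.route("/extract", methods=["POST"])
def extract():
    upload = request.files["image"]             # single image in the "image" form field
    image = Image.open(upload.stream)           # decode the upload in memory
    text = pytesseract.image_to_string(image)   # OCR the whole page
    return jsonify({"text": text})              # hand back for review/error checking

if __name__ == "__main__":
    app.run()
```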


Discendo Vox
Mar 21, 2013


cinci zoo sniper posted:

Talking about real-life application targets for this:

1) Would the text there be printed by a machine exclusively?

2) Would it all be in English?

3) Could these parcels of text be considered documents (i.e. clean background, standard font, focus on legibility)?

4) Would they be described well by a low number of templates?
1) Yes

2) No; a very small fraction, probably ~1%, will be duplicated in another language, and in theory there will be a handful not in English at all. It should be possible to exclude these from the intended use and/or from the training set.

3) Yes.

4) I'm not certain what you mean by "template", but the images would come in a handful of standardized forms, maybe six, with text in consistent relative positions and sizes. The theoretical training data is not currently labeled to separate these formats.
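To make the "template" idea concrete, here is a hypothetical sketch of one common approach: for each standardized form, define the text regions as fractions of the image size, crop each region, and OCR it separately. The field names and coordinates below are invented, and pytesseract/Pillow are assumed.

```python
# Hypothetical sketch: per-template regions given as fractions of image size,
# cropped and OCR'd individually. Field names and coordinates are made up.
from PIL import Image
import pytesseract

# (left, top, right, bottom) as fractions of width/height for one form layout.
TEMPLATE_A_REGIONS = {
    "title": (0.05, 0.02, 0.95, 0.10),
    "body":  (0.05, 0.12, 0.95, 0.85),
}

def read_fields(path, regions):
    image = Image.open(path)
    w, h = image.size
    fields = {}
    for name, (l, t, r, b) in regions.items():
        crop = image.crop((int(l * w), int(t * h), int(r * w), int(b * h)))
        fields[name] = pytesseract.image_to_string(crop).strip()
    return fields

print(read_fields("scan.png", TEMPLATE_A_REGIONS))
```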


Discendo Vox
Mar 21, 2013

Thanks, these responses are very helpful. I need to figure out how to persuade the relevant stakeholders that this would be similarly straightforward.

Jabor posted:

Have you tried just pointing Tesseract at your data and seeing how well it does?

I literally do not know how to do that. I do know that one implementation of the highly similar use case I mentioned uses Tesseract.
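For what it's worth, "pointing Tesseract at the data" can be as small as the sketch below, assuming the tesseract binary plus the pytesseract and Pillow Python packages are installed; "sample.png" is a stand-in for one of the real images. The command-line equivalent is `tesseract sample.png stdout`.

```python
# Minimal Tesseract test run; "sample.png" stands in for a real image.
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("sample.png"))
print(text)
```

Running this over a few dozen of the real images and eyeballing the output is usually enough to see whether off-the-shelf Tesseract is in the right ballpark before anyone commits to building anything.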

