Discendo Vox
Mar 21, 2013

I'm a complete outsider interested in gaining an overview of OCR methods and applications; there's a specific OCR application that I think would be unusually straightforward and doable, and which I also think has already been accomplished with high fidelity in a nearly identical setting. I unfortunately can't be specific, but it's along the lines of "right now there are tools out there for reading product names and prices from grocery receipts, and we want a tool that can do the same for bookstore receipts." The real applications are actually much closer together and more stable than that example. There's also a preexisting, fully labeled dataset, which I believe is quite large (six figures), that could be used to train and refine such an application. The endpoint would be a site where users upload an image of the text, the output is generated immediately for their review/error checking, and then the result is stored.

Everything I see and read makes this seem like it would be (relatively) simple, but at the same time, I know I'm close to clueless about any of this stuff, and I'm really just trying to get a sense of how costly/feasible actually doing it would be. Is there a good resource for really intro-level stuff that would give me a sense of how difficult building this out would actually be?
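For a rough sense of what that upload-and-review endpoint could look like, here is a minimal sketch assuming a Python stack (Flask for the web layer, Pillow and pytesseract for the OCR step). The route name, form field, and file handling are invented for illustration, and the storage step is left out.

```python
# Hypothetical sketch: accept an uploaded image, run Tesseract over it,
# and return the raw text for the user to review before anything is stored.
# Assumes Flask, Pillow, and pytesseract are installed; names are illustrative.
from flask import Flask, request, jsonify
from PIL import Image
import pytesseract

app = Flask(__name__)

@app.route("/extract", methods=["POST"])
def extract():
    upload = request.files["image"]             # single image in the "image" form field
    image = Image.open(upload.stream)           # decode the upload in memory
    text = pytesseract.image_to_string(image)   # OCR the whole page
    return jsonify({"text": text})              # hand back for review/error checking

if __name__ == "__main__":
    app.run()
```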


Discendo Vox
Mar 21, 2013


cinci zoo sniper posted:

Talking about real-life application targets for this:

1) Would the text there be printed by a machine exclusively?

2) Would it all be in English?

3) Could these parcels of text be considered documents (i.e. clean background, standard font, focus on legibility)?

4) Would they be described well by a low number of templates?
1) Yes

2) No; a very small fraction, probably ~1%, will be duplicated in another language, and in theory there will be a handful not in English at all. It should be possible to exclude these from the intended use and/or from the training set.

3) Yes.

4) I'm not certain what you mean by "template", but the images would come in a handful of standardized forms, maybe six, with text in consistent relative positions and sizes. The theoretical training data is not currently labeled to separate these formats.
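To make the "template" idea concrete, here is a hypothetical sketch of one common approach: for each standardized form, define the text regions as fractions of the image size, crop each region, and OCR it separately. The field names and coordinates below are invented, and pytesseract/Pillow are assumed.

```python
# Hypothetical sketch: per-template regions given as fractions of image size,
# cropped and OCR'd individually. Field names and coordinates are made up.
from PIL import Image
import pytesseract

# (left, top, right, bottom) as fractions of width/height for one form layout.
TEMPLATE_A_REGIONS = {
    "title": (0.05, 0.02, 0.95, 0.10),
    "body":  (0.05, 0.12, 0.95, 0.85),
}

def read_fields(path, regions):
    image = Image.open(path)
    w, h = image.size
    fields = {}
    for name, (l, t, r, b) in regions.items():
        crop = image.crop((int(l * w), int(t * h), int(r * w), int(b * h)))
        fields[name] = pytesseract.image_to_string(crop).strip()
    return fields

print(read_fields("scan.png", TEMPLATE_A_REGIONS))
```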


Discendo Vox
Mar 21, 2013

Thanks, these responses are very helpful. I need to figure out how to persuade the relevant stakeholders that this would be similarly straightforward.

Jabor posted:

Have you tried just pointing Tesseract at your data and seeing how well it does?

I literally do not know how to do that. I do know that one implementation of the highly similar use case I mentioned uses Tesseract.
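For what it's worth, "pointing Tesseract at the data" can be as small as the sketch below, assuming the tesseract binary plus the pytesseract and Pillow Python packages are installed; "sample.png" is a stand-in for one of the real images. The command-line equivalent is `tesseract sample.png stdout`.

```python
# Minimal Tesseract test run; "sample.png" stands in for a real image.
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("sample.png"))
print(text)
```

Running this over a few dozen of the real images and eyeballing the output is usually enough to see whether off-the-shelf Tesseract is in the right ballpark before anyone commits to building anything.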

