KICK BAMA KICK
Mar 2, 2009

Thanks all, json sounds like the winner. Kinda forgot it existed cause I've never actually built anything with it before.


Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

I try to avoid pickle.

There are serious security implications. It's true you don't have to worry about that if you are, and always will be, in control of where the stuff you're unpickling comes from. However, in my experience, even if you think you are and always will be in control... that often doesn't remain the case.
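A small sketch of why: unpickling calls whatever callable an object's `__reduce__` returns, so a crafted payload runs code the moment you call `pickle.loads`. The `Sneaky` class below is a hypothetical, deliberately harmless stand-in (it only evaluates `6 * 7`); a real payload could just as easily call `os.system`. JSON, by contrast, only ever decodes to plain data.

```python
import json
import pickle


class Sneaky:
    """Hypothetical payload: pickle stores 'call eval with these args' instead of data."""

    def __reduce__(self):
        # Harmless here, but any importable callable works, e.g. os.system.
        return (eval, ("6 * 7", {}))


payload = pickle.dumps(Sneaky())

# Loading does not rebuild a Sneaky; it calls eval("6 * 7", {}) during loads.
result = pickle.loads(payload)
print(result)  # 42

# json.loads can only produce dicts, lists, strings, numbers, booleans and None.
data = json.loads('{"name": "widget", "count": 3}')
print(data)
```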

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
I have some pictures of things with text on them and I want to write a thing to extract that text. The text is simple and short, a few words at most, not a big page from a novel or anything, and it's always in a predictable location in every image.

Has anyone done something like this and can point me towards the right packages to use? Is this just considered OCR or should I be looking for "machine vision"?

Sad Panda
Sep 22, 2004

I'm a Sad Panda.
Look for Tesseract OCR.
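Since the text always sits in the same spot, cropping to that region before handing it to Tesseract usually helps. A minimal sketch, assuming the `pytesseract` wrapper, Pillow and the Tesseract binary are installed; `fractional_box`, `read_label` and the default region are hypothetical names and values you'd adjust for your own images.

```python
def fractional_box(width, height, left, top, right, bottom):
    """Convert a region given as fractions of the image size into pixel coords,
    in the (left, upper, right, lower) order that PIL's Image.crop expects."""
    return (round(left * width), round(top * height),
            round(right * width), round(bottom * height))


def read_label(path, region=(0.10, 0.80, 0.90, 0.95)):
    # Imported here so fractional_box works without these packages installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        box = fractional_box(img.width, img.height, *region)
        crop = img.crop(box).convert("L")  # grayscale often helps Tesseract
    # --psm 7: treat the cropped image as a single line of text
    return pytesseract.image_to_string(crop, config="--psm 7").strip()
```

Expressing the region as fractions rather than pixels means the same setting keeps working if the images come in at slightly different resolutions.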

the yeti
Mar 29, 2008

memento disco



I have a set of forms with typed field names and handwritten values; if I wanted to OCR them rather than do data entry manually, would Tesseract + the python interface for it be my best bet?

the yeti fucked around with this message at 16:43 on Apr 30, 2019

mbt
Aug 13, 2012

the yeti posted:

I have a set of forms with typed field names and handwritten values; if I wanted to OCR them rather than do data entry manually, would Tesseract + the python interface for it be my best bet?

Yeah

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

If you don't mind doing HTTP requests, the Google Vision API will likely do the best job.
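For the plain-HTTP route, here's a stdlib-only sketch against the Vision REST endpoint (`images:annotate`) using an API key. `DOCUMENT_TEXT_DETECTION` is the feature Google documents for dense text and handwriting. The helper names and the API-key auth are my assumptions; a service-account setup would look different, and real code should check for an `error` field in the response.

```python
import base64
import json
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_bytes):
    """JSON body for a single image, asking for DOCUMENT_TEXT_DETECTION."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
        }]
    }


def detect_text(path, api_key):
    """Send one image file to the Vision API and return the full recognized text."""
    with open(path, "rb") as f:
        body = json.dumps(build_request(f.read())).encode("utf-8")
    req = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # fullTextAnnotation is missing when no text was found
    return result["responses"][0].get("fullTextAnnotation", {}).get("text", "")
```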

shrike82
Jun 11, 2005

Tesseract's models for vision are quite dated. You're better off feeding the images into a Google, Microsoft etc. online CV API.

unpacked robinhood
Feb 18, 2013

by Fluffdaddy
I used tesseract to turn the Mueller report into a text file, it worked real fine

QuarkJets
Sep 8, 2008

Yeah, dated doesn't necessarily mean bad, but you may get better performance on weird edge cases with newer machine vision tools.

shrike82
Jun 11, 2005

I'd suggest trying out samples of your handwriting on Tesseract and an online API service.

It's really a night and day thing and this specific domain is an area where deep learning techniques have shown themselves to perform better without any caveats. There's a reason why digit recognition (MNIST) is treated as the "hello world" equivalent for AI.
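To make that comparison less eyeball-driven, you can score each engine's output against a hand-typed transcription using character error rate: edit distance divided by the length of the reference text. A plain-Python sketch; the function names are my own.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = cur
    return prev[-1]


def char_error_rate(predicted, truth):
    """Edits needed per reference character; 0.0 means a perfect read."""
    if not truth:
        raise ValueError("reference text must not be empty")
    return levenshtein(predicted, truth) / len(truth)


# e.g. one wrong character out of five gives a CER of 0.2
print(char_error_rate("he1lo", "hello"))
```

Averaging this over a handful of your own samples for each engine gives you a number to compare instead of a gut feeling.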

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Yeah, modern handwriting recognition (and "regular" OCR) has made real advances over older stuff.

Of course, if the older stuff is getting 99.999% on your text there's not much use in trying to get something newer unless you're doing tons of recognition.

(This reminds me of something I've complained about before. All the OCR packages you can get for general office work, like ABBYY FineReader and other stuff for managing paperwork, are stuck on tech from eons ago. I wish someone would come out with something modern.)

KICK BAMA KICK
Mar 2, 2009

So the OCR posts reminded me I was meaning to look into it for a thing I'm building and I installed Tesseract and followed a guide to see a minimal example of how it works, tested it against a few images in the domain I'm working on, and OK cool, far from perfect but definitely could be useful.

Then I went to the Google Vision API page and threw some of the same images into its demo and oh my god this is witchcraft.

KingNastidon
Jun 25, 2004
Are there any good packages for spitting out simple pandas dataframes in html format? I've been using altair a lot recently to create quick interactive visualizations + export to html, but don't think there's functionality to display the underlying data in a browser friendly format. Basic functionality like searching or sorting would be great. Don't need to host it on the web and the source data is static. My customers are reluctant to open xls files, much less fiddle with filters.

larper
Apr 9, 2019

KingNastidon posted:

Are there any good packages for spitting out simple pandas dataframes in html format? I've been using altair a lot recently to create quick interactive visualizations + export to html, but don't think there's functionality to display the underlying data in a browser friendly format. Basic functionality like searching or sorting would be great. Don't need to host it on the web and the source data is static. My customers are reluctant to open xls files, much less fiddle with filters.

drat altair looks sweet. The dataframe method .to_html() will produce a simple, unstyled html table. From there you can use the DataTables jQuery plugin to add sorting and filters. You'll probably need to make header and footer files of html/JS/CSS to accomplish all the formatting stuff, though.
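Roughly what that looks like, as a sketch: `to_html()` with a `table_id` so the script can find the table, wrapped in a single page that pulls jQuery and DataTables from a CDN. The CDN URLs and version numbers are assumptions (pin whatever release you actually use), and `dataframe_page` is a hypothetical helper name.

```python
import html

import pandas as pd

# Header/footer in one template; CDN URLs and versions are assumptions.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>__TITLE__</title>
<link rel="stylesheet" href="https://cdn.datatables.net/1.13.6/css/jquery.dataTables.min.css">
<script src="https://code.jquery.com/jquery-3.7.0.min.js"></script>
<script src="https://cdn.datatables.net/1.13.6/js/jquery.dataTables.min.js"></script>
</head>
<body>
__TABLE__
<script>
  // Turn the plain table into a sortable, searchable one
  $(document).ready(function () { $('#data').DataTable(); });
</script>
</body>
</html>
"""


def dataframe_page(df, title="Data"):
    # table_id lets the script find the table; "display" is a DataTables styling class
    table = df.to_html(table_id="data", classes="display", index=False, border=0)
    return PAGE.replace("__TITLE__", html.escape(title)).replace("__TABLE__", table)


if __name__ == "__main__":
    df = pd.DataFrame({"customer": ["Acme", "Globex"], "sales": [1200, 850]})
    with open("table.html", "w", encoding="utf-8") as f:
        f.write(dataframe_page(df, title="Sales"))
```

The output is one static file you can email around; it needs internet access for the CDN scripts unless you download and inline them.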

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

KICK BAMA KICK posted:

So the OCR posts reminded me I was meaning to look into it for a thing I'm building and I installed Tesseract and followed a guide to see a minimal example of how it works, tested it against a few images in the domain I'm working on, and OK cool, far from perfect but definitely could be useful.

Then I went to the Google Vision API page and threw some of the same images into its demo and oh my god this is witchcraft.

I was gonna look into setting up a server to do this for what I had in mind remotely, but if google gives me 1000 free things a month then witchcraft it is.

Proteus Jones
Feb 28, 2013



Boris Galerkin posted:

I was gonna look into setting up a server to do this for what I had in mind remotely, but if google gives me 1000 free things a month then witchcraft it is.

Same.

I have a ton of personal documents I was going to scan in and use a commercial OCR on the PDFs to index them, but with 1000 free a month I may try to whip up a script to process the scan folder and compare.
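Something along these lines would do for the scan-folder idea: walk the folder, skip anything already processed, and stop at a budget so you stay inside the free tier. The `ocr` argument is any function that takes a path and returns text (a Tesseract or Vision API wrapper, say). The `.txt` sidecar convention and the 1000 default are assumptions, and the script only picks up image files, so PDF scans would need a different path.

```python
from pathlib import Path
from typing import Callable, List

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def process_scans(folder, ocr: Callable[[Path], str], budget: int = 1000) -> List[Path]:
    """OCR each image in `folder` that has no .txt sidecar yet, making at most
    `budget` ocr() calls. Returns the image paths processed on this run."""
    done: List[Path] = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        sidecar = path.with_suffix(".txt")
        if sidecar.exists():
            continue  # already handled on an earlier run
        if len(done) >= budget:
            break  # out of budget; the next run picks up from here
        sidecar.write_text(ocr(path), encoding="utf-8")
        done.append(path)
    return done
```

Running each engine into a separate folder copy (or with a different sidecar suffix) would let you diff the outputs side by side.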

KingNastidon
Jun 25, 2004

The Xkdc Larper posted:

drat altair looks sweet. The dataframe method .to_html() will produce a simple, unstyled html table. From there you can use the DataTables jQuery plugin to add sorting and filters. You'll probably need to make header and footer files of html/JS/CSS to accomplish all the formatting stuff, though.

Thank you, DataTables looks awesome and I can definitely use this. I want to be half as smart and ambitious as the github folks that link together this poo poo. For this project I'm trying to find something that's pretty bolt-on given I mostly do excel and rudimentary data work in python, but know nothing about html/JS/CSS other than kind of understanding what's going on.

What I'm trying to do is basically create a lazy (or poor man that can't get funding) Tableau or Veeva for CRM. Create points on a map, allow end users to click on that point, and spit out data associated with that point. For example, customers/sales data/internal employees responsible. Like, imagine this combined with this where selecting a map point on the left would spit out summarized data tables on the right.

This is probably begging for a web app that can query the selection from the underlying database in real time, but I live in a flat-file csv/xls/ppt world.

shrike82
Jun 11, 2005

There's an amazing Python ebook bundle on Humble bundle - https://www.humblebundle.com/books/python-oreilly-books

Fluent Python alone is worth the price of entry.

Umbreon
May 21, 2011

shrike82 posted:

There's an amazing Python ebook bundle on Humble bundle - https://www.humblebundle.com/books/python-oreilly-books

Fluent Python alone is worth the price of entry.

Would you mind explaining why?

shrike82
Jun 11, 2005

Most Python guides are for beginners, Fluent Python is targeted at intermediate to even advanced developers, examining the language in depth. It does a good job discussing development best practices, explaining language plumbing, as well as providing practical code walk-through of major use cases/packages.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Yeah, Fluent Python is really good.

By the time I ever read it, I already knew most of what was in it, but it still was helpful to remind me about all of the cool and useful ways to do stuff in Python.

Also in that bundle, Think Python is a good book for learning how to program if you're a newbie.

Thermopyle fucked around with this message at 13:52 on May 2, 2019

PBS
Sep 21, 2015

KingNastidon posted:

Thank you, DataTables looks awesome and I can definitely use this. I want to be half as smart and ambitious as the github folks that link together this poo poo. For this project I'm trying to find something that's pretty bolt-on given I mostly do excel and rudimentary data work in python, but know nothing about html/JS/CSS other than kind of understanding what's going on.

What I'm trying to do is basically create a lazy (or poor man that can't get funding) Tableau or Veeva for CRM. Create points on a map, allow end users to click on that point, and spit out data associated with that point. For example, customers/sales data/internal employees responsible. Like, imagine this combined with this where selecting a map point on the left would spit out summarized data tables on the right.

This is probably begging for a web app that can query the selection from the underlying database in real time, but I live in a flat-file csv/xls/ppt world.

Would Apache Superset fit your bill at all?

larper
Apr 9, 2019

KingNastidon posted:

What I'm trying to do is basically create a lazy (or poor man that can't get funding) Tableau or Veeva for CRM. Create points on a map, allow end users to click on that point, and spit out data associated with that point. For example, customers/sales data/internal employees responsible. Like, imagine this combined with this where selecting a map point on the left would spit out summarized data tables on the right.

You could probably hack this together with Altair, using an event listener to feed json to a DataTables object, if you absolutely need to make flat files. That said, there are far better ways to make a data dashboard, but they require a server running a web stack.

SurgicalOntologist
Jun 17, 2004

Bokeh can probably do that without a server.
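A rough sketch of how that could look, assuming Bokeh is installed: the scatter plot and the `DataTable` share one `ColumnDataSource`, so tapping or box-selecting points highlights the matching table rows, and `file_html` renders everything into one static HTML string with no server. The data and column names are made up, and filtering the table down to only the selected rows would need a `CustomJS` callback, which isn't shown here.

```python
from bokeh.embed import file_html
from bokeh.layouts import row
from bokeh.models import ColumnDataSource, DataTable, TableColumn
from bokeh.plotting import figure
from bokeh.resources import CDN

# Hypothetical example data: one row per map point
source = ColumnDataSource(data={
    "lon": [-73.99, -87.63, -122.42],
    "lat": [40.73, 41.88, 37.77],
    "customer": ["Acme", "Globex", "Initech"],
    "sales": [1200, 850, 430],
})

plot = figure(width=400, height=300, tools="tap,box_select,reset",
              title="Click or box-select points")
plot.scatter(x="lon", y="lat", size=12, source=source)

# Same source object, so a selection in the plot is mirrored in the table
table = DataTable(source=source, width=400, height=300, columns=[
    TableColumn(field="customer", title="Customer"),
    TableColumn(field="sales", title="Sales"),
])

# Standalone page; BokehJS itself is loaded from the CDN
html = file_html(row(plot, table), CDN, "Poor man's dashboard")

if __name__ == "__main__":
    with open("dashboard.html", "w", encoding="utf-8") as f:
        f.write(html)
```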

larper
Apr 9, 2019
:stare: That library seems extremely good

KICK BAMA KICK
Mar 2, 2009

A few months ago there was a Humble Bundle of Python books from Packt that I bought but never got around to cracking into much, and I remember there was one book that the experienced people here actively disliked -- was it Clean Code or Python 3 OOP or some other one?

Umbreon
May 21, 2011

shrike82 posted:

There's an amazing Python ebook bundle on Humble bundle - https://www.humblebundle.com/books/python-oreilly-books

Fluent Python alone is worth the price of entry.

I ordered this, and I got the receipt email from PayPal, but humble bundle never sent me anything? Is that how it normally works? Do I have to go to the website or something?

shrike82
Jun 11, 2005

If you login to your accounts page, you should be shown a bundle page with a bunch of links to download the ebooks in various formats. Enjoy!

NinpoEspiritoSanto
Oct 22, 2013




KICK BAMA KICK posted:

A few months ago there was a Humble Bundle of Python books from Packt that I bought but never got around to cracking into much, and I remember there was one book that the experienced people here actively disliked -- was it Clean Code or Python 3 OOP or some other one?

I hope it wasn't Clean Code, because the Packt clean code and FP stuff I've worked through has been great.

Proteus Jones
Feb 28, 2013



shrike82 posted:

If you login to your accounts page, you should be shown a bundle page with a bunch of links to download the ebooks in various formats. Enjoy!

Yeah, I can still access books I bought years ago through my account. Check that first.

shrike82
Jun 11, 2005

As an aside, do you guys learn mainly from written stuff (books, articles etc.)?
I was just musing with some colleagues that younger developers tend to be more comfortable watching videos to learn about technical stuff. Some of what I've seen them watch is deep dives, not just 101 stuff. The flip side is they don't touch (O'Reilly or whatever) books at all, sticking to online stuff.

NinpoEspiritoSanto
Oct 22, 2013




shrike82 posted:

As an aside, do you guys learn mainly from written stuff (books, articles etc.)?
I was just musing with some colleagues that younger developers tend to be more comfortable watching videos to learn about technical stuff. Some of what I've seen them watch is deep dives, not just 101 stuff. The flip side is they don't touch (O'Reilly or whatever) books at all, sticking to online stuff.

I'm self taught and embrace a variety of sources. I may or may not be representative.

Proteus Jones
Feb 28, 2013



Bundy posted:

I'm self taught and embrace a variety of sources. I may or may not be representative.

:same:

Reference books were the last thing I kept using dead-tree versions for, but now that I have a 34” ultra wide monitor, I’ve fully converted to having various eBooks and webpages on the standby in the right side of the screen. Still *tons* of real-estate for PyCharm.

The Fool
Oct 16, 2003


I absolutely prefer reading to videos. Don't do dead trees anymore though, all online. Usually whatever random blog I ended up at after a half-assed Google search.


I find that the blog's inaccuracies force me to think harder, which is better for my learning process.

death cob for cutie
Dec 30, 2006

dwarves won't delve no more
too much splatting down on Zot:4
I can't learn via video about programming; really, I can't learn from video on any topic at all. I have pretty bad ADHD, so I'll have to rewatch a video 4-6 times to catch it all - more if there's lovely editing, weird jumps, etc. On the other side, I have a ridiculously fast reading speed and I don't have to pause/scrub through a video to get back to what I last understood if I missed something - I just flick my eyes back or go back a page. Video tutorials have gotten better than they used to be, but they're still awful IMO.

Methanar
Sep 26, 2013

by the sex ghost
if I can't grep it, it's garbage

dougdrums
Feb 25, 2005
CLIENT REQUESTED ELECTRONIC FUNDING RECEIPT (FUNDS NOW)
I can't stand watching videos to learn stuff. I think it's a combination of impatience and the desire to skip around. I still keep books around for some niche or real general reference topics. If I need to learn something new I just look at the documentation online or some examples on github. Books cost money and go out of date too quickly nowadays.

qsvui
Aug 23, 2003
some crazy thing
Books are great at providing detailed reasoning and examples that docs or blogs can't or won't do. Besides, if you're making computer toucher money, I don't think spending money on books is too big of a burden :shrug:.


duck monster
Dec 15, 2004

CarForumPoster posted:

I do a lot of pickle reading/writing and csv reading/writing with the pandas implementation of each. Pickle is an order of magnitude faster, would take a pickle every time. (Bonus that it handles mixed data types)

Pickle is amazingly fast, but it needs to be handled with caution. Changing schemas can leave pickled data in a pretty broken state, and letting an untrusted third party write that pickled data can open up all sorts of gnarly whoops. My basic rule is that if I'm just storing data for retrieval by an adjoining script, or whatever, it's a good choice. For anything customer-facing, or anything that still needs to be readable more than a month from now, use something else.
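To illustrate the mixed-types point with made-up data: a round trip through the stdlib `pickle` (which is what `DataFrame.to_pickle` uses under the hood) brings back the datetime and categorical dtypes intact, while a CSV round trip re-infers types from text, so the date column comes back as strings unless you pass `parse_dates`. The usual caveat still applies: only unpickle data you wrote yourself.

```python
import io
import pickle

import pandas as pd

df = pd.DataFrame({
    "when": pd.to_datetime(["2019-05-01", "2019-05-02"]),
    "region": pd.Categorical(["east", "west"]),
    "sales": [1200, 850],
})

# CSV round trip: everything is text on disk, types are re-inferred on load
csv_back = pd.read_csv(io.StringIO(df.to_csv(index=False)))

# Pickle round trip: the DataFrame object, dtypes included, is restored as-is
pkl_back = pickle.loads(pickle.dumps(df))

print(csv_back.dtypes)
print(pkl_back.dtypes)
```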

Oh, and fun fact: Eve Online's wire format for its MachoNet was pretty much just pickled objects. That's where all those Python injection hacks came from. It blew my mind that we were messing with that poo poo for *years* before they caught on. (I think I can mention this now; I haven't played Eve in 5-6 years.)
