|
Thanks all, json sounds like the winner. Kinda forgot it existed cause I've never actually built anything with it before.
|
# ? Apr 29, 2019 16:08 |
|
|
I try to avoid pickle. There are serious security implications. It's true you don't have to worry about that if you are and always will be in control of where the stuff you're unpickling comes from. However, it's my experience that even if you think you are and always will be in control... that often doesn't remain the case.
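To make the security point concrete, here is a minimal sketch of why unpickling untrusted bytes is risky. The `Sneaky` class is hypothetical and deliberately harmless: its `__reduce__` tells pickle to call `int("42")` at load time, but any importable callable could be named there instead. `json.loads`, by contrast, only ever builds plain dicts, lists, strings, numbers, booleans and `None`.

```python
import json
import pickle


class Sneaky:
    """Hypothetical class: on load, pickle calls whatever __reduce__ names."""

    def __reduce__(self):
        # (callable, args): the unpickler will run int("42").
        # An attacker could name a far less friendly callable here.
        return (int, ("42",))


payload = pickle.dumps(Sneaky())

# The loader runs the callable and hands back its result,
# not a Sneaky instance.
result = pickle.loads(payload)

# json can only produce basic data types; nothing gets called.
parsed = json.loads('{"answer": 42}')
```

The same mechanism that makes this example return an integer is what lets a crafted pickle run arbitrary code, which is why the pickle documentation warns against loading data you do not trust.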
|
# ? Apr 29, 2019 16:08 |
|
I have some pictures of things with text on them and I want to write a thing to extract that text. The text is simple and short, few words at most, not a big page from a novel or anything, and all the text is in a predictable location in every image. Has anyone done something like this and can point me towards the right packages to use? Is this just considered OCR or should I be looking for "machine vision"?
|
# ? Apr 30, 2019 14:10 |
|
Look for Tesseract OCR.
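A minimal sketch of calling Tesseract from Python through its command-line tool, assuming the `tesseract` binary (version 4 or later, for the `--psm` flag) is installed and on the PATH. The function names are made up for illustration, and the image is assumed to be already cropped to the region that holds the text. Page segmentation mode 7 treats the image as a single line of text, which seems to match the "few words at most" case.

```python
import shutil
import subprocess


def tesseract_command(image_path, psm=7, lang="eng"):
    """Build the CLI call; 'stdout' makes tesseract print the text."""
    return ["tesseract", str(image_path), "stdout",
            "--psm", str(psm), "-l", lang]


def ocr_image(image_path, psm=7, lang="eng"):
    """Run Tesseract on one image and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    done = subprocess.run(
        tesseract_command(image_path, psm=psm, lang=lang),
        capture_output=True, text=True, check=True,
    )
    return done.stdout.strip()
```

Calling `ocr_image("label.png")` would return whatever text Tesseract finds in that (hypothetical) file.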
|
# ? Apr 30, 2019 14:18 |
|
I have a set of forms with typed field names and handwritten values; if I wanted to OCR them rather than do data entry manually, would Tesseract + the python interface for it be my best bet?
the yeti fucked around with this message at 16:43 on Apr 30, 2019 |
# ? Apr 30, 2019 16:35 |
|
the yeti posted:I have a set of forms with typed field names and handwritten values; if I wanted to OCR them rather than do data entry manually, would Tesseract + the python interface for it be my best bet? Yeah
|
# ? Apr 30, 2019 17:31 |
|
If you don't mind doing HTTP requests, the Google Vision API will likely do the best job.
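For a rough idea of what "doing HTTP requests" looks like, here is a sketch against the Vision API's REST `images:annotate` endpoint using only the standard library. It assumes you have an API key for a project with the Vision API enabled; the function names are hypothetical, and `DOCUMENT_TEXT_DETECTION` is chosen here on the assumption that the input is dense or handwritten text (`TEXT_DETECTION` is the other text feature).

```python
import base64
import json
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_bytes, feature="DOCUMENT_TEXT_DETECTION"):
    """Build the JSON body: one image, one requested feature."""
    return {
        "requests": [
            {
                "image": {
                    "content": base64.b64encode(image_bytes).decode("ascii")
                },
                "features": [{"type": feature}],
            }
        ]
    }


def detect_text(image_bytes, api_key):
    """POST the image and return the recognised text ('' if none)."""
    body = json.dumps(build_request(image_bytes)).encode("utf-8")
    req = urllib.request.Request(
        VISION_URL + "?key=" + api_key,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # fullTextAnnotation is missing when no text was detected.
    annotation = reply["responses"][0].get("fullTextAnnotation", {})
    return annotation.get("text", "")
```

Google also publishes a Python client library that wraps this; the raw request is shown only because it needs no extra installs.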
|
# ? Apr 30, 2019 17:33 |
|
Tesseract's models for vision are quite dated. You're better off feeding the images into a Google, Microsoft etc. online CV API.
|
# ? May 1, 2019 00:16 |
|
I used tesseract to turn the Mueller report into a text file, it worked real fine
|
# ? May 1, 2019 00:41 |
|
Yeah, dated doesn't necessarily mean bad, but you may get better performance for weird edge cases with newer machine vision tools.
|
# ? May 1, 2019 11:06 |
|
I'd suggest trying out samples of your handwriting on Tesseract and an online API service. It's really a night and day thing and this specific domain is an area where deep learning techniques have shown themselves to perform better without any caveats. There's a reason why digit recognition (MNIST) is treated as the "hello world" equivalent for AI.
|
# ? May 1, 2019 11:33 |
|
Yeah, modern handwriting recognition (and "regular" OCR) has made real advances over older stuff. Of course, if the older stuff is getting 99.999% on your text there's not much use in trying to get something newer unless you're doing tons of recognition. (This reminds me of something I've complained about before. All the OCR packages you can get for general office work...like ABBYY FineReader and other stuff for managing paperwork...are stuck on tech from eons ago. I wish someone would come out with something modern.)
|
# ? May 1, 2019 17:22 |
|
So the OCR posts reminded me I was meaning to look into it for a thing I'm building and I installed Tesseract and followed a guide to see a minimal example of how it works, tested it against a few images in the domain I'm working on, and OK cool, far from perfect but definitely could be useful. Then I went to the Google Vision API page and threw some of the same images into its demo and oh my god this is witchcraft.
|
# ? May 2, 2019 03:08 |
Are there any good packages for spitting out simple pandas dataframes in html format? I've been using altair a lot recently to create quick interactive visualizations + export to html, but don't think there's functionality to display the underlying data in a browser friendly format. Basic functionality like searching or sorting would be great. Don't need to host it on the web and the source data is static. My customers are reticent to open xls files much less fiddle with filters.
|
|
# ? May 2, 2019 04:20 |
|
KingNastidon posted:Are there any good packages for spitting out simple pandas dataframes in html format? I've been using altair a lot recently to create quick interactive visualizations + export to html, but don't think there's functionality to display the underlying data in a browser friendly format. Basic functionality like searching or sorting would be great. Don't need to host it on the web and the source data is static. My customers are reticent to open xls files much less fiddle with filters. drat altair looks sweet. The dataframe method .to_html() will produce a simple html table with no markup. From there you can use the DataTables jquery plugin to add sorting and filters. You'll probably need to make header and footer files of html/JS/CSS to accomplish all the formatting stuff though.
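The `.to_html()` plus DataTables suggestion can be sketched roughly as below. This is an assumption-laden example, not the poster's own setup: the CDN URLs and version numbers are ones that existed around the time of the thread and may need updating, and `build_page` is a made-up helper that wraps a table's HTML in a page that loads jQuery and DataTables and switches the plugin on for that table.

```python
# CDN versions below are assumptions; swap in whatever is current.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<link rel="stylesheet"
      href="https://cdn.datatables.net/1.10.19/css/jquery.dataTables.min.css">
<script src="https://code.jquery.com/jquery-3.3.1.min.js"></script>
<script src="https://cdn.datatables.net/1.10.19/js/jquery.dataTables.min.js"></script>
</head>
<body>
{table}
<script>
$(document).ready(function () {{ $('#{table_id}').DataTable(); }});
</script>
</body>
</html>
"""


def build_page(table_html, table_id):
    """Wrap an HTML table in a page with searching and sorting enabled."""
    return PAGE_TEMPLATE.format(table=table_html, table_id=table_id)


# Stand-in for pandas output so the sketch runs without pandas installed.
sample_table = (
    '<table id="report"><thead><tr><th>region</th><th>sales</th></tr></thead>'
    "<tbody><tr><td>East</td><td>10</td></tr></tbody></table>"
)
page = build_page(sample_table, "report")
```

With a real dataframe, the table string would come from something like `df.to_html(table_id="report", index=False)`, and the resulting page can be written to a single `.html` file and opened straight from disk, though the browser still needs internet access to fetch the CDN files.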
|
# ? May 2, 2019 04:39 |
|
KICK BAMA KICK posted:So the OCR posts reminded me I was meaning to look into it for a thing I'm building and I installed Tesseract and followed a guide to see a minimal example of how it works, tested it against a few images in the domain I'm working on, and OK cool, far from perfect but definitely could be useful. I was gonna look into setting up a server to do this for what I had in mind remotely, but if google gives me 1000 free things a month then witchcraft it is.
|
# ? May 2, 2019 09:37 |
|
Boris Galerkin posted:I was gonna look into setting up a server to do this for what I had in mind remotely, but if google gives me 1000 free things a month then witchcraft it is. Same. I have a ton of personal documents I was going to scan in and use a commercial OCR on the PDFs to index them, but with 1000 free a month I may try to whip up a script to process the scan folder and compare.
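A script like that might look something like the sketch below. Everything here is hypothetical: the function names, the choice to write a `.txt` file next to each scan, and the list of image suffixes (PDFs would need a different call). The OCR step is passed in as a plain function so the same loop could drive either service, and the 1000 figure is simply the one quoted above, so check current pricing before relying on it.

```python
from pathlib import Path

FREE_UNITS_PER_MONTH = 1000  # figure quoted in the thread, not verified

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".tif", ".tiff")


def pending_scans(folder):
    """Scans in name order that have no sibling .txt file yet."""
    return [
        p for p in sorted(Path(folder).iterdir())
        if p.suffix.lower() in IMAGE_SUFFIXES
        and not p.with_suffix(".txt").exists()
    ]


def process_folder(folder, ocr, budget=FREE_UNITS_PER_MONTH):
    """Run ocr(path) -> str on pending scans, at most `budget` of them."""
    done = []
    for path in pending_scans(folder)[:budget]:
        path.with_suffix(".txt").write_text(ocr(path), encoding="utf-8")
        done.append(path.name)
    return done
```

Because finished scans are detected by their `.txt` output, rerunning the script next month just picks up where the last run stopped.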
|
# ? May 2, 2019 10:01 |
The Xkdc Larper posted:drat altair looks sweet. The dataframe method .to_html() will produce a simple html table with no markup. From there you can use the DataTables jquery plugin to add sorting and filters. You'll probably need to make header and footer files of html/JS/CSS to accomplish all the formatting stuff though. Thank you, DataTables looks awesome and I can definitely use this. I want to be half as smart and ambitious as the github folks that link together this poo poo. For this project I'm trying to find something that's pretty bolt-on given I mostly do excel and rudimentary data work in python, but know nothing about html/JS/CSS other than kind of understanding what's going on. What I'm trying to do is basically create a lazy (or poor man that can't get funding) Tableau or Veeva for CRM. Create points on a map, allow end users to click on that point, and spit out data associated with that point. For example, customers/sales data/internal employees responsible. Like, imagine this combined with this where selecting a map point on the left would spit out summarized data tables on the right. This probably is begging for a web app that can real-time query the selection from underlying database, but I live in a flat-file csv/xls/ppt world.
|
|
# ? May 2, 2019 10:04 |
|
There's an amazing Python ebook bundle on Humble bundle - https://www.humblebundle.com/books/python-oreilly-books Fluent Python alone is worth the price of entry.
|
# ? May 2, 2019 11:09 |
|
shrike82 posted:There's an amazing Python ebook bundle on Humble bundle - https://www.humblebundle.com/books/python-oreilly-books Would you mind explaining why?
|
# ? May 2, 2019 11:17 |
|
Most Python guides are for beginners, Fluent Python is targeted at intermediate to even advanced developers, examining the language in depth. It does a good job discussing development best practices, explaining language plumbing, as well as providing practical code walk-through of major use cases/packages.
|
# ? May 2, 2019 11:41 |
|
Yeah, Fluent Python is really good. By the time I ever read it, I already knew most of what was in it, but it still was helpful to remind me about all of the cool and useful ways to do stuff in Python. Also in that bundle, Think Python is a good book for learning how to program if you're a newbie. Thermopyle fucked around with this message at 13:52 on May 2, 2019 |
# ? May 2, 2019 13:50 |
|
KingNastidon posted:Thank you, DataTables looks awesome and I can definitely use this. I want to be half as smart and ambitious as the github folks that link together this poo poo. For this project I'm trying to find something that's pretty bolt-on given I mostly do excel and rudimentary data work in python, but know nothing about html/JS/CSS other than kind of understanding what's going on. Would Apache Superset fit your bill at all?
|
# ? May 2, 2019 13:51 |
|
KingNastidon posted:What I'm trying to do is basically create a lazy (or poor man that can't get funding) Tableau or Veeva for CRM. Create points on a map, allow end users to click on that point, and spit out data associated with that point. For example, customers/sales data/internal employees responsible. Like, imagine this combined with this where selecting a map point on the left would spit out summarized data tables on the right. You could probably hack this together with Altair and using an event listener to provide json to a DataTables object, if you absolutely need to make flat files. That said there are far better ways to make a data dashboard but they require a server running a web stack.
|
# ? May 2, 2019 17:20 |
|
Bokeh can probably do that without a server.
|
# ? May 2, 2019 17:55 |
|
That library seems extremely good
|
# ? May 2, 2019 21:07 |
|
A few months ago there was a Humble Bundle of Python books from Packt that I bought but never got around to cracking into much, and I remember there was one book that the experienced people here actively disliked -- was it Clean Code or Python 3 OOP or some other one?
|
# ? May 2, 2019 22:25 |
|
shrike82 posted:There's an amazing Python ebook bundle on Humble bundle - https://www.humblebundle.com/books/python-oreilly-books I ordered this, and I got the receipt email from PayPal, but humble bundle never sent me anything? Is that how it normally works? Do I have to go to the website or something?
|
# ? May 2, 2019 22:26 |
|
If you login to your accounts page, you should be shown a bundle page with a bunch of links to download the ebooks in various formats. Enjoy!
|
# ? May 2, 2019 23:49 |
KICK BAMA KICK posted:A few months ago there was a Humble Bundle of Python books from Packt that I bought but never got around to cracking into much, and I remember there was one book that the experienced people here actively disliked -- was it Clean Code or Python 3 OOP or some other one? I hope it wasn't clean code because the packt clean code and FP stuff I've worked through has been great.
|
|
# ? May 3, 2019 02:01 |
|
shrike82 posted:If you login to your accounts page, you should be shown a bundle page with a bunch of links to download the ebooks in various formats. Enjoy! Yeah, I can still access books I bought years ago through my account. Check that first.
|
# ? May 3, 2019 02:07 |
|
As an aside, do you guys learn mainly from written stuff (books, articles etc.)? I was just musing with some colleagues that younger developers tend to be more comfortable with watching videos to learn about technical stuff. I've seen some of the stuff they watch are deep dives and not just 101 stuff. The flip-side is they don't touch (O'Reilly or whatever) books at all - sticking to online stuff.
|
# ? May 3, 2019 02:08 |
shrike82 posted:As an aside, do you guys learn mainly from written stuff (books, articles etc.)? I'm self taught and embrace a variety of sources. I may or may not be representative.
|
|
# ? May 3, 2019 02:11 |
|
Bundy posted:I'm self taught and embrace a variety of sources. I may or may not be representative. Reference books were the last thing I kept using dead-tree versions for, but now that I have a 34” ultra wide monitor, I’ve fully converted to having various eBooks and webpages on the standby in the right side of the screen. Still *tons* of real-estate for PyCharm.
|
# ? May 3, 2019 02:36 |
|
I absolutely prefer reading to videos. Don't do dead trees anymore though, all online. Usually whatever random blog I ended up at after a half-assed Google search. I find that the inaccuracies of the blog forcing me to think harder is better for my learning process.
|
# ? May 3, 2019 03:56 |
|
I can't learn via video about programming; really, I can't learn from video on any topic at all. I have pretty bad ADHD, so I'll have to rewatch a video 4-6 times to catch it all - more if there's lovely editing, weird jumps, etc. On the other side, I have a ridiculously fast reading speed and I don't have to pause/scrub through a video to get back to what I last understood if I missed something - I just flick my eyes back or go back a page. Video tutorials have gotten better than they used to be, but they're still awful IMO.
|
# ? May 3, 2019 04:40 |
|
If I can't grep it, it's garbage.
|
# ? May 3, 2019 04:46 |
|
I can't stand watching videos to learn stuff. I think it's a combination of impatience and the desire to skip around. I still keep books around for some niche or real general reference topics. If I need to learn something new I just look at the documentation online or some examples on github. Books cost money and go out of date too quickly nowadays.
|
# ? May 3, 2019 04:48 |
|
Books are great at providing detailed reasoning and examples that docs or blogs can't or won't do. Besides, if you're making computer toucher money, I don't think spending money on books is too big of a burden.
|
# ? May 3, 2019 05:13 |
|
|
CarForumPoster posted:I do a lot of pickle reading/writing and csv reading/writing with the pandas implementation of each. Pickle is an order of magnitude faster, would take a pickle every time. (Bonus that it handles mixed data types) Pickle is amazingly fast, but it needs to be handled with caution. Changing schemas can leave pickled data in a pretty broken state, and exposing that pickled data to writing from an untrusted third party can open up all sorts of gnarly whoops. My basic rule is that if I'm just storing data for retrieval by an adjoining script, or whatever, it's a good choice. For anything customer facing, or requiring that the data still be readable more than a month from now, use something else. Oh, and fun fact: Eve Online's wire format for its Machonet was pretty much just pickled objects. That's where all those python injection hacks came from. It blew my mind we were messing with that poo poo for *years* before they caught on. (I think I can mention this now. I haven't played Eve in 5-6 years.)
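As one possible take on "use something else" for data that has to outlive a schema change, here is a small sketch that stores records as JSON with an explicit version number. The field names and the migration are invented for illustration: version 1 is assumed to have stored a `user` key that version 2 renames to `username`.

```python
import json

SCHEMA_VERSION = 2  # hypothetical current version


def dump_record(record):
    """Serialise a dict along with the schema version it was written in."""
    return json.dumps(
        {"schema_version": SCHEMA_VERSION, "data": record}, sort_keys=True
    )


def load_record(text):
    """Load a record, upgrading older versions to the current layout."""
    doc = json.loads(text)
    version = doc.get("schema_version", 1)
    data = dict(doc["data"])
    if version < 2:
        # Made-up migration: v1 called the field "user".
        data["username"] = data.pop("user")
    return data


old_text = '{"schema_version": 1, "data": {"user": "bob"}}'
upgraded = load_record(old_text)
round_trip = load_record(dump_record({"username": "alice"}))
```

The file stays readable in a text editor, and old data can be upgraded on load instead of failing because a class was renamed or moved, at the cost of the speed and mixed-type convenience mentioned in the quote.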
|
# ? May 3, 2019 05:53 |