Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
Just embrace the Oxford comma and not have to deal with any logic other than if the list is >= 3 :v:
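A minimal sketch of that idea, assuming the items are already strings (`join_items` is a made-up name, not something from the thread): with the Oxford comma, the only branch left is whether the list has three or more items.

```python
def join_items(items):
    """Join items into an English list, using the Oxford comma for 3+ items."""
    items = list(items)
    if len(items) >= 3:
        # "a, b, and c": every item but the last is followed by ", "
        return ", ".join(items[:-1]) + ", and " + items[-1]
    # Zero, one or two items need no commas at all
    return " and ".join(items)


print("you ordered " + join_items(["donut", "deep fried kale", "spoken yogurt"]))
```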

Hughmoris
Apr 21, 2007
Let's go to the abyss!

a witch posted:

Pycharm. Add the library to your project, set a breakpoint in it and run the debugger.

Thermopyle posted:

PyCharm is great, but if you don't want to use it, you can use pdb or ipdb.

Foxfire_ posted:

pudb's my favorite if you're on unix

Thanks for the ideas.

wolrah
May 8, 2006
what?

Hughmoris posted:

Is there a good way to step through Python code? I found a Python library to parse torrent names (https://github.com/divijbindlish/parse-torrent-name/blob/master/PTN/parse.py) and I can't quite figure out how it works.

That code leans pretty heavily on regular expressions, which Python debuggers aren't going to be as helpful with.
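One way to get a feel for regex-heavy code without a debugger is to run each pattern on its own and look at what it captures. A rough sketch follows; the patterns and the `explain` helper are invented for illustration and are not the ones that library uses.

```python
import re

# Hypothetical patterns for illustration only; NOT taken from the library.
PATTERNS = {
    "season": r"[Ss](\d{1,2})",
    "episode": r"[Ee](\d{1,2})",
    "year": r"\b((?:19|20)\d{2})\b",
    "resolution": r"(\d{3,4}p)",
}


def explain(name):
    """Try each pattern separately and report what its first group captured."""
    found = {}
    for label, pattern in PATTERNS.items():
        match = re.search(pattern, name)
        if match:
            found[label] = match.group(1)
    return found


print(explain("Some.Show.S02E05.2016.720p.HDTV"))
```

Compiling a single pattern with the `re.DEBUG` flag also prints how the `re` module parsed it, which can help when a pattern is hard to read.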

Mr Crucial
Oct 28, 2005
What's new pussycat?
I'm not sure if this is the right place for this question, but here goes. How do I add a CA certificate so that Python and Python-based apps trust TLS certs generated by that CA?

I'm running Ansible on a Centos 7 box. I have playbooks that connect to Windows devices over WinRM protected by a cert generated by the CA. Ansible works fine but the CA is not trusted so I get a lot of verification errors every time I run the playbooks.

Various googling has led me down the path of running

code:
python -c "import ssl; print(ssl.get_default_verify_paths())"
to find the location of the default trust store (/etc/pki/tls/certs/ca-trust.crt in my case), and adding a PEM-encoded copy of the CA public cert to it using openssl. This I've done, and using "openssl verify" to test confirms that openssl at least trusts the CA.

Ansible, however, stubbornly refuses to verify the certs. I'm at a bit of a loss as to how to get this working now. Can anyone point me in the right direction?
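For reference, a small sketch that dumps what the `ssl` module itself thinks its defaults are. `describe_trust_store` is a made-up helper name. It only covers code that relies on the `ssl` defaults; a library that ships or configures its own CA bundle will not be affected by this store.

```python
import ssl


def describe_trust_store():
    """Report where this interpreter's ssl module looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    context = ssl.create_default_context()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "cafile_env_var": paths.openssl_cafile_env,
        "capath_env_var": paths.openssl_capath_env,
        # Counts certificates loaded so far; capath entries are loaded lazily.
        "loaded_ca_certs": context.cert_store_stats()["x509_ca"],
    }


for key, value in describe_trust_store().items():
    print(key, "=", value)
```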

Tigren
Oct 3, 2003

Mr Crucial posted:

I'm not sure if this is the right place for this question, but here goes. How do I add a CA certificate so that Python and Python-based apps trust TLS certs generated by that CA?

I'm running Ansible on a Centos 7 box. I have playbooks that connect to Windows devices over WinRM protected by a cert generated by the CA. Ansible works fine but the CA is not trusted so I get a lot of verification errors every time I run the playbooks.

Various googling has led me down the path of running

code:
python -c "import ssl; print(ssl.get_default_verify_paths())"
to find the location of the default trust store (/etc/pki/tls/certs/ca-trust.crt in my case), and adding a PEM-encoded copy of the CA public cert to it using openssl. This I've done, and using "openssl verify" to test confirms that openssl at least trusts the CA.

Ansible, however, stubbornly refuses to verify the certs. I'm at a bit of a loss as to how to get this working now. Can anyone point me in the right direction?

I haven't used this authentication method myself, but do the following host vars help?

code:
ansible_connection: winrm
ansible_winrm_cert_pem: /path/to/certificate/public/key.pem
ansible_winrm_cert_key_pem: /path/to/certificate/private/key.pem
ansible_winrm_transport: certificate
Source

Tigren fucked around with this message at 18:12 on Nov 30, 2017

Mr Crucial
Oct 28, 2005
What's new pussycat?

Tigren posted:

I haven't used this authentication method myself, but do the following host vars help?
They don't unfortunately, because I'm not using certificate authentication, I'm using CredSSP auth over HTTPS which is a different matter. I did try adding the ansible_winrm_cert_pem like so, but it didn't work:

code:
ansible_port=5986
ansible_connection=winrm
ansible_winrm_scheme=https
#ansible_winrm_server_cert_validation=ignore
ansible_winrm_cert_pem=/etc/pki/tls/cert.pem
ansible_winrm_transport=credssp
It's frustrating as poo poo because all of the official Ansible documentation suggests using self-signed certificates on a WinRM target and turning off certificate validation - there's not a single word about how to do this properly.

Tigren
Oct 3, 2003

Mr Crucial posted:

They don't unfortunately, because I'm not using certificate authentication, I'm using CredSSP auth over HTTPS which is a different matter. I did try adding the ansible_winrm_cert_pem like so, but it didn't work:

code:
ansible_port=5986
ansible_connection=winrm
ansible_winrm_scheme=https
#ansible_winrm_server_cert_validation=ignore
ansible_winrm_cert_pem=/etc/pki/tls/cert.pem
ansible_winrm_transport=credssp
It's frustrating as poo poo because all of the official Ansible documentation suggests using self-signed certificates on a WinRM target and turning off certificate validation - there's not a single word about how to do this properly.

Does setting the REQUESTS_CA_BUNDLE environment variable help? It looks like CredSSP auth is handled by requests-credssp.
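Something like this is what I mean, as a sketch only: the bundle path and the playbook name are placeholders, and `env_with_ca_bundle` is a made-up helper. It just builds an environment with the variable set so it can be handed to whatever launches the playbook.

```python
import os


def env_with_ca_bundle(ca_bundle_path, base_env=None):
    """Return a copy of the environment with REQUESTS_CA_BUNDLE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["REQUESTS_CA_BUNDLE"] = ca_bundle_path
    return env


# Placeholder path; substitute the PEM bundle that contains your CA.
env = env_with_ca_bundle("/path/to/ca-bundle.pem")
print(env["REQUESTS_CA_BUNDLE"])
# To run a playbook with it (the playbook name is a placeholder):
# subprocess.run(["ansible-playbook", "site.yml"], env=env, check=True)
```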

I love that Ansible even just tells you to ignore cert validation. Super secure!

quote:

When the Ansible controller is running on Python 2.7.9+ or an older version of Python that has backported SSLContext (like Python 2.7.5 on RHEL 7), the controller will attempt to validate the certificate WinRM is using for an HTTPS connection. If the certificate cannot be validated (such as in the case of a self signed cert), it will fail the verification process.

To ignore certificate validation, add ansible_winrm_server_cert_validation: ignore to inventory for the Windows host.

Tigren fucked around with this message at 20:53 on Nov 30, 2017

Mr Crucial
Oct 28, 2005
What's new pussycat?

Tigren posted:

Does setting the REQUESTS_CA_BUNDLE environment variable help? It looks like CredSSP auth is handled by requests-credssp.

I love that Ansible even just tells you to ignore cert validation. Super secure!

Good idea, but no that didn't help. I ramped up the verbosity of Ansible and I think the fault is actually in a PowerShell module that's part of Ansible itself. It handles the connectivity to Windows but having a peruse through it there's nothing in the way of certificate handling. I've logged a bug on the Ansible Github page for it.

And yes, super secure. Getting into all this devops tooling stuff has really highlighted to me how much a shitshow security is across the entire space.

Dominoes
Sep 20, 2007

Dominoes posted:

Hey dudes: How do you actively test/work on functions in your code? This is a broad-question; I hope this context helps:

My workflow has involved editing files in PyCharm or another editor, and having a separate IPython window open with %autoreload 2 set; I import the module. To test, I save the code in the editor, and work in IPython.

I've recently tried Rstudio and Spyder... Love how you can just work entirely in the IDE and run bits of/all your code etc whenever you want. This doesn't appear possible in Pycharm: The run button at the top just runs the whole thing, as if it were a standalone script. The built-in console doesn't use the Spyder/Rstudio behavior I described, and doesn't work with autoreload. Is there any way to do this in Pycharm?
This appears to be fixed in the latest PyCharm release! Can do everything in the integrated IPython terminal.

FoiledAgain
May 6, 2007

unpacked robinhood posted:

Little things:

Is there a pythonic one-liner to make natural language enumerations?
For example I'd have:
Python code:
fatmenu=['donut','deep fried kale','spoken yogurt']
and want to list them like "you ordered donut, deep fried kale and spoken yogurt", with commas for more than two items.
Same with "you ordered 3 items" with the 's' added as needed ?

It's two dead simple functions with conditionals but I feel like there's an interesting and probably obtuse python trick for this.


code:
import random

fatmenu = ['donut', 'deep fried kale', 'spoken yogurt', 'foo cookies', 'bar bars']
n = random.randint(1, len(fatmenu))  # number of items ordered, at least one
foods = random.sample(fatmenu, n)
output = 'you ordered {} item{}: {}{}{}'.format(
    n,
    's' if n != 1 else '',  # because zero is grammatically plural
    foods[0] if n <= 2 else ', '.join(foods[:-1]),  # everything before the "and"
    ' and ' if n > 1 else '',
    foods[-1] if n > 1 else '')
print(output)
edit: just realized that it doesn't matter if I check for zero items because my random.randint() never picks below one. but in your case maybe zero does matter.

FoiledAgain fucked around with this message at 22:58 on Nov 30, 2017

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Dominoes posted:

This appears to be fixed in the latest PyCharm release! Can do everything in the integrated IPython terminal.

FWIW, I was never really clear on what behavior you were looking for.

That might be because I almost always write a unit test for whatever function. Like, I might not write the unit test first like a good TDD disciple, but if I get to the point where I'm wanting to run the function, I write a unit test to run the function and then press Ctrl-Shift-F10 to run it.

This gets me more unit tests and lets me do set up work and whatever else needs done.
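A rough sketch of what that looks like with the standard library's `unittest`. The `pluralize` function is a toy stand-in for whatever function you actually want to poke at, and `run_tests` is only there so the example can be run outside an IDE.

```python
import unittest


def pluralize(word, count):
    """Toy function under test (hypothetical): add an 's' unless count is 1."""
    return word if count == 1 else word + "s"


class PluralizeTest(unittest.TestCase):
    def test_singular(self):
        self.assertEqual(pluralize("item", 1), "item")

    def test_zero_is_plural(self):
        self.assertEqual(pluralize("item", 0), "items")

    def test_many(self):
        self.assertEqual(pluralize("item", 3), "items")


def run_tests():
    """Run the test case programmatically and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PluralizeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Any setup the function needs can go in a `setUp` method on the test class, so it runs before every test.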

vikingstrike
Sep 23, 2007

whats happening, captain
You’ve always been able to highlight code and execute it in the built-in IPython terminal of PyCharm.

xgalaxy
Jan 27, 2004
i write code
Apologies if this has been asked in here before.

I'm using Google Cloud SDK (Note: not App Engine SDK although they are very similar).
In addition to this, I am using google-cloud-bigquery, google-cloud-pubsub, and some other google things.

It appears none of these things work properly with pylint. I get unable to import 'google.cloud' stuff everywhere I'm trying to use this stuff.
If I bring up a repl and import from google.cloud it works fine and of course running Google's app engine local dev server it runs fine.

Why is this such a mess?

Dominoes
Sep 20, 2007

vikingstrike posted:

You’ve always been able to highlight code and execute it in the built-in IPython terminal of PyCharm.

Thermopyle posted:

FWIW, I was never really clear on what behavior you were looking for.

That might be because I almost always write a unit test for whatever function. Like, I might not write the unit test first like a good TDD disciple, but if I get to the point where I'm wanting to run the function, I write a unit test to run the function and then press Ctrl-Shift-F10 to run it.

This gets me more unit tests and lets me do set up work and whatever else needs done.
I'd always have issues with dependencies when running highlighted code in console; had to select the entire function each time, or reload/reset the console each time code changed. Additionally, things like pressing the up arrow to autocomplete with history didn't work. Difficult to troubleshoot now that it's fixed!

mr_package
Jun 13, 2000
I'm writing a Flask app that accepts file uploads that I then need to process and publish (upload) elsewhere. What's the best way to fork the "process_uploads" process? I can think of several ways (e.g. subprocess.Popen) but maybe I should be using the multiprocessing module? I want the Flask view to add a job to the queue and then call my "process_uploads" class/function/whatever to take and handle those.

IAmKale
Jun 7, 2007

やらないか

Fun Shoe

xgalaxy posted:

Apologies if this has been asked in here before.

I'm using Google Cloud SDK (Note: not App Engine SDK although they are very similar).
In addition to this, I am using google-cloud-bigquery, google-cloud-pubsub, and some other google things.

It appears none of these things work properly with pylint. I get unable to import 'google.cloud' stuff everywhere I'm trying to use this stuff.
If I bring up a repl and import from google.cloud it works fine and of course running Google's app engine local dev server it runs fine.

Why is this such a mess?
I have the exact same setup (using pylint on a GCP application) but I couldn't remember ever having this issue. It looks like I worked around it by adding this line to the project's pylintrc file:

code:
ignored-modules=google
Does that help?

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

mr_package posted:

I'm writing a Flask app that accepts file uploads that I then need to process and publish (upload) elsewhere. What's the best way to fork the "process_uploads" process? I can think of several ways (e.g. subprocess.Popen) but maybe I should be using the multiprocessing module? I want the Flask view to add a job to the queue and then call my "process_uploads" class/function/whatever to take and handle those.

You shouldn't fork processes or threads from a view.

You need a task queue. A simple one is python-rq; a complex, featureful one is Celery.

mr_package
Jun 13, 2000

Thermopyle posted:

You shouldn't fork processes or threads from a view.

You need a task queue. A simple one is python-rq; a complex, featureful one is Celery.

I might be misusing the term 'view'. Flask Route? Assuming that's not the same thing? It's a simple enough use case I just want to call a second process on demand rather than setting up queue/polling/workers. The workload is low (less than 10 hits per day) so my main concern is just preventing users from sitting there asking "is anything happening?" after they click the upload button. Is it so horrible to do it this way? I've written a lot of Python over the years but it's all been scripting-style so this kind of async behaviour is new territory for me.

I looked at python-rq while researching approaches to this and it looked very good but overkill, but if it's your recommendation I'll do it this way.

Space Kablooey
May 6, 2009


Yeah, that's a job for a rq or celery setup.

mr_package
Jun 13, 2000
Ok I'll take your advice, python-rq. Can someone tell me in brief why it's wrong to do it the other way? Linking to docs/article/whatever is fine, this is obviously a gap in my knowledge and I want to gain more understanding-- I don't have a comp sci background I'm just "good with computers" and never stopped learning.

Is it mostly a Python thing? If you were working in C++ / Java would you use the same approach? (Maybe they have queuing built-in to their standard libraries?)

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

mr_package posted:

Ok I'll take your advice, python-rq. Can someone tell me in brief why it's wrong to do it the other way? Linking to docs/article/whatever is fine, this is obviously a gap in my knowledge and I want to gain more understanding-- I don't have a comp sci background I'm just "good with computers" and never stopped learning.

Is it mostly a Python thing? If you were working in C++ / Java would you use the same approach? (Maybe they have queuing built-in to their standard libraries?)

There's a myriad of reasons, none of which is too convincing on its own. Also it depends on what your server setup is like...nginx, Apache, threaded requests vs green threads, vs processes, blah blah blah.

The first time I got bit by one of these myriad reasons was the fact that I opened myself to a DoS attack because each time a view was hit it would spawn a process to process an image, hit some urls, and update the database. Of course, I could come up with some sort of decentralized system to maintain a limited number of processes only and then a queue of backed-up tasks that needed run.

But that would be re-inventing one of the many existing task queue systems.

For something as small as you're talking about there are much more lightweight task queues, like http://django-background-tasks.readthedocs.io/en/latest/
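To make the "limited number of workers plus a queue of backed-up tasks" idea concrete, here is a toy sketch using only the standard library. It is not a replacement for a real task queue: pending jobs live in memory, so they vanish if the server process restarts, and each server process would get its own pool. `process_upload` and `enqueue` are made-up names.

```python
from concurrent.futures import ThreadPoolExecutor

# At most two jobs run at once; further submissions wait in the executor's
# internal queue instead of each one spawning a new process.
executor = ThreadPoolExecutor(max_workers=2)


def process_upload(filename):
    """Hypothetical stand-in for the real processing and publishing work."""
    return "processed " + filename


def enqueue(filename):
    """What a request handler would call: hand off the job and return at once."""
    return executor.submit(process_upload, filename)


futures = [enqueue(name) for name in ["a.png", "b.png", "c.png"]]
print([future.result() for future in futures])
executor.shutdown()
```

However many requests arrive, only `max_workers` jobs run at the same time, which is the property the unbounded spawn-per-request version lacks.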

onionradish
Jul 6, 2006

That's spicy.
I want to add multi-threading to a basic webscraper I've been tasked with. I have a list of URLs to spread across threads, but don't want to hit the same host simultaneously.

With a list of URLs, some from the same host, some from different hosts, what's the best way to set up thread Queue()s or some other URL pool so each thread can do simultaneous downloads as long as they're from different hosts?

This seems like something simple, and something that would be in stdlib collections or itertools, but I'm not seeing it. If it's actually a tricky issue, that's fine, and I'll work on a solution -- I just don't want to re-invent the wheel.

Eela6
May 25, 2007
Shredded Hen

onionradish posted:

I want to add multi-threading to a basic webscraper I've been tasked with. I have a list of URLs to spread across threads, but don't want to hit the same host simultaneously.

With a list of URLs, some from the same host, some from different hosts, what's the best way to set up thread Queue()s or some other URL pool so each thread can do simultaneous downloads as long as they're from different hosts?

This seems like something simple, and something that would be in stdlib collections or itertools, but I'm not seeing it. If it's actually a tricky issue, that's fine, and I'll work on a solution -- I just don't want to re-invent the wheel.

Sort them, then use itertools.groupby to split into groups by host. Separate the tasks by host rather than URL.
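A minimal sketch of that, assuming "host" means the netloc part of the URL. `fetch` is a placeholder that does no real downloading, and the other function names are made up too.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from urllib.parse import urlsplit


def host_of(url):
    """Return the host part (netloc) of a URL."""
    return urlsplit(url).netloc


def group_by_host(urls):
    """Return {host: [urls]}; groupby only merges adjacent items, hence the sort."""
    ordered = sorted(urls, key=host_of)
    return {host: list(group) for host, group in groupby(ordered, key=host_of)}


def fetch(url):
    """Placeholder for the real download; returns the URL so the sketch runs offline."""
    return url


def scrape_host(urls):
    # One worker walks a whole host sequentially, so a host is never hit twice at once.
    return [fetch(url) for url in urls]


def scrape(urls, max_workers=4):
    """Download all URLs, running different hosts in parallel."""
    groups = group_by_host(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(scrape_host, groups.values())
    return dict(zip(groups, results))


print(scrape(["http://b.example/1", "http://a.example/1", "http://b.example/2"]))
```

The trade-off is that one host with a lot of URLs keeps a single thread busy for a long time while the others finish early.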

Hughmoris
Apr 21, 2007
Let's go to the abyss!
I'm new to Pycharm and utilizing virtual environments, and I'm running Windows 10.

When creating a new project in Pycharm, I can't find a module that I want to install (https://github.com/divijbindlish/parse-torrent-name). Is my next best option to open up a console window, activate the virtual environment and install the module? Or is there a way to help Pycharm find the module for installation?

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Hughmoris posted:

I'm new to Pycharm and utilizing virtual environments, and I'm running Windows 10.

When creating a new project in Pycharm, I can't find a module that I want to install (https://github.com/divijbindlish/parse-torrent-name). Is my next best option to open up a console window, activate the virtual environment and install the module? Or is there a way to help Pycharm find the module for installation?

I usually click the Terminal button in PyCharm and install packages that way. It automatically activates the virtualenv or conda env for the project.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Thermopyle posted:

I usually click the Terminal button in PyCharm and install packages that way. It automatically activates the virtualenv or conda env for the project.

That worked, thanks. PyCharm is a bit overwhelming coming from Vim or Atom.

Tigren
Oct 3, 2003

Hughmoris posted:

I'm new to Pycharm and utilizing virtual environments, and I'm running Windows 10.

When creating a new project in Pycharm, I can't find a module that I want to install (https://github.com/divijbindlish/parse-torrent-name). Is my next best option to open up a console window, activate the virtual environment and install the module? Or is there a way to help Pycharm find the module for installation?

Phone posting, but you should be able to open the project interpreter settings and install packages there.

https://www.jetbrains.com/help/pycharm/installing-uninstalling-and-upgrading-packages.html

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Tigren posted:

Phone posting, but you should be able to open the project interpreter settings and install packages there.

https://www.jetbrains.com/help/pycharm/installing-uninstalling-and-upgrading-packages.html

Thanks. That was the initial route I pursued but the package I needed wasn't in the available list.

Tigren
Oct 3, 2003

Hughmoris posted:

Thanks. That was the initial route I pursued but the package I needed wasn't in the available list.

Weird, works for me.



What is listed when you click on that "Manage Repositories" button?

Mine has https://pypi.python.org/simple listed.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Tigren posted:

Weird, works for me.



What is listed when you click on that "Manage Repositories" button?

Mine has https://pypi.python.org/simple listed.

My "Manage Repositories" list was initially empty. I added the one that you listed and refreshed available packages and no change, still can't find it. It looks like it might only be displaying Conda packages? A quick google search says this might not be an extremely uncommon issue but I haven't found a solution.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Umm, I'm not at my pc but there's a button on the right side at the bottom that switches between virtual environments and conda.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Thermopyle posted:

Umm, I'm not at my pc but there's a button on the right side at the bottom that switches between virtual environments and conda.

That was it. I'm able to find the package. Of course, when I go to install it, it errors out. :suicide:

I get the same error when attempting to install it from CMD but I am able to manually install it with setup.py .

FAGGY CLAUSE
Apr 9, 2011

by FactsAreUseless
Not sure this is the exact thread for it, but I'm using Python to build out a prototype.

Long story short, I'm building out a document OCR process/pipeline to extract data from PDF documents. Many of these are just straight up scans of structured documents, hence the OCR bit. I'm using Tesseract for the moment and looking for suggestions on any other OCR solutions I can use. Cloud services are a no go. No Russian software companies either. Otherwise, the customer generally prefers buying commercial software in the end, but until then I just need to prove that this would be useful before we start buying things.

Anyone have OCR experience, particularly with structured forms, and recognizing data tables? I've been getting OK results for now. Tabula looks interesting. Tesseract's hOCR output format is a nice way to identify exact locations of each word. I can find fields by certain words/phrases that tend to get OCR'd the best and then locate text relative to these locations. For the tabular stuff I was considering even attempting some sort of clustering to see if that helped pull out wrapped text/phrases. I have some check boxes to pull out and have had some luck cropping with OpenCV and then counting the % of black pixels vs white. But at this point I feel like I'm reinventing the wheel.
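For anyone following along, the check box trick boils down to something like the sketch below. It is pure Python over a 2D grid of gray values rather than an actual OpenCV crop, and the threshold and cutoff numbers are guesses that would need tuning against real scans.

```python
def dark_fraction(pixels, threshold=128):
    """Fraction of pixels darker than threshold in a 2D grid of 0-255 gray values."""
    total = sum(len(row) for row in pixels)
    if total == 0:
        return 0.0
    dark = sum(1 for row in pixels for value in row if value < threshold)
    return dark / total


def is_checked(pixels, min_dark=0.2):
    """Treat the box as ticked when enough of the crop is dark (cutoff is a guess)."""
    return dark_fraction(pixels) >= min_dark


empty_box = [[255, 255, 255], [255, 255, 255], [255, 255, 255]]
ticked_box = [[0, 255, 0], [255, 0, 255], [0, 255, 0]]
print(is_checked(empty_box), is_checked(ticked_box))
```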

CarForumPoster
Jun 26, 2013

⚡POWER⚡

FAGGY CLAUSE posted:

Not sure this is the exact thread for it, but I'm using Python to build out a prototype.

Long story short, I'm building out a document OCR process/pipeline to extract data from PDF documents. Many of these are just straight up scans of structured documents, hence the OCR bit. I'm using Tesseract for the moment and looking for suggestions on any other OCR solutions I can use. Cloud services are a no go. No Russian software companies either. Otherwise, the customer generally prefers buying commercial software in the end, but until then I just need to prove that this would be useful before we start buying things.

Anyone have OCR experience, particularly with structured forms, and recognizing data tables? I've been getting OK results for now. Tabula looks interesting. Tesseract's hOCR output format is a nice way to identify exact locations of each word. I can find fields by certain words/phrases that tend to get OCR'd the best and then locate text relative to these locations. For the tabular stuff I was considering even attempting some sort of clustering to see if that helped pull out wrapped text/phrases. I have some check boxes to pull out and have had some luck cropping with OpenCV and then counting the % of black pixels vs white. But at this point I feel like I'm reinventing the wheel.

I don't have a lot of experience but also have this exact problem so I'm curious what you find out. Why no cloud services though?

FAGGY CLAUSE
Apr 9, 2011

by FactsAreUseless
Classified documents.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

FAGGY CLAUSE posted:

Classified documents.

Are you this guy:
https://www.youtube.com/watch?v=h6TRYcx74qs

FAGGY CLAUSE
Apr 9, 2011

by FactsAreUseless
I don't know how but you found me.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

It's probably better to ask in the general programming thread as there's nothing specific to python.

I've actually been working on something similar for personal usage. However, my scanner has a "Scan to PDF" option that automatically OCR's documents so I haven't had to worry about the OCR part. I've just been using text parsing to identify the structured parts of PDFs I want to pull out.

Hughmoris
Apr 21, 2007
Let's go to the abyss!
How is python+selenium for filling out lots of repetitive forms? I noticed that some people on my project team are manually entering in the data for 2000+ users in to a web portal. They've asked for help but my eyes will fall out of my head if I have to manually type in crap.

I have all of the user data in a clean csv file. The steps that are needed are basically:
  • I log in to the web portal (just once)
  • Click on search field and enter user name
  • Click on said user
  • Fill in a couple of text boxes, check a couple of boxes, select values from a drop down list
  • save form
  • GOTO search for a user

I used AutoIT for a similar job a few years ago but I figured I'd give Python a try for this (plus I forgot AutoIT).
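Roughly, the loop described in those steps might look like the sketch below. Every element ID and CSV column name here is invented, since the real portal's markup isn't shown, and a real page would probably also need explicit waits between steps. The selenium import sits inside the function so the CSV half runs even without selenium installed.

```python
import csv
import io


def load_users(csv_text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fill_forms(driver, users):
    """Search for each user and fill in their form. All element IDs are invented."""
    # Imported here so the CSV part of the sketch runs without selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select

    for user in users:
        search = driver.find_element(By.ID, "search")
        search.clear()
        search.send_keys(user["username"])
        driver.find_element(By.LINK_TEXT, user["username"]).click()
        driver.find_element(By.ID, "department").send_keys(user["department"])
        checkbox = driver.find_element(By.ID, "active")
        if not checkbox.is_selected():
            checkbox.click()
        Select(driver.find_element(By.ID, "role")).select_by_visible_text(user["role"])
        driver.find_element(By.ID, "save").click()


sample = "username,department,role\njdoe,Billing,Admin\n"
print(load_users(sample))
```

Logging in once and then calling `fill_forms(driver, load_users(...))` with an already-authenticated driver would cover the "log in just once" step.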

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

It will work fine.
