  • Locked thread
nbv4
Aug 21, 2002

by Duchess Gummybuns

Avenging Dentist posted:

Ok so strip out all elements more than a standard deviation from the mean:
code:
A[(A < A.mean()+A.std()) & (A > A.mean()-A.std())]

Awesome, that's perfect!
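For reference, here is a self-contained version of that filter. The sample data is made up for illustration. `np.abs(A - A.mean()) < A.std()` is an equivalent way to write the same mask, since both bounds in the one-liner are strict.

```python
import numpy as np

def within_one_std(A):
    """Keep only elements strictly within one standard deviation of the mean."""
    A = np.asarray(A, dtype=float)
    # Compute the statistics once instead of twice each, as the one-liner does.
    mean, std = A.mean(), A.std()  # population std (ddof=0), NumPy's default
    mask = np.abs(A - mean) < std  # same test as the two chained comparisons
    return A[mask]                 # boolean ("fancy") indexing

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
print(within_one_std(data))
```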


spankweasel
Jan 4, 2006

Ugh. I'm frustrated with trying to extend unittest so the test cases will run in their own thread. I have a collection of tests which take 15-20 minutes each to run, but all do something completely different than any of the others (install VirtualBox guests) and running this test in serial will take too long.

Has anybody screwed around with trying to thread test cases?

good jovi
Dec 11, 2000

'm pro-dickgirl, and I VOTE!

spankweasel posted:

Ugh. I'm frustrated with trying to extend unittest so the test cases will run in their own thread. I have a collection of tests which take 15-20 minutes each to run, but all do something completely different than any of the others (install VirtualBox guests) and running this test in serial will take too long.

Has anybody screwed around with trying to thread test cases?

Have you tried using nose with its multiprocess plugin?

tripwire
Nov 19, 2004

        ghost flow

Avenging Dentist posted:

Ok so strip out all elements more than a standard deviation from the mean:
code:
A[(A < A.mean()+A.std()) & (A > A.mean()-A.std())]

:monocle: drat. Numpy's fancy array indexing is awesome.

Captain Capacitor
Jan 21, 2008

The code you say?
I'm trying out a build of Python 3.2 from the Mercurial branch, and I came across this tidbit after I encountered some build errors on OSX.

README posted:

On OSX and Cygwin, the executable is called python.exe; elsewhere it's just python.

Anyone have any idea why it's named as such now? :psyduck:

good jovi
Dec 11, 2000

'm pro-dickgirl, and I VOTE!

Captain Capacitor posted:

Anyone have any idea why it's named as such now? :psyduck:

I don't know if this is exactly what you're asking, but that's just the pre-installation binary. It's that way in 2.6 as well. As for why, I cannot imagine.

spankweasel
Jan 4, 2006

Sailor_Spoon posted:

Have you tried using nose with its multiprocess plugin?

I'll give this a shot today. Thanks for the idea.

m0nk3yz
Mar 13, 2002

Behold the power of cheese!

Captain Capacitor posted:

I'm trying out a build of Python 3.2 from the Mercurial branch, and I came across this tidbit after I encountered some build errors on OSX.


Anyone have any idea why it's named as such now? :psyduck:

The original reason is that HFS+, like Windows, is case-insensitive in normal operation. Having a directory named Python and a binary named "python" side by side makes things unhappy.
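A quick way to see which behaviour a given filesystem has is to create a temporary file with a mixed-case name and check whether the all-lowercase spelling resolves to it. This is a sketch that assumes you can write to the directory being tested; `is_case_insensitive` is a hypothetical helper.

```python
import os
import tempfile

def is_case_insensitive(directory):
    """Return True if `directory` lives on a case-insensitive filesystem."""
    # tempfile appends only lowercase letters, digits and underscores to the
    # prefix, so lowercasing the name changes nothing but the "CaseTest" part.
    with tempfile.NamedTemporaryFile(prefix="CaseTest", dir=directory) as f:
        lowered = os.path.join(directory, os.path.basename(f.name).lower())
        return os.path.exists(lowered)

print(is_case_insensitive(tempfile.gettempdir()))
```

On a default macOS volume or on NTFS under Windows this typically prints True; on a typical Linux ext4 volume it prints False.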

maskenfreiheit
Dec 30, 2004
Edit: doublepost

maskenfreiheit fucked around with this message at 01:22 on Mar 13, 2017

ATLbeer
Sep 26, 2004
Über nerd

GregNorc posted:

How intelligent is python about using resources?

For example, I'm writing a small script that takes a dictionary file and outputs all the permutations a word could have. (Input butts, get butt$, bu++s, b|_|tts, etc). Relatively simple, but when you have a 50mb dictionary file it's not instantaneous either.

I'm also looking at other stuff too, like crawling through larger datasets (wikipedia dumps, project Gutenberg book collections, etc)

I figure it's good to be thinking about efficiency early with a small project like this.

While I don't expect Python to be incredibly efficient/intelligent, is it at least smart enough to spread something like this across both cores of a machine, or am I going to have to read up on multithreading or something to achieve that? (So if I have [x] cores, it'll split the task amongst them?)

http://docs.python.org/library/multiprocessing.html
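A minimal sketch of how that module could spread the word-permutation job over all cores: `multiprocessing.Pool` starts one worker process per CPU by default, and `pool.map` hands each worker chunks of the word list. The substitution table, `variants`, and `all_variants` are made up for illustration (Python 3 syntax). The `if __name__ == "__main__":` guard is required on platforms that spawn fresh interpreters for workers, such as Windows.

```python
from itertools import product
from multiprocessing import Pool

# Hypothetical substitution table: each letter maps to its possible spellings.
SUBS = {
    "a": ["a", "4", "@"],
    "e": ["e", "3"],
    "s": ["s", "$", "5"],
    "t": ["t", "+", "7"],
    "u": ["u", "|_|"],
}

def variants(word):
    """Return every spelling of `word` the table can produce."""
    choices = [SUBS.get(ch, [ch]) for ch in word.lower()]
    # The count grows multiplicatively with word length, so long words get big.
    return ["".join(parts) for parts in product(*choices)]

def all_variants(words, processes=None):
    # processes=None lets Pool use os.cpu_count() workers.
    with Pool(processes) as pool:
        return pool.map(variants, words, chunksize=1000)

if __name__ == "__main__":
    words = ["butts", "test"]
    for word, vs in zip(words, all_variants(words)):
        print(word, len(vs))
```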

Captain Capacitor
Jan 21, 2008

The code you say?

Just came from a talk at a local Python authors group, someone gave a talk about Twisted and Multiprocessing

maskenfreiheit
Dec 30, 2004
Edit: doublepost

maskenfreiheit fucked around with this message at 01:23 on Mar 13, 2017

tehk
Mar 10, 2006

[-4] Flaw: Heart Broken - Tehk is extremely lonely. The Gay Empire's ultimate weapon finds it hard to have time for love.

GregNorc posted:

So without using that library, python will restrict all work to one core?

One thread.

edit:link added

tehk fucked around with this message at 16:35 on Feb 10, 2010

king_kilr
May 25, 2007

tehk posted:

One thread.

edit:link added

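No, you can have multiple threads (using the threading module), and they might even be assigned to multiple cores by your operating system's scheduler; however, because of CPython's Global Interpreter Lock (GIL), only one of them executes Python bytecode at a time, so they will not run in parallel.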
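To see the difference for CPU-bound work, here is a rough timing sketch (Python 3, `concurrent.futures`). On CPython the thread version typically takes about as long as running the jobs one after another, while the process version can use several cores; exact numbers depend on the machine. `burn` and `timed` are hypothetical names.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def burn(n):
    """CPU-bound busy loop in pure Python, so it holds the GIL while running."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls, jobs, n):
    """Run `jobs` copies of burn(n) on the given executor; return (seconds, results)."""
    start = time.perf_counter()
    with executor_cls(max_workers=jobs) as pool:
        results = list(pool.map(burn, [n] * jobs))
    return time.perf_counter() - start, results

if __name__ == "__main__":
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        elapsed, _ = timed(cls, 4, 1_000_000)
        print(f"{cls.__name__}: {elapsed:.2f}s")
```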

m0nk3yz
Mar 13, 2002

Behold the power of cheese!

Captain Capacitor posted:

Just came from a talk at a local Python authors group, someone gave a talk about Twisted and Multiprocessing

Twisted and multiprocessing? I'm interested in hearing more.

king_kilr
May 25, 2007

m0nk3yz posted:

Twisted and multiprocessing? I'm interested in hearing more.

I'm interested in hearing less. I mean, Twisted's OK (in an 800-lb gorilla sense of the term), and multiprocessing is pretty snazzy, but BOTH AT ONCE? I'm pretty sure Twisted isn't even thread-safe.

m0nk3yz
Mar 13, 2002

Behold the power of cheese!

king_kilr posted:

I'm interested in hearing less. I mean, Twisted's OK (in an 800-lb gorilla sense of the term), and multiprocessing is pretty snazzy, but BOTH AT ONCE? I'm pretty sure Twisted isn't even thread-safe.

It depends; is it multiprocessing, or process management within twisted, which is completely different?

tehk
Mar 10, 2006

[-4] Flaw: Heart Broken - Tehk is extremely lonely. The Gay Empire's ultimate weapon finds it hard to have time for love.

king_kilr posted:

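No, you can have multiple threads (using the threading module), and they might even be assigned to multiple cores by your operating system's scheduler; however, because of CPython's Global Interpreter Lock (GIL), only one of them executes Python bytecode at a time, so they will not run in parallel.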

I meant that without using a module like threading or multiprocessing you will be stuck on one thread. I assumed his question was indirectly asking "What is python's default behavior in terms of concurrency/multithreading/whatever"

Captain Capacitor
Jan 21, 2008

The code you say?

m0nk3yz posted:

Twisted and multiprocessing? I'm interested in hearing more.

I've asked the presenter to send me a copy at his convenience; I'll post a link to it at some point.

He used Twisted to handle the web service and to start/stop workers created with the multiprocessing package.

For those (like myself) who have a less than keen interest in Twisted, I've heard good things about Celery
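The talk's code isn't available here, so this is only a standard-library sketch of the worker half of that arrangement: a controller starts `multiprocessing` workers that pull jobs from a queue, then shuts them down by sending one sentinel per worker. The web-service side (Twisted in the talk) is left out, and `worker`, `start_workers`, `stop_workers` and the squaring "job" are placeholders.

```python
import multiprocessing as mp

STOP = None  # sentinel telling a worker to exit

def worker(tasks, results):
    """Pull jobs until the STOP sentinel arrives; push (job, answer) pairs."""
    for job in iter(tasks.get, STOP):
        results.put((job, job * job))  # placeholder for the real work

def start_workers(n):
    tasks, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(tasks, results)) for _ in range(n)]
    for p in procs:
        p.start()
    return tasks, results, procs

def stop_workers(tasks, procs):
    for _ in procs:
        tasks.put(STOP)  # one sentinel per worker
    for p in procs:
        p.join()

if __name__ == "__main__":
    tasks, results, procs = start_workers(2)
    for job in range(5):
        tasks.put(job)
    # Drain the results before joining so no worker blocks on a full queue.
    answers = sorted(results.get() for _ in range(5))
    stop_workers(tasks, procs)
    print(answers)
```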

m0nk3yz
Mar 13, 2002

Behold the power of cheese!

Captain Capacitor posted:

I've asked the presenter to send me a copy at his convenience; I'll post a link to it at some point.

He used Twisted to handle the web service and to start/stop workers created with the multiprocessing package.

For those (like myself) who have a less than keen interest in Twisted, I've heard good things about Celery

Yeah, using multiprocessing for work start/stop is common - celery uses multiprocessing inside of it as well.

ATLbeer
Sep 26, 2004
Über nerd
Perfectly timed for the GIL page of this megathread

http://ec2-174-129-96-143.compute-1.amazonaws.com/index.html

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Is there a good library for sanitizing unicode text to Windows safe filenames while using some sort of heuristics for converting characters?

I have just been using this:

code:
str.encode("ascii", "ignore")
However, this isn't very robust.

My app takes movie titles from imdb.com and uses them to name files. Some example heuristics that I would like:

1. Alien³ should be converted to "Alien 3".
2. "—" (em dash) should be converted to "-".

The problem is that I keep adding special cases for all these things, and it feels like there should be a better way, or at least that someone else would have already come up with a bunch of these special cases.

ATLbeer
Sep 26, 2004
Über nerd

Thermopyle posted:

Is there a good library for sanitizing unicode text to Windows safe filenames while using some sort of heuristics for converting characters?

I have just been using this:

code:
str.encode("ascii", "ignore")
However, this isn't very robust.

My app takes movie titles from imdb.com and uses them to name files. Some example heuristics that I would like:

1. Alien³ should be converted to "Alien 3".
2. "—" (em dash) should be converted to "-".

The problem is that I keep adding special cases for all these things, and it feels like there should be a better way, or at least that someone else would have already come up with a bunch of these special cases.

Those are some nasty edge cases. I don't think something like that would exist in the way you want, because should Alien³ be Alien 3 or Alien^3 or Alien*Alien*Alien?

deimos
Nov 30, 2006

Forget it man this bat is whack, it's got poobrain!

Thermopyle posted:

Is there a good library for sanitizing unicode text to Windows safe filenames while using some sort of heuristics for converting characters?

I have just been using this:

code:
str.encode("ascii", "ignore")
However, this isn't very robust.

My app takes movie titles from imdb.com and uses them to name files. Some example heuristics that I would like:

1. Alien³ should be converted to "Alien 3".
2. "—" (em dash) should be converted to "-".

The problem is that I keep adding special cases for all these things, and it feels like there should be a better way, or at least that someone else would have already come up with a bunch of these special cases.

Behold:
code:
>>> from unicodedata import normalize
>>> b = u"áñejo"
>>> normalize('NFKD', b).encode('ascii', 'ignore')
'anejo'
MAGIC!

Sure, it doesn't do everything you want, but hey, it's a solution.
Note: I couldn't test your examples.
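Building on that, here is a sketch (Python 3) that combines NFKD normalization with a small hand-written table for characters NFKD leaves alone (such as the em dash), then removes what Windows rejects: the characters \ / : * ? " < > |, control characters, a trailing space or dot, and device names such as CON or NUL. Note that NFKD turns "Alien³" into "Alien3", not "Alien 3". The `EXTRA` table and `safe_filename` are hypothetical and will still need new entries as new titles turn up.

```python
import re
import unicodedata

# Hypothetical extra mappings for characters NFKD does not decompose.
EXTRA = str.maketrans({
    "\u2014": "-",                 # em dash
    "\u2013": "-",                 # en dash
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": "'", "\u201d": "'",  # curly double quotes (plain " is forbidden)
})
FORBIDDEN = re.compile(r'[\\/:*?"<>|\x00-\x1f]')
RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{i}" for prefix in ("COM", "LPT") for i in range(1, 10)
}

def safe_filename(title, replacement="_"):
    """Turn an arbitrary Unicode title into an ASCII, Windows-safe file name."""
    text = unicodedata.normalize("NFKD", title.translate(EXTRA))
    text = text.encode("ascii", "ignore").decode("ascii")  # drops accents, CJK, etc.
    text = FORBIDDEN.sub(replacement, text)
    text = text.rstrip(" .")  # Windows does not allow a trailing space or dot
    if text.split(".")[0].upper() in RESERVED:  # e.g. "con" or "aux.txt"
        text = replacement + text
    return text or replacement

print(safe_filename("Alien³"))
```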

BeefofAges
Jun 5, 2004

Cry 'Havoc!', and let slip the cows of war.

How about os.path.normpath()? http://docs.python.org/library/os.path.html#os.path.normpath

I'm not sure if it can take in unicode, but it's worth trying.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
Make sure the user is actually using a filesystem that doesn't support Unicode before you start mangling names. NTFS supports every single character other than \ / : * ? " < > |.

Scaevolus
Apr 16, 2007

Plorkyeran posted:

Make sure the user is actually using a filesystem that doesn't support Unicode before you start mangling names. NTFS supports every single character other than \ / : * ? " < > |.
I assume you meant printable characters, but you also can't have NULs and a few other things.

Wikipedia posted:

The Windows kernel forbids the use of characters in range 1-31 (i.e., 0x01-0x1F) and characters " * : < > ? \ / |. Although NTFS allows each path component (directory or filename) to be 255 characters long and paths up to about 32767 characters long, the Windows kernel only supports paths up to 259 characters long. Additionally, Windows forbids the use of the MS-DOS device names AUX, CLOCK$, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, CON, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9, NUL and PRN, as well as these names with any extension (for example, AUX.txt), except when using Long UNC paths (ex. \\.\C:\nul.txt or \\?\D:\aux\con). (In fact, CLOCK$ may be used if an extension is provided.)

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Plorkyeran posted:

Make sure the user is actually using a filesystem that doesn't support Unicode before you start mangling names. NTFS supports every single character other than \ / : * ? " < > |.

I have no way to guarantee that it's going to be on NTFS without making that a system requirement...which I'd rather not do.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
Try to create the file with the unmangled name and only mangle it if that fails. If you can create a file with a Unicode filename, the filesystem supports Unicode.

Lurchington
Jan 2, 2003

Forums Dragoon

Plorkyeran posted:

Try to create the file with the unmangled name and only mangle it if that fails. If you can create a file with a Unicode filename, the filesystem supports Unicode.

I ran into this on a script I wrote that took TV episode titles and used them to name files.

I removed the characters disallowed by NTFS, then attempted to write the file with the unicode characters. If an exception was thrown, I encoded as ascii. I didn't run into any cases other than accented characters (which encode to ascii somewhat gracefully)
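A sketch of that approach (Python 3): strip the characters NTFS forbids, try the Unicode name first, and fall back to an NFKD/ASCII version only if creating the file fails. Catching `OSError` this broadly is a simplification (a permissions problem would also trigger the retry, which would then fail again), and `open_for_title` is a hypothetical name.

```python
import os
import re
import unicodedata

NTFS_FORBIDDEN = re.compile(r'[\\/:*?"<>|\x00-\x1f]')

def open_for_title(directory, title, ext=".txt"):
    """Create a file named after `title`, keeping Unicode when the filesystem allows it."""
    name = NTFS_FORBIDDEN.sub("", title) + ext
    try:
        return open(os.path.join(directory, name), "w", encoding="utf-8")
    except (UnicodeEncodeError, OSError):
        # Fallback: decompose accented letters, then drop anything non-ASCII.
        ascii_name = (unicodedata.normalize("NFKD", name)
                      .encode("ascii", "ignore").decode("ascii"))
        return open(os.path.join(directory, ascii_name), "w", encoding="utf-8")
```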

BattleMaster
Aug 14, 2000

I'm having a problem getting a script to work. I've traced the problem back to getopt returning two empty lists no matter what arguments are supplied to the script. Here's how it is being used:

code:
opts, args = getopt.getopt(sys.argv[1:], "bcsuo:", ["memspace=", "native-file="])
I actually didn't write the program this is used in (it's pmImgCreator.py for PyMite), but I'm pulling my hair out trying to get it to work. I've looked at the reference for getopt and everything seems right. Additionally, no one else is complaining about it on the project's Google group. It just isn't working no matter what combination of arguments I've tried using it with. Even if I just tack a load of gibberish non-options onto the command line, args is empty.

Is getopt just plain broken in Windows? I've tried running it with Python 2.6.4 and even 2.5.4 with no luck.

leterip
Aug 25, 2004

BattleMaster posted:

I'm having a problem getting a script to work. I've traced the problem back to getopt returning two empty lists no matter what arguments are supplied to the script. Here's how it is being used:

code:
opts, args = getopt.getopt(sys.argv[1:], "bcsuo:", ["memspace=", "native-file="])
I actually didn't write the program this is used in (it's pmImgCreator.py for PyMite), but I'm pulling my hair out trying to get it to work. I've looked at the reference for getopt and everything seems right. Additionally, no one else is complaining about it on the project's Google group. It just isn't working no matter what combination of arguments I've tried using it with. Even if I just tack a load of gibberish non-options onto the command line, args is empty.

Is getopt just plain broken in Windows? I've tried running it with Python 2.6.4 and even 2.5.4 with no luck.

Are you sure sys.argv[1:] has data in it? You'd naturally assume it does, so it's easy to overlook. That same script works perfectly for me on OS X.

BattleMaster
Aug 14, 2000

leterip posted:

Are you sure sys.argv[1:] has data in it? You'd naturally assume it does, so it's easy to overlook. That same script works perfectly for me on OS X.

You're right, it's empty. Clearly the arguments are not being passed to the script.

I tried running it with "c:\python25\python.exe pmImgCreator.py [arguments]" rather than "pmImgCreator.py [arguments]" and it seems to work now. Something must have been wrong with the Windows file associations. Thanks for the advice.
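For anyone hitting the same thing: a common cause is a Windows file association for .py files whose command passes only the script path ("%1") and not the remaining arguments (%*). A quick way to separate getopt problems from missing arguments is to print sys.argv and feed getopt an explicit list. The option spec is copied from the post; `parse` is a hypothetical wrapper.

```python
import getopt
import sys

def parse(argv):
    """Parse pmImgCreator-style options from an explicit argument list."""
    return getopt.getopt(argv, "bcsuo:", ["memspace=", "native-file="])

if __name__ == "__main__":
    # If this shows only the script path, the shell never passed the arguments on.
    print("sys.argv =", sys.argv)
    opts, args = parse(sys.argv[1:])
    print("opts =", opts)
    print("args =", args)
```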

UberJumper
May 20, 2007
woop
Does anyone have any good suggestions and recommendations for good practices when creating python packages?

Basically, I am tasked with combining a dozen or so modules. A lot of them have overlapping functionality, and in some cases functions have just been copied and pasted.

The other important part is that certain methods have two or three different ways of doing the same thing, depending on which modules exist on the system. For example, users on Python 2.4 will generally have win32com, but in 2.5 win32com is not installed by default with the ArcGIS Python install, so we wrote a method that does the same thing via ctypes.

So there is a lot of:

code:
if user_has_win32com:
  do_win32_foo()
elif user_has_ctypes:
  do_ctypes_foo()
else:
  do_something_else()
A lot of the methods need to keep and interact with global variables: there is always a reference to the geoprocessor, the user's toolbox path, the ArcGIS toolbox path, etc. Most of these variables are populated when the module is imported.

So basically, this is what I am looking at right now:

code:
foo package\
  __init__.py (populate global variables, and figure out what the user has installed on their system)
  common.py (all of the global variables used within the package)
  constants.py (all of the constants used)
  error.py (exceptions, used by this package, along with some error handling routines)
  log.py (custom logging handlers)
  path.py (basically versions of os.path abspath, exists, etc, these basically do some extra work for arcgis geoprocessor specific api stuff)
  _gp.py (geoprocessor specific functions, these would be imported into __init__.py, so the user can simply do foo.<some gp function>)
But what I am wondering is whether it's a good idea to make _gp.py into a subpackage and split it into categories, since I don't really want to put 60-odd random functions together. These would most likely be categorized by type, so we would have foo.gp.raster for all the raster methods. However, I feel like that's going to get irritating for anyone to use, since we have to use the full namespace path (:cry:) for internal reasons.

The second thing I am tasked with is giving these modules a debugging logger, so that certain key bits of information are recorded. I know I can simply use

code:
_logger = logging.getLogger(__name__)
However, that does not seem to work well, since I basically want all the loggers to be children of a common logger, e.g. foo.path should use the logger name foo.path. Any suggestions for this?

Can anyone shed some light on good practices for packages, and is there anything terribly wrong with my design?
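`logging.getLogger(__name__)` in each module should already give this: inside foo/path.py, `__name__` is "foo.path", and the logging module treats dotted names as a hierarchy, so records propagate up to whatever is attached to the "foo" logger. The sketch below spells the names out because it runs as a single script; `ListHandler` is a hypothetical in-memory handler standing in for a real one. For a library, the logging documentation recommends attaching only a `logging.NullHandler` to the package logger and leaving real handlers to the application.

```python
import logging

class ListHandler(logging.Handler):
    """Hypothetical handler that just collects records in memory."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

# foo/__init__.py: configure the package-wide parent logger once.
package_logger = logging.getLogger("foo")
package_logger.setLevel(logging.DEBUG)
handler = ListHandler()
package_logger.addHandler(handler)

# foo/path.py: in a real package this would be logging.getLogger(__name__).
path_logger = logging.getLogger("foo.path")
path_logger.debug("resolved %s", "C:/toolbox")  # propagates up to "foo"
```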

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

UberJumper posted:

Does anyone have any good suggestions and recommendations for good practices when creating python packages?

It's probably best to just end it all now. Death is the only escape from distutils/setuptools et al.

(I'll discuss more in detail later. This is as much a reminder for me to write a big pile of words as it is me being snarky.)

Lono
Jun 30, 2004

In a world of thieves, the only final sin is stupidity.
We have a site that uses some hosted FileMaker :downs: databases to serve up some webpages and act as an e-commerce site. The host does not do a good job of monitoring the databases, and they close two or three times a week, which keeps the site from loading.

I'm sick of getting calls about the site going down and being the only person in the company that has the capability to pick up the phone and call the host to have them open the files again.

I put this together (this is my first ever attempt at something like this) to check the URL and, if I get an HTTP error response, send an email to our host's support inbox.

I was wondering if any of you saw any problems with my script. Also, is there a way to run this with cron on OSX or is there a better way to run this? I want the script to run every 10 minutes.


code:
import smtplib
import socket
import urllib2
from email.mime.text import MIMEText

smtpuser = 'mygmailaddress@gmail.com'
smtppass = 'mygmailpass'
recipients = ['support@host.com', 'me@mycompany.com', 'boss@mycompany.com']
url = 'http://www.example.com/urlthatshouldbeup'


def send_alert():
    # Build the alert from the text file only when it is actually needed.
    with open('mailtext.txt', 'rb') as body:
        msg = MIMEText(body.read())
    msg['Subject'] = 'Files Closed'
    msg['From'] = smtpuser
    msg['To'] = 'support@host.com'

    mailServer = smtplib.SMTP('smtp.gmail.com', 587)
    mailServer.ehlo()
    mailServer.starttls()
    mailServer.ehlo()
    mailServer.login(smtpuser, smtppass)
    mailServer.sendmail(smtpuser, recipients, msg.as_string())
    mailServer.quit()  # send QUIT instead of just dropping the connection


try:
    # Without a timeout, a hung server would block the check indefinitely.
    connection = urllib2.urlopen(url, timeout=30)
    connection.close()
except (urllib2.URLError, socket.timeout):
    # URLError also covers DNS failures and refused connections;
    # HTTPError (4xx/5xx responses) is a subclass, so it is still caught.
    send_alert()

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

UberJumper posted:

The other important part is that certain methods have two or three different ways of doing the same thing, depending on which modules exist on the system. For example, users on Python 2.4 will generally have win32com, but in 2.5 win32com is not installed by default with the ArcGIS Python install, so we wrote a method that does the same thing via ctypes.

Why not just specify a dependency on the ctypes distribution when you're in Python 2.4? If you're using setuptools, it's as easy as adding install_requires = ['ctypes'] to your setup call (when running Python 2.4).

UberJumper posted:

code:
if user_has_win32com:
  do_win32_foo()
elif user_has_ctypes:
  do_ctypes_foo()
else:
  do_something_else()

More to the point though, this should happen only once per file (if at all). What's the point in doing this for every function/function call? It just makes code hard to read. Do this:

code:
if user_has_win32com:
   def foo(): pass
elif user_has_ctypes:
   def foo(): pass
else:
   def foo(): pass
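A runnable version of that pattern, doing the feature detection once at import time with try/except ImportError. win32com comes from pywin32; `backend_name` and its return values are placeholders for the real implementations.

```python
# Detect optional dependencies once, when the module is imported.
try:
    import win32com.client  # provided by pywin32 (Windows only)
    HAVE_WIN32COM = True
except ImportError:
    HAVE_WIN32COM = False

try:
    import ctypes  # bundled with Python 2.5+, a separate install on 2.4
    HAVE_CTYPES = True
except ImportError:
    HAVE_CTYPES = False

# Bind exactly one implementation; callers never repeat the checks.
if HAVE_WIN32COM:
    def backend_name():
        return "win32com"  # placeholder for the win32com implementation
elif HAVE_CTYPES:
    def backend_name():
        return "ctypes"  # placeholder for the ctypes implementation
else:
    def backend_name():
        return "fallback"  # placeholder for the pure-Python fallback
```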

UberJumper posted:

code:
foo package\
  __init__.py (populate global variables, and figure out what the user has installed on their system)
...
__init__.py, so the user can simply do foo.<some gp function>)

Why is there a root-level __init__.py? That doesn't really even make sense.

UberJumper
May 20, 2007
woop

Avenging Dentist posted:

Why not just specify a dependency on the ctypes distribution when you're in Python 2.4? If you're using setuptools, it's as easy as adding install_requires = ['ctypes'] to your setup call (when running Python 2.4).

Didn't realize that; I'll take a look at it. The ctypes code path only serves a very small group of users who don't have pywin32 installed.

quote:

More to the point though, this should happen only once per file (if at all). What's the point in doing this for every function/function call? It just makes code hard to read. Do this:

code:
if user_has_win32com:
   def foo(): pass
elif user_has_ctypes:
   def foo(): pass
else:
   def foo(): pass

That's awesome, I love it.

quote:

Why is there a root-level __init__.py? That doesn't really even make sense.

I thought all Python packages required an __init__.py file in the root of the package?

Avenging Dentist
Oct 1, 2005

oh my god is that a circular saw that does not go in my mouth aaaaagh

UberJumper posted:

I thought by default all python packages require an __init__.py file, in the root of the package?

The root of the package, yes (i.e. foo_package/__init__.py). Not in the directory containing the package directory. It's like index.html for a website.
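To make the layout concrete, this sketch writes a throwaway package to a temporary directory: `__init__.py` sits inside the package directory, nothing sits in the directory containing it, and that containing directory is what goes on sys.path. The module contents and `build_demo_package` are hypothetical.

```python
import importlib
import os
import sys
import tempfile

def build_demo_package(root):
    """Create root/foo_package/{__init__.py, path.py}; root itself gets no __init__.py."""
    pkg = os.path.join(root, "foo_package")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        # Re-export so callers can write foo_package.exists_anywhere(...)
        f.write("from .path import exists_anywhere\n")
    with open(os.path.join(pkg, "path.py"), "w") as f:
        f.write(
            "import os\n\n"
            "def exists_anywhere(*paths):\n"
            "    return any(os.path.exists(p) for p in paths)\n"
        )
    return pkg

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_demo_package(root)
        sys.path.insert(0, root)  # the directory *containing* the package
        try:
            foo_package = importlib.import_module("foo_package")
            print(foo_package.exists_anywhere(root))
        finally:
            sys.path.remove(root)
```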


UberJumper
May 20, 2007
woop

Avenging Dentist posted:

The root of the package, yes (i.e. foo_package/__init__.py). Not in the directory containing the package directory. It's like index.html for a website.

Blah, thanks. I think I just screwed up my example.
