Python information and short questions megathread.

nbv4: Aug 21, 2002; by Duchess Gummybuns

Avenging Dentist posted:

Ok so strip out all elements more than a standard deviation from the mean:
code:
A[(A < A.mean()+A.std()) & (A > A.mean()-A.std())]

awesome, that's perfect!

# ? Feb 8, 2010 19:48

spankweasel: Jan 4, 2006

Ugh. I'm frustrated with trying to extend unittest so that each test case runs in its own thread. I have a collection of tests which take 15-20 minutes each to run, but each one does something completely different from the others (install VirtualBox guests), and running them in serial will take too long.

Has anybody screwed around with trying to thread test cases?

# ? Feb 8, 2010 22:27

good jovi: Dec 11, 2000; I'm pro-dickgirl, and I VOTE!

spankweasel posted:

Ugh. I'm frustrated with trying to extend unittest so that each test case runs in its own thread. I have a collection of tests which take 15-20 minutes each to run, but each one does something completely different from the others (install VirtualBox guests), and running them in serial will take too long.

Has anybody screwed around with trying to thread test cases?

Have you tried using nose with its multiprocess plugin?
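For anyone who wants to stay inside the standard library, here is a minimal Python 3 sketch of the general idea: each TestCase class gets its own thread and its own TestResult. This is not how nose's multiprocess plugin works; the two test classes and the helper names (`run_case`, `run_in_threads`) are made up for illustration. Threads only help when the tests spend their time waiting on external work (such as a VM install) rather than running Python code.

```python
import unittest
from concurrent.futures import ThreadPoolExecutor


class GuestATest(unittest.TestCase):
    """Placeholder for one slow, independent test case."""

    def test_install(self):
        self.assertEqual(1 + 1, 2)


class GuestBTest(unittest.TestCase):
    """Placeholder for another slow, independent test case."""

    def test_install(self):
        self.assertTrue(True)


def run_case(case_class):
    """Run every test in one TestCase class and summarise the outcome."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case_class)
    result = unittest.TestResult()  # one result object per thread
    suite.run(result)
    return case_class.__name__, result.testsRun, result.wasSuccessful()


def run_in_threads(case_classes):
    """Run each TestCase class in its own worker thread, keeping input order."""
    workers = max(1, len(case_classes))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_case, case_classes))


if __name__ == "__main__":
    for name, count, ok in run_in_threads([GuestATest, GuestBTest]):
        print(name, count, "ok" if ok else "FAILED")
```

Tests that share global state (temp directories, ports, module-level variables) would still need to be isolated from each other before running them this way.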

# ? Feb 8, 2010 23:27

tripwire: Nov 19, 2004; ghost flow

Avenging Dentist posted:

Ok so strip out all elements more than a standard deviation from the mean:
code:
A[(A < A.mean()+A.std()) & (A > A.mean()-A.std())]

drat. Numpy's fancy array indexing is awesome.
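To spell out what that one-liner does, here is a small sketch with made-up sample data (the array values are an assumption, not from the thread). The comparison builds a boolean mask, and indexing with the mask keeps only the elements where it is True.

```python
import numpy as np

# Made-up sample data with one obvious outlier.
A = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 50.0])

mean, std = A.mean(), A.std()

# Element-wise comparisons give boolean arrays; & combines them element-wise.
mask = (A > mean - std) & (A < mean + std)

# Boolean-mask indexing returns a new array with only the selected elements.
kept = A[mask]
print(kept)
```

Computing the mean and standard deviation once, as here, also avoids recalculating them twice each as the one-liner does.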

# ? Feb 9, 2010 03:36

Captain Capacitor: Jan 21, 2008; The code you say?

I'm trying out a build of Python 3.2 from the Mercurial branch, and I came across this tidbit after I encountered some build errors on OSX.

README posted:

On OSX and Cygwin, the executable is called python.exe; elsewhere it's just python.

Anyone have any idea why it's named as such now? :psyduck:

# ? Feb 9, 2010 15:48

good jovi: Dec 11, 2000; I'm pro-dickgirl, and I VOTE!

Captain Capacitor posted:

Anyone have any idea why it's named as such now?

I don't know if this is exactly what you're asking, but that's just the pre-installation binary. It's that way in 2.6 as well. As for why, I cannot imagine.

# ? Feb 9, 2010 16:30

spankweasel: Jan 4, 2006

Sailor_Spoon posted:

Have you tried using nose with its multiprocess plugin?

I'll give this a shot today. Thanks for the idea.

# ? Feb 9, 2010 16:37

m0nk3yz: Mar 13, 2002; Behold the power of cheese!

Captain Capacitor posted:

I'm trying out a build of Python 3.2 from the Mercurial branch, and I came across this tidbit after I encountered some build errors on OSX.

Anyone have any idea why it's named as such now?

The original reason was that HFS+ in normal operation, like Windows, is case insensitive. Having a Python directory and a binary named "python" in the same place makes things unhappy.

# ? Feb 10, 2010 03:51

maskenfreiheit: Dec 30, 2004

Edit: doublepost

maskenfreiheit fucked around with this message at 01:22 on Mar 13, 2017

# ? Feb 10, 2010 05:18

ATLbeer: Sep 26, 2004; Über nerd

GregNorc posted:

How intelligent is python about using resources?

For example, I'm writing a small script that takes a dictionary file and outputs all the permutations a word could have. (Input butts, get butt$, bu++s, b|_|tts, etc). Relatively simple, but when you have a 50mb dictionary file it's not instantaneous either.

I'm also looking at other stuff too, like crawling through larger datasets (wikipedia dumps, project Gutenberg book collections, etc)

I figure it's good to be thinking about efficiency early with a small project like this.

While I don't expect python to be incredibly efficient/intelligent, is it at least smart enough to spread something like this across both cores of a machine, or am I going to have to read up on multithreading or something to achieve that? (so if I have [x] cores, it'll split the task amongst them?)

http://docs.python.org/library/multiprocessing.html

# ? Feb 10, 2010 06:02

Captain Capacitor: Jan 21, 2008; The code you say?

ATLbeer posted:

http://docs.python.org/library/multiprocessing.html

Just came from a talk at a local Python authors group, someone gave a talk about Twisted and Multiprocessing

# ? Feb 10, 2010 07:06

maskenfreiheit: Dec 30, 2004

Edit: doublepost

maskenfreiheit fucked around with this message at 01:23 on Mar 13, 2017

# ? Feb 10, 2010 16:18

tehk: Mar 10, 2006; [-4] Flaw: Heart Broken - Tehk is extremely lonely. The Gay Empire's ultimate weapon finds it hard to have time for love.

GregNorc posted:

So without using that library, python will restrict all work to one core?

One thread.

edit:link added

tehk fucked around with this message at 16:35 on Feb 10, 2010

# ? Feb 10, 2010 16:32

king_kilr: May 25, 2007

tehk posted:

One thread.

edit:link added

No, you can have multiple threads (using the threading module), and your operating system's scheduler might even assign them to multiple cores; however, because of the GIL they will not run Python code at the same time.
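To make the multiprocessing suggestion concrete for the word-variant script, here is a rough Python 3 sketch. The substitution table and the function names are assumptions, since the actual script was never posted. Each word is handled independently, so the work can be handed to a pool of worker processes, each with its own interpreter.

```python
from itertools import product
from multiprocessing import Pool

# Hypothetical substitution table; the real script's rules are not in the thread.
SUBSTITUTIONS = {"s": ["s", "$"], "t": ["t", "+"], "u": ["u", "|_|"]}


def variants(word):
    """Return every spelling of word allowed by SUBSTITUTIONS."""
    choices = [SUBSTITUTIONS.get(ch, [ch]) for ch in word]
    return ["".join(combo) for combo in product(*choices)]


def all_variants(words, processes=2):
    """Spread the per-word work across worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(variants, words)


if __name__ == "__main__":
    # The guard matters: on platforms that spawn workers, each worker
    # re-imports this module and must not start a pool of its own.
    for word, spelled in zip(["butts", "cat"], all_variants(["butts", "cat"])):
        print(word, len(spelled))
```

For a 50 MB dictionary it would probably be worth passing a `chunksize` to `pool.map` so that words are shipped to the workers in batches rather than one at a time.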

# ? Feb 10, 2010 17:19

m0nk3yz: Mar 13, 2002; Behold the power of cheese!

Captain Capacitor posted:

Just came from a talk at a local Python authors group, someone gave a talk about Twisted and Multiprocessing

Twisted and multiprocessing? I'm interested in hearing more.

# ? Feb 10, 2010 17:49

king_kilr: May 25, 2007

m0nk3yz posted:

Twisted and multiprocessing? I'm interested in hearing more.

I'm interested in hearing less. I mean twisted's ok (in an 800lb gorilla sense of the term), and multiprocessing is pretty snazzy, but BOTH AT ONCE? I'm pretty sure twisted isn't even threadsafe.

# ? Feb 10, 2010 18:26

m0nk3yz: Mar 13, 2002; Behold the power of cheese!

king_kilr posted:

I'm interested in hearing less. I mean twisted's ok (in an 800lb gorilla sense of the term), and multiprocessing is pretty snazzy, but BOTH AT ONCE? I'm pretty sure twisted isn't even threadsafe.

It depends; is it multiprocessing, or process management within twisted, which is completely different?

# ? Feb 11, 2010 00:11

tehk: Mar 10, 2006; [-4] Flaw: Heart Broken - Tehk is extremely lonely. The Gay Empire's ultimate weapon finds it hard to have time for love.

king_kilr posted:

No, you can have multiple threads (using the threading module), and your operating system's scheduler might even assign them to multiple cores; however, because of the GIL they will not run Python code at the same time.

I meant that without using a module like threading or multiprocessing you will be stuck on one thread. I assumed his question was indirectly asking "What is python's default behavior in terms of concurrency/multithreading/whatever"

# ? Feb 11, 2010 03:08

Captain Capacitor: Jan 21, 2008; The code you say?

m0nk3yz posted:

Twisted and multiprocessing? I'm interested in hearing more.

I've asked the presenter to send me a copy at his convenience, I'll post a link to it at some point.

How he arranged it was using Twisted to handle the web service and start/stop workers created using the multiprocessing package.

For those (like myself) who have a less than keen interest in Twisted, I've heard good things about Celery

# ? Feb 11, 2010 04:34

m0nk3yz: Mar 13, 2002; Behold the power of cheese!

Captain Capacitor posted:

I've asked the presenter to send me a copy at his convenience, I'll post a link to it at some point.

How he arranged it was using Twisted to handle the web service and start/stop workers created using the multiprocessing package.

For those (like myself) who have a less than keen interest in Twisted, I've heard good things about Celery

Yeah, using multiprocessing for work start/stop is common - celery uses multiprocessing inside of it as well.

# ? Feb 11, 2010 19:53

ATLbeer: Sep 26, 2004; Über nerd

Perfectly timed for the GIL page of this megathread

http://ec2-174-129-96-143.compute-1.amazonaws.com/index.html

# ? Feb 11, 2010 19:59

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Is there a good library for sanitizing unicode text to Windows safe filenames while using some sort of heuristics for converting characters?

I have just been using this:

code:

str.encode("ascii", "ignore")

However, this isn't very robust.

My app takes movie titles from imdb.com and uses them to name files. Some example heuristics that I would like:

1. Alien³ should be converted to "Alien 3".
2. "—" (em dash) should be converted to "-".

The problem is that I keep adding special cases for all these things and it feels like there should be a better way, or at least that someone else would have already come up with a bunch of these special cases.

# ? Feb 11, 2010 23:02

ATLbeer: Sep 26, 2004; Über nerd

Thermopyle posted:

Is there a good library for sanitizing unicode text to Windows safe filenames while using some sort of heuristics for converting characters?

I have just been using this:
code:
str.encode("ascii", "ignore")
However, this isn't very robust.

My app takes movie titles from imdb.com and uses them to name files. Some example heuristics that I would like:

1. Alien³ should be converted to "Alien 3".
2. "—" (em dash) should be converted to "-".

The problem is that I keep adding special cases for all these things and it feels like there should be a better way, or at least that someone else would have already come up with a bunch of these special cases.

Those are some nasty edge cases. I don't think something like that would exist in the way you want, because should Alien³ be Alien 3 or Alien^3 or Alien*Alien*Alien?

# ? Feb 11, 2010 23:45

deimos: Nov 30, 2006; Forget it man this bat is whack, it's got poobrain!

Thermopyle posted:

Is there a good library for sanitizing unicode text to Windows safe filenames while using some sort of heuristics for converting characters?

I have just been using this:
code:
str.encode("ascii", "ignore")
However, this isn't very robust.

My app takes movie titles from imdb.com and uses them to name files. Some example heuristics that I would like:

1. Alien³ should be converted to "Alien 3".
2. "—" (em dash) should be converted to "-".

The problem is that I keep adding special cases for all these things and it feels like there should be a better way, or at least that someone else would have already come up with a bunch of these special cases.

Behold:

code:

>>> from unicodedata import normalize
>>> b = u"áñejo"
>>> normalize('NFKD', b).encode('ascii', 'ignore')
'anejo'

MAGIC!

Sure, it doesn't do everything you want, but hey, it's a solution.
Note: I couldn't test your examples.
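For what it's worth, here is a Python 3 sketch of the same idea applied to the two examples from the question. NFKD folds the superscript three to a plain "3" (so you get "Alien3", not "Alien 3"), but it has no decomposition for the em dash, which would simply be dropped. The small `EXTRA` table and the name `to_ascii` are assumptions for illustration, not an existing library.

```python
import unicodedata

# Hand-picked replacements for characters NFKD cannot fold to ASCII.
# Illustrative only; a real table would need many more entries.
EXTRA = {
    "\u2014": "-",  # em dash
    "\u2013": "-",  # en dash
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def to_ascii(title):
    """Fold a Unicode title to plain ASCII, dropping what cannot be mapped."""
    replaced = "".join(EXTRA.get(ch, ch) for ch in title)
    decomposed = unicodedata.normalize("NFKD", replaced)
    # After decomposition, accents are separate combining marks that
    # the "ignore" error handler silently discards.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Alien\u00b3"))
print(to_ascii("WALL\u2014E"))
```

Getting "Alien 3" with the space would still need a special case, which is the kind of judgment call a generic library cannot make for you.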

# ? Feb 12, 2010 00:57

BeefofAges: Jun 5, 2004; Cry 'Havoc!', and let slip the cows of war.

How about os.path.normpath()? http://docs.python.org/library/os.path.html#os.path.normpath

I'm not sure if it can take in unicode, but it's worth trying.

# ? Feb 12, 2010 03:23

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Make sure the user is actually using a filesystem that doesn't support Unicode before you start mangling names. NTFS supports every single character other than \ / : * ? " < > |.

# ? Feb 12, 2010 03:29

Scaevolus: Apr 16, 2007

Plorkyeran posted:

Make sure the user is actually using a filesystem that doesn't support Unicode before you start mangling names. NTFS supports every single character other than \ / : * ? " < > |.

I assume you meant printable characters, but you also can't have NULs and a few other things.

Wikipedia posted:

The Windows kernel forbids the use of characters in range 1-31 (i.e., 0x01-0x1F) and characters " * : < > ? \ / |. Although NTFS allows each path component (directory or filename) to be 255 characters long and paths up to about 32767 characters long, the Windows kernel only supports paths up to 259 characters long. Additionally, Windows forbids the use of the MS-DOS device names AUX, CLOCK$, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, CON, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9, NUL and PRN, as well as these names with any extension (for example, AUX.txt), except when using Long UNC paths (ex. \\.\C:\nul.txt or \\?\D:\aux\con). (In fact, CLOCK$ may be used if an extension is provided.)
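The rules in that quote can be sketched as a small Python 3 helper. The function name `windows_safe` and the underscore replacement are assumptions. It deliberately errs on the side of caution: it treats CLOCK$ like the other device names even with an extension, and it does not deal with the path length limits.

```python
import re

# Control characters 0x00-0x1F plus " * : < > ? \ / |
FORBIDDEN = re.compile(r'[\x00-\x1f"*:<>?\\/|]')

# MS-DOS device names listed in the quote above.
RESERVED = (
    {"AUX", "CLOCK$", "CON", "NUL", "PRN"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)


def windows_safe(name, replacement="_"):
    """Replace forbidden characters and defuse reserved device names."""
    cleaned = FORBIDDEN.sub(replacement, name)
    # "aux.txt" is as unusable as "aux", so compare the part before the first dot.
    stem = cleaned.split(".", 1)[0]
    if stem.upper() in RESERVED:
        cleaned = replacement + cleaned
    return cleaned


print(windows_safe('What? Who: "Me"'))
print(windows_safe("aux.txt"))
```

Unicode characters pass through untouched here, which matches the point made earlier in the thread that NTFS itself has no problem with them.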

# ? Feb 12, 2010 11:58

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Plorkyeran posted:

Make sure the user is actually using a filesystem that doesn't support Unicode before you start mangling names. NTFS supports every single character other than \ / : * ? " < > |.

I have no way to guarantee that it's going to be on NTFS without making that a system requirement...which I'd rather not do.

# ? Feb 12, 2010 19:39

Plorkyeran: Mar 22, 2007; To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Try to open the filename with the unmangled name and only mangle it if that fails. If you can create a file with a unicode filename, the filesystem supports unicode.

# ? Feb 12, 2010 20:21

Lurchington: Jan 2, 2003; Forums Dragoon

Plorkyeran posted:

Try to open the filename with the unmangled name and only mangle it if that fails. If you can create a file with a unicode filename, the filesystem supports unicode.

I ran into this on a script I wrote that took tv episode titles and used them to name files.

I removed the characters disallowed by NTFS, then attempted to write the file with the unicode characters. If an exception was thrown, I encoded as ascii. I didn't run into any cases other than accented characters (which encode to ascii somewhat gracefully)
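A Python 3 sketch of that try-first, mangle-on-failure approach might look like the following. The helper names are made up, and the original script was not posted, so this is a guess at the shape rather than a copy of it.

```python
import os
import tempfile
import unicodedata


def ascii_fallback(name):
    """Fold accented characters to their closest ASCII form."""
    folded = unicodedata.normalize("NFKD", name)
    return folded.encode("ascii", "ignore").decode("ascii")


def write_with_fallback(directory, name, data):
    """Write data under name, retrying with an ASCII-folded name on failure."""
    for candidate in (name, ascii_fallback(name)):
        path = os.path.join(directory, candidate)
        try:
            with open(path, "wb") as handle:
                handle.write(data)
            return path
        except (OSError, UnicodeError):
            # The filesystem or locale rejected this spelling; try the next one.
            continue
    raise OSError("could not create a file for %r" % name)


with tempfile.TemporaryDirectory() as scratch:
    written = write_with_fallback(scratch, "a\u00f1ejo.txt", b"notes")
    print(os.path.basename(written))
```

Characters the filesystem forbids outright would still have to be removed before calling this, as described above.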

# ? Feb 12, 2010 21:00

BattleMaster: Aug 14, 2000

I'm having a problem getting a script to work. I've traced the problem back to getopt returning two empty lists no matter what arguments are supplied to the script. Here's how it is being used:

code:

opts, args = getopt.getopt(sys.argv[1:], "bcsuo:", ["memspace=", "native-file="])

I actually didn't write the program this is used in (it's pmImgCreator.py for PyMite), but I'm pulling my hair out trying to get it to work. I've looked at the reference for getopt and everything seems right. Additionally, no one else is complaining about it on the project's Google group. It just isn't working no matter what combination of arguments I've tried using it with. Even if I just tack a load of gibberish non-options onto the command line, args is empty.

Is getopt just plain broken in Windows? I've tried running it with Python 2.6.4 and even 2.5.4 with no luck.

# ? Feb 13, 2010 08:25

leterip: Aug 25, 2004

BattleMaster posted:

I'm having a problem getting a script to work. I've traced the problem back to getopt returning two empty lists no matter what arguments are supplied to the script. Here's how it is being used:
code:
opts, args = getopt.getopt(sys.argv[1:], "bcsuo:", ["memspace=", "native-file="])
I actually didn't write the program this is used in (it's pmImgCreator.py for PyMite), but I'm pulling my hair out trying to get it to work. I've looked at the reference for getopt and everything seems right. Additionally, no one else is complaining about it on the project's Google group. It just isn't working no matter what combination of arguments I've tried using it with. Even if I just tack a load of gibberish non-options onto the command line, args is empty.

Is getopt just plain broken in Windows? I've tried running it with Python 2.6.4 and even 2.5.4 with no luck.

Are you sure sys.argv[1:] has data in it? It seems likely to have data, so that might be something easily overlooked. That same script works perfectly for me on OS X.

# ? Feb 13, 2010 22:29

BattleMaster: Aug 14, 2000

leterip posted:

Are you sure sys.argv[1:] has data in it? It seems likely to have data, so that might be something easily overlooked. That same script works perfectly for me on OS X.

You're right, it's empty. Clearly the arguments are not being passed to the script.

I tried running it with "c:\python25\python.exe pmImgCreator.py [arguments]" rather than "pmImgCreator.py [arguments]" and it seems to work now. Something must have been wrong with the Windows file associations. Thanks for the advice.
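For anyone who hits the same thing, a quick way to separate "getopt is broken" from "the arguments never arrived" is to feed getopt an explicit list instead of sys.argv. A small sketch follows; the option spec is the one from the script above, while the argument values are invented for the demonstration.

```python
import getopt


def parse(argv):
    """Parse argv with the same option spec as the call quoted above."""
    return getopt.getopt(argv, "bcsuo:", ["memspace=", "native-file="])


# With real arguments, both lists are populated.
opts, args = parse(["-b", "-o", "out.bin", "--memspace=flash", "main.py"])
print(opts, args)

# With an empty argument list, getopt returns two empty lists, which is
# exactly the symptom described when sys.argv[1:] is empty.
print(parse([]))
```

If the explicit list parses fine, the problem lies in how the script is launched, not in getopt.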

# ? Feb 13, 2010 22:52

UberJumper: May 20, 2007; woop

Does anyone have any good suggestions and recommendations for good practices when creating python packages?

Basically i am tasked with combining a dozen or so modules, a lot of them have overlapping functionality, in some cases certain functions have just been copied and pasted.

The other important part is, certain methods have 2-3 different ways of doing certain things depending on what modules currently exist in the system. For example if the user has python 2.4 then generally they will have win32com, however in 2.5 win32com is not installed by default with ArcGIS python install, so we write a method of doing the same thing via ctypes.

So there is a lot of:

code:

if user_has_win32com:
  do_win32_foo()
elif user_has_ctypes:
  do_ctypes_foo()
else:
  do_something_else()

A lot of the methods all need to keep and interact with global variables, there is always a reference to the geoprocessor, user's toolbox path, arcgis toolbox path, etc. Most of these variables are populated generally when the module is imported.

So basically this is what i am looking at right now:

code:

foo package\
  __init__.py (populate global variables, and figure out what the user has installed on their system)
  common.py (all of the global variables used within the package)
  constants.py (all of the constants used)
  error.py (exceptions, used by this package, along with some error handling routines)
  log.py (custom logging handlers)
  path.py (basically versions of os.path abspath, exists, etc, these basically do some extra work for arcgis geoprocessor specific api stuff)
  _gp.py (geoprocessor specific functions, these would be imported into __init__.py, so the user can simply do foo.<some gp function>)

But what I am wondering is whether it's a good idea to make _gp.py into a subpackage and split it into categories, since I don't really want to put 60-odd random functions together. These would most likely be categorized by type, so we would have foo.gp.raster for all the raster methods. However, I kinda feel like that's going to get irritating for anyone to use, since we have to use the full namespace path ( :cry: ) due to internal reasons.

The second thing I am tasked with is that we want these modules to have a debugging logger, so that certain key bits of information are stored. I know I can simply use

code:

_logger = logging.getLogger(__name__)

However that does not seem to work well, since i basically want to add all the loggers as children of a common logger. E.g. such that foo.path uses the logger name foo.path. Any suggestions for this?

Can anyone shed some light on good practices for packages, and is there anything terribly wrong with my design?
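On the logging question, a short sketch of how dotted logger names behave may help. Logger names form a hierarchy, so a logger named "foo.path" is already a child of "foo", and inside the module foo/path.py `__name__` is "foo.path". The `ListHandler` class here is a made-up stand-in for whatever custom handler log.py would provide.

```python
import logging


class ListHandler(logging.Handler):
    """Toy handler that collects records in memory (illustration only)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append((record.name, record.getMessage()))


# Configure the package-level logger once, e.g. in foo/__init__.py.
package_logger = logging.getLogger("foo")
package_logger.setLevel(logging.DEBUG)
collector = ListHandler()
package_logger.addHandler(collector)

# What logging.getLogger(__name__) would return inside foo/path.py.
path_logger = logging.getLogger("foo.path")
path_logger.debug("resolved %s", "C:/data")

print(collector.messages)
```

The child logger needs no handler of its own: records propagate up to "foo", and its effective level is inherited from there as well.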

# ? Feb 16, 2010 17:35

Avenging Dentist: Oct 1, 2005; oh my god is that a circular saw that does not go in my mouth aaaaagh

UberJumper posted:

Does anyone have any good suggestions and recommendations for good practices when creating python packages?

It's probably best to just end it all now. Death is the only escape from distutils/setuptools et al.

(I'll discuss more in detail later. This is as much a reminder for me to write a big pile of words as it is me being snarky.)

# ? Feb 16, 2010 17:37

Lono: Jun 30, 2004; In a world of thieves, the only final sin is stupidity.

We have a site that uses some hosted Filemaker :downs: databases to serve up some webpages and act as an e-commerce site. The host does not do a good job of monitoring the databases and they close 2-3 times a week, causing the site to not load.

I'm sick of getting calls about the site going down and being the only person in the company that has the capability to pick up the phone and call the host to have them open the files again.

I put this together (this is my first ever attempt at something like this) to check the URL and, if I get an HTTP error response, send an email to our host's support inbox.

I was wondering if any of you saw any problems with my script. Also, is there a way to run this with cron on OSX or is there a better way to run this? I want the script to run every 10 minutes.

code:

import urllib2
import smtplib
from email.mime.text import MIMEText

smtpuser = 'mygmailaddress@gmail.com'
smtppass = 'mygmailpass'
recipients = ['support@host.com', 'me@mycompany.com', 'boss@mycompany.com']

body = open('mailtext.txt', 'rb')
msg = MIMEText (body.read())
body.close()
msg['Subject'] = 'Files Closed'
msg['From'] = 'mygmailaddress@gmail.com'
msg['To'] = 'support@host.com'


url = ('http://www.example.com/urlthatshouldbeup')
try:
	connection = urllib2.urlopen(url)
	connection.close()
except urllib2.HTTPError, e:
	mailServer = smtplib.SMTP('smtp.gmail.com' ,587)
	mailServer.ehlo()
	mailServer.starttls()
	mailServer.ehlo()
	mailServer.login(smtpuser, smtppass)
	mailServer.sendmail(smtpuser,recipients,msg.as_string())
	mailServer.close()

# ? Feb 16, 2010 17:52

Avenging Dentist: Oct 1, 2005; oh my god is that a circular saw that does not go in my mouth aaaaagh

UberJumper posted:

The other important part is, certain methods have 2-3 different ways of doing certain things depending on what modules currently exist in the system. For example if the user has python 2.4 then generally they will have win32com, however in 2.5 win32com is not installed by default with ArcGIS python install, so we write a method of doing the same thing via ctypes.

Why not just specify a dependency on the ctypes distribution when you're in Python 2.4? If you're using setuptools, it's as easy as adding install_requires = ['ctypes'] to your setup call (when running Python 2.4).

UberJumper posted:

code:

if user_has_win32com:
  do_win32_foo()
elif user_has_ctypes:
  do_ctypes_foo()
else:
  do_something_else()

More to the point though, this should happen only once per file (if at all). What's the point in doing this for every function/function call? It just makes code hard to read. Do this:

code:

if user_has_win32com:
   def foo(): pass
elif user_has_ctypes:
   def foo(): pass
else:
   def foo(): pass
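Fleshed out a little, the define-once pattern might look like this sketch. The availability flags are worked out with try/except ImportError at import time; `backend_name` is a placeholder for a real function such as foo, and the bodies are stubs because the actual win32com and ctypes code is not shown in the thread.

```python
# Work out what is available once, when the module is imported.
try:
    import win32com.client  # part of pywin32; absent on many systems
    HAVE_WIN32COM = True
except ImportError:
    HAVE_WIN32COM = False

try:
    import ctypes
    HAVE_CTYPES = True
except ImportError:
    HAVE_CTYPES = False

# Pick one implementation; callers never see the branching.
if HAVE_WIN32COM:
    def backend_name():
        """Stub for the win32com-based implementation."""
        return "win32com"
elif HAVE_CTYPES:
    def backend_name():
        """Stub for the ctypes-based implementation."""
        return "ctypes"
else:
    def backend_name():
        """Stub for the last-resort implementation."""
        return "fallback"

print(backend_name())
```

When many functions depend on the same choice, putting each backend in its own module and importing the right one under a common name keeps the branching in a single place.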

UberJumper posted:

code:

foo package\
  __init__.py (populate global variables, and figure out what the user has installed on their system)
...
__init__.py, so the user can simply do foo.<some gp function>)

Why is there a root-level __init__.py? That doesn't really even make sense.

# ? Feb 16, 2010 18:40

UberJumper: May 20, 2007; woop

Avenging Dentist posted:

Why not just specify a dependency on the ctypes distribution when you're in Python 2.4? If you're using setuptools, it's as easy as adding install_requires = ['ctypes'] to your setup call (when running Python 2.4).

Didn't realize that, I'll take a look at it. The ctypes stuff is pretty much for a very small user group who do not have pywin32 installed.

quote:

More to the point though, this should happen only once per file (if at all). What's the point in doing this for every function/function call? It just makes code hard to read. Do this:
code:
if user_has_win32com:
   def foo(): pass
elif user_has_ctypes:
   def foo(): pass
else:
   def foo(): pass

That's awesome, I love it.

quote:

Why is there a root-level __init__.py? That doesn't really even make sense.

I thought by default all python packages require an __init__.py file, in the root of the package?

# ? Feb 16, 2010 19:15

Avenging Dentist: Oct 1, 2005; oh my god is that a circular saw that does not go in my mouth aaaaagh

UberJumper posted:

I thought by default all python packages require an __init__.py file, in the root of the package?

The root of the package, yes (i.e. foo_package/__init__.py). Not in the directory containing the package directory. It's like index.html for a website.

# ? Feb 16, 2010 19:32

UberJumper: May 20, 2007; woop

Avenging Dentist posted:

The root of the package, yes (i.e. foo_package/__init__.py). Not in the directory containing the package directory. It's like index.html for a website.

Blah, thanks. I think I just screwed up my example.

# ? Feb 16, 2010 20:05
