Kerpal
Jul 20, 2003

Well that's weird.
Does anyone have experience with the _winreg module? I'm trying to write a script that deletes a registry key in HKLM\Software\Classes\Installer\Products, but the _winreg DeleteKey method cannot delete keys that have subkeys; it raises a WindowsError exception, "Access is denied" (error 5). The only solution I could think of was to create a function that recursively deletes every subkey in the input key before deleting the primary key. I'm trying to do this remotely, so I have to be able to connect to a remote registry. Another solution was to call the reg command like:

reg delete "\\\\%s1\\HKLM\Software\Classes\Installer\Products\%s2"

Where %s1 is the computer and %s2 is the product key I'm trying to delete. Running this command produces a different error, "The procedure number is out of range." Running the same command on the local machine worked fine.

Titan Coeus
Jul 30, 2007

check out my horn

Kerpal posted:

Does anyone have experience with the _winreg module? I'm trying to write a script that deletes a registry key in HKLM\Software\Classes\Installer\Products, however the _winreg DeleteKey method cannot delete keys with subkeys, which gives a WindowsError exception "Access is denied" error 5. The only solutions I could think of was to create a function that recursively deletes every subkey in the input key before deleting the primary key. I'm trying to do this remotely, so I have to be able to connect to a remote registry. Another solution was to call the reg command like:

reg delete "\\\\%s1\\HKLM\Software\Classes\Installer\Products\%s2"

Where %s1 is the computer and %s2 is the product key I'm trying to delete. Running this command produces a different error, "The procedure number is out of range." Running the same command on the local machine worked fine.

The "rdelete" function here might help: http://code.activestate.com/recipes/476229-yarw-yet-another-registry-wrapper/

The Gripper
Sep 14, 2004
i am winner

Kerpal posted:

Does anyone have experience with the _winreg module? I'm trying to write a script that deletes a registry key in HKLM\Software\Classes\Installer\Products, however the _winreg DeleteKey method cannot delete keys with subkeys, which gives a WindowsError exception "Access is denied" error 5. The only solutions I could think of was to create a function that recursively deletes every subkey in the input key before deleting the primary key. I'm trying to do this remotely, so I have to be able to connect to a remote registry. Another solution was to call the reg command like:

reg delete "\\\\%s1\\HKLM\Software\Classes\Installer\Products\%s2"

Where %s1 is the computer and %s2 is the product key I'm trying to delete. Running this command produces a different error, "The procedure number is out of range." Running the same command on the local machine worked fine.
I've used _winreg this week actually, I don't know if doing it recursively would even work because of how annoying as gently caress the API is. EnumKey(key,index) sucks a whole lot and I'm pretty sure if you delete a key while calling EnumKey in a loop, the index fucks up and can skip keys (which can leave you with non-empty keys that can't be deleted on the first pass).

I ended up with this load of garbage of a script to do it, by enumerating keys into a list and deleting one-by-one backwards after, outside of the EnumKey loop:
Python code:
import _winreg

def enum_keys(key_str, subkey_str):
    l = [subkey_str]
    try:
        current_key = _winreg.OpenKey(key_str,subkey_str)
    except WindowsError:
        print "Key not found."
        raise

    i = 0
    try:
        while True:
            next = _winreg.EnumKey(current_key,i)
            next_str = "%s\\%s" % (subkey_str,next)
            l.append(enum_keys(key_str,next_str))
            i+=1
    except WindowsError:
        return l

def delete_keys(arr,key):
    while type(arr) == type([]) and len(arr) > 0:
        delete_keys(arr.pop(),key)

    #empty list case
    if type(arr) == type([]):
        return

    #remove key
    print "Deleting: " + arr
    _winreg.DeleteKey(key,arr)
    return


key_str = _winreg.HKEY_CURRENT_USER
subkey_str = "SOFTWARE\\test"

x = enum_keys(key_str,subkey_str)

delete_keys(x,key_str)
There's probably a much easier way, and this isn't particularly thoroughly tested.

The Gripper fucked around with this message at 17:46 on Oct 4, 2012
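
The index-skipping worry above can be reproduced without touching a registry. A minimal sketch, assuming a plain Python list stands in for the subkeys of one key (the names `subkeys` and `leftover` are made up for illustration, and this is written for Python 3): deleting an entry while walking upward by index shifts the remaining entries down, so every other one is missed.

```python
# A plain list standing in for the subkeys of one registry key (hypothetical names).
subkeys = ["a", "b", "c", "d"]

i = 0
while True:
    try:
        name = subkeys[i]      # analogous to EnumKey(key, i)
    except IndexError:         # analogous to WindowsError: no more subkeys
        break
    subkeys.remove(name)       # analogous to calling DeleteKey inside the loop
    i += 1                     # the entries after `name` have already shifted down

leftover = subkeys             # entries the loop never visited
print(leftover)                # ['b', 'd']
```

Collecting the names first and deleting afterwards, as the script above does, sidesteps this. Always asking for index 0 until the enumeration fails should work as well.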

Kerpal
Jul 20, 2003

Well that's weird.
Sweet, thanks Gripper! That works exactly like I wanted it to. The only thing I changed was the key_str argument, instead I'm using:

Python code:
HKLM = _winreg.HKEY_LOCAL_MACHINE
key_str = _winreg.ConnectRegistry(r'\\computer_name', HKLM)
to connect to the remote machine. The code traversed all the keys and removed them, exactly like I need. Your enumerate keys function looked very similar to what I was trying, which started as something like:

Python code:
def delete_subkeys(key, delete_list):
    for i in delete_list:
        product_key = i[0]
        c = 0
        skey = _winreg.OpenKey(key, product_key)
        subkeys = []
        while True:
            try:
                sub_key = _winreg.EnumKey(skey, c)
                s = '%s\\%s' % (product_key, sub_key)
                subkeys.append(s)
                c += 1
            except WindowsError:
                break
I'm trying to understand how you're traversing each key, because to me that's the real trick here. I guess it works because you're using enum_keys as a recursive function to traverse each subkey until the WindowsError exception is caught? Thanks for the help.

Kerpal fucked around with this message at 21:13 on Oct 4, 2012

raminasi
Jan 25, 2005

a last drink with no ice

JOHN SKELETON posted:

Does anyone have any clue how to unfuck EasyEclipse? Or any other IDE recommendations for Windows?

I don't know how it behaves with the console but there's actually a pretty decent Python package for Visual Studio. If you don't already have VS you can just use the free shell.

disclaimer: I'm no Python guru so this might have some glaring deficiency I haven't noticed yet

The Gripper
Sep 14, 2004
i am winner

Kerpal posted:

Sweet, thanks Gripper! That works exactly like I wanted it to. The only thing I changed was the key_str argument, instead I'm using:

I'm trying to understand how you're traversing each key, because to me that's the real trick here. I guess it works because you're using enum_keys as a recursive function to traverse each subkey until the WindowsError exception is caught? Thanks for the help.
That's pretty much it, the docs say to just loop over EnumKey until a WindowsError is thrown, indicating there are no additional keys. That function just traverses the key towards the deepest child and produces a list of lists, with child keys being nested inside their parent key's list (so when deleting I could just pop() the last element to remove the last child).

Thinking about it, a better solution is:
Python code:
def rdelkey(key_str,subkey_str):
    _del_subkeys(key_str, subkey_str)     #children
    _winreg.DeleteKey(key_str,subkey_str) #top-parent


def _del_subkeys(key_str, subkey_str):
    l = []
    current_key = _winreg.OpenKey(key_str,subkey_str)
    try:
        i=0
        while True:
            next = _winreg.EnumKey(current_key,i)   #get next subkey name
            next_str = "%s\\%s" % (subkey_str,next) #create full path to subkey
            l.append(next_str)
            _del_subkeys(key_str,next_str)          #delete all subkeys in subkey
            i+=1
    except WindowsError: #no more subkeys available
        for item in l:
            _winreg.DeleteKey(key_str,item)
Where you use rdelkey(key_str,subkey_str) the same way you would enum_keys in the previous one. This one's probably a bit easier to wrap your head around and I wish i'd thought of it before writing that last confusing list-pop one.

Edit; whoops named a function wrong.

The Gripper fucked around with this message at 01:41 on Oct 5, 2012
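
For anyone on Python 3, where the module is named winreg rather than _winreg, the same idea can be sketched a little more compactly. This is a sketch under assumptions and has not been run against a real remote registry: `delete_tree` and its `reg` parameter are names invented here, and `reg` exists only so a stand-in object can be passed when trying the logic away from Windows. It asks QueryInfoKey for the subkey count instead of looping until EnumKey raises, so an error deeper in the tree is not mistaken for the end of the enumeration.

```python
def delete_tree(root, subkey, reg=None):
    """Delete `subkey` under `root` together with everything below it."""
    if reg is None:
        import winreg as reg  # Windows only; imported lazily on purpose

    # Collect the child names first, then close the handle before deleting.
    with reg.OpenKey(root, subkey) as key:
        subkey_count = reg.QueryInfoKey(key)[0]  # (subkeys, values, last modified)
        children = [reg.EnumKey(key, i) for i in range(subkey_count)]

    # Depth first: empty each child before removing it.
    for child in children:
        delete_tree(root, subkey + "\\" + child, reg)

    # The key has no subkeys left, so DeleteKey is allowed to remove it.
    reg.DeleteKey(root, subkey)
```

On Windows it would be called as delete_tree(winreg.HKEY_CURRENT_USER, "SOFTWARE\\test"), or with the handle returned by winreg.ConnectRegistry(r'\\computer_name', winreg.HKEY_LOCAL_MACHINE) as the first argument for a remote machine.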

the
Jul 18, 2004

by Cowcaster
http://pastebin.com/3q8dW3EY

I'm doing an assignment where I have to do a certain number of "sweeps" in a loop to get an accurate value.

My professor's Fortran version does 2000 sweeps in 3 seconds on his laptop. Mine currently does 100 sweeps in about 10 minutes on a quad-core desktop. So, I think something is really wrong with my code.

The loop is here:

Python code:
change = 1.
while change < 101:
    for i in range(1, xx.size-1):
        for j in range(1, xx.size-1):
            for k in range(1, xx.size-1):
                v[i,j,k] = (1./6)*(v[i+1,j,k]+v[i-1,j,k] + v[i,j+1,k]+v[i,j-1,k] + v[i,j,k+1]+v[i,j,k-1]) + (b_width**2/(6*enot))*rho_grid[i,j,k]
    print change
    change = change + 1
How could I do that differently to speed it up tremendously? Or is Fortran just that much faster than Python?

the fucked around with this message at 02:29 on Oct 10, 2012

Titan Coeus
Jul 30, 2007

check out my horn

the posted:

http://pastebin.com/3q8dW3EY

I'm doing an assignment where I have to do a certain number of "sweeps" in a loop to get an accurate value.

My professor's Fortran version does 2000 sweeps in 3 seconds on his laptop. Mine currently does 100 sweeps in about 10 minutes on a quad-core desktop. So, I think something is really wrong with my code.

The loop is here:

Python code:
change = 1.
while change < 101:
    for i in range(1, xx.size-1):
        for j in range(1, xx.size-1):
            for k in range(1, xx.size-1):
                v[i,j,k] = (1./6)*(v[i+1,j,k]+v[i-1,j,k] + v[i,j+1,k]+v[i,j-1,k] + v[i,j,k+1]+v[i,j,k-1]) + (b_width**2/(6*enot))*rho_grid[i,j,k]
    print change
    change = change + 1
How could I do that differently to speed it up tremendously? Or is Fortran just that much faster than Python?

Perhaps someone who knows more about this than me will see something I don't, but other than doing a couple of calculations ahead of time (e.g. 1./6, (b_width**2/(6*enot))), I don't see anything in particular you can do to speed this up. This is not unusual, but I am surprised it is so much slower, so hopefully I am missing some insight someone else will have.

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

See if you can reduce the number of nested for loops. Right now you have 4 nested for loops (your while loop is just another for loop in disguise), and that's what's slowing you down. Specifically, you should be able to eliminate all of the inner for loops using itertools.product. Itertools is made for these kinds of things, and I bet you'll see a nice speedup just from that.

Also, your quad core processor isn't being utilized. If you really need to speed stuff up, it may be worth looking into the built in multiprocessing library, but it will add a lot of complexity and you might need to rethink the problem a bit. I don't know if your professor's Fortran version can utilize multiple cores or not.

I don't know how much it would help in your situation, but you could also look into PyPy or Cython. I've heard very good things about them for speeding up numerical things like this.

e: I thought about it a bit more and doubted myself on the loop thing. Then I tested it and found no difference

fart simpson fucked around with this message at 03:18 on Oct 10, 2012

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
"use numpy"

Titan Coeus
Jul 30, 2007

check out my horn

MeramJert posted:

See if you can reduce the number of nested for loops. Right now you have 4 nested for loops (your while loop is just another for loop in disguise), and that's what's slowing you down. Specifically, you should be able to eliminate all of the inner for loops using itertools.product. Itertools is made for these kinds of things, and I bet you'll see a nice speedup just from that.

Also, your quad core processor isn't being utilized. If you really need to speed stuff up, it may be worth looking into the built in multiprocessing library, but it will add a lot of complexity and you might need to rethink the problem a bit. I don't know if your professor's Fortran version can utilize multiple cores or not.

I don't know how much it would help in your situation, but you could also look into PyPy or Cython. I've heard very good things about them for speeding up numerical things like this.

Itertools might be useful (I'll be reading the documentation on itertools after this post..) but the other suggestions I don't think will be. Each iteration of the loop has a dependency on a prior iteration, so there isn't anything to parallelize. PyPy doesn't have NumPy support ("soon"), and if he is going to use Cython he might as well just write the entire thing in C.

fart simpson
Jul 2, 2005

DEATH TO AMERICA
:xickos:

Yeah well, I just ran the code on my machine and got ~5.8 seconds per sweep using his original version, and ~5.8s per sweep using itertools instead.

The real slowdown is entirely on this line:
Python code:
v[i,j,k] = (1./6)*(v[i+1,j,k]+v[i-1,j,k] + v[i,j+1,k]+v[i,j-1,k] + v[i,j,k+1]+v[i,j,k-1]) + (b_width**2/(6*enot))*rho_grid[i,j,k]
which takes > 0.015 seconds per run, hundreds of times on each sweep


He is.

fart simpson fucked around with this message at 03:27 on Oct 10, 2012

Nippashish
Nov 2, 2005

Let me see you dance!

This is the right answer. You can probably turn those three for loops into a single expression with the right slicing of the arrays. If you find yourself writing a loop over the elements of a numpy array there is a >99.9% chance you are doing something wrong and your code will be stupendously slow.

The mindset you need when you're using numpy is pretty much the same as the mindset you need when writing matlab: Executing a line of python code is super, super slow and you should be making every effort to execute as few lines of python as possible. This means you need to push elementwise operations into numpy instead of writing the loops in python.

As an illustration, the difference in speed between these two snippets is about a factor of 10 on my machine:
code:
import numpy as np

X = np.random.standard_normal(size=(100,100,100))
Y = X**2
and
code:
import numpy as np

X = np.random.standard_normal(size=(100,100,100))
Y = np.zeros_like(X)
for x in xrange(100):
    for y in xrange(100):
        for z in xrange(100):
            Y[x,y,z] = X[x,y,z]**2
The result in both cases is the same. The first is an example of the right way to use numpy, and the second is an example of the wrong way to use numpy.
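
Applied to the sweep from the original question, the slicing idea might look like the sketch below. It is written for Python 3, and `jacobi_sweep` is a name invented here; `v`, `rho_grid`, `b_width` and `enot` are taken from the posted loop. One assumption needs flagging loudly: the posted loop updates v in place, so later points already see updated neighbours, while this version computes every interior point from the old array (a Jacobi sweep). A single sweep therefore gives slightly different numbers, although both are relaxing towards the same solution.

```python
import numpy as np

def jacobi_sweep(v, rho_grid, b_width, enot):
    """Return a new array holding one sweep over the interior points of v."""
    new = v.copy()  # boundary values are carried over unchanged
    # Each slice below is the interior shifted by one step along one axis,
    # i.e. the six neighbours of every interior point at once.
    neighbours = (
        v[2:, 1:-1, 1:-1] + v[:-2, 1:-1, 1:-1]
        + v[1:-1, 2:, 1:-1] + v[1:-1, :-2, 1:-1]
        + v[1:-1, 1:-1, 2:] + v[1:-1, 1:-1, :-2]
    )
    source = (b_width ** 2 / (6.0 * enot)) * rho_grid[1:-1, 1:-1, 1:-1]
    new[1:-1, 1:-1, 1:-1] = neighbours / 6.0 + source
    return new
```

Calling v = jacobi_sweep(v, rho_grid, b_width, enot) once per sweep replaces the three inner loops, and only the outer loop remains in Python.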

Emacs Headroom
Aug 2, 2003

But in a crappy way. Numpy sucks if you use it like Python lists. You have to use it like it's meant to be used if you want to get the power out of it.

But he should also be using scipy stuff as much as possible. The first part of the expression: v[i,j,k] = (1./6)*(v[i+1,j,k]+v[i-1,j,k] + v[i,j+1,k]+v[i,j-1,k] + v[i,j,k+1]+v[i,j,k-1]), is clearly a 3d convolution with a small filter function. He could be a little smarter about it, split the 3d filter into slices where the middle part of the filter is a box and the two ends are a point. So it would be something like (in pseudo-python code):

Python code:
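import numpy as np
import scipy.signal
# In-plane part of the filter: the four j/k neighbours (assumed shape of "boxfilter")
boxfilter = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
boxconv_v = np.zeros(v.shape)
for i in range(v.shape[0]):
    boxconv_v[i, :, :] = scipy.signal.convolve2d(v[i, :, :], boxfilter, mode="same")
new_v = v.copy()  # leaves the first and last i slices untouched
for i in range(1, v.shape[0]-1):
    # constant stands for b_width**2 / (6*enot)
    new_v[i, :, :] = (v[i-1, :, :] + boxconv_v[i, :, :] + v[i+1, :, :]) / 6. + constant * rho_grid[i, :, :]
v = new_v  # mode="same" zero-pads, so the j/k edge values still need resetting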

the
Jul 18, 2004

by Cowcaster
Thanks for the input, guys.

What I did is set it, instead of a count, to not stop until the difference between the current and former loop is less than 0.01, which is about the accuracy I need. It's been running for about 2 hours and looks to be on track (currently at 0.26 and declining!)

Scaevolus
Apr 16, 2007

the posted:

Thanks for the input, guys.

What I did is set it, instead of a count, to not stop until the difference between the current and former loop is less than 0.01, which is about the accuracy I need. It's been running for about 2 hours and looks to be on track (currently at 0.26 and declining!)
You should still look into SciPy/NumPy, if you do things like this a lot it will make your life much easier.

onionradish
Jul 6, 2006

That's spicy.
What is the "best practices" method for working with settings/INI-type files in Python?

For example, I have a main script that downloads using a custom host and password to be specified in an INI-like file, and I want to update that INI with the "last-date-processed" so it doesn't process old files. The scripts are running under my complete control, and I'm not worried about someone adding malicious values or codes.

Should I manually read/write a TXT file and parse it, should I use "import xxx.ini" to load it into my script, should I use something like JSON or pickle to save/load the variables, or is there a preferred library I should use?

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
JSON seems fine, although I'd separate configuration out from the cached state.
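
One way to read that suggestion: leave the hand-edited settings (host, password) in their own file and keep the script-written "last date processed" in a small JSON file next to it. A minimal Python 3 sketch, where `load_state`, `save_state` and the `last_date_processed` key are hypothetical names chosen for illustration:

```python
import json
import os

def load_state(path):
    """Return the saved state dict, or an empty dict on the first run."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def save_state(path, state):
    """Write the state to a temporary file, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f, indent=2)
    # os.replace overwrites the old file in one step, so a crash mid-write
    # does not leave a half-written state file behind.
    os.replace(tmp_path, path)
```

The script would call load_state at startup, skip anything older than state.get("last_date_processed"), and call save_state once a run finishes.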

OnceIWasAnOstrich
Jul 22, 2006

onionradish posted:

What is the "best practices" method for working with settings/INI-type files in Python?

There is ConfigParser built in, if you like INI-style files and want to be able to easily hand-edit them, which can be a pain with JSON files for people who don't use JSON or JavaScript.

onionradish
Jul 6, 2006

That's spicy.

OnceIWasAnOstrich posted:

There is ConfigParser built in if you like INI-style files and want to be able to easily hand-edit them which can be a pain for JSON files for people who don't use JSON or javascript.
Thanks! That looks like exactly what I was after!

Hed
Mar 31, 2004

Fun Shoe
Please be careful and use getboolean instead of any other method when reading your true/false settings, lest you get the wrong result from the line "DEBUG=False", since non-empty strings are actually True! Ask me how I know this.
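
The trap is easy to see in a few lines. A small sketch in Python 3, where the module is spelled configparser (Python 2 calls it ConfigParser); the section and option names here are made up for the example:

```python
import configparser

# Hypothetical settings, inlined so the example needs no file on disk.
INI_TEXT = """
[main]
debug = False
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

raw = parser.get("main", "debug")             # the string "False"
as_bool = parser.getboolean("main", "debug")  # the boolean False

print(repr(raw), bool(raw))  # 'False' True   <- the non-empty string is truthy
print(as_bool)               # False
```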

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
Or just use JSON or something which doesn't have this crazy format for strings and actually has structured data and things.

Your pick.

BigRedDot
Mar 6, 2008

For simple configs I usually go with YAML or json.

thepedestrian
Dec 13, 2004
hey lady, you call him dr. jones!
I've written a small python script to download a large amount of files from a remote HTTP server. It uses threading and a Semaphore almost exactly like is detailed here. It seems to run fine on my MacBook using Python 2.7 on Snow Leopard, but when I try to run it using Python 2.7 on my Ubuntu 12.04 VirtualBox VM on the same MacBook, it doesn't work. When I run it from an SSH session it runs for a few seconds and then hangs. When I run it straight from the VirtualBox console, it doesn't download anything and hangs. Then after a while in both cases it says connection timeout/failure in name resolution.

There definitely isn't anything wrong with the network connection or name resolution on the VM (Downloading remote HTTP files works fine using wget or anything else), so I am at a loss. What could be causing my script to work on Snow Leopard but fail on Ubuntu 12.04 using the same version of Python?

tef
May 30, 2004

-> some l-system crap ->
edit: nope

tef fucked around with this message at 22:51 on Oct 12, 2012

The Gripper
Sep 14, 2004
i am winner

thepedestrian posted:

I've written a small python script to download a large amount of files from a remote HTTP server. It uses threading and a Semaphore almost exactly like is detailed here.
Do you have the same issue if you rip the threading code out and just hit the files one after the other in the main thread? I seem to remember someone having that same problem occur in a non-threaded application and it'd help to narrow it down to a urllib2 or whatever issue independent of threads.

thepedestrian
Dec 13, 2004
hey lady, you call him dr. jones!

The Gripper posted:

Do you have the same issue if you rip the threading code out and just hit the files one after the other in the main thread? I seem to remember someone having that same problem occur in a non-threaded application and it'd help to narrow it down to a urllib2 or whatever issue independent of threads.

I did this and it still hung on Ubuntu. I wasn't able to find any specific issue with urllib2, but I actually moved to urllib3 which is specifically designed for threaded http requests and website crawls and did away with my own threading implementation and it seems to have solved my issue. Thanks for your help.

Jo
Jan 24, 2005

:allears:
Soiled Meat
What is the most universally acceptable way of going from a numpy boolean array of shape n by 1 into a double array of shape n/64 by 1? I'm using a horrible packing masterpiece and figure there's a better way.

Scaevolus
Apr 16, 2007

Jo posted:

What is the most universally acceptable way of going from a numpy boolean array of shape n by 1 into a double array of shape n/64 by 1? I'm using a horrible packing masterpiece and figure there's a better way.

Something like this?
code:
import numpy as np
x = np.random.randint(2, size=64 * 8).astype(np.bool_) # a bunch of random bools
y = np.packbits(x.view(np.uint8)).view(np.float64)   # doubles made from them

Jo
Jan 24, 2005

:allears:
Soiled Meat

Scaevolus posted:

Something like this?
code:
import numpy as np
x = np.random.randint(2, size=64 * 8).astype(np.bool_) # a bunch of random bools
y = np.packbits(x.view(np.uint8)).view(np.float64)   # doubles made from them

That's absolutely perfect! Way better than what I had. Thank you!
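
A quick way to check that nothing is lost in the packing is to unpack again. A sketch in Python 3, assuming the number of bools is a multiple of 64 and using astype to turn the random integers into real booleans; the variable names are only for illustration:

```python
import numpy as np

n = 64 * 8
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=n).astype(np.bool_)  # n booleans, one byte each

# packbits squeezes 8 booleans into each uint8; viewing 8 of those bytes
# as one float64 reinterprets the same memory without converting anything.
packed = np.packbits(bits).view(np.float64)

# Reverse the two steps to recover the original booleans.
restored = np.unpackbits(packed.view(np.uint8)).astype(np.bool_)

print(packed.shape)                     # (8,)
print(np.array_equal(bits, restored))   # True
```

The doubles are only a container here. Their numeric values are meaningless, so arithmetic on them should be avoided.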

Shaocaholica
Oct 29, 2002

Fig. 5E
I need some help multi processing. I have a python script that currently generates a bunch of shell commands which it runs serially. I want to run these shell commands in parallel limited by the cpu count of the machine.

In a nutshell my script is doing this:
code:
import subprocess

class App(blah.App):

    def main(self):
        for file in files:
            print('some message about the current command to be called')
            subprocess.call(shlex.split(cmd))

if __name__ == '__main__':
    App().run(sys.argv)
and here's what I have to get it to multiprocess but with no luck:
code:
import subprocess
import multiprocessing

class App(blah.App):

    def do_shit(output_file_path, cmd, msg):
        #Skip if exists
        if os.path.exists(output_file_path):
            print('skipping due to existing file: ' + output_file_path)
        else:
            print(msg)
            subprocess.call(shlex.split(cmd))
            
    def main(self):
        #I think this is supposed to create a pool size = cpu_count
        pool = multiprocessing.Pool(None)
                
        for file in files:
            #some logic happens here and variables are set
            
            #I think I'm doing it wrong here
            pool.map(do_shit, mov_file_name, mov_cmd, msg)

if __name__ == '__main__':
    App().run(sys.argv)

The Gripper
Sep 14, 2004
i am winner

Shaocaholica posted:

I need some help multi processing. I have a python script that currently generates a bunch of shell commands which it runs serially. I want to run these shell commands in parallel limited by the cpu count of the machine.

In a nutshell my script is doing this:
I don't think you want pool.map for this. map(f,i) applies function f to each item in iterator i, and returns a list of values returned by f(x) in the same order as supplied to f(x) by the iterator, though it will execute each of them simultaneously (or as simultaneous as processes=x specifies):
Python code:
import multiprocessing
import random, time

def f(x):
    time.sleep(random.randint(1,2))
    print x
    return x

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    print pool.map(f, range(10))

"""
E:\code>python pool.py
210


3
54

86

7
9
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
"""
Where the numbers outside the list are from the print x line in f(x), executing out of order because of the random pause, and the list is the result of all f(x) calls in the correct order. (Edit; you *can* use pool.map(f,i) as long as you're passing an iterable to it, so you could possibly pass files to it directly and have do_shit deal with the variables, though it's really meant for transforming one list into another).

You'll want apply_async, used in this way:
Python code:
import multiprocessing
import random, time

def f2(x,y):
    time.sleep(random.randint(1,2))
    return x+y

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    results = []
    
    for n in range(10):
        result = pool.apply_async(f2,(n,n+2,))
        results.append(result)

    print "Pool created, sleeping for 10"
    time.sleep(10)

    for r in results:
        print r.get()
In that code, pool.apply_async immediately applies the function f2(n,n+2) and stores the result object in result. You can then retrieve the result with result.get(timeout=x), if necessary. I added the time.sleep(10) in there so you can see that the async processes all complete before r.get() is called, which wouldn't be possible in serial because of the random 1-2s pause. If you comment that pause out you'll see that r.get() blocks until the result is available, so you can use that to determine when each of your pooled processes has completed.

I managed to forkbomb myself twice during the writing of that, I impress myself.

The Gripper fucked around with this message at 12:07 on Oct 18, 2012

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug
The concurrent.futures module looks like it's designed in terms of how you were trying to use multiprocessing. Submitting a single job (with submit) immediately returns a Future object that (among other things) can tell you when the computation is done and can return the value when it is.
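
For the shell-command case, a sketch along those lines might look like this. It is Python 3 and uses a thread pool rather than a process pool, on the assumption that the real work happens in the child processes, so the threads only wait. `run_commands` and `max_workers` are names chosen for the example.

```python
import os
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_commands(commands, max_workers=None):
    """Run each command line in parallel and return {command: exit code}."""
    max_workers = max_workers or os.cpu_count() or 1
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() returns a Future immediately; the pool keeps at most
        # max_workers commands running at any one time.
        futures = {
            pool.submit(subprocess.call, shlex.split(cmd)): cmd
            for cmd in commands
        }
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

The "skip if the output file exists" check can stay in the loop that builds the command list, so only the commands that still need running are handed to the pool.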

The Gripper
Sep 14, 2004
i am winner
Yeah, concurrent.futures is basically a more fleshed out multiprocessing module with finer-grained control over pooled tasks, it works very similarly to how Futures work in scala/Akka (py3k feature, but pypi has a backport of it).

And now for a question!: has anyone successfully used pypy sandbox on Windows? I can successfully translate it (or at least, it completes without fatal error), but actually trying to get it up and running isn't happening.

pypy pypy_interact.py e:\pypy\pypy-sandbox.exe results in "ValueError: close_fds is not supported on Windows platforms if you redirect stdin/stdout/stderr", and changing line 145 of sandlib.py to close_fds=False just dumps an RPython traceback.

I can't find any documentation on it (or anyone saying "it works/doesn't work on Windows"), so I'm just flailing around trying different things like CPython to execute pypy_interact.py, with differing but still failing results.

Are there any other sandboxed python solutions? I can just dump it into a VM and execute there, but i'd rather have something that won't require reloading a snapshot or restarting the VM if something fucks up horribly (which it will).

Edit; after bonering around with it in a VM it seems like it's just not worth the effort, can't import half the modules I'd like and a ton of things that are documented as working just don't, like os.listdir() erroring out without a traceback. Back to the drawing board!

The Gripper fucked around with this message at 23:59 on Oct 18, 2012

cancelope
Sep 23, 2010

The cops want to search the train
I am a translator by day and so I spend most of my time in Microsoft Word--Word 2011 on OS X and 2012 on Windows 7. There are a lot of repetitive things I could stand to automate; for example, before I send in a file I usually run a search-and-replace of doubled whitespace. I'm able to automate some of these things using macros, but anything more complicated seems like it would require the use of Visual Basic (gross). For example, one very simple thing I could use would be something to warn me about the use of any single-letter words except for "I" and "a." Eventually I'd like to build a package of checkers (perhaps employing NLTK) that I could use as a tool for grammar.

There's got to be some way to interact with Word documents via Python. Live integration would be excellent, although I'm willing to go for noninteractive use as well. Thoughts? On Mac it might be possible to hack up something terribly ugly with Microsoft's Automator workflows and some wack AppleScript.

Or perhaps I should just start doing my work in Markdown, processing it in plain text, and generating Word-compatible files from there. My work uses simple formatting so that's not a problem, although it would interfere with the workflow for any document that I'm not writing from scratch.

cancelope fucked around with this message at 12:40 on Oct 19, 2012

M31
Jun 12, 2012

asaf posted:

There's got to be some way to interact with Word documents via Python.
I don't know about Word, but LibreOffice has a Python API.

The Gripper
Sep 14, 2004
i am winner
pywin32 has some Word-related functionality, though it basically wraps the VBA functionality and opens an instance of Word to perform everything in there (though the instance can be hidden). I've seen it used for grading applications, mostly. I think it works with all current Word versions.

http://sourceforge.net/projects/pywin32/ + http://www.galalaly.me/index.php/2011/09/use-python-to-parse-microsoft-word-documents-using-pywin32-library/

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



For the more modern .docx format, if you're using it https://github.com/mikemaccana/python-docx (I haven't personally tried this library)

e: there's always IronPython and .Net's word interop library Microsoft.Office.Interop.Word. Using regular .Net libraries in Python is kind of annoying, but not nearly as annoying as trying to write VBscript in Office

Munkeymon fucked around with this message at 15:10 on Oct 19, 2012

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
Just learn VBA. It's annoying and stupid and dumb, but it's less annoying and stupid and dumb than dealing with all of the garbage that's a hacked toolchain like pywin32.

cancelope
Sep 23, 2010

The cops want to search the train

Suspicious Dish posted:

Just learn VBA. It's annoying and stupid and dumb, but it's less annoying and stupid and dumb than dealing with all of the garbage that's a hacked toolchain like pywin32.

I agree that all of this seems pretty unwieldy, and the things that could work well (like IronPython) are hindered by lack of good cross-platform support. I'm a worse programmer than I was three years ago and figuring out a new language or massive new API would require more time than I have.

I think I might indeed best be served by using plaintext with markup for light formatting. That at least presents a simple linear progression where I can start on the CLI with piped tools, slap on a simple browser-based or GUI text editing control (or script a text editor like Sublime Text) when I feel ready, and organically grow this into a fuller platform by adding more and more GUI hooks to the functionality. Famous last words, I know!

cancelope fucked around with this message at 16:50 on Oct 19, 2012
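
On the plain-text route, the two checks mentioned earlier (doubled whitespace, and single-letter words other than "I" and "a") need very little code. A rough Python 3 sketch; the function names are invented for the example, and the pattern is a guess at what counts as a word, so abbreviations such as "e.g." would still be flagged:

```python
import re

ALLOWED = {"I", "a", "A"}
# A lone letter that is not glued to other word characters or an apostrophe,
# so the "s" in "it's" is not reported.
LONE_LETTER = re.compile(r"(?<![\w'’])[A-Za-z](?![\w'’])")

def single_letter_words(text):
    """Return (line number, letter) pairs for one-letter words other than I and a."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for match in LONE_LETTER.finditer(line):
            if match.group() not in ALLOWED:
                hits.append((lineno, match.group()))
    return hits

def collapse_spaces(text):
    """Replace runs of two or more spaces or tabs with a single space."""
    return re.sub(r"[ \t]{2,}", " ", text)
```

Both functions take and return plain strings, so they would fit into a pipeline that reads standard input and writes standard output.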
