supermikhail
Nov 17, 2012


"It's video games, Scully."
Video games?"
"He enlists the help of strangers to make his perfect video game. When he gets bored of an idea, he murders them and moves on to the next, learning nothing in the process."
"Hmm... interesting."
drat. I mean, there are supposed to be either one or two operands, nothing else. But, yeah, if someone passes in a three-operand problem, the third operand is going to be ignored. Of course, that's not going to happen in the normal course of the website's functioning... Hm. Or maybe it can (I mean, if someone just edited the url). Anyway, should I put that in the no-return check? (Probably not, since this method hasn't received much enthusiasm in the thread.)

Oh, what it does in general? Well, it's a mental math trainer for my personal use. :shobon: I don't aspire to anything more than 2 operands. And at this stage I'm not sure I'm ever going to get where I could share this program online.

Also, I tried to come up with something more general than what I have right now, but I wasn't sure Django would digest it. (Please don't ask me why I had to involve Django.) There are two supertypes - OneOperandProblem and TwoOperandProblem, and those are extended into all the operations I need - AdditionProblem, SubtractionProblem, MultiplicationProblem, DivisionProblem, SquareProblem, and RootProblem.
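Roughly, the hierarchy looks like this (a stripped-down sketch, not the actual Django models, and only two of the six operations shown):

Python code:
class OneOperandProblem(object):
    """Problem built around a single operand."""
    def __init__(self, operand):
        self.operand = operand

class TwoOperandProblem(object):
    """Problem built around two operands."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

class SquareProblem(OneOperandProblem):
    def answer(self):
        return self.operand ** 2

class AdditionProblem(TwoOperandProblem):
    def answer(self):
        return self.left + self.right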

Well, my shameful secret is out. What's yours?


ShadowHawk
Jun 25, 2000

CERTIFIED PRE OWNED TESLA OWNER

superstepa posted:

I just want to use BeautifulSoup to scrape data from a table once every 24 hours. I want it to work on Windows and I'd like to have all the code in Python within the app, but at the moment cron does seem like the best option
If it needs to be in the running process, set up a signal handler and have a cron job send the signal (via the kill command). Not sure about the Windows equivalent.
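A rough sketch of the signal-handler version (scrape_table here is just a stand-in for whatever function does your actual BeautifulSoup work, and SIGUSR1 is Unix-only):

Python code:
import signal

def handle_scrape_signal(signum, frame):
    scrape_table()  # stand-in for your BeautifulSoup scraping routine

signal.signal(signal.SIGUSR1, handle_scrape_signal)

# example crontab entry, once a day at midnight:
# 0 0 * * * kill -USR1 $(cat /path/to/app.pid)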

If it doesn't, just have a separate cron job run a separate script.

As another slightly worse alternative (e.g. if your script never crashes and needs relaunching), you can launch a separate thread at startup and have it loop and sleep for the amount of time needed.

ShadowHawk fucked around with this message at 17:38 on Jan 10, 2015

Jose Cuervo
Aug 25, 2004
I have a program written in Java that I inherited from someone else that I am trying to add functionality to. I have written a piece of code in Python and would ideally like the following:

During the course of the Java program executing, it provides two integers as input to the Python code. The Python code uses these two integers as input and returns a float to the Java program.

The Java code providing input to the Python code and the Python code handing back the float result to the Java code will happen multiple times (on the order of tens of thousands).

Is it possible to do this (call a piece of Python code with Java code)?

If so, what should I look at to begin understanding how to accomplish this?

feedmegin
Jul 30, 2008

Jose Cuervo posted:

I have a program written in Java that I inherited from someone else that I am trying to add functionality to. I have written a piece of code in Python and would ideally like the following:

During the course of the Java program executing, it provides two integers as input to the Python code. The Python code uses these two integers as input and returns a float to the Java program.

The Java code providing input to the Python code and the Python code handing back the float result to the Java code will happen multiple times (on the order of tens of thousands).

Is it possible to do this (call a piece of Python code with Java code)?

If so, what should I look at to begin understanding how to accomplish this?

While I haven't used it personally, are you aware of http://en.wikipedia.org/wiki/Jython ? That should be able to call between java and python with zero overhead (other than that inherent in any call in a virtual machine)

reading
Jul 27, 2013
I think I'm having an issue with a __getitem__ method that I made. I have this class:

Python code:
class GameMap(object):

    def __init__(self, id_number, level, location='surface', objects=[]):
        self.id_number = id_number
        self.level = level # this is the actual map.
        self.location = location
        self.objects = objects

    def __getitem__(self, index):
        return self.level[index]
and I'm trying to use this object to store both a list of maps, and also a list of objects (game components, like NPCs, equipment, etc) which are on that map.

But when I try to access that objects list with "mymap.objects" where mymap is a GameMap instance, I get "AttributeError: 'list' object has no attribute 'objects'". Is this caused by the __getitem__ method which always tries to return a level[index] even when I want an objects[index]? Should I somehow overload the __getitems__ method to make it possible to index this object for either of those parameters?

Edit: I think I figured out my problem, I hadn't assigned the instance "mymap" to be a GameMap object yet at the point where I was trying to index it.

reading fucked around with this message at 23:11 on Jan 10, 2015

SurgicalOntologist
Jun 17, 2004

Errr, if you get that error from that code then I can tell you with 100% certainty that mymap is not a GameMap but is in fact a list. __getitem__ will never be invoked unless you use the square brackets. I don't know of any exceptions.

E: also don't put mutable objects (e.g. an empty list) as default arguments; that will cause problems. Instead do objects=None and put an if objects is None: clause in your __init__ to assign the real default.
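Something like this:

Python code:
class GameMap(object):
    def __init__(self, id_number, level, location='surface', objects=None):
        self.id_number = id_number
        self.level = level
        self.location = location
        if objects is None:
            objects = []  # a fresh list per instance, instead of one shared default
        self.objects = objects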

SurgicalOntologist fucked around with this message at 22:16 on Jan 10, 2015

reading
Jul 27, 2013
Doesn't a default argument just get overridden with any argument passed for that parameter? What kind of problems does that cause if there's a mutable default parameter?

SurgicalOntologist
Jun 17, 2004

reading posted:

Doesn't a default argument just get overriden with any argument passed for that parameter? What kind of problems does that cause if there's a mutable default parameter?

Yes it does; the problem comes when it doesn't get passed. If you make two GameMaps without passing the optional objects argument, their objects attributes will both reference the same list. If you append to one of them it will change both, effectively.
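You can see the sharing with a throwaway function in the interpreter:

Python code:
>>> def f(items=[]):
...     items.append(1)
...     return items
...
>>> f()
[1]
>>> f()  # same list object as the first call, because the default is created only once
[1, 1]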

That's not what caused the error you posted, though.

reading
Jul 27, 2013

SurgicalOntologist posted:

Yes it does, the problem comes when it doesn't get passed. If you make two GameMaps without passing the optional objects argument, their objects attribute will both reference the same list. If you append to one of them it will change both, effectively.

That's not what caused the error you posted, though.

Ah I see, so when the GameMap object is instanced, that empty list in the default parameter is only shallow-copied, not deep-copied? That seems like poor behavior on Python's part, is there a reason for that?

SurgicalOntologist
Jun 17, 2004

It's not copied at all. I believe it would actually cause a lot more strange behavior if it were copied (or at least slowdowns), but I can't come up with a good example off the top of my head. Maybe someone else can.

QuarkJets
Sep 8, 2008

reading posted:

I think I'm having an issue with a __getitem__ method that I made. I have this class:

Python code:
class GameMap(object):

    def __init__(self, id_number, level, location='surface', objects=[]):
        self.id_number = id_number
        self.level = level # this is the actual map.
        self.location = location
        self.objects = objects

    def __getitem__(self, index):
        return self.level[index]
and I'm trying to use this object to store both a list of maps, and also a list of objects (game components, like NPCs, equipment, etc) which are on that map.

But when I try to access that objects list with "mymap.objects" where mymap is a GameMap instance, I get "AttributeError: 'list' object has no attribute 'objects'". Is this caused by the __getitem__ method which always tries to return a level[index] even when I want an objects[index]? Should I somehow overload the __getitems__ method to make it possible to index this object for either of those parameters?

Edit: I think I figured out my problem, I hadn't assigned the instance "mymap" to be a GameMap object yet at the point where I was trying to index it.

It's good practice to not mutate variables from one type to another. If mymap is a list, then it'll be better if you declare some other variable to be the GameMap object

Begall
Jul 28, 2008
Does anyone have any recommendations for JSON-RPC libraries? Server & Client, preferably with good quality documentation?

salisbury shake
Dec 27, 2011

Jose Cuervo posted:

I have a program written in Java that I inherited from someone else that I am trying to add functionality to. I have written a piece of code in Python and would ideally like the following:

During the course of the Java program executing, it provides two integers as input to the Python code. The Python code uses these two integers as input and returns a float to the Java program.

The Java code providing input to the Python code and the Python code handing back the float result to the Java code will happen multiple times (on the order of tens of thousands).

Is it possible to do this (call a piece of Python code with Java code)?

If so, what should I look at to begin understanding how to accomplish this?

Look into JNI/JNA on the java side and ctypes, cffi and cython for python for generic interop if you hate yourself or you find that poo poo interesting.
There are a few specific options for working with Java code from within Python, though the reverse is something I've never had to do.
https://wiki.python.org/moin/ScriptingJava

Jose Cuervo
Aug 25, 2004

feedmegin posted:

While I haven't used it personally, are you aware of http://en.wikipedia.org/wiki/Jython ? That should be able to call between java and python with zero overhead (other than that inherent in any call in a virtual machine)

salisbury shake posted:

Look into JNI/JNA on the java side and ctypes, cffi and cython for python for generic interop if you hate yourself or you find that poo poo interesting.
There are a few specific options for working with Java code from within Python, though the reverse is something I've never had to do.
https://wiki.python.org/moin/ScriptingJava
Thanks for the info. I think I might be able to avoid this issue (EDIT: avoid the issue entirely and NOT have to call the Python code from Java) if I can speed up my Python code. I have a for loop that represents the majority of the computation and I am hoping to be able to parallelize it - each result from the for loop gets stored in a dictionary.
This is my code:
Python code:
import networkx as nx
import pandas as pd

G = nx.read_shp(road_shapefile_path)
G = G.to_undirected(G)

bldg_to_coords_map = {}
bldg_data_df = pd.read_csv(building_raw_data_path, sep='\t')
for idx, row in bldg_data_df.iterrows():
	bldg_to_coords_map[row['Building_ID']] = (row['Centroid_x'], row['Centroid_y'])

node_list = G.nodes()
for bldg_id, bldg_coord in bldg_to_coords_map.iteritems():
	min_dist = 1000000
	best_coord_match = None
	for node_coord in node_list:
		this_dist = euclidean_distance(node_coord, bldg_coord)
		if this_dist < min_dist:
			best_coord_match = node_coord
			min_dist = this_dist

	bldg_to_coords_map[bldg_id] = best_coord_match
The second for loop is what I am hoping to parallelize. The graph G has about 20000 nodes, so each iteration of the for loop can take a while to process. I looked into parallelization in Python and came across this post and this documentation, and came up with the following parallelization of my code:
Python code:
import joblib
import multiprocessing
import networkx as nx
import pandas as pd

def best_match(bldg_id, bldg_coord, node_list):
	min_dist = 1000000
	best_coord_match = None
	for node_coord in node_list:
		this_dist = euclidean_distance(node_coord, bldg_coord)
		if this_dist < min_dist:
			best_coord_match = node_coord
			min_dist = this_dist

	return (bldg_id, best_coord_match)

if __name__ == "__main__":
	G = nx.read_shp(road_shapefile_path)
	G = G.to_undirected(G)

	bldg_to_coords_map = {}
	bldg_data_df = pd.read_csv(building_raw_data_path, sep='\t')
	for idx, row in bldg_data_df.iterrows():
		bldg_to_coords_map[row['Building_ID']] = (row['Centroid_x'], row['Centroid_y'])
	
	num_cores = multiprocessing.cpu_count()
	node_list = G.nodes()
	results = joblib.Parallel(n_jobs=num_cores)(joblib.delayed(best_match)(
				bldg_id, bldg_coord, node_list) for bldg_id, bldg_coord in bldg_to_coords_map.items())
	for bldg_id, matched_coords in results:
		bldg_to_coords_map[bldg_id] = matched_coords
However, even though I have 4 CPUs on my computer the parallelized code actually takes 3 times as LONG as the regular code does. Can anyone see what I am doing incorrectly? Or is this telling me that my for loop was not a good candidate for parallelization? If so, why not (as far as I can tell it meets the requirements for parallelization)?

Jose Cuervo fucked around with this message at 00:43 on Jan 12, 2015

KICK BAMA KICK
Mar 2, 2009

Jose Cuervo posted:

However, even though I have 4 CPUs on my computer the parallelized code actually takes 3 times as LONG as the regular code does. Can anyone see what I am doing incorrectly? Or is this telling me that my for loop was not a good candidate for parallelization? If so, why not ( as far as I can tell it meets the requirements for parallelization)?
I won't pretend to know enough to explain it (or to be certain that it's causing your issue), but the term you should be Googling if you don't know about it already is "Global Interpreter Lock". It's a design choice made in the default implementation of Python that seriously handicaps its potential for parallelization, as I understand it. But it's not inherent to the Python language, just the CPython interpreter; some alternatives like the aforementioned Jython don't have this issue. That might not help if you were trying to avoid that but I hope that's enough to start your research.

OnceIWasAnOstrich
Jul 22, 2006

Joblib defaults to using the multiprocessing library to bypass the Global Interpreter Lock. The first thing to check is to make sure that you are actually seeing 4 Python processes each at full CPU usage. If you don't, then it has something to do with how joblib should be used. If you do see four processes using full CPU time, then the problem is probably some sort of inefficiency with spinning up processes for each iteration. You might need to either use multiprocessing directly and set up worker processes that persist for more than one job or find that functionality in joblib.

Jose Cuervo
Aug 25, 2004

OnceIWasAnOstrich posted:

Joblib defaults to using the multiprocessing library to bypass the Global Interpreter Lock. The first thing to check is to make sure that you are actually seeing 4 Python processes each at full CPU usage. If you don't, then it has something to do with how joblib should be used. If you do see four processes using full CPU time, then the problem is probably some sort of inefficiency with spinning up processes for each iteration. You might need to either use multiprocessing directly and set up worker processes that persist for more than one job or find that functionality in joblib.

Actually I think I found the explanation here, where my issue is that each iteration of the for loop is relatively inexpensive, but there are a lot of iterations to do. It seems that the parallelization works best if the individual iterations are expensive compared to the overhead required to set up a new process each time.

So I was wrong, and the situation I had was not ideal for parallelization in that way.

pmchem
Jan 22, 2010


I've never looked into using joblib for anything but you may consider parallelism alternatives such as MPI (e.g., mpi4py). It's used for similar things and I've used it for things that are trivially parallel (individual iterations of a loop being cheap and independent).
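For a loop like the one above it would look roughly like this (an untested sketch; every rank loads node_list and the building data itself, as in the original script, and you launch it with something like mpiexec -n 4 python script.py):

Python code:
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# assumes node_list and bldg_to_coords_map are built on every rank, as in the original script
items = list(bldg_to_coords_map.items())
my_results = [best_match(bldg_id, bldg_coord, node_list)
              for bldg_id, bldg_coord in items[rank::size]]  # round-robin split of the work

all_results = comm.gather(my_results, root=0)  # list of per-rank result lists, on rank 0 only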

OnceIWasAnOstrich
Jul 22, 2006

Jose Cuervo posted:

Actually I think I found the explanation here, where my issue is that each iteration of the for loop is relatively inexpensive, but there are a lot of iterations to do. It seems that the parallelization works best if the individual iterations are expensive compared to the overhead required to set up a new process each time.

So I was wrong, and the situation I had was not ideal for parallelization in that way.

This is what I was trying to get at. It seems unlikely, however, that joblib is actually spinning up a new process for every iteration; if it is, that sounds like something that should really be fixed. If you use multiprocessing.Pool directly it creates the processes at the time of pool creation, and then feeds them work one job at a time. I can get no slowdown and a slight increase in speed from running functions that return in as little as ~100ns each that way, so you may want to try more primitive multiprocessing tools.

Jose Cuervo
Aug 25, 2004

OnceIWasAnOstrich posted:

This is what I was trying to get at. It seems unlikely however that joblib is actually spinning up a new process for every iteration, if it is it sounds like something that should really be fixed. If you use multiprocessing.Pool directly it creates the processes at the time of pool creation, and then feeds them one at a time. I can get no slowdown and a slight increase in speed from running functions that return in as fast ~100ns each that way, so you may want to try more primitive multiprocessing tools.
So I looked into using just the multiprocessing library with a Pool:
Python code:
import joblib
import multiprocessing
import networkx as nx
import pandas as pd

def best_match(bldg_id, bldg_coord, node_list):
	min_dist = 1000000
	best_coord_match = None
	for node_coord in node_list:
		this_dist = euclidean_distance(node_coord, bldg_coord)
		if this_dist < min_dist:
			best_coord_match = node_coord
			min_dist = this_dist

	return (bldg_id, best_coord_match)

if __name__ == "__main__":
	G = nx.read_shp(road_shapefile_path)
	G = G.to_undirected(G)

	bldg_to_coords_map = {}
	bldg_data_df = pd.read_csv(building_raw_data_path, sep='\t')
	for idx, row in bldg_data_df.iterrows():
		bldg_to_coords_map[row['Building_ID']] = (row['Centroid_x'], row['Centroid_y'])
	
	pool_size = multiprocessing.cpu_count()  # 4 cpus
	pool = multiprocessing.Pool(processes=pool_size)
	node_list = G.nodes()
	result = [pool.apply(best_match, args=(bldg_id, bldg_coord, node_list)) for bldg_id, bldg_coord in bldg_to_coords_map.iteritems()]
Is this more along the lines of what you were saying? I am not sure I have used a Pool correctly here - multiple processes get spawned (when I look in Task Manager), but they each use <10% CPU, and the code does not terminate even after 7 minutes of running, whereas the unparallelized code is done after 3 minutes.

I was able to get a speedup with joblib by breaking the list of (bldg_id, bldg_coord) tuples into 4 equal sized lists, and then using joblib to run four processes in parallel (i.e. run the four for loops in parallel). When I run the code this way four processes do get spawned and they use 100% CPU (25% each). It just seems like a hacky way to get things done, but I suppose it is better than nothing.
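For reference, the chunked version is roughly this (a simplified sketch of what I described, not the exact code):

Python code:
def best_match_chunk(chunk, node_list):
    # defined at module level (next to best_match) so the worker processes can import it
    return [best_match(bldg_id, bldg_coord, node_list) for bldg_id, bldg_coord in chunk]

# inside the __main__ block:
items = list(bldg_to_coords_map.iteritems())
chunk_size = len(items) // num_cores + 1
chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

chunk_results = joblib.Parallel(n_jobs=num_cores)(
    joblib.delayed(best_match_chunk)(chunk, node_list) for chunk in chunks)
for bldg_id, matched_coords in (pair for chunk in chunk_results for pair in chunk):
    bldg_to_coords_map[bldg_id] = matched_coords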

namaste friends
Sep 18, 2004

by Smythe
I've been loving with threading and multiprocess more or less for the past month. I've seen the same results too, which is code that runs slower with more processes (don't use threading because of the GIL). Have you considered using apply_async?

Jose Cuervo
Aug 25, 2004

Cultural Imperial posted:

I've been loving with threading and multiprocess more or less for the past month. I've seen the same results too which is code that runs slower with more processes (don't use threading because of GIL). Have you considered using apply_async?

Yeah I tried that as well - the code took about as long to run as with regular apply - just over 5.5 minutes (I had read the wrong output when I said 7 minutes for apply earlier). The CPU was definitely more used - 75-85% CPU being used for the 5.5 minutes with apply_async versus 35-40% CPU being used for the 5.5-ish minutes with apply.

Right now my hacky version with joblib runs in 1 minute 27 seconds. There is some stuff about sharing memory/access to objects with joblib, so I will look into that tomorrow.

QuarkJets
Sep 8, 2008

Jose Cuervo posted:

So I looked into using just the multiprocessing library with a Pool:
Python code:
import joblib
import multiprocessing
import networkx as nx
import pandas as pd

def best_match(bldg_id, bldg_coord, node_list):
	min_dist = 1000000
	best_coord_match = None
	for node_coord in node_list:
		this_dist = euclidean_distance(node_coord, bldg_coord)
		if this_dist < min_dist:
			best_coord_match = node_coord
			min_dist = this_dist

	return (bldg_id, best_coord_match)

if __name__ == "__main__":
	G = nx.read_shp(road_shapefile_path)
	G = G.to_undirected(G)

	bldg_to_coords_map = {}
	bldg_data_df = pd.read_csv(building_raw_data_path, sep='\t')
	for idx, row in bldg_data_df.iterrows():
		bldg_to_coords_map[row['Building_ID']] = (row['Centroid_x'], row['Centroid_y'])
	
	pool_size = multiprocessing.cpu_count()  # 4 cpus
	pool = multiprocessing.Pool(processes=pool_size)
	node_list = G.nodes()
	result = [pool.apply(best_match, args=(bldg_id, bldg_coord, node_list)) for bldg_id, bldg_coord in bldg_to_coords_map.iteritems()]
Is this more along the lines of what you were saying? I am not sure I have used a Pool correctly here - multiple processes get spawned (when I look in Task Manager, but they each use <10% CPU), and the code does not terminate even after 7 minutes of running, where the unparallelized code is done after 3 minutes.

I was able to get a speedup with joblib by breaking the list of (bldg_id, bldg_coord) tuples into 4 equal sized lists, and then using joblib to run four processes in parallel (i.e. run the four for loops in parallel). When I run the code this way four processes do get spawned and they use 100% CPU (25% each). It just seems like a hacky way to get things done, but I suppose it is better than nothing.

Try this:

code:
result = pool.map(best_match, bldg_to_coords_map.iteritems())

Jose Cuervo
Aug 25, 2004

QuarkJets posted:

Try this:

code:
result = pool.map(best_match, bldg_to_coords_map.iteritems())

I tried that but couldn't figure out how to pass in node_list as an additional argument. Thoughts?

QuarkJets
Sep 8, 2008

Jose Cuervo posted:

I tried that but couldn't figure out how to pass in node_list as an additional argument. THoughts?

The best way to do this might be to write your data into the values of the dict

code:
def best_match(args):
    # pool.map hands each dict value over as a single tuple, so unpack it here
    bldg_id, bldg_coord, node_list = args
    min_dist = 1000000
    best_coord_match = None
    for node_coord in node_list:
        this_dist = euclidean_distance(node_coord, bldg_coord)
        if this_dist < min_dist:
            best_coord_match = node_coord
            min_dist = this_dist

    return (bldg_id, best_coord_match)

if __name__ == "__main__":
    G = nx.read_shp(road_shapefile_path)
    G = G.to_undirected(G)
    node_list = G.nodes()

    bldg_to_coords_map = {}
    bldg_data_df = pd.read_csv(building_raw_data_path, sep='\t')
    for idx, row in bldg_data_df.iterrows():
        bldg_to_coords_map[row['Building_ID']] = (row['Building_ID'], (row['Centroid_x'], row['Centroid_y']), node_list)

    pool_size = multiprocessing.cpu_count()  # 4 cpus
    pool = multiprocessing.Pool(processes=pool_size)
    result = pool.map(best_match, bldg_to_coords_map.itervalues())

QuarkJets fucked around with this message at 10:10 on Jan 12, 2015

Nippashish
Nov 2, 2005

Let me see you dance!
If you stick all of the coordinates (for both the buildings and the graph) into numpy arrays then you can use this:
code:
import numpy as np

def square_distance(X, Y):
    """
    D = square_distance(X, Y)
    D[i,j] == square_distance(X[i,:], Y[j,:])
    """
    X2 = (X**2).sum(axis=1)
    Y2 = (Y**2).sum(axis=1)
    square_dist = np.add.outer(X2, Y2) - 2.0 * np.dot(X, Y.T)
    return square_dist
to compute the squared distance between all pairs of points and it will be much faster than looping over each pair. You can recover the nearest node to each building with something like this:
code:
dists = square_distance(building_coords, node_coords)
nearest = np.argmin(dists, axis=1)            # index of the closest node for each building
nearest_neighbours = node_coords[nearest, :]  # coordinates of those nodes
If you really need a dictionary from building id to nearest coordinate then you'll still have to build that from the result, but you can build a pandas DataFrame directly from nearest_neighbours (and have it indexed by the building id). If this uses too much memory you can process buildings in blocks, similar to what you're doing with multiprocessing.
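For example (assuming the rows of building_coords are in the same order as bldg_data_df):

code:
df = pd.DataFrame(nearest_neighbours, index=bldg_data_df['Building_ID'], columns=['x', 'y'])
# or, if you still want the plain dict:
bldg_to_coords_map = dict(zip(bldg_data_df['Building_ID'], map(tuple, nearest_neighbours)))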

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



Jose Cuervo posted:

Is this more along the lines of what you were saying? I am not sure I have used a Pool correctly here - multiple processes get spawned (when I look in Task Manager, but they each use <10% CPU), and the code does not terminate even after 7 minutes of running, where the unparallelized code is done after 3 minutes.

I was able to get a speedup with joblib by breaking the list of (bldg_id, bldg_coord) tuples into 4 equal sized lists, and then using joblib to run four processes in parallel (i.e. run the four for loops in parallel). When I run the code this way four processes do get spawned and they use 100% CPU (25% each). It just seems like a hacky way to get things done, but I suppose it is better than nothing.

You can also set chunksize to the length of the list divided by the pool_size. Someone might be able to say for sure, but I think multiprocessing is using such a small default chunksize to move data between processes that it's actually spending a ton of (wall-clock) time waiting for the OS to move tiny chunks of data around, and the solution I found was to basically force it to move the whole dang list (or as close to that as the OS will allow) to the process at once.
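i.e. roughly (reusing the pool and the one-argument best_match from above):

code:
items = list(bldg_to_coords_map.itervalues())
chunksize = max(1, len(items) // pool_size)  # one big chunk per worker
result = pool.map(best_match, items, chunksize)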

I wouldn't call it "hacky", but I'm biased :)

sharktamer
Oct 30, 2011

Shark tamer ridiculous
I'm using fabric to run a single command on multiple servers. It's a big command with a for loop and multiple subshells, so I'm left with a big fat string that's being passed to the fabric run function. Does anyone know of any libraries out there that can be used to construct shell commands a little nicer? Something maybe like:

code:
import theshelllib

theshelllib.foreach(a_glob).do(cmd)

which would give me the below command as a string:

code:

for i in a_glob:{ cmd $i;}

Probably a bad example there since I'm not entirely sure what I'll be doing with it at the moment, hopefully I've made what I'm looking for clear.

ShadowHawk
Jun 25, 2000

CERTIFIED PRE OWNED TESLA OWNER

Jose Cuervo posted:

Yeah I tried that as well - the code took about as long to run as with regular apply - just over 5.5 minutes (I had read the wrong output when I said 7 minutes for apply earlier). The CPU was definitely more used - 75-85% CPU being used for the 5.5 minutes with apply_async versus 35-40% CPU being used for the 5.5-ish minutes with apply.

Right now my hacky version with joblib runs in 1 minute 27 seconds. There is some stuff about sharing memory/ access to objects with joblib, so I will look into that tomorrow.
Did you ever try concurrent.futures?
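The rough shape would be something like this (an untested sketch reusing your original three-argument best_match; on Python 2 it needs the futures backport from PyPI):

Python code:
import concurrent.futures

with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(best_match, bldg_id, bldg_coord, node_list)
               for bldg_id, bldg_coord in bldg_to_coords_map.items()]
    for future in concurrent.futures.as_completed(futures):
        bldg_id, matched_coords = future.result()
        bldg_to_coords_map[bldg_id] = matched_coords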

Edison was a dick
Apr 3, 2010

direct current :roboluv: only

sharktamer posted:

I'm using fabric to run a single command on multiple servers. It's a big command with a for loop and multiple subshells, so I'm left with a big fat string that's being passed to the fabric run function. Does anyone know of any libraries out there that can be used to construct shell commands a little nicer?

If you're running the commands to do some kind of configuration management it might be worth looking at Ansible.

Otherwise, unless fabric mangles the strings you give it, you could write out your script as a long string and interpolate any parameters in.

code:

import pipes
import subprocess
import sys

script = r'''
printf "%s\n" {hostname} >/etc/hostname
hostname -F /etc/hostname
'''

host = sys.argv[1]
new_hostname = sys.argv[2]

subprocess.check_call(
    ['ssh', host,
     script.format(
         hostname=pipes.quote(new_hostname)
     )
    ]
)

I find the raw, long string form useful, as you can inline the whole shell script and format it as readably as you want.

Kudaros
Jun 23, 2006
I have a simple question about CSV files/list splitting with Python. I have a long list of names occupying a single column and I want to split this list in such a way as to maximize the number of names on a single page. That is, I want to create new columns from the list to fill a page and only go to the next page if necessary -- the idea is to maximize ink on the paper to reduce pages used. Are there any convenient packages for this or should I just hand-code it? I am using tablib at the moment.

ShadowHawk
Jun 25, 2000

CERTIFIED PRE OWNED TESLA OWNER

Kudaros posted:

I have a simple question about CSV files/list splitting with Python. I have a long list of names occupying a single column and I want to split these lists in such a way to maximize the number of names on a single page. That is, I want to create new columns from the list to fill a page and only go to the next page if necessary -- the idea is to maximize ink on the paper to reduce pages used. Are there any convenient packages for this or should I just hand code it? I am using tablib at the moment.
Is it ok to munge the order?

If not you're just iteratively slicing the list by however many lines fit on a page:
Python code:
lines_per_page = 40
columns = [original_list[a:a+lines_per_page] for a in range(0,len(original_list),lines_per_page)]
If you can munge the order then it sounds like you want to sort by length of the name first, which is only accurate in a monospaced font.
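e.g. just:

Python code:
original_list.sort(key=len, reverse=True)  # longest names first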

Kudaros
Jun 23, 2006

ShadowHawk posted:

Is it ok to munge the order?

If not you're just iteratively slicing the list by however many lines fit on a page:
Python code:
lines_per_page = 40
columns = [original_list[a:a+lines_per_page] for a in range(0,len(original_list),lines_per_page)]
If you can munge the order then it sounds like you want to sort by length of the name first, which is only accurate in a monospaced font.

This works just fine for my purposes, thanks for the response. I now have a separate problem.

EDIT: Never mind about the tablib problem -- a very basic looping-out-of-range mistake...

I think I'm representing the scientist-as-bad-programmer stereotype...

Kudaros fucked around with this message at 20:44 on Jan 14, 2015

MockingQuantum
Jan 20, 2012



Noob alert: I'm almost totally new to programming. I recently finished the Python course on Codecademy, and I find I really enjoy working with the language. I feel like I could use some more guided study before I try things on my own though. Would Invent with Python be a natural next step in teaching myself how to program? If it matters, my end goals would be game development and, eventually, audio-related applications.

Dominoes
Sep 20, 2007

Start a project. Codecademy's Python course is all you need for now.

Rlazgoth
Oct 10, 2003
I attack people at random.
Following my previous post (it's on the previous page) and the recommendations in this thread, I came to the conclusion that learning Qt for this particular project would be a significant detour, so I stuck with Tkinter and programmed the GUI using it. As suggested, I completely separated the program logic from the GUI component, and all interaction between them is done through a single "event_handler" function. The program now runs as expected, except for the fact that the GUI completely freezes while running searches or downloading files.

I have looked up information on this and I understand why it is occurring. From what I gather, I have three options to deal with this: use a callback loop (which from my understanding is not the best option to handle I/O blocking in the case of Tkinter - at least I have no idea how to implement it in a simple manner), multiprocessing, or threading.

I attempted the threading solution and actually managed to get the program running as expected; whenever the user presses the Search or Export buttons, the event_handler opens a new thread and runs the query/export functions within it. They do whatever they are supposed to do and the GUI doesn't freeze. However, while reading up on this I came across a substantial amount of posts warning against using threading in Python - especially for beginners like me - and to use multiprocessing instead. More alarming is the fact that most guides are written assuming you are using Queue in tandem with threading, which I am not - nor do I understand why I would need it. Because of this, I am becoming wary of investing more time in threading, out of fear of doing something incredibly wrong but not immediately obvious. Can anyone explain - in beginner's terms - if it is truly a bad idea to use threading, and if so, why? The only code section where I am currently using threading is this one:

code:
def event_handler(command):
    if command == "FORMSEARCH":
        s_ID = session()
        body_actionbar.do_search.configure(state=DISABLED)
        t = threading.Thread(target=lambda: search(body_form.get_input(), s_ID))
        t.start()
Is there anything horribly wrong in here? Any other comments are welcome, I'm learning mostly from practice so I have no way to know when I gently caress stuff up (except when the program doesn't run at all).

FoiledAgain
May 6, 2007

Rlazgoth posted:

More alarming is the fact that most guides are written assuming you are using Queue in tandem with threading, which I am not - and nor do I understand why would I need it. Because of this, I am becoming wary of investing more time in threading, out of fear of doing something incredibly wrong but not immediately obvious. Can anyone explain - in beginner's terms - if it is truly a bad idea to use threading, and if so, why? The only code section where I am currently using threading is this one:

code:

def event_handler(command):
    if command == "FORMSEARCH":
        s_ID = session()
        body_actionbar.do_search.configure(state=DISABLED)
        t = threading.Thread(target=lambda: search(body_form.get_input(), s_ID))
        t.start()

Is there anything horribly wrong in here? Any other comments are welcome, I'm learning mostly from practice so I have no way to know when I gently caress stuff up (except when the program doesn't run at all).

You need the Queue if you are trying to communicate with the thread. But it looks like you aren't returning anything, just running a thread.

If that changes, you'll want to learn about the Tkinter method after(), which lets you check in on a queue regularly.

Rlazgoth
Oct 10, 2003
I attack people at random.

FoiledAgain posted:

You need the Queue if you are trying to communicate with the thread. But it looks like you aren't returning anything, just running a thread.

If that changes, You'll want to learn about the Tkinter method after() which will lets you check in on a queue regularly.

Actually that answers what would be my next question. The search function does return something, a URL which is then used by an export function to save the results locally. I was wondering how to access that value and this explains the role of Queue in the middle of this. Thanks!

FoiledAgain
May 6, 2007

Rlazgoth posted:

Actually that answers what would be my next question. The search function does return something, an URL which is then used by an export function to save the results locally. I was wondering how to access that value and this explains the role of Queue in the middle of this. Thanks!

It's fairly simple to do. Create a queue in your main Tk thread, send it as an argument to your search function, and then tell that function to .put() your result into the queue. In your main thread, you need a function something like this:

code:

def check_q(self):
    try:
        url = self.q.get_nowait()  # non-blocking; raises queue.Empty if nothing is there yet
        self.process_url(url)
    except queue.Empty:
        self.after(10, self.check_q)
This will continue checking the q for something every 10ms, until it actually finds something there.
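And on the worker side it's just a .put() at the end of the search, something like this (a sketch with names borrowed from your earlier snippet; run_search is made up, and this assumes event_handler lives on the same object as check_q):

code:
def run_search(form_input, s_ID, q):
    q.put(search(form_input, s_ID))  # hand the URL back to the Tk thread

# inside event_handler:
t = threading.Thread(target=run_search, args=(body_form.get_input(), s_ID, self.q))
t.start()
self.after(10, self.check_q)  # start polling for the result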


sharktamer
Oct 30, 2011

Shark tamer ridiculous
Is this gross?

code:
import inspect

from fabric.api import execute, runs_once, task

@task
def the_cool_task():
    out = '1\n2\n4\n5\n3'
    if 'sort_output' in [i[3] for i in inspect.stack()]:
        return out
    print out

@task
@runs_once
def sort_output(f):
    results = execute(f)
    print sort_results(results)
fabric defines the tasks that can be run with the fab command as the functions given the @task decorator. You can pass arguments to the functions from the command line, so:

code:
fab sort_output:the_cool_task
can be run to sort the output of the_cool_task. This works, but is it really bad practice to have tasks either return or print, especially when the way I'm doing it is so hacky?

sharktamer fucked around with this message at 01:46 on Jan 15, 2015
