|
I'm trying to figure out the least-dumb way to assign a service tech based on a map of highlighted states. Each rep has one or more states that they are responsible for, and I want to email the responsible tech when a ticket comes in from a store in their state. I'm new to Python.
|
# ? Mar 20, 2015 21:02 |
|
|
|
Bob Morales posted:I'm trying to figure out what the least-dumb way to assign a service tech based on a map of highlighted states. Each rep has 1 or more states that they are responsible for, and i want to email the responsible tech when a ticket comes in from a store in their state. I'm new to Python.

You probably want to maintain a dictionary mapping states to the techs that are responsible for them.
|
# ? Mar 20, 2015 21:09 |
|
Bob Morales posted:I'm trying to figure out what the least-dumb way to assign a service tech based on a map of highlighted states. Each rep has 1 or more states that they are responsible for, and i want to email the responsible tech when a ticket comes in from a store in their state. I'm new to Python.

You could create a dictionary with each state mapped to the tech responsible:

Python code:
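The code block in this post didn't survive the scrape. A minimal sketch of what the suggested mapping might have looked like (tech names and addresses are made up):

```python
# Hypothetical roster: each state maps to the responsible tech's address.
state_to_tech = {
    'WA': 'jim@example.com',
    'CA': 'jim@example.com',
    'TX': 'bill@example.com',
}

def tech_for(state):
    """Look up the responsible tech for a ticket's state."""
    return state_to_tech[state]

print(tech_for('TX'))  # bill@example.com
```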
|
# ? Mar 20, 2015 21:10 |
|
Jose Cuervo posted:You could create a dictionary with each state mapped to the tech responsible.

This only works if each state is taken care of by a single tech. If there are multiple techs per state, just make the dict values lists, and then send an e-mail to each of them (or the first not occupied, or whatever your desired behaviour is).
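A sketch of the list-valued variant the reply describes (names made up, and the e-mail send is just a placeholder):

```python
# Each state maps to a LIST of techs, so several people can cover one state.
state_to_techs = {
    'WA': ['jim@example.com'],
    'TX': ['bill@example.com', 'sue@example.com'],
}

def techs_for(state):
    # .get with a default so an uncovered state yields an empty list
    return state_to_techs.get(state, [])

for address in techs_for('TX'):
    # send_email(address, ticket) would go here
    print(address)
```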
|
# ? Mar 20, 2015 21:13 |
|
I have a 'Factory' class that creates instances of the 'Tanker' class. I want to keep track of the order in which tanker instances are created by all instances of the 'Factory' class. For example, if there are two factories, and the first factory creates one tanker at time 4 and one tanker at time 10, and the second factory creates one tanker at time 7, I want the tanker created at time 4 to have the ID 1, the tanker created at time 7 to have the ID 2, and the tanker created at time 10 to have the ID 3.

It seems like creating a global variable would accomplish this, but is there a better way to keep track of this? Perhaps a variable in the Factory class that is common to all instances?

EDIT: Turns out it is called a static variable: http://stackoverflow.com/questions/68645/static-class-variables-in-python

Jose Cuervo fucked around with this message at 21:36 on Mar 20, 2015 |
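A sketch of the class-variable ("static variable") approach the linked answer describes. This assumes tankers are created in time order, as in a simulation; the class names follow the post:

```python
class Tanker:
    def __init__(self, id_, created_at):
        self.id = id_
        self.created_at = created_at

class Factory:
    # Class variable: shared by every Factory instance, so the
    # counter is global across all factories.
    _next_id = 1

    def create_tanker(self, time):
        tanker = Tanker(Factory._next_id, time)
        Factory._next_id += 1
        return tanker

f1, f2 = Factory(), Factory()
t1 = f1.create_tanker(4)   # gets ID 1
t2 = f2.create_tanker(7)   # gets ID 2
t3 = f1.create_tanker(10)  # gets ID 3
```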
# ? Mar 20, 2015 21:28 |
|
I'm starting a little project writing a wrapper for a web REST(ish) API. Anyone have examples of wrappers out there that they like in particular that I can steal ideas from? Particularly I'm looking for wrappers that do a good job of any of these (and more): Python-izing the API, making good objects out of API results, mapping object saves to POSTs as transparently as possible, caching related objects, rate limiting, all of that stuff.
|
# ? Mar 20, 2015 21:33 |
|
Jose Cuervo posted:You could create a dictionary with each state mapped to the tech responsible:

We use 1 tech per state right now. I thought about that, but then if we fire a tech and hire a new one I have to change the tech assigned to, like, 15 states in some cases.
|
# ? Mar 20, 2015 21:45 |
|
SurgicalOntologist posted:The only thing "wrong" with a functional style is it's not mainstream Python, so your typical Python programmer is not likely to have encountered it and will be confused.

BigRedDot posted:I haven't used map or filter in years. List comprehensions and generator expressions
|
# ? Mar 20, 2015 22:03 |
|
Bob Morales posted:We use 1 tech per state right now.

Python code:
[state_to_tech.__setitem__(state, new_technician) for state, technician in zip(state_to_tech.keys(), state_to_tech.values()) if state_to_tech[state] == old_technician]
|
# ? Mar 20, 2015 22:11 |
|
You could always use a DataFrame and have name, email, and state columns. You can easily filter based on state or name to slice it however you need. If you start assigning techs to multiple states you can create variables as needed or just add additional rows.
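A sketch of the DataFrame idea, assuming pandas is available; the roster data is made up:

```python
import pandas as pd

# Hypothetical roster: one row per (tech, state) assignment.
techs = pd.DataFrame({
    'name':  ['jim', 'jim', 'bill'],
    'email': ['jim@example.com', 'jim@example.com', 'bill@example.com'],
    'state': ['WA', 'CA', 'TX'],
})

# Filter by state to find who gets the e-mail for a ticket from TX.
row = techs[techs['state'] == 'TX']
print(row['email'].iloc[0])  # bill@example.com
```

Reassigning a fired tech's states is then a single vectorized update instead of editing 15 dictionary entries.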
|
# ? Mar 20, 2015 22:26 |
|
Or invert the problem:

code:
tech['jim'] = ('WA', 'CA', 'TX', <etc>)
tech['bill'] = ('AK', 'AZ', <etc>)
Then just iterate over the techs looking for the state.
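The lookup code the post referenced is gone; a sketch of iterating the inverted mapping (state lists are made up):

```python
# Inverted mapping: each tech maps to the states they cover.
tech = {}
tech['jim'] = ('WA', 'CA', 'TX')
tech['bill'] = ('AK', 'AZ')

def who_covers(state):
    # Scan each tech's state tuple until we find a match.
    for name, states in tech.items():
        if state in states:
            return name
    return None
```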
duck monster fucked around with this message at 00:05 on Mar 21, 2015 |
# ? Mar 20, 2015 23:59 |
|
I'm doing a Python course and I'm completely stumped on how to do a question in an assignment. It's a beginner course and they've given us some predefined stuff to use. This is an example of what it's supposed to look like: This is what I've currently managed to do: This is my code:

code:
E: I fixed the numbers being wrong after I posted this: instead of display_temp(data[z][i]) it should be display_temp(data[z][sdpos+i])

E: Somehow I got it to work, kinda. It looks like this now:

underage at the vape shop fucked around with this message at 12:11 on Mar 21, 2015 |
# ? Mar 21, 2015 11:56 |
|
You probably want to have a look at string formatting. (If it's a bit much to take in, have a look at the examples; there are some using the exact same alignment trick as the method you're calling.) Basically the output is tabulated: each temperature their method prints is padded to a fixed width, so the next one starts in the right place. Your issue is that your dates aren't padded, so the first temperature is printed too far to the left, and so all the rest are too. Look at their printing code and steal it!
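A small illustration of the padding trick being described (the dates and temperatures are made up, not from the assignment):

```python
dates = ['3/1', '3/15', '3/21']
temps = [12.5, 9.0, 15.25]

# '<8' left-aligns the date in 8 characters; '>8.2f' right-aligns the
# temperature in 8 characters, so columns line up regardless of date length.
for date, temp in zip(dates, temps):
    print('{:<8}{:>8.2f}'.format(date, temp))
```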
|
# ? Mar 21, 2015 12:55 |
|
It worked, thanks! I just need to finish making the interact function that lets the user put in commands (aka calling the functions I just wrote), then the comments, and then I'll have done this whole assignment in about 6 hours.
|
# ? Mar 21, 2015 14:33 |
|
Hey dudes, Numba's pretty cool. I mean fast. Would there be any interest in a PyPI module that contains some basic numerical functions implemented with it? I.e. mean, sum, std, variance, bisect, interp, etc. Maybe call the module 'fast', and you could just call fast.sum(), fast.interp(), etc., and it would do the same as the standard/math/numpy libraries, but faster? Seems like it could be a drop-in replacement.

Background: Numba's a Continuum module that lets you write code that runs about as fast as C. You make a Python function with a Numba decorator, but you have to give up Python's niceties in that function. It's still super convenient, because once you've made a Numba function, you can call it from full-up Python.

That said, a caveat I need to confirm: I think some of the benefit of Numba is from joining as much stuff as you can get away with into a single loop, and if you split the code into standalone functions, you may lose some of the performance benefits. E.g. a Pearson correlation function that runs way faster than the scipy one:

Python code:

Dominoes fucked around with this message at 18:34 on Mar 22, 2015 |
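The Pearson snippet above was lost in scraping. A sketch of what a loop-style, Numba-friendly version might look like; it's shown undecorated so it runs anywhere, but with Numba you would add @numba.njit on top:

```python
import math

# With Numba: @numba.njit above this def. Plain loops and floats only,
# which is exactly the style Numba compiles well.
def pearson(x, y):
    n = len(x)
    mean_x = mean_y = 0.0
    for i in range(n):
        mean_x += x[i]
        mean_y += y[i]
    mean_x /= n
    mean_y /= n
    cov = var_x = var_y = 0.0
    for i in range(n):
        dx = x[i] - mean_x
        dy = y[i] - mean_y
        cov += dx * dy
        var_x += dx * dx
        var_y += dy * dy
    return cov / math.sqrt(var_x * var_y)
```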
# ? Mar 22, 2015 18:17 |
|
Did some homework: it looks like splitting loops does cause slowdowns (although still much faster than equivalent numpy/scipy funcs), but you can still split up some of the funcs. E.g. these basic funcs that work together can't really be made to share loops, so there's no benefit to mushing them together:

Python code:
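The posted functions are lost; a plausible sketch of the split-up basic funcs being discussed (undecorated here; with Numba each would get @numba.njit). Note how std calls var, which recomputes the mean — the extra pass QuarkJets points out below:

```python
import math

def mean_(x):
    total = 0.0
    for v in x:
        total += v
    return total / len(x)

def var_(x):
    m = mean_(x)          # separate loop: the mean gets recomputed
    acc = 0.0
    for v in x:
        acc += (v - m) ** 2
    return acc / len(x)

def std_(x):
    return math.sqrt(var_(x))
```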
|
# ? Mar 22, 2015 18:55 |
|
A bit of an odd situation: I'm developing a website with Pelican and testing it locally using the built-in webserver via SimpleHTTPServer. This works fine on one machine, but when I do it on my laptop, the webserver repeatedly refuses to launch. Python swallowed the error, but it turned out to be a socket error, "Address already in use". netstat doesn't show the address in use, which is weird. When I change the port, it works fine the first time but on subsequent uses reverts to the error. So it's like the port isn't being released, but over quite a long time period. If I run the line that calls the server separately (python -m pelican.server 8000), things seem to be fine. Python 2.7.9, OS X 10.9, Pelican 3.5, for what it's worth.
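One common cause of this symptom (though not necessarily Pelican's): the old socket lingers in TIME_WAIT after a shutdown, and a TCPServer only rebinds immediately if allow_reuse_address (SO_REUSEADDR) is set. A sketch using the Python 3 module names:

```python
import http.server
import socketserver

class ReusableServer(socketserver.TCPServer):
    # SO_REUSEADDR lets a restarted server rebind a port whose previous
    # socket is still lingering in TIME_WAIT.
    allow_reuse_address = True

def make_server(port=8000):
    return ReusableServer(('', port), http.server.SimpleHTTPRequestHandler)

# make_server().serve_forever() would start it.
```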
|
# ? Mar 22, 2015 19:09 |
|
setuptools question. Let's say I have a module called 'quick', and ran python setup.py develop on it. Inside the main directory is a subdirectory called quick, and a file in it called funcs.py. funcs.py has some functions called 'mean', 'var', etc. I want to use this like so:

Python code:

Python code:
from .funcs import sum_ as sum, mean, var

No worky. Any idea?
|
# ? Mar 22, 2015 19:22 |
|
I never really had a need to speed up my numerical code, so I haven't looked much into Numba or other optimization techniques and I can't address your questions. I just wanted to point out that it is possible to compute variance in a single pass. That could improve things. Also, be sure to test with a variety of data sizes, as that may affect which styles are faster.

Dominoes posted:setuptools question.

Put any module-level imports that you want to be accessible directly at the library level in __init__.py. So, quick/__init__.py:

Python code:

SurgicalOntologist fucked around with this message at 19:26 on Mar 22, 2015 |
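The __init__.py snippet in this answer was lost. Based on the thread (and the follow-up where Dominoes switches ".funcs" to "quick.funcs"), the fix is a re-export in quick/__init__.py. The sketch below builds a throwaway copy of that layout in a temp directory just to show the import working; the function bodies are made up:

```python
import os
import sys
import tempfile
import textwrap

# Recreate the described layout: a 'quick' package with funcs.py in it.
root = tempfile.mkdtemp()
pkg = os.path.join(root, 'quick')
os.mkdir(pkg)

with open(os.path.join(pkg, 'funcs.py'), 'w') as f:
    f.write(textwrap.dedent('''\
        def mean(x):
            return sum(x) / len(x)

        def var(x):
            m = mean(x)
            return sum((v - m) ** 2 for v in x) / len(x)
    '''))

# quick/__init__.py re-exports the functions at package level,
# so `import quick; quick.mean(...)` works.
with open(os.path.join(pkg, '__init__.py'), 'w') as f:
    f.write('from quick.funcs import mean, var\n')

sys.path.insert(0, root)
import quick
print(quick.mean([1, 2, 3]))  # 2.0
```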
# ? Mar 22, 2015 19:23 |
|
Thanks bro! I did notice that the speed improvement over numpy/scipy gets lower with higher .sizes, so that makes sense. E.g. something might have 100x performance with sample size 100, but 2x when you add a few more zeros. I'll see if I can figure out a single-pass variance. (I assume this means calculating the mean while you're calculating the variance?)

Hey, turns out my prob with imports was that I had double-extensioned the __init__.py. Changed the .funcs to quick.funcs based on your advice. Also, if you have no tolerance, caffeine is a hell of a drug.

Dominoes fucked around with this message at 19:30 on Mar 22, 2015 |
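For the single-pass variance question: the standard answer is Welford's algorithm, which updates the mean and a running sum of squared deviations in one loop. A sketch (population variance, not the n-1 sample version):

```python
def mean_and_var(x):
    """Welford's algorithm: one pass over x, returns (mean, population variance)."""
    n = 0
    mean = 0.0
    m2 = 0.0   # running sum of squared deviations from the current mean
    for v in x:
        n += 1
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)
    return mean, m2 / n
```

It's also more numerically stable than the naive E[x²] - E[x]² formula.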
# ? Mar 22, 2015 19:25 |
|
Dominoes posted:Did some homework: It looks like splitting loops does cause slowdowns (Although still much faster than equivalent numpy/scipy funcs), but you can still split up some of the funcs. Ie:

Well, yeah. Think about the operations that you're performing here. In the first example that you posted (the one not quoted here), you calculated the mean, then you used that in your variance calculation, and then you used those variances to immediately spit out standard deviations. And since you were merging loops, you only actually looped over the data twice (once for the mean, once for the variance).

In this new example, your functions are unaware of the previous results, and the loops are split. So you calculated the mean, and then in calculating the std you wound up recalculating the mean. That's three loops, plus extra function overhead, plus extra compilation overhead, since each of those functions gets compiled separately the first time it's called. Naturally, this runs slower than the two-loop implementation that is all encapsulated in a single function, for many of the same reasons that the numpy implementation is also slower.

Have you actually compared the numpy mean() and sum() methods against your compiled mean() and sum() functions? I would have guessed that the numpy functions are about the same speed in the single-array case, since it's using compiled Fortran (faster) with a bunch of extra features (slower). But once you start doing more complex things, like calculating correlations, numpy's tendency to create temporary arrays would bite you in the rear end and the numba implementation would become faster.
|
# ? Mar 22, 2015 19:55 |
|
Your explanation of why the first correlation example was faster makes sense.

QuarkJets posted:Have you actually compared the numpy mean() and sum() method against your compiled mean() and sum() functions? I would have guessed that the numpy functions are about the same speed in the single-array case, since it's using compiled Fortran (faster) with a bunch of extra features (slower). But once you start doing more complex things, like calculating correlations, numpy's tendency to create temporary arrays would bite you in the rear end and the numba implementation would become faster.

Yep. Performance increase is inversely proportional to data size for the basic funcs.

Python code:

Dominoes fucked around with this message at 20:12 on Mar 22, 2015 |
# ? Mar 22, 2015 20:04 |
|
This brings up an interesting point. We often have situations where an intermediate result of one algorithm is useful on its own. Variance and mean is an obvious, and simple, example, but I can recall encountering the same issue before with more complex algorithms. Is there a standard for dealing with this?

It seems that if someone was really concerned about speed, and needed both the variance and the mean, they would end up implementing it themselves rather than use a library version which will inevitably compute the mean twice. In this day and age I can't see anyone re-implementing mean and variance as anything except wasted effort. What can library authors do about this?

I see two options, neither of which are very attractive (and neither of which I've seen done). You could have the variance function spit out the mean as well, which would work but would create some awkward code. Alternatively, the variance function could take the mean as an optional input, which if passed causes the mean computation step to be skipped. This would also work but relies on an invariant that could break down.

Is there a more general term for this problem in computer science? Basically the phenomenon whereby clean encapsulation is at odds with doing each calculation only once.
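The second option described above (an optional precomputed mean) might look like this; the invariant that mu really equals mean(x) is left to the caller, which is exactly the fragility the post worries about:

```python
def mean(x):
    return sum(x) / len(x)

def var(x, mu=None):
    # Caller may pass a precomputed mean to skip recomputing it.
    # Nothing checks that mu actually equals mean(x) -- that's on the caller.
    if mu is None:
        mu = mean(x)
    return sum((v - mu) ** 2 for v in x) / len(x)

data = [1.0, 2.0, 3.0, 4.0]
m = mean(data)
print(var(data, m))  # mean computed only once across both calls
```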
|
# ? Mar 22, 2015 20:17 |
|
SurgicalOntologist posted:Many tutorials/examples will instead recommend from .funcs import var but my understanding is since Python 3 relative imports are not recommended.

Kinda glad to hear it: I've written a lot of Python code, but relative imports have always given me trouble. I just avoid them by using the absolute path all the time.
|
# ? Mar 22, 2015 20:26 |
|
SurgicalOntologist posted:It seems that if someone was really concerned about speed, and needed both the variance and the mean, they would end up implementing it themselves rather than use a library version which will inevitably compute the mean twice. In this day and age I can't see anyone re-implementing mean and variance as anything except wasted effort. What can library authors do about this?

This is probably a topic that should be brought over to the scientific computing thread, where it's more relevant, but: you're absolutely right, in computation-heavy fields it's common to reimplement basic library features for the sake of speed. Even in very fast computational languages (C and Fortran) it is common for computational experts to reimplement basic functions instead of using standard ones, much as Dominoes has done here. Even a speed increase of a few milliseconds per function call can have huge implications for code that normally takes days to run on a large supercomputer.

And I don't think that there's anything that library maintainers can really do about this in a lot of cases. On one hand, you want features that are as fast as possible. On the other hand, you also want features that are user-friendly and adaptable, which sometimes comes at the expense of speed. Consider FFTW: the core code is extremely fast, but people who have never used FFTW before may need some time in order to use it effectively. Reimplementations of FFTW (the fft features of MATLAB or Scipy/Numpy) provide a trivial-to-use interface, but using the default parameters usually results in a slower FFT. Or, as we've seen here, the numpy mean() function is compiled but runs slower than the simple mean() function that someone can code up in a few seconds. The numpy mean function takes 4 optional arguments that are useful in a variety of specific use cases but might not be necessary in most cases.

e: I guess that you could write simple specific-case functions and just make their uses very specific ("mean_float32 returns the float32 flattened mean of an array of float32 values", "mean_int64 returns the int64 flattened mean of an array of int64 values", etc). That's appealing to people solving hard computational problems due to the speed involved, but it's not very appealing to the computer scientists that create standard libraries (for maintainability and user-friendliness reasons) or to the average user of these libraries. Going back to FFTW, that's why FFTW has so many separate but almost-equivalent functions (this one takes a complex array and modifies it in place, this one takes a real array and returns a complex array, etc). The maintainers want a library that is optimally fast, and they chose to sacrifice some user friendliness in order to maintain that speed.

QuarkJets fucked around with this message at 21:37 on Mar 22, 2015 |
# ? Mar 22, 2015 21:28 |
|
QuarkJets posted:e: I guess that you could write simple specific-case functions and just make their uses very specific ("mean_float32 returns the float32 flattened mean of a an array of float32 values", "mean_int64 returns the int64 flattened mean of an array of int64 values", etc). That's appealing to people solving hard computational problems due to the speed involved, but it's not very appealing to the computer scientists that create standard libraries (for maintainability and user friendliness reasons) or to the average user of these libraries. Going back to FFTW, that's why FFTW has so many separate but almost-equivalent functions (this one takes a complex array and modifies it in place, this one takes a real array and returns a complex array, etc). The maintainers want a library that is optimally fast, and they chose to sacrifice some user friendliness in order to maintain that speed.
|
# ? Mar 22, 2015 21:40 |
|
SurgicalOntologist posted:This brings up an interesting point. We often have situations where an intermediate result of one algorithm is useful on its own. Variance and mean is an obvious, and simple, example, but I can recall encountering the same issue before with more complex algorithms. Is there a standard for dealing with this?

There is another approach, deferred execution, and that is the path that Blaze (and dynd to some degree) are trying to follow. Basically, instead of immediately executing expressions, you keep track of all the expressions that are used and collect them into a big expression graph. Only when you actually want a "realized" result is the entire expression graph executed. But at this point you have a lot more information: you can optimize the expression graph to coalesce duplicate computations, remove unnecessary temporaries, use the most efficient access patterns, and only compute enough to actually provide what was asked for. If you are thinking that sounds like another compiler, well, that's because basically it is. If you combine this with things like Numba, it becomes even more powerful. Our vision for scientific and analytical compute is really about being able to spell things at a high level, but push efficient execution down to the metal and across clusters.

BTW Dominoes, I share a new office with most of the Numba devs. If you have any specific ideas you want me to pass on please feel free to let me know here or pm.

BigRedDot fucked around with this message at 21:50 on Mar 22, 2015 |
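A toy illustration of the deferred-execution idea (this is not Blaze's API, just a sketch of the principle): nothing runs until compute() is called, and shared subexpressions, like a mean used by both variance and the final result, are evaluated only once per graph execution:

```python
class Expr:
    """Toy deferred expression node: nothing runs until .compute()."""
    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args

    def compute(self, cache=None):
        if cache is None:
            cache = {}
        if id(self) in cache:           # coalesce duplicate subexpressions
            return cache[id(self)]
        vals = [a.compute(cache) if isinstance(a, Expr) else a
                for a in self.args]
        cache[id(self)] = self.fn(*vals)
        return cache[id(self)]

calls = {'mean': 0}
def mean_fn(xs):
    calls['mean'] += 1                  # count how often mean actually runs
    return sum(xs) / len(xs)

data = [1.0, 2.0, 3.0, 4.0]
mean = Expr(mean_fn, data)
var = Expr(lambda xs, m: sum((v - m) ** 2 for v in xs) / len(xs), data, mean)
std = Expr(lambda v: v ** 0.5, var)
both = Expr(lambda m, s: (m, s), mean, std)   # mean appears in the graph twice

result = both.compute()
print(result, calls['mean'])  # mean was evaluated only once
```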
# ? Mar 22, 2015 21:47 |
|
BigRedDot posted:BTW Dominoes I share a new office with most of the Numba devs. If you have any specific ideas you want me to pass on please feel free to let me know here or pm.

Allow for Python function annotations. Method one: ignore them, like the Python interpreter does, instead of throwing an error. Should be easy to implement. Method two: use them as an alternative syntax for function signatures, like with mypy. From Numba's documentation:

Python code:
|
# ? Mar 22, 2015 21:55 |
|
Quick on Github. Rough. Needs work and more functionality before I put it on pypi. Not sure how useful it will be to others, since the performance increase is only notable for number crunching. The idea is to install with pip install quick, and use its numerical functions as faster drop-in replacements for builtins or numpy's. Might fill a niche between using numpy functions, and writing custom optimized code. Dominoes fucked around with this message at 22:46 on Mar 22, 2015 |
# ? Mar 22, 2015 22:24 |
|
Still a programming newbie. Would like some help with enumerate(). When I try to learn stuff on my own, it's hard to follow because of the different variables they are using. I'm reading a txt file into a list. I need to number each line, then print even-numbered lines. I made it work like this, but I have a feeling that enumerate works better. (I discovered it afterwards.)

code:
|
# ? Mar 23, 2015 00:33 |
|
Didn't test this:

Python code:

I think in Python 2 (again, not testing) you could use a slice on the output of enumerate to avoid the condition altogether:

Python code:
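The code blocks in this answer were lost; a sketch of the enumerate() approach being described (sample lines stand in for the file contents):

```python
lines = ['alpha\n', 'beta\n', 'gamma\n', 'delta\n']  # stand-in for f.readlines()

even_lines = []
# start=1 so the first line of the file is line 1, not line 0.
for number, line in enumerate(lines, start=1):
    if number % 2 == 0:
        even_lines.append(line)

print(''.join(even_lines))
```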
|
# ? Mar 23, 2015 01:01 |
|
KICK BAMA KICK posted:CODE AND poo poo

The first example is great. I understand it and it works perfectly. %d and the like are confusing to me for some reason. Also, I haven't gotten to join yet, but it seems pretty straightforward. Thanks much.

This is for an early Rosalind problem. Has anyone gone through these problems before? It seems fun and not terribly boring.
|
# ? Mar 23, 2015 01:18 |
|
jimcunningham posted:%d and the like are confusing to me for some reason.

Python code:
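The comparison snippets here didn't survive; a minimal side-by-side of the two styles being discussed:

```python
n, name = 7, 'line'

old = '%s %d' % (name, n)       # old printf-style: %s string, %d integer
new = '{} {}'.format(name, n)   # str.format: placeholders, types inferred

padded = '{:>5}'.format(n)      # alignment/padding is much easier with format
print(old, '|', new, '|', repr(padded))
```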
|
# ? Mar 23, 2015 01:29 |
|
Is the old-style use a no-no, or just preference?
|
# ? Mar 23, 2015 01:38 |
|
BigRedDot posted:There is another approach, deferred execution, and that is the path that Blaze (and dynd to some degree) are trying to follow. Basically, instead of immediately executing expressions, you keep track of all the expressions that are used and collect them into a big expression graph. The only when you actually want a "realized" result is the entire expression graph executed. But at this point you have a lot more information, you can optimize the expression graph to coalesce duplicate computations, remove unnecessary temporaries, use the most efficient access patterns, and only compute enough to actually provide what was asked for. If you are thinking that sounds like another compiler, well that's because basically it is. If you combine this with things like Numba, it becomes even more powerful. Our vision is for scientific and analytical compute is really about being able to spell things at a high level, but push efficient execution down to the metal and across clusters.

That's super cool. I've been following Matthew Rocklin's blog so this is all familiar, but it didn't occur to me as a resolution to this issue. Good stuff. I should probably start using blaze one of these days.
|
# ? Mar 23, 2015 02:22 |
|
KICK BAMA KICK posted:but that doesn't work in Python 3 because enumerate returns a generator or iterator or something like that instead of a plain old list.

You can "slice" generators using itertools:

code:
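The itertools snippet was lost; the standard tool is itertools.islice, which slices any iterator lazily, so it works on enumerate()'s generator in Python 3. A sketch:

```python
from itertools import islice

lines = ['a\n', 'b\n', 'c\n', 'd\n', 'e\n']

# islice(iterable, start, stop, step): start at index 1 (the second pair,
# i.e. line number 2) and take every other pair -- the even-numbered lines.
evens = list(islice(enumerate(lines, start=1), 1, None, 2))
print(evens)
```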
|
# ? Mar 23, 2015 03:05 |
|
BigRedDot posted:You can "slice" generators using itertools:
|
# ? Mar 23, 2015 03:12 |
|
SurgicalOntologist posted:This brings up an interesting point. We often have situations where an intermediate result of one algorithm is useful on its own. Variance and mean is an obvious, and simple, example, but I can recall encountering the same issue before with more complex algorithms. Is there a standard for dealing with this?

If the results are deterministic, you can do something called memoization, which is basically caching intermediate values for functions where you know the output won't change. But really, there is almost always a trade-off between performance, flexibility, and ease of use.

I'm writing GIS software from scratch right now, and I've got the standard ConvertLatLonToScreenCoords() and DrawLine() type things for one-off tasks or user-defined things where performance isn't a big deal. The core methods for drawing roads or cities or whatever else are all custom and repeat a ton of code. It's harder to maintain, it creates bigger binaries, etc., but performance is the most critical thing when rendering out 100,000 polylines to the screen, so it's a necessary evil.

Abstract things out and make code easy to read and maintain. Once you hit a brick wall, start considering other options like cutting down on how frequently your function is called, pre-fetching it, caching it, etc. I'd probably save writing a one-off function for last, when all else has failed, but don't be ashamed when you have to do it.
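In Python, the easiest memoization tool is functools.lru_cache; a sketch applied to the thread's running mean example (the call counter is just there to show the caching):

```python
from functools import lru_cache

calls = {'count': 0}

@lru_cache(maxsize=None)
def expensive_mean(data):
    calls['count'] += 1          # only incremented on a cache miss
    return sum(data) / len(data)

data = (1.0, 2.0, 3.0, 4.0)      # must be hashable (tuple, not list)
expensive_mean(data)
expensive_mean(data)             # second call is a cache hit
print(calls['count'])  # 1
```

The hashability requirement is the catch for numerical code: lists and numpy arrays can't be cached this way without conversion.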
|
# ? Mar 23, 2015 04:47 |
|
Yeah, I almost added memoization as another option but I decided I had already written enough. Anyway, for me it's mostly academic, as I've rarely if ever had to seriously optimize my code, so I've always been on the "ease of use" side of the trade-off.
|
# ? Mar 23, 2015 04:57 |
|
|
|
jimcunningham posted:Is the old style use a no-no. Or just preference?

I think it's just preference, but the old style is so much less flexible that I basically never use it anymore.
|
# ? Mar 23, 2015 07:01 |