Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Hammerite posted:

Thanks for this pointer. This looks like it might at least have most of what I need, I'll have to decide whether to use this or go on with my own thing.

Taking a longer (but still brief) look at this, unless I'm missing something, it still uses floating point numbers internally to speed things up or whatever. I need something that will do things symbolically (even if that implies that things will be slower) and that will cooperate if I decide to use my own subclass of numbers.Real. I'll probably carry on with my own class. I'm having fun with it, anyway. I never learnt basic linear algebra as well as I should have.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
If you install something like uWSGI in a virtualenv, should any configuration files also live inside the virtualenv?

Symbolic Butt
Mar 22, 2009

(_!_)
Buglord

Hammerite posted:

Taking a longer (but still brief) look at this, unless I'm missing something, it still uses floating point numbers internally to speed things up or whatever. I need something that will do things symbolically (even if that implies that things will be slower) and that will cooperate if I decide to use my own subclass of numbers.Real. I'll probably carry on with my own class. I'm having fun with it, anyway. I never learnt basic linear algebra as well as I should have.

Years ago, as an exercise, I implemented Gaussian elimination over fields like that using the fractions library coupled with customized classes. For example, in the Q[sqrt(2)] case the objects would be just pairs of fractions (a, b) representing a + b*sqrt(2), where the inverse would be (a/(a^2 - 2b^2), -b/(a^2 - 2b^2)).
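A minimal sketch of that pair-of-fractions idea, in Python 3. The class name QSqrt2 and its layout are made up for illustration and don't come from any library. The inverse comes from multiplying by the conjugate a - b*sqrt(2), so the denominator is the norm a^2 - 2b^2, which is nonzero for every nonzero element because sqrt(2) is irrational.

```python
from fractions import Fraction


class QSqrt2:
    """Element a + b*sqrt(2) of Q[sqrt(2)], stored as a pair of Fractions.

    Hypothetical class for illustration only.
    """

    def __init__(self, a, b=0):
        self.a = Fraction(a)
        self.b = Fraction(b)

    def __eq__(self, other):
        if not isinstance(other, QSqrt2):
            return NotImplemented
        return (self.a, self.b) == (other.a, other.b)

    def __hash__(self):
        return hash((self.a, self.b))

    def __mul__(self, other):
        # (a + b*r)(c + d*r) = (ac + 2bd) + (ad + bc)*r, where r = sqrt(2)
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def inverse(self):
        # 1/(a + b*r) = (a - b*r) / (a^2 - 2b^2)
        norm = self.a ** 2 - 2 * self.b ** 2
        if norm == 0:
            raise ZeroDivisionError("zero has no inverse")
        return QSqrt2(self.a / norm, -self.b / norm)

    def __repr__(self):
        return "QSqrt2({}, {})".format(self.a, self.b)


x = QSqrt2(1, 1)                      # 1 + sqrt(2)
assert x * x.inverse() == QSqrt2(1)   # exact, no floating point involved
```

Addition, subtraction and division would follow the same pattern, and once they exist the elimination code never has to know it isn't working with plain Fractions.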

Dren
Jan 5, 2001

Pillbug
Anyone have a one-liner for the for loop here so that the function is not needed?

Python code:
def next_two(i):
    while True:
        yield (i.next(), i.next())

a = range(10)

for x, y in next_two(iter(a)):
    print x, y

good jovi
Dec 11, 2000

'm pro-dickgirl, and I VOTE!

fletcher posted:

If you install something like uWSGI in a virtualenv, should any configuration files also live inside the virtualenv?

Config files should live with the project. Virtualenvs do have a directory, but that's just an implementation detail, and that directory could be anywhere (for instance, I use virtualenvwrapper, so all my env directories are in ~/.virtualenvs).

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Symbolic Butt posted:

Years ago, as an exercise, I implemented Gaussian elimination over fields like that using the fractions library coupled with customized classes. For example, in the Q[sqrt(2)] case the objects would be just pairs of fractions (a, b) representing a + b*sqrt(2), where the inverse would be (a/(a^2 - 2b^2), -b/(a^2 - 2b^2)).

Yeah, this is exactly the sort of thing I was thinking of doing (not very far into it right now).

I had been thinking of making the matrix objects mutable in a limited way - it would be possible to edit entries in a matrix, but not to change the dimensions of the matrix. Now, though, I'm thinking of making them immutable. There are some things I'm not sure about there, though.

  1. How exactly do I create a sensible __hash__() for my class? The documentation says that the key thing is that if two objects are equal, then they should have equal hash values. But is there more that I ought to do? If I have a naff __hash__() method are Python programmers going to scoff? :ohdear: If a matrix (3x2, say) is represented internally by something like t = ((1, 2), (3, 4), (5, 6)), then I could just implement __hash__() as hash(t). But maybe the hash should be different to the hash of t. It could be ~hash(t), or hash(t) + x, or hash(t + (y,)) + x (where x and y are some integers plucked out of the air). Is there a requirement that an object's hash value belong inside a set range? The docs just say it should be an integer.
  2. Am I allowed to have mutable internal state if the object is immutable? Motivation for this question is that if my class calculates, say, the determinant then it might be computationally expensive, so I might want to cache it in the object in case it gets requested a second time, or is needed for something a second time. The way I read the docs it seems like the hash value need only depend upon the same object properties as the __eq__() method, so it would be fine to do this.
  3. If I do an operation that potentially involves lots of intermediate matrices, like reducing a big matrix to echelon form, then it might be wasteful to generate new objects for each step, so maybe I need a mutable companion class (MutableMatrix or BaseMatrix, say) to reduce that expense by doing things in-place. Is this a thing that people do?

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

Dren posted:

Anyone have a one-liner for the for loop here so that the function is not needed?

Python code:
def next_two(i):
    while True:
        yield (i.next(), i.next())

a = range(10)

for x, y in next_two(iter(a)):
    print x, y

There's the standard zip/iter trick, but that seems readable enough to me. Only thing I'd do is use next(i) instead of i.next().

tef
May 30, 2004

-> some l-system crap ->

Hammerite posted:

Yeah, this is exactly the sort of thing I was thinking of doing (not very far into it right now).

I had been thinking of making the matrix objects mutable in a limited way - it would be possible to edit entries in a matrix, but not to change the dimensions of the matrix. Now, though, I'm thinking of making them immutable. There are some things I'm not sure about there, though.

Make them Immutable, or better yet, use an existing Matrix Library

quote:

How exactly do I create a sensible __hash__() for my class?

If your __eq__ method checks .bar and .foo, use them in the __hash__ method.
code:
def __hash__(self):
    return hash((self.foo, self.bar))

quote:

Am I allowed to have mutable internal state if the object is immutable?

It isn't really mutable if it can't change. You can have lazy properties that are calculated and then fixed without being made fun of by python celebrities.
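Here's a rough sketch of how the hash and the lazy property could fit together, in Python 3. Everything in it (the Matrix class, the _rows and _det attributes, the Laplace-expansion determinant) is an assumption made for illustration, not anyone's actual implementation. The point is that __hash__ uses exactly the state __eq__ compares, and the cached determinant sits outside both.

```python
from fractions import Fraction


class Matrix:
    """Immutable matrix of Fractions (illustrative sketch only)."""

    def __init__(self, *rows):
        if not rows:
            raise ValueError("need at least one row")
        self._rows = tuple(tuple(Fraction(x) for x in row) for row in rows)
        if len({len(row) for row in self._rows}) != 1:
            raise ValueError("rows must all have the same length")
        self._det = None   # private cache; never used by __eq__ or __hash__

    def __eq__(self, other):
        if not isinstance(other, Matrix):
            return NotImplemented
        # Equal tuples of tuples implies equal dimensions and equal entries.
        return self._rows == other._rows

    def __hash__(self):
        # Built from the same state that __eq__ compares.
        return hash(self._rows)

    @property
    def determinant(self):
        # Computed on first access, then reused.
        if self._det is None:
            self._det = self._compute_det(self._rows)
        return self._det

    @staticmethod
    def _compute_det(rows):
        # Laplace expansion along the first row: exact and simple, but slow.
        if any(len(row) != len(rows) for row in rows):
            raise ValueError("determinant needs a square matrix")
        if len(rows) == 1:
            return rows[0][0]
        total = Fraction(0)
        for j, entry in enumerate(rows[0]):
            minor = tuple(row[:j] + row[j + 1:] for row in rows[1:])
            total += (-1) ** j * entry * Matrix._compute_det(minor)
        return total


m = Matrix([1, 2], [3, 4])
assert m.determinant == -2
assert {m: "found"}[Matrix([1, 2], [3, 4])] == "found"   # equal value, equal hash
```

Callers never see _det change, so from the outside the object still behaves as immutable.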

quote:

If I do an operation that potentially involves lots of intermediate matrices, like reducing a big matrix to echelon form, then it might be wasteful to generate new objects for each step, so maybe I need a mutable companion class (MutableMatrix or BaseMatrix, say) to reduce that expense by doing things in-place. Is this a thing that people do?

Write the slow (immutable) version first, then replace it with a mutable version if it isn't fast enough. If you're worried about speed, using a library with native code may be the best way to handle things, rather than fiddling around in Python.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html

tef
May 30, 2004

-> some l-system crap ->

Suspicious Dish posted:

There's the standard zip/iter trick, but that seems readable enough to me. Only thing I'd do is use next(i) instead of i.next().

Yeah, the helper function seems the cleanest way. If you're wanting to do it without one, you can use this horror :3:
code:
>>> a_list = [chr(x) for x in range(65, 65+26)]
>>> for i, keys in itertools.groupby(enumerate(a_list), lambda i:i[0]//2):
...     print i, list(keys)
... 
0 [(0, 'A'), (1, 'B')]
1 [(2, 'C'), (3, 'D')]
2 [(4, 'E'), (5, 'F')]
3 [(6, 'G'), (7, 'H')]
4 [(8, 'I'), (9, 'J')]
5 [(10, 'K'), (11, 'L')]
6 [(12, 'M'), (13, 'N')]
7 [(14, 'O'), (15, 'P')]
8 [(16, 'Q'), (17, 'R')]
9 [(18, 'S'), (19, 'T')]
10 [(20, 'U'), (21, 'V')]
11 [(22, 'W'), (23, 'X')]
12 [(24, 'Y'), (25, 'Z')]

tef
May 30, 2004

-> some l-system crap ->

Suspicious Dish posted:

There's the standard zip/iter trick.

I assume you mean this ?

code:
>>> a = b = iter(a_list)
>>> for i in itertools.izip(a,b):
...     print i
... 
('A', 'B')
('C', 'D')
('E', 'F')
('G', 'H')
('I', 'J')
('K', 'L')
('M', 'N')
('O', 'P')
('Q', 'R')
('S', 'T')
('U', 'V')
('W', 'X')
('Y', 'Z')
>>> 

Mrs. Wynand
Nov 23, 2002

DLT 4EVA

Hammerite posted:

Yeah, this is exactly the sort of thing I was thinking of doing (not very far into it right now).

I had been thinking of making the matrix objects mutable in a limited way - it would be possible to edit entries in a matrix, but not to change the dimensions of the matrix. Now, though, I'm thinking of making them immutable. There are some things I'm not sure about there, though.

  1. How exactly do I create a sensible __hash__() for my class? The documentation says that the key thing is that if two objects are equal, then they should have equal hash values. But is there more that I ought to do? If I have a naff __hash__() method are Python programmers going to scoff? :ohdear: If a matrix (3x2, say) is represented internally by something like t = ((1, 2), (3, 4), (5, 6)), then I could just implement __hash__() as hash(t). But maybe the hash should be different to the hash of t. It could be ~hash(t), or hash(t) + x, or hash(t + (y,)) + x (where x and y are some integers plucked out of the air). Is there a requirement that an object's hash value belong inside a set range? The docs just say it should be an integer.
  2. Am I allowed to have mutable internal state if the object is immutable? Motivation for this question is that if my class calculates, say, the determinant then it might be computationally expensive, so I might want to cache it in the object in case it gets requested a second time, or is needed for something a second time. The way I read the docs it seems like the hash value need only depend upon the same object properties as the __eq__() method, so it would be fine to do this.
  3. If I do an operation that potentially involves lots of intermediate matrices, like reducing a big matrix to echelon form, then it might be wasteful to generate new objects for each step, so maybe I need a mutable companion class (MutableMatrix or BaseMatrix, say) to reduce that expense by doing things in-place. Is this a thing that people do?

If you implement __hash__ you don't need to implement __eq__ do you?

In any case, equality (and hash equality) are supposed to be equal-by-value, so yes, usually you'd implement __hash__ as hash((self.internal_a,self.internal_b,....)) or somesuch. If you want identity the user would use the "is" operator. There are times when you only want equality by identity (usually when the objects represent some external state, like a row or a file or something) in which case just don't implement __hash__ (or __eq__ etc) as that is how objects work by default.

As for mutability, there are no hard and fast semantics (that I know of) in python regarding mutability for your own classes. If you call your object immutable I'm going to assume it's immutable, and what you do internally is none of my business. If the abstraction leaks and an object I assume is immutable starts mutating from under me though, I will try to hate you to death over the internet. But yes, you can definitely do it, just make sure the public presentation of your class is consistent.

Having said all that, tef is quite correct in saying it is best not to sweat performance at first. It's not actually that obvious trying to predict where mutability vs immutability will perform better.

NtotheTC
Dec 31, 2007


tef posted:

I assume you mean this ?

code:
>>> a = b = iter(a_list)
>>> for i in itertools.izip(a,b):
...     print i
... 
('A', 'B')
('C', 'D')
('E', 'F')
('G', 'H')
('I', 'J')
('K', 'L')
('M', 'N')
('O', 'P')
('Q', 'R')
('S', 'T')
('U', 'V')
('W', 'X')
('Y', 'Z')
>>> 


I got...

Python code:
for x in zip(*[iter(range(10))]*2):
    print x[0], x[1]
... by googling around. It's not very pretty, but I figure anyone going for one liners isn't really after readability.

NtotheTC fucked around with this message at 16:48 on Aug 13, 2013

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

NtotheTC posted:

I got...

Python code:
for x in zip(*[iter(range(10))]*2):
    print x[0], x[1]
... by googling around. It's not very pretty, but I figure anyone going for one liners isn't really after readability.

Yep, that's the standard one. It relies on the fact that list multiplication repeats the reference rather than copying, so both entries in the list are the same iterator object.
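To see why that works, here's a small Python 3 sketch (where zip is already lazy, so there's no need for izip). The variable names are just for illustration.

```python
it = iter(range(10))
pair = [it] * 2                  # two references to ONE iterator
assert pair[0] is pair[1]

# zip pulls one item from each argument in turn; both arguments are the
# same iterator, so consecutive items land in the same tuple.
pairs = list(zip(*pair))

# Contrast: two independent iterators just pair each item with itself.
same = list(zip(iter(range(3)), iter(range(3))))

print(pairs)
print(same)
```

If the input has an odd number of items, zip stops early and the last item is silently dropped.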

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Mr. Wynand posted:

If you implement __hash__ you don't need to implement __eq__ do you?

The way I think I understand it, hashable objects' hashes have to be the same if the objects are the same. But two different objects could happen to have the same hash. It's really really unlikely because hashes are integers with lots of digits, like 3713084879518070856 (that was the hash of the tuple (4, 5) when I fired up the Python interpreter just now). But it could happen, and it would be bad if two different matrices could, entirely by chance, test as being the same on a certain occasion because they chance to have the same hash.

Thanks for advice and thanks also to tef. I know there are existing matrix libraries but I am preoccupied with having one that prefers using Fractions over floats*. I'm not sure what search terms I'd use to find something like that. Finding matrix library implementations on Google is easy, finding ones that satisfy idiosyncratic requirements is less easy.

* For example, the following snippets should be equivalent:

code:
M = Matrix([1, 2], [3, 4])
M /= 4
code:
from fractions import Fraction as frac
M = Matrix([frac(1, 4), frac(1, 2)], [frac(3, 4), frac(1, 1)])
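The exact-division behaviour can be sketched without a Matrix class at all, assuming the entries are held as nested tuples. The helper name divide_entries is hypothetical.

```python
from fractions import Fraction


def divide_entries(rows, divisor):
    """Return rows with every entry divided exactly by divisor."""
    return tuple(tuple(Fraction(x) / divisor for x in row) for row in rows)


quarters = divide_entries(((1, 2), (3, 4)), 4)
assert quarters == ((Fraction(1, 4), Fraction(1, 2)),
                    (Fraction(3, 4), Fraction(1, 1)))
```

Converting each entry to Fraction before dividing is what keeps integer input from ever turning into a float.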

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug

Hammerite posted:

The way I think I understand it, hashable objects' hashes have to be the same if the objects are the same. But two different objects could happen to have the same hash. It's really really unlikely because hashes are integers with lots of digits, like 3713084879518070856 (that was the hash of the tuple (4, 5) when I fired up the Python interpreter just now). But it could happen, and it would be bad if two different matrices could, entirely by chance, test as being the same on a certain occasion because they chance to have the same hash.

Objects don't test as the same if their hashes are equal (unless for some reason you define __eq__ in that way). Hash values are mostly used to select hash table buckets when objects are used as dict keys or set items, and a good __hash__ function will try to minimize collisions so that items are distributed more-or-less uniformly in these data structures.

Everything still "works" correctly even if you use a degenerate hash function:
Python code:
>>> class No:
...   def __hash__(self):
...     return 1
... 
>>> n1 = No()
>>> n1
<__main__.No object at 0x7f9b18473290>
>>> n2 = No()
>>> n2
<__main__.No object at 0x7f9b184732d0>
>>> n1 == n2
False
>>> s = {n1, n2}
>>> s
{<__main__.No object at 0x7f9b184732d0>, <__main__.No object at 0x7f9b18473290>}
>>> No() in s
False
Everything will be assigned to the same hash table bucket, of course, so this guarantees that you'll hit the worst-case O(n) lookup/insertion time. There isn't any danger of losing an object, though.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Lysidas posted:

Objects don't test as the same if their hashes are equal (unless for some reason you define __eq__ in that way). Hash values are mostly used to select hash table buckets when objects are used as dict keys or set items, and a good __hash__ function will try to minimize collisions so that items are distributed more-or-less uniformly in these data structures.

Everything still "works" correctly even if you use a degenerate hash function:
Python code:
>>> class No:
...   def __hash__(self):
...     return 1
... 
>>> n1 = No()
>>> n1
<__main__.No object at 0x7f9b18473290>
>>> n2 = No()
>>> n2
<__main__.No object at 0x7f9b184732d0>
>>> n1 == n2
False
>>> s = {n1, n2}
>>> s
{<__main__.No object at 0x7f9b184732d0>, <__main__.No object at 0x7f9b18473290>}
>>> No() in s
False
Everything will be assigned to the same hash table bucket, of course, so this guarantees that you'll hit the worst-case O(n) lookup/insertion time. There isn't any danger of losing an object, though.

Oh, I see. My understanding was wrong. It is still appropriate to define __eq__() for a matrix class, though. The condition for two matrices to be equal is simple to state and to code. They are equal if they have equal dimensions and elements are pairwise equal. Object identity would not be appropriate, because if you construct two matrices in two different ways or at two different times but they are the same matrix, they should still compare equal.

Mrs. Wynand
Nov 23, 2002

DLT 4EVA
Doh, yes, you need __eq__ separately as you need to test type (or interface, if applicable) as well. I think I was getting confused with ruby...

What I said about equality vs identity still stands though - you (usually) want your objects to be equal by value, and if anyone needs an identity comparison they'll just use "is" explicitly.

Dren
Jan 5, 2001

Pillbug

tef posted:

I assume you mean this ?

code:
>>> a = b = iter(a_list)
>>> for i in itertools.izip(a,b):
...     print i
... 
('A', 'B')
('C', 'D')
('E', 'F')
('G', 'H')
('I', 'J')
('K', 'L')
('M', 'N')
('O', 'P')
('Q', 'R')
('S', 'T')
('U', 'V')
('W', 'X')
('Y', 'Z')
>>> 

This is really nice. I hadn't thought to use two references to the same iterator but I figured there had to be some trick like this. I think I'll use this one.

Suspicious Dish, why do you suggest next(i) instead of i.next()?

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe

Dren posted:

This is really nice. I hadn't thought to use two references to the same iterator but I figured there had to be some trick like this. I think I'll use this one.

Suspicious Dish, why do you suggest next(i) instead of i.next()?

.next() becomes .__next__() in Python 3. next() is a convenience function that calls whichever one is appropriate in the version you're using.
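One more wrinkle if the helper ever runs on Python 3.7 or later: since PEP 479, a StopIteration that escapes inside a generator body is turned into a RuntimeError, so the bare "while True: yield ..." loop no longer ends quietly. A sketch of a version that works on Python 3, keeping the same next_two name:

```python
def next_two(iterator):
    """Yield consecutive pairs from an iterator, dropping a trailing odd item."""
    while True:
        try:
            first = next(iterator)
            second = next(iterator)
        except StopIteration:
            return               # end the generator cleanly (needed since PEP 479)
        yield first, second


pairs = list(next_two(iter(range(10))))
odd = list(next_two(iter(range(5))))     # the lone 4 at the end is dropped
leftover = next(iter([]), None)          # next() also accepts a default value
```

The built-in next() with a default is handy when an exhausted iterator shouldn't raise at all.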

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe

Dren posted:

Suspicious Dish, why do you suggest next(i) instead of i.next()?

Same reason I'd suggest you use str(i) instead of i.__str__().

Dominoes
Sep 20, 2007

Dominoes posted:

PYQT signals/slots question:

I'm getting crashes with QT threads. After troubleshooting and internet searching, it appears that the crashes are caused by updating the GUI from a thread other than its own. The solution I've read is to be to never 'update' the GUI directly, but use signals/slots instead.

Solved. Example solution:
Python code:
import sys

from PyQt5 import QtCore, QtWidgets

from gui import Ui_Main  # Qt Designer file, via pyuic


class Main(QtWidgets.QMainWindow):
    # Signals have to be declared as class attributes.
    mySignal = QtCore.pyqtSignal(str)

    def __init__(self, parent=None):
        super().__init__(parent)
        self.ui = Ui_Main()
        self.ui.setupUi(self)
        self.ui.my_button.clicked.connect(self.function)
        # Connect once here, rather than again on every button click.
        self.mySignal.connect(lambda message: self.ui.statusbar.showMessage(message))

    def function(self):
        self.thread = Thread()
        self.thread.start()


class Thread(QtCore.QThread):
    def __del__(self):
        self.wait()

    def run(self):
        # Emit a signal instead of updating the GUI from this thread.
        main.mySignal.emit("Hi!")


app = QtWidgets.QApplication(sys.argv)
main = Main()
main.show()
sys.exit(app.exec_())

Dominoes fucked around with this message at 03:30 on Aug 14, 2013

digitalcamo
Jul 11, 2013
I just want to see if I'm understanding this code correctly. I'm a little confused by its mechanics, I guess you'd call it. I get "bottle_cokes is not defined" in this section of code.

code:
def box_of_coke(started):
	bottle_cokes = started * 500
	crates_of_cokes = bottle_cokes / 700
	trucks_of_coke = crates_of_cokes * 10
	return bottle_cokes, crates_of_cokes, trucks_of_coke

#This won't work
start_point = 10000
box_of_coke(start_point)
print "There is %d bottles of coke, %d crates, and %d trucks." % (bottle_cokes, crates_of_cokes, trucks_of_coke)
This section of code works.
code:
def box_of_coke(started):
	bottle_cokes = started * 500
	crates_of_cokes = bottle_cokes / 700
	trucks_of_coke = crates_of_cokes * 10
	return bottle_cokes, crates_of_cokes, trucks_of_coke

start_point = 10000
coke, crates, trucks = box_of_coke(start_point)
print "There is %d bottles of coke, %d crates, and %d trucks." % (coke, crates, trucks)
I know how to make the code work, but I don't understand why I have to create new variables for the function variables. I also do not understand how the variable coke gets assigned to bottle_cokes. Does it just go through coke, crates, trucks and assigns the first variable coke to the first variable of box_of_coke? And I'm assuming since I have three variables in box_of_coke I must declare three variables to box_of_coke? I'm sorry if my question is a little confusing I'm just trying to understand functions as best as possible.

Dominoes
Sep 20, 2007

digitalcamo posted:

I just want to see if I'm understanding this code correctly. I'm a little confused by its mechanics, I guess you'd call it. I get "bottle_cokes is not defined" in this section of code.
...
I know how to make the code work, but I don't understand why I have to create new variables for the function variables.

In your first example, 'bottle_cokes', 'crates_of_cokes' etc. are local variables, accessible only inside the box_of_coke() function. In your second example, you assigned the returned values to global variables, which can be accessed outside box_of_coke().

quote:

I also do not understand how the variable coke gets assigned to bottle_cokes. Does it just go through coke, crates, trucks and assigns the first variable coke to the first variable of box_of_coke?
Yes.

quote:

And I'm assuming since I have three variables in box_of_coke I must declare three variables to box_of_coke?
It depends. If you tried to assign it to one variable, you'd get a tuple containing all 3 values you returned. If you tried two, you'd raise this exception: 'ValueError: too many values to unpack (expected 2)'

Technique only: print "There is {0} bottles of coke, {1} crates, and {2} trucks.".format(bottle_cokes, crates_of_cokes, trucks_of_coke)

Dominoes fucked around with this message at 04:07 on Aug 14, 2013

Dren
Jan 5, 2001

Pillbug
The answer is scope.

http://en.m.wikipedia.org/wiki/Scope_(computer_science)

Crosscontaminant
Jan 18, 2007

Another way to think about it: You're expecting the function to dump its names into the outer scope. What actually happens is that a tuple value is created and returned to the outer scope; since in the first code sample it's not being assigned to a name, it gets silently discarded, after which the reference to bottle_cokes raises a NameError because that name is not defined in the outer scope.

In the second code sample, the tuple is created and returned from the function, and then each element of the tuple is assigned to one of the three names you've just declared in the outer scope.

Dominoes posted:

It depends. If you tried to assign it to one variable, you'd get a tuple, with size three. If you tried two, you'd raise this exception: 'ValueError: too many values to unpack (expected 2)'
Though in Python 3 you can use a starred name (e.g. a, *rest = func()) to slurp the remainder of the return values into one name as a list.

Dominoes posted:

Technique only: print "There is {0} bottles of coke, {1} crates, and {2} trucks.".format(bottle_cokes, crates_of_cokes, trucks_of_coke)
By the same token you ought to put brackets around what you give to print, since it becomes a function in Python 3.

deedee megadoodoo
Sep 28, 2000
Two roads diverged in a wood, and I, I took the one to Flavortown, and that has made all the difference.


You need to read up on scope.

The simplest explanation is that the variables within a certain scope (in this case the function) only exist within that scope. However, the function has access to outside variables in the greater scope.

So for example:

code:
def test(myvar):
    print myvar, x

x = "goon"
test("hello")
will actually work because the variable x is available to the function.

however

code:
def test1(myvar):
    y = myvar

def test2(myvar):
    print myvar, y


test1(2)
test2("goon")
will fail because "y" only exists within test1 and is not accessible outside of that function's scope.

There is a lot more to it, but hopefully that makes sense.

Hammerite
Mar 9, 2007

And you don't remember what I said here, either, but it was pompous and stupid.
Jade Ear Joe
When you write something like "return a, b, c" you are really doing "return (a, b, c)". That is, you are returning a tuple with three elements. The comma-separated sequence of values is implicitly a tuple. You might find it helpful to write it with the brackets until you are comfortable with this.

When you return just one value (really you always return just one object, it's just that it's sometimes a tuple, but I mean when you don't use two or more values separated by commas) there's no comma and it's not made into a tuple. (But if for some reason you want to return a tuple with one element x, it is possible to do that by writing "return (x,)")

When you assign to more than one thing, as in "a, b, c = func()" it means you are expecting func() to return a sequence with a length of 3. The names in the "sequence of names" on the left are assigned the respective values from the sequence returned by func().
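Putting those three points together, here's a sketch using the box_of_coke function from the question, written for Python 3. The "//" is an assumption on my part, chosen to keep the integer division that "/" performed on ints in Python 2.

```python
def box_of_coke(started):
    bottle_cokes = started * 500
    crates_of_cokes = bottle_cokes // 700   # integer division, as in Python 2
    trucks_of_coke = crates_of_cokes * 10
    # Same as: return (bottle_cokes, crates_of_cokes, trucks_of_coke)
    return bottle_cokes, crates_of_cokes, trucks_of_coke


result = box_of_coke(10000)        # one name: receives the whole tuple
coke, crates, trucks = result      # three names: one element each, by position
first, *rest = result              # Python 3 only: rest is a list

try:
    a, b = result                  # wrong number of names on the left
except ValueError as exc:
    error = str(exc)

print("There are {0} bottles of coke, {1} crates, and {2} trucks.".format(
    coke, crates, trucks))
```

The names on the left of the assignment have nothing to do with the names inside the function; only the positions in the tuple matter.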

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Hammerite posted:

When you assign to more than one thing, as in "a, b, c = func()" it means you are expecting func() to return a sequence with a length of 3. The names in the "sequence of names" on the left are assigned the respective values from the sequence returned by func().

For future Googlin' this is called "unpacking".

SYSV Fanfic
Sep 9, 2003

by Pragmatica
Anyone know of a good sound recognition library for python or C? I want a library that can compare two audio samples and determine how close of a match they are. I don't remember enough math to write my own (lol laplace and fourier transforms) and I can't find anything. I am currently messing around with echoprint's codegen library to shorten the sample length from 20 seconds down to ~3-5 seconds but I have no idea how well fingerprinting designed for music is going to work for generic sound recognition, especially if there is any background noise involved.

deimos
Nov 30, 2006

Forget it man this bat is whack, it's got poobrain!
Maybe LibXtract?

NtotheTC
Dec 31, 2007


Unit testing question. I decided to write an IRC bot using this lightweight bot framework as a starting point.

I wanted to write some unit tests for it before I started messing around with it, but I'm having trouble wrapping my head around the idea of mocking. For example, I'd like to make a test to check whether it can connect to an IRC server, so I obviously need to mock an irc server, or at least the part of it that would respond when the bot tries to connect. I'm not quite getting how you can mock something like that to behave correctly. Does anyone have any pointers?
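One common approach is to not fake a whole server at all, but to replace the socket the bot talks to with a mock that returns canned bytes. Below is a sketch in Python 3 using unittest.mock. The connect() function is a hypothetical stand-in, since I don't know how the framework in question actually connects, and the host and nick are made up. The only IRC-specific assumption is that a server answers a successful registration with the 001 welcome numeric.

```python
import socket
from unittest import mock


def connect(host, port, nick):
    """Hypothetical stand-in for the bot's connect step (not the real framework)."""
    sock = socket.create_connection((host, port))
    sock.sendall("NICK {}\r\n".format(nick).encode("ascii"))
    sock.sendall("USER {0} 0 * :{0}\r\n".format(nick).encode("ascii"))
    reply = sock.recv(4096).decode("ascii")
    # 001 is the welcome numeric a server sends after successful registration.
    return " 001 " in reply


def test_connect_registers_and_sees_welcome():
    fake_sock = mock.Mock()
    fake_sock.recv.return_value = b":irc.example.net 001 testbot :Welcome\r\n"
    # Swap out the real network call for the duration of the test.
    with mock.patch("socket.create_connection", return_value=fake_sock) as create:
        assert connect("irc.example.net", 6667, "testbot") is True
    create.assert_called_once_with(("irc.example.net", 6667))
    fake_sock.sendall.assert_any_call(b"NICK testbot\r\n")


test_connect_registers_and_sees_welcome()
```

The test checks two things: what the bot sent (the calls recorded on the mock) and how it reacted to what the "server" said (the canned recv value). No network is touched.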

SirPablo
May 1, 2004

Pillbug
I'm racking my brain here trying to figure out how to quickly load data into numpy arrays. I have a single .gz file with 86,000 individual files in it, each one comprising daily weather data for a single station through its existence. The data in a file looks like this:

code:
USC00242347190802TMAX  -56  6  -72  6   11  6  -39  6   22  6   33  6   22  6   56  6   28  6   39  6   22  6   22  6   39  6   39  6    6  6   61  6   39  6  -50  6    0  6   -6  6   78  6   83  6  167  6  106  6   44  6   67  6   61  6  -17  6  -61  6-9999   -9999  
USC00242347190802TMIN -289  6 -250  6 -139  6 -150  6 -250  6 -183  6 -117  6 -139  6 -156  6  -72  6 -128  6 -156  6 -122  6  -39  6 -167  6  -89  6-9999    -200  6 -139  6 -161  6  -89  6  -33  6    6  6  -17  6  -28  6  -28  6  -28  6 -139  6 -156  6-9999   -9999   
USC00242347190802TOBS  -72  6  -78  6  -67  6 -150  6    6  6  -56  6  -89  6    0  6   22  6  -17  6  -33  6 -122  6   28  6    6  6  -89  6   28  6 -111  6 -111  6  -28  6  -61  6   22  6   28  6  106  6   44  6  -28  6   39  6  -17  6 -111  6 -156  6-9999   -9999   
I want to extract just the lines that include TMAX. I've focused on using numpy's genfromtxt, but where I'm at right now leaves me with a structured array and I would like a standard array of int values.

code:
# Extract from tarball
f = T.extractfile(n).readlines()
# Initialize empty array
tmp = []
# Set some vars for ingesting into array
flags = [11, 4, 2, 4, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1,
	1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5,
	1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1,
	5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1,
	1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1,
	1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1]
dtypes = [('station', 'S11'), ('Year', 'i4'), ('Month', 'i2'), ('Type', 'S4'),
	('Value0', np.number), ('MFlag0', np.character), ('QFlag0', np.character), ('SFlag0', np.character),
	('Value1', np.number), ('MFlag1', np.character), ('QFlag1', np.character), ('SFlag1', np.character),
	('Value2', np.number), ('MFlag2', np.character), ('QFlag2', np.character), ('SFlag2', np.character),
	...
	('Value30', np.number), ('MFlag30', np.character), ('QFlag30', np.character), ('SFlag30', np.character)]
cols = [1, 2, 3, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 68, 72, 76, 80, 84, 88, 92, 96, 100, 104, 108, 112, 116, 120, 124]
# Magic time...now load the entire file into a structured array
d = np.genfromtxt(f, dtype=dtypes, delimiter=flags, usecols=cols, missing_values=-9999, usemask=True)
# Strip out only the TMAX lines
d = d[np.where(d['Type']=='TMAX')]
That leaves me with an array looking like this...

code:
(1908, 2, TMAX, -56.0, -72.0, 11.0, -39.0, 22.0, 33.0, 22.0, 56.0, 28.0, 39.0, 22.0, 22.0, 39.0, 39.0, 6.0, 61.0, 39.0, -50.0, 0.0, -6.0, 78.0, 83.0, 167.0, 106.0, 44.0, 67.0, 61.0, -17.0, -61.0, --, --)
How the F can I splice out that third column with the TMAX and make it a typical numpy array? I thought about trying to save and re-ingest the array but numpy.save will not work on it. Any thoughts?

accipter
Sep 12, 2003

SirPablo posted:

I'm racking my brain here trying to figure out how to quickly load data into numpy arrays. I have a single .gz file with 86,000 individual files in it, each one comprising daily weather data for a single station through its existence. The data in a file looks like this:

code:
USC00242347190802TMAX  -56  6  -72  6   11  6  -39  6   22  6   33  6   22  6   56  6   28  6   39  6   22  6   22  6   39  6   39  6    6  6   61  6   39  6  -50  6    0  6   -6  6   78  6   83  6  167  6  106  6   44  6   67  6   61  6  -17  6  -61  6-9999   -9999  
USC00242347190802TMIN -289  6 -250  6 -139  6 -150  6 -250  6 -183  6 -117  6 -139  6 -156  6  -72  6 -128  6 -156  6 -122  6  -39  6 -167  6  -89  6-9999    -200  6 -139  6 -161  6  -89  6  -33  6    6  6  -17  6  -28  6  -28  6  -28  6 -139  6 -156  6-9999   -9999   
USC00242347190802TOBS  -72  6  -78  6  -67  6 -150  6    6  6  -56  6  -89  6    0  6   22  6  -17  6  -33  6 -122  6   28  6    6  6  -89  6   28  6 -111  6 -111  6  -28  6  -61  6   22  6   28  6  106  6   44  6  -28  6   39  6  -17  6 -111  6 -156  6-9999   -9999   
I want to extract just the lines that include TMAX. I've focused on using numpy's genfromtxt, but where I'm at right now leaves me with a structured array and I would like a standard array of int values.


I would probably skip numpy.genfromtxt unless profiling the code showed that it was required.

Python code:
>>> line = 'USC00242347190802TMAX  -56  6  -72  6   11  6  -39  6   22  6   33  6   22  6   56  6   28  6   39  6   22  6  22  6   39  6   39  6    6  6   61  6   39  6  -50  6    0  6   -6  6   78  6   83  6  167  6  106  6   44  6   67  6   61  6  -17  6  -61  6-9999   -9999'
>>> parts = line.split()
>>> year = parts[0][-10:-6]
>>> month = parts[0][-6:-4]
>>> observations = parts[1::2]
>>> combined = [year, month] + observations
>>> d = np.array([int(c) for c in combined])
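One caveat with split(): in the sample data, a flag and the following value can run together (`6-9999`), so whitespace splitting will not always line up. A minimal sketch of slicing by position instead, assuming the layout implied by the sample (11-character station ID, 4-digit year, 2-digit month, 4-character type, then 31 groups of a 5-character value plus three 1-character flags). The function name and the choice to report short lines as -9999 are my own assumptions.

```python
def parse_fixed_width(line):
    """Slice one record by column position; returns (year, month, element, values)."""
    year = int(line[11:15])
    month = int(line[15:17])
    element = line[17:21]
    values = []
    for day in range(31):
        start = 21 + 8 * day              # each day takes 5 + 1 + 1 + 1 characters
        field = line[start:start + 5]
        # A blank or truncated field is treated as the file's missing marker.
        values.append(int(field) if field.strip() else -9999)
    return year, month, element, values


# Sample record built piece by piece so the column widths are explicit.
header = 'USC00242347' + '1908' + '02' + 'TMAX'
line = header + '  -56  6' + '  -72  6' + '-9999   ' + ' -200  6'
year, month, element, values = parse_fixed_width(line)
```

The third and fourth values here sit directly against their neighbours, which is the case that trips up split().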

SirPablo
May 1, 2004

Pillbug
Yeah, I hate going line-by-line but I think I have to. I also think I'll need the flexibility of genfromtxt since there are no defined character delimiters (it is very likely numbers will run into each other, thus the need for defined columns). My brain finally farted out this option, which seems to work.

code:
# Extract from tarball
f = T.extractfile(n).readlines()
# Find where TMAX is
rows = []
for x in range(len(f)):
    if re.search('TMAX',f[x]): rows.append(x)
# Set some vars for ingesting into array
flags = [11, 4, 2, 4, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1,
	5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1,
	...
	5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1]
cols = [1, 2, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 68, 72, 76, 80, 84, 88, 92, 96, 100, 104, 108, 112, 116, 120, 124]
# Magic time...now load the entire file into a structured array
d = np.genfromtxt(f, delimiter=flags, usecols=cols, missing_values=-9999, usemask=True)
# Strip out only the TMAX lines
d = d[rows]
That gives me a standard array with a nice shape.

accipter
Sep 12, 2003

SirPablo posted:

Yeah, I hate going line-by-line but I think I have to. I also think I'll need the flexibility of genfromtxt since there are no defined character delimiters (it is very likely numbers will run into each other, thus the need for defined columns). My brain finally farted out this option, which seems to work.

code:
# Extract from tarball
f = T.extractfile(n).readlines()
# Find where TMAX is
rows = []
for x in range(len(f)):
    if re.search('TMAX',f[x]): rows.append(x)
# Set some vars for ingesting into array
flags = [11, 4, 2, 4, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1,
	5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1,
	...
	5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1]
cols = [1, 2, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 68, 72, 76, 80, 84, 88, 92, 96, 100, 104, 108, 112, 116, 120, 124]
# Magic time...now load the entire file into a structured array
d = np.genfromtxt(f, delimiter=flags, usecols=cols, missing_values=-9999, usemask=True)
# Strip out only the TMAX lines
d = d[rows]
That gives me a standard array with a nice shape.

There is this simplification you could make. I guess I don't understand the need to process lines without 'TMAX' in them. Also, is '6' a separator?

Python code:
rows = [i for i, l in enumerate(lines) if 'TMAX' in l]
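Taking that a step further, a rough sketch of filtering the lines first and handing only the TMAX ones to genfromtxt, so nothing else gets parsed. The three sample records are cut down to two days each and built by hand; `widths`, `cols` and `tmax_lines` are illustrative names, with the widths and column indices following the pattern in the quoted post.

```python
import numpy as np

header = 'USC00242347'
lines = [
    header + '1908' + '02' + 'TMAX' + '  -56  6' + '  -72  6',
    header + '1908' + '02' + 'TMIN' + ' -289  6' + ' -250  6',
    header + '1908' + '03' + 'TMAX' + '   11  6' + '  -39  6',
]

# Keep only records whose 4-character type field is TMAX.
tmax_lines = [l for l in lines if l[17:21] == 'TMAX']

# Fixed field widths: ID, year, month, type, then (value, 3 flags) per day.
widths = [11, 4, 2, 4] + [5, 1, 1, 1] * 2
# Columns to keep: year, month, and the value field of each day.
cols = [1, 2, 4, 8]

d = np.genfromtxt(tmax_lines, delimiter=widths, usecols=cols, dtype=float)
```

Checking the type field by position rather than with `'TMAX' in l` avoids a false match if those four letters ever turn up elsewhere in a line.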

Red Mike
Jul 11, 2011

NtotheTC posted:

Unit testing question. I decided to write an IRC bot using this lightweight bot framework as a starting point.

I wanted to write some unit tests for it before I started messing around with it, but I'm having trouble wrapping my head around the idea of mocking. For example, I'd like to make a test to check whether it can connect to an IRC server, so I obviously need to mock an irc server, or at least the part of it that would respond when the bot tries to connect. I'm not quite getting how you can mock something like that to behave correctly. Does anyone have any pointers?

Basically, you want any place in your code where the test needs to interact with the irc server to instead interact with your own "mock" server. This can be easy or hard, depending on how you go about it. It took me a few tries to end up with something decent on my own project. Here's what I came up with for my own lightweight bot framework.

Since you're testing a bot written with the framework, rather than the framework itself, I don't think mocking the IRC server will actually be necessary. From what I skimmed through in your choice of framework, you'll have to do two things:

Any lines of data the bot sends towards the IRC server should pass through a mock say() (or a mock raw_send, if the framework has a function like that for sending raw data instead of speaking to a channel). This mock say() should record, per test case, what was sent, so the test can check that it was the correct data, in the correct order, etc.

For any lines of data that would come from the IRC server, call the methods on the bot directly, by which I mean the callbacks you add. You could also call __processLine with the raw data, but that means testing the framework instead of the bot, which you shouldn't really be doing.
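A minimal sketch of those two steps using the standard library's unittest.mock. `GreeterBot`, `on_join` and the greeting text are hypothetical stand-ins, not the API of the framework in question; only the idea of a say() method comes from the discussion above.

```python
from unittest import mock


class GreeterBot:
    """Hypothetical bot; stands in for whatever you build on the framework."""

    def say(self, channel, message):
        # In a real bot this would write to the IRC socket.
        raise RuntimeError("no network access in tests")

    def on_join(self, nick, channel):
        # Callback the framework would invoke when someone joins.
        self.say(channel, "Hello, %s!" % nick)


def test_greets_on_join():
    bot = GreeterBot()
    # Step 1: replace say() with a mock that records every call.
    with mock.patch.object(bot, "say") as fake_say:
        # Step 2: call the callback directly, as if the server had sent a JOIN.
        bot.on_join("alice", "#test")
    fake_say.assert_called_once_with("#test", "Hello, alice!")


test_greets_on_join()
```

No server is involved at any point; the test only checks what the bot tried to send in response to an event.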

SirPablo
May 1, 2004

Pillbug
My analysis is only looking at maximum temperatures (TMAX) so anything else is data I don't need in the analysis. The number 6 is not a delimiter, it is a data quality flag value (just happens to be a lot of 6s in that example). I'll give the enumerate a spin, thanks!

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Two servers, remote from each other. ServerA wants ServerB to run some fairly simple code (less than 30 lines) that ServerB doesn't have. How do I get the code from ServerA to ServerB?

I already have a REST API.

I can't guarantee that they will be running the exact same version of Python. In fact, right now I know one is on 2.7.x and the other is on 2.6.x, so I'm not sure about binary formats being compatible between the two, but we can assume the code itself is compatible and that both have the same 3rd-party libraries available.

Suspicious Dish
Sep 24, 2011

2020 is the year of linux on the desktop, bro
Fun Shoe
Unless you have a very, very good reason, you should never execute code from anybody as a service.

deimos
Nov 30, 2006

Forget it man this bat is whack, it's got poobrain!

Thermopyle posted:

Two servers, remote from each other. ServerA wants ServerB to run some fairly simple code (less than 30 lines) that ServerB doesn't have. How do I get the code from ServerA to ServerB?

I already have a REST API.

I can't guarantee that they will be running the exact same version of Python. In fact, right now I know one is on 2.7.x and the other is on 2.6.x, so I'm not sure about binary formats being compatible between the two, but we can assume the code itself is compatible and that both have the same 3rd-party libraries available.

Sounds like a job for ZeroRPC

Suspicious Dish posted:

Unless you have a very, very good reason, you should never execute code from anybody as a service.

I assumed he wanted something like RPC, but I might be wrong.
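To make the RPC idea concrete without tying it to any particular library: ServerB keeps a fixed table of functions it is willing to run, and ServerA only sends a function name plus arguments, for example as a JSON body over the existing REST API. This is a stdlib-only sketch, not ZeroRPC's API; `expose`, `handle_request`, `ALLOWED` and the `add` function are all made-up names.

```python
import json

# Functions ServerB agrees to run; nothing outside this table is callable.
ALLOWED = {}


def expose(func):
    """Register a function under its own name."""
    ALLOWED[func.__name__] = func
    return func


@expose
def add(a, b):
    return a + b


def handle_request(body):
    """Dispatch a JSON request of the form {"method": ..., "params": [...]}."""
    request = json.loads(body)
    func = ALLOWED.get(request.get("method"))
    if func is None:
        return json.dumps({"error": "unknown method"})
    result = func(*request.get("params", []))
    return json.dumps({"result": result})


reply = handle_request(json.dumps({"method": "add", "params": [2, 3]}))
```

The code itself never crosses the wire, so the 2.6 versus 2.7 question only matters for whatever ServerB already has installed.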

Depending on the context, Thermopyle, you might want to look into something like SaltStack for this; otherwise you're looking at something far more complex, like Apache Mesos.

deimos fucked around with this message at 22:19 on Aug 14, 2013
