Register a SA Forums Account here!
JOINING THE SA FORUMS WILL REMOVE THIS BIG AD, THE ANNOYING UNDERLINED ADS, AND STUPID INTERSTITIAL ADS!!!

You can: log in, read the tech support FAQ, or request your lost password. This dumb message (and those ads) will appear on every screen until you register! Get rid of this crap by registering your own SA Forums Account and joining roughly 150,000 Goons, for the one-time price of $9.95! We charge money because it costs us money per month for bills, and since we don't believe in showing ads to our users, we try to make the money back through forum registrations.
 
  • Post
  • Reply
SurgicalOntologist
Jun 17, 2004

I'm banging my head against the wall on a numpy/pandas float issue. I have a DataFrame of simulation results (or observations, doesn't matter). There is a floating point column "time", that would be convenient as an index but because floating-point indexes are difficult I am using timestep (integer) as the index and keeping time as a regular column. However, a common operation is looking up the index at a particular time. Because time is a float it can't be done the easy way (this is the same reason not to use time as the index). Anyways, here is my attempt:
Python code:
def index_at_time(time_series, time_values):
        return time_series.searchsorted(time_values)
Here time_series is a pandas Series but if you're only familiar with numpy this is equivalent to np.searchsorted(time_series, time_values).

Here are my test:
Python code:
@given(floats(min_value=0.5, max_value=10))
def test_index_at_time_vector_input(mul):
    time_series = pd.Series(np.arange(0, 2, 0.1))
    time_steps = np.arange(20)
    time_values = time_steps * mul / 10
    time_values /= mul
    np.testing.assert_allclose(index_at_time(time_series, time_values), range(20))
I am using hypothesis to feed in floats in order to mess with the floating-point representation without actually changing the underlying values.

Unfortunately, it fails consistently at time step 9:
code:
mul = 0.5000000000026954

>       np.testing.assert_allclose(index_at_time(time_series, time_values), range(20))
E       AssertionError: 
E       Not equal to tolerance rtol=1e-07, atol=0
E       
E       (mismatch 10.0%)
E        x: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8, 10, 10, 11, 12, 13, 14, 15, 16,
E              17, 19, 19], dtype=int64)
E        y: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
E              17, 18, 19])
It's the same thing when I test the values one at a time, I get 0, ..., 7, 8, 10, 10, ... 19.

In short, I can't figure out a good way to reliably get the closest index with floats. I really don't want to do something like argmin(abs(time_series - time_values)) for a frequently used lookup function, but is that the only way?

SurgicalOntologist fucked around with this message at 17:47 on Nov 16, 2017

Adbot
ADBOT LOVES YOU

Nippashish
Nov 2, 2005

Let me see you dance!

SurgicalOntologist posted:

In short, I can't figure out a good way to reliably get the closest index with floats. I really don't want to do something like argmin(abs(time_series - time_values)) for a frequently used lookup function, but is that the only way?

Searchsorted doesn't look for the index of the nearest-to-target value, it looks for the index where you could insert the target value while keeping the original vector sorted. If searchsorted returns idx for target it doesn't tell you if target is closer to values[idx-1] or values[idx], you need to check that separately. Here's a solution from stackoverflow:
code:
def find_nearest(array,value):
    idx = np.searchsorted(array, value, side="left")
    return idx - (np.abs(value - array[idx-1]) < np.abs(value - array[idx]))

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Want to get all your existing code typed for mypy?

https://github.com/dropbox/pyannotate

Jose Cuervo
Aug 25, 2004
I wrote a small data processing script a while back and since then I have updated pandas. Now the script does not work. Short of creating a venv for a single script, is there any way of telling my system to use a specific version of pandas when running the script?

huhu
Feb 24, 2006

Jose Cuervo posted:

I wrote a small data processing script a while back and since then I have updated pandas. Now the script does not work. Short of creating a venv for a single script, is there any way of telling my system to use a specific version of pandas when running the script?

Why not just use a virtualenv? virtualenvwrapper makes it a lot easier to work with virtualenv. I imagine this isn't going to be the last time you're going to have version issues so I'd suggest just making a virtualenv and setup a requirements.txt.

QuarkJets
Sep 8, 2008

Yeah your environment probably only has 1 version of pandas installed, so your script is going to use that version. If you want a different version then you would need to either downgrade pandas in that environment (which is the easier option) or create a new environment with the downgraded pandas and then either activate that environment whenever you want to use this script or modify your script's shebang to use the correct environment's python binary

Nippashish
Nov 2, 2005

Let me see you dance!
Another option is to fix the script to work with the updated pandas version.

Jose Cuervo
Aug 25, 2004
My use case is that I need to rerun a bit of analysis for a paper in response to a review that requires something slightly different or with a updated set of data. I don't think it is worth my time to fix the script that I might have written 6 months ago for a one time run, so it seems like the venv is the way to go.

QuarkJets
Sep 8, 2008

Do you use conda? Conda lets you downgrade packages really easily, then you can just upgrade again when you're done

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
Conda is seriously so good.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

You can do the same on a vanilla install...

Seventh Arrow
Jan 26, 2005

I'm trying to comprehend numpy arrays - I have an assignment and the first question is to create a random vector of size 20 and sort it in descending order. This is what I came up with:

code:
 
import numpy as np
a = np.random.random((1,20))
np.sort(-a)
I get the following result, so I think it works:

array([[-0.94139218, -0.70652483, -0.67840897, -0.67044282, -0.62539388,
-0.61770677, -0.58816414, -0.46556941, -0.44944398, -0.4487512 ,
-0.43776743, -0.41519608, -0.39534896, -0.34280607, -0.23698099,
-0.0829909 , -0.05634266, -0.05450404, -0.04979055, -0.02429839]])

I'm not sure about the parameters used for 'np,' though - in this case ((1,20)). So I think '1' means that is has one dimension, so it's a vector. The '20' seems to be the total size of the array. Then I see many arrays that have a third number, but I'm not sure what it means...the tutorials that I've seen so far refuse to stoop to my level. Can anyone elucidate?

QuarkJets
Sep 8, 2008

Seventh Arrow posted:

I'm trying to comprehend numpy arrays - I have an assignment and the first question is to create a random vector of size 20 and sort it in descending order. This is what I came up with:

code:
 
import numpy as np
a = np.random.random((1,20))
np.sort(-a)
I get the following result, so I think it works:

array([[-0.94139218, -0.70652483, -0.67840897, -0.67044282, -0.62539388,
-0.61770677, -0.58816414, -0.46556941, -0.44944398, -0.4487512 ,
-0.43776743, -0.41519608, -0.39534896, -0.34280607, -0.23698099,
-0.0829909 , -0.05634266, -0.05450404, -0.04979055, -0.02429839]])

I'm not sure about the parameters used for 'np,' though - in this case ((1,20)). So I think '1' means that is has one dimension, so it's a vector. The '20' seems to be the total size of the array. Then I see many arrays that have a third number, but I'm not sure what it means...the tutorials that I've seen so far refuse to stoop to my level. Can anyone elucidate?

np is the numpy module, importing it as np just gives you a shorthand way of accessing everything in numpy. The other way is to just "import numpy" and then you'd be calling numpy.random.random(). It's very common to see numpy aliased to np in this way, it's convenient and common practice

What you're really doing is giving arguments to the random() function of the np.random module. The arguments you provided are a tuple: (1,20). The tuple you provide defines what shape the output array will have. You could have also provided (20,) and it would have still given you a 20-element array. If you wanted a 4x10 array (40 elements), you'd give it (4,10), ie:

np.random.random((4,10))

QuarkJets fucked around with this message at 02:12 on Nov 24, 2017

Seventh Arrow
Jan 26, 2005

Ok great, thanks! I did know about the "np" thing but thanks anyhoo :) So in the next question it says to create a 5x5 array with 6's on the borders and 0's on the inside. So I guess I would use ((5,5)) for the tuple, yes? I'll have to look into arranging the numbers in such a specific fashion though.

edit: wait, the ((5,5)) doesn't seem right

Also, is that a real quote from Duck Dunn?

QuarkJets
Sep 8, 2008

For more information on random, you can try googling for the numpy random module:

https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.random.html

You can call any of the functions defined there by invoking np.random.whatever_function_you_want() with the proper arguments. The docs will describe what the arguments are and what they're used for. For instance, you invoked numpy.random.random, which according to the documentation returns a random array of floats. But what if you want integers? That page shows there's a randint function:

https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.randint.html#numpy.random.randint

You'd call it with np.random.randint

QuarkJets
Sep 8, 2008

Seventh Arrow posted:

Ok great, thanks! I did know about the "np" thing but thanks anyhoo :) So in the next question it says to create a 5x5 array with 6's on the borders and 0's on the inside. So I guess I would use ((5,5)) for the tuple, yes? I'll have to look into arranging the numbers in such a specific fashion though.

edit: wait, the ((5,5)) doesn't seem right

Also, is that a real quote from Duck Dunn?

random.random((5,5)) would give you a 5x5 array of random floats. You don't really want a random array of values though; you may want to use something like np.zeros or np.ones

unpacked robinhood
Feb 18, 2013

by Fluffdaddy
I've started using pyCharm.
I like all the little advice and hints but I don't understand this one:



A quick googling suggest this isn't bad practice but I'm curious anyway.
This is the code:

Python code:
try:
    import wx
except ImportError:
    try:
        import Tkinter
        import tkMessageBox
        tkMessageBox.showinfo("Erreur", "wxPython n\'est pas installé")
    except ImportError:
        print Fore.YELLOW +"Tkinter n'est pas installe"
    print Fore.YELLOW + "wxPython n'est pas installe"
Is there something wrong with this ? It tries to import wxPython, if it fails tries displaying a message with Tkinter, then defaults to the terminal.

e: Thanks Loezi

Loezi posted:

I think it's just helpfully telling you that you won't be able to use Tkinter after the inner try/except block if the the try fails. Which is true, but also something you already knew?

unpacked robinhood fucked around with this message at 17:24 on Nov 27, 2017

Stringent
Dec 22, 2004


image text goes here
Answered my own question, nm. :)

Loezi
Dec 18, 2012

Never buy the cheap stuff

unpacked robinhood posted:

I've started using pyCharm.
I like all the little advice and hints but I don't understand this one:



A quick googling suggest this isn't bad practice but I'm curious anyway.
This is the code:

Python code:
try:
    import wx
except ImportError:
    try:
        import Tkinter
        import tkMessageBox
        tkMessageBox.showinfo("Erreur", "wxPython n\'est pas installé")
    except ImportError:
        print Fore.YELLOW +"Tkinter n'est pas installe"
    print Fore.YELLOW + "wxPython n'est pas installe"
Is there something wrong with this ? It tries to import wxPython, if it fails tries displaying a message with Tkinter, then defaults to the terminal.

I think it's just helpfully telling you that you won't be able to use Tkinter after the inner try/except block if the the try fails. Which is true, but also something you already knew?

Danyull
Jan 16, 2011

Can anyone tell me why the upper bound of this while loop increases by thousands every iteration? As far as I can tell I'm not modifying the pix array at all so I'm not sure how elements are being added to it.

code:
idx = 0
while idx < len(pix):
    l = rule.len
    if (idx + rule.len) >= len(pix):
        l = len(pix) - idx
    chunk = Chunk()
    for i in range(idx, idx + l):
        chunk.ch.append(pix[i])
    chunks.append(chunk)
    idx += 1

Danyull fucked around with this message at 06:08 on Nov 29, 2017

breaks
May 12, 2001

As you say, from that code alone there is no reason len(pix) should increase. Absent any other information I'm going to guess the actual problem has to do with taking chunks of length L and incrementing your index by 1. If idx is 50 and rule.len is 100 and len(pix) is some large number, are you really intending to have chunks of pixels 50-149, 51-150, 52-151, etc?

If that's not it either I'm overlooking something or you need to provide more information.

If that is it, using names such as "chunk_length" instead of "l" or "position" instead of "idx" could help you avoid this kind of problem in the future. Having "i", "idx", "l", and "1" all in the same 10 lines of code is a bit gnarly. The loop appending pixel one at a time to chunk.ch also smells funny, it looks like a slice might be a good idea, but I don't know what pix and chunk.ch are exactly.

breaks fucked around with this message at 07:48 on Nov 29, 2017

Danyull
Jan 16, 2011

The loop is part of a script that should be breaking images into chunks and sub-chunks of predetermined sizes, then randomly shuffling and permutating some of of them based on a chosen pattern. I'm using Pillow as a way to gather data about the image. The pix array is a list containing tuples that represent the RGB values for each pixel of the image in the form (RRR | GGG | BBB). So yes, it has the chance to be pretty large, but I am currently testing it on images that are only about 10x10 pixels in size and still having the problem.

I actually tried slicing for that appending loop but I think I was doing it wrong because chunk.ch kept coming up as an empty array after trying to add pixels to it. Also sorry about the bad naming/syntax, this is my first real try at programming since high school.

breaks
May 12, 2001

So from your description it sounds as if you do intend to increment your index by l (the length of the current chunk) instead of 1 (as in the number). If you fix that does the current problem persist?

Cingulate
Oct 23, 2012

by Fluffdaddy
Danyull I'm not perfectly sure I get what you want, but

code:
for idx, pic in enumerate(pix):
    if not idx % rule.len:
        chunks.append(Chunk())
    chunks[-1].ch.append(pic)
Something like this?

Danyull
Jan 16, 2011

breaks posted:

So from your description it sounds as if you do intend to increment your index by l (the length of the current chunk) instead of 1 (as in the number). If you fix that does the current problem persist?

You're right, I should have used the length there instead of one, however the infinite loop problem still existed. I ended up fixing it actually, but I still don't know how. For some reason when it got to the point of creating a new chunk, the new chunk.ch would already be filled with the elements of pix. Appending the elements of pix onto the end of chunk.ch was for some reason also appending the elements onto pix again, causing it to double in size each time. The issue went away after adding in "chunk.ch = []" between "chunk = Chunk()" and the appending loop.

Cingulate
Oct 23, 2012

by Fluffdaddy

Danyull posted:

You're right, I should have used the length there instead of one, however the infinite loop problem still existed. I ended up fixing it actually, but I still don't know how. For some reason when it got to the point of creating a new chunk, the new chunk.ch would already be filled with the elements of pix. Appending the elements of pix onto the end of chunk.ch was for some reason also appending the elements onto pix again, causing it to double in size each time. The issue went away after adding in "chunk.ch = []" between "chunk = Chunk()" and the appending loop.
So
code:
for idx, pic in enumerate(pix):
    if not idx % rule.len:
        chunks.append(Chunk())
        chunks[-1].ch = list()
    chunks[-1].ch.append(pic)
?

Danyull
Jan 16, 2011

I ended up doing:

code:
        while position < len(pix):
            chunk_length = rule.len
            if (position + chunk_length) >= len(pix):
                chunk_length = len(pix) - position
            chunk = Chunk()
            chunk.ch = []
            for i in range(position, position + chunk_length):
                chunk.ch.append(pix[i])
            chunks.append(chunk)
            position += chunk_length
Anyways after some more fixing it works now but only on half of the image:





e: and now fully fixed

Danyull fucked around with this message at 21:47 on Nov 29, 2017

Cingulate
Oct 23, 2012

by Fluffdaddy
"while" solutions usually feel rather un-pythonic to me. I think if you want to iterate over an iterable, you should iterate over the iterable!

Counting indices (or however that thing is called) are alien and un-pthonic.

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

At a guess something screwy was going on in your Chunk class, so when you constructed a new one you were actually reusing an array (maybe handing the same one out multiple times so they all saw every append), or doing something with your pix list or the original Image object, something like that

Honestly in these situations the best thing is to run it through a debugger, set it to pause in a useful place so you can poke around at the current state of things, and you'll probably get an idea of where things are going wrong. Sometimes it's an obvious bug, other times it gives you a direction to investigate

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Cingulate posted:

"while" solutions usually feel rather un-pythonic to me. I think if you want to iterate over an iterable, you should iterate over the iterable!

Counting indices (or however that thing is called) are alien and un-pthonic.

Well guy hasn't been programming for a while! So long as it works that's the main thing, you can clean it up later

For this kind of thing I'd look at itertools, there are some functions that do chunking and windowing so you don't need to worry about janitoring your own ranges
oh wait now I remember, it does not have that and you have to write your own which is bad. Thanks python!

baka kaba fucked around with this message at 22:25 on Nov 29, 2017

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
conda build variants, how do they work?

I have a Python package called package1 that I want to build for both Python 2.7 and 3.5. package1 has a build requirement for package2, which is also a Python package and should also be built for Python 2.7 and 3.5, and package2 is not a sub-package of package1.

Inside each of my conda recipe folder, as explained on the build variant page, I have the following file:

code:
# conda_build_config.yaml

python:
  - 2.7
  - 3.5
When I do conda build package1 it first tries to build the py27 variant. It sees that I need the py27 variant of package2, so it builds that, and then it continues and builds the py27 variant of package1 (the resulting file is something like package1-py27.tar.bz2). Next, it tries to build the py35 variant of package1. It sees that I have built package2, but since that one is the py27 variant it fails the build since package1 itself requires py35 now.

If I instead do conda build package2 first then it will correctly build package2_py27 and package2_py35, and then if I build package1 it'll correctly build package1_py27 and package1_py35.

Is there a way to make the first usage case work properly so that I don't have to worry about build order?

Danyull
Jan 16, 2011

It's probably not pythonic because I only knew C# and Java up until this point.

baka kaba posted:

At a guess something screwy was going on in your Chunk class, so when you constructed a new one you were actually reusing an array (maybe handing the same one out multiple times so they all saw every append)

This looks to have been the problem. I screwed up the def __init__ for that class and it wasn't creating a new array for some reason. Fixed that and then I could remove the chunk.ch = [] line I had to throw in.




If anyone wants to mess around with it you can get it here. The only package you need is Pillow. You just run shufflestream.py with an image named test.jpg in the same project/folder and it will pop out a new glitch.jpg. You can edit the constants/patterns at the beginning of the file for interesting results.

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Noice! It would be cool if you could supply a pattern that says which chunks get glitched (so you can have some clean parts and maybe make the shapes everyone loves) or make the randomisation function get more pronounced as it goes, so it gets more glitchy as you go down the image

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Is there a good way to step through Python code? I found a Python library to parse torrent names (https://github.com/divijbindlish/parse-torrent-name/blob/master/PTN/parse.py) and I can't quite figure out how it works.

a witch
Jan 12, 2017

Hughmoris posted:

Is there a good way to step through Python code? I found a Python library to parse torrent names (https://github.com/divijbindlish/parse-torrent-name/blob/master/PTN/parse.py) and I can't quite figure out how it works.

Pycharm. Add the library to your project, set a breakpoint in it and run the debugger.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

PyCharm is great, but if you don't want to use it, you can use pdb or ipdb.

Foxfire_
Nov 8, 2010

pudb's my favorite if you're on unix

Linear Zoetrope
Nov 28, 2011

A hero must cook
How fraught is the idea of runtime module generation with Python? I'm working with protobuf and have had one too many subtle bugs that amounted to "you forgot to run the script to regenerate from the proto modules, idiot." My other code in compiled languages has it as part of the build process, making this near-impossible, and I'm toying with the idea of adding it to the Python module at load-time.

I was thinking something along the lines of:
code:
# .../protos/__init__.py

spin wait on a file-system level .__protolock__

if generated files are currently present:
    if we can locate protoc and the protobuf source:
        regenerate if the generated files are out of date, accept otherwise
    else if we can only locate the protobuf source:
        warn if the generated files are out of date, accept otherwise
    else:
         accept (for redistributing the files without the protobuf source)
else:
    if we can locate protoc and the protobuf source:
        generate the files
    else:
        emit an appropriate warning about missing protoc/source

delete .__protolock__
(NB, this is slightly abstracting things, I have to make a one-line substitution to one of the files due to protoc weirdness, but is mostly accurate).

But I'm not sure how bad of an idea this is with Python. It looks like there are tools for finding the location of the current source or .pyc file, but I'm wondering if there are horrific code-lovecraftian angles I haven't considered here.

unpacked robinhood
Feb 18, 2013

by Fluffdaddy
Little things:

Is there a pythonic one-liner to make natural langage enumerations ?
For example I'd have:
Python code:
fatmenu=['donut','deep fried kale','spoken yogurt']
and want to list them like "you ordered donut, deep fried kale and spoken yogurt", with commas for more than two items.
Same with "you ordered 3 items" with the 's' added as needed ?

It's two dead simple functions with conditionnals but I feel like there's an interesting and probably obtuse python trick for this.

Adbot
ADBOT LOVES YOU

Da Mott Man
Aug 3, 2012


unpacked robinhood posted:

Little things:

Is there a pythonic one-liner to make natural langage enumerations ?
For example I'd have:
Python code:
fatmenu=['donut','deep fried kale','spoken yogurt']
and want to list them like "you ordered donut, deep fried kale and spoken yogurt", with commas for more than two items.
Same with "you ordered 3 items" with the 's' added as needed ?

It's two dead simple functions with conditionnals but I feel like there's an interesting and probably obtuse python trick for this.

Python code:
fatmenu=['donut','deep fried kale','spoken yogurt']
print("{}, and {}".format(", ".join(fatmenu[:-1]), fatmenu[-1]))
That is how I would do it.

EDIT: Shouldn't post at 3am. Misread what you asked, I would just do it with an if, elif, else block.

Da Mott Man fucked around with this message at 12:51 on Nov 30, 2017

  • 1
  • 2
  • 3
  • 4
  • 5
  • Post
  • Reply