Cingulate
Oct 23, 2012

by Fluffdaddy
I often do something like this:

code:
parameter = 999
while does_not_match_some_condition(parameter):
    parameter = stochastic_process()

do_things_with_parameter(parameter)
I.e., to find a good value for my parameter (it might be a string, a number, a sequence ...), I use a while loop to try values until I have one that matches some conditions. Then, I use that parameter for some purpose.

Setting the parameter to some arbitrary value before the while loop feels inelegant. Is there a better way to do this?


A more fleshed-out example, maybe I want to randomly draw an even integer between 140 and 75858 whose third-to-last digit is a 4 or a 7 for whatever reason:

code:
from random import randint
my_number = -1  # dummy start value that fails the condition below

while my_number % 2 or str(my_number)[-3] not in {'4', '7'}:  # loop until even with a 4 or 7 in the right spot
    my_number = randint(140, 75858)  # randint includes both endpoints

print(my_number)
(Ignore that this is probably a very inefficient solution for solving this specific task.)


I guess this is a very minor thing, but it's a pattern I use all the time and it feels weird.


Cingulate
Oct 23, 2012

by Fluffdaddy

HardDiskD posted:

I try to start with None or a boolean value if I don't have a default value that makes sense.
Yeah but often the check requires a specific type. I guess there's some default for all these types (empty string, empty list, zero ...).
But sometimes these 'natural defaults' themselves match the condition. E.g., what if False satisfies my criterion of (not var % 2 and var < 1000)?

Cingulate
Oct 23, 2012

by Fluffdaddy
I like the generator one! Thanks guys.
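For reference, the shape of it (sketching from memory, not the exact suggestion; same toy task as above):

Python code:
from random import randint

def draws():
    while True:  # endless stream of candidate numbers
        yield randint(140, 75858)

my_number = next(n for n in draws() if not n % 2 and str(n)[-3] in {'4', '7'})
No dummy start value needed: next() hands back the first draw that satisfies the condition.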

Cingulate
Oct 23, 2012

by Fluffdaddy

SurgicalOntologist posted:

I don't fully understand the algorithm but the first step is to try to do this with vector operations. This might not work as written but may only need small tweaks. At least it should illustrate the idea of vectorizing the operation.

Python code:
x, y = np.meshgrid(range(0, nummonths), range(0, nummonths))
cohorts = data['nps'] * data['duration']**(y - x)
What's the point of the '0's here? Is spelling them out considered good style somehow?

Cingulate
Oct 23, 2012

by Fluffdaddy

QuarkJets posted:

[i for i in range(1, 100, 1)]
A single-letter, completely uninformative variable name? You monster. Call it list_creator_running_dummy or something like that.

Cingulate
Oct 23, 2012

by Fluffdaddy
For real though, how did you oldsters ever live with Python 2 and its leaking list comps?

Cingulate
Oct 23, 2012

by Fluffdaddy

HardDiskD posted:

You mean the leaking variable that is used in building the comps
Yeah, [_ for this_var_will_leak in some_iter]
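For anyone who never suffered it, a quick demo (Python 2 session; Python 3 raises a NameError on the last line instead):

code:
>>> [x for x in range(3)]
[0, 1, 2]
>>> x  # leaked out of the comprehension
2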

In other news, I may be literally the worst coder ITT, but I got really good teacher evaluations for my recent Python class, and we'll try to apply for funding for a bigger project: teaching Python to students, and using Python to teach science to students :v:

Cingulate
Oct 23, 2012

by Fluffdaddy
code:
python -m this | grep honking
:colbert:

Cingulate
Oct 23, 2012

by Fluffdaddy
Different question: does Continuum make money? What are the chances "conda install all_the_software_i_need" won't work in 2018 because Travis Oliphant has to choose between making it slightly easier for me to set up numpy or feeding his kids?

Cingulate
Oct 23, 2012

by Fluffdaddy
I also have a Pandas question. I have a multi-indexed Dataframe. I want to subset the DF based on the value of column A, and then look up stuff in the original DF based on values of one of the index columns and one of the regular columns in the subset, but in pairs.

E.g., the subset is
code:
i_1 i_2 A  B
0   a   x  a
1   b   x  c
2   c   x  a
And now I want to look up these rows of the original:
code:
i_1 i_2
0   a
1   c
2   a
I have code for this, but it is specific to my situation and I am sure very suboptimal:

code:
result = [dict(g.query("A == @cond_b & B == '{}'".format(int(ii)))[["A", "C"]].values)
          for _, g in df_main.groupby(level=0)
          for ii in g.query("A == 'cond_a'")["B"]]
Sorry for this overly convoluted and irreproducible example, but maybe somebody can already see where I'm going with this.
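Roughly, the lookup I'd like to express directly, against the toy frames above (a sketch; sub is my hypothetical name for the subset):

Python code:
pairs = list(zip(sub.index.get_level_values("i_1"), sub["B"]))
looked_up = df_main.loc[pairs]  # rows of the original, keyed by (i_1, B) pairs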


huhu posted:

2018? I don't think anything is guaranteed 3 months from now.
Is there anything in particular that makes you say that?

SurgicalOntologist posted:

The nice thing about a heavily used open source project is that it cannot really go away. If Continuum disappears the code will still exist and there will be enough volunteers to maintain it.
Yeah, but somebody has to host all of that stuff. And possibly pay for the traffic, pay for up-to-date packaging, pay for boring bugfix maintenance work.

Now, a lot of the time this will work out fine, somehow. But I'm sure if Continuum went under, things would change, and probably not for the better.

Cingulate
Oct 23, 2012

by Fluffdaddy
Why is there no natural sort in standardlib!?!!!

Cingulate
Oct 23, 2012

by Fluffdaddy

dougdrums posted:

Python code:
noisymodule.print = lambda x : None
Haha take that noisy module!
Crank it up a notch.

Python code:
import glob, os, random

noisymodule.print = lambda x: delattr(noisymodule, random.choice(list(vars(noisymodule))))  # del is a statement, so delattr it is

noisymodule.warn = lambda x: os.remove(random.choice(glob.glob(noisymodule.__file__.replace("__init__.py", "*"))))  # goodbye, one random module file

Cingulate
Oct 23, 2012

by Fluffdaddy
Somebody fix the code and put it on pypi pls

Cingulate
Oct 23, 2012

by Fluffdaddy

My Rhythmic Crotch posted:

I am looking for a python module to help me do some design/sim type of work. I want to build an object as a 3d array of points, and be able to interact with the object (zoom, rotate, etc). A 3D surface plot in matplotlib is almost exactly what I want, except when viewing the plot it's really clunky and hard to interact with. So basically something just like that, except less frustrating. There doesn't seem to be much out there for this. Any suggestions welcomed.
We use Mayavi to visualize brains in 3D and rotate them. However, it's a terrible piece of software; if at all possible, you should use something else.

Cingulate
Oct 23, 2012

by Fluffdaddy

pangstrom posted:

Probably sounds like faux outrage but I'm really just curious: is starting bullet lists for human consumption with element zero something programmers routinely do?
I thought that's a thing people do to imply point zero is what you should have done already in the first place or something like that. Like, utter basics.

Cingulate
Oct 23, 2012

by Fluffdaddy
Now tell them about ... the stars.

Cingulate
Oct 23, 2012

by Fluffdaddy

Baby Babbeh posted:

What's the best way to go through a pandas dataframe and convert all of the values to a different value if they meet a set criteria? I've got a data frame representing grayscale images, where each cell is a value between 0 and 255, and I'd like to just convert this to black and white by changing anything that's not zero into a 1 prior to doing some analysis on it. Is this something I should use .apply for?

Forgive me if this is a stupid question, but I'm really new to working with pandas and it's slightly different from how a lot of python base types work.
Just like in numpy?

df[df != 0] = 1

Cingulate
Oct 23, 2012

by Fluffdaddy

Baby Babbeh posted:

That... makes sense. I was overcomplicating this. It returns another df rather than changing it in place, right?
It assigns 1 to every place in the df whose former value is not 0. Nothing is "returned"; there is no function call.

indices = df != 0
df[indices] = 1
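A tiny demo (hypothetical 3x3 "image"):

Python code:
import pandas as pd

df = pd.DataFrame([[0, 12, 0],
                   [255, 0, 7],
                   [0, 0, 91]])

indices = df != 0  # boolean DataFrame, same shape
df[indices] = 1    # modifies df in place; nothing is returned

print(df)  # only 0s and 1s remain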

Cingulate
Oct 23, 2012

by Fluffdaddy
Interesting, I literally never use .where. Have to look into that.

Also :same:ing the py3redirect

Cingulate
Oct 23, 2012

by Fluffdaddy

Malcolm XML posted:

:psyduck:

Did u think that 64 bits was enough to represent all of the reals?
Python is a language that's being used by people with very little CS knowledge (e.g., me). That's, from what I can tell, by design: Python is easy to learn, easy to read, welcoming and forgiving. I think the thread has a good track record of being exactly like that, too.

Cingulate
Oct 23, 2012

by Fluffdaddy

Malcolm XML posted:

any programmer that's using floats for numerical work needs to know when and how they work so when they gently caress up they aren't surprised. math.fsum is in the standard library for a reason

pmchem posted:

anyone involved with scientific software should be aware of floats

I would think the entire neural network community largely ignores floating point issues, what with them running near-exclusively on single precision.

Malcolm XML posted:

Actually this is the worst possible takeaway because Numerical analysis is a) hard and b) for even most practical purposes requires an understanding of when it'll fail on you. Catastrophic cancellation and loss of precision can lead to cases where your component will fail and fail hard. Unless you don't want to engineer robustly, sure you can ignore it. I have run into many cases where poor or no understanding of float approximations has led to pernicious bugs in production systems costing lots of money

While I don't expect people to understand the IEEE-754 standard in its entirety, it is immensely unhelpful to present it as a real number/exact fraction abstraction since it's leakier than a sieve (but designed in such a way to be useful for those in the know) and frequently beginners will smash their heads against the wall for days until they are told how the semantics of floats work when they run into "random" "non-deterministic" errors that are 100% deterministic.


For example, addition is commutative but not associative. This isn't trivial nor particularly expected.

>>> 1e16 + 1 == 1e16
True

A way of salami slicing

Perfectly fine if your algorithms are insensitive to things like that (i use floats in quant finance optimization models) but try simulating a chaotic system and watch as floats are completely useless

It gets better when you have no idea how chaotic your system is!
My point was you can correct someone without being an rear end to them.

Cingulate
Oct 23, 2012

by Fluffdaddy

Malcolm XML posted:

it's not as fun
I generally don't do this, but: if your fun is being needlessly mean towards other people, here's some life advice for you: try doing that less. Try changing. You'll become a happier person, and contribute more to the lives of those around you.
I don't mean this to disparage you, or as a counterargument to your position (which it is not), but as advice from someone who should have been given that lesson much earlier themselves, too.

shrike82 posted:

yup, google's TPUs use quantization to improve inference performance so you can't say that ML practitioners can ignore FP issues
They can, and do, largely ignore them. They can also try to reduce precision even more to get some performance benefits in certain applied cases.

(I'm obviously not saying these people are ignorant of the issues, just that the issues are largely irrelevant, and being ignored.)

Cingulate
Oct 23, 2012

by Fluffdaddy

shrike82 posted:

An ML practitioner or academic could decide to ignore precision based on their needs, but I'd find it hard to argue that one can make an informed decision without understanding the basics of floating point arithmetic.
I guess in practice, it's irrelevant cause the guys who wrote e.g. Tensorflow definitely understand the fine-grained details here, and if the issue were relevant, they most certainly would be able to account for that. You probably don't get into a situation where you do heavy AI stuff without automatically understanding floating point issues.

(For applied guys like me, it might be different, and it's possible people are getting hosed over here from time to time.)

Cingulate
Oct 23, 2012

by Fluffdaddy
I think I need async/await ..? But I have never used either, nor understood the descriptions.

I am training a neural network on synthetic data, and I'm alternating data generation and training. So it looks like this:

code:
for epoch in range(number_of_epochs):
    X, y = generate_data(10000)  # generate training data
    model.train(X, y)  # train the model for one epoch
Now the data generation actually takes a significant fraction of the time it takes to train the model on the generated data (about 30%). But both run only on 1 CPU core (the model runs on the GPU). So it would save me some time if I could generate the next batch of data while the model is training. I.e., if I could do something like:

code:
X, y = generate_data(10000)  # generate the first batch of training data
for epoch in range(number_of_epochs):
    model_finished = without_GIL(model.train(X, y))  # start training and release the GIL
    if epoch < number_of_epochs - 1:  # no new data needed after the last epoch
        X, y = generate_data(10000)  # use the CPU while the model runs on the GPU
    wait_for(model_finished)  # proceed only once both training and data generation have finished
Is that something I could use await for?

Edit: both functions fully occupy one core. I think that means await is not enough? Maybe I should just abuse joblib heavily; that I could do.
Double Edit: joblib uses pickling and the model is compiled and sent to the GPU, so I think joblib wouldn't work.

Cingulate fucked around with this message at 14:35 on Jun 15, 2017

Cingulate
Oct 23, 2012

by Fluffdaddy
Thanks everyone.

I think step 1 should be to make my generate_data less horribly inefficient; it's a big nested loop. Bummer - I thought this'd force me to learn something new :v:

Cingulate
Oct 23, 2012

by Fluffdaddy
I know joblib and multiprocessing's Pool - I use them a lot actually, because most of my problems are trivially parallelizable. This one, I fear, is not: the training happens in Theano (I think I hosed something up with my Tensorflow installation?), so it compiles CUDA code for the GPU and then it sits there. So the training always needs to happen in the main thread/kernel/session. But yes, if I can send the data generation to a secondary process, that would save me some time. So this can be done with the multiprocessing module, did I get that right?

(Having multiple X sit in memory probably isn't viable cause they're pretty big - tens of GB - but I guess I can make them smaller, find a few GB somewhere, and it should work.)
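If I got that right, the shape would be something like this (a sketch, untested; assumes generate_data is a module-level function returning picklable numpy arrays, so only the data - never the compiled model - crosses the process boundary):

Python code:
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor(max_workers=1) as pool:
    future = pool.submit(generate_data, 10000)  # first batch, in the worker
    for epoch in range(number_of_epochs):
        X, y = future.result()  # block until the current batch is ready
        if epoch < number_of_epochs - 1:
            future = pool.submit(generate_data, 10000)  # next batch in parallel
        model.train(X, y)  # training stays in the main process (CUDA context)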

Cingulate
Oct 23, 2012

by Fluffdaddy

shrike82 posted:

Out of curiosity, what is the NN problem you're working with?
Training only for a single epoch for a given training set (some kind of augmenting going on there?) and data generation taking 30% of training time smells of there being room for cleaning up the code around there before spending too much time on shifting to a multiprocess setup.
Turns out there is a perfectly fine Keras module for doing just what I did manually very badly. :downs:

Cingulate
Oct 23, 2012

by Fluffdaddy
I'm sure if you can organize it in matrix form/numpy/pandas, it's gonna run in a fraction of the time. Essentially, you want to vectorize it: instead of having objects per dataset, you just have one matrix, and apply each of these operations simultaneously to all of its rows.

And to perhaps slightly increase the readability of baka's code:
code:
def wait_time(self, time_seconds):
    # skipping the temp variables
    x1k10 = self.x1 * self.k10  # elimination from compartment 1
    x1k12 = self.x1 * self.k12  # flow 1 -> 2
    x1k13 = self.x1 * self.k13  # flow 1 -> 3
    x2k21 = self.x2 * self.k21  # flow 2 -> 1
    x3k31 = self.x3 * self.k31  # flow 3 -> 1

    # mass balance: inflows positive, outflows negative
    self.x1 += (x2k21 + x3k31 - x1k12 - x1k13 - x1k10) * time_seconds
    self.x2 += (x1k12 - x2k21) * time_seconds
    self.x3 += (x1k13 - x3k31) * time_seconds
    # factoring out self.keo so you multiply once
    self.xeo += self.keo * (self.x1 - self.xeo) * time_seconds
(Please double check if I botched any of the signs :v:)
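And taking my own vectorization advice from above, a rough numpy sketch (hypothetical shapes: one 1-D array per state/rate, one entry per dataset, so all datasets update in a single pass):

Python code:
import numpy as np

def wait_time_all(x1, x2, x3, xeo, k10, k12, k13, k21, k31, keo, dt):
    # each argument is an (N,) array; dxeo uses the pre-update x1,
    # which for a small dt Euler step is a negligible difference
    new_x1 = x1 + (x2 * k21 + x3 * k31 - x1 * (k10 + k12 + k13)) * dt
    new_x2 = x2 + (x1 * k12 - x2 * k21) * dt
    new_x3 = x3 + (x1 * k13 - x3 * k31) * dt
    new_xeo = xeo + keo * (x1 - xeo) * dt
    return new_x1, new_x2, new_x3, new_xeo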

Cingulate
Oct 23, 2012

by Fluffdaddy

Philip Rivers posted:

Okay, so I have a set of points and a set of lines defined by two endpoints. Basically I'm working on an intersection algorithm and I need to figure out when a given point intersects one of the lines, but what makes it tricky for me is that the lines exert force on the endpoints and pull them together so the position of the endpoints are constantly updating. Here's what it looks like in action:

[embedded animation]

I don't know what the best way to access all those endpoints quickly would be, I'm very naively self taught and don't know much at all about data/efficiency.
Did you check if one of the many graph/network analysis toolboxes works for your problem?

Cingulate
Oct 23, 2012

by Fluffdaddy

Philip Rivers posted:

Do you have any resources on this? I'm very out of my depth with a lot of this and I'm just fumbling in the dark.
It's not perfect, but it's a start ...

https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf

Generally, Numpy is your pythonic way of calling highly optimized packages for numerical computation. As Eela6 said, try to vectorize: not much looping, but operating on vectors and arrays as a whole.
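To make "operating on arrays as a whole" concrete (a toy sketch, not your actual data structures): if all endpoints live in one (N, 2) array, per-endpoint loops collapse into single operations.

Python code:
import numpy as np

endpoints = np.random.rand(10000, 2)  # hypothetical: N endpoints, one row each
p = np.array([0.5, 0.5])              # some query point

dists = np.linalg.norm(endpoints - p, axis=1)  # distance to every endpoint at once
nearest = endpoints[dists.argmin()]            # closest endpoint, no Python loop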

Cingulate
Oct 23, 2012

by Fluffdaddy

SirPablo posted:

Damnit that does work. Thanks for pointing out my idiocy. Suppose I can take the output values and make a new Dataframe with them.

What about restricting data to a specific window of days in the year? I know I can slice easily by year-mo (for example, df['1990-01':'1999-12']), but can you slice by mo-dd? It'd be nice to get all the data between, say, June 15 and Sep 30, like df['06-15':'09-30'] but for all years. Is that possible?
https://stackoverflow.com/a/16179190 ?
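The gist of that answer, roughly (a sketch, assuming df has a DatetimeIndex):

Python code:
m, d = df.index.month, df.index.day  # per-row month and day arrays
in_window = ((m == 6) & (d >= 15)) | (m == 7) | (m == 8) | (m == 9)  # Sep 30 = all of September
summer = df[in_window]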

Cingulate
Oct 23, 2012

by Fluffdaddy
If it's tabular data, just use pandas read_csv (and make plots in seaboard).

Cingulate
Oct 23, 2012

by Fluffdaddy
Situations where I can see the hand-crafted option make sense:
- you're analysing data and tweeting on an embedded system running python 2.6 on 128MB RAM
- you want to Learn Python the Hard and Manly Way

Situations where I'd prefer the Pandas option:
- you want to get things done and learn the tools you're actually gonna use in real situations

Anything I forgot? :v:

Jose, in case "seaboard" confused you, I apologise: install anaconda and go here.

Cingulate
Oct 23, 2012

by Fluffdaddy

Eela6 posted:

I don't feel that way. I don't particularly like pandas, and I prefer using the stdlib where possible. You have to consider the audience of your code, too. Jose is very new to python; the last thing he needs is a thousand different APIs to understand.
But if you work with data and want to plot it, Pandas is the API you'll be learning in the end anyways. For data stuff, Pandas is more standardlib than the standardlib!

Cingulate
Oct 23, 2012

by Fluffdaddy

Eela6 posted:

I don't feel that way. I don't particularly like pandas, and I prefer using the stdlib where possible. You have to consider the audience of your code, too. Jose is very new to python; the last thing he needs is a thousand different APIs to understand.


I am no fan of masochism. I just don't like solutions that are 'well, first install this dependency' for simple problems. I think you should use the level of tool that's appropriate for your problem.

If I'm using CSVs, I'm probably going to be doing it by hand. If I have a big dataset that requires the big guns,

1. I'm going to use xarray, not pandas
2. Why the hell am I using CSVs?
The recommendation for using Python for data sciency stuff is to install Anaconda, which carries pandas.

(You're probably using CSVs/pandas because the party/platform you got the data from hands out CSVs/stuff pandas can easily parse.)

I don't think this is about "big guns". It's just so much more handy to use pandas and the standard scientific Python stack even for tiny data.

It seems like 1/3rd of questions ITT are solved by SurgicalOntologist posting a pandas one-liner.

Cingulate
Oct 23, 2012

by Fluffdaddy

Jose posted:

Ok i'll check out pandas. Thanks everyone. I was using QtConsole launched through anaconda yesterday but would spyder be more suitable?
For data handling/analysis/viz purposes, consider the Jupyter notebook instead.

Cingulate
Oct 23, 2012

by Fluffdaddy

Jose posted:

Can anyone link a good guide for combining pandas and matplotlib? Basically how matplotlib usage differs if I'm using pandas data frames
Matplotlib isn't pandas-aware. But consider Seaborn. It's essentially a bunch of shortcuts for doing nice matplotlib plots from pandas DataFrames. Just go to the seaborn website.
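A taste of the difference (toy data, made-up column names): seaborn takes the DataFrame and column names directly, instead of you pulling arrays out for matplotlib.

Python code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({"dose": range(10), "response": [i ** 2 for i in range(10)]})

sns.regplot(x="dose", y="response", data=df)  # column names, not raw arrays
plt.show()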

Cingulate
Oct 23, 2012

by Fluffdaddy

Slimchandi posted:

That's my general approach; I'm sure I will find some dead ends along the way but that's how I will learn the most about it.

For example of what I mean by excel versus python issue, there are calculations of customer losses in October that depend on the total customer numbers in September. Only then can you get total customers in October, which can be used to find losses in Nov etc. This kind of iterative approach makes sense to me in Excel, but in pandas I am used to calculating each series at a time, rather than two interdependently.
Essentially your problem will be to concisely write up your problems and post them at the correct time ITT, so SurgicalOntologist can provide the required Pandas one-liner to solve all your problems :v:

Cingulate
Oct 23, 2012

by Fluffdaddy
A while ago, I complained about the dict(this_will_become_a_str=var) syntax. Somebody told me I was silly because dict's keyword form is consistent with regular kwargs. I have in the meantime understood your point, and it has essentially made me more comfortable using **kwargs well.
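The parallel that won me over, for anyone else who found the syntax odd (toy example):

code:
d = dict(this_will_become_a_str=5)  # the keyword becomes a str key
assert d == {"this_will_become_a_str": 5}

def f(**kwargs):
    return kwargs  # a function's **kwargs collects keywords the same way

assert f(this_will_become_a_str=5) == d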


Cingulate
Oct 23, 2012

by Fluffdaddy

JVNO posted:

I use PsychoPy to run Psychology experiments. Right now, I'm doing an experiment where participants view a word in multiple different fonts. Here's my problem: The fonts need to be markedly different between presentations, but the letters themselves should always appear in the same spot. So while common monospaced fonts are far too similar to one another, variable spaced fonts tend to result in a much smaller word width with smaller spacing between words.

In my case, I use 5 letter words with Lucida Console for one presentation, and Brush Script for the other. Is there any way in Python or PsychoPy that I can fix/justify the text display so the width and/or spacing of these two fonts are the same? Save for manually setting the X/Y coordinates of each individual letter?

I'm displaying text in my Experiment using TextStim.
Displaying the letters individually is probably gonna be very simple.

code:
## boilerplate here (win = visual.Window(...), this_font = some font name)
from psychopy import visual

x_positions = (-3, -1, +1, +3)  # one x coordinate per letter

def display_word(word, x_positions, font):
    for character, x in zip(word, x_positions):
        visual.TextStim(win, text=character, font=font, pos=(x, 0.0)).draw()
    win.flip()

display_word("code", x_positions, this_font)
or something like that.
