Nippashish
Nov 2, 2005

Let me see you dance!

cinci zoo sniper posted:

Yeah NumPy will internally route stuff to C or Fortran code to speed things up, including hardware-specific optimisations. Pandas does the same twofold, since it has both its own C stuff and is largely built on top of NumPy (wrt data structures and such).

This doesn't help for constructs like apply though, since they need to invoke a python function (and thus interact with the interpreter) for each element of whatever you are applying to.
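Here is a minimal sketch of that difference, assuming a recent pandas. A counter (the hypothetical name calls) records how often the Python function runs under Series.apply. The vectorised multiplication next to it never calls back into the interpreter.

```python
import pandas as pd

calls = 0  # hypothetical counter, only here to make the per-element calls visible


def double(value):
    """Plain Python function: under apply, the interpreter runs it once per element."""
    global calls
    calls += 1
    return value * 2


s = pd.Series(range(1_000))

# apply: one Python-level function call per element
via_apply = s.apply(double)

# vectorised: one call into NumPy's compiled loop, no per-element Python calls
via_vectorised = s * 2
```

Both give the same values. Only the apply version pays the per-element interpreter overhead.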


cinci zoo sniper
Mar 15, 2013




Nippashish posted:

This doesn't help for constructs like apply though, since they need to invoke a python function (and thus interact with the interpreter) for each element of whatever you are applying to.

Apply() is slightly more complicated than that. The method itself will evaluate what is happening inside the function, and correspondingly execute it in a "normal Python environment" or in a "Cython environment". The former will be a standard for loop with function-call overhead; the latter will be slightly slower than a directly vectorised operation. It all depends fairly heavily on your lambda function and the axis choice. The only hard rule here is that iterrows() is bad, since on top of every other overhead you'll also incur Pandas casting each row into a Series.
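This small sketch makes the iterrows() point concrete, using a made-up two-column frame. iterrows() hands back a Series per row, and with mixed int/float columns it upcasts the row to float64. itertuples() gives plain namedtuples. The vectorised version skips the Python loop entirely.

```python
import pandas as pd

df = pd.DataFrame({"price": [1.5, 2.0, 3.25], "qty": [2, 3, 4]})

# iterrows: pandas builds a new Series for every row, upcast to a common dtype
row_types = {type(row).__name__ for _, row in df.iterrows()}
first_row_dtype = next(df.iterrows())[1].dtype  # float64: the int column was upcast

# itertuples: lightweight namedtuples, no per-row Series construction
totals_tuples = [t.price * t.qty for t in df.itertuples(index=False)]

# vectorised: whole columns at once, no Python-level loop at all
totals_vectorised = (df["price"] * df["qty"]).tolist()
```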

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Oh yeah my bad, the count function is the best way to do that infinite thing - loops indefinitely and hands you incrementing numbers, what more could you want
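This quick sketch assumes "the count function" means itertools.count from the standard library. islice is only there to take a finite slice of the infinite iterator.

```python
from itertools import count, islice

# count() never stops on its own: it yields start, start + step, ... forever
first_five = list(islice(count(), 5))
evens_from_10 = list(islice(count(10, 2), 3))

# Typical use: an unbounded loop with a running counter and an explicit exit
for attempt in count(1):
    if attempt ** 2 > 50:
        break
```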

SurgicalOntologist
Jun 17, 2004

cinci zoo sniper posted:

Apply() is slightly more complicated than that. The method itself will evaluate what is happening inside the function, and correspondingly execute it in a "normal Python environment" or in a "Cython environment". The former will be a standard for loop with function-call overhead; the latter will be slightly slower than a directly vectorised operation. It all depends fairly heavily on your lambda function and the axis choice. The only hard rule here is that iterrows() is bad, since on top of every other overhead you'll also incur Pandas casting each row into a Series.

Yeah, good point about apply. I think it evaluates the function on the first element more than once, in order to figure out the fastest way to execute it. As a result it might not be the best choice with a function with side effects, like the qr code example. It shouldn't be an issue on most filesystems, but nevertheless that might be a reason to go with iterrows in this case.

Edit: or itertuples
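Here is a sketch of the side-effect case. It assumes a hypothetical save_qr_code() that writes one file per row, simulated here by appending to a list. An explicit loop over itertuples() guarantees exactly one call per row, whatever apply might do internally.

```python
import pandas as pd

written = []  # stands in for files on disk


def save_qr_code(label, payload):
    """Hypothetical side-effecting function, e.g. writing one QR image per row."""
    written.append((label, payload))


df = pd.DataFrame({
    "label": ["alice", "bob", "carol"],
    "url": ["https://a.example", "https://b.example", "https://c.example"],
})

# A plain loop makes the number of calls explicit: exactly one per row
for row in df.itertuples(index=False):
    save_qr_code(row.label, row.url)
```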

SurgicalOntologist fucked around with this message at 05:08 on Jun 11, 2018

CarForumPoster
Jun 26, 2013

⚡POWER⚡
This was really helpful

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
Is there a PyCharm keyboard shortcut to turn

code:
a, b, c = 1, 2, 3
into

code:
a = 1
b = 2
c = 3
?

cinci zoo sniper
Mar 15, 2013




Boris Galerkin posted:

Is there a PyCharm keyboard shortcut to turn

code:
a, b, c = 1, 2, 3
into

code:
a = 1
b = 2
c = 3
?

I’ll need to check, but you may be able to coax the code style settings into disallowing that and then just give the file a pass with the code formatter (Reformat Code, Ctrl+Alt+L by default).

E: I did spin up PyCharm at work and I don't think you can do this in a manner more straightforward than writing a corresponding code template, if any at all (excluding some plugin or external code formatter).

cinci zoo sniper fucked around with this message at 15:38 on Jun 11, 2018

bamhand
Apr 15, 2010
I'm trying to get some plotly scatter plots to give different colors to the points based on the value in some column but I'm running into issues. The example code I'm trying to imitate is as follows:

code:
import plotly.plotly as py
import pandas as pd

df = pd.read_csv('http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt', sep='\t')
df2007 = df[df.year==2007]
df1952 = df[df.year==1952]
df.head(2)

fig = {
    'data': [
        {
            'x': df[df['year']==year]['gdpPercap'],
            'y': df[df['year']==year]['lifeExp'],
            'name': year, 'mode': 'markers',
        } for year in [1952, 1982, 2007]
    ],
    'layout': {
        'xaxis': {'title': 'GDP per Capita', 'type': 'log'},
        'yaxis': {'title': "Life Expectancy"}
    }
}
Now, what I have works when there are no groups:
code:
if group_by=="all":
        return {
                'data': [
                            go.Scatter(
                            x=dff['perf_dt'],
                            y=dff[var_name],
                            mode='markers',
                            marker={
                                'size': 15,
                                'opacity': 0.5,
                                'line': {'width': 0.5, 'color': 'white'}
                                }
                            )
                        ],
                'layout': go.Layout(
                        xaxis={'title': "Performance Month"},
                        yaxis={'title': "Paydown Rate"},
                        margin={'l': 40, 'b': 40, 't': 10, 'r': 0},
                        hovermode='closest'
                )
        }
I'm having trouble figuring out where to incorporate the for loop when using go.Scatter, e.g. the following does not work:
code:
group_list = dff[group_by].unique()
        return {
                'data': [
                            {
                            go.Scatter(
                                x=dff[dff[group_by]==group_val]['perf_dt'],
                                y=dff[dff[group_by]==group_val][var_name],
                                mode='markers',
                                name=group_val,
                                marker={
                                        'size': 15,
                                        'color': dff[group_by],
                                        'opacity': 0.5,
                                        'line': {'width': 0.5, 'color': 'white'}
                                }
                            )
                            } for group_val in group_list                 
                        ],
                'layout': go.Layout(
                        xaxis={'title': "Performance Month"},
                        yaxis={'title': "Paydown Rate"},
                        margin={'l': 40, 'b': 40, 't': 10, 'r': 0},
                        hovermode='closest'
                )
        }
Any ideas on what I'm doing wrong? I've tried moving the brackets for the loop around but no luck.

cinci zoo sniper
Mar 15, 2013




You should have a layout for each data group, if I remember correctly.

bamhand
Apr 15, 2010
So can I wrap the whole thing in the for loop? I'm basically unsure on how the syntax should work with incorporating the loop with the scatter() function call.

cinci zoo sniper
Mar 15, 2013




bamhand posted:

So can I wrap the whole thing in the for loop? I'm basically unsure on how the syntax should work with incorporating the loop with the scatter() function call.

I would first try to see if you can make it work with a list comprehension in the layout section, the same kind you have in data. If you wrap your entire return block inside one, you’ll get 4 single-plot figures rather than 1 figure with 4 subplots, which seems to be what you want.

bamhand
Apr 15, 2010
No dice, I tried enclosing stuff inside the for loop in a couple different ways and it didn't work. Also I've never seen a for loop structured in that way before. Is it a special case? It doesn't work for something like:

code:
{
    print(x)
} for x in y
I'm having a lot of trouble googling the issue. You would think grouping a plot into colors would be a pretty common thing to do. So far I've only found code to do it in R but not python.

e: Wait I found it! You create a list of traces and then have a for loop append to the list. That makes much more sense to me than the example I had in my original post.

code:
    traces = []
    for i in filtered_df.continent.unique():
        df_by_continent = filtered_df[filtered_df['continent'] == i]
        traces.append(go.Scatter(
            x=df_by_continent['gdpPercap'],
            y=df_by_continent['lifeExp'],
            text=df_by_continent['country'],
            mode='markers',
            opacity=0.7,
            marker={
                'size': 15,
                'line': {'width': 0.5, 'color': 'white'}
            },
            name=i
        ))

    return {
        'data': traces,
        'layout': go.Layout(
            xaxis={'type': 'log', 'title': 'GDP Per Capita'},
            yaxis={'title': 'Life Expectancy', 'range': [20, 90]},
            margin={'l': 40, 'b': 40, 't': 10, 'r': 10},
            legend={'x': 0, 'y': 1},
            hovermode='closest'
        )
    }

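For comparison, the same per-group trace list can be built with a comprehension over DataFrame.groupby. This sketch uses plain dict traces, the style of the gapminder example at the top, and a tiny made-up frame instead of the real data. Swapping each dict for go.Scatter(...) should work the same way.

```python
import pandas as pd

df = pd.DataFrame({
    "continent": ["Asia", "Europe", "Asia", "Africa"],
    "gdpPercap": [1200.0, 30000.0, 5400.0, 900.0],
    "lifeExp": [65.1, 80.2, 71.3, 58.7],
})

# One trace per group; each trace gets its own legend entry and colour
traces = [
    {
        "type": "scatter",
        "mode": "markers",
        "x": group["gdpPercap"].tolist(),
        "y": group["lifeExp"].tolist(),
        "name": continent,
    }
    for continent, group in df.groupby("continent")  # groups come out sorted by key
]
```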
bamhand fucked around with this message at 02:37 on Jun 12, 2018

SurgicalOntologist
Jun 17, 2004

Is there a construct or idiom in SQLAlchemy for "fetch an instance with these attributes; if one doesn't exist, create it"? I find myself needing this a lot. I'll write one for my own uses but maybe it already exists.

Edit: https://bitbucket.org/zzzeek/sqlalchemy/issues/3942/add-a-way-to-fetch-a-model-or-create-one
So let me revise my question: anyone know of a third-party utility library that provides this?

SurgicalOntologist fucked around with this message at 02:58 on Jun 12, 2018

cinci zoo sniper
Mar 15, 2013




bamhand posted:

No dice, I tried enclosing stuff inside the for loop in a couple different ways and it didn't work. Also I've never seen a for loop structured in that way before. Is it a special case? It doesn't work for something like:

code:
{
    print(x)
} for x in y
I'm having a lot of trouble googling the issue. You would think grouping a plot into colors would be a pretty common thing to do. So far I've only found code to do it in R but not python.

e: Wait I found it! You create a list of traces and then have a for loop append to the list. That makes much more sense to me than the example I had in my original post.

code:
    traces = []
    for i in filtered_df.continent.unique():
        df_by_continent = filtered_df[filtered_df['continent'] == i]
        traces.append(go.Scatter(
            x=df_by_continent['gdpPercap'],
            y=df_by_continent['lifeExp'],
            text=df_by_continent['country'],
            mode='markers',
            opacity=0.7,
            marker={
                'size': 15,
                'line': {'width': 0.5, 'color': 'white'}
            },
            name=i
        ))

    return {
        'data': traces,
        'layout': go.Layout(
            xaxis={'type': 'log', 'title': 'GDP Per Capita'},
            yaxis={'title': 'Life Expectancy', 'range': [20, 90]},
            margin={'l': 40, 'b': 40, 't': 10, 'r': 10},
            legend={'x': 0, 'y': 1},
            hovermode='closest'
        )
    }

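Rereading now with a fresh head - your original example should’ve worked without the {} around go.Scatter. Braces around a single expression make a set literal, so each element of 'data' was a set wrapping the trace rather than the trace itself. And list comprehensions are pretty standard stuff, but the for clause only works inside the brackets, which is why a bare { print(x) } for x in y is a syntax error. [a for b in c] is a syntactic shortcut for

code:
result = []
for b in c:
    result.append(a)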
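Here is a standalone sketch of that equivalence with made-up data. It also shows what the extra braces do: inside a comprehension, {expr} is a set literal, so each element ends up wrapped in a one-item set.

```python
groups = ["a", "b", "c"]

# Comprehension form
labels = [g.upper() for g in groups]

# What it is shorthand for: build one list, append once per item
labels_loop = []
for g in groups:
    labels_loop.append(g.upper())

# Extra braces inside the comprehension: a one-element set per item
wrapped = [{g.upper()} for g in groups]
```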

TheFluff
Dec 13, 2006

FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

SurgicalOntologist posted:

Is there a construct or idiom in SQLAlchemy for "fetch an instance with these attributes; if one doesn't exist, create it"? I find myself needing this a lot. I'll write one for my own uses but maybe it already exists.

Edit: https://bitbucket.org/zzzeek/sqlalchemy/issues/3942/add-a-way-to-fetch-a-model-or-create-one
So let me revise my question: anyone know of a third-party utility library that provides this?

There's a whole bunch of different ways to do this in different database implementations. Off the top of my head, I can think of at least these variants:
1. INSERT IGNORE ... (MySQL/MariaDB, primitive and rarely useful)
2. INSERT ... ON CONFLICT DO NOTHING (Postgres 9.5+, and SQLite 3.24+ as well)
3. MERGE INTO ... WHEN NOT MATCHED THEN INSERT (standard SQL:2003, implemented in MSSQL and Oracle; MSSQL also accepts WHEN NOT MATCHED BY TARGET)
4. Two separate queries (select and then insert) in a transaction (dangerous, hard to get right)

Then you get all the other variants if you want to update the potentially existing target row ("upsert"). All of this has a bunch of gotchas - this SO question has a good literature overview.
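Here is a sketch of variant 2 using the standard-library sqlite3 module. It assumes SQLite 3.24 or newer, which added ON CONFLICT ... DO NOTHING. The users table and get_or_create_user() are hypothetical. When DO NOTHING fires, lastrowid is unreliable, so the function reads the row back in the same transaction. In SQLAlchemy, the PostgreSQL dialect exposes the same idea through insert(...).on_conflict_do_nothing().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")


def get_or_create_user(conn, email):
    """Return the id of the user with this email, inserting the row if needed."""
    with conn:  # one transaction: commits on success, rolls back on error
        # Let the UNIQUE constraint decide, instead of a separate SELECT-then-INSERT
        conn.execute(
            "INSERT INTO users (email) VALUES (?) ON CONFLICT (email) DO NOTHING",
            (email,),
        )
        # lastrowid is not meaningful when DO NOTHING fired, so read the row back
        (user_id,) = conn.execute(
            "SELECT id FROM users WHERE email = ?", (email,)
        ).fetchone()
    return user_id
```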

pmchem
Jan 22, 2010


I need to mentor people of wildly different ages, skills and backgrounds on a language new to them (in this case, Python).

I know most of what I'm going to do, but I am looking for a free "coding test" style website for them to run exercises against. For privacy reasons I won't go into, I'd prefer if the site does not require a login (one strike against HackerRank). http://codingbat.com/python seems like what I want: it's lightweight, does the checking of answers on the fly in the browser, is free, and doesn't require a login. But on the downside, its examples are not very extensive.

Does anyone know a better resource / website for what I need?

Even better, perhaps, would be a set of ipython notebooks that could be run without an internet connection. So, they'd need to set up the problems, and have the answer available (but hidden), so the user could check that their code is producing the correct answer.
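This is a sketch of one way to build offline notebooks where the answer is available but hidden. Each exercise ships only a SHA-256 fingerprint of the expected answer, and a check() helper compares against it. The exercise names and helpers are hypothetical. Hashing only hides answers from casual reading, since a small integer answer can be brute-forced.

```python
import hashlib


def fingerprint(answer):
    """Hash an answer so a notebook can ship the check without the solution."""
    return hashlib.sha256(repr(answer).encode("utf-8")).hexdigest()


# Instructor side: run once, then paste only the hex digest into the learner
# notebook (computed inline here so the sketch stays self-contained).
EXPECTED = {"sum_multiples_3_or_5_below_10": fingerprint(23)}


def check(exercise, answer):
    """Learner-facing check: compares fingerprints, never reveals the answer."""
    ok = fingerprint(answer) == EXPECTED[exercise]
    print("correct!" if ok else "not yet, try again")
    return ok


# Learner side
def solution():
    return sum(n for n in range(10) if n % 3 == 0 or n % 5 == 0)
```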

huhu
Feb 24, 2006

pmchem posted:

I need to mentor people of wildly different ages, skills and backgrounds on a language new to them (in this case, Python).

I know most of what I'm going to do, but I am looking for a free "coding test" style website for them to run exercises against. For privacy reasons I won't go into, I'd prefer if the site does not require a login (one strike against HackerRank). http://codingbat.com/python seems like what I want: it's lightweight, does the checking of answers on the fly in the browser, is free, and doesn't require a login. But on the downside, its examples are not very extensive.

Does anyone know a better resource / website for what I need?

Even better, perhaps, would be a set of ipython notebooks that could be run without an internet connection. So, they'd need to set up the problems, and have the answer available (but hidden), so the user could check that their code is producing the correct answer.

Project Euler is pretty good. Only need a username and password to keep track of progress. Not sure if that's quite what you're looking for. Questions increase in difficulty so people could start at different points.

shrike82
Jun 11, 2005

Project Euler focuses more on the math/stats side of things than on teaching coding, so it really depends on what you're trying to teach them.

Codewars is more oriented towards the coding side of things.

pmchem
Jan 22, 2010


huhu posted:

Project Euler is pretty good. Only need a username and password to keep track of progress. Not sure if that's quite what you're looking for. Questions increase in difficulty so people could start at different points.

yeah, I've already got a few of those picked out. Euler and codingbat were my starting points.

After continuing to search, I just found this group of ipython nb's: https://github.com/donnemartin/interactive-coding-challenges

It's pretty much spot-on for what I wanted. Even comes with unit tests in challenge notebooks, and separate solution notebooks. I'll pick and choose the appropriate challenges for my people.

Donne also has these nb's, https://github.com/donnemartin/data-science-ipython-notebooks , which are also nice.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
I have a numpy.ndarray of coordinate points, i.e.,

code:
x = np.array([[1.2, 0.2, 0], [1.3, 0.3, 0], [1.3, 0.0, 1], ...])
And what I want to do is to find the n closest indices where y == 0 but also where z == 0. In the example above, the 2 closest points for this would be at index 0 and 1, because x[2] does not meet the requirement z == 0.

e:
Figured out the problem with argsort: I didn't check for absolute values of y. This works now (from this SO post)

code:
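import numpy as np

# arr is the (N, 3) array of x, y, z points
mask = (arr[:, 2] == 0)  # rows where z == 0

# n closest values to where y == 0
n = 5
subset_idx = np.abs(arr[:, 1])[mask].argsort()[:n]  # positions within the masked subset
parent_idx = np.arange(arr.shape[0])[mask][subset_idx]  # mapped back to row indices of arr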
vvv Yeah that's what I meant. vvv
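Here is a runnable version of the same masked-argsort approach, written as a hypothetical helper. It uses the sample points plus one made-up point with small y but nonzero z. np.flatnonzero(mask) is equivalent to np.arange(arr.shape[0])[mask]. The exact z == 0 comparison only suits values stored as exactly 0; np.isclose would be the safer test for computed coordinates.

```python
import numpy as np

# Sample (x, y, z) points: the three from the question plus one extra point
# that is close to y == 0 but has z != 0
arr = np.array([
    [1.2, 0.2, 0.0],
    [1.3, 0.3, 0.0],
    [1.3, 0.0, 1.0],
    [1.0, 0.1, 0.05],
])


def closest_to_y0_with_z0(points, n):
    """Row indices of the (up to) n points with z == 0 whose |y| is smallest."""
    mask = points[:, 2] == 0                      # keep only z == 0 rows
    candidates = np.flatnonzero(mask)             # their original row indices
    order = np.argsort(np.abs(points[mask, 1]))   # sort survivors by |y|
    return candidates[order[:n]]


idx = closest_to_y0_with_z0(arr, 2)
```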

Boris Galerkin fucked around with this message at 16:34 on Jun 13, 2018

SurgicalOntologist
Jun 17, 2004

Boris Galerkin posted:

I have a numpy.ndarray of coordinate points, i.e.,

code:
x = np.array([[1.2, 0.2, 0], [1.3, 0.3, 0], [1.3, 0.0, 1], ...])
And what I want to do is to find the n closest indices where y == 0 but also where z == 0. In the example above, the 2 closest points for this would be at index 0 and 1, because x[2] does not meet the requirement z == 0.

Do you mean the closest points to y == 0? Because in your example there are no points with y == 0...

bamhand
Apr 15, 2010
Am I missing something really stupid here? I'm kind of muddling my way through Python right now. Thanks for everyone's help so far, by the way.

ValueError: time data '01-Jun-09' does not match format '"%d-%b-%y"'

e: Never mind, it was the quotes. I am bad at this.
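For anyone hitting the same error, here is a minimal sketch with the standard-library datetime module. The format string must not contain the quote characters, because anything that is not a % directive is matched literally. %b is locale-dependent, so this assumes English month abbreviations (the default C locale). pandas.to_datetime(..., format="%d-%b-%y") takes the same format codes.

```python
from datetime import datetime

fmt = "%d-%b-%y"  # day, abbreviated month name, two-digit year; no quote characters
parsed = datetime.strptime("01-Jun-09", fmt)

# Quote characters inside the format are matched literally, so this fails
try:
    datetime.strptime("01-Jun-09", '"%d-%b-%y"')
    failed = False
except ValueError:
    failed = True
```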

SurgicalOntologist
Jun 17, 2004

Boris Galerkin posted:

vvv Yeah that's what I meant. vvv

Do you want the closest indices to y == 0 where z == 0? Or do you want the closest indices to (y, z) == (0, 0)?
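For the second reading, here is a sketch that ranks points by Euclidean distance in the (y, z) plane, i.e. distance to the line y == z == 0. The sample points and the function name are made up. Under this reading, a point with small but nonzero z can come out on top.

```python
import numpy as np

# Sample (x, y, z) points; the last one has small y and small, nonzero z
arr = np.array([
    [1.2, 0.2, 0.0],
    [1.3, 0.3, 0.0],
    [1.3, 0.0, 1.0],
    [1.0, 0.1, 0.05],
])


def closest_to_y0_z0(points, n):
    """Indices of the n points nearest to (y, z) == (0, 0), ignoring x."""
    dist = np.hypot(points[:, 1], points[:, 2])  # sqrt(y**2 + z**2) per row
    return np.argsort(dist)[:n]


idx = closest_to_y0_z0(arr, 2)
```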

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!
Is there a way in PyCharm to group unit tests? I think this is a thing in ReSharper but it's not in the PyCharm unit test runner. I'm trying to debug some issues in an open source project, and it takes an hour to rerun all the 14,000+ tests. I just have 14 failures to thumb through scattered across different sections.

Heck, it's frustrating to just click on one and lose the state of all the others.

My fallback is to try to create a special unit test class that just references the 14 tests I need to run, but the test bench for this project is kind of ridiculous and I anticipate it being troublesome.
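Here is a sketch of that fallback using only the standard unittest module. It builds a TestSuite from (TestCase class, method name) pairs and runs just those. The test classes are made-up stand-ins for the project's own. The same selection works from the command line as python -m unittest package.module.Class.test_method. If the project happens to use pytest, pytest --lf reruns the last failures from its on-disk cache.

```python
import unittest


# Stand-ins for the project's real test classes (hypothetical names)
class ParserTests(unittest.TestCase):
    def test_parses_empty(self):
        self.assertEqual("".split(","), [""])

    def test_not_of_interest(self):
        self.assertTrue(True)


class WriterTests(unittest.TestCase):
    def test_roundtrip(self):
        self.assertEqual(",".join(["a", "b"]), "a,b")


# The handful of tests worth rerunning, kept in one place (this list could
# just as well be read from a text file so it survives restarts)
FOCUS = [
    (ParserTests, "test_parses_empty"),
    (WriterTests, "test_roundtrip"),
]


def focus_suite(pairs):
    """Build a suite containing only the named test methods."""
    return unittest.TestSuite(cls(name) for cls, name in pairs)


if __name__ == "__main__":
    unittest.TextTestRunner(verbosity=2).run(focus_suite(FOCUS))
```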

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Doesn't the Python version have a "Rerun all failed" button in the test window?

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Rocko Bonaparte posted:

Is there a way in PyCharm to group unit tests? I think this is a thing in ReSharper but it's not in the PyCharm unit test runner. I'm trying to debug some issues in an open source project, and it takes an hour to rerun all the 14,000+ tests. I just have 14 failures to thumb through scattered across different sections.

Heck, it's frustrating to just click on one and lose the state of all the others.

My fallback is to try to create a special unit test class that just references the 14 tests I need to run, but the test bench for this project is kind of ridiculous and I anticipate it being troublesome.

I'm not at a PC to tell you exactly what to click on, but you can run just failed tests and even better it autosaves the last X test runs so you can always open up whichever recent run you want.

You can also manually save a test run.

It's all in the buttons in the tests tool window...

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!
None of that seems to persist if I close PyCharm. I want something that will survive reboots and such. This sounds dumb but I am new to the source's test environment and expect this to be pretty rough until then; it is something I am looking over at home when I can.

cinci zoo sniper
Mar 15, 2013




Rocko Bonaparte posted:

None of that seems to persist if I close PyCharm. I want something that will survive reboots and such. This sounds dumb but I am new to the source's test environment and expect this to be pretty rough until then; it is something I am looking over at home when I can.

You can export test results to a file and import them in a new session.

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I didnt know that about PyCharm, thanks thread!

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!
Oh poo poo, I don't even have to explicitly export them. There's just a pile of them in "Run->Import Test Results."

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Yeah, that's what I was talking about.

cinci zoo sniper
Mar 15, 2013




Rocko Bonaparte posted:

Oh poo poo, I don't even have to explicitly export them. There's just a pile of them in "Run->Import Test Results."

Yeah, they are stored automatically in the IDE's memory, in some capacity.

HamsterPolice
Apr 17, 2016

Are there any good packages for using Python to connect to S3? Right now I’m using Boto.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Boto is all I've ever used. It's not great, but it works.

Also, pipenv support is coming to PyCharm in 183.155.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
So f-strings are really cool but restrict me to 3.6+. Is there a way to use f-strings with 3.5?

necrotic
Aug 2, 2005
I owe my brother big time for this!
https://github.com/asottile/future-fstrings/blob/master/README.md ?
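For code that has to stay dependency-free on 3.5, here is a sketch of the str.format spellings that produce the same text as an f-string. The f-string line itself still needs 3.6+, and the variable names are made up. str.format uses the same format-spec mini-language, so specs like :.1% carry over unchanged.

```python
name = "world"
count = 3
ratio = 0.4567

# f-string (Python 3.6+)
modern = f"Hello {name}, {count} items, {ratio:.1%}"

# Same output with str.format, which 3.5 already supports
legacy = "Hello {name}, {count} items, {ratio:.1%}".format(name=name, count=count, ratio=ratio)

# Shortcut that pulls names from the current scope (use sparingly)
legacy_locals = "Hello {name}, {count} items, {ratio:.1%}".format(**locals())
```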

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

Holy poo poo that is amazing. Thanks.

mr_package
Jun 13, 2000
I've written a module that interfaces with a database, and for 'simplicity of interface' I wrote most functions to return a list even if we expect a single result, because most functions do return more than one thing (or could). e.g. both 'get_item_from_queue' and 'get_queue' return a list of dictionaries even though 'get_item_from_queue' will always return a single item because the SQL behind it is literally SELECT TOP 1. Doing some refactoring now, and looking at this it seems like hobgoblin consistency. But I can see the rationale. Especially if the design ever changes. I dunno-- would get_item_from_queue ever really be expected to return more than one thing? Probably not. Opinions?

mr_package fucked around with this message at 22:04 on Jun 21, 2018

cinci zoo sniper
Mar 15, 2013




mr_package posted:

I've written a module that interfaces with a database, and for 'simplicity of interface' I wrote most functions to return a list even if we expect a single result, because most functions do return more than one thing (or could). e.g. both 'get_item_from_queue' and 'get_queue' return a list of dictionaries even though 'get_item_from_queue' will always return a single item because the SQL behind it is literally SELECT TOP 1. Doing some refactoring now, and looking at this it seems like hobgoblin consistency. But I can see the rationale. Especially if the design ever changes. I dunno-- would get_item_from_queue ever really be expected to return more than one thing? Probably not. Opinions?

In my experience, this is a useful thing to do for data retrieval or parsing. My context usually involves large and complex JSON or XML documents, but I still basically force everything to lists since then you write and test just one set of operations.
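Here is a sketch of the "force everything to lists" normalisation for parsed JSON/XML fields. as_list() and the records are hypothetical. Once every field goes through it, downstream code only has to handle the list case.

```python
def as_list(value):
    """Normalise a parsed field: None -> [], list -> itself, anything else -> [value]."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# e.g. records where "tag" is sometimes one item, sometimes many, sometimes absent
records = [
    {"id": 1, "tag": "a"},
    {"id": 2, "tag": ["b", "c"]},
    {"id": 3},
]

tags = [t for r in records for t in as_list(r.get("tag"))]
```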


necrotic
Aug 2, 2005
I owe my brother big time for this!
Based on the name I would think not, but the name could be more clear with like get_top_item_from_queue. Always returning a list for even single item gets seems annoying as hell from a usage perspective. Don't make me destructure/unwrap every time I use a method.
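Here is a sketch of the split necrotic suggests, against a hypothetical in-memory SQLite queue table. The many-rows function always returns a list. The single-item one returns a dict or None. LIMIT 1 is SQLite's equivalent of SQL Server's TOP 1.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows support dict(row)
conn.executescript("""
    CREATE TABLE queue (id INTEGER PRIMARY KEY, payload TEXT NOT NULL);
    INSERT INTO queue (payload) VALUES ('first'), ('second');
""")


def get_queue(conn) -> list:
    """Many rows: always a list, possibly empty."""
    return [dict(r) for r in conn.execute("SELECT id, payload FROM queue ORDER BY id")]


def get_top_item_from_queue(conn) -> Optional[dict]:
    """At most one row: a dict, or None when the queue is empty."""
    row = conn.execute("SELECT id, payload FROM queue ORDER BY id LIMIT 1").fetchone()
    return dict(row) if row is not None else None
```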
