Python

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python

Nippashish: Nov 2, 2005; Let me see you dance!

cinci zoo sniper posted:

Yeah NumPy will internally route stuff to C or Fortran code to speed things up, including hardware-specific optimisations. Pandas does the same twofold, since it has both its own C stuff and is largely built on top of NumPy (wrt data structures and such).

This doesn't help for constructs like apply though, since they need to invoke a python function (and thus interact with the interpreter) for each element of whatever you are applying to.

# ? Jun 10, 2018 19:25

cinci zoo sniper: Mar 15, 2013

Nippashish posted:

This doesn't help for constructs like apply though, since they need to invoke a python function (and thus interact with the interpreter) for each element of whatever you are applying to.

Apply() is slightly more complicated than that. The method itself will evaluate what is happening inside the function, and correspondingly execute it in "normal Python environment" or in "Cython environment". The former will be a standard for loop with function call overhead, the latter will be slightly slower than a directly vectorised operation - it all is fairly dependent on your lambda function and the axis choice. The only hard rule here is that iterrows() is bad, since on top of every other overhead you'll also incur Pandas casting each row into a Series.
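
To make the three styles concrete, here is a minimal sketch on a made-up frame (the column names `price` and `qty` are assumptions for illustration, not from the thread). It only shows that the approaches give the same numbers; how much slower the row-wise versions are depends on the data and the pandas version, so time it yourself before drawing conclusions.

```python
import pandas as pd

# Hypothetical data, only for illustration
df = pd.DataFrame({"price": [1.5, 2.0, 3.25], "qty": [2, 3, 4]})

# Vectorised: one whole-column operation, no Python call per row
vectorised = df["price"] * df["qty"]

# apply with axis=1: the lambda is called for each row, and each row
# is handed over as a Series
applied = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# itertuples: an ordinary Python loop over lightweight namedtuples
looped = pd.Series(
    [row.price * row.qty for row in df.itertuples(index=False)],
    index=df.index,
)
```

If the operation can be written as column arithmetic, the vectorised form is usually the one to reach for first.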

# ? Jun 10, 2018 20:02

baka kaba: Jul 19, 2003; PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Oh yeah my bad, the count function is the best way to do that infinite thing - loops indefinitely and hands you incrementing numbers, what more could you want
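
For anyone following along, a small sketch of `itertools.count`. Since it never stops on its own, something else has to end the iteration, here `islice` or the shorter input to `zip`. The sample values are made up.

```python
from itertools import count, islice

# count() yields 0, 1, 2, ... forever; islice takes only the first five
first_five = list(islice(count(), 5))

# zip stops at the shorter input, so the infinite counter is safe here
labelled = list(zip(count(start=1), ["a", "b", "c"]))

# start and step can both be set
stepped = list(islice(count(10, 5), 3))
```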

# ? Jun 10, 2018 20:34

SurgicalOntologist: Jun 17, 2004

cinci zoo sniper posted:

Apply() is slightly more complicated than that. The method itself will evaluate what is happening inside the function, and correspondingly execute it in "normal Python environment" or in "Cython environment". The former will be a standard for loop with function call overhead, the latter will be slightly slower than a directly vectorised operation - it all is fairly dependent on your lambda function and the axis choice. The only hard rule here is that iterrows() is bad, since on top of every other overhead you'll also incur Pandas casting each row into a Series.

Yeah, good point about apply. I think it evaluates the function on the first element more than once, in order to figure out the fastest way to execute it. As a result it might not be the best choice with a function with side effects, like the qr code example. It shouldn't be an issue on most filesystems, but nevertheless that might be a reason to go with iterrows in this case.

Edit: or itertuples
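
A rough sketch of the itertuples route for a function with side effects. `save_code` is a hypothetical stand-in for whatever writes the QR code file, and the column names are assumptions. The point is only that an explicit loop calls the function exactly once per row, whatever apply does internally in a given pandas version.

```python
import pandas as pd

# Hypothetical input frame; column names are assumptions
df = pd.DataFrame({"label": ["a", "b", "c"], "url": ["u1", "u2", "u3"]})

written = []

def save_code(label, url):
    """Stand-in for a function with a side effect, e.g. writing an image file."""
    written.append((label, url))

# One call per row, in order, with no extra probing calls
for row in df.itertuples(index=False):
    save_code(row.label, row.url)
```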

SurgicalOntologist fucked around with this message at 05:08 on Jun 11, 2018

# ? Jun 11, 2018 01:24

CarForumPoster: Jun 26, 2013; ⚡POWER⚡

This was really helpful

# ? Jun 11, 2018 03:12

Boris Galerkin: Dec 17, 2011; I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

Is there a pycharm keyboard shortcut to turn

code:

a, b, c = 1, 2, 3

into

code:

a = 1
b = 2
c = 3

# ? Jun 11, 2018 12:37

cinci zoo sniper: Mar 15, 2013

Boris Galerkin posted:

Is there a pycharm keyboard shortcut to turn
code:
a, b, c = 1, 2, 3
into
code:
a = 1
b = 2
c = 3
?

I'll need to check but you may be able to coax code style guidelines into disrespecting that and then just give it a pass with the code formatter (CTRL+ALT+L by default).

E: I did spin up PyCharm at work and I don't think you can do this in a manner more straightforward than writing a corresponding code template, if any at all (excluding some plugin or external code formatter).

cinci zoo sniper fucked around with this message at 15:38 on Jun 11, 2018

# ? Jun 11, 2018 13:05

bamhand: Apr 15, 2010

I'm trying to get some plotly scatter plots to give different colors to the points based on the value in some column but I'm running into issues. The example code I'm trying to imitate is as follows:

code:

import plotly.plotly as py
import pandas as pd

df = pd.read_csv('http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt', sep='\t')
df2007 = df[df.year==2007]
df1952 = df[df.year==1952]
df.head(2)

fig = {
    'data': [
        {
            'x': df[df['year']==year]['gdpPercap'],
            'y': df[df['year']==year]['lifeExp'],
            'name': year, 'mode': 'markers',
        } for year in [1952, 1982, 2007]
    ],
    'layout': {
        'xaxis': {'title': 'GDP per Capita', 'type': 'log'},
        'yaxis': {'title': "Life Expectancy"}
    }
}

Now what I have works for when there are no groups:

code:

if group_by=="all":
        return {
                'data': [
                            go.Scatter(
                            x=dff['perf_dt'],
                            y=dff[var_name],
                            mode='markers',
                            marker={
                                'size': 15,
                                'opacity': 0.5,
                                'line': {'width': 0.5, 'color': 'white'}
                                }
                            )
                        ],
                'layout': go.Layout(
                        xaxis={'title': "Performance Month"},
                        yaxis={'title': "Paydown Rate"},
                        margin={'l': 40, 'b': 40, 't': 10, 'r': 0},
                        hovermode='closest'
                )
        }

I'm having trouble figuring out where to incorporate the for loop when using go.Scatter, e.g. the following does not work:

code:

group_list = dff[group_by].unique()
        return {
                'data': [
                            {
                            go.Scatter(
                                x=dff[dff[group_by]==group_val]['perf_dt'],
                                y=dff[dff[group_by]==group_val][var_name],
                                mode='markers',
                                name=group_val,
                                marker={
                                        'size': 15,
                                        'color': dff[group_by],
                                        'opacity': 0.5,
                                        'line': {'width': 0.5, 'color': 'white'}
                                }
                            )
                            } for group_val in group_list                 
                        ],
                'layout': go.Layout(
                        xaxis={'title': "Performance Month"},
                        yaxis={'title': "Paydown Rate"},
                        margin={'l': 40, 'b': 40, 't': 10, 'r': 0},
                        hovermode='closest'
                )
        }

Any ideas on what I'm doing wrong? I've tried moving the brackets for the loop around but no luck.

# ? Jun 11, 2018 21:30

cinci zoo sniper: Mar 15, 2013

You should have a layout for each data group, if I remember correctly.

# ? Jun 11, 2018 21:52

bamhand: Apr 15, 2010

So can I wrap the whole thing in the for loop? I'm basically unsure on how the syntax should work with incorporating the loop with the scatter() function call.

# ? Jun 11, 2018 22:35

cinci zoo sniper: Mar 15, 2013

bamhand posted:

So can I wrap the whole thing in the for loop? I'm basically unsure on how the syntax should work with incorporating the loop with the scatter() function call.

I would first try to see if you can make it work with a generator expression in the layout section, same one you have in data. If you wrap your entire return block inside one, you'll get 4 single plot figures, rather than 1 figure with 4 subplots that you seem to want.

# ? Jun 11, 2018 22:51

bamhand: Apr 15, 2010

No dice, I tried enclosing stuff inside the for loop in a couple different ways and it didn't work. Also I've never seen a for loop structured in that way before. Is it a special case? It doesn't work for something like:

code:

{
    print(x)
} for x in y

I'm having a lot of trouble googling the issue. You would think grouping a plot into colors would be a pretty common thing to do. So far I've only found code to do it in R but not python.

e: Wait I found it! You create a list of traces and then have a for loop append to the list. That makes much more sense to me than the example I had in my original post.

code:

    traces = []
    for i in filtered_df.continent.unique():
        df_by_continent = filtered_df[filtered_df['continent'] == i]
        traces.append(go.Scatter(
            x=df_by_continent['gdpPercap'],
            y=df_by_continent['lifeExp'],
            text=df_by_continent['country'],
            mode='markers',
            opacity=0.7,
            marker={
                'size': 15,
                'line': {'width': 0.5, 'color': 'white'}
            },
            name=i
        ))

    return {
        'data': traces,
        'layout': go.Layout(
            xaxis={'type': 'log', 'title': 'GDP Per Capita'},
            yaxis={'title': 'Life Expectancy', 'range': [20, 90]},
            margin={'l': 40, 'b': 40, 't': 10, 'r': 10},
            legend={'x': 0, 'y': 1},
            hovermode='closest'
        )
    }

bamhand fucked around with this message at 02:37 on Jun 12, 2018

# ? Jun 12, 2018 01:32

SurgicalOntologist: Jun 17, 2004

Is there a construct or idiom in SQLAlchemy for "fetch an instance with these attributes; if one doesn't exist, create it"? I find myself needing this a lot. I'll write one for my own uses but maybe it already exists.

Edit: https://bitbucket.org/zzzeek/sqlalchemy/issues/3942/add-a-way-to-fetch-a-model-or-create-one
So let me revise my question: anyone know of a third-party utility library that provides this?

SurgicalOntologist fucked around with this message at 02:58 on Jun 12, 2018

# ? Jun 12, 2018 02:33

cinci zoo sniper: Mar 15, 2013

bamhand posted:

No dice, I tried enclosing stuff inside the for loop in a couple different ways and it didn't work. Also I've never seen a for loop structured in that way before. Is it a special case? It doesn't work for something like:
code:
{
    print(x)
} for x in y
I'm having a lot of trouble googling the issue. You would think grouping a plot into colors would be a pretty common thing to do. So far I've only found code to do it in R but not python.

e: Wait I found it! You create a list of traces and then have a for loop append to the list. That makes much more sense to me than the example I had in my original post.
code:
    traces = []
    for i in filtered_df.continent.unique():
        df_by_continent = filtered_df[filtered_df['continent'] == i]
        traces.append(go.Scatter(
            x=df_by_continent['gdpPercap'],
            y=df_by_continent['lifeExp'],
            text=df_by_continent['country'],
            mode='markers',
            opacity=0.7,
            marker={
                'size': 15,
                'line': {'width': 0.5, 'color': 'white'}
            },
            name=i
        ))

    return {
        'data': traces,
        'layout': go.Layout(
            xaxis={'type': 'log', 'title': 'GDP Per Capita'},
            yaxis={'title': 'Life Expectancy', 'range': [20, 90]},
            margin={'l': 40, 'b': 40, 't': 10, 'r': 10},
            legend={'x': 0, 'y': 1},
            hovermode='closest'
        )
    }

Rereading now with a fresh head - your original example should've worked without the {} around go.Scatter, since the braces try to wrap each trace in a set. And comprehensions are pretty standard stuff - the list comprehension result = [a for b in c] is a syntactic shortcut for

result = []
for b in c:
    result.append(a)
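
The equivalence is easy to check by hand. In this sketch `Trace` is a hypothetical stand-in for a trace object such as go.Scatter, so it runs without plotly installed; the group names are made up.

```python
class Trace:
    """Hypothetical stand-in for a plotting trace such as go.Scatter."""

    def __init__(self, name):
        self.name = name

groups = ["Asia", "Europe", "Africa"]

# Comprehension form: one trace per group
traces = [Trace(name=g) for g in groups]

# The same thing written as an explicit loop
traces_loop = []
for g in groups:
    traces_loop.append(Trace(name=g))

# Putting {} around the expression builds a one-element set per group,
# so the list holds sets rather than traces. (A class that is not
# hashable would raise a TypeError here instead.)
wrapped = [{Trace(name=g)} for g in groups]
```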

# ? Jun 12, 2018 05:17

TheFluff: Dec 13, 2006; FRIENDS, LISTEN TO ME
I AM A SEAGULL
OF WEALTH AND TASTE

SurgicalOntologist posted:

Is there a construct or idiom in SQLAlchemy for "fetch an instance with these attributes; if one doesn't exist, create it"? I find myself needing this a lot. I'll write one for my own uses but maybe it already exists.

Edit: https://bitbucket.org/zzzeek/sqlalchemy/issues/3942/add-a-way-to-fetch-a-model-or-create-one
So let me revise my question: anyone know of a third-party utility library that provides this?

There's a whole bunch of different ways to do this in different database implementations. Off the top of my head, I can think of at least these variants:
1. INSERT IGNORE ... (MySQL/MariaDB, primitive and rarely useful)
2. INSERT ... ON CONFLICT DO NOTHING (Postgres only, I think)
3. MERGE INTO ... WHEN NOT MATCHED BY TARGET THEN INSERT (standard SQL 2003, implemented in MSSQL and Oracle)
4. Two separate queries (select and then insert) in a transaction (dangerous, hard to get right)

Then you get all the other variants if you want to update the potentially existing target row ("upsert"). All of this has a bunch of gotchas - this SO question has a good literature overview.
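
As a rough illustration of the "insert and ignore the conflict, then select" pattern, here is a sketch using the standard library's sqlite3 rather than SQLAlchemy. The `tag` table and the `get_or_create_tag` helper are made up for the example, and `INSERT OR IGNORE` is SQLite's spelling, so other databases need one of the variants listed above. It relies on a UNIQUE constraint on the lookup column.

```python
import sqlite3

def get_or_create_tag(conn, name):
    """Return (id, created) for the tag with this name, inserting it if absent.

    Hypothetical helper; relies on the UNIQUE constraint on tag.name.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,)
        )
        created = cur.rowcount == 1  # 0 when the row already existed
        row = conn.execute(
            "SELECT id FROM tag WHERE name = ?", (name,)
        ).fetchone()
    return row[0], created

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)

first = get_or_create_tag(conn, "python")
second = get_or_create_tag(conn, "python")
```

Letting the database enforce uniqueness avoids the race in the select-then-insert variant, where two sessions can both see "not found" and both insert.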

# ? Jun 12, 2018 15:25

pmchem: Jan 22, 2010

I need to mentor people of wildly different ages, skills and backgrounds on a language new to them (in this case, Python).

I know most of what I'm going to do, but I am looking for a free "coding test" style website for them to run exercises against. For privacy reasons I won't go into, I'd prefer if the site does not require a login (one strike against HackerRank). http://codingbat.com/python seems like what I want: it's lightweight, does the checking of answers on the fly in the browser, is free, and doesn't require a login. But on the downside, its examples are not very extensive.

Does anyone know a better resource / website for what I need?

Even better, perhaps, would be a set of ipython notebooks that could be run without an internet connection. So, they'd need to set up the problems, and have the answer available (but hidden), so the user could check that their code is producing the correct answer.

# ? Jun 13, 2018 02:18

huhu: Feb 24, 2006

pmchem posted:

I need to mentor people of wildly different ages, skills and backgrounds on a language new to them (in this case, Python).

I know most of what I'm going to do, but I am looking for a free "coding test" style website for them to run exercises against. For privacy reasons I won't go into, I'd prefer if the site does not require a login (one strike against HackerRank). http://codingbat.com/python seems like what I want: it's lightweight, does the checking of answers on the fly in the browser, is free, and doesn't require a login. But on the downside, its examples are not very extensive.

Does anyone know a better resource / website for what I need?

Even better, perhaps, would be a set of ipython notebooks that could be run without an internet connection. So, they'd need to set up the problems, and have the answer available (but hidden), so the user could check that their code is producing the correct answer.

Project Euler is pretty good. Only need a username and password to keep track of progress. Not sure if that's quite what you're looking for. Questions increase in difficulty so people could start at different points.

# ? Jun 13, 2018 02:43

shrike82: Jun 11, 2005

Project Euler focuses more on the math/stats side of things rather than teach coding so really depends on what you're trying to teach them.

Codewars is more oriented towards the coding side of things.

# ? Jun 13, 2018 02:47

pmchem: Jan 22, 2010

huhu posted:

Project Euler is pretty good. Only need a username and password to keep track of progress. Not sure if that's quite what you're looking for. Questions increase in difficulty so people could start at different points.

yeah, I've already got a few of those picked out. Euler and codingbat were my starting points.

After continuing to search, I just found this group of ipython nb's: https://github.com/donnemartin/interactive-coding-challenges

It's pretty much spot-on for what I wanted. Even comes with unit tests in challenge notebooks, and separate solution notebooks. I'll pick and choose the appropriate challenges for my people.

Donne also has these nb's, https://github.com/donnemartin/data-science-ipython-notebooks , which are also nice.

# ? Jun 13, 2018 03:03

Boris Galerkin: Dec 17, 2011; I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

I have a numpy.ndarray of coordinate points, i.e.,

code:

x = np.array([[1.2, 0.2, 0], [1.3, 0.3, 0], [1.3, 0.0, 1], ...])

And what I want to do is to find the n closest indices where y == 0 but also where z == 0. In the example above, the 2 closest points for this would be at index 0 and 1, because x[2] does not meet the requirement z == 0.

e:
Figured out the problem with argsort: I didn't check for absolute values of y. This works now (from this SO post)

code:

mask = (arr[:, 2] == 0)

# n closest values to where y == 0
n = 5
subset_idx = np.abs(arr[:, 1])[mask].argsort()[:n]
parent_idx = np.arange(arr.shape[0])[mask][subset_idx]
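
For reference, a self-contained version of the same idea on made-up data (the fourth point is an addition for illustration). It uses `np.flatnonzero` to get the candidate row indices directly, which is an alternative to indexing `np.arange` with the mask.

```python
import numpy as np

# Columns are x, y, z; sample values are made up
arr = np.array([
    [1.2, 0.2, 0],
    [1.3, 0.3, 0],
    [1.3, 0.0, 1],   # y is closest to 0, but z != 0, so it is excluded
    [0.5, -0.1, 0],
])
n = 2

# Row indices (into arr) where z == 0
candidates = np.flatnonzero(arr[:, 2] == 0)

# Among those rows, order by |y| and keep the n smallest
order = np.abs(arr[candidates, 1]).argsort()[:n]

# Map back to indices in the original array
closest_idx = candidates[order]
```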

vvv Yeah that's what I meant. vvv

Boris Galerkin fucked around with this message at 16:34 on Jun 13, 2018

# ? Jun 13, 2018 16:16

SurgicalOntologist: Jun 17, 2004

Boris Galerkin posted:

I have a numpy.ndarray of coordinate points, i.e.,
code:
x = np.array([[1.2, 0.2, 0], [1.3, 0.3, 0], [1.3, 0.0, 1], ...])
And what I want to do is to find the n closest indices where y == 0 but also where z == 0. In the example above, the 2 closest points for this would be at index 0 and 1, because x[2] does not meet the requirement z == 0.

Do you mean the closest points to y == 0? Because in your example there are no points with y == 0...

# ? Jun 13, 2018 16:32

bamhand: Apr 15, 2010

Am I missing something really stupid here, I'm kind of muddling my way through python right now. Thanks for everyone's help so far by the way.

ValueError: time data '01-Jun-09' does not match format '"%d-%b-%y"'

e: Nevermind it was the quotes. I am bad at this.
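
For anyone who hits the same error: the format string is matched literally, so quote characters inside it must also appear in the data. A small sketch, assuming an English locale for the month abbreviation:

```python
from datetime import datetime

raw = "01-Jun-09"

# Works: the format contains only the directives and the dashes
parsed = datetime.strptime(raw, "%d-%b-%y")

# Fails: the extra double quotes are part of the format and are not in raw
try:
    datetime.strptime(raw, '"%d-%b-%y"')
    quoted_ok = True
except ValueError:
    quoted_ok = False
```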

# ? Jun 13, 2018 16:34

SurgicalOntologist: Jun 17, 2004

Boris Galerkin posted:

vvv Yeah that's what I meant. vvv

Do you want the closest indices to y == 0 where z == 0? Or do you want the closest indices to (y, z) == (0, 0)?

# ? Jun 13, 2018 16:54

Rocko Bonaparte: Mar 12, 2002; Every day is Friday!

Is there a way in PyCharm to group unit tests? I think this is a thing in ReSharper but it's not in the PyCharm unit test runner. I'm trying to debug some issues in an open source project, and it takes an hour to rerun all the 14,000+ tests. I just have 14 failures to thumb through scattered across different sections.

Heck, it's frustrating to just click on one and lose the state of all the others.

My fallback is to try to create a special unit test class that just references the 14 tests I need to run, but the test bench for this project is kind of ridiculous and I anticipate it being troublesome.

# ? Jun 18, 2018 05:40

baka kaba: Jul 19, 2003; PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Doesn't the Python version have a "Rerun all failed" button in the test window?

# ? Jun 18, 2018 06:01

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Rocko Bonaparte posted:

Is there a way in PyCharm to group unit tests? I think this is a thing in ReSharper but it's not in the PyCharm unit test runner. I'm trying to debug some issues in an open source project, and it takes an hour to rerun all the 14,000+ tests. I just have 14 failures to thumb through scattered across different sections.

Heck, it's frustrating to just click on one and lose the state of all the others.

My fallback is to try to create a special unit test class that just references the 14 tests I need to run, but the test bench for this project is kind of ridiculous and I anticipate it being troublesome.

I'm not at a PC to tell you exactly what to click on, but you can run just failed tests and even better it autosaves the last X test runs so you can always open up whichever recent run you want.

You can also manually save a test run.

It's all in the buttons in the tests tool window...

# ? Jun 18, 2018 06:54

Rocko Bonaparte: Mar 12, 2002; Every day is Friday!

None of that seems to persist if I close PyCharm. I want something that will survive reboots and such. This sounds dumb but I am new to the source's test environment and expect this to be pretty rough until then; it is something I am looking over at home when I can.

# ? Jun 18, 2018 07:04

cinci zoo sniper: Mar 15, 2013

Rocko Bonaparte posted:

None of that seems to persist if I close PyCharm. I want something that will survive reboots and such. This sounds dumb but I am new to the source's test environment and expect this to be pretty rough until then; it is something I am looking over at home when I can.

You can export test results to file and import them in new session.

# ? Jun 18, 2018 07:21

CarForumPoster: Jun 26, 2013; ⚡POWER⚡

I didn't know that about PyCharm, thanks thread!

# ? Jun 19, 2018 02:59

Rocko Bonaparte: Mar 12, 2002; Every day is Friday!

Oh poo poo I don't even have to explicitly export them. There's just a pile of them in "Run->Import Test Results."

# ? Jun 19, 2018 05:33

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Yeah, that's what I was talking about.

# ? Jun 19, 2018 06:29

cinci zoo sniper: Mar 15, 2013

Rocko Bonaparte posted:

Oh poo poo I don't even have to explicitly export them. There's just a pile of them in "Run->Import Test Results."

Yeah, they are stored automatically in the IDE's memory, in some capacity.

# ? Jun 19, 2018 06:56

HamsterPolice: Apr 17, 2016

Are there any good packages for using Python to connect to S3? Right now I'm using Boto.

# ? Jun 19, 2018 15:00

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Boto is all I've ever used. It's not great, but it works.

Also, pipenv support is coming to PyCharm in 183.155.

# ? Jun 19, 2018 17:44

Boris Galerkin: Dec 17, 2011; I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

So f-strings are really cool but restrict me to 3.6+. Is there a way to use f-strings with 3.5?

# ? Jun 20, 2018 08:19

necrotic: Aug 2, 2005; I owe my brother big time for this!

https://github.com/asottile/future-fstrings/blob/master/README.md ?
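
If adding a package is not an option, `str.format` accepts the same format specs and runs on 3.5. A minimal sketch with made-up values; the f-string is shown only in a comment so the snippet itself stays 3.5-compatible. For the linked package's own setup, see its README.

```python
user = "Boris"
score = 0.8567

# Python 3.6+ only:
#   f"{user} scored {score:.1%}"

# Same result on 3.5, using named fields with str.format
portable = "{user} scored {score:.1%}".format(user=user, score=score)
```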

# ? Jun 20, 2018 14:09

Boris Galerkin: Dec 17, 2011; I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

necrotic posted:

https://github.com/asottile/future-fstrings/blob/master/README.md ?

Holy poo poo that is amazing. Thanks.

# ? Jun 20, 2018 17:58

mr_package: Jun 13, 2000

I've written a module that interfaces with a database, and for 'simplicity of interface' I wrote most functions to return a list even if we expect a single result, because most functions do return more than one thing (or could). e.g. both 'get_item_from_queue' and 'get_queue' return a list of dictionaries even though 'get_item_from_queue' will always return a single item because the SQL behind it is literally SELECT TOP 1. Doing some refactoring now, and looking at this it seems like hobgoblin consistency. But I can see the rationale. Especially if the design ever changes. I dunno-- would get_item_from_queue ever really be expected to return more than one thing? Probably not. Opinions?

mr_package fucked around with this message at 22:04 on Jun 21, 2018

# ? Jun 21, 2018 21:59

cinci zoo sniper: Mar 15, 2013

mr_package posted:

I've written a module that interfaces with a database, and for 'simplicity of interface' I wrote most functions to return a list even if we expect a single result, because most functions do return more than one thing (or could). e.g. both 'get_item_from_queue' and 'get_queue' return a list of dictionaries even though 'get_item_from_queue' will always return a single item because the SQL behind it is literally SELECT TOP 1. Doing some refactoring now, and looking at this it seems like hobgoblin consistency. But I can see the rationale. Especially if the design ever changes. I dunno-- would get_item_from_queue ever really be expected to return more than one thing? Probably not. Opinions?

In my experience, this is a useful thing to do for data retrieval or parsing. My context usually involves large and complex JSON or XML documents, but I still basically force everything to lists since then you write and test just one set of operations.

# ? Jun 21, 2018 22:15

necrotic: Aug 2, 2005; I owe my brother big time for this!

Based on the name I would think not, but the name could be more clear with like get_top_item_from_queue. Always returning a list for even single item gets seems annoying as hell from a usage perspective. Don't make me destructure/unwrap every time I use a method.

# ? Jun 21, 2018 22:21
