SurgicalOntologist
Jun 17, 2004

CarForumPoster posted:

I can't think of a super easy way to do this and I bet there is one.

I have a pandas df:
code:
id|name|attrs
123|bob|[cool,old,man]
456|dave|[uncool,old,man]
I want:
code:
id|name|attrs
123|bob|cool
123|bob|old
123|bob|man
456|dave|uncool
456|dave|old
456|dave|man
What's the pythonic/pandas way to do this for a list (df['attrs'][index]) of arbitrary length?

There may be a shortcut or more clever way, but if you can assume that every list has the same length, you can use the .str accessor to access list items (I never understood why this is in the string accessor, but it's handy).

Python code:
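import pandas as pd

n = len(df['attrs'].iat[0])  # assumes every list has this same length

# Unpack each list item to its own column.
# (df['attrs'] rather than df.attrs: newer pandas has a DataFrame.attrs metadata dict.)
for i in range(n):
    df[f'attr{i}'] = df['attrs'].str[i]

# Convert wide to long.
new_df = pd.melt(
    df.drop('attrs', axis='columns'),
    id_vars=['id', 'name'],
    var_name='attr_ix',
    value_name='attr',
)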
This will include a column 'attr_ix' with values like ['attr0', 'attr1', ...], so if you don't want it, just add .drop('attr_ix', axis='columns').
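A self-contained run of the loop-plus-melt approach on the sample data from the question. It's a sketch that still assumes every list has the same length. The `sort_values` step is my addition: melt stacks all the `attr0` values first, then all the `attr1` values, so sorting puts each person's rows back together.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [123, 456],
    'name': ['bob', 'dave'],
    'attrs': [['cool', 'old', 'man'], ['uncool', 'old', 'man']],
})

# Assumes all lists have the same length as the first one.
n = len(df['attrs'].iat[0])

# One column per list position: attr0, attr1, ...
for i in range(n):
    df[f'attr{i}'] = df['attrs'].str[i]

# Wide to long, regroup rows per person, then drop the helper column.
new_df = (
    pd.melt(
        df.drop('attrs', axis='columns'),
        id_vars=['id', 'name'],
        var_name='attr_ix',
        value_name='attr',
    )
    .sort_values(['id', 'attr_ix'])
    .drop('attr_ix', axis='columns')
    .reset_index(drop=True)
)

print(new_df)
```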

I'm 95% sure this is correct, but something weird may happen with the indices; melt sometimes takes several tries to get right.

Edit: After writing this I googled it and found some clever solutions based on df['attrs'].apply(pd.Series), which I didn't realize would work. That only replaces the loop in my code, although I think it would work better with varying-size lists.
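A sketch of the expand-then-melt idea for lists of different lengths. For the expansion step I've used `pd.DataFrame(series.tolist())` rather than `apply(pd.Series)`. It's a common alternative that produces the same columns and pads shorter lists with missing values. The `dropna` step then discards that padding. The sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [123, 456],
    'name': ['bob', 'dave'],
    'attrs': [['cool', 'old', 'man'], ['uncool', 'old']],  # ragged on purpose
})

# Expand lists into columns attr0, attr1, ...; short lists are padded with None.
wide = pd.DataFrame(df['attrs'].tolist(), index=df.index).add_prefix('attr')
wide = pd.concat([df[['id', 'name']], wide], axis='columns')

long_df = (
    wide.melt(id_vars=['id', 'name'], var_name='attr_ix', value_name='attr')
    .dropna(subset=['attr'])          # drop the padding from shorter lists
    .sort_values(['id', 'attr_ix'])   # regroup each person's rows
    .drop('attr_ix', axis='columns')
    .reset_index(drop=True)
)
```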

SurgicalOntologist fucked around with this message at 14:39 on Sep 14, 2019

CarForumPoster
Jun 26, 2013

⚡POWER⚡

SurgicalOntologist posted:

There may be a shortcut or more clever way, but if you can assume that every list has the same length, you can use the .str accessor to access list items (I never understood why this is in the string accessor, but it's handy).

Melt looks like what I want; I never thought to describe it as unpivoting. Thank you!

duck monster
Dec 15, 2004

Meyers-Briggs Testicle posted:

I've never programmed in an enterprise environment but probably want to do that now. Are ETL and Airflow something I can learn shallowly and easily enough to slap on my resume, or would someone's bullshit meter immediately be flipped?

Stick with what you know. ETL is part of the whole enterprise-bus type bullshit, and unless you're specifically aiming at working with enterprise buses, it's poo poo you pick up pretty quickly on the job.

I've never seen Airflow used in the wild. I'm sure it is, but I haven't encountered it yet.

Don't sweat the details. They aren't looking for how many buzzwords you can write; they just want to know if you're an intuitive coder who can fit into a bigger team without imploding from structural confusion.

a foolish pianist
May 6, 2007

(bi)cyclic mutation

CarForumPoster posted:

I can't think of a super easy way to do this and I bet there is one.

What's the pythonic/pandas way to do this for a list (df['attrs'][index]) of arbitrary length?

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html#pandas.DataFrame.explode
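A minimal example of `DataFrame.explode` (added in pandas 0.25) on data shaped like the question's. Each list element becomes its own row, and the other columns are repeated. Lengths don't have to match.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [123, 456],
    'name': ['bob', 'dave'],
    'attrs': [['cool', 'old', 'man'], ['uncool', 'old']],
})

# One row per list element; the original index label is repeated for each,
# so reset it if you want a clean 0..n-1 index.
exploded = df.explode('attrs').reset_index(drop=True)
print(exploded)
```

One thing to watch for: an empty list comes out as a single row with NaN in that column, not as zero rows.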

CarForumPoster
Jun 26, 2013

This owns, exactly what I want, thanks!

I love you, thread; you're so much nicer to me than Stack Overflow.

SurgicalOntologist
Jun 17, 2004

Glad to know about that one, thanks!

susan b buffering
Nov 14, 2016

KICK BAMA KICK posted:

None of those books/videos look like the standard "learn Python" recommendations (IDK anything about those fields so I'm not sure if the ones specifically about like Pandas would be relevant to your interests). But PyCharm is the most common IDE for Python, and I think some of the features the Professional edition adds do relate to math/scientific/data libraries like NumPy, which I'm guessing you'd find yourself using at some point, so those licenses in the bundles are a great value (like $89 for a year standard but maybe you can swing an educational discount?). My suggestion: set the reminder on that bundle, install the free edition of PyCharm, start learning from whatever resource you find/someone else here recommends, and if you think you're gonna keep using it grab that 6-month Professional license at the $20 tier, can't lose at that price.

Definitely try what QuarkJets said but if you must compile for whatever reason, this worked for me on a 3B (with maybe one or two substitutions even I was able to figure out, and I am dumb) and includes the flags that enable some optimizations for the Pi's chips.

PyCharm Pro is free for students (along with the rest of their IDEs).

Evrart Claire
Jan 11, 2008

KICK BAMA KICK posted:

But PyCharm is the most common IDE for Python, and I think some of the features the Professional edition adds do relate to math/scientific/data libraries like NumPy, which I'm guessing you'd find yourself using at some point, so those licenses in the bundles are a great value (like $89 for a year standard but maybe you can swing an educational discount?).

Thanks! I'm not a student currently, so I don't think I can swing any educational discounts. I graduated in 2017 and had to work some real poo poo jobs for a couple of years; now I'm just an office assistant at a state government agency. I was broke and without a car when I graduated, which limited where I could actually get to for interviews, and the first couple of interviews I had for PhD positions went lovely enough to really kill my confidence for a while.

Dominoes
Sep 20, 2007

Hail Mary: Has anyone successfully compiled a portable Windows CPython? When I run the build.bat script, the install seems to tie into the system path, causing it to break when the folder it was built in is moved. This is perhaps a deceptively hard problem.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Dominoes posted:

Hail Mary: Has anyone successfully compiled a portable Windows CPython? When I run the build.bat script, the install seems to tie into the system path, causing it to break when the folder it was built in is moved. This is perhaps a deceptively hard problem.

I ran across this a week or so ago, maybe it can help?

https://github.com/indygreg/python-build-standalone

Dominoes
Sep 20, 2007

Thermopyle posted:

I ran across this a week or so ago, maybe it can help?

https://github.com/indygreg/python-build-standalone

Awesome! Looks like exactly what I was looking for. I was able to get a working Windows one by downloading an official Python package from nuget.org, i.e. extracting a standalone Visual Studio Python for Windows. I've learned this is a solved problem for Windows, but still an open question for an "any-linux" Python distro. I wonder if I could use the link you posted to at least build Ubuntu-wide, Debian-wide, etc. distros.

Dominoes fucked around with this message at 16:19 on Sep 18, 2019

CarForumPoster
Jun 26, 2013

I love this thread

KICK BAMA KICK
Mar 2, 2009

Wow, "Obey the Testing Goat" is really good. I wish I'd read it before starting my Django project. I doubt I'll fully commit to TDD, but the ideas of doing one thing at a time (and no, seriously, idiot, only do one thing at a time!) seem really valuable. And this is sticking with me more than any other textbook-ish thing I've ever read on coding. Using something lightweight that approaches the form of a real-world application, and actually walking you through real-world deployment, is a lot more compelling than the "a Bunny is a type of Animal" illustrations a lot of books use.

NinpoEspiritoSanto
Oct 22, 2013




I recently started with TDD and pytest. It was completely alien to me, but after reading the Clean Architecture book and watching the YouTube video on the topic aimed at Python, I decided to give it a go.

It's taken some getting my head around but so far I'm really liking how it makes me think about the problem at hand and the code in terms of data.

Also +1 for the goat. I also recommend looking at Hypothesis if you're interested in testing. It's excellent for catching edge cases, and there's a tool out there that generates Hypothesis test scenarios from type annotations, if you use those. I'm away from my PC atm; I'll reply with the name when I have it.
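A tiny red/green example in pytest style. The tests are plain functions named `test_*` that use bare `assert`, and pytest collects them automatically. `slugify` is a made-up function. The point is the order: write the tests first, then write the smallest code that makes them pass.

```python
# Step 1 (red): write the tests before the code exists.
def test_slugify_lowercases_and_joins_words():
    assert slugify("Hello World") == "hello-world"


def test_slugify_strips_surrounding_whitespace():
    assert slugify("  Obey the Goat  ") == "obey-the-goat"


# Step 2 (green): the simplest implementation that makes both tests pass.
def slugify(text: str) -> str:
    """Lowercase text and join its whitespace-separated words with hyphens."""
    return "-".join(text.lower().split())
```

Running `pytest` against a file like this would collect and run both tests. The next test you write (punctuation handling, say) drives the next small change.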

NinpoEspiritoSanto fucked around with this message at 02:16 on Sep 19, 2019

Dominoes
Sep 20, 2007

Be careful - like with other paradigms (eg OOP, functional programming etc), it's easy to over-apply, especially when just learning it. Don't let writing tests slow down your prototyping and feature adding. related

Dominoes fucked around with this message at 04:35 on Sep 19, 2019

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
I could have sworn that you could right-click an .ipynb file in PyCharm and convert it into a normal .py file, but the option doesn't show up and I can't find anything about it in the settings. Am I misremembering or something?

CarForumPoster
Jun 26, 2013

Boris Galerkin posted:

I could have sworn that you could right click a ipynb file in PyCharm and convert it into a normal py file but the option doesn't show up and I can't find anything about it in the settings. Am I misremembering or something?

When I open them in PyCharm Pro they open side by side as a regular .py file and a jupyter notebook.

death cob for cutie
Dec 30, 2006

dwarves won't delve no more
too much splatting down on Zot:4

Thermopyle posted:

Keep us updated!

Figured it out: in the situations where I could manipulate a list in the session with no problem, I was also modifying something else, such as an int or string stored in the session, or adding/removing a key.
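This matches how Django documents its sessions, assuming this is Django (Flask's session has a similar `modified` flag). A session is only saved when one of its keys is assigned or deleted. Mutating a list stored inside it goes unnoticed unless something else marks the session as modified. Here is a toy model of that mechanism. It is not Django's code.

```python
class ToySession(dict):
    """Toy stand-in for a framework session: only top-level writes set `modified`."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.modified = False

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.modified = True

    def __delitem__(self, key):
        super().__delitem__(key)
        self.modified = True


session = ToySession(cart=['apple'])

session['cart'].append('pear')   # in-place list mutation: __setitem__ never runs
print(session.modified)          # False -> a real session would skip saving

session['count'] = 1             # top-level assignment marks it dirty
print(session.modified)          # True  -> the list change gets saved "by accident"
```

Django documents two fixes: reassign the key after changing it (`request.session['cart'] = cart`), or set `request.session.modified = True`. The `SESSION_SAVE_EVERY_REQUEST` setting also works, but it saves on every request.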

CarForumPoster
Jun 26, 2013

TLDR: How can I have the functionality of df.drop_duplicates() but with mixed data types?

Details:
I have a DataFrame (df1) that needs to have rows added to it from df2 if they're not already in it. I figure the easiest way to do this is to merge the two DFs, then drop_duplicates().

Something like:
code:
df_all = df1.merge(df2, on=['col1','col2','col3'], how='left', indicator=True)
df_all.drop_duplicates(keep='first', inplace=True)
Sadly, both my dataframes have many different datatypes, including lists, dicts, nested dataframes, etc. Those aren't hashable, so drop_duplicates() raises TypeError: unhashable type.

How can I have the functionality of drop_duplicates but with mixed data types? Surprisingly SO didn't have an obvious easy answer when I googled it.

EDIT: Found this right after hitting reply: df_all.duplicated(subset=column_list), where column_list names only the hashable columns, tells me which rows are duplicates, so I just remove those.
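A sketch of the approach from the edit. The column names and data are made up, and `col1`/`col2` are assumed to be the hashable key columns. `duplicated(subset=...)` only hashes the subset, so list and dict columns elsewhere don't cause problems. `drop_duplicates(subset=...)` takes the same argument. Putting `df1` first and keeping the first occurrence means existing rows win, and only new keys from `df2` are added.

```python
import pandas as pd

df1 = pd.DataFrame({
    'col1': [1, 2],
    'col2': ['a', 'b'],
    'payload': [[1, 2], {'k': 'v'}],   # unhashable values
})
df2 = pd.DataFrame({
    'col1': [2, 3],
    'col2': ['b', 'c'],
    'payload': [[9], [3, 4]],
})

key_cols = ['col1', 'col2']  # hypothetical key columns; must hold hashable values

# Stack df1 first so its rows win, then drop later rows whose keys repeat.
combined = pd.concat([df1, df2], ignore_index=True)
df_all = combined[~combined.duplicated(subset=key_cols, keep='first')]
```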

CarForumPoster fucked around with this message at 19:49 on Sep 20, 2019

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Boris Galerkin posted:

I could have sworn that you could right click a ipynb file in PyCharm and convert it into a normal py file but the option doesn't show up and I can't find anything about it in the settings. Am I misremembering or something?

VS Code's Python tools will do this.

vikingstrike
Sep 23, 2007

whats happening, captain
You can also do it in the Jupyter web interface
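Outside an IDE, the usual command-line route is `jupyter nbconvert --to script notebook.ipynb`. If you only need the code cells and don't want to depend on nbconvert, remember that a notebook is just JSON. The sketch below assumes the nbformat 4 layout: a top-level `cells` list whose entries have `cell_type` and `source`. It ignores markdown cells, outputs, and IPython magics.

```python
import json


def notebook_to_script(nb_text: str) -> str:
    """Concatenate the code cells of an nbformat-4 notebook into one script."""
    nb = json.loads(nb_text)
    chunks = []
    for cell in nb.get('cells', []):
        if cell.get('cell_type') != 'code':
            continue  # skip markdown/raw cells
        source = cell.get('source', '')
        # nbformat stores source either as one string or as a list of lines.
        if isinstance(source, list):
            source = ''.join(source)
        chunks.append(source.rstrip('\n'))
    return '\n\n'.join(chunks) + '\n'
```

Usage would be along the lines of `Path('analysis.ipynb').with_suffix('.py').write_text(notebook_to_script(Path('analysis.ipynb').read_text(encoding='utf-8')))`, where the file name is hypothetical.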

Boris Galerkin
Dec 17, 2011

I have a number 1.23 and I want to round it up to 1.3. Is there a built-in or NumPy function to do this? This is the best I've got:

code:
import math

x = 1.23
rounded = math.floor(x) + math.ceil(10 * (x - math.floor(x))) / 10

print(rounded)  # 1.3
Similarly, how would I round 1.29 down to 1.2?

code:
y = 1.29
rounded = y // 0.1 * 0.1
At least rounding down is less :wtf: than rounding up.
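A shorter version of the same idea: scale by a power of ten, apply `math.ceil` or `math.floor`, and scale back. The helper names are made up. Any float-based version inherits representation error. For example, `y // 0.1 * 0.1` above actually returns 1.2000000000000002 for 1.29, and `1.0 // 0.1` is 9.0, because 0.1 has no exact binary representation. Likewise, `0.1 + 0.2` is stored as 0.30000000000000004, so `round_up(0.1 + 0.2)` gives 0.4.

```python
import math


def round_up(x: float, decimals: int = 1) -> float:
    """Round x toward +infinity to the given number of decimal places."""
    factor = 10 ** decimals
    return math.ceil(x * factor) / factor


def round_down(x: float, decimals: int = 1) -> float:
    """Round x toward -infinity to the given number of decimal places."""
    factor = 10 ** decimals
    return math.floor(x * factor) / factor


print(round_up(1.23))    # 1.3
print(round_down(1.29))  # 1.2
```

The same scaling works elementwise with `np.ceil`/`np.floor` if the values are in a NumPy array.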

Boris Galerkin fucked around with this message at 11:19 on Sep 24, 2019

Boris Galerkin
Dec 17, 2011

CarForumPoster posted:

When I open them in PyCharm Pro they open side by side as a regular .py file and a jupyter notebook.

This must be a new feature then. Coulda sworn before I had to right click the file and save/convert to a python script.

the yeti
Mar 29, 2008

memento disco



Boris Galerkin posted:

I have a number 1.23 and I want to round it up to 1.3. Is there a built-in or NumPy function to do this?

Similarly, how would I round 1.29 down to 1.2?

Just off the cuff take a look at realpython.com/Python-rounding

The methods there are similar to what you’re doing but a little cleaner (I think? It’s early) and there might be one in there that covers both of your cases.

Solumin
Jan 11, 2013
round(number+0.5, 1). Annoyingly, round doesn't let you choose which direction to round to, but it does let you choose the number of digits.

Boris Galerkin
Dec 17, 2011

the yeti posted:

Just off the cuff take a look at realpython.com/Python-rounding

The methods there are similar to what you’re doing but a little cleaner (I think? It’s early) and there might be one in there that covers both of your cases.

Thanks, I guess that answers my question on whether or not there’s a built in round up/down function I couldn’t find.

Solumin posted:

round(number+0.5, 1). Annoyingly, round doesn't let you choose which direction to round to, but it does let you choose the number of digits.

That doesn’t really work though. Putting in 0.6 would round down to 1.0, but putting in 0.4 would round up to 1.0.

Anyway, it's solved now. If anyone was wondering why I wanted this, it's because I was trying to make axis ticks/labels that extend slightly beyond the min and max values. Though now that I think about it, I'm not sure whether matplotlib has something that does this automatically, because I didn't think to look…
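A sketch of the axis-limit use case: snap the data's min down and its max up to the nearest multiple of a step. It goes through `Decimal(str(v))` so that 1.23 is treated as 1.23 rather than its binary approximation. `axis_limits` is a made-up name. matplotlib also has an `axes.autolimit_mode` rcParam, and its `"round_numbers"` setting expands limits out to tick locations. That may be all you need.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR


def axis_limits(values, step="0.1"):
    """Return (low, high) covering all values, snapped outward to multiples of step."""
    q = Decimal(step)
    lo = Decimal(str(min(values)))
    hi = Decimal(str(max(values)))
    # Count steps, round the count outward to an integer, then scale back.
    lo_snapped = (lo / q).to_integral_value(rounding=ROUND_FLOOR) * q
    hi_snapped = (hi / q).to_integral_value(rounding=ROUND_CEILING) * q
    return float(lo_snapped), float(hi_snapped)
```

The result could be passed to something like `ax.set_ylim(*axis_limits(data))`.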

OnceIWasAnOstrich
Jul 22, 2006

Python code:
from decimal import Decimal, ROUND_UP

x = Decimal(1.23)
x.quantize(Decimal('0.1'), rounding=ROUND_UP)
Not exactly convenient.

Edit:

Doing this also exposes the weird Python float magic.

Python code:
x = 1.23
str(x)
#'1.23'
Decimal(x)
#Decimal('1.229999999999999982236431605997495353221893310546875')
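Two details are worth knowing here. First, building the Decimal from `str(x)` (or a string literal) avoids the long binary expansion shown above. Second, `ROUND_UP` in the decimal module means "away from zero". For negative numbers, `ROUND_CEILING` (toward +infinity) is the one that matches "round up", and `ROUND_FLOOR` is the matching round-down. A small sketch, with a made-up helper name:

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_UP


def quantize_float(x: float, places: str = "0.1", rounding: str = ROUND_CEILING) -> Decimal:
    """Round x to `places` via its short repr (str(1.23) == '1.23')."""
    return Decimal(str(x)).quantize(Decimal(places), rounding=rounding)


print(quantize_float(1.23))                        # 1.3  (toward +infinity)
print(quantize_float(1.29, rounding=ROUND_FLOOR))  # 1.2  (toward -infinity)
print(quantize_float(-1.23))                       # -1.2 (ceiling)
print(quantize_float(-1.23, rounding=ROUND_UP))    # -1.3 (away from zero)
```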

OnceIWasAnOstrich fucked around with this message at 16:58 on Sep 24, 2019

Thermopyle
Jul 1, 2003

The decimal module is a good thing that more people should be aware of.

I use it all the time.

mbt
Aug 13, 2012

x = 1.23
roundup = float(str(x)[:-1])+0.1

:v:

Thermopyle
Jul 1, 2003

OnceIWasAnOstrich posted:

Not exactly convenient.

Always Be Making Functions

Malcolm XML
Aug 8, 2009

float(f"{val:0.1f}")

mr_package
Jun 13, 2000
I write code on macOS and deploy to Win/Mac/Linux. If I use venv to create a virtual environment locally on my MBP, doesn't it still require massaging the target system quite a bit? The Python binary venv creates on my system isn't going to just copy over to Linux or Windows, right? So... I need to run venv on the target system first and then copy the virtual environment over? Or do I have to run something else that does the deployment (installs whatever is in requirements.txt?) instead of just copying files? If it's RH/CentOS, it's not going to have Python 3 by default, and even the latest RHEL ships Python 3.6. So if I'm on 3.7.4 and my virtual environment is too, I need to take some steps to prepare the server to support a 3.7.4 virtual environment, yeah?

There must be some simple/good workflow that doesn't require me to install a bunch of modules everywhere I want to deploy a small script? I know this comes up all the time; I'm just looking for the "one-- and only one-- right way" to do this. I'd love for it to be venv since it's built into the standard library; I don't want to do a bunch of manual steps on every system before I can run my stuff.

Edit: looks like pip / requirements.txt is probably the simplest option? But I still have to get the version of Python I need onto each system.
https://stackoverflow.com/questions/48876711/how-to-deploy-flask-virtualenv-into-production

Edit2: also, this: https://www.scylladb.com/2019/02/14/the-complex-path-for-a-simple-portable-python-interpreter-or-snakes-on-a-data-plane/

I'm surprised how much work it seems to be to solve this problem, still.
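A minimal sketch of the pip / requirements.txt route. It assumes a suitable Python (e.g. 3.7) is already installed on the target machine, which is the hard part. The venv is built on the target by whichever interpreter runs the code and is never copied between machines. The function names and the `.venv` location are my own choices.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Interpreter path inside a venv; the layout differs on Windows."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def bootstrap(project_dir: Path, requirements: str = "requirements.txt") -> Path:
    """Create project_dir/.venv on this machine and pip-install the requirements.

    Run it with the interpreter you want the env based on (e.g. python3.7).
    """
    env_dir = project_dir / ".venv"
    if not env_dir.exists():
        # with_pip=True bootstraps pip into the new env via ensurepip.
        venv.EnvBuilder(with_pip=True).create(env_dir)
    req_file = project_dir / requirements
    if req_file.exists():
        subprocess.run(
            [str(venv_python(env_dir)), "-m", "pip", "install", "-r", str(req_file)],
            check=True,
        )
    return env_dir
```

On each target you'd copy the project over without its `.venv`, then call `bootstrap(Path('/path/to/project'))` using the target's interpreter.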

mr_package fucked around with this message at 21:35 on Sep 24, 2019

NinpoEspiritoSanto
Oct 22, 2013




Check out poetry, from the author of pendulum.

Thermopyle
Jul 1, 2003

mr_package posted:

I write code on MacOS and deploy to Win/Mac/Linux. If I use venv to create a virtual environment locally on my MBP, doesn't it still necessitate massaging the target system quite a bit?

There must be some simple/good workflow that doesn't require I install a bunch of modules everywhere I want to deploy a small script?

Python is not very good for this. (Stares enviously at Go)

Look into Nuitka to build a standalone exe from Python code.

the docs posted:

It translates the Python into a C level program that then uses libpython and a few C files of its own to execute in the same way as CPython does.

Dominoes
Sep 20, 2007

mr_package posted:

I write code on MacOS and deploy to Win/Mac/Linux. If I use venv to create a virtual environment locally on my MBP, doesn't it still necessitate massaging the target system quite a bit? The Python binary venv creates on my system isn't going to just copy over to Linux or Windows, right?

Python installs and venvs are fairly environment-specific. Assuming 64-bit, you can build one that will work on any version of Windows or any version of Mac, or on a few related Linux OSes (and in the future, maybe any Linux OS), and copy it as you describe. Unfortunately, you can't do this from Mac to Windows or Linux, as you're asking. Additionally, many Python dependencies are built OS-specific, but these generally have manylinux variants. You'll need to install the Python version you need on the deployment system and set up a venv there.

The project I posted about recently addresses this, but it can't currently install Python on Mac, or on Linux OSes other than Ubuntu/Debian/CentOS/RH/Kali. Even if I set it up to manage Python on other Linux OSes, if the system doesn't have the Python version you want, you'll still be packaging a whole Python interpreter to run your program (~30 MB download, unpacked to ~200 MB!). And it won't let you copy and paste from one OS to another.

I think Therm hit the nail on the head - Is Python the right tool here? Of course, fighting through this may be simpler than re-writing your program.

edit: It appears that making a manylinux Python build may just involve building on an old Linux version; i.e., ones built on CentOS 7 appear to work on a range of other Linux distros...

Dominoes fucked around with this message at 13:40 on Sep 26, 2019

duck monster
Dec 15, 2004

Dominoes posted:

Be careful - like with other paradigms (eg OOP, functional programming etc), it's easy to over-apply, especially when just learning it. Don't let writing tests slow down your prototyping and feature adding. related

Cargo-cult thinking of any sort can be pretty bad. I'm maintaining a Swift app where the whole loving thing is written with the RX libraries, and dear god is it hardcore callback spaghetti. I mean, I dig functional coding, and I still occasionally treat meself to writing little things in Racket Scheme, but in a primarily OO language like Swift, overuse just leads to madness (it gets even weirder in a language like Java). At least with Python, the lack of sensible closures puts SOME kibosh on it.

duck monster fucked around with this message at 07:01 on Sep 28, 2019

Dominoes
Sep 20, 2007

I love how Python supports a number of paradigms... Although I wish it did have sensible closures!

Deadite
Aug 30, 2003

A fat guy, a watermelon, and a stack of magazines?
Family.
I don’t know if this is the right place to ask, but I am trying to read a large (750 MB) CSV file into a pandas dataframe, and it seems to be taking an unreasonably long time. I am limiting it to only 8 columns with the usecols option, but read_csv is still taking 6 minutes to read the file into Python.

I haven’t been using Python for very long and I’m coming from a SAS programming background. In SAS this file loads in a few seconds, so I feel like I am screwing something up for this to take so long. I originally tried the read_sas method to load the original 1.5 GB dataset, but I got a memory error and had to convert the file to CSV to get around that. The file only has 170k rows.

Does anyone have an idea why this is taking so long? Or is this just a normal amount of time for Python to process this file? Google/Stack Exchange are getting me nowhere.

Edit: Never mind, I moved the file from a network drive to my local drive and now it loads in 4 seconds. I guess it’s a network I/O issue and not a Python issue.
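If the file has to stay on the share, one workaround that might help is to copy it to local temp storage in a single pass and parse the local copy. This is an assumption, not something measured in the thread. The helper name is made up, and any `read_csv` keyword arguments (such as `usecols`) are passed straight through.

```python
import shutil
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_via_local_copy(remote_path, **read_csv_kwargs):
    """Copy a (possibly network-mounted) CSV to a local temp dir, then parse it."""
    with tempfile.TemporaryDirectory() as tmp:
        local_path = Path(tmp) / Path(remote_path).name
        shutil.copyfile(remote_path, local_path)  # one bulk copy over the network
        # The DataFrame is fully in memory before the temp dir is deleted.
        return pd.read_csv(local_path, **read_csv_kwargs)
```

Usage would look like `read_csv_via_local_copy(r'\\server\share\data.csv', usecols=['a', 'b'])`, where the path and column names are placeholders.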

Deadite fucked around with this message at 03:00 on Sep 28, 2019

duck monster
Dec 15, 2004

Dominoes posted:

I love how Python supports a number of paradigms... Although I wish it did have sensible closures!

It would be easy to implement a variant of Ruby's closure syntax that actually looked like it belonged in Python.

Something like

code:
class array(list):
    def each(self, callback):
        for item in self:
            callback(item)

# Proposed Ruby-style block syntax (not valid Python today):
some_array.each(): [value]
    thing(value)
Generators and the like get you halfway there, but they all seem like weirdly complicated measures compared to what could be a very simple mechanism.
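User code can't add new block syntax to Python, but a known trick gets close: a decorator that calls the decorated function immediately, once per item. Here is a sketch, where the name `each` is made up for illustration:

```python
def each(iterable):
    """Decorator factory: run the decorated function once per item, right away."""
    def run(block):
        for item in iterable:
            block(item)
        return block  # the throwaway name `_` ends up bound to the function
    return run


seen = []

@each(["a", "b", "c"])
def _(value):
    seen.append(value.upper())

print(seen)  # ['A', 'B', 'C']
```

A plain `for` loop is usually clearer. The trick mainly helps when an API expects a callback and you want multi-statement logic without defining a named function first and then passing it in.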

Dominoes
Sep 20, 2007

I apologize for the spam, but would anyone mind skimming this readme, especially the parts comparing it to existing projects? I'm trying to show why I decided to build this and how it improves on existing tools, without sounding snarky or insulting toward those tools. I think it's OK, but this is difficult to self-assess.

A more direct summary here, from anecdotes: I'm still not sure how to make Poetry work with Python 3 on Ubuntu. Pipenv's dependency resolution is unusably slow, and its installation instructions are confusing. Pyenv's docs, including the installation instructions, aren't user-friendly, and installing Python with it requires building from source on the user's machine. Pip+venv provides no proper dependency resolution and is tedious. Conda provides a nice experience, but many packages I use aren't on Conda.

Dominoes fucked around with this message at 14:26 on Sep 28, 2019
