Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

Chin Strap posted:

It isn't as good though
I might check it out, thanks.

StumblyWumbly posted:

I've found that Plotly makes better graphs more easily compared to Matplotlib.
Plotly is really good for a lot of things.

Matplotlib's strength is its weakness; it can do everything. But its API can be cumbersome because of that. I like Seaborn for making the most common types of plots. It's nice because it's a wrapper around Matplotlib, so you can still add extra fancy customizations to its Figure objects if you want to.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
Related to all this pandas discussion: what's y'all's take on the future of pandas now that polars exists? I know that polars doesn't yet do everything that pandas does (though I'm not an expert on the details here), but it does a hell of a lot, and with a nicer API and vastly more speed. I'll be curious to see polars' impact over time given that it seems to be surging in popularity.

Personally, I've switched over to polars when I can, though I often still bang something out in pandas when I need something quick and the execution speed differences don't really matter.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

Chin Strap posted:

I think "nicer API" is a bit of a stretch. Haven't used it because at Google we don't really have approved Rust support yet (cross language compilation with all the custom things we do takes a lot of support and it is still definitely alpha) but it looks more like a thin layer over SQL style operations than the pandas library. Never felt super pythonic to me to have to write something like (stolen from reddit)

code:
# pandas (assumes prices_df has a DatetimeIndex)
prices_df.loc['2023-03'] *= 1.1

# polars
from datetime import datetime
import polars as pl

polars_df.with_columns(
    pl.when(pl.col('timestamp').is_between(
        datetime(2023, 3, 1),
        datetime(2023, 3, 31),
        closed='both'
    )).then(pl.col('val') * 1.1)
    .otherwise(pl.col('val'))
    .alias('val')
)
All the pl.col stuff especially makes me unhappy. Some of this is definitely an "I'm used to it" pandas bias and an "I come from R and pandas looks more like R" bias. Any time you have meaningful indices across dataframes and want to do operations, the auto-alignment that pandas does can really clean up code compared to having to do joins everywhere in polars (if I'm messing up any of these points out of polars ignorance, someone feel free to call me out).

Also there is something to be said for a much more mature package with more books and other packages that work with it. If the dataframe interchange protocols get off the ground there is less of an argument here. And you can always convert directly to a pandas dataframe for those api boundaries I guess (but I'm guessing that doing that too often will cause slowdowns too).

But I can't deny the speed. And there is definitely something to the fact that the API could be built without an extreme amount of baggage to carry around. Pandas is trying its best to make things more consistent and streamlined, but deprecation and removal take forever. Copy-on-write in Pandas 2 is probably going to become the default, in which case by Pandas 3 they are liable to remove all inplace operations and make things immutable the way Polars does, which would be a big step forward.

I know some people don't like the chaining syntax of Polars, but that's how modern Pandas should be written more often than not anyway (and again how you see it with tidyverse stuff in R so it is the style I am used to).

I am a Pandas contributor though so part of my Polars reluctance may also just be clinging to what I know. I certainly would be giving polars a shot if I could, but would probably wind up going with pandas for non-performance critical parts because of the syntax.
Thanks for the perspective (and for contributing to pandas!). I agree 100% that using a well-established, mature package has a lot to be said for it. Basically everything in the Python data world supports pandas, or even relies on it. Given that, I should probably just get a lot better at pandas, and (at least for now) use polars for use cases when performance matters a lot. I'll check out that book you recommended.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
RealPython has a good article on list comprehensions: https://realpython.com/list-comprehension-python/

(Their stuff is usually good, FYI)

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
They're making it so you can turn off that "feature" pretty soon.

No, Excel, I did not mean "October 1, 1950" when I entered "10-50"

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
When plotting stuff with Matplotlib/Seaborn, is there a simple way to ensure that the legend always winds up in the same spot outside of the plot(s)? I generally use the loc='upper right', bbox_to_anchor=(X, 1) arguments to get the legend outside and to the right of the plots, where X is some number like 1.2 or 1.4, but then I need to keep manually tweaking X to get the legend exactly where I want it (with its top roughly aligned to the top of the plots, and a bit of whitespace between the right border of the plots and the left border of the legend). This of course involves constantly regenerating the plot(s), which may or may not be fast depending on what I'm doing. Surely there is a less annoying way to do this?
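
One thing that might help, as a sketch rather than a definitive answer: with loc='upper right', bbox_to_anchor positions the legend's upper-right corner, so the right X depends on how wide the legend is. Anchoring the legend's upper-left corner instead (loc='upper left') at a point just past the right edge of the axes should keep the gap fixed regardless of legend width. The function name, the data, and the 1.02 offset below are illustrative assumptions.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_with_outside_legend():
    """Hypothetical example: legend pinned just outside the right edge of the axes."""
    xs = [i / 10 for i in range(100)]
    fig, ax = plt.subplots()
    ax.plot(xs, [math.sin(x) for x in xs], label="sin")
    ax.plot(xs, [math.cos(x) for x in xs], label="cos")
    # loc names the corner of the legend box that is placed at bbox_to_anchor.
    # (1.02, 1.0) is in axes coordinates: slightly right of the axes, level with the top.
    # borderaxespad=0 removes the extra padding so the legend top lines up with the axes top.
    ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0), borderaxespad=0.0)
    return fig, ax
```

When saving, passing bbox_inches='tight' to fig.savefig() should keep the legend from being cropped out of the image.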

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
The official solution is really verbose, non-Pythonic code, and the verbosity doesn't add anything useful like error checking. I don't think you're missing anything.

E: It's possible the official solution is written that way for teaching purposes, since each step in the logic is explicit. Like, by taking 4 lines of code to get "front," that avoids needing to know that if you slice a string to a higher end index than the string's length, you don't get an error; you just get everything up to the end of the string.
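
A quick sketch of the slicing behavior in question. The function name and the length of 3 are assumptions for illustration, since the original exercise isn't shown here:

```python
def front(text: str, n: int = 3) -> str:
    """Return the first n characters of text, or all of it if it is shorter."""
    # Slicing past the end of a string does not raise; Python clamps the end index.
    return text[:n]
```

So front("Chocolate") gives "Cho", and front("ab") just gives "ab" with no length check needed.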

Zugzwang fucked around with this message at 06:36 on Feb 15, 2024

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
PyArrow has a Parquet module. You might try that if Polars is being uncooperative.

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

QuarkJets posted:

Hmm it looks like applying lru_cache to methods prevents instances of that class from ever getting garbage collected, that's a bummer.
What about after using method.cache_clear()?

In any case, that certainly explains weird bugs I've run into when using lru_cache.
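
For what it's worth, here is a minimal sketch of the behavior, assuming CPython and a made-up Model class. The cache key includes self, so the cache holds a strong reference to each instance until the entry is evicted or the cache is cleared:

```python
import gc
import weakref
from functools import lru_cache


class Model:
    """Hypothetical class used only for illustration."""

    @lru_cache(maxsize=None)
    def compute(self, x: int) -> int:
        # `self` is part of the cache key, so the cache keeps the instance alive.
        return x * 2


def is_alive_after_del(clear_cache: bool) -> bool:
    """Create an instance, call the cached method, drop our reference, and
    report whether the instance still exists afterwards."""
    obj = Model()
    obj.compute(1)
    ref = weakref.ref(obj)
    del obj
    if clear_cache:
        # Clears the cache shared by ALL instances of Model, not just this one.
        Model.compute.cache_clear()
    gc.collect()
    return ref() is not None
```

In this sketch the instance survives when the cache is left alone and is released once cache_clear() is called, though note that the cache is shared across every instance of the class.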

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme
DuckDB uses SQL to manipulate dataframes and is insanely performant. It interfaces nicely with pandas too.
