Python

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python

Zugzwang: Jan 2, 2005; You have a kind of sick desperation in your laugh.; Ramrod XTreme

Chin Strap posted:

It isn't as good though

I might check it out, thanks.

StumblyWumbly posted:

I've found that Plotly makes better graphs more easily compared to Matplotlib.

Plotly is really good for a lot of things.

Matplotlib's strength is its weakness; it can do everything. But its API can be cumbersome because of that. I like Seaborn for making the most common types of plots. It's nice because it's a wrapper around Matplotlib, so you can still add extra fancy customizations to its Figure objects if you want to.

# ¿ Oct 19, 2023 01:35

Zugzwang: Jan 2, 2005; You have a kind of sick desperation in your laugh.; Ramrod XTreme

Related to all this pandas discussion: what's y'all's take on the future of pandas now that polars exists? I know that polars doesn't yet do everything that pandas does (though I'm not an expert on the details here), but it does a hell of a lot, and with a nicer API and vastly more speed. I'll be curious to see polars' impact over time given that it seems to be surging in popularity.

Personally, I've switched over to polars when I can, though I often still bang something out in pandas when I need something quick and the execution speed differences don't really matter.

# ¿ Oct 20, 2023 11:40

Zugzwang: Jan 2, 2005; You have a kind of sick desperation in your laugh.; Ramrod XTreme

Chin Strap posted:

I think "nicer API" is a bit of a stretch. Haven't used it because at Google we don't really have approved Rust support yet (cross language compilation with all the custom things we do takes a lot of support and it is still definitely alpha) but it looks more like a thin layer over SQL style operations than the pandas library. Never felt super pythonic to me to have to write something like (stolen from reddit)
code:
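# Imports assumed by the snippets below
from datetime import datetime
import polars as pl

# pandas (prices_df is assumed to have a DatetimeIndex)
prices_df.loc['2023-03'] *= 1.1

# polars (with_columns returns a new frame, so assign the result)
polars_df = polars_df.with_columns(
    pl.when(pl.col('timestamp').is_between(
        datetime(2023, 3, 1),
        datetime(2023, 3, 31),
        closed='both'
    )).then(pl.col('val') * 1.1)
    .otherwise(pl.col('val'))
    .alias('val')
)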
All the pl.col stuff especially makes me unhappy. Some of this is definitely an "I'm used to it" pandas bias and an "I come from R and it looks more like R" pandas bias. Any time you have meaningful indices across dataframes and want to do operations on them, the auto-alignment that pandas does can really clean up code compared to having to do joins everywhere in polars (if I'm messing up any of these points out of polars ignorance, someone feel free to call me out).

Also there is something to be said for a much more mature package with more books and other packages that work with it. If the dataframe interchange protocols get off the ground there is less of an argument here. And you can always convert directly to a pandas dataframe for those api boundaries I guess (but I'm guessing that doing that too often will cause slowdowns too).

But I can't deny the speed. And there is definitely something to the fact that the API has been able to be built without an extreme amount of baggage it carries around. Pandas is trying its best to make things more consistent and streamlined but deprecation and removal take forever. Copy-on-write in Pandas 2 is probably going to become the default, in which case by Pandas 3 they are liable to remove all inplace operations and make things immutable the way Polars does which would be a big step forward.

I know some people don't like the chaining syntax of Polars, but that's how modern Pandas should be written more often than not anyway (and again how you see it with tidyverse stuff in R so it is the style I am used to).

I am a Pandas contributor though so part of my Polars reluctance may also just be clinging to what I know. I certainly would be giving polars a shot if I could, but would probably wind up going with pandas for non-performance critical parts because of the syntax.

Thanks for the perspective (and for contributing to pandas!). I agree 100% that using a well-established, mature package has a lot to be said for it. Basically everything in the Python data world supports pandas, or even relies on it. Given that, I should probably just get a lot better at pandas, and (at least for now) use polars for use cases when performance matters a lot. I'll check out that book you recommended.
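For anyone following along, here is a small sketch of the index auto-alignment mentioned in the quoted post. The series names, labels, and the `net_prices` helper are made up for illustration; the only thing assumed from pandas is that arithmetic between two Series matches rows by index label rather than by position.

```python
import pandas as pd

# Hypothetical data: the two Series have different orderings and
# "b" has no discount at all.
prices = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])
discounts = pd.Series([1.0, 2.0], index=["c", "a"])


def net_prices(prices: pd.Series, discounts: pd.Series) -> pd.Series:
    """Subtract discounts from prices, matching rows by index label."""
    # .sub aligns on labels; fill_value=0.0 treats a missing discount as zero.
    return prices.sub(discounts, fill_value=0.0)


# Plain subtraction also aligns on labels, but leaves NaN where a label
# exists on only one side.
raw_difference = prices - discounts
```

Without an index, the same operation would need an explicit join on a key column first, which is the extra step the quoted post is complaining about.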

# ¿ Oct 21, 2023 02:12

Zugzwang: Jan 2, 2005; You have a kind of sick desperation in your laugh.; Ramrod XTreme

RealPython has a good article on list comprehensions: https://realpython.com/list-comprehension-python/

(Their stuff is usually good, FYI)

# ¿ Oct 21, 2023 13:47

Zugzwang: Jan 2, 2005; You have a kind of sick desperation in your laugh.; Ramrod XTreme

They're making it so you can turn off that "feature" pretty soon.

No, Excel, I did not mean "October 1, 1950" when I entered "10-50"

# ¿ Nov 8, 2023 00:46

Zugzwang: Jan 2, 2005; You have a kind of sick desperation in your laugh.; Ramrod XTreme

When plotting stuff with Matplotlib/Seaborn, is there a simple way to ensure that the legend always winds up in the same spot outside of the plot(s)? I generally use the loc='upper right', bbox_to_anchor=(X, 1) arguments to get the legend outside and to the right of the plots, where X is some number like 1.2 or 1.4, but then I need to keep manually tweaking X to get the legend exactly where I want it (with its top roughly aligned to the top of the plots, and a bit of whitespace between the right border of the plots and the left border of the legend). This of course involves constantly regenerating the plot(s), which may or may not be fast depending on what I'm doing. Surely there is a less annoying way to do this?
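One possible way around the manual tweaking, sketched under the assumption that the legend's width is what keeps changing: anchor the legend's upper left corner (rather than its upper right) just past the right edge of the axes. The legend then grows to the right, so the gap between plot and legend no longer depends on how wide the legend is. The function name and the 1.02 offset are illustrative choices, not anything official.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs anywhere
import matplotlib.pyplot as plt


def plot_with_outside_legend():
    """Draw two lines and pin the legend just outside the right edge."""
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4], label="quadratic")
    ax.plot([0, 1, 2], [0, 1, 2], label="linear")
    # loc names the corner of the legend box that gets placed at
    # bbox_to_anchor, which is given in axes coordinates: x=1.02 is
    # slightly right of the axes, y=1.0 is the top of the axes.
    legend = ax.legend(
        loc="upper left",
        bbox_to_anchor=(1.02, 1.0),
        borderaxespad=0.0,
    )
    return fig, ax, legend
```

When saving, `fig.savefig(path, bbox_inches="tight")` should keep the legend from being cropped, since it sits outside the axes.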

# ¿ Dec 5, 2023 09:05

Zugzwang: Jan 2, 2005; You have a kind of sick desperation in your laugh.; Ramrod XTreme

The official solution is really verbose, non-Pythonic code, and the verbosity doesn't add anything useful like error checking. I don't think you're missing anything.

E: It's possible the official solution is written that way for teaching purposes, since each step in the logic is explicit. Like, by taking 4 lines of code to get "front," that avoids needing to know that if you slice a string to a higher end index than the string's length, you just get whatever's in the string.
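A quick illustration of the slicing behavior mentioned above. The function name `first_chars` is just a placeholder, not the name used in the exercise.

```python
def first_chars(text: str, count: int = 3) -> str:
    """Return up to `count` leading characters of `text`."""
    # Slicing never raises for an out-of-range end index; it simply
    # stops at the end of the string.
    return text[:count]
```

So `first_chars("hi")` gives back `"hi"` without any length check, whereas indexing with `"hi"[2]` would raise an IndexError.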

Zugzwang fucked around with this message at 06:36 on Feb 15, 2024

# ¿ Feb 15, 2024 06:30

Zugzwang: Jan 2, 2005; You have a kind of sick desperation in your laugh.; Ramrod XTreme

PyArrow has a Parquet module. You might try that if Polars is being uncooperative.

# ¿ Feb 21, 2024 21:49

Zugzwang: Jan 2, 2005; You have a kind of sick desperation in your laugh.; Ramrod XTreme

QuarkJets posted:

Hmm it looks like applying lru_cache to methods prevents instances of that class from ever getting garbage collected, that's a bummer.

What about after using method.cache_clear()?

In any case, that certainly explains weird bugs I've run into when using lru_cache.
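Here is a minimal sketch of the behavior being discussed, assuming CPython. The `Report` class and the `is_alive_after_del` helper are invented for the demo. The cache lives on the function object at class level and uses `self` as part of each key, so it holds a strong reference to every instance that has called the method.

```python
import functools
import gc
import weakref


class Report:
    def __init__(self, n: int) -> None:
        self.n = n

    @functools.lru_cache(maxsize=None)
    def total(self) -> int:
        # `self` is part of the cache key, so the cache keeps it alive.
        return sum(range(self.n))


def is_alive_after_del(clear_cache: bool) -> bool:
    """Report whether an instance survives after its last name is deleted."""
    obj = Report(10)
    obj.total()  # populate the cache with an entry keyed on obj
    ref = weakref.ref(obj)
    del obj
    if clear_cache:
        # Drops every cached entry, for all instances, not just this one.
        Report.total.cache_clear()
    gc.collect()
    return ref() is not None
```

In this sketch the instance stays alive when the cache is left alone and is freed once `cache_clear()` runs, so clearing does release the references, at the cost of throwing away cached results for every other instance too.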

# ¿ Mar 11, 2024 04:00

Zugzwang: Jan 2, 2005; You have a kind of sick desperation in your laugh.; Ramrod XTreme

DuckDB uses SQL to manipulate dataframes and is insanely performant. It interfaces nicely with pandas too.

# ¿ Mar 23, 2024 13:59
