Python

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

So I'm an experienced Python data science person, but I've always relied on my company's tooling to have a good environment. Let's say I just have a Chromebook, a Windows desktop I only interact with remotely through Chrome Remote Desktop, and no Python environment currently installed anywhere.

I'm probably going to be using JupyterHub or Colab for doing actual interactive work but would like to set up PyCharm to actually develop my code.

Would it make more sense to:

A) set it up on my Windows desktop

Or

B) get some sort of remotely hosted Linux solution I can log in to and develop on?

I would like it to be as turnkey as possible because my experience with setting this stuff up is non-existent. I just know the ins and outs of using it in the context of work.

# ¿ Apr 1, 2023 18:40

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

Seventh Arrow posted:

Would Google Colab be a possible solution?

Jupyter/Colab notebooks suck for actual development of libraries though (I use them daily for interactive and plotting stuff). They encourage way too much bad code practice and aren't an IDE of any kind.

Guess since I use VSCode anyway might as well do that personally too. I'll probably just install Linux on my desktop. Is there any reason to do dual boot over a VM?

# ¿ Apr 2, 2023 00:03

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

joebuddah posted:

Thanks for the assist.

There isn't always a decimal or a comma.
I am supposed to allow for values between 0.001 and 200,000.001

I think you are correct and I should ignore typos.

Isn't this non-distinguishable on its own?

200,002

Doesn't tell me if it is 200 with a decimal portion of 002 or 200,002 with a decimal portion of 0
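
A minimal sketch of that ambiguity, assuming the input has a single comma and no period. The helper name `readings` is made up for illustration; it just parses the same string both ways:

```python
from decimal import Decimal


def readings(text: str) -> dict[str, Decimal]:
    """Parse a one-comma string under both possible conventions."""
    return {
        # Comma treated as the decimal mark: "200,002" -> 200.002
        "comma_is_decimal_mark": Decimal(text.replace(",", ".")),
        # Comma treated as a thousands separator: "200,002" -> 200002
        "comma_is_thousands_separator": Decimal(text.replace(",", "")),
    }


for meaning, value in readings("200,002").items():
    print(meaning, value)
```

Both parses succeed, so the string alone can't say which one was meant.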

# ¿ Aug 9, 2023 14:18

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

spiritual bypass posted:

You could read backward through the string to see if there's punctuation before you've passed 3 places. Then, use that knowledge to add zeroes to the end. That ought to at least normalize the decimal part length.

But there potentially isn't a decimal

# ¿ Aug 9, 2023 14:51

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

It's bad.

# ¿ Sep 1, 2023 10:09

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

oatmealraisin posted:

This might be a dumb question, but what are the benefits of a data class over a regular class?

Free string representation, free dict conversion (`dataclasses.asdict`), lots of that sort of thing.

If what I'm doing at all looks like "maybe just holding data" I'll use it.
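
A quick sketch of those freebies using only the standard library. `Point` is a made-up example class:

```python
from dataclasses import dataclass, asdict


@dataclass
class Point:  # hypothetical data-holding class
    x: int
    y: int


p = Point(1, 2)
print(p)                  # Point(x=1, y=2)  (generated __repr__)
print(asdict(p))          # {'x': 1, 'y': 2}
print(p == Point(1, 2))   # True  (generated __eq__ compares fields)
```

A regular class would need hand-written `__init__`, `__repr__`, and `__eq__` to get the same behavior.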

# ¿ Sep 17, 2023 00:43

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

https://stackoverflow.com/questions/62919271/how-do-i-define-a-typing-union-dynamically

Use a tuple instead of a list?
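
A minimal sketch of that suggestion, assuming the goal is to build a `typing.Union` from types collected at runtime (`type_list` is a made-up name):

```python
from typing import Union

type_list = [int, str, float]  # hypothetical list assembled at runtime

# Subscripting with a tuple is equivalent to writing Union[int, str, float];
# subscripting with the list itself is rejected because a list is not a type.
DynamicUnion = Union[tuple(type_list)]

print(DynamicUnion == Union[int, str, float])  # True
```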

# ¿ Sep 21, 2023 10:22

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

Used all the time in Pandas too

# ¿ Sep 26, 2023 00:21

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

You can do like

Python code:

Gem2 = Annotated[Gem, ...]

and use that? Pick a better name of course

# ¿ Sep 27, 2023 14:16

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

A) You've got to give error output if you want help

B) This seems like a terrible class to work with. Can't you simplify things at all?

# ¿ Sep 27, 2023 15:43

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

To me it sounds like something you'd like to query. I've not used SQLite, but if the bar to set it up is really that low, it makes sense to me.
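
For a sense of how low that bar is, here is a sketch with the standard-library `sqlite3` module. The table and column names are invented for illustration and are not from the original question:

```python
import sqlite3

# An in-memory database; pass a file path instead to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("bolt", 10), ("nut", 3), ("washer", 25)],
)

# Parameterized query: names of items with more than 5 in stock.
rows = conn.execute(
    "SELECT name FROM items WHERE qty > ? ORDER BY name", (5,)
).fetchall()
print(rows)  # [('bolt',), ('washer',)]
conn.close()
```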

# ¿ Sep 28, 2023 14:34

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

BUUNNI posted:

Is there a good book or manual that goes over data analysis and visualization using python? I’m not that good at reading the official documentation for Pandas and NumPy and stuff yet.

https://store.metasnake.com/effective-pandas-book is the best book on Pandas I know of.

# ¿ Oct 18, 2023 20:12

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

Zugzwang posted:

Wes McKinney's (creator of pandas) book is available for free on his site: https://wesmckinney.com/book/

It isn't as good though

# ¿ Oct 18, 2023 21:04

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

https://docs.xarray.dev/en/latest/internals/how-to-create-custom-index.html maybe?

# ¿ Oct 19, 2023 09:02

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

Falcon2001 posted:

So it sounds like my dataframe approach isn't insane, at least for ones where visualizing it is reasonable (there was a falling sand puzzle previously that definitely benefited from that), so I'll stick with it for now. I think any time it's super slow is an example of a time where populating the whole map probably isn't a good idea anyway.

You are doing it like a multiindex and not just a tuple valued index right?
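
To illustrate the distinction, a small sketch with made-up coordinates. A MultiIndex knows about its levels, while a tuple-valued index treats each tuple as one opaque label:

```python
import pandas as pd

coords = [(0, 0), (0, 1), (1, 0), (1, 1)]  # hypothetical (y, x) pairs
values = [10, 11, 20, 21]

# MultiIndex: two named levels that can be selected independently.
multi = pd.Series(values, index=pd.MultiIndex.from_tuples(coords, names=["y", "x"]))

# Tuple-valued index: a single level whose labels happen to be tuples.
flat = pd.Series(values, index=pd.Index(coords, tupleize_cols=False))

print(multi.index.nlevels, flat.index.nlevels)  # 2 1
print(multi.loc[1])        # every entry with y == 1, indexed by x
print(multi.loc[(1, 0)])   # the single cell at y == 1, x == 0
```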

# ¿ Oct 19, 2023 19:17

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

Falcon2001 posted:

Not a tuple valued index no, just a standard 2d dataframe with custom indices - here's the generation code.
Python code:
self.df = pd.DataFrame(arr, index=range(min_y_index, max_y_index), columns=range(min_x_index, max_x_index))

Oh, I missed that you only wanted 2D. This is fine, or you could make a thin wrapper around a numpy ndarray that stores an offset for each axis, so that `x[i, j]` actually reads `arr[i - i_min, j - j_min]` on the underlying array. Could be faster for algorithms.
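
A rough sketch of such a wrapper, under the assumption that only scalar `(row, col)` lookups are needed. `OffsetGrid` and its parameters are invented names, and the half-open bounds mirror the `range(min, max)` calls in the quoted code:

```python
import numpy as np


class OffsetGrid:
    """Hypothetical thin wrapper: index a 2D array with coordinates that need not start at 0."""

    def __init__(self, y_min, y_max, x_min, x_max, fill=0):
        self.y_min, self.x_min = y_min, x_min
        # Half-open bounds, matching range(min, max).
        self.data = np.full((y_max - y_min, x_max - x_min), fill)

    def _shift(self, key):
        # Translate a logical (y, x) coordinate into a zero-based array position.
        y, x = key
        return y - self.y_min, x - self.x_min

    def __getitem__(self, key):
        return self.data[self._shift(key)]

    def __setitem__(self, key, value):
        self.data[self._shift(key)] = value


grid = OffsetGrid(-2, 3, -5, 5)
grid[-2, -5] = 7   # lands in data[0, 0]
grid[2, 4] = 9     # lands in data[4, 9]
print(grid[-2, -5], grid[2, 4])  # 7 9
```

This version does no bounds checking and does not handle slices; both would need extra work in `_shift`.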

# ¿ Oct 19, 2023 20:27

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

Zugzwang posted:

Related to all this pandas discussion: what's y'all's take on the future of pandas now that polars exists? I know that polars doesn't yet do everything that pandas does (though I'm not an expert on the details here), but it does a hell of a lot, and with a nicer API and vastly more speed. I'll be curious to see polars' impact over time given that it seems to be surging in popularity.

Personally, I've switched over to polars when I can, though I often still bang something out in pandas when I need something quick and the execution speed differences don't really matter.

I think "nicer API" is a bit of a stretch. I haven't used it, because at Google we don't really have approved Rust support yet (cross-language compilation with all the custom things we do takes a lot of support, and it is still definitely alpha), but it looks more like a thin layer over SQL-style operations than pandas does. It never felt super Pythonic to me to have to write something like this (stolen from reddit):

code:

# pandas
prices_df.loc['2023-03'] *= 1.1

# polars (assumes `import polars as pl` and `from datetime import datetime`)
polars_df.with_columns(
    pl.when(pl.col('timestamp').is_between(
        datetime(2023, 3, 1),
        datetime(2023, 3, 31),
        closed='both'
    )).then(pl.col('val') * 1.1)
    .otherwise(pl.col('val'))
    .alias('val')
)

All the pl.col stuff especially makes me unhappy. Some of this is definitely an "I'm used to it" pandas bias and an "I come from R and it looks more like R" pandas bias. Any time you have meaningful indices across dataframes and want to do operations on them, the auto-alignment that pandas does can really clean up code compared to having to do joins everywhere in polars (if I'm messing up any of these points out of polars ignorance, someone feel free to call me out).
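
A small sketch of what auto-alignment means in practice, using made-up data. The two Series share labels but not order, and pandas matches them by label before multiplying:

```python
import pandas as pd

prices = pd.Series({"a": 10.0, "b": 20.0, "c": 30.0})
qty = pd.Series({"c": 3, "a": 1, "b": 2})  # same labels, different order

# No explicit join: values are paired up by index label, not by position.
total = prices * qty
print(total)
```

Here `total["a"]` is 10.0, `total["b"]` is 40.0, and `total["c"]` is 90.0, regardless of the order the inputs were built in.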

Also there is something to be said for a much more mature package with more books and other packages that work with it. If the dataframe interchange protocols get off the ground there is less of an argument here. And you can always convert directly to a pandas dataframe for those api boundaries I guess (but I'm guessing that doing that too often will cause slowdowns too).

But I can't deny the speed. And there is definitely something to the fact that the API has been able to be built without an extreme amount of baggage it carries around. Pandas is trying its best to make things more consistent and streamlined but deprecation and removal take forever. Copy-on-write in Pandas 2 is probably going to become the default, in which case by Pandas 3 they are liable to remove all inplace operations and make things immutable the way Polars does which would be a big step forward.

I know some people don't like the chaining syntax of Polars, but that's how modern Pandas should be written more often than not anyway (and again how you see it with tidyverse stuff in R so it is the style I am used to).
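
For reference, a sketch of what chained pandas looks like. The data and column names are invented; each step returns a new object rather than mutating `df`:

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["A", "A", "B", "B"], "temp_c": [10.0, 14.0, 20.0, 30.0]}
)

result = (
    df
    .assign(temp_f=lambda d: d["temp_c"] * 9 / 5 + 32)  # add a derived column
    .query("temp_f > 50")                               # keep rows strictly above 50 F
    .groupby("city")["temp_f"]
    .mean()                                             # mean temperature per city
)
print(result)
```

The original `df` is left untouched, which is the same property the immutable style in polars gives you.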

I am a Pandas contributor though so part of my Polars reluctance may also just be clinging to what I know. I certainly would be giving polars a shot if I could, but would probably wind up going with pandas for non-performance critical parts because of the syntax.

# ¿ Oct 20, 2023 14:53

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

Just use JSON.

# ¿ Nov 6, 2023 18:53

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

Just use a CSV reader like pandas has.
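
A minimal sketch, assuming the data is ordinary comma-separated text. The sample content is made up; it includes a quoted field with an embedded comma, which is the kind of thing hand-rolled splitting tends to get wrong:

```python
import io

import pandas as pd

raw = 'name,note\nwidget,"small, blue"\ngadget,plain\n'  # hypothetical file contents

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(raw))
print(df)
```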

# ¿ Nov 29, 2023 20:28

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

DoctorTristan posted:

Personally I’m extremely anti using pandas in anything even resembling a pipeline since the devs absolutely love introducing breaking changes.

Better to never curate the API ever.

# ¿ Feb 27, 2024 12:24

Chin Strap: Nov 24, 2002; I failed my TFLC Toxx, but I no longer need a double chin strap; Pillbug

You might want to use `copy.deepcopy` instead of just `dict()` to copy, but that's all I'd change.

EDIT: If you wanted to be real safe you'd make them frozen too.
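
A sketch of both points, assuming "them" refers to dataclasses. `Settings` and the sample dict are invented for illustration:

```python
import copy
from dataclasses import dataclass, FrozenInstanceError

original = {"tags": ["a", "b"]}
shallow = dict(original)          # new dict, but it shares the inner list
deep = copy.deepcopy(original)    # inner list is copied too

original["tags"].append("c")
print(shallow["tags"])  # ['a', 'b', 'c']  (the shared list changed)
print(deep["tags"])     # ['a', 'b']


@dataclass(frozen=True)
class Settings:  # hypothetical frozen dataclass
    name: str


s = Settings("x")
try:
    s.name = "y"  # assignment on a frozen dataclass raises
    blocked = False
except FrozenInstanceError:
    blocked = True
print(blocked)  # True
```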

# ¿ Apr 24, 2024 20:20
