Chin Strap
Nov 24, 2002

I failed my TFLC Toxx, but I no longer need a double chin strap :buddy:
Pillbug
So I'm an experienced Python data science person, but until now I've relied on my company's tooling to have a good environment. Let's say I just have a Chromebook, a Windows desktop I only interact with remotely through Chrome Remote Desktop, and no Python environment installed anywhere.

I'm probably going to be using JupyterHub or Colab for actual interactive work, but I would like to set up PyCharm to actually develop my code.

Would it make more sense to:

A) set it up on my Windows desktop

Or

B) get some sort of remotely hosted Linux solution I can log in to and develop on?

I would like it to be as turnkey as possible because my experience with setting this stuff up is non-existent. I just know the ins and outs of using it in the context of work.

Chin Strap

Seventh Arrow posted:

Would Google Colab be a possible solution?

Jupyter/Colab notebooks suck for actual development of libraries though (I use them daily for interactive and plotting stuff). They encourage way too much bad coding practice and aren't an IDE of any kind.

Guess since I use VSCode anyway I might as well do that personally too. I'll probably just install Linux on my desktop. Is there any reason to dual boot rather than use a VM?

Chin Strap

joebuddah posted:

Thanks for the assist.

There isn't always a decimal or a comma.
I am supposed to allow for values between 0.001 and 200,000.001

I think you are correct and I should ignore typos.

Isn't this ambiguous on its own?

200,002

That doesn't tell me whether it is 200 with a decimal portion of 002, or 200,002 with a decimal portion of 0.
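
To make the ambiguity concrete, here's a rough sketch that reads a string under both conventions (separator as thousands grouping, separator as decimal mark) and returns every reading that is syntactically valid. The function name `candidate_values` is made up for illustration, and it assumes the string contains at most one kind of separator.

```python
import re
from decimal import Decimal


def candidate_values(text):
    """Return every plausible numeric reading of text.

    Assumption: text holds digits and at most one kind of separator
    ("," or "."), which may be a thousands separator or a decimal mark.
    """
    readings = set()
    if text.isdigit():
        readings.add(Decimal(text))
    for sep in (",", "."):
        s = re.escape(sep)
        # Thousands grouping: 1-3 leading digits, then groups of exactly 3.
        if re.fullmatch(rf"\d{{1,3}}({s}\d{{3}})+", text):
            readings.add(Decimal(text.replace(sep, "")))
        # Decimal mark: digits, one separator, digits.
        if re.fullmatch(rf"\d+{s}\d+", text):
            readings.add(Decimal(text.replace(sep, ".")))
    return readings


ambiguous = candidate_values("200,002")    # two readings survive
unambiguous = candidate_values("200,0025")  # only the decimal reading fits
```

Any input with exactly three digits after a single separator comes back with two readings, so without outside knowledge of the locale there's nothing in the string to pick between them.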

Chin Strap

spiritual bypass posted:

You could read backward through the string to see if there's punctuation before you've passed 3 places. Then, use that knowledge to add zeroes to the end. That ought to at least normalize the decimal part length.

But there potentially isn't a decimal at all.

Chin Strap
It's bad.

Chin Strap

oatmealraisin posted:

This might be a dumb question, but what are the benefits of a data class over a regular class?

Free string representation, free conversion to a dict (via `dataclasses.asdict`), lots of that sort of thing.

If what I'm doing looks at all like "maybe just holding data", I'll use it.
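
A quick sketch of what you get for free. The `Point` class is just a made-up example:

```python
from dataclasses import dataclass, asdict


@dataclass
class Point:
    x: float
    y: float
    label: str = "origin"


p = Point(1.0, 2.0)

print(p)                      # generated __repr__ lists every field
print(asdict(p))              # plain dict built from the fields
print(p == Point(1.0, 2.0))   # generated __eq__ compares field by field
```

With a regular class you'd be writing `__init__`, `__repr__`, and `__eq__` by hand to get the same behaviour.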

Chin Strap
https://stackoverflow.com/questions/62919271/how-do-i-define-a-typing-union-dynamically

Use a tuple instead of a list?
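
Something like this, as a minimal sketch (the `type_list` contents are placeholders). Subscripting `Union` with a tuple is the same as writing the members out by hand, whereas a list gets treated as one single argument:

```python
from typing import Union, get_args

# Placeholder list of types assembled at runtime.
type_list = [int, str, float]

# Convert to a tuple before subscripting so each type becomes a member.
DynamicUnion = Union[tuple(type_list)]

members = get_args(DynamicUnion)
```

Keep in mind this only helps at runtime; a static type checker generally can't follow a union that is built dynamically.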

Chin Strap
Used all the time in Pandas too

Chin Strap
You can do like
Python code:
Gem2 = Annotated[Gem, ...]
and use that? Pick a better name of course
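
Fleshed out a bit, as a sketch only: the `Gem` class, the metadata string, and the `appraise` function are all invented here to have something runnable, and the real metadata would be whatever you were repeating in every annotation.

```python
from typing import Annotated, get_type_hints


class Gem:
    def __init__(self, carats):
        self.carats = carats


# Define the annotated type once under its own name...
CutGem = Annotated[Gem, "already cut"]


# ...then reuse the alias instead of repeating Annotated[...] everywhere.
def appraise(gem: CutGem) -> float:
    return gem.carats * 100.0


# The metadata is still recoverable when you ask for it.
hints = get_type_hints(appraise, include_extras=True)
metadata = hints["gem"].__metadata__
```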

Chin Strap
A) You've got to give error output if you want help

B) This seems like a terrible class to work with. Can't you simplify things at all?

Chin Strap
To me it sounds like something you'd like to query. I've not used SQLite, but if the bar to set it up is really that low, it makes sense to me.
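
For what it's worth, the standard library's `sqlite3` module seems to need very little ceremony. A minimal sketch with a made-up `items` table, using an in-memory database so nothing touches disk:

```python
import sqlite3

# ":memory:" keeps the whole database in RAM; pass a file path to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("apple", "fruit", 0.5), ("pear", "fruit", 1.0), ("hammer", "tool", 12.0)],
)

# Once the data is in a table, questions about it become one-line queries.
rows = conn.execute(
    "SELECT category, COUNT(*), SUM(price) "
    "FROM items GROUP BY category ORDER BY category"
).fetchall()
conn.close()
```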

Chin Strap

BUUNNI posted:

Is there a good book or manual that goes over data analysis and visualization using python? I’m not that good at reading the official documentation for Pandas and NumPy and stuff yet.

https://store.metasnake.com/effective-pandas-book is the best book on Pandas I know of.

Chin Strap

Zugzwang posted:

Wes McKinney's (creator of pandas) book is available for free on his site: https://wesmckinney.com/book/

It isn't as good though

Chin Strap
https://docs.xarray.dev/en/latest/internals/how-to-create-custom-index.html maybe?

Chin Strap

Falcon2001 posted:

So it sounds like my dataframe approach isn't insane, at least for ones where visualizing it is reasonable (there was a falling sand puzzle previously that definitely benefited from that), so I'll stick with it for now. I think any time it's super slow is an example of a time where populating the whole map probably isn't a good idea anyway.

You are doing it as a MultiIndex and not just a tuple-valued index, right?
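
The difference I mean, as a small sketch with made-up coordinates. A MultiIndex knows about its levels, so you can slice on one of them; a flat index whose values happen to be tuples can't do that:

```python
import pandas as pd

coords = [(0, 0), (0, 1), (1, 0), (1, 1)]
values = list("abcd")

# MultiIndex: each tuple position becomes its own named level.
multi = pd.Series(
    values, index=pd.MultiIndex.from_tuples(coords, names=["y", "x"])
)

# Flat index of tuple objects: tupleize_cols=False stops pandas from
# converting the tuples into a MultiIndex automatically.
flat = pd.Series(values, index=pd.Index(coords, tupleize_cols=False))

row_zero = multi.loc[0]      # every x where y == 0
one_cell = multi.loc[(1, 0)]  # a single (y, x) lookup
```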

Chin Strap

Falcon2001 posted:

Not a tuple valued index no, just a standard 2d dataframe with custom indices - here's the generation code.

Python code:
self.df = pd.DataFrame(arr, index=range(min_y_index, max_y_index), columns=range(min_x_index, max_x_index))

Oh, I missed that you only wanted 2D. This is fine, or you could make a thin wrapper around a numpy ndarray that stores an offset for each axis, so that when you do `x[i, j]` it actually looks up `arr[i - y_min, j - x_min]`, maybe. Could be faster for algorithms.
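
A rough sketch of the wrapper idea. The class name `OffsetGrid` and its constructor arguments are my own invention; it uses half-open ranges like the `range(min, max)` calls in the quoted code and shifts each coordinate by the stored axis minimum so the smallest coordinate lands on array position 0:

```python
import numpy as np


class OffsetGrid:
    """2D array addressed by (y, x) coordinates that may start below zero."""

    def __init__(self, y_min, y_max, x_min, x_max, fill=0):
        self.y_min = y_min
        self.x_min = x_min
        # Half-open ranges, matching range(y_min, y_max) / range(x_min, x_max).
        self.data = np.full((y_max - y_min, x_max - x_min), fill)

    def __getitem__(self, key):
        y, x = key
        return self.data[y - self.y_min, x - self.x_min]

    def __setitem__(self, key, value):
        y, x = key
        self.data[y - self.y_min, x - self.x_min] = value


grid = OffsetGrid(-2, 3, -5, 5)
grid[-2, -5] = 7   # lowest corner maps to data[0, 0]
grid[2, 4] = 9     # highest corner maps to data[4, 9]
```

One caveat: there's no bounds checking here, so a coordinate just below the minimum turns into a negative numpy index and silently wraps around. You'd want to add a check if that matters.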

Chin Strap

Zugzwang posted:

Related to all this pandas discussion: what's y'all's take on the future of pandas now that polars exists? I know that polars doesn't yet do everything that pandas does (though I'm not an expert on the details here), but it does a hell of a lot, and with a nicer API and vastly more speed. I'll be curious to see polars' impact over time given that it seems to be surging in popularity.

Personally, I've switched over to polars when I can, though I often still bang something out in pandas when I need something quick and the execution speed differences don't really matter.

I think "nicer API" is a bit of a stretch. I haven't used it, because at Google we don't really have approved Rust support yet (cross-language compilation with all the custom things we do takes a lot of support, and it is still definitely alpha), but it looks more like a thin layer over SQL-style operations than pandas does. It never felt super Pythonic to me to have to write something like this (stolen from reddit):

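code:
from datetime import datetime
import polars as pl

# pandas (assumes prices_df has a DatetimeIndex)
prices_df.loc['2023-03'] *= 1.1

# polars (newer releases spell these with_columns and closed=,
# replacing the older with_column and include_bounds=)
polars_df.with_columns(
    pl.when(pl.col('timestamp').is_between(
        datetime(2023, 3, 1),
        datetime(2023, 3, 31),
        closed='both'
    )).then(pl.col('val') * 1.1)
    .otherwise(pl.col('val'))
    .alias('val')
)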
All the pl.col stuff especially makes me unhappy. Some of this is definitely an "I'm used to it" pandas bias and an "I come from R and it looks more like R" pandas bias. Any time you have meaningful indices across dataframes and want to do operations on them, the auto-alignment that pandas does can really clean up code compared to having to do joins everywhere in polars (if I'm messing up any of these points out of polars ignorance, someone feel free to call me out).
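
A tiny illustration of the auto-alignment I mean, with made-up data. The two Series are in different orders and one has an extra label, and pandas lines them up by index label before multiplying:

```python
import pandas as pd

prices = pd.Series({"apple": 1.0, "pear": 2.0, "plum": 3.0})
# Different order, plus a label that prices doesn't have.
quantity = pd.Series({"plum": 10, "apple": 5, "pear": 1, "fig": 4})

# No join needed: values are matched on their index labels.
total = prices * quantity
```

Labels present on only one side come out as NaN instead of raising, which is exactly the bookkeeping you'd otherwise spell out with an explicit join.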

Also there is something to be said for a much more mature package with more books and other packages that work with it. If the dataframe interchange protocols get off the ground there is less of an argument here. And you can always convert directly to a pandas dataframe for those api boundaries I guess (but I'm guessing that doing that too often will cause slowdowns too).

But I can't deny the speed. And there is definitely something to the fact that the API could be built without carrying around an extreme amount of baggage. Pandas is trying its best to make things more consistent and streamlined, but deprecation and removal take forever. Copy-on-write in Pandas 2 is probably going to become the default, in which case by Pandas 3 they are liable to remove all inplace operations and make things immutable the way Polars does, which would be a big step forward.

I know some people don't like the chaining syntax of Polars, but that's how modern Pandas should be written more often than not anyway (and again how you see it with tidyverse stuff in R so it is the style I am used to).
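
For anyone who hasn't seen the chained style in pandas, a minimal sketch with made-up data. Each step returns a new object, so nothing is mutated in place:

```python
import pandas as pd

raw = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Lima", "Lima"],
        "temp_f": [32.0, 50.0, 68.0, 77.0],
    }
)

summary = (
    raw
    .assign(temp_c=lambda d: (d["temp_f"] - 32) * 5 / 9)  # add a column
    .query("temp_c > 0")                                  # filter rows
    .groupby("city", as_index=False)["temp_c"]
    .mean()                                               # aggregate
    .sort_values("temp_c", ascending=False)
    .reset_index(drop=True)
)
```

`raw` is left untouched at the end, which is also the behaviour copy-on-write pushes you toward.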

I am a Pandas contributor though so part of my Polars reluctance may also just be clinging to what I know. I certainly would be giving polars a shot if I could, but would probably wind up going with pandas for non-performance critical parts because of the syntax.

Chin Strap
Just use JSON.

Chin Strap
Just use a CSV reader like pandas has.
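
Roughly like this, with a made-up snippet of CSV. The point is that a real reader already handles things like quoted fields containing commas, which hand-rolled `split(",")` parsing gets wrong:

```python
import io

import pandas as pd

# Stand-in for a file on disk; the second field of the first row
# contains a comma inside quotes.
text = (
    "name,comment,score\n"
    'alice,"likes commas, apparently",3\n'
    "bob,plain,5\n"
)

df = pd.read_csv(io.StringIO(text))
```

In practice you'd pass a path instead of the `StringIO` object.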

Chin Strap

DoctorTristan posted:

Personally I’m extremely anti using pandas in anything even resembling a pipeline since the devs absolutely love introducing breaking changes.

Better to never curate the API ever.

Chin Strap
You might want to use deepcopy instead of just dict() to make the copy, but that's all I'd change.

EDIT: If you wanted to be real safe you'd make them frozen too.
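
A sketch of both points with made-up data, assuming "them" are dataclasses (the `Config` class here is hypothetical). `dict()` only copies the top level, so nested lists are still shared, and `frozen=True` makes assignment after construction raise:

```python
import copy
from dataclasses import FrozenInstanceError, dataclass

settings = {"name": "run1", "tags": ["a", "b"]}

shallow = dict(settings)         # new dict, same inner list object
deep = copy.deepcopy(settings)   # new dict and new inner list

settings["tags"].append("c")     # leaks into shallow, not into deep


@dataclass(frozen=True)
class Config:
    name: str
    retries: int = 3


cfg = Config("run1")
try:
    cfg.retries = 5              # not allowed on a frozen dataclass
    blocked = False
except FrozenInstanceError:
    blocked = True
```

Note that frozen only stops attribute reassignment; a mutable field such as a list inside a frozen dataclass can still be changed in place.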
