|
I'm an experienced Python data science person, but until now I've relied on my company's tooling for a good environment. Say I just have a Chromebook, a Windows desktop I only interact with remotely through Chrome Remote Desktop, and no Python environment installed anywhere. I'll probably use JupyterHub or Colab for actual interactive work, but I'd like to set up PyCharm to develop my code. Would it make more sense to A) set it up on my Windows desktop, or B) get some sort of remotely hosted Linux solution I can log in to and develop on? I'd like it to be as turnkey as possible because my experience setting this stuff up is non-existent. I just know the ins and outs of using it in the context of work.
|
# ¿ Apr 1, 2023 18:40 |
|
|
# ¿ May 12, 2024 04:37 |
|
Seventh Arrow posted:Would Google Colab be a possible solution? Jupyter/Colab notebooks suck for actual development of libraries though (I use them daily for interactive and plotting stuff). They encourage way too much bad coding practice and aren't an IDE of any kind. Guess since I use VSCode anyway I might as well do that personally too. I'll probably just install Linux on my desktop. Is there any reason to dual boot rather than use a VM?
|
# ¿ Apr 2, 2023 00:03 |
|
joebuddah posted:Thanks for the assist. Isn't this indistinguishable on its own? 200,002 doesn't tell me whether it is 200 with a decimal portion of 002 or 200,002 with a decimal portion of 0.
|
# ¿ Aug 9, 2023 14:18 |
|
spiritual bypass posted:You could read backward through the string to see if there's punctuation before you've passed 3 places. Then, use that knowledge to add zeroes to the end. That ought to at least normalize the decimal part length. But there potentially isn't a decimal part at all.
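To make the ambiguity concrete, here is a rough sketch of the "read backward to the last punctuation" idea. The function name `classify_last_separator` and its return labels are made up for illustration, and it assumes the input contains only digits plus `.` and `,`. The point is that a lone separator followed by exactly three digits can't be resolved from the string alone.

```python
def classify_last_separator(text: str) -> str:
    """Guess what the last '.' or ',' in a numeric string means (hypothetical helper)."""
    separators = [ch for ch in text if ch in ".,"]
    if not separators:
        return "no separator"

    # Count the digits that follow the last separator.
    last = max(text.rfind("."), text.rfind(","))
    trailing_digits = len(text) - last - 1

    if len(set(separators)) == 2:
        # Both '.' and ',' appear, so the last one has to be the decimal mark.
        return "decimal"
    if len(separators) > 1:
        # The same mark repeated can only be thousands grouping.
        return "thousands"
    if trailing_digits != 3:
        # A single mark not followed by exactly three digits is a decimal mark.
        return "decimal"
    # A single mark followed by exactly three digits: 200,002 could be either.
    return "ambiguous"


for sample in ["200,002", "200,5", "1.234,567", "1,234,567", "200"]:
    print(sample, "->", classify_last_separator(sample))
```

For the ambiguous case you need outside knowledge (the locale, or how the other values in the same column are formatted).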
|
# ¿ Aug 9, 2023 14:51 |
|
It's bad.
|
# ¿ Sep 1, 2023 10:09 |
|
oatmealraisin posted:This might be a dumb question, but what are the benefits of a data class over a regular class? Free string representation, free conversion to a dict (`dataclasses.asdict`), lots of that sort of thing. If what I'm doing looks at all like "maybe just holding data" I'll use it.
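A minimal sketch of those freebies, using a made-up `Point` class purely for illustration:

```python
from dataclasses import asdict, dataclass


@dataclass
class Point:  # hypothetical example class
    x: int
    y: int


p = Point(1, 2)
print(p)                 # Point(x=1, y=2)  (generated __repr__)
print(asdict(p))         # {'x': 1, 'y': 2}
print(p == Point(1, 2))  # True  (generated __eq__ compares fields)
```

With a regular class you would write `__init__`, `__repr__`, and `__eq__` yourself to get the same behavior.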
|
# ¿ Sep 17, 2023 00:43 |
|
https://stackoverflow.com/questions/62919271/how-do-i-define-a-typing-union-dynamically Use a tuple instead of a list?
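Roughly what I mean, assuming you have the types in a list at runtime (the `type_list` name is just for illustration). Subscripting `Union` with a tuple is the same as writing the members out with commas:

```python
from typing import Union

type_list = [int, str, float]  # hypothetical list built at runtime

# Union[...] with a tuple behaves like Union[int, str, float].
DynamicUnion = Union[tuple(type_list)]

print(DynamicUnion == Union[int, str, float])  # True
```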
|
# ¿ Sep 21, 2023 10:22 |
|
Used all the time in Pandas too
|
# ¿ Sep 26, 2023 00:21 |
|
You can do like Python code:
|
# ¿ Sep 27, 2023 14:16 |
|
A) You've got to give the error output if you want help. B) This seems like a terrible class to work with. Can't you simplify things at all?
|
# ¿ Sep 27, 2023 15:43 |
|
To me it sounds like something you'd want to query. I haven't used SQLite, but if the bar to set it up is really that low, it makes sense to me.
|
# ¿ Sep 28, 2023 14:34 |
|
BUUNNI posted:Is there a good book or manual that goes over data analysis and visualization using python? I’m not that good at reading the official documentation for Pandas and NumPy and stuff yet. https://store.metasnake.com/effective-pandas-book is the best book on Pandas I know of.
|
# ¿ Oct 18, 2023 20:12 |
|
Zugzwang posted:Wes McKinney's (creator of pandas) book is available for free on his site: https://wesmckinney.com/book/ It isn't as good though
|
# ¿ Oct 18, 2023 21:04 |
|
https://docs.xarray.dev/en/latest/internals/how-to-create-custom-index.html maybe?
|
# ¿ Oct 19, 2023 09:02 |
|
Falcon2001 posted:So it sounds like my dataframe approach isn't insane, at least for ones where visualizing it is reasonable (there was a falling sand puzzle previously that definitely benefited from that), so I'll stick with it for now. I think any time it's super slow is an example of a time where populating the whole map probably isn't a good idea anyway. You are doing it as a MultiIndex and not just a tuple-valued index, right?
|
# ¿ Oct 19, 2023 19:17 |
|
Falcon2001 posted:Not a tuple valued index no, just a standard 2d dataframe with custom indices - here's the generation code. Oh, I missed that you only wanted 2D. This is fine, or you could make a thin wrapper around a numpy ndarray that stores an offset for each axis, so that `x[i, j]` actually reads `data[i - x_min, j - y_min]` on the underlying array, maybe. Could be faster for algorithms.
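A minimal sketch of that wrapper idea, assuming integer coordinates with known inclusive bounds. The class name `OffsetGrid` and its constructor arguments are made up for illustration, and it only handles scalar `(i, j)` lookups, not slices. The stored minimum is subtracted from each coordinate so the smallest coordinate lands at array position 0.

```python
import numpy as np


class OffsetGrid:
    """Hypothetical thin wrapper: world coordinates in, array positions out."""

    def __init__(self, x_min, x_max, y_min, y_max, fill=0):
        self.x_min = x_min
        self.y_min = y_min
        # Bounds are inclusive, hence the + 1 on each axis.
        self.data = np.full((x_max - x_min + 1, y_max - y_min + 1), fill)

    def __getitem__(self, key):
        i, j = key
        return self.data[i - self.x_min, j - self.y_min]

    def __setitem__(self, key, value):
        i, j = key
        self.data[i - self.x_min, j - self.y_min] = value


grid = OffsetGrid(x_min=-2, x_max=2, y_min=10, y_max=12)
grid[-2, 10] = 7  # stored at data[0, 0]
grid[2, 12] = 9   # stored at data[4, 2]
print(grid[-2, 10], grid[2, 12])
```

One thing to watch: a coordinate below the minimum turns into a negative array index, which numpy will quietly wrap around, so a bounds check may be worth adding.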
|
# ¿ Oct 19, 2023 20:27 |
|
Zugzwang posted:Related to all this pandas discussion: what's y'all's take on the future of pandas now that polars exists? I know that polars doesn't yet do everything that pandas does (though I'm not an expert on the details here), but it does a hell of a lot, and with a nicer API and vastly more speed. I'll be curious to see polars' impact over time given that it seems to be surging in popularity. I think "nicer API" is a bit of a stretch. Haven't used it because at Google we don't really have approved Rust support yet (cross language compilation with all the custom things we do takes a lot of support and it is still definitely alpha) but it looks more like a thin layer over SQL style operations than the pandas library. Never felt super pythonic to me to have to write something like (stolen from reddit) code:
Also there is something to be said for a much more mature package with more books and other packages that work with it. If the dataframe interchange protocols get off the ground there is less of an argument here. And you can always convert directly to a pandas dataframe for those api boundaries I guess (but I'm guessing that doing that too often will cause slowdowns too).

But I can't deny the speed. And there is definitely something to the fact that the API has been able to be built without an extreme amount of baggage it carries around. Pandas is trying its best to make things more consistent and streamlined but deprecation and removal take forever. Copy-on-write in Pandas 2 is probably going to become the default, in which case by Pandas 3 they are liable to remove all inplace operations and make things immutable the way Polars does which would be a big step forward.

I know some people don't like the chaining syntax of Polars, but that's how modern Pandas should be written more often than not anyway (and again how you see it with tidyverse stuff in R so it is the style I am used to).

I am a Pandas contributor though so part of my Polars reluctance may also just be clinging to what I know. I certainly would be giving polars a shot if I could, but would probably wind up going with pandas for non-performance critical parts because of the syntax.
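For anyone who hasn't seen chained Pandas, a small sketch of the style I mean. The data and column names are invented for illustration. Each step returns a new frame, so nothing is modified in place:

```python
import pandas as pd

# Made-up data purely for illustration.
df = pd.DataFrame(
    {"city": ["A", "A", "B", "B"], "temp_c": [10.0, 20.0, 30.0, 40.0]}
)

result = (
    df
    .assign(temp_f=lambda d: d["temp_c"] * 9 / 5 + 32)  # add a derived column
    .query("temp_f > 50")                               # keep the warmer rows
    .groupby("city", as_index=False)
    .agg(mean_temp_f=("temp_f", "mean"))                # one row per city
)
print(result)
```

The original `df` still has only its two columns afterwards, which is the same mindset the copy-on-write work pushes you toward.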
|
# ¿ Oct 20, 2023 14:53 |
|
Just use JSON.
|
# ¿ Nov 6, 2023 18:53 |
|
Just use a CSV reader like pandas has.
|
# ¿ Nov 29, 2023 20:28 |
|
DoctorTristan posted:Personally I’m extremely anti using pandas in anything even resembling a pipeline since the devs absolutely love introducing breaking changes. Better to never curate the API ever.
|
# ¿ Feb 27, 2024 12:24 |
|
You might want to use deepcopy instead of just dict() to copy, but that's all I'd change. EDIT: If you wanted to be real safe you'd make them frozen too.
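A quick sketch of why, with invented data. `dict()` only copies the top level, so nested lists are still shared. The frozen part assumes "them" means dataclasses, and `Settings` is a hypothetical class for illustration:

```python
from copy import deepcopy
from dataclasses import FrozenInstanceError, dataclass

original = {"tags": ["a", "b"]}  # made-up nested data

shallow = dict(original)    # new outer dict, same inner list
deep = deepcopy(original)   # inner list is copied too

original["tags"].append("c")
print(shallow["tags"])  # ['a', 'b', 'c']  (shares the list with original)
print(deep["tags"])     # ['a', 'b']       (unaffected)


@dataclass(frozen=True)
class Settings:  # hypothetical example class
    name: str


settings = Settings("prod")
try:
    settings.name = "dev"
    assignment_blocked = False
except FrozenInstanceError:
    # Frozen dataclasses reject attribute assignment after construction.
    assignment_blocked = True
print(assignment_blocked)  # True
```

Note that frozen only blocks rebinding attributes; a list held in a frozen dataclass can still be mutated.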
|
# ¿ Apr 24, 2024 20:20 |