|
Thermopyle posted:Does mypy have support for pandas objects? So I could set the return/input type to be pd.Series or pd.DataFrame?
|
# ¿ Mar 28, 2017 19:50 |
|
huhu posted:Would this be the best way to log errors on a script that I'm running? Check out the logging module that comes in the standard library. It offers a cleaner interface for this type of stuff.
|
# ¿ Mar 28, 2017 20:43 |
|
No, turn away!
|
# ¿ Apr 12, 2017 13:08 |
|
If that substitution is all you need to do, can you not just use the replace() string method? Do you need to use regex?
|
# ¿ Apr 14, 2017 23:15 |
|
Methanar posted:The data is actually more structured like this. I gave the replace snippet a shot, but it didn't do anything. If you're iterating over a dictionary you'll need to update the comprehension accordingly. If your use case really is just the simple replace, I think regex is overkill here and makes for messier code for no reason.
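Something like this, maybe (a sketch, since I'm guessing at your dict's shape and the substitution):

```python
data = {"host1": "state=up", "host2": "state=down"}

# str.replace handles a plain substitution; no regex needed
cleaned = {k: v.replace("state=", "") for k, v in data.items()}
# cleaned == {"host1": "up", "host2": "down"}
```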
|
# ¿ Apr 15, 2017 01:06 |
|
Thanks for the Fluent Python suggestion, thread. Started reading it last week and it's been really good.
|
# ¿ Apr 22, 2017 14:41 |
|
huhu posted:In PyCharm, is there a way to run code and then leave some sort of a command line to continue playing around with it? Highlight code, right-click, execute selection in console.
|
# ¿ Apr 27, 2017 21:44 |
|
Take a look at pd.to_numeric()
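Something like this (a minimal sketch; errors="coerce" is the handy part):

```python
import pandas as pd

# strings to numbers; errors="coerce" turns unparseable values into NaN
s = pd.Series(["1", "2.5", "oops"])
nums = pd.to_numeric(s, errors="coerce")
# nums -> [1.0, 2.5, NaN]
```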
|
# ¿ Apr 28, 2017 03:56 |
|
Didn't you say above that you told the installer not to add anything to your PATH? Are you sure pip working in that instance isn't a carry-over from a pre-Anaconda Python installation?
|
# ¿ Jun 5, 2017 13:08 |
|
^^ Yep, that would definitely be the easiest way to troubleshoot that.
|
# ¿ Jun 5, 2017 13:28 |
|
I also bet he's going to need to do manipulation/cleaning of the data before plotting, which will be much easier in pandas, but sure, reinvent the wheel.
|
# ¿ Aug 14, 2017 20:57 |
|
Hughmoris posted:Speaking of Pandas, I run in to trouble when I need to create additional columns that are filled based on other column criteria. For example, if I have a CSV of: code:
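The usual pattern for this is np.where or .loc with a boolean mask. Quick sketch (column names made up, since your code block didn't come through for me):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"score": [10, 55, 80]})

# numpy.where for a simple two-way split
df["passed"] = np.where(df["score"] >= 50, "yes", "no")

# or .loc to assign only to rows matching a condition
df.loc[df["score"] >= 75, "grade"] = "A"
```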
|
# ¿ Aug 15, 2017 03:00 |
|
Jose posted:Can anyone link a good guide for combining pandas and matplotlib? Basically how matplotlib usage differs if I'm using pandas data frames Pandas has some plotting functions that will output matplotlib axes that you can tweak and save from there. plot() is the main interface, but some others like hist() and boxplot() have one-off functions. Like Cingulate said, seaborn is also a nice library that helps bridge these worlds, and it is dataframe-aware. Although in either case you might have to use a bit of matplotlib to get things exactly the way you want.
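For example (a minimal sketch; the Agg backend and filename are just so it runs anywhere):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import pandas as pd

df = pd.DataFrame({"x": range(10), "y": range(10)})

# df.plot() returns a matplotlib Axes you can keep tweaking
ax = df.plot(x="x", y="y")
ax.set_title("tweaked after the fact")
ax.figure.savefig("plot.png")
```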
|
# ¿ Aug 16, 2017 12:51 |
|
I forget from your earlier posts, but have you posted any examples of code where you're getting stuck? This thread is quite willing to help people along if they know where to meet you.
|
# ¿ Sep 8, 2017 02:59 |
|
It can, sure. You'd just have an empty value for that level, but to me that would be pretty weird to work with. You could also define the level you're struggling to map with an appropriate value. For example, in your case it would be setting the value "Day" even though there is no variability for type 1. Other solutions would involve collapsing across levels through different naming, or not using the extra level at all.
|
# ¿ Sep 13, 2017 23:50 |
|
I saw the new PyPy has pandas and numpy support. Anyone know of any benchmarking that's been done?
|
# ¿ Oct 6, 2017 17:07 |
|
You can use the debugger to stop at a point in the code, hit the console button, and interact with the ipython window there (there is also a variables viewer). Or you can highlight whatever code you want to run, then right-click and choose “execute in python console”, and it will run it in an open console (if you have one) or open a new one and execute it there. I used to do exactly what you describe with pycharm + a terminal window, but I do it all in pycharm now. I know this is short since I’m phone posting, but I can provide more detail later if needed.
|
# ¿ Nov 4, 2017 13:04 |
|
You’ve always been able to highlight code and execute it in the built-in IPython terminal of PyCharm.
|
# ¿ Dec 1, 2017 01:10 |
|
Cingulate posted:I found this very interesting: https://medium.com/dunder-data/python-for-data-analysis-a-critical-line-by-line-review-5d5678a4c203 I think he nitpicks here and there, but overall I tend to agree with a lot of his points. I’ve never read his cookbook on pandas, but maybe I’ll get work to buy it and I’ll thumb through it. I’ve used pandas enough now that all I usually need is a quick scan of the docs to remember things. However, one thing he mentions early in his review that drives me up the loving wall is how the pandas devs use all of these random weird functions and methods that aren’t really documented anywhere, and when you’re learning the library it’s really hard to figure out what the hell they are for.
|
# ¿ Dec 6, 2017 15:29 |
|
Hughmoris posted:For the Pandas users out there, what type of things (if any) do you bounce back to Excel for? Nothing, other than writing out tabular summary views of the data I’m working with (where I may clean up the formatting here and there) that are usually sent to coworkers, or opening up Excel files I’ve been sent to see how they’re formatted so I know how to import them.
|
# ¿ Dec 10, 2017 15:16 |
|
x = {} creates a dictionary; use x = set() if you want to create an empty set.
|
# ¿ Dec 20, 2017 20:39 |
|
If you need to pull the value of the next row into the current row, then create a new column with shift(-1)?
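Quick sketch of what I mean (column name made up):

```python
import pandas as pd

df = pd.DataFrame({"price": [10, 12, 11]})

# shift(-1) pulls the next row's value up into the current row;
# the last row gets NaN since there is no next row
df["next_price"] = df["price"].shift(-1)
# next_price -> [12.0, 11.0, NaN]
```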
|
# ¿ Jan 10, 2018 15:55 |
|
itertuples() is much quicker than iterrows(), and might be a nice middle ground.
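For example (a tiny sketch):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# itertuples yields namedtuples and skips building a full Series
# per row, which is the slow part of iterrows
total = 0
for row in df.itertuples(index=False):
    total += row.a * row.b
# total == 1*3 + 2*4 == 11
```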
|
# ¿ Jan 10, 2018 18:30 |
|
pd.concat(your list of series, axis=1) should do it.
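i.e. something like (sketch with made-up names):

```python
import pandas as pd

s1 = pd.Series([1, 2], name="a")
s2 = pd.Series([3, 4], name="b")

# axis=1 lines the Series up side by side as columns,
# using each Series' name as the column label
df = pd.concat([s1, s2], axis=1)
```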
|
# ¿ Jan 25, 2018 20:53 |
|
Jose Cuervo posted:I am trying to understand what I am doing wrong with a Pandas merge of two data frames: Is df_two unique on ID, Date?
|
# ¿ Feb 3, 2018 04:28 |
|
Jose Cuervo posted:Are you asking if all the (ID, Date) tuple combinations in df_two are unique? If so, yes. df_two was generated using groupby where on=['ID', 'Date']. Er, sorry, unique was a bad word. In df_two, does (ID, Date) ever include multiple rows?
|
# ¿ Feb 3, 2018 16:06 |
|
Jose Cuervo posted:No, because df_two was generated using a groupby statement where on=['ID', 'Date'] - so every row in df_two corresponds to a unique ('ID', 'Date') tuple. Try the indicator flag on the merge and then use it to see if it might lead you to where the extra rows are coming from.
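Roughly this (toy frames; your real columns will differ):

```python
import pandas as pd

df_one = pd.DataFrame({"ID": [1, 1], "Date": ["d1", "d2"], "x": [10, 20]})
df_two = pd.DataFrame({"ID": [1], "Date": ["d1"], "y": [99]})

# indicator=True adds a _merge column showing each row's origin:
# "both", "left_only", or "right_only"
merged = df_one.merge(df_two, on=["ID", "Date"], how="left", indicator=True)
```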
|
# ¿ Feb 3, 2018 19:03 |
|
Anybody had any issues with PyCharm skipping over breakpoints when debugging? My Google searches have failed me, and it's getting super annoying because I can't figure out how to replicate the issues.
|
# ¿ Feb 9, 2018 20:32 |
|
re.sub? https://docs.python.org/3/library/re.html#re.sub
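e.g. (sketch with a made-up pattern):

```python
import re

# replace every run of digits with "#"
out = re.sub(r"\d+", "#", "order 123, item 4567")
# out == "order #, item #"
```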
|
# ¿ Feb 25, 2018 23:46 |
|
What level of observation do you need the resultant data to be?
|
# ¿ Feb 27, 2018 01:59 |
|
Seventh Arrow posted:Pretty detailed...this is the kind of analysis that I'll need to do on the data: Phone posting, so I could be missing something, but those first three files look to have the same columns. If that’s the case, then concatenating the files together would work. Then you’d want to do two merges for the files below, one on store location key and the other on product key. If you need help, I can post pseudo code for you here in a bit when I can get back to a laptop. I would do all of this in pandas, btw.
|
# ¿ Feb 27, 2018 02:36 |
|
Seventh Arrow posted:If you could post an example of what you had in mind, it would be greatly appreciated. I have some other things that I can work on, so no rush. Here's what I had in mind. code:
code:
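Roughly this pattern, anyway (all file/column names are guesses, and I'm using in-memory frames where you'd have pd.read_csv):

```python
import pandas as pd

# stand-ins for the three like-shaped CSVs (pd.read_csv("jan.csv") etc. in practice)
jan = pd.DataFrame({"store_location_key": [1], "product_key": [7], "sales": [100.0]})
feb = pd.DataFrame({"store_location_key": [2], "product_key": [7], "sales": [50.0]})

# same columns, so concat just stacks them
combined = pd.concat([jan, feb], ignore_index=True)

# then one merge per lookup table
stores = pd.DataFrame({"store_location_key": [1, 2], "region": ["east", "west"]})
products = pd.DataFrame({"product_key": [7], "product": ["widget"]})
out = combined.merge(stores, on="store_location_key").merge(products, on="product_key")
```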
|
# ¿ Feb 27, 2018 02:58 |
|
Seventh Arrow posted:That's great, thanks a lot. So I guess "concat" was what I was looking for when it comes to the similar csv files. Does it just automatically look at the column names and sort accordingly? It's aligning the DataFrames along the columns index, which in this case is their names. So it doesn't matter what their position is; it matters that they are labeled the same. The default is axis=0, which stacks the DataFrames on top of each other (the column names tell pandas which data goes where), but you could set axis=1 and think of the same exercise based on row indices instead. If one DataFrame has a column the others don't, pandas will create that column and assign missing values to the rows that didn't contain it. quote:Another bit of interest is the "frame.loc" line...so if I have multiple columns what would the format be like? Maybe something like: Whoops, there was a typo in my original code. It should read: code:
To give you an idea of how you can build this into more complex expressions: say for region Canada, for all stores with a location id over 1000, you want to set the missing values in column 'price' to 'GOON'. You could do: code:
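Along these lines (toy frame; column names guessed):

```python
import pandas as pd

frame = pd.DataFrame({
    "region": ["Canada", "Canada", "US"],
    "location_id": [1500, 900, 2000],
    "price": [None, None, None],
})

# combine conditions with & (each wrapped in parentheses),
# then assign to just the rows matching the mask
mask = (frame["region"] == "Canada") & (frame["location_id"] > 1000)
frame.loc[mask & frame["price"].isna(), "price"] = "GOON"
```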
|
# ¿ Feb 27, 2018 05:08 |
|
Is there a reason you're using pyspark? Everything you mention can be done with pandas directly. edit: pandas automatically recognizes sales as a float column. See below. code:
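Something like this, anyway (using an in-memory CSV so it's self-contained; column names are made up):

```python
import io
import pandas as pd

# read_csv infers dtypes; a column of decimals comes back as float64
csv = io.StringIO("product,sales\nwidget,19.99\ngadget,5.00\n")
df = pd.read_csv(csv)
# df["sales"].dtype -> float64, no manual casting needed
```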
vikingstrike fucked around with this message at 19:48 on Feb 27, 2018 |
# ¿ Feb 27, 2018 19:45 |
|
Seventh Arrow posted:Yes, I made a tiny little inconspicuous link in my post...here's the whole hog: If you need to create a DataFrame, and pyspark has a DataFrame creator function that gives you the desired output, I'm not sure why you'd try to roll your own. Turning CSVs into DataFrames is some of the most basic functionality of a library like this.
|
# ¿ Feb 27, 2018 20:06 |
|
One is for indexing, one is for iterating.
|
# ¿ Mar 3, 2018 15:22 |
|
What type of reports are you thinking? Out of what you describe, the logical addition would be matplotlib/seaborn to plot figures in python.
|
# ¿ Mar 4, 2018 22:57 |
|
You can send email using Python and write it so the report content is inline in the email body. It’s been a while since I’ve done this, but it would be easy to automate: data cleaning -> figure generation -> email.
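The stdlib bits would look roughly like this (addresses and subject are placeholders; the actual send is commented out since it needs a real SMTP server):

```python
import smtplib
from email.message import EmailMessage

def build_report(body: str) -> EmailMessage:
    # build the message; the cleaned-data summary goes straight in the body
    msg = EmailMessage()
    msg["Subject"] = "Daily report"          # placeholder subject
    msg["From"] = "bot@example.com"          # placeholder addresses
    msg["To"] = "team@example.com"
    msg.set_content(body)
    return msg

msg = build_report("cleaned data + figures summary here")

# sending would then be:
# with smtplib.SMTP("localhost") as s:
#     s.send_message(msg)
```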
|
# ¿ Mar 4, 2018 23:07 |
|
Look at this loop and see if you can spot where you're tripping up: code:
|
# ¿ Mar 5, 2018 04:19 |
|
Nope. It has to do with how you are first assigning collector key and sales.
|
# ¿ Mar 5, 2018 04:33 |