|
I don’t know if this is the right place to ask, but I am trying to read a large (750 MB) CSV file into a pandas DataFrame, and it seems to be taking an unreasonably long time. I am limiting the read to 8 columns with the usecols option, but read_csv is still taking 6 minutes to load the file. I haven’t been using Python for very long and I’m coming from a SAS programming background. In SAS this file loads in a few seconds, so I feel like I am screwing something up for this to take so long. I originally tried read_sas on the original 1.5 GB dataset, but I got a memory error and had to convert the file to CSV to get around that. The file only has 170k rows. Does anyone have an idea why this is taking so long? Or is this just a normal amount of time for Python to process this file? Google/Stack Exchange are getting me nowhere.
Edit: Never mind, I switched the file from a network drive to my local drive and now it loads in 4 seconds. I guess it’s a network I/O issue and not a Python issue.
Deadite fucked around with this message at 03:00 on Sep 28, 2019 |
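For anyone hitting the same thing, here is a rough sketch of how the network-versus-local difference could be checked. The helper names and the tiny demo file are made up for illustration; the real file and column names are not shown in this post.

```python
import shutil
import tempfile
import time
from pathlib import Path

import pandas as pd


def timed_read(csv_path, columns):
    """Read only `columns` from csv_path and return (frame, seconds for the read)."""
    start = time.perf_counter()
    frame = pd.read_csv(csv_path, usecols=columns)
    return frame, time.perf_counter() - start


def read_via_local_copy(csv_path, columns):
    """Copy the file to local temp storage, then time reading the local copy."""
    with tempfile.TemporaryDirectory() as tmp:
        local_path = Path(tmp) / Path(csv_path).name
        shutil.copy(csv_path, local_path)
        return timed_read(local_path, columns)


# Tiny stand-in file so the sketch runs anywhere. In practice csv_path would
# point at the real file on the network drive.
demo_dir = Path(tempfile.mkdtemp())
demo_csv = demo_dir / "demo.csv"
demo_csv.write_text("id,amount,region,notes\n1,9.5,east,x\n2,3.0,west,y\n")

direct, direct_secs = timed_read(demo_csv, ["id", "amount"])
copied, copied_secs = read_via_local_copy(demo_csv, ["id", "amount"])
print(f"direct: {direct_secs:.4f}s, via local copy: {copied_secs:.4f}s")
```

If the direct read from the share is much slower than the read of the local copy, the time is going to file transfer rather than to parsing.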
# ¿ Sep 28, 2019 02:23 |
|
|
# ¿ May 12, 2024 07:06 |
|
Is there a way to create PDF reports without a package built specifically for that? At work I am limited to the packages that come with Anaconda, and I can’t see anything that will work. Everything I find on Google is just “install reportlab” and it’s frustrating because my IT department won’t let me.
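One possible route, assuming matplotlib is among the packages in the Anaconda install: its PDF backend can write multi-page files through PdfPages. This is only a sketch; the function name, title, and numbers are invented for illustration.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # draw without needing a display
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages


def write_report(pdf_path, title, rows):
    """Write a two-page PDF: a title page, then a bar chart of (label, value) rows."""
    with PdfPages(pdf_path) as pdf:
        # Page 1: plain text placed directly on an empty figure.
        cover = plt.figure(figsize=(8.5, 11))
        cover.text(0.5, 0.6, title, ha="center", fontsize=20)
        pdf.savefig(cover)
        plt.close(cover)

        # Page 2: a simple chart built from the rows.
        fig, ax = plt.subplots(figsize=(8.5, 11))
        ax.bar([label for label, _ in rows], [value for _, value in rows])
        ax.set_title(title)
        pdf.savefig(fig)
        plt.close(fig)


report_path = Path(tempfile.mkdtemp()) / "report.pdf"
write_report(report_path, "Monthly summary", [("East", 12), ("West", 7)])
```

Each pdf.savefig call adds one page, so tables and text have to be drawn as figures. That is clumsy next to a real report library, but it needs nothing beyond matplotlib.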
|
# ¿ Dec 1, 2019 21:44 |
|
I’m pretty new to Python, and I didn’t think to just download the code and import it that way. I’m used to programming in SAS, so having to find packages to accomplish tasks is hard to get the hang of. I keep thinking there must be a way to do everything in vanilla Python, and it seems that’s the wrong way to think about writing programs.
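A minimal sketch of the "download the code and import it" approach, assuming the downloaded package is pure Python with no compiled parts. The module name tinyreport and its render function are hypothetical stand-ins written on the fly so the example runs.

```python
import sys
import tempfile
from pathlib import Path

# Stand-in for a folder of downloaded source code kept next to your script.
vendor_dir = Path(tempfile.mkdtemp()) / "vendor"
vendor_dir.mkdir()
(vendor_dir / "tinyreport.py").write_text(
    "def render(title):\n"
    "    return f'== {title} =='\n"
)

# Putting the folder on sys.path lets a normal import statement find it.
sys.path.insert(0, str(vendor_dir))

import tinyreport  # hypothetical module created above

banner = tinyreport.render("Monthly summary")
print(banner)
```

Packages that depend on compiled extensions or on other packages that are not installed will not import this way.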
|
# ¿ Dec 2, 2019 04:27 |
|
Does anyone have experience running Dask on AWS? I'm trying to start a Dask cluster on an EC2 instance but the Dask scheduler never seems to start. I'm using EC2Cluster from dask-cloudprovider. Here's what I see in PuTTY when the cluster context manager executes: code:
|
# ¿ May 4, 2021 03:57 |
|
I'm having an issue trying to add a matplotlib graph to a tkinter GUI where the legend to the graph is getting cut off if I move it to below the chart. Does anyone know a way to display the legend when it is outside of the chart? FigureCanvasTkAgg doesn't have a height argument so I can't just stretch the viewable area. It looks like it is some kind of automatic resizing problem that I can't figure out a way around. Here's the test code: Python code:
|
# ¿ Jun 17, 2021 14:42 |
|
OnceIWasAnOstrich posted:I've never had this problem in the context of another GUI but I've definitely run into similar issues with bits of a matplotlib figure getting rendered outside the bounds of an image. It is usually something that a call to tight_layout() or other layout-modifying functions can address.
Thanks, tight_layout is exactly what I was looking for.
|
# ¿ Jun 17, 2021 16:49 |
|
Gobbeldygook posted:I am a coding newbie working through Learn Python The Hard Way. For exercise 36 he says to make a text-based adventure game. I decided to add quicktime events, which require timed input. I found a complicated, Windows-only method on reddit and it works fine, but I decided to try to make the seemingly simple cross-platform solution work just for its own sake. Here is what I have:
The easiest way to fix this would just be to reset t to a new Timer object, like code:
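A minimal sketch of that idea, with made-up names and a very short interval. A threading.Timer is a thread, and a thread can only be started once, so each new countdown needs a new Timer object.

```python
import threading

fired = []  # records which countdowns ran out


def make_timer(seconds, label):
    """Build a fresh Timer that appends `label` to `fired` when it expires."""
    return threading.Timer(seconds, lambda: fired.append(label))


t = make_timer(0.05, "first")
t.start()
t.join()  # wait for the countdown to finish

# Starting the same Timer again fails, which is why it has to be replaced.
reuse_error = None
try:
    t.start()
except RuntimeError as exc:
    reuse_error = str(exc)

# Rebinding t to a brand-new Timer gives a countdown that can be started.
t = make_timer(0.05, "second")
t.start()
t.join()

print(fired, reuse_error)
```

In a game loop the same pattern applies: build a new Timer at the start of every quicktime event and call cancel() on it if the player answers in time.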
|
# ¿ Jul 19, 2021 03:20 |
|
I'm having a problem figuring out a regex and hopefully someone can point to my mistake. Here's the test case:code:
The code works when it's just one character: code:
|
# ¿ Dec 8, 2021 22:51 |
|
Thanks to you both. I'm not great with groups and didn't realize I needed an outer group along with the inner group to return what I wanted.
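To illustrate the inner-versus-outer group point with a made-up pattern and input (not the ones from my test case): when a capturing group is repeated, it only keeps its last repetition, so a second group around the whole repetition is needed to get everything back.

```python
import re

text = "id=abc123;"  # invented sample input

# Inner group only: (\w) is overwritten on every repetition of +,
# so group 1 ends up holding just the final character.
inner_only = re.search(r"id=(\w)+;", text)

# Outer group wrapped around the repeated inner group:
# group 1 is the full run, group 2 is still the last single character.
nested = re.search(r"id=((\w)+);", text)

print(inner_only.group(1))  # 3
print(nested.group(1))      # abc123
print(nested.group(2))      # 3
```

That also explains why a single-character input works either way: with one repetition, the last match and the whole match are the same thing.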
|
# ¿ Dec 8, 2021 23:07 |
|
Following along with this video helped me immensely when I was starting out with pandas https://youtu.be/5JnMutdy6Fw
|
# ¿ Dec 14, 2021 13:42 |
|
Can anyone help me understand why this example ends in an error:code:
code:
|
# ¿ Nov 22, 2022 23:29 |
|
I can tell that from the error. What I don't understand is why that string would have the method applied in the first place, since it should have been filtered out in the first part of the where function.
|
# ¿ Nov 24, 2022 02:39 |
|
QuarkJets posted:That's not the order of operations; the way you have this set up, astype occurs first, and that new dataset would become one of the inputs of the "where" function
So astype is the first part executed before the first "where" happens? I thought the first "where" statement would have filtered out the 'Missing' strings before the second where executes and applies the astypes
|
# ¿ Nov 24, 2022 03:04 |
|
Oooooooh okay, I see now. Thank you for your very clear explanation. I didn't think about how all of the arguments in the "where" function needed to be resolved before the function executes, but that is how literally all functions work. I just couldn't see it in this case for whatever reason. For context I was trying to write a "where" statement to replace values in a column that already contained the "Missing" strings, so I needed to find a way to filter those out, then apply the criteria to the intended population, and return all the values back to the dataframe with every value in its original index position.
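A sketch of one way to get that result without tripping over the "Missing" strings, using an invented column name, values, and threshold. Because every argument is evaluated on the whole column before "where" ever runs, the conversion itself has to tolerate the non-numeric rows.

```python
import pandas as pd

# Hypothetical column mixing numeric strings with the "Missing" placeholder.
df = pd.DataFrame({"score": ["10", "Missing", "75", "Missing", "40"]})

# errors="coerce" turns anything non-numeric into NaN instead of raising,
# so this conversion is safe to run on the entire column.
numeric = pd.to_numeric(df["score"], errors="coerce")

# NaN compares as False, so the "Missing" rows drop out of the mask here.
is_high = numeric > 50  # 50 is a made-up threshold

# Assigning through .loc touches only the matching rows; every other value
# stays where it was, in its original index position.
df.loc[is_high, "score"] = "High"

print(df["score"].tolist())
```

The design choice is to build the mask from a safely converted copy and leave the original column alone until the final assignment.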
|
# ¿ Nov 24, 2022 05:53 |
|
Does anyone have a good resource for dask that can be understood by an idiot? I've been struggling with the library for way too long and I still have no idea what I'm doing. It's really frustrating to think you're running code in parallel, only to find out that you're not actually using all the threads in your processor unless you set the config to either 'multiprocessing' or 'distributed' and I don't know the difference between them. All I know is that 'distributed' runs faster than 'multiprocessing' but also causes my computer to restart with larger files. It also produces cryptic messages like this: code:
Anyway I feel like I need a better foundation and reading the documentation is getting me nowhere.
|
# ¿ Apr 22, 2023 20:31 |
|
So here's my dask test case, and it's a little misleading because when I check the times the compute() without a LocalCluster/Client runs much, much faster for the example than it does with the program I'm actually building. The real program runs faster without a LocalCluster until the file gets to be around 1GB in size, then the LocalCluster distributed compute starts being faster. I can't seem to recreate some of the errors I'm seeing with large files with the example code though. It does top out at 5 million rows before I get this error, which I don't get with my real program. This whole thing is so confusing.code:
Python code:
|
# ¿ Apr 23, 2023 19:56 |
|
I have a quick question that I cannot figure out and the keywords involved make googling difficult. I’m also having a hard time explaining this so bear with me. I am trying to write an if statement that checks that one variable isn’t in a list of values, or if that variable isn’t equal to a specific value while another variable is equal to a specific value at the same time. My test case is below. In this case I only want to see ‘Right’ when x is either ‘a’, ‘b’, or ‘c’ OR x is ‘d’ while y is also 3. code:
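Written out directly, the rule in that last sentence could look like the sketch below. The function name and the "Wrong" label are made up for illustration.

```python
def label(x, y):
    """Return 'Right' for x in a/b/c, or for x == 'd' together with y == 3."""
    if x in ("a", "b", "c") or (x == "d" and y == 3):
        return "Right"
    return "Wrong"


# A few spot checks covering each branch of the rule.
for x, y in [("a", 0), ("c", 3), ("d", 3), ("d", 2), ("e", 3)]:
    print(x, y, label(x, y))
```

The parentheses around the `and` part are not strictly required, since `and` binds tighter than `or`, but they make the grouping obvious.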
|
# ¿ Feb 25, 2024 16:37 |
|
FISHMANPET posted:Is the not negating the entire statement or just the part before the or? I think you need more parentheses, because the not isn't applying precisely how you want it.
Yes, the ‘not’ should only apply to the first conditions wrapped in parentheses and not the one after the ‘or’. I tried wrapping the whole thing in parentheses like (not (x == ‘d’ and y == 3)) but that doesn’t seem to work either. ‘Right’ is the desired result when x = ‘d’ and y = 3. The test case is just an example from a much larger program that I can’t easily restructure so I just need to figure out if what I’m trying to do is possible. It feels like there should be a way to do this but I can’t find any info.
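For reference, a sketch of how `not` binds in an expression like this, using the same example values. The function names are invented. `not` applies only to the comparison right after it, so without parentheses around the whole thing it negates just the part before the `or`.

```python
from itertools import product


def loose_not(x, y):
    # Parsed as (not (x in ...)) or (x == 'd' and y == 3).
    return not x in ("a", "b", "c") or (x == "d" and y == 3)


def negate_all(x, y):
    # `not` wraps the entire "Right" rule, giving the "anything else" case.
    return not (x in ("a", "b", "c") or (x == "d" and y == 3))


def negate_de_morgan(x, y):
    # The same test with the negation pushed inside (De Morgan's law).
    return x not in ("a", "b", "c") and not (x == "d" and y == 3)


# The two fully negated forms agree for every combination tried here.
combos = list(product("abcde", range(5)))
same = all(negate_all(x, y) == negate_de_morgan(x, y) for x, y in combos)
print(same, loose_not("e", 0), negate_all("d", 3))
```

The De Morgan form is handy when the test has to be appended to an existing chain of conditions joined by `and`.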
|
# ¿ Feb 25, 2024 17:17 |
|
boofhead posted:yeah, why are you writing it like that? why not just write
I have to do it this way because these two new conditions are just an addition to a very long list of existing conditions, and at least some of them will need to be negated. I inherited this program and I can’t change how it’s structured without causing a lot of drama so I’m trying to do the best with what I have. I agree that it’s going to be a (more) confusing mess from here on out.
|
# ¿ Feb 25, 2024 17:35 |
|
boofhead posted:refactor the whole thing imo
Perfect, thank you. And I'm sure that team doesn't do any unit testing. I got pulled in to help out because they were running behind. At the end of the month this program will not be my problem again until an ironic reorg forces me to maintain it. Here's a more accurate representation of the issue, but pretend there are about 20 more variables that need to be tested: Python code:
|
# ¿ Feb 25, 2024 17:51 |