|
itertools is one of the great parts of python, only they have a very significant and good as hell portion of it which is not actually part of the library but is shown as "itertool recipes" you want this one in it code:
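A representative recipe from that section of the itertools docs is grouper, which chunks an iterable into fixed-size groups (reproduced roughly as it appears in the docs, so treat it as a sketch):

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks: grouper('ABCDEFG', 3, 'x') -> ABC DEF Gxx"""
    args = [iter(iterable)] * n  # n references to the SAME iterator
    return zip_longest(*args, fillvalue=fillvalue)

print(list(grouper('ABCDEFG', 3, 'x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```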
|
# ? Dec 6, 2018 06:08 |
|
|
|
cinci zoo sniper posted:If your dataset has only 10000 records for test then this will be off by one error, maximum i is 9999 and can not be equal 10000, and the last modulo operation will resolve to not True.

It's not running off data--the generator will go forever without the break. That's why I'm confused: the second if statement must be True, or the loop would never stop. After a few more runs though, it looks like it's not consistent behavior. It's hit 10,000 iterations 3 out of 10 times. Something strange going on.

bob dobbs is dead posted:itertools is one of the great parts of python, only they have a very significant and good as hell portion of it which is not actually part of the library but is shown as "itertool recipes"

Not a bad idea to factor out the grouping aspect. I already have toolz as a dependency, so I'll use partition_all. Good call. The implementation is pretty obtuse, but looks like toolz does it the same way.
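For reference, toolz.partition_all(n, seq) yields tuples of at most n items, with a short final chunk instead of fill values. A pure-stdlib sketch of the same behavior (not toolz's actual implementation):

```python
from itertools import islice

def partition_all(n, seq):
    """Yield tuples of up to n items; the last chunk may be shorter
    (mirrors the semantics of toolz.partition_all)."""
    it = iter(seq)
    while True:
        chunk = tuple(islice(it, n))
        if not chunk:
            return
        yield chunk

print(list(partition_all(3, range(8))))
# [(0, 1, 2), (3, 4, 5), (6, 7)]
```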
|
# ? Dec 6, 2018 06:28 |
|
Oh, figured it out. https://github.com/tqdm/tqdm/issues/613
|
# ? Dec 6, 2018 06:29 |
|
SurgicalOntologist posted:And if you haven't seen it before, tqdm is just a progress bar library (a pretty great one)
|
# ? Dec 6, 2018 06:47 |
SurgicalOntologist posted:Oh, figured it out. https://github.com/tqdm/tqdm/issues/613 Oh.
|
|
# ? Dec 6, 2018 07:46 |
|
bob dobbs is dead posted:itertools is one of the great parts of python, only they have a very significant and good as hell portion of it which is not actually part of the library but is shown as "itertool recipes"

This has always made me laugh. "Let's write a bunch of useful and helpful code. Then let's take it and put it in the documentation instead of making it actually importable."
|
# ? Dec 6, 2018 23:01 |
|
I'd be surprised if this is the only module attempting to correct that.
|
# ? Dec 6, 2018 23:06 |
|
I'm playing around with NumPy and Advent of Code. I'm working on vectorizing AoC Day 3's challenge. Python code:
For example, in get_fabric(), I'm trying to minimize sequential Python code in favor of using NumPy's ufuncs and routines where possible, but I'm still calling np.add.at() over a thousand times in total. That feels wrong, but I don't have the context to really know if/why it's wrong. salisbury shake fucked around with this message at 06:30 on Dec 7, 2018 |
# ? Dec 7, 2018 06:27 |
|
Thermopyle posted:This has always made me laugh.

Brought to you by the makers of dataclasses, asyncio, mypy, and assignment expressions. Smart people making reasonable decisions leads to an astounding variety of useful and also very dumb poo poo.
|
# ? Dec 7, 2018 06:41 |
|
salisbury shake posted:I'm just learning to use NumPy and had to poke around the docs to do what I want. While the above code finds the solution correctly, I'm unsure if I'm using NumPy efficiently or even canonically.

The problem is that the pattern of indices you need to accumulate into is a bunch of irregularly scattered rectangles. There is good syntax for working with single rectangles of locations, or with irregularly scattered single locations, but irregularly scattered rectangles is a case that is not handled well.

If you build all of the indexes explicitly (e.g. using np.meshgrid or np.mgrid) and concatenate them together then you could probably get away with a single call to np.add.at (by changing the problem from indexing at an irregular collection of rectangles to indexing at an irregular collection of points), but this doesn't solve the real problem, which is that you need to loop over all of the claims to collect the indices in the first place. It might be worth trying as an exercise to learn the tools, but it won't be any more efficient and will probably be harder to understand.

For count_overlapping it would be more idiomatic to write np.sum(fabric > 1), but your way is fine too. It would also be more idiomatic to write fabric[indices] += increment instead of np.add.at(fabric, indices, increment) unless you are making use of the fact that np.add.at will increment repeated indices multiple times.

Nippashish fucked around with this message at 09:21 on Dec 7, 2018 |
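A sketch of the meshgrid-and-concatenate idea described above, using AoC 2018 Day 3's worked example in (left, top, width, height) form rather than the poster's real input (the fabric size here is an assumption):

```python
import numpy as np

# AoC 2018 day 3 worked example: #1 @ 1,3: 4x4, #2 @ 3,1: 4x4, #3 @ 5,5: 2x2
claims = [(1, 3, 4, 4), (3, 1, 4, 4), (5, 5, 2, 2)]
fabric = np.zeros((10, 10), dtype=int)

# Build every covered (row, col) index explicitly, concatenate them,
# then make a single np.add.at call over scattered points.
rows, cols = [], []
for left, top, w, h in claims:
    rr, cc = np.meshgrid(np.arange(top, top + h),
                         np.arange(left, left + w), indexing="ij")
    rows.append(rr.ravel())
    cols.append(cc.ravel())
np.add.at(fabric, (np.concatenate(rows), np.concatenate(cols)), 1)

print(int((fabric > 1).sum()))  # 4 overlapping square inches for these claims
```

The Python loop over claims is still there, as the post notes; only the scatter-add itself is collapsed into one call.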
# ? Dec 7, 2018 09:06 |
|
salisbury shake posted:I'm just learning to use NumPy and had to poke around the docs to do what I want. While the above code finds the solution correctly, I'm unsure if I'm using NumPy efficiently or even canonically.

Outside of a code-golf type fun challenge, there's not really a reason to vectorize this kind of thing. It will make it less readable, slower, and more memory hungry than a loop version. The problem is naturally solvable and readable with a loop, and loops are what computers are good at.

The problem with doing a naive loop is that Python is very, very slow. To get your math to run at a reasonable speed, you need to get the computation out of Python. You can do that by using NumPy ufuncs so that the only bit of Python code that runs is invoking the ufunc a few times and having each ufunc [which is implemented in C] do lots of math. But that means you have to figure out a way to vectorize instead of doing it the natural way.

An alternative is to use numba. It provides a decorator that you can apply to your Python function that says "instead of executing this function in the Python interpreter, just-in-time compile it into normal instructions the first time it is used and automatically generate the boilerplate to pass data back and forth". It can only work with a subset of the Python language, but it's often good enough. Taking your get_fabric function and massaging it a bit (I don't remember offhand if namedtuples are supported in numba): code:
Bonus thoughts for refining your solution:
- With your solution, what happens if you have a million elves on a 10x10 fabric?
- Can you avoid having to allocate the length x length array? i.e. can you make it work if the fabric is 10000000 x 10000000 as long as the claim list is short?
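One direction for that second bonus question, sketched with the same hypothetical (left, top, width, height) claim format: skip the fabric array entirely and count only the cells that are actually claimed, so memory scales with the claimed area rather than the fabric dimensions:

```python
from collections import Counter

def count_overlapping(claims):
    """Count cells claimed more than once, without allocating a fabric array.
    Works for an enormous fabric as long as the claimed area is modest."""
    counts = Counter()
    for left, top, w, h in claims:
        for x in range(left, left + w):
            for y in range(top, top + h):
                counts[(x, y)] += 1
    return sum(1 for v in counts.values() if v > 1)

print(count_overlapping([(1, 3, 4, 4), (3, 1, 4, 4), (5, 5, 2, 2)]))  # 4
```

This is only a partial answer: memory still grows with total claimed area, so huge individual claims would need a rectangle-intersection approach instead.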
|
# ? Dec 7, 2018 10:24 |
|
I got another easy/dumb one. Have df["Time"] with code:
code:
      4 for i in range(len(temp)):
----> 5     if temp[i][1] == "days":
      6         temp[i][0] = int(temp[i][0])
      7     elif pd.isnull(temp[i]):

TypeError: 'float' object is not subscriptable

But I think the real issue is the "isnull" part. I think it's telling me hey, there is no "ith" element to look up... but there is, right?
|
# ? Dec 7, 2018 16:26 |
CarForumPoster posted:I got another easy/dumb one.

It's trying to get the 2nd element inside the NULL that temp[i] corresponds to. Also I would suggest using dateparser for this.
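A minimal sketch of the fix: missing values in a pandas object column come through as float NaN, which can't be subscripted, so test for them before indexing (the sample data here is invented, not the actual df["Time"] contents):

```python
import math

# Hypothetical parsed rows: ["3", "days"], ["12", "hours"], or a missing value (NaN)
temp = [["3", "days"], ["12", "hours"], float("nan")]

for i in range(len(temp)):
    # NaN placeholders are floats, so check for them BEFORE subscripting temp[i]
    if isinstance(temp[i], float) and math.isnan(temp[i]):
        continue
    if temp[i][1] == "days":
        temp[i][0] = int(temp[i][0])

print(temp[0])  # [3, 'days']
```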
|
|
# ? Dec 7, 2018 16:37 |
|
cinci zoo sniper posted:It tries to get 2nd element inside NULL, that temp[i] corresponds to. Also I would suggest to use dateparser for this. Good call. This works and should be more robust than my previous one: code:
|
# ? Dec 7, 2018 18:08 |
|
I am trying to compute a kernel density estimate for the rate of incidents around the state of Virginia, similar to what is done here. My previous attempts with using 'euclidean' as the metric produces this KDE plot: I then realised that I should be using 'haversine' as the metric because I have a two dimensional vector space as described here. I have had to modify the code from the Python Data Science Handbook example because I am using geopandas to plot the state map instead of the matplotlib basemap, and my data is in a dataframe and not numpy arrays like the data seems to be in the linked example. Here is my code: Python code:
There are about 8000 points plotted on the map with some well defined clusters of points. Unfortunately the KDE plot does not seem to be picking up on them as it does in the linked example.

I have tried playing with the bandwidth for the KDE but this does not seem to change things for the better - for example when the bandwidth is set to 0.3 this is the resulting KDE plot:

I was expecting the KDE using 'haversine' as the metric to be slightly different to the original KDE plot, but still somewhat similar. I think that the KDE plots I have now are incorrect but I cannot tell what I am doing wrong. Thoughts?
|
# ? Dec 7, 2018 19:54 |
|
euclidean distance metric is fine unless you're using the map for navigation tho? 2-space is within the set of n-spaces, yes

what do the latitude and longitude data points look like?
|
# ? Dec 7, 2018 19:57 |
|
bob dobbs is dead posted:eucliean distance metric is fine unless you're using the map for navigation tho? From the Python Data Science Handbook it seemed to say that you should use 'haversine' when performing KDE where the points are latitude and longitude - that is why I went from using 'euclidean' to 'haversine'. By 'fine' do you mean the error in the distances between points (because the distances will not be great circle distances but just straight line distances) is small enough to be ignored if you are not trying to navigate between points? I am not sure what the '2-space is within the set of n-spaces, yes' comment means exactly. The Longitude minimum and maximum are -83.6311 and -75.3771, while the Latitude maximum and minimum are 36.5454 and 39.4172. I believe they are in units of decimal degrees. Is this what you were asking?
|
# ? Dec 7, 2018 20:26 |
|
Jose Cuervo posted:From the Python Data Science Handbook it seemed to say that you should use 'haversine' when performing KDE where the points are latitude and longitude - that is why I went from using 'euclidean' to 'haversine'. By 'fine' do you mean the error in the distances between points (because the distances will not be great circle distances but just straight line distances) is small enough to be ignored if you are not trying to navigate between points?

what i thought that you were thinking is that "oh, you can't use euclidean at all for this", but the real statement to make is "euclidean will introduce distortions in this. but if you're overlaying it over a flat projection in the first place, the distortions will basically look like the map distortions". the other possible misunderstanding is that, just because euclidean distance is viable for any dimensional space (n-spaces), you think you can't just use it for 2-dimensional space (2-space)

also, the underlying mistake is prolly that you need it in radians (use this one https://docs.scipy.org/doc/numpy-1.3.x/reference/generated/numpy.deg2rad.html or https://docs.python.org/2/library/math.html#math.radians)

bob dobbs is dead fucked around with this message at 20:33 on Dec 7, 2018 |
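A sketch of that conversion, assuming scikit-learn-style input of (lat, lon) rows (the coordinates below are made up, not the Virginia dataset):

```python
import numpy as np

# Hypothetical (lat, lon) pairs in decimal degrees
latlon_deg = np.array([[36.85, -75.98],
                       [38.03, -78.48]])

# scikit-learn's KernelDensity(metric='haversine') expects [lat, lon] in RADIANS,
# in that order; feeding it degrees (or lon/lat swapped) silently gives garbage.
latlon_rad = np.deg2rad(latlon_deg)
print(latlon_rad[0])
```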
# ? Dec 7, 2018 20:31 |
|
bob dobbs is dead posted:what i thought that you were thinking is that "oh, you can't use euclidean at all for this", but the real statement to make is "euclidean will introduce distortions in this. but if you're overlaying it over a flat projection in the first place, the distortions will basically look like the map distortions". the other possible misunderstanding is just because euclidean distance is viable for any dimensional space (n-spaces) you think you can't just use it for 2-dimensional space (2-space) Thanks for the clarification regarding the 'euclidean' versus 'haversine' issue. I am using radians though: in the code I posted I convert the latlon values to radians using numpy.radians() and I do the same for the sample points with the line Python code:
EDIT: But since it seems reasonable to use 'euclidean' I will just use that and not worry about why this is not working. Edit2: Does posting on the forums count as duck debugging? I noticed that I had the sample points as (long, lat) pairs, not (lat, long) pairs, i.e. the line Python code:
Python code:
Jose Cuervo fucked around with this message at 20:40 on Dec 7, 2018 |
# ? Dec 7, 2018 20:34 |
|
here's to not actually lookin at your code lol
|
# ? Dec 7, 2018 22:23 |
|
SurgicalOntologist posted:I'm doing a bunch of computations and saving them to a database. The following function is supposed to be committing to the database every 100 results (based on the variable commit_interval), and stopping after 10,000 (the value of total). But I'm not getting 10,000 results on each backtest_set, instead I'm getting 9,900. Can anyone spot the error? Am I making a basic off-by-one -ish mistake or do I have some weird race condition? I can't figure it out. Here's the code. You are testing that i == total, but on the last iteration i == (total - 1).
|
# ? Dec 8, 2018 02:57 |
|
Nope, if that were the case, I'd get 10,001 results not 9,900 (the iterator is infinite so the break statement is the only thing stopping it). Note that I use the second argument to enumerate to start the count at 1, to prevent the issue you mention. It turned out to be a known tqdm bug, a weird async interaction that occurs when the progress bar is killed with a break statement.
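The enumerate(start=1) pattern in question, sketched with stand-in lists instead of a real database session (the names and intervals are illustrative, not the original code):

```python
import itertools

def run(results, total=10, commit_interval=3):
    committed, pending = [], []
    # start=1 makes i count "items seen", so i == total fires on the final item
    for i, r in enumerate(results, start=1):
        pending.append(r)
        if i % commit_interval == 0:
            committed.extend(pending)   # stand-in for session.commit()
            pending = []
        if i == total:
            break                       # the only exit from an infinite generator
    committed.extend(pending)           # flush a final partial batch, if any
    return committed

print(len(run(itertools.count())))  # 10
```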
|
# ? Dec 8, 2018 03:45 |
|
Does anybody know of a good python library that has an implementation of "nth prime" that works in both python 2 and 3?

>>> from some_library import nth_prime
>>> nth_prime(999)
7907

You'd think this would be easy to find, but all I can find from googling are snippets on stackoverflow that only work in python 2.
|
# ? Dec 9, 2018 23:07 |
|
School of How posted:Does anybody know of a good python library that has an implementation of "nth prime" that works in both python 2 and 3?

thats cuz your search keyword is "sieve of eratosthenes" or "number sieve", not "nth prime"

its not precisely what you want but its real close
|
# ? Dec 10, 2018 00:26 |
|
bob dobbs is dead posted:thats cuz your search keyword is "sieve of eratosthenes" or "number sieve", not "nth prime"

That's a completely different algorithm. A sieve gives you the primes below a certain number, and that's different from the nth prime...
|
# ? Dec 10, 2018 02:09 |
|
School of How posted:Thats a completely different algorithm. Sieve gives you primes below a certain number, thats different than nth prime... cache the sieve results in a list? or are we talkin O(10e6) th prime here
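A sketch of that approach: sieve a range, and keep doubling it until the nth prime shows up (written to run on both Python 2 and 3, as the original question asked):

```python
def nth_prime(n):
    """1-indexed nth prime: sieve an ever-larger range until n primes appear."""
    limit = 100
    while True:
        sieve = bytearray([1]) * limit
        sieve[0:2] = b"\x00\x00"               # 0 and 1 are not prime
        for i in range(2, int(limit ** 0.5) + 1):
            if sieve[i]:
                # knock out every multiple of i starting at i*i
                sieve[i * i :: i] = bytearray(len(range(i * i, limit, i)))
        primes = [i for i, is_p in enumerate(sieve) if is_p]
        if len(primes) >= n:
            return primes[n - 1]
        limit *= 2                             # not enough primes yet; sieve further

print(nth_prime(999))  # 7907
```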
|
# ? Dec 10, 2018 02:11 |
|
School of How posted:Does anybody know of a good python library that has an implementation of "nth prime" that works in both python 2 and 3? The SymPy library has such a function.
|
# ? Dec 10, 2018 03:07 |
|
QuarkJets posted:Not using self means that you've defined a class attribute. This is valid, all instances of the class are accessing a shared version of the variable and that may be the behavior that you want. foo.num and bar.num will always be the same value in your code Ah, okay. That actually explains the variable instance better. For some reason I thought if defined outside of the __init__, it was a class variable, while inside the __init__ it was an instance. That actually helps clear up a lot, thank you!
|
# ? Dec 10, 2018 20:14 |
|
When you make a new class, what actually happens is that everything inside the class statement is run like the body of a normal python function, and then all local variables that exist at the end of that function are added to the class dictionary. And there is absolutely nothing special about the self. syntax, you are just setting a property on an object (that just happens to be your current instance).
|
# ? Dec 10, 2018 20:39 |
|
^^^ that's what you get when you leave a window open

Gothmog1065 posted:Ah, okay. That actually explains the variable instance better. For some reason I thought if defined outside of the __init__, it was a class variable, while inside the __init__ it was an instance. That actually helps clear up a lot, thank you!

If it's not clear, Python's a dynamic language where you can just assign attributes and functions to objects whenever you like. So you can take any thing and go bitmap.butts = 101 or whatever you like - under the hood there's a local namespace with a dictionary of attributes and functions, and you can add and remove from that however you want.

So when you define a class, you can set attributes on the class object itself - you're adding to the class's dictionary, so any instances can see that stuff in a higher scope. So that works as a class variable - instances can reference it, and you can just reference it as an attribute on the class itself if you like, MyClass.x

What you're doing in the __init__ constructor is taking the instance itself (passed in as a parameter called self by convention) and just assigning attributes to that object. So when you do self.x = 69 you're adding that attribute to that object's local dictionary, which only affects that instance. So that basically works as an instance variable. There's nothing special about it - you're just adding that property to that object that was passed in.

The maybe weird thing is that you define all these functions in the class as def whatever(self, x, y), but when you call the method on an instance, you just do thing.whatever(x, y). Under the hood, it rewrites the call and does MyClass.whatever(thing, x, y) - it calls the function in the class object, and passes in the instance object, so you can mess with it in the body of the function. That's why they all have that self parameter - so they can reference and affect the actual instance. The language sugar takes care of rewriting those calls for you, but in the function itself you have to work with the instance parameter explicitly.
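A quick demonstration of the class-dict vs instance-dict distinction described above (names invented):

```python
class Widget:
    kind = "generic"          # class attribute: lives in Widget's dict, shared

    def __init__(self, name):
        self.name = name      # instance attribute: stored on this object only

a, b = Widget("a"), Widget("b")
print(a.kind, b.kind)         # generic generic -- both read from the class
Widget.kind = "special"       # rebinding on the class is seen by every instance
print(a.kind, b.kind)         # special special
a.kind = "mine"               # assignment through an instance creates a NEW
print(a.kind, b.kind)         # mine special -- instance attribute shadowing it
```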
|
# ? Dec 10, 2018 21:14 |
|
A question about class variables. I watched the MIT OCW video about class objects and the instructor says that class variables should be set, and retrieved, using getter and setter functions as opposed to directly assigning them. They say that this makes renaming variables within the class easier. Is there any other reason that one would work this way?
|
# ? Dec 10, 2018 21:24 |
|
The XKCD Larper posted:A question about class variables. I watched the MIT OCW video about class objects and the instructor says that class variables should be set, and retrieved, using getter and setter functions as opposed to directly assigning them. They say that this allows for renaming variables within the class and making renaming variables easier. Is there any other reason that one would work this way?
|
# ? Dec 10, 2018 21:43 |
|
The XKCD Larper posted:A question about class variables. I watched the MIT OCW video about class objects and the instructor says that class variables should be set, and retrieved, using getter and setter functions as opposed to directly assigning them. They say that this allows for renaming variables within the class and making renaming variables easier. Is there any other reason that one would work this way?

Using setters/getters means being able to do something with those values before storing them or before giving them to the requester, which can be extremely useful. Really basic example: say that you want a stored value describing an angle to always be in the range 0-360; if you use a setter then your user can pass in values outside of the range and your setter can just fix the input or raise an exception or whatever you want.

But you shouldn't just create setters/getters for all of your attributes; you should create setters and/or getters when you have some reason to do so. People who insist on all of their attributes having getters/setters that basically do nothing are idiots who should go back to Java.

Setting attributes directly is such a standard and normal part of Python that there's even a set of decorators (properties) that allows getters/setters to be syntactically identical to directly accessing attributes, e.g. Python code:
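A sketch of the property decorators being described, using that angle example (the class and attribute names are invented):

```python
class Heading:
    def __init__(self, degrees):
        self.degrees = degrees            # goes through the setter below

    @property
    def degrees(self):
        return self._degrees

    @degrees.setter
    def degrees(self, value):
        self._degrees = value % 360       # normalize into 0-360 on every assignment

h = Heading(725)
print(h.degrees)  # 5 -- callers just read/write h.degrees, no method calls
h.degrees = -90
print(h.degrees)  # 270
```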
|
# ? Dec 10, 2018 22:29 |
|
The XKCD Larper posted:A question about class variables. I watched the MIT OCW video about class objects and the instructor says that class variables should be set, and retrieved, using getter and setter functions as opposed to directly assigning them. They say that this allows for renaming variables within the class and making renaming variables easier. Is there any other reason that one would work this way?

The reason people do it in Java is that you can't add extra logic on variable set/get if you are using a raw class variable, and if you later decide to change from raw access to getters/setters, you break code that used your class before. Thankfully, in python you can use descriptors to easily add special get/set logic and it will look the same as a raw access, and won't break already written code.
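A minimal descriptor sketch of the kind being described (needs Python 3.6+ for __set_name__; the names are invented):

```python
class NonNegative:
    """Descriptor: validation runs on every set, but call sites look like raw access."""
    def __set_name__(self, owner, name):
        self.name = "_" + name            # where the real value is stored on the instance

    def __get__(self, obj, objtype=None):
        return getattr(obj, self.name)

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("must be >= 0")
        setattr(obj, self.name, value)

class Account:
    balance = NonNegative()               # plain attribute syntax, descriptor semantics

    def __init__(self, balance):
        self.balance = balance            # goes through NonNegative.__set__

acct = Account(10)
acct.balance = 5                          # looks like raw access, still validated
print(acct.balance)  # 5
```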
|
# ? Dec 10, 2018 22:35 |
|
dont learn oop from java or python, learn it from smalltalk or ruby
|
# ? Dec 10, 2018 22:40 |
|
bob dobbs is dead posted:dont learn oop
|
# ? Dec 10, 2018 22:53 |
|
its worthwhile if you actually read what it was originally for and its history and understand it. it hasn't proven to be a magic bullet. it's not a synonym for modular or structured programming
|
# ? Dec 10, 2018 22:56 |
|
It may help to think of Classes in python as either types you can make, or bags of related data with associated functions. Check out dataclasses, and see how far you can get with them alone.
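A small dataclass sketch of that "bag of related data with associated functions" idea (field names invented):

```python
from dataclasses import dataclass

@dataclass
class Claim:
    left: int
    top: int
    width: int
    height: int

    def area(self):
        return self.width * self.height

c = Claim(1, 3, 4, 4)
print(c)         # Claim(left=1, top=3, width=4, height=4) -- repr for free
print(c.area())  # 16
```

__init__, __repr__, and __eq__ are all generated from the field declarations, which covers a surprising amount of day-to-day class writing.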
|
# ? Dec 10, 2018 23:26 |
|
I am getting my feet wet with web scraping. Is there a reason to use beautifulsoup over selenium? I find selenium much easier to work with but I am doing super basic stuff right now while learning. Was just curious if BS brings something to the table selenium doesn't have.
|
# ? Dec 11, 2018 00:39 |
|
|
|
keyframe posted:I am getting my feet wet with web scraping. Is there a reason to use beautifulsoup over selenium? I find selenium much easier to work with but I am doing super basic stuff right now while learning. Was just curious if BS brings something to the table selenium doesn't have.

beautifulsoup is better for having a html doc (doesnt even have to be from the web) and doing arcane queries on it. selenium is better for doing arcane sophisticated things in a ui or some poo poo.

so if you have to trigger weirdo browser events, you gotta use selenium. if you gotta get the second href that contains the word "bob dobbs" in the inner html but which doesn't also have this other id and yadda yadda, you're better off using the beautifulsoup capabilities more. you can use both, you can also use neither
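A tiny BeautifulSoup sketch of exactly that kind of query (the HTML is invented):

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <a href="/first">bob dobbs rules</a>
  <a href="/second" id="skip-me">bob dobbs again</a>
  <a href="/third">bob dobbs once more</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# "hrefs whose text mentions bob dobbs but which don't have this other id"
links = [a["href"] for a in soup.find_all("a")
         if "bob dobbs" in a.get_text() and a.get("id") != "skip-me"]
print(links)  # ['/first', '/third']
```

No browser involved: bs4 only parses markup you already have, which is why it pairs naturally with requests (or with selenium's page_source when you do need a browser).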
|
# ? Dec 11, 2018 00:43 |