|
QuarkJets posted:Good news! The compat module should trivialize your transition to TF2 but yeah it's still nonzero effort. There are some significant performance and design improvements in TF2, I recommend it
I definitely will upgrade at some point, it's just such a low priority in the scheme of things. We're building up for an acquisition at the end of the month, so some other things require due diligence.
|
# ? Oct 12, 2020 19:03 |
|
Wallet posted:I could be misremembering but I'm pretty sure pathlib handles this on its own. I think the only real reason to use PureWindowsPath or PurePosixPath is if you are trying to gently caress around with a Windows path on a POSIX machine or vice versa.
Yeah; if you just use Path pathlib will transform the path in whatever way is appropriate to the local system, including changing slash directions
|
# ? Oct 12, 2020 20:18 |
|
Windows also accepts either direction
|
# ? Oct 12, 2020 20:57 |
|
I believe os.path also auto-converts so long as you always use posix-style slashes (e.g. even if you just print the string returned from an os.path function, its slashes will be correct for the OS)
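A quick way to check this without having both operating systems handy: the standard library exposes the Windows and POSIX flavors of os.path as ntpath and posixpath. This is only a sketch added for illustration, with made-up example paths. It suggests that normpath() is what rewrites forward slashes on Windows, while join() just picks the separator it inserts.

```python
import ntpath      # what os.path is on Windows
import posixpath   # what os.path is on Linux/macOS

forward = "../this/that.txt"
backward = "..\\this\\that.txt"

# Windows flavor: normpath() rewrites forward slashes to backslashes.
win_normalized = ntpath.normpath(forward)

# join() only chooses the separator it adds; existing slashes are left alone.
win_joined = ntpath.join("this/sub", "that.txt")

# POSIX flavor: a backslash is an ordinary filename character, so nothing changes.
posix_normalized = posixpath.normpath(backward)

print(win_normalized)
print(win_joined)
print(posix_normalized)
```

So the string you get back is only "correct for the OS" after a normalizing call, and only in the Windows direction.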
|
# ? Oct 12, 2020 22:24 |
|
QuarkJets posted:Yeah; if you just use Path pathlib will transform the path in whatever way is appropriate to the local system, including changing slash directions
This makes sense and it's what I thought from the last time I did it, but it just ... wasn't. I'll just poke it with a stick again tomorrow and come running back if it's being grouchy.
|
# ? Oct 12, 2020 23:37 |
|
Rocko Bonaparte posted:I'll just poke it with a stick again tomorrow and come running back if it's being grouchy.
...and here I am and it's being grouchy. Windows w/ 3.8.3 code:
code:
I'm guessing that the pathlib on each system is more the problem than anything and doing a str() on the path isn't really sufficient. Is there some other way I should get these paths without having to test for OS?
|
# ? Oct 13, 2020 19:55 |
|
Is it something they fixed between the two different versions you used? edit: Ah, nope! The _WindowsFlavour defines an altsep while the _PosixFlavour does not! necrotic fucked around with this message at 20:09 on Oct 13, 2020 |
# ? Oct 13, 2020 20:00 |
|
That looks like correct behavior? On windows, the paths "..\this\that.txt" and "../this/that.txt" both mean "up one directory, into a directory named this, file name that.txt" On linux, the path "../this/that.txt" means that, but the path "..\this\that.txt" means "file named ..\this\that.txt" in the current directory. ..\this\that.txt is a legal posix filename because posix filenames are terrible (newlines and nonprinting UTF-8 characters are also legal posix filenames) If you want to make that be interpreted as a path with directories on linux, you will need to either explicitly tell it to interpret it as a windows path or manipulate the string into a posix path Foxfire_ fucked around with this message at 20:08 on Oct 13, 2020 |
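The difference described above can be reproduced on any machine with the pure path classes, which do no filesystem access. This is a small sketch for illustration, reusing the example path from the post; the variable names are made up.

```python
from pathlib import PurePosixPath, PureWindowsPath

raw = r"..\this\that.txt"

# Windows flavor: backslash is a separator, so this is three components.
win_parts = PureWindowsPath(raw).parts

# POSIX flavor: backslash is just a character, so this is one filename.
posix_parts = PurePosixPath(raw).parts

# Forward slashes are separators in both flavors.
win_forward = PureWindowsPath("../this/that.txt").parts
posix_forward = PurePosixPath("../this/that.txt").parts

print(win_parts)      # ('..', 'this', 'that.txt')
print(posix_parts)    # ('..\\this\\that.txt',)
```

Plain Path picks one of these two flavors based on the machine it runs on, which is why the same string behaves differently on each system.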
# ? Oct 13, 2020 20:05 |
|
Foxfire_ posted:..\this\that.txt is a legal posix filename because posix filenames are terrible (newlines and nonprinting UTF-8 characters are also legal posix filenames)
|
# ? Oct 13, 2020 20:12 |
|
If you're composing the paths, just always use forward slashes since windows will also accept that. If you're accepting the paths from users, you'll need to decide how to interpret what they mean when they enter "this\that" on linux since it technically means filename but they probably didn't mean that if they are good upstanding people
|
# ? Oct 13, 2020 20:21 |
|
Rocko Bonaparte posted:Yeah that's wild right there. So I guess I need some extra logic for this after all.
Try building the path instead: Python code:
QuarkJets fucked around with this message at 21:28 on Oct 13, 2020 |
# ? Oct 13, 2020 20:54 |
|
Does PurePath.as_posix() help at all, at least if running the code on Windows?
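One way as_posix() could be used, sketched under the assumption that incoming strings should always be read as Windows-style paths (so both slash directions count as separators). The helper name to_posix is hypothetical and not from the thread.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_posix(user_path: str) -> PurePosixPath:
    """Hypothetical helper: read the input as a Windows-style path, return a POSIX one."""
    # PureWindowsPath treats both "\" and "/" as separators;
    # as_posix() renders the result with forward slashes only.
    return PurePosixPath(PureWindowsPath(user_path).as_posix())


converted = to_posix(r"..\this\that.txt")
print(converted)         # ../this/that.txt
print(converted.parts)   # ('..', 'this', 'that.txt')
```

This works on any OS because it never touches the filesystem. The tradeoff is that a genuine POSIX filename containing a backslash gets split into directories, and drive letters like C: are not handled in any special way.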
|
# ? Oct 13, 2020 21:19 |
|
Hello thread, my old friend I have a need to compute k nearest neighbors on mid-to-large datasets (20k samples and above) Default implementations of exact kNN via ball tree and the like in sklearn stall out if you actually attempt to do this because they rely on computing the entire pairwise distance matrix for the whole dataset using scipy's single-threaded sadgasm of a distance function. Wishing to avoid approximate nearest neighbors, I think it may be worth it just to roll my own fast knn with a lazily evaluated distance metric. This way we can split up the dataset into small chunks, and by computing the one vs all distance of a single sample in the chunk, we can use the triangle inequality to guarantee no sample outside the chunk is closer than some distance. After that we can quickly solve a small knn problem. My problem now is how best to go about writing a performant, lazily evaluated, cached distance function in python? Ideally it would be something dictionary or hashmap-esque, where the key is a pair of indices representing two samples and the value is evaluated once when called, then stored for future reference. Unfortunately python dictionaries get sort of slow if they get really huge. Does anyone have suggestions? (The temptation to do this in rust and then try to get it to be callable from python is overwhelming, but I know that that way lies only madness) DearSirXNORMadam fucked around with this message at 05:46 on Oct 15, 2020 |
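A minimal sketch of the "dictionary keyed by a pair of indices" idea from the post, assuming Euclidean distance and points stored as tuples. The class name LazyDistances and the evaluations counter are made up for illustration, and this does nothing about the speed of very large dicts.

```python
import math


class LazyDistances:
    """Hypothetical helper: compute pairwise distances on demand and remember them."""

    def __init__(self, points):
        self.points = points
        self._cache = {}      # (i, j) with i < j  ->  distance
        self.evaluations = 0  # how many distances were actually computed

    def __call__(self, i, j):
        if i == j:
            return 0.0
        key = (i, j) if i < j else (j, i)  # distance is symmetric, store each pair once
        try:
            return self._cache[key]
        except KeyError:
            self.evaluations += 1
            value = math.dist(self.points[i], self.points[j])
            self._cache[key] = value
            return value


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
dist = LazyDistances(points)

first = dist(0, 1)   # computed on first use
again = dist(1, 0)   # same pair in the other order, served from the cache
print(first, again, dist.evaluations)
```

Normalizing the key to (smaller, larger) halves the cache size compared with storing both orders.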
# ? Oct 15, 2020 05:41 |
|
That sounds large enough & you caring about performance enough that using python is a bad idea. Python is very, very slow. If you're doing large numerical stuff, you can really only use it as a glue language to connect together pieces implemented in something else. Backing it with rust or C, or writing in a python-adjacent language like Cython or numba seem like better ideas to me. How many times are you going to use this thing and how many points are you going to classify? If it's not that many, a naive iteration through all points for every lookup in not-python might be the fastest thing considering your time to implement it. 20,000 points is not that many. Like making up some numbers and calling each point 100 doubles long, that's only 16MB of memory to go through per classification.
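For reference, the naive approach is only a few lines. The sketch below is pure Python and only illustrates the logic and the memory arithmetic; the post's advice is to run the real thing in something faster. The function name brute_force_knn and the sample points are made up.

```python
import heapq
import math


def brute_force_knn(points, query, k):
    """Return the indices of the k points closest to `query`, nearest first."""
    return heapq.nsmallest(
        k, range(len(points)), key=lambda i: math.dist(points[i], query)
    )


points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
nearest = brute_force_knn(points, query=(0.0, 0.1), k=2)
print(nearest)  # [0, 3]

# Memory estimate from the post: 20,000 points x 100 doubles x 8 bytes per double.
bytes_needed = 20_000 * 100 * 8
print(bytes_needed / 1_000_000, "MB")  # 16.0 MB
```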
|
# ? Oct 15, 2020 07:26 |
|
Why avoid ANN?
|
# ? Oct 15, 2020 09:05 |
|
Mirconium posted:Hello thread, my old friend
Consider taking a look at FAISS https://github.com/facebookresearch/faiss. It can do a lot of things but the core of it is extremely fast kNN searches on extremely large datasets with a variety of strategies for speeding things up on hideously big datasets. 20k is honestly pretty small (exhaustive search gets infeasible around 1M vectors with that code) so you can just use the flat index for exact results and not rely on the index tricks for approximate results. Since you refer to scipy's distance function I'm assuming you are using something normal like Euclidean distance so it probably includes a C-implementation of what you are using already.
edit: That said, 20k is not a lot of vectors at all and I've used the sklearn BallTree with no problems for both Euclidean and Cosine distance metrics on similarly sized datasets. The scipy pdist() function only takes about 10 seconds on my machine to compute the full distance matrix for 20k vectors of 100 dimensions each. Don't give it a callable metric, that will ruin things with Python overhead. If you can't use any of the included metrics you'll need to construct your method with not-Python in some way. OnceIWasAnOstrich fucked around with this message at 15:45 on Oct 15, 2020 |
# ? Oct 15, 2020 15:27 |
|
OnceIWasAnOstrich posted:Consider taking a look at FAISS https://github.com/facebookresearch/faiss. It can do a lot of things but the core of it is extremely fast kNN searches on extremely large datasets with a variety of strategies for speeding things up on hideously big datasets. 20k is honestly pretty small (exhaustive search gets infeasible around 1M vectors with that code) so you can just use the flat index for exact results and not rely on the index tricks for approximate results. Since you refer to scipy's distance function I'm assuming you are using something normal like Euclidean distance so it probably includes a C-implementation of what you are using already.
This is a pretty good recommendation! I am definitely keeping it in mind if my version chokes when I go above 1 million data points or so.
That being said, at the risk of someone cross-posting this to the coding horrors thread, because I am DEFINITELY not a real programmer, this function appears to work fine. (It seems to resolve ties differently than whatever sklearn knn does, but whatever, good enough for government work) https://pastebin.com/vXkuJ0Xe
If anyone sees something real bad in there though, please save me from myself.
|
# ? Oct 16, 2020 03:14 |
|
For those using Python on a team, do you have a "python fundamentals for common scenarios at this company" training? I wrote a style guide for new people during onboarding that sets some of the preferences/expectations of how we write code. I find that this works pretty well and gives me something to reference for comments on PRs to train the new people quickly, especially coders with 2+ years in the field. For interns/new grads, I find that some very common "learning python" lessons get repeated. Examples of things that go in there that I get asked by any intern or new grad would be:
- What is anaconda and virtualenv?
- How to iterate over dicts and dataframes (and to not use for i in range(0, len(x)))
- A video with git fundamentals (we also have a git process guide that explains how we do PRs)
- Common stuff with querying APIs/interacting with websites (e.g. selenium vs requests, cover the requests params)
- Django file layout
- Web Frameworks
- Turning JSON into a DataFrame & Flattening JSON
- Pandas Merge/Concat/Join/Append
CarForumPoster fucked around with this message at 13:17 on Oct 16, 2020 |
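As an illustration of the iteration item, here is the kind of contrast such a guide might show. The data is made up, and this covers plain lists and dicts only, not DataFrames.

```python
scores = {"alice": 3, "bob": 5}
names = ["alice", "bob"]

# Index-based loop: the pattern to avoid.
indexed = []
for i in range(0, len(names)):
    indexed.append((i, names[i]))

# Iterate directly; use enumerate() when the index is also needed.
direct = []
for i, name in enumerate(names):
    direct.append((i, name))

# Iterate over a dict's keys and values together with .items().
pairs = []
for name, score in scores.items():
    pairs.append((name, score))

print(indexed == direct)  # True
print(pairs)              # [('alice', 3), ('bob', 5)]
```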
# ? Oct 16, 2020 13:05 |
|
How to use a code formatter (especially if projects are set up for it). The number of times I get merge requests etc. where spacing and other stuff is all over the place is quite impressive, especially since it's built into things like PyCharm. I also always tell people about type annotations, since they immediately help with development (e.g. in PyCharm) and make following code easier.
|
# ? Oct 16, 2020 20:19 |
|
A lot of people don't know how to use list comprehensions or f-strings, both super useful tools. Just providing contrasting examples of these vs not these can be helpful.
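A contrasting example of the sort suggested here might look like the following sketch; the values are made up.

```python
values = [1, 2, 3, 4]

# Without a comprehension: build the list step by step.
squares_loop = []
for v in values:
    if v % 2 == 0:
        squares_loop.append(v * v)

# With a comprehension: same result in one expression.
squares = [v * v for v in values if v % 2 == 0]

name, count = "widgets", 3

# Without an f-string: manual conversion and concatenation.
message_old = "Found " + str(count) + " " + name

# With an f-string: values are formatted in place.
message = f"Found {count} {name}"

print(squares)   # [4, 16]
print(message)   # Found 3 widgets
```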
|
# ? Oct 16, 2020 21:57 |
|
Wait holy smokes when did Python get (I assume optional?) types? I've been living under a rock for a while, admittedly, but holy crap! Do the type hints improve performance substantially?
|
# ? Oct 16, 2020 23:44 |
|
Mirconium posted:Wait holy smokes when did Python get (I assume optional?) types? I've been living under a rock for a while, admittedly, but holy crap!
On the other hand, if you write performance-intensive functions in Cython, you can potentially get huge performance gains by doing nothing special except declaring types.
|
# ? Oct 17, 2020 00:10 |
|
Mirconium posted:Wait holy smokes when did Python get (I assume optional?) types? I've been living under a rock for a while, admittedly, but holy crap!
The type annotations are only used by third party tools, python ignores them at runtime, so unfortunately no performance impact.
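A small sketch of what "ignores them at runtime" means in practice; the function name double is made up. The interpreter stores the hints as metadata but does not enforce them.

```python
def double(x: int) -> int:
    return x * 2


# The annotation says int, but nothing stops a str at runtime.
result = double("ab")
print(result)  # abab

# The hints are kept as plain metadata for tools to read.
hints = double.__annotations__
print(hints)
```

A checker such as mypy would flag the double("ab") call before the program ever runs.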
|
# ? Oct 17, 2020 00:13 |
|
The ability to let your IDE (like PyCharm) or a type checker (like mypy) check your code makes type hints a gamechanger for the language. Related: Check out dataclasses, which use them.
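For anyone who hasn't seen dataclasses, a minimal sketch of how they lean on type hints. The class Sample and its fields are made up for illustration.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Sample:
    # Annotated class attributes become fields; __init__, __repr__ and __eq__ are generated.
    name: str
    weight: float
    replicate: int = 1


s = Sample("a1", 2.5)

field_names = [f.name for f in fields(Sample)]
as_record = asdict(s)

print(field_names)  # ['name', 'weight', 'replicate']
print(as_record)    # {'name': 'a1', 'weight': 2.5, 'replicate': 1}
```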
|
# ? Oct 17, 2020 00:22 |
|
quote:For those using Python on a team, do you have a "python fundamentals for common scenarios at this company" training?
Yes, but we spend much more time on non-python specific stuff. I.e. the importance of naming your variables, how to split up your code into functions and methods, how to split your functionalities in multiple files/objects, some very basic OOP concepts, some very basic FP concepts, etc. On that topic: does somebody know a python equivalent for Clean Code by Robert C. Martin? I.e. the same topics, but with python code examples instead of java code examples?
|
# ? Oct 17, 2020 00:37 |
|
Dominoes posted:The ability to let your IDE (like PyCharm) or a type checker (like mypy) check your code makes type hints a gamechanger for the language. Related: Check out dataclasses, which use them.
something that i wish i'd realized a long time ago: pass a list of dataclass elements to the pandas dataframe constructor and it just figures out the appropriate column names and types
|
# ? Oct 17, 2020 00:53 |
|
Zoracle Zed posted:something that i'd wish i'd realized a long time ago: pass a list of dataclass elements to the pandas dataframe constructor and it just figures out the appropriate column names and types
I use pandas daily. Can you give an example of this?
|
# ? Oct 17, 2020 00:58 |
|
Dominoes posted:The ability to let your IDE (like PyCharm) or a type checker (like mypy) check your code makes type hints a gamechanger for the language. Related: Check out dataclasses, which use them.
I mean, I appreciate Python as much as the next guy, and most of my coding is in Python, but at that point wouldn't it just be easier to go to a fully typed language?
|
# ? Oct 17, 2020 01:54 |
|
Mirconium posted:I mean, I appreciate Python as much as the next guy, and most of my coding is in Python, but at that point wouldn't it just be easier to go to a fully typed language?
Obviously no? I want to make a serverless webapp that gets some data from 3 or 4 REST APIs, turns it into a couple graphs and an excel file, then sends the graphs and excel file as an email to my boss. Give me some fully typed languages where that's easier than Python because that's a 1 or 2 day project if the APIs are your standard REST returning JSON and authenticating w/an API key. Whether you choose to use typing or not is irrelevant.
|
# ? Oct 17, 2020 02:52 |
|
Mirconium posted:I mean, I appreciate Python as much as the next guy, and most of my coding is in Python, but at that point wouldn't it just be easier to go to a fully typed language?
I don’t see how it would be easier to learn a new language + ecosystem than to add type annotations to your Python code.
|
# ? Oct 17, 2020 02:52 |
|
Mirconium posted:I mean, I appreciate Python as much as the next guy, and most of my coding is in Python, but at that point wouldn't it just be easier to go to a fully typed language?
Something interesting happened, gradually I think, over the past decades: explicit typing used to be something compilers required in order to assign memory etc., and it could be a chore. With the addition of type inference and complex type systems, it shifted to a powerful tool that lets compilers, IDEs, and other tools (like mypy) catch bugs and ensure your program acts how you intend. Dominoes fucked around with this message at 03:00 on Oct 17, 2020 |
# ? Oct 17, 2020 02:56 |
|
It also gets easier to use with each release. 3.9 added the ability to type hint collections without importing them from the typing module. So you can do list[int] instead of typing.List[int].
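Side by side, the two spellings look like this. It's a sketch that needs Python 3.9 or later to run, and the function names are made up.

```python
from typing import List  # only needed for the pre-3.9 spelling


def total_old(values: List[int]) -> int:
    return sum(values)


def total(values: list[int]) -> int:  # built-in generic, Python 3.9+
    return sum(values)


def count_words(words: list[str]) -> dict[str, int]:
    # dict, tuple, set and friends can be subscripted the same way.
    counts: dict[str, int] = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts


print(total([1, 2, 3]))               # 6
print(count_words(["a", "b", "a"]))   # {'a': 2, 'b': 1}
```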
|
# ? Oct 17, 2020 03:00 |
|
That's awesome.
|
# ? Oct 17, 2020 03:01 |
I've caught bugs thanks to type annotations, use them
|
|
# ? Oct 17, 2020 03:06 |
|
Types are computer checked static assertions/tests. Use them! Malcolm XML fucked around with this message at 05:01 on Oct 17, 2020 |
# ? Oct 17, 2020 04:59 |
|
CarForumPoster posted:I use pandas daily. Can you give an example of this? I did something similar today with lists of objects code:
|
# ? Oct 17, 2020 05:02 |
|
Dominoes posted:Typing works well in Python, and (Subjective, of course) is almost always worth it, even for small programs.
Oh, I agree 100%, this is half of why I love Rust. I can count on one hand the number of times I've really had to debug a program that compiles in Rust, which is half due to static typing, so I appreciate the benefits of static typing as implicit debugging tremendously.
I think the question I was trying to ask, which I ask as a non-CS-educated person (mainly a biologist) who genuinely doesn't know, is what are the benefits that Python derives from NOT having static types? My best guess from experience with Java/Rust is probably that you don't have to spend half your time writing abstract classes and interfaces when you need something generic to happen, since in Python you can just write a function that takes it on faith that whatever you pass it will have a .foo() method that does something .foo() should do?
(Personally I continue to use Python because it has an obscenely good ecosystem of libraries which makes it extremely versatile. Say what you will about how they handled the transition from 2 to 3, but from an outsider's view it seems like Python has pretty good governance as a language if its library ecosystem is any indication. But if the Rust library ecosystem was as extensive as Python's, I would never write a line of another language again)
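The "takes it on faith" guess can be sketched like this. The classes Cat and Robot are made up, and the typing.Protocol part is an addition for illustration rather than something raised in the thread: it lets a checker like mypy verify the .foo() assumption without any shared base class.

```python
from typing import Protocol


class HasFoo(Protocol):
    """Anything with a .foo() method returning a str matches this structurally."""

    def foo(self) -> str: ...


class Cat:
    def foo(self) -> str:
        return "meow"


class Robot:
    def foo(self) -> str:
        return "beep"


def call_foo(thing: HasFoo) -> str:
    # Works for any object with a .foo() method; Cat and Robot never mention HasFoo.
    return thing.foo()


results = [call_foo(Cat()), call_foo(Robot())]
print(results)  # ['meow', 'beep']
```

Dropping the HasFoo annotation changes nothing at runtime; the function still works on anything that happens to have .foo().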
|
# ? Oct 17, 2020 05:33 |
|
Mirconium posted:I think the question I was trying to ask, which I ask as a non-CS-educated person (mainly a biologist) who genuinely doesn't know, is what are the benefits that Python derives from NOT having static types?
quote:But if the Rust library ecosystem was as extensive as Python's, I would never write a line of another language again
Another key perk of Python that I think Rust won't be able to catch up with is the REPL, especially variants like IPython. It's great for quickly testing functions, as a powerful, customizable calculator, etc. Dominoes fucked around with this message at 17:35 on Oct 17, 2020 |
# ? Oct 17, 2020 13:44 |
|
The original reason for Python not having static types is that it makes writing the interpreter easier. In the interpreter, every single python object is the same type (PyObject) and supports exactly the same set of operations & data. Python grew out of a one man hobby project and lots of its warts stem from that. Its predecessor language (ABC) did have a static type checker (and so did Algol, FORTRAN and C, all of which are much older).
Dynamic types also make objects changeable at runtime, which is useful for modifying stuff for tests or messing with other people's code. Every function/attribute access is internally "Look up a value in a map of string -> PyObject, throw AttributeError if that key doesn't exist", so you can trivially rewire those mappings at runtime.
Also people like having less stuff to type when working interactively or doing little shell script type things and those were python's main target area for most of its history. Guido van Rossum was resistant to adding type hints up until he moved from academia to having to maintain large codebases.
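The lookup-in-a-map behavior is easy to see from Python itself. This is a simplified sketch with a made-up Greeter class; the real lookup also involves the class hierarchy and descriptors.

```python
class Greeter:
    def greet(self):
        return "hello"


g = Greeter()

# Instance attributes live in a plain dict of name -> object.
g.name = "world"
instance_dict = dict(g.__dict__)
print(instance_dict)  # {'name': 'world'}

# Rewire a method at runtime, e.g. to stub it out in a test.
Greeter.greet = lambda self: "patched"
patched = g.greet()
print(patched)  # patched

# A name that is in none of the maps surfaces as AttributeError.
try:
    g.missing
    missing_error = None
except AttributeError as exc:
    missing_error = type(exc).__name__
print(missing_error)  # AttributeError
```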
|
# ? Oct 17, 2020 19:55 |
|
Yeah non static typing in languages like Python, PHP et al was an exercise in saving time but it turns out it's fine until it introduces loving horrible and subtle bugs in code. The experiment failed, declaring types is better and strong typing should be a bare minimum (it is a good thing that Python does not coerce types for you).
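The "does not coerce types" point in a few lines, as a sketch: mixing a str and an int is an error rather than a silent conversion.

```python
# Mixing str and int raises instead of guessing what was meant.
try:
    "1" + 1
    outcome = "coerced"
except TypeError:
    outcome = "TypeError"

# Conversions have to be spelled out.
as_number = int("1") + 1
as_text = "1" + str(1)

print(outcome)    # TypeError
print(as_number)  # 2
print(as_text)    # 11
```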
|
|
# ? Oct 17, 2020 23:56 |