Macichne Leainig
Jul 26, 2012

by VG

QuarkJets posted:

Good news! The compat module should trivialize your transition to TF2, but yeah, it's still nonzero effort. There are some significant performance and design improvements in TF2; I recommend it.

I definitely will upgrade at some point, it's just such a low priority in the scheme of things. We're building up for an acquisition at the end of the month, so some other things require due diligence.


QuarkJets
Sep 8, 2008

Wallet posted:

I could be misremembering but I'm pretty sure pathlib handles this on its own. I think the only real reason to use PureWindowsPath or PurePosixPath is if you are trying to gently caress around with a Windows path on a POSIX machine or vice versa.

Yeah; if you just use Path, pathlib will transform the path in whatever way is appropriate to the local system, including changing slash directions

Foxfire_
Nov 8, 2010

Windows also accepts either slash direction.

QuarkJets
Sep 8, 2008

I believe os.path also auto-converts so long as you always use posix-style slashes (e.g. even if you just print the string returned from an os.path function, its slashes will be correct for the OS)

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!

QuarkJets posted:

Yeah; if you just use Path, pathlib will transform the path in whatever way is appropriate to the local system, including changing slash directions

This makes sense and it's what I thought from the last time I did it, but it just ... wasn't. I'll just poke it with a stick again tomorrow and come running back if it's being grouchy.

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!

Rocko Bonaparte posted:

I'll just poke it with a stick again tomorrow and come running back if it's being grouchy.

...and here I am and it's being grouchy.

Windows w/ 3.8.3
code:
Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:37:02) [MSC v.1924 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib
>>> win_path = r"..\this\that.txt"
>>> lin_path = "../this/that.txt"
>>> pw = pathlib.Path(win_path)
>>> pl = pathlib.Path(lin_path)
>>> str(pw)
'..\\this\\that.txt'
>>> str(pl)
'..\\this\\that.txt'
Both paths on Windows come out with backslashes.

code:
Python 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pathlib
>>> win_path = r"..\this\that.txt"
>>> lin_path = "../this/that.txt"
>>> pw = pathlib.Path(win_path)
>>> pl = pathlib.Path(lin_path)
>>> str(pw)
'..\\this\\that.txt'
>>> str(pl)
'../this/that.txt'
It's just regurgitating the slashes I originally set.

I'm guessing that the pathlib on each system is more the problem than anything and doing a str() on the path isn't really sufficient. Is there some other way I should get these paths without having to test for OS?

necrotic
Aug 2, 2005
I owe my brother big time for this!
Is it something they fixed between the two different versions you used?

edit: Ah, nope! The _WindowsFlavour defines an altsep while the _PosixFlavour does not!
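For reference, the same split is visible in the public `ntpath` and `posixpath` modules (the Windows and POSIX implementations behind `os.path`), which can both be imported on any OS. A minimal sketch; pathlib's `_WindowsFlavour`/`_PosixFlavour` are private and may change between Python versions, so this sticks to the public modules.

```python
import ntpath
import posixpath

# Windows paths accept "/" as an alternative separator; POSIX paths have none.
print(repr(ntpath.altsep), repr(posixpath.altsep))  # '/' None

win_style = r"..\this\that.txt"
posix_style = "../this/that.txt"

# Windows rules: both spellings split into a directory part plus "that.txt".
print(ntpath.basename(win_style), ntpath.basename(posix_style))  # that.txt that.txt

# POSIX rules: the backslash version is one single (legal) file name.
print(posixpath.basename(win_style))  # ..\this\that.txt

# normpath under Windows rules also rewrites "/" into "\".
print(ntpath.normpath(posix_style))  # ..\this\that.txt
```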

necrotic fucked around with this message at 20:09 on Oct 13, 2020

Foxfire_
Nov 8, 2010

That looks like correct behavior?

On windows, the paths "..\this\that.txt" and "../this/that.txt" both mean "up one directory, into a directory named this, file name that.txt"
On linux, the path "../this/that.txt" means that, but the path "..\this\that.txt" means "file named ..\this\that.txt" in the current directory. ..\this\that.txt is a legal posix filename because posix filenames are terrible (newlines and nonprinting UTF-8 characters are also legal posix filenames)

If you want to make that be interpreted as a path with directories on linux, you will need to either explicitly tell it to interpret it as a windows path or manipulate the string into a posix path
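A minimal sketch of "explicitly tell it to interpret it as a windows path": the pure path classes apply one OS's rules regardless of the machine they run on, so `PureWindowsPath` can split the backslash string into components, which can then be rebuilt under POSIX rules.

```python
from pathlib import PurePosixPath, PureWindowsPath

raw = r"..\this\that.txt"

# Read with POSIX rules, the backslashes are part of a single file name.
print(PurePosixPath(raw).parts)    # ('..\\this\\that.txt',)

# Read with Windows rules (on any OS), they are directory separators.
as_windows = PureWindowsPath(raw)
print(as_windows.parts)            # ('..', 'this', 'that.txt')

# Reassemble the same components under POSIX rules.
as_posix_path = PurePosixPath(*as_windows.parts)
print(as_posix_path)               # ../this/that.txt
```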

Foxfire_ fucked around with this message at 20:08 on Oct 13, 2020

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!

Foxfire_ posted:

..\this\that.txt is a legal posix filename because posix filenames are terrible (newlines and nonprinting UTF-8 characters are also legal posix filenames)
Yeah that's wild right there. So I guess I need some extra logic for this after all.

Foxfire_
Nov 8, 2010

If you're composing the paths, just always use forward slashes since windows will also accept that.
If you're accepting the paths from users, you'll need to decide how to interpret what they mean when they enter "this\that" on linux since it technically means filename but they probably didn't mean that if they are good upstanding people
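One possible policy for user input, sketched with a hypothetical `parse_user_path` helper: treat both slash directions as separators. This is an assumption about what users mean, and it gives up the ability to name a Linux file that genuinely contains a backslash.

```python
from pathlib import Path


def parse_user_path(text: str) -> Path:
    """Hypothetical policy: treat both backslash and "/" as directory separators."""
    # Normalise to "/", which Path understands on Linux and Windows alike.
    return Path(text.replace("\\", "/"))


print(parse_user_path(r"..\this\that.txt").parts)  # ('..', 'this', 'that.txt')
```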

QuarkJets
Sep 8, 2008

Rocko Bonaparte posted:

Yeah that's wild right there. So I guess I need some extra logic for this after all.

Try building the path instead:

Python code:
from pathlib import Path

output = Path("..", "this", "that.txt")
This will insert the correct separators for whatever system you're currently on. You can also use os.path.join to get the same result but as a string instead of a Path object
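A quick check of that, using only the standard library: `Path(*parts)` and `os.path.join(*parts)` agree on whatever OS runs them, and the pure flavours show what each OS would produce from any machine.

```python
import os.path
from pathlib import Path, PurePosixPath, PureWindowsPath

parts = ("..", "this", "that.txt")

# Both use the separator of the OS you're currently running on.
as_path = Path(*parts)
as_str = os.path.join(*parts)
print(str(as_path) == as_str)   # True on both Windows and Linux

# The pure flavours show each OS's result, regardless of the current machine.
print(PureWindowsPath(*parts))  # ..\this\that.txt
print(PurePosixPath(*parts))    # ../this/that.txt
```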

QuarkJets fucked around with this message at 21:28 on Oct 13, 2020

mr_package
Jun 13, 2000
Does PurePath.as_posix() help at all, at least if running the code on Windows?
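A small sketch of what `as_posix()` does: it returns the path as a string with forward slashes, but it only converts separators that the path's own flavour recognises. On Windows (or with `PureWindowsPath`) that covers both inputs; under POSIX rules the backslash string is still one file name.

```python
from pathlib import PurePosixPath, PureWindowsPath

raw = r"..\this\that.txt"

# Windows flavour: backslashes are separators, so as_posix() swaps them for "/".
print(PureWindowsPath(raw).as_posix())  # ../this/that.txt

# POSIX flavour: the backslashes are part of the name, so nothing changes.
print(PurePosixPath(raw).as_posix())    # ..\this\that.txt
```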

DearSirXNORMadam
Aug 1, 2009
Hello thread, my old friend

I have a need to compute k nearest neighbors on mid-to-large datasets (20k samples and above)

Default implementations of exact kNN via ball tree and the like in sklearn stall out if you actually attempt to do this because they rely on computing the entire pairwise distance matrix for the whole dataset using scipy's single-threaded sadgasm of a distance function.

Wishing to avoid approximate nearest neighbors, I think it may be worth it just to roll my own fast knn with a lazily evaluated distance metric. This way we can split up the dataset into small chunks, and by computing the one vs all distance of a single sample in the chunk, we can use the triangle inequality to guarantee no sample outside the chunk is closer than some distance. After that we can quickly solve a small knn problem.
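A rough sketch of the chunk-plus-triangle-inequality idea, assuming NumPy and Euclidean distance (any true metric works). The function name `knn_pivot_pruned`, the use of each chunk's first row as its pivot, and the block size are illustrative assumptions, not the poster's actual code.

```python
import numpy as np


def knn_pivot_pruned(X, k, chunk_size=256, block=64):
    """Exact k nearest neighbours of every row of X (excluding itself).

    Hypothetical sketch: each chunk of consecutive rows uses its first row as
    a pivot p. The triangle inequality gives d(q, x) >= |d(p, x) - d(p, q)|,
    so once that lower bound exceeds the current k-th best distance, the
    exact distance is never computed. Requires k < len(X).
    """
    X = np.asarray(X, dtype=np.float64)
    n = len(X)
    out = np.empty((n, k), dtype=np.int64)
    for start in range(0, n, chunk_size):
        # The only full one-vs-all pass for this chunk.
        d_pivot = np.linalg.norm(X - X[start], axis=1)
        for q in range(start, min(start + chunk_size, n)):
            lower = np.abs(d_pivot - d_pivot[q])
            order = np.argsort(lower, kind="stable")
            order = order[order != q]  # never return the query itself
            best_d = np.empty(0)
            best_i = np.empty(0, dtype=np.int64)
            for b in range(0, len(order), block):
                cand = order[b:b + block]
                # Candidates are sorted by lower bound, so once the next bound
                # exceeds the k-th best distance, nothing later can beat it.
                if len(best_d) == k and lower[cand[0]] > best_d[-1]:
                    break
                d = np.linalg.norm(X[cand] - X[q], axis=1)
                all_d = np.concatenate([best_d, d])
                all_i = np.concatenate([best_i, cand])
                keep = np.argsort(all_d, kind="stable")[:k]
                best_d, best_i = all_d[keep], all_i[keep]
            out[q] = best_i
    return out
```

The per-query loop is still plain Python, so for real speed the same logic would want Numba, Cython, or similar.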

My problem now is how best to go about writing a performant, lazily evaluated, cached distance function in python? Ideally it would be something dictionary or hashmap-esque, where the key is a pair of indices representing two samples and the value is evaluated once when called, then stored for future reference. Unfortunately python dictionaries get sort of slow if they get really huge. Does anyone have suggestions?
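A plain-Python baseline for the lazily evaluated, cached distance described above, using a hypothetical `LazyDistances` class and Euclidean distance via `math.dist` (Python 3.8+). Storing each unordered pair once under `(min, max)` halves the cache size; note that a dict entry costs far more memory than one slot in a dense array, so if most pairs end up evaluated, a precomputed NumPy matrix may be the better trade.

```python
import math
from typing import Dict, Sequence, Tuple


class LazyDistances:
    """Symmetric distance cache: each pair of indices is computed at most once."""

    def __init__(self, points: Sequence[Sequence[float]]):
        self.points = points
        self._cache: Dict[Tuple[int, int], float] = {}
        self.computed = 0  # how many distances were actually evaluated

    def __call__(self, i: int, j: int) -> float:
        if i == j:
            return 0.0
        key = (i, j) if i < j else (j, i)  # d(i, j) == d(j, i): store one copy
        d = self._cache.get(key)
        if d is None:
            d = math.dist(self.points[i], self.points[j])
            self._cache[key] = d
            self.computed += 1
        return d
```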

(The temptation to do this in rust and then try to get it to be callable from python is overwhelming, but I know that that way lies only madness)

DearSirXNORMadam fucked around with this message at 05:46 on Oct 15, 2020

Foxfire_
Nov 8, 2010

That sounds large enough & you caring about performance enough that using python is a bad idea. Python is very, very slow. If you're doing large numerical stuff, you can really only use it as a glue language to connect together pieces implemented in something else. Backing it with rust or C, or writing in a python-adjacent language like Cython or numba seem like better ideas to me.

How many times are you going to use this thing, and how many points are you going to classify? If it's not that many, a naive iteration through all points for every lookup in not-Python might be the fastest thing, considering your time to implement it. 20,000 points is not that many. Making up some numbers and calling each point 100 doubles long, that's 20,000 × 100 × 8 bytes = 16MB of memory to go through per classification.
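To make the brute-force suggestion concrete: a minimal sketch of exact kNN that scans every point, written in NumPy so the inner loops run in compiled code rather than the interpreter. The function name, Euclidean metric, and block size are assumptions; handling `block` query rows at a time keeps the temporary distance matrix at `block × n` instead of `n × n`.

```python
import numpy as np


def brute_force_knn(X, k, block=256):
    """Exact k nearest neighbours (Euclidean) of every row of X, excluding itself.

    Hypothetical helper: compares each query against all points, `block`
    queries at a time. Requires k < len(X).
    """
    X = np.asarray(X, dtype=np.float64)
    n = len(X)
    sq = np.einsum("ij,ij->i", X, X)  # squared norm of every row
    out = np.empty((n, k), dtype=np.int64)
    for start in range(0, n, block):
        stop = min(start + block, n)
        # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, for a whole block at once
        d2 = sq[start:stop, None] + sq[None, :] - 2.0 * (X[start:stop] @ X.T)
        rows = np.arange(stop - start)
        d2[rows, rows + start] = np.inf  # a point is not its own neighbour
        part = np.argpartition(d2, k, axis=1)[:, :k]  # the k smallest, unordered
        order = np.argsort(np.take_along_axis(d2, part, axis=1), axis=1)
        out[start:stop] = np.take_along_axis(part, order, axis=1)
    return out
```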

SurgicalOntologist
Jun 17, 2004

Why avoid ANN?

OnceIWasAnOstrich
Jul 22, 2006

Consider taking a look at FAISS https://github.com/facebookresearch/faiss. It can do a lot of things, but the core of it is extremely fast kNN search on extremely large datasets, with a variety of strategies for speeding things up on hideously big datasets. 20k is honestly pretty small (exhaustive search gets infeasible around 1M vectors with that code), so you can just use the flat index for exact results and not rely on the index tricks for approximate results. Since you refer to scipy's distance function, I'm assuming you are using something normal like Euclidean distance, so it probably includes a C implementation of what you are using already.

edit: That said, 20k is not a lot of vectors at all, and I've used the sklearn BallTree with no problems for both Euclidean and Cosine distance metrics on similarly sized datasets. The scipy pdist() function only takes about 10 seconds on my machine to compute the full distance matrix for 20k vectors of 100 dimensions each. Don't give it a callable metric; that will ruin things with Python overhead. If you can't use any of the included metrics, you'll need to construct your method with not-Python in some way.

OnceIWasAnOstrich fucked around with this message at 15:45 on Oct 15, 2020

DearSirXNORMadam
Aug 1, 2009

This is a pretty good recommendation! I am definitely keeping it in mind if my version chokes when I go above 1 million data points or so

That being said, at the risk of someone cross-posting this to the coding horrors thread, because I am DEFINITELY not a real programmer, this function appears to work fine. (It seems to resolve ties differently than whatever sklearn knn does, but whatever, good enough for government work)

https://pastebin.com/vXkuJ0Xe

If anyone sees something real bad in there though, please save me from myself.

CarForumPoster
Jun 26, 2013

⚡POWER⚡
For those using Python on a team, do you have a "python fundamentals for common scenarios at this company" training?

I wrote a style guide for new people during onboarding that sets some of the preferences/expectations of how we write code. I find that this works pretty well and gives me something to reference for comments on PRs to train the new people quickly, especially coders with 2+ years in the field.

For interns/new grads, I find that some very common "learning python" lessons get repeated.

Examples of things that go in there that I get asked by any intern or new grad would be:
-What is anaconda and virtualenv?
-How to iterate over dicts and dataframes (and to not use for i in range(0, len(x)))
-A video with git fundamentals (we also have a git process guide that explains how we do PRs)
-Common stuff with querying APIs/interacting with websites (e.g. selenium vs requests, cover the requests params)
-Django file layout
-Web Frameworks
-Turning JSON into a DataFrame & Flattening JSON
-Pandas Merge/Concat/Join/Append
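For the "don't use `for i in range(0, len(x))`" item, a minimal sketch of the usual plain-Python replacements (`enumerate`, `dict.items()`, `zip`), with made-up data; DataFrames have their own idioms that aren't shown here.

```python
prices = {"item1": 50, "item2": 200}
labels = ["item1", "item2"]
ids = [1, 2]

# Instead of: for i in range(0, len(labels)): label = labels[i]
numbered = []
for i, label in enumerate(labels, start=1):
    numbered.append(f"{i}. {label}")

# Keys and values together, without indexing back into the dict.
expensive = []
for label, price in prices.items():
    if price > 100:
        expensive.append(label)

# Two sequences in lockstep, instead of sharing an index between them.
pairs = []
for item_id, label in zip(ids, labels):
    pairs.append((item_id, label))
```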

CarForumPoster fucked around with this message at 13:17 on Oct 16, 2020

Hollow Talk
Feb 2, 2014
How to use a code formatter (especially if projects are set up for it). The number of times I get merge requests etc. where spacing and other stuff is all over the place is quite impressive, especially since it's built into things like PyCharm.

I also always tell people about type annotations, since they immediately help with development (e.g. in PyCharm) and make following code easier.

QuarkJets
Sep 8, 2008

A lot of people don't know how to use list comprehensions or f-strings, both super useful tools. Just providing contrasting examples of these vs not these can be helpful.
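A contrasting pair of the kind described, with made-up data: a loop-and-append next to the equivalent list comprehension, and `str.format()` next to the equivalent f-string.

```python
prices = [50, 200, 75]

# Loop-and-append...
with_shipping_loop = []
for p in prices:
    with_shipping_loop.append(p + 10)

# ...and the same thing as a list comprehension.
with_shipping = [p + 10 for p in prices]

# str.format() and an f-string producing identical text.
old_style = "{} items, total {:.2f}".format(len(with_shipping), sum(with_shipping))
new_style = f"{len(with_shipping)} items, total {sum(with_shipping):.2f}"
```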

DearSirXNORMadam
Aug 1, 2009
Wait holy smokes when did Python get (I assume optional?) types? I've been living under a rock for a while, admittedly, but holy crap!

Do the type hints improve performance substantially?

Zugzwang
Jan 2, 2005

You have a kind of sick desperation in your laugh.


Ramrod XTreme

Mirconium posted:

Wait holy smokes when did Python get (I assume optional?) types? I've been living under a rock for a while, admittedly, but holy crap!

Do the type hints improve performance substantially?
Type hinting alone doesn’t improve performance. It does help developers know what is supposed to go where and what’s supposed to come out of a function. It’s also not enforced at all; you can hint that a function takes an int argument and returns a string, but write code that accepts a list and returns a dict.

On the other hand, if you write performance-intensive functions in Cython, you can potentially get huge performance gains by doing nothing special except declaring types.

Imbroglio
Mar 8, 2013

Mirconium posted:

Wait holy smokes when did Python get (I assume optional?) types? I've been living under a rock for a while, admittedly, but holy crap!

Do the type hints improve performance substantially?

The type annotations are only used by third party tools, python ignores them at runtime, so unfortunately no performance impact.
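A minimal demonstration that annotations are recorded but not enforced: the hypothetical `mislabelled` function claims to return `str` yet returns a list, and Python runs it without complaint. A static checker such as mypy would flag the return statement instead.

```python
def mislabelled(x: int) -> str:
    # Annotated as int -> str, but nothing stops it returning a list.
    return [x, x]


# Python stores the hints but never checks them.
print(mislabelled.__annotations__)  # {'x': <class 'int'>, 'return': <class 'str'>}
print(mislabelled(3))               # [3, 3], no error at runtime
```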

Dominoes
Sep 20, 2007

The ability to let your IDE (like PyCharm) or a type checker (like mypy) check your code makes type hints a game changer for the language. Related: check out dataclasses, which use them.
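Dataclasses are a good example of annotations doing real work: `@dataclass` reads the class's annotated fields to generate `__init__`, `__repr__`, and `__eq__`, and `asdict()` turns an instance into a plain dict. A small sketch with a hypothetical `Item` class:

```python
from dataclasses import asdict, dataclass


@dataclass
class Item:
    # These annotations are what @dataclass uses to build __init__, __repr__, __eq__.
    id: int
    label: str
    price: float


item = Item(1, "item1", 50.0)
print(item)          # Item(id=1, label='item1', price=50.0)
print(asdict(item))  # {'id': 1, 'label': 'item1', 'price': 50.0}
```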

Walh Hara
May 11, 2012

quote:

For those using Python on a team, do you have a "python fundamentals for common scenarios at this company" training?

Yes, but we spend much more time on non-Python-specific stuff, e.g. the importance of naming your variables, how to split up your code into functions and methods, how to split your functionality into multiple files/objects, some very basic OOP concepts, some very basic FP concepts, etc.

On that topic: does somebody know a Python equivalent of Clean Code by Robert C. Martin? I.e. the same topics, but with Python code examples instead of Java code examples?

Zoracle Zed
Jul 10, 2001

Dominoes posted:

The ability to let your IDE (like PyCharm) or a type checker (like mypy) check your code makes type hints a game changer for the language. Related: check out dataclasses, which use them.

something I wish I'd realized a long time ago: pass a list of dataclass instances to the pandas DataFrame constructor and it just figures out the appropriate column names and types

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Zoracle Zed posted:

something I wish I'd realized a long time ago: pass a list of dataclass instances to the pandas DataFrame constructor and it just figures out the appropriate column names and types

I use pandas daily. Can you give an example of this?

DearSirXNORMadam
Aug 1, 2009

Dominoes posted:

The ability to let your IDE (like PyCharm) or a type checker (like mypy) check your code makes type hints a game changer for the language. Related: check out dataclasses, which use them.

I mean, I appreciate Python as much as the next guy, and most of my coding is in Python, but at that point wouldn't it just be easier to go to a fully typed language?

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Mirconium posted:

I mean, I appreciate Python as much as the next guy, and most of my coding is in Python, but at that point wouldn't it just be easier to go to a fully typed language?

Obviously no?

I want to make a serverless webapp that gets some data from 3 or 4 REST APIs, turns it into a couple graphs and an excel file, then sends the graphs and excel file as an email to my boss.

Give me some fully typed languages where that's easier than Python because that's a 1 or 2 day project if the APIs are your standard REST returning JSON and authenticating w/an API key. Whether you choose to use typing or not is irrelevant.

susan b buffering
Nov 14, 2016

Mirconium posted:

I mean, I appreciate Python as much as the next guy, and most of my coding is in Python, but at that point wouldn't it just be easier to go to a fully typed language?

I don’t see how it would be easier to learn a new language + ecosystem than to add type annotations to your Python code.

Dominoes
Sep 20, 2007

Mirconium posted:

I mean, I appreciate Python as much as the next guy, and most of my coding is in Python, but at that point wouldn't it just be easier to go to a fully typed language?
Typing works well in Python, and (subjective, of course) is almost always worth it, even for small programs.

Something interesting happened, gradually I think, over the past decades: explicit typing went from being something compilers required in order to lay out memory, which could be a chore, to a powerful tool that, with type inference and richer type systems, lets compilers, IDEs, and other tools (like mypy) catch bugs and ensure your program acts how you intend.

Dominoes fucked around with this message at 03:00 on Oct 17, 2020

susan b buffering
Nov 14, 2016

It also gets easier to use with each release. 3.9 added the ability to type hint collections without importing them from the typing module. So you can do list[int] instead of typing.List[int].
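A small sketch of the 3.9+ spelling, with a hypothetical `word_lengths` function; the pre-3.9 equivalent is shown in the comment.

```python
# Python 3.9+: builtin collections can be subscripted directly in annotations.
def word_lengths(words: list[str]) -> dict[str, int]:
    return {w: len(w) for w in words}

# Before 3.9 the same signature needed the typing module:
#   from typing import Dict, List
#   def word_lengths(words: List[str]) -> Dict[str, int]: ...
```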

Dominoes
Sep 20, 2007

That's awesome.

NinpoEspiritoSanto
Oct 22, 2013




I've caught bugs thanks to type annotations, use them

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
Types are computer checked static assertions/tests. Use them!

Malcolm XML fucked around with this message at 05:01 on Oct 17, 2020

lazerwolf
Dec 22, 2009

Orange and Black

CarForumPoster posted:

I use pandas daily. Can you give an example of this?

I did something similar today with lists of objects

code:
from pandas import DataFrame
my_list = [
   { 'id': 1, 'label': 'item1', 'price': 50 }, 
   { 'id': 2, 'label': 'item2', 'price': 200 },
   # ...etc
]
df = DataFrame(my_list)
print(df)


   id  label  price
0   1  item1     50
1   2  item2    200
I believe I've done something similar with Django objects as well

DearSirXNORMadam
Aug 1, 2009

Dominoes posted:

Typing works well in Python, and (subjective, of course) is almost always worth it, even for small programs.

Something interesting happened, gradually I think, over the past decades: explicit typing went from being something compilers required in order to lay out memory, which could be a chore, to a powerful tool that, with type inference and richer type systems, lets compilers, IDEs, and other tools (like mypy) catch bugs and ensure your program acts how you intend.

Oh, I agree 100%; this is half of why I love Rust. I can count on one hand the number of times I've really had to debug a Rust program that compiles, which is half due to static typing, so I tremendously appreciate the benefits of static typing as implicit debugging.

I think the question I was trying to ask, which I ask as a non-CS-educated person (mainly a biologist) who genuinely doesn't know, is what are the benefits that Python derives from NOT having static types?

My best guess from experience with Java/Rust is probably that you don't have to spend half your time writing abstract classes and interfaces when you need something generic to happen, since in Python you can just write a function that takes it on faith that whatever you pass it will have a .foo() method that does something .foo() should do?
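On the `.foo()` example: at runtime Python simply calls the method on whatever object it is given. If you later want static checking of the same idea, `typing.Protocol` (Python 3.8+) describes "anything with a `.foo()` method" structurally, with no abstract base class or inheritance required. A sketch with hypothetical class names:

```python
from typing import Protocol


class HasFoo(Protocol):
    """Anything with a .foo() -> str method matches; no inheritance needed."""

    def foo(self) -> str: ...


class Dog:
    def foo(self) -> str:
        return "woof"


class Teapot:
    def foo(self) -> str:
        return "steam"


def describe(thing: HasFoo) -> str:
    # At runtime this is plain duck typing: it just calls .foo().
    # A type checker would reject describe(42), since int has no .foo().
    return f"it says {thing.foo()}"


results = [describe(Dog()), describe(Teapot())]
```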

(Personally I continue to use Python because it has an obscenely good ecosystem of libraries which makes it extremely versatile. Say what you will about how they handled the transition from 2 to 3, but from an outsider's view it seems like Python has pretty good governance as a language if its library ecosystem is any indication. But if the Rust library ecosystem was as extensive as Python's, I would never write a line of another language again)

Dominoes
Sep 20, 2007

Mirconium posted:

I think the question I was trying to ask, which I ask as a non-CS-educated person (mainly a biologist) who genuinely doesn't know, is what are the benefits that Python derives from NOT having static types?
My best guess is that it was born in an era before static typing as a verification tool was popular. But perhaps there's a more subtle reason I'm missing.

quote:

But if the Rust library ecosystem was as extensive as Python's, I would never write a line of another language again
I'm optimistic about that as well. I think the timeline will vary significantly based on the area. For example, despite the work on the web server ecosystem, Rust has nothing near as powerful or feature-rich as Django. And for numerical/scientific computing, Python's lead will be hard to catch up with. I can see Rust being used, via FFI, to optimize libraries that are called by Python.

Another key perk of Python that I think Rust won't be able to catch up with is the REPL, especially variants like IPython: using it for quickly testing functions, as a powerful, customizable calculator, etc.

Dominoes fucked around with this message at 17:35 on Oct 17, 2020

Foxfire_
Nov 8, 2010

The original reason for Python not having static types is that it makes writing the interpreter easier. In the interpreter, every single Python object is the same type (PyObject) and supports exactly the same set of operations & data. Python grew out of a one-man hobby project, and lots of its warts stem from that. Its predecessor language (ABC) did have a static type checker (and so did Algol, FORTRAN and C, all of which are much older).

Dynamic types also make objects changeable at runtime, which is useful for modifying stuff for tests or messing with other peoples code. Every function/attribute access is internally "Look up a value in a map of string -> PyObject, throw AttributeError if that key doesn't exist", so you can trivially rewire those mappings at runtime. Also people like having less stuff to type when working interactively or doing little shell script type things and those were python's main target area for most of its history. Guido van Rossum was resistant to adding type hints up until he moved from academia to having to maintain large codebases.
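A small sketch of that runtime rewiring, with a hypothetical `Greeter` class. (The "map of string to PyObject" description is a simplification; descriptors and `__slots__` add wrinkles.) Assigning to an instance attribute shadows the class's method for that one object, and `unittest.mock.patch.object` uses the same mechanism to swap an attribute temporarily in tests.

```python
from unittest import mock


class Greeter:
    def greet(self) -> str:
        return "hello"


g = Greeter()

# Stored in g.__dict__, so it shadows Greeter.greet for this object only.
g.greet = lambda: "patched on the instance"

# In tests, patch.object swaps the class attribute and restores it on exit.
with mock.patch.object(Greeter, "greet", return_value="mocked"):
    during = Greeter().greet()
after = Greeter().greet()
```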


NinpoEspiritoSanto
Oct 22, 2013




Yeah, non-static typing in languages like Python, PHP et al was an exercise in saving time, but it turns out it's fine until it introduces loving horrible and subtle bugs in code. The experiment failed: declaring types is better, and strong typing should be a bare minimum (it is a good thing that Python does not coerce types for you).
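A two-line illustration of "does not coerce types for you": adding a `str` and an `int` raises `TypeError` instead of guessing, so the conversion has to be written out explicitly.

```python
error = ""
try:
    "1" + 1  # str + int: no implicit conversion in either direction
except TypeError as exc:
    error = str(exc)

# The conversion has to be spelled out, and you choose which one you meant.
as_number = int("1") + 1  # 2
as_text = "1" + str(1)    # '11'
```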
