Python

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python

SurgicalOntologist: Jun 17, 2004

I've got a little puzzle... we have a bunch of Python packages with a similar test setup: pytest with pytest-cov, pytest-flake8, pytest-docstyle, and pytest-mypy. We just copied this configuration into a new library, one that has zero actual tests at this point (for the purpose of getting at least the linting for now), and we got 49% coverage. What gives? Does one of these linters actually execute the code in a way that registers the coverage measurement? And if so why only 49%? We're all quite confused.

# ? Oct 10, 2019 16:34


zhar: May 3, 2019

Very simple question, I have a bunch of files like 1.json, 2.json, and so on that I need to play with in that order.

I try to use this:

code:

for file in sorted(os.listdir()):

and it gives me the list in this order: 1.json, 101.json etc. How do I get it to sort in the correct order?

# ? Oct 12, 2019 20:57

QuarkJets: Sep 8, 2008

You could zero pad all of the file names that start with a number prior to sorting

# ? Oct 12, 2019 21:07

necrotic: Aug 2, 2005; I owe my brother big time for this!

If you can't rename them then a for i in range(len(file_list)) and constructing the filename from i+1 would be easy, assuming no numbers are skipped in the filenames.

# ? Oct 12, 2019 21:19

a foolish pianist: May 6, 2007; (bi)cyclic mutation

code:

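import os

files = os.listdir(directory)
file_numbers = set()
for file in files:
    # "12.json" -> 12; assumes every name looks like <number>.json
    file_numbers.add(int(file.split(".")[0]))

# Visit the numbers in ascending order, skipping any gaps
high_number = max(file_numbers)
for i in range(high_number + 1):
    if i in file_numbers:
        do_thing("{}.json".format(i))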

# ? Oct 12, 2019 21:37

zhar: May 3, 2019

Thanks, I ended up padding with zfill.
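For anyone finding this later, here is a minimal sketch of the zfill approach. It assumes every file name is a number plus an extension; the `names` list, the `padded_key` helper and the width of 8 are made up for illustration.

```python
import os

# Hypothetical listing; in practice this would come from os.listdir()
names = ["1.json", "101.json", "2.json", "10.json"]

def padded_key(name, width=8):
    # Zero-pad the stem so plain string comparison matches numeric order
    stem, ext = os.path.splitext(name)
    return stem.zfill(width) + ext

ordered = sorted(names, key=padded_key)
print(ordered)
```

Padding only inside the sort key leaves the files on disk untouched, which may matter if something else depends on the original names.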

# ? Oct 12, 2019 21:49

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. -Bertrand Russell

https://www.vice.com/en_us/article/zmjwda/a-code-glitch-may-have-caused-errors-in-more-than-100-published-studies?utm_source=reddit.com

The original paper can be found here: https://pubs.acs.org/doi/10.1021/acs.orglett.9b03216

If you dig enough you can find out that they assumed glob.glob output was sorted.

# ? Oct 13, 2019 03:02

Malcolm XML: Aug 8, 2009; I always knew it would end like ｔｈｉｓ．

Natural sort: https://stackoverflow.com/questions/4836710/does-python-have-a-built-in-function-for-string-natural-sort
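A rough sketch of the usual natural-sort key, for illustration only. The `natural_key` name and the sample file names are assumptions, not taken from the linked answer.

```python
import re

def natural_key(name):
    # Split into digit and non-digit runs; compare digit runs as integers
    parts = re.split(r"(\d+)", name)
    return [int(p) if p.isdigit() else p.lower() for p in parts]

names = ["file10.json", "file2.json", "File1.json", "file101.json"]
naturally_sorted = sorted(names, key=natural_key)
print(naturally_sorted)
```

Because `re.split` with a capturing group always alternates text and digit runs, the keys compare string with string and int with int, so mixed names should not raise a `TypeError`.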

# ? Oct 13, 2019 03:22

VictualSquid: Feb 29, 2012; Gently enveloping the target with indiscriminate love.

I am reading my way through the testing goat book right now.

At some point he recommends automating deployment using:

code:

run(f'python3.6 -m venv virtualenv')

But that will obviously fail on a modern system. I can just change it to 3.7, but that feels janky.

When I googled it it looks like there is no option or even variant to automatically create a venv with a python version that isn't already installed through the package manager.

To me that implies that everybody in the python world expects that their software will be utterly abandoned by all users by the time 3.8 becomes the default for new linux installs. Is that really a reasonable assumption in the python world?

# ? Oct 13, 2019 16:16

Dominoes: Sep 20, 2007

I think the assumption is that you must have the desired Python version installed. On Linux or Mac, this is straightforward to do by building from source; following the included instructions, it will by default install to `/usr/local/bin` under a versioned alias such as `python3.8`. You can then run either `python3.8 -m venv virtualenv` or `/usr/local/bin/python3.8 -m venv virtualenv`. (You may be able to find unofficial package-manager versions, but these are distro-specific and may or may not be a pain to install.) On Windows, the best way may be to use the installers; I don't think you'll get the aliases, but you can call the interpreter by its full path, e.g. `C:\Program Files\Python38\python.exe`.

quote:

When I googled it it looks like there is no option or even variant to automatically create a venv with a python version that isn't already installed through the package manager.

With a tool I created recently, you'd run `pyflow switch 3.8`, and `pyflow install` to download and switch to the new version. I think Conda may do this as well.

VictualSquid posted:

To me that implies that everybody in the python world expects that their software will be utterly abandoned by all users by the time 3.8 becomes the default for new linux installs.

Chances are, there will never be a single default across Linux distros. Based on history, 3.8 won't be the most popular version for a long time. I think you'll find there are many different workflows around, and using an automated venv-creator tied to a version/alias is one of many techniques. Code built for Python 3.6 etc. should work on 3.8... but not vice versa.
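One way to make a deploy script less tied to a single alias, sketched under assumptions: try a list of interpreter names and fall back to whatever is running the script. The `pick_python` helper and the candidate list are hypothetical and not from the book.

```python
import shutil
import sys

def pick_python(candidates=("python3.8", "python3.7", "python3.6"),
                which=shutil.which):
    # Return the first interpreter name found on PATH, else the running one
    for name in candidates:
        if which(name):
            return name
    return sys.executable

# Build the command the deploy script would run
print(f"{pick_python()} -m venv virtualenv")
```

This only picks among interpreters that are already installed; it does not download anything.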

Dominoes fucked around with this message at 16:38 on Oct 13, 2019

# ? Oct 13, 2019 16:25

Dominoes: Sep 20, 2007

Thermopyle posted:

https://www.vice.com/en_us/article/zmjwda/a-code-glitch-may-have-caused-errors-in-more-than-100-published-studies?utm_source=reddit.com

The original paper can be found here: https://pubs.acs.org/doi/10.1021/acs.orglett.9b03216

If you dig enough you can find out that they assumed glob.glob output was sorted.

I haven't dug into the original chem papers, but it's surprising this wasn't picked up earlier due to surprising or inconsistent results... it makes me wonder how much publication bias played in.

# ? Oct 13, 2019 16:36

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. -Bertrand Russell

Yeah.

It was actually caught because of different results in the same lab on computers running different OS's.

# ? Oct 13, 2019 16:39

CarForumPoster: Jun 26, 2013; ⚡POWER⚡

Dominoes posted:

I think the assumption is you must have the desired Py version installed.

Yea, auto installing the latest python version for an environment of an otherwise known config seems like begging for problems.

# ? Oct 13, 2019 19:01

QuarkJets: Sep 8, 2008

Dominoes posted:

I haven't dug into the original chem papers, but it's surprising this wasn't picked up earlier due to surprising or inconsistent results... it makes me wonder how much publication bias played in.

It's a small miracle that it was discovered at all

The fundamental issue was that whoever wrote the script assumed that glob provides sorted results, so the returned file ordering was OS-dependent. Someone who's not actively looking for bugs in the script likely wouldn't notice this assumption, and in fact, hundreds of researchers using the script didn't notice the issue for over 5 years; how often does a chemist perform code review for widely-used software that they didn't write? I doubt it ever happens.
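To make the pitfall concrete, here is a small sketch with made-up file names (not the actual script from the paper). `glob.glob` is documented as returning matches in arbitrary order, so the safe habit is to sort explicitly.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.out", "a.out", "c.out"]:
        open(os.path.join(tmp, name), "w").close()

    # glob.glob returns matches in whatever order the OS hands them back
    unordered = glob.glob(os.path.join(tmp, "*.out"))
    # Sorting explicitly makes the order the same on every platform
    ordered = sorted(unordered)
    basenames = [os.path.basename(p) for p in ordered]

print(basenames)
```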

And then this error produces output that still looks entirely reasonable; in the example provided, one OS produced a value of 173.2 while another produced 172.7. A lot of the studies suffering from the problem might not even be sensitive to a calculation bias this small. And many of those studies actually published correct results, because they were running on an OS where glob coincidentally provided results in the order that the script was expecting

We like to imagine that scientists are rigorously reverifying each other's work all the time, but the fact of the matter is that there's little fame or profit to be found in doing that kind of essential work, so this reverification process doesn't happen nearly as often as it needs to. And in this reverification study, if the student had been running on an OS that provides the same glob ordering as the original study then the issue would have remained undiscovered.

# ? Oct 14, 2019 00:56

QuarkJets: Sep 8, 2008

CarForumPoster posted:

Yea, auto installing the latest python version for an environment of an otherwise known config seems like begging for problems.

It's kind of a ghetto way to do version control but yeah, it makes sense. If you wrote software for Python 3.7, you don't want someone to assume that it'll continue working fine under Python 4.5 in the year 2057

# ? Oct 14, 2019 00:59

The March Hare: Oct 15, 2006; Je rêve d'un Wayne's World 3; Buglord

So I've worked with Python GUI development before and packaging is such a nightmare.

https://build-system.fman.io/docs/

I just found this which lets you make a pyqt5 app and build it with `fbs freeze` or create an installer with `fbs installer` and ran through their little tutorial and everything worked flawlessly. Does anyone have any experience going beyond the tutorial with this thing? It seems way too good to be true.

# ? Oct 15, 2019 04:55

cinci zoo sniper: Mar 15, 2013

3.8 is here.

# ? Oct 15, 2019 10:09

punished milkman: Dec 5, 2018; would have won

cinci zoo sniper posted:

3.8 is here.

The f-string debug unpacking shorthand or whatever it is seems neat I guess
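For reference, the feature is the `=` specifier inside f-strings, new in 3.8. A tiny sketch with made-up variable names:

```python
user = "zhar"
count = 3
# Python 3.8+: a trailing "=" prints the expression text and then its repr
line = f"{user=} {count=} {count * 2=}"
print(line)  # user='zhar' count=3 count * 2=6
```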

# ? Oct 15, 2019 17:24

Hed: Mar 31, 2004; Fun Shoe

The walrus operator looks neat but I'll have to practice with it to see where it's good and more importantly where it might impair readability.

# ? Oct 17, 2019 19:06

SurgicalOntologist: Jun 17, 2004

Yeah, looks possible to overdo but I'll probably end up using it.

# ? Oct 17, 2019 19:57

Cockmaster: Feb 24, 2002

I've been playing around with the multiprocessing library, and it looks like for what I was hoping to do with it (run a function which needs to execute on the order of 10,000 times, taking a total of roughly 1-2 seconds with a single thread), the overhead would use more time than I'd save.

Are there any other solutions for parallel processing in Python under Windows 10, ones that don't add major overhead? This function is probably too complex to be a good candidate for CUDA.

# ? Oct 19, 2019 03:21

QuarkJets: Sep 8, 2008

Is that 1-2 seconds overall, or 1-2 seconds per function call? If the former, you can probably still get a small performance increase but it may not be noticeable. If the latter, multiprocessing is extremely well-suited to your problem

But... you should use concurrent.futures instead. It's basically a high-level wrapper for multiprocessing that was introduced in Python 3.2. It's multiprocessing (and also multithreading), but even easier to use.

If you're mostly dealing with primitives and arrays, an even better and way-faster option is to use Numba to compile your function. Numba comes with a bunch of parallelization features, or you can just turn off the GIL and use your own multithreading with concurrent.futures if that's what you prefer.

# ? Oct 19, 2019 07:43

Cockmaster: Feb 24, 2002

QuarkJets posted:

Is that 1-2 seconds overall, or 1-2 seconds per function call? If the former, you can probably still get a small performance increase but it may not be noticeable. If the latter, multiprocessing is extremely well-suited to your problem

But... you should use concurrent.futures instead. It's basically a high-level wrapper for multiprocessing that was introduced in Python 3.2. It's multiprocessing (and also multithreading), but even easier to use.

If you're mostly dealing with primitives and arrays, an even better and way-faster option is to use Numba to compile your function. Numba comes with a bunch of parallelization features, or you can just turn off the GIL and use your own multithreading with concurrent.futures if that's what you prefer.

It's 1-2 seconds overall. If concurrent.futures is based on the multiprocessing library, wouldn't it have the same problem with overhead as using multiprocessing by itself?

It looks like Numba is my best option here (short of rewriting the function with Cython). Thank you.

# ? Oct 19, 2019 13:57

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. -Bertrand Russell

Found a list of some use cases for the new walrus operator. I'm sure going forward we're all going to hear a million times "what's it good for?".

https://github.com/vlevieux/Walrus-Operator-Use-Cases

# ? Oct 19, 2019 23:04

QuarkJets: Sep 8, 2008

Cockmaster posted:

It's 1-2 seconds overall. If concurrent.futures is based on the multiprocessing library, wouldn't it have the same problem with overhead as using multiprocessing by itself?

It looks like Numba is my best option here (short of rewriting the function with Cython). Thank you.

Yes, that's right. I'd advise using concurrent.futures if your problem was 1-2 seconds per function call * 10000 iterations, but it probably won't help when each function call takes only 0.1-0.2 ms (1-2 seconds spread over 10,000 calls)
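If in doubt, measuring is cheap. The sketch below times a serial loop against an executor for a deliberately tiny task; `tiny_task` and `timed` are hypothetical stand-ins, and a thread pool is used only to keep the example portable.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def tiny_task(n):
    # Hypothetical stand-in for a function that takes well under 1 ms
    return n * n

def timed(label, fn):
    # Run fn once and report the wall-clock time it took
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.4f} s")
    return result

items = range(10_000)
serial = timed("serial", lambda: [tiny_task(n) for n in items])

# Swapping in ProcessPoolExecutor (behind an `if __name__ == "__main__":`
# guard on Windows) adds pickling and process start-up costs on top.
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled = timed("pool", lambda: list(pool.map(tiny_task, items)))
```

For work this small the per-item dispatch cost tends to dominate, which is the overhead being discussed; the exact numbers will depend on the machine.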

Also yeah, Numba is great and I prefer it over Cython, both for ease of use and for performance. In fact if you can compile the function in nopython mode then it may run so fast that you don't even need to consider parallelism

# ? Oct 19, 2019 23:19

Private Speech: Mar 30, 2011; I HAVE EVEN MORE WORTHLESS BEANIE BABIES IN MY COLLECTION THAN I HAVE WORTHLESS POSTS IN THE BEANIE BABY THREAD YET I STILL HAVE THE TEMERITY TO CRITICIZE OTHERS' COLLECTIONS

IF YOU SEE ME TALKING ABOUT BEANIE BABIES, PLEASE TELL ME TO

EAT. SHIT.

Thermopyle posted:

Found a list of some use cases for the new walrus operator. I'm sure going forward we're all going to hear a million times "what's it good for?".

https://github.com/vlevieux/Walrus-Operator-Use-Cases

The list comprehension stuff seems genuinely good, but the rest is a bit meh.
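A small sketch of the comprehension case, where the walrus avoids calling a function twice per item. `slow_transform` is a made-up stand-in for something expensive.

```python
def slow_transform(x):
    # Hypothetical stand-in for an expensive call
    return x * x - 10

values = [1, 2, 3, 4, 5]
# Call slow_transform once per item and keep only the positive results
kept = [y for x in values if (y := slow_transform(x)) > 0]
print(kept)  # [6, 15]
```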

# ? Oct 20, 2019 11:30

punished milkman: Dec 5, 2018; would have won

Private Speech posted:

The list comprehension stuff seems genuinely good, but the rest is a bit meh.

I'm sure that this is at least partially because I haven't really used the walrus operator yet and I'll need time to adjust, but it seems so unintuitive and makes the code way less readable for me.

# ? Oct 20, 2019 14:08

Sad Panda: Sep 22, 2004; I'm a Sad Panda.

If I understand it properly, my main use will be shortening...

code:

invalid = True
while invalid:
    # do stuff, setting invalid = False once the result is acceptable
    ...
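As a sketch of what the shortened form might look like, here is a read-until-empty loop written with the walrus operator (Python 3.8+). The `stream` object is a made-up input source standing in for a file or socket.

```python
import io

# Hypothetical input source standing in for a file or socket
stream = io.StringIO("abcdefghij")

chunks = []
# Read and test in one expression; the loop ends on the empty string
while (chunk := stream.read(4)):
    chunks.append(chunk)
print(chunks)  # ['abcd', 'efgh', 'ij']
```

The flag variable disappears because the value being tested is assigned inside the loop condition itself.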

# ? Oct 20, 2019 14:49

NinpoEspiritoSanto: Oct 22, 2013

Sad Panda posted:

If I understand it properly, my main use will be shortening...
code:
invalid = True
while invalid:
    # do stuff, setting invalid = False once the result is acceptable
    ...

This is my current takeaway as well, though I rarely use regex for anything so I'm sure the match shortcut might come in handy for those that do. Some other examples at the link Thermopyle shared might come in handy.
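The regex shortcut in question, as a minimal sketch with a made-up log line:

```python
import re

line = "error: code 42"
# Bind the match object and test it in the same condition
if (m := re.search(r"code (\d+)", line)):
    code = int(m.group(1))
else:
    code = None
print(code)  # 42
```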

# ? Oct 20, 2019 15:34

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. -Bertrand Russell

punished milkman posted:

I'm sure that this is at least partially because I haven't really used the walrus operator yet and I'll need time to adjust, but it seems so unintuitive and makes the code way less readable for me.

I've used it in several languages and it's still unintuitive and hard to read.

I think it's probably going to be over-used.

# ? Oct 20, 2019 17:53

punished milkman: Dec 5, 2018; would have won

There was some discussion a few pages back about the Obey the Testing Goat book for TDD with Python/Django. Just wanted to chime in and say I bought it because of whoever mentioned it and can vouch that it rules. If you're like me in that you've been doing lots of work with Django but have been neglecting doing adequate testing because it seems hard, overwhelming and maybe a waste of time, you should probably read this book.

# ? Oct 21, 2019 14:09

Norton the First: Dec 4, 2018; by Fluffdaddy

Is this an OK place to ask for help with something?

I've started working with pandas dataframes/pivot tables at my job. I have a pivot table that looks like this:

How do I add subtotals for elements of the middle level, e.g., a subtotal listing under "B" that sums over elements 2-14? I've found solutions online, but I haven't fully understood the reasoning, and trying to apply the solutions blindly has given me different kinds of unsatisfactory results.

# ? Oct 22, 2019 19:48

SurgicalOntologist: Jun 17, 2004

code:

df.groupby('Middle Level').sum()

One thing to get used to, if you're coming from Excel, is that you wouldn't put the subtotals "under" the raw data. I mean, you could probably do it, with the above line then assigning new rows, but it would be awkward. Better to use pandas to do the calculations, then if you need a pretty output at some point you would output to html or something.
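A rough sketch of that idea on invented data, since the original table isn't shown. The column names, level names and values are all assumptions; the point is only that the subtotals live in their own frame.

```python
import pandas as pd

# Hypothetical data standing in for the pivot table in the question
df = pd.DataFrame(
    {
        "Top": ["X", "X", "X", "X"],
        "Middle Level": ["A", "A", "B", "B"],
        "Item": [1, 2, 3, 4],
        "Amount": [10, 20, 30, 40],
    }
).set_index(["Top", "Middle Level", "Item"])

# Sum over everything below the middle level of the index
subtotals = df.groupby(level=["Top", "Middle Level"]).sum()
print(subtotals)
```

The subtotals can then be formatted or exported separately from the raw rows.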

# ? Oct 22, 2019 20:21

Norton the First: Dec 4, 2018; by Fluffdaddy

SurgicalOntologist posted:

code:
df.groupby('Middle Level').sum()
One thing to get used to, if you're coming from Excel, is that you wouldn't put the subtotals "under" the raw data. I mean, you could probably do it, with the above line then assigning new rows, but it would be awkward. Better to use pandas to do the calculations, then if you need a pretty output at some point you would output to html or something.

Thanks for the reply. I suppose that (the prettiness) is the real question. For context, in this unit people have been taking raw data reports they get weekly, pasting them into a workbook, and constructing Excel pivot tables by hand for a whole host of tasks. The goal is to give them the same output they're used to programmatically.

# ? Oct 22, 2019 21:18

Tayter Swift: Nov 18, 2002; Pillbug

Multilevel indices destroy my brain and I hates them. I like to convert stuff to tall format and operate that way, then pivot back to whatever form the output needs to be in. (I think that's called Tidy Data? Dunno)

# ? Oct 22, 2019 23:29

Business: Feb 6, 2007

KICK BAMA KICK posted:

Reread some code I wrote before finishing Obey the Testing Goat (again, strong recommend, not sure anything has ever helped me as much as that) and realizing I had actually rolled my own extremely stupid version of unittest.mock to fake an external API, return some fake data, capture the arguments used, the whole nine yards.

Kinda addicting going back through my code and mocking out every last thing to create proper unit tests though I definitely see the concerns raised about mocks tying you to a particular implementation, sometimes to the point of tests almost looking like tautologous restatements of the code being tested.
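For context, a minimal sketch of what unittest.mock does in that situation: fake the API, return canned data, and capture the arguments. `fetch_username` and the `get_user` method are hypothetical names, not from any real client library.

```python
from unittest import mock

def fetch_username(client, user_id):
    # Hypothetical code under test: calls an external API client
    return client.get_user(user_id)["name"]

# The mock fakes the API, returns canned data, and records the call
fake_client = mock.Mock()
fake_client.get_user.return_value = {"name": "goat"}

result = fetch_username(fake_client, 7)
fake_client.get_user.assert_called_once_with(7)
print(result)  # goat
```

The assertion on the call arguments is also where the implementation coupling mentioned above creeps in: the test now knows exactly which method the code calls.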

jumping on the testing train myself now! any advice from windows vets on getting geckodriver/selenium working there? It seems annoying but I haven't messed with it too much yet

Business fucked around with this message at 15:39 on Oct 23, 2019

# ? Oct 23, 2019 15:20

KICK BAMA KICK: Mar 2, 2009

Business posted:

jumping on the testing train myself now! any advice from windows vets on getting geckodriver/selenium working there? It seems annoying but I haven't messed with it too much yet

Other than the occasional version mismatch after Firefox auto-updates I didn't have any problem with geckodriver as long as I stuck the executable in the root of my project.

# ? Oct 23, 2019 15:36

punished milkman: Dec 5, 2018; would have won

Business posted:

jumping on the testing train myself now! any advice from windows vets on getting geckodriver/selenium working there? It seems annoying but I haven't messed with it too much yet

I haven't used it on Windows but maybe check out webdriver_manager and see if it works for you (https://github.com/SergeyPirogov/webdriver_manager). Installs via pip and automatically downloads the latest version of the needed driver for whatever browser you're working with.

# ? Oct 23, 2019 15:38

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. -Bertrand Russell

Business posted:

jumping on the testing train myself now! any advice from windows vets on getting geckodriver/selenium working there? It seems annoying but I haven't messed with it too much yet

You just download geckodriver and use it. Nothing special about windows.

# ? Oct 23, 2019 17:05


Business: Feb 6, 2007

Thanks yeah I was overthinking it. Messed something up the first try and based on google got the impression it was more complicated than it actually was

# ? Oct 23, 2019 17:39
