Malcolm XML
Aug 8, 2009

I always knew it would end like this.

QuarkJets posted:

Basically I was going to say what Foxfire_ said; for loops in Python are slow and usually aren't what you want to use for bulk-computation. VBA is a compiled language, but Python is not.

The first-order speedup you can apply to code like this is to use vectorized numpy operations instead of for loops.
code:

import numpy as np

# One power-series row per cohort; only the outer loop remains in Python.
for x in range(nummonths):
    cohorts.iloc[x, x:nummonths] = data['nps'].iloc[x] * np.power(data['duration'].iloc[x], np.arange(nummonths - x))

That's one loop removed and should be significantly faster. It's possible to get rid of the outer loop as well using something like meshgrid to take care of the indexing but it's too late and I'm too tired to type it all out

Pandas indexing is insanely slow compared to the underlying arrays

Also vectorize your poo poo

Or don't, and use numba; it's a tossup as to which ends up faster
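
A minimal numba sketch of the iterative route, assuming the same nps/duration columns as the quoted snippet (the function itself is hypothetical):
code:

import numpy as np
from numba import njit

@njit
def fill_cohorts(nps, duration, nummonths):
    # Typed, compiled loops: numba removes the per-iteration Python
    # overhead, so the "slow" iterative form becomes competitive.
    out = np.zeros((nummonths, nummonths))
    for x in range(nummonths):
        for m in range(x, nummonths):
            out[x, m] = nps[x] * duration[x] ** (m - x)
    return out

# cohorts = fill_cohorts(data['nps'].values, data['duration'].values, nummonths)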


Malcolm XML
Aug 8, 2009

I always knew it would end like this.
This has little to do with python being compiled or not: pandas has a ton of overhead if you don't do things the correct way

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Foxfire_ posted:

Huh, I was expecting the pandas overhead to be much smaller. Some testing shows that it goes:

vectorized numpy > vectorized pandas >> iterative numpy >>> iterative pandas

An iterative solution in something with known types (C/Fortran/Java/numba/Julia/whatever) will still be faster for complicated calculations than the vectorized versions since the vectorized ones destroy all cache locality once the dataset is big enough (by the time you go back to do operation 2 on item 1, it's been booted out of the cache), but you can still get enough speedup to move some algorithms from unworkable to useable.

Python's speed problems mostly don't have to do with it being compiled in advance or not. They're consequences of how it works with types and the stuff it lets you change. It's basically impossible to JIT compile stuff like you would in Java because you can't prove that a loop isn't doing things like monkey patching functions or operators at some random loop iteration in the middle. The interpreter has to execute what you put down naively since it can't prove that it's safe to take any shortcuts.

Timing results:
code:

import numpy as np
import pandas as pd

numpy_array1 = np.random.rand(100000)
numpy_array2 = np.random.rand(100000)

print("Vectorized numpy")
%timeit out = numpy_array1 + numpy_array2

print("Iterative numpy")
out = np.empty(100000, dtype=np.float64)
%timeit for i in np.arange(100000): out[i] = numpy_array1[i] + numpy_array2[i]

pandas_dataframe = pd.DataFrame({'A': numpy_array1, 'B': numpy_array2})
print("Vectorized pandas")
%timeit out = pandas_dataframe.A + pandas_dataframe.B

print("Iterative pandas")
out = np.empty(100000, dtype=np.float64)
%timeit for i in np.arange(100000): out[i] = pandas_dataframe.A.iloc[i] + pandas_dataframe.B.iloc[i]

code:

Vectorized numpy
10000 loops, best of 3: 150 µs per loop
Iterative numpy
10 loops, best of 3: 52.1 ms per loop
Vectorized pandas
10000 loops, best of 3: 181 µs per loop
Iterative pandas
1 loop, best of 3: 4.3 s per loop

It is entirely possible to JIT compile python using techniques similar to JavaScript -- hidden classes, partial evaluation, polymorphic inline caching -- but there's a lot more money behind JS dev


Pypy does a lot of good things but is a research project first. Maybe Graal will help; JRuby augmented with Truffle and Graal is insanely fast comparatively
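
For concreteness, this is the kind of dynamism the quoted post means, and exactly what a JIT's guards (inline caches and the like) have to check for -- a toy example, not from either post:
code:

import math

def f(x):
    return math.sqrt(x)

total = 0.0
for i in range(10):
    if i == 5:
        # Rebinding a module attribute mid-loop is legal Python, so a
        # naive compiler can't assume math.sqrt means the same function
        # on every iteration; a JIT has to guard and deoptimize instead.
        math.sqrt = lambda x: 0.0
    total += f(i)

print(total)  # reflects the patched sqrt for i >= 5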

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

oliveoil posted:

Is there anything like the JVM spec but for Python?

I want to know what I can assume about the way Python works. I want to know what's the memory overhead for creating an object, for example. How many bits are in an integer? And so on. I don't want to cobble together answers from StackOverflow that maybe describe the behavior of a specific Python implementation that isn't necessarily fixed, for example.

Also, I'd like to learn separately (so that I know what is prescribed for all Python implementations, and what is specific to or just an implementation detail of the most popular implementation) about the internals of the most popular Python implementation. I'm guessing that's CPython, but I imagine 2.7 and 3.blah are pretty different. In general, though, I want to know stuff like: how does garbage collection work in CPython? What happens when I create a thread? How do locks work? Java has a bunch of explanation about "happens-before" relationships when writing multi-threaded programs - what's the Python equivalent?

LMAO it's whatever Guido fever-dreams up and shits out in CPython


Hell it took a heroic effort to get ruby specified and that had people who cared behind it

Spelunk through the source. CPython uses a ref-counting mechanism, fwiw
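
A quick look at that ref counting from the REPL (sys.getrefcount is in the stdlib; the counts include the temporary reference held by the call itself):
code:

import sys

x = []
print(sys.getrefcount(x))  # 2: the binding x plus the call's own argument
y = x
print(sys.getrefcount(x))  # 3: the extra binding bumps the count
del y
print(sys.getrefcount(x))  # back to 2; at zero, CPython frees the object immediately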

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
Fwiw attrs is a better namedtuple at the cost of needing a dependency
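
A minimal sketch of the attrs classic API (the class and fields here are made up):
code:

import attr

@attr.s(frozen=True)
class Point:
    x = attr.ib(default=0)
    y = attr.ib(default=0)

p = Point(1, 2)
print(p)                 # Point(x=1, y=2)
print(p == Point(1, 2))  # True: __repr__ and __eq__ are generated for free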

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

shrike82 posted:

Sorry, I meant that it's so trivial in Python that people don't tend to think "Oh, I need to use a DI/IoC framework here".
More generally, and imho, people spend less time thinking about design patterns when building stuff in Python.

This is a good read if you want to learn more about Python design patterns
http://www.aleax.it/gdd_pydp.pdf

The main use I see of IoC containers is magically getting constructor args on creation by registering classes and interfaces, which is v unpythonic

Otherwise just pass them in as arguments
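
I.e. plain constructor injection, no container -- a hypothetical sketch:
code:

class Repository:
    def __init__(self, connection):
        # The dependency arrives as an ordinary argument; nothing is
        # registered or resolved behind your back.
        self.connection = connection

class Service:
    def __init__(self, repository):
        self.repository = repository

# Wiring happens once, explicitly, at the composition root.
connection = object()  # stand-in for a real DB connection
service = Service(Repository(connection))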

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Thermopyle posted:

This is a nice library that I use fairly often. backoff.

When working with external APIs over the network you have to handle the stuff that happens when you're dealing with networks and other network devices. This library helps with that. Amazon has a good post on their architecture blog about the best algorithms for choosing a time to wait between retrying failed requests if you want to read about it.

Anyway, it's pretty simple to use...you just decorate the function that might fail and the decorator retries the function until it's successful.

Simple example that backs off each request by an exponential amount of time:
Python code:
@backoff.on_exception(backoff.expo,
                      requests.exceptions.RequestException,
                      max_tries=8)
def get_url(url):
    return requests.get(url)
Complex example using multiple decorators to catch different types of stuff:
Python code:
@backoff.on_predicate(backoff.fibo, max_value=13)
@backoff.on_exception(backoff.expo,
                      requests.exceptions.HTTPError,
                      max_tries=4)
@backoff.on_exception(backoff.expo,
                      requests.exceptions.Timeout,
                      max_tries=8)
def poll_for_message(queue):
    return queue.get()
Anyway, I thought I'd share this as beginners often get into python wanting to scrape websites or download some sort of data. This is a good thing to implement if you're doing that!

There's another, more popular, library called Retrying that you can take a look at as well. The reason I don't use it is that it doesn't support the algorithm recommended in that Amazon blog post, but you might still be interested in it.

oh nice i use retrying but this looks to be more configurable at runtime
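
For comparison, roughly the same exponential policy expressed in Retrying's keyword-argument style (kwargs per its README; the flaky function is made up):
code:

import random
from retrying import retry

@retry(stop_max_attempt_number=8,
       wait_exponential_multiplier=1000,  # milliseconds
       wait_exponential_max=30000)
def flaky():
    # Hypothetical stand-in for a network call that usually fails.
    if random.random() < 0.8:
        raise IOError("transient failure")
    return "ok"

print(flaky())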

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

huhu posted:

code:

temp/
	target1/
		x/
			file1.txt
			file2.txt
		y/
			file1.txt
			file2.txt
	target2/
		...

code:

rootPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'temp')
targets = [target for target in os.listdir(rootPath) if os.path.isdir(os.path.join(rootPath, target))]
for target in targets:
    xPath = os.path.join(rootPath, target, "x")
    xFile = os.listdir(xPath)[0]
    with open(os.path.join(xPath, xFile), "r") as f:
        for line in f:
            print(line) # Whatever commands I'd actually need to do with this line here. 

Is there a cleaner way to not have so many os.path.join() or would that require knowing the operating system I'm working with?

Import pathlib

Or just use '/' since windows can normalize it just fine (except for the leading r'\\')
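
A pathlib rewrite of the quoted snippet -- same behavior, one visible join, and the separator problem goes away (a sketch, untested against your tree):
code:

from pathlib import Path

root = Path(__file__).resolve().parent / 'temp'
for target in (d for d in root.iterdir() if d.is_dir()):
    x_file = next((target / 'x').iterdir())  # first entry, like listdir()[0]
    with x_file.open() as f:
        for line in f:
            print(line)  # whatever you'd actually do with the line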

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
fluent python owns

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

LochNessMonster posted:

Never knew floating points behave like this, learning something new every day.

:psyduck:

Did u think that 64 bits was enough to represent all of the reals?
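
The classic demonstrations, straight from any Python REPL:
code:

>>> 0.1 + 0.2
0.30000000000000004
>>> 0.1 + 0.2 == 0.3
False
>>> 2**53 + 1.0 == 2**53  # above 2**53, doubles can't even hold every integer
True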

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Boris Galerkin posted:

It's more of a hardware limitation than an actual thing. In real life 0.3 is exactly 0.3, but we lose some accuracy when we represent it in a computer.

I imagine for most people none of this has any direct relevance, but if you're doing any sort of numerical/computational work then this type of stuff comes up all the time. Doing simulations for example we already accept that there is an inherent error from the actual math equations and simplifications to them, but we also accept that there are errors from discretization (dividing the problem up into multiple smaller parts), and we also must be aware of problems in roundoff/precision due to the computer. Lots of fields use 64 bit floats for example because it gives us more precision (more decimals).

I remember one of my earliest courses in this field our professor intentionally misled us to use MATLAB to write our first finite difference solver to solve a problem that produced nonsense results because the default floating point precision in MATLAB (at that time? not sure if it's the case still) was 32bit. Due to error propagation these errors eventually ruined the solution. Telling MATLAB (or using C/Fortran double precision numbers) to use 64 bit reals on the other hand fixed the problem because these errors never had a chance to propagate.

Look at this scrub who doesn't compensate for floating pt errors and is agnostic to precision.


Mixed precision iterative refinement :hellyeah:
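
A sketch of what that means: solve cheaply in single precision, then correct with double-precision residuals (the textbook scheme; function name is hypothetical):
code:

import numpy as np

def solve_refined(A, b, iters=3):
    A32 = A.astype(np.float32)
    # Cheap single-precision solve as the initial guess...
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x  # ...but compute the residual in double precision
        d = np.linalg.solve(A32, r.astype(np.float32))
        x += d.astype(np.float64)
    return x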

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
Next y'all will be like "use doubles" or maybe even "use quads" and then come back like "my prog is mega slow"

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

QuarkJets posted:

That has nothing to do with floating point errors though; exact Decimal types exist without having to represent all of the reals

but the point is that no finite representation can represent the reals w/o some loss and imprecision; fixed-point decimal just does it differently than IEEE-754


arbitrary precision is a different game but still limited by practicality

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Cingulate posted:

Python is a language that's being used by people with very little CS knowledge (e.g., me). That's, from what I can tell, by design: Python is easy to learn, easy to read, welcoming and forgiving. I think the thread has a good track record of being exactly like that, too.

Nippashish posted:

That's why I'm suggesting a very simple mental model that is intuitive and also sufficient for even non-beginners. Teaching people that floating point numbers are dark and spooky and complicated isn't very productive, because very few people need to care about that level of detail.

I'm responding to "Numerical Analysis is not an easy subject and it frustrates me when people pretend it's simple & intuitive" by pointing out that for most practical purposes it can be made to be exactly that.

Actually this is the worst possible takeaway, because numerical analysis is a) hard and b) even for most practical purposes requires an understanding of when it'll fail on you. Catastrophic cancellation and loss of precision can lead to cases where your component will fail and fail hard. Unless you don't want to engineer robustly, sure, you can ignore it. I have run into many cases where poor or no understanding of float approximations has led to pernicious bugs in production systems costing lots of money

While I don't expect people to understand the IEEE-754 standard in its entirety, it is immensely unhelpful to present it as a real-number/exact-fraction abstraction, since it's leakier than a sieve (but designed in such a way as to be useful for those in the know), and frequently beginners will smash their heads against the wall for days until they are told how the semantics of floats work, when they run into "random" "non-deterministic" errors that are 100% deterministic.


For example, addition is commutative but not associative. This is neither trivial nor particularly expected.
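
Seen directly in the REPL:
code:

>>> (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
False
>>> (1e16 + 1.0) + 1.0 == 1e16 + (1.0 + 1.0)
False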






funny Star Wars parody posted:

I'm assuming that you should use doubles but please expand on this
>>> 1e16 + 1 == 1e16
True

A way of salami slicing

Perfectly fine if your algorithms are insensitive to things like that (i use floats in quant finance optimization models) but try simulating a chaotic system and watch as floats are completely useless

It gets better when you have no idea how chaotic your system is!

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

LochNessMonster posted:

As the root cause of this discussion, as well as a beginner in terms of programming, I can tell you that you're oversimplifying things.

Before this discussion I had no clue that floats would give irregular results. I was trying to do some really basic math functions and they were off by 10-20%.

If I'm running into that in my 2nd project with basic functionality, I'd say it's definitely something almost everyone will care about sooner rather than later.

I'd rather have people tell me I should watch out with where I'm using floats (like they did, thank you) than tell me not to worry because I don't need to care about that level of detail.

Floats are 100% deterministic. They will do 100% of what you tell them to do, but that's a pretty complex operation involving rescales and possibly higher precision intermediaries.

If you are calculating mathematical functions, use a library or look up a numerically stable algorithm.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

SurgicalOntologist posted:

Umm I simulate chaotic systems... what should I be using if not floats?

a complicated algorithm that compensates for the fact that the system is very sensitive to accumulated approximation error, and a lot of whiskey, b/c even then long-term behavior is sketchy


Nippashish posted:

I'd actually go further and say that it's not just beginners, but most programmers. If you need to worry about the details of floating point numbers then you are doing something quite unusual.

any programmer that's using floats for numerical work needs to know when and how they work so when they gently caress up they aren't surprised. math.fsum is in the standard library for a reason

Better to learn this now than with actual repercussions on the line
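
That math.fsum point, concretely:
code:

>>> sum([0.1] * 10)
0.9999999999999999
>>> import math
>>> math.fsum([0.1] * 10)  # exact (Shewchuk) summation: no accumulated error
1.0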


QuarkJets posted:

This is the level of worrying that Nippashish said a newbie shouldn't be at, and he was 100% right. "Why is this single line of floating-point arithmetic giving me a funny-looking answer" requires a very basic understanding of the issue, not "read this essay on the difficult subject of numerical analysis"

Newbies shouldn't worry about how to use lathes, just have them lose a finger or break a few thousand parts instead of having them learn how to use their tools

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

QuarkJets posted:

Is writing bad analogies required for numerical analysis, now?

You're seriously fighting for the position that says "a person new to programming has to have a solid understanding of the difficult topic of numerical analysis before writing a program that uses floats". Is that really the hill that you want to die on?

well it keeps me in a job, so pragmatically no


but yes you should understand your tools before you use them, maybe read the fine manual as well

Cingulate posted:

I would think the entire neural network community largely ignores floating point issues, what with them running near-exclusively on single precision.


They have run the analysis and it doesn't matter to them. The OP was running into self-admitted errors and had no idea what was going on, because they had no idea what they were doing.



quote:


My point was you can correct someone without being an rear end to them.


it's not as fun

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Nippashish posted:


Anyway the point I'm making isn't that people should ignore that floats are not a perfect representation of real numbers. My point is that although the precise behavior of floats is quite complicated, there are very simple ways of thinking about them that explain their behavior at a level that is appropriate for the vast majority of people, including most of the people writing scientific software.

You're wrong and should feel wrong. If you're advocating writing scientific software or numerical software without understanding what's going on, please let me know what you make so I can never use it

Not everyone programs fart apps exclusively, so it's worth knowing when you can and when you can't ignore abstractions


OP go read http://floating-point-gui.de/basic/

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

SurgicalOntologist posted:

Do you have an actual algorithm in mind? Because if so I'd be curious to read up on it. I always figured that after solver issues (i.e. discretization error, stiffness, etc.) and measurement error on initial conditions, floating-point arithmetic is pretty low on the list of concerns.

But yeah, dealing with chaotic systems, good luck if you need to make specific predictions. Investigating the qualitative behavior of the system is more useful anyway.

In any case, you started by saying "floats are completely useless" for simulating complex systems, which I still don't understand--should I not be using floats? Or are you just making the point that chaos magnifies errors? In which case, I have errors of much higher orders of magnitude I'm already concerned about.

I guess in the sense that getting long term quantitative predictions using floating point is useless due to the accumulated error being magnified. It was a bit glib

I have heard good things about differential quadrature methods but can't vouch personally

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Cingulate posted:

I generally don't do this, but- if your fun is being needlessly mean towards other people, here's some life advice for you: try doing that less. Try changing. You'll become a happier person, and contribute more to the lives of those around you.
I don't mean this to disparage you or as a counterargument to your position - which it is not - but as advice from someone who should've been given that lesson much earlier themselves, too.

They can, and do, largely ignore them. They can also try to reduce precision even more to get some performance benefits in certain applied cases.

(I'm obviously not saying these people are ignorant of the issues, just that the issues are largely irrelevant, and being ignored.)

Yes I'm going to take life advice from posting on the something awful comedy forums

(I think it's pretty clear that every post is tongue firmly jammed in cheek)

People telling learners to ignore things they barely understand never fails to amuse

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Thermopyle posted:

Async basically only helps with IO-bound tasks. If you are mostly CPU-bound you should probably use multiprocessing.

(I'm on phone and basically didn't even look at your code)

Python is like the worst tool for this; any code I have that's deeply improved by threading I've shifted over to Elixir, since BEAM was designed right, and it can communicate with Python processes for the numpy stuff

I wish asyncio was more like erlang :(
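
To the quoted point about CPU-bound work, a minimal multiprocessing sketch (the workload is made up):
code:

import math
from concurrent.futures import ProcessPoolExecutor

def crunch(n):
    # CPU-bound: pure computation, so threads would serialize on the GIL,
    # but separate processes run truly in parallel.
    return sum(math.sqrt(i) for i in range(n))

if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(crunch, [10**6] * 8))
        print(sum(results))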

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
Lol, use an actual diffeq-solving code instead of doing it by hand, unless you are researching solvers

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
Like, instead of the janky first-order stuff you could probably use the standard RK4 and be algorithmically faster
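
E.g. with scipy (solve_ivp's default RK45 is the adaptive cousin of classic RK4; the test equation y' = -2y is made up for illustration):
code:

import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y):
    return -2.0 * y

sol = solve_ivp(rhs, (0.0, 5.0), [1.0], rtol=1e-8, atol=1e-10)
print(sol.y[0, -1], np.exp(-10.0))  # numeric endpoint vs exact solution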

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

breaks posted:

Use the task scheduler unless you have a good reason not to. Running it in a loop will work until your computer reboots or it throws an exception and whatever series of other problems, then by the time you find and fix all those all you get for the extra work and inconvenience is probably a worse task scheduler.

Good news: python has one built in: import sched
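
A tiny sched sketch (the job body is hypothetical, and it has the same survives-nothing caveat as the hand-rolled loop):
code:

import sched
import time

s = sched.scheduler(time.time, time.sleep)

def job():
    print("tick", time.ctime())
    s.enter(60, 1, job)  # re-arm: run again in 60 seconds

s.enter(60, 1, job)
s.run()  # blocks, dispatching events as they come due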

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
Try looking for a computational geometry algorithm that matches what you're trying to do

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Boris Galerkin posted:

Ok cool thanks for the info. I figured it was a spec related thing but I guess this is the first time I've ever used a thing where nan or dividing by zero doesn't crash your program.


Yep I explicitly used numpy arrays here cause I figured it was easier to just handle the nan/inf values if they ever arose rather than pre-checking.

Behold the quiet nan
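
What that looks like in numpy -- division by zero warns (or is silenced) and propagates instead of crashing:
code:

import numpy as np

a = np.array([1.0, 0.0, -1.0])
with np.errstate(divide='ignore', invalid='ignore'):
    out = a / np.zeros(3)
print(out)               # [ inf  nan -inf]
print(np.isnan(out))     # [False  True False]
print(np.nan == np.nan)  # False: a quiet NaN isn't even equal to itself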

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Thermopyle posted:

Functional programming in python can be done effectively and sometimes it can be done appropriately, but generally you shouldn't go "ok I'm going to write this program functionally".

Use it when it makes sense.

Guido doesn't optimize style for writing functionally and it shows.

https://stackoverflow.com/questions/1017621/why-isnt-python-very-good-for-functional-programming

That list isn't exactly right, but it gets the idea across.

it's more of a backwards thing: Guido is stuck in the late 80s-90s when it comes to programming ergonomics and thus will never see any benefit in functional programming, and the sycophants who make up CPython dev will thus never do it

The upshot is that functional programming in python is far more painful than it needs to be and so you shouldn't use it, making it more painful since no developer will optimize for it, and...

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Boris Galerkin posted:

Is there such thing as "statically compiling" an executable Python script? I would like to be able to just distribute the script as an executable with dependencies baked into it so that I don't need to worry about sourcing or creating a venv and whatnot.

Yeah py2exe or pex

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
The julia diffeq library is insanely good


Julia is like a Matlab successor but better

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
Airflow is the worst out there except for everything else

Luigi doesn't do scheduling, just sequencing, iirc

Godspeed goon.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

mr_package posted:

I'm parsing a file line-by-line, modifying some of them based on regex pattern matching. Basic pattern is this; forgive the $ variables, trying to simplify so consider this pseudocode/Python hybrid:
code:
with source_file.open() as f:
    for line in f:
        if $line_matches_regex_pattern:
            line = $line_modified_with_value_from_dict
        yield line
Main question I have is whether other Python programmers agree using generator here is appropriate. I'm doing it so that someone can call this function and make more changes, if desired, or just write each line to a new file. Before I had a monolithic function that was doing all of this and it just made sense to break it down. I'm using yield here because the 'for line in f' pattern reads the file line by line (and not entirely into memory) which is a general pattern I'm trying to use where possible. Technically the files I'm working with today are small but they may not be in future, so trying to learn this the 'correct' way. Is it?

I might have fixed this with an SE Linux group / permissions change but so long ago I can't say 100%. Also I wasn't using Docker. But may be worth checking into; I definitely ran into SE Linux issues on CentOS other times as well.

edit: indent fix

Use sed
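
(Or, staying in Python: yes, the generator is the right call for line-at-a-time rewriting. The quoted sketch filled in with a hypothetical pattern and mapping:)
code:

import re

PATTERN = re.compile(r'old_host')        # hypothetical
REPLACEMENTS = {'old_host': 'new_host'}  # hypothetical

def transformed_lines(source_file):
    # Lazily yields lines, rewriting matches; the file is streamed,
    # never loaded into memory whole, so it scales to large inputs.
    with source_file.open() as f:
        for line in f:
            if PATTERN.search(line):
                line = PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], line)
            yield line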

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
Yeah I'm glad the python community is realizing that Guido made some bad choices regarding the stdlib and keeping CPython a slow mess

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
Lmao at Guido taking it personally and storming out when someone with expertise makes constructive criticism about his baby

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Thermopyle posted:

Yes, people should just rip off the python 2 bandaid.

She might very well have mentioned python packaging, the article is just a summary of her talk.

They didn't record the talks so people could say more controversial things. Sounds like Guido should've thought about that before attending talks there...

Easier said than done. Would be a lot easier if Guido had engineered (lol) a proper migration strategy instead of winging it and complaining when people don't spend millions upgrading.

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
It's like if pipenv wasn't written by an rear end in a top hat, so yes

Malcolm XML
Aug 8, 2009

I always knew it would end like this.
Just make it a pre-merge hook or CI check

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Dominoes posted:

Hey dudes. Would anyone be willing to help test if Python binaries I built work on similar systems? Looking for someone with a Ubuntu or Debian install to try this, and specify a Python version you don't have, >= 3.4. Looking for someone with Windows who doesn't have Python 3.7 installed to try it with Python 3.7 specified. Much appreciated.

Ie download and run the deb or MSI, navigate to an empty folder, run `pypackage install`, and enter an applicable version when asked. Run `pypackage python`, and see if the REPL shows the right version.

Install a vm and do it yourself dude

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Dominoes posted:

I don't get along well with Docker, and am looking for help on an open-source project. You might even be interested in using this later, if not now. What I'm specifically asking for help on is a proof-of-concept for getting official binaries hosted on python.org.

https://developer.microsoft.com/en-us/windows/downloads/virtual-machines

free 60 day vms (just keep rebuilding it)

No docker needed. But a trivial Dockerfile for this would be worth it, since all you do is setup and d/l. And with Microsoft's free CI/CD it'd run automatically, which is a lot more reliable than asking idiot goons like myself to test

Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Fluue posted:

What's the typical pattern for loading credentials from something like AWS SSM Parameter Store into an API wrapper while keeping class initialization lightweight? Is it typically

code:
# api.py
class MyAPIWrapper:
	def __init__(self, api_secret, api_user, **kwargs):
		# values passed in through caller making separate call to SSM parameter store
		... store the secret/other api auth setup in instance vars ...
and incur the call to AWS SSM every time you have to init the API wrapper? Or does it make more sense to do:

code:
# api.py
from myconfig import ssm_config_loader   # e.g. this returns a dict of kwargs fetched from SSM

class MyAPIWrapper:
	def __init__(self, api_secret, api_user, **kwargs):
		... store the secret/other api auth setup in instance vars ...

api = MyAPIWrapper(**ssm_config_loader(api_key="my_api"))

The iron rule of dependency injection strongly suggests you do the latter. Your API wrapper should have no idea where it gets its keys.


Malcolm XML
Aug 8, 2009

I always knew it would end like this.

Fluue posted:

Right. I'm asking whether it's better to init an instance of the API wrapper in the module itself or init the wrapper elsewhere (E.g. when it's needed).

This is running in an AWS lambda, btw, which typically advises creating reusable instances :shrug:

The latter, again. For lambdas and scripts I've always had a driver function that gets all of the config from wherever and then initializes the modules that do the work
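
A hypothetical driver along those lines (boto3's get_parameter is a real call; the parameter names and do_something method are made up):
code:

# handler.py
import boto3
from api import MyAPIWrapper

def load_config():
    ssm = boto3.client('ssm')
    secret = ssm.get_parameter(Name='/myapp/api_secret',
                               WithDecryption=True)['Parameter']['Value']
    return {'api_secret': secret, 'api_user': 'svc-account'}

# Module level: runs once per cold start, then the warm container
# reuses the same instance across invocations.
api = MyAPIWrapper(**load_config())

def handler(event, context):
    return api.do_something(event)  # hypothetical method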
