Nippashish
Nov 2, 2005

Let me see you dance!

Boris Galerkin posted:

e: nevermind this only half works. __repr__ returns an actual Name object so I can't do things like "country.name+country.name['de']". Sad.

__repr__ only gets called to turn your object into a string, so "country.name + country.name['de']" doesn't call __repr__ at all. This is also the essence of why what you want is a bad idea. What type of thing is "country.name"? It can't be a string, because you can't index a string with ['de'], but it needs to behave like a string almost all the time. This means it needs to be some kind of nearly-a-string object that behaves like the string "Belgium" except in some very special circumstances, and making an object like that will surprise people. That's what you've built with Name, except __repr__ only covers behaving like a string when you print it, and not when you do other string things like concatenating or slicing.

If you want to do something like this in a simple and not-surprising way then use a function. It's not semantically wrong for it to be a function because it's a thing that behaves differently depending on what arguments you pass it. If you're really upset about typing country.name() instead of country.name you can do this:

code:
class Belgium(object):

    _names = {'en': 'Belgium',
              'de': 'Belgien',
              'nl': 'Belgie',
              'fr': 'Belgique',
              'default': 'Belgium'}

    @property
    def name(self):
        return self._names['default']

    def localized_name(self, lang):
        return self._names[lang]

country = Belgium()

country.name
=> 'Belgium'
country.localized_name('de')
=> 'Belgien'
country.name + country.localized_name('de')
=> 'BelgiumBelgien'

Nippashish
Nov 2, 2005

Let me see you dance!

Boris Galerkin posted:

(but mostly because I don't think it's semantically correct to call a function to return a value that I've already computed).

This is a pretty weird thing to think.

Nippashish
Nov 2, 2005

Let me see you dance!
You need to tell sorted what it's sorting. In your last example, how does it know it's supposed to be sorting the list of cars?
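For example (a sketch; the Car class and its fields are made up, since the original class wasn't posted):

```python
# Hypothetical Car class standing in for whatever the original list holds.
class Car:
    def __init__(self, make, year):
        self.make = make
        self.year = year

cars = [Car("Saab", 1999), Car("Ford", 2003), Car("Audi", 2001)]

# The key argument is how you tell sorted what to sort by:
by_year = sorted(cars, key=lambda car: car.year)
print([car.make for car in by_year])  # ['Saab', 'Audi', 'Ford']
```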

You should probably also get used to looking up the documentation for functions you are using. Even if you don't understand everything, it's a good habit to get into, because a lot of these types of questions have answers that live in the documentation.

Nippashish
Nov 2, 2005

Let me see you dance!

Boris Galerkin posted:

I think the proper term is using the __new__ constructor as a factory function? Is this an accepted way to do something or is this one of these things I shouldn't do in python?

Yes, this is what you're doing and no it is not a normal thing to do. I can't think of anything really wrong with it, but it's definitely weird and you will surprise people if you do it.

Nippashish
Nov 2, 2005

Let me see you dance!
There are a lot of slightly strange things in the collections module. Like docstrings using ''' instead of """, or single line docstrings using 1 quote instead of 3, or the whole implementation of namedtuple.

Nippashish
Nov 2, 2005

Let me see you dance!
How about
code:
class WeirdClass(object):
    def __init__(self, foo):
        self._foo = foo

    def foo(self, *args, **kwargs):
        return self._foo(self, *args, **kwargs)
This is sort of weird, but what you want is also sort of weird. "I want a function that does different things depending on which instance I call it on" really screams "I want a base class with a bunch of different subclasses" to me.
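A minimal sketch of what I mean by the base-class-with-subclasses version:

```python
# Instead of injecting a different function into each instance, each
# subclass overrides foo directly.
class Base:
    def foo(self):
        raise NotImplementedError

class Loud(Base):
    def foo(self):
        return "FOO"

class Quiet(Base):
    def foo(self):
        return "foo"

print(Loud().foo())   # FOO
print(Quiet().foo())  # foo
```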

Nippashish
Nov 2, 2005

Let me see you dance!

Eela6 posted:

Numerical Analysis is not an easy subject and it frustrates me when people pretend it's simple & intuitive.

In the vast (vast) majority of cases they are in fact very simple and intuitive. It's extremely rare to need a mental model of floating point that is more sophisticated than "they're like real numbers except that they get funny when they're really big or really small" even if you work with them every day.

Nippashish
Nov 2, 2005

Let me see you dance!
Python code:
>>> abs(.3 / 3 - 0.1) < 1e-12
True
They're weird when they're small. One of the implications is you shouldn't expect exact equality.
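If you do need to compare them, the stdlib has math.isclose for tolerance-based comparison:

```python
import math

a = 0.3 / 3
b = 0.1

print(a == b)              # False: don't rely on exact equality of floats
print(math.isclose(a, b))  # True: compare with a tolerance instead
```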

Nippashish
Nov 2, 2005

Let me see you dance!

Thermopyle posted:

That's why it's confusing to beginners.

That's why I'm suggesting a very simple mental model that is intuitive and also sufficient for even non-beginners. Teaching people that floating point numbers are dark and spooky and complicated isn't very productive, because very few people need to care about that level of detail.

I'm responding to "Numerical Analysis is not an easy subject and it frustrates me when people pretend it's simple & intuitive" by pointing out that for most practical purposes it can be made to be exactly that.

Nippashish
Nov 2, 2005

Let me see you dance!

LochNessMonster posted:

If I'm running into that in my 2nd project with basic functionality I'd say it's definitely something almost everyone will care about rather sooner than later.

I'd rather have people tell me I should watch out with where I'm using floats (like they did, thank you) than tell me not to worry because I don't need to care about that level of detail.

Floating point numbers are not real numbers, and you need to be aware of this when using them. The particular way in which they are not real numbers is quite complicated, and is rarely relevant. I offered some additional guidance on how to think about the relationship between floats and reals.

Floating point weirdness is also one of those topics that programmers love to expand on at length when it comes up (I'm slightly surprised no one has warned you about storing currency in floats yet) and my suggestion to "not worry about it" should be taken in the context of your post triggering a page of responses. The fact that you went away from the discussion with the impression that you should "watch out" when using floats is exactly what I was trying to mitigate.

Nippashish
Nov 2, 2005

Let me see you dance!

baka kaba posted:

But you should worry about it though? To the point that you're aware that there are some basic pitfalls, and then keep them in mind when you're using floats. The poster you're replying to was already not worrying about it, that's why they hit a problem and had no idea what was going wrong

My stance is that you should be aware floats are not reals, but you should also not worry about it. The poster I was replying to was not aware of this (and they got lots of explanations in this thread, so that part worked out well).

Nippashish
Nov 2, 2005

Let me see you dance!

Eela6 posted:

I mostly posted the point about currency to tweak Nippanish's nose, but I'm actually glad I did. More people should know.

I used to write modelling software for an insurance company and we did 100% of our currency calculations with floats. I hope this annoys you :kimchi:.

Nippashish
Nov 2, 2005

Let me see you dance!

QuarkJets posted:

This is the level of worrying that Nippashish said a newbie shouldn't be at, and he was 100% right. "Why is this single line of floating-point arithmetic giving me a funny-looking answer" requires a very basic understanding of the issue, not "read this essay on the difficult subject of numerical analysis"

I'd actually go further and say that it's not just beginners, but most programmers. If you need to worry about the details of floating point numbers then you are doing something quite unusual.

Nippashish
Nov 2, 2005

Let me see you dance!

pmchem posted:

Nah, floating point correctness and approximations come up all the time in scientific computing, especially if you have unit tests. Scientific computing isn't "quite unusual"; it's the primary driver for the biggest floating-point computers in the world: https://www.top500.org/lists/2016/11/

I'm well aware of this, pushing floats around is my day job. Most of the careful thinking about stability is done at the level of BLAS and friends, and very few people actually write that kind of code (the few that do, do need to care about the details of floats).

When you're writing scientific software your reasoning about stability typically happens at a higher level of abstraction than the properties of floats. Instead you think about the properties of matrix factorizations and DE solvers and other high level operations (which inherit their properties from floats, but not in ways you control). Reasoning at the boundary is dominated by "things can get weird with really big or really small numbers", which is why I suggested that as a mental model earlier.

Nippashish
Nov 2, 2005

Let me see you dance!

pmchem posted:

Cool, mine too. You ignored the point about unit tests in my post. If you write scientific software that has numerical tests of floating point results, then as I'm sure you're aware, when you change (compiler / compiler flags / arch / libraries), the float results may slightly change. That's a pretty common occurrence, and is a reason why anyone involved with scientific software should be aware of floats. Not unusual at all.

Yes, sorry, I forgot to respond to the unit test part. The version of this that I see most often is that the results of parallel reductions change, because float addition isn't associative, so results can change even between multiple runs of the same binary on the same hardware. I deal with this in exactly the same way I deal with any other float comparison.

Anyway the point I'm making isn't that people should ignore that floats are not a perfect representation of real numbers. My point is that although the precise behavior of floats is quite complicated, there are very simple ways of thinking about them that explain their behavior at a level that is appropriate for the vast majority of people, including most of the people writing scientific software.

Nippashish
Nov 2, 2005

Let me see you dance!

Thermopyle posted:

Depending upon what your crawler is actually doing, you should probably use asyncio or greenlet instead of threads/multiprocessing anyway. I/O bound tasks work great with async.

Is it possible to use async without making the code async "all the way down"? Maybe I'm just dense, but afaict anything that blocks in a non-async way (read: all legacy code ever) ends up blocking the world?

Nippashish
Nov 2, 2005

Let me see you dance!

Thermopyle posted:

The absolute easiest is to use eventlet or greenlet which will patch the standard library at runtime to make all the I/O functions (like the socket library and thus http, urllib, requests, etc) use yielding versions. This will make your legacy python code automatically async-friendly.

This is the thing I was missing. I understand what async is for, but last time I tried it I gave up because I convinced myself I'd need to rewrite all my existing non-async-aware code to see benefits.
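For anyone else in the same boat, there's also a stdlib route that doesn't involve monkey patching: push the blocking calls into threads so they stop blocking the event loop. A sketch (asyncio.to_thread needs Python 3.9+; legacy_blocking_fetch is a made-up stand-in for real legacy code):

```python
import asyncio
import time

def legacy_blocking_fetch(n):
    time.sleep(0.01)  # stands in for a blocking socket/HTTP call
    return n * 2

async def main():
    # The three blocking calls run concurrently in worker threads,
    # so none of them blocks the event loop.
    results = await asyncio.gather(
        asyncio.to_thread(legacy_blocking_fetch, 1),
        asyncio.to_thread(legacy_blocking_fetch, 2),
        asyncio.to_thread(legacy_blocking_fetch, 3),
    )
    return results

print(asyncio.run(main()))  # [2, 4, 6]
```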

Nippashish
Nov 2, 2005

Let me see you dance!

Cingulate posted:

If it's tabular data, just use pandas read_csv (and make plots in seaborn).

Seriously, do this. Do not even consider the csv reader in the stdlib. Pandas is infinitely superior.

Nippashish
Nov 2, 2005

Let me see you dance!
You should though, especially since the next step is "make charts".

Nippashish
Nov 2, 2005

Let me see you dance!

Eela6 posted:

I don't feel that way. I don't particularly like pandas, and I prefer using the stdlib where possible. You have to consider the audience of your code, too. Jose is very new to python; the last thing he needs is a thousand different APIs to understand.

The audience is someone who wants to load data from csvs to produce charts. Someone who is already using matplotlib and is enrolled in something called "data camp".

Pandas is a cornerstone of the python data science toolchain. Avoiding it is pretty much the data science equivalent of rolling your own web server.

Nippashish
Nov 2, 2005

Let me see you dance!
I put stickers on my laptop because I work in an office with 500 people and 99% of them use macbook pros. Gotta have some way to tell which one is mine.

Nippashish
Nov 2, 2005

Let me see you dance!

creatine posted:

Question:

I am passing a list of arrays into a function that will then do some plotting with those arrays. A simple way I thought to get a name was to use the .index() method in a way like this

code:
for data in input_list:
    print(input_list.index(data))
But, I keep getting an error:

code:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

input_list.index(data) effectively looks like this on the inside:

code:
for index, thing in enumerate(input_list):
    if data == thing:
        return index
Your "thing"s are numpy arrays (presumably they're also all the same size, otherwise you'd get a different error) and if you do array1 == array2 the result is an array of booleans that tells you the elementwise equality of array1 and array2. Using this as the condition in an "if" means python asks the array of booleans to convert itself into a single True or False value, and the error you're seeing is numpy complaining that it doesn't know how to collapse an array with more than one element into a single boolean.
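To see it concretely (small made-up arrays):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 0, 3])

print(a == b)          # elementwise: [ True False  True]
print((a == b).all())  # a single bool: were *all* elements equal?
print((a == b).any())  # a single bool: was *any* element equal?

# To find a particular array's position in a list, checking identity
# sidesteps elementwise comparison entirely:
arrays = [a, b]
idx = next(i for i, x in enumerate(arrays) if x is b)
print(idx)  # 1
```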

Nippashish
Nov 2, 2005

Let me see you dance!

SurgicalOntologist posted:

In short, I can't figure out a good way to reliably get the closest index with floats. I really don't want to do something like argmin(abs(time_series - time_values)) for a frequently used lookup function, but is that the only way?

Searchsorted doesn't look for the index of the nearest-to-target value, it looks for the index where you could insert the target value while keeping the original vector sorted. If searchsorted returns idx for target it doesn't tell you if target is closer to values[idx-1] or values[idx], you need to check that separately. Here's a solution from stackoverflow:
code:
def find_nearest(array, value):
    # note: assumes value lands inside array's range; idx can run off
    # either end of the array for out-of-range targets
    idx = np.searchsorted(array, value, side="left")
    return idx - (np.abs(value - array[idx-1]) < np.abs(value - array[idx]))

Nippashish
Nov 2, 2005

Let me see you dance!
Another option is to fix the script to work with the updated pandas version.

Nippashish
Nov 2, 2005

Let me see you dance!

Tigren posted:

This seems like it's no longer just an implementation detail, but Guido proclaiming dicts are now ordered by design starting in 3.7.

It seems like they're also keeping collections.OrderedDict but intentionally having a different implementation optimized around different usage patterns, which seems like an odd choice.

Nippashish
Nov 2, 2005

Let me see you dance!

unpacked robinhood posted:

It gives the following tree, in this case the longest path is 9 bits long, which isn't an improvement when storing 8 bit values I think ?

A Huffman code minimizes the average code length for the data used to build it. If you compare a Huffman code to the shortest fixed width code then some words will have longer codes and some will have shorter codes, but it is specifically the common words that are short and the rare ones that are long. If you encode the input data and compute total_encoded_bits / number_of_words then this ratio will be smaller for the Huffman code.
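To make that concrete, here's a small sketch that builds Huffman code lengths with heapq and checks the average (the symbols and frequencies are made up; a fixed-width code for 4 symbols needs 2 bits each):

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code_length} for a Huffman code built from freqs."""
    # heap entries: (weight, tiebreak, {symbol: depth_so_far})
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # merging two subtrees pushes every symbol in them one level deeper
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# A skewed source: 'a' is common, the rest are rare.
data = "a" * 90 + "b" * 5 + "c" * 3 + "d" * 2
freqs = Counter(data)
lengths = huffman_code_lengths(freqs)

avg_bits = sum(lengths[s] * freqs[s] for s in freqs) / len(data)
print(lengths)   # 'a' gets a short code, rare symbols get longer ones
print(avg_bits)  # 1.15, well under the 2 bits/symbol of a fixed-width code
```

The rare symbols 'c' and 'd' get 3-bit codes (longer than fixed width), but because 'a' is so common the average comes out ahead, which is exactly the tradeoff described above.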

Nippashish
Nov 2, 2005

Let me see you dance!

Slimchandi posted:

But even if the object I'm describing in my class is 'abstract' (e.g. Bird), which I never call directly, and I only use it to subclass instances of 'real' birds (Gull, Eagle, Owl), this situation wouldn't benefit from using an ABC. I would be better off with a standard Bird class that I inherit from, right?

I wouldn't use an ABC unless at least one of the methods was going to be abstract. In this case the advantage of an abstract foo over def foo(): pass in the base class is that python will yell at you if you forget to implement foo in the subclass, instead of just calling the empty method and plowing ahead. If this is not a thing you care about then you have no reason to use ABCs.
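A minimal sketch of the difference:

```python
# With abstractmethod, forgetting to implement the method fails loudly
# at instantiation instead of silently calling an empty base method.
from abc import ABC, abstractmethod

class Bird(ABC):
    @abstractmethod
    def call(self):
        ...

class Gull(Bird):
    def call(self):
        return "caw"

class Owl(Bird):
    pass  # forgot to implement call

print(Gull().call())  # caw
try:
    Owl()
except TypeError as e:
    print("python yells at you:", e)
```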

Nippashish
Nov 2, 2005

Let me see you dance!

NtotheTC posted:

To be clear, the behaviour I want to replicate is

I have endless copies of
code:
#!/bin/bash

source $(pwd)/venv/bin/activate
"$@"
hanging around in my projects. The end result looks pretty similar to what you want, because env changes made inside a script don't propagate out to the surrounding environment, so I can type ./run.sh python my_script.py and it does the right thing.

Nippashish
Nov 2, 2005

Let me see you dance!

ZarathustraFollower posted:

filename=('HeteroFreq2.txt'),

What am I doing wrong with declaring the filename?

That comma on the end makes it a tuple with one element. Delete the comma (and maybe also the parens, because they're not doing anything).
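To see what the comma does:

```python
as_tuple = ('HeteroFreq2.txt'),   # the trailing comma makes a 1-tuple
as_string = 'HeteroFreq2.txt'     # drop the comma; the parens do nothing

print(type(as_tuple))   # <class 'tuple'>
print(type(as_string))  # <class 'str'>
```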

Nippashish
Nov 2, 2005

Let me see you dance!

Portland Sucks posted:

I know floating point math is a nontrivial issue, but is there a standard way to handle this stuff when doing routine calculations on numbers that are very precise? My current thought is to just change my tests to pass if there is less than n% difference between the Excel values and the python computed values, but that seems a little hand wavy.

You're opening a huge can of worms here by asking about precision, although almost certainly the answer you want is "learn to live with small amounts of precision difference".

However, if you're seeing differences as early as just reading the values and looking at what you read, it is extremely likely that Excel actually has the same precision loss as python and is just rounding differently for display purposes.

Nippashish
Nov 2, 2005

Let me see you dance!

Seventh Arrow posted:

Wait, I just realized that the index - or row number, whatever you want to call it - is being included in "collector_key":

{"collector_key":{"0":-1,"1":-1,"2":139517343969,"3":-1,"4":-1...}

All those numbers that are bolded are totally superfluous. I wonder if removing them (if possible) will fix the problem.

No, this is not what you need to do. to_json is not storing "superfluous" data, but it's also not storing the data in the format you need. When you look at a json file you can read [] and {} just like you do in python. When you call json.load on the file, [] becomes a list and {} becomes a dict.

What's going on is you have a table of data like:
code:
   A  B  C
0 a0 b0 c0
1 a1 b1 c1
2 a2 b2 c2
DynamoDB expects a sequence of records, like this:
code:
[
  {A: a0 B: b0 C: c0},
  {A: a1 B: b1 C: c1},
  {A: a2 B: b2 C: c2},
]
(i.e this is a list of dicts and the for movie in movies loop runs over the list).

Pandas .to_json stores the data differently though. to_json stores it like this:
code:
{
  A: { 0: a0, 1: a1, 2: a2 },
  B: { 0: b0, 1: b1, 2: c2 },
  C: { 0: c0, 1: c1, 2: b2 },
}
This is a dictionary of columns, where each column is a dictionary mapping index -> value (the index is there because pandas lets you use non-integer, non-contiguous indexes; it can't just write the values down in order, because you could have an index like [0, 1, 23243254324324] instead of [0, 1, 2]).

For what you are trying to do I think the "json" part is leading you down a garden path. You need to do something like this:
code:
from __future__ import print_function # Python 2/3 compatibility
import boto3
import pandas as pd

dynamodb = boto3.resource(...)

table = dynamodb.Table('Loyalty_One')

with open("loyalty_one.json") as json_file:
    data = pd.read_json(json_file)
    for index, row in data.iterrows():  # iterrows yields (index, row) pairs
        collector_key = int(row['collector_key'])
        sales = int(row['sales'])
        store_location_key = int(row['store_location_key'])

        # ... continue as before

Nippashish
Nov 2, 2005

Let me see you dance!

Seventh Arrow posted:

I'm not sure (yet) how to call a dataframe row by row

That's what iterrows does.

Nippashish
Nov 2, 2005

Let me see you dance!
You can also run an http server in the python 3 process and have the python 2 process interact with that to signal when it should do things.

Nippashish
Nov 2, 2005

Let me see you dance!

Linear Zoetrope posted:

In numpy (or some interoperable library like pandas) is there a way to create basic shapes like triangles or circles in an ndarray/tensor/table/whatever? I'm featurizing a space of entities represented like {pos = (x,y) , shape = Triangle{base_len = 1.5}, ...} or whatever and while I'm entirely capable of writing this myself* it would be nice to be able to just have python do it.

Numpy isn't an image library so it doesn't have functions like this. Your best bet I think is to use Pillow to draw a greyscale image for each channel and then convert those to numpy arrays and stack them. Pillow has a polygon function that should suit your needs, and you can convert a Pillow image to a numpy array with np.asarray(my_image).
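A sketch of what I mean (assumes Pillow is installed; the canvas size and triangle coordinates are arbitrary):

```python
import numpy as np
from PIL import Image, ImageDraw

size = (32, 32)
img = Image.new("L", size, 0)  # "L" = 8-bit greyscale, starts black
draw = ImageDraw.Draw(img)
draw.polygon([(16, 4), (4, 28), (28, 28)], fill=255)  # a filled triangle

channel = np.asarray(img)               # shape (32, 32), dtype uint8
stacked = np.stack([channel, channel])  # stack per-shape channels
print(channel.shape, stacked.shape)
```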

Nippashish
Nov 2, 2005

Let me see you dance!

SpaceSDoorGunner posted:

Anyone know why this:

https://stackoverflow.com/questions/23287/largest-prime-factor-of-a-number/412942#412942

is better than what seemed to me to be the obvious solution:

def factor(num):
    factors = []
    for x in range(1, num):
        if num % x == 0:
            factors.append(x)
    factors.append(num)
    return factors

factor(42)

Yours finds non-prime factors.
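For contrast, a simple sketch of finding just the largest prime factor (trial division; not necessarily the approach from the linked answer):

```python
def largest_prime_factor(num):
    factor = 2
    while factor * factor <= num:
        if num % factor == 0:
            num //= factor  # strip this prime out repeatedly
        else:
            factor += 1
    return num  # what's left is the largest prime factor

print(largest_prime_factor(42))  # 7  (42 = 2 * 3 * 7)
```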

Nippashish
Nov 2, 2005

Let me see you dance!

cinci zoo sniper posted:

Yeah NumPy will internally route stuff to C or Fortran code to speed things up, including hardware-specific optimisations. Pandas does the same twofold, since it has both its own C stuff and is largely built on top of NumPy (wrt data structures and such).

This doesn't help for constructs like apply though, since they need to invoke a python function (and thus interact with the interpreter) for each element of whatever you are applying to.
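To illustrate with a made-up elementwise operation: one vectorized call versus a Python-level call per element. Both give the same answer, but the vectorized one never re-enters the interpreter per element, which is where the speed difference comes from.

```python
import numpy as np

x = np.arange(1.0, 6.0)

vectorized = np.reciprocal(x)                 # one C-level loop
per_element = np.array([1.0 / v for v in x])  # a Python call per element

print(np.allclose(vectorized, per_element))  # True: same result
```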

Nippashish
Nov 2, 2005

Let me see you dance!
The arange and the reciprocal aren't helping with memory use either.

Nippashish
Nov 2, 2005

Let me see you dance!

salisbury shake posted:

I'm just learning to use NumPy and had to poke around the docs to do what I want. While the above code finds the solution correctly, I'm unsure if I'm using NumPy efficiently or even canonically.

For example, in get_fabric(), I'm trying to minimize sequential Python code in favor of using NumPy's ufuncs and routines where possible, but I'm still calling np.add.at() over a thousand times in total. That feels wrong, but I don't have the context to really know if/why it's wrong.

The problem is that the pattern of indices you need to accumulate into is a bunch of irregularly scattered rectangles. There is good syntax for working with single rectangles of locations, or with irregularly scattered single locations, but irregularly scattered rectangles is a case that is not handled well.

If you build all of the indexes explicitly (e.g. using np.meshgrid or np.mgrid) and concatenate them together then you could probably get away with a single call to np.add.at (by changing the problem from indexing at an irregular collection of rectangles to indexing at an irregular collection of points), but this doesn't solve the real problem, which is that you need to loop over all of the claims to collect the indices in the first place. It might be worth trying as an exercise to learn the tools, but it won't be any more efficient and will probably be harder to understand.

For count_overlapping it would be more idiomatic to write np.sum(fabric > 1), but your way is fine too. It would also be more idiomatic to write fabric[indices] += increment instead of np.add.at(fabric, indices, increment) unless you are making use of the fact that np.add.at will increment repeated indices multiple times.
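That last distinction, concretely:

```python
import numpy as np

a = np.zeros(4)
a[[0, 0, 1]] += 1   # buffered: the repeated index 0 only counts once
print(a)            # [1. 1. 0. 0.]

b = np.zeros(4)
np.add.at(b, [0, 0, 1], 1)  # unbuffered: repeated indices accumulate
print(b)                    # [2. 1. 0. 0.]
```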

Nippashish fucked around with this message at 09:21 on Dec 7, 2018

Nippashish
Nov 2, 2005

Let me see you dance!
You could also just commit to not import your scripts as modules and then you don't need any double underscore shenanigans.

Nippashish
Nov 2, 2005

Let me see you dance!

QuarkJets posted:

It's cool and good to have a file that is both importable and runnable as a script

I guess what I'm trying to say is that having a slightly weird syntax to do a slightly weird thing is one of the less objectionable features of python imo.
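The slightly weird syntax in question, for anyone following along:

```python
def main():
    return "running as a script"

# Runs when the file is executed as a script, but not when it's imported
# as a module, since __name__ is only "__main__" in the script case.
if __name__ == "__main__":
    print(main())
```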
