Cingulate
Oct 23, 2012

by Fluffdaddy
So in light of

Cingulate posted:

I found this very interesting: https://medium.com/dunder-data/python-for-data-analysis-a-critical-line-by-line-review-5d5678a4c203
A brutal review of Wes McKinney's book on pandas.
What's an actually good intro to data analysis with Python?


Stringent
Dec 22, 2004


image text goes here
What are people using for linters in vim?

Baby Babbeh
Aug 2, 2005

It's hard to soar with the eagles when you work with Turkeys!!



Pandas question! I've got a json file with stupid amounts of nesting that I want to turn into a nice flat datafile. Basically for each record I want to pull just a few features out of each property that are nested two and sometimes three layers deep rather than just flattening the whole thing out and ending up with a ton of extraneous columns.

My naive approach was to create an empty dataframe, iterate through the json file and grab things, stick those in a Series and then stick the Series in the dataframe, but I know this can't be the right way to do this. What should I be doing instead?

Eela6
May 25, 2007
Shredded Hen

Baby Babbeh posted:

Pandas question! I've got a json file with stupid amounts of nesting that I want to turn into a nice flat datafile. Basically for each record I want to pull just a few features out of each property that are nested two and sometimes three layers deep rather than just flattening the whole thing out and ending up with a ton of extraneous columns.

My naive approach was to create an empty dataframe, iterate through the json file and grab things, stick those in a Series and then stick the Series in the dataframe, but I know this can't be the right way to do this. What should I be doing instead?

is there anything wrong with just using pure python?

Like, you could do this:

Python code:
from typing import Any, Iterable, Iterator


def flatten_subset(d: dict, keys: Iterable[str], *, sep: str = ".") -> Iterator[Any]:
    """Yield the value at each dotted key path, e.g. "bar.0" -> d["bar"]["0"]."""
    def get_nested_elem(key: str) -> Any:
        v = d
        for k in key.split(sep):  # walk down one nesting level per path segment
            v = v[k]
        return v

    for key in keys:
        yield get_nested_elem(key)


if __name__ == '__main__':  # test
    d = {"foo": [1, 2], "bar": {'0': 0, '1': 1}, "baz": 3}
    keys = ("foo", "bar.0")
    want = [[1, 2], 0]
    got = list(flatten_subset(d, keys))
    assert got == want

From there the 'naive' approach should work just fine.

Eela6 fucked around with this message at 23:45 on Dec 14, 2017

hhhmmm
Jan 1, 2006
...?

Baby Babbeh posted:

Pandas question! I've got a json file with stupid amounts of nesting that I want to turn into a nice flat datafile. Basically for each record I want to pull just a few features out of each property that are nested two and sometimes three layers deep rather than just flattening the whole thing out and ending up with a ton of extraneous columns.

My naive approach was to create an empty dataframe, iterate through the json file and grab things, stick those in a Series and then stick the Series in the dataframe, but I know this can't be the right way to do this. What should I be doing instead?

Will you be doing this once or repeatedly?

If you are only doing it once, maybe don't optimize for performance at that step? It does seem like one of those points where it's fine to accept that your code is not optimal, and just go fetch a cup of coffee whenever you need to redo it.

If you do need to optimize, consider:
In a single pass over the JSON file, check each record for all the features you are looking for. Add them to a Series, but afterwards combine the Series as a row-wise operation, not column-wise.
That is, don't make a separate pass through the JSON file for each feature, since you are then repeating the same expensive disk operations for every feature you are looking for.

(or I could have misunderstood)
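A minimal sketch of that single-pass idea, assuming each record is a dict and the wanted features are given as hypothetical dotted paths like "user.geo.city". A missing key gives None instead of raising, since some features are only sometimes three levels deep. The helper names get_path and extract_rows are made up for illustration.

```python
import json
from typing import Any, Dict, Iterable, List, Optional


def get_path(record: Dict[str, Any], path: str, sep: str = ".") -> Optional[Any]:
    """Follow a dotted path like 'a.b.c' into nested dicts; None if any key is missing."""
    value: Any = record
    for key in path.split(sep):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def extract_rows(records: Iterable[Dict[str, Any]], paths: List[str]) -> List[Dict[str, Any]]:
    """One pass over the records, pulling every wanted feature from each record at once."""
    return [{path: get_path(rec, path) for path in paths} for rec in records]


if __name__ == "__main__":
    # Hypothetical sample data standing in for one of the real JSON files
    raw = ('[{"id": 1, "user": {"name": "a", "geo": {"city": "X"}}},'
           ' {"id": 2, "user": {"name": "b"}}]')
    rows = extract_rows(json.loads(raw), ["id", "user.name", "user.geo.city"])
    print(rows)
```

From there, pandas.DataFrame(rows) builds the whole frame in one call with one column per path, instead of growing a DataFrame row by row.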

Baby Babbeh
Aug 2, 2005

It's hard to soar with the eagles when you work with Turkeys!!



I'm going to be getting these json files in batches to process, 100 or so a batch, so I'd like to make this operation reasonably lightweight if I can, but it also doesn't need to be insanely optimized either. I'm okay with running it and then going to get lunch if need be. I think I'm going to basically follow Eela6's approach, make a list of Serieses, and then concatenate it into a DataFrame. That should be good enough. Thanks for your help!

Shadow0
Jun 16, 2008


If to live in this style is to be eccentric, it must be confessed that there is something good in eccentricity.

Grimey Drawer
I'm trying to make a Python program where I need to be able to record some audio, but every library I look at leads me to a dead end. I can't seem to get a library for audio recording to install on my Windows machine with Python 3.5 (and also Anaconda, if that matters). Does anyone know of a library or some function or some other method I can use to record audio with Python?

There was a PyAudio wheel that seemed like it was going to be just what I needed, but I can't seem to install it and get it working. I guess I'm pretty confused on the whole Python-library-getting process and how it works with Anaconda and all that.

Edit: I decided to just give up on it. I'll record the audio with something else, and then have Python do the rest. I think I just messed up the PyAudio install somehow, oh well.

Shadow0 fucked around with this message at 07:02 on Dec 17, 2017

NtotheTC
Dec 31, 2007


So Microsoft are considering making Python an official scripting language in Excel, which is pretty cool I guess. Though it means when my friends ask me "Hey fix my spreadsheets" I can't use the "sorry I don't know anything about VBA" excuse.

Fergus Mac Roich
Nov 5, 2008

Soiled Meat

NtotheTC posted:

So Microsoft are considering making Python an official scripting language in Excel, which is pretty cool I guess. Though it means when my friends ask me "Hey fix my spreadsheets" I can't use the "sorry I don't know anything about VBA" excuse.

I don't know, I've always seen VBA as kind of a canary in the Excel coalmine, so to speak. If your macro is cumbersome to write, you know you shouldn't have picked Excel for this.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

outlier posted:

I don't have any thoughts about the backend at the moment other than I'd like it to be a Python-based one (which I am not going to build myself). I figured the choice of frontend might constrain the backend. Will chase up Django REST.

Modern web stuff completely decouples (or at least that's the goal) the frontend from the backend, so the choice of technologies on either end shouldn't have any effect on your choices on the other end.

The hottest frontend right now is probably React/Redux, but discussions on that mostly occur in the modern web development thread.

Dominoes
Sep 20, 2007

If you haven't tried Pipenv, give it a shot: It elegantly combines virtualenvs with pip, and has simplified my workflow. It's been hard-broken with two distinct bugs until a few days ago, but the latest version is good-to-go.

Eela6
May 25, 2007
Shredded Hen
I love fun weird metaprogramming stuff.

Python code:
def ordering_mixin(*args: str, default: Any = None):
    """order_by returns a mixin class which provides ordering operators. 
    These operators order lexigraphically by the attributes with the names in 'args'. 
    ordering_mixin will only compare two classes with the same ordering attrributes; that is,
    classes created with equivalent ordering_mixins"""

    if default is not None:
        def attrs(obj: Any) -> Iterator[Any]: 
            return (getattr(obj, arg, default) for arg in args)
    else:
        def attrs(obj: Any) -> Iterator[Any]:
            return (getattr(obj, arg) for arg in args)

    
    @total_ordering
    class OrderedMixin:
        _ordered_mixin_args = tuple(args)
        def __lt__(self, other: Any):
            if not hasattr(other, '_ordered_mixin_args') or other._ordered_mixin_args != self._ordered_mixin_args:
                return NotImplemented
            for a, b in zip(attrs(self), attrs(other)):
                if a < b:
                    return True
                elif b < a:
                    return False
            return False

        def __eq__(self, other: Any):
            if not hasattr(other, '_ordered_mixin_args') or other._ordered_mixin_args != self._ordered_mixin_args:
                return NotImplemented
            return all(a == b for a, b in zip(attrs(self), attrs(other)))
        

    return OrderedMixin

if __name__ == '__main__':
    class TestClass(ordering_mixin('a', 'b')):
       def __init__(self, a, b, c, d):
            self.a, self.b, self.c, self.d = a, b, c, d


    foo = TestClass(2, 3, 0, 0)
    bar = TestClass(2, 4, 0, 0)
    baz = TestClass(1, 3, 0, 0)

    class TestClass2(ordering_mixin('c', 'd')):
        def __init__(self, a, b, c, d):
           self.a, self.b, self.c, self.d = a, b, c, d

    assert foo < bar
    assert bar > baz
    assert foo == foo
    poo = TestClass2(2, 3, 0, 0)
    try:
        foo < poo
        raise AssertionError('should have raised a typeerror')
    except TypeError as e:
        pass

Eela6 fucked around with this message at 21:03 on Dec 19, 2017

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

OrderedMixin!

Eela6
May 25, 2007
Shredded Hen

Thermopyle posted:

OrderedMixin!

My bad ;) I haven't used Python for real dev work in a while, since I work in Go professionally. I just like to come back and stretch my wings sometimes :)

Tigren
Oct 3, 2003

Tigren posted:

And remember, in Python 3.6, dicts are now ordered.

This seems like it's no longer just an implementation detail, but Guido proclaiming dicts are now ordered by design starting in 3.7.

Guido van Rossum posted:

Make it so. "Dict keeps insertion order" is the ruling. Thanks!

https://mail.python.org/pipermail/python-dev/2017-December/151283.html

Nippashish
Nov 2, 2005

Let me see you dance!

Tigren posted:

This seems like it's no longer just an implementation detail, but Guido proclaiming dicts are now ordered by design starting in 3.7.

It seems like they're also keeping collections.OrderedDict but intentionally having a different implementation optimized around different usage patterns, which seems like an odd choice.
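A short sketch of a few differences that, as far as I know, remain between the two: both iterate in insertion order, but equality between two OrderedDicts is order-sensitive while plain dict equality is not, and OrderedDict keeps reordering helpers like move_to_end().

```python
from collections import OrderedDict

# Plain dicts iterate in insertion order (a language guarantee from 3.7)...
d = {"b": 1, "a": 2}
assert list(d) == ["b", "a"]

# ...but equality between plain dicts ignores order
assert {"b": 1, "a": 2} == {"a": 2, "b": 1}

# OrderedDict == OrderedDict also compares order
od1 = OrderedDict([("b", 1), ("a", 2)])
od2 = OrderedDict([("a", 2), ("b", 1)])
assert od1 != od2

# Comparing an OrderedDict with a plain dict is order-insensitive
assert od1 == {"a": 2, "b": 1}

# OrderedDict supports cheap reordering at either end
od1.move_to_end("b")
assert list(od1) == ["a", "b"]
od1.move_to_end("b", last=False)
assert list(od1) == ["b", "a"]
```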

Baby Babbeh
Aug 2, 2005

It's hard to soar with the eagles when you work with Turkeys!!



That seems... not very pythonic?

Data Graham
Dec 28, 2009

📈📊🍪😋



Doesn’t that mean there will be demand for like index-based accessors and tons of non-backward-compatible code simplifications?

Schism time

unpacked robinhood
Feb 18, 2013

by Fluffdaddy
Anyone familiar with Huffman trees?
I'm trying to build one as an exercise (and later to store a bunch of 8-bit values on a microcontroller), and I'm missing something.

When building from short sentences it seems to build functional trees, but it falls apart with more data.

For example if I pile enough words together (ascii coded):
code:
flippity flop shitkabob the wrong whistler and so are you, come get some no dad is dad as you zoom zam everyone is prettty hotttt my cousin steals eggnog umbrella yeah now you see begrudingly huge piss orange monkey congratulate synergie ghetto lambda extraverti nyctalope donkey dong county fattest man on the moon what what yeah no say say i love velo poupoupidou zoop 1 2 3 tell my friends I love me
It gives the following tree. In this case the longest path is 9 bits long, which isn't an improvement when storing 8-bit values, I think?



So I either: don't understand what Huffman trees are, build a graphic representation that isn't accurate, or have a broken thing somewhere.

I'm not sure how to start fixing this. This is some of my code, the whole thing is here, as well as a bunch of log data (computing weights and tree building)

e: removed embedded code

I wondered if relying on min() and index() to choose and remove elements could introduce inconsistencies, but it seems they'll always work "first index comes first" when there are two Nodes with the same weight.

unpacked robinhood fucked around with this message at 12:39 on Dec 20, 2017

Nippashish
Nov 2, 2005

Let me see you dance!

unpacked robinhood posted:

It gives the following tree. In this case the longest path is 9 bits long, which isn't an improvement when storing 8-bit values, I think?

A Huffman code minimizes the average code length for the data used to build it. If you compare a Huffman code to the shortest fixed width code then some words will have longer codes and some will have shorter codes, but it is specifically the common words that are short and the rare ones that are long. If you encode the input data and compute total_encoded_bits / number_of_words then this ratio will be smaller for the Huffman code.
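A minimal sketch of that measurement. It builds the code with heapq instead of min()/index() (a swapped-in technique, not the original poster's code), and each heap entry carries a running counter so equal weights are tie-broken deterministically without ever comparing the node payloads. It then encodes the input and reports bits per symbol to compare against the fixed 8-bit width.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, Hashable, Iterable


def huffman_code(symbols: Iterable[Hashable]) -> Dict[Hashable, str]:
    """Map each symbol to a bit string; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if not freq:
        return {}
    tiebreak = count()  # unique per entry, so equal weights never compare the dicts
    # Heap entries: (subtree weight, tiebreak, {symbol: code built so far})
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct symbol still needs a 1-bit code
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)  # the two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}  # left branch prepends 0
        merged.update({s: "1" + c for s, c in right.items()})  # right branch prepends 1
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


if __name__ == "__main__":
    # A fragment of the sample text from the question
    text = "what what yeah no say say i love velo poupoupidou zoop"
    code = huffman_code(text)
    total_bits = sum(len(code[ch]) for ch in text)
    print("longest code:", max(len(c) for c in code.values()), "bits")
    print("average: %.2f bits/symbol vs 8 fixed" % (total_bits / len(text)))
```

The longest code can exceed 8 bits for rare symbols; the number that should drop is the average.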

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug

Dominoes posted:

If you haven't tried Pipenv, give it a shot: It elegantly combines virtualenvs with pip, and has simplified my workflow. It's been hard-broken with two distinct bugs until a few days ago, but the latest version is good-to-go.

Does pipenv still insist on using the deprecated virtualenv tool on all versions of Python, instead of using the venv standard library module when it's available? That was definitely the case when I checked last, and was a no-go for me to try out pipenv.

Sockser
Jun 28, 2007

This world only remembers the results!




When building out some code last week, I accidentally used {} to make an array instead of ()
e.g. arr = {'a', 'b', 'c',}

This was working but the order was getting goofed, which led me to discover my error. Is this just generating a dictionary of only keys with None values?

Dominoes
Sep 20, 2007

Lysidas posted:

Does pipenv still insist on using the deprecated virtualenv tool on all versions of Python, instead of using the venv standard library module when it's available? That was definitely the case when I checked last, and was a no-go for me to try out pipenv.
virtualenv is listed as a dependency in setup.py, so probably.

Sockser posted:

When building out some code last week, I accidentally used {} to make an array instead of ()
e.g. arr = {'a', 'b', 'c',}

This was working but the order was getting goofed, which led me to discover my error. Is this just generating a dictionary of only keys with None values?
You created a set. It's like a list, but unordered and without duplicates. A common use is as an intermediate data structure to remove duplicates.

Dominoes fucked around with this message at 20:09 on Dec 20, 2017

SurgicalOntologist
Jun 17, 2004

Sockser posted:

When building out some code last week, I accidentally used {} to make an array instead of ()
e.g. arr = {'a', 'b', 'c',}

This was working but the order was getting goofed, which led me to discover my error. Is this just generating a dictionary of only keys with None values?

It's called a set. It kind of acts like dictionary keys in that there are no duplicates, only hashable values are allowed, and order is undefined (although is this changing for sets too?), but there are no values, not even None. It's useful for keeping track of things where you don't want to keep multiple copies of anything, for getting the unique elements of another collection, or of course for set operations like union and intersection.

Edit: it's faster for membership testing (the "in" keyword) so I often use set literals in expressions like if extension in {'csv', 'tsv', 'xls', 'xlsx'}: (even though a speed consideration is pointless in this example, the semantics of using a set here work better as well).
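A quick sketch of the "only hashable values" rule alongside that membership test: a list is rejected as a set element, while frozenset is the immutable, hashable variant for when a set needs to live inside another set or be a dict key. The file_kind helper is a made-up example.

```python
from typing import Dict


def file_kind(extension: str) -> str:
    # A set literal reads naturally as "one of these"
    if extension in {"csv", "tsv", "xls", "xlsx"}:
        return "table"
    return "other"


# Set elements must be hashable: the tuple is fine, the list raises TypeError
try:
    bad = {(1, 2), [3, 4]}
except TypeError as exc:
    unhashable_error = str(exc)

# frozenset is hashable, so it can be a set element or a dict key
groups = {frozenset({"a", "b"}), frozenset({"b", "a"}), frozenset({"c"})}
sizes: Dict[frozenset, int] = {g: len(g) for g in groups}
```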

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug

Sockser posted:

When building out some code last week, I accidentally used {} to make an array instead of ()
e.g. arr = {'a', 'b', 'c',}

This was working but the order was getting goofed, which led me to discover my error. Is this just generating a dictionary of only keys with None values?

No, this creates a set object: https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset

Sets are unordered collections of distinct objects, so the order you see is arbitrary, and duplicate elements are only stored once:

code:
In [1]: {'a', 'b', 'c', 'b'} == {'c', 'a', 'b'}
Out[1]: True
I'm pretty sure that in the distant past (like Python 2.3), sets were implemented as dictionaries that mapped every key to None, but this is an implementation detail that isn't necessarily still true. I'm sure the set and dict implementations in current Python share a lot of hash table functionality, but I wouldn't be surprised if sets were no longer implemented as wrapper classes around dicts that map each element to None.

e: wow beaten twice, that was fast

Lysidas fucked around with this message at 20:14 on Dec 20, 2017

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Now that's irony

Sockser
Jun 28, 2007

This world only remembers the results!




Tried to show that to someone who showed up to my desk after I posted and the interpreter spit out set(xyz)

I probably should’ve run it through the interpreter before asking :downs:

On the bright side, I didn’t know Python had native support for sets like that, so that’s neat


Edit:
x = {'a'}
Yields set(a)
x = {‘a’: 1}
Gives you a dict

So what is the type of x if you do
x = {}?

E2: type(x) tells me it’s a dictionary. Can you turn that into an empty set somehow?

Sockser fucked around with this message at 20:36 on Dec 20, 2017

vikingstrike
Sep 23, 2007

whats happening, captain
x = {} creates a dictionary, x = set() if you want to create an empty set
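To check this in the interpreter: empty braces are a dict literal, a literal with no colon is a set, and set() is the usual way to spell an empty set. (Python 3 shows a one-element set as {'a'}; Python 2.7 prints set(['a']), which may be the set(...) form that showed up earlier.)

```python
empty_braces = {}    # nothing inside the braces: a dict literal
empty_set = set()    # the usual way to spell an empty set
one_element = {"a"}  # no colon, so this is a set
one_pair = {"a": 1}  # key: value, so this is a dict

assert type(empty_braces) is dict
assert type(empty_set) is set
assert type(one_element) is set
assert type(one_pair) is dict

empty_set.add("a")   # an empty set can be filled in later
assert empty_set == one_element
print(repr(set()), repr(one_element))  # set() {'a'}
```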

unpacked robinhood
Feb 18, 2013

by Fluffdaddy

Nippashish posted:

A Huffman code minimizes the average code length for the data used to build it. If you compare a Huffman code to the shortest fixed width code then some words will have longer codes and some will have shorter codes, but it is specifically the common words that are short and the rare ones that are long. If you encode the input data and compute total_encoded_bits / number_of_words then this ratio will be smaller for the Huffman code.

Thanks! I have the compression side doing... something now.

Assuming it works correctly, it does wonders for RLE encoded b&w images, not so much on the data I wanted to use it on.

Eela6
May 25, 2007
Shredded Hen
Sets have a number of operators that aren't defined for dictionaries, too.

Python code:
a = {'foo', 'bar'}
b = {'bar', 'baz'}
assert a | b == {'foo', 'bar', 'baz'}  # union
assert a & b == {'bar'}  # intersection
assert a ^ b == {'foo', 'baz'}  # symmetric difference (xor)
assert a - b == {'foo'}  # difference

assert {'foo'} < {'foo', 'bar'}  # proper subset
assert not {'foo'} < {'foo'}  # a set is not a proper subset of itself
assert {'foo'} <= {'foo'}  # improper subset

These four in-place operators (|=, &=, -=, ^=) are also available.
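A short sketch of those in-place forms. Each one mutates the left-hand set (equivalent to the named method in the comment) rather than building a new set, so a second name bound to the same object sees every change. Writing a = a | {...} instead would rebind a to a brand-new set.

```python
a = {"foo", "bar"}
alias = a  # a second name for the same set object

a |= {"baz"}         # like a.update({"baz"})                  -> {'foo', 'bar', 'baz'}
a &= {"foo", "baz"}  # like a.intersection_update(...)         -> {'foo', 'baz'}
a -= {"foo"}         # like a.difference_update(...)           -> {'baz'}
a ^= {"baz", "qux"}  # like a.symmetric_difference_update(...) -> {'qux'}

# All four mutated the original object in place
assert alias is a
print(a)  # {'qux'}
```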

Eela6 fucked around with this message at 22:22 on Dec 20, 2017

FoiledAgain
May 6, 2007

Since we're talking about sets, here's my favourite way to remove duplicates from a list:
some_list = list(set(some_list))

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

FoiledAgain posted:

Since we're talking about sets, here's my favourite way to remove duplicates from a list:
some_list = list(set(some_list))

The main problem with this is that it doesn't maintain order, but it is clear and succinct.

I can distinctly remember the first thing I ever did in Python that I felt was cool without any help from StackOverflow or anywhere:

Python code:
list(dict.fromkeys(some_list))
Of course, at the time that wasn't order-preserving, but in CPython 3.6 it is (and the language guarantees it from 3.7). If you're on Python 2.7 you can use collections.OrderedDict.fromkeys.
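A small sketch comparing the approaches, assuming the list elements are hashable: dict.fromkeys keeps the first occurrence of each item in its original position, OrderedDict.fromkeys does the same where dict order isn't guaranteed, and list(set(...)) makes no ordering promise at all.

```python
from collections import OrderedDict
from typing import Hashable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def dedupe_keep_order(items: List[T]) -> List[T]:
    # dict keys are unique and keep insertion order (guaranteed from 3.7)
    return list(dict.fromkeys(items))


def dedupe_keep_order_legacy(items: List[T]) -> List[T]:
    # Same result on versions where plain dict order isn't guaranteed
    return list(OrderedDict.fromkeys(items))


some_list = [3, 1, 3, 2, 1]
print(dedupe_keep_order(some_list))         # [3, 1, 2]
print(dedupe_keep_order_legacy(some_list))  # [3, 1, 2]
print(sorted(set(some_list)))               # [1, 2, 3]: set drops order, so sort if you need one
```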

LochNessMonster
Feb 3, 2005

I need about three fitty


I was wondering if there was a more pythonic way of doing this:

code:
d1 = {'servertype': ["name1", "name2"], 'servertype2': ["name3"]}  # etc.


d2 = {'servertype1': ["extension1", "extension2"], 'servertype2': ["extension3"]}  # etc.

some_list = []

for k, v in d1.items():
  for x in range(len(d1)):
    for y in range(len(d2)):
      some_list.append(str(d1[k][x]) + str(d2[k][y]))

print(some_list)

IAmKale
Jun 7, 2007

やらないか

Fun Shoe

LochNessMonster posted:

I was wondering there was a more pythonic way of doing this:

List comprehensions are pretty pythonic. How about something like this?

code:
d1 = { "servertype" : ["name1", "name2"], "servertype2" : ["name3"] }
d2 = { "servertype" : ["extension1", "extension2"], "servertype2" : ["extension3"] }

some_list = []

for k, v in d1.items():
    if k in d2:
        some_list += ['{}{}'.format(x, y) for x in d1[k] for y in d2[k]]

# Returns ['name1extension1', 'name1extension2', 'name2extension1', 'name2extension2', 'name3extension3']
print(some_list)

Eela6
May 25, 2007
Shredded Hen

LochNessMonster posted:

I was wondering if there was a more pythonic way of doing this:

code:
d1 = {'servertype': ["name1", "name2"], 'servertype2': ["name3"]}  # etc.


d2 = {'servertype1': ["extension1", "extension2"], 'servertype2': ["extension3"]}  # etc.

some_list = []

for k, v in d1.items():
  for x in range(len(d1)):
    for y in range(len(d2)):
      some_list.append(str(d1[k][x]) + str(d2[k][y]))

print(some_list)


Absolutely.

Python code:
from typing import Dict, Iterable, List

def old_lookup(d1: Dict[int, List[str]], d2: Dict[int, List[str]]) -> List[str]:
    some_list = []

    for k, v in d1.items():
        for x in range(len(d1)):
            for y in range(len(d2)):
                some_list.append(str(d1[k][x]) + str(d2[k][y]))
    return some_list



def new_lookup(d1: Dict[int, List[str]], d2: Dict[int, List[str]]) -> Iterable[str]:
    for k in d1:
        v1, v2 = d1[k], d2[k]
        for x in v1:
            for y in v2:
                yield str(x)+str(y)
            

if __name__ == '__main__':
    d1 = {n: [f"d1_{n}_{m}" for m in range(3)] for n in range(3)}
    d2 = {n: [f'd2_{n}_{m}' for m in range(3)] for n in range(3)}
    want = old_lookup(d1, d2)
    got = list(new_lookup(d1, d2))
    
    assert want == got

You can get even fancier like this, but this is starting to rely on your audience having a very thorough understanding of modern Python:
Python code:
def new_lookup(d1: Dict[int, List[str]], d2: Dict[int, List[str]]) -> Iterable[str]:
    # One generator expression over every key; d2[k] still raises KeyError (lazily) if k is missing
    return (str(x) + str(y) for k in d1 for x in d1[k] for y in d2[k])
IAmKale, I like your solution except for one thing: it doesn't duplicate the behavior of the original function in the case that a key in d1 is missing from d2. The original function will raise a KeyError, but yours will silently ignore that key.

Eela6 fucked around with this message at 20:55 on Dec 21, 2017

IAmKale
Jun 7, 2007

やらないか

Fun Shoe

Eela6 posted:

IAmKale, I like your solution except for one thing: it doesn't duplicate the behavior of the original function in the case that a key in d1 is missing from d2. The original function will raise a KeyError, but yours will silently ignore that key.
Argh, busted. I got so caught up in fixing the example code that I assumed the mismatched keys were a typo :negative:

Eela6
May 25, 2007
Shredded Hen

IAmKale posted:

Argh, busted. I got so caught up in fixing the example code that I assumed the mismatched keys were a typo :negative:

The Zen Of Python posted:

In the face of ambiguity, refuse the temptation to guess.

(Take this with the appropriate amount of :goonsay: or :smugdog:)

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

You can also do

Python code:

from itertools import product, chain

combos = (product(d1[server], d2[server]) for server in d1.keys())
as_strings = ["".join(parts) for parts in chain.from_iterable(combos)]

which gives you all the name/extension combos for each server. combos is an iterable with one item per server, and each item is an iterable of that server's combos, held in tuples. So you can use chain.from_iterable to unpack all those nested iterables and just spit out one sequence of tuples, and join 'em up however you like

(also it reads nicer if you change "d1" to "names" and "d2" to "extensions" or whatever!)
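A self-contained version of that product/chain approach with the renaming applied, using the sample data from IAmKale's post. The function name server_combos is made up; like the original loop, a server that is missing from extensions raises KeyError rather than being skipped.

```python
from itertools import chain, product
from typing import Dict, List


def server_combos(names: Dict[str, List[str]], extensions: Dict[str, List[str]]) -> List[str]:
    # One product() per server; extensions[server] raises KeyError if that server is missing
    combos = (product(names[server], extensions[server]) for server in names)
    # chain.from_iterable flattens the per-server iterators into one stream of tuples
    return ["".join(parts) for parts in chain.from_iterable(combos)]


names = {"servertype": ["name1", "name2"], "servertype2": ["name3"]}
extensions = {"servertype": ["extension1", "extension2"], "servertype2": ["extension3"]}
print(server_combos(names, extensions))
# ['name1extension1', 'name1extension2', 'name2extension1', 'name2extension2', 'name3extension3']
```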

baka kaba fucked around with this message at 00:09 on Dec 22, 2017

LochNessMonster
Feb 3, 2005

I need about three fitty


Thanks for all the suggestions and examples, really opening up different ways to approach the issue I wouldn’t have ever known about.

The d1/d2 are from my piece of code that I was testing with. In the original code they’re named appropriately.


baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Oh I meant in mine really, just that attribute[server] will read more clearly, so it's obvious what you're getting the product of
