|
So in light of Cingulate posted:I found this very interesting: https://medium.com/dunder-data/python-for-data-analysis-a-critical-line-by-line-review-5d5678a4c203
|
# ? Dec 14, 2017 12:33 |
|
|
What are people using for linters in vim?
|
# ? Dec 14, 2017 12:47 |
|
Pandas question! I've got a json file with stupid amounts of nesting that I want to turn into a nice flat datafile. Basically for each record I want to pull just a few features out of each property that are nested two and sometimes three layers deep rather than just flattening the whole thing out and ending up with a ton of extraneous columns. My naive approach was to create an empty dataframe, iterate through the json file and grab things, stick those in a Series and then stick the Series in the dataframe, but I know this can't be the right way to do this. What should I be doing instead?
|
# ? Dec 14, 2017 23:00 |
Baby Babbeh posted:Pandas question! I've got a json file with stupid amounts of nesting that I want to turn into a nice flat datafile. Basically for each record I want to pull just a few features out of each property that are nested two and sometimes three layers deep rather than just flattening the whole thing out and ending up with a ton of extraneous columns. is there anything wrong with just using pure python? Like, you could do this: Python code:
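The snippet itself isn't shown here, so this is only a minimal sketch of what a pure-Python approach could look like, not the original code. The record layout, the field names, and the helpers `dig` and `flatten` are all made up for illustration: each wanted column is described by the path of keys that leads to it, and anything missing becomes None.

```python
import json

# Hypothetical nested records; every field name here is an assumption.
RAW = """
[
  {"id": 1,
   "owner": {"name": "Ann", "address": {"city": "Oslo", "zip": "0150"}},
   "stats": {"views": 10, "likes": 2}},
  {"id": 2,
   "owner": {"name": "Bob", "address": {"city": "Turku"}},
   "stats": {"views": 7}}
]
"""

# Output column -> path of keys to follow inside each record.
WANTED = {
    "id": ("id",),
    "owner_name": ("owner", "name"),
    "city": ("owner", "address", "city"),
    "likes": ("stats", "likes"),
}


def dig(record, path, default=None):
    """Follow a sequence of keys into nested dicts; return default if one is missing."""
    current = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def flatten(records, wanted=WANTED):
    """Return one flat dict per record, holding only the wanted fields."""
    return [
        {column: dig(record, path) for column, path in wanted.items()}
        for record in records
    ]


rows = flatten(json.loads(RAW))
```

A list of flat dicts like `rows` can be handed straight to `pandas.DataFrame(rows)`, which avoids growing a DataFrame one row at a time.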
Eela6 fucked around with this message at 23:45 on Dec 14, 2017 |
|
# ? Dec 14, 2017 23:41 |
|
Baby Babbeh posted:Pandas question! I've got a json file with stupid amounts of nesting that I want to turn into a nice flat datafile. Basically for each record I want to pull just a few features out of each property that are nested two and sometimes three layers deep rather than just flattening the whole thing out and ending up with a ton of extraneous columns. Will you be doing this once or repeatedly? If you are only doing it once, maybe don't optimize for performance on that step? It does seem like one of those points where it could be acceptable to just accept that your code is not optimal, and go fetch a cup of coffee whenever you need to redo it. If you do need to optimize, consider: on each pass through the JSON file, check for all the features you are looking for. Add them to a Series, but afterwards combine them as a row-wise operation, not column-wise. Do that instead of checking for one feature on every pass through the JSON file, since you would then be repeating the same expensive disk operations for every feature you are looking for. (Or I could have misunderstood.)
|
# ? Dec 15, 2017 22:42 |
|
I'm going to be getting these json files in batches to process, 100 or so a batch, so I'd like to make this operation reasonably lightweight if I can, but it also doesn't need to be insanely optimized either. I'm okay with running it and then going to get lunch if need be. I think I'm going to basically follow Eela6's approach, make a list of Serieses, and then concatenate it into a DataFrame. That should be good enough. Thanks for your help!
|
# ? Dec 16, 2017 00:39 |
|
I'm trying to make a Python program where I need to be able to record some audio, but every library I look at leads me to a dead end. I can't seem to get a library for audio recording to install on my Windows machine with Python 3.5 (and also Anaconda if that matters). Does anyone know of a library or some function or some other method I can use to record audio with Python? There was a PyAudio wheel that seemed like it was going to be just what I needed, but I can't seem to install it and get it working. I guess I'm pretty confused on the whole Python-library-getting process and how it works with Anaconda and all that. Edit: I just decided to give up on it. I'll just record the audio with something else, and then have Python do the rest. I think I just messed up the install of PyAudio somehow, oh well. Shadow0 fucked around with this message at 07:02 on Dec 17, 2017 |
# ? Dec 16, 2017 04:52 |
|
So Microsoft are considering making Python an official scripting language in Excel which is pretty cool I guess. Though it means when my friends ask me "Hey fix my spreadsheets" I can't use the "sorry I don't know anything about VBA" excuse.
|
# ? Dec 17, 2017 19:20 |
|
NtotheTC posted:So Microsoft are considering making Python an official scripting language in Excel which is pretty cool I guess. Though it means when my friends ask me "Hey fix my spreadsheets" I can't use the "sorry I don't know anything about VBA" excuse. I don't know, I've always seen VBA as kind of a canary in the Excel coalmine, so to speak. If your macro is cumbersome to write, you know you shouldn't have picked Excel for this.
|
# ? Dec 18, 2017 04:12 |
|
outlier posted:I don't have any thoughts about the backend at the moment other than I'd like it to be a Python-based one (which I am not going to build myself). I figured the choice of frontend might constrain the backend. Will chase up Django REST. Modern web stuff completely decouples (or at least that's the goal) the frontend from the backend, so the choice of technologies on either end shouldn't have any effect on your choices on the other end. The hottest frontend right now is probably React/Redux, but discussions on that mostly occur in the modern web development thread.
|
# ? Dec 18, 2017 17:17 |
|
If you haven't tried Pipenv, give it a shot: It elegantly combines virtualenvs with pip, and has simplified my workflow. It's been hard-broken with two distinct bugs until a few days ago, but the latest version is good-to-go.
|
# ? Dec 19, 2017 19:27 |
I love fun weird metaprogramming stuff. Python code:
Eela6 fucked around with this message at 21:03 on Dec 19, 2017 |
|
# ? Dec 19, 2017 20:49 |
|
OrderedMixin!
|
# ? Dec 19, 2017 21:00 |
Thermopyle posted:OrderedMixin! my bad I haven't used python for real dev work in a while, since I work in Go professionally. I just like to come back and stretch my wings sometimes
|
|
# ? Dec 19, 2017 21:01 |
|
Tigren posted:And remember, in Python 3.6, dicts are now ordered. This seems like it's no longer just an implementation detail, but Guido proclaiming dicts are now ordered by design starting in 3.7. Guido van Rossum posted:Make it so. "Dict keeps insertion order" is the ruling. Thanks!
|
# ? Dec 20, 2017 00:27 |
|
Tigren posted:This seems like it's no longer just an implementation detail, but Guido proclaiming dicts are now ordered by design starting in 3.7. It seems like they're also keeping collections.OrderedDict but intentionally having a different implementation optimized around different usage patterns, which seems like an odd choice.
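For anyone wondering what still separates the two, here is a small sketch (the variable names are mine): plain dicts keep insertion order, but only OrderedDict compares order-sensitively and offers reordering helpers such as `move_to_end`.

```python
from collections import OrderedDict

# Plain dicts keep insertion order (a language guarantee from Python 3.7).
plain = {}
plain["b"] = 1
plain["a"] = 2
plain["c"] = 3
keys_in_order = list(plain)

# Equality between plain dicts ignores order...
same_plain = {"a": 1, "b": 2} == {"b": 2, "a": 1}

# ...but equality between two OrderedDicts is order-sensitive.
same_ordered = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)

# OrderedDict also has reordering helpers that dict lacks.
od = OrderedDict.fromkeys("abc")
od.move_to_end("a")
reordered = list(od)
```

So the two types overlap on "remembers insertion order" but are not interchangeable when code relies on order-aware equality or on moving keys around.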
|
# ? Dec 20, 2017 00:34 |
|
That seems... not very pythonic?
|
# ? Dec 20, 2017 02:56 |
Doesn’t that mean there will be demand for like index-based accessors and tons of non-backward-compatible code simplifications? Schism time
|
|
# ? Dec 20, 2017 04:24 |
|
Anyone familiar with Huffman trees? I'm trying to build one as an exercise (and later to store a bunch of 8 bit values on a microcontroller), and I'm missing something. When building from short sentences it seems to build functional trees, but it falls apart with more data. For example if I pile enough words together (ascii coded): code:
So I either: don't understand what Huffman trees are, build a graphic representation that isn't accurate, or have a broken thing somewhere. I'm not sure how to start fixing this. This is some of my code, the whole thing is here, as well as a bunch of log data (computing weights and tree building) e: removed embedded code I wondered if relying on min() and index() to choose and remove elements could introduce inconsistencies, but it seems they'll always work "first index comes first" when there are two Nodes with the same weight. unpacked robinhood fucked around with this message at 12:39 on Dec 20, 2017 |
# ? Dec 20, 2017 04:37 |
|
unpacked robinhood posted:It gives the following tree, in this case the longest path is 9 bits long, which isn't an improvement when storing 8 bit values I think ? A Huffman code minimizes the average code length for the data used to build it. If you compare a Huffman code to the shortest fixed width code then some words will have longer codes and some will have shorter codes, but it is specifically the common words that are short and the rare ones that are long. If you encode the input data and compute total_encoded_bits / number_of_words then this ratio will be smaller for the Huffman code.
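To make that ratio concrete, here is a rough sketch built on heapq. It is not the code from the post above; the function names and the skewed sample data are assumptions. It only tracks code lengths, which is all the average needs.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(data):
    """Return {symbol: code length in bits} for a Huffman code built from data."""
    freq = Counter(data)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {symbol: 1 for symbol in freq}
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), {symbol: 0}) for symbol, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol inside moves one level deeper.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def average_bits(data):
    """total_encoded_bits / number_of_symbols for the Huffman code of data."""
    lengths = huffman_code_lengths(data)
    return sum(lengths[symbol] for symbol in data) / len(data)


# Hypothetical skewed 8 bit data: one very common value, a few rare ones.
skewed = bytes([0] * 900 + [1] * 50 + [2] * 30 + list(range(3, 23)))
lengths = huffman_code_lengths(skewed)
avg = average_bits(skewed)
longest = max(lengths.values())
```

On data like this the common value gets a one bit code while rare values may end up with codes as long as or longer than the fixed 8 bits, and the average still comes out far below 8. On data where all 256 values are about equally likely there is nothing to gain.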
|
# ? Dec 20, 2017 10:00 |
|
Dominoes posted:If you haven't tried Pipenv, give it a shot: It elegantly combines virtualenvs with pip, and has simplified my workflow. It's been hard-broken with two distinct bugs until a few days ago, but the latest version is good-to-go. Does pipenv still insist on using the deprecated virtualenv tool on all versions of Python, instead of using the venv standard library module when it's available? That was definitely the case when I checked last, and was a no-go for me to try out pipenv.
|
# ? Dec 20, 2017 19:50 |
|
When building out some code last week, I accidentally used {} to make an array instead of () e.g. arr = {'a', 'b', 'c'} This was working but the order was getting goofed, which led me to discover my error. Is this just generating a dictionary of only keys with None values?
|
# ? Dec 20, 2017 20:03 |
|
Lysidas posted:Does pipenv still insist on using the deprecated virtualenv tool on all versions of Python, instead of using the venv standard library module when it's available? That was definitely the case when I checked last, and was a no-go for me to try out pipenv. Sockser posted:When building out some code last week, I accidentally used {} to make an array instead of () Dominoes fucked around with this message at 20:09 on Dec 20, 2017 |
# ? Dec 20, 2017 20:07 |
|
Sockser posted:When building out some code last week, I accidentally used {} to make an array instead of () It's called a set. It kind of acts like dictionary keys in that there are no duplicates, only hashable values are allowed, and order is undefined (although is this changing for sets too?), but there are no values, not even None. It's useful for keeping track of things where you don't want to keep multiple copies of anything, for getting the unique elements of another collection, or of course for set operations like union and intersection. Edit: it's faster for membership testing (the "in" keyword) so I often use set literals in expressions like if extension in {'csv', 'tsv', 'xls', 'xlsx'}: (even though a speed consideration is pointless in this example, the semantics of using a set here work better as well).
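A small sketch of that membership-test pattern; the constant and the helper `is_tabular` are names I made up for the example.

```python
# Set of extensions to accept; a set literal reads as "one of these".
ALLOWED_EXTENSIONS = {"csv", "tsv", "xls", "xlsx"}


def is_tabular(filename):
    """Return True if the filename's extension is one of the allowed ones."""
    # rpartition splits on the last dot; the final piece is the extension.
    _, _, extension = filename.rpartition(".")
    return extension.lower() in ALLOWED_EXTENSIONS


# Getting the unique elements of another collection.
unique_extensions = sorted(set(["csv", "xls", "csv", "tsv", "xls"]))
```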
|
# ? Dec 20, 2017 20:12 |
|
Sockser posted:When building out some code last week, I accidentally used {} to make an array instead of () No, this creates a set object: https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset Sets are unordered collections of distinct objects, so the order you see is arbitrary, and duplicate elements are only stored once: code:
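The snippet didn't come through here, so this is a stand-in sketch with made-up values showing the same two points: duplicates collapse, and there is no position to index by.

```python
# Curly braces with bare values (no key: value pairs) build a set.
letters = {"a", "b", "c", "a", "b"}

is_set = type(letters) is set
size = len(letters)  # duplicates are stored only once

# Sets have no defined order, so they do not support indexing.
try:
    letters[0]
except TypeError:
    indexable = False
else:
    indexable = True

# Ask for an order explicitly when one is needed.
in_order = sorted(letters)
```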
e: wow beaten twice, that was fast Lysidas fucked around with this message at 20:14 on Dec 20, 2017 |
# ? Dec 20, 2017 20:12 |
|
Now that's irony
|
# ? Dec 20, 2017 20:19 |
|
Tried to show that to someone who showed up to my desk after I posted and the interpreter spit out set(xyz). I probably should’ve run it through the interpreter before asking. On the bright side, I didn’t know Python had native support for sets like that so that’s neat. Edit: x = {'a'} yields set(a), x = {'a': 1} gives you a dict. So what is the type of x if you x = {}? E2: type(x) tells me it’s a dictionary. Can you turn that into an empty set somehow? Sockser fucked around with this message at 20:36 on Dec 20, 2017 |
# ? Dec 20, 2017 20:33 |
|
x = {} creates a dictionary, x = set() if you want to create an empty set
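Spelled out as a quick check (variable names are just for the example):

```python
empty_dict = {}        # empty braces always mean an empty dict
empty_set = set()      # there is no literal for an empty set
one_item_set = {"a"}
one_item_dict = {"a": 1}

kinds = [
    type(value).__name__
    for value in (empty_dict, empty_set, one_item_set, one_item_dict)
]
```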
|
# ? Dec 20, 2017 20:39 |
|
Nippashish posted:A Huffman code minimizes the average code length for the data used to build it. If you compare a Huffman code to the shortest fixed width code then some words will have longer codes and some will have shorter codes, but it is specifically the common words that are short and the rare ones that are long. If you encode the input data and compute total_encoded_bits / number_of_words then this ratio will be smaller for the Huffman code. Thanks ! I have the compression side doing...something now. Assuming it works correctly, it does wonders for RLE encoded b&w images, not so much on the data I wanted to use it on.
|
# ? Dec 20, 2017 21:20 |
Sets have a number of operators that aren't defined for dictionaries, too. Python code:
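The snippet is missing here; as a stand-in, a short sketch of the usual set operators with made-up values:

```python
evens = {0, 2, 4, 6}
small = {0, 1, 2, 3}

union = evens | small          # in either set
intersection = evens & small   # in both sets
difference = evens - small     # in evens but not in small
symmetric = evens ^ small      # in exactly one of the two
is_subset = {0, 2} <= evens    # every element of the left set is in evens

# The same operator on two dicts is not defined and raises TypeError.
try:
    {"a": 1} & {"a": 1}
except TypeError:
    dict_supports_and = False
else:
    dict_supports_and = True
```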
Eela6 fucked around with this message at 22:22 on Dec 20, 2017 |
|
# ? Dec 20, 2017 22:18 |
|
Since we're talking about sets, here's my favourite way to remove duplicates from a list: some_list = list(set(some_list))
|
# ? Dec 21, 2017 07:45 |
|
FoiledAgain posted:Since we're talking about sets, here's my favourite way to remove duplicates from a list: The main problem with this is that it doesn't maintain order, but it is clear and succinct. I can distinctly remember the first thing I ever did in Python that I felt was cool without any help from StackOverflow or anywhere: Python code:
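The snippet didn't survive here. For the ordering problem specifically, one common order-preserving variant looks like this (a sketch, not the code from the post; it assumes hashable items and relies on dicts keeping insertion order, guaranteed from Python 3.7):

```python
def dedupe_keep_order(items):
    """Remove duplicates while keeping the first occurrence of each item."""
    # dict keys are unique and remember insertion order.
    return list(dict.fromkeys(items))


def dedupe_keep_order_seen(items):
    """Same result with an explicit 'seen' set, for older Python versions."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


some_list = ["b", "a", "b", "c", "a"]
deduped = dedupe_keep_order(some_list)
```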
|
# ? Dec 21, 2017 19:50 |
|
I was wondering if there was a more pythonic way of doing this: code:
|
# ? Dec 21, 2017 20:14 |
|
LochNessMonster posted:I was wondering if there was a more pythonic way of doing this: List comprehensions are pretty pythonic. How about something like this? code:
|
# ? Dec 21, 2017 20:36 |
LochNessMonster posted:I was wondering if there was a more pythonic way of doing this: Absolutely. Python code:
Python code:
Eela6 fucked around with this message at 20:55 on Dec 21, 2017 |
|
# ? Dec 21, 2017 20:48 |
|
Eela6 posted:IAmKale, I like your solution except for one thing: it doesn't duplicate the behavior of the original function in the case that a key in d1 is missing from d2. The original function will raise a KeyError, but yours will silently ignore that key.
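Neither version is shown here, so the following is only a guess at the shape of the difference. The dicts `d1` and `d2`, their contents, and the two functions are hypothetical: one version indexes `d2` directly and fails loudly, the other filters and skips quietly.

```python
# Hypothetical data: server -> names, and server -> extensions.
d1 = {"web1": ["index", "about"], "db1": ["dump"]}
d2 = {"web1": [".html", ".txt"]}  # note: no entry for "db1"


def strict(d1, d2):
    """Index d2 directly, so a server missing from d2 raises KeyError."""
    return [
        name + ext
        for server in d1
        for name in d1[server]
        for ext in d2[server]
    ]


def lenient(d1, d2):
    """Skip servers that are missing from d2 instead of failing."""
    return [
        name + ext
        for server in d1
        if server in d2
        for name in d1[server]
        for ext in d2[server]
    ]


quiet_result = lenient(d1, d2)

try:
    strict(d1, d2)
except KeyError as missing:
    strict_error = missing.args[0]
```

Which behaviour is right depends on whether a missing key is a data error or an expected case, which is the point being made in the quote.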
|
# ? Dec 21, 2017 22:38 |
IAmKale posted:Argh, busted. I got so caught up in fixing the example code that I assumed the mismatched keys were a typo The Zen Of Python posted:In the face of ambiguity, refuse the temptation to guess. (Take this with the appropriate amount of or )
|
|
# ? Dec 21, 2017 23:40 |
|
You can also do Python code:
which gives you all the name/extension combos for each server. It's an iterable representing each server, each item is an iterable of all that server's combos, held in tuples. So you can use chain to unpack all those nested iterables and just spit out one sequence of tuples, and join 'em up however you like (also it reads nicer if you change "d1" to "names" and "d2" to "extensions" or whatever!) baka kaba fucked around with this message at 00:09 on Dec 22, 2017 |
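Since the snippet isn't shown, here is a sketch of what that description could look like with itertools. The dicts `names` and `extensions` and their contents are made up, following the renaming suggested above.

```python
from itertools import chain, product

# Hypothetical data keyed by server.
names = {"web1": ["index", "about"], "db1": ["dump"]}
extensions = {"web1": [".html", ".txt"], "db1": [".sql"]}

# One iterable per server, each yielding (name, extension) tuples.
per_server = (product(names[server], extensions[server]) for server in names)

# chain.from_iterable unpacks the nested iterables into one sequence of tuples.
combos = list(chain.from_iterable(per_server))

# Join them up however you like.
filenames = [name + ext for name, ext in combos]
```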
# ? Dec 22, 2017 00:05 |
|
Thanks for all the suggestions and examples, really opening up different ways to approach the issue I wouldn’t have ever known about. The d1/d2 are from my piece of code that I was testing with. In the original code they’re named appropriately.
|
# ? Dec 22, 2017 11:29 |
|
|
Oh I meant in mine really, just that attribute[server] will read more clearly, so it's obvious what you're getting the product of
|
# ? Dec 22, 2017 12:31 |