Eela6
May 25, 2007
Shredded Hen
Fluent Python is by far my favorite Python book.

Eela6
May 25, 2007
Shredded Hen

onionradish posted:

I want to add multi-threading to a basic webscraper I've been tasked with. I have a list of URLs to spread across threads, but don't want to hit the same host simultaneously.

With a list of URLs, some from the same host, some from different hosts, what's the best way to set up thread Queue()s or some other URL pool so each thread can do simultaneous downloads as long as they're from different hosts?

This seems like something simple, and something that would be in stdlib collections or itertools, but I'm not seeing it. If it's actually a tricky issue, that's fine, and I'll work on a solution -- I just don't want to re-invent the wheel.

Sort them, then use itertools.groupby to split into groups by host. Separate the tasks by host rather than URL.
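A rough sketch of what I mean, assuming the host is just the netloc part of each URL. host_of, group_by_host and fetch_batch are made-up names, and fetch_batch is only a placeholder for the real download code:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from urllib.parse import urlsplit

def host_of(url):
    # hypothetical helper: the host is the network-location part of the URL
    return urlsplit(url).netloc

def group_by_host(urls):
    # groupby only merges adjacent items, so sort by the same key first
    ordered = sorted(urls, key=host_of)
    return {host: list(group) for host, group in groupby(ordered, key=host_of)}

def fetch_batch(batch):
    # placeholder: a real scraper would download each URL here, one after another
    return [f"fetched {url}" for url in batch]

urls = [
    "https://a.example/1",
    "https://b.example/1",
    "https://a.example/2",
    "https://c.example/x",
    "https://b.example/2",
]
batches = group_by_host(urls)

# one task per host, so no host is ever hit by two threads at once
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch_batch, batches.values()))
```

Each thread works through one host's URLs in order, and different hosts download in parallel.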

Eela6
May 25, 2007
Shredded Hen

MF_James posted:

So, I'd like to get into Python, I'm currently an infrastructure/Ops guy and I'd like to add Dev in front of that and get on the money train (and advanced with technology), I figured Python is fairly system agnostic and can be applied in a lot of places. Is a good starting place the https://docs.python.org/3/ (official docs) tutorial? I went to school for programming (did C/C++/C#, HTML/CSS, Java and a few other things), but did not graduate and it's been 10 years since I've touched any of that, but I do use powershell now as much as possible, just to give you a background and rough estimate of where I'm at.

I think the Python 3 docs are as solid a place as any to start.

Eela6
May 25, 2007
Shredded Hen

Baby Babbeh posted:

Pandas question! I've got a json file with stupid amounts of nesting that I want to turn into a nice flat datafile. Basically for each record I want to pull just a few features out of each property that are nested two and sometimes three layers deep rather than just flattening the whole thing out and ending up with a ton of extraneous columns.

My naive approach was to create an empty dataframe, iterate through the json file and grab things, stick those in a Series and then stick the Series in the dataframe, but I know this can't be the right way to do this. What should I be doing instead?

is there anything wrong with just using pure python?

Like, you could do this:

Python code:
from typing import Any, Iterable, Iterator

def flatten_subset(d: dict, keys: Iterable[str], *, sep: str = ".") -> Iterator[Any]:
    """Yield the value found at each sep-delimited key path in d."""
    def get_nested_elem(key: str) -> Any:
        v = d
        for k in key.split(sep):  # walk one level deeper per path segment
            v = v[k]
        return v

    for key in keys:
        yield get_nested_elem(key)


if __name__ == '__main__': # test
    d = {"foo": [1, 2], "bar": {'0': 0, '1': 1}, "baz": 3}
    keys = ("foo", "bar.0")
    want = [[1, 2], 0]
    got = list(flatten_subset(d, keys))
    assert got == want

From there the 'naive' approach should work just fine.

Eela6 fucked around with this message at 23:45 on Dec 14, 2017

Eela6
May 25, 2007
Shredded Hen
I love fun weird metaprogramming stuff.

Python code:
from functools import total_ordering
from typing import Any, Iterator

def ordering_mixin(*args: str, default: Any = None):
    """ordering_mixin returns a mixin class which provides ordering operators.
    These operators order lexicographically by the attributes with the names in 'args'.
    The mixin will only compare two classes with the same ordering attributes; that is,
    classes created with equivalent ordering_mixins."""

    if default is not None:
        def attrs(obj: Any) -> Iterator[Any]: 
            return (getattr(obj, arg, default) for arg in args)
    else:
        def attrs(obj: Any) -> Iterator[Any]:
            return (getattr(obj, arg) for arg in args)

    
    @total_ordering
    class OrderedMixin:
        _ordered_mixin_args = tuple(args)
        def __lt__(self, other: Any):
            if not hasattr(other, '_ordered_mixin_args') or other._ordered_mixin_args != self._ordered_mixin_args:
                return NotImplemented
            for a, b in zip(attrs(self), attrs(other)):
                if a < b:
                    return True
                elif b < a:
                    return False
            return False

        def __eq__(self, other: Any):
            if not hasattr(other, '_ordered_mixin_args') or other._ordered_mixin_args != self._ordered_mixin_args:
                return NotImplemented
            return all(a == b for a, b in zip(attrs(self), attrs(other)))
        

    return OrderedMixin

if __name__ == '__main__':
    class TestClass(ordering_mixin('a', 'b')):
        def __init__(self, a, b, c, d):
            self.a, self.b, self.c, self.d = a, b, c, d


    foo = TestClass(2, 3, 0, 0)
    bar = TestClass(2, 4, 0, 0)
    baz = TestClass(1, 3, 0, 0)

    class TestClass2(ordering_mixin('c', 'd')):
        def __init__(self, a, b, c, d):
            self.a, self.b, self.c, self.d = a, b, c, d

    assert foo < bar
    assert bar > baz
    assert foo == foo
    poo = TestClass2(2, 3, 0, 0)
    try:
        foo < poo
        raise AssertionError('should have raised a typeerror')
    except TypeError as e:
        pass

Eela6 fucked around with this message at 21:03 on Dec 19, 2017

Eela6
May 25, 2007
Shredded Hen

Thermopyle posted:

OrderedMixin!

my bad ;) I haven't used python for real dev work in a while, since I work in Go professionally. I just like to come back and stretch my wings sometimes :)

Eela6
May 25, 2007
Shredded Hen
Sets have a number of operators that aren't defined for dictionaries, too.

Python code:
a = {'foo', 'bar'}
b = {'bar', 'baz'}
assert a | b == {'foo', 'bar', 'baz'} # union
assert a & b == {'bar'} # intersection
assert a ^ b == {'foo', 'baz'} # xor (symmetric difference)
assert a - b == {'foo'} # difference

assert {'foo'} < {'foo', 'bar'} # proper subset
assert not {'foo'} < {'foo'} # a set is not a proper subset of itself
assert {'foo'} <= {'foo'} # improper subset

The four in-place versions of these operators (|=, &=, -=, ^=) are also available.
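A quick sketch of the in-place versions, using a made-up set called tags. Each one mutates the existing set rather than building a new one, so any other name bound to the same set sees the change:

```python
tags = {'foo', 'bar'}
alias = tags  # second name for the same set object

tags |= {'baz'}                 # in-place union
after_union = set(tags)         # snapshot: foo, bar, baz

tags &= {'foo', 'baz', 'qux'}   # in-place intersection
after_intersection = set(tags)  # snapshot: foo, baz

tags -= {'baz'}                 # in-place difference
after_difference = set(tags)    # snapshot: foo

tags ^= {'foo', 'bar'}          # in-place xor
after_xor = set(tags)           # snapshot: bar
```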

Eela6 fucked around with this message at 22:22 on Dec 20, 2017

Eela6
May 25, 2007
Shredded Hen

LochNessMonster posted:

I was wondering there was a more pythonic way of doing this:

code:
d1 = { 'servertype' : ["name1", "name2", etc], 'servertype2' : ["name3", etc] }


d2 = { 'servertype1' : ["extension1", "extension2", etc], 'servertype2' : ["extension3", etc] }

some_list = []

for k, v in d1.items():
  for x in range(len(d1)):
    for y in range(len(d2)):
      some_list.append(str(d1[k][x]) + str(d2[k][y]))

print(some_list)


Absolutely.

Python code:
from typing import *

def old_lookup(d1: Dict[int, List[str]], d2: Dict[int, List[str]]) -> List[str]:
    some_list = []

    for k, v in d1.items():
        for x in range(len(d1)):
            for y in range(len(d2)):
                some_list.append(str(d1[k][x]) + str(d2[k][y]))
    return some_list



def new_lookup(d1: Dict[int, List[str]], d2: Dict[int, List[str]]) -> Iterable[str]:
    for k in d1:
        v1, v2 = d1[k], d2[k]
        for x in v1:
            for y in v2:
                yield str(x)+str(y)
            

if __name__ == '__main__':
    d1 = {n: [f"d1_{n}_{m}" for m in range(3)] for n in range(3)}
    d2 = {n: [f'd2_{n}_{m}' for m in range(3)] for n in range(3)}
    want = old_lookup(d1, d2)
    got = list(new_lookup(d1, d2))
    
    assert want == got

You can get even fancier like this, but this is starting to rely on your audience having a very thorough understanding of modern Python:
Python code:
from itertools import product

def new_lookup(d1: Dict[int, List[str]], d2: Dict[int, List[str]]) -> Iterable[str]:
    # one flat generator expression over every key; a return inside a for loop
    # would stop after the first key
    return (str(x)+str(y) for k in d1 for x, y in product(d1[k], d2[k]))
IAmKale, I like your solution except for one thing: it doesn't duplicate the behavior of the original function in the case that a key in d1 is missing from d2. The original function will raise a KeyError, but yours will silently ignore that key.

Eela6 fucked around with this message at 20:55 on Dec 21, 2017

Eela6
May 25, 2007
Shredded Hen

IAmKale posted:

Argh, busted. I got so caught up in fixing the example code that I assumed the mismatched keys were a typo :negative:

The Zen Of Python posted:

In the face of ambiguity, refuse the temptation to guess.

(Take this with the appropriate amount of :goonsay: or :smugdog:)

Eela6
May 25, 2007
Shredded Hen

Thermopyle posted:

Somehow I missed that this was part of 3.5, but you can now merge two dicts into a new dict in a single expression:

Python code:
z = {**x, **y}  # y keys overwrite existing keys in x
Hot drat.

Yeah, that's a good one. Cleaner and terser than
Python code:
 {k:v for k, v in itertools.chain(x.items(), y.items())}

Eela6
May 25, 2007
Shredded Hen
They do, it's called .update(), which updates the dict in place. Creating a new dictionary from the contents of two or more others is what we're doing, which is somewhat different.

I'd like to call it a union, but unions imply no loss of information, which isn't quite right if some of the dictionaries have overlapping keys with different values.
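To make the difference concrete, here's a small sketch with two made-up dicts that share the key 'b'. The merge expression builds a new dict and leaves both inputs alone, .update() changes its target in place, and either way the value from the right-hand dict wins for the shared key:

```python
x = {'a': 1, 'b': 2}
y = {'b': 20, 'c': 30}

merged = {**x, **y}   # new dict; x and y are untouched
x_before = dict(x)    # copy of x, to compare against after the merge

target = dict(x)      # work on a copy so x itself stays intact
target.update(y)      # in place: target now holds the same contents as merged
```

Note that x's original value for 'b' is gone from the result, which is the loss of information I mean.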

Eela6
May 25, 2007
Shredded Hen

Wallet posted:

I have very limited programming experience generally and even less experience with Python, so I'll apologize if this is a really stupid question, but I wasn't able to find much from googling:

I've got a csv file with a little over 90,000 rows that each have a key in the first column and a value in the second. I also have a list of keys that I want to retrieve the values for.

Currently, I'm using csv.reader to read the file into a dictionary and then looping through my list of keys to retrieve the value for each from the dictionary. This works, but I have a feeling that this is a really stupid/inefficient way of going about things.

The other approach that comes to mind is creating a duplicate of the list of keys that I want to retrieve values for, iterating through the rows of the file checking if that row matches any of the keys I'm after, storing the value and removing the key from my duplicate list if it does match, and continuing on until the duplicate list is empty.

Am I an idiot? Is either of these approaches appropriate? Is there a better solution?

Either approach should work just fine, but the first one is probably better. 90000 rows is really not that many.
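For what it's worth, the first approach is only a few lines. This is a sketch assuming a two-column file with no header row; load_lookup is a made-up name, and I'm using an in-memory file in place of the real CSV:

```python
import csv
import io

def load_lookup(f):
    """Read key,value rows into a dict (a repeated key keeps its last value)."""
    return {row[0]: row[1] for row in csv.reader(f) if len(row) >= 2}

# stand-in for open('yourfile.csv', newline='')
sample = io.StringIO("apple,red\nbanana,yellow\ncherry,dark red\n")
lookup = load_lookup(sample)

wanted = ["cherry", "apple", "durian"]
# .get returns None for keys that aren't in the file instead of raising KeyError
found = {key: lookup.get(key) for key in wanted}
```

Building the dict is a single pass over the file, and each lookup after that is constant time on average, so the number of keys you ask for barely matters.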

Eela6
May 25, 2007
Shredded Hen

Roundboy posted:

as pointed out to me in the general question thread, we have a python thread, so here goes.

I have formatting configured in a json file to specifically add the logging.Formatter as:
code:
'%(asctime)s|%(filename)s|%(name)s|%(levelname)s|%(jobid)s|%(source)s|%(message)s'
jobid and source being my own variables i am appending to each log message which works swimmingly. The problem is when I use other imported modules that also use standard python logging like pysftp or googleauth that throw key errors dumping out their debug logs and complaining that they don;t know jobid or source ala :
code:
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib64/python3.5/logging/handlers.py", line 71, in emit
    if self.shouldRollover(record):
  File "/usr/lib64/python3.5/logging/handlers.py", line 187, in shouldRollover
    msg = "%s\n" % self.format(record)
  File "/usr/lib64/python3.5/logging/__init__.py", line 830, in format
    return fmt.format(record)
  File "/usr/lib64/python3.5/logging/__init__.py", line 570, in format
    s = self.formatMessage(record)
  File "/usr/lib64/python3.5/logging/__init__.py", line 539, in formatMessage
    return self._style.format(record)
  File "/usr/lib64/python3.5/logging/__init__.py", line 383, in format
    return self._fmt % record.__dict__
KeyError: 'jobid'
Call stack:
  File "/usr/lib64/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/site-packages/paramiko/transport.py", line 1908, in run
    handler(self.auth_handler, m)
  File "/usr/local/lib/python3.5/site-packages/paramiko/auth_handler.py", line 580, in _parse_userauth_success
    'Authentication ({}) successful!'.format(self.auth_method))
  File "/usr/local/lib/python3.5/site-packages/paramiko/auth_handler.py", line 75, in _log
    return self.transport._log(*args)
  File "/usr/local/lib/python3.5/site-packages/paramiko/transport.py", line 1687, in _log
    self.logger.log(level, msg, *args)
Message: 'Authentication (password) successful!'

I am having trouble wrapping my head around what the best approach is here. Can i create a custom logging class to auto add those variables to all logs regardless? Am I going about this all wrong ? What is the nest practice here? I cant get the proper google search terms to bring up what i want to happen, and the existing docs on logging dont seem to cover this situation.

The problem here is that you're specifying behavior for logging.Formatter that relies on information that logging.Formatter doesn't have.

Instead, you should probably specify formatting behavior for a subclass of logging.Formatter that you control. Logs you create will use the special behavior, but other logs will be unaffected.
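One way that could look, as a sketch rather than the one true answer: a hypothetical JobFormatter subclass that fills in placeholder values when a record doesn't carry jobid or source. The class name, the "-" placeholder and the shortened format string are all my own assumptions. You'd attach it to your handlers in place of the stock logging.Formatter.

```python
import logging

class JobFormatter(logging.Formatter):
    """Hypothetical Formatter that tolerates records without jobid/source."""
    DEFAULTS = {"jobid": "-", "source": "-"}  # assumed placeholder values

    def format(self, record):
        # records from third-party loggers never set these attributes
        for name, value in self.DEFAULTS.items():
            if not hasattr(record, name):
                setattr(record, name, value)
        return super().format(record)

formatter = JobFormatter("%(name)s|%(levelname)s|%(jobid)s|%(source)s|%(message)s")

# a record like your own code would produce, with the extra fields set
own = logging.LogRecord("myapp", logging.INFO, "example.py", 0, "started", None, None)
own.jobid, own.source = "42", "sftp"

# a record like a library would produce, with no extra fields
library = logging.LogRecord("paramiko.transport", logging.DEBUG, "example.py", 0,
                            "Authentication (password) successful!", None, None)

own_line = formatter.format(own)
library_line = formatter.format(library)
```

Your own records format exactly as before, and library records get the placeholder instead of raising KeyError.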

Eela6
May 25, 2007
Shredded Hen

Slimchandi posted:

Trying to read up on abstract base classes, found a few tutorials and Pycon videos but they all seem a bit shallow in their explanation.

Have I got this roughly right?

- Abstract base classes are designed not to be called directly, but inherited from. ABCs force derived classes to implement their 'abstract' methods.

- If any of these method is not implemented in a derived class, TypeError is thrown when the program tries to run.

- An ABC may fully implement some of these abstract methods, or simply pass, leaving the derived class to handle the implementation.

- Likewise, the derived class may override an abstract method, or make a call to super() and use the ABCs implementation directly (inheritance of abstract methods is not permitted)

If I've got this correct (and that's a big if), then I can see the use of ABCs when are writing classes that others will use or have complex structures, as you need to enforce those methods for the whole thing to work.


You've got it.
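Here's a small sketch of those points with made-up classes (Shape, Square, Unfinished). One detail worth seeing in action: the TypeError shows up when you try to instantiate a class that still has an abstract method, not when the class is defined or imported.

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def describe(self) -> str:
        # an abstract method may still have a body, reachable through super()
        return "a shape"

class Square(Shape):
    def describe(self) -> str:
        # overrides the abstract method and reuses the ABC's implementation
        return "a square, which is " + super().describe()

class Unfinished(Shape):
    pass  # defining this class is fine; instantiating it is not

square = Square()

try:
    Unfinished()
except TypeError as exc:
    error = str(exc)  # the message names the missing abstract method
```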



Slimchandi posted:


But even if the object I'm describing in my class is 'abstract' (e.g. Bird), which I never call directly, and I only use it to subclass instances of 'real' birds (Gull, Eagle, Owl), this situation wouldn't benefit from using an ABC. I would be better off with a standard Bird class that I inherit from, right?
Yes. You rarely need to write your own ABCs in application code. They're meant for designing user-extensible frameworks.

Alex Martelli, in Luciano Ramalho's Fluent Python, pg. 331 posted:

"ABCs are meant to encapsulate very general concepts, abstractions, introduced by a framework - things like "a sequence" and "an exact number". [Readers] most likely don't need to write any new ABCs, just use existing ones correctly, to get 99.9% of the benefits without serious risk of misdesign."

Eela6 fucked around with this message at 21:05 on Jan 17, 2018

Eela6
May 25, 2007
Shredded Hen
It is an extremely thorough text. There's a lot to take in.

Eela6 fucked around with this message at 22:47 on Jan 17, 2018

Eela6
May 25, 2007
Shredded Hen
If you want arbitrary precision, you should use the decimal library.

Alternately, use math.isclose()
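A minimal sketch of both options using the classic 0.1 + 0.2 case. Note the Decimals are built from strings; building them from floats would carry the float's rounding error along.

```python
import math
from decimal import Decimal

total = 0.1 + 0.2

exact_match = (total == 0.3)             # False: binary floats can't represent 0.1 exactly
close_enough = math.isclose(total, 0.3)  # True: equal within the default relative tolerance

decimal_total = Decimal("0.1") + Decimal("0.2")
decimal_match = (decimal_total == Decimal("0.3"))  # True: decimal arithmetic is exact here
```

math.isclose also takes rel_tol and abs_tol keyword arguments if the defaults are too strict or too loose for your data.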

Eela6
May 25, 2007
Shredded Hen

baka kaba posted:

The more pythony way might be to have a generator that yields lists (basically consumes the iterator, adding each item to a list it's building until the item is too big, yields the old list and creates a new one for the item). That way you can feed that into another generator that filters out the single lists, or have the first generator just not yield a single item list

I agree. I would do it like this:

Python code:
from typing import Iterator, List

def get_distance_groups(a: List[int], tol: int) -> Iterator[List[int]]:
    if not a:  # nothing to group; also avoids an IndexError on a[0]
        return
    start, prev = 0, a[0]
    for i, n in enumerate(a):
        if abs(prev - n) > tol:  # important! the difference could be negative
            yield a[start:i]
            start = i
        prev = n
    yield a[start:]  # the final group, even if it holds a single element

Then filter the output.

IN:
Python code:
groups = get_distance_groups(a= [0, 14, 18, 20, 36, 41, 62, 70, 72], tol=5)
print([x for x in groups if len(x) > 1])
OUT:
[[14, 18, 20], [36, 41], [70, 72]]

Eela6 fucked around with this message at 00:36 on Feb 9, 2018

Eela6
May 25, 2007
Shredded Hen

Slimchandi posted:

I always try to avoid doing callable/in-place stuff in a list comprehension eg [print(letter) for letter in name]. Is there any particular reason to avoid this? It seems to be more concise than the indentation of a for-loop.

Yes. You shouldn't use a list comprehension for its side effects. A list comprehension should be used to create a list. In fact, generally speaking, any function which has side effects shouldn't be used in a comprehension; they're explicitly for a functional style of programming.

What you're talking about is the equivalent of these lines of code:

Python code:
a = []
for letter in name:
    a.append(print(letter))
Indeed,
IN:
Python code:
print([print(letter) for letter in 'word'])
OUT:
pre:
w
o
r
d
[None, None, None, None]

Eela6
May 25, 2007
Shredded Hen

Thermopyle posted:

Yes. Blame history.


Everything you said is correct, but I just wanted to point out that the poster said "callable", and using a callable is fine in a list comprehension, it's just that the particular example the poster used wasn't great.

(of course, it could be the reverse and the example was on-point and the word "callable" wasn't exactly what he meant)

They used 'callable' as a synonym for in-place, so I assumed they meant 'mutable', given the example.

Eela6
May 25, 2007
Shredded Hen

Thermopyle posted:

Yes, I know what you assumed, I'm pointing out that there's more than one way to read it so the poster doesn't get confused.

Thanks!

Eela6
May 25, 2007
Shredded Hen

JVNO posted:

In list logic, .remove will remove the first item in a given list that matches the query...

How do you delete the last item in a list that matches a particular query?

Because the best I can come up with is to reverse the list, apply the remove function, then reverse the list again. And that strikes me as terribly inefficient :v:

I would do it like this:

Use reversed() to iterate over the list in reverse (i.e., from back to front) without modifying the list.

Find the index to remove, then use the del statement to remove that element of the list.

Note that removing elements from the middle of a list is not a particularly efficient operation.

Putting it together:

Python code:
from typing import *
def delete_last_matching_inplace(a: List[int], k: int):
    for i, n in enumerate(reversed(a), 1):
        if n == k:
            del a[-i]
            return
    raise ValueError(f'no element in {a} matches {k}')
pre:
In [12]: a = [1, 2, 3, 2, 5, 2]

In [13]: delete_last_matching_inplace(a, 2)

In [14]: a
Out[14]: [1, 2, 3, 2, 5]

In [15]: delete_last_matching_inplace(a, 2)

In [16]: a
Out[16]: [1, 2, 3, 5]

In [17]: delete_last_matching_inplace(a, 2)

In [18]: a
Out[18]: [1, 3, 5]

In [19]: delete_last_matching_inplace(a, 2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-dc87b4f3a28d> in <module>()
----> 1 delete_last_matching_inplace(a, 2)

<ipython-input-3-b7155902c6b2> in delete_last_matching_inplace(a, k)
      4             del a[-i]
      5             return
----> 6     raise ValueError(f'no element in {a} matches {k}')

ValueError: no element in [1, 3, 5] matches 2

Eela6
May 25, 2007
Shredded Hen

JVNO posted:

Wow, great responses and super quick. Unfortunately the responses aren’t easily applied to my own program- and I decided instead to rebuild the program in a way that obviated the need for removal.

For anyone curious, I needed a list generated that includes 20 of each of the following:

NR
L0
L2P
L2T
L4P
L4T
L8P
L8T

For a total of 160 items in the list. All of these stand for different experimental conditions, and are randomly presented, but some conditions are related. The rules are:

L0 and NR can go anywhere in the list that doesn’t conflict with another rule.
For n = L2P, n + 1 = L2T
For n = L4P, n + 2 = L4T
For n = L8P, n + 4 = L8T

I’m phone posting now, but my new approach is to add the L(X)P items to the list at the start, shuffle the order, and use that as a seed for a pseudo-random procedural generator. The procedural generator will then populate the list with L(X)T items, using L0/NR items as filler when necessary.

It’s a heck of a lot more complicated than I thought ought to be necessary (~150 lines of code), and is slower than my usual experiment list generator, but I’m ironing out a final couple bugs (usually missing L(X)P items) and it appears to work.

This is an interesting problem that's more difficult than it appears. I spent a little bit of time fiddling with it and wasn't able to find a solution that preserved 'true' randomness (i.e., all valid strings are equally likely given the limits of the PRNG) that wasn't O(n^2) or worse.

If it's not sensitive, would you mind showing me what you end up with?

Eela6
May 25, 2007
Shredded Hen

ed, nm, answering the wrong question.

PS: you shouldn't pop things from the front of a list. If you must pop from the front, use collections.deque; if the order doesn't matter, just use pop() (without arguments) or iterate through.
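A tiny sketch of the difference, with made-up data. list.pop(0) has to shift every remaining element down one slot, while deque.popleft() and a plain list.pop() from the end don't:

```python
from collections import deque

queue = deque(['a', 'b', 'c'])
first = queue.popleft()  # constant time, unlike list.pop(0)
queue.append('d')        # deques still append on the right like a list

stack = ['a', 'b', 'c']
last = stack.pop()       # popping from the end of a list is cheap
```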

A couple other bits of python niceties you could use:

enumerate creates an index for the collection you're looping through, e.g.:
Python code:
for i, c in enumerate(['g','o','o','n']):
    print(f'({i}, {c})', end="\t")
pre:
(0, g)  (1, o)  (2, o)  (3, n)
A more 'pythonic' take on your insertion sort code would look like this:
Python code:
def insertion_sort(unsorted): # use a descriptive argument name rather than shadowing the builtin 'list'
    def insert(a, new_value):
        # no need for a bounds check; this loop is a no-op if len(a) == 0
        for j, v in enumerate(a):
            if new_value <= v:
                a.insert(j, new_value)
                return a

        a.append(new_value) # new_value is bigger than every element in a (this also covers the empty list)
        return a

    result = [] # not named 'sorted', which would shadow the builtin
    for v in unsorted:
        result = insert(result, v)
    return result

Eela6 fucked around with this message at 05:46 on Mar 18, 2018

Eela6
May 25, 2007
Shredded Hen

Mootallica posted:

I might be blind because I haven't seen it discussed, but is the Python Humble Bundle worth it?


I was originally looking at the $15 tier as I don't think the other tiers would be any good for me (At best I just write small scripts to help automate things, web scraping and DB stuff) - but I remember reading earlier in the thread that Fluent Python is good and probably worth the $20?

Fluent Python is a total steal at $20. I'm sure you can find something of worth in the rest of the bundle, too.

Eela6
May 25, 2007
Shredded Hen

porksmash posted:

In the python docs for glob, it's defined like so:

code:
glob.glob(pathname, *, recursive=False)
What does that asterisk after pathname mean? The glob function only takes 1 positional argument.

The * means that's the end of positional arguments. recursive is a keyword-only argument. You can call it with glob.glob(somepath, recursive=True) if you'd like.
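You can see the same thing with a function of your own. find_files below is a made-up stand-in with the same shape of signature, not the real glob:

```python
def find_files(pathname, *, recursive=False):
    """Hypothetical stand-in: everything after the bare * is keyword-only."""
    return (pathname, recursive)

ok = find_files('*.py', recursive=True)  # fine: recursive is passed by keyword

try:
    find_files('*.py', True)             # second positional argument is rejected
except TypeError as exc:
    error = str(exc)
```

The point of the bare * is that a call site has to spell out recursive=True, so nobody has to guess what a stray True means.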

Eela6
May 25, 2007
Shredded Hen
Python code:

tol = 10
foo = [abs(a-b)<tol for b in query]

Eela6
May 25, 2007
Shredded Hen

unpacked robinhood posted:

I'm trying to apply a function f(x,y) to each possible combination of two strings in a list.

Python code:
    names = ['joel','barney','georges','amsterdam']
    # 4 is a random placeholder value
    d = pd.DataFrame(4,names,names)
code:
           joel  barney  georges  amsterdam
joel          4       4        4          4
barney        4       4        4          4
georges       4       4        4          4
amsterdam     4       4        4          4
I'd like to compute a value for each cell (in my case the Levenshtein distance) using the row index and column index as value.
Googling gets me vague and complicated answers so I'm probably not going in the right direction ?

This is a great place for a generator or list comprehension.


IN
Python code:
def add(x, y):
    return x + y

print([[add(x, y) for x in 'abc'] for y in 'abc'])
OUT:
pre:
[['aa', 'ba', 'ca'], ['ab', 'bb', 'cb'], ['ac', 'bc', 'cc']]
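Applied to your actual problem, a sketch might look like the following. levenshtein here is my own plain dynamic-programming version, written out so the example is self-contained; swap in whatever distance function you already use.

```python
def levenshtein(s, t):
    """Edit distance between s and t, computed one row at a time."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute, free if equal
        prev = cur
    return prev[-1]

names = ['joel', 'barney', 'georges', 'amsterdam']
# one inner list per row name, one entry per column name
table = [[levenshtein(row, col) for col in names] for row in names]
```

table is a list of rows, so passing it to pd.DataFrame together with index=names and columns=names should give you the labelled grid you're after.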

Eela6
May 25, 2007
Shredded Hen
I think you should declare something as close to where it is used as possible.
