Eela6 - May 25, 2007 - Shredded Hen
Fluent Python is by far my favorite Python book.
Oct 21, 2017 21:18
Eela6 - May 25, 2007 - Shredded Hen
quote:
I want to add multi-threading to a basic webscraper I've been tasked with. I have a list of URLs to spread across threads, but I don't want to hit the same host simultaneously.
With a list of URLs, some from the same host, some from different hosts, what's the best way to set up thread Queue()s or some other URL pool so each thread can do simultaneous downloads as long as they're from different hosts?
This seems like something simple, and something that would be in stdlib collections or itertools, but I'm not seeing it. If it's actually a tricky issue, that's fine, and I'll work on a solution -- I just don't want to re-invent the wheel.

Sort the URLs, then use itertools.groupby to split them into groups by host. Hand each worker a host's whole group rather than individual URLs.
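A sketch of that grouping, stdlib only (the URLs and the `group_by_host` name are made up for illustration):

```python
from itertools import groupby
from urllib.parse import urlparse

def group_by_host(urls):
    """Group URLs by host so each group can be handed to its own worker."""
    host = lambda u: urlparse(u).netloc
    # groupby only merges adjacent items, so sort by host first
    by_host = sorted(urls, key=host)
    return {h: list(group) for h, group in groupby(by_host, key=host)}

if __name__ == '__main__':
    urls = ['http://a.example/1', 'http://b.example/1', 'http://a.example/2']
    assert group_by_host(urls) == {
        'a.example': ['http://a.example/1', 'http://a.example/2'],
        'b.example': ['http://b.example/1'],
    }
```

Each dict value can then be pushed onto a per-host Queue() so no two threads hit the same host at once.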
Dec 1, 2017 21:32

Eela6 - May 25, 2007 - Shredded Hen
quote:
So, I'd like to get into Python. I'm currently an infrastructure/ops guy, and I'd like to add "Dev" in front of that and get on the money train (and advance with technology). I figured Python is fairly system-agnostic and can be applied in a lot of places. Is the tutorial in the official docs (https://docs.python.org/3/) a good starting place? I went to school for programming (did C/C++/C#, HTML/CSS, Java and a few other things) but did not graduate, and it's been 10 years since I've touched any of that. I do use PowerShell now as much as possible, just to give you a background and rough estimate of where I'm at.

I think the official Python 3 docs are as solid a place as any to start.
Dec 7, 2017 21:21

Eela6 - May 25, 2007 - Shredded Hen
quote:
Pandas question! I've got a JSON file with stupid amounts of nesting that I want to turn into a nice flat data file. For each record I want to pull just a few features that are nested two and sometimes three layers deep, rather than flattening the whole thing out and ending up with a ton of extraneous columns.
My naive approach was to create an empty dataframe, iterate through the JSON file and grab things, stick those in a Series and then stick the Series in the dataframe, but I know this can't be the right way to do this. What should I be doing instead?

Is there anything wrong with just using pure Python? Like, you could do this:
Python code:
from typing import Any, Dict, Iterable

def flatten_subset(d: Dict[str, Any], keys: Iterable[str], *, sep: str = ".") -> Iterable[Any]:
    def get_nested_elem(key: str) -> Any:
        v = d
        for k in key.split(sep):
            v = v[k]
        return v

    for key in keys:
        yield get_nested_elem(key)

if __name__ == '__main__':  # test
    d = {"foo": [1, 2], "bar": {'0': 0, '1': 1}, "baz": 3}
    keys = ("foo", "bar.0")
    want = [[1, 2], 0]
    got = list(flatten_subset(d, keys))
    assert got == want

From there the 'naive' approach should work just fine.
Eela6 fucked around with this message at 23:45 on Dec 14, 2017
Dec 14, 2017 23:41

Eela6 - May 25, 2007 - Shredded Hen
I love fun weird metaprogramming stuff.
Python code:
from functools import total_ordering
from typing import Any, Iterator

def ordering_mixin(*args: str, default: Any = None):
    """ordering_mixin returns a mixin class which provides ordering operators.
    These operators order lexicographically by the attributes with the names in 'args'.
    The mixin will only compare two objects with the same ordering attributes; that is,
    instances of classes created with equivalent ordering_mixins."""
    if default is not None:
        def attrs(obj: Any) -> Iterator[Any]:
            return (getattr(obj, arg, default) for arg in args)
    else:
        def attrs(obj: Any) -> Iterator[Any]:
            return (getattr(obj, arg) for arg in args)

    @total_ordering
    class OrderedMixin:
        _ordered_mixin_args = tuple(args)

        def __lt__(self, other: Any):
            if getattr(other, '_ordered_mixin_args', None) != self._ordered_mixin_args:
                return NotImplemented
            for a, b in zip(attrs(self), attrs(other)):
                if a < b:
                    return True
                elif b < a:
                    return False
            return False

        def __eq__(self, other: Any):
            if getattr(other, '_ordered_mixin_args', None) != self._ordered_mixin_args:
                return NotImplemented
            return all(a == b for a, b in zip(attrs(self), attrs(other)))

    return OrderedMixin

if __name__ == '__main__':
    class TestClass(ordering_mixin('a', 'b')):
        def __init__(self, a, b, c, d):
            self.a, self.b, self.c, self.d = a, b, c, d

    class TestClass2(ordering_mixin('c', 'd')):
        def __init__(self, a, b, c, d):
            self.a, self.b, self.c, self.d = a, b, c, d

    foo = TestClass(2, 3, 0, 0)
    bar = TestClass(2, 4, 0, 0)
    baz = TestClass(1, 3, 0, 0)
    assert foo < bar
    assert bar > baz
    assert foo == foo
    poo = TestClass2(2, 3, 0, 0)
    try:
        foo < poo
        raise AssertionError('should have raised a TypeError')
    except TypeError:
        pass
Eela6 fucked around with this message at 21:03 on Dec 19, 2017
Dec 19, 2017 20:49

Eela6 - May 25, 2007 - Shredded Hen
Sets have a number of operators that aren't defined for dictionaries, too.
Python code:
a = {'foo', 'bar'}
b = {'bar', 'baz'}
assert a | b == {'foo', 'bar', 'baz'}  # union
assert a & b == {'bar'}                # intersection
assert a ^ b == {'foo', 'baz'}         # symmetric difference (xor)
assert a - b == {'foo'}                # difference
assert {'foo'} < {'foo', 'bar'}        # proper subset
assert not {'foo'} < {'foo'}           # a set is not a proper subset of itself
assert {'foo'} <= {'foo'}              # subset
These four in-place operators (|=, &=, -=, ^=) are also available.
Eela6 fucked around with this message at 22:22 on Dec 20, 2017
Dec 20, 2017 22:18

Eela6 - May 25, 2007 - Shredded Hen
quote:
I was wondering if there was a more pythonic way of doing this:
code:
d1 = {'servertype': ["name1", "name2", etc], 'servertype2': ["name3", etc]}
d2 = {'servertype1': ["extension1", "extension2", etc], 'servertype2': ["extension3", etc]}
some_list = []
for k, v in d1.items():
    for x in range(len(d1)):
        for y in range(len(d2)):
            some_list.append(str(d1[k][x]) + str(d2[k][y]))
print(some_list)
Absolutely.
Python code:
from typing import Dict, Iterable, List

def old_lookup(d1: Dict[int, List[str]], d2: Dict[int, List[str]]) -> List[str]:
    some_list = []
    for k, v in d1.items():
        for x in range(len(d1)):
            for y in range(len(d2)):
                some_list.append(str(d1[k][x]) + str(d2[k][y]))
    return some_list

def new_lookup(d1: Dict[int, List[str]], d2: Dict[int, List[str]]) -> Iterable[str]:
    for k in d1:
        v1, v2 = d1[k], d2[k]
        for x in v1:
            for y in v2:
                yield str(x) + str(y)

if __name__ == '__main__':
    d1 = {n: [f"d1_{n}_{m}" for m in range(3)] for n in range(3)}
    d2 = {n: [f"d2_{n}_{m}" for m in range(3)] for n in range(3)}
    want = old_lookup(d1, d2)
    got = list(new_lookup(d1, d2))
    assert want == got

You can get even fancier like this, but this is starting to rely on your audience having a very thorough understanding of modern python:
Python code:
def new_lookup(d1: Dict[int, List[str]], d2: Dict[int, List[str]]) -> Iterable[str]:
    return (str(x) + str(y) for k in d1 for x in d1[k] for y in d2[k])

IAmKale, I like your solution except for one thing: it doesn't duplicate the behavior of the original function in the case that a key in d1 is missing from d2. The original function will raise a KeyError, but yours will silently ignore that key.
Eela6 fucked around with this message at 20:55 on Dec 21, 2017
Dec 21, 2017 20:48

Eela6 - May 25, 2007 - Shredded Hen
Somehow I missed that this was part of 3.5, but you can now merge two dicts into a new dict in a single expression:
Python code:z = {**x, **y} # y keys overwrite existing keys in x
Hot drat.
Yeah, that's a good one. Cleaner and terser than
Python code:
{k: v for k, v in itertools.chain(x.items(), y.items())}
Jan 6, 2018 04:13

Eela6 - May 25, 2007 - Shredded Hen
They do, it's called .update(), which updates the dict in place. What we're doing here is creating a new dictionary from the contents of two or more others, which is somewhat different.
I'd like to call it a union, but unions imply no loss of information, which isn't quite right if some of the dictionaries have overlapping keys with different values.
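For comparison, a quick sketch of the two behaviors:

```python
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

z = {**x, **y}                 # new dict; y's values win on overlapping keys
assert z == {'a': 1, 'b': 3, 'c': 4}
assert x == {'a': 1, 'b': 2}   # x is untouched

x.update(y)                    # in place; x itself is modified
assert x == {'a': 1, 'b': 3, 'c': 4}
```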
Jan 6, 2018 04:46

Eela6 - May 25, 2007 - Shredded Hen
quote:
I have very limited programming experience generally and even less experience with Python, so I'll apologize if this is a really stupid question, but I wasn't able to find much from googling:
I've got a csv file with a little over 90,000 rows that each have a key in the first column and a value in the second. I also have a list of keys that I want to retrieve the values for.
Currently, I'm using csv.reader to read the file into a dictionary and then looping through my list of keys to retrieve the value for each from the dictionary. This works, but I have a feeling that this is a really stupid/inefficient way of going about things.
The other approach that comes to mind is creating a duplicate of the list of keys that I want to retrieve values for, iterating through the rows of the file checking if that row matches any of the keys I'm after, storing the value and removing the key from my duplicate list if it does match, and continuing on until the duplicate list is empty.
Am I an idiot? Is either of these approaches appropriate? Is there a better solution?

Either approach should work just fine, but the first one is probably better. 90,000 rows is really not that many.
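A sketch of the first approach (the file contents and the `load_mapping` name are invented for illustration; io.StringIO stands in for a real file):

```python
import csv
import io

def load_mapping(f):
    """Read a two-column csv (key, value) into a dict for O(1) lookups."""
    return {row[0]: row[1] for row in csv.reader(f)}

# stand-in for a real file opened with open('data.csv', newline='')
data = io.StringIO("apple,1\nbanana,2\ncherry,3\n")
mapping = load_mapping(data)
wanted = ['cherry', 'apple']
assert [mapping[k] for k in wanted] == ['3', '1']
```

Building the dict is one O(n) pass; every lookup after that is constant time, which is why the first approach wins.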
Jan 8, 2018 17:53

Eela6 - May 25, 2007 - Shredded Hen
quote:
As pointed out to me in the general question thread, we have a Python thread, so here goes.
I have formatting configured in a JSON file to specifically add the logging.Formatter as:
code:
'%(asctime)s|%(filename)s|%(name)s|%(levelname)s|%(jobid)s|%(source)s|%(message)s'
jobid and source being my own variables I am appending to each log message, which works swimmingly. The problem is when I use other imported modules that also use standard Python logging, like pysftp or google-auth, that throw KeyErrors dumping out their debug logs and complaining that they don't know jobid or source, à la:
code:--- Logging error ---
Traceback (most recent call last):
File "/usr/lib64/python3.5/logging/handlers.py", line 71, in emit
if self.shouldRollover(record):
File "/usr/lib64/python3.5/logging/handlers.py", line 187, in shouldRollover
msg = "%s\n" % self.format(record)
File "/usr/lib64/python3.5/logging/__init__.py", line 830, in format
return fmt.format(record)
File "/usr/lib64/python3.5/logging/__init__.py", line 570, in format
s = self.formatMessage(record)
File "/usr/lib64/python3.5/logging/__init__.py", line 539, in formatMessage
return self._style.format(record)
File "/usr/lib64/python3.5/logging/__init__.py", line 383, in format
return self._fmt % record.__dict__
KeyError: 'jobid'
Call stack:
File "/usr/lib64/python3.5/threading.py", line 882, in _bootstrap
self._bootstrap_inner()
File "/usr/lib64/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.5/site-packages/paramiko/transport.py", line 1908, in run
handler(self.auth_handler, m)
File "/usr/local/lib/python3.5/site-packages/paramiko/auth_handler.py", line 580, in _parse_userauth_success
'Authentication ({}) successful!'.format(self.auth_method))
File "/usr/local/lib/python3.5/site-packages/paramiko/auth_handler.py", line 75, in _log
return self.transport._log(*args)
File "/usr/local/lib/python3.5/site-packages/paramiko/transport.py", line 1687, in _log
self.logger.log(level, msg, *args)
Message: 'Authentication (password) successful!'
I am having trouble wrapping my head around what the best approach is here. Can I create a custom logging class to auto-add those variables to all logs regardless? Am I going about this all wrong? What is the best practice here? I can't get the proper Google search terms to bring up what I want to happen, and the existing docs on logging don't seem to cover this situation.
The problem here is that you're specifying behavior for logging.Formatter that relies on information that logging.Formatter doesn't have.
Instead, you should probably specify formatting behavior for a subclass of logging.Formatter that you control. Logs you create will use the special behavior, but other logs will be unaffected.
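One way to sketch that: a Formatter subclass that falls back to a placeholder when a record lacks the custom fields. The class name and the '-' placeholder are my own invention, not the poster's final solution:

```python
import logging

class DefaultingFormatter(logging.Formatter):
    """Formatter that fills in jobid/source when a record doesn't carry them,
    so third-party loggers don't blow up on the custom format string."""
    def format(self, record):
        for attr in ('jobid', 'source'):
            if not hasattr(record, attr):
                setattr(record, attr, '-')
        return super().format(record)

handler = logging.StreamHandler()
handler.setFormatter(DefaultingFormatter(
    '%(asctime)s|%(name)s|%(levelname)s|%(jobid)s|%(source)s|%(message)s'))
logger = logging.getLogger('demo')
logger.addHandler(handler)
logger.warning('no jobid here, but no crash either')
```

Your own code can keep passing extra={'jobid': ..., 'source': ...}; records from paramiko and friends just get the placeholder.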
Jan 17, 2018 19:32

Eela6 - May 25, 2007 - Shredded Hen
quote:
Trying to read up on abstract base classes; found a few tutorials and PyCon videos, but they all seem a bit shallow in their explanation.
Have I got this roughly right?
- Abstract base classes are designed not to be instantiated directly, but inherited from. ABCs force derived classes to implement their 'abstract' methods.
- If any of these methods is not implemented in a derived class, a TypeError is raised when the derived class is instantiated.
- An ABC may fully implement some of these abstract methods, or simply pass, leaving the derived class to handle the implementation.
- Likewise, the derived class may override an abstract method, or call super() and use the ABC's implementation directly (simply inheriting an abstract method without overriding it leaves the class abstract).
If I've got this correct (and that's a big if), then I can see the use of ABCs when you are writing classes that others will use or that have complex structures, as you need to enforce those methods for the whole thing to work.
You've got it.
quote:
But even if the object I'm describing in my class is 'abstract' (e.g. Bird), which I never instantiate directly, and I only use it to derive 'real' birds (Gull, Eagle, Owl), this situation wouldn't benefit from using an ABC. I would be better off with a standard Bird class that I inherit from, right?
Yes. ABCs are not really meant for production code. They're meant for designing user-extensible frameworks.
Alex Martelli, in Luciano Ramalho's Fluent Python, pg. 331 posted:
"ABCs are meant to encapsulate very general concepts, abstractions, introduced by a framework - things like 'a sequence' and 'an exact number'. [Readers] most likely don't need to write any new ABCs, just use existing ones correctly, to get 99.9% of the benefits without serious risk of misdesign."
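A minimal sketch of the instantiation-time behavior described above, using the Bird/Owl names from the question:

```python
from abc import ABC, abstractmethod

class Bird(ABC):
    @abstractmethod
    def call(self) -> str: ...

    def greet(self) -> str:          # concrete method; inherited as-is
        return f"The bird says {self.call()}"

class Owl(Bird):
    def call(self) -> str:           # satisfies the abstract method
        return "hoo"

assert Owl().greet() == "The bird says hoo"

try:
    Bird()                           # abstract method left unimplemented
    raise AssertionError('should have raised TypeError')
except TypeError:
    pass
```

Note the TypeError comes at instantiation time, not at class-definition time or when the missing method is called.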
Eela6 fucked around with this message at 21:05 on Jan 17, 2018
Jan 17, 2018 20:58

Eela6 - May 25, 2007 - Shredded Hen
It is an extremely thorough text. There's a lot to take in.
Eela6 fucked around with this message at 22:47 on Jan 17, 2018
Jan 17, 2018 22:45

Eela6 - May 25, 2007 - Shredded Hen
If you want arbitrary precision, you should use the decimal library.
Alternately, use math.isclose()
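A quick sketch of both options:

```python
import math
from decimal import Decimal, getcontext

# floats accumulate binary representation error
assert 0.1 + 0.2 != 0.3
assert math.isclose(0.1 + 0.2, 0.3)   # tolerance-based comparison

# Decimal does exact decimal arithmetic, with configurable precision
getcontext().prec = 50
assert Decimal('0.1') + Decimal('0.2') == Decimal('0.3')
```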
Jan 29, 2018 00:37

Eela6 - May 25, 2007 - Shredded Hen
quote:
The more pythonic way might be to have a generator that yields lists (basically consumes the iterator, adding each item to a list it's building until the item is too big, yields the old list and creates a new one for the item). That way you can feed that into another generator that filters out the single-item lists, or have the first generator just not yield a single-item list.
I agree. I would do it like this:
Python code:
from typing import Iterator, List

def get_distance_groups(a: List[int], tol: int) -> Iterator[List[int]]:
    start, prev = 0, a[0]
    for i, n in enumerate(a):
        if abs(prev - n) > tol:  # important! n could be negative!
            yield a[start:i]
            start = i
        prev = n
    yield a[start:]  # the final group (possibly a single element)
Then filter the output.
IN:
Python code:
groups = get_distance_groups(a=[0, 14, 18, 20, 36, 41, 62, 70, 72], tol=5)
print([x for x in groups if len(x) > 1])
OUT:
[[14, 18, 20], [36, 41], [70, 72]]
Eela6 fucked around with this message at 00:36 on Feb 9, 2018
Feb 9, 2018 00:32

Eela6 - May 25, 2007 - Shredded Hen
quote:
I always try to avoid doing callable/in-place stuff in a list comprehension, e.g. [print(letter) for letter in name]. Is there any particular reason to avoid this? It seems to be more concise than the indentation of a for-loop.
Yes. You shouldn't use a list comprehension for its side effects; a list comprehension should be used to create a list. Generally speaking, any function that has side effects shouldn't be used in a comprehension: comprehensions are meant for a functional style of programming.
What you're talking about is the equivalent of these lines of code:
Python code:
a = []
for letter in name:
    a.append(print(letter))
Indeed,
IN:
Python code:
print([print(letter) for letter in 'word'])
OUT:
pre:
w
o
r
d
[None, None, None, None]
Feb 9, 2018 19:57

Eela6 - May 25, 2007 - Shredded Hen
Yes. Blame history.
quote:
Everything you said is correct, but I just wanted to point out that the poster said "callable", and using a callable is fine in a list comprehension; it's just that the particular example the poster used wasn't great.
(Of course, it could be the reverse, and the example was on-point and the word "callable" wasn't exactly what he meant.)
They used 'callable' as a synonym for in-place, so I assumed they meant 'mutable', given the example.
Feb 9, 2018 20:49

Eela6 - May 25, 2007 - Shredded Hen
quote:
Yes, I know what you assumed; I'm pointing out that there's more than one way to read it so the poster doesn't get confused.
Thanks!
Feb 9, 2018 20:53

Eela6 - May 25, 2007 - Shredded Hen
quote:
In list logic, .remove will remove the first item in a given list that matches the query...
How do you delete the last item in a list that matches a particular query?
Because the best I can come up with is to reverse the list, apply .remove, then reverse the list again. And that strikes me as terribly inefficient.
I would do it like this:
Use reversed() to iterate over the list in reverse (i.e, from back to front) without modifying the list.
Find the index to remove, then use the del statement to remove that element of the list.
Note that removing elements from the middle of a list is not a particularly efficient operation.
Putting it together:
Python code:
from typing import List

def delete_last_matching_inplace(a: List[int], k: int):
    for i, n in enumerate(reversed(a), 1):
        if n == k:
            del a[-i]
            return
    raise ValueError(f'no element in {a} matches {k}')
pre:
In [12]: a = [1, 2, 3, 2, 5, 2]
In [13]: delete_last_matching_inplace(a, 2)
In [14]: a
Out[14]: [1, 2, 3, 2, 5]
In [15]: delete_last_matching_inplace(a, 2)
In [16]: a
Out[16]: [1, 2, 3, 5]
In [17]: delete_last_matching_inplace(a, 2)
In [18]: a
Out[18]: [1, 3, 5]
In [19]: delete_last_matching_inplace(a, 2)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-19-dc87b4f3a28d> in <module>()
----> 1 delete_last_matching_inplace(a, 2)
<ipython-input-3-b7155902c6b2> in delete_last_matching_inplace(a, k)
4 del a[-i]
5 return
----> 6 raise ValueError(f'no element in {a} matches {k}')
ValueError: no element in [1, 3, 5] matches 2
Mar 2, 2018 01:02

Eela6 - May 25, 2007 - Shredded Hen
quote:
Wow, great responses and super quick. Unfortunately the responses aren't easily applied to my own program, and I decided instead to rebuild the program in a way that obviated the need for removal.
For anyone curious, I needed a list generated that includes 20 of each of the following:
NR
L0
L2P
L2T
L4P
L4T
L8P
L8T
For a total of 160 items in the list. All of these stand for different experimental conditions, and are randomly presented, but some conditions are related. The rules are:
L0 and NR can go anywhere in the list that doesn’t conflict with another rule.
For n = L2P, n + 1 = L2T
For n = L4P, n + 2 = L4T
For n = L8P, n + 4 = L8T
I’m phone posting now, but my new approach is to add the L(X)P items to the list at the start, shuffle the order, and use that as a seed for a pseudo-random procedural generator. The procedural generator will then populate the list with L(X)T items, using L0/NR items as filler when necessary.
It’s a heck of a lot more complicated than I thought ought to be necessary (~150 lines of code), and is slower than my usual experiment list generator, but I’m ironing out a final couple bugs (usually missing L(X)P items) and it appears to work.
This is an interesting problem that's more difficult than it appears. I spent a little bit of time fiddling with it and wasn't able to find a solution that preserved 'true' randomness (i.e., all valid strings are equally likely, given the limits of the PRNG) that wasn't O(n^2) or worse.
If it's not sensitive, would you mind showing me what you end up with?
Mar 3, 2018 00:39

Eela6 - May 25, 2007 - Shredded Hen
ed, nm, answering the wrong question.
PS: you shouldn't pop things from the front of a list. If you must pop from the front, use collections.deque; if the order doesn't matter, just use pop() (without arguments) or iterate through.
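A sketch of the difference:

```python
from collections import deque

d = deque([1, 2, 3])
assert d.popleft() == 1        # O(1) at the left end
d.append(4)
assert list(d) == [2, 3, 4]

# the list equivalent shifts every remaining element over: O(n) per pop
a = [1, 2, 3]
assert a.pop(0) == 1
assert a == [2, 3]
```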
A couple other bits of python niceties you could use:
enumerate creates an index for the collection you're looping through. E.g.:
Python code:
for i, c in enumerate(['g', 'o', 'o', 'n']):
    print(f'({i}, {c})', end="\t")
pre:
(0, g)	(1, o)	(2, o)	(3, n)
A more 'pythonic' take on your insertion sort code would look like this:
Python code:
def insertion_sort(unsorted):  # use a descriptive argument name rather than shadowing 'list'
    def insert(sorted_items, new_value):
        # no need for a bounds check; this loop is a no-op if len(sorted_items) == 0
        for j, v in enumerate(sorted_items):
            if new_value <= v:
                sorted_items.insert(j, new_value)
                return sorted_items
        # new_value is bigger than every element (this also covers the empty list)
        sorted_items.append(new_value)
        return sorted_items

    sorted_items = []
    for v in unsorted:
        sorted_items = insert(sorted_items, v)
    return sorted_items
Eela6 fucked around with this message at 05:46 on Mar 18, 2018
Mar 18, 2018 05:03

Eela6 - May 25, 2007 - Shredded Hen
quote:
I might be blind because I haven't seen it discussed, but is the Python Humble Bundle worth it?
I was originally looking at the $15 tier, as I don't think the other tiers would be any good for me (at best I just write small scripts to help automate things, web scraping and DB stuff) - but I remember reading earlier in the thread that Fluent Python is good and probably worth the $20?
Fluent Python is a total steal at $20. I'm sure you can find something of worth in the rest of the bundle, too.
May 8, 2018 16:19

Eela6 - May 25, 2007 - Shredded Hen
quote:
In the python docs for glob, it's defined like so:
code:
glob.glob(pathname, *, recursive=False)
What does that asterisk after pathname mean? The glob function only takes 1 positional argument.
The * means that's the end of positional arguments. recursive is a keyword-only argument. You can call it with glob.glob(somepath, recursive=True) if you'd like.
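A minimal sketch with a made-up function name:

```python
def find(pathname, *, recursive=False):
    """Everything after the bare * must be passed by keyword."""
    return (pathname, recursive)

assert find('src') == ('src', False)
assert find('src', recursive=True) == ('src', True)

try:
    find('src', True)            # passing it positionally is rejected
    raise AssertionError('should have raised TypeError')
except TypeError:
    pass
```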
Jul 13, 2018 06:24

Eela6 - May 25, 2007 - Shredded Hen
Python code:
tol = 10
foo = [abs(a - b) < tol for b in query]
Aug 2, 2018 17:00

Eela6 - May 25, 2007 - Shredded Hen
quote:
I'm trying to apply a function f(x, y) to each possible combination of two strings in a list.
Python code:
names = ['joel', 'barney', 'georges', 'amsterdam']
# 4 is a random placeholder value
d = pd.DataFrame(4, names, names)
code:
           joel  barney  georges  amsterdam
joel          4       4        4          4
barney        4       4        4          4
georges       4       4        4          4
amsterdam     4       4        4          4
I'd like to compute a value for each cell (in my case the Levenshtein distance) using the row index and column index as values.
Googling gets me vague and complicated answers, so I'm probably not going in the right direction?
This is a great place for a generator or list comprehension.
IN:
Python code:
def add(x, y):
    return x + y

print([[add(x, y) for x in 'abc'] for y in 'abc'])
OUT:
pre:
[['aa', 'ba', 'ca'], ['ab', 'bb', 'cb'], ['ac', 'bc', 'cc']]
Aug 20, 2018 14:39
Eela6 - May 25, 2007 - Shredded Hen
I think you should declare something as close to where it is used as possible.
Aug 23, 2018 05:27