Python

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python

Data Graham: Dec 28, 2009; 📈📊🍪😋

So here's me refactoring my API crawling script for speed.

"I know, I'll make it multithreaded!"

*reads this article about the GIL*

"Surely the leopard won't eat MY face!"

Before threading on a 4-CPU box: 35 minutes on a single CPU

With 5 threads: 29 minutes, CPUs about 30% saturated, evenly

With 20 threads: 40 minutes, CPUs all saturated

"Uhh. Okay, let me just run four separate processes and slice up the data set four ways."

With 5 threads per process: 8 minutes, CPU usage evenly saturated

With 1 thread per process: 8 minutes, CPU usage evenly saturated

So um. Does this mean the GIL is really that much of a bastard and I should just have shot myself rather than try multithreading in Python in the first place?
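For anyone following along, here is a minimal sketch of the "four processes, data set sliced four ways" approach using the standard library's `multiprocessing.Pool`. The names `slice_evenly`, `process_slice` and `crawl_in_processes` are made up for illustration, and `process_slice` is only a stand-in for whatever CPU-heavy work the real script does per record.

```python
from multiprocessing import Pool


def slice_evenly(items, n):
    """Split items into n roughly equal slices (round-robin)."""
    return [items[i::n] for i in range(n)]


def process_slice(records):
    """Hypothetical stand-in for the CPU-bound work done on each record."""
    return [sum(ord(ch) for ch in record) for record in records]


def crawl_in_processes(records, workers=4):
    """Run process_slice on each slice in its own process, one per CPU."""
    slices = slice_evenly(records, workers)
    with Pool(processes=workers) as pool:
        results = pool.map(process_slice, slices)
    # Flatten the per-slice results back into one list (slice order, not input order).
    return [value for chunk in results for value in chunk]
```

Each worker process gets its own interpreter and its own GIL, which fits the timings above if the per-record work is CPU-bound. Call `crawl_in_processes(...)` from under an `if __name__ == "__main__":` guard so it also behaves on platforms that spawn rather than fork.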

Data Graham fucked around with this message at 13:26 on Aug 1, 2017

# ¿ Aug 1, 2017 13:21

Data Graham: Dec 28, 2009; 📈📊🍪😋

Cool. You may all point and laugh as you please

# ¿ Aug 1, 2017 13:29

Data Graham: Dec 28, 2009; 📈📊🍪😋

Why not use a native python rar library?

# ¿ Sep 29, 2017 23:11

Data Graham: Dec 28, 2009; 📈📊🍪😋

Doesn't that mean there will be demand for like index-based accessors and tons of non-backward-compatible code simplifications?

Schism time

# ¿ Dec 20, 2017 04:24

Data Graham: Dec 28, 2009; 📈📊🍪😋

What you've done sounds like how I would do it frankly.

I'd rather process each row once and then have a nice fast hash-table lookup from a dictionary (for each of n keys) than process each row n times looking for the keys.

The benefit that your proposed solution has is that n keeps decreasing as keys are found, and once they're all found you can discard the whole rest of the CSV; but whether that's overall more performant than the first approach depends on the shape of the data. The proposed solution is also trickier and sounds like it would take more testing and tinkering, and for that reason alone I'd probably stick with the first approach.
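A rough sketch of how the two ideas can be combined: one pass over the rows, a hash lookup per row, and an early exit once every key has turned up. The function name `find_rows`, the column name and the sample data are all assumptions, since the actual CSV layout isn't shown in the thread.

```python
import csv
import io


def find_rows(csv_file, wanted_keys, key_column):
    """Scan the CSV once, collecting the first row seen for each wanted key."""
    remaining = set(wanted_keys)
    found = {}
    for row in csv.DictReader(csv_file):
        key = row[key_column]
        if key in remaining:       # O(1) hash lookup per row
            found[key] = row
            remaining.discard(key)
            if not remaining:      # everything found: skip the rest of the file
                break
    return found


# Hypothetical sample data standing in for the real file.
sample = io.StringIO("id,name\n1,ant\n2,bee\n3,cat\n4,dog\n")
rows = find_rows(sample, {"2", "3"}, "id")
```

Whether the early exit buys anything still depends on the shape of the data: if the wanted keys tend to sit near the end of the file, it is effectively just the single-pass version.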

# ¿ Jan 8, 2018 17:45

Data Graham: Dec 28, 2009; 📈📊🍪😋

Yeah, that was going to be my question. Isn't the benefit of that just that the functionality is built-in and easy to use rather than that it's efficient?

# ¿ Jan 8, 2018 18:18

Data Graham: Dec 28, 2009; 📈📊🍪😋

Sounds like it's just an SPA where the data you want is filled in via ajax calls. So your test framework needs to run JS with full browser-like capabilities.

(Or access the API directly, but...)

# ¿ Jan 10, 2018 16:16

Data Graham: Dec 28, 2009; 📈📊🍪😋

To make the rest of your code more opaque and confusing, duh.

:v:

# ¿ Mar 3, 2018 22:59

Data Graham: Dec 28, 2009; 📈📊🍪😋

The March Hare posted:

Now that I think about it, I can see being new and being totally confused by CBVs/generic CBVs/function views. I think models are fairly reasonable (especially now that you don't need an outside library for migrations), functional views are totally intuitive in their machinations, urls aren't that bad as long as you read the docs, and forms are terrible even if you do read the documentation.

The docs seem to be downplaying function-based views as much as they can; if they aren't planning to deprecate them outright I feel like they're going to turn into a semi-supported feature that just confuses everyone down the line.

# ¿ Mar 26, 2018 20:16

Data Graham: Dec 28, 2009; 📈📊🍪😋

Unless I'm misunderstanding you, you already have everything you need. message is currently a dict with all the values that were loaded in from the JSON. So you can access resourceID like

message['resourceID']

# ¿ Mar 30, 2018 02:29

Data Graham: Dec 28, 2009; 📈📊🍪😋

It sounds like it's legitimately giving you an authentication error. Are you literally sending "username" and "password" as the username and password like in your code? Didn't the docs describe what to use for those values?

# ¿ Mar 31, 2018 03:16

Data Graham: Dec 28, 2009; 📈📊🍪😋

Seventh Arrow posted:

That's great, thank you. Could you maybe explain why it's necessary to put the [0] between ['data'] and ['token']? (and maybe where I could find more info on the topic?)

Because the [] brackets mean an array or list. There is only one element in that list, the {'token': 'bYrIXkbPK933im6zpM4GoPB59i7pxLhkfAbdCrQr7kWFrjnH5hW5Z130uqqSI1uU', 'status': 'queued'} dict; but you still have to specify which element in the list you want.

Formatted out, your dict might look like this, with more than one item in the list:

code:

{
    'data': [
        {
            'token': 'bYrIXkbPK933im6zpM4GoPB59i7pxLhkfAbdCrQr7kWFrjnH5hW5Z130uqqSI1uU', 'status': 'queued'
        },
        {
            'token': 'asdadasdasdasdsad', 'status': 'queued'
        }
    ], 
    'status': 200
}

And in that case you'd use [1] to refer to the list element with the "asdadasdasdasdsad" token.
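As a quick runnable version of the same idea (the first token here is a shortened placeholder, not the real value, and the variable names are made up):

```python
response = {
    'data': [
        {'token': 'first-token', 'status': 'queued'},        # placeholder value
        {'token': 'asdadasdasdasdsad', 'status': 'queued'},
    ],
    'status': 200,
}

first = response['data'][0]['token']    # element 0 of the list, then its 'token' key
second = response['data'][1]['token']   # element 1 of the list

# Looping avoids hard-coding an index when the list length isn't known.
all_tokens = [item['token'] for item in response['data']]

# Skipping the index fails, because a list can't be subscripted with a string.
try:
    response['data']['token']
except TypeError:
    skipped_index_failed = True
```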

Data Graham fucked around with this message at 02:36 on Apr 2, 2018

# ¿ Apr 2, 2018 02:32

Data Graham: Dec 28, 2009; 📈📊🍪😋

Or pprint?

That's more for native objects though, not sure if it handles json or anything.
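For what it's worth, a small sketch of both routes: pprint works on Python objects, so JSON text has to go through `json.loads` first, while `json.dumps` with `indent` pretty-prints and keeps the output valid JSON. The `raw` string below is invented sample data.

```python
import json
from pprint import pformat

raw = '{"data": [{"token": "abc", "status": "queued"}], "status": 200}'
parsed = json.loads(raw)                  # JSON text becomes plain dicts and lists

as_python = pformat(parsed, width=40)     # Python repr style, single-quoted strings
as_json = json.dumps(parsed, indent=2)    # still valid JSON, double-quoted strings
```

`pformat` returns the string that `pprint` would print, which is handy if the output is going into a log or a template rather than straight to the terminal.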

# ¿ Apr 11, 2018 23:59

Data Graham: Dec 28, 2009; 📈📊🍪😋

Could you not just put the output in a <pre> tag?

# ¿ Apr 12, 2018 12:58

Data Graham: Dec 28, 2009; 📈📊🍪😋

The Fool posted:

At work I am receiving a backup of an SQL database, but I will not be getting control of any of the existing infrastructure or front end.

I was thinking about throwing up a quick and dirty Django front end, but have no idea how to set up the orm to handle an existing database.

Is there an accepted best practice? Should I just build out a new table structure that I can import the data into, or is there a better way?

Modeling an existing table schema in Django's ORM isn't too much of a hassle. Just specify db_table and db_column if you don't want to rename anything. There shouldn't be anything in the schema it can't deal with, or if there is, it might not be important anyway?

# ¿ Jul 4, 2018 10:50

Data Graham: Dec 28, 2009; 📈📊🍪😋

Anybody have thoughts on PyCharm vs. IntelliJ+Python extensions?

My work is trying to eradicate the former in favor of the latter, for licensing cost reasons. All us Python developers think this will suck a bunch, but not many of us have enough experience with IntelliJ to have concrete complaints about it, other than anecdotal stories that IntelliJ takes forever to launch.

Anyone care to vent?

# ¿ Sep 5, 2018 17:51

Data Graham: Dec 28, 2009; 📈📊🍪😋

In general I like to write my methods to take richer objects rather than sparser ones. It's easier to expand them later that way (example: a method that takes an array of things to operate on rather than a single thing).
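A tiny illustration of the "array of things rather than a single thing" example, with a made-up function name and made-up data:

```python
def archive_items(items):
    """Accept an iterable of items, even when callers only have one today."""
    return [f"archived:{item}" for item in items]


single = archive_items(["report.txt"])          # one item, wrapped in a list
several = archive_items(["a.txt", "b.txt"])     # same signature, no refactor needed
```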

# ¿ Sep 15, 2018 14:46

Data Graham: Dec 28, 2009; 📈📊🍪😋

Client-side validation is a gateway drug

# ¿ Nov 23, 2018 02:13

Data Graham: Dec 28, 2009; 📈📊🍪😋

Style question. Which is better:

code:

    def retrieve(self, request, *args, **kwargs):
        account = get_object_or_404(Account, pk=kwargs['account_id'])
        obj = get_object_or_404(self.queryset, pk=kwargs['pk'], account=account)

code:

    def retrieve(self, request, account_id=None, pk=None):
        account = get_object_or_404(Account, pk=account_id)
        obj = get_object_or_404(self.queryset, pk=pk, account=account)

# ¿ Jan 26, 2019 15:42

Data Graham: Dec 28, 2009; 📈📊🍪😋

Cool, thanks for the responses.

Bundy posted:

Agree with the other posts and besides, explicit is better than implicit.

Nice. I'll go this route then, since this gives me a good bite-sized rationale in case anyone wants me to defend it.

# ¿ Jan 26, 2019 21:10

Data Graham: Dec 28, 2009; 📈📊🍪😋

Dr Subterfuge posted:

That rationale comes packaged with some others, if you're curious.

Oh sure, I've got most of them committed to memory, I just hadn't connected that one to the case in question.

For context though, this is me subclassing DRF ViewSets and their class methods, which in the base implementation use the *args/**kwargs notation, so I was sort of following that pattern just out of inertia. I wanted to use the named params, but I was hoping there wasn't maybe like some unspoken rule about keeping consistency with a library you're building on top of, or something like that.

Data Graham fucked around with this message at 04:05 on Jan 27, 2019

# ¿ Jan 27, 2019 04:02

Data Graham: Dec 28, 2009; 📈📊🍪😋

I mean, much as I appreciate the level of mastery and art that such patterns can make possible, there's a point at which I'm like "sure hope I'm not the guy who has to inherit this code after the person who wrote it leaves"

# ¿ Feb 3, 2019 01:21

Data Graham: Dec 28, 2009; 📈📊🍪😋

Today is #2to3 party day wooooo

Migrating about 7 legacy django apps that all have to be updated at the same time because they're all sharing an Apache mod_wsgi space uuughghh

# ¿ Mar 31, 2019 18:25

Data Graham: Dec 28, 2009; 📈📊🍪😋

Yeah, same question. I super depend on iPython and the django CLI shell, and having to do long annoying docker-compose commands to get into the shell or install packages or run migrations or whatever is super unsatisfying compared to just doing a local venv.

I'm very open to shifting my thinking more into docker land, but so far it's still an uphill battle.

# ¿ Apr 6, 2019 13:53

Data Graham: Dec 28, 2009; 📈📊🍪😋

Caching at the edge server, survey says ...

# ¿ Apr 10, 2019 03:12

Data Graham: Dec 28, 2009; 📈📊🍪😋

I'd totally sub it.

I still can't get my head around how volumes work and really what a container even is, like obviously it's not an entire VM image, it's like ... just a single app within an image, and you can have like 5-6 of them docker-composed together in the same environment, like one is your database and the other is your python app, but if you ssh into one it's like a parallel universe where it can't see any of the other ones and the db in the python env will be out of sync with whatever is in the db container? But in prod it all just magically works?

I mean I'm being deliberately obtuse here for effect but I do really wish I had a more visceral understanding of the concepts involved. Really sucks when you're having the guy who hired you explain it to you for the third time and you're watching his face fall as he comes to grips with the idea that he's made a huge mistake *Sound of Silence plays*

# ¿ Apr 10, 2019 17:51

Data Graham: Dec 28, 2009; 📈📊🍪😋

QuarkJets posted:

The second one is better

But you gotta do the first if your condition is anything other than "bar is truthy".

# ¿ May 6, 2019 03:14

Data Graham: Dec 28, 2009; 📈📊🍪😋

Thermopyle posted:

I prefer the comment before the if block.

However, this does not make any sense and I do not know why I prefer it.

I mean, you don't put a docstring before the function or class definition starts.

This bothers me because every time I write such a comment, it's weighing on me to choose between what I prefer and what seems technically correct.

Just think of a comment as a freestyle decorator

# ¿ May 9, 2019 21:49

Data Graham: Dec 28, 2009; 📈📊🍪😋

^^^ Hahaha. I can't decide if that's the most pythonic or the least pythonic thing ever

# ¿ Sep 5, 2019 15:46

Data Graham: Dec 28, 2009; 📈📊🍪😋

But what if u wanted to divide the string by something

like u'zero'

# ¿ Sep 5, 2019 20:41

Data Graham: Dec 28, 2009; 📈📊🍪😋

Dominoes posted:

Sweet. I'm not sure why I've had such trouble with VMs before; tried Docker to help someone repro a bug and got nowhere, despite most people having no trouble. Also having flashbacks of an old Django tutorial that used Vagrant/Chef/Virtualbox. Looks like Windows Subsys for Linux now supports several distros, which may help too.

Man I'm glad I'm not the only one for whom Docker seems like watching kids today with their intendos and wondering when I got so old.

Wasn't someone going to start an All Things Docker thread?

# ¿ Sep 10, 2019 12:26

Data Graham: Dec 28, 2009; 📈📊🍪😋

Mr. Angry posted:

str and int are immutable and are always passed by value so you can simply set a default value like "" and it'll work. Lists, dicts and other objects are always passed by reference and setting a default value without a factory function will result in every instance having a reference to the same object. It's for the same reason you don't use empty lists or other objects in function arguments.

PEP 484 used to allow both of your examples to work but now the Optional type must be explicit and type checkers should raise an error if you don't include it.

Waiwaiwait. I feel like this means I might have been doing poo poo wrong for years then.

Does this mean

code:

def foo(blah, my_list=[]):

code:

foo(blah, [])

are wrong? What do I need to be doing (and fixing in ... like .. all of my code ever)?

# ¿ Jan 15, 2020 14:32

Data Graham: Dec 28, 2009; 📈📊🍪😋

Okay, well as long as passing in an empty array to a new function invocation is kosher, that's not as bad as I thought.

A lot of times I've put the my_list=[] in the function definition just so there would always be something to iterate over within the function instead of throwing None errors if I didn't specify the kwarg in the function call (usually this was the result of me tacking on new functionality to legacy code where there are a lot of calls to the function and I want to seamlessly add handling for an optional param without having to refactor every single call).

I could have just done "if my_list:" but hey

# ¿ Jan 15, 2020 15:03

Data Graham: Dec 28, 2009; 📈📊🍪😋

Like for example I want to take

code:

def foo(blah):
    return True

And turn it into

code:

def foo(blah, my_list=[]):
    for item in my_list:
        do_stuff()
    return True

Where the code has a ton of existing calls to foo(blah). So if I did my_list=[], it would work, whereas my_list=None would throw TypeErrors.

I guess it felt cleaner and saved a line of code to not have to check whether my_list was truthy, but what you guys are saying makes sense. I like the "my_list = my_list or list()" style.
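To make the pitfall concrete, here's a small sketch (function names invented for the demo). The default list is built once, when the `def` runs, so every call that leans on the default shares the same object. The read-only loop above never mutates `my_list`, which is presumably why that pattern hadn't bitten yet; it goes wrong as soon as the function appends to or returns the default.

```python
def append_shared(item, bucket=[]):
    """Pitfall: the default list is created once, at definition time."""
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    """Fix: default to None and build a new list on each call."""
    bucket = bucket or list()
    bucket.append(item)
    return bucket


first_shared = append_shared("a")
second_shared = append_shared("b")    # same list object as first_shared

first_fresh = append_fresh("a")
second_fresh = append_fresh("b")      # a new list each time
```

One caveat with the `or list()` style: it also swaps out an empty list that the caller passed in on purpose, so the caller won't see any appends. Writing `if bucket is None:` instead avoids that.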

# ¿ Jan 15, 2020 15:12

Data Graham: Dec 28, 2009; 📈📊🍪😋

SurgicalOntologist posted:

One way to avoid the if statement is to make the default value an empty tuple. Since tuples are immutable, it doesn't have the same problem.

But in general you shouldn't hesitate to write if arg is None: often.

Yeah, good call. I like the previous suggestion a little more because it also saves an indent, but I'll bear that in mind.

The tuple thing sure feels like inside baseball though. One of the joys of python is how much of it Just Works the way you'd expect it to on the surface, and I guess I'd tricked myself into thinking passing-by-value-or-reference was one of those things one didn't have to think about in day-to-day code, since that pattern I was using didn't seem to cause any issues.

Live and learn. That'll make a drat fine interview question though

# ¿ Jan 15, 2020 16:11

Data Graham: Dec 28, 2009; 📈📊🍪😋

Oh sure, I get it. You can imagine how it feels learning fundamentals after umpty-ump years doing this though :v:

But having thought about this a little, yeah, the classical form the interview question would take would be like "What's the difference between a list and a tuple". But with this spin it would become more like:

Suppose you're writing a function that takes a list as a parameter. Why would it be OK to write def foo(my_list=()): and not OK to write def foo(my_list=[]):?

The answer I would LIKE to see would be "It wouldn't, because that's weird and obscures readability and tricks a coder into thinking the function is supposed to take a tuple instead of a list"

But "it's immutable and therefore will allow you to iterate over my_list within the function without worrying about whether it's getting passed by reference" definitely demonstrates more internalized understanding than I had an hour ago

e: To be clear I knew about mutability, just not this aspect of it. You know, definition vs. practice stuff.
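A short sketch of the tuple-default version, using a made-up function. Strictly speaking, the issue is less about passing by value or by reference and more that a default object is created once and shared across calls; an empty tuple is safe as a default because nothing can change it.

```python
def total_lengths(blah, my_list=()):
    """Iterating over an empty tuple default is safe: it cannot be mutated."""
    return sum(len(item) for item in my_list)


no_arg = total_lengths("x")                       # falls back to the empty tuple
with_list = total_lengths("x", ["ab", "cde"])     # callers can still pass a list

try:
    ().append("oops")                 # tuples have no append method
except AttributeError:
    tuple_is_immutable = True
```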

Data Graham fucked around with this message at 17:30 on Jan 15, 2020

# ¿ Jan 15, 2020 17:03

Data Graham: Dec 28, 2009; 📈📊🍪😋

I'm in the habit (usually) of putting trailing commas in everything, but especially tuples because wacky breakage happens in the case of length = 1 if you don't.

I like trailing commas in dicts and lists too because it makes adding/removing elements safer when they're all on individual lines.
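The length = 1 case in a nutshell (variable names are just for the demo):

```python
not_a_tuple = ("solo")     # parentheses alone: this is just a string
one_tuple = ("solo",)      # the trailing comma is what makes it a tuple

# The breakage shows up on iteration: the string yields characters.
chars = [x for x in not_a_tuple]
items = [x for x in one_tuple]
```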

E: ^^ yeah that

# ¿ Jan 15, 2020 18:18

Data Graham: Dec 28, 2009; 📈📊🍪😋

Random thought, it seems wonky to me that a dict subscript is written foo["bar"] and not foo{"bar"}

# ¿ Apr 2, 2020 21:52

Data Graham: Dec 28, 2009; 📈📊🍪😋

In case anybody's reading this thread and not the Django thread...

I could really use someone's help tracing down a problem I'm having in Django. It seems trivially easy to cause a deadlock in the logging module, by hitting the site ~simultaneously with dozens-to-hundreds of AJAX requests.

It's been independently reproduced; I just don't know how to go further with digging through the innards of Python itself since it seems like that's where this issue lies.

https://github.com/data-graham/wedge

Can anyone spare some time to poke at this? I don't want to discount the much-appreciated efforts others have already given it, but I'd like to broaden the visibility now that I know it's not just me.

# ¿ Apr 22, 2020 14:55

Data Graham: Dec 28, 2009; 📈📊🍪😋

Hm. Strange, because when I originally started investigating this, I had loggers defined. That's what the problem was to begin with, it's wedging in production.

I took out everything I could identify as not affecting the behavior, and I was originally swapping all the various loggers in and out, until I eventually saw that it wasn't whether the loggers were defined that was causing it, but the handlers. So I left loggers empty for the demo.

FWIW I just now added my prod loggers back in

Python code:

        'django.request': {
            'handlers': ['mail_admins'],
            'level': 'ERROR',
            'propagate': True,
        },
        '': {
            'handlers': ['logfile'],
            'level': 'INFO',
            'propagate': True,
        },
        'django.db.backends': {
            'level': 'WARN',
            'handlers': ['logfile'],
        },
        'django.security.DisallowedHost': {
            'handlers': ['null'],
            'propagate': False,
        },

and uncommented the handlers and it wedged again right away.
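For readers without the repo open, here is a stdlib-only sketch of how a loggers fragment like that slots into a full dictConfig-style dict (under Django, this dict would be the `LOGGING` setting). The handler definitions are assumptions: the real handlers aren't shown in this post, `mail_admins` is swapped for a `NullHandler` so the sketch runs without Django, and the log file path is a throwaway temp location. It does not reproduce the wedge; it only shows the shape of the config.

```python
import logging
import logging.config
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "app.log")  # hypothetical location

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        # Stand-in for Django's mail_admins handler (assumption, see above).
        "mail_admins": {"class": "logging.NullHandler"},
        "logfile": {"class": "logging.FileHandler", "filename": log_path},
        "null": {"class": "logging.NullHandler"},
    },
    "loggers": {
        "django.request": {"handlers": ["mail_admins"], "level": "ERROR", "propagate": True},
        "": {"handlers": ["logfile"], "level": "INFO", "propagate": True},
        "django.db.backends": {"handlers": ["logfile"], "level": "WARNING"},
        "django.security.DisallowedHost": {"handlers": ["null"], "propagate": False},
    },
}

logging.config.dictConfig(LOGGING)

# An ERROR on django.request propagates up to the root logger's logfile handler.
logging.getLogger("django.request").error("simulated request failure")
```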

# ¿ Apr 23, 2020 12:11
