Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

baka kaba posted:

Maybe "batteries included" isn't the best idea because those batteries are usually crappy and get discarded quickly??

Make it easier for people to add their own quality batteries

This way lies npm and I don't think we want that road


Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

Sylink posted:

This is sort of a general programming question but I'm using Python specifically -

I am writing a small script that will take a list of people and their shift schedules, and then it needs to detect from that list and the current datetime whether that person should be on shift now or not.

I have two problems:

1) What is the best way to encode/store the shift schedules in a format that the script would import? The schedules are the same weekly so the exact date isn't important but the weekday and hour of the day are important.

Right now I record the start hour as number and the weekday as a number (0-6 per the weekday() function in datetime), then the length of the shift in hours so you can compute the end by adding X hours.

2) What is the best way to detect if they are within the range? It seems like I need to generate a table of values (probably in pandas) with one column being the start time and the other an end time, then you can do a start < current < end to detect if they should be on shift.

Mostly point 1 and making it easy to get the shifts in is the hard part, I could just generate the next 48 hours of date ranges for each person I suppose? (48 because some shifts go through midnight so over two different days)

You want a date/time library that can
1. Serialize to and deserialize from strings (when saving/reading the file defining the shifts)
2. Do nice comparisons so you can do now < shift_end

A standard library way to do this is datetime, using strftime and strptime for writing/reading and comparing datetime objects directly. As posters have mentioned above, there are better libraries on PyPI for this (I haven't used them personally but I 100% believe they're better than datetime); that said, a lot of the problems they solve are about things like timezone handling and localization that might not be an issue for you.

Keep in mind, on that subject, that if you ever have even one person using this (either someone who will be defining shifts or someone getting angry texts that they're not at work) who is in a different time zone than everybody else, this gets a lot more complex.
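A minimal stdlib-only sketch of the comparison, assuming the encoding described in the question (weekday as 0-6 per weekday(), a start hour, and a length in hours). The function name is_on_shift and the sample shift are made up for illustration, and everything is assumed to live in one naive local time zone:

```python
from datetime import datetime, timedelta


def is_on_shift(now: datetime, weekday: int, start_hour: int, length_hours: int) -> bool:
    """Return True if `now` falls inside the weekly shift described by the arguments."""
    # Walk back to the most recent day that has the shift's weekday.
    days_back = (now.weekday() - weekday) % 7
    start = (now - timedelta(days=days_back)).replace(
        hour=start_hour, minute=0, second=0, microsecond=0
    )
    # If the shift starts later today, the relevant occurrence is last week's.
    if start > now:
        start -= timedelta(days=7)
    end = start + timedelta(hours=length_hours)  # may cross midnight
    return start <= now < end


# A hypothetical shift: Sunday (6) at 22:00 for 8 hours, checked on a Monday.
monday_1am = datetime(2024, 1, 1, 1, 0)
monday_10am = datetime(2024, 1, 1, 10, 0)
print(is_on_shift(monday_1am, weekday=6, start_hour=22, length_hours=8))
print(is_on_shift(monday_10am, weekday=6, start_hour=22, length_hours=8))
```

Because the end is computed by adding a timedelta to the start, shifts that run through midnight need no special casing and no precomputed 48-hour table.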

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
Yeah, I'd always make loggers for a module using __name__, because if you have static utility functions, whether done as free functions or as staticmethod or classmethod, you won't necessarily have access to instance attribute loggers. Now, you might go ahead and make instance loggers that are children of the module logger, but to make the namespacing work properly, so you can do configuration at the top of the hierarchy, a module-level logger should probably be in there somewhere.
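Roughly what that layout looks like, as a sketch; parse_thing and Widget are made-up names, and only logging.getLogger and Logger.getChild come from the standard library:

```python
import logging

# One logger per module, named after the module's import path.
logger = logging.getLogger(__name__)


def parse_thing(raw: str) -> str:
    # Free functions have no instance, so they log through the module logger.
    logger.debug("parsing %r", raw)
    return raw.strip()


class Widget:
    def __init__(self) -> None:
        # Instance logger as a child of the module logger, so configuration
        # applied higher up the hierarchy still reaches it.
        self.log = logger.getChild(type(self).__name__)

    def poke(self) -> None:
        self.log.info("poked")


widget = Widget()
widget.poke()
```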

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
Depending on how you run it, your entrypoint might have a __name__ of '__main__' and not get caught by configuration you apply to your package root, just FYI.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

General_Failure posted:

I'm just going around and around in circles. Is there a way to set Jupyter notebook / lab to accept a local IP address range? Ie 192.168.1.0/24
Really I just want to be able to use my Jetson Nano headless. I mean I can with ssh and rdp, but Jupyter would be a great platform to utilise it with, but the documentation is so drat terrible and I can't work out how to allow connections from only my LAN.
It'd only be me connecting to it so it doesn't need to be anything fancy like Jupyter Hub. It just seems kind of silly to me that a browser based IDE thing is such a drat pain to use from other computers.

You could also give it a static IP if it's just at home. But really, just bind it to `0.0.0.0`; that's no less safe than the local IPs unless the machine has a direct line out to the world. If you're really worried, put nginx in front of it.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

unpacked robinhood posted:

I have a small percentages of files out of a batch that don't parse well when opened with the default open(..)
I've managed to get around this by checking the encoding on each file with filemagic, and setting the encoding parameter accordingly.

Does it feel bad-practicy ?

No, that's exactly what it's for.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
That and inertia, it's been around for long enough that it's already been a glue language for a while, and a lot of system components have (often first-party) bindings like systemd

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

Dominoes posted:

I apologize for the spam - would anyone mind skimming this readme, especially the parts comparing to existing projects? I'm attempting to demonstrate why I decided to build this, and how it improves on existing tools, without sounding snarky or insulting to those tools. I think it's OK, this is difficult to self-assess.

I think it's fine too, at least once you remove those xkcd links, come on.

quote:

A more direct summary here, from anecdotes: I'm still not sure how to make Poetry work with Python 3 on Ubuntu. Pipenv's dep resolution is unusably slow and installation instructions confusing. Pyenv's docs, including installation instructions, aren't user-friendly, and to install Py with it requires the user's comp to build from source. Pip+Venv provides no proper dependency resolution, and is tedious. Conda provides a nice experience, but there are many packages I use that aren't on Conda.

Can't quite parse this - what about Poetry are you trying to make work? The thing where you try and scrape its pyproject.toml entries?

Also I remember from that discord link that you were talking with PSF about this, and probably the single thing that makes me most uncomfortable about the prospect of using this project is that it requires you, specifically, to maintain its backend architecture. There's a link to your github embedded in the source for downloading python binaries, etc. Are you trying to get PSF to handle the managing binaries part? And possibly the serverside dependency listing that's currently on a personal heroku?

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

duck monster posted:

Whats wrong with good old fashioned virtualenv(-wrapper) and pip?

Those things are old, and pretty close to flawless in my opinion. With the added advantage that it isolates system pythons from local pythons without doing an end run around the os distros package manager

Pyenv always seemed to me like an attempt at replicating RVM, and RVM was a mistake. I just dont get what these things bring to the table. Whats the use case?

I mean, the link on the last post of the last page (here it is again: https://github.com/David-OConnor/pyflow) to the readme has pretty explicit content about it

Dominoes posted:

I apologize for the spam - would anyone mind skimming this readme, especially the parts comparing to existing projects? I'm attempting to demonstrate why I decided to build this, and how it improves on existing tools, without sounding snarky or insulting to those tools. I think it's OK, this is difficult to self-assess.

A more direct summary here, from anecdotes: I'm still not sure how to make Poetry work with Python 3 on Ubuntu. Pipenv's dep resolution is unusably slow and installation instructions confusing. Pyenv's docs, including installation instructions, aren't user-friendly, and to install Py with it requires the user's comp to build from source. Pip+Venv provides no proper dependency resolution, and is tedious. Conda provides a nice experience, but there are many packages I use that aren't on Conda.

Oh, one more thing about this - it would be very cool to have a way to disable the patching for multiple-dependency-version support. It has a valid use case for using pyflow to manage trees that will only ever be used with pyflow, but the instant you want to publish a package to pypi, or use your code without pyflow, it won't work (as you mention in the readme), and to me it's much better to see those errors early. Having multiple-version support would be nice, but IMO it's not really something you can have supported in only one, or a subset, of tools.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
I think part of the problem with all of these different packaging solutions and with yours finding a niche is the success Python has had among people whose job description isn't "writing software in python". It's a problem that usually only system scripting languages like sh or powershell have. That means that you have to consider two environments, or maybe two workflows basically entirely separately:

- The developer environment/workflow. This is people who write Python as their primary job. They produce a piece of code in Python and they are committed to either a) defining which interpreters it will work under for people in workflow 2 to use and letting people in workflow 2 worry about external compatibilities; or b) producing a packaged application that (usually) bundles an interpreter
- The user workflow. This is people who don't write Python as their primary job and don't produce packaged output but use Python a lot. This is everybody who uses jupyter, most people who use conda, most people who use py(xy) or spyder or whatever. This is also, ironically enough, things like Mac OSX. They manually handle interpreter installs and expect to not have to think about them. They also expect to consume packaged outputs from the first workflow.

There really hasn't been a tool that successfully handles both workflows. Probably pyenv/tox combined with first pipenv and now poetry is the best developer workflow, which requires managing multiple interpreter installations, automating tests, automating dependencies, and primarily focusing on the package you're developing. venv/virtualenvwrapper and plain pip or conda are the thing that works for the second workflow. Your tool seems targeted at the first workflow, and that's fine! Because pipenv is slow as dogshit and it's annoying to have to jump through hoops with path shims to make sure your dev environment doesn't stomp on your system python or whatever. So when people critique it they should keep in mind what it's aimed at.

Also I don't think this is you (let me know if it is) but somebody else on the python discourse, who mostly works with nix, is doing a similar import rewriter: https://github.com/nix-community/nixpkgs-pytools/#python-rewrite-imports https://discuss.python.org/t/allowing-multiple-versions-of-same-python-package-in-pythonpath/2219/6

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

QuarkJets posted:

conda and jupyter existing primarily for people who do not write python is a... interesting take

I don't mean it like "for simpletons who could not possibly be programmers", I mean it's much rarer that they're used by people whose goal is to ship python code (except for people wanting to ship python code for people to use in jupyter). It's a much stronger point about jupyter than conda though, I'll admit

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
The functional api for things like namedtuples or enums is an insane code smell and a stain on the language imo. I mean come on
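For anyone who hasn't seen the two spellings side by side, a small comparison. The PointF/ColorF/Point/Color names are made up; the calls themselves are the standard collections, typing, and enum APIs:

```python
from collections import namedtuple
from enum import Enum
from typing import NamedTuple

# Functional API: the type name and the member names are passed as strings,
# so editors and type checkers have much less to work with.
PointF = namedtuple("PointF", ["x", "y"])
ColorF = Enum("ColorF", ["RED", "GREEN"])


# Class syntax: the same shapes, but spelled as ordinary definitions.
class Point(NamedTuple):
    x: int
    y: int


class Color(Enum):
    RED = 1
    GREEN = 2


print(PointF(1, 2), Point(1, 2))
print(ColorF.RED, Color.RED)
```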

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

CarForumPoster posted:

Does a jupyter notebook count? I love notebooks. Inline graphs/images and documentation style commenting (markdown cells) are the best.

Notebooks rule because literate programming never should have died. Unfortunately they still have all the same problems that literate programming had, mainly that they end up being great for roughly everything but production use

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

mr_package posted:

edit: One advantage might be the ability to test with multiple versions, e.g. Red Hat is shipping 3.6 now I think; that means things like dataclasses or from __future__ import annotations and who knows what else won't work so if pyenv brings that to the table it is absolutely worth it.

You may already know this but the right tool for this is tox

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
It kinda depends pretty heavily on what your actual use case is. In your trivial example I'd almost prefer
Python code:
[dec(inc(inc(n))) for n in row]
but as those functions you're mapping get more complex I'd decompose them into separate transforms. I personally don't like using the functional syntax in python since nothing else really does. Even (ignoring your caveat about iterators) if you wanted lazy iteration, I'd prefer a generator expression.
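A sketch of the three shapes being compared, assuming inc and dec are trivial one-argument functions and row is a list of ints (the original definitions aren't shown, so these are stand-ins):

```python
def inc(n: int) -> int:
    return n + 1


def dec(n: int) -> int:
    return n - 1


row = [1, 2, 3]

# Eager: one list comprehension with the calls nested.
eager = [dec(inc(inc(n))) for n in row]

# Lazy: the same thing as a generator expression; nothing runs until iterated.
lazy = (dec(inc(inc(n))) for n in row)

# Decomposed: separate transforms, each one a named step.
incremented_twice = (inc(inc(n)) for n in row)
decomposed = [dec(n) for n in incremented_twice]

print(eager, list(lazy), decomposed)
```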

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

QuarkJets posted:

To be more specific:

quote:

Yes, it's fairly common but it's also the wrong way to do it. editable packages exist for a reason.

What's wrong about using PYTHONPATH here? I'm asking because I'd like to know if there's some hidden pitfall that I haven't considered to modifying PYTHONPATH for development sessions.

I have an operational venv that I could clone and then pollute with a git repo, but A) that's kind of a time-and-space waster if the venv is large and B) the git repo may not have the same file structure as the code would once installed into the venv, for whatever reason. Editing files in the venv directly would be bad practice since those changes would be untracked. So editing PYTHONPATH here seems like the best option, but you're saying that's bad because editable packages exist. Could you elaborate? Your followup was kind of generalized, I'd like to get to the bottom of this situation in particular.

Well, I think the point he's making is "you could do that but there's better ways and not knowing those ways might mean you don't know other best practices" which is true but kind of a dickish thing to snidely assert without actually providing an alternative.

The alternative, btw, is "editable packages". Use the -e option to pip install (poetry install does the equivalent for the project you're working on) and it'll put a .egg-link file in your default interpreter site-packages that contains the path where the interpreter should look to find the package. That's the intended workflow as far as I know for installing a package that you're also working on.

Anyway, using PYTHONPATH instead _will_ obviously work but if you do the editable package thing you're putting more of the work onto the interpreter's built in loading methods which means your method of using the package is closer to the nominal end user's which means you might miss fewer errors around prod/dev configuration differences.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

Dominoes posted:


So:
Python code:
pH = sensor.read()
pH = sensor.read(t=23.)
Rust code:
let pH = sensor.read(OnBoard);
let pH = sensor.read(OffBoard(23.));
See if you can spot the strong convention I broke!

I also have a C++ version. I used a third option, function overloading, but this resulted in a load of DRY I'm not happy about. I don't think I can even do enums-that-hold values, or kwargs in C++, so w/e.

Using camelCase for variable names, hate to see it lol

In cpp you might do something like this with default arguments, or inheritance, or template specialization

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

CarForumPoster posted:


Works but this gives a syntax error with the last line:
code:
    [(joel_df.loc[index, column] = add_df[column]) for column in add_df.index]
Am I losing my mind? Whats wrong with that syntax?

Assignments can't be used like that, basically. Elements of list comprehensions have to be expressions that evaluate to something, and an assignment is a statement, not an expression, so it can't appear there. You need to refactor this so that you don't have to use assignment inside the comprehension, somehow (don't know how dataframes work off the top of my head)
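The simplest refactor is a plain for loop. A sketch with plain dicts standing in for the DataFrame row and the Series (an assumption made so it runs without pandas; the names and values are made up):

```python
# Stand-ins for the row being filled in and the values being added.
joel_row = {"name": "joel", "age": None, "city": None}
add_values = {"age": 41, "city": "Tulsa"}

# Not valid Python: an assignment statement can't be a comprehension element.
# [(joel_row[column] = add_values[column]) for column in add_values]

# A loop is the right tool when the point is the side effect, not the list.
for column in add_values:
    joel_row[column] = add_values[column]

print(joel_row)
```

The same loop shape should carry over to the original code, with the .loc assignment as the loop body.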

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

Rocko Bonaparte posted:

I am unfamiliar with them either and it looks like they don't have a simple setter/getter set up so all I could think of was an ugly soup using getattr and setattr. That's strange to me because I would think that kind of list comprehension would be common enough.

Yeah, but it’s really hard to define an ergonomic api for multidimensional data as I continue to face to my despair at my day job

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
A team at a company I used to work for ran a web service that involved hosting user-provided 3d models. For a while at least, to generate the thumbnails (that's it - just the thumbnails) they would send a newly-uploaded model over to some aws loadbalancer that would spin up an instance and invoke blender on it with a predefined scene to render three sizes of thumbnails. When you uploaded a new model the thumbnails could take actual hours before they showed up (depending on the queue length) with no indication anything was happening.

Not python related but literally what we're talking about here lol

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

CarForumPoster posted:

And what they should’ve done is use an AWS Lambda function! Only issue.

A guy comes to the county courthouse looking to change his name. “Oh, what’s your current name?” asks the clerk, and when the man replies, “Jim Hitler” the clerk understands completely. “So what do you want to change it to?” asks the clerk. “Well, I’m thinking about Joe.”

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
All .py files must have encoding declarations and if they don’t there will be an error. Also major strides have been made in lazy bytecode compilation. Sadly these errors combine to mean the error from encoding cannot be associated with a specific python file.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
Inheritance is a path forward there for sure, but I'd also consider combining some more strict use of structural typing. To me, catching AttributeError in the normal course of events is not great even though it will work because it speaks to the organization of the code - why could things that don't have that attribute actually be used there? "It's a third party library that sucks" is a good reason, but if you're writing it all yourself I'd consider either
- Requiring on_enter_room to be defined but having it do nothing if necessary
- Making on_enter_room a class attribute that contains a possibly-empty list of callables (and remember, bound methods are first class so you can put like self.bark in the list)
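A sketch of both options, with made-up class names (Entity, Dog, HookedEntity, Cat). In the second one the class attribute is an empty tuple and each instance that cares sets its own list, since bound methods need an instance to bind to:

```python
from typing import Callable, List, Sequence


# Option 1: every entity defines on_enter_room; the default does nothing.
class Entity:
    def on_enter_room(self) -> None:
        """Default hook: intentionally a no-op."""


class Dog(Entity):
    def __init__(self) -> None:
        self.heard: List[str] = []

    def on_enter_room(self) -> None:
        self.heard.append("woof")


# Option 2: on_enter_room is a possibly-empty sequence of callables.
class HookedEntity:
    on_enter_room: Sequence[Callable[[], None]] = ()


class Cat(HookedEntity):
    def __init__(self) -> None:
        self.heard: List[str] = []
        # Bound methods are first class, so they can go straight in the list.
        self.on_enter_room = [self.meow]

    def meow(self) -> None:
        self.heard.append("meow")


def enter_room(entities: Sequence[Entity]) -> None:
    for entity in entities:
        entity.on_enter_room()  # no AttributeError handling needed


def enter_room_hooked(entities: Sequence[HookedEntity]) -> None:
    for entity in entities:
        for hook in entity.on_enter_room:
            hook()


dog, cat = Dog(), Cat()
enter_room([Entity(), dog])
enter_room_hooked([HookedEntity(), cat])
print(dog.heard, cat.heard)
```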

In general I also find it's really helpful to use type annotations and an external static analyzer like mypy, which can catch a lot of the weirdnesses that might otherwise make you reach for AttributeError while you're writing your code. It's really easy to set up.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

Wallet posted:

As Dominoes said, the type information for arguments/returns is best covered by type hints (which PyCharm has robust support for, like seemingly everything else). If you are dynamically adding methods to a class or something you can add type hints by using a stub. The other stuff people put in multi-line comments at the top of classes/functions almost always seems unnecessary to me. It's either high level information that can be conveyed by just naming the function appropriately, or its specific enough that I'd usually rather it was in regular comments where it's relevant.

Maybe it's just me, but who is this poo poo for?

Python code:
    def users_set_avatar(self, avatar_url, **kwargs):
        """Set a user’s avatar"""

What Bundy said, but also that is indeed a somewhat useless docstring. It doesn't describe anything that the function name doesn't cover. But what if you start making it more in depth so that devs don't have to read the function's code and follow its dependencies, or can view it on that online documentation? Like,

Python code:
def users_set_avatar(self, avatar_url: str, **kwargs):
    """
    Set a user's avatar.

    avatar_url should be a valid url pointing to an image. The image may be jpg or png; other types will raise a RuntimeError.
    If there is a problem with the url and it cannot be resolved or downloaded, a RuntimeError will be raised. This includes 
    certificate chain verification, if this is an https:// url.

    kwargs are passed unchanged to requests.get(). 

    If this user already has an avatar, it will be overwritten. If this was the only user using the avatar, it will be deleted
    from the shared image cache.
    """
Makes it a bit more useful!

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
Please document the assumptions your code makes and its semantics. Please don't say "just read the code". Especially if you're not using typing (with good, well-named, verbose, and specific types) it is really important to keeping the code readable.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

Warbird posted:

Folks is there an idiot-proof way to handle and manage pip packages? I'm forever seeming to have just every drat thing break when trying to set up some interesting looking app or the other and it's continuously a pain in my side.

On the publisher side, nothing special.

On the user side, use virtualenvs. You create a virtualenv for each "app". You can activate or deactivate them. When one is activated, the shell that it's activated in will use the virtualenv's configuration to find and install python packages. You should do this even if you have just one app you need to install, honestly - it makes life so much better.
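The usual way to do this is `python -m venv` from a shell. The same thing can be sketched through the standard library's venv module; make_app_env and the "someapp" name are made up, and with_pip=False is only there to keep the sketch fast and offline (for real use you'd want pip inside the environment):

```python
import tempfile
import venv
from pathlib import Path


def make_app_env(root: Path, app_name: str) -> Path:
    """Create one isolated environment per app under `root`."""
    env_dir = root / f"{app_name}-env"
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env = make_app_env(Path(tmp), "someapp")
    # pyvenv.cfg is what marks the directory as a virtual environment.
    has_config = (env / "pyvenv.cfg").exists()

print(has_config)
```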

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

KICK BAMA KICK posted:

Feel like some might consider this golf-y compared to a full try/except, opinions?
Python code:
value = default
with contextlib.suppress(PossibleException):
    value = operation_that_might_raise()
Just came to mind cause it's the one actual PyCharm bug with a long-open report that I've encountered that they haven't fixed (the first assignment of value will get flagged as having no effect, not understanding that the second assignment is in a block where failure might be expected).

From both a performance standpoint (admittedly probably meaningless) and a readability standpoint (much more important), I would only do this if it was extremely likely that the exception would happen in the normal course of operation. Like at least a 50/50 and maybe more often than not. It adds so much emphasis to saying yup this might happen, yup it for sure might, compared to a try/except (and towards perf, a try/except will typically be implemented to have little effect in the no-exception-raised path since that's the more common case).

Everything about try/except style exception handling is designed for exceptions as, well, exceptions - things that can happen that aren't intended but can still be accounted for on an "oh that's not what I wanted, I guess I'll try this other thing instead" basis. That's why the performance is designed to have little or no impact on the no-exception path, that's why the weight of the syntax is in the exception handler. Switching that up so that the work is done in either case doesn't make me think "oh this code is particularly conscientious", it makes me think "this exception must happen constantly, I wonder if somebody's using it for flow control".
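For comparison, the full try/except spelling of the same thing, as a sketch. PossibleException and operation_that_might_raise are stand-ins with made-up bodies so it runs:

```python
class PossibleException(Exception):
    """Stand-in for whatever the real operation can raise."""


def operation_that_might_raise(fail: bool) -> str:
    if fail:
        raise PossibleException("nope")
    return "computed"


def get_value(fail: bool, default: str = "default") -> str:
    # The happy path reads first; the fallback only appears in the handler.
    try:
        return operation_that_might_raise(fail)
    except PossibleException:
        return default


print(get_value(fail=False), get_value(fail=True))
```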

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

Dominoes posted:

One minor, tangental* step: I'm going to move the python process manager I wrote towards only supporting the latest python minor version. I've been neglecting this project for want of spare time, but this change may simplify the codebase and API.

With that in mind: If you're using 3.6, 3.7 etc, why? Would switching to 3.8 break anything for you?

* Relevant in that it reduces a system state degree-of-freedom.

I work on a project that (and obviously this isn't great) runs a robot but also is installable from pypi for local simulations of actions. While I can bump the python version on the robot all I want, I also want to keep it backwards compatible with a reasonable set of python 3 variants (where backwards compatible mostly deals with asyncio utils and typing).

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

Hughmoris posted:

Rookie question:

How do I find/read built-in function implementations in Python? E.g. I have Python 3.9 and I'd like to see how they implemented the max() or any() function. I've been poking around github and Google but haven't had any luck.

Depends on which ones you mean. max and min and other builtins and language-provided functions specifically are in C, but the standard library modules are by and large in the Lib directory of the cpython repo.
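You can see the split from inside the interpreter with inspect. A sketch (source_or_none is a made-up helper; in the cpython repo the builtins live in Python/bltinmodule.c):

```python
import inspect
import textwrap


def source_or_none(obj):
    """Return the Python source for obj, or None if there isn't any to show."""
    try:
        return inspect.getsource(obj)
    except (TypeError, OSError):
        # TypeError: implemented in C (builtins); OSError: source file missing.
        return None


python_source = source_or_none(textwrap.dedent)  # pure-Python stdlib function
c_source = source_or_none(max)                   # builtin implemented in C

print(python_source is not None, c_source is None)
```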

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

Jeesis posted:

Trying to unfuck some code, for background I am going from python 2.5 to python 3.8


Well, the original appears to be passing the match object, which is a lot more than just the string, so I guess it depends on what process_butts is doing.

I also don't know what string.replace is here. If it's the old function from the string module in Python 2, it's gone in Python 3. You could use re.sub, or I think str.replace on the matched text.
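A sketch of both Python 3 routes. The body of process_butts is made up (the original isn't shown); the point is only that a function passed to re.sub receives the match object, not a string:

```python
import re


def process_butts(match: "re.Match[str]") -> str:
    # Hypothetical body: the callback gets the whole match object and has to
    # return the replacement text.
    return match.group(0).upper()


text = "some butts here"

# re.sub accepts a callable as the replacement and calls it once per match.
via_sub = re.sub(r"butts", process_butts, text)

# For a fixed substring, the str.replace method covers what the Python 2
# string.replace function used to do.
via_replace = text.replace("butts", "BUTTS")

print(via_sub, via_replace)
```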

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

Cyril Sneer posted:

As a follow up, is there a way to allow dynamic assignment of classes? In order to assemble my pipeline, I was thinking of using a configuration dictionary, with the keys specifying the particular model to use, amongst other things.

I.e., something like:

code:
config_dict = { 'predictor': modelA }

You can literally do this, classes are objects too and defining them with the class keyword creates an entry for that class object. You can call them to call their constructors, too

code:

class A:
   def __init__(self): print("im an A")

config_dict = {'A': A}

config_dict['A']()  # prints "im an A"
Not sure if that's what you're looking for, but it is a thing.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

mystes posted:

I guess if you're using mypy you could theoretically make a plugin for it so you could typecheck some sort of wrapper that would generate versions of classes with optional members or something wacky like that, but it seems like way too much work and I assume that wouldn't work with pycharm, etc. anyway.

Yeah this is pretty much the only way to do it if you want static typechecking. Ironically if all you want is runtime typechecking I'm pretty sure you could do it with dynamic classes, but mypy does only AST level evaluation so it can't see it.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
Just be aware that what docker is optimized for is running on linux hosts (or bsd), where it can take advantage of mechanisms that provide strong per-process isolation and filesystem mounts. It is slower on windows, where it goes through an emulation layer. It is dogshit slow on OSX, where it just runs in a linux vm that the docker daemon keeps running while it's active: it eats a ton of system resources, and particularly eats poo poo if you have the temerity to try and share files between the host OS and the VM. It will make everything work cross-platform, but it won't necessarily make everything work well cross-platform.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
I mean unfortunately I think the sort of real answer is "if it matters, don't do that". Like, if this is a toy project, whatever, str.format() is probably fine. But if it's a real project that's, for instance, on the internet, you probably don't want to use strings for it at all. You want to store a computation tree, like an AST (and maybe literally an ast.AST instance that you poke the user-supplied operands into), after parsing and sanitizing user input at the point of entry, both for security reasons and for error locality reasons.
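One possible shape for that, as a sketch only. It assumes the user input is a small arithmetic expression (the original question isn't shown here), and parse_user_expression, evaluate, and the whitelist are made-up names; only the ast and operator calls are real:

```python
import ast
import operator

# Whitelist of operations the user is allowed to express.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _check(node: ast.AST) -> None:
    """Reject anything that is not a plain number or a whitelisted operation."""
    if isinstance(node, ast.Constant):
        if type(node.value) in (int, float):
            return
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        _check(node.left)
        _check(node.right)
        return
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def parse_user_expression(text: str) -> ast.expr:
    """Parse and validate at the point of entry; store the tree, not the string."""
    tree = ast.parse(text, mode="eval")  # raises SyntaxError on garbage
    _check(tree.body)
    return tree.body


def evaluate(node: ast.expr):
    """Walk a tree that has already been validated by parse_user_expression."""
    if isinstance(node, ast.Constant):
        return node.value
    op = _BINARY_OPS[type(node.op)]
    return op(evaluate(node.left), evaluate(node.right))


tree = parse_user_expression("2 + 3 * 4")
print(evaluate(tree))
```

Bad input fails in parse_user_expression, right where it came in, instead of somewhere downstream when the string finally gets used.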

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

former glory posted:

I'm pretty new to python and I've hit a major stumbling block in trying to test out an old github repo. It has a requirements.txt with a good 7-8 libraries listed with specific versions, and I've inferred that this was developed on python 3.5. I've tried creating a venv with my system 3.7 python and using pip and it dies trying to install one of the libs.

I tried using a conda environment with all sorts of python versions from 3.4-3.9 -- thinking maybe conda is smarter about this than pip -- and it dies on a different package with each version. There doesn't seem to be a way to get all the requirements to install. I've managed to get slightly different versions of libs to install except for matplotlib and pyclipper now, and I'm wondering if there's some sort of standard way of working this out and maybe I'm just making this hard for no reason? I've tried having PyCharm process the requirements.txt with conda and even manually just installing the packages.

Sorry, don't want to get all stackoverflow on the thread, just wondering if there's some obvious thing I should be using for this type of problem. :shrug: The whole world of pip, conda, miniconda, anaconda has me reaching all over.

Honestly it kind of sounds to me that time has ended up making that collection of libraries at the versions frozen in the requirements mutually incompatible through transitive dependency, like maybe

- app requires a at version 1 and b at version 2
- a at version 1 requires c at version > 3
- c now has a version 4, which matches a's requirement, but in turn requires, say, b at version 3
- oops you're broken
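The same scenario as a toy model, just to make the failure concrete. Package names, versions, and constraints are all made up, and this is not how pip or conda actually resolve anything:

```python
# What the old requirements file freezes, and what an unpinned transitive
# dependency resolves to today.
PINNED = {"a": 1, "b": 2}
LATEST = {"c": 4}

# (package, version) -> {dependency: (description, predicate on its version)}
REQUIRES = {
    ("a", 1): {"c": ("> 3", lambda v: v > 3)},
    ("b", 2): {},
    ("c", 4): {"b": ("== 3", lambda v: v == 3)},
}


def find_conflicts(pinned, latest, requires):
    """Follow dependencies from the pins and report unmet constraints."""
    chosen = dict(pinned)
    todo = list(pinned.items())
    conflicts = []
    while todo:
        name, version = todo.pop()
        for dep, (wanted, accepts) in requires[(name, version)].items():
            if dep not in chosen:
                # Unpinned, so we get whatever is newest right now.
                chosen[dep] = latest[dep]
                todo.append((dep, chosen[dep]))
            if not accepts(chosen[dep]):
                conflicts.append(
                    f"{name} {version} needs {dep} {wanted}, "
                    f"but {dep} is at {chosen[dep]}"
                )
    return conflicts


conflicts = find_conflicts(PINNED, LATEST, REQUIRES)
print(conflicts)
```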

I don't think there's a specific tool that works from a requirements file and can help here. The way around this is what poetry and pipenv do: take something that holds these semantic requirements, like a requirements.txt, and freeze them into a checked-in exact-version snapshot of the world (or at least the venv), which is what you later use to reinstall things. You can periodically update that snapshot to keep track of updates to dependencies and transitive dependencies.

The path forward might be to scrub through the requirements and the pip or conda output and see which dependencies are causing the problem and add or edit restrictions until you can massage the dependency tree back into a working state.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
a thing that might be nice there is for the url fetcher to be a classmethod:

Python code:
import requests
from bs4 import BeautifulSoup


class PageRipper:
    def __init__(self, html_data: bytes) -> None:
        """Build a PageRipper. In most circumstances, rather than building
        this class directly, use from_url."""
        soup = BeautifulSoup(html_data, 'html.parser')
        # ... do stuff ...

    @classmethod
    def from_url(cls, url: str) -> 'PageRipper':
        # Fetch the page, then hand the raw bytes to the normal constructor.
        response = requests.get(url)
        return cls(response.content)
but yes, in general you'll want to split the concerns of 1) getting the content, 2) parsing the content, and probably 3) doing any processing based on the content into separate, independent chunks of code that can be tested individually.
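
Here's a rough sketch of what that split can look like. To keep it runnable without extra installs it uses the standard library's `html.parser` and `urllib` in place of BeautifulSoup and requests, and every function name in it (`fetch`, `parse_links`, `count_external`, `external_link_count`) is invented for the example.

```python
from html.parser import HTMLParser
from typing import Callable, List
from urllib.request import urlopen


def fetch(url: str) -> bytes:
    """Concern 1: get the content. Network access lives only here."""
    with urlopen(url) as response:
        return response.read()


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value is not None:
                    self.links.append(value)


def parse_links(html_data: bytes) -> List[str]:
    """Concern 2: parse the content. Depends only on its input."""
    collector = _LinkCollector()
    collector.feed(html_data.decode("utf-8", errors="replace"))
    return collector.links


def count_external(links: List[str]) -> int:
    """Concern 3: process the parsed data."""
    return sum(1 for link in links if link.startswith(("http://", "https://")))


def external_link_count(url: str, fetcher: Callable[[str], bytes] = fetch) -> int:
    """Glue the three pieces together; the fetcher can be swapped out in tests."""
    return count_external(parse_links(fetcher(url)))
```

In a test you can call `parse_links` and `count_external` on canned bytes, and pass a fake `fetcher` to `external_link_count`, so nothing ever touches the network.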

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

BeastOfExmoor posted:

I've been working on the code I was asking about a few pages back based on the solid recommendations the thread gave me. Wanted to get some feedback on how I'm approaching some things though.

Here's some very abbreviated code from the main class:

Python code:
class EbirdLocation():
    def __init__(self, ebird_html: Union[bytes, str]) -> None:
        """Class for storing parsed data from an ebird region or hotspot"""
        if not ebird_data_is_valid(ebird_html):
            raise ValueError  # Consider changing this to a custom error type

        soup = BeautifulSoup(ebird_html, 'html.parser')

        self._set_location_code(soup)
        # Wondering about if this is worse than changing _set_location_code to _get_location_code 
        # and having it return the location code.

        if self.location_type == "region":
            self._set_total_ebirders(soup)
            self._set_hotspots_count(soup)
        else:
            self.total_ebirders = None
            self.hotspots = None

    @classmethod
    def get_region(cls, region: str, period: str = "all", sightings_type: str = SightingsType.first_seen.value):
        """Retrieve /region html and create an EbirdLocation object from it"""

        validated_period = Period(period)  # Exception will occur if invalid period is passed
        sightings_type_validated = SightingsType(sightings_type)  # Exception will occur if invalid sightings type is passed

        url = create_ebird_url(region, page_type=PageType.region.value, period=validated_period.value, sightings_type=sightings_type_validated.value)
        logging.debug(f"URL = {url}")
        ebird_html = get_ebird_html(url)
        return cls(ebird_html)

    def _set_location_code(self, soup: BeautifulSoup):
        """Set the ebird location code (Ex. US-WA-061)"""
        canonical_location = soup.find("link", rel="canonical").attrs['href']
        last_non_location_code_character = re.search("/(hotspot|region)/", canonical_location).end()
        # Slice the end off the canonical location to get the location code
        self.location_code = str(canonical_location)[last_non_location_code_character:]

quote:

Questions:
1. Is there any advantage to having a function which returns data which a variable can be set to vs just having the function set self.location_code based on what it's parsed from the HTML?

There are two good reasons to have a function calculate the value of a variable. First, it makes the code easier to read, and, not joking, in something like 95% of use cases the single most important thing you can do with code is make it clear and easy to read. You'll read it more often than you write it. Second, if the value won't change very often, it's more efficient to just calculate it once and put it in your pocket.

That said, you typically don't want to have functions that create member variables that you call in initializers. Python doesn't really have separate ahead of time class member definitions - there's not a single unified place to look, required by the language, where you'll see all the member variables of a class. This can be a tremendous pain. The overwhelmingly common idiom is therefore to set all member variables at least once in __init__. You can change them later, but make sure that every member variable shows up in __init__ somehow, even if it's just to set it to None to mark that it's there, so you can always scan through __init__ and see all the member variables. Doing this also makes it more friendly to the type system, which is not good at tracking side effects like "this method happens to set this member variable". If you ran mypy on what you have I think it would complain that location_type was referenced before being created in __init__.

Summing that all up, I'd keep _set_location_code, but make it end with
pre:
return str(canonical_location)[last_non_location_code_character:]
and have __init__ do
pre:
self.location_code = self._set_location_code(soup)
That's both a) easier to read and b) friendlier to mypy
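
Putting both points together, a minimal sketch might look like the following. The `Location` class, the `_parse_*` helper names, and the example.invalid URL are stand-ins I made up; it also works on the canonical URL string directly so it runs without BeautifulSoup.

```python
import re
from typing import Optional

_LOCATION_RE = re.compile(r"/(hotspot|region)/")


class Location:
    def __init__(self, canonical_url: str) -> None:
        # Every attribute shows up here at least once, so a reader (or mypy)
        # can see the full set of members by scanning __init__.
        self.location_type: str = self._parse_location_type(canonical_url)
        self.location_code: str = self._parse_location_code(canonical_url)
        self.total_ebirders: Optional[int] = None  # only meaningful for regions

    @staticmethod
    def _parse_location_type(canonical_url: str) -> str:
        """Return 'hotspot' or 'region' based on the URL path."""
        match = _LOCATION_RE.search(canonical_url)
        if match is None:
            raise ValueError(f"no location in {canonical_url!r}")
        return match.group(1)

    @staticmethod
    def _parse_location_code(canonical_url: str) -> str:
        """Return everything after /hotspot/ or /region/ (Ex. US-WA-061)."""
        match = _LOCATION_RE.search(canonical_url)
        if match is None:
            raise ValueError(f"no location in {canonical_url!r}")
        # Slice off everything up to and including the /region/ part
        return canonical_url[match.end():]


loc = Location("https://example.invalid/region/US-WA-061")
print(loc.location_type, loc.location_code)
```

The helpers return values and have no side effects, so you can test them on their own, and `__init__` reads as a plain list of what the object holds.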

quote:

2. The classmethod is written to give an easy way to create an object with one line of code, while still giving the flexibility to generate the object from HTML which allows for a lot of flexibility in testing and use. Any issues with how this is written?
Love it.

quote:

3. I omitted the actual function definitions for brevity, but currently this class can generate off either a region page or a hotspot page. Hotspot pages don't contain total_ebirders or hotspot counts, so currently I'm just setting those to None for hotspots, but I'm weighing the advantages of making region and hotspot subclasses. I guess the biggest issue would be that I'd need to do the detection before creating the class rather than allowing it to be done based on what BeautifulSoup finds in the HTML.

Well, nothing's stopping you from having __init__ take a BeautifulSoup object instead of the raw html; you're doing a lot of work in that get_region method after all.
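
If you do go the subclass route, one way it could look is to parse first and then let a small factory pick the class. This is only a sketch: `ParsedPage` is a made-up stand-in for the BeautifulSoup object (or whatever you pull out of it), and `Region`, `Hotspot`, and `location_from_page` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedPage:
    """Hypothetical stand-in for an already-parsed page."""
    location_type: str  # "region" or "hotspot"
    location_code: str
    total_ebirders: Optional[int] = None


class Location:
    def __init__(self, page: ParsedPage) -> None:
        self.location_code: str = page.location_code


class Region(Location):
    def __init__(self, page: ParsedPage) -> None:
        super().__init__(page)
        # Only regions carry this, so only Region has the attribute
        self.total_ebirders: Optional[int] = page.total_ebirders


class Hotspot(Location):
    """Hotspot pages have no ebirder totals, so nothing extra to store."""


def location_from_page(page: ParsedPage) -> Location:
    """Do the detection once, after parsing, and build the matching class."""
    if page.location_type == "region":
        return Region(page)
    if page.location_type == "hotspot":
        return Hotspot(page)
    raise ValueError(f"unknown location type: {page.location_type!r}")
```

The detection still happens based on what's in the page; it just moves out of `__init__` and into the factory, so neither subclass needs attributes that are always None.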

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

Epsilon Plus posted:

Isn't the instance of the object still created at that point, though? Wouldn't I just end up with an uninitialized/partially initialized object of that class? I mean if it raises an exception right then and brings down the whole program it's not really an issue anymore per se, but I don't want the entire program to fall over irrecoverably over it.

Python object initialization isn't done until __init__ is complete, so this

Python code:
class A:
    def __init__(self):
        raise Exception("i shan't be doing this while my hair is wet")

try:
    a = A()
except Exception:
    print("drat, the hair was wet")
print(a)

will result in a NameError at print(a), since the raise happened while the right-hand side of the assignment was still being evaluated, so a never got bound.

In 99.99% of cases you shouldn't use __new__ and if you're in the .01% you'll know it, guido.

If you're worried about saving memory, hey it's a gc'd language it'll go away eventually probably

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

Epsilon Plus posted:

poo poo, I thought it was done when __new__ was complete... which would rely on __init__ being complete...

... why did I need to ask this question, again? I could have sworn it creates an instance, then uses __init__ to populate it afterwards.

i'm pretty sure that's how it works internally, but when you're creating an object, __new__ and __init__ both run inside the same A() call expression, so if either one raises, the result never gets bound to a name
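
A quick way to see the ordering for yourself is the sketch below. The `calls` list and the `build_or_none` helper exist only for this demo.

```python
calls = []


class A:
    def __new__(cls):
        # Runs first: allocates the (still unpopulated) instance
        calls.append("__new__")
        return super().__new__(cls)

    def __init__(self):
        # Runs second, on the instance __new__ returned
        calls.append("__init__")
        raise RuntimeError("hair is wet")


def build_or_none():
    a = None
    try:
        a = A()  # the raise escapes A() before the assignment happens
    except RuntimeError:
        pass
    return a


result = build_or_none()
print(calls, result)
```

Both methods ran, so an instance did exist for a moment, but `a` still holds its old value because the assignment never completed. The half-built object has no name pointing at it, so nothing can use it by accident.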

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man

CarForumPoster posted:

Thread, I interview Python coders. I often get new grads who list themselves as Python (Advanced) in their skills section. 100% of them I am disappointed with

- what don’t you like about python?
- what do you wish it did better?
- why don’t you think it does things that way?
- what makes you use it in spite of that?

I think if someone can answer those in a cogent and consistent way they’re pretty much what you’re looking for without needing language trivia
