SurgicalOntologist
Jun 17, 2004

Haha, I was in VR at the time, actually (a lowly lab manager messing with Matlab when there was no equipment to order).

SurgicalOntologist
Jun 17, 2004

Cyril Sneer posted:

But I'm not sure how to do that without requiring that somehow everyone put their trackers into the same file. Maybe some __init__.py trickery can be helpful here?

Yeah, that's basically it. There are different approaches, but what I do is ask, for every __init__.py, "is there anything in this package [folder with an __init__.py] that I want to be able to import from the level above?" If the answer is yes, do this (using your example):

Python code:
# MLP/trackers/__init__.py
from MLP.trackers.trackerA import BobsTracker
from MLP.trackers.trackerB import JoesTracker
Then anywhere else you can do

Python code:
from MLP.trackers import BobsTracker, JoesTracker
Edit: Essentially, MLP/trackers/__init__.py corresponds to MLP.trackers in the same way that MLP/trackers/trackerA.py corresponds to MLP.trackers.trackerA. Anything you want to be accessible in MLP.trackers, you must define or import in MLP/trackers/__init__.py.
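A minimal, self-contained sketch of that re-export pattern, assuming the layout MLP/trackers/trackerA.py and MLP/trackers/trackerB.py from the example (the class bodies are placeholders). It writes the package into a temporary directory so the import can actually be exercised; the `__all__` line is an optional extra that documents what `from MLP.trackers import *` exposes.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package on disk: MLP/trackers/{__init__,trackerA,trackerB}.py
root = Path(tempfile.mkdtemp())
trackers_dir = root / "MLP" / "trackers"
trackers_dir.mkdir(parents=True)
(root / "MLP" / "__init__.py").write_text("")
(trackers_dir / "trackerA.py").write_text("class BobsTracker:\n    pass\n")
(trackers_dir / "trackerB.py").write_text("class JoesTracker:\n    pass\n")

# The package's __init__.py pulls the classes up one level (absolute imports)
(trackers_dir / "__init__.py").write_text(textwrap.dedent("""\
    from MLP.trackers.trackerA import BobsTracker
    from MLP.trackers.trackerB import JoesTracker

    __all__ = ["BobsTracker", "JoesTracker"]
"""))

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Callers no longer need to know which file each tracker lives in
from MLP.trackers import BobsTracker, JoesTracker

print(BobsTracker.__module__)  # the class itself still lives in MLP.trackers.trackerA
```

Each contributor can keep their tracker in its own file; the only shared edit is one import line in `__init__.py`.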

SurgicalOntologist fucked around with this message at 19:52 on Dec 26, 2020

SurgicalOntologist
Jun 17, 2004

Rocko Bonaparte posted:

I'm wondering how other people might be dealing with a situation of scratchwork classes and rigid type checking. I have some classes that are getting progressively fed information so their fields start out optional. When they are in-use, these will be filled.

Type checking hates this if I don't null check everything once the instance is in-use. One of the situations was a good candidate to use a builder because there was a lot of logic associated with generating defaults. However, another one is just some scratchwork.

So right now I'm trying a dataclass that has the fields optional while it's being internally messed around, but it basically returns a version of itself with fields not optional and not null. If I add a property, I will have to remember to put it in both places. I'm not sure about doing reflection stuff to auto-populate because I have to specify the typing information. For as much as I'm using it, that's fine, but it smells a little bit. Has anybody else worked through a problem like this before?

These containers are areas where people love to put the wrong thing in the wrong spot so I specifically want type checking to get involved for them.

I try to design classes so that there are no optional fields and an instance only gets created when it is truly "ready". The idea of strong typing is that it shouldn't even be possible for an invariant-breaking instance to exist. Typically that means custom constructors (classmethods that return an instance) for the various construction paths. Maybe that's what you meant by a builder, I'm not sure.
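A rough sketch of the "no optional fields, one classmethod per construction path" idea. The `Measurement` class and its two constructors are hypothetical stand-ins, not anything from the quoted post; the point is that every field is required, so a type checker never has to deal with a half-filled instance.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    # Every field is required: an instance can only exist once it is "ready"
    sensor_id: str
    value: float
    unit: str

    @classmethod
    def from_csv_line(cls, line: str) -> Measurement:
        """Construction path 1: parse a raw 'id,value,unit' line."""
        sensor_id, value, unit = (part.strip() for part in line.split(","))
        return cls(sensor_id, float(value), unit)

    @classmethod
    def zero(cls, sensor_id: str, unit: str = "V") -> Measurement:
        """Construction path 2: a default reading for a known sensor."""
        return cls(sensor_id, 0.0, unit)
```

Any messy intermediate state lives inside the classmethod as local variables, so it never leaks into the type.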

I guess your dataclass thing is kind of an extreme version of that. But if you have such a variety of paths that an alternate constructor is not enough, and you need some of the same properties before the instance is "valid" that you also use on valid instances, that's a big sign to me that you haven't picked the right classes in the first place. For example, maybe you can group some of those fields into another class that gets instantiated, has its own properties, and then becomes an attribute of an instance of the higher-level class. What you're dealing with tends to happen to me when I put everything related into one class. If I had to guess, your problem is having too few classes.
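A small sketch of splitting fields into a smaller class that is complete on its own, with hypothetical names (`Dimensions`, `Panel`). The property you need early (`area`) lives on the small class, so it is usable before the bigger object exists, and neither class needs optional fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimensions:
    # The fields (and derived properties) that are needed early live here
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


@dataclass(frozen=True)
class Panel:
    # The higher-level class only exists once everything is known
    label: str
    dimensions: Dimensions


dims = Dimensions(2.0, 3.0)
print(dims.area)  # usable before any Panel exists
panel = Panel("front", dims)
```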

SurgicalOntologist
Jun 17, 2004

Haha, you didn't have to edit out calling me out. I'll cop to not knowing the builder pattern. I guess that's why I'm in the Python thread :eng99:

SurgicalOntologist
Jun 17, 2004

QuarkJets posted:

So I'm watching the NVidia GTC talks this week and I just want to express my tentative excitement over legate.numpy and legate.pandas. They were in closed-access last year, but now they're out, and they're basically drop-in replacements for numpy and pandas for parallelized GPU computing

Nice! Any idea how it compares to JAX? It looks like legate makes it easier to work with multiple GPUs, while JAX targets the single-GPU use case more, at least from my five-minute analysis.

At work we use xarray, so putting our entire codebase on one of these backends always seems a bit further off, although it looks like there is the beginning of some support, so maybe we should try it. We manually cast to JAX arrays in a few slow functions, but it's annoying to lose the conveniences of xarray. I wonder if the fact that legate supports pandas makes xarray more doable (xarray basically generalizes pandas to high-dimensional arrays, so you get named dimensions rather than "columns" and "index"; it uses pd.Index under the hood).

I'm registered for the conference but haven't been able to attend anything; I've got a long list of talks I want to try to watch over the weekend.

SurgicalOntologist
Jun 17, 2004

I do have a personal project that I develop against Docker. If it were just the one container I probably wouldn't bother, but once I had a Docker Compose setup with rq, Cloud SQL Proxy and some other dependencies, it was easier to use it for development than to install those things "natively" (although that wouldn't be hard either). For development I just volume-mount the source code directory. There are probably better ways, but it works well for a Python Flask project at least.

SurgicalOntologist
Jun 17, 2004

You can pass an open file directly to the stdout/stderr arguments of Popen or any of the subprocess helper functions.

Edit: oops, I misunderstood; I missed the console part. You basically want a tee. I don't know of any shortcut besides iterating over the output yourself. Or maybe just use tee outside Python.
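A minimal sketch of the "iterate yourself" version of tee, assuming you want stdout and stderr merged and copied line by line to both the console and a log file. `run_tee` is a hypothetical helper name; everything it uses is standard `subprocess`.

```python
import subprocess
import sys
from typing import TextIO


def run_tee(cmd: list[str], log: TextIO) -> int:
    """Run cmd, echoing each output line to the console and to log; return the exit code."""
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered (only meaningful in text mode)
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            sys.stdout.write(line)  # the console copy
            log.write(line)         # the file copy
    # Leaving the with-block waits for the process, so returncode is set
    return proc.returncode
```

Usage would be something like `with open("run.log", "w") as log: run_tee(["python", "script.py"], log)`.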

SurgicalOntologist fucked around with this message at 19:23 on Jun 7, 2021

SurgicalOntologist
Jun 17, 2004

Never mind.

SurgicalOntologist
Jun 17, 2004

It's probably overkill and I have no idea what it offers for serializing expressions, but it could be interesting to use the symbolic computation library SymPy to represent variables and expressions. I think it can parse strings so you wouldn't have to worry about that. You probably don't need much math stuff but you can evaluate expressions, and maybe some other features like simplification or equation display would be appealing.
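A quick sketch of what that could look like with SymPy (a third-party package). The expression string and variable names are made up for illustration, and the string round-trip at the end is just one simple storage option, not necessarily the best way to serialize expressions.

```python
import sympy

# Parse a user-supplied string into a symbolic expression
expr = sympy.sympify("2*x + y**2 - x")
x, y = sympy.symbols("x y")

print(expr)                     # like terms are combined automatically: x + y**2
print(expr.free_symbols)        # the variables the expression depends on
print(expr.subs({x: 3, y: 2}))  # evaluate with concrete values

# One simple way to store an expression: its string form round-trips through sympify
restored = sympy.sympify(str(expr))
```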

SurgicalOntologist
Jun 17, 2004

It sort of looks like a pandas MultiIndex as columns, except the labels aren't repeated. I would suggest "manually" constructing a MultiIndex for the column axis. Then you can stack (or similar) to tidy the dataset.
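A sketch of the "manual MultiIndex" approach, assuming a two-row header where the top label appears only over the first column of each group. The header values and data are hypothetical; the forward-fill assumes the very first top-level label is not blank.

```python
import pandas as pd

# Hypothetical header rows: the top-level label is only written once per group
top = ["site_a", "", "site_b", ""]
bottom = ["temp", "rain", "temp", "rain"]

# Forward-fill the blanks so every column gets its top-level label
# (assumes the first entry is non-blank)
filled: list[str] = []
for label in top:
    filled.append(label or filled[-1])

columns = pd.MultiIndex.from_arrays([filled, bottom], names=["site", "measure"])
df = pd.DataFrame(
    [[20.1, 0.0, 18.4, 1.2],
     [21.3, 0.5, 19.0, 0.0]],
    columns=columns,
)

# Move the "site" level from the columns into the row index: one row per (row, site)
tidy = df.stack(level="site")
print(tidy)
```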

If you really want to try the 3D thing, the library you want is xarray. But I don't think it would actually help with reading the data, just with manipulating it, depending on what you need to do. And it's probably overkill in this case; it really shines with data on a grid (e.g. volumetric).

SurgicalOntologist
Jun 17, 2004

That's awesome, how did I miss mamba for so long! My team will be very pleased to have faster builds.

SurgicalOntologist
Jun 17, 2004

Funny, I'm the opposite of cinci zoo sniper, the first thing I do in every project is create an empty environment and pip install -e the directory. And yes, I always use absolute imports.

SurgicalOntologist
Jun 17, 2004

To speak for Electoral Surgery, I think the idea is that it's a little weird that docstrings are string literals rather than a type of comment. In my experience it gives people the idea they can use triple-quoted string literals throughout their code as multiline comments. I've had to explain that distinction to more than a few less experienced Python developers.
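A small illustration of that distinction: only the first string in a function body becomes `__doc__`; a later triple-quoted string is an ordinary expression statement that does nothing at runtime. The `ast` check shows that a real comment never reaches the syntax tree, while a bare string literal does. The function is a made-up example.

```python
import ast


def area(r: float) -> float:
    """Return the area of a circle of radius r."""  # first statement: the docstring
    """This second string is NOT a comment: it is an expression statement that does nothing."""
    return 3.14159 * r * r


print(area.__doc__)  # only the first string is kept as the docstring

# Comments never reach the syntax tree, but a bare string literal does
tree = ast.parse('x = 1  # a real comment\n"""a bare string"""\n')
print([type(node).__name__ for node in tree.body])  # ['Assign', 'Expr']
```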

(edit: wrong poster named)

SurgicalOntologist fucked around with this message at 22:20 on Mar 20, 2022

SurgicalOntologist
Jun 17, 2004

12 rats tied together posted:

you can use triple quoted strings as multi line comments, just don't have them also be docstrings. iirc the parser ignores them unless they are the first line in a class or module

Maybe it ignores all string literals that aren't assigned to a variable, but it doesn't ignore all triple-quoted strings. I use them all the time for multiline strings.

SurgicalOntologist
Jun 17, 2004

I prefer dataclasses myself, but I felt like I should add to this conversation that TypedDict exists.
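For anyone who hasn't seen both side by side, a minimal comparison (the `Point` names are hypothetical): a TypedDict only exists for the type checker and is a plain dict at runtime, while a dataclass is a real class with generated `__init__`, `__repr__` and `__eq__`.

```python
from dataclasses import dataclass
from typing import TypedDict


class PointDict(TypedDict):
    # Only a type-checker concept: at runtime this is a plain dict
    x: int
    y: int


@dataclass
class Point:
    # A real class: attribute access, generated __init__/__repr__/__eq__
    x: int
    y: int


as_dict: PointDict = {"x": 1, "y": 2}
as_obj = Point(x=1, y=2)

print(type(as_dict).__name__, as_dict["x"])  # dict 1
print(as_obj, as_obj.x)                      # Point(x=1, y=2) 1
```

TypedDict tends to fit when the data has to stay a dict (JSON payloads, existing APIs); a dataclass fits when you want behavior and attribute access.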

SurgicalOntologist
Jun 17, 2004

For CLIs I too use click for anything moderately complicated and docopt for smaller scripts.

If you're looking for something interactive, perhaps prompt_toolkit is what you want. For something more flexible (i.e. not necessarily prompt-based), urwid is super powerful.

SurgicalOntologist
Jun 17, 2004

Looks like it exists: https://github.com/click-contrib/click-repl

Edit: continuing to browse this list, looks like that's not your only choice.

SurgicalOntologist fucked around with this message at 21:34 on Jul 11, 2022

SurgicalOntologist
Jun 17, 2004

Assuming you are just grouping on 'work_start', you can get your equivalence classes with something close to:

Python code:
(
    df.work_start
    .diff().fillna(0)
    .gt(INTERVAL_THRESHOLD).astype(int)
    .cumsum()
)
Basically: take the differences and check whether each one is greater than the threshold, so that a 0 means the row belongs with the previous one and a 1 starts a new group; then take the cumsum. Every row that should be grouped with the previous one ends up with the same value as it. The fillna(0) handles the first row, whose diff is NaN.

Then this part I'm not sure about, but you should probably define a function that outputs each row you want from a subset of the DF, let's call it merge_group(group: DataFrame) -> Series (I think it should output Series, a row of a DF).

Python code:
(
    df.groupby(
        df.work_start
        .diff().fillna(0)
        .gt(INTERVAL_THRESHOLD).astype(int)
        .cumsum()
    )
    .apply(merge_group)
)
Edit: if you want to check between end and start, the classes are something like:
Python code:
(
    df.work_start
    .sub(df.work_end.shift(1)).fillna(0)
    .gt(INTERVAL_THRESHOLD).astype(int)
    .cumsum()
)
Edit2: perhaps 0 and INTERVAL_THRESHOLD need to be pd.Timedelta values, or whatever matches your column's dtype.

Edit3: because I'm bored I checked pandas docs and I think you could do
Python code:
(
    df.groupby(
        df.work_start
        .sub(df.work_end.shift(1)).fillna(0)
        .gt(INTERVAL_THRESHOLD).astype(int)
        .cumsum()
    )
    .aggregate(dict(
        work_start='min',
        work_end='max',
    ))
)
Although if you have more columns you may still need a custom function.
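A runnable sketch of the end-to-start version with made-up timestamps and a hypothetical 30-minute threshold. It uses pd.Timedelta for the threshold and relies on the fact that the first row's gap is NaT, which compares as False, so the fillna step isn't needed here. The named-aggregation form of agg is used instead of the dict form.

```python
import pandas as pd

# Hypothetical threshold: a gap longer than this starts a new group
INTERVAL_THRESHOLD = pd.Timedelta(minutes=30)

df = pd.DataFrame({
    "work_start": pd.to_datetime(["2022-07-28 09:00", "2022-07-28 09:50",
                                  "2022-07-28 11:15", "2022-07-28 11:25"]),
    "work_end": pd.to_datetime(["2022-07-28 09:45", "2022-07-28 10:30",
                                "2022-07-28 11:20", "2022-07-28 12:00"]),
})

# Gap between each start and the previous end; the first row's gap is NaT,
# and NaT compares as False, so no fillna is needed
gap = df.work_start - df.work_end.shift(1)
group_id = gap.gt(INTERVAL_THRESHOLD).astype(int).cumsum()

merged = df.groupby(group_id).agg(
    work_start=("work_start", "min"),
    work_end=("work_end", "max"),
)
print(merged)
```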

SurgicalOntologist fucked around with this message at 19:14 on Jul 28, 2022

SurgicalOntologist
Jun 17, 2004

These types of problems are my bread and butter (finding consecutive sections of a timeseries that meet a set of criteria), and the basic pattern is:
  • Apply the criteria to get a boolean
  • Get a diff on the boolean to find "edges" of the groups
  • Take a cumsum to get indices of the groups
  • (It's not your case but often in my case I want a "null group" in between the groups. In that case, apply back the False values from step 1 to fill those as zeros or null.)
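The four steps above, sketched as a pandas helper. `label_runs` and the sample values are hypothetical; "diff on the boolean" is done here as a comparison with the shifted mask, which flags every row whose value differs from the previous one.

```python
import pandas as pd


def label_runs(mask: pd.Series) -> pd.Series:
    """Give each run of consecutive True values its own id; rows where mask is False get NaN."""
    # Step 2: an "edge" is any row whose value differs from the previous row
    edges = mask.ne(mask.shift(1, fill_value=False))
    # Step 3: the cumulative count of edges is constant within a run
    run_id = edges.astype(int).cumsum()
    # Step 4 (optional): put the "null group" back for rows failing the criteria
    return run_id.where(mask)


values = pd.Series([1, 5, 6, 2, 7, 8, 9, 0])
labels = label_runs(values > 4)  # Step 1: the criteria as a boolean Series
print(labels.tolist())           # [nan, 1.0, 1.0, nan, 3.0, 3.0, 3.0, nan]
print(values.groupby(labels).agg(["first", "last", "count"]))
```

The ids aren't consecutive (1, 3, ...), but they only need to be distinct; groupby drops the NaN rows by default.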

SurgicalOntologist
Jun 17, 2004

Josh Lyman posted:

Yeah it's tricky. Not sure if this will help but I'll try to describe it better: Imagine a dataset that has the daily weight and calories of every goon and I can request them by username and month. The server normally returns this as a nest list

[[weights for month] , [calories for month]]

If I request a username that doesn't exist, the API returns [] which I can handle fine.

The problem I'm trying to understand is when it returns [[] , []]. This is where stopping the notebook and rerunning the cell in Jupyter "fixes" the issue and the nested list will be correctly populated. If it were strictly a rate limit issue, you'd think a 5 minute sleep before the code block tries the request again should work but it doesn't.

To be clear, restarting the client code is what solves it? I mean, you are not restarting the server or anything I presume. Are you doing anything in the client code such as using sessions? Or any other state that gets cleared when you restart the kernel? It must be something like that.

There's no reason the same request should get a different result unless something changes about the request itself. The server doesn't know the environment the request was made in unless it's passed in the request, which could happen with sessions, for example. You might be able to inspect these kinds of things in the request object, or even capture the traffic with Wireshark and see what's different between the requests that don't work and the next one after restarting the client.

Edit: reread the post, if you're not even restarting the kernel but just re-running the cell... that's potentially even weirder although depending on what's in the cell I suppose it could be equivalent. In the end it comes down to what's in the cell. I'd put my money that you're building up state somehow.

Edit2: just to elaborate on what I would do to debug, specifically inspecting the requests objects: detect when the situation happens, but break out of the loop instead of sleeping. Take a look at r, r.headers, r.cookies, and especially r.request (which encapsulates what you're sending to the server). In the next cell, make another request but don't overwrite r; call it r2 or whatever. Assuming that request works, compare them side by side and find the difference in what you are sending to the server.
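A sketch of the side-by-side comparison, assuming the code uses the requests library. `describe` and `diff_requests` are hypothetical helpers; they only read attributes of `requests.PreparedRequest` (`method`, `url`, `headers`, `body`), which is what `r.request` is.

```python
import requests


def describe(prep: requests.PreparedRequest) -> dict:
    """Everything the server actually receives from this request."""
    return {
        "method": prep.method,
        "url": prep.url,                # includes the encoded query string
        "headers": dict(prep.headers),  # cookies end up in the Cookie header
        "body": prep.body,
    }


def diff_requests(a: requests.PreparedRequest, b: requests.PreparedRequest) -> dict:
    """Map each differing field to its (a, b) values."""
    da, db = describe(a), describe(b)
    return {key: (da[key], db[key]) for key in da if da[key] != db[key]}
```

Usage would be `diff_requests(r.request, r2.request)` with r the failed response and r2 the working one; an empty dict means the bytes really are the same.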

SurgicalOntologist fucked around with this message at 23:44 on Aug 19, 2022

SurgicalOntologist
Jun 17, 2004

Josh Lyman posted:

What do you mean by building up state?
I guess I mean that whatever you're doing in the cell is causing one request and another to not actually be the same, despite appearing to have the same parameters. The actual request you're sending might differ in some way depending on the state of all the code you're running in that cell. It's the only way I can make sense of it.

Josh Lyman posted:

Edit: Regarding your edit2 , that was one of the first things I tried and figured out the data exists. So when I see the script is inside the sleep loop, I’ll stop the notebook and manually run the API call in another cell using the “current” parameter values and it works.

Yeah, but what I'm suggesting is taking that "r" object you get, specifically r.request, which is an object of type requests.PreparedRequest that encapsulates all the information you're sending to the server, and taking a close look. See if r.request and r2.request differ in any way, assuming r is a failed response and r2 a successful one. Are the request headers exactly the same? Is the request body exactly the same?

To put it another way, if you send the same exact set of bytes to the server, it shouldn't matter whether you send it from one cell or another (given that you seem to have ruled out rate-limiting on the server side with your sleep experiment). Much more likely than anything else I can think of, you are not actually sending the same bytes, but somehow the context of the cell (i.e. the loop) is changing what bytes you send to the server. Inspecting your PreparedRequest will help you figure out if that's the case. Also, other fields of the response (r) itself might be interesting too, perhaps there is a clue in the status code or the headers returned from the server.

SurgicalOntologist
Jun 17, 2004

Josh Lyman posted:

This kinda makes sense to me. Stupid question, how do I run requests on a local server? The API call work like
Python code:
from localserver import localpackage
from localserver.localpackage.parameters import queryname
%env no_proxy.com=.localserver.com,localhost

my code loop
	localpackage.queryname(param1, param2...)
In a browser window, I go to http://jupyterlab.localserver.com/user/username/lab which has a one-click AWS Cognito login. The opens JupyterLab where I run all my notebooks. The same server also handles the API call (I think). Since I'm using any URLs or user/pass for the API call, I'm not sure how to get requests to work with this setup.

I have no idea what you mean by this localserver thing and Google isn't helping. I assumed from previous posts that you were using requests. If that's not the case, my specific advice is less relevant but probably the answer lies in this direction. Is it really running on localhost? Do you control the server? Where did the code for localserver.localpackage come from?

This makes it more complicated, because the state that's affecting the requests might be coming not from your code but from the code you're importing, in which case there might not be much you can do about it. Whether it's possible to use requests instead depends on how much info you can get about the actual HTTP API and on the complexity of the code (you can see the code in JupyterLab by typing the function/class/module name followed by ??). They are probably using requests in that code, but not necessarily. I mean, with the info we've seen, maybe it's not even an HTTP API but a simulation of one, in which case all bets are off.
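Outside of IPython's `??`, the standard library's `inspect` module gives roughly the same view. Here `json.dumps` is only a stand-in; you would point it at the imported function (e.g. `localpackage.queryname`) to see whether it builds HTTP requests and with what.

```python
import inspect
import json

# Roughly what `json.dumps??` shows in Jupyter: the source of the object
source = inspect.getsource(json.dumps)
print(source.splitlines()[0])             # the def line
print(inspect.getsourcefile(json.dumps))  # which file it came from
```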

SurgicalOntologist
Jun 17, 2004

Based on "self" I'm guessing that's in a class. Functions are bound to the instance as methods, so it should be key=self.myFunc1.
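A minimal sketch assuming the `key=` in question is something like the key argument of `sorted()`; the `Ranker` class is hypothetical. Because `self.score` is a bound method, `sorted()` can call it with just the item and `self` is supplied automatically.

```python
class Ranker:
    """Hypothetical class: sorts items by per-item weights stored on the instance."""

    def __init__(self, weights: dict[str, int]) -> None:
        self.weights = weights

    def score(self, item: str) -> int:
        return self.weights.get(item, 0)

    def rank(self, items: list[str]) -> list[str]:
        # self.score is a bound method: sorted() calls it with just the item
        return sorted(items, key=self.score)
```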

SurgicalOntologist
Jun 17, 2004

I tested in a notebook and couldn't reproduce. Even reimporting doesn't reset the value.

How are you referencing params in the algorithm? Is it in the notebook or imported? Could be something there.
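For reference, this is why re-importing doesn't reset anything: a second `import` just returns the cached module from `sys.modules` without re-running its body, whereas `importlib.reload` does re-run it. `params_demo` is a hypothetical module written to a temp directory for the demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module with module-level state, written to a temp dir for the demo
tmp = Path(tempfile.mkdtemp())
(tmp / "params_demo.py").write_text('params = {"lr": 0.1}\n')
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

import params_demo

params_demo.params["lr"] = 0.5  # mutate the module-level dict

import params_demo  # already in sys.modules: the module body is NOT re-run
after_reimport = params_demo.params["lr"]

importlib.reload(params_demo)  # re-executes the module body, creating a fresh dict
after_reload = params_demo.params["lr"]
# Note: names bound earlier via `from params_demo import params` still point at the old dict

print(after_reimport, after_reload)  # 0.5 0.1
```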

SurgicalOntologist
Jun 17, 2004

I would fork remotezip and modify it to optionally accept a session. It looks like it's fewer than 200 lines, so it should be pretty easy, and it's a worthwhile contribution to give back to the project.

SurgicalOntologist
Jun 17, 2004

Nice! I didn't know about remotezip before you mentioned it, I'm going to keep it in mind in case the need arises.

SurgicalOntologist
Jun 17, 2004

That reminds me, I have an ongoing argument with a colleague about private (by convention) methods and attributes in Python. He wants everything to be private unless there is a strong reason to make it public. His classes typically have only one or two public methods. I prefer everything to be public unless there is a good reason to make it private. I typically only make private methods for little helper methods that I refactor out of other methods.

Is the almost-everything-private convention common in Python? In my experience no, but when we debate he can find examples in relatively well-established open-source projects. We've pretty much agreed to disagree for years but the style is noticeably different depending on which of us started the package (we were two of the first developers at the company and now mostly work on separate projects).

His background is computer vision, not Java if that's what you're thinking.
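For context on what "private" means here, a short sketch with a hypothetical `Widget` class: a single leading underscore is purely a naming convention, while a double underscore triggers name mangling, which still doesn't make anything truly inaccessible.

```python
class Widget:
    def __init__(self) -> None:
        self.__token = 42  # double underscore: name-mangled to self._Widget__token

    def render(self) -> str:
        """Public API."""
        return f"<{self._tag()}>"

    def _tag(self) -> str:
        """Private by convention only: nothing stops outside code from calling it."""
        return "widget"


w = Widget()
print(w.render())        # <widget>
print(w._tag())          # works, but readers and linters treat it as internal
print(w._Widget__token)  # the mangled name is still reachable
```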

SurgicalOntologist
Jun 17, 2004

Is your Chromebook at all powerful? If so, it can most likely run Linux just fine. I develop on a Chromebook and use VS Code server (code-server) on localhost, although I've also used it remotely from time to time, as well as the regular Linux VS Code. All three options work pretty well for me, with nearly identical usability.

SurgicalOntologist
Jun 17, 2004

In my opinion, typing list instead of List[Group] just to avoid the circular import is probably the worst possible solution. If you believe the circular relationship is a smell (which I'm on the fence about), well that hasn't been addressed, and you've basically given up on having your code well-typed. I find the recommendation to do that or List[Any] baffling, even if it comes from Google.

What I might do is something like this.

Python code:
from __future__ import annotations

from typing import List

# Original

class User:
    def get_groups(self) -> List[Group]:
        ...


class Group:
    users: List[User]

 
# Refactored

class User:
    ...


class Group:
    users: List[User]

    @classmethod
    def get_groups_of_user(cls, user: User) -> List[Group]:  # anyone know if there is a way to type this like List[cls] instead?
        ...


Basically, some_user.get_groups() may look natural, but there's nothing wrong with Group.get_groups_of_user(some_user). To me, it's nice to have a simple class like User at the base of the tree, with purely internal logic and no dependencies. Its groupability behavior can live in another class, whether that's Group or a third one, without losing expressiveness.

On the other hand, I'm partial to the argument that this is just a workaround and there's nothing wrong with the circular dependency (other than the fact that it's not allowed). Probably the actual best approach in these situations is to put them in the same file. And if the response is "that file would be too big!" then for sure the classes are too big and should be split. Maybe user.py and group.py is not actually the best way to split the logic into modules/classes. Especially with Protocols you can split different aspects of your objects' behaviors among classes rather than having big monster classes. So, my response when a colleague asks for help with a circular dependency problem is almost always "make your classes small enough that you are comfortable putting them together". If they are so tightly coupled, it makes sense that they be in the same module.

Edit:

I think the answer to my above question about typing the classmethod output is something like

Python code:
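from __future__ import annotations

from typing import Generic, Iterable, List, TypeVar

T = TypeVar('T')

class Group(Generic[T]):
    def __init__(self, items: Iterable[T]):
        self.items: List[T] = list(items)

    @classmethod
    def get_groups_by_item(cls, item: T) -> List[Group[T]]:
        ...


class User:
    ...


users = [User(), User()]
user_group = Group[User](users)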
(I would probably also try to subclass from Sequence or Collection and get useful methods like __iter__ and __getitem__.)

Oh wait, that just abstracts the User part, not the Group part. Well, maybe useful, but I'm still not sure how to enforce that a subclass's get_groups_by_item can only return a list of itself rather than of a parent class.

Edit2: Ok I googled it, the answer is to use typing on cls. Getting complicated but maybe...
Python code:
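from __future__ import annotations

from typing import Generic, Iterable, List, Type, TypeVar

T = TypeVar('T')
G = TypeVar('G', bound='Group')

class Group(Generic[T]):
    def __init__(self, items: Iterable[T]):
        self.items: List[T] = list(items)

    @classmethod
    def get_groups_by_item(cls: Type[G], item: T) -> List[G]:
        # A TypeVar can't be subscripted, so List[G[T]] isn't valid; List[G] is the closest.
        # (On Python 3.11+, typing.Self expresses this too: -> List[Self].)
        ...


class User:
    ...


users = [User(), User()]
user_group = Group[User](users)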
Admittedly that was a bit of a side-curiosity question, and it probably isn't necessary if the only thing you want to make groups of is users. Still, it shows that not only can you define users without them needing to know that they are groupable (by putting all group-related behavior on the grouping class), but you can even define grouping capabilities without needing to know anything about users specifically. Any behavior related purely to managing generic sub-items can go in Group, and any behavior that depends on dealing with users specifically (if any) can go in a separate UserGroup class.

SurgicalOntologist fucked around with this message at 13:03 on Apr 16, 2023

SurgicalOntologist
Jun 17, 2004

I just can't help myself...

Python code:
def isodd(n: int) -> bool:
    return bool(n % 2)


def sort_odds_inplace(inputs: list[int]) -> list[int]:
    odds = iter(sorted(filter(isodd, inputs)))
    return [next(odds) if isodd(n) else n for n in inputs]


assert sort_odds_inplace([5,2,7,1,4,6,3,9]) == [1, 2, 3, 5, 4, 6, 7, 9]
Of course, ask me to do it live and I'd probably completely crack, I've never had to do a live coding test.

SurgicalOntologist
Jun 17, 2004

boofhead posted:

i, too, couldnt help myself

Yours will get the wrong answer if the input list has 0 in it. :viggo:

SurgicalOntologist
Jun 17, 2004

We use extras_require['test'] in setup.py for test dependencies that don't end up in the build (neither do the tests themselves, if you package things correctly). Although I gather setup.py is deprecated and we're behind the times, I assume there's an equivalent.

If someone wants tests, rather than running from the PyPI/Docker/whatever build, they should pull the source. The only reason to run the tests is if you're developing anyway, assuming you can see in CI that they passed for the current build.

SurgicalOntologist
Jun 17, 2004

rowkey bilbao posted:

How should I write this to convince flake8 to calm down:

Python code:
from fastapi import APIRouter
from typing import Union, Any
router = APIRouter()

USERS_AND_ANIMALS = [... for _ in something]  # dynamicaly loading some pydantic models

@router.post("/", status_code=201, response_model=Union[*USERS_AND_ANIMALS])
def create_user_or_animal(data: Any):
    pass
That snippet behaves like I want to, but our CI spits a warning at the @router line.
The warning is:

code:
.../file.py:9:52: E999 SyntaxError: invalid syntax. Perhaps you forgot a comma?


This is a bit of a smell IMO. Much better would be to set up the classes in `USERS_AND_ANIMALS` so that they share a parent class. If that's not possible through inheritance, you can use a Protocol, which is essentially a way to define a type based on its behavior (e.g. any class with methods .feed() and .eat() is an Animal) without needing to touch the class implementations. I don't know pydantic well enough to say whether there's a trick there, but a superclass is probably the easiest solution if you have control over those implementations.
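A minimal Protocol sketch matching the `.feed()`/`.eat()` example; `Animal`, `Dog` and `care_for` are hypothetical, and this says nothing about how pydantic would treat such a type. `@runtime_checkable` is only needed if you also want `isinstance` checks, and those only check that the methods exist, not their signatures.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Animal(Protocol):
    """Anything with feed() and eat() counts as an Animal."""

    def feed(self) -> None: ...
    def eat(self) -> None: ...


class Dog:  # note: does not inherit from Animal
    def __init__(self) -> None:
        self.meals = 0

    def feed(self) -> None:
        self.meals += 1

    def eat(self) -> None:
        pass


def care_for(animal: Animal) -> None:
    # A type checker accepts any object whose methods match the Protocol
    animal.feed()
    animal.eat()


dog = Dog()
care_for(dog)
```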

Edit: to be clear, I think Unions, and the word "or" in names, are a smell. Certainly a big Union like this one, and there's definitely a better way to type it. But I would add that create_user_or_animal is a bit smelly too. I don't know your domain and maybe there's a good reason, but most likely your code would be more maintainable with separate functions/endpoints for creating animals and creating users.


rowkey bilbao posted:

Additionally, where can I read about types in a way that helps me understand wtf it means when I see TypeVar or T or funny brackets ?

Those are generics, mypy docs are pretty good on that: https://mypy.readthedocs.io/en/stable/generics.html

Actually mypy docs are good in general, the section on Protocol is probably better than the one I linked above.

SurgicalOntologist fucked around with this message at 18:51 on Sep 24, 2023

SurgicalOntologist
Jun 17, 2004

I suggest taking a step back and reviewing your whole type system, not just Gem. In particular, soil. It doesn't make sense to me that it can be so many different things (string, dict of strings, dict of gems). I mean, I can imagine the reasons. But I think if you try to design a proper Soil type, rather than capturing the different possibilities of Soil with different arrangements of builtin types, you will end up simplifying things a lot.

Edit: to be more specific, you don't really have a dict in the sense of mapping arbitrary keys to values. At least judging by your choice of "underground" in the example, that looks like a domain concept, not an arbitrary string.

From another angle, imagine you have your data model set up, and your next task is to implement a method that checks whether a certain gem is anywhere in the asteroid. That's going to be a nightmare with all the combinations you'll have to check.
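A rough sketch of what a dedicated type could look like. The fields here (`kind`, `gems`, nested `layers`, and `surface`/`underground` as named attributes rather than dict keys) are guesses at the domain, not the original data model; the point is that the "is this gem anywhere?" check becomes one short recursive method instead of a cascade of isinstance checks.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Gem:
    name: str


@dataclass
class Soil:
    """Hypothetical model: one layer of material that may hold gems and nested layers."""
    kind: str
    gems: list[Gem] = field(default_factory=list)
    layers: list[Soil] = field(default_factory=list)

    def contains(self, gem: Gem) -> bool:
        return gem in self.gems or any(layer.contains(gem) for layer in self.layers)


@dataclass
class Asteroid:
    # Domain concepts become named fields instead of arbitrary dict keys
    surface: Soil
    underground: Soil

    def contains(self, gem: Gem) -> bool:
        return self.surface.contains(gem) or self.underground.contains(gem)
```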

SurgicalOntologist fucked around with this message at 21:20 on Sep 27, 2023
