susan b buffering
Nov 14, 2016

Probably using a remote debugger, I imagine.

susan b buffering
Nov 14, 2016

Loezi posted:

Today I learned about https://docs.python.org/3/library/contextlib.html#contextlib.suppress and
Python code:
with suppress(FileNotFoundError):
    os.remove('somefile.tmp')
Are there other similarly useful context managers, other than open, that one wouldn't necessarily know about?

Basically anything where you're dealing with a resource that should eventually be released can use it. Off the top of my head, sqlite connections, thread Locks, and socket servers all can make use of the "with" statement, which is essentially shorthand for the traditional try-finally block that you would use to ensure an open resource is released when it is no longer needed.

In more concrete terms, any object which implements the __enter__ and __exit__ methods is a context manager, and will work with "with".
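As a rough illustration of the above (the `Resource` class is made up for this sketch), here is a hand-rolled context manager next to the try-finally it stands in for, plus a `threading.Lock` used the same way:

```python
import threading


class Resource:
    """Toy resource that records whether it is currently open."""

    def __init__(self):
        self.is_open = False

    def __enter__(self):
        # Called on entry to the "with" block; the return value is bound by "as"
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called on exit, even if the block raised; returning False re-raises
        self.is_open = False
        return False


# The "with" form...
with Resource() as res:
    opened_inside = res.is_open

# ...does roughly what this try-finally does by hand
res2 = Resource()
res2.is_open = True
try:
    pass  # work with the resource here
finally:
    res2.is_open = False

# Locks from the standard library follow the same protocol
lock = threading.Lock()
with lock:
    held_inside = lock.locked()
```

Both resources end up closed and the lock ends up released, whether or not the body of the block raises.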

susan b buffering fucked around with this message at 09:47 on Apr 13, 2019

susan b buffering
Nov 14, 2016

mr_package posted:

When working with relational data, do you keep it in relational format or just flatten it all? I'm pulling a bunch of stuff out of a database and the tables are all normalized correctly, and I'm not sure whether to keep the small id:text tables. So for example, each item has a "language_code" parameter, in the database it might be 1/2/3 for en-US/fr-FR/de-DE but I could easily pull that out and have it be "en-US" instead of 1 and not even maintain this separate table at all. Advantages of being more human readable might outweigh the bloat and slowness of trying to update all 5k items if we changed to "en_us" or something.

I'm used to working with databases, so normalized data is totally standard to me but maybe in Python / JSON this becomes an antipattern? Anyone ever dealt with this one way or the other and then regretted it?

I tend to keep things relational by having my DB models represented as classes with relevant methods for accessing related objects.
So in your case, I'd have something like:
code:
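class Text(object):
    def lang(self):
        # Assumes self.conn() returns a sqlite3 connection; execute() hands back
        # a cursor, and the parameters must be passed as a sequence
        cursor = self.conn().execute("SELECT * FROM lang WHERE id = ?", (self.lang_id,))
        return cursor.fetchone()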
This way you can have the human readable value without de-normalizing the database.

You can even add a @property decorator above the method so it can be accessed as Text().lang.
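A minimal sketch of the @property version, assuming an in-memory SQLite database and a made-up two-column `lang` table (the schema, the constructor arguments, and the `code` column name are all illustrative, not taken from the original database):

```python
import sqlite3


class Text:
    def __init__(self, conn, lang_id):
        self._conn = conn
        self.lang_id = lang_id

    @property
    def lang(self):
        # Runs the lookup on attribute access: text.lang, no parentheses
        row = self._conn.execute(
            "SELECT code FROM lang WHERE id = ?", (self.lang_id,)
        ).fetchone()
        return row[0] if row else None


# Hypothetical schema, just enough to exercise the property
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lang (id INTEGER PRIMARY KEY, code TEXT)")
conn.executemany(
    "INSERT INTO lang VALUES (?, ?)", [(1, "en-US"), (2, "fr-FR"), (3, "de-DE")]
)

text = Text(conn, lang_id=2)
print(text.lang)
```

The table stays normalized; the caller only ever sees the readable value.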

I would recommend looking into ORM and datamapper design patterns for more ideas.

susan b buffering
Nov 14, 2016

mr_package posted:

Thanks that is something I was thinking about too-- keep the schema as-is and then just try to provide a clean interface (if only to myself) to it.

One wrinkle is I'm basically migrating from a database to JSON (seems friendliest serialization format). But your approach would still work, it would just be reading it out of a dictionary (which is itself parsed JSON) instead of directly querying the db. I'm essentially doing SQL export to JSON, and trying to decide on what the schema/design of what that JSON should be.

I am not sure how much 'first normal form' style of data modeling to maintain in this case. The DBA in me says 'don't throw away this data / schema, it's useful and correct' but pragmatically I look at this and say 'actually YAGNI, just write it out as a config and forget the old database'.

I suppose the fundamental question is: if you're writing an app that is using a kind of medium-small data set (20k records or so?) but not using a database backend or even SQLite, would you still always always always model that data as relational? Or would you kind of cheat and make it a slightly bloated JSON config file and not worry about it too much? Is there a rule I just don't know about, like "No, it doesn't matter, you don't need a database so just use whatever format is most readable" or maybe "Yes for the love of god keep the data normalized it will save you so much pain in six months when you need to add another platform/target /os".

I probably would, but I also try and keep my models fairly agnostic of where the actual data is coming from. Common properties would come from a base class/mix-ins or sometimes stored in an object/dict passed into the instance.

When it comes to serialization, I'm definitely doing a bit of flattening. 2-column id:text tables are probably just going to be represented as the text. Related tables with more fields get an object (take care if your db structure allows for recursive dependencies :v:) and/or a locator URI if this is being served as a REST API.
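Something like the following is what I mean by flattening at serialization time. The tables and field names here are invented for the example; the point is only that the id:text lookup collapses to its text while a richer related row becomes a nested object:

```python
import json

# Hypothetical rows as they might come out of the database
languages = {1: "en-US", 2: "fr-FR", 3: "de-DE"}  # 2-column id:text table
platforms = {10: {"id": 10, "name": "desktop", "arch": "x86_64"}}
items = [
    {"id": 100, "body": "Hello", "language_code": 1, "platform_id": 10},
    {"id": 101, "body": "Bonjour", "language_code": 2, "platform_id": 10},
]


def serialize(item):
    """Flatten the id:text lookup; nest the richer related row as an object."""
    return {
        "id": item["id"],
        "body": item["body"],
        "language_code": languages[item["language_code"]],
        "platform": platforms[item["platform_id"]],
    }


payload = json.dumps([serialize(item) for item in items], indent=2)
```

The database itself is untouched; only the exported JSON is denormalized.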

I personally try and avoid de-normalizing an already normalized database, especially as a shortcut to what my bespoke ORM should handle in code anyways. I totally believe you can make an educated decision to do so and be fine. For instance, with the lang table you mentioned in the first post, one could argue that since language tags are already standardized, keeping them in a separate table is excessive. OTOH that sort of table structure can be a real benefit if you need to track down / prevent typos or other invalid data.



I've actually been doing basically the opposite of what you've been doing, which is writing a wrapper for a REST API I pull data from pretty regularly, which will probably end up in a SQLite database. I made heavy use of dataclasses, which I highly recommend looking into (namedtuples, too).

Here's my base model class and one of the child classes.

Python code:
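from dataclasses import dataclass
from typing import Optional


class Base(object):
    def __init__(self, client: object, data: dict) -> None:
        super().__init__()
        self.client = client
        # Copy each key of the API response onto the instance as an attribute
        if data is not None:
            for attribute, value in data.items():
                setattr(self, attribute, value)


# init=False keeps Base.__init__ instead of generating a dataclass __init__
@dataclass(init=False)
class Persona(Base):
    id: Optional[int] = None
    name: Optional[str] = None
    bio: Optional[str] = None
    since: Optional[int] = None
    email: Optional[str] = None
    website: Optional[str] = None
    image: Optional[str] = None

    # "Listing" (defined elsewhere in the wrapper) is quoted so the annotation
    # is not evaluated when this class is defined
    def playlists(self, **kwargs) -> "Listing":
        params = kwargs
        params["persona_id"] = self.id
        return self.client.playlists(params=params)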
The nice part about this is that if I decide to load this data into a database, I can use these models to read from it again so long as I have a client object with the same methods as the client wrapper I wrote for the api client.

susan b buffering fucked around with this message at 19:06 on May 10, 2019

susan b buffering
Nov 14, 2016

The PEP for a local packages directory at least seems like a promising step forward.

susan b buffering
Nov 14, 2016

KICK BAMA KICK posted:

None of those books/videos look like the standard "learn Python" recommendations (IDK anything about those fields so I'm not sure if the ones specifically about like Pandas would be relevant to your interests). But PyCharm is the most common IDE for Python, and I think some of the features the Professional edition adds do relate to math/scientific/data libraries like NumPy, which I'm guessing you'd find yourself using at some point, so those licenses in the bundles are a great value (like $89 for a year standard but maybe you can swing an educational discount?). My suggestion: set the reminder on that bundle, install the free edition of PyCharm, start learning from whatever resource you find/someone else here recommends, and if you think you're gonna keep using it grab that 6-month Professional license at the $20 tier, can't lose at that price.

Definitely try what QuarkJets said but if you must compile for whatever reason, this worked for me on a 3B (with maybe one or two substitutions even I was able to figure out, and I am dumb) and includes the flags that enable some optimizations for the Pi's chips.

PyCharm Pro is free for students (along with the rest of their IDEs).

susan b buffering
Nov 14, 2016

Lambdas must evaluate to a value, which is why that doesn’t work.

lambda: None would probably work.
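The original snippet isn't quoted here, so this is only a guess at the shape of the problem: a lambda body has to be an expression, so a statement such as `pass` won't compile, while `None` is a perfectly good expression.

```python
# A lambda body must be an expression; "pass" is a statement, so this fails to compile
try:
    compile("noop = lambda: pass", "<example>", "exec")
    compiled = True
except SyntaxError:
    compiled = False

# None is an expression, so this works as a do-nothing callable
noop = lambda: None
result = noop()
```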

susan b buffering
Nov 14, 2016

I tend to use NamedTuples from the typing module for data containers, and DataClasses for when I need a bit more than that.

I’ll probably use typed dicts more with APIs. Haven’t really had a chance to mess with them yet.
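For comparison, a quick sketch of the three side by side. The class and field names are invented, and `TypedDict` needs Python 3.8 or later:

```python
from dataclasses import dataclass, field
from typing import NamedTuple, TypedDict


class Point(NamedTuple):
    # Immutable, unpackable, and still a plain tuple underneath
    x: float
    y: float


@dataclass
class Track:
    # Mutable, with defaults and room for methods
    title: str
    tags: list = field(default_factory=list)

    def add_tag(self, tag: str) -> None:
        self.tags.append(tag)


class ApiUser(TypedDict):
    # Describes the shape of a dict (e.g. parsed JSON); only a type checker enforces it
    id: int
    name: str


p = Point(1.0, 2.0)
x, y = p
track = Track("Intro")
track.add_tag("ambient")
user: ApiUser = {"id": 1, "name": "susan"}
```

At runtime `user` is just a dict, which is why typed dicts seem like a natural fit for API responses.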

Dominoes posted:

Lists of tuples sounds like trouble, since you're relying on indexes, which have no semantic meaning.

I've found organizing my projects around dataclasses and enums is nice - it provides a backbone where I think of my project in terms of how the data's organized. I wish the enums had auto set by default, since adding integers manually implies meaning where there is none.

Eg from the docs:
Python code:
class Ordinal(Enum):
    NORTH = auto()
    SOUTH = auto()
    EAST = auto()
    WEST = auto()
Is more verbose than needed (ie Enums could have their own name without inheriting a class, and auto for each item is noise), but isn't too bad.

You can use the functional API to achieve what you want. The below is equivalent to what you posted. I chose the space-delimited form for the second argument, but a sequence or mapping can be used instead.
Python code:
Ordinal = Enum("Ordinal", "NORTH SOUTH EAST WEST")

susan b buffering fucked around with this message at 02:00 on Nov 21, 2019

susan b buffering
Nov 14, 2016

QuarkJets posted:

Does pycharm have the ability to do a project-level replacement of .format strings with f-strings? It can definitely do it on single lines but I can't seem to find the ability to do it globally and Google isn't helping.

I think you can use code inspections to do this. On my phone so can’t check.

susan b buffering
Nov 14, 2016

The one click droplet on DO is hilariously out of date last I checked, but a $5 droplet can run django just fine.

I’d recommend gunicorn for your WSGI server along with nginx. The gunicorn docs have a guide for this. Also definitely use a virtual env.

Deployment can be somewhat tricky but it’s worth figuring out. You can also just run it locally until you feel more comfortable.

susan b buffering
Nov 14, 2016

yeah, you're gonna have a much better time keeping the script simple and letting an external runner send you an email when the script fails.

if anything, you may want to do some logging to help diagnose any issues that come up. if you use systemd timers instead of cron then logging is simply a matter of printing log events to stdout
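a rough sketch of what that could look like, assuming the external runner (cron wrapper, systemd unit, whatever) treats a non-zero exit code as failure. the logger name and `do_work` are placeholders:

```python
import logging
import sys

# Log to stdout; under a systemd timer the journal captures this output
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(levelname)s %(message)s",
)
log = logging.getLogger("nightly_job")  # hypothetical job name


def do_work():
    """Placeholder for whatever the script actually does."""
    log.info("starting work")
    return 42


def main():
    try:
        do_work()
    except Exception:
        # Log the traceback and return non-zero so the runner sees the failure
        log.exception("job failed")
        return 1
    log.info("job finished")
    return 0


# In the real script this would be: sys.exit(main())
exit_code = main()
```

the script itself never sends email; it just logs and reports success or failure through its exit code.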

susan b buffering
Nov 14, 2016

You can use NetworkX to draw graphs with matplotlib, but the docs for this will tell you to use a dedicated tool like graphviz.

graphviz is a separate executable but the graphviz python module will call it for you to render graphs, so you don't need to feed it anything manually.

susan b buffering
Nov 14, 2016

Zero Gravitas posted:

Is there a way to create pandas dataframes in a class object? (I think that's the term.)

I'm trying to create a stock control system with a bunch of dataframes for holding data. I thought I'd create a class so I can start a bunch of dataframes all with the same column headings and append data to them later.

Plainly I'm doing something wrong with my class creation since I simply get an object that doesn't inherit the pd.DataFrame options.

Be gentle, this isn't my usual day job.

You need to assign the dataframe to an instance variable, like so:

code:
self.df = pd.DataFrame(columns=columnNames)
Then you can access the dataframe from within the instance using self.df, but you need to include `self` as the first argument to your instance methods. Here's a version of your code that hopefully helps you get started:

code:

import os
import pandas as pd

class Inventory():
    
    """
    Class for inventory operations
    
    - initialise
    - add item
    - remove item
    - add qty
    - remove qty
    
    """
    def __init__(self):
        
        columnNames = ["SKU", "BATCH","ORDERNO", "DESCRIPTION","VALUE", "QTY"]
        
        self.df = pd.DataFrame(columns=columnNames)
        
        
    def addItem(self, item):
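        # item is assumed to be a dict keyed by column name.
        # pd.concat returns a new dataframe, so the result is assigned back
        # (DataFrame.append was removed in pandas 2.0).
        self.df = pd.concat([self.df, pd.DataFrame([item])], ignore_index=True)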
    
    def subtractItem(self):
        #TODO
        pass
    
    def addQty(self):
        #TODO
        pass
    
    def subtractQty(self):
        #TODO
        pass

The example code I put in addItem was mainly to show how you would access the dataframe. Most of my experience using pandas has been analyzing preexisting datasets, so I've no idea if that's the correct method to actually use.

susan b buffering
Nov 14, 2016

CarForumPoster posted:

I've never thought to use a df in a class before, good example.

If you're adding one df of N rows to the end of another (think add new row in a db) you use pd.concat([df1, df2], ignore_index=True/False)

Yeah, concat is right.

I hadn’t really thought of doing what the OP is doing either. The closest I’ve come is having a method return a dataframe, with the data coming from a sqlite database.

susan b buffering
Nov 14, 2016

Think I'm fully onboard the attrs train now. The ability to define a converter function for attributes is extremely convenient when dealing with nested data structures.

susan b buffering
Nov 14, 2016

Splitting up the work that way should be fairly ideal in terms of speed-up.

Number 2 might be possible but I wouldn’t count on it being simple if it is. You would have a much better time if those series were stored in an array like a pandas dataframe or just a plain numpy 2d array, even if just for the computation you’re doing.

I’d also recommend looking into using Numba for what you’re doing instead of joblib. It’s a JIT compiler that also handles threading and works particularly well with numpy arrays.

susan b buffering
Nov 14, 2016

TheKingofSprings posted:

Python is remotely sshing into a raspberry pi and launching half a GStreamer pipeline which sends a video stream to the computer running python. It then opens up a receiver pipeline using subprocess.Popen which is where the latency is getting measured and printed to stdout. The only option I’m using is shell=True right now.

I think it’s the receiver stuff which is causing this steady increase in latency but I need to confirm by running my transmission pipeline outside python while using subprocess for the receiver still.

It'd be helpful if you could post the code in question, or a minimal, working example. There are a lot of ways the subprocess module can be used that can result in deadlocks.
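To be clear, this isn't a diagnosis of the latency problem, since I haven't seen the code. It's just the most common deadlock I know of with Popen: piping a child's output and then waiting on it without reading. The child command below is a stand-in that simply writes a lot of data.

```python
import subprocess
import sys

# Child that writes far more than a typical OS pipe buffer holds
child_code = "import sys; sys.stdout.write('x' * 1_000_000)"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

# Risky: calling proc.wait() here could hang, because the child blocks writing
# to a full pipe that nobody is reading while the parent blocks waiting for exit.

# Safer: communicate() drains both pipes while waiting for the child
out, err = proc.communicate(timeout=30)
```

If the receiver is meant to run for a long time, reading its stdout line by line as it arrives (or not piping it at all) avoids the same trap.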

susan b buffering
Nov 14, 2016

2013 lurker rereg posted:

Doesn't pycharm pro include datagrip? I was surprised at how much I liked it once I got pro through work--i used SQL all the time in my "analytics" world, but being able to mock the DB easily and validate things was super nice.

All of JetBrains' IDEs include the features of DataGrip, yeah.

susan b buffering
Nov 14, 2016

Mirconium posted:

I mean, I appreciate Python as much as the next guy, and most of my coding is in Python, but at that point wouldn't it just be easier to go to a fully typed language?

I don’t see how it would be easier to learn a new language + ecosystem than to add type annotations to your Python code.

susan b buffering
Nov 14, 2016

It also gets easier to use with each release. 3.9 added the ability to type hint collections without importing them from the typing module. So you can do list[int] instead of typing.List[int].
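A small before-and-after, assuming Python 3.9 or later (the function names are just for the example):

```python
from typing import List  # only needed for the pre-3.9 spelling


# Python 3.9+: built-in collections can be subscripted directly
def total(values: list[int]) -> int:
    return sum(values)


def word_counts(words: list[str]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


# Older equivalent, still valid
def total_old(values: List[int]) -> int:
    return sum(values)
```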

susan b buffering
Nov 14, 2016

Rocko Bonaparte posted:

Has anybody here used threading.local? I can (and probably will) do some experiments before getting into it, but I'm trying to understand its caveats and gotchas. I am in a situation where I need to pass in some context to a callback. The system that uses that callback is stateless and doesn't need to worry about thread safety. However, the callback I'm using is contextual so I need different state between threads. I'm trying to figure out how I might manage that state.

My understanding is threading.local() just gives me some handle to shove poo poo on a per-thread basis. I first need to figure out if threading.local() will give a unique handle each time that I have to then juggle or if it gives me the same handle each time for the same thread. If the former, then I guess I have to stash it somehow and resolve it on a per-thread basis. At that point, I don't know if it matters that I use it in the first place, so I was hoping it was the latter. The other issue then is cleaning up afterwards. I guess I need to make sure I delete the stuff so I don't leave it assigned to the thread for eternity.

threading.local creates an object whenever you call it, so you should assign that object to a variable and then store local data as attributes on it. That object will persist as long as you need it, but calling threading.local again will just give you a new object.
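Here's a minimal sketch of that pattern. The `callback` and `worker` functions are stand-ins for your stateless system and whatever sets up the per-thread context:

```python
import threading

# Create the thread-local object once and share the reference between threads
ctx = threading.local()
results = {}


def callback(name):
    # Each thread sees only the attributes it set on ctx itself
    results[name] = ctx.value


def worker(name, value):
    ctx.value = value
    callback(name)


threads = [
    threading.Thread(target=worker, args=(f"t{i}", i * 10)) for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never set ctx.value, so it has no such attribute
main_has_value = hasattr(ctx, "value")
```

Every thread goes through the same `ctx` handle, but each one only ever reads back the value it stored.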

susan b buffering
Nov 14, 2016

mystes posted:

It's part of the python standard now but the normal python interpreter doesn't check it. You can use it with mypy or an ide for type checking.

https://docs.python.org/3/library/typing.html

Also used for dataclasses and namedtuples(via typing.NamedTuple).

I think attrs also supports using them in class definitions now.

susan b buffering
Nov 14, 2016

musicbrainz picard is built using Python and PyQt and it’s no more difficult to install and use than any other software I’ve used

susan b buffering
Nov 14, 2016

I think there might be legitimate use cases for a decorator on __init__, but I can’t think of any good reason to use @classmethod there.

susan b buffering
Nov 14, 2016

Gin_Rummy posted:

I gotcha. So could I leverage the Python from the actual wave generator/sequencer portion onto a JS GUI, or am I looking at redoing my entire "app?"

If you care about the latency between the UI and the actual sound generating parts of the synth then you'll probably want to rewrite the whole thing in JS. The Web Audio API actually has some pretty decent building blocks for audio synthesis, so it probably won't be as tough as you think.

susan b buffering
Nov 14, 2016

CarForumPoster posted:

Y’all are nuts saying rebuild it in JS. Assuming the compute time for this is 10sec or less, Plotly dash single page app deployed to heroku. Easy peasy, don’t even need to ditch your QT GUI if you do it right, just import the function. Use boto3 to upload to S3.

Check the last page of my previous posts ITT. I wrote out exactly how to do this. You’ll have a deployed hello world web app that runs a function in 15 minutes for free. Then you add your code.

If your function takes a while, or your audio files are big, this gets harder as you are hard limited to 30sec.

What the gently caress are you talking about? A synthesizer is real-time audio. If you’re talking about seconds then you’re in a completely different universe than the op.

susan b buffering
Nov 14, 2016

CarForumPoster posted:

Yea I guess I didn’t understand what the requirements of a synthesizer were. Socially appropriate reaction btw.

Lol maybe don’t open your post with “y’all are nuts” when making assertions about a domain you have no knowledge of if you’re looking for a polite response.

susan b buffering
Nov 14, 2016

I think his python book is good because after trying it twice in high school I gave up on learning to code and talked to girls instead :v:

susan b buffering
Nov 14, 2016

BAD AT STUFF posted:

I would love to have a place to bitch about Azure Data Factory, so I'm all for a data engineering thread.

when i was hired at my current job i was started out on an ADF project and i jumped to something else as soon as the opportunity presented itself

susan b buffering
Nov 14, 2016

Having not used Flask with PyCharm, I'd check the run configuration. You can at the very least probably set up a more specific binding in there if it isn't already what you want.
