Dominoes
Sep 20, 2007

One minor, tangential* step: I'm going to move the Python process manager I wrote towards only supporting the latest Python minor version. I've been neglecting this project for want of spare time, but this change may simplify the codebase and API.

With that in mind: If you're using 3.6, 3.7 etc, why? Would switching to 3.8 break anything for you?

* Relevant in that it reduces a system state degree-of-freedom.

Dominoes fucked around with this message at 02:19 on Oct 10, 2020

Dominoes
Sep 20, 2007

Duly noted! I'm now leaning against this idea.

Another tangent: Have any of y'all dug into the CPython codebase? The tool I referenced above is essentially an API wrapper. I wonder how tough it would be to modify Python to use better behavior natively.

Dominoes fucked around with this message at 19:00 on Oct 10, 2020

Dominoes
Sep 20, 2007

Are you serious?

Let's go for a drive!

Dominoes
Sep 20, 2007

The ability to let your IDE (like PyCharm) or a type checker (like mypy) check your code makes type hints a game changer for the language. Related: Check out dataclasses, which use them.
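
A minimal sketch of what that looks like in practice. The `Process` class and `describe` function are made-up names for illustration, not from any real project:

```python
from dataclasses import dataclass


@dataclass
class Process:
    # Field types are declared with type hints; the dataclass decorator
    # generates __init__, __repr__ and __eq__ from them.
    name: str
    pid: int
    restarts: int = 0


def describe(proc: Process) -> str:
    # A checker such as mypy can flag a call like describe("nginx")
    # before the program runs, because a str is not a Process.
    return f"{proc.name} (pid {proc.pid}), restarted {proc.restarts} times"


summary = describe(Process("worker", 4021))
```

Python itself doesn't enforce the hints at runtime; the payoff comes from the IDE or type checker reading them.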

Dominoes
Sep 20, 2007

Mirconium posted:

I mean, I appreciate Python as much as the next guy, and most of my coding is in Python, but at that point wouldn't it just be easier to go to a fully typed language?
Typing works well in Python, and (subjective, of course) is almost always worth it, even for small programs.

Something interesting happened, gradually I think, over the past decades: Explicit typing used to be something compilers required in order to assign memory etc, and it could be a chore. With the addition of type inference and complex type systems, it shifted to a powerful tool that lets compilers, IDEs, and other tools (like mypy) catch bugs and ensure your program acts how you intend.

Dominoes fucked around with this message at 03:00 on Oct 17, 2020

Dominoes
Sep 20, 2007

That's awesome.

Dominoes
Sep 20, 2007

Mirconium posted:

I think the question I was trying to ask, which I ask as a non-CS-educated person (mainly a biologist) who genuinely doesn't know, is what are the benefits that Python derives from NOT having static types?
My best guess is that it was born in an era before static typing as a verification tool was popular. But perhaps there's a more subtle reason I'm missing.

quote:

But if the Rust library ecosystem was as extensive as Python's, I would never write a line of another language again
I'm optimistic about that as well. I think the timeline will vary significantly based on the area. For example, despite the work on the web server ecosystem, Rust has nothing near as powerful or feature-rich as Django. And for numerical/scientific computing, Python's lead will be hard to catch up with. I can see Rust being used to optimize libraries using FFI that are called by Python.

Another key perk of Python that I think Rust won't be able to catch up with is the REPL, especially variants like IPython. It's great for quickly testing functions, as a powerful, customizable calculator etc.

Dominoes fucked around with this message at 17:35 on Oct 17, 2020

Dominoes
Sep 20, 2007

As Data points out, the dir function doesn't do what you're asking. Check out the official docs.

edit: I think your book is talking about a Unix/PowerShell console command, and is not directly related to Python.
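
To show the difference, here's a rough sketch. The `Greeter` class is a made-up example; `os.listdir` is the standard-library call that comes closest to the console command:

```python
import os


class Greeter:
    def hello(self) -> str:
        return "hi"


# Built-in dir(): names of attributes on an object, not files on disk.
attribute_names = dir(Greeter())

# Closest Python equivalent of the shell's `dir`/`ls`: list a directory.
entries = os.listdir(".")
```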

Dominoes fucked around with this message at 15:51 on Oct 29, 2020

Dominoes
Sep 20, 2007

Hey bros. Is there a good way to visualize data with 3 input dims and 1 output? Maybe with some sort of shading, or density of points, or threshold values for drawing surfaces.

(To define my terms. 1 in 1 out: Scatter/line plot. 2 in 1 out: surface or contour plot. 2 in 2 out: Vector plot in 2d. 3 in 3 out: Vector plot in 3d.)

Dominoes
Sep 20, 2007

Bad Munki posted:

Like your 2-in 1-out, but add hue for the 3rd dim.
Thanks. That would convey the info, ie a colored surface plot.


CarForumPoster posted:

Its hard to know without knowing the data types but if they are all ordinal categories/numeric data, you can put 4 features on a scatter plot easily with X Axis, Y Axis, Point color (green->red), point size.

Which of those should be your three inputs and which one should be your output kinda depends on the data, though I' suppose I default to Y-Axis or color as output variables.
All numeric data types. The colored scatter would work too, and is conceptually similar to the colored surface, although the output has a value for any combo of inputs. I'm trying to visualize molecular orbitals, so ideally it would be some sort of hazy 3d object, or perhaps (this is common) a contour plot up a dimension, ie bounded 3d solids drawn at a threshold value (although this would be very sensitive to the threshold). I worry that the colored surface plot or scatter wouldn't be great at building visual intuition for what the orbitals look like. Maybe a modified scatter would work: instead of a uniform set of points with different colors, for each sample point you drop a different density of scattered points, so you see many points near a high value, and none near 0.
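
A rough sketch of that density-of-points idea, using rejection sampling. Everything here is an assumption for illustration: `psi_squared` is a stand-in 1s-like density rather than a real orbital computation, and the function names and parameters are made up:

```python
import math
import random


def psi_squared(x: float, y: float, z: float) -> float:
    # Stand-in scalar field: an unnormalized 1s-like density, exp(-2r).
    # Swap in the real wave function's squared magnitude.
    r = math.sqrt(x * x + y * y + z * z)
    return math.exp(-2.0 * r)


def sample_density(n_candidates: int, half_width: float, peak: float,
                   seed: int = 0) -> list[tuple[float, float, float]]:
    # Rejection sampling: propose uniform points in a cube and keep each
    # with probability field/peak, so dense regions mean high values.
    rng = random.Random(seed)
    kept = []
    for _ in range(n_candidates):
        p = tuple(rng.uniform(-half_width, half_width) for _ in range(3))
        if rng.random() < psi_squared(*p) / peak:
            kept.append(p)
    return kept


# peak must be at least the field's maximum; here that is the value at r = 0.
points = sample_density(n_candidates=20_000, half_width=4.0, peak=1.0)
```

The kept points could then go into any 3d scatter plot; the cloud should look hazy and thick where the field is large, and empty where it is near 0.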

I think ultimately, the answer will have to be some sort of custom rendering. Refreshing Vulkan skills. Does this seem overkill/over-engineered? Yes. Is this one of those cases where you have to fight that instinct and just do it? Leaning yes.




Bad Munki posted:

And bear in mind, every lovely meme that gets posted is showing 3 values at every x/y coordinate, you can use 1, 2, or 3 of those bands to your heart’s desire similarly.
True!

Dominoes fucked around with this message at 15:30 on Oct 30, 2020

Dominoes
Sep 20, 2007

CarForumPoster posted:

EDIT: One exception to the above lol is when the 3 factors and output are in some easily thought of coordinate system like (X, Y, Z, output) or (range, azimuth elevation, power in dB)
In this case, they do correspond to the 3 space dimensions! The output is a scalar field value (electron wave function, or probability-density function).

quote:

EDIT2: Just post your graphs. Its my first day off in a while and I am day drunk posting about python.
Here's a 2D example. I don't yet have the 3d computations done, but it will be an extension of this into 1 extra dimension:


The orbitals table images on this page are one way the 3d graph might look.

Dominoes
Sep 20, 2007

Fork.

Dominoes
Sep 20, 2007

I assumed notebooks were for sharing results, especially with interactive visuals.

Dominoes
Sep 20, 2007

Re the notebook chat: It's always something I've had in my hip pocket as a tool I know is available when the use comes up. I was surprised to hear (partly from articles critical of its non-linear execution) how it's often used in practice, ie for broader use cases.

Dominoes
Sep 20, 2007

Cyril Sneer posted:

As a follow up, is there a way to allow dynamic assignment of classes? In order to assemble my pipeline, I was thinking of using a configuration dictionary, with the keys specifying the particular model to use, amongst other things.
...

I recommend dataclasses for configs, since they likely have a fixed set of settings (fields), and the values are of different types. Comparatively, dictionaries leave you open to errors that will surprise you at runtime.

Use an Enum to define allowed classes that can be in the config:
Python code:
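from dataclasses import dataclass
from enum import Enum, auto


class SelectedModel(Enum):
    # One variant per model that is allowed in the config
    MODEL_A = auto()
    MODEL_B = auto()
    # ...


@dataclass
class Config:
    predictor: SelectedModel


config = Config(SelectedModel.MODEL_A)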

Dominoes fucked around with this message at 21:16 on Dec 28, 2020

Dominoes
Sep 20, 2007

SelectedModel.MODEL_A is an enum variant. You can think of an enum as a way to list choices. (Or as a binary with more than 2 variants, and semantic meaning to each variant.)

When processing your config, you run different code depending on which variant the predictor field holds. This way your IDE etc will only allow certain classes to be selected in the config. Enums allow you to specify only valid models. Your example used strings; presumably not every string (or every class) is a valid option in your config!
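
Here's one way the "run different code depending on the variant" part might look. `ModelA`, `ModelB` and `build_model` are hypothetical stand-ins for your own classes, so treat this as a sketch rather than the one right way:

```python
from dataclasses import dataclass
from enum import Enum, auto


class SelectedModel(Enum):
    MODEL_A = auto()
    MODEL_B = auto()


class ModelA:  # hypothetical model class
    def predict(self, x: float) -> float:
        return 2.0 * x


class ModelB:  # hypothetical model class
    def predict(self, x: float) -> float:
        return x + 1.0


@dataclass
class Config:
    predictor: SelectedModel


def build_model(config: Config):
    # Run different code depending on the variant held by the config.
    if config.predictor is SelectedModel.MODEL_A:
        return ModelA()
    if config.predictor is SelectedModel.MODEL_B:
        return ModelB()
    raise ValueError(f"Unhandled variant: {config.predictor}")


model = build_model(Config(SelectedModel.MODEL_B))
result = model.predict(3.0)
```

The final `raise` is there so that adding a new variant without handling it fails loudly instead of silently returning `None`.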

If you plan to serialize your config, instead of using enum.auto(), specify an integer for each variant, so your serialization is consistent.
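
A small sketch of that, assuming JSON as the serialization format (the `to_json`/`from_json` helpers are made up for this example):

```python
import json
from dataclasses import dataclass
from enum import Enum


class SelectedModel(Enum):
    # Explicit values stay stable if variants are reordered or added.
    MODEL_A = 1
    MODEL_B = 2


@dataclass
class Config:
    predictor: SelectedModel

    def to_json(self) -> str:
        return json.dumps({"predictor": self.predictor.value})

    @classmethod
    def from_json(cls, text: str) -> "Config":
        data = json.loads(text)
        # SelectedModel(2) looks the variant up by value; an unknown
        # integer raises ValueError.
        return cls(predictor=SelectedModel(data["predictor"]))


saved = Config(SelectedModel.MODEL_B).to_json()
restored = Config.from_json(saved)
```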

Dominoes fucked around with this message at 22:00 on Dec 28, 2020

Dominoes
Sep 20, 2007

On Windows, pip install is fine. On Linux, you risk leaving your system in a broken state, since the OS relies on the Python install it ships with.

Try this:
-Download the latest Python source from python.org
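-Install it using `./configure`, `make` and `sudo make altinstall` from the directory you unpacked it to (`altinstall` rather than `install`, so the system `python3` binary isn't overwritten or shadowed)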
-Use its pip: `python3.9 -m pip install apackage`.

This way, you don't risk modifying a package your OS relies on. Or, roll the dice with your system python's pip. It will probably be fine.

Dominoes fucked around with this message at 19:39 on Dec 30, 2020

Dominoes
Sep 20, 2007

12 rats tied together posted:

I like pyenv for managing python installs. The usual workflow is that you:
...
It probably doesn't work on windows.
There's also this program I wrote that abstracts the python installation and package management.

quote:

I'd really suggest using WSL to do any sort of python development on windows anyway.
Why?

Dominoes
Sep 20, 2007

There are options for configuring the Python interpreter, including adding additional paths for libs, but I don't understand what you're attempting.

Dominoes
Sep 20, 2007

copy.deepcopy(j)
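
For illustration, here's the difference from a shallow copy. The contents of `j` below are made up, since I'm only assuming it's some nested structure:

```python
import copy

j = {"name": "job", "steps": [{"cmd": "build"}, {"cmd": "test"}]}

shallow = copy.copy(j)      # new outer dict, same inner list and dicts
deep = copy.deepcopy(j)     # recursively copies every nested object

j["steps"][0]["cmd"] = "changed"

shallow_cmd = shallow["steps"][0]["cmd"]  # sees the mutation
deep_cmd = deep["steps"][0]["cmd"]        # unaffected
```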

Dominoes
Sep 20, 2007

Rocko Bonaparte posted:

I'm wondering how other people might be dealing with a situation of scratchwork classes and rigid type checking. I have some classes that are getting progressively fed information so their fields start out optional. When they are in-use, these will be filled.

Type checking hates this if I don't null check everything once the instance is in-use. One of the situations was a good candidate to use a builder because there was a lot of logic associated with generating defaults. However, another one is just some scratchwork.

So right now I'm trying a dataclass that has the fields optional while it's being internally messed around, but it basically returns a version of itself with fields not optional and not null. If I add a property, I will have to remember to put it in both places. I'm not sure about doing reflection stuff to auto-populate because I have to specify the typing information. For as much as I'm using it, that's fine, but it smells a little bit. Has anybody else worked through a problem like this before?

These containers are areas where people love to put the wrong thing in the wrong spot so I specifically want type checking to get involved for them.
Whenever you use one of the optional values, explicitly handle both the null and non-null case. Verbose, but robust.

Your builder or two-class system could work, if you confirm your workflow changes state from nullable to non-nullable cleanly.
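
A minimal sketch of the two-class version, with hypothetical `DraftJob`/`Job` names and fields. The null handling lives in one place, so the rest of the code only ever sees the non-optional class:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftJob:
    # Fields start out unset while the instance is being filled in.
    name: Optional[str] = None
    retries: Optional[int] = None


@dataclass(frozen=True)
class Job:
    # Same fields, but never None once the instance is in use.
    name: str
    retries: int


def finalize(draft: DraftJob) -> Job:
    # Handle the None case explicitly; after these checks a type checker
    # narrows each field from Optional[T] to T.
    if draft.name is None:
        raise ValueError("name was never set")
    if draft.retries is None:
        raise ValueError("retries was never set")
    return Job(name=draft.name, retries=draft.retries)


draft = DraftJob()
draft.name = "nightly"
draft.retries = 3
job = finalize(draft)
```

You still have to list each field in both classes, which is the duplication you mentioned; this just keeps the transition between the two states explicit.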

Dominoes
Sep 20, 2007

Hughmoris posted:

Anyone need an extra set of novice hands for an open source project? I can't think of any pet projects to create, and I'd like to take a stab at contributing to a meaningful project. It can be your project, or someone else's project that you're contributing to.
Python package and installation manager. I don't have time to maintain it now. Mostly works, but there are cases that cause it to fail. Ie certain combos of OS, dependency, Py version etc. I think the best goal is fixing edge cases so it works, or fails as gracefully as possible.

Dominoes fucked around with this message at 05:53 on Feb 21, 2021

Dominoes
Sep 20, 2007

I'm kind of drunk, but it looks like you're trying to overload functions in Python. Don't do that.

Dominoes
Sep 20, 2007

Rocko Bonaparte posted:

Has anybody here gotten on to the Python machine learning train?
It's a hype trap. Like data science a few years back. Might be a good resume pad if you're looking for a job or VC funds.

If you're trying to solve a practical problem, consider a decision tree.

Dominoes
Sep 20, 2007

QuarkJets posted:

"Trap" maybe isn't the right word when machine learning experts are in massive demand and are actually solving unique problems that haven't been easily solvable by classical methods. There's definitely a lot of hype present, but it's also just a really useful tool, like learning how to use Docker or CUDA

e: And I want to clarify that there are absolutely a ton of grifters that are taking advantage of the hype and trying to apply ML to everything while pretending like it's magic, but that advice is more relevant to project managers than developers

Rocko Bonaparte posted:

Somebody called you out on this but it doesn't mean you're wrong either. I'm doing some expeditionary stuff at work because it's politically vogue to try to hit some of our problems with some machine learning. When I heard about it, I first thought about what exactly they're trying to accomplish in even the most basic terms of inputs and outputs. Nobody really knows and that's a warning sign. But since I was the idiot that tried to use neuroevolution for stock market stuff a decade ago, I'm wading through it myself. I suspect there are machine learning solutions to these particular problems; my general take on if its possible is if I can model a situation and "see" a solution but it's particularly difficult to outright code the solution in a contemporary way. However, if I can code an assessment of success then I have a fitness function and "I'm halfway there" (fighting non-linearity in the model sounds like what will take the remaining 50% of effort until it expands to 99% of the effort...).

Going in another direction: are you implying a decision tree would not be machine learning? Or was that just a "X instead of Y" kind of answer? I agree with the idea but I'm testing the whole ecosystem using a domain I've learned in the past. So since I'm cozy with perceptrons, I figured I'd assess the different libraries based on my previous experience with them. This is showing me how goofy this stuff is and implies that if I naively go off and use something like the decision trees that I'm going to be wading into some muck that doesn't mesh up evenly with how it's taught as theory.

We're on the same page. I'm optimistic that ML and AI techniques will transform our world, and we're gradually getting there. In their current form, I'm not convinced ANNs, SVMs etc are good fits for many problems. There are exceptions, like image recognition. There's enough noise today that I'd guess a random mention of ML or AI (especially a press release, job description, business plan, resume etc) is full of it; I've updated my priors.

You can classify a decision tree as ML if you want, or not. It's an easy-to-grasp, but powerful tool for creating complex behavior.
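
To make that concrete, here's a sketch of a hand-written decision tree. The rules are hypothetical, and nothing here is learned from data (the learned kind is where the "is it ML?" question comes in):

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    action: str


@dataclass
class Node:
    # Ask one yes/no question about the input, then follow a branch.
    test: Callable[[dict], bool]
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


def decide(tree: Union[Node, Leaf], sample: dict) -> str:
    while isinstance(tree, Node):
        tree = tree.if_true if tree.test(sample) else tree.if_false
    return tree.action


# Hypothetical rules for deciding what to do with a finished job.
tree = Node(
    test=lambda s: s["exit_code"] == 0,
    if_true=Leaf("done"),
    if_false=Node(
        test=lambda s: s["attempts"] < 3,
        if_true=Leaf("retry"),
        if_false=Leaf("alert"),
    ),
)

outcome = decide(tree, {"exit_code": 1, "attempts": 1})
```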

If you'd like to get into ML, carefully consider why first. Would this have been appealing 5 years ago?

Dominoes fucked around with this message at 00:00 on Mar 12, 2021

Dominoes
Sep 20, 2007

I don't have a reason to draw a line; categorization is a tool you can apply to a problem. Maybe you have a reason to draw a line for DTs as ML or not.

In the same sense, choose a tool suitable for the problem you're working with. Maybe it's something categorized as ML. I reject choosing ML when it's the wrong tool.

Dominoes
Sep 20, 2007

I don't know how feasible this is for your use case, but I agree - Pandas can be a performance bottleneck. You're working with strings instead of numbers, but when working with numbers, using numpy arrays (which IIRC it wraps) is orders of magnitude faster.

Dominoes
Sep 20, 2007

Foxfire_ posted:

Your underlying problem is more that pandas in general is trying to optimize for person-time writing code at the expense of being very slow and using lots of RAM. Generally that's a good assumption, but not here if you have too much stuff and are going to run it a lot. pandas is also horrible at strings since the underlying actual things being stored is arrays of PyObject pointers to full python objects elsewhere.

8,000,000 rows x 50 cols isn't that much data. Like if each string is 64bytes, that's still only about 25GB, which is only going to take a second or two to go through if they're already in RAM.

I would:
- Move data to plain numpy arrays of string_ dtype (fixed length, not python objects)
- Do the fuzzy string match in numba. Levenshtein distance implementations are easily googleable
Ought to take less than a second to compute distance between the query and every element in a column

Wouldn't use a database since you won't be able to do the fuzzy matching on the database side, and you want to avoid having to construct python objects or run python code per element.
Agree. Now that I'm in a mood to ignore diplomacy: Pandas blows. The devs think we live in a post-computational-scarcity world. They're wrong.
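
For reference, the fuzzy match Foxfire_ describes boils down to something like this sketch. It's plain Python to show the logic only; the fast version would keep the same loops but compile them (eg with numba) and run over fixed-length numpy strings. The `closest` helper and sample strings are made up:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, keeping only one row.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def closest(query: str, column: list[str]) -> str:
    # Linear scan: one distance per element, no per-row DataFrame overhead.
    return min(column, key=lambda s: levenshtein(query, s))


best = closest("pyhton", ["python", "pandas", "numpy"])
```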
