Zoracle Zed
Jul 10, 2001
Anyone know if there's a library that does shell style $(foobar) string interpolation? There's so many goddamn articles about python string formatting it's impossible to search for.

Zoracle Zed
Jul 10, 2001
hah no fstrings are great I just literally need to interpolate $ strings. yes, it is for a very stupid reason

Zoracle Zed
Jul 10, 2001
I'm using htcondor on a compute cluster. It lets you submit jobs with some bespoke syntax like "input = input_$(Process).txt output=output_$(Process).txt queue 100" and it will launch 100 cpus and feed them the corresponding input and output files. There are very nice python bindings for interacting with the master condor daemon but because they're thin wrappers around the C library they're pretty low level and I'm picking away at a more pleasant high level abstraction. A problem is that all of the variable expansion is done inside the library and as far as I can tell it doesn't expose that functionality anywhere. In theory since you know most of the variable inputs and the format strings you should be able to reconstruct the expansion pretty easily but then I thought about handling nested parens and barfed a little

e: to clarify the variables that would be expanded are interior condor variables, not environmental -- so shelling out to bash wouldn't help

Zoracle Zed fucked around with this message at 05:21 on May 11, 2020

Zoracle Zed
Jul 10, 2001

QuarkJets posted:

I would be hesitant to try and duplicate the variable expansion myself in Python, if apparently it's being done inside the library; can you design your high-level interface to pass through strings without expansion and let the library do them as-necessary? I know you said that the library doesn't expose that functionality, what I mean is that you don't deal with the expanded variables at all, your high-level interface could just take care of any boilerplate issues with using the low-level wrappers.

What does htcondor do for those variable expansions? Is it all custom-coded crap or is it a commonly-used feature of the library that just isn't being exposed?

we have some function that takes too long:
Python code:
def poop(n):
    return Butt(n).poo_poo()
and some input data:
Python code:
from random import randint
n_turds = [randint(0, 10**6) for _ in range(100)]
it'd be real convenient if we could do
Python code:
shits = map_async_somehow(poop, n_turds)
shits.wait_all()
n_total_shits = sum(s.result() for s in shits) 
what you have to do instead is
Python code:
submit_to_condor(
    executable='python -c "print(poop($(n_turd)))"',
    data=[{'n_turd': n, 'my_id': i} for (i, n) in enumerate(n_turds)],
    log_file='log_$(my_id).txt',
    output='output_$(my_id).txt',
)

actual_log_files = [f'log_{i}.txt' for i in range(len(n_turds))]
while any(status(f) == 'RUNNING' for f in actual_log_files):
    time.sleep(1)

actual_output_files = [f'output_{i}.txt' for i in range(len(n_turds))]
n_total_shits = sum(int(open(f).readline()) for f in actual_output_files)
And part of abstracting away all that poo poo is doing the log_files and output files expansion. Most of the time you're just doing simple substitution so I could stop giving a poo poo and do a str.replace but in theory you can do more complicated substitutions and I figured if there's already a library to do it I might as well use it.
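For the simple-substitution case, here's a minimal sketch using a regex instead of str.replace. The helper name expand_simple is made up for illustration, and this is not how htcondor does its expansion: it only handles plain $(name) macros and deliberately ignores nested parens and anything fancier.

```python
import re

# Matches $(name) where name is a plain identifier; nested parens and
# any more complicated macro forms are deliberately NOT handled.
_MACRO = re.compile(r"\$\((\w+)\)")


def expand_simple(template, variables):
    """Replace each $(name) with str(variables[name]); leave unknown names untouched."""

    def _sub(match):
        name = match.group(1)
        if name in variables:
            return str(variables[name])
        return match.group(0)  # unknown macro: keep the original text

    return _MACRO.sub(_sub, template)


log_files = [expand_simple("log_$(my_id).txt", {"my_id": i}) for i in range(3)]
print(log_files)
```

Leaving unknown macros untouched means anything the sketch doesn't understand still gets passed through for the library to expand.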

Zoracle Zed
Jul 10, 2001

Genuinely good post.

At risk of completely ignoring all of the social context you're discussing and reverting to purely technical solutions: recently I had to do some unfortunate pythonpath wrangling and discovered that, if you're in a conda env,

Sh code:
conda env config vars set "PYTHONPATH=$(pwd):$PYTHONPATH"
will add the cwd to pythonpath only when the venv is active. (And conda'll print a little warning message when you activate the venv.) It's a nice intermediate between setting an envvar temporarily (and forgetting to redo it every time you restart your console), and stuffing it in your bashrc and forgetting to tell other people about it when you share code. Actually there's even support for it by conda-dev for putting it in your environment.yml file so it's explicitly specified.

Zoracle Zed
Jul 10, 2001
Anyone have any experience with making custom Conda forge recipes? There’s a library I’d like to use that has a couple of c++ dependencies that are available on Conda forge as Linux binaries but not windows, which is what I’m using. But all of these dependencies have appveyor scripts to ensure they build successfully on windows. So... it should be possible to wrap all of them up in a custom Conda recipe so I don’t have to build them manually myself, right?

God I hate dependency janitoring

Zoracle Zed
Jul 10, 2001

SurgicalOntologist posted:

I wrapped some external c++ dependencies in conda forge recipes like 4 years ago. It wasn't too hard although I didn't have to deal with any Windows bullshit. The recipe is just the instructions though, you still have to build the recipe somewhere.

Oh drat, I was under the impression that conda-forge did some CI server type thing that would actually build the recipe for me.

Zoracle Zed
Jul 10, 2001

Zoracle Zed posted:

Anyone have any experience with making custom Conda forge recipes? There’s a library I’d like to use that has a couple of c++ dependencies that are available on Conda forge as Linux binaries but not windows, which is what I’m using. But all of these dependencies have appveyor scripts to ensure they build successfully on windows. So... it should be possible to wrap all of them up in a custom Conda recipe so I don’t have to build them manually myself, right?

God I hate dependency janitoring

Commenting on my own post because this turned out to be easier than I thought it would and maybe it will be useful for someone.

A package foo at https://anaconda.org/conda-forge/foo has a recipe repo at https://github.com/conda-forge/foo-feedstock. (it's weird the conda-forge listing doesn't link to this.)

Fork the feedstock repo, remove
code:
skip: True  # [win]
from meta.yaml, and add a bld.bat to the recipe folder:

code:
mkdir build && cd build

cmake -G "NMake Makefiles"^
    -DCMAKE_BUILD_TYPE=Release^
    -DCMAKE_INSTALL_PREFIX=%LIBRARY_PREFIX%^
    -DCMAKE_PREFIX_PATH=%LIBRARY_PREFIX%^
    -DCMAKE_INSTALL_LIBDIR=lib^
    %SRC_DIR%

if errorlevel 1 exit 1

nmake
if errorlevel 1 exit 1

nmake install
if errorlevel 1 exit 1
Commit the changes, then run "conda smithy rerender -c auto" which does some magic with CI authorization tokens. Then submit a pull request and hope it builds successfully.

Kinda neat!

Zoracle Zed
Jul 10, 2001
xarray is pretty sick for multidimensional arrays where a 2d table just isn't enough

Zoracle Zed
Jul 10, 2001
imo any ad hoc bandaid for hosed up data like that is going to be an accelerating spiral of immiserating maintenance work--just say no!

Zoracle Zed
Jul 10, 2001
so... sorry if you already mentioned this, but you have proposed to the powers-that-be that they may want to reconsider their Supremely hosed Data Serialization Format?

Zoracle Zed
Jul 10, 2001
oh man, that's brutal. my sympathies

Zoracle Zed
Jul 10, 2001

Marx Was A Lib posted:

e: don't want to doublepost, however I just stumbled upon xlwings, this may have some promise.

I'm thinking now that if you're to have any hope of a maintainable solution, you're going to want to model the data source as correctly as possible as an Excel spreadsheet, rather than a text stream you're doing regexes on, or trying to find a quick intermediate conversion to pandas, etc. My primary concern would be that for any python/xls interface, you're inevitably going to run in to some insane excel quirks/bugs/corner cases, and ideally you'd fix those at the level of the xls interface library, avoiding as much as possible the writing of SFDSF-specific kludges. If xlwings is a commercial/closed-source product, this avenue might be closed to you. Just a thought!

Zoracle Zed fucked around with this message at 16:21 on Sep 4, 2020

Zoracle Zed
Jul 10, 2001

Dominoes posted:

The ability to let your IDE (like Pycharm), or a type checker (like mypy), check your types makes type hints a gamechanger for the language. Related: Check out dataclasses, which use them.

something that i wish i'd realized a long time ago: pass a list of dataclass elements to the pandas dataframe constructor and it just figures out the appropriate column names and types

Zoracle Zed
Jul 10, 2001
I'd recommend the grouper iterator but god drat it's annoying itertools has a "recipes" section in the documentation instead of just putting code in the library where it'd be useful
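For reference, this is roughly the classic grouper recipe from the itertools docs, copied into a function you'd have to paste into your own code. The sample input is just an illustration, since the original question isn't quoted here.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one with fillvalue."""
    # n references to the SAME iterator, so zip_longest pulls n items per tuple
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


chunks = list(grouper("ABCDEFG", 3, "x"))
print(chunks)
```

The trick is that all n arguments share one iterator, so each output tuple consumes the next n items.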

Zoracle Zed
Jul 10, 2001
just a minor suggestion that it's also cool & good to avoid the import convenience shuffling as long as possible and stick to slamming alt-enter in pycharm as long as possible. honestly even then i'd still prefer something siloed off like "from butts.quickstart import Butt, Fart, poo poo"

Zoracle Zed
Jul 10, 2001
a python tutorial in 2021 targeting py2 is... very odd.

Zoracle Zed
Jul 10, 2001
A couple of my favorite libraries implement nice _repr_html_ methods so objects print rich formatted representations in Jupyter Notebook. Whenever I've looked into their source code, there's a ton of artisanal hand-crafted html & css formatting. Anyone ever seen a library for doing that kind of thing automatically that handles composition? Like if Fart and Poop both implement _repr_html_, a Butt object that has fart and poop variables should just shove their _repr_html_s into a table or tree or something.
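To make the ask concrete, here's a rough sketch of the kind of composition I mean. compose_repr_html is a hypothetical helper, not something from an existing library, and the table layout is arbitrary.

```python
import html


class Fart:
    def _repr_html_(self):
        return "<b>fart</b>"


class Poop:
    def _repr_html_(self):
        return "<i>poop</i>"


def compose_repr_html(obj):
    """Build an HTML table from an object's attributes (hypothetical helper)."""
    rows = []
    for name, value in vars(obj).items():
        render = getattr(value, "_repr_html_", None)
        # use the child's rich repr when it has one, else escape its plain repr
        cell = render() if callable(render) else html.escape(repr(value))
        rows.append(f"<tr><th>{html.escape(name)}</th><td>{cell}</td></tr>")
    return "<table>" + "".join(rows) + "</table>"


class Butt:
    def __init__(self):
        self.fart = Fart()
        self.poop = Poop()
        self.size = 3

    def _repr_html_(self):
        return compose_repr_html(self)


butt_html = Butt()._repr_html_()
print(butt_html)
```

Because Butt itself exposes _repr_html_, a container holding Butt objects could reuse the same helper and nest tables.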

Zoracle Zed
Jul 10, 2001

Bundy posted:

Use sqlite or something god drat pandas has become the new excel

I am somehow angry at how true this is

Zoracle Zed
Jul 10, 2001
Anyone else noted how awful it is googling for anything python-related these days? SEO means the first couple pages of results are all geeksforgeeks.com and other awful beginner tutorial spam. (Which, like, even for beginners, seem bad.)

Anyway, here's my actual question: in matplotlib.pyplot.plot there's this concept of a format string, so 'r--' means a red dashed line, etc. Somewhere in matplotlib, I assume, there has to be a format string parser that takes in 'r--' and spits out {color: 'r', linestyle: '--'} or whatever. Point me in the right direction, please? Whenever I write convenience plotting methods I end up fighting to balance the convenience of the very-compact format string style and actual structured argument handling.

Zoracle Zed fucked around with this message at 06:32 on Mar 24, 2021

Zoracle Zed
Jul 10, 2001

DoctorTristan posted:

You want to know the kwargs alternative to the format string, or the location of the parser module? If the latter why not just call it with an invalid string and see where the exception gets thrown from?

the latter, and that's exactly what I needed, hah!

Zoracle Zed
Jul 10, 2001

former glory posted:

Thanks, both posts were really helpful. I'm going to attack it again later tonight with that methodology. I'll keep a snapshot of my entire venv once I finish up my current and first big python project to hopefully avoid this when I inevitably unbox it a while from now.

If you get stuck, post the requirements.txt and I'll take a stab at it (no promises), assuming "a good 7-8 libraries listed with specific versions" means that's the total number of requirements (as opposed to "7-8 of the 75 dependencies have versions pinned").

Zoracle Zed
Jul 10, 2001

punished milkman posted:

you can make this even easier with:

print(f"{x=}")

f-strings are awesome.

that's neat -- had to look it up for myself. it's new in 3.8 apparently
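A quick illustration of the = specifier, assuming Python 3.8 or newer. The variable names are arbitrary.

```python
x = 3
name = "butt"

# the text before "=" is echoed verbatim, then the repr of the value
plain = f"{x=}"
quoted = f"{name=}"
spaced = f"{x * 2 = }"  # whitespace around "=" is preserved

print(plain)
print(quoted)
print(spaced)
```

Note that it uses repr by default, which is why the string shows up with quotes.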

Zoracle Zed
Jul 10, 2001
Probably overkill for you because you're using bare dicts, but the generic support in mypy is pretty cool for composition like that

code:
from typing import TypeVar, Generic

class ReportType:  ...
class Foo(ReportType):  ...
class Bar(ReportType): ...
    
RT = TypeVar('RT', bound=ReportType)

class Report(Generic[RT]):
    def __init__(self, report_type: RT):
        self.report_type = report_type
then you can annotate report variable types as Report, Report[Foo], or Report[Bar], etc. Pycharm's type inference will autosuggest correct methods for report.report_type

Zoracle Zed
Jul 10, 2001
I like the way set operations work with dict keys:

code:
for color in (line.split() & count.keys()):
    count[color] += 1
edit: assuming you don't want to count duplicates, (just the original), like foxfire_ hinted
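Here's a self-contained version of the same idea, with made-up sample lines, showing that a color repeated within one line is only counted once because the intersection is a set.

```python
count = {"red": 0, "blue": 0}
lines = ["red red green", "blue red", "green"]

for line in lines:
    # list & keys-view gives a set, so each color counts at most once per line
    for color in line.split() & count.keys():
        count[color] += 1

print(count)
```

If you do want to count duplicates, iterate over line.split() directly and check membership in count instead.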

Zoracle Zed
Jul 10, 2001
that's what the walrus operator is for

code:
[n for name in names if (n := name.lower())[0] == "m"]
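Runnable version with a made-up names list (needs Python 3.8+ for the walrus). The point is that name.lower() is called once per name and the result is reused as the output element.

```python
names = ["Marx", "milkman", "Bundy", "Mypy"]

# (n := name.lower()) both binds n and feeds the [0] == "m" test
lowered_m_names = [n for name in names if (n := name.lower())[0] == "m"]
print(lowered_m_names)
```

One caveat: indexing with [0] raises IndexError on an empty string, so n.startswith("m") is the safer test if blanks are possible.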

Zoracle Zed
Jul 10, 2001
that's correct. I think you're looking for np.where(myArray>=100)[0]

Zoracle Zed
Jul 10, 2001
no idea what Folium is, but I'd guess you're running into a classic lambda variable binding gotcha:

Python code:
fs = [lambda: i for i in range(3)]

for f in fs:
    print(f())
prints "2, 2, 2"

one kludge is to bind the variable to an argument default value:

Python code:
fs = [lambda i=i: i for i in range(3)]
which prints 0, 1, 2

of course there's other, much more readable solutions, like:

Python code:
def make_fn(i):
    return lambda: i

fs = [make_fn(i) for i in range(3)]

Zoracle Zed
Jul 10, 2001
short answer: "".join(filter(str.isalnum, s)).lower()

longer answer:

filter accepts a function and an iterable. it applies the function to each element in the iterable, keeping the element if the applied function returns a truthy value. and iterating over strings iterates over their individual characters. so

Python code:
filter(str.isalnum, s)
is equivalent to:

Python code:
(char for char in s if char.isalnum())
then

Python code:
"".join(iterator_of_characters)
glues the individual characters back into a single string. which is the thing you wanted to apply lower to
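Putting the pieces together on a made-up sample string, with the filter and generator-expression spellings side by side:

```python
s = "Hello, World! 123"

# both spellings drop non-alphanumeric characters, then lowercase the result
via_filter = "".join(filter(str.isalnum, s)).lower()
via_genexp = "".join(char for char in s if char.isalnum()).lower()

print(via_filter)
```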

Zoracle Zed fucked around with this message at 00:34 on Sep 5, 2022

Zoracle Zed
Jul 10, 2001
Because DCLP3 isn't quoted, it's being interpreted as a variable, not a string. Ideally you want your database api to handle string escaping for you because 1) it's a huge headache and 2) the whole sql injection thing once you need to worry about malicious data, etc. Should look more like this (python's sqlite3 module uses ? as the placeholder, not %s):

code:
c.execute("INSERT INTO blood_glucose VALUES (?, ?, ?, ?)", ('DCLP3', row["SID"], row["Date_Time"], row["Value"]))
note there's also a pandas method for dumping a dataframe to sqlite: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

but... if you've already got the data in memory as a dataframe, are you sure you want/need to use a database at all, instead of working directly on the dataframe?
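A self-contained sketch of the parameterized insert against an in-memory database. The table schema and the sample rows are guesses for illustration, not your actual data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
# column names and types here are guesses for illustration only
c.execute("CREATE TABLE blood_glucose (study TEXT, sid INTEGER, date_time TEXT, value REAL)")

rows = [
    {"SID": 1, "Date_Time": "2020-01-01 00:00:00", "Value": 110.0},
    {"SID": 1, "Date_Time": "2020-01-01 00:05:00", "Value": 115.0},
]
for row in rows:
    # the driver quotes/escapes each value; no string formatting involved
    c.execute(
        "INSERT INTO blood_glucose VALUES (?, ?, ?, ?)",
        ("DCLP3", row["SID"], row["Date_Time"], row["Value"]),
    )
conn.commit()

inserted = c.execute("SELECT study, value FROM blood_glucose ORDER BY date_time").fetchall()
print(inserted)
```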

Zoracle Zed
Jul 10, 2001

SporkOfTruth posted:

if you're not a bit careful, you can fall into the trap of forgetting the yield order.

yes it seems very strange to ever be expected to remember the order, and accepting an argument to control the iteration order seems even weirder! I'd probably do something like yielding a namedtuple of (time, agent, source, outputs) or whatever so the user is free to later sort them however they'd like.
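Something like this sketch, where Record and simulate are stand-ins I made up and the values are fake. The generator yields in whatever order is natural and the caller sorts by field name afterwards.

```python
from typing import Iterator, NamedTuple


class Record(NamedTuple):
    time: float
    agent: str
    source: str
    outputs: float


def simulate() -> Iterator[Record]:
    """Stand-in for the real simulation; the values are made up."""
    yield Record(time=1.0, agent="b", source="x", outputs=0.5)
    yield Record(time=0.0, agent="a", source="y", outputs=0.7)
    yield Record(time=0.0, agent="b", source="x", outputs=0.1)


records = list(simulate())
# the caller picks the order after the fact, by field name rather than position
by_agent_then_time = sorted(records, key=lambda r: (r.agent, r.time))
order = [(r.agent, r.time) for r in by_agent_then_time]
print(order)
```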

Zoracle Zed
Jul 10, 2001

SporkOfTruth posted:

Controlling iteration order "makes sense" if you're an engineer who wants to plot/observe all of the simulated data before you run an experiment on it, and you're used to 3D arrays in Matlab where this would be possible. (I was told at one point "why isn't this a Numpy array".) Otherwise, yeah, it's a weird ask!

In that case even better than a numpy array (where you still need to remember the axis order) is an xarray array, with labeled axes.

Zoracle Zed
Jul 10, 2001
check out pep 671 - late-bound function argument defaults

Zoracle Zed
Jul 10, 2001

Jose Cuervo posted:

I keep having SQL questions which I cannot seem to locate the answer to:

just guessing but I bet your SQL driver is (correctly, imo) not interpolating the tss variable here:

code:
datetime(:start_dt, '+:tss hours')
because it's inside a string literal. try it something like this, (modulo whatever the appropriate string concatenation operator is for your db)

code:
datetime(:start_dt, STR_CONCAT('+', :tss, ' hours'))
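If the db is SQLite (an assumption on my part), the concatenation operator is ||, so a minimal check with python's sqlite3 module could look like this. The parameter values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
params = {"start_dt": "2020-01-01 00:00:00", "tss": 3}

# a placeholder inside quotes is just literal text, so :tss is never bound there;
# building the modifier with || keeps :tss a real parameter
(shifted,) = conn.execute(
    "SELECT datetime(:start_dt, '+' || :tss || ' hours')", params
).fetchone()
print(shifted)
```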

Zoracle Zed
Jul 10, 2001

C2C - 2.0 posted:

Here's the error I'm getting:
code:
AttributeError: 'NoneType' object has no attribute 'render'
music.html
Prior to this, I didn't have the def show(): function written and was receiving the same error. The function was lifted straight from the documents and I'm still getting the same error. The only thing I came across while searching is that the notebook defaults to True unless it's passed a False argument which seems to be rendered moot by the function itself.

big picture suggestion here for when you're asking for troubleshooting help: try first to simplify the problem as much as possible. here's a heavily simplified version that exhibits the same error:



notice how this no longer requires us to understand what's in your 'music.json' (that would've been my first guess for a NoneType attribute error), and including the stack trace helps narrow down the problem

anyway, after poking around a bit, I think

code:
g.show('music.html')
can be changed to

code:
g.show('music.html', notebook=False)
using notebook=True (the default) pyvis is attempting to construct some HTML representation for inline viewing in a jupyter notebook. With it off, it just writes the requested html file.

if you do want the inline plot, it seems like you need to specify that at Network initialization for the template to be set properly?

code:
from pyvis.network import Network
net = Network(notebook=True)
net.add_node('fart')
net.show('music.html')
I'm not a pyvis user but imo this would warrant a nice, polite issue submitting to their github or whatever, seems like an unnecessary beginner trap

Zoracle Zed
Jul 10, 2001

Jose Cuervo posted:

I am trying to parallelize (using the joblib module) some code which involves querying a database (the query I had issues with earlier).

first: sqlite (afaik) isn't great for parallel writing, so this is only worth doing if each task is only reading from the database.

second: the error you have there says "TypeError: cannot pickle 'sqlite3.Cursor' object". Can you see how to rearrange some things so the cursor object doesn't need to be shared between the jobs running in parallel?

Zoracle Zed
Jul 10, 2001
my #1 suggestion would be to start contributing to open source projects you're already using.

1) If it's a project you use and like, presumably the people who maintain it know a whole bunch of stuff about python you could learn
2) Having a pre-existing project to contribute to removes a lot of "where do I even begin?" blank-page syndrome
3) In addition to language-specific skills, getting better at source control and collaborative programming is useful everywhere
4) It can genuinely feel good to contribute -- you might make the world an infinitesimally better place!

Oysters Autobio posted:

With Google I realized that I just keep finding the same SEO-padding blogs that actually say next to nothing in terms of actually useful tutorials.

this poo poo sucks so bad and it's depressing how it's never going to get better

Zoracle Zed
Jul 10, 2001

Oysters Autobio posted:

I've wanted to do this but feeling a bit intimidated and overly anxious (which is silly now that I say it out loud, I know) about looking like an idiot when submitting a MR. Worried my Python abilities aren't "meta" enough for actually contributing it somewhere. But, you're right actually. Plus if it's something I think I could use at work then it actually might be significantly easier to try and build in a couple features I might find useful into an existing codebase rather than trying to re-make whatever edge-case and being intimidated by the whole initial setup.

yeah! don't forget you can dip your toes in the water by adding or improving documentation, something that pretty much every project ever will welcome with open arms. I like to start contributing to a new project with small PRs anyway, it's nice to feel out the maintainers a bit -- how fast they respond, how detailed their feedback, how stringent they are on formatting issues, etc.

Zoracle Zed
Jul 10, 2001
My first guess is something like

Python code:
from dataclasses import dataclass
from typing import Iterator, Tuple

@dataclass
class TestParams:
    noise: float
    rate_tolerance: float


def query(...) -> Iterator[Tuple[int, TestParams]]:
    """yields (channel_id, TestParams) tuples"""

    yield 17, TestParams(noise=0.07, rate_tolerance=0.001)
    yield 18, TestParams(noise=0.08, rate_tolerance=0.002)
but tbh I don't really understand your data model.

Zoracle Zed
Jul 10, 2001

FISHMANPET posted:

I'm writing a Python module, and I've gotten myself stuck into some circular dependencies. I've found some "basic" stuff about fixing that, but I seem to have gotten myself into a deeper pickle. Basically, I have a User class, and that User class has methods that will return a Group. And I have a Group class that has methods that will return a User. And I'm trying to define these in separate files, otherwise this is gonna be some massive 2000 line single-file module. I've finally just "thrown in the towel" so to speak and stopped importing at the top of the file, and instead am importing only when a function is called, but that doesn't feel great either.

Foxfire_ posted:

For the rest of the situations, local imports like you're doing are the typical workaround.

I usually do the local import like Foxfire_ suggests, especially if they can be constrained to a single method that 'converts' between the two types. The other option is, instead of having group.py 'from user import User', just do 'import user' and then refer to user.User when it's needed. That way you're not referencing any module members at the top level and avoid the circular import problem.
