Twerk from Home
Jan 17, 2009

I want to improve my Python a bit because I've started a new job where I'd like to be reasonably proficient in it, but I won't be writing Python day in, day out. So I figured I'd make a little toy application with it, like I've been doing for years in Java, JavaScript, or Go.

I'm wanting to do a little OAuth-driven web frontend around https://github.com/sgratzl/slack_cleaner2. I'm thinking I want sessions to store the user-specific Slack API credentials I get during the OAuth flow, plus background processing to actually make the many, many Slack API calls needed to crawl and delete messages of a certain type. The high-level flow I'm thinking of is:
  • Link to Slack via OAuth so the app gets the access it needs and learns who the user is
  • User inputs how they want to run the cleaner
  • slack_cleaner runs in the background until it's done cleaning (rough sketch after this list)
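
Roughly the shape I have in mind, as a minimal sketch. exchange_code_for_token() and run_cleaner() are placeholder stubs I made up, not Slack's real API:

code:
import threading
from flask import Flask, redirect, request, session

app = Flask(__name__)
app.secret_key = "change-me"  # signs the session cookie

def exchange_code_for_token(code):
    # placeholder stub for the real OAuth code-for-token exchange
    return "xoxb-fake-token"

def run_cleaner(token):
    # placeholder stub for the actual slack_cleaner2 crawl-and-delete work
    pass

@app.route("/oauth/callback")
def oauth_callback():
    session["slack_token"] = exchange_code_for_token(request.args["code"])
    return redirect("/clean")

@app.route("/clean", methods=["POST"])
def clean():
    # fire-and-forget, NodeJS-promise style; the work dies if the process does
    threading.Thread(
        target=run_cleaner, args=(session["slack_token"],), daemon=True
    ).start()
    return "cleaning started"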

This is a toy to me: I want it deployed in a way where it'll work for me, but I don't need persistent storage. If I were doing this in Java, I'd use Spring Boot with in-memory sessions and async processing on a thread pool for calling Slack's API. In NodeJS, I'd freehand "sessions" in a shared data structure, have the background jobs just be promises whose results I never check, and leave it as a single process for my expected levels of load.

I'm starting off sketching this out in Flask, but wondering if I should go for full-fat Django just to get easier sessions, although it looks like Django is going to demand a real database. From what I can tell, doing what I want in Python is going to need some kind of external in-memory session storage (Redis?), some kind of external task queue (Celery?) to manage the background processing, plus multiple Python processes for the web tier and multiple Python processes for the async work. I'm probably putting this behind nginx because I know nginx extremely well and that's how all my other toys are deployed, but now I'm having to figure out uWSGI vs Gunicorn too.

Am I reaching for too many tools here? How could I simplify this and do this with fewer parts and processes in Python-land? I could also just implement the worker that consumes tasks from a task queue and bang out the OAuth login flow & sessions in Java or JS in hours, but I'm doing this intentionally to improve my Python.


Twerk from Home
Jan 17, 2009

I appreciate the feedback from both of you! I need to handle OAuth2 bot access tokens, so I need persistent server-side state somewhere beyond just a task queue, and I'm also hoping to host this on the same home server that already hosts a dozen of my previous bad ideas, so I'm going to go ahead and use uWSGI, nginx, Celery, and Redis.

I've got Celery and Flask set up and working together decently, and it's fine for local development, but actually deploying this means I now have three new services (uWSGI, Redis, Celery), and I'm not at all confident in my uWSGI or Celery config. I guess I'll just peek once in a while to make sure nothing's crashing.
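
For reference, the kind of minimal uwsgi.ini I'm running with; the module and socket paths are specific to my setup, so treat it as a sketch rather than known-good config:

pre:
[uwsgi]
; load the Flask object named "app" from app.py
module = app:app
master = true
processes = 4
; nginx proxies to this unix socket
socket = /tmp/slack-cleaner.sock
chmod-socket = 660
; remove the socket on exit, and die cleanly on SIGTERM
vacuum = true
die-on-term = true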

I did give up on figuring out the correct way to use the Flask application factory while still letting Celery be declared globally so that I could decorate functions with @celery.task, and to my great shame I just have everything in one file right now. It looks like the post below would be a decent way to do it, but right now I need the app to be created before configuring Celery, and I need celery available globally, so I'm not using the create_app() factory; I'm just creating the app myself manually.

https://blog.miguelgrinberg.com/post/celery-and-the-flask-application-factory-pattern
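
If I ever untangle it, the pattern from that post boils down to roughly this (a condensed sketch, not a drop-in config):

code:
from celery import Celery
from flask import Flask

# declared globally, so @celery.task decorators work at import time
celery = Celery(__name__)

@celery.task
def clean_messages(token):
    ...  # the actual slack-cleaning work

def create_app():
    app = Flask(__name__)
    app.config["CELERY_BROKER_URL"] = "redis://localhost:6379/0"
    celery.conf.broker_url = app.config["CELERY_BROKER_URL"]

    # make every task run inside the Flask app context
    class ContextTask(celery.Task):
        def __call__(self, *args, **kwargs):
            with app.app_context():
                return self.run(*args, **kwargs)

    celery.Task = ContextTask
    return app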

Twerk from Home
Jan 17, 2009


Epsilon Plus posted:

It sounds like you've determined Django would be absolute overkill for your needs, but it's worth noting for future reference that a django-admin created barebones project is configured with a sqlite3 database, no muss no fuss.

Actually, I am entirely regretting my decision and should have gone with Django all along. I get along well with the Spring project in Java-land; I have no idea why I didn't go with the heaviest-weight, most batteries-included Python framework.

Can Django use that SQLite database out of the box for background task processing?

Twerk from Home
Jan 17, 2009


Data Graham posted:

Sure, it's just another database. The ORM abstracts away nearly all distinctions.

(Yes there are some caveats and DB-specific hacks you can do, but they're not really relevant here)

Thanks. I'm guessing something like this would be the solution that one would reach for on Django: https://django-background-tasks.readthedocs.io/en/latest/

I still seem to need another service to manage the background-processing Python process, vs being able to use the same pool of Python processes to serve web requests and poll for background jobs. I'm probably not going to change course because I did get this working on Flask / Celery / Redis, but it's good to know.
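
From skimming those docs, usage looks roughly like the sketch below (notify_user is a made-up example task): tasks are stored as rows in the database, and a separate python manage.py process_tasks process polls for them, which is the extra service I mean.

code:
from background_task import background

@background(schedule=60)  # run at least 60 seconds from now
def notify_user(user_id):
    ...  # whatever the task actually does

# calling it just inserts a task row; `manage.py process_tasks` executes it
notify_user(some_user_id)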

Twerk from Home
Jan 17, 2009

I could use a hand figuring out where setuptools gets its default configuration from. I'm failing to pip install uwsgi on an Apple Silicon Mac in a Python 3.9.9 venv.

I have an Apple Silicon Mac that has conda installed for some projects, while others are managed with pip and venv. I can create a venv entirely outside of Anaconda by calling a non-Anaconda Python executable installed with pyenv, and then use that venv's pip to install libraries.

The problem I'm having with some packages is that, for some reason, setuptools inside the venv tries to link against the Anaconda library path, which doesn't have arm64 libraries, like this:

pre:
/opt/anaconda3/lib/libz.dylib, building for macOS-arm64 but attempting to link with file built for macOS-x86_64
Looking at the pip log, everything runs in the venv appropriately except the link step. I have no idea what configuration is causing Python to look in the Anaconda path here, because everything should either be in the virtual environment or linking against system libraries.

It's invoking the linker with:
pre:
-L/Users/me/opt/anaconda3/lib

Twerk from Home
Jan 17, 2009


QuarkJets posted:

When you say "installed with pyenv", could that have come from an anaconda environment?

I don't know much about Mac but I know that it's *nix-y. In *nix your LD_LIBRARY_PATH dictates what libraries are visible to you. Likewise, PATH shows access to directories with executables in them. Check your LD_LIBRARY_PATH and PATH and see if you have an anaconda3 path in there. Then recursively grep for anaconda3 in the directory that holds your venv and see what comes up.

Or just skip doing any of that poo poo and instead grab a mambaforge installer and create a conda environment wherever you need for whatever packages you want, abandoning whatever janky workflow has led to this situation.

My continued debugging efforts are below, but I've got a quick question too: how do people do local development of a Celery application, given that Celery is a separate process? I've got a local development start script, but this feels pretty janky and I bet there's a better way to do this:

pre:
#!/bin/sh

export FLASK_ENV=development

# run the Celery worker in the background, and stop it when this script exits
celery -A app.celery worker &
WORKER_PID=$!
trap 'kill $WORKER_PID' EXIT

flask run

I really appreciate the suggestion and am checking out Mamba, but I'm still stuck at this same failure state. Should I just give up on pip and only use packages offered through the conda infrastructure? I'm only using pip for this one because I'm much more familiar with doing CI / deployment of web services using pip with requirements.txt.

My full setup is:
  • pyenv installed from brew
  • conda installed with anaconda desktop distribution initially, now trying mamba
  • Conda environments used for stuff the scientists give me
  • Python versions installed with pyenv used for things I want to manage myself

To create a virtual environment, I'm directly invoking a Python version installed with pyenv like this:
pre:
 ~/.pyenv/versions/3.9.9/bin/python -m venv env
LD_LIBRARY_PATH is not set at all. I don't think this is an LD_LIBRARY_PATH problem, because when I run pip install uwsgi or pip install -r requirements.txt I see the output below, which is why I came to the Python thread. Pip installing it compiles from source, and we see clang specifically being called with: -L/Users/me/opt/anaconda3/lib

I have no idea where that is coming from, but I feel like it must be coming from setuptools, because the flag shows up in the build command that pip generates:

pre:
    *** uWSGI linking ***
    clang -o /Users/me/src/me/slack-deleter-pro/env/bin/uwsgi  core/utils.o core/protocol.o core/socket.o core/logging.o core/master.o core/master_utils.o core/emperor.o core/notify.o core/mule.o core/subscription.o core/stats.o core/sendfile.o core/async.o core/master_checks.o core/fifo.o core/offload.o core/io.o core/static.o core/websockets.o core/spooler.o core/snmp.o core/exceptions.o core/config.o core/setup_utils.o core/clock.o core/init.o core/buffer.o core/reader.o core/writer.o core/alarm.o core/cron.o core/hooks.o core/plugins.o core/lock.o core/cache.o core/daemons.o core/errors.o core/hash.o core/master_events.o core/chunked.o core/queue.o core/event.o core/signal.o core/strings.o core/progress.o core/timebomb.o core/ini.o core/fsmon.o core/mount.o core/metrics.o core/plugins_builder.o core/sharedarea.o core/rpc.o core/gateway.o core/loop.o core/cookie.o core/querystring.o core/rb_timers.o core/transformations.o core/uwsgi.o proto/base.o proto/uwsgi.o proto/http.o proto/fastcgi.o proto/scgi.o proto/puwsgi.o core/zlib.o core/regexp.o core/routing.o core/yaml.o core/xmlconf.o core/dot_h.o core/config_py.o plugins/python/python_plugin.o plugins/python/pyutils.o plugins/python/pyloader.o plugins/python/wsgi_handlers.o plugins/python/wsgi_headers.o plugins/python/wsgi_subhandler.o plugins/python/web3_subhandler.o plugins/python/pump_subhandler.o plugins/python/gil.o plugins/python/uwsgi_pymodule.o plugins/python/profiler.o plugins/python/symimporter.o plugins/python/tracebacker.o plugins/python/raw.o plugins/gevent/gevent.o plugins/gevent/hooks.o plugins/ping/ping_plugin.o plugins/cache/cache.o plugins/nagios/nagios.o plugins/rrdtool/rrdtool.o plugins/carbon/carbon.o plugins/rpc/rpc_plugin.o plugins/corerouter/cr_common.o plugins/corerouter/cr_map.o plugins/corerouter/corerouter.o plugins/fastrouter/fastrouter.o plugins/http/http.o plugins/http/keepalive.o plugins/http/https.o plugins/http/spdy3.o plugins/signal/signal_plugin.o plugins/syslog/syslog_plugin.o plugins/rsyslog/rsyslog_plugin.o plugins/logsocket/logsocket_plugin.o plugins/router_uwsgi/router_uwsgi.o plugins/router_redirect/router_redirect.o plugins/router_basicauth/router_basicauth.o plugins/zergpool/zergpool.o plugins/redislog/redislog_plugin.o plugins/mongodblog/mongodblog_plugin.o plugins/router_rewrite/router_rewrite.o plugins/router_http/router_http.o plugins/logfile/logfile.o plugins/router_cache/router_cache.o plugins/rawrouter/rawrouter.o plugins/router_static/router_static.o plugins/sslrouter/sslrouter.o plugins/spooler/spooler_plugin.o plugins/cheaper_busyness/cheaper_busyness.o plugins/symcall/symcall_plugin.o plugins/transformation_tofile/tofile.o plugins/transformation_gzip/gzip.o plugins/transformation_chunked/chunked.o plugins/transformation_offload/offload.o plugins/router_memcached/router_memcached.o plugins/router_redis/router_redis.o plugins/router_hash/router_hash.o plugins/router_expires/expires.o plugins/router_metrics/plugin.o plugins/transformation_template/tt.o plugins/stats_pusher_socket/plugin.o -lpthread -lm -lz -L/Users/me/opt/anaconda3/lib -lpcre -lexpat -lintl -ldl -framework CoreFoundation /Users/me/.pyenv/versions/3.9.9/lib/python3.9/config-3.9-darwin/libpython3.9.a
    ld: warning: ignoring file /Users/me/opt/anaconda3/lib/libz.dylib, building for macOS-arm64 but attempting to link with file built for macOS-x86_64
    ld: warning: ignoring file /Users/me/opt/anaconda3/lib/libpcre.dylib, building for macOS-arm64 but attempting to link with file built for macOS-x86_64
    ld: warning: ignoring file /Users/me/opt/anaconda3/lib/libexpat.dylib, building for macOS-arm64 but attempting to link with file built for macOS-x86_64
    ld: warning: ignoring file /Users/me/opt/anaconda3/lib/libintl.dylib, building for macOS-arm64 but attempting to link with file built for macOS-x86_64
    Undefined symbols for architecture arm64:
      "_XML_ErrorString", referenced from:
          _uwsgi_xml_config in xmlconf.o
      "_XML_GetCurrentLineNumber", referenced from:
          _uwsgi_xml_config in xmlconf.o

I feel like I'm getting so, so much closer. The libraries that I need are in /opt/homebrew/lib, because I've installed them, or something that uses them, with Homebrew at some point. I ended up working around this by installing uwsgi from Homebrew instead of using pip, but I'm worried this issue will crop up again whenever a package needs to link against native libraries.

This would be solved if Anaconda distributed arm64 or universal libraries instead of x86-only ones; then the build could actually link against the libraries that Anaconda has.

Twerk from Home
Jan 17, 2009


QuarkJets posted:

I think that you should try using mamba to create the environment instead of pyenv. Then activate the environment and use "mamba install X" or "pip install Y" to install things. You should not touch any other version of pip or conda than what exists from within your environment

Or define a yaml file defining the specific packages and version requirements, and create the entire environment with "mamba env create" while specifying the yaml file. You can put a pip block in conda yaml files, anything within the block will be pip installed, everything else will be mamba installed.

I think doing the above should solve this problem and let you pip install that package, but the specific problem was that pip couldn't build a wheel, right? Conda/mamba use prebuilts, so just using mamba to install whatever package you're stuck on is also something you can try. Personally I don't use pip for anything unless I absolutely have to

Thanks, that was actually the first thing that I tried. Mamba/conda only install x86 binaries and libraries, and I hit a different problem at runtime rather than build time: it was trying to link conda-managed x86 binaries against system-managed arm64 libraries, which failed too.

I was using pip because pip is able to load / link arm64 binaries, libraries, and python runtimes, which conda / mamba don't seem to do at all. I'll try just using uwsgi from conda-forge again: https://anaconda.org/conda-forge/uwsgi

Twerk from Home
Jan 17, 2009

Hey, I wanted to come back and thank this thread for pointing me to better tools, although my environment is still a complete mess.

So far I got rid of the Anaconda environment and installed miniforge instead. I'm using mamba instead of conda in the base environment, and now my system isn't trying to link against x86 libraries loaded from Anaconda anymore.

However, before being able to pip install uwsgi, I had to set LDFLAGS to manually tell the linker where to find one library. There's a long-running GitHub issue about this: https://github.com/unbit/uwsgi/issues/2361 . I hope that somebody with a detailed understanding of how setuptools works can explain why the setup.py is able to find some libraries installed through Homebrew and manually pass them to the linker, but not others.
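
Concretely, the workaround was along these lines (Homebrew's library directory is /opt/homebrew/lib on Apple Silicon; adjust to taste):

pre:
LDFLAGS="-L/opt/homebrew/lib" pip install uwsgi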

My honeymoon with miniforge / mamba also ended when I couldn't get jupyter notebooks installed just by doing mamba install jupyter, and had to hack in a symlink to make the newer version of a library appear to be an older one: https://github.com/conda/conda/issues/9038.

My overall impression of anaconda / miniforge / conda / mamba is that this is all just a barely working duct-taped together disaster.

Twerk from Home
Jan 17, 2009

I'm using Python to tear through some data files, and un-gzipping them in Python is pretty slow. What's a sane pattern for quickly un-gzipping multiple large files to get them into a single Python process?

I've read http://www.dalkescientific.com/writings/diary/archive/2020/09/16/faster_gzip_reading_in_python.html, and it looks like one faster approach to un-gzipping in Python is https://github.com/pycompression/xopen, which launches an external native gzip or pigz, directs its output to a pipe, and reads from the pipe in Python, getting both better native performance and free parallelism, because the unzip now happens in a separate process.

Does anybody have sane ideas for how I could set this up to uncompress multiple files at once and get them all into Python in an organized way? I'm used to being able to do simultaneous multi-threading in Java or C# and can't figure out a Pythonic pattern here.

Twerk from Home
Jan 17, 2009


Electoral Surgery posted:

Use multiprocessing's Pool to kick it off once per file, then wait for all the jobs in the pool to finish?

My understanding is that it'd be slow-ish to copy data back from a process, especially if it's potentially enormous. I'm assuming you're saying to decompress each file in a separate process, then bring it all back to the parent process. I need all of the data from all of the files in one Python process to be able to reassemble it. That's a good basic improvement that I had missed, though!
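
For the record, this is the shape I understood the suggestion as; a sketch with made-up file names, where the catch is that every returned buffer gets pickled and copied back to the parent:

code:
import gzip
from multiprocessing import Pool

def decompress(path):
    with gzip.open(path, "rb") as f:
        return f.read()  # this whole buffer is pickled back to the parent

if __name__ == "__main__":
    paths = ["a.gz", "b.gz", "c.gz"]
    with Pool() as pool:
        blobs = pool.map(decompress, paths)  # one file per worker process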

Twerk from Home
Jan 17, 2009

Reporting back to say that for my specific use case, xopen is a huge winner: https://github.com/pycompression/xopen

  • It creates a new process to decompress each compressed file.
  • It uses the Intel gzip decompressor, the fastest in the land (unless you have a non-Intel CPU).
  • Its authors got a change to allow bigger pipes merged back into CPython to make their specific use case even faster: https://github.com/python/cpython/pull/21921

This is absolutely perfect for having five 60GB gzips open at once!
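
Usage ends up looking roughly like this sketch (file names made up). Each xopen() call spawns its own external decompressor process, so plain Python threads are enough to keep them all busy:

code:
from concurrent.futures import ThreadPoolExecutor
from xopen import xopen

def read_lines(path):
    # xopen pipes the file through an external gzip/pigz process
    with xopen(path, "rt") as f:
        return [line.rstrip("\n") for line in f]

paths = [f"part{i}.gz" for i in range(5)]
with ThreadPoolExecutor(max_workers=len(paths)) as ex:
    results = list(ex.map(read_lines, paths))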

Twerk from Home
Jan 17, 2009


punk rebel ecks posted:

Wait so I read somewhere that Troika Games coded their titles in Python?

So "Vampire: The Masquerade - Bloodlines", "Arcanum", and "Temple of Elemental Evil" were all coded in Python and not like some form of C like almost every other game?

I would bet they're using Python for game logic; nobody is insane enough to do a 3D engine in Python.

Games tend to have a more comfortable language for logic, whether something custom like UnrealScript in Unreal Engine 1-3 or a more widely used language like Lua.

EVE Online used Stackless Python for their game logic. I don't know if they still do, but I wouldn't be surprised if it's some horrifying variant.

Twerk from Home
Jan 17, 2009


punk rebel ecks posted:

I want to learn C++ once I have fully learned Python and Java. Would it be best to learn C first, before I learn C++?

Trying to take all language fanboyism out of it: the corporate world runs on Java (and Java-likes such as C# and Go). Java and similar languages are a very powerful, flexible tool, and you see them everywhere because they're an appropriate tool for a wide variety of problems. Java itself feels old, lame, boring, occasionally frustrating, and verbose, but it's also battle-proven for a whole host of situations that Python, PHP, NodeJS, and others are not as good a fit for, or can't even reach.

Old companies, companies with enormous scale, or companies with high performance needs (games) will have a ton of C++ too. It will take more effort to do the same thing to the same level of safety and robustness in C++ than it will in Java/C#. That said, if you need memory efficiency and low latency, sometimes C++ is the tool for the job. It has so many sharp edges and hidden traps that some environments with low-latency needs choose to write a really bizarre style of zero-allocation Java instead of doing C++. High-frequency trading famously does this, running Java with the GC off.

There are a lot of languages competing to replace C++ with something safer, the biggest being Rust and the buzziest at this instant being Zig. Rust is seeing real adoption in places that have needed C++ in the past, like browsers. I'm not very experienced with Rust personally, but my understanding is that knowing C++ well is not necessarily useful for being effective in Rust. However, for all of the languages competing to be the next Java, I think that knowing Java well does help. I'm talking about Kotlin and Scala, or even C#.

I think that learning C is worth the effort. C itself is very small, and a separate language from modern C++. Also, when you do interop between languages, it will pretty much always be via C FFI, so if you're gluing together Rust and C++ code, you're probably doing it via C! Extending Python will always be via C FFI, and the same goes for Java's JNI.
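
To make that concrete, here's about the smallest possible example of Python speaking C over FFI, calling cos() from the system C math library via the stdlib ctypes module:

code:
import ctypes
import ctypes.util

# load the C math library and declare cos()'s C signature
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

print(libm.cos(0.0))  # 1.0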

So yes, that's a very long-winded way to say "I think that you should learn C first". Not because it helps you learn C++, but because C is the lingua franca that all of these tools use to talk to each other at the point where you're gluing stuff together. And if you aren't heading somewhere that needs C++, there's a reason Rust has a cult of people saying "use Rust all the time everywhere": Rust has fewer downsides than C++ in most situations.

Twerk from Home
Jan 17, 2009

I've got two stupid Python questions today:

Does numpy fall back on a pure Python implementation if something goes wrong with loading the native extensions? I got thrown a Python 2 script today where someone said "this is insanely slow, we've been using it but it takes 4-8 days to run" and asked if I could speed it up. I ran 2to3 and then reindent over it, then updated its dependencies from numpy 1.16 to 1.23.3. Then when I went to profile it and figure out why it was slow, it finished in less than a minute on the same dataset that had previously taken a week. I tossed it back to the user and they're all happy with the new Python 3 version, but I want to know: why the gently caress did porting Python 2 to 3 and updating numpy make it 10,000x faster?! Yes, the results are identical between the two. The only thing I can possibly think of is that it was using a broken version of numpy, but 1.16 isn't even that old.
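
If anyone wants to reproduce my confusion: as far as I can tell numpy has no pure-Python fallback (a failed extension import just raises), so a reasonable first diagnostic is checking what each environment's numpy was actually built against:

code:
import numpy as np

print(np.__version__)
np.show_config()  # prints the BLAS/LAPACK setup numpy was built against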

Also, is there a better way to indicate direct vs transitive dependencies in a requirements.txt file? The project only has a handful of direct dependencies, but because I'm checking in a full pip freeze rather than just the scipy>=0.18,<0.19, numpy>=1.16,<1.17 that was there before, I now have all of the transitive dependencies in requirements.txt as well as the direct ones.

Edit: and while I'm here, what's the best of the modern build-system backends? Hatch/hatchling? poetry-core? Which one do you like?

Twerk from Home fucked around with this message at 05:48 on Oct 12, 2022

Twerk from Home
Jan 17, 2009

It looks like projects using pyproject.toml declare their build requirements, wheel included, in the [build-system] block: https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/#fallback-behaviour

If I have a package that's using the legacy setup.py, is there a way to make it use wheels when installing? When I pip install my package, I'm seeing a whole lot of:
"Using legacy 'setup.py install' for $DEPENDENCY, since package 'wheel' is not installed."

I know I could hack around this by installing wheel manually in the environment before running pip install, but there must be a way to tell everyone's pip to use wheels by default, because I've seen other packages do it.
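
Going by the linked pip docs, the declarative fix seems to be dropping a pyproject.toml next to the legacy setup.py; once a [build-system] table is present, pip uses the modern build path and installs build requirements like wheel into the isolated build environment automatically:

pre:
[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"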

Twerk from Home
Jan 17, 2009


Jose Cuervo posted:

I have a large amount of blood glucose time series data from here: https://public.jaeb.org/datasets/diabetes. Each row of the time series data has at least 4 columns - an identifier of which trial the data belongs to, an identifier of which subject from the trial the data belongs to, the date and time of the blood glucose reading, and the actual blood glucose value - and conservatively there are over 20 million rows.

Given a list of subjects and a date time for each subject, I would like to be able to quickly obtain the next 2 hours of blood glucose data for each of those subjects, and I will be performing this query repeatedly.

I have no real experience with databases, but given the size of the data it seems like it would be best to store this data in a database and then run queries to extract the data I want.

Is a Sqlite database with a single table the best way to store this data given the queries I want to run? The single table would have four columns (trial ID, subject ID, date time, and blood glucose value).

SQLite with a multi-column index on subject ID and timestamp should work well. If you expect to query by timestamp alone rather than by subject alone, put timestamp first in the index instead.

https://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys
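
A sketch of what that looks like with the stdlib sqlite3 module; table and column names are made up to match the description above:

code:
import sqlite3

conn = sqlite3.connect("glucose.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS readings (
    trial_id   TEXT,
    subject_id TEXT,
    taken_at   TEXT,   -- ISO-8601 text sorts chronologically
    glucose    REAL
);
-- the multi-column index: subject first, then timestamp
CREATE INDEX IF NOT EXISTS idx_subject_time
    ON readings (subject_id, taken_at);
""")

# the next 2 hours of readings for one subject from a given start time
start = "2020-01-01 08:00:00"
rows = conn.execute(
    """SELECT taken_at, glucose FROM readings
       WHERE subject_id = ?
         AND taken_at >= ?
         AND taken_at < datetime(?, '+2 hours')
       ORDER BY taken_at""",
    ("subj-001", start, start),
).fetchall()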

Twerk from Home
Jan 17, 2009


Mr. Nemo posted:

Hello, I'm trying to scrape forum threads as a personal project, so far I got this to get information from each post in a thread

a) User
b) Date
c) The text

code:
# Assumes a cleanhtml() helper (e.g. a regex tag-stripper) defined elsewhere.
import requests
import pandas as pd
from bs4 import BeautifulSoup

PythonThread="https://forums.somethingawful.com/showthread.php?threadid=3812541&userid=0&perpage=40&pagenumber="
PostsDictionary={"User_Name":[],"Date":[],"Post_Content":[],"Post_Words":[]}
postsperpage=42 #We know there are 40 user posts in each page, and 2 ads which are counted as posts
for i in range(100): #the 100 is just a place holder
    j=i+1
    print("Scraping page "+str(j))
    Response=requests.get(PythonThread+str(j))
    pageTemp=BeautifulSoup(Response.text,"html.parser") #name the parser explicitly
    for k in range(postsperpage):
        username=str(pageTemp.find_all("td",class_="userinfo")[k].find("dt"))
        usernameclean=username[username.find(">")+1:username.find("<",1)]
        if usernameclean!="Adbot": #name of the user that posts the ads previously mentioned
            PostsDictionary["User_Name"].append(usernameclean)
            date=str(pageTemp.find_all("td",class_="postdate")[k])
            dateofpost=pd.to_datetime(str(date[date.rfind("a>")+3:date.rfind("a>")+21]))
            PostsDictionary["Date"].append(dateofpost)
            text=str(pageTemp.find_all("td",class_="postbody")[k])
            text=cleanhtml(text)
            text=text.replace("\n","")
            PostsDictionary["Post_Content"].append(text)
            PostsDictionary["Post_Words"].append(len(text.split()))
It works, and I end up with a dictionary I can then transform into a DataFrame to analyze individual user data, create a .doc with the thread, etc.

But it's a bit slow. Any suggestions on how it could be improved?

Have you measured how much of the slowness is making the request and waiting for a response vs. your parsing code? If it's your code that's slow, which would be a little surprising, one immediate thought is that len(text.split()) is a fairly slow way to count words.

Twerk from Home
Jan 17, 2009


Mr. Nemo posted:

I have no idea how to do that to be honest, how do you suggest I go at it? I took a 6 class python course and this is my final project (including some space filler plots and other stuff). I guess I could run it twice for 100 pages with and without that word counter line and see if there's a noticeable difference.

Also the way I'm getting the name and dates is VERY manual, because I haven't had time to really dive into the HTML stuff beyond finding a "large" tag, that may be adding some time?

The simplest way to test whether the request is the slow part would be to download the HTML for a page into a file, read from that instead, and compare how long the whole thing takes reading from the file vs. actually making the request. If you want to make the whole thing faster regardless of why it's slow, just run lots of Python processes, but don't be surprised when your IP gets blocked from hitting SA because you made a bunch of automated requests.

If you want to actually measure why the whole thing is slow, break out a profiler. cProfile and snakeviz would be a decent place to start, but scalene is nicer if you fire it up.

https://github.com/plasma-umass/scalene
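
A quick-and-dirty version of that measurement, before reaching for a real profiler (the URL is just an example page):

code:
import time

import requests
from bs4 import BeautifulSoup

url = ("https://forums.somethingawful.com/showthread.php"
       "?threadid=3812541&perpage=40&pagenumber=1")

t0 = time.perf_counter()
html = requests.get(url).text               # network time
t1 = time.perf_counter()
soup = BeautifulSoup(html, "html.parser")   # parse time
posts = soup.find_all("td", class_="postbody")
t2 = time.perf_counter()

print(f"fetch: {t1 - t0:.2f}s  parse: {t2 - t1:.2f}s")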

Edit: my suggestion to fire up a lot of Python processes assumes you're on a recent computer with 8+ cores.

Twerk from Home fucked around with this message at 23:45 on Dec 21, 2022

Twerk from Home
Jan 17, 2009


C2C - 2.0 posted:

edit: nevermind. Just realized it's an issue with database entries.

Your life will be easier if you normalize a tiny bit and have a cities table and a cafes table; then you can select cities by a city ID and not have to do string searching. At the very least, where you know exactly what city value you're searching for, avoid using LIKE if you don't have to.

Twerk from Home
Jan 17, 2009

How much Python 2 are y'all living with? What's a sane expectation for carrying along Python 2 applications that don't have any change planned but are expected to keep working forever?

If we try to mandate Python 3 by decree without getting everyone's hearts and minds on board, there's going to be grumbling for all eternity. My own thoughts are that all code does end up being modified at some point, and the less often you encounter Python 2, the more likely you are to make mistakes in it. I'd like to port everything that runs in 2023 to Python 3, full stop.

I'm just tired. Tired of having to remember 2/3 differences, tired of collaboration failures where the two parties are on different Python versions and don't voice this at the start of a conversation, tired of ancient packages.

Twerk from Home
Jan 17, 2009


Gothmog1065 posted:

So frustrated now, but lead tech doesn't want to "introduce another language", so going to have to rewrite this in kornshell (because heaven forbid we use something slightly more updated), if not, gonna have to write it in C or C++ most likely. so yeah..


:suicide:

I used to think that I Got the Life with kornshell, but then after I Did My Time I felt like it was all Coming Undone.

Twerk from Home
Jan 17, 2009


Hughmoris posted:

For those of you who fiddle with Python at home, what does your dev environment look like? Do you have Python installed directly to your desktop? Utilizing containers, or a remote environment?

System Python and a virtual environment per project is a low-fuss way to live, unless your system Python is too old because you're riding CentOS into the ground for some reason.
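
The whole workflow per project is a few commands (the env directory name is up to you):

pre:
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt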

Twerk from Home
Jan 17, 2009


Boris Galerkin posted:

I am setting up a python dev environment on Windows 11, only ever used Linux or Mac. On those I just `conda install python3` and let conda manage all my dependencies cause I typically need the scientific poo poo.

Just do the same thing in Windows?

If you've used Anaconda and it works for you, keep using conda.

I hope you don't need any packages from bioconda, because they do not support Windows.

Twerk from Home
Jan 17, 2009


xzzy posted:

Anyone got a technique or tips for having a globally available script run from a venv without making a user source the activate script? It also needs to support tab completion (using the argcomplete module).

I've got a script that needs some newer modules than the base OS provides, and I'm not able to mess with OS packages. I've done some googling and found some suggestions for making a wrapper script or setting up an alias setting with environment variables, but nothing really works all that well. The tab completion is the biggest hangup.

I know I'm going to have to drop a script in profile.d to do some setup and that's fine, but it would be nice if it could work transparently without interfering with other python scripts that aren't in the venv.

Look at how pipx works, or just use it directly: https://github.com/pypa/pipx

A virtual environment for each script, and an entrypoint script put somewhere on the user's PATH that launches the right Python.
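
The trick, if you roll it yourself, is just that the launcher's shebang points into the venv, so nobody has to source an activate script (paths and module names here are made up):

code:
#!/home/you/venvs/mytool/bin/python
# installed at e.g. ~/bin/mytool; runs under the venv's interpreter directly
from mytool.cli import main

main()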

Twerk from Home
Jan 17, 2009

I've got a weird corner case for distributing wheels that I'm bashing my head against. I'm checking out https://github.com/pypa/cibuildwheel as a way to build wheels for a whole host of different targets using manylinux so that nobody has to compile poo poo themselves from a source distribution, but I noticed that cibuildwheel is directly calling wheel and not building a source distribution first with the build module.

This means that it would fail to detect problems with the sdist, because it never uses the sdist to build a wheel. For example, there were some files that were moved around without updating the MANIFEST.in, which meant that whenever anyone tried to compile from the sdist it would fail due to missing headers.

Am I wrong in thinking that cibuildwheel should build the sdist first and then build a wheel from that, vs just building a wheel straight from the repository code? Is this project in Python packaging hell, and never should have gotten here? I'm going to read more about configuring cibuildwheel and see where I get.
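
One way to close that gap might be to build the sdist myself, unpack it, and point cibuildwheel at the unpacked tree, so the wheels are provably buildable from the sdist (a sketch; the package name and version are placeholders):

pre:
python -m build --sdist        # produces dist/mypkg-1.0.tar.gz
tar xf dist/mypkg-1.0.tar.gz
cibuildwheel mypkg-1.0/        # build wheels from the unpacked sdist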

Twerk from Home
Jan 17, 2009

What's the right way to ship a Python package on PyPI that needs boost headers? I know that I can build a hell of a lot of binary wheels with cibuildwheel, but an sdist is likely to be the best option for portability and long-term usability.

In R-land, CRAN has a lot of header-only packages available for common C++ dependencies, which means that my R package doesn't need to include Boost; I just depend on BH: https://cran.r-project.org/web/packages/BH/index.html. In conda, there's a whole lot of headers packaged. On PyPI, either I can't find where someone else has packaged Boost, or there's just not a culture of doing this.

I'd really like the sdist to have boost delivered from the same place as the binary wheels, so I'd use conda-packaged boost for distributing the package on conda, and I was hoping to use a PyPI packaged boost rather than system boost, boost from source, boost from conda, or boost from anywhere else.


Twerk from Home
Jan 17, 2009


Oysters Autobio posted:

Just getting into learning flask and my only real question is how people generally start on their HTML templates. Are folks just handwriting these? Or am I missing something here with flask?

Basically I'm looking for any good resources out there for HTML templates to use for scaffolding your initial templates. Tried searching for HTML templates but generally could only find paid products for entire websites, whereas I just want sort of like an HTML component library that has existing HTML. i.e. here's a generic dashboard page html, here's a web form, here's a basic website with top navbar etc.

Bonus points if it already has Jinja too but even if these were just plain HTML it would be awesome.

edit: additionally, has anyone used flask app builder before? The built in auth and admin panels are mighty appealing but I'm worried about locking into a heavily opinionated framework (why I chose flask over say Django)

You want Bootstrap or Foundation, two of the most common HTML/CSS frameworks around. They both have whole-page examples, but really it's about designing with their grid system so that stuff works on a phone but doesn't look stupid on a desktop, which is a tricky balance.

They are just HTML/CSS/JS and you're going to be templating the dynamic parts of the site yourself, but there's a reason that Bootstrap has been the most popular option to make a new site for more than a decade now.
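
A minimal Jinja base template pulling Bootstrap off its CDN makes a decent scaffold; this is a sketch, so check Bootstrap's docs for the current CDN URL and integrity hash:

code:
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>{% block title %}My App{% endblock %}</title>
  <link rel="stylesheet"
        href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.2/dist/css/bootstrap.min.css">
</head>
<body>
  <div class="container py-4">
    {% block content %}{% endblock %}
  </div>
</body>
</html>

Each Flask page template then starts with {% extends "base.html" %} and fills in the blocks.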
