Hughmoris
Apr 21, 2007
Let's go to the abyss!
I want to create a simple auto-extractor for torrents. I'm on Windows 10 and have WinRAR.

What is the best practice for calling WinRAR (or processes in general) from a Python script? Is it subprocess.call?

Hughmoris fucked around with this message at 23:11 on Sep 29, 2017


Data Graham
Dec 28, 2009

📈📊🍪😋



Why not use a native python rar library?

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Data Graham posted:

Why not use a native python rar library?

I tried rarfile but I was having issues with it finding UnRAR, even when I provided it the full path. I've got subprocess working now though.
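
For reference, a minimal sketch of the subprocess approach (the UnRAR.exe path and flags here are assumptions based on a default WinRAR install; adjust for your machine):

```python
import subprocess
from pathlib import Path

# Assumed default WinRAR install location -- adjust for your machine
UNRAR = r"C:\Program Files\WinRAR\UnRAR.exe"

def build_unrar_cmd(archive, dest):
    # "x" extracts with full paths, "-y" auto-answers any prompts
    return [UNRAR, "x", "-y", str(archive), str(dest)]

def extract(archive, dest):
    """Extract `archive` into `dest`; returns True on success."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    # subprocess.run (Python 3.5+) is the modern replacement for subprocess.call
    return subprocess.run(build_unrar_cmd(archive, dest)).returncode == 0
```

subprocess.call still works, but run gives you a CompletedProcess with the return code and (optionally) captured output in one place.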

Seventh Arrow
Jan 26, 2005

I'm working through an exercise from a book:

https://pastebin.com/zMweG525

It's supposed to be using regex queries to look for phone numbers and email addresses, but it's giving me syntax errors on line 25 and I can't figure out why.

I checked if maybe "text" was a reserved word in Python or something, but that doesn't seem to be the case. Besides, I get the same error if I substitute "txt" or even "floof."

The "pyperclip" module is installed and I've used it successfully before. What am I missing?

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

You haven't closed your regex compile call on the previous line (missing the closing parenthesis)
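
For illustration (the pattern below is a stand-in, not the book's exact regex): when the closing parenthesis on a re.compile call is missing, Python only notices once it reaches the following line, which is why the error pointed at line 25.

```python
import re

# If the closing parenthesis below were missing, Python would report
# the SyntaxError on the line AFTER this one
phone_re = re.compile(r"\d{3}-\d{3}-\d{4}")

match = phone_re.search("Call 415-555-1234 today")
print(match.group())  # -> 415-555-1234
```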

Seventh Arrow
Jan 26, 2005

Whoops, you're right, thanks. That was obvious; that'll teach me to focus too much on line 25.

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Line 25 just looked pretty ok to me so I figured the problem was caused before that! :frogbon:

Also, your linter should really catch that kind of thing and put a red squiggle somewhere, since the parenthesis is never actually closed

baka kaba fucked around with this message at 19:36 on Sep 30, 2017

Seventh Arrow
Jan 26, 2005

Yeah, I've been using Geany, which isn't super robust (or maybe it can be configured to be). Maybe I'll just start doing everything in Jupyter.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. Bertrand Russell

I came across this article when Googling for something unrelated and I found it semi-interesting.

It's about how Instagram converted from Python 2.7 to Python 3.5. Took them a year to do it.

It's mostly just generalities, but there is this:

quote:

We did not have performance gain expectations for Python 3 right out of the box. So it was a pleasant surprise to see 12 percent CPU savings (on uwsgi/Django) and 30 percent memory savings (on celery).

This is in line with my experience on a couple of projects I've converted to Python 3.

Ellie Crabcakes
Feb 1, 2008

Stop emailing my boyfriend Gay Crungus

Data Graham posted:

Why not use a native python rar library?
There doesn't appear to be one.

That Guy posted:

No. The current architecture - parsing in Python and decompression with command line tools - works well across all interesting operating systems (Windows/Linux/MacOS); wrapping a library does not bring any advantages.

creatine
Jan 27, 2012




Question:

I am passing a list of arrays into a function that will then do some plotting with those arrays. A simple way I thought to get a name was to use the .index() method in a way like this

code:
for data in input_list:
    print(input_list.index(data))
But, I keep getting an error:

code:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Even though I verify the type() of the input is a list

edit:

I managed to get it working by using enumerate inside my zip() call like so
code:
for (i, j), color in zip((enumerate(coords)), Spectral4):

creatine fucked around with this message at 22:37 on Oct 5, 2017

mr_package
Jun 13, 2000
https://github.com/ponty/pyunpack says it supports RAR.

Nippashish
Nov 2, 2005

Let me see you dance!

creatine posted:

Question:

I am passing a list of arrays into a function that will then do some plotting with those arrays. A simple way I thought to get a name was to use the .index() method in a way like this

code:
for data in input_list:
    print(input_list.index(data))
But, I keep getting an error:

code:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

input_list.index(data) effectively looks like this on the inside:

code:
for index, thing in enumerate(input_list):
    if data == thing:
        return index
Your "thing"s are numpy arrays (presumably they're also all the same size, otherwise you'd get a different error), and if you do array1 == array2 the result is an array of booleans that tells you the elementwise equality of array1 and array2. Using this as the condition in an "if" means Python asks the array of booleans to convert itself into a single True or False value, and the error you're seeing is numpy complaining that it doesn't know how to collapse an array with more than one element into a single boolean.
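
The usual fix is to carry the index along with enumerate rather than searching for it afterwards -- a quick sketch (plain nested lists stand in for the numpy arrays here; the pattern is identical):

```python
# Plain lists stand in for numpy arrays; enumerate never compares elements,
# so the ambiguous-truth-value problem can't arise
input_list = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

for i, data in enumerate(input_list):
    print(i, data)  # i is the position, no .index() lookup needed
```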

creatine
Jan 27, 2012




Nippashish posted:

input_list.index(data) effectively looks like this on the inside:

code:
for index, thing in enumerate(input_list):
    if data == thing:
        return index
Your "thing"s are numpy arrays (presumably they're also all the same size, otherwise you'd get a different error), and if you do array1 == array2 the result is an array of booleans that tells you the elementwise equality of array1 and array2. Using this as the condition in an "if" means Python asks the array of booleans to convert itself into a single True or False value, and the error you're seeing is numpy complaining that it doesn't know how to collapse an array with more than one element into a single boolean.

Interesting, thanks for the explanation!

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Thermopyle posted:

I came across this article when Googling for something unrelated and I found it semi-interesting.

It's about how Instagram converted from Python 2.7 to Python 3.5. Took them a year to do it.

It's mostly just generalities, but there is this:


This is in line with my experience on a couple of projects I've converted to Python 3.

This is good to know.

Also, hi thread. I am learning python to do some web scraping and data manipulation and eventually machine learning stuff.

Holy crap is it easy. I got all the data from a webpage into a CSV with like 4 lines of code (pandas), and there's 19 bajillion examples of how to do this online.

CarForumPoster fucked around with this message at 11:43 on Oct 6, 2017

Hughmoris
Apr 21, 2007
Let's go to the abyss!

CarForumPoster posted:

This is good to know.

Also, hi thread. I am learning python to do some web scraping and data manipulation and eventually machine learning stuff.

Holy crap is it easy. I got all the data from a webpage into a CSV with like 4 lines of code (pandas), and there's 19 bajillion examples of how to do this online.

I'm working my way through Automate The Boring Stuff and am on the Web Scraping section. Just curious, with your pandas example, are you scraping full tables or are you using selectors to nab individual items and then building a dataframe?

Hughmoris fucked around with this message at 16:48 on Oct 6, 2017

vikingstrike
Sep 23, 2007

whats happening, captain
I saw the new PyPy has pandas and numpy support. Anyone know of any benchmarking that's been done?

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Hughmoris posted:

I'm working my way through Automate The Boring Stuff and am on the Web Scraping section. Just curious, with your pandas example, are you scraping full tables or are you using selectors to nab individual items and then building a dataframe?

I just started yesterday so right now I nab everything and dump it into a CSV.

Also, is there a machine learning thread? I'm wondering if I could actually skip this step altogether.

I have the problem that similar data (names, dates, events) is stored in tables on lots of webpages, in semi-different formats on each website. I want to scrape that data and have ~machine learning~ (or whatever) sort it out for me so that it's stored in a way I can add to a database (CSV or something like that). I have ~50 sample webpages I could format how I'd like (desired output) and use to train a model, and could easily get a few hundred if it was worthwhile.

I know I'd need a way to vectorize either the table info or the HTML as a whole... haven't figured that out yet.

Machine learning thread?

CarForumPoster fucked around with this message at 17:29 on Oct 6, 2017

Cingulate
Oct 23, 2012

by Fluffdaddy

CarForumPoster posted:

I just started yesterday so right now I nab everything and dump it into a CSV.

Also, is there a machine learning thread? I'm wondering if I could actually skip this step altogether.

I have the problem that similar data (names, dates, events) is stored in tables on lots of webpages, in semi-different formats on each website. I want to scrape that data and have ~machine learning~ (or whatever) sort it out for me so that it's stored in a way I can add to a database (CSV or something like that). I have ~50 sample webpages I could format how I'd like (desired output) and use to train a model, and could easily get a few hundred if it was worthwhile.

I know I'd need a way to vectorize either the table info or the HTML as a whole... haven't figured that out yet.

Machine learning thread?
You can try the data science thread, or the stats thread.

Though I'm not sure what you actually want. (Supervised) machine learning relates standardised data of one form (predictors) to standardised data of another form (outcomes). It seems to me you're still describing part of the data wrangling process - although that too might belong in the data science thread.

Portland Sucks
Dec 21, 2004
༼ つ ◕_◕ ༽つ
I've been writing a custom ETL tool at work in python and I've got most of it all broken down into individual scripts. All of the steps along the way are jobs that should be able to execute independently, but can also be triggered by a successful end condition of one before it in the pipeline.

I'm looking to write a process manager that can monitor these scripts, trigger them when they need to be, and give me the ability to manually start and stop them when I want to. This is running on a Windows Server which seems to limit some of the available libraries already out there.

Am I trying to reinvent the wheel? I can't really find anything that meets these requirements already, but it seems like it should exist.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Cingulate posted:

You can try the data science thread, or the stats thread.

Though I'm not sure what you actually want. (Supervised) machine learning relates standardised data of one form (predictors) to standardised data of another form (outcomes). It seems to me you're still describing part of the data wrangling process - although that too might belong in the data science thread.

Your understanding is correct. I want a web scraper more flexible than most (all?) existing ones, to scrape a certain type of data and put it in a database (dates and info usually stored in tables on many different webpage layouts).

I.e., I want a flexible data-formatting tool: I want to use machine learning to do some data wrangling.

Seventh Arrow
Jan 26, 2005

Portland Sucks posted:

I've been writing a custom ETL tool at work in python and I've got most of it all broken down into individual scripts. All of the steps along the way are jobs that should be able to execute independently, but can also be triggered by a successful end condition of one before it in the pipeline.

I'm looking to write a process manager that can monitor these scripts, trigger them when they need to be, and give me the ability to manually start and stop them when I want to. This is running on a Windows Server which seems to limit some of the available libraries already out there.

Am I trying to reinvent the wheel? I can't really find anything that meets these requirements already, but it seems like it should exist.

Doesn't Luigi already do this? It's kind of old, but ETL stuff is pretty much a staple of data engineering.

Portland Sucks
Dec 21, 2004
༼ つ ◕_◕ ༽つ

Seventh Arrow posted:

Doesn't Luigi already do this? It's kind of old, but ETL stuff is pretty much a staple of data engineering.

I'm actually using Luigi at the moment. It seems good enough as a kind of fire-and-forget solution, but I'm having issues designing fault handling into the workflow. This is the first time I've done ETL work and I don't have anyone around to give me insight into the "right" way to do this, I guess. I'm sure my problem is pretty simple:

Data Sources A, B, C, ... Z are folders that get filled with XML files, database tables that get updated every minute or so, and some time series data that gets written to proprietary data files. So the extraction step is retrieving the new data from these sources. At the moment I'm just reaching back with a little bit of overlap to make sure I don't miss anything. I'd like to find a better way to do this.

Transformation of the data into a format I need it to be in is pretty straight forward I guess.

For loading, I currently have to deal with a regular number of integrity failures during insertion, since I'm grabbing overlapping historical data. This is probably a lovely way to do the load job, since it prevents me from batch-loading data, so I'm doing inserts row by row.

It seems like the "good" way to do this would be to only get the new data during the extraction step, and then to do a batch load all at once during the loading step but I'm not quite sure how to achieve this as of now.

There is also the case that the data I'm fetching may contain duplicates regardless of whether I'm reaching back further than I need to - wouldn't that cause the insertion to fail if I'm doing it in batch?

The whole ETL process makes sense to me at the high level, but the execution of it is tripping me up.
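
On the duplicate-rows problem: one way to keep batch loading while tolerating overlap is to let the database skip conflicting rows instead of inserting row by row. A sketch with sqlite3 (table and column names made up; Postgres and SQL Server have their own ON CONFLICT / MERGE equivalents):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT PRIMARY KEY, value REAL)")

def load_batch(rows):
    # INSERT OR IGNORE silently skips rows that would violate the primary
    # key, so overlapping extraction windows don't abort the whole batch
    conn.executemany("INSERT OR IGNORE INTO readings VALUES (?, ?)", rows)
    conn.commit()

load_batch([("2017-10-06T00:00", 1.0), ("2017-10-06T00:01", 2.0)])
# Overlapping re-fetch: one duplicate row, one new row
load_batch([("2017-10-06T00:01", 2.0), ("2017-10-06T00:02", 3.0)])

count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(count)  # -> 3
```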

old.flv
Jan 28, 2017

A good lad who likes his Anna's.

Cingulate posted:

You can try the data science thread, or the stats thread.

where is this thread?

QuarkJets
Sep 8, 2008

It's in the Science subforum: https://forums.somethingawful.com/showthread.php?threadid=3359430

old.flv
Jan 28, 2017

A good lad who likes his Anna's.
thank you!

Dominoes
Sep 20, 2007

Does anything in the OP look stale?

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
Is there a good library for programming Philips Hue lights?

CarForumPoster
Jun 26, 2013

⚡POWER⚡

Dominoes posted:

Does anything in the OP look stale?

The Heroku link directs you to a login rather than info about Heroku.

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

Seventh Arrow posted:

Doesn't Luigi already do this? It's kind of old, but ETL stuff is pretty much a staple of data engineering.

Airflow is a bit more modern. I've been wanting to build a reactive / streaming ETL pipeline for a while but never really gotten off the ground.

At any rate, for what Portland Sucks is asking: in any framework, what you get is the scheduler, dependency graph, etc. The actual Python to convert your XML file to your database row is something you just have to write.

nonathlon
Jul 9, 2004
And yet, somehow, now it's my fault ...
Any experience with Flask Appbuilder?

I had to put together a simple CRUD webapp and the first pass was a dream: FAB got me there almost trivially. But now that I'm looking to customize some things and refine the behaviour, it's proving a lot harder, largely due to the lack of available examples. The FAB distribution has quite a lot of examples, but they're all for simple things.

OnceIWasAnOstrich
Jul 22, 2006

Ugh, asyncio. I felt the need to build a nice fast socket-based server app and decided to do it "correctly". Started using asyncio, and man is this a mess. Why are both generator coroutines and async functions a thing? Why are there multiple keywords that all do something very slightly different, with seemingly arbitrary restrictions on which types of functions they work on?

I do actually know the answer to these questions. I'm just annoyed because the mix of info and examples I find even in the 3.6 documentation is baffling to use as an intro because it is not at all obvious which type of coroutines different bits of the asyncio library work with, at least at first. Trying to figure out whether/how you can schedule? run? new async def functions inside an eventloop which is already running a run_until_complete streaming server is weirdly difficult when examples are using two different APIs and two different forms of coroutines and older info uses newer terminology differently.
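
For what it's worth, with the 3.5+ syntax the "schedule a new coroutine while the loop is already running" case boils down to asyncio.ensure_future (or loop.create_task) from inside a coroutine -- a minimal sketch, sticking entirely to async def and avoiding the older generator-based style:

```python
import asyncio

results = []

async def worker(name):
    await asyncio.sleep(0)  # stand-in for real socket I/O
    results.append(name)

async def main():
    # ensure_future wraps a fresh async-def coroutine in a Task and
    # schedules it on the already-running loop immediately
    tasks = [asyncio.ensure_future(worker(n)) for n in ("a", "b", "c")]
    await asyncio.gather(*tasks)

loop = asyncio.new_event_loop()
loop.run_until_complete(main())
loop.close()
print(sorted(results))  # -> ['a', 'b', 'c']
```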

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. Bertrand Russell

Yes, introducing the pre 3.6 API first was a mistake.

Sockser
Jun 28, 2007

This world only remembers the results!




I've set up an internal PyPI server at my company to manage our common testing code between teams, and it's been working pretty great.

I'm now trying to get python3 working for our tests, and I've hit sort of a weird maybe-snag?

Teamcity is building out my common packages, dropping a VERSION file into the package directory containing, say, '1.0.0.286', and the package is being uploaded to PyPI as asc_page_models-1.0.0.286.zip

But now when I'm in a python 3 virtual environment, it downloads the correct version of that package from the server as specified in my requirements file, but pip freaks out about the version number,
Requested asc_page_models>=1.0.0.200 from <path to newest package on server>, but installing version None

If I do a pip freeze, I get
asc_page_models==b-1.0.0.286-n-

What the hell is going on here? Googling says it has to do with the package version not matching the file name, but the file name is being derived from the version in my CI pipeline?

E: just to clarify, no problem if I'm running out of a python 2 virtualenv

E2: could this be caused by my package not having a __version__ variable? It's 7:30pm; this is well past a tomorrow problem.

Sockser fucked around with this message at 00:31 on Oct 11, 2017
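
On the version-None mystery above: pip reads the version from the package *metadata* (whatever setup.py declares), not from the filename, so if setup.py never sees the CI-generated VERSION file you get exactly this kind of mismatch. One common pattern (file layout assumed) is to have setup.py read it:

```python
# setup.py sketch -- read the CI-generated VERSION file so the declared
# metadata matches the version embedded in the uploaded filename
import os

def read_version(path):
    """Return the version string stored in a VERSION file."""
    with open(path) as fh:
        return fh.read().strip()

# The real setup.py would then pass it along, e.g.:
# setup(name="asc_page_models",
#       version=read_version("asc_page_models/VERSION"), ...)
```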

accipter
Sep 12, 2003
I am the author of two Python packages. The first package (pyrvt) provides a set of tools for working with a type of motion, and provides a number of classes. The second package (pysra) uses pyrvt and extends its capabilities. I extend those capabilities by inheriting from classes in pyrvt, but it feels like I am doing this in a clunky way. Is there a better way to do this?

Python code:
# Within motion.py of pysra

class RvtMotion(pyrvt.motions.RvtMotion, Motion):
    def __init__(self,
                 osc_freqs,
                 osc_accels_target,
                 duration=None,
                 peak_calculator=None,
                 calc_kwds=None):
        # Is there a better way of injecting this?
        Motion.__init__(self)
        pyrvt.motions.RvtMotion.__init__(
            self,
            osc_freqs,
            osc_accels_target,
            duration=duration,
            peak_calculator=peak_calculator,
            calc_kwds=calc_kwds)


class CompatibleRvtMotion(pyrvt.motions.CompatibleRvtMotion, Motion):
    def __init__(self,
                 osc_freqs,
                 osc_accels_target,
                 duration=None,
                 osc_damping=0.05,
                 event_kwds=None,
                 window_len=None,
                 peak_calculator=None,
                 calc_kwds=None):
        # Is there a better way of injecting this?
        Motion.__init__(self)
        pyrvt.motions.CompatibleRvtMotion.__init__(
            self,
            osc_freqs,
            osc_accels_target,
            duration=duration,
            osc_damping=osc_damping,
            event_kwds=event_kwds,
            window_len=window_len,
            peak_calculator=peak_calculator,
            calc_kwds=calc_kwds)

breaks
May 12, 2001

If you have control over all the classes used, I think the Right Way is to call super().__init__ in every class in the hierarchy, and pass on arguments not used by the class using *args and **kwargs as necessary.

But this doesn't work if any one of the classes is not written with this technique in mind, in which case you need to revert to what you're already doing.

That said I try to avoid this kind of thing since traditional multiple inheritance kind of sucks, so maybe there is some better way that I don't know of.

Python3 ex:

code:
>>> class A:
	def __init__(self, pos1, **kwargs):
		print('A init')
		self.pos1 = pos1
		super().__init__(**kwargs)
>>> class B:
	def __init__(self, optional=None, **kwargs):
		print('B init')
		self.optional = optional
		super().__init__(**kwargs)
>>> class C(B, A):
	def __init__(self, pos1):
		print('C init')
		super().__init__(pos1=pos1)
>>> class D(A, B):
	def __init__(self, pos1):
		print('D init')
		super().__init__(pos1=pos1)
>>> class E(A, B):
	def __init__(self, pos1, **kwargs):
		print('E init')
		super().__init__(pos1=pos1, **kwargs)
>>> c = C(1)
C init
B init
A init
>>> d = D(1)
D init
A init
B init
>>> e = E(1, optional=True)
E init
A init
B init
>>> e.optional
True
Come to think of it, I guess in practice you probably want to avoid *args in the base classes and only call super().__init__ with keyword arguments along the lines of super().__init__(pos1=pos1, **kwargs), so I edited the code block to reflect that. But the point here is that super() means "the next thing in the MRO that implements this", so you call it in every class, take out what you need, and pass on the rest.

breaks fucked around with this message at 05:39 on Oct 11, 2017

Linear Zoetrope
Nov 28, 2011

A hero must cook
Is PEP 484 worth using? I like it, but then I also like static analysis a lot.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. Bertrand Russell

Linear Zoetrope posted:

Is PEP 484 worth using? I like it, but then I also like static analysis a lot.

Yes, but note that you have to use mypy to actually get types checked... PEP 484 just specifies the types but doesn't do anything with them.

Or use PyCharm, which uses the type hints for IDE stuff.
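
A quick illustration -- PEP 484 annotations are inert at runtime; only mypy (or an IDE like PyCharm) does anything with them:

```python
from typing import List, Optional

def first_over(values: List[int], threshold: int) -> Optional[int]:
    """Return the first value above threshold, or None if there isn't one."""
    for v in values:
        if v > threshold:
            return v
    return None

# Runs fine with or without the hints; running `mypy thisfile.py` would
# flag a bad call like first_over("oops", 3) without executing anything.
print(first_over([1, 5, 9], 3))  # -> 5
```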

creatine
Jan 27, 2012




Wondering if something like this exists in Python.

I have some data that gets analyzed and then put on a scatter plot. I've been using matplotlib and bokeh, but I've got a lot of data and multiple datasets on the same graph, and was wondering if there's a module or some other way to save scatter plots to PSD or another layered image format that I can edit.


accipter
Sep 12, 2003

creatine posted:

Wondering if something like this exists in Python.

I have some data that gets analyzed and then put on a scatter plot. I've been using matplotlib and bokeh, but I've got a lot of data and multiple datasets on the same graph, and was wondering if there's a module or some other way to save scatter plots to PSD or another layered image format that I can edit.

You can create SVG files with matplotlib, which should work for you.
