|
I want to create a simple auto-extractor for torrents. I'm on Windows 10 and have WinRAR. What's the best practice for calling WinRAR (or external processes in general) from a Python script? Is it subprocess.call? Hughmoris fucked around with this message at 23:11 on Sep 29, 2017
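A minimal sketch of the subprocess route (the WinRAR path and switches below assume a default install; adjust to taste):

```python
import subprocess

# Path to the WinRAR executable -- an assumption for a default install,
# not auto-detected
WINRAR = r"C:\Program Files\WinRAR\WinRAR.exe"

def build_extract_command(archive, dest):
    # x = extract with full paths, -ibck = run minimized in the background,
    # -y = assume Yes on all prompts; WinRAR wants the destination folder
    # to end with a backslash
    return [WINRAR, "x", "-ibck", "-y", archive, dest.rstrip("\\") + "\\"]

def extract(archive, dest):
    # subprocess.run (3.5+) is the modern face of subprocess.call;
    # check=True raises CalledProcessError on a non-zero exit code
    subprocess.run(build_extract_command(archive, dest), check=True)
```

Passing the command as a list (rather than one string) avoids shell-quoting headaches with paths that contain spaces.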
# ? Sep 29, 2017 23:08 |
|
Why not use a native python rar library?
|
|
# ? Sep 29, 2017 23:11 |
|
Data Graham posted:Why not use a native python rar library? I tried rarfile but I was having issues with it finding UnRAR, even when I provided it the full path. I've got subprocess working now though.
|
# ? Sep 29, 2017 23:39 |
|
I'm working through an exercise from a book: https://pastebin.com/zMweG525 It's supposed to be using regex queries to look for phone numbers and email addresses, but it's giving me syntax errors on line 25 and I can't figure out why. I checked if maybe "text" was a reserved word in Python or something, but that doesn't seem to be the case. Besides, I get the same error if I substitute "txt" or even "floof." The "pyperclip" module is installed and I've used it successfully before. What am I missing?
|
# ? Sep 30, 2017 18:59 |
|
You haven't closed your regex compile call on the previous line (missing the closing parenthesis)
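For reference, a working version of that kind of compile call (a simplified phone pattern, not the exact one from the book):

```python
import re

# The key detail is the closing parenthesis on the compile() call,
# which was the missing piece above
phone_regex = re.compile(r'\d{3}-\d{3}-\d{4}')

text = "Call 415-555-1011 or 415-555-9999."
matches = phone_regex.findall(text)
print(matches)  # -> ['415-555-1011', '415-555-9999']
```

An unclosed parenthesis is reported on a *later* line because Python keeps reading, expecting the call to continue, and only trips on whatever comes next.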
|
# ? Sep 30, 2017 19:08 |
|
Whoops, you're right, thanks. That was obvious; that'll teach me to focus too hard on line 25.
|
# ? Sep 30, 2017 19:19 |
|
Line 25 just looked pretty OK to me so I figured the problem was caused before that! Also, your linter should really catch that kind of thing and put a red squiggle somewhere, since the call is never actually closed. baka kaba fucked around with this message at 19:36 on Sep 30, 2017
# ? Sep 30, 2017 19:33 |
|
Yeah I've been using Geany, which isn't super robust (or maybe it can be configured to be). Maybe I'll just start doing everything in jupyter.
|
# ? Sep 30, 2017 19:40 |
|
I came across this article when Googling for something unrelated and found it semi-interesting. It's about how Instagram converted from Python 2.7 to Python 3.5. Took them a year to do it. It's mostly just generalities, but there is this: quote:We did not have performance gain expectations for Python 3 right out of the box. So it was a pleasant surprise to see 12 percent CPU savings (on uwsgi/Django) and 30 percent memory savings (on celery). This is in line with my experience on a couple of projects I've converted to Python 3.
|
# ? Sep 30, 2017 20:34 |
|
Data Graham posted:Why not use a native python rar library? That Guy posted:No. The current architecture - parsing in Python and decompression with command line tools - works well across all interesting operating systems (Windows/Linux/MacOS); wrapping a library does not bring any advantages.
|
# ? Sep 30, 2017 20:56 |
|
Question: I am passing a list of arrays into a function that will then do some plotting with those arrays. A simple way I thought of to get a name for each one was to use the .index() method, like this:
code:
edit: I managed to get it working by using enumerate() around my zip() call, like so:
code:
creatine fucked around with this message at 22:37 on Oct 5, 2017
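The code blocks above were lost in the archive; the enumerate-inside-zip fix presumably looked something like this (data and labels invented):

```python
# Each (array, label) pair carries its position via enumerate, with no
# need for the O(n) -- and duplicate-unsafe -- list.index() lookup
arrays = [[0, 1, 2], [0, 2, 4]]
labels = ["raw", "doubled"]

indexed = []
for i, (arr, label) in enumerate(zip(arrays, labels)):
    # i is the series number; arr and label travel together
    indexed.append((i, label, arr))
```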
# ? Oct 5, 2017 22:27 |
|
https://github.com/ponty/pyunpack says it supports RAR.
|
# ? Oct 5, 2017 22:27 |
|
creatine posted:Question: input_list.index(data) effectively looks like this on the inside: code:
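The code block in this post was also lost in the archive; list.index boils down to a linear scan, roughly (my reconstruction):

```python
def index(seq, value):
    # list.index scans from the front and returns the FIRST match, so two
    # equal items both report the first one's position, and every lookup
    # costs O(n)
    for i, item in enumerate(seq):
        if item == value:
            return i
    raise ValueError(f"{value!r} is not in list")
```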
|
# ? Oct 6, 2017 00:14 |
|
Nippashish posted:input_list.index(data) effectively looks like this on the inside: Interesting, thanks for the explanation!
|
# ? Oct 6, 2017 00:24 |
|
Thermopyle posted:I came across this article when Googling for something unrelated and I found it semi-interesting. This is good to know. Also, hi thread. I am learning Python to do some web scraping and data manipulation and eventually machine learning stuff. Holy crap is it easy. I got all the data from a webpage into a CSV with like 4 lines of code (pandas) and there's 19 bajillion examples of how to do this online. CarForumPoster fucked around with this message at 11:43 on Oct 6, 2017
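The few-lines-of-pandas approach is presumably read_html plus to_csv; a self-contained sketch (an inline HTML snippet stands in for the real page URL, which you'd pass to read_html directly):

```python
import io
import pandas as pd

# read_html parses every <table> it finds and returns a list of
# DataFrames; it needs an HTML parser (lxml or bs4) installed
html = """<table>
  <tr><th>name</th><th>year</th></tr>
  <tr><td>Civic</td><td>2017</td></tr>
  <tr><td>Accord</td><td>2016</td></tr>
</table>"""

tables = pd.read_html(io.StringIO(html))
csv_text = tables[0].to_csv(index=False)  # or .to_csv("cars.csv", ...)
```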
# ? Oct 6, 2017 11:19 |
|
CarForumPoster posted:This is good to know. I'm working my way through Automate The Boring Stuff and am on the Web Scraping section. Just curious, with your pandas example, are you scraping full tables or are you using selectors to nab individual items and then building a dataframe? Hughmoris fucked around with this message at 16:48 on Oct 6, 2017
# ? Oct 6, 2017 16:44 |
|
I saw the new PyPy has pandas and numpy support. Anyone know of any benchmarking that's been done?
|
# ? Oct 6, 2017 17:07 |
|
Hughmoris posted:I'm working my way through Automate The Boring Stuff and am on the Web Scraping section. Just curious, with your pandas example, are you scraping full tables or are you using selectors to nab individual items and then building a dataframe? I just started yesterday, so right now I nab everything and dump it into a CSV. Also, is there a machine learning thread? I'm wondering if I could actually skip this step altogether. I have the problem that similar data (names, dates, events) is stored in tables on lots of webpages, in semi-different formats on each website. I want to scrape that data and have ~machine learning~ (or whatever) sort it out for me such that it is stored in a way I can add to a database (CSV or something like that). I have ~50 sample webpages I could format how I'd like (desired output) and use to train a model, and could easily get a few hundred if it was worthwhile. I know I'd need a way to vectorize either the table info or the HTML altogether... haven't figured that out yet. Machine learning thread? CarForumPoster fucked around with this message at 17:29 on Oct 6, 2017
# ? Oct 6, 2017 17:25 |
|
CarForumPoster posted:I just started yesterday so right now I nab everything and dump it into a CSV. You can try the data science thread, or the stats thread. Though I'm not sure what you actually want: (supervised) machine learning relates standardised data of one form (predictors) to standardised data of another form (outcomes). It seems to me you're still describing part of the data wrangling process - although that too might belong in the data science thread.
|
# ? Oct 6, 2017 18:16 |
|
I've been writing a custom ETL tool at work in python and I've got most of it all broken down into individual scripts. All of the steps along the way are jobs that should be able to execute independently, but can also be triggered by a successful end condition of one before it in the pipeline. I'm looking to write a process manager that can monitor these scripts, trigger them when they need to be, and give me the ability to manually start and stop them when I want to. This is running on a Windows Server which seems to limit some of the available libraries already out there. Am I trying to reinvent the wheel? I can't really find anything that meets these requirements already, but it seems like it should exist.
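Before reaching for a framework, the behaviour described boils down to something like this toy sketch (invented job names; a real process manager adds logging, retries, and manual start/stop on top):

```python
# Run each step in order; a step's success triggers the next one, and the
# first failure stops the chain, so downstream jobs never fire
def run_pipeline(steps):
    completed = []
    for step in steps:
        try:
            step()
        except Exception:
            break  # a real manager would log, alert, and retry here
        completed.append(step.__name__)
    return completed

def extract():
    pass

def transform():
    raise RuntimeError("bad input file")

def load():
    pass

print(run_pipeline([extract, transform, load]))
```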
|
# ? Oct 6, 2017 18:46 |
|
Cingulate posted:You can try the data science thread, or the stats thread. Your understanding is correct. I want a webscraper more flexible than most (all?) web scrapers to scrape a certain type of data and put it in a database. (Dates and info usually stored in tables on many different webpage layouts) I.e. I want a flexible data formatting tool. I want to use machine learning to do some data wrangling.
|
# ? Oct 6, 2017 21:33 |
|
Portland Sucks posted:I've been writing a custom ETL tool at work in python and I've got most of it all broken down into individual scripts. All of the steps along the way are jobs that should be able to execute independently, but can also be triggered by a successful end condition of one before it in the pipeline. Doesn't Luigi already do this? It's kind of old, but ETL stuff is pretty much a staple of data engineering.
|
# ? Oct 6, 2017 21:55 |
|
Seventh Arrow posted:Doesn't Luigi already do this? It's kind of old, but ETL stuff is pretty much a staple of data engineering.

I'm actually using Luigi at the moment. It seems good enough for a kind of fire-and-forget solution, but I'm having trouble designing fault handling into the workflow. This is the first time I've done ETL work and I don't have anyone around to give me insight into the "right" way to do it. I'm sure my problem is pretty simple:

Data sources A, B, C, ... Z are folders that get filled with XML files, database tables that get updated every minute or so, and some time-series data that gets written to proprietary data files. The extraction step is retrieving the new data from these sources. At the moment I'm just reaching back with a little bit of overlap to make sure I don't miss anything, and I'd like to find a better way to do this.

Transformation of the data into the format I need is pretty straightforward.

Loading is where I currently deal with a regular stream of integrity failures during insertion, since I'm grabbing overlapping historical data. This is probably a lovely way to do the load job, since it prevents me from batch-loading, so I'm doing inserts row by row. It seems like the "good" way would be to fetch only new data during the extraction step and then do one batch load during the loading step, but I'm not sure how to achieve that. There's also the case that the data I'm fetching may contain duplicates regardless of whether I'm reaching back further than I need to, so wouldn't that cause the insert to fail if I'm doing it in batch?

The whole ETL process makes sense to me at a high level, but the execution is tripping me up.
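For the overlapping-insert problem, one standard trick is to make the load idempotent instead of trimming the extract window: declare a natural key and use the database's insert-or-ignore form, so re-fetched duplicates are silently dropped and batch loading works again. A minimal sketch with sqlite and invented column names (other engines spell it differently, e.g. ON CONFLICT DO NOTHING or MERGE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE(ts) is the natural key that makes duplicates detectable
conn.execute("CREATE TABLE readings (ts TEXT, value REAL, UNIQUE(ts))")

batch = [("2017-10-06T00:00", 1.0), ("2017-10-06T00:01", 2.0)]
# A second fetch window that overlaps the first by one row
overlap = [("2017-10-06T00:01", 2.0), ("2017-10-06T00:02", 3.0)]

# INSERT OR IGNORE skips rows that would violate the unique constraint,
# so executemany no longer blows up on the overlap
conn.executemany("INSERT OR IGNORE INTO readings VALUES (?, ?)", batch)
conn.executemany("INSERT OR IGNORE INTO readings VALUES (?, ?)", overlap)

count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(count)  # -> 3, not 4: the duplicate row was ignored
```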
|
# ? Oct 7, 2017 01:39 |
|
Cingulate posted:You can try the data science thread, or the stats thread. where is this thread?
|
# ? Oct 8, 2017 00:17 |
|
It's in the Science subforum: https://forums.somethingawful.com/showthread.php?threadid=3359430
|
# ? Oct 8, 2017 02:01 |
|
thank you!
|
# ? Oct 8, 2017 04:02 |
|
Does anything in the OP look stale?
|
# ? Oct 8, 2017 11:01 |
|
Is there a good library to program things for Philips Hue lights?
|
# ? Oct 8, 2017 12:59 |
|
Dominoes posted:Does anything in the OP look stale? The Heroku link directs you to a login rather than info about heroku.
|
# ? Oct 8, 2017 14:05 |
|
Seventh Arrow posted:Doesn't Luigi already do this? It's kind of old, but ETL stuff is pretty much a staple of data engineering. Airflow is a bit more modern. I've been wanting to build a reactive / streaming ETL pipeline for a while but never really gotten off the ground. At any rate, for what Portland Sucks is asking, what any framework gives you is the scheduler, dependency graph, etc. The actual Python to convert your XML file to your database row is something you just have to write.
|
# ? Oct 8, 2017 15:12 |
|
Any experience with Flask-AppBuilder? I had to put together a simple CRUD webapp and the first pass was a dream: FAB got me there almost trivially. But now that I'm looking to customize some things and refine the behaviour, it's proving a lot harder, largely due to the lack of available examples. The FAB distribution ships quite a lot of examples, but they're all for simple things.
|
# ? Oct 9, 2017 16:40 |
|
Ugh, asyncio. I felt the need to build a nice fast socket-based server app and decided to do it "correctly". Started using asyncio and man, is this a mess. Why are both generator coroutines and async functions a thing? Why are there multiple keywords that all do something very slightly different, with seemingly arbitrary restrictions on which types of functions they work on? I do actually know the answer to these questions. I'm just annoyed because the mix of info and examples I find, even in the 3.6 documentation, is baffling to use as an intro, because it is not at all obvious which type of coroutine different bits of the asyncio library work with, at least at first. Trying to figure out whether/how you can schedule? run? new async def functions inside an event loop which is already running a run_until_complete streaming server is weirdly difficult when examples use two different APIs and two different forms of coroutines, and older info uses newer terminology differently.
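For the specific question - scheduling new async def functions while the loop is already inside run_until_complete - the 3.6-era answer is asyncio.ensure_future (or loop.create_task) from within any coroutine running on that loop. A minimal sketch:

```python
import asyncio

results = []

async def worker(n):
    await asyncio.sleep(0)  # yield to the loop once
    results.append(n)

async def main():
    # These are scheduled while the loop is already busy running main();
    # ensure_future accepts both async def coroutines and the older
    # generator-based kind, which is part of why the docs mix the two
    tasks = [asyncio.ensure_future(worker(i)) for i in range(3)]
    await asyncio.gather(*tasks)

loop = asyncio.new_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.close()
```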
|
# ? Oct 10, 2017 15:42 |
|
Yes, introducing the pre 3.6 API first was a mistake.
|
# ? Oct 10, 2017 18:17 |
|
I've set up an internal PyPI server at my company to manage our common testing code between teams, and it's been working pretty great. I'm now trying to get Python 3 working for our tests, and I've hit sort of a weird maybe-snag. TeamCity is building out my common packages, dropping a VERSION file into the package directory containing, say, '1.0.0.286', and the package is being uploaded to PyPI as asc_page_models-1.0.0.286.zip. But now when I'm in a Python 3 virtual environment, it downloads the correct version of that package from the server as specified in my requirements file, but pip freaks out about the version number: Requested asc_page_models>=1.0.0.200 from <path to newest package on server>, but installing version None. If I do a pip freeze, I get asc_page_models==b-1.0.0.286-n-. What the hell is going on here? Googling says it has to do with the package version not matching the file name, but the file name is being derived from the version in my CI pipeline? E: just to clarify, no problem if I'm running out of a Python 2 virtualenv. E2: could this be caused by my package not having a __version__ variable? It's 7:30pm, this is well past a tomorrow problem. Sockser fucked around with this message at 00:31 on Oct 11, 2017
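For what it's worth, "installing version None" generally means the metadata pip reads from the package carries no parseable version; one common culprit is a setup.py that never reads the CI-generated VERSION file, so the filename and the metadata disagree. A hypothetical helper along those lines (the fix itself is a guess, not a diagnosis of this exact setup):

```python
from pathlib import Path

def read_version(package_dir):
    # Intended for setup.py: setup(version=read_version("asc_page_models"), ...)
    # so the metadata pip sees matches the 1.0.0.286 in the uploaded filename;
    # remember to ship VERSION via package_data as well
    return (Path(package_dir) / "VERSION").read_text().strip()
```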
# ? Oct 11, 2017 00:27 |
|
I am the author of two Python packages. The first package (pyrvt) provides a set of tools for working with a type of motion, and provides a number of classes. The second package (pysra) uses pyrvt and extends its capabilities. I extend those capabilities by inheriting from a class in pyrvt, but it feels like I am doing this in a clunky way. Is there a better way to do this? Python code:
|
# ? Oct 11, 2017 04:09 |
|
If you have control over all the classes used, I think the Right Way is to call super().__init__ in every class in the hierarchy, and pass on arguments not used by the class using *args and **kwargs as necessary. But this doesn't work if any one of the classes is not written with this technique in mind, in which case you need to revert to what you're already doing. That said, I try to avoid this kind of thing, since traditional multiple inheritance kind of sucks, so maybe there is some better way that I don't know of. Python 3 example: code:
breaks fucked around with this message at 05:39 on Oct 11, 2017
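The stripped code block was presumably a cooperative-super() chain along these lines (class names invented, loosely inspired by the pyrvt/pysra question):

```python
class Base:
    def __init__(self, *, name, **kwargs):
        super().__init__(**kwargs)  # keep the chain going up the MRO
        self.name = name

class Motion:
    def __init__(self, *, freqs, **kwargs):
        super().__init__(**kwargs)  # consume freqs, pass the rest along
        self.freqs = freqs

class SourceTheoryMotion(Motion, Base):
    def __init__(self, *, magnitude, **kwargs):
        super().__init__(**kwargs)
        self.magnitude = magnitude

# Every class peels off its own keyword arguments and forwards the rest,
# so one call site can feed the whole hierarchy
m = SourceTheoryMotion(magnitude=6.5, freqs=[0.1, 1.0], name="demo")
```

Keyword-only parameters (the bare `*`) make the forwarding unambiguous; positional args through a diamond hierarchy are where this pattern gets painful.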
# ? Oct 11, 2017 05:04 |
|
Is PEP 484 worth using? I like it, but I also like static analysis a lot.
|
# ? Oct 11, 2017 08:40 |
|
Linear Zoetrope posted:Is PEP484 worth using? I like it but I also like static analysis a lot. Yes, but note that you have to use mypy to actually get types checked... PEP 484 just specifies the types but doesn't do anything with them. Or use PyCharm, which uses the type hints for IDE stuff.
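For example, annotations like these are inert at runtime; mypy (or PyCharm) is what acts on them:

```python
from typing import List, Optional

# Running `mypy thisfile.py` would flag callers that pass the wrong types
# or forget that the result can be None; Python itself never checks
def first_over(values: List[float], threshold: float) -> Optional[float]:
    """Return the first value above threshold, or None if there is none."""
    for v in values:
        if v > threshold:
            return v
    return None
```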
|
# ? Oct 11, 2017 13:46 |
|
Wondering if something like this exists in Python: I have some data that gets analyzed and then put on a scatter plot. I was using matplotlib and bokeh, but I've got a lot of data and multiple datasets on the same graph, and was wondering if there is a module or way to save scatter plots to PSD or some other layered image format that I can edit.
|
# ? Oct 11, 2017 14:38 |
|
|
creatine posted:Wondering if something like this exists in Python. You can create SVG files with matplotlib, which should work for you.
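A minimal sketch with made-up data; SVG output stays vector, so each point and series remains selectable and editable in Inkscape or Illustrator (which can also convert to layered formats like PSD):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([1, 2, 3], [4, 5, 6], label="dataset A")
ax.scatter([1, 2, 3], [2, 3, 4], label="dataset B")
ax.legend()
# The output format is inferred from the file extension
fig.savefig("scatter.svg")
```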
|
# ? Oct 11, 2017 14:52 |