|
Thermopyle posted:I wish they talked about why they went with Miniconda instead of virtualenv. Reading between the lines it sounds like what I mentioned earlier...packages that require compiling. Also if they have more to install than just Python packages but want to manage everything with one tool.
|
# ? Sep 10, 2016 21:24 |
|
Hey guys- I'm trying to help out a new student in my lab. They're using Python to run Psychology experiments, but there's an issue I'm not acquainted with. Basically, participants are supposed to be reading aloud a list of words that appear on screen. This is all very simple, but she wants to record the entire stream of vocal responses and dump it to a single large wave file- but I'm not quite sure how you would set up the PyAudio/wave modules to passively record a long experiment. Now this isn't necessarily how I'd approach it, and I'm working with her code as best I can, but here it is: code:
code:
I'm guessing there is another function for passively recording, but I'm not aware of it. Is there an alternate way I could do it? PoizenJam fucked around with this message at 06:38 on Sep 12, 2016 |
# ? Sep 12, 2016 06:30 |
|
i've never used pyaudio, but from a quick look, instead of blocking during recording you can use a callback: http://people.csail.mit.edu/hubert/pyaudio/#wire-callback-example
|
# ? Sep 12, 2016 11:00 |
|
I'm curious why the blocked form of the code wouldn't be valid? The intention is to record the entire PyAudio stream from the experiment (up to about 30 minutes) and dump it to a single wave file.
|
# ? Sep 12, 2016 20:37 |
|
Not sure what's going wrong here. code:
if I change the last line to: code:
it does the trick. But I don't actually want to have files called something<name>. Any clues what I'm doing wrong? The error message I get is: code:
Of course I figured out what's wrong seconds after posting. The "something" is actually the subdirectory I'm opening the .data files from (/home/dir/project/data/htmlpage*.data). Without including the path, it can't find the files. I guess I just need to find out how to tar files without the path data. LochNessMonster fucked around with this message at 21:17 on Sep 12, 2016 |
# ? Sep 12, 2016 21:13 |
|
You could use os.chdir() to set your working path to "something" and then you could just use the name of the file. Then you'd probably need to use os.getcwd() to get original directory, change into directory with files, add files to tar then move tar to original directory after you're done. Or just make a copy of each file in original directory, add to tar, delete copy. Master_Odin fucked around with this message at 21:47 on Sep 12, 2016 |
# ? Sep 12, 2016 21:44 |
|
Master_Odin posted:You could use os.chdir() to set your working path to "something" and then you could just use the name of the file. I fixed it with: tar.add(str("data/" + name), arcname=str(name))
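For anyone landing here later, the arcname fix generalizes. A minimal sketch, assuming the files sit in a subdirectory as in the post (the helper name and the archive path in the usage line are made up for illustration): os.path.basename strips the directory part, so the archive stores only bare file names.

```python
import os
import tarfile


def tar_without_paths(file_paths, archive_path):
    """Add each file to a gzipped tar under its bare name, dropping directories."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in file_paths:
            # arcname controls the name stored inside the archive
            tar.add(path, arcname=os.path.basename(path))
```

Something like tar_without_paths(glob.glob("data/htmlpage*.data"), "pages.tar.gz") would then avoid both the os.chdir() dance and the copy-and-delete approach.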
|
# ? Sep 12, 2016 21:58 |
|
JVNO posted:I'm curious why the blocked form of the code wouldn't be valid? The intention is to record the entire PyAudio stream from the experiment (up to about 30 minutes) and dump it to a single wave file. i'm just guessing from the code you posted(it's monday and i'm ill so reading is for nerds), but in her example she's reading before the questions are presented. in the code block you posted, you're reading afterwards. the 'hang' you're referring to is possibly just the recording actually starting after the questions have been posted, since: Python code:
|
# ? Sep 12, 2016 22:04 |
|
^ yeah, it looks like it just loops, waiting for each chunk to come through, then adds it to the list and waits for the next one. That's 30 minutes of chunks to wait for, so it blocks (that's what hanging is) until it's done. JVNO posted:I'm curious why the blocked form of the code wouldn't be valid? The intention is to record the entire PyAudio stream from the experiment (up to about 30 minutes) and dump it to a single wave file. It looks like your code is basically the example code from the documentation, so maybe you're just running out of memory? 30 minutes of raw audio is a lot to hold in a list without streaming it to disk. Does it work with 5 or 10 seconds? What do your crash errors say? baka kaba fucked around with this message at 00:30 on Sep 13, 2016 |
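Back-of-envelope numbers for the memory question, assuming 16-bit mono at 44.1 kHz (the actual format in the posted code isn't shown, so these parameters are a guess):

```python
# Assumed recording format; the real values depend on the experiment script.
RATE = 44100          # samples per second
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
CHANNELS = 1
SECONDS = 30 * 60     # a 30 minute session

total_bytes = RATE * SAMPLE_WIDTH * CHANNELS * SECONDS
total_mib = total_bytes / (1024 * 1024)
print(total_bytes, round(total_mib, 1))  # 158760000 151.4
```

So roughly 150 MiB of raw samples under those assumptions. That's a lot to keep in a Python list, though not necessarily fatal on its own; stereo or a higher sample rate scales it linearly.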
# ? Sep 13, 2016 00:24 |
|
Dex: I realize that's the error with her version of the code- I pointed that out. I just don't know how to 'passively' record the stream while the code advances, then dump that entire stream to a wave file. Baka Kaba: The program becomes immediately non responsive, to the point where the 'End Process?' prompt will come up. I can't test. The crash errors are unspecified if it hangs like this. I'm not sure why the callback function would be necessary still; I want one single recording. If the wav recording is simply too large for a discontinuous recording, is there a way I could stick a couple lines of code at the bottom of each iteration of the experimental trial 'FOR' loop that says 'Write frames to wav file; clear buffer for next trial'? Each trial individually is less than 5 seconds- just the total time exceeds 30 minutes. Specifically the part of the code here: "#Present study trials *TRIMMED FOR BREVITY- but displays multiple trials of words which participants must read aloud*" is the giant 'FOR ___ in ____' that displays the hundreds of trials.
|
# ? Sep 13, 2016 18:59 |
|
JVNO posted:Dex: I realize that's the error with her version of the code- I pointed that out. I just don't know how to 'passively' record the stream while the code advances, then dumping that entire stream to a wave file. I haven't used PyAudio, but it sounds like "immediately non responsive" is because the program is waiting for the recording to end. That's where the callback function comes in. In callback mode, PyAudio will call a specified callback function whenever there is new (recorded) audio data available. Note that PyAudio calls the callback function in a separate thread. The separate thread is what allows the audio to record while the rest of the script still plays through.
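A rough sketch of what callback mode could look like, for illustration only. The PyAudio calls used (PyAudio(), open() with stream_callback, start_stream(), stop_stream(), close(), terminate(), get_sample_size()) are from its documented API, but the helper names, the 16-bit mono 44.1 kHz format, and the choice to write each chunk straight to the wave file are all assumptions, not the lab's actual code. Writing to disk inside an audio callback isn't ideal for low-latency work, but it keeps the example short and avoids holding 30 minutes of audio in memory.

```python
import wave


def make_wav_callback(wav_file, continue_flag=0):
    """Build a PyAudio-style callback that appends each chunk to wav_file.

    continue_flag defaults to 0, assumed here to match pyaudio.paContinue;
    record_in_background() below passes the real constant explicitly.
    """
    def callback(in_data, frame_count, time_info, status):
        wav_file.writeframes(in_data)      # stream to disk chunk by chunk
        return (None, continue_flag)       # no output audio; keep recording
    return callback


def record_in_background(path, rate=44100, channels=1, chunk=1024):
    """Start recording to `path` and return a function that stops it."""
    import pyaudio  # imported here so the helper above works without it

    pa = pyaudio.PyAudio()
    wav_file = wave.open(path, "wb")
    wav_file.setnchannels(channels)
    wav_file.setsampwidth(pa.get_sample_size(pyaudio.paInt16))
    wav_file.setframerate(rate)

    stream = pa.open(format=pyaudio.paInt16, channels=channels, rate=rate,
                     input=True, frames_per_buffer=chunk,
                     stream_callback=make_wav_callback(wav_file,
                                                       pyaudio.paContinue))
    stream.start_stream()   # returns right away; PyAudio calls back on its own thread

    def stop():
        stream.stop_stream()
        stream.close()
        pa.terminate()
        wav_file.close()    # finalizes the wave header
    return stop
```

The experiment script would then call stop = record_in_background("session.wav") before the trial loop, run the trials as normal, and call stop() at the end to get one continuous file.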
|
# ? Sep 13, 2016 21:07 |
|
I don't know anything about PyAudio and I haven't even looked at it, but generally when you want to do two things at once...like interview a person AND record audio, you'll need threads or multiprocessing. edit: oops, left thread reply page open for a long time...
|
# ? Sep 13, 2016 21:39 |
|
JVNO posted:I'm not sure why the callback function would be necessary still; I want one single recording. If the wav recording is simply too large for a discontinuous recording, is there a way I could stick a couple lines of code at the bottom of each iteration of the experimental trial 'FOR' loop that says 'Write frames to wav file; clear buffer for next trial'? Each trial individually is less than 5 seconds- just the total time exceeds 30 minutes. Yeah, it's been talked about but in case you're not clear - code execution by default is synchronous, which is a fancy way of saying that all your code gets executed in order, one line at a time, until you hit some end point. So your code starts up, initialises the audio engine thing, then sits in that for loop waiting for all the chunks to come in, 30 mins' worth. Only after that loop has completed does it move on to the next stuff. That waiting behaviour is called blocking because it blocks execution, and does nothing until it gets what it's waiting for. It can't respond to input or anything, because it can't process it until the waiting is over. That's why programs hang (as opposed to crashing). I think what you're looking for is asynchronous behaviour, where you start the polling loop that grabs the chunks, but that goes off on its own to work and the rest of your code runs as usual. So you have two different tasks running simultaneously, instead of one after the other. That is the kind of thing you want, but it doesn't just happen, right? To break out of that synchronous, one-line-after-the-other behaviour you need some kind of async structure that will let you run multiple things at once. Luckily it looks like the audio thing has already written it for you - if you pass in a callback, it should run in the background on its own and run whatever code is in the callback when there's an event (I'm assuming whenever it finishes a chunk, I'm phoneposting here so I can't look).
So you can set up the engine, give it a callback handler to do whatever with the stuff it produces, send it on its way and then carry on with the rest of your script doing something else. This is basically how async stuff tends to work, and yeah it's more complicated than a basic script that just runs from beginning to end. baka kaba fucked around with this message at 22:06 on Sep 13, 2016 |
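To make the blocking vs. background distinction concrete, here's a toy sketch using only the standard library. A worker thread stands in for the audio engine (the fake chunks and every name here are invented for illustration); the main thread carries on with its "trials" while chunks pile up in the background.

```python
import threading
import time


def collect_chunks(chunks, stop_event):
    """Pretend recorder: append a fake chunk every 10 ms until told to stop."""
    while not stop_event.is_set():
        chunks.append(b"\x00" * 16)
        time.sleep(0.01)


def run_experiment(n_trials=3):
    chunks = []
    stop_event = threading.Event()
    worker = threading.Thread(target=collect_chunks, args=(chunks, stop_event))
    worker.start()               # returns immediately; "recording" runs alongside

    completed = 0
    for _ in range(n_trials):    # the main script is free to run its trials
        time.sleep(0.05)         # stand-in for showing a word on screen
        completed += 1

    stop_event.set()             # ask the worker to finish
    worker.join()                # wait for it to exit cleanly
    return completed, len(chunks)
```

If collect_chunks were called directly instead of through a thread, the trial loop would never start, which is the hang described above.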
# ? Sep 13, 2016 22:02 |
|
I'm scraping a site using Selenium and the PhantomJS webdriver. The weird thing is, it works in the interactive console, but in a script no matter how long I wait for the page to load, the element I'm looking for isn't found. I can replicate this in the console by stringing together multiple commands with a semicolon. Even with a long wait it fails. However, in the interactive prompt, immediately after I get the error it starts to work. And it's not a matter of how long I wait for the element to load. If I just wait 1 second, then get the exception, immediately running the find again works. I tried to replicate this in the script with try/except but that doesn't work either. Could this be some kind of async thing? Does Python need to relinquish control for a moment in order for the page load to register? I'm very confused. To clarify: Python code:
SurgicalOntologist fucked around with this message at 05:30 on Sep 14, 2016 |
# ? Sep 14, 2016 05:19 |
|
Change to the webdriver for a real browser and see if the behavior happens there. That's one of my first steps when trying to figure problems like this out.
|
# ? Sep 14, 2016 05:25 |
|
Thermopyle posted:Change to the webdriver for a real browser and see if the behavior happens there. That's one of my first steps when trying to figure problems like this out. Duh, that makes debugging so much easier. Turned out to be URL typo. Thanks!
|
# ? Sep 14, 2016 05:37 |
How come this thing just hangs here? code:
|
|
# ? Sep 14, 2016 20:22 |
|
So my (newbie) webscraping project is actually making some progress. I'm scraping a site with motorcycles and would like to store them in a sqlite3 database but am not sure how to proceed. I can get the information I want, which is the brand, type, mileage, year and the dealer who sells it. For each entry I scrape I put the values in variables. For now I just want to write them to disk or database, but in the end I'd like to identify each one of them uniquely (unfortunately license plates are usually not listed, so I need to think of something for that...). As for structuring the data I've been doing some reading but I can't really see what I should use to store this info. Do I use a list, tuple or dictionary? Ideally I'd parse the info for 100-200 items with 5 attributes each. Would it be a good idea to create lists in a list, where the order of the items relates to the value inside the list? Like the first item is brand, 2nd type, 3rd mileage, etc? And then write them to sqlite with list.pop or something? I'm really eager to hear how you approach questions like these because I don't really know how to proceed.
|
# ? Sep 15, 2016 21:08 |
|
LochNessMonster posted:So my (newbie) webscraping project is actually making some progress. I'm scraping a site with motorcycles and would like to store them in a sqlite3 database but am not sure on how to proceed. Use a database. A database has tables. Each table is kind of like a spreadsheet. So you'd have a field (column in spreadsheet) for each attribute. You use some sort of structure like a dictionary when you're dealing with the items in python, and then map the attributes in your python code to fields in the database. Thermopyle fucked around with this message at 21:12 on Sep 15, 2016 |
# ? Sep 15, 2016 21:10 |
|
LochNessMonster posted:So my (newbie) webscraping project is actually making some progress. I'm scraping a site with motorcycles and would like to store them in a sqlite3 database but am not sure on how to proceed. You talk to sqlite3 using SQL. You first create a "table", which is just a set of defined columns. CREATE TABLE motorcycles(brand, type, mileage INTEGER, year INTEGER); Then you can add rows with the INSERT statement: INSERT INTO motorcycles VALUES('Toyota', 'V8', 32000, 2008); To execute SQL, use cursor.execute(): Python code:
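A minimal sketch of that flow with the standard-library sqlite3 module. The table and values are the ones from the statements above; the in-memory database and the SELECT at the end are additions for illustration.

```python
import sqlite3

# ":memory:" keeps the example self-contained; pass a file path to persist data.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

cursor.execute(
    "CREATE TABLE motorcycles(brand, type, mileage INTEGER, year INTEGER)"
)

# "?" placeholders let sqlite3 handle the quoting of scraped values.
cursor.execute(
    "INSERT INTO motorcycles VALUES (?, ?, ?, ?)",
    ("Toyota", "V8", 32000, 2008),
)
conn.commit()

rows = cursor.execute("SELECT brand, year FROM motorcycles").fetchall()
print(rows)  # [('Toyota', 2008)]
```

Using placeholders instead of building the SQL string by hand matters for scraped data, since a stray quote in a dealer name would otherwise break the statement.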
|
# ? Sep 15, 2016 21:59 |
|
Thermopyle posted:Use a database. A database has tables. Each table is kind of like a spreadsheet. So you'd have a field (column in spreadsheet) for each attribute. I'm definitely going to write the data to a database. What I'm struggling with is how to structure the data with my script before writing to disk. The program currently scrapes html pages that are saved to disk. Each page has 10 vehicles in it, so I loop through the divs parsing the values for brand/type/mileage/year/dealer. That's where I am right now. To get that data to a database I figured I should probably "store" that data inside the script for at least all vehicles on 1 page, or for all vehicles on all the pages. When I have parsed 1 page (and thus 10 vehicles) I write all of them to the database at once, instead of opening (and closing) a db connection for each iteration of the page loop(s). So I think I need to put the vehicle info in a list, tuple or dictionary before writing it to the database. I could be missing something obvious though.
|
# ? Sep 15, 2016 21:59 |
|
LochNessMonster posted:I'm definitely going to write the data to a database. What I'm struggling with is how to structure the data with my script before writing to disk. That's exactly what dicts and lists are for. So you could have a list called list_of_motorcycles and each motorcycle could be a dict, which links keys to values. Once you have parsed all the information for one motorcycle, append that dictionary to the end of your list_of_motorcycles and move to the next one. Then, at the end, you've got a big ol' list of motorcycle dictionaries. But really, you'll already have the db connection open and I guarantee you that you won't be taxing sqlite by calling 100 INSERTs. So you can just add each entry instead of storing them in another object before adding them. Python code:
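As a rough illustration of the list-of-dicts idea, here's a sketch with made-up sample data (the two bikes, the dealer names, and the in-memory database are all hypothetical). sqlite3's named placeholders pull values out of each dict by key, so the same dicts work for one-at-a-time inserts or a single executemany().

```python
import sqlite3

# Hypothetical parsed results; in the real script these come from the scraper.
list_of_motorcycles = [
    {"brand": "Honda", "type": "CB500", "mileage": 12000,
     "year": 2014, "dealer": "Dealer A"},
    {"brand": "Yamaha", "type": "MT-07", "mileage": 8000,
     "year": 2015, "dealer": "Dealer B"},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE motorcycles(brand, type, mileage INTEGER, year INTEGER, dealer)"
)

# Named placeholders (:brand, ...) are filled from each dict's keys.
conn.executemany(
    "INSERT INTO motorcycles VALUES (:brand, :type, :mileage, :year, :dealer)",
    list_of_motorcycles,
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM motorcycles").fetchone()[0]
print(count)  # 2
```

Swapping executemany() for a conn.execute() call inside the parsing loop gives the insert-as-you-go version, with the same SQL string.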
Tigren fucked around with this message at 23:07 on Sep 15, 2016 |
# ? Sep 15, 2016 22:49 |
|
Tigren posted:That's exactly what dicts and lists are for. So you could have a list called list_of_motorcycles and each motorcycle could be a dict, which links keys to values. Once you have parsed all the information for one motorcycle, append that dictionary to the end of your list_of_motorcycles and move to the next one. Then, at the end, you've got a big ol' list of motorcycle dictionaries. Thank you, this is what I had in mind as a concept, but I had no clue if it would be a good and/or efficient way of doing it. I also didn't know what type I should've used. Thanks for helping me figure this out! After I manage to do this, I can create a loop based on Suspicious Dish's SQL query to write to a database.
|
# ? Sep 15, 2016 23:05 |
|
If all you're doing is taking the data from the form and not doing anything meaningful with it before insertion into the DB, you may as well just insert it into the DB immediately. There's no penalty for leaving the database connection open for the length of scraping, especially since it's sqlite.
|
# ? Sep 16, 2016 00:43 |
|
Master_Odin posted:If all you're doing is taking the data from the form and not doing anything meaningful with it before insertion into the DB, you may as well just insert it into the DB immediately. There's no penalty for leaving the database connection open for the length of scraping, especially since it's sqlite. I'm really new to programming so I'm taking baby steps. In the near future I will be doing stuff with the data before inserting it. Good to know I could keep the connection open if I for some reason need to in the future though.
|
# ? Sep 16, 2016 01:03 |
fletcher posted:How come this thing just hangs here? Fixed it with: code:
|
|
# ? Sep 16, 2016 02:38 |
|
LochNessMonster posted:Thank you, this is what I had in mind as a concept, but I had no clue if it would be a good and/or efficient way of doing it. I also didn't know what type I should've used. Thanks for helping me figure this out! you could also use sqlalchemy for this. first create your database entity: Python code:
Python code:
|
# ? Sep 16, 2016 11:32 |
|
Is there a good book for people new to Python who aren't new to programming in general? Essentially less of the "this is how you add two numbers" and more of a "Python for C# Developers" style of book?
|
# ? Sep 16, 2016 17:21 |
|
Dex posted:you could also use sqlalchemy for this. It looks a bit more complicated, is it something I should be able to do as a complete beginner? What are the pros/cons compared to the plain sqlite3 method?
|
# ? Sep 16, 2016 17:28 |
|
Martytoof posted:Is there a good book for people new to Python who aren't new to programming in general? I googled for "python for java developers" since C# and Java are so similar and came up with this: https://antrix.net/static/pages/python-for-java/online/ After a quick skim it seems alright. Note that he focuses on Python 2.7 whereas a decent amount of developers are on Python 3. But it will be close enough...
|
# ? Sep 16, 2016 17:30 |
|
LochNessMonster posted:It looks a bit more complicated, is it something I should be able to do as a complete beginner? Use plain sqlite for now. What you learn will be super useful when you go to start using an ORM like sqlalchemy as it'll help you understand it.
|
# ? Sep 16, 2016 17:31 |
|
Thermopyle posted:I googled for "python for java developers" since C# and Java are so similar and came up with this: https://antrix.net/static/pages/python-for-java/online/ Looks good to me! Thanks. I'm versed in Java and C# but honestly any language will be fine, I just have a hard time putting up with the first few chapters of most books where we get into "this is a variable" "this is a statement" "here is a definition of an object" and I lose all interest in going any further :P
|
# ? Sep 16, 2016 17:36 |
|
Thermopyle posted:Use plain sqlite for now. Was just reading about sqlalchemy and while it looks like overkill for now, I might indeed need it in the future. I'll start fiddling with sqlite for now. edit: I must say I'm starting to enjoy this a lot more than I thought I would. I once had a Java programming class which had a horrible teacher (for several years, consistently 90% of the class failed) and thought programming sucks balls. A few years back I was trying to find a specific brand/type motorcycle and I was quite annoyed I had to manually check a lot of dealer sites to see if they had any, so I figured that'd be a nice project to start with. I now have a scraper/parser working that gets roughly 100 vehicles from 1 dealer who has a standard webgallery which I've seen several other dealers use as well. Hopefully I can get the data of another 10-20 dealers this way. I'm pretty sure I'll be running into a lot of design flaws in a while, but figuring those out seems like half the fun. LochNessMonster fucked around with this message at 19:59 on Sep 16, 2016 |
# ? Sep 16, 2016 17:41 |
|
LochNessMonster posted:I'm pretty sure I'll be running into a lot of design flaws in a while, but figuring those out seems like half the fun. i don't think i've ever got through an entire project without rethinking something i did at the start and could have done better. getting something working and being happy with how it works are two totally different things, i barely ever accomplish the latter. it's good to know when to just leave something be though otherwise you'll never finish anything! sqlalchemy is definitely overkill for what you're trying to do, it's just handy to keep in mind if things start growing. the more you understand using database connections and sql directly, the easier the sqlalchemy stuff is to figure out, though, so there's no real rush to start using it - was just throwing it out there as an option to look into if you're curious
|
# ? Sep 16, 2016 20:10 |
|
I'm in a weird situation where I'm developing a python script on one machine in a conda environment, but I need to run it on a Windows box. Normally, not a big deal since I have Anaconda installed and up to date on both. However, this script relies on pexpect, which has some unix lib dependencies. So I'm limited to a Cygwin instance here. I tried to install the linux Anaconda, but it dumps out when it's trying to install conda. So my question is this: pexpect is currently the only thing other than the std libs that I'm using for this and it's already installed in Cygwin via pip. Python3 on Cygwin is 3.4.3. So far, I've only specified my conda envs with "python3". Can I specify a specific point release of python for my env? If not, what should I look out for in my source to make sure it will execute in the Cygwin environment?
|
# ? Sep 19, 2016 18:09 |
|
flosofl posted:Can I specify a specific point release of python for my env? Yes, you can specify python versions like any other conda package: code:
|
# ? Sep 19, 2016 21:32 |
|
BigRedDot posted:Yes, you can specify python versions like any other conda package: Thanks!
|
# ? Sep 19, 2016 21:39 |
|
Cross-posting another question from the Bokeh Google Group: quote: I have a tricky case of circular callbacks and I'm wondering if anyone has any thoughts. SurgicalOntologist fucked around with this message at 21:50 on Sep 19, 2016 |
# ? Sep 19, 2016 21:42 |
|
I've got this code that is searching through a directory and trying to match files up to another directory. If the file in the first is found with the same name in the second, it'll move the file in the first directory to the second. code:
code:
code:
|
# ? Sep 19, 2016 22:32 |
|
huhu posted:I've got this code that is searching through a directory and trying to match files up to another directory. If the file in the first is found with the same name in the second, it'll move the file in the first directory to the second. Break only takes you out of the current for loop, you presumably have another loop around that, which you're not breaking out of. The simplest thing might be to set a variable when you've found it, and check that variable at the top of your outer loop and break if the variable is set.
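A small sketch of the flag-variable approach, since the original loops aren't shown. The function name and the dict-of-lists stand-in for directory listings are hypothetical; the point is only where the flag gets set and checked.

```python
def find_first_match(source_names, target_dirs):
    """Return (name, directory) for the first source name found in a target dir.

    target_dirs maps a directory name to the list of file names it contains.
    Returns None if nothing matches.
    """
    found = None
    for name in source_names:
        if found is not None:        # flag checked at the top of the outer loop
            break
        for directory, contents in target_dirs.items():
            if name in contents:
                found = (name, directory)
                break                # only leaves the inner loop
    return found
```

Wrapping the loops in a function and using return at the point of the match is another common way to get out of both loops at once.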
|
# ? Sep 19, 2016 22:39 |