Python information and short questions megathread.

BigRedDot: Mar 6, 2008

Thermopyle posted:

I wish they talked about why they went with Miniconda instead of virtualenv. Reading between the lines it sounds like what I mentioned earlier...packages that require compiling.

Also if they have more to install than just Python packages but want to manage everything with one tool.

# ? Sep 10, 2016 21:24

PoizenJam: Dec 2, 2006; Damn!!!
It's PoizenJam!!!

Hey guys- I'm trying to help out a new student in my lab. They're using Python to run Psychology experiments, but there's an issue I'm not acquainted with.

Basically, participants are supposed to be reading aloud a list of words that appear on screen. This is all very simple, but she wants to record the entire stream of vocal responses and dump it to a single large wave file- but I'm not quite sure how you would set up the PyAudio/wave modules to passively record a long experiment.

Now this isn't necessarily how I'd approach it, and I'm working with her code as best I can, but here it is:

code:

FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 1024
RECORD_SECONDS = 1800
WAVE_OUTPUT_FILENAME = "fastboostCongButtonAudio_"+str(userVar['1. Subject'])+".wav" 
audio = pyaudio.PyAudio()

# start Recording
stream = audio.open(format=FORMAT, channels=CHANNELS,rate=RATE, input=True,frames_per_buffer=CHUNK)
frames = []

for j in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

#Present study trials
#    *TRIMMED FOR BREVITY- but displays multiple trials of words which participants must read aloud*

#AFTER STUDY TRIALS COMPLETE
stream.stop_stream()
stream.close()
audio.terminate()
    
    
waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()

study_output.close()

I recognize why her original code won't work- the 'for' loop recording audio occurs before the study trials are even presented. It will just hang, not responding, and sometimes crash. But when I move those lines to post study trials, such as:

code:

stream = audio.open(format=FORMAT, channels=CHANNELS,rate=RATE, input=True,frames_per_buffer=CHUNK)
frames = []

#Present study trials
#    *TRIMMED FOR BREVITY- but displays multiple trials of words which participants must read aloud*

#AFTER STUDY TRIALS COMPLETE
for j in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
stream.stop_stream()
stream.close()
audio.terminate()
    
    
waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(frames))
waveFile.close()

I still get what appears to be a hang/crash. This also occurs if I skip the 'for j in range' loop altogether and just insert the 'data = stream.read(CHUNK)' and 'frames.append(data)' commands inline.

I'm guessing there is another function for recording passively, but I'm not aware of it. Is there an alternate way I could do it?

PoizenJam fucked around with this message at 06:38 on Sep 12, 2016

# ? Sep 12, 2016 06:30

Dex: May 26, 2006; Quintuple x!!!

Would not escrow again.

VERY MISLEADING!

i've never used pyaudio, but from a quick look, instead of blocking during recording you can use a callback: http://people.csail.mit.edu/hubert/pyaudio/#wire-callback-example

# ? Sep 12, 2016 11:00

PoizenJam: Dec 2, 2006; Damn!!!
It's PoizenJam!!!

I'm curious why the blocked form of the code wouldn't be valid? The intention is to record the entire PyAudio stream from the experiment (up to about 30 minutes) and dump it to a single wave file.

# ? Sep 12, 2016 20:37

LochNessMonster: Feb 3, 2005; I need about three fitty

Not sure what's going wrong here.

code:

import os, tarfile
baseFileName = "data"
dataDir = "data/"
dataFiles = os.listdir(dataDir)


#adds all files in dataFiles, including directories and unwanted files.
with tarfile.open("data/archive/test.tar", "w") as tar:
    for name in dataFiles:
        if name.endswith(".data"):
            tar.add(str(name))

gives an error message (posted further down).

If I change the last line to:

code:

tar.add(str("something" + name))

it does the trick. But I don't actually want to have files called something<name>. Any clues what I'm doing wrong?

The error message I get is:

code:

Traceback (most recent call last):
  File "/home/dir/project.py", line 11, in <module>
    tar.add(str(name))
  File "/usr/lib/python3.4/tarfile.py", line 1907, in add
    tarinfo = self.gettarinfo(name, arcname)
  File "/usr/lib/python3.4/tarfile.py", line 1779, in gettarinfo
    statres = os.lstat(name)
FileNotFoundError: [Errno 2] No such file or directory: 'htmlpage8.data'

edit:

Of course I figure out what's wrong seconds after posting.

the "something" is actually the subdirectory I'm opening the .data files from (/home/dir/project/data/htmlpage*.data).

Without including the path, it can't find the files. I guess I just need to find out how to tar files without the path data.

LochNessMonster fucked around with this message at 21:17 on Sep 12, 2016

# ? Sep 12, 2016 21:13

Master_Odin: Apr 15, 2010; My spear never misses its mark...

ladies

You could use os.chdir() to set your working path to "something" and then you could just use the name of the file.

Then you'd probably need to use os.getcwd() to get original directory, change into directory with files, add files to tar then move tar to original directory after you're done.

Or just make a copy of each file in original directory, add to tar, delete copy.

Master_Odin fucked around with this message at 21:47 on Sep 12, 2016

# ? Sep 12, 2016 21:44

LochNessMonster: Feb 3, 2005; I need about three fitty

Master_Odin posted:

You could use os.chdir() to set your working path to "something" and then you could just use the name of the file.

Then you'd probably need to use os.getcwd() to get original directory, change into directory with files, add files to tar then move tar to original directory after you're done.

Or just make a copy of each file in original directory, add to tar, delete copy.

I fixed it with:

tar.add(str("data/" + name), arcname=str(name))
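
For anyone hitting the same thing, here is a small self-contained sketch of that fix. The helper name archive_data_files and the demo file names are made up for illustration; the only tarfile feature it relies on is the arcname argument of tar.add(), which sets the name stored inside the archive independently of where the file is read from.

```python
import os
import tarfile
import tempfile


def archive_data_files(data_dir, tar_path, suffix=".data"):
    """Add every file in data_dir ending with suffix, stored under its bare name."""
    with tarfile.open(tar_path, "w") as tar:
        for name in sorted(os.listdir(data_dir)):
            if name.endswith(suffix):
                # Read from the real location, but store without the directory part.
                tar.add(os.path.join(data_dir, name), arcname=name)


# Demo in a throwaway directory (all file names here are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "data")
    os.mkdir(data_dir)
    for fname in ("htmlpage1.data", "htmlpage2.data", "notes.txt"):
        with open(os.path.join(data_dir, fname), "w") as fh:
            fh.write("placeholder")

    tar_path = os.path.join(tmp, "test.tar")
    archive_data_files(data_dir, tar_path)

    with tarfile.open(tar_path) as tar:
        archived_names = tar.getnames()

print(archived_names)  # ['htmlpage1.data', 'htmlpage2.data']
```

Using os.path.join instead of string concatenation avoids having to remember the trailing slash on the directory name.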

# ? Sep 12, 2016 21:58

Dex: May 26, 2006; Quintuple x!!!

Would not escrow again.

VERY MISLEADING!

JVNO posted:

I'm curious why the blocked form of the code wouldn't be valid? The intention is to record the entire PyAudio stream from the experiment (up to about 30 minutes) and dump it to a single wave file.

i'm just guessing from the code you posted(it's monday and i'm ill so reading is for nerds), but in her example she's reading before the questions are presented. in the code block you posted, you're reading afterwards. the 'hang' you're referring to is possibly just the recording actually starting after the questions have been posted, since:

Python code:

for j in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)

...reads as "from now until however many seconds, start recording", to me, whereas the callback example has a start_stream which should kick off that reading in the background(assuming your callback function is doing that)

# ? Sep 12, 2016 22:04

baka kaba: Jul 19, 2003; PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

^ yeah, it looks like it just loops, waiting for each chunk to come through, then adds it to the list and waits for the next one. That's 30 minutes of chunks to wait for, so it blocks (that's what hanging is) until it's done

JVNO posted:

I'm curious why the blocked form of the code wouldn't be valid? The intention is to record the entire PyAudio stream from the experiment (up to about 30 minutes) and dump it to a single wave file.

It looks like your code is basically the example code from the documentation, so maybe you're just running out of memory? 30 minutes of raw audio is a lot to hold in a list without streaming it to disk. Does it work with 5 or 10 seconds?

What do your crash errors say?
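
For a rough sense of scale, here is a sketch that works out the raw size from the settings in the posted code and shows one way to write chunks to disk as they arrive instead of collecting them in a list. The function name stream_chunks_to_wave is made up, and the demo feeds it fake silent chunks; in the real script the chunks would come from stream.read(CHUNK).

```python
import os
import tempfile
import wave

RATE = 16000
CHANNELS = 1
SAMPLE_WIDTH = 2        # bytes per sample for 16-bit audio (paInt16)
CHUNK = 1024
RECORD_SECONDS = 1800

# Raw size of the whole recording if it were all held in memory.
raw_bytes = RATE * CHANNELS * SAMPLE_WIDTH * RECORD_SECONDS
print(raw_bytes / 1e6, "MB")


def stream_chunks_to_wave(chunks, path):
    """Write each chunk to the wave file as soon as it is available."""
    wave_file = wave.open(path, "wb")
    try:
        wave_file.setnchannels(CHANNELS)
        wave_file.setsampwidth(SAMPLE_WIDTH)
        wave_file.setframerate(RATE)
        for data in chunks:
            wave_file.writeframes(data)
    finally:
        wave_file.close()


# Demo with ten fake chunks of silence standing in for microphone data.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.wav")
    fake_chunks = (b"\x00" * CHUNK * SAMPLE_WIDTH for _ in range(10))
    stream_chunks_to_wave(fake_chunks, path)
    with wave.open(path, "rb") as wf:
        frames_written = wf.getnframes()

print(frames_written)  # 10240
```

With these settings the full 30 minutes comes to roughly 58 MB of raw audio, so memory alone may not explain a hang, but writing as you go still keeps the list from growing.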

baka kaba fucked around with this message at 00:30 on Sep 13, 2016

# ? Sep 13, 2016 00:24

PoizenJam: Dec 2, 2006; Damn!!!
It's PoizenJam!!!

Dex: I realize that's the error with her version of the code- I pointed that out. I just don't know how to 'passively' record the stream while the code advances, then dumping that entire stream to a wave file.

Baka Kaba:The program becomes immediately non responsive, to the point where the 'End Process?' prompt will come up. I can't test. The crash errors are unspecified if it hangs like this

I'm not sure why the callback function would be necessary still; I want one single recording. If the wav recording is simply too large for a discontinuous recording, is there a way I could stick a couple lines of code at the bottom of each iteration of the experimental trial 'FOR' loop that says 'Write frames to wav file; clear buffer for next trial'? Each trial individually is less than 5 seconds- just the total time exceeds 30 minutes.

Specifically the part of the code here:

"#Present study trials
*TRIMMED FOR BREVITY- but displays multiple trials of words which participants must read aloud*"

Is the giant 'FOR ___ in ____' that displays the hundreds of trials.

# ? Sep 13, 2016 18:59

Tigren: Oct 3, 2003

JVNO posted:

Dex: I realize that's the error with her version of the code- I pointed that out. I just don't know how to 'passively' record the stream while the code advances, then dumping that entire stream to a wave file.

Baka Kaba:The program becomes immediately non responsive, to the point where the 'End Process?' prompt will come up. I can't test. The crash errors are unspecified if it hangs like this

I'm not sure why the callback function would be necessary still; I want one single recording. If the wav recording is simply too large for a discontinuous recording, is there a way I could stick a couple lines of code at the bottom of each iteration of the experimental trial 'FOR' loop that says 'Write frames to wav file; clear buffer for next trial'? Each trial individually is less than 5 seconds- just the total time exceeds 30 minutes.

Specifically the part of the code here:

"#Present study trials
*TRIMMED FOR BREVITY- but displays multiple trials of words which participants must read aloud*"

Is the giant 'FOR ___ in ____' that displays the hundreds of trials.

I haven't used PyAudio, but it sounds like "immediately non responsive" is because the program is waiting for the recording to end. That's where the callback function comes in. In callback mode, PyAudio will call a specified callback function whenever there is new (recorded) audio data available. Note that PyAudio calls the callback function in a separate thread. The separate thread is what allows the audio to record while the rest of the script still plays through.
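
A minimal sketch of what that could look like, based on the callback example in the PyAudio docs and not tested against a real microphone. The class name BackgroundRecorder and its method names are made up for illustration, and PA_CONTINUE is assumed to match pyaudio.paContinue. The pyaudio import sits inside start() so the rest of the class can be tried without audio hardware.

```python
import wave

PA_CONTINUE = 0  # assumed to equal pyaudio.paContinue ("keep the stream going")


class BackgroundRecorder:
    """Collects microphone chunks via PyAudio's callback mode (hypothetical helper)."""

    def __init__(self, rate=16000, channels=1, chunk=1024):
        self.rate = rate
        self.channels = channels
        self.chunk = chunk
        self.frames = []
        self._audio = None
        self._stream = None

    def _callback(self, in_data, frame_count, time_info, status):
        # PyAudio calls this on its own thread whenever a chunk is ready.
        self.frames.append(in_data)
        return (None, PA_CONTINUE)

    def start(self):
        import pyaudio  # imported here so the class loads without PyAudio installed
        self._audio = pyaudio.PyAudio()
        self._stream = self._audio.open(format=pyaudio.paInt16,
                                        channels=self.channels,
                                        rate=self.rate,
                                        input=True,
                                        frames_per_buffer=self.chunk,
                                        stream_callback=self._callback)
        self._stream.start_stream()  # returns immediately; recording continues in background

    def stop_and_save(self, path, sample_width=2):
        # Shut the stream down if it was started, then write everything collected.
        if self._stream is not None:
            self._stream.stop_stream()
            self._stream.close()
            self._audio.terminate()
        wave_file = wave.open(path, "wb")
        wave_file.setnchannels(self.channels)
        wave_file.setsampwidth(sample_width)  # 2 bytes per sample for paInt16
        wave_file.setframerate(self.rate)
        wave_file.writeframes(b"".join(self.frames))
        wave_file.close()
```

In the experiment script this would be used roughly as recorder = BackgroundRecorder() and recorder.start() before the trial loop, then recorder.stop_and_save(WAVE_OUTPUT_FILENAME) after the last trial.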

# ? Sep 13, 2016 21:07

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

I don't know anything about PyAudio and I haven't even looked at it, but generally when you want to do two things at once...like interview a person AND record audio, you'll need threads or multiprocessing.

edit: oops, left thread reply page open for a long time...

# ? Sep 13, 2016 21:39

baka kaba: Jul 19, 2003; PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

JVNO posted:

I'm not sure why the callback function would be necessary still; I want one single recording. If the wav recording is simply too large for a discontinuous recording, is there a way I could stick a couple lines of code at the bottom of each iteration of the experimental trial 'FOR' loop that says 'Write frames to wav file; clear buffer for next trial'? Each trial individually is less than 5 seconds- just the total time exceeds 30 minutes.

Yeah, it's been talked about but in case you're not clear - code execution by default is synchronous, which is a fancy way of saying that all your code gets executed in order, one line at a time, until you hit some end point. So your code starts up, initialises the audio engine thing, then sits in that for loop waiting for all the chunks to come in, 30 mins' worth. Only after that loop has completed does it move on to the next stuff. That waiting behaviour is called blocking because it blocks execution, and does nothing until it gets what it's waiting for. It can't respond to input or anything, because it can't process it until the waiting is over. That's why programs hang (as opposed to crashing)

I think what you're looking for is asynchronous behaviour, where you start the polling loop that grabs the chunks, but that goes off on its own to work and the rest of your code runs as usual. So you have two different tasks running simultaneously, instead of one after the other. That is the kind of thing you want, but it doesn't just happen, right? To break out of that synchronous, one-line-after-the-other behaviour you need some kind of async structure that will let you run multiple things at once

Luckily it looks like the audio thing has already written it for you - if you pass in a callback, it should run in the background on its own and run whatever code is in the callback when there's an event (I'm assuming whenever it finishes a chunk, I'm phoneposting here so I can't look).

So you can set up the engine, give it a callback handler to do whatever with the stuff it produces, send it on its way and then carry on with the rest of your script doing something else. This is basically how async stuff tends to work, and yeah it's more complicated than a basic script that just runs from beginning to end
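
To make the "two tasks at once" idea concrete, here is a toy sketch using only the standard library's threading module. It has nothing to do with audio: poll_chunks is a made-up stand-in for a blocking read loop, and the word list stands in for the experiment trials.

```python
import threading
import time

collected = []
stop_event = threading.Event()


def poll_chunks():
    """Stand-in for a blocking read loop: 'receives' a fake chunk every 10 ms."""
    while not stop_event.is_set():
        collected.append(b"chunk")
        time.sleep(0.01)


# Start the polling loop on its own thread so it no longer blocks the script.
worker = threading.Thread(target=poll_chunks)
worker.start()

# Meanwhile the main script carries on with its own work (the "trials").
trials_shown = []
for word in ["apple", "table", "river"]:
    trials_shown.append(word)
    time.sleep(0.05)

# Tell the background loop to finish and wait for it to exit.
stop_event.set()
worker.join()

print(len(trials_shown), "trials shown;", len(collected), "chunks collected meanwhile")
```

The callback approach is the same idea, except the library owns the background thread and just hands you each chunk.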

baka kaba fucked around with this message at 22:06 on Sep 13, 2016

# ? Sep 13, 2016 22:02

SurgicalOntologist: Jun 17, 2004

I'm scraping a site using Selenium and the PhantomJS webdriver. The weird thing is, it works in the interactive console, but in a script no matter how long I wait for the page to load, the element I'm looking for isn't found. I can replicate this in the console by stringing together multiple commands with a semicolon. Even with a long wait it fails. However, in the interactive prompt, immediately after I get the error it starts to work. And it's not a matter of how long I wait for the element to load. If I just wait 1 second, then get the exception, immediately running the find again works. I tried to replicate this in the script with try/except but that doesn't work either. Could this be some kind of async thing? Does Python need to relinquish control for a moment in order for the page load to register? I'm very confused.

To clarify

Python code:

In [154]: driver = webdriver.PhantomJS()

In [155]: driver.get(redacted_url); driver.maximize_window(); sleep(2); driver.find_element_by_name(redacted_element_name)
---------------------------------------------------------------------------
NoSuchElementException                    Traceback (most recent call last)
<ipython-input-155-5ca245665979> in <module>()
----> 1 driver.get(redacted_url); driver.maximize_window(); sleep(2); driver.find_element_by_name(redacted_element_name)

<traceback snipped>

In [156]: driver.find_element_by_name(redacted_element_name)
Out[156]: <selenium.webdriver.remote.webelement.WebElement (session="a24df3f0-7a32-11e6-b72f-c930f9a60756", element=":wdc:1473826868311")>

Edit: hmm with a long enough wait it does work in the prompt in one line, but still not in the script.

SurgicalOntologist fucked around with this message at 05:30 on Sep 14, 2016

# ? Sep 14, 2016 05:19

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Change to the webdriver for a real browser and see if the behavior happens there. That's one of my first steps when trying to figure problems like this out.

# ? Sep 14, 2016 05:25

SurgicalOntologist: Jun 17, 2004

Thermopyle posted:

Change to the webdriver for a real browser and see if the behavior happens there. That's one of my first steps when trying to figure problems like this out.

Duh, that makes debugging so much easier. Turned out to be URL typo. :doh:

Thanks!

# ? Sep 14, 2016 05:37

fletcher: Jun 27, 2003; ken park is my favorite movie; Cybernetic Crumb

How come this thing just hangs here?

code:

$ sudo /usr/local/bin/pip -v install --upgrade virtualenv
1 location(s) to search for versions of virtualenv:
* https://pypi.python.org/simple/virtualenv/
Getting page https://pypi.python.org/simple/virtualenv/
Looking up "https://pypi.python.org/simple/virtualenv/" in the cache
No cache entry available
Starting new HTTPS connection (1): pypi.python.org
"GET /simple/virtualenv/ HTTP/1.1" 200 10671
Updating cache with response from "https://pypi.python.org/simple/virtualenv/"
Caching b/c date exists and max-age > 0

# ? Sep 14, 2016 20:22

LochNessMonster: Feb 3, 2005; I need about three fitty

So my (newbie) webscraping project is actually making some progress. I'm scraping a site with motorcycles and would like to store them in a sqlite3 database but am not sure on how to proceed.

I can get the information I want, which is the brand, type, mileage, year and the dealer who sells it. For each entry I scrape I put the values in variables.

For now I just want to write them to disk or database, but in the end I'd like to identify each one of them uniquely (unfortunately license plates are usually not listed, so I need to think of something for that...).

As for structuring the data I've been doing some reading but I can't really see what I should use to store this info. Do I use a list, tuple or dictionary?

Ideally I'd parse the info for 100-200 items with 5 attributes each. Would it be a good idea to create lists in a list and the order of the items relates to the value inside the list? Like the first item is brand, 2nd type, 3rd mileage, etc? And then write them to sqlite with list.pop or something? I'm really eager to hear how you approach questions like these because I don't really know how to proceed.

# ? Sep 15, 2016 21:08

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

LochNessMonster posted:

So my (newbie) webscraping project is actually making some progress. I'm scraping a site with motorcycles and would like to store them in a sqlite3 database but am not sure on how to proceed.

I can get the information I want, which is the brand, type, mileage, year and the dealer who sells it. For each entry I scrape I put the values in variables.

For now I just want to write them to disk or database, but in the end I'd like to identify each one of them uniquely (unfortunately license plates are usually not listed, so I need to think of something for that...).

As for structuring the data I've been doing some reading but I can't really see what I should use to store this info. Do I use a list, tuple or dictionary?

Ideally I'd parse the info for 100-200 items with 5 attributes each. Would it be a good idea to create lists in a list and the order of the items relates to the value inside the list? Like the first item is brand, 2nd type, 3rd mileage, etc? And then write them to sqlite with list.pop or something? I'm really eager to hear how you approach questions like these because I don't really know how to proceed.

Use a database. A database has tables. Each table is kind of like a spreadsheet. So you'd have a field (column in spreadsheet) for each attribute.

You use some sort of structure like a dictionary when you're dealing with the items in python, and then map the attributes in your python code to fields in the database.

Thermopyle fucked around with this message at 21:12 on Sep 15, 2016

# ? Sep 15, 2016 21:10

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

LochNessMonster posted:

So my (newbie) webscraping project is actually making some progress. I'm scraping a site with motorcycles and would like to store them in a sqlite3 database but am not sure on how to proceed.

I can get the information I want, which is the brand, type, mileage, year and the dealer who sells it. For each entry I scrape I put the values in variables.

For now I just want to write them to disk or database, but in the end I'd like to identify each one of them uniquely (unfortunately license plates are usually not listed, so I need to think of something for that...).

As for structuring the data I've been doing some reading but I can't really see what I should use to store this info. Do I use a list, tuple or dictionary?

Ideally I'd parse the info for 100-200 items with 5 attributes each. Would it be a good idea to create lists in a list and the order of the items relates to the value inside the list? Like the first item is brand, 2nd type, 3rd mileage, etc? And then write them to sqlite with list.pop or something? I'm really eager to hear how you approach questions like these because I don't really know how to proceed.

You use sqlite3 by using it with SQL. You first create a "table", which is just a set of defined columns.

CREATE TABLE motorcycles(brand, type, mileage INTEGER, year INTEGER);

Then you can add rows with the INSERT statement:

INSERT INTO motorcycles VALUES('Toyota', 'V8', 32000, 2008);

To execute SQL, use cursor.execute():

Python code:

import sqlite3
conn = sqlite3.connect('motorcycles.db')
cursor = conn.cursor()

cursor.execute("CREATE TABLE motorcycles(brand, type, mileage INTEGER, year INTEGER)")

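# motorcycles is assumed to be a list of (brand, type, mileage, year) tuples
for (brand, type, mileage, year) in motorcycles:
    # the values must be passed as a single sequence, not as separate arguments
    cursor.execute("INSERT INTO motorcycles VALUES(?, ?, ?, ?)", (brand, type, mileage, year))

# without a commit the inserted rows are not saved to the file
conn.commit()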

# ? Sep 15, 2016 21:59

LochNessMonster: Feb 3, 2005; I need about three fitty

Thermopyle posted:

Use a database. A database has tables. Each table is kind of like a spreadsheet. So you'd have a field (column in spreadsheet) for each attribute.

You use some sort of structure like a dictionary when you're dealing with the items in python, and then map the attributes in your python code to fields in the database.

I'm definitely going to write the data to a database. What I'm struggling with is how to structure the data with my script before writing to disk.

The program currently scrapes html pages that are saved to disk. Each page has 10 vehicles in it, so I loop through the divs parsing the values for brand/type/mileage/year/dealer.

That's where I am right now.

To get that data to a database I figured I should probably "store" that data inside the script for at least all vehicles on 1 page, or for all vehicles on all the pages.

When I have parsed 1 page (and thus 10 vehicles) I write all of them to the database at once, instead of opening (and closing) a db connection for each iteration of the page loop(s).

So I think I need to put the vehicle info in a list, tuple or dictionary before writing it to the database.

I could be missing something obvious though.

# ? Sep 15, 2016 21:59

Tigren: Oct 3, 2003

LochNessMonster posted:

I'm definitely going to write the data to a database. What I'm struggling with is how to structure the data with my script before writing to disk.

The program currently scrapes html pages that are saved to disk. Each page has 10 vehicles in it, so I loop through the divs parsing the values for brand/type/mileage/year/dealer.

That's where I am right now.

To get that data to a database I figured I should probably "store" that data inside the script for at least all vehicles on 1 page, or for all vehicles on all the pages.

When I have parsed 1 page (and thus 10 vehicles) I write all of them to the database at once, instead of opening (and closing) a db connection for each iteration of the page loop(s).

So I think I need to put the vehicle info in a list, tuple or dictionary before writing it to the database.

I could be missing something obvious though.

That's exactly what dicts and lists are for. So you could have a list called list_of_motorcycles and each motorcycle could be a dict, which links keys to values. Once you have parsed all the information for one motorcycle, append that dictionary to the end of your list_of_motorcycles and move to the next one. Then, at the end, you've got a big ol' list of motorcycle dictionaries.

But really, you'll already have the db connection open and I guarantee you that you won't be taxing sqlite by calling 100 INSERTs. So you can just add each entry instead of storing them in another object before adding them.

Python code:

list_of_motorcycles = list() # This establishes an empty list called list_of_motorcycles

# loop through the divs parsing the values for brand/type/mileage/year/dealer.
brand = 'Kawasaki'
type = 'Ninja'
mileage = 10000
year = '2016'
dealer = 'Crazy Ed'
motorcycle = {'brand': brand,
              'type': type,
              'mileage': mileage,
              'year': year,
              'dealer': dealer}
list_of_motorcycles.append(motorcycle)
# end of loop
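
If you do go with the list of dicts, here is a sketch of how it could be written out in one go. It assumes a five-column table (the earlier CREATE TABLE example plus a dealer column), uses an in-memory database so it can be run as-is, and the second motorcycle is made-up sample data. sqlite3 accepts named placeholders like :brand that are filled from dict keys, and executemany runs the statement once per dict.

```python
import sqlite3

# Sample data in the same shape as the list built above (values are invented).
list_of_motorcycles = [
    {"brand": "Kawasaki", "type": "Ninja", "mileage": 10000, "year": 2016, "dealer": "Crazy Ed"},
    {"brand": "Honda", "type": "CB500", "mileage": 4200, "year": 2014, "dealer": "Crazy Ed"},
]

# ":memory:" keeps the demo self-contained; use a file name such as
# "motorcycles.db" to persist the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS motorcycles("
    "brand TEXT, type TEXT, mileage INTEGER, year INTEGER, dealer TEXT)"
)

# Each :name placeholder is looked up in the dict for that row.
conn.executemany(
    "INSERT INTO motorcycles VALUES(:brand, :type, :mileage, :year, :dealer)",
    list_of_motorcycles,
)
conn.commit()

rows = conn.execute(
    "SELECT brand, type, mileage FROM motorcycles ORDER BY mileage"
).fetchall()
print(rows)  # [('Honda', 'CB500', 4200), ('Kawasaki', 'Ninja', 10000)]
conn.close()
```

Named placeholders mean the order of keys in each dict doesn't matter, which is one less thing to get wrong compared with positional lists.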

Tigren fucked around with this message at 23:07 on Sep 15, 2016

# ? Sep 15, 2016 22:49

LochNessMonster: Feb 3, 2005; I need about three fitty

Tigren posted:

That's exactly what dicts and lists are for. So you could have a list called list_of_motorcycles and each motorcycle could be a dict, which links keys to values. Once you have parsed all the information for one motorcycle, append that dictionary to the end of your list_of_motorcycles and move to the next one. Then, at the end, you've got a big ol' list of motorcycle dictionaries.
Python code:
list_of_motorcycles = list() # This establishes an empty list called list_of_motorcycles

# loop through the divs parsing the values for brand/type/mileage/year/dealer.
brand = 'Kawasaki'
type = 'Ninja'
mileage = 10000
year = '2016'
dealer = 'Crazy Ed'
motorcycle = {'brand': brand,
              'type': type,
              'mileage': mileage,
              'year': year,
              'dealer': dealer}
list_of_motorcycles.append(motorcycle)
# end of loop

Thank you, this is what I had in mind as a concept, but I had no clue if it would be a good and/or efficient way of doing it. I also didn't know what type I should've used. Thanks for helping me figure this out!

After I manage to do this, I can create a loop based on Suspicious Dish's SQL query to write to a database.

# ? Sep 15, 2016 23:05

Master_Odin: Apr 15, 2010; My spear never misses its mark...

ladies

If all you're doing is taking the data from the form and not doing anything meaningful with it before insertion into the DB, you may as well just insert it into the DB immediately. There's no penalty for leaving the database connection open for the length of the scraping, especially since it's sqlite.

# ? Sep 16, 2016 00:43

LochNessMonster: Feb 3, 2005; I need about three fitty

Master_Odin posted:

If all you're doing is taking the data from the form and not doing anything meaningful with it before insertion into the DB, you may as well just insert it into the DB immediately. There's no penalty for leaving the database connection open for the length of the scraping, especially since it's sqlite.

I'm really new to programming so I'm taking baby steps. In the near future I will be doing stuff with the data before inserting it. Good to know I could keep the connection open if I for some reason need to in the future though.

# ? Sep 16, 2016 01:03

fletcher: Jun 27, 2003; ken park is my favorite movie; Cybernetic Crumb

fletcher posted:

How come this thing just hangs here?

code:

$ sudo /usr/local/bin/pip -v install --upgrade virtualenv
1 location(s) to search for versions of virtualenv:
* https://pypi.python.org/simple/virtualenv/
Getting page https://pypi.python.org/simple/virtualenv/
Looking up "https://pypi.python.org/simple/virtualenv/" in the cache
No cache entry available
Starting new HTTPS connection (1): pypi.python.org
"GET /simple/virtualenv/ HTTP/1.1" 200 10671
Updating cache with response from "https://pypi.python.org/simple/virtualenv/"
Caching b/c date exists and max-age > 0

Fixed it with:

code:

rm -rf /root/.cache/pip

# ? Sep 16, 2016 02:38

Dex: May 26, 2006; Quintuple x!!!

Would not escrow again.

VERY MISLEADING!

LochNessMonster posted:

Thank you, this is what I had in mind as a concept, but I had no clue if it would be a good and/or efficient way of doing it. I also didn't know what type I should've used. Thanks for helping me figure this out!

After I manage to do this, I can create a loop based on Suspicious Dish's SQL query to write to a database.

you could also use sqlalchemy for this. first create your database entity:

Python code:

from sqlalchemy import Column, String, Integer
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Motorcycle(Base):
    __tablename__ = 'motorcycles'
    id = Column(Integer, primary_key=True)
    brand = Column(String)
    mileage = Column(Integer)
    year = Column(Integer)
    dealer = Column(String)

then you can use Motorcycle for your inserts:

Python code:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from yourmodule import Motorcycle

engine = create_engine('sqlite:///yourdb')
Session = sessionmaker(bind=engine)
# This creates your table, if it doesn't exist.
Motorcycle.metadata.create_all(engine)

# Populate this with actual data
results = [Motorcycle(brand='whatever', mileage=1, year=2000, dealer='whoever'),
           Motorcycle(brand='welp', mileage=10, year=2000, dealer='whoever')]
session = Session()
for bike in results:
    session.add(bike)
session.commit()

bit more overhead but it can be easier to maintain if your project starts growing.

# ? Sep 16, 2016 11:32

some kinda jackal: Feb 25, 2003

Is there a good book for people new to Python who aren't new to programming in general?

Essentially less of the "this is how you add two numbers" and more of a "Python for C# Developers" style of book?

# ? Sep 16, 2016 17:21

LochNessMonster: Feb 3, 2005; I need about three fitty

Dex posted:

you could also use sqlalchemy for this.

bit more overhead but it can be easier to maintain if your project starts growing.

It looks a bit more complicated, is it something I should be able to do as a complete beginner?

What are the pros/cons compared to the plain sqlite3 method?

# ? Sep 16, 2016 17:28

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

Martytoof posted:

Is there a good book for people new to Python who aren't new to programming in general?

Essentially less of the "this is how you add two numbers" and more of a "Python for C# Developers" style of book?

I googled for "python for java developers" since C# and Java are so similar and came up with this: https://antrix.net/static/pages/python-for-java/online/

After a quick skim it seems alright. Note that he focuses on Python 2.7 whereas a decent amount of developers are on Python 3. But it will be close enough...

# ? Sep 16, 2016 17:30

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

LochNessMonster posted:

It looks a bit more complicated, is it something I should be able to do as a complete beginner?

What are the pros/cons compared to the plain sqlite3 method?

Use plain sqlite for now.

What you learn will be super useful when you go to start using an ORM like sqlalchemy as it'll help you understand it.
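
For reference, a minimal sketch of what the plain sqlite3 route could look like, reusing the fields from Dex's Motorcycle example. The table name `motorcycles`, the column types, and the in-memory database are assumptions for illustration, not anything taken from the scraper itself.

```python
import sqlite3

# Hypothetical sample rows: (brand, mileage, year, dealer), mirroring the
# fields used in the Motorcycle example earlier in the thread.
bikes = [
    ("whatever", 1, 2000, "whoever"),
    ("welp", 10, 2000, "whoever"),
]

# ":memory:" keeps the sketch self-contained; pass a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS motorcycles ("
    "id INTEGER PRIMARY KEY, brand TEXT, mileage INTEGER, "
    "year INTEGER, dealer TEXT)"
)

# "?" placeholders let sqlite3 handle the quoting of scraped strings.
conn.executemany(
    "INSERT INTO motorcycles (brand, mileage, year, dealer) VALUES (?, ?, ?, ?)",
    bikes,
)
conn.commit()

# Read the rows back to confirm the insert worked.
rows = conn.execute(
    "SELECT brand, mileage FROM motorcycles ORDER BY mileage"
).fetchall()
print(rows)
```

The SQL is written by hand here, which is the part an ORM like sqlalchemy would otherwise generate.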

# ? Sep 16, 2016 17:31

some kinda jackal: Feb 25, 2003

Thermopyle posted:

I googled for "python for java developers" since C# and Java are so similar and came up with this: https://antrix.net/static/pages/python-for-java/online/

After a quick skim it seems alright. Note that he focuses on Python 2.7 whereas a decent number of developers are on Python 3. But it will be close enough...

Looks good to me! Thanks. I'm versed in Java and C# but honestly any language will be fine, I just have a hard time putting up with the first few chapters of most books where we get into "this is a variable" "this is a statement" "here is a definition of an object" and I lose all interest in going any further :P

# ? Sep 16, 2016 17:36

LochNessMonster: Feb 3, 2005; I need about three fitty

Thermopyle posted:

Use plain sqlite for now.

What you learn will be super useful when you go to start using an ORM like sqlalchemy as it'll help you understand it.

Was just reading about sqlalchemy and while it looks like overkill for now, I might indeed need it in the future.

I'll start fiddling with sqlite for now.

edit:

I must say I'm starting to enjoy this a lot more than I thought I would. I once had a Java programming class with a horrible teacher (for several years, 90% of the class consistently failed) and thought programming sucks balls.

A few years back I was trying to find a specific brand/type motorcycle and I was quite annoyed I had to manually check a lot of dealer sites to see if they had any, so I figured that'd be a nice project to start with.

I now have a scraper/parser working that gets roughly 100 vehicles from 1 dealer who has a standard webgallery which I've seen several other dealers use as well. Hopefully I can get the data of another 10-20 dealers this way.

I'm pretty sure I'll be running into a lot of design flaws in a while, but figuring those out seems like half the fun.

LochNessMonster fucked around with this message at 19:59 on Sep 16, 2016

# ? Sep 16, 2016 17:41

Dex: May 26, 2006; Quintuple x!!!

Would not escrow again.

VERY MISLEADING!

LochNessMonster posted:

I'm pretty sure I'll be running into a lot of design flaws in a while, but figuring those out seems like half the fun.

i don't think i've ever got through an entire project without rethinking something i did at the start and could have done better. getting something working and being happy with how it works are two totally different things, i barely ever accomplish the latter. it's good to know when to just leave something be though otherwise you'll never finish anything!

sqlalchemy is definitely overkill for what you're trying to do, it's just handy to keep in mind if things start growing. the more you understand using database connections and sql directly, the easier the sqlalchemy stuff is to figure out, though, so there's no real rush to start using it - was just throwing it out there as an option to look into if you're curious

# ? Sep 16, 2016 20:10

Proteus Jones: Feb 28, 2013

I'm in a weird situation where I'm developing a python script on one machine in a conda environment, but I need to run it on a Windows box. Normally, not a big deal since I have Anaconda installed and up to date on both.

However, this script relies on pexpect, which has some unix lib dependencies. So I'm limited to a Cygwin instance here. I tried to install the linux Anaconda, but it dumps out when it's trying to install conda. So my question is this:

pexpect is currently the only thing other than the std libs that I'm using for this, and it's already installed in Cygwin via pip. Python3 on Cygwin is 3.4.3. So far, I've only specified my conda envs with "python3". Can I specify a specific point release of python for my env? If not, what should I look out for in my source to make sure it will execute in the Cygwin environment?

# ? Sep 19, 2016 18:09

BigRedDot: Mar 6, 2008

flosofl posted:

Can I specify a specific point release of python for my env?

Yes, you can specify python versions like any other conda package:

code:

[bryan:...e-datavis-with-bokeh-python]$ conda create -n foo python=3.4.1                                 (master)
Fetching package metadata ...........
Solving package specifications: ..........

Package plan for installation in environment /Users/bryan/anaconda/envs/foo:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    setuptools-27.2.0          |           py34_0         528 KB

The following NEW packages will be INSTALLED:

    openssl:    1.0.1k-1
    pip:        8.1.2-py34_0
    python:     3.4.1-4
    readline:   6.2-2
    setuptools: 27.2.0-py34_0
    sqlite:     3.13.0-0
    tk:         8.5.18-0
    wheel:      0.29.0-py34_0
    xz:         5.0.5-1
    zlib:       1.2.8-3

Proceed ([y]/n)?

# ? Sep 19, 2016 21:32

Proteus Jones: Feb 28, 2013

BigRedDot posted:

Yes, you can specify python versions like any other conda package:

code:


[bryan:...e-datavis-with-bokeh-python]$ conda create -n foo python=3.4.1                                 (master)
Fetching package metadata ...........
Solving package specifications: ..........

Package plan for installation in environment /Users/bryan/anaconda/envs/foo:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    setuptools-27.2.0          |           py34_0         528 KB

The following NEW packages will be INSTALLED:

    openssl:    1.0.1k-1
    pip:        8.1.2-py34_0
    python:     3.4.1-4
    readline:   6.2-2
    setuptools: 27.2.0-py34_0
    sqlite:     3.13.0-0
    tk:         8.5.18-0
    wheel:      0.29.0-py34_0
    xz:         5.0.5-1
    zlib:       1.2.8-3

Proceed ([y]/n)?

Thanks!

# ? Sep 19, 2016 21:39

SurgicalOntologist: Jun 17, 2004

Cross-posting another question from the Bokeh Google Group:

quote:

I have a tricky case of circular callbacks and I'm wondering if anyone has any thoughts.

Basically, I have a HTML5 video whose currentTime property I want linked to the currently selected row of a DataTable. Currently this is mediated through a dummy glyph (until https://github.com/bokeh/bokeh/issues/3674 at least) with something like time_glyph.data_source.set('data', {'time': [video.currentTime]}); I have a server-side callback for that source where I update the selection of the data table (as well as a bunch of other things). Works great!

Now, I'm trying to link things in the other direction, so the user can use the arrow keys in the data table to go back and forth in the video. My first try was with some CustomJS callback like video.currentTime = cb_obj.get('selected')['1d'].indices[0] / 10; Of course, this leads to a circular callback problem.

Anyone have any ideas for a workaround? I'm stumped.

SurgicalOntologist fucked around with this message at 21:50 on Sep 19, 2016

# ? Sep 19, 2016 21:42

huhu: Feb 24, 2006

I've got this code that is searching through a directory and trying to match files up to another directory. If the file in the first is found with the same name in the second, it'll move the file in the first directory to the second.

code:

for filename_current in filenames_current:
	print("Checking %s to %s" % (filename_new, filename_current))
	if (filename_new == filename_current):
		print('Moving "%s" to "./uploads"...' % (filename_new),end="")
		filename_new_path = os.path.join(pdf_folder_new,filename_new)
		filename_current_path = os.path.join(foldername_current,filename_current)	
		shutil.move(filename_new_path, filename_current_path)
		print('done')
		break

Everything looks fine when running

code:

Checking file3.pdf to file1.pdf
Checking file3.pdf to file2.pdf
Checking file3.pdf to file3.pdf
Moving "file3.pdf" to "./uploads"...done

However, the code keeps running...

code:

Checking file3.pdf to file4.pdf
Checking file3.pdf to file5.pdf

Am I doing something wrong with the break statement?

# ? Sep 19, 2016 22:32

Edison was a dick: Apr 3, 2010; direct current only

huhu posted:

I've got this code that is searching through a directory and trying to match files up to another directory. If the file in the first is found with the same name in the second, it'll move the file in the first directory to the second.

code:

for filename_current in filenames_current:
	print("Checking %s to %s" % (filename_new, filename_current))
	if (filename_new == filename_current):
		print('Moving "%s" to "./uploads"...' % (filename_new),end="")
		filename_new_path = os.path.join(pdf_folder_new,filename_new)
		filename_current_path = os.path.join(foldername_current,filename_current)
		shutil.move(filename_new_path, filename_current_path)
		print('done')
		break

Everything looks fine when running

code:

Checking file3.pdf to file1.pdf
Checking file3.pdf to file2.pdf
Checking file3.pdf to file3.pdf
Moving "file3.pdf" to "./uploads"...done

However, the code keeps running...

code:

Checking file3.pdf to file4.pdf
Checking file3.pdf to file5.pdf

Am I doing something wrong with the break statement?

Break only takes you out of the current for loop. You presumably have another loop around that, which you're not breaking out of.
The simplest thing might be to set a variable when you've found it, and check that variable at the top of your outer loop and break if the variable is set.
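
A minimal sketch of that flag approach. The outer loop isn't shown in the post, so the one here (walking a hypothetical list of folders) is an assumption, as are the names `folders`, `checked`, and `found`.

```python
# Hypothetical data: the outer loop walks folders, the inner loop walks the
# files in each folder. The real outer loop is not shown in the post.
folders = [
    ("uploads/a", ["file1.pdf", "file2.pdf", "file3.pdf"]),
    ("uploads/b", ["file4.pdf", "file5.pdf"]),
]
filename_new = "file3.pdf"

checked = []   # every filename compared, so the effect of the flag is visible
found = False  # set once the matching file has been handled

for foldername_current, filenames_current in folders:
    if found:
        break  # flag was set on an earlier pass: leave the outer loop too
    for filename_current in filenames_current:
        checked.append(filename_current)
        if filename_new == filename_current:
            found = True
            break  # leaves only the inner loop

print(checked)
```

With the flag check in place, file4.pdf and file5.pdf are never compared. Putting both loops in a function and using return on the first match would be another way to get the same effect.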

# ? Sep 19, 2016 22:39
