Python information and short questions megathread.

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python information and short questions megathread.


Cat Plus Plus: Apr 8, 2011

Modern Pragmatist posted:

I guess one way around it would be to just leave it as a bytestring until we read in all values, and then convert them using the specific encoding if it's provided and iso8859 otherwise.

Yes, that's better. If you decode a non-ISO8859 bytestring to Unicode with the ISO8859 codec, you've already corrupted the data; re-encoding and then decoding with a different codec will not buy you anything but further corruption.

# ? Jul 31, 2012 19:50

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

So, it sounds like to me that you decoded a bytestring incorrectly, and now you want to try again using a different encoding specified in the file.

The correct way to fix that is not to re-encode back out to a bytestring, but just keep the bytestring around.
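To make "keep the bytestring around" concrete, here is a minimal sketch, assuming Python 3 and made-up sample data. The name `decode_all`, the sample bytes, and the ISO-8859-1 fallback are illustrative assumptions, not anything from the original code.

```python
# Hypothetical raw values read from a file; the encoding is only learned later.
raw_values = [b"caf\xc3\xa9", b"na\xc3\xafve"]


def decode_all(raw_values, encoding=None, fallback="iso-8859-1"):
    """Decode every stored bytestring exactly once.

    The declared encoding is used when one was found in the file;
    otherwise the fallback codec is used.
    """
    codec = encoding or fallback
    return [raw.decode(codec) for raw in raw_values]


# The file turned out to declare UTF-8, so the untouched bytes decode cleanly.
print(decode_all(raw_values, encoding="utf-8"))
```

Because the original bytes are never thrown away, a wrong first guess costs nothing: you just decode again from the same bytes.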

# ? Jul 31, 2012 19:51

Modern Pragmatist: Aug 20, 2008

Ok great. That's finally starting to click. I'll have to rework some things to get that setup but I'll probably end up storing all the bytestrings and then decode when a specific encoding is provided.

Another related question then. Say I have the Item class:

Python code:

class Item:
    def __init__(self,value,encoding):
        self._bytestring = value
        self.value = self._bytestring.decode(encoding)
        self._encoding = encoding

Now if the user wants to change the value of Item.value, then how do you go about ensuring that the string provided uses the encoding that we need? I'm guessing the answer is that you cannot, in which case would I have to create a method that requires the user to specify which encoding they are using or requires them to provide a bytestring instead?

# ? Jul 31, 2012 20:26

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Why are you not using the Unicode sandwich?

# ? Jul 31, 2012 20:28

Modern Pragmatist: Aug 20, 2008

Suspicious Dish posted:

Why are you not using the Unicode sandwich?

Just watched Ned Batchelder's talk at Pycon. I'll take a stab at it. Thanks.
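For anyone following along, a rough sketch of what the sandwich could look like for the Item class above, assuming Python 3 (where `str` is Unicode text). The `to_bytes` method is a name invented here for illustration.

```python
class Item:
    """Text is held as str internally; bytes exist only at the edges."""

    def __init__(self, raw, encoding):
        self._encoding = encoding
        # Decode once, on the way in.
        self.value = raw.decode(encoding)

    def to_bytes(self):
        # Encode once, on the way out, using the file's encoding.
        return self.value.encode(self._encoding)


item = Item(b"caf\xe9", "iso-8859-1")
# Callers assign plain text; they never need to know the file's encoding.
item.value = "na\u00efve"
print(item.to_bytes())
```

With this layout the question of "which encoding did the user's string use" goes away, since users only ever hand over text and the encoding matters only when writing bytes back out.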

# ? Jul 31, 2012 20:56

nonathlon: Jul 9, 2004; And yet, somehow, now it's my fault ...

Up to now, I've been using Epydoc for API documentation. Even when Epydoc was abandoned and started bitrotting, I just patched it up and kept going. But finally it's time to get with the new hotness, so I'm moving everything across to Sphinx, which has involved learning a whole new way of doing things. Some odd questions:

* I'm extracting docstrings from my code with apidoc. The big irritation with this is that apidoc looks in a given directory for modules and then parses them ... which means that if you point it at the top level of a typical python package, it picks up setup.py and the tests dir as well. apidoc says it can exclude modules as well, but this functionality is only partially working. Any better way of doing things?

* Sphinx uses/creates hella files and dirs. I'd prefer to just package (and distribute) the output rather than all the doctrees etc. What are other people doing?

* Swapping doc tools meant swapping the format docstrings are in. The native Sphinx format looks cryptic to me. The google form looks better as does the stuff used by the Cartouche extension. Opinions?

* I produce a lot of namespaced packages (e.g. foo.bar), which Sphinx doesn't seem to entirely like. Or rather, it presents the documentation for them in a strange way:

code:

Welcome to foo.bar's documentation!

foo.bar
   foo Package

bar Package
   bar Package
   analysis Module
   nodes Module
  Subpackages

io Package
   io Package
   baseio Module
  dialect Module

which doesn't seem entirely useful to me for a package that's physically structured:

code:

foo
- bar
   - analysis.py
   - nodes.py
   - io (dir)
       - baseio.py
       - dialect.py

# ? Aug 2, 2012 15:30

the: Jul 18, 2004; by Cowcaster

So I have an array of a bunch of floats. I want to find what spot in the array that 4.25 is in there (it's in there). How do I do that?

# ? Aug 3, 2012 13:16

Hammerite: Mar 9, 2007; And you don't remember what I said here, either, but it was pompous and stupid.; Jade Ear Joe

the posted:

So I have an array of a bunch of floats. I want to find what spot in the array that 4.25 is in there (it's in there). How do I do that?

Are we talking precisely 4.25, or just something that is 4.25 to some degree of precision? If it is the former you can just use the index() method.

As in: mylist.index(4.25)

If it needn't be precisely 4.25 but just quite close, I guess you have to do something more involved.
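One possible version of the "more involved" case, as a sketch assuming Python 3.5+ for `math.isclose`. The function name `index_close` and the tolerance values are arbitrary choices for illustration.

```python
import math


def index_close(values, target, rel_tol=1e-9, abs_tol=1e-9):
    """Return the index of the first element approximately equal to target."""
    for i, x in enumerate(values):
        if math.isclose(x, target, rel_tol=rel_tol, abs_tol=abs_tol):
            return i
    raise ValueError("no element close to %r" % (target,))


values = [0.1 + 0.2, 4.25, 1.0]
print(values.index(4.25))        # exact match works for 4.25
print(index_close(values, 0.3))  # 0.1 + 0.2 is not exactly 0.3, so compare with a tolerance
```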

# ? Aug 3, 2012 13:22

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

the posted:

So I have an array of a bunch of floats. I want to find what spot in the array that 4.25 is in there (it's in there). How do I do that?

I guess I'll take the more involved approach. Why do you think you need the index into the array? You rarely need it in Python.

# ? Aug 3, 2012 14:58

lunar detritus: May 6, 2009

This is probably stupid but I want to execute a command line program from a webpage using python. I'm probably going to use flask and just put a 'Start' and a 'Stop' button that executes and stops the program. I haven't tried yet but I should use subprocess, right?

My main question is, is there a way to capture the shell's output? The program captures audio and it displays in the shell how big the file is and how long it has been recording. I'm guessing I can use AJAX in the client-side to constantly check for updates but I'm not sure how to capture and refresh that info server-side. Any ideas?
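A rough sketch of the capturing part with subprocess, assuming Python 3.7+. The real recording program isn't named here, so a tiny Python child process stands in for it; `stream_output` and `fake_recorder` are made-up names.

```python
import subprocess
import sys


def stream_output(cmd):
    """Yield each line the child process writes, as soon as it appears."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same stream
        text=True,
        bufsize=1,                 # line-buffered on our side
    )
    with proc:
        for line in proc.stdout:
            yield line.rstrip("\n")


# Stand-in for the recorder: prints three status lines, unbuffered (-u).
fake_recorder = [
    sys.executable, "-u", "-c",
    "for i in range(3): print('recorded', i, 'sec')",
]

for status in stream_output(fake_recorder):
    print(status)
```

In a web app the loop would probably run in a background thread that stores the latest line somewhere the AJAX endpoint can read it. One caveat: a program that redraws its status in place with carriage returns instead of newlines would need to be read in chunks rather than line by line.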

# ? Aug 3, 2012 15:01

Emacs Headroom: Aug 2, 2003

Suspicious Dish posted:

I guess I'll take the more involved approach. Why do you think you need the index into the array? You rarely need it in Python.

He's probably doing something with his physics homework where he has two measurements and he's got to get the measurement of the second at the same time that the first one is at a certain value.

The annoying thing though (other than the fact he's coming to this thread constantly for his physics homework) is that in this post addressed to him in this very drat thread I explained in detail how to do this with numpy.

It's like talking to a wall. A wall that wants more help with its homework.

# ? Aug 3, 2012 15:19

the: Jul 18, 2004; by Cowcaster

Sorry, sometimes I forget if I ask the same question. I really do appreciate the help.

It was what he said. I had two arrays of 50 elements that were related. One had an ith position that had a value of 4.25, and I wanted to find out what i was so I could figure out what the ith element of the second array was.

I ended up doing:

Python code:

for i in [i for i,x in enumerate(testlist) if x == 1]: print i

which worked perfectly.

# ? Aug 3, 2012 17:19

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

the posted:

It was what he said. I had two arrays of 50 elements that were related. One had an ith position that had a value of 4.25, and I wanted to find out what i was so I could figure out what the ith element of the second array was.

code:

for a, b in zip(L1, L2):
    if matches(a):
        return b
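Spelled out as something runnable, the same idea might look like the sketch below. `matches` above is a placeholder, so a plain equality test is assumed here, and the function name and sample arrays are made up.

```python
def lookup(first, second, target):
    """Return the element of `second` paired with the first `target` in `first`."""
    for a, b in zip(first, second):
        if a == target:
            return b
    return None  # target never appeared in `first`


times = [1.0, 4.25, 9.0]
readings = ["low", "mid", "high"]
print(lookup(times, readings, 4.25))
```

No index is ever needed: zip walks both lists in step.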

# ? Aug 3, 2012 18:22

MC Fruit Stripe: Nov 26, 2002; around and around we go

I thought of a specific task I wanted to do in Python, really more of an excuse to learn more Python than anything else, but I've hit a brick wall. What command or module do I want to be using to pull specific text from a website?

I am trying to extract the styles from an artist's Allmusic.com page. In the source code of the site is this

code:

<dd class="styles">
    <ul>
    <li><a href="/style/club-dance-ma0000002544">Club/Dance</a></li>
    <li><a href="/style/soul-ma0000002865">Soul</a></li>
    <li><a href="/style/urban-ma0000011965">Urban</a></li>
    <li><a href="/style/adult-contemporary-r-b-ma0000012131">Adult Contemporary R&B</a></li>
    <li><a href="/style/contemporary-pop-rock-ma0000004443">Contemporary Pop/Rock</a></li>
    <li><a href="/style/contemporary-r-b-ma0000002969">Contemporary R&B</a></li>
    <li><a href="/style/dance-rock-ma0000012069">Dance-Rock</a></li>
    <li><a href="/style/funk-ma0000002606">Funk</a></li>
    <li><a href="/style/alternative-indie-rock-ma0000012230">Alternative/Indie Rock</a></li>
    <li><a href="/style/dance-pop-ma0000004548">Dance-Pop</a></li>
    <li><a href="/style/neo-psychedelia-ma0000012252">Neo-Psychedelia</a></li>
    </ul>
</dd>

I'd like to tell it to read the file, which I am currently grabbing with wget, and to save the file with only what's between <dd class="styles"> and </dd>. I could then take those results, and tell it to tell me only what's between > and < because the only thing that is ever between those in that list is the plain text name of the style.

The problem that I am having is that even though I can get python to input the file, print it out, print out only X characters, print starting from X character etc, I am not understanding how to extract a section of the file. Admittedly, I know diddly, so a little help would be appreciated.

# ? Aug 4, 2012 19:48

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

MC Fruit Stripe posted:

I'd like to tell it to read the file, which I am currently grabbing with wget, and to save the file with only what's between <dd class="styles"> and </dd>. I could then take those results, and tell it to tell me only what's between > and < because the only thing that is ever between those in that list is the plain text name of the style.

The problem that I am having is that even though I can get python to input the file, print it out, print out only X characters, print starting from X character etc, I am not understanding how to extract a section of the file. Admittedly, I know diddly, so a little help would be appreciated.

First of all, does Allmusic.com have an API? If so, use that instead of page scraping.

If it doesn't, then you can scrape the page. Use lxml.html to parse the HTML file, and extract the contents you want.

# ? Aug 4, 2012 19:54

OnceIWasAnOstrich: Jul 22, 2006

MC Fruit Stripe posted:

I thought of a specific task I wanted to do in Python, really more of an excuse to learn more Python than anything else, but I've hit a brick wall. What command or module do I want to be using to pull specific text from a website?

I am trying to extract the styles from an artist's Allmusic.com page. In the source code of the site is this
code:
<dd class="styles">
    <ul>
    <li><a href="/style/club-dance-ma0000002544">Club/Dance</a></li>
    <li><a href="/style/soul-ma0000002865">Soul</a></li>
    <li><a href="/style/urban-ma0000011965">Urban</a></li>
    <li><a href="/style/adult-contemporary-r-b-ma0000012131">Adult Contemporary R&B</a></li>
    <li><a href="/style/contemporary-pop-rock-ma0000004443">Contemporary Pop/Rock</a></li>
    <li><a href="/style/contemporary-r-b-ma0000002969">Contemporary R&B</a></li>
    <li><a href="/style/dance-rock-ma0000012069">Dance-Rock</a></li>
    <li><a href="/style/funk-ma0000002606">Funk</a></li>
    <li><a href="/style/alternative-indie-rock-ma0000012230">Alternative/Indie Rock</a></li>
    <li><a href="/style/dance-pop-ma0000004548">Dance-Pop</a></li>
    <li><a href="/style/neo-psychedelia-ma0000012252">Neo-Psychedelia</a></li>
    </ul>
</dd>
I'd like to tell it to read the file, which I am currently grabbing with wget, and to save the file with only what's between <dd class="styles"> and </dd>. I could then take those results, and tell it to tell me only what's between > and < because the only thing that is ever between those in that list is the plain text name of the style.

The problem that I am having is that even though I can get python to input the file, print it out, print out only X characters, print starting from X character etc, I am not understanding how to extract a section of the file. Admittedly, I know diddly, so a little help would be appreciated.

Take a look at the BeautifulSoup library or Scrapy, much easier than trying to extract bits of HTML with string commands.

# ? Aug 4, 2012 19:57

MC Fruit Stripe: Nov 26, 2002; around and around we go

I appreciate the answers, and will sit down with all of the options tonight and see what I can learn. Thank you.

# ? Aug 4, 2012 20:09

lunar detritus: May 6, 2009

MC Fruit Stripe posted:

I appreciate the answers, and will sit down with all of the options tonight and see what I can learn. Thank you.

For what it's worth I used Scrapy for something very similar and it worked perfectly well (at least once I figured out the appropriate XPath selector).

# ? Aug 4, 2012 22:12

Titan Coeus: Jul 30, 2007; check out my horn

OnceIWasAnOstrich posted:

Take a look at the BeautifulSoup library or Scrapy, much easier than trying to extract bits of HTML with string commands.

Seconding BeautifulSoup. Made my life much easier when parsing poorly formatted HTML.

# ? Aug 4, 2012 23:18

MC Fruit Stripe: Nov 26, 2002; around and around we go

Sigh. I am using BeautifulSoup.

All I want is to take a page on Allmusic.com, say The Beatles page, located here: http://www.allmusic.com/artist/the-beatles-mn0000754032

And ultimately, after however many steps, have exactly the following result

code:

The Beatles	British Invasion;British Psychedelia;Contemporary Pop/Rock;Early Pop/Rock

(Styles pruned for table breakage)

That just does not feel that difficult. But I can get it to return the page, return the page with formatting stripped, I can get it to tell me where 'styles' is, return every link or bolded word in the page, but I just can not get it to grab a section of text.

Help?

I've been trying to use the findAll() function, but it's as useless as I am. The problem is that there's nothing specific to find. I am not looking for the word styles, I am looking for the word styles then the next X words in the document until the list is over, and that is my total confusion. I don't want to search for text, I want to find a word, then return text until another word.

MC Fruit Stripe fucked around with this message at 00:34 on Aug 5, 2012

# ? Aug 5, 2012 00:28

Titan Coeus: Jul 30, 2007; check out my horn

MC Fruit Stripe posted:

Either they changed the meaning of parsing (unlikely), what I am trying to do is more complicated than space exploration (less likely), or I am just not loving seeing this (very likely).

All I want is to take a page on Allmusic.com, say The Beatles page, located here: http://www.allmusic.com/artist/the-beatles-mn0000754032

And ultimately, after however many steps, have exactly the following result
code:
The Beatles	British Invasion;British Psychedelia;Contemporary Pop/Rock;Early Pop/Rock
(Styles pruned for table breakage)

That just does not feel that difficult. But I can get it to return the page, return the page with formatting stripped, I can get it to tell me where 'styles' is, return every link or bolded word in the page, but I just can not get it to grab a section of text.

Help?

I have no idea how to suggest improvements as I have no clue what you are doing.

# ? Aug 5, 2012 00:31

Emacs Headroom: Aug 2, 2003

MC Fruit Stripe posted:

The problem is that there's nothing specific to find. I am not looking for the word styles, I am looking for the word styles then the next X words in the document until the list is over, and that is my total confusion. I don't want to search for text, I want to find a word, then return text until another word.

Couple of hints. First, you can use the Firefox Inspector (Tools -> Web Developer -> Inspect) to see that every style link is under a particular div. In this case, it's under dd.styles (which itself is under dl.details which is under div#sidebar.left). So you can use beautifulsoup to return stuff under that div.

Another hint, if the former seems like something in Martian, is to notice that every style listed there is a link, and the link is of the form http://www.allmusic.com/style/blahblahblah

So you can search for that link format, and return the text associated with that href. It would be better just to grab stuff in the div though.

If you don't know anything about how html is structured, this will be slightly harder for you.
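To show the "grab the links under dd.styles" idea without installing anything, here is a sketch using only the standard library's html.parser (Python 3) in place of BeautifulSoup. The class name and the shortened sample HTML are assumptions for illustration.

```python
from html.parser import HTMLParser


class StyleLinkParser(HTMLParser):
    """Collect the text of every <a> that sits inside <dd class="styles">."""

    def __init__(self):
        super().__init__()
        self.in_styles = False
        self.in_link = False
        self.styles = []

    def handle_starttag(self, tag, attrs):
        if tag == "dd" and ("class", "styles") in attrs:
            self.in_styles = True
        elif tag == "a" and self.in_styles:
            self.in_link = True
            self.styles.append("")  # start collecting this link's text

    def handle_endtag(self, tag):
        if tag == "dd":
            self.in_styles = False
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.styles[-1] += data


sample = """
<dd class="styles">
    <ul>
    <li><a href="/style/club-dance-ma0000002544">Club/Dance</a></li>
    <li><a href="/style/soul-ma0000002865">Soul</a></li>
    <li><a href="/style/urban-ma0000011965">Urban</a></li>
    </ul>
</dd>
<p><a href="/elsewhere">not a style</a></p>
"""

parser = StyleLinkParser()
parser.feed(sample)
parser.close()
print(";".join(parser.styles))
```

With BeautifulSoup 4 the equivalent would be roughly `soup.find("dd", class_="styles").find_all("a")` followed by reading each link's text. The point either way is to select by structure rather than search for the word "styles".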

Emacs Headroom fucked around with this message at 00:45 on Aug 5, 2012

# ? Aug 5, 2012 00:43

how!!: Nov 19, 2011; by angerbot

I am writing a python library. It is very simple, and laid out almost exactly like the project described here: http://guide.python-distribute.org/quickstart.html#lay-out-your-project

My setup.py looks the same too. When I create a new virtualenv, then run python setup.py install, I get this output:

code:

$ python setup.py install
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'install_requires'
  warnings.warn(msg)
running install
running build
running build_py
running install_lib
creating /Users/chris/Documents/python-etsy/ettest/lib/python2.7/site-packages/etsy
copying build/lib/etsy/__init__.py -> /Users/chris/Documents/python-etsy/ettest/lib/python2.7/site-packages/etsy
byte-compiling /Users/chris/Documents/python-etsy/ettest/lib/python2.7/site-packages/etsy/__init__.py to __init__.pyc
running install_egg_info
Writing /Users/chris/Documents/python-etsy/ettest/lib/python2.7/site-packages/python_etsy-0.1-py2.7.egg-info

Then when I go to the lib/python2.7/site-packages/ folder, I see my module there, but when I try to import it from the shell, it gives me an "ImportError: No module named etsy"

the code for my project is here: https://github.com/priestc/python-etsy

What could be causing this?

# ? Aug 5, 2012 17:53

Cat Plus Plus: Apr 8, 2011

how!! posted:

What could be causing this?

PYTHONPATH might be wrong. Inspect sys.path in that shell. Make sure virtualenv is active before you start the shell.
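Something like this, run inside the shell that fails to import, shows which interpreter and search path are actually in use. It is a generic sketch; `describe_environment` is a made-up helper name.

```python
import sys


def describe_environment():
    """Gather the details that decide where imports are looked up."""
    return {
        "executable": sys.executable,  # which python binary is running
        "prefix": sys.prefix,          # points into the virtualenv when one is active
        "path": list(sys.path),        # directories searched by import
    }


info = describe_environment()
print(info["executable"])
print(info["prefix"])
for entry in info["path"]:
    print("  ", entry)
```

If the virtualenv's site-packages directory is not in that list, the shell is not running the virtualenv's interpreter.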

# ? Aug 5, 2012 18:31

Ulio: Feb 17, 2011

Why am I getting a syntax error for the 7 in this "Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32" ???

# ? Aug 8, 2012 00:32

Lysidas: Jul 26, 2002; John Diefenbaker is a madman who thinks he's John Diefenbaker.; Pillbug

Because that isn't Python code? :confused:

# ? Aug 8, 2012 01:35

Ulio: Feb 17, 2011

nvm fixed it, thanks. IDLE was just being weird.

edit: Everything I type on IDLE says invalid syntax. It even errors the text written at the very top which appears each time on its own.

Ulio fucked around with this message at 02:20 on Aug 8, 2012

# ? Aug 8, 2012 02:07

Titan Coeus: Jul 30, 2007; check out my horn

Ulio posted:

nvm fixed it, thanks. IDLE was just being weird.

edit: Everything I type on IDLE says invalid syntax. It even errors the text written at the very top which appears each time on its own.

Screenshot?

# ? Aug 8, 2012 04:23

Ulio: Feb 17, 2011

I'm guessing it's something simple I am forgetting but here is a saved file I'm trying to run.

http://imgur.com/jXifV

# ? Aug 8, 2012 06:22

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Yes, that's invalid syntax.

# ? Aug 8, 2012 06:24

Hammerite: Mar 9, 2007; And you don't remember what I said here, either, but it was pompous and stupid.; Jade Ear Joe

Suspicious Dish posted:

Yes, that's invalid syntax.

More specifically...

You need another + between name and "!" on the final line

# ? Aug 8, 2012 09:23

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

No. Not only that. He's trying to make Python run a REPL session from a file, and it's choking on the ">>>".
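A small demonstration of why that fails, assuming Python 3. The session text is a guess at what such a saved file might contain, not the contents of the screenshot, and both helper names are made up.

```python
def is_valid_python(source):
    """Return True if the text compiles as a Python module."""
    try:
        compile(source, "<string>", "exec")
    except SyntaxError:
        return False
    return True


def strip_prompts(session_text):
    """Keep only what was typed at '>>> ' or '... ' prompts; drop output lines."""
    lines = []
    for line in session_text.splitlines():
        if line.startswith(">>> ") or line.startswith("... "):
            lines.append(line[4:])
    return "\n".join(lines) + "\n"


# Hypothetical saved shell window: prompts and output mixed in with the code.
session = '>>> name = "Ulio"\n>>> print("Hello, " + name + "!")\nHello, Ulio!\n'

print(is_valid_python(session))                 # the prompts are not Python
print(is_valid_python(strip_prompts(session)))  # the code alone is fine
```

A file meant to be run should contain only the code, typed into a new editor window rather than saved from the shell window.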

# ? Aug 8, 2012 15:37

Ulio: Feb 17, 2011

Ok I got it to work now, thanks for the answer. It worked when I opened a new window for it.

# ? Aug 8, 2012 21:33

Bodhi Tea: Oct 2, 2006; seconds are secular, moments are mine, self is illusion, music's divine.

Say I have a function with an optional argument which I don't know the default value of:

code:

def foo (max_id=<mystery value>)

and another function, that will call foo(). Is there a more succinct way of doing this?

code:

def bar(max_id=None):
    if max_id:
          foo(max_id)
    else:
          foo()

# ? Aug 12, 2012 18:54

Cat Plus Plus: Apr 8, 2011

Bodhi Tea posted:

Is there a more succinct way of doing this?

First you yell at the developer who wrote that other function, and then you do foo(max_id) (because now the function uses None for the default value).

You could do this, but I'm not really sure if it's that much better:

Python code:

def bar(max_id = None):
    foo_args = [max_id] if max_id is not None else []
    foo(*foo_args)

It doesn't buy you much, and you need to think a little longer to see what's happening.
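A keyword-dict variant of the same trick, as a sketch. The default of 42 in `foo` is a stand-in for the unknown "mystery value".

```python
def foo(max_id=42):  # 42 stands in for the default we do not know
    return max_id


def bar(max_id=None):
    # Pass max_id along only when the caller actually supplied one,
    # so foo falls back to its own default otherwise.
    kwargs = {} if max_id is None else {"max_id": max_id}
    return foo(**kwargs)


print(bar())
print(bar(7))
print(bar(0))
```

Testing `is None` rather than truthiness matters: with `if max_id:` a legitimate value of 0 would be treated as "not given".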

# ? Aug 12, 2012 19:17

tef: May 30, 2004; -> some l-system crap ->

Bodhi Tea posted:

Say I have a function with an optional argument which I don't know the default value of:

code:

def bar(*args, **kwargs):
    foo(*args, **kwargs)

http://docs.python.org/library/functools.html functools.wraps might be useful
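For reference, a sketch of how functools.wraps fits with the pass-through version. `foo` and its default of 42 are invented for the example.

```python
import functools


def foo(max_id=42):  # hypothetical wrapped function
    """Return the id that would be used."""
    return max_id


@functools.wraps(foo)  # copy foo's name and docstring onto the wrapper
def bar(*args, **kwargs):
    return foo(*args, **kwargs)


print(bar())
print(bar(max_id=3))
print(bar.__name__, "-", bar.__doc__)
```

Note that a pure pass-through forwards an explicit None unchanged, so `bar(None)` calls `foo(None)` rather than falling back to foo's default.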

# ? Aug 12, 2012 20:21

Cat Plus Plus: Apr 8, 2011

I don't think this is the case where all bar does is call foo, because it would make it pretty much pointless. If you use that argument later in the function then going with *args, **kwargs might actually make it worse.

# ? Aug 12, 2012 20:32

Captain von Trapp: Jan 23, 2006; I don't like it, and I'm sorry I ever had anything to do with it.

I apologize for the dumb question, but I am by no stretch a real programmer and am just fooling around with Python for fun. Someone posed a challenge in which I'm to find all four-digit numbers n such that n and 4n have the same digits, rearranged. Originally, I checked them all in Mathematica, where I did this (in a for loop):

code:

If[Sort[IntegerDigits[n]] == Sort[IntegerDigits[4n]], Print[n]]

What's the Pythonic way to do this? At first I tried:

code:

print [x for x in range(1000,10000) if list(str(x)).sort() == list(str(4*x)).sort()]

Which of course fails because sort sorts in place rather than returning a list for the == operator to act on. It would be easy to do this without list comprehensions but I'm guessing this ought to be a one-liner for a programmer. So I did:

code:

print [x for x in range(1000,10000) if set(str(x)) == set(str(4*x))]

Which mostly works but includes spurious results because (say) 4160 multiplied by four has the same digits but one of them is duplicated. Sets don't care about duplication. (Of course I can eliminate this problem by using 2500 for the upper limit of the range but I'm interested in the "proper" way to see if two lists are the same up to ordering.)

# ? Aug 12, 2012 21:26

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

tef posted:

code:
def bar(*args, **kwargs):
    foo(*args, **kwargs)
http://docs.python.org/library/functools.html functools.wraps might be useful

That would make bar(None) behave wrong. I'm not sure what Bodhi wants, though.

# ? Aug 12, 2012 21:37

Suspicious Dish: Sep 24, 2011; 2020 is the year of linux on the desktop, bro; Fun Shoe

Captain von Trapp posted:

Which of course fails because sort sorts in place rather than returning a list for the == operator to act on.

Try the "sorted" function:

code:

print [x for x in range(1000,10000) if sorted(str(x)) == sorted(str(4*x))]
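The same one-liner in Python 3 syntax, plus a collections.Counter variant that compares digit counts directly. This is a sketch; `same_digits` is a made-up helper name.

```python
from collections import Counter


def same_digits(a, b):
    """True when a and b use exactly the same digits, counting repeats."""
    return Counter(str(a)) == Counter(str(b))


# sorted() returns a new list, so the two results can be compared with ==.
matches = [n for n in range(1000, 10000) if sorted(str(n)) == sorted(str(4 * n))]
print(matches)

# Unlike set(), Counter notices that 16640 has an extra 6.
print(same_digits(4160, 4 * 4160))
```

Either approach treats the digits as a multiset, which is what "the same up to ordering" means here.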

# ? Aug 12, 2012 21:39
