Python information and short questions megathread.

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Python information and short questions megathread.


lotor9530: Apr 29, 2009

What is the best way to replace multiple variables in a template file with arguments from the command line?

Here's what I'm currently doing:

code:

# parse command line

argv = sys.argv
if len(argv) != 5:
  print "Syntax: file.py foo.txt variable1 variable2 variable3"
  sys.exit()

infile = sys.argv[1]
var1 = float(sys.argv[2])
var2 = int(sys.argv[3])
var3 = int(sys.argv[4])

me = 0

# Write variable2 and variable1 to foo.txt_0
# Create the new config file for writing
config = io.open(sys.argv[1] + '_0', 'w')

# Read the lines from the template, substitute the values, and write to the new config file
for line in io.open('foo.txttemplate', 'r'):
    line = line.replace('$variable1', sys.argv[2])
    config.write(line)

for line in io.open('in.lj17template', 'r'):
    line = line.replace('$variable2', sys.argv[3])
    config.write(line)

# Close the files
config.close()
print("Done, exiting.")

Running this (file.py foo.txt 100 200 300) gives me a file output of:

code:

This is my file that I want to replace 100 and variable2 in.
Blah blah blah.
This is my file that I want to replace variable1 and 200 in.
Blah blah blah.

I tried this, too:

code:

for line in io.open('foo.txttemplate', 'r'):
    line = line.replace('$variable1', sys.argv[2])
    config.write(line)
    line = line.replace('$variable2', sys.argv[3])
    config.write(line)

Which gives:

code:

This is my file that I want to replace 100 and variable2 in.
This is my file that I want to replace variable1 and 200 in.
Blah blah blah.
Blah blah blah.

# ? Sep 1, 2014 18:35

supercrooky: Sep 12, 2006

You're trying to make

code:

This is my file that I want to replace variable1 and variable2 in.
Blah blah blah.

into

code:

This is my file that I want to replace 100 and 200 in.
Blah blah blah.

right?

code:

for line in open(template, 'r'):
    config.write(line.replace('variable1', var1).replace('variable2', var2))

will do what you are trying to do. As for an actual strategy for managing config files, you should look into ConfigParser https://docs.python.org/2/library/configparser.html

# ? Sep 1, 2014 19:46

SurgicalOntologist: Jun 17, 2004

What you tried first is going through the whole file and doing one replacement while writing each line, then going through the whole file again doing the next replacement while writing each line.

Your second try is doing one replacement, then writing the line, then another replacement, then writing the line again, etc.

(At least I think, the fact that you have two different filenames suggests maybe some duplication is intended? It's not completely clear what you're trying to do specifically)

What you should be doing (if I understand your intention), as supercrooky suggested, is for each line, first do all the replacements, and only then write each line. There shouldn't be any need to loop over the file twice, or to do two writes on each iteration.

As a side note, you don't need the io module, the builtin open function is fine. You should also be using the with statement rather than manually closing:

Python code:

with open(sys.argv[1] + '_0', 'w') as config:
    with open('foo.txttemplate', 'r') as template:
        for line in template:
            # do stuff here
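
To fill in the "do stuff here" part: a rough Python 3 sketch of the "do every replacement, then write once" idea could look like the following. The names fill_line and fill_template and the values mapping are made up for illustration; only the placeholder names come from the question.

```python
def fill_line(line, values):
    """Return `line` with every placeholder in `values` replaced."""
    for placeholder, value in values.items():
        # str.replace needs strings, so numeric arguments are converted first.
        line = line.replace(placeholder, str(value))
    return line


def fill_template(template_path, output_path, values):
    """Copy the template to output_path, substituting placeholders line by line."""
    with open(template_path, "r") as template, open(output_path, "w") as config:
        for line in template:
            # All replacements happen before the single write for this line.
            config.write(fill_line(line, values))
```

Something like fill_template('foo.txttemplate', 'foo.txt_0', {'$variable1': 100.0, '$variable2': 200}) would then produce the file in one pass. One caveat of plain replace: a placeholder that is a prefix of another (say $variable1 and $variable10) can clobber the longer one.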

# ? Sep 1, 2014 20:53

Nippashish: Nov 2, 2005; Let me see you dance!

You should use a template engine like jinja2 because they are designed specifically to solve this type of problem.
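
Short of a full engine, the standard library's string.Template already understands $name placeholders like the ones in the question, so it may be enough here. A small Python 3 sketch; the sample text is borrowed from the posts above and render is just an illustrative helper name:

```python
from string import Template

template_text = (
    "This is my file that I want to replace $variable1 and $variable2 in.\n"
    "Blah blah blah.\n"
)


def render(text, **values):
    """Substitute $name placeholders; raises KeyError if a value is missing."""
    return Template(text).substitute(**values)


rendered = render(template_text, variable1=100, variable2=200)
```

If some placeholders may legitimately be left unfilled, Template.safe_substitute leaves them in place instead of raising.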

# ? Sep 1, 2014 21:14

lotor9530: Apr 29, 2009

supercrooky posted:

code:
for line in open(template, 'r'):
    config.write(line.replace('variable1', var1).replace('variable2', var2))
will do what you are trying to do. As for an actual strategy for managing config files, you should look into ConfigParser https://docs.python.org/2/library/configparser.html

Thanks! This works, except I had to change var1 and var2 to unicode(var1) and unicode(var2).

# ? Sep 1, 2014 21:15

Dominoes: Sep 20, 2007

Looking for help parsing XML files. I have long files - I want to pull some of the information into database entries.

Example XML:

XML code:

 <faa:Member>
    <aixm:Navaid gml:id="NAVAID_0000001">
      <aixm:timeSlice>
        <aixm:NavaidTimeSlice gml:id="NAVAID_TS_0000001">
          <gml:validTime>
            <gml:TimePeriod gml:id="NAVAID_TIME_PERIOD_0000001">
              <gml:beginPosition>2014-07-24T00:00:00.000-04:00</gml:beginPosition>
              <gml:endPosition indeterminatePosition="unknown"/>
            </gml:TimePeriod>
          </gml:validTime>
          <aixm:interpretation>BASELINE</aixm:interpretation>
          <aixm:designator>NUD</aixm:designator>
          <aixm:name>ADAK</aixm:name>

Say I want to store the 'name' to a database. Based on the official docs, it looks like I'd use this:

Python code:

tree = ET.parse('filename.xml')
root = tree.getroot()

for child in root:
    name = child[0][0][0][3].text
# ...

This is a delicate solution, as any variance in the entry structure will (and does) cause it to select the wrong element. If this was JSON, here's how I'd parse it:

Python code:

name = root['Member']['Navaid']['NavaidTimeSlice']['name']

Can I do something similar with XML?

Dominoes fucked around with this message at 02:00 on Sep 2, 2014

# ? Sep 2, 2014 01:30

KernelSlanders: May 27, 2013; Rogue operating systems on occasion spread lies and rumors about me.

BigRedDot posted:

Oh, I do too, hence the scare quotes.

Let's go another direction and make the ultra-flexible enterprise scorer that can perform arbitrary accumulations and transforms on the input with configurable decoders:

code:

''' The score.py module provides functions for generalized word 
scoring, as well as specialized scorers for particular rulesets.

'''

def score_word(word, decode, transform=lambda x: x, accum=sum):
    ''' Score an input word according to given decode, transform, and 
    accumulation policies.

    Args:
        word (str): 
            a word to score
        decode (callable): 
            callable taking one letter as input that maps the letter to 
            its score
        transform(callable, optional) : 
            callable that performs any necessary transformation on each 
            letter before decoding (default: identity)
        accum (callable, optional): 
            callable that reduces a sequence of letter scores into a 
            final score (default: sum)

    Returns:
        float : score

    Examples:

    >>> score_word("za", decode=lambda x: 2)
    4
    >>>

    '''
    return accum(decode(letter) for letter in transform(word))

#
# Scrabble (tm) specific scoring functions
# 

_SCRABBLE_SCORE_BY_LETTER = {
    "c": 3,
    "b": 3,
    "d": 2,
    "g": 2,
    "f": 4,
    "h": 4,
    "k": 5,
    "j": 8,
    "m": 3, 
    "q": 10,
    "p": 3,
    "w": 4,
    "v": 4,
    "y": 4,
    "x": 8,
    "z": 10,
}


def scrabble_word(word):
    ''' Score a word according to Scrabble (tm) rules.

    Args:
        word (str) : 
            a word to score

    Returns:
        float : score

    Examples:

    >>> scrabble_word("za")
    11
    >>>

    '''
    return score_word(
        word, 
        decode=lambda x: _SCRABBLE_SCORE_BY_LETTER.get(x, 1), 
        transform=lambda x: x.lower()
    )

if __name__ == '__main__':
    import doctest
    doctest.testmod()

Sorry, it's Labor Day, and I'm, er... uh, working? I leave it as an exercise to the reader to extend this and define an accumulator to handle scoring (double, triple)-(letter, word) modifiers. Don't forget to update the doctests, and re-run Sphinx to generate new API documentation (you'll need sphinx-napoleon to handle the Google style docstrings). Pull requests welcome!

Edit: Unicode support is an open issue; see the GH tracker.

Let's take a simple problem and rather than taking five minutes to solve it, let's figure out if it's a special case of some much larger set of problems that we might possibly (but probably won't) want to solve at some point in the future. Then let's develop a spec for an extensible framework and class library that can be called upon to solve any of those larger classes of problems using unnecessarily cryptic chains of method calls to execute the same functionality as we could have done yesterday when we started all this.

Sorry, but solving a problem that can be solved in 20 lines of code (unless it's running a space shuttle or pacemaker) shouldn't involve opening powerpoint to make a presentation on your proposed "framework."

This is why I hate java.

# ? Sep 2, 2014 03:48

Hed: Mar 31, 2004; Fun Shoe

Dominoes posted:

Looking for help parsing XML files. I have long files - I want to pull some of the information into database entries.

Example XML:
XML code:
 <faa:Member>
    <aixm:Navaid gml:id="NAVAID_0000001">
      <aixm:timeSlice>
        <aixm:NavaidTimeSlice gml:id="NAVAID_TS_0000001">
          <gml:validTime>
            <gml:TimePeriod gml:id="NAVAID_TIME_PERIOD_0000001">
              <gml:beginPosition>2014-07-24T00:00:00.000-04:00</gml:beginPosition>
              <gml:endPosition indeterminatePosition="unknown"/>
            </gml:TimePeriod>
          </gml:validTime>
          <aixm:interpretation>BASELINE</aixm:interpretation>
          <aixm:designator>NUD</aixm:designator>
          <aixm:name>ADAK</aixm:name>
Say I want to store the 'name' to a database. Based on the official docs, it looks like I'd use this:
Python code:
tree = ET.parse('filename.xml')
root = tree.getroot()

for child in root:
    name = child[0][0][0][3].text
# ...
This is a delicate solution, as any variance in the entry structure will (and does) cause it to select the wrong element. If this was JSON, here's how I'd parse it:
Python code:
name = root['Member']['Navaid']['NavaidTimeSlice']['name']
Can I do something similar with XML?

Take a look at the XPath examples in the Python docs, they should be able to get you going.
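
For a rough idea of what that looks like with namespaced tags, here is a Python 3 sketch using ElementTree's limited XPath support and a prefix-to-URI mapping. The sample document is a cut-down, made-up stand-in for the real file: the <root> wrapper is invented and the gml/faa parts are left out.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the real file; the <root> wrapper is hypothetical.
SAMPLE = """\
<root xmlns:aixm="http://www.aixm.aero/schema/5.1">
  <aixm:Navaid>
    <aixm:timeSlice>
      <aixm:NavaidTimeSlice>
        <aixm:designator>NUD</aixm:designator>
        <aixm:name>ADAK</aixm:name>
      </aixm:NavaidTimeSlice>
    </aixm:timeSlice>
  </aixm:Navaid>
</root>
"""

# Maps the prefix used in the path expressions to the namespace URI.
NS = {"aixm": "http://www.aixm.aero/schema/5.1"}


def navaid_names(xml_text):
    """Return the text of every aixm:name found under a NavaidTimeSlice."""
    root = ET.fromstring(xml_text)
    path = ".//aixm:NavaidTimeSlice/aixm:name"
    return [element.text for element in root.iterfind(path, NS)]
```

The path names the elements rather than counting child positions, so extra or reordered siblings don't change which element is picked.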

# ? Sep 2, 2014 04:16

Space Kablooey: May 6, 2009

Dominoes posted:

Looking for help parsing XML files. I have long files - I want to pull some of the information into database entries.
*snip*

I used https://pypi.python.org/pypi/xmltodict some time ago and it worked well. Not sure about its performance on large files.

EDIT:

Murodese posted:

Really, really bad. I was using xmltodict to read 300kb-1.5mb XML files and changing it to use XPath resulted in a speedup of something like 15,000%.

Ouch

Space Kablooey fucked around with this message at 15:31 on Sep 2, 2014

# ? Sep 2, 2014 04:27

Murodese: Mar 6, 2007; Think you've got what it takes?
We're looking for fine Men & Women to help Protect the Australian Way of Life.

Become part of the Legend. Defence Jobs.

HardDisk posted:

I used https://pypi.python.org/pypi/xmltodict some time ago and it worked well. Not sure about its performance on large files.

Really, really bad. I was using xmltodict to read 300kb-1.5mb XML files and changing it to use XPath resulted in a speedup of something like 15,000%.

# ? Sep 2, 2014 15:23

BigRedDot: Mar 6, 2008

KernelSlanders posted:

Let's take a simple problem and rather than taking five minutes to solve it, let's figure out if it's a special case of some much larger set of problems that we might possibly (but probably won't) want to solve at some point in the future. Then let's develop a spec for an extensible framework and class library that can be called upon to solve any of those larger classes of problems using unnecessarily cryptic chains of method calls to execute the same functionality as we could have done yesterday when we started all this.

Sorry, but solving a problem that can be solved in 20 lines of code (unless it's running a space shuttle or pacemaker) shouldn't involve opening powerpoint to make a presentation on your proposed "framework."

This is why I hate java.

I hope you don't think I was being serious...

Edit: Well, the docstrings were serious. I do wish more people wrote good docstrings.

Edit2: Although, in truth, it's a balance... I mean, if you've never been in the position of thinking "Goddamn I wish I had generalized this more from the start, what a loving pain in the rear end it's going to be now" then I can only assume you haven't been writing software for very long.

BigRedDot fucked around with this message at 16:02 on Sep 2, 2014

# ? Sep 2, 2014 15:28

Ahz: Jun 17, 2001; PUT MY CART BACK? I'M BETTER THAN THAT AND YOU! WHERE IS MY BUTLER?!

Does anyone have any tips on PDF rendering in Python? I am playing with Report Lab, but it seems slow at about 500ms per page.

I still have to do some optimizing, but if anyone knows any good tips or gotchas, it would help.

# ? Sep 2, 2014 15:56

JHVH-1: Jun 28, 2002

I was doing my scripts using lxml and in some cases I was able to get away with dot notation, but if an element didn't exist you end up with an error so you have to put handling in for it.
I found XML annoying to deal with, but maybe because I was using an older python without all the features the XPath docs have. Anywhere I could get json output instead I always opted for that.

# ? Sep 2, 2014 16:06

David Pratt: Apr 21, 2001

KernelSlanders posted:

This is why I hate java.

Anyone who feels a similar way should read "Execution in the Kingdom of Nouns".

# ? Sep 2, 2014 17:11

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. - Bertrand Russell

BigRedDot posted:

Edit2: Although, in truth, it's a balance... I mean, if you've never been in the position of thinking "Goddamn I wish I had generalized this more from the start, what a loving pain in the rear end it's going to be now" then I can only assume you haven't been writing software for very long.

While I get where you're coming from, YAGNI. I have to tell myself this all the time.

# ? Sep 2, 2014 17:31

Dominoes: Sep 20, 2007

Hed posted:

Take a look at the XPath examples in the Python docs, they should be able to get you going.

Thanks - that worked. Here's the code I ended up using:

Python code:

    aixm = '{http://www.aixm.aero/schema/5.1}'

    tree = ET.parse(os.path.join(DIR_RES, filename))
    root = tree.getroot()

    for child in root:
        ident = child.findall(".//{0}timeSlice//{0}designator".format(aixm))
        ident = ident[0].text

        n = Navaid(ident=ident...)
        n.save()
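
One possible tweak, offered as a sketch rather than a fix: findall(...)[0] raises IndexError on an entry with no designator, whereas findtext takes a default. The helper name designator_of is made up; the path expression is the one from the code above.

```python
import xml.etree.ElementTree as ET

AIXM = "{http://www.aixm.aero/schema/5.1}"


def designator_of(member, default=None):
    """Return the first designator under a timeSlice, or `default` if absent."""
    path = ".//{0}timeSlice//{0}designator".format(AIXM)
    # findtext returns `default` when nothing matches instead of raising.
    return member.findtext(path, default)
```

Inside the loop this would read ident = designator_of(child), with a None check before saving.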

# ? Sep 3, 2014 01:05

EAT THE EGGS RICOLA: May 29, 2008

Haystack posted:

Ok, so that sort of data mashing is not the sort of thing you want to be doing inside of your template (down that path lies php). Get your data into shape, then pass it into the template. In your case, you're basically mashing together two dicts, so here's an outline of what your code might look like.
Python code:
 #If you want things to stay in some sort of order, use ordereddicts instead of plain dicts
value_map = {"unique_id" : "12345", "etc" : "etc"}
human_labels = get_human_labels() #Your API call, deserialized into a {"value_map key":"human label"} dict

#There's shorter ways to do this mapping, but I left it long for readability
for_humans = {}
for key, value in value_map.iteritems():
    human_label = human_labels[key]
    for_humans[human_label] = value

#Now your template only needs to do a simple loop over the finalized data
template.render("my_template.jinja", values=for_humans)
Generally speaking, anything you pass into a template should already be finalized, or at worst have an API that resolves down to single calls.
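
For what it's worth, the "shorter way" mentioned in the comment is presumably a dict comprehension. A Python 3 sketch (so .items() instead of .iteritems()); relabel and the label strings are invented for illustration:

```python
def relabel(value_map, human_labels):
    """Return a new dict keyed by human-readable labels instead of raw keys."""
    return {human_labels[key]: value for key, value in value_map.items()}


value_map = {"unique_id": "12345", "etc": "etc"}
human_labels = {"unique_id": "Unique ID", "etc": "Et cetera"}  # hypothetical labels
for_humans = relabel(value_map, human_labels)
```

A key missing from human_labels raises KeyError here, same as the long-form loop.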

Welp I did this correctly and my boss thinks it's too complicated so I got chastised for doing it this way.

lol

# ? Sep 3, 2014 20:13

Rocko Bonaparte: Mar 12, 2002; Every day is Friday!

I was hoping to look up definitions of words online, and found out about the nltk module. I've tried to install it from pip on Windows and Linux, with Python 2.7.5 and Python 3.3.4, and it completely screws up when I try to import it. It's just dumb grammar stuff like:

code:

Python 3.3.4 (v3.3.4:7ff62415e426, Feb 10 2014, 18:13:51) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python33\lib\site-packages\nltk\__init__.py", line 37
    except IOError, ex:

Is there something else out there that might help? I must have automatic lookups for crazy Scrabble words, dammit! :mad:

# ? Sep 4, 2014 06:42

Alligator: Jun 10, 2009; LOCK AND LOAF

That specific error is because you have a version of nltk intended for python 2 not 3. From what I can tell you'll need to use the current alpha/beta versions for python 3 support, try running pip with the --pre flag so it'll grab pre-releases.
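
For reference, the line the traceback points at uses the comma form of except, which only Python 2 accepts. The "as" spelling below works on Python 2.6+ and 3.x. The function is just a made-up example to show the syntax, not anything from nltk:

```python
def read_first_line(path):
    """Return the first line of a file, or None if it cannot be opened."""
    try:
        with open(path) as handle:
            return handle.readline()
    except IOError as ex:  # Python 2-only spelling: `except IOError, ex:`
        print("could not read {0}: {1}".format(path, ex))
        return None
```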

# ? Sep 4, 2014 08:57

onionradish: Jul 6, 2006; That's spicy.

Rocko Bonaparte posted:

I was hoping to look up definitions of words online, and found out about the nltk module. ... I must have automatic lookups for crazy Scrabble words, dammit!

If your goal is looking up definitions for Scrabble words, NLTK may not be a great fit. It's primarily useful for analyzing common English, so the crazy words that Scrabble considers valid (like "agma" and "za") and now-common words like "hashtag" or "selfie" aren't in the list. NLTK uses an offline version of WordNet -- you can test out some words yourself to see whether it includes the kinds of words you'd be looking for.

Based on your goal, using the Requests library to send a web query and parse the response from an online dictionary (or finding an API) would likely give you better results.

# ? Sep 4, 2014 13:53

Rocko Bonaparte: Mar 12, 2002; Every day is Friday!

Heh--I figured I was just bitching on the Internet. I didn't actually expect replies!

Alligator posted:

That specific error is because you have a version of nltk intended for python 2 not 3. From what I can tell you'll need to use the current alpha/beta versions for python 3 support, try running pip with the --pre flag so it'll grab pre-releases.

When I try in Python2 it hates me too:

code:

Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk.corpus import wordnet
  File "<stdin>", line 1
    import nltk.corpus import wordnet
                            ^
SyntaxError: invalid syntax

onionradish posted:

Based on your goal, using the Requests library to send a web query and parse the response from an online dictionary (or finding an API) would likely give you better results.

This is probably what I should be doing. It really hadn't occurred to me to just hit up the REST API of some common dictionary site. That's probably why there isn't much about this online.

# ? Sep 4, 2014 16:16

KICK BAMA KICK: Mar 2, 2009

Rocko Bonaparte posted:

Python code:

import nltk.corpus import wordnet

Just a typo:

Python code:

from nltk.corpus import wordnet

# ? Sep 4, 2014 16:20

Tacos Al Pastor: Jun 20, 2003

Got another question for you guys. My program is about 99% complete but I'm a little confused on one part. I'm trying to sort some key values from a dictionary and the code I have is just not working.
All the program does is read text from a file, strip out any punctuation and then return the count for words that are one to 16 characters in length.

Python code:

# Purpose: Call the program with an argument, it should treat the argument as a filename, and process the content of the file.
# The program reads all the input, splitting it up into words, and computes the length of each word. Punctuation marks should not 
# be included as a part of the word, so "it's" should be counted as a three-character word, and "final." should be counted as a 
# five-character word. The example text includes a "word" of zero length (the "&"); your solution should handle this. When all input 
# has been processed ,the program should print a table showing the word count for each of the word lengths that has been encountered. 
# Your tutor will run your code against several standardized inputs to verify the correctness of your logic.  

import sys
import operator
import string
from sys import argv
from collections import Counter
_dict = {'1': 0, '2': 0, '3': 0, '4': 0, '5': 0, '6': 0, '7': 0, '8': 0, '9': 0, '10': 0, '11': 0, '12': 0, '13': 0, '14': 0, '15': 0, '16': 0}


def switch(x):
    return {
        1: '1',
        2: '2',
        3: '3',
        4: '4',
        5: '5',
        6: '6',
        7: '7',
        8: '8',
        9: '9',
        10: '10',
        11: '11',
        12: '12',
        13: '13',
        14: '14',
        15: '15',
        16: '16',
    }[x]

def main(argv):
    file_name = sys.argv[1]
    with open(file_name) as fp:
        while True:
            contents = fp.read()
            if not contents: break
            else: 
                contents = ''.join(c for c in contents if c not in ',".?;:&')    #Strip all the punctuation out
                contents = contents.split()                                      #Split contents into a list
                for word in contents:                                            #For ever word thats in the list 
                    cnt = 0                                                      #set the count to 0
                    for letter in word:                                          #for each letter in the word
                        cnt += 1                                                 #increment the count
                    key = switch(cnt)                                            #make the key value the number string in switch
                    _dict[key] += 1                                              #increment number count for each word count in the dict
    print("Length Count")
    for key in sorted(_dict):                                                    #attempted to sort the dict by key value, but apparently this doesnt work(?) 
        print(key, "    ", _dict[key])                                           #print the key, _dict values

if __name__ == "__main__":
    main(sys.argv[1:])

This particular line:

quote:

for key in sorted(_dict):
    print(key, "    ", _dict[key])

won't work. I'm getting the following output for my text file, which is the Declaration of Independence. Nothing is sorted, but the counts are correct:

Length Count
1 16
10 56
11 34
12 14
13 9
14 7
15 2
16 0
2 267
3 267
4 169
5 140
6 112
7 98
8 69
9 61

Any idea what I am doing wrong?

# ? Sep 5, 2014 00:41

EAT THE EGGS RICOLA: May 29, 2008

You're sorting those as strings, not as ints. It's sorting correctly for strings. Change the keys in your dict to integers.
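
A quick illustration of the difference, using a few of the counts from the output above. Passing key=int to sorted is an alternative to changing the keys, in case you want to keep them as strings:

```python
counts = {"1": 16, "10": 56, "2": 267, "9": 61}

as_strings = sorted(counts)            # lexicographic: "10" sorts before "2"
as_numbers = sorted(counts, key=int)   # numeric order without changing the keys
```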

There's a bunch of other stuff you're doing weirdly but I'm on my phone.

# ? Sep 5, 2014 00:47

Dren: Jan 5, 2001; Pillbug

It sorted correctly. Your keys are strings.

# ? Sep 5, 2014 00:47

Tacos Al Pastor: Jun 20, 2003

Dren posted:

It sorted correctly. Your keys are strings.

wow. my god. I cant believe Im that dumb. Thanks!

# ? Sep 5, 2014 00:59

yippee cahier: Mar 28, 2005

I know it's your homework, so hopefully these offer you some things to try out instead of giving away answers.

You should look up defaultdict (https://docs.python.org/3.4/library/collections.html#collections.defaultdict) as a replacement for your _dict object. Pretty useful to know about some cool stuff lurking in the standard library for further down the road. Related: you'll never see a 17 letter long word?
Consider if there are any easier ways to determine the length of a sequence without walking through each item and tallying them using a for loop.
Consider your choice of a dict to hold the tallies. Your keys are sequential integers, you look up a string representation of them using a function called "switch", then look up the value in a dict associated with the string. There's a much simpler way to accomplish this. Also don't name your things "_dict" or "switch".
You never use a Counter or the operator module you import.
Look up the docs for the .read() method on your file object as your code is written expecting different behaviour.
Will you ever see different punctuation than what you have listed? The example given for "it's" will fail. A whitelist of alpha characters to keep would be more robust. The library has something useful for that: https://docs.python.org/3.4/library/string.html#string-constants or isalpha().
It's interesting you're using a generator statement to conditionally filter out punctuation. If you've got a handle on it, see if employing it elsewhere would simplify what you're doing.

# ? Sep 5, 2014 06:46

salisbury shake: Dec 27, 2011

salisbury shake posted:

Anyone have success with bs4 and multiprocessing?
Python code:
from multiprocessing import Pool
from bs4 import BeautifulSoup as BS
import requests

url = 'http://yahoo.com'

with Pool(4) as pool:
    pgs = (requests.get(url).content for _ in range(8))
    wrapped_pgs = pool.map(BS, pgs)
shits itself with a max recursion limit reached RuntimeError, but replace the URL with say google.com and it works just fine. Replace it with the forum URL and get a different RuntimeError. Wrapping content serially works just fine for all urls involved.

If I break out lxml will I run into the same problem? Changing the bs4 backend to lxml still produces errors.

In case anyone cares: this happens because bs4.element.Tag objects cannot be pickled when sent to a Pool's workers because of the overridden __getattr__ method.

Wrote an adapter that translates bs4 api calls and attribute lookups into the appropriate XPath strings and lxml method calls, and didn't have to rewrite anything in the parser.

Got a 10x speed up in initially wrapping str/bytes content along with a 10x speed up when doing tag queries and attribute lookups. No need to use the multiprocessing module at all.

# ? Sep 5, 2014 19:33

Tacos Al Pastor: Jun 20, 2003

sund posted:

I know it's your homework, so hopefully these offer you some things to try out instead of giving away answers.

You should look up defaultdict (https://docs.python.org/3.4/library/collections.html#collections.defaultdict) as a replacement for your _dict object. Pretty useful to know about some cool stuff lurking in the standard library for further down the road. Related: you'll never see a 17 letter long word?

I did consider this but I just wanted to get the basics working until I could handle this case. Ill take a look into defaultdict.

quote:

Consider if there are any easier ways to determine the length of a sequence without walking through each item and tallying them using a for loop.

I know that some of this can be accomplished using "If Not In" etc types of statements. Beyond that Im not sure.

quote:

Consider your choice of a dict to hold the tallies. Your keys are sequential integers, you look up a string representation of them using a function called "switch", then look up the value in a dict associated with the string. There's a much simpler way to accomplish this. Also don't name your things "_dict" or "switch".

Yeah the naming could be a bit better and the switch statement I felt was a bit unnecessary if there was a way to access dict[key] values without having to go that route. I just havent thought about it enough.

quote:

You never use a Counter or the operator module you import.

This needs to be removed. I was going to use that as a tally to count up the collection of the number of characters for each word in the text.

quote:

Look up the docs for the .read() method on your file object as your code is written expecting different behaviour.

Ok Ill check this out.

quote:

Will you ever see different punctuation than what you have listed? The example given for "it's" will fail. A whitelist of alpha characters to keep would be more robust. The library has something useful for that: https://docs.python.org/3.4/library/string.html#string-constants or isalpha().

I've used isalpha() before and I dont know why I didnt think of this earlier. Thanks.

quote:

It's interesting you're using a generator statement to conditionally filter out punctuation. If you've got a handle on it, see if employing it elsewhere would simplify what you're doing.

Can you give me a hint where it might else be used?

Thanks again for all the pointers!

# ? Sep 5, 2014 21:12

ShadowHawk: Jun 25, 2000; CERTIFIED PRE OWNED TESLA OWNER

spiralbrain posted:

I did consider this but I just wanted to get the basics working until I could handle this case. Ill take a look into defaultdict.

They're simpler to use than you might think. That often happens in Python.

quote:

I know that some of this can be accomplished using "If Not In" etc types of statements. Beyond that Im not sure.

No, much simpler. A built-in, non-library function.

quote:

Yeah the naming could be a bit better and the switch statement I felt was a bit unnecessary if there was a way to access dict[key] values without having to go that route. I just havent thought about it enough.

Again, another built-in function. This one I'll just give you: int(foo) and str(bar)

quote:

This needs to be removed. I was going to use that as a tally to count up the collection of the number of characters for each word in the text.

If you can figure out how to use a Counter, you should probably do that as it's exactly what you're doing here.

quote:

Ok Ill check this out.

Can you give me a hint where it might else be used?

Your hint is these two are related.
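
Spoiler for anyone reading along who isn't doing the homework: one place the hints above could lead is sketched below, in Python 3. This is only one reading of the assignment (keep alphabetic characters, count word lengths with a Counter), and the function names are made up.

```python
from collections import Counter


def word_length_counts(text):
    """Count words by length, ignoring any non-alphabetic characters."""
    counts = Counter()
    for raw_word in text.split():
        # Keep letters only, so "it's" becomes "its" and "&" becomes "".
        word = "".join(ch for ch in raw_word if ch.isalpha())
        counts[len(word)] += 1
    return counts


def print_table(counts):
    """Print the tallies in ascending order of word length."""
    print("Length Count")
    for length in sorted(counts):  # integer keys sort numerically
        print(length, counts[length])
```

Because the keys are integers from len(), there is no lookup table to maintain and no upper limit on word length.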

# ? Sep 5, 2014 23:30

BeefofAges: Jun 5, 2004; Cry 'Havoc!', and let slip the cows of war.

salisbury shake posted:

In case anyone cares: this happens because bs4.element.Tag objects cannot be pickled when sent to a Pool's workers because of the overridden __getattr__ method.

Wrote an adapter that translates bs4 api calls and attribute lookups into the appropriate XPath strings and lxml method calls, and didn't have to rewrite anything in the parser.

Got a 10x speed up in initially wrapping str/bytes content along with a 10x speed up when doing tag queries and attribute lookups. No need to use the multiprocessing module at all.

You could also try using ThreadPool, since I'm guessing you're at least partially I/O bound, not CPU bound. ThreadPool has the same interface as Pool, except that it creates threads instead of processes. It's undocumented, for whatever reason.
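
A minimal sketch of what that might look like in Python 3, importing ThreadPool from multiprocessing.pool. The helper name map_in_threads is made up, and len stands in for whatever per-page work you'd really do:

```python
from multiprocessing.pool import ThreadPool


def map_in_threads(func, items, workers=4):
    """Apply func to each item using a pool of threads; results keep input order."""
    with ThreadPool(workers) as pool:
        return pool.map(func, items)


lengths = map_in_threads(len, ["a", "bb", "ccc"])
```

Since threads share memory, nothing has to be pickled to reach the workers, which sidesteps the bs4 problem described in the quote.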

# ? Sep 6, 2014 00:20

Hed: Mar 31, 2004; Fun Shoe

Am I the only one who stopped using defaultdicts? I think there was like one time a few years ago it was easier for me than just using get and setdefault on a normal dict. Ever since then the collections one has sorta fallen out of favor for me.
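
For anyone who hasn't seen the three side by side, here's a small Python 3 comparison on made-up sample words. All of them end up with the same data; it's mostly a question of taste:

```python
from collections import defaultdict

words = ["za", "agma", "qi", "selfie"]

# Plain dict: setdefault creates the list the first time a key is seen.
by_length_plain = {}
for word in words:
    by_length_plain.setdefault(len(word), []).append(word)

# defaultdict: the missing-key case is handled by the list factory.
by_length_default = defaultdict(list)
for word in words:
    by_length_default[len(word)].append(word)

# Plain dict with get, for simple tallies.
tally = {}
for word in words:
    tally[len(word)] = tally.get(len(word), 0) + 1
```

One behavioral difference worth knowing: merely reading a missing key from a defaultdict inserts it, while get on a plain dict does not.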

# ? Sep 6, 2014 01:33

HonorableTB: Dec 22, 2006

How would you all rate Code Academy's Python course? I'm about 30% through it and I feel like I'm learning it, but at the same time I want to make sure that I'm doing things correctly. I have little programming experience and I want to learn Python easily. The resources in the OP are alright, but I feel like they are very out of date. What would you guys recommend nowadays for Python tutorials?

# ? Sep 6, 2014 21:12

Thermopyle: Jul 1, 2003; ...the stupid are cocksure while the intelligent are full of doubt. �Bertrand Russell

HonorableTB posted:

How would you all rate Code Academy's Python course? I'm about 30% through it and I feel like I'm learning it, but at the same time I want to make sure that I'm doing things correctly. I have little programming experience and I want to learn Python easily. The resources in the OP are alright, but I feel like they are very out of date. What would you guys recommend nowadays for Python tutorials?

This reminded me that I started a new OP months ago and then forgot...

I'll try to get it up this evening or tomorrow.

In the meantime check out Learn Python the Hard Way and Think Python.

# ? Sep 6, 2014 22:00

null gallagher: Jan 1, 2014

What matters is if you feel you're learning from it. I've seen a lot of people say they want to learn programming but never get anywhere because they get hung up on deciding what language or tutorial to use.

Honestly, from what I've seen of the Code Academy type sites, they've all felt a little too hand-holdy and don't explain what's going on. I tried one of them a while back (I think it was Ruby), and it felt like the tutorial maker was telling me "just type this in, don't worry about what it means or how you'd use it in another situation." Then again, I didn't get too far.

I learned Python from Think Python, and I liked it a lot. I went through it after I'd graduated with my CS/math degree, but the author uses it for his intro CS classes.

# ? Sep 6, 2014 22:02

Dominoes: Sep 20, 2007

HonorableTB posted:

How would you all rate Code Academy's Python course? I'm about 30% through it and I feel like I'm learning it, but at the same time I want to make sure that I'm doing things correctly. I have little programming experience and I want to learn Python easily. The resources in the OP are alright, but I feel like they are very out of date. What would you guys recommend nowadays for Python tutorials?

I learned Python from it. The interactivity helped. I'd recommend it.

# ? Sep 7, 2014 01:01

Pudgygiant: Apr 8, 2004; Garnet and black? More like gold and blue or whatever the fuck colors these are

code:

>>> x = (1, 2, 3)
>>> x[0]
1
>>> x[0 - 1]
3
>>> x[0 - 2]
2

Buh?

# ? Sep 7, 2014 06:21

Telarra: Oct 9, 2012

Python code:

>>> x = (1, 2, 3)
>>> x[0]
1
>>> x[-1]
3
>>> x[-2]
2
>>> x[0 : 1]
(1,)
>>> x[0 : 2]
(1, 2)

# ? Sep 7, 2014 06:24

Pudgygiant: Apr 8, 2004; Garnet and black? More like gold and blue or whatever the fuck colors these are

Dammit, I knew that. I wasn't trying to slice, I was trying to subtract from the result, and it caught me off guard.
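
Spelled out for anyone else who trips on this: the expression inside the brackets is evaluated first, so the subtraction changes the index, not the value.

```python
x = (1, 2, 3)

index_shifted = x[0 - 1]   # the subtraction happens first, so this is x[-1]
value_shifted = x[0] - 1   # index first, then subtract from the element
```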

# ? Sep 7, 2014 06:24

QuarkJets: Sep 8, 2008

# ? Sep 7, 2014 06:25
