  • Locked thread
lotor9530
Apr 29, 2009
What is the best way to replace multiple variables in a template file with arguments from the command line?

Here's what I'm currently doing:
code:
# parse command line

argv = sys.argv
if len(argv) != 5:
  print "Syntax: file.py foo.txt variable1 variable2 variable3"
  sys.exit()

infile = sys.argv[1]
var1 = float(sys.argv[2])
var2 = int(sys.argv[3])
var3 = int(sys.argv[4])

me = 0

# Write variable2 and variable1 to foo.txt_0
# Create the new config file for writing
config = io.open(sys.argv[1] + '_0', 'w')

# Read the lines from the template, substitute the values, and write to the new config file
for line in io.open('foo.txttemplate', 'r'):
    line = line.replace('$variable1', sys.argv[2])
    config.write(line)

for line in io.open('in.lj17template', 'r'):
    line = line.replace('$variable2', sys.argv[3])
    config.write(line)

# Close the files
config.close()
print("Done, exiting.")
Running this (file.py foo.txt 100 200 300) gives me a file output of:
code:
This is my file that I want to replace 100 and variable2 in.
Blah blah blah.
This is my file that I want to replace variable1 and 200 in.
Blah blah blah.
I tried this, too:
code:
for line in io.open('foo.txttemplate', 'r'):
    line = line.replace('$variable1', sys.argv[2])
    config.write(line)
    line = line.replace('$variable2', sys.argv[3])
    config.write(line)
Which gives:
code:
This is my file that I want to replace 100 and variable2 in.
This is my file that I want to replace variable1 and 200 in.
Blah blah blah.
Blah blah blah.

supercrooky
Sep 12, 2006
You're trying to make
code:
This is my file that I want to replace variable1 and variable2 in.
Blah blah blah.
into

code:
This is my file that I want to replace 100 and 200 in.
Blah blah blah.
right?

code:
for line in open(template, 'r'):
    config.write(line.replace('variable1', var1).replace('variable2', var2))
will do what you are trying to do. As for an actual strategy for managing config files, you should look into ConfigParser https://docs.python.org/2/library/configparser.html

SurgicalOntologist
Jun 17, 2004

What you tried first is going through the whole file and doing one replacement while writing each line, then going through the whole file again doing the next replacement while writing each line.

Your second try is doing one replacement, then writing the line, then another replacement, then writing the line again, etc.

(At least I think, the fact that you have two different filenames suggests maybe some duplication is intended? It's not completely clear what you're trying to do specifically)

What you should be doing (if I understand your intention), as supercrooky suggested, is for each line, first do all the replacements, and only then write each line. There shouldn't be any need to loop over the file twice, or to do two writes on each iteration.

As a side note, you don't need the io module, the builtin open function is fine. You should also be using the with statement rather than manually closing:
Python code:
with open(sys.argv[1] + '_0', 'w') as config:
    with open('foo.txttemplate', 'r') as template:
        for line in template:
            # do stuff here
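
A minimal sketch of that single-pass approach, assuming the template uses `$variable1` and `$variable2` placeholders as in the original post. The `fill_template` helper and the throwaway file names are hypothetical, and values go through `str()` so numeric arguments can be passed to `replace`:

```python
import os
import tempfile


def fill_template(template_path, output_path, replacements):
    """Copy template_path to output_path, applying every replacement to each line."""
    with open(template_path, 'r') as template, open(output_path, 'w') as config:
        for line in template:
            # Do all the replacements first, then write the line exactly once.
            for placeholder, value in replacements.items():
                line = line.replace(placeholder, str(value))
            config.write(line)


# Small demonstration in a throwaway directory (hypothetical file names).
workdir = tempfile.mkdtemp()
template_path = os.path.join(workdir, 'foo.txttemplate')
output_path = os.path.join(workdir, 'foo.txt_0')

with open(template_path, 'w') as f:
    f.write('This is my file that I want to replace $variable1 and $variable2 in.\n')
    f.write('Blah blah blah.\n')

fill_template(template_path, output_path, {'$variable1': 100.0, '$variable2': 200})

with open(output_path) as f:
    result = f.read()
print(result)
```

Each line is read once and written once, which avoids both the duplicated file and the duplicated lines from the two earlier attempts.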

Nippashish
Nov 2, 2005

Let me see you dance!
You should use a template engine like jinja2 because they are designed specifically to solve this type of problem.
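
jinja2 is a third-party package. For plain `$name` placeholders like the ones in the original post, the standard library's `string.Template` may already be enough. This is a sketch of that stdlib alternative, not a jinja2 example, and the template text is copied from the post for illustration:

```python
from string import Template

template_text = (
    'This is my file that I want to replace $variable1 and $variable2 in.\n'
    'Blah blah blah.\n'
)

# substitute() raises KeyError if a placeholder has no value;
# safe_substitute() would leave unknown placeholders untouched instead.
rendered = Template(template_text).substitute(variable1=100, variable2=200)
print(rendered)
```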

lotor9530
Apr 29, 2009

supercrooky posted:

code:
for line in open(template, 'r'):
    config.write(line.replace('variable1', var1).replace('variable2', var2))
will do what you are trying to do. As for an actual strategy for managing config files, you should look into ConfigParser https://docs.python.org/2/library/configparser.html

Thanks! This works, except I had to change var1 and var2 to unicode(var1) and unicode(var2).

Dominoes
Sep 20, 2007

Looking for help parsing XML files. I have long files - I want to pull some of the information into database entries.

Example XML:
XML code:
 <faa:Member>
    <aixm:Navaid gml:id="NAVAID_0000001">
      <aixm:timeSlice>
        <aixm:NavaidTimeSlice gml:id="NAVAID_TS_0000001">
          <gml:validTime>
            <gml:TimePeriod gml:id="NAVAID_TIME_PERIOD_0000001">
              <gml:beginPosition>2014-07-24T00:00:00.000-04:00</gml:beginPosition>
              <gml:endPosition indeterminatePosition="unknown"/>
            </gml:TimePeriod>
          </gml:validTime>
          <aixm:interpretation>BASELINE</aixm:interpretation>
          <aixm:designator>NUD</aixm:designator>
          <aixm:name>ADAK</aixm:name>
Say I want to store the 'name' to a database. Based on the official docs, it looks like I'd use this:

Python code:
tree = ET.parse('filename.xml')
root = tree.getroot()

for child in root:
    name = child[0][0][0][3].text
# ...
This is a delicate solution, as any variance in the entry structure will (and does) cause it to select the wrong element. If this was JSON, here's how I'd parse it:

Python code:
name = root['Member']['Navaid']['NavaidTimeSlice']['name']
Can I do something similar with XML?

Dominoes fucked around with this message at 02:00 on Sep 2, 2014

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

BigRedDot posted:

Oh, I do too, hence the scare quotes. :) Let's go another direction and make the ultra-flexible enterprise scorer that can perform arbitrary accumulations and transforms on the input with configurable decoders:

code:
''' The score.py module provides functions for generalized word 
scoring, as well as specialized scorers for particular rulesets.

'''

def score_word(word, decode, transform=lambda x: x, accum=sum):
    ''' Score an input word according to given decode, transform, and 
    accumulation policies.

    Args:
        word (str): 
            a word to score
        decode (callable): 
            callable taking one letter as input that maps the letter to 
            its score
        transform(callable, optional) : 
            callable that performs any necessary transformation on each 
            letter before decoding (default: identity)
        accum (callable, optional): 
            callable that reduces a sequence of letter scores into a
            final score (default: sum)

    Returns:
        float : score

    Examples:

    >>> score_word("za", decode=lambda x: 2)
    4
    >>>

    '''
    return accum(decode(letter) for letter in transform(word))

#
# Scrabble (tm) specific scoring functions
# 

_SCRABBLE_SCORE_BY_LETTER = {
    "c": 3,
    "b": 3,
    "d": 2,
    "g": 2,
    "f": 4,
    "h": 4,
    "k": 5,
    "j": 8,
    "m": 3, 
    "q": 10,
    "p": 3,
    "w": 4,
    "v": 4,
    "y": 4,
    "x": 8,
    "z": 10,
}


def scrabble_word(word):
    ''' Score a word according to Scrabble (tm) rules.

    Args:
        word (str) : 
            a word to score

    Returns:
        float : score

    Examples:

    >>> scrabble_word("za")
    11
    >>>

    '''
    return score_word(
        word, 
        decode=lambda x: _SCRABBLE_SCORE_BY_LETTER.get(x, 1), 
        transform=lambda x: x.lower()
    )

if __name__ == '__main__':
    import doctest
    doctest.testmod()
Sorry, it's Labor Day, and I'm, er... uh, working? I leave it as an exercise to the reader to extend this and define an accumulator to handle scoring (double, triple)-(letter, word) modifiers. Don't forget to update the doctests, and re-run Sphinx to generate new API documentation (you'll need sphinx-napoleon to handle the Google style docstrings). Pull requests welcome!

Edit: Unicode support is an open issue; see the GH tracker.

Let's take a simple problem and rather than taking five minutes to solve it, let's figure out if it's a special case of some much larger set of problems that we might possibly (but probably won't) want to solve at some point in the future. Then let's develop a spec for an extensible framework and class library that can be called upon to solve any of those larger classes of problems using unnecessarily cryptic chains of method calls to execute the same functionality as we could have done yesterday when we started all this.

Sorry, but solving a problem that can be solved in 20 lines of code (unless it's running a space shuttle or pacemaker) shouldn't involve opening powerpoint to make a presentation on your proposed "framework."

This is why I hate java.

Hed
Mar 31, 2004

Fun Shoe

Dominoes posted:

Looking for help parsing XML files. I have long files - I want to pull some of the information into database entries.

Example XML:
XML code:
 <faa:Member>
    <aixm:Navaid gml:id="NAVAID_0000001">
      <aixm:timeSlice>
        <aixm:NavaidTimeSlice gml:id="NAVAID_TS_0000001">
          <gml:validTime>
            <gml:TimePeriod gml:id="NAVAID_TIME_PERIOD_0000001">
              <gml:beginPosition>2014-07-24T00:00:00.000-04:00</gml:beginPosition>
              <gml:endPosition indeterminatePosition="unknown"/>
            </gml:TimePeriod>
          </gml:validTime>
          <aixm:interpretation>BASELINE</aixm:interpretation>
          <aixm:designator>NUD</aixm:designator>
          <aixm:name>ADAK</aixm:name>
Say I want to store the 'name' to a database. Based on the official docs, it looks like I'd use this:

Python code:
tree = ET.parse('filename.xml')
root = tree.getroot()

for child in root:
    name = child[0][0][0][3].text
# ...
This is a delicate solution, as any variance in the entry structure will (and does) cause it to select the wrong element. If this was JSON, here's how I'd parse it:

Python code:
name = root['Member']['Navaid']['NavaidTimeSlice']['name']
Can I do something similar with XML?

Take a look at the XPath examples in the Python docs, they should be able to get you going.

Space Kablooey
May 6, 2009


Dominoes posted:

Looking for help parsing XML files. I have long files - I want to pull some of the information into database entries.
*snip*

I used https://pypi.python.org/pypi/xmltodict some time ago and it worked well. Not sure about its performance on large files.

EDIT:

Murodese posted:

Really, really bad. I was using xmltodict to read 300kb-1.5mb XML files and changing it to use XPath resulted in a speedup of something like 15,000%.

Ouch

Space Kablooey fucked around with this message at 15:31 on Sep 2, 2014

Murodese
Mar 6, 2007

Think you've got what it takes?
We're looking for fine Men & Women to help Protect the Australian Way of Life.

Become part of the Legend. Defence Jobs.

HardDisk posted:

I used https://pypi.python.org/pypi/xmltodict some time ago and it worked well. Not sure about its performance on large files.

Really, really bad. I was using xmltodict to read 300kb-1.5mb XML files and changing it to use XPath resulted in a speedup of something like 15,000%.

BigRedDot
Mar 6, 2008

KernelSlanders posted:

Let's take a simple problem and rather than taking five minutes to solve it, let's figure out if it's a special case of some much larger set of problems that we might possibly (but probably won't) want to solve at some point in the future. Then let's develop a spec for an extensible framework and class library that can be called upon to solve any of those larger classes of problems using unnecessarily cryptic chains of method calls to execute the same functionality as we could have done yesterday when we started all this.

Sorry, but solving a problem that can be solved in 20 lines of code (unless it's running a space shuttle or pacemaker) shouldn't involve opening powerpoint to make a presentation on your proposed "framework."

This is why I hate java.

I hope you don't think I was being serious...

Edit: Well, the docstrings were serious. I do wish more people wrote good docstrings.

Edit2: Although, in truth, it's a balance... I mean, if you've never been in the position of thinking "Goddamn I wish I had generalized this more from the start, what a loving pain in the rear end it's going to be now" then I can only assume you haven't been writing software for very long. :)

BigRedDot fucked around with this message at 16:02 on Sep 2, 2014

Ahz
Jun 17, 2001
PUT MY CART BACK? I'M BETTER THAN THAT AND YOU! WHERE IS MY BUTLER?!
Does anyone have any tips on PDF rendering in Python? I am playing with Report Lab, but it seems slow at about 500ms per page.

I still have to do some optimizing, but if anyone knows any good tips or gotchas, it would help.

JHVH-1
Jun 28, 2002
I was doing my scripts using lxml and in some cases I was able to get away with dot notation, but if an element didn't exist you end up with an error so you have to put handling in for it.
I found XML annoying to deal with, but maybe because I was using an older python without all the features the XPath docs have. Anywhere I could get json output instead I always opted for that.

David Pratt
Apr 21, 2001

KernelSlanders posted:

This is why I hate java.

Anyone who feels a similar way should read execution in the kingdom of nouns.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

BigRedDot posted:

Edit2: Although, in truth, it's a balance... I mean, if you've never been in the position of thinking "Goddamn I wish I had generalized this more from the start, what a loving pain in the rear end it's going to be now" then I can only assume you haven't been writing software for very long. :)

While I get where you're coming from, YAGNI. I have to tell myself this all the time.

Dominoes
Sep 20, 2007

Hed posted:

Take a look at the XPath examples in the Python docs, they should be able to get you going.
Thanks - that worked. Here's the code I ended up using:

Python code:
    aixm = '{http://www.aixm.aero/schema/5.1}'

    tree = ET.parse(os.path.join(DIR_RES, filename))
    root = tree.getroot()

    for child in root:
        ident = child.findall(".//{0}timeSlice//{0}designator".format(aixm))
        ident = ident[0].text

        n = Navaid(ident=ident...)
        n.save()
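
A possible variation on the same idea, sketched under assumptions: `find`/`findall` also accept a prefix-to-URI mapping, which avoids formatting the `{uri}` string into every path. The `aixm` URI is the one from the code above. The `faa` and `gml` URIs and the `SubscriberFile` root element are placeholders invented so the sample parses on its own:

```python
import xml.etree.ElementTree as ET

# Only the aixm URI comes from the post; the other two are made-up placeholders.
NS = {
    'aixm': 'http://www.aixm.aero/schema/5.1',
    'faa': 'http://example.com/faa',
    'gml': 'http://example.com/gml',
}

SAMPLE = """
<faa:SubscriberFile xmlns:faa="http://example.com/faa"
                    xmlns:aixm="http://www.aixm.aero/schema/5.1"
                    xmlns:gml="http://example.com/gml">
  <faa:Member>
    <aixm:Navaid gml:id="NAVAID_0000001">
      <aixm:timeSlice>
        <aixm:NavaidTimeSlice gml:id="NAVAID_TS_0000001">
          <aixm:designator>NUD</aixm:designator>
          <aixm:name>ADAK</aixm:name>
        </aixm:NavaidTimeSlice>
      </aixm:timeSlice>
    </aixm:Navaid>
  </faa:Member>
</faa:SubscriberFile>
"""

root = ET.fromstring(SAMPLE)

navaids = []
for member in root.findall('faa:Member', NS):
    # find() returns None when the element is missing, so check before .text.
    designator = member.find('.//aixm:timeSlice//aixm:designator', NS)
    name = member.find('.//aixm:timeSlice//aixm:name', NS)
    navaids.append((
        designator.text if designator is not None else None,
        name.text if name is not None else None,
    ))

print(navaids)
```

Looking elements up by name rather than by position means an extra or missing sibling no longer shifts the result to the wrong element.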

EAT THE EGGS RICOLA
May 29, 2008

Haystack posted:

Ok, so that sort of data mashing is not the sort of thing you want to be doing inside of your template (down that path lies php). Get your data into shape, then pass it into the template. In your case, you're basically mashing together two dicts, so here's an outline of what your code might look like.

Python code:
 #If you want things to stay in some sort of order, use ordereddicts instead of plain dicts
value_map = {"unique_id" : "12345", "etc" : "etc"}
human_labels = get_human_labels() #Your API call, deserialized into a {"value_map key":"human label"} dict

#There's shorter ways to do this mapping, but I left it long for readability
for_humans = {}
for key, value in value_map.iteritems():
    human_label = human_labels[key]
    for_humans[human_label] = value

#Now your template only needs to do a simple loop over the finalized data
template.render("my_template.jinja", values=for_humans)
Generally speaking, anything you pass into a template should already be finalized, or at worst have an API that resolves down to single calls.

Welp I did this correctly and my boss thinks it's too complicated so I got chastised for doing it this way.

lol
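
For what it's worth, the "shorter way" that the quoted outline alludes to is probably a dict comprehension. A small sketch in Python 3 syntax (`items()` rather than `iteritems()`), where the `human_labels` dict is a made-up stand-in for the deserialized API response:

```python
value_map = {"unique_id": "12345", "etc": "etc"}

# Hypothetical stand-in for the deserialized API response in the quoted outline.
human_labels = {"unique_id": "Unique ID", "etc": "Et cetera"}

# Same mapping as the explicit loop: human-readable label -> value.
for_humans = {human_labels[key]: value for key, value in value_map.items()}

print(for_humans)
```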

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!
I was hoping to look up definitions of words online, and found out about the nltk module. I've tried to install it from pip on Windows and Linux, with Python 2.7.5 and Python 3.3.4, and it completely screws up when I try to import it. It's just dumb grammar stuff like:

code:
Python 3.3.4 (v3.3.4:7ff62415e426, Feb 10 2014, 18:13:51) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python33\lib\site-packages\nltk\__init__.py", line 37
    except IOError, ex:
Is there something else out there that might help. I must have automatic lookups for crazy Scrabble words, dammit! :mad:

Alligator
Jun 10, 2009

LOCK AND LOAF
That specific error is because you have a version of nltk intended for python 2 not 3. From what I can tell you'll need to use the current alpha/beta versions for python 3 support, try running pip with the --pre flag so it'll grab pre-releases.
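
For reference, the line in the traceback uses the Python 2-only `except IOError, ex:` spelling, which Python 3 rejects as a syntax error. The `as` form works on Python 2.6+ and on Python 3. A small sketch, where the `read_or_none` helper and the path are made up for illustration:

```python
def read_or_none(path):
    """Return the file's contents, or None if it cannot be opened."""
    try:
        with open(path) as f:
            return f.read()
    except IOError as ex:  # Python 2-only spelling: "except IOError, ex:"
        print("could not read {0}: {1}".format(path, ex))
        return None


missing = read_or_none('/no/such/file/for/this/example.txt')
```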

onionradish
Jul 6, 2006

That's spicy.

Rocko Bonaparte posted:

I was hoping to look up definitions of words online, and found out about the nltk module. ... I must have automatic lookups for crazy Scrabble words, dammit!
If your goal is looking up definitions for Scrabble words, NLTK may not be a great fit. It's primarily useful for analyzing common English, so the crazy words that Scrabble considers valid (like "agma" and "za") and now-common words like "hashtag" or "selfie" aren't in the list. NLTK uses an offline version of WordNet -- you can test out some words yourself to see whether it includes the kinds of words you'd be looking for.

Based on your goal, using the Requests library to send a web query and parse the response from an online dictionary (or finding an API) would likely give you better results.

Rocko Bonaparte
Mar 12, 2002

Every day is Friday!
Heh--I figured I was just bitching on the Internet. I didn't actually expect replies!

Alligator posted:

That specific error is because you have a version of nltk intended for python 2 not 3. From what I can tell you'll need to use the current alpha/beta versions for python 3 support, try running pip with the --pre flag so it'll grab pre-releases.

When I try in Python2 it hates me too:
code:
Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk.corpus import wordnet
  File "<stdin>", line 1
    import nltk.corpus import wordnet
                            ^
SyntaxError: invalid syntax

onionradish posted:

Based on your goal, using the Requests library to send a web query and parse the response from an online dictionary (or finding an API) would likely give you better results.

This is probably what I should be doing. It really hadn't occurred to me to just hit up the REST API of some common dictionary site. That's probably why there isn't much about this online.

KICK BAMA KICK
Mar 2, 2009

Rocko Bonaparte posted:

Python code:
import nltk.corpus import wordnet
Just a typo:
Python code:
from nltk.corpus import wordnet

Tacos Al Pastor
Jun 20, 2003

Got another question for you guys. My program is about 99% complete but I'm a little confused on one part. I'm trying to sort some key values from a dictionary and the code I have is just not working.
All the program does is read text from a file, strip out any punctuation and then return the count for words that are one to 16 characters in length.

Python code:
# Purpose: Call the program with an argument, it should treat the argument as a filename, and process the content of the file.
# The program reads all the input, splitting it up into words, and computes the length of each word. Punctuation marks should not 
# be included as a part of the word, so "it's" should be counted as a three-character word, and "final." should be counted as a 
# five-character word. The example text includes a "word" of zero length (the "&"); your solution should handle this. When all input 
# has been processed ,the program should print a table showing the word count for each of the word lengths that has been encountered. 
# Your tutor will run your code against several standardized inputs to verify the correctness of your logic.  

import sys
import operator
import string
from sys import argv
from collections import Counter
_dict = {'1': 0, '2': 0, '3': 0, '4': 0, '5': 0, '6': 0, '7': 0, '8': 0, '9': 0, '10': 0, '11': 0, '12': 0, '13': 0, '14': 0, '15': 0, '16': 0}


def switch(x):
    return {
        1: '1',
        2: '2',
        3: '3',
        4: '4',
        5: '5',
        6: '6',
        7: '7',
        8: '8',
        9: '9',
        10: '10',
        11: '11',
        12: '12',
        13: '13',
        14: '14',
        15: '15',
        16: '16',
    }[x]

def main(argv):
    file_name = sys.argv[1]
    with open(file_name) as fp:
        while True:
            contents = fp.read()
            if not contents: break
            else: 
                contents = ''.join(c for c in contents if c not in ',".?;:&')    #Strip all the punctuation out
                contents = contents.split()                                      #Split contents into a list
                for word in contents:                                            #For every word that's in the list
                    cnt = 0                                                      #set the count to 0
                    for letter in word:                                          #for each letter in the word
                        cnt += 1                                                 #increment the count
                    key = switch(cnt)                                            #make the key value the number string in switch
                    _dict[key] += 1                                              #increment number count for each word count in the dict
    print("Length Count")
    for key in sorted(_dict):                                                    #attempted to sort the dict by key value, but apparently this doesnt work(?) 
        print(key, "    ", _dict[key])                                           #print the key, _dict values

if __name__ == "__main__":
    main(sys.argv[1:])
This particular line:

quote:

for key in sorted(_dict):
print(key, " ", _dict[key])

won't work. I'm getting the following output for my text file, which is the Declaration of Independence. Nothing is sorted, but the counts are correct:

Length Count
1 16
10 56
11 34
12 14
13 9
14 7
15 2
16 0
2 267
3 267
4 169
5 140
6 112
7 98
8 69
9 61

Any idea what I am doing wrong?

EAT THE EGGS RICOLA
May 29, 2008

You're sorting those as strings, not as ints. It's sorting correctly for strings. Change the keys in your dict to integers.

There's a bunch of other stuff you're doing weirdly but I'm on my phone.

Dren
Jan 5, 2001

Pillbug
It sorted correctly. Your keys are strings.
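
A quick illustration of the difference, using a few of the counts from the output above:

```python
counts = {'1': 16, '10': 56, '2': 267, '9': 61}

# String keys sort character by character, so '10' comes before '2'.
as_strings = sorted(counts)

# Sorting with key=int compares the numeric values instead.
as_numbers = sorted(counts, key=int)

print(as_strings)  # ['1', '10', '2', '9']
print(as_numbers)  # ['1', '2', '9', '10']
```

Using integer keys in the dict to begin with makes the `key=int` step unnecessary.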

Tacos Al Pastor
Jun 20, 2003

Dren posted:

It sorted correctly. Your keys are strings.

wow. my god. I cant believe Im that dumb. Thanks!

yippee cahier
Mar 28, 2005

I know it's your homework, so hopefully these offer you some things to try out instead of giving away answers.
  • You should look up defaultdict (https://docs.python.org/3.4/library/collections.html#collections.defaultdict) as a replacement for your _dict object. Pretty useful to know about some cool stuff lurking in the standard library for further down the road. Related: you'll never see a 17 letter long word?
  • Consider if there are any easier ways to determine the length of a sequence without walking through each item and tallying them using a for loop.
  • Consider your choice of a dict to hold the tallies. Your keys are sequential integers, you look up a string representation of them using a function called "switch", then look up the value in a dict associated with the string. There's a much simpler way to accomplish this. Also don't name your things "_dict" or "switch".
  • You never use a Counter or the operator module you import.
  • Look up the docs for the .read() method on your file object as your code is written expecting different behaviour.
  • Will you ever see different punctuation than what you have listed? The example given for "it's" will fail. A whitelist of alpha characters to keep would be more robust. The library has something useful for that: https://docs.python.org/3.4/library/string.html#string-constants or isalpha().
  • It's interesting you're using a generator expression to conditionally filter out punctuation. If you've got a handle on it, see if employing it elsewhere would simplify what you're doing.

salisbury shake
Dec 27, 2011

salisbury shake posted:

Anyone have success with bs4 and multiprocessing?

Python code:
from multiprocessing import Pool
from bs4 import BeautifulSoup as BS
import requests

url = 'http://yahoo.com'

with Pool(4) as pool:
    pgs = (requests.get(url).content for _ in range(8))
    wrapped_pgs = pool.map(BS, pgs)
shits itself with a max recursion limit reached RuntimeError, but replace the URL with say google.com and it works just fine. Replace it with the forum URL and get a different RuntimeError. Wrapping content serially works just fine for all urls involved.

If I break out lxml will I run into the same problem? Changing the bs4 backend to lxml still produces errors.

In case anyone cares: this happens because bs4.element.Tag objects cannot be pickled when sent to a Pool's workers because of the overridden __getattr__ method.

Wrote an adapter that translates bs4 api calls and attribute lookups into the appropriate XPath strings and lxml method calls, and didn't have to rewrite anything in the parser :).

Got a 10x speed up in initially wrapping str/bytes content along with a 10x speed up when doing tag queries and attribute lookups. No need to use the multiprocessing module at all.

Tacos Al Pastor
Jun 20, 2003

sund posted:

I know it's your homework, so hopefully these offer you some things to try out instead of giving away answers.

You should look up defaultdict (https://docs.python.org/3.4/library/collections.html#collections.defaultdict) as a replacement for your _dict object. Pretty useful to know about some cool stuff lurking in the standard library for further down the road. Related: you'll never see a 17 letter long word?

I did consider this but I just wanted to get the basics working until I could handle this case. Ill take a look into defaultdict.

quote:

Consider if there are any easier ways to determine the length of a sequence without walking through each item and tallying them using a for loop.

I know that some of this can be accomplished using "If Not In" etc types of statements. Beyond that Im not sure.

quote:

Consider your choice of a dict to hold the tallies. Your keys are sequential integers, you look up a string representation of them using a function called "switch", then look up the value in a dict associated with the string. There's a much simpler way to accomplish this. Also don't name your things "_dict" or "switch".

Yeah the naming could be a bit better and the switch statement I felt was a bit unnecessary if there was a way to access dict[key] values without having to go that route. I just havent thought about it enough.

quote:

You never use a Counter or the operator module you import.

This needs to be removed. I was going to use that as a tally to count up the collection of the number of characters for each word in the text.

quote:

Look up the docs for the .read() method on your file object as your code is written expecting different behaviour.

Ok Ill check this out.

quote:

Will you ever see different punctuation than what you have listed? The example given for "it's" will fail. A whitelist of alpha characters to keep would be more robust. The library has something useful for that: https://docs.python.org/3.4/library/string.html#string-constants or isalpha().

I've used isalpha() before and I dont know why I didnt think of this earlier. Thanks.

quote:

It's interesting you're using a generator expression to conditionally filter out punctuation. If you've got a handle on it, see if employing it elsewhere would simplify what you're doing.

Can you give me a hint where it might else be used?

Thanks again for all the pointers!

ShadowHawk
Jun 25, 2000

CERTIFIED PRE OWNED TESLA OWNER

spiralbrain posted:

I did consider this but I just wanted to get the basics working until I could handle this case. Ill take a look into defaultdict.
They're simpler to use than you might think. That often happens in Python.

quote:

I know that some of this can be accomplished using "If Not In" etc types of statements. Beyond that Im not sure.
No, much simpler. A built-in, non-library function.

quote:

Yeah the naming could be a bit better and the switch statement I felt was a bit unnecessary if there was a way to access dict[key] values without having to go that route. I just havent thought about it enough.
Again, another built-in function. This one I'll just give you: int(foo) and str(bar)

quote:

This needs to be removed. I was going to use that as a tally to count up the collection of the number of characters for each word in the text.
If you can figure out how to use a Counter, you should probably do that as it's exactly what you're doing here.

quote:

Ok Ill check this out.

Can you give me a hint where it might else be used?
Your hint is these two are related.
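
For anyone reading along after the fact, here is one way the hints in this exchange could fit together: `len()` for the length, `isalpha()` as the whitelist, and a `Counter` with integer keys for the tally. It is a sketch on an in-memory string, not the assignment's required solution. The `word_length_counts` name and the sample text are made up:

```python
from collections import Counter


def word_length_counts(text):
    """Tally word lengths, counting only alphabetic characters in each word."""
    counts = Counter()
    for word in text.split():
        # Keep letters only, so "it's" has length 3 and "&" has length 0.
        letters = [c for c in word if c.isalpha()]
        counts[len(letters)] += 1
    return counts


sample = "It's the final. & done"
counts = word_length_counts(sample)

print("Length Count")
for length in sorted(counts):  # integer keys sort numerically
    print(length, counts[length])
```

Because a `Counter` returns 0 for missing keys, no fixed table of lengths 1 to 16 is needed, and a 17-letter word is handled like any other.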

BeefofAges
Jun 5, 2004

Cry 'Havoc!', and let slip the cows of war.

salisbury shake posted:

In case anyone cares: this happens because bs4.element.Tag objects cannot be pickled when sent to a Pool's workers because of the overridden __getattr__ method.

Wrote an adapter that translates bs4 api calls and attribute lookups into the appropriate XPath strings and lxml method calls, and didn't have to rewrite anything in the parser :).

Got a 10x speed up in initially wrapping str/bytes content along with a 10x speed up when doing tag queries and attribute lookups. No need to use the multiprocessing module at all.

You could also try using ThreadPool, since I'm guessing you're at least partially I/O bound, not CPU bound. ThreadPool has the same interface as Pool, except that it creates threads instead of processes. It's undocumented, for whatever reason.
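
A minimal sketch of the `ThreadPool` suggestion. `fake_fetch` is a made-up stand-in for the real download-and-parse step so the example runs without network access or bs4:

```python
from multiprocessing.pool import ThreadPool


def fake_fetch(url):
    """Stand-in for an I/O-bound call such as requests.get(url).content."""
    return "<html>{0}</html>".format(url)


urls = ['http://example.com/{0}'.format(i) for i in range(8)]

# ThreadPool mirrors Pool's interface but uses threads, so arguments and
# results stay in the same process instead of being pickled between processes.
pool = ThreadPool(4)
try:
    pages = pool.map(fake_fetch, urls)
finally:
    pool.close()
    pool.join()

print(len(pages))
```

Since nothing crosses a process boundary, objects that cannot be pickled are not a problem here, though CPU-bound parsing would still be limited by the GIL.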

Hed
Mar 31, 2004

Fun Shoe
Am I the only one who stopped using defaultdicts? I think there was like one time a few years ago it was easier for me than just using get and setdefault on a normal dict. Ever since then the collections one has sorta fallen out of favor for me.
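
For comparison, a side-by-side sketch of the two styles mentioned here, using a made-up word list. Both build the same grouping. Which reads better is a matter of taste:

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# Plain dict: setdefault() inserts the empty list the first time a key is seen.
by_letter_plain = {}
for word in words:
    by_letter_plain.setdefault(word[0], []).append(word)

# defaultdict: the list factory is called automatically for missing keys.
by_letter_default = defaultdict(list)
for word in words:
    by_letter_default[word[0]].append(word)

# Counting with get() on a plain dict.
letter_counts = {}
for word in words:
    letter_counts[word[0]] = letter_counts.get(word[0], 0) + 1

print(by_letter_plain == dict(by_letter_default))
print(letter_counts)
```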

HonorableTB
Dec 22, 2006
How would you all rate Code Academy's Python course? I'm about 30% through it and I feel like I'm learning it, but at the same time I want to make sure that I'm doing things correctly. I have little programming experience and I want to learn Python easily. The resources in the OP are alright, but I feel like they are very out of date. What would you guys recommend nowadays for Python tutorials?

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

HonorableTB posted:

How would you all rate Code Academy's Python course? I'm about 30% through it and I feel like I'm learning it, but at the same time I want to make sure that I'm doing things correctly. I have little programming experience and I want to learn Python easily. The resources in the OP are alright, but I feel like they are very out of date. What would you guys recommend nowadays for Python tutorials?

This reminded me that I started a new OP months ago and then forgot...

I'll try to get it up this evening or tomorrow.

In the meantime check out Learn Python the Hard Way and Think Python.

null gallagher
Jan 1, 2014
What matters is if you feel you're learning from it. I've seen a lot of people say they want to learn programming but never get anywhere because they get hung up on deciding what language or tutorial to use.

Honestly, from what I've seen of the Code Academy type sites, they've all felt a little too hand-holdy and don't explain what's going on. I tried one of them a while back (I think it was Ruby), and it felt like the tutorial maker was telling me "just type this in, don't worry about what it means or how you'd use it in another situation." Then again, I didn't get too far.

I learned Python from Think Python, and I liked it a lot. I went through it after I'd graduated with my CS/math degree, but the author uses it for his intro CS classes.

Dominoes
Sep 20, 2007

HonorableTB posted:

How would you all rate Code Academy's Python course? I'm about 30% through it and I feel like I'm learning it, but at the same time I want to make sure that I'm doing things correctly. I have little programming experience and I want to learn Python easily. The resources in the OP are alright, but I feel like they are very out of date. What would you guys recommend nowadays for Python tutorials?
I learned Python from it. The interactivity helped. I'd recommend it.

Pudgygiant
Apr 8, 2004

Garnet and black? More like gold and blue or whatever the fuck colors these are
code:
>> x = (1, 2, 3)
>> x[0]
1
>> x[0 - 1]
3
>> x[0 - 2]
2
Buh?

Telarra
Oct 9, 2012

Python code:
>> x = (1, 2, 3)
>> x[0]
1
>> x[-1]
3
>> x[-2]
2
>> x[0 : 1]
(1,)
>> x[0 : 2]
(1, 2)

Pudgygiant
Apr 8, 2004

Garnet and black? More like gold and blue or whatever the fuck colors these are
Dammit, I knew that. I wasn't trying to slice, I was trying to subtract from the result, and it caught me off guard.
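
The two readings side by side, as a short sketch:

```python
x = (1, 2, 3)

# The arithmetic inside the brackets happens first, so x[0 - 1] is x[-1],
# and negative indexes count back from the end of the sequence.
last = x[0 - 1]

# To subtract from the element itself, do the arithmetic outside the brackets.
first_minus_one = x[0] - 1

print(last, first_minus_one)  # 3 0
```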

QuarkJets
Sep 8, 2008

nm
