  • Locked thread
Pudgygiant
Apr 8, 2004

Garnet and black? More like gold and blue or whatever the fuck colors these are
Legit question (that also seems stupid), I have a function that returns ('foo', 'bar'), what's the easiest way to directly pass that to one that takes 2 args? Make a default arg and test for a tuple?

e
so:
code:
def iCountToThree():
    return 1, 2, 3

def didICountToThree(isOne, isTwo = 2, isThree = 3):
    if type(isOne) is tuple:
        isOne, isTwo, isThree = isOne
    # blah blah blah

Pudgygiant fucked around with this message at 06:42 on Sep 7, 2014


Telarra
Oct 9, 2012

https://docs.python.org/2/tutorial/controlflow.html#unpacking-argument-lists
Python code:
 
didICountToThree(*iCountToThree())
didICountToThree(*(1, 2, 3))
didICountToThree(1, 2, 3)

Literally Elvis
Oct 21, 2013

Does anybody have experience with the Yelp API? I'm trying to write a command line utility that queries Yelp, and even though I'm forming requests like http://api.yelp.com/v2/search?&term=pizza&radius_filter=3218&cll=30.2747,-97.740344 I'm getting big UNSPECIFIED_ERROR messages returned. I don't see how much more specific I could be than "give me everything with the word pizza in it that is 2 miles from the state capitol," though, and that error is not even listed on Yelp's official list of error messages.

EDIT: Apparently I am a dumb and using &cll= only works when specifying a neighborhood (&location=) value as well.

Literally Elvis fucked around with this message at 12:31 on Sep 7, 2014
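
For anyone hitting this later, a minimal sketch of building that query string with both cll and location set, per the EDIT above. The base URL and parameter names come from the posts; the location value is made up:

```python
from urllib.parse import urlencode

# Parameters from the post; per the EDIT, cll apparently needs a
# location (neighborhood) value alongside it.
params = {
    "term": "pizza",
    "radius_filter": 3218,        # meters (~2 miles)
    "location": "Austin, TX",     # hypothetical neighborhood value
    "cll": "30.2747,-97.740344",  # lat,long of the state capitol
}
url = "http://api.yelp.com/v2/search?" + urlencode(params)
print(url)
```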

salisbury shake
Dec 27, 2011

BeefofAges posted:

You could also try using ThreadPool, since I'm guessing you're at least partially I/O bound, not CPU bound. ThreadPool has the same interface as Pool, except that it creates threads instead of processes. It's undocumented, for whatever reason.

Been playing with the ThreadPoolExecutor from concurrent.futures and it does the trick for IO limited work.
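
For reference, a minimal ThreadPoolExecutor sketch; the fetch function here is a made-up stand-in for real network or disk I/O:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for an I/O-bound call (e.g. an HTTP request).
def fetch(n):
    return n * 2

# map() farms the calls out to worker threads and yields results in order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, range(5)))

print(results)  # [0, 2, 4, 6, 8]
```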

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

If you want to be cool and need to speed up I/O-bound work, look at green threads using something like eventlet or gevent.

Comrade Gritty
Sep 19, 2011

This Machine Kills Fascists
So I'm just starting to use pandas for the first time ever, and I'm trying to decide between different methods of doing things. Right now I have a single database table of "events" where each row represents one event and has a datetime as well as a number of other attributes associated with it. Right now I'm charting this by manipulating the data in SQL and then just dumping that into a pandas DataFrame before plotting it. However I'm barely using the DataFrame and I'm wondering if it would maybe be faster/saner to just dump the raw data into a pandas DataFrame and use it.

Primarily my question is:

1) Given a table like that, how would I use pandas to do things like "get the number of events per day based on a particular attribute"? For example, "100 events for Windows on 2014/09/07", assuming one of the attributes is an OS.
2) Is this likely to be faster than the SQL queries? Right now I have a few queries that are each taking 20+ minutes to complete against a table with 500 million events.

SurgicalOntologist
Jun 17, 2004

1) Just off the top of my head. There may be faster ways. If you're able to use the datetime as an index (e.g., they're unique), you can take advantage of the datetimeindex methods and attributes like date:

Python code:
df[(df.index.date == date(2014, 9, 7)) & (df['OS'] == 'Windows')].count()
Otherwise, maybe something like this which just uses a regular left <= values <= comparator (note that the right endpoint would be included and so would catch an event at midnight the next day):

Python code:
df[(df['datetime'].between(datetime(2014, 9, 7), datetime(2014, 9, 8))) & (df['OS'] == 'Windows')].count()
2) As to which is faster, IDK. Can you even fit all that data into memory?

Comrade Gritty
Sep 19, 2011

This Machine Kills Fascists

SurgicalOntologist posted:

1) Just off the top of my head. There may be faster ways. If you're able to use the datetime as an index (e.g., they're unique), you can take advantage of the datetimeindex methods and attributes like date:

Python code:
df[(df.index.date == date(2014, 9, 7)) & (df['OS'] == 'Windows')].count()
Otherwise, maybe something like this which just uses a regular left <= values <= comparator (note that the right endpoint would be included and so would catch an event at midnight the next day):

Python code:
df[(df['datetime'].between(datetime(2014, 9, 7), datetime(2014, 9, 8))) & (df['OS'] == 'Windows')].count()

They are not guaranteed to be unique. I'll have to try that other thing!

SurgicalOntologist posted:

2) As to which is faster, IDK. Can you even fit all that data into memory?

Hm, well I have boxes available to me with either 128GB of RAM or 512GB of RAM.

The events in this case are every download from PyPI from Jan 02, 2014 to now, and I'm trying to figure out what sort of interesting data can be extracted from it.

Comrade Gritty
Sep 19, 2011

This Machine Kills Fascists

SurgicalOntologist posted:

1) Just off the top of my head. There may be faster ways. If you're able to use the datetime as an index (e.g., they're unique), you can take advantage of the datetimeindex methods and attributes like date:

Python code:
df[(df.index.date == date(2014, 9, 7)) & (df['OS'] == 'Windows')].count()
Otherwise, maybe something like this which just uses a regular left <= values <= comparator (note that the right endpoint would be included and so would catch an event at midnight the next day):

Python code:
df[(df['datetime'].between(datetime(2014, 9, 7), datetime(2014, 9, 8))) & (df['OS'] == 'Windows')].count()

Actually, going back to this, do indexes have to be unique? E.g. if I have a DataFrame where I did df.set_index("the datetime field") and the datetime fields aren't unique, did I lose data there?

SurgicalOntologist
Jun 17, 2004

I'm not sure. Try it out on some test data. Without checking the docs my hunch would be that it would work but might make some manipulations impossible. Similar to how a lot of things can get screwed up when you have an unsorted index, or how a lot of mathematical techniques give screwy results if the index isn't evenly spaced.

My recommendation given that data set is to generate a PeriodIndex, I think it's called, perhaps every hour or even coarser, and convert those individual entries into counts. It will be much easier to deal with in that kind of form, I would guess, and much faster. Plus you could always go back to the raw events if you needed something more.

Actually given your original question maybe that was your plan all along.
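
A sketch of that bucket-into-counts idea on made-up data. This uses pd.Grouper for the daily bucketing; resample or a PeriodIndex would work similarly:

```python
import pandas as pd

# Toy events table standing in for the download log (hypothetical data).
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2014-09-07 01:10", "2014-09-07 02:20", "2014-09-07 02:45",
        "2014-09-08 00:05",
    ]),
    "OS": ["Windows", "Windows", "Linux", "Windows"],
})

# Bucket the raw events into per-day, per-OS counts.
counts = df.groupby([pd.Grouper(key="datetime", freq="D"), "OS"]).size()
print(counts)
```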

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS
Pretty sure a relational database will always be faster for this kind of work. Selecting and sorting are what they do best.

Literally Elvis
Oct 21, 2013

Would anybody care to glance over the code I wrote for this Yelp scraping project? I am dumb so it has no tests (something I will rectify in the next week or so), but other than that, I'm pretty happy with it. I just want to make sure the code doesn't blatantly stink really hard. Also, this is the first time I've ever had to write a README.md so that was something.

BeefofAges
Jun 5, 2004

Cry 'Havoc!', and let slip the cows of war.

Literally Elvis posted:

Would anybody care to glance over the code I wrote for this Yelp scraping project? I am dumb so it has no tests (something I will rectify in the next week or so), but other than that, I'm pretty happy with it. I just want to make sure the code doesn't blatantly stink really hard. Also, this is the first time I've ever had to write a README.md so that was something.

Why are you installing dependencies via a shell script that calls sudo pip? Better to have a setup.py or requirements.txt.

Symbolic Butt
Mar 22, 2009

(_!_)
Buglord

Literally Elvis posted:

Would anybody care to glance over the code I wrote for this Yelp scraping project? I am dumb so it has no tests (something I will rectify in the next week or so), but other than that, I'm pretty happy with it. I just want to make sure the code doesn't blatantly stink really hard. Also, this is the first time I've ever had to write a README.md so that was something.

A few things:

- I'm 95% sure you don't need the # -*- coding: utf-8 -*- line if this is exclusively python3. this item is worthless

- eliminate_duplicate_results(results) can be just set(results) or list(set(results)) if you still really want it to be a list.

- This kind of slicing in credentials.readline().strip()[15:] is just too silly, you should be using configparser or yaml or something.

- This part
code:
if radius >= 40000:
    radius = 40000
can be radius = min(40000, radius). But this doesn't really matter, the important thing is... Why 40000 anyway? There are lots of magic numbers in your code.

- That Business object thing sounds like it's begging to be a namedtuple.
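
For the namedtuple point, a sketch with made-up field names. A nice bonus is that namedtuples are hashable and compare by value, which is also what lets the set(results) dedup work:

```python
from collections import namedtuple

# Hypothetical fields; the real Business object presumably has more.
Business = namedtuple("Business", ["name", "rating", "zip_code"])

b = Business(name="Hypothetical Pizza", rating=4.5, zip_code="78701")
print(b.rating)  # attribute access, like a class

# Value equality and hashability come for free, so set() dedups work.
dupe = Business("Hypothetical Pizza", 4.5, "78701")
print(b == dupe)
print(len({b, dupe}))
```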

Literally Elvis
Oct 21, 2013

BeefofAges posted:

Why are you installing dependencies via a shell script that calls sudo pip? Better to have a setup.py or requirements.txt.
I struggled a lot with this, frankly. I tried making a setup.py, but it never seemed to work. It's something I'm going to work on.

Symbolic Butt posted:

A few things:

- I'm 95% sure you don't need the # -*- coding: utf-8 -*- line if this is exclusively python3. this item is worthless

- This kind of slicing in credentials.readline().strip()[15:] is just too silly, you should be using configparser or yaml or something.

- That Business object thing sounds like it's begging to be a namedtuple.

- This part
code:
if radius >= 40000:
    radius = 40000
can be radius = min(40000, radius). But this doesn't really matter, the important thing is... Why 40000 anyway? There are lots of magic numbers in your code.
These are all things I can look into. 40,000 is the largest value the Yelp API will accept for that particular parameter. The only other magic numbers I found were 1609 (the number of meters per mile), and 3959 (the radius of the earth in miles). I will turn those into variables with proper comments when I get out of work.

Symbolic Butt posted:

- eliminate_duplicate_results(results) can be just set(results) or list(set(results)) if you still really want it to be a list.

I knew this, but when I tried it, it would refuse to work, presumably because the list contains objects?

Comrade Gritty
Sep 19, 2011

This Machine Kills Fascists
More pandas questions for y'all!

Right now I have a dataframe that looks like this:


code:
In [37]: df.head()
Out[37]:
         day distribution_type python_type python_release python_version operating_system installer_type installer_version  count
0 2014-01-02         bdist_dmg        None           None           None             None        browser              None      2
1 2014-01-02         bdist_egg     cpython           None   2.6.6-final0            Linux   bandersnatch               1.1      7
2 2014-01-02         bdist_egg     cpython           None   2.7.3-final0            Linux   bandersnatch             1.0.1     27
3 2014-01-02         bdist_egg     cpython           None   2.7.3-final0            Linux   bandersnatch        1.0.2.dev0      7
4 2014-01-02         bdist_egg     cpython           None   2.7.3-final0            Linux   bandersnatch             1.0.4      4
What I'd like to do now is effectively filter so that this only has rows where installer_type = "pip" and where installer_version has a value, and then throw away both of those columns because they'll no longer be useful.

I have the first part of that done using:

code:
In [50]: df[df.installer_type == "pip"].dropna(subset=["installer_version"]).drop("installer_type", 1).head()
Out[50]:
           day distribution_type python_type python_release python_version operating_system installer_version  count
15  2014-01-02         bdist_egg     cpython           None          2.7.6            Linux             1.4.1      2
123 2014-01-02       bdist_wheel     cpython           None          2.6.5            Linux             1.4.1      3
124 2014-01-02       bdist_wheel     cpython           None          2.6.5            Linux               1.5     31
125 2014-01-02       bdist_wheel     cpython           None          2.6.6            Linux               1.5    289
126 2014-01-02       bdist_wheel     cpython           None          2.6.6          Windows               1.5      2
Now I want to drop installer_version as well; however, since there are multiple values there, I'd like to take any rows whose only difference is the installer_version column and sum() their count column. Anyone have an inkling how to do this?

vikingstrike
Sep 23, 2007

whats happening, captain
I haven't tested this code, but something like this should work:

code:
df = df[(df['installer_type'] == "pip") & (df['installer_version'].notnull())]
df = df.drop(['installer_type'], axis=1)
version_counts = df.groupby(['installer_version'])['count'].agg(sum)

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Literally Elvis posted:

I struggled a lot with this, frankly. I tried making a setup.py, but it never seemed to work. It's something I'm going to work on.

Creating a requirements.txt is as simple as pip freeze > requirements.txt, and then an end user just has to do pip install -r requirements.txt.

ShadowHawk
Jun 25, 2000

CERTIFIED PRE OWNED TESLA OWNER

Literally Elvis posted:

These are all things I can look into. 40,000 is the largest value the Yelp API will accept for that particular parameter. The only other magic numbers I found were 1609 (the number of meters per mile), and 3959 (the radius of the earth in miles). I will turn those into variables with proper comments when I get out of work.
More specifically, you should turn them into constants defined at the start of the file. Compare:

Python code:
MAX_YELP_RADIUS = 40000 # API limit: http://www.yelp.com/developers/documentation/v2/search_api

...

radius = min(radius, MAX_YELP_RADIUS)

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS

Thermopyle posted:

Creating a requirements.txt is as simple as pip freeze > requirements.txt, and then an end user just has to do pip install -r requirements.txt.

Also, it's bad form to have a user blind execute a sudo command.

Instead, use a requirements.txt and allow the user to define the installation context.

Literally Elvis
Oct 21, 2013

Blinkz0rz posted:

Also, it's bad form to have a user blind execute a sudo command.

Instead, use a requirements.txt and allow the user to define the installation context.

Agreed, I've edited the README and generated the requirements.txt (thanks, Thermopyle!).

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug
In general, don't store something as a number unless you're going to do math with it. In your case, this applies to zip codes, but I've also seen phone numbers stored as numeric values once or twice in other projects. (I haven't actually checked whether you're converting to an int when getting the data from Yelp, but I did notice the initialization of Business.zip as 00000.)

Lysidas fucked around with this message at 13:53 on Sep 9, 2014
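
To illustrate why (the ZIP here is made up): leading zeros are the problem, because an int silently drops them.

```python
# A Boston-area ZIP stored as an int loses its leading zero.
zip_as_int = int("02134")
print(zip_as_int)  # 2134

# Kept as a string, it round-trips correctly.
zip_as_str = "02134"
print(zip_as_str.zfill(5))  # 02134
```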

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS
Also, your make_api_call method is waaay more complicated than it needs to be. No need to do readline and stripping to write your own credentials parser when configparser is built into the stdlib.

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

Steampunk Hitler posted:

Now I want to drop installer_version as well; however, since there are multiple values there, I'd like to take any rows whose only difference is the installer_version column and sum() their count column. Anyone have an inkling how to do this?

It should be

Python code:
df = df.drop("installer_version", 1).drop_duplicates()
Edit: Oh, sum the count. Need to read better.

Python code:
columns = ['day', 'distribution_type', 'python_type', 'python_release', 'python_version', 'operating_system']
df = df.groupby(columns)['count'].sum()

KernelSlanders fucked around with this message at 16:18 on Sep 9, 2014

Literally Elvis
Oct 21, 2013

Lysidas posted:

In general, don't store something as a number unless you're going to do math with it. In your case, this applies to zip codes, but I've also seen phone numbers stored as numeric values once or twice in other projects. (I haven't actually checked whether you're converting to an int when getting the data from Yelp, but I did notice the initialization of Business.zip as 00000.)
Should I just initialize it as None instead?

Blinkz0rz posted:

Also, your make_api_call method is waaay more complicated than it needs to be. No need to do readline and stripping to write your own credentials parser when configparser is built into the stdlib.
Someone else had mentioned configparser, which I will definitely read up on. Initially, that part of the code was just the four values without their explanations, but I was worried if somebody non-technical was charged with using it, they would just paste whatever Yelp displayed into a file and then be mad when it didn't work.

The March Hare
Oct 15, 2006

Je rêve d'un
Wayne's World 3
Buglord
Hi y'all, so I just "finished" my first programming thing and I want to have it looked over by smart dudes who know what is up so that you can tell me how extremely dumb I am. I have a pretty limited cs background (cf. none), so I wrote all of this with the help of SO and the scattered qt/pyqt docs which are terrifying and wouldn't stop referencing c++ implementations of things that I can only partially comprehend. I'm really looking for feedback on project structure, tech choice (are my library choices appropriate etc.), anything that I may have obviously implemented wrong/in a strange way, and things that I have totally missed that I should be doing. I went to school for architecture, I can take feedback for what it is so please fire away.

As a brief introduction, my goal was to create a cross-platform method for sharing my clipboard between machines while also maintaining a full searchable record of my clipboard (well, the bits I choose to store).

I used Django with DRF & Elastic Search to make the API/searchable record of the clipboard. Then I used pyqt4 and requests/pyperclip to make a tray app that lets you either send your current clipboard data to the Django app, or set your clipboard equal to the most recent record for your account from the app. Anything you send is private by default, but you can make the data public and get permalinks to any particular record so you can use it as a pastebin-lite too I suppose.

As far as things that I know I am bad on go, I feel like my understanding of threading (both in general, and in qt) is really not up to snuff, but I've been struggling to find a good resource for learning how to really deal with them. Additionally, I was running into a segfault on app exit that was really bothering me because I don't know how to debug it. I ended up deciding (after a lot of SO reading) that it was probably the qt exit call executing random-order deletion causing the python interpreter to crash. I "solved" this by telling sip (which I additionally do not fully understand) not to destroy things on exit. This seems to have worked, but also feels really hacky to me, though evidently it is the default behavior in pyqt5. What should I be doing here, if anyone knows? Also, is there any way to debug a segfault like that?

https://github.com/LarryBrid/glimmer-client is the qt client, and https://github.com/LarryBrid/glimmer is most of the code for the Django site. The site is operable at https://glimmer.io/. I probably wouldn't store anything really sensitive on it, because (as should be clear) I am far from an expert and I cannot imagine it is airtight. Please excuse the design/copy and that the download button doesn't do anything yet, I'll be adding links to download the client after I have it vetted and make whatever changes are needed. For now you should just be able to run the client.py file from the glimmer-client repo if you want to play around with it. You get your API key from the Profile page right now.

e; I just realized I need to add something to alert people if their API calls are failing, as the console will be hidden normally.

The March Hare fucked around with this message at 18:04 on Sep 9, 2014

Ghaz
Nov 19, 2004

Pudgygiant posted:

Legit question (that also seems stupid), I have a function that returns ('foo', 'bar'), what's the easiest way to directly pass that to one that takes 2 args? Make a default arg and test for a tuple?

e
so:
code:
def iCountToThree():
    return 1, 2, 3

def didICountToThree(isOne, isTwo = 2, isThree = 3):
    if type(isOne) is tuple:
        isOne, isTwo, isThree = isOne
    # blah blah blah

Haven't seen anyone reply to this one yet, but for your specific scenario you would do that with the * operator:

code:
def returns_a_two_tuple():
    return 1, 2

def accepts_two_args(a, b):
    print a + b

accepts_two_args(*returns_a_two_tuple())
# prints 3

you can also use ** with a dict

http://agiliq.com/blog/2012/06/understanding-args-and-kwargs/

Ghaz fucked around with this message at 03:27 on Sep 10, 2014
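
And the ** version of the same idea, with a made-up dict: ** unpacks a dict into keyword arguments the same way * unpacks a tuple into positional ones.

```python
def accepts_two_args(a, b):
    return a + b

# Keys must match the parameter names.
kwargs = {"a": 1, "b": 2}
result = accepts_two_args(**kwargs)
print(result)  # 3
```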

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Let me preface this by saying I'm not a programmer, just a guy trying to automate crap at work...

I have a folder with lots of text files. In the text files are account numbers and associated information. The text files are sorted and named by a range of dates. For instance:
code:
CQMB_01012014_01082014.txt
CQMB_01082014_01152014.txt
CQMB_01152014_01222014.txt
If I want to look for an account that was at the hospital on 01/10/2014 then I would open CQMB_01082014_01152014.txt etc...

I have my eyes on a larger program but right now I simply want to be able to provide the script a date, and the script knows which text file to open. If the date falls on the edge, for instance 01/08/2014, then it would open both text files with that date.

Any advice on how to tackle this?

KernelSlanders
May 27, 2013

Rogue operating systems on occasion spread lies and rumors about me.

Hughmoris posted:

Let me preface this by saying I'm not a programmer, just a guy trying to automate crap at work...

I have a folder with lots of text files. In the text files are account numbers and associated information. The text files are sorted and named by a range of dates. For instance:
code:
CQMB_01012014_01082014.txt
CQMB_01082014_01152014.txt
CQMB_01152014_01222014.txt
If I want to look for an account that was at the hospital on 01/10/2014 then I would open CQMB_01082014_01152014.txt etc...

I have my eyes on a larger program but right now I simply want to be able to provide the script a date, and the script knows which text file to open. If the date falls on the edge, for instance 01/08/2014, then it would open both text files with that date.

Any advice on how to tackle this?

Start by reading the entire file list into memory, parse the start and end date parts of the filename into datetime objects, then find the ones that overlap your target date. Hint: ranges that overlap your target date are the ones where the start date is less than or equal to your target and the end date is greater than or equal to your target. That kind of indexing is the sort of thing pandas does really cleanly, but it can certainly be done just using lists as well.

Jose Cuervo
Aug 25, 2004

Hughmoris posted:

Let me preface this by saying I'm not a programmer, just a guy trying to automate crap at work...

I have a folder with lots of text files. In the text files are account numbers and associated information. The text files are sorted and named by a range of dates. For instance:
code:
CQMB_01012014_01082014.txt
CQMB_01082014_01152014.txt
CQMB_01152014_01222014.txt
If I want to look for an account that was at the hospital on 01/10/2014 then I would open CQMB_01082014_01152014.txt etc...

I have my eyes on a larger program but right now I simply want to be able to provide the script a date, and the script knows which text file to open. If the date falls on the edge, for instance 01/08/2014, then it would open both text files with that date.

Any advice on how to tackle this?

I think the following should work:
Python code:
import datetime
import glob
##

search_date= '01102014'
search_date= datetime.datetime.strptime(search_date, '%m%d%Y').date()

sorted_file_list= sorted(glob.glob('*.txt'))
for file_name in sorted_file_list:
	start_date= datetime.datetime.strptime(file_name[5:13], '%m%d%Y').date()
	end_date= datetime.datetime.strptime(file_name[14:22], '%m%d%Y').date()
	if (search_date >= start_date) and (search_date <= end_date):
		print file_name
	if start_date > search_date:
		break
I don't quite know what you meant by 'open the file' so I just printed out the file once I had found it. The second if condition is meant to terminate the loop once the file(s) have been found.

There might be a nicer (more Pythonic) way to do this and I am sure other people will say so if there is.

Hughmoris
Apr 21, 2007
Let's go to the abyss!

KernelSlanders posted:

Start by reading the entire file list into memory, parse the start and end date parts of the filename into datetime objects, then find the ones that overlap your target date. Hint: ranges that overlap your target date are the ones where the start date is less than or equal to your target and the end date is greater than or equal to your target. That kind of indexing is the sort of thing pandas does really cleanly, but it can certainly be done just using lists as well.

Jose Cuervo posted:

I think the following should work:
Python code:
import datetime
import glob
##

search_date= '01102014'
search_date= datetime.datetime.strptime(search_date, '%m%d%Y').date()

sorted_file_list= sorted(glob.glob('*.txt'))
for file_name in sorted_file_list:
	start_date= datetime.datetime.strptime(file_name[5:13], '%m%d%Y').date()
	end_date= datetime.datetime.strptime(file_name[14:22], '%m%d%Y').date()
	if (search_date >= start_date) and (search_date <= end_date):
		print file_name
	if start_date > search_date:
		break
I don't quite know what you meant by 'open the file' so I just printed out the file once I had found it. The second if condition is meant to terminate the loop once the file(s) have been found.

There might be a nicer (more Pythonic) way to do this and I am sure other people will say so if there is.

Thanks for the ideas. Jose, I'll try that out tonight when I'm back in the office. I appreciate you taking the time to write that up. I'm 100% a python beginner so it'll take me a little time to parse that and understand what it's doing.

theguyoncouch
May 23, 2014
I had a question about using DictWriter and couldn't find enough information on how to implement it correctly. I have the following code for filling out my class object:

code:
with open('test.csv', newline='') as csvfile:
        fields = csv.DictReader(csvfile, delimiter=',', quotechar='"')

        for row in fields:
            if current.eid == row['eid']:
                current = Employee(row['lastname'],
                                   row['firstname'],
                                   row['eid'],
                                   row['clockin'],
                                   row['clockout'],
                                   row['flag'])
But once I do this, I want to modify current.clockin with 'current.clockin = str(datetime.datetime.now())' and modify the csv file to save that one field. Is there a way to modify only one field using DictWriter, or would I have to load the whole document into a variable and rewrite everything just to modify one field? I started writing the code for the DictWriter but I'm lost afterwards:

code:
with open('test.csv', 'w', newline='') as csvfile:
            fieldnames = ['lastname', 'firstname', 'eid',
                          'clockin', 'clockout', 'flag']
            wfields = csv.DictWriter(csvfile, fieldnames)

accipter
Sep 12, 2003

theguyoncouch posted:

I had a question about using DictWriter and couldn't find enough information on how to implement it correctly. I have the following code for filling out my class object:

code:
with open('test.csv', newline='') as csvfile:
        fields = csv.DictReader(csvfile, delimiter=',', quotechar='"')

        for row in fields:
            if current.eid == row['eid']:
                current = Employee(row['lastname'],
                                   row['firstname'],
                                   row['eid'],
                                   row['clockin'],
                                   row['clockout'],
                                   row['flag'])
But once I do this, I want to modify current.clockin with 'current.clockin = str(datetime.datetime.now())' and modify the csv file to save that one field. Is there a way to modify only one field using DictWriter, or would I have to load the whole document into a variable and rewrite everything just to modify one field? I started writing the code for the DictWriter but I'm lost afterwards:

code:
with open('test.csv', 'w', newline='') as csvfile:
            fieldnames = ['lastname', 'firstname', 'eid',
                          'clockin', 'clockout', 'flag']
            wfields = csv.DictWriter(csvfile, fieldnames)

If you are only updating one column, you can probably just use csv.reader/csv.writer as below:
Python code:
FNAME = 'test.csv'
with open(FNAME, 'r') as fin:
    rows = list(csv.reader(fin))

with open(FNAME, 'w', newline='') as fout:
    writer = csv.writer(fout, delimiter=',', quotechar='"')
    # Write the header
    writer.writerow(rows[0])
    for row in rows[1:]:
        row[3] = str(datetime.datetime.now())
        writer.writerow(row)

theguyoncouch
May 23, 2014

accipter posted:

If you are only updating one column, you can probably just use csv.reader/csv.writer as below:
Python code:
FNAME = 'test.csv'
with open(FNAME, 'r') as fin:
    rows = list(csv.reader(fin))

with open(FNAME, 'w', newline='') as fout:
    writer = csv.writer(fout, delimiter=',', quotechar='"')
    # Write the header
    writer.writerow(rows[0])
    for row in rows[1:]:
        row[3] = str(datetime.datetime.now())
        writer.writerow(row)

It is one column in one row though. How would I tell my loop which row I want?

Space Kablooey
May 6, 2009


There's probably a better way, but

Python code:
for row in rows[1:]:
	if row[2] == 'some_eid': # I'm guessing this is the eid column that you want
	# or you could use something like
	# if row[2] in list_of_eids_to_update:
		row[3] = str(datetime.datetime.now()) # Or whatever column you want to update
	writer.writerow(row)

Hughmoris
Apr 21, 2007
Let's go to the abyss!

Jose Cuervo posted:

I think the following should work:
Python code:
import datetime
import glob
##

search_date= '01102014'
search_date= datetime.datetime.strptime(search_date, '%m%d%Y').date()

sorted_file_list= sorted(glob.glob('*.txt'))
for file_name in sorted_file_list:
	start_date= datetime.datetime.strptime(file_name[5:13], '%m%d%Y').date()
	end_date= datetime.datetime.strptime(file_name[14:22], '%m%d%Y').date()
	if (search_date >= start_date) and (search_date <= end_date):
		print file_name
	if start_date > search_date:
		break
I don't quite know what you meant by 'open the file' so I just printed out the file once I had found it. The second if condition is meant to terminate the loop once the file(s) have been found.

There might be a nicer (more Pythonic) way to do this and I am sure other people will say so if there is.

This worked like a champ, thank you.

theguyoncouch
May 23, 2014
I keep getting pylint errors: redefining name 'DEFAULT_LIST' from outer scope, and invalid argument name 'DEFAULT_LIST'.
My problem is I turned the arguments into a list because pylint was telling me not to have too many arguments, and I wanted to make it more readable.
How can I correct this?
Python code:
DEFAULT_LIST = ['', '', '', '', '', 'False', 'False']


class Employee:
    'base class called for employee'
    def __init__(self, DEFAULT_LIST):        #Messages occur on this line. 
        self.lastname = DEFAULT_LIST[0]
        self.firstname = DEFAULT_LIST[1]
        self.eid = DEFAULT_LIST[2]
        self.clockin = DEFAULT_LIST[3]
        self.clockout = DEFAULT_LIST[4]
        self.flag = DEFAULT_LIST[5]
        self.admin = DEFAULT_LIST[6]

David Pratt
Apr 21, 2001
That's less readable. Just declare what your constructor takes explicitly. Either disable that rule in pylint or adjust it to be something sensible.

Also if you want default arguments you need to name them, and having a mutable object (like a list) as a default argument is a bad idea.

David Pratt fucked around with this message at 16:54 on Sep 11, 2014
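
The mutable-default pitfall, for reference (function names made up): the default list is created once, at function definition, and then shared across calls.

```python
# The classic trap: every call without an explicit list reuses the SAME list.
def buggy_append(item, items=[]):
    items.append(item)
    return items

first = buggy_append(1)
second = buggy_append(2)
print(second)  # [1, 2] -- the "fresh" default remembers the previous call

# The usual fix: default to None and create the list inside the function.
def safe_append(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

print(safe_append(2))  # [2]
```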

theguyoncouch
May 23, 2014

David Pratt posted:

That's less readable. Just declare what your constructor takes explicitly. Either disable that rule in pylint or adjust it to be something sensible.

Also if you want default arguments you need to name them, and having a mutable object (like a list) as a default argument is a bad idea.

Alright, I'll mess with pylint instead.

Python code:
class Employee:
    'base class called for employee'
    def __init__(self, lastname, firstname, eid,
                 clockin, clockout, flag, admin):
        self.lastname = lastname
        self.firstname = firstname
        self.eid = eid
        self.clockin = clockin
        self.clockout = clockout
        self.flag = flag
        self.admin = admin


BigRedDot
Mar 6, 2008

Hey, time for new Bokeh! Fun stuffs:

New widgets like a hands-on table and new examples plus improvements to the cross filter (still a work in progress)



Additional abstract rendering capabilities for heat maps and contours when you have millions of points:



Some general feature improvements like multiple axes and some design love for the toolbar:



And finally, we just hit 2k stars on GitHub :peanut:



As always we are looking for new contributors and team members so if any of this interests you and you want to get involved in an open source project, drop us a line on GH!
