Baronash
Feb 29, 2012

So what do you want to be called?

SurgicalOntologist posted:

I think you can make that a linear programming problem. With Python the best option (last time I looked into it) was PuLP, which has a terrible interface but it does work. You could make a matrix of variables where each slot is one person in one program taking the value of 0 or 1. Add a constraint for each program (i.e. row in the matrix) that it sums to 2 (2 volunteers in each slot), and constraints that the columns (volunteers) sum to 1 for the day rows and 1 for the night rows. "Don't pair up the same volunteers twice" is going to be hard to formulate linearly but may be doable combinatorially (something like "the sum of these four cells must be < 4; the sum of these four cells must be < 4", etc.). That might be too many constraints... in which case maybe there's a more clever way to do it, or you could make the pairing concept part of the objective function (and I think you'd have to look into a solver that can handle a nonlinear objective function... but still, something more direct than a GA should work).

When I solved a problem like this I put the LpVariable objects (representing the 0 or 1 values) in an actual numpy array and used dot products to build the constraints. Another approach would be to store the LpVariables in a dict with keys of (volunteer, program) tuples and build all the constraints by looping.

Thanks everyone for the suggestions. I've been slowly trying this out, and I think it should work for me. I appreciate it.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
I'm trying to set up PyCharm remotely.

I've got a remote SFTP deployment server set up, and I've configured the path mappings so that I can just right click on my project folder and press "sync with deployed to remote_server" and I've verified that everything is in sync (well, at first it wanted me to download everything because of CRLF vs. LF endings, but I figured that out eventually).

Now I try to add an SSH Python interpreter and choose "existing server configuration." It's all good except it's asking me how I want to set up sync folders. It defaults to

C:/Users/me/PyCharmProjects/foo <--> /tmp/pycharm_project_1234

Umm, okay. What do I want to set here? I thought I already configured the server and mappings, so I don't understand why it's asking me again. Should I change it to

C:/Users/me/PyCharmProjects/foo <--> /home/me/PyCharmProjects/foo instead (where the project folder is actually located)?

I just don't want pycharm to nuke the entire directory on the remote server if I choose a wrong thing. I mean I could just git clone it again, but still.

What I want is if I edit bar.txt on this computer, I want it to automatically propagate to the server and vice versa. Ideally what I'd like is if the left "projects" tab just showed me files on the remote server so that I'm editing files directly on that server instead of locally, so that there's no need to sync at all.

e: If I leave the ssh interpreter mapping to default it looks like it tries to upload my entire project folder to the server, which is dumb, because this project folder already exists there and is in sync with the local computer.

e2: So if I tell the remote interpreter to map to the existing directory on the server, it just uploads everything anyway and overwrites everything. If I tell the remote interpreter to use another directory then it uploads to that directory instead. If I have the deployment server set to automatically upload changes, then I guess my files get uploaded twice. What the hell.

Boris Galerkin fucked around with this message at 19:55 on Jun 7, 2018

Ahz
Jun 17, 2001
PUT MY CART BACK? I'M BETTER THAN THAT AND YOU! WHERE IS MY BUTLER?!
Maybe I'm confused or you are.

The nice thing about using a remote interpreter is that it runs on the remote Python shell/env, but you're using local code via the project IDE. The remote interpreter doesn't/shouldn't use any remote code from your project. When you configure it, pick a remote Python env to use via SSH, and run a remote configuration via the Run configurations dialog, you're running code and environment variables right from your IDE but at the remote executable.

You don't need to sync -> deploy files for remote execution.

Ahz fucked around with this message at 21:41 on Jun 7, 2018

CarForumPoster
Jun 26, 2013

⚡POWER⚡
Short question: When you guys write python code, do you write a function under a def and then, under if __name__ == "__main__", call the function you just wrote?

Reason:
I've been tinkering with python for a while now but am finally working on a little project (automated GUI/Web interaction using selenium or pyautogui to automate writing and uploading test reports) that needs me to write a bunch of little functions to call with a wrapper based on what I need to do that day. I wrote my code, then when I was done, indented it and wrote def funcname(): followed by if __name__ == "__main__" to call the function. I feel like I should do this for everything I write. Do you guys do it this way?

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

I...haven't actually needed a __name__ == "__main__" block in literal years! Too much working with frameworks that have their own dealio for handling entry points and the like.

I know that doesn't actually answer your question, you just made me think of that.

Master_Odin
Apr 15, 2010

My spear never misses its mark...

ladies
I almost always have a "main()" function that the "if __name__ == '__main__'" block calls if the script is to be run directly. pylint trained me that doing anything complicated in the if statement is bad, and also to never put checks for whether __name__ is equal to __main__ inside your actual function code. This also makes it way easier to turn your script into an installed command-line application if you go that route.
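
For anyone following along, the pattern described above might look roughly like the sketch below. The names build_report and main, and the sample values, are made up for illustration; the only fixed part is the guard at the bottom.

```python
def build_report(name):
    """Return the text for one report (hypothetical helper)."""
    return f"Report for {name}"


def main():
    """Entry point: only called when the file is run directly."""
    for name in ["alpha", "beta"]:
        print(build_report(name))


# Importing this file from another module does not trigger main();
# running it as a script does.
if __name__ == "__main__":
    main()
```

Keeping the guard to a single call means other scripts and tests can import build_report without any side effects.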

ihatepants
Nov 5, 2011

Let the burning of pants commence. These things drive me nuts.



I have a homework assignment for my beginner's computer science class and I'm not quite sure what I'm doing wrong in my current implementation. I'm trying to read data from a CSV which includes data of customers of a fictional travel company and I'm supposed to open a .txt file template and replace the placeholders in it with the data on the CSV, then save a new txt file for each customer with the proper replacement. As if I am sending a new email to each customer in the CSV.

For example:
code:
to: [[email]]

Dear [[first_name]] [[last_name]],

We're sorry for the delay of your flight on [[Date]]. 
I've gotten to the part where I'm able to load the CSV and put the data in the CSV into an array.
code:
file_obj = open(PATH_SAVE_DIR + csv_filename, newline='')
reader = csv.DictReader(file_obj)
headers = reader.fieldnames  # list of headers

file_obj.close()

customerdata = []
with open(PATH_SAVE_DIR + csv_filename, 'r') as inf:
    reader = csv.reader(inf)
    row = next(reader)
    for row in reader:
        customerdata.append(row)
So my customer data is in an array with the following output:
code:
[['James', 'Butt', '6649 N Blue Gum St', 'New Orleans', 'Orleans', 'LA', '70116', '504-621-8927', 'jbutt@gmail.com', 'gold'], ['Josephine', 'Darakjy', '4 B Blue Ridge Blvd', 'Brighton', 'Livingston', 'MI', '48116', '810-292-9388', 'josephine_darakjy@darakjy.org', 'silver'], ['Art', 'Venere', '8 W Cerritos Ave #54', 'Bridgeport', 'Gloucester', 'NJ', '8014', '856-636-8749', 'art@venere.org', 'bronze']]
The part where I'm actually trying to replace the data of the txt file with the customer's data is throwing me off. I'm able to replace the data in the file, but it only does it with the first set of data, and never counts up in the array.
code:
file_obj = open(PATH_SAVE_DIR + EMAIL_TEMPLATE, 'r')
file_input = file_obj.read()

count = 0
for customer in range(len(customerdata)):
    customernumber = str(customer + 1)
    while count < len(customerdata):
        for word in headers:
            if word in file_input:
                index = headers.index(word)
                file_input = file_input.replace("[[" + word + "]]",
                                                customerdata[count][index])
        count += 1
    file_output = open(PATH_SAVE_DIR + EMAIL + customernumber + ".txt", 'w')
    file_output.write(file_input)
    file_output.close()
So it is able to successfully create email1.txt, email2.txt and email3.txt, but all three of the files only include the replaced data of the first customer. I tried putting "print (count)" in my "for word in headers" loop, and it seems like it only runs that for loop a single time, then doesn't attempt it again with the count going up (for the next customer in the array). How can I repeat this loop to do it for each customer? Any tips would be swell.


Edit: I figured it out and got it to do everything I wanted it to do with my limited knowledge.

code:
    try:
        with open(PATH_SAVE_DIR + EMAIL_TEMPLATE, 'r') as file_obj:
            template = file_obj.read()  # read the template once
    except FileNotFoundError:
        print("Could not find your template file. Please try again.")
        quit()

    get_email_folder()  # get email folder for where txt output is placed
    for customer in range(len(customerdata)):
        customernumber = str(customer + 1)  # count customer for filename
        file_input = template  # start each customer from the unmodified template
        for word in headers:  # replace each placeholder it finds in template
            if word in file_input:  # placeholder from headers
                index = headers.index(word)
                replaced = file_input.replace("[[" + word + "]]",
                                              customerdata[customer][index])
                file_input = replaced
        if "[[Date]]" in file_input:  # [[Date]] placeholder
            currentdate = time.strftime("%B %d, %Y")
            file_input = file_input.replace('[[Date]]', currentdate)
        if "[[event]]" in file_input:  # [[event]] placeholder
            file_input = file_input.replace('[[event]]', user_event)
        try:  # write once per customer, after all placeholders are replaced
            with open(PATH_SAVE_DIR + EMAIL + customernumber + ".txt", 'w') \
                    as file_output:
                file_output.write(file_input)
        except FileNotFoundError:  # error if directory doesn't exist
            print("Please try again with a directory that already exists.")
            quit()
    print('\nGenerated emails saved to', PATH_SAVE_DIR + user_email_folder)

ihatepants fucked around with this message at 04:36 on Jun 8, 2018

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

Ahz posted:

Maybe I'm confused or you are.

The nice thing about using a remote interpreter is that it runs on the remote Python shell/env, but you're using local code via the project IDE. The remote interpreter doesn't/shouldn't use any remote code from your project. When you configure it, pick a remote Python env to use via SSH, and run a remote configuration via the Run configurations dialog, you're running code and environment variables right from your IDE but at the remote executable.

You don't need to sync -> deploy files for remote execution.

Maybe I'm confused/misunderstanding then. I thought remote development was "ssh into a remote computer, edit files directly on remote computer with vim." I was hoping to replace that with "start pycharm, open project located on remote computer, edit code directly on remote computer." In my case I can physically access both computers. Since I've already got the code and environment and everything all set up on the "remote" computer, I thought I could just use pycharm on my "local" computer to edit those files directly (and also automatically relay all python/shell commands over to it so that the code is both edited and run there).

Anyway I think I figured it out. I've created a deployment configuration that is mapped to the remote computer as I already had, and have this set to automatically upload changes. When I add a remote ssh python interpreter I changed the path mapping there (I still don't understand why I need to set this mapping twice, in two different menus) so that it points to the actual project directory on the remote. Then I untick the "automatically upload files" box in this menu (again, I still don't understand why there are two places to edit this that are contrary to each other) to disable that. Now when I edit a file the remote deployment automatically uploads the files to the right place, and when I press play the remote interpreter runs on that directory.

I guess I'm confused about the difference between remote deployment and remote interpreter, because I thought they are different words for the same thing.

Boris Galerkin fucked around with this message at 06:17 on Jun 8, 2018

QuarkJets
Sep 8, 2008

CarForumPoster posted:

Short question: When you guys write python code, do you write a function under a def and then, under if __name__ == "__main__", call the function you just wrote?

Reason:
I've been tinkering with python for a while now but am finally working on a little project (automated GUI/Web interaction using selenium or pyautogui to automate writing and uploading test reports) that needs me to write a bunch of little functions to call with a wrapper based on what I need to do that day. I wrote my code, then when I was done, indented it and wrote def funcname(): followed by if __name__ == "__main__" to call the function. I feel like I should do this for everything I write. Do you guys do it this way?

Yeah, constantly. Even for throwaway one-time-use code I will define a function and invoke it with an if __name__ == "__main__" block. It's a great habit.

Slimchandi
May 13, 2005
That finger on your temple is the barrel of my raygun

quote:

code:
 
    for customer in range(len(customerdata)):
        customernumber = str(customer + 1)  # count customer for filename
        for word in headers:  # replace each placeholder it finds in template
            if word in file_input: # placeholder from headers
                index = headers.index(word)
                replaced = file_input.replace("[[" + word + "]]",
                                              customerdata[customer][index])
                file_input = replaced

You could try using this kind of structure if Python is relatively new to you; it might help with what you want to achieve. range(len(data)) is a bit of a red flag:

code:
    for customer in customerdata:
        print(customer)
And for your customer number increment

code:
    for index, customer in enumerate(customerdata):
        print(index, customer)
And if you want your counting to start from 1 rather than zero

code:
    for index, customer in enumerate(customerdata, 1):
        print(index, customer)
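
Putting those suggestions together, a minimal sketch of the whole template job could look like the block below. It keeps everything in memory so it can be run as-is; the header names (first_name, last_name, email) are assumed from the placeholders in the template and may not match the real CSV, and fill_template and render_all are made-up helper names. One likely reason the first attempt wrote the first customer's data into every file is that file_input was overwritten in place, so no placeholders were left for later customers; starting from the untouched template for each row avoids that.

```python
import csv
import io

# Assumed sample data; the real assignment reads these from files.
TEMPLATE = "to: [[email]]\n\nDear [[first_name]] [[last_name]],\n"
CSV_TEXT = """first_name,last_name,email
James,Butt,jbutt@gmail.com
Art,Venere,art@venere.org
"""


def fill_template(template, customer):
    """Replace every [[header]] placeholder with that customer's value."""
    text = template  # the original template string is never modified
    for header, value in customer.items():
        text = text.replace("[[" + header + "]]", value)
    return text


def render_all(template, csv_file):
    """Return {filename: email text}, numbering customers from 1."""
    reader = csv.DictReader(csv_file)  # each row is a dict keyed by header
    return {
        f"email{number}.txt": fill_template(template, row)
        for number, row in enumerate(reader, 1)
    }


emails = render_all(TEMPLATE, io.StringIO(CSV_TEXT))
for filename, text in emails.items():
    print(filename)
    print(text)
```

Writing each entry of emails to disk is then a plain loop over the dict.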

Ahz
Jun 17, 2001
PUT MY CART BACK? I'M BETTER THAN THAT AND YOU! WHERE IS MY BUTLER?!

Boris Galerkin posted:

Maybe I'm confused/misunderstanding then. I thought remote development was "ssh into a remote computer, edit files directly on remote computer with vim." I was hoping to replace that with "start pycharm, open project located on remote computer, edit code directly on remote computer." In my case I can physically access both computers. Since I've already got the code and environment and everything all set up on the "remote" computer, I thought I could just use pycharm on my "local" computer to edit those files directly (and also automatically relay all python/shell commands over to it so that the code is both edited and run there).

Anyway I think I figured it out. I've created a deployment configuration that is mapped to the remote computer as I already had, and have this set to automatically upload changes. When I add a remote ssh python interpreter I changed the path mapping there (I still don't understand why I need to set this mapping twice, in two different menus) so that it points to the actual project directory on the remote. Then I untick the "automatically upload files" box in this menu (again, I still don't understand why there are two places to edit this that are contrary to each other) to disable that. Now when I edit a file the remote deployment automatically uploads the files to the right place, and when I press play the remote interpreter runs on that directory.

I guess I'm confused about the difference between remote deployment and remote interpreter, because I thought they are different words for the same thing.

Well, the remote interpreter by itself removes any need to sync with your deployment for fast, easy spin-ups. The only thing you need on the remote server is the python executable and your package dependencies installed. Run, debug, edit is all local even though your instance is running remotely.

I don't see the point in remotely syncing files and running the remote interpreter unless you're trying to debug an issue with a live instance, like debugging prod servers or something. It seems like a redundant step and a more complicated configuration to code, deploy, and then run the code remotely when you're doing active development.

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!
I agree with you and what you’re saying for normal use. In this case though, my remote server is my main workstation and the local computer I’m trying to set up is a laptop.

huhu
Feb 24, 2006
What the hell is happening here? I add a closing bracket and PyCharm just shits all over the formatting.


baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

I'm guessing it's because links should be a list of dicts and not a dict of dicts? So it's trying to parse what you've written as key:value pairs

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

That's Javascript so if you have it in braces, it's an object literal which requires a key for each value.

Or change your outer braces to brackets and make it an array literal.

Basically what they ^ said but pointing out it's JS.

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

IN MY DEFENCE YOUR HONOUR I saw it as JSON which I knew was wrong anyway but I just rolled with it :shobon:

baka kaba fucked around with this message at 17:49 on Jun 9, 2018

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I have a dumb newbie question but I feel like I'm missing something obvious.

I have a function to make QRcodes:

code:
def genqr(url="defaulturl", fname="defaultfname"):
    img = qrcode.make(url)
    img.save(fname)
    print('Generating QR code from URL: ', url, " Saved to", fname)
I have some excel data to pass to it. My intuition is to load this in a dataframe with:
df = pd.read_excel("data.xlsx"), giving:

code:
                     urls filenames  number
0  http://url.com/?n=1001  1001.jpg    1001
1  http://url.com/?n=1002  1002.jpg    1002
2  http://url.com/?n=1003  1003.jpg    1003
3  http://url.com/?n=1004  1004.jpg    1004
I want to loop through the urls generating QR codes and saving them with those file names. I think I am missing something obvious. I thought df.apply(qrgen.genqr, axis=1, urls, filenames) or something like that might work.

What do I need to be doing?

huhu
Feb 24, 2006

CarForumPoster posted:

I have a dumb newbie question but I feel like I'm missing something obvious.

I have a function to make QRcodes:

code:
def genqr(url="defaulturl", fname="defaultfname"):
    img = qrcode.make(url)
    img.save(fname)
    print('Generating QR code from URL: ', url, " Saved to", fname)
I have some excel data to pass to it. My intuition is to load this in a dataframe with:
df = pd.read_excel("data.xlsx"), giving:

code:
                     urls filenames  number
0  http://url.com/?n=1001  1001.jpg    1001
1  http://url.com/?n=1002  1002.jpg    1002
2  http://url.com/?n=1003  1003.jpg    1003
3  http://url.com/?n=1004  1004.jpg    1004
I want to loop through the urls generating QR codes and saving them with those file names. I think I am missing something obvious. I thought df.apply(qrgen.genqr, axis=1, urls, filenames) or something like that might work.

What do I need to be doing?

I don't see a loop anywhere? I can't be sure because I've not used the functions you're discussing but I think you're assuming something is iterating when it's not.

You might want to do something like (phone posting):
code:

for row in data: 
  genQR(row)

huhu fucked around with this message at 00:59 on Jun 10, 2018

CarForumPoster
Jun 26, 2013

⚡POWER⚡

huhu posted:

I don't see a loop anywhere? I can't be sure because I've not used the functions you're discussing but I think you're assuming something is iterating when it's not.

You might want to do something like (phone posting):
code:
for row in data: 
  genQR(row)

I thought df.apply was supposed to act like a loop

EDIT:

Thanks for the help. I got it!

code:
import qrgen as qr
import pandas as pd

df = pd.read_excel("cases.xlsx")

for index, row in df.iterrows():
    # .iloc makes the positional lookup explicit (first column: URL, second: file name)
    qr.genqr(row.iloc[0], row.iloc[1])

CarForumPoster fucked around with this message at 01:18 on Jun 10, 2018

SurgicalOntologist
Jun 17, 2004

You could do it with apply too, just need to set up the function so it takes a row as input.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

SurgicalOntologist posted:

You could do it with apply too, just need to set up the function so it takes a row as input.

I'm trying to learn the whole "stop using for loops" thing. Can you give me an example of it with df.apply?

SurgicalOntologist
Jun 17, 2004

code:

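import pandas as pd
import qrgen as qr


df = pd.read_excel("cases.xlsx")

def qr_from_row(row):
    # column names match the spreadsheet shown earlier: "urls" and "filenames"
    return qr.genqr(row.urls, row.filenames)

# axis='columns' passes each row to the function (axis='index' would pass each column)
df.apply(qr_from_row, axis='columns')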

Stringent
Dec 22, 2004



CarForumPoster posted:

I'm trying to learn the whole "stop using for loops" thing. Can you give me an example of it with df.apply?

For loops are good, don't learn the whole "stop using for loops" thing.

cinci zoo sniper
Mar 15, 2013




Stringent posted:

For loops are good, don't learn the whole "stop using for loops" thing.

For working with pandas, for loops are the slowest way to do things, and one of the less readable ones too. If you absolutely must loop over rows like a grandpa, use iterrows(). Normally you should just use apply() for iterative processes, and vectorize your poo poo over Pandas series, or, faster, NumPy arrays.

dougdrums
Feb 25, 2005
CLIENT REQUESTED ELECTRONIC FUNDING RECEIPT (FUNDS NOW)
I've been thinking that the for/in/if construct was the way to do a map/filter eagerly. I haven't used pandas though, and only numpy a little, I'm not sure how they structure things. Are these libraries or CPython able to do some special sort of reductions or optimization with the functional operators?

I started out writing things with generators, and applying the for/in/if to yield and receive (via send). Would I benefit from using map/filter/reduce in this case? Or is it just that the general way to use pandas or numpy is to build a "query" using functional operators? That's what finally sold me on python tbh; I started using python to replace dotnet in general a few months ago, if that gives you some idea of my perspective.
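
For reference, in plain Python the for/in/if form and map/filter express the same computation; the difference is mainly that a list comprehension is eager while map, filter and generator expressions are lazy. A small sketch with made-up values (nothing here is specific to pandas or numpy):

```python
values = range(10)

# Eager: the list comprehension builds the whole result immediately.
eager = [v * v for v in values if v % 2 == 0]

# Lazy: map/filter return iterators; nothing is computed until consumed.
lazy_map = map(lambda v: v * v, filter(lambda v: v % 2 == 0, values))

# Lazy: a generator expression is the for/in/if spelling of the same pipeline.
lazy_gen = (v * v for v in values if v % 2 == 0)

from_map = list(lazy_map)  # consuming the iterator does the work here
from_gen = list(lazy_gen)

print(eager == from_map == from_gen)
```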

Stringent
Dec 22, 2004



cinci zoo sniper posted:

For working with pandas, for loops are the slowest way to do things, and one of the less readable ones too. If you absolutely must loop over rows like a grandpa, use iterrows(). Normally you should just use apply() for iterative processes, and vectorize your poo poo over Pandas series, or, faster, NumPy arrays.

I don't use Pandas so I don't know the internals.

A common thing I deal with is new python programmers tying themselves in knots to avoid explicit for loops for no good reason. If the internals of Pandas are so interleaved with loops or other expensive operations to make looping over collections dangerous then that is certainly a consideration. But those are faults of the library and therefore not good general programming advice.

Stringent
Dec 22, 2004


That said, the guy was asking about Pandas so, mea culpa.

cinci zoo sniper
Mar 15, 2013




Stringent posted:

I don't use Pandas so I don't know the internals.

A common thing I deal with is new python programmers tying themselves in knots to avoid explicit for loops for no good reason. If the internals of Pandas are so interleaved with loops or other expensive operations to make looping over collections dangerous then that is certainly a consideration. But those are faults of the library and therefore not good general programming advice.

Maybe read before replying with your "wisdom" then?

That poster works specifically with Pandas, for which there are specialised looping means that are much more optimised. There's nothing dangerous about a for loop in pandas; it is just normally up to a few orders of magnitude slower than what the library offers, and it inhibits readability for the vast majority of people who use Pandas.

This is not to say that for loops are bad or that Pandas is ideal, just that using for loops with Pandas in the vast majority of cases is a sign of a clueless newbie.

huhu
Feb 24, 2006
Not trying to start some argument, genuinely curious... The constructs I use exclusively are for item in array, for item in enumerate (array), and for item in range. Are there any good uses of the basic for loop with i=0, i++?

SurgicalOntologist
Jun 17, 2004

Honestly, in this case I'm not sure there's an efficiency difference, since the function isn't vectorized. The loop is moved to a pandas function but not a C extension (at least I don't think so). Still, I find apply more readable than iterrows, especially when it's chained with other operations. It also encourages you to encapsulate the function which should make it testable.

I don't mind resorting to iterrows if another formulation doesn't come to me, but it's my last resort. It could be that a decade of scientific computing and avoiding loops like the plague has broken my brain, though.

SurgicalOntologist
Jun 17, 2004

huhu posted:

Not trying to start some argument, genuinely curious... The constructs I use exclusively are for item in array, for item in enumerate (array), and for item in range. Are there any good uses of the basic for loop with i=0, i++?

No. In python you should never have to manage your iteration index. Maybe in a while loop or some other place where in some sense you're managing the loop logic yourself. But 99% of the time when I do that I realize there's a better way before I finish coding the loop.
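
As a rough illustration of the point (the lists are made-up sample data), here is the same loop written with a hand-managed index and with enumerate and zip doing the bookkeeping:

```python
names = ["a.png", "b.png", "c.png"]
sizes = [10, 20, 30]

# Hand-managed index: the counter, the bound and the increment are all manual.
manual = []
i = 0
while i < len(names):
    manual.append((i, names[i], sizes[i]))
    i += 1

# Idiomatic: zip pairs the lists up, enumerate supplies the position.
idiomatic = [
    (position, name, size)
    for position, (name, size) in enumerate(zip(names, sizes))
]

print(manual == idiomatic)
```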

SurgicalOntologist fucked around with this message at 17:01 on Jun 10, 2018

CarForumPoster
Jun 26, 2013

⚡POWER⚡

SurgicalOntologist posted:

No. In python you should never have to manage your iteration index. Maybe in a while loop or some other place where in some sense you're managing the loop logic yourself. But 99% of the time when I do that I realize there's a better way before I finish coding the loop.

Do you have anything I can read (or would you share) about why this is, particularly in the case of a while loop?

For example, I wrote something that clicks a button, takes a screenshot and increments the file name, like this (I don't have the code handy). Is this bad?

code:
i = 0

try:
    while True:
        i = i + 1
        x = ____
        y = ____
        pyautogui.click(x, y)
        img = screenshot.grab(args)
        img.save('filename' + str(i) + '.png')  # str() needed: i is an int
except KeyboardInterrupt:
    print('\n')

cinci zoo sniper
Mar 15, 2013




SurgicalOntologist posted:

No. In python you should never have to manage your iteration index. Maybe in a while loop or some other place where in some sense you're managing the loop logic yourself. But 99% of the time when I do that I realize there's a better way before I finish coding the loop.

SurgicalOntologist posted:

Honestly, in this case I'm not sure there's an efficiency difference, since the function isn't vectorized. The loop is moved to a pandas function but not a C extension (at least I don't think so). Still, I find apply more readable than iterrows, especially when it's chained with other operations. It also encourages you to encapsulate the function which should make it testable.

I don't mind resorting to iterrows if another formulation doesn't come to me, but it's my last resort. It could be that a decade of scientific computing and avoiding loops like the plague has broken my brain, though.

Agreed. As a side note, often itertuples() or to_records() can be used in those edge cases to get some performance gains over iterrows(), which has overhead of making every row into a Series.
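
A small sketch of the options mentioned in this exchange, side by side. The two-row frame reuses the column names from the QR code question, and describe is a made-up stand-in for the real per-row work; this shows the shape of each approach only and makes no timing claims.

```python
import pandas as pd

df = pd.DataFrame({
    "urls": ["http://url.com/?n=1001", "http://url.com/?n=1002"],
    "filenames": ["1001.jpg", "1002.jpg"],
})


def describe(url, fname):
    """Hypothetical per-row function standing in for the real work."""
    return f"{url} -> {fname}"


# iterrows(): every row is turned into a Series first.
via_iterrows = [describe(row["urls"], row["filenames"])
                for _, row in df.iterrows()]

# itertuples(): every row is a lightweight namedtuple.
via_itertuples = [describe(row.urls, row.filenames)
                  for row in df.itertuples(index=False)]

# apply(axis="columns"): pandas runs the loop and hands over each row.
via_apply = df.apply(
    lambda row: describe(row["urls"], row["filenames"]), axis="columns"
).tolist()

# Vectorized: operate on whole columns at once, no per-row Python call.
via_vectorized = (df["urls"] + " -> " + df["filenames"]).tolist()

print(via_iterrows == via_itertuples == via_apply == via_vectorized)
```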

cinci zoo sniper
Mar 15, 2013




CarForumPoster posted:

Do you have anything I can read (or would you share) about why this is, particularly in the case of a while loop?

For example I wrote something that clicks a button, takes a screen shot and iterates the file name like: (I dont have the code handy) Is this bad?

code:
i = 0

try:
    while True:
        i = i + 1
        x = ____
        y = ____
        pyautogui.click(x, y)
        img = screenshot.grab(args)
        img.save('filename' + str(i) + '.png')  # str() needed: i is an int
except KeyboardInterrupt:
    print('\n')

This is not really wrong and slightly detached from the topic of index janitoring. Nevertheless, I'd likely rewrite try block as:

code:
for i in itertools.count():
   do_stuff()

dougdrums
Feb 25, 2005
CLIENT REQUESTED ELECTRONIC FUNDING RECEIPT (FUNDS NOW)

CarForumPoster posted:

Do you have anything I can read (or would you share) about why this is, particularly in the case of a while loop?

I believe the usual thing to do is to use range, count, or xrange like so:
code:
for i in range(100):
    print(i)
I dunno if click() blocks or something but I'd just do this assuming it does:
code:
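from functools import partial
from itertools import count

def take_screenshot(x, y, *args):
    pyautogui.click(x, y)
    return screenshot.grab(*args)

# iter(callable, sentinel) keeps calling take_screenshot until it returns None
shots = iter(partial(take_screenshot, x, y), None)
for index, image in zip(count(), shots):
    image.save(f'filename{index}.png')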
No idea if that works ... e: oh i forgot the args

dougdrums fucked around with this message at 17:34 on Jun 10, 2018

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

I think CarForumPoster's example is a whole other thing, that's just an infinite loop that happens to contain an incrementing counter. It's not actually iterating over anything like a collection or a fixed range of values, and I think that's the simplest way to achieve it?

But it's probably better to have a range instead of an infinite loop yeah (you'll run into problems eventually)
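
A minimal sketch of the bounded version, assuming a hypothetical cap of five screenshots. `max_shots` and the `shot` filename prefix are made up for illustration, and the click/grab calls are left out so it runs anywhere.

```python
import itertools

max_shots = 5  # hypothetical cap, not from the original script

# range() gives a counter that stops on its own.
names_range = [f"shot{i}.png" for i in range(1, max_shots + 1)]

# islice() puts the same cap on an otherwise endless itertools.count().
names_count = [
    f"shot{i}.png" for i in itertools.islice(itertools.count(1), max_shots)
]

print(names_range)
```

Both lists come out identical, so which one to use is mostly a matter of whether the limit is known up front.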

CarForumPoster
Jun 26, 2013

⚡POWER⚡

baka kaba posted:

I think CarForumPoster's example is a whole other thing, that's just an infinite loop that happens to contain an incrementing counter. It's not actually iterating over anything like a collection or a fixed range of values, and I think that's the simplest way to achieve it?

But it's probably better to have a range instead of an infinite loop yeah (you'll run into problems eventually)

Correct. I have to watch a zillion training videos for a new job and I am too lazy to click next and they had tests, but I appreciate the other perspectives and solutions. It's definitely a problem I encounter a lot, where I need to do it over a limited number of items in a list or something.
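
For the "limited number of items in a list" case, a minimal sketch with enumerate() might look like this. The `videos` list and the filename pattern are hypothetical.

```python
videos = ["intro", "safety", "payroll"]  # hypothetical list of items

# enumerate() pairs each item with a counter, so there is no manual i = i + 1.
filenames = [
    f"{index:02d}_{title}.png" for index, title in enumerate(videos, start=1)
]

print(filenames)
```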

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

CarForumPoster posted:

Do you have anything I can read (or would you share) about why this is, particularly in the case of a while loop?

For example, I wrote something that clicks a button, takes a screenshot, and increments the file name like this (I don't have the code handy). Is this bad?

code:
i = 0

try:
    while True:
        i = i + 1
        x = ____
        y = ____
        pyautogui.click(x, y)
        img = screenshot.grab(args)
        img.save('filename' + str(i) + '.png')  # i is an int, so convert it before concatenating
except KeyboardInterrupt:
    print('\n')

That's not the same thing as what is being talked about. What's being talked about is iterating over a collection of things.

But, I'd probably use itertools.count

Python code:
import itertools

for i in itertools.count(1):
    print(i)
print("I never happen because count goes on forever")
Simple and clear.

dougdrums
Feb 25, 2005
CLIENT REQUESTED ELECTRONIC FUNDING RECEIPT (FUNDS NOW)

cinci zoo sniper posted:

This is not to say that for loops are bad or that Pandas is ideal, just that using for loops with Pandas in the absolute majority of cases is a sign of a clueless newbie.

I don't know about pandas, but it would make sense to me to use functional methods when using numpy because that's what most of the people using numpy would be used to, outside of Python. I'd think that numpy probably does some special reductions, or can parse that info otherwise though.

CarForumPoster posted:

Correct. I have to watch a zillion training videos for a new job and I am too lazy to click next and they had tests, but I appreciate the other perspectives and solutions. It's definitely a problem I encounter a lot, where I need to do it over a limited number of items in a list or something.

I can't think of a place in python for it. You'd only need to explicitly specify an iterator variable like that if you wish for it to be modified somewhere in the loop. The only example that comes to mind is for a parser to skip ahead in a source document when the iterator is passed by pointer in C -- which is a niche case.

The essence of an iterator index is to define an order. In the 'screenshot' case here, if the order didn't matter, you could very well leave 'i' out and use a random number. If the order does matter, using something informative and already ordered -- like the current time -- would not be a bad upgrade. Using the time would mean you can leave out 'i' here too.

So, you would not need an explicit iterator count for a couple reasons: The set you're working on already has some notion of order you can go by; or you just don't care about the order anyways.
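
A minimal sketch of the "use the current time instead of i" idea. `screenshot_name` and its `prefix` argument are hypothetical names, and the actual click/grab calls are left out.

```python
from datetime import datetime

def screenshot_name(now=None, prefix="screenshot"):
    """Build a filename from a timestamp instead of a loop counter."""
    if now is None:
        now = datetime.now()
    # Year-first, zero-padded fields sort alphabetically in chronological order.
    return f"{prefix}_{now:%Y%m%d_%H%M%S_%f}.png"

# Fixed timestamps here only so the result is reproducible.
earlier = screenshot_name(datetime(2018, 6, 10, 17, 34, 0))
later = screenshot_name(datetime(2018, 6, 10, 17, 34, 1))

print(earlier)
print(earlier < later)
```

Because the names sort in the order they were taken, the files stay ordered without tracking a counter anywhere.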

cinci zoo sniper
Mar 15, 2013




dougdrums posted:

I don't know about pandas, but it would make sense to me to use functional methods when using numpy because that's what most of the people using numpy would be used to, outside of Python. I'd think that numpy probably does some special reductions, or can parse that info otherwise though.

Yeah NumPy will internally route stuff to C or Fortran code to speed things up, including hardware-specific optimisations. Pandas does the same twofold, since it has both its own C stuff and is largely built on top of NumPy (wrt data structures and such).
