Nippashish
Nov 2, 2005

Let me see you dance!
The arange and the reciprocal aren't helping with memory use either.
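To put a rough number on that: at max_n = 50,000,001, `np.arange` and `1/xs` each build an array of about 50 million elements, which on a typical 64-bit build is roughly 400 MB apiece at 8 bytes per element. Below is a minimal sketch of one way around it, assuming the goal is only the sum of 1/i. It processes the range in fixed-size chunks so only one small temporary array exists at a time. `harmonic_sum_chunked` and `chunk_size` are illustrative names, not anything from the thread.

```python
import numpy as np


def harmonic_sum_chunked(max_n, chunk_size=1_000_000):
    """Sum 1/i for i in range(1, max_n) without building one huge array.

    Only chunk_size values are materialized at a time, so peak memory stays
    roughly constant instead of growing with max_n.
    """
    total = 0.0
    for start in range(1, max_n, chunk_size):
        stop = min(start + chunk_size, max_n)
        xs = np.arange(start, stop, dtype=np.float64)  # this chunk's denominators
        total += float(np.sum(1.0 / xs))              # reduce the chunk to one float
    return total
```

A bigger `chunk_size` means fewer Python-level loop iterations but larger temporaries; around a million elements keeps each temporary at a few MB.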


Hughmoris
Apr 21, 2007
Let's go to the abyss!

Dr Subterfuge posted:

Use np.sum instead of np.cumsum. It should just return a scalar in this case, which is all you want anyway.

Awesome. This change allowed the RP3 script to complete the original 50,000,000.

Original pure python solution: 64 seconds
Numpy solution: 3 seconds.
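Why the np.sum change matters for memory: `np.cumsum` returns an array of running totals the same length as its input, while `np.sum` only returns the final total. With 50 million elements, that is one fewer full-size array. A small illustration with ten made-up reciprocals:

```python
import numpy as np

rs = 1 / np.arange(1, 11)   # reciprocals 1/1 .. 1/10

running = np.cumsum(rs)     # array of 10 partial sums: every intermediate total is kept
total = np.sum(rs)          # one scalar: just the final total

print(running.shape)                    # (10,)
print(np.isclose(running[-1], total))   # True
```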

QuarkJets
Sep 8, 2008

Dr Subterfuge posted:

I had to resist just giving a trite answer of "use numpy." Is there any other python way that comes close to the same speed?

Numba would likely be just as fast, if not faster (it takes a little time to compile the function, but then you no longer need to create huge temporary arrays), and the implementation would be adding one decorator to version A.

Spime Wrangler
Feb 23, 2003

Because we can.

Dr Subterfuge posted:

Use np.sum instead of np.cumsum. It should just return a scalar in this case, which is all you want anyway.

Yeah of course do this lol

Hughmoris
Apr 21, 2007
Let's go to the abyss!

QuarkJets posted:

Numba would likely be just as fast, if not faster (it takes a little time to compile the function, but then you no longer need to create huge temporary arrays), and the implementation would be adding one decorator to version A.

Holy moly, Numba flies. I hopped over to my main desktop to test (since it appears Numba can't run on the RP3).
Python code:
import numpy as np
from numba import jit
from timeit import timeit

max_n = 50000001


def version_numpy():
    """
    Numpy to generate reciprocals and to sum
    """
    xs = np.arange(1, max_n)
    rs = 1/xs
    x = np.sum(rs)
    print(x)

@jit
def version_numba():
    """
    Numba to generate reciprocals and to sum
    """
    x = 0.0  # start as a float so the accumulator keeps one type throughout the loop
    for i in range(1, max_n):
        x = x + (1 / i)
    print(x)

n = 20

# The first numba call includes JIT compilation time; here it is averaged over n runs.
total_numpy = timeit(version_numpy, number=n)
total_numba = timeit(version_numba, number=n)

print('avg time for numpy = {0:0.3f} seconds'.format(total_numpy/n))
print('avg time for numba = {0:0.3f} seconds'.format(total_numba/n))
code:
avg time for numpy = 0.285 seconds
avg time for numba = 0.052 seconds

Munkeymon
Aug 14, 2003

Motherfucker's got an
armor-piercing crowbar! Rigoddamndicu𝜆ous.



I think division might be done in software (a library routine) instead of being a single instruction on some ARM chips, so your desktop could have wildly different performance for that specific operation. At least for integers; dunno about floats.

SurgicalOntologist
Jun 17, 2004

This is a very specific question but maybe someone has been in this situation before.

I have a SQL database I'm managing with SQLAlchemy. Lots of tables, lots of code, using lots of different SQLAlchemy features. I'm now going to build a web frontend for the whole thing with Flask. So I'm wondering if there is anything to look out for porting from SQLAlchemy to Flask-SQLAlchemy. The base classes are different, so there's at least a chance for some shenanigans. Luckily I have extensive unit tests, so I should at least become aware of any issues, but I thought I'd ask around before I start the migration.

PBS
Sep 21, 2015

SurgicalOntologist posted:

This is a very specific question but maybe someone has been in this situation before.

I have a SQL database I'm managing with SQLAlchemy. Lots of tables, lots of code, using lots of different SQLAlchemy features. I'm now going to build a web frontend for the whole thing with Flask. So I'm wondering if there is anything to look out for porting from SQLAlchemy to Flask-SQLAlchemy. The base classes are different, so there's at least a chance for some shenanigans. Luckily I have extensive unit tests, so I should at least become aware of any issues, but I thought I'd ask around before I start the migration.

I couldn't figure out how to get cursors working with it, but I'm new to it and Python, so it's probably just me.

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost

SurgicalOntologist posted:

This is a very specific question but maybe someone has been in this situation before.

I have a SQL database I'm managing with SQLAlchemy. Lots of tables, lots of code, using lots of different SQLAlchemy features. I'm now going to build a web frontend for the whole thing with Flask. So I'm wondering if there is anything to look out for porting from SQLAlchemy to Flask-SQLAlchemy. The base classes are different, so there's at least a chance for some shenanigans. Luckily I have extensive unit tests, so I should at least become aware of any issues, but I thought I'd ask around before I start the migration.

flask-sqlalchemy is almost completely useless, just use normal sqlalchemy and like 5-15 lines of wrapper code

flask is good as hell, flask-hyphen-stuff sucks
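The post doesn't show the wrapper, so here is a minimal sketch of what "plain SQLAlchemy plus a few lines" can look like, assuming the scoped-session pattern from Flask's own documentation: one session per thread, removed when each request's app context ends. The in-memory SQLite URL and the `/ping` route are placeholders, not part of anyone's project here.

```python
from flask import Flask
from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

# Placeholder URL: point this at the database you already manage with SQLAlchemy.
engine = create_engine("sqlite:///:memory:")

# scoped_session hands out one session per thread by default, created lazily.
Session = scoped_session(sessionmaker(bind=engine))

app = Flask(__name__)


@app.teardown_appcontext
def remove_session(exception=None):
    # Close and discard this thread's session once the request's app context ends.
    Session.remove()


@app.route("/ping")
def ping():
    # Existing models and queries can use Session exactly as they do outside Flask.
    value = Session.execute(text("SELECT 1")).scalar()
    return str(value)
```

Because the models keep their ordinary declarative base, nothing in the existing codebase or its unit tests has to change to be used from Flask.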

SurgicalOntologist
Jun 17, 2004

haha ok, easy enough

SnatchRabbit
Feb 23, 2006

by sebmojo
I'm writing a function that checks over some db snapshots. First, it checks the timestamp on the snapshot. If it's older than 60 days, it checks for some tags. If it finds the tag I want, it will delete the snapshot. If not, I want it to give me a skipping message. The function seems to evaluate all my snapshots more or less correctly. The issue I am having is that the "snapshotid does not have Weekly tag, skipping" message only shows up when there is a list of other tags present, not when the tags are completely empty. Is there a relatively simple way to adapt my if statement to accommodate an empty tag list?


code:
  for snapshot in dbsnapshots:
    dbsnapshotid = snapshot['DBSnapshotIdentifier']
    dbsnapshotcreatetime = snapshot['SnapshotCreateTime']
    dbsnapshotarn = snapshot['DBSnapshotArn']
    timedifference = currentdate - dbsnapshotcreatetime
    if timedifference.days>60:
      print (dbsnapshotid + " is older than 60 days. Checking tags")
      tags = client.list_tags_for_resource(ResourceName=dbsnapshotarn)
      tags = tags['TagList']
      for tag in tags:
        if tag["Key"] == 'DBBackupFrequency' and tag["Value"] == 'Weekly':
          print ('Weekly tag found, Deleting snapshot')
        else:
          print (dbsnapshotid + ' does not have Weekly tag, skipping')
    else:
      print (dbsnapshotid + " is not older than 60 days, skipping")

M. Night Skymall
Mar 22, 2012

SnatchRabbit posted:

I'm writing a function that checks over some db snapshots. First, it checks the timestamp on the snapshot. If it's older than 60 days, it checks for some tags. If it finds the tag I want, it will delete the snapshot. If not, I want it to give me a skipping message. The function seems to evaluate all my snapshots more or less correctly. The issue I am having is that the "snapshotid does not have Weekly tag, skipping" message only shows up when there is a list of other tags present, not when the tags are completely empty. Is there a relatively simple way to adapt my if statement to accommodate an empty tag list?


I'd just add a check for an empty list like so:

code:
  for snapshot in dbsnapshots:
    dbsnapshotid = snapshot['DBSnapshotIdentifier']
    dbsnapshotcreatetime = snapshot['SnapshotCreateTime']
    dbsnapshotarn = snapshot['DBSnapshotArn']
    timedifference = currentdate - dbsnapshotcreatetime
    if timedifference.days>60:
      print (dbsnapshotid + " is older than 60 days. Checking tags")
      tags = client.list_tags_for_resource(ResourceName=dbsnapshotarn)
      tags = tags['TagList']
      for tag in tags:
        if tag["Key"] == 'DBBackupFrequency' and tag["Value"] == 'Weekly':
          print ('Weekly tag found, Deleting snapshot')
        else:
          print (dbsnapshotid + ' does not have Weekly tag, skipping')
      if not tags:
        print (dbsnapshotid + ' does not have Weekly tag, skipping')
    else:
      print (dbsnapshotid + " is not older than 60 days, skipping")

SurgicalOntologist
Jun 17, 2004

That would work, but to better reflect the logic of the operation (you want to know if the tag is there or not) I would do it something like this:
Python code:
if any(tag['Key'] == 'DBBackupFrequency' and tag['Value'] == 'Weekly' for tag in tags):
    print(f'Weekly tag found, deleting snapshot {dbsnapshotid}')
else:
    print(f'Snapshot {dbsnapshotid} does not have weekly tag, skipping')
any() will return False if the list is empty, so it's the function you're looking for here.
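A quick way to convince yourself of the empty-list behavior, using a hypothetical helper `has_weekly_tag` and made-up tag lists in the same shape as `TagList`:

```python
def has_weekly_tag(tags):
    # any() is False for an empty iterable, so an empty TagList needs no special case.
    return any(tag['Key'] == 'DBBackupFrequency' and tag['Value'] == 'Weekly' for tag in tags)


print(has_weekly_tag([]))                                  # False
print(has_weekly_tag([{'Key': 'Env', 'Value': 'prod'}]))   # False
print(has_weekly_tag([{'Key': 'Env', 'Value': 'prod'},
                      {'Key': 'DBBackupFrequency', 'Value': 'Weekly'}]))  # True
```

It also stops at the first matching tag, and it produces exactly one answer per snapshot instead of one message per tag.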

Edit: Also, I would recommend using the continue statement to skip an element in a loop. It's more readable than if/else spaghetti: the actual delete code will be nested only one level (inside the for loop) rather than under a couple of ifs.
Python code:
for snapshot in dbsnapshots:
    dbsnapshotid = snapshot['DBSnapshotIdentifier']
    dbsnapshotcreatetime = snapshot['SnapshotCreateTime']
    dbsnapshotarn = snapshot['DBSnapshotArn']

    timedifference = currentdate - dbsnapshotcreatetime
    if timedifference.days <= 60:
        print(f'{dbsnapshotid} is not older than 60 days, skipping')
        continue

    tags = client.list_tags_for_resource(ResourceName=dbsnapshotarn)['TagList']
    if not any(tag['Key'] == 'DBBackupFrequency' and tag['Value'] == 'Weekly' for tag in tags):
        print(f'Snapshot {dbsnapshotid} does not have weekly tag, skipping')
        continue

    print(f'Weekly tag found, deleting snapshot {dbsnapshotid}')
    # Do the delete here.
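To try the early-`continue` structure without touching AWS, the decision can be pulled into a function. This is a sketch: `snapshots_to_delete` is a hypothetical name, `get_tags` is a hypothetical stand-in for `client.list_tags_for_resource(...)['TagList']`, and the snapshot dicts are made-up test data using the same keys as above.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_delete(snapshots, get_tags, now, max_age_days=60):
    """Return the IDs of snapshots that are old enough AND carry the weekly tag.

    get_tags is a hypothetical callable: ARN -> list of {'Key': ..., 'Value': ...}.
    """
    to_delete = []
    for snapshot in snapshots:
        snap_id = snapshot['DBSnapshotIdentifier']

        if (now - snapshot['SnapshotCreateTime']).days <= max_age_days:
            print(f'{snap_id} is not older than {max_age_days} days, skipping')
            continue

        tags = get_tags(snapshot['DBSnapshotArn'])
        if not any(t['Key'] == 'DBBackupFrequency' and t['Value'] == 'Weekly' for t in tags):
            print(f'Snapshot {snap_id} does not have weekly tag, skipping')
            continue

        # Only snapshots that survived both guards reach this line.
        to_delete.append(snap_id)
    return to_delete


# Made-up test data with the same keys as the snapshot dicts above.
now = datetime(2018, 9, 12, tzinfo=timezone.utc)
snapshots = [
    {'DBSnapshotIdentifier': 'old-weekly', 'SnapshotCreateTime': now - timedelta(days=90), 'DBSnapshotArn': 'arn-1'},
    {'DBSnapshotIdentifier': 'old-untagged', 'SnapshotCreateTime': now - timedelta(days=90), 'DBSnapshotArn': 'arn-2'},
    {'DBSnapshotIdentifier': 'new-weekly', 'SnapshotCreateTime': now - timedelta(days=10), 'DBSnapshotArn': 'arn-3'},
]
weekly = [{'Key': 'DBBackupFrequency', 'Value': 'Weekly'}]
tag_table = {'arn-1': weekly, 'arn-2': [], 'arn-3': weekly}

result = snapshots_to_delete(snapshots, tag_table.get, now)
```

Returning the IDs instead of deleting inside the loop also makes a dry run trivial: print `result` first, delete afterwards.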
    

SurgicalOntologist fucked around with this message at 18:16 on Sep 12, 2018

SnatchRabbit
Feb 23, 2006

by sebmojo
Thanks!

Cirofren
Jun 13, 2005


Pillbug

SurgicalOntologist posted:

Edit: Also, I would recommend using the continue statement to skip an element in a loop.

This looks great and is rarely something that comes to mind. Thanks for this.

mr_package
Jun 13, 2000
When working with pathlib do you write your functions so that they assume they will receive Path objects, or do you still call Path(input_value) on input in case something is a string? Similarly, when the paths come into your program as strings (user input, databases, etc.) do you convert them to Path right away or do you save them as strings and then convert to Path when needed?

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

The caller of my functions already has to conform to the signature of my function, so I don't get wishy-washy with the types of my parameters.

If I'm working with pathlib, I take Path objects not strings.

Data Graham
Dec 28, 2009

📈📊🍪😋



In general I like to write my methods to take richer objects rather than sparser ones. It’s easier to expand them later that way (example: a method that takes an array of things to operate on rather than a single thing).

velvet milkman
Feb 13, 2012

by R. Guyovich
I pretty much exclusively work with Path objects now, including type annotations for methods, and just cast them to strings when totally necessary.
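The two approaches discussed above can be sketched side by side. This is an illustration only, not anyone's code from the thread: `count_lines` and `count_lines_any` are hypothetical names. One is strict about taking a `Path`, the other normalizes whatever it gets with `Path()` at the top. Strings are converted once at the boundary and turned back into `str` only where an API insists.

```python
import os
import tempfile
from pathlib import Path
from typing import Union


def count_lines(path: Path) -> int:
    """Strict version: the signature says Path, so callers convert before calling."""
    with path.open() as f:
        return sum(1 for _ in f)


def count_lines_any(path: Union[str, os.PathLike]) -> int:
    """Lenient version: accept str or any PathLike and normalize once."""
    return count_lines(Path(path))  # Path() also accepts an existing Path


# At a boundary (user input, a DB column), convert the string to a Path once...
user_input = "notes.txt"  # hypothetical value
p = Path(user_input)
# ...and go back to str only where some API requires one.
as_text = os.fspath(p)

# Demo with a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_text("a\nb\nc\n")
    strict_count = count_lines(sample)
    lenient_count = count_lines_any(str(sample))  # a plain string works here
```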

Tortilla Maker
Dec 13, 2005
Un Desmadre A Toda Madre
Brief: How can I iterate through a list when I need to specify the index number?

New to programming languages so please excuse bad phrasing/terminology.

I need to work with an API that limits query returns to just 50 rows of data per call. To allow for pagination, I can specify the starting row I'd like to query for:
Call 1: Start at row 0
Call 2: Start at row 50
Call 3: etc.

As I need far more than 50 rows, I need to query across many, many calls/paginations. I wrote code to loop through a few instances, and it successfully calls the API the designated number of times. However, since I'm iterating with a loop, I can't append to a dictionary, so I'm having to use a list.

I want to send the JSON return into a dataframe but I'm having trouble figuring out how to do this loop. Since it's a list, I'm having to designate the index number but I'm not sure how I can iterate through the full sequence automatically.

This is an example of how I would call each index manually from the list:
code:
vins = []

for item in json_response[0]['listings']:
    vins.append({'vin':item['vin']})

for item in json_response[1]['listings']:
    vins.append({'vin':item['vin']})
    
for item in json_response[2]['listings']:
    vins.append({'vin':item['vin']})
This is a failed attempt to loop through the index:
code:
vins = []
for item in range(len(json_response['listings'])):
    vins.append({'vin':item['vin']})
Any thoughts?

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost
enumerate is your friend

also, you don't need it

also, i would've named the toplevel var `json_responses`

code:
vins = []
for json_response in json_responses:
    vins.extend(list(map(lambda x: {'vin': x['vin']}, json_response["listings"])))
or,

code:
def process_json_response(resp):
    return list(map(lambda x: {'vin': x['vin']}, resp["listings"]))

# sum() needs [] as its start value to concatenate lists (the default start is 0)
vins = sum(map(process_json_response, json_responses), [])
not very iteratorish, i must admit

bob dobbs is dead fucked around with this message at 05:49 on Sep 16, 2018

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

In case that's not clear (since you said you're new to this)

Python code:
json_response = [
    {'listings': [
            {'vin': 1234},
            {'vin': 9999}
        ]
    },
    {'listings': [
            {'vin': 3456}
        ]
    }
]

vins = []
for item in json_response:
    for listing in item['listings']:
        vins.append({'vin': listing['vin']})

print(vins)
The first loop is grabbing each item in the response, and the second loop pulls out the 'listings' object from that item dictionary and loops over each listing. Inside that loop, you're pulling the vin from the listing and popping it into a new dictionary to add to the list you're building.

So you don't actually need to mess with indices, you take a thing and say "for thing in collection_of_things" and write your code to mess with each 'thing' in the loop body. Sometimes (like here) you need to do that again inside the loop, because 'thing' has another bunch of stuff you want to iterate over, and so on

There are way cleverer and more concise ways of doing this (like bob's examples) but I just wanted to lay out the basics just in case
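Two of those more concise options, sketched with the same sample data. The first is a nested list comprehension, which reads left to right in the same order as the two nested loops. The second is `enumerate()`, which bob mentioned, for the case where you do want the index (say, the page number).

```python
json_responses = [
    {'listings': [{'vin': 1234}, {'vin': 9999}]},
    {'listings': [{'vin': 3456}]},
]

# One comprehension, in the same order as the two nested for-loops above.
vins = [{'vin': listing['vin']}
        for response in json_responses
        for listing in response['listings']]

# If you ever do need the position, enumerate() yields (index, item) pairs.
for page_number, response in enumerate(json_responses):
    print(page_number, len(response['listings']))
```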

Tortilla Maker
Dec 13, 2005
Un Desmadre A Toda Madre
Thank you both for your responses. This was really helpful and definitely put me on the right path!

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost

Tortilla Maker posted:

Thank you both for your responses. This was really helpful and definitely put me on the right path!

this sort of manipulation is always a lot easier with a lil copied concrete example in comments somewhere or a type annotation
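A sketch of both suggestions applied to the loop above: a `TypedDict` describing the response shape, plus a concrete example in the docstring. This assumes Python 3.8+ (where `TypedDict` lives in `typing`). `Listing`, `ApiPage`, and `extract_vins` are hypothetical names, and `vin` is typed as `int` only to match the sample data posted above. Real VINs are alphanumeric, so `str` is probably right for an actual API.

```python
from typing import List, TypedDict  # TypedDict is in typing from Python 3.8


class Listing(TypedDict):
    vin: int  # int only to match the sample data; a real API likely returns str


class ApiPage(TypedDict):
    listings: List[Listing]


def extract_vins(pages: List[ApiPage]) -> List[dict]:
    """Collect {'vin': ...} dicts from every page of results.

    Example input (one dict per API call):
        [{'listings': [{'vin': 1234}, {'vin': 9999}]},
         {'listings': [{'vin': 3456}]}]
    """
    vins = []
    for page in pages:
        for listing in page['listings']:
            vins.append({'vin': listing['vin']})
    return vins
```

The annotations don't change runtime behavior, but an editor or a type checker can now tell you when you index a page with the wrong key.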

Dominoes
Sep 20, 2007

Now that Python 3.7's been out for a few months, what do you think of dataclasses?

I've been making liberal use of them; they remind me of structs in C or Rust, yet include features you'd expect in a high-level language, like default behavior for printing and equality. For many cases, I see them as a nicer way to bundle data than [named]tuples.
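A small side-by-side of the printing and equality behavior described above, using hypothetical `Point` / `PointTuple` classes:

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple


class PointTuple(NamedTuple):
    x: float
    y: float


@dataclass
class Point:
    x: float
    y: float
    tags: List[str] = field(default_factory=list)  # a mutable default needs default_factory


p = Point(1.0, 2.0)
print(p)                                    # Point(x=1.0, y=2.0, tags=[])
print(p == Point(1.0, 2.0))                 # True: generated __eq__ compares fields
print(PointTuple(1.0, 2.0) == (1.0, 2.0))   # True: a namedtuple is still a plain tuple
print(p == (1.0, 2.0))                      # False: only compares equal to the same class
p.x = 3.0                                   # fields are mutable unless @dataclass(frozen=True)
```

The last two lines are the main practical differences: a dataclass won't accidentally compare equal to an arbitrary tuple, and it is mutable by default, whereas a namedtuple is always immutable.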

cinci zoo sniper
Mar 15, 2013




Same opinion as yours, neat replacement for namedtuples.

Slimchandi
May 13, 2005
That finger on your temple is the barrel of my raygun
I'm trying to understand more about environments and how they will fit in to my use case, but I'm struggling so any help appreciated.

I've been developing apps with a GUI built in ipywidgets, so they are intended to run entirely within a notebook. I have created separate packages and host them as a zip on a local file server.

I would also like to have separate environments for each app, to control external dependencies and versions etc. Ideally I would include the env with the python files in the package.

I'm not sure whether I can either:
1) create separate envs and then choose the correct env inside jupyter notebook (preferable)

Or

2) create separate envs and then install notebook into each environment, so you have to preselect the correct env before launching notebook and each app.

Or do I completely misunderstand all of this?

QuarkJets
Sep 8, 2008

The second one. An environment describes what modules (and executables) will be available to the notebook

There may be a way to do the first but I can't imagine it so it's probably convoluted and difficult

Jose Cuervo
Aug 25, 2004
I taught a class for a colleague last week that used Jupyter notebooks. Each student was told to download the notebook from the class website, and then we went through the notebook in class and the students had cells where they had to type their own code.

One of the issues I encountered when plotting using seaborn was that not everyone had the exact same plot - i.e., the scatter plot matrix looked slightly different between students (the underlying shape etc was correct, but the presentation was different). I think this came down to the fact that not everyone was using Python 3 like I was, and not everyone had the same version of seaborn installed.

Another issue was pandas.cut() worked slightly differently for everyone because of changes between versions.

Question: Is there a standard / simple way in the Notebook to ensure that everyone uses the same version of Python, and the same version of the packages being imported?
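Separately from how the environment gets installed, one cheap guard is a first notebook cell that checks versions and tells the student what is off. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`. The pinned versions below are hypothetical placeholders, not recommendations. This only detects mismatches; it doesn't fix them.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Hypothetical pins: replace with whatever versions the notebook was written against.
EXPECTED_PYTHON = (3, 7)
EXPECTED_PACKAGES = {"pandas": "0.23.4", "seaborn": "0.9.0"}


def version_problems(expected_python, expected_packages):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    found_python = tuple(sys.version_info[:2])
    if found_python != tuple(expected_python):
        problems.append(f"Python {found_python[0]}.{found_python[1]} "
                        f"!= expected {expected_python[0]}.{expected_python[1]}")
    for name, wanted in expected_packages.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed (expected {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name} {found} != expected {wanted}")
    return problems
```

In the notebook, something like `for p in version_problems(EXPECTED_PYTHON, EXPECTED_PACKAGES): print(p)` at the top makes differences like the seaborn and `pandas.cut()` ones visible before anyone starts plotting.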

Symbolic Butt
Mar 22, 2009

(_!_)
Buglord
Installing Anaconda seems to be the best solution for this in my experience.

Loel
Jun 4, 2012

"For the Emperor."

There was a terrible noise.
There was a terrible silence.



M. Night Skymall posted:

The other problem with that code is that the exception you're using the try/except block to catch happens when you convert the input from a string into an integer, you need to move that part inside the try.

Aha, thank you :D

For my next project, I'm trying to get the top player each morning from a fantasy football site.

code:
import bs4, requests

resRB = requests.get('http://www03.myfantasyleague.com/2018/adp?COUNT=200&POS=RB&ROOKIES=0&INJURED=0&CUTOFF=5&FRANCHISES=12&IS_PPR=1&IS_KEEPER=0&IS_MOCK=0&TIME=')
resRB.raise_for_status()  # raise an error if the request failed
Round1 = bs4.BeautifulSoup(resRB.text, 'html.parser') #this is getting the website info for RBs
I've gotten this far, pulling the data from the site, but I'm not clear on how to express "grab the top guy's name".

OnceIWasAnOstrich
Jul 22, 2006

Jose Cuervo posted:

Question: Is there a standard / simple way in the Notebook to ensure that everyone uses the same version of Python, and the same version of the packages being imported?

Run a JupyterHub server and set everyone to use one consistent environment?

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

How would you explain it to someone? How do they find the top guy on that page?

Sometimes you can look at the CSS and find a handy identifier for the thing you're looking for, other times you need to start thinking about how the document is structured. Find thing, get other thing within that, etc
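A sketch of that "find thing, then the thing within it" approach with BeautifulSoup. The HTML here is entirely made up (the table class, cell classes, and player names are placeholders), so the real page's tags and classes need to be looked up in its source or the browser dev tools first.

```python
import bs4

# Hypothetical markup standing in for the real page.
html = """
<table class="report">
  <tr><th>Rank</th><th>Player</th></tr>
  <tr><td class="rank">1</td><td class="player"><a href="#">Player One</a></td></tr>
  <tr><td class="rank">2</td><td class="player"><a href="#">Player Two</a></td></tr>
</table>
"""

soup = bs4.BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="report")      # find the thing...
first_cell = table.find("td", class_="player")   # ...then the first matching thing inside it
top_name = first_cell.get_text(strip=True)
print(top_name)  # Player One
```

`find` returns the first match in document order, which is why it picks the top row here; `find_all` would return every player cell.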

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost

Symbolic Butt posted:

Installing Anaconda seems to be the best solution for this in my experience.

anaconda is de facto within continuum analytics a two-man dealie

one of those two men is quitting

take as you will

cinci zoo sniper
Mar 15, 2013




bob dobbs is dead posted:

anaconda is de facto within continuum analytics a two-man dealie

one of those two men is quitting

take as you will

Oh. Oh. :yikes:

bob dobbs is dead
Oct 8, 2017

I love peeps
Nap Ghost
less bad than it sounds because they have a lotta developers who are just ok familiar w/ it but lol

pmchem
Jan 22, 2010


bob dobbs is dead posted:

anaconda is de facto within continuum analytics a two-man dealie

one of those two men is quitting

take as you will

...but isn't that their primary product to the world? What's the story?

Jose Cuervo
Aug 25, 2004

Symbolic Butt posted:

Installing Anaconda seems to be the best solution for this in my experience.

What does this process look like? Get everyone in the class to download Anaconda on day one of the class, then...?

OnceIWasAnOstrich posted:

Run a JupyterHub server and set everyone to use one consistent environment?

The documentation for this looks like it is tailor made for this, thanks.

SurgicalOntologist
Jun 17, 2004

Jose Cuervo posted:

What does this process look like? Get everyone in the class to download Anaconda on day one of the class, then...?

You can have them copy-paste a line that installs the right versions of everything. Or give them an environment.yml file that specifies everything (although they will still have to copy paste the line that installs it).

JupyterHub is ideal, though, if you have access to a server and a bit of patience to set it up. I did a data science class in JupyterHub; it was a bit of a pain to set up authentication and everything, but it was smooth after that. That was a couple years ago, so it's probably easier now.


Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

pmchem posted:

...but isn't that their primary product to the world? What's the story?

I'm interested in this as well. I thought Anaconda was the reason they existed.

I think we used to have a Continuum guy who posted here sometimes...
