Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Sure, just wrap your whole script in a try/except and then do what you want with the captured exception.

You can get a wealth of information from sys.exc_info() and the traceback module. There's probably something useful in the inspect module as well.

(or something like that, I'm typing on mobile)
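A minimal sketch of that idea (the main() here is just a stand-in for your actual script):

```python
import sys
import traceback

def main():
    # stand-in for your actual script
    raise ValueError("something broke")

try:
    main()
except Exception:
    exc_type, exc_value, exc_tb = sys.exc_info()
    # format_exc() gives the same text Python would normally print
    details = traceback.format_exc()
    print(f"caught {exc_type.__name__}: {exc_value}")
```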

SnatchRabbit
Feb 23, 2006

by sebmojo
I'm querying some JSON data, and I've hit a point where I run into key/value pairs. I'd like to organize the query information according to a specific tag in the key/value pairs. For instance, I want each item to be categorized by its "my:environmenttag-example" key. I'm just not sure how I should write my loop to grab this tag and use it. The key/value pair I want to use looks something like this:

code:
"Tags": [
    {
        "Value": "Environment123",
        "Key": "my:environmenttag-example"
    },
    {
        "Value": "SomeOtherTag",
        "Key": "my:tag-OtherTag"
    }
]
code:
  EC2list = json.dumps(EC2list, indent=4, sort_keys=True, default=str)
  EC2list = json.loads(EC2list)
  EC2list = EC2list['Reservations'][0]['Instances']

  for n in EC2list:
    instanceid = n['InstanceId']
    environment = n['Tags']['Key']['my:environmenttag-example']['Value']
    writedatatofile()
I want the output to look something like:
InstanceID 1
Environment123

InstanceID2
Environment321

Space Kablooey
May 6, 2009


Python code:
{item['Key']: item['Value'] for item in n['Tags']}
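Dropped into the earlier loop, that looks something like this (sample data inlined from the post above; the exact key name is taken from the question):

```python
# data shaped like the post above
EC2list = [
    {
        "InstanceId": "i-0abc123",
        "Tags": [
            {"Value": "Environment123", "Key": "my:environmenttag-example"},
            {"Value": "SomeOtherTag", "Key": "my:tag-OtherTag"},
        ],
    },
]

for n in EC2list:
    instanceid = n["InstanceId"]
    # collapse the list of {"Key": ..., "Value": ...} pairs into one dict
    tags = {item["Key"]: item["Value"] for item in n["Tags"]}
    environment = tags.get("my:environmenttag-example")
    print(instanceid)
    print(environment)
```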

Mootallica
Jun 28, 2005

I might be blind because I haven't seen it discussed, but is the Python Humble Bundle worth it?


I was originally looking at the $15 tier as I don't think the other tiers would be any good for me (At best I just write small scripts to help automate things, web scraping and DB stuff) - but I remember reading earlier in the thread that Fluent Python is good and probably worth the $20?

Eela6
May 25, 2007
Shredded Hen

Mootallica posted:

I might be blind because I haven't seen it discussed, but is the Python Humble Bundle worth it?


I was originally looking at the $15 tier as I don't think the other tiers would be any good for me (At best I just write small scripts to help automate things, web scraping and DB stuff) - but I remember reading earlier in the thread that Fluent Python is good and probably worth the $20?

Fluent Python is a total steal at $20. I'm sure you can find something of worth in the rest of the bundle, too.

2nd Rate Poster
Mar 25, 2004

i started a joke

Dominoes posted:

Is it possible to override Python's built-in error handling with a third-party module import? I'd like to improve them to be more like Rust's, where it makes educated guesses about what you did wrong, and what the fix is. (eg, rather than just raise an AttributeError or NameError, point out a best guess of the attribute or variable you misspelled.)

Yes. Take a look at sys.excepthook. If you want some example code to look at, Sentry's raven library uses this to report errors.
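A toy version of that hook, just to show the mechanism (the did-you-mean heuristic here is a rough stand-in and nothing like what raven actually does):

```python
import builtins
import difflib
import sys

def friendly_excepthook(exc_type, exc_value, exc_tb):
    # for a NameError, suggest the closest builtin name
    # (toy heuristic; assumes the standard "name 'x' is not defined" message)
    if exc_type is NameError:
        bad_name = str(exc_value).split("'")[1]
        matches = difflib.get_close_matches(bad_name, dir(builtins), n=1)
        if matches:
            print(f"NameError: {bad_name!r} is not defined -- "
                  f"did you mean {matches[0]!r}?")
            return
    # everything else falls through to the default handler
    sys.__excepthook__(exc_type, exc_value, exc_tb)

sys.excepthook = friendly_excepthook
```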

Dominoes
Sep 20, 2007

Sweet

SnatchRabbit
Feb 23, 2006

by sebmojo
More JSON and key/value issues.

code:
{
    "SecurityGroups": [
        {
            "IpPermissionsEgress": [
                {
                    "IpProtocol": "-1", 
                    "PrefixListIds": [], 
                    "IpRanges": [
                        {
                            "Description": "Allow all outgoing", 
                            "CidrIp": "0.0.0.0/0"
                        }
                    ], 
                    "UserIdGroupPairs": [], 
                    "Ipv6Ranges": []
                }
            ], 
            "Description": "Web Security Group", 
            "Tags": [
                {
                    "Value": "App1", 
                    "Key": "ApplicationName"
                }, 
                {
                    "Value": "7pm - 9pm PST Daily", 
                    "Key": "MaintenanceWindow"
                }, 
                {
                    "Value": "03-09-18", 
                    "Key": "test:RequestedDate"
                }, 
                {
                    "Value": "blabla.blablabla.blbla", 
                    "Key": "DBEndpoint"
                }, 
             
                {
                    "Value": "val1", 
                    "Key": "test:FriendlyName"
                }
            ]
        }
    ]
}
The issue I'm running into is that I can return the tag, but when I try to do anything with it, like parse through the Key/Values, I get a KeyError on tag. For the life of me I can't figure out why. I have another script that does basically the same thing, but this one isn't working for some reason.

code:
client = boto3.client('ec2')
response = client.describe_security_groups()
response = json.dumps(response, indent=4, sort_keys=True, default=str)
response = json.loads(response)

for securitygroup in response["SecurityGroups"]:
  for tag in securitygroup["Tags"]:
    if tag["Key"] == 'test:FriendlyName' and tag["Value"] in ['val1', 'val2', 'val3']:
      environment = tag['Value']
      environment = str(environment)
      writetofile()
It always gives me this error:
code:
{
  "stackTrace": [
    [
      "/var/task/index.py",
      11,
      "lambda_handler",
      "for tag in securitygroup[\"Tags\"]:"
    ]
  ],
  "errorType": "KeyError",
  "errorMessage": "'Tags'"
}
The only thing I can think of is that some of the tags are crazy long, so maybe it's throwing off the JSON parse?

SnatchRabbit fucked around with this message at 23:51 on May 8, 2018

necrotic
Aug 2, 2005
I owe my brother big time for this!
There's an entry in SecurityGroups that doesn't have a Tags key, which would throw that error. You probably want to make sure the Tags key exists before reading it!

SnatchRabbit
Feb 23, 2006

by sebmojo

necrotic posted:

There's an entry in SecurityGroups that doesn't have a Tags key, which would throw that error. You probably want to make sure the Tags key exists before reading it!

Sorry, I don't follow. It wouldn't go [n]["SecurityGroups"][n]["Tags"]?

necrotic
Aug 2, 2005
I owe my brother big time for this!
That error means one of the securitygroup iterations does not have a key "Tags" defined, so the error is thrown. You basically want:

code:
for securitygroup in response["SecurityGroups"]:
    if "Tags" in securitygroup:
        for tag in securitygroup["Tags"]:
            # blah blah blah

SnatchRabbit
Feb 23, 2006

by sebmojo

necrotic posted:

That error means one of the securitygroup iterations does not have a key "Tags" defined, so the error is thrown. You basically want:

code:
for securitygroup in response["SecurityGroups"]:
    if "Tags" in securitygroup:
        for tag in securitygroup["Tags"]:
            # blah blah blah

Oh, gotcha, thank you!

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Sometimes I go:

Python code:
# returns the value of 'Tags', or an empty list if the key is missing
tags = securitygroup.get('Tags', [])

for tag in tags:
    # if there was no 'Tags' key, nothing happens here,
    # because you're iterating over an empty list
    ...
Saves an extra level of indentation.

unpacked robinhood
Feb 18, 2013

by Fluffdaddy
I have trouble structuring my code.
I have a bunch of scrapers running and adding news articles to a db.
Now I'd like to recursively (not sure that's the correct word) call my scrapers using keywords extracted from the content, to build a network graph spreading out from the initial search.

This is the kind of code I have in mind:

Python code:
initial_search = "funny shoe"
run_scrappers(initial_search)
# matching articles are now in the db
start_words = [[initial_search]]
for idx, cur_kwds in enumerate(start_words):
    for word in cur_kwds:
        docs_a = dbf.find_with_content(word)  # text search for articles containing the current keyword
        docs_b = dbf.find_with_search_term(word)  # articles yielded when searching the current keyword
        docs = docs_a + docs_b
        # docs is a list of Articles
        for d in docs:
            # compute keyword extraction from each article's text content
            # end up with a fresh list of keywords
            # run scrappers on each keyword
            # ?
All those things work separately, but I'm not quite sure how to fit them together.
In this example, 'funny shoe' yields a bunch of articles, each article contains keywords that the scrapers look for, a set of rules I'll figure out later will decide if a node should exist, and so on.
Ideally I'd have a stop condition on a maximum path length from the starting point.

unpacked robinhood fucked around with this message at 18:12 on May 10, 2018

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Phoneposting, but you probably want a function that takes a list of keywords, and what you have so far (the state). Then it does its thing with all the keywords, getting the next set and adding whatever to the state, and returning that state and the new set of keywords

So you basically call it with an empty state and your first keywords (maybe just a list of one), it does the first round and hands you back the results. Then you can call it again with the updated stuff, as many times as you need to, by calling it from a loop until you're done
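A rough sketch of that shape (every name here is hypothetical; fake_scrape and extract_keywords stand in for the real scraping and keyword extraction):

```python
def crawl_step(keywords, state):
    """One round: scrape each keyword, fold the results into state,
    and return the updated state plus the next batch of keywords."""
    new_keywords = set()
    for word in keywords:
        docs = fake_scrape(word)  # stand-in for run_scrappers + db lookups
        for doc in docs:
            state.setdefault(word, []).append(doc)
            new_keywords.update(extract_keywords(doc))
    # don't revisit keywords we've already handled
    return state, new_keywords - set(state)

# toy stand-ins so the sketch actually runs
def fake_scrape(word):
    return [f"article about {word}"]

def extract_keywords(doc):
    return {doc.split()[-1] + "s"}  # nonsense extraction, just for shape

state, keywords = {}, {"funny shoe"}
max_depth = 3  # stop condition: maximum path length from the start
for depth in range(max_depth):
    if not keywords:
        break
    state, keywords = crawl_step(keywords, state)
```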

LochNessMonster
Feb 3, 2005

I need about three fitty


I'm trying to grab the SSL certificates from a dozen different sites and check their validity. This works perfectly fine when done purely by fqdn. For some of the servers I don't know the hostname, or can't reach them directly, and can only access them through specific URLs. So https://www.company.com goes to server1, company.com/some-path goes to server2, and company.com/some-other-path goes to server3, but I don't know the direct names for servers 2 and 3 (or can't access them by hostname alone).

I'm using the socket and ssl libraries to do this, and I think socket only takes a hostname as input, not a URL. I've been looking at the requests module, and while it does SSL verification, it seems I can't retrieve the certificate as an x509 object, as it's handled by urllib3 and discarded before requests returns the requested URL.

Any ideas on how to make this work?

Lysidas
Jul 26, 2002

John Diefenbaker is a madman who thinks he's John Diefenbaker.
Pillbug

Are you absolutely certain that you are communicating with a different server when you access (e.g.) https://www.company.com/some-path vs. https://www.company.com/? That is generally not how DNS, TCP/IP, TLS/SSL, and HTTP work -- one cannot have a specific URL path served by a different publicly accessible web server, with a different certificate, from the same hostname.

When you connect to https://www.company.com/some-path, the following happens, more or less:
  1. Your system performs a DNS query for www.company.com
  2. Your TLS/SSL library connects to the appropriate IP address over port 443
  3. Your HTTP client uses the TLS-wrapped socket to request the path /some-path with hostname www.company.com
  4. The server software fulfills this request in whatever way is appropriate

(Note that SNI alters this process a bit, allowing server software to return different certificates when different hostnames are requested, but this doesn't help you. This only applies when you have multiple hostnames mapping to the same IP address, not the same hostname apparently referencing different machines.)

If you're absolutely sure that a different machine is responsible for the path https://www.company.com/some-path vs. https://www.company.com/, that doesn't mean you can directly connect to that machine to obtain its certificate. One can configure a reverse proxy like nginx to route requests for certain paths to different internal or even external servers, but in this case you are not communicating with that other server to get the content for https://www.company.com/some-path -- the server software for https://www.company.com/ is making this connection, presumably validating the other system's certificate if connecting over HTTPS, and then relaying the data to you. In this situation, you never directly connect to the other server, and even if it's publicly accessible, you have no way of knowing its hostname/IP address/anything about it.

e: and of course someone can set up https://www.company.com/some-path to redirect to a different hostname/path altogether, in which case it should be pretty trivial to parse that reply in your software to connect to that other host instead
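For what it's worth, grabbing whichever certificate is presented for a given name takes only a few lines with the stdlib; per the above, you always get the certificate of whatever terminates TLS for that hostname (the hostname below is just an example):

```python
import socket
import ssl

def get_cert(hostname, port=443):
    """Return the peer certificate (as a dict) presented for hostname.

    SNI is set from hostname, so this is the certificate the TLS
    terminator (load balancer, proxy, or origin) serves for that name.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()

# cert = get_cert("www.company.com")
# cert["notAfter"] holds the expiry date as a string
```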

Lysidas fucked around with this message at 22:34 on May 12, 2018

Mugsbaloney
Jul 11, 2012

We prefer your extinction to the loss of our job

Hi friends, hope you can help me with my homework.

I have a bunch (900) of Word documents on Sharepoint, and my boss wants me to read each document and extract features so we can do some machine learning. Normally I'd just save them locally and use docx, but I'm told this is not secure and I have to find a way of requesting the files in the script then posting them back to Sharepoint, i.e. the documents shouldn't be sitting in readable format on my hard drive.

Here's what I've got so far, which throws a 403 error:

Python code:
import requests
from requests.auth import HTTPBasicAuth

USERNAME = "MUGSBALONEY@CORP.com"
PASSWORD = "SECRET"

response = requests.get("https://foosharepointurl", auth=HTTPBasicAuth(USERNAME, PASSWORD))

print(response.status_code)

Any guidance gratefully received

Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL
It looks like Sharepoint doesn't like your authentication method. I don't know anything about Sharepoint, and there are conflicting pieces of information online about how to properly authenticate your session, but I can say that it doesn't look like basic authentication is the way to go here.

The Fool
Oct 16, 2003


Yeah, Basic Auth doesn't work for Sharepoint.


You can do user auth if you're using the Sharepoint CSOM: https://docs.microsoft.com/en-us/sharepoint/dev/sp-add-ins/complete-basic-operations-using-sharepoint-client-library-code

The "correct" way is to use an app registration and authentication token through the graph api: https://docs.microsoft.com/en-us/onedrive/developer/rest-api/ https://docs.microsoft.com/en-us/onedrive/developer/rest-api/getting-started/app-registration

edit: just realized this was the Python thread; here's something Python-specific that may help: https://github.com/vgrem/Office365-REST-Python-Client

The Fool fucked around with this message at 18:23 on May 14, 2018

LochNessMonster
Feb 3, 2005

I need about three fitty


Lysidas posted:

Are you absolutely certain that you are communicating with a different server when you access (e.g.) https://www.company.com/some-path vs. https://www.company.com/? That is generally not how DNS, TCP/IP, TLS/SSL, and HTTP work -- one cannot have a specific URL path served by a different publicly accessible web server, with a different certificate, from the same hostname.


Thanks for the extensive reply. What's happening is that https://www.company.com is hosted/maintained by a different department, and all traffic on https://www.company.com/some-path is redirected to my docker cluster. In front of my docker cluster there are some nginx proxies that receive all the /some-path requests from F5 loadbalancers set up and maintained by a completely different part of the organization.

I'm not sure how the traffic gets there, but it gets there. The dirty part is that the whole site is using wildcard certs (*.company.com, yes I know it's stupid as gently caress to do this), so each server has its own keypair and certificate. Their SAN values don't contain the real hostnames, so I can't find out the real hosts behind it either.

I'm not sure which certificates I get presented when accessing /some-path, whether that's from the F5s or from the servers hosting the subsite.

So back to my question, is there any way to get the SSL certificates from the server hosting a specific URL or will I always get the certificate from the loadbalancers/proxies?

Mugsbaloney
Jul 11, 2012

We prefer your extinction to the loss of our job

The Fool posted:

Yeah, Basic Auth doesn't work for Sharepoint.



Thanks for the help (you too Dr), will give this a bash tomorrow!

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Kenneth Reitz did a talk on Pipenv that might be useful for anyone wondering why they should use Pipenv.

https://www.youtube.com/watch?v=GBQAKldqgZs

Dominoes
Sep 20, 2007

Here's a related Reddit thread, peppered with aggression.

Dominoes fucked around with this message at 22:53 on May 14, 2018

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Like I replied on that thread earlier today, that whole thing is weird nerdery.

Dominoes
Sep 20, 2007

I'm firefrommoonlight there; some dude got passive-agg with me for no reason, KR took things personally, and the Pendulum/Poetry guy threw in some smugness.

Dominoes fucked around with this message at 01:17 on May 15, 2018

The Fool
Oct 16, 2003


jonwayne is a pro though

Hughmoris
Apr 21, 2007
Let's go to the abyss!
Also on Reddit is a clean list of Pycon 2018 talks.

Master_Odin
Apr 15, 2010

My spear never misses its mark...

ladies
The main takeaway I got from that whole thing is that Python dependency management is becoming increasingly like Go's was, with multiple competing "standards" (for Python, we've got setup.py, requirements.txt, Pipfile, pyproject.toml, setup.cfg, MANIFEST.in, ...) and a bunch of churn over "the way forward", especially given that some are specified in accepted PEPs (pyproject.toml) while others are de facto standards (Pipfile) based on usage in the wild.

Like, is the suggested solution right now to have a Pipfile and a pyproject.toml and then keep them manually synced for publishing/development, even though they're largely going to contain the same thing? (And I fail to see why we need three files, considering the success of yarn/npm/cargo with a two-file format.)

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

I think the way forward is too much in flux right now.

But, I think if you choose the most popular one (pipenv), no matter what happens you will end up being fine.

Like there will probably be tools to convert your Pipfile to FuturePipfileReplacement. Actually, if whatever you use can export a requirements.txt you can probably move between all of the solutions without much effort.

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

I really like how pipenv manages your virtualenvs for you.

I really do not like how slow it is at locking dependencies.

Dominoes
Sep 20, 2007

Thermopyle posted:

I really like how pipenv manages your virtualenvs for you.

I really do not like how slow it is at locking dependencies.
Same on both, as I posted in the reddit thread. It's been a gamechanger for me.

SurgicalOntologist
Jun 17, 2004

I think what's often missing in these discussions is the difference between applications and libraries. I didn't read the whole reddit thread but KR does mention that pipenv is only for applications, I'm just not sure it got much attention.

This reminded me that I've never used pipenv and though I usually write library code I have one or two applications that I could port. I currently use a conda requirements file for those. Should I switch?

SurgicalOntologist fucked around with this message at 02:48 on May 15, 2018

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

SurgicalOntologist posted:

This reminded me that I've never used pipenv and though I usually write library code I have one or two applications that I could port. I currently use a conda requirements file for those. Should I switch?

Well, it's easier I guess.

Just try it out. You're not committing yourself to anything and it'll take just a few minutes to set it up.

Dominoes
Sep 20, 2007

SurgicalOntologist posted:

Should I switch?
You should try it.

Bash code:
pip install pipenv
cd ~/myproject
pipenv install
Then you can

Bash code:
pipenv shell
from inside your proj directory to activate the environment

or
Bash code:
pipenv run command
to run a single command.

SurgicalOntologist
Jun 17, 2004

I see. I guess the reason I've never bothered with something like that is that I never navigate to the project directory to run a CLI application. That doesn't make sense to me: the source code is one thing, but I'm usually using the application to interact with files somewhere else (e.g. a data store), or not interacting with files at all, and I just won't leave my home directory. So with conda my environments are set up by name rather than by path, which fits my workflow. Am I weird for not wanting to navigate to my source code to run it?

All that said, it does look simple, and you're right Thermopyle it's low cost low risk, and there's no harm in one more tool in my belt, so I'll at least give it a shot.

While we're on the topic I'll mention a couple things I like about conda (both of which clearly have pros and cons). It can handle non-Python dependencies, which has come in handy several times. And if you install the same package version in multiple environments, it hard-links them if your file system supports it, saving space.

Master_Odin
Apr 15, 2010

My spear never misses its mark...

ladies

Thermopyle posted:

I think the way forward is too in flux right now.

But, I think if you choose the most popular one (pipenv), no matter what happens you will end up being fine.

Like there will probably be tools to convert your Pipfile to FuturePipfileReplacement. Actually, if whatever you use can export a requirements.txt you can probably move between all of the solutions without much effort.
While I agree in theory that tying oneself to the most popular one will generally be fine, after looking through poetry I think it's probably taking the better approach: rewriting what pip and pipenv do into one package with one single format that's used for installation, building, and packaging (instead of just the first two, as with Pipenv, which leaves pip to do the third totally separately). It also doesn't help that Pipenv literally advertises itself as THE "Python packaging tool" when it's more of an installation/dependency-management system.

SurgicalOntologist posted:

I think what's often missing in these discussions is the difference between applications and libraries. I didn't read the whole reddit thread but KR does mention that pipenv is only for applications, I'm just not sure it got much attention.
Except what about developing libraries? I think it's more that including/excluding setup.py should be determined by whether you're writing an application or a library (exclude on the former, include on the latter), but you should probably always have a Pipfile and Pipfile.lock for ease of development? KR has certainly included them in requests, at the very least.

SurgicalOntologist
Jun 17, 2004

Master_Odin posted:

Except what about developing libraries? I think it's more that including/excluding setup.py should be determined by whether you're writing an application or a library (exclude on the former, include on the latter), but you should probably always have a Pipfile and Pipfile.lock for ease of development? KR has certainly included them in requests, at the very least.

Genuinely asking, what are the benefits for ease of development? Once I have setup.py, if I want to develop on a new machine I just run pip install -e . and I'm ready to go.

Also, exclude setup.py on apps? My approach is the opposite: I always include setup.py, but I also include a file that specifies concrete dependencies for apps. setup.py, for example, will create entry points for me and let me import app functions for development.

SurgicalOntologist fucked around with this message at 04:33 on May 15, 2018

Master_Odin
Apr 15, 2010

My spear never misses its mark...

ladies

SurgicalOntologist posted:

Genuinely asking, what are the benefits for ease of development? Once I have setup.py, if I want to develop on a new machine I just run pip install -e . and I'm ready to go.
The answer to that probably depends on your use of virtualenv/pipenv.

vikingstrike
Sep 23, 2007

whats happening, captain

Dominoes posted:

You should try it.

Bash code:
pip install pipenv
cd ~/myproject
pipenv install
Then you can

Bash code:
pipenv shell
from inside your proj directory to activate the environment

or
Bash code:
pipenv run command
to run a single command.

I'm a conda user too who would be up for trying something new that's supported more generally by the language. A couple of questions:

- To install packages, would I just use pip once the environment is activated?
- What would be the easiest way to install MKL-compiled packages like numpy? The ease of this is one of the things that attracted me to conda

Not really related, but has anyone used the Intel python distribution? How does it differ from the packages I install through conda? I’ve seen reports online of where it’s quicker but I feel like I’m missing something.
