accipter
Sep 12, 2003

Mrenda posted:

Is there a particular setup I should have for developing with Python on windows?

I would recommend a miniconda installation.

https://conda.io/miniconda.html


SurgicalOntologist
Jun 17, 2004

Boris Galerkin posted:

I'm trying to play around with __getattr__ and __setattr__ so that I can do something like the following:

code:
class MyClass:
    def __init__(self):
        self._whatever = OtherClass()

    def __getattr__(self, k):
        # if self._whatever (OtherClass) has k then return self._whatever.k
        # elif self has k then return self.k
        # else raise error

    def __setattr__(self, k, v):
        # if self._whatever has k then set self._whatever.k = v
        # else self.k = v if and only if self._whatever does not have k
I am getting recursion errors the moment I add the __setattr__ method and no amount of searching has given me a clear answer on how to fix this.

I found this stackoverflow post and it's basically the same problem I'm having. The help responses given there are extremely cryptic and not helpful at all. One person wrote some paragraphs explaining what was happening, but nobody ever actually answered the question.


e: I would like to avoid implementing @property and @setters for everything.

You need to avoid calling __setattr__ again in __setattr__... which is what will happen if you have
code:
self._whatever = k
Instead you could try
code:
super().__setattr__('_whatever', v)
or
code:
self.__dict__['_whatever'] = v
I forget which is more recommended. You should do the analogous thing in __getattr__ for your fallback behavior.
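
Something like this minimal sketch, for instance (the class and attribute names are just illustrative):
Python code:
class OtherClass:
    def __init__(self):
        self.x = 1


class MyClass:
    def __init__(self):
        # Bypass our own __setattr__ so nothing recurses before _whatever exists.
        super().__setattr__('_whatever', OtherClass())

    def __getattr__(self, k):
        # Only called when normal lookup on self fails, so just defer to the
        # wrapped object and let it raise AttributeError if k isn't there either.
        return getattr(self._whatever, k)

    def __setattr__(self, k, v):
        if hasattr(self._whatever, k):
            setattr(self._whatever, k, v)
        else:
            super().__setattr__(k, v)


m = MyClass()
m.x = 5        # lands on the wrapped OtherClass
m.y = 'mine'   # lands on MyClass itself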

NtotheTC
Dec 31, 2007


Mrenda posted:

Is there a particular setup I should have for developing with Python on windows?

Is developing on windows a requirement or a preference?

Also: Web development or desktop app or machine learning etc?

NtotheTC fucked around with this message at 15:21 on Jul 6, 2018

NtotheTC
Dec 31, 2007


Quote is not edit

NtotheTC fucked around with this message at 15:20 on Jul 6, 2018

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

TBH, just a vanilla Python Windows install works pretty well nowadays.

5 years ago it was bad. 2 years ago it was kinda frustrating. Now it's OK.

cinci zoo sniper
Mar 15, 2013




I second Anaconda, I just normally spring for the full sized distro since it just works (tm).

Mrenda
Mar 14, 2012

NtotheTC posted:

Is developing on windows a requirement or a preference?

Also: Web development or desktop app or machine learning etc?

I have python setup in a Linux VM where I'm playing around with Flask, but I want to be able to use it for stuff on Windows too rather than switching about in languages. I've done a few MOOC courses but never really made/finished anything, so I'm just looking to make a lil' program for Win10 that pulls weather info from an API I've yet to find, stores it and displays it.

NtotheTC
Dec 31, 2007


Mrenda posted:

I have python setup in a Linux VM where I'm playing around with Flask, but I want to be able to use it for stuff on Windows too rather than switching about in languages. I've done a few MOOC courses but never really made/finished anything, so I'm just looking to make a lil' program for Win10 that pulls weather info from an API I've yet to find, stores it and displays it.

Fair enough - I'm a big proponent of developing on the target OS, so personally I'd install the latest Python 3 binary from python.org and then use pipenv to manage your Python requirements. I don't recommend trying to go through Git Bash or Bash for Windows; if you want a POSIX environment I'd just develop on native Linux.

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

cinci zoo sniper posted:

I second Anaconda, I just normally spring for the full sized distro since it just works (tm).

Unless you're using PowerShell aka the standard terminal in Windows 10 (it's free software)

Finally getting around to maybe thinking about how to implement activate and deactivate after 4 years

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

Oh man I remember being super irritated about that a long time ago. Lol that they're just now doing something about it.

breaks
May 12, 2001

Boris Galerkin posted:

I'm trying to play around with __getattr__ and __setattr__ so that I can do something like the following:

code:
    def __getattr__(self, k):
        # if self._whatever (OtherClass) has k then return self._whatever.k
        # elif self has k then return self.k
        # else raise error
e: I would like to avoid implementing @property and @setters for everything.

It was already mentioned but inside __getattr__ you need to call super().__getattr__ (pre-edit I said __getattribute__ here but I'm pretty sure that's incorrect - it's been a while though) to perform attribute access on self to avoid recursion. Similarly with super().__setattr__ if you want to set something on self in __setattr__. (I think super() is ok but it might be necessary to call them on object directly? I don't remember) Don't dig in the __dict__ directly unless you're absolutely sure you want to bypass basically all normal attribute access behavior.

Another thing to keep in mind is that __getattr__ is a fallback that only gets called when __getattribute__ fails (there is a little more to it than that, reference the Python Data Model document). Absent any shenanigans that means that the name wasn't found in the instance or class. So, checking for the name on self in __getattr__ does nothing.

If you're proxying a known thing and it's really just an issue of trying to avoid writing out 6 lines of getter and setter over and over, consider writing a factory function for your properties or your own property-like descriptor. Using a more narrowly targeted hook will save you headaches.
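
For the factory-function route, a rough sketch (the names are made up; it assumes the proxied object lives on self._whatever):
Python code:
def proxy_attr(name):
    # Build a property that forwards one attribute to self._whatever.
    def getter(self):
        return getattr(self._whatever, name)

    def setter(self, value):
        setattr(self._whatever, name, value)

    return property(getter, setter)


class MyClass:
    # One line per proxied attribute instead of a full getter/setter pair each.
    x = proxy_attr('x')
    y = proxy_attr('y')

    def __init__(self, wrapped):
        self._whatever = wrapped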

breaks fucked around with this message at 00:19 on Jul 7, 2018

Mrenda
Mar 14, 2012
I've actually managed to do something and it was easier than I thought despite there being a few hurdles. I'm now getting weather data from an API, saving it to an SQLite db and showing the temperature when it gets the data. The next step is putting this all in a gui/graphical format. What's best for that? (and has decent documentation/SO resources.) I'm trying to use as much standard Python as I can so TkInter sounds best?

The other thing I have to do is stop using a loop with a time.sleep() to get the current weather info every hour - maybe check a time delta instead, so I don't have the whole program waiting just for that.

Bruegels Fuckbooks
Sep 14, 2004

Now, listen - I know the two of you are very different from each other in a lot of ways, but you have to understand that as far as Grandpa's concerned, you're both pieces of shit! Yeah. I can prove it mathematically.

Mrenda posted:

I've actually managed to do something and it was easier than I thought despite there being a few hurdles. I'm now getting weather data from an API, saving it to an SQLite db and showing the temperature when it gets the data. The next step is putting this all in a gui/graphical format. What's best for that? (and has decent documentation/SO resources.) I'm trying to use as much standard Python as I can so TkInter sounds best?

The other thing I have to do is stop using a loop with a time.sleep() to get the current weather info every hour - maybe check a time delta instead, so I don't have the whole program waiting just for that.

You're probably better off using the timer object for updating - you're going to need to use it if you want to use a gui because you won't want to block the main thread. I don't have a strong opinion on python UI frameworks because I've never used python for UI commercially, but qt worked fine for me in a hobby project before.
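
If Tkinter ends up being the choice, its built-in after() method handles the hourly update without blocking the GUI; a minimal sketch (fetch_weather is a hypothetical stand-in for the API call):
Python code:
import tkinter as tk


def fetch_weather():
    # Hypothetical stand-in: call the weather API and return the temperature.
    return "21 C"


root = tk.Tk()
label = tk.Label(root, text="waiting...")
label.pack()


def update_weather():
    label.config(text=fetch_weather())
    # Re-schedule ourselves; the GUI event loop keeps running in between.
    root.after(60 * 60 * 1000, update_weather)  # one hour, in milliseconds


update_weather()
root.mainloop()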

Mrenda
Mar 14, 2012

Bruegels Fuckbooks posted:

You're probably better off using the timer object for updating - you're going to need to use it if you want to use a gui because you won't want to block the main thread. I don't have a strong opinion on python UI frameworks because I've never used python for UI commercially, but qt worked fine for me in a hobby project before.

I went with TkInter as the default and it had enough things built in to update/call functions after a certain time so it's now all good.



This is the first time I've made anything, even an extremely simple anything, that I have use for myself.

CarForumPoster
Jun 26, 2013

⚡POWER⚡
If anyone ever wants a really simple GUI, particularly to make a form for taking arguments and running a script, big shout out to Gooey. Tutorial here:

http://pbpython.com/pandas-gui.html

I used this and a Python-to-exe maker to build some little tools that helped coworkers who are scared of the command line.
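
From memory, the basic shape of a Gooey script looks something like this - treat the details as an assumption and check the tutorial above, but the idea is that the decorator turns an argparse-style parser into a form:
Python code:
from gooey import Gooey, GooeyParser


@Gooey(program_name="Report Cruncher")  # hypothetical tool name
def main():
    parser = GooeyParser(description="Run the script from a form instead of flags")
    parser.add_argument("input_file", widget="FileChooser", help="File to process")
    parser.add_argument("--verbose", action="store_true", help="Chatty output")
    args = parser.parse_args()
    print(args.input_file, args.verbose)


if __name__ == "__main__":
    main()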

Jose Cuervo
Aug 25, 2004
I have a Jupyter notebook where I have a set of 3 subplots which share an x-axis. The x-axis is time in weeks, and I would like to make the plot scrollable (ideally) with two buttons at the bottom - a forward button which increases the upper and lower bound on the x-axis by 4 weeks, and a backward button which decreases the upper and lower bound on the x-axis by 4 weeks.



This is the code I use right now to generate a static image in the notebook cell:
Python code:
def plot_history(df, months_back):
    # df is taken as a parameter here; assigning to df inside the function
    # while relying on a global would raise UnboundLocalError.
    df.index = df['Date']
    df = df.sort_values(by='Date')
    most_recent_date = df.iloc[df.shape[0] - 1]['Date']
    if most_recent_date.day == 1:
        start_date = most_recent_date - pd.tseries.offsets.MonthBegin(months_back)
    else:
        start_date = most_recent_date - pd.tseries.offsets.MonthBegin(months_back + 1)
    if start_date.dayofweek != 0:
        start_date -= pd.tseries.offsets.Week(weekday=0)
    df = df[start_date:most_recent_date]
        
    # Plots
    height_ratios = [1.25, 3, 3]
    fig, subplot = plt.subplots(nrows=len(height_ratios), ncols=1, 
                                figsize=(20, 15), sharex='all',
                                gridspec_kw={'height_ratios': height_ratios})

    plot_subplot_0(subplot[0], df)
    plot_subplot_1(subplot[1], df)
    plot_subplot_2(subplot[2], df)

    set_x_axis_attributes(start_date, most_recent_date)

    plt.tight_layout()
    plt.show()
    plt.close()


def set_x_axis_attributes(start_date, most_recent_date):
    """
    Sets
    1. The limits on the x-axis,
    2. Formats the ticks so that they occur every Sunday, and
    3. Formats the tick labels to only show the year when relevant.
    """
    plt.xlim(start_date - pd.tseries.offsets.Day(1), most_recent_date + pd.tseries.offsets.Week(weekday=0))
    locs, labels = [], []
    date = start_date
    year = start_date.year
    while date < most_recent_date + pd.tseries.offsets.Week(weekday=6):
        locs.append(date)
        labels.append(date.strftime('%b-%d'))
        date += pd.tseries.offsets.Week(1)
        if date.year > year:
            labels[-1] += ('\n%i' % year)
        year = date.year

    # Show the year for the last tick label
    date -= pd.tseries.offsets.Week(1)
    labels[-1] += date.strftime('\n%Y')
    
    plt.xticks(locs, labels, fontsize=15)
I think I can use ipywidgets to accomplish this, but I am not sure how to structure the code to achieve what I want. Any pointers appreciated.

Business
Feb 6, 2007

I don't know quite how to ask for what I'm looking for because I know very little about actual software development even if I've done some programming.

So I wrote some scripts that manipulate text in certain ways and output certain strings into a new text file. Custom, very weird use-case NLP tasks. These more or less work for my purposes, but my bigger goals are to (1) collect all of these into a command line program that would just be easier for me (and ideally others) to use, and (2) get the whole shebang online for people to use.
Can anyone direct me towards some resources to help me think through how to do (1)? (2) is a huge stretch goal, but ideally I would be able to do (1) in a way that would set me up for success on (2). I'm looking for some explanations of how to go from a folder of .py files that do little tasks to an organized piece of software that I can work on, add features to, etc. I want to have a sense of best practices as my lil scripts get more complicated over time. There must be a name for what I'm trying to get at here, but I don't even know.

Ghost of Reagan Past
Oct 7, 2003

rock and roll fun

Business posted:

I don't know quite how to ask for what I'm looking for because I know very little about actual software development even if I've done some programming.

So I wrote some scripts that manipulate text in certain ways and output certain strings into a new text file. Custom, very weird use-case NLP tasks. These more or less work for my purposes, but my bigger goals are to (1) collect all of these into a command line program that would just be easier for me (and ideally others) to use, and (2) get the whole shebang online for people to use.
Can anyone direct me towards some resources to help me think through how to do (1)? (2) is a huge stretch goal, but ideally I would be able to do (1) in a way that would set me up for success on (2). I'm looking for some explanations of how to go from a folder of .py files that do little tasks to an organized piece of software that I can work on, add features to, etc. I want to have a sense of best practices as my lil scripts get more complicated over time. There must be a name for what I'm trying to get at here, but I don't even know.
Try Click. It's very nice to work with, kind of guides you into writing good user interfaces, and has really extensive documentation that hits on some design questions. You should refactor your individual scripts into functions and classes you can import into other files, and then write a CLI app around your various tools using Click. So you might have a script right now that tags parts of speech and prints the tree (or whatever). You might then refactor that into a function that parses the sentence and a function that prints the tree. Then write a CLI app with Click that you use like nlp tagger "this is a sentence" --tree, but then you can write an option for the app nlp tagger "this is a sentence" --table that prints a table.

I have not tried it at all so I can't vouch for it, but Python Fire might be useful for your purposes?
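
A rough sketch of what the Click layout above could look like (the tagging/printing helpers below are trivial stand-ins for the real NLP code):
Python code:
import click


def tag_parts_of_speech(sentence):
    # Stand-in for the real NLP code: pretend every word is a noun.
    return [(word, "NOUN") for word in sentence.split()]


@click.group()
def nlp():
    """Top-level command; each sub-command wraps one of the old scripts."""


@nlp.command()
@click.argument("sentence")
@click.option("--tree/--table", default=True,
              help="Print the tags as an indented tree (default) or as a table.")
def tagger(sentence, tree):
    tagged = tag_parts_of_speech(sentence)
    for word, tag in tagged:
        if tree:
            click.echo(word + "\n  " + tag)
        else:
            click.echo(word + "\t" + tag)


if __name__ == "__main__":
    nlp()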

NtotheTC
Dec 31, 2007


Seconding Click - using it to create neat command-line apps is one of life's purest pleasures, and I wish I could make it my full-time job somehow.

Business
Feb 6, 2007

Ghost of Reagan Past posted:

Try Click. It's very nice to work with, kind of guides you into writing good user interfaces, and has really extensive documentation that hits on some design questions. You should refactor your individual scripts into functions and classes you can import into other files, and then write a CLI app around your various tools using Click. So you might have a script right now that tags parts of speech and prints the tree (or whatever). You might then refactor that into a function that parses the sentence and a function that prints the tree. Then write a CLI app with Click that you use like nlp tagger "this is a sentence" --tree, but then you can write an option for the app nlp tagger "this is a sentence" --table that prints a table.

I have not tried it at all so I can't vouch for it, but Python Fire might be useful for your purposes?

Started learning the click stuff today, the documentation is really good. Much appreciated!

porksmash
Sep 30, 2008
In the python docs for glob, it's defined like so:

code:
glob.glob(pathname, *, recursive=False)
What does that asterisk after pathname mean? The glob function only takes 1 positional argument.

Eela6
May 25, 2007
Shredded Hen

porksmash posted:

In the python docs for glob, it's defined like so:

code:
glob.glob(pathname, *, recursive=False)
What does that asterisk after pathname mean? The glob function only takes 1 positional argument.

The * means that's the end of positional arguments. recursive is a keyword-only argument. You can call it with glob.glob(somepath, recursive=True) if you'd like.
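
A tiny illustration (not the actual glob source, just the same signature shape):
Python code:
def glob_like(pathname, *, recursive=False):
    return pathname, recursive


glob_like("*.py")                   # fine
glob_like("*.py", recursive=True)   # fine
# glob_like("*.py", True)           # TypeError: takes 1 positional argument but 2 were given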

Boris Galerkin
Dec 17, 2011

I don't understand why I can't harass people online. Seriously, somebody please explain why I shouldn't be allowed to stalk others on social media!

SurgicalOntologist posted:

You need to avoid calling __setattr__ again in __setattr__... which is what will happen if you have
code:
self._whatever = k
Instead you could try
code:
super().__setattr__('_whatever', v)
or
code:
self.__dict__['_whatever'] = v
I forget which is more recommended. You should do the analogous thing in __getattr__ for your fallback behavior.

breaks posted:

It was already mentioned but inside __getattr__ you need to call super().__getattr__ (pre-edit I said __getattribute__ here but I'm pretty sure that's incorrect - it's been a while though) to perform attribute access on self to avoid recursion. Similarly with super().__setattr__ if you want to set something on self in __setattr__. (I think super() is ok but it might be necessary to call them on object directly? I don't remember) Don't dig in the __dict__ directly unless you're absolutely sure you want to bypass basically all normal attribute access behavior.

Another thing to keep in mind is that __getattr__ is a fallback that only gets called when __getattribute__ fails (there is a little more to it than that, reference the Python Data Model document). Absent any shenanigans that means that the name wasn't found in the instance or class. So, checking for the name on self in __getattr__ does nothing.

If you're proxying a known thing and it's really just an issue of trying to avoid writing out 6 lines of getter and setter over and over, consider writing a factory function for your properties or your own property-like descriptor. Using a more narrowly targeted hook will save you headaches.

Thanks, I’ve got it working now.


Random unrelated question: when I write a function with *args and keyword arguments, is it supposed to go like this

code:
def func(a, b, *args, c=None):
    ...
Or like this

code:
def func(a, b, c=None, *args):
    ...
I’m fairly sure that python3 doesn’t complain about this either way but I would guess the first one is more correct?

Mad Jaqk
Jun 2, 2013

Boris Galerkin posted:

Random unrelated question: when I write a function with *args and keyword arguments, is it supposed to go like this

code:
def func(a, b, *args, c=None):
    ...
Or like this

code:
def func(a, b, c=None, *args):
    ...
I’m fairly sure that python3 doesn’t complain about this either way but I would guess the first one is more correct?

It depends on if you want c to be a keyword-only argument or not. Should a call of func("Hey", "you", "guys") be interpreted as c="guys" or args=("guys",)?
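
To make that concrete (toy functions only):
Python code:
def kw_only(a, b, *args, c=None):
    return args, c


def positional_too(a, b, c=None, *args):
    return args, c


print(kw_only("Hey", "you", "guys"))         # (('guys',), None)  -> c stays None
print(positional_too("Hey", "you", "guys"))  # ((), 'guys')       -> c picks up 'guys'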

CarForumPoster
Jun 26, 2013

⚡POWER⚡
I would like to make a pandas dataframe out of a JSON file containing scraped court data. I'd like all the data to appear in columns but there's lots of nesting and I can't figure out how to get it in a nice big table like I'd like.

I thought the way to go would be something like:
code:
import pandas as pd
import json
from pandas.io.json import json_normalize

with open(r'data.json') as json_data:
    d = json.load(json_data)

data = json_normalize(d)
df = pd.DataFrame.from_dict(data, orient='columns')
df.head()

cases =  pd.read_json( (df['caselink']).to_json(), orient='index')
cases.head()

But that just gives me a 1 x 4000 df and doesn't unnest the data.

csvkit can give me a CSV output, which is easier, but it requires a ton of reorganizing and cleanup. I figured a custom solution might be in order.

Data sample: (Very long so I pastebinned)
https://pastebin.com/PhDkrCME

cinci zoo sniper
Mar 15, 2013




You won't be able to trivially map a decently hierarchical JSON document to a table or set of tables. I had to solve a similar problem at work, and my final solution (naturally a temporary quick-and-dirty one, to be rewritten better at the time) was a MongoDB script that does some server-side filtering and removes unnecessary fields, plus a long-ish Python script with functions to read what I need and write it into pandas. On the Python side I iterated over individual JSON documents. The process was good enough to comb through a couple gigabytes of JSON poo poo reasonably quickly.

Edit: index orient is probably not what you want in any case.

cinci zoo sniper fucked around with this message at 16:30 on Jul 14, 2018

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Someone who knows what they're doing will be along soon, but looking at it your issue might be the fact (I think, I'm on a phone here) that you're passing it an array wrapped in an object, so it looks like one item containing one value (your array structure)

Try making your array the top level instead (remove the outer { } and get rid of the array's name key). Or maybe parse the JSON and pluck out the array object to pass to pandas for your nefarious needs

cinci zoo sniper
Mar 15, 2013




*coughs* This is literally my job, getting meaningful things out of JSON and XML piles.

CarForumPoster
Jun 26, 2013

⚡POWER⚡

cinci zoo sniper posted:

You won't be able to trivially map a decently hierarchical JSON document to a table or set of tables. I had to solve a similar problem at work, and my final solution (naturally a temporary quick-and-dirty one, to be rewritten better at the time) was a MongoDB script that does some server-side filtering and removes unnecessary fields, plus a long-ish Python script with functions to read what I need and write it into pandas. On the Python side I iterated over individual JSON documents. The process was good enough to comb through a couple gigabytes of JSON poo poo reasonably quickly.

Edit: index orient is probably not what you want in any case.

I don't mind some time investment in solving this issue. I will need to do it weekly or daily, and the data will always be formatted like this. What's the best way to solve it, and do you have a link to learn more about it? I have no experience with MongoDB and don't control the server sending the data, but am willing to learn.

CarForumPoster fucked around with this message at 16:58 on Jul 14, 2018

cinci zoo sniper
Mar 15, 2013




CarForumPoster posted:

I dont mind some time investment in solving this issue. I will need to do it roughly weekly and the data will be formatted like this always.

What's your desired output, to roughly sketch on that sample you linked? I've not seen that many good links on this to be honest, just random blog posts ways back. It's a technologically simple problem that can be labour-"intensive" if your source data and your desired output don't cooperate (like it often is for me since if there's one thing that people's credit histories are not then it's concise). I'll take a look at your file a bit later, on the go right now.

Also yeah, good point you raise - how exactly do you receive this stuff?

cinci zoo sniper fucked around with this message at 16:59 on Jul 14, 2018

CarForumPoster
Jun 26, 2013

⚡POWER⚡

cinci zoo sniper posted:

What's your desired output, to roughly sketch on that sample you linked? I've not seen that many good links on this to be honest, just random blog posts ways back. It's a technologically simple problem that can be labour-"intensive" if your source data and your desired output don't cooperate (like it often is for me since if there's one thing that people's credit histories are not then it's concise). I'll take a look at your file a bit later, on the go right now.

My goal is to use the data in a variety of machine learning, NLP and statistics tasks. CSVkit (the 70% option) may be a better option than I once conceived.

The goal would be to map the fields to a set of names that I can play with.

E.g. right now I take judge_name, one-hot encode it, and look for correlations to outcomes.

code:
case_number	charge_type	date_filed	court_name	case_number_long	judge_name	case_status	citation_number	compliance_date	def_name	pltf_name	def_atty	atty_phone	dob	offense_date	charge&state	charge_name_1
abc123	abc123	abc123	abc123	abc123	abc123	abc123	abc123	abc123	abc123	abc123	abc123	abc123	abc123	abc123	abc123	abc123
abc124	abc124	abc124	abc124	abc124	abc124	abc124	abc124	abc124	abc124	abc124	abc124	abc124	abc124	abc124	abc124	abc124
EDIT: It is scraped from the interwebs.

CarForumPoster fucked around with this message at 17:20 on Jul 14, 2018

cinci zoo sniper
Mar 15, 2013




Okay that data example is really straightforward.

code:
# %% lib import
import pandas as pd
import json as js


# %% read file
with open("goon.json") as data:
    test = pd.DataFrame(js.load(data).get("caselink"))
Now you need to figure out what exactly you want to do with Casedata, parties, chargedetails, docketevents, and financial. Some are straightforward - like [a.get("name") for a in test.loc[0, "Casedata"]] or zipping the financial stuff. Others not so much; that file isn't too complete or too well composed.

Good luck!

E: To be noted, I don't remember the behaviour of pd.DataFrame(None), so this may die if crudely extended to an empty file (e.g. if missing data doesn't preserve the same hierarchy early on).
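
If the nesting needs flattening in one go, pandas' json_normalize can also help; a sketch along these lines, assuming caselink is a list of case dicts and chargedetails is a list nested inside each one (field names taken from the pastebin sample, so double-check them):
Python code:
import json

from pandas.io.json import json_normalize  # pd.json_normalize in newer pandas


with open("goon.json") as data:
    d = json.load(data)

# One row per case; nested dicts become underscore-joined column names.
cases = json_normalize(d["caselink"], sep="_")

# One row per charge, carrying the parent case_number along (assumed field name).
charges = json_normalize(d["caselink"], record_path="chargedetails",
                         meta=["case_number"], sep="_")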

cinci zoo sniper fucked around with this message at 17:34 on Jul 14, 2018

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

cinci zoo sniper posted:

*coughs* This is literally my job, getting meaningful things out of JSON and XML piles.

I typed that before anyone replied :shobon:

cinci zoo sniper
Mar 15, 2013




baka kaba posted:

I typed that before anyone replied :shobon:

It’s all good, I’m cranky anyways since I’m out of Diet Coke unexpectedly.

CHEF!!!
Feb 22, 2001

I've been painfully reminded that I need to learn Python beyond the barest of bare basics. As part of an interview process, I've been asked to parse an nginx log file and find all IPs that perform a PUT, POST, PATCH, or DELETE 3 or more times in a 10-second interval.

Question: Given a nginx access logfile, find all the ip’s which perform a PUT, POST,
PATCH or DELETE requests 3 or more times within a 10 second interval.

I thought it would be easy enough, but the inclusion of this code has left me not sure what to do. Filtering the log file and extracting the IP address and the actions logged is simple enough, but the logic for filtering for 10 second intervals within this code has left me feeling like a drooling nitwit.

On a completely related note, can people here still vouch for CodeAcademy being a good way to start truly learning Python? I'm at the point in my career (Systems Admin / Engineer / whatever) now where I really should learn proper Python coding.

code:
import os
from datetime import datetime
def block_abuse():
	pass
# Don't change below this line
# Don't change below this line
# Don't change below this line
def get_log_line():
	lines = open("/root/nginx.log", "r", encoding="utf-8").read().split("\n")
	for line in lines:
		yield line

def test_results():
	print("Running tests...")

	correct_ips = ['113.217.59.65', '15.66.125.50', '155.122.40.32', '186.174.248.169']
	
	banned_ips = set(os.popen("iptables -L INPUT -v -n | grep DROP | awk '{print $8}'").read().split("\n"))
	banned_ips = [banned_ip for banned_ip in banned_ips if banned_ip]
	
	print(banned_ips)

	if len(correct_ips) != len(banned_ips):
		raise Exception("Number of banned IP's should be " + str(len(correct_ips)) + " but was " + str(len(banned_ips)))

	for banned_ip in banned_ips:
		if not banned_ip in correct_ips:
			raise Exception(str(banned_ip) + " should not be banned but was")

	for correct_ip in correct_ips:
		if not correct_ip in banned_ips:
			raise Exception(str(correct_ip) + " should be banned but wasn't")
	
	print("TESTS PASSED!")

if __name__ == '__main__':
	block_abuse()
	test_results()

Thermopyle
Jul 1, 2003

...the stupid are cocksure while the intelligent are full of doubt. —Bertrand Russell

CHEF!!! posted:

On a completely related note, can people here still vouch for CodeAcademy being a good way to start truly learning Python? I'm at the point in my career (Systems Admin / Engineer / whatever) now where I really should learn proper Python coding.


To be honest, if you understand basic python and programming in general, you'd probably be better served by reading Fluent Python.

Roargasm
Oct 21, 2010

Hate to sound sleazy
But tease me
I don't want it if it's that easy
If you want to see a fully featured Python program with some examples of how to do nginx log analysis, check out https://github.com/lebinh/ngxtop

cinci zoo sniper
Mar 15, 2013




At whiteboard-exercise level, using basic Python, I'd just pile up a dictionary of nested lists where each line read checks the delta between the last and the 3rd-from-last request datetimes. This would likely be slow as gently caress, but fast and straightforward to write.

Another quick option would be to filter by IP into separate lists, then filter those lists down to the right request types and compute a running delta on the last 3 request datetimes. Most likely much better on performance, although I haven't done text filtering like that.

baka kaba
Jul 19, 2003

PLEASE ASK ME, THE SELF-PROFESSED NO #1 PAUL CATTERMOLE FAN IN THE SOMETHING AWFUL S-CLUB 7 MEGATHREAD, TO NAME A SINGLE SONG BY HIS EXCELLENT NU-METAL SIDE PROJECT, SKUA, AND IF I CAN'T PLEASE TELL ME TO
EAT SHIT

Another thing you could try is having a list containing the last 10 seconds of IPs (filtered to only include the request types you're looking for). So when you get a new IP, add it to the list, then pop off the entries from the front of the list that have a timestamp more than 10s earlier than your new entry. Then you can scan your list for matches on that new IP and see if you get 3+

Like a moving window, basically


Dr Subterfuge
Aug 31, 2005

TIME TO ROC N' ROLL

baka kaba posted:

Another thing you could try is having a list containing the last 10 seconds of IPs (filtered to only include the request types you're looking for). So when you get a new IP, add it to the list, then pop off the entries from the front of the list that have a timestamp more than 10s earlier than your new entry. Then you can scan your list for matches on that new IP and see if you get 3+

Like a moving window, basically

This is basically the ideal use case for a deque, and it's more efficient than a general-purpose list.
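
A sketch of that moving window with a deque, assuming the log lines have already been parsed into (timestamp, ip, method) tuples - the actual log parsing is left out:
Python code:
from collections import deque
from datetime import timedelta

WATCHED = {"PUT", "POST", "PATCH", "DELETE"}


def find_abusers(requests, limit=3, window=timedelta(seconds=10)):
    """requests: iterable of (timestamp, ip, method) tuples in log order."""
    recent = deque()   # (timestamp, ip) pairs still inside the window
    flagged = set()
    for ts, ip, method in requests:
        if method not in WATCHED:
            continue
        recent.append((ts, ip))
        # Drop entries that fell out of the 10-second window.
        while recent and ts - recent[0][0] > window:
            recent.popleft()
        if sum(1 for _, other in recent if other == ip) >= limit:
            flagged.add(ip)
    return flagged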
