Ghost of Reagan Past
Oct 7, 2003

rock and roll fun
I've been poking at the latest NFL rushing data with Python. It's been fun, and it's a great way to work out how to do this kind of analysis.



EDIT: David Johnson, in red


EDIT 2: OH MY GOD TODD GURLEY


EDIT #3: Here's the direction I'm thinking of exploring: run direction! Stay tuned!

Ghost of Reagan Past fucked around with this message at 04:31 on Feb 13, 2017


Ghost of Reagan Past
Oct 7, 2003

rock and roll fun
Here's some stuff on run direction.

First up, the Executive Summary!

1. Running behind the right guard is the worst direction to run.
2. The best direction to run is outside to the left.
3. Most runs in the NFL are in the middle, which is actually surprisingly effective.

Anyway, let's dig in.

The Average Running Back
Here's the average running back's run direction distribution.


We can glean a few things from this. First, inside runs dominate. Second, see those dips behind the guards? As we'll see, these runs are less successful than every other kind of run, so presumably NFL teams understand this. But the odder thing is the drop-off on outside runs. These are actually pretty successful, averaging more than 4.3 ypc, but they fail more often than other runs. Is this a smart strategy by coaches, or are they being too conservative?
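The direction distribution is just each direction's share of all carries. Here's a minimal stdlib-only sketch. It assumes each run is a dict with a `rundirection` key holding the codes used in the cross-tab in this post (LE, LT, LG, middle, RG, RT, RE). The rows are made-up toy data, not real numbers.

```python
from collections import Counter

# Direction codes as they appear in the cross-tab in this post.
DIRECTIONS = ["LE", "LT", "LG", "middle", "RG", "RT", "RE"]


def direction_distribution(runs):
    """Return the share (0 to 1) of carries that went in each direction.

    `runs` is assumed to be a list of dicts with a "rundirection" key.
    """
    counts = Counter(run["rundirection"] for run in runs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no runs to summarize")
    return {d: counts.get(d, 0) / total for d in DIRECTIONS}


# Made-up toy rows, not real NFL data.
toy_runs = [
    {"rundirection": "middle"},
    {"rundirection": "middle"},
    {"rundirection": "LG"},
    {"rundirection": "RT"},
]
print(direction_distribution(toy_runs))
```

If the data is in a pandas DataFrame instead, `df["rundirection"].value_counts(normalize=True)` gives the same shares.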

Here's the average yardage for each direction.
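Average yardage per direction is a group-by mean. Here's a rough sketch that assumes each run dict has `rundirection` and `yards` keys. The `yards` name is an assumption, and the rows are toy data.

```python
from collections import defaultdict


def yards_per_carry_by_direction(runs):
    """Mean yards per carry for each run direction.

    Each run is assumed to be a dict with "rundirection" and "yards" keys.
    """
    total_yards = defaultdict(float)
    carries = defaultdict(int)
    for run in runs:
        direction = run["rundirection"]
        total_yards[direction] += run["yards"]
        carries[direction] += 1
    return {d: total_yards[d] / carries[d] for d in carries}


# Made-up toy rows: one long outside run, one loss, two short inside runs.
toy_runs = [
    {"rundirection": "LE", "yards": 12},
    {"rundirection": "LE", "yards": -3},
    {"rundirection": "middle", "yards": 4},
    {"rundirection": "middle", "yards": 2},
]
print(yards_per_carry_by_direction(toy_runs))
```

In pandas this would be `df.groupby("rundirection")["yards"].mean()`.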


Now, how do we want to measure 'run failure'? Let's consider a run of less than 2 yards to be a failure. This ignores yards to go, of course, which would turn some of those short runs (say, a 1-yard gain on 3rd and 1) into successes. It's just a rough cut to help us get a grip on what we're looking at here.
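Here's one way to encode that failure rule as a 1/0 flag. This matches the `successful` column in the cross-tab, where 1 means success. The optional `yards_to_go` argument is a hypothetical refinement and is not part of the metric used in this post. It shows how a short-yardage conversion could be counted as a success.

```python
FAILURE_THRESHOLD = 2  # runs gaining fewer than 2 yards count as failures


def is_successful(yards, yards_to_go=None):
    """Return 1 for a successful run, 0 for a failed one.

    With yards_to_go=None this is the simple rule: a run of under 2 yards
    is a failure. Passing yards_to_go (a hypothetical refinement) also
    counts a run as successful if it gained at least the yards needed.
    """
    if yards >= FAILURE_THRESHOLD:
        return 1
    if yards_to_go is not None and yards >= yards_to_go:
        return 1
    return 0


print([is_successful(y) for y in (-4, 0, 1, 2, 9)])
print(is_successful(1, yards_to_go=1))  # 3rd and 1, gain of 1
```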


This is super interesting. Green is a successful run and blue is a failed run. Each bar is the average yardage for the successful or failed runs in a given direction. Note that the failures on the left and right outside runs are pretty big losses! This may explain the conservatism above. What proportion of runs in each direction fail, though? Do outside runs fail more often than inside runs?
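The chart splits each direction's average yardage by outcome. Here's a sketch of that computation. It assumes the same `rundirection`/`yards` run dicts and the under-2-yards failure rule, and the rows are toy data.

```python
from collections import defaultdict


def mean_yards_by_direction_and_outcome(runs, threshold=2):
    """Average yards for each (direction, successful) pair.

    successful is 1 if the run gained at least `threshold` yards, else 0.
    Runs are assumed to be dicts with "rundirection" and "yards" keys.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for run in runs:
        key = (run["rundirection"], int(run["yards"] >= threshold))
        sums[key] += run["yards"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in counts}


# Made-up toy rows, not real NFL data.
toy_runs = [
    {"rundirection": "RE", "yards": 15},
    {"rundirection": "RE", "yards": 5},
    {"rundirection": "RE", "yards": -4},
    {"rundirection": "RE", "yards": 0},
    {"rundirection": "RG", "yards": 1},
]
print(mean_yards_by_direction_and_outcome(toy_runs))
```

A direction whose failed runs average well below zero is one where failures tend to be actual losses, not just short gains.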


Here it is cross-tabbed and normalized, as the percentage of runs in each direction that succeeded (1) or failed (0):
code:
rundirection  successful
LE            1             65.868875
              0             34.131125
LG            1             71.128983
              0             28.871017
LT            1             68.252636
              0             31.747364
RE            1             64.747371
              0             35.252629
RG            1             70.114157
              0             29.885843
RT            1             69.052494
              0             30.947506
middle        1             69.003449
              0             30.996551
So runs outside the ends (LE and RE) do fail more often than runs in any other direction: about 34-35% of the time, versus 29-32% for the rest. But is this a good tradeoff that coaches are making? Should they call more runs to the outside? I can't answer this question, but it's worth thinking about.
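A normalized cross-tab like the one above reports, for each direction, the percentage of runs with each outcome. Each pair of percentages sums to 100. Here's a stdlib sketch under the same assumptions as before: run dicts with `rundirection` and `yards` keys, a failure threshold of 2 yards, and toy rows.

```python
from collections import Counter, defaultdict


def success_rate_table(runs, threshold=2):
    """Percentage of successful (1) and failed (0) runs per direction.

    Within each direction the two percentages sum to 100.
    """
    by_direction = defaultdict(Counter)
    for run in runs:
        outcome = int(run["yards"] >= threshold)
        by_direction[run["rundirection"]][outcome] += 1
    table = {}
    for direction, counts in sorted(by_direction.items()):
        total = counts[0] + counts[1]
        table[direction] = {
            1: 100 * counts[1] / total,
            0: 100 * counts[0] / total,
        }
    return table


# Made-up toy rows, not real NFL data.
toy_runs = [
    {"rundirection": "LE", "yards": 7},
    {"rundirection": "LE", "yards": 1},
    {"rundirection": "LE", "yards": 3},
    {"rundirection": "LE", "yards": 0},
    {"rundirection": "middle", "yards": 2},
]
for direction, pct in success_rate_table(toy_runs).items():
    print(f"{direction:<8} 1 {pct[1]:10.6f}")
    print(f"{'':<8} 0 {pct[0]:10.6f}")
```

With pandas, one way to get the same numbers is `pd.crosstab(df["rundirection"], df["successful"], normalize="index") * 100`.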

Comparing Running Backs
So here are some comparison charts between backs.

Adrian Peterson


Darren Sproles


Todd Gurley


Noted Laughingstock Trent Richardson


David Johnson (this is loving weird man)


Stay tuned for better success metrics and random forests. I can make you charts of any backs you'd like. Code will eventually be up somewhere once I figure out where to drop Jupyter notebooks.
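One simple way to build a per-back comparison is to subtract the league-wide direction shares from one back's shares. Here's a sketch. The `rusher` key is a hypothetical field name, and "Back A"/"Back B" are placeholder names on made-up rows, not real players or stats.

```python
from collections import Counter

DIRECTIONS = ["LE", "LT", "LG", "middle", "RG", "RT", "RE"]


def direction_shares(runs):
    """Share (0 to 1) of the given carries that went in each direction."""
    counts = Counter(run["rundirection"] for run in runs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no runs to summarize")
    return {d: counts.get(d, 0) / total for d in DIRECTIONS}


def compare_back(runs, rusher):
    """One back's direction shares minus the league-wide shares.

    Positive values mean the back runs that way more often than the
    average carry. The "rusher" key is a hypothetical field name.
    """
    league = direction_shares(runs)
    back = direction_shares([run for run in runs if run["rusher"] == rusher])
    return {d: back[d] - league[d] for d in DIRECTIONS}


# Made-up toy rows with placeholder names, not real players or stats.
toy_runs = [
    {"rusher": "Back A", "rundirection": "middle"},
    {"rusher": "Back A", "rundirection": "middle"},
    {"rusher": "Back B", "rundirection": "LE"},
    {"rusher": "Back B", "rundirection": "middle"},
]
print(compare_back(toy_runs, "Back A"))
```

Here the "league average" is the share of all carries, so high-volume backs pull it toward their own tendencies. Averaging per-back shares instead would weight every back equally.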

Ghost of Reagan Past fucked around with this message at 21:29 on Feb 26, 2017

Ghost of Reagan Past
Oct 7, 2003

rock and roll fun

pmchem posted:

I have an NFL data science question and this seems like the most appropriate place. Let's talk raw data sources. Ground Control uses nfldb.

As far as I'm aware (and I might be wrong), if a player does not collect stats in a particular game, nfldb does not differentiate between: (1) player was active, but did not play, (2) player was inactive due to injury, (3) player was suspended. I'm also not sure what it does with players who are not on an NFL roster for a given week but collect stats in other weeks; presumably it has week-by-week NFL roster status for each player.

Is there an easy source for data types 1-3? Especially #2+#3? Preferably in a way that can be easily imported via a python interface such as nfldb? I am looking to do a little machine learning, but missing that data would make the effort pointless. Manually entering the data would be prohibitive on my human-time.
If you could get week-to-week roster data, you could smack the data sets together reasonably quickly. It'd be kind of convoluted but it shouldn't be prohibitively time-consuming or difficult. Like, just the raw data should be fine, you don't need much more than that.
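Here's a rough sketch of "smacking the data sets together". The approach is a join on (player, week) that labels every rostered week, plus a list of stat lines with no roster entry. The roster source and its status labels ("active", "injured_reserve", "suspended") are hypothetical. Nothing here is nfldb's API, and both dicts hold toy data.

```python
def label_player_weeks(roster, stats):
    """Attach an outcome to every (player, week) on the roster.

    roster: dict mapping (player_id, week) -> status string from some
            hypothetical week-by-week roster source.
    stats:  dict mapping (player_id, week) -> stat line, present only for
            weeks where the player recorded stats.
    """
    rows = []
    for key, status in sorted(roster.items()):
        player_id, week = key
        line = stats.get(key)
        if line is not None:
            outcome = "played"
        elif status == "active":
            outcome = "active_no_stats"  # case (1): active but no stats
        else:
            outcome = status  # cases (2)/(3): e.g. injured or suspended
        rows.append({"player_id": player_id, "week": week,
                     "status": status, "outcome": outcome, "stats": line})
    # Stat lines with no roster entry point at gaps in the roster data.
    orphans = sorted(set(stats) - set(roster))
    return rows, orphans


# Toy data, not real players.
roster = {("p1", 1): "active", ("p1", 2): "active",
          ("p1", 3): "suspended", ("p2", 1): "injured_reserve"}
stats = {("p1", 1): {"rush_yds": 40}, ("p3", 1): {"rush_yds": 5}}
rows, orphans = label_player_weeks(roster, stats)
for row in rows:
    print(row["player_id"], row["week"], row["outcome"])
print("stats with no roster entry:", orphans)
```

In pandas, `roster_df.merge(stats_df, on=["player_id", "week"], how="outer", indicator=True)` does the same join, and its `_merge` column flags rows that appear in only one table.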

That data is likely available on a source like Pro Football Reference but I can't be 100% sure. It wouldn't be easily importable but you should be able to get it if you're dedicated.

This is actually the hardest part of doing data science.

Ghost of Reagan Past fucked around with this message at 14:31 on Aug 6, 2017
