Ground Control: deconstructing the run game to an unreasonable extent

The Something Awful Forums > Discussion > Sports Argument Stadium > The Football Funhouse > Ground Control: deconstructing the run game to an unreasonable extent

Ghost of Reagan Past: Oct 7, 2003; rock and roll fun

I've been following along with the latest rushing data. It's been fun to poke at this stuff, and a great way to work out how to do similar analyses in Python.

EDIT: David Johnson, in red

EDIT 2: OH MY GOD TODD GURLEY

EDIT 3: Here's the direction I'm thinking of exploring next: run direction! Stay tuned!

Ghost of Reagan Past fucked around with this message at 04:31 on Feb 13, 2017

# Feb 12, 2017 04:30


Ghost of Reagan Past: Oct 7, 2003; rock and roll fun

Here's some stuff on run direction.

First up, the Executive Summary!

1. Running behind the right guard is the worst direction to run.
2. The best direction to run is outside to the left.
3. Most NFL runs go up the middle, and those are surprisingly effective.

Anyway, let's dig in.

The Average Running Back
Here's the average running back's run direction distribution.

We can glean a few things from this. First, inside runs dominate. Second, see those dips behind the guards? As we'll see, those runs are less successful than every other kind of run, so presumably NFL teams know this and call fewer of them. The odder thing is how few outside runs get called. Outside runs are actually pretty successful, averaging more than 4.3 ypc, but they fail more often than other runs. Is calling so few of them a smart strategy by coaches, or are they being too conservative?

Here's the average yardage for each direction.
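Both charts boil down to a carry count and a mean per direction. Here's a minimal pandas sketch of those numbers, assuming a hypothetical DataFrame with one row per rush and columns named `rundirection` and `yards` (the post doesn't say how its data is laid out, and the toy rows are made up):

```python
import pandas as pd

def direction_summary(runs: pd.DataFrame) -> pd.DataFrame:
    """Carries, share of all carries (%) and yards per carry for each direction.

    Assumes hypothetical columns: 'rundirection' (LE, LT, LG, middle, RG, RT, RE)
    and 'yards', with one row per rush.
    """
    summary = runs.groupby("rundirection")["yards"].agg(carries="count", ypc="mean")
    # Percentage of all carries that went in each direction.
    summary["share_pct"] = 100 * summary["carries"] / summary["carries"].sum()
    return summary

# Toy data for illustration only, not real NFL numbers.
toy = pd.DataFrame({
    "rundirection": ["middle", "middle", "LG", "LE", "RE", "middle"],
    "yards": [3, 5, 1, 8, -2, 4],
})
print(direction_summary(toy))
```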

Now, how do we want to measure 'run failure'? Let's call any run of less than 2 yards a failure. This ignores yards to go, of course, which would turn some of those short runs into successes (a 1-yard run on 3rd-and-1, say). It's just a rough cut to help us get a grip on what we're looking at here.
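A sketch of that rule, assuming the same hypothetical `yards` column; `add_success_flag` and its `threshold` parameter are illustrative names, not anything from nfldb:

```python
import pandas as pd

# Threshold from the post: a run of less than 2 yards counts as a failure.
SUCCESS_THRESHOLD = 2

def add_success_flag(runs: pd.DataFrame, threshold: int = SUCCESS_THRESHOLD) -> pd.DataFrame:
    """Return a copy of `runs` with a 0/1 'successful' column.

    Deliberately ignores down and distance, so a 1-yard gain on
    3rd-and-1 is still scored as a failure.
    """
    out = runs.copy()
    out["successful"] = (out["yards"] >= threshold).astype(int)
    return out

toy = pd.DataFrame({"yards": [-3, 0, 1, 2, 7]})
print(add_success_flag(toy)["successful"].tolist())  # [0, 0, 0, 1, 1]
```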

This is super interesting. Green is a successful run, blue is a failed run, and the chart shows the average yardage for each outcome in each direction. Note that failed outside runs, on both the left and the right, lose a lot of yardage! That may explain the conservatism above. But what proportion of runs in each direction fail? Do outside runs fail more often than inside runs?
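The chart splits each direction's average by outcome. Assuming the hypothetical `rundirection`, `yards` and 0/1 `successful` columns, a two-key `groupby` followed by `unstack` produces the same kind of table (toy data, not real results):

```python
import pandas as pd

def yards_by_outcome(runs: pd.DataFrame) -> pd.DataFrame:
    """Average yards per direction, split into failed (0) and successful (1) runs.

    Assumes hypothetical 'rundirection', 'yards' and 0/1 'successful' columns.
    """
    return (
        runs.groupby(["rundirection", "successful"])["yards"]
        .mean()
        .unstack("successful")  # one column per outcome: 0 = failed, 1 = successful
    )

toy = pd.DataFrame({
    "rundirection": ["LE", "LE", "LE", "middle", "middle"],
    "yards":        [-4,    9,    5,    1,        4],
    "successful":   [0,     1,    1,    0,        1],
})
print(yards_by_outcome(toy))
```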

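Here's the split for each direction, cross-tabbed and normalized to percentages (a run counts as successful if it gains 2 or more yards):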

code:

rundirection  successful (%)  failed (%)
LE            65.868875       34.131125
LT            68.252636       31.747364
LG            71.128983       28.871017
middle        69.003449       30.996551
RG            70.114157       29.885843
RT            69.052494       30.947506
RE            64.747371       35.252629
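A table like this can be produced with `pd.crosstab` and `normalize="index"`, so each direction's row sums to 100. This is a sketch on toy data with the same hypothetical columns; the stacked layout of the original output looks more like `groupby("rundirection")["successful"].value_counts(normalize=True)`, but that's a guess about how it was made.

```python
import pandas as pd

def failure_table(runs: pd.DataFrame) -> pd.DataFrame:
    """Percent of successful (1) and failed (0) runs within each direction."""
    # normalize="index" divides each row (direction) by its own total.
    return pd.crosstab(runs["rundirection"], runs["successful"], normalize="index") * 100

toy = pd.DataFrame({
    "rundirection": ["LE", "LE", "LE", "LE", "middle", "middle"],
    "successful":   [1,    1,    1,    0,    1,        0],
})
print(failure_table(toy))
```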

So outside runs do fail more often than inside runs: about 34 to 35 percent of runs to either end fail, versus roughly 29 to 32 percent everywhere else. But is that a good tradeoff for coaches to make? Should they call more outside runs? I can't answer that, but it's worth thinking about.
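One way to frame the tradeoff: a direction's yards per carry is a weighted average of how its failures go and how its successes go. Writing p_d for the failure rate of direction d and Y for yards gained, the law of total expectation gives:

```latex
\mathbb{E}[Y \mid d] = p_d \, \mathbb{E}[Y \mid \text{fail}, d] + (1 - p_d) \, \mathbb{E}[Y \mid \text{success}, d]
```

Plugging in the table's failure rates (for example, p_LE ≈ 0.34 versus p_LG ≈ 0.29) together with the per-outcome averages from the chart would show whether outside runs' bigger successes make up for failing more often.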

Comparing Running Backs
So here are some comparison charts between backs.

Adrian Peterson

Darren Sproles

Todd Gurley

Noted Laughingstock Trent Richardson

David Johnson (this is loving weird, man)
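These per-back charts compare a player's direction mix with the league's. A minimal sketch of the numbers behind such a comparison, assuming a hypothetical `player` column alongside `rundirection` (names and toy data are illustrative only):

```python
import pandas as pd

def direction_vs_league(runs: pd.DataFrame, player: str) -> pd.DataFrame:
    """Compare one back's carry distribution by direction with the league's.

    Assumes hypothetical 'player' and 'rundirection' columns, one row per rush.
    Values are percentages; 'diff' is back minus league, in percentage points.
    """
    league = runs["rundirection"].value_counts(normalize=True) * 100
    mine = runs.loc[runs["player"] == player, "rundirection"].value_counts(normalize=True) * 100
    # Align on direction; directions the back never ran get 0%.
    out = pd.DataFrame({"back_pct": mine, "league_pct": league}).fillna(0.0)
    out["diff"] = out["back_pct"] - out["league_pct"]
    return out.sort_values("diff", ascending=False)

toy = pd.DataFrame({
    "player": ["A", "A", "B", "B", "B", "B"],
    "rundirection": ["LE", "middle", "middle", "middle", "RG", "LE"],
})
print(direction_vs_league(toy, "A"))
```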

Stay tuned for better success metrics and random forests. I can make you charts for any back you'd like. Code will eventually be up somewhere once I figure out where to drop Jupyter notebooks.

Ghost of Reagan Past fucked around with this message at 21:29 on Feb 26, 2017

# Feb 26, 2017 21:01

Ghost of Reagan Past: Oct 7, 2003; rock and roll fun

pmchem posted:

I have an NFL data science question and this seems like the most appropriate place. Let's talk raw data sources. Ground Control uses nfldb.

As far as I'm aware (and I might be wrong), if a player does not collect stats in a particular game, nfldb does not differentiate between: (1) player was active but did not play, (2) player was inactive due to injury, (3) player was suspended. I'm also not sure what it does with players who are not on an NFL roster for a given week but collect stats in other weeks; presumably it has week-by-week NFL roster status for each player.

Is there an easy source for data types 1-3, especially #2 and #3? Preferably in a way that can be easily imported via a Python interface such as nfldb? I am looking to do a little machine learning, but missing that data would make the effort pointless. Manually entering the data would take a prohibitive amount of my time.

If you can get week-by-week roster data, you could smack the data sets together reasonably quickly. It'd be kind of convoluted, but it shouldn't be prohibitively time-consuming or difficult. Just the raw roster data should be fine; you don't need much more than that.
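A sketch of that join, assuming you've found roster data with one row per player per week plus a status column. Every column name here (`player_id`, `week`, `status`, `rush_yds`) is hypothetical, not nfldb's schema. A left merge with `indicator=True` keeps every rostered player-week and flags the ones with no stats; the status column then separates cases (1) through (3):

```python
import pandas as pd

def label_player_weeks(rosters: pd.DataFrame, stats: pd.DataFrame) -> pd.DataFrame:
    """Left-join weekly stats onto weekly rosters and flag player-weeks with no stats.

    Hypothetical inputs (column names are assumptions):
      rosters: one row per player per week: 'player_id', 'week', 'status'
      stats:   one row per player-week with recorded stats, e.g. 'rush_yds'
    """
    merged = rosters.merge(stats, on=["player_id", "week"], how="left", indicator=True)
    # 'left_only' rows: on a roster that week, but no stats were recorded.
    merged["no_stats"] = merged["_merge"] == "left_only"
    return merged.drop(columns="_merge")

# Toy example: player 1 plays week 1 and is hurt in week 2;
# player 2 is active in week 1 but records no stats.
rosters = pd.DataFrame({
    "player_id": [1, 1, 2],
    "week": [1, 2, 1],
    "status": ["active", "injured", "active"],
})
stats = pd.DataFrame({"player_id": [1], "week": [1], "rush_yds": [80]})

weeks = label_player_weeks(rosters, stats)
# Among no-stat weeks, 'status' tells "active but didn't play" apart from injured or suspended.
print(weeks[weeks["no_stats"]][["player_id", "week", "status"]])
```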

That data is likely available on a source like Pro Football Reference, but I can't be 100% sure. It wouldn't be easily importable, but you should be able to get it if you're dedicated.

Getting the data is actually the hardest part of doing data science.

Ghost of Reagan Past fucked around with this message at 14:31 on Aug 6, 2017

# Aug 6, 2017 14:28
