Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah
Hey Hockles, sorry for the radio silence! Took a few weeks to finish my dissertation, then took a vacation after getting my PhD, but I'm at the airport now to head back home. I'll respond to your email tomorrow and help with playoffs scripting.


Hockles
Dec 25, 2007

Resident of Camp Blood
Crystal Lake

Forever_Peace posted:

Hey Hockles, sorry for the radio silence! Took a few weeks to finish my dissertation, then took a vacation after getting my PhD, but I'm at the airport now to head back home. I'll respond to your email tomorrow and help with playoffs scripting.

Well, that's a great reason to not be around. Congratulations, Dr. Forever_Peace!

whypick1
Dec 18, 2009

Just another jackass on the Internet
Thanks for bumping this thread Hockles, because I had something I wanted to share that would fit here, but this place has been rather moribund.

I thought to myself early "jesus loving christ, it seems like all but 3 teams have had to start their backup or backup backup QB". Well...



Number of different QBs to start for each team for the past 3 seasons (I'd do more, but it's a bit of a pain-in-the-rear end with QBs that have started for multiple teams...BTW, there's been 4 of those guys this year). This is only looking at QB starts, so the de facto starter going down before the season starts (i.e. Geno Smith this year) doesn't show up here. The very last row is the number of teams that have managed to be fortunate enough to not have to start their backup QB.

One interesting thing that I didn't put on the table since it wouldn't make any sense once I add more years, is which teams have had the same starting QB during this span:

Atlanta Falcons - no playoffs, no playoffs, no playoffs
Detroit Lions - no playoffs, Calvin Johnson rule redux, no playoffs
Miami Dolphins - no playoffs, no playoffs, no playoffs
New England Patriots - L AFCCG, W SB, could be #1 seed
New York Giants - no playoffs, no playoffs, no playoffs
San Diego Chargers - L Div, no playoffs, no playoffs
Seattle Seahawks - W SB, L SB, WC

So much for correlation. Seattle and New England have been two of the most dominant teams these past few years and those other teams...well, they're in the NFL. That's about all they have in common.

I'll probably expand this table back to '02 after next week.

whypick1 fucked around with this message at 04:26 on Dec 31, 2015

SurgicalOntologist
Jun 17, 2004

Hockles posted:

Hi guys, remember this thread? Anyway, if any of you still are bookmarked on this, or happen to follow this, I am in need of help with the script Forever_Peace made for me for 1KYOB using nfldb and python.

I want to know how to pull stats for the playoffs coming up. The script starts off like this:

code:
import nfldb
import pandas as pd
import os
import sys

#Open cmd, navigate to nfldb\scripts, then run "python nfldb-update"
#set working directory to folder containing script and entries, then open an ipython notebook and: %run 1kyob_script.py

betpath = os.path.dirname(sys.argv[0])

seasontype = 'Regular'
week = 16
year = 2015

db = nfldb.connect()
Basically, do I need to change the "seasontype" to something else, or just increment the week to week 18 after the last week of the regular season?

code:
nfldb> SELECT DISTINCT season_type FROM game
+---------------+
| season_type   |
|---------------|
| Preseason     |
| Regular       |
| Postseason    |
+---------------+

nfldb> SELECT DISTINCT week FROM game WHERE season_type = 'Postseason'
+--------+
|   week |
|--------|
|      1 |
|      2 |
|      3 |
|      4 |
|      5 |
+--------+
tl;dr: change season_type to 'Postseason' and week back to 1.
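
A minimal sketch of that change, assuming the rest of the script only reads the three settings shown in the quoted snippet (seasontype, week, year). The playoff_settings helper is hypothetical; it is not part of nfldb or of the original script, and the valid week range is taken from the query output above.

```python
# Hypothetical helper: not part of nfldb or the original 1KYOB script.
# The season type names come from the SELECT DISTINCT season_type output above.
VALID_SEASON_TYPES = ("Preseason", "Regular", "Postseason")


def playoff_settings(year, playoff_week):
    """Return script settings for a postseason week (1-5 per the query above)."""
    if not 1 <= playoff_week <= 5:
        raise ValueError("postseason weeks in the database run from 1 to 5")
    return {"seasontype": "Postseason", "week": playoff_week, "year": year}


# First week of the playoffs following the 2015 regular season.
settings = playoff_settings(2015, 1)
print(settings)
```

The point is just that the week counter restarts at 1 for the postseason rather than continuing to 18.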

Hockles
Dec 25, 2007

Resident of Camp Blood
Crystal Lake

Thanks! I couldn't find that on the wiki.

Mr. Funny Pants
Apr 9, 2001

This question is likely too stupid to deserve this thread, but I have to ask. It's something that I've wondered for literally decades and I'm sure there's a simple explanation staring me in the face.

On kick-offs the receiving team lines up x number of men ten yards from the kicking team. Then there's another group a bit deeper, presumably to guard against a pooch kick. Why, once they've confirmed that the kick is going deep, don't these two groups hold their ground or run forward, rather than immediately ceding ground before blocking?

I look forward to finding out once and for all what obvious answer I'm missing and why I've been an idiot for 30+ years.

Impossibly Perfect Sphere
Nov 6, 2002

They wasted Luanne on Lucky!

She could of have been so much more but the writers just didn't care!
Trying to block someone running full speed at you is really hard.

Spoeank
Jul 16, 2003

That's a nice set of 11 dynasty points there, it would be a shame if 3 rings were to happen with it

Mr. Funny Pants posted:

This question is likely too stupid to deserve this thread, but I have to ask. It's something that I've wondered for literally decades and I'm sure there's a simple explanation staring me in the face.

On kick-offs the receiving team lines up x number of men ten yards from the kicking team. Then there's another group a bit deeper, presumably to guard against a pooch kick. Why, once they've confirmed that the kick is going deep, don't these two groups hold their ground or run forward, rather than immediately ceding ground before blocking?

I look forward to finding out once and for all what obvious answer I'm missing and why I've been an idiot for 30+ years.

I would say the chances that you let a free runner past are much, much higher if you run at them rather than run back then catch them. That or you can't really hold a block long enough for someone to run it back 30+ yards and for you to still be holding that block.

Epi Lepi
Oct 29, 2009

You can hear the voice
Telling you to Love
It's the voice of MK Ultra
And you're doing what it wants

whypick1 posted:

Thanks for bumping this thread Hockles, because I had something I wanted to share that would fit here, but this place has been rather moribund.

I thought to myself early "jesus loving christ, it seems like all but 3 teams have had to start their backup or backup backup QB". Well...



Number of different QBs to start for each team for the past 3 seasons (I'd do more, but it's a bit of a pain-in-the-rear end with QBs that have started for multiple teams...BTW, there's been 4 of those guys this year). This is only looking at QB starts, so the de facto starter going down before the season starts (i.e. Geno Smith this year) doesn't show up here. The very last row is the number of teams that have managed to be fortunate enough to not have to start their backup QB.

One interesting thing that I didn't put on the table since it wouldn't make any sense once I add more years, is which teams have had the same starting QB during this span:

Atlanta Falcons - no playoffs, no playoffs, no playoffs
Detroit Lions - no playoffs, Calvin Johnson rule redux, no playoffs
Miami Dolphins - no playoffs, no playoffs, no playoffs
New England Patriots - L AFCCG, W SB, could be #1 seed
New York Giants - no playoffs, no playoffs, no playoffs
San Diego Chargers - L Div, no playoffs, no playoffs
Seattle Seahawks - W SB, L SB, WC

So much for correlation. Seattle and New England have been two of the most dominant teams these past few years and those other teams...well, they're in the NFL. That's about all they have in common.

I'll probably expand this table back to '02 after next week.

Where are you getting your info on Tampa because we've only used Winston this year and I'm pretty sure we only had 2 QBs last year.

whypick1
Dec 18, 2009

Just another jackass on the Internet
Bah, I accidentally had some leftover rows which meant some guys got double-counted. Fixed my OP.

3 DONG HORSE
May 22, 2008

I'd like to thank Satan for everything he's done for this organization

Forever_Peace posted:

Hey Hockles, sorry for the radio silence! Took a few weeks to finish my dissertation, then took a vacation after getting my PhD, but I'm at the airport now to head back home. I'll respond to your email tomorrow and help with playoffs scripting.

What was your dissertation about? A statistical analysis of going WR/WR over RB/RB? :angel:


Congrats, dude!

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah
So I've finally sat down and started writing this big running backs analysis I've been promising you guys since forever.

I ended up deciding to break everything down into chapters, but even that wasn't really enough to keep things manageable. I'm trying to balance technical insights with the explanation and narrative thread necessary to keep everybody on the same page, even if they have a minimal statistical background.

So I decided to try co-writing the chapters with the Spirit of Ernie Adams. I'm using Ernie to express the technical supernerd stuff, and using my own voice for explanation and for linking up various plots and tables and stuff.

There is a chance this comes off as super dumb and hokey, so I wrote up a test chapter in that style so I could get some honest feedback. Please help my effort posts be actually worth reading!

Currently, I have the following chapters planned:
- Embracing probability distributions.
- Individual Run distributions and running styles (also the good, the bad, and the ugly)
- Finding comparable players through distribution matching
- Sampling, resampling, and the central limit theorem.
- Teams, defenses, and players across time.
- Situational variables (down, distance, field position, score, time left, etc.)
- Modeling the run (GAM and random forest models)
- Marginal effectiveness of individual players
- Bayesian analyses of individual player performance

They would probably all be about as long as the one posted below. My goal would be one every week or two until they are done.

Would it make sense to keep it in this thread? Or should I ask if I can make a thread for it, so folks can find the OP easily with the links to the individual chapters?

Also, I suck at writing and probably need an editor. If you are good at writing and editing and want to collaborate, that would be awesome. If you are good at statistics instead and would like to contribute analyses, that would also be awesome. Just let me know.

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah
When a coach calls a running play in the NFL, what do they expect to see? What does a “typical” run look like? A good coach would probably say “it depends”: every call is intended to attack a particular defense (or to set up a future attack on a particular defense) in a particular way in a particular circumstance. Designing runs is hard, calling the right play at the right time is hard, and executing a solid run is hard (but fun: offensive linemen generally would prefer mauling the hell out of the guy across from them than dropping back into pass protection any day of the week).

But let’s abstract over particular situations, defenses, and plays for a moment. Assume that, over the long run, there’s an archetypal balance point for how a rushing attempt meets a run defense. You give the running back the ball. How far does he typically go?


“There’s about a million ways to answer that question. It’s almost just kind of a dumb question.”


But the ubiquity with which people throw “yards per carry” stats around makes me think it’s something that a lot of us are interested in. Our job is to find the most appropriate answer to that intuitive interest. So let’s hear it, Ernie. What do you got for us?


“Well, you mentioned yards per carry (YPC) already. Over the past six regular seasons, the average run is about 4.175 yards. In economic terms, this is the ‘expected value’ of the run when you call it.”


It’s what most of us probably thought of first when we thought of the “typical” run: a hair over four yards. That’s the ‘mean’ yardage gained on a run. Nice and easy to calculate: just take all your rushing yards, and divide by rushing attempts. This is usually a relatively stable number in modern football, though it does wax and wane a bit year by year. Here is the average yardage for each of the past six seasons:

pre:
2010 4.18
2011 4.32
2012 4.24
2013 4.10
2014 4.13
2015 4.09
“But YPC has some issues. For starters, it’s heavily influenced by big runs. Being pulled around by large values is a consideration whenever you use the mean as a measure of central tendency. Like, when Bill Gates walks into a room, the average person in that room becomes a multi-millionaire, even if not a single person is actually a multi-millionaire (Gates is a billionaire and everybody else in this scenario is a poor schmuck).

Because of the occasional fifty yard run, the mean value is a bit on the optimistic side. Get this: the most common outcome of a run is just two yards.”


This is called the “mode”, or the value appearing with the highest frequency in the data. Every single year in modern football, the three most common outcomes of a rushing attempt are one yard, two yards, or three yards (the typical order is 2/3/1). Like I said, running is hard.


“Another problem with that ‘4.1’ number is that very good running backs tend to get many more carries than average or below-average running backs, so the number is skewed in favor of a very high talent level. If you just picked a running back at random from the NFL and gave him the ball, or hell, even got a league-average running back on purpose, your expected value would almost certainly be lower than 4.1.”


That’s quite right, and it leads to the somewhat bizarre situation where most running backs are “below average” compared to the league-average YPC. And that’s no wonder, when the “average” is largely made up of guys like Adrian Peterson, who had more rushing attempts this year than anyone. It makes it very hard to “beat the average”.

One way to get around this is to find the average YPC for each running back individually, then average those averages together. This way, each player is weighted equally in the resulting mean. If we calculate this “YPC over players”, we get a much lower 3.54 for the past six years.
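
To make the difference between the two averages concrete, here is a rough sketch with three invented backs. The names and numbers are made up for illustration and are not from the actual data set; only the two ways of averaging matter.

```python
# Toy numbers, made up for illustration: (rushing yards, attempts) per back.
backs = {
    "workhorse": (1400, 300),
    "rotation": (300, 80),
    "backup": (60, 20),
}

total_yards = sum(yards for yards, attempts in backs.values())
total_attempts = sum(attempts for yards, attempts in backs.values())

# Pooled YPC: every carry counts equally, so high-volume backs dominate.
pooled_ypc = total_yards / total_attempts

# "YPC over players": every back counts equally, whatever his workload.
per_player = [yards / attempts for yards, attempts in backs.values()]
ypc_over_players = sum(per_player) / len(per_player)

print(round(pooled_ypc, 2), round(ypc_over_players, 2))
```

Because the high-volume back also has the best average in this toy set, the pooled number comes out higher than the player-weighted one, which is the same direction as the 4.1 versus 3.54 gap described above.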


“That also gets you a lot closer to the median run, which is about 3 yards. We don’t know how far exactly the median run is, because yardage is only recorded in terms of discrete yards, but it’s probably somewhere between 3 and 3.5.”


If we line up all the runs in a given season in order of shortest to longest, the median is the rushing attempt right in the middle. It’s also known as the 50th percentile – half of all runs go about 3 yards or less.

I say “about” because only whole yards are recorded in the stat books, and play spotting is a bit dicey to begin with. In the official statistics for rushing attempts, the median is exactly three yards – just keep in mind that the “real” median is probably slightly more. The mean does not have this problem because we are averaging together a lot of whole numbers, which has no problem producing decimals.
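
For anyone who wants to poke at the three "typical run" measures side by side, here is a small sketch using Python's standard statistics module. The eleven carries are invented, with one long breakaway thrown in to show how the mean gets dragged around.

```python
import statistics

# Made-up yardage for eleven carries, including one long breakaway.
runs = [-1, 0, 2, 2, 2, 3, 4, 4, 5, 7, 40]

mean_run = statistics.mean(runs)      # pulled upward by the 40-yarder
median_run = statistics.median(runs)  # middle carry once sorted
mode_run = statistics.mode(runs)      # most frequent outcome

print(mode_run, median_run, round(mean_run, 2))
```

Same ordering as the real data: mode below median below mean, which is what a long right tail does.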

Anyways, the idea of quantiles or “percentiles” (like the median at the 50th percentile) can, I think, be a useful one for us here. There are two general rules I want you to remember.

FP’s Rules of Run Quantiles:
1) The 1-3-5 rule. Over a quarter of rushing attempts are over by 1 yard. About half of rushing attempts are over by 3 yards. About ¾ of rushing attempts are over by 5 yards. These are the first, second (median), and third quartiles.


“Along with the minimum and maximum (-17 and 97 yards in this data set, respectively), this comprises the five-number summary popularized by my man Tukey.”


… uh, thanks Ernie.

2) The 10 at 10 rule. Any run that goes at least 10 yards is among the longest 10% of rushing attempts. This is the 90th percentile. By comparison, all runs that lose a yard or more are among the 10% worst rushing attempts.
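
Here is a hedged sketch of how those quantiles could be pulled out of a list of runs. The percentile function below is a simple nearest-rank version written for this example (other definitions interpolate and can give slightly different answers), and the twenty carries are invented so that they happen to follow both rules.

```python
def percentile(sorted_runs, p):
    """Nearest-rank percentile: smallest value with at least p% of runs at or below it."""
    n = len(sorted_runs)
    rank = (p * n + 99) // 100  # integer ceiling of p * n / 100
    return sorted_runs[max(rank, 1) - 1]


# Twenty made-up carries, already sorted from shortest to longest.
runs = [-3, -1, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 10, 12, 31]

# Five-number summary: minimum, three quartiles, maximum.
five_number = (
    runs[0],
    percentile(runs, 25),
    percentile(runs, 50),
    percentile(runs, 75),
    runs[-1],
)
ninetieth = percentile(runs, 90)

print(five_number, ninetieth)
```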

So what do you think, Ernie? Any critical insights here before we move on?


“I think this is all tedious bullshit and I’d rather just show the people what the run distribution looks like.”


… yeah OK that’s fair.

When you give a guy a football, this is what we think you should see in his future:



See, before the play has actually unfolded, a “point estimate” of what to expect (like a mean, median, or mode), is just not going to cut it here. Running the ball is a game of probabilities. He might get creamed behind the line of scrimmage by a DT that caught their blocker flat-footed. The TE might open up an unexpected gap for an easy five. A linebacker might hesitate on diagnosing a run to the outside because of all the play-fakes you’ve been throwing at him. And every once in a while, your running back might hit the crease at just the right time, break a tackle, and sprint away into an open secondary. There are so many moving parts, running the ball for a particular amount of yardage is always a bit of a gamble.

So rather than thinking “here is the average, and some run is either above or below that average”, we want you to flip the script. Instead, consider that there is a universe of possibilities that could occur when you hand that ball to your running back. By the end of the play, we’ve witnessed one of those universes. In short, we’ve taken a sample from a probability distribution.

What I’ve shown you above is called a “histogram”. It shows the proportion of runs that have gone for each number of yards. As I mentioned above, these discrete “steps” are just a reflection of the stat-keeping, not the actual run. So let’s smooth over them:




“Now THAT is what a typical run looks like!”


There’s really a lot you can learn from a density plot like this.

You can really see the “skew” we were discussing earlier (the Bill Gates thing). That long right tail goes all the way out to 100 yards. We just cut it off early for this chart. The black line down the middle, the mean yardage, is clearly pulled in that direction by that small set of very long runs.

You can see how common it actually is to lose yardage on a run. You can see the mode at 2 yards (the most common outcome). You can see the big bulk of runs going just a couple of yards. My favorite part – you can even see the slight “divot” at exactly 10 yards, where people are more likely to get exactly 9 or exactly 11 yards than exactly 10. We’ll come back to that one later.

But seeing those quantiles – the medians, the 90th percentile etc – is a bit tough here. We can make things easier for you by rearranging the data a bit. Rather than show the density at each point (i.e. the proportion of runs that go a particular number of yards), we can instead add up the cumulative density at each point (i.e. the proportion of runs that go at most a particular number of yards).



These can be tough to read if you’re not used to them. Just find a rushing yardage you’re interested in along the x-axis. Say 0 yards – the line of scrimmage. Move straight up from there until you hit the line – in this case, at about 0.2 on the y-axis. That means that about 20% of runs are already over by the time they reach the line of scrimmage. If you keep moving up, you’ll now be moving along a short vertical segment of the actual cumulative distribution itself. The length of this segment shows the proportion of runs that traveled exactly 0 yards. It corresponds with the relative size of the bars in the histogram above. And finally, keep traveling up and you’ll reach the end of this vertical segment, at about 0.1 along the y-axis. That means that about 10% of runs have ended before you even get to the line of scrimmage. Put it all together, and that one check told you that 10% of runs make negative yardage, and that an additional 10% of runs are stopped at the line, for a total of 20% of runs finished before they even gain positive yardage.


“These cumulative density plots are so information-dense. I’m really quite partial to them.”


Stunner.

Just in case you aren’t, here’s a list of the proportion of runs (out of 1) that have ended by the time they reach particular yardages:

pre:
Yards	proportion dead
-3	0.0314 
-2	0.0579 
-1	0.1015 
0	0.1953
1	0.3159 
2	0.4486 
3	0.5749 
4	0.6755 
5	0.7501  
6	0.8030 
7	0.8420  
8	0.8704  
9	0.8975 
10	0.9106  
11	0.9264 
12	0.9376 
13	0.9470 
14	0.9545 
15	0.9607
That covers all runs but the lowest and highest 5%.
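
If you want to go back from the cumulative numbers to the share of runs ending at exactly each yardage, you just take the step between neighboring rows. A quick sketch using the values from the table above (only the variable names are mine):

```python
# Cumulative "proportion dead" values copied from the table above.
cumulative = {
    -3: 0.0314, -2: 0.0579, -1: 0.1015, 0: 0.1953, 1: 0.3159,
    2: 0.4486, 3: 0.5749, 4: 0.6755, 5: 0.7501, 6: 0.8030,
    7: 0.8420, 8: 0.8704, 9: 0.8975, 10: 0.9106, 11: 0.9264,
    12: 0.9376, 13: 0.9470, 14: 0.9545, 15: 0.9607,
}

# Share of runs ending at exactly y yards = step height between y-1 and y.
exact = {
    y: round(cumulative[y] - cumulative[y - 1], 4)
    for y in cumulative
    if y - 1 in cumulative
}

# Three most common outcomes, most frequent first.
top_three = sorted(exact, key=exact.get, reverse=True)[:3]

print(top_three)                       # [2, 3, 1]
print(exact[0])                        # 0.0938, runs stopped exactly at the line
print(exact[9], exact[10], exact[11])  # 0.0271 0.0131 0.0158, the dip at 10
```

That recovers the 2/3/1 ordering of the most common outcomes and the little divot at exactly 10 yards.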

Now, I’ve given a lot of bad news here so far. One in five runs don’t make it past the line of scrimmage. The most likely outcome is just 2 yards. Half of all runs end by 3 yards. And your odds of making it even 10 yards are about 9:1 against.

But there’s some great news hidden inside that long right tail. Let’s say your running back gets the ball, breaks a tackle, and makes it to five yards. How much further, on average, might you expect him to go? Finding this expected value is actually pretty easy. We just look at all the runs that traveled at least five (or whatever) yards, and calculate how far they ended up going before being stopped. The “Yards Left”, if you will.
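
In code, the "Yards Left" calculation is roughly the following. This is a sketch on invented carries, not the real data, and mean_yards_left is just a name I picked for it.

```python
def mean_yards_left(runs, point):
    """Average extra yardage among runs that reached at least `point` yards."""
    survivors = [r for r in runs if r >= point]
    if not survivors:
        raise ValueError("no runs reached that point")
    return sum(r - point for r in survivors) / len(survivors)


# Made-up carries, not real NFL data.
runs = [-2, 0, 1, 2, 2, 3, 4, 5, 7, 12, 26]

# Only the runs of 5, 7, 12 and 26 yards made it to five yards.
print(mean_yards_left(runs, 5))   # (0 + 2 + 7 + 21) / 4 = 7.5

# Far enough behind the line, every run counts, so this is just mean + 5.
print(mean_yards_left(runs, -5))
```

Swap the mean for a median over the same survivors and you get the median version of the same curve.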



If we look at five yards, you’ll find something pretty heartening: running backs that go at least five yards end up, on average, going for at least another five before going down. And that rate just keeps getting better the further the running back gets into the open field.


“Yeah, but take a look at where you actually start a run, a few yards behind the line of scrimmage. The opposite is true. Every step you take is, on average, a step closer to being tackled.”


Yeah, ok, that’s also true. Take a look at -5 yards. You’ll see that here behind the line of scrimmage, your expected yards remaining is about 9.1 yards. Which, surprise surprise, means that you’ll end up somewhere around… 4.1 yards, or the mean yardage on a rushing attempt. But if you break through the defensive line, things are looking good.


“Basically, you’re seeing in this graph the three major stages of a run: the yards that are blocked, the yards that are contested, and the open field.”


It’s the job of the offensive line to try to help more runs go for more yards. In essence, pushing the cumulative density curve (http://i.imgur.com/1aAoFEZ.png) further to the right. In this “yards left” graph above, there’s almost a straight line on the left all the way up to about -2 or so yards, meaning that each step is bringing you closer to the actual “danger zone” where you are at risk of being tackled, but the blocking by the line ahead of you generally allows you to travel some “free” yards in the backfield from where you got the ball.


After that point, the defensive line kicks in. It’s their job to stop as many runs for as short of a gain as possible, pushing the cumulative density curve to the left.


“That’s cute that you think that.”


Ok, usually their job. Anyways, as we saw from the previous figures, most runs are stopped in this region, between 0-5 yards, depending on the relative maneuvering of the offensive line and the defensive line. A solid running back is one that finds the blocked yards and falls forward every time without coughing up the ball.


“Pretty much”


But working through that battlezone has a payoff. Each step past the D-Line improves your chances of longer and longer runs. We still have the problem of the skewed distribution and the long right tail when we look at averages, but the same pattern shows up if you look at median yards left.



So yeah, running is hard, and most runs don’t get very far. But I find it very hopeful that there are always more yards to look forward to, regardless of where you’re at in a run. Even at the very bottom of that valley, you can fall forward for an extra yard or two. But after that is the sweet taste of open-field running. Running backs should be the ultimate optimists.

Forever_Peace fucked around with this message at 15:44 on Jan 8, 2016

Qwijib0
Apr 10, 2007

Who needs on-field skills when you can dance like this?

Fun Shoe

This was a great read with good flow and didn't feel too dense at all.

foobardog
Apr 19, 2007

There, now I can tell when you're posting.

-- A friend :)

Qwijib0 posted:

This was a great read with good flow and didn't feel too dense at all.

I agree. My stats knowledge is basically AP Stats and The Cartoon Guide to Statistics, and it was completely understandable and enjoyable.

got any sevens
Feb 9, 2013

by Cyrano4747
Goddamn that's amazing.

So I wonder what the most important part of a run is: the rb's skill, weight, or something to do with his blocking? Do good blockers make average rb's great, or vice versa, or is it too many variables to analyze well? How much should teams spend on rb's if they tend to get just 2 yards per run? 3 of those doesn't add up to a first down.

Adrenalist
Jul 8, 2009
Great read. Here's a question, though: how much is a running play supposed to get? Coaches aren't often looking for dancers/home-run hitters, and power isn't exactly supposed to spring you to the house. Similarly, how much of that data is skewed by down-and-distance situations (e.g. 3rd/4th and inches, mostly--plays which are designed to sacrifice any chance of second-level blocking to ensure you get the first down?)

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah
Thanks for the kind words, folks. I appreciate the feedback and the questions. Those are all topics I have looked into and hope to address, actually.

It looks like the primary changes to the real thing will be the addition of a chapter summary at the end of each post, a GitHub repository for the code, and a wonkier spirit of Ernie.

I want to have drafts of chapters 2 and 3 written before putting up the OP, so I can keep stuff coming out at a good clip.

Chapter 2 has an interactive app that's basically done, though, so the major hurdle is already handled there.

Miloshe
Oct 25, 2009

The little chicken girl wants me to ease up!
He can't handle!
He cries like woman!



gently caress. Yes. Don't stop. Ernie/Sanders 2016.

Fenrir
Apr 26, 2005

I found my kendo stick, bitch!

Lipstick Apathy

Adrenalist posted:

Great read. Here's a question, though: how much is a running play supposed to get? Coaches aren't often looking for dancers/home-run hitters, and power isn't exactly supposed to spring you to the house. Similarly, how much of that data is skewed by down-and-distance situations (e.g. 3rd/4th and inches, mostly--plays which are designed to sacrifice any chance of second-level blocking to ensure you get the first down?)

I'm not 100% sure on this but I know I've read it somewhere: Run plays are "supposed" to get enough yards that you could run the ball 3 times and get a first down, so 3.33+ is what you're hoping for on 1st down and 10. The expectation is (obviously) different when you use any other down and distance situation - to the point I don't think you can really compare them.

I mean, you could do so when you're talking just bulk rushing data, but strategy changes so much for a situation like 4th and inches that it's more or less apples to oranges. A 3rd and goal from the 1 situation can only get 1 yard, after all - and most short yardage situations in general do not aim for much more than what's required. The RB (or QB in case of a sneak) often dives forward over the line if it's really short. They're not even trying to run at that point, they just want the first down.

I'm pretty sure that affects bulk rushing numbers in a pretty big way as well since these situations happen a lot. I wonder if there's a way to figure out just how many of those 1-2 yard runs are dives on 4th and 1 or 3rd and goal from the 1, etc etc.

Fenrir fucked around with this message at 16:29 on Jan 16, 2016

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah
We are going to dive into situational factors so hard you'll need scuba gear.

The plan is to explore these run distributions for a while before the chapter on sampling and resampling that transitions us back into averages, THEN we'll do situational influences on those averages.

Spoeank
Jul 16, 2003

That's a nice set of 11 dynasty points there, it would be a shame if 3 rings were to happen with it

Forever_Peace posted:

We are going to dive into situational factors so hard you'll need scuba gear.

The plan is to explore these run distributions for a while before the chapter on sampling and resampling that transitions us back into averages, THEN we'll do situational influences on those averages.

That was going to be my question: runs on 4th and 1, runs within the 5 yard line, etc. etc.

Also how many of these runs are "successful?" How many are runs on 3rd & 15=0 that go for 6 or 7 because the defense played back? Great read though. Are you going to put this stuff on a real website? I would like to be able to reference it in my writing and I don't think "Something Awful Dot Com Forums Poster Forever_Peace found..."

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah

Spoeank posted:

That was going to be my question: runs on 4th and 1, runs within the 5 yard line, etc. etc.

Also how many of these runs are "successful?" How many are runs on 3rd & 15=0 that go for 6 or 7 because the defense played back? Great read though. Are you going to put this stuff on a real website? I would like to be able to reference it in my writing and I don't think "Something Awful Dot Com Forums Poster Forever_Peace found..."

I can give y'all a sneak peek: the single biggest influence on expected yardage is whether or not the line of scrimmage is within 10 yards of the goal. The compressed field and goal-line defenses really wreak havoc, above and beyond the removal of possible big runs. A smaller version of this run-defense effect happens on third down: the closer the line of scrimmage is to the first down marker, the harder it is to gain yardage. Conversely, the farther the team is from the first down marker (on a 3rd down), the more yardage they tend to gain, probably for exactly the reasons you speculated.

As for the website, no I have no plans at the moment. If somebody would like to host this stuff (once it starts coming out), just toss me a pm.

I WILL, however, be posting all of my code on GitHub with a public license. And the interactive apps will be hosted on GitHub as well - anybody who wants to will be able to run them for free.

SurgicalOntologist
Jun 17, 2004

Hi fellow football nerds.

Totally hypothetical question. Let's say you had 24 hours access to NFL next gen stats data, and had to develop and pitch an idea for how to use that data. What would you try?

pangstrom
Jan 25, 2003

Wedge Regret

SurgicalOntologist posted:

Hi fellow football nerds.

Totally hypothetical question. Let's say you had 24 hours access to NFL next gen stats data, and had to develop and pitch an idea for how to use that data. What would you try?
Re: what the stats actually are: Is it a time series of where every player and the ball is on the field at all times (and maybe their velocity & acceleration?)? Let me know and I bet I can come up with a few angles.

If it's a job tryout, good luck!

Impossibly Perfect Sphere
Nov 6, 2002

They wasted Luanne on Lucky!

She could of have been so much more but the writers just didn't care!
Prove that that difference in player speeds is generally inconsequential.

SurgicalOntologist
Jun 17, 2004

pangstrom posted:

Re: what the stats actually are: Is it a time series of where every player and the ball is on the field at all times (and maybe their velocity & acceleration?)? Let me know and I bet I can come up with a few angles.

I won't know in advance exactly what the data is. I hope to god they have a tracker on the ball. They for sure have two trackers per player, one on each shoulder pad.

If they don't give us the raw data and just something stupid like a speed timeseries I'm just going to walk out.

MacheteZombie
Feb 4, 2007
Make terrible player charts like the ones used in soccer:

Market them as coaching, roster management, and if the NCAA tracks these stats, drafting tools.

SlipUp
Sep 30, 2006


stayin c o o l

NC-17 posted:

Prove that that difference in player speeds is generally inconsequential.

This but I'd specifically try to find a correlation between speed and catches of 20+ yards.

pangstrom
Jan 25, 2003

Wedge Regret

SurgicalOntologist posted:

I won't know in advance exactly what the data is. I hope to god they have a tracker on the ball. They for sure have two trackers per player, one on each shoulder pad.

If they don't give us the raw data and just something stupid like a speed timeseries I'm just going to walk out.
It's going to be hard to do outcome-relevant stuff with such low-level data with all the confounds... I don't know what else they're going to give you or let you walk in there with, but this sounds like any prep work you can do will be worth it. Like if you go in there with a higher level database of every play with an eye on what you want that to look like and can sync that with what they give you, etc. I would focus on contrasting good players versus bad ones.

Also like it's going to depend a little on whether you're coming up with stuff for public consumption or for internal team use or something. Like a machine learning something or other might fly with the latter.

-Distance traveled by position in your average game... could do the same thing with top speed
-On passing plays (preferably ones where the ball was thrown downfield to rule out screens etc., if that is possible) it would be interesting to see a distribution of "distance to nearest opponent" for good CB vs. bad CB, good WR vs. bad WR. Maybe you can lock it to when the ball leaves the QB's hand.
-Seems like LBs who are good often accelerate like one time when they make the tackle on running plays. They take the angle to where the ballcarrier is going to be and then show up and snuff him out. The inverse goes for shifty players with the ball who are getting extra yards... they are causing defenders to accelerate and decelerate a lot. Maybe an acceleration profile on different play types for different players might be illustrative. Maybe you could get something out of the derivative of acceleration or "jerkiness"
-Collisions may be too hard to reliably detect in the data but that might be fun. Like "hardest hitting defenders" stuff, for example.
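
A rough sketch of two of those measurements (separation and the acceleration/"jerkiness" profile), assuming the tracker data boils down to (x, y) positions in yards at a fixed sampling interval. That format is a guess, and the function names are made up for illustration:

```python
import math

def finite_difference(values, dt):
    """Forward difference of a uniformly sampled series (one sample shorter)."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]

def motion_profile(positions, dt):
    """Return (speed, accel, jerk) from a list of (x, y) samples.

    Speed is the scalar distance covered per sample, so accel here is the
    change in speed (not a direction-aware vector) and jerk is its derivative.
    """
    speed = [math.dist(p, q) / dt for p, q in zip(positions, positions[1:])]
    accel = finite_difference(speed, dt)
    jerk = finite_difference(accel, dt)
    return speed, accel, jerk

def nearest_opponent_distance(player, opponents):
    """Distance from one player to the closest opponent at a single instant,
    e.g. the sample where the ball leaves the QB's hand."""
    return min(math.dist(player, o) for o in opponents)
```

Real tracker data would be noisy, so some smoothing before differencing would probably be needed; jerk in particular amplifies noise.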

Fenrir
Apr 26, 2005

I found my kendo stick, bitch!

Lipstick Apathy

NC-17 posted:

Prove that that difference in player speeds is generally inconsequential.

That would be an interesting exercise. It's pretty well held to be true, aside from people who are just blazing fast like Randy Moss.

Another good example would be DeSean Jackson, whose early years especially revolved around outrunning the poo poo out of your secondary.

pangstrom
Jan 25, 2003

Wedge Regret
Like I agree it's pretty glib and superficial, especially the way it's being used now (PEYTON MANNING TOPPED OUT AT 15.0002 MPH on HIS SCRAMBLE or whatever), but guessing that's not the attitude that's going to impress whoever is giving him access to the data.

SurgicalOntologist
Jun 17, 2004

Yeah I'll want to go deeper than that, regardless of what they're looking for.

And I guess I'll share what it is: http://www.eventbrite.com/e/nfl-hackathon-tickets-20253069476

So it's not a job offer, but it could certainly lead to one. I read over the rules and it seems fairer than I would have guessed, that is you keep the rights.

For higher-level variables, since I don't know what they'll provide, I'll be bringing in this database: http://www.armchairanalysis.com/
I can't find anything in the rules to suggest this wouldn't be allowed. This is one step better than nfldb as it contains things that don't show up in box scores.

It's not super clear if they prefer something for internal or external consumption, but I'm not too worried. I'll do whatever seems interesting, simple enough to accomplish quickly, but complex enough to be impressive.

Pangstrom, your ideas are similar to what I've been considering. Other ideas on my brainstorming sheet are:
- success of misdirection (e.g. counter runs)
- pocket deformation
- a measure of fatigue
- reaction time to snap

pangstrom
Jan 25, 2003

Wedge Regret
Those are good -- I esp. like the reaction time one. I would be way more interested in that than the top speed stuff they trotted out all season.

Seems weird to "Automatically calculate statistics or descriptions of what occurred on plays in the game" by coming at those from the back end BUT that would be a fun problem... maybe not for a 24 hour thing, though, since that could be deep water for weird play edge cases or penalties etc.

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah

SurgicalOntologist posted:

Yeah I'll want to go deeper than that, regardless of what they're looking for.

And I guess I'll share what it is: http://www.eventbrite.com/e/nfl-hackathon-tickets-20253069476

So it's not a job offer, but it could certainly lead to one. I read over the rules and it seems fairer than I would have guessed, that is you keep the rights.

For higher-level variables, since I don't know what they'll provide, I'll be bringing in this database: http://www.armchairanalysis.com/
I can't find anything in the rules to suggest this wouldn't be allowed. This is one step better than nfldb as it contains things that don't show up in box scores.

It's not super clear if they prefer something for internal or external consumption, but I'm not too worried. I'll do whatever seems interesting, simple enough to accomplish quickly, but complex enough to be impressive.

Pangstrom, your ideas are similar to what I've been considering. Other ideas on my brainstorming sheet are:
- success of misdirection (e.g. counter runs)
- pocket deformation
- a measure of fatigue
- reaction time to snap

Congrats, sounds like a lot of fun! I'm completely jealous of access to that kind of data, but I'll settle for living vicariously. Drop a line if you want to strategize.

I'm no expert here, but I personally would probably try to start with a minimum workable product, then iterate and add complexity as you go.

So regardless of the topic you pick, I would 1) simplify the data, and 2) sample a training set. I'm imagining that player-tracker data would take the form of an X/Y axis position vector (or two, if they give you each shoulder pad) at some sampling rate. Hopefully, they already provide the basic infrastructure to plot player positions at a given time t so you don't need to write your own visualization script.

So, for example, if the data did indeed come in that format, and you wanted to pursue the snap reaction time idea (which I really like), here's how I might approach it, with a focus on modeling the impact of snap reaction on QB pressures.

1) Select the plays of some known set of high-volume edge rushers. Remove any plays involving penalties. Cut a training set and a test set.
2) Make a snap reaction script. This would need to identify the first period of post-snap movement. If they give you the position vector for each pad, I might average them. This would only work if the sampling rate was sufficiently high on the position vectors.
2a) Just the snap reaction script is enough to produce some interesting minimal product. I'd take a half-hour to explore snap reaction over time (e.g. as a function of the number of plays that player has been a part of in the game or the drive, and/or clock time) and whether there is a significant home vs. away difference (where reaction time for edge rushers might actually be SLOWER at home because crowd noise will be louder).
3) Make a blocker identifier script. I'd probably try to sample player positions at t=1s or t=1.5s or something and calculate the closest linear distance (or find all players within some minimum distance). I may restrict just to passing plays depending on how it looked. I'd probably also remove stunts (might be possible to classify these using the ratio of x distance to y distance at time t=1s, where a stunt would have a lot of x movement with very little y movement within the first second of the snap).
3a) Run the snap reaction script on the identified blocker.
4) Here I'd start modeling QB pressures as a function of snap reaction times (and the difference between the rusher and the blocker). The easy way would be if you already had binary outcomes of sacks and pressures per play: just run a logistic regression using, say, the difference in snap reaction times between the rusher and the blocker with a random-effects structure for the rusher. Add other fixed effects as required, including home/away status, blocker position (tackle, guard, center, TE, RB), whatever time influences you found in 2a etc. Cross-validate on the test set.
4a) Make a "beat the blocker" script, looking in some post-snap time window for any instance where the rusher is closer to the QB (by linear distance) than the blocker for some specified length of time. Run the model on these "beat the blocker" binary outcomes, regardless of whether the play resulted in a sack or pressure. Cross-validate.
5) Start building towards models of the whole pass-rush unit. I might start with a random forest model of QB pressures using snap reaction differences between each rusher and their blocker (because the combined effect is probably nonparametric).
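
To make steps 2 and 3 concrete, here is a minimal sketch. It assumes each player's track is a list of (x, y) positions in yards sampled at a known rate (hz) and that the snap frame is already known; the 0.15-yard movement threshold and all function names are placeholders I made up, not anything from the real data:

```python
import math

def pad_midpoint(left_pad, right_pad):
    """Average the two shoulder-pad positions into one player position."""
    return ((left_pad[0] + right_pad[0]) / 2, (left_pad[1] + right_pad[1]) / 2)

def snap_reaction_time(track, snap_index, hz, threshold=0.15):
    """Seconds from the snap until the player first moves more than
    `threshold` yards from where he was at the snap (None if he never does)."""
    origin = track[snap_index]
    for i in range(snap_index + 1, len(track)):
        if math.dist(track[i], origin) > threshold:
            return (i - snap_index) / hz
    return None

def identify_blocker(rusher_track, blocker_tracks, snap_index, hz, t=1.0):
    """Name of the blocker closest to the rusher `t` seconds after the snap.

    `blocker_tracks` maps a player name to his list of (x, y) positions.
    """
    i = snap_index + round(t * hz)
    return min(blocker_tracks,
               key=lambda name: math.dist(rusher_track[i], blocker_tracks[name][i]))
```

The rusher-minus-blocker difference in snap_reaction_time would then be the predictor that goes into the regression in step 4.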

Again, this is mostly just an exercise to illustrate the process of starting small, iterating frequently, and restricting data to iterate quickly. I would be really hesitant to work with the time-series stuff directly because the iteration time would be so much higher (you'll notice I'm only sampling position at time t and/or reducing some time window to a single value, like a binary "beat the blocker" or a measure of initial movement post-snap).
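
As an illustration of collapsing a time window into a single value, a "beat the blocker" flag could be sketched like this. Same assumptions as before: tracks are lists of (x, y) positions on a shared clock, and the names and the consecutive-sample rule are my own stand-ins:

```python
import math

def beat_the_blocker(rusher, blocker, qb, start, end, min_samples):
    """True if the rusher is closer to the QB than the blocker is for at
    least `min_samples` consecutive samples within frames [start, end)."""
    run = 0
    for i in range(start, end):
        if math.dist(rusher[i], qb[i]) < math.dist(blocker[i], qb[i]):
            run += 1
            if run >= min_samples:
                return True
        else:
            run = 0  # streak broken, start counting again
    return False
```

Requiring a streak rather than a single frame is just one way to keep tracker jitter from producing false positives.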

Personally, the topic I would probably be most inclined to pursue with data like this would be trying to classify offensive and defensive schemes based on personnel groupings and starting positions (possibly with a blitz checker script that looks for movement past the line of scrimmage within some time window for ILBs, safeties, and CBs on passing plays), then model play outcomes. What is the measurable impact of nickel formations on running and passing plays? Single-high safeties on big play success rate? How does a power run fare against different defensive fronts? But that might be a project bigger than 24 hours.
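
The blitz checker part might look something like the sketch below. It assumes x runs along the length of the field, the defense lines up at x greater than the line of scrimmage, and "crossing" means x drops below it. Those conventions, the 1.5-second window, and the names are all assumptions on my part:

```python
def crossed_line(track_x, los_x, snap_index, hz, window=1.5):
    """True if a defender's x position drops below the line of scrimmage
    within `window` seconds of the snap."""
    end = min(len(track_x), snap_index + round(window * hz) + 1)
    return any(x < los_x for x in track_x[snap_index:end])

def count_blitzers(defender_tracks, los_x, snap_index, hz, window=1.5):
    """Number of defenders (ILBs, safeties, CBs) who cross the line in time.

    `defender_tracks` maps a player name to his list of x positions.
    """
    return sum(crossed_line(xs, los_x, snap_index, hz, window)
               for xs in defender_tracks.values())
```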

Good luck buddy!

pangstrom
Jan 25, 2003

Wedge Regret
Another simple thing that could be interesting is heatmaps of where RBs go when/after they get the ball, relative to the initial spot of the ball. You could break it out by the perimeter backs vs. the one-cut backs, something else, or classify backs yourself based on which they tend to do etc.
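
A bare-bones version of the binning behind such a heatmap, assuming each carry is a list of (x, y) positions in yards plus the ball's initial spot (the format and names are hypothetical):

```python
import math
from collections import Counter

def heatmap_counts(tracks, ball_spots, bin_size=1.0):
    """Count how many position samples land in each grid cell, with every
    carry shifted so the ball's initial spot is the origin."""
    counts = Counter()
    for track, (bx, by) in zip(tracks, ball_spots):
        for x, y in track:
            cell = (math.floor((x - bx) / bin_size),
                    math.floor((y - by) / bin_size))
            counts[cell] += 1
    return counts
```

Run it once per group of backs (perimeter vs. one-cut, or whatever classification you settle on) and plot the counts side by side.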

Spoeank
Jul 16, 2003

That's a nice set of 11 dynasty points there, it would be a shame if 3 rings were to happen with it
I'm editing Forever_Peace's next post right now and you guys are in for a treat. :swoon:

SurgicalOntologist
Jun 17, 2004

Welp, I'm out of the hackathon. I was originally told remote participation would be allowed, but the NFL put the hammer down on that. Sucks.

I'm a scientist studying human movement, and one of my specialties is analyzing motion tracking data.... so I'd like to think I was well qualified for this. Hope I get another chance at a dataset like that sometime.

Looking forward to Forever_Peace's next chapter though!

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah
Funny you should mention that. The Ground Control Project is live. First chapter and first app are up. Chapter 2 and two more apps go up within the week. Goons: assemble.

(sorry to hear you're out of the hackathon though. That really sucks. Was looking forward to having a goon rep talk to the NFL folks)

pangstrom
Jan 25, 2003

Wedge Regret
Came in here to see the hackathon results/updates and am bummed, but I will go check out ground control at least.
