  • Locked thread
Amy Pole Her
Jun 17, 2002
I feel like you did other kids homework

JPrime
Jul 4, 2007

tales of derring-do, bad and good luck tales!
College Slice
Edquisha?

the mean lunch lady
Jun 24, 2009

went mad at sea
lots were drawn
Kroenke didn't survive
he was delicious
That's the only bad name of the bunch imo

Spoeank
Jul 16, 2003

That's a nice set of 11 dynasty points there, it would be a shame if 3 rings were to happen with it
yo F_P I shot you an email!

DJExile
Jun 28, 2007


Amy Pole Her posted:

I feel like you did other kids homework

hahahaha this is the impression i'm getting too

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah

Amy Pole Her posted:

I feel like you did other kids homework

Worse, I did other dopey-rear end side projects in school too that were just as much work but didn't actually help out my friends any. In high school I spent months writing a complete symphony that nobody can actually play because I didn't know how to transpose music into any instruments that aren't in C. In elementary school I invented a game based around the statistics on the back of basketball cards (no it didn't make any sense).

Like, don't get me wrong, I also did normal people things. Played 2 sports a year since age 5. I just also did a lot of weird useless things.

DJExile
Jun 28, 2007


i really want to know what that basketball card game was

sourdough
Apr 30, 2012

DJExile posted:

i really want to know what that basketball card game was

Yes, please post it

ifuckedjesus
Sep 5, 2002
filez filez filez filez filez filez filez filez filez
Any chance we will see the receiving information for either RBs or WRs in the future?

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah

ifuckedjesus posted:

Any chance we will see the receiving information for either RBs or WRs in the future?

Eventually! Pulling play-by-play receiving data (and passing data) will give me data for everybody, not just RBs, so I'm intending to do it when I'm ready to move on from the ground game. Tentative plan for the rest of this offseason is 1) expanded time frame, 2) QB runs, 3) maybe some Bayesian modeling, 4) maybe some predictive modeling. (Requests and suggestions are welcome)

DJExile posted:

i really want to know what that basketball card game was

What I can remember was pretty incoherent. I definitely had 5 on 5 where folks would take turns literally moving players around, and you had to have 1 of each position. I'm pretty sure I simulated a hundred-sided die using slips of paper labeled 0-9 (draw for the 10s digit, then the 1s digit, 00 is 100) where successful scores required drawing under the player's career FG % (or 3pt %). Literally the only thing defenders could do was block shots or steal the ball, since those were the only defensive stats on the cards. I think there might have been a formula to determine rebounds?

A draft definitely kicked things off. That was the important bit, clearly. I only had about ~60 basketball cards so the universe of players was not large. We never finished an actual game (turns out it was basically make believe fantasy basketball).
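
For anyone curious how the paper-slip d100 plays out, here is a minimal sketch. It assumes the two slips are drawn independently (with replacement) and that "under" means strictly less than the FG%; the function names are made up for illustration.

```python
import random

def draw_d100(rng):
    """Draw two slips labeled 0-9 (tens digit, then ones digit); 00 counts as 100."""
    tens = rng.randrange(10)
    ones = rng.randrange(10)
    roll = tens * 10 + ones
    return 100 if roll == 0 else roll

def shot_succeeds(fg_pct, rng):
    """A shot scores when the draw lands strictly under the player's career FG%."""
    return draw_d100(rng) < fg_pct

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so reruns give the same draws
    attempts = 10_000
    made = sum(shot_succeeds(45, rng) for _ in range(attempts))
    print(f"simulated make rate for a 45% shooter: {made / attempts:.3f}")
```

Under the strict "draw under" reading, a 45% shooter scores on draws 1 through 44, so the game shaves one point off everyone's percentage. Treating a draw equal to the FG% as a make would remove that.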

Forever_Peace fucked around with this message at 02:22 on Jul 10, 2017

DJExile
Jun 28, 2007


Forever_Peace posted:

What I can remember was pretty incoherent. I definitely had 5 on 5 where folks would take turns literally moving players around, and you had to have 1 of each position. I'm pretty sure I simulated a hundred-sided die using slips of paper labeled 0-9 (draw for the 10s digit, then the 1s digit, 00 is 100) where successful scores required drawing under the player's career FG % (or 3pt %). Literally the only thing defenders could do was block shots or steal the ball, since those were the only defensive stats on the cards. I think there might have been a formula to determine rebounds?

A draft definitely kicked things off. That was the important bit, clearly. I only had about ~60 basketball cards so the universe of players was not large. We never finished an actual game (turns out it was basically make believe fantasy basketball).

That's... actually really clever! You basically made something between fantasy basketball and tabletop card games.

pmchem
Jan 22, 2010


I have a NFL data science question and this seems like the most appropriate place. Let's talk raw data sources. Ground Control uses nfldb.

As far as I'm aware (and I might be wrong), if a player does not collect stats in a particular game, nfldb does not differentiate between: (1) player was active, but did not play, (2) player was inactive due to injury, (3) player was suspended. I'm also not sure what it does with players who are not on a NFL roster for a given week but collect stats other weeks; presumably it has week-by-week NFL roster status for each player.

Is there an easy source for data types 1-3? Especially #2+#3? Preferably in a way that can be easily imported via a python interface such as nfldb? I am looking to do a little machine learning, but missing that data would make the effort pointless. Manually entering the data would be prohibitive on my human-time.

SurgicalOntologist
Jun 17, 2004

It's not free and I didn't check that it had exactly what you want, but I would be surprised if it doesn't (I think it has who was on the field for every play, not sure about roster status): http://armchairanalysis.com/

Certainly cheaper than the enterprise data vendors like STATS.

Chichevache
Feb 17, 2010

One of the funniest posters in GIP.

Just not intentionally.

Amy Pole Her posted:

I feel like you did other kids homework

Yes, but here's the spooky surprise at the end of the movie: he was also a homeschooled only child.

got any sevens
Feb 9, 2013

by Cyrano4747

DJExile posted:

That's... actually really clever! You basically made something between fantasy basketball and tabletop card games.

And 5vs5 is a good number too, enough for some variation between players, you could have a 'bench' for substitutions, etc. :allears:

I'd also institute a team total value salary cap system, so you could have a few mediocre cards and a couple greats, or all average levels, etc. to prevent deck stacking
I made a homemade Triple Triad deck like that a few years ago, it was fun to tinker with.

Edit: back on topic, i'm eager to see what your qb runs analysis is, you always hear that announcer dicksucking about tom brady qb sneaks, i'm curious how much of it is bs or is dependent on the Oline, and what other qb's/teams do with the run (esp DangerRuss)

got any sevens fucked around with this message at 15:51 on Aug 2, 2017

blackmongoose
Mar 31, 2011

DARK INFERNO ROOK!

got any sevens posted:

And 5vs5 is a good number too, enough for some variation between players, you could have a 'bench' for substitutions, etc. :allears:

I'd also institute a team total value salary cap system, so you could have a few mediocre cards and a couple greats, or all average levels, etc. to prevent deck stacking
I made a homemade Triple Triad deck like that a few years ago, it was fun to tinker with.

By the time you've gone this far you might as well play Strat-O-Matic, though that kind of stat based system obviously works better for sports with discrete plays like baseball and football. Actually, a lot of the statistical work in this thread could probably improve their modeling of run plays or form a basis for your own system, though I think baseball is where most of the fantasy simulation style market resides.

Ghost of Reagan Past
Oct 7, 2003

rock and roll fun

pmchem posted:

I have a NFL data science question and this seems like the most appropriate place. Let's talk raw data sources. Ground Control uses nfldb.

As far as I'm aware (and I might be wrong), if a player does not collect stats in a particular game, nfldb does not differentiate between: (1) player was active, but did not play, (2) player was inactive due to injury, (3) player was suspended. I'm also not sure what it does with players who are not on a NFL roster for a given week but collect stats other weeks; presumably it has week-by-week NFL roster status for each player.

Is there an easy source for data types 1-3? Especially #2+#3? Preferably in a way that can be easily imported via a python interface such as nfldb? I am looking to do a little machine learning, but missing that data would make the effort pointless. Manually entering the data would be prohibitive on my human-time.

If you could get week-to-week roster data, you could smack the data sets together reasonably quickly. It'd be kind of convoluted but it shouldn't be prohibitively time-consuming or difficult. Like, just the raw data should be fine, you don't need much more than that.

That data is likely available on a source like Pro Football Reference but I can't be 100% sure. It wouldn't be easily importable but you should be able to get it if you're dedicated.
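
A rough sketch of the "smack the data sets together" step, assuming you already have weekly roster rows from somewhere. The field names and status labels below are hypothetical and do not reflect nfldb's actual schema or any real source.

```python
# Hypothetical rows: one per player-week with stats, one per player-week on a roster.
weekly_stats = [
    {"player": "A", "week": 1, "rush_att": 12},
    {"player": "A", "week": 3, "rush_att": 9},
]
roster_status = [
    {"player": "A", "week": 1, "status": "active"},
    {"player": "A", "week": 2, "status": "inactive_injury"},
    {"player": "A", "week": 3, "status": "active"},
    {"player": "B", "week": 1, "status": "suspended"},
]

def merge_status(stats, roster):
    """Left-join stats onto roster rows so every player-week keeps its status."""
    stats_by_key = {(row["player"], row["week"]): row for row in stats}
    merged = []
    for row in roster:
        stat = stats_by_key.get((row["player"], row["week"]))
        merged.append({
            "player": row["player"],
            "week": row["week"],
            "status": row["status"],
            # Zero attempts plus the status tells you *why* there were no stats.
            "rush_att": stat["rush_att"] if stat else 0,
            "recorded_stats": stat is not None,
        })
    return merged

if __name__ == "__main__":
    for row in merge_status(weekly_stats, roster_status):
        print(row)
```

Driving the join from the roster table (rather than the stats table) is the point: weeks with no stats still show up, tagged with the reason.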

This is actually the hardest part of doing data science.

Ghost of Reagan Past fucked around with this message at 14:31 on Aug 6, 2017

pmchem
Jan 22, 2010


Ghost of Reagan Past posted:

This is actually the hardest part of doing data science.

yeah, which is why I'm asking around. I looked at PFR and their weekly data seems limited to starters and participants. No information about deactivation due to injury vs suspension.

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah
I don't know either but if you find a source let me know.

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah
I don't mean to excite anybody but I'm currently optimizing hyperparameters for a support vector machine ensemble.

That means we are (finally) doing some machine learning, so naturally we're going to do machine learning like a goddamned motherfucker.

(just dropping a line to let folks know this isn't dead)

pmchem
Jan 22, 2010


Forever_Peace posted:

I don't know either but if you find a source let me know.

Well if you find a source of that data, I'll throw some of my ML skills at some things. Need the data, though.

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah
I've recently re-derived my own versions of "Expected Points Added" and "Success Rate" using machine-learning.

After about 3 weeks of model fitting and another week of quality checks and tweaking, I finally started poking around the results. One of the first things I did was estimate how above or below average each player has been with their contributions to their team's drives (in units of expected points given the game situation). I used a method that weights performance by sample size, so players with fewer carries have their estimates "shrunk" towards the mean. The outliers need to have a lot of high-quality carries.

Devonta Freeman: 1 standard deviation above average
Marshawn Lynch and David Johnson: 2 SD above average
Le'Veon Bell, DeMarco Murray, Adrian Peterson: 3 SD above average
Arian Foster: 3.4 SD above average
LeSean McCoy: 3.9 SD above average

Jamaal Charles: 5.5 SD above average

Every time I sit down and try to come up with a fresh way to look at the run game, Jamaal Charles comes along and blows the metric the gently caress up. Jamaal Charles is impossible.
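
The "shrunk towards the mean" idea can be sketched in a few lines. This is a simplified one-parameter version, not necessarily the exact model used for the numbers above; `k` is a hypothetical constant (roughly, the number of carries at which a player's raw average gets half the weight).

```python
def shrink(player_mean, n_carries, league_mean, k):
    """Pull a player's raw average toward the league mean, less so with more carries.

    weight = n / (n + k): 0 carries -> all league mean, many carries -> mostly raw average.
    """
    weight = n_carries / (n_carries + k)
    return league_mean + weight * (player_mean - league_mean)

if __name__ == "__main__":
    league_mean = 0.0
    k = 50  # hypothetical; in a mixed model this comes from the variance components
    for n in (5, 50, 500, 1500):
        est = shrink(0.20, n, league_mean, k)
        print(f"{n:>5} carries at +0.20 EPA/carry -> shrunk estimate {est:+.3f}")
```

Same raw average, very different estimates: a handful of carries barely moves a player off the mean, which is why the outliers need volume as well as quality.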

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah
Some other players 2+ SD above average:
Ezekiel Elliott
Mike Gillislee
Mark Ingram
BenJarvus Green-Ellis
Jeremy Hill

Looks good with limited samples:
Dion Lewis
Kenneth Dixon
Kendall Hunter
Rex Burkhead
Derrick Henry

2 SD below average:
Tre Mason
Alfred Blue
Darren McFadden
Bernard Pierce

Worst Success Rate Estimate:
Knile Davis

pmchem
Jan 22, 2010


Forever_Peace posted:

Some other players 2+ SD above average:
Ezekiel Elliott
Mike Gillislee
Mark Ingram
BenJarvus Green-Ellis
Jeremy Hill

The fact that "the Law Firm" BJGE makes that list should only reinforce the fact that it's really the rushing offense that is being judged here, not the individual RB. While there may be a high correlation between the two, false positives (and negatives) will sneak in. BJGE is a career 3.9 YPC guy who managed a high TD/carry rate due to playing on the Patriots a few years.

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah

pmchem posted:

The fact that "the Law Firm" BJGE makes that list should only reinforce the fact that it's really the rushing offense that is being judged here, not the individual RB. While there may be a high correlation between the two, false positives (and negatives) will sneak in. BJGE is a career 3.9 YPC guy who managed a high TD/carry rate due to playing on the Patriots a few years.

I agree that this isn't intended to disentangle a running back from the rest of the team (I'm not sure this is even a meaningful concept in football), but law firm is absolutely not a false positive for EPA. Law firm did two things that EPA loves: he didn't lose yardage much (expected points on a drive plummets on 2nd or 3rd and long), and more importantly, he didn't put the ball on the ground. In his entire 4 years with the Pats, he literally didn't fumble once. Fumbles are devastating: you lose the entire expected points for the drive, AND set the opposing team up with expected points in their favor. Law Firm is the perfect example of a guy that was an important contributor to his team's ability to score points without racking up gaudy numbers.

I think the YPC you mentioned is the poor measure for player contribution here.

edit: Yards gained only explains about a third of the variance in Expected Points Added on a play.
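
To make the fumble arithmetic concrete, here is a small sketch. The expected-points values are made-up round numbers for illustration, not output from the actual model.

```python
def epa(ep_before, ep_after):
    """Expected Points Added: change in the offense's expected points across one play."""
    return ep_after - ep_before

def epa_lost_fumble(ep_before, opponent_ep_after):
    """On a lost fumble the offense's new expectation is the negative of the
    opponent's, so it gives up its own drive value AND the opponent's new value."""
    return epa(ep_before, -opponent_ep_after)

if __name__ == "__main__":
    # Hypothetical situation values, in points.
    short_loss = epa(ep_before=2.0, ep_after=1.2)
    fumble = epa_lost_fumble(ep_before=2.0, opponent_ep_after=1.5)
    print(f"2-yard loss: {short_loss:+.2f} EPA")
    print(f"lost fumble: {fumble:+.2f} EPA")
```

With numbers like these, one fumble wipes out several ordinary bad carries, which is why a back who never puts the ball on the ground can grade out well without big yardage.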

Forever_Peace fucked around with this message at 23:47 on Aug 27, 2017

pmchem
Jan 22, 2010


Well, without being able to peer into your formula and see how much each particular factor counts towards your EPA, it's hard to frame an argument. I can only discuss by example. So, example: David Johnson has 9 career fumbles compared to BJGE's 5, and DJ has those fumbles on a little more than half the career touches of BJGE. I'd take DJ over BJGE on my football team any day of the week and twice on Sunday. They were rated similarly in your SD metric.

BJGE, on the other hand, took years to earn a one-dimensional / GLB rushing job with the Pats. Their front office (arguably the best in the NFL) let him walk in FA. He had one good season on a 3-year contract with Cincy, and washed out of the league before age 30, despite not having any major injuries. He's Just A Guy and sometimes fumble rates are just luck. I will never be convinced BJGE is 2 standard deviations above most other RB's. He was in the right spot, at the right time, with the right stocky, power rushing frame to fall forward for the Pats.

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah

pmchem posted:

Well, without being able to peer into your formula and see how much each particular factor counts towards your EPA, it's hard to frame an argument. I can only discuss by example. So, example: David Johnson has 9 career fumbles compared to BJGE's 5, and DJ has those fumbles on a little more than half the career touches of BJGE. I'd take DJ over BJGE on my football team any day of the week and twice on Sunday. They were rated similarly in your SD metric.

I'm in the process of writing it up, and plan to release both the source code and the data! The expected points of a drive is estimated using down, distance, field position, score differential, clock time, home vs away status (which conceivably impacts the ability to call plays effectively, particularly in no-huddle situations), and the predicted probability of being able to convert a first down and keep the drive going (which is in turn defense-adjusted). Predictions are a weighted average of the expectations given by a random forest model, a generalized additive model, a support vector machine fit to each down, an optimized logistic regression, and a 5-layer neural network (where the weights are the correlation between the predicted points and actual points in an out of sample test set). Every model had hyperparameters fit by grid search cross-validated optimization procedures. The ensemble values explain about 25% of the variance in drive outcomes (and the explanatory power was extremely stable, which was a problem with previous EPA measures). No offensive measures were used at all in the model, as we want to use the EPA values to separate offensive performances (we don't want to predict away the thing we're intending to explain). It's a lot of fun stuff - I think you'll be into it!
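
The correlation-weighted averaging step might look roughly like the sketch below. It assumes every model's out-of-sample correlation is positive and that the weights are normalized to sum to 1; the model names and numbers are placeholders, not the real fits.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def ensemble_predict(test_preds, test_actual, new_preds):
    """Weight each model by its held-out correlation with actual points,
    then take the weighted average of the models' new predictions."""
    weights = {name: pearson(preds, test_actual) for name, preds in test_preds.items()}
    total = sum(weights.values())  # assumes all correlations are positive
    n_rows = len(next(iter(new_preds.values())))
    blended = [
        sum(weights[name] * new_preds[name][i] for name in weights) / total
        for i in range(n_rows)
    ]
    return blended, weights

if __name__ == "__main__":
    actual = [0, 3, 7, 3]  # placeholder drive outcomes in a held-out set
    test_preds = {"forest": [0.0, 1.5, 3.5, 1.5], "logit": [1, 2, 2, 3]}
    new_preds = {"forest": [2.0], "logit": [4.0]}
    blended, weights = ensemble_predict(test_preds, actual, new_preds)
    print(weights, blended)
```

The better a model tracks held-out outcomes, the more say it gets, so the blend leans toward "forest" here.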

quote:

BJGE, on the other hand, took years to earn a one-dimensional / GLB rushing job with the Pats. Their front office (arguably the best in the NFL) let him walk in FA. He had one good season on a 3-year contract with Cincy, and washed out of the league before age 30, despite not having any major injuries. He's Just A Guy and sometimes fumble rates are just luck. I will never be convinced BJGE is 2 standard deviations above most other RB's. He was in the right spot, at the right time, with the right stocky, power rushing frame to fall forward for the Pats.

On the other hand, that same front office also gave Law Firm a 2nd-round tender the first time he was an RFA! In contrast, this is what they did to Blount.

I'd take DJ in a heartbeat too, of course. I'd hazard a guess that if he racks up another 500 carries with a similar performance, he'd come out well ahead of Law Firm on this particular metric (again, distance from the back is only done proportional to the strength of the evidence for the difference - this is an approach called "Best Linear Unbiased Prediction" [their name, not mine]). But I still think that you may be undervaluing what Law Firm was able to accomplish as a role player, particularly for the Pats. EPA is much more sensitive to playcalling and situational factors than other metrics I've worked with. It really reflects players that have been put in a situation to be successful. This is really apparent with the Fullbacks, who have just a handful of carries and a YPC all below 3 or so, but are used almost exclusively in extremely high-leverage situations to convert first downs and goal line stands. It's part of the reason I ranked them using a technique that factors in sample size (regressing them all towards the mean). Law Firm had a lot of successful plays in part because the coach and team situation put him in a position to succeed. His value came from his near-unparalleled ability to not gently caress it up, including a ridiculous 4-season streak without a single fumble. Guys like this are really undervalued by a lot of fans, and I'm glad stats like this can shine a light on this sort of thing.

got any sevens
Feb 9, 2013

by Cyrano4747
How bad was Trent rich?
it'd be cute to have a "Trench" line like baseball's Mendoza line

Leperflesh
May 17, 2007

Forever_Peace posted:

Law Firm had a lot of successful plays in part because the coach and team situation put him in a position to succeed. His value came from his near-unparalleled ability to not gently caress it up, including a ridiculous 4-season streak without a single fumble. Guys like this are really undervalued by a lot of fans, and I'm glad stats like this can shine a light on this sort of thing.

I have a suspicion that fumbles are much less likely for plays where the back is asked to plow into a stacked box to gain two yards, compared to the average running play in which a back might hope or expect to gain more yardage. In the former case, the back knows he's going to crash into defenders trying to strip the ball immediately, and also is going more for power than speed, which may mean better ball protecting technique on average.

I'm sure you're busy but I'd be curious to see if I'm right.
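
One rough way to check this, sketched under assumptions: play-by-play rows with a yards-to-go field and a fumble flag, using short yardage as a stand-in for "plowing into a stacked box" (box counts usually aren't in free data). The rows below are toy examples, not real plays.

```python
def fumble_rates(plays, short_yardage_max=2):
    """Fumble rate on short-yardage carries vs all other carries.

    Returns None for a group with no carries.
    """
    counts = {"short": [0, 0], "other": [0, 0]}  # [fumbles, carries]
    for play in plays:
        group = "short" if play["yards_to_go"] <= short_yardage_max else "other"
        counts[group][0] += int(play["fumbled"])
        counts[group][1] += 1
    return {g: (f / n if n else None) for g, (f, n) in counts.items()}

if __name__ == "__main__":
    toy_plays = [  # made-up rows, just to show the shape of the input
        {"yards_to_go": 1, "fumbled": False},
        {"yards_to_go": 2, "fumbled": False},
        {"yards_to_go": 10, "fumbled": True},
        {"yards_to_go": 7, "fumbled": False},
    ]
    print(fumble_rates(toy_plays))
```

Fumbles are rare, so a real comparison would need several seasons of carries and some kind of significance check before reading much into the gap.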

pmchem
Jan 22, 2010


Forever_Peace posted:

EPA is much more sensitive to playcalling and situational factors than other metrics I've worked with. It really reflects players that have been put in a situation to be successful.

It's real late after a 20-team draft and game of thrones, so I'm quoting just this part. Yes, I agree with the quoted part! I like your analysis and efforts. But they are often framed around individual players, and the casual reader may read it as an absolute metric of individual player value. There is so much going into it reflecting the rest of the team around the player, the playcalling, and the general offensive efficiency of the team that is extremely difficult to disentangle. You put BJGE on the 2010 Pats, he has a great EPA. You put him on I dunno, the 2010 Cardinals, he's Beanie Wells, nobody remembers him and his EPA is uninteresting.

The trick is identifying (algorithmically) the talents that transcend their situation. Charles might be one. Barry Sanders was one.

DJExile
Jun 28, 2007


Forever_Peace posted:

Jamaal Charles: 5.5 SD above average

Every time I sit down and try to come up with a fresh way to look at the run game, Jamaal Charles comes along and blows the metric the gently caress up. Jamaal Charles is impossible.

Jesus :stare:

got any sevens
Feb 9, 2013

by Cyrano4747
Any rookies you think might break out this year, or do you not follow college much?

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah

got any sevens posted:

How bad was Trent rich?
it'd be cute to have a "Trench" line like baseball's Mendoza line

EPA 1.34 SD below average (bottom 30 out of ~430), Success rate 1.84 SD below average (just missed bottom 15). Pretty bad. Alfred Blue somehow has done even worse on both counts.

Leperflesh posted:

I have a suspicion that fumbles are much less likely for plays where the back is asked to plow into a stacked box to gain two yards, compared to the average running play in which a back might hope or expect to gain more yardage. In the former case, the back knows he's going to crash into defenders trying to strip the ball immediately, and also is going more for power than speed, which may mean better ball protecting technique on average.

I'm sure you're busy but I'd be curious to see if I'm right.

Seems like a plausible hypothesis! Had a goon interested in "claiming" the fumble analysis for a guest article, but I haven't heard from him in a while. If he's still MIA in a few weeks I can rustle something up really quick.


pmchem posted:

It's real late after a 20-team draft and game of thrones, so I'm quoting just this part. Yes, I agree with the quoted part! I like your analysis and efforts. But they are often framed around individual players, and the casual reader may read it as an absolute metric of individual player value. There is so much going into it reflecting the rest of the team around the player, the playcalling, and the general offensive efficiency of the team that is extremely difficult to disentangle. You put BJGE on the 2010 Pats, he has a great EPA. You put him on I dunno, the 2010 Cardinals, he's Beanie Wells, nobody remembers him and his EPA is uninteresting.

The trick is identifying (algorithmically) the talents that transcend their situation. Charles might be one. Barry Sanders was one.

Again, I don't think the thing you are looking for here is particularly coherent. We already know that guys can thrive in one scheme but not another, to the point where a particular body type might only even be playable given particular schemes (defensive linemen are particularly notorious for this). It's hypothetically interesting to consider which players may be highly adaptable, or how players might perform given some identical standard scheme and context and surrounding personnel, but this just isn't how football works. You're absolutely correct that readers would be mistaken if they think any of these metrics reflect scheme- or context-independent performance, but I certainly don't make this claim anywhere (and try to be pretty transparent that the context matters a lot - I even wrote a whole chapter on it). Pretty much the best we can do to get at the thing you want here is to compare a runner to his teammates, which I've already produced an app for (though naturally this sort of analysis depends quite a lot on who the teammates are and how they are used, so we don't actually avoid the context problem).

got any sevens posted:

Any rookies you think might break out this year, or do you not follow college much?

I don't work with college numbers, but I can give you completely uninformed amateur hunches!
- I think this tight end class has the potential to be a historic one, but we won't really know until 2020-2021.
- I was high on Mahomes in KC. But I've also been impressed by Kizer and to a lesser extent Trubs during the preseason. The QBs might surpass my initial expectations.
- I initially thought the running backs class was a bit over-hyped. I think Mixon could be fantastic for CIN, but I think Gurley-itis is a risk for Fournette in JAX, CMC had some inflated numbers due to Stanford's completely bananas commitment to the quality of their offensive line play, and Cook might have trouble running inside the tackles given his frame and poor combine. But all have put on a pretty good showing so far, so who knows - maybe they live up to the hype. I'm also intrigued by Hunt, especially now given the Ware injury.
- Corey Davis should be good for TEN. I was also higher than most on Zay Jones after the draft, though the Watkins trade and Matthews injury have led other folks to adjust their expectations up as well. I'm also intrigued by Kenny Golladay.

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah
Bad news: upcoming chapter is delayed because in my incompetence, I royally hosed up the website beyond my ability to repair. It is now incomprehensible screen vomit.

Good news: we have a new website.

We now include such cutting-edge technology as a browser bar and social links (edit: and comments sections and an RSS feed). Welcome to 2001 I guess!

Yell at me when you find broken links so I can fix them.

Forever_Peace fucked around with this message at 01:09 on Aug 31, 2017

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah
In b4 the gigantic looming github icon of doom.

Forever_Peace fucked around with this message at 04:15 on Aug 31, 2017

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah
OK, migration complete. Old site is dead. Everything is here now. Currently working on trying to make pictures clickable (baby steps).

Athanatos
Jun 7, 2006

Est. 1967
Giving a bump so this doesnt slip into archives

Spoeank
Jul 16, 2003

That's a nice set of 11 dynasty points there, it would be a shame if 3 rings were to happen with it
F_P we miss you :(

Forever_Peace
May 7, 2007

Shoe do do do do do do do
Shoe do do do do do do yeah
Shoe do do do do do do do
Shoe do do do do do do yeah
Sorry guys. =( The next chapter has been 90% done for quite a while, but a couple of data science job opportunities opened up that I wanted to pursue, which has been eating a lot of my time.

I did just give a stats-y job talk that casually dropped some Ground Control stuff on em though. They loved it!

GonadTheBallbarian
Jul 23, 2007


Get you some paper, dude!
