JFairfax
Oct 23, 2008

by FactsAreUseless
im the tottenham heatspurs


TheBigAristotle
Feb 8, 2007

I'm tired of hearing about money, money, money, money, money.
I just want to play the game, drink Pepsi, wear Reebok.

Grimey Drawer

Why Tottenham Heatspurs

Arquebus
Feb 19, 2013
loving heatspurs.

wicka
Jun 28, 2007


i like that he clearly got lazy and just thought "i guess a P is like half of a B?"

straight up brolic
Jan 31, 2007

After all, I was nice in ball,
Came to practice weed scented
Report card like the speed limit

:homebrew::homebrew::homebrew:

I like that the Bulls are supposed to be man utd but it's the Wolves logo

JFairfax
Oct 23, 2008

straight up brolic posted:

I like that the Bulls are supposed to be man utd but it's the Wolves logo

that's exactly what I thought haha!

peanut-
Feb 17, 2004
Fun Shoe
I quite like some of those. The Burnley one is great.

Bogan Krkic
Oct 31, 2010

Swedish style? No.
Yugoslavian style? Of course not.
It has to be Zlatan-style.

BOURNE
MOUTH

Syncopated
Oct 21, 2010

The Chelsea one haha

UnlimitedSpessmans
Jul 31, 2015

why is the boro one central america wtf

Crazy Ted
Jul 29, 2003

Heatspurs really brings the whole thing up a notch or two

Weaponized Cum
Aug 31, 2004


This post brought to you by the finest Miami cocaine money can buy ----->
as a Miami Spurs fan, I allow it

Poonior Toilett
Aug 21, 2004

m'lady

TheBigAristotle posted:

Why Tottenham Heatspurs

Miami Heat

TheBigAristotle
Feb 8, 2007

It's the only one to incorporate the NBA team name

Crazy Ted
Jul 29, 2003

TheBigAristotle posted:

Why Tottenham Heatspurs
Because heat can be described as hot, you see...

Poonior Toilett
Aug 21, 2004

TheBigAristotle posted:

It's the only one to incorporate the NBA team name

Man idk I'm looking at the same image as you, drat

straight up brolic
Jan 31, 2007

lmao

https://twitter.com/registability/status/820322449304723457

TheBigAristotle
Feb 8, 2007

https://twitter.com/registability/status/818986595290386433

This account really keeps you guessing.

wicka
Jun 28, 2007


this 20 year old seems uncomfortable signing for a premier league team a week after being fired from his day job, must be a rapist

straight up brolic
Jan 31, 2007

https://twitter.com/Caley_graphics/status/820656192578064384

people still use expected goals

jyrka
Jan 21, 2005


Potato Count: 2 small potatoes
It has its uses.

Jose
Jul 24, 2007

Adrian Chiles is a broadcaster and writer
https://twitter.com/paulpogba/status/820592336342286336

eh the pogba emoji doesn't display on SA

sticksy
May 26, 2004
Nap Ghost

It's ok, Pogba didn't show up for today's game either.

blue footed boobie
Sep 14, 2012


UEFA SUPREMACY
UPDATE: Data Breakdown of why Giroud is an Underrated Goalscorer (self.Gunners)
EDIT: I was not expecting this sort of support. The feedback has been awesome. The support has been awesome. Those of you who offered words of encouragement deserve to be listed by name, but unfortunately there are quite a few of you and I have to run!! I studied this in school but had to start a career that didn't make use of it, so I really appreciate the response!
Hey all,
Introduction
Two weeks ago I made this post on why Giroud is an underrated goalscorer. While I still stand by that argument, I said in that post that I would like to do a deeper dive, because while goals are not created equal, successive goals still do have value. If you only value a goalscorer by his ability to score goals that turn Ls into Ds and Ds into Ws, then Giroud is still the top man. However, if you want something a little more thorough, read on, because the conclusions are different.
I want to begin by saying that the entire methodology of this post, and the motivation for it, is taken straight from, or inspired by, the arguments in Anderson & Sally's The Numbers Game. It has the incendiary subtitle "Why Everything You Know About Soccer Is Wrong", but you should look past that, as it is an amazingly insightful, yet digestible, book on soccer analytics.
The entire basis of my original post was the idea that "x number of goals a season" as a measure of impact or productivity is simply wrong, and in the grand scheme of things that is still true. Both this post and the previous one argue that goals are not equal. The difference is that the first one did not give a numeric value to each goal, only whether or not it passed a threshold of impact (i.e. goals that were equalizers or game-winners).
Summary of Method's Logic
A brief summary of why we can assign a value at all, as explained in The Numbers Game: Anderson & Sally argue that because goals in association football are scarce relative to other sports, are equally scarce across the top four leagues, and feed into another metric that is also the same across the four major leagues (three points for a win, one for a draw, and none for a loss), there is an "exchange rate" from goals to points. However, unlike a currency exchange rate, where the first dollar is worth the same as the fifth in pounds, the value of a goal depends on how many goals have already been scored.
Anderson & Sally calculated these diminishing returns for successive goals by averaging the number of points a given number of goals would get you, then subtracting the average for one goal fewer from the average for that number of goals. The result is the marginal value added by that particular goal.
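Written as formulas (the notation is ours, not the book's), with P(n) the average league points in matches where the team scored n goals, the marginal value m_n of the nth goal is:

```latex
m_0 = \bar{P}(0), \qquad m_n = \bar{P}(n) - \bar{P}(n-1) \quad (n \ge 1),
\qquad \text{so that} \qquad \sum_{j=0}^{n} m_j = \bar{P}(n)
```

The marginal values telescope: adding up the values of the first n goals gives back the average points for an n-goal game.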
Dataset Description and Data Validity
Now, this is where my input comes in. Wenger's Arsenal is a remarkably good team to analyze: Wenger has been in charge of Arsenal for twenty years, he is frequently mentioned on lists of the longest-serving managers of all time, and he is the only current manager on those lists. To add to this, Wenger is notoriously stubborn. Granted, that is a subjective thing to measure, but he is frequently criticized for it by fans, pundits and rivals alike, so it's fair to say there is a consensus there. With such consistent management, from a manager who is also quite consistent himself, we can take a large data set and expect significantly less "noise" in the data. Because restricting the analysis to one relatively consistent team over a long period of time accounts for a lot of other variables, I feel very confident in the results this time. That said, there is a dearth of digestible soccer analytics out there, so I am happy to have feedback.
Data Selection
I chose to go back to 2006, which gives about ten and a half seasons of data to work with, or 401 observations. I would share the Google Sheets file with everyone, but it is linked to my personal account, which includes my full name, so I will just describe what I did in case you wish to replicate it.
First I formatted the data collected from 11vs11.com and restricted it to the Premier League to make it friendly for Google Sheets functions. (Quick side note on this: it is actually pretty easy to turn copied-and-pasted data into Google Sheets-friendly formatting once you learn the SPLIT, VLOOKUP and (nested) IF functions.) Next I wanted to assign the points for each game. I wrote a nested IF function that first checked whether the home team was Arsenal: if TRUE, it took Arsenal's goals scored from the second column, and if FALSE, from the third column. I then did the inverse to get the opposition's goals scored, subtracted the opposition's goals from Arsenal's to get Arsenal's goal differential, and wrote a subsequent IF function to tell whether that earned them three points, one, or none. That might sound like a lot of work for something you could just look at and tell, but doing that 400 times is a ton of manual work, for which I do not have the attention span. With the method above, I only had to do it once, then copy and paste.
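The nested IF logic can be sketched in Python. The row layout (home team, home goals, away goals, away team) and the opponent codes are assumptions for illustration; only the "ARS" code comes from the formula in the post.

```python
from typing import NamedTuple


class Match(NamedTuple):
    # Assumed layout: home team in column 1, home goals in 2, away goals in 3.
    home: str
    home_goals: int
    away_goals: int
    away: str


def team_goals(m: Match, team: str = "ARS") -> tuple[int, int]:
    """(goals for, goals against) from `team`'s point of view, like the first IF."""
    if m.home == team:
        return m.home_goals, m.away_goals
    return m.away_goals, m.home_goals


def league_points(m: Match, team: str = "ARS") -> int:
    """3 for a win, 1 for a draw, 0 for a loss, based on the goal differential."""
    goals_for, goals_against = team_goals(m, team)
    differential = goals_for - goals_against
    if differential > 0:
        return 3
    if differential == 0:
        return 1
    return 0


# Hypothetical rows, not real results.
matches = [
    Match("ARS", 2, 0, "XXX"),
    Match("YYY", 4, 3, "ARS"),
    Match("ARS", 1, 1, "ZZZ"),
]
print([league_points(m) for m in matches])  # [3, 0, 1]
```

Written once and applied to every row, this plays the same role as filling the IF formula down the column.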
Then I used an AVERAGEIFS function to average the points from games with a given number of goals scored. For example, the average points for two goals scored at home would be =AVERAGEIFS(F$2:F$402,$B$2:$B$402,H2,$A$2:$A$402,"ARS"), and because of how I formatted the sheet, I then wrote a similar function for when Arsenal were away and averaged the two results. This gave me the Average League Points Earned by Goals Scored.
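A Python equivalent of running one AVERAGEIFS per goal count might look like this. It is a sketch with hypothetical sample games; note that it pools all games, weighting each equally, whereas the post averaged a home average and an away average, which weights the two venues equally and can give slightly different numbers.

```python
from collections import defaultdict
from statistics import mean


def average_points_by_goals(games):
    """games: iterable of (goals_scored, points) pairs, one per match.

    Returns {goals_scored: mean league points}, sorted by goals scored.
    """
    buckets = defaultdict(list)
    for goals_scored, points in games:
        buckets[goals_scored].append(points)
    return {goals: mean(points) for goals, points in sorted(buckets.items())}


# Hypothetical sample of (goals scored, points earned) per match.
games = [(0, 0), (0, 1), (1, 1), (1, 3), (1, 0), (2, 3), (2, 3), (2, 1)]
print(average_points_by_goals(games))
```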
Data Analysis Output
Average League Points Earned by Goals Scored
No goals: 0.63
1 Goal: 1.44
2 Goals: 2.39
3 Goals: 2.73
4 Goals: 2.79
5 Goals: 3.00
6 Goals: 3.00
7 Goals: 3.00
Remember of course that a 0-0 draw is 1 point but you can still lose if you score none, hence the non-zero value that is less than one. The immediate takeaway is that scoring five goals is a near certainty of a win. "Tell me what else is new." Fair, but we aren't done yet. To get the value of each goal, we look at its impact on average league points earned at the margin. For those unfamiliar with the concept, the marginal value is the value gained from one more unit.
Marginal Average League Points Earned by Goals Scored
No goals: 0.63
1 Goal: 0.81
2 Goals: 0.95
3 Goals: 0.35
4 Goals: 0.06
5 Goals: 0.21
6 Goals: 0.00
7 Goals: 0.00
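The marginal table follows from the averages table by subtraction. A sketch in Python, with the averages copied from above: because it works from the rounded two-decimal averages, it gives 0.34 for the third goal rather than the 0.35 shown, presumably because the post subtracted unrounded values.

```python
# Average league points by goals scored (0..7), copied from the table above.
avg_points = [0.63, 1.44, 2.39, 2.73, 2.79, 3.00, 3.00, 3.00]


def marginal_values(avg):
    """First entry: the value of scoring no goals.
    Entry n: avg[n] - avg[n-1], the extra points the nth goal adds on average."""
    return [avg[0]] + [round(b - a, 2) for a, b in zip(avg, avg[1:])]


print(marginal_values(avg_points))
# [0.63, 0.81, 0.95, 0.34, 0.06, 0.21, 0.0, 0.0]
```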
Analysis Interpretation
First, some takeaways: those first two goals are pretty drat important. It might look like I made a mistake, given the drop-off and rebound at 3/4 and 4/5. Why would a fourth goal have so little impact? Well, any true Gooner will recall that Arsenal have had some memorable 4-4 draws: Liverpool (the "Basketball" game), Tottenham, and the game-that-shalt-not-be-named against Newcastle. (I must say we need to forgive Newcastle for that, given that in their dying game, a man down, they beat Tottenham 5-1 to make St. Totteringham's Day happen despite having already been relegated. "Remember us," said one Newcastle fan. Indeed, I have.)
But here is the biggest takeaway. Once I broke down the number of 1st goals, 2nd goals, etc. by Arsenal goalscorer, I could assign a value of impact to the team for each of Arsenal's goalscorers in the 2016/17 season. Before we look at that, let's look at the ranking of the "top" Arsenal goalscorers so far in the 2016/17 season.
1st: Sanchez - 14 goals
2nd: Walcott - 8 goals
3rd: Giroud - 7 goals
4th: Ozil - 5 goals
Tied for 5th: Cazorla, Koscielny, Iwobi and Oxlade-Chamberlain - 2 goals
Tied for 6th: Perez, Chambers - 1 goal
Now the marginal point contribution by player
1st: Sanchez - 8.81 points
2nd: Walcott - 5.04 points
3rd: Giroud - 4.41 points
4th: Ozil - 3.15 points
Tied for 5th: Cazorla, Koscielny, Iwobi and Oxlade-Chamberlain - 1.26 points
Tied for 6th: Chambers, Perez - 0.63 points
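One way to turn a per-goal breakdown into a per-player figure is to add up the marginal value of every goal a player scored, according to whether it was his team's 1st, 2nd, 3rd (and so on) goal of the match. The post does not publish that breakdown, so the player below is hypothetical and the function name is ours; the sketch illustrates the idea rather than reproducing the figures above.

```python
# Marginal points by goal ordinal, from the marginal table above
# (1 = the team's first goal of the match, 2 = its second, ...).
MARGINAL = {1: 0.81, 2: 0.95, 3: 0.35, 4: 0.06, 5: 0.21, 6: 0.00, 7: 0.00}


def player_value(goal_ordinals):
    """goal_ordinals: for each goal the player scored, which of his team's goals
    in that match it was. Returns the summed marginal points."""
    return round(sum(MARGINAL[n] for n in goal_ordinals), 2)


# Hypothetical player: scored his team's opening goal twice and a third goal once.
print(player_value([1, 1, 3]))  # 1.97
```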
Bear in mind that these values are based on historical records, to avoid the noise created by relatively rare results like Arsenal's opening 3-4 loss to Liverpool in a small sample (it's only the 22nd gameweek, so that's a pretty small sample size).
Even if you based it only on the league points earned this season, the ranking falls the same way: Sanchez has earned 14 points, Walcott 8 points, Giroud 7 points, etc.
Conclusion
So, I was actually wrong the first time: Giroud is fairly valued. When I say that, though, I'm not including something really subjective that is probably part of the fans' valuation: media coverage and pundit commentary. My first post failed to account for the way a goal's value depends on the goals already scored, so my valuation of Giroud was inflated. Be aware, though, that this is something of a coincidence: typically, across the league, the ranking of the "best" strikers under this technique would differ from the "top goalscorers" ranking. At least that's what Anderson & Sally argue and demonstrate in their book.
Linear Regression
I was interested in doing a regression analysis of this, so I did the following to try and assess that.
EDIT: Just realized I didn't include "Zero Goals scored" as a variable, but I wasn't sure if that would serve as an intercept of zero as a constant. Anyone care to weigh in on how I should have included it?
I added columns for each goal to the right of the historical table of results: 1st goal scored, 2nd goal scored, 3rd goal scored, etc. I then filtered for Arsenal at home and, in each game's row, put a 1 in the column for every goal scored and a 0 in the columns after that, so a game with four goals scored would read 1,1,1,1,0,0,0. There are seven columns because in the past ten years Arsenal have scored seven goals twice (7-3 versus Newcastle; 7-1 versus Blackburn). Then I did the same for when Arsenal were away. The reason for this was so that I could write a COUNTIF function counting all the 1st goals scored, 2nd goals scored, etc.
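The indicator columns and the COUNTIF step can be mirrored in a few lines of Python. A sketch, assuming a list of goals scored per game; the function names and the sample games are hypothetical.

```python
MAX_GOALS = 7  # the most Arsenal scored in one league game in the sample


def goal_indicators(goals_scored, max_goals=MAX_GOALS):
    """1 in each of the first `goals_scored` columns, 0 after that."""
    return [1 if k <= goals_scored else 0 for k in range(1, max_goals + 1)]


def count_nth_goals(goals_per_game, max_goals=MAX_GOALS):
    """Column totals, like one COUNTIF per column: how many games had a
    1st goal, a 2nd goal, and so on."""
    rows = [goal_indicators(g, max_goals) for g in goals_per_game]
    return [sum(column) for column in zip(*rows)]


print(goal_indicators(4))               # [1, 1, 1, 1, 0, 0, 0]
print(count_nth_goals([0, 1, 2, 4]))    # [3, 2, 1, 1, 0, 0, 0]
```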
For the really statistically minded, I could use your input on a linear regression I did on this same data.
I've messed around with linear regression since I studied it in school, but this is relatively simplistic and probably not up to serious standards, even for reddit. I've typed up the output below; anyone well versed, please let me know what you think. I added a variable for location, but other than that it's the same data.
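The post doesn't say which tool produced the output below (the labels resemble a spreadsheet regression report). As a sketch of the same model shape only, here is a least-squares fit with the constant forced to zero, one home/away dummy and seven goal indicators, solved through the normal equations in plain Python. The toy data are hypothetical and exactly linear, so the fit should recover the weights used to build them.

```python
MAX_GOALS = 7  # indicator columns for the 1st..7th goal


def design_row(home: bool, goals: int) -> list[float]:
    """[location dummy (1 = home), 1st goal, ..., 7th goal]; no constant column."""
    return [1.0 if home else 0.0] + [1.0 if k <= goals else 0.0 for k in range(1, MAX_GOALS + 1)]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_through_origin(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares with the intercept fixed at zero: (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical toy check: 16 games (home/away x 0..7 goals) whose "points"
# are an exact linear function of the columns.
true_beta = [0.4, 1.2, 0.9, 0.3, 0.1, 0.2, 0.0, 0.0]
X = [design_row(home, g) for home in (True, False) for g in range(MAX_GOALS + 1)]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]
beta = fit_through_origin(X, y)
print([round(b, 6) for b in beta])
```

With no constant column, an away game with no goals is predicted to earn exactly 0 points; that is what setting the intercept to zero means in practice.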
Summary Output
Regression Statistics
Multiple R: 0.912678759180265
R Square: 0.832982517458828
Adjusted R Square: 0.82958267048598
Standard Error: 0.952317793600056
Observations: 401
ANOVA
Regression
df: 8
SS: 1777.584692
MS: 222.1980865
F: 245
Significance F: 0
Residual
df: 393
SS: 356.4153077
MS: 0.90690918
Total
df: 401
SS: 2134
Variables
Intercept: I set this equal to zero as the constant is zero points.
Location: A dummy variable with 1 being Home, 0 being Away
Coefficient: 0.4483164468
Standard Error: 0.0901545933849575
t Stat: 4.97275213552627
P-value: 0.000000988126769127729
Lower 95%: 0.271070844732009
Upper 95%: 0.625562048833091
1st Goal:
Coefficient: 1.22620024414246
Standard Error: 0.0975443881704749
t Stat: 12.5706897868842
P-value: 1.07498485313343E-30
Lower 95%: 1.03442616853765
Upper 95%: 1.41797431974727
2nd Goal:
Coefficient: 0.94552349146247
Standard Error: 0.129003177285906
t Stat: 7.32945894322383
P-value: 1.32649808285781E-12
Lower 95%: 0.691900853214421
Upper 95%: 1.19914612971052
3rd Goal:
Coefficient: 0.299998139704283
Standard Error: 0.146305076375933
t Stat: 2.05049713335601
P-value: 0.0409787438544267
Lower 95%: 0.0123596470700352
Upper 95%: 0.58763663233853
4th Goal:
Coefficient: 0.0744169620428588
Standard Error: 0.204630918220931
t Stat: 0.363664311775771
P-value: 0.716304294209188
Lower 95%: -0.327891217013003
Upper 95%: 0.476725141098721
5th Goal:
Coefficient: 0.20479646999096
Standard Error: 0.360593868352575
t Stat: 0.567942186389919
P-value: 0.570398519032509
Lower 95%: -0.504137760558739
Upper 95%: 0.913730700540659
6th Goal:
Coefficient: -0.0498129385313935
Standard Error: 0.502015498271313
t Stat: -0.0992258978117687
P-value: 0.921009540166151
Lower 95%: -1.03678471544196
Upper 95%: 0.937158838379176
7th Goal:
Coefficient: -0.149438815594182
Standard Error: 0.778144725952704
t Stat: -0.192045015033958
P-value: 0.847806077963412
Lower 95%: -1.67928577353025
Upper 95%: 1.38040814234189
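The ANOVA figures can be cross-checked against the regression statistics. A sketch, assuming the reported mean squares and residual SS are exact to the digits shown: the regression SS is recovered as MS times df, R Square as regression SS over the total SS (uncentred, since the constant is forced to zero), F as the ratio of the mean squares, and the standard error as the square root of the residual mean square. The n/(n - k) adjustment for Adjusted R Square is our assumption about how a zero-intercept report computes it; it is the version that reproduces the 0.8296 shown.

```python
from math import sqrt

# Figures copied from the ANOVA and regression statistics above.
n, k = 401, 8                        # observations; regressors (location + 7 goal dummies)
ms_reg, df_reg = 222.1980865, 8
ss_res, df_res = 356.4153077, 393
ss_total = 2134                      # uncentred total SS (no constant term)

ss_reg = ms_reg * df_reg             # regression sum of squares
ms_res = ss_res / df_res             # residual mean square
r_squared = ss_reg / ss_total
adj_r_squared = 1 - (1 - r_squared) * n / (n - k)   # assumed zero-intercept adjustment
f_stat = ms_reg / ms_res
std_error = sqrt(ms_res)

print(f"SS regression   {ss_reg:.6f}")
print(f"R Square        {r_squared:.6f}")
print(f"Adj. R Square   {adj_r_squared:.6f}")
print(f"F               {f_stat:.2f}")
print(f"Standard Error  {std_error:.6f}")
```

The regression and residual sums of squares add up to the total of 2134, which is a quick consistency check on the table.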
My takeaway. My interpretation skills are a bit hazy, but I recall that a quick way of checking that the fit of my model is not pure chance is looking at its robustness or Significance F, which is just zero, so there is a zero chance this fit is random. So far, so good.
Then I look at Adjusted R squared: okay, pretty high, and in addition there isn't a huge leap between it and R Square, so that's also good for the fit. I hear all the time about t-stats, but I also know that they can be arbitrarily inflated when you don't have all the right variables, so I know the other way of checking the statistical significance of a variable is to look at its p-value. Location, 1st goal, 2nd goal and 3rd goal are all statistically significant if you define the limit as p=0.05.
The problem I see is with the p-values of the 4th through 7th goals. They are huge, as in not at all statistically significant. To add to that, the 6th and 7th goal coefficients are negative, which suggests there is something really wrong with my model, no?
Possible flaws with this model
Probably omitted variables.
Functional form of the regression given how the "independent" variables are actually related to each other, because one follows the other.
Then there is the possibility of skew: only 17 games out of 401 had Arsenal scoring 5, 6 or 7 goals. I would attribute the insignificance of those goals to some sort of skew, which may also explain why the next most infrequent category, the 4th goal scored, was statistically insignificant.
Could use some help interpreting or fixing this.

Literally Lewis Hamilton
Feb 22, 2005



Jfc nobody is going to read that

TheBigAristotle
Feb 8, 2007

The Gooner's Manifesto

Strawman
Feb 9, 2008

Tortuga means turtle, and that's me. I take my time but I always win.


big crush on Chad OMG posted:

Jfc nobody is going to read that

tl;dr - Remember of course that a 0-0 draw is 1 point but you can still lose if you score none, hence the non-zero value that is less than one. The immediate takeaway is that scoring five goals is a near certainty of a win

TheQuietWilds
Sep 8, 2009

Strawman posted:

tl;dr - Remember of course that a 0-0 draw is 1 point but you can still lose if you score none, hence the non-zero value that is less than one. The immediate takeaway is that scoring five goals is a near certainty of a win

Sounds like a really productive use of time to math out, he should get on the phone with coaching staffs, see if he can't cause a revolution

jyrka
Jan 21, 2005


That is an impressive amount of effort for nothing. What's his name on SA?

Tokyo Sexwale
Jul 30, 2003

Strawman posted:

tl;dr - Remember of course that a 0-0 draw is 1 point but you can still lose if you score none, hence the non-zero value that is less than one. The immediate takeaway is that scoring five goals is a near certainty of a win

deffo something you need a dissertation for imo

sticksy
May 26, 2004
My interpretation skills are a bit hazy, but I recall that a quick way of checking that the fit of my model is not pure chance is looking at its robustness or Significance F, which is just zero, so there is a zero chance this fit is random. So far, so good.

wicka
Jun 28, 2007


sticksy posted:

My interpretation skills are a bit hazy, but I recall that a quick way of checking that the fit of my model is not pure chance is looking at its robustness or Significance F, which is just zero, so there is a zero chance this fit is random. So far, so good.

Thank you, I never would've found this bit on my own and it is loving hilarious.

Nottherealaborn
Nov 12, 2012

sticksy posted:

My interpretation skills are a bit hazy, but I recall that a quick way of checking that the fit of my model is not pure chance is looking at its robustness or Significance F, which is just zero, so there is a zero chance this fit is random. So far, so good.

Lmfao

jre
Sep 2, 2011

To the cloud ?



sticksy posted:

My interpretation skills are a bit hazy, but I recall that a quick way of checking that the fit of my model is not pure chance is looking at its robustness or Significance F, which is just zero, so there is a zero chance this fit is random. So far, so good.

:eyepop:

straight up brolic
Jan 31, 2007

Jesus Christ the people in the comments

Weaponized Cum
Aug 31, 2004


reddit.txt

Sneaks McDevious
Jul 29, 2010

by LITERALLY AN ADMIN

sticksy posted:

My interpretation skills are a bit hazy, but I recall that a quick way of checking that the fit of my model is not pure chance is looking at its robustness or Significance F, which is just zero, so there is a zero chance this fit is random. So far, so good.

outstanding

mackintosh
Aug 18, 2007


Semper Fidelis Poloniae
that took a lot more scrolling than I anticipated

paddyboat
Feb 20, 2013

Maxi, Maxi Rodriguez
Run down the wing for me
Marco van Basten has lost his drat mind

https://apnews.com/258e5f505eb849d2989ef9bc47404610


Bea Nanner
Oct 20, 2003

Je suis excité!
I'd really like to see old school NASL style penalties make a comeback.
