MedicineHut
Feb 25, 2016

chadbear posted:

I actually tried looking for anomalies using Benford's Law in the hourly funding data a few months ago but I didn't find anything. Neither absolute numbers nor hourly changes stood out. Even pledge frenzies during their ship sales followed Benford's Law. Of course that doesn't prove that the numbers are real sales. It just means that the hourly funding numbers that CIG puts in their funding tracker probably aren't made up by an intern who has the task of inventing numbers for the tracker.

If I wanted to fudge the numbers in the tracker I wouldn't just invent numbers anyway. I would say "OK let's add 591 accounts that have a subscription and sprinkle 12 Idris sales throughout the next month". I'd conjecture that such a procedure would produce numbers following Benford's Law.

I just extracted the whole set, daily pledge numbers, 2830 values in total. And it seems to show as fake.

e:

MedicineHut fucked around with this message at 08:42 on Aug 14, 2020


chadbear
Jan 15, 2020

MedicineHut posted:

I just extracted the whole set, daily pledge numbers, 2830 values in total. And it seems to show as fake.

Oh, I missed that you were talking about the daily pledges, sorry. I didn't look at daily pledges since I figured that they would fudge the hourly numbers and just sum up the hourly pledges for the daily numbers. Maybe they're doing it the other way round?

MedicineHut
Feb 25, 2016

chadbear posted:

Oh, I missed that you were talking about the daily pledges, sorry. I didn't look at daily pledges since I figured that they would fudge the hourly numbers and just sum up the hourly pledges for the daily numbers. Maybe they're doing it the other way round?

I don't know. Does it matter? Whether you use hourly or daily should not change the result, no?

The Rabbi T. White
Jul 17, 2008





MedicineHut posted:

I just extracted the whole set, daily pledge numbers, 2830 values in total. And it seems to show as fake.

e:


I am not fond of them yanking on that pupper’s leg like that. :/

chadbear
Jan 15, 2020

MedicineHut posted:

I don't know. Does it matter? Whether you use hourly or daily should not change the result, no?

If they (manually) fudge the daily numbers, they still need to come up with hourly numbers. They can't just divide daily numbers by 24 so they might use a randomizer that produces hourly numbers that add up to the daily number. Random numbers typically observe Benford's Law, so you'd get a conspicuous daily number and an inconspicuous hourly number.

Or I screwed up somewhere, that's actually more likely

MedicineHut
Feb 25, 2016

chadbear posted:

If they (manually) fudge the daily numbers, they still need to come up with hourly numbers. They can't just divide daily numbers by 24 so they might use a randomizer that produces hourly numbers that add up to the daily number. Random numbers typically observe Benford's Law, so you'd get a conspicuous daily number and an inconspicuous hourly number.

Or I screwed up somewhere, that's actually more likely

I may be wrong but as far as I can see in the tracker sheet they capture hourly figures indeed, and then the first tab simply collects that hourly data and summarizes it for a daily view.

chadbear
Jan 15, 2020

MedicineHut posted:

I may be wrong but as far as I can see in the tracker sheet they capture hourly figures and then the first tab simply collects the hourly data and summarizes the numbers for a daily view.

Sure. I'm not talking about the tracker but about CIG since they produce the numbers, both daily and hourly. I'm just trying to come up with an explanation why the daily numbers show a conspicuous pattern and the hourly numbers do not. So either CIG fudge the daily numbers and randomize the hourly numbers as a function of the daily numbers or I screwed up.

MedicineHut
Feb 25, 2016

chadbear posted:

Sure. I'm not talking about the tracker but about CIG since they produce the numbers, both daily and hourly. I'm just trying to come up with an explanation why the daily numbers show a conspicuous pattern and the hourly numbers do not. So either CIG fudge the daily numbers and randomize the hourly numbers as a function of the daily numbers or I screwed up.

Which set of figures did you use for the hourly?

chadbear
Jan 15, 2020

MedicineHut posted:

Which set of figures did you use for the hourly?

The hourly data from the funding tracker spreadsheet. I figured that they'd come directly from CIG's pledge tracker.

MedicineHut
Feb 25, 2016

chadbear posted:

The hourly data from the funding tracker spreadsheet. I figured that they'd come directly from CIG's pledge tracker.

We might need to take this offline to avoid derailing the thread, but to get to the bottom of it: I just did a similar exercise with hourly data and it still yields a fake result for me :p

For info: what I used is the data in the second tab, "Hourly Pledge Capture". In there you have total fund absolutes hour by hour, so to get the hourly actuals I had to create a new table and subtract the previous hour's absolute from each hour's absolute. Using that it still gives me a fake. I had to clean out a few rows that had zero values and a few other oddities.
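For anyone who wants to repeat that step outside a spreadsheet, here is a minimal Python sketch of it. The function name and the running totals are made up for illustration; they are not tracker data.

```python
from typing import List


def hourly_increments(cumulative_totals: List[int]) -> List[int]:
    """Turn hour-by-hour running totals into per-hour amounts."""
    increments = [
        later - earlier
        for earlier, later in zip(cumulative_totals, cumulative_totals[1:])
    ]
    # Zero or negative changes have no usable leading digit, so drop them,
    # mirroring the "clean a few rows" step described in the post.
    return [value for value in increments if value > 0]


# Hypothetical running totals, one per hour (assumption, not real data).
totals = [1_000_000, 1_008_330, 1_016_654, 1_016_654, 1_028_242]
print(hourly_increments(totals))
```

The repeated total in the middle produces a zero change, which is filtered out before any first-digit test is run.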

MedicineHut
Feb 25, 2016

On that same second tab we also have at the top actuals for hourly in the last 7 days:

8330 8324 11588 10059 7317 8135 7724 5693 5336 5145 5024 4259 5727 6443 8979 10297 11385 8123 10391 12509 7893 9837 8565 8381
9476 10775 6996 7024 4805 13387 7020 6441 6234 9440 10090 5241 4915 5967 7814 7310 10526 6515 8263 10555 10751 7727 8331 5841
8800 6718 5428 5654 5258 4790 3562 4667 4574 3252 5115 4158 6208 5051 8135 7823 6411 6396 8211 8141 12858 6316 5088 7196
4258 4686 5611 5482 3180 5621 5583 2651 2542 2591 4157 4897 4815 5732 4719 5009 5750 4615 7347 7157 8389 8173 6010 6724
6556 6328 5608 4628 4831 8657 3247 3892 6461 4147 4043 3337 4658 5481 6205 6400 7345 5632 7010 7357 8638 6171 4559 4687
4201 5587 2606 2962 2645 2344 2723 4411 2603 3618 2203 3620 2881 3041 3615 6131 81924 125251 82255 70869 55932 49415 41803 36900
31323 20664 21285 20183 18051 16613 15934 12392 13862 9293 14016 12507 9588 14705 15621 14381 39497 29210 26125 22767 23509 19200 22947 15022

Using just this reduced set it also yields a fake result...
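A rough Python sketch of the first-digit tally that this kind of check rests on, using only the first row of 24 values above. It is not how dcode does it internally (that is unknown to me), and 24 values is far too few to conclude anything; it only shows the mechanics of comparing observed first digits with Benford's expected share log10(1 + 1/d).

```python
import math
from collections import Counter


def first_digit(value: int) -> int:
    """Leading decimal digit of a non-zero integer."""
    return int(str(abs(value))[0])


def benford_expected(digit: int) -> float:
    """Benford's expected share for a leading digit 1..9."""
    return math.log10(1 + 1 / digit)


# First row of the hourly actuals quoted in the post.
sample = [
    8330, 8324, 11588, 10059, 7317, 8135, 7724, 5693,
    5336, 5145, 5024, 4259, 5727, 6443, 8979, 10297,
    11385, 8123, 10391, 12509, 7893, 9837, 8565, 8381,
]

counts = Counter(first_digit(v) for v in sample)
for d in range(1, 10):
    observed = counts.get(d, 0) / len(sample)
    print(f"{d}: observed {observed:.3f}, Benford {benford_expected(d):.3f}")
```

Swapping in the full list of hourly changes for `sample` gives the observed column that a test would then compare against the Benford column.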

Nyast
Nov 14, 2017

BLAZING AT THE
SPEED OF LIGHT

G0RF posted:

I love it. We get a glimpse of how far the “vision” has progressed since 2017 and it’s still going to be long, lifeless dialogue scenes with big stars mouthing lameass lines and having occasionally “wakka wakka” comic exchanges with crew.

This is the game I hope to play watch someday.

That reminds me of that guy, a day or two ago, that was ranting that CIG's work on SQ42 is difficult because, you know, there's not going to be any scripting, all NPCs are gonna react dynamically with their incredible AI. Just lol.

chadbear
Jan 15, 2020

MedicineHut posted:

We might need to take this offline to avoid derailing the thread, but to get to the bottom of it: I just did a similar exercise with hourly data and it still yields a fake result for me :p

For info: what I used is the data in the second tab, "Hourly Pledge Capture". In there you have total fund absolutes hour by hour, so to get the hourly actuals I had to create a new table and subtract the previous hour's absolute from each hour's absolute. Using that it still gives me a fake. I had to clean out a few rows that had zero values and a few other oddities.

I couldn't find the code that I used so I quickly redid the analysis. I copy/pasted the hourly pledges, converted everything into numbers and fed it into R. I used the benford.analysis package. For absolute hourly pledges I get a very conspicuous result. But in my opinion that's not a fair test because Benford's Law only concerns the first digit and the absolute hourly numbers are not independent. Imagine if you have $9,200,000 and you add some random ships every day. The leading number is going to be 9 for a long time even though there is no fudging.

So instead I looked at changes in hourly pledge numbers, i.e. how much is added every hour to the tracker.
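To make the point about running totals concrete, here is a small Python sketch. The starting total and the repeating hourly amounts are invented for illustration (assumptions, not tracker data).

```python
from itertools import cycle, islice


def first_digit(value: int) -> int:
    """Leading decimal digit of a positive integer."""
    return int(str(value)[0])


# Hypothetical hourly pledge amounts, repeated for one week (168 hours).
changes = list(islice(cycle([1200, 2500, 4800, 900, 3100]), 168))

# Running total starting from a hypothetical $9,200,000.
totals = []
running = 9_200_000
for change in changes:
    running += change
    totals.append(running)

print("digits seen in totals: ", sorted({first_digit(t) for t in totals}))
print("digits seen in changes:", sorted({first_digit(c) for c in changes}))
```

Every running total in that week starts with a 9, while the hourly changes start with several different digits, which is why the changes are the fairer input for a first-digit test.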



The graph in the top left is the most important one. The first digit in the change of the pledge seems to follow Benford's Law pretty closely. There are some caveats though: for the other measures it doesn't seem to follow Benford's prediction. The statistical test that the package produces says:

Mantissa Arc Test

data: diff(data$pledge)
L2 = 0.010266, df = 2, p-value < 2.2e-16

Mean Absolute Deviation (MAD): 0.01254406
MAD Conformity - Nigrini (2012): Marginally acceptable conformity
Distortion Factor: -6.625748

So it's statistically significant but the mean absolute deviation seems to be so low that it's marginally acceptable.

I can share the code if you want.
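Separately from the R code mentioned above, here is a short Python sketch of what the MAD line in that output measures: the mean absolute gap between observed and expected first-digit shares. The cutoffs are the first-digit thresholds as I recall them from Nigrini (2012), so treat them as an assumption and check them against the book or the package documentation.

```python
import math
from typing import Sequence

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def benford_mad(observed_shares: Sequence[float]) -> float:
    """Mean absolute deviation between observed and Benford first-digit shares."""
    return sum(abs(o - e) for o, e in zip(observed_shares, BENFORD)) / len(BENFORD)


def nigrini_label(mad: float) -> str:
    """First-digit conformity bands (assumed cutoffs: 0.006, 0.012, 0.015)."""
    if mad <= 0.006:
        return "Close conformity"
    if mad <= 0.012:
        return "Acceptable conformity"
    if mad <= 0.015:
        return "Marginally acceptable conformity"
    return "Nonconformity"


# The MAD value reported in the output above.
print(nigrini_label(0.01254406))
```

With those cutoffs, 0.0125 falls just past the "acceptable" band, which matches the label printed by the package.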

chadbear fucked around with this message at 10:29 on Aug 14, 2020

MedicineHut
Feb 25, 2016

chadbear posted:

I couldn't find the code that I used so I quickly redid the analysis. I copy/pasted the hourly pledges, converted everything into numbers and fed it into R. I used the benford.analysis package. For absolute hourly pledges I get a very conspicuous result. But in my opinion that's not a fair test because Benford's Law only concerns the first digit and the absolute hourly numbers are not independent. Imagine if you have $9,200,000 and you add some random ships every day. The leading number is going to be 9 for a long time even though there is no fudging.

So instead I looked at changes in hourly pledge numbers, i.e. how much is added every hour to the tracker.



The graph in the top left is the most important one. The first digit in the change of the pledge seems to follow Benford's Law pretty closely. There are some caveats though: the second order test (i.e. the change in change) doesn't seem to follow Benford's prediction. The statistical test that the package produces says:

Mantissa Arc Test

data: diff(data$pledge)
L2 = 0.010266, df = 2, p-value < 2.2e-16

Mean Absolute Deviation (MAD): 0.01254406
MAD Conformity - Nigrini (2012): Marginally acceptable conformity
Distortion Factor: -6.625748

So it's significant but the mean absolute deviation seems to be so low that it's marginally acceptable.

I can share the code if you want.

No worries, thanks, seems like a much more thorough analysis than what I did using this: https://www.dcode.fr/benford-law

MedicineHut
Feb 25, 2016

chadbear posted:

I couldn't find the code that I used so I quickly redid the analysis. I copy/pasted the hourly pledges, converted everything into numbers and fed it into R. I used the benford.analysis package. For absolute hourly pledges I get a very conspicuous result. But in my opinion that's not a fair test because Benford's Law only concerns the first digit and the absolute hourly numbers are not independent. Imagine if you have $9,200,000 and you add some random ships every day. The leading number is going to be 9 for a long time even though there is no fudging.

So instead I looked at changes in hourly pledge numbers, i.e. how much is added every hour to the tracker.



The graph in the top left is the most important one. The first digit in the change of the pledge seems to follow Benford's Law pretty closely. There are some caveats though: for the other measures it doesn't seem to follow Benford's prediction. The statistical test that the package produces says:

Mantissa Arc Test

data: diff(data$pledge)
L2 = 0.010266, df = 2, p-value < 2.2e-16

Mean Absolute Deviation (MAD): 0.01254406
MAD Conformity - Nigrini (2012): Marginally acceptable conformity
Distortion Factor: -6.625748

So it's statistically significant but the mean absolute deviation seems to be so low that it's marginally acceptable.

I can share the code if you want.

Btw, do you mind uploading somewhere the file with the raw hourly change data? Just so I can compare with mine, I am still puzzled that the dcode site, as simple as it is, yields such a different result from yours. The only explanation I can see is that we are using very different data.

Taintrunner
Apr 10, 2017

by Jeffrey of YOSPOS
plane citizen is letting people show off gameplay demos now

https://www.youtube.com/watch?v=xSYOZVkiWqM

chris roberts is gonna be pissed when he finds out about this

LazyMaybe
Aug 18, 2013

oouagh
slight throwback

https://twitter.com/magicalgirlnoir/status/1294211904739708928

sebmojo
Oct 23, 2010


Legit Cyberpunk









Tbh 2020 is so insane I'm 100% willing to believe the funding is all basically on the level.

It must be comforting to put money into a dream, perhaps even more so when there is basically no chance it will ever become real

Kosumo
Apr 9, 2016

Bootcha posted:

To give you other questions to ask data:

June 2020 Analysis:
21 days out of 164 in 2020 brought in 5 digit funding. 12% of the YtD.
2019 had 125 out of 163 YtD (remember, leap year in 2020). 77% of 2019 YtD was 5 figures daily.

2018 year total was $37,759,020.
Daily average of that is $103,449.
let's exclude oct-nov-dec from that count
Oct: $4,660,591
Nov: $7,971,821
Dec: $6,268,518
Those three months brought in $18,900,930
the other 9 months brought in $18,858,090
Just under half the year's total.
273 days of that.
for a rounded average of $69,077

As for those three months we're "removing", let's compare and see if there's some foam.
Those 3 months took in $205,444 per day on rounded average.
So the "low" months take in about a third of funding compared to the high quarter of Oct-Nov-Dec.
I'm just gonna call that holiday time.
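The 2018 split above can be checked with a few lines of Python. All inputs are the figures quoted in the post; the only assumption is that the daily averages were floored to whole dollars, which is what reproduces the quoted numbers.

```python
# 2018 figures as quoted in the post.
total_2018 = 37_759_020
holiday_months = {"Oct": 4_660_591, "Nov": 7_971_821, "Dec": 6_268_518}

holiday_total = sum(holiday_months.values())
rest_total = total_2018 - holiday_total

holiday_days = 31 + 30 + 31      # Oct + Nov + Dec
rest_days = 365 - holiday_days   # remaining nine months of 2018

# Floor to whole dollars (assumption about how the post rounded).
holiday_daily = holiday_total // holiday_days
rest_daily = rest_total // rest_days

print(f"holiday: ${holiday_total:,} over {holiday_days} days, ${holiday_daily:,}/day")
print(f"rest:    ${rest_total:,} over {rest_days} days, ${rest_daily:,}/day")
print(f"low-month daily intake as share of holiday: {rest_daily / holiday_daily:.2f}")
```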

So let's look at 2019.
The gangbusters holiday time that started this all.
Total for 2019 was $47,735,514
Oct: $3,866,815
Nov: $9,700,386
Dec: $11,399,761
Total for Holiday months is $24,966,962
Leaving $22,768,552 for the low months.
Again, about under half, but a bit more this year.

So while the low months of 2019 are kinda weird in their consistency, I'm only going to dock them August and September, in the "where did this fuckery begin" maths.
So I'm adding Aug and Sep to the "holiday" months for 2019.
Aug: $2,561,993
Sep: $2,461,906
$29,990,861 is the new holiday number, with $196,018 daily average for those 153 days.

49% was the non holiday intake ratio in 2018, 37% was the NewHoliday non-holiday intake ratio
Let's go with 43%, in the middle.
If ratios and patterns remain constant, at this ludicrous gangbusters pace...
bearing in mind we still have proper citcon/anniversary still coming up
$108,717,788 would be the projected 2020 funding, according to the funding tracker.

$39,015,588 is our YtD total so far in 2020.
Today hasn't been "counted".
So it's actually by yesterday's measure.
2,711,058 registered accounts.
If every single one of those account bought something THIS year.
$14 per account.
But, let's remember the old Turbulent ratio, from the French HuffingtonPost article.
That half the registered users are paying accounts.
So that puts us at, currently, 1,355,529 paying customers.
For paying customers it'd be $28 per customer spent so far YtD.
225,934 new accounts
Is an 8% increase of accounts YtD 2020 enough to cover a 101% funding total, over 2018, in its entirety?
A 233% increase of dollar per account over the prior year?

Feb 2020 Analysis:
okay, here's the breakdown
2014: $16 per citizen
2015: $12 per citizen
2016: $6 per citizen
2017: $3 per citizen
2018: $3 per citizen
2019: $4 per citizen
2020: $6 per citizen

There is an increase in the last two years, by a third in 2019, and by half in 2020.
Now, comparing that against how much the citizen count has grown per year:
2014: Starting at 346895
2015: 708095, 104% increase
2016: 1141548, 61% increase
2017: 1705848, 49% increase
2018: 1957205, 15% increase
2019: 2214195, 13% increase
2020: 2486328, 12% increase
You'd be right in saying there aren't enough "new citizens" to dilute the increase in pledges.
I'd be willing to write off 2019 as a bull market, that it was the best year ever.

Heck, I'll even go one further, let's say there are 50k super whales that make up 70% of pledges.
Each one of those 50k super whales, to make up 70% of the market, had to have pledged:
2014 $76.05
2015 $123.60
2016 $95.33
2017 $70.52
2018 $82.64
2019 $119.34
2020 $200.46

Dude, you don't know the half of it.

You should check out Sunk Cost Galaxy, you may learn a thing or two.

Shaman Tank Spec
Dec 26, 2003

*blep*



TheAgent posted:

garbage data being fed into tables at massive rates without any sort of protection can cause unrecoverable corruption

since no one there has worked on MMOs or basic security for databases this kinda poo poo is gonna get worse and worse and worse

Working with databases can be tricky at the best of times, especially if best practises aren't followed and doubly especially if some of the people working on said databases don't even know the non-hosed practises.

In my previous project I was managing the backend for a large system that revolved around location data. We stored data from a bunch of people using a locator tag, retrieved through a net socket and stored on a PostgreSQL database. All access to the database was supposed to be handled through a restful API which only allowed limited manipulation to make sure nobody broke anything.

Except then one of the geniuses working on the project somehow got direct access to the database and attempted to make a copy of all our data for some bizarre purpose. This wasn't just some random dude, this was a guy specifically hired to design the database, which he quickly proved incapable of doing thus landing it all in my lap. So anyway, at close to midnight that night I get a frantic Slack message from the guy saying he's made a "small error". Instead of copying the database, he has managed to delete all the contents. All of it. Every single line is still present, but all the values are blank.

LUCKILY because I'm a paranoid rear end in a top hat, one of the first things I had done when the database landed in my lap, was to set up a system to scrape it all into huge and unwieldy JSON files every night, so we had a set of local hard copy backups of the previous five days. I took some pleasure in informing our superiors the next day that the entire database had been blanked by our database guy, deleting two years of data, without which the project was done and dusted. I let them stew for a few seconds before I told them that I had hard copy backups of everything and could restore the database.

I don't even want to think about what a huge database that gets "iterated on" by a project consisting entirely of people who don't seem to have a clue, led by a moron who changes his mind about everything every 15 minutes, looks like. And I have a lot of pity for the poor motherfucker who has to try to keep that thing running somehow.

Colostomy Bag
Jan 11, 2016

:lesnick: C-Bangin' it :lesnick:

Taintrunner posted:

plane citizen is letting people show off gameplay demos now

https://www.youtube.com/watch?v=xSYOZVkiWqM

chris roberts is gonna be pissed when he finds out about this

Well that's all nice and all but can you serve a Hairy Roberts onboard? Checkmate.

Dwesa
Jul 19, 2016

Maybe I'll go where I can see stars
I am sure there will be some whale saying it looks like a scene from a live action movie and that this is the proper way to eat an ice cream.

chadbear
Jan 15, 2020

MedicineHut posted:

Btw, do you mind uploading somewhere the file with the raw hourly change data? Just so I can compare with mine, I am still puzzled that the dcode site, as simple as it is, yields such a different result from yours. The only explanation I can see is that we are using very different data.

Sure. Here's the raw hourly change in the tracker:

https://pastebin.com/8Dc6QF0K

Here's the R code that I used if anyone is interested. Before running it copy/paste the hourly pledges, convert everything to numbers (days and $ values), delete the last column because it's identical to the first column of the next day, and save as a csv.

https://pastebin.com/dVQLyidu

Colostomy Bag
Jan 11, 2016

:lesnick: C-Bangin' it :lesnick:

BTW, you two IMO aren't derailing the thread with stats.

TheAgent
Feb 16, 2002

The call is coming from inside Dr. House
Grimey Drawer
stats and numbers are good and fun

not quite as good as puppers, but still p good

Dr. Honked
Jan 9, 2011

eat it you slaaaaaaag

Taintrunner posted:

plane citizen is letting people show off gameplay demos now

https://www.youtube.com/watch?v=xSYOZVkiWqM

chris roberts is gonna be pissed when he finds out about this

Why would Crobbler ever be pissed off about another game? It doesn't matter what other games are doing. It's not about games, and never was. It's about the cult.

marumaru
May 20, 2013



UnknownTarget posted:

No, actually they use live streamed maps from Bing and use AI to generate the buildings from the satellite imagery. Check it out, it's really cool and looks way more advanced than procedural buildings;

blackshark.ai

ive been in since the alpha
it still kinda blows

MedicineHut
Feb 25, 2016

chadbear posted:

Sure. Here's the raw hourly change in the tracker:

https://pastebin.com/8Dc6QF0K

Here's the R code that I used if anyone is interested. Before running it copy/paste the hourly pledges, convert everything to numbers (days and $ values), delete the last column because it's identical to the first column of the next day, and save as a csv.

https://pastebin.com/dVQLyidu

Thanks. Yeah, we are using pretty much the same figures. Dcode just returns a fake result with both your set and mine though: https://www.dcode.fr/benford-law

MedicineHut fucked around with this message at 13:30 on Aug 14, 2020

Combat Theory
Jul 16, 2017

Thoatse posted:

lol

https://www.youtube.com/watch?v=Y6MnYX6OJvM

Birds start ~4:15 but check out the elephants and giraffes and poo poo too

Birds in flight simulations fill me with anxiety about bird strikes.

downout
Jul 6, 2009

Bootcha posted:

The Benford Analysis won't work for third-party tracked statistics. We'd have to have the exact data CIG reports on their timing method, to include hourly/daily/monthly. However, even then that isn't really what the Benford Analysis would want to look at, it wants to look at the financial statements of earnings and expenditures.

The only data set that would be as "truthful" as possible would be the UK financials, but they only file annually, not quarterly, so the data set is very limited, to I believe 5 points at the moment (EOY), so that's not really enough to create a distinguishable smooth curve. Remember, this really only works when you have hundreds of data entries, not single digits.

This all being said, I do think there are some questions we can ask the funding tracker data, in conjunction with the US/UK financials, the Kickstarter backer tier counts, and the Turbulent 50% ratio. We solidly know the date, the amount of reported accounts, and the amount of recorded funding. The Kickstarter page has "34,397 backers pledged $2,134,374 to help bring this project to life" divided into tiers, however that does not guarantee each of those 34,397 are unique, but we can at least confirm there were 34,397 purchases. I think a ratio of buying habits can be drawn from that.

The trick is asking the data the right questions.

For example, if I wanted to predict the possible income from subscribers in 2019 and 2020, I would take the EOY account numbers on Jan1 against the CIG financial reports of subscriber income in thousands over the last 4 years and average that, which is 1.686 in subscriber dollars per account created by the way. From there, I can see what Jan1 2020's account number was (2,486,328), and predict that 2019 will report about 4.2 million in subscriber income, and also according to YtD CIG has brought in about 4.6 million in 2020 with a possible 5 million by EOY.

I did question if the data set was valid for doing Benford analysis. I was just doing layman's reading which suggested a larger data set is better, and numbers that had an equal opportunity to have any numerical values 1 - 9 for the 1st digit. Which suggested to me that individual purchase amounts would be worse than the daily reported totals for analysis.

chadbear
Jan 15, 2020

MedicineHut posted:

Thanks. Yeah, we are using pretty much the same figures. Dcode just returns a fake result with both your set and mine though: https://www.dcode.fr/benford-law

I don't think it's a fake result. The test is also significant when I run it in R. If you compare the observed rates on the left with the predicted rates on the right you see that there are some differences but they are not large. I would call such a difference statistically significant but not significant from a smoking gun perspective.

MedicineHut
Feb 25, 2016

chadbear posted:

I don't think it's a fake result. The test is also significant when I run it in R. If you compare the observed rates on the left with the predicted rates on the right you see that there are some differences but they are not large. I would call such a difference statistically significant but not significant from a smoking gun perspective.



Yeah, I also noticed the distribution % are eyeballed close to expected distribution but I presume the fact it is rejected nevertheless suggests the rejection criteria for p-value has other control elements that are beyond eyeball smoking guns. That p-value is really tiny.

What this may mean in practical terms is that CIG is indeed cooking certain numbers into the real numbers in that tracker; maybe not in enough quantity to allow a smoking gun eyeball confirmation but enough to get absolutes to record level funding all the while not quite managing to fool Benford's Law.

MedicineHut fucked around with this message at 14:32 on Aug 14, 2020

trucutru
Jul 9, 2003

by Fluffdaddy

UnknownTarget posted:



Blackshark looks at an image and uses (talking out of my butt but this is how I would do it) AI image recognition to determine outlines of buildings, trees, water features, etc. Then, it draws a floor plan specifically for each building based on its findings and then applies a texture, generated off of imagery, to the model.


Yes, and the kind of process you just described is called procedural generation. You can call it AI-based Satellite-assisted targeted generation or whatever but it's still procedural generation.

Fidelitious
Apr 17, 2018

MY BIRTH CRY WILL BE THE SOUND OF EVERY WALLET ON THIS PLANET OPENING IN UNISON.
I know people are kind of desperate to believe that the funding tracker is faked because it makes you feel better about humanity, but I'm sorry guys, it's real.

I believe it's fully legitimate and the bump this year has been indirectly funded by government emergency money to people that didn't actually need it to survive.
It's one of those Occam's Razor things. While it's fun to imagine conspiracies about money laundering the much simpler answer is that the Star Citizen community has a good number of idiots with money to burn.

TheAgent
Feb 16, 2002

The call is coming from inside Dr. House
Grimey Drawer
star citizen has always been funded indirectly by government money tho

trucutru
Jul 9, 2003

by Fluffdaddy

Look at you, you filthy FUDster, your tweet barely has a third of the likes and replies of an official tweet from the company that has the most invested backers around.

https://twitter.com/RobertsSpaceInd/status/1293585843714551808

MedicineHut
Feb 25, 2016

MedicineHut posted:

Yeah, I also noticed the distribution % are eyeballed close to expected distribution but I presume the fact it is rejected nevertheless suggests the rejection criteria for p-value has other control elements that are beyond eyeball smoking guns. That p-value is really tiny.

What this may mean in practical terms is that CIG is indeed cooking certain numbers into the real numbers in that tracker; maybe not in enough quantity to allow a smoking gun eyeball confirmation but enough to get absolutes to record level funding all the while not quite managing to fool Benford's Law.

Further reading possibly relevant (or maybe just some confirmation bias on my part :p ): https://www.acfeinsights.com/acfe-insights/2016/1/15/benfords-law-a-real-life-case-study

quote:

A single vendor, in a population of more than 16,000 checks spanning a period of 10 years, had over 1,400 checks written to them over a period of just three years. Nearly 10 percent of the total checks were written to this single vendor, in just one-third of the time. The other entity had more than 1,700 checks written to one vendor over a period of five years. How is that possible? That doesn’t just happen naturally in most businesses. There must be some other reason.

quote:

In and of itself, a Benford’s Law analysis will not produce a smoking gun, but it will shine a light on the cloud of smoke, and if you follow that cloud of smoke, you might find the smoking gun. This is a fine example of the process in action.

MedicineHut fucked around with this message at 14:50 on Aug 14, 2020

chadbear
Jan 15, 2020

MedicineHut posted:

Yeah, I also noticed the distribution % are eyeballed close to expected distribution but I presume the fact it is actually rejected suggests the rejection criteria for p-value has other control elements that are beyond eyeball smoking guns. That p-value is really tiny.

What this may mean in practical terms is that CIG is indeed cooking certain numbers into the real numbers in that tracker; maybe not in enough quantity to allow a smoking gun eyeball confirmation but enough to get absolutes to record level funding all the while not fooling Benford Law.

I learned in my statistics education that with enough data points you can falsify any single hypothesis. As you gather more data points the empirical pattern converges ever closer to its true pattern. A theory like Benford's Law will never actually be "true" though, because theories are simply inadequate human tools to organize patterns. All theories are wrong. It's a question of how useful a theory is. Useful theories give better predictions than bad ones. Benford's Law definitely fits the data better than a uniform distribution, for example.

Since the number of data points that this dumpster fire has generated over the last decade is quite large I would also eyeball the data instead of relying on a single p-value. My gut feeling says that the test is inconclusive.
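A quick Python sketch of the sample-size effect: keep a small, fixed deviation from Benford's shares and only change the number of data points. The 0.01 shift is a made-up deviation for illustration, and 15.507 is the standard chi-square critical value for 8 degrees of freedom at the 5% level.

```python
import math
from typing import Sequence

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
CRITICAL_5PCT_DF8 = 15.507  # chi-square cutoff, df = 8, alpha = 0.05


def chi_square(observed_shares: Sequence[float], n: int) -> float:
    """Pearson chi-square statistic for first-digit shares observed on n values."""
    return n * sum((o - e) ** 2 / e for o, e in zip(observed_shares, BENFORD))


# Hypothetical data: one percentage point moved from digit 1 to digit 9.
shifted = list(BENFORD)
shifted[0] -= 0.01
shifted[8] += 0.01

for n in (500, 5_000, 50_000):
    stat = chi_square(shifted, n)
    verdict = "rejected" if stat > CRITICAL_5PCT_DF8 else "not rejected"
    print(f"n={n:>6}: chi-square={stat:8.2f} -> Benford {verdict}")
```

The deviation is identical in all three runs; only the largest sample crosses the cutoff, because the statistic grows in proportion to n.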

Dark Off
Aug 14, 2015




chadbear posted:

I learned in my statistics education that with enough data points you can falsify any single hypothesis. As you gather more data points the empirical pattern converges ever closer to its true pattern. A theory like Benford's Law will never actually be "true" though, because theories are simply inadequate human tools to organize patterns. All theories are wrong. It's a question of how useful a theory is. Useful theories give better predictions than bad ones. Benford's Law definitely fits the data better than a uniform distribution, for example.

Since the number of data points that this dumpster fire has generated over the last decade is quite large I would also eyeball the data instead of relying on a single p-value. My gut feeling says that the test is inconclusive.

furthermore mega whales of SC might skew the values towards fake as well.
they are spending ludicrous amounts of money on this, emulating fake transactions.

Basically the theory won't hold against human stupidity


trucutru
Jul 9, 2003

by Fluffdaddy

chadbear posted:

All theories are wrong. It's a question of how useful a theory is. Useful theories give better predictions than bad ones. Benford's Law definitely fits the data better than a uniform distribution, for example.


My theory is that CR is a hack and my prediction is that SQ42 is gonna suck. Prove me wrong, brainiac.
