[shameless bump!]Since I'm quite free today and the next couple of days, I would like to revisit this stats again. I'm really interested to find whether it's true that higher ranking decks get more hits than the rest of the decks in each league. Sort of a Mythbuster mission - whether it's a myth, fact or plausible.
However, I'm not entirely confident in my analysis, thus I would like to hear some feedback on this (especially from those math/stats wiz).
Bear in mind that I'm not looking for anything that's too complicated - My only tools will be MS excel.
To make things a bit easier to understand, I'll be referring to this googlesheet (
https://spreadsheets.google.com/spreadsheet/ccc?key=0ApI-50p2LmlcdFUzUDhDdFB1bVFnSG5pbkt4YzR0WlE&hl=en_US) on some points below. You should be able to play around with the last tab (pivot table), but I really hope no one will "bork" it.
Okay, so here goes my wall of text. >.<
[start wall of text]DATA POOLIn order to get a good result, stats from 1 day is not enough. I haven't decided how large the data pool should be (how many days), but I think it should be around 3-4 days.
GATHERING DATASince the leaderboard only shows the total games of each deck, I need to figure the exact number of hits for that particular day. To get these numbers I need to collect the data for every 24 hours; i.e. I collect the first data on 1st Aug at 12:00 GMT, the second on 2nd Aug 12:00 GMT, etc. Bottom line, the time should be specific.
FORMULANext, is the formula. Specifically,
how to calculate the exact number of hits for that particular day. The basic formula is fairly easy; i.e. I just need to calculate total hits from Day 2
-- total hits from Day 1. To anyone who is familiar with excel functions, I'm using the vlookup function (see Tab 2, column L. "Day_1" is a named range referring to Tab 1)
On the other hand, there are some challenges/obstacles that I have identified so far:
User can reset his deck; e.g. in Day 1 the user has a total games of 10, but he reset his deck in Day 2 and got 5 hits on that day.
Solution: I can't use the basic formula. I would just need to take "5" as the total number of hits for Day 2.
Similar to above, users who were not in the leaderboard in Day 1 can submit a new deck in Day 2 and enter the leaderboard.
Solution: same as above.
The time when a deck was reset/submitted. These can happen within the 24 hours. I'm not sure whether this would have any significant effect or not. Also, I don't think there's a way to overcome this "challenge".
Solution: dunno... >.<
Based on the above, I've concluded as per below. You can also find the formula in Tab 2, column M.
(btw, this is not a psedocode. I'm not a programmer >.<) If deck's age=0 (deck reset) OR name not found in Day 1 (new user in leaderboard),
Take total number of hits for Day 2
Else
total number of hits for day 2 - total number of hits for day 1RESULTDisplaying the result is fairly easy using Pivot table. The tricky part is to figure out which are the most relevant values. For starters, I'm going with rank vs total number of hits. I'm displaying the rank in range of 1-24, 25-49, etc. Then, I have min, max, sum and average of hits within the rank range.
Below is the result for Silver league. The data was taken at around 5:00 GMT July, 24 . So, I can say that from 5:00 GMT July, 23 till 5:00 GMT July, 24, people around the world played a total of 5,956 games in Silver league.
You can also view this in the googlesheet link I've provided earlier. Take note that the pivot table option may take a while to load. Make sure you select on of the cells in the table and wait. The only filter you should change is the League as shown below.
[end wall of text]
Again, I would very much appreciate any feedback about this, specifically the analysis/method/math/formula/orwhutevaitis that I mentioned above. Also, I would rather we do not begin to discuss the result from this one day result... yet.