(edited a bit to reflect that some of my concerns were covered in a PM I saw after posting... and later I noticed that I corrected for something twice, and so fixed that)
If he got two sets with high marks and one with low marks, then overall he got what he got, 22 out of 75.
In round numbers, about one in 20 people would do that well or better against a truly uniform random selection of cards without replacement from an "infinite shoe."
It is borderline, and to know which side of the 5% probability cutoff it falls on (the most meagre criterion for "statistical significance" that anybody would begin to take seriously), I'd need to know a few more details of how the site works. (It is definitely not uniform random without replacement from an infinite shoe, so the figure computed from the simple model will be a bit off.)
(The "5% cutoff" used here reflects that both high scores and also low scores "win." I have a problem with deciding this after the results have been viewed, since that invalidates one of the assumptions of the calculation.)
The 5 older sets (35 out of 125) are more clearly over the 5% line (not the .05% line, which is 5 in 10,000, not 5 in 100). I'm assuming, of course, that this record of 5 sets does not survive because it is "good" while others were lost because they were less "good."
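For anyone who wants to check the arithmetic, here is a minimal sketch of the simple model. It assumes each trial is an independent guess with a 1-in-5 chance of a hit (my assumption of five equally likely symbols, which is what an "infinite shoe" amounts to), and it doubles the one-sided tail as a crude stand-in for counting both high and low scores as a "win." The function name is just for illustration, and the real site almost certainly departs from this model.

```python
from math import comb

def upper_tail(hits, trials, p=0.2):
    """P(X >= hits) for X ~ Binomial(trials, p).

    p = 0.2 is an assumption: five equally likely symbols,
    each trial independent of the others.
    """
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )

# The two records discussed above: 22 of 75 and 35 of 125.
p_recent = upper_tail(22, 75)
p_older = upper_tail(35, 125)

for label, tail in (("22 of 75", p_recent), ("35 of 125", p_older)):
    # Doubling the one-sided tail is only a rough two-sided figure.
    print(f"{label}: one-sided {tail:.4f}, doubled {min(1.0, 2 * tail):.4f}")
```

The doubled figures are the ones to hold against the 5% line, and only if the record is complete and the criterion was fixed before looking at the results.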
I cannot understand the rest of the post. Why are there "such-and-such out of 11" entries? There are 25 trials in a set. What happened to the other 14 in that set?
QUOTE
When he got an exceptionally high score, I recorded it.
You need to record every score. If this is just the times when he won, then there is no point doing an analysis to see if his good showing could plausibly be by chance. It isn't. You only recorded the good scores.
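To see how much damage selective recording does, here is a small simulation sketch. It assumes a guesser with no ability at all (a 1-in-5 hit chance per trial, 25 trials per set) and a hypothetical rule of writing down only sets with 8 or more hits. The threshold, the number of sets, and the function name are all made up for illustration.

```python
import random

def run_sets(n_sets=2000, trials=25, p=0.2, keep_at_least=8, seed=1):
    """Simulate pure-chance guessing and compare the hit rate over all sets
    with the hit rate over only the 'recorded' (high-scoring) sets."""
    rng = random.Random(seed)
    # Score of each set: number of hits in `trials` independent guesses.
    scores = [sum(rng.random() < p for _ in range(trials)) for _ in range(n_sets)]
    # Hypothetical recording rule: keep a set only if it looks impressive.
    kept = [s for s in scores if s >= keep_at_least]
    all_rate = sum(scores) / (n_sets * trials)
    kept_rate = sum(kept) / (len(kept) * trials) if kept else float("nan")
    return all_rate, kept_rate, len(kept)

all_rate, kept_rate, n_kept = run_sets()
print(f"hit rate over every set:     {all_rate:.3f}")
print(f"hit rate over recorded sets: {kept_rate:.3f} ({n_kept} sets kept)")
```

The recorded sets look well above chance even though nothing but chance is operating, which is why a significance calculation on them tells you nothing.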
You definitely want to Google "cherry picking" statistics for guidance on some of these issues.
What you report is tantalizingly better than chance, but the methods strongly suggest both selective reporting and also making up the criterion for success after you have seen the outcome.
(EV may have some further testing-related posts, however, and so it is best to stop here and turn things back to him).