I think the results most likely look weird because, at only 100 games, the error margin is large. Testing is time-consuming, so it is tempting to stop a test early based on the initial results, but if you do that you do not have an accurate measurement and you are responding to noise.
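To put a rough number on that error margin, here is a minimal sketch using the normal approximation to the binomial. The function name `score_margin` is made up for illustration, and it assumes every game is independent and ends in a win or a loss; with draws the real margin would be somewhat smaller, so treat this as a pessimistic estimate rather than an exact figure.

```python
import math


def score_margin(games, score=0.5, z=1.96):
    """Approximate 95% margin of error on the measured score fraction.

    Assumption: independent games, each counted as a win (1) or a loss (0).
    Draws reduce the variance, so the true margin would be a bit smaller.
    """
    return z * math.sqrt(score * (1.0 - score) / games)


# Compare how the margin shrinks as the number of games grows.
for n in (100, 1000, 10000):
    margin = 100 * score_margin(n)
    print(f"{n:>6} games: +/- {margin:.1f} percentage points")
```

Under these assumptions, 100 games gives a margin of roughly 10 percentage points either way, so a 55% result is not distinguishable from an even match. The margin shrinks with the square root of the game count, which means you need about 100 times as many games to make it 10 times smaller.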