In 1936, a research company set out to predict who would win the US presidential election between the incumbent Democrat, Franklin D. Roosevelt, and Republican Alf Landon, explained Christoph Safferling of Ubisoft Blue Byte during his talk at GDC Europe 2014 in Cologne.
The firm tried to gauge the voting intentions of as many US citizens as it could, using phone books and car registration records to mail its survey to a huge chunk of the population.
Perhaps surprisingly, the survey predicted a win for Landon. In reality, Roosevelt won a second term in a landslide, a result accurately predicted by the then little-known Gallup, based on a sample of just a couple of thousand people.
How could a much larger survey be so wrong when a comparatively minuscule sample predicted the result with ease?
According to Safferling, the people behind the larger survey made a grave error in how they selected their respondents.
"At the time, only rich people owned phones or drove cars, which meant that the results had a bias towards rich voters who were more likely to vote for Landon," Safferling pointed out.
Bigger does not always mean better when it comes to producing statistics that hold water, yet it's a pitfall developers increasingly fall into without realizing it.
For instance, many developers rely on data from "self-selected" groups: people who volunteer to give their views. These may be early adopters, forum posters, or simply dedicated gamers willing to take 30 seconds out of their day to answer your questions.
Whether their opinions are positive or negative, these players tend to take a more extreme view of your game than the wider audience does, which can lead developers to make the wrong call when deciding whether to add or remove features.
"Self selection is actually everywhere, especially in games," Safferling added. "Don't be blinded by the high KPI numbers. You almost need to take them and then knock them down a bit."
Even if you take care to avoid self-selection, data can still produce misleading results.
Safferling went on to point out that statistics show that Coca-Cola drinkers, for instance, have fewer problems in childbirth.
"This caused people to worry, thinking that it's not really right to be telling pregnant women to drink more soda. But, of course, who primarily drinks Coca-Cola? Young people."
And young people are, in general, less likely to have trouble in childbirth. The statistic reflects the age of the people drinking Coca-Cola, not any effect of the drink itself.
The takeaway, then: don't let data rule the design of your game. While it's an important facet of game development, data is meaningless if you don't know what it measures or why it matters.