It is easy to be fooled by statistics. Analysis of big data sets sometimes leads to conclusions that make no sense, and a correlation found in the data says nothing by itself about cause and effect. Just because two things are correlated does not mean that one causes the other.

Take the example of Kaggle, which ran a contest in 2012 on the quality of used cars and the characteristics of those cars. A used car dealer supplied the data, and the goal was to predict which cars were likely to have problems, what their characteristics were, and which cars were less likely to have problems. A correlation analysis showed that cars painted orange were far less prone to defects, at about half the rate of other cars.

What has the color of a car got to do with its problems? Color has no plausible causal link to defects; this was just a chance pattern that the analysis happened to pull out. But once such a correlation between car defects and color has been found, the conclusions drawn from it tend to get ridiculous:

• Paint your car orange to have fewer defects.
• Buy an orange car and it will last longer no matter how you treat it, so forget about the oil change.
• If you have an orange car, you do not need to maintain it.

Conclusions like these only get more tangled the further you push them. Even with the most sophisticated analysis, it is important to reason about what a result means rather than believe everything that can be concluded from it.
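How a chance pattern like the orange-car result can show up is easy to illustrate with a small simulation. The sketch below is only an illustration under made-up assumptions: the number of colors, the cars per color, and the 10% defect rate are hypothetical values chosen for the example, not figures from the Kaggle data. Every color is given exactly the same true defect rate, so any difference between colors in the output is pure noise.

```python
import random


def simulate_defect_rates(n_colors=20, cars_per_color=50, defect_rate=0.1, seed=0):
    """Observed defect rate per color when color has no effect at all.

    All parameters are hypothetical illustration values, not real data.
    """
    rng = random.Random(seed)
    rates = {}
    for c in range(n_colors):
        # Every car has the same chance of a defect, whatever its color.
        defects = sum(rng.random() < defect_rate for _ in range(cars_per_color))
        rates[f"color_{c}"] = defects / cars_per_color
    return rates


rates = simulate_defect_rates()
overall = sum(rates.values()) / len(rates)
best_color = min(rates, key=rates.get)

print(f"overall observed defect rate: {overall:.3f}")
print(f"'best' color: {best_color} with rate {rates[best_color]:.3f}")
```

With enough colors and few cars per color, some color will usually look noticeably better than average even though nothing distinguishes it. Changing the seed typically changes which color "wins", which is a hint that the pattern is noise rather than a property of the color.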