Big Data is not always right – Fooled by analytics

By scientist July 3, 2013

It is easy to be fooled by what statistics seem to say about data. Sometimes, analysis of big data sets leads to conclusions that make no sense. In particular, a correlation found in big data does not establish cause and effect: just because two things are correlated does not mean that one causes the other. Take the example of Kaggle, which ran a contest in 2012 on used cars and their characteristics. A used car dealer supplied the data, and the goal was to predict which cars were likely to have problems, what characteristics those cars had, and which cars were less likely to have problems. A correlation analysis showed that cars painted orange were far less prone to defects, at about half the rate of other cars. What does car color have to do with mechanical problems? There is no obvious causal link between color and defects; most likely this was a chance pattern pulled out of the data. But once such a correlation between car defects and color has been found, the conclusions that can be drawn from it quickly become ridiculous:

• Paint your car orange to have fewer defects.
• Buy an orange car and it will last longer no matter how you treat it, so forget about oil changes.
• If you have an orange car, you do not need to maintain it.

These conclusions only get more tangled the more you build on them. Even with the most sophisticated analysis, it is important to think about the underlying reason for a result rather than believe everything that can be concluded from it.
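The "chance pattern" idea can be made concrete with a small simulation. What follows is a minimal sketch under stated assumptions, not the analysis used in the Kaggle contest: it builds a synthetic fleet in which defects are, by construction, independent of color (the fleet size, color mix and 12% defect rate are made-up values), then applies a permutation test, a standard technique swapped in here for illustration, to ask how often randomly shuffling the defect labels produces a gap at least as large as the one observed for a given color. The function names `simulate_fleet`, `defect_rates` and `permutation_p_value` are hypothetical.

```python
import random


def simulate_fleet(n_cars, colors, weights, defect_prob, seed=0):
    """Generate synthetic (color, is_defective) pairs.

    Defects are drawn independently of color, so any color effect seen in
    the output is pure chance by construction.
    """
    rng = random.Random(seed)
    return [(rng.choices(colors, weights)[0], rng.random() < defect_prob)
            for _ in range(n_cars)]


def defect_rates(cars):
    """Return the observed defect rate for each color present in the data."""
    totals, defects = {}, {}
    for color, is_bad in cars:
        totals[color] = totals.get(color, 0) + 1
        defects[color] = defects.get(color, 0) + int(is_bad)
    return {color: defects[color] / totals[color] for color in totals}


def permutation_p_value(cars, color, n_perm=1000, seed=0):
    """Two-sided permutation test for one color versus all other cars.

    Shuffling the defect labels breaks any real link to color; the p-value
    is the share of shuffles whose rate gap is at least the observed gap.
    """
    rng = random.Random(seed)
    labels = [int(is_bad) for _, is_bad in cars]
    in_group = [c == color for c, _ in cars]
    n_group = sum(in_group)
    n_rest = len(cars) - n_group
    if n_group == 0 or n_rest == 0:
        raise ValueError("need cars both with and without this color")
    total_bad = sum(labels)

    def gap(lbls):
        # Absolute difference in defect rate: this color vs. everything else.
        bad_group = sum(b for b, m in zip(lbls, in_group) if m)
        return abs(bad_group / n_group - (total_bad - bad_group) / n_rest)

    observed = gap(labels)
    shuffled = labels[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if gap(shuffled) >= observed - 1e-12:  # small tolerance for float ties
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical color mix; orange is deliberately rare.
    colors = ["white", "black", "silver", "red", "blue", "orange"]
    weights = [30, 25, 25, 10, 9, 1]
    fleet = simulate_fleet(2000, colors, weights, defect_prob=0.12, seed=42)
    rates = defect_rates(fleet)
    best = min(rates, key=rates.get)  # color chosen only after looking
    print({c: round(r, 3) for c, r in sorted(rates.items())})
    print(best, permutation_p_value(fleet, best))
```

Because the demo picks the best-looking color only after seeing every rate, its p-value is optimistic. One simple correction (Bonferroni) is to compare it against 0.05 divided by the number of colors examined; with hundreds of car attributes scanned at once, at least one striking but meaningless "orange car" pattern becomes likely.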

Data analysis and Big Data

Searching for datasets
By scientist March 15, 2021

For anything in Artificial Intelligence or Machine Learning, datasets are important, and tuning the algorithms sometimes requires a dataset that is both useful and valid. One search tool that many people use is Google, but there is a dedicated page for searching datasets: https://datasetsearch.research.google.com/ Another site talks about the background of the Google search engine and other…

NIH thousand genome project
By scientist June 19, 2013

How do you access 200 terabytes of data? NIH is collecting data on many genomes, which are stored in digital form for researchers to access for their own purposes. Because this is an NIH project, these large data sets are available for free. But how do you process this amount of data? About the only…

Electronic lab notebook (ELN) comparison
By scientist June 23, 2013

The lab is an important place where discoveries are made, whether it is virtual or a physical space where experiments are conducted. It is also where data are acquired and compared, and where conclusions are finally drawn that make discoveries possible. One feature that is common to most labs…

Extensible Open source -omics software
By scientist February 23, 2021

Understanding complex data takes effort. This graphic shows comorbidities in COVID-19 and was created with a piece of software called Cytoscape. There is a long list of open source software available, and Cytoscape is one of them. It can integrate network data from various sources, which can then be analyzed…

Hacking medicine and Amazon Kindle
By scientist June 16, 2013

The current period has been called the information age, in which information is the most important element of society. Interestingly, with so much data being collected, the next age is the data analysis age, in which analyzing that data is what matters. An interesting thing about research in medicine is that it has been oriented towards clinical…

Single cell addresses
By scientist June 24, 2013

When we think of the internet, it is amazing to see how many computers are connected. This was made possible by Internet Protocol version 4 (IPv4), which provides 2^32 (4,294,967,296) addresses, even though not all of them are usable. They are mostly used up, and the total number is about 4.3 billion, which is about…
