Category: Data analysis and Big Data

  • Enabling data visualization for large data

    The data analysis tool most people use is something like an Excel spreadsheet, with charts to look at the data they have collected. However, once the dataset gets bigger, the tools to analyze the data also need to scale. Additionally, as the dataset gets bigger, it is managed through a database…

  • Managing PDF Literature

    Each researcher spends quality time managing the literature they have collected in the form of papers and PDFs. However, the collection of PDFs continues to grow and there is very little in the way of tools to help manage it. There are three great open source tools that work wonderfully to manage the collection…

  • Haploid maps : Large data sets of haploids

    Some of the datasets being generated are amazingly large, completely open for anyone to see and use, and can yield valuable conclusions. HapMap is one such grand project. It is made up of participants from the US, UK, China, Japan, Canada and Nigeria. Look at the link below to get all the…

  • Integrating information: DistilBio perspective

    All scientists agree that a lot of information and data is being generated, and that integrating it takes a significant amount of time. Drawing conclusions from it and making new discoveries is quite another matter. A company called “DistilBio” has done a wonderful job of integrating those…

  • Data – strange tales of paper size – A4

    Paper comes in different sizes. Paper is used every day and its sizes are accepted as a standard, but there is some interesting mathematics behind them too. Consider size A0: the area of an A0 sheet is 1 square meter. Interestingly, each subsequent size is half the area of the previous size.…
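
    The halving rule in the excerpt above can be sketched in a few lines of Python. This is only an illustration: the A0 starting size of 841 x 1189 mm comes from ISO 216 and is not stated in the excerpt, and the function names are made up for this example. Because the sizes are whole millimeters, the computed A0 area comes out marginally under 1 square meter.

```python
# Assumed starting point (from ISO 216, not from the excerpt): A0 in millimeters.
A0_MM = (841, 1189)

def a_size_mm(n):
    """Return (width, height) in mm of sheet A<n>, found by halving A0 n times."""
    width, height = A0_MM
    for _ in range(n):
        # Cut the sheet in half across its long side:
        # the old width becomes the new height, and the new width is
        # half the old height, rounded down to a whole millimeter.
        width, height = height // 2, width
    return width, height

def area_m2(n):
    """Area of sheet A<n> in square meters."""
    width, height = a_size_mm(n)
    return width * height / 1_000_000

if __name__ == "__main__":
    for n in range(5):
        w, h = a_size_mm(n)
        print(f"A{n}: {w} x {h} mm, area {area_m2(n):.4f} m^2")
```

    With these assumptions, `a_size_mm(4)` returns `(210, 297)`, the familiar A4 sheet, and each area is roughly half of the one before it.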

  • Examples of the use of data mining techniques – Learning.

    A newly coined term, “data exhaust”, has become popular. It implies that data collected without a specific need or routine is still useful, even though it is being “exhausted” like waste gases. Take the example of massive open online courses such as Coursera. Massive open online courses (MOOCs) are the latest trend in learning.…

  • Examples of use of data mining techniques – E book readers

    There is great value in e-book readers. Yes, they make it very convenient to have a lot of books available, and reading on a screen is much easier than messing with paper and other tools. However, the big value of e-book readers is to the seller of books. Typically, they did…

  • Examples on how data mining helps with real world problems – data scientist

    Facebook’s Jeff Hammerbacher is an interesting person. He literally coined the term “data scientist” and has been one of the big proponents of data mining. He found that the big predictor of whether people will take an action is whether they have seen their friends do the same thing. This is true whether you…

  • Big Data is not always right – Fooled by analytics

    It is easy to be fooled by how statistics are used to interpret data. Sometimes, analysis of big data sets leads to conclusions that may not make sense. Also, cause and effect do not work quite the same way when a big data analysis shows a correlation. Just because there is a correlation does not mean that…
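
    One way to see how a correlation can mislead is a small simulation. The sketch below is a made-up illustration, not taken from the post: it assumes a hypothetical hidden factor that drives two measurements which never influence each other. The names `hidden`, `x` and `y` and the noise level are arbitrary choices.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical hidden factor that drives both measurements.
hidden = [rng.gauss(0, 1) for _ in range(n)]
# x and y never influence each other; each is the hidden factor plus its own noise.
x = [h + rng.gauss(0, 0.3) for h in hidden]
y = [h + rng.gauss(0, 0.3) for h in hidden]

r_raw = pearson(x, y)
# Subtracting the hidden factor leaves only the independent noise.
r_adjusted = pearson([a - h for a, h in zip(x, hidden)],
                     [b - h for b, h in zip(y, hidden)])

print(f"correlation of x and y: {r_raw:.2f}")
print(f"after removing the hidden factor: {r_adjusted:.2f}")
```

    In this setup the two measurements are strongly correlated even though neither causes the other, and the correlation largely disappears once the shared factor is removed.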

  • New York city inspectors on over crowding spaces with big data analysis

    The NY Times reported on Mike Flowers, who used quantitative measures to find apartments or buildings that were overcrowded. The typical method was to make random checks or to visit areas that had received complaints about overcrowding in the apartments. The hit rate was about 10%, which means less than random and pretty much…