The brain performs a remarkable task: it simplifies the world, and that simplification is what lets us distinguish the signal from the noise. Take the example of a small patch of forest. This forest has trees, plants, and some animals. Each tree has branches, leaves, and other structures with distinct shapes and sizes. If we had to record all the information about one tree, we would need a significant amount of detail: the shape of each branch and the shape, color, and size of each leaf, since no two branches or leaves are identical.
Described this way, and combined with the same information about every other tree, the forest becomes tremendously complex.
However, calling a tree a tree makes several assumptions: that some features of the tree are random, and that its leaves vary within an expected range. Once you have made that leap of generalization, the forest becomes a collection of trees that vary in size, shape, and color but can be broadly identified and grouped into a single structure called a tree. A diseased tree, however, has slightly different characteristics, for example fewer leaves or anomalous branches that are less branched and appear broken. Those differences fall outside the expected variation, which is why the diseased tree stands out.
When looking at Big Data, it is important to reduce the data to structures, shapes, or broad categories that separate the expected noise from the small signal. It is then much easier to comprehend the structure of the data and find the signal among the noise.
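To make the idea concrete, here is a minimal sketch in Python. It assumes each record is a (category, value) pair, such as a leaf count per plant. The function name `summarize_and_flag`, the sample numbers, and the threshold of 5 are all hypothetical choices made for illustration, and the median absolute deviation is just one of several reasonable ways to estimate the expected variation.

```python
from collections import defaultdict
from statistics import median


def summarize_and_flag(records, threshold=5.0):
    """Group values by category, then flag values far from the group's median."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)

    summary = {}
    for category, values in groups.items():
        typical = median(values)
        # Median absolute deviation: a robust estimate of the expected noise.
        spread = median([abs(v - typical) for v in values])
        if spread == 0:
            # No variation at all, so anything different from the median is unusual.
            outliers = [v for v in values if v != typical]
        else:
            outliers = [v for v in values if abs(v - typical) / spread > threshold]
        summary[category] = {"typical": typical, "spread": spread, "outliers": outliers}
    return summary


# Hypothetical leaf counts: six healthy trees, one sparse tree, three shrubs.
records = [
    ("tree", 980), ("tree", 1020), ("tree", 1005), ("tree", 995),
    ("tree", 1010), ("tree", 990), ("tree", 400),
    ("shrub", 120), ("shrub", 110), ("shrub", 130),
]

summary = summarize_and_flag(records)
print(summary["tree"]["outliers"])   # [400]
print(summary["shrub"]["outliers"])  # []
```

Ten individual measurements collapse into two categories, each described by a typical value and a spread. The one tree with far fewer leaves is the only value reported, which mirrors how the diseased tree stands out once the rest of the forest has been generalized.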