Databases summarized for the biologist

A simple definition: a database is a collection of data that is managed as a single unit. There are six major types, or models, of databases:

Flat file databases: These databases kept all of their data in one flat file, which was simply a list of items. There were several ways to manage such a file; the simplest was to build an index of the items so that a user could easily track, find, or catalog any item in the list. Often the data was spread across a collection of files that together made up the database. This was the earliest stage in the evolution of the database, used in early systems such as DB1V.
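The index idea can be sketched in a few lines of Python. This is only an illustration: the tab-separated layout, the sample records, and the names `build_index` and `lookup` are assumptions made for the example, not a description of any particular flat file system.

```python
# Hypothetical flat file: one record per line, fields separated by tabs.
FLAT_FILE = (
    "BRCA1\tHomo sapiens\t17q21\n"
    "TP53\tHomo sapiens\t17p13\n"
    "lacZ\tEscherichia coli\tlac operon\n"
)

def build_index(text):
    """Map the first field of each line (the key) to that line's number."""
    index = {}
    for line_number, line in enumerate(text.splitlines()):
        key = line.split("\t")[0]
        index[key] = line_number
    return index

def lookup(text, index, key):
    """Use the index to jump straight to a record instead of scanning the list."""
    lines = text.splitlines()
    return lines[index[key]].split("\t")

index = build_index(FLAT_FILE)
record = lookup(FLAT_FILE, index, "TP53")
print(record)
```

Without the index, finding an item means reading the list from the top every time; with it, the position of each item is known in advance.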

Hierarchical model: This kind of database organized the data into collections of items that were related to each other in some way or form. The usual relationship was parent-child: every item except the top-level one had exactly one parent, and an item could have any number of children. An example is the Information Management System (IMS) from IBM.
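A parent-child structure of this kind can be sketched as a small tree. The `Node` class and the taxonomy-style names below are hypothetical and chosen only for illustration; this is not how IMS itself is programmed.

```python
class Node:
    """One item in a hierarchy: a single parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Walk up through the parents to build the path from the root."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/".join(reversed(names))

# Illustrative data only.
root = Node("Eukaryota")
animals = Node("Metazoa", root)
human = Node("Homo sapiens", animals)
mouse = Node("Mus musculus", animals)

print(human.path())
```

Because each item has only one parent, there is exactly one path from the root to any item, which is both the strength and the limitation of the model.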

Network model: In this model the items are linked to one another through explicit relationships, and an item may be linked to more than one other item. Linking all the items together is useful for finding relationships, but the programming quickly becomes complex, and code that follows the links must guard against falling into infinite loops.
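The infinite-loop risk, and the usual guard against it, can be shown with a small sketch. The `links` dictionary and the item names are invented for the example; the technique is an ordinary breadth-first traversal that remembers which items it has already visited.

```python
from collections import deque

# Hypothetical network of linked items; note the cycle back to "geneA".
links = {
    "geneA": ["proteinA"],
    "proteinA": ["pathwayX"],
    "pathwayX": ["geneA", "geneB"],
    "geneB": [],
}

def reachable(start, links):
    """Follow links breadth-first, visiting every item at most once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        item = queue.popleft()
        order.append(item)
        for neighbor in links.get(item, []):
            if neighbor not in seen:  # this check is what prevents an infinite loop
                seen.add(neighbor)
                queue.append(neighbor)
    return order

result = reachable("geneA", links)
print(result)
```

Without the `seen` set, the traversal would go around the geneA, proteinA, pathwayX cycle forever.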

The Relational Model: This is the most popular database model in use today. The central premise is that data items are connected through relationships, as in the previous models, but the relationships are more direct and flexible. A key value links the various data sets to each other, which allows relationship paths to be created through the data. The distinctive feature is that queries to the database work on sets of data rather than on one data point at a time.
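A minimal sketch with Python's built-in `sqlite3` module shows both ideas: a key value linking two data sets, and a query that returns a whole set of rows at once. The table names, column names, and sample rows are assumptions made for the example.

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE gene (symbol TEXT, organism_id INTEGER REFERENCES organism(id))"
)

# organism.id is the key value that links the two tables.
conn.executemany(
    "INSERT INTO organism VALUES (?, ?)",
    [(1, "Homo sapiens"), (2, "Mus musculus")],
)
conn.executemany(
    "INSERT INTO gene VALUES (?, ?)",
    [("TP53", 1), ("BRCA1", 1), ("Trp53", 2)],
)

# One query describes the set of rows wanted; there is no item-by-item loop.
rows = conn.execute(
    "SELECT gene.symbol FROM gene "
    "JOIN organism ON gene.organism_id = organism.id "
    "WHERE organism.name = ? ORDER BY gene.symbol",
    ("Homo sapiens",),
).fetchall()
conn.close()

print(rows)
```

The query states what is wanted (all genes belonging to one organism) and leaves it to the database to decide how to walk the data.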

Object oriented Model: This model is based on the concept of an object. An object is a member of a class and has data items, called variables, together with methods. Methods are the operations that are performed on those variables. The key point to note here is that variables may only be accessed through methods, a characteristic known as encapsulation. Because the model matches the way object oriented applications are written, it makes such applications very easy to develop, and it has become very popular.
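Encapsulation can be sketched with a small Python class. The `Sequence` class and its methods are hypothetical. Note also that Python only discourages direct access to double-underscore variables (by renaming them internally) rather than strictly forbidding it, so this shows the idea rather than a hard guarantee.

```python
class Sequence:
    """A DNA sequence whose data is reached only through its methods."""

    def __init__(self, bases):
        # Double underscore: the variable is hidden from outside code by name mangling.
        self.__bases = bases.upper()

    def length(self):
        return len(self.__bases)

    def gc_content(self):
        """Fraction of bases that are G or C (0.0 for an empty sequence)."""
        if not self.__bases:
            return 0.0
        gc = sum(1 for base in self.__bases if base in "GC")
        return gc / len(self.__bases)

seq = Sequence("atgc")
print(seq.length(), seq.gc_content())
```

Code that uses `seq` never touches the stored bases directly, so the internal representation could change without breaking that code.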

Hadoop and MapReduce for “Big Data”: These are relatively simple frameworks for distributed computation and data analysis. They split the data and the application work into small chunks that are distributed across machines and processed in multiple stages. The technology is still evolving and more information is available below.
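The map and reduce stages can be illustrated on a single machine. The sketch below does not use Hadoop or any of its APIs; it only mimics the pattern in plain Python, with made-up sequence chunks standing in for data that a real cluster would spread across many nodes.

```python
from collections import Counter
from functools import reduce

# Hypothetical data, already split into small chunks.
chunks = ["ATGC", "GGCA", "TTAA"]

def map_chunk(chunk):
    """Map stage: each chunk is processed independently of the others."""
    return Counter(chunk)

def reduce_counts(left, right):
    """Reduce stage: combine two partial results into one."""
    return left + right

partial_counts = map(map_chunk, chunks)
total = reduce(reduce_counts, partial_counts, Counter())
print(total)
```

Because each chunk is mapped independently, the map stage is the part that a framework can hand out to many machines at once; only the much smaller partial results need to be brought together in the reduce stage.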

