To estimate the risk of a disease, the earliest method scientists used was epidemiological data analysis. For example, in the 1950s epidemiologists found that smokers had an increased incidence of lung cancer. (It took society nearly three decades before this was generally accepted and warnings started being issued, but that is another story.) Today, the standard approach is to use biomarkers for the disease. If you go to the doctor to have your heart risk assessed, for example, they might measure C-reactive protein and brain natriuretic peptide. By comparing these values against the statistical correlations in population data, a risk score can be computed.
These direct measures are easy to compute if you have the data available. Where it gets interesting is combining the parameters in new ways to compute heart attack risk. For example, you could combine the two values above with cholesterol numbers and call the result “Serum risk”. The combination could be an equation or a direct correlation, but either way you now have a single number from which to compute risk.
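To make the idea concrete, here is a minimal sketch of such a composite score. The function name, the weights, and the offset are all invented for illustration; a real score would have its coefficients fit to outcome data (for example, by logistic regression against observed heart attack events), not chosen by hand.

```python
import math

def serum_risk(crp_mg_per_l, bnp_pg_per_ml, ldl_mg_per_dl):
    """Hypothetical composite risk score combining three biomarkers.

    The weights below are made up for illustration, not clinical values.
    """
    # Weighted linear combination of the raw biomarker values.
    z = 0.4 * crp_mg_per_l + 0.01 * bnp_pg_per_ml + 0.02 * ldl_mg_per_dl - 6.0
    # Logistic squashing maps the combination to a 0-1 risk score.
    return 1.0 / (1.0 + math.exp(-z))

# Higher biomarker values push the single combined number upward.
low = serum_risk(crp_mg_per_l=1.0, bnp_pg_per_ml=50.0, ldl_mg_per_dl=100.0)
high = serum_risk(crp_mg_per_l=10.0, bnp_pg_per_ml=400.0, ldl_mg_per_dl=180.0)
print(low, high)
```

The point is simply that several measurements collapse into one number, which is what makes a combined score convenient to correlate with outcomes.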
A group of authors, Zeeshan Syed, Collin M. Stultz, Benjamin M. Scirica, and John V. Guttag, have done just that in Sci Transl Med, 28 September 2011, Vol. 3, Issue 102. They looked at electrocardiogram data from many patients and, rather than using the traditional method of correlating risk with a single measured factor (depressed left ventricular ejection fraction), they combined factors to create three new “biomarkers”: morphologic variability, symbolic mismatch, and heart rate motifs. These were computationally derived; there was no prior knowledge or theory of how such combinations of factors would be useful.
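To give a flavor of what a computationally derived ECG biomarker might look like, here is a loose sketch of the idea behind morphologic variability: quantify how much the shape of the heartbeat changes from one beat to the next. The paper's actual method is more sophisticated (it aligns beats before comparing them); the plain sum of squared differences between equal-length beats below is a stand-in simplification, and all names here are my own.

```python
import numpy as np

def morphologic_variability(beats):
    """Average beat-to-beat change in waveform shape.

    beats: 2-D array, one row per heartbeat, all rows the same length.
    Simplified stand-in for the paper's measure, for illustration only.
    """
    diffs = np.diff(beats, axis=0)         # waveform change between consecutive beats
    energies = np.sum(diffs ** 2, axis=1)  # one squared-difference energy per beat pair
    return float(np.mean(energies))        # average shape change across the recording

# Synthetic example: identical beats give zero variability,
# perturbed beats give a positive score.
rng = np.random.default_rng(0)
template = np.sin(np.linspace(0, 2 * np.pi, 100))  # a toy "heartbeat" shape
steady = np.tile(template, (20, 1))                # 20 identical beats
noisy = steady + 0.05 * rng.standard_normal(steady.shape)
print(morphologic_variability(steady))  # 0.0
print(morphologic_variability(noisy))
```

The appeal of such a measure is exactly what the authors exploit: it needs no prior physiological theory, only a computation over the raw signal whose output can then be correlated with outcomes.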
This takes our predictive ability to a whole new level and may catalyze more biological research into which factors are involved in risk. Also, since these biomarkers are computationally derived, there is very little inherent bias. But one wonders: can this now be used as an argument that computers are more intelligent than humans?