
Functional prediction of microbial sequences


Even for well-studied organisms such as E. coli and M. tuberculosis, we only know the function of about ½ of their genes.

Can an ML model be used to define function, whether as natural language, as a molecular interaction, or as a chemical reaction?

Function as molecular interaction: protein-protein interaction (PPI).

Genomics: learn associations between genes, much as a language model learns associations between words. The model is called gLM2, a multi-modal genomic language model (gLM) with single-residue resolution.

gLM2 learns cross-protein co-evolutionary signal.

Microbial genes evolve and mutate fast, so genes that work together are generally located next to each other in the genome. gLM2 learns the signal between two such genes: the evolutionary covariance, or co-evolutionary signal. This co-evolutionary signal also maps to PDB space (the two are similar to each other).
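To make "co-evolutionary signal" concrete, here is a minimal sketch of a classical covariation statistic: mutual information between two alignment columns, one from each gene. This is not how gLM2 extracts its signal (the notes do not say). It only illustrates what it means for two residue positions to covary. The alignment columns are made-up toy data.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two equally long alignment columns."""
    n = len(col_x)
    px = Counter(col_x)
    py = Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


# Toy columns across 8 genomes (hypothetical data, for illustration only).
col_gene1 = list("AAVVAAVV")  # a residue position in gene 1
col_gene2 = list("DDEEDDEE")  # a position in gene 2 that changes whenever gene 1 does
col_other = list("KRKRKRKR")  # a position that varies independently of gene 1

mi_coupled = mutual_information(col_gene1, col_gene2)
mi_independent = mutual_information(col_gene1, col_other)
```

The coupled pair carries high mutual information and the independent pair carries none, which is the kind of cross-protein covariance a model could pick up on.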

FlashPPI leverages gLM2, co-evolutionary statistics, and joint optimization at the protein and residue levels. FlashPPI learns a latent space of PPIs, so proteins that lie close together in that space are the ones predicted to interact.
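The latent-space idea can be sketched as follows. This is an illustration under assumptions, not FlashPPI's actual scoring function: the embeddings are made-up three-dimensional vectors, and cosine similarity is assumed as the closeness measure. The names `cosine_similarity`, `interaction_score`, and the protein labels are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical latent embeddings; a real model would produce these per protein.
embeddings = {
    "protA": [0.9, 0.1, 0.0],
    "protB": [0.8, 0.2, 0.1],
    "protC": [0.0, 0.1, 0.9],
}


def interaction_score(name_a, name_b):
    """Assumed score: closer in latent space means more likely to interact."""
    return cosine_similarity(embeddings[name_a], embeddings[name_b])


score_ab = interaction_score("protA", "protB")
score_ac = interaction_score("protA", "protC")
```

Here protA and protB sit close together in the toy space and score higher than protA and protC, which point in nearly orthogonal directions.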

It is exceptionally fast. Previous models give a score for each pair of proteins: how likely is the pair to interact? FlashPPI instead limits the search to proteome space, which makes it extremely fast.
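One plausible reason a latent-space approach is fast: each protein is embedded once, and every pair within a proteome is then scored with a cheap dot product instead of a separate model call per pair. The sketch below assumes this design; it is not taken from FlashPPI itself. `rank_pairs` and the toy embeddings are hypothetical.

```python
import heapq
import math
from itertools import combinations


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def rank_pairs(embeddings, k):
    """Return the k highest-scoring (score, protein_a, protein_b) pairs in one proteome."""
    # Embed/normalize once per protein, then reuse for every pair.
    unit = {name: normalize(vec) for name, vec in embeddings.items()}
    scored = (
        (sum(a * b for a, b in zip(unit[p], unit[q])), p, q)
        for p, q in combinations(sorted(unit), 2)
    )
    return heapq.nlargest(k, scored)


# Hypothetical embeddings for a four-protein "proteome".
toy_proteome = {
    "geneA": [1.0, 0.0],
    "geneB": [0.9, 0.1],
    "geneC": [0.0, 1.0],
    "geneD": [0.3, 0.9],
}

top_pairs = rank_pairs(toy_proteome, k=2)
top_names = [(p, q) for _, p, q in top_pairs]
```

With real embeddings the pairwise step would typically be a single matrix multiplication, but the structure is the same: a per-protein cost plus very cheap per-pair scoring.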

FlashPPI has limitations. It predicts microbial interactions only. It cannot filter out false-positive paralog cross-talk. Distant homologs tend to be misclassified as interacting (mitigated by penalizing homolog interactions). Some proteins behave like hubs (>20 scoring interactions).
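Two of these limitations lend themselves to simple post-processing, sketched below under stated assumptions. The hub cutoff of 20 comes from the notes. The linear homolog penalty, its `weight`, the `min_score` threshold, and the function names are hypothetical and are not FlashPPI's actual mitigation.

```python
from collections import Counter

HUB_CUTOFF = 20  # from the notes: hubs have >20 scoring interactions


def penalize_homologs(pairs, identity, weight=0.5):
    """Lower the score of pairs that are homologous to each other.

    pairs: list of (score, protein_a, protein_b).
    identity: dict mapping frozenset({a, b}) to sequence identity in [0, 1].
    The linear penalty is an assumption made for illustration.
    """
    adjusted = []
    for score, a, b in pairs:
        ident = identity.get(frozenset((a, b)), 0.0)
        adjusted.append((score - weight * ident, a, b))
    return adjusted


def find_hubs(pairs, min_score, hub_cutoff=HUB_CUTOFF):
    """Return proteins with more than hub_cutoff partners scoring >= min_score."""
    degree = Counter()
    for score, a, b in pairs:
        if score >= min_score:
            degree[a] += 1
            degree[b] += 1
    return {protein for protein, count in degree.items() if count > hub_cutoff}


# Toy data: protein "H" scores highly against 21 partners.
hub_pairs = [(0.9, "H", f"p{i}") for i in range(21)]
hubs = find_hubs(hub_pairs, min_score=0.5)

# Toy data: "x" and "y" share 80% sequence identity, so their score is reduced.
penalized = penalize_homologs(
    [(0.9, "x", "y")], {frozenset(("x", "y")): 0.8}
)
```

Flagged hubs and penalized homolog pairs can then be reviewed rather than accepted as interactions at face value.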

This requires an inference pipeline, since more knowledge of functional annotations may be needed.

SeqHub has the FlashPPI network (https://seqhub.org/).

Joint optimization at protein and residue levels is important

https://www.tatta.bio/blog/glm2
