
Functional prediction of microbial sequences


Even for well-studied organisms such as E. coli and M. tuberculosis, we know the function of only about half of their genes.

Can an ML model be used to define function? Function can be expressed as natural language, as a molecular interaction, or as a chemical reaction.

Function as molecular interaction: protein-protein interaction (PPI).

Genomics: learn associations between genes, much as a language model learns associations between words. The model is called gLM2, a multi-modal genomic language model (gLM) with single-residue resolution.

gLM2 learns cross-protein co-evolutionary signal.

Because microbial genomes evolve and mutate fast, related genes are generally located next to each other. gLM2 learns the signal between two such genes: their evolutionary covariance, or co-evolutionary signal. This co-evolutionary signal also maps to PDB space (the two are similar to each other).
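To make "evolutionary covariance" concrete, here is a minimal sketch of a classical co-variation statistic: mutual information between two columns of a paired alignment. This is not how gLM2 computes its signal (a language model learns it implicitly). The function name and the toy columns are made up for illustration.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two aligned columns.

    col_a and col_b hold one residue per genome, in the same genome order.
    High values mean the two positions tend to change together.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        mi += p_ab * log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


# Toy data (hypothetical): a residue in protein 1 and a residue in protein 2,
# sampled across eight genomes.
covarying_a = "AAAALLLL"
covarying_b = "DDDDKKKK"    # switches whenever covarying_a switches
independent_b = "DKDKDKDK"  # unrelated to covarying_a

mi_covarying = column_mutual_information(covarying_a, covarying_b)
mi_independent = column_mutual_information(covarying_a, independent_b)
print(mi_covarying, mi_independent)
```

The co-varying pair scores higher than the independent pair, which is the kind of cross-protein signal the notes describe.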

FlashPPI leverages gLM2, co-evolutionary statistics, and joint optimization at the protein and residue levels. It learns a latent space of PPIs, in which proteins that lie close together are more likely to interact.
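The latent-space idea can be sketched as a nearest-neighbor lookup: once every protein has an embedding, ranking candidate partners becomes a similarity computation. The sketch below is an assumption-heavy illustration, not FlashPPI's actual code. The embeddings are hand-written toy vectors, cosine similarity is an assumed choice of metric, and `rank_partners` is a hypothetical helper.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_partners(query_id, embeddings, top_k=3):
    """Return the top_k proteins closest to query_id in the latent space."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical); a real model would produce these vectors.
toy_embeddings = {
    "protA": [0.9, 0.1, 0.0],
    "protB": [0.8, 0.2, 0.1],
    "protC": [0.0, 0.1, 0.9],
    "protD": [0.1, 0.9, 0.0],
}

top_hits = rank_partners("protA", toy_embeddings, top_k=2)
print(top_hits)
```

Because each protein is embedded once and then compared by a cheap vector operation, this style of search scales to a whole proteome far more easily than running a model on every pair.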

It is exceptionally fast. Previous models give a score for how likely a pair of proteins is to interact. FlashPPI instead limits the search to proteome space, which makes it extremely fast.

FlashPPI has limitations. It predicts microbial interactions only. It cannot distinguish true interactions from false-positive paralog cross-talk. Distant homologs tend to be misclassified as interacting, which can be mitigated by penalizing homolog interactions. Some proteins behave like hubs, with more than 20 scoring interactions.
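The hub caveat is easy to check on any list of predicted pairs. Below is a minimal sketch that counts, for each protein, how many distinct partners score above a cutoff and flags those with more than 20. The limit of 20 comes from the notes above. The score cutoff of 0.5, the function name, and the toy data are assumptions for illustration only.

```python
from collections import defaultdict


def find_hub_proteins(scored_pairs, score_threshold=0.5, max_partners=20):
    """Return {protein: partner_count} for proteins with too many partners.

    scored_pairs is an iterable of (protein_a, protein_b, score) tuples.
    score_threshold is an assumed cutoff, not a documented FlashPPI value.
    """
    partners = defaultdict(set)
    for protein_a, protein_b, score in scored_pairs:
        if score >= score_threshold and protein_a != protein_b:
            partners[protein_a].add(protein_b)
            partners[protein_b].add(protein_a)
    return {
        protein: len(found)
        for protein, found in partners.items()
        if len(found) > max_partners
    }


# Toy predictions (hypothetical): one protein with 25 confident partners.
toy_pairs = [("hub", f"partner{i}", 0.9) for i in range(25)]
toy_pairs.append(("protX", "protY", 0.8))
toy_pairs.append(("protX", "protZ", 0.2))  # below the cutoff, ignored

hubs = find_hub_proteins(toy_pairs)
print(hubs)
```

Flagged proteins are candidates for manual review rather than automatic removal, since some real proteins do interact with many partners.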

This requires an inference pipeline, since more knowledge of functional annotations may be needed.

SeqHub hosts the FlashPPI network (https://seqhub.org/).

Joint optimization at the protein and residue levels is important.

https://www.tatta.bio/blog/glm2
