Playing ΓÇ£Hide and SeekΓÇ¥ - The Hidden Genome

32
Опубликовано 12 августа 2016, 2:35
The overwhelming increase in sequencing methodology resulted in the accumulation of millions of DNA sequences. These sequences are collected from thousands of genomes that (ideally) sample the ΓÇÿtree of lifeΓÇÖ. I will briefly discuss the ΓÇÿminimal set of instructionsΓÇÖ by which a linear sequence is transformed into a functional protein. What happen when the statistical noise is too high, thus classical procedures to predict protein sequences fail? I will focus on the challenge of identifying short proteins that remain buried in the genomic data. For illustration, I will take you for a ΓÇÿtreasure huntΓÇÖ for short proteins. Many short proteins share fuzzy features that are common to most animal venom. I will discuss the limitation in using classical tools that are based on string comparison, or pattern finding to identify short proteins. For this task, statistical machine learning methods were useful in identifying hidden bioactive sequences in several genomes. Evidently, such sequences are attractive candidates for novel therapy. The test case of short proteins illustrates the importance of a cycle that starts by a biological hypothesis, then uses a computational formulation and finalizes by an experimental validation. Finally, I will discuss our genomes with respect to our ΓÇÿpartnersΓÇÖ (viruses, bacteria). Once the interaction of these genomes is considered, the source for the dynamic nature of human evolution becomes evident. Related publications: ΓÇó Rappoport N, Karsenty S, Stern A, Linial N, Linial M. (2012) Nucl. Acids Res. 40:D313-D320. ΓÇó Rappoport N, Linial M. (2012) PLoS Comput Biol. 8:e1002364. ΓÇó Naamati G, Askenazi M, Linial M. (2010) Bioinformatics 26:i482-i488. ΓÇó Naamati G, Askenazi M, Linial M (2009) Nucl. Acids Res. 37:W363-368. ΓÇó Kaplan N, Morpurgo N, Linial M. (2007) J Mol Biol. 369:553-566.
автотехномузыкадетское