eScience Workshop 2005 - Integration and Visualization in Bioinformatics

Published September 7, 2016, 16:11
One of the greatest benefits of eScience (the use of distributed computing and data resources for scientific discovery) is the opportunity for scientists to work with data sets that would otherwise have been too large to handle and, consequently, to ask questions that would not have been possible. eScience faces many obvious challenges because of its distributed nature, but it also faces others that, while not unique to eScience, remain sufficiently domain-sensitive that solutions do not seem easily shareable. One particularly difficult problem is integration: how to coherently bring together disparate, massive data sets. Focus has generally been placed on the physical layer (borrowing from the three layers of data modeling), where details of implementation predominate. This problem will likely continue, though there is some hope in leveraging "smart" architectures such as smart clients. Logical integration, that is, how to meaningfully bring together massive, disparate data sets from the scientists' perspective, is even more challenging. Another challenge of eScience is creating meaningful, interactive visualizations of massive data sets. A direct benefit of this kind of visualization is that it lets the scientist explore freely in a setting that is more familiar and intuitive. In this presentation we will discuss three ongoing projects that address the challenges of integration and visualization: CATPA (Curation and Alignment Tool for Protein Analysis), INGeNE (Integrated Gene Network Explorer), and SNPEx (SNP Explorer). CATPA is a smart client application that allows for the curation of protein families at the residue level, including deletions. Interaction is done visually. INGeNE is an application that allows for functional genomic discovery by building networks of relationships in which an edge is determined by a combination of microarray data, protein-protein data, gene-gene interaction data, and phenotypic expression data.
SNPEx is an application that includes a novel algorithm to find the most informative set of tagging SNPs. Additionally, we implemented SNPEx in both Java/MySQL and C#/SQL Server 2000 to compare the performance of the two systems, and found the latter to be superior in our suite of tests.
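The abstract does not say how INGeNE combines its four evidence types into an edge, so the following is only a minimal sketch of one possible rule, not the authors' method: two genes are linked when at least a given number of distinct evidence sources report the pair. The function name `build_network`, the `min_sources` threshold, the source labels, and the gene identifiers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical labels for the four evidence types named in the abstract.
EVIDENCE_TYPES = ("microarray", "protein_protein", "gene_gene", "phenotype")


def build_network(evidence, min_sources=2):
    """Build an undirected gene network from pairwise evidence.

    evidence: dict mapping an evidence type to an iterable of
              (gene_a, gene_b) pairs reported by that source.
    min_sources: assumed combination rule; an edge is kept only if
                 at least this many distinct sources report the pair.
    Returns a dict mapping each gene to the set of its neighbours.
    """
    # Collect, for every unordered gene pair, the sources that support it.
    support = defaultdict(set)
    for source, pairs in evidence.items():
        for gene_a, gene_b in pairs:
            if gene_a == gene_b:
                continue  # ignore self-relationships
            support[frozenset((gene_a, gene_b))].add(source)

    # Keep only the pairs with enough independent support.
    network = defaultdict(set)
    for pair, sources in support.items():
        if len(sources) >= min_sources:
            gene_a, gene_b = tuple(pair)
            network[gene_a].add(gene_b)
            network[gene_b].add(gene_a)
    return dict(network)


# Made-up example data, for illustration only.
example_evidence = {
    "microarray": [("G1", "G2"), ("G2", "G3")],
    "protein_protein": [("G2", "G1")],
    "gene_gene": [],
    "phenotype": [("G3", "G4")],
}
strict_network = build_network(example_evidence)
loose_network = build_network(example_evidence, min_sources=1)
```

Counting sources is the simplest rule that fits the description; a weighted score per evidence type would be an equally plausible reading.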