Local Views and Global Conclusions

16
Опубликовано 22 июня 2016, 19:42
Graph-structured data has become a universal phenomenon in the sciences, and novel mathematics is required to tackle the problems stemming from analyzing this data. The following challenge in bioinformatics exemplifies this general class of problems: the protein Interaction graph G of an organism has one vertex for each of its proteins and an edge for each pair of interacting proteins; several competing theories attempt to describe how such graphs emerge in evolution and we wish to tell which theory provides a better explanation. A major difficulty in resolving such problems is that G is huge so it is unrealistic to calculate most of its nontrivial graph parameters. But even a huge graph G can be efficiently sampled. Given a small integer k (say k=10), the k-profile of G is a distribution on k-vertex graphs. It is derived by randomly sampling k vertices in G and observing the subgraph that they induce. A theory largely developed in MSR ("Theory of graph limits" - Lovasz, Szegedy, Chayes, Borgs, Cohn, Friedman...) offers a clue. It says essentially that to decide whether a series of large graphs is derived from a given statistical model it is enough to check that the graphs' profiles behave as they should. I will give you some sense of the theory of graph limits and then move to discuss profiles. The two main questions are: (i) Which profiles are possible? (ii) What global properties of G can you derive, based on its profiles?
автотехномузыкадетское