Abstract:

The vertex-centric programming model is an established computational paradigm recently incorporated into distributed processing frameworks to address challenges in large-scale graph processing. Billion-node graphs that exceed the memory capacity of a single machine are poorly supported by popular Big Data tools such as MapReduce, which perform notoriously badly on iterative graph analytics algorithms such as PageRank, graph diameter, and shortest paths. In the vertex-centric model, a user-defined program is executed iteratively over the vertices of a graph. The vertex program is written from the perspective of a single vertex: it receives as input the vertex's local data as well as data from adjacent vertices arriving along incident edges. The program may be executed across the vertices of the graph synchronously or asynchronously, and execution halts either after a specified number of iterations or once all vertices have converged. The recent introduction of vertex-programming methods and frameworks has spurred the adaptation of many popular Machine Learning and Data Mining algorithms into graph representations for high-performance vertex-centric processing of large-scale data sets. The goal of this presentation is to demonstrate how to think like a vertex in order to analyze very large networks. Several seminal examples will be presented to introduce and explore this paradigm, followed by a suite of new algorithms that the Data Science Group at the University of Notre Dame has developed to perform graph analytics, compute models, and efficiently execute algorithms on massive networks.
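To make the model concrete, the synchronous "think like a vertex" execution described above can be sketched with PageRank as the vertex program. This is an illustrative single-machine sketch, not code from the talk: the graph, damping factor, and convergence threshold are assumptions, and real frameworks (e.g. Pregel-style systems) distribute the message passing across workers.

```python
def pagerank_vertex_centric(out_edges, damping=0.85, tol=1e-6, max_supersteps=100):
    """Run PageRank as a synchronous vertex program.

    out_edges: dict mapping each vertex to a list of its out-neighbors.
    Returns a dict mapping each vertex to its PageRank score.
    """
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(max_supersteps):
        # Superstep, phase 1: each vertex "sends" rank / out-degree
        # along its incident out-edges.
        inbox = {v: 0.0 for v in vertices}
        for v in vertices:
            if out_edges[v]:
                share = rank[v] / len(out_edges[v])
                for u in out_edges[v]:
                    inbox[u] += share

        # Phase 2: each vertex program updates itself using only its
        # local data and the messages it received.
        new_rank = {v: (1 - damping) / n + damping * inbox[v] for v in vertices}

        # Halt once all vertices have converged (or the superstep
        # budget is exhausted).
        converged = max(abs(new_rank[v] - rank[v]) for v in vertices) < tol
        rank = new_rank
        if converged:
            break

    return rank


if __name__ == "__main__":
    # Tiny illustrative graph: a directed triangle plus one extra vertex.
    graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
    ranks = pagerank_vertex_centric(graph)
    print({v: round(r, 3) for v, r in ranks.items()})
```

The key point of the paradigm is that the update in phase 2 never inspects global state: each vertex sees only its own value and its inbox, which is what lets frameworks shard the vertices across machines.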

Bio: Tim Weninger is an Assistant Professor at the University of Notre Dame, where he directs the Data Science Group and is a member of the Interdisciplinary Center for Network Science and Applications (ICENSA). His research interests are at the intersection of large-scale information network analysis, social media, information retrieval, machine learning, and data mining. He is specifically interested in how humans collectively create and navigate information networks, and he uses properties of these emergent information networks to reason about the nature of relatedness, membership, and other abstract and physical phenomena.