My research interests span theoretical machine learning, statistical inference, optimization, and game theory. In my spare time, I like to read, run, and watch Cricket 🏏.
Due to resource constraints and restricted access to large-scale graph datasets, it is often necessary to work with a sampled subgraph of a larger original graph. The task of inferring a global property of the original graph from the sampled subgraph is of fundamental interest. In this work, we focus on estimating the number of connected components. Due to the inherent difficulty of the problem for general graphs, little is known about the connection between the number of connected components in the observed subgraph and the original graph. To make this connection, we propose a highly redundant motif-based representation of the observed subgraph, which, at first glance, may seem counter-intuitive. However, the proposed representation is crucial in introducing a novel estimator for the number of connected components in general graphs. The connection is made precise via the Schatten \(k\)-norms of the graph Laplacian and the spectral representation of the number of connected components. We provide a guarantee on the resulting mean squared error that characterizes the bias-variance trade-off. Furthermore, our experiments on synthetic and real-world graphs show that we improve upon competing algorithms for graphs with spectral gaps bounded away from zero.
Proceedings of the World Wide Web Conference (WWW), 2019
This paper proposes an attributed network growth model. Despite the knowledge that individuals use limited resources to form connections to similar others, we lack an understanding of how local and resource-constrained mechanisms explain the emergence of rich structural properties found in real-world networks. We make three contributions. First, we propose a parsimonious and accurate model of attributed network growth that jointly explains the emergence of in-degree distributions, local clustering, clustering-degree relationship and attribute mixing patterns. Second, our model is based on biased random walks and uses local processes to form edges without recourse to global network information. Third, we account for multiple sociological phenomena: bounded rationality, structural constraints, triadic closure, attribute homophily, and preferential attachment. Our experiments indicate that the proposed Attributed Random Walk (ARW) model accurately preserves network structure and attribute mixing patterns of six real-world networks; it improves upon the performance of eight state-of-the-art models by a statistically significant margin of 2.5-10x.