Do Input Gradients Highlight Discriminative Features?
Harshay Shah, Prateek Jain, Praneeth Netrapalli

Neural Information Processing Systems
(NeurIPS), 2021

ICLR workshop on Science and Engineering of Deep Learning
(ICLR SEDL), 2021

ICLR workshop on Responsible AI
(ICLR RAI), 2021

H. Shah, P. Jain, P. Netrapalli

NeurIPS, 2021

ICLR SEDL, 2021

ICLR RAI, 2021

Post-hoc gradient-based interpretability methods [Simonyan et al., 2013, Smilkov et al., 2017] that provide instance-specific explanations of model predictions are often based on assumption (A): magnitude of input gradients -- gradients of logits with respect to input -- noisily highlight discriminative task-relevant features. In this work, we test the validity of assumption (A) using a three-pronged approach. First, we develop an evaluation framework, DiffROAR, to test assumption (A) on four image classification benchmarks. Our results suggest that (i) input gradients of standard models (i.e., trained on original data) may grossly violate (A), whereas (ii) input gradients of adversarially robust models satisfy (A). Second, we introduce BlockMNIST, an MNIST-based semi-real dataset, that by design encodes a priori knowledge of discriminative features. Our analysis on BlockMNIST leverages this information to validate as well as characterize differences between input gradient attributions of standard and robust models. Finally, we theoretically prove that our empirical findings hold on a simplified version of the BlockMNIST dataset. Specifically, we prove that input gradients of standard one-hidden-layer MLPs trained on this dataset do not highlight instance-specific signal coordinates, thus grossly violating assumption (A). Our findings motivate the need to formalize and test common assumptions in interpretability in a falsifiable manner [Leavitt and Morcos, 2020]. We believe that the DiffROAR evaluation framework and BlockMNIST-based datasets can serve as sanity checks to audit instance-specific interpretability methods; code and data available at github.com/harshays/inputgradients.

@article{shah2021input,
title={Do Input Gradients Highlight Discriminative Features?},
author={Shah, Harshay and Jain, Prateek and Netrapalli, Praneeth},
journal={Advances in Neural Information Processing Systems},
volume={34},
year={2021}
}

The Pitfalls of Simplicity Bias in Neural Networks
Harshay Shah, Kaustav Tamuly, Aditi Raghunathan, Prateek Jain, Praneeth Netrapalli

Neural Information Processing Systems
(NeurIPS), 2020

ICML workshop on Uncertainty and Robustness in Deep Learning
(ICML UDL), 2020

H. Shah, K. Tamuly, A. Raghunathan, P. Jain, P. Netrapalli

NeurIPS, 2020

ICML UDL, 2020

Several works have proposed Simplicity Bias (SB)—the tendency of standard training procedures such as Stochastic Gradient Descent (SGD) to find simple models—to justify why neural networks generalize well [Arpit et al. 2017, Nakkiran et al. 2019, Soudry et al. 2018]. However, the precise notion of simplicity remains vague. Furthermore, previous settings that use SB to theoretically justify why neural networks generalize well do not simultaneously capture the non-robustness of neural networks—a widely observed phenomenon in practice [Goodfellow et al. 2014, Jo and Bengio 2017]. We attempt to reconcile SB and the superior standard generalization of neural networks with the non-robustness observed in practice by designing datasets that (a) incorporate a precise notion of simplicity, (b) comprise multiple predictive features with varying levels of simplicity, and (c) capture the non-robustness of neural networks trained on real data. Through theory and empirics on these datasets, we make four observations: (i) SB of SGD and variants can be extreme: neural networks can exclusively rely on the simplest feature and remain invariant to all predictive complex features. (ii) The extreme aspect of SB could explain why seemingly benign distribution shifts and small adversarial perturbations significantly degrade model performance. (iii) Contrary to conventional wisdom, SB can also hurt generalization on the same data distribution, as SB persists even when the simplest feature has less predictive power than the more complex features. (iv) Common approaches to improve generalization and robustness—ensembles and adversarial training—can fail in mitigating SB and its pitfalls. Given the role of SB in training neural networks, we hope that the proposed datasets and methods serve as an effective testbed to evaluate novel algorithmic approaches aimed at avoiding the pitfalls of SB; code and data available at github.com/harshays/simplicitybiaspitfalls.

@article{shah2020pitfalls,
title={The Pitfalls of Simplicity Bias in Neural Networks},
author={Shah, Harshay and Tamuly, Kaustav and Raghunathan, Aditi and Jain, Prateek and Netrapalli, Praneeth},
journal={Advances in Neural Information Processing Systems},
volume={33},
year={2020}
}

Growing Attributed Networks through Local Processes
Harshay Shah, Suhansanu Kumar, Hari Sundaram

World Wide Web Conference
(WWW), 2019

H. Shah, S. Kumar, H. Sundaram

WWW, 2019

This paper proposes an attributed network growth model. Despite the knowledge that individuals use limited resources to form connections to similar others, we lack an understanding of how local and resource-constrained mechanisms explain the emergence of rich structural properties found in real-world networks. We make three contributions. First, we propose a parsimonious and accurate model of attributed network growth that jointly explains the emergence of in-degree distributions, local clustering, clustering-degree relationship and attribute mixing patterns. Second, our model is based on biased random walks and uses local processes to form edges without recourse to global network information. Third, we account for multiple sociological phenomena: bounded rationality, structural constraints, triadic closure, attribute homophily, and preferential attachment. Our experiments indicate that the proposed Attributed Random Walk (ARW) model accurately preserves network structure and attribute mixing patterns of six real-world networks; it improves upon the performance of eight state-of-the-art models by a statistically significant margin of 2.5-10x.

@inproceedings{shah2019growing,
title={Growing Attributed Networks through Local Processes},
author={Shah, Harshay and Kumar, Suhansanu and Sundaram, Hari},
booktitle={The World Wide Web Conference},
pages={3208--3214},
year={2019},
organization={ACM}
}

Number of Connected Components in a Graph: Estimation via Counting Patterns
Ashish Khetan, Harshay Shah, Sewoong Oh

Preprint: arXiv:1812.00139, 2018

A. Khetan, H. Shah, S. Oh

arXiv:1812.00139, 2018

Due to resource constraints and restricted access to large-scale graph datasets, it is often necessary to work with a sampled subgraph of a larger original graph. The task of inferring a global property of the original graph from the sampled subgraph is of fundamental interest. In this work, we focus on estimating the number of connected components. Due to the inherent difficulty of the problem for general graphs, little is known about the connection between the number of connected components in the observed subgraph and the original graph. To make this connection, we propose a highly redundant motif-based representation of the observed subgraph, which, at first glance, may seem counter-intuitive. However, the proposed representation is crucial in introducing a novel estimator for the number of connected components in general graphs. The connection is made precise via the Schatten \(k\)-norms of the graph Laplacian and the spectral representation of the number of connected components. We provide a guarantee on the resulting mean squared error that characterizes the bias-variance trade-off. Furthermore, our experiments on synthetic and real-world graphs show that we improve upon competing algorithms for graphs with spectral gaps bounded away from zero.

@article{khetan2018number,
title={Number of Connected Components in a Graph: Estimation via Counting Patterns},
author={Khetan, Ashish and Shah, Harshay and Oh, Sewoong},
journal={arXiv preprint arXiv:1812.00139},
year={2018}
}