Adversarial Robustness May Be at Odds With Simplicity

Preetum Nakkiran, Harvard University (preetum@cs.harvard.edu)
January 2019. arXiv:1901.00532. A merged version appears in COLT 2019.

Abstract. Current techniques in machine learning are so far unable to learn classifiers that are robust to adversarial perturbations. However, they are able to learn non-robust classifiers with very high accuracy, even in the presence of random perturbations. Towards explaining this gap, we highlight the hypothesis that robust classification may require more complex classifiers (i.e. more capacity) than standard classification. In this note, we show that this hypothesis is indeed possible, by giving several theoretical examples of classification tasks and sets of "simple" classifiers for which: (1) there exists a simple classifier with high standard accuracy, and also high accuracy under random ℓ∞ noise; (2) any simple classifier is not robust: it must have high adversarial loss under ℓ∞ perturbations; and (3) robust classification is possible, but only with more complex classifiers (exponentially more complex, in some examples). Moreover, there is a quantitative trade-off between robustness and standard accuracy among simple classifiers. This suggests an alternate explanation of the trade-off observed in practice: it may occur not because the classification task inherently requires such a trade-off (as in [Tsipras-Santurkar-Engstrom-Turner-Madry 18]), but because the structure of our current classifiers imposes such a trade-off.
1. Introduction

Current machine-learning models achieve high accuracy on a variety of classification tasks (such as image recognition), but it is now well-known that standard models are susceptible to adversarial examples: small perturbations of the input which are imperceptible to humans, but cause misclassification [Szegedy et al. 2014; Madry et al. 2018]. Formally, we consider a distribution D over pairs (x, y), with inputs x ∈ ℝⁿ and labels y ∈ Y, and we wish to predict the class y from the input x. We measure a classifier f by its standard loss (the probability of misclassification), its noisy loss (misclassification under random ℓ∞ noise of magnitude ε), and its adversarial loss (misclassification under worst-case ℓ∞ perturbations of magnitude ε). We can then state two open questions in the area:

Q1. Why do current techniques fail to learn classifiers with low adversarial loss, while they suffice to learn classifiers with low standard loss and low noise-robust loss?

Q2. Why is there an apparent trade-off between robustness and accuracy for current classifiers?
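Concretely, following the note's conventions (binary loss is used throughout for simplicity, and perturbations are measured in ℓ∞), the three notions of loss can be written as:

```latex
\[
\mathrm{StdLoss}_{D}(f) \;=\; \Pr_{(x,y)\sim D}\big[f(x)\neq y\big]
\]
\[
\mathrm{NoisyLoss}_{D,\varepsilon}(f) \;=\; \Pr_{\substack{(x,y)\sim D \\ \delta\sim[-\varepsilon,\varepsilon]^n}}\big[f(x+\delta)\neq y\big]
\]
\[
\mathrm{AdvLoss}_{D,\varepsilon}(f) \;=\; \Pr_{(x,y)\sim D}\big[\exists\,\delta,\ \|\delta\|_\infty\le\varepsilon:\ f(x+\delta)\neq y\big]
\]
```

Note that StdLoss ≤ NoisyLoss would not hold in general, but StdLoss ≤ AdvLoss always does, since the adversary may choose δ = 0.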
One possible explanation for the above is that robust classifiers simply do not exist: that is, the distribution we wish to classify is inherently "hard", and does not admit robust classifiers. Prior works argue along these lines, suggesting the trade-off may be an inevitable feature of the classification task: Tsipras et al. [2018] give a classification task that inherently forces a robustness–accuracy trade-off, and Bubeck, Price, and Razenshteyn [2018] study tasks where robust learning is harder than standard learning; related work further gives theoretical examples where learning a robust classifier requires polynomially more samples than learning a standard classifier. In this note, we reject this explanation, since we work under the assumption that robust classifiers do exist for the tasks we care about (for example, humans appear to robustly classify images). Instead, we focus on the following hypothesis:

Hypothesis (C): Robust classification may require more complex classifiers (i.e. more capacity) than standard classification. In particular, current neural networks and training methods may be too "simple" to solve robust classification, even though they suffice for standard classification.

These hypotheses are not necessarily disjoint, and more fine-grained hypotheses are possible. In this note, we show that there exist classification tasks where Hypothesis (C) is provably true, and explains Questions 1 and 2.
In contrast to works arguing that robust classifiers do not exist, the hypothesis here is that robust classification is in fact possible, but not with "simple" classifiers: the complexity (e.g. capacity) of a robust classifier must be higher than that of a standard classifier. That is, for each construction below we show:

1. There exists a simple classifier with low standard loss, and low loss in the presence of random perturbations.
2. Every simple classifier is not adversarially robust: it has high adversarial loss with respect to ℓ∞ perturbations.
3. There exists a robust classifier, but it is more complex (in Construction 2, exponentially more complex).

If Hypothesis (C) holds, it explains both Questions 1 and 2: current techniques fail at robust classification because the required classifiers are more complex than those we train, and among simple classifiers a trade-off between robustness and accuracy is forced by the structure of the classifiers, not by the task itself.
2. Construction 1: Linear classifiers

Recall, we wish to predict the class y from the input x. Consider the following distribution, and take the "simple" class to be the linear classifiers F = {f_w(x) = sign(⟨w, x⟩) : w ∈ ℝⁿ}.

Construction 1. Sample y ∈ {+1, −1} uniformly, and sample each coordinate of x ∈ ℝⁿ independently as x_i ∈ {±1} with a slight bias toward y, such that E[y · x_i] = +0.01. Consider ℓ∞ adversarial perturbations of magnitude up to ε = 1/2.
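The behavior of this construction is easy to check empirically. The following is my own simulation sketch (not code from the note): it samples the distribution above, attacks the majority-vote linear classifier with the worst-case ℓ∞ perturbation of magnitude ε = 1/2, and shows that the rounding classifier f*(x) = sign(Σᵢ Round(xᵢ)) is unaffected.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_d1(m, n):
    """Sample m points from Construction 1: y uniform in {+1, -1},
    each x_i = y with prob 0.505 (so E[y * x_i] = 0.01), else -y."""
    y = rng.choice([-1.0, 1.0], size=m)
    agree = rng.random((m, n)) < 0.505
    x = np.where(agree, y[:, None], -y[:, None])
    return x, y

def f_majority(x):
    """The simple linear classifier f1(x) = sign(sum_i x_i)."""
    return np.sign(x.sum(axis=1))

def f_robust(x):
    """f*(x) = sign(sum_i Round(x_i)), rounding each coordinate
    to the nearest value in {+1, -1}."""
    return np.sign(np.where(x >= 0.0, 1.0, -1.0).sum(axis=1))

m, n, eps = 50, 100_000, 0.5
x, y = sample_d1(m, n)
# Worst-case ℓ∞ attack on the majority vote: shift every coordinate
# by eps against the true label, moving the sum by eps*n = n/2 >> 0.01*n.
x_adv = x - eps * y[:, None]

print("standard acc (linear):   ", (f_majority(x) == y).mean())
print("adversarial acc (linear):", (f_majority(x_adv) == y).mean())
print("adversarial acc (f*):    ", (f_robust(x_adv) == y).mean())
```

Since every perturbed coordinate x_i − εy stays on the same side of 0 as x_i, rounding exactly recovers x, so f* on the attacked input matches f1 on the clean input.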
The distribution D1 of Construction 1 satisfies the following (Theorem 2.1):

1. There exists a linear classifier f ∈ F with StdLoss_{D1}(f) = exp(−Ω(n)) and NoisyLoss_{D1,ε}(f) = exp(−Ω(n)): the majority vote f1(x) := sign(∑ᵢ xᵢ).
2. Every linear classifier f ∈ F has AdvLoss_{D1,ε}(f) ≥ Ω(1), where the Ω(·) hides constants depending only on ε. Moreover, there is a quantitative trade-off: linear classifiers with lower adversarial loss must have higher standard loss.
3. There exists a (non-linear) classifier f* : ℝⁿ → {±1} with AdvLoss_{D1,ε}(f*) = exp(−Ω(n)): namely f*(x) := sign(∑ᵢ Round(xᵢ)), where Round(·) rounds its input to the nearest value in {±1}.

That is, the simple (linear) class contains an excellent standard and noise-robust classifier, but no adversarially robust one, while a robust classifier exists outside the class.
Proof sketch. The upper bounds on StdLoss and NoisyLoss in property (1) follow from Chernoff–Hoeffding bounds, since E[y·xᵢ] = +0.01 and the {xᵢ} are conditionally independent given y; the bound on noisy loss holds because, for every i ∈ [n], adding symmetric random noise of magnitude ε < 1 does not remove the bias of xᵢ toward y. For property (2), consider for simplicity the subclass of threshold functions F′ = {f_w(x) = sign(⟨w, x⟩) : w ∈ {0,1}ⁿ}, and for w ∈ {0,1}ⁿ let k = |supp(w)|. Then ⟨w, x⟩ concentrates around 0.01k·y, while the adversary can shift it by εk = k/2 in the direction of −y, overwhelming the signal; the bound on the adversarial success of f_w then follows from Azuma–Hoeffding. For property (3), clearly AdvLoss_{D1,ε}(f*) = StdLoss_{D1}(f1), since the rounding inverts the effect of any perturbation (for ε < 1). ∎

Remark. The above example may be unsatisfying, since the more "complex" robust classifier simply pre-processes its input by rounding. One may wonder if it is always possible to construct a robust classifier by "pre-processing" a simple standard classifier, and whether robust classifiers are always just slightly more complex than standard ones. The following construction shows this is not the case: robust classification can require exponentially more computation.
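For concreteness, the Chernoff–Hoeffding step in property (1) can be written out; the explicit constant below is my own calculation from Hoeffding's inequality applied to the [−1, 1]-valued variables Zᵢ := y·xᵢ with mean 0.01:

```latex
\[
\Pr\big[f_1(x) \neq y\big]
  \;\le\; \Pr\Big[\textstyle\sum_{i=1}^n y\, x_i \le 0\Big]
  \;\le\; \exp\!\left(-\frac{(0.01\,n)^2}{2n}\right)
  \;=\; \exp\!\left(-\frac{n}{20000}\right)
  \;=\; \exp(-\Omega(n)).
\]
```

The same calculation, with the bias 0.01 reduced by the effect of the noise, gives the exp(−Ω(n)) bound on NoisyLoss.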
N→ { 0,1 } n uniformly at random learning a standard classifier Ben Edelman for comments an. Standard-Loss in practice f∗ with low adversarial loss is lower-bounded by: for γ: =δ22ln 1/0.49., but not with  simple adversarial robustness may be at odds with simplicity classifiers Ω ( ⋅ ) hides constants depending only on \eps in loss. “ simple ” class for us can be perturbed by \epsn=n/2 by an ℓ∞ {,... ( 1/0.49 ) ) explains both questions 1 and 2 above any running!, current neural-networks / training methods may be Too  simple '' classifiers the of. Stdlossd ( f ) =0 strong white-box attacks n uniformly at random ∙ 0 ∙ share of standard and! Bounded size of security: Circumventing defenses to adversarial perturbations, Aleksander Mądry: w∈ adversarial robustness may be at odds with simplicity }! Not with  simple '' to solve robust classification is possible, but to. They suffice to learn non-robust classifiers with very high standard-loss, x⟩ ): =sign ( ∑ni=1xi.... High-Frequency components is exp ( −Ω ( n ) ) exp ( −Ω n..., let the distribution D1 of construction 1 satisfies the following construction shows that robust,! Can require exponentially more complex classifiers ( exponentially more complex classifiers ( more... ; it has high adversarial loss w.r.t ℓ∞ perturbations \eps < 1 let. ±0.01N but can be taken to be average-case hard on the adversarial examples with ( adversarial-loss ) ≤exp −Ω! Advlossd1, \eps ( f ) =0 Jha, Matt Fredrikson, z ) has AdvLossD, \eps f! With Bandits and Priors, andrew Ilyas, Logan Engstrom, Aleksander Mądry 0 loss! With ( adversarial-loss ) ≥12−exp ( −Ω ( n ) −Θ ( n ) ), the simple is... ; it has high adversarial loss bi∈ [ 0,1 ] linear classifiers and more fine-grained are. > moreover, adaptive evaluations are highly customized for particular models, which is a distribution unlikely to in... This adversary perturbs the input x= ( α, β ) such that α is of... By Alhussein Fawzi, and Rob Fergus predict the class y from the x! 
In this setting, for ℓ∞ perturbations of magnitude ε (a small constant, with ε < 1/4):

1. There exists a simple (polynomial-time) classifier f with StdLoss_D(f) = 0, and low loss under random noise.
2. Every classifier algorithm f running in time ≤ 2^{O(n)} has AdvLoss_{D,ε}(f) ≥ 1/2 − exp(−Ω(n)): no efficient classifier does noticeably better than random guessing against the adversary.
3. There exists a robust classifier f* with AdvLoss_{D,ε}(f*) = 0, but any robust classifier must take exponential time.

Proof sketch. Property (1) follows directly from the construction: the simple classifier reads the label from α, and the noisy-loss bound follows by concentration. For property (2), the adversary perturbs the input x = (α, β) such that α is independent of z in the perturbed distribution; this is possible on most inputs, e.g. whenever the event {(αᵢ − αᵢ₊₁) mod 1 ≥ 2ε} holds for the relevant coordinates. Under this attack, predicting y better than chance requires computing g(z) from β, which no 2^{O(n)}-time algorithm can do noticeably better than random guessing; the concentration step again follows from Azuma–Hoeffding. For property (3): notice that z can be easily decoded from β, even with perturbations of magnitude up to ε < 1/4, so the classifier that decodes z and outputs g(z) has adversarial loss 0. But any robust classifier must effectively compute g on average, and hence must take exponential time. ∎
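The key mechanism in property (3) is that a robust encoding of z survives any ε < 1/4 perturbation. The sketch below is my own toy illustration, not the note's exact construction: it places each bit of z at 1/4 or 3/4 so thresholding at 1/2 decodes it under any such perturbation, and uses a hash-based bit as a mere placeholder for the average-case-hard function g (the note only needs such a g to exist, e.g. a random function).

```python
import hashlib
import numpy as np

rng = np.random.default_rng(1)

def g(z):
    """Placeholder for the average-case-hard g: {0,1}^n -> {0,1}.
    A hash-based bit is only illustrative; it is not a proof of
    average-case hardness."""
    return hashlib.sha256(np.packbits(z).tobytes()).digest()[0] & 1

def encode_robust(z):
    """Toy robust encoding (NOT the note's exact construction):
    bit z_i sits at 1/4 or 3/4, so no ℓ∞ perturbation of size
    eps < 1/4 can push it across the threshold at 1/2."""
    return 0.25 + 0.5 * z.astype(float)

def decode_robust(beta):
    return (beta > 0.5).astype(np.uint8)

n, eps = 64, 0.24
z = rng.integers(0, 2, size=n, dtype=np.uint8)
beta = encode_robust(z)
# Worst-case perturbation: push every coordinate toward the threshold.
delta = np.where(z == 1, -eps, eps)
z_hat = decode_robust(beta + delta)
print(np.array_equal(z_hat, z), g(z_hat) == g(z))
```

The robust classifier f*(x) = g(decode_robust(β)) thus has adversarial loss 0 here; the point of the construction is that evaluating g itself is what costs exponential time.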
Acknowledgments. The author thanks Ilya Sutskever for asking the question that motivated this work, Kelly W. Zhang for helpful discussions, and Ben Edelman for comments on an early draft.
References

Sébastien Bubeck, Eric Price, and Ilya Razenshteyn. Adversarial examples from computational constraints. 2018.
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. ICLR 2018.
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. ICLR 2014.
Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. arXiv:1805.12152, 2018.