Intriguing Properties of Adversarial ML Attacks in the Problem Space
Machine learning (ML) classifiers have demonstrated impressive performance in various domains, particularly in discriminating between malicious and benign behaviour in security-sensitive settings (e.g., malware detection, anomaly detection, code attribution, platform abuse). However, it has been shown that adversaries can attack classifiers by carefully altering input data in order to manipulate their outputs.

A well-studied example of an adversarial ML attack is the evasion attack. Using a gradient-driven methodology, it is possible to calculate an ideal perturbation δ* which, when applied to the original object x, causes the target classifier to misidentify it as a different class. However, in many settings it is not possible to convert this ideal feature vector back into a real problem-space object due to the inverse feature-mapping problem. In these cases, the transformations required to induce δ* in x are simply not available because of constraints that exist only in the problem space (e.g., plausibility).

In this work we clarify the relationship between the feature space and the problem space and propose a general formalization for problem-space attacks, including a comprehensive set of constraints to consider. This allows us to highlight the strengths and weaknesses of different approaches and to better formulate novel attacks.
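As a rough illustration of the feature-space side of such an attack, the sketch below computes a perturbation δ* with the fast gradient sign method, one common gradient-driven approach (not the specific method formalized in this work). The model, the epsilon budget, and the tensor shapes are illustrative assumptions; note that the resulting perturbed feature vector still faces the inverse feature-mapping problem described above.

```python
import torch
import torch.nn.functional as F

def fgsm_perturbation(model, x, y_true, epsilon=0.1):
    """Sketch: compute a feature-space evasion perturbation delta*
    via the fast gradient sign method (FGSM).

    model   -- a differentiable classifier returning logits (assumed)
    x       -- feature vector of the original object, shape (1, d)
    y_true  -- true label of x, shape (1,)
    epsilon -- perturbation budget (illustrative value)
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y_true)
    loss.backward()
    # Step in the direction that increases the loss on the true class,
    # pushing the classifier toward predicting a different class.
    delta = epsilon * x.grad.sign()
    return delta

# x_adv = x + delta evades the classifier only in feature space; mapping
# it back to a real object (e.g., a working app) is constrained by the
# problem space -- the inverse feature-mapping problem.
```

In practice, the perturbed vector x + δ* may violate problem-space constraints such as plausibility or preserved semantics, which is precisely the gap between feature-space and problem-space attacks that this work formalizes.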