Georgia Tech is leading a $1.2 million project to develop a framework for securing systems built on machine learning (ML). School of Computational Science and Engineering (CSE) Assistant Professor Duen Horng (Polo) Chau leads the project, which is funded by the National Science Foundation (NSF), alongside Professor Wenke Lee, Associate Professor Le Song, and Assistant Professor Taesoo Kim, all of Georgia Tech.
From education to science and technology as a whole, machine learning is now in widespread use. This, in turn, means that any damage caused by compromised ML-based systems can be extensive and devastating. Already, attackers can poison ML models by injecting maliciously crafted training data, causing the model to make wrong decisions. The history of cybersecurity suggests that attacks that render machine learning-based security analysis ineffective, by gaining control of the input data or computation procedures, will soon become more prevalent in the real world.
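As a rough illustration of the poisoning idea described above, the following Python sketch trains a toy one-dimensional detector twice, once on clean data and once after an attacker has injected mislabeled points. The detector, the data values, and the function names are all hypothetical and are not taken from the project; real security analytics are far more complex.

```python
def train_threshold(samples):
    """Fit a toy 1-D detector: the threshold sits halfway between the class means.

    Each sample is (score, label), where label 0 = benign and 1 = malicious.
    """
    benign = [x for x, label in samples if label == 0]
    malicious = [x for x, label in samples if label == 1]
    benign_mean = sum(benign) / len(benign)
    malicious_mean = sum(malicious) / len(malicious)
    return (benign_mean + malicious_mean) / 2


def predict(threshold, x):
    """Flag a sample as malicious (1) when its score exceeds the threshold."""
    return 1 if x > threshold else 0


# Hypothetical clean training data.
clean = [(1.0, 0), (2.0, 0), (3.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)]

# Hypothetical poison: malicious-looking points deliberately labeled benign.
poison = [(9.0, 0), (9.5, 0), (10.0, 0)]

clean_threshold = train_threshold(clean)
poisoned_threshold = train_threshold(clean + poison)

# The same suspicious sample is judged differently by the two models.
suspicious = 7.0
verdict_clean = predict(clean_threshold, suspicious)
verdict_poisoned = predict(poisoned_threshold, suspicious)
```

In this toy setup, the injected points pull the benign average upward, which raises the decision threshold enough that the suspicious sample is flagged by the clean model but passes the poisoned one.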
The project team has extensive accomplishments and experience in machine learning, systems and network security, botnet and intrusion detection, and malware analysis. The project, titled “SaTC: CORE: Medium: Understanding and Fortifying Machine Learning Based Security Analytics,” takes on the challenge of developing a systematic, foundational, and practical framework to understand attacks, quantify vulnerabilities, and fortify machine learning-based security analytics. The ultimate aim of the four-year project is to change how machine learning-based systems are designed, developed, and deployed.
“The ever-increasing volume of data that can be collected and made available for security analysis presents both great opportunities and great challenges. We can now apply powerful data analysis techniques, in particular, machine learning algorithms, that have been developed in recent years to gain new security insights and develop new solutions,” said Chau. “However, preliminary research has demonstrated that by gaining control of the input training data or the classification process, attackers can render machine learning based security analysis ineffective.”
Song explained further, “To determine how adversaries can attack ML based security analytics, we will study the theoretical vulnerabilities of ML algorithms, such as how adversaries may smartly select the most uncertain examples to optimize exploratory attacks, and how they may launch sophisticated causative attacks even when the choices of ML models and algorithms are not known.”
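The exploratory attack Song mentions, in which an adversary picks the examples a model is least sure about, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the logistic scoring function, its weights, and the candidate samples are hypothetical stand-ins, not the models or methods studied in the project.

```python
import math


def logistic_score(features, weights, bias):
    """Hypothetical stand-in for a deployed detector: returns P(malicious)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def most_uncertain(candidates, score_fn, k):
    """Return the k candidates whose score lies closest to the 0.5 decision boundary."""
    return sorted(candidates, key=lambda c: abs(score_fn(c) - 0.5))[:k]


# Hypothetical detector parameters and candidate probes.
WEIGHTS = [2.0, -1.0]
BIAS = -0.5
candidates = [
    [0.0, 0.0],
    [1.0, 0.5],
    [0.25, 0.0],
    [3.0, 0.0],
    [0.0, 3.0],
]

# The attacker keeps the two probes the detector is least certain about.
picked = most_uncertain(
    candidates, lambda c: logistic_score(c, WEIGHTS, BIAS), k=2
)
```

Samples near the decision boundary reveal the most about where the boundary lies, which is why an attacker with a limited query budget might prefer them. A defender could use the same ranking to decide which inputs deserve closer scrutiny.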
The findings from this research may lead to new kinds of adaptive cyberdefense systems. Such systems would be highly resilient and efficient against future cyberattacks, helping protect the nation and its citizens from harm. The ideas proposed in this NSF project push the envelope of state-of-the-art machine learning research and could shape how systems are built now and in the future.
In 2016, Intel Corporation gave Georgia Tech $1.5 million to establish a new research center, the Intel Science & Technology Center for Adversary-Resilient Security Analytics (ISTC-ARSA), dedicated to the emerging field of ML cybersecurity. The center focuses on strengthening the analytics behind malware detection and threat analysis. The research with Intel helped the team identify and formalize important new research questions that form the pillars of this NSF project. These include the need for a theoretical machine learning framework that formally quantifies the impact of different types of attacks, and the use of that framework to guide and strengthen defender systems in a principled way.
The NSF project will use multiple channels to accelerate knowledge dissemination and technology transfer. The project team has obtained strong commitments to collaborate on the proposed research from Symantec, a leading security solution provider, and from other industry partners. Chau explained, “They will share with us malware samples, and help transition the developed research into practice, into their malware analysis engine, via Intel Software Guard Extension. We will open-source all developed software and datasets.”