Microsoft's security team relies on machine learning models to classify reported bugs by whether or not they are security-critical, the company writes on its blog. Thanks to a new method, the model now correctly identifies reported bugs as security-relevant in 99 percent of cases and correctly labels them as critical or non-critical in 97 percent of cases.
As motivation for the work, Microsoft writes that automated tools should help developers prioritize their work better. "Too often, engineers waste time on false alarms or overlook a critical vulnerability that has been misclassified," the company writes in its announcement. Unsurprisingly, Microsoft adds that combining work on the machine learning models with the expertise of security specialists has led to better results.
The model was trained on the roughly 13 million bug reports and work items the company has collected since 2001, with a subset selected for training based on its quality. After the initial training, Microsoft's security experts tested and evaluated the model in production use by checking the average number of bugs found as well as a random sample of those bugs. The model was improved several times through this process.
The idea of using automated tools to detect or prevent security holes is not new. Facebook, for example, tests its code with its Zoncolan system, and the cloud provider AWS also uses machine learning techniques for code review in its CodeGuru service. What is special about the Microsoft model is that it classifies bug reports based on their titles alone rather than their full content.
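Microsoft has not published the details of its model, but title-only classification of this kind can be illustrated with a standard text-classification approach. The following is a minimal sketch, assuming a simple bag-of-words naive Bayes classifier and entirely hypothetical example titles; it is not Microsoft's actual method.

```python
import math
from collections import Counter, defaultdict

def tokenize(title):
    """Lowercase a bug-report title and split it into word tokens."""
    return title.lower().split()

def train(examples):
    """Train on (title, label) pairs; return per-label token counts,
    label frequencies, and the overall vocabulary."""
    word_counts = defaultdict(Counter)   # label -> token frequencies
    label_counts = Counter()             # label -> number of titles
    for title, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(title))
    vocab = {t for counts in word_counts.values() for t in counts}
    return word_counts, label_counts, vocab

def classify(model, title):
    """Return the label with the highest log-probability,
    using Laplace (add-one) smoothing for unseen tokens."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(title):
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training titles, labelled security vs. non-security.
training = [
    ("buffer overflow in parser", "security"),
    ("sql injection in login form", "security"),
    ("xss vulnerability in comment field", "security"),
    ("typo in settings dialog", "non-security"),
    ("button misaligned on toolbar", "non-security"),
    ("crash when opening empty file", "non-security"),
]

model = train(training)
print(classify(model, "possible sql injection in search"))  # → security
```

In a second stage, the same technique could be applied to the security-relevant reports to separate critical from non-critical ones, mirroring the two classification rates Microsoft cites.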
The company plans to publish the methodology as open source on GitHub "in the coming months". Microsoft also provides a scientific evaluation as part of its documentation.