Summary
Software defects (or bugs) are a severe problem. Because finding and fixing bugs demands enormous manual effort, countless software applications are shipped with many known and unknown bugs, which can crash critical computing systems or expose serious security vulnerabilities. As we increasingly rely on computing systems, there is a critical need to find a better way to tackle software bugs.
There is a largely untapped resource that can help us tackle this problem. Billions of lines of code are readily available from millions of open-source projects hosted in repositories like GitHub, many of which are of professional quality. Millions of code revisions are committed to these open-source projects daily, many of which are good examples of bug repair solutions. This wealth of information offers a new way to tackle software bugs. By analysing code revisions related to software bugs and their repair solutions, we can discover the root causes of bugs and learn how to fix them. By aggregating and leveraging the past development efforts of professional programmers worldwide, a tool can be designed to automatically identify and fix hidden software bugs in a new, unseen program.
We envision a new paradigm in which software developers no longer need to spend enormous amounts of time manually finding and fixing bugs buried in hundreds of thousands of lines of complex code. This vision of highly intelligent software development has only recently become possible, thanks to breakthroughs in deep learning that allow us to build powerful natural language processing (NLP) models to distil knowledge from large corpora of text. This work will extend the reach of NLP to massive code bases, an area of research that is largely unexplored. By combining NLP methods with compiler-based code analysis, we will develop new models, analyses, and techniques to extract and transfer knowledge from open-source projects to automatically fix software bugs, a much-needed capability that previously seemed difficult or impossible to achieve.
