Code Completion with Statistical Language Models

450
37.5
Опубликовано 27 июня 2016, 20:48
In this talk, I present the problem of synthesizing code completions for programs using APIs. Given a program with holes, we synthesize completions for the holes with the most likely sequences of method calls that we learn from existing code. The main idea of our approach is to reduce the problem of code completion to a natural-language processing problem of predicting probabilities of sentences. We design a simple and scalable static analysis that extracts sequences of method calls from large codebase, and index them into a statistical language model. Then we employ the language model to find the highest ranked sentences and use them to synthesize a code completion. Our technique is capable of synthesizing sequences of method calls, calls that may span across multiple objects and methods together with their arguments. We implemented our approach for Java programs using Android APIs and our results show that the system is fast and effective. Virtually all computed completions typecheck and the desired completion appears in the first 3 results for 90% of the test cases.
автотехномузыкадетское