Abstract
We present NaturalCC, an efficient and extensible open-source toolkit for machine-learning-based source code analysis (i.e., code intelligence). Using NaturalCC, researchers can conduct rapid prototyping, reproduce state-of-the-art models, and/or exercise their own algorithms. NaturalCC is built upon Fairseq and PyTorch, providing (1) a collection of code corpora with preprocessing scripts, (2) a modular and extensible framework that makes it easy to reproduce and implement a code intelligence model, and (3) a benchmark of state-of-the-art models. Furthermore, we demonstrate the usability of our toolkit over a variety of tasks (e.g., code summarization, code retrieval, and code completion) through a graphical user interface. The website of this project is http://xcodemind.github.io, where the source code and demonstration video can be found.
Original language | English |
---|---|
Title of host publication | Proceedings of 2022 ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings, ICSE-Companion 2022 |
Publisher | IEEE |
Pages | 149-153 |
ISBN (Electronic) | 9781665495981 |
DOIs | https://doi.org/10.1145/3510454.3516863 |
Publication status | Published - 2022 |
Citation
Wan, Y., He, Y., Bi, Z., Zhang, J., Sui, Y., Zhang, H., Hashimoto, K., Jin, H., Xu, G., Xiong, C., & Yu, P. S. (2022). NaturalCC: An open-source toolkit for code intelligence. In Proceedings of 2022 ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings, ICSE-Companion 2022 (pp. 149-153). IEEE. https://doi.org/10.1145/3510454.3516863
Keywords
- Code intelligence
- Deep learning
- Code representation
- Code embedding
- Open source
- Toolkit
- Benchmark