Yangruibo (Robin) Ding's Personal Website

Yangruibo (Robin) Ding

I am a fifth-year Ph.D. student in the Department of Computer Science at Columbia University. I am fortunate to be advised by Prof. Baishakhi Ray and Prof. Gail Kaiser.

My research focuses on using statistical models to learn the semantics of source code for automated software engineering tasks, such as code generation and program analysis.

Email / Google Scholar / LinkedIn / Twitter

News

🎉 July 2024: "Vulnerability Detection with Code Language Models: How Far Are We?" was accepted to ICSE 2025.

May 2024: I joined Google DeepMind as a Student Researcher, working on Code LLMs.

⭐ May 2024: Check out our new work, "SemCoder: Training Code Language Models with Comprehensive Semantics," which trains a 6.7B code LM that outperforms GPT-3.5 on code execution reasoning.

Jan. 2024: "Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain" was accepted to ICLR 2024. Congrats to Marcus!

Dec. 2023: "CYCLE: Learning to Self-Refine Code Generation" was accepted to OOPSLA 2024.

Dec. 2023: "Deep Learning Based Vulnerability Detection: Are We There Yet?" received the IEEE TSE Best Paper Award Runner-up. Congrats to Saikat!

Nov. 2023: I will serve as a Program Committee Member of ASE 2024.

Sep. 2023: Our paper CrossCodeEval was accepted to the NeurIPS 2023 Datasets and Benchmarks Track.

July 2023: CONCORD received an ACM SIGSOFT Distinguished Paper Award.

Honors and Awards

IBM Ph.D. Fellowship Award. 2022-2024
ACM SIGSOFT Distinguished Paper Award. 2023
IEEE TSE Best Paper Award Runner-up. 2022
NSF Student Travel Award for ESEC/FSE'23. 2023
ACM SIGSOFT CAPS Travel Grant. 2023
NSF Travel Award for ICSE'22. 2022

Publications

Pre-print

SemCoder: Training Code Language Models with Comprehensive Semantics
Yangruibo Ding, Jinjun Peng, Marcus J. Min, Gail Kaiser, Junfeng Yang, Baishakhi Ray
arXiv 2024

Conference & Journal

Vulnerability Detection with Code Language Models: How Far Are We?
Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, Yizheng Chen
ICSE 2025

Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain
Marcus J. Min, Yangruibo Ding, Luca Buratti, Saurabh Pujar, Gail Kaiser, Suman Jana, Baishakhi Ray
ICLR 2024

CYCLE: Learning to Self-Refine Code Generation
Yangruibo Ding, Marcus J. Min, Gail Kaiser, Baishakhi Ray
OOPSLA 2024

TRACED: Execution-aware Pre-training for Source Code
Yangruibo Ding, Ben Steenhoek, Kexin Pei, Gail Kaiser, Wei Le, Baishakhi Ray
ICSE 2024

Automated Code Editing with Search-Generate-Modify
Changshu Liu, Pelin Cetin, Yogesh Patodia, Baishakhi Ray, Saikat Chakraborty, Yangruibo Ding
IEEE Transactions on Software Engineering (TSE)

CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context
Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang ( equal contribution)
LREC-COLING 2024

CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion
Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Hantian Ding, Ming Tan, Nihal Jain, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang ( equal contribution)
NeurIPS 2023, Datasets and Benchmarks Track

CONCORD: Clone-aware Contrastive Learning for Source Code
Yangruibo Ding, Saikat Chakraborty, Luca Buratti, Saurabh Pujar, Alessandro Morari, Gail Kaiser, Baishakhi Ray
ISSTA 2023
ACM SIGSOFT Distinguished Paper Award

NatGen: Generative pre-training by "Naturalizing" source code
Saikat Chakraborty, Toufique Ahmed, Yangruibo Ding, Premkumar Devanbu, Baishakhi Ray
ESEC/FSE 2022

Towards Learning (Dis)-Similarity of Source Code from Program Contrasts
Yangruibo Ding, Luca Buratti, Saurabh Pujar, Alessandro Morari, Baishakhi Ray, Saikat Chakraborty
ACL 2022

Deep learning based vulnerability detection: Are we there yet?
Saikat Chakraborty, Rahul Krishna, Yangruibo Ding, Baishakhi Ray
ICSE 2022 Journal-First, IEEE Transactions on Software Engineering (TSE)
IEEE TSE Best Paper Award Runner-up

VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements
Yangruibo Ding, Sahil Suneja, Yunhui Zheng, Jim Laredo, Alessandro Morari, Gail Kaiser, Baishakhi Ray
SANER 2022

CODIT: Code Editing With Tree-Based Neural Models
Saikat Chakraborty, Yangruibo Ding, Miltiadis Allamanis, Baishakhi Ray
ICSE 2021 Journal-First, IEEE Transactions on Software Engineering (TSE)

Patching as Translation: the Data and the Metaphor
Yangruibo Ding, Baishakhi Ray, Premkumar Devanbu, Vincent J Hellendoorn
ASE 2020