Yangruibo (Robin) Ding

Email  /  Google Scholar  /  LinkedIn  /  Twitter

I am a final-year Ph.D. student in Computer Science at Columbia University advised by Prof. Baishakhi Ray and Prof. Gail Kaiser. I have also been a student researcher at Google DeepMind since May 2024.

My research focuses on developing large language models for code. I am interested in training language models to learn code-specific semantics (e.g., dynamic execution) and properties (e.g., functionality and constraints) to generate, analyze, and refine software programs. Most recently, I have been working on improving LLMs' reasoning capability to tackle complex programming tasks, such as debugging and patching.

📣 Office Hours: I hold office hours on Tuesdays, 3-4 PM, offering mentorship and advice to Columbia undergraduate and master's students. If you want to discuss research ideas with me, please fill out this Form by end of day on the Monday before.

News

Sep. 2024: "SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning" got accepted by NeurIPS 2024.

🎉 July 2024: "Vulnerability Detection with Code Language Models: How Far Are We?" got accepted by ICSE 2025.

May 2024: I joined Google DeepMind as a Student Researcher, working on Code LLMs.

Jan. 2024: "CYCLE: Learning to Self-Refine Code Generation" got accepted by OOPSLA 2024.

Jan. 2024: "Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain" got accepted by ICLR 2024. Congrats to Marcus!

Dec. 2023: "Deep Learning Based Vulnerability Detection: Are We There Yet" was named IEEE TSE Best Paper Award Runner-up. Congrats to Saikat!

Nov. 2023: I will serve as a Program Committee Member of ASE 2024.

Sep. 2023: Our Datasets and Benchmarks paper, CrossCodeEval, got accepted by NeurIPS 2023.

July 2023: CONCORD received the ACM SIGSOFT Distinguished Paper Award.

Honors and Awards
Work Experiences
Publications
SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning
Yangruibo Ding, Jinjun Peng, Marcus J. Min, Gail Kaiser, Junfeng Yang, Baishakhi Ray

NeurIPS 2024
Vulnerability Detection with Code Language Models: How Far Are We?
Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, Yizheng Chen

ICSE 2025
CYCLE: Learning to Self-Refine Code Generation
Yangruibo Ding, Marcus J. Min, Gail Kaiser, Baishakhi Ray

OOPSLA 2024
Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain
Marcus J. Min, Yangruibo Ding, Luca Buratti, Saurabh Pujar, Gail Kaiser, Suman Jana, Baishakhi Ray

ICLR 2024
TRACED: Execution-aware Pre-training for Source Code
Yangruibo Ding, Ben Steenhoek, Kexin Pei, Gail Kaiser, Wei Le, Baishakhi Ray

ICSE 2024
Automated Code Editing with Search-Generate-Modify
Changshu Liu, Pelin Cetin, Yogesh Patodia, Baishakhi Ray, Saikat Chakraborty, Yangruibo Ding

IEEE Transactions on Software Engineering (TSE)
CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context
Yangruibo Ding*, Zijian Wang*, Wasi Uddin Ahmad*, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang (* equal contribution)

LREC-COLING 2024
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion
Yangruibo Ding*, Zijian Wang*, Wasi Uddin Ahmad*, Hantian Ding, Ming Tan, Nihal Jain, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang (* equal contribution)

NeurIPS 2023 (Datasets & Benchmarks)
CONCORD: Clone-aware Contrastive Learning for Source Code
Yangruibo Ding, Saikat Chakraborty, Luca Buratti, Saurabh Pujar, Alessandro Morari, Gail Kaiser, Baishakhi Ray
ACM SIGSOFT Distinguished Paper Award

ISSTA 2023
NatGen: Generative pre-training by "Naturalizing" source code
Saikat Chakraborty, Toufique Ahmed, Yangruibo Ding, Premkumar Devanbu, Baishakhi Ray

ESEC/FSE 2022
Towards Learning (Dis)-Similarity of Source Code from Program Contrasts
Yangruibo Ding, Luca Buratti, Saurabh Pujar, Alessandro Morari, Baishakhi Ray, Saikat Chakraborty

ACL 2022
Deep learning based vulnerability detection: Are we there yet
Saikat Chakraborty, Rahul Krishna, Yangruibo Ding, Baishakhi Ray
IEEE TSE Best Paper Award Runner-up

ICSE 2022 (Journal-First), IEEE Transactions on Software Engineering (TSE)
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements
Yangruibo Ding, Sahil Suneja, Yunhui Zheng, Jim Laredo, Alessandro Morari, Gail Kaiser, Baishakhi Ray

SANER 2022
CODIT: Code Editing With Tree-Based Neural Models
Saikat Chakraborty, Yangruibo Ding, Miltiadis Allamanis, Baishakhi Ray

ICSE 2021 (Journal-First), IEEE Transactions on Software Engineering (TSE)
Patching as Translation: the Data and the Metaphor
Yangruibo Ding, Baishakhi Ray, Premkumar Devanbu, Vincent J Hellendoorn

ASE 2020

Services

Program Committee

Conference Reviewer

Journal Reviewer

Teaching


Last Updated: Oct 2024.

Photo by Lingyi. Website Template by Jon Barron