Yangruibo (Robin) Ding

I am a fifth-year Ph.D. student in the Department of Computer Science at Columbia University. I am fortunate to be advised by Prof. Baishakhi Ray and Prof. Gail Kaiser.

My research focuses on learning the semantics of source code with statistical models for automated software engineering tasks, such as code generation and program analysis.

Email  /  Google Scholar  /  LinkedIn  /  Twitter

News

🎉July 2024: "Vulnerability Detection with Code Language Models: How Far Are We?" got accepted by ICSE 2025.

May 2024: I joined Google DeepMind as a Student Researcher, working on Code LLMs.

May 2024: Check out our new work on training a 6.7B code LM to outperform GPT-3.5 in code execution reasoning: SemCoder: Training Code Language Models with Comprehensive Semantics.

Jan. 2024: "Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain" got accepted by ICLR 2024. Congrats to Marcus!

Dec. 2023: "CYCLE: Learning to Self-Refine Code Generation" got accepted by OOPSLA 2024.

Dec. 2023: "Deep Learning Based Vulnerability Detection: Are We There Yet" received the IEEE TSE Best Paper Award Runner-up. Congrats to Saikat!

Nov. 2023: I will serve as a Program Committee Member of ASE 2024.

Sep. 2023: Our Datasets and Benchmarks paper, CrossCodeEval, got accepted by NeurIPS 2023.

July 2023: CONCORD received an ACM SIGSOFT Distinguished Paper Award.

Honors and Awards
Publications
Pre-print
SemCoder: Training Code Language Models with Comprehensive Semantics
Yangruibo Ding, Jinjun Peng, Marcus J. Min, Gail Kaiser, Junfeng Yang, Baishakhi Ray
arXiv 2024
Conference & Journal
Vulnerability Detection with Code Language Models: How Far Are We?
Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, Yizheng Chen
ICSE 2025
Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain
Marcus J. Min, Yangruibo Ding, Luca Buratti, Saurabh Pujar, Gail Kaiser, Suman Jana, Baishakhi Ray
ICLR 2024
CYCLE: Learning to Self-Refine Code Generation
Yangruibo Ding, Marcus J. Min, Gail Kaiser, Baishakhi Ray
OOPSLA 2024
TRACED: Execution-aware Pre-training for Source Code
Yangruibo Ding, Ben Steenhoek, Kexin Pei, Gail Kaiser, Wei Le, Baishakhi Ray
ICSE 2024
Automated Code Editing with Search-Generate-Modify
Changshu Liu, Pelin Cetin, Yogesh Patodia, Baishakhi Ray, Saikat Chakraborty, Yangruibo Ding
IEEE Transactions on Software Engineering (TSE)
CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context
Yangruibo Ding*, Zijian Wang*, Wasi Uddin Ahmad*, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang (* equal contribution)
LREC-COLING 2024
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion
Yangruibo Ding*, Zijian Wang*, Wasi Uddin Ahmad*, Hantian Ding, Ming Tan, Nihal Jain, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang (* equal contribution)
NeurIPS 2023, Datasets and Benchmarks Track
CONCORD: Clone-aware Contrastive Learning for Source Code
Yangruibo Ding, Saikat Chakraborty, Luca Buratti, Saurabh Pujar, Alessandro Morari, Gail Kaiser, Baishakhi Ray
ISSTA 2023
ACM SIGSOFT Distinguished Paper Award
NatGen: Generative pre-training by "Naturalizing" source code
Saikat Chakraborty, Toufique Ahmed, Yangruibo Ding, Premkumar Devanbu, Baishakhi Ray
ESEC/FSE 2022
Towards Learning (Dis)-Similarity of Source Code from Program Contrasts
Yangruibo Ding, Luca Buratti, Saurabh Pujar, Alessandro Morari, Baishakhi Ray, Saikat Chakraborty
ACL 2022
Deep Learning Based Vulnerability Detection: Are We There Yet
Saikat Chakraborty, Rahul Krishna, Yangruibo Ding, Baishakhi Ray
ICSE 2022 Journal-First, IEEE Transactions on Software Engineering (TSE)
IEEE TSE Best Paper Award Runner-up
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements
Yangruibo Ding, Sahil Suneja, Yunhui Zheng, Jim Laredo, Alessandro Morari, Gail Kaiser, Baishakhi Ray
SANER 2022
CODIT: Code Editing With Tree-Based Neural Models
Saikat Chakraborty, Yangruibo Ding, Miltiadis Allamanis, Baishakhi Ray
ICSE 2021 Journal-First, IEEE Transactions on Software Engineering (TSE)
Patching as Translation: the Data and the Metaphor
Yangruibo Ding, Baishakhi Ray, Premkumar Devanbu, Vincent J Hellendoorn
ASE 2020

Academic Services

Program Committee

Journal Reviewer

Conference Reviewer

Experiences

Teaching

Fall 2021: COMS4115 Programming Languages & Translators, Columbia University

Internship

2022 Summer: Applied Scientist Intern, Amazon AWS AI Lab

2021 Summer: Research Intern, IBM T. J. Watson Research Center

2020 Summer: Research Intern, IBM T. J. Watson Research Center


Profile Photo by Lingyi. Website Template by Jon Barron