Publications

Publications are grouped by year in reverse chronological order.

2025

  1. DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
    Jongwoo Ko, Tianyi Chen, Sungnyun Kim, and 4 more authors
    Mar 2025

2024

  1. LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
    Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, and 5 more authors
    Feb 2024
  2. Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs
    Krista Opsahl-Ong, Michael J Ryan, Josh Purtell, and 4 more authors
    Jun 2024
  3. DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
    Huajian Xin, Daya Guo, Zhihong Shao, and 6 more authors
    May 2024
  4. Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
    Zhiqing Sun, Longhui Yu, Yikang Shen, and 4 more authors
    Mar 2024
  5. Training Language Models to Self-Correct via Reinforcement Learning
    Aviral Kumar, Vincent Zhuang, Rishabh Agarwal, and 15 more authors
    Sep 2024
  6. BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
    Jianlv Chen, Shitao Xiao, Peitian Zhang, and 3 more authors
    Feb 2024

2023

  1. Simple synthetic data reduces sycophancy in large language models
    Jerry Wei, Da Huang, Yifeng Lu, and 2 more authors
    Aug 2023
  2. MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
    Sirui Hong, Xiawu Zheng, Jonathan Chen, and 10 more authors
    Aug 2023
  3. Generative Agents: Interactive Simulacra of Human Behavior
    Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, and 3 more authors
    Apr 2023
  4. Transformers in Speech Processing: A Survey
    Siddique Latif, Aun Zaidi, Heriberto Cuayahuitl, and 3 more authors
    Mar 2023
  5. A Watermark for Large Language Models
    John Kirchenbauer, Jonas Geiping, Yuxin Wen, and 3 more authors
    Jan 2023
  6. Zero-shot Image-to-Image Translation
    Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, and 3 more authors
    Feb 2023
  7. Universal and Transferable Adversarial Attacks on Aligned Language Models
    Andy Zou, Zifan Wang, J. Zico Kolter, and 1 more author
    Jul 2023
  8. Large Language Models for Software Engineering: Survey and Open Problems
    Angela Fan, Beliz Gokkaya, Mark Harman, and 4 more authors
    Oct 2023
  9. Mamba: Linear-Time Sequence Modeling with Selective State Spaces
    Albert Gu, and Tri Dao
    Dec 2023
  10. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
    Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, and 10 more authors
    Oct 2023
  11. Large Language Models as Optimizers
    Chengrun Yang, Xuezhi Wang, Yifeng Lu, and 4 more authors
    Sep 2023
  12. Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
    Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, and 9 more authors
    Dec 2023
  13. Query Rewriting for Retrieval-Augmented Large Language Models
    Xinbei Ma, Yeyun Gong, Pengcheng He, and 2 more authors
    May 2023
  14. Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory
    Xin Cheng, Di Luo, Xiuying Chen, and 3 more authors
    May 2023
  15. Dense X Retrieval: What Retrieval Granularity Should We Use?
    Tong Chen, Hongwei Wang, Sihao Chen, and 5 more authors
    Dec 2023
  16. Lost in the Middle: How Language Models Use Long Contexts
    Nelson F. Liu, Kevin Lin, John Hewitt, and 4 more authors
    Jul 2023
  17. Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples!
    Yubo Ma, Yixin Cao, YongChing Hong, and 1 more author
    Mar 2023
  18. Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking
    Shengyao Zhuang, Bing Liu, Bevan Koopman, and 1 more author
    Oct 2023
  19. Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy
    Zhihong Shao, Yeyun Gong, Yelong Shen, and 3 more authors
    May 2023
  20. RAGAS: Automated Evaluation of Retrieval Augmented Generation
    Shahul Es, Jithin James, Luis Espinosa-Anke, and 1 more author
    Sep 2023
  21. DiLoCo: Distributed Low-Communication Training of Language Models
    Arthur Douillard, Qixuan Feng, Andrei A. Rusu, and 6 more authors
    Nov 2023

2022

  1. Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
    Paul Pu Liang, Amir Zadeh, and Louis-Philippe Morency
    Sep 2022
  2. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
    Jason Wei, Xuezhi Wang, Dale Schuurmans, and 6 more authors
    Jan 2022
  3. Training language models to follow instructions with human feedback
    Long Ouyang, Jeff Wu, Xu Jiang, and 17 more authors
    Mar 2022
  4. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
    Sewon Min, Xinxi Lyu, Ari Holtzman, and 4 more authors
    Feb 2022
  5. Repository-Level Prompt Generation for Large Language Models of Code
    Disha Shrivastava, Hugo Larochelle, and Daniel Tarlow
    Jun 2022
    ICML, 2023
  6. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
    Rinon Gal, Yuval Alaluf, Yuval Atzmon, and 4 more authors
    Aug 2022
  7. InstructPix2Pix: Learning to Follow Image Editing Instructions
    Tim Brooks, Aleksander Holynski, and Alexei A. Efros
    Nov 2022
  8. In-context Learning and Induction Heads
    Catherine Olsson, Nelson Elhage, Neel Nanda, and 23 more authors
    Sep 2022
  9. Matryoshka Representation Learning
    Aditya Kusupati, Gantavya Bhatt, Aniket Rege, and 8 more authors
    May 2022
  10. Training Compute-Optimal Large Language Models
    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, and 19 more authors
    Mar 2022
  11. Constitutional AI: Harmlessness from AI Feedback
    Yuntao Bai, Saurav Kadavath, Sandipan Kundu, and 48 more authors
    Dec 2022
  12. Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization
    Elan Rosenfeld, Pradeep Ravikumar, and Andrej Risteski
    Feb 2022
  13. Precise Zero-Shot Dense Retrieval without Relevance Labels
    Luyu Gao, Xueguang Ma, Jimmy Lin, and 1 more author
    Dec 2022
  14. InPars: Data Augmentation for Information Retrieval using Large Language Models
    Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, and 1 more author
    Feb 2022
  15. Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
    Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and 1 more author
    Dec 2022
  16. ASQA: Factoid Questions Meet Long-Form Answers
    Ivan Stelmakh, Yi Luan, Bhuwan Dhingra, and 1 more author
    Apr 2022
  17. Generate rather than Retrieve: Large Language Models are Strong Context Generators
    Wenhao Yu, Dan Iter, Shuohang Wang, and 6 more authors
    Sep 2022
  18. Diffusion-LM Improves Controllable Text Generation
    Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, and 2 more authors
    May 2022

2021

  1. Evaluating Large Language Models Trained on Code
    Mark Chen, Jerry Tworek, Heewoo Jun, and 55 more authors
    Jul 2021
  2. Efficiently Modeling Long Sequences with Structured State Spaces
    Albert Gu, Karan Goel, and Christopher Ré
    Oct 2021
  3. Calibrate Before Use: Improving Few-Shot Performance of Language Models
    Tony Z. Zhao, Eric Wallace, Shi Feng, and 2 more authors
    Feb 2021
  4. Noisy Channel Language Model Prompting for Few-Shot Text Classification
    Sewon Min, Mike Lewis, Hannaneh Hajishirzi, and 1 more author
    Aug 2021
  5. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
    Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and 1 more author
    In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual Event, Canada, Aug 2021
  6. High-Resolution Image Synthesis with Latent Diffusion Models
    Robin Rombach, Andreas Blattmann, Dominik Lorenz, and 2 more authors
    Dec 2021
  7. LoRA: Low-Rank Adaptation of Large Language Models
    Edward J. Hu, Yelong Shen, Phillip Wallis, and 5 more authors
    Jun 2021
  8. RoFormer: Enhanced Transformer with Rotary Position Embedding
    Jianlin Su, Yu Lu, Shengfeng Pan, and 3 more authors
    Apr 2021
  9. Prefix-Tuning: Optimizing Continuous Prompts for Generation
    Xiang Lisa Li, and Percy Liang
    Jan 2021
  10. Editing a classifier by rewriting its prediction rules
    Shibani Santurkar, Dimitris Tsipras, Mahalaxmi Elango, and 3 more authors
    Dec 2021
  11. Unsupervised Dense Information Retrieval with Contrastive Learning
    Gautier Izacard, Mathilde Caron, Lucas Hosseini, and 4 more authors
    Dec 2021
  12. ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction
    Kwan Ho Ryan Chan, Yaodong Yu, Chong You, and 3 more authors
    May 2021

2020

  1. Language Models are Few-Shot Learners
    Tom B. Brown, Benjamin Mann, Nick Ryder, and 28 more authors
    May 2020
  2. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
    Patrick Lewis, Ethan Perez, Aleksandra Piktus, and 9 more authors
    May 2020
  3. Dense Passage Retrieval for Open-Domain Question Answering
    Vladimir Karpukhin, Barlas Oguz, Sewon Min, and 5 more authors
    Apr 2020
  4. Accurate Detection of Wake Word Start and End Using a CNN
    Christin Jose, Yuriy Mishchenko, Thibaud Senechal, and 3 more authors
    Aug 2020
    Interspeech 2020
  5. Extracting Training Data from Large Language Models
    Nicholas Carlini, Florian Tramer, Eric Wallace, and 9 more authors
    Dec 2020
  6. Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
    Chaoyue Liu, Libin Zhu, and Mikhail Belkin
    Feb 2020
  7. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
    Omar Khattab, and Matei Zaharia
    Apr 2020
  8. Scaling Laws for Neural Language Models
    Jared Kaplan, Sam McCandlish, Tom Henighan, and 7 more authors
    Jan 2020

2019

  1. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
    Mike Lewis, Yinhan Liu, Naman Goyal, and 5 more authors
    Oct 2019
  2. Language Models are Unsupervised Multitask Learners
    Alec Radford, Jeff Wu, Rewon Child, and 3 more authors
    Oct 2019
  3. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
    Colin Raffel, Noam Shazeer, Adam Roberts, and 6 more authors
    Oct 2019
  4. Deep Double Descent: Where Bigger Models and More Data Hurt
    Preetum Nakkiran, Gal Kaplun, Yamini Bansal, and 3 more authors
    Dec 2019

2018

  1. Deep contextualized word representations
    Matthew E. Peters, Mark Neumann, Mohit Iyyer, and 4 more authors
    Feb 2018
  2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and 1 more author
    Oct 2018
  3. Improving Language Understanding by Generative Pre-Training
    Alec Radford, and Karthik Narasimhan
    Oct 2018
  4. Gradient Descent Provably Optimizes Over-parameterized Neural Networks
    Simon S. Du, Xiyu Zhai, Barnabas Poczos, and 1 more author
    Oct 2018
  5. Reconciling modern machine learning practice and the bias-variance trade-off
    Mikhail Belkin, Daniel Hsu, Siyuan Ma, and 1 more author
    Dec 2018

2017

  1. The Implicit Bias of Gradient Descent on Separable Data
    Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, and 2 more authors
    Oct 2017
  2. Understanding Black-box Predictions via Influence Functions
    Pang Wei Koh, and Percy Liang
    Mar 2017

2016

  1. Understanding deep learning requires rethinking generalization
    Chiyuan Zhang, Samy Bengio, Moritz Hardt, and 2 more authors
    Nov 2016

2015

  1. Distilling the Knowledge in a Neural Network
    Geoffrey Hinton, Oriol Vinyals, and Jeff Dean
    Mar 2015