Natural Language Processing

At the Language, Information, and Learning lab at Yale (LILY), we are working on the following cutting-edge research in natural language processing (NLP).

Multilingual information retrieval – We collaborate with researchers from Columbia University, the University of Maryland, the University of Edinburgh, and the University of Cambridge to build search engines that let English-speaking users query documents written in other languages, including Swahili and Tagalog. This cross-lingual information retrieval system improves our ability to understand and process low-resource languages, and it offers users reliable access to foreign-language documents.

Resources for learning NLP and AI – We aim to make dynamic research topics more accessible to the public by generating surveys of topics, discovering prerequisite relations among topics, and recommending appropriate resources based on an individual’s educational background and needs. We host a search engine, AAN (All About NLP), which is available at

Medical NLP – We work with the Center for Outcomes Research & Evaluation (CORE) at the School of Medicine at Yale, investigating the use of NLP on electronic health records. The tasks include abbreviation disambiguation, patient digital pairs, and patient history record summarization. We are interested in how to transfer general knowledge to the medical domain and how to make the most of limited high-quality data. We are also developing an NLP toolkit for the medical domain to perform tasks such as named entity recognition and relation extraction.

Semantic parsing, natural language database interfaces, and dialogue systems – The goal of this project is to allow users of any background to query relational databases directly in natural language, so that anyone can easily query and analyze vast amounts of data. You can check out our current work here. We also aim to build conversational interfaces for even more natural information access, in which the user takes part in a conversation and the system takes responsibility for choosing data sources and formulating queries.

We also work on text summarization, question answering, graph methods for NLP, natural language generation from structured data, and programming code generation.

NLP Faculty: Dragomir Radev, Robert Frank, John Lafferty