Natural Language Processing

At the Language, Information, and Learning lab at Yale (LILY), we work on a number of cutting-edge research topics in natural language processing (NLP).
Logical reasoning (FOLIO)
Logical reasoning lies at the intersection of computer science, philosophy, and mathematics, and it is a central component of artificial intelligence. The ability of large language models to perform logical reasoning has so far not been sufficiently evaluated. In this project, we investigate existing logical-reasoning datasets and build a more complex and diverse logical reasoning dataset. We also use four methods as baselines to test the first-order logical reasoning capabilities of large-scale language models and their ability to perform neuro-symbolic reasoning: (a) fine-tuning medium-sized language models such as BERT-Large, RoBERTa-Large, and GPT-J, (b) few-shot prompting of very large models such as GPT-3 and Codex, (c) prompt-tuning with T5-Large, and (d) chain-of-thought reasoning.
Table Summarization
Table summarization aims to build a natural language interface that allows users to access information in tabular data. We assume a specific user intent, indicated either by a user-provided natural language query or by a system-provided highlighted table region (from mined insights), and we aim to explore effective methods for generating fluent, coherent, faithful, and logically entailed descriptions based on the user intent and the table. Once we achieve this aim, we seek to adapt our system to real-world application settings involving multiple tables with complex schemas and complex or ambiguous user queries. Finally, we seek to build the natural language interface by adapting our system to a dialogue setting.
Resources for learning NLP and AI (AAN)
The All About NLP (AAN) project focuses on the automatic development of educational resources and corpora. We aim to make dynamic research topics more accessible to the public by generating surveys of topics, discovering prerequisite relations among topics, and recommending appropriate resources based on a given individual’s educational background and needs. We host a search engine, AAN (All About NLP), and tools that are available online. The research component is to explore and develop novel features using cutting-edge NLP technologies. This part involves deep learning-based models, including BERT, as well as text summarization, text classification, representation learning, and information retrieval.
Dialogue summarization and multi-turn dialogue comprehension
This project focuses on dialogue summarization and multi-turn dialogue comprehension. Previously, we worked on contrastive fine-tuning for faithful dialogue summarization, evaluation of factual consistency, conversation summarization benchmarks, and structured dialogue summarization for dialogue comprehension. We proposed a new structured paradigm of dialogue summarization and investigated its effectiveness and impact on multiple important downstream tasks of dialogue comprehension. We will next build upon our earlier work to further improve pre-trained language models’ dialogue summarization and comprehension ability through better structural reasoning, knowledge integration, and graph neural network modeling approaches. We plan to incorporate various knowledge reasoning techniques into this process to provide helpful contextual information for the generation and utilization of structured dialogue summarization. 
NLP for code generation (NLP4CODE)
The NLP4Code project aims to study and develop deep learning methods for various tasks related to source code, including code generation, program synthesis, code translation, automatic program repair, and code summarization. An example application is GitHub Copilot, which is powered by Codex, a descendant of GPT-3 fine-tuned on publicly available code from GitHub. This semester, we will build a dataset that unifies previous semantic parsing datasets with different target languages (e.g., SQL, lambda-calculus). We would also like to evaluate existing code language models (e.g., Codex, CodeGen, CodeT5, InCoder, GPT-J/Neo) and their ability to generate code in different domains under different settings (fine-tuning or few-shot). In addition, we would like to systematically study the evaluation metrics for program synthesis tasks and design a more efficient metric.
Summarization evaluation
Automatic summarization has the potential to transform how we consume information. Summaries should capture the essential information present in a document or chat transcript, and automatic metrics have been developed to compare the ability of models to produce such salient summaries. However, recently introduced metrics have not been widely adopted by the research community. Conclusions drawn from metric comparisons are often not statistically significant because the evaluation datasets are small, which further inhibits the adoption of new metrics. Additionally, analysis of these metrics shows that they fail when the summarization systems being compared are close in quality, which demonstrates the brittleness of such automatic metrics. A sufficiently sized benchmark dataset for salience evaluation will allow researchers to have confidence in metric comparisons and to understand the domains and settings in which certain metrics outperform others. Therefore, we aim to collect a high-quality human evaluation dataset for summarization evaluation with respect to content coverage, and to propose more reliable human evaluation protocols and automatic evaluation metrics. In addition, we are planning to build an easy-to-use toolkit and benchmark for human and automatic summarization evaluation.
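To illustrate why small evaluation sets make system comparisons unreliable, the following is a minimal sketch of a paired bootstrap test, a common resampling technique and not necessarily the protocol used in this project. The function name `paired_bootstrap`, its parameters, and the example scores are hypothetical and chosen only for illustration.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Return the fraction of resampled evaluation sets on which system A
    has a strictly higher mean score than system B.

    scores_a[i] and scores_b[i] are assumed to be metric scores for the
    two systems on the same i-th document (paired scores).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired, one per document")
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample document indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_resamples


# Hypothetical per-document scores for two systems that are close in quality.
system_a = [0.41, 0.38, 0.52, 0.47, 0.35, 0.44]
system_b = [0.40, 0.39, 0.50, 0.45, 0.37, 0.43]
win_rate = paired_bootstrap(system_a, system_b)
print(f"A beats B in {win_rate:.1%} of resamples")
```

A win rate close to 50% suggests that the observed difference could easily be an artifact of which documents happened to be sampled. With only a handful of documents and systems of similar quality, the rate tends to stay far from the 95% level often used as a rough threshold, which is one reason a larger benchmark is needed.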
NLP for electronic health records
We work with the Center for Outcomes Research & Evaluation (CORE) at the School of Medicine at Yale, investigating the use of NLP on electronic health records (EHR). The tasks include abbreviation disambiguation, patient digital pairs, and patient history record summarization. We are interested in how to transfer general knowledge to the medical domain and how to make the best use of limited high-quality data. We are also developing an NLP toolkit for the medical domain to perform tasks such as named entity recognition and relation extraction.

Faculty working in this area:

faculty email website
Arman Cohan  
Robert Frank CLAY Lab
John Lafferty Lafferty Group