

Department of Computer Science, PhD Dissertation Defense
Student: Xin Wei
Date and Time: Jul 10, 2025 at 11:00 AM
Location: ECSB 1st floor auditorium
Committee Chair: Dr. Jian Wu
Committee Members:
Dr. Michele Weigle (Department of Computer Science, Old Dominion University)
Dr. Sarah Rajtmajer (College of Information Sciences and Technology, Penn State)
Dr. Lusi Li (Department of Computer Science, Old Dominion University)
Title: Semantic information extraction from scholarly papers and an application
Abstract: Scholarly papers are major vehicles for disseminating scientific discoveries, reporting experimental results, and communicating research ideas. Most of the information in scholarly papers is unstructured data (i.e., text), which is challenging to organize, process, and analyze with computers. Extracting semantic information from the textual content of scholarly papers and transforming it into structured or semi-structured data is a crucial pre-processing step for many downstream tasks. Traditionally, information extraction has focused on metadata and references. With advances in natural language processing and machine learning, deep learning models have been used to extract semantic information from textual content and use it as a representation of scholarly documents.
This dissertation presents a major effort to develop machine learning and deep learning models that extract semantic features from the abstracts and full text of scholarly papers, and to apply the extracted features to a downstream task in the Science of Science. Specifically, we use machine learning models to extract theory and model entities, scientific claims, and acknowledgments, which represent key semantic information but are hard to extract with heuristic methods. We highlight the technical contributions that overcome these challenges, including using distantly supervised learning and supervised contrastive learning to mitigate data scarcity, and using named entity recognition (NER) together with a filter based on linguistic features to extract acknowledgment entities. We further report our efforts to apply the extracted features to automatically assess the replicability of social and behavioral sciences papers, and to probe influential features in replicability assessment using explainable AI methods.