CS Talk - Petr Knoth, Knowledge Media Institute

Details
Date: October 13, 2025
Time: 11:00 AM - 12:00 PM
Location: DL 316, 10 Hillhouse Avenue
Title: COnnecting REpositories (CORE - core.ac.uk) - an open scholarly infrastructure by researchers for researchers
Abstract:
CORE (Connecting Repositories) is an open scholarly infrastructure that indexes millions of open access research papers and metadata from repositories and journals worldwide. Its goal is to improve the discoverability and reuse of research outputs and support machine access to scholarly content in line with open access and open science principles. This talk will provide an overview of CORE and its services for repositories, including compliance monitoring, metadata validation, and tools to improve interoperability and discoverability. It will also present research by the Big Scientific Data and Text Analytics Group (BSDTAG), showcasing recent innovations such as CORE-GPT, a system for trustworthy question answering over scholarly literature; SDG: Classify, which maps research papers to UN Sustainable Development Goals; and SoFAIR, which addresses reproducibility and research software management. Finally, the talk will discuss how CORE enables external research and innovation in areas such as training large language models, plagiarism detection, library discovery, and the construction of scholarly graphs, fostering a globally connected and machine-readable open research ecosystem.
Bio:
Petr Knoth is Professor of Data Science at the Knowledge Media Institute, The Open University, where he leads the Big Scientific Data and Text Analytics Group (BSDTAG). His research focuses on large-scale machine processing of scientific information and the development of open scholarly infrastructures that make research outputs more discoverable, accessible, and reusable by humans and machines alike. He is the Founder and Head of CORE, a leading not-for-profit scholarly infrastructure service for open access research papers. CORE indexes content from thousands of repositories and journals worldwide and serves millions of monthly active users. It plays a vital role in supporting open science by enabling free discovery, access, and large-scale text and data mining of research. Throughout his career, Petr has been a strong advocate for open infrastructures in scholarly communication, collaborating with enterprises, funders, and not-for-profit organisations to support diverse use cases, from research assessment and compliance monitoring to AI-driven discovery and policy-making. He has served as Principal Investigator or researcher on more than 25 projects in Open Science, NLP, and AI with European Commission, national, and international funding.
Computer Science
Hosted by: Ruzica Piskac