Text Mining with tf-idf
04 April 2025, 2:00 pm–4:00 pm
Tf-idf is an information retrieval method to extract distinctive keywords from documents. In this interactive workshop we will explore the possibilities and limitations of the technique.
Event Information
Open to
- All
Availability
- Yes
Organiser
- Marco Humbel, UCLDH Associate Director (ECR)
Location
- Connected Environments Lab, Room 107 (First Floor), UCL East, One Pool Street, Stratford, London E20 2AF
How can computational methods help to distinguish between documents in large text corpora? In this workshop we will explore how to use tf-idf (term frequency-inverse document frequency) for text mining, including both the possibilities and the limitations of the technique. The two-hour workshop will cover:
- Introduction to tf-idf
- Hands-on exercise with a Google Colab notebook
- Discussion and reflection
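As a rough illustration of the idea (a sketch only, not the workshop's own notebook material), the Python below computes tf-idf weights for a small hypothetical corpus. It assumes simple whitespace tokenisation, term frequency normalised by document length, and idf = log(N / df), where N is the number of documents and df is the number of documents containing the term. This is one common variant; libraries such as scikit-learn's TfidfVectorizer use a smoothed idf by default, so their weights will differ. The function names and the example corpus are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    tf  = count of term in document / number of tokens in document
    idf = log(N / df), with N documents and df documents containing the term
    """
    # Naive tokenisation: lowercase and split on whitespace
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenised:
        df.update(set(tokens))

    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_keywords(weights, k=3):
    """Return the k highest-weighted terms, breaking ties alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical three-document corpus for illustration
corpus = [
    "the archive holds letters and diaries",
    "the museum holds paintings and sculptures",
    "the library holds books and letters",
]

for i, doc_weights in enumerate(tf_idf(corpus)):
    print(i, top_keywords(doc_weights))
# 0 ['archive', 'diaries', 'letters']
# 1 ['museum', 'paintings', 'sculptures']
# 2 ['books', 'library', 'letters']
```

Words that occur in every document ("the", "holds", "and") get an idf of log(1) = 0 and therefore a weight of zero, so the highest-ranked terms are the ones that set a document apart from the rest of the corpus. "letters", which appears in two of the three documents, is ranked below terms unique to a single document.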
The workshop is aimed at anyone interested in using tf-idf for research in the arts and humanities. Attendees should have some familiarity with a programming language (e.g. Python) and with handling data, but no extensive experience is required. We encourage participants to bring their own data to the workshop. Please bring your own laptop and make sure you can connect to UCL Wi-Fi (https://www.ucl.ac.uk/isd/services/get-connected/wi-fi/uclguest).
Participation is free but registration is required: https://ucldh-textmining.eventbrite.co.uk
The workshop is facilitated by Dr Marco Humbel (UCLDH/TU Darmstadt) and Dr Jiajie Zhang (UCL DIS).
This event is organised by UCL Centre for Digital Humanities (UCLDH), part of the UCL Institute of Advanced Studies. In 2025, UCLDH is celebrating its 15th anniversary.
UCLDH draws on UCL's world-class research strengths, especially in information studies, computer science, and the arts and humanities. It supports and coordinates work in many institutional settings throughout the university, including the library services, museums and collections. The research facilitated by UCLDH takes place at the intersection of digital technologies and the humanities. It produces applications and models that make new kinds of research possible, both in the humanities disciplines and in computer science and its applied technologies. It also studies the impact of these techniques on cultural heritage, museums, libraries, archives, and culture at large.
About the Speakers
Dr Marco Humbel
Dr Marco Humbel is a Research Associate for the Mixed-methods Digital Oral History project and currently serves as Associate Director (ECR) of the UCL Centre for Digital Humanities. Previously he worked on the AHRC Towards a National Collection Sloane Lab project. His research interests include Collections as Data, Digital Infrastructures, Social Movement Archives and AI technologies for Cultural Heritage, such as Handwritten Text Recognition.
Dr Jiajie Zhang
Dr Jiajie Zhang is a Research Associate for the Mixed-methods Digital Oral History project at UCL Department of Information Studies. Previously, he developed knowledge graph solutions for research impact analysis at Newcastle University. His research interests encompass Semantic Web Technologies, Knowledge Graphs, Information Extraction, Natural Language Processing, and Large Language Models, with applications in digital humanities and cultural heritage.
