Text Mining with tf-idf
04 April 2025, 2:00 pm–4:00 pm

Tf-idf is an information retrieval method to extract distinctive keywords from documents. In this interactive workshop we will explore the possibilities and limitations of the technique.
Event Information
Open to
- All
Availability
- Yes
Organiser
- Marco Humbel, UCLDH Associate Director (ECR)
Location
- Connected Environments Lab - Room 107 (First Floor), UCL East, One Pool Street, Stratford, London E20 2AF
How can computational methods help to distinguish between documents in large text corpora? In this workshop we will explore how to use tf-idf (term frequency-inverse document frequency) for text mining. The two-hour workshop will cover:
- Introduction to tf-idf
- Hands-on exercise with Google Labs Notebook
- Discussion and reflection
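For readers new to the technique, the following is a minimal sketch of how tf-idf scores can be computed in plain Python. It is not the workshop material: the three sample sentences, the function names `tf_idf` and `top_terms`, and the whitespace tokenisation are illustrative assumptions. It uses one common unsmoothed variant, in which the term frequency is a term's share of the tokens in a document and the inverse document frequency is log(N / df); libraries often apply smoothed or normalised variants instead.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(N / df), with N documents in total and df documents
          containing the term (unsmoothed variant, chosen for clarity)
    """
    # Naive tokenisation: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


def top_terms(doc_scores, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(doc_scores, key=lambda t: (-doc_scores[t], t))[:k]


# Hypothetical mini-corpus, invented for illustration only.
docs = [
    "the archive holds letters and diaries",
    "the archive holds maps and charts",
    "the museum holds coins and medals",
]

scores = tf_idf(docs)
for i, doc_scores in enumerate(scores):
    print(i, top_terms(doc_scores))
```

Words that occur in every document (here "the", "holds" and "and") receive a score of zero, while words unique to a single document rank highest. This illustrates both the appeal of the method and one of its limitations: the scores depend entirely on which documents make up the corpus.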
The workshop is aimed at anyone interested in using tf-idf for research in the arts and humanities. Attendees should have some familiarity with programming languages (e.g. Python) and working with data, but no extensive experience is required. We encourage you to bring your own data to the workshop. Please bring your own laptop and ensure you have access to UCL Wifi (https://www.ucl.ac.uk/isd/services/get-connected/wi-fi/uclguest).
Participation is free but registration is required: https://ucldh-textmining.eventbrite.co.uk
The workshop is facilitated by Dr Marco Humbel (UCLDH/TU Darmstadt) and Dr Jiajie Zhang (UCL DIS).
About the Speakers
Dr Marco Humbel
Dr Marco Humbel is a Research Associate for the Mixed-methods Digital Oral History project and currently serves as Associate Director (ECR) of the UCL Centre for Digital Humanities. Previously he worked for the AHRC Towards a National Collection Sloane Lab project. His research interests include Collections as Data, Digital Infrastructures, Social Movement Archives and AI technologies for Cultural Heritage, such as Handwritten Text Recognition.
Dr Jiajie Zhang
Dr Jiajie Zhang is a Research Associate for the Mixed-methods Digital Oral History project at UCL Department of Information Studies. Previously, he developed knowledge graph solutions for research impact analysis at Newcastle University. His research interests encompass Semantic Web Technologies, Knowledge Graphs, Information Extraction, Natural Language Processing, and Large Language Models, with applications in digital humanities and cultural heritage.