UCL Centre for Digital Humanities

Text Mining with tf-idf

04 April 2025, 2:00 pm–4:00 pm

Tf-idf is an information retrieval method to extract distinctive keywords from documents. In this interactive workshop we will explore the possibilities and limitations of the technique.

Event Information

Open to

All

Availability

Yes

Organiser

Marco Humbel, UCLDH Associate Director (ECR)

Location

Connected Environments Lab - Room 107 (First Floor)
UCL East, One Pool Street
Stratford, London
E20 2AF

How can computational methods help to distinguish between documents in large text corpora? In this workshop we will explore how to use tf-idf (term frequency-inverse document frequency), an information retrieval method for extracting distinctive keywords from documents, for text mining. The two-hour workshop will cover:

- Introduction to tf-idf
- Hands-on exercise with Google Labs Notebook
- Discussion and reflection
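
As a rough illustration of the idea behind tf-idf (not the workshop's actual notebook), the sketch below scores each term by how often it occurs in a document, weighted down by how many documents contain it. The function names `tfidf` and `top_keywords`, the whitespace tokenisation, the unsmoothed natural-log weighting and the three sample sentences are all assumptions made for this example; libraries and tutorials use several slightly different variants of the formula.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: score} dict per document (hypothetical helper)."""
    # Assumption: lowercase and split on whitespace; real projects usually
    # need more careful tokenisation.
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # term frequency * inverse document frequency (unsmoothed)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(docs, k=3):
    """Return the k highest-scoring terms per document (ties broken A-Z)."""
    return [
        [term for term, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for s in tfidf(docs)
    ]


# Made-up sample corpus, for illustration only.
docs = [
    "the archive holds letters and the diaries",
    "the archive holds maps and the atlases",
    "the museum holds coins and the medals",
]

print(top_keywords(docs, 2))
# [['diaries', 'letters'], ['atlases', 'maps'], ['coins', 'medals']]
```

Words such as "the" and "holds" occur in every sample document, so their score is zero, while words unique to one document rank highest. This also hints at a limitation: with a small or very uniform corpus, the scores say little.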

The workshop is aimed at anyone interested in using tf-idf for research in the arts and humanities. Attendees should have some familiarity with a programming language (e.g. Python) and with working with data, but no extensive experience is required. We encourage you to bring your own data to the workshop. Please bring your own laptop and ensure you have access to UCL Wi-Fi (https://www.ucl.ac.uk/isd/services/get-connected/wi-fi/uclguest).

Participation is free but registration is required: https://ucldh-textmining.eventbrite.co.uk

The workshop is facilitated by Dr Marco Humbel (UCLDH/TU Darmstadt) and Dr Jiajie Zhang (UCL DIS).

About the Speakers

Dr Marco Humbel

Dr Marco Humbel is a Research Associate for the Mixed-methods Digital Oral History project and currently serves as UCL Centre for Digital Humanities Associate Director (ECR). Previously he worked for the AHRC Towards a National Collection Sloane Lab project. His research interests include Collections as Data, Digital Infrastructures, Social Movement Archives and AI technologies for Cultural Heritage, such as Handwritten Text Recognition.
 

Dr Jiajie Zhang


Dr Jiajie Zhang is a Research Associate for the Mixed-methods Digital Oral History project at UCL Department of Information Studies. Previously, he developed knowledge graph solutions for research impact analysis at Newcastle University. His research interests encompass Semantic Web Technologies, Knowledge Graphs, Information Extraction, Natural Language Processing, and Large Language Models, with applications in digital humanities and cultural heritage.