Matilda De Sa
Matilda De Sa is studying BSc Geography at UCL. She completed a UCL Social Data Institute internship with the Geographic Data Service (GeoDS) in summer 2025 and gives an account of her experience here.
Describe the day-to-day responsibilities you had during your internship.
I worked on processing historic BT telephone directories. A lot of my time was spent researching and compiling historic STD dialling codes for different cities, which I then used in Python to match against entries in the directories. This meant each row of data (name, address, and phone number) could be tied to a specific telephone exchange area, making geocoding far more accurate by avoiding false matches across the UK.
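To give a sense of what this kind of matching can look like, here is a minimal Python sketch. It is an illustration only, not the project's code: the lookup entries are a handful of example codes, the function name `match_std_code` is hypothetical, and it assumes the number in a directory row begins with its STD code. It strips punctuation from the number and picks the longest code in the lookup table that the number starts with.

```python
import re

# Illustrative lookup table (assumption): a few example STD codes.
# A real table would be compiled from historic sources, per directory year.
STD_CODES = {
    "01": "London",
    "021": "Birmingham",
    "031": "Edinburgh",
    "061": "Manchester",
}


def match_std_code(phone, lookup=STD_CODES):
    """Return (code, area) for the longest STD code prefixing `phone`, or None."""
    # Keep digits only, so "061-236 1234" becomes "0612361234".
    digits = re.sub(r"\D", "", phone)
    longest = max((len(code) for code in lookup), default=0)
    # Try the longest candidate prefix first so "061" wins over a shorter code.
    for length in range(longest, 0, -1):
        prefix = digits[:length]
        if prefix in lookup:
            return prefix, lookup[prefix]
    return None
```

Once a row carries an exchange area, a geocoder only needs to consider addresses within that area, which is what rules out same-named streets elsewhere in the country.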
I also regularly ran scripts on the CDRC servers to convert scanned directories (J2K format) into column images (PNG) and then into text using OCR. Along the way, I became more confident with shell scripting and Regex, and learned how to automate tasks so that they could handle very large datasets. A typical day often began with checking in with [GeoDS Data Engineer] Maurizio Gibin to discuss progress or problems, and then working independently on these tasks with the flexibility to ask questions whenever needed.
What project(s) were you involved in? What outcomes or deliverables were generated?
I was involved in the BT Telephone Directories project, which aims to transform nearly a century of phone books into structured, geocoded datasets. The workflow included splitting pages into columns, applying OCR to turn them into text, and parsing the data into fields of names, addresses, and phone numbers. These entries were then geocoded using STD codes and exchange areas, enabling analysis of surname distributions, the spread of telephony, and links to demographic change.
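The parsing step in that workflow can be sketched roughly as follows. This is a hedged example under a stated assumption: it supposes each OCR line looks like `name, address .... phone`, with a comma after the name and dot leaders or spaces before the number. Real directory layouts vary by year and region, and the `Entry` type, `LINE_PATTERN`, and `parse_line` are hypothetical names, not the project's.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    address: str
    phone: str


# Assumed line layout: "<name>, <address> .... <phone>"
LINE_PATTERN = re.compile(
    r"^(?P<name>[^,]+),\s*"       # everything up to the first comma
    r"(?P<address>.+?)"           # shortest address that lets the rest match
    r"[\s.]+"                     # spaces and/or dot leaders
    r"(?P<phone>\d[\d\s-]*\d)$"   # digits, spaces, hyphens, ending in a digit
)


def parse_line(line: str) -> Optional[Entry]:
    """Split one OCR line into fields; return None if it does not fit the layout."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return Entry(
        match["name"].strip(),
        match["address"].strip(),
        match["phone"].strip(),
    )
```

Returning `None` for lines that do not fit, rather than guessing, keeps malformed rows out of the geocoding step and makes them easy to count and inspect later.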
My main deliverables included creating complete STD code lookup tables and running OCR scripts on directories. In the final two weeks, I worked independently on a mini-project using a modern North London directory. I geocoded the entries, created maps and summary statistics, and compared the data against census households to assess its value as a proxy. I summarised the findings in a technical report.
What was your favourite task/responsibility during your internship? Which piece of work are you most proud of?
My highlight was completing the North London mini-project. I enjoyed taking ownership of the entire process, from running the geocoding and designing maps to interpreting the results and writing the final report. It was rewarding to experiment with different ways of visualising the data and to think critically about what the outputs revealed. That independence made the project especially fulfilling.
What did you find challenging during your internship?
Getting used to working in a Linux server environment was challenging at first. Although I had some background from my remote sensing module, learning to navigate directory structures, use tmux, and apply shell commands in more advanced ways took time. I also had to quickly pick up tools like Regex, which was completely new to me.
What software and data analysis techniques did you have the opportunity to use?
I worked with a wide range of tools. Using WSL, I connected to the CDRC servers where I ran Python scripts to match STD codes and automate parts of the workflow. I also learned about Regex for cleaning OCR text, which was essential because of errors and inconsistencies in the scanned directories. In R, I used ggplot for mapping and Nominatim as a geocoder. For OCR itself, we used Tesseract, where I learned a lot about balancing speed and accuracy through parameter choices.
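As an illustration of the kind of Regex cleaning mentioned above, the sketch below tidies a line of OCR output. It is an assumption about typical OCR noise (runs of dot leaders, irregular spacing, letters misread in place of digits), not the rules actually used in the project, and the function names are hypothetical.

```python
import re

# Characters OCR commonly confuses with digits (assumed mapping, for phone fields only).
_DIGIT_LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})


def clean_ocr_line(line: str) -> str:
    """Remove dot leaders and collapse runs of whitespace in one OCR line."""
    # Two or more leader characters in a row are layout, not content.
    line = re.sub(r"[.·•_]{2,}", " ", line)
    # Collapse any run of whitespace to a single space and trim the ends.
    return re.sub(r"\s+", " ", line).strip()


def fix_phone_digits(phone: str) -> str:
    """Replace digit look-alike letters in a string already known to be a phone number."""
    return phone.translate(_DIGIT_LOOKALIKES)
```

The look-alike substitution is deliberately limited to the phone field: applied to names or addresses, it would corrupt ordinary letters.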
Were you able to apply any skills or knowledge from your academic studies to your internship work?
Linux commands from my remote sensing course helped me feel more confident on the servers, and I noticed parallels between OCR contrast adjustments and image classification in remote sensing. From my geocomputation module, I was already familiar with GeoDS’s Virtual Mapmaker, so it was exciting to contribute to the same group.
What was it like working for your provider?
The team at GeoDS was very welcoming and supportive from the start. My induction gave me space to learn about the project without pressure to understand everything immediately, and I had the chance to read background materials before diving in. My provider was excellent at explaining problems in depth and making sure I felt supported, while also giving me plenty of independence. That balance made it easy to build both understanding and confidence.
How would you describe the overall impact of this internship on your personal or professional development?
This internship has been a really valuable experience for me. It gave me confidence in my technical skills, introduced me to new tools like Regex and Tesseract, and taught me how to think about efficiency and scale in ways that coursework does not usually demand. It also gave me insight into real-world research workflows and helped me see how I might apply these methods in my dissertation and beyond. Overall, it has left me much more confident in pursuing data-focused work after graduation.
What advice or guidance would you give to future students considering applying for the internship scheme?
My main advice is to tailor your CV and cover letter carefully to each role. Even if you are not on a data science pathway, do not let that put you off applying, as the scheme welcomes students from a range of backgrounds. There are tasks to suit everyone, and my provider was very accommodating of different strengths. I also recommend making the most of the support available by asking questions whenever you need to.