Statistical Science


Episode 6 Transcript

Nathan Green  0:13  
Hi, everyone, my name is Nathan Green, and this is Sample Space, the podcast from the Department of Statistical Science here at UCL. I'm very pleased to be speaking with Mine Dogucu about something a little bit different today. We're going to be talking about teaching and education in the context of statistics, her experiences, and some things that I know absolutely nothing about. So I'm really looking forward to it. So hi, Mine. Before we crack on with that, would you like to give an introduction, a bit of your own background, please?

Mine Dogucu  0:47  
So I just joined the department in July actually, I'm pretty new in the department and in the UK. I moved here from California. I was an assistant professor of teaching in the Department of Statistics at the University of California, Irvine. I was also the Vice Chair of undergraduate studies there.

Nathan Green  1:04  
Okay, so how long have you been in the department?

Mine Dogucu  1:07  
So since July. We're recording this at the end of September, so today is actually my two-month anniversary in London.

Nathan Green  1:18  
Okay. So how's it going?

Mine Dogucu  1:22  
So far, so good, I must say, I really like the teaching community in the department. So this is very unusual for me because back in California, I was the only Professor focused on teaching. And here we have a very big group of teaching lecturers and professors. So this is something very new and very exciting for me.

Nathan Green  1:43  
So what have you been working on before you came to UCL?

Mine Dogucu  1:46  
So I think most recently, most of my time was spent on Bayesian education. I was teaching a Bayesian course for undergraduates in the United States. Actually, a Bayesian course for undergraduates is not that common in the US. I know we're in a very Bayesian department. I recently worked on an education research project looking at universities in the US and how many of them offer Bayesian courses and so on, and it's not as common as one would think. So I was teaching a Bayesian course, and from that course we started writing a textbook with my collaborators at different institutions, Alicia Johnson and Miles Ott. That's the Bayes Rules book. So a good portion of my time was spent on writing that book and preparing teaching materials for a Bayesian course for an undergraduate audience.

Nathan Green  2:43  
How do they do that now? Because I remember there was a conflict over whether you should bring in the Bayesian stuff at the beginning and sort of mix it with the frequentist, or whether, as I was taught 20-odd years ago, it's the frequentist stuff first, and then if you're interested later on, you might do some Bayesian stuff. So what's your take on that?

Mine Dogucu  3:05  
I teach the same way. I used to teach senior, fourth-year students in the United States, and they used to get their frequentist training first. Some Bayesian concepts would show up in that training as well, but their first full training in Bayesian statistics would be in the last year of their degree, which is the fourth year. To be honest, I know that there's a big debate about whether we should teach Bayesian or not; it depends on who you ask. But one thing I've noticed through teaching Bayesian statistics is that it actually helps students understand frequentist statistics as well. So for instance, at the beginning of my course, as in our textbook, we cover the foundations, simulation, and approximating the posterior. But once we get to using the posterior and making inferences, like hypothesis testing and so on, every time I teach hypothesis testing from a Bayesian perspective, students actually start thinking about hypothesis testing from a frequentist perspective as well. We teach p-values over and over, and it is very hard to internalise what p-values mean and what they represent. But once students see hypothesis testing from a Bayesian perspective, and think about conditioning on the data versus conditioning on the null hypothesis being true, that's when they actually understand what they have been doing in their frequentist classes all along. It makes much more sense once they see the Bayesian perspective, because they have something to compare it with.
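
To make the "given the data versus given the null hypothesis" contrast concrete, here is a minimal sketch that is not from the episode or the book. It assumes a made-up data set (14 successes in 20 trials), a null value of 0.5, and a flat Beta(1, 1) prior; all function names are hypothetical. The p-value conditions on the null being true, while the posterior probability conditions on the observed data.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sided_p_value(successes, trials, p_null=0.5):
    """Frequentist view: P(X >= successes | null hypothesis p = p_null is true)."""
    return sum(binom_pmf(k, trials, p_null) for k in range(successes, trials + 1))


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses the identity P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(binom_pmf(j, n, x) for j in range(a, n + 1))


def posterior_prob_greater(successes, trials, threshold=0.5, prior_a=1, prior_b=1):
    """Bayesian view: P(p > threshold | data), with an assumed Beta prior."""
    post_a = prior_a + successes
    post_b = prior_b + trials - successes
    return 1 - beta_cdf_int(threshold, post_a, post_b)


# Illustrative (made-up) data: 14 successes in 20 trials.
p_val = one_sided_p_value(14, 20)
post = posterior_prob_greater(14, 20)
print(f"P(data this extreme | p = 0.5) = {p_val:.4f}")  # roughly 0.058
print(f"P(p > 0.5 | data)              = {post:.4f}")   # roughly 0.961
```

The two numbers answer different questions, which is the comparison described above: one is a statement about the data assuming the null, the other is a statement about the parameter given the data.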

Nathan Green  4:40  
That's really interesting. So it doesn't confuse them to have these two different ways of thinking about it? And is there a preference? Because, like you said, our department is very Bayesian, so we're all very biased, but it just seems a much more natural way to do things, especially in terms of the interpretations of things, you know, like credible intervals and what have you. So what do you find from the students? What's the feedback from the students?

Mine Dogucu  5:12  
So when I teach the course, and in general, I never put pressure on students. I'm not trying to raise the next generation of Bayesians, but I do want students to be literate in Bayesian statistics. They should be able to choose frequentist or Bayesian methods depending on their research questions and where they end up in life. So the goal is not necessarily to make them Bayesians. But I must confess that they do enjoy the course a lot, and they do end up saying things like, oh, frequentist statistics has been lying to us all this time. So I did get comments like that, and they do have Bayesian tendencies towards the end. But I also have students, for instance, who take 10 weeks of the course and still oppose the idea of building a prior model. So I have all sorts of ideas, and they're all welcome in the classroom, to be honest, because they all exist in the statistics community as well.

Nathan Green  6:08  
Yeah, I think that's fair, actually. Whichever side of the argument you're on, you're always going to meet both the approaches. That's definitely true. And I've noticed that sort of a more of a pragmatism more recently, you know, it's not like everything's Bayesian or frequentist, people just don't care anymore. In the same way, people aren't getting angry at conferences like they used to.

Mine Dogucu  6:33  
Yeah, that is true. And also, like, from an education perspective, I would rather have my students be informed and pick fights if they're going to pick fights than not know about the topic and pick fights. So we want them to be educated about the topic.

Nathan Green  6:50  
Yeah, I like that. Okay, so that's what you're teaching. How are you teaching it? Do you teach the intuition behind the method and then introduce the equations, and do it that way? Or do you just show the equations upfront and then pull them apart?

Mine Dogucu  7:10  
No, we actually leave the math until the end. So the teaching in the course is very much computer-assisted, especially visualisations. Visualisations really help students see what's happening to the posterior: how is the prior influencing the posterior? How is the likelihood influencing the posterior? If you think about deriving, let's say, the beta-binomial model, when students look at even the beta PDF, or the posterior beta PDF, they just see Greek letters, right? Whereas if you show a couple of beta PDFs, they can actually see how the parameters are influencing the distribution. So visualisations help them internalise that. And visualisations also help them try different scenarios: what happens when the prior is flat? What happens if it's an informative prior, or a highly informative prior? Or what happens when we have more data? The MLE might be similar, but if you collect more data, what happens to the posterior? So they actually see the influence of collecting more evidence and how that influences our conclusions. For that reason, we support the book with an R package that has different sets of functions, but the core set of functions is for making these visualisations easy.
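
The scenarios described here (flat versus informative prior, less versus more data with the same MLE) can also be sketched numerically. The following is a minimal illustration, not the book's R package: it assumes the standard conjugate beta-binomial update, and the priors, data, and function names are made up for the example.

```python
from math import sqrt


def beta_binomial_posterior(prior_a, prior_b, successes, trials):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


# Assumed scenarios: two priors, and two data sets that share the same MLE (0.7).
priors = {"flat": (1, 1), "informative": (20, 20)}
datasets = {"10 trials": (7, 10), "100 trials": (70, 100)}

summary = {}
for prior_name, (a0, b0) in priors.items():
    for data_name, (y, n) in datasets.items():
        a, b = beta_binomial_posterior(a0, b0, y, n)
        summary[(prior_name, data_name)] = (a, b, beta_mean(a, b), beta_sd(a, b))

for (prior_name, data_name), (a, b, mean, sd) in summary.items():
    print(f"{prior_name:12s} {data_name:10s} Beta({a}, {b})  mean={mean:.3f}  sd={sd:.3f}")
```

With the informative prior centred on 0.5, the posterior mean stays closer to 0.5 when there are only 10 trials and moves towards the MLE of 0.7 with 100 trials, while the posterior standard deviation shrinks as more data are collected.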

Nathan Green  8:41  
Yeah, that's a good idea. It's the same material in the Bayes Rules book as it is in the lectures. And then that's also linked with the package?

Mine Dogucu  8:52  
That is correct. Because my co-authors were also teaching similar courses at their institutions, we wrote the book based on our course notes, iterated, changed things in our courses and in the book, and so on. So it was a continuous project, intertwining the book and the courses.

Nathan Green  9:12  
Yeah, nice. And it's an online book, right? So it's like a live document.

Mine Dogucu  9:16  
That is absolutely true. It's bayesrulesbook.com. And when we first started writing the book, before even reaching out to publishers with proposals, that was our first criterion: we would make this book accessible online. Thankfully our publisher, CRC, was fine with that, and we made it open access.

Nathan Green  9:38  
Great. Okay. Well, I think I need to read it. Okay, so can you explain to me what accessibility is in education, please?

Mine Dogucu  9:51  
Absolutely. So we just talked about having an open access book, for instance. The book is actually read by hundreds of users every day. Well, I'm not sure how much it's read, but at least the website is visited by hundreds of users every day, sometimes thousands, depending on some social media posts. And if you think about it, these users come from different parts of the world, and I'm not sure that, if we just had a physical book, it would have made it everywhere in the world. So part of accessibility is actually lowering costs, which are a big barrier in education, because textbooks cost a lot. That means making open access resources, and not just books but other teaching materials as well: our slides, handouts, or worksheets can also be open access. So I try to do as much as possible on that in my own courses. But as we were writing the Bayes Rules book, we also learned a little bit about accessibility from a different perspective, which is visual access. For colourblind people, that means making sure that the plots are accessible, because we use different colours for the prior, likelihood, and posterior, and they should be visible to someone with colour blindness. And something exciting also happened during this book: we were learning about how to make it accessible for people with visual impairments or blind people. We wanted to provide descriptions of figures, so that people who use screen readers can actually read the plots and make sense of them. We wrote the book using the R package bookdown, but bookdown did not actually support what are called alternative texts for figures. Back then, neither the R Markdown nor the knitr R packages supported alternative text. We requested it from the team who develops R Markdown, and they actually started developing this. So now R Markdown has alternative text available, and the new generation of R Markdown, which is Quarto, also supports alternative text.
So we were able to write our book with alternate texts for figures, which was exciting for us.

Nathan Green  12:15  
Yeah, sounds great. How does that practically work? Do you write the description and then record an audio file which is attached to the book? Or how does that work?

Mine Dogucu  12:27  
Good question. So alternative text is actually available in many kinds of software. This is not something only statisticians should consider; it applies even if you do PowerPoint presentations, even if you post on Twitter. But the way it works is like on websites. You may be thinking about a print book, but I should clarify that I'm talking about the online book. For somebody who uses a screen reader, the screen reader can read every piece of text on there. But when it comes to a plot, if there is no alternative text attached to the plot, the screen reader would just read it as "plot" or "plot.jpg" or something. So we have to provide, on the back end, some description for the plots. So it's not an audio file, it's actually a written part of the document. For those who use R Markdown, it's a chunk option. But it's really an HTML feature: in HTML, there is an image alternative text, and this is supported on many, many platforms.
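
As a small sketch of the HTML mechanism described here: the description is stored in the `alt` attribute of the image tag, which a screen reader reads in place of the picture. The helper name `img_tag` and the file name are hypothetical; in knitr-based R Markdown the corresponding chunk option is `fig.alt`, but the example below only builds the HTML directly.

```python
from html import escape


def img_tag(src, alt_text):
    """Build an HTML <img> tag whose alt attribute carries the written description.

    Both values are escaped so quotes in the description cannot break the attribute.
    """
    return f'<img src="{escape(src, quote=True)}" alt="{escape(alt_text, quote=True)}">'


# Hypothetical figure and description, for illustration only.
tag = img_tag(
    "beta-prior.png",
    "Density curve of a Beta(2, 2) prior, symmetric and peaking at 0.5.",
)
print(tag)
```

Without the `alt` attribute, a screen reader has only the file name to announce, which is the "plot.jpg" situation described above.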

Nathan Green  13:33  
I suppose I could use that in journal papers, you know, like it's applicable to everything.

Mine Dogucu  13:39  
So you would think that would be the case, Nathan. I once wrote a manuscript about this. Actually, it wasn't a manuscript, it was more like a piece for a professional organisation's magazine, and I was talking about alternative text in my writing. Unfortunately, they publish their magazine in PDF, and PDF is one of the least accessible formats, whereas HTML is much more accessible. I insisted that they publish it in HTML. This publication has been around for at least 50 years, and I'm afraid mine was the first HTML publication that they did, and that was only because I insisted that they include alternative text. Getting journals to publish your alternative text is a very steep battle, so I hope that professional organisations and journals take this on. I think one organisation that does this well, or better, should I say, not best, is the ACM, the Association for Computing Machinery. It has a good accessibility group with guidelines for journals and conferences on taking on accessibility.

Nathan Green
So if I get a paper published, I should always try it. I feel that we should sort of collectively put pressure on. And what happens to the equations?

Mine Dogucu
So for equations, using MathJax usually helps with screen readers. But I'm not 100% sure about this, so maybe I shouldn't say. It is very difficult, though, because it all has to be there after all, and getting subscripts and superscripts all where they should be is a hard task. That definitely becomes challenging, yes.

Nathan Green  15:33  
Yeah. Okay. That's a work in progress. So you had a win with pkgdown, or bookdown, sorry. So if anyone wants to do it in bookdown, now they can?

Mine Dogucu  15:47  
They can, in R Markdown supported packages or Quarto. So bookdown, R Markdown, Quarto: all of these make it possible to write alternative text.

Nathan Green  15:58  
Okay, cool. I'm gonna try that as well.

Mine Dogucu  16:00  
So this also made me consider why it took me many years after coming out of my PhD to figure all these things out, even though I have always had visual components in my work. And so I started questioning things like R Markdown: why did it take so many years to support this? I think one of the reasons is that in statistics programmes we don't actually learn about accessibility. Accessibility is relevant to any field, really, but there's also accessibility specifically for statistics and data science, like how do we verbalise plots, for instance. So this made me question, and there is a very good group of industry and education partners called Teach Access that tries to close this gap. They try to help instructors teach accessibility to their students, and they give grants for this. I was lucky enough to get a grant from them, and I started teaching my own students accessibility as part of their data science courses. Of course, I have a lot to do on this, and I cannot say that it's in great shape right now. But I think we have to consider our curricular design, because if we don't teach it, we're not going to know about it. Maybe by chance we might hear about it on a podcast or something.

Nathan Green  17:19  
Exactly. I mean, I've never heard of it. You know, I'm old. So is there like a, not quite a template, but is there like a format that you can use to describe a certain type of plot, for example? You know, is there a certain language lexicon that you'd use for that? Has it been standardised?

Mine Dogucu  17:38  
Good question. So there are two resources I can recommend. First, there is an automatic way to do this: the BrailleR package in R, which is aimed at writing automated alt text for figures. People with visual impairments also use it. But the thing is, it only supports a few plots, like histograms, bar plots, and so on. If you think about scatter plots, they're very hard to describe, because there are trends that we are trying to capture. The same goes for trace plots in Bayesian statistics, for instance. After all, these are automated: what the package is trying to do is read the height of the bar, read what's on the x-axis and the y-axis, and so on. But with trace plots, you're not going to read every point on the trace plot automatically, right? So they become harder to manage. But there are people working in this area, and there are also people from the computer science side who are trying to build software that can automate this process. In terms of writing manually, there are some guidelines too. One of them is by Amy Cesal. If you actually type it into Google, something like "writing alternative text for data plots", I bet hers would be the first one to pop up. It's a Medium post. 
And she does talk about what to include. The most important parts are the units, the x-axis and y-axis, and what the plot is, like whether it's a histogram or a scatter plot, but most importantly what it is telling, what the message in that plot is. And a usual rule of thumb for accessibility is that if you include a visualisation, it is better if you can also include your data in raw format somewhere, in a GitHub repo or something, because a visually impaired user can then interact with your data if they want to find out more about it, as with the original data source. There is also a very good website for this. Even though I'm a hardcore R user, I always use R for my work and my teaching, SAS is actually pretty good with accessibility, and their accessibility website has very good examples on interacting with data and plots.
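
The elements listed here (the type of plot, the axes with their units, and above all the message) can be treated as a simple template. The sketch below is only an illustration of that idea: the function name, the sentence structure, and the example plot are assumptions, not a standard and not taken from the guidelines mentioned.

```python
def compose_alt_text(chart_type, x_axis, y_axis, message):
    """Assemble alt text from the plot type, both axes (with units), and the key message."""
    return (
        f"{chart_type} with {x_axis} on the x-axis and {y_axis} on the y-axis. "
        f"{message}"
    )


# Hypothetical plot description, for illustration only.
alt = compose_alt_text(
    chart_type="Scatter plot",
    x_axis="study time (hours per week)",
    y_axis="exam score (percent)",
    message="Scores tend to increase with study time.",
)
print(alt)
```

A template like this only covers the structural parts; the message still has to be written by someone who understands what the plot is meant to show.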

Nathan Green  20:03  
Great. Okay, well, I'll make sure to include those in the text that goes along with this podcast, so people can check all that out. Including the raw data is one way of helping reproducibility, right? So the last point I have here is reproducible teaching. What do you mean by that? It's the kind of thing that sounds sort of modern, and that I've been left behind with.

Mine Dogucu  20:28  
So in statistics education, when we think about software, we tend to focus on two things. One is the statistical software we use for teaching statistics, and the other one is for doing statistics. By teaching, what I mean is what the students are learning as students. This used to be Minitab, let's say, and now maybe we're teaching R. But we wrote a preprint in which we argue that there's a third dimension, and that's what the instructor uses themselves for teaching. So for instance, I could be teaching Minitab, let's say (I wouldn't be), but I could be using R Markdown to prepare my slides. So there's a third dimension that gets into the picture. We always argue for reproducible research when doing research, so we want to take this from a teaching perspective. And this is very important, because we try to teach many students reproducibility, but not every student gets to interact with us from a research perspective. I know that here they do have research projects, but at least in the United States it's not as common for every student to do a research project. So reproducible teaching in the classroom gives students a chance to see their professor or lecturer actually using these tools. For instance, when I teach, I use Git and GitHub for my course pages, and I use R Markdown and I teach R Markdown. So a student sees me using R Markdown, and they have access to my slides. It's open source, open access. So if they see something that I have done in my slides that I'm not necessarily teaching them how to do, they can go to the source code and learn it if they want to. 
And so basically, I'm teaching them the tool, but I'm also using the tool to teach them. Everything we talk about in reproducible research, like having version control, having access to data, and having literate programming in research, holds true for teaching as well.

Nathan Green  22:43  
Excellent. I mean, with everyone that I work with, right at the start I always say let's work through GitHub. And more often than not, they don't have an account. They've never used it before. So if you're teaching students to be comfortable with using those, that's only a good thing. What about things like GitHub Pages? Do you use those features?

Mine Dogucu  23:06  
Yes, I do use it myself, because my course websites are hosted on GitHub Pages. Some of them actually have their own domain names. But for classroom use, unfortunately, it is not easy with private repos, because for my courses I have a GitHub organisation for each course, and it's not possible with private repos. Student work has to be kept private, so I had a hard time working around that the last time I was teaching this course.

Nathan Green  23:39  
I remember seeing, you know, Jenny Bryan, at RStudio. I remember seeing some of her stuff, which she uses for teaching R. And I was really impressed by how it was integrating things like project marking and assigning students into groups. And it was all done via GitHub.

Mine Dogucu  23:59  
Actually, we also have an R package that I co-developed with two PhD students back at UC Irvine, basically because of grading. We do a lot of project work, and opening these projects one by one takes time. So we have this package, gradetools, with which we can open a student's project and grade it there. We create the rubric, grade each project, automatically save that, and move on to the next project. So even though it seems overwhelming to teach Git and GitHub, there are many resources out there that actually make the process much easier.

Nathan Green  24:41  
I've got quite a long list of things to do after this. That's a good sign, I think. Okay, so that's really all I wanted to talk to you about today. Like I said, it's been absolutely fascinating. And I'll probably want to talk to you some more in the future if that's Okay?

Mine Dogucu  25:00  
I would love to.

Nathan Green  25:01  
Yeah, fantastic. So everyone read the Bayes Rules book, and not necessarily to be a Bayesian statistician. I'd like to finish by thanking you for talking to us today. And hopefully I will see you soon around the department.

Mine Dogucu  25:17  
Thank you for having me.

Unknown Speaker  25:20  
UCL Minds brings together the knowledge, insights and ideas of our community through a wide range of events and activities that are open to everyone.