Transcript: Episode 8

Omar Rivasplata  0:14  
Hello, my name is Omar Rivasplata. This is Sample Space, the podcast series of the Department of Statistical Science at University College London. Today, we are happy to welcome Sam Tickle. Sam is a lecturer at the University of Bristol, and today we're going to talk about his career in statistics and his research in changepoint detection. Welcome, Sam.

Sam Tickle  0:38  
Thanks so much Omar. And yes, I had no idea this was called Sample Space. Sam on Sample Space, I'm not entirely sure what to make of that. But there we go. Thank you very much for the invite.

Omar Rivasplata  0:49  
Very appropriate name for people interested in statistics, I think, and a catchy name anyway. Welcome, Sam. [Thank you very much.] We are going to be talking about a few things related to your career journey and your current research. So would you mind telling us about your career journey? For instance, did you always know that you would like to pursue a career in statistics? How and when did you find out?

Sam Tickle  1:18  
Yes, that's an interesting question. I suppose I've always liked maths, for as long as I can remember; I've always enjoyed my numbers. And really, I've always liked mixing that with facts about the real world in some way. Mixing those two things together, I sort of fell into statistics in quite a natural fashion. I've just spent time absorbing facts from all sorts of useless places, really. I'm a mine of really useless information, I would say. And it was really towards the end of my undergraduate studies that that became the dominant thing. I realised that sitting there and proving something abstract in, say, representation theory wasn't going to be my calling in life; it was getting down and dirty with really interesting datasets that can, hopefully, someday help us change the world. I've always liked the big problems, always been attracted to the big problems. And yes, I've been fortunate in my career so far that a few big problems have coincided with my career over the last few years, not to mention COVID and the looming spectre of climate change, etc. So there's no shortage of problems. I can't claim to be offering any solutions at all, but it's certainly a ripe arena, shall we say, for the statistician to get their teeth stuck into things.

Omar Rivasplata  2:48  
And that makes complete sense. Were there any tough obstacles that you needed to overcome at that early stage of your career?

Sam Tickle  2:59  
That's a great question. I'd say that the honest answer to that is no, but that was purely because I was quite fortunate in terms of mentors who had my back. So just a quick roll call of where I was and what I did. My original undergraduate degree was at Cambridge, and from there I did the STOR-i Centre for Doctoral Training programme at Lancaster University. That environment is fantastic for fostering great statistical thinking, and also for operational research, STOR-i standing for Statistics and Operational Research with Industry. So there's a real focus there, as I say, on these kinds of hot-button problems with some application to an industry of choice. While I was there, there really wasn't any question of my needs not being met, and I'm therefore a big proponent of the CDT model for doing PhDs as a result of that. I obviously appreciate there are certain pluses to the traditional model of PhDs as well, but yes, I'm certainly a full CDT convert.

Omar Rivasplata  4:07  
That sounds great, and it's great to hear. From all accounts, the CDT model has many advantages; it fosters great careers among early researchers. So I'm totally in line with you.

Sam Tickle  4:22  
Indeed, yes. And obviously it's not just STOR-i. Here in Bristol, we have COMPASS, which is very new, it started just a few years ago, pretty much the same time as I started at Bristol, and it's going from strength to strength, and again has these great industrial links that really set it apart as a programme. And there are various other CDTs across the country in a similar vein.

Omar Rivasplata  4:44  
Let's change point a little bit, in the conversation, I mean. Tell us about your research in changepoint detection.

Sam Tickle  4:52  
Sure, yeah. So, speaking of the CDT that I was at: while I was there, I was doing a PhD concentrating on changepoint detection. My final thesis was called Change Detection for Data-Intensive Settings, and I was thinking about several particular problems that existed at the time, the time being roughly 2016 to 2019. The first problem I focused on was parallelisation of various existing changepoint detection methods. The state of play in 2016 was that you had these really rather fast changepoint detection methods for a single sequence of data, univariate data, in the offline setting, where you can see all the data in advance. That was great, and you could find the changepoints in linear time, but only in expectation. There's this great method called PELT, which is, I think, probably the most cited changepoint paper of all time. PELT stands for Pruned Exact Linear Time, and there's a successful R package, and I think now a successful Python implementation as well, in ruptures. But as I say, it's only in expectation that this thing is linear. In the worst case, it can be quadratic, which puts it at an unnecessary disadvantage: it can be as slow as some methods that have been in the literature for quite a long time. My first job was to try some subtle little tweaks to the algorithm so that you get essentially a worst-case linear cost at all times, while keeping the good theoretical guarantee that you're going to find the changepoints: you have asymptotic consistency in the locations and the number of changepoints that you find. You have to make some assumptions with regards to how the changepoints are spaced within the data, so you can't have pathological numbers of them popping up as you increase the amount of data that you collect, but subject to these assumptions, you can find the changepoints. This was joint work with my supervisors, Professors Idris Eckley and Paul Fearnhead at Lancaster, and also another former STOR-i student, Dr Kaylea Haynes. All great people, all very well published and established in the changepoint literature. So that was my first project. For my second project, as I say, I'm very much interested in data, that's the reason I became a statistician, after all, so I went looking for a fun dataset that I could apply any old changepoint model to, just to have fun for a little while. As ever, the first port of call was Kaggle, because of course it was. One dataset that I found was the Global Terrorism Database, the GTD. Now the GTD, which is copyrighted to the University of Maryland, is a fantastic open-source resource: it has essentially every terror incident since 1970 compiled within it, from everywhere around the world, and it's maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism, or START. What I thought was, okay, we can see where things are happening in the database. You've got a country location, you've even got an exact geographic location if you want it, and you've got all sorts of details with regards to what happened at the event. But I was more interested in how things evolved over time. I asked myself the question: what would happen if we were to count the number of incidents within a given period of time? You could choose a day, you could choose a week; I chose to do it in months.
So, every month from 1970 to the present day, in a given region or country, what would happen? Would we see points in time where the probability of a terrorist attack seems to change abruptly across a given country, a given region, or around the world? That was the fundamental question I asked myself, and it led to a few interesting changepoint questions, because when we have count data, we are outside the world of normal distributions, and a lot of changepoint detection theory lives in the normal distribution world. If you pick up a random changepoint paper that's, say, in the Annals of Statistics or JASA or something, chances are it's got some really nice, beautiful theory in there, and the strong chances are that theory will mostly be confined to the world of normal distributions. That was certainly true 10 years ago; it's less true now, but it was certainly true 10 years ago. It is still the case that most theory is confined to a fairly restrictive class of distributions. So that was a problem. We needed to (a) come up with a method that was able to handle count data in this way while still being efficient, there were existing methods, but we wanted to try to come up with something new; and (b) come up with some theory that would work in this setting. This was the challenge, and as I say, it was inspired totally by just looking at a dataset and saying, well, the tool doesn't seem to exist for this yet, so let's go away and have a look at it. So that was project number two, and it led to a method called SUBSET. I mentioned PELT before as an acronym for a changepoint detection method, PELT of course being linear time, or expected linear time. SUBSET is a very, very pained acronym indeed. It stands for Sparse and Ubiquitous Binary Segmentation in Efficient Time, and everybody in the audience who didn't already think I was a monumental [insert favourite colourful phrase here] now thinks that. So I'll unpick the different terms one by one. Sparse and ubiquitous, first of all. One key issue that I've kind of already touched upon is that we want to detect changes in individual countries or regions versus the whole world, and we want to be able to tell the difference between these two different types of change: a change that affects one country versus a change that affects everything at once, a global changepoint. So we want to have good statistical power to detect what we call sparse changes, changes that are sparse in the variates in the system, versus global changes, changes that affect most, if not all, of the variates in the system. Now, I used the word ubiquitous there. I think I'd just published the paper, and two days later somebody said, why didn't you use the word universal? And I paused for 10 seconds and went, I'm an idiot, I should have just used the word universal. Anyway, never mind, it's there forevermore that I've used the word ubiquitous. So yes, it's sparse and ubiquitous in the sense that we have good statistical power to detect both types of change.
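For readers who want to try the offline setting described above, here is a minimal sketch using the ruptures Python package mentioned in passing. It runs PELT on a simulated univariate series of monthly-style counts with one abrupt rate change; the cost model and penalty are illustrative assumptions, not the settings used in the work discussed here.

```python
import numpy as np
import ruptures as rpt  # Python changepoint library that implements PELT

rng = np.random.default_rng(0)

# Simulated monthly incident counts: Poisson rate 5 for 200 months, then rate 12.
counts = np.concatenate([rng.poisson(5, 200), rng.poisson(12, 100)]).astype(float)

# Offline detection with PELT and an l2 cost; the penalty is an illustrative
# BIC-style choice, not a recommendation from the work discussed above.
algo = rpt.Pelt(model="l2", min_size=6).fit(counts)
breakpoints = algo.predict(pen=3 * np.log(len(counts)))

print(breakpoints)  # estimated change locations; the last entry is always the series length
```

The returned list contains the estimated changepoint positions, with the final entry equal to the length of the series by convention.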

Omar Rivasplata  10:54  
What excites you about this line of research? What are the main open questions? What have you accomplished?

Sam Tickle  11:00  
Gosh, yeah. That's a really big question. So, what have I accomplished thus far? Rewinding slightly, back to the beginning of PhD time, when I first started tackling the problem of change detection. We were talking about CDTs before; back in my CDT days, I was thinking, first of all, about a problem in parallelisation, a parallelisation of existing changepoint techniques. Now, there's this very fast method called PELT, which stands for Pruned Exact Linear Time. Changepoint people absolutely love acronyms and I'm no exception, as I'm sure we'll get on to in a moment. This PELT method is extremely fast: it's linear time in expectation. And the word exact there essentially refers to the fact that it captures the changepoints by minimising some global cost function exactly. As for the global cost function, feel free to go away and read the paper; essentially, there's an optimisation problem that it's solving under the hood to find these changes. That's all very well, but the key there is expected linear time, and in the worst case it can still be quadratic, so that can still be a bit of a problem. Now, this is a fantastic paper; it's been cited thousands of times, I think it's probably the most heavily cited changepoint paper in the literature, and it's only about 10 years old. It was, again, a group of Lancaster people, two of whom were my supervisors; the third was Dr Rebecca Killick, Professor Rebecca Killick as of this year, she's just recently been promoted. Congratulations, Rebecca. However, as I say, in the worst case it can still be quadratic. So the first project that I and my supervisors, Professors Idris Eckley and Paul Fearnhead, were thinking about was trying to make some subtle tweaks to the algorithm to make sure that, even in the worst case, it's linear. We did this at the cost of the exactness with respect to that cost function I described a moment ago, but we still proved that we could get asymptotic consistency in the locations and the numbers of the changes under some not-too-restrictive assumptions on where the changepoints spawn, so you can't have a pathological number of changes appearing as you collect more and more data. So that worked nicely, and it was a fun theoretical exercise; that paper is now in the Journal of Computational and Graphical Statistics. The next question I started thinking about as part of my PhD, going back to this notion of data being the centre of things: I went on Kaggle, and I found a really cool dataset there. Obviously, the very first thing that one does is go on Kaggle, right? This dataset is the Global Terrorism Database. Now, the Global Terrorism Database, or the GTD, is copyrighted to the University of Maryland, and it is maintained and updated by the National Consortium for the Study of Terrorism and Responses to Terrorism, or START. It's a truly fantastic resource: you've got a compilation of every terrorist event since the beginning of 1970 to essentially the present day, and you've got every piece of information you might ever want about each bit of terror activity. I thought, this is a truly fantastic resource, you've got a massive amount of data; how can I turn it into a changepoint problem? At the time, I was constrained to have this laser focus on the problem of changes.
And so I thought to myself, well, what would happen if we counted the number of incidents, let's say per month, in each region of the world? We can partition the world into various regions according to the database, that's something we can do. We can do it by country as well, but we'll do it by region. How many incidents do we see? And therein, can we find a point in time, or points in time, for each of these regions where the probability of a terrorist attack seems to increase or decrease? And suddenly we've got ourselves a changepoint problem.
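As a concrete illustration of that counting step, the sketch below aggregates GTD-style incident records into monthly counts per region with pandas. The file name, encoding, and column names (iyear, imonth, region_txt) follow the public GTD codebook but should be treated as assumptions about a particular export rather than guaranteed details.

```python
import pandas as pd

# Load an incident-level GTD export (file name, encoding, and columns assumed).
gtd = pd.read_csv("globalterrorismdb.csv", encoding="latin-1", low_memory=False)

# A few records have imonth == 0 (month unknown); drop them before building a date.
gtd = gtd[gtd["imonth"] > 0].copy()
gtd["month"] = pd.to_datetime(
    gtd["iyear"].astype(str) + "-" + gtd["imonth"].astype(str).str.zfill(2) + "-01"
)

# One row per month, one column per region, each cell the number of incidents.
monthly_counts = (
    gtd.groupby(["month", "region_txt"])
       .size()
       .unstack(fill_value=0)
       .sort_index()
)

print(monthly_counts.shape)  # (number of months, number of regions)
```

Each column of the resulting table is a univariate count series for one region, which is exactly the kind of multivariate object the rest of the conversation is about.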

Omar Rivasplata  14:22  
Quick question, Sam. [Sure.] Had this dataset been studied by changepoint detection methods before?

Sam Tickle  14:28  
So that's a great question. It has been, but not in so many words. The principal investigators on the Global Terrorism Database are Dugan, LaFree, and I think there's a person called Miller as well. They have been writing about the GTD since it became open source in 2007, and they've released a number of fairly straightforward statistical analyses. They're not theoretical statisticians by trade, but they're very passionate about this dataset and they've had some really fantastic insights with regards to it. For example, they've zeroed in on policy changes in the UK affecting a changing terror landscape in Northern Ireland during the Troubles between 1969 and 1992. They've also commented on the bigger picture with regards to what they call the internationalisation, the globalisation, of terror between 1976 and 1979; that's, unfortunately, when terror activity became much more prevalent throughout the globe, essentially tripling in those three years. More recently they've commented on other countries, some commentary on Egypt, for example. They're putting out papers about the dataset all the time, because they're essentially on the spot for it. With regards to non-principal investigators on the GTD, there's, I think, a paper from about 10 years ago that did a day-by-day analysis on Colombia. In particular, it was looking at the activities of FARC. I think it was in 2016 that a peace agreement was signed between FARC and the government, but there was a particularly brutal conflict between FARC and the authorities, particularly in the late 90s and the early noughties. And the investigators on that paper, using a hidden Markov model type approach, actually found, to the day, the exact point where the US came in and said, right, we're going to try to tackle the drug economy in Colombia, and essentially tried to stop the activities of FARC, which is rather cool. However, we know that, unfortunately, some terror attacks are far more devastating in the number of lives they take than others; they have a far wider impact. And I think it's important that statistical analyses take this into account. This is something I've neglected. One changepoint detection method did look at this; it tried to do some anomaly detection type things in concert. But essentially, it's important if you want to capture things like 9/11, because in my analysis, 9/11 is just, oh, four incidents happened today. It doesn't capture the full magnitude of the fact that the world changed forever on the 11th of September 2001.

Omar Rivasplata  16:51  
So this is all knowledge that existed, knowledge of events that happened, right? Is it fair to say that the hope of the research, of applying these techniques to these data, was to be able to, in hindsight, predict those events in one way or another?

Sam Tickle  17:11  
I think that's a fair assessment, yeah. One thing I would quite like to think I contributed to one day is some way of looking at these data which can then be taken to policymakers, or people who are far more knowledgeable than me about the workings of country X or region Y, and say, here you go, this is the overall signal in the data that you can see now, this is the history of how the data segment; feel free to go and use this to recommend a suite of policies or, you know, a general platform for promoting peace and reducing terror going forward. That was certainly my overriding concern looking at these data. And as you say, within that you've got some notion of prediction, in terms of how a signal can evolve, whether abruptly or not. This touches on quite an important issue. The analysis that I've presented so far is, again, entirely offline. My first project was focusing on parallelisation of techniques, and that relies on the fact that you have an entire sequence that you've already seen. This new technique that I was describing, to analyse a multivariate dataset where you've got abrupt changes in terror activity, potentially in some region or across the world, again relies on you having seen 50 years' worth of data. And that's no good if you want to say, ah yes, I want to predict what's going to happen in Colombia next week, or next month. So we want to be able to be online about things. And ideally, we'd also like to be model-free. A lot of change detection methods will assume that when you change, you just change a parameter in the distribution, whether that's: I'm going to have it be a normal distribution and suddenly my mean alters abruptly; or I have a negative binomial distribution, or a Poisson distribution if I'm thinking about counts, and then suddenly one or both of my parameters, if I'm negative binomial, alter abruptly. That's all very well and good, and you can prove some nice theory with regards to how readily you can see a change depending on the extent of the change, whether it's a difference in the means, for example, in the normal distribution case. But we'd like to be a bit more general if we can. We'd like to be able to say, okay, I have one data generating process, and we're going to change to some other data generating process. I'm not saying anything about what data generating process one is doing relative to data generating process two, I'm just saying they're different. Can we detect it? And this is a much harder question than specifying what families of distributions those two data generating processes belong to. As far as I'm aware, that's a not terribly well studied problem in the changepoint literature. Most changepoint detection papers will focus, with regards to the theory, on: I have this particular type of distribution, let's see if we can do some nice theory based on this distribution; for example, let's say we've got an exponential family, or a normal distribution, or something. So I wanted to try to be a little bit bolder for my final project, which was at the very, very end of my PhD, but has also now spilled over into my current role. This is a new method that I call OMEN, and OMEN is kind of harking back to the prediction idea you mentioned before.
But yeah, this is the ominously named OMEN, which aims to be model-free, while also saying, okay, we don't want to have to say that we've seen 50 years' worth of data before we can get started on doing some analysis; we want to be able to run with the data as we collect it.
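To make the model-free idea concrete, here is a minimal sketch, not the OMEN method itself, that compares the most recent window of a stream with the window just before it using a nonparametric two-sample test (Kolmogorov-Smirnov via scipy). The window length and p-value threshold are illustrative assumptions; the point is only that nothing is assumed about either data generating process.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)

# Two different, unspecified data generating processes glued together at t = 500.
stream = np.concatenate([rng.normal(0.0, 1.0, 500), rng.exponential(1.0, 500)])

window = 100        # length of the reference and test windows
threshold = 1e-4    # p-value threshold; smaller values mean fewer false alarms

# As data arrive, compare the most recent window against the window before it.
for t in range(2 * window, len(stream) + 1):
    reference = stream[t - 2 * window: t - window]
    recent = stream[t - window: t]
    stat, p_value = ks_2samp(reference, recent)
    if p_value < threshold:
        print(f"Change flagged at time {t} (KS statistic {stat:.2f})")
        break
```

This sliding two-sample test is far from state of the art, but it shows the shape of the problem: detect that the distribution has changed without ever specifying what either distribution is.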

Omar Rivasplata  20:35  
And I suppose the "O" in OMEN stands for online?

Sam Tickle  20:38  
That's exactly right, yes. And the M and the E stand for, well, the two extremes of the word multivariate, and the N is nonparametric. As I say, we changepoint people like our bad acronyms, and I'm unfortunately no exception.

Omar Rivasplata  20:54  
And so among all these works on this dataset that you just mentioned, the Global Terrorism Database, the GTD, what was missing from these previous works? What did you think you could contribute that motivated you to work and spend time on this dataset?

Sam Tickle  21:13  
I hadn't seen a multivariate analysis of these data at any point. It was always: let's look at this as a single sequence of things. It was always with a view to doing something fairly straightforward, or something that, if not straightforward, was quite a way outside of what you might term traditional changepoint detection, using the classic basket of techniques like binary segmentation, like PELT, and so on. So I thought, well, why not try? Why not see what we can do? That was really what got me started; it was just looking at the data in a slightly different way to what people had done before. As I say, the data lend themselves quite nicely to a spatial analysis, and plenty of people have done that; there's a really nice map that compiles all the activity over a map of the world, which you can find online. It's a really good resource. But again, nobody had quite looked at it in the way that we wanted to look at it, and that was what got me started. It was that internal visualisation, if you like, of the data that I was keen to realise.

Omar Rivasplata  22:10  
By the way, it's probably a nice thing to mention, and hopefully to have in the cherry-picked version of the conversation that will go into the podcast: this research on this dataset is what you told us about in your recent talk at our departmental seminar series with the Department of Statistical Science. [Yeah.] That's a talk that attracted a lot of interest, and everyone was excited about it. Can you give us the highlights of this talk?

Sam Tickle  22:37  
You're very kind to say that it attracted interest, that's very kind. It was great to come down to UCL and be there in person; it still feels somewhat of a novelty to actually see physical people when giving a talk. I never want to go back to 2020-2021. But yes, you're quite right. I mentioned this OMEN method before, or teased at it at least: it tries to be model-free, trying not to require too-restrictive assumptions on what you're changing from and to in change detection, while maintaining a multivariate perspective on things, while also being able to detect both sparse and dense changes, so changes that affect just a small subset of the variates in the system versus most, if not all, of the variates in the system, and finally also being online. That kind of Venn diagram of doing all those things at once is a hard problem in change detection. There are a couple of methods that I can point to; before 2019, there weren't any. There's a method called gStream, which I think has an R package; it's a method by Hao Chen at UC Davis. That's a very good method, and it actually works on non-Euclidean data as well, which is of interest. Until this year, I think it was pretty much the only effort in the literature which tried to do all of these things in the Venn diagram at the same time. There are a couple more which are slowly springing up, but they tend to rely on things like deep learning or reinforcement learning, so they rely on things outside what you might call traditional, fundamentally likelihood-based, classical statistical techniques, if you like. They're starting this new age of techniques for change detection, which is very exciting, but it's kind of outside the wheelhouse in which I was originally working. OMEN is an attempt to be a little bit classical while being inside the middle of the Venn diagram at the same time. And this is what I was talking about last week when I came to UCL. Essentially, I was presenting a lot of theory with regards to why this new method works. In broad strokes, what the method does is take in some data from your potentially quite high-dimensional series and just look at it for a little while, for a period we call the learning window. Within the learning window, we build a profile for each of the variates in your system: we compute an empirical cumulative distribution function. We then use that empirical cumulative distribution function to transform all subsequent data points that we see, looking at each variate in the stream separately, using the CDF that we've created for each of the variates. We do that, and then we do a subtle adjustment to each of the numbers once we've applied the empirical CDF after the learning window, such that we get a normal nought one when there is no changepoint. I'm happy to talk through exactly how, or I think the talk from last week was possibly recorded, or, if you'd like, you can contact me and have a look at my slides. And eventually this will hopefully be a published paper.
Feel free to read it once it comes out. But you can transform this to get a normal nought one in a not too complicated way. That's only under the assumption of no change, though. If you do get a changepoint, then the result is something which is sub-Gaussian, but not necessarily normal nought one. So you can transform the problem of an arbitrary changepoint into the problem of detecting a change from a normal nought one to something which is still sub-Gaussian, but very unlikely to be normal nought one.
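The sketch below illustrates the general shape of that transformation for a single variate: learn an empirical CDF on a learning window, then push later observations through it and through the standard normal quantile function, so that under no change the transformed points are roughly standard normal. This is only an illustration under assumptions, not the OMEN implementation; in particular, the (r + 1)/(n + 2) adjustment used to stay strictly inside (0, 1) is a common convention, not necessarily the adjustment the method actually uses.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

learning_window = 500
# One variate: an arbitrary pre-change distribution, with a change at t = 1500.
stream = np.concatenate([rng.gamma(2.0, 1.5, 1500), rng.gamma(5.0, 1.5, 500)])

# Learn the empirical CDF from the learning window only.
reference = np.sort(stream[:learning_window])
n = len(reference)

def ecdf_probit(x):
    """Push a new observation through the learned empirical CDF and then the
    standard normal quantile function. The (r + 1) / (n + 2) adjustment keeps
    the value strictly inside (0, 1); the exact correction in OMEN may differ."""
    r = np.searchsorted(reference, x, side="right")  # reference points <= x
    return norm.ppf((r + 1) / (n + 2))

transformed = np.array([ecdf_probit(x) for x in stream[learning_window:]])

# Under no change the transformed values behave roughly like N(0, 1); after the
# change their distribution shifts, which a Gaussian monitoring rule can detect.
print(transformed[:1000].mean(), transformed[1000:].mean())
```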

Omar Rivasplata  25:54  
At that point, the problem essentially becomes a hypothesis test. Is that correct?

Sam Tickle  25:58  
That's right, that's exactly right, yes. Essentially, the question at that point is working out how you control the false alarm rate while making sure that you detect as many changes as you can, both sparse and dense changes, so changes that affect just a small subset of the variates or all the variates at once. And yes, I presented theory last week which showed that you can do that, depending on the appropriate setting of various quantities, and so forth. And the method remains online; it's an efficient method. I've run it again on the GTD just to make sure, and this time I was able to stratify by country and do things by day, and it was finding everything we were finding before, plus some other interesting features. Interestingly enough, 2008 was seen as a changepoint for a lot of countries; I guess the global financial crisis precipitated a subtle change in some places. We also detected some country-specific things. Just to give some examples: in 2015 there was a political crisis in Bangladesh for the first half of the year, and the method found, almost exactly, the beginning and the end of the crisis. Unfortunately, that period lent itself to numerous acts of unrest, which were classified as terror under the rules of the database. So we found things like that. Also, a feature that we found, which I was pleased with because I see it as a useful validation tool for any method, is a changepoint on the first of January 1998. The reason that's a useful validation tool for any method that wants to do changepoint detection on the GTD is that on the first of January 1998, the data collection procedure for the GTD changed, because they changed the definition of what terrorism was. By "they" I mean the START team at the University of Maryland, the people compiling the GTD; I don't mean that every single human got together and unilaterally changed the definition. Although I don't know, I was only a nipper in 1998. They fundamentally changed the definition of what a terrorist attack was, and therefore you would expect to see a changepoint everywhere. And indeed you do: if you break it down by region or by country, you see a very strange and peculiar-looking drop in terror activity in 1998, because they wanted to concentrate more narrowly on what actually counted as terror activity. Despite that drop in 1998 almost everywhere, unfortunately, to say that there has never been more terror activity than now is not quite true, as it was at its peak in the mid-2010s, but it is essentially still close to true, which is not all that good. So the need for clear-headed policymaking that's informed by years and years' worth of data collection and analysis is still very important, and it's why I still think this problem is worth thinking about.
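As a generic illustration of the "control the false alarm rate while staying online" trade-off, and emphatically not OMEN's actual decision rule, the sketch below runs a one-sided CUSUM on a stream that is standard normal while nothing changes, and calibrates its threshold by simulation under no change. The drift and threshold values are assumptions chosen for the example.

```python
import numpy as np

def cusum_alarm(z, drift=0.5, threshold=10.0):
    """One-sided CUSUM on a stream that should be N(0, 1) while nothing changes.
    Returns the first index at which the statistic exceeds the threshold, or None."""
    s = 0.0
    for t, z_t in enumerate(z):
        s = max(0.0, s + z_t - drift)  # accumulate evidence of an upward mean shift
        if s > threshold:
            return t
    return None

rng = np.random.default_rng(3)
horizon = 2000

# Crude calibration: check how often the rule raises a false alarm when the
# stream really is N(0, 1) over the whole monitoring horizon.
false_alarms = [cusum_alarm(rng.normal(size=horizon)) is not None for _ in range(200)]
print("false alarm rate:", np.mean(false_alarms))

# With an upward mean shift halfway through, the alarm fires shortly afterwards.
z = np.concatenate([rng.normal(size=1000), rng.normal(loc=1.0, size=1000)])
print("alarm raised at:", cusum_alarm(z))
```

Raising the threshold lowers the false alarm rate but delays detection, which is exactly the tension described above.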

Omar Rivasplata  28:38  
Your method, OMEN, that you just told us about, and which your research on the GTD dataset uses: the O stands for online, the M stands for multivariate, [That's right.] and I suppose the N stands for nonparametric, or sort of being model-free? [That's right.] Is it fair to say that those are the highlights of your method, and that's what your method brought in that was missing from previous works on this dataset?

Sam Tickle  29:05  
That's right, yeah. When I first came up with OMEN, which was actually, I think, very late 2018, my PhD ended in 2019, so I had a few months of scrubbing together the method and just getting the code working. Since then I've been improving it and touching up the theory, and then I got distracted by things like the pandemic and doing data analysis therein, but that's another story. Back then, in 2018, I wasn't aware of anything in the literature that did all of these things at once. That's not to say it didn't exist, just that I wasn't aware of it. Now, as I say, there are a couple of entries in the literature which do claim to live inside the middle of the Venn diagram, doing all of these things; I mentioned gStream and a few other techniques, but they're still very much the minority, and they all have interesting issues. I'm not saying that OMEN is without issues: its detection power relative to offline methods is not great. It can detect things, it will detect a change if it's sufficiently noticeable and prominent, but there's still work to do here; there's still potentially a better method out there to be found. As I say, that was the reason behind trying to do a moonshot on this.

Omar Rivasplata  30:09  
That's great, and that kind of leads us to the last set of questions that I wanted to touch upon in this conversation. And that's all great, by the way. You just mentioned there might be some unresolved problems that you, I suppose, would like to pursue on this same line of research in the future. Would you mind sharing a bit about your plans for the future, in terms of research plans? What do you hope to accomplish?

Sam Tickle  30:33  
Sure. So there's an awful lot still to do in change detection. I mean, change is the only thing that's constant, and so the only thing that's changing, really, is the nature of the problem. I think the exciting aspects of this are evolving into, well, what happens when you have measures of dependence? I haven't even thought about that with respect to OMEN. What happens when you have systems where variates drop in and out? How can you properly account for that? I talked a little bit about non-Euclidean data with respect to the gStream technique, but, you know, what about, for example, quite complicated graph-based models? There are a few researchers here at Bristol who've been thinking about graph-based changepoint detection, and they've come up with some really nice techniques, but again, how do you map that into the sphere of being online, and so forth? So there are an awful lot of really quite hot-button, exciting tasks that I could sink my teeth into. I personally am quite interested in the notion of change detection in text data, just because, although we now sensibly have some techniques that can be model-free and efficient in that way, there are obviously models that we can build that are good at analysing text and saying, yep, this is, for example, the sentiment, this is a good summary statistic for what's going on in the text at this moment in time, where your time is how far you are through a corpus or something. That's a problem I'm quite keen on. I've seen a few contributions in the literature in this regard, but they tend to stick to relatively simple change detection techniques. There's nothing wrong with that at all, but if some notion of more modern, more up-to-date change detection techniques that encompass, let's say, a wider variety of more informed model-based and non-model-based approaches, alongside some hefty thinking from the NLP people, can be brought together, I think we've got something quite exciting on our hands. This is something I was thinking about a little bit at the turn of 2021-22, before other things took over my life a little bit. But I think that would be a really cool problem, and one that I'd quite like to think about even more.

Omar Rivasplata  32:37  
Cool, that sounds great. Here's another question, and in this question I'm going to take on something that I heard you say just a moment ago. So there are some approaches to changepoint detection that are based on currently trending techniques from deep learning, deep reinforcement learning, these techniques coming from machine learning. And this is all very new and very exciting; it's making news in the media in all sorts of problems. Now, my impression of how work goes in this area is: you have a problem, and then you throw a big neural network at the problem, and if that doesn't work, you modify the parameters of the neural network until it does work, or you throw a big neural network hoping that it works. So, with the mind of a statistician, do you have any comment about this line of work? Is it likely to succeed for the problem of changepoint detection?

Sam Tickle  33:36  
That's a good question. Fundamentally, if I take off my practical hat for a moment and just think about this from a theoretical perspective: the type of signal that changepoint detection people like, and have thought about for years and years, is something which is constant, then has an abrupt change, followed by another constant stretch and another abrupt change, and so forth. Obviously, there are some people who take the much more noble approach of something more complicated, but we'll come back to that in a moment. Now, that notion of an underlying piecewise constant function, if you like, is something which is PAC learnable, and in particular you can draw up a neural network with finite Vapnik-Chervonenkis dimension which, in theory, can spit out the right function. Classical change detection, in that sense, is within reach of a neural network, for sure. Whether it's the right thing to do from a practical perspective is interesting. I have seen a few papers that attempt to do this, interestingly enough not in the discrete piecewise constant setting that I just outlined. There's a paper that focuses on pyramid recurrent neural networks and on the frequency domain of your time series; in this case, the time series in question was, I think, bees waggling, bees dancing. I'm not one to comment on data choices, I think all data are very beautiful, and there are some very nice ideas in there. At the same time, with regards to reinforcement learning, there's, I think, a really nice method that came out in either late 2021 or early this year that focused on precisely the problem of change detection in self-driving cars, employing reinforcement learning, and I think it was deep reinforcement learning as well; they had some neural architecture they were tuning. And this method is actually one of the few that I was briefly referring to before as being inside this Venn diagram of being online, being model-free, being multivariate, etc. So basically, what I'm trying to say is that all these papers are pretty recent, and so are all these thoughts about whether the type of function class we're trying to learn is PAC learnable, and so on. Obviously, we've had the concept of VC dimension for a very, very long time, but I think it's only very recently that there's been a concerted effort towards actually trying to establish why neural networks work so well. I mean, you have this theoretical gap between how well neural networks should be able to do, based on high-dimensional statistical theory, and how well they actually do; they do much better than they should. Nobody's entirely sure of the answer, I would argue, but there are various interesting theories put forward. One hypothesis is that you've got stochastic gradient descent being helped by the fact that it often starts in a useful place, which I think is a compelling theory, and there are other competing hypotheses. There's a great paper called Deep Learning: A Statistical Viewpoint, which came out last year, which I heartily recommend, and that has the sort of Vapnik-Chervonenkis result that I mentioned before. All of these contributions are very recent, and it's a very fast-moving field. So I'd say I'm excited, but still cautious.
As somebody who grew up, maybe I'm the last generation, but as a changepoint person who grew up in the era of kind of traditional hypothesis testing, ultimately I'm excited to see what the future brings, but at the same time I have some caution, some reservation. But that's only natural; I'm already an old fuddy-duddy.

Omar Rivasplata  36:45  
Thank you Sam. 

Sam Tickle  36:46  
Thank you very much.

UCL Speaker  36:48  
UCL Minds brings together the knowledge, insights and ideas of our community through a wide range of events and activities that are open to everyone.

Transcribed by https://otter.ai