
Transcript Episode 9

UCL Minds

UCL Minds brings together the knowledge, insights and ideas of our community through a wide range of events and activities that are open to everyone.

 

Nicolás Hernández (NH) 

Hello, everyone, and welcome to a new edition of Sample Space, the podcast of the Statistical Science department here at UCL. My name is Nicolás Hernández, and today we have the honour and pleasure of talking with Kevin Murphy. Kevin, thank you for joining us.

 

Kevin Murphy (KM) 

Thank you for the invite.  

 

Nicolás Hernández (NH) 

As you probably know, Kevin doesn't need much of an introduction, to be honest. He's an excellent researcher who has been at several of the top universities in the world: Cambridge, Berkeley, MIT. Now you're leading a research group at Google DeepMind. And, of course, you are the author of one of the most, let's say, well-known books in machine learning. What about starting at the beginning? So, how did you decide to go for a PhD, and did you have someone who inspired you?

 

Kevin Murphy (KM) 

I guess my origin story was as a teenager. I grew up in England, and I read Gödel, Escher, Bach by Douglas Hofstadter, which — I don't remember when it came out, in the 80s, I think — won a Pulitzer Prize. And it's a very strange book that mixes discussion of Gödel's theorem, Escher's art and Bach's music. And he talks a lot about AI and consciousness and computability, and stuff like that. Anyway, that kind of got me intrigued about AI and machine learning. And I studied computer science as an undergrad at Cambridge, and wanted to do more in the machine learning and AI space. And I got a scholarship to go to the US — as a kid, I'd always sort of dreamed of going to America. And I was lucky to get a scholarship to go to UPenn, where I did my masters. At Penn they were, and still are, very strong in computational linguistics, and I took some classes in that, but it didn't really grab me. But I ended up doing a thesis related to computational biology, in sequence modelling. So my very first paper is on using finite automata to do approximate string matching for DNA sequences. That was a long time ago. And then I decided to stay on for a PhD, and moved to California. I liked living in the States, and I liked doing research. So that's why I started a PhD: to keep living the student life and expanding my mind.
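(Editor's note: for readers curious what approximate string matching for DNA looks like in practice, here is a minimal Python sketch using the classic semi-global dynamic programme, one standard way to do it. The function name and the mismatch threshold k are illustrative, and this is not the automaton-based method from Murphy's paper.)

    def approx_matches(pattern, text, k):
        """Report end positions in text where pattern matches with edit distance <= k."""
        m = len(pattern)
        prev = list(range(m + 1))          # distances against the empty text prefix
        hits = []
        for j, c in enumerate(text, start=1):
            curr = [0]                     # a match may start anywhere in text
            for i in range(1, m + 1):
                cost = 0 if pattern[i - 1] == c else 1
                curr.append(min(prev[i] + 1,          # delete from pattern
                                curr[i - 1] + 1,      # insert into pattern
                                prev[i - 1] + cost))  # substitute or match
            if curr[m] <= k:
                hits.append((j, curr[m]))  # (end position in text, edit distance)
            prev = curr
        return hits

    print(approx_matches("ACGT", "TTACGTAAAGT", 1))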

 

Nicolás Hernández (NH) 

A good combination.  

 

Kevin Murphy (KM) 

Yeah, yeah.  

 

 

 

Nicolás Hernández (NH) 

And then just a follow-up. So you mentioned the 80s, and you mentioned AI. So I wonder, what was AI in the 80s?

 

Kevin Murphy (KM) 

Well, I mean, I was just a kid then. But I think at the time, you know, expert systems were the dominant paradigm. And then when I started grad school — by the time I got to Berkeley, when would that have been, like '96? — I was very interested in graphical models. In fact, one of the main textbooks in the area is called Probabilistic Expert Systems, and it's by a bunch of British statisticians, actually. Well, Steffen Lauritzen — I don't think he's British, but I think he's a professor at Oxford — and Cowell, and I forget the other authors. Anyway, Dawid, I think, is an author, who's a retired UCL professor. So they very much interpreted those models as using expert structure, analogous to what expert systems used, but then also incorporating data. So you could update the parameters of the model given data, and you could do inference over unknown quantities and make predictions and so on. So as a structured probability model, that was, I think, the beginning of the transition from a purely manually designed model to a purely data-driven model. And it's a sort of interesting hybrid, which was, you know, popular for a while and still is used in some cases. That's what I did my thesis on — that model family. These days, you know, there's much more emphasis on data: when you have gigantic amounts of it, the form of the model matters less, depending on the questions that you're answering. But in limited-data settings, you need to think more carefully about your model.

Nicolás Hernández (NH)

Exactly. That's somehow related to one of the questions you got at the end of your talk.

Kevin Murphy (KM)

Yeah. Maybe people listening to the podcast won't have seen the talk, but I was giving some big-picture talk about, like, all of machine learning from a decision theory point of view, and how you can think of predictive modelling and unsupervised learning and generative AI as different elements. They're not just variations on a theme, but they're related in certain ways; they're solving different tasks with different modelling assumptions. So I think certain tasks require more care and thought about the form of the model. And that largely depends both on the task you're trying to solve and on the amount of data that you have. I would say any questions related to causality fundamentally require a model-based approach, because you could have infinite data and still fail to infer the true causal effects, if you don't account for, you know, confounding factors that might not be observed. And that, you know, is actually fundamental to many problem settings, but not all, right? If you just want to generate a dialogue agent that is entertaining, it doesn't have to be true, and it's not clear what role causality has there, right? But you can ask questions like, you know, if I took this drug, would it cure disease X? And people are using these models for such questions. And it's not clear we can trust their outputs, and there's no reason why we should, because they're not really model-based approaches; they're data-driven approaches, where the model is just a compression of the data. It doesn't have any explicit modelling of the underlying reality. Now, there are lots of claims that these large models do implicitly learn models of the world. And I think there's some evidence that they do — otherwise they wouldn't be as effective as they are.
But it's almost an artefact of the objective — the training objective — which is a prediction or compression objective. So they do discover something about the world, but in a rather opaque way that's maybe not very easy to leverage for planning purposes or causal reasoning purposes. But there's no doubt that these systems are learning something about the world.
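(Editor's note: as a concrete illustration of the structured probability models Murphy describes — expert-chosen graph structure, with inference over an unknown quantity given evidence — here is a minimal Python sketch of a two-cause Bayes net queried by exact enumeration. The network and all the numbers are invented for the example.)

    from itertools import product

    # Expert-specified structure: Rain -> WetGrass <- Sprinkler
    P_rain      = {True: 0.2, False: 0.8}
    P_sprinkler = {True: 0.1, False: 0.9}

    def p_wet(wet, rain, sprinkler):
        # Conditional probability table for WetGrass given its parents
        p = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.90, (False, False): 0.01}[(rain, sprinkler)]
        return p if wet else 1.0 - p

    def p_rain_given_wet():
        # Inference by enumeration: P(Rain=T | Wet=T), summing out Sprinkler
        joint = lambda r, s: P_rain[r] * P_sprinkler[s] * p_wet(True, r, s)
        num = sum(joint(True, s) for s in (True, False))
        den = sum(joint(r, s) for r, s in product((True, False), repeat=2))
        return num / den

    print(p_rain_given_wet())   # roughly 0.70: observing wet grass raises belief in rain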

 

 

Nicolás Hernández (NH) 

Also, by the way, your talk is going to be uploaded to our YouTube channel. So again, everyone is going to be able to listen both to your talk and to the podcast. So, during your answer, you mentioned the probabilistic perspective. Your book, Machine Learning: A Probabilistic Perspective, has more than, I think, 5,000 citations, and it won the De Groot Prize, if I'm correct. So what inspired you to write this book, and, basically, why a probabilistic perspective?

 

Kevin Murphy (KM) 

Well, I didn't want to use the word Bayesian, because that would put a lot of people off and cut my sales. And, you know, many of the approaches discussed in that book are not fully Bayesian. I mean, the majority of approaches in machine learning are just minimising a loss function, and often that's a log likelihood, and therefore they're just doing maximum likelihood estimation. So there's really nothing Bayesian about that. And most machine learning people don't care about uncertainty modelling, so Bayes has nothing to do with it. But the models that they're fitting are often probabilistic models. And back then, that was not the dominant paradigm. Certainly for supervised learning, the dominant paradigm was thinking about function approximation, where there's an x-y mapping, there's a unique y for every x, and the goal is to predict that y. And it's pretty clear that, in general, that is not sufficient, because there are going to be multiple possible y's for any given x — multiple possible outputs — because there might be ambiguity about the input, you know, maybe an ill-posed question. And I think now, in the 2020s, this is the dominant paradigm. Everyone's doing probabilistic modelling, because transformers are probabilistic models over sequences. And it's sort of obvious that, when predicting the next word, there is no unique next word. There's a distribution over words I might say next, that's more or less entropic, and I sample from that distribution. And that's how language models work. And, you know, image generation, same thing, right? I type in a prompt — a pretty cat sitting on a flowerpot — and there are many, many possible images that, in some sense, are consistent with that prompt. So it's a one-to-many mapping, and obviously you need a probabilistic method or model to represent that diversity and uncertainty. So now that's sort of a vacuous statement. I mean, I don't want to say all machine learning is probabilistic, but I would say the dominant paradigm now is probabilistic, even though it did not used to be. And the dominant paradigm is now generative, even though it didn't used to be. So yeah, it's sort of vacuous now. A footnote on that: Yann LeCun has been doing a lot of interesting work on energy-based models for many years. And he is violently opposed to thinking of them in terms of probability, of probabilistic modelling. He says that's an unnecessary restriction, because it requires that you have a normalised distribution, and the normalisation constant is very difficult to compute. And if you give up on that, then it sort of liberates you to try more exotic models. And, you know, I think there's some merit to that argument. You could also argue that the approximations he comes up with are just approximations to the partition function, and he's, you know, being approximately Bayesian anyway. So then it gets down into the details of specific models and specific inference techniques and approximations, and you can quibble about whether they're probabilistic or not, and it doesn't so much matter. But I think the key point is that you want to have distributions over things, whether they're normalised or not, because the world is unpredictable and diverse.
And I think thinking of it as a one-to-one function mapping, or many-to-one, which is sort of the old-school approach, is obviously not sufficient.
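(Editor's note: the one-to-many point above is easy to see in code. Below is a minimal sketch of how a language model turns one prefix into a distribution over next tokens and then samples from it. The vocabulary and the logits are made up for illustration.)

    import numpy as np

    def sample_next_token(logits, temperature=1.0, seed=0):
        # Softmax turns scores into a *distribution*: there is no unique next word
        z = np.asarray(logits, dtype=float) / temperature
        z -= z.max()                          # subtract max for numerical stability
        probs = np.exp(z) / np.exp(z).sum()
        rng = np.random.default_rng(seed)
        return rng.choice(len(probs), p=probs), probs

    # Made-up logits for the word after "a pretty cat sitting on a ..."
    vocab = ["mat", "flowerpot", "roof", "keyboard"]
    token, probs = sample_next_token([2.0, 1.8, 0.5, -1.0])
    print(vocab[token], probs.round(2))       # different seeds yield different words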

 

Nicolás Hernández (NH) 

Right, right. Thank you for that; that's a very thorough answer. So, if we go back to your career path: after being an associate professor at the University of British Columbia, you took a position at Google DeepMind. So will you briefly describe your journey from academia to industry, and what motivated the transition?

 

Kevin Murphy (KM) 

Yeah, so this was 2012. I had just got tenure, and I had my sabbatical. And I thought, you know, I did my graduate work in California at Berkeley, and I loved the Bay Area, and I thought it'd be nice to go there for my sabbatical. And I had a lot of friends at Google, and they were all raving about it, and I thought, well, it'd be interesting to see what the fuss is about. So I spent my first six months actually at Stanford, finishing my book, and then the second six months at Google. And indeed, it was a lot of fun. But I thought six months wasn't enough, so I asked for a one-year leave of absence from UBC, so I could dive deeper into stuff at Google. And I applied for a full-time position, and got that offer. And then I had to make a hard decision about whether to stay or return to academia. And it was a difficult decision, because I was very happy at UBC. But I felt like the promise of Google, especially in terms of machine learning, hadn't been fully realised, and it was going to be a big thing. This was 2012 — deep learning was just taking off, right? I was kind of late to the party on that. I was in a meeting very early on in 2012 with Jeff Dean, Tom Dean, who was my host, and Andrew Ng. And we were talking about large-scale neural networks, and the system that came to be known as DistBelief, which was a precursor to TensorFlow. And Andrew had been scaling up neural nets at Stanford. And Jeff Dean, who's a very famous computer systems guy, has been at Google since the beginning — he's like employee number two or something. He invented MapReduce and many large-scale systems at Google that are, like, the lifeblood of the company. And he's legendary as an engineer, really able to make large-scale systems work. He was very excited about the potential of building just massive neural nets and having that be sort of a breakthrough technology. And I was not really on that bandwagon — I was sort of watching it happen from the side. So I got into that later, a few years later. But I thought, you know, that looked like a promising avenue. To me, the other thing that was promising was to work on video, which is very computationally expensive. I had been doing more and more computer vision work, and what I spent most of my first few years at Google doing was working on vision problems — I did some video, but I ended up mostly working on image problems, and got into video later. But I felt like the compute resources that you could get at a company would really be game-changing. And that turned out to be a correct guess. I mean, I did not predict the deep learning revolution, and I did not predict the success that we've had — I'm not sure anyone predicted that. Maybe Ilya Sutskever; I think he did predict it. I remember — he was at Toronto; I was a professor at UBC when he was a graduate student at Toronto. And we were at many events together, the CIFAR events that Geoff Hinton organised. And I remember Ilya was extremely evangelical about the power of large-scale neural nets. And I was more sceptical. But, you know, we've seen that he has been proven correct. And for those who don't know who I'm talking about, I believe his title is Chief Technical Officer at OpenAI — I don't know, sorry, Ilya, if you're listening, I don't know your exact job title.
But in any case, he's employee number three at OpenAI; he's one of the earliest and most important technical members there. And I believe he and Hinton had a startup that Google acquired. So, you know, during my time there, I've just seen the rise of the power of machine learning, and the breadth of applications. So it's sort of exciting to be on that bandwagon.

 

Nicolás Hernández (NH) 

What do you think are the key factors that determine the impact of research in the field of machine learning?  

 

Kevin Murphy (KM) 

I mean, there are many ways to have impact, right? You can have, you know, a really core idea. Like, I don't know, amortised variational inference in the VAE, right? That idea was independently invented by two groups: Durk Kingma, who's actually on my team at Google, and Max Welling, his advisor; and then there was a group of researchers at Google DeepMind — and I apologise, but I don't remember exactly who it was. It might have been Daan Wierstra or Shakir Mohamed, I'm not sure. But, oh, here we go: it was Danilo Rezende, Shakir Mohamed and Daan Wierstra; their paper is called Stochastic Backpropagation and Approximate Inference in Deep Generative Models. And it's essentially the same as the VAE paper from Kingma and Welling, and it came out at more or less the same time. But for reasons of history, the Kingma and Welling paper is cited more. So that was very impactful. I think another way to have impact is through software. So, you know, like PyTorch being open-sourced, or JAX — that has indirectly enabled huge amounts of progress, right? And that tends to not get a lot of credit in the academic system, because essentially papers are the only thing that counts in academia. But in industrial research labs, you know, we make sure to reward impact along many dimensions. So you can get rewarded for publications, especially, you know, if they're cited a lot or you win a best paper award, but you definitely get rewarded for creating reliable software systems that people use, either internally or open source. And, you know, maybe your work is patented. Or maybe there's what they call thought leadership, right — if you inspire people to work on a certain set of problems. So, yeah, Justin Gilmer, who's also on my team at Google, worked on adversarial robustness for a while. And he was arguing that the sort of epsilon perturbations that people studied were really not very interesting attack mechanisms. If you look at real-world adversarial attacks, they're often quite perceptually visible: they might superimpose an ad on a cartoon background, so it can get through, let's say, the ad filtering system on YouTube. And a human can tell that it's one image pasted on top of another — it's not an imperceptible change — but nevertheless, it evades the classifier. And so he made the point that we should be considering robustness to a much larger set of shifts, not just, you know, an epsilon ball around the input point. Anyway, so he had a sort of position paper on adversarial robustness, and then the field kind of changed. And there are still people looking at, you know, epsilon perturbations, but now there's a recognition of the need to be robust to much broader ranges of attacks. Right now, I think this is an interesting area where people are trying to break language models by, you know, prompt injection. Maybe they change a word, and it's maybe imperceptible to a human — but often you can kind of tell. In language space you can often slip an unusual word in and it might go undetected; it's not really clear what perceptually undetectable means there. But sometimes you inject the magic word, and then it just breaks the system, and you jailbreak all of the RLHF safety training, or whatever it is, right? You can coax these systems to emit information that they're not supposed to emit. And so that's a fun game with important consequences.
So I should say, it's not just a fun game. It is a fun game, but it's also important that we defend against these things.
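(Editor's note: for readers unfamiliar with the "epsilon ball" attacks Gilmer was critiquing, here is a minimal sketch of a fast-gradient-sign perturbation against a toy linear classifier, the simplest instance of the idea. The model and the numbers are invented; real attacks target deep networks via automatic differentiation.)

    import numpy as np

    def fgsm_step(x, w, y, eps):
        # Hinge loss max(0, 1 - y * w@x): when the margin is active, its gradient
        # w.r.t. x is -y * w. Moving each coordinate by eps in the sign of that
        # gradient is the worst perturbation inside the L-infinity epsilon ball.
        grad = -y * w
        return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

    rng = np.random.default_rng(0)
    x = rng.random(8)                 # a toy "image" with 8 pixels in [0, 1]
    w = rng.standard_normal(8)        # toy linear classifier weights
    y = 1.0                           # true label in {-1, +1}
    x_adv = fgsm_step(x, w, y, eps=0.05)
    print(np.abs(x_adv - x).max())    # at most eps: a tiny, targeted change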

 

Nicolás Hernández (NH) 

Going back to your talk — I think the fourth point of your talk was about generative AI. So, to be honest, sometimes I see all these dizzying developments, and I'm quite afraid of AI. So, are you afraid of AI?

 

Kevin Murphy (KM) 

I think the fears are overblown. I know there are obviously heated debates on both sides. Actually, I saw something on Twitter today: apparently Nick Bostrom, who is in the UK somewhere — Oxford, perhaps — has just been interviewed, and even he thinks that the fears about AI are overblown. And he's worried that now governments will over-regulate and will kill or stop the AI train, and progress won't be made, because the fears are being exaggerated. So I think some fears are exaggerated. I think extinction is a red herring — we're clearly not going to go extinct in the literal sense. On the other hand, I think there are more pressing short-term concerns that we should worry about: things like misinformation, especially in the context of elections, or blackmail. And fake media in general is definitely an important problem. And there are issues around copyright and, you know, correct compensation for artists when using their material for training models. So these are all very difficult issues that are important. But I don't think they're existential threats to humanity; they're just adding entropy to a system that we already live in, right? You could already create fake media with Photoshop and various tools; these new generative AI tools just make it easier to do that at scale. They make it cheaper, and they lower the barrier to entry, so more people can do it. So it's a question of degree rather than kind, I think. Although maybe in the context of creating fake imagery and voices, it really is something that you couldn't do easily before. For fake images that's also true, but people have been making fake media for a while. One of my colleagues, Chris Bregler at Google, specialises in this area. And he told me that the concerns about fake media being generated by AI are somewhat misplaced: the main concern is people taking information out of context. So they might show a photo, let's say of a bombing, which they claim is in Gaza, or maybe in Israel, and in fact it was taken ten years ago. And then they use that photo to accompany some narrative, to tell the story that they're telling. And it's real — it's a real photo, and it's real text. There was no AI involved. But it's misleading, because they lied about the location of the photo. Or maybe they didn't lie — the location is correctly stated — but they didn't mention the date. So there's essentially intentional deception, done by a human for some political purpose. That already happens, right? And that has nothing to do with AI. And it's actually much more common. A few years ago, some of these generative image systems became available and worked very well — GANs, right, at least for face images. So when they started coming out, people started worrying about fake media. And what he tells me is that, you know, it never really was a problem in practice: people were not using GAN generators to make fake images for purposes of political campaigning or blackmail, at least not at scale. And this sort of deliberate deception or misinformation was the dominant concern. Now, maybe the newer technologies — diffusion models and language models — are different from GANs, because certainly the quality is better, and the tooling is perhaps easier to use.
So it's possible that that will be more of a problem in the future, and it is something we need to worry about. But I'm not so worried about humans going extinct or being taken over by machines. I mean, we ultimately build these things and control them. And I think there's a lot of science-fiction-inspired fantasising about superhuman AIs enslaving us, and it just distracts from these other things. It's not that that's impossible, but I think there are many more pressing concerns that are not getting enough air time.

 

Nicolás Hernández (NH) 

Great. So now, I have to say, I'm feeling some relief here. So, as we bring this to a close, I would like to ask you a couple more questions, if you don't mind. Do you have a particular message for early-career researchers or PhD students in statistics or machine learning?

 

Kevin Murphy (KM) 

I don't know what's on the minds of statistics students, but in machine learning, many students these days are worried either that there's nothing left to do because, you know, ChatGPT has solved all the problems, or that they want to get a piece of that action but can't, because they don't have access to compute. And I would say there are still interesting problems that you can work on as an academic. So I think things like, well, the science of deep learning: why do these large systems work as well as they do? That's not well understood. And it's true that you may need access to these models to answer the questions, but you don't need to be able to train them. You can just treat them as artefacts. It's almost like archaeology, right? There are these pottery shards that you found, and you want to know what the process was that created them. So you're studying the behaviour of these systems, maybe from a statistical physics point of view. So I think that's interesting and important, because we need to understand how these systems work so we can control them. And then, as I mentioned in my talk, I think there are a lot of applications of machine learning in the sciences, where you maybe need more bespoke modelling efforts that are sensitive to the details of the domain. And you can't just inhale the whole internet to train a massive black-box model, because you don't have enough data. So you need to think about where the data comes from; maybe you use the models to help you do data acquisition. And those problems are maybe more within reach of students, because the size of the models may be smaller. And furthermore, you have access to colleagues in other departments who are domain experts in chemistry or biology or environmental science. So for those kinds of problems, universities should be better positioned than companies, because companies usually don't have chemists and biologists on their staff — I mean, they do in some places, but not in large numbers. So, you know, universities are multidisciplinary institutes, and I would suggest leveraging that multidisciplinarity where possible.

 

 

Nicolás Hernández (NH) 

Just to close: what's the moment in your career that turned you into the researcher that you are now?

 

Kevin Murphy (KM) 

I don't know if there's a single moment — I guess I've had a meandering path. I do remember a very inspiring talk by David Heckerman about Bayesian networks; I was at the Santa Fe Institute summer school, in the early 90s, I think. And David Heckerman, for those who don't know him, was a researcher at Microsoft for many years and did early pioneering work on Bayes nets. Now, I think he works at Amazon. Anyway, he gave a great talk, and that was the first time I'd heard about that model family, and I thought, this is the coolest thing. And that's what I ended up doing my PhD on. So that was a very influential moment for me. And then I mentioned the Hofstadter book. And, yeah, Chris Bishop's book — when I was teaching, I read that and I thought, oh yeah, this is really great. I mean, that was sort of building on the same ideas; it has a very graphical models perspective as well. But, you know, he then expands the set of things you can do with it by using variational inference. So it really, you know, makes that toolset more broadly applicable.

 

Nicolás Hernández (NH) 

Great. Okay. Well, I think that's all for today. So thank you so much for your time.

Kevin Murphy (KM)

You're welcome.

Nicolás Hernández (NH)

Thank you, everyone. I hope you have enjoyed this episode. See you later.