Note: Franz Kiraly's primary affiliation changed to Shell UK (Shell.ai) in January 2020. He retains an honorary faculty position at UCL. Due to IT problems, this home page is frozen in its 2019 state; please note that it therefore contains outdated information. UCL ISD is working on resolving this problem. Dr Kiraly can still be contacted via his UCL email address.
Stochastic Modelling of Complex Systems
General Theory and Methodology
Franz Kiraly
As a practical statistician and machine learner, I am interested in creating a data analytics workflow which is empirically solid, quantitative, and useful in the real world.
My research aims to provide the foundations, through:
(i) studying external assessment, comparison, and validation of white-box and black-box methodology: how to empirically test whether the (black/white-box) method does what is claimed? Is it better than simpler alternatives, or better than a random guess?
(ii) theoretical analysis and practical workflow building for complex modelling tasks, e.g., in the presence of structured/hierarchical observation mechanisms or non-standard/composite data types. For example, prediction in the context of time series, spatial observations, comparisons, multiple data sources.
(iii) design and implementation of automated modelling and model validation workflows: how to best do the above in a suitable software environment? How to use external checks to find the most suitable model, especially given trade-offs such as those between accuracy, computational cost, and interpretability?
These are especially relevant in applications where the data and the associated scientific questions, rather than a single method class, are the focus of interest; current project and collaboration domains include energy, finance, clinical health, sports and prevention.
Selected recent work on data scientific methodology:
Workflow design and theory for probabilistic supervised learning. Where supervised learning predicts a label, the probabilistic variant additionally aims to predict the uncertainty of that prediction. Our work is the first to formalize this task in an entirely model-agnostic way; it provides a number of theoretical insights, as well as a formal workflow design implemented in a Python package, skpro, which plays the role of sklearn for the probabilistic case.
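As a rough illustration of the probabilistic supervised learning task, the following minimal sketch predicts a full Gaussian distribution per test point instead of a single label, and scores it against an uninformed baseline by held-out log-loss. It uses only the Python standard library and does not use the skpro API; all function names, the simulated data, and the choice of a Gaussian with constant variance are assumptions made for this example.

```python
import math
import random


def fit_linear_gaussian(xs, ys):
    """Fit a least-squares line; predict N(intercept + slope * x, sigma^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(r * r for r in residuals) / n)
    return lambda x: (intercept + slope * x, sigma)


def fit_baseline_gaussian(xs, ys):
    """Uninformed baseline: ignore x and predict the training label distribution."""
    n = len(ys)
    my = sum(ys) / n
    sigma = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return lambda x: (my, sigma)


def mean_log_loss(predict, xs, ys):
    """Average negative Gaussian log-density of the observed labels (lower is better)."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu, sigma = predict(x)
        total += 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)
    return total / len(ys)


# Simulated data (an assumption for this sketch): y depends linearly on x plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(400)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
train_x, test_x = xs[:300], xs[300:]
train_y, test_y = ys[:300], ys[300:]

model_loss = mean_log_loss(fit_linear_gaussian(train_x, train_y), test_x, test_y)
baseline_loss = mean_log_loss(fit_baseline_gaussian(train_x, train_y), test_x, test_y)
print(f"model log-loss: {model_loss:.3f}, baseline log-loss: {baseline_loss:.3f}")
```

The comparison against the baseline is the point of the sketch: a probabilistic prediction is only useful if its out-of-sample loss is lower than that of a prediction which ignores the features.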
Predictive independence testing and graphical modelling. Our work establishes a close theoretical connection between the task of testing whether variables are (conditionally) independent, and testing whether it is possible to predict one from the other (better than a certain baseline). This leads to a close link between the predictive modelling and the independence testing workflows, enabling easy multivariate independence testing.
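The link between prediction and independence testing can be illustrated with a small sketch: if a model that uses X predicts Y no better on held-out data than a baseline that ignores X, there is no predictive evidence of dependence. The version below uses a simple permutation test on the held-out gain and only the Python standard library. It is an illustrative stand-in, not the procedure of the paper or the pcit package; the function names, the linear model, and the simulated data are assumptions.

```python
import random


def heldout_gain(xs, ys, split):
    """Baseline MSE minus linear-model MSE on the held-out part of the data."""
    train_x, train_y = xs[:split], ys[:split]
    test_x, test_y = xs[split:], ys[split:]
    n = len(train_x)
    mx, my = sum(train_x) / n, sum(train_y) / n
    sxx = sum((x - mx) ** 2 for x in train_x)
    slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / sxx
    intercept = my - slope * mx
    model_mse = sum((y - (intercept + slope * x)) ** 2
                    for x, y in zip(test_x, test_y)) / len(test_y)
    baseline_mse = sum((y - my) ** 2 for y in test_y) / len(test_y)
    return baseline_mse - model_mse


def predictive_independence_pvalue(xs, ys, n_perm=199, seed=0):
    """Permutation p-value for 'X does not help to predict Y'."""
    rng = random.Random(seed)
    split = (2 * len(xs)) // 3
    observed = heldout_gain(xs, ys, split)
    hits = 0
    for _ in range(n_perm):
        shuffled = xs[:]
        rng.shuffle(shuffled)  # shuffling X breaks any dependence between X and Y
        if heldout_gain(shuffled, ys, split) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)


# Simulated data (an assumption for this sketch).
rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(150)]
ys_dependent = [x + rng.gauss(0, 0.5) for x in xs]
ys_independent = [rng.gauss(0, 1) for _ in xs]

p_dependent = predictive_independence_pvalue(xs, ys_dependent)
p_independent = predictive_independence_pvalue(xs, ys_independent)
print(f"dependent pair: p = {p_dependent:.3f}; independent pair: p = {p_independent:.3f}")
```

A small p-value for the dependent pair indicates that X improves the prediction of Y beyond the baseline; any predictive model could be substituted for the linear fit, which is what makes the approach model-agnostic.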
Learning with complex data types. In cases where variables are not numbers, categories, or strings, but more complex objects such as series, images, or graphs, a more abstract data storage, processing and modelling infrastructure is needed. Model composition is the natural paradigm for the latter, leading to challenges in object-oriented software design. The xpandas package for Python extends the joint functionality of pandas and sklearn transformers for this setting (work in progress).
Selected recent application work:
Prediction and Prevention of Falls in a Neurological In-Patient Population. Falling, and associated injuries such as hip fracture, are a major strain on health and health resources, especially in the elderly or hospitalized. We are able to predict, with high accuracy in a neurological population, whether a patient is likely to fall during their stay, using only a number-connecting test (the Trail Making Test).
Quantification and Prediction in Running Sports. Characterizing the training state of running athletes, and making predictions for race planning and training. We can predict marathon times with an error on the order of a few minutes, and we are able to accurately summarize an athlete by three characteristic numbers.
I currently hold the following professional roles, and act as a point of contact for the matters described below:
UCL Statistics: enterprise coordinator & MAPS enterprise board member
Internal enabling role, and point of contact for translational engagement with UCL Statistics and the MAPS faculty, especially on data science topics - e.g., data scientific consulting, courses on data analytics, statistics, machine learning/AI, commissioned research projects. The MAPS faculty PoC is Jawwad Darr (UCL MAPS faculty, Vice-Dean Enterprise).
UCL Statistics: diversity & equality board member
Internal point of contact for diversity & equality related matters.
UCL CoMPLEX: board member
Representing UCL Statistics on the board of UCL CoMPLEX. Potential point of contact for academics and industry, especially on topics at the intersection of biotech/health and statistics/machine learning/AI.
Alan Turing Institute: Data Study Groups, scientific lead and coordination team member
Co-organisation of the outreach scheme, management of data scientific and technical aspects. Possible point of contact for translational engagement with the Alan Turing Institute. The main PoCs are Sebastian Vollmer (Data Study Groups, Director) and Nicolas Guernion (Alan Turing Institute, Director of Partnerships).
I am currently accepting applications for PhD supervision, subject to UCL guidelines on formal requirements for obtaining a graduate research degree, and supervisory limits. Applications should include a CV, a short description of your research interests, a description of your background in mathematics/statistics, data analysis, and programming, as well as a motivation statement on what you are looking for in a PhD.
PhD stipends (through grants and projects) may be available - for these, kindly apply through the official channels, e.g., through the respective funding bodies (which may depend on your citizenship), or the UCL Human Resources portal.
I am accepting applications for short-term or summer internships from highly talented applicants. These are 1-3 months in length and cover subsistence plus world-wide travel expenses at the bursary rate. Unsolicited applications are possible, and should include an up-to-date CV, a motivation letter with a description of research interests, evidence of scientific writing skills (e.g., a thesis or paper), evidence of data analytics skills (e.g., an analytics report), and/or evidence of programming skills (e.g., a GitHub account with public repositories).
Short Curriculum Vitae
At the University of Ulm, I obtained my Diplomas (equivalent to MSc or MD, as regards content) in Computer Science, Mathematics, Medicine and Physics in the years 2003, 2005, 2006 and 2011; in 2008, I received my PhD in Medicine.
From 2007 to 2010, I completed my PhD thesis in Mathematics on the topic of Arithmetic Geometry, under the supervision of and in cooperation with Prof. Werner Lütkebohmert in Ulm.
From 2010 to 2013, I worked as a postdoctoral researcher in Prof. Klaus-Robert Müller's Machine Learning Group at the Technische Universität Berlin, and I was an associate member of Prof. Günter Ziegler's Discrete Geometry Group at the Freie Universität Berlin.
In 2012, I was appointed Leibniz Fellow at the Mathematisches Forschungsinstitut Oberwolfach, where I spent a total of six months, split over 2012, 2013 and 2014.
Since 2013, I have been working as a lecturer (comparable to a tenured assistant professor) at University College London.
In 2015, I visited the Aalto Science Institute as an AScI Visiting Fellow.
Since 2016, I have also been a faculty fellow at the newly founded Alan Turing Institute, whose vision is to bundle and catalyze the UK's efforts in modern data science, and I have recently been co-organizing its Data Study Groups as a member of the DSG coordination team.
[Download Curriculum Vitae] (2018/04)
(the arXiv versions are usually the most up-to-date)
Gressmann F, Király FJ, Mateen BA, Oberhauser H. Probabilistic Supervised Learning. Preprint, 105 pages, arXiv 1801.00753. 2018.
Burkart S, Király FJ. Predictive Independence Testing, Predictive Conditional Independence Testing, and Predictive Graphical Modelling. Preprint, 50 pages, arXiv 1711.05869. 2017.
Király FJ, Qian Z. Modelling Competitive Sports: Bradley-Terry-Élő Models for Supervised and On-Line Learning of Paired Competition Outcomes. Preprint, 53 pages, arXiv 1701.08055. 2017.
Mateen BA, Bussas M, Doogan C, Waller D, Saverino A, Király FJ, Playford ED. Machine Learning in Falls Prediction; A cognition-based predictor of falls for the acute neurological in-patient population. Preprint, 37 pages, arXiv 1607.07751. 2016.
Király FJ, Oberhauser H. Kernels for sequentially ordered data. Preprint, 48 pages, arXiv 1601.08169. 2016.
Király FJ, Ziehe A, Müller K-R. Learning with algebraic invariances, and the invariant kernel trick. Preprint, 17 pages, arXiv 1411.7817. 2014.
Blythe DAJ, Király FJ, Theran L. Algebraic combinatorial methods for low-rank matrix completion with application to athletic performance prediction. Preprint, 13 pages, arXiv 1406.2864. 2014.
Király FJ, Kreuzer M, Theran L. Learning with cross-kernels and Ideal PCA. Preprint, 14 pages, arXiv 1406.2646. 2014.
Király FJ, Theran L. Matroid Regression. Preprint, 16 pages, arXiv 1403.0873. 2014.
Király FJ, Ehler M. The algebraic approach to phase retrieval and explicit inversion at the identifiability threshold. Preprint, 26 pages, arXiv 1402.4053. 2014.
Király FJ, Kreuzer M, Theran L. Dual-to-kernel learning with ideals. Preprint, 15 pages, arXiv 1402.0099. 2014.
Király FJ, Rosen Z, Theran L. Algebraic matroids with graph symmetry. Preprint, 70 pages, arXiv 1312.3777. 2013.
Király FJ. Efficient orthogonal tensor decomposition, with an application to latent variable model learning. Preprint, 14 pages, arXiv 1309.3233. 2013.
Király FJ, Theran L. Coherence and sufficient sampling densities for reconstruction in compressed sensing. Preprint, 18 pages, arXiv 1302.2767. 2013.
Refereed conference publications
Király FJ, Ehler M. Algebraic reconstruction bounds and explicit inversion for phase retrieval at the identifiability threshold. Journal of Machine Learning Research Workshop & Conference Proceedings Vol.24 – Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics. 9 pages. 2014.
Király FJ, Theran L. Obtaining error-minimizing estimates and universal entry-wise error bounds for low-rank matrix completion. Neural Information Processing Systems 2013, to appear in Proceedings. Preprint version available as arXiv 1302.5337, 14 pages. 2013.
[arXiv 1302.5337] [code, mloss]
Király FJ, Ziehe A. Approximate rank-detecting factorization of low-rank tensors. IEEE International Conference on Acoustics, Speech, and Signal Processing 2013, to appear in Proceedings. Preprint version available as arXiv 1211.7369, 5 pages. 2013.
[arXiv 1211.7369] [code, mloss]
Király FJ, Tomioka R. A combinatorial algebraic approach for the identifiability of low-rank matrix completion. International Conference on Machine Learning 2012. Published in ICML Proceedings, made available by ICML as arXiv 1206.4670, 8 pages. 2012.
Király FJ, Von Buenau P, Müller JS, Blythe DAJ, Meinecke FC, Müller K-R. Regression for sets of polynomial equations. Journal of Machine Learning Research Workshop & Conference Proceedings Vol.22 – Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 22:628-637. 2012.
[arXiv 1110.4531] [code] (ZIP, 17,4 KB)
[JMLR W&CP 2012-22]
Király FJ, Ziehe A, Müller K-R. An algebraic method for approximate rank one factorization of rank deficient matrices. Latent Variable Analysis and Signal Separation 2012 Conference Proceedings, 272-279. 2012.
Refereed journal publications
Ioannidis K, Chamberlain SR, Treder M, Király FJ, Leppink EW, Redden SA, Stein DJ, Lochner C, Grant JE. Problematic internet use (PIU): Associations with the impulsive-compulsive spectrum. An application of machine learning in psychiatry. Accepted in Journal of Psychiatric Research. 2016.
Blythe DAJ, Király FJ. Prediction and quantification of individual athletic performance. PLoS ONE 11(6): e0157257. 2016.
[PLoS ONE 10.1371], includes code link
Ehler M, Graef M, Király FJ. Phase retrieval using random cubatures and fusion frames of positive semidefinite matrices. Waves, Wavelets and Fractals – Advanced Analysis. Dec 2015.
Király FJ, Theran L, Tomioka R. The algebraic combinatorial approach for low-rank matrix completion. Journal of Machine Learning Research, 16(Aug):1391-1436. 2015.
Larsen P, Király FJ. Fano schemes of generic intersections and machine learning. International Journal of Algebra and Computation, Vol.24, No.17, 923-933. 2014.
Király FJ, Lütkebohmert W. Invariants of regular local rings by p-cyclic group actions. Algebra and Number Theory, Vol.7, No.1, 63-74. 2013.
Király FJ, Von Buenau P, Blythe DAJ, Meinecke FC, Müller K-R. Algebraic geometric comparison of probability distributions. Journal of Machine Learning Research 13(Mar):855-903. 2012.
[JMLR 2012-13] [code] (ZIP, 3,8 KB)
Preprint published in the Oberwolfach Preprint Series as
Müller JS, von Bünau P, Meinecke FC, Király FJ, Müller K-R. The Stationary Subspace Analysis Toolbox. Journal of Machine Learning Research 12(Oct):3065-3069. 2011.
Kilian H-G, Kazda M, Király FJ, Kaufmann D, Kemkemer R, Bartkowiak D. On the structure-bounded growth processes in plant population. Cell Biochemistry and Biophysics 57:87-100. 2010.
Schlenk RF, Döhner K, Mack S, Stoppel M, Király F, Götze K, Hartmann F, Horst HA, Koller E, Petzer A, Grimminger W, Kobbe G, Glasmacher A, Salwender H, Kirchen H, Haase D, Kremers S, Matzdorff A, Benner A, Döhner H. Prospective evaluation of allogeneic hematopoietic stem-cell transplantation from matched related and matched unrelated donors in younger adults with high-risk Acute Myeloid Leukemia: German-Austrian trial AMLHD98A. Journal of Clinical Oncology 20;28(30):4642-4648. 2010.
Von Bünau P, Meinecke FC, Király FJ, Müller K-R. Finding stationary subspaces in multivariate time series. Physical Review Letters 103, 214101. 2009.
Király FJ, Kletting P, Reske SN, Glatting G. Modelling radioimmunotherapy (RIT) with anti-CD45 antibody to obtain a more favourable biodistribution. Nuklearmedizin 48:113-119. 2009.
Király FJ. Wild quotient singularities of surfaces and their regular models. Doctoral dissertation, Ulm. 2010.
[e-print VTS Univ. Ulm]
Király FJ. Vergleich verschiedener Postremissionsstrategien bei der akuten myeloischen Leukämie mit normalem Karyotyp. Doctoral dissertation, Ulm. 2008.
[e-print VTS Univ. Ulm]
Data scientific software
Open source software in the sklearn ecosystem - contributions and collaborations are very welcome:
xpandas - extending pandas to data containers for structured, hierarchical and complex data types, and transformer interfaces compatible with the sklearn API
skpro - machine learning toolbox for paradigm-agnostic probabilistic supervised learning, i.e., probabilistic label predictions, extends the sklearn API and provides interfaces for Bayesian toolboxes (see also section 8 of the concomitant paper)
pcit - predictive conditional independence testing with a workflow interface to predictive models in sklearn (see also section 6 of the concomitant paper)
Past Talks: Slides and Videos
2012, June 29, 14:00-14:20, ICML 2012
University of Edinburgh, Appleton Tower, Room AT LT 2
A Combinatorial Algebraic Approach for the Identifiability of Matrix Completion
2012, April 23, 19:35-20:00, AISTATS 2012
La Palma, Los Cancajos, H10 Taburiente Playa, Las Nieves/Tenguía room
Regression for sets of polynomial equations
[Video, unfortunately incomplete]