About
Dan is a Research Engineer at the IT Innovation Centre within the School of Electronics and Computer Science, University of Southampton. He completed a PhD in Theoretical Particle Physics at the University of Manchester in 2015, followed by a short post-doc in Particle Cosmology in 2016, and joined the IT Innovation Centre in 2017.
He has an intimate familiarity with the data science pipeline: from the extraction of meaningful data and pre-processing, to the critical evaluation of results from machine learning algorithm studies and the deployment of machine-learned models. He is primarily driven by the vast application domain of statistics, particularly where cross-disciplinary work is involved, and has extensive experience in statistical analysis and the development of Monte-Carlo simulations.
In the COVID-19 era, Dan has undertaken several projects to assist with the UK's national epidemic response:
- Prediction of COVID-19 hospital impact using infectious disease epidemic modelling: fitting stochastic epidemic models to time series of hospitalisations to infer hospital impact
- The statistical evaluation of new and emerging testing technologies to identify SARS-CoV-2, such as lateral flow devices and loop-mediated isothermal amplification (LAMP) based technologies
- Development of models of regular testing, taking into account the epidemiology and virology of SARS-CoV-2, sociological factors of adherence to testing and isolation, operational and logistical characteristics of the programme, and then determining how a given isolation policy subsequently affects transmission
- Development of machine learning models to predict COVID-19 patient deterioration
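To give a flavour of the stochastic epidemic models mentioned in the first item above, here is a minimal sketch of a discrete-time chain-binomial SIR simulation. It is an illustration only and not the model used in the project: the function name, the parameters `beta` and `gamma`, and the daily time step are all assumptions, and the fitting step (inferring parameters from hospitalisation time series) is not shown.

```python
import math
import random


def simulate_chain_binomial_sir(population, initial_infected, beta, gamma, days, seed=None):
    """Simulate one realisation of a chain-binomial SIR epidemic.

    beta  : assumed daily transmission rate (hypothetical parameter)
    gamma : assumed daily recovery rate (hypothetical parameter)
    Returns a list of (susceptible, infected, recovered) tuples, one per day.
    """
    rng = random.Random(seed)
    s, i, r = population - initial_infected, initial_infected, 0
    history = [(s, i, r)]
    for _day in range(days):
        # Per-person probabilities of infection and recovery during one day.
        p_infection = 1.0 - math.exp(-beta * i / population)
        p_recovery = 1.0 - math.exp(-gamma)
        # Binomial draws, written as sums of Bernoulli trials (stdlib only).
        new_infections = sum(1 for _person in range(s) if rng.random() < p_infection)
        new_recoveries = sum(1 for _person in range(i) if rng.random() < p_recovery)
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    for day, state in enumerate(simulate_chain_binomial_sir(1000, 10, 0.3, 0.1, 10, seed=1)):
        print(day, state)
```

Because each run is random, repeating the simulation with different seeds gives a spread of possible epidemic trajectories rather than a single curve, which is what makes such models suitable for quantifying uncertainty in projected impact.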
Beyond his COVID-19 work, Dan has contributed to the following research projects:
CrowdHEALTH (EU H2020 project): CrowdHEALTH intends to integrate high volumes of heterogeneous health-related data from multiple sources and to develop data analytics tools that operate on the integrated data, with the aim of supporting policy-making decisions. Dan’s work in this project is the development of machine learning algorithms that predict, in a timely and accurate fashion, when patients will receive a first diagnosis of one of a set of high-impact diseases, such as cardiovascular disease and chronic kidney disease, so that preparation and prevention approaches can be constructed. The project uses the prediction algorithm for population risk stratification.
BigMedilytics (EU H2020 project): BigMedilytics aims to transform the EU healthcare sector by using state-of-the-art big data technologies to achieve breakthrough productivity, simultaneously reducing cost, improving patient outcomes and delivering better access to healthcare facilities. Dan’s work in this project revolves around studying self-reported patient data associated with chronic obstructive pulmonary disease (COPD), linked with weather and pollution data from the patient’s home environment, to predict COPD exacerbation events. He is currently working on a machine learning algorithm to predict when these events will occur based on the linked data, so that patients can either prepare for the event by obtaining medication or prevent it altogether.
Dan has also completed several pieces of commercially funded work as part of the VIVACE accelerated capability environment, as well as direct projects from the UK Home Office.
Research
Current research
Dan’s research interests span multiple fields in computer science and statistics.
In statistics, his core interests are the following:
- Development of more effective sampling techniques for producing samples from multivariate distributions
- Development of metrics for the assessment of statistical conditional models/machine learning models as applied to data
- The synthesis of Monte-Carlo simulation with machine learning and statistical learning
In data science, his core interests are the following:
Missing data approaches: the vast majority of real-world applications of machine learning methods involve the problem of missing data. Imputation is one of the most common approaches to handling missing data, and is practically useful. However, accurate imputation requires a lot of data, as the imputation problem is often more difficult to solve than the machine learning task at hand. Dan is particularly interested in developing approaches to missing data that do not involve imputation and instead exploit the full data in other ways, either through the modification of existing machine learning algorithms or through the creation of new ones.
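As one well-known example of modifying an existing algorithm so that it works without imputation, the sketch below adapts k-nearest-neighbour classification to use a "partial distance" computed only over features observed in both records. This is an illustrative choice and not necessarily one of Dan's methods; the function names and the convention of marking missing values with `None` are assumptions made for the example.

```python
import math


def partial_distance(a, b):
    """Euclidean distance over the features observed in both records.

    Missing values are marked with None (an assumption of this sketch).
    The squared distance is rescaled by (total features / shared features)
    so that records with few shared features are not unfairly favoured.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing to compare on
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training records nearest to the query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: partial_distance(pair[0], query))
    votes = {}
    for _features, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


if __name__ == "__main__":
    train_x = [
        [0.0, None, 0.1], [0.2, 0.1, None], [None, 0.0, 0.0],
        [5.0, 5.1, None], [None, 4.9, 5.2], [5.1, None, 5.0],
    ]
    train_y = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(train_x, train_y, [0.1, None, 0.0]))
    print(knn_predict(train_x, train_y, [5.0, 5.0, None]))
```

No missing value is ever filled in: every record contributes whatever features it has, which is the sense in which the full data is exploited without solving a separate imputation problem first.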
Electronic Patient Record event prediction: advance knowledge that an individual patient is likely to experience an unfavourable medical event, such as the first diagnosis of a high-impact disease or a repeatable event such as myocardial infarction, can be used either to prepare for the event or to prevent it altogether. Dan is interested in developing approaches to predict these events, using either statistical analysis or machine learning methodologies, and in developing models of disease progression. He is primarily interested in generic approaches, i.e., those that use electronic patient record datasets, rather than explicit domain expertise, to determine the most significant factors affecting the events of interest.
Other interests within data science include:
- Development of new classification techniques for the exploration of scientific data
- Feature extraction and classification approaches specific to time series, image and audio/speech data
Dan holds other research interests outside of data science and statistics, including cryptography and efficient fully homomorphic encryption schemes for secure cloud computation.