Scientists and business people seek to understand large and complex datasets. This involves developing and then testing hypotheses, using a combination of human skill and computation. Any such project requires both visualizations and statistical methods that help the researcher originate and validate hypotheses. There are growing toolsets for presenting visualizations (e.g. d3js, bokeh) and for machine learning (e.g. weka, sklearn). However, the effort of integrating them in a single project falls on the user.
This project will build tools that integrate sklearn and bokeh to provide a user-guided environment for interactively developing models of a dataset. It will pair visualization methods with machine learning methods, so that a single request from the user both builds a predictive model and produces a visualization showing its strengths and weaknesses.
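As a rough illustration of the "single request" idea, the sketch below fits a classifier with sklearn and builds a bokeh ROC plot in one call. It is not the project's design: the helper name `fit_and_plot`, the choice of logistic regression, the ROC curve as the diagnostic, and the synthetic dataset are all assumptions made for the example.

```python
from bokeh.plotting import figure
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split


def fit_and_plot(X, y, random_state=0):
    """Hypothetical helper: one call returns a fitted model and a diagnostic plot."""
    # Hold out a test set so the plot reflects performance on unseen data.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Score the held-out samples and compute the ROC curve and its area.
    scores = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    auc = roc_auc_score(y_test, scores)

    # Build (but do not display) a bokeh figure of the ROC curve.
    plot = figure(
        title=f"ROC curve (AUC = {auc:.2f})",
        x_axis_label="False positive rate",
        y_axis_label="True positive rate",
    )
    plot.line(fpr, tpr, line_width=2)
    # Diagonal reference line: the expected curve of a random classifier.
    plot.line([0, 1], [0, 1], line_dash="dashed", color="gray")
    return model, plot, auc


# Synthetic data stands in for a real dataset (an assumption of this sketch).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model, plot, auc = fit_and_plot(X, y)
```

Passing the returned figure to `bokeh.plotting.show` would render it in a browser. Returning the model and the plot together keeps the two in step, which is the kind of pairing the project aims to automate.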
Many metrics exist for classifier quality, and users need help understanding which is most relevant in a particular context, so a significant part of the work will focus on visualizing the quality of classifiers. This will form part of a workflow that offers alternative classification methods and ultimately lets the user build a consensus model.
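To show why the choice of metric matters, the following sketch scores two classifiers on an imbalanced synthetic dataset with four sklearn metrics. The dataset, the two candidate models, and the metric selection are assumptions chosen for illustration, not the project's chosen set.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    f1_score,
    matthews_corrcoef,
)
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: roughly 90% negatives, 10% positives.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Metrics to compare; zero_division=0 avoids a warning when a model
# never predicts the positive class.
metrics = {
    "accuracy": accuracy_score,
    "balanced_accuracy": balanced_accuracy_score,
    "f1": lambda yt, yp: f1_score(yt, yp, zero_division=0),
    "mcc": matthews_corrcoef,
}

# Two illustrative candidates: a trivial baseline and a simple linear model.
candidates = {
    "majority-class baseline": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
}

report = {}
for name, clf in candidates.items():
    y_pred = clf.fit(X_train, y_train).predict(X_test)
    report[name] = {m: float(fn(y_test, y_pred)) for m, fn in metrics.items()}

for name, scores in report.items():
    print(name, {m: round(v, 3) for m, v in scores.items()})
```

The baseline that always predicts the majority class earns a high accuracy while its balanced accuracy is 0.5 and its F1 score is 0, which is the kind of disagreement between metrics that a visualization could make obvious to the user.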
A similar line of work will be devoted to regression models. Finally, a short scoping study will review the potential for similar approaches to modelling projects with more complex outputs: multi-class classifiers, multi-output regressors, etc.
The scoping project will deliver its results as a cloud-ready container. Further work could follow this initial project to produce a tool suitable for non-technical users.
The design will draw on the book Information Visualization: Perception for Design by Colin Ware, whose insights have so far been underexploited by software developers and by the machine learning community.
The project will be guided by Chris Morris, STFC, who recently prepared and taught a course on Machine Learning for Cheminformatics.
Principal Investigator: Professor Michael Boniface