In the current Information Age, data has become a commodity that drives development crucial to future economic success, particularly for service-based economies such as the UK. The potential to transform the economic landscape is tantalising, from providing businesses with strategic advantage or new services to revolutionising medical diagnostics, among many other benefits to society that were unthinkable only a decade ago. However, this potential cannot be realised unless new methods for handling, analysing, and extracting knowledge from data are made available. This is particularly relevant in the context of Big Data, where scalable techniques and algorithms are vitally important. The emerging field of Data Science usually refers to the interface between Statistics, Mathematics, and Computer Science, where the cross-fertilisation of ideas between these complementary domains is providing much-needed novel techniques and approaches. Data Science is rapidly gathering momentum and opens promising new research avenues for the near future. In recognition of this momentum, EPSRC have established the Alan Turing Institute to promote advanced research and translational work in the application of data science, acknowledging that this requires leadership in both advanced mathematics and computing science.
The University of Southampton (UoS) is exceptionally well placed to contribute to this emerging field, owing to its strong reputation in each of the core subjects and to the interdisciplinary, collaborative and entrepreneurial environment it has cultivated over the last decade or so. Tangible evidence of this is the first Southampton Data Workshop, which we organised in September 2014 and which brought together Southampton data researchers and practitioners with local businesses and organisations that have an interest in data (workshop programme attached). This two-day event highlighted the human and technical capabilities at Southampton, as well as a keen interest in collaborative work. The round-table discussion that concluded the event made clear the need for follow-up events and other consolidating initiatives, and for outreach to national and international institutions; it is in this context that we put forward this proposal.
The Southampton Statistical Sciences Research Institute (S3RI) brings together researchers from Mathematical Sciences, Social Sciences, Medicine and other disciplines to develop statistical methodology for data analysis, motivated by real-world applications. In Mathematical Sciences, the Pure Mathematics Group is heavily involved in translating and developing ideas from pure mathematics for Data Science, such as Topological and Geometrical Data Analysis, mostly in partnership with Computer Science and the Life Sciences. The Applied Mathematics Group works on data assimilation for biological and geophysical models, as well as on theoretical methods for Machine Learning from Big Data. The Operational Research Group is at the forefront of research in data-driven decision analysis and optimal decision-making. Interest in Data Science in general, and Machine Learning in particular, is spread across several research groups in Electronics and Computer Science (ECS), spanning both theoretical and algorithmic work and a wide range of applied problems, such as Computational Biology and Signal Analysis. ECS teaches several undergraduate and postgraduate modules in machine learning and has launched a new MSc programme in Data Science starting in 2015/16. The Administrative Data Research Centre for England (ADRC-E), directed by Professor PWF Smith, is an ESRC-funded centre that enables information routinely collected by government departments and other agencies to be shared anonymously and securely, and transformed into knowledge and evidence to inform public and economic policy.
The Institute for Life Sciences (IfLS) is structured around four Grand Challenges, including the “Human Nexus”. This challenge seeks to model complex biological systems across multiple scales, using large and complex data that describe these systems at different levels, and tackles problems such as data noise, the integration of disparate datasets, and visualisation and analysis. It also aims to serve as a catalyst for cross-disciplinary research and as a hub for exchanging expertise in Data Science. The UoS hosts the Web Observatory, which curates and links to data on the Web and coordinates research activities with the broader scientific community. It also hosts the Iridis computer cluster, one of the largest computational facilities in the UK, which provides high-performance computing to the research community at Southampton. Other research groups at Southampton would also benefit from having Data Science expertise, particularly in Big Data, on their doorstep, and we anticipate that they will participate in and contribute to the activities in this proposal.
In addition to the conference planned for January 2016, we propose the following activities:
1. Sandpit/hands-on events (up to three). We will run a competitive call in September for up to three one-day intensive events to be held before March 2016. These may be either research sandpits around key challenges that have been identified, or hands-on training events on a specific topic in Data Science. Priority will be given to cross-disciplinary themes. The call will be advertised across the UoS and at other institutions and organisations via the list of contacts below. These one-day events will allow researchers and practitioners to meet in a more informal, less structured setting, to kick-start collaborative work around a particular research challenge or fill a specialised training gap, and to help break down communication barriers between disciplines.
2. Travel grants (up to six). We will fund up to five outgoing visits by UoS members of staff to other institutions or organisations in the context of Data Science. We will give preference to cross-disciplinary visits that show clear promise of returning substantial benefit to the wider UoS Data Science community (e.g. transfer or exchange of skills, or development of collaborations). We will review applications and allocate travel grants in two rounds, in September and December, for visits taking place up to March 2016. Each visit should last up to one week; the grant will cover transport, accommodation and subsistence costs, and a short scientific report must be submitted at the end of the visit.
3. Visitor grants (up to three). We will cover transport, accommodation and subsistence costs for up to three individuals with a high profile in Data Science to visit the UoS. Visits should be proposed by UoS members of staff, should last at least three days, and must include at least one public presentation aimed at a non-specialist audience. We will review applications and allocate visitor grants in two rounds, in September and December, for visits taking place up to March 2016.
4. Data Science Education working group. One fundamental challenge in this emerging field is training the next generation of data scientists. We will constitute a working group to review how Data Science education activities across the UoS are coordinated and whether the full potential for interdisciplinary training is being realised. In this way, we hope to ensure that current and prospective Southampton students benefit from the breadth and depth of expertise in data analysis across the University. The working group will produce a report setting out its key findings and recommendations.
5. Media internship (one). We will advertise a part-time Data Science Media Internship among our students and recent alumni. The successful candidate will help produce multimedia content (pictures, videos, text) for and about this project, such as photographs, recorded presentations and interviews with key people, as well as preparing written material and creating and maintaining a Southampton Data Science blog and associated Twitter account. The material produced will be used to raise the UoS profile in Data Science, both for the proposed activities and for future ones. We will advertise the internship in September and appoint a candidate by October, allocating up to 120 hours over six months (October to March), plus a small budget for equipment. The ideal candidate will have a background in and/or interest in Data Science, and experience in producing media content, particularly for the web.