University of Southampton Electronics and Computer Science
Professional Engagement

Research areas

Stuart's research interests lie in natural language processing / computational linguistics, with a focus on information extraction and human-in-the-loop NLP.

Natural Language Processing

Information Extraction - text/behaviour classification, few/zero shot learning, relation extraction, semantic role labelling, knowledge-base population, named entity recognition, graph-based knowledge extraction, geoparsing/location extraction, tabular data extraction, temporal extraction and event/topic detection.
Human-in-the-loop NLP - socio-technical NLP, human-in-the-loop AI, interactive sense making, active learning, adversarial training.
Other - trustworthy AI, argument mining, digital text forensics, domain adaptation.

Domain expertise

Law enforcement, Health, Environment, Sensors, Misinformation, Legal, Crisis Management, Defence and Security.

Research Focus

Stuart's research interest lies in the natural language processing areas of information extraction and human-in-the-loop NLP, developing novel algorithms to discover and exploit patterns in free text and metadata to extract actionable human intelligence and machine-readable knowledge. In contrast to big data approaches, his research has focussed on developing novel solutions to problems where training sets are small, sparse or fragmented in nature. Such problems are very common in areas such as social media posts during breaking news events, emerging topics within online community forums, criminal marketplaces exhibiting deliberate obfuscation, and historical datasets where information can be inaccurately recorded, corrupted or lost over time. He is interested in investigating socio-technical NLP approaches promoting explainable and trustworthy AI, human-in-the-loop approaches and information extraction based on techniques such as few/zero-shot learning, graph-based models, sentence embeddings, domain adaptation and argument mining.

Examples of Impact and Outreach

Commercial : Innovate UK, Tackling challenges, building prosperity: The Industrial Strategy Challenge Fund, Orbital Witness: New technology spots key legal issues in real estate transactions, NLP research impact from LPLP project 2021 Innovate UK link, PDF (see page 47)

Policy : Middleton, S.E., Invited AI Expert, UK Cabinet Office, London, Ministerial AI Roundtable: use of AI in policing, chaired by Policing Minister Nick Hurd, July 2019 FloraGuard outputs

Software : Middleton, S.E., geoparsing algorithm 'geoparsepy' is available open source from PyPI, averaging 1,500 downloads a month in 2021 [source] PyPI stats geoparsepy

Outreach : Cowell, C. Sajeva, M. Lavorgna, A. Middleton, S.E. Clarke, G. FloraGuard webinar, Royal Botanic Gardens, Kew, 2020, stakeholder analysis [314 registered, 170 attended live, 50 countries, major stakeholders such as DEFRA, WWF, US Dept of Justice, UN Office on Drugs and Crime (UNODC), European Commission and CITES] vimeo 1h 30mins duration

Outreach : Riley, S. Middleton, S.E. Chamberlain, A. Shukla, P. Living With AI Podcast: Challenges of Living with Artificial Intelligence, AI Music: Duelling against a Robot Piano! April 21, 2021 Sean Riley Season 1 Episode 14 Podcast, TAS Hub

Steering Committees, Panels, Session Chair, Editorial Positions

Turing Fellow - 2021+
Organising Committee and Workshop Co-chair - RUSI and UKRI TAS Hub conference, Trusting Machines? Cross-sector Lessons from Healthcare and Security 2021
Sector Leads Committee - UKRI Trustworthy Autonomous Systems (TAS) Hub 2020+
Full Member - EPSRC Peer Review College 2021+
Guest Editor - MDPI Sensors journal 2021 special issue 'Sensors Application on Early Warning System'
Session Chair - ECAI 2020
Steering Committee (chair) - ACM WebSci'20 Workshop 2020, Socio-technical AI systems for defence, cybercrime and cybersecurity [STAIDCC20]
Session Chair - ACM WebSci 2020
Invited Expert - UK Cabinet Office Ministerial AI Roundtable event 2019 on 'use of AI in policing'
Invited Expert - ATI/DSTL workshop 2019 on 'Decision Support for Military Commanders'
Steering Committee - RGS-IBG Annual Conference 2018, Using New Forms of Data in Research Session Convenor
Steering Committee (short paper/demo chair) - IEEE International Conference on Intelligent Environments [IE] 2016 Posters & Short Paper Track Chair
Steering Committee - MediaEval Benchmarking Initiative for Multimedia Evaluation [MediaEval] 2016 Verifying Multimedia Use Task Committee
Invited Expert - BBC South Today

Research projects

ProTechThem : an ESRC funded project. ProTechThem will explore sharenting (parents sharing online information about minors). Motivation for sharenting and automated detection of risk behaviours online will be explored through online ethnography, criminological analysis and Natural Language Processing (NLP) algorithms to support improvement to cybersecurity behaviours.

SafeSpacesNLP : a UKRI TAS Hub funded project. Behaviour classification NLP in a socio-technical AI setting for online harmful behaviours for children and young people. Exploring human-in-the-loop graph-based and few-shot NLP models for behaviour classification of online forum posts.

GloSAT : a UK NERC platform grant. Global Surface Air Temperature (GloSAT) aims to improve understanding of climate variability and change. Objectives include information extraction and data rescue of climate change sensors data from historical texts.

Multimodal audio-textual argumentation mining of political debates : a Web Science Institute grant. Development of a multimodal dataset for training NLP models to perform argument mining of political debates.

CYShadowWatch : a UK DSTL funded project. Automated Multilingual Information Extraction for Online Cybercrime Sites. CYShadowWatch will explore statistical machine translation and information extraction of online Russian cybercrime forums.

Legal & Property Language Processing (LPLP) project : an Innovate UK funded project. LPLP will develop cutting-edge AI techniques to extract and analyse legal rights and obligations related to property and land. Objectives include the development of Natural Language Processing algorithms to extract legal rights and obligations from Land Registry documents and the development of machine learning based legal risk models for property and land.

FloraGuard project : a UK ESRC funded project. FloraGuard will examine and map from a multidisciplinary perspective the criminal market in endangered plants affecting the UK. Quantitative evidence will come from a combination of surface (web forums, social media) and dark web (TOR forums) crawling of cyber-criminal activity, with natural language processing and machine learning used to map this activity socio-economically at a community level.

Intel-Analysis DSTL : a UK DSTL funded project. Intel-Analysis DSTL uses argumentation schemes and evidential reasoning to support teams of analysts trying to evaluate conflicting hypotheses during real-time events. Evidence is obtained in real-time from a combination of human intelligence reports and information extraction from social media via natural language processing.

REVEAL project : an EU funded FP7 project. REVEAL aims to advance the necessary technologies for making a higher level analysis of social media possible. Focussed on social media verification, including digital text forensics, trust and credibility analytics and decision support for journalists verifying user generated content.

Digital Police Officer (DPO) project : a UK WSI funded project. The DPO project aims to apply linguistic analysis to identify cyber criminals operating under pseudonyms on different online forums and within the same forum. The project will apply natural language processing techniques guided by insights from criminology.

GRAVITATE project : an EU funded H2020 project. Focussed on supporting geometric reconstruction and semantic reunification of cultural heritage objects using techniques such as semantic enrichment using natural language processing, graph matching and 3D geometric matching.

OFERTIE project : an EU funded FP7 project. OFERTIE aims to enhance and use the OFELIA Testbed for OpenFlow Programmable Networking to run experiments to establish how programmable networks can be used to support technical solutions such as multicast and managed QoS, and what business models and value chains would be able to use these solutions in an economically sustainable fashion.

TRIDEC project : an EU funded FP7 project. Focuses on context aware semantic information retrieval and data fusion for crisis management in the Tsunami early warning and Oil rig drilling domains. Work includes geospatial sensor information fusion for decision support, task context management and context aware information filtering of real-time sensor and video event streams.

ENVIROFI project : an EU funded FP7 project. Focuses on context aware semantic information fusion and the creation of future internet environmental enablers. Work includes geospatial sensor information fusion for marine and biodiversity domains, uncertainty context management and context aware information filtering of geo-distributed heterogeneous data streams publishing sensor time series, satellite images, video and web 2.0.

DESURBS project : an EU funded FP7 project. Focussed on knowledge-based decision support tools to help planning organizations (councils, city planners, companies) better understand the vulnerabilities and design possibilities when designing safer urban spaces. Work includes semantic enrichment, personalization of best practice reports, advanced visualization and use of mapping/charting tooling.

IRMOS project : an EU funded FP7 project. Focuses on application performance modelling for use in automated Cloud resource provisioning. Work includes using semantically annotated UML diagrams to produce discrete event simulations and optimised cloud provisioning strategies.

SANY project : an EU funded FP6 project. Focuses on interoperability of in-situ sensors and sensor networks. Work includes building an OGC compliant generic sensor information fusion infrastructure and use of semantic OGC standards to handle fusion processes and application datasets.

MUPPITS and POSTMARK projects : UK TSB funded projects. Focus on media management for post-production companies. Work includes the creation of a media data and metadata warehouse with auditable event tracking and automated media management. Later work focussed on business models and exploitation paths via media partners such as Pinewood Studios.

POLYMNIA project : an EU funded FP6 project. An intelligent cross-media platform for personalised leisure and entertainment in thematic parks or venues. Work included an automated video media production tool encoding directorial knowledge for automated personalized video editing.

PrestoSpace project : an EU funded FP6 project. The project's objective is to provide technical solutions and integrated systems for a complete digital preservation of all kinds of audio-visual collections. Work included a distributed rendering system for video restoration and the overall distributed control of mixed-media restoration sub-systems.

Moretea project : an EPSRC project. Electronic notebook project to improve the information environment for chemists doing chemistry - within and beyond the lab using semantic search within a secure environment.

SCULPTEUR project : an EU funded FP5 semantic web project. A digital library system for searching and retrieval of diverse multimedia representations to support the work of professional users in the fine arts. Work included semantic search and retrieval of mixed-media archives.

GEMSS project : an EU funded FP5 medical Grid project. Six medical imaging applications are supported within a secure, commercial Grid infrastructure. Work included Grid-based negotiation for medical simulation quality of service.

Quickstep project : a hybrid collaborative/content-based recommender system to recommend on-line research papers. Uses kNN multi-class paper classification and an ontology to enhance the profiling process. Two trials of the system were conducted, both lasting 1.5 months with 14 and 24 subjects. The results demonstrated the utility of using an ontological approach to user profiling and how applying domain knowledge can enhance profiling.
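As a rough illustration of the kNN multi-class paper classification step mentioned above, the following Python sketch labels a paper with the majority topic among its k most similar labelled examples. It is a minimal sketch under stated assumptions, not the Quickstep implementation: the bag-of-words term vectors, the cosine similarity measure, the function names and the tiny `LABELLED` training set are all hypothetical, and the ontology-based profiling is not shown.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (an assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, labelled, k=3):
    """Return the majority topic among the k most similar labelled papers."""
    query = term_vector(text)
    ranked = sorted(
        labelled,
        key=lambda item: cosine(query, term_vector(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled examples, invented purely for illustration.
LABELLED = [
    ("neural network training with gradient descent", "machine learning"),
    ("support vector machine classification of text", "machine learning"),
    ("decision tree learning for classification", "machine learning"),
    ("ontology languages for the semantic web", "semantic web"),
    ("rdf triples and ontology reasoning on the web", "semantic web"),
    ("hypertext link structures and web navigation", "hypermedia"),
]

print(knn_classify("ontology reasoning for semantic web agents", LABELLED))
```

In a sketch like this, the choice of k trades robustness against sensitivity to small topic classes; with very few labelled papers per topic, a small k is usually the safer assumption.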

Foxtrot project : evolution of the Quickstep recommender system. Uses Pearson's r correlation to recommend and kNN classification to profile user interests. An ontological approach is taken to represent user profiles. This allows users to visualize and edit profiles, encouraging direct feedback on what the system thinks they are interested in. A year-long trial is in progress with hundreds of staff, postgraduates and undergraduates from the university to evaluate the utility of this approach.
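The use of Pearson's r correlation for recommendation can be sketched as follows. This is an illustrative example only, not the Foxtrot code: the profile structure (topic mapped to interest score), the rule of ignoring non-positively correlated users, the function names and the `PROFILES` data are all assumptions made for the sketch.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n < 2:
        return 0.0
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def recommend(target, profiles, top_n=2):
    """Rank topics missing from the target profile, weighted by user similarity."""
    own = profiles[target]
    scores = {}
    for user, profile in profiles.items():
        if user == target:
            continue
        shared = [topic for topic in profile if topic in own]
        r = pearson_r([own[t] for t in shared], [profile[t] for t in shared])
        if r <= 0:
            continue  # assumed rule: ignore uncorrelated or dissimilar users
        for topic, interest in profile.items():
            if topic not in own:
                scores[topic] = scores.get(topic, 0.0) + r * interest
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical interest profiles (topic -> interest score), for illustration.
PROFILES = {
    "alice": {"agents": 5, "hypermedia": 1, "ontologies": 4},
    "bob": {"agents": 4, "hypermedia": 2, "ontologies": 5, "recommender systems": 5},
    "carol": {"agents": 1, "hypermedia": 5, "ontologies": 2, "e-learning": 5},
}

print(recommend("alice", PROFILES))
```

Here only users whose interests correlate positively with the target contribute to the ranking, so topics favoured by users with opposite interests are not suggested.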

See the publications link for details of the above work.
