The University of Southampton
Centre for Democratic Futures

Dr Stuart Middleton 

Lecturer in Computer Science


Lecturer in Computer Science. My research interests focus on natural language processing, computational linguistics, information extraction and machine learning. I have been Principal Investigator (PI) and Co-Investigator (CoI) on various Research Council, EU H2020, Innovate UK, DSTL and Home Office projects. Many of these projects are cross-disciplinary, with consortia that combine academic and commercial partners experienced in a range of domains and disciplines. My PhD, supervised by Prof Dave de Roure and Prof Sir Nigel Shadbolt, investigated recommender systems and ontologies and was completed in October 2002.

Research interests

Natural Language Processing, Computational linguistics, Information Extraction, Machine Learning

My research lies between natural language processing and information extraction, developing novel algorithms that discover and exploit patterns in free text and metadata to extract actionable human intelligence and machine-readable knowledge. In contrast to big data approaches, my research has focussed on problems where training sets are small, sparse or fragmented. Such data is common in social media posts during breaking news events, emerging topics within online community forums, criminal marketplaces that deliberately obfuscate their activity, and historical datasets where information may have been inaccurately recorded, corrupted or lost over time. I am interested in emergent patterns in the language use of communities, zero/few-shot learning, domain adaptation, the use of domain knowledge to fine-tune algorithms, and approaches that preserve domain semantics and provenance and promote explainable, trustworthy AI.

Research outputs

geoparsepy (PyPI)

Open source software (GitHub)

Research projects

GloSAT - Global Surface Air Temperature : UK NERC platform grant. GloSAT will explore data rescue of climate change data records, including novel NLP approaches for historical sensor record documents scanned using optical character recognition (OCR) and machine learning for mis-positioned ship track detection.

CYShadowWatch - Automated Multilingual Information Extraction for Online Cybercrime Sites : a DSTL and NCA funded project. CYShadowWatch will explore how statistical machine translation (MT) and information extraction (IE) algorithms can be applied to Russian cybercrime forums to deliver intelligence packages for UK law enforcement.

LPLP - Legal and property language processing : an Innovate UK funded project. LPLP will develop cutting-edge AI techniques to extract and analyse legal rights and obligations relating to property and land, including Natural Language Processing (NLP) algorithms that extract these rights and obligations from HM Land Registry documents.

FloraGuard project : a UK ESRC funded project. FloraGuard will examine and map, from a multidisciplinary perspective, the criminal market in endangered plants affecting the UK. Quantitative evidence will come from crawling cyber-criminal activity on both the surface web (web forums, social media) and the dark web (TOR forums); natural language processing and machine learning will then be used to map this activity socio-economically at a community level.

Intel-Analysis DSTL : a UK DSTL funded project. Intel-Analysis DSTL uses argumentation schemes and evidential reasoning to support teams of analysts trying to evaluate conflicting hypotheses during real-time events. Evidence is obtained in real-time from a combination of human intelligence reports and information extraction from social media via natural language processing.

REVEAL project : an EU funded FP7 project. REVEAL aims to advance the technologies needed to make higher-level analysis of social media possible. It focusses on social media verification, including digital text forensics, trust and credibility analytics, and decision support for journalists verifying user-generated content.

GRAVITATE project : an EU funded H2020 project. Focussed on supporting geometric reconstruction and semantic reunification of cultural heritage objects using techniques such as semantic enrichment using natural language processing, graph matching and 3D geometric matching.

Digital Police Officer (DPO) project : a UK WSI funded project. The DPO project aims to apply linguistic analysis to identify cyber criminals operating under pseudonyms on different online forums and within the same forum. The project will apply natural language processing techniques guided by insights from criminology.

OFERTIE project : an EU funded FP7 project. OFERTIE aims to enhance the OFELIA Testbed for OpenFlow Programmable Networking and use it to run experiments establishing how programmable networks can support technical solutions such as multicast and managed QoS, and which business models and value chains could use these solutions in an economically sustainable way.

TRIDEC project : an EU funded FP7 project. Focuses on context aware semantic information retrieval and data fusion for crisis management in the tsunami early warning and oil rig drilling domains. Work includes geospatial sensor information fusion for decision support, task context management, and context aware information filtering of real-time sensor and video event streams.

ENVIROFI project : an EU funded FP7 project. Focuses on context aware semantic information fusion and the creation of future internet environmental enablers. Work includes geospatial sensor information fusion for marine and biodiversity domains, uncertainty context management and context aware information filtering of geo-distributed heterogeneous data streams publishing sensor time series, satellite images, video and web 2.0.




  • Klopfer, M. (Ed.), Simonis, I. (Ed.), Bleier, T., Bozic, B., Bumerl-Lexa, R., da Costa, A., Costes, S., Iosifescu, I., Martin, O., Frysinger, S., Havlik, D., Hilbring, D., Jacques, P., Klopfer, M., Kunz, S., Kutschera, P., Lidstone, M., Middleton, S. E., Roberts, Z., ... Wittamore, K. (2009). SANY: an open service architecture for sensor networks. SANY Consortium.



COMP3225 Natural Language Processing
COMP3222/COMP6246 Machine Learning Technologies
Dr Stuart Middleton
Building 58
Room 58/3087

Room Number : 32/4027A
