University of Southampton Electronics and Computer Science
Research


Research areas

Stuart's research interests are focussed on natural language processing / computational linguistics and information extraction.

Information extraction - extraction and use of syntactic, lexical and semantic information to intelligently search, index, link and infer within and between application domains. This includes relation extraction, named entity recognition, knowledge base population, geoparsing/location extraction, temporal extraction and event/topic detection.

Computational linguistics / Natural Language Processing - exploring new approaches in computational semantics, including lexical semantics (word sense disambiguation, semantic role labelling), statistical semantics, argumentation mining, digital text forensics, knowledge representation and inference.

Domain expertise - law enforcement, media, sensors, environment, legal, health, crisis management and defence.

Stuart's research interest lies in the area between natural language processing and information extraction, developing novel algorithms to discover and exploit patterns in free text and metadata in order to extract actionable human intelligence and machine-readable knowledge. In contrast to big data approaches, his research has focussed on developing novel solutions to problems where training sets are small, sparse or fragmented. Such problems are common in areas such as social media posts during breaking news events, emerging topics within online community forums, criminal marketplaces exhibiting deliberate obfuscation, and historical datasets where information may have been inaccurately recorded, corrupted or lost over time. Stuart is interested in investigating emergent patterns in language use by communities, zero/few-shot learning, domain adaptation, the use of domain knowledge to fine-tune algorithms, and approaches that preserve domain semantics and provenance and promote explainable and trustworthy AI.

Stuart's geoparsing algorithm, 'geoparsepy', is available as open source from PyPI [averaging 50 to 60 downloads a month: source pypistats.org]. Together with its annotated geoparsing dataset, it has become a popular benchmark for researchers within this community.

Steering committees, panels, editorial positions

ACM WebSci'20 Workshop 2020, Socio-technical AI systems for defence, cybercrime and cybersecurity [STAIDCC20]
Guest editor MDPI Sensors journal 2020 special issue “Sensors Application on Early Warning System”
UK Cabinet Office Ministerial AI Roundtable event 2019 on “use of AI in policing”
ATI/DSTL workshop 2019 on “Decision Support for Military Commanders”
RGS-IBG Annual Conference 2018, session convenor for “Using New Forms of Data in Research”
IEEE International Conference on Intelligent Environments [IE] 2016 Posters & Short Paper Track Chair
MediaEval Benchmarking Initiative for Multimedia Evaluation [MediaEval] 2016 Verifying Multimedia Use Task Committee
Invited expert for BBC South Today

Programme Committees

Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing (AACL-IJCNLP) 2020
European Conference on Artificial Intelligence (ECAI) 2020
International Conference on Data Science and Advanced Analytics (DSAA) 2018 to 2020
ACM International Conference on Multimedia (ACMMM) 2019
Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) 2018
IEEE International Conference on Intelligent Environments (IE) 2016 to 2019
Conversations on chatbots workshop (CONVERSATIONS) in conjunction with INSCI 2017 to 2018
Workshop on Internet for Financial Collective Awareness and Intelligence (IFIN) 2016 to 2017
Workshop on Social News on the Web (SNOW) 2016
Workshop on Web Multimedia Verification (WeMuV) 2015
Intelligent Personalization Workshop @ International Joint Conferences on Artificial Intelligence (IJCAI) 2015
IEEE/WIC/ACM Workshop on Web Personalization, Recommender Systems and Social Media (WPRSM) 2009 to 2014
ACM Conference on Recommender Systems (RECSYS) 2008 to 2013
International Conference on Electronic Commerce and Web Technologies (EC-Web) 2009 to 2013
User Modeling and User-Adapted Interaction: The Journal of Personalization Research (UMUAI) 2013
AAAI Intelligent Techniques for Web Personalization (ITWP) 2004 to 2012
IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI) 2010
IEEE Topic Feature Discovery and Opinion Mining (TFDOM) 2010
IEEE Recommender Systems and Personalized Retrieval (RSPR) 2008
ACM Genetic and Evolutionary Computation Conference (GECCO) 2004

Research projects

GloSAT - Global Surface Air Temperature : a UK NERC platform grant. GloSAT aims to improve understanding of climate variability and change. Objectives include information extraction and data rescue, recovering climate sensor data from historical texts.

CYShadowWatch - Automated Multilingual Information Extraction for Online Cybercrime Sites : a UK DSTL funded project. CYShadowWatch will explore statistical machine translation and information extraction from online Russian cybercrime forums.

Legal & Property Language Processing (LPLP) project : an Innovate UK funded project. LPLP will develop cutting-edge AI techniques to extract and analyse legal rights and obligations related to property and land. Objectives include the development of Natural Language Processing algorithms to extract legal rights and obligations from Land Registry documents and the development of machine learning based legal risk models for property and land.

FloraGuard project : a UK ESRC funded project. FloraGuard will examine and map, from a multidisciplinary perspective, the criminal market in endangered plants affecting the UK. Quantitative evidence will come from crawling cybercriminal activity on both the surface web (web forums, social media) and the dark web (Tor forums); natural language processing and machine learning will be used to map this activity socio-economically at a community level.

Intel-Analysis DSTL : a UK DSTL funded project. Intel-Analysis DSTL uses argumentation schemes and evidential reasoning to support teams of analysts trying to evaluate conflicting hypotheses during real-time events. Evidence is obtained in real-time from a combination of human intelligence reports and information extraction from social media via natural language processing.

REVEAL project : an EU funded FP7 project. REVEAL aims to advance the technologies needed to make higher-level analysis of social media possible. Focussed on social media verification, including digital text forensics, trust and credibility analytics and decision support for journalists verifying user generated content.

GRAVITATE project : an EU funded H2020 project. Focussed on supporting geometric reconstruction and semantic reunification of cultural heritage objects using techniques such as semantic enrichment via natural language processing, graph matching and 3D geometric matching.

Digital Police Officer (DPO) project : a UK WSI funded project. The DPO project aims to apply linguistic analysis to identify cyber criminals operating under pseudonyms on different online forums and within the same forum. The project will apply natural language processing techniques guided by insights from criminology.

OFERTIE project : an EU funded FP7 project. OFERTIE aims to enhance and use the OFELIA Testbed for OpenFlow Programmable Networking to run experiments to establish how programmable networks can be used to support technical solutions such as multicast and managed QoS, and what business models and value chains would be able to use these solutions in an economically sustainable fashion.

TRIDEC project : an EU funded FP7 project. Focuses on context aware semantic information retrieval and data fusion for crisis management in the tsunami early warning and oil rig drilling domains. Work includes geospatial sensor information fusion for decision support, task context management and context aware information filtering of real-time sensor and video event streams.

ENVIROFI project : an EU funded FP7 project. Focuses on context aware semantic information fusion and the creation of future internet environmental enablers. Work includes geospatial sensor information fusion for marine and biodiversity domains, uncertainty context management and context aware information filtering of geo-distributed heterogeneous data streams publishing sensor time series, satellite images, video and Web 2.0 content.

DESURBS project : an EU funded FP7 project. Focussed on knowledge-based decision support tools to help planning organizations (councils, city planners, companies) better understand the vulnerabilities and design possibilities when designing safer urban spaces. Work includes semantic enrichment, personalization of best practice reports, advanced visualization and use of mapping/charting tooling.

IRMOS project : an EU funded FP7 project. Focuses on application performance modelling for use in automated Cloud resource provisioning. Work includes using semantically annotated UML diagrams to produce discrete event simulations and optimised cloud provisioning strategies.

SANY project : an EU funded FP6 project. Focuses on interoperability of in-situ sensors and sensor networks. Work includes building an OGC compliant generic sensor information fusion infrastructure and use of semantic OGC standards to handle fusion processes and application datasets.

MUPPITS and POSTMARK projects : UK TSB funded projects. Focus on media management for post-production companies. Work includes the creation of a media data and metadata warehouse with auditable event tracking and automated media management. Later work focussed on business models and exploitation paths via media partners such as Pinewood Studios.

POLYMNIA project : an EU funded FP6 project. An intelligent cross-media platform for personalised leisure and entertainment in theme parks or venues. Work included an automated video media production tool encoding directorial knowledge for automated personalized video editing.

PrestoSpace project : an EU funded FP6 project. The project's objective is to provide technical solutions and integrated systems for a complete digital preservation of all kinds of audio-visual collections. Work included a distributed rendering system for video restoration and the overall distributed control of mixed-media restoration sub-systems.

Moretea project : an EPSRC project. Electronic notebook project to improve the information environment for chemists, within and beyond the lab, using semantic search within a secure environment.

SCULPTEUR project : an EU funded FP5 semantic web project. A digital library system for searching and retrieval of diverse multimedia representations to support the work of professional users in the fine arts. Work included semantic search and retrieval of mixed-media archives.

GEMSS project : an EU funded FP5 medical Grid project. Six medical imaging applications are supported within a secure, commercial Grid infrastructure. Work included Grid-based negotiation of quality of service for medical simulations.

Quickstep project : a hybrid collaborative/content-based recommender system to recommend on-line research papers. Uses kNN multi-class paper classification and an ontology to enhance the profiling process. Two trials of the system were conducted, each lasting 1.5 months, with 14 and 24 subjects respectively. The results demonstrated the utility of using an ontological approach to user profiling and how applying domain knowledge can enhance profiling.
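Quickstep's exact features and similarity measure are not described on this page. The following is a minimal Python sketch of kNN multi-class classification in general, assuming papers are represented as bag-of-words term vectors compared with cosine similarity. The function names (`term_vector`, `cosine`, `knn_classify`) and the labelled examples are hypothetical.

```python
from collections import Counter
import math

def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split).
    Assumption: real systems would also stem words and drop stop words."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_classify(text, labelled, k=3):
    """Return the majority topic among the k labelled examples most similar to text."""
    query = term_vector(text)
    ranked = sorted(labelled,
                    key=lambda ex: cosine(query, term_vector(ex[0])),
                    reverse=True)
    votes = Counter(topic for _, topic in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled training papers: (text, topic)
labelled = [
    ("recommender systems collaborative filtering users", "recommender systems"),
    ("user profiling ontology recommender", "recommender systems"),
    ("agent negotiation auction protocol", "agents"),
    ("multi agent negotiation strategy", "agents"),
]

print(knn_classify("ontology based recommender for users", labelled, k=3))
```

Each unseen paper takes the majority topic among its k nearest labelled neighbours. How Quickstep mapped such topics onto its ontology is not shown here.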

Foxtrot project : evolution of the Quickstep recommender system. Uses Pearson r correlation to recommend and kNN classification to profile user interests. An ontological approach is taken to represent user profiles, which allows users to visualize and edit their profiles, encouraging direct feedback on what the system thinks they are interested in. A year-long trial is in progress with hundreds of staff, postgraduates and undergraduates from the university to evaluate the utility of this approach.
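The page does not specify how Foxtrot applied Pearson r. The sketch below is for illustration only. It assumes that each user's interest profile is a list of scores over a shared, fixed order of topics. It also assumes that papers read by positively correlated users are recommended, weighted by the correlation. All names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant profiles; treat as no correlation
    return cov / math.sqrt(var_x * var_y)

def recommend(target, profiles, papers_read, top_n=2):
    """Rank papers read by users whose profiles correlate positively with target's.
    Assumption: profiles map user -> interest scores in a shared topic order."""
    already_read = set(papers_read.get(target, []))
    scores = {}
    for user, profile in profiles.items():
        if user == target:
            continue
        r = pearson_r(profiles[target], profile)
        if r <= 0:
            continue  # skip uncorrelated or anti-correlated users
        for paper in papers_read.get(user, []):
            if paper not in already_read:
                scores[paper] = scores.get(paper, 0.0) + r
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [paper for paper, _ in ranked[:top_n]]

# Hypothetical interest scores over three topics, and papers each user has read
profiles = {
    "alice": [0.9, 0.1, 0.5],
    "bob": [0.8, 0.2, 0.6],
    "carol": [0.1, 0.9, 0.2],
}
papers_read = {"alice": ["p1"], "bob": ["p1", "p2"], "carol": ["p3"]}

print(recommend("alice", profiles, papers_read))
```

Here bob's profile correlates positively with alice's and carol's does not. As a result, only bob's unread paper `p2` is suggested.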

See the publications link for details of the above work.
