Fluid dynamic DNNs for reliable and adaptive distributed inference on edge devices

Lei Xun, Mingyu Hu, Hengrui Zhao, Amit Kumar Singh, Jonathon Hare, Geoff V. Merrett
Conference or Workshop Item

Abstract

Distributed inference is a popular approach for efficient DNN inference at the edge. However, traditional Static and Dynamic DNNs are not distribution-friendly, causing system reliability and adaptability issues. In this paper, we introduce Fluid Dynamic DNNs (Fluid DyDNNs), tailored for distributed inference. Distinct from Static and Dynamic DNNs, Fluid DyDNNs utilize a novel nested incremental training algorithm to enable independent and combined operation of their sub-networks, enhancing system reliability and adaptability. Evaluation on embedded Arm CPUs, using a DNN model on the MNIST dataset, shows that in scenarios of single-device failure, Fluid DyDNNs ensure continued inference, whereas Static and Dynamic DNNs fail. When devices are fully operational, Fluid DyDNNs can operate either in a High-Accuracy mode, achieving accuracy comparable to Static DNNs, or in a High-Throughput mode, achieving 2.5x and 2x the throughput of Static and Dynamic DNNs, respectively.

Dynamic DNNs and runtime management for efficient inference on mobile/embedded devices

Lei Xun, Jonathon Hare, Geoff V. Merrett
Conference or Workshop Item

Abstract

Deep neural network (DNN) inference is increasingly being executed on mobile and embedded platforms due to several key advantages in latency, privacy and always-on availability. However, due to limited computing resources, efficient DNN deployment on mobile and embedded platforms is challenging. Although many hardware accelerators and static model compression methods have been proposed in previous work, at system runtime multiple applications typically execute concurrently and compete for hardware resources. This raises two main challenges: Runtime Hardware Availability and Runtime Application Variability. Previous works have addressed these challenges through either dynamic neural networks that contain sub-networks with different performance trade-offs, or runtime hardware resource management. In this thesis, we propose a combined method: a system for DNN performance trade-off management that combines the runtime trade-off opportunities in both algorithms and hardware to meet dynamically changing application performance targets and hardware constraints in real time. We co-designed novel Dynamic Super-Networks to maximise runtime system-level performance and energy efficiency on heterogeneous hardware platforms. Compared with the state of the art, our experimental results using ImageNet on the GPU of a Jetson Xavier NX show that our model is 2.4x faster at similar ImageNet Top-1 accuracy, or 5.1% more accurate at similar latency. We also designed a hierarchical runtime resource manager that tunes both dynamic neural networks and DVFS at runtime. Compared with the Linux DVFS governor schedutil, our runtime approach achieves up to a 19% energy reduction and a 9% latency reduction in a single-model deployment scenario, and an 89% energy reduction and a 23% latency reduction in a two-concurrent-model deployment scenario.

Dataset supporting the conference paper "Fluid dynamic DNNs for reliable and adaptive distributed inference on edge devices"

Mingyu Hu, Lei Xun, Hengrui Zhao, Amit Kumar Singh, Jonathon Hare, Geoff Merrett

Abstract

This dataset supports the publication: "Fluid Dynamic DNNs for Reliable and Adaptive Distributed Inference on Edge Devices" by Lei Xun, Mingyu Hu, Hengrui Zhao, Amit Kumar Singh, Jonathon Hare, Geoff V. Merrett CONFERENCE: Design, Automation and Test in Europe Conference 2024 This dataset includes the experimental results for Figure 2 of the paper, showing the throughput and accuracy of the different models (static, dynamic and fluid) considered under different distributed-system cases (master & worker, master, worker). This dataset contains: -'data.csv': Data supporting Fig. 2. The throughput and accuracy of the different models (static, dynamic and fluid) considered under different distributed-system cases (master & worker, master, worker). Related projects: International Centre for Spatial Computational Learning

Enabling ImageNet-scale deep learning on MCUs for accurate and efficient inference

Sulaiman Sadiq, Jonathon Hare, Simon Craske, Partha Maji, Geoff Merrett
Article

Abstract

Conventional approaches to TinyML achieve high accuracy by deploying the largest deep learning model, with the highest input resolution, that fits within the size constraints imposed by the microcontroller's (MCU) fast internal storage and memory. In this paper, we perform an in-depth analysis of prior works to show that models derived within these constraints suffer from low accuracy and, surprisingly, high latency. We propose an alternative approach that enables the deployment of efficient models with low inference latency, free from the constraints of internal memory. We take a holistic view of typical MCU architectures and utilise plentiful but slower external memories to relax internal storage and memory constraints. To avoid the lower speed of external memory impacting inference latency, we build on the TinyOps inference framework, which performs operation partitioning and uses overlays via DMA to reduce latency. Using insights from our study, we deploy efficient models from the TinyOps design space onto a range of embedded MCUs, achieving record performance on TinyML ImageNet classification with up to 6.7% higher accuracy and 1.4x faster inference compared to state-of-the-art internal memory approaches.

Dynamic DNNs meet runtime resource management for efficient heterogeneous computing

Lei Xun, Jonathon Hare, Geoff Merrett
Conference or Workshop Item

Abstract

Deep Neural Network (DNN) inference is increasingly being deployed on edge devices, driven by the advantages of lower latency and enhanced privacy. However, the deployment of these models on such platforms poses considerable challenges due to the intensive computation and memory access requirements. While various static model compression techniques have been proposed, they often struggle when adapting to the dynamic computing environments of modern heterogeneous platforms. The two main challenges we focus on in our research are: (1) Dynamic Hardware and Runtime Conditions: Modern edge devices are equipped with heterogeneous computing resources, including CPUs, GPUs, NPUs, and FPGAs. Their availability and performance can change dynamically during runtime, influenced by factors such as device state, power constraints, and thermal conditions. Moreover, DNN models may need to share resources with other applications or models, introducing an additional layer of complexity to the quest for consistent performance and efficiency. (2) Dynamic Application Requirements: The same DNN model can be used in a variety of applications, each with unique and potentially fluctuating performance requirements.

In this poster, we will explore the world of dynamic neural networks, with a particular focus on their role in efficient model deployment in dynamic computing environments. Our system leverages runtime trade-offs in both algorithms and hardware to optimize DNN performance and energy efficiency. A cornerstone of our system is the Dynamic-OFA, a dynamic version of the 'once-for-all network', designed to efficiently scale the ConvNet architecture to fit the dynamic application requirements and hardware resources. It exhibits strong generalization across different model architectures, such as Transformers. We will also discuss the benefits of integrating algorithmic techniques with hardware opportunities, including Dynamic Voltage and Frequency Scaling (DVFS) and task mapping. Our experimental results, using ImageNet on a Jetson Xavier NX, reveal that the Dynamic-OFA outperforms state-of-the-art Dynamic DNNs, offering up to 3.5x (CPU) and 2.4x (GPU) speed improvements for similar ImageNet Top-1 accuracy, or a 3.8% (CPU) and 5.1% (GPU) increase in accuracy at similar latency.

FedTM: Memory and Communication Efficient Federated Learning with Tsetlin Machine

Shannon How Shi Qi, Jagmohan Chauhan, Geoff V. Merrett, Jonathon Hare
Conference or Workshop Item

Abstract

Federated Learning (FL) has been an exciting development in machine learning, promising collaborative learning without compromising privacy. However, the resource-intensive nature of Deep Neural Networks (DNNs) has made it difficult to implement FL on edge devices. In a bold step towards addressing this challenge, we present FedTM, the first FL framework to utilize the Tsetlin Machine, a low-complexity machine learning alternative. We propose a two-step aggregation scheme for combining local parameters at the server, which addresses challenges such as data heterogeneity, varying participating client ratios and bit-based aggregation. Compared to conventional Federated Averaging (FedAvg) with Convolutional Neural Networks (CNNs), on average, FedTM provides a substantial 30.5× reduction in communication costs and a 36.6× reduction in storage memory footprint. Our results demonstrate that FedTM outperforms BiFL-BiML (SOTA) in every FL setting while providing a 1.37-7.6× reduction in communication costs and a 2.93-7.2× reduction in run-time memory on our evaluated datasets, making it a promising solution for edge devices.

Dataset supporting publication "Exploration of Decision Sub-Network Architectures for FPGA-based Dynamic DNNs"

Anastasios Dimitriou, Mingyu Hu, Jonathon Hare, Geoffrey Merrett

Abstract

This dataset supports the publication " Exploration of Decision Sub-Network Architectures for FPGA-based Dynamic DNNs " to be published in the Proceedings of the 2023 Design, Automation and Test in Europe Conference and Exhibition. This dataset contains: - 'Fig2a.csv': Data supporting Fig. 2 (a). Execution time in ms of the Dynamic Deep Neural Network on different platforms. (CPU, CPU+GPU, Jetson Xavier and FPGA Xilinx ZCU106). - 'Fig2b.csv': Data supporting Fig. 2 (b). Energy consumption and needed power for the execution of the Dynamic Deep Neural network on different platforms. (CPU, CPU+GPU, Jetson Xavier and FPGA Xilinx ZCU106). Related projects: Engineering and Physical Sciences Research Council (EPSRC) under EP/S030069/1 Licence: CC BY 4.0

Exploration of Decision Sub-Network Architectures for FPGA-based Dynamic DNNs

Anastasios Dimitriou, Mingyu Hu, Jonathon Hare, Geoff Merrett
Conference or Workshop Item

Abstract

Dynamic Deep Neural Networks (DNNs) can achieve faster and less computationally intensive inference by spending fewer resources on easy-to-recognise or less informative parts of an input. They make data-dependent decisions which strategically deactivate a model's components, e.g. layers, channels or sub-networks. However, dynamic DNNs have so far only been explored and applied on conventional computing systems (CPU+GPU), programmed with libraries designed for static networks, which limits their benefits. In this paper, we propose and explore two approaches for efficiently realising the sub-networks that make these decisions on FPGAs. A pipeline approach targets the use of the existing hardware to execute the sub-network, while a parallel approach uses dedicated circuitry for it. We explore the performance of each using the BranchyNet early-exit approach on LeNet-5, and evaluate on a Xilinx ZCU106. The pipeline approach is 36% faster than a desktop CPU. It consumes 0.51 mJ per inference, 16x lower than a non-dynamic network on the same platform and 8x lower than an Nvidia Jetson Xavier NX. The parallel approach executes 17% faster than the pipeline approach when no early exits are taken during dynamic inference, but incurs a 28% increase in energy consumption.
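
The early-exit mechanism itself is compact. Below is a minimal, illustrative PyTorch sketch of a BranchyNet-style early exit on a LeNet-like model; the layer sizes, the 0.9 confidence threshold and the softmax-confidence test are assumptions for illustration, not the configuration evaluated in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitLeNet(nn.Module):
    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        self.threshold = threshold
        self.stem = nn.Sequential(nn.Conv2d(1, 6, 5), nn.ReLU(), nn.MaxPool2d(2))
        # Decision sub-network: a cheap classifier attached after the stem
        self.exit1 = nn.Sequential(nn.Flatten(), nn.Linear(6 * 12 * 12, num_classes))
        self.tail = nn.Sequential(nn.Conv2d(6, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
                                  nn.Flatten(), nn.Linear(16 * 4 * 4, num_classes))

    def forward(self, x):
        h = self.stem(x)
        logits1 = self.exit1(h)
        confidence = F.softmax(logits1, dim=1).max(dim=1).values
        if confidence.min() >= self.threshold:  # confident enough: exit early
            return logits1
        return self.tail(h)                     # otherwise run the full network

model = EarlyExitLeNet().eval()
with torch.no_grad():
    print(model(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```

On an FPGA, the pipeline approach maps this decision sub-network onto the existing compute fabric, while the parallel approach gives it dedicated circuitry; the control flow above is the same in either case.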

Exploring the learning mechanisms of neural division modules

Bhumika Mistry, Katayoun Farrahi, Jonathon Hare
Article

Abstract

Of the four fundamental arithmetic operations (+, -, ×, ÷), division is considered the most difficult for both humans and computers. In this paper, we show that robustly learning division in a systematic manner remains a challenge even at the simplest level of dividing two numbers. We propose two novel approaches for division which we call the Neural Reciprocal Unit (NRU) and the Neural Multiplicative Reciprocal Unit (NMRU), and present improvements for an existing division module, the Real Neural Power Unit (Real NPU). In total we measure robustness over 475 different training sets for setups with and without input redundancy. We discover robustness is greatly affected by the input sign for the Real NPU and NRU, input magnitude for the NMRU, and input distribution for every module. Despite this issue, we show that the modules can learn as part of larger end-to-end networks.

Compositing foreground and background using variational autoencoders

Adam Prugel-Bennett, Zezhen Zeng, Jonathon Hare, Mounîm El Yacoubi, Eric Granger, Pong Chi Yuen, Umapada Pal, Nicole Vincent
Conference or Workshop Item

Abstract

We consider the problem of composing images by combining an arbitrary foreground object to some background. To achieve this we use a factorized latent space. Thus we introduce a model called the “Background and Foreground VAE” (BFVAE) that can combine arbitrary foreground and background from an image dataset to generate unseen images. To enhance the quality of the generated images we also propose a VAE-GAN mixed model called “Latent Space Renderer-GAN” (LSR-GAN). This substantially reduces the blurriness of BFVAE images.

A primer for neural arithmetic logic modules

Bhumika Mistry, Katayoun Farrahi, Jonathon Hare
Article

Abstract

Neural Arithmetic Logic Modules have become a growing area of interest, though remain a niche field. These modules are neural networks which aim to achieve systematic generalisation in learning arithmetic and/or logic operations such as {+,−,×,÷,≤,AND} while also being interpretable. This paper is the first in discussing the current state of progress of this field, explaining key works, starting with the Neural Arithmetic Logic Unit (NALU). Focusing on the shortcomings of the NALU, we provide an in-depth analysis to reason about design choices of recent modules. A cross-comparison between modules is made on experiment setups and findings, where we highlight inconsistencies in a fundamental experiment causing the inability to directly compare across papers. To alleviate the existing inconsistencies, we create a benchmark which compares all existing arithmetic NALMs. We finish by providing a novel discussion of existing applications for NALU and research directions requiring further exploration.
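
As background for readers new to the surveyed modules, the following is a minimal PyTorch sketch of the original NALU that the primer takes as its starting point: an additive path, a multiplicative log-space path, and a learned gate mixing the two. Dimensions and initialisation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NALU(nn.Module):
    def __init__(self, in_dim, out_dim, eps=1e-7):
        super().__init__()
        self.W_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.M_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.G = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.eps = eps

    def forward(self, x):
        # Effective weights are biased towards {-1, 0, 1}
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)
        a = x @ W.t()                                         # +/- path
        m = torch.exp(torch.log(x.abs() + self.eps) @ W.t())  # ×/÷ path (log space)
        g = torch.sigmoid(x @ self.G.t())                     # gate between paths
        return g * a + (1 - g) * m

layer = NALU(2, 1)
print(layer(torch.tensor([[4.0, 2.0]])).shape)  # torch.Size([1, 1])
```

The log-space trick is also the source of several shortcomings the paper analyses, e.g. taking |x| loses the sign of products, and inputs near zero destabilise training.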

Dynamic DNNs meet runtime resource management on mobile and embedded platforms

Lei Xun, Bashir Al-Hashimi, Jonathon Hare, Geoff Merrett
Conference or Workshop Item

Abstract

Deep neural network (DNN) inference is increasingly being executed on mobile and embedded platforms due to low latency and better privacy. However, efficient deployment on these platforms is challenging due to the intensive computation and memory access. We propose a holistic system design for DNN performance and energy optimisation, combining the trade-off opportunities in both algorithms and hardware. The system can be viewed as three abstract layers: the device layer contains heterogeneous computing resources; the application layer has multiple concurrent workloads; and the runtime resource management layer monitors the dynamically changing algorithms' performance targets as well as hardware resources and constraints, and tries to meet them by tuning the algorithm and hardware at the same time. Moreover, we illustrate the runtime approach through a dynamic version of the 'once-for-all network' (namely Dynamic-OFA), which can scale the ConvNet architecture to fit heterogeneous computing resources efficiently and has good generalisation for different model architectures such as Transformers. Compared to state-of-the-art Dynamic DNNs, our experimental results using ImageNet on a Jetson Xavier NX show that the Dynamic-OFA is up to 3.5x (CPU) and 2.4x (GPU) faster for similar ImageNet Top-1 accuracy, or 3.8% (CPU) and 5.1% (GPU) higher accuracy at similar latency. Furthermore, compared with Linux governors (e.g. performance, schedutil), our runtime approach reduces energy consumption by 16.5% at similar latency.

Image-based attitude determination of co-orbiting satellites using deep learning technologies

Benjamin Felix Guthrie, Minkwan Kim, Hodei Urrutxua, Jonathon Hare
Article

Abstract

Active debris removal missions pose demanding guidance, navigation and control requirements. We present a novel approach which applies deep learning technologies to the problem of attitude determination of an uncooperative debris satellite of a-priori unknown geometry. A siamese convolutional neural network is developed, which detects and tracks inherently useful landmarks from sensor data, after training upon synthetic datasets of visual, LiDAR or RGB-D data. The method is capable of real-time performance while improving upon conventional computer vision-based approaches, and generalises well to previously unseen object geometries, enabling this approach to be a feasible solution for safely performing guidance and navigation in active debris removal, satellite servicing and other close-proximity operations. The performance of the algorithm, its sensitivity to model parameters and its robustness to illumination and shadowing conditions are analysed via numerical simulation.

Unsupervised representation learning via information compression

Zezhen Zeng, Jonathon Hare, Adam Prugel-Bennett, Mounîm El Yacoubi, Eric Granger, Pong Chi Yuen, Umapada Pal, Nicole Vincent
Conference or Workshop Item

Abstract

This paper explores a new paradigm for decomposing an image by seeking a compressed representation of the image through an information bottleneck. The compression is achieved iteratively by refining the reconstruction by adding patches that reduce the residual error. This is achieved by a network that is given the current residual errors and proposes bounding boxes that are down-sampled and passed to a variational auto-encoder (VAE). This acts as the bottleneck. The latent code is decoded by the VAE decoder and up-sampled to correct the reconstruction within the bounding box. The objective is to minimise the size of the latent codes of the VAE and the length of code needed to transmit the residual error. The iterations end when the size of the latent code exceeds the reduction in transmitting the residual error. We show that a very simple implementation is capable of finding meaningful bounding boxes and using those bounding boxes for downstream applications. We compare our model with other unsupervised object discovery models.

TinyOps: ImageNet Scale Deep Learning on Microcontrollers

Sulaiman Sadiq, Jonathon Hare, Partha Maji, Simon Craske, Geoff Merrett
Conference or Workshop Item

Abstract

Deep Learning on microcontroller (MCU) based IoT devices is extremely challenging due to memory constraints. Prior approaches focus on using internal memory or external memories exclusively, which limits either accuracy or latency. We find that a hybrid method using internal and external MCU memories outperforms both approaches in accuracy and latency. We develop TinyOps, an inference engine which reduces the inference latency of models in slow external memory using a partitioning and overlaying scheme via the available Direct Memory Access (DMA) peripheral, combining the advantages of external memory (size) and internal memory (speed). Experimental results show that architectures deployed with TinyOps significantly outperform models designed for internal memory, with up to 6% higher accuracy and, importantly, 1.3-2.2x faster inference latency, setting the state-of-the-art in TinyML ImageNet classification. Our work shows that the TinyOps space is more efficient compared to the internal or external memory design spaces and should be explored further for TinyML applications.

Similarity-aware CNN for efficient video recognition at the Edge

Mohammadamin Sabetsarvestani, Jonathon Hare, Bashir Al-Hashimi, Geoff Merrett
Article

Abstract

Convolutional neural networks (CNNs) often extract similar features from successive video frames because consecutive frames have near-identical appearance. In contrast, conventional CNNs for video recognition process individual frames with a fixed computational effort: each video frame is processed independently, resulting in numerous redundant computations and an inefficient use of limited energy resources, particularly for edge computing applications. To alleviate the high energy requirements associated with video frame processing, this paper presents similarity-aware CNNs that recognise similar feature pixels across frames and avoid computations on them. First, with a loss of less than 1% in recognition accuracy, a proposed similarity-aware quantization technique increases the average number of unchanged feature pixels across frame pairs by up to 85%. Then, a proposed similarity-aware dataflow improves energy consumption by minimising redundant computations and memory accesses across frame pairs. According to simulation experiments, the proposed dataflow decreases the energy consumed by video frame processing by up to 30%.
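
As a rough illustration of the idea (not the paper's actual quantiser or dataflow), the sketch below quantises features so that visually similar pixels map to identical values, then recomputes a layer only where the quantised features changed between frames; the 16-level quantiser and the stand-in layer are assumptions.

```python
import numpy as np

def quantise(x, levels=16):
    return np.round(x * levels) / levels

def incremental_update(prev_in, curr_in, prev_out, op):
    """Recompute `op` only for rows whose quantised input changed."""
    changed = np.any(quantise(prev_in) != quantise(curr_in), axis=1)
    out = prev_out.copy()
    out[changed] = op(curr_in[changed])  # unchanged rows reuse prev_out
    return out, changed

rng = np.random.default_rng(0)
frame1 = rng.random((1000, 8))           # 1000 feature pixels, 8 channels
frame2 = frame1.copy()
frame2[:100] += 0.5                      # only 10% of pixels really change
op = lambda x: x @ rng.random((8, 4))    # stand-in for a conv/FC layer
out1 = op(frame1)
out2, changed = incremental_update(frame1, frame2, out1, op)
print(f"recomputed {changed.mean():.0%} of pixels")
```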

Shared visual representations of drawing for communication: how do different biases affect human interpretability and intent?

Andreea Daniela Mihai, Jonathon Hare
Conference or Workshop Item

Abstract

We present an investigation into how representational losses can affect the drawings produced by artificial agents playing a communication game. Building upon recent advances, we show that a combination of powerful pretrained encoder networks, with appropriate inductive biases, can lead to agents that draw recognisable sketches, whilst still communicating well. Further, we start to develop an approach to help automatically analyse the semantic content being conveyed by a sketch and demonstrate that current approaches to inducing perceptual biases lead to a notion of objectness being a key feature despite the agent training being self-supervised.

Physically Embodied Deep Image Optimisation

Andreea Daniela Mihai, Jonathon Hare
Conference or Workshop Item

Perceptions

Daniela Mihai, Jonathon Hare

Abstract

Perceptions is a study of how a machine perceives a photograph at different layers within its neural network. We generate sets of pen strokes which are drawn by a robot using pen and ink on Bristol board. The illustrations are produced by maximising the similarity between the machine's internal perception of the illustration and chosen target photographs. The study focusses on the difference between different inductive biases (shape versus texture) in the training of the neural network, as well as how the machine's perception changes as a function of depth within its network. The photos chosen are from travels to far away cities, taken before the COVID-19 pandemic.

Data for Similarity-aware CNN for Efficient Video Recognition at the Edge

Mohammadamin Sabetsarvestani, Jonathon Hare, Bashir Al-Hashimi, Geoffrey Merrett

Abstract

This data is associated with the article "Similarity-aware CNN for Efficient Video Recognition at the Edge", published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

Learning to Draw: Emergent Communication through Sketching

Daniela Mihai, Jonathon Hare
Conference or Workshop Item

Abstract

Evidence that visual communication preceded written language and provided a basis for it goes back to prehistory, in forms such as cave and rock paintings depicting traces of our distant ancestors. Emergent communication research has sought to explore how agents can learn to communicate in order to collaboratively solve tasks. Existing research has focused on language, with a learned communication channel transmitting sequences of discrete tokens between the agents. In this work, we explore a visual communication channel between agents that are allowed to draw with simple strokes. Our agents are parameterised by deep neural networks, and the drawing procedure is differentiable, allowing for end-to-end training. In the framework of a referential communication game, we demonstrate that agents can not only successfully learn to communicate by drawing, but with appropriate inductive biases, can do so in a fashion that humans can interpret. We hope to encourage future research to consider visual communication as a more flexible and directly interpretable alternative for training collaborative agents.

GhostShiftAddNet: More Features from Energy-Efficient Operations

Jia Bi, Jonathon Hare, Geoff Merrett
Conference or Workshop Item

Abstract

Deep convolutional neural networks (CNNs) are computationally and memory intensive. In CNNs, intensive multiplication can have resource implications that may challenge the ability for effective deployment of inference on resource-constrained edge devices. This paper proposes GhostShiftAddNet, where the motivation is to implement a hardware-efficient deep network: a multiplication-free CNN with fewer redundant features. We introduce a new bottleneck block, GhostSA, that converts all multiplications in the block to cheap operations. The bottleneck uses an appropriate number of bit-shift filters to process intrinsic feature maps, then applies a series of transformations consisting of bit-shifts and addition operations to generate more feature maps that fully learn the information underlying the intrinsic features. We schedule the number of bit-shift and addition operations for different hardware platforms. We conduct extensive experiments and ablation studies on desktop and embedded (Jetson Nano) devices for implementation and measurement. We demonstrate that the proposed GhostSA block can replace bottleneck blocks in the backbones of state-of-the-art network architectures and gives improved performance on image classification benchmarks. Further, our GhostShiftAddNet achieves higher classification accuracy using fewer FLOPs and parameters (reduced by up to 3x) than GhostNet. Compared to GhostNet, inference latency on the Jetson Nano is improved by about 1.3x and 2x on GPU and CPU respectively.

Dataset for "GhostShiftAddNet: More Features from Energy-Efficient Operations"

Jia Bi, Jonathon Hare, Geoffrey Merrett

Abstract

This dataset supports the publication 'GhostShiftAddNet: More Features from Energy-Efficient Operations' in 'British Machine Vision Conference 2021'.

Learning division with neural arithmetic logic modules

Bhumika Mistry, Katayoun Farrahi, Jonathon Hare

Abstract

To achieve systematic generalisation, it first makes sense to master simple tasks such as arithmetic. Of the four fundamental arithmetic operations (+, -, ×, ÷), division is considered the most difficult for both humans and computers. In this paper we show that robustly learning division in a systematic manner remains a challenge even at the simplest level of dividing two numbers. We propose two novel approaches for division which we call the Neural Reciprocal Unit (NRU) and the Neural Multiplicative Reciprocal Unit (NMRU), and present improvements for an existing division module, the Real Neural Power Unit (Real NPU). Experiments in learning division with input redundancy on 225 different training sets find that our proposed modifications to the Real NPU achieve an average success rate of 85.3%, improving over the original by 15.1%. Our NMRU approach can further improve the success rate to 91.6%.

Dynamic transformer for efficient machine translation on embedded devices

Hishan Parry, Lei Xun, Mohammadamin Sabetsarvestani, Jia Bi, Jonathon Hare, Geoff Merrett
Conference or Workshop Item

Abstract

The Transformer architecture is widely used for machine translation tasks. However, its resource-intensive nature makes it challenging to implement on constrained embedded devices, particularly where available hardware resources can vary at run-time. We propose a dynamic machine translation model that scales the Transformer architecture based on the available resources at any particular time. The proposed approach, 'Dynamic-HAT', uses a HAT SuperTransformer as the backbone to search for SubTransformers with different accuracy-latency trade-offs at design time. The optimal SubTransformers are sampled from the SuperTransformer at run-time, depending on the latency constraints. The Dynamic-HAT is tested on the Jetson Nano, and the approach uses inherited SubTransformers sampled directly from the SuperTransformer with a switching time of <1s. Using inherited SubTransformers results in a BLEU score loss of <1.5%, because the SubTransformer configuration is not retrained from scratch after sampling. However, to recover this loss in performance, the dimensions of the design space can be reduced to tailor it to a family of target hardware. The new reduced design space results in a BLEU score increase of approximately 1% for sub-optimal models from the original design space, with a wide range of performance scaling between 0.356s and 1.526s on the GPU and 2.9s and 7.31s on the CPU.

Attitude reconstruction of an unknown co-orbiting satellite target using machine learning technologies

Benjamin Felix Guthrie, Minkwan Kim, Hodei Urrutxua, Jonathon Hare
Conference or Workshop Item

Abstract

Active debris removal missions pose demanding guidance, navigation and control requirements. We propose that novel machine learning techniques can help to meet several of the outstanding requirements. Building upon previous work which adopts machine learning technologies for tracking the rotational state of an unknown and uncooperative debris satellite, we improve the approach by further applying machine learning to make use of past measurements. The attitude of the debris target is reconstructed, thereby enabling different debris removal methods. The construction of a simulation framework for generating accurate labelled image data is presented, with the aim of facilitating further research in this area. Finally, we show that a neural network can also learn to track satellites and identify suitable locations for contact-based removal methods, without a-priori knowledge of the object's geometry.

Dataset for "Dynamic Transformer for Efficient Machine Translation on Embedded Devices"

Hishan Parry, Lei Xun, Mohammadamin Sabetsarvestani, Jia Bi, Jonathon Hare, Geoffrey Merrett

Abstract

This dataset supports the publication: 'Dynamic Transformer for Efficient Machine Translation on Embedded Devices' in '3rd ACM/IEEE Workshop on Machine Learning for CAD (MLCAD'21)'.

Dynamic-OFA: Runtime DNN architecture switching for performance scaling on heterogeneous embedded platforms

Wei Lou, Lei Xun, Mohammadamin Sabetsarvestani, Jia Bi, Jonathon Hare, Geoff Merrett
Conference or Workshop Item

Abstract

Mobile and embedded platforms are increasingly required to efficiently execute computationally demanding DNNs across heterogeneous processing elements. At runtime, the hardware resources available to DNNs can vary considerably due to other concurrently running applications. The performance requirements of the applications can also change under different scenarios. To achieve the desired performance, dynamic DNNs have been proposed, in which the number of channels/layers can be scaled in real time to meet different requirements under varying resource constraints. However, the training process of such dynamic DNNs can be costly, since platform-aware models for different deployment scenarios must be retrained to become dynamic. This paper proposes Dynamic-OFA, a novel dynamic DNN approach for state-of-the-art platform-aware NAS models (i.e. the Once-for-all network (OFA)). Dynamic-OFA pre-samples a family of sub-networks from a static OFA backbone model, and contains a runtime manager to choose between sub-networks under different runtime environments. As such, Dynamic-OFA does not need the traditional dynamic DNN training pipeline. Compared to the state-of-the-art, our experimental results using ImageNet on a Jetson Xavier NX show that the approach is up to 3.5x (CPU) and 2.4x (GPU) faster for similar Top-1 accuracy, or 3.8% (CPU) and 5.1% (GPU) higher accuracy at similar latency.
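
The runtime-manager side of the approach can be illustrated in a few lines of Python: given a table of pre-sampled, pre-profiled sub-networks, choose the most accurate one that meets the current latency budget. The profile numbers below are invented for illustration and are not the paper's measurements.

```python
# sub-network config -> (profiled latency in ms, top-1 accuracy %)
profiles = {
    "subnet_small":  (12.0, 70.1),
    "subnet_medium": (21.0, 74.3),
    "subnet_large":  (38.0, 77.8),
}

def select_subnet(latency_budget_ms):
    feasible = {k: v for k, v in profiles.items() if v[0] <= latency_budget_ms}
    if not feasible:  # budget too tight: fall back to the fastest sub-network
        return min(profiles, key=lambda k: profiles[k][0])
    return max(feasible, key=lambda k: feasible[k][1])

for budget in (10, 25, 50):
    print(budget, "ms ->", select_subnet(budget))
```

Because the sub-networks are pre-sampled from the static OFA backbone, switching between them at runtime is a table lookup and re-configuration rather than any retraining.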

Runtime DNN performance scaling through resource management on heterogeneous embedded platforms

Lei Xun, Bashir Al-Hashimi, Jonathon Hare, Geoff Merrett
Conference or Workshop Item

Abstract

DNN inference is increasingly being executed locally on embedded platforms, due to the clear advantages in latency, privacy and connectivity. Since modern SoCs typically execute a combination of different and dynamic workloads concurrently, it is challenging to consistently meet latency/energy budgets because the local computing resources available to the DNN vary considerably. In this poster, we show how resource management can be applied to optimise the performance of DNN workloads by monitoring and tuning both software and hardware constantly at runtime. This work shows how dynamic DNNs trade off accuracy against latency/energy/power on a heterogeneous embedded CPU-GPU platform.

Image-based attitude determination of co-orbiting satellites enhanced with deep learning technologies

Benjamin Felix Guthrie, Minkwan Kim, Hodei Urrutxua, Jonathon Hare
Conference or Workshop Item

Abstract

Active debris removal missions pose demanding guidance, navigation and control requirements. We present a novel approach which adopts deep learning technologies to the problem of attitude determination of an uncooperative debris satellite of a-priori unknown geometry. A siamese convolutional neural network is developed, which detects and tracks inherently useful landmarks from sensor data, after training upon synthetic datasets of visual, LiDAR or RGB-D data. The method is capable of real-time performance while significantly improving upon conventional computer vision-based approaches, and generalises well to previously unseen object geometries, enabling this approach to be a feasible solution for guidance in active debris removal missions. The performance of the algorithm and its sensitivity to model parameters are analysed via numerical simulation.

Dataset for "Dynamic-OFA: Runtime DNN Architecture Switching for Performance Scaling on Heterogeneous Embedded Platforms"

Wei Lou, Lei Xun, Mohammadamin Sabetsarvestani, Jia Bi, Jonathon Hare, Geoffrey Merrett

Abstract

This dataset supports the publication: 'Dynamic-OFA: Runtime DNN Architecture Switching for Performance Scaling on Heterogeneous Embedded Platforms' in 'Efficient Deep Learning for Computer Vision Workshop at CVPR Conference 2021'.

A primer for neural arithmetic logic modules

Bhumika Mistry, Katayoun Farrahi, Jonathon Hare

Abstract

Neural Arithmetic Logic Modules have become a growing area of interest, though remain a niche field. These units are small neural networks which aim to achieve systematic generalisation in learning arithmetic operations such as {+, -, ×, ÷} while also being interpretable in their weights. This paper is the first in discussing the current state of progress of this field, explaining key works, starting with the Neural Arithmetic Logic Unit (NALU). Focusing on the shortcomings of the NALU, we provide an in-depth analysis to reason about design choices of recent units. A cross-comparison between units is made on experiment setups and findings, where we highlight inconsistencies in a fundamental experiment causing the inability to directly compare across papers. We finish by providing a novel discussion of existing applications for the NALU and research directions requiring further exploration.

On the structure of cyclic linear disentangled representations

Matthew Painter, Adam Prugel-Bennett, Jonathon Hare
Conference or Workshop Item

Abstract

Disentanglement has seen much work recently for its interpretable properties and the ease at which it can be induced in the latent representations of variational auto-encoders. As a concept, disentanglement has proven hard to precisely define, with many interpretations leading to different metrics which do not necessarily agree. Higgins et al [2018] offer a precise definition of a linear disentangled representation which is grounded in the symmetries of the data. In this work we focus on cyclic symmetry structure. We examine how VAE posterior distributions are affected by different observations of the same problem and find that cyclic structure is encouraged even when it is not explicitly observed. We then find that better prior distributions, found via normalising flows, result in faster convergence and lower encoding costs than the standard Gaussian. We also find that linear representations can be distinguished from standard ones solely through disentanglement metrics scores, possibly due to their highly structured posteriors. Finally, we find preliminary evidence that linear disentangled representations offer better data efficiency than standard disentangled representations.

DEff-ARTS: differentiable efficient ARchiTecture search

Sulaiman Sadiq, Partha Maji, Jonathon Hare, Geoff Merrett
Conference or Workshop Item

Abstract

Manual design of efficient Deep Neural Networks (DNNs) for mobile and edge devices is an involved process which requires expert human knowledge to improve efficiency in different dimensions. In this paper, we present DEff-ARTS, a differentiable efficient architecture search method for automatically deriving CNN architectures for resource constrained devices. We frame the search as a multi-objective optimisation problem where we minimise the classification loss and the computational complexity of performing inference on the target hardware. Our formulation allows for easy trading-off between the sub-objectives depending on user requirements. Experimental results on CIFAR-10 classification showed that our approach achieved a highly competitive test error rate of 3.24% with 30% fewer parameters and multiply and accumulate (MAC) operations compared to Differentiable ARchiTecture Search (DARTS).

Anatomically constrained ResNets exhibit opponent receptive fields; so what?

Ethan William Albert Harris, Andreea Daniela Mihai, Jonathon Hare
Conference or Workshop Item

Abstract

Primate visual systems are well known to exhibit varying degrees of bottlenecks in the early visual pathway. Recent works have shown that the presence of a bottleneck between 'retinal' and 'ventral' parts of artificial models of visual systems, simulating the optic nerve, can cause the emergence of cellular properties that have been observed in primates: namely centre-surround organisation and opponency. To date, however, state-of-the-art convolutional network architectures for classification problems have not incorporated such an early bottleneck. In this paper, we ask what happens if such a bottleneck is added to a ResNet-50 model trained to classify the ImageNet data set. Our experiments show that some of the emergent properties observed in simpler models still appear in these considerably deeper and more complex models; however, there are some notable differences, particularly with regard to spectral opponency. The introduction of the bottleneck is experimentally shown to introduce a small but consistent shape bias into the network. Tight bottlenecks are also shown to have only a very slight effect on the top-1 accuracy of the models when trained and tested on ImageNet.

Quasi-Newton's method in the class gradient defined high-curvature subspace

Mark Tuddenham, Adam Prugel-Bennett, Jonathon Hare
Conference or Workshop Item

Abstract

Classification problems using deep learning have been shown to have a high-curvature subspace in the loss landscape equal in dimension to the number of classes. Moreover, this subspace corresponds to the subspace spanned by the logit gradients for each class. An obvious strategy to speed up optimisation would be to use Newton's method in the high-curvature subspace and stochastic gradient descent in the co-space. We show that a naive implementation actually slows down convergence and we speculate why this might be.

On parameterizing higher-order motion for behaviour recognition

Yan Sun, Jonathon Hare, Mark Nixon
Article

Abstract

Human behaviours consist of different types of motion; we show how they can be disambiguated into their components in a richer way than is currently possible. Studies on optical flow have concentrated on motion alone, without the higher-order components: snap, jerk and acceleration. We are the first to show how the acceleration, jerk, snap and their constituent parts can be obtained from image sequences and deployed for analysis, especially of behaviour. We demonstrate the estimation of acceleration in sport, human motion, traffic and in scenes of violent behaviour to demonstrate the wide potential for application of analysis of acceleration. Determining higher-order components is suited to the analysis of scenes which contain them: higher-order motion is innate to scenes containing acts of violent behaviour, but it is not just for behaviour which contains quickly changing movement: human gait contains acceleration, though approaches have yet to consider radial and tangential acceleration, since they concentrate on motion alone. The analysis of synthetic and real-world images illustrates the ability of higher-order motion to discriminate different objects under different motion. The new approaches are then applied to heel strike detection in the analysis of human gait. These results demonstrate that the new approach is ready for developing new applications in behaviour recognition and provides a new basis for future research and applications of higher-order motion analysis.

Linear disentangled representations and unsupervised action estimation

Matthew Painter, Jonathon Hare, Adam Prugel-Bennett
Conference or Workshop Item

Abstract

Disentangled representation learning has seen a surge in interest over recent times, generally focusing on new models to optimise one of many disparate disentanglement metrics. It was only with Symmetry Based Disentangled Representation Learning that a robust mathematical framework was introduced to define precisely what is meant by a “linear disentangled representation”. This framework determines that such representations would depend on a particular decomposition of the symmetry group acting on the data, showing that actions would manifest through irreducible group representations acting on independent representational subspaces. Caselles-Dupré et al. [2019] subsequently proposed the first model to induce and demonstrate a linear disentangled representation in a VAE model. In this work we empirically show that linear disentangled representations are not present in standard VAE models and that they instead require altering the loss landscape to induce them. We proceed to show that such representations are a desirable property with regard to classical disentanglement metrics. Finally we propose a method to induce irreducible representations which forgoes the need for labelled action sequences, as was required by prior work. We explore a number of properties of this method, including the ability to learn from action sequences without knowledge of intermediate states and robustness under visual noise. We also demonstrate that it can successfully learn 4 different symmetries directly from pixels.

How convolutional neural network architecture biases learned opponency and colour tuning

Ethan William Albert Harris, Andreea Daniela Mihai, Jonathon Hare
Article

Abstract

Recent work suggests that changing Convolutional Neural Network (CNN) architecture by introducing a bottleneck in the second layer can yield changes in learned function. To understand this relationship fully requires a way of quantitatively comparing trained networks. The fields of electrophysiology and psychophysics have developed a wealth of methods for characterising visual systems which permit such comparisons. Inspired by these methods, we propose an approach to obtaining spatial and colour tuning curves for convolutional neurons, which can be used to classify cells in terms of their spatial and colour opponency. We perform these classifications for a range of CNNs with different depths and bottleneck widths. Our key finding is that networks with a bottleneck show a strong functional organisation: almost all cells in the bottleneck layer become both spatially and colour opponent, while cells in the layer following the bottleneck become non-opponent. The colour tuning data can further be used to form a rich understanding of how colour is encoded by a network. As a concrete demonstration, we show that shallower networks without a bottleneck learn a complex non-linear colour system, whereas deeper networks with tight bottlenecks learn a simple channel-opponent code in the bottleneck layer. We further develop a method of obtaining a hue sensitivity curve for a trained CNN which enables high-level insights that complement the low-level findings from the colour tuning data. We go on to train a series of networks under different conditions to ascertain the robustness of the discussed results. Ultimately, our methods and findings coalesce with prior art, strengthening our ability to interpret trained CNNs and furthering our understanding of the connection between architecture and learned representation. Trained models and code for all experiments are available at https://github.com/ecs-vlc/opponency.

Point at the triple: generation of text summaries from knowledge base triples

Pavlos Vougiouklis, Eddy Maddalena, Jonathon Hare, Elena Simperl
Article

Abstract

We investigate the problem of generating natural language summaries from knowledge base triples. Our approach is based on a pointer-generator network, which, in addition to generating regular words from a fixed target vocabulary, is able to verbalise triples in several ways. We undertake an automatic and a human evaluation on single and open-domain summaries generation tasks. Both show that our approach significantly outperforms other data-driven baselines.
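
The core of the pointer-generator mechanism can be written down compactly: the final word distribution interpolates a vocabulary distribution with a copy distribution over the source tokens, weighted by a learned gate p_gen. The PyTorch sketch below uses made-up sizes and random scores purely to show the mixture.

```python
import torch
import torch.nn.functional as F

vocab_size, src_len = 50, 6
vocab_logits = torch.randn(vocab_size)         # decoder's generation scores
attn_logits = torch.randn(src_len)             # attention over source tokens
src_ids = torch.tensor([3, 17, 17, 42, 8, 3])  # source tokens as vocab ids
p_gen = torch.sigmoid(torch.randn(()))         # learned generate-vs-copy gate

p_vocab = F.softmax(vocab_logits, dim=0)
p_copy = F.softmax(attn_logits, dim=0)
final = p_gen * p_vocab
final = final.scatter_add(0, src_ids, (1 - p_gen) * p_copy)  # add copy mass
print(final.sum())  # tensor(1.) -- still a valid distribution
```

This copy pathway is what lets the model verbalise entities from the input triples even when they are absent from the fixed target vocabulary.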

Point at the triple: Generation of text summaries from knowledge base triples (Extended Abstract)

Pavlos Vougiouklis, Eddy Maddalena, Jonathon Hare, Elena Simperl
Conference or Workshop Item

Abstract

We investigate the problem of generating natural language summaries from knowledge base triples. Our approach is based on a pointer-generator network, which, in addition to generating regular words from a fixed target vocabulary, is able to verbalise triples in several ways. We undertake an automatic and a human evaluation on single and open-domain summaries generation tasks. Both show that our approach significantly outperforms other data-driven baselines.

FSPool: Learning set representations with featurewise sort pooling

Yan Zhang, Jonathon Hare, Adam Prugel-Bennett
Conference or Workshop Item

Abstract

Traditional set prediction models can struggle with simple datasets due to an issue we call the responsibility problem. We introduce a pooling method for sets of feature vectors based on sorting features across elements of the set. This can be used to construct a permutation-equivariant auto-encoder that avoids this responsibility problem. On a toy dataset of polygons and a set version of MNIST, we show that such an auto-encoder produces considerably better reconstructions and representations. Replacing the pooling function in existing set encoders with FSPool improves accuracy and convergence speed on a variety of datasets.

FSPool: Learning set representations with featurewise sort pooling

Yan Zhang, Jonathon Hare, Adam Prugel-Bennett
Conference or Workshop Item

Abstract

Traditional set prediction models can struggle with simple datasets due to an issue we call the responsibility problem. We introduce a pooling method for sets of feature vectors based on sorting features across elements of the set to learn better set representations. This can be used to construct a permutation-equivariant auto-encoder, which avoids the responsibility problem. On a toy dataset of polygons and a set version of MNIST, we show that such an auto-encoder produces considerably better reconstructions. Used in set classification, FSPool significantly improves accuracy and convergence speed on the set versions of MNIST and CLEVR.
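
The pooling operation itself is simple to sketch. Below is a minimal PyTorch version of featurewise sort pooling for fixed-size sets: sort each feature independently across the set elements, then take a learned weighted sum over the sorted positions. The paper's continuous relaxation for variable-size sets is omitted, and the sizes are assumptions.

```python
import torch
import torch.nn as nn

class FSPool(nn.Module):
    def __init__(self, n_elements, n_features):
        super().__init__()
        # one learned weight per (sorted position, feature)
        self.weight = nn.Parameter(torch.randn(n_elements, n_features))

    def forward(self, x):                # x: (batch, n_elements, n_features)
        sorted_x, _ = x.sort(dim=1, descending=True)  # sort each feature
        return (sorted_x * self.weight).sum(dim=1)    # (batch, n_features)

pool = FSPool(n_elements=5, n_features=8)
x = torch.randn(2, 5, 8)
out = pool(x)
perm = torch.randperm(5)
print(torch.allclose(out, pool(x[:, perm])))  # True: permutation invariant
```

Sorting discards element order, which is why the resulting representation is permutation-invariant by construction.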

Deep set prediction networks

Yan Zhang, Jonathon Hare, Adam Prugel-Bennett
Conference or Workshop Item

Abstract

We study the problem of predicting a set from a feature vector with a deep neural network. Existing approaches ignore the set structure of the problem and suffer from discontinuity issues as a result. We propose a general model for predicting sets that properly respects the structure of sets and avoids this problem. With a single feature vector as input, we show that our model is able to auto-encode point sets, predict bounding boxes of the set of objects in an image, and predict the attributes of these objects in an image.
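
The decoding idea can be sketched in a few lines: to predict a set from a feature vector, run gradient descent on the set itself so that a permutation-invariant encoder maps it back towards the target vector. The tiny sum-pooling encoder, dimensions and step count below are assumptions for brevity.

```python
import torch
import torch.nn as nn

set_encoder = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 16))
for p in set_encoder.parameters():      # frozen here; trained in the full model
    p.requires_grad_(False)

def encode(points):                     # permutation-invariant: sum over elements
    return set_encoder(points).sum(dim=0)

target_set = torch.randn(5, 2)
z = encode(target_set)                  # feature vector we must decode back

pred = torch.zeros(5, 2, requires_grad=True)   # initial guess at the set
opt = torch.optim.SGD([pred], lr=0.1)
for _ in range(200):                    # inner loop: optimise the set directly
    opt.zero_grad()
    loss = ((encode(pred) - z) ** 2).sum()
    loss.backward()
    opt.step()
print(loss.item())                      # near zero once a set is recovered
```

Because the encoder is permutation-invariant, the loss is indifferent to element order, which is how the set structure is respected rather than ignored.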

Foveated convolutions: improving spatial transformer networks by modelling the retina

Ethan William Albert Harris, Mahesan Niranjan, Jonathon Hare
Conference or Workshop Item

Abstract

Spatial Transformer Networks (STNs) have the potential to dramatically improve performance of convolutional neural networks in a range of tasks. By ‘focusing’ on the salient parts of the input using a differentiable affine transform, a network augmented with an STN should have increased performance, efficiency and interpretability. However, in practice, STNs rarely exhibit these desiderata, instead converging to a seemingly meaningless transformation of the input. We demonstrate and characterise this localisation problem as deriving from the spatial invariance of feature detection layers acting on extracted glimpses. Drawing on the neuroanatomy of the human eye we then motivate a solution: foveated convolutions. These parallel convolutions with a range of strides and dilations introduce specific translational variance into the model. In so doing, the foveated convolution presents an inductive bias, encouraging the subject of interest to be centred in the output of the attention mechanism, giving significantly improved performance.
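
As a hedged, simplified sketch of the parallel-branch structure (the paper combines a range of strides and dilations so that effective resolution falls off away from the glimpse centre; this sketch shows only parallel multi-dilation branches over one input), a foveated convolution can be assembled from standard layers. The branch count, channels and summation combiner are assumptions.

```python
import torch
import torch.nn as nn

class FoveatedConv(nn.Module):
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        # each branch sees the input at a different spatial scale
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, dilation=d, padding=d)
            for d in dilations
        )

    def forward(self, x):
        # combine the multi-scale branches into one feature map
        return sum(branch(x) for branch in self.branches)

layer = FoveatedConv(3, 8)
print(layer(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 8, 32, 32])
```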

Spatial and colour opponency in anatomically constrained deep networks

Ethan William Albert Harris, Andreea Daniela Mihai, Jonathon Hare
Conference or Workshop Item

Abstract

Colour vision has long fascinated scientists, who have sought to understand both the physiology of the mechanics of colour vision and the psychophysics of colour perception. We consider representations of colour in anatomically constrained convolutional deep neural networks. Following ideas from neuroscience, we classify cells in early layers into groups relating to their spectral and spatial functionality. We show the emergence of single and double opponent cells in our networks and characterise how the distribution of these cells changes under the constraint of a retinal bottleneck. Our experiments not only open up a new understanding of how deep networks process spatial and colour information, but also provide new tools to help understand the black box of deep learning.

Deep set prediction networks

Yan Zhang, Jonathon Hare, Adam Prugel-Bennett
Conference or Workshop Item

Abstract

Current approaches for predicting sets from feature vectors ignore the unordered nature of sets and suffer from discontinuity issues as a result. We propose a general model for predicting sets that properly respects the structure of sets and avoids this problem. With a single feature vector as input, we show that our model is able to auto-encode point sets, predict the set of bounding boxes of objects in an image, and predict the set of attributes of these objects.

Learning representations of sets through optimized permutations

Yan Zhang, Jonathon Hare, Adam Prugel-Bennett
Conference or Workshop Item

Abstract

Representations of sets are challenging to learn because operations on sets should be permutation-invariant. To this end, we propose a Permutation-Optimisation module that learns how to permute a set end-to-end. The permuted set can be further processed to learn a permutation-invariant representation of that set, avoiding a bottleneck in traditional set models. We demonstrate our model's ability to learn permutations and set representations with either explicit or implicit supervision on four datasets, on which we achieve state-of-the-art results: number sorting, image mosaics, classification from image mosaics, and visual question answering.
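
The abstract does not spell out the module's internals, so as background the sketch below shows the standard Sinkhorn relaxation commonly used to make permutations differentiable: a score matrix is alternately row- and column-normalised into a doubly-stochastic soft permutation matrix. This is a common relaxation for learning permutations end-to-end, not necessarily the paper's exact mechanism.

```python
import torch

def sinkhorn(scores, n_iters=20, tau=0.1):
    log_p = scores / tau
    for _ in range(n_iters):
        log_p = log_p - log_p.logsumexp(dim=1, keepdim=True)  # normalise rows
        log_p = log_p - log_p.logsumexp(dim=0, keepdim=True)  # normalise cols
    return log_p.exp()

P = sinkhorn(torch.randn(4, 4))
print(P.sum(dim=0), P.sum(dim=1))  # both near all-ones: doubly stochastic
x = torch.randn(4, 3)              # a set of 4 elements
print(P @ x)                       # softly permuted set, fully differentiable
```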

Opportunities for machine learning and artificial intelligence in a national mapping agency: a perspective on enhancing ordnance survey workflow

Jon Murray, Isabel Sargent, David Holland, A. Gardiner, Kyriaki Dionysopoulou, S. Coupland, Jonathon Hare, P Atkinson
Conference or Workshop Item

Abstract

National Mapping Agencies (NMAs) are tasked with providing highly accurate geospatial data for a range of customers. This challenge has traditionally been met by combining remote sensing data gathering, field work, and manual interpretation and processing of the data. This is a significant logistical undertaking which requires novel approaches to improve potential feature extraction from the available data. Using research undertaken at Great Britain's NMA, Ordnance Survey (OS), as an example, this paper provides an overview of recent advances in the use of artificial intelligence (AI) to assist in improving feature classification from remotely sensed aerial imagery, describing research that applies high-level neural network architectures to image classification using convolutional neural network learning.

Joint deep learning for land cover and land use classification

Ce Zhang, Isabel Sargent, Xin Pan, Huapeng Li, Andy Gardiner, Jonathon Hare, Peter M. Atkinson
Article

Abstract

Land cover (LC) and land use (LU) have commonly been classified separately from remotely sensed imagery, without considering the intrinsically hierarchical and nested relationships between them. In this paper, for the first time, a highly novel Joint Deep Learning framework is proposed and demonstrated for LC and LU classification. The proposed Joint Deep Learning (JDL) model incorporates a multilayer perceptron (MLP) and convolutional neural network (CNN), and is implemented via a Markov process involving iterative updating. In the JDL, LU classification conducted by the CNN is made conditional upon the LC probabilities predicted by the MLP. In turn, those LU probabilities together with the original imagery are re-used as inputs to the MLP to strengthen the spatial and spectral feature representations. This process of updating the MLP and CNN forms a joint distribution, where both LC and LU are classified simultaneously through iteration. The proposed JDL method provides a general framework within which the pixel-based MLP and the patch-based CNN provide mutually complementary information to each other, such that both are refined in the classification process through iteration. Given the well-known complexities associated with the classification of very fine spatial resolution (VFSR) imagery, the effectiveness of the proposed JDL was tested on aerial photography of two large urban and suburban areas in Great Britain (Southampton and Manchester). The JDL consistently demonstrated greatly increased accuracies with increasing iteration, not only for the LU classification, but for both the LC and LU classifications, achieving by far the greatest accuracies for each at around 10 iterations. The average overall classification accuracies were 90.18% for LC and 87.92% for LU for the two study sites, far higher than the initial accuracies and consistently outperforming benchmark comparators (three each for LC and LU classification). This research, thus, represents the first attempt to unify the remote sensing classification of LC (state; what is there?) and LU (function; what is going on there?), where previously each had been considered separately only. It, thus, has the potential to transform the way that LC and LU classification is undertaken in future. Moreover, it paves the way to address effectively the complex tasks of classifying LC and LU from VFSR remotely sensed imagery via joint reinforcement, and in an automatic manner.

T-REx: A large scale alignment of natural language with knowledge base triples

Hady Elsahar, Pavlos Vougiouklis, Arslen Remaci, Christophe Gravier, Jonathon Hare, Elena Simperl, Frederique Laforest
Conference or Workshop Item

Abstract

Alignments between natural language and Knowledge Base (KB) triples are an essential prerequisite for training machine learning approaches employed in a variety of Natural Language Processing problems. These include Relation Extraction, KB Population, Question Answering and Natural Language Generation from KB triples. Available datasets that provide those alignments are plagued by significant shortcomings – they are of limited size, they exhibit a restricted predicate coverage, and/or they are of unreported quality. To alleviate these shortcomings, we present T-REx, a dataset of large scale alignments between Wikipedia abstracts and Wikidata triples. T-REx consists of 11 million triples aligned with 3.09 million Wikipedia abstracts (6.2 million sentences). T-REx is two orders of magnitude larger than the largest available alignments dataset and covers 2.5 times more predicates. Additionally, we stress the quality of this language resource thanks to an extensive crowdsourcing evaluation. T-REx is publicly available at: https://w3id.org/t-rex.

Deep cascade learning

Enrique Salvador Marquez, Jonathon Hare, Mahesan Niranjan
Article

Abstract

In this paper, we propose a novel approach for efficient training of deep neural networks in a bottom-up fashion using a layered structure. Our algorithm, which we refer to as Deep Cascade Learning, is motivated by the Cascade Correlation approach of Fahlman, who introduced it in the context of perceptrons. We demonstrate our algorithm on networks of convolutional layers, though its applicability is more general. Such training of deep networks in a cascade directly circumvents the well-known vanishing gradient problem by ensuring that the output is always adjacent to the layer being trained. We present empirical evaluations comparing our deep cascade training with standard end-to-end training using backpropagation, on two convolutional neural network architectures and benchmark image classification tasks (CIFAR-10 and CIFAR-100). We then investigate the features learned by the approach and find that better, domain-specific, representations are learned in early layers than in end-to-end training. This is partially attributable to the vanishing gradient problem, which inhibits early-layer filters from changing significantly from their initial settings. While both networks perform similarly overall, in cascade training recognition accuracy increases progressively with each added layer, with discriminative features learnt at every stage of the network, whereas in end-to-end training no such systematic feature representation was observed. We also show that cascade training has significant computational and memory advantages over end-to-end training, and can be used as a pre-training algorithm to obtain better performance.
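
As a rough illustration of the bottom-up scheme, the sketch below trains one layer at a time with a temporary output head, freezing earlier layers so that the supervised signal is always adjacent to the layer being trained. The `make_head` factory, optimiser and schedule are assumptions for illustration, not the paper's exact setup:

```python
import torch
import torch.nn as nn

def cascade_train(layers, make_head, data_loader, epochs_per_layer=5):
    # Bottom-up cascade training sketch: `layers` is a list of nn.Modules,
    # `make_head` builds a temporary classifier head for the current depth.
    trained = nn.Sequential()
    loss_fn = nn.CrossEntropyLoss()
    for layer in layers:
        for p in trained.parameters():
            p.requires_grad = False               # freeze earlier layers
        head = make_head(layer)                   # temporary output head
        model = nn.Sequential(trained, layer, head)
        opt = torch.optim.Adam(p for p in model.parameters() if p.requires_grad)
        for _ in range(epochs_per_layer):
            for x, y in data_loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()   # loss sits next to the new layer
                opt.step()
        trained = nn.Sequential(*trained, layer)  # keep the layer, drop the head
    return trained
```

Because only the newest layer and its head receive updates, the vanishing-gradient path through many layers never arises.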

Neural Wikipedian: generating textual summaries from knowledge base triples

Pavlos Vougiouklis, Hady Elsahar, Lucie-Aimée Kaffee, Christophe Gravier, Frederique Laforest, Jonathon Hare, Elena Simperl
Article

Abstract

Most people need textual or visual interfaces in order to make sense of Semantic Web data. In this paper, we investigate the problem of generating natural language summaries for Semantic Web data using neural networks. Our end-to-end trainable architecture encodes the information from a set of triples into a vector of fixed dimensionality and generates a textual summary by conditioning the output on the encoded vector. We explore a set of different approaches that enable our models to verbalise entities from the input set of triples in the generated text. Our systems are trained and evaluated on two corpora of loosely aligned Wikipedia snippets with triples from DBpedia and Wikidata, with promising results.
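
The core idea of encoding a set of triples into a single fixed-dimensional vector can be sketched as follows; the embed-project-sum design and all sizes here are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TripleSetEncoder(nn.Module):
    # Sketch: embed each (subject, predicate, object) triple and sum the
    # per-triple codes into one fixed-dimensional, order-invariant vector.
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(3 * dim, dim)

    def forward(self, triples):        # triples: LongTensor (batch, n, 3)
        e = self.embed(triples)        # (batch, n, 3, dim)
        e = e.flatten(2)               # concatenate s, p, o embeddings
        per_triple = torch.tanh(self.proj(e))
        return per_triple.sum(dim=1)   # fixed-size encoding of the set
```

A recurrent decoder conditioned on the returned vector would then generate the summary token by token, with entity verbalisation handled by the mechanisms the paper explores.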

An object-based convolutional neural network (OCNN) for urban land use classification

Ce Zhang, Isabel Sargent, Xin Pan, Huapeng Li, A. Gardiner, Jonathon Hare, Peter M. Atkinson
Article

Abstract

Urban land use information is essential for a variety of urban-related applications such as urban planning and regional administration. The extraction of urban land use from very fine spatial resolution (VFSR) remotely sensed imagery has, therefore, drawn much attention in the remote sensing community. Nevertheless, classifying urban land use from VFSR images remains a challenging task, due to the extreme difficulties in differentiating complex spatial patterns to derive high-level semantic labels. Deep convolutional neural networks (CNNs) offer great potential to extract high-level spatial features, thanks to its hierarchical nature with multiple levels of abstraction. However, blurred object boundaries and geometric distortion, as well as huge computational redundancy, severely restrict the potential application of CNN for the classification of urban land use. In this paper, a novel object-based convolutional neural network (OCNN) is proposed for urban land use classification using VFSR images. Rather than pixel-wise convolutional processes, the OCNN relies on segmented objects as its functional units, and CNN networks are used to analyse and label objects such as to partition within-object and between-object variation. Two CNN networks with different model structures and window sizes are developed to predict linearly shaped objects (e.g. Highway, Canal) and general (other non-linearly shaped) objects. Then a rule-based decision fusion is performed to integrate the class-specific classification results. The effectiveness of the proposed OCNN method was tested on aerial photography of two large urban scenes in Southampton and Manchester in Great Britain. The OCNN combined with large and small window sizes achieved excellent classification accuracy and computational efficiency, consistently outperforming its sub-modules, as well as other benchmark comparators, including the pixel-wise CNN, contextual-based MRF and object-based OBIA-SVM methods. The proposed method provides the first object-based CNN framework to effectively and efficiently address the complicated problem of urban land use classification from VFSR images.
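
The object-based prediction step might be pictured as below; all interfaces (`sample_patch`, `elongation`, `predict`) and the linearity threshold are hypothetical stand-ins for the paper's segmentation, object-shape analysis and trained CNN models:

```python
def classify_objects(objects, cnn_large, cnn_small, linearity_threshold=0.9):
    # Sketch: each segmented object is labelled by one of two CNNs, with
    # a small-window network for linearly shaped objects (e.g. highways,
    # canals) and a large-window network for general objects.
    labels = {}
    for obj in objects:
        patch = obj.sample_patch()          # image window centred on the object
        if obj.elongation() > linearity_threshold:
            labels[obj.id] = cnn_small.predict(patch)
        else:
            labels[obj.id] = cnn_large.predict(patch)
    return labels
```

A rule-based fusion step, as the abstract describes, would then reconcile the class-specific outputs.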

Detecting heel strikes for gait analysis through acceleration flow

Yan Sun, Jonathon Hare, Mark Nixon
Article

Abstract

In some forms of gait analysis it is important to be able to capture when heel strikes occur. In addition, in video analysis of gait, it is important to be able to localise where the heel strikes the floor. In this paper, a new motion descriptor, acceleration flow, is introduced for detecting heel strikes. The key frame of a heel strike can be determined from the quantity of acceleration flow within the Region of Interest (ROI), and the position of the strike can be found from the centre of rotation caused by radial acceleration. Our approach has been tested on a number of databases, recorded indoors and outdoors with multiple views and walking directions, to evaluate the detection rate under various environments. Experiments show the ability of our approach for both temporal detection and spatial positioning. The immunity of this new approach to three types of noise anticipated in real CCTV footage is also evaluated in our experiments. Our acceleration flow detector is shown to be less sensitive to Gaussian white noise, whilst remaining effective on low-resolution images and with incomplete body position information, when compared to other techniques.
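
For illustration only, a simple peak-picking rule over the ROI acceleration-flow energy might look like the sketch below; the array layout and the thresholded local-maximum rule are assumptions, not the paper's detector:

```python
import numpy as np

def heel_strike_frames(acc_flow, roi, threshold):
    # acc_flow: (frames, H, W, 2) acceleration-flow field;
    # roi: boolean (H, W) mask over the region of interest.
    magnitude = np.linalg.norm(acc_flow, axis=-1)     # per-pixel |a|
    energy = (magnitude * roi).sum(axis=(1, 2))       # per-frame ROI total
    # A frame is a candidate heel strike if it is a local maximum of the
    # ROI energy that also clears the detection threshold.
    return [t for t in range(1, len(energy) - 1)
            if energy[t] > threshold
            and energy[t] >= energy[t - 1] and energy[t] >= energy[t + 1]]
```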

VPRS-based regional decision fusion of CNN and MRF classifications for very fine resolution remotely sensed images

Ce Zhang, Isabel Sargent, Xin Pan, Andy Gardiner, Jonathon Hare, Peter M. Atkinson
Article

Abstract

Recent advances in computer vision and pattern recognition have demonstrated the superiority of deep neural networks using spatial feature representation, such as convolutional neural networks (CNN), for image classification. However, any classifier, regardless of its model structure (deep or shallow), involves prediction uncertainty when classifying spatially and spectrally complicated very fine spatial resolution (VFSR) imagery. We propose here to characterise the uncertainty distribution of CNN classification and integrate it into a regional decision fusion to increase classification accuracy. Specifically, a variable precision rough set (VPRS) model is proposed to quantify the uncertainty within CNN classifications of VFSR imagery, and partition this uncertainty into positive regions (correct classifications) and non-positive regions (uncertain or incorrect classifications). Those “more correct” areas were trusted by the CNN, whereas the uncertain areas were rectified by a Multi-Layer Perceptron (MLP)-based Markov random field (MLP-MRF) classifier to provide crisp and accurate boundary delineation. The proposed MRF-CNN fusion decision strategy exploited the complementary characteristics of the two classifiers based on VPRS uncertainty description and classification integration. The effectiveness of the MRF-CNN method was tested in both urban and rural areas of southern England as well as Semantic Labelling datasets. The MRF-CNN consistently outperformed the benchmark MLP, SVM, MLP-MRF and CNN and the baseline methods. This research provides a regional decision fusion framework within which to gain the advantages of model-based CNN, while overcoming the problem of losing effective resolution and uncertain prediction at object boundaries, which is especially pertinent for complex VFSR image classification.
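
A heavily simplified sketch of the fusion rule follows: CNN labels are kept where confidence is high (a stand-in for the VPRS positive region), and the uncertain remainder falls back to the MLP-MRF labels. The confidence test and tolerance `beta` are illustrative, not the actual VPRS formulation:

```python
import numpy as np

def region_fusion(cnn_probs, mrf_labels, beta=0.1):
    # cnn_probs: (H, W, n_classes) CNN class probabilities;
    # mrf_labels: (H, W) labels from the MLP-MRF classifier.
    cnn_labels = cnn_probs.argmax(axis=-1)
    confidence = cnn_probs.max(axis=-1)
    positive = confidence >= (1.0 - beta)    # trusted ("positive") region
    return np.where(positive, cnn_labels, mrf_labels)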

How biased is your NLG evaluation?

Pavlos Vougiouklis, Eddy Maddalena, Jonathon Hare, Elena Simperl
Conference or Workshop Item

Abstract

Human assessments by either experts or crowdworkers are used extensively for the evaluation of systems employed on a variety of text generation tasks. In this paper, we focus on the human evaluation of textual summaries generated from knowledge base triples. More specifically, we investigate possible similarities between the evaluation that is performed by experts and crowdworkers. We generate a set of summaries from DBpedia triples using a state-of-the-art neural network architecture. These summaries are evaluated against a set of criteria by both experts and crowdworkers. Our results highlight significant differences between the scores that are provided by the two groups.

Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders

Lucie-Aimée Kaffee, Hady Elsahar, Pavlos Vougiouklis, Christophe Gravier, Frederique Laforest, Jonathon Hare, Elena Simperl
Conference or Workshop Item

Abstract

While Wikipedia exists in 287 languages, its content is unevenly distributed among them. It is therefore of utmost social and cultural importance to focus efforts on languages whose speakers only have access to limited Wikipedia content. We investigate supporting communities by generating summaries for Wikipedia articles in underserved languages, given structured data as an input.

We focus on an important support for such summaries: ArticlePlaceholders, dynamically generated content pages in underserved Wikipedias. They enable native speakers to access existing information in Wikidata. To extend those ArticlePlaceholders, we provide a system which processes the triples of the KB as they are provided by the ArticlePlaceholder and generates a comprehensible textual summary. This data-driven approach is employed with the goal of understanding how well it matches the communities' needs in two underserved languages on the Web: Arabic, a language with a large community but disproportionate access to knowledge online, and Esperanto, an easily acquired, artificial language whose Wikipedia content is maintained by a small but devoted community. With the help of the Arabic and Esperanto Wikipedians, we conduct a study which evaluates not only the quality of the generated text, but also the usefulness of our end-system to any underserved Wikipedia version.

A hybrid MLP-CNN classifier for very fine resolution remotely sensed image classification

Ce Zhang, Xin Pan, Huapeng Li, Andy Gardiner, Isabel Sargent, Jonathon Hare, Peter M. Atkinson
Article

Abstract

The contextual-based convolutional neural network (CNN) with deep architecture and pixel-based multilayer perceptron (MLP) with shallow structure are well-recognized neural network algorithms, representing the state-of-the-art deep learning method and the classical non-parametric machine learning approach, respectively. The two algorithms, which have very different behaviours, were integrated in a concise and effective way using a rule-based decision fusion approach for the classification of very fine spatial resolution (VFSR) remotely sensed imagery. The decision fusion rules, designed primarily based on the classification confidence of the CNN, reflect the generally complementary patterns of the individual classifiers. In consequence, the proposed ensemble classifier MLP-CNN harvests the complementary results acquired from the CNN based on deep spatial feature representation and from the MLP based on spectral discrimination. Meanwhile, limitations of the CNN due to the adoption of convolutional filters such as the uncertainty in object boundary partition and loss of useful fine spatial resolution detail were compensated. The effectiveness of the ensemble MLP-CNN classifier was tested in both urban and rural areas using aerial photography together with an additional satellite sensor dataset. The MLP-CNN classifier achieved promising performance, consistently outperforming the pixel-based MLP, spectral and textural-based MLP, and the contextual-based CNN in terms of classification accuracy. This research paves the way to effectively address the complicated problem of VFSR image classification.

Learning to count objects in natural images for visual question answering

Yan Zhang, Jonathon Hare, Adam Prügel-Bennett
Conference or Workshop Item

Abstract

Visual Question Answering (VQA) models have so far struggled with counting objects in natural images. We identify soft attention in these models as a fundamental cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component, and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component improves counting over a strong baseline by 6.6%.
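
To see why object proposals preserve count information where averaged soft attention does not, consider the toy de-duplication count below. The published component is a differentiable neural module, so this greedy IoU suppression is only an illustrative analogy, not the method itself:

```python
def iou(a, b):
    # Intersection over union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def count_objects(boxes, scores, iou_thresh=0.5, score_thresh=0.5):
    # Greedy de-duplication: keep a proposal if it is confident and does
    # not overlap an already-kept proposal; the count is what remains.
    kept = []
    for box, score in sorted(zip(boxes, scores), key=lambda p: -p[1]):
        if score >= score_thresh and all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return len(kept)
```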

Wikidata2Wikipedia: Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata

Lucie-Aimée Kaffee, Hady Elsahar, Pavlos Vougiouklis, Christophe Gravier, Frederique Laforest, Jonathon Hare, Elena Simperl

Abstract

The associated repository contains the code and the corpora that were used in order to build a "learnable" system that generates open-domain textual summaries in Arabic and Esperanto given a set of Wikidata triples as input. The two corpora that have been used for the experiments are included in the repository: (i) Wikidata triples aligned with Wikipedia summaries in Arabic and (ii) Wikidata triples aligned with Wikipedia summaries in Esperanto.

Semantic face signatures: recognizing and retrieving faces by verbal descriptions

Nawaf Yousef Almudhahka, Mark Nixon, Jonathon Hare
Article

Abstract

The adverse visual conditions of surveillance environments and the need to identify humans at a distance have stimulated research in soft biometric attributes. These attributes can be used to describe a human's physical traits semantically and can be acquired without their cooperation. Soft biometrics can also be employed to retrieve identity from a database using verbal descriptions of suspects. In this paper, we explore unconstrained human face identification with semantic face attributes derived automatically from images. The process uses a deformable face model with keypoint localisation which is aligned with attributes derived from semantic descriptions. Our new framework exploits the semantic feature space to infer face signatures from images and bridges the semantic gap between humans and machines with respect to face attributes. We use an unconstrained dataset, LFW-MS4, consisting of all the subjects from view-1 of the LFW database that have four or more samples. Our new approach demonstrates that retrieval via estimated comparative facial soft biometrics yields a match in the top 10.23% of returned subjects. Furthermore, modelling of face image features in the semantic space can achieve an equal error rate of 12.71%. These results reveal the latent benefits of modelling visual facial features in a semantic space. Moreover, they highlight the potential of using images and verbal descriptions to generate comparative soft biometrics for subject identification and retrieval.

Learning to generate Wikipedia summaries for underserved languages from Wikidata

Lucie-Aimée Kaffee, Hady Elsahar, Pavlos Vougiouklis, Christophe Gravier, Frederique Laforest, Jonathon Hare, Elena Simperl
Conference or Workshop Item

Abstract

While Wikipedia exists in 287 languages, its content is unevenly distributed among them. In this work, we investigate the generation of open-domain Wikipedia summaries in underserved languages using structured data from Wikidata. To this end, we propose a neural network architecture equipped with copy actions that learns to generate single-sentence and comprehensible textual summaries from Wikidata triples. We demonstrate the effectiveness of the proposed approach by evaluating it against a set of baselines on two languages of different natures: Arabic, a morphologically rich language with a larger vocabulary than English, and Esperanto, a constructed language known for its easy acquisition.

Comparative face soft biometrics for human identification

Mark Nixon, Nawaf Almudhahka, Jonathon Hare, P. Karampelas, T. Bourlai
Book Section

Abstract

The recent growth in CCTV systems and the challenges of automatically identifying humans under the adverse visual conditions of surveillance have increased the interest in soft biometrics, which are physical attributes that can be used to describe people semantically. Soft biometrics enable human identification based on verbal descriptions, and they can be captured in conditions where it is impossible to acquire traditional biometrics such as iris and fingerprint. Research on facial soft biometrics has tended to focus on identification using categorical attributes, whereas comparative attributes have shown better accuracy. Nevertheless, research in comparative facial soft biometrics has been limited to small constrained databases, while identification in surveillance systems involves unconstrained large databases. In this chapter, we explore human identification through comparative facial soft biometrics in large unconstrained databases using the Labelled Faces in the Wild (LFW) database. We propose a novel set of attributes and investigate their significance. We also analyse the reliability of comparative facial soft biometrics for realistic databases and explore identification and verification using comparative facial soft biometrics. The results of the performance analysis show that, by comparing an unknown subject to a line-up of only ten subjects, a correct match will be found in the top 2.08% of retrieved subjects from a database of 4038 subjects.

A structured light approach to imaging ancient Near Eastern cylinder seals: how efficient 3D imaging may facilitate corpuswide research

Jacob Dahl, Jonathon Hare, Kate Kelley, Kirk Martinez, David Young, Kaye Kelley, Rachel Wood
Book Section

Abstract

This chapter presents the work of the 12-month project Seals and Their Impressions in the Ancient Near East (SIANE), a collaborative effort of the University of Southampton, Oxford University and the University of Paris (Nanterre). Recognising the need for improved visual documentation of ancient Near Eastern cylinder seals and the potential presented by new technologies, there have been several approaches to 3D-imaging cylinder seals in recent years (e.g. Pitzalis et al. 2008; Reh et al. 2016; Wagensonner forthcoming). SIANE focused on the development of equipment and a workflow that can quickly capture the maximum amount of meaningful data from a seal, including 3D data from structured light and the automated production of ‘digital unwrappings’. The project addressed some issues regarding the physical mounting of seals and developed a method of efficient data capture that allows the imaging of large numbers of cylinder seals for research and presentation purposes. A particular research benefit of 3D image capture of entire seal collections is the potential for exploring computer-aided image recognition, which could contribute to comparative glyptic studies as well as helping to address the question of whether any original seals can be linked to known ancient impressions on tablets or sealings possibly separated across modern collections.

Tackling the small data problem in deep learning with multi-sensor approaches

Iris Caroline Kramer, Jonathon Hare, Adam Prugel-Bennett
Conference or Workshop Item

Abstract

Within data science, many problems are solved using machine learning. Recently, with the introduction of deep learning, we have seen this trend spread across industries, of which archaeological object detection on remote sensor data is a case in point. From the known case studies, we have identified the main issues and developed improvements accordingly.

The main issue with archaeological datasets is that there are only a limited number of known sites, which makes the networks prone to overfitting. Overfitting happens when a network is trained on too few examples and learns patterns that do not generalise well to new data. To an extent, data augmentation can be used to prevent overfitting; however, the training images would still be highly correlated. Therefore, it is argued that the greatest gains come from limiting the storage of irrelevant features in networks. This can be done by optimising network architectures and, additionally, by using transfer learning, in which pre-trained networks are used to initialise training. Even when pre-trained on datasets without archaeological sites, a network can still be useful for its low-level features (including lines and edges). A downside of pre-trained networks is that they can only work with data in the same format as that on which they were trained.

Our main contribution is research into including multi-sensor data. We will present approaches to training networks using images with stacks of data, applying fusion networks, and generating pre-trained networks for the available data from different sensors.

Analysing acceleration for motion analysis

Yan Sun, Jonathon Hare, Mark Nixon
Conference or Workshop Item

Abstract

Previous research in motion analysis of image sequences has generally not considered the basic nature of higher orders of motion such as acceleration. In this work, we disambiguate different types of motion, and in particular focus on acceleration. First, we show acceleration can be computed in a principled manner by extending Horn and Schunck’s algorithm for global optical flow estimation. We then demonstrate an approximation of the acceleration field using an alternative established optical flow technique, since most real motions violate the global smoothness assumption of Horn and Schunck. Furthermore, we decompose acceleration into radial and tangential components for greater depth of understanding of the motion. As a general motion descriptor, we show how acceleration provides the capability for differentiating different types of motion in video sequences.
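
Treating acceleration as the temporal change of the velocity (optical flow) field, the radial/tangential decomposition described above can be sketched as follows; the array conventions are assumptions for illustration:

```python
import numpy as np

def decompose_acceleration(v, a):
    # v, a: (..., 2) velocity and acceleration fields. The tangential
    # component is the projection of `a` onto the direction of motion;
    # the radial component is the orthogonal remainder.
    v_hat = v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8)
    a_tangential = (a * v_hat).sum(axis=-1, keepdims=True) * v_hat
    a_radial = a - a_tangential
    return a_tangential, a_radial
```

The tangential part captures speeding up and slowing down along the motion path, while the radial part captures changes in direction, which is what makes the two components useful for distinguishing motion types.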

Automated detection of archaeology in the New Forest using deep learning with remote sensor data

Iris Caroline Kramer, Jonathon Hare, Adam Prugel-Bennett, Isabel Sargent
Conference or Workshop Item

Abstract

As a result of the New Forest Knowledge project, many new sites were discovered. This was partly due to the LiDAR survey undertaken, which was followed by an intensive manual process to interpret the results. The research presented in this paper looks at methods to automate this process, especially for round barrow detection, using deep learning.

Traditionally, automated methods require manual feature engineering to extract the visual appearance of a site from remote sensing data. Whereas this approach is difficult, expensive and bound to detect a single type of site, recent developments have moved towards automated feature learning, of which deep learning is the most notable. In our approach, we use known site locations together with LiDAR data and aerial images to train Convolutional Neural Networks (CNNs). Such a network is typically constructed of many layers, each representing different filters (e.g. to detect lines or edges). When the network is trained, each new site location fed to it updates the feature weights to better represent the appearance of sites in the remote sensing data. For this learning process, an accurate dataset with many examples is required, and therefore the New Forest is a very suitable case study, especially thanks to the extensive research of the New Forest Knowledge project.

In this paper, our latest results are presented together with a future perspective on how we can scale our approach to a country-wide detection method as computing power becomes even more efficient.

Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples

Pavlos Vougiouklis, Hady Elsahar, Lucie-Aimée Kaffee, Christophe Gravier, Frederique Laforest, Jonathon Hare, Elena Simperl

Abstract

The linked repository contains the code along with the required corpora that were used in order to build a system that "learns" how to generate English biographies for Semantic Web triples. Two corpora are included: (i) DBpedia triples aligned with Wikipedia biographies and (ii) Wikidata triples aligned with Wikipedia biographies.

Inference and discovery in remote sensing data with features extracted using deep networks

Isabel Sargent, Jonathon Hare, David Young, Olivia Wilson, Charis Doidge, David Holland, Peter M. Atkinson, Max Bramer, Miltos Petridis
Conference or Workshop Item

Abstract

We aim to develop a process by which we can extract generic features from aerial image data that can be used both to infer the presence of objects and characteristics and to discover new ways of representing the landscape. We investigate the fine-tuning of a 50-layer ResNet deep convolutional neural network that was pre-trained with ImageNet data, and extracted features at several layers throughout both the pre-trained and the fine-tuned networks. These features were applied to several supervised classification problems, obtaining a significant correlation between classification accuracy and layer number. Visualising the activation of the networks’ nodes showed that fine-tuning had not achieved coherent representations at later layers. We conclude that we need to train with considerably more varied data but that, even without fine-tuning, features derived from a deep network can produce better classification results than image data alone.

A future perspective for automated detection of archaeology using deep learning with remote sensor data

Iris Caroline Kramer, Jonathon Hare, Adam Prugel-Bennett
Conference or Workshop Item

Abstract

An essential aspect of archaeology is the protection of sites from looters, extensive agriculture and erosion. Under this constant threat of destruction, it is of utmost importance that sites are located so that they can be monitored and protected. This is mostly done on the ground or by using remote sensing data such as aerial images or LiDAR derived elevation models. This task is time consuming and requires highly specialised and experienced people and would thus immensely benefit from automation. Within this novel research, the potential of deep learning for the detection of archaeological sites is being assessed.

Automation on steroids: an exploration of why deep learning is dominating automation

Iris Caroline Kramer, Jonathon Hare, Adam Prugel-Bennett
Conference or Workshop Item

Abstract

Traditionally, research initiatives into the automated detection of archaeological objects focussed on feature engineering to detect individual object types. These methods have been criticised for their lack of accuracy, which is mostly caused by their inability to capture the variability within an object type and in the objects’ appearance across different land cover types.

Recently, rather than further optimising features, research has shifted towards feature learning, which offers more flexibility. This shift was triggered by the overwhelming successes of deep learning (shown, for example, in self-driving cars and medical imagery). A deep convolutional neural network is built up of many layers and learns features from images of known objects which are fed to the network. In the early layers of a network only basic abstractions such as lines and edges are learned, and as the deeper layers are reached the features become more refined and are able to extract the key characteristics of the object type. This process is very similar to how a human learns, although there are some important advantages to the structure of deep networks. For example, they can be designed to incorporate different types of remote sensor data and can hence internally compare this variety of data. In this manner a network will quickly identify obvious false positives and adapt the weights of the layers accordingly. Another important point is that a network can fully appreciate the small variation of pixel values without any image enhancements. For LiDAR data this effect can be demonstrated with a network that identifies a slope in the first layers and later on learns that the slope direction and local relief are important features for a specific object type.

The approaches listed above just scratch the surface of the wide range of possible methods for applying deep learning to aerial archaeology. In the end, the shift in research is mainly driven by the far-future concept of a national model which automatically retrains with newly acquired remote sensing data to allow for new discoveries that can further improve the networks.

What do Wikidata and Wikipedia have in common? An analysis of their use of external references

Alessandro Piscopo, Pavlos Vougiouklis, Lucie-Aimée Frimelle Kaffee, Christopher Phethean, Jonathon Hare, Elena Simperl
Conference or Workshop Item

Abstract

Wikidata is a community-driven knowledge graph, strongly linked to Wikipedia. However, the connection between the two projects has been only sporadically explored. We investigated the relationship between the two projects in terms of the information they contain by looking at their external references. Our findings show that while only a small number of sources are directly reused across Wikidata and Wikipedia, references often point to the same domain. Furthermore, Wikidata appears to use less Anglo-American-centred sources. These results deserve further in-depth investigation.

Automatic semantic face recognition

Nawaf Yousef Almudhahka, Mark Nixon, Jonathon Hare
Conference or Workshop Item

Abstract

Recent expansion in surveillance systems has motivated research in soft biometrics that enable the unconstrained recognition of human faces. Comparative soft biometrics show superior recognition performance to categorical soft biometrics and have been the focus of several studies which have highlighted their ability for recognition and retrieval in constrained and unconstrained environments. These studies, however, only addressed face recognition for retrieval using human-generated attributes, posing a question about the feasibility of automatically generating comparative labels from facial images. In this paper, we propose an approach for the automatic comparative labelling of facial soft biometrics. Furthermore, we investigate unconstrained human face recognition using these comparative soft biometrics in a human-labelled gallery (and vice versa). Using a subset of the LFW dataset, our experiments show the efficacy of the automatic generation of comparative facial labels, highlighting the potential extensibility of the approach to other face recognition scenarios and larger ranges of attributes.

A neural network approach for knowledge-driven response generation

Pavlos Vougiouklis, Jonathon Hare, Elena Simperl
Conference or Workshop Item

Unconstrained human identification using comparative facial soft biometrics

Nawaf Y. Almudhahka, Mark S. Nixon, Jonathon S. Hare
Conference or Workshop Item

Abstract

Soft biometrics are attracting a lot of interest with the spread of surveillance systems and the need to identify humans at a distance and under adverse visual conditions. Comparative soft biometrics have shown a significantly better impact on identification performance compared to traditional categorical soft biometrics. However, existing work that has studied comparative soft biometrics was based on small datasets with samples taken under constrained visual conditions. In this paper, we investigate human identification using comparative facial soft biometrics on a larger and more realistic scale, using 4038 subjects from the View 1 subset of the LFW database. Furthermore, we introduce a new set of comparative facial soft biometrics and investigate their effect on identification and verification performance. Our experiments show that by using only 24 features and 10 comparisons, a rank-10 identification rate of 96.98% and a verification accuracy of 93.66% can be achieved.

Aligning texts and knowledge bases with semantic sentence simplification

Yassine Mrabet, Pavlos Vougiouklis, Halil Kilicoglu, Claire Gardent, Dina Demner-Fushman, Jonathon Hare, Elena Simperl
Conference or Workshop Item

Abstract

Finding the natural language equivalent of structured data is both a challenging and promising task. In particular, an efficient alignment of knowledge bases with texts would benefit many applications, including natural language generation, information retrieval and text simplification. In this paper, we present an approach to build a dataset of triples aligned with equivalent sentences written in natural language. Our approach consists of three main steps. First, target sentences are annotated automatically with knowledge base (KB) concepts and instances. The triples linking these elements in the KB are extracted as candidate facts to be aligned with the annotated sentence. Second, we use textual mentions referring to the subject and object of these facts to semantically simplify the target sentence via crowdsourcing. Third, the sentences provided by different contributors are post-processed to keep only the most relevant simplifications for the alignment with KB facts. We present different filtering methods, and share the constructed datasets in the public domain. These datasets contain 1050 sentences aligned with 1885 triples. They can be used to train natural language generators as well as semantic or contextual text simplifiers.

Erica the Rhino: a case study in using Raspberry Pi Single Board Computers for interactive art

Philip Basford, Graeme Bragg, Jonathon Hare, Mike Jewell, Kirk Martinez, David Newman, Reena Pau, Ash Smith, Tyler Ward
Article

Abstract

Erica the Rhino is an interactive art exhibit created by the University of Southampton, UK. Erica was created as part of a city-wide art trail in 2013 called "Go! Rhinos", curated by Marwell Wildlife, to raise awareness of rhino conservation. Erica arrived as a white fibreglass shell which was then painted and equipped with 5 Raspberry Pi Single Board Computers (SBCs). These computers allowed the audience to interact with Erica through a range of sensors and actuators. In particular, the audience could feed and stroke her to prompt reactions, as well as send her Tweets to change her behaviour. Pi SBCs were chosen because of their ready availability and their educational pedigree. During the deployment, 'coding clubs' were run in the shopping centre where Erica was located; these allowed children to experiment with and program the same components used in Erica. The experience gained through numerous deployments around the country has enabled Erica to be upgraded to increase reliability and ease of maintenance, whilst the release of the Pi 2 has allowed her responsiveness to be improved.

Aligning Texts and Knowledge Bases with Semantic Sentence Simplification

Yassine Mrabet, Pavlos Vougiouklis, Halil Kilicoglu, Claire Gardent, Dina Demner-Fushman, Jonathon Hare, Elena Simperl

Abstract

The linked repository contains the resultant datasets of the Semantic Sentence Simplification (S3) methodology. Two high quality data-to-text corpora have been built: (i) DBpedia triples aligned with single Wikipedia sentences and (ii) triples from the Unified Medical Language System (UMLS) aligned with single MedlinePlus sentences.

Human face identification via comparative soft biometrics

Nawaf Almudhahka, Mark Nixon, Jonathon Hare
Conference or Workshop Item

Detection of Social Events in Streams of Social Multimedia

Jonathon Hare, Sina Samangooei, Mahesan Niranjan, Nicholas Gibbins
Article

Abstract

Combining items from social media streams, such as Flickr photos and Twitter tweets, into meaningful groups can help users contextualise and more effectively consume the torrents of information continuously being made available on the social web. This task is made challenging by the scale of the streams and the inherently multimodal nature of the information being contextualised.

The problem of grouping social media items into meaningful groups can be seen as an ill-posed and application-specific unsupervised clustering problem. A fundamental question in multimodal contexts is determining which features best signify that two items should belong to the same grouping.

This paper presents a methodology which approaches social event detection as a streaming multi-modal clustering task. The methodology takes advantage of the temporal nature of social events and as a side benefit, allows for scaling to real-world datasets. Specific challenges of the social event detection task are addressed: the engineering and selection of the features used to compare items to one another; a feature fusion strategy that incorporates relative importance of features; the construction of a single sparse affinity matrix; and clustering techniques which produce meaningful item groups whilst scaling to cluster very large numbers of items.

The state-of-the-art approach presented here is evaluated using the ReSEED dataset with standardised evaluation measures. With automatically learned feature weights, we achieve an F1 score of 0.94, showing that a good compromise between precision and recall of clusters can be achieved. In a comparison with other state-of-the-art algorithms our approach is shown to give the best results.
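
A minimal sketch of the feature-fusion step is given below; the feature names, weights and sparsification threshold are illustrative (the paper learns the feature weights automatically and exploits the temporal nature of events so that only temporally close items need be compared at all):

```python
import numpy as np

def fused_affinity(similarity_matrices, weights, sparsity_threshold=0.1):
    # Weighted sum of per-feature item-item similarity matrices
    # (e.g. time, geolocation, tags, visual) into a single affinity
    # matrix, which is then sparsified by dropping weak affinities.
    fused = sum(w * similarity_matrices[name] for name, w in weights.items())
    fused[fused < sparsity_threshold] = 0.0
    return fused
```

The sparse matrix produced this way is what makes clustering tractable at the scale of real-world streams.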

Entity-based Opinion Mining from Text and Multimedia

Diana Maynard, Jonathon Hare
Book Section

Getting by with a little help from the crowd: optimal human computation approaches to social image labeling

Babak Loni, Jonathon Hare, Mihai Georgescu, Michael Riegler, Mohamed Morchid, Richard Dufour, Martha Larson
Conference or Workshop Item

Abstract

Validating user tags helps to refine them, making them more useful for finding images. In the case of interpretation-sensitive tags, however, automatic (i.e., pixel-based) approaches cannot be expected to deliver optimal results. Instead, human input is key. This paper studies how crowdsourcing-based approaches to image tag validation can achieve parsimony in their use of human input from the crowd, in the form of votes collected from workers on a crowdsourcing platform. Experiments in the domain of social fashion images are carried out using the dataset published by the Crowdsourcing Task of the MediaEval 2013 Multimedia Benchmark. Experimental results reveal that when a larger number of crowd-contributed votes are available, it is difficult to beat a majority vote. However, additional information sources, i.e., crowdworker history and visual image features, allow us to maintain similar validation performance while making use of less crowd-contributed input. Further, investing in “expensive” experts who collaborate to create definitions of interpretation-sensitive concepts does not necessarily pay off. Instead, experts can cause interpretations of concepts to drift away from conventional wisdom. In short, validation of interpretation-sensitive user tags for social images is possible, with “just a little help from the crowd.”

NicePic! A system for extracting attractive photos from Flickr streams

Stefan Siersdorfer, Sergej Zerr, Jose San Pedro, Jonathon Hare
Conference or Workshop Item

Information extraction from multimedia web documents: an open-source platform and testbed

David Dupplaw, Michael Matthews, Richard Johansson, Giulia Boato, Andrea Costanzo, Marco Fontani, Enrico Minack, Elena Demidova, Roi Blanco, Thomas Griffiths, Paul H. Lewis, Jonathon Hare, Alessandro Moschitti
Article

Abstract

The LivingKnowledge project aimed to enhance the current state of the art in search, retrieval and knowledge management on the web by advancing the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques have been integrated into a single, but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents, and unlike earlier platforms, it exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software in the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system and describes two applications that utilise the system for multimedia information retrieval.

Placing Photos with a Multimodal Probability Density Function

Jonathon Hare, Jamie Davies, Sina Samangooei, Paul H. Lewis
Conference or Workshop Item

Abstract

Knowing the location where a photograph was taken provides us with data that could be useful in a wide spectrum of applications. With the advance of digital cameras, and with many users exchanging their digital cameras for GPS-enabled mobile phones, photographs annotated with geographical locations are becoming ever more present on photo-sharing websites such as Flickr. However there is still a mass of content that is not geotagged, meaning that algorithms for efficient and accurate geographical estimation of an image are needed. This paper presents a general model for effectively using both textual metadata and visual features of photos to automatically place them on a world map with state-of-the-art performance. In addition, we explore how information from user-modelling can be fused with our model, and investigate the effect such modelling has on performance.
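
A bare-bones sketch of the textual side of such a model is shown below: per-tag densities over a discretised world grid are multiplied together and the mode is taken as the estimate. The grid resolution, the pre-computed `tag_densities` and the pure product rule are assumptions of this sketch; the full model also fuses visual features and user modelling:

```python
import numpy as np

def locate(photo_tags, tag_densities, grid_shape=(1800, 3600)):
    # tag_densities: maps a tag to a (1800, 3600) density surface over a
    # 0.1-degree lat/lon grid, estimated from geotagged training photos.
    density = np.ones(grid_shape)
    for tag in photo_tags:
        if tag in tag_densities:
            density *= tag_densities[tag] + 1e-12   # product of per-tag PDFs
    row, col = np.unravel_index(density.argmax(), grid_shape)
    lat = 90.0 - row * (180.0 / grid_shape[0])
    lon = col * (360.0 / grid_shape[1]) - 180.0
    return lat, lon
```

Because everything is expressed as a density over the same grid, extra evidence (visual neighbours, a user's location history) can be folded in as further factors in the product.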

SemanticNews: Enriching publishing of news stories

Jonathon Hare, David Newman, Wim Peters, Mark Greenwood, Jana Eggink
Monograph

Abstract

A central goal for the EPSRC-funded Semantic Media Network project is to support interesting collaboration opportunities between researchers in order to foster relationships and encourage working together (EPSRC priority 'Working Together'). SemanticNews was one of the four projects funded in the first round of Semantic Media Network mini-projects, and was a collaboration between the Universities of Southampton and Sheffield, together with the BBC.
The SemanticNews project aimed to promote people's comprehension and assimilation of news by augmenting broadcast news discussion and debate with information from the semantic web in the form of linked open data (LOD). The project has laid the foundations for a toolkit for (semi-)automatic provision of semantic analysis and contextualisation of the discussion of current events, encompassing state-of-the-art semantic web technologies including text mining, consolidation against Linked Open Data, and advanced visualisation.
SemanticNews was bootstrapped using episodes of the BBC Question Time programme that already had transcripts and manually curated metadata, which included a list of the topical questions being debated. This information was used to create a workflow that a) extracts relevant entities using established named entity recognition techniques to identify the types of information to contextualise for a news article; b) provides associations with concepts from LOD resources; and, c) visualises the context using information derived from the LOD cloud.
This document forms the final report of the SemanticNews project, and describes in detail the processes and techniques explored for the enrichment of Question Time episodes. The final section of the report discusses how this work could be expanded in the future, and also makes a few recommendations for additional data that could be captured during the production process to make the automatic generation of the contextualisation easier.

Exploiting multimedia in creating and analysing multimedia Web archives

Jonathon Hare, David Dupplaw, Paul H. Lewis, Wendy Hall, Kirk Martinez
Article

Abstract

The data contained on the web and the social web are inherently multimedia and consist of a mixture of textual, visual and audio modalities. Community memories embodied on the web and social web contain a rich mixture of data from these modalities. In many ways, the web is the greatest resource ever created by humankind. However, due to the dynamic and distributed nature of the web, its content changes, appears and disappears on a daily basis. Web archiving provides a way of capturing snapshots of (parts of) the web for preservation and future analysis. This paper provides an overview of techniques we have developed within the context of the EU-funded ARCOMEM (ARchiving COmmunity MEMories) project to allow multimedia web content to be leveraged during the archival process and for post-archival analysis. Through a set of use cases, we explore several practical applications of multimedia analytics within the realm of web archiving, web archive analysis and multimedia data on the web in general.

Multimodal Sentiment Analysis of Social Media

Diana Maynard, David Dupplaw, Jonathon Hare
Conference or Workshop Item

Abstract

This paper describes the approach we take to the analysis of social media, combining opinion mining from text and multimedia (images, videos, etc), and centred on entity and event recognition. We examine a particular use case, which is to help archivists select material for inclusion in an archive of social media for preserving community memories, moving towards structured preservation around semantic categories. The textual approach we take is rule-based and builds on a number of sub-components, taking into account issues inherent in social media such as noisy ungrammatical text, use of swear words, sarcasm etc. The analysis of multimedia content complements this work in order to help resolve ambiguity and to provide further contextual information. We provide two main innovations in this work: first, the novel combination of text and multimedia opinion mining tools; and second, the adaptation of NLP tools for opinion mining specific to the problems of social media.

Identifying the Geographic Location of an Image with a Multimodal Probability Density Function

Jamie Davies, Jonathon Hare, Sina Samangooei, John Preston, Neha Jain, David Dupplaw, Paul H. Lewis
Conference or Workshop Item

Abstract

There is a wide array of online photographic content that is not geotagged. Algorithms for efficient and accurate geographical estimation of an image are needed to geolocate these photos. This paper presents a general model for using both textual metadata and visual features of photos to automatically place them on a world map.

A Unified, Modular and Multimodal Approach to Search and Hyperlinking Video

John Preston, Jonathon Hare, Sina Samangooei, Jamie Davies, Neha Jain, David Dupplaw, Paul H. Lewis
Conference or Workshop Item

Abstract

This paper describes a modular architecture for searching and hyperlinking clips of TV programmes. The architecture aimed to unify the combination of features from different modalities through a common representation based on a set of probability density functions over the timeline of a programme. The core component of the system consisted of analysis of sections of transcripts based on a textual query. Results show that search is made worse by the addition of other components, whereas in hyperlinking precision is increased by the addition of visual features.
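
The common representation can be pictured as weighted one-dimensional densities over the programme timeline; a toy combination rule (component names and weights are illustrative, not the system's configuration) might be:

```python
import numpy as np

def combine_timelines(component_densities, weights):
    # component_densities: list of 1-D relevance densities over the
    # programme timeline (e.g. one-second bins) from transcript,
    # visual and other components; weights: matching list of floats.
    timeline = sum(w * d for d, w in zip(component_densities, weights))
    return int(np.argmax(timeline))   # best jump-in point, in bins
```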

Experiments in Diversifying Flickr Result Sets

Neha Jain, Jonathon Hare, Sina Samangooei, John Preston, Jamie Davies, David Dupplaw, Paul H. Lewis
Conference or Workshop Item

Abstract

The 2013 MediaEval Retrieving Diverse Social Images Task tackled the problem of search result diversification for Flickr result sets formed from queries about geographic places and landmarks. In this paper we describe our approach of using a min-max similarity diversifier coupled with pre-filters and a reranker. We also demonstrate a number of novel features for measuring similarity for use in the diversification step.

An investigation of techniques that aim to improve the quality of labels provided by the crowd

Jonathon Hare, Maribel Acosta, Anna Weston, E. Simperl, Sina Samangooei, David Dupplaw, Paul H. Lewis
Conference or Workshop Item

Abstract

The 2013 MediaEval Crowdsourcing task looked at the problem of working with noisy crowdsourced annotations of image data. The aim of the task was to investigate possible techniques for estimating the true labels of an image by using the set of noisy crowdsourced labels, and possibly any content and metadata from the image itself. For the runs in this paper, we’ve applied a shotgun approach and tried a number of existing techniques, which include generative probabilistic models and further crowdsourcing.

Social Event Detection via sparse multi-modal feature selection and incremental density based clustering

Sina Samangooei, Jonathon Hare, David Dupplaw, Mahesan Niranjan, Nicholas Gibbins, Paul H. Lewis, Jamie Davies, Neha Jain, John Preston
Conference or Workshop Item

Abstract

Combining items from social media streams, such as Flickr photos and Twitter tweets, into meaningful groups can help users contextualise and effectively consume the torrents of information now made available on the social web. This task is made challenging by the scale of the streams and the inherently multimodal nature of the information to be contextualised. We present a methodology which approaches social event detection as a multi-modal clustering task. We address the various challenges of this task: the selection of the features used to compare items to one another; the fusion of those features and their relative importance; the construction of a single sparse affinity matrix; and clustering techniques which produce meaningful item groups whilst scaling to cluster large numbers of items. In our best tested configuration we achieve an F1 score of 0.94, showing that a good compromise between precision and recall of clusters can be achieved using our technique.

The role of multimedia in archiving community memories

Jonathon S. Hare, David Dupplaw, Wendy Hall, Paul Lewis, Kirk Martinez
Conference or Workshop Item

Abstract

The data contained on the web and social web is inherently multimedia; consisting of a mix of textual, visual and audio modalities. Community memories embodied on the web and social web contain a rich mixture of data from these modalities. This paper explores some uses for the automatic analysis of multimedia data within the context of the archival and post-archival analysis of community memories on the web and social web.

OpenIMAJ – Intelligent Multimedia Analysis in Java

Jonathon Hare, Sina Samangooei, David Dupplaw
Article

Building a Multimedia Web Observatory Platform

Jonathon Hare, David Dupplaw, Wendy Hall, Paul H. Lewis, Kirk Martinez
Conference or Workshop Item

Abstract

The data contained within the web is inherently multimedia; consisting of a rich mix of textual, visual and audio modalities. Prospective Web Observatories need to take this into account from the ground up. This paper explores some uses for the automatic analysis of multimedia data within a Web Observatory, and describes a potential platform for an extensible and scalable multimedia Web Observatory.

The Southampton University Web Observatory

Wendy Hall, Thanassis Tiropanis, Ramine Tinati, Paul Booth, Paul Gaskell, Jonathon Hare, Les Carr
Conference or Workshop Item

Twitter's visual pulse

Jonathon Hare, Sina Samangooei, David Dupplaw, Paul H. Lewis
Conference or Workshop Item

Abstract

Millions of images are tweeted every day, yet very little research has looked at the non-textual aspect of social media communication. In this work we have developed a system to analyse streams of image data. In particular we explore trends in similar, related, evolving or even duplicated visual artefacts in the mass of tweeted image data — in short, we explore the visual pulse of Twitter.

Explicit diversification of image search

Jonathon Hare, Paul H. Lewis
Conference or Workshop Item

Abstract

Search result diversification can increase user satisfaction in answering a particular information need. There are many ways to diversify search results. In some cases the user has a clear idea of how they would like to see their results diversified. This work presents a system that is capable of diversifying search results along specific, user-specified axes of diversity.

Practical scalable image analysis and indexing using Hadoop

Jonathon S. Hare, Sina Samangooei, Paul H. Lewis
Article

Abstract

The ability to handle very large amounts of image data is important for image analysis, indexing and retrieval applications. Sadly, in the literature, scalability aspects are often ignored or glossed over, especially with respect to the intricacies of actual implementation details.

In this paper we present a case-study showing how a standard bag-of-visual-words image indexing pipeline can be scaled across a distributed cluster of machines. In order to achieve scalability, we investigate the optimal combination of hybridisations of the MapReduce distributed computational framework which allows the components of the analysis and indexing pipeline to be effectively mapped and run on modern server hardware. We then demonstrate the scalability of the approach practically with a set of image analysis and indexing tools built on top of the Apache Hadoop MapReduce framework. The tools used for our experiments are freely available as open-source software, and the paper fully describes the nuances of their implementation.
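
As a language-agnostic illustration of how such a pipeline maps onto MapReduce, the sketch below quantises per-image local features into visual words in the map phase and gathers postings per word in the reduce phase. The `codebook.assign` nearest-centroid quantiser is an assumed interface; this is not the OpenIMAJ/Hadoop tooling itself:

```python
from collections import defaultdict

def map_image(image_id, features, codebook):
    # Mapper sketch: quantise an image's local features into visual
    # words and emit (word, (image_id, count)) pairs.
    counts = defaultdict(int)
    for f in features:
        counts[codebook.assign(f)] += 1
    for word, count in counts.items():
        yield word, (image_id, count)

def reduce_word(word, postings):
    # Reducer sketch: collect all postings for one visual word into a
    # single inverted-index entry.
    yield word, sorted(postings)
```

The framework's shuffle phase groups the emitted pairs by visual word, so each reducer builds one slice of the inverted index independently, which is what allows the indexing stage to scale across a cluster.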

Semantically Tagging Images of Landmarks

Heather S. Packer, Jonathon S. Hare, Sina Samangooei, Paul Lewis
Conference or Workshop Item

PicAlert!: a system for privacy-aware image classification and retrieval

Sergej Zerr, Stefan Siersdorfer, Jonathon Hare
Conference or Workshop Item

Abstract

Photo publishing in Social Networks and other Web 2.0 applications has become very popular due to the pervasive availability of cheap digital cameras, powerful batch upload tools and a huge amount of storage space. A portion of uploaded images are of a highly sensitive nature, disclosing many details of the users’ private lives. We have developed a web service which can detect private images within a user’s photo stream and provide support in making privacy decisions in the sharing context. In addition, we present a privacy-oriented image search application which automatically identifies potentially sensitive images in the result set and separates them from the remaining pictures.

Event Detection using Twitter and Structured Semantic Query Expansion

Heather S. Packer, Sina Samangooei, Jonathon S. Hare, Nicholas Gibbins, Paul Lewis
Conference or Workshop Item

Proceedings of the 1st International Workshop on Knowledge Extraction & Consolidation from Social Media (KECSM-2012), Boston, USA, November 12, 2012

Diana Maynard, Stefan Dietze, Wim Peters, Jonathon Hare
Book

I know what you did last summer! - privacy-aware image classification and search

Sergej Zerr, Stefan Siersdorfer, Jonathon Hare, Elena Demidova
Conference or Workshop Item

ImageTerrier: an extensible platform for scalable high-performance image retrieval

Jonathon Hare, Sina Samangooei, David Dupplaw, Paul H. Lewis
Conference or Workshop Item

OpenIMAJ and ImageTerrier: Java Libraries and Tools for Scalable Multimedia Analysis and Indexing of Images

Jonathan Hare, Sina Samangooei, David Dupplaw
Conference or Workshop Item

Abstract

OpenIMAJ and ImageTerrier are recently released open-source libraries and tools for experimentation and development of multimedia applications using Java-compatible programming languages. OpenIMAJ (the Open toolkit for Intelligent Multimedia Analysis in Java) is a collection of libraries for multimedia analysis. The image libraries contain methods for processing images and extracting state-of-the-art features, including SIFT. The video and audio libraries support both cross-platform capture and processing. The clustering and nearest-neighbour libraries contain efficient, multi-threaded implementations of clustering algorithms. The clustering library makes it possible to easily create BoVW representations for images and videos. OpenIMAJ also incorporates a number of tools to enable extremely large-scale multimedia analysis using distributed computing with Apache Hadoop. ImageTerrier is a scalable, high-performance search engine platform for content-based image retrieval applications using features extracted with the OpenIMAJ library and tools. The ImageTerrier platform provides a comprehensive test-bed for experimenting with image retrieval techniques. The platform incorporates a state-of-the-art implementation of the single-pass indexing technique for constructing inverted indexes and is capable of producing highly compressed index data structures.
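
To make the bag-of-visual-words (BoVW) step concrete, the sketch below shows the usual recipe of clustering a corpus of local descriptors into a codebook and histogramming each image's descriptors against it. OpenIMAJ itself is Java; this Python/scikit-learn version with random stand-in descriptors is only illustrative.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

# Stand-in for SIFT descriptors pooled from a training corpus (128-D each).
train_descriptors = rng.normal(size=(5000, 128))

# Learn a visual-word codebook, then histogram one image's descriptors.
codebook = MiniBatchKMeans(n_clusters=256, n_init=3, random_state=0)
codebook.fit(train_descriptors)

image_descriptors = rng.normal(size=(300, 128))
words = codebook.predict(image_descriptors)
bovw = np.bincount(words, minlength=256)  # the image's BoVW vector
```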

Efficient clustering and quantisation of SIFT features: Exploiting characteristics of the SIFT descriptor and interest region detectors under image inversion

Jonathon Hare, Sina Samangooei, Paul Lewis
Conference or Workshop Item

Abstract

The SIFT keypoint descriptor is a powerful approach to encoding local image description using edge orientation histograms. Through codebook construction via k-means clustering and quantisation of SIFT features we can achieve image retrieval treating images as bags-of-words. Intensity inversion of an image results in distinct SIFT features for the same local image patch in the original and inverted images. Intensity inversion notwithstanding, these two patches are structurally identical. Through careful reordering of the SIFT feature vectors, we can construct the SIFT feature that would have been generated from a non-inverted image patch starting with those extracted from an inverted image patch. Furthermore, through examination of the local feature detection stage, we can estimate whether a given SIFT feature belongs in the space of inverted features or non-inverted features. Therefore we can consistently separate the space of SIFT features into two distinct subspaces. With this knowledge, we can reduce the time complexity of codebook construction via clustering by up to a factor of four, and also reduce the memory consumption of the clustering algorithms, while producing equivalent retrieval results.
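
Assuming the standard SIFT layout of a 4x4 grid of spatial cells with 8 orientation bins each, intensity inversion reverses every gradient vector, i.e. rotates each cell's orientation histogram by 180 degrees while leaving the spatial layout untouched. A plausible reconstruction of the reordering (not necessarily the paper's exact code) is:

```python
import numpy as np

def uninvert_sift(desc):
    """Map a SIFT descriptor from an intensity-inverted patch to the
    descriptor the non-inverted patch would have produced.

    Assumes the standard 128-D layout: 4x4 spatial cells x 8 orientation
    bins. Inverting intensities reverses every gradient, i.e. shifts each
    cell's orientation histogram by 4 of its 8 bins; the spatial layout
    is unchanged.
    """
    cells = np.asarray(desc).reshape(16, 8)
    return np.roll(cells, 4, axis=1).reshape(128)

desc = np.arange(128.0)
# Applying the reordering twice recovers the original descriptor.
assert np.allclose(uninvert_sift(uninvert_sift(desc)), desc)
```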

Analyzing and Predicting Sentiment of Images on the Social Web

Stefan Siersdorfer, Jonathon Hare, Enrico Minack, Fan Deng
Conference or Workshop Item

Abstract

In this paper we study the connection between sentiment of images expressed in metadata and their visual content in the social photo sharing environment Flickr. To this end, we consider the bag-of-visual words representation as well as the color distribution of images, and make use of the SentiWordNet thesaurus to extract numerical values for their sentiment from accompanying textual metadata. We then perform a discriminative feature analysis based on information theoretic methods, and apply machine learning techniques to predict the sentiment of images. Our large-scale empirical study on a set of over half a million Flickr images shows a considerable correlation between sentiment and visual features, and promising results towards estimating the polarity of sentiment in images.
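
SentiWordNet assigns positive and negative scores to words; in the sketch below a toy stand-in lexicon (entries invented for illustration) makes the metadata-scoring step concrete. The learned visual model is omitted.

```python
# Toy stand-in for SentiWordNet: (positive, negative) scores per word.
LEXICON = {
    "beautiful": (0.75, 0.0),
    "happy":     (0.80, 0.0),
    "broken":    (0.00, 0.60),
    "sad":       (0.00, 0.75),
}

def polarity(tags):
    """Net sentiment of an image's textual metadata: sum(pos) - sum(neg)."""
    pos = sum(LEXICON.get(t, (0, 0))[0] for t in tags)
    neg = sum(LEXICON.get(t, (0, 0))[1] for t in tags)
    return pos - neg

print(polarity(["beautiful", "sunset"]))   #  0.75 -> positive
print(polarity(["broken", "sad", "car"]))  # -1.35 -> negative
```

Polarity values of this kind can then serve as (noisy) training labels for a classifier over bag-of-visual-words and colour features.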

Image and Collateral Text in Support of Auto-annotation and Sentiment Analysis

Pamela Zontone, Giulia Boato, Jonathon Hare, Paul Lewis, Stefan Siersdorfer, Enrico Minack
Conference or Workshop Item

Abstract

We present a brief overview of the way in which image analysis, coupled with associated collateral text, is being used for auto-annotation and sentiment analysis. In particular, we describe our approach to auto-annotation using the graph- theoretic dominant set clustering algorithm and the annotation of images with sentiment scores from SentiWordNet. Preliminary results are given for both, and our planned work aims to explore synergies between the two approaches.

Automatically Annotating the MIR Flickr Dataset: Experimental Protocols, Openly Available Data and Semantic Spaces

Jonathan Hare, Paul Lewis
Conference or Workshop Item

Abstract

The availability of a large, freely redistributable set of high-quality annotated images is critical to allowing researchers in the area of automatic annotation, generic object recognition and concept detection to compare results. The recent introduction of the MIR Flickr dataset allows researchers such access. A dataset by itself is not enough, and a set of repeatable guidelines for performing evaluations that are comparable is required. In many cases it is also useful to compare the machine-learning components of different automatic annotation techniques using a common set of image features. This paper seeks to provide a solid, repeatable methodology and protocol for performing evaluations of automatic annotation software using the MIR Flickr dataset together with freely available tools for measuring performance in a controlled manner. This protocol is demonstrated through a set of experiments using a “semantic space” auto-annotator previously developed by the authors, in combination with a set of visual term features for the images that has been made publicly available for download. The paper also discusses how much training data is required to train the semantic space annotator with the MIR Flickr dataset. It is the hope of the authors that researchers will adopt this methodology and produce results from their own annotators that can be directly compared to those presented in this work.
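
Repeatable evaluations of ranked annotation output commonly report (mean) average precision; a minimal reference implementation, not necessarily the exact measure mandated by the paper's protocol, is:

```python
def average_precision(ranked_ids, relevant_ids):
    """AP of a ranked result list against a ground-truth relevant set."""
    relevant_ids = set(relevant_ids)
    hits, precision_sum = 0, 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

print(average_precision(["a", "b", "c", "d"], {"a", "c"}))  # 0.8333...
```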

Semantic Retrieval and Automatic Annotation: Linear Transformations, Correlation and Semantic Spaces

Jonathan Hare, Paul Lewis
Conference or Workshop Item

Abstract

This paper proposes a new technique for auto-annotation and semantic retrieval based upon the idea of linearly mapping an image feature space to a keyword space. The new technique is compared to several related techniques, and a number of salient points about each of the techniques are discussed and contrasted. The paper also discusses how these techniques might actually scale to a real-world retrieval problem, and demonstrates this through a case study of a semantic retrieval technique being used on a real-world data-set (with a mix of annotated and unannotated images) from a picture library.
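
The core idea fits in a few lines: learn the matrix W that best maps training image features F onto their keyword indicator matrix K in the least-squares sense, then score keywords for an unannotated image by applying W to its features. A schematic sketch with random stand-in data and hypothetical dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

F = rng.normal(size=(500, 100))          # image features (n x d)
K = (rng.random((500, 20)) < 0.1) * 1.0  # keyword indicators (n x v)

# Least-squares linear map from feature space to keyword space:
# W = argmin_W ||F W - K||_F
W, *_ = np.linalg.lstsq(F, K, rcond=None)

query = rng.normal(size=(1, 100))        # an unannotated image
keyword_scores = query @ W               # rank keywords by these scores
top = np.argsort(keyword_scores[0])[::-1][:5]
```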

Image diversity analysis: context, opinion and bias

Pamela Zontone, Giulia Boato, F. G. B. De Natale, Alessia De Rosa, Mauro Barni, Alessandro Piva, Jonathan Hare, David Dupplaw, Paul Lewis
Conference or Workshop Item

Abstract

The diffusion of new Internet and web technologies has increased the distribution of different digital content, such as text, sounds, images and videos. In this paper we focus on images and their role in the analysis of diversity. We consider diversity as a concept that takes into account the wide variety of information sources, and their differences in perspective and viewpoint. We describe a number of different dimensions of diversity; in particular, we analyze the dimensions related to image searches and context analysis, emotions conveyed by images and opinion mining, and bias analysis.

IAM@ImageCLEFPhotoAnnotation 2009: Naïve application of a linear-algebraic semantic space

Jonathan Hare, Paul Lewis, Francesca Borri, Alessandro Nardi, Carol Peters
Conference or Workshop Item

Abstract

This paper describes Southampton's submissions to the 2009 ImageCLEF photo annotation task. For the task we used an annotation system based on the idea of constructing semantic spaces, which was developed previously at Southampton. To represent the image content, we used a combination of different SIFT and Colour-SIFT features detected using the difference-of-Gaussian and MSER techniques. These features were converted into a visual term representation by applying vector quantisation using a codebook learnt from a hierarchical k-means clustering. In terms of EER and AUC, the annotator performs reasonably well; however, it struggles when evaluated using the hierarchical measure proposed for the task, due to the way the annotation confidences are thresholded.

IAM@ImageCLEFphoto 2009: Experiments on Maximising Diversity using Image Features

Jonathan Hare, David Dupplaw, Paul Lewis, Francesca Borri, Alessandro Nardi, Carol Peters
Conference or Workshop Item

Abstract

This paper describes the diversity-enabled retrieval system constructed at Southampton for the ImageCLEFphoto 2009 task. The retrieval system used Terrier as the underlying textual indexing and retrieval system, and combined it with a technique for re-ranking the results by maximising the visual dissimilarity of retrieved images. The results show that our visual re-ranking method does indeed increase the diversity in the top results; however, at the same time it causes a slight drop in precision. The text-based approach designed for handling the 'part 1 topics' of the task is also shown to perform very well.

Designing authoring tools for the creation of on-line behavioural interventions

Adrian Osmond, Jonathan Hare, Joseph Price, Ashley Smith, Mark Weal, Gary Wills, Yang Yang, Lucy Yardley, David De Roure
Conference or Workshop Item

Abstract

Behavioural interventions are used by social scientists to effect change in a person’s behaviour. The LifeGuide project is developing tools to enable the easy creation, deployment and trialling of Internet-based behavioural interventions. The use of on-line behavioural interventions is appealing as it can be more cost effective than face-to-face interventions, can deliver tailored advice at times that suit the participants, and can provide detailed statistical information that can be used to better understand behaviour or demonstrate the efficacy of the interventions themselves. The problem, however, is that developing on-line interventions is a complex, time-consuming task that often has involved high levels of specialist computing support in construction and delivery. The LifeGuide project is looking to put tools into the hands of domain specialists (psychologists, social scientists, health professionals, etc.) that enable them to easily construct their own behavioural interventions and deploy them on the Internet. This paper looks at the authoring tools currently being developed by the project, assesses their usability through case studies of interventions developed so far, and suggests where the project will look in the future to continue to improve the tools to meet the needs of the wide range of intervention authors.

Delivery of QTIv2 question types

Gary Wills, Hugh Davis, Lester Gilbert, Jonathon Hare, Yvonne Howard, Steve Jeyes, David Millard, Robert Sherratt
Article

Abstract

The IMS Question and Test Interoperability (QTI) standard identifies sixteen different question types which may be used in on-line assessment. While some partial implementations exist, the R2Q2 project has developed a complete solution that renders and responds to all sixteen question types as specified. In addition, care has been taken in the R2Q2 project to ensure that the solution produced will allow for future changes in the specification. The design of R2Q2 is described, the focus being on lessons learnt. We describe the architecture and the rationale of the internal Web services and explain the approach taken in implementing the QTI specification, showing how the design allows for future tags to be added with minimal programming effort. The QTI standard has not had a great take-up, in part due to the lack of tools. In the 2006 JISC Capital Programme, three Assessment projects were commissioned: item authoring, item banking, and QTI-compliant test delivery. This paper describes the ‘ASDEL’ test delivery engine, focusing upon its architecture, its relation to the item authoring and item banking services, and the integration of the R2Q2 Web service.

Application of the LifeGuide: the development and quantitative analysis of the 'Internet Doctor'

J.A. Joseph, Lucy Yardley, J. Hare, A Osmond, Yang Yang, Mark J. Weal, Gary B. Wills, S Michie
Conference or Workshop Item

Abstract

LifeGuide is a software package that allows health professionals and researchers with no programming skills to easily and flexibly create, evaluate and modify behavioural interventions. An intervention called the ‘Internet Doctor’ was developed as a way of identifying many of the tools that were required in LifeGuide. The ‘Internet Doctor’ provides people suffering from cold and flu symptoms with tailored advice for the self-care of cold and flu symptoms. Participants were automatically randomised to one of two versions of the website: (i) the full, ‘more interactive’ version, or (ii) a ‘less interactive’ version which omitted references to the Internet Doctor and links to obtain further information. Participants who viewed the less interactive version were more likely to complete the full consultation cycle for their selected symptom and were also more likely to consult for more symptoms than those in the more interactive version. Few participants clicked on the optional links in the more interactive version. It is concluded that although the more interactive version of the website provided more information, participants did not make full use of the interactive features which displayed this information, and did not consult for as many symptoms, so may not have benefited from the website as much as those viewing the less interactive version.

Introduction to the LifeGuide: software facilitating the development of interactive behaviour change internet interventions

Lucy Yardley, Adrian Osmond, Jonathan Hare, Gary Wills, Mark Weal, David De Roure, Susan Michie
Conference or Workshop Item

Abstract

We are developing a set of software resources named ‘the LifeGuide’ that will enable researchers to collaboratively create, evaluate and modify two central dimensions of behavioural interventions: a) providing tailored advice; b) supporting sustained behaviour.

LifeGuide: a platform for performing web-based behavioural interventions

Jonathan Hare, Adrian Osmond, Yang Yang, Gary Wills, Mark Weal, David De Roure, Judith Joseph, Lucy Yardley
Conference or Workshop Item

Abstract

Behavioural interventions are a technique used by social scientists and health professionals to mediate the behaviour of a subject. Traditionally, interventions take the form of tailored advice given in a face-to-face setting. Internet-based behavioural interventions harness the power of the web to deliver tailored advice to participants at the time that most suits them. The LifeGuide project is a multidisciplinary collaboration with the aim of developing and proving a set of software tools for the development and deployment of internet-based behavioural interventions. The tools developed in LifeGuide cover the complete lifecycle of an intervention, from initial authoring to trialling and refinement to final deployment. Looking ahead, in the longer term we intend to investigate how the LifeGuide toolset can be applied to other domains.

Assessment delivery engine for QTIv2 tests

Gary Wills, Jonathan Hare, Jiri Kajaba, David Argles, Lester Gilbert, David Millard
Conference or Workshop Item

Abstract

The IMS Question and Test Interoperability (QTI) standard has not had a great take-up, in part due to the lack of tools. In the 2006 JISC Capital Programme, three Assessment projects were commissioned: item authoring, item banking, and QTI-compliant test delivery. This paper describes the ‘ASDEL’ test delivery engine, focusing upon its architecture, its relation to the item authoring and item banking services, and the integration of the R2Q2 Web service. The project first developed a Java library to implement the system. This will allow other developers and researchers to build their own system or take aspects of QTI they want to implement.

Semantic spaces revisited: investigating the performance of auto-annotation and semantic retrieval using semantic spaces

Jonathan Hare, Sina Samangooei, Paul Lewis, Mark Nixon
Conference or Workshop Item

Abstract

Semantic spaces encode similarity relationships between objects as a function of position in a mathematical space. This paper discusses three different formulations for building semantic spaces which allow the automatic-annotation and semantic retrieval of images. The models discussed in this paper require that the image content be described in the form of a series of visual-terms, rather than as a continuous feature-vector. The paper also discusses how these term-based models compare to the latest state-of-the-art continuous feature models for auto-annotation and retrieval.

A delivery engine for QTI assessments

Gary Wills, Jonathan Hare, Jiri Kajaba, David Argles, Lester Gilbert, David Millard
Article

Abstract

The IMS Question and Test Interoperability (QTI) standard has had a restricted take-up, in part due to the lack of tools. This paper describes the ‘ASDEL’ test delivery engine, focusing upon its architecture, its relation to item authoring and item banking services, and the integration of the R2Q2 web service. The tools developed operate with a web client, as a plug-in to Moodle, or as a desktop application. The paper also reports on the load testing of the internal services and concludes that these are best represented as components. The project first developed a Java library to implement the system. This will allow other developers and researchers to build their own system or incorporate aspects of QTI they want to implement.

Assessment Delivery Engine for QTIv2 Tests

Gary Wills, Lester Gilbert, Jonathan Hare, Jiri Kajaba, David Argles, David Millard
Conference or Workshop Item

Abstract

The IMS Question and Test Interoperability (QTI) standard has not had a great take-up, in part due to the lack of tools. This paper describes the ‘ASDEL’ test delivery engine, focusing upon its architecture, its relation to the item authoring and item banking services, and the integration of the R2Q2 Web service. The project first developed a Java library to implement the system. This will allow other developers and researchers to build their own system or take aspects of QTI they want to implement.

Giving order to image queries

Jonathan Hare, Patrick Sinclair, Paul Lewis, Kirk Martinez, Theo Gevers, Ramesh Jain, Simone Santini
Conference or Workshop Item

Abstract

Users of image retrieval systems often find it frustrating that the image they are looking for is not ranked near the top of the results they are presented with. This paper presents a computational approach for ranking keyworded images in order of relevance to a given keyword. Our approach uses machine learning to attempt to learn what visual features within an image are most related to the keywords, and then provide ranking based on similarity to a visual aggregate. To evaluate the technique, a Web 2.0 application has been developed to obtain a corpus of user-generated ranking information for a given image collection that can be used to evaluate the performance of the ranking algorithm.
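
One plausible reading of ranking "based on similarity to a visual aggregate" is sketched below: take the centroid of the feature vectors of images already carrying the keyword as the aggregate, and rank candidates by cosine similarity to it. This is an assumption made for illustration, not the paper's exact model.

```python
import numpy as np

def rank_for_keyword(candidate_feats, tagged_feats):
    """Rank candidate images for a keyword by cosine similarity to the
    centroid ('visual aggregate') of images already tagged with it."""
    centroid = tagged_feats.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    c = candidate_feats / np.linalg.norm(candidate_feats, axis=1,
                                         keepdims=True)
    sims = c @ centroid
    return np.argsort(sims)[::-1]  # best-matching candidates first
```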

MapSnapper: Engineering an Efficient Algorithm for Matching Images of Maps from Mobile Phones

Jonathan Hare, Paul Lewis, Layla Gordon, Glenn Hart, Theo Gevers, Ramesh Jain, Simone Santini
Conference or Workshop Item

Abstract

The MapSnapper project aimed to develop a system for robust matching of low-quality images of a paper map taken from a mobile phone against a high quality digital raster representation of the same map. The paper presents a novel methodology for performing content-based image retrieval and object recognition from query images that have been degraded by noise and subjected to transformations through the imaging system. In addition the paper also provides an insight into the evaluation-driven development process that was used to incrementally improve the matching performance until the design specifications were met.

Facing the reality of semantic image retrieval

Peter G. B. Enser, Christine J. Sandom, Jonathon S. Hare, Paul H. Lewis
Article

Abstract

Purpose – To provide a better-informed view of the extent of the semantic gap in image retrieval, and the limited potential for bridging it offered by current semantic image retrieval techniques.
Design/methodology/approach – Within an ongoing project, a broad spectrum of operational image retrieval activity has been surveyed, and, from a number of collaborating institutions, a test collection assembled which comprises user requests, the images selected in response to those requests, and their associated metadata. This has provided the evidence base upon which to make informed observations on the efficacy of cutting-edge automatic annotation techniques which seek to integrate the text-based and content-based image retrieval paradigms.
Findings – Evidence from the real-world practice of image retrieval highlights the existence of a generic-specific continuum of object identification, and the incidence of temporal, spatial, significance and abstract concept facets, manifest in textual indexing and real-query scenarios but often having no directly visible presence in an image. These factors combine to limit the functionality of current semantic image retrieval techniques, which interpret only visible features at the generic extremity of the generic-specific continuum.
Research limitations/implications – The project is concerned with the traditional image retrieval environment in which retrieval transactions are conducted on still images which form part of managed collections. The possibilities offered by ontological support for adding functionality to automatic annotation techniques are considered.
Originality/value – The paper offers fresh insights into the challenge of migrating content-based image retrieval from the laboratory to the operational environment, informed by newly-assembled, comprehensive, live data.

Bridging the Semantic Gap in Visual Information Retrieval: End of Project Report

Christine Sandom, Jonathan Hare, Peter Enser, Paul Lewis
Monograph

Delivery of QTIv2 Question Types

Gary Wills, Hugh Davis, Lester Gilbert, Jonathon Hare, Yvonne Howard, Steve Jeyes, David Millard, Robert Sherratt
Conference or Workshop Item

Abstract

The QTI standard identifies sixteen different question types which may be used in on-line assessment. While some partial implementations exist, the R2Q2 project has developed a complete solution that renders and responds to all sixteen question types as specified. In addition, care has been taken in the R2Q2 project to ensure that the solution produced will allow for future changes in the specification. The paper summarises the rationale of Web services and a Service Oriented Architecture, and then demonstrates how the R2Q2 project integrates into JISC’s e-Framework, and the reference model for assessment (FREMA). The design of R2Q2 is described, the focus being on lessons learnt. We describe the architecture and the rationale of the internal Web services and explain the approach taken in implementing the QTI specification, showing how the design allows for future tags to be added with minimal programming effort. A major objective of the design was to solve the problem of having to undertake a major redesign and reimplementation as a result of minor modifications to the specification. In the 2006 Capital Programme from JISC, three new projects were commissioned in the area of Assessment: one for authoring of items, one for item banking, and one for a complete test engine as described in the QTI specification. The R2Q2 Web service is at the heart of all three projects and this paper will describe how the R2Q2 Web service will be used.

How to spot a Dalmatian in a pack of Dogs; A data-driven approach to searching unannotated images using natural language

Jonathon S. Hare, Paul H. Lewis, Peter G. B. Enser, Christine J. Sandom
Conference or Workshop Item

Abstract

This poster demonstrates our recent work in the field of intelligent image retrieval in response to real requests from the practitioner domain. The poster shows how we are developing a data-driven 'semantic space' framework for information retrieval which can enable retrieval of unannotated imagery through natural language queries, and also facilitate automatic annotation of imagery.

Semantic Facets: An in-depth Analysis of a Semantic Image Retrieval System

Jonathon S. Hare, Paul H. Lewis, Peter G. B. Enser, Christine J. Sandom
Conference or Workshop Item

Abstract

This paper introduces a faceted model of image semantics which attempts to express the richness of semantic content interpretable within an image. Using a large image data-set from a museum collection the paper shows how the facet representation can be applied. The second half of the paper describes our semantic retrieval system, and demonstrates its use with the museum image collection. A retrieval evaluation is performed using the system to investigate how the retrieval performance varies with respect to each of the facet categories. A number of factors related to the image data-set that affect the quality of retrieval are also discussed.

Saliency for Image Description and Retrieval

Jonathon S. Hare
Thesis

Abstract

We live in a world where we are surrounded by ever increasing numbers of images. More often than not, these images have very little metadata by which they can be indexed and searched. In order to avoid information overload, techniques need to be developed to enable these image collections to be searched by their content. Much of the previous work on image retrieval has used global features such as colour and texture to describe the content of the image. However, these global features are insufficient to accurately describe the image content when different parts of the image have different characteristics. This thesis initially discusses how this problem can be circumvented by using salient interest regions to select the areas of the image that are most interesting and generate local descriptors to describe the image characteristics in that region. The thesis discusses a number of different saliency detectors that are suitable for robust retrieval purposes and performs a comparison between a number of these region detectors. The thesis then discusses how salient regions can be used for image retrieval using a number of techniques, but most importantly, two techniques inspired from the field of textual information retrieval. Using these robust retrieval techniques, a new paradigm in image retrieval is discussed, whereby the retrieval takes place on a mobile device using a query image captured by a built-in camera. This paradigm is demonstrated in the context of an art gallery, in which the device can be used to find more information about particular images. The final chapter of the thesis discusses some approaches to bridging the semantic gap in image retrieval. The chapter explores ways in which un-annotated image collections can be searched by keyword. Two techniques are discussed; the first explicitly attempts to automatically annotate the un-annotated images so that the automatically applied annotations can be used for searching. The second approach does not try to explicitly annotate images, but rather, through the use of linear algebra, it attempts to create a semantic space in which images and keywords are positioned such that images are close to the keywords that represent them within the space.

Ambient Gestures

Maria Karam, Jonathon Hare, Paul Lewis, m.c. schraefel
Monograph

Abstract

We present Ambient Gestures, a novel gesture-based system designed to support ubiquitous ‘in the environment’ interactions with everyday computing technology. Hand gestures and audio feedback allow users to control computer applications without reliance on a graphical user interface, and without having to switch from the context of a non-computer task to the context of the computer. The Ambient Gestures system is composed of a vision recognition software application, a set of gestures to be processed by a scripting application and a navigation and selection application that is controlled by the gestures. This system allows us to explore gestures as the primary means of interaction within a multimodal, multimedia environment. In this paper we describe the Ambient Gestures system, define the gestures and the interactions that can be achieved in this environment and present a formative study of the system. We conclude with a discussion of our findings and future applications of Ambient Gestures in ubiquitous computing.

A Linear-Algebraic Technique with an Application in Semantic Image Retrieval

Jonathon S. Hare, Paul H. Lewis, Peter G. B. Enser, Christine J. Sandom, Hari Sundaram, Milind Naphade, John R. Smith, Yong Rui
Article

Abstract

This paper presents a novel technique for learning the underlying structure that links visual observations with semantics. The technique, inspired by a text-retrieval technique known as cross-language latent semantic indexing, uses linear algebra to learn the semantic structure linking image features and keywords from a training set of annotated images. This structure can then be applied to unannotated images, thus providing the ability to search the unannotated images by keyword. This factorisation approach is shown to perform well, even when using only simple global image features.
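
Schematically, the cross-language LSI idea transfers to images as follows: stack each training image's visual-term counts and keyword counts into one joint term vector, take a truncated SVD of the resulting matrix to obtain a low-rank semantic basis, then fold in an unannotated image with its keyword half zeroed and score keywords by their alignment in the space. The sketch below (random stand-in data, hypothetical dimensions) illustrates the mechanics rather than the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.random((400, 300))               # visual-term counts (images x terms)
K = (rng.random((400, 25)) < 0.1) * 1.0  # keyword counts    (images x words)

# Joint term-document matrix: every training image is described by
# its visual terms AND its keywords.
X = np.hstack([F, K])

# Truncated SVD gives a low-rank 'semantic space'.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = 40
basis = Vt[:r]                           # (r, 300 + 25)

# Fold an unannotated image in with its keyword half zeroed...
query = np.concatenate([rng.random(300), np.zeros(25)])
q = basis @ query                        # position in the semantic space

# ...and score keywords by where each keyword axis sits in the space.
keyword_positions = basis[:, 300:]       # (r, 25)
scores = q @ keyword_positions
top = np.argsort(scores)[::-1][:5]
```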

Mind the Gap: Another look at the problem of the semantic gap in image retrieval

Jonathon S. Hare, Paul H. Lewis, Peter G. B. Enser, Christine J. Sandom, Edward Y. Chang, Alan Hanjalic, Nicu Sebe
Conference or Workshop Item

Abstract

This paper attempts to review and characterise the problem of the semantic gap in image retrieval and the attempts being made to bridge it. In particular, we draw from our own experience in user queries, automatic annotation and ontological techniques. The first section of the paper describes a characterisation of the semantic gap as a hierarchy between the raw media and full semantic understanding of the media's content. The second section discusses real users' queries with respect to the semantic gap. The final sections of the paper describe our own experience in attempting to bridge the semantic gap. In particular we discuss our work on auto-annotation and semantic-space models of image retrieval in order to bridge the gap from the bottom up, and the use of ontologies, which capture more semantics than keyword object labels alone, as a technique for bridging the gap from the top down.

Image Auto-annotation using a Statistical Model with Salient Regions

Jiayu Tang, Jonathon S. Hare, Paul H. Lewis
Conference or Workshop Item

Abstract

Traditionally, statistical models for image auto-annotation have been coupled with image segmentation. Considering the performance of current segmentation algorithms, it can be advantageous to avoid a segmentation stage. In this paper, we propose a new approach to image auto-annotation using statistical models. In this approach, segmentation is avoided through the use of salient regions. The use of the statistical model results in an annotation performance which improves upon our previously proposed saliency-based word propagation technique. We also show that the use of salient regions achieves better results than the use of general image regions or segments.

Bridging the Semantic Gap in Multimedia Information Retrieval: Top-down and Bottom-up approaches

Jonathon S. Hare, Patrick A. S. Sinclair, Paul H. Lewis, Kirk Martinez, Peter G.B. Enser, Christine J. Sandom, Paolo Bouquet, Roberto Brunelli, Jean-Pierre Chanod, Claudia Niederée, Heiko Stoermer
Conference or Workshop Item

Abstract

Semantic representation of multimedia information is vital for enabling the kind of multimedia search capabilities that professional searchers require. Manual annotation is often not possible because of the sheer scale of the multimedia information that needs indexing. This paper explores the ways in which we are using both top-down, ontologically driven approaches and bottom-up, automatic-annotation approaches to provide retrieval facilities to users. We also discuss many of the current techniques that we are investigating to combine these top-down and bottom-up approaches.

The Reality of the Semantic Gap in Image Retrieval

Peter G. B. Enser, Christine J. Sandom, Paul H. Lewis, Jonathon S. Hare
Conference or Workshop Item

iGesture: A Platform for Investigating Multimodal, Multimedia Gesture-based Interactions

Jonathon Hare, Maria Karam, Paul Lewis, m.c. schraefel
Monograph

Abstract

This paper introduces the iGesture platform for investigating multimodal gesture based interactions in multimedia contexts. iGesture is a low-cost, extensible system that uses visual recognition of hand movements to support gesture-based input. Computer vision techniques support gesture based interactions that are lightweight, with minimal interaction constraints. The system enables gestures to be carried out 'in the environment' at a distance from the camera, enabling multimodal interaction in a naturalistic, transparent manner in a ubiquitous computing environment. The iGesture system can also be rapidly scripted to enable gesture-based input with a wide variety of applications. In this paper we present the technology behind the iGesture software, and a performance evaluation of the gesture recognition subsystem. We also present two exemplar multimedia application contexts which we are using to explore ambient gesture-based interactions.

On Image Retrieval using Salient Regions with Vector-Spaces and Latent Semantics

Jonathon S. Hare, Paul H. Lewis, Wee-Kheng Leow, Michael S. Lew, Tat-Seng Chua, Wei-Ying Ma, Lekha Chaisorn, Erwin M. Bakker
Article

Abstract

The vector-space retrieval model and Latent Semantic Indexing approaches to retrieval have been used heavily in the field of text information retrieval over the past years. The use of these approaches in image retrieval, however, has been somewhat limited. In this paper, we present methods for using these techniques in combination with an invariant image representation based on local descriptors of salient regions. The paper also presents an evaluation in which the two techniques are used to find images with similar semantic labels.
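
A minimal sketch of the vector-space model applied to visual terms: tf-idf weight the (image x visual-term) count matrix, then rank indexed images against a query image by cosine similarity. This is the textbook formulation, not necessarily the exact weighting scheme used in the paper.

```python
import numpy as np

def fit_idf(counts):
    """Inverse document frequency learnt from the indexed images only."""
    df = (counts > 0).sum(axis=0)
    return np.log((1 + counts.shape[0]) / (1 + df)) + 1

def tfidf(counts, idf):
    tf = counts / np.maximum(counts.sum(axis=-1, keepdims=True), 1)
    return tf * idf

def retrieve(index, query):
    """index: (n_images, n_terms) BoVW counts; query: (n_terms,) counts."""
    idf = fit_idf(index)
    docs = tfidf(index, idf)
    q = tfidf(query[None, :], idf)[0]
    sims = docs @ q / (np.linalg.norm(docs, axis=1)
                       * np.linalg.norm(q) + 1e-12)
    return np.argsort(sims)[::-1]  # most similar images first
```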

Saliency-based Models of Image Content and their Application to Auto-Annotation by Semantic Propagation

Jonathon S. Hare, Paul H. Lewis
Conference or Workshop Item

Abstract

In this paper, we propose a model of automatic image annotation based on propagation of keywords. The model works on the premise that visually similar image content is likely to have similar semantic content. Image content is extracted by computing local descriptors at salient points within the image and quantising the feature-vectors into visual terms. The visual terms for each image are modelled using techniques taken from the information retrieval community. The modelled information from an unlabelled query image is compared to the models of a corpus of labelled images and labels are propagated from the most similar labelled images to the query image.
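
A minimal sketch of the propagation step under these assumptions: rank the labelled corpus by cosine similarity of the modelled visual-term vectors to the query, then pool the keywords of the k nearest images, weighted by similarity. The weighting scheme is illustrative rather than the paper's exact model.

```python
import numpy as np
from collections import Counter

def propagate_labels(query_vec, corpus_vecs, corpus_labels, k=5):
    """Propagate keywords from the k labelled images whose visual-term
    vectors are most similar to the query, weighted by similarity."""
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    sims = c @ q
    votes = Counter()
    for i in np.argsort(sims)[::-1][:k]:
        for label in corpus_labels[i]:
            votes[label] += sims[i]
    return votes.most_common()  # keywords ranked by pooled similarity
```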

Content-based image retrieval using a mobile device as a novel interface

Jonathon S. Hare, Paul H. Lewis, Rainer W. Lienhart, Noburu Babaguchi, Edward Y. Chang
Conference or Workshop Item

Abstract

This paper presents an investigation into the use of a mobile device as a novel interface to a content-based image retrieval system. The initial development has been based on the concept of using the mobile device in an art gallery for mining data about the exhibits, although a number of other applications are envisaged. The paper presents a novel methodology for performing content-based image retrieval and object recognition from query images that have been degraded by noise and subjected to transformations through the imaging system. The methodology uses techniques inspired from the information retrieval community in order to aid efficient indexing and retrieval. In particular, a vector-space model is used in the efficient indexing of each image, and a two-stage pruning/ranking procedure is used to determine the correct matching image. The retrieval algorithm is shown to outperform a number of existing algorithms when used with query images from the mobile device.

Salient Regions for Query by Image Content

Jonathon S. Hare, Paul H. Lewis, Peter Enser, Yiannis Kompatsiaris, Noel E. O'Connor
Article

Abstract

Much previous work on image retrieval has used global features such as colour and texture to describe the content of the image. However, these global features are insufficient to accurately describe the image content when different parts of the image have different characteristics. This paper discusses how this problem can be circumvented by using salient interest points and compares and contrasts an extension to previous work in which the concept of scale is incorporated into the selection of salient regions to select the areas of the image that are most interesting and generate local descriptors to describe the image characteristics in that region. The paper describes and contrasts two such salient region descriptors and compares them through their repeatability rate under a range of common image transforms. Finally, the paper goes on to investigate the performance of one of the salient region detectors in an image retrieval situation.

Scale Saliency: Applications in Visual Matching, Tracking and View-Based Object Recognition

Jonathon S. Hare, Paul H. Lewis
Conference or Workshop Item

Abstract

In this paper, we introduce a novel technique for image matching and feature-based tracking. The technique is based on the idea of using the Scale-Saliency algorithm to pick a sparse number of ‘interesting’ or ‘salient’ features. Feature vectors for each of the salient regions are generated and used in the matching process. Due to the nature of the sparse representation of feature vectors generated by the technique, sub-image matching is also accomplished. We demonstrate the technique's robustness to geometric transformations in the query image and suggest that the technique would be suitable for view-based object recognition. We also apply the matching technique to the problem of feature tracking across multiple video frames by matching salient regions across frame pairs. We show that our tracking algorithm is able to explicitly extract the 3D motion vector of each salient region during the tracking process, using a single uncalibrated camera. We illustrate the functionality of our tracking algorithm by showing results from tracking a single salient region in near real-time with a live camera input.
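
As a sketch of the frame-pair matching step: match each salient region in frame A to its nearest-neighbour descriptor in frame B, keep only unambiguous matches (here via a Lowe-style ratio test, which is an assumption rather than the paper's stated criterion), and read off each matched region's 2D displacement. Recovering the full 3D motion vector, as the paper describes, is not reproduced here.

```python
import numpy as np

def match_regions(desc_a, desc_b, pos_a, pos_b, ratio=0.8):
    """Match salient regions in frame A to frame B.

    desc_a, desc_b : (n, d) region descriptors for the two frames
    pos_a, pos_b   : (n, 2) region centre coordinates
    Returns (index_in_A, index_in_B, 2-D motion vector) per match.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, j2 = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[j2]:  # keep only unambiguous matches
            matches.append((i, j, pos_b[j] - pos_a[i]))
    return matches
```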