The Web Data Research Assistant (aka WebDataRA)

WebDataRA is a tool developed by Prof Leslie Carr of the Web and Internet Science research group to support researchers using Web-sourced data. Its functionality focuses on Twitter, but it also works for Facebook, Google, Google Scholar, SSRN, GitHub, Quora, ACM DL, Core and (since June 2020) Parler. The ultimate aim is to make advanced network analysis and textual analysis methods accessible to social science researchers and other non-programmers, especially those who work in an interdisciplinary context.

To install the software, please go to the Chrome Web Store in your Chrome browser. For support please contact Leslie Carr or EPrints Services. If you use it in your research, please acknowledge the Web Science Institute of the University of Southampton in any publications.

Operation

This Chrome extension takes information from search results pages and makes it available in an accessible spreadsheet form, summarising key components (title, contents, date, author, etc.). The software understands the layout of Twitter and Facebook timeline pages (plus the search results pages of Google, Google Scholar, GitHub, SSRN and Core) and extracts information as appropriate. The information is saved as an HTML file, which can subsequently be opened directly as a spreadsheet in Excel.
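As an illustration of why an HTML file can double as a spreadsheet, the sketch below serialises a few extracted records as a single HTML table, which Excel can open as rows and columns. It is not the extension's actual code: the ResultRow shape and the function names escapeHtml and rowsToHtmlTable are hypothetical, chosen only to mirror the components listed above.

```typescript
// Hypothetical record shape (an assumption, not WebDataRA's real data model);
// the fields mirror the key components named in the text.
interface ResultRow {
  title: string;
  contents: string;
  date: string;
  author: string;
}

// Escape characters that would otherwise be interpreted as HTML markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialise the rows as a standalone HTML document holding one table,
// with a header row followed by one table row per record.
function rowsToHtmlTable(rows: ResultRow[]): string {
  const columns: (keyof ResultRow)[] = ["title", "contents", "date", "author"];
  const header =
    "<tr>" + columns.map((c) => "<th>" + c + "</th>").join("") + "</tr>";
  const body = rows
    .map(
      (row) =>
        "<tr>" +
        columns.map((c) => "<td>" + escapeHtml(row[c]) + "</td>").join("") +
        "</tr>"
    )
    .join("\n");
  return [
    "<!DOCTYPE html>",
    '<html><head><meta charset="utf-8"></head><body>',
    "<table>",
    header,
    body,
    "</table>",
    "</body></html>",
  ].join("\n");
}
```

Escaping each cell matters here because tweet or post text routinely contains characters such as & and <, which would otherwise corrupt the table.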

On Twitter and Facebook pages, the software continuously scrolls to the bottom of the page, which triggers the server to send more data and allows all available results to be gathered.

The software requires keypresses in the main browser window to trigger its operation. Press Shift-Ctrl-A to start collecting data from the page. If you are on Twitter or Facebook, this keypress also starts the browser scrolling downwards to trigger more data collection as normal. Press Shift-Ctrl-H to halt the data collection and save the data to a file. Press Shift-Ctrl-Q to check progress.
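The three shortcuts above amount to a small mapping from key combinations to commands. The following is a minimal sketch of such a mapping, not the extension's own implementation; the Command type, the KeyPress interface and the function commandForKeyPress are hypothetical names introduced for illustration.

```typescript
// Hypothetical command names for the three documented shortcuts.
type Command = "start" | "halt" | "progress";

// Only the keyboard-event fields this sketch needs, so the function
// can be exercised without a browser. A DOM KeyboardEvent has these fields.
interface KeyPress {
  key: string;
  shiftKey: boolean;
  ctrlKey: boolean;
}

// Map Shift-Ctrl-A / Shift-Ctrl-H / Shift-Ctrl-Q to a command;
// any other key combination yields null and is ignored.
function commandForKeyPress(e: KeyPress): Command | null {
  if (!e.shiftKey || !e.ctrlKey) {
    return null;
  }
  switch (e.key.toUpperCase()) {
    case "A":
      return "start"; // begin collecting data from the page
    case "H":
      return "halt"; // stop collecting and save the data to a file
    case "Q":
      return "progress"; // report progress so far
    default:
      return null;
  }
}
```

In a page script, a function like this could be called from a keydown listener registered with document.addEventListener, passing the event straight through, since a KeyboardEvent carries the key, shiftKey and ctrlKey properties used here.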

Documentation

The following training resources are available (but need rewriting for version 2):

Misc Notes

More information is available about forthcoming Twitter capabilities.

Google results have a simple classification applied to the hosts. This is currently under development.

Before version 3, large data sets (around 10,000 tweets) caused the browser to run slowly. It could take tens of seconds to respond to clicks on the WebDataRA window, especially if a click meant recalculating the table. From version 3, data sets of this size should be much more manageable.