{"id":462,"date":"2019-02-22T12:04:15","date_gmt":"2019-02-22T12:04:15","guid":{"rendered":"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/?p=462"},"modified":"2019-02-25T08:36:10","modified_gmt":"2019-02-25T08:36:10","slug":"crawl-visualisations-for-southampton-ac-uk","status":"publish","type":"post","link":"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/2019\/02\/22\/crawl-visualisations-for-southampton-ac-uk\/","title":{"rendered":"Crawl visualisations for southampton.ac.uk"},"content":{"rendered":"<h3>Visualising our website<\/h3>\n<p><strong>Thanks to Jo Caley (aka SEO Jo) and Rayne Prendergast, our Search Engine Optimisation (SEO) specialists, for putting this blog post together.<\/strong><\/p>\n<p><span style=\"font-weight: 400\">We&#8217;ve been using a tool called &#8216;crawl visualisations&#8217; to reveal some significant issues with the structure of our website.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">Once a user is on our website, it&#8217;s very easy for them to get lost or to arrive at content dead-ends. <\/span><span style=\"font-weight: 400\">This leads to a poor user experience, meaning they&#8217;re less likely to return to our website in future.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">Search engines (e.g. Google, Bing etc) must crawl <em>a LOT<\/em> of URLs to understand the site. Issues such as duplicate content and sitemap errors can confuse search engines and ultimately waste <a href=\"https:\/\/webmasters.googleblog.com\/2017\/01\/what-crawl-budget-means-for-googlebot.html\">our crawl budget<\/a><\/span><span style=\"font-weight: 400\">, meaning they will not index or rank our pages, which in turn leads to users not finding our pages in their search results. Boo!<\/span><\/p>\n<p><span style=\"font-weight: 400\">We&#8217;re going to deal with this as part of our OneWeb search engine optimisation (SEO) and larger strategy work. 
We&#8217;ve already made a start by sharing these visualisations and information about the work as part of our <\/span><a href=\"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/2019\/02\/08\/university-challenge\/\"><span style=\"font-weight: 400\">challenge session<\/span><\/a><span style=\"font-weight: 400\"> &#8211; but the data and findings were so compelling we want to share them with you too. <\/span><\/p>\n<h3>Focus on your users<\/h3>\n<p><span style=\"font-weight: 400\">You\u2019ve probably heard us banging the \u2018user needs first\u2019 drum over the past year. Our website is not for us; it\u2019s for the people we seek to serve as a University. That means focusing on and understanding their needs first.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In other words, more about them, less about us.<\/span><\/p>\n<p><span style=\"font-weight: 400\">As part of the preparation for the <\/span><a href=\"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/2019\/02\/08\/university-challenge\/\"><span style=\"font-weight: 400\">challenge session<\/span><\/a><span style=\"font-weight: 400\">, we carried out a <\/span><a href=\"https:\/\/moz.com\/learn\/seo\/crawl-site-audit\"><span style=\"font-weight: 400\">crawl<\/span><\/a><span style=\"font-weight: 400\"> (search engines crawl websites to discover content and store it in databases) of the main domain <\/span><span style=\"font-weight: 400\"><a href=\"https:\/\/www.southampton.ac.uk\/\">https:\/\/www.southampton.ac.uk<\/a>, using <\/span><a href=\"https:\/\/www.screamingfrog.co.uk\/\"><span style=\"font-weight: 400\">Screaming Frog<\/span><\/a><span style=\"font-weight: 400\">. 
We then used the output from the crawl to create <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Force-directed_graph_drawing\"><span style=\"font-weight: 400\">force-directed graphs<\/span><\/a><span style=\"font-weight: 400\">: interactive visualisations of our website\u2019s architecture.<\/span><\/p>\n<p style=\"text-align: center\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-463\" src=\"https:\/\/www.southampton.ac.uk\/blog\/wp-content\/uploads\/sites\/27\/2019\/02\/force-directed-diagram-right-click.gif\" alt=\"\" width=\"600\" height=\"540\" \/><br \/>\n<span style=\"font-weight: 400;font-size: 10pt\">An example of an interactive force-directed crawl diagram from Screaming Frog. Credit: <a href=\"https:\/\/www.screamingfrog.co.uk\/wp-content\/uploads\/2018\/08\/force-directed-diagram-right-click.gif\">Screaming Frog<\/a><\/span><\/p>\n<p style=\"text-align: center\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-464 aligncenter\" src=\"https:\/\/www.southampton.ac.uk\/blog\/wp-content\/uploads\/sites\/27\/2019\/02\/grey-nodes.gif\" alt=\"\" width=\"600\" height=\"534\" \/><span style=\"font-size: 10pt\">An example of an interactive force-directed crawl diagram from Screaming Frog. Right click on a node to focus here. Credit: <a href=\"https:\/\/www.screamingfrog.co.uk\/site-architecture-crawl-visualisations\/\">Screaming Frog<\/a><\/span><\/p>\n<h3>Mapping the data<\/h3>\n<p><span style=\"font-weight: 400\">The crawler encountered 323,000 URLs in total. Screaming Frog limits each crawl map to the first 10,000 URLs it finds, but even then the file is too large to reproduce online. 
Here are two screenshots instead, one showing a force-directed crawl diagram, the other a force-directed directory tree diagram.<\/span><\/p>\n<h4>Crawl visualisation<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-465 size-full\" src=\"https:\/\/www.southampton.ac.uk\/blog\/wp-content\/uploads\/sites\/27\/2019\/02\/crawl-vis.png\" alt=\"Force-directed crawl diagram of the University of Southampton\u2019s website https:\/\/www.southampton.ac.uk\" width=\"602\" height=\"508\" srcset=\"https:\/\/www.southampton.ac.uk\/blog\/wp-content\/uploads\/sites\/27\/2019\/02\/crawl-vis.png 602w, https:\/\/www.southampton.ac.uk\/blog\/wp-content\/uploads\/sites\/27\/2019\/02\/crawl-vis-300x253.png 300w\" sizes=\"auto, (max-width: 602px) 100vw, 602px\" \/><\/p>\n<p style=\"text-align: center\"><span style=\"font-size: 10pt\">The internal linking structure of https:\/\/www.southampton.ac.uk (first 10,000 URLs encountered), created using a force-directed crawl diagram from Screaming Frog.\u00a0<\/span><span style=\"font-size: 10pt\"><strong>KEY:\u00a0<\/strong><\/span><span style=\"font-size: 10pt\">H: Home page, VOD: Virtual open day, P: Prospectuses, SS: Student services, RGP: Regulations, guides and policies, SM: Sitemap, PGT: Postgraduate taught course pages, PGR: Postgraduate research pages, UG: Undergraduate pages<\/span><\/p>\n<h4>Directory tree visualisation<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-486\" src=\"https:\/\/www.southampton.ac.uk\/blog\/wp-content\/uploads\/sites\/27\/2019\/02\/annotated-force-directed-site-architecture-visualisation-fullcrawl.260918.jpg\" alt=\"\" width=\"600\" height=\"509\" srcset=\"https:\/\/www.southampton.ac.uk\/blog\/wp-content\/uploads\/sites\/27\/2019\/02\/annotated-force-directed-site-architecture-visualisation-fullcrawl.260918.jpg 1000w, 
https:\/\/www.southampton.ac.uk\/blog\/wp-content\/uploads\/sites\/27\/2019\/02\/annotated-force-directed-site-architecture-visualisation-fullcrawl.260918-300x254.jpg 300w, https:\/\/www.southampton.ac.uk\/blog\/wp-content\/uploads\/sites\/27\/2019\/02\/annotated-force-directed-site-architecture-visualisation-fullcrawl.260918-768x651.jpg 768w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n<p style=\"text-align: center\"><span style=\"font-size: 10pt\">The directory tree visualisation of https:\/\/www.southampton.ac.uk (first 10,000 URLs encountered), created using a force-directed directory tree diagram from Screaming Frog.\u00a0<\/span><span style=\"font-size: 10pt\">KEY:\u00a0<\/span><span style=\"font-size: 10pt\">H: Home page, MOD: Modules, PS: Programme Specs, HU: Humanities, VOD: Virtual Open Day<\/span><\/p>\n<h3>How to read a crawl visualisation<\/h3>\n<p><span style=\"font-weight: 400\">Both visualisations start at the home page and show the increasing crawl depth of the site. Crawl depth is the minimum number of clicks it takes to get from the home page to the destination URL.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">The green nodes are pages which are indexable, meaning that search engines (e.g. Google, Bing etc.) can find the page and return it in search results.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">The red nodes are pages which are non-indexable by search engines. 
They may be non-indexable because the pages:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">are <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Pagination\"><span style=\"font-weight: 400\">paginated<\/span><\/a><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">contain a \u201c<\/span><a href=\"https:\/\/backlinko.com\/nofollow-link\"><span style=\"font-weight: 400\">nofollow<\/span><\/a><span style=\"font-weight: 400\">\u201d or \u201c<\/span><a href=\"https:\/\/moz.com\/learn\/seo\/canonicalization\"><span style=\"font-weight: 400\">canonical<\/span><\/a><span style=\"font-weight: 400\">\u201d tag (for example, to prevent duplicate content)<\/span><\/li>\n<li style=\"font-weight: 400\"><a href=\"https:\/\/moz.com\/learn\/seo\/redirection\"><span style=\"font-weight: 400\">redirect<\/span><\/a><span style=\"font-weight: 400\"> or return an error (HTTP status code)<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">are blocked in the <\/span><a href=\"https:\/\/moz.com\/learn\/seo\/robotstxt\"><span style=\"font-weight: 400\">robots.txt<\/span><\/a><span style=\"font-weight: 400\">. <\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">Red nodes can therefore highlight areas of concern.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">The grey nodes show where a page has child pages, but the visualisation doesn\u2019t show them as it has reached the 10,000 URL limit.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The lines represent the link between one URL and another, via the shortest path.<\/span><\/p>\n<h3>Why visualisations are useful<\/h3>\n<p>Crawl visualisations show the internal linking structure of a website. Internal links help to establish hierarchy within a website and pass value and authority around the site. 
Effectively, this is how search engines might crawl a site and rank the content within.<\/p>\n<p>Directory tree visualisations are useful because they show the organisation of a website and how users might navigate a site.<\/p>\n<p>Either way, they provide scale and perspective and can reveal underlying issues that are otherwise difficult to detect.<\/p>\n<h3>What it all means<\/h3>\n<p><span style=\"font-weight: 400\">Our website is massive. These diagrams each represent just 3.1% of the URLs crawled on southampton.ac.uk (10,000 of 323,000).\u00a0<\/span><span style=\"font-weight: 400\">The crawl diagram shows that our website is very segmented, with undergraduate and postgraduate course pages operating as distinct and separate websites in themselves.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">Duplicate content and orphan pages (i.e. pages that aren\u2019t linked to from any other page) are likely to be issues and the sitemap template seems to be throwing errors. Other potential problem areas include the regulations, guides and policies, prospectuses, and student services sections.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The directory tree diagram shows how deep our website is (10+ levels) and that the \u2018modules\u2019 section is orphaned \u2013 representing a missed opportunity to acquire and engage prospective students through organic search (Google, for instance, tends not to index or rank orphan pages). There are also possible issues here with programme-specification pages and, more worryingly, some level 2 pages around the home page. 
This is a key area which OneWeb will need to address.<\/span><\/p>\n<h3>Next steps<\/h3>\n<p><span style=\"font-weight: 400\">Creating these diagrams has enabled us to visualise the website, albeit on a reduced scale, and produce a list of valuable action points to take forward into OneWeb.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">As part of the <\/span><a href=\"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/2019\/01\/30\/change-is-coming-heres-why\/\"><span style=\"font-weight: 400\">preparation for OneWeb<\/span><\/a><span style=\"font-weight: 400\">, it is vital we drive a focused SEO strategy around keywords and metadata, and test the information architecture of the website with our users. <\/span><\/p>\n<p><span style=\"font-weight: 400\">This will also feed into the strategy work around taxonomies (the way a website organises its data into categories and subcategories), metatagging (keywords and phrases which tell search engines what content to include in search results for users), and continuing iterations of workflows and governance.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">As always, if you have any questions, <a href=\"mailto:digital@soton.ac.uk\">please get in touch<\/a>. Thank you.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Visualising our website Thanks to Jo Caley (aka SEO Jo) and Rayne Prendergast, our Search Engine Optimisation (SEO) specialists, for putting this blog post together. 
We&#8217;ve been using a tool called &#8216;crawl visualisations&#8217; to reveal some significant issues with the structure of our website.\u00a0 Once a user is on our website, it&#8217;s very easy for [&hellip;]<\/p>\n","protected":false},"author":185,"featured_media":465,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-462","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"jetpack_featured_media_url":"https:\/\/www.southampton.ac.uk\/blog\/wp-content\/uploads\/sites\/27\/2019\/02\/crawl-vis.png","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/paLsBb-7s","_links":{"self":[{"href":"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/wp-json\/wp\/v2\/posts\/462","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/wp-json\/wp\/v2\/users\/185"}],"replies":[{"embeddable":true,"href":"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/wp-json\/wp\/v2\/comments?post=462"}],"version-history":[{"count":19,"href":"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/wp-json\/wp\/v2\/posts\/462\/revisions"}],"predecessor-version":[{"id":481,"href":"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/wp-json\/wp\/v2\/posts\/462\/revisions\/481"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/wp-json\/wp\/v2\/media\/465"}],"wp:attachment":[{"href":"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/wp-json\/wp\/v2\/media?parent=462"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.southampton.ac
.uk\/blog\/digitalteam\/wp-json\/wp\/v2\/categories?post=462"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.southampton.ac.uk\/blog\/digitalteam\/wp-json\/wp\/v2\/tags?post=462"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}