A question I get from time to time is “How to I detect or find the pages on my WordPress site that people don’t visit?”. Where it’s important to know where your traffic actually end up, on some occasions there might be of interest to know what part of your site that are invisible to the world.

This is how I recommend doing it, which is a pretty straight forward approach using only a few tools.

  • Two subsets of data, one from the CMS and one from Google Analytics
  • Excel
  • A cup of coffee

1. Find all the pages on your site

The first thing that needs to be done is to get hold of a complete list of URLs that you actually have on your site. This can be done in numerous ways, all pending what kind of CMS and/or tools you have available to you.

If you are running a site on WordPress, I’d recommend having a look at one of these approaches. The first one requires you to have FTP-access to you site, which is something you should have if you don’t have the cheapest web host service around.

I did some minor alterations to the code found on http://pastebin.com/qEMbQ8Xj to suit or results better, so the finalized version is as follow, where I’ve removed the post title and type, since we really don’t need that information.

[php]posts;

header(‘Content-type:text/plain’);
foreach($posts as $post) {
switch ($post->post_type) {
case ‘nav_menu_item’:
break;
case ‘page’:
$permalink = get_page_link($post->ID);
break;
case ‘post’:
$permalink = get_permalink($post->ID);
break;
default:
$permalink = get_post_permalink($post->ID);
break;
}
echo “\n{$permalink}”;
}
?>[/php]

You should save this code into a file named something temporary, I went for test.php, and upload it to your domain so that it’s reachable from a browser. I used the data for one of my other domains where I post a lot of photography, so it would be guess that people don’t always find the old stuff.

I went to myotherdomain.se/test.php and got a nice list of URLs looking like this:

[code]
http://www.carlbomfoto.se/product/surf-cruiser/
http://www.carlbomfoto.se/bilder-fran-aik-spelarbussmottagning-2015/
http://www.carlbomfoto.se/fototrooper/
http://www.carlbomfoto.se/birka-finsmakare/
http://www.carlbomfoto.se/till-sjoss/
http://www.carlbomfoto.se/nar-dottern-blir-fotonord-pa-riktigt/
…and so on…
[/code]

Just copy/paste that result into an empty sheet in Excel. We now have some raw data we can use to compare the other dataset with.

If you don’t have the possibility to upload a file via FTP to your domain, and have quite small site I would recommend having a look at one of my go to SEO crawlers that comes in a free limited version; the Screaming Frog SEO Spider.

Screaming_Frog_SEO_Spider_3_3_-_Crawl_Mode

 

This will give you the same set of data, and all you have to do is export it into Excel.

Biggest difference in these two is that the former will only give you actual posts and pages, whereas the latter will give you an index of tags and categories as well. This is all pending your needs. You can always rewrite the first piece of php code to show everything as well.

2. Get the data from Google Analytics

All we really have to do here is to export the data from Google analytics for the date range we want to look at.

Pages_-_Google_Analytics

 

Just head over to the report found at Behaviour -> Site Content -> All Pages and select the date range you’re interested in.

Since Google Analytics will exports the rows visible, do not forget to increase the amount of rows shown in the lower right corner so that the list will contain all your data. Then just click Export and select for Excel (XLSX).

I then cleaned up the data somewhat so that all we see is the actual page paths (excluding the domain name) and also removed some /category/ and /tag/ pages from the GA data since the first code export doesn’t include them. If I’ve opted for Screaming Frog data instead I would have left it, it’s just to make sure that we compare the same things, which in this case ended up being actual blog posts really.

3. Hello Excel

All right, just make this crystal clear: Excel is one of the most important tools when analyzing data.

Screenshot_26_04_15_08_26

 

All I have know is basically just two columns of data where A contains the export from point 1 above, and column B contains the GA data we exported in point 2.

Next, I want to know what URLs that are shown in column A don’t exist in column B, because that’s the URLs we have in the CMS but hasn’t been displayed during the selected date range (March 2015, in this case).

I’m using a simple COUNTIF function here to just check row by row in the value in column A exists in the range found in column B.

In column C, my function looks like this; =COUNTIF(B$2:$B$202;A2)>0

Basically, the range is set to entire B column, and the criteria is set to the value in column A. Then just extend this function for the entire sheet and you will see something like this, but with your own data of course.

Unvisited pages on my blog

By filtering on FALSE, I’d get a complete list of URLs on my blog that didn’t receive one singe pageview last month. It’s a little bit of work to pull the right data, but as soon as you have one data set that contains the URLs you actually have, and one data set with visited ones, it’s very quick work in Excel to get the data you want.

Tell your friends...Share on Facebook3Tweet about this on TwitterShare on Google+0Share on LinkedIn16