Did you know? How to set up exclusions in Site Auditor

In the two months since Site Auditor launched, we’ve crawled more than 4 million pages. We continue to receive great feedback from our users on how to improve it, and have been updating it — adding new features and fixing bugs — almost daily.

Most recently, we added the ability to exclude errors and specific paths, allowing you to fine-tune your crawl so that you’re only notified about issues important to your specific goals.

Let’s go through both of these new exclusion options in detail.

Website path exclusions

In some instances, you’ll want to exclude certain parts of your site from being crawled.

For example, WordPress creates separate archive pages for each tag and category you use on your site. When you apply multiple tags or categories to a post, that post appears on the archive page for each. And when Site Auditor crawls the site, it picks up on this as duplicate content.

Since you know this isn’t an issue to be concerned about, you can stop Site Auditor from crawling these tag and category archives by setting up a Website Path Exclusion. To do this, go to Site > Auditor and click on the wrench icon to get to Settings.

In the Website Path Exclusions section, click the “Create New Exclusion” button.

You can exclude various paths from the crawl, but for this example we’ll exclude all category pages on the Raven blog by entering /category/* as our exclusion.

Adding the asterisk after category/ tells Site Auditor to ignore the URL raventools.com/blog/category and all folders and files below it, like raventools.com/blog/category/seo/ and raventools.com/blog/category/raven/.
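
If it helps to see the wildcard behavior spelled out, here is a minimal Python sketch of the matching described above. It is not Site Auditor's actual code: the function name is_excluded is made up for illustration, and it assumes the pattern can match a folder anywhere in the URL path (which is what lets /category/* cover raventools.com/blog/category/).

```python
from urllib.parse import urlparse


def is_excluded(url: str, exclusion: str) -> bool:
    """Return True if the URL falls under the exclusion pattern.

    Assumptions (not confirmed Site Auditor behavior):
    - a trailing "*" means "this folder and everything below it"
    - the folder named in the pattern may appear anywhere in the path
    """
    # Allow URLs written without a scheme, like "raventools.com/blog/...".
    path = urlparse(url if "//" in url else "//" + url).path
    segments = [s for s in path.split("/") if s]
    wanted = [s for s in exclusion.rstrip("*").split("/") if s]
    if not wanted:
        return False
    # Slide over the path looking for the excluded folder names in order.
    for i in range(len(segments) - len(wanted) + 1):
        if segments[i:i + len(wanted)] == wanted:
            return True
    return False


for url in (
    "raventools.com/blog/category",
    "raventools.com/blog/category/seo/",
    "raventools.com/blog/some-post/",  # hypothetical post URL
):
    print(url, "excluded" if is_excluded(url, "/category/*") else "crawled")
```

Under these assumptions, the two category URLs are skipped and the ordinary post URL is still crawled.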

Click the “Create New Exclusion” button, and your exclusion will be saved. Going forward, Site Auditor will no longer crawl URLs that match your exclusions.

If you ever change your mind and want Site Auditor to start crawling those excluded URLs again, just go back to the Website Path Exclusions section in Settings and delete one or all of the exclusions you created.

Error exclusions

If Site Auditor reports certain issues that you already know about but don’t consider a problem, or don’t want to report to your client, you can exclude them from your total number of issues and from showing up on the summary sub-tabs of Site Auditor.

For example, let’s say that you have your robots.txt file set up to block login pages, or maybe your 404 error page, from being crawled. The first time Site Auditor crawls your site, seeing these URLs in your report is probably a helpful reminder. But if you don’t want this information to be factored into your total crawl issues going forward, you can exclude this metric from your report.
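
To see which of your own URLs a robots.txt setup like this would block, you can check them with Python's standard urllib.robotparser module. This is only a sketch: the Disallow paths and the example.com URLs below are placeholders, not rules taken from any real site, and the check is independent of how Site Auditor itself reads robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules blocking a login page and a 404 page.
robots_txt = """\
User-agent: *
Disallow: /wp-login.php
Disallow: /404/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def blocked(url: str) -> bool:
    """Return True if the rules above forbid crawlers from fetching the URL."""
    return not parser.can_fetch("*", url)


for url in (
    "https://example.com/wp-login.php",
    "https://example.com/404/",
    "https://example.com/blog/",
):
    print(url, "blocked" if blocked(url) else "allowed")
```

URLs reported as blocked here are the kind you would expect to see counted under “Pages blocked by robots.txt”.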

To do this, go into Site Auditor > Settings (the wrench icon again), and find the Exclude Errors from Report section. From the drop-down, select “Pages blocked by robots.txt” and then save.

Immediately after saving, “Pages blocked by robots.txt” will no longer appear on the Summary tab under Visibility Issues, and the number of issues related to this metric will no longer count toward the total number of issues reported for your site.

You will still be able to find metrics for the errors you’ve excluded under the main tabs, in case you want to check in from time to time to make sure nothing has gone awry with your site.
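
The effect on your totals can be pictured with a small Python sketch. The issue counts and every metric name other than “Pages blocked by robots.txt” are made up for illustration, and summary_total is a hypothetical helper, not part of Site Auditor.

```python
# Made-up issue counts for a single crawl.
issue_counts = {
    "Pages blocked by robots.txt": 3,
    "Duplicate content": 5,
    "Broken links": 2,
}

# Errors chosen in the Exclude Errors from Report section.
excluded = {"Pages blocked by robots.txt"}


def summary_total(counts: dict, excluded: set) -> int:
    """Add up issue counts, skipping any excluded error types."""
    return sum(n for name, n in counts.items() if name not in excluded)


print("Total before exclusion:", summary_total(issue_counts, set()))
print("Total after exclusion:", summary_total(issue_counts, excluded))

# The excluded counts are not deleted, only left out of the total.
print("Still recorded:", issue_counts["Pages blocked by robots.txt"])
```

The excluded error is dropped from the total only; its count is still there to look up, which mirrors how the metric stays available under the main tabs.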

If you haven’t crawled your site yet, now’s a great time to get started and try out these new features. Let us know what you think!

As Raven's quality analyst, Megan works with the development team to test new features before they are released into the wild. She dreams of living in an elaborate tree house eating Fruit Roll-Ups and gaming the days away.

More about Megan Morris | @RavenMegan