Google, Bing, Qwant need access: Here is how to set up your robots.txt file correctly.
How to check your robots.txt file: this entry explains how to do it right and, in turn, improve your site’s SEO results and traffic.

We address three questions

1. What do robots or crawlers do?
2. What does it mean if a robot cannot crawl the content of your blog or webpage?
3. How do you BEST set up your robots.txt file?

By default, everything on your blog that visitors can see can be indexed by search engines like Google or Qwant. Indexed content will show up in search results. Most blogs receive about 40 to 70 percent of their visitor traffic from search engines.

You can prevent search engines from indexing certain pages by editing your robots.txt file. However, this is usually not in the blogger’s best interest. Below we outline what your robots.txt file must contain to allow Google or DrKPI to crawl your blog.

What do robots do?

Web robots are sometimes also called web wanderers, crawlers or spiders.

Robots perform various tasks. Here we are interested in three of them:

1. Site indexing: they take a copy of each website they find and store it on the search engine’s servers.
2. Validating the site code: they compare the website code to W3C standards and grade the code according to accuracy.
3. Link checking: they trace incoming and outgoing links.

What should you check for?

While the robots.txt file is a great thing, we can inadvertently make errors that result in outcomes we do not want. For instance, I recently informed a blogger that we could not scan his site. He wrote in reply:

“Hi Urs

User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /wp-img
Disallow: /impressum

All allowed, only sub-directories are not.”

The above indicates, however, that much more is being disallowed than the scanning of sub-directories. For example:

Disallow: /wp-content

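This command prevents search engines from crawling anything whose address starts with /wp-content (that is, if they respect the blogger’s wish; DrKPI does, Google…?). On a typical WordPress blog this folder holds the uploaded images, themes and plugins that the posts rely on. To allow indexing of this content, the command needs to be changed to: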

Allow: /wp-content

In turn, Google and DrKPI can index the blog posts of this site (see also the WordPress section on robots.txt optimization).
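
To see how a crawler that honours robots.txt reads the blogger's rules, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name allowed and the example paths are made up for illustration. Note also that Python's parser applies the first matching rule, whereas Google picks the most specific one; for simple rules like these, both come to the same result.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted by the blogger above.
BLOGGER_RULES = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /wp-img
Disallow: /impressum
"""


def allowed(rules: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to illustrate how the rules match.
    examples = (
        "/my-first-post/",
        "/wp-content/uploads/photo.jpg",
        "/impressum",
        "/impressum-english",
    )
    for path in examples:
        verdict = "allowed" if allowed(BLOGGER_RULES, "Googlebot", path) else "blocked"
        print(f"{path}: {verdict}")
```

Each Disallow line is matched as a prefix of the path, not as a folder name. That is why /impressum also blocks /impressum-english, and why /wp-content blocks every file stored below it.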

Do I have a choice?

Yes, you do. For instance, you can give a particular crawler permission to index your blog.

In order to allow us to index your blog, while possibly still preventing others from doing so, we need you to add the following two lines to your robots.txt file:

User-agent: DrKPI-bot
Allow: /

If the above has been added, the DrKPI-bot can go ahead and crawl your blog, even if you use the command:

Disallow: /wp-content

Therefore, by adding the two lines above, you allow us to crawl your blog’s content and index it. In return, you get the actionable metrics you want in order to improve the blog’s performance for your business.
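
The effect of a separate group for one crawler can be checked with a short sketch, again using Python's standard urllib.robotparser. It assumes the two DrKPI-bot lines are placed as their own group next to a general group that disallows /wp-content. The name SomeOtherBot, the helper may_crawl and the example path are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A general group for all crawlers plus a dedicated group for DrKPI-bot.
COMBINED_RULES = """\
User-agent: *
Disallow: /wp-content

User-agent: DrKPI-bot
Allow: /
"""


def may_crawl(agent: str, path: str, rules: str = COMBINED_RULES) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    path = "/wp-content/uploads/photo.jpg"  # hypothetical example path
    for agent in ("DrKPI-bot", "SomeOtherBot"):
        verdict = "allowed" if may_crawl(agent, path) else "blocked"
        print(f"{agent} -> {path}: {verdict}")
```

A crawler that finds a group addressed to it by name follows that group and ignores the general one, so the named bot gets in while all other compliant crawlers stay out of /wp-content.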

PS: 99.68% of all bloggers allow us to crawl their site. Accordingly, we provide them with the actionable metrics needed to improve their blog.

Register for free and join other top bloggers who allow DrKPI to benchmark their content. You will be glad you did.

Interesting read: The robots.txt website with tips and tricks

What can you do?

1. Go to your robots.txt file and check whether it allows your site to be crawled. Make sure it does (see above on how to do it).
2. Allow trackbacks and pingbacks on your blog.
3. Check again: have you set things up properly?

If you want to see your site’s or blog’s robots.txt file, just add /robots.txt to your domain in the browser’s address bar.
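
As a small illustration of where the file lives, the following sketch builds the robots.txt address from any page address on a site. The function name robots_url, the example.com addresses and the choice to default to https are assumptions made for this example only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(site: str) -> str:
    """Return the robots.txt address for the site that `site` belongs to."""
    if "://" not in site:
        # Assumption: default to HTTPS when no scheme is given.
        site = "https://" + site
    parts = urlsplit(site)
    # robots.txt always sits at the top level of the host, whatever the page path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("example.com"))
    print(robots_url("https://example.com/blog/my-post?page=2"))
```

The resulting address can be opened in a browser, or handed to urllib.robotparser.RobotFileParser via set_url() followed by read() to download and parse the live file.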

Interesting read: Make use of the robots.txt file & ensure it’s working by Patrick Sexton

Source: Google and DrKPI: SEO optimization

Have we forgotten to mention something?
How much of your blog’s or website’s traffic comes via search engines?

Thanks again for sharing your insights – I always appreciate your very helpful feedback.

Urs E. Gattiker, Ph.D. - CyTRAP Labs - ComMetrics.

7 Comments
  1. Linnie said:

    Dear DrKPI

    This is an interesting post. We went and checked our robots.txt file immediately.
    And we had to change something on our file thanks to your hint and suggestions made here.
    That will surely help to improve the SEO for our site. Of course, a never ending challenge.

  2. seonewtool said:

    great post.
    This tool can help you to identify errors that may exist within your current /robots.txt file.
    It also lists the pages that you’ve specified to be disallowed.

    robots txt checker

    • Urs E. Gattiker said:

      Thank you so much for adding this….

      However, we had to remove your link, since it has some funny code and asks people to download some software beforehand.

      This does not come across as being a safe site. Did you do that on purpose or are you not aware of this?


  3. Eduard said:

    Hi Urs
    You can also use the Moderatobot service to verify your robots.txt.
    Our tool checks if the robots.txt file is well formatted and can be understood by most bots.

    • Urs E. Gattiker said:

      Dear Eduard

      Very nice indeed. I went and checked it and got this output:

      # Added by SEO Ultimate’s Link Mask Generator module
      User-agent: *
      Allow: /wp-admin/admin-ajax.php
      Disallow: /go/
      # End Link Mask Generator output
      Disallow: /wp-admin/

      Your tool tells me what might be wrong, which I find very interesting.
      Unfortunately, it fails to tell me how to fix it in my WordPress robots.txt file.

      Are you working on telling people what they need to do to fix the issues you identify with your nice tool?
      Thanks for sharing.
      #BlogRank #BuzzRank

  4. Eduard said:

    Dear Urs.

    Thanks for your interest and kind feedback.

    After checking a robots.txt file, Moderatobot suggests its own version of the file with all known problems fixed. You can find this file in the right pane on the checking page, save it to disk and use it in your project.
    If that is what you meant.

    We are now working on a joint check of robots.txt, sitemap.xml and meta tags of a website’s pages to detect logical problems, for example, when disallowing resources

    Best regards,

Trackbacks & Pingbacks

  1. […] Interesting:  Google and DrKPI: SEO optimization […]

Comments are closed.