Using robots.txt

Robots.txt is a text file that contains site indexing parameters for search engine robots.

Yandex supports the Robots Exclusion Protocol with advanced features.

When crawling a site, the Yandex robot loads the robots.txt file. If the most recent request for the file shows that a site page or section is prohibited, the robot won't index it.

  1. Requirements for the robots.txt file
  2. Recommendations on the content of the file
  3. Using Cyrillic characters
  4. How do I create robots.txt?
  5. FAQ

Requirements for the robots.txt file

Yandex robots correctly process robots.txt if:
  • The file size doesn't exceed 500 KB.

  • It is a plain text (TXT) file named robots.txt.
  • The file is located in the root directory of the site.
  • The file is available to robots: the server hosting the site responds with the HTTP status 200 OK. Check the server response

If the file doesn't meet the requirements, the site is considered open for indexing.

Yandex supports redirection from the robots.txt file located on one site to the file located on another site. In this case, the directives in the target file are taken into account. This redirect can be useful when moving the site.

Recommendations on the content of the file

Yandex supports the following directives:

Directive     What it does
User-agent *  Indicates the robot to which the rules listed in robots.txt apply.
Disallow      Prohibits indexing of site sections or individual pages.
Sitemap       Specifies the path to the Sitemap file posted on the site.
Clean-param   Indicates to the robot that the page URL contains parameters (such as UTM tags) that should be ignored when indexing.
Allow         Allows indexing of site sections or individual pages.
Crawl-delay   Specifies the minimum interval (in seconds) for the search robot to wait after loading one page before starting to load the next. We recommend using the crawl speed setting in Yandex.Webmaster instead of this directive.

* Mandatory directive.

You'll most often need the Disallow, Sitemap, and Clean-param directives. Examples:

User-agent: * # specifies the robots that the directives apply to
Disallow: /bin/ # prohibits links from the shopping cart
Disallow: /search/ # prohibits links to the search page embedded on the site
Disallow: /admin/ # prohibits links from the admin panel
Sitemap: http://example.com/sitemap # specifies the path to the site's Sitemap file
Clean-param: ref /some_dir/get_book.pl # tells the robot to ignore the ref parameter on this page
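
For instance, with the Clean-param rule above, URLs that differ only in the ref value are treated as the same page (the book_id parameter and these URLs are made up for illustration):

```
http://example.com/some_dir/get_book.pl?ref=site_1&book_id=123
http://example.com/some_dir/get_book.pl?ref=site_2&book_id=123
# both are consolidated as
http://example.com/some_dir/get_book.pl?book_id=123
```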

Robots from other search engines and services may interpret these directives differently.

Note. The robot is case-sensitive with respect to substrings (file names, paths, and robot names) but ignores case in directive names.
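
The case-sensitivity of paths can be checked with Python's standard urllib.robotparser module (a sketch; example.com and the /Admin/ path are made-up values):

```python
from urllib import robotparser

# Build a parser from an in-memory robots.txt (no network access needed).
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /Admin/",  # hypothetical mixed-case section
])

# Path matching is case-sensitive: only the exact-case path is blocked.
print(rp.can_fetch("*", "http://example.com/Admin/users"))  # False (blocked)
print(rp.can_fetch("*", "http://example.com/admin/users"))  # True (not blocked)
```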

Using Cyrillic characters

Cyrillic characters are not allowed in the robots.txt file or in server HTTP headers.

For domain names, use Punycode. For page addresses, use the same encoding as that of the current site structure.
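
Both conversions can be done with Python's standard library (a sketch; the Cyrillic names are illustrative):

```python
from urllib.parse import quote

# Percent-encode a Cyrillic path (UTF-8) for use in Disallow/Allow rules.
path = quote("/корзина")
print(path)  # /%D0%BA%D0%BE%D1%80%D0%B7%D0%B8%D0%BD%D0%B0

# Convert a Cyrillic domain to Punycode for use in the Sitemap URL.
host = "сайт.рф".encode("idna").decode("ascii")
print(host)  # xn--80aswg.xn--p1ai
```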

Example of the robots.txt file:

#Incorrect:
User-agent: Yandex
Disallow: /корзина
Sitemap: сайт.рф/sitemap.xml

#Correct:
User-agent: Yandex
Disallow: /%D0%BA%D0%BE%D1%80%D0%B7%D0%B8%D0%BD%D0%B0
Sitemap: http://xn--80aswg.xn--p1ai/sitemap.xml

How do I create robots.txt?

  1. In a text editor, create a file named robots.txt and add the directives you need to it.
  2. Check the file in Yandex.Webmaster.
  3. Place the file in your site's root directory.

Sample file. This file allows indexing of the entire site for all search engines.
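
One minimal version of such a file might look like this (a sketch; an empty Disallow value prohibits nothing, so the entire site stays open to all robots):

```
User-agent: *
Disallow:
```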

FAQ

The site or individual pages are prohibited in robots.txt, but are still in the search

As a rule, after you set a ban on indexing using any of the available methods, pages are excluded from the search results within two weeks. You can speed up this process.

The “Server responds with redirect to /robots.txt request” error occurs on the “Site diagnostics” page in Yandex.Webmaster.

For the robots.txt file to be taken into account by the robot, it must be located in the root directory of the site and respond with HTTP 200 code. The indexing robot doesn't support the use of files hosted on other sites.

To check whether the robots.txt file is available to the robot, check the server response.

If your robots.txt redirects to another robots.txt file (for example, when moving a site), Yandex takes into account the target robots.txt. Make sure that the correct directives are specified in this file. To check the file, add the target site in Yandex.Webmaster and verify your site management rights.