Robots.txt for Blogger
A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, block indexing with noindex or password-protect the page.
A robots.txt file is used primarily to manage crawler traffic to your site and, depending on the file type, to keep a file (for example a media or resource file) out of Google search results.
The best robots.txt for Blogger depends on your specific goals and preferences. However, here are some common approaches:
Allow all crawling:
User-agent: *
Allow: /
This allows all search engines to crawl all pages on your Blogger blog.
Disallow specific directories:
User-agent: *
Disallow: /search
Disallow: /20*
Disallow: /feeds
This prevents search engines from crawling Blogger's /search URLs (label and search-result pages), URLs that begin with /20 (the date-based archive URLs), and /feeds (the feed URLs). Note that on Blogger, individual post URLs also begin with the year (for example /2023/05/my-post.html), so the /20 rule blocks posts too unless you also allow .html pages, as in the combined example further down.
Allow only HTML pages:
User-agent: *
Disallow: /
Allow: /*.html
Because the longer, more specific Allow rule overrides the blanket Disallow, this lets search engines crawl only URLs that end in ".html".
Completely block crawling:
User-agent: *
Disallow: /
This prevents all search engines from crawling any pages on your Blogger blog.
Additional considerations:
- If you have a specific page or directory that you want to exclude from search engine indexing, you can add a "Disallow" rule for that page or directory.
- If you have a sitemap, you can include the URL of your sitemap in your robots.txt file to help search engines find and index your content more efficiently.
- You can use a tool like Google Search Console to test your robots.txt file and see how it affects your website's visibility in search results (there is also a quick sanity check at the end of this post).
Ultimately, the best robots.txt for your Blogger blog will depend on your specific needs and goals. Experiment with different configurations to find the one that works best for you.
Putting these pieces together, a common configuration for a Blogger blog looks like this:
User-agent: *
Disallow: /search*
Disallow: /20*
Disallow: /feeds*
Allow: /*.html
The provided robots.txt file instructs search engines to:
- Disallow crawling pages that start with "/search", "/20", and "/feeds".
- Allow crawling pages that end with ".html".
- Allow crawling all other pages.
This means that search engines will crawl and index pages on your website that end in ".html", but they will avoid URLs that start with "/search", "/20", or "/feeds". Post URLs such as /2023/05/my-post.html match both "Disallow: /20*" and "Allow: /*.html"; because Google applies the most specific (longest) matching rule, the Allow rule wins and your posts stay crawlable.
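To see how this plays out, here is a minimal Python sketch (not Googlebot's actual implementation) of the documented precedence behaviour: the longest matching rule wins, and Allow wins a tie. The rules mirror the example above, and the sample paths are hypothetical Blogger URLs used only for illustration.

import re

RULES = [                      # (directive, pattern) from the example above
    ("Disallow", "/search*"),
    ("Disallow", "/20*"),
    ("Disallow", "/feeds*"),
    ("Allow",    "/*.html"),
]

def pattern_to_regex(pattern):
    # '*' matches any run of characters; '$' anchors the end of the path.
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern.rstrip("$"))
    return re.compile("^" + body + ("$" if pattern.endswith("$") else ""))

def is_allowed(path):
    matches = [(len(p), d) for d, p in RULES if pattern_to_regex(p).match(path)]
    if not matches:
        return True            # no rule matches -> crawling allowed by default
    # Longest pattern wins; if lengths tie, Allow beats Disallow.
    _, directive = max(matches, key=lambda m: (m[0], m[1] == "Allow"))
    return directive == "Allow"

for path in ["/", "/2023/05/my-post.html", "/search/label/news", "/feeds/posts/default"]:
    print(path, "->", "allowed" if is_allowed(path) else "blocked")

Running this shows "/" and /2023/05/my-post.html as allowed, while the /search and /feeds URLs come back blocked, which is exactly the behaviour described above.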
If you remove the "Allow: /*.html" line, the "Disallow: /20*" rule will also block your posts; if you want to prevent search engines from crawling any pages at all, use "Disallow: /" as shown earlier.
Therefore, the following configuration will work best for most Blogger blogs:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search*
Disallow: /20*
Disallow: /feeds*
Allow: /*.html
Sitemap: https://yoursite.blogspot.com/sitemap.xml
Sitemap: https://yoursite.blogspot.com/sitemap-pages.xml
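Here, the Mediapartners-Google group (Google's AdSense crawler) has an empty Disallow, so ad targeting still works on every page; the wildcard group applies the rules discussed above; and the Sitemap lines point crawlers at Blogger's generated sitemaps. Replace yoursite.blogspot.com with your blog's own address before saving this in Blogger's custom robots.txt setting.

As a quick sanity check once the file is live, the short sketch below uses Python's standard-library urllib.robotparser to confirm that the file is reachable and that the Sitemap lines are picked up. Note that this parser follows the original robots.txt conventions and does not understand Googlebot-style wildcards (* and $), so use Google Search Console's robots.txt report to verify the wildcard rules themselves; the domain here is a placeholder.

from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://yoursite.blogspot.com/robots.txt")  # placeholder domain
rp.read()                 # fetch and parse the live robots.txt
print(rp.site_maps())     # Sitemap URLs declared in the file, or None (Python 3.8+)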