Preventing Search Indexing

  1. Can I prevent the appliance from archiving my content or showing parts of a document in the results?
  2. Does the crawler respect robots.txt files?
  3. What is the user agent name for the Texas A&M Google bot?

Q: Can I prevent the appliance from archiving my content or showing parts of a document in the results?
A: Yes. The appliance obeys meta tags in the HTML head of a document that prevent it from archiving that document or from showing a snippet of it in search results.

The following meta tag prevents the appliance from archiving a document:
<meta name="tamu-googlebot" content="noarchive" />

The following meta tag prevents the appliance from showing result snippets for a document:
<meta name="tamu-googlebot" content="nosnippet" />

Note that these meta tags must be present in a document at the time it is crawled; the appliance cannot obey a tag that is added after the crawl until it crawls the document again.
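Because the tags only take effect if they are present when the page is crawled, it can help to check a page before the crawler reaches it. The following is a minimal sketch, not part of the appliance: it uses Python's standard html.parser module, and the names MetaDirectiveParser and meta_directives are hypothetical helpers invented for this illustration. As a simplifying assumption it scans the whole document rather than only the head.

```python
from html.parser import HTMLParser


class MetaDirectiveParser(HTMLParser):
    """Collect the content directives of <meta> tags with a given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == self.name:
            # content may hold several comma-separated directives
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def meta_directives(html, name="tamu-googlebot"):
    """Return the set of directives found in meta tags called `name`."""
    parser = MetaDirectiveParser(name)
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<meta name="tamu-googlebot" content="noarchive" />'
        '</head><body></body></html>'
    )
    print(meta_directives(page))
```

An empty result means no tag with that name was found, so the page would be archived and shown with snippets as usual. Passing a different name argument checks meta tags with another name.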

The appliance also obeys the directives "index", "noindex", "follow", and "nofollow" in a robots meta tag. Here are some examples:

<meta name="robots" content="index,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="robots" content="index,nofollow" />
<meta name="robots" content="noindex,nofollow" />

Q: Does the crawler respect robots.txt files?
A: Yes. The Google Search Appliance honors the robots exclusion standard, including robots.txt files and robots meta tags. The appliance checks for a robots.txt file on each server it crawls.

Q: What is the user agent name for the Texas A&M Google bot?
A: The user agent name is tamu-googlebot.

To target this user agent in your robots.txt file, use the following syntax:
User-Agent: tamu-googlebot
Disallow: /path/to-be/excluded/
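To confirm that a rule like the one above blocks the paths you intend, you can test it locally before publishing it. The sketch below uses Python's standard urllib.robotparser module; the helper name is_allowed and the sample paths are hypothetical, and Python's matching rules are only an approximation of how the appliance itself interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The example rule from this page.
ROBOTS_TXT = """\
User-Agent: tamu-googlebot
Disallow: /path/to-be/excluded/
"""


def is_allowed(robots_txt, path, user_agent="tamu-googlebot"):
    """Return True if `user_agent` may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/path/to-be/excluded/page.html", "/public/index.html"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, path) else "blocked"
        print(path, verdict)
```

Because the rule names only tamu-googlebot, other crawlers are unaffected by it; use a separate User-Agent block if they should be excluded as well.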
