I recommend carefully researching each of these options before implementing any changes to ensure that only the desired pages are blocked from search engines.
There are three ways to keep a page out of search engine indexes:
Option #1: In the robots.txt file
In a robots.txt file, you can specify whether you’d like to block bots from a single page, a whole directory, or even just a single image or file.
Disallow crawling of everything
Disallow crawling of a specific folder
Disallow crawling of a specific page
Disallow crawling of a folder, except for one file in that folder that remains allowed
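The four cases above might look like the rule sets in the sketch below. It assumes the rules should apply to all crawlers (User-agent: *) and uses made-up paths (/private/, /private/report.html, /private/public.html) that you would replace with your own. Python’s standard urllib.robotparser is used only as a convenient way to check what each rule set blocks; real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, path: str, agent: str = "ExampleBot") -> bool:
    """Parse robots.txt text and report whether `agent` may crawl `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


# All paths below are hypothetical placeholders.
BLOCK_EVERYTHING = "User-agent: *\nDisallow: /"
BLOCK_FOLDER = "User-agent: *\nDisallow: /private/"
BLOCK_PAGE = "User-agent: *\nDisallow: /private/report.html"
# Python's parser applies the first rule that matches, so Allow is listed
# before Disallow here. Google instead picks the most specific matching rule.
BLOCK_FOLDER_EXCEPT_ONE = (
    "User-agent: *\n"
    "Allow: /private/public.html\n"
    "Disallow: /private/"
)

if __name__ == "__main__":
    checks = [
        (BLOCK_EVERYTHING, "/index.html"),
        (BLOCK_FOLDER, "/private/notes.html"),
        (BLOCK_PAGE, "/private/report.html"),
        (BLOCK_FOLDER_EXCEPT_ONE, "/private/public.html"),
    ]
    for rules, path in checks:
        print(path, "allowed" if allowed(rules, path) else "blocked")
```

Keeping the rules as plain strings makes it easy to paste a candidate robots.txt into the check before publishing it.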
To block access to all URLs that include a question mark (?), you could use the following entry:
Disallow: /*?
To block any URLs that end with .asp, you could use the following entry:
Disallow: /*.asp$
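To see how wildcard patterns such as /*? and /*.asp$ behave, here is a small sketch of the matching logic. It assumes the common convention that * matches any run of characters and a trailing $ anchors the end of the URL. The function name rule_matches is made up for this example, and as far as I know Python’s built-in urllib.robotparser does not handle these wildcards, so the pattern is translated into a regular expression instead.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches `path`.

    Supports `*` (any run of characters) and a trailing `$` (end of URL).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for every "*".
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, mirroring how rules match from the
    # beginning of the URL path.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    for pattern, path in [
        ("/*?", "/search?q=shoes"),
        ("/*?", "/search"),
        ("/*.asp$", "/pages/home.asp"),
        ("/*.asp$", "/pages/home.asp?id=1"),
    ]:
        print(pattern, path, rule_matches(pattern, path))
```

Note that the $ anchor means a URL such as /pages/home.asp?id=1 would not be covered by the .asp rule, which is worth checking before relying on it.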
Please note that blocking a page in robots.txt does not retroactively remove it from Google or other search engines. The file tells bots not to crawl a page, but search engines can still index the URL if, for example, there are inbound links to your page from other websites. If your page has already been indexed and you’d like it removed from search results, you’ll likely want to use the noindex meta tag method below. For that method to work, the page must not be blocked in robots.txt, because bots need to crawl the page to see the tag.
Option #2: With a noindex meta tag
You can use a meta tag to prevent a page from appearing in search results. It requires only a little technical skill. In fact, it’s really just a copy/paste job if you’re using the right content management system.
Example of applying “noindex” to a page:
<meta name="robots" content="noindex" />
Add the tag to the <head> section of your page’s HTML, and that’s it.
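To confirm the tag is actually present on a page, a quick check like the one below may help. It is a rough sketch using Python’s standard html.parser. The class and function names are made up for illustration, and it only looks at the generic robots meta tag, not bot-specific ones.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [
                part.strip().lower()
                for part in content.split(",")
                if part.strip()
            ]


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = (
        '<html><head><meta name="robots" content="noindex" /></head>'
        "<body>Thanks!</body></html>"
    )
    print(has_noindex(page))
```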
Option #3: With a noindex X-Robots-Tag header
The X-Robots-Tag can be used as an element of the HTTP header response for a given URL.
Example of an HTTP response with an X-Robots-Tag instructing crawlers not to index a page:
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
X-Robots-Tag: noindex
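How you add this header depends on your server or framework. As one possible illustration, the sketch below is a minimal Python WSGI application that sends X-Robots-Tag: noindex for a chosen set of paths. The paths in NOINDEX_PATHS and the helper response_headers are assumptions made up for this example, not part of any particular setup.

```python
# Hypothetical paths that should stay out of search results.
NOINDEX_PATHS = {"/thank-you", "/members/dashboard"}


def app(environ, start_response):
    """Minimal WSGI app that adds X-Robots-Tag: noindex for selected paths."""
    path = environ.get("PATH_INFO", "/")
    body = f"You requested {path}\n".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    if path in NOINDEX_PATHS:
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [body]


def response_headers(path):
    """Call the app directly (no network) and return its headers as a dict."""
    captured = {}

    def start_response(status, headers):
        captured.update(headers)

    app({"PATH_INFO": path}, start_response)
    return captured


if __name__ == "__main__":
    print(response_headers("/thank-you"))
    print(response_headers("/about"))
```

To try it in a browser, the app could be served locally with make_server from Python’s standard wsgiref.simple_server module. Unlike the meta tag, a header like this also works for non-HTML files such as PDFs or images.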
Examples of pages that should not appear in search results:
Duplicate content page
This is one of the most common reasons to de-index a page. Duplicate content occurs when search engines index two or more identical pages. It is a common issue in SEO, and it negatively impacts the value of each identical page in search engines.
Thank you page
If you use Google Analytics to track your conversion rate, you probably have a Thank You page for that purpose. A Thank You page should not be indexed, because you want people to land on it only after they have filled out a form on your landing page, not because they found it in search results.
Member-only page
For any website that hosts member-only or paid content, pages that contain restricted content should not be indexed, because you don’t want that content accessible to the general public.