
Preventing part of your page from being indexed by Google, with the help of ChatGPT

Submitted by daniel on

I am quite a heavy user of RSS feeds that use keywords to return related content.

Recently I had to limit certain sources of news content on one of my sites because they were returning false positives: content from certain sources was being included in a search result even though the main article did not contain a keyword match. After a little more investigation, it turned out that the specific news pages listed related articles, and it was the keywords in those related articles that were being picked up. The consequence was that I received a lot of false positive matches in my source feed.

To counter this I decided to remove these sites from my RSS feed, as I was having to unpublish quite a lot of these irrelevant articles.

I also decided to contact Google to see if there was anything that could be done at their end, but this was met with a "this is how RSS feeds work" response, along with the suggestion that there are alternatives in the market I could choose to use.

Thinking about this, it seemed that it must be a common problem. It is not unusual for an article to link to other related content, so how do we stop those links from skewing the search results, and the RSS feeds that those results populate? The emphasis seemed to be on these sites improving their search listings to make them more accurate, but how could this be achieved?

Certainly there seemed to be some solutions that could be adopted. One method involved displaying the related content in an iframe and then preventing that iframe from being indexed (sketched below). Another was to use a server-side include (SSI) for the related content and then stop that SSI from being indexed. That said, none of these seemed like a good match for my use case, as I was not using iframes or server-side includes.
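For reference, the iframe approach would look roughly like this. This is only a minimal sketch, assuming the related links live on a separate page at a hypothetical URL such as /related-articles.html; that embedded page carries a noindex robots meta tag in its head so search engines drop it, while the main article page stays indexable.

<!-- main-article.html: the page that should be indexed as normal -->
<body>
  <h1>Main Article</h1>
  <p>Article content that should be indexed.</p>

  <!-- Related content pulled in from a separate, non-indexed URL -->
  <iframe src="/related-articles.html" title="Related articles"></iframe>
</body>

<!-- related-articles.html: the embedded page, excluded from the index -->
<head>
  <meta name="robots" content="noindex">
</head>
<body>
  <ul>
    <li><a href="/some-related-story">Some related story</a></li>
    <li><a href="/another-related-story">Another related story</a></li>
  </ul>
</body>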

Eventually I decided to turn to ChatGPT to see what it might come up with. After a few attempts I got a response that seemed to be a viable solution.

Q. is it possible to remove part of a html section or block from being indexed by search bot, not the whole page

A. 

Yes, it is possible to prevent search bots from indexing specific parts of an HTML page while allowing other parts to be indexed. You can achieve this by using the robots meta tag or the nofollow attribute.

The robots meta tag allows you to specify instructions for search engine crawlers. One of the common directives used is noindex, which tells search bots not to index the content within the specified section. Here's an example of how you can use the robots meta tag to prevent indexing:

<head>
  <meta name="robots" content="index,follow">
</head>

<body>

  <!-- Content to be indexed -->
  <div>
    <h1>Indexable Content</h1>
    <p>This content will be indexed by search engines.</p>
  </div>

  <!-- Content not to be indexed -->
  <div>
    <h1>No Index Content</h1>
    <p>This content will not be indexed.</p>
    <meta name="robots" content="noindex,nofollow">
  </div>

</body>

In this example, the first meta tag in the head section allows search engines to index and follow links on the page. The second meta tag within the "No Index Content" section instructs search bots not to index that specific part.

Alternatively, you can use the nofollow attribute on individual links to prevent search engines from following those links. Here's an example:

<a href="https://example.com" rel="nofollow">Link</a>

In this case, the rel="nofollow" attribute is added to the anchor (<a>) tag, which tells search engines not to follow that particular link.

Keep in mind that while these methods can discourage search engines from indexing or following specific content, they are not foolproof. Determined or malicious bots may still access and index the content you intended to exclude.

Previously I had searched on various search engines for an answer, without success. I had also posted on the Google Communities forum, and again this had not really been fruitful. Even now I cannot really say if and where this is documented, although there is documentation on the robots meta tag here. Certainly there were quite a lot of articles on using meta tags to disable search engine indexing on a specific page. I had also come across the use of meta tags previously and had not really considered them seriously, because I had thought that they were restricted to the head of an HTML document, which for certain meta tags is in fact the case.
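For comparison, this is the well-documented page-level version of the technique: a robots meta tag placed in the head of the document, which tells crawlers to keep the whole page out of the index rather than a single section.

<head>
  <!-- Page-level directive: exclude this entire page from the index -->
  <meta name="robots" content="noindex">
</head>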

Anyway, if you have related links on your page and do not want these to skew the results of the main content on the page, using the robots meta tag could be a viable solution to improve the accuracy of the SEO relating to your content.
