Internet Business and Marketing Trends

Yahoo Search Subtly Nudges Webmasters

One of the biggest thorns in the side of every search engine is irrelevant search results. Whether the cause is spammers or webmasters who aren’t conscientious enough to check their work, poor SERPs can damage a search engine’s reputation almost irrevocably (who, exactly, would want to use an engine that gives irrelevant results?).

The big four have taken various steps to address this, usually in the form of webmaster guidelines and FAQs. However, Yahoo Search is trying a different approach. The Yahoo Search blog revealed that Slurp now parses wildcard characters (*, $) included in robots.txt files. The request for Yahoo to do this was made at an SES conference (who says search engines don’t listen to their audience?).

The blog entry reveals the details of the wildcard allowances:

‘*’ - matches a sequence of characters

You can now use ‘*’ in robots directives for Yahoo! Slurp to wildcard match a sequence of characters in your URL. You can use this symbol in any part of the URL string you provide in the robots directive. For example,

User-Agent: Yahoo! Slurp
Allow: /public*/
Disallow: /*_print*.html
Disallow: /*?sessionid

While the $ informs Slurp to:

anchor the match to the end of the URL string. Without this symbol, Yahoo! Slurp would match all URLs against the directives, treating the directives as a prefix. For example:

User-Agent: Yahoo! Slurp
Disallow: /*.gif$
Allow: /*?$

The first directive tells Slurp to disallow all URLs that end with the .gif extension. The second tells Slurp to allow URLs ending with “?” to be included in the crawl.
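
To make the matching rules concrete, here is a minimal Python sketch of the semantics as the blog entry describes them: ‘*’ matches any sequence of characters, a trailing ‘$’ anchors the match to the end of the URL, and without ‘$’ the directive is treated as a prefix. The function names are my own, and this is only an illustration of the described behavior, not Slurp’s actual implementation.

```python
import re


def directive_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics (taken from the blog entry, not from Slurp's code):
    '*' matches any sequence of characters, and a trailing '$' anchors the
    match to the end of the URL. Without '$' the pattern acts as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + (r"\Z" if anchored else ""))


def matches(pattern: str, url_path: str) -> bool:
    """Return True if url_path is covered by the directive pattern."""
    return directive_to_regex(pattern).match(url_path) is not None


# The directives quoted above, checked against made-up URL paths.
print(matches("/*.gif$", "/images/logo.gif"))         # covered: ends in .gif
print(matches("/*.gif$", "/images/logo.gif?size=2"))  # not covered: '$' anchors the end
print(matches("/*?sessionid", "/cart?sessionid=abc"))  # covered: prefix match
```

Note that this sketch only answers whether a single directive covers a URL. How Slurp resolves a URL covered by both an Allow and a Disallow line is not described in the blog entry, so it is left out here.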

You may have noticed that, in the * wildcard explanation, Priyank Garg used session IDs as an example, and this is what led to my opening statement about irrelevant search results. Obviously, Yahoo wants to rid its index of as many erroneous URLs as possible, and accepting the * wildcard in robots.txt files helps in this process, provided webmasters are willing to implement these allowances.

In fact, I asked Yahoo senior PR manager Shelia Tran if this was their intention when they introduced this change. She replied:

We try our best to crawl and find the best unique content on webmaster’s sites. However, any help that webmasters can give in terms of hints and directives to prevent the crawler from crawling duplicate pages or crawler traps is welcome.

In other words, Yahoo wants webmasters to take an active role in preventing Slurp traps. Putting the appropriate directives in your site’s robots.txt file will also help Yahoo keep its SERPs relatively clean, which is something that benefits all search engine users.
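
To see why session IDs produce the duplicate pages mentioned in the quote, a webmaster could group a list of crawled URLs by what remains once the session parameter is removed. The following is a rough Python sketch under stated assumptions: the parameter name `sessionid` comes from the blog’s example directive, while the helper names and the example.com URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_param(url: str, param: str = "sessionid") -> str:
    """Return url with every occurrence of the given query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def group_duplicates(urls):
    """Map each session-free URL to the crawled URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_param(url)].append(url)
    # Keep only the groups where more than one URL points at the same page.
    return {canonical: found for canonical, found in groups.items() if len(found) > 1}


# Hypothetical crawl log: two of these are the same page with different session IDs.
crawled = [
    "http://example.com/cart?sessionid=abc123",
    "http://example.com/cart?sessionid=def456",
    "http://example.com/about",
]
duplicates = group_duplicates(crawled)
print(duplicates)
```

Any group this turns up is a candidate for a directive such as the `Disallow: /*?sessionid` line quoted earlier.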

Chris Richardson
Staff Writer | WebProNews Blog

