60-Day Sandbox for Google & AskJeeves; MSN Indexes Quickest, Yahoo Next

Search engine listing delays, which have come to be called the Google Sandbox effect, do in fact occur in one form or another at each of the four top-tier search engines. MSN, it seems, has the shortest indexing delay, at 30 days. This article is the second in a series following the spiders through a brand-new web site, beginning on May 11, 2005, when the site went live under a newly purchased domain name.

First Case Study Article

Previously we looked at the first 35 days and detailed the crawling behavior of Googlebot, Teoma, MSNbot and Slurp as they traversed the pages of this new site. We discovered that each robot spider displays a distinctly different crawling frequency and similarly different indexing patterns.

For reference, about 15 to 20 new pages are added to the site daily, and each one is linked from the home page for a day. The site structure is non-traditional: there are no categories, and pages are linked through author pages that list each author's articles and through a "related articles" index that links to relevant pages with similar content.

So let's review where each spider stands, looking at the pages crawled and comparing the pages indexed by each engine.

The AskJeeves spider, Teoma, has crawled most of the pages on the site, yet at this writing, 60 days later, it has indexed none of them. This is clearly a site-aging delay modeled on Google's Sandbox behavior. The Teoma spider from Ask.com has crawled more pages on this site than any other engine over the 60-day period, but it now appears to be tired of crawling: it has not returned since July 13, its first break in 60 days.

In the first two days, Googlebot gobbled up 250 pages and then did not return for 60 days, and it has not indexed a single page since that initial crawl. But Googlebot has shown renewed interest in the site since our first crawling case study article was published on several high-traffic sites. It now looks at a few pages each day, no more than about 20 pages so far, at a decidedly lackluster pace: a true "crawl" that will keep it occupied for years if it continues that slowly.

MSNbot crawled timidly for the first 45 days, looking over 30 to 50 pages daily, but only after it found a robots.txt file. We had neglected to post that file for the first week, then bobbled the ball when we changed the site structure and failed to add robots.txt to the new subdomains until day 25, and THEN MSNbot didn't return until day 30. If little else has been learned about initial crawling and indexing, we have seen that MSNbot relies heavily on the robots.txt file, and that implementing it properly will speed crawling.

MSNbot is now crawling enthusiastically, at anywhere from 200 to 800 pages daily. In fact, we had to add a "crawl-delay" directive to the robots.txt file after MSNbot began hitting 6 pages per second last week. Sixty days into this experiment, the MSN index shows 4,905 pages, and the cached copies change weekly. MSNbot apparently likes the change we made to the page structure: a new feature that links to questions from several other article pages.
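To make the crawl-delay arithmetic concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents and the 10-second value are assumptions for illustration; the article does not say which delay was actually set. Crawl-delay is a non-standard directive, commonly read as the number of seconds a crawler should wait between requests, and not every crawler honors it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the 10-second Crawl-delay is an assumption,
# not the value used on the case-study site.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

SECONDS_PER_DAY = 24 * 60 * 60


def max_pages_per_second(robots_txt, agent):
    """Return the request-rate ceiling implied by Crawl-delay, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)  # None when no Crawl-delay applies
    if delay is None:
        return None
    return 1.0 / delay


observed = 6.0  # pages per second, as reported for MSNbot in the article
limit = max_pages_per_second(ROBOTS_TXT, "msnbot")
print(f"msnbot: {observed} pages/s observed, {limit} pages/s allowed")
print(f"msnbot: at most {limit * SECONDS_PER_DAY:.0f} pages/day")

# Googlebot has no record of its own and the "*" record sets no delay.
print(f"Googlebot: {max_pages_per_second(ROBOTS_TXT, 'Googlebot')}")
```

Under that assumed 10-second delay, a compliant msnbot would drop from 6 pages per second to at most 0.1, or 8,640 pages per day, which still leaves room for the 200 to 800 pages it now fetches daily.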

Slurp alternates between strange inactivity and hyperactivity. The Yahoo crawler will look at 40 pages one day and 4,000 the next, then look only at the home page for a few days, then jump back in for 3,000 pages the next day, then go back to reviewing only robots.txt for two days. Consistency is not a curse suffered by Slurp. Yahoo now shows 6 pages in its index: one is an error page and another is an "Index of" directory listing, because we have not posted home pages to several subdomains. Yet Slurp has easily crawled 15,000 pages to date.

Lessons learned in the first 60 days on a new site follow:

1) Google crawls 250 pages when it first discovers links to the site. It then doesn't return until it finds more links, and then crawls slowly. Google has failed to index the new domain for 60 days.

2) Yahoo looks for error pages, and once it finds bad links it will crawl them ceaselessly until you tell it to stop. Then it won't crawl at all for weeks, after which it crawls heavily one day and lightly the next, in random fashion.

3) MSNbot requires a robots.txt file, and once it decides it likes your site, it may crawl too fast, requiring a "crawl-delay" instruction in that robots.txt file. Implement robots.txt immediately.

4) Bad bots can strain resources and hit too many pages too quickly until you tell them to stay out. We banned three bots outright after they slammed our servers for a day or two: "aipbot" crawled first, then "BecomeBot" came along, and then "Pbot" from Picsearch.com crawled heavily looking for image files we don't have. Bad bots, stay out. If crawlers other than those of the top engines strain your server resources, it is best to exclude them in robots.txt. We considered excluding the Chinese search engine Baidu.com when it began crawling heavily early on. We don't expect much traffic from China, but why exclude one billion people? Especially since Google is rumored to be considering a purchase of Baidu.com as an entry to the Chinese market.
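The exclusions in lesson 4 can be written as robots.txt records with `Disallow: /`. The sketch below, again using Python's `urllib.robotparser`, assumes the three crawlers match on the tokens aipbot, BecomeBot and Pbot exactly as named above (an assumption; each bot's actual token may differ) and uses example.com as a placeholder URL. Keep in mind that robots.txt is voluntary: it keeps out only bots that choose to honor it, and a bot that ignores it has to be blocked at the server instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt banning the three bots named in lesson 4 while
# leaving the site open to every other crawler. User-agent tokens are assumed.
ROBOTS_TXT = """\
User-agent: aipbot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: Pbot
Disallow: /

User-agent: *
Disallow:
"""

# Placeholder URL; any path on the site gives the same answer here.
PAGE_URL = "http://www.example.com/articles/sample-article.html"


def blocked_agents(robots_txt, agents, url):
    """Return the agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


agents = ["aipbot", "BecomeBot", "Pbot", "msnbot", "Slurp", "Googlebot", "Teoma"]
print("blocked:", blocked_agents(ROBOTS_TXT, agents, PAGE_URL))
```

One caveat about this particular parser: CPython's `urllib.robotparser` treats a record's user-agent token as a case-insensitive substring of the crawler's name, so the `Pbot` record would also match a name such as `aipbot`. Real crawlers may apply their own matching rules.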

The bottom line is that all engines seem to delay indexing new domain names for at least 30 days. Google has so far delayed indexing THIS new domain for 60 days since first crawling it. AskJeeves has crawled thousands of pages while indexing none of them. MSN indexes faster than any other engine but requires a robots.txt file. Yahoo's Slurp has crawled on and off for 60 days but has indexed only six of the 15,000 or more pages crawled to date.

We seem to have established that there is a clear indexing delay, but whether this particular site is "Sandboxed," and whether such delays apply universally, is less clear. Many webmasters claim to have been fully indexed within 30 days of first posting a new domain. We'd love to see others track spiders through new sites after launch and document their results publicly, so that indexing and crawling behavior can be verified.

© Copyright July 18, 2005 Mike Banks Valentine

Mike Banks Valentine is a search engine optimization specialist who operates WebSite101 eCommerce Tutorial and will continue reporting on this case study chronicling the search indexing of the Publish101 Article Resource.
