SES Chicago: Bulk Submit 2.0

Filed in Conferences/Educ., Google, Yahoo by Matt McGee on December 5, 2006

My notes from the Tuesday, 10:15 am session titled “Bulk Submit 2.0.”

Danny Sullivan, Moderator

– used to be able to email Infoseek a list of 500 URLs and they’d be added to their index almost right away
– this went away and kinda got replaced by paid inclusion
– now it’s making a comeback with something like Google Sitemaps
– at PubCon, all 3 SEs now cooperating on sitemap protocol

Amanda Camp, Google

– my first project at Google was Google Sitemaps
– Add URL page is not the recommended way to submit to Google
– recommended way is Google Sitemaps
– Sitemaps helps us be smarter about the way we crawl — tell us when pages have been updated

Four formats
1) Text file of URLs
2) RSS/Atom feeds
3) Sitemap protocol
4) OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
* An HTML sitemap is not the same as a Google Sitemap

Simple Rules
– always contain the full URL
– remove unnecessary parameters, like session IDs
– place sitemap in highest directory of URL you’re submitting
– domain needs to match path you’re submitting (www vs. non-www, etc.)
– name sitemap whatever you want
– URLs must use UTF-8 encoding
– sitemaps max = 50,000 URLs or 10 MB; index files max = 1,000 sitemaps
– use GZIP to compress sitemaps
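
A minimal Python sketch of the size rules above: split a long URL list into sitemaps of at most 50,000 URLs, gzip each one, and list the files in an index. The function names are my own, the namespace is the one from the shared sitemaps.org protocol, and treating the 10 MB cap as an uncompressed size is an assumption on my part, not something stated in the session.

```python
import gzip
from xml.sax.saxutils import escape

# Limits as quoted in the session notes.
MAX_URLS_PER_SITEMAP = 50_000
MAX_SITEMAPS_PER_INDEX = 1_000
MAX_BYTES = 10 * 1024 * 1024  # assumption: the 10 MB cap applies before compression

# Namespace of the shared sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap(urls):
    """Return one chunk of URLs as gzip-compressed sitemap XML."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<urlset xmlns="{SITEMAP_NS}">']
    lines += [f"  <url><loc>{escape(url)}</loc></url>" for url in urls]
    lines.append("</urlset>")
    raw = "\n".join(lines).encode("utf-8")
    if len(raw) > MAX_BYTES:
        raise ValueError("sitemap is larger than 10 MB; use smaller chunks")
    return gzip.compress(raw)


def build_index(sitemap_urls):
    """Return sitemap index XML that lists every sitemap file."""
    if len(sitemap_urls) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("an index file may list at most 1,000 sitemaps")
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<sitemapindex xmlns="{SITEMAP_NS}">']
    lines += [f"  <sitemap><loc>{escape(url)}</loc></sitemap>"
              for url in sitemap_urls]
    lines.append("</sitemapindex>")
    return "\n".join(lines)
```

Each result of build_sitemap() would be saved under whatever name you like (say, sitemap1.xml.gz, a made-up name) in the top directory of the site, and those file URLs passed to build_index().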

Text File format
– one URL per line
– max of 50k URLs
– text file should contain nothing but list of URLs
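
The text-file rules are easy to check before submitting. A rough Python sketch, assuming the file has already been read into a string; check_text_sitemap is a name I made up, and the "full URL" test (http or https scheme plus a host) is my own reading of the rule.

```python
from urllib.parse import urlsplit

MAX_URLS = 50_000  # limit quoted in the session


def check_text_sitemap(text):
    """Return a list of problems found in the body of a text-file sitemap."""
    problems = []
    lines = text.splitlines()
    if len(lines) > MAX_URLS:
        problems.append(f"{len(lines)} lines is over the {MAX_URLS} URL limit")
    for number, line in enumerate(lines, start=1):
        parts = urlsplit(line.strip())
        # Anything that is not a full URL (blank lines, comments,
        # relative paths) is flagged, since the file should hold only URLs.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"line {number}: not a full URL: {line!r}")
    return problems
```

An empty list back means the file passed these particular checks; it says nothing about whether the URLs actually resolve.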

Syndication feed
– Google accepts RSS 2.0 or Atom 0.3 feeds that use field
(too fast)

XML format
(gives examples of various XML tags and what they mean)
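
I didn't capture the examples from the slides, so here is a short Python sketch of my own that builds one entry using the tags defined in the sitemaps.org protocol (loc, lastmod, changefreq, priority). The helper names and the sample values are hypothetical, not Amanda's.

```python
import xml.etree.ElementTree as ET

# Namespace of the shared sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Build one <url> element; only <loc> is required by the protocol."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc                    # full URL of the page
    if lastmod is not None:
        ET.SubElement(url, "lastmod").text = lastmod        # date of last change
    if changefreq is not None:
        ET.SubElement(url, "changefreq").text = changefreq  # e.g. "daily"
    if priority is not None:
        ET.SubElement(url, "priority").text = f"{priority:.1f}"  # 0.0 to 1.0
    return url


def build_urlset(entries):
    """Wrap <url> elements in a <urlset> and return the XML as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    urlset.extend(entries)
    return ET.tostring(urlset, encoding="unicode")


print(build_urlset([
    url_entry("http://www.example.com/", lastmod="2006-12-05",
              changefreq="daily", priority=1.0),
]))
```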

Sitemap Generators
– Google offers its own, and there are 3rd party generators, too (not endorsed by Google)

– use Google submission form
– Google tries to accept new sitemaps within half an hour (dashboard will say “okay,” or “errors” if problems are found)
– adding a Sitemap is optional; can still use Webmaster Central without having a sitemap

More info:

Amit Kumar, Yahoo – manager, Site Explorer

Site Explorer: browse your pages and links; site authentication; bulk submission; new features coming soon

Site Explorer (SE) interface
(shows screenshots of submit URL page and dashboard)
– submit page doesn’t require login
– authentication process slightly different from Google: download file, upload to your site, click button in SE
– SE shows pages, inlinks, subdomains

Eric Papczun, Performics

* biggest challenge for large sites is getting a complete and accurate list of URLs
* use due diligence to make sure duplicate pages aren’t listed
* after submission, sitemaps picked up within 1-2 days; entire sitemap crawled within 3-14 days (avg. about a week); small sites and new sites will take longer to get sitemap crawled — submit new content regularly
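
One way to approach the duplicate-page check Eric describes, sketched in Python: strip session parameters from each URL, then keep only the first copy of each result. The list of parameter names is a guess on my part (adjust it for your own site), and the function names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of session parameters; every site uses its own names.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def canonicalize(url):
    """Drop session parameters and fragments, and lowercase the host name."""
    parts = urlsplit(url)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc.lower(),
                       parts.path or "/", urlencode(query), ""))


def dedupe(urls):
    """Return canonical URLs in their original order, without repeats."""
    seen = set()
    unique = []
    for url in urls:
        canonical = canonicalize(url)
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique
```

This only catches duplicates that differ by session ID, fragment, or host capitalization. Pages that repeat the same content under genuinely different URLs (print-friendly versions, for example) still need a manual look.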

Sitemap Mgmt. Tips
– have optimized native sitemap (HTML version)
– focus the crawler by excluding redundant content (e.g., print-friendly pages), disembodied content (e.g., Flash objects), spammy stuff
– use “preferred domain” to tell Google if you want the www or the non-www version of your domain to appear in SERPs
– use separate sitemaps (in Google) for news and mobile content

Impact / What to Expect
– sitemaps are tools, not solutions
– we’ve seen two effects: 1) number of pages indexed goes up (example: retailer with ugly URLs went from 61k to 133k URLs); 2) number of pages indexed goes down (eliminating dupe content URLs)

* select URLs for more frequent crawling with “priority” XML tag
* use it to spotlight frequently updated pages, new pages
* we’ve found Google is responsive to this tag
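
To make the “priority” idea concrete, here is a purely hypothetical Python heuristic: new pages get the top value, recently updated pages get more than stale ones. The cutoffs and values are mine, not Eric's. The only facts taken from the protocol are the valid range (0.0 to 1.0) and the default (0.5).

```python
from datetime import date


def priority_for(last_modified, today, is_new=False):
    """Hypothetical heuristic mapping page freshness to a priority value."""
    if is_new:
        return 1.0                    # spotlight brand-new pages
    age_days = (today - last_modified).days
    if age_days <= 7:
        return 0.8                    # updated within the last week
    if age_days <= 30:
        return 0.5                    # the protocol's default value
    return 0.3                        # stale pages


print(priority_for(date(2006, 12, 4), today=date(2006, 12, 5)))
```

The value would go in the priority tag of each URL's sitemap entry.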

Handling Errors
– 404 errors will probably be most frequent error
– might be from typos in URL, server issues, etc.
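
Since typos are a likely cause, it may be worth checking the URL list yourself before submitting it. A small Python sketch using only the standard library; fetch_status and find_broken are names I made up, and the status lookup can be swapped out so the logic can be tried without touching a live site.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for `url`, using a HEAD request.

    Redirects are followed, so a redirected URL reports the final status.
    Network failures (bad host name, timeout) are not handled here and
    will raise urllib.error.URLError.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 404, 500, and so on


def find_broken(urls, get_status=fetch_status):
    """Return (url, status) pairs for every URL that does not answer 200."""
    broken = []
    for url in urls:
        status = get_status(url)
        if status != 200:
            broken.append((url, status))
    return broken
```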

* if you don’t have a robots.txt file, Google assumes you’re okay with full crawl of your site
* on large, dynamic sites we typically see about 5% of site crawled per visit
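
Python's standard robots.txt parser treats an empty file the same way the first point describes: with no rules, everything is allowed. That is the parser's behavior, shown here as an illustration and not as a statement about Google's crawler; the sample rule and URLs are made up.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


page = "http://www.example.com/print/page.html"

# No rules at all: the parser allows a full crawl.
print(allowed([], page))

# One rule blocking the print-friendly pages.
print(allowed(["User-agent: *", "Disallow: /print/"], page))
```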

Todd Friesen, Range Online Media

Bulk submit via feeds

Paid Inclusion Feeds
– Yahoo Search Submit
— publish, include, and refresh content without relying on spiders
— refresh natural results within 48 hours
— provide relevant and targeted copy
— quickly update listings to reflect sales/promotions
— detailed tracking

Comparison Shopping Engine Feeds
– comparison engines convert, esp. MSN Shopping — if you’re not using MSN Shopping, you’re missing out

– Google Base
— free, and it converts
— ranks by relevance over price
— limited user support
— lots of competition

– MSN Shopping
— reasonable CPCs
— best grouping algo in the Big 3
— lower volume than Base
— great conversion

– Yahoo Shopping
— highest volume, but most expensive
— send the most traffic
— user reviews somewhat outdated
— poor product grouping
— high CPCs

(shares two case studies)

* paid inclusion can be used for immediate results or to pick up pages where control over on-site content is limited
* results in days, not months
* doesn’t tie up client IT resources
* makes A/B testing in natural SERPs possible


Todd: always put fresh content on the home page, because the home page is always the page that gets crawled the most; update and resubmit Sitemaps regularly

Eric: (question re: news story feeds) — also supplement sitemaps with PPC

Todd: (question re: aging delay and new sites) — yes, use sitemaps if you’re a new site; get trusted links like Yahoo Directory, BOTW; buy an old site with trusted links and 301 the whole thing to your site to get credit for links

