5 Common Crawlability Mistakes That Kill Your SEO Success

Filed in MY BEST POSTS, SEO by Matt McGee on June 30, 2008

In my opinion, the nuts-and-bolts of SEO can generally be boiled down to three primary parts: Crawlability, Content, and Links. These three things make up the middle row of the SEO Success Pyramid, and they’re an absolute must as you work your way up the pyramid to becoming a trusted site.

Search engine spiders/bots aren’t all that intelligent. If a spider can’t find content (because of a broken link, for example), it’s not programmed to stop what it’s doing and go looking around for that great article you wrote. It’s going to move on to the next link and keep crawling, crawling, and crawling. That’s what it does.

It’s common sense: If a spider can’t access your content, your content won’t be indexed and will never be found in search engines. That’s why crawlability is a foundational element of SEO and the SEO Success Pyramid.

5 Common Crawlability Barriers

1.) You screwed up the robots.txt file.

If you’re like me, you roll your eyes every time you hear or read someone talking about this, right? I mean, really, who still screws up their robots.txt file? Search marketers and others have been banging this drum so long, you’d think it doesn’t need to be said anymore.

Well, well, well … have a look at what I found last week on Yahoo Answers:

[Image: robots.txt question on Yahoo Answers]

Apparently, we do still need to bang the drum: Be careful with your robots.txt files. It’s the first thing to check when you think you have crawlability issues. You can learn everything you need to know at www.robotstxt.org.
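
If you want to check a robots.txt file before it goes live, here is a minimal sketch using Python's standard-library urllib.robotparser. The helper name is_crawlable, the two sample files, and the example.com URL are my own illustrations, not anything from the Yahoo Answers question. It shows how a single "/" after Disallow turns "crawl everything" into "crawl nothing."

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical files: the only difference is the slash after "Disallow:".
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

url = "http://www.example.com/great-article.html"
print(is_crawlable(BLOCK_ALL, url))  # False: every spider is shut out
print(is_crawlable(ALLOW_ALL, url))  # True: an empty Disallow blocks nothing
```

A check like this could run as part of a deployment, so a development robots.txt that blocks everything does not slip into production unnoticed.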

2.) Too many variables/parameters in your URLs

Search engines are getting better at crawling long, ugly links — but they still don’t like them. Google’s webmaster guidelines explain it in plain English:

If you decide to use dynamic pages (i.e., the URL contains a “?” character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.

(Bonus: Short URLs also get clicked on more often in the SERPs. They’re good for crawlability and clickability.)
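
To get a rough idea of which URLs on a site are parameter-heavy, something like the following sketch may help. It uses Python's standard urllib.parse module. The function names and the threshold of two parameters are my own assumptions; Google's guideline only says to keep the number "few" and does not give a figure.

```python
from urllib.parse import parse_qsl, urlsplit


def count_parameters(url: str) -> int:
    """Count the query-string parameters in a URL."""
    query = urlsplit(url).query
    # keep_blank_values=True so parameters like "?debug=" are still counted.
    return len(parse_qsl(query, keep_blank_values=True))


def has_too_many_parameters(url: str, max_params: int = 2) -> bool:
    """Flag URLs above an arbitrary, illustrative parameter limit."""
    return count_parameters(url) > max_params


# Hypothetical URLs for illustration only.
short_url = "http://www.example.com/product.php?id=7"
long_url = "http://www.example.com/product.php?cat=3&id=7&sort=price&ref=home"

print(count_parameters(short_url), has_too_many_parameters(short_url))
print(count_parameters(long_url), has_too_many_parameters(long_url))
```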

3.) Session IDs in your URLs

Search engine spiders flat-out do not like to see session IDs in your URLs. Session IDs can cause a single page of content to be visible at multiple URLs, and that would just clog up the SERPs with copies of the same page. If you’re using session IDs on your site, be sure to store them in cookies (which spiders don’t accept) instead of including them as part of your URLs.
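
Here is a small sketch of the duplicate-URL problem, again in Python with only the standard library. The parameter names in SESSION_PARAMS are common examples I picked, not an exhaustive list, and the URLs are made up. It only handles session IDs passed in the query string, not ones embedded in the path.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of session parameters; adjust for your own platform.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_id(url: str) -> str:
    """Return the URL with any known session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Two visitors, two session IDs, one page of content.
visit_a = "http://www.example.com/page.php?id=5&PHPSESSID=abc123"
visit_b = "http://www.example.com/page.php?id=5&PHPSESSID=xyz789"

print(strip_session_id(visit_a))
print(strip_session_id(visit_a) == strip_session_id(visit_b))
```

Both visits collapse to the same address once the session ID is gone, which is the single URL you want spiders to see.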

4.) Your site suffers from code bloat.

Code bloat is one of those things that isn’t really a problem … until it’s a Big Problem. Spiders are generally good at separating code from content, but that doesn’t mean you should make it more difficult by having so much code that the content is hard to find. If you look at the source code of your web pages, and finding the content is like looking for the proverbial needle-in-a-haystack, you may have crawlability problems. As Stoney deGeyter recently said on Search Engine Guide, “I do believe that if you have so much code on your pages that it makes it hard to dig out the content, then you might have some issues.” I agree.
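
One rough way to put a number on the needle-in-a-haystack feeling is to compare the amount of visible text with the total size of the page source. The sketch below does that with Python's built-in html.parser. The names TextExtractor and content_ratio are mine, the sample pages are invented, and the ratio is only a crude indicator. No search engine publishes a threshold for it.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text that appears outside of <script> and <style> blocks."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def content_ratio(html: str) -> float:
    """Return visible-text length divided by total source length."""
    if not html:
        return 0.0
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse runs of whitespace so indentation does not count as content.
    text = " ".join("".join(extractor.chunks).split())
    return len(text) / len(html)


# Hypothetical pages with identical content but different amounts of code.
body = "<body><p>Great article about crawlability.</p></body>"
lean_page = "<html>" + body + "</html>"
bloated_page = (
    "<html><head><style>" + "p{margin:0}" * 50 + "</style>"
    "<script>" + "var x=1;" * 50 + "</script></head>" + body + "</html>"
)

print(round(content_ratio(lean_page), 2))
print(round(content_ratio(bloated_page), 2))
```

Moving inline styles and scripts into external files is one way to push the ratio back up without touching the content.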

5.) Your navigation and internal linking is coded poorly.

Designers and developers can be pretty creative when building a web site. Sometimes that creativity comes out in the form of site navigation that’s built in complicated DHTML or JavaScript code. Sometimes that creativity comes out in the form of a Flash- or Ajax-based navigation, where what we think of as web pages aren’t really web pages at all. This kind of design and implementation can stop a crawler in its tracks. Google talked about crawlability problems with Flash, Ajax, and JavaScript in late 2007:

“One of the main issues with Ajax sites is that while Googlebot is great at following and understanding the structure of HTML links, it can have a difficult time finding its way around sites which use JavaScript for navigation. While we are working to better understand JavaScript, your best bet for creating a site that’s crawlable by Google and other search engines is to provide HTML links to your content.”
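
To see how much of your navigation depends on scripting, you could run your templates through something like this sketch. It uses Python's built-in html.parser to sort anchor tags into plain HTML links and links that only work through JavaScript. The class name LinkAudit, the classification rules, and the sample markup are my own assumptions, and this is not a description of how Googlebot actually decides what to follow.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Sort <a> tags into plain HTML links and script-dependent links."""

    def __init__(self):
        super().__init__()
        self.html_links = []
        self.script_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = (dict(attrs).get("href") or "").strip()
        # Treat empty, "#", and "javascript:" hrefs as script-dependent.
        if not href or href == "#" or href.lower().startswith("javascript:"):
            self.script_links.append(href)
        else:
            self.html_links.append(href)


# Hypothetical navigation markup for illustration.
nav = (
    '<a href="/about.html">About</a>'
    '<a href="javascript:goTo(\'products\')">Products</a>'
    '<a href="#" onclick="load(\'contact\')">Contact</a>'
)

audit = LinkAudit()
audit.feed(nav)
audit.close()

print(audit.html_links)         # links a spider can follow as plain HTML
print(len(audit.script_links))  # links that need JavaScript to work
```

If the second number is high, adding ordinary HTML links to the same destinations is the fix Google's quote recommends.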

Conclusion

Crawlability is often overlooked in the name of creativity and coding, but it’s as important to your SEO efforts as content development, link building, and any other element of the SEO Success Pyramid. Ignore it at your own risk.

Comments (20)

  1. Robert says:

    I agree, I can’t believe that people still get the robots.txt wrong! Then again I can… heck I often don’t even use a robots file.

    I would say that point number 5 is possibly the most important. We all know that links are valuable, but I wonder how many fail to realise just how valuable a well-structured internal linking system can be? Indeed, this could be all an internal page needs to gain a decent rank!

  2. great article matt. i think people would be surprised at how many companies fall short with crawlability, content, and links.

    what ends up happening most of the time with the robots.txt file is the dev guys end up migrating their code from development to production and forget to have some checks and balances in place (e.g. unit tests) to prevent these sort of things from happening.

  3. Josh says:

    Great article Matt!!! This should help a lot of website owners who are having issues with their SEO management…

  4. Thanks for sharing the robots.txt case study! OUCH

    Crawlability, Content, Links – Great reminder about how the center of your SEO Success Pyramid requires intense focus. What value!!

  5. Yossarian says:

    Even though I allow spiders all the time in the robots.txt I always need to look online just to confirm I have done it the right way round!

    I know one day I will get forgetful and add the / by mistake

  6. “Your navigation and internal linking is coded poorly.”

    I find this as usually the biggest issue/barrier in getting indexed.

  7. Bastian says:

    Especially with bigger clients these are in most cases the first problems to solve. It’s unbelievable what can be found on their servers 🙂

  8. GadgetsGuy says:

    very useful article, believe it or not .. sometimes i screw with robots.txt 😀

  9. Ken says:

    Great article. Great points. The one I see most common is code bloat. My recommendation to webmasters out there – take advantage of CSS!

  10. Whether you are designing your site or planning its SEO, it is really important that you make sure it is always user-friendly ^^

  11. Greg Silvano says:

    Great article, Matt. One problem is that web coders aren’t necessarily SEO experts and are working under little or no SEO guidance. A lot of issues don’t get discovered until much later – like when somebody notices the site can’t be found on Google.

    Some day this’ll all be second nature to the developers. But we’re not there yet.

  12. Great article, Matt! I am tempted to print it out and give it to all those clients that argue with me that what they currently have is just fine. You made fantastic points very fast and in easy-to-understand terms for businesses.

  13. Nice tips. But one must also make sure not to cram in too many similar keywords unnecessarily.

  14. Hastimal says:

    Hello Matt
    Great article Matt. As I am new to SEO, this will help me a lot with where to begin.

    These tips will help me while building new websites.

    Thanks 🙂

  15. Adrian Eden says:

    I’m a big fan of putting HTML page linkages in the footer of your websites using keyword rich descriptions.

  16. great article ^^

    i agree with robert. fifth point for me is the most important

  17. Pete Mason says:

    Point 5: “Your navigation and internal linking is coded poorly.” can be very pertinent for websites coded in .NET

    The standard link controls are javascript – and completely impenetrable to Google et al.

    A potential client’s site has exactly this problem. It’s one of the issues which can occur when an application programmer decides to build a public-facing website.