In my opinion, the nuts and bolts of SEO generally boil down to three primary parts: Crawlability, Content, and Links. These three make up the middle row of the SEO Success Pyramid, and they’re an absolute must as you work your way up the pyramid to becoming a trusted site.
Search engine spiders/bots aren’t all that intelligent. If a spider can’t find content (because of a broken link, for example), it’s not programmed to stop what it’s doing and go looking around for that great article you wrote. It’s going to move on to the next link and keep crawling, crawling, and crawling. That’s what it does.
It’s common sense: If a spider can’t access your content, your content won’t be indexed and will never be found in search engines. That’s why crawlability is a foundational element of SEO and the SEO Success Pyramid.
5 Common Crawlability Barriers
1.) You screwed up the robots.txt file.
If you’re like me, you roll your eyes every time you hear or read someone talking about this, right? I mean, really, who still screws up their robots.txt file? Search marketers and others have been banging this drum so long, you’d think it doesn’t need to be said anymore.
Well, well, well … have a look at what I found last week on Yahoo Answers:
Apparently, we do still need to bang the drum: Be careful with your robots.txt file. It’s the first thing to check when you suspect crawlability issues. You can learn everything you need to know at www.robotstxt.org.
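If you want a quick sanity check, here is a minimal sketch using Python's standard urllib.robotparser module. It parses robots.txt text and reports which URLs a spider would be told to stay away from. The function name blocked_urls, the example.com URLs, and both sample files are my own illustrations, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that this robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical file with the classic mistake: "Disallow: /" blocks everything.
BROKEN = "User-agent: *\nDisallow: /\n"

# Hypothetical fix: only one private directory is off limits.
FIXED = "User-agent: *\nDisallow: /private/\n"

# Made-up pages used only for this illustration.
PAGES = [
    "http://example.com/",
    "http://example.com/articles/seo.html",
    "http://example.com/private/notes.html",
]

if __name__ == "__main__":
    print("Blocked by broken file:", blocked_urls(BROKEN, PAGES))
    print("Blocked by fixed file:", blocked_urls(FIXED, PAGES))
```

With the broken file, every page in the list comes back as blocked. With the fixed one, only the page under /private/ does. The standard-library parser may not match every search engine's behavior exactly, so treat this as a first check rather than the final word.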
2.) Too many variables/parameters in your URLs
Search engines are getting better at crawling long, ugly links — but they still don’t like them. Google’s webmaster guidelines explain it in plain English:
If you decide to use dynamic pages (i.e., the URL contains a “?” character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.
(Bonus: Short URLs also get clicked on more often in the SERPs. They’re good for crawlability and clickability.)
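To see how your own URLs stack up, a rough sketch like the one below counts the query parameters in each URL and flags the long ones. The max_params cutoff of 2 is purely my assumption for illustration (Google's guideline gives no number), and the function names and sample URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def count_query_params(url):
    """Count the parameters after the "?" in a URL."""
    query = urlsplit(url).query
    # keep_blank_values=True so that "?debug=" still counts as a parameter.
    return len(parse_qsl(query, keep_blank_values=True))


def flag_long_urls(urls, max_params=2):
    """Return URLs with more than max_params parameters (arbitrary cutoff)."""
    return [url for url in urls if count_query_params(url) > max_params]


if __name__ == "__main__":
    # Made-up URLs for illustration only.
    sample = [
        "http://example.com/articles/seo-basics",
        "http://example.com/page?id=1&cat=2&sort=asc&ref=home",
    ]
    for url in sample:
        print(count_query_params(url), url)
    print("Flagged:", flag_long_urls(sample))
```

Run it against an export of your site's URLs and the worst offenders should surface quickly.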
3.) Session IDs in your URLs
Search engine spiders flat-out do not like session IDs in URLs. If your site uses session IDs, store them in cookies (which spiders don’t accept) rather than putting them in your URLs. A session ID makes a single page of content visible at multiple URLs, and indexing all of those duplicates would just clog up the SERPs. That’s why search engines don’t like to crawl URLs that contain them.
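The duplicate-URL problem is easy to demonstrate. The sketch below strips session ID parameters from a query string, which shows two "different" URLs collapsing into the same page. The parameter names in SESSION_PARAMS are assumptions based on common conventions, so adjust them for your platform. It also only looks at the query string, not at IDs embedded in the path.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names; your platform may use different ones.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def strip_session_id(url):
    """Return the URL with any session ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    # Two made-up visits to the same article, each with its own session ID.
    visit_a = "http://example.com/article?id=7&PHPSESSID=abc123"
    visit_b = "http://example.com/article?id=7&PHPSESSID=xyz789"
    print(strip_session_id(visit_a))
    print(strip_session_id(visit_a) == strip_session_id(visit_b))
```

To a spider, the two original URLs look like two separate pages. Once the session ID is out of the URL, there is only one.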
4.) Your site suffers from code bloat.
Code bloat is one of those things that isn’t really a problem … until it’s a Big Problem. Spiders are generally good at separating code from content, but that doesn’t mean you should make it more difficult by having so much code that the content is hard to find. If you look at the source code of your web pages and finding the content is like looking for the proverbial needle in a haystack, you may have crawlability problems. As Stoney deGeyter recently said on Search Engine Guide, “I do believe that if you have so much code on your pages that it makes it hard to dig out the content, then you might have some issues.” I agree.
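If eyeballing the source feels too subjective, one rough way to put a number on it is to compare the amount of visible text to the total size of the page. The sketch below does that with Python's standard html.parser module. The TextExtractor class and content_ratio function are my own names, and the ratio is only a crude signal with no particular cutoff implied.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect page text, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def content_ratio(html):
    """Return visible-text characters divided by total source characters."""
    if not html:
        return 0.0
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse runs of whitespace so indentation does not count as content.
    text = " ".join("".join(extractor.chunks).split())
    return len(text) / len(html)


if __name__ == "__main__":
    # Two made-up pages with the same content; one carries a pile of inline CSS.
    lean = "<html><body><p>Hello spiders</p></body></html>"
    bloated = (
        "<html><head><style>" + "p{margin:0}" * 50 + "</style></head>"
        "<body><p>Hello spiders</p></body></html>"
    )
    print(round(content_ratio(lean), 3), round(content_ratio(bloated), 3))
```

The same sentence of content scores far lower on the bloated page, which is the haystack problem expressed as a number.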
5.) Your navigation and internal linking are coded poorly.
Crawlability is often overlooked in the name of creativity and coding, but it’s as important to your SEO efforts as content development, link building, and any other element of the SEO Success Pyramid. Ignore it at your own risk.