A couple of years ago, while I was working at OWT, one of our clients was launching a new product, but they weren’t the only company doing so. One of their main competitors was launching the same product, sourced from the same manufacturer. The product didn’t have much search history, and there wasn’t much competition for the relevant keywords.
Still, we screwed up. We took a very traditional approach to SEOing the new product: build out some great content on the web site, go after some links, do some PR, etc. The competition took a non-traditional approach: They slapped together a PDF with a couple pages of text content about the product, uploaded it to their web site, linked to it from their home page, and in no time flat that PDF file had the No. 1 ranking in both Google and Yahoo! for the relevant keywords. What’s worse, we had a heck of a time getting the great content we developed to outrank the PDF file. Ultimately, we followed the “if you can’t beat ’em, join ’em” theory and produced our own PDF, which immediately started battling the competitor’s PDF for search engine supremacy until our great content and links eventually caught up and won the battle.
With so many businesses — especially retailers — having access to PDFs full of product information, here are some thoughts on optimizing PDFs for search engine visibility.
1. All three major engines can crawl and index text-based PDFs. If you need proof, just run a search on each engine with [pdf] in the query: Google: [white paper pdf], Yahoo: [white paper pdf], MSN: [white paper pdf].
2. PDF optimization is similar to optimization for a regular content page. The same basics apply: good use of keywords/phrases, appropriate headlines and sub-headlines, solid content that reads well to a human eye, etc. If the PDF will include images, a caption underneath each image would be a good idea, especially if the caption includes a targeted keyword/phrase. (Of course, don’t overdo it. Remember my mom’s advice about SEO.)
Proof: Using the search above, we find this PDF ranked prominently in all three engines. On page 9 of this PDF, there’s a bold content heading (the equivalent of an H2): Awareness and Usage of the XML Button. Let’s not use the exact text, but something close: Here are the SERPs for [xml button awareness]: Yahoo, Google, and MSN. In each case, you find the PDF ranked highly in the SERPs and that exact bold content heading showing prominently in the snippet.
3. The most important thing where PDFs and SEO are concerned is how the PDF is created. Don’t use Photoshop to make your PDF, because when you do that, you’re typically making one big image wrapped in a PDF file, not a text-based PDF, and the spiders cannot crawl or “read” the text from that image. The PDF should be created with a text-based program, like MS Word or Adobe PageMaker, so that the final product is text-based and can be crawled.
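If you want a quick sanity check on whether a PDF is text-based or just an image, here is a rough sketch in Python. It only uses the standard library and is a heuristic, not a real PDF parser: it looks through the file's streams (inflating the Flate-compressed ones) for the PDF text operators `BT ... Tj/TJ ... ET`. The function name `has_text_layer` is my own invention, and a proper PDF library would be more reliable, especially for encrypted or unusually encoded files.

```python
import re
import zlib

# Heuristic patterns; this is a sketch, not a full PDF parser.
OBJECT_RE = re.compile(rb"\bobj\b(.*?)\bendobj\b", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A text block: BT (begin text) ... Tj or TJ (show text) ... ET (end text).
TEXT_RE = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b.*?\bET\b", re.DOTALL)


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Return True if any non-image stream appears to draw text."""
    for obj in OBJECT_RE.finditer(pdf_bytes):
        body = obj.group(1)
        stream = STREAM_RE.search(body)
        if stream is None:
            continue  # object has no stream (catalog, page dictionary, ...)
        header = body[:stream.start()]
        if b"/Image" in header:
            continue  # image data can contain anything; skip it
        data = stream.group(1)
        if b"/FlateDecode" in header:
            try:
                # decompressobj tolerates the newline left before "endstream"
                data = zlib.decompressobj().decompress(data)
            except zlib.error:
                continue  # unreadable stream; ignore it in this sketch
        if TEXT_RE.search(data):
            return True
    return False
```

Usage would be something like `has_text_layer(open("product.pdf", "rb").read())`, where `product.pdf` is a placeholder file name. If it comes back `False`, that is a hint (not proof) that the spiders may have nothing to read.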
4. Your PDF can reside anywhere on your site, but the usual rule still applies: spiders are less likely to crawl content that’s buried too deep. The safest thing to do is to put it as close to the root directory as possible.
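To put a number on "how deep," here is a small sketch that counts the directories between the site root and the file. The helper name `directory_depth` and the example.com URLs are made up for illustration, and it assumes the URL points at a file rather than a directory.

```python
from urllib.parse import urlparse


def directory_depth(url: str) -> int:
    """Count the directories between the site root and the file in the URL."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    # The last segment is the file itself, so it is not counted as a directory.
    return max(len(segments) - 1, 0)


print(directory_depth("https://example.com/product.pdf"))                  # 0
print(directory_depth("https://example.com/docs/2008/specs/product.pdf"))  # 3
```

There is no magic threshold here; the point is just that the first URL is the kind of location this tip recommends.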
5. When publishing a PDF on your site, you should very visibly link to the PDF from the home page, or from some page that gets crawled regularly. You have to lead the crawler along so it finds the new content as quickly as possible. Don’t just post the PDF and then cross your fingers that it gets crawled. (See my old post, Training the Crawlers for more.)
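If you want to double-check that a page really does link to your PDF, a sketch like the following lists every PDF link found in a page's HTML. It uses only Python's standard `html.parser`; the class and function names and the example.com address are hypothetical, and fetching the page is left out so the example stays self-contained.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and urlparse(href).path.lower().endswith(".pdf"):
            # Resolve relative links against the page's own address.
            self.pdf_links.append(urljoin(self.base_url, href))


def pdf_links_on_page(html: str, base_url: str) -> list:
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    return collector.pdf_links


home_page = '<a href="/widget.pdf">Widget spec sheet</a> <a href="/about">About</a>'
print(pdf_links_on_page(home_page, "https://example.com/"))
# ['https://example.com/widget.pdf']
```

An empty list for your home page means the crawler has no path from there to the PDF.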
6. It’s probably a good idea to use a keyword when naming the file, such as keyword.pdf. I haven’t done any serious investigation into what impact this has, but it seems worth doing to be safe, in case there’s a little boost to be had.
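For what it's worth, turning a target phrase into a file name is easy to script. This sketch lowercases the phrase and joins the words with hyphens; the hyphen separator is my assumption rather than something I've tested for ranking impact, and `keyword_filename` is just an illustrative name.

```python
import re


def keyword_filename(phrase: str) -> str:
    """Turn a keyword phrase into a lowercase, hyphen-separated PDF file name."""
    slug = re.sub(r"[^a-z0-9]+", "-", phrase.lower()).strip("-")
    if not slug:
        raise ValueError("phrase must contain at least one letter or digit")
    return f"{slug}.pdf"


print(keyword_filename("XML Button Awareness"))  # xml-button-awareness.pdf
```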
So that’s my quick and dirty overview on PDF optimization for SEO. What do you do with PDFs, if anything?