
Optimizing PDFs for SEO

A couple of years ago, while I was working at OWT, one of our clients was launching a new product, but they weren’t the only company doing so. One of their main competitors was launching the same product, sourced from the same manufacturer. The product didn’t have much search history, and there wasn’t much competition for the relevant keywords.

Still, we screwed up. We took a very traditional approach to SEOing the new product: build out some great content on the web site, go after some links, do some PR, etc. The competition took a non-traditional approach: they slapped together a PDF with a couple of pages of text about the product, uploaded it to their web site, and linked to it from their home page. In no time flat, that PDF had the No. 1 ranking in both Google and Yahoo! for the relevant keywords. What’s worse, we had a heck of a time getting the great content we developed to outrank it. Ultimately, we followed the “if you can’t beat ’em, join ’em” theory and produced our own PDF, which immediately started battling the competitor’s PDF for search engine supremacy until our great content and links eventually caught up and won the battle.

With so many businesses — especially retailers — having access to PDFs full of product information, here are some thoughts on optimizing PDFs for search engine visibility.

1. All three major engines can crawl and index text-based PDFs. If you need proof, just run a search on each engine with [pdf] in the query, for example [white paper pdf] on Google, Yahoo, and MSN.

2. PDF optimization is similar to optimization for a regular content page: make good use of keywords/phrases, write appropriate headlines and sub-headlines, and provide solid content that reads well to a human eye. If the PDF will include images, put a caption underneath each one, especially if the caption can include a targeted keyword/phrase. (Of course, don’t overdo it. Remember my mom’s advice about SEO.)

Proof: Using the search above, we find this PDF ranked prominently in all three engines. On page 9 of this PDF, there’s a bold content heading (the equivalent of an H2): Awareness and Usage of the XML Button. Let’s not use the exact text, but something close: Here are the SERPs for [xml button awareness]: Yahoo, Google, and MSN. In each case, you find the PDF ranked highly in the SERPs and that exact bold content heading showing prominently in the snippet.

3. The most important thing where PDFs and SEO are concerned is how the PDF is created. Don’t use Photoshop to make your PDF: when you do that, you’re typically saving one big image inside a PDF wrapper rather than real text, and the spiders cannot crawl or “read” the text in that image. The PDF should be created with a text-based program, like MS Word or Adobe PageMaker, so that the final product is text-based and can be crawled.
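One rough way to check whether a finished PDF is text-based is to look inside its content streams for text-drawing operators. The sketch below is only a heuristic under stated assumptions: the function name `looks_text_based` is made up for illustration, it only understands uncompressed or Flate-compressed streams, and it treats any `BT` followed by a `Tj` or `TJ` operator as evidence of real text. A proper PDF library will be more reliable.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of each PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# "BT" begins a text object; "Tj" and "TJ" are the operators that draw text.
TEXT_RE = re.compile(rb"\bBT\b.*?\b(?:Tj|TJ)\b", re.DOTALL)


def looks_text_based(pdf_bytes: bytes) -> bool:
    """Heuristic (assumption, not a full parser): True if any stream draws text."""
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # Most content streams are Flate (zlib) compressed.
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # Not zlib data; inspect the raw bytes instead.
        if TEXT_RE.search(data):
            return True
    return False
```

You could call it as `looks_text_based(open("product.pdf", "rb").read())`, where `product.pdf` is a placeholder file name. A result of `False` suggests the file is image-only, or that it uses a stream filter this sketch doesn’t handle.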

4. Your PDF can reside anywhere on your site, but the usual rule applies: spiders are less likely to crawl content that’s buried too deep. The safest thing to do is to put it as close to the root directory as possible.
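To make “close to the root” concrete, you can count how many directories sit between the domain and the file. This is a small sketch; the helper name `directory_depth` and the example.com URLs are placeholders, and it assumes the last path segment is the file itself.

```python
from urllib.parse import urlparse


def directory_depth(url: str) -> int:
    """Count the directories between the site root and the file in a URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # The final segment is assumed to be the file name, so it is not counted.
    return max(len(segments) - 1, 0)


print(directory_depth("http://www.example.com/product.pdf"))
print(directory_depth("http://www.example.com/docs/2006/product.pdf"))
```

A lower number means the PDF sits nearer the root; how deep is “too deep” is a judgment call the engines don’t publish.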

5. When publishing a PDF on your site, you should very visibly link to the PDF from the home page, or from some page that gets crawled regularly. You have to lead the crawler along so it finds the new content as quickly as possible. Don’t just post the PDF and then cross your fingers that it gets crawled. (See my old post, Training the Crawlers, for more.)
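If you want to confirm that a page really does link to your PDF, a quick check of its HTML will do. The sketch below uses Python’s standard `html.parser`; the class and function names and the example.com URL are illustrative assumptions, and it only looks at plain `<a href>` links, not links generated by scripts.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkCollector(HTMLParser):
    """Collects absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and urlparse(href).path.lower().endswith(".pdf"):
            # Resolve relative links against the page's own URL.
            self.pdf_links.append(urljoin(self.base_url, href))


def pdf_links_on_page(html: str, base_url: str) -> list:
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    return collector.pdf_links
```

Feed it the saved HTML of your home page and its URL; an empty list means the crawler has no plain link to follow from that page to any PDF.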

6. It’s probably a good idea to use a keyword when naming the file, such as keyword.pdf. I haven’t done any serious investigation into what impact this has, but it seems worth doing to be safe, in case there’s a little boost to be had.
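If you name a lot of files, a tiny helper can turn a keyword phrase into a consistent file name. This is a sketch only: the function name is hypothetical, and using lowercase words joined by hyphens is my assumption about a sensible convention, not something I’ve tested for ranking impact.

```python
import re


def keyword_pdf_filename(phrase: str) -> str:
    """Turn a keyword phrase into a lowercase, hyphen-separated PDF file name."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", phrase.lower()).strip("-")
    if not slug:
        raise ValueError("phrase contains no letters or digits")
    return f"{slug}.pdf"


print(keyword_pdf_filename("XML Button Awareness"))  # xml-button-awareness.pdf
```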

So that’s my quick and dirty overview of PDF optimization for SEO. What do you do with PDFs, if anything?


4 Comments

  1. By gradiva on Nov 26, 2006

    Hi Matt - thanks for the interesting post! I definitely think that PDF listings in the search engine results can be about as ugly as it gets.

    From what I can tell, Google (the only engine I researched) may grab a page title from any of the following:

    - document metadata (title)
    - the first line of text
    - the file name
    - text from within the document that is formatted in a larger font

    I never saw an example of a document that *did* have a metadata title defined in which the metadata title was not used. (In other words, as far as I can tell, Google will always take the metadata title first, before the other options.) (My research is about 6 months old, so of course things may have changed.)

    Your readers might want to know how easy it is to define a document metadata title: just select File > Properties or File > Document Properties.

    Best wishes,

    Gradiva Couzin
    http://www.yourseoplan.com/

  2. By Matt McGee on Nov 27, 2006

    Great reply and information, thanks Gradiva. Much appreciated. I haven’t done much study of how and where that metadata title property gets used, so it’s good to see what you’ve discovered. If you uncover more tricks/secrets about PDF SEO, please let me know. :-)

  3. By technomatters on Feb 21, 2007

    I don’t think making PDFs for SEO is worthwhile, because many website CMSs and directory software packages offer URL rewriting.

  4. By Matt McGee on Feb 22, 2007

    That’s true, but this isn’t about repurposing a web page in PDF form. It’s about taking PDF material and making it more search engine friendly. Companies generate a lot of PDFs, so if they’re being posted to the web, they should be optimized!
