Have you ever thought about how Google might use all that clickstream data it has? For some reason, this question popped into my head on the flight home from SES San Jose last week.
Background: What Click-Thru Data?
If you have a Google account, and if you have the Google toolbar installed (or if you’ve manually enabled the Web History tracking in your Google account), Google will keep track of every search you perform and every click you make in the SERPs. It’s kinda like this:
If you right-click a result, choose “Copy Link Location”, and see a really long, ugly URL like the second one above, Google is tracking what you’re doing. They know the URL you clicked, its position in the SERPs, and who-knows-what-else. (I blurred some of the above image in case my personal ID or something is in there.) 🙂
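To make that a bit more concrete, here’s a rough Python sketch of how a redirect-style link like that could be picked apart. The example URL is made up, and the parameter names (`url` for the destination, `cd` for the position) are my assumptions for illustration only; the real links carry more parameters, and I’m not claiming to know what each one means.

```python
from urllib.parse import urlparse, parse_qs


def parse_tracked_link(href):
    """Return (destination, position) for a redirect-style link.

    Assumes (hypothetically) that the destination sits in a `url`
    query parameter and the result position in a `cd` parameter.
    Plain links come back unchanged with a position of None.
    """
    params = parse_qs(urlparse(href).query)
    if "url" not in params:
        return href, None  # ordinary link, nothing to unpack
    destination = params["url"][0]  # parse_qs already percent-decodes
    raw_position = params.get("cd", [""])[0]
    position = int(raw_position) if raw_position.isdigit() else None
    return destination, position


# Made-up example link, not copied from a real results page.
example = (
    "http://www.google.com/url?sa=t&cd=3"
    "&url=http%3A%2F%2Fwww.example.com%2Fpage"
)
print(parse_tracked_link(example))
print(parse_tracked_link("http://www.example.com/page"))
```

The point is just that the destination and the position can both ride along in the link itself, so a single click is enough to record both.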
Back to the Question
As I said, I just started thinking about this on the plane. What could Google possibly do with all this click-thru data, other than store it in my personal web history for my own convenience? It was a short flight, so I only came up with two possibilities:
1.) Use it to influence rankings.
Clicks can be gamed. A programmer can build a spider that runs a bunch of searches and then clicks on certain listings. That makes the value of click-thru data questionable. So, whenever this subject comes up at search conferences, Google reps (and reps from other engines) always say they’re very hesitant to put too much faith in clicks as a factor in the rankings algorithm.
On the other hand, isn’t it reasonable to assume that some amount of click analysis is included in the algorithm? And isn’t it also reasonable to assume Google is able to filter out some of the automated clicks on the organic side? (They do that on the PPC side to fight click-fraud, don’t they?)
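Just to show what I mean by filtering, here’s a toy Python sketch. It is pure guesswork on my part, not anything Google has described: it assumes each click is logged as a (client, URL) pair and simply throws out every click from any client that clicks more than a set number of times. The function names and the threshold are hypothetical.

```python
from collections import Counter


def filter_suspicious_clicks(clicks, max_clicks_per_client=3):
    """Drop every click from clients that exceed the threshold.

    `clicks` is a list of (client_id, url) pairs. The threshold is
    an arbitrary illustration, not a known real-world value.
    """
    per_client = Counter(client for client, _url in clicks)
    return [
        (client, url)
        for client, url in clicks
        if per_client[client] <= max_clicks_per_client
    ]


def clicks_per_url(clicks):
    """Count how many clicks each URL received."""
    return Counter(url for _client, url in clicks)


# Two ordinary searchers plus one "spider" hammering a single listing.
log = [("alice", "/good-page"), ("bob", "/good-page")]
log += [("spider", "/gamed-page")] * 10

print(clicks_per_url(log))
print(clicks_per_url(filter_suspicious_clicks(log)))
```

A real filter would obviously need to be a lot smarter than a simple count, but even this crude version takes the spider’s ten clicks out of the tally while leaving the two ordinary ones alone.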
2.) Use it to influence/determine crawl and indexing depth.
There are several factors that determine how deeply a site will be crawled. Aside from crawlability, I’m talking about things like the quantity and quality of incoming links. These are what influence PageRank, and PageRank is a strong indicator of how deeply and how often pages/sites will be crawled.
At the same time, couldn’t click-thru data be used to influence how deeply a site is crawled, and the number of pages that get indexed? Think of it this way: If you have 100 pages on your site, and all 100 are in the index, but only 75 generate clicks on a regular basis, isn’t it reasonable to assume Google would drop the 25 pages that no one ever clicks on? (Let’s assume these pages also don’t have many inbound links or other signs of trust/authority/quality.)
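Here’s that 100-page example as a quick Python sketch. Everything in it is hypothetical: the page data is invented, and the rule (no clicks and no inbound links means the page is a candidate to drop) is just my speculation written as code, not a known indexing policy.

```python
def pages_to_drop(pages, min_clicks=1, min_inbound_links=1):
    """Return URLs with too few clicks AND too few inbound links.

    `pages` maps a URL to a dict with "clicks" and "inbound_links".
    Both thresholds are made-up values for illustration.
    """
    return [
        url
        for url, stats in pages.items()
        if stats["clicks"] < min_clicks
        and stats["inbound_links"] < min_inbound_links
    ]


# Invented site: pages 1-75 get regular clicks, pages 76-100 get none,
# and none of the unclicked pages has any inbound links.
site = {
    f"/page-{i}": {
        "clicks": 20 if i <= 75 else 0,
        "inbound_links": 2 if i <= 75 else 0,
    }
    for i in range(1, 101)
}

candidates = pages_to_drop(site)
print(len(site), "pages indexed,", len(candidates), "candidates to drop")
```

Note that a page with zero clicks but a few decent inbound links would survive under this rule, which is why I added the assumption about links in the first place.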
My gut tells me click-thru data has a very slight influence on rankings and indexing depth. But I’d love to hear your thoughts: Agree or disagree with me? How else might all that click-thru data be used?
Comments are open, so fire away.