What does Google do with all that click-thru data?
Have you ever thought about how Google might use all that clickstream data it has? For some reason, this question popped into my head on the flight home from SES San Jose last week.
Background: What Click-thru Data?
If you have a Google account, and if you have the Google toolbar installed (or if you’ve manually enabled the Web History tracking in your Google account), Google will keep track of every search you perform and every click you make in the SERPs. It’s kinda like this:
If you right-click, choose “Copy Link Location”, and see a really long, ugly URL like the second one above, Google is tracking what you’re doing. They know the URL you clicked, the position it was in on the SERPs, and who-knows-what-else. (I blurred some of the above image in case my personal ID or something is in there.) 🙂
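To make that a little more concrete, here's a rough sketch of what you could recover from one of those long, ugly redirect links. The parameter names ("url" for the destination, "cd" for the position) are my guesses from eyeballing those links; Google doesn't document them, so treat everything here as an assumption rather than a spec.

```python
# Hedged sketch: pulling the clicked URL and SERP position out of a Google
# redirect link. Parameter names ("url"/"q" and "cd") are assumptions based
# on what those links appear to contain, not documented behavior.
from urllib.parse import urlparse, parse_qs

def inspect_redirect(href):
    """Return the destination URL, the apparent SERP position, and any other params."""
    params = parse_qs(urlparse(href).query)
    clicked = params.get("url", params.get("q", [""]))[0]
    position = int(params["cd"][0]) if "cd" in params else None
    other = sorted(k for k in params if k not in ("url", "q", "cd"))
    return {"clicked_url": clicked, "position": position, "other_params": other}

# Hypothetical link copied via "Copy Link Location":
link = "http://www.google.com/url?sa=t&cd=3&url=http%3A%2F%2Fwww.example.com%2Fpage&ei=abc123"
print(inspect_redirect(link))
# {'clicked_url': 'http://www.example.com/page', 'position': 3, 'other_params': ['ei', 'sa']}
```

Whatever the exact parameters are, the point stands: the click itself, the destination, and the slot it occupied are all right there in the URL.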
Back to the Question
As I said, I just started thinking about this on the plane. What could Google possibly do with all this click-thru data, other than store it in my personal web history for my own convenience? It was a short flight, so I only came up with two possibilities:
1.) Use it to influence rankings.
Clicks can be gamed. A programmer can build a spider that runs a bunch of searches and then clicks on certain listings. That makes the value of click-thru data questionable. So, whenever this subject comes up at search conferences, Google reps (and reps from other engines) always say they’re very hesitant to put too much faith in clicks as a factor in the rankings algorithm.
On the other hand, isn’t it reasonable to assume that some amount of click analysis is included in the algorithm? And isn’t it also reasonable to assume Google is able to filter out some of the automated clicks on the organic side? (They do that on the PPC side to fight click-fraud, don’t they?)
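Here's the kind of filtering I imagine, sketched as a toy. To be clear, this is pure speculation on my part, not Google's actual method; the thresholds and signals are made up just to show that weeding out robotic click patterns isn't rocket science.

```python
# Toy automated-click filter (speculative, illustrative only).
# Idea: discard click sources that click implausibly often, or at
# suspiciously regular intervals, within a given time window.
from collections import defaultdict
from statistics import pstdev

MAX_CLICKS_IN_WINDOW = 60   # made-up cap; more than this in one window looks automated
MIN_INTERVAL_SPREAD = 2.0   # seconds; bots tend to click at suspiciously regular intervals

def filter_clicks(clicks):
    """clicks: dicts like {"source": ip_or_account_id, "ts": unix_seconds, "url": ...},
    all from one time window (say, an hour)."""
    by_source = defaultdict(list)
    for c in clicks:
        by_source[c["source"]].append(c)

    kept = []
    for source, events in by_source.items():
        events.sort(key=lambda c: c["ts"])
        gaps = [b["ts"] - a["ts"] for a, b in zip(events, events[1:])]
        too_many = len(events) > MAX_CLICKS_IN_WINDOW
        too_regular = len(gaps) >= 5 and pstdev(gaps) < MIN_INTERVAL_SPREAD
        if not (too_many or too_regular):
            kept.extend(events)   # these clicks get counted; the rest are discarded
    return kept
```

If Google can do something far more sophisticated than this on the PPC side, it seems safe to assume they can do at least this much on the organic side.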
2.) Use it to influence/determine crawl and indexing depth.
There are several factors that determine how deeply a site will be crawled. Aside from crawlability, I’m talking about things like the quantity and quality of incoming links. These influence PageRank, and PageRank is a strong indicator of how deeply and how often pages/sites will be crawled.
At the same time, couldn’t click-thru data be used to influence how deeply a site is crawled and the number of pages that get indexed? Think of it this way: if you have 100 pages on your site, and all 100 are in the index, but only 75 generate clicks on a regular basis, isn’t it reasonable to assume Google would drop the 25 pages that no one ever clicks on? (Let’s assume these pages also don’t have many inbound links or other signs of trust/authority/quality.)
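That thought experiment is simple enough to write down as a toy rule. Everything below is hypothetical, the rule, the field names, the thresholds; it's just how I imagine click data could feed indexing depth, not how Google actually does it.

```python
# Hypothetical indexing-depth rule for the 100-page thought experiment.
# A page stays in the index if it earns clicks OR shows other signs of trust
# (here, inbound links). Thresholds are made up for illustration.
def pages_to_keep(pages, min_clicks=1, min_inbound_links=3):
    """pages: dicts like {"url": ..., "clicks_90d": int, "inbound_links": int}"""
    keep, drop = [], []
    for p in pages:
        if p["clicks_90d"] >= min_clicks or p["inbound_links"] >= min_inbound_links:
            keep.append(p["url"])
        else:
            drop.append(p["url"])
    return keep, drop

# 75 pages that get clicked, 25 that never do and have no links:
site = (
    [{"url": f"/page-{i}", "clicks_90d": 10, "inbound_links": 5} for i in range(75)]
    + [{"url": f"/stale-{i}", "clicks_90d": 0, "inbound_links": 0} for i in range(25)]
)
keep, drop = pages_to_keep(site)
print(len(keep), len(drop))  # 75 25
```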
My gut tells me click-thru data has a very slight influence on rankings and indexing depth. But I’d love to hear your thoughts: Agree or disagree with me? How else might all that click-thru data be used?
Comments are open, so fire away.
Matt,
I would have to go with the somewhat popular yet even more mythical “personalization” approach. Although I have heard a lot more bluster about this in recent years than actual proof, I would have to believe it is still important to Google. I would like to see results that are more relevant to me, but I also fear that if personalization of search were taken to its extreme, it might actually limit my ability to find something “out of the box.” A double-edged sword for sure, but in this world of people wanting to be shepherded like sheep to the most convenient answer for everything, I suspect enough people out there would enjoy the ability to think and experiment less if Google said it was OK.
For sure, Frank. They’ve said personalization involves adjusting the SERPs to match the content and types of content you’ve clicked on before. Good addition to the list. Thank you. 🙂
still not sure if you are talking about PPC or SEO…
if you are talking PPC.. no clue..
if you are talking about SEO.. i’m still trying to figure out if this is a serious post or linkbait..
if this is actually a serious post, you can start here..
http://en.wikipedia.org/wiki/Bounce_Rate
=)
Thanks, paisley, but what does bounce rate have to do with what we’re talking about? We’re talking about the data Google collects from activity on its SERPs, not your own site analytics.
And can you explain the difference between a “serious post” and linkbait for me, too?
Hmm, this is tricky – because relevance is not just pre-click but post-click – did the searcher find what he/she was looking for? How long did he/she stay on the site?
Well, “mission accomplished” depends on what the query is – maybe the searcher was looking for the answer to a question and a 10-second visit was all they needed.
Perhaps Google takes the most popular search terms and tests out click data on those first – like “credit cards” for example – are people looking for information or applications? Are they looking for comparison sites or specific credit card issuers’ sites? Often comparison sites outrank the banks in US search, but not in Canada, and you get UK results mixed into both .com and .ca versions of Google. And don’t forget wikipedia and blended search results.
It would make sense for Google to track which types of sites attract clicks – if nobody cares about credit card news or images, then it’s better for Google to show just pages. If nobody wants to know the history of credit cards through Wikipedia, then skew that one down a bit.
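As a rough sketch of what that tracking might look like: for a given query, tally what share of clicks goes to each type of result. The data, the categories, and the field names below are all invented for illustration – just the shape of the idea, not anything Google has confirmed.

```python
# Hypothetical aggregation: click share by result type for one query.
from collections import Counter

def click_share_by_type(click_log, query):
    """click_log: dicts like {"query": ..., "result_type": "web"|"news"|"image"|"wikipedia"}"""
    types = Counter(c["result_type"] for c in click_log if c["query"] == query)
    total = sum(types.values()) or 1
    return {t: round(n / total, 2) for t, n in types.most_common()}

log = [
    {"query": "credit cards", "result_type": "web"},
    {"query": "credit cards", "result_type": "web"},
    {"query": "credit cards", "result_type": "wikipedia"},
    {"query": "credit cards", "result_type": "news"},
]
print(click_share_by_type(log, "credit cards"))
# {'web': 0.5, 'wikipedia': 0.25, 'news': 0.25}
```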
I believe Google has the ability to do manual programming (thoughts?) on priority terms. But we love to speculate!
They may not use it to directly affect SERPS, but rather to look for interesting patterns. For example, perhaps users in one geographical area click a certain result more than others, or perhaps there’s patterns based on the time of day that the search takes place, etc.
I have used data harvesters many times, and although you can target specific keywords, I haven’t seen anything yet that could simulate different searches made from different IPs, so I am sure Google could easily identify click fraud.
I think what Paisley was trying to illustrate (albeit poorly and rudely) is that bounce rate is a valid metric to consider here.
If the click-through rate on a position in the SERPs is high, but the user just bounces back to the SERPs to look for another result, then the SERPs didn’t do their job of providing the most relevant results. That particular result with the high bounce rate gets voted down, because it doesn’t engage the user or provide a valid answer to the search phrase.
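A back-of-the-envelope version of that signal might look like the sketch below: for each (query, result) pair, how often does the searcher bounce straight back to the results page? The field names and the 10-second cutoff are made up; it’s just to show the shape of the metric, not how Google computes anything.

```python
# Hypothetical "pogo-sticking" rate: share of clicks on a result that return
# to the SERP within a few seconds. Cutoff and field names are assumptions.
from collections import defaultdict

QUICK_RETURN_SECONDS = 10  # assumed threshold for "bounced right back"

def pogo_stick_rate(click_events):
    """click_events: dicts like {"query": ..., "url": ..., "returned_after": seconds or None}"""
    stats = defaultdict(lambda: {"clicks": 0, "quick_returns": 0})
    for e in click_events:
        key = (e["query"], e["url"])
        stats[key]["clicks"] += 1
        if e["returned_after"] is not None and e["returned_after"] < QUICK_RETURN_SECONDS:
            stats[key]["quick_returns"] += 1
    return {k: v["quick_returns"] / v["clicks"] for k, v in stats.items()}
```

Of course, as Linda points out above, a quick return isn’t always a bad sign – some queries are answered in ten seconds – so any real version of this would have to account for the query type.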
@Linda — great point about certain queries leading to certain types of behavior, sometimes including only spending a few seconds on the clicked-to site. They have so much data, and so many smart mathematicians / scientists / engineers, I would think there are already IF/THENs built into their data analysis. IF it’s this kind of query, THEN the click-thru might lead to a short visit, etc. I’m sure that’s part of any analysis they’re doing, don’t you think?
@adw – good call on the geographic nature and even time-based nature of click-thru behavior.
@CMG – thanks. And so they could do that on both the paid results and the organic results, yes? I would think so.
@Conrad – understood, thank you. And what Linda was talking about would certainly factor into that. Some searches don’t require lengthy stays on the site you click through to visit.
Great comments, everyone – thank you so much. Good stuff to think about.
Hello,
I am a layperson, so I would just like a response that is not too technical.
I am updating my site, so I check the SERPs for my keywords on a daily basis, and I frequently go to the various pages on my website. Between these two activities, we are talking about maybe 50 clicks per day related to my website, http://www.creativecounselors.com. I like the Google Toolbar (primarily to have access to my Google Bookmarks). However, I will get rid of it if it is potentially detrimental to my rankings, that is, if Google is tracking me and thinking that I am artificially trying to inflate my position in the SERPs. Should I get rid of the Google Toolbar?
Thanks,
Garrett