With 15 years of experience in the search industry (including several years as a key member of the Yahoo! search team and now as Senior Director of Product Search and Comparison Shopping for the shopping search engine Become.com), Jon Glick knows how search engine algorithms work. He speaks on the subject regularly at SES conferences, and was kind enough to answer a wide range of questions from me over the past two months after we met at SES San Jose. This is Part 2 of our 3-part interview.
(…continued from part one)
Matt: When you talked with Mike Grehan in 2004, you said: “…the issue of ‘I’m number one for this keyword’ may not exist at all in a few years. You know, you’ll be number one for that keyword depending on who types it in! And from where and on what day….” How much closer are we to that kind of scenario?
Jon: We are just starting to see this with the personalization that Google offers as an option. Search engines are going slow on this for a couple of reasons. First, there are privacy concerns. If search gets too ‘psychic,’ it gets spooky for some people as they realize just how much PII (personally identifiable information) these companies have about them. The recent AOL debacle over releasing query information highlighted this. The second reason is that there is more ROI in personalization technology that serves relevant ads than in developing better search results. I actually think that when personalized search becomes more widespread, it will be from the data and algos developed for contextual ad serving migrating to organic search. Of the various personalization technologies, local is the one most likely to impact organic search. The engines are already extracting local indicators such as area codes and ZIPs from web pages. Using the user’s ZIP code or the location of their IP address is generic enough not to be considered PII.
In that same interview, you gave Mike some great linkbait when you explained that Yahoo uses the “Keywords” meta tag for “matching, not ranking.” Has anything changed since then?
Not much has changed. The meta keywords tag is still so contaminated on many sites that search engines can’t use it to improve ranking, which is a shame. With tagging becoming such a phenomenon, search engines would love to use that input to improve results, but as soon as we do, tag spam will become an issue.
Reciprocal linking — good or bad?
Good in moderation. From a connectivity perspective, a link is a link, but the search engines are constantly improving their scanning for artificial link patterns. Truly popular sites may share some links, but it’s a fairly small percentage of their total inlinks. If a site gets 90% of its inlinks from reciprocal sources, it’s probably engaged in some artificial form of link creation/trading. The AIR™ algo that we developed here at Become.com looks at the page content on each side of the links to validate the quality. Reciprocal links from sites on the same topic are counted more strongly, while off-topic reciprocal links will actually hurt a site’s ranking. This makes sense. If your office supply site has tons of reciprocal links with poker sites, the links were probably not organic. I would assume that other search engines will also evolve to this type of approach. The big risk with doing lots of reciprocal linking is that once the spam cops find and ban one site that you have traded links with, it’s not hard for them to find your site as they kill off the rest of the “farm.”
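[Ed. note: To make the 90% rule of thumb concrete, here is a minimal Python sketch of how a reciprocal-link share could be computed from lists of inlinking and outlinked domains. The function name, the example domains, and the use of 90% as a hard cutoff are all assumptions for illustration; this is not how Become.com's AIR algo or any other engine is known to work.]

```python
def reciprocal_share(inlinks, outlinks):
    """Fraction of a site's inlinking domains that the site also links back to."""
    inlinks = set(inlinks)
    if not inlinks:
        return 0.0
    return len(inlinks & set(outlinks)) / len(inlinks)


# Hypothetical data: ten domains link in, and nine of them are linked back to.
inlinks = [f"site{i}.example" for i in range(10)]
outlinks = inlinks[:9] + ["unrelated.example"]

share = reciprocal_share(inlinks, outlinks)

# Illustrative threshold only, borrowed from the 90% figure in the answer above.
looks_artificial = share >= 0.9
```

[A real system would presumably treat this share as one signal among many rather than as a yes/no test.]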
What about link buying – have the engines figured out yet how to identify a link that’s been paid for vs. an editorial link?
If a site sells you a link, that alone is very hard to detect. However, sites that sell links tend to sell them to lots of sites, and that’s what makes detection much easier.
Every site only has a fixed amount of connectivity to impart to other sites. The more links they sell, the less each one is worth. People tend to look at the PR of the site, but not at how many links it is already giving. The double whammy is that the more links a site sells, the less you get for your money and the greater the risk that the search engines will disqualify links from that site. An interesting and innovative buying strategy was Rand Fishkin’s. He bought links from high PR sites that didn’t realize that they were selling links (Harvard.edu in his case). That way they were less likely to sell links to lots of competitors. It worked well for a while, but eventually Google caught on. [Ed. note: For background on this, see Search Engine Roundtable's recap of the SES 2006 New York panel, Search Algorithm Research and Patents.] You still see this on TV and radio station websites where they’ve sold static links as ad space and aren’t considering the value of the PR they provide, e.g. sites like http://www.kfan.com/main.html.
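[Ed. note: The dilution Jon describes can be sketched with the simplified formula from the original PageRank paper, in which a page's score is split evenly across its outgoing links. The function name and the example numbers are hypothetical, and real engines use far more elaborate connectivity measures.]

```python
def value_per_link(page_score, outlink_count, damping=0.85):
    """Score passed along each outgoing link in the simplified PageRank model.

    damping=0.85 is the commonly cited value from the original paper;
    page_score is whatever score the linking page itself holds.
    """
    if outlink_count <= 0:
        raise ValueError("a page must have at least one outlink to pass value")
    return damping * page_score / outlink_count


# Same hypothetical page score, but the second site has sold ten times as many links.
few_links = value_per_link(page_score=1.0, outlink_count=10)
many_links = value_per_link(page_score=1.0, outlink_count=100)
```

[Under this model, the buyer on the second site gets roughly a tenth of the value per link, before any risk of the links being discounted altogether.]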
You mentioned local search…. If I search for “seattle hotels” on Google and then on Google Local, the SERPs are different. This happens across the big three engines and their local versions. So the obvious question is: Why? How does a local search algo differ from a regular algo?
The general web search and the local search are searching different corpora and using very different algos. When “seattle hotels” is entered into the general search, it’s the pages with the most keyword hits, highest connectivity, etc. that rank well. This is just like any other general search query and the algo doesn’t have any semantic understanding that the search is local; the search is just keywords, and the pages are just keywords and links.
In contrast, the local search functions more like a yellow pages directory. It ignores “authority” from sources like link connectivity and instead relies on some basic structured data like the proximity of the hotel to Seattle’s city center. It’s searching a set of hotel listings as opposed to pages that have been crawled from the web.
What this points to is the weakness of the one-size-fits-all general algos. Google, Yahoo!, etc. know that the search is local (you can see that from the local data that they show above the first result), but they use the same algo as with any other search. Ideally, they should weight ranking factors differently for each type of query.
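[Ed. note: As a rough illustration of the yellow-pages-style ranking described above, this Python sketch sorts a set of structured hotel listings by great-circle distance from a city center, ignoring links entirely. The hotel names and coordinates are made up, the city-center coordinates are approximate, and using haversine distance as the only ranking factor is an assumption; actual local search engines are not documented here.]

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def rank_by_proximity(listings, center):
    """Order listings from nearest to farthest from the given (lat, lon) center."""
    return sorted(
        listings,
        key=lambda item: haversine_km(item["lat"], item["lon"], center[0], center[1]),
    )


SEATTLE_CENTER = (47.6062, -122.3321)  # approximate

# Hypothetical listings, as they might appear in a structured directory feed.
listings = [
    {"name": "Airport Inn (hypothetical)", "lat": 47.4502, "lon": -122.3088},
    {"name": "Downtown Hotel (hypothetical)", "lat": 47.6097, "lon": -122.3331},
    {"name": "Northgate Lodge (hypothetical)", "lat": 47.7060, "lon": -122.3250},
]

ranked = rank_by_proximity(listings, SEATTLE_CENTER)
```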
So, why don’t they? Would it be too slow to use some kind of query classification in the main algorithm?
Search engines are doing the query classification up front, so speed isn’t really the issue; I think it comes down to risk. Using that special vertical algo might yield great results, but if you get the intent wrong, the results are going to appear very broken. It’s easy and safe to show general results for all queries, and just put that vertical “shortcut” at the top of the results. That way, if it’s off topic, people just ignore it.
You also mentioned PageRank earlier. There’s still a good debate about its importance in Google’s algorithm. What would you say about its current value?
Link connectivity is still the core of Google’s algorithm. The connectivity systems used today are still based on the general PageRank principles, but have grown increasingly sophisticated over time. That’s probably why Google keeps telling people not to get too obsessed with the PR “fuelbar” on the Google toolbar. The one additional factor that has really come to the fore is targeted anchor text. It’s not just who links into you, but how they describe your site that counts these days if you’re trying to rank for particular keywords. Getting more of these types of links is the best thing you can do for your Google ranking.
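[Ed. note: For readers who have not seen the "general PageRank principles" spelled out, here is a minimal Python sketch of the published power-iteration idea: each page repeatedly passes a damped share of its score to the pages it links to. The tiny link graph is hypothetical, and this textbook version says nothing about anchor text or about how Google's production systems actually work.]

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page with no outlinks: spread its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links back to "a".
graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
best_page = max(ranks, key=ranks.get)
```

[In this toy graph the page with the most inlinks ends up with the highest score, and the single page it links to inherits most of that score.]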
Do the engines give blogs too much credit & importance?
They used to, but I think they are being a bit more rational now. When Google started indexing blogs they tended to recrawl them extremely aggressively and rank them highly. This naturally fed the explosive growth of blog spam, so all of the SEs have backed off. I think blogs are getting pretty fair treatment these days.
What do you think about all the article trading going on — people writing articles and getting them posted on a half-dozen other sites just to get the inbound link. Will we see algorithm tweaks targeting this “article spam”?
Personally, I’m OK with this. If someone generates content good enough that others want to post it, that’s far better qualification than just trading a link. Plus, it encourages people to generate new content. Right now search engines are focusing on identifying duplicated content at the page level, but as article trading increases they may have to get more granular and filter out articles that exist on multiple pages. What’s forcing this is the growth of the collage sites where spammers are “borrowing” articles to create the look of real content. SEs need to be able to better detect this because it’s running rampant right now and getting through the dup filters.
What’s the status of using click-thrus (in organic SERPs) as a ranking factor? Has it been used? Is it being used now?
Vertical search engines like comparison shopping sites and image searches make great use of CTR, but the major algorithmic search engines aren’t using these data for ranking. Believe me, we’d love to, but as soon as we do we’re inviting spammers to hit our sites with click-bots. Right now it’s just too easy to game.
CTR is used for testing algos against each other. If I’m thinking of pushing out a new algo I can try it with a small percentage of the site’s users and see what happens to CTR by position. The better algo will have more clicks and a greater share of the clicks on the top results.
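[Ed. note: The kind of comparison Jon describes might look something like the Python sketch below, which summarizes click logs for a control algo and a candidate algo. The log format (one clicked position per search, or None when nothing was clicked), the "top three" cutoff, and the sample data are all assumptions for illustration.]

```python
from collections import Counter


def summarize(clicked_positions):
    """Summarize one algo's log: each entry is the clicked rank (1-based) or None."""
    searches = len(clicked_positions)
    clicks = Counter(p for p in clicked_positions if p is not None)
    total_clicks = sum(clicks.values())
    top3_clicks = sum(clicks[p] for p in (1, 2, 3))
    return {
        "click_rate": total_clicks / searches if searches else 0.0,
        "top3_share": top3_clicks / total_clicks if total_clicks else 0.0,
        "ctr_by_position": {p: clicks[p] / searches for p in sorted(clicks)},
    }


# Hypothetical logs: the candidate algo is shown to a small slice of users.
control = summarize([1, 3, None, 5, None, 2, 7, None, 1, 4])
candidate = summarize([1, 1, 2, None, 1, 3, 2, None, 1, 4])

candidate_wins = (
    candidate["click_rate"] > control["click_rate"]
    and candidate["top3_share"] > control["top3_share"]
)
```

[With real traffic you would also want enough searches in each bucket for the difference to be statistically meaningful.]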
What about time spent on the clicked-on site? It seems difficult to me to measure this with any degree of trust and accuracy.
It’s unlikely that search engines are using this. As you pointed out, it is difficult to measure, and it would reward sites that had user-unfriendly practices like mousetrapping or back button disables. From my perspective, this is one example of where Google has a laundry list of possible approaches in a patent application, the vast majority of which they have no plans to use.
I recently noticed some jumbo-size listings in Yahoo’s SERPs. How does that happen? Are those sites part of the paid inclusion program, perhaps?
These were probably just a temporary hiccup in the snippet generation system in Yahoo!’s SERPs. It looks like they’ve cleared that up. This is actually much less likely to happen with the paid inclusion program since all the titles and descriptions are supplied in highly validated feeds. Yahoo!’s SSP (Search Submit Pro) program can be a great way to get high quality traffic (we use it here at Become.com), and they hold users to a high quality standard. The ALL CAPS title in that listing would get rejected if it were submitted to the SSP program.
I’m gonna steal a page out of ESPN’s book at this point. They bring their experts on and do a thing called “Fact or Fiction”, where the host makes a statement and the expert answers if it’s fact or fiction, and then usually gives a quick explanation. I thought we could do that for some SEO myths.
1) You can’t be hurt by sites that link to you.
Fact. They won’t hurt your ranking; otherwise I could just create a bunch of spammy domains, link to a site I didn’t like and bam! they’re penalized. I might even report my own spam to expedite the process. What having a bunch of spammy sites linking to you may do is draw scrutiny from the spam cops, but if your site’s following all the rules you’ll be fine.
2) It’s good to link out to authority sites in your industry.
“Faction”. It won’t hurt, though it doesn’t really help either. Authority linking conveys a minor benefit in Kleinberg-style algorithms like Ask.com’s, but has no real impact in the larger engines.
3) 250 words per page is optimal.
Fiction. Search engines factor in page length when computing keyword density, so there is no optimal length. A product page with 50 words might repeat “ipod” five times and be very readable, but if a 3,000 word essay has “ipod” 300 times that’s probably the result of keyword stuffing.
4) Directory listings are still a valuable SEO tactic.
Fact. Not only is a directory link good for PageRank, it also makes sure you’ll get crawled. Directory pages are commonly used as seed pages for most crawlers. Also, since these directories are human reviewed for acceptance they are less likely to accept blatant spam. In the case of Yahoo! Directory there is also a $300/yr. cost that tends to help keep low quality pages out.
5) Registering a domain for several years is a good SEO tactic.
Fact. There is a minor benefit to domains with longer registrations. It shows that the site is planning on being around a while, and makes it more costly for spammers to buy disposable domains. Just like when the IRS determines who to audit, each “flag” is worth a certain amount, and if you score too highly, boom – you’re audited. A single year registration is just one flag.
6) Private/hidden domain registration is a bad SEO tactic.
Fact. This is one more potential “flag” that can earn you points to a spam audit. It works well for spammers who are attempting to keep the engines from finding all their sites in one swoop. Those guys are in the business of disposable domains; if you’re not it’s better to avoid this tactic.
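[Ed. note: The "flags" idea from the last two answers can be pictured as a simple weighted score with a review threshold, as in this Python sketch. The flag names, weights, and threshold are entirely hypothetical; no engine has published how, or whether, it scores sites this way.]

```python
# Hypothetical weights: how many "points" each flag adds toward a spam review.
FLAG_WEIGHTS = {
    "one_year_registration": 1.0,
    "private_registration": 1.0,
    "mostly_reciprocal_inlinks": 2.0,
}

AUDIT_THRESHOLD = 3.0  # hypothetical cutoff


def audit_score(flags):
    """Sum the weights of the flags a site has tripped."""
    return sum(FLAG_WEIGHTS[flag] for flag in flags)


def needs_audit(flags):
    """A site is queued for review only when its combined score reaches the cutoff."""
    return audit_score(flags) >= AUDIT_THRESHOLD


single_flag = needs_audit({"one_year_registration"})
several_flags = needs_audit(
    {"one_year_registration", "private_registration", "mostly_reciprocal_inlinks"}
)
```

[The point of the sketch is the one Jon makes: a single flag on its own stays well under the threshold, and it is the combination that draws attention.]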
Coming up in Pt. 3: Froogle, Become.com, shopping search, and SEO for retailers