Crawlers, New Microdata & Panda: A Selective Recap of SMX Advanced

by Evan Fishkin

OK. I realize this post is a little untimely since SMX Advanced ended like … uh, three weeks ago. But due to having to grapple with The Plague, which I inadvertently picked up during one of four visits to five different airports in a 10-day period, I’m just now able to deliver my top 3 takeaways from the Seattle conference. So maybe I exaggerate just a bit. But now, without further ado (drumroll, please)…

Top Three Takeaways from SMX Advanced 2011
1. Crawl Issues Are Still Annoying
2. Schema.org microdata may transform SERPs
3. Kung Fu Panda 2 is better than Google’s Panda 2.2 Update

NOTE: I may not be able to clearly articulate the true merits of Kung Fu Panda 2, but I need only submit any Seth Rogen line from that film to be confident in my decision.

Those Crawl Issues and GoogleBot
Webmasters have always had trouble with GoogleBot not crawling their sites properly. Fortunately, making GoogleBot more sophisticated, and capable of some amazing crawl features, is one of Google’s most recent crowning achievements.

Nobody’s perfect, so fortunately, there are ways to help Google help you.

A Proven Pagination Strategy
REI’s in-house SEO Jonathan Colman shared the retailer’s strategy for dealing with pagination issues. His secret lies in proper use of the rel=canonical attribute. Whenever you have a page that contains even a portion of the content of another page, add a rel=canonical tag that points to the original version of the content.

Jonathan went on to demonstrate the process used by REI. He adds the rel=canonical tag to any page deeper than the top page of the category and points it back to a page that contains all the products that might show up for each of that section’s or category’s filters.

For example, the tent section on REI.com begins at this URL:

http://www.rei.com/category/4500001_Tents+and+Shelters

In the source code, you’ll see the following:
<link rel="canonical" href="http://www.rei.com/category/4500001_Tents+and+shelters">

This code exists on every “Tents & Shelters” sub-page, except on filtered pages. On product pages where a filter has been applied, the canonical tag points back to the first page of the filter.
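If you want to check what a given page declares as its canonical URL, a few lines of Python will do it. This is only a minimal sketch using the standard library: the sample page source below is made up for illustration (it is not REI’s actual markup), and the class and function names are my own.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    """Return every canonical URL declared in the given HTML source."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical source for a deeper "Tents & Shelters" sub-page.
sample_page = """
<html><head>
<title>Tents and Shelters, page 2</title>
<link rel="canonical" href="http://www.rei.com/category/4500001_Tents+and+shelters">
</head><body></body></html>
"""

print(find_canonicals(sample_page))
```

A page should normally declare one canonical URL, so a list with more than one entry (or none at all) is a sign that something in the template needs a look.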

Show us the money!  REI has seen significantly increased crawl activity from GoogleBot, improved page load time, reduced duplication in Google’s indices, and more unique pages being indexed.

Caveat! During Jonathan’s presentation, Maile Ohye of Google pointed out that this was technically a “bad” use of the canonical, as it might stop Google from crawling past the first page of products. That might result in some products not being crawled in their most relevant categories, with their most relevant filters.

Maile suggests…
Apply the rel=canonical on all pages, pointing each one to the page that shows “ALL” products for the applied categories and filters.

Category: Tents and Shelters

Filter 1: Camping
Items per page: 20

http://www.rei.com/search?cat=4500001_Tents+and+Shelters&jxBest+use=Camping&hist=cat%2C4500001_Tents+and+Shelters%3ATents+and+Shelters^jxBest+use%2CCamping

Canonical tag points to:

Category: Tents and Shelters
Filter 1: Camping
Items per page: 100

http://www.rei.com/search?cat=4500001_Tents+and+Shelters&jxBest%20use=Camping&scv_page_size=109&seq=1&hist=cat%2C4500001_Tents+and+Shelters%3ATents+and+Shelters^jxBest+use%2CCamping
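To make the “view all” idea concrete, here is a rough Python sketch that turns any paginated, filtered URL into a single canonical target. Everything specific in it is an assumption for illustration: the example.com URLs, the `page` and `page_size` parameter names, and the view-all size of 100 are hypothetical, not REI’s real parameters.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def view_all_canonical(url, page_size_param="page_size", page_param="page",
                       view_all_size=100):
    """Map a paginated, filtered URL to its "view all" equivalent.

    The parameter names and the view-all size are hypothetical defaults.
    """
    parts = urlsplit(url)
    # Keep the category and filter parameters, drop the paging ones.
    params = [(key, value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)
              if key not in (page_size_param, page_param)]
    params.append((page_size_param, str(view_all_size)))
    # Sort so the same filters always yield the same canonical URL.
    params.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(params), ""))


def canonical_tag(url):
    """Render the <link> tag that belongs in the page's <head>."""
    return '<link rel="canonical" href="{}">'.format(escape(url, quote=True))


# Hypothetical URLs: page 3 and page 1 of the same filtered listing.
page_three = "http://www.example.com/search?use=Camping&cat=tents&page=3"
page_one = "http://www.example.com/search?cat=tents&use=Camping&page_size=20"

print(canonical_tag(view_all_canonical(page_three)))
print(view_all_canonical(page_three) == view_all_canonical(page_one))
```

The point of sorting the parameters is that every page of the same filtered listing ends up naming exactly the same URL, whatever order the filters were clicked in.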

Source code, more than meets the eye?

If you haven’t heard about Schema.org, you’re seriously behind on some of the biggest Internet news of the month. Read on! I’ll help you catch up quickly.

Schema.org, a co-operative effort by Google, Bing and Yahoo!, has released the next generation of microdata mark-up for the Internet. You can find the instructions and information necessary to pioneer this awesome future at schema.org/docs/gs.html.

OK, double rainbow witnessed. But what does it mean?! On an incredibly simple level, it means more of this:

http://www.google.com/search?q=12%2B13

And less of this:

http://www.google.com/search?q=pythagorean+theorem

That’s right. Fewer results, more answers.

On a more complex level, it means that search engines will be able to (with a significant amount of webmaster effort) classify content and information into specific types and will be better equipped to understand the difference among a person’s name, a hotel and a restaurant, or among a series of random numbers, equations, dates and times.

The full scope of the new microdata is relatively small in comparison to the scope of information descriptors, but large in comparison to the pre-existing microdata in use. Overall, this is a big step for search engines.
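For a feel of what “classifying content into types” looks like in practice, here is a small Python sketch that reads a microdata snippet and pulls out its type and properties. The Restaurant type and the name and telephone properties come from the schema.org vocabulary; the business details are invented, and the parser is a deliberately simplified toy of my own (one flat item, no nesting), not how any search engine actually does it.

```python
from html.parser import HTMLParser


class FlatMicrodataParser(HTMLParser):
    """Reads one flat microdata item: its itemtype and its text properties."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # itemscope marks the start of an item; itemtype says what kind it is.
        if "itemscope" in attr_map and self.item_type is None:
            self.item_type = attr_map.get("itemtype")
        self._current_prop = attr_map.get("itemprop")

    def handle_data(self, data):
        text = data.strip()
        if self._current_prop and text:
            self.properties[self._current_prop] = text
            self._current_prop = None

    def handle_endtag(self, tag):
        self._current_prop = None


# Made-up business details marked up with schema.org's Restaurant type.
snippet = """
<div itemscope itemtype="http://schema.org/Restaurant">
  <span itemprop="name">Example Bistro</span>
  <span itemprop="telephone">555-0100</span>
</div>
"""

parser = FlatMicrodataParser()
parser.feed(snippet)
print(parser.item_type)
print(parser.properties)
```

Without the markup, “Example Bistro” is just a string of text. With it, a crawler can tell that the string is the name of a restaurant and that the digits next to it are a phone number.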

Gaping Holes Everywhere
The potential for inaccurate use of these microdata formats is massive. Making webmasters responsible for adding microdata code to their websites puts the onus of verifying the accuracy of that data on the search engines. We have yet to see what those validation processes will look like.

There’s a Panda Eating Your Buffet Lunch
Content farms were so profitable. Were.

Thanks to Panda (we’re up to version 2.2, in case you were wondering), that business model has been pretty well slapped out of existence. So far, there doesn’t seem to be any way around it. Which is good. Because that was the point of Panda.

Google’s main mission has always been to index the web and drive searchers to the most relevant content for their queries. Granted, some content farms did an excellent job of providing answers, but they did an inexcusably rotten job of providing valuable, unique, visitor-oriented, high-quality content.

They weren’t called “content farms” for nothing. They were producing and propagating content at unprecedented rates. They weren’t focusing on creating unique content that people would care about. They were more focused on mass production for the purpose of scamming search engines into believing they deserved to be No. 1 in the SERPs.

Ultimately, Panda is a great example of a search engine filtering its index for great content. And for those with lots of duplicate content, whether scraped or scraped and mashed, it’s a massive hurdle to clear in order to rank well. The trick to this hurdle is … (GASP! SHOCK!) … writing unique, compelling, enjoyable and informative content and not stuffing your site with advertising. If you aren’t a low-value website, don’t look like one!

Thanks for taking the time to read this. I hope you got something out of it all. As always, if you have any questions, please contact me or leave a comment below.

Have fun,
Evan Fishkin
