by David Tittle
For those of you who haven’t heard, Matt Cutts has stated that Google is now capable of indexing embedded Facebook comments. This was later confirmed by labnol.org’s Amit Agarwal. How do they do this, and why is it important?
How they do it
You might say, "But Google is full of smart people, right? I'm sure this would be a snap for them!" And you'd be right, but there's a catch: the more work your crawler does per page, the longer it takes to run. If you're crawling a small collection of web pages, that's no big deal, but extend it to the entire Internet and the difference becomes very significant. Consider the following (math alert!):
Using a simple scraper I've written just now, it takes about 0.75 seconds to load and scrape the text from a page (about 80 pages a minute). And that's just the HTML; no JavaScript gets executed.
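The scraper is nothing fancy; something along these lines will do (this sketch uses Python's requests and BeautifulSoup, the URL is just a placeholder, and timings will obviously vary with your connection and the page):

```python
import time
import requests
from bs4 import BeautifulSoup

start = time.time()

# Fetch the raw HTML only; no JavaScript is executed, so anything
# loaded later via AJAX (like embedded Facebook comments) never appears.
html = requests.get("http://example.com/some-article").text

# Strip the markup and keep just the visible text.
text = BeautifulSoup(html, "html.parser").get_text()

print(f"{len(text)} characters scraped in {time.time() - start:.2f} seconds")
```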
Now suppose a more advanced crawler also executes each page's JavaScript, so that AJAX-loaded content (like those embedded Facebook comments) actually renders; call it roughly 2.1 seconds per page. My simple scraper could parse about 115,200 pages in one day, while the more advanced parser would only manage about 41,140. That means the simple parser handles nearly three times as many pages as the complex parser, while the complex parser can, potentially, find more content.
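The arithmetic, for anyone who wants to check it (the 2.1-second figure for the advanced parser is simply what the daily totals imply):

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400 seconds

simple_pages_per_day   = SECONDS_PER_DAY / 0.75   # ~115,200 pages: HTML only
advanced_pages_per_day = SECONDS_PER_DAY / 2.1    # ~41,140 pages: HTML plus JavaScript

print(round(simple_pages_per_day))                               # 115200
print(round(advanced_pages_per_day))                             # 41143
print(round(simple_pages_per_day / advanced_pages_per_day, 1))   # 2.8
```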
Why is it important?
This is good news for Facebook. Their mission of becoming the one, true web platform can only be helped by having their content displayed in search results. This means more people are likely to start incorporating Facebook into their sites. Facebook really should send Google a thank you letter for this one.
It also means that more websites will have their content show up in search results. Finally, there's hope for all those companies that paid way too much for a designer to come in and build a site that loads all of its products via AJAX.
That being said…
- Google is now parsing AJAX requests, including Facebook comments embedded in web pages (a quick illustration of what that means follows below).
- Pages that are easier to parse are easier to index.
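To make the first point concrete: fetch a page carrying a Facebook comments box with a plain HTTP request and you'll find the widget's placeholder markup, but not a single comment; those only arrive once JavaScript runs in a browser (or in a crawler that behaves like one). A rough illustration, with a hypothetical URL:

```python
import requests

# Plain fetch of a page that embeds a Facebook comments box (hypothetical URL).
html = requests.get("http://example.com/post-with-comments").text

# The static HTML contains only the empty container the Facebook SDK fills in later...
print('class="fb-comments"' in html)            # likely True: the widget placeholder is there

# ...but none of the actual comment text, because that is fetched via AJAX
# after the page loads. A crawler that never runs JavaScript never sees it.
print("text of some known comment" in html)     # False
```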
Of course, this remains to be verified. Perhaps someone in the community has results or thoughts on this?