Both Amazon and Netflix's business models rely on effective recommendation systems. The recommendations provided by such systems are based on the purchasing habits of millions of customers. As such, these systems are non-trivial and have evolved out of years of research in both academia and industry.
In addition to mining millions of customer transaction records, for many products there is a vast amount of information available online. While I do not have a lot of familiarity with recommendation systems literature, it seems obvious that the Web and Social Media is a great source of information that could be useful when building such systems. Bloggers' profile pages, wishlists, netflix queues, book lists and the blog posts themselves are potential clues to learn which two items may be related to each other.
As a simple example, consider the movie "Pulp Fiction", by querying Google for all the inlinks to the IMDB homepage of
Sin City Pulp Fiction and counting which are the other movies that are "co-cited" here is a list of five movies that are most likely to be related to "Pulp Fiction":
Most of these look quite relevant. Some critics have claimed similarities between Pulp Fiction and Snatch. One surprise though was LOTR, I wouldn't have expected it to be grouped with Pulp Fiction, but I guess I like them both very much -- so it seems reasonable in my case atleast.
Just for fun, here is another example with "Sin City" another one of my favorite movies.
Unless you have a large index of the Blogosphere or the Web, it would be quite inefficient to mine for such correlations (by passing queries to search engines) on a large scale. I do not know how much of the search engine information is leveraged in recommendation systems built by Amazon or Netflix. It might also be worth looking into differences in the recommendations produced on the basis of "how people co-cite two products" vs. "how people purchase two products".