Clustering vs Similarity

Of late, I’ve been looking at search and classification tools, so it’s only appropriate that I write a post on the topic.

These two terms can be hard to tell apart. Similarity is the more general one, e.g. all documents in a set that talk about Linux. Clustering takes those similar documents and classifies them further, e.g. into documents about Linux applications, Linux kernel development, Linux security, etc.

Still confused? Try out http://clusty.com, a clustering search engine, and enter a search term. The results are all related (similar) but are classified into discrete clusters.
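If you would rather see the distinction in code, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the sample documents, the helper names (`tokens`, `jaccard`, `similar_to`, `cluster`), the use of Jaccard word overlap as the similarity measure, and the 0.15 threshold. It is not how Clusty or any real search engine works, just a toy version of the two steps.

```python
import re

def tokens(text):
    """Lowercase a document and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Word-set overlap: 0.0 (nothing shared) to 1.0 (identical sets)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def similar_to(query, docs):
    """Similarity step: keep documents that share a word with the query."""
    q = tokens(query)
    return [d for d in docs if q & tokens(d)]

def cluster(docs, ignore, threshold=0.15):
    """Clustering step: greedily group documents by word overlap.

    Words in `ignore` (here, the query itself) are dropped first, since
    every similar document contains them and they cannot tell groups apart.
    Each document joins the first cluster whose first member it overlaps
    with enough; otherwise it starts a new cluster.
    """
    clusters = []
    for d in docs:
        words = tokens(d) - ignore
        for c in clusters:
            if jaccard(words, tokens(c[0]) - ignore) >= threshold:
                c.append(d)
                break
        else:
            clusters.append([d])
    return clusters

# Hypothetical sample documents, made up for this example.
docs = [
    "linux kernel scheduler patches",
    "linux kernel memory management",
    "linux security hardening guide",
    "linux security audit tools",
    "baking bread at home",
]

related = similar_to("linux", docs)          # all four Linux documents
groups = cluster(related, tokens("linux"))   # split into kernel / security

for g in groups:
    print(g)
```

The bread document drops out at the similarity step, and the remaining four, all "similar" in that they mention Linux, are then split into a kernel group and a security group at the clustering step.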
