Of late, I’ve been looking at search and classification tools, so it’s only appropriate that I write a post on the topic.
The terms similarity and clustering can be confusing to tell apart. Similarity is the more general notion, e.g. all documents in a set that talk about Linux. Clustering goes a step further and groups those similar documents into narrower categories, e.g. documents about Linux applications, Linux kernel development, Linux security, etc.
Still confused? Try out http://clusty.com, a clustering search engine, and enter a search term. The results are all related (similar) but are grouped into discrete clusters.
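If it helps to see the distinction in code, here is a minimal Python sketch. Everything in it is made up for illustration: the sample documents, the function names (`similar`, `cluster`, `jaccard`) and the 0.15 overlap threshold are my own assumptions, and this is certainly not how a real clustering search engine works. It only shows the two steps: first pick out everything similar to a search term, then split those results into groups.

```python
# Toy documents, invented purely for this example.
DOCS = [
    "linux kernel development patches",
    "linux security hardening guide",
    "linux applications for desktop users",
    "windows registry tips",
    "linux kernel scheduler internals",
]


def tokens(text):
    """Split a document into a set of lowercase words."""
    return set(text.lower().split())


def similar(docs, term):
    """Similarity step: keep every document that mentions the term."""
    return [d for d in docs if term in tokens(d)]


def jaccard(a, b):
    """Word overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, ignore, threshold=0.15):
    """Clustering step: greedily group documents that share enough words.

    Words in `ignore` (here, the search term itself) are dropped first,
    since every result contains them and they say nothing about subtopics.
    """
    clusters = []
    for doc in docs:
        words = tokens(doc) - ignore
        for c in clusters:
            if jaccard(words, c["words"]) >= threshold:
                c["docs"].append(doc)
                c["words"] |= words
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append({"words": set(words), "docs": [doc]})
    return [c["docs"] for c in clusters]


related = similar(DOCS, "linux")        # similar: everything about linux
groups = cluster(related, {"linux"})    # clustered: split by subtopic
for group in groups:
    print(group)
```

With these sample documents, the Windows one is dropped in the similarity step, and the two kernel documents land in the same cluster while the security and applications documents each get their own.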