# Resolve "analyse augmented tfidf"

Closes #9 (closed)

The **issue** with augmented TF-IDF was apparently caused by the fact that
atf(s,t) never returns 0 for a>0. This leads to a similarity value above 0 even if the attributes are completely distinct token sets.
For example, for a=0.4 the minimum similarity value is about 0.7. This can be verified by a frequency analysis of the returned similarity values on a given sample dataset.

For a=0 this is not an issue.

As a **solution**, using the definition of TF-IDF by Cohen et al., one can overcome the issue of atf(s,t) not being equal to zero for distinct s and t.
Their definition sums only over the intersection of the token sets, which leads to atf(s,t)=0 for two distinct sets.
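
The effect can be reproduced with a small sketch. It is not the code from this merge request: it assumes the common augmented term frequency `a + (1 - a) * count / max_count`, sets every IDF weight to 1 for brevity, and uses the hypothetical function names `sim_full_vocab` and `sim_intersection`. The first variant weights every term of the joint vocabulary, so absent terms still receive weight `a`; the second sums only over shared tokens, in the spirit of the intersection-based definition.

```python
import math
from collections import Counter


def augmented_tf(tokens, vocab, a):
    """Augmented TF for every term in vocab (assumed formula, IDF taken as 1)."""
    counts = Counter(tokens)
    max_count = max(counts.values())
    # Terms missing from `tokens` have count 0 and therefore get weight `a`.
    return {term: a + (1 - a) * counts[term] / max_count for term in vocab}


def norm(weights):
    return math.sqrt(sum(w * w for w in weights.values()))


def sim_full_vocab(s, t, a):
    """Cosine similarity with weights taken over the union of both token sets."""
    vocab = set(s) | set(t)
    ws, wt = augmented_tf(s, vocab, a), augmented_tf(t, vocab, a)
    dot = sum(ws[term] * wt[term] for term in vocab)
    return dot / (norm(ws) * norm(wt))


def sim_intersection(s, t, a):
    """Cosine-style similarity that sums only over tokens shared by s and t."""
    ws, wt = augmented_tf(s, set(s), a), augmented_tf(t, set(t), a)
    shared = set(s) & set(t)
    dot = sum(ws[term] * wt[term] for term in shared)
    return dot / (norm(ws) * norm(wt))


s, t = ["a", "b"], ["c", "d"]  # completely distinct token sets
print(sim_full_vocab(s, t, 0.4))    # about 0.69, although nothing is shared
print(sim_full_vocab(s, t, 0.0))    # 0.0
print(sim_intersection(s, t, 0.4))  # 0.0
```

In this toy case the union-based variant yields roughly 0.69 for a=0.4, which is in line with the minimum of about 0.7 reported above, while a=0 and the intersection-based variant both give 0 for distinct sets.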

Edited by Tinsaye Abye