[feature] variants of tf-idf weighting
Closes #8
Implements three variants of tf-idf weighting. They differ mainly in how the term frequency (tf) is calculated:
- natural: tf x log(N/df)
- logarithmic: log(1+tf) x log(N/df)
- augmented: (0.5 + (0.5 * tf) / max(tf)) x log(N/df)
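
The three formulas above can be sketched as follows. This is only an illustrative sketch, not the code in this merge request: the names `tf_weight` and `tfidf_vectors` are hypothetical, the natural logarithm is assumed, documents are assumed to be pre-tokenized lists of strings, and terms absent from a document are assumed to get weight 0.

```python
import math
from collections import Counter


def tf_weight(tf, max_tf, variant):
    """Return the tf component for one term (hypothetical helper)."""
    if variant == "natural":
        return float(tf)
    if variant == "logarithmic":
        return math.log(1 + tf)
    if variant == "augmented":
        # Always in (0.5, 1] for terms that occur in the document.
        return 0.5 + 0.5 * tf / max_tf
    raise ValueError(f"unknown variant: {variant}")


def tfidf_vectors(docs, variant="natural"):
    """Compute sparse tf-idf vectors (term -> weight) for tokenized docs."""
    n_docs = len(docs)

    # df: number of documents in which each term occurs at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        max_tf = max(counts.values(), default=1)
        vectors.append({
            term: tf_weight(tf, max_tf, variant) * math.log(n_docs / df[term])
            for term, tf in counts.items()
        })
    return vectors
```

For example, `tfidf_vectors([["a", "a", "b"], ["a", "c"]], "augmented")` gives the term `a` weight 0 in both documents, because it occurs in every document and log(N/df) is therefore 0.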
Even though the natural and logarithmic variants perform similarly, the results from the augmented variant look clearly off. On closer inspection, most similarity values for the augmented variant fall between 0.8 and 1.
Edited by Tinsaye Abye