[feature] variants of tf-idf weighting

Tinsaye Abye requested to merge 8-tf-idf-weighting-variants into develop

Closes #8 (closed)

Implements three variants of tf-idf weighting. They differ mainly in how the term frequency (tf) is calculated.

  1. natural: tf x log(N/df)

image

  2. logarithmic: log(1+tf) x log(N/df)

image

  3. augmented: (0.5 + (0.5 * tf) / max(tf)) x log(N/df)

image
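For reference, the three weightings above can be sketched in a few lines of Python. This is a minimal illustration, not the code in this merge request: the function names, the token-list document representation, and the choice of the natural logarithm are all assumptions, since the formulas above do not state a log base.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration; names are not taken from the merge request.

def idf(term, docs):
    """log(N/df): N = number of documents, df = documents containing the term."""
    n = len(docs)
    df = sum(1 for d in docs if term in d)
    return math.log(n / df)  # natural log is an assumption

def tf_natural(count, max_count):
    return float(count)

def tf_logarithmic(count, max_count):
    return math.log(1 + count)

def tf_augmented(count, max_count):
    # max_count is the highest raw count of any term in the same document
    return 0.5 + 0.5 * count / max_count

def tfidf(doc, docs, tf_fn):
    """Weight every term that occurs in doc; absent terms get no entry."""
    counts = Counter(doc)
    max_count = max(counts.values())
    return {t: tf_fn(c, max_count) * idf(t, docs) for t, c in counts.items()}

docs = [["a", "a", "b"], ["b", "c"]]
for fn in (tf_natural, tf_logarithmic, tf_augmented):
    print(fn.__name__, tfidf(docs[0], docs, fn))
```

Only the tf factor changes between variants, so passing it in as a function keeps the idf part shared.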

Even though the natural and logarithmic variants perform similarly, the results from the augmented variant look very off. On closer inspection, most similarity values lie between 0.8 and 1.

image image
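One possible explanation, which this merge request does not confirm, is that the augmented formula yields 0.5 even for tf = 0. If it is applied to every vocabulary term rather than only to terms present in the document, all vectors become dense and cosine similarities are pushed towards 1. The toy sketch below illustrates that effect with made-up count vectors; the idf factor is left out for simplicity and all names are hypothetical.

```python
import math

def augmented_tf(counts, zero_for_absent):
    """Augmented tf over a fixed vocabulary (idf omitted for simplicity)."""
    max_count = max(counts)
    return [
        0.0 if (c == 0 and zero_for_absent) else 0.5 + 0.5 * c / max_count
        for c in counts
    ]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Made-up raw counts for two documents that share no terms.
doc_a = [3, 1, 0, 0, 0, 0]
doc_b = [0, 0, 0, 2, 1, 0]

sparse = cosine(augmented_tf(doc_a, True), augmented_tf(doc_b, True))
dense = cosine(augmented_tf(doc_a, False), augmented_tf(doc_b, False))
print(sparse)  # 0.0: no shared terms
print(dense)   # above 0.8 although the documents have nothing in common
```

If this is what happens here, giving absent terms a weight of 0 should spread the similarity values out again.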

Edited by Tinsaye Abye