[feature] variants of tf-idf weighting
Closes #8 (closed)
Implements three variants of tf-idf weighting. They differ mainly in how the term frequency (tf) is computed:
- natural: tf x log(N/df)
- logarithmic: log(1+tf) x log(N/df)
- augmented: (0.5 + (0.5 * tf) / max(tf)) x log(N/df)
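
For illustration, the three formulas above can be sketched as follows. This is a minimal sketch and not the code from this merge request: the names `tf_weight` and `tfidf_vector`, the dict-based sparse vectors, and the use of the natural logarithm are all assumptions, and `max(tf)` is taken to be the highest raw term count within the same document.

```python
import math
from collections import Counter


def tf_weight(tf, max_tf, variant):
    """Return the tf factor for one term (hypothetical helper)."""
    if variant == "natural":
        return float(tf)
    if variant == "logarithmic":
        return math.log(1 + tf)
    if variant == "augmented":
        # max_tf: highest raw count of any term in the same document (assumed)
        return 0.5 + (0.5 * tf) / max_tf
    raise ValueError(f"unknown variant: {variant}")


def tfidf_vector(doc_tokens, df, n_docs, variant="natural"):
    """Return a sparse {term: weight} vector for one tokenized document.

    df maps each term to its document frequency; n_docs is N.
    The log base is not stated in the description, so natural log is assumed.
    """
    counts = Counter(doc_tokens)
    if not counts:
        return {}
    max_tf = max(counts.values())
    return {
        term: tf_weight(tf, max_tf, variant) * math.log(n_docs / df[term])
        for term, tf in counts.items()
    }
```

Only terms that occur in the document receive a weight here; a term that appears in every document gets `log(N/df) = 0` in all three variants.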
Although natural and logarithmic perform similarly, the results from the augmented variant look very off. On closer inspection, most sim values fall between 0.8 and 1.
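
One property that may be worth checking while investigating this: for any term that occurs in a document, the augmented tf factor is confined to the interval (0.5, 1], so raw counts separate documents much less than in the other two variants. Whether this explains the high sim values is not established here. The sketch below assumes that sim is cosine similarity over sparse `{term: weight}` dicts, and both helper names are hypothetical.

```python
import math


def augmented_tf(tf, max_tf):
    """Augmented tf factor as given in the description (hypothetical helper)."""
    return 0.5 + (0.5 * tf) / max_tf


def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors (assumed sim measure)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        # An all-zero vector has no direction; treat similarity as 0
        return 0.0
    return dot / (norm_u * norm_v)


# The tf factor of every present term stays within (0.5, 1], whatever its count
factors = [augmented_tf(tf, 10) for tf in range(1, 11)]
print(min(factors), max(factors))
```

Running `cosine` on the same pair of documents under each variant would show how much of the compression comes from the tf factor alone.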
Edited by Tinsaye Abye
Activity
added enhancement label
mentioned in commit 096da6a7
mentioned in issue #9 (closed)