[feature] variants of tf-idf weighting
Closes #8 (closed)
Implements three variants of tf-idf that differ mainly in how the term frequency (tf) is calculated.
Even though the natural and logarithmic variants perform similarly, the results from the augmented variant look very off. On closer inspection, most similarity (sim) values for the augmented variant fall between 0.8 and 1.
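For reference, here is a minimal sketch of the three tf variants as they are commonly defined in the SMART notation (natural, logarithmic, augmented). It is not the code from this merge request: the function name `term_frequencies` and the `variant` parameter are hypothetical, and the actual implementation may differ in its details.

```python
import math
from collections import Counter


def term_frequencies(tokens, variant="natural"):
    """Return a {term: tf weight} dict for one tokenized document.

    Hypothetical helper for illustration only; it follows the textbook
    definitions, not necessarily the code in this merge request.
    """
    counts = Counter(tokens)
    if not counts:
        return {}

    if variant == "natural":
        # Raw count of the term in the document.
        return {term: float(count) for term, count in counts.items()}

    if variant == "logarithmic":
        # 1 + ln(count); dampens the effect of repeated terms.
        return {term: 1.0 + math.log(count) for term, count in counts.items()}

    if variant == "augmented":
        # 0.5 + 0.5 * count / max_count; every present term lands in [0.5, 1].
        max_count = max(counts.values())
        return {term: 0.5 + 0.5 * count / max_count for term, count in counts.items()}

    raise ValueError(f"unknown tf variant: {variant!r}")


if __name__ == "__main__":
    doc = ["john", "john", "john", "smith"]
    for name in ("natural", "logarithmic", "augmented"):
        print(name, term_frequencies(doc, name))
```

Under this definition, the augmented variant maps every term that occurs in a document to a weight between 0.5 and 1, so the tf weights within a document are far more uniform than with the other two variants. That might contribute to the narrow range of similarity values seen here, but this has not been checked against the code in this merge request.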