
[feature] variants of tf-idf weighting

Merged Tinsaye Abye requested to merge 8-tf-idf-weighting-variants into develop

Closes #8 (closed)

Implements three variants of tf-idf weighting. They differ mainly in how the term frequency (tf) is calculated; all three use the same idf factor, log(N/df).

  1. natural: tf x log(N/df)

image

  2. logarithmic: log(1+tf) x log(N/df)

image

  3. augmented: (0.5 + (0.5 * tf) / max(tf)) x log(N/df)

image
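The three weightings listed above can be sketched in a few lines of Python. This is only an illustration of the formulas, not the code from this merge request: the function names, the `TF_VARIANTS` table, and the toy documents are hypothetical, and the natural logarithm is an assumption, since the description does not state the log base.

```python
import math
from collections import Counter

# Term-frequency variants; each takes the raw count and the largest
# count in the same document (names are hypothetical).
def tf_natural(tf, max_tf):
    return float(tf)

def tf_logarithmic(tf, max_tf):
    return math.log(1 + tf)

def tf_augmented(tf, max_tf):
    return 0.5 + 0.5 * tf / max_tf

TF_VARIANTS = {
    "natural": tf_natural,
    "logarithmic": tf_logarithmic,
    "augmented": tf_augmented,
}

def document_frequencies(docs):
    """Count in how many documents each term occurs (df)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df

def tfidf_vector(tokens, doc_freq, n_docs, variant="natural"):
    """Sparse tf-idf vector for one document: variant(tf) * log(N/df)."""
    counts = Counter(tokens)
    if not counts:
        return {}
    max_tf = max(counts.values())
    weigh = TF_VARIANTS[variant]
    return {
        term: weigh(tf, max_tf) * math.log(n_docs / doc_freq[term])
        for term, tf in counts.items()
    }

# Toy corpus, made up for illustration only.
docs = [["a", "b", "a"], ["b", "c"], ["c", "c", "d"]]
doc_freq = document_frequencies(docs)
for name in TF_VARIANTS:
    print(name, tfidf_vector(docs[0], doc_freq, len(docs), variant=name))
```

Keeping the idf factor outside the variant functions mirrors the description: only the tf part changes between the three weightings.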

Even though natural and logarithmic perform similarly, the results from the augmented variant look very off. On closer inspection, most similarity (sim) values fall between 0.8 and 1.

image image
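One possible explanation for the compressed similarity range, offered here as an unverified assumption and not as a finding of this merge request: if the augmented formula is evaluated for every vocabulary term, including terms with tf = 0, each absent term still receives a weight of 0.5 x idf. The vectors then become dense and their cosine similarity moves toward 1. The sketch below contrasts both behaviours on two made-up documents with no terms in common; the function names, the `include_absent` flag, and the idf values are hypothetical.

```python
import math

def augmented_weights(counts, idf, include_absent):
    """Augmented tf-idf weights for one document.

    include_absent=True applies 0.5 + 0.5 * tf / max(tf) to every
    vocabulary term, so terms with tf = 0 still get 0.5 * idf.
    include_absent=False weights only terms that occur in the document.
    """
    max_tf = max(counts.values())
    weights = {}
    for term, idf_value in idf.items():
        tf = counts.get(term, 0)
        if tf == 0 and not include_absent:
            continue
        weights[term] = (0.5 + 0.5 * tf / max_tf) * idf_value
    return weights

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical idf values and term counts; the documents share no terms.
idf = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
doc1 = {"a": 2, "b": 1}
doc2 = {"c": 1, "d": 3}

sparse_sim = cosine(augmented_weights(doc1, idf, False),
                    augmented_weights(doc2, idf, False))
dense_sim = cosine(augmented_weights(doc1, idf, True),
                   augmented_weights(doc2, idf, True))
print("only present terms:", sparse_sim)
print("all vocabulary terms:", dense_sim)
```

If the implementation already restricts the augmented weight to terms that occur in the document, this explanation does not apply and the cause lies elsewhere.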

Edited by Tinsaye Abye
