Document frequency pre-calculation

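As a starting point, I should pre-process all sentences once and store the document frequency of each word (the number of sentences in which it appears) in a map. This map can later be accessed during the similarity measurement, so the frequencies do not have to be recomputed for every comparison.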
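A minimal sketch of this pre-calculation, under a few assumptions that are not fixed by the note above: each sentence counts as one document, tokenization is just lowercasing and splitting on whitespace, and the map is a Python `Counter`. The name `document_frequencies` and the sample sentences are made up for illustration.

```python
from collections import Counter


def document_frequencies(sentences):
    """Map each word to the number of sentences that contain it.

    Assumption: one sentence = one document, and words are obtained by
    lowercasing and splitting on whitespace (a placeholder tokenizer).
    """
    df = Counter()
    for sentence in sentences:
        # set() ensures a word repeated inside one sentence counts only once
        df.update(set(sentence.lower().split()))
    return df


# Hypothetical sample data, only to show how the map is built and queried.
sentences = [
    "the cat sat on the mat",
    "the dog sat",
    "a cat and a dog",
]
df = document_frequencies(sentences)
```

During the similarity measurement, a lookup such as `df["cat"]` returns the stored count, and a word that never occurred returns 0 because `Counter` supplies a default of zero for missing keys.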