Counting word frequencies
I have written a small C# program that parses a text and counts how often each word occurs. The ultimate goal is to compare those words against a list of words the user already knows, and then produce a list of the most frequent unknown words (and maybe link them to a dictionary file with translations).
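For reference, the counting and filtering step looks roughly like the sketch below. It is written in Python for brevity rather than C#, and the names `count_words`, `most_frequent_unknown`, and the sample text are illustrative assumptions, not my actual code. The same structure maps onto a `Dictionary<string, int>` in C#.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    # \w+ matches runs of Unicode word characters, so Cyrillic text works too.
    return Counter(re.findall(r"\w+", text.lower()))


def most_frequent_unknown(text, known_words, limit=10):
    """Return up to `limit` (word, count) pairs for words not in known_words."""
    counts = count_words(text)
    known = {w.lower() for w in known_words}
    unknown = Counter({w: n for w, n in counts.items() if w not in known})
    return unknown.most_common(limit)


# Illustrative input only.
sample = "The cat saw the dog. The dog saw the bird."
print(most_frequent_unknown(sample, known_words=["the", "saw"]))
```

Note that this treats every distinct spelling as a separate word, which is exactly where inflected languages cause trouble.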
But with inflected languages like Russian, it is hard to put the different case forms of the same word into the same bin. Solutions to this problem clearly exist (I know Google can find different case forms of the same word), but does anyone know of a published solution?
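To make the problem concrete, here is a minimal sketch of the binning I have in mind, written in Python for brevity. The table `LEMMAS` is a hypothetical, hand-written stand-in covering a few case forms of one noun. A real solution would need a full morphological dictionary or lemmatizer, which is the part I am asking about.

```python
from collections import Counter

# Hypothetical, hand-written table mapping inflected forms to a dictionary form.
# It only covers a few case forms of "книга" (book) for illustration.
LEMMAS = {
    "книга": "книга",   # nominative singular
    "книги": "книга",   # genitive singular / nominative plural
    "книгу": "книга",   # accusative singular
    "книгой": "книга",  # instrumental singular
}


def count_by_lemma(words, lemmas):
    """Count words, merging every form listed in `lemmas` into its base form."""
    counts = Counter()
    for word in words:
        key = word.lower()
        # Words missing from the table fall back to their own spelling.
        counts[lemmas.get(key, key)] += 1
    return counts


words = ["Книга", "книги", "книгу", "стол"]
print(count_by_lemma(words, LEMMAS))
```

With such a table, the three forms of "книга" end up in one bin, while "стол" is counted on its own because the table does not know it.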