That's exactly how I would do it. Use the STL's map container! It will tell you the number of occurrences of each word and, more importantly, its keys give you the words without duplicates. You could then use a priority queue to order the words by their counts. I did something similar for a project in college. It involved Huffman encoding, which compresses a text file into a smaller file that takes a lot fewer bytes.
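
Here's a rough sketch of what I mean, not a finished solution. It assumes words are simply separated by whitespace (no punctuation or case handling), and the function names `count_words` and `by_frequency` are just names I made up for illustration.

```cpp
#include <map>
#include <queue>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Count how many times each whitespace-separated word appears.
// The map keys are the distinct words, so duplicates collapse automatically.
std::map<std::string, int> count_words(const std::string& text) {
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        ++counts[word];  // operator[] value-initializes a missing entry to 0
    }
    return counts;
}

// Return (count, word) pairs ordered from most to least frequent.
std::vector<std::pair<int, std::string>> by_frequency(
        const std::map<std::string, int>& counts) {
    // Default priority_queue is a max-heap; pairs compare by count first,
    // then by word, so the most frequent word is always on top.
    std::priority_queue<std::pair<int, std::string>> pq;
    for (const auto& entry : counts) {
        pq.emplace(entry.second, entry.first);
    }

    std::vector<std::pair<int, std::string>> ordered;
    while (!pq.empty()) {
        ordered.push_back(pq.top());
        pq.pop();
    }
    return ordered;
}
```

You would call `count_words` on the text you read in and pass the result to `by_frequency`. For Huffman encoding you would normally want the least frequent items first instead, which you can get by giving the priority queue `std::greater` as its comparator.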