
Originally Posted by
SBerry
That's exactly how I would do it. Use the STL's map container! It will tell you the number of occurrences of each word and, more importantly, from this you can get the words without duplicates. You could then use a priority queue to sort the data by priority. I did something similar for a project in college. It involved Huffman encoding, which encodes a text file into a smaller file that takes a lot fewer bytes.

Thanks. Once I saw maps and how they worked, I knew I had my answer for how to dedupe a wordlist. I'm curious what you mean by priority sorting.
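For reference, here is one possible reading of the priority-queue suggestion, as a rough sketch only: count the words in a map, then push (count, word) pairs into a std::priority_queue so the most frequent word comes out first. The name words_by_frequency is made up for illustration, the code assumes a C++11 compiler, and it is a guess at what was meant, not the quoted poster's actual approach.

```cpp
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the distinct words, most frequent first.
std::vector<std::string> words_by_frequency(const std::vector<std::string>& words)
{
    // Count how often each word occurs; map keys are unique by construction.
    std::map<std::string, long> counts;
    for (const std::string& w : words)
        ++counts[w];

    // priority_queue is a max-heap, so the pair with the largest count sits on top.
    std::priority_queue<std::pair<long, std::string> > pq;
    for (const auto& entry : counts)
        pq.push(std::make_pair(entry.second, entry.first));

    // Pop everything off the heap to get the words in descending order of count.
    std::vector<std::string> result;
    while (!pq.empty()) {
        result.push_back(pq.top().second);
        pq.pop();
    }
    return result;
}
```

Words with equal counts come out in reverse alphabetical order here, because the pairs are compared by count first and then by the string.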
This is where I'm at so far with both steps. The code works but is probably rather inefficient. To learn vectors a little more and to get some code working for the time being, I used the standard sort function on a vector to take care of the sorting. Unfortunately it doesn't sort the way I want: it puts words starting with capital letters first, then the lower-case ones (it compares raw character codes, and in ASCII 'A'-'Z' come before 'a'-'z').
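One way around the capitals-first ordering, as a minimal sketch: hand std::sort a comparator that lowercases each character before comparing. The names less_ignore_case and sort_ignore_case are made up for illustration, and the code assumes plain single-byte text and a C++11 compiler.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical comparator: true if a sorts before b, ignoring letter case.
bool less_ignore_case(const std::string& a, const std::string& b)
{
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) {
            return std::tolower(x) < std::tolower(y);
        });
}

// Sorts the vector in place using the case-insensitive ordering.
void sort_ignore_case(std::vector<std::string>& words)
{
    std::sort(words.begin(), words.end(), less_ignore_case);
}
```

With this ordering "Apple" and "apple" compare as equal, so their relative order after std::sort is unspecified; std::stable_sort would keep them in input order.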
I've been having problems writing my own string-sorting algorithm. I've mainly been trying quicksort, but for some reason I can't get the code to work. I've also tried insertion sort, but it doesn't seem like the best choice and I still can't get the code to work. For now I'm setting it aside until I understand C++ better; then I can get into writing better algorithms for this sort of thing.
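For anyone attempting the same thing, a quicksort over a vector of strings might look roughly like the sketch below. It uses a middle-element pivot and a Hoare-style partition; that is just one common variant, not the only way to write it, and the function name quicksort is only illustrative.

```cpp
#include <string>
#include <utility>
#include <vector>

// Sorts v[lo..hi] (inclusive bounds) in ascending order with quicksort.
void quicksort(std::vector<std::string>& v, long lo, long hi)
{
    if (lo >= hi)
        return;

    // Copy the pivot: a reference into v would go stale as elements get swapped.
    const std::string pivot = v[lo + (hi - lo) / 2];
    long i = lo;
    long j = hi;

    // Hoare-style partition: move i right and j left until they cross.
    while (i <= j) {
        while (v[i] < pivot) ++i;
        while (pivot < v[j]) --j;
        if (i <= j) {
            std::swap(v[i], v[j]);
            ++i;
            --j;
        }
    }

    // Everything in [lo, j] is <= pivot and everything in [i, hi] is >= pivot.
    quicksort(v, lo, j);
    quicksort(v, i, hi);
}

// Convenience overload that sorts the whole vector.
void quicksort(std::vector<std::string>& v)
{
    if (!v.empty())
        quicksort(v, 0, static_cast<long>(v.size()) - 1);
}
```

Like the default std::sort, this compares with operator<, so capitalized words still come first.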
Code:
#include <iostream>
#include <map>
#include <string>
#include <fstream>
#include <vector>
#include <algorithm>
#include <cstdlib>   // for system()
using namespace std;

int main()
{
    ios_base::sync_with_stdio(false);
    string text_input;

    // Step 1: copy each distinct line of list.txt to filtered_list.txt.
    ifstream textfile("list.txt");
    if (textfile.is_open())
    {
        ofstream filtered("filtered_list.txt");
        map<string, long int> map_data;
        // Test getline itself; looping on eof() processes one bogus extra line.
        while (getline(textfile, text_input))
        {
            // Write a word only the first time it is seen.
            if (++map_data[text_input] == 1)
                filtered << text_input << '\n';
        }
        filtered.close();
        textfile.close();
        cout << "Filter Process Complete!" << endl;
    }
    else
        cout << "Unable to open file: list.txt" << endl;

    // Step 2: read the filtered list back in and write it out sorted.
    ifstream textfile1("filtered_list.txt");
    if (textfile1.is_open())
    {
        ofstream filtered1("Filtered_Sorted.txt");
        vector<string> sort_vec;
        while (getline(textfile1, text_input))
            sort_vec.push_back(text_input);
        sort(sort_vec.begin(), sort_vec.end());
        // size_type avoids comparing a signed index against an unsigned size.
        for (vector<string>::size_type i = 0; i < sort_vec.size(); i++)
            filtered1 << sort_vec[i] << '\n';
        cout << "Sorting Process Complete!" << endl;
        filtered1.close();
        textfile1.close();
    }
    else
        cout << "Unable to open file: filtered_list.txt" << endl;

    system("pause");   // Windows-only: keeps the console window open
    return 0;
}
This will not handle wordlists of 2GB or more; I already tried it on pureh@te's wordlist. As a matter of fact, Notepad won't even open it. I don't understand the reason for having such large wordlists anyway, since they will do nothing but slow your PC down when they are used. I know a lot of programs have trouble handling wordlists that large. Split them up into reasonable sizes so they are easier to work with.
I'm also unsure how large a list a vector can hold before it overflows and crashes, which is another reason why I want to write my own algorithm. But this code is small and I've learned a lot from it. Even the long int in my map declaration only goes up to about 2.1 billion (2,147,483,647 when long int is 32 bits), and that is a per-word count, not a limit on the number of words.
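Both limits can be queried from the standard library instead of guessed at. The sketch below is a small illustration; the function names max_long_count and max_vector_strings are made up. Note that max_size() is only a theoretical ceiling. In practice a vector runs out of memory long before reaching it, and with the default allocator push_back then throws an exception (std::bad_alloc) instead of silently overflowing.

```cpp
#include <limits>
#include <string>
#include <vector>

// Largest value a long int counter can hold on this platform.
long max_long_count()
{
    return std::numeric_limits<long>::max();
}

// Theoretical upper bound on the number of strings one vector can hold.
// Available memory is normally the real limit, not this number.
std::vector<std::string>::size_type max_vector_strings()
{
    return std::vector<std::string>().max_size();
}
```

Printing both values with cout on the target machine shows the actual numbers, which differ between 32-bit and 64-bit builds.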

Originally Posted by
level
No cat.....sounds familiar. I think he just wants some C++ help, maybe for a school project.
I'm not attending any school; this is just for me to learn C++.