I know this isn't really a BackTrack-specific thread, but I don't know where else to post.
I have a 25 GB folder full of wordlists.
I wanted to combine and clean these, so I began looking into Unix commands. Obviously there are the simple ones...
Code:
Combine:
cat file1.txt file2.txt > outputfile3.txt
----------------------------------------------
Sort:
sort filename | uniq
----------------------------------------------
Remove Duplicates:
sort -u -o new_file old_file
----------------------------------------------
But that still leaves a lot of junk in the final file, and it takes ages to complete. It also requires me to check up on it and run the next command by hand each time.
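As far as I can tell, those three simple steps can at least be chained into one line so there is nothing to babysit. This is only a rough sketch: `combine_unique`, the folder and the output file name are placeholders I made up, and it does nothing about the junk inside the lists.

```shell
# Combine every file in a folder, sort the result and drop duplicate lines.
# combine_unique and both arguments are made-up names for illustration.
# Keep the output file OUTSIDE the source folder, or cat may read it back in.
combine_unique() {
    src_dir=$1
    out_file=$2
    # LC_ALL=C makes sort compare raw bytes, which is usually faster on big lists
    cat "$src_dir"/* | LC_ALL=C sort -u > "$out_file"
}

# Example: combine_unique ./wordlists /tmp/combined.lst
```

On 25 GB of input, sort will still need a lot of temporary disk space, so this only removes the hand-holding, not the waiting.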
In my travels I found a page all about sorting and cleaning up wordlists: removing HTML tags, emails and so on. They gave a full run-through of the commands used, but running them one at a time is going to take too much faffing about. They also gave an all-in-one set of instructions, but that still needs checking up on after each command.
What I want to do is turn this into a full script that I can just run, so it does all the commands one after another and gives me a final result. Preferably a script that I can just point at the folder and run. But I have no idea about scripting, and wondered if there is anyone out there who could help me?
Code:
AIO + Sort
# Combine every file in the current folder into one list (this deletes the originals)
cat * > /tmp/aio-"${PWD##*/}".lst && rm * && mv /tmp/aio-"${PWD##*/}".lst ./
# Turn carriage returns into newlines, NUL bytes into spaces, and drop non-printable characters
tr '\r' '\n' < aio-"${PWD##*/}".lst > stage1-tmp && tr '\0' ' ' < stage1-tmp > stage1-tmp1 && tr -cd '\11\12\15\40-\176' < stage1-tmp1 > stage1-tmp && mv stage1-tmp stage1 && rm stage1-*
# Collapse runs of spaces, trim each line, strip HTML tags and decode common HTML entities
htmlTags="a|b|big|blockquote|body|br|center|code|del|div|em|font|h[1-9]|head|hr|html|i|img|ins|item|li|ol|option|p|pre|s|small|span|strong|sub|sup|table|td|th|title|tr|tt|u|ul"
cat stage1 | sed -r "s/ +/ /g;s/^[ \t]*//;s/[ \t]*$//;s/<[^>]*>//g;s/^\w.*=\"\w.*\">//;s/^($htmlTags)>//I;s/<\/*($htmlTags)$//I;s/&amp;*/\&/gI;s/&quot;/\"/gI;s/&apos;/'/gI;s/&#39;/'/gI;s/&lt;/</gI;s/&gt;/>/gI" > stage2 && rm stage1
# Sort, ignoring leading blanks, case and non-printable characters
sort -b -f -i -T "$(pwd)/" stage2 > stage3 && rm stage2
# Split the list: lines with fewer than two spaces go to stage3.1, the rest to stage3.4
grep -v " * .* " stage3 > stage3.1
grep " * .* " stage3 > stage3.4
rm stage3
# For each part: repeated lines first, ordered by count, then the lines that occur only once
for fileIn in stage3.*; do
  cat "$fileIn" | uniq -c -d > stage3.0
  sort -b -f -i -T "$(pwd)/" -k1,1r -k2 stage3.0 > stage3 && rm stage3.0
  sed 's/^ *//;s/^[0-9]* //' stage3 >> "${PWD##*/}"-clean.lst && rm stage3
  cat "$fileIn" | uniq -u >> "${PWD##*/}"-clean.lst
  rm "$fileIn"
done
rm -f stage* #aio-"${PWD##*/}".lst
# Report the line count and checksum of the final list
wc -l "${PWD##*/}"-clean.lst
md5sum "${PWD##*/}"-clean.lst
The source for this is here
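To show the shape of what I am after, here is my best guess at a "point it at a folder" wrapper, cobbled together from examples. Treat it as a sketch under assumptions rather than the method from that page: `clean_wordlists` and the default output name `clean.lst` are names I invented, it leaves the original files alone, and it only does the combine, strip, trim and de-duplicate steps, not the HTML clean-up or the frequency ordering.

```shell
# Sketch: clean_wordlists FOLDER [OUTPUT]
# clean_wordlists and the default output name are made up for illustration.
# The output file should live outside FOLDER.
clean_wordlists() {
    src_dir=$1
    out_file=${2:-clean.lst}
    [ -d "$src_dir" ] || { echo "usage: clean_wordlists FOLDER [OUTPUT]" >&2; return 1; }
    tmp_file=$(mktemp) || return 1

    # 1. Combine everything into a temp file; the originals are not deleted.
    cat "$src_dir"/* > "$tmp_file" || { rm -f "$tmp_file"; return 1; }

    # 2. CR -> newline, NUL -> space, drop bytes outside tab/LF/CR/printable ASCII.
    # 3. Trim leading and trailing whitespace and delete empty lines.
    # 4. Sort bytewise and keep one copy of each line.
    tr '\r' '\n' < "$tmp_file" \
        | tr '\0' ' ' \
        | LC_ALL=C tr -cd '\11\12\15\40-\176' \
        | sed 's/^[[:space:]]*//;s/[[:space:]]*$//;/^$/d' \
        | LC_ALL=C sort -u > "$out_file"

    rm -f "$tmp_file"
    # Print how many lines survived.
    wc -l < "$out_file"
}

# Example: clean_wordlists /path/to/wordlists /tmp/all-clean.lst
```

If someone can tell me how to fold the HTML stripping and the rest of the all-in-one commands into something like this, that would be exactly what I need.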


