
Thread: robots.txt mining made easy

  1. #1
    Junior Member imported_spudgunman's Avatar
    Join Date
    Feb 2007
    Posts
    78

    Default robots.txt mining made easy

A script to convert a robots.txt file into an HTML report. It's just something to speed up research on domains that think hiding a folder from Google keeps people out. I got tired of cutting and pasting during pen tests, so I made this.

    I don't think there's anything out there that does this for you, so I might have created a first. Or not.

    Let me know what you think.

    Code:
#!/bin/bash
    # robotReporter.sh -- a script for creating web server robots.txt clickable reports
    # by KellyKeeton.com (c)2008
    version=.06
    # don't forget to chmod 755 robotReporter.sh or there will be no 31337 h4x0r1ng
    if [ "$1" = "" ]; then # deal with a missing command line argument
        echo
        echo "robotReporter$version - Robots.txt report generator"
        echo "will download and convert the robots.txt"
        echo "on a domain to an HTML clickable map."
        echo
        echo "Usage: robotReporter.sh example.com -b"
        echo
        echo "  -b  keep the original downloaded robots.txt"
        echo
        exit
    fi
    wget -m -nd "http://$1/robots.txt" -o /dev/null # download the robots.txt file, discarding wget's log
    if [ -f robots.txt ]; then # if the file arrived, process it
        if [ "$2" = "-b" ]; then # keep the original robots.txt file
            cp robots.txt "robots_$1.html"
            mv robots.txt "robots_$1.txt"
            echo "###EOF Created on $(date +%c) with host $1" >> "robots_$1.txt"
            echo "###Created with robotReporter $version - KellyKeeton.com" >> "robots_$1.txt"
        else
            mv robots.txt "robots_$1.html"
        fi
        # HTML generation using sed
        sed -i "s/#\(.*\)/ \r\n#\1<br>/" "robots_$1.html" # put comments on their own line and tag them with <br>
        sed -i "/Sitemap:/s/: \(.*\)/ <a href=\"\1\">\1<\/a> <br>/" "robots_$1.html" # turn Sitemap: URLs into links
        sed -i "/-agent:/s/$/<br>/" "robots_$1.html" # line-break User-agent: lines
        sed -i "/-delay:/s/$/<br>/" "robots_$1.html" # line-break Crawl-delay: lines
        sed -i "/llow:/s/\/\(.*\)/ <a href=\"http:\/\/$1\/\1\">\1<\/a> <br>/" "robots_$1.html" # turn Allow:/Disallow: paths into clickable links
        echo "<br> Report ran on $(date +%c) with host <a href=\"http://$1\">$1</a> <br> Created with robotReporter $version - <a href=\"http://www.kellykeeton.com\">KellyKeeton.com</a>" >> "robots_$1.html"
        echo "report written to $(pwd)/robots_$1.html"
    else # wget didn't pull the file
        echo "$1 has no robots.txt to report on."
    fi
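
    A quick usage example, in case it helps (example.com is just a placeholder domain, and the output path will be wherever you run it from):

    Code:
    chmod 755 robotReporter.sh        # make it executable first
    ./robotReporter.sh example.com -b
    # -b keeps the raw download alongside the report:
    #   robots_example.com.txt   (the original robots.txt, with an EOF stamp)
    #   robots_example.com.html  (the clickable HTML report)

    Without -b you just get the .html report.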

  2. #2
    Senior Member Thorn's Avatar
    Join Date
    Jan 2010
    Location
    The Green Dome
    Posts
    1,509

    Default

    Quote Originally Posted by spudgunman View Post
A script to convert a robots.txt file into an HTML report. It's just something to speed up research on domains that think hiding a folder from Google keeps people out. I got tired of cutting and pasting during pen tests, so I made this.

    I don't think there's anything out there that does this for you, so I might have created a first. Or not.

    Let me know what you think.
    That looks promising. I have a pen test against a lab server this week, and may try it on that. I'll let you know how this works.
    Thorn
    Stop the TSA now! Boycott the airlines.

  3. #3
    Junior Member imported_spudgunman's Avatar
    Join Date
    Feb 2007
    Posts
    78

    Default

    Quote Originally Posted by Thorn View Post
    That looks promising. I have a pen test against a lab server this week, and may try it on that. I'll let you know how this works.
Cool. It's nothing really "out of the box"; I'm just putting HTML around the txt file for funsies, because other scanners just tell you to "view it" but don't pull it down (see the before/after example below).

    You can test it out on anything, though a lab server typically wouldn't have a robots.txt file, since no bots would be crawling your internal systems.
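
    To show what the sed glue actually does (made-up example, any domain works the same), say the downloaded robots.txt contains:

    Code:
    User-agent: *
    Disallow: /admin/

    After the sed passes, robots_example.com.html holds roughly:

    Code:
    User-agent: *<br>
    Disallow: <a href="http://example.com/admin/">admin/</a> <br>

    So every "hidden" path in the file turns into a link you can click straight from the report.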

  4. #4
    Senior Member Thorn's Avatar
    Join Date
    Jan 2010
    Location
    The Green Dome
    Posts
    1,509

    Default

    Quote Originally Posted by spudgunman View Post
Cool. It's nothing really "out of the box"; I'm just putting HTML around the txt file for funsies, because other scanners just tell you to "view it" but don't pull it down.

    You can test it out on anything, though a lab server typically wouldn't have a robots.txt file, since no bots would be crawling your internal systems.
This lab server is for a blind test, and may or may not have a robots.txt file (or a lot of other stuff). I'm allowed to use any tool on it I want, so if there is a robots file, I'll use this. The HTML will look pretty in the report appendix.
    Thorn
    Stop the TSA now! Boycott the airlines.

  5. #5
    Junior Member
    Join Date
    Jan 2010
    Posts
    42

    Default

Nice attempt. Thanks.

  6. #6
    Senior Member ShadowKill's Avatar
    Join Date
    Dec 2007
    Posts
    908

    Default

    Quote Originally Posted by Fr0zen Sm0ke View Post
Nice attempt. Thanks.

    Attempt



    "The goal of every man should be to continue living even after he can no longer draw breath."

    ~ShadowKill

  7. #7
    Junior Member
    Join Date
    Jan 2010
    Posts
    42

    Default

    Quote Originally Posted by ShadowKill View Post
    Attempt
Oh, I'm sorry. I didn't mean to say that. I tried the script and it's working perfectly.

  8. #8
    Senior Member ShadowKill's Avatar
    Join Date
    Dec 2007
    Posts
    908

    Default

    Quote Originally Posted by Fr0zen Sm0ke View Post
Oh, I'm sorry. I didn't mean to say that. I tried the script and it's working perfectly.
That's better.

    Good to know it's being tested and shown to work. Nice piece of code you have there, brother. Consider it added to my toolkit...



    "The goal of every man should be to continue living even after he can no longer draw breath."

    ~ShadowKill
