Thread: Data Mining

  1. #1
    Just burned his ISO
    Join Date: Jun 2010
    Posts: 5

    Data Mining

    Hello Everyone,

    I went through the forum and found a thread dated back to February about data mining, but it seemed to completely derail into something different entirely. I thought a new thread was in order.

    I am running into a similar situation throughout testing that I would like to improve upon. I generally have a list of exploited boxes that I can actively connect to, browse, and execute arbitrary code on. The issue is that it is too labor-intensive to manually rummage through shares and drives looking for relevant data, and I'd like an automated process. I'm looking for something more robust than the dir command or grep, because I'd like to see more customized results.

    For example, my original idea was to have regex structures with logical relationships to one another. The presence of these relationships would imply a higher degree of confidence than matching the terms independently. The hope is to reduce false positives and generate some sort of priority system. I was originally going to use meterpreter and write an extension using the existing framework and client blob, but I'm running into some issues with that approach.
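
    A minimal Python sketch of the "related regexes" idea, for illustration only. The rule names, patterns, weights, and the `score_text` helper are all hypothetical; this is not an existing tool or a meterpreter API. Each rule scores on its own, and a bonus is added only when every rule in a related group matches the same text.

```python
import re
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: "re.Pattern"
    weight: int


# Hypothetical rules and weights, chosen only to illustrate the scoring.
RULES = [
    Rule("ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 3),
    Rule("ssn_label", re.compile(r"\b(?:ssn|social security)\b", re.I), 1),
    Rule("dob_label", re.compile(r"\b(?:dob|date of birth)\b", re.I), 1),
]

# Relationships: if every rule in the group matched, add the bonus.
RELATIONS = [
    (frozenset({"ssn", "ssn_label"}), 4),
    (frozenset({"ssn", "dob_label"}), 2),
]


def score_text(text: str) -> Tuple[int, Set[str]]:
    """Return (score, names of matched rules) for one blob of text."""
    matched = {rule.name for rule in RULES if rule.pattern.search(text)}
    score = sum(rule.weight for rule in RULES if rule.name in matched)
    score += sum(bonus for group, bonus in RELATIONS if group <= matched)
    return score, matched
```

    Sorting files by the returned score, highest first, gives a crude priority list. The bonus for co-occurring patterns is what separates a labelled number from a bare string of digits that merely has the right shape.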

    Has anyone come across a tool like this before? I've been known to reinvent the wheel, so I'm reaching out here before I start development. Does anyone see problems/concerns with this strategy?

    Thanks a lot!

  2. #2
    Very good friend of the forum hhmatt
    Join Date: Jan 2010
    Posts: 660

    Re: Data Mining

    Quote: Originally Posted by Blu3Robot
    I generally have a list of exploited boxes that I can actively connect to, browse, and execute arbitrary code on.
    So what you're saying is that you have a list of exploited machines without the user's permission and you don't have enough time during the day to collect all their personal data?

  3. #3
    Super Moderator lupin
    Join Date: Jan 2010
    Posts: 2,943

    Re: Data Mining

    Quote: Originally Posted by hhmatt
    So what you're saying is that you have a list of exploited machines without the user's permission and you don't have enough time during the day to collect all their personal data?
    I interpreted this as a common situation the OP ran into during penetration testing (which I assumed to mean legitimate, authorised penetration testing). Since we do tend to get a lot of skiddies here posting about doing things they shouldn't: OP, for our peace of mind, can you clarify the conditions under which you find yourself in this situation?

    Maybe/maybe not related, but I was recently reading a review of Metasploit Express which mentioned that it had a "Loot" feature to collect a bunch of standard information from 'sploited boxes. The review was a bit fuzzy on the exact details, but perhaps this is something you could use or extend if you get Metasploit Express. It may not enable the type of specific granular control you seem to be after, but it might be worth examining.
    Capitalisation is important. It's the difference between "Helping your brother Jack off a horse" and "Helping your brother jack off a horse".

    The Forum Rules, Forum FAQ and the BackTrack Wiki... learn them, love them, live them.

  4. #4
    Very good friend of the forum hhmatt
    Join Date: Jan 2010
    Posts: 660

    Re: Data Mining

    I am failing to think of an instance where mass data mining would be a legitimate part of a penetration test (at least from an ethical standpoint). I was hoping either the OP or someone else could shed some more light on this.

  5. #5
    Super Moderator lupin
    Join Date: Jan 2010
    Posts: 2,943

    Re: Data Mining

    Quote: Originally Posted by hhmatt
    I am failing to think of an instance where mass data mining would be a legitimate part of a penetration test (at least from an ethical standpoint). I was hoping either the OP or someone else could shed some more light on this.
    In a client-side test where you have managed to get a number of machines under your control, you may want to gather information from those machines in order to properly demonstrate the risk of having vulnerable client systems. It would depend on the scope and purpose of the test of course, but I can see legitimate reasons for doing that. Pentesters routinely grab craploads of PII or credit card numbers in order to demonstrate risk to server-style applications, so I don't see why gathering targeted information from client systems is necessarily any different if that's what the client requesting the test is concerned about.

  6. #6
    Senior Member Thorn
    Join Date: Jan 2010
    Location: The Green Dome
    Posts: 1,509

    Re: Data Mining

    Quote: Originally Posted by lupin
    In a client-side test where you have managed to get a number of machines under your control, you may want to gather information from those machines in order to properly demonstrate the risk of having vulnerable client systems. It would depend on the scope and purpose of the test of course, but I can see legitimate reasons for doing that. Pentesters routinely grab craploads of PII or credit card numbers in order to demonstrate risk to server-style applications, so I don't see why gathering targeted information from client systems is necessarily any different if that's what the client requesting the test is concerned about.
    Agreed. Sometimes you're seeking specific data -such as credit card numbers- and combing through different machines can be a pain in the butt to try and locate what it is you need to complete the job. However, as hhmatt points out, the line "I generally have a list of exploited boxes..." is certainly open to interpretation.
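
    For the credit card case, one common way to cut down false positives is to run every digit string that looks like a card number through the Luhn checksum before reporting it. The sketch below is only an illustration under stated assumptions: the `CANDIDATE` pattern (13 to 16 digits, optionally separated by single spaces or hyphens) and the helper names are made up here, and hits are masked to the last four digits so the report itself does not become a second copy of the data.

```python
import re
from typing import List

# Assumed candidate shape: 13 to 16 digits, optional single space/hyphen separators.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_candidates(text: str) -> List[str]:
    """Return masked candidates (last four digits visible) that pass the Luhn check."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_ok(digits):
            hits.append("*" * (len(digits) - 4) + digits[-4:])
    return hits
```

    A passing checksum only means the number is well-formed, not that it is a live card, so treat the hits as leads to verify within the agreed scope.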

    Aside from that, the phrase "data mining" isn't what I use to describe seeking target data in a pen test. To me, data mining is more the process of retrieving previously unknown or undetermined data by combining information from unrelated databases. For example, combining a database of car sales in a geographic area with a database of preschool-age children in the same area to come up with a data set of potential families looking to purchase new child car seats.
    Thorn
    Stop the TSA now! Boycott the airlines.

  7. #7
    Super Moderator lupin
    Join Date: Jan 2010
    Posts: 2,943

    Re: Data Mining

    Quote: Originally Posted by Thorn
    Aside from that, the phrase "data mining" isn't what I use to describe seeking target data in a pen test. To me, data mining is more the process of retrieving previously unknown or undetermined data by combining information from unrelated databases. For example, combining a database of car sales in a geographic area with a database of preschool-age children in the same area to come up with a data set of potential families looking to purchase new child car seats.
    Yes, I agree; the term already has an accepted definition, which you have described quite well. I was ready to churn out a "not related to BT" infraction when I read the title (but obviously changed my mind after reading the post).

  8. #8
    Very good friend of the forum hhmatt
    Join Date: Jan 2010
    Posts: 660

    Re: Data Mining

    Thank you very much, Lupin and Thorn. I do remember a few years back when the term data mining started to become popular, mostly as a form of (usually illegal) information gathering for things such as marketing and profiling. I can see how the term can be used both ways, since it's so loosely defined. I also have a better understanding of where this fits into a pentest now, mainly according to the terms agreed with the client. I can see how a client may not understand how much risk their company runs of losing mass amounts of private information.

  9. #9
    Just burned his ISO
    Join Date: Jun 2010
    Posts: 5

    Quote: Originally Posted by lupin
    In a client-side test where you have managed to get a number of machines under your control, you may want to gather information from those machines in order to properly demonstrate the risk of having vulnerable client systems. It would depend on the scope and purpose of the test of course, but I can see legitimate reasons for doing that. Pentesters routinely grab craploads of PII or credit card numbers in order to demonstrate risk to server-style applications, so I don't see why gathering targeted information from client systems is necessarily any different if that's what the client requesting the test is concerned about.
    Hi Everyone,

    Thank you for all the responses. I can tell that I need to be very careful with my wording here.

    My situation is exactly like Lupin described. In particular, I am testing a very large number of hosts from inside the firewall (with express permission from the owner(s)). The goal is to assess the level of impact that a given compromised host can have. For example, if I can somehow exploit an Apache server, then that is clearly not desirable. The real impact comes from what I can obtain from that server, if anything, and any additional capabilities branching from that. If that server had PII, then it would have a higher impact and be more critical to address first.

    I saw a few other questions that I will answer in separate posts.

    Thanks!

    Quote: Originally Posted by Thorn
    Agreed. Sometimes you're seeking specific data -such as credit card numbers- and combing through different machines can be a pain in the butt to try and locate what it is you need to complete the job. However, as hhmatt points out, the line "I generally have a list of exploited boxes..." is certainly open to interpretation.

    Aside from that, the phrase "data mining" isn't what I use to describe seeking target data in a pen test. To me, data mining is more the process of retrieving previously unknown or undetermined data by combining information from unrelated databases. For example, combining a database of car sales in a geographic area with a database of preschool-age children in the same area to come up with a data set of potential families looking to purchase new child car seats.
    Hi Thorn,

    Let me clarify what I meant by a list of exploited boxes, as I may not have considered the ambiguity. That list refers to boxes that have been identified through the vulnerability discovery phase. This comes after the network mapping phase, and both phases are done under the watchful eye of the IT and IDS teams.

    As for the term data mining, that was probably careless on my part. I hope we can get past the semantics though and try to come up with some ideas.

    Thanks
    Last edited by lupin; 06-24-2010 at 12:00 AM. Reason: Merging...

  10. #10
    Senior Member Thorn
    Join Date: Jan 2010
    Location: The Green Dome
    Posts: 1,509

    Re: Data Mining

    OK, now that we've cleared that up...

    First of all, some purists might argue that what you're asking isn't really about pen testing per se, but more about the goal after you've penetrated the system/network.

    Personally, however, I happen to think that it's a very important piece of what we do. Finding a given vulnerability might impress someone in the IT department, and it might be rectified (someday) when time and money are available, but telling the CEO you found a vulnerability on port 173 of the server will make him start yawning in the middle of your findings presentation.

    On the other hand, getting some information that is vital to the company (e.g. customers' credit card numbers) is the kind of thing that makes C-level people sit up and take notice, and you can see them get heartburn right in front of you as they think about having to explain the potential loss to the board of directors. That's the kind of finding that will actually get things fixed.

    However, my impression is that you have identified some potential vulnerabilities, but don't know exactly what you want to find.

    What you need to find can only be answered by determining the goal, and that is determined by asking, "What kind(s) of things can the client not afford to lose without disastrous consequences?" It may be one type of data, say, the big proprietary company secret (think of the formula for Coca-Cola); multiple data types, such as patient health data and/or patient credit cards; or something that isn't data at all, such as taking over or disrupting the process control for a chemical plant.

    Of course, once you've determined what the goal is, you have to ask, "Where does it live?" After all, looking at a secretary's PC and reading her tweets about how drunk she got last weekend and what she did with the five sailors may be entertaining (look for pictures!), but it isn't going to help you track down spreadsheets with the CFO's projections for next year's secret plans for a potential stock split.

    So ask yourself: are you looking at the CIO's workstation, or the workstation of an engineering team? Small servers running Windows or *nix? How about IBM iSeries or even AS/400 midrange systems? (Yes, there are still AS/400s out there holding a lot of data...) Or SCADA PLCs and RTUs?

    Now that you've got those questions answered, you can determine what tools (if any) you can use. It may be a matter of using a commercial tool such as Tripwire; you may be able to just do a simple command line wildcard search for a particular file type; or perhaps you'll need to craft some custom packets using Scapy to make an RTU turn off a pump.
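
    The "simple wildcard search for a particular file type" case can be sketched in a few lines of Python using only the standard library. The `find_files` name and the example patterns are assumptions made for illustration; the matching is case-insensitive on the file name so that `BUDGET.XLS` and `budget.xls` are both caught.

```python
import fnmatch
import os
from typing import Iterable, Iterator


def find_files(root: str, patterns: Iterable[str]) -> Iterator[str]:
    """Yield paths under root whose file name matches any glob pattern."""
    lowered = [p.lower() for p in patterns]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare lower-cased name and pattern for a case-insensitive match.
            if any(fnmatch.fnmatchcase(name.lower(), p) for p in lowered):
                yield os.path.join(dirpath, name)
```

    Something like `find_files(share_root, ["*.xls", "*.xlsx"])`, where `share_root` is whatever mounted share or drive is in scope, walks the tree once and lists only the spreadsheets, which is usually a far smaller set to review by hand.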

    Once you answer those questions, "What is the goal?" and "Where does the data we need to achieve the goal live?", you can start to determine which tools you'll need. But until you have some direction, searching for any useful data will be akin to searching for a black cat in a cellar at midnight without a flashlight.
    Last edited by Thorn; 06-24-2010 at 12:24 PM. Reason: Typos; cleaned up some lines.
