Tuesday, September 27, 2011

Hound: Website crawler

Hound is a website crawler I developed a couple of months ago. Today I'm releasing version 0.11, which includes some bug fixes and new features.

The crawler starts by fetching a given base URL. It then analyses the HTML and searches it for other URLs, which are collected and enqueued for analysis.
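
The fetch, parse and enqueue loop described above can be sketched roughly as follows. This is not Hound's code: the names LinkParser, crawl and fetch are hypothetical, and fetch stands in for whatever downloads a page, so the example runs against an in-memory "site" instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href/src attribute values found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(base_url, fetch, limit=100):
    """Breadth-first crawl starting at base_url.

    fetch is a hypothetical callable mapping a URL to its HTML, or None
    when the URL cannot be downloaded.
    """
    queue = deque([base_url])
    seen = {base_url}
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        html = fetch(url)
        visited.append(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# An in-memory "site" used instead of real HTTP requests.
site = {
    "http://example.test/": '<a href="/a">a</a> <img src="logo.png">',
    "http://example.test/a": '<a href="/">home</a> <a href="/b">b</a>',
}
print(crawl("http://example.test/", site.get))
```

The seen set is what keeps the crawler from enqueueing the same URL twice when pages link back to each other.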

The crawler's behaviour is based on plugins, and each kind of plugin affects it in a different way. For example, you can activate Filter Plugins that restrict which URLs the crawler will visit, based on each plugin's behaviour. In most cases you don't want Hound to wander off to Google, Facebook or YouTube, so a HostFilter can be used to make the crawler visit only URLs that belong to the base host.

There are different types of plugins, each executed in a certain phase of the crawling session:
  • Parsers: These are applied to downloaded HTML in order to normalise it, so that the other plugins can collect data without worrying about aspects such as the encoding used or HTML entities.
  • Crawl filters: These are applied to every URL found. If a crawl filter matches a URL, that URL is discarded. Host, extension and network filters are examples.
  • Collect filters: These are applied to collected URLs. A URL that matches is left out of the crawling results. Again, you might not want to include Google links in the results.
  • Form collect filters: These are applied to form tags found in HTML files.
  • Header filters: These are applied to a downloaded file's headers. They are typically used to filter by MIME type.
  • Collectors: The most important plugins. These take care of analysing the HTML files, parsing attributes such as href or src, and feeding the crawler with new URLs.
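
To make the crawl filter idea concrete, here is a minimal sketch of a host filter and an extension filter. The class names, the matches method and apply_crawl_filters are assumptions made for illustration; Hound's real plugin interface may look quite different.

```python
from urllib.parse import urlparse


class HostFilter:
    """Hypothetical crawl filter: matches URLs outside the base host."""

    def __init__(self, base_url):
        self.base_host = urlparse(base_url).hostname

    def matches(self, url):
        return urlparse(url).hostname != self.base_host


class ExtensionFilter:
    """Hypothetical crawl filter: matches URLs whose path ends in a given extension."""

    def __init__(self, extensions):
        self.extensions = tuple(e.lower() for e in extensions)

    def matches(self, url):
        return urlparse(url).path.lower().endswith(self.extensions)


def apply_crawl_filters(urls, filters):
    """Discard every URL matched by at least one filter."""
    return [u for u in urls if not any(f.matches(u) for f in filters)]


filters = [HostFilter("http://example.test/"), ExtensionFilter([".jpg", ".png"])]
found = [
    "http://example.test/about",
    "http://www.google.com/",
    "http://example.test/logo.PNG",
]
print(apply_crawl_filters(found, filters))
```

Only the first URL survives here: the second is on another host and the third has a filtered extension.
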
The file hound.conf contains a list of active plugins, including their arguments. Once you've picked the right configuration, you can start a crawling session by executing:
./hound http://website.to.crawl.com
By default, the output is only written to stdout. If you want to store it in a file, use the -o parameter followed by the file's path. This will write the results both to stdout and to the given file. If you don't want the results written to stdout, use the -n parameter.

Once the crawl session has ended, you can use hound to parse the results. Run the following command to list the URLs found:
./hound -i /tmp/hound.out -p urls
Where /tmp/hound.out is the output file used during the crawling session. You can always parse the results manually, since they're stored in text files. To list all the form tags found, execute:
./hound -i /tmp/hound.out -p forms
Which will print something like:
0 POST http://blablabla/search cms_search --- hidden +++ query --- text +++ commit --- image
1 POST http://blablabla/contact/send article_id --- hidden +++ subject --- text +++ sender_name --- text +++ sender_mail --- text +++ reset --- reset
The number on the left of each line identifies the form; this is how hound encodes forms. To generate the HTML code for a given form id, run:
./hound -i /tmp/hound.out -p form:0
Which will print the html code for the first form.
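
Since the results are plain text, the form listing can also be parsed by hand. The sketch below assumes the layout seen in the sample output above (id, method, action URL, then "name --- type" pairs separated by "+++"); the format is inferred from those two lines only, and parse_form_line is a hypothetical helper, not part of hound.

```python
def parse_form_line(line):
    """Split one line of the forms listing into its parts.

    Assumed layout: "<id> <method> <action> name --- type +++ name --- type ..."
    """
    parts = line.strip().split(" ", 3)
    form_id, method, action = parts[0], parts[1], parts[2]
    rest = parts[3] if len(parts) > 3 else ""
    fields = []
    for chunk in rest.split(" +++ "):
        if not chunk.strip():
            continue  # form without fields
        name, _, field_type = chunk.partition(" --- ")
        fields.append((name.strip(), field_type.strip()))
    return {"id": int(form_id), "method": method, "action": action, "fields": fields}


sample = ("0 POST http://blablabla/search cms_search --- hidden "
          "+++ query --- text +++ commit --- image")
form = parse_form_line(sample)
print(form["action"], form["fields"])
```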

To download hound, visit the project's SourceForge site. It is open source and written in Python, so you can have a look at the code and create new plugins to suit your needs.

Thursday, September 8, 2011

Netstat shellscript

This is a shell script I wrote a couple of months ago, after I found that my router didn't have netstat and I wanted to check its active connections.

It only displays TCP connections, printing the source and destination IP address and port of each one. The script only requires the sh shell interpreter, so it can be used on systems that lack other interpreters such as bash, whose extra features would have made the script simpler.

This is the script:

#!/bin/sh

parse_num() {
    x=$(echo $1 | sed -n 's/0*//p')
    if [ $(echo $x | wc -c) -eq 1 ]
    then
        x=0
    fi
    echo $x
}

hex_to_ip() {
    index=7
    output=''
    while [ $index -gt 0 ]
    do
        end=$(expr $index + 1)
        value=$(printf "%d" "0x$(parse_num $(echo $1 | cut -b $index-$end))")
        output="$output.$value"
        index=$(expr $index - 2)
    done
    echo $(echo $output | cut -b 2-)
}

printf "         Src IP  Src port          Dst IP  Dst port\n"
cat /proc/net/tcp | while read line; do
    srcip=$(hex_to_ip $(echo $line | sed -n 's/^[0-9]*: //p' | sed 's/:.*//p'))
    srcport=$(printf "%d" 0x$(parse_num $(echo $line | sed -n 's/^ *[0-9]*: [0-9,A-F]*://p' | cut "-d " -f 1)))
    dstip=$(hex_to_ip $(echo $line | sed -n 's/^[0-9]*: [0-9,A-F]*:[0-9,A-F]* //p' | sed -n 's/:.*//p'))
    dstport=$(printf "%d" 0x$(parse_num $(echo $line | sed -n 's/^ *[0-9]*: [0-9,A-F]*:[0-9,A-F]* [0-9,A-F]*://p' | cut "-d " -f1)))
    printf "%15s %9s %15s %9s\n" $srcip $srcport $dstip $dstport
done
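
For readers who find the hex juggling hard to follow, the same conversion that hex_to_ip and the port parsing perform can be sketched in a few lines of Python. It assumes, like the script does, a little-endian machine, where /proc/net/tcp stores the IPv4 address with its bytes reversed; decode_endpoint is a name made up for this example.

```python
def decode_endpoint(field):
    """Decode an "ADDRESS:PORT" field from /proc/net/tcp, e.g. "0100007F:1F90".

    Assumes a little-endian host, so the address bytes are read in reverse.
    """
    hex_ip, hex_port = field.split(":")
    octets = [int(hex_ip[i:i + 2], 16) for i in range(0, 8, 2)]
    ip = ".".join(str(octet) for octet in reversed(octets))
    return ip, int(hex_port, 16)


print(decode_endpoint("0100007F:1F90"))  # ('127.0.0.1', 8080)
```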

An output example:

Monday, September 5, 2011

Simple socks5 server in C++

This is a socks5 proxy server I implemented a few months ago. It's written in C++ and, as far as I have tested, works pretty well. There are a couple of things left to do, like handling domain name connection requests or improving thread handling, but it has served its purpose so far.

To compile it, you can use the GNU C++ compiler, linking the application with libpthread:
g++ -o socks5 socks5.cpp -lpthread
By default it only accepts authenticated connections, using the USERNAME define as username, and the PASSWORD define as password. If you want to allow unauthenticated connection requests, add a -DALLOW_NO_AUTH flag when compiling.
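
As a rough illustration of what that authentication step involves on the wire, here is a Python sketch of the SOCKS5 method negotiation (RFC 1928) and the username/password check (RFC 1929). It is not a translation of socks5.cpp: the function names, the allow_no_auth parameter and the order in which methods are preferred are all assumptions.

```python
SOCKS_VERSION = 0x05
METHOD_NO_AUTH = 0x00
METHOD_USER_PASS = 0x02
METHOD_NONE_ACCEPTABLE = 0xFF
AUTH_VERSION = 0x01


def choose_method(greeting, allow_no_auth=False):
    """Build the 2-byte server reply to a client greeting (VER, NMETHODS, METHODS...)."""
    if len(greeting) < 2 or greeting[0] != SOCKS_VERSION:
        raise ValueError("not a SOCKS5 greeting")
    methods = greeting[2:2 + greeting[1]]
    if METHOD_USER_PASS in methods:
        chosen = METHOD_USER_PASS
    elif allow_no_auth and METHOD_NO_AUTH in methods:
        chosen = METHOD_NO_AUTH
    else:
        chosen = METHOD_NONE_ACCEPTABLE
    return bytes([SOCKS_VERSION, chosen])


def check_credentials(request, username, password):
    """Parse a username/password request (VER, ULEN, UNAME, PLEN, PASSWD) and build the reply."""
    if len(request) < 2 or request[0] != AUTH_VERSION:
        raise ValueError("not a username/password request")
    ulen = request[1]
    uname = request[2:2 + ulen]
    plen = request[2 + ulen]
    passwd = request[3 + ulen:3 + ulen + plen]
    ok = uname == username and passwd == password
    # Status 0x00 means success; any other value means failure.
    return bytes([AUTH_VERSION, 0x00 if ok else 0x01])


print(choose_method(b"\x05\x02\x00\x02"))
print(check_credentials(b"\x01\x04user\x04pass", b"user", b"pass"))
```

The allow_no_auth parameter plays the role that the ALLOW_NO_AUTH compile flag plays in the C++ server.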

The proxy server listens on port 5555. Change the SERVER_PORT define if you want it to wait for connections on another port.

The source code can be downloaded from github: https://github.com/mfontanini/Programs-Scripts/blob/master/socks5/socks5.cpp.

Hope you find it useful!

Saturday, September 3, 2011

ARP spoofing using libtins

This is an example program I created to test libtins, a library I've been developing with some colleagues. This library allows the user to forge packets in C++, from the link layer up to the transport or even application layer, by creating their own PDU stack and sending it without worrying about raw sockets, endianness or low-level socket handling.

To use this program, compile it and link it with libtins. Using the GNU C++ compiler, this can be done as follows:
g++ -o arpspoofing arpspoofing.cpp -ltins

And then execute it using the gateway and victim's IP addresses as arguments, for example:
This code snippet is included as an example in libtins source code, inside the examples folder. You can have a look at it online here.

Friday, September 2, 2011

Password combination generator

After several months without mounting an encrypted filesystem, I found out I had forgotten its passphrase. I remembered the words I had used, but not the case of each character, nor which characters I'd replaced with numbers or symbols ('o' with '0', 's' with '5', etc.). Moreover, I didn't remember which symbol I'd used to separate the words (I could have used '_', '!', '#', etc.).

So after spending half an hour trying out combinations of upper and lower case characters, digits and symbols by hand, I came up with a script to do this automatically.

This script expects several lower case words as arguments. It prints them in the same order, but varies their case and transforms characters into numbers or symbols using a conversion map. There's also a '-e' parameter which executes a given command for each combination. The command must contain the string "{0}", which marks where each combination will be inserted.

For example, in my quest to mount my encrypted file, I used:
./dictionary.py -e "truecrypt -p {0} --non-interactive encrypted.file" one two three
Where "one two three" are the words which will be used to perform character combinations.
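
Before diving into the script, the core idea (every character has a small set of variants, and a candidate is one pick per character) can be sketched with itertools.product. Note that this is a different, simplified technique from the incremental counter the script uses: the map below does not chain replacements such as '1' to '!', and the names LEET_MAP, word_variants and phrase_variants are made up for this example.

```python
from itertools import product

# Simplified conversion map (an assumption; the script's own map is richer).
LEET_MAP = {'a': '4', 'e': '3', 'i': '1', 'o': '0', 's': '5'}


def char_variants(char):
    """Return the variants of one character: itself, upper case, leet form."""
    variants = [char, char.upper()]
    if char in LEET_MAP:
        variants.append(LEET_MAP[char])
    return list(dict.fromkeys(variants))  # drop duplicates, keep order


def word_variants(word):
    """Yield every combination of per-character variants for a word."""
    for combo in product(*(char_variants(c) for c in word)):
        yield ''.join(combo)


def phrase_variants(words, separators=('', '_', '!')):
    """Yield every phrase built from word variants joined by each separator choice."""
    if not words:
        return
    per_word = [list(word_variants(w)) for w in words]
    for parts in product(*per_word):
        for seps in product(separators, repeat=len(parts) - 1):
            pieces = [parts[0]]
            for sep, part in zip(seps, parts[1:]):
                pieces.append(sep + part)
            yield ''.join(pieces)


print(list(word_variants("so")))
```

The number of candidates grows multiplicatively with every character, which is why automating the search beats typing them in by hand.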

This is the script:

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.

import sys, os

class Word:
    # Dictionary used for conversions between characters, other than
    # simple lower-to-upper conversions. Add here as many as you want,
    # as long as you don't create a cycle ;).
    leet_map = {'A' : '4',
                'E' : '3',
                'I' : '1',
                'O' : '0',
                '1' : '!',
                'S' : '5',
                '5' : '$'}

    # The characters to append after each word. Only one of these will
    # be appended at a time.
    appended_list = [' ', '!', '_']
    # Appended characters to avoid if this Word is last in the sequence.
    # By default, no spaces will be appended at the end of the phrase,
    # however, they will be included in the middle of it.
    appended_list_avoid = [' ']
    def __init__(self, base, is_last = False):
        self.base = list(base)
        self.current = list(base)
        self.current_index = 0
        self.is_last = is_last
        self.appended = ''
        self.done = False

    # Increment a particular character. Add any conversion rules in here.
    def _next_char(self, char):
        if char.isalpha():
            if char.islower():
                return char.upper()
            else:
                if char in Word.leet_map:
                    return Word.leet_map[char]
                return char
        else:
            return char if char not in Word.leet_map else Word.leet_map[char]

    # Increment the word. Should only be called if has_next returns True.
    def next(self):
        this_char = self._next_char(self.current[self.current_index])
        if this_char == self.current[self.current_index]:
            # No more conversions for this char, reset previous ones.
            for i in range(self.current_index + 1):
                self.current[i] = self.base[i]
            self.current_index += 1
            if self.current_index < len(self.base):
                # Carry: increment the next character, then start over
                # from the first one.
                self.next()
                self.current_index = 0
            else:
                # Find the current appended character
                try:
                    index = Word.appended_list.index(self.appended)
                except ValueError:
                    index = -1
                if index == len(Word.appended_list) - 1:
                    # Appended char cannot be incremented. We're done.
                    self.done = True
                else:
                    try:
                        if self.is_last:
                            while Word.appended_list[index+1] in Word.appended_list_avoid:
                                index += 1
                        # Increment the appended character.
                        self.appended = Word.appended_list[index+1]
                        self.current_index = 0
                    except IndexError:
                        # Only avoided characters were left.
                        self.done = True
        else:
            self.current[self.current_index] = this_char

    # Returns boolean indicating whether this Word can be incremented.
    def has_next(self):
        return not self.done

    # Returns the current string.
    def get_current(self):
        return ''.join(self.current) + self.appended

    # Resets every field in this Word.
    def reset(self):
        self.current = list(self.base)
        self.current_index = 0
        self.appended = ''
        self.done = False

class Wordlist:
    def __init__(self, words):
        self.words = []
        for i in words[:-1]:
            self.words.append(Word(i))
        self.words.append(Word(words[-1], True))

    # Increment the words one step.
    def _inc(self, index):
        # No words to increment left
        if index == len(self.words):
            return index
        if self.words[index].has_next():
            self.words[index].next()
            if not self.words[index].has_next():
                # We've got carry. Reset words[0:index+1],
                # then increment words[index+1] and propagate.
                for i in range(index+1):
                    self.words[i].reset()
                return self._inc(index+1)
            else:
                return index
        else:
            return index + 1

    def do_action(self, to_exec):
        line = ''.join(map(lambda x: ''.join(x.get_current()), self.words))
        if len(to_exec) == 0:
            # No command given: print the combination.
            print(line)
        else:
            os.system(to_exec.format('"' + line + '"'))

    def generate(self, to_exec):
        i = 0
        while i < len(self.words):
            # Emit the current combination, then move on to the next one.
            self.do_action(to_exec)
            i = self._inc(0)

def usage():
    print(' Usage: ' + sys.argv[0] + ' [-e EXEC] <WORD1> [WORD2] [WORD3]\n')
    print(' If -e option is used, then the next parameter is the command to execute')
    print(' for each word combination. The command must contain {0} where each ')
    print(' combination will be included. Example: "echo {0}"\n')
    print(' If no command is given, then each combination will be printed to stdout')

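if __name__ == '__main__':
    if len(sys.argv) == 1 or '-h' in sys.argv:
        usage()
        sys.exit(1)

    args = sys.argv[1:]
    to_exec = ''

    if args[0] == '-e':
        # -e needs both a command and at least one word.
        if len(args) <= 2:
            usage()
            sys.exit(1)
        to_exec = args[1]
        args = args[2:]

    words = Wordlist(args)
    words.generate(to_exec)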

Hope you find it useful!