Sunday, October 21, 2012

libtins v0.2 released

After coding and testing libtins a lot in the past months, we're proud to announce the release of the 0.2 version. libtins is a network packet crafting and sniffing library. It allows you to forge packets with very little effort, forgetting about each protocol data unit's endianness, internal representation, etc.

In this release, there have been several changes:

  • IP and hardware addresses can now be handled easily. Instead of using pointers or integral values to represent them, there's now a class which abstracts each of them, making it easy to create them from their string representations and compare them. You can now use hardware addresses as keys inside std::maps, or insert them in std::sets.
  • Added support for big endian architectures. We've worked hard to make sure every getter, setter and function available handles endianness correctly. Now you can create tools and run them on both little and big endian architectures, without worrying about it.
  • Generalized and simplified some interfaces. The Sniffer class required you to inherit a class from an AbstractSnifferHandler just to perform a call to Sniffer::sniff_loop. Now this function takes a template functor argument and calls it every time a new packet is sniffed off the wire, making your life a lot easier.
  • Network interfaces used to be handled internally by each PDU. Classes would usually take a std::string, look up the corresponding interface index and store it, and also provide overloads that took the integral index directly. Now there's a NetworkInterface class which does this job internally, so PDUs take objects of this type rather than providing several overloads (which, in cases like the Dot11 class hierarchy, reduces the boilerplate code significantly).
  • You can now follow TCP streams on the fly. There's a TCPStreamFollower class that sniffs packets (either from a network interface or a pcap file) and reassembles TCP streams, executing a callback whenever there's data available.
  • We're planning to allow decrypting any 802.11 encrypted data frame on the fly. In this release, by providing tuples (bssid, password), you can decrypt WEP-encrypted frames while sniffing, in a completely transparent way. I'll soon add an example on the libtins website showing how to do that.
  • We've added support for some new PDUs: Null/Loopback, IEEE 802.3, LLC and DNS.
  • You can now read and write pcap files, using a very simple interface. 
  • Finally, there's been a huge refactoring of the entire code base. The code has been RAII'd a lot: there are fewer pointers moving around, and more automatic storage objects and references.
In case you want to try the library out, please visit its website and download the latest version.
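
To give a rough idea of why an address class with comparison operators is convenient, here is a minimal sketch. HwAddressSketch is a hypothetical stand-in written for this example, not the actual libtins class or its API. It only shows construction from a string representation and use as a std::map key.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a hardware (MAC) address class; not the libtins API.
class HwAddressSketch {
public:
    // Parses "aa:bb:cc:dd:ee:ff". The check is lenient: it only verifies that
    // exactly six hexadecimal groups separated by ':' were read.
    explicit HwAddressSketch(const std::string& text) {
        unsigned int parts[6];
        char trailing;
        const int matched = std::sscanf(text.c_str(), "%2x:%2x:%2x:%2x:%2x:%2x%c",
                                        &parts[0], &parts[1], &parts[2],
                                        &parts[3], &parts[4], &parts[5], &trailing);
        if (matched != 6)
            throw std::invalid_argument("malformed hardware address");
        for (std::size_t i = 0; i < 6; ++i)
            bytes_[i] = static_cast<std::uint8_t>(parts[i]);
    }

    bool operator==(const HwAddressSketch& rhs) const { return bytes_ == rhs.bytes_; }
    bool operator!=(const HwAddressSketch& rhs) const { return bytes_ != rhs.bytes_; }

    // Lexicographic ordering is what lets the class be a std::map/std::set key.
    bool operator<(const HwAddressSketch& rhs) const { return bytes_ < rhs.bytes_; }

private:
    std::array<std::uint8_t, 6> bytes_;
};
```

With something like this, a std::map<HwAddressSketch, std::string> can associate a name with each address, and two addresses built from the same string compare equal.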

Thursday, August 30, 2012

Compile time MD5 using constexpr

Today, someone on Stack Overflow asked how to perform compile-time hashing in C++. After providing a small, naive example in my answer, I thought it would actually be interesting to implement some well-known hashing algorithm using C++11 features, such as constexpr. So I gave MD5 a try, and after some hours of trying stuff, I was able to create compile-time MD5 hashes.

The MD5 hashing algorithm.

Implementing the algorithm itself was not as hard as I thought it would be. The algorithm is roughly like this:
  1. It starts with four 32-bit integers which contain predefined values.
  2. The algorithm contains 4 "rounds". In each of them, a different function is applied to those integers and the input string.
  3. The result of each round serves as input to the next one.
  4. Add the 4 resulting 32-bit integers to the original, predefined integer values.
  5. Interpret the resulting integers as a chunk of 16 bytes.
That's it. Note that I've stripped the algorithm down so that it only handles short strings. A real implementation would iterate several times, doing the same thing over the entire buffer. But since nobody is going to hash large strings at compile time, I've simplified the algorithm a little bit.
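
To make the "rounds" a bit more concrete, here is a small sketch of the building blocks as C++11 constexpr functions. The initial values, the four auxiliary functions and the step formula come from RFC 1321. The namespace and function names are made up for this example, and this is not the code from my implementation.

```cpp
#include <cassert>
#include <cstdint>

namespace md5_sketch {

// Initial values of the four 32-bit state words (RFC 1321).
constexpr std::uint32_t initial_a = 0x67452301u;
constexpr std::uint32_t initial_b = 0xefcdab89u;
constexpr std::uint32_t initial_c = 0x98badcfeu;
constexpr std::uint32_t initial_d = 0x10325476u;

// One auxiliary function per round.
constexpr std::uint32_t aux_f(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return (x & y) | (~x & z);
}
constexpr std::uint32_t aux_g(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return (x & z) | (y & ~z);
}
constexpr std::uint32_t aux_h(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return x ^ y ^ z;
}
constexpr std::uint32_t aux_i(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return y ^ (x | ~z);
}

// Rotates value left by bits positions; valid for 0 < bits < 32.
constexpr std::uint32_t rotate_left(std::uint32_t value, unsigned bits) {
    return (value << bits) | (value >> (32 - bits));
}

// A single step of round 1: a = b + ((a + F(b, c, d) + x + t) <<< s),
// where x is a 32-bit word of the input block and t is a per-step constant.
constexpr std::uint32_t round1_step(std::uint32_t a, std::uint32_t b,
                                    std::uint32_t c, std::uint32_t d,
                                    std::uint32_t x, unsigned s, std::uint32_t t) {
    return b + rotate_left(a + aux_f(b, c, d) + x + t, s);
}

} // namespace md5_sketch
```

Since every function is a single return statement, all of them can be evaluated inside a static_assert or used to initialize a constexpr variable.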

The implementation

So basically the implementation was not that hard: some template metaprogramming and several constexpr functions. Since there was a pattern in the way the arguments were provided as input to the functions in each round, I used a specialization which avoided lots of repeated code.

The worst part was generating the input for that algorithm. The input is not just the string which is to be hashed. The steps to generate that input are roughly as follows:
  1. Create a buffer of 64 bytes, filled with zeros.
  2. Copy the input string into the start of the buffer.
  3. Store the byte 0x80 at buffer[size_of_input_string].
  4. Store the value size_of_input_string * 8 (the input length in bits) at buffer[56].
That's all. The algorithm will only work with strings of no more than 31 bytes, since the length in bits has to fit in the single byte at buffer[56] (31 * 8 = 248). It could be generalized by also filling in the appropriate bytes at buffer[57:60], but that was not my objective.
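
The steps above can be sketched as a plain runtime function. This is only an illustration of the buffer layout: the real implementation has to do the same thing at compile time, and the name make_md5_block is invented for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Builds the single 64-byte block described above for a short input string.
std::array<unsigned char, 64> make_md5_block(const std::string& input) {
    // Longer inputs would need more than one length byte (and possibly more blocks).
    if (input.size() > 31)
        throw std::length_error("input must be at most 31 bytes long");

    std::array<unsigned char, 64> block;
    block.fill(0);                                              // step 1
    for (std::size_t i = 0; i < input.size(); ++i)              // step 2
        block[i] = static_cast<unsigned char>(input[i]);
    block[input.size()] = 0x80;                                 // step 3
    block[56] = static_cast<unsigned char>(input.size() * 8);   // step 4
    return block;
}
```

For the input "abc", this yields 'a', 'b', 'c', 0x80, then zeros up to index 55, and the value 24 at index 56.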

In order to achieve this buffer initialization, I had to somehow decompose the input string into characters, and then join them and those special bytes into an array which could be used at compile time. To do this, I implemented a constexpr_array, which is just a wrapper around a built-in array, but provides a constexpr data() member, something that std::array does not have (I'm not really sure why):
template<typename T, size_t n>
struct constexpr_array {
    const T array[n];

    // Unlike std::array::data() in C++11, this can be used in constant expressions
    constexpr const T *data() const {
        return array;
    }
};
The decomposition ended up pretty simple, but I had to struggle to figure out how the hell to implement it.
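
For the curious, one common C++11 way to do this kind of decomposition is to expand a pack of indices over the string literal. The sketch below shows that technique. It is not necessarily what my implementation does, and indices, make_indices, to_array and to_array_impl are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>

template<typename T, std::size_t n>
struct constexpr_array {
    const T array[n];
    constexpr const T* data() const { return array; }
};

// C++11 has no std::index_sequence, so a tiny replacement is defined here.
template<std::size_t... I>
struct indices {};

// make_indices<N>::type is indices<0, 1, ..., N - 1>.
template<std::size_t N, std::size_t... I>
struct make_indices : make_indices<N - 1, N - 1, I...> {};

template<std::size_t... I>
struct make_indices<0, I...> {
    typedef indices<I...> type;
};

// Expands text[0], text[1], ..., text[n - 1] into the array initializer.
template<std::size_t n, std::size_t... I>
constexpr constexpr_array<char, n> to_array_impl(const char (&text)[n], indices<I...>) {
    return constexpr_array<char, n>{{ text[I]... }};
}

// Copies a string literal (including its terminating null) into a constexpr_array.
template<std::size_t n>
constexpr constexpr_array<char, n> to_array(const char (&text)[n]) {
    return to_array_impl(text, typename make_indices<n>::type());
}
```

The same pack expansion can append extra elements after text[I]..., which is roughly how the padding bytes could be joined with the characters of the string.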

Finally, the interface for the compile time MD5 would be the following:
template<size_t n> constexpr md5_type md5(const char (&data)[n]);

The typedef for the function's result is the following:
typedef std::array<char, 16> md5_type;

Note that everything in the implementation resides in namespace ConstexprHashes.

As an example, this code generates a hash and prints it:
#include <cstddef>
#include <iostream>
#include "md5.h"

int main() {
    // The hash is computed at compile time
    constexpr auto value = ConstexprHashes::md5("constexpr rulz");
    std::cout << std::hex;
    // Print every byte as two hexadecimal digits
    for(auto v : value) {
        if(((size_t)v & 0xff) < 0x10)
            std::cout << '0';
        std::cout << ((size_t)v & 0xff);
    }
    std::cout << std::endl;
}

This prints out: "b8b4e2be16d2b11a5902b80f9c0fe6d6", which is the right hash for "constexpr rulz".

Unluckily, the only compiler that is able to compile this is clang 3.1 (I haven't tested it on newer versions). GCC doesn't like the code, and believes some constexpr stuff is actually non-constexpr. I'm not 100% sure that clang is the one that's right, but it looks like it is.

You can find the MD5 implementation here. Maybe I'll implement SHA1 soon, whenever I have some time.

Well, this was the first time I used constexpr, it was a nice experience. Implementing this using only TMP would be an extreme pain in the ass.

Wednesday, August 22, 2012

small integer C++ wrapper

Currently, I'm working on libtins, a library I'm developing with a friend. This library mainly contains classes that abstract PDUs, among other stuff.

Since PDU classes are basically a wrapper over the protocol data units handled by the operating system, getters and setters are provided to enable the user to modify the fields in them. For example, there is a TCP::data_offset method which takes a value of type uint8_t and stores it in the TCP header's data offset field, which is 4 bits wide.

While developing unit tests for this library, I would use some random number to initialize those fields, and then use the corresponding getter to check that the same number came out of it. A problem that I faced several times is that there is no way to indicate that, while a setter method takes a uint8_t, the field being set internally is actually 4 bits wide, so any value larger than 15 would overflow, leading to the wrong number being stored. We really want to be able to detect those ugly errors.

One solution would be to check, in each setter, whether the provided value is larger than 2 ** n - 1, where n is the width of the field being set, and raise an exception if so. This has the drawback that every setter has to make the appropriate check, using the appropriate power of 2, and throw the same exception. This boilerplate solution already looks nasty.

So I came up with a better solution. C++'s implicit conversions can do magic for us. All we need is a class that wraps a value of an arbitrary bit width (up to 64 bits) and performs the corresponding checks while being initialized.

The wrapper class is called small_uint (I thought about providing support for signed integral types, but finally dropped that option. It'd be easy to implement though). The class declaration is this one:

template<size_t n> class small_uint;

The template non-type parameter n indicates the length, in bits, of the field. This class should be optimal, meaning that it should use the smallest integral type that can hold a value of that width.

Internally, a compile-time switch is performed to check which is the best underlying type, meaning the type in which storing the field wastes the least space. For example, for 7 bit small_uints, a uint8_t is used, while 11 bit fields will be stored in a member variable of type uint16_t. This underlying type is typedef'ed as the repr_type.

There is a constructor which takes a repr_type and raises an exception if it is larger than 2 ** n - 1, and a user-defined conversion to repr_type which simply returns the stored value. Since no arithmetic operations are performed with these integers, there are no such operators defined. I don't really know if defining them would make sense. If you wanted to perform such operations, you would just use standard arithmetic types. Only operator== and operator!= have been defined.
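
To illustrate the idea, here is a minimal sketch of such a class. Note that it takes the easy route and uses C++11 (std::conditional, constexpr), which is exactly what the real implementation avoids in order to stay C++03 compatible. The equality operators are left out for brevity, and tcp_header_sketch and set_data_offset are hypothetical names used only to show a setter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

template<std::size_t n>
class small_uint {
    static_assert(n >= 1 && n <= 64, "width must be between 1 and 64 bits");
public:
    // Smallest unsigned type that can hold n bits.
    typedef typename std::conditional<(n <= 8), std::uint8_t,
        typename std::conditional<(n <= 16), std::uint16_t,
        typename std::conditional<(n <= 32), std::uint32_t, std::uint64_t
        >::type>::type>::type repr_type;

    // Largest value that fits in n bits, that is, 2 ** n - 1.
    static constexpr repr_type max_value() {
        return static_cast<repr_type>(~std::uint64_t(0) >> (64 - n));
    }

    // Implicit on purpose, so setters can be called with plain integers.
    // Caveat: the argument is converted to repr_type before the check runs,
    // so a value wider than repr_type is truncated first.
    small_uint(repr_type value) : value_(value) {
        if (value > max_value())
            throw std::overflow_error("value is too large for the field");
    }

    operator repr_type() const { return value_; }

private:
    repr_type value_;
};

// Hypothetical header with a 4-bit field, and a setter that relies on small_uint.
struct tcp_header_sketch {
    std::uint8_t data_offset : 4;
};

inline void set_data_offset(tcp_header_sketch& header, small_uint<4> value) {
    header.data_offset = value;
}
```

Calling set_data_offset(header, 5) stores 5, while set_data_offset(header, 16) throws before the bit-field is touched.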

I haven't used this class in libtins yet, but setters would probably look like this:

void TCP::data_offset(small_uint<4> value) {
     tcp_header_.data_offset = value;
}

That way, a nice and clean solution can be achieved, avoiding boilerplate code. Note that internally, C++11 features could make a lot of things easier (such as std::numeric_limits<>::max() being constexpr, std::conditional, and constexpr functions), but I wanted to use only C++03 stuff, since the library is intended to work with the latter standard.

The small_uint implementation can be found here.

Monday, July 2, 2012

Python style range loops in C++

Python is a nice scripting language which has some really flexible characteristics. One thing I like about it is the integer range for-loops that can be achieved using the built-in function range:
for i in range(10):
    # Insert clever statement below 
    print i 
Doing that same thing in C++ would require some for loop like the one below:
for(size_t i(0); i < 10; ++i)
    std::cout << i << std::endl;
Which is longer and less clear (well, not that much :D). So I created a simple wrapper to achieve the same thing in C++, using C++11's range-based for loops. The wrapper can be found here.

In order to use ranges, a function named range should be used, which contains these overloads:
// Returns a range_wrapper<T> containing the range [start, end)
template<class T>
range_wrapper<T> range(const T &start, const T &end) {
    return {start, end};
}

// Returns a range_wrapper<T> containing the range [T(), end)
template<class T>
range_wrapper<T> range(const T &end) {
    return {T(), end};
}

range_wrapper<> is a simple wrapper that defines begin() and end(). Both return a range_iterator, the former pointing to the beginning of the range and the latter to the end. The range_iterator<> class template is just a wrapper over the template parameter, and defines all of the member functions and typedefs required of a forward iterator. The prefix/postfix increment operators apply the same operator to the wrapped object, while the dereference operator returns it.

Since range-based for loops require the iterated sequence to define begin() and end() or that the global std::begin()/end() functions are defined for the given type, using the range_wrapper<> class template in these for loops is perfectly valid.
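
A minimal sketch of how these pieces could fit together is shown below. It follows the description above, but it is a simplified reconstruction written for this post rather than the exact code of the wrapper (for instance, it constructs range_wrapper explicitly instead of using a braced return).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Iterator that simply wraps a value of type T and increments it.
template<class T>
class range_iterator {
public:
    typedef std::forward_iterator_tag iterator_category;
    typedef T value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const T* pointer;
    typedef const T& reference;

    explicit range_iterator(const T& value) : value_(value) {}

    reference operator*() const { return value_; }
    pointer operator->() const { return &value_; }

    range_iterator& operator++() { ++value_; return *this; }
    range_iterator operator++(int) { range_iterator copy(*this); ++value_; return copy; }

    bool operator==(const range_iterator& rhs) const { return value_ == rhs.value_; }
    bool operator!=(const range_iterator& rhs) const { return !(*this == rhs); }

private:
    T value_;
};

// Holds the bounds of the half-open range [start, end).
template<class T>
class range_wrapper {
public:
    range_wrapper(const T& start, const T& end) : start_(start), end_(end) {}

    range_iterator<T> begin() const { return range_iterator<T>(start_); }
    range_iterator<T> end() const { return range_iterator<T>(end_); }

private:
    T start_, end_;
};

template<class T>
range_wrapper<T> range(const T& start, const T& end) {
    return range_wrapper<T>(start, end);
}

template<class T>
range_wrapper<T> range(const T& end) {
    return range_wrapper<T>(T(), end);
}
```

Because iteration stops when the iterator compares equal to end(), a start value greater than the end value would never terminate in this sketch.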

Using this wrapper, that code can be reduced to this:
// Prints numbers in range [0, 10) 
for(auto item : range(10))
    std::cout << item << std::endl;

Using the first overload, which takes the start and end of the range, we can indicate the starting number.
// Prints numbers in range [5, 15) 
for(auto item : range(5, 15))
    std::cout << item << std::endl;
Note that range is a function template, so it could be adapted to perform range iteration through other types (std::string comes to my mind right now).

This same thing can be achieved using boost::irange, but hey, I was bored and wanted to implement it myself.

Friday, April 6, 2012

Python wrapper in C++11

This is a post about a wrapper for Python scripts I developed using C++11. This was the first time I used variadic templates, and I must say it's an amazing feature in this new C++ standard! It's great to have a type-safe way to use variable arguments.

The whole wrapper is inside the Python namespace. Before you start using anything inside it, you should call Python::initialize(), which initializes the Python API.

The Python::Object class provides an abstraction of a Python object. Python::Objects wrap a PyObject* (which is the abstraction of a Python object provided by its API) inside a std::shared_ptr. In their destructor, a call to Py_DECREF is performed, so the underlying PyObject* gets freed appropriately.
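
One way to get that behaviour is to give the std::shared_ptr a custom deleter. The sketch below shows the technique with stand-in types, so that it compiles without the Python headers: FakeObject and fake_decref are hypothetical replacements for PyObject and Py_DECREF, and ObjectSketch is not the actual Python::Object class.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a reference-counted C object such as PyObject.
struct FakeObject {
    int refcount;
};

// Hypothetical stand-in for Py_DECREF.
inline void fake_decref(FakeObject* object) {
    --object->refcount;
}

class ObjectSketch {
public:
    // Takes over one reference to raw; it is released when the last copy
    // of this ObjectSketch is destroyed.
    explicit ObjectSketch(FakeObject* raw)
        : handle_(raw, [](FakeObject* object) {
              // shared_ptr also runs the deleter for null pointers, so guard.
              if (object)
                  fake_decref(object);
          }) {
        assert(raw != nullptr);
    }

    FakeObject* get() const { return handle_.get(); }

private:
    std::shared_ptr<FakeObject> handle_;
};
```

Copies of an ObjectSketch share the same handle, so the reference is dropped exactly once, no matter how many copies were made.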

In order to load a Python script, the static method Python::Object::from_script(const std::string &) should be called, which returns a Python::Object that represents that script. The name of the script (with or without the ".py" extension) should be passed as the argument:

#include "pywrapper.h"

/* ... */

Python::Object script = Python::Object::from_script(""); //could be test also

After that statement, the "script" variable will have loaded the "" script, located in the current working directory.

Now you can call functions defined in that script using the Python::Object::call_function method, which has these signatures:


// Variadic template arguments version
template<typename... Args>
Python::Object call_function(const std::string &name, const Args&... args);
// No arguments version
Python::Object call_function(const std::string &name);

This method takes the name of the function as the first argument, followed by 0 or more arguments. These arguments will be implicitly converted to PyObject pointers, which can be used as arguments using the Python API. So far, you can use arguments of these types:
  • std::string
  • const char *
  • Any integral type(for which std::is_integral is true).
  • bool
  • double
  • std::vector
  • std::list
  • std::map
Both std::vector and std::list will be converted to a Python list, with the exception of std::vector<char>, which will be converted to a bytearray. The std::map objects will be converted to Python dicts.

The Python::Object::call_function method returns the Python return value wrapped in a Python::Object. Note that the objects stored in the std::vectors and std::lists must also be convertible to PyObject pointers (that is, they must be of one of the types listed above).

In case you want to use the return value, you might want to use Python::Object::convert, which will convert the wrapped PyObject* into one of the C++ types mentioned above, or into a std::tuple.
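
The combination of variadic templates and per-type conversions can be sketched without the Python API at all. In the example below, every argument is converted to the text of a Python literal instead of a PyObject*, which keeps the sketch self-contained. to_py, collect and format_call are names invented for this example and are not part of the wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Forward declarations, so the container overloads can be used for nested containers.
template<class T> std::string to_py(const std::vector<T>& values);
template<class K, class V> std::string to_py(const std::map<K, V>& values);

// One overload per supported type (stand-ins for the PyObject* conversions).
inline std::string to_py(const std::string& value) { return "'" + value + "'"; }
inline std::string to_py(const char* value) { return to_py(std::string(value)); }
inline std::string to_py(bool value) { return value ? "True" : "False"; }
inline std::string to_py(double value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Any other integral type.
template<class T>
typename std::enable_if<std::is_integral<T>::value, std::string>::type to_py(T value) {
    return std::to_string(value);
}

template<class T>
std::string to_py(const std::vector<T>& values) {
    std::string result = "[";
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0) result += ", ";
        result += to_py(values[i]);
    }
    return result + "]";
}

template<class K, class V>
std::string to_py(const std::map<K, V>& values) {
    std::string result = "{";
    for (auto it = values.begin(); it != values.end(); ++it) {
        if (it != values.begin()) result += ", ";
        result += to_py(it->first) + ": " + to_py(it->second);
    }
    return result + "}";
}

// Recursion over the parameter pack: convert the first argument, then the rest.
inline void collect(std::vector<std::string>&) {}

template<class First, class... Rest>
void collect(std::vector<std::string>& out, const First& first, const Rest&... rest) {
    out.push_back(to_py(first));
    collect(out, rest...);
}

// Builds the text of the call instead of actually performing it.
template<class... Args>
std::string format_call(const std::string& name, const Args&... args) {
    std::vector<std::string> parts;
    collect(parts, args...);
    std::string result = name + "(";
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) result += ", ";
        result += parts[i];
    }
    return result + ")";
}
```

An argument of an unsupported type makes the to_py call fail to compile, which is where the type safety comes from.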

As an example, I used this Python script:


def foo(a, b, c, d, e):
    print '{0} - {1} - {2} - {3} - {4}'.format(
        a, str(b), str(c), repr(d), repr(e)
    )

def int_fun():
    return 12

def list_fun():
    return [1,2,3,4,561,2]
def dict_fun():
    return {
        'bar' : 1,
        'foo' : 15
    }

def tuple_fun():
    return (1, 'foo', 15.5)

def bool_fun():
    return False

x = 1598

It's just a bunch of functions that take/return different types of arguments. My C++ code that calls these functions is this one:

#include <string>
#include <vector>
#include <iostream>
#include <stdexcept>
#include <iomanip>
#include <map>
#include <tuple>
#include "pywrapper.h"

int main() {
    Python::Object script(Python::Object::from_script(""));
    std::vector<int> v({2,6,5});
    std::map<std::string, int> dict({
        {"bleh", 1},
        {"foofoo", 10}
    });
    std::cout << "Calling foo:\n";
    script.call_function("foo", "a string", true, 10, v, dict);
    // Int test
    std::cout << "Calling int_fun:\n";
    Python::Object ptr = script.call_function("int_fun");
    int num;
    if(ptr.convert(num))
        std::cout << "Result: " << num << '\n';
    else
        std::cout << "Long conversion failed\n";
    // List test
    std::vector<int> lst;
    std::cout << "Calling list_fun:\n";
    ptr = script.call_function("list_fun");
    if(ptr.convert(lst)) {
        std::cout << "List size: " << lst.size() << '\n';
        for(auto it(lst.begin()); it != lst.end(); ++it)
            std::cout << *it << " ";
        std::cout << '\n';
    } else {
        std::cout << "List conversion failed\n";
    }
    // Dict test
    std::map<std::string, int> mp;
    std::cout << "Calling dict_fun:\n";
    ptr = script.call_function("dict_fun");
    if(ptr.convert(mp)) {
        std::cout << "Map size: " << mp.size() << '\n';
        for(auto it(mp.begin()); it != mp.end(); ++it)
            std::cout << it->first << " -> " << it->second << '\n';
    } else {
        std::cout << "Map conversion failed\n";
    }
    // Tuple test
    std::cout << "Calling tuple_fun:\n";
    ptr = script.call_function("tuple_fun");
    std::tuple<int, std::string, double> tup;
    if(ptr.convert(tup)) {
        std::cout << std::get<0>(tup) << "\n";
        std::cout << std::get<1>(tup) << "\n";
        std::cout << std::get<2>(tup) << "\n";
    } else {
        std::cout << "Tuple conversion failed\n";
    }
    // Bool test
    bool bool_val;
    std::cout << "Calling bool_fun:\n";
    ptr = script.call_function("bool_fun");
    if(ptr.convert(bool_val))
        std::cout << "Result: " << std::boolalpha << bool_val << '\n';
    else
        std::cout << "Bool conversion failed\n";
    // Get attr test
    std::cout << "Retrieving 'x' variable:\n";
    try {
        ptr = script.get_attr("x");
        if(ptr.convert(num))
            std::cout << "X == " << num << '\n';
    } catch(std::runtime_error &ex) {
        std::cout << ex.what() << '\n';
    }
}

In the last statements of the code, the Python::Object::get_attr method is used, which returns a Python::Object containing the script attribute whose name is given as the method's argument. After executing this application, this output is produced:

Calling foo:
a string - True - 10 - [2, 6, 5] - {'bleh': 1, 'foofoo': 10}
Calling int_fun:
Result: 12
Calling list_fun:
List size: 6
1 2 3 4 561 2
Calling dict_fun:
Map size: 2
bar -> 1
foo -> 15
Calling tuple_fun:
Calling bool_fun:
Result: false
Retrieving 'x' variable:
X == 1598

As you can see, this class allows a type-safe variable argument interface for calling Python functions and retrieving defined attributes in scripts.

You can get the header and source file here.

In order to compile this application, using gcc, remember to use the -std=c++0x and -lpython2.7 arguments.

I hope you find this wrapper useful!

Saturday, March 3, 2012

Configuration file parser - C++

I'm working on a project developed in C++, which can be configured using several parameters at runtime. Since there were lots of options, I decided to include a configuration file in which the user could assign a value to each defined attribute. This is an example of the file's structure:


# Default yasps configuration
# By default the server is bound to
# --------------------------------------------------
# Server configuration

# Authentication configuration
# Unauthenticated connections are allowed by default

As you can see, the options require different data types. Therefore, I needed a generic algorithm that could parse the file, interpret the given values as strings, integers, bools or whatever data type I indicated, and assign them to the corresponding attribute. To achieve this, I created a small class using template parameters.

The ConfigurationParser is extremely simple to use. There is one method which adds an attribute and associates it with a pointer. Whenever that attribute name is found in the file, the parser will try to interpret the given value and store it through that pointer. The method has the following signature:

template<class T>
void add_option(const std::string &name, T *value_ptr);

The only constraint imposed on the type T is that the input operator (operator>>) is defined for it. As long as you use either primitive types or std::strings, you don't have to implement anything else. In case you have created a certain class that can be deserialized, you would have to implement this operator.
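
To show how a template add_option can deal with any type that supports operator>>, here is a small sketch of the technique: each option stores a closure that parses the text into the registered pointer. ConfigParserSketch is a simplified stand-in and not the real ConfigurationParser. It reads from a std::istream instead of a file name, throws std::runtime_error instead of the dedicated exception classes, and does not trim whitespace.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

class ConfigParserSketch {
public:
    // Registers an option; T only needs to support operator>>.
    template<class T>
    void add_option(const std::string& name, T* value_ptr) {
        assert(value_ptr != nullptr);
        setters_[name] = [value_ptr](const std::string& text) -> bool {
            std::istringstream input(text);
            input >> *value_ptr;
            return !input.fail();
        };
    }

    // Parses "name=value" lines; blank lines and lines starting with '#' are skipped.
    void parse(std::istream& input) {
        std::string line;
        while (std::getline(input, line)) {
            if (line.empty() || line[0] == '#')
                continue;
            const std::string::size_type equals = line.find('=');
            if (equals == std::string::npos)
                throw std::runtime_error("missing '=' in: " + line);
            const std::string name = line.substr(0, equals);
            const std::string value = line.substr(equals + 1);
            const auto setter = setters_.find(name);
            if (setter == setters_.end())
                throw std::runtime_error("unknown option: " + name);
            if (value.empty())
                throw std::runtime_error("no value given for: " + name);
            if (!setter->second(value))
                throw std::runtime_error("invalid value for: " + name);
        }
    }

private:
    // Maps each option name to a closure that stores a parsed value.
    std::map<std::string, std::function<bool(const std::string&)>> setters_;
};
```

The std::function hides the type T, which is what allows options of different types to live in the same map.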

Once every attribute has been set, you have to call the ConfigurationParser::parse method, using the configuration file name as the argument. This is the signature of this method:

void parse(const std::string &file_name);

This method can raise different exceptions, depending on what problem was encountered:
  • std::ios_base::failure if an error occurred when opening the file.
  • ConfigurationParser::NoValueGivenError if no value was set for an attribute that appeared on the configuration file. e.g. bleh= . There should be some value after the '=' character.
  • ConfigurationParser::InvalidValueError if there was a data type mismatch when trying to interpret an attribute's value. This can happen if, for example, an attribute expects an integer value but a string value is given.
  • ConfigurationParser::InvalidOptionError is raised if an attribute which was not registered using ConfigurationParser::add_option appeared in the configuration file.
Note that the ConfigurationParser class is included inside the CPPUtils namespace. Finally, here is an example, taken from the project I'm working on:

#include <iostream>
#include <string>
#include "configparser.h" 

using CPPUtils::ConfigurationParser;

class Configuration {
public:
    void load(const std::string &file_name) {
        ConfigurationParser parser;
        // Associate every option name with the member that stores its value
        parser.add_option("username", &config_username);
        parser.add_option("password", &config_password);
        parser.add_option("bind_address", &address);
        parser.add_option("log_file", &log_filename);
        parser.add_option("port", &port);
        parser.add_option("noauth", &config_allow_no_auth);
        parser.add_option("enable_logging", &config_enable_logging);
        try {
            parser.parse(file_name);
        }
        catch(std::ios_base::failure &ex) {
            std::cerr << "[ERROR] Error opening " << file_name << "(" << ex.what() << ").\n";
        }
        catch(ConfigurationParser::NoValueGivenError &ex) {
            std::cerr << "[ERROR] Parse error: No value given for " << ex.what() << " attribute.\n";
        }
        catch(ConfigurationParser::InvalidValueError &ex) {
            std::cerr << "[ERROR] Parse error: Invalid value for attribute " << ex.what() << "\n";
        }
        catch(ConfigurationParser::InvalidOptionError &ex) {
            std::cerr << "[ERROR] Parse error: Could not find a valid attribute in \"" << ex.what() << "\"\n";
        }
    }
private:
    std::string config_username, config_password, address, log_filename;
    short port;
    bool config_allow_no_auth, config_enable_logging;
};

That's all. You can download the header file here, the source file here and the only header dependency, exception.h. The class is licensed under GPLv3, so feel free to use it.

Thursday, February 9, 2012

Linux terminal keylogger in userspace


Sometimes, during a pentest, you have access to a certain system user's password and can successfully log in to the system (using ssh, for example), but cannot gain root privileges. This can happen because the user is not a sudoer, or sudo is not even installed on the system, and of course because the user's password is not the same as root's.

However, this user might actually use "su" to become the superuser. In this case, we have access to his settings and configuration files, located in his home directory, which can be modified and used to obtain the root's password.

There are several environment variables that the dynamic linker (which ships with libc) uses to modify an application's behaviour when launching it. Among many, there is one that allows the user to load one or more arbitrary shared objects right before launching a certain application. This environment variable is named LD_PRELOAD.

The LD_PRELOAD environment variable

Using LD_PRELOAD, a user can launch an application and force it to load any shared object he wants. So how can we take advantage of LD_PRELOAD? On GNU/Linux, programs are usually dynamically linked, allowing the symbols used in them to be loaded when the application is launched. When the application is executed, the shared objects that contain the symbols used are loaded, and those symbols' addresses are resolved. Every reference to the loaded symbols will use those addresses.

Fortunately for us, this symbol loading is done after the shared objects pointed to by LD_PRELOAD are loaded. Therefore, if we compile a shared object which contains a function that has the same signature (name + return type + arguments) as one that another application expects to resolve, it will use ours instead of the "original" one, allowing us to execute arbitrary code.
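
The lookup order is the key point here, so this is a toy model of it. It is in no way the real dynamic linker: ToyObject and resolve are names invented for this example, and the only thing being modelled is that the first loaded object that exports a symbol wins.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model of a loaded shared object: a name plus the symbols it exports.
struct ToyObject {
    std::string name;
    std::set<std::string> exported_symbols;
};

// Returns the name of the first object, in load order, that exports the symbol,
// or an empty string if no loaded object provides it.
std::string resolve(const std::vector<ToyObject>& load_order, const std::string& symbol) {
    for (const ToyObject& object : load_order) {
        if (object.exported_symbols.count(symbol) != 0)
            return object.name;
    }
    return std::string();
}
```

If a preloaded object that exports execve comes before libc in the load order, resolve returns the preloaded object for execve, while symbols it does not define still resolve to libc.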

Okay, so now we can inject arbitrary code. How can we use this technique? We could hook the read system call, then wait for the user to launch su. Our object would be loaded and we would be able to read the data typed by the user, in which we will eventually find the root password.

However, we cannot do this, since su is a suid file. The libc does not allow us to execute an application that has the suid bit set and instruct it to load our library on startup. If this were possible, we could execute ping, for example, make it load our shared object and drop a root shell right away. Therefore LD_PRELOAD + suid binary is not a possibility.

Anyhow, we can use another approach. We can hook the execve syscall, which is used by bash to execute commands, and modify its behaviour. This way we could execute whatever we want, no matter what bash is trying to execute.

The actual keylogger

So instead of trying to hook the read syscall and log every byte read by the application, what I did was hook execve: right before calling the original execve, the hook calls fork() and logs the data that is written to stdin (which should be read by the execve'd application) into a file. This way, no matter what is being executed, suid files or regular ones, we can still read stdin, since we are not hooking the syscall in the executed application, but in bash. My execve is basically a proxy that logs everything that is written to stdin before it is sent to the child process.

Finally, we can read every byte written by the user, but we require bash to load our shared object using the LD_PRELOAD environment variable. We can achieve this by editing(or creating) the ~/.pam_environment file and inserting this line into it:
In my case, I have inserted the line:
After a restart (or a logout performed by the user) that environment variable will be set, loading our shared object every time bash starts.

Note that we can't edit the .bashrc file, since this file is interpreted by bash after the process is created. What we require is a way of editing the LD_PRELOAD environment variable before bash is started, which can be achieved through ~/.pam_environment.

The source code of this PoC is available here. Download both keylogger.c and Makefile. In order to compile, just execute make, which generates the shared object.

This is a screenshot of the keylogger, storing the password and commands typed by the user when executing su:

By default the log file is located in /tmp/output. If you want to change it, just edit the source code and change the OUTPUT_FILE define, or add a flag to CFLAGS in the Makefile:
CFLAGS=-c -Wall -O3 -fPIC -DOUTPUT_FILE=/home/somebody/blah.log
There is an array named injected_files which contains a list of all the files that will be "logged" when executed. This is used because there are lots of applications that we don't want to log. This array contains /bin/su and /usr/bin/ssh. Feel free to add any other application that you want to keylog.

Finally, there's a code snippet at the end of the file which removes the value of the LD_PRELOAD variable right after the shared object is loaded. This is just a mechanism to hide our keylogger. Otherwise, if the user executed "echo $LD_PRELOAD" while using bash, he would see the location of our keylogger as the output. Note that by removing the LD_PRELOAD variable, child processes will not load our shared object. In a GUI environment, the first application that is executed is gnome-terminal or some other graphical terminal, and this application is the one that executes bash, so the shared object will be loaded by gnome-terminal, but won't be loaded by bash. Therefore, if you plan on using this keylogger in a GUI environment, remove the body of the "void init(void)" function, located at the end of keylogger.c.

Hope you find it useful!