Sunday 10 April 2011

Elegant ways to tokenize strings


Last night I posted a solution on Stack Overflow. The question was: what is the right way to split a string into a vector of strings, where the delimiter is a space or a comma?

This is what I first came up with, for a space-separated string:


#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <vector>

int main() 
{
        std::string s = "What is the right way to split a string into a vector of strings";
        std::stringstream ss(s);
        std::istream_iterator<std::string> begin(ss);
        std::istream_iterator<std::string> end;
        std::vector<std::string> vstrings(begin, end);
        std::copy(vstrings.begin(), vstrings.end(), 
                  std::ostream_iterator<std::string>(std::cout, "\n"));
        return 0;
}

Output at ideone.
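
The same idea can be wrapped in a small reusable function. This is only a minimal sketch: the name split_whitespace is hypothetical and does not appear in the Stack Overflow answer. It relies on the fact that operator>> for std::string skips leading whitespace, so runs of spaces, tabs and newlines never produce empty tokens.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name is an assumption, not from the original post):
// splits `text` on any run of whitespace using istream_iterator.
std::vector<std::string> split_whitespace(const std::string& text)
{
    std::istringstream in(text);
    // Each increment of the iterator performs one `in >> token` extraction.
    std::istream_iterator<std::string> first(in), last;
    return std::vector<std::string>(first, last);
}
```

Called as split_whitespace("  a\tb\n\nc  "), this should return the three tokens a, b and c.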

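Then I came up with this elegant solution for strings that contain both commas and spaces. It imbues the stream with a custom std::ctype<char> facet that classifies the comma as whitespace, so operator>> skips commas just as it skips spaces: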

#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <vector>
#include <locale>
#include <cstring>

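// A ctype facet whose classification table marks ',' as whitespace.
struct tokens : std::ctype<char>
{
    tokens(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        typedef std::ctype<char> cctype;
        static const cctype::mask *const_rc= cctype::classic_table();

        // Start from a copy of the classic "C" locale classification table.
        static cctype::mask rc[cctype::table_size];
        std::memcpy(rc, const_rc, cctype::table_size * sizeof(cctype::mask));

        rc[','] = std::ctype_base::space; // comma now counts as whitespace
        rc[' '] = std::ctype_base::space; // space already does; kept for clarity
        return &rc[0];
    }
};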

int main() 
{
        std::string s = "right way, wrong way, correct way";
        std::stringstream ss(s);
        ss.imbue(std::locale(std::locale(), new tokens()));
        std::istream_iterator<std::string> begin(ss);
        std::istream_iterator<std::string> end;
        std::vector<std::string> vstrings(begin, end);
        std::copy(vstrings.begin(), vstrings.end(), 
                  std::ostream_iterator<std::string>(std::cout, "\n"));
        return 0;
}

Output at ideone.
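
The facet above hard-codes its delimiters. As a hedged sketch of how the same technique might be generalized, the version below takes the delimiter set as a parameter. The names delimiter_ctype and split are hypothetical, not part of the original answer. The table is allocated with new[] and handed to std::ctype<char> with its second constructor argument set to true, which asks the facet to delete[] the table when it is destroyed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical generalization (names are assumptions): a ctype facet that
// classifies every character in `delims` as whitespace.
class delimiter_ctype : public std::ctype<char>
{
public:
    explicit delimiter_ctype(const std::string& delims)
        : std::ctype<char>(make_table(delims), true) {} // true: facet owns the table

private:
    static const mask* make_table(const std::string& delims)
    {
        // Copy the classic "C" locale table, then mark the delimiters.
        mask* table = new mask[table_size];
        std::copy(classic_table(), classic_table() + table_size, table);
        for (std::size_t i = 0; i < delims.size(); ++i)
            table[static_cast<unsigned char>(delims[i])] = space;
        return table;
    }
};

// Splits `text` on ordinary whitespace plus any character in `delims`.
std::vector<std::string> split(const std::string& text, const std::string& delims)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet and destroys it when unused.
    in.imbue(std::locale(std::locale::classic(), new delimiter_ctype(delims)));
    std::istream_iterator<std::string> first(in), last;
    return std::vector<std::string>(first, last);
}
```

With this sketch, split("right way, wrong way, correct way", ",") should yield the same six tokens as the program above. Note that the characters in delims are added to the usual whitespace rather than replacing it, and consecutive delimiters collapse, so no empty tokens are produced.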

