Parsing

This page documents the objects and functions that parse or otherwise manipulate text. Everything here follows the same conventions as the rest of the library.


base64



This object encodes and decodes data to and from the Base64 Content-Transfer-Encoding defined in section 6.8 of RFC 2045.

C++ Example Programs: file_to_code_ex.cpp
#include <dlib/base64.h>
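To illustrate what the encoding does, here is a minimal standard-library sketch of Base64 encoding as described in RFC 2045 section 6.8. It is not dlib's base64 implementation or API: the function name sketch_base64_encode is hypothetical, and the sketch leaves out the 76-character line length limit that RFC 2045 places on encoded output.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of Base64 encoding (RFC 2045, section 6.8).
// Not dlib's code; omits the 76-character line breaking required for MIME.
std::string sketch_base64_encode(const std::string& in)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    // Each group of 3 input bytes (24 bits) becomes 4 characters of 6 bits each.
    while (i + 2 < in.size())
    {
        unsigned long n = (static_cast<unsigned char>(in[i]) << 16) |
                          (static_cast<unsigned char>(in[i + 1]) << 8) |
                           static_cast<unsigned char>(in[i + 2]);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
        i += 3;
    }
    // A trailing group of 1 or 2 bytes is padded with '=' characters.
    const std::size_t rest = in.size() - i;
    if (rest == 1)
    {
        unsigned long n = static_cast<unsigned char>(in[i]) << 16;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += "==";
    }
    else if (rest == 2)
    {
        unsigned long n = (static_cast<unsigned char>(in[i]) << 16) |
                          (static_cast<unsigned char>(in[i + 1]) << 8);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

For example, sketch_base64_encode("foo") yields "Zm9v": every 3 input bytes become 4 output characters, and a short final group is padded with '='.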

basic_utf8_ifstream



This object represents an input file stream, much like the normal std::ifstream, except that it knows how to read UTF-8 data. When you read characters out of this stream, it automatically converts them from the UTF-8 multibyte encoding into a fixed-width wide character encoding.

There are also two typedefs of this object: utf8_wifstream, which reads into wchar_t wide characters, and utf8_uifstream, which reads into unichar characters instead.


#include <dlib/unicode.h>

cast_to_string



cast_to_string is a templated function that makes it easy to convert arbitrary objects to std::string. Any type that can be written to a std::ostream via operator<< is supported.
#include <dlib/string.h>
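The operator<< requirement above comes from the usual std::ostringstream idiom, which the sketch below shows. The name sketch_cast_to_string is hypothetical, and dlib's actual function may differ in details such as how it reports a failed stream.

```cpp
#include <sstream>
#include <string>

// Hypothetical sketch: anything printable with operator<< on a
// std::ostream can be turned into a std::string this way.
template <typename T>
std::string sketch_cast_to_string(const T& item)
{
    std::ostringstream sout;
    sout << item;  // uses the type's own operator<<
    return sout.str();
}
```

For example, sketch_cast_to_string(42) returns "42".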

cast_to_wstring



cast_to_wstring is a templated function that makes it easy to convert arbitrary objects to std::wstring. Any type that can be written to a std::wostream via operator<< is supported.
#include <dlib/string.h>

cmd_line_parser



This object allows you to easily parse a command line. Note that the documentation for the cmd_line_parser_option (the object returned by the parser's .option() function) is in a separate file.

Note also that there are standard typedefs for the ASCII and wide character versions of the cmd_line_parser template. These are the command_line_parser and wcommand_line_parser types respectively.



C++ Example Programs: compress_stream_ex.cpp, train_object_detector.cpp
#include <dlib/cmd_line_parser.h>

Extensions to cmd_line_parser

get_option

This extension provides a convenience function for reading the value of a command line option or a config_reader entry. It is automatically #included when using the command line parser or config reader.


config_reader



This object is intended to be used to read text configuration files.

C++ Example Programs: config_reader_ex.cpp
#include <dlib/config_reader.h>

Extensions to config_reader

config_reader_thread_safe

This object extends a normal config_reader by wrapping all of its member functions in mutex locks, making it safe to use in a multithreaded program.


convert_utf8_to_utf32



This is a global function that converts UTF-8 strings into strings of 32-bit unichar characters.
#include <dlib/unicode.h>
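As an illustration of the conversion, here is a minimal standard-library UTF-8 decoder. It is only a sketch under stated assumptions: the name sketch_utf8_to_utf32 is hypothetical, it produces a std::u32string rather than a ustring, and it checks only the basic byte structure (lead and continuation bytes), not overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch of UTF-8 -> UTF-32 decoding. Assumption: output is
// std::u32string, and only the lead/continuation byte structure is validated.
std::u32string sketch_utf8_to_utf32(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        // The lead byte's high bits say how many bytes encode this code point.
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)              { len = 1; cp = lead; }
        else if ((lead >> 5) == 0x6)  { len = 2; cp = lead & 0x1F; }
        else if ((lead >> 4) == 0xE)  { len = 3; cp = lead & 0x0F; }
        else if ((lead >> 3) == 0x1E) { len = 4; cp = lead & 0x07; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte has the form 10xxxxxx and adds 6 more bits.
        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c >> 6) != 0x2)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out += cp;
        i += len;
    }
    return out;
}
```

For example, the three bytes E2 82 AC decode to the single code point U+20AC (the euro sign).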

cpp_pretty_printer



This object represents an HTML pretty printer for C++ source code.
#include <dlib/cpp_pretty_printer.h>


Implementations:
cpp_pretty_printer_kernel_1:
This is implemented by using the cpp_tokenizer object. This is the pretty printer I use on all the source in this library. It applies a color scheme, turns include directives such as #include "file.h" into links to file.h.html, and puts HTML anchors on function and class declarations. It also looks for comments starting with /*!A and puts an anchor before each such comment, using the word following the A as the anchor's name.
kernel_1a
is a typedef for cpp_pretty_printer_kernel_1
cpp_pretty_printer_kernel_2:
This is implemented by using the cpp_tokenizer object. It applies a black and white color scheme suitable for printing on a black and white printer. It also places the document title prominently at the top of the pretty printed source file.
kernel_2a
is a typedef for cpp_pretty_printer_kernel_2

cpp_tokenizer



This object represents a simple tokenizer for C++ source code.
#include <dlib/cpp_tokenizer.h>


Implementations:
cpp_tokenizer_kernel_1:
This is implemented by using the tokenizer object in the obvious way.
kernel_1a
is a typedef for cpp_tokenizer_kernel_1
kernel_1a_c
is a typedef for kernel_1a that checks its preconditions.

is_combining_char



This is a global function that tells you whether a character is a Unicode combining character.
#include <dlib/unicode.h>

left_substr



This is a function that returns the part of a string to the left of a user-supplied delimiter.
#include <dlib/string.h>
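A one-line sketch of this behavior using std::string::find_first_of follows. It assumes that the delimiter argument is a set of characters, that the cut happens at the first one found, and that whitespace is the default; the name sketch_left_substr is hypothetical.

```cpp
#include <string>

// Hypothetical sketch: return everything before the first character of str
// that appears in delim (assumed default: whitespace).
std::string sketch_left_substr(const std::string& str,
                               const std::string& delim = " \n\r\t")
{
    // If no delimiter is found, find_first_of returns npos and
    // substr(0, npos) returns the whole string.
    return str.substr(0, str.find_first_of(delim));
}
```

Under these assumptions, sketch_left_substr("key=value", "=") returns "key", and a string with no delimiter is returned unchanged.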

lpad



This is a function to pad whitespace (or user-specified characters) onto the leftmost end of a string.
#include <dlib/string.h>
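A minimal sketch of left padding follows. The name sketch_lpad and the single pad character are assumptions for illustration; the description above only says that whitespace or user-specified characters are used.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: prepend pad_char until str is pad_length characters long.
// Assumption: a single pad character rather than a pad string.
std::string sketch_lpad(const std::string& str, std::size_t pad_length,
                        char pad_char = ' ')
{
    if (str.size() >= pad_length)
        return str;  // already long enough: returned unchanged
    return std::string(pad_length - str.size(), pad_char) + str;
}
```

For example, sketch_lpad("42", 5, '*') gives "***42"; strings that are already long enough come back unchanged.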

ltrim



This is a function to remove the whitespace (or user-specified characters) from the leftmost end of a string.
#include <dlib/string.h>

narrow



This is a function for converting a string of type std::string or std::wstring to a plain std::string.
#include <dlib/string.h>

pad



This is a function to pad whitespace (or user-specified characters) onto both ends of a string.
#include <dlib/string.h>

pad_int_with_zeros



Converts an integer into a string and pads it with leading zeros.
#include <dlib/string.h>
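The same result can be obtained with standard stream manipulators, as in this sketch. The name sketch_pad_int_with_zeros and the default width of 6 are assumptions for illustration. Note that std::setfill places zeros in front of a minus sign, so the sketch is only meaningful for non-negative values.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical sketch: write i with a minimum field width, filling with '0'.
// Assumption: the default width of 6 is illustrative; negative values are
// not handled sensibly (zeros would appear before the '-').
std::string sketch_pad_int_with_zeros(int i, unsigned long width = 6)
{
    std::ostringstream sout;
    sout << std::setw(static_cast<int>(width)) << std::setfill('0') << i;
    return sout.str();
}
```

For instance, sketch_pad_int_with_zeros(42, 4) returns "0042".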

right_substr



This is a function that returns the part of a string to the right of a user-supplied delimiter.
#include <dlib/string.h>
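The sketch below mirrors that description with std::string::find_last_of. It assumes the delimiter argument is a set of characters, that the cut happens at the last one found, and that an empty string is returned when no delimiter occurs; sketch_right_substr is a hypothetical name.

```cpp
#include <string>

// Hypothetical sketch: return everything after the LAST character of str
// that appears in delim, or "" if none does (assumed edge-case behavior).
std::string sketch_right_substr(const std::string& str,
                                const std::string& delim = " \n\r\t")
{
    const std::string::size_type pos = str.find_last_of(delim);
    if (pos == std::string::npos)
        return "";
    return str.substr(pos + 1);
}
```

Under these assumptions, sketch_right_substr("dir/sub/file.txt", "/") returns "file.txt".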

rpad



This is a function to pad whitespace (or user-specified characters) onto the rightmost end of a string.
#include <dlib/string.h>

rtrim



This is a function to remove the whitespace (or user-specified characters) from the rightmost end of a string.
#include <dlib/string.h>

split



Breaks a string into a sequence of substrings delimited by a user-specified set of characters.
#include <dlib/string.h>
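Here is a minimal sketch of that behavior built on find_first_of and find_first_not_of. The name sketch_split and the whitespace default are assumptions, as is the choice to drop the empty substrings that consecutive delimiters would otherwise produce.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: split str at any character in delim.
// Assumption: empty pieces (e.g. between consecutive delimiters) are dropped.
std::vector<std::string> sketch_split(const std::string& str,
                                      const std::string& delim = " \n\r\t")
{
    std::vector<std::string> parts;
    std::string::size_type start = str.find_first_not_of(delim);
    while (start != std::string::npos)
    {
        // end may be npos; substr then takes the rest of the string.
        const std::string::size_type end = str.find_first_of(delim, start);
        parts.push_back(str.substr(start, end - start));
        start = str.find_first_not_of(delim, end);
    }
    return parts;
}
```

For example, sketch_split("x,,y", ",") yields the two strings "x" and "y".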

split_on_first



Breaks a string into two parts. The split point is selected based on the first occurrence of a delimiter character.
#include <dlib/string.h>
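A sketch of this split using find_first_of follows. The name sketch_split_on_first is hypothetical, and it assumes that the delimiter character itself is discarded and that, when no delimiter is present, the whole string becomes the first part and the second part is empty.

```cpp
#include <string>
#include <utility>

// Hypothetical sketch: split at the first character found in delim,
// dropping that character. Assumed: no delimiter -> (str, "").
std::pair<std::string, std::string> sketch_split_on_first(
    const std::string& str, const std::string& delim = " \n\r\t")
{
    const std::string::size_type pos = str.find_first_of(delim);
    if (pos == std::string::npos)
        return std::make_pair(str, std::string());
    return std::make_pair(str.substr(0, pos), str.substr(pos + 1));
}
```

Under these assumptions, sketch_split_on_first("key=value=more", "=") gives ("key", "value=more").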

split_on_last



Breaks a string into two parts. The split point is selected based on the last occurrence of a delimiter character.
#include <dlib/string.h>
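Splitting on the last occurrence only requires searching from the end of the string, which find_last_of does directly. In this sketch the name sketch_split_on_last is hypothetical, and it assumes the delimiter is discarded and that a string without any delimiter ends up entirely in the first part.

```cpp
#include <string>
#include <utility>

// Hypothetical sketch: split at the last character found in delim,
// dropping that character. Assumed: no delimiter -> (str, "").
std::pair<std::string, std::string> sketch_split_on_last(
    const std::string& str, const std::string& delim = " \n\r\t")
{
    const std::string::size_type pos = str.find_last_of(delim);
    if (pos == std::string::npos)
        return std::make_pair(str, std::string());
    return std::make_pair(str.substr(0, pos), str.substr(pos + 1));
}
```

For example, splitting "archive.tar.gz" on "." yields ("archive.tar", "gz").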

strings_equal_ignore_case



This is a pair of functions that perform a case-insensitive comparison between strings.
#include <dlib/string.h>
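For plain byte strings, the comparison can be sketched with std::tolower as below. This is an illustration under assumptions rather than dlib's code: sketch_strings_equal_ignore_case is a hypothetical name, it handles only single-byte characters in the current C locale, and it models just one of the pair of functions mentioned above.

```cpp
#include <cctype>
#include <string>

// Hypothetical sketch of a case-insensitive equality test for byte strings.
bool sketch_strings_equal_ignore_case(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::string::size_type i = 0; i < a.size(); ++i)
    {
        // Cast to unsigned char: std::tolower is undefined for negative values.
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}
```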

string_assign



string_assign is an object that makes it easy to convert strings to other types. Any type that can be read by the basic_istream operator>> is supported, as is casting between wstring, string, and ustring objects. Since string_assign is a simple stateless object, there is a global instance of it called dlib::sa.

C++ Example Programs: config_reader_ex.cpp
#include <dlib/string.h>

string_cast



string_cast is a templated function that makes it easy to convert strings to other types. Any type that can be read by the basic_istream operator>> is supported, as is casting between wstring, string, and ustring objects.
#include <dlib/string.h>
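The operator>> requirement corresponds to the familiar std::istringstream idiom, sketched below. The name sketch_string_cast, the use of std::invalid_argument, and the decision to reject trailing characters such as "12abc" are all assumptions for illustration; the sketch also does not reproduce the conversions between wstring, string, and ustring.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical sketch: parse a value of type T from str with operator>>.
// Assumptions: std::invalid_argument stands in for dlib's own error type,
// and non-whitespace left over after the value counts as a failure.
template <typename T>
T sketch_string_cast(const std::string& str)
{
    std::istringstream sin(str);
    T value{};
    // operator>> skips leading whitespace and reads one value.
    if (!(sin >> value))
        throw std::invalid_argument("cannot convert '" + str + "'");
    // Skip trailing whitespace; anything else (e.g. "12abc") is an error.
    sin >> std::ws;
    if (!sin.eof())
        throw std::invalid_argument("trailing characters in '" + str + "'");
    return value;
}
```

Because operator>> stops at whitespace, this sketch would reject a std::string target containing spaces; a fuller version would treat string targets separately.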

tokenizer



This object represents a simple tokenizer for textual data.
#include <dlib/tokenizer.h>


Implementations:
tokenizer_kernel_1:
This is implemented in the obvious way.
kernel_1a
is a typedef for tokenizer_kernel_1
kernel_1a_c
is a typedef for kernel_1a that checks its preconditions.

tolower



This is a function to convert a string to all lowercase.
#include <dlib/string.h>
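A minimal sketch of the conversion for byte strings, using std::transform and std::tolower, is shown below. The name sketch_tolower is hypothetical, and the sketch only handles single-byte characters according to the current C locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical sketch: lowercase every character of a byte string.
std::string sketch_tolower(std::string str)
{
    // Take unsigned char so std::tolower never sees a negative value.
    std::transform(str.begin(), str.end(), str.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return str;
}
```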

toupper



This is a function to convert a string to all uppercase.
#include <dlib/string.h>

trim



This is a function to remove the whitespace (or user-specified characters) from both ends of a string.
#include <dlib/string.h>
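The trimming functions can be sketched with find_first_not_of and find_last_not_of, as below. The name sketch_trim and the default set of whitespace characters are assumptions for illustration; ltrim and rtrim correspond to keeping only one of the two searches.

```cpp
#include <string>

// Hypothetical sketch: strip characters in trim_chars from both ends of str.
std::string sketch_trim(const std::string& str,
                        const std::string& trim_chars = " \t\r\n")
{
    const std::string::size_type first = str.find_first_not_of(trim_chars);
    if (first == std::string::npos)
        return "";  // empty, or made up entirely of trim characters
    const std::string::size_type last = str.find_last_not_of(trim_chars);
    return str.substr(first, last - first + 1);
}
```

For example, sketch_trim("--x--", "-") returns "x".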

unichar



This is a typedef for an unsigned 32-bit integer, which we use to store Unicode values.
#include <dlib/unicode.h>

ustring



This is a typedef for std::basic_string<unichar>, that is, a string object that stores unichar Unicode characters.
#include <dlib/unicode.h>

wrap_string



wrap_string is a function that breaks a string into lines of a given maximum length. You can use it, for example, to make a string fit nicely in a command prompt window.
#include <dlib/string.h>
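A greedy word wrap, sketched below, is one straightforward way to produce such lines. The name sketch_wrap_string is hypothetical, and it assumes that words are separated by whitespace, that a word longer than the limit gets a line of its own, and that lines are joined with '\n'. dlib's function may take additional parameters that this sketch does not model.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical sketch of greedy word wrapping to at most max_per_line
// characters per line (a single over-long word still gets its own line).
std::string sketch_wrap_string(const std::string& str, std::size_t max_per_line)
{
    std::istringstream words(str);
    std::string word, line, result;
    while (words >> word)
    {
        // Start a new line if adding " word" would exceed the limit.
        if (!line.empty() && line.size() + 1 + word.size() > max_per_line)
        {
            result += line + '\n';
            line.clear();
        }
        if (!line.empty())
            line += ' ';
        line += word;
    }
    result += line;
    return result;
}
```

For example, sketch_wrap_string("the quick brown fox", 10) yields "the quick" and "brown fox" on separate lines.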

xml_parser



This object represents a simple SAX-style, event-driven XML parser. It takes its input from an input stream object and sends events to all registered document_handler and error_handler objects.

The xml_parser object also uses the interface classes document_handler and error_handler. Subclasses of these classes are passed to the xml_parser, which generates events while parsing and sends each one to the appropriate handler.

C++ Example Programs: xml_parser_ex.cpp
#include <dlib/xml_parser.h>