How to Contribute

Contributing Money

The simplest way to contribute is to donate money via Zelle or PayPal to davis@dlib.net. Any amount is appreciated :)

Contributing Code

Code contributions are welcome. To contribute, submit a pull request on dlib's GitHub page.

If you want to make a big change or feature addition, it's probably a good idea to talk to me about it first. Additionally, you should read over the coding guidelines below and try to follow them. It is also probably a good idea to read the books Effective C++ and More Effective C++ by Scott Meyers.

Coding Guidelines

1. Use Design by Contract
2. Use spaces instead of tabs.
3. Use the standard C++ naming convention
4. Use RAII
5. Don't use pointers
6. Don't use #define for constants.
7. Don't use stack based arrays.
8. Use exceptions, but don't abuse them
9. Write portable code
10. Set up regression tests
11. Use the Boost Software License

  • Apply Design by Contract to Your Code

      The most important part of a software library isn't the code, it is the set of interfaces the library exposes to the user. These interfaces need to be easy to use right, and hard to use wrong. The only way this happens is if the interfaces are documented in a simple, consistent, and precise way.

      The way I design and document these interfaces is known as Design by Contract. There is a lot that can be said about Design by Contract. In fact, whole books have been written about it, and there are programming languages that use it as a central element. Here I will just go over some of the basic ways it is used in dlib as well as some of the reasons why it is a Good Thing.

    • Functions should have documented preconditions which are programmatically verifiable

        Many functions have a set of requirements or preconditions that need to be satisfied if they are to be used. If these requirements are not satisfied when a function is called then the function will not do what it is supposed to do. Moreover, any piece of software that calls a function but doesn't make sure all preconditions are satisfied contains a bug, by definition.

        This means all functions must precisely document their preconditions if they are to be usable. In fact, all preconditions should be programmatically verifiable. Doing this has a number of benefits. First, it means they are unambiguous. English can be confusing and vague, but saying "some_predicate == true" uses a formal language, C++, that we all should understand quite well. Second, it means you can put checks into the code that will catch all usage errors.

        These checks should always be implemented using DLIB_ASSERT or DLIB_CASSERT and they should always cover all preconditions. These macros take a boolean argument and if it is false they throw dlib::fatal_error. So you can use them to check that all your preconditions are true. Also, don't forget that a violated function precondition indicates a bug in a program. That is, when dlib::fatal_error is thrown it means a bug has been found and the only thing an application can do at that point is print an error message and terminate. In fact, dlib::fatal_error has checks in it to make sure someone doesn't catch the exception and ignore it. These checks will abruptly terminate any program that attempts to ignore fatal errors.
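
        As a rough illustration of the idea, the following self-contained sketch documents a precondition and then checks it. It does not use dlib itself: SKETCH_ASSERT and sketch_fatal_error are hypothetical stand-ins for DLIB_ASSERT and dlib::fatal_error, mean_of is an invented example function, and the requires/ensures comment is only meant to suggest the documentation style.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for dlib::fatal_error (not dlib's actual class).
struct sketch_fatal_error : std::logic_error
{
    explicit sketch_fatal_error (const std::string& msg) : std::logic_error(msg) {}
};

// Hypothetical stand-in for DLIB_ASSERT: throws if the expression is false.
#define SKETCH_ASSERT(expr, msg)                                          \
    do {                                                                  \
        if (!(expr)) {                                                    \
            std::ostringstream sketch_sout;                               \
            sketch_sout << "precondition violated: " #expr "\n" << msg;   \
            throw sketch_fatal_error(sketch_sout.str());                  \
        }                                                                 \
    } while (false)

double mean_of (const std::vector<double>& samples)
/*
    requires
        - samples.size() > 0
    ensures
        - returns the arithmetic mean of the elements of samples
*/
{
    // The check is the documented precondition, written as code.
    SKETCH_ASSERT(samples.size() > 0, "mean_of() requires a non-empty vector");

    double sum = 0;
    for (double s : samples)
        sum += s;
    return sum / samples.size();
}
```

        Calling mean_of with an empty vector throws immediately, with a message naming the violated expression, instead of silently dividing by zero.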

        The above considerations bring me to my next bit of advice. Developers new to Design by Contract often confuse input validation requirements with function preconditions. When I tell them to consider any violation of a function's preconditions a bug and terminate their application with an error message they complain that this is not at all what an application should do when it receives invalid user inputs. They are right, that would be a bad thing and you should not write software that behaves that way. The way out of this problem is, of course, to not consider invalid input as a bug. Instead, you should perform explicit input validation on any data coming into your program before it gets to any functions that have preconditions which demand the validated inputs. Moreover, if you make your preconditions programmatically verifiable then it should be easy to validate any inputs by simply using whatever it is you use to check your preconditions.

        Consider the function cross_validate_trainer as an example. One of its requirements is that the input forms a valid binary classification problem. This is documented in the list of preconditions as "is_binary_classification_problem(x,y) == true". This precondition is just saying that when you call the is_binary_classification_problem function on the x and y inputs it had better return true if you want to use those inputs with the cross_validate_trainer function. Given this information it is trivial to perform input validation. All you have to do is call is_binary_classification_problem on your input data and you are done.

        Using the above technique you have validated your inputs, documented your preconditions, and are buffered by DLIB_ASSERT statements that will catch you if you accidentally forget to validate any inputs.
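
        The pattern described above might look something like the following sketch. The predicate looks_like_binary_problem is a hypothetical, simplified stand-in written for this example (it is not dlib's is_binary_classification_problem), and fraction_positive and try_fraction_positive are likewise invented names. The point is only that the same predicate serves as both the documented precondition and the input validation step.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified predicate (not dlib's implementation).
bool looks_like_binary_problem (
    const std::vector<double>& x,
    const std::vector<double>& y
)
{
    if (x.size() != y.size() || x.size() < 2)
        return false;
    bool has_positive = false;
    bool has_negative = false;
    for (double label : y)
    {
        if (label == +1)      has_positive = true;
        else if (label == -1) has_negative = true;
        else                  return false;
    }
    return has_positive && has_negative;
}

double fraction_positive (
    const std::vector<double>& x,
    const std::vector<double>& y
)
/*
    requires
        - looks_like_binary_problem(x,y) == true
    ensures
        - returns the fraction of labels in y that are +1
*/
{
    // A violation here is a bug in the caller, not bad user input.
    if (!looks_like_binary_problem(x,y))
        throw std::logic_error("fraction_positive(): precondition violated");

    std::size_t count = 0;
    for (double label : y)
    {
        if (label == +1)
            ++count;
    }
    return static_cast<double>(count) / y.size();
}

// Application code: validate untrusted input with the same predicate
// before it reaches the function that has the precondition.
bool try_fraction_positive (
    const std::vector<double>& x,
    const std::vector<double>& y,
    double& result
)
{
    if (!looks_like_binary_problem(x,y))
        return false;   // invalid input is reported, not treated as a bug
    result = fraction_positive(x,y);
    return true;
}
```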

        The thing to understand here is that a violation of a function's preconditions means you have a bug on your hands. In other words, you should never intentionally violate any function preconditions. Of course, it will still happen from time to time because bugs are unavoidable, but with this approach you will at least get a detailed error message early in development rather than a mysterious segmentation fault days or weeks later.

    • Functions should have documented postconditions

        I don't have nearly as much to say about postconditions as I did about function requirements. You should strive to write programmatically verifiable postconditions because that makes your postconditions more precise. However, it is sometimes the case that this isn't practical and that is fine. But whatever you do write needs to clearly communicate to the user what it is your function does.

      Now you may be wondering why this is called Design by Contract and not Documentation by Contract. The reason is that the process of writing down all these detailed descriptions of what your code does becomes part of how you design software. For example, often you will find that when you go to write down the requirements for calling a function you are unable to do so. This may be because the requirements are so complex you can't think of a way to describe them, or you may realize that you don't even know what they are. Alternatively, you may know what they are but discover there isn't any way to verify them programmatically. All these things are symptoms of a bad design and the reason you became aware of this design problem was by attempting to apply Design by Contract.

      After you get enough practice with this way of writing software you begin to think a lot more about questions like "how can I design this class such that every member function has a very simple set of requirements and postconditions?" Once you start doing this you are well on your way to creating software components that are easy to use right, and hard to use wrong.

      The notation dlib uses to document preconditions and postconditions is described in the introduction. All code that goes into dlib must document itself using this notation. You should also separate the implementation and specification of a component into two separate files as described in the introduction. This way users aren't confused or distracted by implementation details when they look at the documentation.

  • Use spaces instead of tabs.

      This is just generally good advice but it is especially important in dlib since everything is viewable as pretty-printed HTML. Tabs show up as 8 characters in most browsers and this results in the HTML version being difficult to read. So don't use tabs. Additionally, please use 4 spaces for each tab level.

  • Don't use capital letters in the names of variables, functions, or classes. Use the _ character to separate words.

      The reason dlib uses this style is because it is the style used by the C++ standard library. But more importantly, dlib currently provides an interface to users that has a consistent look and feel and it is important to continue to do so.

      As for constants, they should usually be written in all uppercase letters, although all lowercase is sometimes OK.
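
      A small sketch of what this convention looks like in practice. Every name here (naming_sketch, MAX_WORD_LENGTH, word_counter, and so on) is invented purely for illustration and does not refer to anything in dlib.

```cpp
#include <string>

namespace naming_sketch
{
    // Constants: usually all uppercase letters.
    const unsigned long MAX_WORD_LENGTH = 32;

    // Classes, functions, and variables: lowercase words separated by '_'.
    class word_counter
    {
    public:
        void add_word (const std::string& word)
        {
            // Overly long words are ignored in this toy example.
            if (word.size() <= MAX_WORD_LENGTH)
                ++number_of_words;
        }

        unsigned long word_count () const { return number_of_words; }

    private:
        unsigned long number_of_words = 0;
    };
}
```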

  • Don't use manual resource management. Use RAII instead.

      You should not be calling new and delete in your own code. You should instead be using objects like the std::vector, std::shared_ptr, or any number of other objects that manage resources such as memory for you. If you want an array use std::vector (or the checked std_vector_c). If you want to make a lookup table use a map. If you want a two dimensional array use matrix or array2d.

      These container objects are examples of what is called RAII (Resource Acquisition Is Initialization) in C++. It is essentially a name for the fact that, in C++, you can have totally automated and deterministic resource management by always associating resource acquisition with the construction of an object and resource release with the destruction of an object. I say resource management here rather than memory management because, unlike garbage collection in languages such as Java, RAII can be used for more than memory management. For example, when you use a mutex you first lock it, do something, and then you need to remember to unlock it. The RAII way of doing this is to use the auto_mutex which will lock a mutex and automatically unlock it for you. Or suppose you have made a TCP connection to another machine and you want to be certain the resources associated with that connection are always released. You can easily accomplish this with RAII by using the std::unique_ptr as shown in this example program.

      RAII is a trivial technique to use. All you have to do is not call new and delete and you will never have another memory leak. Just use the appropriate container instead. Finally, if you don't use RAII then your code is almost certainly not exception safe.
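
      The following sketch tries to make the mechanism visible using only the standard library. The tracked_resource class and both functions are hypothetical names invented for this example, and std::lock_guard is used as the standard library analogue of the auto_mutex mentioned above, since the sketch does not depend on dlib.

```cpp
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical resource: counts how many instances are currently alive
// so the effect of RAII can be observed from the outside.
class tracked_resource
{
public:
    tracked_resource ()  { ++live_count(); }   // acquire in the constructor
    ~tracked_resource () { --live_count(); }   // release in the destructor

    tracked_resource (const tracked_resource&) = delete;
    tracked_resource& operator= (const tracked_resource&) = delete;

    static int& live_count ()
    {
        static int count = 0;
        return count;
    }
};

// Acquires resources and then fails. The destructors still run while the
// stack unwinds, so nothing is leaked and no cleanup code is needed here.
void use_resource_then_fail ()
{
    tracked_resource res;
    std::vector<double> buffer(1000);   // memory is owned by the vector
    throw std::runtime_error("something went wrong");
}

// The mutex is unlocked when 'lock' goes out of scope, on every exit path.
int increment_shared_counter (std::mutex& m, int& counter)
{
    std::lock_guard<std::mutex> lock(m);
    return ++counter;
}
```

      If a caller catches the exception thrown by use_resource_then_fail(), tracked_resource::live_count() is back at zero afterwards, even though the function contains no explicit cleanup.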

  • Don't use pointers

      There are a number of reasons to not use pointers. First, if you are using pointers then you are probably not using RAII. Second, pointers are ambiguous. When I see a pointer I don't know if it is a pointer to a single item, a pointer to nothing, or a pointer to an array of who knows how many things. On the other hand, when I see a std::vector I know with certainty that I'm dealing with a kind of array. Or if I see a reference to something then I know I'm dealing with exactly one instance of some object.

      Most importantly, it is impossible to validate the state of a pointer. Consider two functions:

      double compute_sum_of_array_elements(const double* array, int array_size);
      double compute_sum_of_array_elements(const std::vector<double>& array);

      The first function is inherently unsafe. If the user accidentally passes in an invalid pointer or sets the size argument incorrectly then their program may crash and this will turn into a potentially hard to find bug. This is because there is absolutely nothing you can do inside the first function to tell the difference between a valid pointer and size pair and an invalid pointer and size pair. Nothing. The second function has none of these difficulties.

      If you absolutely need pointer semantics then you can usually use a smart pointer like std::unique_ptr or std::shared_ptr. If that still isn't good enough for you and you really need to use a normal C style pointer then isolate your pointers inside a class or function so that they are contained in a small area of the code. However, in practice the container classes in dlib and the STL are more than sufficient in nearly every case where pointers would otherwise be used.
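
      One possible way to write the safer of the two functions is sketched below, together with a small example of using a smart pointer when pointer semantics really are wanted. The make_ramp function is a hypothetical helper invented for this illustration, and std::make_unique assumes a C++14 or later compiler.

```cpp
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

// The vector carries its own size, so there is no separate size argument
// that could disagree with the data.
double compute_sum_of_array_elements (const std::vector<double>& array)
{
    return std::accumulate(array.begin(), array.end(), 0.0);
}

// Hypothetical helper: returns a heap allocated vector containing
// 1, 2, ..., n. The unique_ptr makes ownership explicit and frees the
// vector automatically when the pointer goes out of scope.
std::unique_ptr<std::vector<double>> make_ramp (std::size_t n)
{
    auto ramp = std::make_unique<std::vector<double>>(n);
    std::iota(ramp->begin(), ramp->end(), 1.0);
    return ramp;
}
```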

  • Don't use #define for constants.

      dlib is meant to be integrated into other people's projects. Because of this everything in dlib is contained inside the dlib namespace to avoid naming conflicts with users' code. #defines don't respect namespaces at all. For example, if you #define a constant called SIZE then it will cause a conflict with any piece of code anywhere that contains the identifier SIZE. This means that #define based constants must be avoided and constants should be created using the const keyword instead.
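
      To illustrate the difference, here is a minimal sketch in which two unrelated constants named SIZE coexist because each lives in its own namespace. The namespace names and values are hypothetical and chosen only for this example.

```cpp
namespace image_tools_sketch
{
    // This SIZE is scoped to image_tools_sketch.
    const unsigned long SIZE = 256;
}

namespace audio_tools_sketch
{
    // A second, unrelated SIZE can coexist because the namespace differs.
    const unsigned long SIZE = 1024;
}

// By contrast, a macro such as
//     #define SIZE 256
// placed before this point would textually replace every later use of
// the identifier SIZE, including both declarations above.

unsigned long total_size ()
{
    return image_tools_sketch::SIZE + audio_tools_sketch::SIZE;
}
```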

  • Don't use stack based arrays.

      A stack based array, or C style array, is an array declared like this:

      int array[200];

      Most of my criticisms of pointers also apply to stack based arrays. In particular, if you are passing a stack based array to a function then that means you are probably using functions similar to the unsafe compute_sum_of_array_elements() example above.

      The only time it is OK to use this kind of array is when you use it for simple tasks and you don't start passing pointers to the array to other parts of your code. You should also use a constant to store the array size and use that constant in your loops rather than hard coding the size in numerous places.

      Even so, you should use a container class instead, preferably one with the ability to do range checking such as the std_vector_c.

      Consider the following two bits of code:

      for (int i = 0; i < array_size; ++i) 
         my_c_array[i] = 4;
      
      for (std::size_t i = 0; i < my_std_vector.size(); ++i)
         my_std_vector[i] = 4;
      
      The second loop clearly doesn't overflow the bounds of my_std_vector. On the other hand, just by looking at the code in the first loop, we cannot tell if it overflows my_c_array. We have to assume that array_size is the appropriate constant but we could be wrong.

      Buffer overflows are probably the most common kind of bug in C and C++ code. These bugs also lead to serious exploitable security holes in software. So please try to avoid stack based arrays.
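
      As a sketch of what range checking buys you, the example below uses the standard std::vector::at() member, which throws std::out_of_range on a bad index. It is used here only as a standard library stand-in for a range checked container like std_vector_c, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// The loop bound comes from the container itself, so it cannot disagree
// with the actual number of elements.
void fill_with (std::vector<int>& values, int fill_value)
{
    for (std::size_t i = 0; i < values.size(); ++i)
        values[i] = fill_value;
}

// at() checks the index and throws std::out_of_range if
// index >= values.size(), rather than reading past the end of the array.
int checked_get (const std::vector<int>& values, std::size_t index)
{
    return values.at(index);
}
```

      With a three element vector, checked_get(values, 3) results in an exception at the point of the mistake instead of an out of bounds read.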

  • Use exceptions, but don't abuse them.

      Exceptions are one of the great features of modern programming languages. Some people, however, consider that to be a contentious statement. But if you accept the notion that a software library should be hard to use wrong then it becomes difficult to reject exceptions.

      Most of the complaints I hear about exceptions are actually complaints about their misuse rather than objections to the basic idea. So before I begin to defend the above paragraph I would like to lay out more clearly when it is appropriate to use exceptions and when it is not.

      There are two basic questions you should ask yourself when deciding whether to throw an exception in response to some event. The first is (1) "should this event occur in the normal use of my library component?" The second question is (2) "if this event were to occur, is it likely that the user will want to place the code for dealing with the event near the invocations of my library component?"

      If your answers to the above two questions are "no" then you should probably throw an exception in response to the event. On the other hand, if you answer "yes" to either of these questions then you should probably not throw an exception.

      A good example of an event worth throwing exceptions for is running out of memory. (1) It doesn't happen very often, and (2) when it does happen it is hardly ever the case that you want to deal with the out of memory event right next to the place where you are attempting to allocate memory.

      Alternatively, an example of an event that shouldn't throw an exception comes to us from the C++ I/O streams. This part of the standard library allows you to read the contents of a file from disk. When you hit the end of file the streams do not throw an exception. This is appropriate because (1) you usually want to read a file in its entirety, so hitting EOF happens all the time. Additionally, (2) when you hit EOF you usually want to break out of the loop you are in and continue immediately into the next block of code.
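
      The usual stream reading idiom shows this in miniature. In the sketch below, sum_all_numbers is a hypothetical function written for this example. The loop simply ends when extraction fails, which in the common case means the end of the input was reached, and execution continues with the next statement. Note that the loop also stops on malformed input, so real code may want to check in.eof() afterwards.

```cpp
#include <istream>
#include <sstream>

// Reads whitespace separated numbers until extraction fails and returns
// their sum. No exception is involved in detecting the end of the input.
double sum_all_numbers (std::istream& in)
{
    double sum = 0;
    double value = 0;
    while (in >> value)
        sum += value;
    return sum;
}
```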

      Usually when someone tells me they don't like exceptions they give reasons like "they make me put try/catch blocks all over the place and it makes the code hard to read." Or "it makes it hard to understand the flow of a program with exceptions in it." Invariably they have been working with bodies of software that disregard the above rules regarding questions 1 and 2. Indeed, when exceptions are used for flow control the results are horrifying. Using exceptions for events that occur in the normal use of a software component, especially when the events need to be dealt with near where they happen, results in a spaghetti-like mess of throw statements and try/catch blocks. Clearly, exceptions should be used judiciously. So please, take my advice regarding questions 1 and 2 to heart.

      Now let's go back to my claim that exceptions are an important part of making a library that is hard to use wrong. But first let's be honest about one thing: many developers don't think very hard about error handling and they similarly aren't very careful about checking function return codes. Moreover, even the most studious of us can easily forget to add these checks. It is also easy to forget to add appropriate exception catch blocks.

      So what is so great about exceptions then? Well, let's imagine some error just occurred and it caused an exception to be thrown. If you forgot to set up catch blocks to deal with the error then your program will be aborted. Not exactly a great thing. But you will, however, be able to easily find out what exception was thrown. Additionally, exceptions typically contain a message telling you all about the error. Moreover, any debugger worth its salt will be able to show you a stack trace that lets you see exactly where the exception came from. The exception forces you, the user, to be aware of this potential error and to add a catch block to deal with it. This is where the "hard to use wrong" comes from.

      Now let's imagine that we are using return codes to communicate errors to the user and the same error occurs. If you forgot to do all your return code checking then you will simply be unaware of the error. Maybe your program will crash right away. But more likely, it will continue to run for a while before crashing at some random place far away from the source of the error. You and your debugger now get to spend a few hours of quality time together trying to figure out what went wrong.

      The above considerations are why I maintain that exceptions, when used properly, contribute to the "hard to use wrong" factor of a library. There are also other reasons to use exceptions. They free the user from needing to clutter up code with lots of return code checking. This makes code easier to read and lets you focus more on the algorithm you are trying to implement and less on the bookkeeping.

      Finally, it is important to note that there is a place for return codes. When you answer "no" to questions 1 and 2, I suggest using exceptions. However, if you answer "yes" to even one of them then I would recommend pretty much anything other than throwing an exception. In this case error codes are often an excellent idea.

      As an aside, it is also important that your exception classes inherit from dlib::error to maintain consistency with the rest of the library.
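
      To make questions 1 and 2 concrete, here is a hedged sketch of a small settings container that uses both mechanisms. All names (settings_sketch, settings_error, find, add_line) are hypothetical. The exception type inherits from std::runtime_error only to keep the example self-contained, whereas code submitted to dlib would inherit from dlib::error as described above.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical exception type for this sketch.
class settings_error : public std::runtime_error
{
public:
    explicit settings_error (const std::string& message)
        : std::runtime_error(message) {}
};

class settings_sketch
{
public:
    // A missing key happens in normal use, and the caller will want to
    // handle it right at the call site. Both answers are "yes", so a
    // return code is used.
    bool find (const std::string& key, std::string& value) const
    {
        const auto it = entries.find(key);
        if (it == entries.end())
            return false;
        value = it->second;
        return true;
    }

    // A malformed line is assumed not to occur in normal use, and it is
    // usually handled far from the call. Both answers are "no", so an
    // exception is thrown.
    void add_line (const std::string& line)
    {
        const std::string::size_type pos = line.find('=');
        if (pos == std::string::npos || pos == 0)
            throw settings_error("malformed settings line: " + line);
        entries[line.substr(0, pos)] = line.substr(pos + 1);
    }

private:
    std::map<std::string, std::string> entries;
};
```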

  • Write portable code

    • Don't complicate the build process

        One of dlib's design goals is to not require any installation process before it can be used. A user should be able to copy the dlib folder into their project and have it just work.

        In particular, using dlib in a project should not make it difficult to compile the project from the command line. For example, all the example programs provided with dlib can be compiled using a single statement on the command line.

        Similarly, the user should be able to check the dlib folder into whatever version control system they use without running into any difficulties. The user should then be able to check out copies of the code on any of the dlib supported platforms and have their project build without needing to mess with anything.

    • Don't make assumptions about how objects are laid out in memory.

        If you have been following the prohibition against messing around with pointers then this won't even be an issue for you. Moreover, just about the only time this should even come up is when you are casting blocks of memory into other types or dumping the contents of memory to an I/O channel. All of these things are highly non-portable so don't do them.

        If you want a portable way to write the state of an object to an I/O channel then I recommend you use the serialization capability in dlib. If that doesn't suit your needs then do something else, but whatever you do don't just dump the contents of memory. Convert your data into some portable format and then output that.

        As an example of something else you might do: suppose you have a bunch of integers you want to write to disk. Assuming all your integers are positive numbers representable using 32 or fewer bits you could store all your numbers in dlib::uint32 variables and then convert them into either big or little endian byte order and then write them to an output stream. You could do this using code similar to the following:

        dlib::byte_orderer bo;
        ...
        bo.host_to_big(my_uint);
        my_out_stream.write(reinterpret_cast<const char*>(&my_uint), sizeof(my_uint));
        ... 

        There are three important things to understand about this process. First, you need to pick variables that always have the same size on all platforms. This means you can't use any of the built in C++ types like int, float, double, long, etc. All of these types have different sizes depending on your platform and even compiler settings. So you need to use something like dlib::uint32 to obtain a type of a known size.

        Second, you need to convert each thing you write out into either big or little endian byte order. The reason for this is, again, portability. If you don't explicitly convert to one of these byte orders then you end up writing data out using whatever byte order is used by your current machine. If you do this then only machines that have the same byte order as yours will be able to read in your data. If you use the dlib::byte_orderer object this is easy. It is very type safe. In fact, you should have a hard time even getting it to compile if you use it wrong.

        The third thing you should understand is that you need to write out each of your variables one at a time. You can't write out an entire struct in a single ostream.write() statement because the compiler is allowed to put any kind of padding it feels like between the fields in a struct.

        You may be aware that compilers usually provide #pragma directives that allow you to explicitly control this padding. However, if you want to submit code to dlib you will not use this feature. Not all compilers support it in the same way and, more importantly, not all CPU architectures are even capable of running code that has had the padding messed with. This is because it can result in the CPU attempting to perform what is called an "unaligned load" which many CPUs (like the SPARC) are incapable of doing.

        So in summary, convert your data into a known type with a fixed size, then convert into a specific byte order (like big endian), then write out each variable individually. Or you could just use serialize and not worry about all this horrible stuff. :)
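
        For readers who want to see the three steps spelled out without any library support, the sketch below writes fixed size integers in big endian order one field at a time. It deliberately uses explicit shifts instead of dlib::byte_orderer, and std::uint32_t instead of dlib::uint32, so that it is self-contained. The function names and record_sketch are hypothetical.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes value as four bytes, most significant byte first, regardless of
// the byte order of the host machine.
void write_uint32_big_endian (std::ostream& out, std::uint32_t value)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        out.put(static_cast<char>((value >> shift) & 0xFF));
}

// Reads four bytes written by write_uint32_big_endian(). Returns false,
// leaving value unchanged, if the stream runs out of data.
bool read_uint32_big_endian (std::istream& in, std::uint32_t& value)
{
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i)
    {
        const int byte = in.get();
        if (!in)
            return false;
        result = (result << 8) | static_cast<std::uint32_t>(byte & 0xFF);
    }
    value = result;
    return true;
}

// Hypothetical record type. Each field is written individually, so any
// padding the compiler inserts between fields never reaches the stream.
struct record_sketch
{
    std::uint32_t id;
    std::uint32_t length;
};

void write_record (std::ostream& out, const record_sketch& r)
{
    write_uint32_big_endian(out, r.id);
    write_uint32_big_endian(out, r.length);
}
```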

    • All code that calls functions that aren't in dlib or the C++ standard library must be isolated inside the API wrappers.

        If you want to contribute code to dlib which needs to use something that isn't in the C++ standard then we need to introduce a new library component in the API wrappers section. The new component would provide whatever functionality you need. This new component would have to provide at least POSIX and win32 implementations.

        It is also worth pointing out that simple wrappers around operating system specific calls are usually a bad solution. This is because there are invariably subtle, if not huge, differences between what is available on different operating systems. So being truly portable takes a lot of work. It involves reading everything you can find about all the APIs needed to implement the feature on each target platform. In many cases there will be important details that are undocumented and you will only be able to find out about them by searching the internet for other developers complaining about bugs in API functions X, Y, and Z. All this stuff needs to be abstracted away to put a portable and simple interface in front of it. So this is a task that shouldn't be taken lightly.

  • Library components should have regression tests

      dlib has a regression test suite located in the dlib/test folder. Whenever possible, library components should have tests associated with them. GUI components get a pass since it isn't very easy to set up automatic tests for them but pretty much everything else should have some sort of test.

  • You must use the Boost Software License

      Having the library use more than one open source license is confusing so I ask that any code contributions be licensed under the Boost Software License.