|
Machine LearningThis page documents all the machine learning algorithms present in the library. In particular, there are algorithms for performing binary classification, regression, clustering, anomaly detection, and feature ranking, as well as algorithms for doing more specialized computations. A good tutorial and introduction to the general concepts used by most of the objects in this part of the library can be found in the svm example program. After reading this example another good one to consult would be the model selection example program. The major design goal of this portion of the library is to provide a highly modular and simple architecture for dealing with kernel algorithms. Towards this end, dlib takes a generic programming approach using C++ templates. In particular, each algorithm is parameterized to allow a user to supply either one of the predefined dlib kernels (e.g. RBF operating on column vectors), or a new user defined kernel. Moreover, the implementations of the algorithms are totally separated from the data on which they operate. This makes the dlib implementation generic enough to operate on any kind of data, be it column vectors, images, or some other form of structured data. All that is necessary is an appropriate kernel. Paper Describing dlib Machine LearningDavis E. King. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10, pp. 1755-1758, 2009 @Article{dlib09, author = {Davis E. King}, title = {Dlib-ml: A Machine Learning Toolkit}, journal = {Journal of Machine Learning Research}, year = {2009}, volume = {10}, pages = {1755-1758}, } |
|
A New Discriminant Principal Component Analysis Method with Partial Supervision (2009) by Dan Sun and Daoqiang ZhangThis algorithm is basically a straightforward generalization of the classical PCA technique to handle partially labeled data. It is useful if you want to learn a linear dimensionality reduction rule using a bunch of data that is partially labeled.
This object represents a map from objects of sample_type (the kind of object a kernel function operates on) to finite dimensional column vectors which represent points in the kernel feature space defined by whatever kernel is used with this object.
To use the empirical_kernel_map you supply it with a particular kernel and a set of basis samples. After that you can present it with new samples and it will project them into the part of kernel feature space spanned by your basis samples.
This means the empirical_kernel_map is a tool you can use to very easily kernelize any algorithm that operates on column vectors. All you have to do is select a set of basis samples and then use the empirical_kernel_map to project all your data points into the part of kernel feature space spanned by those basis samples. Then just run your normal algorithm on the output vectors and it will be effectively kernelized.
Regarding methods to select a set of basis samples, if you are working with only a few thousand samples then you can just use all of them as basis samples. Alternatively, the linearly_independent_subset_finder often works well for selecting a basis set. Some people also find that picking a random subset works fine.
An example use of this object is as an online algorithm for recursively estimating the centroid of a sequence of training points. This object then allows you to compute the distance between the centroid and any test points. So you can use this object to predict how similar a test point is to the data this object has been trained on (larger distances from the centroid indicate dissimilarity/anomalous points).
The object internally keeps a set of "dictionary vectors" that are used to represent the centroid. It manages these vectors using the sparsification technique described in the paper The Kernel Recursive Least Squares Algorithm by Yaakov Engel. This technique allows us to keep the number of dictionary vectors down to a minimum. In fact, the object has a user selectable tolerance parameter that controls the trade off between accuracy and number of stored dictionary vectors.
The long and short of this algorithm is that it is an online kernel based regression algorithm. You give it samples (x,y) and it learns the function f(x) == y. For a detailed description of the algorithm read the above paper.
This is an implementation of an online algorithm for recursively finding a set of linearly independent vectors in a kernel induced feature space. To use it you decide how large you would like the set to be and then you feed it sample points.
Each time you present it with a new sample point it either keeps the current set of independent points unchanged, or if the new point is "more linearly independent" than one of the points it already has, it replaces the weakly linearly independent point with the new one.
This object uses the Approximately Linearly Dependent metric described in the paper The Kernel Recursive Least Squares Algorithm by Yaakov Engel to decide which points are more linearly independent than others.
This object represents a multilayer layer perceptron network that is trained using the back propagation algorithm. The training algorithm also incorporates the momentum method. That is, each round of back propagation training also adds a fraction of the previous update. This fraction is controlled by the momentum term set in the constructor.
It is worth noting that a MLP is, in general, very inferior to modern kernel algorithms such as the support vector machine. So if you haven't tried any other techniques with your data you really should.
mlp_kernel_1:
This is implemented in the obvious way.
kernel_1ais a typedef for mlp_kernel_1 kernel_1a_cis a typedef for kernel_1a that checks its preconditions.
dlib contains a few "training post processing" algorithms (e.g. reduced and reduced2). These tools take in a trainer object, tell it to perform training, and then they take the output decision function and do some kind of post processing to it. The null_trainer_type object is useful because you can use it to run an already learned decision function through the training post processing algorithms by turning a decision function into a null_trainer_type and then giving it to a post processor.
This is a batch trainer object that is meant to wrap other batch trainer objects that create decision_function objects. It performs post processing on the output decision_function objects with the intent of representing the decision_function with fewer basis vectors.
It begins by performing the same post processing as the reduced_decision_function_trainer object but it also performs a global gradient based optimization to further improve the results.
So for example, suppose you wanted to set the bias term so that the accuracy of your decision function on +1 labeled samples was 99%. To do this you would use an instance of this object declared as follows: roc_trainer_type<trainer_type>(your_trainer, 0.99, +1);
Trains a relevance vector machine for solving regression problems. Outputs a decision_function that represents the learned regression function.
The implementation of the RVM training algorithm used by this library is based on the following paper:Tipping, M. E. and A. C. Faul (2003). Fast marginal likelihood maximisation for sparse Bayesian models. In C. M. Bishop and B. J. Frey (Eds.), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, Jan 3-6.
Trains a relevance vector machine for solving binary classification problems. Outputs a decision_function that represents the learned classifier.
The implementation of the RVM training algorithm used by this library is based on the following paper:Tipping, M. E. and A. C. Faul (2003). Fast marginal likelihood maximisation for sparse Bayesian models. In C. M. Bishop and B. J. Frey (Eds.), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, Jan 3-6.
Optimized Cutting Plane Algorithm for Large-Scale Risk Minimization by Vojtech Franc, Soren Sonnenburg; Journal of Machine Learning Research, 10(Oct):2157--2192, 2009.
Trains a nu support vector classifier and outputs a decision_function.
The implementation of the nu-svm training algorithm used by this library is based on the following excellent papers:The implementation of the Pegasos algorithm used by this object is based on the following excellent paper:
Pegasos: Primal estimated sub-gradient solver for SVM (2007) by Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro In ICML
This SVM training algorithm has two interesting properties. First, the pegasos algorithm itself converges to the solution in an amount of time unrelated to the size of the training set (in addition to being quite fast to begin with). This makes it an appropriate algorithm for learning from very large datasets. Second, this object uses the kcentroid object to maintain a sparse approximation of the learned decision function. This means that the number of support vectors in the resulting decision function is also unrelated to the size of the dataset (in normal SVM training algorithms, the number of support vectors grows approximately linearly with the size of the training set).
Trains a probabilistic_decision_function using some sort of batch trainer object such as the svm_nu_trainer or rbf_network_trainer.
The probability model is created by using the technique described in the paper:Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods by John C. Platt. March 26, 1999