Abstract:
Over the past decades, the support vector machine (SVM) has been widely applied in
computational biology and bioinformatics to answer biological questions and to draw
valid biological conclusions. However, the successful application of an SVM depends
heavily on choosing the right type of kernel function and suitable values for its
parameters. Selecting an appropriate kernel and its parameters together constitutes the
choice of kernel problem. Kernel learning is therefore a crucial problem for all
kernel-based methods such as the SVM. Recently, multiple kernel learning (MKL) has been
developed to tackle the kernel learning problem efficiently, and it offers scope for
improving the performance of a system.
On the other hand, it is sometimes desirable to use multiple data sources for pattern
recognition in bioinformatics. If these data sources are appropriately combined into a
single representation, they can provide a more "complete" description of an entity,
which in turn enhances the performance of a pattern recognition system. MKL also
provides a way to combine features from various data sources, with each kernel dedicated
to a particular type of data source.
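As a rough illustration of the multi-source idea, the sketch below builds one RBF kernel per data source and combines them as a convex combination K = beta_a * K_a + beta_b * K_b. The two feature matrices are random placeholders for hypothetical protein encodings, and the weights are fixed by hand; an actual MKL solver would learn them jointly with the SVM, so this is an assumption-laden sketch rather than the thesis's implementation.

```python
import numpy as np

def rbf_kernel(X, Z, sigma):
    """Gaussian RBF kernel: k(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from rounding
    return np.exp(-sq / (2.0 * sigma**2))

def combine_kernels(kernels, weights):
    """Convex combination K = sum_m beta_m * K_m with beta_m >= 0 and sum beta_m = 1."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * K for w, K in zip(weights, kernels))

rng = np.random.default_rng(0)
n = 6
# Hypothetical feature views of the same n proteins from two data sources (placeholders).
X_source_a = rng.normal(size=(n, 20))
X_source_b = rng.normal(size=(n, 5))

# One kernel per data source; sigma values are illustrative assumptions.
K_a = rbf_kernel(X_source_a, X_source_a, sigma=4.0)
K_b = rbf_kernel(X_source_b, X_source_b, sigma=2.0)

# Hand-picked weights stand in for the weights an MKL solver would learn.
K = combine_kernels([K_a, K_b], [0.7, 0.3])
```

The combined matrix K can then be passed to any SVM that accepts a precomputed kernel; because each RBF kernel is positive semidefinite and the weights are non-negative, K is positive semidefinite as well.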
To exploit these two advantages of MKL, we have applied it to two challenging problems
in bioinformatics: prediction of protein subcellular localization and prediction of
protein post-translational modifications (PTMs). Knowledge of the subcellular
localization and PTMs of proteins is important for both basic research and drug
development. Various computational tools have recently been developed to predict the
subcellular localization of a protein and its PTMs or PTM sites using different machine
learning algorithms. However, to meet the current demands of drug development and basic
research, both kinds of prediction systems require further effort to yield efficient
high-throughput tools.
In this thesis, we apply MKL to offer a potential solution to the choice of kernel
problem in one of these two applications. Here, the search space of the choice of kernel
problem is a set of radial basis function (RBF) kernels, where different values of sigma
create different kernels. Moreover, since both applications can draw on various data
sources, features from these sources are fused using MKL in the expectation of further
improvement. The experimental results show that the prediction systems using MKL-based
SVMs perform better than other top existing systems in both applications. We conducted
nine experiments in this thesis: four show the capability of single-kernel SVMs, one
shows the effects of the choice of kernel problem, one provides a potential solution to
that problem using MKL, and the remaining three show the application of MKL to handling
multiple data sources. In addition, we developed six user-friendly web servers for six
specific prediction purposes as products of these experiments.
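The RBF search space described above can be sketched in the same spirit: each candidate sigma yields one kernel, and weights over the candidates replace a single hard choice of sigma. Because the abstract does not name the MKL solver, this sketch swaps in a simple kernel-target alignment heuristic (weights proportional to the non-negative alignment of each kernel with yy^T). The sigma grid, the helper names, and the synthetic two-cluster data are all hypothetical.

```python
import numpy as np

def rbf_kernel(X, sigma):
    """Gaussian RBF Gram matrix on X with bandwidth sigma."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def alignment(K, y):
    """Kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F)."""
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

def alignment_weights(X, y, sigmas):
    """Heuristic (not a full MKL solver): weight each RBF kernel by its clipped alignment."""
    kernels = [rbf_kernel(X, s) for s in sigmas]
    scores = np.array([max(alignment(Km, y), 0.0) for Km in kernels])
    if scores.sum() == 0:
        weights = np.full(len(sigmas), 1.0 / len(sigmas))  # fall back to uniform weights
    else:
        weights = scores / scores.sum()
    K = sum(w * Km for w, Km in zip(weights, kernels))
    return weights, K

# Synthetic two-class data (assumption, for illustration only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, size=(10, 2)), rng.normal(2, 1, size=(10, 2))])
y = np.array([-1] * 10 + [1] * 10)
weights, K = alignment_weights(X, y, sigmas=[0.1, 1.0, 10.0])
```

Uncentered alignment can favor overly smooth kernels when classes are imbalanced; centered kernel alignment is a common, more robust variant, and optimization-based MKL solvers learn the weights jointly with the SVM instead.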
Description:
This thesis is submitted to the Department of Computer Science and Engineering, University of Rajshahi, Rajshahi, Bangladesh, for the degree of Doctor of Philosophy (PhD).