Multiple Kernel Learning And Its Application In Bioinformatics

Hasan, Md. Al Mehedi

URI: http://rulrepository.ru.ac.bd/handle/123456789/329

Date: 2017

Abstract:

Over the last few decades, the support vector machine (SVM) has been applied broadly in computational biology and bioinformatics to answer biological questions and reach valid biological conclusions. However, a successful application of the SVM depends heavily on choosing the right type of kernel function and suitable kernel parameter settings. Selecting the appropriate kernel and selecting its parameters are together referred to as the choice-of-kernel problem. Kernel learning is therefore a crucial problem for all kernel-based methods such as the SVM. Recently, multiple kernel learning (MKL) has been developed to tackle the kernel learning problem efficiently, and it offers scope for improving the performance of a system. In addition, in bioinformatics it is sometimes desirable to handle multiple data sources for pattern recognition. If these data sources are combined appropriately into one, they can provide a more "complete" representation of an entity, which in turn enhances the performance of a pattern recognition system. MKL also provides a way to combine features from various data sources, with each kernel dedicated to a particular type of data source. To exploit these two advantages of MKL, we have applied it to two challenging problems in bioinformatics: protein subcellular localization prediction and protein post-translational modification (PTM) prediction. Knowledge of the subcellular localization and PTMs of proteins is important for both basic research and drug development. Recently, various computational tools have been developed to predict the subcellular localization and the PTMs or PTM sites of a protein using different machine learning algorithms.
However, to meet the current demands of drug development and basic research, both prediction systems require additional effort to produce efficient high-throughput tools. In this thesis, we have applied MKL to provide a potential solution to the choice-of-kernel problem in one of the two applications mentioned above. Here, a set of radial basis function (RBF) kernels, in which different values of sigma yield different kernels, is taken as the search space of the choice-of-kernel problem. Moreover, since both applications can draw on various data sources, features from these sources are fused using multiple kernel learning in the expectation of further improvement. The experimental results show that the prediction systems using MKL-based SVM perform better than other top existing systems in both applications. We completed nine experiments in this thesis: four show the capability of single-kernel SVM, one shows the effects of the choice-of-kernel problem, one provides a potential solution to the choice-of-kernel problem using MKL, and the remaining three show the application of MKL to handling multiple data sources. In addition, we have developed six user-friendly web servers for six specific prediction purposes as a product of these experiments.
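
The idea of treating a set of RBF kernels with different sigma values as the search space can be illustrated with a minimal sketch. It is not the thesis's method. It assumes the common parametrization exp(-||x - z||^2 / (2 sigma^2)), uses a simple kernel-target alignment heuristic in place of a full MKL optimization, and runs on made-up toy data. All function names, sigma values, and data points are hypothetical. The same weighted sum would apply if each kernel were computed from a different data source instead of a different sigma.

```python
import math

def rbf_kernel(X, sigma):
    """Gram matrix with K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).
    This parametrization is an assumption, not taken from the thesis."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return K

def alignment(K, y):
    """Kernel-target alignment <K, y y^T>_F / (||K||_F * ||y y^T||_F)
    for labels in {-1, +1}, where ||y y^T||_F equals len(y)."""
    n = len(y)
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    return num / (norm_k * n)

def combine_kernels(kernels, y):
    """Weight each kernel by its non-negative alignment score (a heuristic
    stand-in for MKL), normalise the weights, and return the weighted sum."""
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(scores)
    if total == 0.0:
        weights = [1.0 / len(kernels)] * len(kernels)  # fall back to uniform
    else:
        weights = [s / total for s in scores]
    n = len(y)
    combined = [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]
    return weights, combined

# Hypothetical toy data: two well-separated classes in two dimensions.
X = [[0.0, 0.0], [0.1, 0.2], [3.0, 3.1], [3.2, 2.9]]
y = [1, 1, -1, -1]
sigmas = [0.1, 1.0, 10.0]  # candidate kernel widths (illustrative values)

kernels = [rbf_kernel(X, s) for s in sigmas]
weights, K_combined = combine_kernels(kernels, y)
print(weights)
```

The combined Gram matrix could then be passed to any SVM implementation that accepts a precomputed kernel. A genuine MKL solver would learn the weights jointly with the classifier instead of fixing them by a heuristic.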

Description:

This thesis is submitted to the Department of Computer Science and Engineering, University of Rajshahi, Rajshahi, Bangladesh, for the degree of Doctor of Philosophy (PhD).
