登入
選單
返回
Google圖書搜尋
Optimization Algorithms for SVM Classification
Wenjuan Wang
其他書名
Applications to Geometrical Chromosome Analysis
出版
éditeur inconnu
, 2016
URL
http://books.google.com.hk/books?id=oKWoswEACAAJ&hl=&source=gbs_api
註釋
The genome is highly organized within the cell nucleus. This organization, in particular the localization and dynamics of genes and chromosomes, is known to contribute to gene expression and cell differentiation in normal and pathological contexts. The exploration of this organization may help to diagnose disease and to identify new therapeutic targets. Conformation of chromosomes can be analyzed by distance measurements of distinct fluorescently labeled DNA sites. In this context, the spatial organization of yeast chromosome III was shown to differ between two cell types, MATa and MATa. However, imaging data are subject to noise, due to microscope resolution and the living state of yeast cells. In this thesis, the aim is to develop new classification methods to discriminate two mating types of yeast cells based on distance measurements between three loci on chromosome III aided by estimation the bound of the perturbations. We first address the issue of solving large scale SVM binary classification problems and review state of the art first order optimization stochastic algorithms. To deal with uncertainty, we propose a learning model that adjusts its robustness to noise. The method avoids over conservative situations that can be encountered with worst case robust support vector machine formulations. The magnitude of the noise perturbations that is incorporated in the model is controlled by optimizing a generalization error. No assumption on the distribution of noise is taken. Only rough estimates of perturbations bounds are required. The resulting problem is a large scale bi-level program. To solve it, we propose a bi-level algorithm that performs very cheap stochastic gradient moves and is therefore well suited to large datasets. The convergence is proven for a class of general problems. We present encouraging experimental results confirming that the technique outperforms robust second order cone programming formulations on public datasets. The experiments also show that the extra nonlinearity generated by the uncertainty in the data penalizes the classification of chromosome data and advocates for further research on nonlinear robust models. Additionally, we provide the experimenting results of the bilevel stochastic algorithm used to perform automatic selection of the penalty parameter in linear and non-linear support vector machines. This approach avoids expensive computations that usually arise in k-fold cross validation.