Feature Selection in Big Data
In this dissertation, we develop machine learning algorithms and apply them to real-life cancer data to infer plausible hypotheses compatible with the data. In the first chapter, we give the theoretical background for a network inference algorithm developed by our group. In the second chapter, we present a new machine learning algorithm for sparse regression, called CLOT (Combined L-One and Two). We prove that CLOT has the desirable properties of the two most popular sparse regression algorithms, LASSO and the Elastic Net (EN). We apply the CLOT algorithm to predict the IC50 values of 116 chemical compounds on 72 lung cancer cell lines. Our results show that CLOT is as accurate as EN and also produces sparse solutions similar to those of LASSO. In the final chapter, we relate the newly emerging field of one-bit compressed sensing to the well-established field of PAC learning theory. We also provide a new sparse classification algorithm and apply it to microRNA data to predict lymph node metastasis in endometrial cancer patients. As a result, we propose a new diagnostic test that predicts metastasis from the microRNA data. To the best of our knowledge, this is the first such diagnostic test in endometrial cancer to be validated on an independent data set.