A Statistical Approach for Information Extraction of Biological Relationships
ABSTRACT: Our goal is information extraction from unstructured literature concerning biological entities. To do this, we use a triplet structure, where each triplet contains two biological entities and one interaction word. The biological entities may include terms such as protein names, disease names, genes, and small molecules. Interaction words describe the relationship between the biological terms. Under this framework, we aim to combine the strengths of three classifiers in an ensemble approach. The three classifiers we consider are Bayesian Networks, Support Vector Machines, and a mixture of logistic models defined by the interaction word. The three classifiers and the ensemble approach are evaluated on three benchmark corpora and on one corpus that is introduced in this study. The evaluation includes both cross-validation and cross-corpus validation in order to replicate an application scenario. The three classifiers behave differently, and we find that the performance of each individual classifier varies depending on the corpus. Therefore, an ensemble of classifiers removes the need to choose a single classifier and provides optimal performance.
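The abstract does not say how the outputs of the three classifiers are combined. The sketch below is a minimal illustration that assumes a simple majority vote. That rule is a stand-in and may not be the one used in this study. It shows how a candidate (entity, interaction word, entity) triplet could be represented and how three binary decisions could be merged. The names `Triplet` and `majority_vote` are hypothetical, and so are the toy classifiers standing in for the Bayesian Network, the SVM, and the logistic mixture.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Triplet:
    """Candidate relationship: two biological entities and one interaction word."""
    entity1: str
    interaction: str
    entity2: str


# A classifier maps a triplet to True (relationship asserted) or False.
Classifier = Callable[[Triplet], bool]


def majority_vote(classifiers: List[Classifier], triplet: Triplet) -> bool:
    """Hypothetical ensemble rule: accept the triplet if a strict majority accepts it."""
    votes = sum(1 for clf in classifiers if clf(triplet))
    return votes * 2 > len(classifiers)


# Toy stand-ins (hypothetical) for the three trained classifiers.
bayes_net = lambda t: t.interaction in {"binds", "activates"}
svm = lambda t: t.entity1 != t.entity2
logistic_mixture = lambda t: t.interaction == "binds"

ensemble = [bayes_net, svm, logistic_mixture]
print(majority_vote(ensemble, Triplet("p53", "binds", "MDM2")))  # True
```

A majority vote over an odd number of classifiers never produces a tie. Other combination schemes would fit the same interface, for example averaging predicted probabilities or training a second-level classifier on the three outputs.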