Clustered Autoencoder Imputation
Daniel J. Furman
Publisher
University of Idaho, 2020
ISBN
9798672155715
URL
http://books.google.com.hk/books?id=kp_AzgEACAAJ&hl=&source=gbs_api
Description
Many datasets have missing entries. Since downstream tasks often require full datasets with little noise, accurately imputing the missing data is quite valuable. Autoencoders have proven themselves as effective data imputers. However, while they exploit high order dependencies between the columns of a dataset, autoencoders typically treat each row independently. This produces two problems. First, imputation accuracy is suboptimal because not all of the data is used effectively. Second, downstream classification tasks suffer since rows belonging to different classes get treated the same. Presented in this thesis is CLAIM (CLustered Autoencoder IMputation), an algorithm that adapts existing autoencoder networks in a way that directly addresses these issues. CLAIM first separates rows into clusters based on similarity. Then, in the encoder, it applies different, loosely connected, learned linear transformations to each cluster. Results show that this method improves the accuracy of typical autoencoder imputation strategies on sufficiently large datasets. Also presented is a CLAIM-specific iterative clustering algorithm, which allows CLAIM to improve initial cluster assignments as needed.
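The two steps in the description (cluster the rows, then encode each cluster with its own linear transformation) can be illustrated with a minimal sketch. This is not the thesis's implementation. Everything below is an assumption made for illustration: plain k-means on mean-filled rows stands in for the similarity-based clustering, "loosely connected" is read as a shared encoder matrix plus a small per-cluster offset, the decoder is a single shared linear layer, and the weights are random rather than learned. All function and variable names are hypothetical.

```python
import numpy as np

def assign_clusters(X_filled, k, n_iter=10, seed=0):
    """Plain k-means on mean-filled rows (assumed stand-in for row clustering)."""
    rng = np.random.default_rng(seed)
    centers = X_filled[rng.choice(len(X_filled), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every row to every center: shape (n_rows, k)
        d2 = ((X_filled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X_filled[labels == c].mean(axis=0)
    return labels

def clustered_encode(X_filled, labels, W_shared, W_offsets):
    """Encode each row with its cluster's matrix, W_shared + W_offsets[c]."""
    H = np.empty((X_filled.shape[0], W_shared.shape[1]))
    for c in range(len(W_offsets)):
        rows = labels == c
        H[rows] = X_filled[rows] @ (W_shared + W_offsets[c])
    return H

def impute(X, k=2, hidden=3, seed=0):
    """Forward pass only: the weights are random, not trained."""
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(X)                      # True where a value is observed
    col_means = np.nanmean(X, axis=0)        # assumes no column is entirely missing
    X_filled = np.where(mask, X, col_means)  # initial mean fill
    labels = assign_clusters(X_filled, k, seed=seed)
    n_features = X.shape[1]
    W_shared = rng.normal(scale=0.1, size=(n_features, hidden))
    W_offsets = rng.normal(scale=0.01, size=(k, n_features, hidden))
    W_dec = rng.normal(scale=0.1, size=(hidden, n_features))
    X_hat = clustered_encode(X_filled, labels, W_shared, W_offsets) @ W_dec
    # Keep observed entries; use the reconstruction only where data was missing
    return np.where(mask, X, X_hat), labels

X = np.array([[1.0, 2.0, np.nan],
              [1.1, np.nan, 3.0],
              [9.0, 8.0, 7.0],
              [np.nan, 8.2, 7.1]])
X_imp, labels = impute(X, k=2)
```

In a trained version, the shared matrix, the per-cluster offsets, and the decoder would be fit by minimizing reconstruction error on the observed entries, and keeping the offsets small is one possible way to keep the cluster-specific transformations tied to each other.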