Clustered Autoencoder Imputation
Many datasets have missing entries. Because downstream tasks often require complete, low-noise datasets, accurately imputing the missing data is valuable. Autoencoders have proven to be effective data imputers. However, while they exploit high-order dependencies between the columns of a dataset, autoencoders typically treat each row independently. This causes two problems. First, imputation accuracy is suboptimal because not all of the data is used effectively. Second, downstream classification tasks suffer because rows belonging to different classes are treated the same way. This thesis presents CLAIM (CLustered Autoencoder IMputation), an algorithm that adapts existing autoencoder networks to address these issues directly. CLAIM first separates rows into clusters based on similarity. Then, in the encoder, it applies a different, loosely connected, learned linear transformation to each cluster. Results show that this method improves the accuracy of typical autoencoder imputation strategies on sufficiently large datasets. Also presented is a CLAIM-specific iterative clustering algorithm, which allows CLAIM to improve its initial cluster assignments as needed.
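The two steps described in the abstract (cluster the rows, then apply a per-cluster linear transformation in the encoder) might be sketched roughly as follows. This is only an illustration, not the thesis's implementation: the abstract does not specify the clustering method, the pre-imputation step, or what "loosely connected" means. The sketch assumes mean-filling before clustering, plain k-means, and per-cluster weights formed as a shared matrix plus a cluster-specific offset. All names (`mean_fill`, `kmeans`, `clustered_encode`, `W_shared`, `W_delta`) are hypothetical, and training and the decoder are omitted.

```python
import numpy as np

def mean_fill(X):
    """Replace NaNs with column means (an assumed pre-imputation step)."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def kmeans(X, k, n_iter=20, seed=0):
    """Plain k-means; stands in for whatever similarity clustering CLAIM uses."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def clustered_encode(X, labels, W_shared, W_delta, b):
    """Encode each row with its own cluster's weights.

    Assumption: "loosely connected" is modelled as a shared matrix plus a
    per-cluster offset, W_shared + W_delta[cluster].
    """
    W = W_shared[None, :, :] + W_delta[labels]  # shape (n_rows, d, h)
    return np.tanh(np.einsum("nd,ndh->nh", X, W) + b)
```

With `W_delta` set to zeros, every cluster uses the same weights and the encoder reduces to an ordinary single-layer encoder. Tying the cluster weights through a shared term is one plausible way to let small clusters borrow strength from the rest of the data, but it is a design choice of this sketch rather than something the abstract states.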