Authors: Wang, ZM (Wang, Zengmao); Du, B (Du, Bo); Zhang, LF (Zhang, Lefei); Zhang, LP (Zhang, Liangpei)
|
Abstract: Batch-mode active learning (AL) approaches are dedicated to selecting the training sample set for classification, where a batch of unlabeled samples is queried at each iteration. Current state-of-the-art AL techniques exploit different query functions, which are mainly based on the evaluation of two criteria: uncertainty and diversity. Generally, the two criteria are independent of each other, and they cannot guarantee that the newly queried samples are independent and identically distributed (i.i.d.) draws from the unknown source distribution. To solve this problem, a novel form of upper bound for the true risk in this setting is derived, and this upper bound is minimized to measure the discriminative information, which is connected with the uncertainty. For the distribution match, the proposed method adopts the maximum mean discrepancy to constrain the distribution of the labeled samples and make it as similar to the overall sample distribution as possible, which helps capture the representative information of the data structure. In the proposed framework, the binary-class definition is generalized to the multiclass problem, and the discriminative and representative (DR) information is combined. In this way, our method is shown to query the most informative samples while preserving the source distribution as much as possible, thus identifying the most uncertain and representative queries. Meanwhile, the number of newly queried samples is adaptive and depends on the distribution of the labeled samples. In the experiments, we employed two benchmark remote sensing datasets, the Indian Pines and Washington DC datasets, and the results confirmed the superior performance of the proposed framework compared with other state-of-the-art AL methods. (C) 2015 Elsevier B.V. All rights reserved.
|
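The distribution-match criterion in the abstract relies on the maximum mean discrepancy (MMD) between the labeled samples and the overall sample set. The following is a minimal sketch of how such a quantity can be estimated, not the authors' implementation. It assumes a Gaussian (RBF) kernel and the biased empirical estimator of the squared MMD; the function names `rbf_kernel` and `mmd_squared` and the bandwidth parameter `gamma` are illustrative choices that do not come from the record.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances via the expansion
    # ||u - v||^2 = ||u||^2 + ||v||^2 - 2 u.v
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * (a @ b.T)
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mmd_squared(x, y, gamma=1.0):
    """Biased empirical estimate of the squared MMD between samples x and y.

    x and y are 2-D arrays of shape (n_samples, n_features).
    The kernel choice and gamma are assumptions made for illustration.
    """
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy
```

In a batch-mode setting, a value of `mmd_squared(labeled, pool)` close to zero would indicate that the labeled set resembles the full pool under the chosen kernel, which is the sense in which a queried batch can be called representative.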