
Moment Constrained Semi-Supervised LDA

Marco Loog

Pattern Recognition Laboratory, Delft University of Technology, The Netherlands

The Image Group, University of Copenhagen, Denmark

Abstract

This BNAIC compressed contribution provides a summary of the work originally presented at the First IAPR Workshop on Partially Supervised Learning and published in [5]. It outlines the idea behind supervised and semi-supervised learning and highlights the major shortcoming of many current methods. Having identified the principal reason for their limitations, it briefly sketches a conceptually different take on the matter for linear discriminant analysis (LDA). Finally, the contribution hints at some of the results obtained. For any details, the reader is of course referred to [5].

1 Semi-Supervision and Current Limitations

Supervised learning aims to learn from examples. That is, given a limited number of instances of a particular input-output relation, its goal is to generalize this relationship to new and unseen data in order to enable the prediction of the associated output given new input. Specifically, supervised classification aims to infer an unknown feature vector-class label relation from a finite, potentially small, number of input feature vectors and their associated, desired output class labels. Now, an elementary question in pattern recognition and machine learning is whether and, if so, how the availability of additional unlabeled data can significantly improve the training of such a classifier. This is what constitutes the problem of semi-supervised classification or, more generally, semi-supervised learning [2].

The hope or, rather, belief is that semi-supervision can bring enormous progress to many scientific and application areas in which classification problems play a key role, simply by exploiting the often enormous amounts of unlabeled data available (think computer vision, text mining, retrieval, medical diagnostics, but also the social sciences, psychometrics, econometrics, etc.). The fact of the matter, however, is that up to now semi-supervised methods have not been widely accepted outside the realm of computer science, being little used in other domains. Part of the reason for this may be that current methods offer no performance guarantees [1] and often deteriorate in the presence of large amounts of unlabeled samples [2, Chapter 4].

2 Sketch of a Different Take

In line with [4], [5] identifies as the main reason for the frequent failure of semi-supervision that current semi-supervised approaches typically rely on assumptions extraneous to the classifier being considered. If, however, these additional assumptions are not accurate, such an approach may obviously fail.

Focusing on classical LDA, the approach from [5] instead exploits the fact that the parameters to be estimated, i.e., the class means $m_i$ and the within-class covariance matrix $W$, fulfill particular intrinsic relations.

In particular, there are two relations that link label-dependent with label-independent quantities: one links the $K$ class means $m_i \in \mathbb{R}^d$ to the overall data mean $\mu \in \mathbb{R}^d$ through $\sum_{i=1}^{K} p_i m_i = \mu$ (with $p_i$ the class priors), and the other links the between-class and within-class covariance to the total covariance: $B + W = \Theta$ [3]. In this way, class-independent parameters like $\mu$ and $\Theta$, which can be more accurately estimated using the additional unlabeled data, impose constraints on the parameters relevant to LDA, $m_i$ and $W$, leading to a reduction in

[Figure 1: four panels, one per data set — haberman, spect, pima, and wdbc.]

Figure 1: Mean error rates (vertical axis, averages over 1,000 repetitions) for the supervised (black), the proposed constrained semi-supervised (orange), and the self-learned classifier (light blue) on four real-world UCI Machine Learning Repository (Asuncion and Newman, 2007) data sets for various unlabeled sample sizes (horizontal axis, logarithmic scale) and a total of ten labeled training samples.

variability of these label-dependent estimates. As a result, the performance of this semi-supervised linear discriminant is expected to improve over that of its supervised counterpart and typically does not deteriorate with increasing amounts of unlabeled data.
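The two moment relations above are easy to verify numerically. The following NumPy sketch (the two-class toy data and all variable names are merely illustrative) checks that the prior-weighted class means equal the overall mean and that the between- and within-class covariances sum to the total covariance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data, purely for checking the identities numerically.
X1 = rng.normal(loc=0.0, size=(300, 2))
X2 = rng.normal(loc=2.0, size=(100, 2))
X = np.vstack([X1, X2])
n, K = len(X), 2

# Class priors p_i, class means m_i, and pooled within-class covariance W.
p = np.array([len(X1) / n, len(X2) / n])
m = np.array([X1.mean(axis=0), X2.mean(axis=0)])
W = (p[0] * np.cov(X1, rowvar=False, bias=True)
     + p[1] * np.cov(X2, rowvar=False, bias=True))

# Label-independent quantities mu and Theta, and between-class covariance B.
mu = X.mean(axis=0)
Theta = np.cov(X, rowvar=False, bias=True)
B = sum(p[i] * np.outer(m[i] - mu, m[i] - mu) for i in range(K))

# The two moment constraints: sum_i p_i m_i = mu  and  B + W = Theta.
assert np.allclose(p @ m, mu)
assert np.allclose(B + W, Theta)
```

Note that the decomposition $B + W = \Theta$ holds exactly for the biased (divide-by-$n$) covariance estimates used above.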

The ad hoc approach employed in [5] to find parameters that indeed satisfy the two above constraints is as follows¹. Simply transform all labeled feature vectors $x$ into $x' = \Theta^{1/2} T^{-1/2} (x - m) + \mu$, with $T$ the total covariance and $m$ the total mean over all labeled data, and with $\Theta$ and $\mu$ their counterparts as determined on all data, labeled as well as unlabeled. The label-dependent statistics determined on $x'$ now fulfill the necessary constraints. Note that these constraints are already automatically fulfilled in the supervised setting, in which case $\mu = m$ and $\Theta = T$. The constraints only come into effect when additional unlabeled data is used.

3 Impression of Experimental Results

The few experimental results displayed in Figure 1 give an impression of the potential behavior of the constrained semi-supervised approach (in orange) in comparison to the standard, supervised setting (in black) and LDA trained by means of a common semi-supervised method (in light blue) typically referred to as self-training or self-learning [2, Chapter 1]. Results similar to those obtained by self-learning would have been obtained with the classical EM approach [2, Chapter 3]. In these experiments the proposed constrained approach improves over both the supervised and the self-learned approach in all cases. Additional results, examples where also this new approach may still fail to improve upon the supervised setting, and some further limitations are discussed in the original contribution [5].

References

[1] S. Ben-David, T. Lu, and D. Pál. Does unlabeled data provably help? Worst-case analysis of the sample complexity of semi-supervised learning. In COLT, pages 33–44, 2008.

[2] O. Chapelle, B. Schölkopf, and A. Zien. Semi-Supervised Learning. MIT Press, 2006.

[3] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1990.

[4] M. Loog. Constrained parameter estimation for semi-supervised learning: The case of the nearest mean classifier. In ECML PKDD, volume 6322 of LNAI, pages 291–304. Springer, 2010.

[5] M. Loog. Semi-supervised linear discriminant analysis using moment constraints. In Partially Supervised Learning, volume 7081 of LNAI, pages 32–41. Springer, 2012.

[6] M. Loog and A.C. Jensen. Constrained log-likelihood-based semi-supervised linear discriminant analysis. In S+SSPR, volume 7626 of LNCS. Springer, 2012.

¹ A more principled approach can be found in [6].
