Clustering Proximities

Project Members:

  • Prof. Dr. Barbara Hammer
  • Dipl.-Inf. Alexander Hasenfuss

Duration: since 01/2007

Project Description:

Clustering and data visualization constitute important problems which occur in all areas of data analysis such as text and web mining, biomedical applications, evaluation of sensor data, etc.\ Prototype-based methods such as the self-organizing map or neural gas offer robust, flexible and efficient methods with numerous successful applications. The original approaches, however, have been proposed for standard Euclidean data only, and they cannot be applied to more general possibly non-Euclidean metric structures such as alignment distances or compression metric.

We developed relational clustering which extends SOM and NG towards data given by a general proximity matrix. The method is equivalent to the standard Euclidean one if a kernel-embedding of data into a Euclidean feature space exists. Otherwise, the method provably converges to a (possibly local) optimum of the relational dual cost function for every nonsingular symmetric matrix. Unlike standard SOM and NG, the method shows quadratic complexity which can be reduced to linear complexity by using approximations. A publication in this context has obtained the best paper award at KI 2007.