Abstract
Deep Archetypal Analysis (DeepAA) generates latent representations of high-dimensional datasets in terms of intuitively understandable basic entities called archetypes. The proposed method extends linear Archetypal Analysis (AA), an unsupervised method that represents multivariate data points as convex combinations of extremal data points. Unlike the original formulation, DeepAA is generative and capable of handling side information. In addition, our model supports data-driven representation learning, which reduces the dependence on expert knowledge. We empirically demonstrate the applicability of our approach by exploring the chemical space of small organic molecules. In doing so, we employ the archetype constraint to learn two different latent archetype representations for the same dataset, each with respect to a different chemical property. This type of supervised exploration marks a distinct starting point and lets us steer de novo molecular design.
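
For orientation, the linear AA problem that DeepAA extends is commonly stated as the following constrained factorization. This is a sketch of the standard formulation; the symbols $X$, $A$, $B$, $Z$, $n$, $p$, and $k$ are notation assumed here for illustration and are not taken from the abstract itself.

```latex
% Assumed notation: X is the n x p data matrix (one data point per row),
% k is the number of archetypes, A is n x k, and B is k x n.
\begin{aligned}
\min_{A,\,B}\quad & \lVert X - A B X \rVert_F^2 \\
\text{s.t.}\quad  & a_{ij} \ge 0,\ \textstyle\sum_{j=1}^{k} a_{ij} = 1 \quad (i = 1,\dots,n), \\
                  & b_{ji} \ge 0,\ \textstyle\sum_{i=1}^{n} b_{ji} = 1 \quad (j = 1,\dots,k).
\end{aligned}
% The archetypes are the rows of Z = B X: each is a convex combination of data points.
% Each data point is in turn approximated by a convex combination of archetypes, X \approx A Z.
```

Under these assumptions, the row-wise simplex constraints on $B$ place each archetype inside the convex hull of the data, and minimizing the reconstruction error tends to push the archetypes toward its boundary, which is why they can be read as extremal points.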