
Utilizing Embeddings to Learn a Universal Customer Behavior Representation in E-Commerce

  • 2026
  • Book

About this book

E-commerce operates in a highly dynamic and competitive environment, where customer satisfaction is key to success. Delivering personalized experiences at scale requires systems capable of reliably modeling individual customer behavior while respecting privacy and data protection constraints such as the GDPR. This book proposes a universal, privacy-compliant customer representation that is task-agnostic and incrementally adaptable. A decoupled three-stage approach is introduced, combining self-supervised learning of customer embeddings from behavioral data with flexible downstream models for predicting customer intentions. Temporal extensions improve performance, particularly under sparse information conditions, while lifelong learning enables dynamic adaptation to new interactions and evolving product spaces without full retraining.
Comprehensive experiments across multiple real-world e-commerce datasets demonstrate consistent performance improvements over state-of-the-art baselines. By decoupling personalization from personal data, this work offers a scalable and privacy-preserving foundation for next-generation personalization systems.
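As a rough illustration of the self-supervised stage, the following minimal sketch shows how training pairs for a Skip-Gram-style embedding model could be derived from a sequence of symbolic customer interactions. It is not the author's implementation: the function name `skipgram_pairs`, the token format (`view:item_17` and so on), and the window size are assumptions chosen only for this example.

```python
from typing import List, Tuple


def skipgram_pairs(sequence: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for one interaction sequence.

    Every touchpoint is paired with its neighbors that lie at most
    `window` positions away, which is the usual Skip-Gram training signal.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(sequence):
        start = max(0, i - window)
        stop = min(len(sequence), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a token is never its own context
                pairs.append((center, sequence[j]))
    return pairs


# Hypothetical session: touchpoints are symbolic tokens without personal data.
session = ["view:cat_shoes", "view:item_17", "cart:item_17", "purchase:item_17"]
pairs = skipgram_pairs(session, window=1)

for center, context in pairs:
    print(center, "->", context)
```

The resulting pairs would then be fed to an embedding model, and the learned vectors passed to a separate downstream predictor, which mirrors the decoupling described above.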

Table of Contents

  1. Frontmatter

  2. 1. Introduction

    Miguel Alves Gomes
    This chapter delves into the transformative impact of digitalization on e-commerce, highlighting the critical role of search engines and recommendation systems in enhancing customer experience. It explores the importance of data-driven personalization in tailoring shopping experiences and increasing revenue. The text discusses the challenges and opportunities in modeling customer behavior, emphasizing the need for a universal customer representation (UCR) that can adapt to various use cases and tasks. It also addresses the technical and legal challenges of data-driven personalization, including data privacy regulations and the dynamic nature of customer behavior. The chapter outlines research goals aimed at developing a data-driven method for modeling customer behavior, focusing on the creation of a UCR that is transferable, automated, and scalable. It also investigates whether customer behavior embeddings learned through self-supervision generalize to other domains, contributing to a broader understanding of representation learning and its applications.
  3. 2. Fundamentals and Research Scope

    Miguel Alves Gomes
    This chapter delves into the core principles of e-commerce analytics, focusing on customer journey mapping, targeting methodologies, and machine learning techniques for predicting customer behavior. It begins by defining customer targeting and the customer journey, highlighting the dynamic and individualized nature of customer interactions. The text then explores various types of customer representations and the data-driven approaches crucial for understanding and predicting customer behavior. Key machine learning concepts, including supervised and self-supervised learning, are introduced, with a focus on their application in e-commerce. The chapter also discusses feature engineering, sequential learning models like RNNs, LSTMs, and Transformers, and the evaluation metrics used to assess the performance of predictive models. Embeddings, particularly context-based methods like Skip-Gram and CBOW, are highlighted as a method for representing customer behaviors in a dense, low-dimensional vector space. The chapter concludes by summarizing the theoretical groundwork necessary to address research questions in e-commerce analytics, emphasizing the importance of personalized targeting and the ethical considerations of data usage.
  4. 3. State-of-the-Art

    Miguel Alves Gomes
    This chapter delves into the state-of-the-art methodologies for constructing Universal Customer Representations (UCR) and task-specific customer representations, focusing on their applications in personalized customer targeting. It begins by exploring various UCR approaches, which aim to create reusable customer representations applicable across multiple tasks. These approaches are categorized based on input type (text-based, activity-based, multimodal) and learning techniques (self-supervised learning, contrastive learning, multi-task learning, and reconstruction-based approaches). The chapter highlights the dominance of Transformer-based models in enhancing generalization across domains and evaluates these representations on different downstream tasks to demonstrate their transferability and adaptability. In contrast, task-specific representations are optimized for distinct prediction tasks such as Click-Through Rate (CTR) prediction, purchase prediction, and churn prediction. While UCR methods emphasize transferability, task-specific models prioritize predictive accuracy within specific domains. The chapter also discusses the emergence of end-to-end learning methods as a promising alternative to traditional feature engineering-based approaches, offering automated feature extraction and improved generalization. However, these methods come with challenges such as high computational costs, data requirements, and privacy concerns. The chapter concludes by introducing a two-stage end-to-end customer targeting approach that integrates customer representation learning and prediction into a single learning process, enhancing model adaptability and effectiveness. This comprehensive overview provides professionals with a detailed understanding of the current methodologies in customer representation learning and their potential for advancements in personalized customer targeting.
  5. 4. Use Cases and Data

    Miguel Alves Gomes
    This chapter delves into the critical role of personalized marketing strategies in modern e-commerce platforms, focusing on the pre-purchase and purchase phases of the customer journey. It introduces a practical use case and data foundations to investigate and validate the proposed UCR concept, emphasizing the importance of real-time, personalized decision-making for maximizing customer satisfaction and business outcomes. The chapter presents a data model that abstracts customer interactions in a privacy-preserving and regulation-compliant manner, ensuring alignment with strict data privacy regulations such as the GDPR. Four diverse datasets (Breinify, RetailRocket, YooChoose, and OpenCDP) are introduced to evaluate the transferability and generalizability of the proposed UCR approach. Each dataset offers unique characteristics and challenges, reflecting the heterogeneity of real-world e-commerce environments. The comparative analysis highlights differences in event types, conversion rates, customer observability, and session statistics, demonstrating the adaptability of the data model. The chapter concludes by laying the groundwork for evaluating the UCR approach across multiple use cases and data environments, ensuring scalability, privacy preservation, and reproducibility.
  6. 5. Learning Universal Customer Behavior Representation in E-Commerce with Embeddings

    Miguel Alves Gomes
    This chapter explores the challenges and solutions in representing customer behavior in e-commerce using embeddings. It introduces a self-supervised approach that leverages interaction data to create universal customer representations, which are task-agnostic and privacy-compliant. The chapter presents a three-stage methodology consisting of customer information collection, representation learning, and downstream prediction. Extensive experiments across four e-commerce datasets and three predictive tasks demonstrate the effectiveness of the proposed method. The results show that the pretrained universal customer representation combined with an LSTM consistently outperforms task-specific and end-to-end baselines in predictive accuracy and robustness. The chapter also highlights the importance of input modality and data granularity, as well as the real-time capability of the proposed method. The findings confirm that universal customer representations can be learned directly from real-world interaction data, enabling accurate behavior prediction across diverse e-commerce tasks and deployment scenarios.
  7. 6. Enhancing Customer Behavior Embeddings with Additional Information

    Miguel Alves Gomes
    This chapter delves into the enhancement of customer behavior embeddings by incorporating additional temporal information, a critical factor in modeling customer intent and improving predictive capabilities in e-commerce. The investigation focuses on two primary strategies for integrating temporal information: feature-based encoding and learned time embeddings. The chapter introduces the Time Extended Embedding (TEE) approach, which combines both strategies to create a robust and generalizable representation of customer behavior. The experimental setup involves three diverse datasets: Breinify, YooChoose, and OpenCDP, each selected to validate the generalizability of the approach under different real-world conditions. The results demonstrate that the TEE approach consistently outperforms the baseline UCR embedding and the Time2Vec-based method across all datasets, with the most significant improvements observed in the YooChoose dataset, which features coarse-grained interactions. The chapter also conducts an ablation study to evaluate the individual contributions of temporal features, revealing that the minute of the hour is the most influential factor. Additionally, the real-time evaluation highlights the trade-offs between model performance and inference latency, emphasizing the importance of feature selection and architectural decisions in real-world deployment scenarios. The findings confirm that temporal patterns encode meaningful behavioral cues not captured by sequence information alone, offering a valuable augmentation to customer behavior modeling. The chapter concludes by summarizing the contributions and providing insights into the practical implementation of temporal feature integration in self-supervised embeddings.
  8. 7. Lifelong Learning Embeddings for Adaptive Customer Behavior Modeling

    Miguel Alves Gomes
    This chapter explores the challenges of maintaining effective customer behavior models in the rapidly evolving e-commerce landscape. It introduces Lifelong Learning Embeddings (LLE), a novel approach that enables continuous adaptation to new products and interactions without the need for complete retraining. The chapter delves into the four core components of LLE: incremental touchpoint integration, adaptive embedding dimensionality, regularization-based continual learning, and optional pruning of outdated touchpoints. Through comprehensive experiments on three datasets, the effectiveness of LLE is demonstrated, showing consistent performance improvements over static retraining baselines. The results highlight the importance of preserving and extending previously learned representations, as well as the benefits of adaptive dimensionality control. The chapter concludes by discussing the practical implications of LLE for real-world e-commerce systems, emphasizing its potential to enhance customer behavior modeling and drive business success.
  9. 8. Beyond E-Commerce: Generalizing Self-Supervised Behavior Embedding Representation

    Miguel Alves Gomes
    This chapter examines whether self-supervised behavior embeddings, initially developed for e-commerce, can be applied effectively to other domains. The investigation focuses on three core subquestions: abstracting the e-commerce use case for general behavior prediction, identifying the general properties of the three-stage approach and UCR embeddings, and determining the conditions for practical transfer realization. The chapter systematically explores these subquestions, providing a structured analysis of the essential properties required for behavior prediction and the transferability of the proposed approach. It also examines related application domains such as healthcare, education, wildlife, finance, mobility, and cybersecurity, highlighting the potential and challenges of applying the method in these fields. The conclusion underscores the general conditions and empirical evidence supporting the transferability of the three-stage behavior embedding approach, offering a promising alternative to traditional, manually engineered features in behavior prediction pipelines.
  10. 9. Critical Reflection and Outlook

    Miguel Alves Gomes
    This chapter critically reflects on the UCR, TEE, and LLE approaches presented in the preceding chapters and gives an outlook on future work. It systematically examines the principal components of these approaches, identifying methodological, architectural, and experimental limitations for each. The chapter addresses the core assumptions and constraints of the UCR design, the trade-offs between temporal expressiveness and computational feasibility in the TEE approach, and the modularity, dimensionality adaptation, and pruning strategy of the LLE approach. It also discusses the limitations of the experimental design, including the use of synthetic labeling for churn and CTR tasks, and the absence of live A/B testing. The chapter concludes with an outlook that synthesizes these limitations and outlines potential avenues for addressing them in future work. Additionally, it explores alternative approaches for universal customer representation, such as graph neural networks and pretrained foundation models, and discusses future directions for research in privacy-preserving customer representation learning.
  11. 10. Summary

    Miguel Alves Gomes
    This thesis delves into the creation of a Universal Customer Representation (UCR) designed to accurately model and predict customer behavior in the dynamic e-commerce sector. The research addresses the need for effective personalization strategies through a detailed understanding of customer preferences, which is achieved by developing robust customer representations. The study introduces a decoupled three-stage methodology that involves representing customer behavior through timestamped symbolic interaction sequences, learning UCR embeddings via a Skip-Gram model, and conducting predictive tasks independently. Extensive evaluations across diverse e-commerce datasets validate the predictive strength and real-time applicability of the proposed embeddings. The thesis also explores the integration of temporal context through Time Extended Embedding (TEE) and continuous adaptation through Lifelong Learning Embeddings (LLE), demonstrating substantial performance gains. Additionally, the potential for cross-domain application of UCR embeddings is discussed, highlighting the structural prerequisites and domain-specific challenges. The thesis concludes by outlining future research directions, including privacy-preserving extensions and decentralized implementations, emphasizing the importance of ethical and customer-centric personalization.
  12. Backmatter

Title
Utilizing Embeddings to Learn a Universal Customer Behavior Representation in E-Commerce
Author
Miguel Alves Gomes
Copyright Year
2026
Electronic ISBN
978-3-658-50781-7
Print ISBN
978-3-658-50780-0
DOI
https://doi.org/10.1007/978-3-658-50781-7

PDF files of this book have been created in accordance with the PDF/UA-1 standard to enhance accessibility, including screen reader support, described non-text content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms and searchable, selectable text. We recognize the importance of accessibility, and we welcome queries about accessibility for any of our products. If you have a question or an access need, please get in touch with us at accessibilitysupport@springernature.com.
