Utilizing Embeddings to Learn a Universal Customer Behavior Representation in E-Commerce
- 2026
- Book
- Author: Miguel Alves Gomes
- Publisher: Springer Fachmedien Wiesbaden
About this book
E-commerce operates in a highly dynamic and competitive environment, where customer satisfaction is key to success. Delivering personalized experiences at scale requires systems capable of reliably modeling individual customer behavior while respecting privacy and data protection constraints such as the GDPR. This book proposes a universal, privacy-compliant customer representation that is task-agnostic and incrementally adaptable. A decoupled three-stage approach is introduced, combining self-supervised learning of customer embeddings from behavioral data with flexible downstream models for predicting customer intentions. Temporal extensions improve performance, particularly under sparse information conditions, while lifelong learning enables dynamic adaptation to new interactions and evolving product spaces without full retraining.
Comprehensive experiments across multiple real-world e-commerce datasets demonstrate consistent performance improvements over state-of-the-art baselines. By decoupling personalization from personal data, this work offers a scalable and privacy-preserving foundation for next-generation personalization systems.
Table of Contents
Frontmatter
1. Introduction
Miguel Alves Gomes
AI-generated summary: This chapter examines the transformative impact of digitalization on e-commerce, highlighting the critical role of search engines and recommendation systems in enhancing customer experience. It explores the importance of data-driven personalization in tailoring shopping experiences and increasing revenue. The text discusses the challenges and opportunities in modeling customer behavior, emphasizing the need for a universal customer representation (UCR) that can adapt to various use cases and tasks. It also addresses the technical and legal challenges of data-driven personalization, including data privacy regulations and the dynamic nature of customer behavior. The chapter outlines research goals aimed at developing a data-driven method for modeling customer behavior, focusing on the creation of a UCR that is transferable, automated, and scalable. It also investigates whether self-supervised customer behavior embeddings generalize to other domains, contributing to a broader understanding of representation learning and its applications.
This summary of the content was generated with the help of AI.
Abstract: The current century has been defined by the digitalisation of society, facilitated by the proliferation of mobile devices and the connectivity and utilisation of the Internet. This development has led to a notable simplification in the use of online services and online shopping for the customer. For instance, the advent of one-click purchasing has made it significantly easier for customers to find and purchase products online without navigating through multiple pages or performing extensive searches.
2. Fundamentals and Research Scope
Miguel Alves Gomes
AI-generated summary: This chapter covers the core principles of e-commerce analytics, focusing on customer journey mapping, targeting methodologies, and machine learning techniques for predicting customer behavior. It begins by defining customer targeting and the customer journey, highlighting the dynamic and individualized nature of customer interactions. The text then explores various types of customer representations and the data-driven approaches crucial for understanding and predicting customer behavior. Key machine learning concepts, including supervised and self-supervised learning, are introduced, with a focus on their application in e-commerce. The chapter also discusses feature engineering, sequential learning models such as RNNs, LSTMs, and Transformers, and the evaluation metrics used to assess the performance of predictive models. Embeddings, particularly context-based methods such as Skip-Gram and CBOW, are highlighted as a way to represent customer behavior in a dense, low-dimensional vector space. The chapter concludes by summarizing the theoretical groundwork necessary to address research questions in e-commerce analytics, emphasizing the importance of personalized targeting and the ethical considerations of data usage.
Abstract: In this chapter, the fundamental concepts and methodologies relevant for answering the research questions are established. This includes the core principles of e-commerce analytics, such as the customer journey and customer targeting, and the ML techniques necessary to create a customer representation and to predict future customer behavior in e-commerce.
3. State-of-the-Art
Miguel Alves Gomes
AI-generated summary: This chapter reviews state-of-the-art methodologies for constructing Universal Customer Representations (UCR) and task-specific customer representations, focusing on their applications in personalized customer targeting. It begins by exploring various UCR approaches, which aim to create reusable customer representations applicable across multiple tasks. These approaches are categorized by input type (text-based, activity-based, multimodal) and learning technique (self-supervised learning, contrastive learning, multi-task learning, and reconstruction-based approaches). The chapter highlights the dominance of Transformer-based models in enhancing generalization across domains and evaluates these representations on different downstream tasks to demonstrate their transferability and adaptability. In contrast, task-specific representations are optimized for distinct prediction tasks such as Click-Through Rate (CTR) prediction, purchase prediction, and churn prediction. While UCR methods emphasize transferability, task-specific models prioritize predictive accuracy within specific domains. The chapter also discusses the emergence of end-to-end learning methods as a promising alternative to traditional feature engineering-based approaches, offering automated feature extraction and improved generalization. However, these methods come with challenges such as high computational costs, data requirements, and privacy concerns. The chapter concludes by introducing a two-stage end-to-end customer targeting approach that integrates customer representation learning and prediction into a single learning process, enhancing model adaptability and effectiveness. Overall, it gives a detailed picture of current methodologies in customer representation learning and their potential for advancing personalized customer targeting.
Abstract: As introduced in Chap. 1, the primary objective of this thesis is to explore methodologies for constructing a UCR to enable personalized customer targeting. Consequently, this chapter provides an overview of state-of-the-art approaches for representing customer behavior. Sect. 3.1 presents general methodologies for developing and utilizing UCR, as outlined in Chap. 2. Sect. 3.2 examines approaches that focus on task-specific customer representation, highlighting techniques tailored to distinct applications.
4. Use Cases and Data
Miguel Alves Gomes
AI-generated summary: This chapter examines the critical role of personalized marketing strategies in modern e-commerce platforms, focusing on the pre-purchase and purchase phases of the customer journey. It introduces a practical use case and data foundations to investigate and validate the proposed UCR concept, emphasizing the importance of real-time, personalized decision-making for maximizing customer satisfaction and business outcomes. The chapter presents a data model that abstracts customer interactions in a privacy-preserving and regulation-compliant manner, ensuring alignment with strict data privacy regulations such as the GDPR. Four diverse datasets (Breinify, RetailRocket, YooChoose, and OpenCDP) are introduced to evaluate the transferability and generalizability of the proposed UCR approach. Each dataset offers unique characteristics and challenges, reflecting the heterogeneity of real-world e-commerce environments. The comparative analysis highlights differences in event types, conversion rates, customer observability, and session statistics, demonstrating the adaptability of the data model. The chapter concludes by laying the groundwork for evaluating the UCR approach across multiple use cases and data environments, ensuring scalability, privacy preservation, and reproducibility.
Abstract: As highlighted in Chap. 1, modern e-commerce platforms must offer personalized experiences to retain customers. This necessity demands the implementation of diverse and adaptable marketing strategies tailored to individual customers. However, achieving personalization at scale requires knowledge about the customers and their intentions, which can be challenging due to the dynamic nature of customer interactions.
5. Learning Universal Customer Behavior Representation in E-Commerce with Embeddings
Miguel Alves Gomes
AI-generated summary: This chapter explores the challenges and solutions in representing customer behavior in e-commerce using embeddings. It introduces a self-supervised approach that leverages interaction data to create universal customer representations, which are task-agnostic and privacy-compliant. The chapter presents a three-stage methodology consisting of customer information collection, representation learning, and downstream prediction. Extensive experiments across four e-commerce datasets and three predictive tasks demonstrate the effectiveness of the proposed method. The results show that the pretrained universal customer representation combined with an LSTM consistently outperforms task-specific and end-to-end baselines in predictive accuracy and robustness. The chapter also highlights the importance of input modality and data granularity, as well as the real-time capability of the proposed method. The findings confirm that universal customer representations can be learned directly from real-world interaction data, enabling accurate behavior prediction across diverse e-commerce tasks and deployment scenarios.
Abstract: Following the discussion of the theoretical foundations, the limitations of existing approaches, and the goal for applying UCR in a real-world e-commerce scenario, this chapter addresses the first research question.
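The representation-learning stage summarized for this chapter can be illustrated with a minimal sketch of how Skip-Gram training pairs might be derived from a session of interaction tokens. The token format (`event:product`), the function name `skipgram_pairs`, and the window size are assumptions made for illustration; this listing does not state how the book encodes touchpoints or configures the model.

```python
from typing import List, Tuple


def skipgram_pairs(session: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for one session of interaction tokens.

    Every token is paired with each neighbor at most `window` positions away,
    which is the standard Skip-Gram training signal.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(session):
        lo = max(0, i - window)
        hi = min(len(session), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, session[j]))
    return pairs


# Hypothetical session: each token combines an event type with an anonymous
# product id, so no personal data is needed to build the sequence.
session = ["view:p1", "view:p2", "cart:p2", "purchase:p2"]
pairs = skipgram_pairs(session, window=1)
```

In a full pipeline, pairs like these would train an embedding table, and a separate downstream model would consume the resulting vectors, which keeps representation learning decoupled from prediction.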
6. Enhancing Customer Behavior Embeddings with Additional Information
Miguel Alves Gomes
AI-generated summary: This chapter examines how customer behavior embeddings can be enhanced with additional temporal information, a critical factor in modeling customer intent and improving predictive capabilities in e-commerce. The investigation focuses on two primary strategies for integrating temporal information: feature-based encoding and learned time embeddings. The chapter introduces the Time Extended Embedding (TEE) approach, which combines both strategies to create a robust and generalizable representation of customer behavior. The experimental setup involves three diverse datasets (Breinify, YooChoose, and OpenCDP), each selected to validate the generalizability of the approach under different real-world conditions. The results demonstrate that the TEE approach consistently outperforms the baseline UCR embedding and the Time2Vec-based method across all datasets, with the most significant improvements observed on the YooChoose dataset, which features coarse-grained interactions. The chapter also conducts an ablation study to evaluate the individual contributions of temporal features, revealing that the minute of the hour is the most influential factor. Additionally, the real-time evaluation highlights the trade-offs between model performance and inference latency, emphasizing the importance of feature selection and architectural decisions in real-world deployment scenarios. The findings confirm that temporal patterns encode meaningful behavioral cues not captured by sequence information alone, offering a valuable augmentation to customer behavior modeling. The chapter concludes by summarizing the contributions and providing insights into the practical implementation of temporal feature integration in self-supervised embeddings.
Abstract: Building upon the findings from Research Question 1, it has been demonstrated that self-supervised embeddings derived solely from customer interaction data serve as an effective UCR.
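As a rough illustration of feature-based temporal encoding, the sketch below appends cyclical sine and cosine features for the minute of the hour and the hour of the day to an existing embedding vector. This is one common encoding and is an assumption here: the listing does not specify which encoding TEE uses, and the names `cyclical`, `time_features`, and `extend_embedding` are hypothetical.

```python
import math
from datetime import datetime
from typing import List


def cyclical(value: float, period: float) -> List[float]:
    """Map a periodic value onto the unit circle so that period ends meet."""
    angle = 2.0 * math.pi * value / period
    return [math.sin(angle), math.cos(angle)]


def time_features(ts: datetime) -> List[float]:
    """Encode minute of the hour and hour of the day as four cyclical features."""
    return cyclical(ts.minute, 60) + cyclical(ts.hour, 24)


def extend_embedding(embedding: List[float], ts: datetime) -> List[float]:
    """Concatenate temporal features to a pretrained behavior embedding."""
    return list(embedding) + time_features(ts)


# Hypothetical two-dimensional embedding for an interaction at 06:15.
vec = extend_embedding([0.1, -0.3], datetime(2024, 1, 1, 6, 15))
```

The cyclical form places minute 59 next to minute 0, which a raw integer feature would not. A learned alternative such as Time2Vec would replace the fixed sine and cosine terms with trainable frequencies.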
7. Lifelong Learning Embeddings for Adaptive Customer Behavior Modeling
Miguel Alves Gomes
AI-generated summary: This chapter explores the challenges of maintaining effective customer behavior models in the rapidly evolving e-commerce landscape. It introduces Lifelong Learning Embeddings (LLE), a novel approach that enables continuous adaptation to new products and interactions without the need for complete retraining. The chapter describes the four core components of LLE: incremental touchpoint integration, adaptive embedding dimensionality, regularization-based continual learning, and optional pruning of outdated touchpoints. Comprehensive experiments on three datasets demonstrate the effectiveness of LLE, showing consistent performance improvements over static retraining baselines. The results highlight the importance of preserving and extending previously learned representations, as well as the benefits of adaptive dimensionality control. The chapter concludes by discussing the practical implications of LLE for real-world e-commerce systems, emphasizing its potential to enhance customer behavior modeling and drive business success.
Abstract: Building upon the previous chapter, which demonstrated how enriching the UCR approach with temporal information enhances its expressiveness and predictive power, this chapter extends the discussion to another critical challenge, which is relevant for academia and industry alike: the need for continuous adaptability in dynamic e-commerce environments. While TEE addresses the question of how to encode additional session-based behavioral signals, it does not fully resolve the problem of how to maintain and evolve customer representations over time as new products, interactions, and behavioral patterns emerge. Accordingly, this chapter tackles the challenge of adaptability by addressing the third research question of this thesis.
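Two of the LLE components named in the summary, incremental touchpoint integration and optional pruning, can be sketched as operations on an embedding table. The dictionary layout, the initialization range, and the function names `add_touchpoints` and `prune_touchpoints` are assumptions for illustration, not the book's implementation; adaptive dimensionality and regularization-based continual learning are not covered here.

```python
import random
from typing import Dict, Iterable, List, Set


def add_touchpoints(
    table: Dict[str, List[float]],
    new_tokens: Iterable[str],
    dim: int,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Return a copy of the table with rows added for unseen touchpoints.

    Existing vectors are copied unchanged, so previously learned
    representations are preserved rather than retrained from scratch.
    """
    rng = random.Random(seed)
    extended = {tok: list(vec) for tok, vec in table.items()}
    for tok in new_tokens:
        if tok not in extended:
            # Small random initialization (an assumed choice).
            extended[tok] = [rng.uniform(-0.5 / dim, 0.5 / dim) for _ in range(dim)]
    return extended


def prune_touchpoints(
    table: Dict[str, List[float]], keep: Set[str]
) -> Dict[str, List[float]]:
    """Drop rows for touchpoints that are no longer in use."""
    return {tok: vec for tok, vec in table.items() if tok in keep}


# Hypothetical table with one known touchpoint and one newly observed product.
table = {"view:p1": [0.1, 0.2]}
extended = add_touchpoints(table, ["view:p9", "view:p1"], dim=2)
pruned = prune_touchpoints(extended, {"view:p9"})
```

After extending the table, only the new rows start from scratch; continued training would then need a regularization term to keep the older rows from drifting, which is the part this sketch leaves out.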
8. Beyond E-Commerce: Generalizing Self-Supervised Behavior Embedding Representation
Miguel Alves Gomes
AI-generated summary: This chapter asks whether self-supervised behavior embeddings, initially developed for e-commerce, can be effectively applied to other domains. The investigation focuses on three core subquestions: abstracting the e-commerce use case for general behavior prediction, identifying the general properties of the three-stage approach and UCR embeddings, and determining the conditions for practical transfer. The chapter systematically explores these subquestions, providing a structured analysis of the essential properties required for behavior prediction and the transferability of the proposed approach. It also examines related application domains such as healthcare, education, wildlife, finance, mobility, and cybersecurity, highlighting the potential and challenges of applying the method in these fields. The conclusion underscores the general conditions and empirical evidence supporting the transferability of the three-stage behavior embedding approach, offering a promising alternative to traditional, manually engineered features in behavior prediction pipelines.
Abstract: The empirical findings presented in the preceding chapters demonstrated that customer behavior in e-commerce can be effectively captured using a self-supervised learning approach that constructs a UCR from interaction data alone. The proposed UCR embeddings have been shown to be effective across diverse prediction tasks, including purchase prediction, churn estimation, and CTR forecasting, when tested on different datasets. Furthermore, the approach satisfied critical operational requirements, such as low inference latency, adaptability to heterogeneous data sources, and robustness under dynamic conditions.
9. Critical Reflection and Outlook
Miguel Alves Gomes
AI-generated summary: This chapter offers a critical reflection on, and future outlook for, the UCR, TEE, and LLE approaches presented in the preceding chapters. It systematically examines the principal components of these approaches, identifying methodological, architectural, and experimental limitations for each. The chapter addresses the core assumptions and constraints of the UCR design, the trade-offs between temporal expressiveness and computational feasibility in the TEE approach, and the modularity, dimensionality adaptation, and pruning strategy of the LLE approach. It also discusses the limitations of the experimental design, including the use of synthetic labeling for churn and CTR tasks, and the absence of live A/B testing. The chapter concludes with an outlook that synthesizes these limitations and outlines potential avenues for addressing them in future work. Additionally, it explores alternative approaches for universal customer representation, such as graph neural networks and pretrained foundation models, and discusses future directions for research in privacy-preserving customer representation learning.
Abstract: Building upon the preceding chapters, in which the design, implementation, and evaluation of the UCR, TEE, and LLE approaches were presented in detail, this chapter provides a critical reflection on their limitations and unresolved challenges, including constraints arising from the experimental design, methodological choices, and aspects of the implementation that could have been strengthened. While the proposed approaches demonstrate that the integration of privacy-awareness into customer representation learning is feasible, it must be acknowledged that alternative architectures and methods offer complementary capabilities in terms of efficiency, robustness, or adaptability. Furthermore, by discussing theoretical, architectural, and experimental shortcomings, a more balanced perspective on the generalizability, scalability, and practical feasibility of the contributions is provided.
10. Summary
Miguel Alves Gomes
AI-generated summary: This thesis addresses the creation of a Universal Customer Representation (UCR) designed to accurately model and predict customer behavior in the dynamic e-commerce sector. The research addresses the need for effective personalization strategies through a detailed understanding of customer preferences, which is achieved by developing robust customer representations. The study introduces a decoupled three-stage methodology that involves representing customer behavior through timestamped symbolic interaction sequences, learning UCR embeddings via a Skip-Gram model, and conducting predictive tasks independently. Extensive evaluations across diverse e-commerce datasets validate the predictive strength and real-time applicability of the proposed embeddings. The thesis also explores the integration of temporal context through Time Extended Embedding (TEE) and continuous adaptation through Lifelong Learning Embedding (LLE), demonstrating substantial performance gains. Additionally, the potential for cross-domain application of UCR embeddings is discussed, highlighting the structural prerequisites and domain-specific challenges. The research concludes by outlining future research directions, including privacy-preserving extensions and decentralized implementations, emphasizing the importance of ethical and customer-centric personalization.
Abstract: This thesis contributes to the development and realization of a Universal Customer Representation (UCR) capable of accurately modeling and predicting customer behavior in e-commerce contexts. The e-commerce sector is highly dynamic and competitive, where customer satisfaction is a key driver of commercial success. To achieve this, effective personalization strategies are essential, as they enable tailored experiences that respond to individual customer needs. Given the diversity of customer behavior, personalization requires multiple marketing strategies and thus a detailed understanding of customer preferences.
Backmatter
- Title: Utilizing Embeddings to Learn a Universal Customer Behavior Representation in E-Commerce
- Author: Miguel Alves Gomes
- Copyright Year: 2026
- Publisher: Springer Fachmedien Wiesbaden
- Electronic ISBN: 978-3-658-50781-7
- Print ISBN: 978-3-658-50780-0
- DOI: https://doi.org/10.1007/978-3-658-50781-7
PDF files of this book have been created in accordance with the PDF/UA-1 standard to enhance accessibility, including screen reader support, described non-text content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms and searchable, selectable text. We recognize the importance of accessibility, and we welcome queries about accessibility for any of our products. If you have a question or an access need, please get in touch with us at accessibilitysupport@springernature.com.