Category: Computer Science, Distributed, Parallel, and Cluster Computing (cs.DC)
[Submitted on 28 Jul 2021]
Title: The Trip to The Enterprise Gourmet Data Product Marketplace through a Self-service Data Platform
Abstract: Data analytics provides core business reporting in many software companies, acts as a source of truth for key information, and enables advanced solutions that grow the business, e.g., predictive models, machine learning, and real-time recommendations.
A self-service, multi-tenant, API-first, and scalable data platform is the foundational requirement for creating an enterprise data marketplace, which enables the creation, publishing, and exchange of data products. Such a marketplace supports the exploration and discovery of data products and also provides high-level data governance and oversight of marketplace contents. In this paper, we describe our path to the gourmet data product marketplace. We cover the design principles, implementation details, technology choices, and the journey of building an enterprise data platform with these characteristics. The platform comprises ingestion, streaming, storage, transformation, schema generation, fail-safe, data sharing, access management, automatic PII identification, self-service storage optimization recommendations, and CI/CD integration.
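The abstract lists automatic PII identification as a platform component but does not say how it works. As a rough illustration only, one common approach is to sample values from each column and flag the column when most samples match a known PII pattern. The sketch below assumes that approach; the pattern set, the `scan_table` and `classify_column` helpers, and the 0.8 match threshold are all hypothetical and are not taken from the paper.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately small pattern set for illustration only.
# A real classifier would need many more detectors (names, addresses,
# locale-specific IDs) and would likely combine regexes with other signals.
PII_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "ipv4": re.compile(r"^(\d{1,3}\.){3}\d{1,3}$"),
}


def classify_column(values: List[str], threshold: float = 0.8) -> List[str]:
    """Return the PII labels whose pattern matches at least `threshold`
    of the non-empty sampled values of one column."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return []
    labels = []
    for label, pattern in PII_PATTERNS.items():
        hits = sum(1 for v in sample if pattern.match(v))
        if hits / len(sample) >= threshold:
            labels.append(label)
    return labels


def scan_table(table: Dict[str, List[str]], threshold: float = 0.8) -> Dict[str, List[str]]:
    """Map each column name to its detected PII labels, omitting clean columns."""
    result = {}
    for column, values in table.items():
        labels = classify_column(values, threshold)
        if labels:
            result[column] = labels
    return result


if __name__ == "__main__":
    sample_table = {
        "customer_email": ["a@example.com", "b.c@shop.co.uk", "x@y.org"],
        "order_id": ["1001", "1002", "1003"],
        "client_ip": ["10.0.0.1", "192.168.1.20", "n/a"],
    }
    print(scan_table(sample_table))
    print(scan_table(sample_table, threshold=0.6))
```

The threshold trades recall for precision: at 0.8 the `client_ip` column (two of three values look like IPv4 addresses) is not flagged, while at 0.6 it is.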
We then show how the platform enables and operates the data marketplace, facilitating the exchange of stable data products across users and tenants. We motivate and describe how we run scalable, decentralized data governance. All of this is built and run for Cimpress Technology (CT), which operates the Mass Customization Platform for Cimpress and its businesses. The CT data platform serves thousands of users from different platform participants, with data sourced from heterogeneous sources. The platform ingests well over 1,000 individual messages per second and serves more than 100k analytical queries daily.
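To make the marketplace idea concrete, the following is a minimal in-memory sketch of data products being published, discovered, and shared across tenants, assuming a governance model where the owning tenant alone decides who may read its product. The `Marketplace` and `DataProduct` classes, their methods, and the tenant names are hypothetical illustrations, not the CT platform's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DataProduct:
    name: str
    owner_tenant: str
    description: str
    tags: Set[str] = field(default_factory=set)
    # Tenants other than the owner that have been granted read access.
    readers: Set[str] = field(default_factory=set)


class Marketplace:
    """Hypothetical catalog: owners publish, any tenant can discover
    metadata, and only the owning tenant can grant read access."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        if product.name in self._products:
            raise ValueError(f"data product {product.name!r} already exists")
        self._products[product.name] = product

    def discover(self, tag: str) -> List[str]:
        # Discovery is open to every tenant: metadata is visible, data is not.
        return sorted(p.name for p in self._products.values() if tag in p.tags)

    def grant(self, name: str, acting_tenant: str, reader_tenant: str) -> None:
        product = self._products[name]
        # Decentralized governance: the owner, not a central team, approves access.
        if acting_tenant != product.owner_tenant:
            raise PermissionError(f"{acting_tenant} does not own {name}")
        product.readers.add(reader_tenant)

    def can_read(self, name: str, tenant: str) -> bool:
        product = self._products[name]
        return tenant == product.owner_tenant or tenant in product.readers


if __name__ == "__main__":
    market = Marketplace()
    market.publish(DataProduct("orders_daily", "fulfillment", "Daily orders", {"orders"}))
    market.publish(DataProduct("order_returns", "customer_care", "Returns", {"orders", "returns"}))
    print(market.discover("orders"))
    market.grant("orders_daily", "fulfillment", "marketing")
    print(market.can_read("orders_daily", "marketing"))
```

Separating open discovery from owner-controlled access is one way to let many tenants exchange data products without routing every access request through a central team.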
Submission history
From: Michal Zasadzinski
[v1] Wed, 28 Jul 2021 07:52:09 UTC (1,609 KB)