Enterprise Data Mesh (EDM) Business Rules
The Enterprise Data Mesh (EDM) business rules provided in this topic serve as the CMS standards and conventions for implementing CMS data mesh solutions.
All production EDM systems must comply with the CMS TRA and the Data Mesh Reference Architecture.
BR-DL-2: Separation of storage from compute
Separating the data from the applications and analytic tools that access it means that the data does not have to be duplicated for each organization that wants to use it. Data contributors can focus on managing their data, while users can work with the tools they already know.
BR-DL-3: Data assets are not copied or moved
The Enterprise Data Mesh does not seek to centralize the data. Using a very light footprint, we work with the contributors to align their existing data to a set of common standards and integration patterns. We register those datasets in a centralized metadata catalog, which makes them accessible and discoverable based on permissions.
BR-DL-4: Shared data assets are registered in the Hive Metastore and a user-facing data catalog
Rather than moving their data to a central location, data contributors simply publish the metadata for their data sets to the Hive Metastore, which functions as a compute metadata catalog. The compute metadata catalog allows data sets to be automatically discovered by database tools. The Hive Metastore is available now to enable access to the data mesh via the System to System Design Pattern.
Data contributors will also provide their metadata to a User Data Catalog, which will provide a searchable user interface. Once the User Data Catalog is available, individuals who are interested in implementing the End-User to Data Mesh Design Pattern will be able to explore data sets to find options that meet their business needs.
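As an illustration of the metastore registration described in this rule, the sketch below builds a HiveQL statement that registers existing files as an external table, so that only the schema and location are recorded. It is a minimal sketch, not the CMS procedure: the database, table, column names, bucket path, and the choice of Parquet are all hypothetical, and `Column` and `external_table_ddl` are helper names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    hive_type: str  # e.g. "STRING", "INT", "DATE"


def external_table_ddl(database: str, table: str, columns: list[Column], location: str) -> str:
    """Build a HiveQL statement that registers existing Parquet files as an external table.

    Only metadata (schema and location) is recorded; the files stay where they are.
    """
    column_list = ",\n  ".join(f"`{c.name}` {c.hive_type}" for c in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {column_list}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Hypothetical dataset owned by a data contributor (all names are assumptions).
ddl = external_table_ddl(
    database="example_domain",
    table="provider_summary",
    columns=[Column("provider_id", "STRING"), Column("claim_count", "INT")],
    location="s3://example-contributor-bucket/provider_summary/",
)
print(ddl)
```

Because the table is declared `EXTERNAL`, dropping it from the metastore removes only the metadata entry and leaves the contributor's files untouched, which matches the intent that data assets are not copied or moved.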
BR-DL-5: The EDM does not store raw data or unstructured data. All data in the EDM is fully structured and immediately consumable
The EDM's design is closest to what the industry calls a data lakehouse, customized to CMS requirements using data mesh and data domain principles.
BR-DL-6: Data sets remain within the data owner’s security boundary
Because the data is not moved or copied, the data remains within the data owner’s security boundary and under the data contributor’s control. Only the metadata is shared. The data contributor remains in control of who can access their data.
BR-DL-7: Data owners curate their data assets and manage freshness and usability
BR-DL-8: Data consumers bring their own compute resources
“Separation of storage from compute” means that data consumers can point their own tools (compute) at different types of storage, accessing the data wherever it lives, rather than having to load it all into one database. Each user can bring their own skills and explore the data in ways that make sense to them. Our goal is to democratize our data, which means making it accessible to the average non-technical user of information systems without requiring the involvement of IT.
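The idea can be shown with a toy example in which two independent consumers read the same data in place, locating it through a catalog entry rather than receiving a copy. This is a simplified sketch under stated assumptions: a temporary CSV file stands in for the data owner's storage, a plain dictionary stands in for the metadata catalog, and the dataset name, columns, and function names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Stand-in for the data owner's storage (an assumption for this sketch).
storage_dir = Path(tempfile.mkdtemp())
data_file = storage_dir / "enrollment.csv"
data_file.write_text("state,enrollees\nMD,120\nVA,80\nMD,30\n")

# Stand-in for the metadata catalog: it records where the data lives, not the data itself.
catalog = {"enrollment": {"location": str(data_file), "format": "csv"}}


def total_enrollees(catalog: dict, dataset: str) -> int:
    """Consumer A: sum a column by reading the data in place."""
    with open(catalog[dataset]["location"], newline="") as f:
        return sum(int(row["enrollees"]) for row in csv.DictReader(f))


def enrollees_by_state(catalog: dict, dataset: str) -> dict[str, int]:
    """Consumer B: a different analysis against the same storage location, with no copy made."""
    totals: dict[str, int] = {}
    with open(catalog[dataset]["location"], newline="") as f:
        for row in csv.DictReader(f):
            totals[row["state"]] = totals.get(row["state"], 0) + int(row["enrollees"])
    return totals


print(total_enrollees(catalog, "enrollment"))
print(enrollees_by_state(catalog, "enrollment"))
```

Both functions resolve the location from the catalog at read time, so a change made by the data owner is visible to every consumer without any reload step.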
BR-DL-9: The data owner determines the users, groups, roles, and policies that govern data access
In the System to System Design Pattern, the consuming system is responsible for implementing the role-based security required by the data owner. In the End-User to Data Mesh Design Pattern, by contrast, access roles will be applied to the data and maintained by the data owners. We began investigating potential solutions last fall and are currently performing a proof of concept for two solutions.
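A role-based check of the kind a consuming system might implement on behalf of a data owner can be sketched as follows. This is an illustrative assumption only, not either of the solutions under evaluation: the `AccessPolicy` class, the role names, and the dataset names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Policy defined by the data owner: which roles may read which datasets."""
    allowed_roles: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, dataset: str, role: str) -> None:
        """Record that holders of `role` may read `dataset`."""
        self.allowed_roles.setdefault(dataset, set()).add(role)

    def can_read(self, dataset: str, user_roles: set[str]) -> bool:
        """Return True if any of the user's roles has been granted on the dataset."""
        # Deny by default: a dataset with no grants is readable by no one.
        return bool(self.allowed_roles.get(dataset, set()) & user_roles)


# Hypothetical policy maintained by a data owner.
policy = AccessPolicy()
policy.grant("provider_summary", "claims_analyst")

print(policy.can_read("provider_summary", {"claims_analyst", "viewer"}))
print(policy.can_read("provider_summary", {"viewer"}))
print(policy.can_read("unregistered", {"claims_analyst"}))
```

The deny-by-default behavior keeps the decision with the data owner: access exists only where the owner has explicitly granted a role.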