CMS Business Intelligence Reference Architecture
This CMS BI Reference Architecture presents a framework for developing CMS BI solutions, provides architectural guidance and standards, incorporates best practices, and defines the BI environment using components compatible with the CMS TRA. It serves as a blueprint for building CMS BI solutions.
This chapter represents the CMS BI Reference Architecture in two virtual views—the “Business View” and the “Technical View.” Each view is intended to communicate with a different audience.
The Business View
The Business View, the first view of the CMS BI Reference Architecture, is also referred to as the BI Analytical Framework. This view is for communicating with a business or end-user community, and is represented in business-related, non-technical terms.
As shown in Business Intelligence Analytical Framework, the BI Analytical Framework consists of an iterative and gradual progression with the following steps:
- Collection of data
- Understanding relationships in data, which leads to information
- Understanding patterns in information, which leads to knowledge
- Understanding principles in knowledge, which leads to intelligent actions
The Business View is represented by a multi-layered BI Analytical Framework and includes a range of categories, from information consumers to levels of data sourcing.
Users Layer
The Users Layer represents different types of information consumers at various levels of the organization. This architecture supports each type of consumer within the organization as well as those external to the organization, such as CMS business partners, contractors, and U.S. citizens. The key principle of the BI Analytical Framework is its use of role-based authorization, which delivers only the needed information to each type of information consumer. The BI Analytical Framework ensures that the right information is given to the right consumer, at the right time, through the right tools.
Analytics Layer
The second layer of the BI Analytical Framework is the Analytics Layer, which comprises the Analytical Subject Areas and the Analytical Techniques.
The Analytical Subject Areas encompass the analysis and reporting of key business process categories: Claim, Beneficiary, Plan, Provider, and Quality. CMS has built analytics applications for each business process area to help CMS information users understand more about their areas of interest (beneficiaries, claims, etc.).
CMS Analytical Subject Areas represents the current CMS integrated analytics environment and its supported Analytical Subject Areas.
Each Analytical Subject Area includes BI tools and technologies to provide historical, current, and predictive views of its business operations. The following analytics functions are performed in their respective analytical subject areas:
- Claim Analytics
  - Monitoring benefits and claims
  - Identifying payments, overpayments, and recovery
- Beneficiary Analytics
  - Assessing enrollment and participation
  - Verifying eligibility and entitlement
- Provider Analytics
  - Identifying fraud, waste, and abuse
  - Monitoring service and performance
- Plan Analytics
  - Assessing premium and cost sharing
  - Assessing education and outreach
- Quality Analytics
  - Monitoring quality of service and performance
  - Understanding quality outcomes and efficiency
The analytics circles in CMS Analytical Subject Areas overlap and form a Venn diagram showing how the BI Analytical Framework enables cross-organizational analytics in an integrated analytics environment. In such a system, information consumers from diverse organizations can ask more detailed, involved questions of their enterprise BI Environment.
As the CMS IDR Cloud becomes the primary, authoritative, enterprise-wide data asset through consolidation of CMS data warehouses, data marts, and applications, CMS can broaden its scope of analytics to include new subject areas.
The Analytics Layer also includes the Analytical Techniques, which are essentially the analytics applications. Analytical techniques available for use in the BI environment are:
- Standard query and ad hoc reporting
- Online analytical processing implemented as relational online analytical processing (ROLAP), multidimensional online analytical processing (MOLAP), and hybrid online analytical processing (HOLAP)
- Dashboards and scorecards
- Data mining
- Predictive modeling
- Data visualization
- Geographical information systems (GIS)
The BI Analytical Framework provides support for CMS BI tools that are integrated with Microsoft Office applications such as Excel, PowerPoint, and Word and with collaboration and social networking tools like SharePoint.
Some BI software tools have strategic alliances with geographic analysis application vendors to provide support for GIS. These GIS capabilities provided in the BI software tools are also supported in the BI Analytical Framework.
The TRA Glossary provides a definition of each analytics application.
Data Layer
The Data Layer comprises data sources and data warehousing layers. The data sources layer represents the operational application systems environment that supports high-volume transaction processing. This environment should not be disturbed for serving any kind of BI user request. The raw data found in the operational systems is also not suitable for direct query and reporting purposes.
The data from the operational systems is selected to represent the source data system of record, meaning the data that is most accurate, complete, up-to-date, trusted, and accessible. The selected raw source data system of record is first stored in the data staging area where it is transformed into standardized CMS formats and stored in the data warehousing layer for analytic processing. The data in the data staging area is not accessible by BI users.
Data transformation is the process of mapping the source data to the destination target environment and transforming the data based upon specific business rules. The business rules represent the correctness or quality of data. When the source data is cleaned, transformed, and cataloged, it is stored in the target data warehouse.
Data warehouses are non-volatile and are periodically updated to reflect changes in business requirements. IDR is an active data warehouse that supports more frequent updates. Data marts are specific subsets of data extracted from a data warehouse for a specific purpose and user group.
Operational Systems
The CMS operational data is found in legacy systems and stored in standard mainframe data storage facilities such as the Virtual Storage Access Method (VSAM) and flat files. The operational data categories are Claim, Beneficiary, Plan, Provider, and Quality.
The Data Layer also includes data sources from external agencies, such as the Eligibility, Entitlement, Enrollment, and Death File from the Social Security Administration and census data from the U.S. Bureau of the Census. Each operational system represents an application silo in the CMS operational environment.
CMS is actively integrating its disparate operational data systems into enterprise data repositories (e.g., data warehouses, data marts, and ODSs), which are further integrated into an enterprise-level IDR. CMS uses Informatica PowerCenter to extract, transform, and load data from the various operational systems into integrated CMS data repositories.
The following data repositories are available for use in the CMS BI Environment.
Data Warehouses
- Integrated Data Repository Cloud
Data Marts
- Medicare VDM
- Drug Data Processing System VDM
Operational Data Stores
CMS uses the following ODSs to build Agency data warehouses and data marts:
- Common Working File (CWF)
- Health Insurance Portability and Accountability Act (HIPAA) Eligibility Transaction System (HETS) 270/271
- Medicare Advantage Prescription Drug System (MARx) Inductive Use Interface (IUI)
- Health Integrated General Ledger Accounting System (HIGLAS)
Collectively, these data warehouses and data marts become the data presentation area where CMS data is organized, stored, and made available for direct querying by users, report writers, and other analytic applications. Technical details of the CMS data warehouses, ODSs, and data marts are provided below in Data Layer.
Security, Data Privacy, and Data Use Agreement
Important components of the CMS BI Analytical Framework from the Business View are security and data privacy. As CMS integrates data from disparate internal and external data sources for access and sharing by common user communities, it must adhere to current Moderate Level CMSRs published in the CMS ARS, the CMS Policy for the Information Security Program (PISP), and CMS privacy and DUA policies. For additional information on privacy and DUAs, please refer to the CMS website on Privacy.
Security, privacy, and DUA are discussed further below in Cross-Infrastructure Layer.
The Technical View
The second view of the BI Reference Architecture is the Technical View, as shown in CMS BI Reference Architecture. This view provides more technical detail and is intended for a technically oriented businessperson or someone implementing, maintaining, or operating CMS IT systems. The guiding principle of the Technical View is that components of the BI framework are implementable segments categorized as technology, hardware, or software.
The Technical View is built on the same fundamental concepts as the Business View and comprises the same three layers (Users, Analytics, and Data) found in the BI Analytical Framework. In addition, the Technical View introduces a Cross-Infrastructure Layer, which represents the enterprise IT infrastructure management, services, technology, and components. This layer consists of the following categories: Security, Privacy, and DUA; System and Data Management and Administration; Network Connectivity, Protocols, and Access Middleware; and Hardware and Software Platforms.
Users Layer
The Users Layer in the Technical View represents the BI user interfaces. A user interface is the system by which BI users interact with the BI Environment and includes interfaces for Web browsers, portals, devices, and web services.
Web Browsers and Portals
The new CMS web browser-based BI Portal is the primary interface for all business users of the CMS BI Environment and is much more than a layer of screen and report presentation programs. The BI Portal includes services for accessing and retrieving data from the BI / Semantic Layer (described below in detail in Metadata) and from CMS data repositories—all triggered by user requests. Once successfully logged into the BI Portal, the user can enter a new business question, select a query or data mining function to answer a business question, or view BI training and other supporting documents.
The CMS Enterprise Portal is the common user presentation layer that provides a centralized, browser-based, secure point of entry for BI users to access BI data. The BI Portal logically consolidates information and business functions, enabling consistent delivery and presentation of information across the user base. Specifically, BI users can:
- Collaborate and share queries and reports
- Use browser-based reporting applications
- Manipulate data and information
- Save data and information in the BI Portal layer
Portal users can perform these specific BI analytic functions in various BI applications without having to exit the portal.
The CMS Enterprise Portal is also discussed in the previous topic, CMS Enterprise Portal Framework.
Devices
Devices are technology components used as vehicles to receive information and interact with the digital environment. Uses and features of devices vary widely. Examples of devices include personal computers, personal digital assistants (PDAs), mobile phones, and smartphones. Although CMS approves the use of these devices, their functionality and interaction with the CMS Enterprise Portal system have not yet been approved by the TRB.
Web Services
Web services are technologies and processes for facilitating communications between two applications. For example, custom-coded applications can communicate with a BI tool using web services. In the CMS BI environment, web services enable the use of selected (published) analytical information by other BI applications.
Analytics Layer
The Analytics Layer provides access to analytics applications central to the BI environment. A variety of applications may be supported, from static reporting and balanced scorecards to sophisticated quantitative models embedded in an operational process. This layer typically consists of various technological components designed to meet specific needs.
The CMS BI Reference Architecture supports the technology components required in the business analytics applications described in detail above in The Business View.
Data Layer
The Data Layer represents all the data sources, data integration processes, and data repositories that support the BI Environment. CMS Technical Data Architecture presents the CMS Technical Data Architecture, showing the flow of data in the Data Layer and the integration processes necessary to move data from the data sources into usable formats in the BI Environment.
This topic examines the components and services of the CMS technical data architecture, as well as the primary data sources available in the CMS operational systems. The operational data is found in legacy systems and stored in standard mainframe data storage facilities such as VSAM and flat files.
Operational Data Stores provide other sources of data available for building CMS data warehouses. CMS has implemented numerous ODSs, including CWF, MARx, and HIGLAS, that contain detailed, transaction-level data. ODSs are primarily used as data feeds to build data warehouses and not as data sources in the BI Environment for user query and reporting.
Data is extracted from the operational systems based upon specific BI application business requirements. The extracted data is placed in the Data Staging Area, where much of the data transformation takes place and the added value of a data warehouse is realized. The raw data is stored in the Data Staging Area in simplified and accessible forms—e.g., flat files, relational tables, or proprietary structures used by the Informatica PowerCenter Extract, Transform, and Load (ETL) tool. The data found in the Data Staging Area is not available as input to analytic processing in the BI environment.
Once the data in the Data Staging Area is transformed, combined, and cleaned using specific business rules, load utilities can readily load it into relational databases in the Data Repositories. Load utilities are provided by the relational database management system (RDBMS) software or by Informatica PowerCenter.
Because the ETL process is iterative, the source for a specific load process may also be the target data warehouse itself, which becomes the data feed for building dependent data marts.
The following subtopics describe each component of the Data Layer in detail.
Data Sources
The Data Sources Layer identifies all sources of data available within and outside of CMS that are accessed and used as part of the BI Environment. This data may include either structured or unstructured data.
Operational
The appropriate Medicare and Medicaid data is obtained from the operational systems: Claim, Beneficiary, Provider, Plan, or Quality. Operational systems data is extracted and transformed into standardized CMS formats based upon specific business rules, converted into database records, and stored in relational database management systems.
Unstructured
Unstructured data are captured as text in email or documents, as audio or video files, or as images. In some cases, unstructured data may be stored in the RDBMSs as binary or character large objects (BLOB or CLOB). CMS BI applications may simply retrieve unstructured data as BLOBs or CLOBs for use in the BI Environment.
Unstructured data may also be indexed and stored in the CMS Enterprise Content Management (ECM) system for use by the BI application.
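As a rough illustration of the BLOB approach described above, the sketch below stores a document's bytes in a relational table and reads them back unchanged. SQLite stands in for the RDBMS, and the `document` table, its columns, and the sample content are hypothetical names chosen only for this example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document (doc_id INTEGER PRIMARY KEY, name TEXT, body BLOB)"
)

# Any unstructured content (text, audio, image) is stored as raw bytes.
payload = "Scanned correspondence, page 1".encode("utf-8")
conn.execute(
    "INSERT INTO document (name, body) VALUES (?, ?)", ("letter.txt", payload)
)

# A BI application retrieves the object unchanged and decodes it as needed.
(stored,) = conn.execute(
    "SELECT body FROM document WHERE name = ?", ("letter.txt",)
).fetchone()
text = stored.decode("utf-8")
```

The database treats the object as opaque bytes; any interpretation of the content is left to the consuming application.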
Informational
Informational data sources are the output of analytic processes. Output data are stored as informational data in the RDBMSs or multidimensional databases and reused for further analysis.
External
External data sources are those available external to CMS’s data assets, such as Social Security Administration and Census Bureau data. External data is transformed and loaded to enhance the CMS data warehouses and data marts.
Data Integration
The Data Integration Layer contains all technology components and processes that support the processing and movement of data to prepare it for storage in the Data Repositories Layer or to share it with other analytical applications and systems. This layer may process data in scheduled batch intervals or in near real-time / “just-in-time” intervals, depending on the nature of the data and its business purpose.
Extract, Transform, Load / Apply
Extract, Transform, Load / Apply refers to the technologies and processes by which the Data Sources Layer is accessed, extracted, transformed, and loaded into storage in the Data Repositories Layer. This involves extracting the operational data, transforming it based on specific business rules and common data warehousing best practices, and loading it into a target data store or stores.
This is generally referred to as the ETL process. In the context of BI architecture, the key point is that properly designed and executed ETL allows for both the organization of data according to subject matter and the enforcement of business rules as the data is loaded. Information about data movement and the associated business rules is then stored in the metadata repository. ETL software is used multiple times in the life of data movement. Initially, ETL can be used to move (or extract) data from its original source into an ODS, data warehouse, or data mart. During each of these moves, transformations and/or business rules may be applied to clean, scrub, remove duplicates, and/or standardize the data.
In addition to loading the scrubbed data from the ODS databases into the data warehouses, additional ETL processes can aggregate data into a data mart to provide a summarized view of the information. When the loading process is complete, the data is ready for business users to access through the BI Portal.
ETL processing occurs on regular schedules (i.e., daily, weekly, or monthly) to meet business reporting or analysis requirements and consistently refreshes the data loaded into the data warehouse.
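The extract, transform, and load steps described above can be sketched in a few lines. This is a minimal illustration, not the CMS Informatica PowerCenter implementation: the record layout, the `claim_id`, `region`, and `amount` field names, and the business rules (trim and uppercase the region, reject non-positive amounts, drop duplicate claim IDs) are all assumptions made for the example, and SQLite stands in for the target data store.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract; layout and values are illustrative only.
RAW_EXTRACT = """claim_id,region,amount
C001, md ,125.50
C002,VA,80.00
C001, md ,125.50
C003,dc,-10.00
"""

def extract(text):
    """Read raw source records from a delimited flat file."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Apply the assumed business rules: standardize, validate, de-duplicate."""
    seen, clean = set(), []
    for row in rows:
        claim_id = row["claim_id"].strip()
        region = row["region"].strip().upper()
        amount = float(row["amount"])
        if amount <= 0 or claim_id in seen:
            continue  # rejected by a rule; a real process would log this
        seen.add(claim_id)
        clean.append((claim_id, region, amount))
    return clean

def load(conn, rows):
    """Load transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS claim "
        "(claim_id TEXT PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO claim VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_EXTRACT)))
loaded = conn.execute(
    "SELECT claim_id, region, amount FROM claim ORDER BY claim_id"
).fetchall()
```

Keeping the three steps as separate functions mirrors the way business rules are enforced during transformation, before any record reaches the target store.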
Integrity / Quality
Integrity / Quality represents the technology stage in which the operational data that has completed the ETL process is further assessed for quality, reliability, completeness, timeliness, accuracy, and missing values. Achieving and sustaining a high level of data quality requires an effective enterprise data governance program as well as diligent analysis, planning, implementation, and monitoring.
Agency-wide data quality standards for source data are applied in the data cleansing, consistency checking, completion, and profiling activities conducted by the ETL software as part of the transformation process. Data profiling uses specific tools to “measure” the data quality level expected from a specific source system, and the process may be quite complex. In this stage, additional edits or business rules may also be applied to data to meet user requirements.
CMS uses validation tools to measure and monitor key data quality attributes across all data types and sources. This information is used to create and support a business culture that values data quality across the CMS enterprise.
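To make the idea of "measuring" data quality concrete, the sketch below computes two simple measures for a batch of records: per-field completeness and the number of exact duplicate rows. The field names, the sample records, and the choice of measures are assumptions for illustration; the source does not specify which metrics or tools CMS uses.

```python
def profile(records, required_fields):
    """Return simple data quality measures for a list of dict records."""
    total = len(records)
    completeness = {}
    for field in required_fields:
        # A value counts as filled when it is neither None nor an empty string.
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        completeness[field] = filled / total if total else 0.0
    # Records are exact duplicates when every field and value matches.
    distinct = {tuple(sorted(r.items())) for r in records}
    return {
        "row_count": total,
        "completeness": completeness,
        "duplicate_rows": total - len(distinct),
    }

# Hypothetical sample from a source system.
sample = [
    {"bene_id": "B1", "state": "MD"},
    {"bene_id": "B2", "state": ""},
    {"bene_id": "B1", "state": "MD"},
    {"bene_id": "B3", "state": None},
]
report = profile(sample, ["bene_id", "state"])
```

A report like this could be compared against an agreed threshold for each source system before the data is accepted for loading.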
Synchronization
Synchronization is the process of sharing data by copying it across physical storage repositories while still retaining its authoritative validity. Through this process, CMS provides a database that could be used by mid-tier processes, Java-based applications, and Procedural Language / Structured Query Language applications without having to cross the network to access the DB2 data repositories in the production environment.
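The copy-based sharing described above can be illustrated with a small sketch. The source does not describe the mechanism CMS uses, so this example makes two assumptions: SQLite stands in for both the authoritative repository and the local copy, and Python's `sqlite3.Connection.backup` stands in for the synchronization tool. The `provider` table and its rows are hypothetical.

```python
import sqlite3

# Authoritative repository (stand-in for the production data store).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE provider (provider_id TEXT PRIMARY KEY, name TEXT)")
source.executemany(
    "INSERT INTO provider VALUES (?, ?)",
    [("P1", "Clinic A"), ("P2", "Clinic B")],
)
source.commit()

# Local copy that mid-tier applications can read without reaching the source.
replica = sqlite3.connect(":memory:")
source.backup(replica)  # copies the full database content into the replica

replica_rows = replica.execute(
    "SELECT provider_id, name FROM provider ORDER BY provider_id"
).fetchall()
```

The replica is read-oriented in this sketch; the source remains the authoritative copy and the replica is refreshed from it.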
Data Repositories
The Data Repositories Layer contains the databases and components providing most of the storage for the data that supports a BI Environment. Data repositories are not a replacement or replica of operational databases that reside on the Data Sources Layer, but rather, a complementary set of databases that reshape data into formats necessary for responding to ad hoc queries and helping to make business management decisions.
Data Staging Area
The Data Staging Area is where raw data from the Operational Systems is loaded, cleaned, combined, and exported to one or more data warehouses / marts. The raw data is copied into simple, accessible formats (e.g., relational databases and flat files) for transformation. The transformation programs from ETL software reformat the source data records, rectify any discrepancies, and delete duplicate records. The resolution process for data record issues is predefined in the ETL software according to CMS-created business rules.
Operational Data Store
The Operational Data Store is a hybrid environment in which operational data is transformed by ETL into an integrated format. Once placed in the ODS, the integrated data is available for online updates.
The ODS has a dual purpose: it serves as a point of integration for operational systems and supplies current, detailed data for data warehouses in the CMS data repositories.
The ODS is not available for direct query and reporting by users in the BI Environment.
Data Warehouse
A data warehouse (DW) consists of various databases in which the previously cleansed ODS data is stored in standardized formats for quick access by multiple BI tools and approved users. DWs have different attributes than transactional databases and are designed to optimize response time for user queries and reports, normalize source data, and eliminate data redundancy. Data may be sorted by specific query tables or dimensions, such as geography, time, or claim type, and is sharable by BI tools and users. Data may also be aggregated into a table such as total claims by geographic region for the week, month, or year.
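The aggregation mentioned above (for example, total claims by geographic region and period) can be sketched as a pre-computed summary table. The table and column names below are hypothetical, the sample rows are invented, and SQLite stands in for the warehouse RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claim (claim_id TEXT, region TEXT, service_month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO claim VALUES (?, ?, ?, ?)",
    [
        ("C1", "Northeast", "2024-01", 100.0),
        ("C2", "Northeast", "2024-01", 50.0),
        ("C3", "South", "2024-01", 75.0),
        ("C4", "Northeast", "2024-02", 20.0),
    ],
)

# Pre-computed summary table: claim count and total amount by region and month.
conn.execute(
    """
    CREATE TABLE claim_summary AS
    SELECT region, service_month,
           COUNT(*) AS claim_count,
           SUM(amount) AS total_amount
    FROM claim
    GROUP BY region, service_month
    """
)
summary = conn.execute(
    "SELECT region, service_month, claim_count, total_amount "
    "FROM claim_summary ORDER BY region, service_month"
).fetchall()
```

Queries against `claim_summary` read a handful of rows instead of scanning every claim, which is the response-time benefit the aggregate is meant to provide.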
This layer includes several large, user-defined databases as DWs, the largest of which is the IDR. Over time, as existing data from various sources is integrated into the IDR and more users rely on it as the primary, authoritative source of enterprise data, disparate query results from users across CMS will be reduced.
Data Mart
Because the source data extracted, transformed, and loaded into the BI Environment is used in different ways by a variety of users, data marts are used to reorganize source data to meet different user purposes. Data marts are, in effect, smaller versions of the predefined database tables developed in the DWs. Data marts are basically division- or branch-sized DWs or databases, making data available to specific user groups. For instance, one data mart may contain all the data needed to quickly answer Part D queries, while another may be focused on building a data structure to view beneficiary information by month and year. Views or VDMs provide a “window” into the data and can be used to filter out data elements that the user does not want or need to see. These VDMs are pre-defined in each data mart. The data marts and OLAP constitute all the predefined, tailored databases specifically designed to optimize response time for meeting various demands of user groups.
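The "window" role of a view can be shown with a short sketch. It assumes a hypothetical `beneficiary` table and a view named `bene_enrollment_by_month`; SQLite stands in for the data mart database, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE beneficiary "
    "(bene_id TEXT, address TEXT, state TEXT, enroll_month TEXT)"
)
conn.executemany(
    "INSERT INTO beneficiary VALUES (?, ?, ?, ?)",
    [
        ("B1", "1 Example St", "MD", "2024-01"),
        ("B2", "2 Example St", "VA", "2024-01"),
        ("B3", "3 Example St", "MD", "2024-02"),
    ],
)

# The view exposes only what this user group needs; bene_id and address are omitted.
conn.execute(
    """
    CREATE VIEW bene_enrollment_by_month AS
    SELECT enroll_month, state, COUNT(*) AS enrolled
    FROM beneficiary
    GROUP BY enroll_month, state
    """
)
cursor = conn.execute(
    "SELECT * FROM bene_enrollment_by_month ORDER BY enroll_month, state"
)
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
```

Users who query the view never see the filtered-out columns, while the underlying table remains unchanged.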
Metadata
Metadata is information common to all layers in the BI Environment that is crucial for ensuring the integrity of data as it moves from being raw data to structured formats accessible by end users. Metadata, or “data about data,” provides a consistent description of discrete data by defining common data names, definitions, and integrity rules across BI tools, ETL tools, and databases. Metadata supports all aspects of BI delivery capability, including:
- Capturing business conversion and transformation rules
- Facilitating the user’s ability to understand the “business” meaning of data elements and relationships for creating BI queries and reports
- Assisting an auditor’s ability to trace data lineage from sources to reports
Metadata also captures metrics for data usage and retrieval, providing insight into BI performance.
Metadata management is a key aspect of the BI Environment and relies on the agency’s strategy that defines and maintains the enterprise-wide metadata source. Each tool and database within the BI Environment houses metadata relevant to its function and purpose.
Metadata captured in the BI Environment is generally known as the BI Semantic Layer. This layer isolates business users from the technical complexities of databases by using everyday terms to describe the business environment. By using the BI Semantic Layer to create a query, users can retrieve exactly the data that interests them while communicating in their familiar business terminology. The BI Semantic Layer empowers users with a variety of tools and applications by providing deeper insight into enterprise data assets while hiding the complexity of data and analytics.
Some BI tools, such as MicroStrategy, call this semantic layer the Metadata, while others, such as Business Objects, call it the Universe.
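The way a semantic layer lets users work in business terms can be sketched as a simple lookup from business names to physical columns. The terms, the column names, and the `build_query` helper below are all hypothetical; commercial BI tools implement this with far richer models, so this is only a conceptual illustration.

```python
# Hypothetical mapping of business terms to physical columns.
SEMANTIC_LAYER = {
    "Claim Amount": "claim.pmt_amt",
    "Provider State": "provider.st_cd",
    "Service Month": "claim.svc_mo",
}

def build_query(business_terms, from_clause):
    """Translate business terms into a SELECT statement over physical columns."""
    unknown = [t for t in business_terms if t not in SEMANTIC_LAYER]
    if unknown:
        raise KeyError(f"Terms not defined in the semantic layer: {unknown}")
    columns = ", ".join(f'{SEMANTIC_LAYER[t]} AS "{t}"' for t in business_terms)
    return f"SELECT {columns} FROM {from_clause}"

sql = build_query(
    ["Provider State", "Claim Amount"],
    "claim JOIN provider ON claim.prvdr_id = provider.prvdr_id",
)
```

The user selects "Provider State" and "Claim Amount"; the physical names such as `st_cd` stay hidden behind the mapping.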
Cross-Infrastructure Layer
The BI Environment depends on components whose interactions span all of its layers. Many benefits of the dynamic and actionable nature of the BI Environment result from these cross-layer interactions.
The Cross-Infrastructure Layer includes security, data privacy, DUA, and infrastructure components, as described in the following subtopics.
Security, Data Privacy, and Data Use Agreement
Security is of paramount importance in a BI Environment and is applicable to every component at every level of the architecture. Security not only addresses access to applications and data, but also enables business-rule and role-based views of data.
Safeguarding privacy is closely related to security and is of utmost importance in a BI Environment. Privacy processes and technology focus on defining and managing highly sensitive PII and PHI data.
Security and data privacy rely on technologies, processes, and organizational components that comply with CMS and federal government security and privacy regulations, policies, and standards. The CMS PISP provides specific security policies; the current Moderate Level CMSRs published in the CMS ARS provide specific security requirements; and the CMS TRA Network Services, Access Control and Identity Management chapter provides specific engineering guidance for implementing system access controls that partially address the CMSRs.
A Data Use Agreement is a legally binding agreement between CMS and an external entity (e.g., contractor, private industry, academic institution, or other federal government or state agency), formed when an external entity requests the use of CMS personally identifiable data that is covered by the Privacy Act of 1974.
CMS BI tools use the CMS Enterprise LDAP Directory for user authentication. UserIDs and passwords are managed through CMS Enterprise Identity Management services.
Role-based authorization is used to manage access to BI applications, reports, analytic functions, tables, views, and procedures based upon BI user classifications, discussed in detail below in BI Implementation, BI User Management.
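A minimal sketch of role-based authorization follows. The role names, the resource identifiers, and the two helper functions are hypothetical and do not reflect the actual CMS BI user classifications; the point is only that access is decided by role rather than by individual user.

```python
# Hypothetical roles and the BI objects each role may access.
ROLE_PERMISSIONS = {
    "executive": {"dashboard.quality_summary"},
    "analyst": {"dashboard.quality_summary", "report.claims_detail"},
    "administrator": {
        "dashboard.quality_summary",
        "report.claims_detail",
        "table.claim",
    },
}

def is_authorized(user_roles, resource):
    """Return True if any of the user's roles grants access to the resource."""
    return any(resource in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)

def visible_resources(user_roles):
    """Return every resource the user may see, in sorted order."""
    granted = set()
    for role in user_roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return sorted(granted)
```

For example, `is_authorized(["analyst"], "report.claims_detail")` is true, while a user holding only the `executive` role is denied `table.claim`. Unknown roles grant nothing, so access is denied by default.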
Infrastructure
Infrastructure provides the technologies, processes, and services that enable the BI Environment to exist and operate. Infrastructure primarily includes Systems and Data Management and Administration; Database Management and Administration; Security Administration and Control; Network Connectivity, Protocols, and Access Middleware; and Hardware and Software Platforms.
Functions, capabilities, services, hardware, and software provided in each infrastructure component are described in the following subtopics.
System Management and Administration
System Management and Administration in the CMS BI Environment provide the following support and services:
- Support BI server administration via role-based access control
- Support authentication and authorization within the BI Tool and BI Portals
- Establish DUAs for all contractor staff
- Create database schemas to support the BI analytics applications
- Create user account(s) to access the various reporting databases via the BI tool
- Create ETL business rules, workflows, and source-to-target mappings to move data from source systems to IDR or other platforms
Hardware and Software Platforms
The CMS BI Environment employs a variety of hardware and software platforms, including the following:
- Web servers, including failover and load-balancing
- Application servers, including failover and load-balancing
- Database servers to support storage of application data and metadata as well as any reporting databases
- Server operating systems
- Software application servers
- Database middleware clients
- Workload / scripting automation software
- Web server software
- BI tool software and associated Software Development Kit (SDK)
- BI tool web services
Alignment with the CMS TRA Multi-Zone Architecture
The CMS BI Reference Architecture aligns with the CMS Multi-Zone Architecture, as illustrated in Alignment of CMS BI Reference Architecture with CMS TRA Multi-Zone Architecture.
BI Component Placement in the CMS TRA Multi-Zone Architecture
CMS Business Intelligence Component Placement depicts the common placement for BI tool components in the CMS TRA Multi-Zone Architecture.
Load Balancers
Load balancers are the initial point of contact for new user sessions, ensuring that user session load is dynamically and evenly distributed across all Web servers. Dynamic distribution of load allows scalability, resiliency, and ease of configuration of Web servers. User session load balancers must be placed in the Presentation Zone.
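One common way to distribute sessions dynamically and evenly is a least-connections policy, sketched below. The source does not state which algorithm CMS load balancers use, so the policy, the class name, and the server names are assumptions for illustration.

```python
class LeastConnectionsBalancer:
    """Assign each new session to the web server with the fewest active sessions."""

    def __init__(self, servers):
        self.active = {name: 0 for name in servers}

    def add_server(self, name):
        """Register an additional web server (supports scaling out)."""
        self.active.setdefault(name, 0)

    def open_session(self):
        """Pick the least-loaded server; ties go to the alphabetically first name."""
        server = min(sorted(self.active), key=lambda name: self.active[name])
        self.active[server] += 1
        return server

    def close_session(self, server):
        """Release a session so the server can receive new work."""
        if self.active[server] > 0:
            self.active[server] -= 1

balancer = LeastConnectionsBalancer(["web1", "web2"])
assigned = [balancer.open_session() for _ in range(4)]
```

Because the choice is made per session from current counts, a newly added server starts receiving sessions immediately, which is what makes scaling out straightforward.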
Web Servers
Web servers manage communication between the Web clients and the BI application servers. Web servers will be redundantly configured and load balanced. All Web servers must be placed in the Presentation Zone.
Web servers must be implemented on a CMS-approved Web server platform. Web browsers must communicate with the Web server through HTTPS.
Business Intelligence Servers
Business Intelligence servers must be implemented in the Application Zone. These servers provide the core analytical processing and job management for all reporting, analysis, and monitoring applications. Exceptions should be authorized by the TRB.
BI Servers communicate with the Web servers in the Presentation Zone through Transmission Control Protocol / Internet Protocol (TCP/IP) and with the Database Servers in the Data Zone through standard database access methods (ODBC and JDBC), which are non-compliant with the CMS TRA Multi-Zone Architecture, as stated above in BI Environment, Non-Compliance with CMS TRA Multi-Zone Architecture.
The BI application servers currently implemented in CMS are:
- MicroStrategy Intelligence Server
- Business Objects Enterprise Server
- Cognos Business Intelligence Server
- SAS Enterprise BI Server
Database Servers
Database Servers support the DWs, data marts, and ODSs implemented in the Data Zone.
Metadata Repositories
The BI tool metadata repository is a set of database tables that store BI information including warehouse schema, server definition, projects, queries, reports, users, and warehouse connectivity information. A server definition is a specification of connectivity information such as the metadata data source name (DSN) and the metadata ID and password for a configured instance of a BI server.
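The server definition described above can be pictured as a small record of connectivity settings. The sketch below is a hypothetical shape, not the schema of any particular BI tool: the class name, the field names, and the sample values are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ServerDefinition:
    """Connectivity details for one configured BI server instance."""
    instance_name: str
    metadata_dsn: str
    metadata_id: str
    # Excluded from repr so the secret is not printed or logged by accident.
    metadata_password: str = field(repr=False)

# Placeholder values only; real credentials would come from a secure store.
definition = ServerDefinition(
    instance_name="bi-server-01",
    metadata_dsn="BI_METADATA",
    metadata_id="md_user",
    metadata_password="example-only",
)
printed = repr(definition)
```

Marking the record as frozen reflects that a server definition is configuration rather than working data, and hiding the password from `repr` is one small precaution in line with the security guidance elsewhere in this chapter.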