Business Intelligence Environment

CMS has successfully implemented the Agency’s enterprise-wide BI Environment that provides a consolidated, secure gateway to the wealth of CMS data where users can leverage a suite of standard COTS BI tools. These BI software tools are made available for use by CMS offices and centers, analysts, developers, project managers, external researchers, law enforcement, partners, government contractors, and the public.

PREFERRED

While CMS supports all of these tools, CMS prefers that project teams consider these factors:

  • Use case
  • Cost at scale
  • Ability to reuse developed capabilities for the given data source(s)

These tools have been integrated with various data repositories and frameworks. Some are for use by the enterprise and external entities, while others are specific to particular Centers or projects. The latter are called out because their capabilities can be replicated for other use cases.

The analytic and BI tools implemented and supported at CMS include the following COTS products (comparisons below):

  • Esri ArcGIS Enterprise provides an AWS Cloud based enterprise solution that enables geospatial mapping and analysis. ArcGIS is used by the Center for Program Integrity (CPI) to enable geographic mapping and analyses to help meet its business objective of finding and reducing fraud, waste, and abuse in Medicare and Medicaid. It uses geocoded data from the Integrated Data Repository Cloud (IDRC) Provider and Beneficiary data.

  • AWS Services – AWS provides Query services like Amazon Athena and Redshift Spectrum, data visualization tools like Amazon QuickSight, data warehouses like Amazon Redshift, and sophisticated data processing frameworks like Amazon EMR (Elastic MapReduce). Each of these services addresses different needs and use cases, and are used by multiple centers at CMS. The table below compares them and provides guidance to help project teams choose one or more services based on their requirements.

  • SAP Business Objects (SAP Business Intelligence Platform) supports the Center for Program Integrity (CPI), for example, using the CMS Integrated Data Repository Cloud (IDRC) and provides ad hoc and OLAP reporting capabilities.

  • IBM Cognos Analytics supports the Part D and Drug Data Processing System (DDPS),Medical Appeal System (MAS), Unified Case Management (UCM), and Eligibility Appeals Case Management Solution (EACMS) (among others) using the Teradata data repository and provides ad hoc and online analytical processing (OLAP) reporting capabilities.

  • Informatica PowerCenter and Data Engineering Integration (DEI) are data integration tools used to integrate data warehouses, data marts, and operational data stores (ODS) with the capability to extract, transform, and load (ETL) data from virtually any business data source in CMS. PowerCenter is used for ETL with DB2, Teradata, Oracle databases, and flat files.

  • MicroStrategy MicroStrategy is a data analytics and business intelligence reporting tool that can generate interactive dashboards, high-end formatting in reports, scorecards, and several other features relating to the generation, sorting, and automated distribution of reports that provide insights into past, present, and future business trends. It supports Part B Analytics Reports (PBAR), for example, using the Teradata data repository and provides ad hoc and OLAP reporting capabilities.

  • Microsoft Power BI is a unified cloud-based business intelligence and analytics service that provides a full overview of the application's most critical data. By connecting to all the application data sources, Power BI simplifies data evaluation and sharing with scalable dashboards, interactive reports, embedded visuals, etc. while maintaining data accuracy, consistency, and security.

  • Python, R, and Scala are programming languages that are very helpful for data scientists since they offer powerful libraries to support data science use cases. The BI tools team provisions CMS Python AWS WorkSpaces with connectivity to IDRC , ACO-OS (DB2), and Informatica server for teams to use. Users can utilize these virtual desktops to run Python scripts and big data computations without storing data on their local machine (see comparisons).

  • SAS Enterprise Business Intelligence (EBI) combines the strengths of SAS Analytics and SAS Data Management and provides business users with a powerful tool for making better decisions. It also boosts data consistency and streamlines the administration. SAS offers a unified platform for teams to prepare the data, analyze it visually, and build, operationalize, and manage data science, and AI/ML models in an augmented design experience. SAS is a statistical software suite for data management, advanced analytics, multivariate analysis, business intelligence, criminal investigation, and predictive analytics. CPI and HHS OIG use SAS for accessing IDRC and IDRC (Snowflake) to perform analytical reports to identify Fraud, Waste, and Abuse for CMS.

  • Tableau provides all types of users with intuitive business intelligence (BI) tools to enhance data discovery and understanding. With simple drag-and-drop features, a user can easily access and analyze key data, create innovative reports and visualizations, and share critical insights across the agency. Tableau is currently used by a lot of centers including CMCS/DSG/DIS, CMMI, OA, OIT/ISPG, OIT/IUSG, and OSPR.

Business Intelligence Integrations and Platforms

These services provide integration with data sources for business intelligence and data analytics tools.

  • Databricks is a unified set of tools for building, deploying, sharing, and maintaining enterprise-grade data solutions at scale. The Databricks Lakehouse Platform integrates with cloud storage and security in CMS cloud and manages and deploys cloud infrastructure on the project’s behalf. Databricks is currently used by CMCS for its DataConnect. It is also available as Software as a Service (SaaS).

  • Snowflake is a cloud-native data warehouse platform that enables data storage, processing, and analytic solutions that are faster, easier to use, and massively scalable. Snowflake is currently used by OIT/EADG for the IDR Cloud program as well as EDM.

  • Enterprise Data Mesh (EDM) LaunchPad is an AWS-based Platform as a Service (PaaS) solution designed to facilitate the rapid setup of proof of concepts (POCs) and pilot projects. It offers flexibility in terms of supported architectures and leverages various AWS services to provide a comprehensive platform for application development and testing. Also, it is a pre-ATO’ed production environment for consumers who do not have a Cloud presence and want to explore cloud capabilities. The EDM Launchpad is designed to accelerate analytics and visualization endeavors for the CMS data community. CMS team can leverage AWS services such as Amazon Redshift for high-performance data warehousing, Amazon QuickSight for intuitive and interactive data visualization, and Amazon S3 for scalable object storage. Data integration and ETL workloads can be facilitated by AWS Glue and AWS Databrew, ensuring seamless data preparation and transformation. The environment also incorporates IAM policies to enforce secure access controls and compliance with CMS data governance policies. The Launchpad’s architecture can handle large volumes of data and support complex analytical workflows, ensuring scalability and performance. EDL Launchpad consumers can process large datasets to analyze and visualize the data using AI/ML models with the Python programming language.

  • EDM Workspace is a service offering provided to individual data analysts and scientists within CMS with a versatile and tailored virtual desktop environment equipped with the necessary tools for data analysis, wrangling, and visualization. The technology stack and architectural choices can be customized to ensure a secure, efficient, and personalized analytical workspace for each user. The Workspace offering is designed as an all-inclusive AWS Windows or Linux workspace tailored to cater to the unique preferences and requirements of individual users. The architecture features AWS WorkSpaces, providing a secure, cloud-based virtual desktop environment.

    • These virtual desktops are integrated with Active Directory and Okta SSO for seamless authentication and authorization with CMS EUA and IDM user accounts.
    • The CMS EDM team ensures that connectivity is established to CMS Business Intelligence tools, as needed, to ensure data scientists and data analysts can generate reports utilizing the metadata and data accessed through the EDM.

Background

CMS collects and maintains a vast amount of healthcare data. With the successful implementation of the CMS Integrated Data Strategy, the strategic, tactical, and operational importance of this data is now solidified as a critical enterprise asset. The CMS IDR Cloud has become the primary, authoritative enterprise data asset through the consolidation and modernization of various CMS data warehouses, data marts, and applications.

CMS has implemented an enterprise-wide BI Environment that provides the front-end query, analytics, and reporting solutions that make data more accessible. These solutions also empower users to make important decisions more efficiently and with greater confidence. The CMS IDR Cloud and BI Environments are major components of the CMS BI Reference Architecture, providing shared access to consolidated, reliable data and information across the CMS enterprise as well as with other government agencies and external business partners. The CMS BI Reference Architecture provides more detail.

Objectives

The CMS user base needs access to an integrated, layered CMS BI Reference Architecture that meets the following objectives:

  • Integrates authoritative data from diverse internal and external sources into a primary repository for access and sharing by common user communities

  • Adheres to the current CMS Minimum Security Requirements (CMSR) at the Moderate security level, as published in the CMS ARS , and requirements of the privacy and data use agreement (DUA) processes

  • Ensures that source data collected in the common CMS data repositories meets Agency standards for data quality and consistency

  • Develops self-service methods for easy, quick, and secure access to BI data by providing users with tools for reporting and analyzing data, whether that data resides within the IDR Cloud or in applications and data stores outside the IDR Cloud

  • Enables use of metadata and common enterprise-wide semantics to fully employ the self-service model, giving users access to information about the data they use

  • Provides an environment with the capacity to analyze integrated authoritative data in a secure CMS BI Environment

CMS BI Production Environment

The current CMS BI Production Environment consists of a multi-zone architecture that aligns with the CMS TRA. The CMS BI Production Environment supports the Presentation, Application, and Data Zones and the specific access and protection requirements for meeting the needs of users in each zone. CMS Business Intelligence Production Environment (As-Is) illustrates the BI tools and processes that form the framework of the CMS BI Production Environment.

CMS Business Intelligence Production Environment (As-IS)

Figure shows the current three zones of the CMS BI Production Environment: Presentation, Application, and Data. BI users interface with the Presentation Zone via MicroStrategy, Cognos, SAS BI, and Business Objects Web clients. The Presentation Zone includes a firewall and a network switch connecting to MicroStrategy, Cognos. SAS BI, and Business Object portals. The Application Zone includes a firewall and a network switch connecting to MicroStrategy, Cognos, SAS Enterprise BI application servers, and the Business Objects Web Portal. Underlying these is the SAS Metadata. The Data Zone includes a firewall and a network switch, the two categories of Business Objects Application Server and Informatica PowerCenter for ETL, and the broad category Oracle, which contains four areas of metadata: MicroStrategy, Cognos, Business Objects, and Informatica. Next to Oracle is another three-part category: Teradata, IBM DB2, and Oracle data repositories. The current CMS BI Production Environment is compliant with the CMS TRA Multi-Zone Architecture.
CMS Business Intelligence Production Environment (As-Is)

The BI components are implemented in the CMS BI Production Environment in accordance with the CMS TRA Multi-Zone Architecture.

Non-Compliance with CMS TRA Multi-Zone Architecture

Although the BI tools implemented at CMS are enterprise-class BI software suites, the product manufacturers did not develop them to meet the unique demands and design requirements of the CMS infrastructure. None of the software suites, as implemented at CMS, are fully compliant with one aspect of the CMS TRA—i.e., data transport between the Application Zone and the Data Zone.

The CMS TRA requires that, “When an Application Zone application requires information or actions from the Data Zone, the application makes the request and processes the response using a ‘redundant messaging facility’.”

Instead of messaging, the BI software in use at CMS separates business processing and data processing by using ODBC or Java database connectivity (JDBC) as standard database access methods.

Only the manufacturers of the CMS BI software can alter this aspect of their products. The current versions of the implemented CMS BI tools have no accepted methods for mitigating this conflict with the CMS TRA Multi-Zone Architecture.

CMS Enterprise Portal Framework

In the current CMS BI Production Environment, each BI tool supports its own portal enabling application users to access the BI contents—the metadata, database data, queries, and reports—created for each specific BI application. Because information in the CMS BI production environment is distributed across many BI applications, information sharing and collaboration among CMS BI applications is quite limited.

CMS has implemented an enterprise-wide secure gateway, known as the CMS Enterprise Portal, that enables all BI software tools to access, manipulate, analyze, and share data. This portal provides a one-stop site where users can access and analyze data, helping them make important decisions more efficiently and with greater confidence. The BI Portal Framework leverages CMS’s Portal platform and architecture, providing aggregated reports and visualizations of data retrieved from various BI tools interacting with the CMS Enterprise Data Warehouse, which comprises any CMS data warehouse or data mart built for decision support.

CMS Enterprise Portal Framework illustrates the CMS Enterprise Portal Framework.

Figure depicts the enterprise framework as described in the text. The presentation zone, application zone and data zone are drawn out to show the relationships between them.
CMS Enterprise Portal Framework

The CMS Enterprise Portal has the following additional features:

  • Aligns with the CMS Enterprise Portal Strategy
  • Integrates with the CMS Access Manager and CMS Enterprise Lightweight Directory Access Protocol
  • Migrates the existing BI Portal from Solaris 10 to Z/Linux Operating System
  • Implements single sign-on

The initial release of the CMS Enterprise Portal supports:

  1. Parts A, B, D, and Enrollment Dashboards
  2. Part B Analytics Reports
  3. National Level Repository (NLR), Health Information Technology for Economic and Clinical Health Act (HITECH) Electronic Health Record incentive program]
  4. Affordable Care Act (ACA) Program Dashboard
  5. MicroStrategy Web
  6. Teradata ViewPoint

PREFERRED

Enterprise and other Data Repository Frameworks

These provide access to source data repositories housed in the cloud, and have been extensively integrated with enterprise BI tools. Depending on use case, they can be used as appropriate, or their designs can be reused for similar capabilities.

  • Enterprise Data Mesh (EDM) is a type of data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design. The EDM creates a layer of connectivity that abstracts away the complexities of connecting, managing, and supporting access to data. At its core, it is used to stitch together data held across multiple data silos. The premise of a data mesh is that it is used to connect distributed data across different locations and organizations. A data mesh ensures that data is available, easily discoverable, secure, and interoperable with the applications that need access to it.

  • The Integrated Data Repository Cloud (IDRC) is a high-volume data warehouse integrating Medicare claims—Parts A, B, C, D, and Durable Medical Equipment (DME) with beneficiary and provider data sources, as well as such ancillary data as contract information and risk scores. This robust, integrated data supports much needed analytics across CMS, as well as external agencies such as DOJ, FBI, OIG, etc.

    It consists of a data lake, ETL components, and a Snowflake data warehouse. Access to the IDR Cloud includes SAS EBI and Snowflake user interfaces, as well as the Data-as-a-Service (DaaS) API, or through EDL Snowflake.

  • CMCS DataConnect is an all-in-one analytics platform for the Center for Medicaid & CHIP Services (CMCS). DataConnect is built on Databricks and AWS QuickSight dashboards. It provides read-only access to an expanding set of enterprise datasets, integrated with tools. Existing data insights include Dashboards and Data Spotlights.

  • CMS Master Data Management (MDM) is a CMS enterprise shared service application with a focus on eliminating redundancy, inconsistency and fragmentation of CMS data and increasing efficiencies. MDM provides a single point of access to a singular, synchronized, comprehensive, and ID-resolved authoritative source of Beneficiary, Provider, Organization, Program and Relationship data for use within CMS and by external organizations and agencies.

Business Intelligence Tools Comparisons

Below are comparisons of analytics and BI tools in use at CMS

CMS Enterprise BI Tools Comparison

Criteria

ArcGIS

BusinessObjects

Cognos

MicroStrategy

SAS-EBI

Tableau

Vendor

Esri

SAP

IBM

MicroStrategy

SAS

Salesforce

Use Cases Best Suited for

Geospatial mapping and analysis

Reporting, visualization, seamless integration and native connectivity within the SAP ecosystem

Reporting, data exploration, modeling, analytics, dashboards, scorecards, and event monitoring and metrics

Interactive dashboards, scorecards, formatted reports, thresholds and alerts, automated report distribution

Statistical software suite for data management, analytics, multivariate analysis, predictive analytics

Intuitive visual analytics for all user types, data discovery, predictive analytics

CMS User Base

Developers and users of maps and dashboards

Developers, data analysts, and report/dashboard consumers

Developers and report/dashboard consumers

Developers and report/dashboard consumers

Data analysts

Developers and report/dashboard consumers

CMS User Capacity

  • 150 Creator users
  • Unlimited Viewers
  • 2 ArcGIS Pro licenses in Amazon Workspaces
  • 5 ArcGIS Desktop Pro (CPI users)
  • Approximately 1,400 users
  • Concurrent user capacity 200
  • Approximately 5,000 users
  • Concurrent user capacity 100
  • Unlimited user’s licenses allocations
  • Approximately 3,500 users
  • Concurrent user capacity 100
  • User’s license allocation limited
  • Approximately 1,200 users (Web and Citrix Microsoft Client)
  • Concurrent user capacity 200
  • SAS licenses are limited
  • Approximately 350 users
  • Concurrent Users Capacity 30
  • User’s Licenses Allocation:
    • 105 Creators
    • 18 Explorers
    • 545 Viewers
Web-based Tools used at CMS
  • ArcGIS Enterprise
  • ArcGIS Dashboards
  • ArcGIS StoryMaps
  • ArcGIS Web AppBuilder
  • SAP Business Objects
  • Web Intelligence Reporting Tool
  • SAP Lumira Dashboard Tool
  • Cognos Report Authoring
  • Cognos Query Studio
  • Cognos Dashboards
  • MicroStrategy Web
  • MicroStrategy Dashboard
  • MicroStrategy Document
  • SAS Information Delivery Portal
  • SAS Web Report Studio
  • SAS Studio

Web and Desktop tool:

  • Tableau Desktop
  • Tableau Server
  • Tableau Prep Builder
  • Tableau Services Manager
CMS Integration CMS Enterprise IDM CMS Enterprise Portal CMS Enterprise Portal CMS Enterprise Portal CMS Enterprise LDAP CMS Enterprise LDAP and Okta MFA
Developer Access
  • ArcGIS Enterprise
  • ArcGIS Pro
  • ArcGIS for Adobe Creative Cloud
  • StreetMap Premium
  • AWS Workspaces for desktop tools
Windows based client tools on Citrix to create data structures (Universes), and Dashboards (Lumira) AWS workspaces with Okta MFA MicroStrategy Developer client tool

Access via Citrix:

  • Enterprise Guide
  • Enterprise Miner
  • Information Map Studio
  • OLAP Cube Studio
  • SAS Add-in for MS Office
AWS workspaces with Okta MFA
Some CMS Applications & Data Sources Provider and Beneficiary Geocode data located in IDR Cloud for CPI’s workflows uses IDRC - Teradata
  • Medical Appeal System (Oracle)
  • System for Tracking Audit & Reimbursement (Oracle)
  • Unified Case Management (DB2 & Postgres SQL)
  • Eligibility Appeals Case Management Solution (Athena)

MDX (Oracle) Part B Analytical Reporting/IDR (Teradata)

  • IDRC (Snowflake)
  • TMSIS (AWS Redshift and Databricks)
  • MSIS (DB2)
  • MQM (SQL Server)
  • HCRIP (Oracle)
  • OA/EW_Dashboard (Excel and CSV data)
  • OA/CFM (Amazon Redshift)
  • ISPG/CEDE (MS SQL Server)
  • HCDR (MS SQL Server)
  • DataConnect (Databricks)

Pricing Model

named user

per user

per user

per user

per user

per user

Cost

Enterprise License Agreement funded by CPIs

Funded by CPI, limitations on how many users outside of CPI can use it

Funded by OIT/EADG, no cost to the project

Funded by OIT/EADG, no cost to the project

Funded by OIT/EADG, licenses are limited and assigned to existing users

Funded by OIT (ICPG & IUSG), licenses are limited

Comparison of AWS Services

Criteria

Athena

Redshift Spectrum

QuickSight

Redshift

EMR

What Is It?

Amazon Athena is an interactive query service that makes it easy to analyze unstructured, semi-structured, and structured S3 data directly with SQL

Amazon Redshift Spectrum allows efficient querying and retrieval of structured and semi-structured S3 data without having to load the data into Amazon Redshift tables

Amazon QuickSight is a fast, easy-to-use, business analytics service in the cloud

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. Uses SQL to analyze structured and semi-structured data using AWS-designed hardware and machine learning (ML)

Amazon Elastic MapReduce (EMR) is a cloud big-data platform to run distributed big data processing frameworks such as Spark, Hadoop, Presto, or Hbase

Use Cases Best Suited for

Ad-hoc interactive SQL queries against data on Amazon S3

Employ massive parallelism to run very fast against large datasets in S3

Interactive visualizations and dashboards, ad-hoc analysis and embedded analytics via APIs & SDKs

A data warehouse solution that pulls data from many different sources / systems into a common format

Custom big data processing, analytics, ML

Setup & Management

Serverless: no setup or server management

Serverless: no setup or server management

Serverless: no setup or server management

Fully managed service along with the capability to manage clusters/ nodes. A serverless option is also available.

Greater effort to setup & manage clusters and the software installed on them

Performance

Well suited for small/medium datasets. Not suited for large datasets

Well suited for medium to large datasets – multiple clusters can concurrently query the same dataset in Amazon S3 without the need to replicate the data for each cluster

Automatically scale to tens of thousands of users

Well suited for large datasets. Resources are automatically provisioned and scaled based on need

Full control & flexibility over the configuration of clusters & performance

Pricing Model

Pay only for what you use – price per query

Pay only for what you use – price per query

Pay only for what you use – price per session

Pay for the provisioned capacity by the hour as long as the cluster is running

Amazon EMR cost is added to the EC2 cost (the underlying servers) and Elastic Block Store (EBS) usage

Cost

Low to Medium

Medium

Medium

Medium to High

Medium to High

Comparison of Python, R, and Scala

Criteria

Python

R

Scala

What Is It?

Python is a simple, open source, general-purpose language and is easy to learn. Many data analysis, manipulation, ML, and deep learning libraries are written in Python, and hence it has gained popularity in the big data ecosystem and is one of the de facto languages of Data Science.

R is a language and environment for statistical computing and graphics and hence rightly called the language of statisticians. R Studio can be used for research, statistics, plotting, and data analytics applications. R is also used for building data models to be used for data analysis.

Scala is a hybrid functional programming language since it has both object-oriented and functional programming features. Scala is a machine-compiled language that runs in a Java Virtual Machine (JVM). Scala is highly scalable and is the native language of Apache Spark.

Learning Curve

Easiest

Moderate

Steep learning curve

Used by

Beginners & Data Engineers

Data Scientists/ Statisticians

Big Data Programmers

Use Cases Best Suited for

Data Engineering, ML, Data Visualization

Data Analysis, Data Visualization, Statistics

Spark Native

Type of Language

General Purpose

Specifically for Data Scientists. Needs conversion into Scala/Python before productizing

Object-Oriented & Functional General Purpose

Concurrency

Does not Support Concurrency

NA

Supports Concurrency

Type Safety

Dynamically Typed

Dynamically Typed

Statically typed (except for Spark 2.0 Data frames)

Interpreted Language (Read-Evaluate-PrintLoop (REPL))

Yes

Yes

No

Mature ML Libraries

Excellent

Excellent

Limited

Visualization Libraries

Excellent

Excellent

Limited

Web Notebooks Support

Jupyter Notebook Support

R Notebook

Apache Zeppelin Notebook Support

Performance

Slower

Slower

Faster (about 10x faster than Python)

Cost

Free, open source

Free, open source

Free, open source