Skip to main content

Business Intelligence Environment

CMS has successfully implemented the Agency’s enterprise-wide BI Environment that provides a consolidated, secure gateway to the wealth of CMS data where users can leverage a suite of standard COTS BI tools. These BI software tools are made available for use by CMS offices and centers, analysts, developers, project managers, external researchers, law enforcement, partners, government contractors, and the public.

PREFERRED

While CMS supports all of these tools, CMS prefers that project teams consider these factors:

  • Use case
  • Cost at scale
  • Ability to reuse developed capabilities for the given data source(s)

These tools have been integrated with various data repositories and frameworks. Some are for use by the enterprise and external entities, while others are specific to particular Centers or projects. The latter are called out because their capabilities can be replicated for other use cases.

CMS staff are directed to transition from SAS software to CMS CIO-approved alternative solutions by December of 2026.

CMS has also identified Microsoft Power BI as a preferred tool within the CMS Microsoft 365 SaaS tenant. It is also available in AWS and Azure.

The analytic and BI tools implemented and supported at CMS include the following COTS products (comparisons below):

  • Esri ArcGIS Enterprise provides an AWS Cloud based enterprise solution that enables geospatial mapping and analysis. ArcGIS is used by the Center for Program Integrity (CPI) to enable geographic mapping and analyses to help meet its business objective of finding and reducing fraud, waste, and abuse in Medicare and Medicaid. It uses geocoded data from the Integrated Data Repository Cloud (IDRC) Provider and Beneficiary data.

  • AWS Services – AWS provides Query services like Amazon Athena and Redshift Spectrum, data visualization tools like Amazon QuickSight, data warehouses like Amazon Redshift, and sophisticated data processing frameworks like Amazon EMR (Elastic MapReduce). Each of these services addresses different needs and use cases, and are used by multiple centers at CMS. The table below compares them and provides guidance to help project teams choose one or more services based on their requirements.

  • SAP Business Objects (SAP Business Intelligence Platform) supports the Center for Program Integrity (CPI), for example, using the CMS Integrated Data Repository Cloud (IDRC) and provides ad hoc and OLAP reporting capabilities.

  • IBM Cognos Analytics supports the Part D and Drug Data Processing System (DDPS), Medical Appeal System (MAS), Unified Case Management (UCM), and Eligibility Appeals Case Management Solution (EACMS) (among others) using the Teradata data repository and provides ad hoc and online analytical processing (OLAP) reporting capabilities.

  • Informatica PowerCenter and Data Engineering Integration (DEI) are data integration tools used to integrate data warehouses, data marts, and operational data stores (ODS) with the capability to extract, transform, and load (ETL) data from virtually any business data source in CMS. PowerCenter is used for ETL with DB2, Teradata, Oracle databases, and flat files.

  • MicroStrategy MicroStrategy is a data analytics and business intelligence reporting tool that can generate interactive dashboards, high-end formatting in reports, scorecards, and several other features relating to the generation, sorting, and automated distribution of reports that provide insights into past, present, and future business trends. It supports Part B Analytics Reports (PBAR), for example, using the Teradata data repository and provides ad hoc and OLAP reporting capabilities.

  • Microsoft Power BI is a unified cloud-based business intelligence and analytics service that provides a full overview of the application's most critical data. By connecting to all the application data sources, Power BI simplifies data evaluation and sharing with scalable dashboards, interactive reports, embedded visuals, etc. while maintaining data accuracy, consistency, and security. It integrates with Microsoft 365, Azure, and AWS using data connectors. All users of CMS E-mail have a Microsoft 365 subscription that includes Power BI Pro. More information about Power BI is available at the CMSConnect / App Hub

  • Python, R, and Scala are programming languages that are very helpful for data scientists since they offer powerful libraries to support data science use cases. The BI tools team provisions CMS Python AWS WorkSpaces with connectivity to IDRC , ACO-OS (DB2), and Informatica server for teams to use. Users can utilize these virtual desktops to run Python scripts and big data computations without storing data on their local machine (see comparisons).

  • SAS Enterprise Business Intelligence (EBI) combines the strengths of SAS Analytics and SAS Data Management and provides business users with a powerful tool for making better decisions. It also boosts data consistency and streamlines the administration. SAS offers a unified platform for teams to prepare the data, analyze it visually, and build, operationalize, and manage data science, and AI/ML models in an augmented design experience. SAS is a statistical software suite for data management, advanced analytics, multivariate analysis, business intelligence, criminal investigation, and predictive analytics. CPI and HHS OIG use SAS for accessing IDRC and IDRC (Snowflake) to perform analytical reports to identify Fraud, Waste, and Abuse for CMS.

    Note that CMS staff are directed to transition from SAS software to CMS CIO-approved alternative solutions by December of 2026.

  • Tableau provides all types of users with intuitive business intelligence (BI) tools to enhance data discovery and understanding. With simple drag-and-drop features, a user can easily access and analyze key data, create innovative reports and visualizations, and share critical insights across the agency. Tableau is currently used by a lot of centers including CMCS/DSG/DIS, CMMI, OA, OIT/ISPG, OIT/IUSG, and OSPR.

Business Intelligence Integrations and Platforms

These services provide integration with data sources for business intelligence and data analytics tools.

  • Databricks is a unified set of tools for building, deploying, sharing, and maintaining enterprise-grade data solutions at scale. The Databricks Lakehouse Platform integrates with cloud storage and security in CMS cloud and manages and deploys cloud infrastructure on the project’s behalf. Databricks is currently used by CMCS for its DataConnect. It is also available as Software as a Service (SaaS).

  • Snowflake is a cloud-native data warehouse platform that enables data storage, processing, and analytic solutions that are faster, easier to use, and massively scalable. Snowflake is currently used by OIT/EADG for the IDR Cloud program as well as the IDR EDP.

  • Enterprise Data Mesh (EDM) LaunchPad is an AWS-based Platform as a Service (PaaS) solution designed to facilitate the rapid setup of proof of concepts (POCs) and pilot projects. It offers flexibility in terms of supported architectures and leverages various AWS services to provide a comprehensive platform for application development and testing. Also, it is a pre-ATO’ed production environment for consumers who do not have a Cloud presence and want to explore cloud capabilities. The EDM Launchpad is designed to accelerate analytics and visualization endeavors for the CMS data community. CMS team can leverage AWS services such as Amazon Redshift for high-performance data warehousing, Amazon QuickSight for intuitive and interactive data visualization, and Amazon S3 for scalable object storage. Data integration and ETL workloads can be facilitated by AWS Glue and AWS Databrew, ensuring seamless data preparation and transformation. The environment also incorporates IAM policies to enforce secure access controls and compliance with CMS data governance policies. The Launchpad’s architecture can handle large volumes of data and support complex analytical workflows, ensuring scalability and performance. EDL Launchpad consumers can process large datasets to analyze and visualize the data using AI/ML models with the Python programming language.

  • BI Workspaces are provided to individual data analysts and scientists within CMS with a versatile and tailored virtual desktop environment equipped with the necessary tools for data analysis, wrangling, and visualization. The technology stack and architectural choices can be customized to ensure a secure, efficient, and personalized analytical workspace for each user. The Workspace offering is designed as an all-inclusive AWS Windows or Linux workspace tailored to cater to the unique preferences and requirements of individual users. The architecture features AWS WorkSpaces, providing a secure, cloud-based virtual desktop environment.

    • These virtual desktops are integrated with Active Directory and Okta SSO for seamless authentication and authorization with CMS EUA and IDM user accounts.
    • The CMS EDM team ensures that connectivity is established to CMS Business Intelligence tools, as needed, to ensure data scientists and data analysts can generate reports utilizing the metadata and data accessed through the EDM.

Background

CMS collects and maintains a vast amount of healthcare data. With the successful implementation of the CMS Integrated Data Strategy, the strategic, tactical, and operational importance of this data is now solidified as a critical enterprise asset. The CMS IDR Cloud has become the primary, authoritative enterprise data asset through the consolidation and modernization of various CMS data warehouses, data marts, and applications.

CMS has implemented an enterprise-wide BI Environment that provides the front-end query, analytics, and reporting solutions that make data more accessible. These solutions also empower users to make important decisions more efficiently and with greater confidence. The CMS IDR Cloud and BI Environments are major components of the CMS BI Reference Architecture, providing shared access to consolidated, reliable data and information across the CMS enterprise as well as with other government agencies and external business partners. The CMS BI Reference Architecture provides more detail.

Objectives

The CMS user base needs access to an integrated, layered CMS BI Reference Architecture that meets the following objectives:

  • Integrates authoritative data from diverse internal and external sources into a primary repository for access and sharing by common user communities

  • Adheres to the current CMS Minimum Security Requirements (CMSR) at the Moderate security level, as published in the CMS ARS , and requirements of the privacy and data use agreement (DUA) processes

  • Ensures that source data collected in the common CMS data repositories meets Agency standards for data quality and consistency

  • Develops self-service methods for easy, quick, and secure access to BI data by providing users with tools for reporting and analyzing data, whether that data resides within the IDR Cloud or in applications and data stores outside the IDR Cloud

  • Enables use of metadata and common enterprise-wide semantics to fully employ the self-service model, giving users access to information about the data they use

  • Provides an environment with the capacity to analyze integrated authoritative data in a secure CMS BI Environment

CMS BI Production Environment

The current CMS BI Production Environment consists of a multi-zone architecture that aligns with the CMS TRA. The CMS BI Production Environment supports the Presentation, Application, and Data Zones and the specific access and protection requirements for meeting the needs of users in each zone. CMS Business Intelligence Production Environment (As-Is) illustrates the BI tools and processes that form the framework of the CMS BI Production Environment.

CMS Business Intelligence Production Environment (As-IS)

Figure shows the current three zones of the CMS BI Production Environment: Presentation, Application, and Data. BI users interface with the Presentation Zone via MicroStrategy, Cognos, SAS BI, and Business Objects Web clients. The Presentation Zone includes a firewall and a network switch connecting to MicroStrategy, Cognos. SAS BI, and Business Object portals. The Application Zone includes a firewall and a network switch connecting to MicroStrategy, Cognos, SAS Enterprise BI application servers, and the Business Objects Web Portal. Underlying these is the SAS Metadata. The Data Zone includes a firewall and a network switch, the two categories of Business Objects Application Server and Informatica PowerCenter for ETL, and the broad category Oracle, which contains four areas of metadata: MicroStrategy, Cognos, Business Objects, and Informatica. Next to Oracle is another three-part category: Teradata, IBM DB2, and Oracle data repositories. The current CMS BI Production Environment is compliant with the CMS TRA Multi-Zone Architecture.
CMS Business Intelligence Production Environment (As-Is)

The BI components are implemented in the CMS BI Production Environment in accordance with the CMS TRA Multi-Zone Architecture.

Non-Compliance with CMS TRA Multi-Zone Architecture

Although the BI tools implemented at CMS are enterprise-class BI software suites, the product manufacturers did not develop them to meet the unique demands and design requirements of the CMS infrastructure. None of the software suites, as implemented at CMS, are fully compliant with one aspect of the CMS TRA—i.e., data transport between the Application Zone and the Data Zone.

The CMS TRA requires that, “When an Application Zone application requires information or actions from the Data Zone, the application makes the request and processes the response using a ‘redundant messaging facility’.”

Instead of messaging, the BI software in use at CMS separates business processing and data processing by using ODBC or Java database connectivity (JDBC) as standard database access methods.

Only the manufacturers of the CMS BI software can alter this aspect of their products. The current versions of the implemented CMS BI tools have no accepted methods for mitigating this conflict with the CMS TRA Multi-Zone Architecture.

CMS Enterprise Portal Framework

In the current CMS BI Production Environment, each BI tool supports its own portal enabling application users to access the BI contents—the metadata, database data, queries, and reports—created for each specific BI application. Because information in the CMS BI production environment is distributed across many BI applications, information sharing and collaboration among CMS BI applications is quite limited.

CMS has implemented an enterprise-wide secure gateway, known as the CMS Enterprise Portal, that enables all BI software tools to access, manipulate, analyze, and share data. This portal provides a one-stop site where users can access and analyze data, helping them make important decisions more efficiently and with greater confidence. The BI Portal Framework leverages CMS’s Portal platform and architecture, providing aggregated reports and visualizations of data retrieved from various BI tools interacting with the CMS Enterprise Data Warehouse, which comprises any CMS data warehouse or data mart built for decision support.

CMS Enterprise Portal Framework illustrates the CMS Enterprise Portal Framework.

Figure depicts the enterprise framework as described in the text. The presentation zone, application zone and data zone are drawn out to show the relationships between them.
CMS Enterprise Portal Framework

The CMS Enterprise Portal has the following additional features:

  • Aligns with the CMS Enterprise Portal Strategy
  • Integrates with the CMS Access Manager and CMS Enterprise Lightweight Directory Access Protocol
  • Migrates the existing BI Portal from Solaris 10 to Z/Linux Operating System
  • Implements single sign-on

The initial release of the CMS Enterprise Portal supports:

  1. Parts A, B, D, and Enrollment Dashboards
  2. Part B Analytics Reports
  3. National Level Repository (NLR), Health Information Technology for Economic and Clinical Health Act (HITECH) Electronic Health Record incentive program]
  4. Affordable Care Act (ACA) Program Dashboard
  5. MicroStrategy Web
  6. Teradata ViewPoint

PREFERRED

Enterprise and other Data Repository Frameworks

These provide access to source data repositories housed in the cloud, and have been extensively integrated with enterprise BI tools. Depending on use case, they can be used as appropriate, or their designs can be reused for similar capabilities.

  • The Integrated Data Repository Cloud (IDRC) is a high-volume data warehouse integrating Medicare claims—Parts A, B, C, D, and Durable Medical Equipment (DME) with beneficiary and provider data sources, as well as such ancillary data as contract information and risk scores. This robust, integrated data supports much needed analytics across CMS, as well as external agencies such as DOJ, FBI, OIG, etc.

    It consists of a data lake, ETL components, and a Snowflake data warehouse. Access to the IDR Cloud includes SAS EBI and Snowflake user interfaces, as well as the Data-as-a-Service (DaaS) API.

  • IDR Enterprise Data Product (EDP) is a new type of data object that uses the Snowflake External Table feature. The EDP incorporates the content of the former Enterprise Data Mesh Hive Metadata.

  • CMCS DataConnect is an all-in-one analytics platform for the Center for Medicaid & CHIP Services (CMCS). DataConnect is built on Databricks and AWS QuickSight dashboards. It provides read-only access to an expanding set of enterprise datasets, integrated with tools. Existing data insights include Dashboards and Data Spotlights.

  • CMS Master Data Management (MDM) is a CMS enterprise shared service application with a focus on eliminating redundancy, inconsistency and fragmentation of CMS data and increasing efficiencies. MDM provides a single point of access to a singular, synchronized, comprehensive, and ID-resolved authoritative source of Beneficiary, Provider, Organization, Program and Relationship data for use within CMS and by external organizations and agencies.

Business Intelligence Tools Comparisons

Below are comparisons of analytics and BI tools in use at CMS

CMS Enterprise BI Tools Comparison
Criteria ArcGIS BusinessObjects Cognos MicroStrategy Power BI SAS-EBI Tableau
Vendor Esri SAP IBM MicroStrategy Microsoft SAS Salesforce
Use Cases Best Suited for Geospatial mapping and analysis Reporting, visualization, seamless integration and native connectivity within the SAP ecosystem Reporting, data exploration, modeling, analytics, dashboards, scorecards, and event monitoring and metrics Interactive dashboards, scorecards, formatted reports, thresholds and alerts, automated report distribution Unified cloud-based business intelligence and analytics service Statistical software suite for data management, analytics, multivariate analysis, predictive analytics Intuitive visual analytics for all user types, data discovery, predictive analytics
CMS User Base Developers and users of maps and dashboards Developers, data analysts, and report & dashboard consumers Developers, data analysts, and report & dashboard consumers Developers and report & dashboard consumers Developers, data analysts, and report & dashboard consumers Data analysts Developers and report & dashboard consumers
CMS User Capacity
  • 150 Creator users
  • Unlimited Viewers
  • 2 ArcGIS Pro licenses in Amazon Workspaces
  • 5 ArcGIS Desktop Pro (CPI users)
  • Approximately 1,400 users
  • Concurrent user capacity 200
  • Approximately 5,000 users
  • Concurrent user capacity 100
  • Unlimited user license allocations
  • Approximately 3,500 users
  • Concurrent user capacity 100
  • User license allocation limited
  • All Microsoft 365 G5 subscriptions
  • No concurrent user limit
  • Approximately 1,200 users (Web and Citrix Microsoft Client)
  • Concurrent user capacity 200
  • SAS licenses are limited
  • Approximately 350 users
  • Concurrent Users Capacity 30
  • User’s Licenses Allocation:
    • 105 Creators
    • 18 Explorers
    • 545 Viewers
Web-based Tools used at CMS
  • ArcGIS Enterprise
  • ArcGIS Dashboards
  • ArcGIS StoryMaps
  • ArcGIS Web AppBuilder
  • SAP Business Objects
  • Web Intelligence Reporting Tool
  • SAP Lumira Dashboard Tool
  • Cognos Report Authoring
  • Cognos Query Studio
  • Cognos Dashboards
  • MicroStrategy Web
  • MicroStrategy Dashboard
  • MicroStrategy Document
Power BI Pro (SaaS)
  • SAS Information Delivery Portal
  • SAS Web Report Studio
  • SAS Studio
Web and Desktop tools
  • Tableau Desktop
  • Tableau Server
  • Tableau Prep Builder
  • Tableau Services Manager
CMS Integration CMS Enterprise IDM CMS Enterprise Portal CMS Enterprise Portal CMS Enterprise Portal Microsoft 365 Tenant CMS Enterprise LDAP CMS Enterprise LDAP and Okta MFA
Developer Access
  • ArcGIS Enterprise
  • ArcGIS Pro
  • ArcGIS for Adobe Creative Cloud
  • StreetMap Premium
  • AWS Workspaces for desktop tools
Windows based client tools on Citrix to create data structures (Universes), and Dashboards (Lumira) AWS workspaces with Okta MFA MicroStrategy Developer client tool Power BI Pro Access via Citrix:
  • Enterprise Guide
  • Enterprise Miner
  • Information Map Studio
  • OLAP Cube Studio
  • SAS Add-in for MS Office
AWS workspaces with Okta MFA
Some CMS Applications & Data Sources Provider and Beneficiary Geocode data located in IDR Cloud for CPI workflows uses IDRC Cloud
  • Medical Appeal System (Oracle)
  • System for Tracking Audit & Reimbursement (Oracle)
  • Unified Case Management (DB2 & Postgres SQL)
  • Eligibility Appeals Case Management Solution (Athena)
MDX (Oracle) Part B Analytical Reporting/IDR Cloud All sources available to Data Connectors
  • IDRC (Snowflake)
  • TMSIS (AWS Redshift and Databricks)
  • MSIS (DB2)
  • MQM (SQL Server)
  • HCRIP (Oracle)
  • OA/EW_Dashboard (Excel and CSV data)
  • OA/CFM (Amazon Redshift)
  • ISPG/CEDE (MS SQL Server)
  • HCDR (MS SQL Server)
  • DataConnect (Databricks)
Pricing Model named user per user per user per user per user or per workspace per user per user
Cost Enterprise License Agreement funded by CPIs Funded by CPI, limitations on how many users outside of CPI can use it Funded by OIT/EADG, no cost to the project Funded by OIT/EADG, no cost to the project Included in Microsoft 365 subscription Funded by OIT/EADG, licenses are limited and assigned to existing users Funded by OIT (ICPG & IUSG), licenses are limited
Comparison of AWS Services
Criteria Athena Redshift Spectrum QuickSight Redshift EMR
What Is It? Amazon Athena is an interactive query service that makes it easy to analyze unstructured, semi-structured, and structured S3 data directly with SQL Amazon Redshift Spectrum allows efficient querying and retrieval of structured and semi-structured S3 data without having to load the data into Amazon Redshift tables Amazon QuickSight is a fast, easy-to-use, business analytics service in the cloud Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. Uses SQL to analyze structured and semi-structured data using AWS-designed hardware and machine learning (ML) Amazon Elastic MapReduce (EMR) is a cloud big-data platform to run distributed big data processing frameworks such as Spark, Hadoop, Presto, or Hbase
Use Cases Best Suited for Ad-hoc interactive SQL queries against data on Amazon S3 Employ massive parallelism to run very fast against large datasets in S3 Interactive visualizations and dashboards, ad-hoc analysis and embedded analytics via APIs & SDKs A data warehouse solution that pulls data from many different sources / systems into a common format Custom big data processing, analytics, ML
Setup & Management >Serverless: no setup or server management Serverless: no setup or server management Serverless: no setup or server management Fully managed service along with the capability to manage clusters/ nodes. A serverless option is also available. Greater effort to setup & manage clusters and the software installed on them
Performance Well suited for small/medium datasets. Not suited for large datasets Well suited for medium to large datasets – multiple clusters can concurrently query the same dataset in Amazon S3 without the need to replicate the data for each cluster Automatically scale to tens of thousands of users Well suited for large datasets. Resources are automatically provisioned and scaled based on need Full control & flexibility over the configuration of clusters & performance
Pricing Model Pay only for what you use – price per query Pay only for what you use – price per query Pay only for what you use – price per session Pay for the provisioned capacity by the hour as long as the cluster is running Amazon EMR cost is added to the EC2 cost (the underlying servers) and Elastic Block Store (EBS) usage
Cost Low to Medium Medium Medium Medium to High Medium to High
Comparison of Python, R, and Scala
Criteria Python R Scala
What Is It? Python is a simple, open source, general-purpose language and is easy to learn. Many data analysis, manipulation, ML, and deep learning libraries are written in Python, and hence it has gained popularity in the big data ecosystem and is one of the de facto languages of Data Science. R is a language and environment for statistical computing and graphics and hence rightly called the language of statisticians. R Studio can be used for research, statistics, plotting, and data analytics applications. R is also used for building data models to be used for data analysis. Scala is a hybrid functional programming language since it has both object-oriented and functional programming features. Scala is a machine-compiled language that runs in a Java Virtual Machine (JVM). Scala is highly scalable and is the native language of Apache Spark.
Learning Curve Easiest Moderate Steep learning curve
Used by Beginners & Data Engineers Data Scientists/ Statisticians Big Data Programmers
Use Cases Best Suited for Data Engineering, ML, Data Visualization Data Analysis, Data Visualization, Statistics Spark Native
Type of Language General Purpose Specifically for Data Scientists. Needs conversion into Scala/Python before productizing Object-Oriented & Functional General Purpose
Concurrency Does not Support Concurrency N/A Supports Concurrency
Type Safety Dynamically Typed Dynamically Typed Statically typed (except for Spark 2.0 Data frames)
Interpreted Language — Read-Evaluate-PrintLoop (REPL) Yes Yes No
Mature ML Libraries Excellent Excellent Limited
Visualization Libraries Excellent Excellent Limited
Web Notebooks Support Jupyter Notebook Support R Notebook Apache Zeppelin Notebook Support
Performance Slower Slower Faster (about 10x faster than Python)
Cost Free, open source Free, open source Free, open source

TRA 2025 Release 1General Distribution / Unclassified Information