Business Intelligence Implementation Considerations

This topic provides key considerations in implementing new CMS BI applications. These considerations represent guidelines and best practices in planning, implementing, and deploying a BI environment that maximizes business benefits.

Integrate Operational Data Sources

The ETL process is essential in integrating CMS operational data sources into the BI Environment for analytics. ETL extracts and transforms data based on business rules and data warehousing best practices and loads that data into the target data repository(ies).
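The extract, transform, and load steps described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not a prescribed CMS implementation. The `claims` table, the column names, the sample rows, and the rule that rejects rows without a state code are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real job would read from an operational system.
SOURCE_CSV = """claim_id,state,amount
1001, md ,125.50
1002,VA,80
1003,,42.10
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply illustrative business rules (trim, upper-case, convert types)."""
    records = []
    for row in rows:
        state = row["state"].strip().upper()
        if not state:
            continue  # assumed rule: reject rows that have no state code
        records.append((int(row["claim_id"]), state, float(row["amount"])))
    return records


def load(conn, records):
    """Load: write the transformed records into the target repository table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS claims "
        "(claim_id INTEGER PRIMARY KEY, state TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO claims VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run the three steps in order against the given target connection."""
    load(conn, transform(extract(text)))


conn = sqlite3.connect(":memory:")  # stand-in for the target data repository
run_etl(conn)
rows = conn.execute(
    "SELECT claim_id, state, amount FROM claims ORDER BY claim_id"
).fetchall()
```

Keeping the three steps as separate functions lets the business rules in `transform` be tested without a source system or a target database.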

ETL programs will be created by one of two methods: writing custom ETL programs in a programming language or using a COTS product such as Informatica PowerCenter. This decision will be based upon the project scope, data volume, size, and cost. Note that Informatica PowerCenter allows custom code to be embedded in its workflows.

Data Integrity / Quality

Data in the CMS data repositories for use in the BI Environment must be clean, consistent, and complete. Data integrity / quality checks will ensure the data’s cleanliness, consistency, and completeness before it is published to the user community. This process involves cleansing the data by correcting misspellings, resolving domain conflicts, addressing missing elements, or parsing into standard formats.
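As an illustration of the cleansing steps named above, the following Python sketch corrects known misspellings, maps conflicting domain codes to one standard value, flags missing elements, and parses dates into a standard format. The field names, the lookup tables, and the accepted date formats are hypothetical; actual rules would come from the business owners of the data.

```python
from datetime import datetime

# Hypothetical reference data for the example only.
SPELLING_FIXES = {"Marylnd": "Maryland", "Virgina": "Virginia"}
GENDER_DOMAIN = {"M": "M", "MALE": "M", "1": "M", "F": "F", "FEMALE": "F", "2": "F"}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(value):
    """Parse a date in any accepted format and return it as ISO 8601, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def cleanse(record):
    """Return (cleaned_record, issues), where issues lists fields needing attention."""
    issues = []

    # Correct known misspellings.
    state = record.get("state", "").strip()
    state = SPELLING_FIXES.get(state, state)

    # Resolve domain conflicts by mapping source codes to one standard code.
    gender = GENDER_DOMAIN.get(record.get("gender", "").strip().upper())
    if gender is None:
        issues.append("gender")

    # Parse into a standard format; a missing or unparseable value is flagged.
    birth_date = parse_date(record.get("birth_date", ""))
    if birth_date is None:
        issues.append("birth_date")

    cleaned = {"state": state, "gender": gender, "birth_date": birth_date}
    return cleaned, issues
```

Returning the list of issues alongside the cleaned record lets a quality check hold back records that are not yet fit to publish.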

An integral part of the data quality process is data profiling, which uses specific tools to monitor and measure the level of data quality expected from a specific data source.
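Data profiling is normally performed with dedicated tools, but the kind of measurement involved can be sketched in a few lines of Python. The example below computes two common profile metrics per column, completeness and distinct-value count. The function name, the choice of metrics, and the treatment of empty strings as missing are assumptions for illustration.

```python
def profile(rows, columns):
    """Compute simple quality metrics for each column of a list of dict rows.

    completeness: fraction of rows with a non-empty value for the column
    distinct:     number of distinct non-empty values observed
    """
    total = len(rows)
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "completeness": len(present) / total if total else 0.0,
            "distinct": len(set(present)),
        }
    return report
```

Comparing these measurements against the level of quality expected from a source gives a repeatable way to decide whether a data load is fit to publish.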

BI User Management

This topic describes the user management measures CMS will consider in implementing any new BI application.

Owners and developers of a new BI application must define user requirements that include the following functions:

  1. Classifying BI application users for role-based access, including some or all of the following user classes:

    1. Standard User – An individual with access to predefined reports or data structures within the authorized BI application (e.g., executives, managers, CMS business partners, and U.S. citizens).

    2. Power User – An individual with standard user access plus the ability to generate ad hoc reports using data within an authorized BI application (e.g., savvy business analysts and BI analysts).

    3. BI Developer – An individual responsible for developing and implementing BI applications (e.g., CMS BI Contractor).

    4. Business Intelligence Analyst – An individual responsible for providing information based on unique and changing needs within each BI application in the BI Environment (e.g., EADG’s BI consultants and developers, business analysts within the Division of Quality Coordination and Data Distribution).

  2. Defining user access roles and implementing a process for authorizing users

  3. Identifying BI contents such as metadata, queries, and reports that specific user groups may access

  4. Identifying the data that user groups may access from the CMS data repositories
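The user classes above lend themselves to a role-to-permission mapping. The Python sketch below shows one way such a mapping could be expressed. The permission names are hypothetical, and the permissions assigned to the BI Developer and Business Intelligence Analyst classes are assumptions, since only the Standard User and Power User classes are described above in terms of report access.

```python
from enum import Enum, auto


class Permission(Enum):
    VIEW_PREDEFINED_REPORTS = auto()
    CREATE_AD_HOC_REPORTS = auto()
    DEVELOP_BI_APPLICATION = auto()


class UserClass(Enum):
    STANDARD_USER = "Standard User"
    POWER_USER = "Power User"
    BI_DEVELOPER = "BI Developer"
    BI_ANALYST = "Business Intelligence Analyst"


# A Power User has standard user access plus ad hoc reporting.
STANDARD = frozenset({Permission.VIEW_PREDEFINED_REPORTS})
POWER = STANDARD | {Permission.CREATE_AD_HOC_REPORTS}

ROLE_PERMISSIONS = {
    UserClass.STANDARD_USER: STANDARD,
    UserClass.POWER_USER: POWER,
    # Assumed mappings for illustration only.
    UserClass.BI_DEVELOPER: POWER | {Permission.DEVELOP_BI_APPLICATION},
    UserClass.BI_ANALYST: POWER,
}


def is_allowed(user_class, permission):
    """Return True if the user class has been granted the permission."""
    return permission in ROLE_PERMISSIONS.get(user_class, frozenset())
```

Because an unknown user class maps to an empty permission set, access is denied by default until a role is explicitly authorized.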

After the BI application security requirements are defined, the following security measures will be applied in the BI Environment.

Authentication

Authentication will be performed by the methods described in the Network Services, Access Control and Identity Management topic. Identity and authentication requirements may vary depending on the sensitivity of the data involved.

Authorization

Access control applies to BI contents in BI applications and to database objects in the CMS data repositories.

The authorization of access to database objects—tables, views, stored procedures, columns, and even rows—will be controlled by the BI application and the RDBMS.
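Column-level and row-level control can be illustrated with a small Python sketch. In practice this enforcement would be configured in the BI application and the RDBMS rather than written by hand. The group names, the policy structure, and the use of a `state` column as the row filter are hypothetical.

```python
# Hypothetical policy: the columns each group may see, and the rows
# (selected by state) each group may see. None means all rows.
POLICIES = {
    "regional_analysts": {
        "columns": ("claim_id", "state", "amount"),
        "states": {"MD", "VA"},
    },
    "public": {
        "columns": ("state", "amount"),
        "states": None,
    },
}


def authorized_view(group, rows):
    """Return only the rows and columns the group is authorized to access."""
    policy = POLICIES.get(group)
    if policy is None:
        raise PermissionError(f"group {group!r} has no access")
    visible = []
    for row in rows:
        # Row-level control: skip rows outside the group's permitted states.
        if policy["states"] is not None and row["state"] not in policy["states"]:
            continue
        # Column-level control: project only the permitted columns.
        visible.append({col: row[col] for col in policy["columns"]})
    return visible
```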

Administration

The Office of Information Technology-designated administrator(s) will manage and control BI application security using the application’s available administrative tools.

For BI tools that provide centralized management of user administration, application, and core infrastructure, the administration tool will be placed on a hardened and secure workstation. The OIT-designated administrator will align users, groups, and roles in EUA / LDAP with the BI objects. For BI tools that provide web-based administration, only authorized users with valid credentials can log in to the central administration console, which may only be accessed from internal CMS networks.

Sizing and Configuration

New BI applications must be properly sized and configured in the BI production environment. EADG will provide guidance in sizing and configuration, working with the owners and developers of each new BI application to determine the optimum hardware configuration based on the BI application requirements. The BI tool vendor will also provide information on best practices when implementing its COTS product.

The following are some (but not all) key factors used to determine sizing and configuration: number of users, complexity of reports, number of ad hoc reports versus cached reports, and the need for data mining or internet access. Analyzing these factors is a starting point in determining the optimal configuration for the business user BI Environment.

Number of Users

The size of the BI user community is a significant factor in determining hardware requirements. The number of users can be categorized as follows:

  • Total users – The number of all user accounts that will be created in the BI Environment.

  • Active users – The number of users who will be logged into the BI system.

  • Concurrent users – The number of users who are likely to have queries and reports processing simultaneously on the BI system.

Of these categories, it is most important to determine the number of concurrent users. The BI system must be able to support the maximum number of concurrent users expected at any given time.
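Where historical query logs are available, the peak number of concurrent users can be estimated from the start and end times of query executions. The following Python sketch uses a simple sweep over sorted start and end events. The interval representation (numeric timestamps with an exclusive end time) is an assumption, and a real estimate would also have to allow for expected growth.

```python
def peak_concurrency(intervals):
    """Return the maximum number of overlapping (start, end) intervals.

    Each interval is a query's processing window; end is treated as exclusive,
    so a query ending at time t does not overlap one starting at time t.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a query begins processing
        events.append((end, -1))    # a query finishes
    # Tuples sort by time, then by delta, so at equal timestamps the -1
    # (finish) event is processed before the +1 (start) event.
    events.sort()

    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

For example, with processing windows (0, 10), (5, 15), (10, 20), and (12, 14), at most three queries are processing at once, between times 12 and 14.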

Report Complexity

BI system resource use and processing time depend on the complexity of reports to be compiled. Some factors contributing to report complexity are:

  • Number of result rows required

  • Complexity of metric calculations needed

  • Degree of analytical processing performed

Ad Hoc Reports versus Cached Reports

Standard reports can be scheduled to execute at non-busy times. When scheduled reports execute, BI tools create a report cache. When these standard reports are subsequently executed, the system accesses a cached report and does not execute a report against the DW or data mart.

Because ad hoc queries are executed on the spur of the moment, they cannot be cached in advance. Therefore, ad hoc reports are executed against the DW / data mart. If users execute a high percentage of ad hoc reports, additional system resources are required.
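The difference in load between cached and ad hoc reports can be sketched as follows. This Python example is a simplified model, not the behavior of any particular BI tool. The class and method names are hypothetical, and `run_query` stands in for whatever executes SQL against the DW or data mart.

```python
class ReportServer:
    """Toy model of a BI server with a cache for scheduled standard reports."""

    def __init__(self, run_query):
        self._run_query = run_query   # callable that executes SQL on the DW / data mart
        self._cache = {}              # report name -> cached result
        self.warehouse_hits = 0       # number of executions against the DW / data mart

    def _execute(self, sql):
        self.warehouse_hits += 1
        return self._run_query(sql)

    def run_scheduled(self, name, sql):
        """Execute a standard report (ideally off-peak) and cache the result."""
        self._cache[name] = self._execute(sql)

    def get_standard_report(self, name):
        """Serve a standard report from the cache without touching the DW."""
        return self._cache[name]

    def run_ad_hoc(self, sql):
        """Ad hoc queries cannot be cached in advance, so each one hits the DW."""
        return self._execute(sql)
```

In this model, every ad hoc request increments `warehouse_hits`, while repeated requests for a standard report do not, which is why a high share of ad hoc reporting calls for additional system resources.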

Data Mining

Data mining is an analysis that discovers patterns in sample data sets using specific algorithms such as decision trees, neural networks, and clustering. When algorithms find patterns in small sample sets, data mining further validates the hypothesis with larger data sets in very large DWs. This process may take several hours.

Data mining can consume the greatest amount of BI system resources. Although the best practice is to schedule data mining procedures in non-busy hours, additional system resources must be considered in sizing and configuring BI systems for data mining.

Internet and Extranet Access

The workload generated by BI users on the CMS extranet (internal users, CMS business partners, and contractors) is more manageable and predictable than the workload generated by public users on the Internet. When BI queries and reports are made public, the public workload (the number of concurrent users, the complexity of reports, and the number of standard reports) will be difficult to predict. Constant monitoring of the Internet workload is thus required to balance the Internet and extranet workloads.

User Activity Control and Monitoring

To proactively administer and monitor usage and system performance, BI administrators must establish user activity controls and monitor activity based on the workload definition rules. BI administrators must establish thresholds, such as limits on query processing time and output rows, and monitor for exception conditions.
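A threshold check of this kind can be sketched in Python as follows. The record fields, the threshold names, and the sample limit values used in the test data are hypothetical; BI tools typically expose comparable governor settings through their own administration consoles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    max_seconds: float   # limit on query processing time
    max_rows: int        # limit on output rows


@dataclass(frozen=True)
class QueryStats:
    user: str
    seconds: float
    rows: int


def find_exceptions(stats, limits):
    """Return (user, reasons) for every query that exceeds a threshold."""
    exceptions = []
    for s in stats:
        reasons = []
        if s.seconds > limits.max_seconds:
            reasons.append("processing time")
        if s.rows > limits.max_rows:
            reasons.append("output rows")
        if reasons:
            exceptions.append((s.user, reasons))
    return exceptions
```

An administrator could run such a check against the BI tool's usage log on a schedule and review the resulting exception list.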