Business Intelligence Implementation Considerations
This topic provides key considerations in implementing new CMS BI applications. These considerations represent guidelines and best practices in planning, implementing, and deploying a BI environment that maximizes business benefits.
Integrate Operational Data Sources
The ETL process is essential in integrating CMS operational data sources into the BI Environment for analytics. ETL extracts and transforms data based on business rules and data warehousing best practices and loads that data into the target data repository(ies).
ETL programs will be created by one of two methods: writing custom ETL programs in a programming language or using a COTS product such as Informatica PowerCenter. This decision will be based on project scope, data volume, size, and cost. Notably, Informatica PowerCenter allows custom code to be embedded in its workflows.
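The custom-program option can be illustrated with a minimal sketch of the extract, transform, and load steps. Everything specific here is an assumption made for illustration: the `claims` table, the column names, the sample rows, and the business rules (normalize the state code, drop rows with no state) are hypothetical, and SQLite stands in for the target data repository.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real job would read from an operational system.
SOURCE_CSV = """claim_id,state,amount
1001, md ,125.50
1002,VA,80.00
1003,,42.10
"""

def extract(text):
    """Read raw rows from the source extract as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Apply illustrative business rules: trim and upper-case the state,
    cast types, and drop rows that have no state."""
    cleaned = []
    for row in rows:
        state = row["state"].strip().upper()
        if not state:
            continue  # assumed rule: a claim without a state is not loaded
        cleaned.append((int(row["claim_id"]), state, float(row["amount"])))
    return cleaned

def load(rows, conn):
    """Write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS claims "
        "(claim_id INTEGER PRIMARY KEY, state TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO claims VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(conn):
    """Run the three steps in order and return the number of rows loaded."""
    load(transform(extract(SOURCE_CSV)), conn)
    return conn.execute("SELECT COUNT(*) FROM claims").fetchone()[0]
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two rows that pass the assumed rules. A COTS tool would express the same three steps as configured mappings rather than hand-written functions.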
Data Integrity / Quality
Data in the CMS data repositories for use in the BI Environment must be clean, consistent, and complete. Data integrity / quality checks will ensure the data’s cleanliness, consistency, and completeness before it is published to the user community. This process involves cleansing the data by correcting misspellings, resolving domain conflicts, addressing missing elements, or parsing into standard formats.
An integral part of the data quality process is data profiling, which uses specific tools to monitor and measure the level of data quality expected from a specific data source.
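As a rough illustration of what a profiling measurement might look like, the sketch below computes completeness and domain validity for a single column. The function name, the returned fields, and the example domain are assumptions for illustration only; dedicated profiling tools report far more than this.

```python
def profile_column(values, valid_domain=None):
    """Return simple quality measures for one column of source data.

    values       -- list of raw column values (None or blank counts as missing)
    valid_domain -- optional set of allowed values for a domain check
    """
    def is_missing(value):
        return value is None or str(value).strip() == ""

    total = len(values)
    present = [v for v in values if not is_missing(v)]
    missing = total - len(present)

    invalid = 0
    if valid_domain is not None:
        # Count present values that fall outside the allowed domain.
        invalid = sum(1 for v in present if v not in valid_domain)

    return {
        "total": total,
        "missing": missing,
        "completeness": len(present) / total if total else 0.0,
        "invalid": invalid,
    }
```

For example, `profile_column(["MD", "VA", "", "XX", None], {"MD", "VA"})` reports two missing values and one value outside the domain. Measures like these can be tracked per data source over time and compared against the quality level expected from that source.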
BI User Management
This topic describes the user management measures CMS will consider in implementing any new BI application.
Owners and developers of a new BI application must define user requirements that include the following functions:
- Classifying BI application users for role-based access, including some or all of the following user classes:
  - Standard User – An individual with access to predefined reports or data structures within the authorized BI application (e.g., executives, managers, CMS business partners, and U.S. citizens).
  - Power User – An individual with standard user access plus the ability to generate ad hoc reports using data within an authorized BI application (e.g., savvy business analysts and BI analysts).
  - BI Developer – An individual responsible for developing and implementing BI applications (e.g., CMS BI Contractor).
  - Business Intelligence Analyst – An individual responsible for providing information based on unique and changing needs within each BI application in the BI environment (e.g., EADG’s BI consultants and developers, and business analysts within the Division of Quality Coordination and Data Distribution).
- Defining user access roles and implementing a process for authorizing users
- Identifying BI contents such as metadata, queries, and reports that specific user groups may access
- Identifying the data that user groups may access from the CMS data repositories
After the BI application security requirements are defined, the following security measures will be applied in the BI Environment.
Authentication
Authentication will be performed by the methods described in the Network Services, Access Control and Identity Management topic. Identity and authentication requirements may vary depending on the sensitivity of the data involved.
Authorization
Access control applies to BI contents in BI applications and to database objects in the CMS data repositories.
The authorization of access to database objects—tables, views, stored procedures, columns, and even rows—will be controlled by the BI application and the RDBMS.
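A minimal sketch of role-based authorization over BI contents follows. The role names mirror the user classes described earlier, but the action names and the permissions assigned to each role are assumptions for illustration. In practice these checks would be configured in the BI application and enforced again through RDBMS grants.

```python
# Hypothetical mapping of roles to permitted actions (assumed, not from any CMS policy).
ROLE_PERMISSIONS = {
    "standard_user": {"run_predefined_report"},
    "power_user": {"run_predefined_report", "run_ad_hoc_query"},
    "bi_developer": {"run_predefined_report", "run_ad_hoc_query", "publish_report"},
}

def is_authorized(role, action):
    """Return True only if the role is known and explicitly grants the action."""
    return action in ROLE_PERMISSIONS.get(role, set())

def authorize(role, action):
    """Raise PermissionError when the role may not perform the action."""
    if not is_authorized(role, action):
        raise PermissionError(f"role {role!r} may not perform {action!r}")
```

Unknown roles receive no permissions, so access is denied by default rather than granted by omission.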
Administration
The Office of Information Technology (OIT)-designated administrator(s) will manage and control BI application security with an application’s available administrative tools.
For BI tools that provide centralized management of user administration, application, and core infrastructure, the administration tool will be placed on a hardened and secure workstation. The OIT-designated administrator will align users, groups, and roles in EUA / LDAP with the BI objects. For BI tools that provide web-based administration, only authorized users with valid credentials can log into the central administration console, which may only be accessed from internal CMS networks.
Sizing and Configuration
New BI applications must be properly sized and configured in the BI production environment. EADG will provide guidance on sizing and configuration and will work with the owners and developers of new BI applications to determine the optimum hardware configuration based on the BI application requirements. The BI tool vendor will also provide information on best practices for implementing its COTS product.
The following are some (but not all) key factors used to determine sizing and configuration: number of users, complexity of reports, number of ad hoc reports versus cached reports, and the need for data mining or internet access. Analyzing these factors is a starting point in determining the optimal configuration for the business user BI Environment.
Number of Users
The size of the BI user community is a significant factor in determining hardware requirements. Users can be counted in the following categories:
- Total users – The number of all user accounts that will be created in the BI Environment.
- Active users – The number of users who will be logged into the BI system.
- Concurrent users – The number of users who are likely to have queries and reports processing simultaneously on the BI system.
Of these categories, it is most important to determine the number of concurrent users. The BI system must be able to support the maximum number of concurrent users expected at any given time.
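One way to estimate the maximum number of concurrent users is to measure it from historical query logs. The sketch below assumes a hypothetical log in which each query is recorded as a `(start, end)` pair of timestamps in seconds, and it counts the largest number of queries running at the same instant.

```python
def peak_concurrency(intervals):
    """Return the maximum number of intervals that overlap at any instant.

    intervals -- iterable of (start, end) pairs, with start <= end
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a query begins
        events.append((end, -1))    # a query finishes
    # At equal timestamps, process finishes (-1) before starts (+1) so that
    # back-to-back queries are not counted as overlapping.
    events.sort(key=lambda event: (event[0], event[1]))

    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

For example, `peak_concurrency([(0, 10), (5, 15), (10, 20), (12, 14)])` returns 3. Measured peaks reflect past usage only, so a sizing exercise would still need headroom for expected growth.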
Report Complexity
BI system resource use and processing time depend on the complexity of the reports to be compiled. Some factors contributing to report complexity are:
- Number of result rows required
- Complexity of metric calculations needed
- Degree of analytical processing performed
Ad Hoc Reports versus Cached Reports
Standard reports can be scheduled to execute at non-busy times. When scheduled reports execute, BI tools create a report cache. When these standard reports are subsequently executed, the system accesses a cached report and does not execute a report against the DW or data mart.
Because ad hoc queries are executed on the spur of the moment, they cannot be cached in advance. Therefore, ad hoc reports are executed against the DW / data mart. If users execute a high percentage of ad hoc reports, additional system resources are required.
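The difference in load between cached and ad hoc reports can be sketched as follows. The `ReportServer` class, its method names, and the `warehouse_hits` counter are hypothetical and do not represent any particular BI tool's API. The `run_query` callable stands in for execution against the DW or data mart.

```python
class ReportServer:
    """Toy model of a BI server that caches scheduled standard reports."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that executes SQL on the DW / data mart
        self._cache = {}
        self.warehouse_hits = 0      # number of executions against the DW / data mart

    def _execute(self, sql):
        self.warehouse_hits += 1
        return self._run_query(sql)

    def run_scheduled(self, report_id, sql):
        """Run a standard report (for example, off-peak) and cache the result."""
        self._cache[report_id] = self._execute(sql)

    def get_standard_report(self, report_id, sql):
        """Serve from the cache when possible; execute only on a cache miss."""
        if report_id not in self._cache:
            self._cache[report_id] = self._execute(sql)
        return self._cache[report_id]

    def run_ad_hoc(self, sql):
        """Ad hoc queries are never cached in advance, so they always execute."""
        return self._execute(sql)
```

In this model, one scheduled run followed by any number of `get_standard_report` calls costs a single warehouse execution, while every `run_ad_hoc` call costs one. This is why a high share of ad hoc reports drives up resource requirements.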
Data Mining
Data mining is an analysis that discovers patterns in sample data sets using specific algorithms such as decision trees, neural networks, and clustering. When algorithms find patterns in small sample sets, data mining further validates the hypothesis with larger data sets in very large DWs. This process may take several hours.
Data mining can consume the greatest amount of BI system resources. Although the best practice is to schedule data mining procedures in non-busy hours, additional system resources must be considered in sizing and configuring BI systems for data mining.
Internet and Extranet Access
The workload generated by internal users, CMS business partners, and contractors on the CMS extranet is more manageable and predictable than the workload generated by public users on the Internet. When BI queries and reports are made public, the public workload (the number of concurrent users, the complexity of reports, and the number of standard reports) is difficult to predict. Constant monitoring of the Internet workload is thus required to balance the Internet and extranet workloads.
User Activity Control and Monitoring
To proactively administer and monitor usage and system performance, BI administrators must establish user activity controls and monitor activity based on the workload definition rules. BI administrators must also establish thresholds, such as limits on query processing time and on the number of output rows, and monitor for exception conditions.
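A threshold check of this kind might be sketched as below. The `QueryStats` record, the function name, and the default limits (300 seconds and 100,000 rows) are assumptions chosen for illustration, not recommended values.

```python
from dataclasses import dataclass

@dataclass
class QueryStats:
    """Hypothetical per-query record collected by BI monitoring."""
    user: str
    elapsed_seconds: float
    rows_returned: int

def find_exceptions(stats, max_seconds=300.0, max_rows=100_000):
    """Return (user, reasons) pairs for queries that exceed either threshold."""
    exceptions = []
    for record in stats:
        reasons = []
        if record.elapsed_seconds > max_seconds:
            reasons.append("processing time")
        if record.rows_returned > max_rows:
            reasons.append("row limit")
        if reasons:
            exceptions.append((record.user, reasons))
    return exceptions
```

An administrator could run such a check over recent activity and review only the flagged queries, adjusting the limits as the workload definition rules evolve.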