Press Releases Dec 01, 2013

HealthCare.gov Progress and Performance Report

Overview

In mid-October, the Obama administration conducted an assessment of the site HealthCare.gov.  The assessment was conducted by experts from across government and the private sector.  The team identified the problems and necessary fixes and determined that HealthCare.gov was fixable, but only with significant changes to the management approach and a relentless focus on execution. This report details the substantial progress that has been made to improve and stabilize HealthCare.gov, including hundreds of software fixes and numerous hardware upgrades, so that the system runs smoothly for the vast majority of users.

The status of HealthCare.gov in October was marked by an unacceptable user experience.  Consumers were experiencing slow response times and frequent, inexplicable error messages.  The website experienced frequent outages. During some weeks in October, the site was down an estimated 60 percent of the time. The assessment determined the root causes of these site flaws to be hundreds of software bugs and insufficient hardware and infrastructure.  The system monitoring and response mechanisms were not sufficient for identifying issues or bugs or responding to them in real time.  Inadequate management oversight and coordination among technical teams prevented real-time decision making and efficient responses to address the issues with the site.

Improving the user experience for HealthCare.gov required deeper real-time analysis of the system, additional technical expertise, and a strong management structure to drive the prioritization and metric-driven execution of fixes.  The Centers for Medicare & Medicaid Services (CMS) appointed QSSI as the General Contractor and Systems Integrator.  QSSI, with its deep project management expertise, coordinates all activity with CMS and other contractors.  With one central command structure and “War Room” meetings of all key parties held twice a day for real-time, data-based decision making, the team has been able to implement high-performance management practices and drive through a priority set of fixes.

The newly installed technical monitoring instruments have allowed for constant real-time analysis of site performance.  With this new data and management structure the team has the capacity to rapidly respond to any incidents and to better understand root causes.

Over the last five weeks, substantial progress has been made improving HealthCare.gov and getting the system to where it needs to be:

Hundreds of software fixes, hardware upgrades and continuous monitoring have measurably improved the consumer experience

Site capacity is stable at its intended level

Operating metrics are greatly improved, and activity levels demonstrate the site is working for consumers

While there is more work to be done, the team is operating with private sector velocity and effectiveness, and will continue their work to improve and enhance the website in the weeks and months ahead.  The following charts provide data on the systems enhancements that have been executed, and the resulting improvements in the site’s key operating metrics over the last several weeks.

Real-Time Monitoring. Dedicated team focused on site monitoring and instant incident response. This figure is a screen shot of four different windows that show data about real-time monitoring and incident response of healthcare.gov.  It is not possible to make out the details of each window in full because the screen capture is of low quality/resolution. On top of all four graphs is an image of a text box that hides portions of each of those graphs.  That text box summarizes the four themes shown in the data: there is 24/7 monitoring of site performance; there are standup war-room meetings twice a day; there is an open bridge for instant incident response; and there is a process for clear accountability and decision-making. End of figure description.

Software Fixes. The team has knocked more than 400 bug fixes and software improvements off the punch list. This figure is a line graph titled "Cumulative Software Fixes." It illustrates the number of fixes implemented on healthcare.gov. The X-axis is weeks, starting with October 5 and continuing through November 30th. The Y-axis is the number of fixes, which ranges from 0 to 450. From October 5 to November 2, approximately 100 fixes were implemented. From November 2 to November 9, there appear to be no additional bugs fixed. However, beginning on November 9 and continuing through November 30th, approximately 350 additional fixes were implemented, bringing the total fixes implemented as of November 30th to over 400.  End of figure description.

Hardware Upgrades. A series of significant hardware enhancements have increased redundancy, reliability and scale. This figure is a flow chart that summarizes some of the key advancements made in relation to the IT infrastructure for healthcare.gov. There are four specific IT elements covered in this flow chart: Registration Database, Core Database, Website, and Network. To improve the Registration Database, dedicated hardware was installed, which allowed for a 4 X increase in registration throughput. To improve the Core Database, 12 servers were deployed and the storage unit was upgraded, which allowed for a 3 X increase in database throughput. To improve the Website, the applications environment was doubled, which allowed for a 2 X increase in capacity. To improve the Network, the firewall capacity was increased, which allowed for a 5 X increase in network throughput. End of figure description.

Response Times. System speed has increased dramatically, with response time running under 1 second. This figure is a hybrid bar chart and line graph and is titled "Average Response Time." The X-axis is days, ranging from "Late October" to November 29. The Y-axis is Seconds, ranging from 0 seconds to 8 seconds. In late October, there was an average response time of 8 seconds, which is the bar chart piece of the figure. The rest of the figure is the line graph portion, which shows, on average, a response time of 1 second or less continuing through November 29. End of figure description.

Error Rates. Per-page system timeouts or failures have been driven down from over 6% to well under 1%. This figure is a bar chart titled "Average Error Rates." The X-axis, with the exception of the first data point, which is October, is arranged by weeks, starting with November 9 and continuing through November 29. The Y-axis is the percentage of per-page failures. In October, the per-page failure rate was approximately 6%. During the week of November 9, the percentage of per-page failures declined to approximately 2%. For the week of November 16th, the percentage of per-page failures was approximately 1%. During the week of November 22nd, the percentage of per-page failures declined to approximately 0.75%. For the week of November 29th, the percentage of per-page failures was approximately 0.75%. End of figure description.

System Stability. Uptime is consistently surpassing 90%. This figure is a bar chart titled "System Availability." The X-axis is arranged by Weeks, starting with November 2nd and continuing through November 30th. The Y-axis is the percentage of uptime for healthcare.gov. Overall, the graph indicates an increase in uptime. For the week of November 2nd, the uptime for healthcare.gov was 42.9%. For the week of November 9th, uptime increased to 71.9%. For the week of November 16th, the uptime was 93.3%. For the week of November 23rd, uptime decreased to 92.4%. For the week of November 30th, uptime increased to a new high of 95.1%. End of figure description.
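
To put these availability percentages in more concrete terms, the short Python sketch below converts each weekly uptime figure into approximate hours of downtime. It assumes uptime is measured against a full 168-hour week, which the report does not state; the function name `downtime_hours` is illustrative only.

```python
# Illustrative sketch only: converts the weekly uptime percentages reported
# in the "System Availability" chart into approximate hours of downtime.
# Assumption (not stated in the report): uptime is measured against a full
# 7-day, 24-hour week, with no exclusion for scheduled maintenance.

HOURS_PER_WEEK = 7 * 24  # 168 hours

# Weekly uptime percentages as given in the figure description.
weekly_uptime_percent = {
    "November 2": 42.9,
    "November 9": 71.9,
    "November 16": 93.3,
    "November 23": 92.4,
    "November 30": 95.1,
}


def downtime_hours(uptime_percent, period_hours=HOURS_PER_WEEK):
    """Return the hours of downtime implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * period_hours


for week, uptime in weekly_uptime_percent.items():
    print(f"Week of {week}: {uptime}% uptime, "
          f"about {downtime_hours(uptime):.1f} hours down")
```

Under that assumption, the week of November 2nd corresponds to roughly 96 hours of downtime, while the week of November 30th corresponds to a little over 8 hours.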

System Capacity. Software and hardware upgrades enable the system to support its intended volumes. This figure is a flow chart summarizing how the IT upgrades will enable the healthcare.gov system to support the intended volume of 800,000 site visits per day. After the upgrades, the site can now support 50,000 concurrent users. Users spend an average of 20 to 30 minutes on the site. Thus, approximately 800,000 site visits can be handled per day. End of figure description.
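
The report does not spell out how the 800,000 figure follows from 50,000 concurrent users and 20 to 30 minute visits. As a rough sanity check, the Python sketch below computes the theoretical daily ceiling if every concurrent slot were reused back to back, and the number of fully loaded hours needed to reach 800,000 visits. Both function names and the back-to-back reuse model are assumptions made for illustration, not the team's method.

```python
# Illustrative sketch only: a back-of-the-envelope check relating concurrent
# users, average visit length, and visits per day. The model (each concurrent
# slot is immediately reused when a visit ends) is an assumption, not
# something described in the report.

MINUTES_PER_DAY = 24 * 60


def daily_visit_capacity(concurrent_users, avg_session_minutes,
                         busy_minutes=MINUTES_PER_DAY):
    """Upper bound on visits when all slots stay full for busy_minutes."""
    sessions_per_slot = busy_minutes / avg_session_minutes
    return int(concurrent_users * sessions_per_slot)


def busy_hours_needed(target_visits, concurrent_users, avg_session_minutes):
    """Hours at full concurrency required to serve target_visits."""
    total_session_minutes = target_visits * avg_session_minutes
    return total_session_minutes / concurrent_users / 60


# Figures taken from the report: 50,000 concurrent users, 20 to 30 minutes
# per visit, and a target of 800,000 visits per day.
for minutes in (20, 30):
    ceiling = daily_visit_capacity(50_000, minutes)
    hours = busy_hours_needed(800_000, 50_000, minutes)
    print(f"{minutes}-minute visits: ceiling {ceiling:,} visits/day; "
          f"800,000 visits needs {hours:.1f} hours at full load")
```

With 30-minute visits, this model reaches 800,000 visits after 8 hours at full concurrency, which is consistent with the report describing 800,000 as a minimum rather than a ceiling.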

Summary. Achieving a system that runs smoothly for the vast majority of consumers. For Response Time, the progress update is that the average system response time is lower than 1 second. For Error Rate, the progress update is that the error rate is consistently well below 1%. For System Stability, the progress update is that hardware upgrades and software fixes support a system uptime of 90%+. For Rapid Response Team, the progress update is that a 24/7 monitoring and operations center and team are in place to ensure optimal system performance and to respond to glitches and unplanned downtimes. For Concurrent Users, the progress update is that there is capacity for the concurrent user target of 50,000, supporting a minimum of 800,000 visits per day.  End of summary graph.

Conclusion

As the metrics detailed in this report reveal, dramatic progress has been made on improving HealthCare.gov.  There is more work to be done to improve and enhance the website and the consumer experience in the weeks and months ahead.  The new management system and instrumentation have helped improve site stability, lower the error rate below 1%, and increase capacity to allow 50,000 concurrent users on the site, and will help drive continuous improvement on the site.  While we strive to innovate and improve our outreach and systems for reaching consumers, we believe we have met the goal of having a system that will work smoothly for the vast majority of users.