Background
Thyroid cancer (TC) is the most common endocrine malignancy, accounting for nearly 3% of all newly diagnosed cancer cases in the United States each year.1 More than 70% of those cases occur in women, making TC the fifth most commonly diagnosed malignancy in females. It is the second most common cancer among Hispanic and Asian/Pacific Islander women in the US, who also have the highest mortality rates.2 Differentiated thyroid carcinoma is the most common form, accounting for around 90% of all cases, and includes papillary thyroid carcinoma (PTC), follicular carcinoma (FC) and Hurthle cell carcinoma (HTC).3,4 Medullary thyroid carcinoma (MTC), a rare neuroendocrine tumor that arises from the neural crest-derived parafollicular calcitonin-secreting thyroid C cells, represents 4% of all TC.5 Anaplastic thyroid carcinoma (ATC) is the most aggressive thyroid tumor and, while it represents only around 1-2% of all TC, accounts for the majority of TC deaths.6
Diagnoses of TC in the United States have tripled over the last 25 years.1 Several studies attribute this significant increase to overdiagnosis of small indolent tumors that would otherwise not cause symptoms or require treatment, with the majority of the increase explained by PTC tumors 2 cm or smaller.7-11 More recent studies suggest that the incidence rate of thyroid cancer stabilized between 2013 and 2016 and declined between 2016 and 2018.12,13 This stabilization followed by decline has been postulated to result from changes in practice patterns and the reclassification of some cancer types. In 2016, the Endocrine Pathology Society working group reported clinical outcomes and refined the diagnostic criteria for encapsulated follicular variant of papillary thyroid carcinoma (EFVPC), and proposed replacing the term with non-invasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP) to describe these tumors more accurately.14 The American Thyroid Association (ATA) recommended this terminology change in 2017.15 This led to the reclassification of approximately 10-20% of thyroid tumors from malignant to benign.14 In addition, guidelines for the management of thyroid nodules have become increasingly conservative regarding size thresholds for nodule biopsy and discourage biopsy of nodules < 1 cm.16,17 However, some studies have reported a true increase in advanced-stage and larger PTC tumors, as well as in incidence-based mortality, that cannot be explained by overdiagnosis, and suggest that lifestyle-related factors such as obesity may contribute.10,18 Disparities in the diagnosis and treatment of TC based on race, ethnicity and socioeconomic status also persist, with patients from minority backgrounds more likely than white patients to present with larger tumors and distant metastases.2,19,20 However, a recent report from Ginzberg et al. suggests that the updated ATA guidelines ameliorated some of these disparities.21
TC almost exclusively presents as a thyroid nodule, and malignancy is found in 7-15% of nodules depending on gender, age, radiation exposure, family history and other factors.16 However, thyroid nodules are very common; most are asymptomatic and benign and do not require monitoring, treatment, or evaluation. In fact, over 60% of the population will have a thyroid nodule by the time they are older than 65.16 It is therefore important to distinguish between benign and malignant nodules so that patients receive appropriate treatment and avoid unnecessary surgery.
The malignancy potential of a thyroid nodule is determined through a multimodal approach that includes physical exam, personal and family history, radiographic assessment, and fine needle aspiration (FNA) biopsy. FNA biopsy is the procedure of choice for evaluating clinically suspicious thyroid nodules, and more than 500,000 of these minimally invasive procedures are performed every year.22 The results of FNA biopsies are reported using the Bethesda System for Reporting Thyroid Cytopathology (TBSRTC).23 This system was established to provide consensus recommendations for diagnostic categories for FNA specimens, with the goal of standardizing classification and reporting across health care providers. It includes recommendations on sample adequacy, malignancy risk, report layout and management, and has been widely adopted. As shown in Table 1 (reproduced from TBSRTC), the system recognizes six diagnostic categories and provides an estimated cancer risk for each, based on identification of cancer in a subsequent nodule resection.
TABLE 1
Bethesda Reporting System Categories for Thyroid Nodule Cytology and Risk of Malignancy
| Diagnostic category | Risk of malignancy, mean % (range) |
|---|---|
| I. Nondiagnostic | 13 (5-20) |
| II. Benign | 4 (2-7) |
| III. Atypia of Undetermined Significance (AUS) | 22 (13-30) |
| IV. Follicular neoplasm | 30 (23-34) |
| V. Suspicious for malignancy | 74 (67-83) |
| VI. Malignant | 97 (97-100) |
A repeat aspiration with ultrasound guidance is recommended for nondiagnostic Bethesda category I samples, with excision considered for persistently nondiagnostic or unsatisfactory nodules.23 Category II almost always results in conservative surveillance, as data continue to support a low false-negative rate (<3%). Category VI, malignant, is used whenever features are conclusive for malignancy, with additional comments used to subclassify the malignancy. In this case, near-total thyroidectomy or lobectomy is recommended. While this system has been successful in distinguishing benign nodules from malignant ones, approximately 20-25% of thyroid nodule FNA results are reported as indeterminate or suspicious (Bethesda III-V), and nodules in these categories carry a risk of malignancy (ROM) ranging from approximately 10% to 75%.23 In addition, intra- and inter-observer variability has been reported, leading to differences in classification and in the ROM for these categories.24,25
Despite advances in technique, thyroid surgery is not without adverse effects. These include general complications such as postoperative fever, infection, hemorrhage, and cardiopulmonary events, as well as thyroid-specific complications including vocal cord/fold paralysis and hypoparathyroidism or hypocalcemia. Thyroid-specific complications have been reported in more than 10% of patients and are significantly more frequent in patients older than 65 years.26 As such, there has been a concerted effort to develop technologies that improve the classification and risk stratification of indeterminate and suspicious thyroid nodules as part of the surgical decision-making process.
Molecular Testing
Current clinical guidelines, including the ATA Management Guidelines for Adult Patients with Thyroid Nodules and Differentiated Thyroid Cancer and the National Comprehensive Cancer Network (NCCN) Thyroid Carcinoma guidelines, endorse the use of molecular tests to further risk stratify patients with indeterminate (Bethesda III and IV) thyroid nodule cytology results, as well as their use in identifying cancer types with challenging cytology such as MTC.16,27 Molecular tests can be broadly grouped into “rule-out” tests, designed to identify benign nodules so that the patient can be placed on surveillance and avoid surgery; “rule-in” tests, which aim to predict the aggressiveness of malignancy and aid in surgical decision-making; and “general” tests, which can act as both rule-in and rule-out. Most currently offered tests use Next Generation Sequencing (NGS) methodologies to assess either characteristic gene expression profiles (GEP) or genomic sequence variant profiles that are known to be associated with malignancy.
First generation tests
The Afirma gene expression classifier (AGEC) is an early GEP test that was developed as a rule-out.28 The landmark publication in 2012 described the test validated against the gold standard histopathology of known benign or malignant thyroid tissue and classified indeterminate thyroid nodules into benign or suspicious using a proprietary algorithm based on gene expression signatures.28 The algorithm assesses the expression of 142 primary genes plus 25 additional genes that filter out rare neoplasms such as medullary carcinoma and renal carcinoma as the sample is processed through a series of “cassettes.” This prospective study examined 265 of 577 indeterminate nodules from 4812 FNAs (5.5%) collected from 3789 patients at 43 clinical sites over a 19-month period. The AGEC correctly called 78 of 85 malignant samples suspicious for a sensitivity of 92% and 93 of 180 benign samples were called correctly for a specificity of 52%. These percentages were consistent regardless of the sample category. The prevalence of malignancy (POM) was 24% and 25% for Bethesda category III and IV nodules respectively, yielding a negative predictive value (NPV) of 95% and 94% respectively. Because the POM for category V was much higher at 62%, the respective NPV was 85%. These data suggested that the AGEC could rule out malignancy in over 90% of indeterminate category III and IV nodules. Since then, the test has garnered wide acceptance in clinical practice and as described above, the approach has been recommended by professional associations.16,27
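The sensitivity, specificity, and NPV figures above follow from simple arithmetic on the reported counts. The sketch below is illustrative only. The function names are hypothetical, and the NPV is derived with Bayes' rule from the pooled sensitivity and specificity, which is an assumption: the study's per-category values may differ from this approximation.

```python
def sensitivity(true_positives: int, total_malignant: int) -> float:
    """Fraction of histologically malignant nodules the test called suspicious."""
    return true_positives / total_malignant


def specificity(true_negatives: int, total_benign: int) -> float:
    """Fraction of histologically benign nodules the test called benign."""
    return true_negatives / total_benign


def npv(sens: float, spec: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' rule at a given prevalence of malignancy."""
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


# Counts quoted in the text for the 2012 validation study.
sens = sensitivity(78, 85)
spec = specificity(93, 180)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")

# ASSUMPTION: the pooled sensitivity/specificity is applied to every category.
for category, pom in [("III", 0.24), ("IV", 0.25), ("V", 0.62)]:
    print(f"Bethesda {category}: POM {pom:.0%} -> NPV {npv(sens, spec, pom):.1%}")
```

With pooled inputs the NPV comes out at about 95% for categories III and IV, close to the reported values. For category V it falls to roughly 79%, below the reported 85%, presumably because the per-category sensitivity and specificity differ from the pooled figures. Either way, the NPV drops as the POM rises.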
Many follow-up reports on the performance of the AGEC, including multiple meta-analyses, have been published.29-31 Silaghi et al. summarized 25 studies involving 4,538 indeterminate nodules from 4,424 patients who had been evaluated using the AGEC test from May 2009 to June 2018.30 The overall sensitivity and specificity across all studies were 97% and 19% respectively, with an NPV of 91% and a positive predictive value (PPV) of 39%. However, most of the reports are retrospective, single-center studies and demonstrate variable test performance among institutions. Some reports indicate variability among institutions that differ in the POM of indeterminate nodules.32,33 For example, in a study comparing the AGEC-benign call rate between Memorial Sloan Kettering Cancer Center (MSK), a tertiary referral cancer center with a POM of 30-38%, and Mount Sinai Beth Israel (MSBI), a comprehensive health system with a POM of 10-19%, Marti et al. found that the NPV was 86-92% at MSK but 95-98% at MSBI. Conversely, the PPVs of AGEC-suspicious results were 57.1% and 13.3% respectively, with 86% (18/21) of resected AGEC-suspicious nodules at MSBI being benign on final pathology. These data closely matched the predicted PPVs and NPVs and highlight the importance of knowing the POM at each institution.33 Valderrabano et al. reported that the low resection rate of AGEC-benign nodules makes the false-negative rate and NPV impossible to calculate. They also reported that the only metrics that can be measured reliably, the benign call rate (BCR, the proportion of tested nodules with an AGEC-benign result) and the PPV, suggested that the initial cohort study is not representative of the populations to which the AGEC was subsequently applied.34
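The dependence of PPV and BCR on the local POM can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions, not the method used in the cited studies. It holds sensitivity and specificity fixed at the 92% and 52% reported in the 2012 validation study, which the institutional data above call into question, and the function names are hypothetical.

```python
def ppv(sens: float, spec: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def benign_call_rate(sens: float, spec: float, prevalence: float) -> float:
    """Expected share of tested nodules returned as benign (true plus false negatives)."""
    return spec * (1 - prevalence) + (1 - sens) * prevalence


# ASSUMPTION: validation-study sensitivity/specificity hold at every institution.
SENS, SPEC = 0.92, 0.52

# POM ranges quoted in the text for the two institutions.
SITES = {"MSK": (0.30, 0.38), "MSBI": (0.10, 0.19)}

for site, pom_range in SITES.items():
    for pom in pom_range:
        print(
            f"{site}: POM {pom:.0%} -> predicted PPV {ppv(SENS, SPEC, pom):.1%}, "
            f"predicted BCR {benign_call_rate(SENS, SPEC, pom):.1%}"
        )
```

Because both quantities can be observed without resecting benign-called nodules, comparing them with such predictions is one way to judge whether a local population resembles the validation cohort.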
In 2011, Nikiforov et al. reported on the efficacy of a gene hot spot panel, applied to 967 FNA samples from indeterminate nodules, for variants that commonly occur in thyroid cancer, such as BRAF p.V600E, KRAS codons 12/13, NRAS and HRAS codon 61, and RET and PAX8 fusions. The study established that molecular profiling of thyroid nodule FNAs can aid in malignancy identification as a rule-in test.35 This report was followed by further clinical validation studies on Bethesda III and IV nodules using ThyroSeq v2 (TSv2), a panel consisting of additional variant hotspots in genes known to be drivers in thyroid carcinogenesis as well as those that develop late, with expression analysis of an additional eight genes to determine cell type composition. The larger number of variants examined resulted in a higher sensitivity than the original seven-gene panel as well as a higher NPV. In a study of 143 FNA samples from patients with Bethesda category IV nodules with known surgical outcomes, the TSv2 test demonstrated a sensitivity and specificity of 90% and 93% respectively, with a PPV of 83% and an NPV of 96%.36 Similar results were obtained for category III nodules.37 However, as with the AGEC, variability across institutions has been reported.38-40 For example, in a retrospective analysis of 273 category III and IV nodules from four different institutions, Marcadis et al. reported variation in test performance and diagnoses.38 Although sensitivity was similar to that originally reported by Nikiforov et al., the specificity was lower (52% vs. 93%). This led to PPVs ranging from 22% to 43% across the institutions, lower than the 83% originally reported. A PPV of 22% was reported by Taye et al., with a PPV of 9% (2/22) and 7% (1/15) across all RAS and NRAS mutations respectively. The authors noted that many genetic alterations, such as those in the RAS family, appeared to be nonspecific for malignancy and that positive reports should be interpreted with care.39
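The link between the lower specificity and the lower PPV can be illustrated numerically. The sketch below is a rough illustration, not an analysis from the cited studies. It back-calculates the prevalence implied by the category IV summary figures (sensitivity 90%, specificity 93%, PPV 83%) and then recomputes the PPV at 52% specificity. It assumes that prevalence and sensitivity stay unchanged, and the function names are hypothetical.

```python
def ppv(sens: float, spec: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def prevalence_from_ppv(sens: float, spec: float, ppv_value: float) -> float:
    """Invert the PPV formula to recover the prevalence consistent with it."""
    false_pos_rate = 1 - spec
    return ppv_value * false_pos_rate / (
        sens * (1 - ppv_value) + ppv_value * false_pos_rate
    )


# Summary figures quoted in the text for the category IV validation cohort.
implied_pom = prevalence_from_ppv(sens=0.90, spec=0.93, ppv_value=0.83)

# ASSUMPTION: same prevalence and sensitivity, with specificity lowered to 52%.
ppv_low_spec = ppv(sens=0.90, spec=0.52, prevalence=implied_pom)

print(f"implied prevalence of malignancy: {implied_pom:.1%}")
print(f"PPV at 93% specificity: {ppv(0.90, 0.93, implied_pom):.1%}")
print(f"PPV at 52% specificity: {ppv_low_spec:.1%}")
```

Under these assumptions the implied prevalence is about 28% and the PPV at 52% specificity is about 42%, near the upper end of the 22%-43% range reported across institutions. A lower local prevalence would reduce it further.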
Second generation tests
Updated versions of both rule-in and rule-out test types have been developed. The AGEC was replaced by the Afirma Genomic Sequencing Classifier (AGSC), which tests for BRAF p.V600E and RET/PTC fusion variants as well as characteristic MTC and parathyroid tissue profiles, in addition to a more robust classifier that provides a benign or suspicious result for indeterminate nodules.41 If the result is suspicious, genomic profiling may be used to further inform on risk of malignancy and tumor prognosis. ThyroSeq v3 (TSv3) is an expanded version of TSv2 containing variant targets in 112 genes as well as copy number alterations (CNAs) in multiple genomic regions and expression analysis of 19 genes. Results are reported as positive (high probability of cancer/NIFTP) or negative (low probability of cancer/NIFTP).42 Positive samples are further classified into high, intermediate and low molecular risk groups based on the variant(s) identified.
Multiple reviews of these second-generation tests have been performed and describe improved performance over their predecessors.30,43,44 In Lee et al., preliminary pooled studies demonstrated that both assays, AGSC and TSv3, have a high sensitivity (96% and 95% respectively) and a high NPV (96% and 92% respectively), demonstrating that either test type can be used to rule out malignancy.43 The AGSC and TSv3 were reported to have a specificity of 53% and 50%, with a PPV of 63% and 70% respectively. Although this represents an increase in specificity for the AGSC (12% to 53%), the specificity of TSv3 decreased compared with TSv2 (78% to 49.6%). However, the specificity of the tests varied across studies, particularly those from single centers, suggesting inter-institution variation similar to that seen with the first-generation tests. Silaghi et al. reported similar results.30 Livhits et al. performed a randomized clinical trial across nine sites, using the AGSC and the TSv3 in practice on a rotating monthly basis.45 Of the 346 samples ultimately tested, 189 and 157 were randomized to the AGSC and TSv3 respectively. For the AGSC test, 19 nodule samples were insufficient for testing, 107 (53.2%) were classified as benign and 73 (36.3%) as suspicious. Twelve of the benign samples were surgically resected and histopathologically classified as benign. Fifty-eight of the suspicious samples were resected and revealed NIFTP in 10 (17.2%) and malignancy in 21 (36.2%). The TSv3 test identified 103 (60.2%) negative nodules, 60 (35.1%) positive, and seven insufficient for testing. Eleven negative nodules were resected, and one, which was resected because of growth during the surveillance period, was found to be a minimally invasive Hurthle cell carcinoma with capsular invasion only. Of the positive nodules, 49 (81.7%) were resected, and histopathological results revealed NIFTP in 11 (22.4%) and malignancy in 20 (40.8%).
These data demonstrated high sensitivity (97-100%) and reasonably high specificity (80-85%) for both tests, and diagnostic surgery was avoided in approximately half of the patients in the study.45 However, consistent with similar studies, nodules with benign/negative results were assumed to be benign in the absence of histopathological confirmation. Therefore, to further assess the false-negative rates of the AGSC and TSv3, Kim et al. performed a single-center prospective study of patients surveilled over a median of 34 months (range 12-60).46 They reported that of the 217 indeterminate nodules initially reported with negative or benign results, 14 (8%) underwent immediate resection and were all confirmed to be benign. Of the 147 that remained on continued surveillance, 15 were resected during the surveillance period. The minimally invasive Hurthle cell carcinoma initially found to be negative by TSv3 remained the only false negative. Of the 133 test-positive nodules, 97 underwent immediate resection and 59 were determined to be cancerous; of those that were initially surveilled, 16 underwent delayed surgery, with an additional nine found to be malignant. These data reaffirm the high sensitivity previously reported for both assays.46
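The sensitivity and specificity ranges quoted above for the randomized trial can be reproduced from the reported counts under one particular accounting. The sketch below treats unresected benign/negative nodules as true negatives, counts NIFTP together with malignancy as disease, and excludes unresected suspicious/positive nodules. These are assumptions chosen because they reproduce the quoted ranges, not a description of the trial's own statistical method. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrialArm:
    name: str
    benign_calls: int          # nodules with a benign/negative test result
    false_negatives: int       # cancers later found among benign/negative nodules
    resected_positive: int     # suspicious/positive nodules with histopathology
    disease_on_resection: int  # malignancy plus NIFTP among resected positives

    def sensitivity(self) -> float:
        true_pos = self.disease_on_resection
        return true_pos / (true_pos + self.false_negatives)

    def specificity(self) -> float:
        # ASSUMPTION: benign calls without histology count as true negatives.
        true_neg = self.benign_calls - self.false_negatives
        false_pos = self.resected_positive - self.disease_on_resection
        return true_neg / (true_neg + false_pos)


# Counts quoted in the text; NIFTP is grouped with malignancy (assumption).
agsc = TrialArm("AGSC", benign_calls=107, false_negatives=0,
                resected_positive=58, disease_on_resection=21 + 10)
tsv3 = TrialArm("TSv3", benign_calls=103, false_negatives=1,
                resected_positive=49, disease_on_resection=20 + 11)

for arm in (agsc, tsv3):
    print(f"{arm.name}: sensitivity {arm.sensitivity():.1%}, "
          f"specificity {arm.specificity():.1%}")
```

The specificity estimate depends heavily on the first assumption: every benign call without histopathological confirmation is counted as a true negative. This is the limitation that motivated the follow-up surveillance study.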
Molecular profiles
The variants identified in a nodule can also predict the risk and/or class of malignancy. For example, nodules with “driver” mutations such as BRAF p.V600E or pathogenic variants in RET have a higher probability of malignancy than those carrying RAS or RAS-like variants.16 Tumors harboring BRAF p.V600E are generally classic PTC that frequently involve regional lymph nodes with a higher rate of metastasis, and RET mutations are present in all inherited MTCs and 6-10% of apparently sporadic disease.15,47-50 In contrast, RAS alterations (KRAS/NRAS/HRAS) are the alterations most frequently identified in indeterminate thyroid nodules. However, unlike BRAF p.V600E, the utility of detecting RAS alterations remains uncertain. In a systematic review of 35 studies examining RAS mutations published between 2000 and 2015, Najafian et al. reported a prevalence of RAS mutations of 0-48% in benign nodules and 10-93% in malignant nodules across the studies.51 In a study of over 1500 patients, Yip et al. reported an indolent clinical course and nearly 100% disease-free survival at five years for patients with RAS-positive nodules.50 Guan et al. also reported that although RAS variants were the most frequent alterations detected in more than 500 fine needle biopsies, they provided poor value for prediction of TC, since most RAS alterations were found in benign nodules and NIFTPs (59% and 13% respectively).52 However, the presence of a “second hit” in another gene such as TERT or TP53 significantly increased the risk of malignancy.