Bill Chiu, Cord Sturgeon, Peter Angelos. Which Intraoperative Parathyroid Hormone Assay Criterion Best Predicts Operative Success? A Study of 352 Consecutive Patients. Arch Surg. 2006;141(5):483–488. doi:10.1001/archsurg.141.5.483
The 6 published criteria for predicting curative parathyroid resection by means of intraoperative parathyroid hormone (IOPTH) assay are not equivalent.
Retrospective review of 352 patients undergoing parathyroidectomy for primary hyperparathyroidism from January 1, 1999, to December 31, 2004. We evaluated IOPTH values and 6-month postoperative serum calcium levels.
Tertiary referral center.
Main Outcome Measures
The IOPTH values at baseline (preincision and preexcision) and at 5 and 10 minutes after parathyroidectomy were reviewed according to the Miami criterion (>50% drop from highest baseline IOPTH level at 10 minutes after excision), criterion 1 (>50% drop from preincision IOPTH level at 10 minutes), criterion 2 (>50% drop from highest baseline IOPTH level at 10 minutes and final IOPTH level within the reference range), criterion 3 (>50% drop from highest baseline IOPTH level at 10 minutes and final IOPTH level less than the preincision value), criterion 4 (>50% drop from highest baseline IOPTH level at 5 minutes), and criterion 5 (>50% drop from preexcision IOPTH level at 10 minutes).
Criterion 2 had sensitivity of 88%, specificity of 22%, positive predictive value of 97%, and negative predictive value of 6%. Criterion 2 had good agreement with criteria 1 and 3. Of patients whose IOPTH level drop satisfied criterion 2 but not criterion 1, 14% had postoperative hypercalcemia at 6 months. When criterion 2 was not satisfied but criteria 1, 3, 4, and 5 and the Miami criterion were, failure rates were 0%, 4%, 7%, 6%, and 9%, respectively.
Satisfying criterion 2 was associated with a high rate of operative success but led to additional, unnecessary surgical exploration. Criterion 1 was better at predicting postoperative normocalcemia than criterion 2.
The success of parathyroidectomy for primary hyperparathyroidism is higher than 95% in the hands of experienced surgeons.1 With the refinement of preoperative and intraoperative localization tools, minimally invasive parathyroidectomy has evolved into a procedure commonly performed in an outpatient setting at many specialized centers across the United States.2-9 An intraoperative parathyroid hormone (IOPTH) assay has become an essential tool in minimally invasive parathyroid surgery. The assay enables the rapid detection of falling parathyroid hormone levels during parathyroid surgery. Surgeons use the assay to predict surgical success after parathyroid resection, allowing the performance of a more limited dissection.
There is regional variation in the use and interpretation of the IOPTH assay. The various criteria that have been developed are not uniform in predictive value, as supported by the fact that reports of the various criteria used have shown mixed results.2,10 Comparison between reports is difficult, because the collection method and timing of the phlebotomy have not necessarily been consistent. In this study, we retrospectively reviewed the IOPTH values of a large series of patients who underwent parathyroidectomy for primary hyperparathyroidism at a single institution. We hypothesized that a careful review of these data would validate the use of our institutional criterion.
Several investigators have reported their results using the various criteria for evaluating the fall in IOPTH level after parathyroidectomy.1,11-16 Several of these criteria have already been compared in a retrospective fashion at one institution.12 Our study was designed to perform a similar analysis to determine if our institutional criterion would be found to be as accurate as other reported criteria. Because we considered our criterion to be more stringent than others, we analyzed our data to determine whether the added stringency would result in greater operative success.
We retrospectively reviewed the medical records of 352 patients who underwent parathyroidectomy for primary hyperparathyroidism at a single institution from January 1, 1999, to December 31, 2004. Patients with multiple endocrine neoplasia and secondary, tertiary, and familial hyperparathyroidism were excluded. The IOPTH values at baseline (preincision and preexcision) and at 5 and 10 minutes after gland excision were recorded. We also reviewed 6-month postoperative serum calcium levels. This study was approved by the institutional review board of Northwestern University.
All patients underwent preoperative technetium Tc 99m sestamibi scans, and approximately 70% of the scans were obtained at our facility. Patients received 20 mCi of the radiolabeled pharmaceutical intravenously, and images were taken with a large field-of-view camera and pinhole collimators. Patients were placed in a supine position before 600-second images from the anterior-posterior, left anterior oblique, and right anterior oblique views were captured. Images from the same views were also obtained 3 hours after the radiotracer injection. All technetium Tc 99m sestamibi scans in the study were reviewed by the operating surgeon, and those performed at our institution were also read by the radiologists.
When a single focus of persistent activity was seen on the scan, a focused parathyroid exploration was undertaken. When more than 1 focus was apparent or no clear pattern was present on the scan, a unilateral exploration was undertaken. Intraoperative decisions regarding termination of the operation or extension to bilateral exploration were based on intraoperative findings and IOPTH values obtained after glandular resection. The IOPTH values were obtained by using the QuiCk IntraOperative Intact PTH Assay (Nichols Institute Diagnostics, San Clemente, Calif). The reference range of IOPTH values used at our institution was 0 to 65 pg/mL.
Before the surgical incision, blood drawn from a peripheral site was used to obtain the preincision IOPTH value. When an abnormal gland was identified, a preexcision IOPTH value was obtained before ligating the parathyroid vascular pedicle. Peripheral blood was again drawn at 5 and 10 minutes after excision of the parathyroid adenoma to obtain the respective IOPTH values. The criterion used intraoperatively to predict surgical success was prospectively chosen. Our criterion stipulated that the 5- or 10-minute postexcision IOPTH value must fall to less than 50% of the highest baseline value (preincision or preexcision, whichever is higher) and be within the reference range (described in criterion 2 in the following paragraph). If this criterion was fulfilled, the patient was considered to have a curative resection, and no further exploration was made.
All IOPTH values were retrospectively reviewed to determine whether they fulfilled any of the following 6 published criteria:
“Miami” criterion (>50% drop from the highest IOPTH level at 10 minutes12)
Criterion 1 (>50% drop from the preincision IOPTH level at 10 minutes after gland excision14,15)
Criterion 2 (>50% drop from the highest IOPTH level at 10 minutes after gland excision and a final IOPTH level within the reference range17)
Criterion 3 (>50% drop from the highest IOPTH level at 10 minutes after gland excision and a final IOPTH level less than the preincision value18)
Criterion 4 (>50% drop from the highest IOPTH level at 5 minutes after gland excision19)
Criterion 5 (>50% drop from the preexcision IOPTH level at 10 minutes after gland excision16)
The nomenclature for these criteria was matched to that described by Carneiro et al12 for ease of comparison between studies. We believe criterion 2 is the most stringent because of the requirement for the IOPTH level to drop to within the reference range rather than just dropping relative to the preincision or preexcision level.
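The six criteria listed above can be expressed as simple threshold checks. The following is a minimal sketch, not the authors' software: the class and function names are illustrative, the "final IOPTH level" is assumed to be the 10-minute postexcision value, and "within the reference range" is assumed to mean at or below the 65 pg/mL upper limit reported above.

```python
from dataclasses import dataclass
from typing import Dict

# Upper limit of the reference range reported in the text (0 to 65 pg/mL).
REFERENCE_UPPER_PG_ML = 65.0


@dataclass(frozen=True)
class IopthSample:
    """IOPTH values in pg/mL for one patient (field names are illustrative)."""
    preincision: float
    preexcision: float
    post_5min: float
    post_10min: float


def dropped_more_than_half(baseline: float, value: float) -> bool:
    """True when `value` represents a >50% drop from `baseline`."""
    return value < 0.5 * baseline


def evaluate_criteria(s: IopthSample) -> Dict[str, bool]:
    """Apply the Miami criterion and criteria 1 through 5 to one sample."""
    highest = max(s.preincision, s.preexcision)  # highest baseline value
    miami = dropped_more_than_half(highest, s.post_10min)
    return {
        "miami": miami,
        "criterion_1": dropped_more_than_half(s.preincision, s.post_10min),
        # Assumption: the "final" level is the 10-minute value.
        "criterion_2": miami and s.post_10min <= REFERENCE_UPPER_PG_ML,
        "criterion_3": miami and s.post_10min < s.preincision,
        "criterion_4": dropped_more_than_half(highest, s.post_5min),
        "criterion_5": dropped_more_than_half(s.preexcision, s.post_10min),
    }


# Hypothetical values chosen only to show how the criteria can disagree.
example = IopthSample(preincision=100, preexcision=180, post_5min=95, post_10min=70)
print(evaluate_criteria(example))
```

In this hypothetical case the Miami criterion and criteria 3 and 5 are met, whereas criteria 1, 2, and 4 are not, which illustrates why the choice of baseline and the reference-range requirement can change the intraoperative decision.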
Six months after the operation, a serum calcium level was obtained from each patient. A calcium level of 10.6 mg/dL or greater (≥2.7 mmol/L) was considered to be hypercalcemic. The operation was considered a success when the patient remained normocalcemic 6 months after parathyroidectomy. Using our operative procedures, we were able to achieve 6-month postoperative normocalcemia in 95.7% of our patients.
When meeting a criterion correctly predicted normocalcemia at 6 months after parathyroidectomy, the fall in IOPTH level was considered a true-positive (TP) result. A true-negative (TN) result was defined as the failure to meet the criterion and the development of hypercalcemia at 6 months. A false-positive (FP) result was assigned when the criterion predicted normocalcemia at 6 months but the postoperative calcium level revealed hypercalcemia. A false-negative (FN) result was assigned when a criterion predicted hypercalcemia at 6 months but the actual value was normocalcemic. These definitions were used to calculate the sensitivity (TP/[TP+FN]), specificity (TN/[TN+FP]), positive predictive value (TP/[TP+FP]), and negative predictive value (TN/[TN+FN]) for each set of criteria examined.
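The definitions above translate directly into a small calculation. The sketch below is illustrative only: the function names are hypothetical, the 10.6 mg/dL hypercalcemia threshold is taken from the text, and the counts in the usage example are invented for demonstration and are not study data.

```python
from typing import Dict

HYPERCALCEMIA_MG_DL = 10.6  # threshold for hypercalcemia stated in the text


def classify(criterion_met: bool, calcium_6mo_mg_dl: float) -> str:
    """Label one patient as TP, FP, FN, or TN using the definitions above."""
    hypercalcemic = calcium_6mo_mg_dl >= HYPERCALCEMIA_MG_DL
    if criterion_met:
        return "FP" if hypercalcemic else "TP"
    return "TN" if hypercalcemic else "FN"


def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Compute sensitivity, specificity, PPV, and NPV from the four counts."""
    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN rather than raising when a denominator is zero.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical counts, for illustration only.
print(diagnostic_metrics(tp=80, fn=20, fp=5, tn=5))
```

Because operative failures are rare in this setting, the TN and FP counts are small, so specificity and negative predictive value can swing widely with only a few patients.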
We calculated the Cohen κ statistic to determine the agreement between criterion 2 and the other criteria individually. The Cohen κ statistic measures the amount of agreement, above what is expected by chance alone, between 2 methods. The κ value has a maximum of 1, which denotes perfect agreement; 0 indicates no agreement better than chance, and negative values indicate agreement worse than chance.20
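As a sketch of the calculation, κ can be computed as (observed agreement − expected agreement) / (1 − expected agreement), where expected agreement is derived from each method's marginal frequencies. The function name and the example ratings below are hypothetical and do not reproduce the study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen kappa for two equally long sequences of categorical ratings."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    # Observed proportion of cases on which the two methods agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from the marginal category frequencies.
    counts_a, counts_b = Counter(a), Counter(b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[k] * counts_b[k] for k in categories) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when both methods use a single category
    return (observed - expected) / (1.0 - expected)


# Hypothetical results: True means the criterion was satisfied.
crit_a = [True, True, True, False, False, False, True, False]
crit_b = [True, True, False, False, False, True, True, False]
print(cohen_kappa(crit_a, crit_b))  # 0.5: 75% observed vs 50% chance agreement
```

Two methods that disagree more often than chance would predict yield a negative value with this formula.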
We analyzed those situations where criterion 2 was satisfied, but other criteria were not. We also included those patients who failed to satisfy criterion 2, but who satisfied 1 or more of the other criteria. By correlating the operative failures with the criteria satisfied, we were able to compare the utility of different criteria.
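The discordant-case comparison described above amounts to selecting patients for whom one criterion was satisfied and another was not, and then counting operative failures in that subset. A minimal sketch follows; the record layout and function name are assumptions, and the records in the example are invented.

```python
from typing import Iterable, NamedTuple, Tuple


class PatientRecord(NamedTuple):
    """Illustrative record; field names are not taken from the study."""
    met_a: bool              # criterion A satisfied (eg, criterion 2)
    met_b: bool              # criterion B satisfied (eg, criterion 1)
    hypercalcemic_6mo: bool  # operative failure at 6 months


def discordant_failure_rate(
    records: Iterable[PatientRecord], a_met: bool, b_met: bool
) -> Tuple[int, int, float]:
    """Return (failures, patients, failure rate) for one (A, B) pattern."""
    subset = [r for r in records if r.met_a == a_met and r.met_b == b_met]
    failures = sum(r.hypercalcemic_6mo for r in subset)
    rate = failures / len(subset) if subset else float("nan")
    return failures, len(subset), rate


# Hypothetical records, for illustration only.
records = [
    PatientRecord(True, False, False),
    PatientRecord(True, False, True),
    PatientRecord(True, False, False),
    PatientRecord(True, False, False),
    PatientRecord(False, True, False),
    PatientRecord(True, True, False),
]
print(discordant_failure_rate(records, a_met=True, b_met=False))  # (1, 4, 0.25)
```

As the text notes later, the failure rate in the subset where the intraoperative criterion was not satisfied reflects the outcome after further exploration, so it should be interpreted with that dependence in mind.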
The 352 patients identified included 264 women (75%) and 88 men (25%). The mean age was 57.2 years at the time of parathyroidectomy. Focused parathyroidectomies were performed for 290 patients, and 62 underwent bilateral exploration. Seventeen patients (5%) had previous parathyroidectomies. Three hundred three patients (86.1%) had a single adenoma, 13 (3.7%) had a double adenoma, and 34 (9.7%) had 4-gland hyperplasia. The mean weight for the first gland removed was 1019 mg.
Since 1999, criterion 2 has been followed in our institution to guide focused parathyroidectomy. A few patients in this study who had operations during early 1999 underwent evaluation by the Miami criteria. All patients were divided into the following 3 groups: (1) those who had exactly 1 gland removed (298 patients), (2) those who had more than 1 gland removed according to the protocol dictated by criterion 2 (31 patients), and (3) those who had more than 1 gland removed but not according to the protocol dictated by criterion 2 (21 patients). (There were 2 patients with incomplete values.) Because criterion 2 was not followed in group 3 patients, the postoperative calcium levels could not reflect the outcome for criterion 2. These patients were subsequently excluded from the evaluation.
For patients in group 1, criterion 2 yielded a sensitivity of 0.88, specificity of 0.22, positive predictive value of 0.97, and negative predictive value of 0.06 for normocalcemia at 6 months (Table 1). For patients in group 2, criterion 2 had a positive predictive value of 0.87 (Table 1).
We used the Cohen κ statistic to determine the amount of agreement between criterion 2 and the other criteria for detection of persistent disease after the removal of the first parathyroid gland. Criterion 2 had good agreement with criteria 1 and 3 but only moderate agreement with criteria 4 and 5 and the Miami criterion (Table 2). When we applied the same statistical methods to the IOPTH values after the last gland resected in cases of multiglandular disease, criterion 2 had only fair agreement with criteria 1 and 3 and poor agreement with criteria 4 and 5 and the Miami criterion (Table 2).
We compared criterion 2 with the other criteria by analyzing results in those patients whose falling IOPTH values satisfied criterion 2 but not the other criteria (Table 3). In situations where criterion 2 was satisfied, the operation was terminated. In 22 patients whose IOPTH level drop satisfied criterion 2 but not criterion 1, 3 patients (13.6%) had postoperative hypercalcemia at 6 months.
When criterion 2 was not satisfied, the surgical exploration continued despite, in many cases, the other criteria calling for termination (Table 3). Those patients with postoperative hypercalcemia represented failure of those criteria calling for termination of surgical exploration. For the Miami criterion and criteria 3, 4, and 5, the failure rate ranged from 4% to 9%. However, none of those patients who failed criterion 2 but satisfied criterion 1 (n = 19) had postoperative hypercalcemia. Of these 19 patients who failed criterion 2 but satisfied criterion 1, 15 (78.9%) did not have further surgical dissection, and their IOPTH values fell within the reference range after waiting 15 to 25 minutes after excision of the first gland. Additional dissection was performed for the remaining 4 patients; of these, 3 had 4-gland hyperplasia and 1 had a double adenoma.
With the increased demand for minimally invasive parathyroidectomy, objective IOPTH criteria that accurately predict surgical success, thus obviating further surgical exploration, are essential. Many reports on use of IOPTH levels in parathyroidectomy have been published, and their accuracy in predicting postoperative normocalcemia has varied widely from 80% to 96%.1,11,13 These variations could be attributed to different specimen collection methods, fewer IOPTH samples collected, or disparate timing of the blood draw. In this study, we retrospectively analyzed our institutional IOPTH data to determine which published IOPTH criterion for surgical success is most accurate in our hands.
Because criterion 2 was used intraoperatively, the outcome (6-month postoperative serum calcium level) could only be used to assess criterion 2. Because the operation was not terminated until criterion 2 was satisfied, the 6-month calcium level was dependent on criterion 2. The 6-month calcium level could not be used as an independent outcome to assess the utility of the other criteria. We found that criterion 2, in our hands, does not have the high sensitivity and specificity that others have reported for these criteria.12 Although criterion 2 had a high positive predictive value, the negative predictive value was very low. One possible explanation for the low negative predictive value was that in some cases when criterion 2 was not satisfied, no further parathyroid glands were removed, and the patient was ultimately cured. For example, the IOPTH values sometimes did not drop to within the reference range by 10 minutes after resection of the gland. Additional IOPTH values drawn at 15 to 25 minutes after parathyroidectomy fell to within the normal range without further dissection. In these situations when the magnitude of IOPTH level drop for criterion 2 was satisfied, but not within the required 10-minute time frame, these results were still recorded as negative for satisfying criterion 2.
In the 19 patients where criterion 1 was satisfied but criterion 2 was not, 15 patients (78.9%) did not require further parathyroid resection, and IOPTH levels fell to within the reference range, usually in 15 to 25 minutes. Only in 4 (21.1%) of the 19 patients were additional abnormal glands sought and found because IOPTH levels failed to drop to within the reference range (in 3 patients with 4-gland hyperplasia and 1 with a double adenoma). All 4 of these patients had 10-minute IOPTH values greater than 120 pg/mL.
One purpose of this study was to compare the criterion used at our institution, criterion 2, with other published criteria. Because the postoperative calcium level would not be an independent outcome for criteria other than that used during the operation, the traditional statistical descriptions such as sensitivity or specificity could not legitimately be applied to the other criteria in this study. Therefore, we have used other instruments such as the Cohen κ to measure how closely other criteria agreed with criterion 2. We assumed that the higher the level of agreement between criterion 2 and the other criteria, the more likely the results obtained by criterion 2 could be replicated if the other criteria were used. We found that criteria 1 and 3 had good agreement with criterion 2. This suggested that the higher stringency of criterion 2 did not necessarily predict greater surgical success.
We further compared criterion 2 with other criteria in situations where one criterion was satisfied but not the other. When criterion 2 was satisfied and the operation was subsequently terminated, criterion 1 called for further exploration in 22 patients. Of these patients, 3 (13.6%) had operative failure. This suggested that if criterion 1 were used during the operation, these failures might have been prevented. When the situation was reversed, where criterion 2 was not satisfied and criterion 1 was satisfied (19 patients), no operative failure resulted. In this situation, because criterion 2 was not satisfied, further surgical exploration was made. Therefore, the 6-month postoperative calcium level directly reflected the utility of criterion 2 and not that of criterion 1. However, only 4 of the 19 patients were found to have additional abnormal glands. Unless these additional glands were left in the patient, we could not legitimately determine the utility of criterion 1 in this circumstance. Nevertheless, there was no operative failure when criterion 1 and eventually criterion 2 were both satisfied.
The goal of parathyroid surgery, regardless of the approach used, is durable postoperative normalization of calcium levels. Hypercalcemia in the postoperative setting is considered an operative failure. Accordingly, in this study, we used the calcium level 6 months after the operation as the measure for operative success. In selecting an IOPTH criterion, surgeons must weigh the benefits of focused minimally invasive parathyroidectomy against the risk of missing an adenoma and failing to cure hyperparathyroidism. We used criterion 2 in our practice because we believed that it was more stringent, and would lead to fewer FP IOPTH interpretations. However, adding further stipulations to IOPTH criteria to make them more stringent did not result in greater operative success. Although it intuitively seems likely that requiring the IOPTH level to drop lower (eg, into the reference range such as in criterion 2) would lead to fewer cases of persistent or recurrent hyperparathyroidism, the drop of 50% from the lower (preincision) baseline value in criterion 1 probably minimizes such failures. We propose that criterion 1 may miss fewer remaining abnormal glands than the widely used Miami criterion or criterion 2 that we previously used.
Correspondence: Peter Angelos, MD, PhD, Section of Endocrine Surgery, Department of Surgery, Northwestern University, 201 E Huron St, Galter Suite 10-105, Chicago, IL 60611 (firstname.lastname@example.org).
Accepted for Publication: November 11, 2005.
Previous Presentations: This paper was presented at the 113th Scientific Session of the Western Surgical Association; November 9, 2005; Rancho Mirage, Calif; and is published after peer review and revision. The discussions that follow this article are based on the originally submitted manuscript and not the revised manuscript.
Acknowledgment: We thank Leah Welty, PhD, for her assistance with the statistical analysis.
Gary B. Talpos, MD, Detroit, Mich: Over the past 25 years, parathyroid surgery has advanced in terms of patient selection (who to operate on vis-a-vis the asymptomatic patient with minimal hypercalcemia) and smaller operations (focused, unilateral, and scan-directed), protected by IOPTH monitoring. This paper deals with the final component of these advances, the IOPTH monitoring.
Dr Angelos' results in these patients at 6-month follow-up—and I congratulate him for putting that in; a lot of papers don't put in 6-month data—suggest he doesn't really need ancillary studies such as these. His results are great.
What about our surgical trainees or colleagues who don't have this sort of volume? Can IOPTH monitoring allow them to avoid the early stages of the learning curve? Do we have IOPTH monitoring criteria that provide a litmus test to close the incision or to continue the operation that accurately predicts success or failure? Dr Angelos, I think, says yes, we have these criteria.
This work is predicated largely upon a uniform definition of primary hyperparathyroidism, yet we know this condition is heterogeneous. At least 30% of these patients have multigland disease, and the risk of multigland disease increases with age. In our Henry Ford Hospital experience, one third of our parathyroid patients from the past 10 years had normal nonsuppressed PTH determinations.
My questions deal with your patients and with your methodology.
What are the median or mean calcium and PTH values of your patients? Our data show a definite break at a calcium level of 11.4 mg/dL and at a PTH level equal to or greater than 130 pg/mL with regard to sestamibi scan accuracy. Likewise, did patient age impact your results with regard to the different criteria you evaluated, or historically to your scan accuracy at Northwestern? Have you previously performed a unilateral operation successfully concluded when IOPTH criteria were reached and then continued on to explore the contralateral side?
The second area of question deals with methodology. Can you assure us that your serum calcium and PTH determinations were performed with the same equipment? Do the calcium levels represent corrected calcium determinations? I admit you are fortunate to have the Nichols cart in the operating room rather than having to send your samples to a central laboratory as many of us do in our own institutions. But who standardizes your machine every day and who is responsible for the upkeep and maintenance of the quality assurance log? Are these results comparable day after day?
Can you tell us more about sestamibi scanning at Northwestern, because scan-positive patients impact your results by providing the denominator of patients for this procedure. Chicago is firmly within the old midwest Goiter Belt, as Detroit is. Has the prevalence of goiters impacted your results? Have you utilized exogenous thyroxine to suppress the thyroid as part of your scanning determination or methodology, or have you varied the amount of radiation labeling of the sestamibi molecule to improve your scan accuracy?
Finally, what criteria are you following now? What are you advocating that we do? I think that is the important issue for discussion.
Dr Angelos: We do not have data nor do we present the data for the preoperative median or mean calcium or PTH levels. I think he raises a very good point that if we did have those data we might see some very important differences relevant to which patients have positive scans or negative scans relative to their calcium and PTH levels.
We do not have a series of patients who had a positive sestamibi scan, had a unilateral operation with a successful drop in the PTH levels, and then went on to have the other side explored. So I cannot comment on the number of patients who may have additional abnormal glands on the other side which were not particularly active in terms of PTH levels.
The calcium levels that we report were not done on the same machine. We chose the normal range based on what is the normal range for the machines we use, but they are not all identical, and these are not corrected values. These patients, though, were outpatients, they were not chronically ill patients, so in general their albumin levels were within normal range. But we don't have those data.
In terms of who performs the standardized curves, we are fortunate to have 2 laboratory technicians who perform all of the IOPTH assays on the machine, the Nichols machine, in the operating room. They are excellent, and I think that the standardization is quite good.
We have found that 82% of our patients have a positive sestamibi scan, suggestive of a single adenoma gland. We do not have experience with the use of levothyroxine to increase our scan accuracy, but it is a good point that deserves further analysis.
Finally, I would say that we looked at these data with the idea that this added stringency of dropping to within the normal range or less than 65 pg/mL would give us a higher success rate in terms of predicting normocalcemia. What in fact we found was that drop into the normal range did not improve our results. We are now planning on opting for criterion 1, which is basing it on a 50% drop from our preincision value. The preincision PTH value tends to be lower than the postincision or preexcision value after the parathyroid gland has been manipulated.
Clive S. Grant, MD, Rochester, Minn: When you defined the TN group of patients, did you also include patients in whom the IOPTH failed to fall intraoperatively after removal of a single enlarged gland, you then further explored, found an additional enlarged gland, removed it, and the patient was cured?
Dr Angelos: Yes, we did include those. Those were the patients who had more than 1 abnormal gland removed, yes.
Dr Grant: The second question relates to the frequency of multiple gland disease in general. That is a key factor in the whole idea of IOPTH, its value and necessity. This ranges from as little as 4% to as much as 30% in Allen Siperstein's reports. What is your frequency, and do you have any idea how to try to reconcile these widely disparate numbers?
Dr Angelos: We have a 10% rate of multigland disease, which is at the lower end. I can't prove it, but my opinion is that part of the variability in these reported incidences has to do with which patients get sent to which centers. For example, in Chicago at Northwestern, we have a reputation of preferring to do a focused operation. Some of our colleagues at other institutions will routinely do 4-gland explorations, despite what the scans show. As a result, patients who have positive scans that have been done at other institutions selectively come to Northwestern because they know they will get a minimally invasive operation rather than getting a 4-gland exploration. I think that this type of self-selection by patients has a lot to do with the different rates.
Samuel Snyder, MD, Temple, Tex: Excellent review of a question that begs to be answered: Which criteria to use? Of all those that you studied, was there any statistically significant difference that you found in your results?
Dr Angelos: Part of the difficulty of the construction of the study is that it is a retrospective study. We were using a criterion, criterion 2, and then retrospectively comparing it to other data, and so in that sense we can't do direct statistical comparison, because in order to do that you really need to randomize your patients to criterion 1, criterion 2, etc, and then you can do a direct statistical comparison.
So we do—and we didn't describe it here in our presentation, but it is in the paper—we do what is called a κ analysis to try to look at how closely the results of the different studies predicted the values. But we don't have a direct statistical comparison.
Melanie Richards, MD, San Antonio, Tex: I have 1 concern. In my own practice, I measured the baseline prior to making the incision, waited the 10 minutes, and defined success as a 50% drop. However, there were several patients who dropped the 50% and returned over 6 months later with recurrent primary hyperparathyroidism secondary to multiglandular disease. These patients did not have the postexcision value in the normal range. Our current goal is to have the PTH drop 50% and normalize. How many of your patients had recurrent disease after the 6-month period?
Dr Angelos: There have been a few. I can't give you an exact number because we really didn't carefully analyze recurrent disease after 6 months.
Part of the difficulty is that beyond 6 months, if patients are doing fine, we are not generally following them. They are followed by the medical endocrinologists or their internists. So it is really only when patients actually have a problem after 6 months that we will usually see them back again.
But we do have a handful of those. At this point I don't believe that there are enough of them to be able to say that one criterion or another would be able to distinguish among those people who might have a late recurrence.
Quan-Yang Duh, MD, San Francisco, Calif: Do you cheat? What I mean by that is not everybody uses IOPTH monitoring. In fact, some people who use it use it half-heartedly. They draw the blood, close the patient, take the patient out of the operating room, and see what happens.
My own bias is that the surgeon has a very good idea whether or not it is going to be a good operation. For example, for somebody with a positive ultrasound, you go in and find a 2-cm tumor, statistically there is a very low chance you are going to find something else.
So do you ever close the patient, take the patient to the recovery room, and just get the numbers whenever they come back?
Dr Angelos: I really want to emphasize that I think that all of this looking at which criterion is the best is really an attempt to try to use it as an adjunct to good clinical judgment by the surgeon. I don't think that the PTH assay can replace that sort of a judgment.
So that being said, yes, we do cheat sometimes. It is an uncommon situation where we would close and go to the recovery room without having a result that would suggest success. It happens occasionally, but pretty infrequently.
What is a more common situation is that we go in, remove an abnormal gland, and our levels at 10 minutes have not dropped to within normal range. Much to the chagrin of our anesthesiologists, we generally then wait, draw another sample, we wait 13 minutes for the result, and it is almost always down within the normal range. In that sense, I would say that is cheating because even though the level has not dropped, I don't automatically reopen and look at the other side. So we do use clinical judgment in that sense.