Accurate gestational dating is imperative for optimal maternal and neonatal outcomes. Adding ultrasound fetal biometric measurements to a woman's reported last menstrual period (LMP) improves gestational dating.1 Methods for estimating gestational age from ultrasound fetal biometric measurements were reported decades ago, mostly in racially homogeneous populations.2 Indeed, formulas from these early reports are still used today to estimate gestational age.3 Formula accuracy was initially evaluated and then further refined using pregnancies conceived by in vitro fertilization as the referent standard.4 Ultrasound imaging has continued to advance since that time,5,6 and better image quality and more precise caliper placement may allow improved gestational age estimation. The recent completion of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) Fetal Growth Studies provides a unique opportunity to readdress gestational age estimation in a larger and more diverse cohort. This cohort included women of four racial–ethnic groups and conditions optimal for fetal growth, and the studies used a standardized ultrasound protocol with extensive training and quality control of images and measurements.7,8 In addition, maternal health characteristics such as age and obesity rates have changed over the last 30 years or more, so a reevaluation seems prudent. Finally, there is no consensus in the obstetric community on whether racial and ethnic-specific formulas are superior to racial and ethnic-neutral formulas, and the NICHD Fetal Growth Studies also provide an opportunity to address this question.
Our objectives were to develop a gestational age estimation model using ultrasound fetal parameters in women who participated in the NICHD Fetal Growth Studies, to compare this formula with one in common use3 and to assess the accuracy of racial and ethnic-specific formulas.
MATERIALS AND METHODS
The NICHD Fetal Growth Studies-Singletons was a prospective cohort study that recruited women of four self-reported racial or ethnic groups (non-Hispanic black, Hispanic, non-Hispanic white, and Asian) who had low-risk pregnancies with optimal conditions for fetal growth. To be included, women had to have a certain LMP and a first-trimester ultrasonogram that confirmed their pregnancy dating. The ultrasound estimate of gestation had to be between 8 0/7 weeks and 13 6/7 weeks. It also had to match the LMP-based gestational age within 5 days for women between 8 0/7 weeks and 10 6/7 weeks, within 6 days for those between 11 0/7 weeks and 12 6/7 weeks, and within 7 days for those between 13 0/7 weeks and 13 6/7 weeks.1 Women who did not meet these criteria were excluded. Cycle length and hormonal contraception use were not assessed and were not exclusions to participation. The project gestational age was then based on the LMP. Women were screened at 8 0/7 weeks to 13 6/7 weeks of gestation for maternal health status associated with normal fetal growth. The screening criteria were age 18–40 years, a body mass index (BMI, calculated as weight (kg)/height (m)²) of 19.0–29.9, healthy lifestyles and living conditions (see the exclusion criteria below), and a low-risk medical and obstetric history. Body mass index was calculated from self-reported prepregnancy height and weight and was confirmed by measurement at the enrollment visit. Details of the study methodology have been published elsewhere.7 A cohort of women with BMI 30.0–45.0 kg/m² was also recruited but is not detailed in the original report.7 A separate report comparing the obese cohort with the nonobese cohort is in development. That comparison is not part of the current study of gestational age assessment.
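The first-trimester dating criterion can be expressed as a small check. This is a hypothetical helper for illustration, not study software. It assumes that the ultrasound-based gestational age determines the tolerance window and that a discrepancy exactly equal to the tolerance is acceptable.

```python
def within_dating_tolerance(us_ga_days: int, lmp_ga_days: int) -> bool:
    """Hypothetical check of the dating rule: the ultrasound (US) estimate must
    be 8 0/7 to 13 6/7 weeks and agree with the LMP-based gestational age
    within a window-specific tolerance. All ages are in days.
    Assumption: the window is selected by the US estimate."""
    if not 8 * 7 <= us_ga_days <= 13 * 7 + 6:
        return False                  # outside 8 0/7 to 13 6/7 weeks
    if us_ga_days <= 10 * 7 + 6:      # 8 0/7 to 10 6/7 weeks
        tolerance = 5
    elif us_ga_days <= 12 * 7 + 6:    # 11 0/7 to 12 6/7 weeks
        tolerance = 6
    else:                             # 13 0/7 to 13 6/7 weeks
        tolerance = 7
    return abs(us_ga_days - lmp_ga_days) <= tolerance


# 9 0/7 weeks by ultrasound (63 days) versus 68 days by LMP: within 5 days.
print(within_dating_tolerance(63, 68))
```

In this sketch, 12 0/7 weeks by ultrasound (84 days) with a 7-day discrepancy would fail, because that window allows only 6 days.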
The exclusion criteria in the NICHD Fetal Growth Studies-Singletons were extensive. Women were excluded for any of the following: multifetal gestation in the current pregnancy; history of a preterm, low birth weight (less than 2,500 g), or macrosomic (greater than 4,000 g) neonate; history of stillbirth or neonatal death; medically assisted conception; cigarette smoking in the past 6 months or illicit drug use in the past 12 months; one or more alcoholic drinks daily; previous fetal congenital malformation; history of noncommunicable diseases (asthma requiring weekly medication, autoimmune disorders, cancer, diabetes mellitus, epilepsy or seizures requiring medication, hematologic disorders, hypertension, psychiatric disorders, renal disease, thyroid disease); or history of gravid diseases (gestational diabetes, severe preeclampsia or eclampsia, or hemolysis, elevated liver enzymes, low platelet count syndrome). Women who participated but had complications in the current pregnancy or adverse neonatal outcomes were excluded post hoc in the original report.7 Sample size was determined so that the 5th and 95th percentiles for each gestational week for each racial or ethnic group could be precisely estimated over gestation. The calculation assumed 10% attrition and a 30% post hoc exclusion rate for women with pregnancy or neonatal complications. We estimated needing 600 women in each group to retain 320 women after post hoc exclusions. Our high completion rate and lower than expected exclusion rate resulted in larger cohorts. Women were recruited from 12 participating U.S. clinical sites from July 2009 through January 2013.
After an ultrasonogram at 8 0/7 weeks to 13 6/7 weeks of gestation, women underwent serial ultrasound assessment of fetal biometry at five targeted visits, approximately once every 4 to 8 weeks. Each woman followed one of four randomly assigned ultrasound schedules, so that each gestational week from 14 to 40 was represented without exposing individual women to weekly ultrasonography. All ultrasound examinations were performed using Voluson E8 machines. All study ultrasonographers underwent training and credentialing before the study, and their measurement techniques were subject to rigorous quality assurance.9 To avoid measurement bias, ultrasonographers were blinded to the project gestational age, to the results of all prior scans, and to the gestational age of each measurement calculated in real time on the ultrasound machine. Institutional review board approval was obtained for the NICHD, all participating clinical institutions, and the data and imaging coordinating centers. Women were enrolled from four racial–ethnic groups: 611 non-Hispanic black, 649 Hispanic, 614 non-Hispanic white, and 460 Asian or Pacific Islander women. Protocol completion rates were 92%, 93%, 93%, and 90%, respectively.
Using either a transabdominal or transvaginal approach, biparietal diameter was measured at the level of the thalami and cavum septi pellucidi or the cerebral peduncles. It was taken as the linear distance from the outer edge of the proximal skull to the inner edge of the distal skull. Head circumference was measured at the same level, often on the same images, using the ellipse function around the outer perimeter of the skull. Abdominal circumference was measured using the ellipse function circumscribing the actual or projected skin line in the transverse plane at the level of the stomach and the junction of the umbilical vein and portal sinus. Femur length was measured as the linear distance between the midpoints of each end of the calcified femoral diaphysis.
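The ellipse function reports the perimeter of an ellipse fitted to the skull or skin line. Ramanujan's approximation is a standard way to compute such a perimeter from the two orthogonal diameters. Whether the Voluson E8 uses exactly this formula is an assumption and is not stated in this report. The helper name `ellipse_circumference` is hypothetical, and inputs are assumed to be in millimeters.

```python
import math


def ellipse_circumference(d1_mm: float, d2_mm: float) -> float:
    """Approximate ellipse perimeter from two orthogonal diameters using
    Ramanujan's first approximation:
        pi * (3(a + b) - sqrt((3a + b)(a + 3b))), with a, b the semi-axes.
    Assumption: this is illustrative, not the vendor's documented method."""
    a, b = d1_mm / 2.0, d2_mm / 2.0
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))


# For a circle of diameter 100 mm, this reduces to 100 * pi.
print(ellipse_circumference(100, 100))
print(ellipse_circumference(80, 60))
```

When the two diameters are equal, the formula reduces exactly to the circle perimeter, which provides a simple sanity check.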
The nonobese cohort was combined with the obese cohort for the current study of gestational age assessment. A total of 468 obese women were included: 192 non-Hispanic black, 132 Hispanic, 137 non-Hispanic white, and 7 Asian or Pacific Islander. The obese cohort had the same eligibility criteria plus the following additional exclusion criteria: autoimmune diseases, cancer, chronic hypertension requiring two or more medications, chronic renal disease, diabetes before pregnancy, human immunodeficiency virus, or psychiatric disorders. Overall, the exclusion criteria for the obese cohort differ from, and are less extensive than, those of the nonobese singleton cohort. These criteria were added because major chronic conditions such as hypertension and diabetes disproportionately affect obese women. Such conditions make it difficult to determine whether elevated risks of large for gestational age or macrosomia (effects on fetal growth) result from the complications themselves or from excess maternal adiposity. Because the obese cohort was free of these major pre-existing chronic conditions, morbidity can be better distinguished from obesity-related fetal effects. This will be detailed in a separate report. All participating women from the nonobese and obese cohorts were included in this analysis of gestational age estimation. Unlike the original fetal growth study report,7 this analysis made no post hoc exclusions.
Gestational age estimation models were developed using backward elimination regression. The starting model contained all biometric measurements (biparietal diameter, head circumference, abdominal circumference, and femur length), with first-order, quadratic, and interaction terms. The least significant term was then removed at each step until only terms significant at the .05 level remained. We used linear mixed models to account for the intrapatient correlation associated with repeated measurements. The final chosen model (hereafter the NICHD model) was fit using all patients in the database and all gestational ages from 14 to 40 weeks. This model was validated by repeated cross-validation, in which the model was developed on a random 50% of the data set and validated on the remaining 50%. For each formula, a single randomly chosen scan per patient was used, with the model fit on half of the population and tested on the other half. We repeated this 50% training–50% test procedure 1,000 times. Using the coefficients of this validated model, we derived a formula for estimating gestational age from the statistically significant biometric terms (hereafter the NICHD formula).
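The backward elimination step can be sketched as follows. This is a simplified illustration and not the authors' SAS or R code. It fits ordinary least squares rather than a linear mixed model, which implicitly assumes one scan per patient. It also computes p-values from a normal approximation to the t distribution. All function and term names are hypothetical.

```python
import itertools
import math

import numpy as np

BASE_TERMS = ["BPD", "HC", "AC", "FL"]


def candidate_terms(data):
    """Build first-order, quadratic, and pairwise interaction columns (14 terms)."""
    terms = {name: np.asarray(data[name], dtype=float) for name in BASE_TERMS}
    for name in BASE_TERMS:
        terms[name + "^2"] = terms[name] ** 2
    for a, b in itertools.combinations(BASE_TERMS, 2):
        terms[a + "x" + b] = terms[a] * terms[b]
    return terms


def ols_pvalues(X, y):
    """OLS coefficients and two-sided p-values (normal approximation)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.pinv(X.T @ X)))
    pvals = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in beta / se])
    return beta, pvals


def backward_eliminate(data, y, alpha=0.05):
    """Drop the least significant term until every remaining p-value <= alpha.
    Returns the kept term names and coefficients [intercept, *terms]."""
    y = np.asarray(y, dtype=float)
    terms = candidate_terms(data)
    names = list(terms)
    while names:
        X = np.column_stack([np.ones(len(y))] + [terms[t] for t in names])
        beta, pvals = ols_pvalues(X, y)
        worst = int(np.argmax(pvals[1:]))  # the intercept is never dropped
        if pvals[1 + worst] <= alpha:
            return names, beta
        names.pop(worst)
    return names, np.array([y.mean()])
```

For large samples such as this cohort, the normal approximation is close to the t distribution. A faithful reimplementation would instead refit a random-intercept mixed model at each step.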
The estimation error was defined as the mean squared difference between estimated and observed (project) gestational age among test set samples. It was used to assess formula performance in several ways. First, we compared the NICHD formula's SD of estimation in the second trimester with that in the third trimester. Second, we compared the NICHD formula's SD of estimation with that of the 1984 Hadlock formula, which is in common use today.3 Finally, using the same statistical techniques, we developed racial and ethnic-specific formulas for each of the four racial–ethnic groups and compared their SDs of estimation with that of the NICHD formula. The SDs were multiplied by 1.96 to obtain the estimation error. The resulting value means that, in 95% of patients, the difference between the formula's estimated gestational age and the project gestational age lies within plus or minus this value (in gestational weeks or days).
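The repeated split-half validation and the 1.96 × SD error can be sketched together. This sketch makes several assumptions. Ordinary least squares on a generic design matrix `X` stands in for the study's mixed model. Averaging the 1.96 × SD values over repeats is also an assumption, because the text does not say how results from the 1,000 repeats were summarized. Function names are hypothetical.

```python
import numpy as np


def one_scan_per_patient(patient_ids, rng):
    """Randomly pick one row index for each patient."""
    patient_ids = np.asarray(patient_ids)
    chosen = []
    for pid in np.unique(patient_ids):
        rows = np.flatnonzero(patient_ids == pid)
        chosen.append(rng.choice(rows))
    return np.array(chosen)


def repeated_split_half_error(X, y_weeks, patient_ids, n_repeats=1000, seed=0):
    """Mean over repeats of 1.96 * SD(estimated - observed), in days.
    X is a 2-D array of predictors; y_weeks is project gestational age."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_repeats):
        rows = one_scan_per_patient(patient_ids, rng)
        rng.shuffle(rows)                 # random 50/50 split of patients
        half = len(rows) // 2
        train, test = rows[:half], rows[half:]
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A_train, y_weeks[train], rcond=None)
        A_test = np.column_stack([np.ones(len(test)), X[test]])
        diff_days = (A_test @ beta - y_weeks[test]) * 7
        errors.append(1.96 * np.std(diff_days, ddof=1))
    return float(np.mean(errors))
```

Each patient contributes one scan per repeat, so splitting the selected rows also splits the patients. As a result, no patient appears in both the training and test halves.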
Prior literature has treated three narrower gestational age windows separately because the accuracy of estimation varies among them.1,3,4,10–12 Therefore, we also analyzed differences in the SD of estimation error among gestational age windows (14–20, 21–27, and 28–40 weeks).
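Stratifying the error by window follows directly from the 1.96 × SD definition. This minimal sketch assumes that a scan belongs to a window according to its completed project gestational week, so 20 6/7 weeks counts toward 14–20 weeks. The function name is hypothetical.

```python
import numpy as np

# Gestational age windows (weeks) analyzed separately in the text.
WINDOWS = [(14, 20), (21, 27), (28, 40)]


def error_by_window(project_ga_weeks, estimated_ga_weeks):
    """Return 1.96 * SD of (estimated - project), in days, for each window.
    A scan is in window (lo, hi) if lo <= project GA < hi + 1 (assumption).
    Windows with fewer than two scans are omitted."""
    project = np.asarray(project_ga_weeks, dtype=float)
    diff_days = (np.asarray(estimated_ga_weeks, dtype=float) - project) * 7
    out = {}
    for lo, hi in WINDOWS:
        mask = (project >= lo) & (project < hi + 1)
        if mask.sum() >= 2:
            out[(lo, hi)] = 1.96 * float(np.std(diff_days[mask], ddof=1))
    return out
```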
The clinical effect of more accurate gestational dating was assessed separately for each of the three gestational age windows studied (14–20, 21–27, and 28–40 weeks). Within each window, we compared the number of women whose estimated gestational age would fall outside a prescribed range of error under each formula. The designated “acceptable” range of error is based on prior literature1: ±7 days at 14–20 weeks, ±10 days at 21–27 weeks, and ±14 days at 28–40 weeks. We used generalized estimating equations to account for multiple measurements per individual and to compare the rates of estimated gestational ages outside the prescribed ranges for the Hadlock and NICHD formulas. This allowed us to assess whether the two formulas differed significantly. All analyses were implemented using SAS 9.4 or R 3.1.2.
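The descriptive part of this comparison, the share of scans outside the acceptable range in each window, can be sketched as follows. This sketch does not implement the generalized estimating equation test used in the study. It also assumes that an error exactly equal to the limit counts as acceptable and that windows are assigned by completed project gestational week. The function name is hypothetical.

```python
import numpy as np

# Acceptable error (days) for each gestational age window, from the text.
ACCEPTABLE_ERROR_DAYS = {(14, 20): 7, (21, 27): 10, (28, 40): 14}


def proportion_outside(project_ga_weeks, estimated_ga_weeks):
    """For each window, return the share of scans whose estimate differs from
    the project gestational age by more than the acceptable error.
    A scan is in window (lo, hi) if lo <= project GA < hi + 1 (assumption)."""
    project = np.asarray(project_ga_weeks, dtype=float)
    err_days = np.abs(np.asarray(estimated_ga_weeks, dtype=float) - project) * 7
    shares = {}
    for (lo, hi), tol in ACCEPTABLE_ERROR_DAYS.items():
        mask = (project >= lo) & (project < hi + 1)
        if mask.any():
            shares[(lo, hi)] = float(np.mean(err_days[mask] > tol))
    return shares
```

Applying the function once to Hadlock estimates and once to NICHD estimates of the same scans gives the two sets of proportions to be compared.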
RESULTS
The study population included 2,802 women: 803 non-Hispanic black (28.6%), 781 Hispanic (27.9%), 751 non-Hispanic white (26.8%), and 467 Asian (16.7%). Table 1 presents demographic data for the population. Table 2 presents the selected fetal anthropometric parameters and their estimated regression coefficients obtained from model selection. The r² for the new NICHD formula was 0.975. Concordance among study site ultrasonographers exceeded r=0.99 for all measurements, indicating close agreement among study sites. The equation is: gestational age (weeks) = 10.6 − 0.168×BPD + 0.045×HC + 0.03×AC + 0.058×FL + 0.002×BPD² + 0.002×FL² + 0.0005×(BPD×AC) − 0.005×(BPD×FL) − 0.0002×(HC×AC) + 0.0008×(HC×FL) + 0.0005×(AC×FL), in which BPD=biparietal diameter, HC=head circumference, AC=abdominal circumference, and FL=femur length.
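For illustration, the printed equation can be evaluated directly. This sketch assumes the measurements are entered in millimeters, since units are not stated in this section. The coefficients are printed with few significant digits, and large products such as HC×AC amplify rounding, so results from this version may differ noticeably from the full-precision model. The sketch is not intended for clinical use. The function name and example inputs are hypothetical.

```python
def nichd_ga_weeks(bpd: float, hc: float, ac: float, fl: float) -> float:
    """Gestational age (weeks) from the NICHD equation as printed, using
    rounded coefficients. Inputs are assumed to be in millimeters."""
    return (
        10.6
        - 0.168 * bpd
        + 0.045 * hc
        + 0.03 * ac
        + 0.058 * fl
        + 0.002 * bpd ** 2
        + 0.002 * fl ** 2
        + 0.0005 * bpd * ac
        - 0.005 * bpd * fl
        - 0.0002 * hc * ac
        + 0.0008 * hc * fl
        + 0.0005 * ac * fl
    )


# Hypothetical measurements (mm): BPD 47, HC 175, AC 150, FL 32.
print(round(nichd_ga_weeks(47, 175, 150, 32), 2))  # about 21.04
```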
Figure 1 demonstrates the observed gestational age based on the LMP (confirmed by first-trimester ultrasonogram) plotted against the gestational age estimated by the NICHD formula applied to our population for gestational weeks 14–40. The figure shows that the accuracy of estimation of the formula is higher early in gestation and decreases with advancing gestation.
Figure 2 compares the gestational ages estimated by the NICHD formula with those estimated by the Hadlock formula. Early in gestation the two approaches give similar estimates, and the differences between them increase with advancing gestation.
Table 3 presents the estimation error of the NICHD formula for different gestational age windows. Estimation of gestational age is more precise at 14–20 weeks than at 21–27 weeks, and accuracy is lower still at 28–40 weeks. This pattern indicates more accurate estimation earlier in gestation. The NICHD formula performed similarly to the Hadlock formula at 14–20 weeks of gestation but was significantly more accurate after 20 weeks. The accuracy of the NICHD formula is ±7 days from 14 to 20 weeks, ±10 days from 21 to 27 weeks, and ±17 days from 28 to 40 weeks of gestation. The new formula outperformed the 1984 formula, with estimation errors of 10.4 compared with 11.2 days at 21–27 weeks and 17.0 compared with 19.8 days at 28–40 weeks of gestation. Table 3 also compares racial and ethnic-specific formulas with the NICHD formula applied to each racial–ethnic group. Results were similar overall (gestational weeks 14–40) and within each narrower gestational age window (14–20, 21–27, and 28–40 weeks).
Figure 3 and Table 4 present the estimation error for each gestational week when the NICHD and Hadlock formulas are applied to our population. These week-specific errors can be used to establish gestational age when the LMP and the ultrasound biometry differ by more than the estimated error range. Results are better with the NICHD formula than with the Hadlock formula after 35 weeks of gestation. Before 35 weeks, the two formulas are very similar.
Table 5 presents an example of the effect of more accurate gestational dating using the NICHD formula compared with the Hadlock formula. There is a significantly smaller proportion of patients outside the prescribed range of acceptable error using the NICHD formula at 21–27 weeks and 28–40 weeks of gestation.
DISCUSSION
We developed a model for gestational age estimation and compared this new formula with a commonly used formula (Hadlock 1984).3 The new formula improved estimation in the second and third trimesters. We also used a robust method to evaluate racial and ethnic-specific formulas and found that they are not superior to the new formula.
As in prior literature, gestational age estimation with the NICHD formula is better from 14 to 20 weeks of gestation than in later windows, consistent with less biological variation earlier in gestation.2–4 The accuracy of the NICHD formula is ±7 days from 14 to 20 weeks, ±10 days from 21 to 27 weeks, and ±17 days from 28 to 40 weeks of gestation. We also provide week-specific errors (Table 4), which can be used to establish gestational age when the LMP and ultrasound biometry differ by more than the estimated error range. The large population also supports a recommendation to use ultrasonography to establish gestational age when fetal biometry at 14–15 weeks of gestation differs from the LMP by 8 days or more. This type of recommendation has previously been lacking in the literature.13
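The redating rule can be written as a simple comparison against a threshold. This is a hypothetical helper for illustration. The 8-day threshold applies only at 14–15 weeks, as stated above. For other weeks, the threshold would have to come from the week-specific errors in Table 4, which are not reproduced in this section, so it is passed in as a parameter.

```python
from typing import Tuple


def choose_dating(lmp_ga_days: int, us_ga_days: int,
                  threshold_days: int) -> Tuple[str, int]:
    """Keep LMP dating unless the ultrasound estimate differs from it by
    threshold_days or more, in which case redate by ultrasound.
    Ages are in days; threshold_days is supplied by the caller."""
    if abs(us_ga_days - lmp_ga_days) >= threshold_days:
        return ("ultrasound", us_ga_days)
    return ("LMP", lmp_ga_days)


# 100 days is 14 2/7 weeks, where the text gives an 8-day threshold.
print(choose_dating(lmp_ga_days=100, us_ga_days=108, threshold_days=8))
print(choose_dating(lmp_ga_days=100, us_ga_days=107, threshold_days=8))
```

The "8 days or more" wording translates to `>=` in this sketch, so a discrepancy of exactly 8 days triggers redating.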
We believe several factors are responsible for the observed improvements. The rigorous credentialing process for ultrasonographers, together with masking of gestational age at each examination, provided an unbiased method for obtaining each measurement and minimized random and systematic error.4,14 The ongoing quality improvement program also sustained use of a consistent measurement method.9,15 Modern ultrasound equipment with better image quality may also have improved accuracy. Improving image quality in all clinical situations (eg, previous abdominal scar, increased BMI) has been a focus of ultrasound vendors for many years.16–18
The results in Table 5 demonstrate the clinical importance of more accurate estimation. The SDs of estimation error for the two formulas cannot be compared using inferential statistics, so P values cannot be reported. We therefore created Table 5 to demonstrate that the clinical effect of improved estimation with the NICHD formula is significant.19 These findings are particularly important for women who present late to prenatal care, when ultrasonography has a wider range of error. Currently, 5–15% of women fall outside the prescribed range of acceptable error. Using the NICHD formula, 2–5% of women would have more accurate dating, potentially preventing interventions for preterm labor or postterm pregnancy (Table 5). Up to 25% of pregnant women in the United States undergo induction of labor,20 so there is considerable potential to prevent morbidity from unnecessary interventions. The error range in the third trimester (±17 days) indicates that caution in gestational dating is necessary late in gestation.
We believe that racial and ethnic-specific formulas performed similarly to a racial and ethnic-neutral formula because the population from which our formula was developed had adequate racial–ethnic diversity.21–23 In addition, this finding simplifies workflow in ultrasound units, because a single formula for gestational age estimation is accurate for all races and ethnicities.
The NICHD Fetal Growth Studies previously showed significant differences in many fetal measurements among racial–ethnic groups.7 The current study found no difference among racial–ethnic groups in gestational age estimation. This apparent discrepancy may be explained by the fact that the two analyses reflect different biological processes. The parent study assessed growth trajectories, whereas the current study assessed the relationship between cross-sectional measures and gestational age.
Our study has several strengths. We included a large number of women from multiple racial–ethnic groups at many centers. We controlled for the bias inherent in using repeated measurements by using linear mixed regression models. We addressed the problem of developing and testing a predictive model in the same population by using cross-validation with 1,000 iterations of a 50% training–50% test procedure. The richness of our data suggests that our results will be more generally applicable to the U.S. obstetric population than those of other currently available formulas.
Our study has limitations. Our study population did not have a known date of conception, as with in vitro fertilization, that would allow us to pinpoint the “true” gestational age. Even with a known date of conception, however, ultrasonography has an unavoidable range of error, which is similar to the range of error seen in our study.4 In addition, the NICHD and Hadlock formulas were subject to the same estimation errors because both were compared with the same project gestational age. Any error in the project gestational age therefore affects both formulas equally and does not favor either one. Thus, compared with a commonly used formula, the NICHD formula is associated with significantly smaller error ranges when gestational age is estimated beyond 20 weeks. We cannot comment on gestational age assessment in pregnancies conceived by assisted reproductive technologies because they were not included in our cohort.
Using the NICHD formula has the potential to decrease unnecessary obstetric interventions with associated decreases in morbidity and cost savings. Given the large number of annual births in the United States and the significant rate of interventions, these improvements could have important implications at a population level.
1. Methods for estimating the due date. Committee Opinion No. 700. American College of Obstetricians and Gynecologists. Obstet Gynecol 2017;129:e150–4.
2. Harrison RF, Roberts AP, Campbell S. A critical evaluation of tests used to assess gestational age. Br J Obstet Gynaecol 1977;84:98–107.
3. Hadlock FP, Deter RL, Harrist RB, Park SK. Estimating fetal age: computer-assisted analysis of multiple fetal growth parameters. Radiology 1984;152:497–501.
4. Chervenak FA, Skupski DW, Romero R, Myers MK, Smith-Levitin M, Rosenwaks Z, et al. How accurate is fetal biometry in the assessment of fetal age? Am J Obstet Gynecol 1998;178:678–87.
5. Morales-Roselló J, Hervás-Marin D, Stirrup O, Perales-Marín A, Khalil A. International standards for fetal growth: relevance of advances in ultrasound technology. Ultrasound Obstet Gynecol 2015;46:631–2.
6. Papageorghiou AT, Ohuma EO, Altman DG, Todros T, Cheikh Ismail L, Lambert A, et al. International standards for fetal growth based on serial ultrasound measurements: the Fetal Growth Longitudinal Study of the INTERGROWTH-21st Project. Lancet 2014;384:869–79.
7. Buck-Louis G, Grewal J, Albert PS, Sciscione A, Wing DA, Grobman WA, et al. Racial/ethnic standards for fetal growth: the NICHD Fetal Growth Studies. Am J Obstet Gynecol 2015;213:449.e1–41.
8. Grantz KL, Grewal J, Albert PS, Wapner R, D'Alton ME, Sciscione A, et al. Dichorionic twin trajectories: the NICHD Fetal Growth Studies. Am J Obstet Gynecol 2016;215:221.e1–16.
9. Hediger ML, Fuchs KM, Grantz KL, Grewal J, Kim S, Gore-Langton RE, et al. Ultrasound quality assurance for singletons in the National Institute for Child Health and Human Development Fetal Growth Studies. J Ultrasound Med 2016;35:1725–33.
10. Sabbagha RE, Tamura RK, Socol ML. The use of ultrasound in obstetrics. Clin Obstet Gynecol 1982;25:735–52.
11. Romero R, Quintero R, Brekus C. Assessment of gestational age. In: Divon M, editor. Abnormal fetal growth. New York (NY): Elsevier Science; 1991.
12. Kalish RB, Thaler HT, Chasen ST, Gupta M, Berman SJ, Rosenwaks Z, et al. First- and second-trimester ultrasound assessment of gestational age. Am J Obstet Gynecol 2004;191:975–8.
13. Spong CY. Redefining ‘term’ pregnancy: recommendations from the defining ‘term’ pregnancy workgroup. JAMA 2013;309:2445–6.
14. Perni SC, Chervenak FA, Kalish RB, Magherini-Rothe S, Predanic M, Streltzoff J, et al. Intraobserver and interobserver reproducibility of fetal biometry. Ultrasound Obstet Gynecol 2004;24:654–8.
15. Boamah EA, Asante K, Ae-Ngibise K, Kinney PL, Jack DW, Manu G, et al. Gestational age assessment in the Ghana Randomized Air Pollution and Health Study (GRAPHS): ultrasound capacity building, fetal biometry protocol development, and ongoing quality control. JMIR Res Protoc 2014;3:e77.
16. Blackwell R. New developments in equipment. Clin Obstet Gynaecol 1983;10:371–94.
17. Malone FD, Athanassiou A, Nores J, D'Alton ME. Effect of ISDN bandwidth on image quality for telemedicine transmission of obstetric ultrasonography. Telemed J 1998;4:161–5.
18. D'Alton ME, Cleary-Goldman J. Education and quality review for nuchal translucency ultrasound. Semin Perinatol 2005;29:380–5.
19. Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986;73:13–22.
20. Chauhan SP, Ananth CV. Induction of labor in the United States: a critical appraisal of appropriateness and reducibility. Semin Perinatol 2012;36:336–43.
21. Schofield LS. Correcting for measurement error in latent variables used as predictors. Ann Appl Stat 2015;9:2133–52.
22. Al-Gindan YY, Hankey CR, Govan L, Gallagher D, Heymsfield SB, Lean ME. Derivation and validation of simple anthropometric equations to predict adipose tissue mass and total fat mass with MRI as the reference method. Br J Nutr 2015;114:1852–67.
23. Al-Gindan YY, Hankey C, Govan L, Gallagher D, Heymsfield SB, Lean ME. Derivation and validation of simple equations to predict total muscle mass from simple anthropometric and demographic data. Am J Clin Nutr 2014;100:1041–51.