As an indicator of growth rate, bone age is used to determine the level of development and to predict chronological age when this is not known. The repeatability of the measurement (intra- and inter-observer) and the systematic error inherent to Bone Age (BA) estimation using the Greulich & Pyle Atlas (GPA) method have been estimated using a sample from the Spanish population. The inter-observer Lin’s concordance correlation coefficient was ρc=0.98, with a value of 0.05±0.52 for girls and 0.06±0.44 for boys (difference between observations in years), whereas the intra-observer Lin’s concordance correlation coefficient was ρc=0.99 (0.05±0.27 for girls and 0.10±0.36 for boys). The mean difference between the bone and chronological ages was 0.51±1.13 years in girls and 0.32±1.11 years in boys. Given the systematic errors with regard to the reference population, we propose the adjustment required to apply the ages calculated according to the atlas and indicate the limitations inherent to predicting chronological age when only information from the bones in the hand and wrist is used.
The technological, social and political upheavals of the past few years have modified lifestyles, revolutionised economies and caused the downfall of numerous companies and the emergence of many others in light of the new business opportunities created. The impetus of the so-called “Information Society” and the “Knowledge Society” is producing a globalisation of economies and markets and has intensified migratory flows in search of new opportunities. In addition, the regional instability caused by armed conflicts has caused, and is currently causing, many people to seek refuge in more stable settings [1,2]. These migratory flows have created an increasing need to estimate the chronological age of people lacking official papers or checkable documentation [3,4]. The relevance in a legal setting is that age determines the possibilities for repatriation, the scope for detention, the bodies responsible for attending asylum requests and the administrative, civil and penal procedures applicable in such situations. In addition, in the event of death, the effective lack of identification of the body has legal, civil and economic consequences for both the deceased and their family. Efforts in the ethical and health fields are directed at caring for the victims of trafficking en route and at the destination and children and adolescents living under precarious conditions [5-8].
In the absence of reliable information, the chronological age has been related to the degree of somatic development and maturity of individuals (biological age) and, specifically, bone age, dental age, morphological age (growth age) and degree of sexual maturity [9,10]. In general, the bone age is evaluated on the basis of a radiographic examination of the bones, with the (left) hand and wrist being the most widely used. Bone maturity is determined from the degree of ossification of the bones in the wrist (carpus) and the development and degree of fusion of the metacarpal bones, phalanges and distal epiphyses of the ulna and radius [11-15]. Using an Orthopantomogram (OPG) and intraoral dental X-rays, dental age evaluates the eruption status of teeth (number and groups of teeth that emerge in the oral cavity [16,17]) and the mineralisation status of the dental crowns and roots [18,19]. Morphological age is determined on the basis of physical characteristics such as height, weight and general body shape and a comparison of the results with growth curves [9,20]. The degree of sexual maturity refers to the developmental status achieved by the secondary sex characteristics and the onset of menarche in girls. Sexual maturity is also related to the general body growth acceleration observed during puberty. A combination of these components (physical examination, dental examination and bone examination) is used to achieve greater accuracy and precision when estimating the chronological age. Given the correlations between these components and the suitability and possibilities offered by the different methods [21,22], the “Study Group on Forensic Age Diagnostics” (http://www.charite.de/rechtsmedizin/agfad/index.htm) has established a procedure for estimating chronological age in a forensic setting.
Unusually, the protocol proposes evaluating the degree of fusion of the proximal epiphysis of the collar bone by X-ray or CT scan to determine whether an individual is older than 21 years [23-25].
A large number of methods for evaluating bone maturity (bone age) have been proposed in the literature. The Greulich and Pyle Atlas (GPA) is currently the most widely used such method due to the simplicity of its application, its easy accessibility, its reasonable predictive ability (the hand and wrist present numerous ossification sites) and the limited effect of radiation in this region (0.0001 to 0.1 mSv per exposure). The GPA contains a series of standard X-rays of the left hand/wrist representative of bone maturation for each age group and sex (0 to 18 years of age in girls and 0 to 19 years in boys) along with maturity indicators for the bones in the hand and the distal epiphyses of the ulna and radius. To define these references, the authors used a sample population comprising 1000 boys and girls of high social class (with no nutritional problems or diseases that could affect their growth) born in Cleveland, Ohio (USA), between 1931 and 1942. The method of Tanner and Whitehouse 3 (TW3-RUS), which is based on European standards and descendants of Europeans, is also used and has been automated using image-processing techniques. TW3 is a quantitative alternative based on skeletal maturity scores for the bones in the hand and wrist.
Application of the GPA for determining bone age requires the standards to be fitted to each population, as genetic, environmental and socioeconomic factors can all influence the degree of bone maturity and, to a large extent, explain the systematic error encountered when the atlas is applied to a population other than the reference population [9,27]. The application and adjustment factors (systematic error) have been described for a large number of populations: Central Europe; Caucasians; Malawi; Taiwan; Morocco; Turkey; Italy; South Africa; Pakistan; Asian, African/American, Caucasian and Hispanic; and French.
In the paediatric field, an estimation of bone age is of interest as an indicator of growth rate disorders. An ability to predict possible growth delays allows progress to be made in preventive and, if required, corrective therapy. In this regard, the information provided by the bone age is usually complemented by that provided by morphological growth indicators (in absolute or relative terms). In addition, paediatric monitoring allows the evolution of these components with time to be observed and, as a result, one-off estimates to be replaced by the estimates provided by time series [39,40].
Given the above, the interest of the present study lies in evaluating the bone age and using this estimation to predict chronological age when this is not known. To ensure the reliability of such measurements, we set the following objectives: 1) to evaluate the measurement repeatability (inter- and intra-observer error) when determining bone age using the GPA method and ensure that the error is admissible; and 2) to evaluate the systematic error inherent to the estimation of bone age using the GPA method when applied to the Spanish population (effect of differential growth with respect to the reference American population) and to propose correction factors (correction of ages in the X-rays found in the atlas).
The database studied was created from X-rays of the left hand and wrist corresponding to 489 individuals of Spanish nationality (244 girls and 245 boys), ranging from 10 to 18 years of age, seen at the Hospital Sant Joan de Déu, the University Hospital of the University of Barcelona, which specialises in paediatrics, gynaecology and obstetrics. A total of 60 individuals (30 girls and 30 boys) were selected at random for each age from 10 to 17 years, and 4 girls and 5 boys for the 18 year age group, which is poorly represented at this institution. All cases studied were previously anonymised in accordance with Spanish legislation. To avoid any effect caused by atypical values, individuals who presented developmental anomalies or fractures were not considered, and X-rays in which the bones appeared distorted (poor quality image or incorrect projection) were also discarded. The chronological age was determined as the difference between the date the X-ray was taken and the date of birth (month and year) in all cases. All X-rays selected in this study were taken according to the established protocol: the patient is seated at the edge of the radiology stretcher, resting the hand to be examined on the Radiography (RDI) chassis. To obtain an Anteroposterior (AOP) projection of the hand, patients extend their fingers, which are slightly separated and, as for the carpus and metacarpus, in intimate contact with the plate. The imaging technician then places the X-ray beam above the third metacarpus to record the X-ray. The bone age of individuals was estimated by two observers with an understanding of human osteological variation, with experience in interpreting radiographs and with the support of radiologists and the research team.
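As a small illustration of the chronological age calculation described above, the following Python sketch computes a decimal age from the X-ray date and a date of birth known only to the month and year. The function name `chronological_age_years` and the example dates are hypothetical, and counting whole months is an assumption about how the difference was taken, not a description of the authors' procedure.

```python
from datetime import date


def chronological_age_years(xray_date: date, birth_year: int, birth_month: int) -> float:
    """Decimal age in years from the X-ray date and a month/year date of birth.

    Assumption: the difference is counted in whole calendar months,
    since only the month and year of birth are available.
    """
    months = (xray_date.year - birth_year) * 12 + (xray_date.month - birth_month)
    if months < 0:
        raise ValueError("X-ray date precedes date of birth")
    return months / 12


# Hypothetical example: X-ray taken in June 2015, birth in March 2003.
print(chronological_age_years(date(2015, 6, 15), 2003, 3))
```

With month-level resolution the age carries an uncertainty of up to one month, which is small compared with the variability in bone age discussed later.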
Greulich and Pyle method
The method originally proposed and developed by Todd (1937), and subsequently extended by his pupils Greulich and Pyle (1959), is a qualitative method for determining the maturity of children and adolescents. Using X-rays of the left hand and wrist, in a flat position and with a posterior view, the bone maturity is determined from the degree of mineralisation of the bones in the wrist (carpus), the development of the metacarpal bones and phalanges and the degree of fusion of the distal epiphyses in the ulna and radius. In practical terms, the ordered reading of the bones that appear in the X-ray was performed to determine the presence or absence of certain carpals, assess the degree of ossification of the epiphysis, establish the shape and size of the bones and estimate the degree of fusion of the epiphyses and their respective diaphyses. In light of this information, and to avoid confusion when irregularities in the order of appearance of the bones occur, age was determined on the basis of greater similarity with the standard in the atlas.
The reliability of the measurement is related to the repeatability (inter- and intra-observer) and the systematic error in applying the method. Lin’s concordance correlation coefficient was calculated to assess the repeatability of the measurement when the GPA method is applied. The value of this coefficient was calculated for girls and boys:
$$\rho_c = \frac{2\, s_{y_1 y_2}}{s^2_{y_1} + s^2_{y_2} + (\bar{y}_1 - \bar{y}_2)^2}, \qquad s_{y_1 y_2} = \frac{1}{n}\sum_{i=1}^{n} (y_{1i} - \bar{y}_1)(y_{2i} - \bar{y}_2)$$
where $n$ is the sample size, $y_{1i}$ is the first set of measurements (first observer or first replicate), $y_{2i}$ the second set of measurements (second observer or second replicate), and $\bar{y}_1$, $s^2_{y_1}$ and $\bar{y}_2$, $s^2_{y_2}$ are the mean and variance of the first and second sets of measurements. The mean difference between observations (inter- and intra-observer) and the standard deviation were also calculated. The fit of the inter-observer difference (estimation of bone age) to the normal distribution was tested using the Shapiro-Wilk W statistic.
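The concordance calculation can be sketched in Python as follows. This is a minimal illustration that assumes the usual definition of Lin’s coefficient with 1/n moments; the function name `lin_ccc` and the bone-age readings are hypothetical and are not taken from the study data.

```python
from statistics import fmean


def lin_ccc(y1, y2):
    """Lin's concordance correlation coefficient for two paired series.

    rho_c = 2*s12 / (s1 + s2 + (mean1 - mean2)**2),
    where s1, s2 are variances and s12 the covariance (all with 1/n).
    """
    if len(y1) != len(y2) or len(y1) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    n = len(y1)
    m1, m2 = fmean(y1), fmean(y2)
    s1 = sum((a - m1) ** 2 for a in y1) / n
    s2 = sum((b - m2) ** 2 for b in y2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / n
    return 2 * s12 / (s1 + s2 + (m1 - m2) ** 2)


# Hypothetical bone-age readings (years) by two observers on the same X-rays.
observer_1 = [10.0, 11.0, 12.5, 13.0, 15.0, 16.0]
observer_2 = [10.0, 11.5, 12.0, 13.0, 15.0, 16.5]
print(round(lin_ccc(observer_1, observer_2), 4))
```

Unlike the Pearson correlation, the coefficient is penalised by the squared difference between the two means, so a constant shift between observers lowers it even when the readings are perfectly correlated.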
Evaluation of bone age using the GPA method is affected by the systematic error inherent to application of this method in different populations (differences with respect to the American reference population). Formally,
BAi(sp) = BAi(GPA) + ei(µ,σ) = µ + BAi(GPA) + ei(0,σ)
where BAi(sp) is the bone age for the study population, BAi(GPA) the bone age for the reference population (GPA) and ei(µ,σ) the random error (with an unknown distribution, mean µ (systematic error) and standard deviation σ).
As the systematic error and standard deviation may be different (the values reported in the literature demonstrate this for most populations) for each category j (age and sex class), the model can be rewritten as follows for each category:
BAi/j(sp) = µj + BAi/j(GPA) +ei(0,σj)
where BAi/j(sp) and BAi/j(GPA) are the bone ages for individual i from category j for the study population and reference population respectively. In this context, the systematic error for each category is estimated from the difference between the mean bone age and the mean chronological age in that category, as the mean (normal) growth is related to the mean chronological age of the study population in each category (CAi/j(sp) is the chronological age for individual i from category j).
As such, the mean and standard deviation of the chronological and bone ages were calculated for each sex and age group, and the equality of the means for each age was tested (Student’s t-test for paired samples). The differences between the means for the chronological and bone ages were then determined (systematic error estimation), the fit of the residuals to the normal distribution was tested using the Shapiro-Wilk W statistic, and a correction factor for the age in the atlas radiographs (GPA) was proposed. Separate Bland-Altman plots are presented for girls and boys in order to better visualise the importance and randomness of the differences in each age group.
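The per-category comparison can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study: the function name `paired_summary`, the sample values and the sign convention (bone age minus chronological age, so that positive values indicate overestimation) are assumptions. Only the paired t statistic is computed; a p-value would require a t distribution from a statistics library.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_summary(bone_age, chrono_age):
    """Summarise paired bone-age / chronological-age values for one category."""
    if len(bone_age) != len(chrono_age) or len(bone_age) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    # Assumed sign convention: positive difference = bone age overestimates.
    diffs = [b - c for b, c in zip(bone_age, chrono_age)]
    n = len(diffs)
    bias = fmean(diffs)   # estimate of the systematic error in this category
    sd = stdev(diffs)     # sample standard deviation of the differences (n - 1)
    # Paired t statistic with n - 1 degrees of freedom; undefined if sd is 0.
    t_stat = bias / (sd / sqrt(n)) if sd > 0 else float("nan")
    # Bland-Altman 95% limits of agreement around the mean difference.
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return {"n": n, "bias": bias, "sd": sd, "t": t_stat, "limits": limits}


# Hypothetical category (e.g. girls aged 12): ages in years.
bone = [12.0, 13.0, 12.5, 13.5]
chrono = [12.0, 12.5, 12.5, 12.5]
summary = paired_summary(bone, chrono)
print(summary)
```

Running the function once per age and sex class gives the quantities needed for a table of systematic errors and for the horizontal lines of a Bland-Altman plot.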
The standard deviation in chronological age can be assigned to the random selection of the individuals within their age group (σ = 1/√12 ≈ 0.29 years corresponds to the uniform selection in the single-year age groups). The standard deviation obtained when estimating bone age can be attributed to the intrinsic variability of the Spanish population (growth differences between individuals of the same sex and age) if the measurement error is low.
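As a brief worked check of the standard deviation expected under uniform selection within a single-year age group, assume that the chronological age CA is uniformly distributed on an interval [a, a+1) (the symbols CA and a are introduced here for illustration only):

```latex
\operatorname{Var}(CA) = \int_{a}^{a+1} \left(x - a - \tfrac{1}{2}\right)^{2} dx = \frac{1}{12},
\qquad
\sigma_{CA} = \frac{1}{\sqrt{12}} \approx 0.289\ \text{years} \approx 3.5\ \text{months}.
```

Per-category standard deviations of chronological age close to this value would therefore be consistent with uniform sampling within each age class.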
This variability in bone age is transferred to estimation of the chronological age (when this is unknown).
The repeatability of the measurement when the GPA method is applied presents two components: the reproducibility (inter-observer variability) and the repeatability (intra-observer variability). The reproducibility was evaluated separately for boys and girls by two observers, who determined the bone age for the whole sample (244 girls and 245 boys). Lin’s concordance correlation coefficient was estimated to be ρc,G=0.9800 for girls and ρc,B=0.9849 for boys. The difference between observers fits a normal distribution (W=0.9936 and p=0.9943 for girls, and W=0.9836 and p=0.5538 for boys). The mean difference between observers is essentially irrelevant, at 0.0546 years (equivalent to 20 days), with a standard deviation of 0.5246 years (equivalent to 6.3 months), for girls, and 0.0633 years (equivalent to 23 days), with a standard deviation of 0.4401 years (equivalent to 5.3 months), for boys. The intra-observer repeatability was evaluated using a single observer and a smaller sample (30 girls and 30 boys). Lin’s concordance correlation coefficient was estimated to be ρc,G=0.9944 for girls and ρc,B=0.9880 for boys. The mean difference between observations is essentially irrelevant, at 0.0500 years (equivalent to 18 days), with a standard deviation of 0.2739 years (equivalent to 3.3 months), for girls, and 0.1000 years (equivalent to 36 days), with a standard deviation of 0.3572 years (equivalent to 4.3 months), for boys (Figure 1).
Figure 1: Comparison of the measurement of bone age: a) and b) by two observers; c) and d) with two replicates.
The systematic error when evaluating bone age for the Spanish population affects both sexes (girls and boys) significantly. These systematic errors are not uniform and vary between -0.2389 and +1.0417 years (equivalent to -2.9 and +12.5 months) for girls, and between -0.0750 and +0.6417 years (equivalent to -0.9 and +7.7 months) for boys (without considering the 18 years age group due to the limited sample size). Overestimations are the largest and affect individuals with chronological ages of between 11 and 16 years in girls and 10-11 and 14-17 years in boys. Underestimations of bone age were not found to be significant (without considering the 18 years age group). The residuals in the model fit a normal distribution (W=0.9872 and p=0.7485 for girls, and W=0.9856 and p=0.5749 for boys). Adjustment of the atlas (subtracting the systematic error) eliminates the bias (mean difference) between bone age and chronological age for all age groups and sexes (Table 1). The importance and randomness of the differences in each category (age and sex class) can be visualised using a Bland–Altman plot (Figure 2).
Table 1: Evaluation of bone age using the GPA method: description of the sample (size, chronological age and evaluation of bone age, mean ± standard deviation); comparison of means by age group (* denotes significant differences in the mean); and systematic bias in bone age estimation.
Figure 2: Bland–Altman plots for the comparison between chronological and bone age.
The values for Lin’s concordance correlation coefficient obtained for the sample of the Spanish population used in this study highlight the robustness of bone age evaluation, as both the inter- and intra-observer errors are low. The repeatability obtained is related to the high ability to identify the same closest reference (X-ray in the atlas) when the GPA method is used. The value obtained in the inter-observer study (ρc=0.98: 0.0546±0.5246 years for girls and 0.0633±0.4401 years for boys) is in accordance with previous findings in the literature. Thus, the mean difference obtained is similar to that reported by Groell et al. (1999) for a sample of 47 individuals from central Europe, and that reported by Lynnerup et al. for a heterogeneous sample comprising 159 individuals from various geographical settings, including Vietnam, India (Asia); Russia, Belarus, Georgia, Latvia, Romania, Ukraine, Moldova, Albania (Central and Eastern Europe); Iraq, Iran, Palestine, Afghanistan (Middle East); Rwanda, Zaire, Nigeria, Guinea, Somalia, Cameroon, Sierra Leone, Burundi, Angola (Sub-Saharan Africa); Algeria, Libya (Northern Africa). Similarly, the standard deviation is similar to that reported by Groell et al. but much lower than that reported by Lynnerup et al. The similarity between the variances for the differences between observations (inter- and intra-observer) suggests some degree of overlap between the two sources of error; in other words, the repeatability explains the reproducibility. Consequently, trained observers do not significantly increase the variability when evaluating bone age using the GPA method.
Although individual growth patterns tend to be regular, genetic, environmental and socioeconomic factors influence the bone maturation process in individuals and explain the variability between individuals in the same population. When compared with a reference population (GPA, TW3, etc.), the difference between chronological age and bone age is due to the different influence of explanatory factors in the two populations (reference and study), temporal asynchronism (when the effect of the explanatory factors is similar) or to the combined effect of these two causes. The systematic error observed in our sample (0.5052 years in girls and 0.3153 years in boys) is higher than that for the reference population in the Atlas and for the current American population (period 1931-1942; current African-American and European-American). However, the error in our sample is in agreement with those described in other populations from the same region (Sweden; Turkey; Holland-Caucasian; Italy-Caucasian) but lower than that for Asian and African populations (Malawi; Taiwan; Morocco; Pakistan). The positive systematic error observed in most categories of the Spanish population can be explained by the earlier maturation in Western European countries and the USA and by changes in surrounding conditions. With a temporal imbalance in the maturation rate of the two populations (Spanish and reference GPA), the different growth rates over time (prepubertal depression, pubertal spurt and deceleration) explain the non-uniform differences in the categories. In this context, minor differences were expected in the prepubertal depression phase (10-11 years in girls and 12-13 years in boys), medium to high differences in the pubertal spurt phase (11-14 years in girls and 13-16 years in boys), and a progressive decrease in differences in the deceleration phase (14-17 years in girls and 16-17 years in boys).
The systematic error reflects the similarities and differences with respect to the reference population. From this perspective, the Spanish population does not differ from those in its regional geographical surroundings (Europe and the Mediterranean region). With regard to application of the method, the degree of bone maturity in individuals is related to the standard for the population itself. However, the atlas must be corrected (in the opposite sense to the systematic error) to minimise errors.
The standard deviation obtained when estimating bone age indicates the intrinsic variability of the population, in other words growth differences between individuals of the same sex and age. The pubertal spurt (accelerated growth rate due to the synergic action of growth hormone and sex steroids) explains, to a large extent, the greater variability in bone age per category (age and sex class) and the corresponding transfer to the variability in the differences between chronological age and bone age (GPA). Similarly, the deceleration in growth rate explains the decrease in the variability of the bone age and the progressive transfer of this effect to the differences. This effect is clearly seen in the Spanish population, where the spurt occurs from the age of 11 years in girls and 13 years in boys, and the deceleration occurs from around 14 years in girls and 16 years in boys. The intrinsic variability in bone age for the Spanish population (s=1.1100 years in girls and 1.1268 years in boys) is slightly higher than that for the corresponding American populations [11,46], similar to that for populations in the same region and some parts of Africa [29,30,35,47,48], and lower than that for Asian populations and other parts of Africa [31,32]. The variability in bone age reflects the heterogeneity of growth in the population used to define the population-based limits of normality. In this context, bone age becomes an indicator of possible growth rate disorders. In a forensic setting, bone variability is transformed into uncertainty/error when chronological age is estimated (with a probability of 0.95, the maximum error is quantified as ±1.96·s). With intermediate variability (s≈1.1 years), the results obtained for the Spanish population highlight the limitations as regards predicting the chronological age when only bone information for the hand and wrist is used.
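To make the size of this uncertainty concrete, the following Python sketch computes a 95% interval for chronological age from an atlas bone age. It is an illustration under stated assumptions: the function name `chronological_age_interval` is hypothetical, the overall systematic error for girls (0.5052 years) is used in place of the age-specific correction, residuals are assumed to be normal, and s=1.1100 years is taken as the standard deviation.

```python
def chronological_age_interval(atlas_bone_age, systematic_error, s, z=1.96):
    """95% interval for chronological age from a GPA bone age (in years).

    The atlas age is first corrected by subtracting the systematic error,
    then an interval of half-width z*s is placed around the corrected age.
    """
    centre = atlas_bone_age - systematic_error
    half_width = z * s
    return centre - half_width, centre + half_width


# Illustrative case: a girl whose X-ray matches the 16-year atlas standard.
low, high = chronological_age_interval(16.0, systematic_error=0.5052, s=1.1100)
print(f"{low:.2f} to {high:.2f} years")
```

With s close to 1.1 years the half-width is about 2.2 years, so an interval centred just below 16 years extends on both sides of the 14- and 16-year thresholds, which illustrates why hand and wrist information alone is of limited value for deciding whether a legal age limit has been reached.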
Two actions can be taken to achieve greater reliability when predicting chronological age, namely reducing the errors in bone age estimation and adding complementary information. A reduction in the errors inherent to bone age estimation using the GPA method may be difficult, as the (inter- and intra-observer) error in the sample from the Spanish population is already low, the systematic error has already been evaluated and the age adjusted in the atlas, and a reduction in the resolution effect (increasing the number of reference X-rays) is complex and of limited utility. The application of alternative methods based on European references also failed to provide satisfactory results. Thus, when using the TW3-RUS method, the systematic error was reduced in some cases but the variability was only reduced in very homogeneous populations: 0.5067±0.7967 in a sample of Spanish boys aged between 12 and 14 years (elite youth soccer players); 0.2250±0.7000 in a sample of German girls aged between 12 and 15 years, and 0.9000±1.0750 in a sample of German boys of the same age; 0.2060±1.060 in a sample of Turkish girls aged between 11 and 15 years, and 0.1560±1.1200 in a sample of Turkish boys of the same age [51-53]. Providing complementary information (examination of other bones, physical/anthropometric examination and/or dental examination) and directing efforts towards multivariate treatment is the path followed by the “Study Group on Forensic Age Diagnostics”. Indeed, this is probably the preferred path.
We would like to thank the Diagnostic Imaging Department at the Hospital Sant Joan de Déu de Barcelona for providing us with access to the X-rays required to conduct this study. We would also like to thank the Editor and the anonymous referees, whose suggestions have allowed us to improve this paper.