Lung cancer is the most frequent cancer in the world, and its survival remains poor.
Objectives
1) To determine survival of lung cancer by stage; 2) To identify factors that explain and predict the likelihood of survival and the risk of dying from this cancer and 3) To find out the distribution of lung cancer cases by stage.
Retrospective follow-up study of people diagnosed with invasive lung cancer during 2006-2011, identified through the Mallorca Cancer Registry. Lymphomas and cases ascertained only by death certificate were excluded. Sex, age, diagnostic method, histology, T, N, M, stage, date of diagnosis, date of last follow-up or death, and cause of death were collected. The end point of follow-up was 31st December 2013. Multiple imputation (MI) was used to obtain stage when it was unknown. Actuarial and Kaplan-Meier methods were used for survival analysis. Extended Cox models were built to identify factors that explain and predict survival.
A total of 2,576 lung cancer cases were included, 18.8% in women, with a mean age of 66 years. Survival at 5 years after diagnosis was 46% for stage IA, 40% for stage IB, 45% for stage IIA, 22% for stage IIB, 16% for stage IIIA, 7% for stage IIIB, and 3% for stage IV. Women, younger cases, patients with adenocarcinoma, and patients diagnosed in stages IA, IB and IIA had a better prognosis.
Lung cancer survival was similar for IA, IB and IIA stages, but worsened remarkably in more advanced stages. It was also influenced by age, sex, and histology.
Lung neoplasms were the most common cancer worldwide in 2012 and the leading cause of tumor-related deaths in developed countries when both sexes were considered together. In Spain, lung cancer is the third most common cancer in men and the fourth in women. The Spanish Network of Cancer Registries (REDECAN) has estimated that in 2015 age-standardized incidence rates (world) for lung cancer were 50.5 per 100,000 (44.4-57.4) in men and 12.7 (11.4-14.3) in women. Age-standardized mortality rates (world) in 2015 were 36.35 in men and 8.70 in women.
Lung cancer survival remains poor worldwide, as shown by CONCORD-2, with 5-year survival below 20% everywhere in Europe, in the range of 15-19% in North America and less than 10% in some Eastern European, Asian and African countries. According to EUROCARE-5, European age-standardized 5-year relative survival for lung cancer cases diagnosed in 2000-2007 was 13.0% (12.9-13.1), and small geographical differences were observed, ranging from 9.0% (8.8-9.1) in the UK and Ireland to 14.8% (14.6-14.9) in central Europe. In Spain, 5-year relative survival was 10.7% (10.2-11.2), almost five points higher in women [14.7% (13.1-16.6)] than in men [10.1% (9.5-10.6)]. Differences among Spanish regions were explored using EUROCARE-4 data, and survival in Navarra was found to be twice that in Granada [12.4% (10.3-14.6) compared with 6.1% (4.8-7.6)].
In Europe, there are only a few reports about lung cancer that include survival rates by stage [7-9], and none of them follows the UICC TNM Classification 7th edition, which classifies lung cancer into 7 categories and includes small cell lung cancers. The percentage of cases with unknown stage in these studies was not negligible (23% in Ireland and 12% in Northern Italy), potentially weakening the validity of the results. In these situations, multiple imputation is a valid alternative for dealing with missing values, as shown in some studies. In Spain, population-based cancer registries do not normally collect information about stage as a standard procedure, but the Mallorca Cancer Registry has done so since 2006. In addition to stage, sex, age and histology, as well as molecular status, smoking status, treatment and deprivation [7,13], are all significant factors that determine survival [7,8,14].
The aims of the present study were: 1) to determine lung cancer survival by stage; 2) to identify factors that explain and predict the likelihood of survival and the risk of dying from this cancer; and, incidentally, 3) to find out the distribution of lung cancer cases by stage according to the UICC TNM Classification 7th edition.
Retrospective follow-up study of patients living in Mallorca diagnosed with lung cancer between 2006 and 2011, identified through the Mallorca Cancer Registry.
Invasive cases with topography codes C33 and C34, of any histology except lymphomas (morphology codes 9590 to 9720, both included), were included. Cases ascertained by death certificate only (DCO) were excluded.
In addition to topography and histology according to ICD-O 3rd edition, the following data were collected: sex, age, site and sub-site, date of diagnosis, diagnostic method (clinical or pathological), pathological or clinical tumour size (T), pathological or clinical regional lymph nodes (N), metastasis (M), stage, date of last follow-up or date of death, and cause of death (lung cancer or other causes).
Age was grouped as 15-44, 45-54, 55-64, 65-74, and 75 years and over. Site and sub-site were recorded as: main bronchus; upper lobe, bronchus or lung; middle lobe, bronchus or lung; lower lobe, bronchus or lung; and overlapping sites of bronchus and lung. Histology was grouped using modified Parkin groups: adenocarcinoma (8140, 8211, 8323, from 8230 to 8231, from 8250 to 8260, from 8480 to 8490, from 8550 to 8560, from 8570 to 8574); squamous carcinoma (from 8050 to 8076); small cell carcinoma (from 8041 to 8045); and others/unspecified (8082, 8190, 8290, 8310, 8046, 8320, 8430, 8500, 8510, 8562, 8580, 8693, 8720, 8730, from 8141 to 8143, from 8200 to 8201, from 8240 to 8241, from 8244 to 8246, from 8470 to 8471, from 8012 to 8031, from 8010 to 8011, from 8032 to 8034, from 8000 to 8004 and from 8800 and above).
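The grouping rules above can be written compactly in code. The following Python sketch is illustrative only and is not the registry's software: the names `histology_group` and `age_group` are hypothetical, morphology codes are assumed to be stored as 4-digit integers, any non-lymphoma code outside the three named groups is assumed to fall into others/unspecified, and ages under 15 are assumed to be out of scope.

```python
# Code ranges copied from the text; all names here are hypothetical.
ADENOCARCINOMA = [(8140, 8140), (8211, 8211), (8323, 8323), (8230, 8231),
                  (8250, 8260), (8480, 8490), (8550, 8560), (8570, 8574)]
SQUAMOUS = [(8050, 8076)]
SMALL_CELL = [(8041, 8045)]
LYMPHOMA = [(9590, 9720)]  # excluded from the study


def _in_ranges(code, ranges):
    """True if code lies in any inclusive (low, high) range."""
    return any(low <= code <= high for low, high in ranges)


def histology_group(morphology_code):
    """Map an ICD-O-3 morphology code to one of the study's histology groups."""
    if _in_ranges(morphology_code, LYMPHOMA):
        return "excluded (lymphoma)"
    if _in_ranges(morphology_code, ADENOCARCINOMA):
        return "adenocarcinoma"
    if _in_ranges(morphology_code, SQUAMOUS):
        return "squamous carcinoma"
    if _in_ranges(morphology_code, SMALL_CELL):
        return "small cell carcinoma"
    # Assumption: everything else is treated as others/unspecified.
    return "others/unspecified"


def age_group(age_years):
    """Map age in completed years to the five age bands used in the study."""
    if age_years < 15:
        raise ValueError("ages under 15 are outside the bands described")
    if age_years <= 44:
        return "15-44"
    if age_years <= 54:
        return "45-54"
    if age_years <= 64:
        return "55-64"
    if age_years <= 74:
        return "65-74"
    return "75+"
```

Checking lymphoma codes first matters because the open-ended "8800 and above" range would otherwise absorb them.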
Stage was calculated according to the UICC TNM Classification 7th edition: IA, IB, IIA, IIB, IIIA, IIIB and IV. Pathological T (pT) or N (pN) was prioritised over clinical, except when neo-adjuvant therapies had been applied. An integrated approach was used, combining pathological and clinical components to obtain the stage. An in-depth review was performed of the clinical records of cases with missing values for T, N or M and of cases without follow-up.
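As an illustration of the stage grouping, the sketch below maps T, N and M categories to the seven stage groups of the 7th edition for lung cancer. It is a simplified sketch, not the registry's procedure: the function name `tnm7_stage` is hypothetical, the inputs are assumed to be categories already resolved between pathological and clinical values as described above, and any combination with a missing or unclassifiable component is returned as unknown unless M1 is recorded.

```python
VALID_T = {"T1A", "T1B", "T2A", "T2B", "T3", "T4"}
VALID_N = {"N0", "N1", "N2", "N3"}


def tnm7_stage(t, n, m):
    """Return the TNM 7th edition lung stage group, or None if unknown."""
    t, n, m = (str(x).strip().upper() for x in (t, n, m))
    # Distant metastasis is stage IV whatever T and N are.
    if m in {"M1", "M1A", "M1B"}:
        return "IV"
    if m != "M0" or t not in VALID_T or n not in VALID_N:
        return None  # unknown stage (hypothetical convention)
    if n == "N3":
        return "IIIB"
    if t == "T4":
        return "IIIB" if n == "N2" else "IIIA"
    if n == "N2":
        return "IIIA"
    if t == "T3":
        return "IIIA" if n == "N1" else "IIB"
    if n == "N1":
        return "IIB" if t == "T2B" else "IIA"
    # Remaining cases are N0 with T1a, T1b, T2a or T2b.
    return {"T1A": "IA", "T1B": "IA", "T2A": "IB", "T2B": "IIA"}[t]
```

In this sketch a case with M1 is staged even when T or N is missing, which shows how the proportion of unknown stage can be lower than the proportion of unknown T or N.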
Survival time was calculated from the date of diagnosis to the date of death or the date of last follow-up. Vital status refers to the state at the time of the last follow-up: alive (0), dead from lung cancer (1) or dead from other causes (2). Deaths from other causes were censored, as were cases that emigrated from Mallorca or were lost to follow-up. Follow-up started at the date of diagnosis and ended on 31st December 2013.
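The construction of survival time and event indicator described above can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name `follow_up` is hypothetical, time is measured in days, and any record dated after the end of the study is assumed to be censored at 31st December 2013.

```python
from datetime import date

STUDY_END = date(2013, 12, 31)  # end point of follow-up stated in the text


def follow_up(diagnosis, last_contact, vital_status):
    """Return (time_in_days, event) for one case.

    vital_status: 0 = alive, 1 = dead from lung cancer,
                  2 = dead from other causes.
    event is 1 only for lung cancer deaths within the study period;
    every other outcome is treated as censored (event = 0).
    """
    end = min(last_contact, STUDY_END)  # assumption: administrative censoring
    time_days = (end - diagnosis).days
    died_in_study = vital_status == 1 and last_contact <= STUDY_END
    return time_days, 1 if died_in_study else 0
```

For example, `follow_up(date(2010, 1, 1), date(2010, 7, 29), 1)` gives a lung cancer death after 209 days, whereas the same dates with vital status 2 give a censored observation of the same length.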
Multiple imputation (MI) was used to obtain stage when it was unknown. Each missing value was replaced with a set of 10 imputed values using the multiple imputation by chained equations (MICE) procedure, and the Cox model results from each of the 10 completed data sets were combined under Rubin’s rules. Thus, a single multiple imputation Cox regression model was obtained by pooling the results of the 10 imputed data sets. A more detailed description of the MICE application can be found in Ramos et al. (2016).
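The pooling step can be made concrete with a short sketch of Rubin's rules for a single coefficient, such as a log hazard ratio estimated in each completed data set. The Python below is a minimal illustration rather than the Stata procedure used in the study; the function name `pool_rubin` is hypothetical, and the inputs are assumed to be the per-imputation estimates and their squared standard errors.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates: coefficient from each imputed data set (e.g. log hazard ratios)
    variances: squared standard error of that coefficient in each data set
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations of equal length")
    pooled = sum(estimates) / m
    within = sum(variances) / m                      # average sampling variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between           # total variance
    return pooled, math.sqrt(total)
```

The pooled hazard ratio is then `math.exp(pooled_estimate)`. The between-imputation term is what widens the standard error to reflect uncertainty about the missing stage values.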
Survival was analysed with the actuarial and Kaplan-Meier methods to estimate the likelihood of survival and the risk of death, with the log-rank test to assess differences between the observed survival curves for each categorical variable, and with Cox regression models to identify prognostic factors for the risk of death. In the Cox models, age was treated as a continuous variable because our interest was in the effect of each unit increase in age on the risk of dying from lung cancer. The proportional hazards assumption was tested for each covariate by introducing time-dependent variables. Two covariates, age and histology, did not meet this assumption: their effect on the risk of death changed over time, so they could not be entered in a standard Cox model. Therefore, we applied extended Cox regression, which, in addition to analysing the effect of covariates on the risk of dying, allows the time-dependent effects of age and histology to be modelled by including time-dependent variables in the model. To assess the effect of the imputation procedure on the hazard ratio estimates, the extended Cox regression was performed before and after MI. MI was carried out with Stata 13, and survival analysis with SPSS 20.
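For readers unfamiliar with the Kaplan-Meier estimator, the sketch below shows the product-limit calculation on which the survival curves are based. It is a minimal pure-Python illustration, not the SPSS routine used in the study; the function names are hypothetical, and events are assumed to be coded 1 for death from lung cancer and 0 for censored.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored cases leave the risk set
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function)."""
    value = 1.0
    for time, surv in curve:
        if time > t:
            break
        value = surv
    return value


def median_survival(curve):
    """First time at which estimated survival falls to 0.5 or below."""
    for time, surv in curve:
        if surv <= 0.5:
            return time
    return None
```

With times in days, `survival_at(curve, 365)` and `survival_at(curve, 5 * 365)` would give 1-year and 5-year survival estimates; censored cases contribute to the risk set until they leave it but never lower the curve themselves.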
A total of 2,659 lung cancer cases were diagnosed from 1st January 2006 to 31st December 2011. After excluding 59 DCO cases, 9 lymphomas and 15 cases with missing follow-up or death data, 2,576 cases were analysed. Most of the patients were men (81.2%), aged over 56 years (83.1%) and diagnosed on a pathological basis (90.1%). A total of 2,209 patients died from lung cancer. T was unknown in 48.4% of cases, N in 60.2% of cases, and M in 17.7% of cases, although stage was unknown in only 12.8% of cases. After MI, stage distribution was 5.0% for stage IA, 5.1% for stage IB, 2.9% for stage IIA, 3.1% for stage IIB, 10.4% for stage IIIA, 10.9% for stage IIIB and 62.6% for stage IV. A full description of cases is presented in Table 1.
Median survival time was 209 days, and only 316 patients were alive at the end of the study. Survival was 36% at 1 year, 15% at 3 years and 11% at 5 years after diagnosis. Survival at 5 years after diagnosis was 47% for stage I, 35% for stage II, 11% for stage III, and 1% for stage IV. As shown in Table 2, yearly survival rates by stage differed before and after MI: in the original data set they were slightly overestimated in the early stages and underestimated in the advanced stages.
Survival curves showed differences in lung cancer survival by age (P < 0.001), sex (P < 0.001), histology (P < 0.001) and method of diagnosis (P < 0.001) (Figure 1). In pairwise comparisons between categories, no differences in survival were found among the age groups 15-44, 45-54 and 55-64 (P > 0.5), among stages IA, IB and IIA (P > 0.5), or between stages IIB and IIIA (P = 0.086) (Figure 2).
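The pairwise comparisons above rely on the log-rank test. The sketch below implements the standard two-group version in pure Python to show what is being computed; it is not the SPSS routine used in the study, the function name `logrank_test` is hypothetical, and events are assumed to be coded 1 for death and 0 for censored.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi_square, p_value) on 1 df."""
    samples = [(t, e, "a") for t, e in zip(times_a, events_a)]
    samples += [(t, e, "b") for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for s in samples if s[0] >= t)                  # at risk, both
        n_a = sum(1 for s in samples if s[0] >= t and s[2] == "a")
        d = sum(1 for s in samples if s[0] == t and s[1] == 1)    # deaths, both
        d_a = sum(1 for s in samples
                  if s[0] == t and s[1] == 1 and s[2] == "a")
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of deaths in group a at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value
```

The test compares observed deaths in one group with the number expected if both groups shared the same hazard at every event time, so a large P value, as for stages IA, IB and IIA, means the curves are statistically indistinguishable.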
Age, sex, histology, stage and the time-dependent variables of age and histology were included in the extended Cox regression model before and after MI (Table 3). Both models (original vs. MI model) were similar and showed that women, younger cases, patients with adenocarcinoma and patients diagnosed in stages IA, IB and IIA had a better prognosis.
Five-year lung cancer survival in Mallorca was similar to average survival in Spain and Europe. Stage was the main factor associated with survival. No differences in survival were found between stages IA, IB and IIA, or between IIB and IIIA. In early stages (IA, IB and IIA), which are amenable to surgery, lung cancer survival was close to 50%, but it worsened significantly in more advanced stages, so it makes little sense to speak about lung cancer survival without considering stage. In addition to stage, age, sex and histology were also relevant factors associated with lung cancer survival.
Only 13% of the cases were diagnosed in the early stages, so an effort should clearly be made to advance secondary prevention. The US National Lung Screening Trial showed a 20% reduction in lung cancer mortality and a 6.7% reduction in all-cause mortality in smokers or former smokers aged 55-74 years who underwent low-dose CT compared with chest X-ray. Since then, it has been argued that more trials are needed to confirm the findings, and seven large-scale European trials are currently in progress. Until the effectiveness of secondary prevention is proven, primary prevention through reinforced tobacco control remains the best option for lung cancer.
In 12.8% of cases the stage was missing, even though the clinical records of all cases were reviewed in depth. The percentage of unknown stages for lung cancer in SEER statistics is much lower, around 2%, but it is important to consider that the Local-Regional-Disseminated classification is used there and not TNM. Multiple imputation of unknown stages was a useful tool to make the most of the database, as other studies have previously reported [9,19]. Although multiple imputation did not shift stage distribution, survival rates changed significantly in lung and colorectal cancers after this statistical procedure.
We have confirmed that women have a better prognosis in lung cancer independently of histology, as previously shown for non-small-cell lung cancers, which contradicts previous studies that attributed better survival in women to the predominance of the adenocarcinoma type. The survival advantage of women, seen not only in lung cancer but in many other cancers (salivary glands, head and neck, oesophagus, stomach, colon and rectum, pancreas, pleura, bone, melanoma of skin, kidney, brain, thyroid, Hodgkin disease and non-Hodgkin’s lymphoma), has been linked to hormonal status, different exposure to risk factors, specific driver mutations or better response to treatments, but the reasons remain unknown [24-25].
Older age was associated with a worse prognosis, as has been observed in other studies [14,24]. The novelty in our case was that we designed an extended Cox regression model incorporating age as a time-dependent variable, and we observed that the negative effect of age on survival decreases over time. This counters the hypothesis that age differences in survival are due to differences in treatment.
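A diminishing age effect of this kind can be illustrated numerically. In an extended Cox model of the form h(t | age) = h0(t) exp(beta * age + gamma * age * g(t)), the hazard ratio per unit increase in age at time t is exp(beta + gamma * g(t)). The Python sketch below evaluates that expression; the function name, the choice g(t) = t in years and the coefficient values are all hypothetical and are not estimates from this study.

```python
import math


def age_hazard_ratio(beta, gamma, t_years, g=lambda t: t):
    """Hazard ratio per unit increase in age at follow-up time t_years.

    Assumed model: h(t | age) = h0(t) * exp(beta*age + gamma*age*g(t)),
    where g is the time function attached to the time-dependent term.
    """
    return math.exp(beta + gamma * g(t_years))


# Hypothetical coefficients, chosen only to show an effect that fades with time.
BETA, GAMMA = 0.030, -0.005
for year in (0, 1, 3, 5):
    print(year, round(age_hazard_ratio(BETA, GAMMA, year), 3))
```

A negative interaction coefficient means the hazard ratio for age moves towards 1 as follow-up lengthens, which is the pattern described above.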
Finally, we observed differences by histologic group. Adenocarcinoma and squamous carcinoma had similar survival curves, both better than that of small cell carcinoma. Nevertheless, when we adjusted for sex, age and stage in the multivariate Cox analysis, the survival of adenocarcinoma was better than that of squamous carcinoma and small cell carcinoma. Other studies found no differences between non-small cell and small cell carcinomas, probably because they had a non-negligible percentage of non-microscopically verified cases.
The limitations of this study were that data came from only one registry and that information was lacking on other prognostic factors such as comorbidities, active and passive smoking exposure, socioeconomic and performance status, type of therapy and its related toxicity, or molecular status. Unfortunately, most population-based cancer registries, at least in Europe, do not regularly collect all this information.
To sum up, lung cancer survival was similar for stages IA, IB and IIA, but remarkably worse in more advanced stages. Besides stage, women and patients with adenocarcinoma had a better prognosis; the reasons could be related but remain unknown, so more research is needed in this field.