The Follow-up of a Cohort Study of the Joint Effects of Environmental and Genetic Factors on Esophageal Cancer in North-East Iran
Background
There are large geographical differences in the incidence of oesophageal cancer (EC) around the world, with both Iran and the UK being countries where the incidence is highest1. However, the epidemiology of esophageal cancer in the UK differs markedly from that in most of the rest of the world. In much of the world outside the UK, the disease occurs primarily in men, over 90% is attributable to alcohol and tobacco, and it comprises mainly squamous carcinomas. In the UK, by contrast, adenocarcinomas have become the most common histology2. Also, at least in women, the proportion attributable to alcohol and tobacco is small for either histological type3. Low consumption of fruit and vegetables and high consumption of hot drinks instead appear to be of greater importance, although the data are limited and based on a retrospective study design. Similar lifestyle factors also appear relevant in north-east Iran, where rates are 20-fold higher than in the UK and are similar among men and women4.
The earliest reports of high incidence of EC in the northern parts of Iran date back to the early 1970s5,6. A population-based cancer registry had been established in 1969 as a joint effort between Tehran University and the International Agency for Research on Cancer (IARC), and it subsequently confirmed the high incidence of EC in the eastern portion of the Caspian Sea littoral, in the area which is now known as Golestan Province. The highest incidence rates were reported from the semi-desert plain settled mainly by people of Turkmen ethnicity in Gonbad and Kalaleh counties, with estimated incidence rates of 109 per 100,000 among men and 174 per 100,000 among women (adjusted to the 1970 World Standard Population).
The registry also showed low incidence of EC in the nearby Gilan province, 300 km to the west of Golestan, with incidence rates of 15 per 100,000 among men.
Extensive studies on the descriptive epidemiology of the disease were conducted in northern Iran in the 1970s by the University of Tehran and the International Agency for Research on Cancer (IARC). Rates varied from moderate to the south-east of the Caspian Sea to extremely high in the arid, semi-steppe region to the east of the Caspian (Golestan province), where the population is mainly of Turkmen ethnicity7. The continuation of these high rates into the 1990s has been confirmed both by a recent screening study8 and by a preliminary report from the main treatment referral centre in the province9. More recently, case-control studies have been carried out in the region, and these have shown that tobacco and opium consumption10, family history11, and infection with human papilloma virus12 are all associated with an increased risk of EC.
Etiological hypotheses related to diet and life style can be best addressed in prospective cohort studies, in which measurement error can be reduced and recall bias is minimal. From 2002 to 2003, a pilot study of 1057 subjects was conducted by the Digestive Disease Research Center (DDRC) of Tehran University of Medical Sciences in collaboration with IARC and the US National Cancer Institute (NCI) to evaluate the logistical aspects of establishing a prospective study in Golestan. The aims of the pilot study were to assess the response rate of the study population, to develop valid and reliable methods for assessing nutritional, anthropometric and life style factors, to develop follow-up methods to ascertain mortality and cancer incidence among the enrolled subjects, and to establish efficient procedures for collecting and storing biological samples. Results of the pilot study confirmed the feasibility of conducting a prospective cohort study in Golestan13,14.
Subsequently, the Golestan Cohort Study (GCS) was launched in January 2004. The study protocol and the informed consent used for this study were approved by the ethical review committees of DDRC, IARC, and NCI. The three-year enrollment phase of the cohort was funded in part by a project grant from Cancer Research UK (PI Prof Bruce Ponder, ref: C20/A5860). Enrollment started in 2005 and closed in June 2008, when the accrual goal of 50,000 subjects was reached. Funding for long-term follow-up of the full cohort is now being sought.
Methods and progress
2.1 Aims
The primary aims of the Golestan cohort are:
(i) To identify risk factors for EC by a comprehensive assessment of ethnicity, occupational history, socioeconomic status, past medical history, family history of cancers, gastrointestinal symptoms and signs, tobacco, opium and alcohol use, oral health, anthropometric characteristics, physical activity, and tea drinking habits, including tea temperature. Nutritional patterns are also evaluated using a food frequency questionnaire (FFQ) specifically developed for this population and validated during the pilot study.14 The FFQ covers 116 food items, including bread and cereals, meat and dairy products, oils, sweets, legumes, vegetables, fruits, and condiments, as well as cooking methods.
(ii) To establish biospecimen banks for blood, urine, hair, and nail samples to be used in molecular and genetic studies of cross-sectional or nested case-control design.
(iii) To provide a model for population-based studies in a country in economic and social transition based on collaboration between local health workers, local health authorities, national research centers, national government, and international research institutions.
2.2 Who is in the sample?

The study population is a sample of the Golestan population, aged 40-75 years. The primary goal was to establish a cohort of 50,000 healthy individuals, with equal numbers of men and women, 20 percent from urban areas, and 80 percent of Turkmen ethnicity. We planned to enroll the urban participants from Gonbad City, the second largest city of Golestan with 128,102 inhabitants aged 40-75, and the rural participants from villages in Gonbad, Kalaleh, and Aq-Qala counties, with 53,121 inhabitants aged 40-75.
A total of 16,599 urban inhabitants older than 40 years were selected randomly from five areas of Gonbad City by systematic clustering based on the household number. The selected inhabitants were contacted at home by specially-trained health workers and invited to visit the Golestan Cohort Study Center, a research center specifically established for this project in Gonbad, and to participate in the study. A total of 10,032 urban participants were enrolled from Gonbad, with participation rates of approximately 70% for women and 50% for men.
In rural areas, recruitment took advantage of the network of health houses, primary health care centers present in each group of villages, which are typically staffed by two auxiliary health personnel (locally called the Behvarz). The Behvarz are in charge of vaccination programmes, family planning, reporting births, deaths, and major communicable diseases, and initial primary care treatment. All residents of all villages in the study catchment area who were eligible for this study were invited to participate. Temporary recruitment centers were established in the health houses of 198 selected villages, and the Behvarz accompanied the GCS research team to contact the selected subjects at their homes. The invitation group thoroughly explained the purpose and procedures of the study to the eligible subjects and invited them to participate in the study. If the study participant did not fully understand the procedures, he/she was invited to visit the study center and observe all steps of the study in person. A total of 40,013 participants were enrolled from 326 villages, with participation rates of 84% for women and 70% for men.
Exclusion criteria were: (i) unwillingness to participate at any stage of the study for any reason, (ii) being a temporary resident, and (iii) having a current or previous diagnosis of an upper gastrointestinal (UGI) cancer. Before the interview, written informed consent was obtained from each participant.
Each subject was interviewed by a trained general physician and a trained nutritionist, either in the local language (Turkmen) or in the national formal language (Persian), depending on the participant’s preference. Two structured questionnaires were administered, a lifestyle questionnaire and a food frequency questionnaire. Following the questionnaires and a limited physical examination, samples of blood (10 mL), urine (4.5 mL), hair (3 cm from the base of scalp) and nail (trimmings from all 10 toenails) were collected by a trained technician. In the urban area, all biological samples were immediately processed in the central laboratory at the Golestan Cohort Study Center. In the rural areas, blood samples were kept in refrigerators (+4° C), until they were transferred within cooling boxes to the central laboratory; the maximum duration between blood collection and final processing was 8 hours. The blood samples were centrifuged and aliquoted in 500 μL straws (8 straws of plasma, 4 straws of buffy coat, and 2 straws of red blood cells) and stored at -80° C. Urine samples were stored at -20° C, and hair and nail samples were stored at room temperature. Half of the frozen blood samples were subsequently transferred on dry ice to DDRC in Tehran, and then shipped at regular intervals to IARC in Lyon, France, where they are stored in nitrogen vapour (approximately -135° C).
All participants received a personal GCS identification card at the time of enrollment which allows them to come to Atrak Clinic if they experience any gastrointestinal symptoms. Atrak Clinic is a specialized gastrointestinal clinic established by DDRC in the main hospital in Gonbad,13 and provides free services for the GCS participants.
Table 1 shows the composition of the cohort. The distribution by ethnicity and place of residence is close to the initial goal; however, because of a higher response rate, the number of women in the cohort (n = 28,804) is higher than that of men (n = 21,241).
Table 1. Age, sex, ethnicity, and place of residence of the 50,045 participants in the Golestan Cohort Study (2004-2008)
Sex and age group (years) | Men, ≤ 45 | Men, 46-55 | Men, 56+ | Women, ≤ 45 | Women, 46-55 | Women, 56+ | Total |
Number | 5,394 | 7,973 | 7,874 | 8,877 | 11,532 | 8,395 | 50,045 |
Ethnicity (%) | | | | | | | |
Turkmen | 76 | 77 | 74 | 74 | 74 | 72 | 74 |
Non-Turkmen | 24 | 23 | 26 | 26 | 26 | 28 | 26 |
Place of residence (%) | | | | | | | |
Urban | 19 | 17 | 19 | 20 | 21 | 23 | 20 |
Rural | 81 | 83 | 81 | 80 | 79 | 77 | 80 |
3. Follow-up and outcomes ascertainment
Retaining and tracking each participant to the end of study with complete ascertainment of the outcomes of interest is critical for success in any cohort study. Cancer and death registration is not systematic in the study area and so the key end points can only be ascertained through careful active and passive follow-up.
3.1 Follow-up methods
In order to maximize cohort retention we are using multiple methods of follow-up, which are described below. We aim to obtain follow-up information on each participant once a year. The purpose of the follow-up is to ascertain vital status, cause and date of death, and the occurrence and date of any cancer.
Active telephone contact:
Ninety-eight percent of participants have a private telephone line. All subjects will be contacted annually by a trained telephone interviewer, who will complete the case review form (Appendix 3). This records information regarding the vital status of the participant and any occurrence of disease or hospital admission since the last contact. Any planned changes of address will also be recorded to facilitate future follow-up. If the participant cannot be contacted after 7 attempts on different days over 2 consecutive weeks, the interviewer will use the available alternative contact information to try to reach the participant at work or through neighbours or relatives. If all these approaches fail, an interviewer will try to contact the participant through a home visit (see below).
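The escalation rule in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ContactLog` record, its field names and the returned step labels are hypothetical, and the sketch assumes that the two-week window opens with the first failed call. It does not describe the study's actual follow-up software.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

MAX_PHONE_ATTEMPTS = 7             # "7 attempts on different days"
PHONE_WINDOW = timedelta(weeks=2)  # "over 2 consecutive weeks"


@dataclass
class ContactLog:
    """Hypothetical per-participant record of failed contact attempts."""
    failed_call_dates: list = field(default_factory=list)
    alternative_contacts_failed: bool = False


def next_follow_up_step(log: ContactLog) -> str:
    """Return the next action for a participant who has not yet been reached."""
    days = sorted(set(log.failed_call_dates))  # several calls on one day count once
    # Assumption: only calls inside the two-week window that opens with the
    # first failed call are counted towards the seven attempts.
    in_window = [d for d in days if d - days[0] < PHONE_WINDOW]
    if len(in_window) < MAX_PHONE_ATTEMPTS:
        return "telephone"
    if not log.alternative_contacts_failed:
        return "alternative_contact"  # work, neighbours or relatives
    return "home_visit"
```

Counting distinct days rather than individual calls reflects the wording "on different days"; what should happen when the two weeks elapse before seven attempts have been made is not specified in the text and is left open here.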
Active follow-up by team
Interviewers will visit the participants or their relatives at home or at work to conduct face-to-face interviews. These are necessary in order to: i) complete a verbal autopsy for deceased participants15 (Appendix 5); and ii) complete the outcome questionnaire (Appendix 6) for participants who report the occurrence of any cancer, endoscopy or biopsy, and collect the paraclinical reports, including laboratory tests, radiological reports, hospital documents, pathological reports, etc. (Appendix 7). In the case of any cancer, a complete assessment, including collection of the pathology samples (blocks or slides), is mandatory.
Passive follow-up
At the time of enrollment each subject is asked to contact the research team when specific events occur. They are asked to report in-patient and out-patient hospital visits, particularly for endoscopy or biopsy. Family members are also asked to report the death of the subject. These contacts are registered and subsequently followed up by the research team for further details.
Follow-up by Behvarz
In the rural areas the Behvarz will be contacted once a month and asked to report any cancer diagnoses, hospital admissions and deaths among participants, and to complete the case review questionnaire for participants who have no telephone number (Appendix 4). The Behvarz will be visited in person every six months in order to identify any deaths that have occurred. The cause of death attributed by the Behvarz will be noted, although this will be confirmed by a verbal autopsy.
Other methods for passive follow-up
The databases of the Atrak clinic, the death register and the Golestan cancer registry will be reviewed monthly to identify outcome events (cancers and deaths) among study subjects. Special software will be developed to search for cohort participants in these three databases.
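As an illustration of what such a search could look like, the sketch below links external records to cohort participants by an exact match on a normalised name and year of birth. Everything in it is an assumption made for the example: the field names (`cohort_id`, `name`, `birth_year`, `source`, `event`), the choice of matching key and the sample records are invented, and the software actually planned for the study is not described in this document.

```python
import unicodedata


def normalise(name: str) -> str:
    """Lower-case a name, strip accents and collapse repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.lower().split())


def match_key(record: dict) -> tuple:
    """Hypothetical linkage key: normalised name plus year of birth."""
    return (normalise(record["name"]), record["birth_year"])


def find_cohort_matches(cohort: list, external: list) -> list:
    """Return (cohort_id, source, event) for external records that match."""
    index = {match_key(p): p["cohort_id"] for p in cohort}
    hits = []
    for rec in external:
        cohort_id = index.get(match_key(rec))
        if cohort_id is not None:
            hits.append((cohort_id, rec["source"], rec["event"]))
    return hits


# Invented records, for illustration only.
cohort = [{"cohort_id": "G0001", "name": "Ali  Rezaei", "birth_year": 1950}]
external = [
    {"name": "ali rezaei", "birth_year": 1950,
     "source": "cancer_registry", "event": "EC NOS"},
    {"name": "Sara Karimi", "birth_year": 1961,
     "source": "death_register", "event": "death"},
]
matches = find_cohort_matches(cohort, external)
```

An exact key of this kind will miss spelling variants and can confuse two people who share a name and birth year, so any candidate match would still need to be confirmed by the follow-up team.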
3.2 Outcomes ascertainment
The primary end points are occurrence of upper GI malignancy, other cancer, death or migration.
A) Upper GI Malignancies: The main outcome of this cohort study is cancer of the upper gastro-intestinal tract, including squamous cell carcinoma of the oesophagus (ESCC), adenocarcinoma of the oesophagus (EAC), gastric cardia cancer (GCC) and gastric non-cardia cancer (GNCC). However, histological and topographical information is often missing from clinical records, so the categories EC NOS (not otherwise specified), GC NOS and esophago-gastric junction (EGJ) tumors NOS will also be used.
B) Other Cancers: Other cancers will be recorded and classified by topography and morphology. In the absence of a detailed pathology report, they can be recorded by primary site without morphology (Appendix 1).
C) Death: The interviewers must register any deaths among participants. This study will use a death categorization table comprising 20 groups based on ICD-10 to classify the deaths (Appendix 2). It is predicted that more than 80% of all deaths will be due to cardiovascular diseases, accidents and cancers.
D) Migration: We have defined three types of migration:
The first type is “Address Change”, meaning any change in a participant’s residence within the catchment area (strictly speaking, this is not a form of migration). These participants should be traced through their relatives, their neighbours or any other means so that their follow-up can continue. The second type is “Seasonal Migration”, meaning the movement of people who leave a certain area with their domestic animals in certain seasons to find better grazing land and return in other seasons. We can follow these participants on their return. The third type is “Emigration”, meaning the movement of participants out of the catchment area. In this case every effort should be made to retain the participant and to follow him/her up for as long as possible.
Case confirmation: The process of case confirmation, including assessing the completeness of the medical and paraclinical documents and allocating a study code (category of definition) for cancer or death, requires a comprehensive review. As a first step, the follow-up team should complete the case packet form for all outcomes (Appendix 7). Two independent reviewers (internists) will then each review all the documents and allocate a study code and a date of occurrence to the subject (Appendix 8). The two allocated codes will be compared; in case of a persisting disagreement, a third person will review the case packet and either request further documents or allocate a specific code and date to the subject.
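The two-reviewer rule described above can be written down as a small decision function. The sketch below is a minimal illustration, assuming that agreement means both reviewers allocate the same study code and the same date; the `Allocation` type, the status labels and the dates in the example are hypothetical, and the option of the third reviewer requesting further documents is not modelled.

```python
from datetime import date
from typing import NamedTuple, Optional


class Allocation(NamedTuple):
    """One reviewer's decision: a study code and a date of occurrence."""
    study_code: str
    event_date: date


def confirm_case(first: Allocation, second: Allocation,
                 third: Optional[Allocation] = None):
    """Combine independent reviews into a status and a final allocation."""
    if first == second:
        # Both reviewers allocated the same code and the same date.
        return ("agreed", first)
    if third is None:
        # Disagreement: the case packet goes to a third reviewer.
        return ("refer_to_third_reviewer", None)
    # The third reviewer's allocation settles the disagreement.
    return ("resolved_by_third_reviewer", third)


# Illustrative values only; the codes reuse the outcome labels from section 3.2.
a = Allocation("ESCC", date(2007, 5, 14))
b = Allocation("EC NOS", date(2007, 5, 14))
status, final = confirm_case(a, b)
```

Treating a difference in date as a disagreement is a design choice of the sketch; the protocol text only says that the allocated codes are compared.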
Every cohort participant is informed that, should he/she experience any upper GI symptoms, he/she can be seen at the Atrak clinic free of charge. The clinic has 5 staff: one gastroenterologist, who sees patients and performs endoscopy, takes biopsies or replaces stents whenever needed; 2 nurses, who assist the gastroenterologist, collect the required samples (blood, which is stored in a -70°C freezer, and hair, nail and urine, which are stored in a -20°C freezer) and conduct interviews; one data entry operator, who enters all data collected for each participant into a dedicated computerized database; and one lab technician, who aliquots blood samples, processes biopsies and prepares slides for reading. Based on previous data, we predict that 200 participants will visit the Atrak clinic per annum.
The Golestan cancer registry is run by Golestan University and DDRC. It covers a population of 1.6 million in 11 districts. Cancer registry officers actively collect data on all incident cases in 20 hospitals and more than 15 pathology centers. All oncology centres, laboratories and imaging services are now covered by the registry. The data for each case are collected, then rechecked centrally for duplicates and missing information, and coded. CanReg is used for data entry and analysis.
Upper GI cancer outcomes (incident cancers and cancer deaths) are the most important outcomes of the study, so they will be reviewed again and verified by an International Endpoint Review Committee (IERC) of expert MDs representing DDRC, IARC and NCI. The IERC will meet periodically, as needed. For each case, the IERC members will review the clinical history and all available diagnostic materials (pathology slides, x-rays, etc.) and record their consensus findings on an International Endpoint Review Form (Form to be developed). This IERC diagnosis will be the final study diagnosis for these outcome events.
3.3 Data management
Data are managed using a Microsoft SQL Server database. The database is used both to store individual participant data and to manage the follow-up process. The data are held on a secure server, with each operator having a unique user name and password and user-specific access to different levels of stored data. At the end of each month the database is backed up to DVD in 3 copies: one is kept in a safe place in the follow-up room, one is sent to DDRC, and the third is sent to Isfahan for additional security.
4. Initial results
A description of the cohort and the baseline characteristics of the participants has recently been submitted for publication16. About 50 percent of men and 85 percent of women have had no formal education. The highest attained educational level was lower in older subjects and among women, compared to younger subjects and men, respectively. The GCS confirms previous findings of a low prevalence of tobacco smoking, nass chewing (a kind of smokeless tobacco), and alcohol drinking in this population, particularly among women. Among men, approximately 60 percent had never smoked tobacco, and 83 percent and 92 percent had never used nass or alcohol, respectively. Current tobacco use was more common and current nass chewing was less common among younger men than among older men. Among women, the rates of tobacco smoking and consumption of nass and alcohol were negligible. Twenty-two percent of men and seven percent of women were current opium users.
Several sub-studies were conducted within the pilot study of the GCS. Exposure to polycyclic aromatic hydrocarbons (PAHs), estimated by measuring a stable urinary metabolite, was high in the great majority of the participants, most of whom were non-smokers17. Median serum selenium was 155 µg/L, which suggests that the population of Golestan receives adequate selenium and that selenium deficiency is not a risk factor for EC in this region18. Contamination with carcinogenic mycotoxins was not found in a limited number of raw rice, sorghum, and wheat samples that were collected from the region13. Symptoms of gastro-esophageal reflux disease were common among pilot study participants, and 31 percent experienced these symptoms at least once a week. Approximately 4 percent of the pilot study participants were positive for hepatitis-B surface antigen (HBsAg), and we have developed a plan to enroll them in a separate cohort of HBV carriers. In data obtained from twelve 24-hour dietary recalls, rural residents reported significantly lower intake of several food groups and nutrients, and intake of some vitamins was lower than the recommended values among rural dwellers and women. Average body mass index (BMI) in a subset of GCS participants was shown to be high; the prevalences of overweight (BMI ≥ 25) and obesity (BMI ≥ 30) were 64 percent and 28 percent, respectively19. Finally, it was shown that PAH exposure may contribute to the etiology of EC in the area20.
The pilot study interviewed 1057 study subjects, and a repeat interview was performed on 130 subjects two months after the first interview. The kappa statistics for agreement were above 0.7 for most variables, including tobacco, nass, opium and alcohol consumption, as well as for most self-reported gastro-esophageal symptoms. Two different methods were examined for estimating the temperature at which tea was usually consumed, and the method with the higher repeatability (kappa statistic = 0.71) was selected for use in the actual cohort. The validity of the questionnaire data about opium use was assessed in 150 subjects by comparing their questionnaire responses with the presence of codeine or morphine in their urine; the questionnaire responses had a sensitivity of 0.93 and a specificity of 0.89 for identifying subjects with these urinary opium metabolites21. There was also good agreement between self-reported current tobacco smoking or nass use and positive urinary cotinine. To validate the study FFQ, twelve 24-hour recall questionnaires (one every month) and four FFQs (one in each season) were administered to 131 participants during one year. There was good correlation between FFQ and recall data on food group and nutrient intakes, and there was acceptable correlation between questionnaire data and biomarker measurements14.
To examine the repeatability of the data collected in the actual cohort, we repeated the entire enrollment process, including interviews and sample collections, in 698 cohort participants from rural areas. The mean interval between the first and second enrollments was 45 months. There was very good agreement between data collected at the two interviews.
In the first four years of follow-up, the GCS has had minimal attrition, with very few subjects being lost to follow-up. Because of very good phone accessibility and a very low rate of emigration out of Golestan, the success rate of the follow-up during the first four years has been 99.8%. There have been 78,542 contacts for 42,095 participants [10,788 (26 percent) urban and 31,307 (74 percent) rural] over the first four years of follow-up. A total of 19,287 participants have had a single follow-up, 12,824 participants have been successfully contacted twice, and the remaining 9,984 participants have been contacted three (6,329) or four (3,655) times.
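The follow-up counts quoted above are internally consistent, which can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the variable names are illustrative.

```python
# Number of participants by number of successful follow-up contacts,
# as reported in the text.
participants_by_contacts = {1: 19_287, 2: 12_824, 3: 6_329, 4: 3_655}

total_participants = sum(participants_by_contacts.values())
total_contacts = sum(k * n for k, n in participants_by_contacts.items())
mean_contacts = total_contacts / total_participants

# Urban and rural shares of the participants contacted so far.
urban, rural = 10_788, 31_307
urban_pct = round(100 * urban / (urban + rural))
rural_pct = round(100 * rural / (urban + rural))

print(total_participants, total_contacts, round(mean_contacts, 2))
print(urban_pct, rural_pct)
```

The totals reproduce the 42,095 participants and 78,542 contacts reported, which corresponds to an average of just under two contacts per participant.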
There have been 252 incident cancers to date, of which 89 have been upper GI cancers. The total cancer incidence is equivalent to 321 per 100,000 person-years of follow-up. The site-specific distribution of these cancers is shown in Table 2.
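For readers who want to relate the quoted rate to the amount of follow-up, the arithmetic is sketched below. The person-years denominator is not reported in this section, so the value computed here is only what the two published figures (252 cancers, 321 per 100,000 person-years) imply; the function names are illustrative.

```python
PER = 100_000  # rates are expressed per 100,000 person-years


def incidence_rate(events: int, person_years: float) -> float:
    """Crude incidence rate per 100,000 person-years."""
    return events / person_years * PER


def implied_person_years(events: int, rate_per_100k: float) -> float:
    """Person-years of follow-up implied by an event count and a crude rate."""
    return events / rate_per_100k * PER


# Figures taken from the text: 252 incident cancers, 321 per 100,000.
person_years = implied_person_years(252, 321)
print(round(person_years))
```

The result is roughly 78,500 person-years, and passing this denominator back to `incidence_rate` returns the quoted 321 per 100,000.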
Table 2: Incident cancers during first three years of follow-up
Cancer | Number (%) | Rate per 100,000 person years |
Esophageal | 60 (24%) | <span style="font-size: 9pt; font-family: &# |