GWAS of outcome after SAH | Haemoglobin After inTraCranial Haemorrhage (HATCH) Consortium

An international multi-centre genome-wide association study of outcome after aneurysmal subarachnoid haemorrhage

The aim of this study is to identify genetic variations predicting outcome after aSAH. It is a collaboration between the HATCH Consortium, the International Stroke Genetics Consortium and independent investigators.

Background

Aneurysmal subarachnoid haemorrhage (aSAH) is associated with significant morbidity and mortality, and is the stroke type with the highest economic cost per capita (1). The best predictor of clinical outcome after aSAH is the World Federation of Neurosurgical Societies (WFNS) score (2), which reflects the clinical severity of early brain injury after aSAH. Yet this score explains only 12% of the variance in outcome (3), and multivariable predictive models that include other covariates account for less than 25% of the variance (3). There is some evidence suggesting that clinical outcome after aSAH has a substantial genetic component. Most genetic association studies to date have been hypothesis-driven, investigating loci with a plausible functional relationship to clinical outcome, such as haptoglobin, endothelial nitric oxide synthase, apolipoprotein E, brain-derived neurotrophic factor, and genes associated with fibrinolysis and inflammation (4). However, this approach is limited by our incomplete understanding of the biological mechanisms underlying recovery after aSAH, and cannot identify novel genetic factors and therapeutic targets. A systematic approach that studies the whole genome in an unbiased way overcomes this limitation and can reveal novel loci. While a genome-wide association study (GWAS) of aneurysmal rupture has been performed, no GWAS of clinical outcome after aSAH has been undertaken to date. GWAS design is now relatively straightforward and cost-effective, owing to the relative ease of forming international consortia and the availability of high-density genotyping arrays at low cost.

Hypothesis & Aim

We hypothesize that an individual's genetic background is a significant predictor of outcome after aSAH, and we aim to perform a GWAS of clinical outcome after aSAH. The primary outcome will be the Glasgow Outcome Scale (GOS) (5) or its extended version (GOSE) (6), and/or the modified Rankin Scale (mRS) (7,8), dichotomized into favourable and unfavourable.

Recruitment

Patients will have already been recruited, or are being recruited, at member sites of the HATCH and International Stroke Genetics consortia, and other collaborators worldwide.

Membership application is open to any investigator able to provide:

GWAS data, DNA or cellular samples
GOS(E) or mRS outcome data
Age
Evidence of institutional review board approval and patient consent

Patient inclusion criteria

All aSAH cases are eligible for inclusion. The aneurysmal nature of the SAH can be ascertained by any angiographic method. The National Institute of Neurological Disorders and Stroke (NINDS) Common Data Element (CDE) for SAH is C14244, and the permissible value will be saccular aneurysm. The International Statistical Classification of Diseases and Related Health Problems (ICD)-10 code is I60. Perimesencephalic, traumatic, dissecting and other causes of SAH are exclusion criteria.

Outcome data

The NINDS CDE for mRS is C13230, for GOS is C07193 and for GOSE is C07194.

Retrospective data

The minimum is GOS(E) and/or mRS at between 1 and 24 months.

Prospective data

GOS(E) and/or mRS between 1 and 24 months is the bare minimum. The preference for time point will be 6 months. If mRS and/or GOS(E) is also available at 12 months after ictus, this will also be useful. The preference for outcome measure will be as follows: (1) both GOS(E) and mRS; (2) mRS only; (3) GOS(E) only.

Covariates

The primary aim is not to explain maximum variance, as is conventionally done in predictive modelling, but to detect associations between genetic variation and outcome. We have therefore limited covariates to confounding variables, since adjusting for these is essential in establishing causality; a confounding variable is defined by a forward path linking the variable to both exposure and outcome. Directed acyclic graph theory has been used to rationalise the choice of covariates. Age and genetic ancestry are the only known variables satisfying this definition and will be included as essential covariates.

Additional covariates, if available, include WFNS grade, treatment modality (surgical/endovascular/conservative), time since SAH, rebleed and delayed cerebral ischaemia.

Data

Clinical outcome

The two most commonly used clinical outcome measures are the mRS and the GOS, and study sites are likely to be using one or the other. Outcome will be dichotomized into favourable and unfavourable, enabling both scales to be used (3). Only one scale, either mRS or GOS, will be used from each study site, i.e. mixed mRS and GOS data from a single study site will not be allowed. This will enable study site to be used as a covariate, adjusting for differential usage of the two scales across study sites.
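The dichotomization step can be illustrated with a minimal Python sketch. The protocol does not state the cut-offs, so the thresholds below (mRS 0-2, GOS 4-5 and GOSE 5-8 counted as favourable) are assumptions based on common convention, and the names `ASSUMED_FAVOURABLE`, `VALID_RANGE` and `dichotomize` are hypothetical.

```python
# ASSUMPTION: these cut-offs are NOT specified in the protocol; they follow
# a commonly used convention and are shown for illustration only.
# mRS runs from 0 (no symptoms) to 6 (dead), so lower is better.
# GOS runs from 1 (dead) to 5 (good recovery) and GOSE from 1 to 8,
# so higher is better.
ASSUMED_FAVOURABLE = {
    "mRS": lambda score: score <= 2,
    "GOS": lambda score: score >= 4,
    "GOSE": lambda score: score >= 5,
}

VALID_RANGE = {"mRS": (0, 6), "GOS": (1, 5), "GOSE": (1, 8)}


def dichotomize(scale, score):
    """Map a raw outcome score to 'favourable' or 'unfavourable'."""
    if scale not in VALID_RANGE:
        raise ValueError(f"unknown scale: {scale!r}")
    low, high = VALID_RANGE[scale]
    if not low <= score <= high:
        raise ValueError(f"{scale} score {score} outside {low}-{high}")
    return "favourable" if ASSUMED_FAVOURABLE[scale](score) else "unfavourable"
```

Because each site contributes a single scale, a call such as `dichotomize("mRS", 3)` would be applied uniformly within a site, and study site can then absorb any systematic difference between scales as a covariate.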

Data security

Data will be de-identified and stored centrally on a secure server with 256-bit encryption.

Genotyping

For cellular samples, DNA will be extracted using magnetic particle chemistry on a QIAsymphony SP platform (Qiagen). To minimize the amount of imputation needed to merge data from different arrays, all efforts will be made to genotype centrally on the same platform, unless GWAS data are already available or it is not possible to transfer DNA or cellular samples for governance or other reasons. The Infinium Global Screening Beadchip Array (Illumina) is currently one of the most cost-effective arrays, with around 640,000 markers and an inter-marker distance of 4.4kb, and has highly optimized multi-ethnic genome-wide content.

Quality control

The resulting raw genome-wide data will be subjected to standard quality control methods. Patients with a mismatch between recorded and genotype-inferred sex, individual missingness >5%, a heterozygosity rate more than 3 standard deviations from the sample mean, or cryptic relatedness (proportion of identity by descent >0.1875) will be excluded. Single nucleotide polymorphisms (SNPs) with extreme deviation from Hardy-Weinberg equilibrium, a minor allele frequency (MAF) <1%, or a SNP call rate <90% will be excluded. Population stratification will be assessed, but will be used as a covariate rather than a quality control measure in this multi-ethnic GWAS (see below).
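The exclusion rules above can be written down as a short Python sketch. This is an illustration under stated assumptions, not the consortium's pipeline: the record layout, the function names `filter_samples` and `snp_passes`, and the default Hardy-Weinberg threshold (borrowed from the imputation section, since none is given here) are all hypothetical. Relatedness is also simplified: any sample whose highest pairwise identity by descent exceeds the threshold is dropped, whereas real pipelines usually drop one member of each related pair.

```python
from statistics import mean, stdev


def filter_samples(samples):
    """Return the IDs of samples that pass the sample-level QC rules.

    `samples` is a list of dicts with the (hypothetical) keys:
    id, sex_match (bool), missingness (0-1), het_rate, max_pi_hat.
    """
    het_rates = [s["het_rate"] for s in samples]
    het_mean, het_sd = mean(het_rates), stdev(het_rates)
    kept = []
    for s in samples:
        if not s["sex_match"]:
            continue  # recorded sex disagrees with genotype-inferred sex
        if s["missingness"] > 0.05:
            continue  # individual missingness >5%
        if abs(s["het_rate"] - het_mean) > 3 * het_sd:
            continue  # heterozygosity outlier (more than 3 SD from the mean)
        if s["max_pi_hat"] > 0.1875:
            continue  # cryptic relatedness (simplified, see lead-in)
        kept.append(s["id"])
    return kept


def snp_passes(maf, call_rate, hwe_p, hwe_threshold=1e-10):
    """SNP-level QC. ASSUMPTION: hwe_threshold is not given for this step."""
    return maf >= 0.01 and call_rate >= 0.90 and hwe_p > hwe_threshold
```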

Imputation

Imputation may be needed to align data where GWAS data are already available. Whether or not this is needed, imputation will be performed using the Sanger imputation server to increase the density of coverage and enable fine mapping around significant loci. Haplotypes will be pre-phased using EAGLE2 and imputed against the Haplotype Reference Consortium (r1.1) panel, which is the largest reference panel of human haplotypes, using the positional Burrows-Wheeler transform (PBWT). Imputed genotypes will be quality controlled by excluding SNPs with a posterior probability less than 0.8, a minor allele frequency less than 5%, greater than 10% missing genotypes, or extreme deviation from Hardy-Weinberg equilibrium (p≤1x10^-10).
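To show what the Hardy-Weinberg criterion measures, here is a minimal Python sketch using a chi-square goodness-of-fit test with one degree of freedom. This is a swapped-in textbook method chosen for brevity; the protocol does not say which test will be used, and GWAS software often applies an exact test instead. The function names are hypothetical.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value for deviation from Hardy-Weinberg equilibrium.

    Takes observed genotype counts (AA, AB, BB) and compares them with the
    counts expected from the allele frequencies (p^2, 2pq, q^2).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic site: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def fails_hwe(n_aa, n_ab, n_bb, threshold=1e-10):
    """Apply the p <= 1x10^-10 exclusion rule stated for imputed genotypes."""
    return hwe_chi2_p(n_aa, n_ab, n_bb) <= threshold
```

A SNP with no heterozygotes at all, such as `fails_hwe(500, 0, 500)`, is the kind of extreme deviation the filter is meant to catch, since it usually signals a genotyping or imputation artefact rather than biology.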

Multi-ethnicity

This is a multi-ethnic GWAS, and spurious associations between genetic variants and clinical outcome may occur when the latter varies across ethnicities for non-genetic reasons and allele frequencies also vary in the same direction. In the past, GWAS have mainly focused on populations of European ancestry, but this is changing as the advantages of multi-ethnic GWAS are increasingly recognized. These include the ability to gather larger datasets and to detect associations for genetic variants that otherwise segregate at low frequency, both of which increase statistical power. Population stratification will be assessed by principal component analysis, using reference populations from the 1000 Genomes Project, and significant genetic ancestry eigenvectors will be used as covariates in logistic regression modelling.

Multi-stage design

A two-stage design will be used to minimize false positive results, maximize power and optimize cost. In stage I, 2500 samples will be tested for association using multivariable regression analyses with dichotomized clinical outcome as the dependent variable, genetic variant as the predictor, and the essential covariates included. SNPs for follow-up will be selected using a clumping procedure to generate a shortlist of index SNPs with support from correlated SNPs. Priority will be given to genome-wide significant SNPs (p<5x10^-8) (9). Additional allelic signals with suggestive levels of significance (p<1x10^-4) will be included based on their proximity (500kb upstream or downstream) to genes known to affect clinical outcome after aSAH (4). In stage II, SNPs selected for follow-up will be tested for association in 2500 independent samples. A fixed-effects inverse variance-weighted meta-analysis will be used to combine evidence from stages I and II and to determine the final significance and effect size.
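The follow-up selection rule and the fixed-effects inverse variance-weighted combination can be sketched in Python as follows. This is a minimal illustration, assuming per-SNP log odds ratios and standard errors are available from each stage; the function names and the representation of candidate genes as `(chromosome, start, end)` tuples are hypothetical, and the clumping step is not shown.

```python
import math


def select_for_followup(p_value, chrom, pos, candidate_genes, window=500_000):
    """Stage I selection: genome-wide significant SNPs, or suggestive SNPs
    within 500kb of a gene known to affect outcome after aSAH.

    `candidate_genes` is a list of (chromosome, start, end) tuples.
    """
    if p_value < 5e-8:
        return True
    if p_value < 1e-4:
        return any(
            c == chrom and start - window <= pos <= end + window
            for c, start, end in candidate_genes
        )
    return False


def fixed_effects_meta(betas, ses):
    """Combine per-stage log odds ratios by inverse variance weighting.

    Returns the pooled estimate, its standard error, the z statistic and
    the two-sided p-value from a normal approximation.
    """
    weights = [1 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1 / total)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p_value
```

With equal standard errors in the two stages, the pooled estimate is the simple average of the two log odds ratios and its standard error shrinks by a factor of the square root of two, which is why combining stages yields a smaller p-value than either stage alone when the effects agree. The pooled odds ratio is `math.exp(beta)`.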

Power calculation

The meta-analysis will combine evidence from 5000 samples, of which 30% are expected to have an unfavourable outcome (3). Based on this sample size and event rate, and using the GeneticsDesign package in R, we estimate that the meta-analysis will have 80% power to detect common SNPs (MAF=0.4) with an effect size of 1.33, and less common SNPs (MAF=0.1) with an effect size of 1.56, at a genome-wide level of significance (p<5x10^-8).

Biological relevance

The Consortium includes members with the appropriate expertise to conduct post-GWAS functional studies, to elucidate the role of identified genetic variants in the biological mechanism leading to clinical outcome after aSAH.

First, William Tapper’s group has bioinformatics expertise in functional annotation of candidate genes linked to significant sentinel SNPs and in silico prediction of function by interrogation of databases of histone methylation, hypersensitivity to DNase I, transcription factor binding sites, regulatory motifs, gene expression and expression quantitative traits.

Second, the laboratories of Sylvain Dore, Spiros Blackburn and Ian Galea have expertise in cellular and animal modelling of various aspects of haemoglobin neurotoxicity, clearance and inflammation. Hence the most significant GWAS variants with the highest predicted functional scores will be introduced, using lentiviral transfection or CRISPR/Cas9 technology, into immortalized cell lines of a lineage appropriate to the function of the gene. Wild-type and mutant cell lines will be exposed to an in vitro challenge appropriate to the function of the gene (e.g. inflammation, protection from redox stress or haemoglobin clearance).

Authorship

All contributors will be authors.

Current recruitment

For the stage I discovery analysis, the following samples have been recruited:

Recruitment site	Sample size
Southampton, UK	804
University College London, UK	817
Pittsburgh, US	180
Geisinger, US	66
Hallym, South Korea	90
Geneva, Switzerland	63
Utrecht, Netherlands	470
Total	2490

Recruitment is ongoing for the stage II validation analysis.

References

1. Taylor, T. N. et al. Lifetime cost of stroke in the United States. Stroke 27, 1459-1466 (1996).
2. Teasdale, G. M. et al. A universal subarachnoid hemorrhage scale: report of a committee of the World Federation of Neurosurgical Societies. J Neurol Neurosurg Psychiatry 51, 1457 (1988).
3. Jaja, B. N. R. et al. Development and validation of outcome prediction models for aneurysmal subarachnoid haemorrhage: the SAHIT multinational cohort study. BMJ 360, j5745, doi:10.1136/bmj.j5745 (2018).
4. Ducruet, A. F. et al. Genetic determinants of cerebral vasospasm, delayed cerebral ischemia, and outcome after aneurysmal subarachnoid hemorrhage. J Cereb Blood Flow Metab 30, 676-688, doi:10.1038/jcbfm.2009.278 (2010).
5. Jennett, B. & Bond, M. Assessment of outcome after severe brain damage. Lancet 1, 480-484 (1975).
6. Jennett, B., Snoek, J., Bond, M. R. & Brooks, N. Disability after severe head injury: observations on the use of the Glasgow Outcome Scale. J Neurol Neurosurg Psychiatry 44, 285-293 (1981).
7. Farrell, B., Godwin, J., Richards, S. & Warlow, C. The United Kingdom transient ischaemic attack (UK-TIA) aspirin trial: final results. J Neurol Neurosurg Psychiatry 54, 1044-1054 (1991).
8. Rankin, J. Cerebral vascular accidents in patients over the age of 60. II. Prognosis. Scott Med J 2, 200-215, doi:10.1177/003693305700200504 (1957).
9. Dudbridge, F. & Gusnanto, A. Estimation of significance thresholds for genomewide association scans. Genet Epidemiol 32, 227-234, doi:10.1002/gepi.20297 (2008).