Personalized genomic disease risk of volunteers
Manuel L. Gonzalez-Garay^a,1, Amy L. McGuire^b, Stacey Pereira^b, and C. Thomas Caskey^c,1
^aCenter for Molecular Imaging, Division of Genomics and Bioinformatics, The Brown Foundation Institute of Molecular Medicine, University of Texas Health Science Center, Houston, TX 77030; and ^bCenter for Medical Ethics and Health Policy, Department of Medicine and Medical Ethics, and ^cDepartment of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030
Contributed by C. Thomas Caskey, August 27, 2013 (sent for review July 11, 2013)
Next-generation sequencing (NGS) is commonly used for researching the causes of genetic disorders. However, its usefulness in clinical practice for medical diagnosis is in early development. In this report, we demonstrate the value of NGS for genetic risk assessment and evaluate the limitations and barriers for the adoption of this technology into medical practice. We performed whole exome sequencing (WES) on 81 volunteers, and for each volunteer, we requested personal medical histories, constructed a three-generation pedigree, and required their participation in a comprehensive educational program. We limited our clinical reporting to disease risks based on only rare damaging mutations and known pathogenic variations in genes previously reported to be associated with human disorders. We identified 271 recessive risk alleles (214 genes), 126 dominant risk alleles (101 genes), and 3 X-recessive risk alleles (3 genes). We linked personal disease histories with causative disease genes in 18 volunteers. Furthermore, by incorporating family histories into our genetic analyses, we identified an additional five heritable diseases. Traditional genetic counseling and disease education were provided in verbal and written reports to all volunteers. Our report demonstrates that when genome results are carefully interpreted and integrated with an individual's medical records and pedigree data, NGS is a valuable diagnostic tool for genetic disease risk.
molecular medicine | disease prediction | whole exome sequencing
Significance
Replacing traditional methods for genetic testing of inheritable disorders with next-generation sequencing (NGS) will reduce the cost of genetic testing and increase the information available for the patients. NGS will become an invaluable resource for the patient and physicians, especially if the sequencing information is stored properly and reanalyzed as bioinformatics tools and annotations improve. NGS is still at the early stages of development and it is full of false-positive and -negative results and requires infrastructure and specialized personnel to properly analyze the results. This paper will explain our experience with an adult population, our bioinformatics analysis, and our clinical decisions to assure that our genetic diagnostics were accurate to detect carrier status and serious medical conditions in our volunteers.
Author contributions: M.L.G.-G. and C.T.C. designed research; M.L.G.-G., A.L.M., S.P., and C.T.C. performed research; M.L.G.-G. analyzed data; and M.L.G.-G., A.L.M., S.P., and C.T.C. wrote the paper.
The authors declare no conflict of interest.
Freely available online through the PNAS open access option.
^1To whom correspondence may be addressed. E-mail: Manuel.L.GonzalezGaray@uth.tmc.edu or tcaskey@bcm.edu.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1315934110/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1315934110 | PNAS Early Edition
Sequencing the whole genome of patients with genetic disorders has become reality since the sequencing of the first individual human in 2007 (1). Further advances in massively parallel DNA sequencing are reducing the price of sequencing an entire genome or exome. The quality and speed of sequencing and analyzing a personal genome are improving at an unprecedented pace, making possible the introduction of next-generation sequencing (NGS) into the clinic on a research basis (2-7). Advancements in NGS have stimulated international research initiatives to identify genetic links to rare disorders in children, with an average diagnostic success of 20-25% and the discovery of new disease-gene associations (8-12).
The rapidly increasing number of aging adults in our society will place unprecedented demands on the health care system. To provide adults with a healthy longevity we need to develop a system to identify genetic risk and apply early intervention on pathology progression. In this report, we decided to sequence the whole exomes of a healthy adult cohort of 81 volunteers and evaluate the value of applying NGS in combination with medical history and pedigree data. In this report we plan to address three main questions. (i) What genetic discoveries need to be provided to the volunteers? (ii) What is the practical value of delivering this information to volunteers? (iii) What are the challenges and barriers to the adoption of this powerful technology into medical practice?
The individual genetic reports yield helpful medical risk information, suggesting that population sequencing of asymptomatic adults may prove to be valuable and useful. We provided to the participants, under our institutional review board, genetic risk findings from the analyses and genetic counseling to discuss their results.
Results
Categories of Variants to Report to Patients. Variants obtained from our workflow (described in Fig. 1) were reported using three categories. Our first variant category consists of variants identified in an individual where the alleles are found in the Human Gene Mutation Database (HGMD) (13, 14) and labeled disease-causing mutations (DM). These alleles also were required to be rare [<1% allele frequency in 6,500 exomes from the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (15) and the 1,000 Genomes Project (16, 17)] and predicted to be damaging to protein function by two of three prediction algorithms [Polyphen 2.0 (18), Sift (19-24), and MutationTaster (25)] using the Database of Human Non-synonymous SNVs and their functional predictions and annotations (dbNSFP) (26) as described in Fig. 2. The genome sequence data of each volunteer were reviewed and interpreted, taking into account personal medical history, a three-generation pedigree with family history of diseases, and bioinformatics analysis. The medical history of each volunteer in this cohort was rich with detail because each had a private physician used for annual examinations and, in some cases, disease therapy. Fig. 3 summarizes the results of our pipeline: we recruited 81 nonrelated volunteers and sequenced their genomic DNA using exome sequencing. We detected 65,582 unique nonsynonymous coding variants (nscv). Every nscv was interrogated for human inherited disease mutations using the HGMD (13, 14) database from Biobase (DM category consisting of 109,708 variations). We were able to detect 1,036 HGMD (13, 14) DM variations. After using the filters described in Fig. 2, the number was reduced to 275 pathogenic variants. We identified in our cohort 208 autosomal recessive (AR) alleles (169 genes), 64 autosomal dominant (AD) alleles (44 genes), and three X-linked recessive (XLR)
alleles (3 genes). These data resulted in an average of 3.5 disease allele reports per volunteer.
The approach for a second category of variants consisted of creating a personalized list of candidate genes from Online Mendelian Inheritance in Man (OMIM) (27, 28) known to be associated with the disorders reported in the medical literature. We detected 131 alleles (131 genes) using this approach. Each one of these variants provided a potential causation for the volunteer's disorders. Each one of the variations obtained from this approach passed our stringent pipeline. This approach added on average another 2.0 disease alleles per volunteer report.
The third approach used a family history to create a personalized list of candidate genes from OMIM (27, 28), and as before, we compared our list of candidate genes with the disorders reported in the family history.
Before reporting an allele to the volunteer, we reviewed the original publications that support the pathogenicity of all of the alleles (HGMD) and/or the evidence associating the gene with the disorder (OMIM). At this time, all three abovementioned categories of investigation were reported in full recognition that some would be found to be non-disease-producing alleles as databases improve and functional assays complement informatics predictions. We have updated clinical reports as these data emerged and counseled the patients on the options for reducing or eliminating the disease risk.
Disease Genes Identified in the Cohort. Table S1 summarizes our disease associations. Matching personal medical records to personal genome reports was informative. We elected to report findings as disease-gene associations instead of reporting findings as diagnostic because we did not include in our study traditional "surrogate markers" (analytes, proteins, and imaging) for the confirmation of a disease diagnosis. We considered potentially causative findings to be those mutations that are predicted to be damaging in addition to being reported in either HGMD (13, 14) or OMIM (27, 28) databases. These mutations are considered to be "need to know" and are reported to volunteers. There was identification of associations for vascular disease and/or hypercholesterolemia in five individuals related to LDL receptor (LDLR) alleles. LDLR mutations are causative of early onset autosomal dominant coronary artery disease (CAD) and manifest hypercholesterolemia (29, 30). Three individuals were taking statins related to their hypercholesterolemia. Two individuals were not under care but had history of personal hypercholesterolemia and in one case a son with hypercholesterolemia.
There were four volunteers detected with risk genes for diabetes mellitus (31-34). Two of the individuals were under therapy for type 2 diabetes, whereas two additional volunteers had elevated fasting blood sugars and were being followed by their physicians for further analyte measurements. There were two individuals with morbid obesity (body mass index of 32 and 37 kg/m²) who carried an MC4R allele associated with pediatric obesity and rare heterozygotic adults (35, 36). Two ophthalmologic disease/gene associations were identified. The childhood brittle corneal syndrome type 1 occurred in a volunteer who had undergone successful corneal transplant and carried a putative compound heterozygosity in ZNF469 (37). One volunteer was under care for macular dystrophy and carried an ABCA4 allele (38). One sterile male volunteer was found to have an insertion in gene USP26 (known to be responsible for infertility in men) (39). Associations for melanoma and breast cancer were identified. The two patients with melanoma carried different gene allele associations: GRIN2A and BAG4 (40-42). Two volunteers diagnosed with breast cancer had different allele associations in BRCA2 (43, 44). Single cases of early-onset prostate cancer (LRP2) (45) and follicular thyroid cancer (TPR) were identified (46, 47). A volunteer with nonsyndromic deafness was found to have risk alleles in two genes associated with autosomal dominant (AD) deafness and had a three-generation positive family history of deafness (48). In each case, the volunteer was instructed to inform their physician and was requested to confirm the genomic allele identification in a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory, even when each reported allele had been sequenced twice in independent studies. The finding provided information for personal and family risk counseling not possible before gene association.
Incorporation of Three-Generation Pedigrees into the Genetic Analyses. The three-generation pedigree medical information was analyzed to identify those volunteer families who warranted additional genetic study. Table S2 lists those genetic disorders identified by pedigree/familial medical history. In each case, the volunteer was counseled for the family risk and encouraged to contact at-risk family members who may benefit from focused genetic studies. Three of the families have reported that they have had their familial genetic diagnosis resolved at this time [paraganglioma (49), Prader-Willi syndrome (50, 51), and ankylosing spondylitis (AS) (52)]. One additional family is under study [Tourette syndrome (53)]. Additional familial disease risks were identified by history for atrial fibrillation (AR), bicuspid aortic valve (BAV), dyslexia (AR), Fabry's (XLR), gall stones (AD), and myotonic dystrophy (anticipation AD). Success with this approach was productive but not universally accepted because disease/gene resolution requires interaction with interested and motivated family members.
Fig. 1. Workflow for processing NGS data. Raw sequencing data are aligned against the reference sequence using Novoalign software from NovoCraft. SAM files are preprocessed using SAMtools and Picard to create BAM files and remove duplicates. The Genome Analysis Toolkit (GATK) is then used to recalibrate the alignments, perform local realignments, and identify SNPs and indels. Finally, SnpEff and ANNOVAR are used to annotate variants.
Fig. 2. Pipeline to generate variant reports. Every variant in the variant call format file is annotated using SnpEff and ANNOVAR; nonsynonymous coding variants are annotated using the commercial version of the HGMD database. (Left) Our selection of variants by the creation of a personalized candidate gene list using medical history and family history for each volunteer. Mutations with a minor allele frequency of >1% are removed using frequencies from the NHLBI Exome Sequencing Project (ESP) and the 1,000 Genomes Project. Variants that are considered benign by two of three prediction tools are removed (using dbNSFP). Finally, we remove variants that are present in our cohort more than three times.
Fig. 3. Summary of results. The flowchart provides the number of variants from each step of the pipeline described in Fig. 2. [Flowchart: 81 volunteers, exon sequencing; 65,582 NSC-SNPs; annotation using HGMD (109,708 annotated variants); 1,036 NSC-SNPs from HGMD; (A) 275 NSC-SNPs from HGMD after filtering and 160 NSC-SNPs from OMIM; medical and family history interpretation. (B) Medical history: 23 disease-gene associations. Family history: 4 resolved, 1 in progress. Negative history: 206 HGMD autosomal recessive (169 genes); 63 OMIM (non-HGMD) autosomal recessive (63 genes); 3 HGMD X-linked recessive (3 genes); 6 OMIM (non-HGMD) X-linked recessive (6 genes); 64 HGMD autosomal dominant (44 genes); 62 OMIM (non-HGMD) autosomal dominant (62 genes).]
Table S3 provides a sampling of the recessive risk alleles. They constitute the majority of the observed alleles. Of the 160 offspring of the 81 volunteers, no children were affected with these disorders. All volunteers indicated their families were complete, and thus, no spousal genetic studies were recommended, but information was proposed to be provided to reproductive age descendants. Many of the genes identified are part of prenatal carrier screens and/or newborn state-sponsored screening programs [phenylketonuria, maple syrup urine disease, cystic fibrosis, Niemann-Pick disease, Gaucher disease, factor V Leiden thrombophilia, medium-chain acyl-CoA dehydrogenase (MCAD) deficiency]. Undoubtedly, NGS will expand the number of non-unreported disease alleles and scope of genes studied for couples in the pregnancy setting. The Beyond Batten Disease Foundation of Austin, TX (54), has this goal.
Table S4 shows that a category of high concern was the identification of XLR disease risk alleles among our female volunteers. One volunteer had an affected son (isolated case) with Fabry disease that was diagnosed before our study. There were four disease alleles identified, each listed in HGMD (13, 14). There was no family history of these disorders found in the three-generation pedigree of each. All were counseled to have their test confirmed and daughters studied in a CLIA-certified laboratory given the high disease risk (50% for men). Three men in our study had alleles predicted from the OMIM (27, 28) disease database to be causative for cutis laxa, Duchenne muscular dystrophy, congenital nystagmus, and hemophilia A, illustrating the challenge of predicting damaging mutations bioinformatically. None had the disorders. Counseling and family study were individualized for each disease risk. Volunteers were made aware of database errors in the reports.
Tables S5-S10 provide a third category that is very problematic, the AD group. The allele identification is as previously described, but counseling is more difficult because of variation in severity and time of onset. For this age group of volunteers, the interest was high because disease prevention was frequently expressed as a goal in the face-to-face counseling meetings. A poststudy survey also reflected this objective. We focused in this paper on the three major causes of death in the United States: cancer, cardiovascular disease, and neurodegenerative disease. In our analysis of each volunteer, we reviewed the genomic and family data.
Table S5 lists the breast cancer risk results. There were 12 volunteers found to have breast cancer risk alleles of genes BRCA1, BRCA2, PALB2, RAD51C, and RAD50. Two volunteers with BRCA2 risk alleles were diagnosed with breast cancer. One man carried a premature chain termination mutation and has a first-degree relative with breast cancer (50s). A third volunteer had a frameshift mutation (high-risk allele) but was not found to have breast cancer. All alleles were predicted to be damaging. Eight volunteers had first-degree relatives with breast cancer, whereas four had a negative family history of disease. All were advised to seek confirmation via a CLIA-certified laboratory. One patient with an HGMD (13, 14) allele was confirmed but predicted to be "neutral" by a commercial laboratory. All were counseled regarding the need for regular mammograms and gynecological examinations and were requested to inform their physician of this research risk allele identification.
Table S6 displays the colon cancer alleles. There was no disease incidence of colon cancer in this group with the exception of one volunteer with a positive dysplastic polyp biopsy. Five volunteers had a positive family history of colon cancer. Five volunteers had no family history of disease. All were advised to obtain confirmatory CLIA-certified laboratory diagnosis and advise their physician of the research allele identification. Of the 10 volunteers, many had undergone colonoscopy as part of their health care.
Table S7 includes all of the remaining types of cancers. Two volunteers diagnosed with melanomas were found to have different disease gene risk alleles. We identified 10 volunteers with prostate risk alleles. One volunteer reported a diagnosis of prostate cancer at age 55, while the other nine volunteers reported no familial history of the disease. Genetic counseling for cancer risk required the greatest counseling time. The concepts of the two-hit hypothesis (55) and "somatic mutations" (56) were difficult to grasp for the volunteers, even when we discussed the subject in great detail during the education session. All volunteers were provided information regarding standard of practice approaches for early detection of the respective cancer.
Table S8 lists all of the affected volunteers with cardiomyopathies (57). Five volunteers had a medical history of cardiac dysrhythmia with identified risk alleles. One younger (50s) volunteer had first-degree relatives requiring pacemakers and carried two risk alleles. Three volunteers had either stent placements or bypass procedures related to CAD. Each was in their 70s.
Table S9 lists the 11 volunteers who had no apparent disease but had a positive family history of tachycardia, sudden death, and CAD and carried risk alleles. We provide this experience to broaden alertness to both genetic causation and risk of disease
for adult-onset cardiovascular disease (58). Of the alleles listed in Tables S8 and S9, 13 alleles were found in HGMD (13, 14). We advised volunteers to inform their physicians of these results for their long-term clinical care.
In Table S10, we listed the results for adult-onset neurodegenerative diseases. Our findings were limited but of high interest to the cohort. It was frequently asked by volunteers if they had Alzheimer's risk. We summarize our findings for Alzheimer's and Parkinson risk alleles (59, 60). The genes included APOE, APP, PSEN1, MAPT, EIF4G1, GBA, GIGYF2, LRRK2, PARK2, PM20D1, and SNCA. There were nine volunteers with HGMD (13, 14) listed risk alleles. Of these, two had a positive family history of Parkinson disease and one of Alzheimer's disease. One of the PARK2 alleles occurred in a volunteer who provided a history of three second-degree relatives in a sibship affected with disease. The remainder had no family history of either disease. There were 25 alleles predicted to be damaging. One is a frameshift allele. None of these volunteers had a family history of disease.
Discussion
Exome Sequencing Is Limited. The full spectrum of disease mutation identification is not satisfied by exome sequencing alone because large deletions, copy number variations (CNVs), and triplet repeats are not reliably identified at this time. Furthermore, exon capture relies on probe design. For example, the discovery of the MAGEL2 mutation in our Prader-Willi patient was made using whole genome sequencing (WGS) from Complete Genomics and missed by exome capture because of high GC content (51). The accuracy of coding allele identifications was, however, quite high and thus of great utility as a genome screening approach. CGI (61) sequencing produced higher coverage than exome sequencing, and its data for CNVs, large deletions, and regulatory elements will have utility as we analyze previously labeled "junk" DNA for disease causation (62). There is also the issue of our limited knowledge of disease alleles within the databases. One of our biggest challenges for the interpretation of human genomes is the lack of gene annotations and the errors in databases. Our knowledge base for human disorders is small. There are only ~100,000 pathogenic variants in the HGMD (13, 14) database and a fraction of them have errors. If we do not use annotated variants but instead gene annotations as our source of information, we can calculate the fraction of knowledge that we can use at this time. For example, the number of genes associated with human disorders reported by HGMD (13, 14), OMIM (27, 28), UniProtKB (63), Gene Atlas (64), etc. is 4,622. From the 4,622 genes, only 1,955 genes have high-quality data because they are part of the GeneTest (65) database. GeneTest (65) is a database originally created by the National Center for Biotechnology Information to track all of the laboratories worldwide that offer a genetic test for a gene. With this information, we know that the fraction of genes that we can use for the interpretation of a human genome of a successful high-quality whole exome or whole genome dataset is ~7-18% when using the high confidence set of 1,955 genes or a set of 4,622 genes. Despite these limitations, this report documents the utility for disease associations and risk.
During the last few years, the field of NGS has developed a large number of tools that make it easier to handle the analysis of reads, variant calling, functional prediction, and annotation (66). There are also large publicly available datasets of healthy individuals that can be used as controls to remove technology-specific errors or filter out common polymorphisms. As we begin to use whole genome sequencing at an increasing depth, we are discovering more variants, so these public datasets are becoming increasingly important for quality control and filtering of variants in smaller projects. One of the main limitations is the lack of access to public and private genome and exome variants. There are thousands of datasets, but the majority are inaccessible to the scientific community. We recognize the existence of the 1,000 Genomes project, the NHLBI Exome Sequencing Project (ESP), Exome variant server, and the 69 sets of whole genomes from CGI (15-17, 67). However, we need larger datasets from very carefully phenotyped patients to assist in the interpretation of the variants in our patients. The million genome project of the US Department of Veterans Affairs (68) has the potential to provide such data, as well as private health plans considering adaptation of genome sequencing.
Genetic Discoveries Provided to Volunteers. There are several approaches to disclose the results to volunteers. Groups like Patel et al. use the statistics and epidemiology approach in reporting the polygenic risk assessment using common SNPs that have been previously associated with genetic disorders from genome-wide association studies (69). The PGP-10 project uses an automated tool, the Genome Environment Trait Evidence (GET-Evidence) system, which is collaboratively edited (70). For this project, we decided to focus on reporting only high-quality variants that are rare in the population and considered damaging by two of three commonly used prediction algorithms. In addition, the variant has to be either reported in HGMD under category DM or the gene has to have been previously associated with a genetic disorder (OMIM). The group of volunteers consisted of adults with complete medical and family history, so we personalized the reports as described in Fig. 2 to specifically try to identify molecular explanations for the maladies reported in their medical or family history. This approach generated reports that were easy to explain and accepted by the patients during the genetic counseling session.
Medical Histories and Family Pedigrees Complement Sequencing Results. The utility of genome data was significantly enhanced when integrating standard medical care features of personal and family disease diagnosis. The significant number of 23 disease associations in all likelihood represents a bias of our volunteers to seek answers to their personal disease history. This observation may hold a key to how we obtain maximal use of genome sequencing: sequence the disease index cases. Our experience would suggest a high value for that utilization. This approach has been clearly documented to be successful for pediatric genetic disorders but not exploited for adult-onset disease. The practical value of this study is summarized in Tables S1 and S2 and fell into two general categories: (i) new knowledge of the genetic risk and heritability for themselves and family; and (ii) options for therapy (CAD) or imaging (cancer) for personal and extended family care. By using the medical and family history, we were able to clarify the genetic risk in 6 of the 81 cases. One of the cases yielded a new discovery of a gene associated with Prader-Willi syndrome, which is described in another paper (51).
Prenatal vs. Adult Genetic Screening. The technology and this report beg the question of whether we are prepared to offer adult disease risk screening. Currently, prenatal and newborn screening for a selected set of frequently occurring disease alleles (not genome sequencing) is a standard of practice. There are questions that deserve medical and ethical review before adult screening becomes a standard of practice. First, for reproductive and newborn diagnosis, typically only actionable childhood diseases are explored, which respects the future autonomy of the child and preserves her right to an open future (71, 72). Because adult screening decisions would be made by an autonomous individual for her own health decisions, broader conceptions of utility, including personal utility, need to be considered (73). It is a clear and simple decision to provide patients with actionable genetic information from a WES study; on the other hand, it is challenging, and it raises a difficult ethical question, to decide what to do with incidental genetic findings that are not actionable and could lead to psychological distress to the patient (e.g., APOE for Alzheimer disease). Despite this ethical dilemma, our group of volunteers elected to receive information even if the genetic information might not be actionable. Only 3% of the volunteers were uncertain about receiving nonactionable information (SI Poststudy Survey).
Volunteer Response to Clinical Reports. From our poststudy survey, we found that 72% of the responders reported speaking with their physician about their results. This raises important questions about whether nongeneticists are adequately prepared to counsel patients based on WES results and whether such follow-up will lead to iatrogenic harm or unjustified use of health care resources (74). Twenty-five percent reported changing their behaviors because of the results, which is surprising given that previous reports found no significant behavior change resulting from adult risk screening in a direct-to-consumer setting (75). Although all of the participants were clearly informed that their results originated from two independent sequencing experiments and were advised to have their results clinically validated in a CLIA-certified laboratory, 78% reported that they did not have the results confirmed. This low percentage of confirmatory results from the volunteers raises the question of whether it is sufficient to counsel research participants to have results clinically confirmed or if investigators should be required to confirm results before disclosure.
It was apparent for some volunteers that they were seeking information related to familial diseases. Resolution of these questions required family member interest and motivation because, in all cases, we had sequenced the nonrisk family member. We followed up each case with a referral to a qualified genetics program with diagnostic capacity for the suspected genetic disease.
Our efforts to analyze cancer, cardiovascular, neurodegenerative, and obesity/diabetes risk were successful but needed considerable education/counseling to avoid confusion over risk vs. diagnosis. Second, there are standard of care options for those with risk alleles for cancer, cardiovascular disease, and diabetes for disease modification or early diagnosis. Thus, sequencing serves as a new screening risk detection approach toward the objective of improved health. It is expected that genomic studies will increase surveillance studies (e.g., colonoscopy, gynecologic examinations, mammograms, cardiovascular markers and scanning studies) but has the possibility of more precisely identifying the patients who may benefit from disease prevention surveillance.
technology and cost. Bioinformatics focused on the practical extraction of medically relevant/actionable data is a challenge. We relied heavily on HGMD alleles for "need to know" information to patients. This approach is flawed in three ways: (i) databases contain errors; (ii) highly validated disease databases are scattered, private, and limited; and (iii) the future will provide more disease risk alleles by sequencing than by patient reports in the literature. Our current limitation for interpretation of a genome is not the quality of the data or the coverage of the genome but our disease knowledge database. R. Cotton's Human Variome Project (62) together with Beijing Genome Institute are proposing to create a highly validated disease allele database.
New technological advances such as structure-based prediction of protein-protein interactions on a genome-wide scale (80), 3D structure of protein active and contact sites (SI), high-throughput functional assays of damaging alleles (81-83), and new approaches that combine analytes, metabolomics, and genetic information from a single individual (84) are just a few examples of the new technologies that will help us to generate better interpretation of genomic data.
The delivery of the genome risk information will need to be carried out by a new cadre of physicians and counselors skilled in medicine, genetics, and education/counseling. These experts will need to integrate into medical care as well as has been done for newborn screening, prenatal diagnosis, and newborn genetic disease diagnosis.
The approach of adult screening is in its early phase but from our data appears very promising. We conclude that the genomic study of adults deserves intensified effort to determine if "need to know" genome information has the utility for improved quality of health for our aging population.
Materials and Methods
The oversight of this research was under two institutional review boards: (i) HSC-IMM-08-0641 (University of Texas Health Science Center at Houston) and (ii) H-30710 (Baylor College of Medicine).
Cohort Description. The cohort consists of members and spouses in the
The area of adult-onset neurologic disorders is an increasing Houston Chapter of the Young President Organization (YPO) (85). Theentire
concern worldwide as our population ages, thus exposing disease description of the cohort can be found in SI Materials and Methods.
incidence not seen earlier. The genetic disease discoveries are
limited. Confirmatory diagnostics such as image analysis and MS Sequencing. Standard NGS was performed using illumine HighSeq; an
biomarkers/surrogate markers are just emerging, and prevention extended explanation can be found in Materials and Methods.
therapeutic options are nonexistent. Although one might ques-
tion the utility of screening for these disorders at this time, the Sequencing Analysis. Fig. 1 illustrates OUf pipeline, and fig. 2 describes our
experience with Huntington disease (76) screening taught valu- pipeline to detect known pathogenic variations. Additional details can be
able lessons on how to proceed with studying and counseling found in Sf Materials and Methods.
families at risk. Furthermore, there are new therapeutic trials in
disease prevention for Alzheimer's (58) and Parkinson disease Counseing. Genome counseling was conducted by a board-certified internist
based on the genetic cause of disease. These clinical trials use and medical geneticist by both individual meetings and two written sum-
genetic diagnosis to select participants, which is also a successful maries over a period of 12 mo. Additional information can be found in SI
approach in cancer drug development (77-79). Materials and Methods.
Barriers to the Adoption of Genetic Screening via Sequendng. Al- ACKNOWLEDGMENTS. This work was supported by the Cullen Foundation
for Higher Education and the Governing Board of the Greater Houston
though the above comments would present the case for the value of Community Foundation. The funding organizations made the awards to the
adult genetic screening via whole genome sequencing, there are University of Texas Health Science Center at Houston and Baylor College of
major issues to be addressed. In our opinion, the least is sequencing Medicine. C.T.C. was the principal investigator of both grants.
1. Levy S, et al. (2007) The diploid genome sequence of an individual human. PLoS Biol 5(10):e254.
2. Bamshad MJ, et al. (2011) Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet 12(11):745-755.
3. Tabor HK, Berkman BE, Hull SC, Bamshad MJ (2011) Genomics really gets personal: How exome and whole genome sequencing challenge the ethical framework of human genetics research. Am J Med Genet A 155A(12):2916-2924.
4. Lander ES (2011) Genome-sequencing anniversary. The accelerator. Science 331(6020):1024.
5. Lander ES (2011) Initial impact of the sequencing of the human genome. Nature 470(7333):187-197.
6. Biesecker LG, Burke W, Kohane I, Plon SE, Zimmern R (2012) Next-generation sequencing in the clinic: Are we ready? Nat Rev Genet 13(11):818-824.
7. Hennekam RC, Biesecker LG (2012) Next-generation sequencing demands next-generation phenotyping. Hum Mutat 33(5):884-886.
8. Anonymous Finding of rare disease genes in Canada (FORGE Canada). Available at http://www.genomebc.ca/portfolio/projects/health-projects/finding-of-rare-disease-genes-in-canada-forge-canada/. Accessed September 19, 2013.
9. Gahl WA, et al. (2012) The National Institutes of Health undiagnosed diseases program: Insights into rare diseases. Genet Med 14(1):51-59.
10. Gahl WA, et al. (2012) The National Institutes of Health undiagnosed diseases program: Insights into rare diseases. Genet Med 14(1):51-59.
11. Gahl WA, Tifft CJ (2011) The NIH undiagnosed diseases program: Lessons learned. JAMA 305(18):1904-1905.
12. Koenekoop RK, et al.; Finding of Rare Disease Genes (FORGE) Canada Consortium (2012) Mutations in NMNAT1 cause Leber congenital amaurosis and identify a new disease pathway for retinal degeneration. Nat Genet 44(9):1035-1039.
13. Stenson PD, et al. (2012) The Human Gene Mutation Database (HGMD) and its exploitation in the fields of personalized genomics and molecular evolution. Curr Protoc Bioinformatics 39:1.13.1-1.13.20.
Gonzalez-Garay et al. PNAS Early Edition | 5 of 6
14. Stenson PD, et al. (2009) The Human Gene Mutation Database: 2008 update. Genome M. Ruel Let al. (2008) Impairment of SLC17A8 encoding vesicular glutamate transporter.
Med 1(1)13. 3, VGLUT3, underlies nOnSyndrOmk deafness DFNA2S and inner hair cell dysfunction
IS. Anonymous NHLBI exome sequencing project (ESP)exane variant server. Available at in null mice. Am .1 Hum Genet 83(2):278-292.
http:Nevsgswashington.edteEVSL Accessed September 19, 2013. 49. van Hulstelp LT, Dekkers OM, Mn Fl. Smlt 1W, Calmat EP 0012) Risk of malignant
16. Oarke L Zheng-Bradley X. et at 12012) The 1800 Genomes Project: Data management paraganglioma 1n 9211B-mutation and 50410mtnatiOn canals A systematic review
and canmunity access. Nat Methods 9(5)459-462. and meta-analysis./ Med Genet 49(12):768-776.
17. Abecasb GR. et al; 1000 Genomes Protect Consortium (2010) A map of human ge- 50. Pang Y, Tsal TF, Bressler J. Beaudet AL 11998) Imprinting in Angelman and Prader-
nome variation fran poptiation-scale sequencing. Nature 4670319):1061-1073. Willi syndromes. Cuss Opin Genet On B(3):334-342.
ILL Adzhubei La, et al. (2010) A method and server for predicting damaging missense SI. Schaaf CP, et al. (2013) Truncating mutations of MAGEL2 cause autism and erader-
mutations. Nat Methods 7(41:248-249. Willi syndrome (PWS) or PWS.like phenotypes. Nat Genet. In press.
19. Kumar P, Henikoff S, Ng PC (2009) Predicting the effects of coding nonsynonymous 52. Rashid T, (bringer A (2011) Gut-mediated and MLA-827-assoriated arthritis: An em-
variants on protein function using the SIFT algorithm. Nat Probst 40)1073-1081. phasis on ankylosing spondylitis and CrohNs disease with a proposal for the use of
20. Slm NL. Kumar P. et al (2012) SIFT web server: Predicting effects of amino acid sub- new treatment. DiSCOY hied 12(64):187-194.
stitutions on proteins. Nucleic Acids Re, 40(Web Saver issuckYV4S2-W457. 53. Deng H, Gao IC, lankovic 1(2012) The genetics of Tourette syndrome. Nat Rev Neural
21. Hu 1. Ng PC 8012) Predicting the effects of frameshdling lads. Genuine BIN 1342)119. 80)203-213.
22. Ng PC Henatoff S (2001) Predicting deleterious amino acid substitutions. Gnome Re, 54. Anonymous Beyond Batten Disease Foundation. Available at httrabeyonSatten.
11(5)1163-874. orgy. Accessed September 19,2013.
23. Ng PC Henikoff S 0003) 5SF: Predicting amino acid changes that affect protein 55. Knudson AG (1996) Hereditary cancer: Two hits revisited. Cancer ReS Cen Onttif
function. Nucleic Acids Re, 31(13):3812-3814. 122(3):135-140.
24. Ng PC. Henikoff 5 (2006) Predicting the effects of amino acid substitutions on protein 56. Milt-Zaino, S. et al; Breast Cancer Working Group of the International Cancer Genome
function. Anne Rev Genomics Num Genet 7:61-80. Consortium (2012) The life history of 21 breast cancers. CeN 149(5)394-1007.
25. Schwarz 109, Rodelsperger C Schuelke NI, Seelow LI (2010) MutationTaster evaluates 57. Alcalai R, Seidman /G, Seidman CE (2008) Genetic bash of hypertrophic cardiony
thseasecausMg potential of sequence alterations. Nat Methods 7181:575-576. apathy from bench to the clinics. / Carthovasc EintrOphydol 1901:104-110.
26. Liu X. Nan X. Boer-winkle E (2011) dbNSFP: a lightweight database of human non- 58. Rader 01. Cohen 1. Hobbs NH (2003) Monogenk hypercholesterolemla New insights
synonymous SNPs and their functional predictions. Man Mutat 32(8)490499. in pathogenesis and treatment. Gin Invert 111(12)179S-1801.
27. Anonymous Online Mendelian Inheritance in man 0M61. Available at httpllornimorg 59. Martin I. Dawson VL Dawson TM 12011) Recent advances In the genetics of Parkin-
Accessed September 19,2013. son's disease. Anna Rev Genomics Mum Genet 12:301-325.
21. Anonymous NCBI OMIM Online Mendelian Inheritance in Man. Available at httpli 60. Selkoe D1 (2012) Preventing Alzheirner's disease. Science 337(6100:1488-1492.
www.ncbLnlanih.govlornim. Accessed September 19. 2013. 61. Anonymous Complete Genomics Inc. Available at Mbyfernwr.ownpletegenomks.
29. Huijgen K Kindt I, Defesche 1C, Kastelein II (2012) Cardiovascular risk in relation to can. Accessed September 19,2013.
functionality of sequence variants in the gene coding for the low-density koprcrtein 62. Anonymous Human varlome project. Available at httplFwenv)umanvarlomeprOjea.
receptor: A study among 29.365 iedwolva tested for 64 specific low-density lipo- Org. Accessed September 19. 2013.
protein-receptor sequence variants. Cur Heart 133(181:2325-2330. 63. Anonymous UniProtKB. Available at http://wnw.uniprotorgtuniprot. Accessed
30. Boekhoktt 5M. et al. (2012) ASSOciateell of LDt cholesterol, non.HDL cholesterol and September 19.2013.
aPoliPoprotein B levels with risk of cardiovascular events among patients treated 64. Anonymous Gene atlas. Available at hnp:Nmws.geneatias.orgIgenelmain.jsp. At-
with statins: A meta-analysis. JAIAA 307(12k1302-1309. temod September 19, 2013.
31. Waeber G. et al. (2000) The gene MAPKINPI. encoding islet.bran-I, is a candidate for 65. Anonymous Ger** TeStIlla Registry (GeneTesis). Available at http/Ave.w.geneteStS.
type 2 diabetes. Nat Genet 24(3)291-295. org. Accessed September 19,2011
32. Mosta L el al. (2011) Genetic variability of the fructosamme 3-kinase gene in diabetic 66. Anonymous stganswers. Available at httDINSeganswert con. Accessed September 19.
patients. CM Chem Lab Med 41(5):803-808. 2013.
33. da Silva Xavier G, et al. (2011) Per-arntsim (PM) domaM-containing protein kinase is 67. Anonymous 69 genornes data. Ausilable at httpininwtcornpletegenomicscorn/public-
downregulated In human Islets in type 2 diabetes rid regulates gluCagOn secretion. datae69-Genoinest Accessed September 19. 2013.
Diabetobgia 54(4)219-827. 68. Anonymous The million veteran program. Available at http:Nvnwr.va.gmstopcsipresteir
34. MacDonald PE, Rottman P (2011) Per-amt.sim (PAS) domain kinase (PAW as a reg. pressrekrze.chraid-2090. Accessed September 19,2013.
uLatOr of glucagon secretion. Diabetologia 54(4):719-721. 69. Patel C.), et at. 12013) Whole genome sequencing In support of wellness and health
35. Oltahilly S (2009) Human genetics ilurninates the paths to metabolic disease. Nature maintenance. Gramme Med 5(6):58.
462(7271)307-314. 70. Ball MP, et at. 0012) A public resource facilitating clinical use of genomes. fl oc Nate
36. van did Berg L et al. 12011) Melanocordn-4 receptor gene mutations In a Dutch Aced 56 LISA 109(30)11920-11927.
cohort of obese children. Obesity (Silver Spring) 19(3)400-611. 71. American Academy of Pediatrics Committee on Bioethics (2001) Ethical issues with
37. Al-Owain M. A1.Doseri MS. Sunker A. Shuaib T. Alkuraya FS (2012) Identification of genetic testing In pediatrics. Pediatrics W7(61:1451-1455.
a novel ZNF469 mutation in a large family wit' s Ehlen.Danlos phenotype. Gene 72. Oasis OS (1997) Genetic dilemmas and the child's right to an open future. Hastings
S11(2k497-430. Cent Rep 27(2):7-15.
38. Fritsch. LG, et al, (2012) A subgroup of age-related macular degeneration Is emaci- 73. Wolf SM, Lawrenz W. et at (2008) Managing Incidental findings In human subjects
ated with mono-allelic sequence variants in the ABCAO gene. Invest Ophthalmol Vin research: Analysis and recommendations./ Law Med Ethic 36(2)219-24B.
Sal 53(4):2112-2118. 79. McGuire At. Burke W (ZOOM An unwelcome side effect of direCt4O-COMumer per-
39. %hang 1, et al. (2012) IPOtyrnerphism of Usp26 correlates with Idiopathic male In- sonal genome testing: Raiding the medical commons. /AMA 300(22):2669-2671.
fertaityl. Ihonghua Nan Ke Xue 18(2)10S-10B. 75. Blois CS, Scheele N), Topol 61 (2011) Effect of direct-to-consumer gencenewide pro-
40. Wel X. et al.; MSC Comparative Sequencing Program (2011) Ellen* sequencing filing to assess disease risk. N Enloe I Med 364(6):524-534.
identifies GRIN2A as frequently mutated in melanoma. Nat Genet d3(5)A42-446. 76. Wexler NS (2012) Huntington's disease: Advocacy driving science. Annu Rev Med 63:
91. Howell PM, Jr. Li X, Riker AI, )G Y (2010) MicroRNA in melanoma. °droner J 10(2k 1-22.
83-92. 77. Caskey CT (2007) The drug develeprnent crisis: Efficiency and safety. AMIN Rev Med
92. Xi V, et al. (2008) Global comparative gene expression analysis of melanoma patient 5a,1-16
samples. derived <es lines and corresponding turner xenografts. Canter Genomics 78. Casket, CT ( 2010) Using genetic diagnosis to determine Individual therapeutic utility.
Proteomks 50):1-35. Annu Rev Med 61:1-15.
Q. Nelson HO, Huffman LH, Fu R, Harris EL; U.S. Preventive Services Task Force (2005) 79. Miller G (2012) Alzheimer's research. Stopping Alzheimer's before it starts. Science
Genetic risk assessment and BRCA mutation testing for breast and ovarian cancer 337(6096):790-792.
susceptibiky: Systematic evidence review for the V.S. Preventive Services Task Force. 80. Mang GC et al. (2012) Structure-based prediction of protein-protein interactions on
Ann intern Med 143(5):362-379. a genome.wicte scale. Nature 490(7421):556-560.
44. Anonymous National Cancer Institute BRCA1 and BRCAZ. Available at httplAwm. 81. Edwards AM. BounVa C Kerr DJ, Wilhon TM (2009) Open access chemical and algal
cancer.govkancertopiatfactsheet/RiskttIRCA. Accessed September 19,2013. probes to support drug discovery. Nat Chem Rio! 50):436-490.
45. Holt SIC. et al. (2008) ASSO0atiOn of megalin genetic polymorphism with prostate 82. Maroon( MT. Jarvis BM. Donnelly-Roberts D (2012) High throughput functional assays
cancer risk and prognosis. CM Cancer ReS 14(12):3823-3831. for P2X receptors. Cliff Protocol Phannaca lumChapter 9:Unit 9.15.
96. Frank.Raue K, et al. (2013) Prevalence and clinical spectrum of nonsecretoni medul- 83. Trivedi 5, Liu /, Liu R. Bostwick R (2010) Advances in functional assays for high.
lary thyroid carcinoma In a series of 839 patients with sporadic medullary thyrOld thrOughput greening of ion thannelstargets Expert Opal Ono) Gismo 5(I 0)1995-I C06.
carcinoma. Thyroid 23(3):294-300. 89. Suhre K. et at; CARDloGRAM (2011) Human metabolic individuality in biomedical and
97. Mak HH, et aL (2007)Oncogenic activation of the Met receptor tyrosine kinase fusion pharmaceutical research. Nature 477(7362):54-60.
protein, Ter-Met. Involves exclusion from the endocytic degradative pathway. On- 85. AnCelyMOW Membership criteria YPO. Available at httinivnwrypo.orgdoin.ypor.
cogene 26(51k7213-7221. Accessed September 19,2013.
6 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1315934110 Gonzalez-Garay et al.
Supporting Information
Gonzalez-Garay et al. 10.1073/pnas.1315934110
SI Materials and Methods
Cohort Description. The cohort consists of members and spouses in the Houston Chapter of the Young Presidents Organization (YPO). Criteria for membership in the YPO include corporate and community leadership (1). This cohort is well educated and of higher socioeconomic status. All 450 YPO members were invited to attend an 8-h educational program incorporating technology, human genetics, anticipated outcomes, ethical considerations, discussion groups, technology demonstrations, and printed materials. Of the 150 attendees, 81 volunteered to participate in this study: 46 men and 35 women, with an average age of 54 y. All 81 elected under the terms of the University of Texas Health Science Center at Houston's institutional review board to receive "need to know" genomic disease risk results. Each volunteer provided a detailed medical and drug use history reviewed by our physician-researcher (C.T.C.). A three-generation medical pedigree was acquired on each volunteer. One volunteer could provide no family history.

Whole Exome Sequencing (WES). Genomic DNA was extracted using a DNA kit (Promega Wizard Genomic DNA Purification Kit) following Promega's instructions (2). The cohort was sequenced twice: the first whole exome sequencing experiment (2011) was performed using Illumina's HiSeq and the Genome Analyzer IIx system (3) after enrichment with the Nimblegen V2 kit (44 Mb) (4) (outsourced to the National Center for Genome Resources). Our second WES experiment (2013) was performed using Illumina's newest machine, the HiSeq 2500 (3), after enrichment with Agilent SureSelect target enrichment V5+UTRs (targeting coding regions plus UTRs) (5) (outsourced to Axeq Technologies). Genome sequencing of a small subset (24 subjects) for validation purposes was carried out by Complete Genomics Inc. (CGI) (6).

Sequencing Analysis. Our analysis pipeline consists of Novoalign (7), SAMtools (8), Picard (9), and The Genome Analysis Toolkit (GATK) (10), followed by variant annotation (11-14) using multiple databases from the University of California Santa Cruz (UCSC) Genome bioinformatics site (15). Fig. 1 illustrates our pipeline. Fig. 2 describes our pipeline to detect known pathogenic variations. We detected known variants associated with human diseases using the Human Gene Mutation Database (HGMD) from Biobase (16, 17) and genes known to be associated with human disorders from Online Mendelian Inheritance in Man (OMIM) (18, 19) and GeneTests (20). Functional effects of each nonsynonymous coding variant were evaluated using three different functional prediction algorithms [Polyphen 2.0 (21), SIFT (22-27), and MutationTaster (28)] using the Database of Human Non-synonymous SNVs and their functional predictions and annotations (dbNSFP) (29). Filtration of common polymorphisms was accomplished using frequencies from the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) (30) and 1000 Genomes (31, 32), and internally by removing any variant that appeared more than three times in our cohort. In addition, a group of candidate genes was obtained from OMIM (18, 19) for each volunteer after a careful analysis of the family and personal health history of each volunteer. Variations in those OMIM (18, 19) candidate genes were identified and submitted to the same frequency and functional effects filter as described before.

Variant Validation. Every variant identified in our pipeline was evaluated for quality control, and the variant's read alignments in the BAM file [binary version of a SAM (Sequence Alignment/Map) file] were visualized using Integrative Genomics Viewer (IGV) (33). The purpose of this step was to try to remove the remaining false positives.

Each genetic variant was validated using the following steps: (i) retrieve reads over variant sites for each individual; (ii) make SAMtools (8) genotype calls (an alternate calling algorithm); (iii) retrieve quality scores for all reads; (iv) keep track of the directional depth and require at least two variant reads in the 5' and 3' orientation for a variant to be considered true; and (v) filter out variants if the SAMtools (8) genotype call disagrees with the GATK (10) call or if the quality scores or directional depth values do not exceed minimum values.

Establishing Criteria for Highly Reliable Variant Calling from Exome Sequencing. Our first objective was to define the methods needed to identify a set of "highly reliable" variants from the Illumina sequencing and apply these methods to variant calling on all of our samples. To meet our definition of a highly reliable variant, each variant had to be detected by two independent orthogonal sequencing technologies and be considered high quality. Because there is no common definition of what a high-quality variant is, we decided to take advantage of the confidence category scores provided by Complete Genomics, in which variants with a score of VQHIGH are considered high quality (masterVarBeta files version 2.0), and to develop an equivalent value in our Illumina sequencing data. To accomplish our first objective, a dataset of variants was generated from a set of 24 samples that we sequenced using Illumina (3) and an orthogonal sequencing technology (CGI) (6). CGI has its own proprietary workflow from alignment to data annotation (34). Fig. 1 describes our analysis workflow for exome sequencing data. Fig. S2A shows the intersection between the nonsynonymous coding variants (NSCVs) detected by CGI (6) and Illumina (3) exome sequencing. We extracted variants from CGI that had a score of VQHIGH and were also detected in the corresponding Illumina VCF file (Fig. S2B). This subset of highly reliable variants represents an average of 72% of the variants detected by CGI. By using our dataset, we were able to systematically test for conditions and software settings in our pipeline that generate the majority of the highly reliable variants and reduce the probability of selecting variants not present in our dataset. We reached the conclusion that by using two variant-calling tools, GATK UnifiedGenotyper and mpileup/bcftools (SAMtools), and selecting the overlapping set of variants, we obtained variants of the highest quality. In addition, a postcalling filter enforces that each variant has to have a mapping quality >30, a base quality >20, and a coverage >10, with at least a 3:7 ratio of variant to reference (Het) and the presence of the variant in reads from both orientations. By using these postcalling filters, we eliminated the majority of false-positive calls (FP).

Counseling. Genome counseling was conducted by a board-certified internist and a medical geneticist by both individual meetings and two written summaries over a period of 12 mo. The summary reports were prepared and jointly endorsed by a bioinformatician and a physician. Additional counseling was conducted by phone calls and appointments with their physician as requested by the volunteers.

Counseling of Results. Both causative and problematic alleles were reported verbally and in two written reports over an 18-mo period.
Gonzalez-Garay et al. www.pnas.org/cgi/content/short/1315934110 1 of 8
The first comprehensive report was updated ∼1 y after (i) larger control databases downgraded some problematic alleles with more than a 1% frequency; (ii) private consultation with disease experts; and (iii) validation with original publications and small disease center databases. Several new disease-gene associations were discovered for the reported familial diseases found by pedigree and personal medical histories. Volunteers were informed that these were research results and instructed to consult with their personal physician so that they could have the results validated in a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory. Volunteers whose family members warranted genetic study were referred to the Baylor College of Medicine genetics program as a medical referral because this function was outside the institutional review board scope and Baylor College of Medicine offered both clinical genetic and CLIA laboratory expertise. Our study preceded the publication of the incidental findings guidelines in clinical WES and whole genome sequencing (WGS) of the American College of Medical Genetics and Genomics (ACMG) (35). However, we have reviewed their list of 57 genes and 24 actionable conditions, and we found that we included all their genes in our analysis.

Poststudy Survey
We conducted an online survey to assess volunteers' experiences of participating in this project under a Baylor College of Medicine institutional review board. The survey consisted of 82 items and focused on how the volunteers felt about taking part in the research project, as well as their perspectives on genetic information in health care and genomic research in general. Study participants were told the survey was completely voluntary and that they could skip any question they preferred not to answer and could end their participation at any time.

All 81 study volunteers were invited via e-mail to participate in the anonymous online survey within 12 mo after receiving their individual genome reports. Forty-two participants responded to the online survey (response rate, 51.9%; 38 responses were complete). Of those who responded, 59% were men, 41% were women, and 95% had biological children. Ninety-seven percent described their race as white, and 5% chose "other" (participants could choose all that applied); 5% also identified themselves as Hispanic or Latino. All participants had earned a college degree, and 63% had completed at least some graduate work. All participants reported having had a routine medical check-up within the last 2 y, and when asked how they would rate their health, 58% reported excellent, 29% reported very good, 11% reported good, and 3% reported fair.

Poststudy survey results. The objective of this study was to deliver helpful medical genetic information. The mandatory education program informed volunteers that unexpected risks were to be expected. Our institutional review board required volunteers to have the option of declining this information. None chose that option.

The results of the anonymous online survey showed that, overall, participants were motivated to take part in the project to receive their genetic results and learn about their personal risk of disease. Seventy-nine percent of respondents reported that the opportunity to receive their personal genetic results was the most important factor in their decision to take part in the project, whereas another 10% cited a personal interest in genetics in general. When asked to choose which factor was most important in their decision to receive their personal genetic results, most respondents (52%) reported that their interest in finding out their personal risk for diseases was the most important factor; other important factors included the desire to get information about risk of health conditions for their children (17%), the desire to learn more about the medical conditions in their family (10%), and curiosity about their genetic makeup (10%).

Ninety-seven percent of respondents agreed or strongly agreed that they were glad that they decided to participate in this study and receive their personal results, leaving only 3% undecided. Most respondents (72%) spoke with their primary care provider about their results, and 50% reported that they spoke with other medical professionals, including cardiologists, oncologists, and obstetricians/gynecologists, among others; 22% reported that they had their twice-confirmed research results confirmed in a CLIA-certified laboratory.

Twenty-five percent of respondents reported that the test results motivated them to make changes to their health care (i.e., undergoing tests, seeing a specialist, taking vitamins or herbal supplements), exercise, medications, or insurance (Table S11). Respondents generally felt that researchers should offer personalized results to research participants: 54% felt that researchers are obligated to offer results, 22% felt that researchers are obligated to offer results only if the researcher is a physician, and the remaining 24% did not think researchers were obligated to offer results. Respondents were pleased with the methods by which they were given their results in this study, with 95% agreeing or strongly agreeing that they were glad the researchers sent them a personalized results report, and 100% agreeing or strongly agreeing that they found the in-person consultation about their results very helpful. When asked, 94% said they would also want an electronic record of their entire genome if it were available.

When asked about genetic testing in health care, 83% reported that they felt that genetic testing should be a regular part of health care and 97% agreed or strongly agreed that they felt comfortable using these results to make decisions about their health. Nevertheless, respondents were evenly split when asked if they thought these results should be part of their medical record.

In summary, our poststudy surveys indicated that volunteers were motivated to gain personal and family health knowledge, satisfied with the translation of the genetic information, and had a divided opinion about incorporating their genetic information into their medical records.
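
As an illustration of the postcalling criteria described under "Establishing Criteria for Highly Reliable Variant Calling from Exome Sequencing," the following minimal Python sketch applies the stated thresholds (mapping quality >30, base quality >20, coverage >10, at least a 3:7 ratio of variant to reference reads, and variant reads in both orientations) and keeps only calls made by both callers. It is a sketch under stated assumptions, not the pipeline used in the study: the VariantCall record, its field names, and both function names are hypothetical, and reading the 3:7 ratio as variant reads / reference reads >= 3/7 and coverage as reference plus variant reads is an assumption as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    """Per-site summary for one sample (hypothetical structure for illustration)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    mapping_quality: float
    base_quality: float
    ref_reads: int
    alt_reads_forward: int
    alt_reads_reverse: int

    @property
    def key(self):
        # Identity of the variant, used to intersect the two call sets.
        return (self.chrom, self.pos, self.ref, self.alt)

    @property
    def alt_reads(self):
        return self.alt_reads_forward + self.alt_reads_reverse

    @property
    def coverage(self):
        # Assumption: coverage counts reference plus variant reads only.
        return self.ref_reads + self.alt_reads


def passes_postcalling_filter(call, min_reads_per_strand=1):
    """Apply the thresholds quoted in the text to a single call."""
    if call.mapping_quality <= 30:
        return False
    if call.base_quality <= 20:
        return False
    if call.coverage <= 10:
        return False
    # At least a 3:7 variant-to-reference ratio, cross-multiplied so that
    # a site with zero reference reads does not cause a division by zero.
    if call.alt_reads * 7 < call.ref_reads * 3:
        return False
    # The variant must be seen in reads from both orientations.
    return (call.alt_reads_forward >= min_reads_per_strand
            and call.alt_reads_reverse >= min_reads_per_strand)


def overlapping_reliable_calls(gatk_calls, samtools_calls):
    """Keep calls reported by both callers that also pass the filter."""
    samtools_keys = {call.key for call in samtools_calls}
    return [call for call in gatk_calls
            if call.key in samtools_keys and passes_postcalling_filter(call)]
```

Setting min_reads_per_strand=2 would correspond to the stricter validation step that requires at least two variant reads in each orientation.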
1. Anonymous Membership criteria YPO. Available at http://www.ypo.org/join-ypo/. Accessed September 19, 2013.
2. Anonymous Wizard. Available at http://www.promega.com/resources/protocols/technical-manuals/0/wizard-genomic-dna-purification-kit-protocol/. Accessed September 19, 2013.
3. Anonymous Illumina. Available at http://www.illumina.com. Accessed September 19, 2013.
4. Anonymous NimbleGen Roche. Available at http://www.nimblegen.com/products/seqcap/ez/v2/index.html. Accessed September 19, 2013.
5. Agilent Technologies Agilent SureSelect array. Available at httpawnvgenantitS.2114M,comientaorroSequencingSureSelect.Human.All-ExonScat740002&tabickAGPf6 1206. Accessed September 19, 2013.
6. Anonymous Complete Genomics Inc. Available at http://www.completegenomics.com. Accessed September 19, 2013.
7. Novocraft.com (2012) Available at http://www.novocraft.com. Accessed September 19, 2013.
8. SAMtools. Available at http://samtools.sourceforge.net/. Accessed September 19, 2013.
9. Picard. Available at http://picard.sourceforge.net/. Accessed September 19, 2013.
10. McKenna A, et al. (2010) The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20(9):1297-1303.
11. Cingolani P snpEff: SNP effect predictor. Available at http://snpeff.sourceforge.net/. Accessed September 19, 2013.
12. Cingolani P, et al. (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6(2):80-92.
13. San Lucas FA, Wang G, Scheet P, Peng B (2012) Integrated annotation and analysis of genetic variants from next-generation sequencing studies with variant tools. Bioinformatics 28(3):421-422.
14. Wang K, Li M, Hakonarson H (2010) ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38(16):e164.
15. Kuhn RM, Haussler D, Kent WJ (2013) The UCSC genome browser and associated tools. Brief Bioinform 14(2):144-161.
16. Stenson PD, et al. (2012) The Human Gene Mutation Database (HGMD) and its exploitation in the fields of personalized genomics and molecular evolution. Curr Protoc Bioinformatics.
17. Stenson PD, et al. (2009) The Human Gene Mutation Database: 2008 update. Genome Med 1(1):13.
Gonzalez-Garay et al. www.pnas.org/cgi/content/short/1315934110
18. Anonymous NCBI OMIM Online Mendelian Inheritance in Man. Available at http://www.ncbi.nlm.nih.gov/omim. Accessed September 19, 2013.
19. Anonymous Online Mendelian Inheritance in Man OMIM. Available at http://omim.org. Accessed September 19, 2013.
20. Anonymous Genetic Testing Registry (GeneTests). Available at http://www.genetests.org. Accessed September 19, 2013.
21. Adzhubei IA, et al. (2010) A method and server for predicting damaging missense mutations. Nat Methods 7(4):248-249.
22. Kumar P, Henikoff S, Ng PC (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4(7):1073-1081.
23. Sim NL, et al. (2012) SIFT web server: Predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 40(Web Server issue):W452-W457.
24. Hu J, Ng PC (2012) Predicting the effects of frameshifting indels. Genome Biol 13(2):R9.
25. Ng PC, Henikoff S (2001) Predicting deleterious amino acid substitutions. Genome Res 11(5):863-874.
26. Ng PC, Henikoff S (2003) SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 31(13):3812-3814.
27. Ng PC, Henikoff S (2006) Predicting the effects of amino acid substitutions on protein function. Annu Rev Genomics Hum Genet 7:61-80.
28. Schwarz JM, Rödelsperger C, Schuelke M, Seelow D (2010) MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods 7(8):575-576.
29. Liu X, Jian X, Boerwinkle E (2011) dbNSFP: A lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat 32(8):894-899.
30. Anonymous NHLBI Exome Sequencing Project (ESP) exome variant server. Available at http://evs.gs.washington.edu/EVS/. Accessed September 19, 2013.
31. Clarke L, Zheng-Bradley X, et al. (2012) The 1000 Genomes Project: Data management and community access. Nat Methods 9(5):459-462.
32. Abecasis GR, et al.; 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467(7319):1061-1073.
33. Robinson JT, et al. (2011) Integrative genomics viewer. Nat Biotechnol 29(1):24-26.
34. Complete Genomics (data file format standard pipeline version 2.0). Available at http://www.completegenomics.com/customer-support/documentation/100357139.html. Accessed September 19, 2013.
35. Green RC, Berg JS, et al. (2013) ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing. Genet Med 15(7):565-574.
[Figure S1: plot of genes grouped by frequency of occurrence; legible gene labels include CFTR, LRP2, and TTN.]
Fig. S1. Grouping genes by occurrence. Frequency of genes with nonsynonymous coding mutations in our cohort. This graphic provides a summary of the number of times alleles were observed for an individual gene. In each of these cases, the allele was either part of HGMD or OMIM, rare, and carried a high PolyPhen-2 score. An example of a gene with frequent risk alleles is titin (TTN), the largest gene in our genome and recently reported to be causative of dilated cardiomyopathy. A second example, a smaller gene with a large number of variations, is CFTR, where the disease database is deep and the associated disease is known to be one of the most common autosomal recessive diseases in whites. This graphic supports that we did not select polymorphic genes but unique mutations in each volunteer.
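The per-gene tally summarized in Fig. S1 can be sketched as follows. This is only an illustration under stated assumptions: the record layout `(case_id, gene, allele)`, the function name `count_gene_occurrences`, and the toy records are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter


def count_gene_occurrences(records):
    """Count reported alleles per gene.

    `records` is an iterable of (case_id, gene, allele) tuples; this
    layout is an assumption made for the example.
    """
    return Counter(gene for _case, gene, _allele in records)


# Toy records (illustrative only; case labels are made up).
records = [
    ("A", "TTN", "p.P3751R"),
    ("B", "TTN", "p.P5237T"),
    ("C", "TTN", "p.G1345D"),
    ("D", "CFTR", "p.D1152H"),
    ("E", "CFTR", "p.S1235R"),
    ("F", "NPC2", "p.N111K"),
]

gene_counts = count_gene_occurrences(records)

# List genes from most to least frequently observed.
for gene, n in gene_counts.most_common():
    print(gene, n)
```

Counting by gene rather than by allele is what lets a large gene such as TTN stand out even when every volunteer carries a different mutation.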
[Figure S2: (Left) Venn diagram of nonsynonymous coding SNPs from Complete Genomics and Illumina, average of 24 samples. (Right) CGI variants only: nonsynonymous coding SNPs, 11,054 ± 857 (100%); high-quality SNPs (96%); high-quality SNPs also detected by Illumina (72%).]
Fig. S2. Variants detected using Complete Genomics Inc. (CGI) and Illumina. (Left) Comparison of nonsynonymous coding SNPs (NSCS) obtained from Complete Genomics (red) and Illumina (green). Twenty-four human samples were sequenced using both technologies, and NSCS were compared in each sample. The average results were calculated and graphed as a Venn diagram. The intersection represents the set of NSCS detected by both technologies. On average, 73% of the NSCS detected by CGI were also detected by Illumina, while 82% of the NSCS detected by Illumina were also detected by CGI. (Right) Using the same samples we calculated that 96% of all the CGI NSCS are considered "High Quality" according to the CGI proprietary quality matrix. An average of 72% of all the NSCS detected by CGI was also detected by Illumina (blue). Since two orthogonal sequence technologies detected the same set of NSCS, this group of variants most likely represents a set of real variants, which we refer to as "highly reliable NSCS." The set of "highly reliable NSCS" was used to establish quality criteria in our Illumina variant detection pipeline.
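The cross-platform comparison described in Fig. S2 amounts to a set intersection per sample. The sketch below shows one way such concordance fractions could be computed; the function name `concordance`, the variant key `(chromosome, position, ref, alt)`, and the toy call sets are assumptions for illustration and do not reproduce the authors' data or software.

```python
def concordance(calls_a, calls_b):
    """Return (fraction of A also in B, fraction of B also in A, shared set)."""
    set_a, set_b = set(calls_a), set(calls_b)
    shared = set_a & set_b
    frac_a = len(shared) / len(set_a) if set_a else 0.0
    frac_b = len(shared) / len(set_b) if set_b else 0.0
    return frac_a, frac_b, shared


# Hypothetical call sets for one sample, keyed as (chromosome, position, ref, alt).
cgi_calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 10, "T", "C"),
}
illumina_calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr4", 5, "A", "T"),
    ("chr5", 7, "G", "C"),
}

frac_cgi, frac_illumina, reliable = concordance(cgi_calls, illumina_calls)
print(f"CGI calls also seen by Illumina: {frac_cgi:.0%}")
print(f"Illumina calls also seen by CGI: {frac_illumina:.0%}")
print(f"Variants seen by both platforms: {len(reliable)}")
```

In a multi-sample setting, the two fractions would be computed per sample and then averaged, which is how the caption reports its percentages.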
Table S1. Disease associations with alleles
Case  Disease  Risk gene  Allele  HGMD  OMIM gene ID
3937  Hypercholesterolaemia  LDLR  p.P526H  CM100938  606945
3890  Hypercholesterolaemia  LDLR  p.T726I  CM920469  606945
3910  Hypercholesterolaemia  LDLR  p.T726I  CM920469  606945
3900  Hypercholesterolaemia  LDLR  p.V827I  CM920471  606945
3915  Hypercholesterolaemia  LDLR  p.V827I  CM920471  606945
3923  Obesity  MC4R  p.I251L  CM030483  155541
3923  Diabetes mellitus, type II  MAPK8IP1  p.D386E  NA  604641
3973  Obesity  MC4R  p.C326R  CM070992  155541
3937  Diabetes mellitus type 2 (MODY)  FN3K  p.H146R  NA  608425
3937  Diabetes mellitus type 2 (MODY)  PASK  p.P1256L  NA  607505
3923  Macular degeneration, age related  ABCA4  p.G863A  CM970003  601691
3898  Brittle cornea syndrome type 1 (BCS1) keratoconus  ZNF469  p.D2902Y  NA  612078
3889  Male infertility  USP26  p.T123_Q124insT  NA  300309
3942  Melanoma  BAG4  p.W103X  NA  603884
3959  Melanoma  GRIN2A  p.N1076K  NA  138253
3896  Breast or ovarian cancer  BRCA2  p.I505T  CM010167  600185
3959  Breast or ovarian cancer  BRCA2  p.S384F  CM065036  600185
3897  Breast or ovarian cancer  BRCA2  p.T2515I  CM994287  600185
3950  Follicular thyroid cancer (age 41)  TPR  p.R105C  NA  189940
3960  Prostate cancer  LRP2  p.N479H  NA  600073
3960  Prostate cancer  LRP2  p.G4417D  NA  600073
3934  Nonsyndromic deafness  MYH14  p.M161I  NA  608568
3934  Nonsyndromic deafness  SLC17A8  p.R75C  NA  607557
NA, not available.
Table S2. Familial diseases and associations
Case  Disorder  Gene  Volunteer relatedness  Association (volunteer)  Association (affected relative)
3949  Prader-Willi  MAGEL2  2°
3947  Paraganglioma  SDHB  1°
3930  Ankylosing spondylitis  HLA-B27  1°
3930  Tourette syndrome  TBD  1° (3)  IP  IP
3928  Parkinson  LRRK2  1°
—, negative; IP, research in progress.
Table S3. Recessive disorders
Cases  Disease  Risk gene  Allele  HGMD  OMIM
3958  Niemann-Pick type C2 disease  NPC2  p.N111K  CM081368  601015
3896, 3900, 3915, 3895  Antitrypsin α1 deficiency  SERPINA1  p.R247C, p.E366K (3)  CM910298, CM830003  107400
3894  Glycogen storage disease 0  GYS2  p.Q183X  CM023388  138571
3889  Glycogen storage disease Ia  G6PC  p.R83C  CM930261  613742
3901  Glycogen storage disease 3  AGL  p.R477H  CM104343  610860
3945  Glycogen storage disease 4  GBE1  p.Y329S  CM960705  607839
3898  Glycogen storage disease 6  PYGL  p.D634H  CM078418  613741
3941, 3952  Glycogen storage disease 9B  PHKB  p.Q650K  CM031327  172490
3915, 3919, 3943, 3954  Fanconi anemia  FANCA  p.T126R, p.S858R (3)  CM043494, CM992317  607139
3936, 3934  Familial Mediterranean fever  MEFV  p.E148Q, p.P369S, p.R408Q  CM981240, CM990837, CM990838  608107
395, 439, 243, 953  Cystic fibrosis  CFTR  p.D1152H, p.S1235R,  CM950256, CM930133  602421
3933  Sandhoff disease  HEXB  p.A543T  CM970723  606873
3940  Fuchs endothelial dystrophy  ZEB1  p.Q824P  CM100242  189909
3908  Factor V deficiency  F5  p.P1816S  CM095204  612309
3952  Hepatic lipase deficiency  LIPC  p.T405M  CM910258  151670
3962  Krabbe disease  GALC  p.T112A  CM960678  606890
3954  Macular corneal dystrophy, type 2  CHST6  p.Q331H  CM055930  605294
3891, 3947, 3959, 3924, 3895, 3897  Usher syndrome 1D  CDH23  p.A366, p.01806E, p.R1060W  CM050545, CM105104, CM021537  605516
3900, 3910  Phenylketonuria  PAH  p.A300S, p.R53H  CM920555, CM981427  612349
3933, 3946  MCAD (medium-chain acyl-CoA dehydrogenase deficiency)  ACADM  p.K329E (2)  CM900001  607008
3914  Adrenal hyperplasia  HSD3B2  p.R249X  CM950655  613890
3926  17-α-hydroxylase/17,20-lyase deficiency  CYP17A1  p.R449C  HM0669  609300
Table S4. X-linked recessive
Case  Disorder  Risk gene  Allele  Sex  HGMD  OMIM
3891  ATRX syndrome  ATRX  p.N1860S  Female  CM950125  300032
3930  Fabry disease  GLA  p.A143T  Female  CM972773  300644
3901  Mucopolysaccharidosis II  IDS  p.D252N  Female  CM960865  300823
Table S5. Breast cancer risk
Case  Disease  Risk gene  Allele  Family history  Sex  Age (y)  HGMD  OMIM gene ID
3959  Breast cancer  BRCA2  p.S384F  Affected (44)  Female  44  CM065036  600185
3896  Breast cancer  BRCA2  p.I505T  Affected  Female  49  CM010167  600185
3955  Breast cancer  BRCA2  p.E1625fs  Negative  Female  42  CD011121  600185
3962  Breast cancer  PALB2  p.V1103M  First, second, third degree (2) (49-60s)  Female  51  CM118272  610355
3936  Breast cancer  BRCA1  p.Y856H  First degree (sister 40s)  Male  62  CM042673  113705
3936  Breast cancer  BRCA2  p.K2729N  First degree (sister 40s)  Male  62  CM021957  600185
3963  Breast cancer  BRCA2  p.R2034C  First degree (60s)  Male  48  CM994286  600185
3897  Breast cancer  BRCA2  p.T2515I  First degree (80)  Female  51  CM994287  600185
3934  Breast cancer  RAD51C  p.T287A  First degree (uterine)  Female  50  NA  602774
3939  Breast cancer  RAD50  p.R1069X  First degree breast (60s)/second colon (60s)  Male  56  NA  604040
3912  Breast cancer  RAD51C  p.A126T  Negative  Male  77  CM1010201  602774
3923  Breast cancer  RAD51C  p.T287A  Negative  Male  60  CM1010198  602774
3956  Breast cancer  RAD51C  p.T287A  Negative  Male  59  CM1010198  602774
NA, not available.
Table S6. Colon cancer risk
Case  Disease  Risk gene  Allele  Family history  Sex  Age (y)  HGMD  OMIM gene ID
3896  Colon cancer  MLH1  p.K618A  First degree  Female  49  CM973729, CM950808  120436
3891  Colon cancer  MLH3  p.E1451K  First degree (70s)  Female  62  CM013011  604395
3897  Colon cancer  APC  p.A2690T  First and second degree cancer  Female  51  CM045404  611731
3904  Colon cancer  MSH2  p.G315V  Second degree  Male  49  CM995220  609309
3897  Colon cancer  MSH2  p.G12D  Negative  Female  51  CM950813  609309
3962  Colon cancer  APC  p.S2621C  Negative  Female  51  CM921028  611731
3955  Colon cancer  APC  p.R2505C?  Negative  Female  42  NA  611731
3933  Colon cancer  MUTYH  p.G382D  Negative  Female  69  CM020287  604933
NA, not available.
Table S7. Other cancer risk
Case  Disease  Risk gene  Allele  Family history  Sex  Age (y)  HGMD  OMIM gene ID
3959  Melanoma  GRIN2A  p.N1076K  Affected  Female  44  NA  138253
3942  Melanoma  BAG4  p.W103X  Affected  Male  70  NA  603884
3950  Follicular thyroid cancer  TPR  p.R105C  Affected  Male  48  NA  189940
3960  Prostate cancer  LRP2  p.N479H  Affected  Male  65  NA  600073
3946  Prostate cancer  LRP2  p.M4601I  Negative  Female  59  NA  600073
3957  Prostate cancer  LRP2  p.N1797S  First degree (father)  Male  44  NA  600073
3957  Prostate cancer  DLC1  p.089N  First degree (father)  Male  44  NA  604258
3932  Prostate cancer  CHEK2  p.E64K  Negative  Male  47  CM030414  604373
3935  Prostate cancer  ELAC2  p.R781H  Negative  Female  70  CM010221  605367
3902  Prostate cancer  MSR1  p.H441R  Negative  Female  46  CM023581  153622
3900  Prostate cancer  MSR1  p.R293X  Negative  Male  45  CM023579  153622
3954  Prostate cancer  RNASEL  p.E265X  Negative  Male  72  CM020300  180435
3954  Prostate cancer  RNASEL  p.G59S  Negative  Male  72  CM031342  180435
3963  Retinoblastoma  RB1  p.R656W  Negative  Male  48  CM030511  614041
3896  Pituitary cancer  ACVRL1  p.A482V  Negative  Female  46  CM994582  601284
3896  Pituitary cancer  ACVRL1  p.A482V  Negative  Female  46  CM994582  601284
3930  Esophageal cancer  WWOX  p.G178S  Negative  Female  52  NA  605131
3973  Esophageal cancer  WWOX  p.R120W  Negative  Male  71  CM016224  605131
3916  Esophageal cancer  WWOX  p.R120W  Negative  Male  70  CM016224  605131
3941  Gastric cancer  MET  p.A347T  Negative  Male  46  NA  164860
NA, not available.
Table S8. Cardiomyopathy-affected volunteers
Case  Disease  Risk gene  Allele  Clinical  Age (y)  HGMD  OMIM gene ID
3925  Dilated cardiomyopathy  MYH6  p.A1443D  Atrial fibrillation  65  CM107536  160710
3926  Cardiomyopathy, arrhythmogenic right ventricular  DSG2  p.V158G  Arrhythmia  65  CM070921  125671
3935  Dilated cardiomyopathy  MYH6  p.R1398Q  Cardiac dysrhythmia  70  NA  160710
3935  Cardiomyopathy, dilated, 1EE  MYH6  p.R1398Q  Cardiac dysrhythmia  70  NA  160710
3935  Arrhythmogenic right ventricular cardiomyopathy  TTN  p.P3751R  Cardiac dysrhythmia  70  NA  188840
3955  Dilated cardiomyopathy  ACTN2  p.Q349L  V pacemaker  53  NA  102573
3955  Familial hypertrophic cardiomyopathy 12  CSRP3  p.R100H  V pacemaker  53  CM091458  600824
3916  Dilated cardiomyopathy type 1A  LAMA2  p.T821M  Stent placement  71  NA  156225
3887  Cardiomyopathy, hypertrophic  MYBPC3  p.R326Q  Stent placement (3)  73  CM020155  600958
3887  Cardiomyopathy familial hypertrophic (CMH)  MYLK2  p.V402F  Stent placement (3)  73  NA  606566
3953  Brugada syndrome (arrhythmia)  KCNE3  p.M65T  Two bypass, stent, and familial history of CAD  71  NA  604433
3953  Arrhythmogenic right ventricular cardiomyopathy  TTN  p.P5237T  Two bypass, stent, and familial history of CAD  71  NA  188840
3937  Hypercholesterolaemia  LDLR  p.P526H  Three generations of early MI, elevated LDL, cholesterol, triglycerides, and treated with statins  53  CM100938  606945
3890  Hypercholesterolaemia  LDLR  p.T726I  1° early MI  57  CM920469  606945
3910  Hypercholesterolaemia  LDLR  p.T726I  V aortic occlusion, elevated cholesterol  51  CM920469  606945
3900  Hypercholesterolaemia  LDLR  p.V827I  1° early MI  45  CM920471  606945
3915  Hypercholesterolaemia  LDLR  p.V827I  Three generations of elevated cholesterol, treated with statins  70  CM920471  606945
CAD, coronary artery disease; MI, myocardial infarction; NA, not available.
Table S9. Cardiomyopathy unaffected but family history
Case  Disease  Risk gene  Allele  Clinical  Age (y)  HGMD  OMIM gene ID
3943  Arrhythmogenic right ventricular cardiomyopathy  TTN  p.G1345D  Familial history of arrhythmia  44  NA  188840
3896  Dilated cardiomyopathy  SYNE1  p.L3057V  Familial history  45  NA  608441
3896  Arrhythmogenic right ventricular dysplasia/cardiomyopathy  JUP  p.V648I  Familial history  45  NA  173325
3944  Hypertrophic cardiomyopathy  OBSCN  p.K1671N  Father  45  NA  608616
3931  Dilated cardiomyopathy  MYH6  p.R1398Q  Familial history  46  NA  160710
3907  Cardiomyopathy, hypertrophic  ACTN2  p.T495M  Father  47  CM101366  102573
3950  Cardiomyopathy  MYOM1  p.G1162S  Familial history  48  NA  603508
3919  Romano-Ward syndrome (arrhythmia)  SCN5A  p.S1769N  Familial history  51  CM002391  600163
3889  Romano-Ward syndrome (arrhythmia)  SCN5A  p.S1769N  Mother  51  CM002391  600163
3917  Cardiomyopathy  MYOM1  p.R1573Q  Familial history + father  51  NA  603508
3960  Dilated cardiomyopathy  NEBL  p.K60N  Son CAD  66  CM106905  605491
3976  Cardiomyopathy  MYOM1  p.E704K  Older brother  72  NA  603508
3976  Early onset myopathy  MYH2  p.V970I  Older brother  72  CM051560  160740
Table S10. Neurodegenerative risk
Case  Disease  Risk gene  Allele  Family history  Age (y)  HGMD  OMIM
3908  Alzheimer's disease  APOE  p.C130R  Negative  44  CM900020  107741
3916  Alzheimer's disease  APOE  p.L46P  Parkinson 1° (72)  71  CM990167  107741
3954  Alzheimer's disease  APP  p.R469H  Negative  72  NA  104760
3942  Frontotemporal dementia  MAPT  p.S427F  Negative  71  NA  157140
3954  Frontotemporal dementia  MAPT  p.V224G  Negative  72  NA  157140
3895  Parkinson disease  EIF4G1  p.G686C  Negative  49  CM117028  600495
3916  Parkinson disease  EIF4G1  p.R1205H  Parkinson 1° (78)  64  CM117009  600495
3951  Parkinson disease  EIF4G1  p.S1596T  Negative  64  NA  600495
3931  Parkinson disease 11  GIGYF2  p.P1222fs  Negative  44  NA  612003
3946  Parkinson disease 11  GIGYF2  p.H1171R  Negative  59  NA  612003
3957  Parkinson disease 11  GIGYF2  p.M48I  Negative  44  NA  612003
3930  Parkinson disease 11  GIGYF2  p.S1035C  Negative  52  NA  612003
3933  Parkinson disease 11  GIGYF2  p.S1035C  Negative  68  NA  612003
3928  Parkinson disease  LRRK2  p.A419V  Tremor 1° Parkinson 2°  68  CM125746  609007
3903  Parkinson disease  LRRK2  p.O972G  Negative  54  NA  609007
3919  Parkinson disease  LRRK2  p.O972G  Negative  51  NA  609007
3889  Parkinson disease  LRRK2  p.G2019S  Negative  51  CM050659  609007
3951  Parkinson disease  LRRK2  p.L119P  Negative  50  NA  609007
3918  Parkinson disease  LRRK2  p.L286V  Negative  64  NA  609007
3907  Parkinson disease  LRRK2  p.P1542S  Alzheimer's 2°  47  NA  609007
3935  Parkinson disease  LRRK2  p.P1542S  Negative  70  NA  609007
3893  Parkinson disease  LRRK2  p.R1514Q  Negative  45  CM057190  609007
3943  Parkinson disease  LRRK2  p.R1514Q  Negative  50  CM057190  609007
3949  Parkinsonism, juvenile, autosomal recessive  PARK2  p.R275W  2° three siblings  52  CM991007  602544
3924  Parkinsonism, juvenile, autosomal recessive  PARK2  p.R334C  Negative  54  CM003865  602544
3927  Parkinson  PM20D1  p.A332V  Negative  73  NA  613164
3886  Parkinson  PM20D1  p.P251Q  Negative  62  NA  613164
Table S11. Percentage of survey respondents reporting having made behavioral changes specifically motivated by their test results
Type of behavior change  Yes  No
Changes to diet  4 (10%)  36 (90%)
Changes to health care (such as undergoing tests or seeing a specialist)  4 (10%)  36 (90%)
Changes to use of vitamins/herbal supplements  4 (10%)  36 (90%)
Changes to exercise  3 (8%)  37 (92%)
Changes to medications  1 (2%)  39 (98%)
Changes to insurance coverage  1 (2%)  39 (98%)
Number of respondents making at least one of the above behavior changes  10 (25%)