Thursday, June 30, 2016

The Qatar genome: a population-specific tool for precision medicine in the Middle East

Our Qatar genome paper came out today in Human Genome Variation. The article shows the value of ancestry-aware genome interpretation for precision medicine. We developed a reference genome tailored to Middle Eastern populations, using data sampled in Qatar. Using this reference improves both the quality of genotype data and the efficiency of genome interpretation. The full text is available here; below are the editor's summary and the abstract.

Editorial Summary

Precision medicine: A reference genome for the Middle East

Researchers have created a new reference genome from Qatari people to improve precision medicine and genetic disease research in the region. Juan Rodriguez-Flores of Weill Cornell Medical College led an international team that sequenced the genomes of over 1,000 Qataris and identified 26 million differences from the reference human genome. In many cases, the variant versions were more common than the reference version, so these differences were incorporated into the Qatari Genome. A reference genome which reflects the frequency of variants in a population is an important resource when studying its genetic diseases or tailoring treatments for individual patients. The team also compiled a catalog of the pathogenic variants they identified in the Qatari Genome. These new tools will facilitate the discovery of disease-causing variants and the development of customized therapies for individuals in similar populations.
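The core idea in the summary is that wherever a variant allele is more common in the population than the standard reference allele, the variant is written into the new reference. A toy sketch of that idea is below. It is not the pipeline used in the paper: it assumes SNPs only (indels shift coordinates and would need extra handling), a simple majority rule (alternate allele frequency above 0.5), and a hypothetical `Snp` record and `build_major_allele_reference` helper introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    pos: int         # 0-based position in the reference sequence
    ref: str         # reference base at pos
    alt: str         # alternate base observed in the cohort
    alt_freq: float  # alternate allele frequency in the sampled population


def build_major_allele_reference(reference: str, snps: list[Snp], threshold: float = 0.5) -> str:
    """Hypothetical helper: return a copy of `reference` in which every SNP whose
    alternate allele is the population major allele (frequency above `threshold`)
    replaces the reference base. SNPs only; indels are out of scope."""
    seq = list(reference)
    for snp in snps:
        # Sanity check: the variant must agree with the reference it was called against.
        if seq[snp.pos] != snp.ref:
            raise ValueError(f"reference mismatch at position {snp.pos}")
        if snp.alt_freq > threshold:
            seq[snp.pos] = snp.alt
    return "".join(seq)


reference = "ACGTACGTAC"
snps = [
    Snp(pos=2, ref="G", alt="A", alt_freq=0.72),  # alternate is the major allele: swapped in
    Snp(pos=5, ref="C", alt="T", alt_freq=0.10),  # rare alternate: reference base kept
]
print(build_major_allele_reference(reference, snps))  # prints ACATACGTAC
```

Sequencing reads from someone of the same ancestry then match this adjusted reference more often, which is the intuition behind the better coverage and fewer spurious variant calls reported in the abstract.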

Abstract

Reaching the full potential of precision medicine depends on the quality of personalized genome interpretation. In order to facilitate precision medicine in regions of the Middle East and North Africa (MENA), a population-specific genome for the indigenous Arab population of Qatar (QTRG) was constructed by incorporating allele frequency data from sequencing of 1,161 Qataris, representing 0.4% of the population. A total of 20.9 million single nucleotide polymorphisms (SNPs) and 3.1 million indels were observed in Qatar, including an average of 1.79% novel variants per individual genome. Replacement of the GRCh37 standard reference with QTRG in a best practices genome analysis workflow resulted in an average of 7× deeper coverage depth (an improvement of 23%) and 756,671 fewer variants on average, a reduction of 16% that is attributed to common Qatari alleles being present in QTRG. The benefit of using QTRG varies across ancestries, a factor that should be taken into consideration when selecting an appropriate reference for analysis.
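The paired numbers in the abstract also let you back out the approximate starting points under GRCh37. A quick sketch of that arithmetic, assuming the quoted percentages are rounded relative changes: a 7× gain that equals 23% implies a baseline depth of roughly 30×, 756,671 fewer variants being a 16% drop implies roughly 4.7 million variant calls per genome, and 1,161 people being 0.4% implies a source population of about 290,000. These are back-of-the-envelope estimates, not figures reported in the paper.

```python
# Figures quoted in the abstract.
coverage_gain_x = 7        # average extra coverage depth (fold, x)
coverage_gain_pct = 0.23   # the same gain as a relative improvement
fewer_variants = 756_671   # average reduction in variant calls per genome
variant_drop_pct = 0.16    # the same reduction as a fraction
sampled = 1_161            # Qataris sequenced
sample_share = 0.004       # fraction of the population they represent

# Implied baselines (approximate, because the percentages are rounded).
baseline_depth = coverage_gain_x / coverage_gain_pct
baseline_variants = fewer_variants / variant_drop_pct
implied_population = sampled / sample_share

print(f"GRCh37 baseline depth ~ {baseline_depth:.0f}x")
print(f"GRCh37 variant calls per genome ~ {baseline_variants:,.0f}")
print(f"Implied population size ~ {implied_population:,.0f}")
```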

Thursday, January 28, 2016

DNA Land Results

I took my AncestryDNA data and uploaded it to DNA Land. The results were quite consistent with AncestryDNA and Genes For Good, although the focus is on regions rather than individual countries. They do provide some nice summary figures.

Friday, January 15, 2016

Puerto Rico is the ultimate melting pot: Ancestry DNA results

I just received my AncestryDNA results. Wow, Puerto Rico is a serious melting pot; I had no idea how many sources my ancestry came from. The results are quite consistent with Genes For Good, but more detailed. The main discrepancy comes from classification differences: Genes For Good groups North African with West Asian, while AncestryDNA groups North African with African, hence my African percentage is higher in AncestryDNA.

66% European
7% West Asian
15% African
11% Native American

Here is a bar chart of the details, excluding contributions below 1% (South Asian, South African).

Wednesday, January 6, 2016

Accepted at Genome Research: Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations

Below is the abstract of an article that addresses the fundamental question "Where did Arabs come from?". By sequencing the genomes of 108 Qataris, including 60 of indigenous Arab/Bedouin ancestry, our study begins to answer it. The full text is available here.

Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations
Genome Research, published online January 4, 2016

An open question in the history of human migration is the identity of the earliest Eurasian populations that have left contemporary descendants. The Arabian Peninsula was the initial site of the out of Africa migrations that occurred between 125,000 and 60,000 years ago, leading to the hypothesis that the first Eurasian populations were established on the Peninsula and that contemporary indigenous Arabs are direct descendants of these ancient peoples. To assess this hypothesis, we sequenced the entire genomes of 104 unrelated natives of the Arabian Peninsula at high coverage, including 56 of indigenous Arab ancestry. The indigenous Arab genomes defined a cluster distinct from other ancestral groups and these genomes showed clear hallmarks of an ancient out of Africa bottleneck. Similar to other Middle Eastern populations, the indigenous Arabs had higher levels of Neanderthal admixture compared to Africans but had lower levels than Europeans and Asians. These levels of Neanderthal admixture are consistent with an early divergence of Arab ancestors after the out of Africa bottleneck but before the major Neanderthal admixture events in Europe and other regions of Eurasia. When compared to worldwide populations sampled in the 1000 Genomes Project, while the indigenous Arabs had a signal of admixture with Europeans, they clustered in a basal, outgroup position to all 1000 Genomes non-Africans when considering pairwise similarity across the entire genome. These results place indigenous Arabs as the most distant relatives of all other contemporary non-Africans and identify these people as direct descendants of the first Eurasian populations established by the out of Africa migrations.

Wednesday, December 9, 2015

Genes for Good Ancestry Results

Today I logged into Facebook and got some exciting news: my saliva DNA sample had been analyzed by Genes for Good, and they sent me my ancestry results. They sent three slides: (A) a pie chart with my proportion of ancestry from different sources, (B) a representation of my maternal/paternal chromosomes, showing large chunks of DNA specific to each ancestry, and (C) where my genome lies in a spectrum of globally sampled genomes.

The results are quite interesting. My Native American and Sub-Saharan African ancestry are in line with what is expected for Puerto Ricans. According to family legend, most of my ancestors came from Mediterranean Europe (Spain and Italy) in the 18th century, hence the 61% European ancestry is to be expected. I'm curious as to what the 17% West Asia and North Africa ancestry means. 

A. Summary of my ancestry mixture.
B. Chromosomal fragments of specific ancestry. I can't wait to get my hands on the data to see my ancestry for different genes. Also, my parents recently signed up for AncestryDNA; it would be very interesting to see what I got from each parent. It looks like the Native American and Sub-Saharan ancestry are on the same chromosome, a truly Puerto Rican mix!
C. Where does my genome lie in comparison to other populations? There I am, right in the middle, a bit more African than Europeans (PC1) and a bit more African than Europeans (PC2). This method collapses millions of dimensions of genetic variation into a 2D image, so it is not the most accurate picture; for instance, it does not make sense that I overlap with Central and South Asian populations.
Overall, an interesting result. I'd highly recommend signing up for Genes For Good to anyone.
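Panel C is a principal component analysis (PCA): each person's genotypes become one row of a matrix, and the top two principal components (PC1, PC2) are the directions of greatest variation. A minimal sketch with NumPy on simulated data is below. It is not Genes for Good's actual pipeline; the two source populations, their allele frequencies, and the `genotype_pca` helper are hypothetical. Real ancestry PCAs typically also standardize each SNP and prune linked sites. The point is to show why an admixed person lands between the source clusters, much like a Puerto Rican genome sitting "in the middle".

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Hypothetical helper: project individuals onto the top principal components.

    genotypes: (individuals x SNPs) matrix of alternate-allele counts (0, 1, 2).
    Returns an (individuals x n_components) array of PC scores.
    """
    # Center each SNP column so the PCs capture variation between people.
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix: columns of U scaled by S are the PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


rng = np.random.default_rng(0)
n_snps = 500
# Two simulated source populations with different allele frequencies per SNP.
freqs_a = rng.uniform(0.05, 0.95, size=n_snps)
freqs_b = rng.uniform(0.05, 0.95, size=n_snps)
pop_a = rng.binomial(2, freqs_a, size=(20, n_snps))
pop_b = rng.binomial(2, freqs_b, size=(20, n_snps))
# An admixed individual: one allele drawn from each source population at every SNP.
admixed = rng.binomial(1, freqs_a) + rng.binomial(1, freqs_b)
genotypes = np.vstack([pop_a, pop_b, admixed]).astype(float)

scores = genotype_pca(genotypes)
print("mean PC1, population A:", scores[:20, 0].mean())
print("mean PC1, population B:", scores[20:40, 0].mean())
print("PC1, admixed individual:", scores[40, 0])
```

The sign of each PC is arbitrary, so only relative positions matter: the two populations fall on opposite sides of PC1 and the admixed individual falls between them.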

Friday, November 20, 2015

My Genome

A few years ago my genome and microbiome were sequenced by the Personal Genome Project. If anyone is interested, here is a link to the data.

Tuesday, January 7, 2014

In print: Exome Sequencing Identifies Potential Risk Variants for Mendelian Disorders at High Prevalence in Qatar

Just published in Human Mutation: our second article on disease risk allele prevalence in Qatar.

This study presents a framework for enabling precision genome-based medicine (PGM) in a population neither sampled by nor related to public consortium sequencing projects, using the example of Qatar. Through sampling and exome sequencing of representatives from three major ancestry groups identified in prior work (Q1 Bedouin, Q2 Persian-South Asian, Q3 African), we identified 37 variants in 33 genes with effects on 36 clinically significant Mendelian diseases. Genetic screening in Qatar includes only 4 of the 37. This study provides a set of Mendelian disease variants with potential impact on the epidemiological profile of the population that could be incorporated into the testing program if further experimental and clinical characterization confirms high penetrance.