Three research institutions — the Chinese Academy of Agricultural Sciences (CAAS), the Beijing Genomics Institute (BGI) Shenzhen, and the International Rice Research Institute (IRRI) — collaborated to sequence the genomes of 3,024 rice varieties and lines housed at the IRRI (82%) and the CAAS (18%) gene banks.
Funded by grants from the Bill & Melinda Gates Foundation and the Chinese Ministry of Science and Technology, the sequencing and initial analysis produced a dataset of millions of genomic sequences from a diverse set of rice varieties. Combined with phenotyping observations, gene expression data, and other information, the dataset is an important step toward establishing gene-trait associations, building predictive models, and applying these models to breeding.
Dataset of 3,000 rice varieties
Through funding from the Global Rice Science Partnership, the 3,024 genomes were re-analyzed against five popular varieties that represent the three main subgroups of cultivated rice — indica, japonica, and aus.
At 120 terabytes, this new 3,000 Rice Genomes Project (3K RGP) analysis dataset is well beyond the computing capacities of most research institutions. However, the results are now publicly available online as an Amazon Web Services (AWS) Public Data Set.
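To give a rough sense of that scale, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: it assumes decimal terabytes, spreads the 120 terabytes evenly over the 3,024 genomes (the real files are not evenly sized), and uses a hypothetical 100 Mbit/s network link that is not a figure from the project.

```python
# Back-of-the-envelope sizing for the 3K RGP analysis dataset.
# Assumptions (not from the project): decimal units, even split across
# genomes, and a hypothetical sustained link speed.

DATASET_BYTES = 120 * 10**12   # 120 terabytes, decimal
GENOME_COUNT = 3024            # genomes in the dataset


def per_genome_gb(total_bytes: int, genome_count: int) -> float:
    """Average data volume per genome, in gigabytes."""
    return total_bytes / genome_count / 1e9


def transfer_days(total_bytes: int, link_mbit_per_s: float) -> float:
    """Days needed to move the data over a link of the given speed."""
    seconds = total_bytes * 8 / (link_mbit_per_s * 1e6)
    return seconds / 86_400


if __name__ == "__main__":
    print(f"{per_genome_gb(DATASET_BYTES, GENOME_COUNT):.1f} GB per genome on average")
    print(f"{transfer_days(DATASET_BYTES, 100):.1f} days at 100 Mbit/s")
```

Under these assumptions the average comes to roughly 40 GB per genome, and a full copy would take more than three months to download, which is why hosting the data next to cloud computing resources is attractive.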
Accessing the data is free, and use is governed by the stipulations for data analysts and users from the Toronto Statement.
“The dataset provides access to millions of genetic markers that can be used to design sustainable crops for the future, that is, ones that are high-yielding and more nutritious while at the same time requiring less water, fertilizer, and pesticides,” said Dr. Rod Wing, director of the Arizona Genomics Institute at the University of Arizona and a pioneer in rice genome sequencing.
Data access & analysis tools
Dr. Kenneth McNally, senior scientist in IRRI’s T.T. Chang Genetic Resources Center and a project team member, added that the data set also comes with tools to help researchers visualize and analyze genetic information.
Data access and analysis tools are being made available for the 3K RGP dataset through the International Rice Informatics Consortium (IRIC), which promotes collaboration in bioinformatics analysis of rice data and provides computational tools to facilitate rice improvement via discovery of new gene-trait associations and accelerated breeding.
One of the tools, the SNP-Seek database, is designed to provide user-friendly access to genetic markers called single nucleotide polymorphisms (SNPs) identified from this data. Within SNP-Seek, the JBrowse genome browser displays chromosome-specific SNP data derived from the set.
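For readers unfamiliar with the term, a SNP is a position where a single base differs between two aligned DNA sequences. The short Python sketch below illustrates the idea with made-up sequences and a hypothetical helper, `find_snps`. It is not how SNP-Seek identifies or stores SNPs; real pipelines call variants from millions of aligned sequencing reads.

```python
# Toy illustration of a single nucleotide polymorphism (SNP).
# The sequences and the find_snps helper are hypothetical examples,
# not part of SNP-Seek or the 3K RGP pipeline.


def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) per mismatch.

    Both sequences must already be aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(
            zip(reference, sample), start=1
        )
        if ref_base != sample_base
    ]


if __name__ == "__main__":
    reference = "ATGCCGTA"   # made-up reference fragment
    sample = "ATGACGTT"      # made-up variety with two differing bases
    for position, ref_base, sample_base in find_snps(reference, sample):
        print(f"position {position}: {ref_base} -> {sample_base}")
```

Here the two fragments differ at positions 4 and 8, so each of those positions would count as one SNP between the sample and the reference.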
“The 3K RGP dataset is a powerful tool that will unite researchers from around the world to help drive the next green revolution,” Dr. Wing said.
The International Rice Genebank of the T.T. Chang Genetic Resources Center at the IRRI contains more than 127,000 rice varieties and accessions from all over the world.
These accessions hold a virtually untapped reservoir of genes/traits that can be used to make rice cultivation more sustainable, with a smaller environmental footprint. Traits targeted for improvement include higher nutritional quality; tolerance of pests, diseases, and environmental stresses, such as flood and drought; and reduced greenhouse gas emissions.