Dr. Liu Jianjun oversees genome sequencing of 10,000 Singaporeans. He believes, as do many other researchers, that studying the genomes of a population will revolutionize medical science and challenge our view of the world. But decoding is only one part of the job. Crunching and storing the information is the other. Here, the use of Big Data is key.
The undertaking was dreamed up in late 2016 when scientists working at the Genome Institute of Singapore (GIS) met for a retreat, where they pondered “the big picture, big science and the future,” as Dr. Liu Jianjun puts it. The team came up with a seemingly daring idea: What if they sequenced the genomes of not just a few but of many Singaporeans?
“Last year, 10,000 people still sounded really big,” laughs Liu.
Seven months later, the team has deciphered 3,000 genomes. They expect to reach their goal of 10,000 sequences in mid-2018. The project is known as SG10K, and entails performing whole-genome sequencing of 10,000 Singaporeans.
Dr. Liu’s team is made up of 13 research associates and post-doctoral students. “The aim is to characterize genetic variations in the Singaporean population, create a whole-genome sequencing (WGS) reference panel for accurate genotype imputation and generate a large control dataset for WGS-based genetic association studies of diseases,” says Liu.
The project aims to provide genetic information that makes clinical and pharmaceutical research within Singapore’s population easier, and it promises to add to the study of Asian-centric genetic diseases. The sheer speed at which the project’s scientists have worked showcases the exponential progress genomics has made since the Human Genome Project. By the time the first complete human genome sequence was published in 2003, it had taken scientists from six participating countries 13 years to decipher the code, and the project had cost about $3 billion.
Not even 15 years later, that kind of decoding is routinely carried out on an industrial scale. The high-tech equipment in the GIS basement currently reads out about 300 genomes per week, each costing about $1,000. “And in the near future, it will be down to $100 and again much faster,” says Liu. This progress allows the scientist to dream big dreams: “Ten thousand is just the beginning. After that, we hope to sequence the genomes of 250,000 Singaporeans, then every single citizen, 3.5 million people.”
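To put those figures side by side, here is a rough back-of-envelope sketch in Python. It uses only the numbers quoted above; the constant weekly throughput and the flat per-genome price are simplifying assumptions, and the function names are purely illustrative. It covers sequencing alone, not storage or analysis.

```python
# Figures quoted in the article; all results below are derived from them.
GENOMES_PER_WEEK = 300          # current GIS throughput
COST_PER_GENOME_USD = 1_000     # current cost per genome
TARGET_GENOMES = 10_000         # SG10K goal
HGP_COST_USD = 3_000_000_000    # approximate cost of the Human Genome Project


def weeks_to_sequence(n_genomes, per_week=GENOMES_PER_WEEK):
    """Weeks of machine time, assuming throughput stays constant."""
    return n_genomes / per_week


def sequencing_cost(n_genomes, cost_each=COST_PER_GENOME_USD):
    """Total sequencing cost in US dollars, assuming a flat per-genome price."""
    return n_genomes * cost_each


weeks = weeks_to_sequence(TARGET_GENOMES)
total_cost = sequencing_cost(TARGET_GENOMES)
# How many genomes today's price buys for the cost of the first one.
genomes_per_hgp_budget = HGP_COST_USD // COST_PER_GENOME_USD

print(f"Machine time for {TARGET_GENOMES:,} genomes: {weeks:.1f} weeks")
print(f"Sequencing cost: ${total_cost:,}")
print(f"Genomes per Human Genome Project budget: {genomes_per_hgp_budget:,}")
```

Under these assumptions, the entire SG10K cohort represents roughly 33 weeks of machine time and about $10 million in sequencing costs, a small fraction of what the first genome cost.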
Dr. Liu wears tailored suits, but in the style of most Singaporeans, skips a tie. He switches easily from intense concentration to easygoing banter. When he talks about his favorite topic – genetics – it can be hard to get a word in. “Thirty years ago I wanted to be a marine biologist, then I tried my hand at quantitative genetics,” says Liu in a voice which, despite all his years abroad, still echoes his native China. But studying fruit flies was not all that satisfying. “That turned out to be a little too… indirect,” Liu jokes, giving the impression that in fact he found it tiresome in the extreme.
Science as a growth strategy
Dr. Liu’s dreams stand a good chance of becoming reality. Over the past decades, Singapore, which is about the same size as Lake Geneva, Switzerland, has made science and technology a major pillar of its national growth strategy. The GIS, of which Dr. Liu is deputy executive director, is the national flagship program for genomic sciences in the city-state, and is equipped with state-of-the-art research infrastructure. Over 300 scientists, trainees and staff work in its headquarters in Singapore’s Biopolis, a hypermodern science park designed by Iraqi-born architect Zaha Hadid. It consists of 13 research centers, shops, restaurants, a pub, a childcare center and even a typical Singaporean food court in which food stalls dish out cheap but delicious street food.
A multi-institutional effort to enable big data analytics and integrative genomics in Singapore, named the Centre for Big Data and Integrative Genomics (c-BIG), is at the core of GIS. It enables high-throughput sequencing, molecular cytogenetics, bioinformatics, single-cell genomics, high-throughput and high-content screening, and genome engineering. SG10K is one of its most prominent projects. Its heart beats in the basement of a sleek beige building aptly called Genome. It seems like a cheerful place to work: laughter can be heard in the background as staff call out to “JJ,” as their boss is affectionately known. “Can I join you?” jokes a researcher as he pretends to photobomb Dr. Liu.
Singapore has poured hundreds of millions into supercomputing. Listening to Dr. Liu, one gets the feeling that it was a smart move. SG10K will generate about 2 to 3 petabytes, or 2,000 to 3,000 terabytes, of data. That extremely large data set will then be analyzed to reveal patterns, trends and associations, with a special focus on detecting genetic variants – a herculean task, considering that a typical genome of a healthy individual harbors some three to five million variants, and previous studies identified about 88 million sites in the human genome that vary among people.
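A quick calculation shows what those totals mean per person. The sketch below, in Python, divides the quoted 2 to 3 petabytes across 10,000 genomes and then projects the same footprint onto the 3.5 million genomes Dr. Liu hopes to reach eventually. Decimal units (1 petabyte = 1,000 terabytes) follow the article’s own conversion; the assumption that storage scales linearly with the number of genomes is ours, as are the function names.

```python
# Decimal units, matching the article's "2,000 to 3,000 terabytes".
TB_PER_PB = 1_000
GB_PER_TB = 1_000


def per_genome_gb(total_pb, n_genomes):
    """Average storage per genome in gigabytes."""
    return total_pb * TB_PER_PB * GB_PER_TB / n_genomes


def projected_pb(n_genomes, gb_per_genome):
    """Total storage in petabytes, assuming the per-genome footprint is unchanged."""
    return n_genomes * gb_per_genome / (GB_PER_TB * TB_PER_PB)


SG10K_GENOMES = 10_000
ALL_CITIZENS = 3_500_000   # long-term goal quoted by Dr. Liu

low_gb = per_genome_gb(2, SG10K_GENOMES)
high_gb = per_genome_gb(3, SG10K_GENOMES)
low_pb = projected_pb(ALL_CITIZENS, low_gb)
high_pb = projected_pb(ALL_CITIZENS, high_gb)

print(f"Per genome: {low_gb:.0f} to {high_gb:.0f} GB")
print(f"All citizens: {low_pb:,.0f} to {high_pb:,.0f} PB")
```

On these assumptions, each genome accounts for roughly 200 to 300 gigabytes, and sequencing every citizen would push the total to somewhere between 700 and 1,050 petabytes, which helps explain why storage comes up so often in the conversation.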
Currently, advanced bioinformatics solutions including QIAGEN’s CLC Genomics Workbench, Ingenuity Pathway Analysis and the Human Gene Mutation Database help Dr. Liu’s team and other groups at the GIS to cope with this challenge. “Big Data is the driver of progress,” says Liu. Data generation will eventually become of secondary interest, as data processing, analysis and interpretation increasingly prove to be the major bottleneck for genomic research. “The future lies in computational genomics. The field is moving forward fast, and is multidimensional. Machine learning and Artificial Intelligence will be the next big thing,” Liu says.
Understanding the genetic makeup
Asia is a booming market for pharmaceuticals and personalized consumer products. But to tailor products and medicines to the Asian market, the actual genetic makeup of the region’s population needs to be better understood. “Globally a lot of genetic information has been collected, but mostly from Caucasians. Information on Asian populations has been lacking,” according to Liu. Genetic variation plays an important role in a variety of human diseases and quantitative traits. Many genetic findings have shown population-specific characteristics, highlighting the importance of population diversity in human genetic studies.
“Singapore has an extremely diverse genetic pool,” says Liu. The island has long been at the center of one of the world’s busiest trading routes, and immigrants from all over Asia have made it their home. Today, the population consists of three major ethnic groups, Chinese, Malay and Indian, which together represent over 80 percent of the genetic diversity of Asia’s population. The tropical island nation’s colorful history, combined with a business-oriented, forward-thinking government, a first-class health-care system and the capacity to quickly implement policy, makes it an ideal place to study Asia’s genetic diversity.
Dr. Liu has no doubt that genomics will revolutionize medicine, and judging by their large-scale investments in pharmacogenomics, Big Pharma companies agree.
The aim is to understand which variants play a role in diseases and to build a personalized treatment plan through genetic profiling. Another goal is to establish who will benefit from a drug, or, on the contrary, who may have an adverse reaction.
Big Data projects like SG10K will facilitate genome-wide association studies and provide hypothesis-free, genome-wide control samples, says Liu, who is relentlessly optimistic about the coming years in his field. “Moving forward, the question will be ‘How can I manipulate the genes that make us sick?’” Mankind will have powerful tools to edit and rearrange genes, Liu predicts. “Can we understand all genetic diseases? Yes. Will we figure out a genetic therapy for each disease? Probably. Will we be able to live out a lifespan of 120 years if we eliminate all diseases? I don’t really know, but I think in 20 to 30 years we will be able to edit genomes,” he says.
He envisions that in five to 10 years the genome of every newborn baby will be sequenced at birth. The information will be stored in secure cloud environments and selectively shared with healthcare professionals with the patient’s consent, much as other healthcare records are shared today. Eventually, this will be done on an international level, once questions about data security and privacy have been resolved. “Of course this is very powerful data. We need to deeply discuss the ethical implications and come up with a plan for data control and security,” Liu says.
In the meantime, Dr. Liu and his team are trying to tackle more prosaic problems. Experts at GIS are developing new algorithms and testing new tools for sharing data. The idea is to keep the data centralized. “For obvious reasons medical information is very sensitive. We are nervous to give access,” says Liu. A larger team of scientists is working on a system that would allow the health sector to upload patient data anonymously into a matrix and then play around with it “like in a sandbox.” Storage is another key to future progress: a way has to be found to store the massive amounts of data that Big Data projects produce. To this end, Singapore is now exploring various cloud-storage solutions.
The path to genomic medicine, and to its implementation in routine medical care through personalized therapies, is being fortified by results from population-based sequencing initiatives. These include not only SG10K but also a number of other initiatives: the Chinese Million Genomes endeavor, which aims to sequence the genomes of one million people; the U.S.-based Precision Medicine Initiative, which is targeting the same number of patients; whole-genome population studies in the Netherlands, Qatar, Turkey and Japan; and projects such as the International Cancer Genome Consortium, which coordinates large-scale cancer genome studies.