Will Google sequence your genome for free?

By Richard Owen

12 Mar 2020

Find the authors
on LinkedIn:

Richard Owen

Head of Biology

Recent reductions in the cost of DNA sequencing are beginning to make it feasible for healthcare services in affluent countries to sequence the full genome of every citizen. This could deliver huge benefits in the early detection and management of many diseases and, consequently, profoundly improve clinical practice. However, the difficulty of balancing available cash, suitable technical resources and access to personal data is acting as a significant block to the genome’s potential.

The first human genome was sequenced by the Human Genome Project, which took 13 years, cost $2.7 billion and involved 20 of the world’s leading laboratories [1]. Since then there have been spectacular improvements in the speed and cost of genome sequencing, so much so that they significantly outstrip the well-known Moore’s law improvements in the computer chip industry [2, 3]. As an example of this progress, Veritas Genetics is currently offering a whole genome sequencing service direct to the consumer for less than £500 [4]. It is highly likely that in a few years’ time, with further advances in technology and the inevitable scaling efficiencies of testing at a national level, the cost per sequenced genome could drop to around £50.

At this price point, the costs involved in a nationwide genome sequencing program do not seem unreasonable. To sequence every child born in the UK (731,000 per year, or about 2,000 per day) would cost about £37m a year, which is a mere 0.03% of NHS England’s 2018 budget [5, 6]. A person’s genome does not change over time, so if it is sequenced at birth its information can be used throughout their life and hence deliver maximum benefit. Even sequencing the entire population would be equivalent to just 3% of the budget for 2018 [6]. The apparent attainability of this is reinforced when considering that the UK government spent over £300m on the 100,000 Genomes Project and plans to turn the NHS into “the first mainstream health service in the world to offer genomic medicine as part of routine care” [7]. Access to such a phenomenal genetic resource, paired with the demographic and health records held on each person by the NHS, would allow academics and clinicians to fast-track research and identify the sequences that allow early and corrective management of many of today’s crippling chronic diseases.
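The arithmetic behind these figures can be checked with a short Python sketch. The births and cost-per-genome values are the ones quoted above; the budget is not stated in the article, so the sketch only back-calculates the budget implied by the quoted 0.03% share and should not be read as a sourced figure. The function and variable names are illustrative.

```python
# Figures quoted in the article
BIRTHS_PER_YEAR = 731_000       # UK births per year
COST_PER_GENOME_GBP = 50        # projected future cost per sequenced genome
QUOTED_BUDGET_SHARE = 0.0003    # the "0.03%" of the 2018 budget


def annual_sequencing_cost(births: int, cost_per_genome: int) -> int:
    """Total yearly cost of sequencing every newborn."""
    return births * cost_per_genome


def implied_budget(cost: float, share: float) -> float:
    """Budget that would make `cost` equal to `share` of it (a back-calculation)."""
    return cost / share


annual_cost = annual_sequencing_cost(BIRTHS_PER_YEAR, COST_PER_GENOME_GBP)
births_per_day = BIRTHS_PER_YEAR / 365
budget = implied_budget(annual_cost, QUOTED_BUDGET_SHARE)

print(f"Annual cost: £{annual_cost / 1e6:.1f}m")
print(f"Births per day: {births_per_day:.0f}")
print(f"Budget implied by a 0.03% share: £{budget / 1e9:.0f}bn")
```

Under these assumptions the yearly cost comes to a little under £37m, and the quoted percentage implies a budget in the region of £120bn.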

So, while the above might suggest this is an obvious project for governments to invest in, there are very good reasons why they might hesitate. Firstly, we do not know how to make use of the vast majority of the data that would be collected: we can read people’s genomes at a relatively low cost but struggle to understand and make full use of the information they contain, and this may well take many years to change. Secondly, two very significant pieces of infrastructure have to be established. The first is a national facility capable of sequencing 2,000 genomes a day; the second is a data storage and analysis capability that can handle the astonishing amount of data such a system would create. One way to visualize the amount of data is that the roughly 750,000 genomes collected every year would require a stack of standard 4 gigabyte DVDs about 1.5 miles high [8]. Indeed, the data storage and processing required for large-scale genomic analysis is seen by some as the biggest source of so-called Big Data and possibly the most challenging [9]. The UK Government has a poor record in both large infrastructure projects and IT projects, so it is unlikely to decide to invest until it knows exactly how the data would be utilized and the required infrastructure has been de-risked.
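To give the DVD comparison some scale, the following Python sketch works backwards from the stack height to a total data volume and a rough per-genome size. The genome count, disc capacity and stack height come from the paragraph above; the 1.2 mm disc thickness is an assumption (the nominal thickness of a standard DVD), and the per-genome figure is therefore only an implied estimate, not a number taken from the cited source.

```python
METRES_PER_MILE = 1609.344

# Figures quoted in the article
GENOMES_PER_YEAR = 750_000
DVD_CAPACITY_GB = 4
STACK_HEIGHT_MILES = 1.5

# Assumption: nominal thickness of a standard DVD
DVD_THICKNESS_M = 0.0012


def discs_in_stack(height_miles: float, thickness_m: float) -> float:
    """Number of discs in a stack of the given height."""
    return height_miles * METRES_PER_MILE / thickness_m


discs = discs_in_stack(STACK_HEIGHT_MILES, DVD_THICKNESS_M)
total_gb = discs * DVD_CAPACITY_GB
gb_per_genome = total_gb / GENOMES_PER_YEAR

print(f"Discs in the stack: {discs:,.0f}")
print(f"Total data per year: {total_gb / 1e6:.1f} PB")
print(f"Implied data per genome: {gb_per_genome:.1f} GB")
```

With these inputs the stack holds around two million discs, or roughly 8 petabytes a year, which works out at a little over 10 GB per genome.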

So, while the UK government (and others) might view this as a risky and unjustifiable investment before we really know how to make use of even a fraction of the information, it is very possible that some companies might see it as an attractive investment opportunity. It is well known that some of the large IT companies (Apple, Amazon and Microsoft) are not only much closer to having the capabilities to handle the vast amounts of cloud-based data that would result from universal sequencing but have also made significant investments in healthcare opportunities. Google Ventures would be seen as a front runner, as it has already invested $1.5bn in healthcare, including in 23andMe (one of the leading direct-to-consumer sequencing companies), and is “especially interested in companies at the intersection of health and information technology” [10]. Google has already partnered with the Broad Institute of MIT and Harvard and offers, on its cloud services, a toolkit developed by the institute that can be used to analyze the data [8].

However, while Google and the others seem the obvious candidates to carry out this task, there are huge implications in profit-seeking companies holding personal data that could be used to predict the diseases, lifestyles, behaviors and preferences a person may have, theoretically allowing the ultimate targeting of advertising and insurance provision. Currently personal data is protected by the EU’s General Data Protection Regulation (GDPR) and the US’s Health Insurance Portability and Accountability Act (HIPAA), but these would have to be significantly updated to prevent the highly profitable abuse of data that could happen.

If neither government nor private companies can be trusted to carry this out, are we destined to miss out on the benefits of the secrets that our genomes hold? Possibly not, if the obvious solution of a partnership between government and the big IT companies can be set up with the appropriate business model and data protection. A private company could relatively easily set up the two necessary infrastructures of sequencing capability and cloud analytics, and I certainly wouldn’t bet against Google either scaling up 23andMe or else purchasing one of the major sequencing companies to do this. It could then run the sequencing service for all UK newborns for free and hold their sequences. These sequences would be linked to codes that prevent the company from identifying the person involved, but the government would hold a master list linking codes to identities (this is very similar to how clinical trials are run, where a private company holds data linked to a reference code but only the hospital can link a reference code to an identity).
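The split between coded sequences and a separately held master list can be sketched in a few lines of Python. This is a toy illustration of the idea only, not a description of any real system: the class names, the random code format and the placeholder identity and sequence are all hypothetical, and a real deployment would need access control, auditing and encryption that are not shown here.

```python
import secrets


class GovernmentRegistry:
    """Hypothetical holder of the master list linking codes to identities."""

    def __init__(self) -> None:
        self._code_to_identity: dict[str, str] = {}

    def issue_code(self, identity: str) -> str:
        # A random code carries no information about the person it refers to.
        code = secrets.token_hex(16)
        self._code_to_identity[code] = identity
        return code

    def resolve(self, code: str) -> str:
        return self._code_to_identity[code]


class SequencingCompany:
    """Hypothetical private store that only ever sees codes and sequences."""

    def __init__(self) -> None:
        self._code_to_sequence: dict[str, str] = {}

    def store(self, code: str, sequence: str) -> None:
        self._code_to_sequence[code] = sequence

    def fetch(self, code: str) -> str:
        return self._code_to_sequence[code]


registry = GovernmentRegistry()
company = SequencingCompany()

# The registry issues a code at birth; the company stores the sequence under it.
newborn_code = registry.issue_code("example-patient-001")  # placeholder identity
company.store(newborn_code, "ACGTACGT")                    # placeholder sequence

# Only a party holding both the code and the master list can join the two.
print(company.fetch(newborn_code), "belongs to", registry.resolve(newborn_code))
```

The design choice mirrored here is that neither store alone is enough: the company cannot name the person behind a sequence, and the registry holds no genetic data.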

As ongoing clinical research discovers new genetic biomarkers, the private company could then charge on a “pay per view” basis each time data is accessed by GPs or hospitals. From the payer’s viewpoint, this would mean paying for access only once the underlying science has been researched sufficiently for the data to be useful, and only at the moment it is actually clinically needed.

Academic researchers could hugely accelerate the rate of discovery of new biomarkers by data mining within the stored genomes. Using the master codes, they could link genomes to individuals and their NHS records, follow them over time and discover the relevance and utility of further DNA sequences. The ability of the private company to do similar data mining would be severely restricted by its lack of access to any health data.

This would appear to be a win/win/win situation: governments do not have to spend money on risky programs before there is any utility in doing so, patients will receive personal and predictive clinical therapy, and companies will be able to make profitable returns on investments in areas in which they are experts. Indeed, it could be argued that without this sort of bold public/commercial initiative it will be many years before we start to make real use of genomic data in routine clinical practice.

It would be interesting to canvass opinion on this, so do let me know your thoughts.

Richard Owen

Senior Healthcare Innovation Consultant

References

[1] US National Human Genome Research Institute report. Available at https://www.genome.gov/human-genome-project/Completion-FAQ
[2] G.E. Moore. Cramming More Components onto Integrated Circuits, Electronics, 114–117, April 1965 https://www.intel.co.uk/content/www/uk/en/silicon-innovations/moores-law-technology.html
[3] https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data
[4] https://www.veritasgenetics.com/
[5] Overview of the UK population: August 2019; Office for National Statistics. Available at https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/articles/overviewoftheukpopulation/august2019
[6] https://fullfact.org/health/spending-english-nhs/
[7] Wellcome Trust press release 2016. Available at https://wellcome.ac.uk/press-release/prime-minister-opens-%C2%A342m-biodata-innovation-centre-and-new-sequencing-facility
[8] https://www.washingtonpost.com/news/speaking-of-science/wp/2015/07/07/sequencing-the-genome-creates-so-much-data-we-dont-know-what-to-do-with-it/
[9] Stephens et al. PLOS Biology 2015. DOI:10.1371
[10] https://www.gv.com/portfolio/
