Will Google sequence your genome for free?

By Richard Owen

12 Mar 2020


Richard Owen

Head of Biology

Recent advances in reducing the cost of DNA sequencing are beginning to raise the possibility of healthcare services in affluent countries sequencing the full genome of every citizen. This has the potential to deliver huge benefits in the early detection and management of many diseases, and could consequently improve clinical practice profoundly. However, the need to balance available funding, suitable technical resources and access to personal data is acting as a significant block to realising the genome's potential.

The first human genome was sequenced by the Human Genome Project, which took 13 years, cost $2.7 billion and involved 20 of the world's leading laboratories [1]. Since then there have been spectacular improvements in the speed and cost of genome sequencing, so much so that they have significantly outstripped the well-known Moore's law improvements in the computer chip industry [2, 3]. As an example of this progress, Veritas Genetics are currently offering a whole genome sequencing service direct to the consumer for less than £500 [4]. It is highly likely that, within a few years, further advances in technology and the inevitable economies of scale of testing at a national level could bring the cost per sequenced genome down to around £50.

At this price point, the cost of a nationwide genome sequencing program does not seem unreasonable. Sequencing every child born in the UK (731,000 per year, or about 2,000 per day) would cost £37m a year, a mere 0.03% of NHS England's 2018 budget [5, 6]. A person's genome does not change over time, so if it is sequenced at birth the information can be used throughout their life, delivering the maximum benefit. Even sequencing the entire population would cost the equivalent of just 3% of the 2018 budget [6]. The feasibility of this is reinforced by the fact that the UK government spent over £300m on the 100,000 Genomes Project and plans to turn the NHS into "the first mainstream health service in the world to offer genomic medicine as part of routine care" [7]. Access to such a phenomenal genetic resource, paired with the demographic and health records the NHS holds on each person, would allow academics and clinicians to fast-track research and identify the sequences that enable early, corrective management of many of today's crippling chronic diseases.
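The arithmetic in the paragraph above can be checked with a few lines of Python. This is a sketch under stated assumptions: the £50 cost and 731,000 births are the article's own figures, but the UK population (taken here as roughly 66.4m) and the NHS England budget (an illustrative £120bn placeholder; substitute the exact figure from [6]) are assumptions supplied for the example.

```python
# Back-of-envelope check of the costs quoted above.
# From the article: £50 per genome, 731,000 UK births per year.
# ASSUMPTIONS (not from the article): UK population and NHS England budget.

COST_PER_GENOME_GBP = 50
UK_BIRTHS_PER_YEAR = 731_000
UK_POPULATION = 66_400_000          # assumption: approximate mid-2018 estimate
NHS_ENGLAND_BUDGET_GBP = 120e9      # assumption: illustrative placeholder, see [6]


def sequencing_cost(people: float, cost_per_genome: float = COST_PER_GENOME_GBP) -> float:
    """Total cost (GBP) of sequencing `people` genomes at a flat per-genome price."""
    return people * cost_per_genome


def budget_share_percent(cost: float, budget: float = NHS_ENGLAND_BUDGET_GBP) -> float:
    """Express a cost as a percentage of the budget."""
    return 100 * cost / budget


newborn_cost = sequencing_cost(UK_BIRTHS_PER_YEAR)
population_cost = sequencing_cost(UK_POPULATION)

print(f"Births per day:   {UK_BIRTHS_PER_YEAR / 365:,.0f}")
print(f"Newborns/year:    £{newborn_cost / 1e6:.1f}m "
      f"({budget_share_percent(newborn_cost):.3f}% of budget)")
print(f"Whole population: £{population_cost / 1e9:.2f}bn "
      f"({budget_share_percent(population_cost):.1f}% of budget)")
```

With these inputs the newborn program comes to roughly £36.5m a year, about 0.03% of the assumed budget. The whole population comes to about £3.3bn, just under 3%, which is in line with the figures quoted in the article.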

So, while the above might suggest this is an obvious project for governments to invest in, there are very good reasons why they might be hesitant to do so. Firstly, we do not know how to use the vast majority of the data that would be collected. We are in a position where we can read people's genomes at relatively low cost but struggle to understand and make full use of the information they contain, and this may well take many years to change. Secondly, two very significant pieces of infrastructure would have to be established. The first is a national facility capable of sequencing 2,000 genomes a day; the second is a data storage and analysis capability that can handle the astonishing amount of data such a system would create. One way to visualize this is that the roughly 750,000 genomes sequenced each year would fill a stack of standard 4 gigabyte DVDs about 1.5 miles high [8]. Indeed, some see the storage and processing required for large-scale genomic analysis as potentially the biggest source of so-called Big Data, and possibly its most challenging [9]. The UK Government has a poor record on both large infrastructure projects and IT projects, so it is unlikely to commit to investing until it knows exactly how the data would be used and the required infrastructure has been de-risked.
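To make the DVD image concrete, the sketch below works backwards from it to the storage per genome it implies. It uses the article's figures (750,000 genomes, 4 GB discs, a 1.5-mile stack) plus one assumption of our own: each disc is taken to be 1.2 mm thick.

```python
# Back-calculate the storage per genome implied by the "1.5-mile stack of DVDs" image.
# From the article: 750,000 genomes per year, 4 GB per DVD, 1.5-mile stack.
# ASSUMPTION (not from the article): each disc is 1.2 mm thick.

METRES_PER_MILE = 1609.344
DISC_THICKNESS_M = 1.2e-3   # assumption: standard DVD thickness
DISC_CAPACITY_GB = 4


def discs_in_stack(height_miles: float) -> float:
    """Number of discs in a stack of the given height."""
    return height_miles * METRES_PER_MILE / DISC_THICKNESS_M


def implied_gb_per_genome(height_miles: float, genomes: int) -> float:
    """Storage per genome (GB) that would fill a stack of the given height."""
    return discs_in_stack(height_miles) * DISC_CAPACITY_GB / genomes


def stack_height_miles(gb_per_genome: float, genomes: int) -> float:
    """Inverse calculation: stack height (miles) for a given per-genome size."""
    discs = gb_per_genome * genomes / DISC_CAPACITY_GB
    return discs * DISC_THICKNESS_M / METRES_PER_MILE


per_genome = implied_gb_per_genome(1.5, 750_000)
total_pb = per_genome * 750_000 / 1e6   # GB -> PB
print(f"Implied size per genome: {per_genome:.1f} GB")
print(f"Implied total per year:  {total_pb:.1f} PB")
```

On these assumptions the image corresponds to roughly 10.7 GB per genome, or about 8 PB a year. The real figure depends heavily on sequencing depth and on whether raw reads or only processed results are retained. `stack_height_miles` can be rerun with other per-genome sizes to see how quickly the pile grows.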

So, while the UK government (and others) might view this as a risky and unjustifiable investment before we really know how to use even a fraction of the information, it is very possible that some companies might see it as an attractive investment opportunity. It is well known that some of the large IT companies (such as Apple, Amazon and Microsoft) are much closer to having the capability to handle the vast amounts of cloud-based data that universal sequencing would produce, and have also made significant investments in healthcare. Google Ventures would be seen as a front runner. It has already invested $1.5bn in healthcare, including in 23andMe (one of the leading direct-to-consumer genetic testing companies), and is "especially interested in companies at the intersection of health and information technology" [10]. Google has also partnered with the Broad Institute of MIT and Harvard, offering through its cloud services a toolkit developed by the institute that can be used to analyze the data [8].

However, while Google and the others seem the obvious candidates to carry out this task, there are huge implications in profit-seeking companies holding personal data that could be used to predict people's likely diseases, lifestyles, behaviors and preferences. In theory, this would allow the ultimate in targeted advertising and insurance provision. Personal data is currently protected by the EU's General Data Protection Regulation (GDPR) and, for health information in the US, by the Health Insurance Portability and Accountability Act (HIPAA). However, both would have to be significantly updated to prevent the highly profitable abuse of data that could otherwise occur.

If neither governments nor private companies can be trusted to carry this out, are we destined to miss out on the benefits of the secrets our genomes hold? Possibly not, if the obvious solution can be set up with the appropriate business model and data protection: a partnership between government and the big IT companies. A private company could relatively easily set up the two necessary pieces of infrastructure, sequencing capability and cloud analytics. I certainly wouldn't bet against Google either scaling up 23andMe or buying one of the major sequencing companies to do this. The company could then run the sequencing service for all UK newborns for free and hold their sequences. These sequences would be linked to codes that prevent the company from identifying the individuals, while the government would hold a master list linking codes to identities. This is very similar to how clinical trials are run, where a private company holds data linked to a reference code but only the hospital can link that code to an identity.
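The coded-data arrangement described above can be sketched as two stores that share only a code. A government registry maps random codes to identities, and a company store maps the same codes to sequences. This is a minimal illustration, not a proposed implementation. The class names, methods and the `NHS-0001` identifier are all hypothetical.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class GovernmentRegistry:
    """Hypothetical master list: the only place codes are linked to identities."""
    _code_to_identity: dict[str, str] = field(default_factory=dict)

    def register(self, identity: str) -> str:
        """Issue a fresh random code for a person and record the link."""
        code = secrets.token_hex(16)  # 128 random bits; reveals nothing about the person
        while code in self._code_to_identity:  # guard against (astronomically unlikely) reuse
            code = secrets.token_hex(16)
        self._code_to_identity[code] = identity
        return code

    def resolve(self, code: str) -> str:
        """Re-identify a code, e.g. for an approved clinician or researcher."""
        return self._code_to_identity[code]


@dataclass
class SequencingCompany:
    """Hypothetical company store: sequences keyed only by code, never by identity."""
    _code_to_sequence: dict[str, str] = field(default_factory=dict)

    def store(self, code: str, sequence: str) -> None:
        self._code_to_sequence[code] = sequence

    def fetch(self, code: str) -> str:
        return self._code_to_sequence[code]


# Usage: the government issues the code; the company only ever sees the code.
registry = GovernmentRegistry()
company = SequencingCompany()
code = registry.register("NHS-0001")   # hypothetical identifier
company.store(code, "ACGTTGCA")        # toy sequence
print(company.fetch(code), "->", registry.resolve(code))
```

The sketch uses random codes rather than, say, a hash of the NHS number. NHS numbers are only ten digits long, so an unkeyed hash of one could be reversed simply by hashing every possible number.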

As ongoing clinical research discovers new genetic biomarkers, the private company could then charge on a "pay per view" basis each time the data is accessed by GPs or hospitals. From the payer's viewpoint, this would mean paying for access only once the information has been researched sufficiently to be usable, and only at the moment it is actually clinically needed.
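The "pay per view" model can be sketched as a ledger that logs every access and bills each payer per view. This is a hypothetical illustration: the flat fee and the class and method names are assumptions, and a real scheme might well price different tests differently. The sketch uses integer pence to avoid floating-point rounding in money amounts.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PayPerViewLedger:
    """Hypothetical ledger: a flat fee is charged each time a payer views a genome."""
    fee_pence: int                                                  # hypothetical fee per view
    audit_log: list[tuple[str, str]] = field(default_factory=list)  # (payer, genome code)
    _views: Counter = field(default_factory=Counter, init=False)

    def record_view(self, payer: str, genome_code: str) -> None:
        """Log one access and count it against the payer."""
        self.audit_log.append((payer, genome_code))
        self._views[payer] += 1

    def invoice_pence(self, payer: str) -> int:
        """Amount owed by a payer: number of views times the fee."""
        return self._views[payer] * self.fee_pence


ledger = PayPerViewLedger(fee_pence=500)
ledger.record_view("GP practice A", "3f9c")   # hypothetical payer and code
ledger.record_view("GP practice A", "71be")
ledger.record_view("Hospital B", "3f9c")
print(ledger.invoice_pence("GP practice A"))  # 1000
```

The audit log also records who looked at which coded genome, which would support the kind of access oversight the data-protection concerns above call for.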

Academic researchers could hugely accelerate the discovery of new biomarkers by data mining the stored genomes. Using the master codes, they could link genomes to identities and NHS records, follow individuals over time and discover the relevance and utility of further DNA sequences. The private company's ability to do similar data mining would be severely restricted by its lack of access to any health data.

This would appear to be a win/win/win situation. Governments do not have to spend money on risky programs before there is any utility in doing so, patients will receive personalized and predictive clinical care, and companies will be able to make profitable returns on investments in areas where they are experts. Indeed, it could be argued that without this sort of bold public/commercial initiative it will be many years before we start to make real use of genomic data in routine clinical practice.

It would be interesting to canvass opinion on this, so let me know your thoughts.

Richard Owen

Senior Healthcare Innovation Consultant

References

[1] US National Human Genome Research Institute, Human Genome Project completion FAQ. Available at https://www.genome.gov/human-genome-project/Completion-FAQ
[2] G.E. Moore. Cramming More Components onto Integrated Circuits. Electronics, 114–117, April 1965. Available at https://www.intel.co.uk/content/www/uk/en/silicon-innovations/moores-law-technology.html
[3] National Human Genome Research Institute, DNA Sequencing Costs: Data. Available at https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data
[4] Veritas Genetics. Available at https://www.veritasgenetics.com/
[5] Overview of the UK population: August 2019; Office for National Statistics. Available at https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/articles/overviewoftheukpopulation/august2019
[6] Full Fact, Spending on the English NHS. Available at https://fullfact.org/health/spending-english-nhs/
[7] Wellcome Trust press release 2016. Available at https://wellcome.ac.uk/press-release/prime-minister-opens-%C2%A342m-biodata-innovation-centre-and-new-sequencing-facility
[8] The Washington Post, Speaking of Science, 7 July 2015. Available at https://www.washingtonpost.com/news/speaking-of-science/wp/2015/07/07/sequencing-the-genome-creates-so-much-data-we-dont-know-what-to-do-with-it/
[9] Stephens et al. PLOS Biology 2015. DOI:10.1371
[10] GV (Google Ventures) portfolio. Available at https://www.gv.com/portfolio/
