

Akari Therapeutics Reports Third Quarter 2020 Financial Results and Highlights Recent Clinical Progress – GlobeNewswire

December 11, 2020 09:00 ET | Source: Akari Therapeutics Plc

NEW YORK and LONDON, Dec. 11, 2020 (GLOBE NEWSWIRE) -- Akari Therapeutics, Plc (Nasdaq: AKTX), a late-stage biopharmaceutical company focused on innovative therapeutics to treat orphan autoimmune and inflammatory diseases where complement (C5) and/or leukotriene (LTB4) systems are implicated, today announced financial results for the third quarter ended September 30, 2020, as well as recent clinical progress.

"With the imminent opening of our Phase III trial in pediatric patients with HSCT-TMA in Europe and a clear regulatory path in the U.S. and Europe for our Phase III study in patients with BP, we are now in the exciting position of progressing two Phase III programs in orphan diseases in which there are no approved treatments," said Clive Richardson, Chief Executive Officer of Akari Therapeutics.

Third Quarter 2020 and Recent Clinical Highlights

Akari's two lead programs in BP and HSCT-TMA are in Phase III development. The Company also has programs addressing lung and ophthalmology diseases.

Phase III clinical trial in patients with BP

Phase III clinical trial in pediatric patients with HSCT-TMA

Ophthalmology program

Lung program

PNH - long term data

Third Quarter 2020 Financial Results

COVID-19 Corporate Update

Akari's clinical trial sites are based in areas currently affected by the global outbreak of the COVID-19 pandemic, and public health epidemics such as this can adversely impact the Company's business as a result of disruptions, such as travel bans, quarantines, and interruptions to access the trial sites and supply chains, which could result in material delays and complications with respect to research and development programs and clinical trials. Moreover, as a result of the pandemic, there is a general unease about conducting unnecessary activities in medical centers. As a consequence, the Company's ongoing trials have been halted or disrupted. For example, the Phase I/II clinical trial in patients with AKC has been halted and recruitment in the Phase III clinical trial in pediatric patients with HSCT-TMA has been and may continue to be delayed. It is too early to assess the full impact of the coronavirus outbreak on trials for nomacopan, but coronavirus is expected to affect Akari's ability to complete recruitment in the original timeframes. The extent to which the COVID-19 pandemic impacts operations will depend on future developments, which are highly uncertain and cannot be predicted with confidence, including the duration and continued severity of the outbreak, and the actions that may be required to contain the coronavirus or treat its impact. In particular, the continued spread of COVID-19 globally could adversely impact the Company's operations and workforce, including research and clinical trials and the ability to raise capital, could affect the operations of key governmental agencies, such as the FDA, which may delay the development of the Company's product candidates, and could result in the inability of suppliers to deliver components or raw materials on a timely basis or at all, each of which in turn could have an adverse impact on the Company's business, financial condition and results of operation.

About Akari Therapeutics

Akari is a biopharmaceutical company focused on developing inhibitors of acute and chronic inflammation, specifically for the treatment of rare and orphan diseases, in particular those where the complement (C5) or leukotriene (LTB4) systems, or both complement and leukotrienes together, play a primary role in disease progression. Akari's lead drug candidate, nomacopan (formerly known as Coversin), is a C5 complement inhibitor that also independently and specifically inhibits leukotriene B4 (LTB4) activity.

Cautionary Note Regarding Forward-Looking Statements

Certain statements in this press release constitute forward-looking statements within the meaning of the Private Securities Litigation Reform Act of 1995. You should not place undue reliance upon the Company's forward-looking statements. Except as required by law, the Company undertakes no obligation to revise or update any forward-looking statements in order to reflect any event or circumstance that may arise after the date of this press release. These forward-looking statements reflect our current views about our plans, intentions, expectations, strategies and prospects, which are based on the information currently available to us and on assumptions we have made. Although we believe that our plans, intentions, expectations, strategies and prospects as reflected in or suggested by those forward-looking statements are reasonable, we can give no assurance that the plans, intentions, expectations or strategies will be attained or achieved. Furthermore, actual results may differ materially from those described in the forward-looking statements and will be affected by a variety of risks and factors that are beyond our control.
Such risks and uncertainties for our company include, but are not limited to: needs for additional capital to fund our operations, our ability to continue as a going concern; uncertainties of cash flows and inability to meet working capital needs; an inability or delay in obtaining required regulatory approvals for nomacopan and any other product candidates, which may result in unexpected cost expenditures; our ability to obtain orphan drug designation in additional indications; risks inherent in drug development in general; uncertainties in obtaining successful clinical results for nomacopan and any other product candidates and unexpected costs that may result therefrom; difficulties enrolling patients in our clinical trials; our ability to enter into collaborative, licensing, and other commercial relationships and on terms commercially reasonable to us; failure to realize any value of nomacopan and any other product candidates developed and being developed in light of inherent risks and difficulties involved in successfully bringing product candidates to market; inability to develop new product candidates and support existing product candidates; the approval by the FDA and EMA and any other similar foreign regulatory authorities of other competing or superior products brought to market; risks resulting from unforeseen side effects; risk that the market for nomacopan may not be as large as expected; risks associated with the impact of the COVID-19 pandemic; risks associated with the SEC investigation; inability to obtain, maintain and enforce patents and other intellectual property rights or the unexpected costs associated with such enforcement or litigation; inability to obtain and maintain commercial manufacturing arrangements with third party manufacturers or establish commercial scale manufacturing capabilities; the inability to timely source adequate supply of our active pharmaceutical ingredients from third party manufacturers on whom the company depends; unexpected cost increases and pricing pressures and risks and other risk factors detailed in our public filings with the U.S. Securities and Exchange Commission, including our most recently filed Annual Report on Form 20-F filed with the SEC. Except as otherwise noted, these forward-looking statements speak only as of the date of this press release and we undertake no obligation to update or revise any of these statements to reflect events or circumstances occurring after this press release. We caution investors not to place considerable reliance on the forward-looking statements contained in this press release.

AKARI THERAPEUTICS, Plc

CONDENSED CONSOLIDATED BALANCE SHEETS As of September 30, 2020 and December 31, 2019 (in U.S. Dollars, except share data)

AKARI THERAPEUTICS, Plc

CONDENSED CONSOLIDATED STATEMENTS OF COMPREHENSIVE INCOME (LOSS) - UNAUDITED For the Three and Nine Months Ended September 30, 2020 and September 30, 2019 (in U.S. Dollars)

For more information Investor Contact:

Peter Vozzo Westwicke (443) 213-0505 peter.vozzo@westwicke.com

Media Contact:

Sukaina Virji / Lizzie Seeley Consilium Strategic Communications +44 (0)20 3709 5700 Akari@consilium-comms.com


Researchers identify the origin of a deadly brain cancer – McGill Newsroom

Finding could lead to potential therapies

Researchers at McGill University are hopeful that the identification of the origin and a specific gene needed for tumour growth could lead to new therapeutics to treat a deadly brain cancer that arises in teens and young adults. The discovery relates to a subgroup of glioblastoma, a rare but aggressive form of cancer that typically proves fatal within three years of onset. The findings are published in the latest issue of the journal Cell.

To complete their study, the research team, led by McGill's Dr. Nada Jabado, Professor of Pediatrics and Human Genetics, and Dr. Claudia Kleinman, Assistant Professor of Human Genetics, assembled the largest collection of samples for this subgroup of glioblastoma and discovered new cancer-causing mutations in a gene called PDGFRA, which drives cell division and growth when it is activated.

The researchers noted that close to half of the patients at diagnosis and the vast majority at tumour recurrence had mutations in this gene, which was also unusually highly expressed in this subgroup of glioblastoma. "We investigated large public datasets of both children and adult patients in addition to those we had generated from patients' samples in the lab and came to the same conclusion: PDGFRA was unduly activated in these tumours. This led us to suspect this kinase plays a major role in tumour formation," explain Dr. Carol Chen, a postdoctoral fellow, and Shriya Deshmukh, an MD-PhD candidate in the Jabado lab, the study's co-first authors.

Employing a big data resource generated by their team using new technologies that measure the levels of every gene in thousands of individual cells, they were able to discover that this brain tumour originates in a specific type of neuronal stem cell. "We used single cell analyses to create an atlas of the healthy developing brain, identifying hundreds of cell types and their traits," explains Selin Jessa, a PhD student in the Kleinman lab and co-first author on this study. "Since these brain tumours retain a memory, or footprint, of the cell in which they originated, we could then pinpoint the most similar cell type for these tumours in the atlas, in this case, inhibitory neuronal progenitors that arise during fetal development or after birth in specific structures of the developing brain," adds Dr. Kleinman, who leads a computational research lab at the Lady Davis Institute at the Jewish General Hospital.

An unexpected finding

The researchers note that the PDGFRA gene is not usually turned on in this neuronal stem cell population. "By using sequencing technologies that measure how a cell's DNA is spatially organized in 3D," notes Djihad Hadjadj, a postdoctoral fellow in the Jabado lab and the study's co-first author, "we found that, exquisitely in this neuronal stem cell, the DNA has a unique structure in the 3D dimension that allows the PDGFRA gene to become activated where it shouldn't be, ultimately leading to cancer."

The finding is also important in properly classifying the tumour. "Previously, this tumour type was classified as a glioma, because under the microscope, it resembles glial cells, one of the major cell types in the brain," says Dr. Jabado, who holds a CRC Tier 1 in Pediatric Oncology in addition to being a clinician scientist at the Montreal Children's Hospital and leading a research lab focused on studying brain tumours at the Research Institute of the McGill University Health Centre. "Our work reveals that this is a case of mistaken identity. These tumours actually arise in a neuronal cell, not a glial cell."

A hope for potential treatment

PDGFRA is targetable by drugs that inhibit its activity, and there are, in fact, already approved drugs that target it for other cancers for which mutations in this gene are responsible, such as gastrointestinal stromal tumours. This offers hope for work into finding targeted therapies for this group of deadly brain tumours, note the researchers.

The combined studies of the genome, including at the single cell level and the genomic architecture in 3D of the tumour compared to the normal developing brain, were crucial in this study. They helped identify the specific timepoints during development where the cell is vulnerable to the cancer-driver event in these gliomas, which were revealed to be neuronal tumours. Importantly, the authors unravel genetic events that could lead to targeted therapy in a deadly cancer. "Our findings provide hope for improved care in the near future for this tumour entity as these exquisite vulnerabilities may pinpoint to treatments that would preferentially attack the bad cells," say Drs. Jabado and Kleinman, who have joined efforts in the fight against deadly brain tumours. Stalled development is at the root of many of these cancers. The same strategy will prove important to unravel the origin, identify and exploit specific vulnerabilities, and orient future strategies for earlier detection in other brain tumour entities affecting children and young adults.

This study was made possible in large part thanks to support from the Genome Canada LSARP project "Tackling Childhood Brain Cancer at the root to improve survival and quality of life," which includes funding from Genome Canada, Genome Quebec, CIHR and other sources, as well as the Fondation Charles-Bruneau and the National Institutes of Health.

"Histone H3.3G34-Mutant Interneuron Progenitors Co-opt PDGFRA for Gliomagenesis," by C. Chen, S. Deshmukh, S. Jessa, D. Hadjadj, C. Kleinman, N. Jabado, et al., was published in the journal Cell on December 10, 2020. DOI: https://doi.org/10.1016/j.cell.2020.11.012


Cancer Stem Cell Therapy Market Revenue, Global Forecast, Cost, Key Participants and Emerging Trends and Key Players-AVIVA BioSciences , AdnaGen – The…

Summary of the Cancer Stem Cell Therapy Market Report

Rise in R&D activities across the globe, increase in demand and growth across several application areas are some of the factors boosting the growth of the market.

Key Companies

AVIVA BioSciences, AdnaGen, Advanced Cell Diagnostics, Silicon Biosystems

Cancer Stem Cell Therapy Market by Type

Autologous Stem Cell Transplants, Allogeneic Stem Cell Transplants, Syngeneic Stem Cell Transplants, Others

Cancer Stem Cell Therapy Market by Application

Hospital, Clinic, Medical Research Institution

The major regional markets covered under the scope of the study are APAC, North America, Europe, South & Central America, Africa and the Middle East. Countries and areas covered include Singapore, Russia, Mexico, South America, Canada, France, the U.S., Germany, Africa, Italy, the United Kingdom, India, China, the Middle East, Central America, Japan, Taiwan, and South Korea, among others.

To know more about the report, visit https://decisivemarketsinsights.com/cancer-stem-cell-therapy-market/58996063/request-sample

Cancer Stem Cell Therapy Market Overview, Key Trends Market Dynamics

Growth across various application areas and major geographies, growing R&D activities and rising demand are some of the key factors currently driving this market. The market would witness significant growth throughout the forecast period. Other factors include a rising rate of adoption and product improvements, which are driving demand at a fast pace. At present, in 2020, the effect of COVID-19 can be seen; however, the market will soon recover in the coming years, probably by 2021.

Regional Coverage of Global Cancer Stem Cell Therapy Market

Mexico, Canada, and the United States are the major countries covered under North America.

Italy, the UK, Germany, France, and Russia are covered under Europe.

Taiwan, India, China, South Korea, Singapore, Japan, and Others are covered under Asia Pacific.

Rest of the World (RoW) covers Africa, South America & Central America and the Middle East.

COVID-19 Impact Analysis

The report also offers a detailed insight of COVID-19 impact analysis: before COVID-19, the present scenario, and post-recovery of COVID-19.

To inquire before purchasing the report, visit https://decisivemarketsinsights.com/cancer-stem-cell-therapy-market/58996063/pre-order-enquiry

Table of Contents

Customization can be availed on request:

Chapter 1: Introduction and Scope
Chapter 2: Key Company Profiles
Chapter 3: Remarks, Share and Forecast across type, application and geography
Chapter 4: Market Remarks of Asia Pacific region
Chapter 5: Market Remarks of Europe region
Chapter 6: Market Remarks of Asia Pacific region
Chapter 7: Market Remarks of North America region
Chapter 8: Market Remarks of Middle East and Africa region
Chapter 9: Key Important features of the market
Chapter 10: Key trends of the market and the market Opportunities
Chapter 11: Strategies to be adopted by the key players

Continued.

Key Pointers of the Report

For each and every segment and its sub-segments, market share and growth rate are given. Estimation and forecast are provided from 2020 to 2027. A data triangulation method has been followed to conclude the market. The study also includes the strategies to be followed by the major players. COVID-19 impact analysis was also covered under the framework of impact analysis.

Supplementary Pointers of the Report:

Stated below are some of the added key points of the report:

SWOT Analysis, Porter's Five Forces Analysis, Value Chain Analysis, Market Attractiveness Analysis, PEST Analysis

To inquire about the discount available with the report, visit https://decisivemarketsinsights.com/cancer-stem-cell-therapy-market/58996063/request-discount

Note: Year-End Discount. If you purchase the report this year: flat 15% instant discount, 20% discount on 2nd report, 1 year of consultation, and 10% free customization.

Kindly contact us and our expert will get back to you within 30 minutes: Decisive Markets Insights, Sunil Kumar, Sales Head. Email: sales@decisivemarketsinsights.com. US: +18317045538. UK: +44125663604.


Stem Cell therapist to visit Jefferson salon for special event – Marshall News Messenger

JEFFERSON - Guests to Salon Rouge Spa in Jefferson next Friday will have a chance to consult with stem cell therapist Gail McBride and her team of doctors and specialists, salon owner Brooke Bradley-LaFleur said Friday.

"One of my employees heard her ad on the radio and has been having shoulder pain," LaFleur said. "She looked into it and realized that Gail was planning to have knee surgery after suffering knee pain for years but instead she had the stem cell injection and was able to avoid surgery. Gail brought me pictures of her x-rays before and after the stem cell injection and you could clearly see a huge difference. It was amazing."

LaFleur said after talking with McBride, who owns Longview Regeneration and Wellness Center, that Jeffersonians would enjoy a chance to learn about possible alternatives to surgery for issues like joint pain and skin rejuvenation through stem cell therapy.

"What I really love is helping people avoid having to have surgery," LaFleur said. "Gail and her team of specialists and doctors will come down and offer consultations and then decide how to proceed. Stem cell therapy can also be used for anti-aging against wrinkles. Some people need just one injection and others need more, depending on the location and severity of the issue they are treating."

The event with McBride at Salon Rouge is set for 5 to 7 p.m. on Friday.

LaFleur said masks will be worn and social distancing will be enforced to make sure guests remain safe during the event.

Refreshments will be served and gift certificates will be awarded during the event.


Stem Cell Exosomes Market: Increasing advanced applications of exosomes is expected to drive the market – BioSpace

Stem Cell Exosomes Market: Overview

Exosomes possess the potential to be carriers for drug delivery owing to their transport properties. Stem cell exosomes also have high biocompatibility and intrinsic long-term circulation, which are ideal for carrying proteins, nucleic acids, and chemicals. Additionally, recent research has shown that exosomes act as mediators of intercellular communication, delivering mRNA transcripts, proteins, and other cargo. These properties make them biocompatible and potentially useful as agents for treating various disorders.

Rapidly increasing interest in advanced materials for disease-specific treatment in emergencies is spurring more research and funding to explore stem cell exosomes. This has been a key factor driving growth of the stem cell exosomes market over the past few years and is expected to remain so over the next few years.

Request Brochure of Report - https://www.transparencymarketresearch.com/sample/sample.php?flag=B&rep_id=80394

Stem Cell Exosomes Market: Notable Development

The stem cell exosomes market is identified as highly competitive without dominant players owing to many players operating in the market. Some of the key players in the market include Anjarium Biosciences, Codiak Biosciences, Capricor Therapeutics, Creative Medical Technology Holdings, Evox Therapeutics, Everkine Corporation, Exogenus Therapeutics, ReNeuron, Kimera Labs, and Unicyte AG.

The market is witnessing lucrative investments in the adoption of newer and improving technologies. Such investments take the form of a few acquisitions, mergers, and tie-ups, and aim to cater to the global population.

Request for Analysis of COVID-19 Impact on Stem Cell Exosomes Market - https://www.transparencymarketresearch.com/sample/sample.php?flag=covid19&rep_id=80394

Some of the developments observed in the market:

Pre Book Stem Cell Exosomes Market Report at https://www.transparencymarketresearch.com/checkout.php?rep_id=80394&ltype=S

Stem Cell Exosomes Market: Growth Factors

The factors driving growth of the market include the increasing prevalence of cancer and technological advancements in exosomes and their applications. Additionally, increasingly advanced applications of exosomes, coupled with growing awareness of improved medical techniques, are propelling growth of the global stem cell exosomes market. According to a 2012 report by the World Health Organization (WHO), the number of patients is expected to increase by 70% in the next two decades. An increase in patients may lead to an increase in fatalities due to cancer, which draws more attention toward advanced medications. This factor is likely to boost demand for exosomes in diagnosis and therapeutics.

However, a number of technical difficulties are limiting adoption globally and hindering growth of the global stem cell exosomes market. Other factors restraining market growth are stringent regulatory frameworks and commercialization of exosomes. Nonetheless, factors such as increased research, coupled with funding for that research, are estimated to open up opportunities for growth in the future.

Read more information here:

https://www.transparencymarketresearch.com/stem-cell-exosomes-market.html

About Us

Transparency Market Research is a next-generation market intelligence provider, offering fact-based solutions to business leaders, consultants, and strategy professionals.

Our reports are single-point solutions for businesses to grow, evolve, and mature. Our real-time data collection methods, along with the ability to track more than one million high-growth niche products, are aligned with your aims. The detailed and proprietary statistical models used by our analysts offer insights for making the right decision in the shortest span of time. For organizations that require specific but comprehensive information, we offer customized solutions through ad hoc reports. These requests are delivered with the perfect combination of the right sense of fact-oriented problem-solving methodologies and leveraging existing data repositories.

TMR believes that the unison of solutions for client-specific problems with the right research methodology is the key to helping enterprises reach the right decision.

Contact

Mr. Rohit Bhisey Transparency Market Research

State Tower,

90 State Street,

Suite 700,

Albany NY - 12207

United States

USA - Canada Toll Free: 866-552-3453

Email: sales@transparencymarketresearch.com

Website: https://www.transparencymarketresearch.com/


3 Stocks That Are Giving Their Investors Coal in Their Stockings – The Motley Fool

No CEO should be evaluated on their company's stock performance in a single year. Strategies often take several years to play out. When a company's fortunes depend on drug development, the timeline can be even longer. That's why no investor should take it too hard that Galapagos NV (NASDAQ:GLPG), Sage Therapeutics (NASDAQ:SAGE), and bluebird bio (NASDAQ:BLUE) have had a rough year.

The stocks could have wildly different outcomes in 2021, although recent developments make it seem like they are as likely to stage a comeback as they are to throw in the towel. During the holiday season, parents often tell misbehaving kids that Santa won't bring them what they want for Christmas. With share prices down dramatically from all-time highs, let's find out why shareholders of these three biotechs are getting coal in their stockings this year.


If you asked investors in July 2019, they probably would have been surprised to see Galapagos on a list like this one. Spirits were high at that time, as Gilead Sciences (NASDAQ:GILD) had just made a $5.1 billion investment in Galapagos for its research pipeline including filgotinib, the company's arthritis drug now marketed as Jyseleca. Although the drug was approved in Europe and Japan, the U.S. Food and Drug Administration rejected it due to toxicity concerns and doubts about the risk/benefit profile at dosage levels in the study.

The FDA's action will push approval out at least a year -- that is -- if Gilead wants to keep up the effort. Even if the drug were to make it through the regulatory gauntlet, it would face stiff competition from AbbVie's (NYSE:ABBV) Rinvoq, which gained approval in 2019. Galapagos has run more tests to allay the concerns, and if successful, Gilead could refile for approval next year. But today's investors don't seem confident. Shares in Galapagos are down 42% this year.

Zuranolone, Sage's drug that helps treat depression, failed a phase 3 trial in 2017. That was strike one. Last December, shares fell nearly 60% in a day after the drug once again failed a clinical trial. That was strike two. At one point in 2020, shares of Sage were down 86% from their 2019 highs. Clearly not quitters, management restructured to conserve cash, and launched three new phase 3 studies of the drug as a treatment for major depressive disorder and postpartum depression. Then, a funny thing happened on the way to the results expected next year.

In November, Biogen (NASDAQ:BIIB) injected $1.5 billion into the company to jointly develop and commercialize zuranolone. Shares sold off on the news but still sit about where they began 2020. The stock is up 160% since the beginning of April. For its money, Biogen earns 50% of profits in the U.S., and shoulders the costs in most non-U.S. markets while paying royalties to Sage from any sales. While one analyst applauded Biogen for "getting the milk without having [to] buy the whole cow," Sage shareholders are on the other end of that colorful analogy. Three years after the first failure of zuranolone, the best shareholders can now hope for is to share any windfall with Biogen.

Bluebird bio is trying to cure sickle cell disease and beta thalassemia through gene therapy. Despite the company's LentiGlobin product showing promise in clinical trials, bluebird's stock is down 50% in 2020 and more than 80% from all-time highs in 2018. The therapy candidate is essentially a stem cell transplant that takes a functioning gene, inserts it into the patient's harvested stem cells, and then reinserts those stem cells into the body.

After receiving approval in Europe, as well as both fast-track and breakthrough designation from the FDA, progress has been slow due to pricing negotiations, COVID-19 constraints, and an FDA request for the company to prove it can scale up from clinical trials. Bluebird also ran afoul of the FDA in May for the same issue, when the agency refused to review the application for ide-cel -- a CAR-T therapy for multiple myeloma being developed with Bristol Myers Squibb (NYSE:BMY). Although bluebird still plans to file for approval of LentiGlobin in mid-2021, the delays are costing the company its head start in the race for a cure. CRISPR Therapeutics (NASDAQ:CRSP) and partner Vertex Pharmaceuticals (NASDAQ:VRTX) have received advanced therapy designation for their cure using gene editing to target the same diseases. If CRISPR and Vertex get there first, shareholders might as well forget next Christmas too.


Exploiting the diphtheria toxin internalization receptor enhances delivery of proteins to lysosomes for enzyme replacement therapy – Science Advances

Abstract

Enzyme replacement therapy, in which a functional copy of an enzyme is injected either systemically or directly into the brain of affected individuals, has proven to be an effective strategy for treating certain lysosomal storage diseases. The inefficient uptake of recombinant enzymes via the mannose-6-phosphate receptor, however, prohibits the broad utility of replacement therapy. Here, to improve the efficiency and efficacy of lysosomal enzyme uptake, we exploited the strategy used by diphtheria toxin to enter into the endolysosomal network of cells by creating a chimera between the receptor-binding fragment of diphtheria toxin and the lysosomal hydrolase TPP1. We show that chimeric TPP1 binds with high affinity to target cells and is efficiently delivered into lysosomes. Further, we show superior uptake of chimeric TPP1 over TPP1 alone in brain tissue following intracerebroventricular injection in mice lacking TPP1, demonstrating the potential of this strategy for enhancing lysosomal storage disease therapy.

Lysosomal storage diseases (LSDs) are a group of more than 70 inherited childhood diseases characterized by an accumulation of cellular metabolites arising from deficiencies in a specific protein, typically a lysosomal hydrolase. Although each individual disease is considered rare, LSDs have a combined incidence of between 1/5000 and 1/8000 live births, and together, they account for a substantial proportion of the neurodegenerative diseases in children (1). The particular age of onset for a given LSD varies depending on the affected protein and the percentage of enzymatic activity still present; however, in most cases, symptoms manifest early in life and progress insidiously, affecting multiple tissues and organs (2). In all but the mildest of cases, disease progression results in severe physical disability, possible intellectual disability, and a shortened life expectancy, with death occurring in late childhood or early adolescence.

As they are monogenic diseases, reintroducing a functional form of the defective enzyme into lysosomes is in principle a viable strategy for treating LSDs. Enzyme replacement therapy (ERT) is now approved for the treatment of seven LSDs, and clinical trials are ongoing for five others (3). However, delivering curative doses of recombinant lysosomal enzymes into lysosomes remains a major challenge in practice. ERT typically takes advantage of a specific N-glycan posttranslational modification, mannose-6-phosphorylation (M6P), which controls trafficking of endogenous lysosomal enzymes, as well as exogenous uptake of lysosomal enzymes from circulation by cells having the cation-independent M6P receptor (CIMPR) (4). Hence, a combination of factors including (i) the abundance of the M6P receptor in the liver, (ii) poor levels of CIMPR expression in several key target tissue types such as bone and skeletal muscle, (iii) incomplete and unpredictable M6P labeling of recombinant enzymes, and (iv) the highly variable affinity of recombinant lysosomal enzymes for CIMPR [viz., Kds (dissociation constants) ranging from low to mid micromolar (5, 6)] all contribute to diminishing the overall effectiveness of therapies using CIMPR for cell entry (3).

To improve the delivery of therapeutic lysosomal enzymes, we drew inspiration from bacterial toxins, which, as part of their mechanism, hijack specific host cell-surface receptors to gain entry into the endolysosomal pathway. While we and others have explored exploiting this pathway to deliver cargo into the cytosol (7, 8), here we asked whether this same approach could be used to enhance the delivery of lysosomal enzymes into lysosomes. We chose the diphtheria toxin (DT)–diphtheria toxin receptor (DTR) system owing to the ubiquitous nature of the DTR, in particular its high expression levels on neurons.

Corynebacterium diphtheriae secretes DT exotoxin, which is spread to distant organs by the circulatory system, where it affects the lungs, heart, liver, kidneys, and the nervous system (9). It is estimated that 75% of individuals with acute disease also develop some form of peripheral or cranial neuropathy. This multiorgan targeting results from the fact that the DTR, heparin-binding EGF (epidermal growth factor)–like growth factor (HBEGF), is ubiquitously expressed. The extent to which DT specifically targets difficult-to-access tissues such as muscle and bone, however, is not currently known.

DT is a three-domain protein that consists of an N-terminal ADP (adenosine diphosphate)–ribosyl transferase enzyme (DTC), a central translocation domain (DTT), and a C-terminal receptor-binding domain (DTR). The latter is responsible for both binding cell surface HBEGF with high affinity [viz., Kd = 27 nM (10)] and triggering endocytosis into early endosomes (Fig. 1A). Within endosomes, DTT forms membrane-spanning pores that serve as conduits for DTC to enter the cytosol where it inactivates the host protein synthesis machinery. The remaining portions of the toxin remain in the endosomes and continue to lysosomes where they are degraded (11, 12). We hypothesized that the receptor-binding domain, lacking any means to escape endosomes, would proceed with any attached cargo to lysosomes and, thus, serve as a means to deliver cargo specifically into lysosomes following high-affinity binding to HBEGF.

(A) DT intoxication pathway (left), DT domain architecture, and LTM structure (right). (B and C) DTK51E/E148K, LTM, mCherry-LTM, and LTM-mCherry compete with wild-type DT for binding and inhibit its activity in a dose-dependent manner with IC50 (median inhibitory concentration) values of 46.9, 10.1, 52.7, and 76.1 nM, respectively (means ± SD; n = 3). (D and E) C-terminal and N-terminal fusions of LTM to mCherry were immunostained (red) and observed to colocalize with the lysosomal marker LAMP1 (39). (F) Fractional co-occurrence of the red channel with the green channel (Manders coefficient M2) was calculated for mCherry-LTM and LTM-mCherry and was found to be 0.61 ± 0.10 and 0.52 ± 0.11, respectively (means ± SD; n = 6).

In this study, we generated a series of chimeric proteins containing the DTR-binding domain, DTR, with the goal of demonstrating the feasibility of delivering therapeutic enzymes into lysosomes through the DT-HBEGF internalization pathway. We showed that DTR serves as a highly effective and versatile lysosome-targeting moiety (LTM). It can be placed at either the N or C terminus of the cargo, where it retains its high-affinity binding to HBEGF and the ability to promote trafficking into lysosomes both in vitro and in vivo. On the basis of its advantages over M6P-mediated mechanisms, we further investigated the utility of LTM for the lysosomal delivery of human tripeptidyl peptidase-1 (TPP1) with the long-term goal of treating Batten disease.

To evaluate whether the DTR-binding fragment could function autonomously to traffic cargo into lysosomes, we first asked whether the isolated 17-kDa DTR fragment could be expressed independently from DT holotoxin and retain its affinity for HBEGF. We cloned, expressed, and purified the receptor-binding fragment and evaluated its ability to compete with full-length DT for the DTR, HBEGF. Before treating cells with a fixed dose of wild-type DT that completely inhibits protein synthesis, cells were incubated with a range of concentrations of LTM or a full-length, nontoxic mutant of DT (DTK51E/E148K). LTM-mediated inhibition of wild-type DT-mediated toxicity was equivalent to nontoxic DT (Fig. 1B), demonstrating that the receptor-binding fragment can be isolated from the holotoxin without affecting its ability to fold and bind cell surface HBEGF. Next, we evaluated whether LTM had a positional bias (i.e., was able to bind HBEGF with a fusion partner when positioned at either terminus). To this end, we generated N- and C-terminal fusions of LTM to the model fluorescent protein mCherry (i.e., mCherry-LTM and LTM-mCherry). To determine binding of each chimera to HBEGF, we quantified the ability of each chimera to compete with wild-type DT on cells in the intoxication assay. Both constructs competed with wild-type DT to the same extent as LTM alone and DTK51E/E148K (Fig. 1C), demonstrating that LTM is versatile and autonomously folds in different contexts.

To evaluate intracellular trafficking, HeLa cells were treated with either LTM-mCherry or mCherry-LTM and then fixed and stained 4 hours later with an antibody against the lysosomal marker LAMP1. In both cases, we observed significant uptake of the fusion protein (Fig. 1, D and E). We calculated Manders coefficients (M2) to quantify the extent to which signal in the red channel (LTM-mCherry and mCherry-LTM) was localizing with signal in the green channel (LAMP1). The fraction of red/green co-occurrence was calculated to be 0.61 for mCherry-LTM and 0.52 for LTM-mCherry, indicating trafficking to the lysosomal compartments of the cells and no significant difference (P = 0.196) between the two orientations of chimera (Fig. 1F). Together, these results confirm that the LTM is capable of binding HBEGF and trafficking associated cargo into cells and that the LTM can function in this manner at either terminus of a fusion construct.
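As an illustration of the colocalization metric, the following Python sketch computes a Manders-style co-occurrence fraction: the share of above-threshold red intensity that lies in pixels where the green channel is also above threshold. It is a minimal sketch under stated assumptions and not the Volocity implementation used in this study; the function name `manders_fraction`, the flat-list image representation, and the example pixel values are hypothetical.

```python
def manders_fraction(signal, other, signal_thresh=0.0, other_thresh=0.0):
    """Fraction of `signal` intensity that co-occurs with `other`.

    `signal` and `other` are equal-length sequences of pixel intensities
    (for example, flattened red and green channels of the same image).
    Only pixels above `signal_thresh` contribute to the total; a pixel
    counts as co-occurring when `other` is above `other_thresh` there.
    """
    if len(signal) != len(other):
        raise ValueError("channels must have the same number of pixels")

    total = 0.0       # summed above-threshold signal intensity
    colocalized = 0.0  # portion of that intensity overlapping `other`
    for s, o in zip(signal, other):
        if s > signal_thresh:
            total += s
            if o > other_thresh:
                colocalized += s

    if total == 0.0:
        raise ValueError("no signal above threshold")
    return colocalized / total


# Hypothetical five-pixel example (not data from this study).
red = [10, 20, 0, 30, 40]
green = [5, 0, 8, 0, 9]
fraction = manders_fraction(red, green)
```

A value near 1 would mean that almost all of the red signal sits in green-positive pixels; values such as 0.5 to 0.6 indicate partial overlap. Which channel a given software package labels as M1 or M2 is a convention and should be checked against that package's documentation.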

With minimal positional bias observed in the mCherry fusion proteins, we next screened LTM fusions to TPP1 to identify a design that maximizes expression, stability, activity, and, ultimately, delivery. TPP1 is a 60-kDa lysosomal serine peptidase encoded by the CLN2 gene, implicated in neuronal ceroid lipofuscinosis type 2 or Batten disease. Loss of function results in the accumulation of lipofuscin, a proteinaceous, autofluorescent storage material (13). Exposure to the low-pH environment of the lysosome triggers autoproteolytic activation of TPP1 and release of a 20-kDa propeptide that occludes its active site. From a design perspective, we favored an orientation in which the LTM was N terminal to TPP1, as autoprocessing of TPP1 would result in the release of the upstream LTM-TPP1 propeptide, liberating active, mature TPP1 enzyme in the lysosome (Fig. 2A). Given the need for mammalian expression of lysosomal enzymes, we generated synthetic genetic fusions of the LTM to TPP1, in which we converted the codons from bacterially derived DT into the corresponding mammalian codons. Human embryonic kidney (HEK) 293F suspension cells stably expressing recombinant TPP1 (rTPP1) and TPP1 with an N-terminal LTM fusion (LTM-TPP1) were generated using the piggyBac transposon system (14). A C-terminal construct (TPP1-LTM) was also produced; however, expression of this chimera was poor in comparison with rTPP1 and LTM-TPP1 (~0.4 mg/liter, cf. 10 to 15 mg/liter).

(A) Design of LTM-TPP1 fusion protein and delivery schematic. (B) Enzyme kinetics of rTPP1 and LTM-TPP1 against the synthetic substrate AAF-AMC are indistinguishable. Michaelis-Menten plots were generated by varying [AAF-AMC] at a constant concentration of 10 nM enzyme (means ± SD; n = 3). Plots and kinetic parameters were calculated with GraphPad Prism 7.04. (C) Maturation of TPP1 is unaffected by the N-terminal fusion of LTM. (D) LTM-TPP1 inhibits wild-type DT activity in a dose-dependent manner (IC50 of 17.2 nM), while rTPP1 has no effect on protein synthesis inhibition by DT (means ± SD; n = 3). (E) LTM and LTM-TPP1 bind HBEGF with apparent Kds of 13.3 and 19.1 nM, respectively. (F) LTM-TPP1 (39) colocalizes with LAMP1 staining (red).

The activity of rTPP1 and LTM-TPP1 against the tripeptide substrate Ala-Ala-Phe-AMC (AAF-AMC) was assessed to determine any effects of the LTM on TPP1 activity. The enzyme activities of rTPP1 and LTM-TPP1 were determined to be equivalent, as evidenced through measurements of their catalytic efficiency (Fig. 2B), demonstrating that LTM does not interfere with the peptidase activity of TPP1. Maturation of LTM-TPP1 through autocatalytic cleavage of the N-terminal propeptide was analyzed by SDS–polyacrylamide gel electrophoresis (PAGE) (Fig. 2C). Complete processing of the zymogen at pH 3.5 and 37°C occurred between 5 and 10 min, which is consistent with what has been observed for the native recombinant enzyme (15).
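For readers who want to see how kinetic parameters of this kind can be extracted, the sketch below recovers Vmax and Km from rate measurements with a Hanes-Woolf linearization ([S]/v plotted against [S]). This is a swapped-in, simplified technique chosen so the example needs no fitting library; the parameters in this study were obtained with GraphPad Prism. The function names and all numerical values are hypothetical placeholders, not measurements from the paper.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate v at substrate concentration s."""
    return vmax * s / (km + s)


def hanes_woolf_fit(substrate, rates):
    """Estimate (vmax, km) by least squares on the Hanes-Woolf line.

    The Michaelis-Menten equation rearranges to
        [S]/v = [S]/Vmax + Km/Vmax,
    so a straight line through ([S], [S]/v) has slope 1/Vmax and
    intercept Km/Vmax.
    """
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


def catalytic_efficiency(vmax, km, enzyme_conc):
    """kcat/Km, with kcat = Vmax / [E]; units follow the inputs."""
    return (vmax / enzyme_conc) / km


# Hypothetical, noise-free data generated from assumed parameters.
substrate_uM = [5, 10, 25, 50, 100, 200]
rates = [michaelis_menten(s, vmax=2.0, km=40.0) for s in substrate_uM]
fitted_vmax, fitted_km = hanes_woolf_fit(substrate_uM, rates)
```

Two enzymes with matching kcat/Km values under the same conditions, as reported here for rTPP1 and LTM-TPP1, would be considered kinetically equivalent on this measure. With real, noisy data, direct nonlinear regression is generally preferred over linearized forms.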

The ability of LTM-TPP1 to compete with DT for binding to extracellular HBEGF was first assessed with the protein synthesis competition assay. Similar to LTM, mCherry-LTM, and LTM-mCherry, LTM-TPP1 prevents protein synthesis inhibition by 10 pM DT with an IC50 (median inhibitory concentration) of 17.2 nM (Fig. 2D). As expected, rTPP1 alone was unable to inhibit DT-mediated entry and cytotoxicity. To further characterize this interaction, we measured the interaction between LTM and LTM-TPP1 and recombinant HBEGF using surface plasmon resonance (SPR) binding analysis (Fig. 2E). By SPR, LTM and LTM-TPP1 were calculated to have apparent Kds of 13.3 and 19.1 nM, respectively, values closely corresponding to the IC50 values obtained from the competition experiments (10.1 and 17.2 nM, respectively). Consistent with these results, LTM-TPP1 colocalizes with LAMP1 by immunofluorescence (Fig. 2F).

To study uptake of chimeric fusion proteins in cell culture, we generated a cell line deficient in TPP1 activity. A CRISPR RNA (crRNA) was designed to target the signal peptide region of TPP1 in exon 2 of CLN2. Human HeLa Kyoto cells were reverse transfected with a Cas9 ribonucleoprotein complex and then seeded at low density into a 10-cm dish. Single cells were expanded to colonies, which were picked and screened for TPP1 activity. A single clone deficient in TPP1 activity was isolated and expanded; it was determined to have ~4% TPP1 activity relative to wild-type HeLa Kyoto cells plated at the same density (Fig. 3A). The small residual activity observed is likely the result of another cellular enzyme processing the AAF-AMC (7-amido-4-methylcoumarin) substrate used in this assay, as there is no apparent TPP1 protein being produced (Fig. 3B). Sanger sequencing of the individual alleles confirmed complete disruption of the CLN2 gene (fig. S1). In total, three unique mutations were identified within exon 2 of CLN2: a single base insertion resulting in a frameshift mutation and two deletions of 24 and 33 base pairs (bp), respectively.

(A) CLN2 knockout cells exhibit ~4% TPP1 activity relative to wild-type HeLa Kyoto cells (means ± SD; n = 3). (B) Western blotting against TPP1 reveals no detectable protein in the knockout cells. (C) (Left) In vitro maturation of pro-rTPP1 and LTM-TPP1 (16 ng) was analyzed by Western blot. (Right) TPP1 present in wild-type (WT) and TPP1−/− cells, and TPP1−/− cells treated with 100 nM rTPP1 and LTM-TPP1. (D) Uptake of rTPP1 and LTM-TPP1 into HeLa Kyoto TPP1−/− cells was monitored by TPP1 activity (means ± SD; n = 4). (E) TPP1 activity present in HeLa Kyoto TPP1−/− cells following a single treatment with 50 nM LTM-TPP1 (means ± SD; n = 3).

Next, we compared the delivery and activation of rTPP1 and LTM-TPP1 into lysosomes by treating TPP1−/− cells with a fixed concentration of the enzymes (100 nM) and by analyzing entry and processing by Western blot (Fig. 3C). In both cases, most enzymes were present in the mature form, indicating successful delivery to the lysosome; however, the uptake of LTM-TPP1 greatly exceeded the uptake of rTPP1. As both rTPP1 and LTM-TPP1 receive the same M6P posttranslational modifications promoting their uptake by CIMPR, differences in their respective uptake should be directly attributable to uptake by HBEGF. To quantify the difference in uptake and lysosomal delivery, cells were treated overnight with varying amounts of each enzyme, washed, lysed, and assayed for TPP1 activity. The activity assays were performed without a preactivation step, so signal represents protein that has been activated in the lysosome. For both constructs, we observed a dose-dependent increase in delivery of TPP1 to the lysosome (Fig. 3D). Delivery of LTM-TPP1 was significantly enhanced compared with TPP1 alone at all doses, further demonstrating that uptake by HBEGF is more efficient than that by CIMPR alone. TPP1 activity in cells treated with LTM-TPP1 was consistently ~10× greater than that of cells treated with rTPP1, with the relative difference increasing at the highest concentrations tested. This may speak to differences in abundance, replenishment, and/or recycling of HBEGF versus CIMPR, in addition to differences in receptor-ligand affinity. Uptake of LTM-TPP1 and rTPP1 into several other cell types yielded similar results (fig. S2). To assess the lifetime of the delivered enzyme, cells were treated with LTM-TPP1 (50 nM) and incubated overnight. Cells were washed and incubated with fresh media, and TPP1 activity was assayed over the course of several days. Cells treated with LTM-TPP1 still retained measurable TPP1 activity at 1 week after treatment (Fig. 3E).

While the DT competition experiment demonstrated that HBEGF is involved in the uptake of LTM-TPP1 but not rTPP1 (Fig. 2D), it does not account for the contribution of CIMPR to uptake. Endoglycosidase H (EndoH) cleaves between the core N-acetylglucosamine residues of high-mannose N-linked glycans, leaving behind only the asparagine-linked N-acetylglucosamine moiety. Both rTPP1 and LTM-TPP1 were treated with EndoH to remove any M6P moieties, and delivery into HeLa TPP1−/− cells was subsequently assessed. While rTPP1 uptake is completely abrogated by treatment with EndoH, LTM-TPP1 uptake is only partially decreased (Fig. 4), indicating that while HBEGF-mediated endocytosis is the principal means by which LTM-TPP1 is taken up into cells, uptake via CIMPR still occurs. The fact that CIMPR uptake is still possible in the LTM-TPP1 fusion means that the fusion is targeted to two receptors simultaneously, increasing its total uptake and, potentially, its biodistribution.

Uptake of LTM-TPP1 via the combination of HBEGF and CIMPR was shown to be 3 to 20× more efficient than CIMPR alone in cellulo (fig. S2). To interrogate this effect in vivo, TPP1-deficient mice (TPP1tm1pLob or TPP1−/−) were obtained as a gift from P. Lobel at Rutgers University. Targeted disruption of the CLN2 gene was achieved by insertion of a neo cassette into intron 11 in combination with a point mutation (R446H), rendering these mice TPP1 null by both Western blot and enzyme activity assay (16). Prior studies have demonstrated that direct administration of rTPP1 into the cerebrospinal fluid (CSF) via intracerebroventricular or intrathecal injection results in amelioration of disease phenotype (17) and even extension of life span in the disease mouse (18). To compare the uptake of LTM-TPP1 and rTPP1 in vivo, the enzymes were injected into the left ventricle of 6-week-old TPP1−/− mice. Mice were euthanized 24 hours after injection, and brain homogenates of wild-type littermates, untreated, and treated mice were assayed for TPP1 activity (Fig. 5A). Assays were performed without preactivation, and therefore, the results report on enzyme that has been taken up into cells, trafficked to the lysosome, and processed to the mature form.

(A) Assay schematic. (B) TPP1 activity in brain homogenates of 6-week-old mice injected with two doses (5 and 25 μg) of either rTPP1 or LTM-TPP1 (5 μg, P = 0.01; 25 μg, P = 0.002). (C) TPP1 activity in brain homogenates following a single 25-μg dose of LTM-TPP1, 1, 7, and 14 days postinjection. Data are presented as box and whisker plots, with whiskers representing minimum and maximum values from n ≥ 4 mice per group. Statistical significance was calculated using paired t tests with GraphPad Prism 7.04.

While both enzymes resulted in a dose-dependent increase in TPP1 activity, low (5 μg) and high (25 μg) doses of rTPP1 resulted in only modest increases of activity, representing ~6 and ~26% of the wild-type levels of activity, respectively (Fig. 5B). At the same doses, LTM-TPP1 restored ~31 and ~103% of the wild-type activity. To assess the lifetime of enzyme in the brain, mice were injected intracerebroventricularly with 25 μg of LTM-TPP1 and euthanized either 1 or 2 weeks postinjection. Remarkably, at 1 week postinjection, ~68% of TPP1 activity was retained (compared with 1 day postinjection), and after 2 weeks, activity was reduced to ~31% (Fig. 5C).

ERT is a lifesaving therapy that is a principal method of treatment in non-neurological LSDs. Uptake of M6P-labeled enzymes by CIMPR is relatively ineffective due to variable receptor affinity (5, 6), heterogeneous expression of the receptor, and incomplete labeling of recombinantly produced enzymes (19). Despite its inefficiencies and high cost (~200,000 USD per patient per year) (20), it remains the standard of care for several LSDs, as alternative treatment modalities (substrate reduction therapy, gene therapy, and hematopoietic stem cell transplantation) are not effective, not as well developed, or inherently riskier (21–25). Improving the efficiency and distribution of recombinant enzyme uptake may help address some of the current shortcomings in traditional ERT.

Several strategies have been used to increase the extent of M6P labeling on recombinantly produced lysosomal enzymes: engineering mammalian and yeast cell lines to produce more specific/uniform N-glycan modification (19, 26, 27), chemical or enzymatic modification of N-glycans posttranslationally (28), and covalent coupling of M6P (29). M6P-independent uptake of a lysosomal hydrolase by CIMPR has been demonstrated for both β-glucuronidase (28) and acid α-glucosidase (30, 31). In the latter work, a peptide tag (GILT) targeting insulin-like growth factor II receptor (IGF2R) was fused to recombinant alpha glucosidase, which enabled receptor-mediated entry into cells. CIMPR is a ~300-kDa, 15-domain membrane protein with 3 M6P-binding domains and 1 IGF2R domain. By targeting the IGF2R domain with a high-affinity (low nanomolar) peptide rather than the low-affinity M6P-binding domain, the authors were able to demonstrate a >20-fold increase in the uptake of a GAA-peptide fusion protein in cell culture and a ~5-fold increase in the ability to clear built-up muscle glycogen in GAA-deficient mice.

In this study, we have demonstrated efficient uptake and lysosomal trafficking of a model lysosomal enzyme, TPP1, via a CIMPR-independent route, using the receptor-binding domain of a bacterial toxin. HBEGF is a member of the EGF family of growth factors, and DT is its only known ligand. Notably, it plays roles in cardiac development, wound healing, muscle contraction, and neurogenesis; however, it does not act as a receptor in any of these physiological processes (32). Intracellular intoxication by DT is the only known process in which HBEGF acts as a receptor, making it an excellent candidate receptor for ERT, as there is no natural ligand with which to compete. Upon binding, DT is internalized via clathrin-mediated endocytosis and then trafficked toward lysosomes for degradation (33, 34). Acidification of endosomal vesicles by vacuolar ATPases (adenosine triphosphatases) promotes insertion of DTT into the endosomal membrane and subsequent translocation of the catalytic DTC domain into the cytosol. In the absence of an escape mechanism, the majority of internalized LTM should be trafficked to the lysosome, as we have demonstrated with our chimera (Figs. 2F and 3C). Uptake of LTM-TPP1 in vitro is robust relative to rTPP1 (Fig. 3D and fig. S2), and TPP1 activity is sustained in the lysosome for a substantial length of time (Fig. 3E). We have also demonstrated that the increase in uptake efficiency that we observed in cell culture persists in vivo. TPP1 activity in the brains of CLN2-null mice was significantly greater in animals treated with intracerebroventricularly injected LTM-TPP1, as compared with those treated with TPP1 at two different doses (Fig. 5B), and, remarkably, this activity persists with an apparent half-life of ~8 days (Fig. 5C).
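To make the relationship between retained activity and an apparent half-life concrete, the sketch below converts a fraction of activity remaining after a given interval into a half-life. It assumes simple first-order (single-exponential) decay, which the text does not state, and it takes the elapsed times as 6 and 13 days because retention was reported relative to the 1-day time point; both choices are assumptions for illustration, and the function name `apparent_half_life` is hypothetical.

```python
import math


def apparent_half_life(fraction_remaining, elapsed_days):
    """Half-life implied by first-order decay.

    If A(t) = A0 * exp(-k * t), then k = -ln(fraction) / t and the
    half-life is ln(2) / k.
    """
    if not 0.0 < fraction_remaining < 1.0:
        raise ValueError("fraction_remaining must be between 0 and 1")
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    decay_rate = -math.log(fraction_remaining) / elapsed_days
    return math.log(2) / decay_rate


# Retention values reported in the text, measured against day 1.
half_life_from_week_1 = apparent_half_life(0.68, elapsed_days=6)
half_life_from_week_2 = apparent_half_life(0.31, elapsed_days=13)
```

Under these assumptions the two time points imply half-lives of roughly 11 and 8 days, respectively, which brackets the figure quoted above; the spread suggests that a single exponential is only an approximation of the in vivo decay.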

An important consideration in advancing the LTM platform toward clinical use is the potential immunogenicity of using a bacterial fragment in this context. Previously, we demonstrated that the receptor-binding fragment of DT could be replaced with a human scFv (single-chain fragment variable) targeting HBEGF (8). With our demonstration of the potential for targeting HBEGF for LSDs, future efforts will focus on increasing the affinity and specificity of these first-generation humanized LTMs to develop high-affinity chimeras with greatly reduced immunogenicity for further development.

While the ability of LTM-TPP1 to affect disease progression has yet to be determined, recent positive clinical trial results (35) and the subsequent approval of rTPP1 (cerliponase alfa) for treatment of neuronal ceroid lipofuscinosis 2 (NCL2) provide support for this approach. In that clinical trial, 300 mg of rTPP1 was administered by biweekly intracerebroventricular injection to 24 affected children, and this was able to prevent disease progression. While this dose is of the same order of magnitude as other approved ERTs (<1 to 40 mg/kg) (36, 37), it represents a substantial dose, especially considering that it was delivered to a single organ. Improving the efficiency of uptake by targeting an additional receptor, as we have done here, is expected to greatly decrease the dose required to improve symptoms, while at the same time decreasing costs and the chances of dose-dependent side effects.

DTK51E/E148K, LTM, LTM-mCherry, mCherry-LTM, and HBEGF constructs were cloned using the In-Fusion HD cloning kit (Clontech) into the Champion pET SUMO expression system (Invitrogen). Recombinant proteins were expressed as 6×His-SUMO fusion proteins in Escherichia coli BL21(DE3)pLysS cells. Cultures were grown at 37°C until an OD600 (optical density at 600 nm) of 0.5 and then induced with 1 mM IPTG (isopropyl-β-d-thiogalactopyranoside) for 4 hours at 25°C. Cell pellets harvested by centrifugation were resuspended in lysis buffer [20 mM tris (pH 8.0), 160 mM NaCl, 10 mM imidazole, lysozyme, benzonase, and protease inhibitor cocktail] and lysed by three passages through an EmulsiFlex C3 microfluidizer (Avestin). Following clarification by centrifugation at 18,000g for 20 min and syringe filtration (0.2 μm), soluble lysate was loaded over a 5-ml His-trap FF column (GE Healthcare) using an AKTA FPLC. Bound protein was washed and eluted over an imidazole gradient (20 to 150 mM). Fractions were assessed for purity by SDS-PAGE, pooled, concentrated, and frozen on dry ice in 25% glycerol for storage at −80°C.

TPP1 cDNA was obtained from the SPARC BioCentre (The Hospital for Sick Children) and cloned into the piggyBac plasmid pB-T-PAF (J.M.R., University of Toronto) using Not I and Asc I restriction sites to generate two expression constructs (pB-T-PAF-ProteinA-TEV-LTM-TPP1 and pB-T-PAF-ProteinA-TEV-TPP1). Stably transformed expression cell lines (HEK293F) were then generated using the piggyBac transposon system, as described (14). Protein expression was induced with doxycycline, and secreted fusion protein was separated from expression media using immunoglobulin G (IgG) Sepharose 6 fast flow resin (GE Healthcare) in a 10-ml Poly-Prep chromatography column (Bio-Rad). Resin was washed with 50 column volumes of wash buffer [10 mM tris (pH 7.5) and 150 mM NaCl] and then incubated overnight at 4°C with TEV (Tobacco Etch Virus) protease to release the recombinant enzyme from the Protein A tag. Purified protein was then concentrated and frozen on dry ice in 50% glycerol for storage at −80°C.

Cellular intoxication by DT was measured using a nanoluciferase reporter strain of Vero cells (Vero NlucP), as described previously (8). Briefly, Vero NlucP cells were treated with a fixed dose of DT at EC99 (10 pM) and a serial dilution of LTM, LTM-mCherry, mCherry-LTM, DTK51E/E148K, LTM-TPP1, or rTPP1 and incubated overnight (17 hours) at 37°C. Cell media was then replaced with a 1:1 mixture of fresh media and Nano-Glo luciferase reagent (Promega), and luminescence was measured using a SpectraMax M5e (Molecular Devices). Results were analyzed with GraphPad Prism 7.04.
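Competition data of this kind are commonly summarized with a four-parameter logistic curve, in which the luminescence signal recovers from a floor (DT alone) to a ceiling (full protection) as competitor concentration rises. The sketch below shows that curve and a crude log-linear interpolation for the half-maximal concentration. It is an illustrative assumption about the analysis rather than a description of the Prism settings used here; the function names, the default Hill slope of 1, and the example values are hypothetical.

```python
import math


def four_pl(conc, bottom, top, ic50, hill=1.0):
    """Signal at a given competitor concentration (rising curve).

    At conc == ic50 the signal is midway between bottom and top.
    """
    return bottom + (top - bottom) / (1.0 + (ic50 / conc) ** hill)


def interpolate_ic50(concs, signals):
    """Estimate the half-maximal concentration from a dilution series.

    Assumes `concs` is ascending and `signals` rises monotonically.
    Interpolates linearly in log10(concentration) between the two
    points that bracket the half-maximal signal.
    """
    half = (min(signals) + max(signals)) / 2.0
    for i in range(len(concs) - 1):
        s0, s1 = signals[i], signals[i + 1]
        if s0 <= half <= s1 and s1 > s0:
            frac = (half - s0) / (s1 - s0)
            log_c0 = math.log10(concs[i])
            log_c1 = math.log10(concs[i + 1])
            return 10 ** (log_c0 + frac * (log_c1 - log_c0))
    raise ValueError("half-maximal signal is not bracketed by the data")


# Hypothetical tenfold dilution series (nM) with an assumed IC50 of 10 nM.
dilutions_nM = [0.01, 0.1, 1, 10, 100, 1000, 10000]
signals = [four_pl(c, bottom=0.0, top=100.0, ic50=10.0) for c in dilutions_nM]
estimated_ic50 = interpolate_ic50(dilutions_nM, signals)
```

Interpolation is only a rough estimate and depends on how densely the series samples the transition; a nonlinear least-squares fit of all four parameters is the more usual choice for reported values.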

SPR analysis was performed on a Biacore X100 system (GE Healthcare) using a CM5 sensor chip. Recombinant HBEGF was immobilized to the chip using standard amine coupling at a concentration of 25 μg/ml in 10 mM sodium acetate (pH 6.0) with a final response of 1000 to 2500 resonance units (RU). LTM and LTM-TPP1 were diluted in running buffer [200 mM NaCl, 0.02% Tween 20, and 20 mM tris (pH 7.5)] at concentrations of 6.25 to 100 nM and injected in the multicycle analysis mode with a contact time of 180 s and a dissociation time of 600 s. The chip was regenerated between cycles with 10 mM glycine (pH 1.8). Experiments were performed in duplicate using two different chips. Binding data were analyzed with Biacore X100 Evaluation Software version 2.0.2, with apparent dissociation constants calculated using the 1:1 steady-state affinity model.
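In its basic form, a 1:1 steady-state affinity model relates the equilibrium response to analyte concentration through Req = Rmax × C / (Kd + C). The sketch below evaluates that expression across a twofold dilution series spanning the stated 6.25 to 100 nM range, using the apparent Kd values reported in this paper. It is a simplified illustration that omits any bulk refractive index offset a given software package may include; the function name, the intermediate concentrations, and the normalized Rmax of 1.0 are assumptions.

```python
def steady_state_response(conc_nM, kd_nM, rmax=1.0):
    """Equilibrium response for 1:1 binding: Rmax * C / (Kd + C)."""
    if conc_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return rmax * conc_nM / (kd_nM + conc_nM)


# Assumed twofold series over the stated concentration range (nM).
series_nM = [6.25, 12.5, 25.0, 50.0, 100.0]

# Apparent Kd values reported in the text (nM).
apparent_kd_nM = {"LTM": 13.3, "LTM-TPP1": 19.1}

# Fractional occupancy of immobilized HBEGF at each concentration.
occupancy = {
    name: [steady_state_response(c, kd) for c in series_nM]
    for name, kd in apparent_kd_nM.items()
}
```

With Rmax normalized to 1, each value is the fraction of receptor occupied at equilibrium; by construction, occupancy is one-half when the analyte concentration equals Kd, so a series that spans the Kd on both sides, as this one does, constrains the fit well.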

HeLa cells were incubated with LTM-mCherry (0.5 μM), mCherry-LTM (0.5 μM), or LTM-TPP1 (2 μM) for 2 hours. Cells were washed with ice-cold phosphate-buffered saline (PBS), fixed with 4% paraformaldehyde, and permeabilized with 0.5% Triton X-100. mCherry constructs were visualized with a rabbit polyclonal antibody against mCherry (Abcam, ab16745) and anti-rabbit Alexa Fluor 568 (Thermo Fisher Scientific). LAMP1 was stained with a mouse primary antibody (DSHB 1D4B) and anti-mouse Alexa Fluor 488 (Thermo Fisher Scientific).

Colocalization was quantified using the Volocity (PerkinElmer) software package to measure Manders coefficients of mCherry signal with LAMP1 signal. The minimal threshold for the 488- and 568-nm channels was adjusted to correct the background signal. The same threshold for both channels was used for all the cells examined.

CLN2−/− fibroblast 19494 cells were incubated with LTM-TPP1 (2 μM) for 2 hours. Cells were washed with ice-cold PBS, fixed with 4% paraformaldehyde, and permeabilized with 0.5% Triton X-100. LTM-TPP1 was visualized with a mouse monoclonal against TPP1 (Abcam, ab54685) and anti-mouse Alexa Fluor 488 (Thermo Fisher Scientific). LAMP1 was stained with rabbit anti-LAMP1 and anti-rabbit Alexa Fluor 568 (Thermo Fisher Scientific).

TPP1 protease activity was measured with the synthetic substrate AAF-AMC using a protocol adapted from Vines and Warburton (38). Briefly, enzyme was preactivated in 25 μl of activation buffer [50 mM NaOAc (pH 3.5) and 100 mM NaCl] for 1 hour at 37°C. Assay buffer [50 mM NaOAc (pH 5.0) and 100 mM NaCl] and substrate (200 μM AAF-AMC) were then added to a final volume of 100 μl. Fluorescence (380 nm excitation/460 nm emission) arising from the release of AMC was monitored in real time using a SpectraMax M5e (Molecular Devices). TPP1 activity in cellulo was measured similarly, without the activation step. Cells in a 96-well plate were incubated with 25 μl of 0.5% Triton X-100 in PBS, which was then transferred to a black 96-well plate containing 75 μl of assay buffer with substrate in each well.

crRNA targeting the signal peptide sequence in exon 2 of CLN2 was designed using the Integrated DNA Technologies (www.idtdna.com) design tool. The gRNA:Cas9 ribonucleoprotein complex was assembled according to the manufacturer's protocol (Integrated DNA Technologies) and reverse transfected using Lipofectamine RNAiMAX (Thermo Fisher Scientific) into HeLa Kyoto cells (40,000 cells in a 96-well plate). Following 48 hours of incubation, 5000 cells were seeded into a 10-cm dish. Clonal colonies were picked after 14 days and transferred to a 96-well plate. Clones were screened for successful CLN2 knockout by assaying TPP1 activity and confirmed by Sanger sequencing and Western blot with an antibody against TPP1 (Abcam, ab54385).

The pro-form of TPP1 was matured in vitro to the active form in 50 mM NaOAc (pH 3.5) and 100 mM NaCl for 1 to 30 min at 37°C. The autoactivation reaction was halted by the addition of 2× Laemmli SDS sample buffer containing 10% 2-mercaptoethanol, followed by boiling for 5 min. Pro and mature TPP1 were separated by SDS-PAGE and imaged on a ChemiDoc gel imaging system (Bio-Rad).

Proteins or cellular lysate were separated by 4 to 20% gradient SDS-PAGE before being transferred to a nitrocellulose membrane using the iBlot (Invitrogen) dry transfer system. Membranes were then blocked for 1 hour with a 5% milk–tris-buffered saline (TBS) solution and incubated overnight at room temperature with a 1:100 dilution of mouse monoclonal antibody against TPP1 (Abcam, ab54685) in 5% milk-TBS. Membranes were washed 3 × 5 min with 0.1% Tween 20 (Sigma-Aldrich) in TBS before a 1-hour incubation with a 1:5000 dilution of sheep anti-mouse IgG horseradish peroxidase secondary antibody (GE Healthcare) in 5% milk-TBS. Chemiluminescent signal was developed with Clarity Western ECL substrate (Bio-Rad) and visualized on a ChemiDoc gel imaging system (Bio-Rad).

rTPP1 and LTM-TPP1 were treated with EndoH (New England Biolabs) to remove N-glycan modifications. Enzymes were incubated at 1 mg/ml with 2500 U of EndoH for 48 hours at room temperature in 20 mM tris (pH 8.0) and 150 mM NaCl in a total reaction volume of 20 μl. Cleavage of N-glycans was assessed by SDS-PAGE, and concentrations were normalized to native enzyme-specific activities.

Cryopreserved TPP1+/− embryos were obtained from P. Lobel at Rutgers University and rederived in a C57/BL6 background at The Centre for Phenogenomics in Toronto. Animal maintenance and all procedures were approved by The Centre for Phenogenomics Animal Care Committee and are in compliance with the CCAC (Canadian Council on Animal Care) guidelines and the OMAFRA (Ontario Ministry of Agriculture, Food, and Rural Affairs) Animals for Research Act.

TPP1−/− mice (60 days old) were anesthetized with isoflurane (inhaled) and injected subcutaneously with sterile saline (1 ml) and meloxicam (2 mg/kg). Mice were secured to a stereotactic system, a small area of the head was shaved, and a single incision was made to expose the skull. A high-speed burr was used to drill a hole at stereotaxic coordinates: anteroposterior (A/P), 1.0 mm; mediolateral (M/L), 0.3 mm; and dorsoventral (D/V), 3.0 mm relative to the bregma, and a 33-gauge needle attached to a 10-μl Hamilton syringe was used to perform the intracerebroventricular injection into the left ventricle. Animals received either 1 or 5 μl of enzyme (5 μg/μl), injected at a constant rate. Isoflurane-anesthetized animals were euthanized by transcardial perfusion with PBS. Brains were harvested and frozen immediately, then thawed and homogenized in lysis buffer [500 mM NaCl, 0.5% Triton X-100, 0.1% SDS, and 50 mM Tris (pH 8.0)] using 5-mm stainless steel beads in TissueLyser II (Qiagen). In vitro TPP1 assay was performed, as described, minus the activation step.

Acknowledgments: We thank P. Lobel at Rutgers University for providing the TPP1-deficient mice. Funding: We are grateful to the Canadian Institutes of Health Research for funding. Author contributions: S.N.S.-M. devised and performed experiments and drafted the initial manuscript. G.L.B. provided materials and assisted in conceptualization and experimental design. X.Z., D.Z., and R.H. contributed to the experimental design and performed experiments. P.K.K. and B.A.M. contributed to the experimental design. J.M.R. contributed to the experimental design and revised the manuscript. R.A.M. assisted in conceptualization, contributed to the experimental design, and assisted in writing the manuscript. Competing interests: B.A.M. is a chief medical advisor at Taysha Gene Therapies. The authors declare that they have no other competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.

Read more:
Exploiting the diphtheria toxin internalization receptor enhances delivery of proteins to lysosomes for enzyme replacement therapy - Science Advances

Haywards Heath woman’s bid to fund stem cell treatment to combat MS – Mid Sussex Times

Joceline Colvert was diagnosed with relapsing remitting Multiple Sclerosis in her early 20s and says she spent the first eight years researching and managing her condition while trying to mention it as little as possible to others and completing her Sound Production degree.

"I spent most of my late 20s and early 30s finding ways to manage relapses, the symptoms of which have included whole body numbness, loss of the use of both hands, right eye blindness, vertigo and double vision," she said. "Thankfully these symptoms did resolve however left scarring on my nerves. This results in reduced vision in my formerly blind eye and hands that don't function very well with repetitive tasks."

"This semi-denial worked for me until about 2010 when I started to become a bit limpy which I did my best to hide. After a couple of memorable falls and fractures I decided to face up to being slightly rickety and got a hiking pole that I used occasionally in public. Since then I've needed to get used to being visibly disabled, and switch between two hiking poles for very short distances and a wheelchair everywhere else."

Joceline, who lives with her husband and her five beloved cats and dogs, says she is not eligible for Haematopoietic Stem Cell Transplantation (HSCT) on the NHS, "which is the first treatment I have ever got excited about and believe could work. It could be truly life-changing."

As a result she is trying to raise money to fund the treatment herself.

"HSCT is a procedure that aims to reset the faulty immune system which, in my case, is attacking my nervous system from within," Joceline said. "Stem cells will be taken from my bone marrow or blood before my immune system is wiped out with chemotherapy. My cells are then reintroduced into my blood, where they grow a new immune system which will hopefully no longer attack my nerves or have any memory of MS."

"The aim of HSCT is to completely halt progression, putting MS into remission with no requirement for immunosuppressant drug therapy. The success rate for relapsing remitting MS is 80% - 90% which is absolutely phenomenal compared to the limited available drug treatments, which only aim to slow down disability."

"HSCT is available on the NHS, however there is a very strict criteria for which I do not qualify. The expense of the treatment and the increased pressures on the public purse mean the NHS will only treat patients who have been diagnosed for fewer than 15 years."

"I have been diagnosed for 18 years."

"I had prepared myself for this possibility and, for the last year, have been researching treatment with The National Pirogov Medical Centre Russia (Moscow). Russia has been pioneering in their use of HSCT to treat MS and are world renowned for their expertise and care. I'm excited to have a treatment date in March 2021 which fills me with hope for a future free from progression. I need your help to get there."

Joceline, who loves making stop-motion animation puppets and props and playing musical instruments, says the treatment will cost £40,800, and the flights £800.

She has launched a Go Fund Me page at https://gf.me/u/y538k2 which has already seen donations of more than £26,000.

"I am incredibly grateful for any help you can give towards enabling me to access this life-changing treatment," she said.

"After almost two decades of managing MS flare-ups and their consequences, it's hard to put into words just what a future without them would mean to me."

"Thank you for reading this and for any help you can put towards this goal."

Read the original:
Haywards Heath woman's bid to fund stem cell treatment to combat MS - Mid Sussex Times

Family of sick girl whose stem cell donor pulled out at the last minute find a replacement – The Sun

THE campaigning family of a sick girl whose stem cell transplant match pulled out at the last minute have found a new one.

Evie Hodgson, eight, has a deadly blood disorder and was preparing for the operation when her donor cancelled without explanation.

Her family launched an appeal to find another, telling their story on ITV's This Morning, which saw 25,000 people sign up and found a second match.

Evie's mum Tina, 37, said: "This is the best Christmas present we could ever wish for."

Evie, of Whitby, North Yorks, was diagnosed with aplastic anaemia in May.

Medics were delighted when their global search for a bone marrow donor found a match, only for disaster to strike.

RAF worker Tina, married to chief executive Andy, 49, said: "We got the call about the new match while Evie was being treated for an infection."

"She said, 'Mummy, you're my hero.' It was so emotional."

Evie, who has a brother, is to have the op at the Great North Children's Hospital in Newcastle in January.

She said: "Thank you so much to everyone. You've saved my life."

Go here to see the original:
Family of sick girl whose stem cell donor pulled out at the last minute find a replacement - The Sun

Integration of intra-sample contextual error modeling for improved detection of somatic mutations from deep sequencing – Science Advances

INTRODUCTION

The process of single-nucleotide variant (SNV) accumulation is an important universal element of cancer initiation and progression. While the genetic landscape of the most common malignancies has been broadly described (1–3), accurate identification of driver mutations in specimens with low cancer DNA purity continues to be of great importance yet presents substantial challenges. Hybrid-capture-based next-generation sequencing (NGS) is one of the most common techniques being used for circulating tumor DNA profiling (4, 5), detection of therapy-resistant clones (6, 7) and preleukemia (8, 9), and monitoring disease burden during therapy (10). Nevertheless, in all of these settings, the relevant genomic alterations typically exist at low relative abundance.

Several different methods have been developed in recent years to address the barrier of identifying the minute fraction of DNA molecules harboring an alteration against the high background of NGS-associated errors. Among the various methods, state-of-the-art techniques for error suppression typically can be categorized into two groups: (i) those that incorporate unique molecular identifiers (UMIs) to suppress library amplification errors by the assembly of consensus sequences (11–13) and (ii) those that use probabilistic models to estimate background sequencing noise. The latter group can be further segregated into those that generate models that estimate error rates by the analysis of data from a single sample (i.e., single sample/tumor-only mode) (14–16), data from a single control sample (16–18), or data from multiple control samples (e.g., cohort of healthy controls) (19–21). In the case of a paired patient's tumor and matched normal sample, Bayesian statistics models are commonly used to identify tumor-specific somatic variants that are distinguishable from the background and the germline variants detected within the matched normal sample (22, 23). Some techniques rely on a ploidy assumption to calculate genotype probabilities (24), while others have adapted statistical models to analyze allele frequencies directly (16), thus allowing the identification of rare subclones in existing, complex cancer genomes. Since a single control sample cannot fully account for the stochastic nature of NGS errors, other algorithms have been developed to generate site-specific error estimations using a larger cohort of controls (19–21). This approach could be problematic as proper control samples are not always available. When control samples are completely lacking, stringent preprocessing steps can be applied to prioritize high-confidence mutations, for instance, thresholds on base quality scores, supporting read counts, and variant allele frequencies.

Despite advances enabled by the diverse approaches mentioned above, each is associated with inherent disadvantages that can lead to increased assay complexity, elevated sequencing costs, and/or suboptimal exchange between sensitivity and specificity (fig. S1, table S1, and Supplementary Note). To overcome these limitations, we characterized the contextual patterns of high-frequency errors observed during targeted hybrid-capture NGS in >1000 samples, divided across multiple technically diverse and clinically relevant human cohorts. On the basis of these patterns, we developed Espresso, a novel UMI-independent method that optimizes the suppression of artifacts from deep NGS for accurate SNV mutation calling.

To demonstrate the challenges associated with low-variant allele fraction (VAF) mutation calling from hybrid-capture-targeted NGS, we interrogated multiple benchmarking datasets that differ by their library preparation techniques, captured genomic loci, number of samples, and sequencing depths (Fig. 1A, table S2, and Materials and Methods). Briefly, these datasets include the following: (i) CB: a human cord blood dataset; (ii) CL: a cell line dilution series using genomic DNA from the acute myeloid leukemia (AML) cell line MOLM13 and the colon cancer cell line SW48; (iii and iv) pre-AML1 and pre-AML2: peripheral blood DNA from two separate cohorts, each composed of pre-AML cases (that is, blood was drawn before clinical diagnosis of AML) and age- and sex-matched controls (9); and (v) AML-MRD: a cohort composed of peripheral blood DNA samples obtained from patients with AML during the course of treatment.

(A) Raw, SSCS, and duplex average sequencing depths across all the samples included in this study. Different colors represent different datasets, and these are consistent across all of the figure panels. (B) Sample-wide error abundance in the diverse NGS cohorts. The fraction of genomic positions being observed with at least one nonreference allele supporting read in each sample is indicated. Error burden is significantly different among the investigated datasets (Mann-Whitney test: P < 1.2 × 10^-53 for the indicated comparisons). (C) Inverse correlation between the abundance of genomic positions with nonreference allele and their corresponding allele frequencies is demonstrated (Spearman's rank order correlation: r = -0.95; *P < 2.2 × 10^-16). Each dot represents a single sample. (D) Panel-wide error abundance in the diverse NGS cohorts as determined by the inclusion of positions with a minimum of one nonreference supporting read in at least one sample. NA, not applicable.

Three different target panels were used to sequence these cohorts, resulting in 83,000 to 1.2 million interrogated bases (table S2). Investigating these genomic loci revealed that the percentage of positions with nonreference alleles per sample varied widely among the different datasets and, in some cases, among samples within a particular dataset (Fig. 1B). Samples with a lower percentage of positions with nonreference alleles displayed higher average error rates (Fig. 1C). Furthermore, almost all genomic positions sequenced harbored a nonreference allele in at least one sample in each dataset (Fig. 1D). Overall, these observations reveal the magnitude of the challenge presented by potential false-positive variants produced by hybrid-capture NGS. Since such a large number of technical artifacts may mask clinically relevant variants, we conducted an unbiased exploration of multiple strategies aiming to specifically suppress NGS errors while maintaining high sensitivity in identifying real mutations.
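The two abundance measures used above (per-sample and panel-wide) can be illustrated with a minimal sketch. The function names and the toy read counts are hypothetical and are not taken from the study's data or software.

```python
from typing import Dict, List

def sample_wide_fraction(nonref_counts: List[int]) -> float:
    """Fraction of positions in one sample with at least one nonreference supporting read."""
    return sum(1 for c in nonref_counts if c >= 1) / len(nonref_counts)

def panel_wide_fraction(samples: Dict[str, List[int]]) -> float:
    """Fraction of positions with a nonreference supporting read in at least one sample."""
    n_positions = len(next(iter(samples.values())))
    hit = [any(counts[i] >= 1 for counts in samples.values()) for i in range(n_positions)]
    return sum(hit) / n_positions

# Hypothetical toy panel of five positions sequenced in three samples.
toy = {
    "sample_1": [0, 2, 0, 0, 1],
    "sample_2": [1, 0, 0, 0, 0],
    "sample_3": [0, 0, 0, 3, 0],
}
per_sample = {name: sample_wide_fraction(c) for name, c in toy.items()}
panel = panel_wide_fraction(toy)
print(per_sample, panel)
```

Even when each sample carries errors at only a minority of positions, the panel-wide fraction grows quickly as samples are pooled, which is the pattern the text describes for Fig. 1D.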

To evaluate the contextual dependencies of errors in the datasets described above, we investigated how error rates differ with respect to the substitution type, and its 5′ and 3′ one-base flanking genomic sequence. We found that error rates are highly heterogeneous across the 192 distinct trinucleotide sequence contexts (Fig. 2A, top, and fig. S2) and are highly variable between samples within the same experimental cohort (Fig. 2A, bottom). High error rates were frequently observed at C>A and C>T substitutions (Fig. 2, A and B, and fig. S2). C>T error rates were particularly high when they occurred at a CpG context (Fig. 2, A and C, and fig. S2). Initiated by spontaneous deamination of 5-methylcytosine, real mutations in this context accumulate during aging (25), are frequent in germline cells (Supplementary Note), and are also highly prevalent in cancer genomes (26), emphasizing the importance of evaluating error rates in relation to their associated genomic contexts.

(A) Nonreference average error rates at the 192 distinct trinucleotide contexts are shown using the AML-MRD dataset. Vertical lines in each box represent individual samples. Sample order is kept among distinct contexts. Arrows represent a group of samples with high error rates across multiple contexts. The bottom panels exemplify variation among contextual error rates (*Wilcoxon signed-rank test: P < 1.8 × 10^-17) and samples (Mann-Whitney test, samples with the highest and lowest error rates. C[G>T]C: P < 7.7 × 10^-41, T[A>C]C: P < 3.6 × 10^-6). (B) C>T and C>A substitutions are more frequent (Wilcoxon signed-rank test, P < 1.4 × 10^-252 for all the comparisons with the other substitution types). (C) High error rates at CpG sites (Wilcoxon signed-rank test, P < 1.1 × 10^-64 for all comparisons). (D) Error rates vary between error contexts and their reciprocals (Wilcoxon rank sum test, P < 0.05; #significance was not reached). (E) Average sequencing depths. Arrows represent a group of samples with low sequencing depths across multiple contexts. (F) Reduced sequencing depth at contexts that include reference cytosine and an increasing number of guanine (Pearson correlation: r = -0.35; P = 2.3 × 10^-264) and at contexts that include reference guanine with an increasing number of cytosine (r = -0.29; P = 8.6 × 10^-179). (G) Low sequencing depth at contexts with C>G or G>C base substitutions (Wilcoxon signed-rank test: P = 1.7 × 10^-217). (H) Inverse correlation between depth and error rates (black dashed line, log-log scaled Pearson correlation: r = -0.27; P = 9.7 × 10^-308). Correlation strengths differ among different error contexts (colored dashed lines). (I) The number of nonreference supporting reads at the 192 distinct trinucleotide contexts is shown. The sample order is identical across (A), (E), and (I).

While contextual error patterns were generally similar between their complementary counterparts, they did not always mirror each other perfectly within any particular sample (fig. S3A). Small yet statistically significant asymmetric error rates were consistently observed among the majority of error contexts in each of the cohorts (Fig. 2D and fig. S3B). For instance, we measured asymmetric error rates involving G>T/C>A, in line with prior observations (27). Error rate asymmetries were markedly directional and consistently elevated in specific contexts as compared with their matched reciprocals in all of the investigated datasets. As an example, each of the 16 trinucleotide contexts containing A>T substitutions demonstrated elevated error rates as compared with their corresponding reciprocal contexts containing T>A substitutions. Together, these results indicate that 192, rather than 96, contextual error types would need to be considered to accurately model error rates.
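To make the 192-versus-96 distinction concrete, the sketch below enumerates every substitution with one flanking base on each side and then collapses each context with its reverse-complement reciprocal. It is an illustration only; the helper names are hypothetical and the labels follow the C[G>T]C notation used in the Fig. 2 legend.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def trinucleotide_contexts():
    """All (5' base, reference, alternate, 3' base) tuples with reference != alternate."""
    return [
        (five, ref, alt, three)
        for five, ref, three in product(BASES, repeat=3)
        for alt in BASES
        if alt != ref
    ]

def reciprocal(context):
    """The same substitution as read from the opposite strand (reverse complement)."""
    five, ref, alt, three = context
    return (
        three.translate(COMPLEMENT),
        ref.translate(COMPLEMENT),
        alt.translate(COMPLEMENT),
        five.translate(COMPLEMENT),
    )

def label(context):
    """Format a context tuple in the 5'[ref>alt]3' notation."""
    five, ref, alt, three = context
    return f"{five}[{ref}>{alt}]{three}"

contexts = trinucleotide_contexts()
# Treating a context and its reciprocal as one error type halves the count.
strand_collapsed = {min(c, reciprocal(c)) for c in contexts}
print(len(contexts), len(strand_collapsed))
print(label(reciprocal(("C", "G", "T", "C"))))
```

Because the measured error rates were asymmetric between a context and its reciprocal, keeping the two apart (192 types) retains information that the collapsed 96-type view would discard.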

Next, we investigated how sequencing depth may influence error frequencies. As with error rates, sequencing depth differed between distinct contextual error types (Fig. 2E). We noticed a marked inverse correlation between sequencing depth and guanine or cytosine content within specific trinucleotide contexts, a possible reflection of the systemic under-coverage in GC-rich regions reported in NGS (Fig. 2F) (28, 29). Sequencing depth was also lower within trinucleotide contexts that included C>G and G>C substitutions as compared with those that included nucleotide substitutions that reduce GC content (Fig. 2G). These data illustrate how sequencing depth can be influenced by both the trinucleotide context and the nonreference allele.

Overall, a modest, statistically significant inverse correlation was observed between sequencing depth and error rates (Fig. 2H). Correlation strengths were not equal among distinct contextual error types. Further supporting this trend, individual samples with lower average sequencing depth displayed high error rates in multiple contextual error types (see arrows in Fig. 2, A and E). In contrast to the error rates, the absolute number of nonreference supporting reads at the distinct contextual error types showed reduced inter-sample differences in those samples; however, the differences between distinct contextual errors were preserved (Fig. 2I). Collectively, the results obtained here suggest that integration of intra-sample contextual error modeling of nonreference supporting reads at each of the 192 contexts may be a promising strategy for accurate suppression of errors produced by hybrid-capture NGS.

As described above, errors varied across samples yet were highly stereotypical according to sequence context and sequencing depth. We reasoned that intra-sample contextual error patterns could be leveraged for in silico error suppression. Such an approach could have several inherent advantages over existing error suppression methods that rely on UMIs, apply thresholds based on intra-sample-wide error rates, or use control samples to train error rate models. Therefore, we devised a computational approach, called Espresso, to model within a sample of interest the nonreference allele counts at each of the 192 distinct contextual error types. Espresso incorporates three distinct features that make it robust to different sequencing datasets (Supplementary Note): (i) pragmatic pre-filters that prepare the dataset for error modeling (fig. S4), (ii) automatic selection of the most appropriate probabilistic distribution for error modeling at a particular contextual error type (fig. S5), and (iii) utilization of nonreference supporting reads as opposed to VAF for error modeling (fig. S6). Unlike applying fixed and arbitrary cutoffs (e.g., minimum VAF, coverage, and number of supporting reads), nonreference alleles would not be indiscriminately eliminated by such an approach; rather, mutations would only be called if they reached statistical significance when compared to their corresponding error distributions (Fig. 3, A to E, and Materials and Methods).

Flowchart illustrating the error modeling technique that is implemented by Espresso. (A) Following the summarization of the sequencing data to include the dominant alleles at each investigated genomic position, their corresponding read counts, and the average mapping read qualities in each sample of interest, a set of filters is applied, aiming to deplete potential somatic SNVs and common polymorphisms from being included in the error models. (B) On the basis of the distribution of the nonreference supporting reads in the enriched error list, Espresso selects between either the exponential or the Weibull probabilistic approaches. (C) The nonreference supporting read (SR) counts in each sample are grouped based on the genomic sequence context to generate 192 context-specific distribution models. (D) The models are reapplied to the entire sample's data for outlier identification. True positives are determined if they reach statistical significance when compared to their corresponding error distribution. (E) The cumulative distribution function graph displays the empirical data (black dots) and the theoretical data (blue line) generated by the 192 models in all the samples included in the CB dataset (top, exponential models) and the AML-MRD dataset (bottom, Weibull models). (F) Panel-wide error rates, defined as the number of nonreference allele supporting reads following error suppression divided by all the reads from the same category (i.e., raw, SSCS, and duplex reads) across the entire 1,264,830-bp panel, and (G) percentage of error-free positions in the 10 cord blood samples are illustrated. For error suppression, a cutoff P value ≤ 0.05 (Bonferroni-adjusted) was used. SSCS and duplex cutoffs are ≥1 nonreference supporting read unless indicated otherwise. * indicates Wilcoxon signed-rank test: P < 0.002.
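The modeling idea described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: it fits only an exponential background per context by maximum likelihood (no pre-filters and no Weibull option), and the function names, context labels, and toy read counts are all hypothetical.

```python
import math
from collections import defaultdict

def fit_exponential_rate(counts):
    """Maximum-likelihood rate of an exponential distribution: 1 / mean."""
    return len(counts) / sum(counts)

def call_outliers(observations, alpha=0.05):
    """observations: list of (position_id, context_label, nonref_supporting_reads).

    Returns the positions whose read support is significant against the
    exponential background fitted for their own context, after a
    Bonferroni adjustment over all tested positions.
    """
    by_context = defaultdict(list)
    for _, context, reads in observations:
        by_context[context].append(reads)
    rates = {ctx: fit_exponential_rate(c) for ctx, c in by_context.items()}

    n_tests = len(observations)
    calls = []
    for position, context, reads in observations:
        p_value = math.exp(-rates[context] * reads)  # exponential survival function
        if min(1.0, p_value * n_tests) <= alpha:
            calls.append(position)
    return calls

# Toy data: a quiet context and a noisy context, each with one 40-read candidate.
toy = [(f"low_{i}", "T[A>C]C", 1 + i % 2) for i in range(99)]
toy.append(("low_candidate", "T[A>C]C", 40))
toy += [(f"high_{i}", "C[G>T]C", 20) for i in range(99)]
toy.append(("high_candidate", "C[G>T]C", 40))

calls = call_outliers(toy)
print(calls)
```

In this toy case the same 40 supporting reads are called in the quiet context but not in the noisy one, which is the behavior a fixed read-count cutoff cannot reproduce.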

To evaluate the performance of Espresso, we first applied it to the CB dataset. We reasoned that CB would have a minimal burden of somatic mutations, allowing for a more precise estimation of true error rates. We also tested in parallel other common error suppression techniques for unbiased comparative performance assessment (Materials and Methods). The techniques selected for comparison were representative of the spectrum of previously published tools. Specifically, we used two UMI-based methods, namely, single-strand consensus sequences (SSCSs) and duplex sequences (12), and two statistical methods for error correction that model background error distributions differently. Among the two statistical methods used, one relies on a training cohort to estimate error rates at the allele level (termed AL here) (20), and the other estimates error rates at the sample level (termed SL here) (14) without consideration for distinct sequence contexts.

Panel-wide error rates were highly similar among the 10 CB samples but varied significantly among the different error suppression methods (Fig. 3F). As compared with the various statistical approaches (i.e., SL, AL, and Espresso), the UMI-based methods demonstrated inferior error suppression capabilities. A minimum of nine nonreference supporting SSCS reads or three nonreference supporting duplex reads were required to achieve panel-wide error rates comparable to that of SL and Espresso in the CB dataset. We observed similar relative performance among the methods to maximize the number of error-free positions across the entire target panel (Fig. 3G). Considering the highest panel-wide error rate obtained by Espresso (2.74 × 10^-6) and the lowest of the panel-wide error rate observed without error suppression (0.025) across the CB samples, Espresso achieved an error rate reduction of more than 9000-fold.
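The fold reduction quoted above is a conservative ratio: the lowest unsuppressed rate divided by the highest Espresso rate. A quick check, taking the Espresso rate as 2.74 × 10^-6 (variable names are illustrative):

```python
raw_lowest = 0.025           # lowest panel-wide error rate without error suppression
espresso_highest = 2.74e-6   # highest panel-wide error rate obtained with Espresso

fold_reduction = raw_lowest / espresso_highest
print(f"{fold_reduction:.0f}-fold")
```

Pairing the least favorable Espresso sample with the most favorable raw sample gives a lower bound, so the per-sample reductions are at least this large.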

To evaluate the trade-off between sensitivity and specificity delivered by Espresso, we analyzed the sequencing data from the CL dataset, which consisted of a dilution series using two cancer cell lines, MOLM13 and SW48. For sensitivity measurements, we assessed the ability of the different methods to detect 119 MOLM13-specific germline variants at the different dilutions (table S3). To evaluate specificity, we assessed the miscalling of 186 AML-related somatic hotspot mutations that are covered by the target panel but are absent from both cell lines (table S3). Espresso outperformed all the other methods in distinguishing between true and false variants (Fig. 4A). In contrast, duplex sequencing achieved the smallest area under the receiver operating characteristic curve (AUC), highlighting the low diagnostic accuracy of this method and, consequently, its limited clinical utility in detecting variants across large hybrid-capture panels.
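For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen true variant receives a stronger call score than a randomly chosen negative control. The sketch below computes it by direct pairwise comparison; the scores are invented toy values (for example, -log10 P), not data from the CL dataset.

```python
def roc_auc(positive_scores, negative_scores):
    """Probability that a true variant outscores a negative control (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical scores: higher means stronger evidence for a real variant.
true_variants = [0.9, 0.8, 0.4]
negative_controls = [0.5, 0.3, 0.1, 0.4]

auc = roc_auc(true_variants, negative_controls)
print(auc)
```

An AUC of 1.0 means every true variant outranks every negative control, while 0.5 is no better than chance.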

(A) Espresso demonstrates improved sensitivity versus specificity and (B) preferable precision-recall trade-offs as compared with the various indicated methods. The ability of each method to differentiate between 119 positive alleles and 186 negative control variants in a set of serially diluted cell line DNA samples was tested. (C and D) No substantial benefit of using UMIs to augment Espresso's performance could be determined. Sensitivities and specificities were measured at all the possible combinations of the unique P values outputted by Espresso and the unique numbers of SSCS or duplex nonreference supporting reads that were observed in the dataset. The maximum sensitivities at each calculated value of specificity are illustrated. (E to H) Sensitivity versus specificity trade-offs derived by the reduced and extended contextual error modeling approaches are illustrated in comparison with Espresso. Ninety-five percent confidence intervals (shaded colors) and average values were derived by three random subsets of the data for each one of the indicated in silico decreased panel sizes. (I) Heatmap illustrating the percentage of contextual models that can be generated by Espresso when data are being restricted by either panel size reduction or sequencing depth reduction, or both. Data removal was controlled for both the reference and nonreference supporting reads, thus keeping the variant allele frequencies of the nonreference alleles similar to those in the original samples. The red line illustrates such combinations, of which 90% or more of the distinct contextual models could have been generated in every sample in the CL dataset. With datasets that fall below this line, the 12-model contextual error modeling approach can be used in addition to Espresso.

The use of hybrid-capture NGS panels allows for the detection of mutations at thousands of genomic positions. However, their use also creates unique challenges for true variant identification across so many bases. In addition to high sensitivity and specificity, positive predictive value (PPV) must be prioritized to maximize utility. We assessed PPV in conjunction with sensitivity (i.e., precision-recall analysis). We focused on variants with expected VAF ≤0.2%, since accurate variant detection below this threshold is clinically important yet has proven to be a great challenge for existing hybrid-capture NGS platforms (5, 30). Espresso provided a sensitivity of 19.9%, thus achieving the highest number of true-positive, low-VAF alleles at 100% PPV among the tested methods (Fig. 4B). This corresponds to a 6.8-fold improvement as compared to AL, which was the next best-performing method to detect low-VAF alleles without sacrificing PPV. Notably, SL performed far worse in this analysis than the other methods due to a high number of false-positive calls across various sensitivity thresholds. This result highlights the limited power of noncontextual, sample-level error modeling in detecting mutations with very low read support despite its ability to achieve an extremely high level of error suppression (Fig. 3, F and G). Further supporting this, we compared the false-positive and true-positive calls obtained by Espresso with that of Mutect2 (16) at tumor-only mode. Once more, Espresso demonstrated superior results (table S4).

Previously, the suppression of errors through statistical error modeling was shown to be enhanced by combination with UMI-based approaches (20). However, integrating UMI information with Espresso did not confer significant performance improvements (Fig. 4, C and D), suggesting that accurate detection of low-frequency variants can be achieved with Espresso alone. Collectively, the comparative analysis using the CL dataset indicates that the bioinformatic strategy applied here outperformed other methods in the reliable distinction of low-frequency errors from real SNVs.

To characterize pragmatic constraints of our method, we compared Espresso with alternative sequence context-based error models. Specifically, we included (i) a simplified 12-model design that accounts only for the 12 possible distinct substitution types without consideration of flanking bases and (ii) an expanded 3072-model design that accounts for the substitution type and for two additional 3′ and 5′ flanking bases. We evaluated the impact of panel size (i.e., number of interrogated bases) and sequencing depth on the performance of Espresso and the alternative sequence context-based models using the CL dataset.
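The three model counts follow one formula: 12 substitution types multiplied by 4 possibilities for every flanking base considered. The sketch below reproduces the counts and adds a rough average of how many (position, alternate allele) slots a panel offers per model. That average is an illustrative assumption of evenly represented contexts, and the function names are hypothetical.

```python
def n_context_models(flank: int) -> int:
    """Number of error models when `flank` bases on each side are considered."""
    substitutions = 4 * 3                      # 12 reference>alternate pairs
    flanking_combinations = 4 ** (2 * flank)   # 4 options per flanking base
    return substitutions * flanking_combinations

def mean_slots_per_model(panel_bases: int, flank: int) -> float:
    """Average number of (position, alternate allele) slots per model.

    Each panel position has three possible nonreference alleles, and each
    of them belongs to exactly one contextual model.
    """
    return 3 * panel_bases / n_context_models(flank)

model_counts = {flank: n_context_models(flank) for flank in (0, 1, 2)}
print(model_counts)
for flank in (0, 1, 2):
    print(flank, mean_slots_per_model(50_000, flank))
```

On a 50-kb panel the two-base-flank design leaves fewer than 50 possible slots per model on average, and only a fraction of those will carry any nonreference reads, so sparse models are to be expected as panels shrink.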

This comparative analysis exposed critical factors affecting the performance of the alternative models. On the one hand, the performance of the 3072-model approach suffered with reduced panel size (Fig. 4, E to H, and fig. S7A). This is an expected observation that is attributed to the reduction in the number of nonreference alleles being used to populate a relatively high number of models, thus resulting in either model generation failure or an inadequate estimation of the background error noise. In contrast, performance of the 12-model approach was less dependent on panel size since the relatively small number of models was easily populated with nonreference alleles (Fig. 4, E to H, and fig. S7B); however, Espresso consistently outperformed the 12-model approach, presumably because the 12 models were insufficient to account for errors arising within distinct sequence contexts. Moreover, the 12-model approach performed poorly on the largest panel size, possibly as a result of model overfitting from high-VAF errors that escape the initial filtering steps (Materials and Methods). The performance of Espresso was relatively consistent across a broad range of panel sizes from ~1 Mb down to ~50 kb (Fig. 4, E to H, and fig. S7C).

Next, we serially downsampled the CL dataset to simulate various practical scenarios of panel sizes (1 Mb to 32.5 kb) and sequencing depths (4500× to 1000×). At each simulated panel-depth combination, we determined the percentage of trinucleotide contexts that could be modeled directly by Espresso (Fig. 4I). Notably, poorly represented nonreference alleles that cannot be modeled directly by Espresso would still be analyzed automatically by alternative techniques that are included in the software package (see Data and materials availability). Overall, these results illustrate the performance dependencies of Espresso and related sequence context-based models to assist with their implementation in a wide range of sequencing settings.

Having demonstrated Espresso's high analytical performance in the CB and CL datasets, we next sought to evaluate its clinical utility. The presence of persistent AML clones that carry genetic abnormalities during or after treatment has been shown to carry crucial prognostic information (31). Therefore, we assembled a cohort of 42 patients with AML (AML-MRD; table S5) whose mutations were previously determined at diagnosis (table S3). Forty of the 42 patients had serial samples analyzed by ultra-deep hybrid-capture NGS at two time points during therapy; for the other two patients, single follow-up samples were available.

Since minimal/measurable residual disease (MRD) monitoring may guide clinical decisions (32–34), in addition to true positives, both false positives and false negatives could have tremendous implications for patient care. We therefore evaluated F1 scores, which represent the harmonic mean of PPV and sensitivity. For comparative performance evaluation, mutations reported at diagnosis were considered as true positives if they were detected in the follow-up samples of the same patient or as false positives if they were detected in other patients. We first applied a cutoff of P ≤ 0.05 (Bonferroni-adjusted) for the probabilistic methods SL, AL, and Espresso and a heuristic threshold of ≥1 nonreference supporting read for the UMI-based methods SSCS and duplex. Tested on the subset of samples obtained at either the first time point (T1, closer to diagnosis) or the second time point (T2, further into treatment), Espresso delivered the highest F1 scores (0.71 at T1 and 0.74 at T2) followed by AL and duplex (Fig. 5A). We next applied the optimized SSCS and duplex cutoffs used in the CB analysis (i.e., ≥9 and ≥3 nonreference supporting reads, respectively). Although F1 scores improved with these parameters, they still fell short due to an increased number of false positives for SSCS ≥9 and an increased number of false negatives for duplex ≥3 in both the T1 and the T2 data subsets as compared with Espresso (Fig. 5B).
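As a reminder of how the F1 score combines the two quantities, the sketch below computes PPV, sensitivity, and their harmonic mean from call counts. The counts are hypothetical and are not taken from the AML-MRD cohort.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (precision): true calls among all calls."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall): true calls among all real mutations."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of PPV and sensitivity."""
    precision, recall = ppv(tp, fp), sensitivity(tp, fn)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 30 true positives, 10 false positives, 15 false negatives.
score = f1_score(tp=30, fp=10, fn=15)
print(round(score, 3))
```

Because the harmonic mean is pulled toward the smaller of the two values, a method cannot reach a high F1 score by excelling at PPV alone or at sensitivity alone.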

Fig. 5. (A) Espresso provides a preferred balance between precision (PPV) and recall (sensitivity), as determined by the inspection of 78 SNVs reported across 35 of 42 patients at the time of AML diagnosis. Mutations were called in the patient's sample in 21 different iterations. In each iteration, 6 random patients of the 42 were excluded. Median F1 scores ± 1 SD are shown for the various methods tested at two time points during the course of treatment (T1 and T2, Wilcoxon signed-rank test: P ≤ 6.4 × 10⁻⁵ for all the comparisons with Espresso). (B) The variation in the mutations being called by Espresso (α ≤ 0.05, Bonferroni-adjusted), SSCS (≥9 nonreference supporting reads), and duplex (≥3 nonreference supporting reads) is illustrated. Red color indicates called mutations, while blue color indicates that mutations were not detected. FP, false positives; FN, false negatives. (C) Sensitivity versus specificity as determined by the different tested methods. (D) Enrichment of clones carrying TP53 and DNMT3A mutations is observed in patients with AML following therapy. The y axis represents the number of mutations detected, classified by the affected genes.

Despite the technical differences between the CL and AML-MRD datasets, Espresso once again produced the most preferred balance between sensitivity and specificity (Fig. 5C). We compared Espresso with additional algorithms and saw consistent outcomes. Espresso outperformed Mutect2 (16) in both the tumor-only mode and the panel of normals mode when samples obtained from 14 healthy adults were used (table S4). Espresso also outperformed deepSNV (18), a statistical algorithm that was developed specifically for the accurate detection of SNVs from deep targeted sequencing experiments. The comparison with deepSNV extrapolates beyond the probabilistic approaches being used and illustrates the benefits of other features implemented in our bioinformatic pipeline for the reduction of false-positive calls (fig. S8).

Having established Espresso as the preferred methodology to maximize the accuracy of SNV detection from peripheral blood, we next sought to implement it for the characterization of clonal dynamics in patients with AML. Since the competitive balance among different hematopoietic clones is likely to change during multiple rounds of chemotherapy, we hypothesized that Espresso would enable the identification of resistant clones that were not reported at diagnosis. We therefore extended our analysis to include an additional 147 highly recurrent AML SNVs that are covered by the AML-MRD hybrid-capture panel (table S3). Across all the samples, Espresso identified 92 mutations (α ≤ 0.05, Bonferroni-adjusted) with the lowest being reported at VAF = 0.0135% (table S6 and fig. S9). These correspond to 59 distinct mutations, out of which 47 (~80%) were present in at least two samples of the same patient (that is, reported at diagnosis and detected in at least one additional time point by Espresso or detected in the two follow-up samples by Espresso). Such a high percentage of validated mutations is an indicator of Espresso's reliable mutation calling. Among these, Espresso has enabled the detection of 22 new putative driver SNVs not reported at diagnosis in 15 patients, including in 3 of the 7 patients (~43%) with no SNVs in the diagnostic report (table S6). Further supporting the validity of the mutations called by Espresso, most of these newly identified mutations were in genes that commonly contribute to positive clonal selection following cytotoxic chemotherapy (35–37), including TP53 and DNMT3A (Fig. 5D).

Together, our results demonstrate substantial advantages of Espresso over other methods for SNV detection from peripheral blood of patients with AML during the course of therapy. Encouraged by a recent consensus document release from the European LeukemiaNet MRD Working Party (38), many studies are now underway to evaluate the prognostic and predictive significance of clonal dynamics in AML and the proposed role of MRD detection as a surrogate endpoint for clinical trials (39). Implementation of Espresso in these contexts has the potential for significant clinical utility.

Age-related clonal hematopoiesis (ARCH) is a common phenomenon evident by the presence of somatic mutations in hematopoietic stem cells of otherwise healthy individuals that cause a clonal expansion of the stem cells and their progeny (40). Recently, our group reported several hundred ARCH-associated mutations spread across 27 genes with various contributions to the risk of AML transformation (9). Our study provided a proof of concept for risk prediction of AML. Nevertheless, large population screens using broad sequencing panels remain socioeconomically unattractive because of high costs, the relatively low incidence of AML, and the relatively high incidence of ARCH in the general population.

To address these challenges, we reasoned that interrogating a small number of highly recurrent AML mutations would be a more tractable approach than broad hybrid-capture sequencing. This approach could theoretically result in improved segregation between pre-AML and controls while reducing sequencing costs. The success of this approach relies on the accurate identification of preleukemic mutations in asymptomatic individuals.

We first compiled datasets that would allow comparisons among the distinct methods used in our previous analyses. For this reason, we focused initially on the pre-AML1 dataset, which contains UMIs in the sequencing reads, and the CB dataset, which could be used as a training set for error rate estimation at the AL. Putative driver SNVs (that is, mutations in coding sequences other than synonymous SNVs and mutations at splice sites) identified by each method at the recurrently mutated genomic loci were used to derive random forest classifiers that were trained and tested on their corresponding method's mutation calls (table S7). For the probabilistic methods, α ≤ 0.05 (Bonferroni-adjusted) was used, and for the UMI-dependent methods, we applied either a threshold of one supporting consensus read or SSCS ≥ 9 and duplex ≥ 3. The Espresso-derived classifier exhibited the highest level of performance for discriminating pre-AML from controls (AUC: 0.74) and reported the highest sensitivity (46.8%) at 100% specificity (Fig. 6A). A reduction in specificity down to 96.3 or 93.7% was needed to achieve the same sensitivity with the SL-derived and SSCS-derived classifiers, respectively. The SSCS-derived model also underperformed the Espresso-derived classifier when the SSCS ≥ 1 cutoff was applied (AUC: 0.66, Fig. 6A, dashed line). The duplex ≥ 3-derived classifier had the poorest performance (AUC: 0.42), owing to poor duplex consensus efficiency (fig. S1B), low duplex coverage (Fig. 1A), and subsequent dropout of mutations not meeting the required cutoff. In contrast, with a threshold of one supporting duplex read, a large number of putatively false-positive SNVs were called, resulting in poor classification accuracy (AUC: 0.65, Fig. 6A, dashed line). The AL-derived classifier also performed poorly due to a high number of false-positive SNVs (AUC: 0.62).

Fig. 6. Classification performance evaluation of mutated pre-AML and control samples. (A) Each classifier was trained and tested on the mutations that were obtained from the classifier's corresponding method. (B) Comparison between the Espresso and the SL-derived classifiers. In this iteration, each classifier was trained using its corresponding method's mutation calls and was tested for its accuracy in classifying pre-AML cases and controls, including mutated samples identified by the other method as well. (C) Comparative performance validation between the Espresso and the SL-derived classifiers to differentiate between pre-AML and control samples obtained from an additional validation dataset (8). Information regarding the study participants' age, specific mutations, and their VAFs was obtained directly from the main text. (D) Performance estimation using the validation dataset and simulated controls. (E) Precision-recall trade-offs are calculated at the individual level (that is, serial samples are counted as single individuals, and individuals without any mutations are also included in the performance measurements). The red dot indicates AML's incidence rate. This is equivalent to a situation where no screen is being conducted at all [PPV = incidence rate = 0.006% (44), SN = 100%]. The green dot indicates the model performance using an additional published dataset consisting of 11,262 individuals when the model was set to achieve 100% specificity in the training set. Horizontal color bars represent PPV ranges determined for screening mammography for breast cancer (54) and fecal immunochemical test for advanced adenomas and colorectal cancer (CRC) (55). Comparison with the genetic risk model performance shows the extent to which sensitivity must be compromised to achieve PPV comparable with these widely applied early detection tests.

There is a low cumulative risk of ARCH progression to hematologic neoplasms (41). For this reason, the implementation of a population-based pre-AML genomic screening test would need to achieve exceedingly high specificity and low false-positive rate. We therefore prioritized the Espresso- and SL-derived classifiers for subsequent performance evaluation. Additional mutations that were found by Espresso and SL in the pre-AML2 dataset were included in the analysis (table S7). Each classifier was trained on the mutations found by its corresponding method in both the datasets (pre-AML1 and pre-AML2) and tested on the data that include all the mutations detected by either of the two methods. The Espresso-derived classifier once more provided a better overall sensitivity-specificity balance and a greater sensitivity at 100% specificity (Fig. 6B). Similar trends were observed when both the classifiers were applied to an external validation set consisting of mutations called in 188 pre-AMLs and 181 controls (8), with the Espresso-derived classifier again displaying higher discriminatory accuracy (Fig. 6C). Together, the superior classifier performance using mutations called by Espresso illustrates that accurate mutation calling is imperative when designing genetic risk prediction models.

To estimate how well the winning classifier would perform as a population-wide screening test, we spiked the validation set into >4 million in silico simulated controls (prevalence ~0.005%; Materials and Methods). Despite the small genomic footprint (table S8), the Espresso-derived classifier resulted in accurate identification of the mutated pre-AML samples (AUC: 0.84; Fig. 6D). As an example, when the model was tuned to minimize false-positive calls based on the pre-AML1/pre-AML2 merged training dataset, a sensitivity of 29.3% and a specificity of 99.8% were obtained. Precision-recall analysis revealed the extent to which the Espresso-derived classifier may enrich for individuals at high risk of developing AML as compared with current practice (no screening, i.e., AML incidence rate) (Fig. 6E). Sensitivity was 4.8% at 100% PPV; this small subset detected with no false positives was enriched for highly penetrant SRSF2/IDH2 double-positive individuals with the highest risk for AML development (table S9). Last, we estimated the model performance in an additional published cohort of 11,262 individuals (42). In this cohort, when the model was tuned to minimize false positives within the training dataset, a sensitivity of 14.3% and a PPV of 4.8% were obtained (Fig. 6E and table S9).
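
The dependence of PPV on prevalence that drives this analysis follows from Bayes' rule. The sketch below is a generic illustration, not the authors' code; the function name is hypothetical, and the prevalence is computed from the 188 pre-AML cases and 4,033,904 simulated controls quoted in this article.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence                    # expected true-positive fraction
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # expected false-positive fraction
    return true_pos / (true_pos + false_pos)


# Prevalence implied by spiking 188 cases into 4,033,904 simulated controls.
prevalence = 188 / (188 + 4_033_904)

# "No screening" corresponds to calling everyone positive:
# sensitivity 100%, specificity 0%, so PPV equals the prevalence.
no_screen_ppv = ppv(1.0, 0.0, prevalence)

# Illustrative only: the operating point quoted in the text (29.3% / 99.8%).
screen_ppv = ppv(0.293, 0.998, prevalence)
```

At such a low prevalence, even a small false-positive rate dominates the denominator, which is why very high specificity is needed before PPV rises appreciably above the incidence rate.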

In this study, we described the rationale, technical performance characteristics, and potential clinical utility for Espresso, a novel method to improve hybrid-capture sequencing-based SNV detection. Unlike many other NGS error suppression methods, including the representative published UMI-based and probabilistic model-based approaches tested here, Espresso does not rely on UMIs or a training set of controls for error rate estimations; therefore, Espresso improves practicality by reducing library preparation complexity, assay costs, and analysis time. We observed additional notable advantages of Espresso over alternative methods, and these were consistent across diverse datasets. Specifically, Espresso produced superior error suppression and an improved trade-off between sensitivity and specificity for detection of low-VAF alleles.

These advantages of Espresso were the result of several key features. First, Espresso applies a set of pre-filters to prepare the data for error modeling. Second, Espresso automatically selects between two statistical models to estimate the number of alternative supporting reads rather than the VAFs; thus, in addition to selecting the more appropriate error distribution model, it better accounts for error rate bias resulting from variation in sequencing depth within hybrid-capture NGS datasets. Third, Espresso markedly reduces false-positive calls by considering only the dominant nonreference allele at each interrogated genomic position. Fourth, Espresso leverages a large number of errors that share the same trinucleotide sequence context within the investigated sample; thus, it reduces the potential for misrepresentation of real error rates by relatively small control cohorts.

To explore its potential use in clinical settings, we tested the performance of Espresso to detect SNVs in serial peripheral blood samples from 42 patients with AML who achieved clinical remission. Consistent with the performance in the other investigated datasets, Espresso outperformed all the other tested methods in this setting. Using Espresso, we found resistant subclones enriched for TP53 and DNMT3A mutations that were genetically distinct from the AML clones present at diagnosis. In the future, more extensive cohort studies are needed to determine whether the selection and enrichment of such clones following induction therapy may affect patient outcomes in a nonautonomous fashion, similar to the observations in solid malignancies (43). Furthermore, combining accurate detection of persistent mutations together with other independent prognostic markers will be necessary to build clinically relevant models for accurate determination of the risk of relapse.

Our results emphasize the importance of accurate mutation detection for the derivation of classification models in the setting of early detection of AML. Using Espresso, we derived a risk prediction model that is focused on a minimal yet highly informative set of genomic loci that are recurrently mutated in patients with AML. With only 1594 genomic bases being interrogated, our results imply that up to 29.3% of de novo AML cases can be predicted years in advance with a specificity of 99.8%. Although sensitivity may greatly suffer with elevated PPV, considering the incidence rates of AML in the general population (~6:100,000) (44), our approach would still provide meaningful patient enrichment. Modest sensitivity may be acceptable when screening the general population as long as specificity and PPV remain high. Further prospective validation studies are required to assess the feasibility, utility, and cost-effectiveness of this targeted approach. Our findings should also be extended to incorporate additional predictive biomarkers. As AML is a blood-borne disease, we envision that epigenetic and metabolomic perturbations within leukocytes may further improve prediction accuracy, thus making AML predictions more clinically useful. Our results indicate that certain biomarker-enriched populations may be at an exceedingly high risk of developing AML. In time, novel therapeutic developments and targeted therapies against blood cells with high-risk mutations may provide the minimal side effects necessary to deliver a favorable risk-benefit ratio that justifies the initiation of early intervention clinical studies.

In summary, we have described, benchmarked, and validated a new practical NGS error suppression technique. We have demonstrated the superiority of Espresso in detecting somatic SNVs as compared with existing state-of-the-art approaches and defined its limitations with respect to sequencing depths and hybrid-capture panel sizes. We used Espresso to derive new biological insights, augmenting our understanding of the genetic mutations that define high-risk malignant transformation and therapy resistance clones in patients with AML. We envision that Espresso will prove useful in guiding clinical decisions and scientific research alike.

CB dataset: This dataset is composed of 10 human umbilical cord blood genomic DNA samples obtained from Trillium Hospital (Mississauga, Ontario, Canada) with informed consent in accordance with guidelines approved by the University Health Network Research Ethics Board. Cord blood was processed 24 to 48 hours after delivery. Mononuclear cells were enriched using Ficoll-Paque followed by red blood lysis by ammonium chloride and CD34+ selection before DNA extraction. CL dataset: MOLM13 cell line DNA was mixed with SW48 cell line DNA at relative concentrations of 100, 5, 1, 0.2, 0.04, and 0% and was sequenced in duplicate. Pre-AML1 and pre-AML2 datasets: Detailed information regarding these cohorts is described elsewhere (9). Briefly, the pre-AML1 dataset contains peripheral blood genomic DNA samples obtained from a total of 509 individuals upon enrollment into the European Prospective Investigation into Cancer and Nutrition (EPIC) study (45) between 1993 and 1998. Together, 414 control individuals who did not develop any hematological disorders during the extended follow-up period and 95 individuals who developed AML were included in this study. The pre-AML2 dataset contains peripheral blood genomic DNA samples obtained from individuals enrolled in the EPIC-Norfolk longitudinal cohort study between 1994 and 2010. Samples were available from 37 patients with AML and 262 age- and sex-matched controls without a history of cancer or any hematological conditions. Samples taken at multiple time points were available for a fraction of the participants in this cohort. Notably, samples from eight pre-AML patients in the pre-AML2 cohort were separately sequenced in the pre-AML1 dataset (by independent investigators using a different methodology). To avoid statistical misrepresentation of AML predictions, we removed those samples from the pre-AML2 dataset before the derivation of the described genetic risk models. 
AML-MRD dataset: This dataset is composed of peripheral blood genomic DNA from 42 patients with AML treated at the Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. All 42 patients achieved morphologic leukemia-free state (MLFS) on chemotherapy. Complete count recovery occurred when absolute neutrophil count recovered to ≥1 × 10⁹/liter and platelet count recovered to ≥100 × 10⁹/liter up to 7 days following the bone marrow assessment that confirmed MLFS status. All patients were deidentified with patient IDs. Their demographic and clinical features were captured (table S5). All the samples in this study, including healthy individuals and patients with cancer, were collected with informed consent for research use and were approved by Institutional Review Boards in accordance with the Declaration of Helsinki. Protocols were approved by the following ethics committees: (i) International Agency for Research on Cancer Ethics Committee approval #14-31, (ii) East of England - Cambridgeshire and Hertfordshire Research Ethics Committee reference number 98CN01, and (iii) University Health Network Research Ethics Board # 01-0573.24.

Library construction and sequencing were done as previously described (9). Briefly, for each sample in the CB, CL, and pre-AML1 datasets, 100 ng of genomic DNA was sheared to 250-base pair (bp) fragments with a Covaris E220 instrument using the recommended settings before library construction (KAPA HyperPrep Kit KK8504, Kapa Biosystems). After end repair and A-tailing, ligation of UMI-containing adaptors was performed with 100-fold molar excess. Agencourt AMPure XP beads (Beckman Coulter) were used for library cleanup following eight cycles of fragment amplification with 0.5 µM Illumina universal and indexing primers. Targeted hybrid-capture was carried out on pools of three indexed libraries. Five microliters of Cot-I DNA (1 mg ml⁻¹; Invitrogen) and 1 nmol each of xGen Universal Blocking Oligo, TS-p5, and xGen Universal Blocking Oligo, TS-p7 (8 nucleotides) were added to each pool of adaptor-ligated DNA. The mixture was dried using a SpeedVac and then was resuspended in 1.1 µl of water, 3.4 µl of NimbleGen hybridization component A, and 8.5 µl of NimbleGen 2× hybridization buffer. The mixture was heat-denatured at 95°C for 10 min following the addition of 4 µl of xGen Lockdown Probes (3 pmol; xGen AML Cancer Panel v.1.0). Hybridization was conducted at 47°C for 72 hours. Washing and recovery of the captured DNA were initiated with 100 µl of clean streptavidin beads that were added to each capture. Following separation of the libraries and the supernatant using a magnet, 200 µl of 1× Stringent Wash Buffer was added, and the reaction was incubated for 5 min at 65°C. The supernatant containing unbound DNA was removed before repeating the high stringency wash for the second time. The bound DNA was then washed one time with 200 µl of each of the following: 1× Wash Buffer, 1× Wash Buffer II, and 1× Wash Buffer III. 
The washed DNA on beads was resuspended in 40 µl of nuclease-free water, and this volume was divided into two polymerase chain reaction (PCR) tubes that were subjected to 10 cycles of post-capture amplification (Kapa Biosystems, recommended conditions). Libraries were spiked with 2% PhiX before sequencing. The procedure used for the pre-AML2 dataset is described elsewhere (referred to as the validation cohort) (9). For each sample in the AML-MRD dataset, peripheral blood samples were collected during remission in PAXgene Blood DNA Tubes (PreAnalytiX, Hombrechtikon, Switzerland). DNA was extracted according to the manufacturer's instructions. Illumina-compatible libraries were constructed from 100 ng of sheared genomic DNA using the Covaris M220 sonicator (Covaris, Woburn, MA, USA) and the KAPA HyperPrep Kit (#KK8504, Kapa Biosystems, Wilmington, MA, USA). Following end repair and A-tailing, adapter ligation was performed for 16 hours at 4°C using 100-fold molar excess of adapters. Agencourt AMPure XP beads (Beckman Coulter) were used for library cleanup, and ligated fragments were amplified by PCR for 6 cycles using 0.5 µM universal and indexed primers. Following hybrid-capture at 47°C for 72 hours, the captured DNA fragments were enriched with 12 cycles of PCR. Paired-end 2 × 125-bp sequencing was performed on an Illumina HiSeq 2500 instrument with eight libraries multiplexed into each lane.

Paired-end sequencing data from the Illumina platform were converted to FASTQ format. When included, the unique molecular barcode information at each read of the pair was trimmed and was added to the read header. The Burrows-Wheeler aligner (BWA-mem) (46) was used for the alignment of the processed FASTQ files to the reference hg19 genome. To eliminate the chance of ambiguous short indel alignment on neighboring SNV miscalls, we removed reads with indels. We further cleaned the data from short and hard clipped reads and any nonunique read alignments. We found that, together, these preprocessing steps can improve SNV detection (fig. S8). Consensus read assembly into read families was done in a similar way to previous reports (47, 48). Specifically, reads that share the same molecular barcode sequence, the genomic position of where each read of the pair maps to the reference, and the CIGAR string were grouped. Families that consisted of at least two reads were used to generate SSCS, and a consensus base was called when there was full agreement. When a consensus base was called, it was assigned with the maximum base quality score observed in its corresponding precollapsed reads. Similarly, when two SSCSs with corresponding UMIs on the reciprocal strand were observed, duplex reads were generated. After converting the raw-, SSCS-, and duplex-containing sam files into coordinate-sorted bam files, we used samtools (49) version 1.2 and Varscan2 (14) version 2.2.8 to summarize the data. The following parameters were used: (i) mpileup parameters: -s -x -BQ0 -q1 -d100000 and (ii) pileup2cns parameters: --min-coverage 10 --min-reads2 1 --min-avg-qual 30 --min-var-freq 0.0001 --p-value 1 --strand-filter 0. These are rather permissive parameters allowing the output of all the dominant alleles in each one of the investigated genomic positions. 
To allow unbiased performance comparisons, we used this format as an input for all the probabilistic methods (SL, AL, and Espresso) and the UMI-based methods (SSCS and duplex).
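
The read-family grouping and single-strand consensus (SSCS) rule described above can be sketched as follows. This is a simplified illustration rather than the pipeline used in the study: the `Read` record and `build_sscs` function are hypothetical names, reads within a family are assumed to have equal length (reads with indels having been removed), and emitting "N" with quality 0 where reads disagree is an assumption, since the text only states that a consensus base is called on full agreement.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    umi: str        # molecular barcode taken from the read header
    pos: int        # mapping position of this read
    mate_pos: int   # mapping position of its mate
    cigar: str      # CIGAR string
    seq: str        # base calls
    quals: list     # per-base quality scores


def build_sscs(reads):
    """Group reads into families and call a consensus for families of >= 2 reads."""
    families = defaultdict(list)
    for read in reads:
        families[(read.umi, read.pos, read.mate_pos, read.cigar)].append(read)

    consensus = {}
    for key, family in families.items():
        if len(family) < 2:
            continue  # singleton families do not yield an SSCS
        bases, quals = [], []
        for i in range(len(family[0].seq)):
            column = {read.seq[i] for read in family}
            if len(column) == 1:
                # Full agreement: keep the base with the maximum observed quality.
                bases.append(column.pop())
                quals.append(max(read.quals[i] for read in family))
            else:
                # Assumption: mask disagreeing positions.
                bases.append("N")
                quals.append(0)
        consensus[key] = ("".join(bases), quals)
    return consensus
```

Duplex reads would then be formed by pairing SSCSs whose UMIs correspond on the reciprocal strand; that pairing step is not shown here.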

With Espresso, we deployed a novel approach to model errors based on their association with one of 192 sequence contexts (Fig. 3, A to E). These correspond to 12 base substitution types, four alternative 5′ bases, and four alternative 3′ bases. To mitigate the impact of outliers and real mutations on overfitting, a set of filters is applied to exclude specific variants from the contextual error models (Supplementary Note and fig. S4). These include the removal of alleles (i) that are observed as germline variants in the general population (50, 51) with minor allele frequency ≥0.1%, (ii) with VAF/error rates ≥5%, (iii) that have MapQual<59 and MapQual!=0 [for additional information, please refer to the manual of Varscan2 (14)], (iv) that describe recurrent cancer mutations, and (v) that disproportionally persist across multiple samples in the dataset (see the Flagged alleles section; Materials and Methods). Notably, to prevent performance comparison bias, we used these filters together with all the probabilistic methods (SL, AL, and Espresso) and the UMI-based methods (SSCS and duplex) tested.
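
The count of 192 contexts follows from 12 substitution types multiplied by 4 possible 5′ bases and 4 possible 3′ bases. A minimal Python sketch of that enumeration is shown below; the function names and the tuple layout are illustrative assumptions, not the representation used by Espresso.

```python
from itertools import product

BASES = "ACGT"


def enumerate_contexts():
    """List every (5' base, reference base, alternative base, 3' base) combination."""
    contexts = []
    for five_prime, ref, three_prime in product(BASES, repeat=3):
        for alt in BASES:
            if alt != ref:  # 4 reference bases x 3 alternatives = 12 substitution types
                contexts.append((five_prime, ref, alt, three_prime))
    return contexts


def context_of(reference: str, index: int, alt: str):
    """Context key for an alternative allele at an interior position of a reference string."""
    return (reference[index - 1], reference[index], alt, reference[index + 1])


ALL_CONTEXTS = enumerate_contexts()
```

Each nonreference allele observed in a sample can then be assigned to exactly one of these keys, so that errors sharing a context are pooled into the same background model.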

To determine the more appropriate distribution type for error modeling, Espresso first investigates the overall distribution of nonreference supporting reads in a context-independent manner, in the sample's filtered, error-enriched list. On the basis of the observed peak occurrence, either exponential or Weibull distribution models are selected to generate all the contextual models. If the peak corresponds to a single nonreference supporting read, an exponential distribution is used to represent the data; otherwise, if this value is larger than 1, a Weibull distribution is used. Either the pexp or pweibull R functions are then used together with the modeled parameters from the fitdistrplus package (either rate or shape and scale) to determine how far any nonreference allele of interest is represented above its corresponding contextual background. A Bonferroni-corrected P value ≤ 0.05 was used to determine whether any nonreference allele received significantly more supporting reads.
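
The selection rule and the tail-probability test can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the Espresso implementation: the function names are hypothetical, the exponential rate uses the closed-form maximum-likelihood estimate, the Weibull parameters are taken as given rather than fitted, and the upper-tail probability is assumed to be the quantity compared with the Bonferroni-adjusted threshold.

```python
import math
from collections import Counter


def choose_model(alt_read_counts):
    """Pick the error model from the most frequent nonreference read count."""
    peak = Counter(alt_read_counts).most_common(1)[0][0]
    return "exponential" if peak == 1 else "weibull"


def fit_exponential_rate(alt_read_counts):
    """Maximum-likelihood rate of an exponential distribution (1 / mean)."""
    return len(alt_read_counts) / sum(alt_read_counts)


def exponential_upper_tail(x, rate):
    """P(X > x) for an exponential distribution."""
    return math.exp(-rate * x)


def weibull_upper_tail(x, shape, scale):
    """P(X > x) for a Weibull distribution."""
    return math.exp(-((x / scale) ** shape))


def bonferroni_significant(p_value, n_tests, alpha=0.05):
    """True if the P value survives Bonferroni correction for n_tests comparisons."""
    return min(1.0, p_value * n_tests) <= alpha
```

For example, an allele supported by far more reads than the background of its context yields a small upper-tail probability, which is then multiplied by the number of tested alleles before comparison with 0.05.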

For comparative performance analysis, error rate models at the AL were constructed as previously described (20). Briefly, if the total number of nonzero allele frequencies seen in the training set used for error modeling was ≤5, we used a Gaussian distribution; otherwise, we fit a Weibull distribution to the allele frequencies observed in the training set. Specifically, the pnorm or pweibull R functions were used together with the modeled parameters (either mean and SD or shape and scale) to estimate the likelihood that any allele frequency value of interest is above the corresponding modeled distribution derived for the same interrogated position in the corresponding training set. The resulting P values were adjusted by incorporating the fraction of nonzero allele frequencies into the final models [for additional information, please refer to iDES (20)]. Training datasets were constructed as follows: (i) The pre-AML1 dataset was used for the CB analysis (Fig. 3) and the CL analysis (Fig. 4). (ii) A training set composed of peripheral blood genomic DNA samples from 14 healthy individuals was sequenced and used in the analysis of the AML-MRD data (Fig. 5). (iii) The CB dataset was used as a training set for the derivation of the AL-based model for AML risk prediction (Fig. 6). To evaluate allele mutated status at the SL, we used Varscan2 (14), which computes statistical significance in single samples by Fisher's exact test.

While parameters such as specific genomic context, the presence of a repetitive region, and low base or read mapping quality may explain the basis of some errors, these do not always capture artifacts that may persist across multiple samples. We therefore derived a statistical approach to flag recurrently specious alleles. To flag potentially low-frequency artifactual alleles that escaped conventional filtering, we iterated between the 99 and 99.9% nonreference allele frequency quantiles in the entire investigated cohort in increments of 0.1% (user-defined parameters). The 10 derived VAF values were used consecutively to apply Fisher's exact tests, determining whether errors with VAF above the quantile-derived cutoff are distributed proportionately among all the observed nonreference alleles in the dataset or are clustered in a low number of alleles across many samples in an unbalanced fashion. Then, if included, we removed recurrent Catalogue of Somatic Mutations in Cancer (COSMIC) (52) mutations (that is, SNVs with classification other than synonymous with at least three case reports of hematopoietic and lymphoid tissues; COSMIC version 80) to derive a final list of dataset-specific flagged alleles to be excluded from contextual error modeling.

To derive a list of mutations that are highly associated with leukemic transformation for AML risk prediction model derivation, we interrogated the COSMIC database (52) and ranked variants according to their evidence for functional relevance in AML. All the SNVs with classification other than synonymous with at least 10 case reports of hematopoietic and lymphoid tissues were considered hotspot variants. For the future implementation of our findings, we reasoned that any hybrid-capture probe design and short sequencing reads would efficiently encompass at least several genomic bases surrounding these hotspots. Therefore, we extended the variant calls to capture mutations with a putative deleterious effect that are within a five-amino acid distance surrounding each hotspot variant. Genomic loci that were found to be mutated in the training cohort (pre-AML1 and pre-AML2) were used for the final model derivation (table S8). Notably, we discarded genomic loci with mutations in KIT, KRAS, and PHF6 as these were found solely in the training set's controls. Such enrichment is unlikely to reflect real-life evidence and can bias classification. We then used a random forest algorithm via the R package randomForest. Mutations were grouped by genes, and their VAFs were used to train the model together with the age of the individuals at sampling and the number of the mutations that they carry. If more than one mutation was detected in the same gene, the highest VAF was used. The number of features used for each one of the 5000 generated trees was two.
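
The feature construction described above (one VAF per gene, plus age and mutation count) can be sketched as follows. The function name, the gene list, and the choice of 0.0 for genes without a detected mutation are assumptions made for illustration; the actual loci are listed in table S8.

```python
def build_features(mutations, age, genes):
    """Build one feature row for an individual.

    mutations: list of (gene, vaf) tuples detected in the individual
    age:       age at sampling
    genes:     ordered list of genes that define the VAF columns
    """
    highest_vaf = {gene: 0.0 for gene in genes}  # assumption: 0.0 means "not detected"
    for gene, vaf in mutations:
        if gene in highest_vaf:
            # If several mutations hit the same gene, keep the highest VAF.
            highest_vaf[gene] = max(highest_vaf[gene], vaf)
    return [highest_vaf[gene] for gene in genes] + [age, len(mutations)]


# Hypothetical individual with two DNMT3A mutations and one TP53 mutation.
GENES = ["DNMT3A", "TP53", "SRSF2", "IDH2"]
row = build_features([("DNMT3A", 0.8), ("DNMT3A", 2.1), ("TP53", 0.4)], age=63, genes=GENES)
```

Rows built this way would form the input matrix for the classifier; in R's randomForest package, the 5000 trees and two features mentioned in the text would presumably correspond to the `ntree` and `mtry` arguments.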

To simulate a large population screen, we used the mutations detected by Espresso in the controls from the pre-AML1 and pre-AML2 (termed merged dataset here). We first calculated the frequency of controls that carry at least one mutation at the following age groups: 20 to 49, 50 to 64, 65 to 74, and >75 years old. For these age groups, we obtained the incidence rates of AML through the Surveillance, Epidemiology, and End Results Program (53). By assuming similar age distribution for the validation cohort (8) and the individuals interrogated in the merged dataset and knowing the number of pre-AML cases interrogated in the validation cohort (n = 188), we were able to estimate the number of simulated controls needed to mimic real incidence rates for each age group. Overall, 4,033,904 controls were simulated.

The frequency of ARCH and the number of mutations that each individual carries within each control age group from the merged dataset helped us to estimate how many of the simulated individuals are expected to carry mutations in the relevant genomic loci (table S8). Overall, 5.05, 7.69, 10.70, and 19.09% of the individuals aged 20 to 49, 50 to 64, 65 to 74, and ≥75 years, respectively, were simulated to have ARCH. A total of 285,629 individuals (~7%) were simulated to carry one mutation, 934 with two mutations (~0.02%), and 156 with three mutations (~0.004%). We next assigned the specific mutations to the simulated individuals based on their association with each age group. For example, for the 149,423 simulated mutated controls with a simulated age of 50 to 64, we populated a list of 149,423 specific mutations that were detected in control individuals in the same age group or in younger age groups in the merged dataset. We also allowed 10% of the mutations detected in the merged dataset in one age group older to be randomly included. Last, we aimed to assign VAF to the simulated mutations. We observed that the VAF of the detected mutations in the merged dataset did not significantly correlate with age [R(Pearson) = 0.20; P = 0.07] and that a lognormal distribution accurately captures the VAF distribution among all the detected mutations. We therefore used the rlnorm R function to simulate VAFs. This resulted in a median VAF of 1.45% and a mean VAF of 2.45% for the simulated controls; 37.46% of the simulated VAFs received a value of VAF ≥ 2%. As intended, these values are highly comparable with those of the mutations found in the merged dataset's controls (table S7).
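
The lognormal VAF simulation can be illustrated in Python. The parameters passed to rlnorm are not given in the text, so this sketch back-derives them from the reported median and mean using the identities median = exp(mu) and mean = exp(mu + sigma²/2); treating the reported sample statistics as the distribution's true median and mean is an assumption, and the function names are hypothetical.

```python
import math
import random


def lognormal_params(median, mean):
    """Log-scale mean (mu) and SD (sigma) of a lognormal with the given median and mean."""
    mu = math.log(median)
    sigma = math.sqrt(2.0 * math.log(mean / median))
    return mu, sigma


def simulate_vafs(n, median, mean, seed=0):
    """Draw n VAFs (in percent) from the implied lognormal distribution."""
    mu, sigma = lognormal_params(median, mean)
    rng = random.Random(seed)  # seeded for reproducibility
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


# Median and mean VAF (in percent) as reported for the simulated controls.
mu, sigma = lognormal_params(median=1.45, mean=2.45)
vafs = simulate_vafs(1000, median=1.45, mean=2.45)
```

Note that a lognormal draw is unbounded above, so a real simulation would need some rule for values exceeding the biologically possible range; no such rule is described in the text.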

Read more from the original source:
Integration of intra-sample contextual error modeling for improved detection of somatic mutations from deep sequencing - Science Advances