The Region 4 Stork (R4S) Collaborative Project
Part 6: Other Applications and Future Developments
Published: August 2013
Accurate identification of newborns with metabolic disease can significantly improve patient outcomes. Conversely, a missed diagnosis can result in significant morbidity and may even result in death. While a false-positive diagnosis does not carry the burden of increased morbidity or mortality, it has social and psychological costs that may cause significant harm. The Region 4 Stork Collaborative was developed to improve the detection of true positive cases of metabolic disease and the accuracy of diagnosis. The R4S project uses Mayo-developed software that provides postanalytical interpretation of complex metabolic profiles. The R4S project offers physicians worldwide the opportunity to use this software to analyze their patients’ test results and compare them with results from other locations.
R4S Collaborative Project Part 6 discusses other applications and future developments, including 8 additional live applications on the R4S site.
Presenter: Piero Rinaldo, MD, PhD
- Co-director of the Biochemical Genetics Laboratory
- Professor of Laboratory Medicine and Pathology
- T. Denny Sanford Professor of Pediatrics at Mayo Clinic
Transcript
Thank you for the introduction. This presentation is the final segment of a 6-part series describing the products and clinical tools of a newborn screening quality improvement project called Region 4 Stork, or R4S.
I have a disclosure to make: a provisional patent application related to the content of this presentation has been submitted by Mayo Clinic. The title of the application is “Computer-Based Dynamic Data Analysis.”
The title of this presentation is “Other applications and future developments.”
Although the MS/MS application has been the primary focus of the entire series, there are 8 more live applications on the R4S website. They are devoted to other established targets of the recommended uniform screening panel, but also to some new tests under development and validation. The list of established applications includes congenital adrenal hyperplasia, biotinidase deficiency, routine second sample analyzed by tandem mass spectrometry (shown as MS/MS), and severe combined immunodeficiency, or SCID. The applications related to new tests and conditions are lysosomal storage diseases, Friedreich ataxia, X-linked adrenoleukodystrophy and other peroxisomal disorders, and finally a combined application titled Pilot study that includes the data of the 3 previous applications and one more condition, Wilson disease.
A defining characteristic of all these applications is the consistency of format and content, namely the full set of first-generation post-analytical interpretive tools. For example, the plot by marker for T-cell receptor excision circles, abbreviated as TREC, the analyte measured to screen for severe combined immunodeficiencies, looks exactly like the tool in the MS/MS application, showing the cumulative reference range and, in this figure, the disease ranges of 19 separate conditions. Likewise, the all conditions tool for 9 lysosomal storage diseases is identical to the one found in the MS/MS application. The relevant point here is that a user who is familiar with one application can use all the others.
This table summarizes the status of the 9 R4S applications as of March 31, 2013. The MS/MS application is approaching the mark of 15,000 true positive cases and has surpassed the milestone of 1 million data points. The other modules include between 77 and 866 true positive cases, on average more than 300 per application. Although significantly smaller, these are volumes that a single laboratory would be hard pressed to assemble in isolation, without cooperation.
R4S constitutes a novel approach to the interpretation of laboratory test results, one that has the potential to create net value in the practice of both the performing laboratory and the ordering physician. R4S has reached an unprecedented level of worldwide collaboration, 185 laboratories in 51 countries to date, and has evolved into a testing environment for a continuous and dynamic clinical validation process. Clinical significance is not based on arbitrary choices, but is entirely evidence-based. Reference ranges are not expressed as greater than or less than a limit, but as cumulative percentiles. The same is true for disease ranges, which are condition-specific, not cumulative by analyte. Cutoff values are simply not necessary. Peer comparison is extensive, transparent, available on demand, and always up to date. The selection of ratios is facilitated by several tools and plots for data analysis. As with cutoff values, sequential algorithms are made obsolete and replaced by the parallel, simultaneous evaluation of all informative markers. Differential diagnosis is automatic, and in most cases resolved by dedicated dual scatter plots. Subjectivity is minimal, and shared knowledge is practically built in. This is not the time to rest on these accomplishments; this is exactly the time to ask “What is NEXT?”
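To illustrate what expressing a result as a cumulative percentile means in practice, the following minimal Python sketch ranks a single result against a reference population instead of comparing it with a cutoff. The function name, the mid-rank handling of ties, and the reference values are assumptions for illustration only; this is not the R4S code.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(reference: list[float], value: float) -> float:
    """Empirical cumulative percentile of `value` within `reference`.

    Ties are split evenly (mid-rank convention): values strictly below
    count fully, values equal to `value` count as one half each.
    """
    ordered = sorted(reference)
    below = bisect_left(ordered, value)
    at_or_below = bisect_right(ordered, value)
    return 100.0 * (below + 0.5 * (at_or_below - below)) / len(ordered)


# Hypothetical reference values for one analyte (arbitrary units).
reference = [0.10, 0.12, 0.15, 0.15, 0.18, 0.20, 0.22, 0.25, 0.30, 0.41]
print(percentile_rank(reference, 0.20))  # 55.0
print(percentile_rank(reference, 0.50))  # 100.0
```

A real reference population would of course contain thousands of values; the point is only that the output is a position within a distribution rather than a pass/fail flag.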
R4S is likely to continue to evolve, starting with additional newborn screening modules. In the near future, a new application will be launched to collect data and create tools based on the first round of confirmatory testing after a referral has been made. This application will be called “Newborn screening short term confirmatory testing” and will collect the data of the traditional tests performed in plasma and urine for confirmation purposes. When applicable, each condition will be split into 2 groups, true positives and false positives, to allow the creation of dual scatter plots targeting the differential diagnosis between them.
In addition to new R4S applications, a new cluster called CLIR 1.0 has been created. As described in the next slides, its applications cover other tests in Clinical Biochemical Genetics, other specialties of Laboratory Medicine and Pathology, and some basic research applications. Later in the presentation, version 2.0 of the software will be briefly introduced.
This effort is driven by the belief that R4S tools are NOT applicable just to newborn screening, but can provide useful answers to 3 basic questions in a broad spectrum of clinical circumstances: detection of an overall profile that fits the pattern of a target condition (a yes or no answer), differential diagnosis between 2 conditions with similar and overlapping biochemical phenotypes, and recognition of the most likely condition among a large number of possible choices.
As a reminder, the IT infrastructure of R4S was initially located at the Michigan Public Health Institute in Lansing, Michigan. In the summer of 2012, the R4S website and applications became part of the Newborn Screening Translational Research Network based in Bethesda, Maryland. At the same time, a new and completely separate cluster of applications was created within the Mayo Clinic IT infrastructure. This new project has been named Collaborative Laboratory Integrated Reports, or CLIR 1.0.
CLIR 1.0 applications target any test performed by the Biochemical Genetics Laboratory (BGL for brevity) that relies on the selection of cutoff values and on pattern recognition of complex metabolic profiles. To date, approximately 30 such applications have been created on a Mayo intranet development site. A screenshot of the login page is shown here.
As in newborn screening, the availability of very large numbers of true positive cases is critically important to establish condition-specific disease ranges for all the analytes of interest. If we consider the current size of the R4S database as a goal to emulate, between 1500 and 2000 cases per year are needed over a period of 10 years.
Fortunately, data archiving of true positive cases has been a priority of BGL since 1999, and indeed a similar, if not greater, number of cases has been collected during the past 10 calendar years.
This table shows the 11 major groups of inborn errors of metabolism in the BGL database and the number of diagnosed cases per category per year. Overall, more than 20,000 cases are already available to create disease ranges for a broad range of inherited conditions.
This table shows the status of the 10 BGL applications with the highest number of true positive cases, with a legend of the abbreviations listed in the header line. Two of them, urine acylglycines (ACYLG) and plasma very long chain fatty acids (POX), will be highlighted in the next slides. Between 100 and 1200 cases per application have already been uploaded to CLIR 1.0 applications. However, this work has just started, considering that only 16% of the total number of available cases has been processed so far.
This effort is driven by the expectation of achieving a number of clinically useful deliverables. The first is the automated production, by the software of any instrument in the laboratory, of a .csv file containing fully anonymized batches of clinical sample raw data. The second is the creation of a post-analytical interpretive tool for every condition potentially detectable by a test or a combination of multiple tests. The third and final goal is the routine processing of such data batches by the tool runner in every CLIR application. Obviously, the sheer magnitude of this effort is justified only if it can be proven that there is added value to be found in this process.
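The first deliverable can be pictured with a small Python sketch that writes a batch of raw results to .csv text after replacing patient identifiers with sequential, batch-local sample codes. The function `anonymize_batch`, the column names, the coding scheme, and the sample values are all hypothetical; the actual file layout expected by the CLIR tool runner is not described in this presentation.

```python
import csv
import io


def anonymize_batch(rows, analytes):
    """Return .csv text in which patient identifiers are replaced by sample codes.

    Each row is a dict holding a 'patient_id' plus one value per analyte name.
    The identifier is never written; only a batch-local code such as 'S001'
    links an output line to its position in the batch.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample_code", *analytes])
    for index, row in enumerate(rows, start=1):
        writer.writerow([f"S{index:03d}", *(row[name] for name in analytes)])
    return buffer.getvalue()


# Hypothetical batch: two samples, two plasma very long chain fatty acid markers.
batch = [
    {"patient_id": "MRN-0001", "C26:0": 0.85, "C26:0/C22:0": 0.012},
    {"patient_id": "MRN-0002", "C26:0": 1.40, "C26:0/C22:0": 0.031},
]
print(anonymize_batch(batch, ["C26:0", "C26:0/C22:0"]), end="")
```

Sequential codes are the simplest choice that keeps no trace of the original identifier; a laboratory that needs to re-identify flagged samples would keep the code-to-patient mapping on its own side only.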
The first example of added value is provided by the plasma very long chain fatty acids test. This test is primarily used to recognize several inherited disorders affecting the function of the intracellular organelles called peroxisomes, and it requires a complex differential diagnosis between single enzyme defects and global disorders of peroxisome biogenesis. By reproducing the process described for newborn screening, we can use post-analytical tools to conclusively override the frequent minimal elevations of one or more measured analytes, random and non-specific findings that often trigger a request for a repeat sample. Improved specificity is not the only added value: preliminary workload recording has shown a reduction of the average review time by a laboratory director from 2 minutes to 12 seconds per test, a reduction of 90%.
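The percentage reduction implied by the two quoted review times can be checked directly: 2 minutes is 120 seconds, and reviewing in 12 seconds instead removes 108 of every 120 seconds. A one-line Python check (the helper name is ours):

```python
def percent_reduction(before_s: float, after_s: float) -> float:
    """Percentage reduction going from `before_s` to `after_s` (both in seconds)."""
    return 100.0 * (before_s - after_s) / before_s


# Average review time per test: 2 minutes before, 12 seconds after.
print(percent_reduction(2 * 60, 12))  # 90.0
```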
The same evaluation is under way for the urine acylglycine profile, another high volume test where the utilization of the tool runner in routine daily practice is ready to be activated.
It should be apparent that CLIR could be used for any other multi-analyte test that requires expert interpretation of complex profiles, across a wide spectrum of clinical laboratories. Notably, the test catalog of the Department of Laboratory Medicine and Pathology at Mayo Clinic includes at least 463 tests with more than one reportable result, and more than half of them include 5 or more markers.
Finally, CLIR could support data collection, secure web-based sharing, and comparison between sites collaborating on research projects that include the characterization of biochemical phenotypes. A dozen applications have been started already, and new ones could be added almost instantly when a request is brought to our attention.
Requests to activate a new CLIR application, either clinical or research, have relatively simple requirements, a list we call the “starter kit”. An application needs one or more content experts who are ready to assume the role of curator of the database. These individuals are given administrative access to the full arsenal of tools, including the tool builder. To populate the log-in icon, a short and a long name are needed (for example, POX and plasma very long chain fatty acids). Also required is a description of the condition types and, again, the short name, long name, and, if applicable, SNOMED code of each individual condition targeted by the application. Analytes can be sorted in categories called “types”; again, full descriptions and preferred abbreviations need to be provided. The entry for each analyte should also state the unit of measurement and, if available, the LOINC code on record that matches the unit and the specimen type. Once the framework of conditions and analytes has been set up, a process that can be completed very quickly for an application of average complexity using the administrative tools available on the website, the next step is to calculate and upload analyte percentile values of reference subjects. The final step is to enter, one by one, all available true positive cases. This process can be facilitated by semi-automated uploading of .csv files that can be easily generated from available Excel spreadsheets. Post-analytical tools can be activated with data from as few as 5 cases, although much larger numbers are obviously needed for a tool used to analyze prospective data in a clinical setting.
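The starter kit maps naturally onto a small data model. The Python sketch below is a hypothetical illustration, not the actual CLIR schema: the class and field names, the example analyte and its unit, and the per-condition reading of the 5-case minimum are assumptions. It records the required items and reports which conditions have enough true positive cases to activate tools.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# From the text: tools can be activated with as few as 5 cases.
# Applying it per condition is our assumption.
MIN_CASES_FOR_TOOLS = 5


@dataclass
class Condition:
    short_name: str
    long_name: str
    snomed_code: Optional[str] = None  # only if applicable


@dataclass
class Analyte:
    short_name: str
    long_name: str
    analyte_type: str  # the category ("type") used to group analytes
    unit: str
    loinc_code: Optional[str] = None  # should match unit and specimen type


@dataclass
class StarterKit:
    short_name: str  # e.g. "POX"
    long_name: str   # e.g. "plasma very long chain fatty acids"
    curators: list[str]
    conditions: list[Condition]
    analytes: list[Analyte]
    # Each true positive case is a dict with a "condition" key plus analyte values.
    cases: list[dict] = field(default_factory=list)

    def missing_items(self) -> list[str]:
        """List starter-kit items that still need to be supplied."""
        missing = []
        if not self.curators:
            missing.append("curator")
        if not self.conditions:
            missing.append("conditions")
        if not self.analytes:
            missing.append("analytes")
        missing += [f"unit for {a.short_name}" for a in self.analytes if not a.unit]
        return missing

    def conditions_ready_for_tools(self) -> list[str]:
        """Short names of conditions with at least MIN_CASES_FOR_TOOLS cases."""
        counts = Counter(case["condition"] for case in self.cases)
        return [c.short_name for c in self.conditions
                if counts[c.short_name] >= MIN_CASES_FOR_TOOLS]


# Hypothetical application with one condition and one analyte.
kit = StarterKit(
    short_name="POX",
    long_name="plasma very long chain fatty acids",
    curators=["content expert"],
    conditions=[Condition("X-ALD", "X-linked adrenoleukodystrophy")],
    analytes=[Analyte("C26:0", "hexacosanoic acid", "fatty acid", "nmol/mL")],
    cases=[{"condition": "X-ALD", "C26:0": v} for v in (1.9, 2.1, 2.4, 3.0, 3.3)],
)
print(kit.missing_items())               # []
print(kit.conditions_ready_for_tools())  # ['X-ALD']
```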
A frequently asked question is whether there is a limit to the number of analytes that can be added to a single application.
While there is no set limit in the software, the reliability and speed of the tools when large numbers of analytes need to be processed were first tested using an application for cystic fibrosis (CF) mutation screening based on a mass spectrometric method. This model fits the 3 clinical questions described earlier well: determining whether a given case is a CF carrier or not, whether the case is a carrier or is affected, and which one or more mutations are present out of a total of 106 different alleles.
As shown on the left side of this slide, one single case amounts to 434 analytes. Each mutation is defined by a measure of the signal-to-noise ratio and by the peak heights of the wild type and mutant signals, as shown in this partial enlargement. Although not shown here, these results are used to calculate a total of 3776 ratios. It is therefore legitimate to ask whether the tool runner could process this amount of data on a routine basis.
Including controls, one routine batch contains 48 specimens and more than 20,000 results. CF mutation screening is a high-volume test that requires the processing of more than 1000 batches and more than 20 million results per year, not including the calculated ratios.
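The batch and annual volumes follow from the per-case figures quoted earlier. The Python sketch below reproduces the arithmetic using the lower bound of 1000 batches per year; extending it to the 3776 calculated ratios assumes, as an illustration only, that ratios are computed for every specimen in a batch, controls included.

```python
# Figures taken from the text.
ANALYTES_PER_CASE = 434
RATIOS_PER_CASE = 3776
SPECIMENS_PER_BATCH = 48  # including controls
BATCHES_PER_YEAR = 1000   # lower bound: "more than 1000 batches per year"

results_per_batch = ANALYTES_PER_CASE * SPECIMENS_PER_BATCH
results_per_year = results_per_batch * BATCHES_PER_YEAR
# Assumption: ratios are calculated for every specimen, controls included.
ratios_per_year = RATIOS_PER_CASE * SPECIMENS_PER_BATCH * BATCHES_PER_YEAR

print(f"{results_per_batch:,}")  # 20,832
print(f"{results_per_year:,}")   # 20,832,000
print(f"{ratios_per_year:,}")    # 181,248,000
```

Under that assumption, the calculated ratios outnumber the measured results almost ninefold, which is why the tool runner's throughput was worth testing.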
The tool runner was not challenged by this load and can process a batch in only a few seconds. Shown here is the all conditions tool for a case who is a carrier of the most common CF mutation, ΔF508.
This observation is promising, but it should be mentioned that there are other limitations of the R4S/CLIR software that still need to be resolved. Incomplete sets of data interfere with the calculation of ratios, negative values cannot be processed, and there are instances where test results are expressed only as a binary choice, positive or negative. Another frequent problem is the lack of measurable results to calculate reference ranges (when normal is equal to zero). Finally, values less than 1, and especially less than 0.1, should include a sufficient number of significant decimal digits to avoid artifactual clustering in the data display tools.
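A laboratory preparing data for CLIR 1.0 could screen raw results for the problems listed above before upload. The Python sketch below is a hypothetical pre-upload check, not part of the R4S/CLIR software; in particular, the minimum of 2 significant digits for values below 1 is our assumption, since the text asks only for a sufficient number.

```python
from decimal import Decimal, InvalidOperation

# Assumed threshold: the text only asks for "a sufficient number" of digits.
MIN_SIGNIFICANT_DIGITS = 2


def check_result(raw: str) -> list[str]:
    """Return the problems, if any, that a raw result string would cause."""
    text = raw.strip()
    if not text:
        return ["missing value"]  # incomplete data breaks ratio calculation
    if text.lower() in {"pos", "neg", "positive", "negative"}:
        return ["binary result"]
    try:
        value = Decimal(text)
    except InvalidOperation:
        return ["not a number"]
    if not value.is_finite():
        return ["not a number"]
    if value < 0:
        return ["negative value"]
    if value == 0:
        return ["zero value"]
    if value < 1:
        # Decimal stores no leading zeros, so len(digits) counts significant digits.
        if len(value.as_tuple().digits) < MIN_SIGNIFICANT_DIGITS:
            return ["too few significant digits"]
    return []


# Hypothetical raw results, as they might arrive in a .csv batch.
raw_batch = {"analyte_a": "0.05", "analyte_b": "-0.3", "analyte_c": "",
             "analyte_d": "negative", "analyte_e": "0.125"}
for name, raw in raw_batch.items():
    print(name, check_result(raw))
```

Parsing with `Decimal` rather than `float` keeps the digits exactly as reported, so a value entered as "0.050" is correctly counted as having 2 significant digits.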
These are just some of the reasons behind the decision to develop a second generation of the software that has been named CLIR 2.0.
This slide shows the current appearance of the plot by condition in CLIR 1.0 and a prototype of the same tool in CLIR 2.0. In addition to sharper graphic definition, users will be able to choose a color palette if they prefer not to use the tools' default colors, mostly red and green.
This slide is quite crowded, and yet it is only a partial list of the many improvements and new features that will be incorporated in the new version of the software. One worth mentioning is shown at the top of the list: in 2.0 it will be possible to generate on demand dual scatter plots for any 2 conditions a user wishes to compare in a given case. Currently, this tool requires extensive preliminary work behind the scenes to set up a pair of matching 2-condition tools and to merge them into a dedicated dual scatter plot, which also requires editing before it is released into production.
Besides improvements and new features, the most compelling reason to develop a new version of the software is the realization that 1.0 had become a sort of inverted pyramid: it started as a single application (MS/MS), then several other applications were added on top of the first one, all similar but with unique requirements that prompted a number of customized changes, and now potentially hundreds of CLIR applications are being developed. This progression is not sustainable without the risk of making the whole infrastructure unstable and prone to outages. This is why version 2.0 is needed: to create a broad and robust foundation able to support the workload and diversification of both R4S and CLIR.
In parallel to the coding of the new infrastructure, work will continue to explore the possibility of creating a CLIR application for every test we do, and to find the dedicated content experts who will assume the oversight and curation of data collection to establish reference range and condition-specific disease range percentiles, and the transfer of archived data to the appropriate CLIR application. At the same time, it is necessary to establish and implement a routine process to capture more data prospectively. This is the basis of the concept described earlier of constantly evolving, dynamic clinical validation. As the number of applications grows, it will become a necessity to train a larger group of super-users who are proficient in the use of the tool builder. Indeed, the ultimate goal is to create post-analytical tools for every target condition of a laboratory test.
The pursuit of this arguably ambitious plan could lead to significant outcomes: objective and quantifiable improvement of test performance, of test utilization, and ultimately of patient care. As already mentioned, quality assurance would rely on constant, rather than static, clinical validation. Tools could improve the consistency of interpretation among multiple laboratory directors, and also among physicians who alternate in covering either a section of the laboratory or a clinical service. Education will also benefit, as students, residents, and fellows will have immediate access to large bodies of objective evidence that will remain available to them once their training is completed and they move along in their professional careers. It is likely that this type of work will translate into greater academic visibility and productivity. CLIR applications could also offer an opportunity to all laboratory personnel, not just physicians and scientists, to learn new information management skills and to foster enduring professional satisfaction. Combined, these outcomes could lead to improved performance and also to expense reductions, for example by facilitating a systematic conversion to a paperless process. Last but certainly not least, the most far-reaching goal of version 2.0 of the CLIR software is to eliminate the need to establish age-matched reference ranges for test results, particularly in a pediatric population.
This work is the focus of an exciting collaboration we have recently established with 2 investigators from Oslo University Hospital, Drs. Lars Mørkrid and Alexander Rowe. They proposed replacing our static cumulative percentiles with data points converted to z-scores, an independent measure of deviation illustrated in a previous segment of this series, applied over an age continuum. This process also allows a statistically robust exclusion of outliers, when applicable.
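The Oslo group's exact method is not described here, so the following Python sketch swaps in a simple, commonly used stand-in: a log-linear regression of the analyte on age, with points beyond 3 standard deviations trimmed and the line refitted. It shows how a result can be converted to a z-score along a continuous age axis rather than within age bins. The function name, the log-linear form, the 3-SD limit, and the synthetic data are all assumptions.

```python
import numpy as np


def fit_age_model(ages, values, z_limit=3.0, max_rounds=5):
    """Fit log(value) = intercept + slope * age, trimming outliers iteratively.

    Points whose residual exceeds `z_limit` standard deviations are excluded
    and the line is refitted, for at most `max_rounds` rounds. Returns a
    function that converts an (age, value) pair into a z-score.
    """
    ages = np.asarray(ages, dtype=float)
    logs = np.log(np.asarray(values, dtype=float))
    keep = np.ones(ages.size, dtype=bool)
    for _ in range(max_rounds):
        slope, intercept = np.polyfit(ages[keep], logs[keep], 1)
        residuals = logs - (intercept + slope * ages)
        sd = residuals[keep].std(ddof=2)  # two fitted parameters
        new_keep = np.abs(residuals) <= z_limit * sd
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep

    def z_score(age, value):
        return float((np.log(value) - (intercept + slope * age)) / sd)

    return z_score


# Synthetic demo: a log-linear age trend from 1 to 10 years plus one gross outlier.
rng = np.random.default_rng(0)
ages = np.linspace(1, 10, 200)
logs = 1.0 + 0.1 * ages + rng.normal(0, 0.1, ages.size)
logs[0] += 2.0  # outlier, excluded after the first round
z = fit_age_model(ages, np.exp(logs))
print(round(z(5.0, np.exp(1.5)), 2))  # close to 0: a typical value at age 5
```

A straight line in log space is the crudest possible age model; a production approach would likely use a more flexible curve, but the conversion from result to z-score works the same way.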
The expected end product of this new line of research is the creation, at the front end of the CLIR software, of the equivalent of “growth charts” for every analyte under consideration, and the expression of a patient result (shown here for a patient between 1 and 10 years of age as a white circle between the 90th and 97.5th percentiles) in a manner that is free of arbitrary age “bins” and, like R4S and CLIR, free of equally arbitrary cutoff values.
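Placing a z-score on a growth-chart style display amounts to converting it to a percentile and finding the neighboring chart lines. The Python sketch below assumes that the age-adjusted z-scores are approximately standard normal; apart from the 90th and 97.5th percentiles mentioned above, the set of chart lines is a hypothetical choice.

```python
from statistics import NormalDist
from typing import Optional

# Chart lines in percent; only 90 and 97.5 come from the text.
CHART_LINES = [2.5, 10, 25, 50, 75, 90, 97.5]


def percentile_band(z: float) -> tuple[Optional[float], Optional[float]]:
    """Return the chart lines just below and above a result's z-score.

    Assumes z-scores follow a standard normal distribution.
    """
    pct = 100 * NormalDist().cdf(z)
    lower = max((p for p in CHART_LINES if p <= pct), default=None)
    upper = min((p for p in CHART_LINES if p > pct), default=None)
    return lower, upper


print(percentile_band(1.5))  # (90, 97.5)
```

A z-score of 1.5 corresponds to roughly the 93rd percentile, so the result is drawn between the 90th and 97.5th percentile lines, as in the example on the slide.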
This animated slide should lead to a better appreciation of the evolution of the R4S/CLIR software. The starting point is the status quo: age-matched reference ranges and arbitrary cutoff values. The first step was the systematic adoption of reference percentiles and condition-specific disease ranges, also calculated as percentiles. Once these ranges were established with adequate power, it was possible to make the conventional approach obsolete and to define cutoff target ranges. The limitations of the target ranges, and their less than ideal overall utilization, led us to focus on the degree of overlap between reference and disease ranges, an overlap that could be anywhere between substantial and nonexistent. Although very effective in newborn screening practice, this approach is not corrected for age and needs once again to be replaced, this time by the definition of z-score percentiles.
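The degree of overlap between a reference range and a disease range can be illustrated with a small Python sketch. It assumes, for illustration only, that both ranges span the 1st to 99th percentiles of their respective populations; the function names and the synthetic values are hypothetical and do not reproduce the R4S definitions.

```python
from statistics import quantiles
from typing import Optional


def percentile_range(values, low=1, high=99):
    """Return the (low, high) percentile limits of `values`."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return cuts[low - 1], cuts[high - 1]


def range_overlap(reference, disease) -> Optional[tuple[float, float]]:
    """Interval shared by the reference and disease ranges, or None if disjoint."""
    ref_low, ref_high = percentile_range(reference)
    dis_low, dis_high = percentile_range(disease)
    start, end = max(ref_low, dis_low), min(ref_high, dis_high)
    return (start, end) if start < end else None


reference = [float(x) for x in range(1, 101)]     # hypothetical reference values
separate = [float(x) for x in range(150, 251)]    # disease range clear of reference
overlapping = [float(x) for x in range(80, 181)]  # disease range that overlaps
print(range_overlap(reference, separate))     # None
print(range_overlap(reference, overlapping))  # (81.0, 99.01)
```

When the function returns `None`, a single marker separates the two populations; a wide shared interval is the situation in which ratios or multi-marker tools become necessary.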
This slide shows how the current software would be forced to look at patients with the same condition, but clustered according to age. Moreover, in the first generation of interpretive tools the focus of data analysis had been on the range LIMITS, and the consequent estimate of the degree of overlap.
In the next generation of tools, the focus will be placed on the degree of dispersion of the data within a single continuous age range. In addition to the significant advantage of eliminating the need to establish separate applications for different age ranges, preliminary evidence has shown that reference and disease percentile ranges will be segregated more effectively, leading to more sensitive and specific tools.
Age is the most obvious, but just one, of the covariates that the 2.0 software will be able to incorporate. This figure again shows the work of our colleagues from Oslo University Hospital, who were granted retrospective access to approximately 90,000 newborn screening results for the marker 17-OH progesterone. Correction for 2 additional covariates, birth weight and gender, in addition to age at collection, resulted in a very promising distribution of values that could lead to the selection of a single threshold throughout the continuum of birth weight values. Variation with birth weight has been the root cause of the historically poor performance of newborn screening for congenital adrenal hyperplasia.
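Correcting for several covariates at once can be sketched as a multiple linear regression of the log-transformed marker on the covariates, with each result expressed as a z-score of its residual. This Python sketch is not the Oslo group's model: the log-linear form, the covariate coding, and the synthetic coefficients are assumptions chosen only to show that two newborns with very different raw values can both fall near z = 0 once birth weight is taken into account.

```python
import numpy as np


def fit_covariate_model(covariates, values):
    """Least-squares fit of log(value) on covariates; returns a z-score function.

    `covariates` has one row per newborn and one column per covariate,
    here age at collection (hours), birth weight (g), and sex coded 0/1.
    """
    covariates = np.asarray(covariates, dtype=float)
    logs = np.log(np.asarray(values, dtype=float))
    design = np.column_stack([np.ones(logs.size), covariates])  # intercept column
    coef, *_ = np.linalg.lstsq(design, logs, rcond=None)
    residuals = logs - design @ coef
    sd = residuals.std(ddof=design.shape[1])  # one degree of freedom per coefficient

    def z_score(row, value):
        expected = coef[0] + np.asarray(row, dtype=float) @ coef[1:]
        return float((np.log(value) - expected) / sd)

    return z_score


# Synthetic newborns: the marker falls with birth weight (hypothetical coefficients).
rng = np.random.default_rng(1)
n = 500
age_h = rng.uniform(24, 72, n)
weight_g = rng.uniform(500, 4500, n)
sex = rng.integers(0, 2, n)
logs = 3.0 - 0.0005 * weight_g + 0.005 * age_h + 0.1 * sex + rng.normal(0, 0.2, n)
z = fit_covariate_model(np.column_stack([age_h, weight_g, sex]), np.exp(logs))

# Two raw values about 4.5-fold apart, each typical for its birth weight.
preterm = z([48, 1000, 1], np.exp(3.0 - 0.5 + 0.24 + 0.1))
term = z([48, 4000, 1], np.exp(3.0 - 2.0 + 0.24 + 0.1))
print(round(preterm, 2), round(term, 2))  # both close to 0
```

Because both results land near z = 0, a single threshold on the z-score scale can apply across birth weights, whereas a single threshold on the raw values would flag the preterm result.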
In summary, CLIR 2.0 will be capable of providing front-end correction of laboratory results for multiple covariates, will continue to foster worldwide collaboration and data sharing, and will enhance existing high-throughput data portals for batch data submission to the tool runner.
In the end, a virtually unlimited number of web-based, always up-to-date, and on-demand post-analytical tools will become available to a broad spectrum of users in clinical but also research practices.
This is the conclusion of part VI of the R4S series of Mayo Medical Laboratories Hot Topics, and also of the entire series.
I would like to acknowledge several individuals who have contributed to the work presented in this series, beginning with the team of programmers and code developers, particularly David McHugh and Gregg Marquardt. The importance of the contributions by our Norwegian colleagues Lars Mørkrid and Alex Rowe cannot be overstated. I also want to recognize my BGL colleagues, laboratory directors, current fellows and genetic counselors who have spent countless hours populating and testing applications as they became available. Many other individuals in BGL have also contributed, just too many to mention here.
Finally, I would like to thank the MCSI scientific and technical publication team led by Denise Masoner for their outstanding support, and patience, during the recording of this series.
Please do not hesitate to contact us if you have any questions or requests related to the content of this presentation. Of course, we will be happy to provide a password to R4S to any interested new users; just send a request to the email address firstname.lastname@example.org. Thank you very much for your attention.