The AAPS Journal (© 2016) DOI: 10.1208/s12248-016-9946-6
Meeting Report

Workshop Report: Crystal City VI—Bioanalytical Method Validation for Biomarkers

Mark E. Arnold,1,5 Brian Booth,2 Lindsay King,3 and Chad Ray4

Received 11 April 2016; accepted 5 June 2016

Abstract.
With the growing focus on translational research and the use of biomarkers to drive drug development and approvals, biomarkers have become a significant area of research within the pharmaceutical industry. However, until the US Food and Drug Administration's (FDA) 2013 draft guidance on bioanalytical method validation included consideration of biomarker assays using LC-MS and LBA, those assays were created, validated, and used without performance standards. This lack of expectations resulted in the FDA receiving data of varying quality in support of efficacy and safety claims. The AAPS Crystal City VI (CC VI) Workshop in 2015 was held as the first forum for industry-FDA discussion of both the general issues of biomarker measurements (e.g., endogenous levels) and the strengths and weaknesses of specific technologies. The 2-day workshop served to develop a common understanding among the industrial scientific community of the issues around biomarkers, informed the FDA of the current state of the science, and will serve as a basis for further dialogue as experience with biomarkers expands within both groups.
The views expressed are those of the authors and do not reflect official policy of the FDA. No official endorsement by the FDA is intended or should be inferred.

1 Bioanalytical Solution Integration, Burlington, New Jersey, USA.
2 U.S. Food and Drug Administration, Silver Spring, Maryland, USA.
3 Pfizer Inc., Groton, Connecticut, USA.
4 Pfizer Inc., La Jolla, California, USA.
5 To whom correspondence should be addressed (e-mail: [email protected]).

1550-7416/16/0000-0001/0 © 2016 American Association of Pharmaceutical Scientists

INTRODUCTION

In 2013, the FDA released its revised Bioanalytical Method Validation (BMV) draft guidance (1) containing a section on biomarker validation. As this was the first time that biomarkers had been included in a method validation guidance document, there were many implications and a number of questions ensued. To further explore the intent and questions related to biomarker method validation, the AAPS sponsored a 2-day workshop, Crystal City VI, with industry and FDA participants dedicated to the topic; this publication presents the official proceedings of the workshop. The goals of this workshop were to explore the analytical challenges of biomarker assays conducted using ligand binding assays (LBA) and/or liquid chromatography-mass spectrometry (LC-MS) in the context of the 2013 FDA draft BMV guidance and the discussions at the 2013 Crystal City V meeting (2).

The discussion focused on the issues associated with the conduct of biomarker assay validation relative to the validation of pharmacokinetic (PK) methods described in the guidance. These discussions encompassed when the data were to be used for drug registration, various scientific perspectives, experiences, and areas that require additional clarity. Through three sessions, the meeting addressed universal biomarker issues (day 1 morning) and issues specific to liquid chromatography, including mass spectrometry (day 1 afternoon), or to ligand binding (day 2 morning). The conference, held in September 2015 in Baltimore, MD, used a format that combined presentations and panel discussions with significant audience participation. The speakers and participants included biopharmaceutical industry, contract research, clinical laboratory, and government scientists. Discussion underscored the extensive experience with the measurement of endogenous biomarkers within the clinical chemistry community, and the drug development industry recognized the need to exploit that knowledge. Specifically, a better understanding of the approaches used to analytically validate a biomarker assay used to make a clinical decision, and of the methods used to control such assays over time, both within and between labs, will help evolve current practices.

In preparation for the workshop, the organizers identified some key challenges that differentiate biomarker assays from drug assays. A series of pre-conference meetings with the panelists and speakers drew from both the drug development industry, with PK expertise
and experience with biomarkers, and expertise from clinical chemistry to begin defining biomarker-relevant quality attributes. Central to those discussions, and to those at the workshop, was the need to understand the biological context of the biomarker and disease in order to establish an appropriate assay. The meeting started with an overview of the relationship between biomarkers and regulatory decision-making. The FDA representatives shared their perspectives and examples of biomarker applications that influenced the decision to approve or not approve a therapeutic. These opening examples highlighted the importance of assay validation testing in the context of biomarkers that serve as primary or secondary endpoints. Biomarkers are applied from early decision-making in preclinical animal models through phase IV post-marketing analysis. Biomarker results are also included in the regulatory dossier to support both efficacy and safety claims. As a result, the integrity of the results must be verified through careful evaluation of the analytical performance. The challenge for biomarker scientists is to develop a validation strategy that covers the breadth of the analytical science (metabolites, nucleic acids, proteins, cells, and tissues) and factors in the biology of the intended use. The focus of this meeting was on protein and metabolite biomarkers measured by LBA and LC-MS assays, excluding flow cytometry; however, some of the general concepts can be applied to nucleic acid, cell, and tissue analysis. In contrast to previous Crystal City meetings (I–V), held between 1990 and 2013 (2–6), that focused on PK assays, there was a high degree of alignment among practitioners of all technology platforms about the challenges, if not the solutions, of biomarker assays. The rationale for additional regulatory guidance is based on the use of biomarkers that the agency has witnessed within filings.
The quality and relevance of the biomarker data have not always met expectations. To provide a framework for how the agency views the bioanalytical method and the resulting concentration data, five questions were posed that scientists need to consider:

1. What is the purpose of this assay?
2. Am I measuring what I think I am measuring?
3. How much variability/error is in the measurement?
4. What are the limits of these measurements?
5. How do handling conditions affect the measurement?
While these questions are common to PK assays, presentations and discussions during the remainder of the conference provided insight into how biomarkers differ from small-molecule drugs; developing assays for their measurement will require some rethinking and novel approaches. Another crucial question, central to these considerations, emerged from the discussion: "How much biological variability is there for this biomarker?" The importance of understanding the biomarker biology was a central theme during all technology discussions. The scope of the workshop also needed to be clarified as specific to drug development and separate from medical practice and patient care, where biomarker analysis is covered under the Clinical Laboratory Improvement Amendments (CLIA) and guided by Clinical and Laboratory Standards Institute (CLSI) standards. However, throughout the workshop, experience from clinical biomarker measurement was presented and discussed to
explore how this knowledge could be exploited in drug development. In developing a PK assay, the intent and analyte are well understood, whereas with biomarkers, the complete biology is frequently more complex and not always known. A bioanalytical scientist must understand the biology as much as possible to help define the characteristics of the assay. Combined, the biology and the essential performance characteristics will permit the scientist to develop the method and the validation experiments that demonstrate the assay is able to assess the biomarker in the context of the question being posed. Questions such as "How much precision, sensitivity, range, and stability are needed?", "How large a difference is there between healthy and patient values?", "What is the variability within these populations?", "What are the expected changes with therapy?", and "What are the physicochemical properties of the biomarker (e.g., isoforms and protein families)?" are among those that need to be considered. There was broad consensus that during drug development there is work done in discovery and early clinical phases where the value of a biomarker in evaluating the disease state is unknown or unproven, but that as more information accumulates, the biomarker may come to clearly support the labeling claims intended for the drug. To frame the discussion, the biomarker quantification continuum, originally described by Lee et al. (7) and later expanded by Cummings et al. (8), was presented. The majority of the attention was placed on relative quantitative and definitive quantitative assays. Definitive quantitative assays were described as having the following characteristics: a continuous numerical scale, inclusion of a reference material that represents an exact copy of the endogenous material, a higher-order reference method, and commutability.
In the field of protein biomarkers, only 22 proteins have achieved certified reference material status with the Joint Committee for Traceability in Laboratory Medicine (JCTLM). With this in mind, the two segments of the drug development paradigm are those proposed at Crystal City V as requiring category 1 and category 2 assays. Crystal City VI endorsed the positions that category 1 assays are those used for internal decision-making and that the extent of their validation is the purview of each company to decide. In comparison, for category 2 assays, those supporting a labeling claim, it was agreed that a more extensive and consistent set of validation procedures would be required. Those procedures may use practices typical of PK assay validation but are also expected to include experimental testing that addresses the unique aspects of biomarker assays. Considerations of these aspects have previously been reported (9,10).

BIOMARKER REFERENCE MATERIAL

Reference material for biomarkers differs from reference material (standard) for PK assays in a number of ways, and this has significant implications for biomarker assays. PK reference material is very highly characterized, with very tight control over lot changes, typically by a single group that provides concentration value assignment and traceability to reference measurement procedures and/or reference materials of higher order, under GMP conditions. In contrast,
biomarker reference material is often obtained from external vendors, its characterization and concentration value assignment may not be well defined, and over time there may be lot-to-lot differences. Moreover, there may be multiple sources of reference material, and these may be very different from each other. Finally, for PK assays, the reference material is often identical to (even the same lot as) what was initially dosed into a subject, although it may change over time in the body after dosing, whereas the endogenous form of a biomarker may not be identical to the biomarker reference material. The biology of the biomarker may not be fully understood, and there may be significant heterogeneity in the endogenous forms. For small-molecule biomarkers and smaller peptides, this can often be addressed, but it is a particular concern for protein biomarkers. Thus, there are uncertainties associated with both the absolute concentration of the calibrator and its biological similarity to the endogenous analyte, which have implications for the ability to obtain absolute accuracy. The concepts of commutability and reference method standardization were introduced from clinical chemistry practice. These concepts address the setting in which multiple different assays, often with different reagents and reference materials, may be used in different labs to measure the same analyte, with the data used to make medical decisions about patient treatment. Therefore, all of these assays/labs need to provide the same result in order to provide consistent, effective patient treatment; i.e., the same decision would be made in lab A with test A as in lab B with test B. Commutability of reference materials simply means they are interchangeable. However, commutability must be demonstrated, which requires considerable effort.
Many cases remain imperfect, which has led to efforts to standardize and/or harmonize the use of reference material (11,12). The majority of the biomarker assays and reference materials used in industry have not been assessed for commutability. The terms and concepts of these harmonization practices have been developed by the International Federation of Clinical Chemistry and other metrology organizations around the world. Since the majority of biomarkers lack certified reference material, a different approach is needed in drug development. Moreover, commutability has not been demonstrated for the majority of commercial biomarker assays and reference materials. As noted above, the exact nature of the reference material as compared to the endogenous form(s) was extensively discussed. Over the course of drug development, the sources for the biomarker may change, with corresponding changes in the quality and purity of the material. It is, therefore, essential to utilize well-characterized reference materials and, in transitioning between lots or suppliers, to evaluate the impact on the assay. Hence, it is necessary to develop a strategy to assess commutability in drug development. For category 2 assays, a change in lot or source could create difficulties in comparing data within a study or across studies being included within a population model. Appropriate bridging must be performed between the different reference materials to understand and limit the differences. Other approaches to within- and across-study comparison, including the use of longitudinal quality control samples (QCs), are discussed later.
SURROGATE MATRIX AND CALIBRATION CURVES

Surrogate matrices are widely used for calibrators since, for most biomarker assays, the native matrix virtually always contains the biomarker analyte and a blank matrix is nonexistent. There was general agreement that this approach can be scientifically valid for both category 1 and category 2 assays. No specific recommendations were made on tests for surrogate matrix use because approaches to surrogate matrices range from the complex (the same matrix from a different species) to the simple (phosphate-buffered saline). However, whatever surrogate matrix is used, the potential differences in the assay's ability to measure analyte relative to that in the patient population's matrix must be assessed, and the extent of dissimilarity to the native matrix may be a guide to the extent of testing required (i.e., the more dissimilar, the more testing would be expected). Presentations at the workshop provided a number of examples of surrogate matrix approaches for calibration standards. They included spiked calibrators in the surrogate matrix; spiked calibrators for the lower limit of quantification (LLOQ) and upper limit of quantification (ULOQ) samples, with either endogenous pools or mixtures of endogenous matrix and surrogate for the concentrations in between; or supplemental spiking of endogenous pools to achieve specific concentrations. While spiking of the surrogate matrix to achieve specific concentrations is the most likely to be implemented, in all cases the calibration curve, in whatever matrix is used, must be demonstrated to behave similarly to the endogenous biomarker in native matrix. This is normally performed with a parallelism experiment (discussed later), and it should be noted that approaches to parallelism may differ between LBA and LC-MS.
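Calibration curves for LBAs are commonly fitted with a model such as the four-parameter logistic (4PL), and sample concentrations are then back-calculated by inverting the fit. A minimal sketch of that arithmetic follows; the function names and parameter values are illustrative only, not a recommendation from the workshop:

```python
def four_pl(x, a, b, c, d):
    """4PL response: a = response at zero analyte, d = response at
    infinite analyte, c = inflection concentration, b = slope factor."""
    return d + (a - d) / (1.0 + (x / c) ** b)

def back_calculate(y, a, b, c, d):
    """Invert the 4PL to recover a concentration from an observed response."""
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)

# Illustrative (assumed) curve parameters
params = dict(a=0.05, b=1.2, c=100.0, d=3.0)
response = four_pl(50.0, **params)
assert abs(back_calculate(response, **params) - 50.0) < 1e-6
```

In practice, the fitted parameters would come from calibrators prepared in the surrogate matrix, which is why the parallelism experiments described later are needed to show that samples in native matrix behave like the calibrators.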
It was agreed that the concentrations back-calculated against the calibration curve would be reported in most cases and that these concentrations are reliable only as referenced to the reference material used to prepare the calibrators (i.e., they are relative measures). Thus, changes in reference material source or lot may produce different results for the same sample. This distinction from PK assays requires that the bioanalytical scientist not only understand the relationship but also perform the testing needed to express and explain it to both health authorities and the drug development teams utilizing the data. There was a discussion, particularly for LBAs, of the different ways to determine sensitivity for biomarker assays. These included defining the LLOQ through parallelism experiments with samples containing only endogenous biomarker (the common concentration method with n = 20), the use of the lower limit of detection, and the use of traditional PK spiked QCs. For LBAs, samples are typically diluted with an assay buffer to manage matrix interference, and for PK assays this often requires a large (1:20–1:40) minimal required dilution (MRD). The selection of the dilution and diluent is tested through the evaluation of individual samples from the appropriate normal and/or patient population, unspiked and spiked at the LLOQ and high QC (HQC) levels. For biomarker assays where sensitivity is a key concern, these dilutions are often very small (1:2) or even absent. When the calibrators are prepared in a surrogate matrix, this matrix is then used to dilute samples, but the best approach to assess the effectiveness of
this dilution to manage matrix effects is through parallelism experiments, as described later. This was recognized as an area for future discussion, but importantly, it also illustrated that more than one approach could be scientifically valid.

QUALITY CONTROL SAMPLES

The discussion on QCs showed strong agreement on several topics but left flexibility for the different approaches and situations that were mentioned. Except where the endogenous biomarker concentrations are low, the recommendation is to use endogenous pools for as many QC levels as possible. One widely used approach in the clinical chemistry setting is the use of large pools derived from either healthy individuals or patients. These may provide low and high concentrations that, through admixing schemes (13), yield concentrations covering the expected study sample concentrations. Creation of some pools (e.g., a low QC) may require diluting an endogenous pool from a higher concentration with surrogate matrix. Enrichment spiking of additional biomarker into pools of endogenous matrix is another, although least preferred, option; this position is primarily based on protein biomarkers, where the spiked reference material may not be identical to the endogenous form. Consideration should be given to each preparation and its similarity to the expected study samples. In all cases, QC pool concentrations should cover the range of expected concentrations in study samples. Throughout the conference, there was an emphasis on the importance of utilizing authentic patient samples to address reliability. In this context, these are not samples from the clinical trial the assay is intended to support but rather samples collected from the same patient population as the intended trial and/or from normal subjects, ideally with a range of biomarker levels.
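The admixing scheme mentioned above is simple mixing arithmetic: a target QC concentration between a low and a high endogenous pool determines the volume fractions of each pool. A sketch, with illustrative pool concentrations:

```python
def admix_fractions(c_low, c_high, c_target):
    """Volume fractions of a low and a high endogenous pool that, when
    mixed, give a QC at c_target (requires c_low <= c_target <= c_high)."""
    if not c_low <= c_target <= c_high:
        raise ValueError("target must lie between the pool concentrations")
    f_low = (c_high - c_target) / (c_high - c_low)
    return f_low, 1.0 - f_low

# Example: pools at 5 and 50 ng/mL, mid-QC target of 20 ng/mL (assumed values)
f_low, f_high = admix_fractions(5.0, 50.0, 20.0)
assert abs(f_low * 5.0 + f_high * 50.0 - 20.0) < 1e-9
```

The same relation can be rearranged to check what concentration an existing mixture should produce before value assignment by repeated measurement.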
Analyte concentrations in endogenous samples or pools are determined by repeated measurement, and a value is assigned to them. Aliquots are frozen and have multiple uses, including run acceptance, qualification of new kit lots, and service as critical reagents. They can also be used to accept new lots of calibrators, although other strategies using a set of individual endogenous samples can also be used, as discussed previously. QCs based solely on endogenous samples or pools can be used to control assay drift in the absence of a fixed gold standard, but this remains a risk as one lot is transitioned to the next. The use of conventional control charts, as employed in clinical chemistry labs, to track assay performance and lot changes over time would be important for long-term studies. In clinical practice, there is often a need to harmonize across labs by standardizing to a gold standard calibrator, so that an individual result for a given analyte from one lab or brand of test would be the same as that from another, ensuring that a patient would get the same diagnosis irrespective of the lab or test used. Each lab would determine a normal range in this context. By contrast, in drug development, even for category 1 assays, there would only be a need to standardize if the assay was run at multiple sites.
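Control charting of a value-assigned QC pool can be as simple as flagging each new result against limits derived from historical runs. A minimal Levey-Jennings-style sketch follows; the 2 SD warning and 3 SD rejection thresholds mirror the common 1-2s/1-3s Westgard conventions, and all data are illustrative:

```python
def levey_jennings_flags(history, new_values):
    """Flag QC results against mean +/- 2 SD ("warn") and +/- 3 SD
    ("reject") limits computed from historical in-control runs."""
    n = len(history)
    mean = sum(history) / n
    sd = (sum((x - mean) ** 2 for x in history) / (n - 1)) ** 0.5
    flags = []
    for v in new_values:
        z = abs(v - mean) / sd
        flags.append((v, "reject" if z > 3 else "warn" if z > 2 else "ok"))
    return flags

history = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 9.9]
flags = levey_jennings_flags(history, [10.1, 10.5, 11.0])
```

Multirule schemes used in clinical chemistry add run-to-run rules (e.g., repeated 2 SD excursions on the same side), but the single-value checks above already expose drift across a lot change.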
PRECISION

Assay performance cannot be assessed without understanding the precision of a biomarker assay. There was broad agreement that for biomarker assays, where measurement of changes in biomarker levels over time is often the main goal, the assay needs to achieve a level of precision based on the biology and the amount of change expected. Thus, biomarkers with dramatic differences between healthy individuals and patients, or where the therapy will cause a large change, would not be expected to require the same level of precision as situations in which there is higher inter-subject variability and changes need to be compared to a pre-therapy measurement. Measurement of precision typically starts with spiked calibrators, and endogenous samples from the target population can also be measured repeatedly to determine precision and assign values (means and variance between and within subjects and/or pools relative to the calibrator). A proven strategy in the clinical diagnostic community, discussed at the workshop, is the concept of total allowable error and total analytical error. The total allowable error concept is simple: it uses the biology, or the clinical question to be answered, as the guide for the amount of error that the system can tolerate. Total allowable error was described by Westgard and Groth (14) and is a measure of both imprecision and bias. The intra-subject variability (CVi), composed of both analytical and biological variance, can be used as a guide for the precision required to meet the purpose of the biomarker assay; the common recommendation for analytical imprecision in clinical chemistry applications is no more than 0.5 × CVi. An additional concept is total analytical error, which captures both the random and systematic components of the error. Careful consideration must be taken in planning an appropriate analysis of variance study that incorporates all of the sources of analytical variation.
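The total error concept reduces to a one-line calculation: observed bias plus a multiple of the imprecision, compared against the total allowable error budget. A sketch, in which the z = 1.65 one-sided 95% multiplier and the example percentages are illustrative assumptions:

```python
def total_analytical_error(bias_pct, cv_pct, z=1.65):
    """Westgard-style point estimate of total analytical error (%):
    absolute bias plus z times the imprecision (CV)."""
    return abs(bias_pct) + z * cv_pct

def meets_budget(bias_pct, cv_pct, allowable_pct, z=1.65):
    """True when the estimated total error fits within the total
    allowable error derived from the biology or clinical question."""
    return total_analytical_error(bias_pct, cv_pct, z) <= allowable_pct

# An assay with 5% bias and 4% CV against a 15% total allowable error budget
assert meets_budget(5.0, 4.0, 15.0)      # 5 + 6.6 = 11.6% <= 15%
assert not meets_budget(10.0, 5.0, 15.0)  # 10 + 8.25 = 18.25% > 15%
```

The key design decision is where the allowable budget comes from: the biology (e.g., intra-subject variability) rather than a fixed regulatory number.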
While this approach may require additional effort, it is expected to be a more scientifically appropriate way to define analytical performance criteria and to be as good as, or better than, the current PK guidance and a fixed 4-6-X rule.

ACCURACY

There was considerable debate on the ability of a biomarker assay to demonstrate accuracy in the way a PK assay does. A key component of this conversation was the differentiation of absolute accuracy from relative accuracy and their relevance to biomarker measurement. Positions ranged from "accuracy is a must," through "absolute accuracy is possible only in a few cases" (e.g., small-molecule or peptide biomarkers, or the use of stable-label analogue reference standards in LC-MS), to "it should not be considered in any situation." Overall, the presentations and discussion revisited the importance of accuracy vs. precision for biomarker assays in the context of intended use, the nature of the reference material available, and the emphasis on endogenous analyte. While not resolved, at a minimum, relative accuracy assessments would still be expected as part of assay validation. This is an area of future discussion within the industry.
PARALLELISM

For both LBA and LC-MS assays, parallelism was recognized as a critical experiment for determining the validity of surrogate matrices, determining the MRD, establishing immunological similarity for LBAs, and establishing limits of quantification. The use of this approach to address matrix effects, vs. diluting a high sample into range, vs. defining similarity, has been a source of confusion. Dilution of samples with the same matrix as the calibrators to a point where proportional, dilution-corrected recovery is achieved indicates that non-specific matrix effects, if any, have been eliminated and provides confidence that the calibrator surrogate matrix is appropriate for sample analysis. For LBA assays, the smallest dilution that commonly achieves this is often selected as the MRD. Immunological similarity between the calibrator and the endogenous form(s) of the biomarker can be established when dilutions of samples are parallel to the calibration curve. This can be assessed by calculating the back-calculated, dilution-corrected recovery for each measurable dilution and determining whether the results agree, within the error of the assay, with the value observed at the MRD. Multiple samples with different concentrations of analyte should be measured. When parallelism is achieved, this provides high confidence that the analyte and calibrator are immunologically similar with respect to the assay reagents and, thus, that the assay is measuring what it is intended to, but it does not mean they are identical. While this concept is increasingly well understood and there is agreement that these are critical validation experiments, its application can be complex, requires scientific judgment, and does not lend itself to a simple rule set. These issues are also applicable to LC-MS assays using immunocapture procedures as part of sample preparation.
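The dilution-corrected recovery check described above is straightforward arithmetic. In this sketch, the 20% tolerance and the convention of referencing recovery to the MRD result are illustrative assumptions, not workshop recommendations:

```python
def dilution_corrected_recovery(measured, dilution_factor, reference):
    """% recovery of a diluted sample vs its reference (e.g., MRD) value,
    after correcting the measured concentration for the dilution."""
    return 100.0 * (measured * dilution_factor) / reference

def is_parallel(series, reference, tolerance_pct=20.0):
    """series: list of (dilution_factor, measured_conc) pairs. Parallelism
    is declared when every dilution-corrected result falls within the
    assay's error tolerance of the reference value."""
    return all(
        abs(dilution_corrected_recovery(m, d, reference) - 100.0) <= tolerance_pct
        for d, m in series
    )

# Sample reading 200 at the MRD; 1:2, 1:4, 1:8 dilutions read 98, 52, 24
assert is_parallel([(2, 98.0), (4, 52.0), (8, 24.0)], reference=200.0)
```

A sample whose corrected recovery drifts systematically with dilution (e.g., a 1:2 dilution reading 70 against the same reference) would fail this check, signaling matrix effects or dissimilarity between calibrator and endogenous analyte.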
There was discussion regarding how many samples would need to be measured, what acceptance criteria would be applied, what to do when samples with high-enough levels to measure after multiple dilutions are not available, and what statistical approaches should or could be used. While well-established statistical approaches exist for the latter in the context of potency assays, overall, this will be an important area for future discussion. For LC-MS assays without an immunocapture step, parallelism is more focused on the impact of binding partners for the biomarker and on the influence of other endogenous components, vs. their absence in the surrogate matrix, on the chromatography and detection, which in the case of LC-MS is the ionization efficiency in the source. Where the endogenous pool concentrations are low, standard addition may be used for the endogenous samples, but the considerations previously discussed for the reference material must be taken into account. Non-parallelism between the surrogate matrix calibrators and samples diluted over the range of the assay indicates a problem requiring remediation.

STABILITY

Understanding the stability of the biomarker from sample collection, through post-collection processing, transport and storage, sample preparation, and analysis is essential to support the reliability that can be placed in the biomarker measurement. Since biomarkers are by definition
endogenous compounds involved in biological processes, a number of endogenous enzymes and binding partners may exist. Therefore, the stability at collection and during post-collection processing leading up to freezing must be explored and understood. It was noted that this can be challenging for organizations that do not have ready access to freshly drawn samples from healthy individuals, and challenging for most organizations when considering patient population samples. When looking at subsequent stability measurements (e.g., room temperature, long-term storage, and freeze-thaw), differences between individuals and between healthy and patient populations suggest stability should be assessed using multiple sources and not just a single pool. While there may be challenges in assessing long-term stability in individual lots of matrix, pools may mask individual variability, and many diseases are so heterogeneous across individuals and longitudinally that individual samples should be considered initially. Stability must also be considered in the context of what form of the analyte is being measured and how much is known about its heterogeneity, metabolism/catabolism, and biotransformation. For example, an LBA that measures a bioactive molecule may be designed with reagents that recognize specific epitopes at either end of the molecule, including a portion of the bioactive region; changes to the molecule outside of these epitopes, unless they result in conformational changes, may not be detected. Typically, well before a biomarker assay is considered for category 2 applications, pre-analytical (sample collection) stability data and a sufficient understanding of the biology of the biomarker with regard to what forms are present should ideally be available to inform assay design. In contrast to PK QC stability, determinations of biomarker stability must use the first day of analysis (T0) as the reference against which subsequent measures are assessed.
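Expressing each stability time point as a percentage of the T0 mean, using replicate measurements, can be sketched as follows; the 20% acceptance window and the replicate values are illustrative assumptions:

```python
def percent_of_t0(t0_replicates, later_replicates):
    """Mean stability-sample result expressed as % of the day-0 (T0) mean."""
    mean = lambda xs: sum(xs) / len(xs)
    return 100.0 * mean(later_replicates) / mean(t0_replicates)

def is_stable(t0_replicates, later_replicates, tolerance_pct=20.0):
    """True when the later time point recovers the T0 value within the
    stated tolerance (a placeholder acceptance window, not a rule)."""
    return abs(percent_of_t0(t0_replicates, later_replicates) - 100.0) <= tolerance_pct

# Three replicates at T0 and after storage (illustrative concentrations)
assert is_stable([100.0, 102.0, 98.0], [90.0, 92.0, 88.0])   # ~90% of T0
```

Using more replicates per time point narrows the confidence in each mean, which is the point of the replicate discussion that follows.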
While no specific number of replicates was identified, three would seem to be a minimum, and five would provide a better understanding of the variability of each measurement.

SPECIFICITY AND INTERFERENCES

Assay specificity in LC-MS assays relies predominantly on the mass spectrometer measuring the whole molecule, or a portion of it (e.g., a tryptic peptide), by its specific molecular weight and unique fragments. For small-molecule and peptide biomarkers, traditional extraction approaches work well to provide clean extracts suitable for sensitive LC-MS assays. Understanding the biology cannot be overlooked, as isobaric forms are commonly present (e.g., lipid biomarkers), and the LC can provide separation to augment the specificity of the mass spectrometric detection. When using an immunocapture step, either for the whole protein or for a peptide from an enzymatic digestion, capture of other proteins or peptides is typically not a problem, as the LC may provide separation (one level of specificity) on top of the mass spectrometer measuring only the protein/peptide of interest. Critical within this process is the selection of the surrogate peptide and ensuring that it cannot arise from other endogenous sources. Situations involving protein families with similar sequences, pro-proteins that are cleaved to the active form, and catabolites that may be captured and produce the same surrogate peptide should be evaluated
during method development. Fortunately, the mass spectrometer excels at these types of experiments by permitting simultaneous measurement of multiple peptides from different portions of the various related molecules (e.g., measuring two peptides, one from the portion of the pro-protein that is cleaved off and one from the active protein, can demonstrate the extent to which the pro-protein may be contributing to the observed active protein). As these types of specificity determinations are essential to ensuring the quality of the assay, one must consider not only matrix from healthy subjects but also that of patients, where perturbations of the typical levels of potential interfering components could exist. Use of the patient matrix during method development and validation is, therefore, essential for both LBA and LC-MS assays. Ligand binding assay specificity is designed into the assay during early method development, both through the assay format and, more importantly, through reagent development. Reagent specificity can be designed based on the epitope and forms of analyte, and reagents are tested and selected in a variety of ways (e.g., Western blots, surface plasmon resonance binding assays) prior to selection for use within an assay. Method development follows a learn-and-confirm approach to ensure specificity and to understand whether the reagents and assay are measuring the intended analyte(s). Thus, analyte specificity is largely defined prior to validation, and confirmation through conventional spike-recovery experiments in matrix with a recombinant calibrator may be of limited value. The parallelism experiments discussed above can be used to assess the presence of interfering substances. While multiplexing is widely used to measure multiple biomarkers, and can also be applied to simultaneously measure multiple biomarker forms, such assays are unlikely to become category 2 assays.
In addition, unlike LC-MS, little or no sample processing is required for LBAs, although such approaches can also be used. Thus, non-specific binding, matrix effects, and other common interferences must be managed through reagent specificity, wash stringency, and sample dilution (assay) buffer formulation. The latter may be more important for LBA biomarker assays than for LBA PK assays. For both platforms, the potential for specific interferences from binding partners requires direct experimental testing in a manner similar to PK assays, but the interferences must be known and available, either within the matrix or in a purified form.
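One way the binding-partner testing mentioned above is often reduced to numbers is a spike-recovery comparison against a partner-free control; the function, concentrations, and the ~64% outcome below are invented solely to illustrate the calculation.

```python
# Hypothetical binding-partner interference check (all values invented).
# A matrix pool is spiked with analyte alone (control) and with analyte plus
# an excess of a purified binding partner; recovery in the presence of the
# partner is compared against the unchallenged control.

def percent_recovery(measured: float, control: float) -> float:
    """Recovery of the spiked analyte relative to the partner-free control."""
    return 100.0 * measured / control

control_conc = 48.2       # measured concentration, analyte spike only
with_partner = 31.0       # measured concentration, analyte + binding partner
recovery = percent_recovery(with_partner, control_conc)
print(f"recovery vs control: {recovery:.1f}%")  # ~64%: the partner suppresses the signal
```

A recovery well below the partner-free control, as in this fabricated case, would flag the binding partner as an interference requiring mitigation (e.g., dissociation steps or buffer reformulation).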
SAMPLE ANALYSIS

The critical performance standard of the biomarker assay should be defined during assay validation and used to establish the acceptance criteria for each sample analysis run, as well as how to demonstrate consistent performance across runs. Here, it is the biology of the disease biomarker that will drive the selection of appropriate performance criteria, not a regulatory-defined target. Depending on the program or study, the performance criteria may also need to be considered over a longer time frame or in the context of aggregating data from multiple studies. The issues related to different sources and lots of reference material, the use of pools of endogenous matrix for longitudinal performance monitoring, and the use of the assay in multiple labs are
among the other concepts that should be considered prior to analyzing samples. It was noted that biomarker values for any given population published in the literature may not match the values obtained with a commercial biomarker kit or an in-house assay. These reports must be considered in the context of the preceding discussion of calibrators and the lack of standardization or harmonization. Given this context, it would not be surprising to see such differences, and, as described previously for clinical chemistry assays, standardized commutability approaches assess this.

INCURRED SAMPLE REANALYSIS

There was some discussion of incurred sample reanalysis (ISR) during the workshop. It was noted that the basic nature of the biomarker permits consideration of alternative approaches to those used for PK assays. Some of these approaches avoid using study samples entirely. Since QC samples will be endogenous matrix, preferably from the patient population, they can be used to demonstrate assay performance across runs within a study. As in clinical chemistry, a mega-pool QC, or QC pools at different concentrations, could be used to check performance across runs in a study and, when used within multiple studies, provide evidence of the assay's performance longitudinally through those studies. Mega-pools are large pools of healthy or patient matrix made from hundreds of individuals and can then be dedicated to assessing assay performance over the long term. Obviously, these pools eliminate individual variation, but individual variation can be addressed by testing 10 or more individual patient samples during validation. Whether a small-pool or a mega-pool approach is used, each provides evidence that the measured values are representative of study samples and that the assay is in control. This discussion was an example of the innovative thinking that can be applied to biomarker assays, and more discussion along these lines will further evolve the topic.
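A minimal sketch of how an endogenous QC pool could monitor run-to-run performance follows, using a simple mean ± 2 SD control limit in the spirit of clinical chemistry control charting. The data, the specific 2-SD rule, and the function names are illustrative assumptions, not a workshop recommendation.

```python
# Sketch of longitudinal run monitoring with an endogenous QC pool
# (a Levey-Jennings-style 2-SD check; all values are invented).
import statistics

def qc_limits(history):
    """Mean +/- 2 SD control limits derived from the pool's validation history."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)   # sample SD across validation runs
    return mean - 2 * sd, mean + 2 * sd

# QC pool results accumulated during validation (arbitrary concentration units)
validation_runs = [10.1, 9.8, 10.4, 9.9, 10.2, 10.0]
low, high = qc_limits(validation_runs)

def run_in_control(qc_result):
    """Accept the analytical run if the pool result falls within the limits."""
    return low <= qc_result <= high

print(run_in_control(10.3))   # within limits -> run accepted
print(run_in_control(11.5))   # outside limits -> investigate the run
```

More sophisticated multi-rule schemes of the kind used in clinical chemistry could be layered on the same pool data; the point here is only that an endogenous pool gives a stable yardstick across runs and studies.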
REPORTING

While reporting was not extensively discussed, it is expected that both method validation and study sample analysis reports are needed for category 2 assays. To aid the health authority reviewer, a summary of the biomarker biology relevant to the assay and its utility in the program could be included within the validation report. Beyond this, the content would be similar to the reports used for PK assays, in that the methodology and the performance of the method (in validation or during study sample analysis, respectively) are presented in sufficient detail to demonstrate the reliability of the data generated by the method.

ADDITIONAL PERSPECTIVE

A previous publication by Lowes and Ackermann (15) provides additional participant perspectives on the workshop discussions. With the increasing utilization of biomarkers as an essential part of drug development and approvals, it is expected that that publication and the current one will form the basis of future discussion between the scientific and regulatory communities.
CONCLUSIONS

The Crystal City VI Workshop focused on the differences between biomarker and PK assays and some of the unique challenges associated with biomarker assays, introduced concepts from clinical chemistry biomarker practice that industry may learn from and/or adapt, and recontextualized key validation experiments. There was broad consensus on the need to approach biomarker assays during assay development and validation in the context of the biomarker's biology and the assay's intended use. These considerations include the issues related to the presence of the analyte in all matrix samples (i.e., the absence of a blank matrix). The concept of commutability was introduced and requires additional discussion within the scientific and regulatory communities as to its applicability. There was an emphasis on precision and the use of total error. While there remains a need to approach absolute accuracy for some purposes, relative accuracy can be routinely obtained and is all that is needed for most biomarker measurements. Finally, there was recognition that parallelism is a key analytical validation experiment and that validation experiments using endogenous analyte are critical to define performance and manage change over time. The workshop concluded not by throwing out the BMV rules but rather by recognizing that the fundamental questions to be answered through validation are different and, based on the biology, require changes to the practices used for PK assays in order for them to be applicable to biomarker assays.

FUTURE DIRECTIONS

As the Crystal City VI Workshop was not intended to provide definitive prescriptive practices, it is recognized that science-based practices will be proposed and refined through the literature and at other conferences.
It is hoped that those efforts, including collaboration among the scientific and regulatory authorities on a global scale, will generate harmonized practices for all; to that end, bioanalytical focus groups, such as those in the AAPS and in other organizations, should be challenged to tackle the various aspects and circumstances of biomarker validation in order to evolve the collective thinking on this important issue.

ACKNOWLEDGMENTS

The authors would like to thank and acknowledge the efforts of a number of people who contributed to a successful workshop:

Speakers: Sriram Subramanian, Steven Piccoli, Lauren Stevenson, Andrew Hoofnagle, Bradley Ackerman, Richard King, James Mapes, Lakshmi Amaravadi, Medha Kamat, and Noriko Katori

Panelists: John Kadavil, Michael Skelly, Russell Grant, Christopher Evans, Lorin Bachmann, Hendrik Neubert, Paul Rhyne, Binodh DeSilva, Russell Weiner, and Masood Kahn

Session Chair: Faye Vazvei

Notetakers: Eric Fluhler, Jianing Zeng, Theingi Thway, and Stephanie Fraiser
Moderators: Roger Hayes, Omar Laterza, Binodh DeSilva, and Patrick Bennett

The AAPS meeting team, led by Elizabeth Scuderi: Kimberly Brown, Sandy Hawken, Teresa Homrich, Grace Jones, Kate McHugh, Scott Didawick, Ian Hoch, and Todd Reitzel
REFERENCES

1. FDA, US Department of Health and Human Services. Draft guidance for industry: bioanalytical method validation (revised). [Online] September 2013. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM368107.pdf.
2. Booth B, Arnold ME, DeSilva B, Amaravadi L, Dudal S, Fluhler E, et al. Workshop report: Crystal City V—quantitative bioanalytical method validation and implementation: the 2013 Revised FDA Guidance. AAPS J. 2015;17(2):277–88. doi:10.1208/s12248-014-9696-2.
3. Shah VP, Midha KK, Dighe S, McGilveray IJ, Skelly JP, Yacobi A, et al. Analytical methods validation: bioavailability, bioequivalence and pharmacokinetic studies. Pharm Res. 1992;9(4):588–92.
4. Shah VP, Midha KK, Findlay JWA, Hill HM, Hulse JD, McGilveray IJ, et al. Bioanalytical method validation—a revisit with a decade of progress. Pharm Res. 2000;17(12):1551–7.
5. Viswanathan CT, Bansal S, Booth B, DeStefano AJ, Rose MJ, Sailstad J, et al. Workshop/conference report—quantitative bioanalytical methods validation and implementation: best practices for chromatographic and ligand binding assays. AAPS J. 2007;9(1):E30–42.
6. Fast DM, Kelley M, Viswanathan CT, O'Shaughnessy J, King SP, Chaudhary A, et al. Workshop report and follow-up—AAPS workshop on current topics in GLP bioanalysis: assay reproducibility for incurred samples—implications of Crystal City recommendations. AAPS J. 2009;11(2):238–41. doi:10.1208/s12248-009-9100-9.
7. Lee J, Smith W, Nordblom G, Bowsher R. Validation of assays for the bioanalysis of novel biomarkers. In: Bloom C, Dean RA, editors. Biomarkers in clinical drug development. New York: Marcel Dekker; 2003. p. 119–49.
8. Cummings J, Raynaud F, Jones L, Sugar R, Dive C. Fit-for-purpose biomarker method validation for application in clinical trials of anticancer drugs. Br J Cancer. 2010;103(9):1313–7.
9. Lee JW, Devanarayan V, Barrett YC, Weiner R, Allinson J, Fountain S, et al. Fit-for-purpose method development and validation for successful biomarker measurement. Pharm Res. 2006;23:312–28. doi:10.1007/s11095-005-9045-3.
10. Timmerman P, Herling C, Stoellner D, Jaitner B, Pihl S, Elsby K, et al. European Bioanalysis Forum recommendation on method establishment and bioanalysis of biomarkers in support of drug development. Bioanalysis. 2012;4(15):1883–94. doi:10.4155/bio.12.164.
11. Franzini C. Commutability of reference materials in clinical chemistry. J Int Fed Clin Chem. 1993;5:186–9.
12. Vesper HW, Miller G, Myers GL. Reference materials and commutability. Clin Biochem Rev. 2007;28:139–47.
13. Grant R, Hoofnagle A. From lost in translation to paradise found: enabling protein biomarker method transfer by mass spectrometry. Clin Chem. 2014;60(7):941–4. doi:10.1373/clinchem.2014.224840.
14. Westgard JO, Groth T. Design and evaluation of statistical control procedures: applications of a computer "Quality Control Simulator" program. Clin Chem. 1981;27:1536–45.
15. Lowes S, Ackermann BL. AAPS and US FDA Crystal City VI workshop on bioanalytical method validation for biomarkers. Bioanalysis. 2016;8(3):163–7. doi:10.4155/bio.15.251.