Quantification of up to 4 Targets
Agnostic to Multiplexing Geometry
Bio-Rad or Applied Biosystems Data
Clustering of Low Input Data
Digital polymerase chain reaction (dPCR) is a PCR-based technology that enables the sensitive quantification of nucleic acids. In a dPCR experiment, nucleic acids are randomly distributed and end-point amplified in several thousand partitions that act as micro PCR reactors. Partitioning and end-point detection are the foundation of the high sensitivity of dPCR: each partition receives either zero or a few nucleic acid copies, which increases the amplification efficiency, and the end-point reaction ensures that targets are amplified to a detectable level. The signal emitted by hydrolysis probes or intercalating dyes is used to detect the partitions containing the target sequence. The maximum number of fluorescence signals that can be read in a single sample is the major limitation of dPCR technology: most dPCR systems on the market can detect up to two fluorescence signals, which limits the plexity of an experiment. Several strategies have been developed to overcome this limitation (Whale et al., 2016). However, the data analysis of multiplex assays and the clustering of data generated from low-input specimens are still an issue: manual annotation is time-consuming, user-dependent and poorly reproducible.
Digital PCR Cluster Predictor (dPCP) was developed to automate the analysis of multiplex digital PCR data with up to four targets. dPCP supports the analysis of data generated by the QuantStudio 3D Digital PCR System and the QX100/QX200 Droplet Digital PCR System, is independent of the multiplexing geometry, and is not influenced by the amount of input nucleic acid.
dPCP requires two types of input files:
The sample table has nine columns:
The sample table has to be filled out by the user with the required information. The table format is fundamental for the analysis and must not be changed. A template can be downloaded from the Input Data panel.
The input parameters to be provided are:
The analysis can be saved as a dPCP file by clicking the appropriate button in the Results panel. The dPCP file can be uploaded in the Input Data panel to visualize and modify the analysis.
The first step carried out by dPCP is the collection of data and information from the input files. It is fundamental to generate high-quality data for the reference because dPCP starts the identification of clusters from the reference sample. The ideal reference has:
dPCP identifies the empty partitions and the single-target clusters in the reference using the non-parametric algorithm density-based spatial clustering of applications with noise (DBSCAN) (Ester et al., 1996). The input parameters to be chosen by the user are the maximum distance (ε) between cluster elements and the minimum number of elements (minPts) required to form a cluster. The Test Reference panel helps the user to identify the most suitable ε and minPts values. The use of a reference is not mandatory: the empty partitions and single-target clusters can be identified directly in the sample. However, when the input amount is low, only a few data elements are positive for each target and the DBSCAN analysis may fail to detect those clusters. In such a case, a reference sample is essential for the correct clustering of the sample data.
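The role of ε and minPts can be illustrated with a minimal, pure-Python sketch of DBSCAN on synthetic two-channel fluorescence values. This is not the dPCP implementation: the function names, the synthetic coordinates and the chosen values of eps and min_pts are assumptions made only for illustration.

```python
import math

NOISE = -1  # label given to points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Indices of all points within eps of point i (point i included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def blob(cx, cy):
    # Synthetic cluster: 3 x 3 grid of partitions, 50 fluorescence units apart.
    return [(cx + dx, cy + dy) for dx in (-50, 0, 50) for dy in (-50, 0, 50)]


# Hypothetical reference: empty partitions, two single-target clusters, one outlier.
reference = blob(1000, 1000) + blob(5000, 1000) + blob(1000, 4000) + [(3000, 2500)]
labels = dbscan(reference, eps=120, min_pts=4)
print(labels)
```

In this toy data set the three grids are dense enough to form clusters, while the isolated point has too few neighbours within ε and is labelled as noise. A cluster with fewer than minPts elements would be lost in the same way, which is the situation described above for low-input samples.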
After the empty partitions and the single-target clusters have been identified, the position of each centroid is calculated as the arithmetic mean of the coordinates of the cluster's data elements. The displacement of a cluster centroid from the centroid of the empty partitions can be represented by a Euclidean vector. Since the coordinates of the centroid of a multi-target cluster are predicted to be the sum of the coordinates of the centroids of the corresponding single-target clusters, the position of a multi-target centroid is calculated as the vector sum of the vectors that connect the centroid of the empty partitions to the centroids of those single-target clusters.
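The vector sum can be sketched in a few lines of Python. The function names, the target names T1 and T2 and the centroid coordinates are hypothetical; the sketch only shows the arithmetic described above, not the dPCP source code.

```python
from itertools import combinations


def centroid(points):
    """Arithmetic mean of a list of (x, y) fluorescence coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def predict_centroids(empty, singles):
    """Predict the centroid of every target combination.

    empty   -- centroid of the empty partitions
    singles -- dict mapping a target name to its single-target centroid
    Returns a dict mapping a tuple of target names to a predicted centroid.
    """
    predicted = {(): empty}
    names = sorted(singles)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            # Add the displacement vector of each target to the empty centroid.
            x = empty[0] + sum(singles[t][0] - empty[0] for t in combo)
            y = empty[1] + sum(singles[t][1] - empty[1] for t in combo)
            predicted[combo] = (x, y)
    return predicted


# Hypothetical centroids, as they could result from the reference analysis.
empty = (1000, 1000)
singles = {"T1": (5000, 1200), "T2": (1300, 4000)}
predicted = predict_centroids(empty, singles)
for targets, position in predicted.items():
    print(targets, position)
```

Here the double-positive centroid is placed at (5300, 4200), that is the empty centroid plus the displacement (4000, 200) of T1 plus the displacement (300, 3000) of T2.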
The clustering analysis of the sample data is carried out by the fuzzy c-means algorithm (Bezdek, 1981; Lai Chung and Lee, 1994; Pal et al., 1996). The principle of the fuzzy c-means algorithm is to minimize the intra-cluster variance, defined as the sum of the squared distances of all cluster elements from the cluster centroid. The fluorescence values of the sample elements and the coordinates of all centroids are used as input parameters for the analysis. The output of the c-means analysis is a matrix showing the probability of membership of each data element to each cluster. Each data element is assigned to the cluster for which its membership probability is the highest. If the highest probability is lower than 0.5, the data element is classified as rain and its membership is recalculated with the Mahalanobis distance (Mahalanobis, 1936). The Mahalanobis distance measures the distance between a point and a distribution: it expresses, at the multidimensional level, how many standard deviations a point lies from the mean of the distribution. The rain-tagged elements are assigned to the cluster with the lowest Mahalanobis distance.
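The two-stage assignment can be sketched as follows. The sketch makes several assumptions: it evaluates the standard fuzzy c-means membership formula once, against fixed centroids and with fuzzifier m = 2, instead of running the full iterative algorithm; the Mahalanobis distance is written out for two dimensions only; and all function names and coordinates are hypothetical rather than taken from dPCP.

```python
import math


def memberships(point, centroids, m=2.0):
    """Fuzzy c-means membership of one point to each centroid (fuzzifier m)."""
    d = [math.dist(point, c) for c in centroids]
    if 0.0 in d:  # the point sits exactly on a centroid
        return [1.0 if di == 0.0 else 0.0 for di in d]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in d) for di in d]


def mahalanobis(point, cluster):
    """Mahalanobis distance of a 2-D point from a cluster of 2-D points."""
    n = len(cluster)
    mx = sum(x for x, _ in cluster) / n
    my = sum(y for _, y in cluster) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in cluster) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in cluster) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in cluster) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, expanded for the 2 x 2 case.
    return math.sqrt((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)


def classify(point, centroids, clusters, threshold=0.5):
    """Return (cluster index, is_rain) for one data element."""
    u = memberships(point, centroids)
    best = max(range(len(u)), key=u.__getitem__)
    if u[best] >= threshold:
        return best, False
    # Rain: fall back to the cluster with the lowest Mahalanobis distance.
    dist = [mahalanobis(point, c) for c in clusters]
    return min(range(len(dist)), key=dist.__getitem__), True


# Hypothetical clusters: two tight ones and a wider one around (10, 0).
clusters = [
    [(-1, -1), (-1, 1), (1, -1), (1, 1)],
    [(7, -3), (7, 3), (13, -3), (13, 3)],
    [(-1, 9), (-1, 11), (1, 9), (1, 11)],
]
centroids = [(0, 0), (10, 0), (0, 10)]

print(classify((0.5, 0.5), centroids, clusters))
print(classify((4, 4), centroids, clusters))
```

The second point has no membership of 0.5 or higher, so it is treated as rain. Its nearest centroid by Euclidean distance is the first one, but the Mahalanobis distance accounts for the larger spread of the second cluster and assigns the point there.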
Finally, the copies per partition of each target are calculated according to a Poisson model (Hindson et al., 2011). Precision is calculated as previously described (Majumdar et al., 2015). Replicates can be combined, in which case the copies per partition are recalculated.
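Under the Poisson model, the mean number of copies per partition is λ = -ln(1 - k/n), where k is the number of partitions positive for the target and n is the total number of partitions. The sketch below applies this formula to hypothetical cluster counts. Two points are assumptions of the sketch and not statements about dPCP: replicates are combined by pooling their partition counts, and the function and target names are invented for illustration.

```python
import math


def copies_per_partition(positive, total):
    """Poisson estimate of the mean number of target copies per partition."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def positives_for(target, cluster_counts):
    """Partitions positive for a target: every cluster that contains it."""
    return sum(n for targets, n in cluster_counts.items() if target in targets)


# Hypothetical cluster sizes of two replicates; keys list the targets present.
replicate_1 = {(): 7500, ("T1",): 1500, ("T2",): 750, ("T1", "T2"): 250}
replicate_2 = {(): 7400, ("T1",): 1600, ("T2",): 700, ("T1", "T2"): 300}

# Assumption: replicates are combined by pooling the partition counts.
pooled = {k: replicate_1[k] + replicate_2[k] for k in replicate_1}
total = sum(pooled.values())

for target in ("T1", "T2"):
    k = positives_for(target, pooled)
    print(target, k, total, copies_per_partition(k, total))
```

Partitions in a multi-target cluster count as positive for each of the targets they contain, so the double-positive cluster contributes to both T1 and T2.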
The results of the dPCP analysis can be visualized in 2 panels:
Quality controls were developed to provide a graphical view of the results and to check the quality of the fundamental steps of the dPCP analysis. For each sample, the Quality Control panel shows the plot of the cluster centroids, the plot of the c-means clustering and the plot of the rain analysis (the latter only if rain analysis is enabled in the Input Data panel). The 3 plots can be downloaded as a single image, choosing the file format and dpi.
Figure 1 shows how to interpret the quality control plot of the cluster centroids.
Fig. 1: Quality control of the prediction of centroid coordinates. (A) The predicted coordinates of the multi-target cluster centroid did not match the real position. The shift of the centroid position can be the consequence of cross-reactive probes or poor assay optimization. (B) The positions of the cluster centroids were correctly predicted.
In the bottom part of the Quality Control panel, the plot of the DBSCAN analysis is shown for each reference. The plot can be exported as an image, choosing the file format and dpi. The results of the DBSCAN analysis can be exported to an rds file to be used as a reference in other dPCP analyses. When an rds file is used as a reference, dPCP does not need to perform a DBSCAN analysis because it retrieves the information directly from the rds file, which speeds up the workflow.
The identification of the empty partitions and single-target clusters in the reference is the first step of the dPCP analysis and relies on the DBSCAN algorithm. The performance of DBSCAN depends on the input parameters ε and minPts, which are the only two values the user has to adapt for the dPCP analysis. The Test Reference panel helps to choose the best input values and to select a suitable reference sample. The user has to upload the raw data file of the candidate reference sample (see How to start the analysis?) and enter the values of ε and minPts to be tested. The results are shown in a plot. The ideal combination of reference and input values is chosen according to the following criteria:
Some illustrative examples are shown in Figure 2.
Combinations (E) and (F) identified the empty partitions cluster and all single-target clusters; they are therefore suitable for the analysis, and this sample can be used as a reference.
Bezdek,J.C. (1981) Pattern Recognition with Fuzzy Objective Function Algorithms Springer US, Boston, MA.
Ester,M. et al. (1996) A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pp. 226–231.
Hindson,B.J. et al. (2011) High-throughput droplet digital PCR system for absolute quantitation of DNA copy number. Anal. Chem., 83, 8604–8610.
Lai Chung,F. and Lee,T. (1994) Fuzzy competitive learning. Neural Networks, 7, 539–551.
Mahalanobis,P.P.C. (1936) On the generalized distance in statistics. Proc. Natl. Inst. Sci. India, 2, 49–55.
Majumdar,N. et al. (2015) Digital PCR modeling for maximal sensitivity, dynamic range and measurement precision. PLoS One, 10, e0118833.
Pal,N.R. et al. (1996) Sequential competitive learning and the fuzzy c-means clustering algorithms. Neural Networks, 9, 787–796.
Whale,A.S. et al. (2016) Fundamentals of multiplexing with digital PCR. Biomol. Detect. Quantif., 10, 15–23.
Developed by Alfonso De Falco.
The R package used is available on CRAN. The source code is published on GitHub.
Copyright 2020 Laboratoire national de santé
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.