Neural Computing and Applications https://doi.org/10.1007/s00521-017-3291-2
ORIGINAL ARTICLE
Stem cell motion-tracking by using deep neural networks with multi-output

Yangxu Wang, Hua Mao, Zhang Yi

Received: 11 April 2017 / Accepted: 15 November 2017
© The Natural Computing Applications Forum 2017
Abstract The aim of automated stem cell motility analysis is reliable processing and evaluation of cell behaviors such as translocation, mitosis, death, and so on. Cell tracking plays an important role in this research. In practice, tracking stem cells is difficult because they have frequent motion, deformation activities, and small resolution sizes in microscopy images. Previous tracking approaches designed to address this problem have been unable to generalize the rapid morphological deformation of cells in a complex living environment, especially for real-time tracking tasks. Herein, a deep learning framework with convolutional structure and multi-output layers is proposed for overcoming stem cell tracking problems. A convolutional structure is used to learn robust cell features through deep features learned on massive visual data by a transfer learning strategy. With multi-output layers, this framework tracks the cell's motion and simultaneously detects its mitosis as an assistant task. This improves the generalization ability of the model and facilitates practical applications for stem cell research. The proposed framework, tracking and detection neural networks, also contains a particle filter-based motion model, a specialized cell sampling strategy, and a corresponding model update strategy. Its application to a microscopy image dataset of human stem cells demonstrates increased tracking performance and robustness compared with other frequently used methods. Moreover, mitosis detection performance was verified against manually labeled mitotic events of the tracked cell. Experimental results demonstrate good performance of the proposed framework for addressing problems associated with stem cell tracking.

Keywords: Cell tracking · Neural networks · Mitosis detection · Multi-output
Correspondence: Hua Mao

Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, People's Republic of China

1 Introduction

Stem cells have shown huge potential for regenerative medicine in recent years for their ability to replace damaged or diseased tissues [1, 2]. Cell tracking, which aims to observe and analyze individual cell activities such as motion and deformation directly from microscopy images, plays an important role in this research and also in the
broader field of biomedicine [3]. From tracking results, cell behavior can be further analyzed, for example to construct cell lineages and to study cell morphology. Tracking can also provide the sensitivity and specificity of cell measurements necessary to uncover the principles by which stem cells regenerate tissues. While tools exist for this process, even modern analytical methods require significant input, including pre-filtering of data, manual tuning of algorithms, and post-analysis of output data. Thus, developing an automated cell tracking method is essential [4].

Cell tracking over time is essentially a problem of object tracking, but with several complications particular to the domain. First, cells rotate randomly, move irregularly during migration, and can change shape or contour dramatically [5]. Traditional tracking methods typically assume a rigid target without significant shape changes, so it is very challenging to learn robust cellular features. Second, it is difficult to distinguish tracked cells from other
small particles, as living cellular environments are complex and contain germs and dead cells [6]. The final challenge arises from the low resolution of microscopy images, with the small size of tracked cells making tracking even more difficult.

Most cell tracking methods first detect cells in all frames independently with segmentation algorithms based on, for example, gradient features [7], intensity [8], or wavelet decomposition [9]. Next, the most probable correspondences between cells in adjacent frames are determined using intensity coherence and spatial information [10]. A level-set-based approach was proposed to handle segmentation and tracking problems using image level sets computed via threshold decomposition [11]. In a subsequent method, cells in successive frames were segmented frame by frame, and between-frame object pairing was carried out to follow each object's displacement through the image sequence [12]. However, image pre-segmentation, a crucial step for these methods, is unsuitable for real-time tracking tasks. Additional methods [13, 14] directly optimized parameterized models or functions in order to fit them to targeted cells. This methodology does not identify all possible objects in the frame, but instead focuses on one unique candidate located around an initial position. A mean-shift algorithm was successfully used to achieve live-cell tracking in gray-scale phase-contrast videos using cell regions modeled by geometric kernel structures [15]. Automated cell tracking methods developed thus far are typically tied to particular datasets and always require human effort for pre-processing, feature selection, and parameter tuning [16]. Such limitations render these methods inadequate for real-time stem cell tracking. Recently, tracking-by-detection methods have become widely used in the field of visual tracking [17, 18].
These methods handle tracking as a real-time processing procedure with online feature learning, making such techniques suitable for real-time cell tracking tasks. As the search window is only located over the tracked cell in the first frame in this type of method, it is still difficult to learn robust features of the target cell from limited samples. The development of deep learning [19], which has proven to be a successful approach for feature learning [20, 21], offers a way forward: features discovered by deep networks trained on massive image datasets can be transferred to specialized applications through a transfer training method. In this paper, a tracking-by-detection method using deep neural networks is proposed for the automated stem cell tracking task in microscopy images. Convolutional neural networks (CNNs) [22] are utilized in our work for learning robust features of moving cells; they can first be pre-trained with a large training set for a classification task and then fine-tuned using stem cell images. This strategy makes use of the robust contour features learned from the pre-trained model and transfers them to cells through the fine-tuning procedure. This process helps overcome the problem of limited training samples for real-time cell tracking. Moreover, to further utilize and verify the robustness of features learned from CNN structures, an additional output important for stem cell research is proposed in our framework: simultaneous detection of tracked cell mitosis [23]. Thus, our framework, called tracking and detection neural networks (TDNNs), is essentially a deep neural network with a convolutional structure [24] for feature extraction and two output layers for tracking and detection tasks, respectively. While the two outputs share the same features from the convolutional layers, they individually handle the tracking or mitosis detection tasks. After training, the tracking output estimates whether the correct target stem cell has been tracked in each frame, while the mitosis detection output detects whether the tracked stem cell is splitting. In this manner, our framework learned robust features of moving cells automatically and exhibited better generalization performance in practice. Finally, it is applied to a microscopy image dataset of human stem cells and achieves real-time results for both tracking and mitosis detection tasks. Other contributions of this paper are summarized as follows:

• The novel deep neural network proposed for analyzing the activities of single stem cells does not require segmentation or contouring of cells, or complex image pre-conditioning techniques, during the tracking and mitosis detection processes. With only one ground truth for the target stem cell in the beginning frame, this framework automatically tracks the cell's motion in the following frames and detects its mitosis;
• Although cells may change shape or contour dramatically, the use of a transfer learning method with deep neural networks facilitates the learning of invariant, robust features of stem cells for both tracking and mitosis detection tasks; and
• With a multi-output learning strategy, tracking and mitosis detection are combined in our framework. They work simultaneously for real-time cell tracking and mitosis detection, which is helpful for understanding the cell's life activity.
In Sect. 2, an outline of TDNNs is given, and the deep learning architectures and techniques used in this work are introduced. Experimental details are elucidated in Sect. 3. Finally, conclusions and future work are described in Sect. 4.
2 Method
In this section, the framework and method proposed for cell tracking and mitosis detection are introduced.

2.2 Particle filter-based method
The particle filter approach [25], used during the tracking process for sampling as shown in green in Fig. 1, is a sequential Bayesian estimation approach that recursively infers the hidden state of the target. It is a dominant approach in visual tracking. Mathematically, tracking aims at finding the most probable state of the target at each time step t based on the observations up to the previous time step t−1 [26]:

s_t* = arg max_{s_t} p(s_t | y_{1:t−1}) = arg max_{s_t} ∫ p(s_t | s_{t−1}) p(s_{t−1} | y_{1:t−1}) ds_{t−1},   (1)
Framework

Our deep learning-based framework is shown in Fig. 1. It consists of two main parts: a sampling process (green) and the TDNNs. The TDNNs are composed of a tracking part (blue), a detection part (red), and convolutional layers. The target stem cell is marked out with a bounding box only in the start frame, and then the system tracks the cell and detects its mitosis automatically. The bounding box covering the target cell is denoted as z = {x, y, h, l}, where {x, y} denotes the center coordinate of the box, and {h, l} denote its height and width. Subsequently, each incoming frame of the cell image sequence is processed as follows:

1. First, z is obtained from the previous frame (containing the image and position information of the target cell), and a particle filter-based random sampling algorithm is applied to the current frame to generate a candidate position set Z = {z^1, z^2, ..., z^n};
2. The cell tracking part of the network confirms the target cell region z' from the candidate position set Z in the current frame, and sets z = z';
3. The mitosis detection part of the network detects whether the target cell is splitting; and
4. The process moves to the next frame and returns to step 1.

After step 2, the target cell is chosen from candidates in the current frame, while step 3 establishes whether the target cell is undergoing mitosis. The framework repeats this process frame by frame to track the cell and detect its mitosis simultaneously. Additional details are as follows:
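The per-frame loop in steps 1–4 can be sketched compactly. In the sketch below, `track_sequence`, `sample_candidates`, `tracking_score`, and `mitosis_prob` are hypothetical names standing in for the particle filter and the two TDNN output heads (they are not the authors' code), and the 0.9 mitosis threshold follows the experimental setting reported later in the paper:

```python
# Sketch of the per-frame processing loop (steps 1-4). The three callables
# are placeholders for the particle filter sampler and the two TDNN heads.

def track_sequence(frames, z_init, sample_candidates, tracking_score, mitosis_prob):
    """Track one cell through `frames`, reporting its position and mitosis events."""
    z = z_init                      # bounding box (x, y, h, l) from the first frame
    results = []
    for frame in frames:
        # Step 1: draw candidate positions around the previous result.
        candidates = sample_candidates(frame, z)
        # Step 2: the tracking head confirms the most probable candidate.
        z = max(candidates, key=lambda c: tracking_score(frame, c))
        # Step 3: the detection head checks whether the tracked cell is splitting.
        splitting = mitosis_prob(frame, z) > 0.9
        results.append((z, splitting))
        # Step 4: continue with the next frame.
    return results
```

In a real run the two score functions would be forward passes through the shared convolutional layers followed by the respective output layer.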
2.1 Initialization

In the initialization phase, the original position of the target cell z_init, denoted with a bounding box, is initialized manually at the beginning frame (shown in Fig. 1).
where s_t and y_t denote the latent state and observation variables at time t, respectively. Bayes' rule is used to update the posterior distribution of the state variable when a new observation y_t arrives:

p(s_t | y_{1:t}) = p(y_t | s_t) p(s_t | y_{1:t−1}) / p(y_t | y_{1:t−1}).   (2)
The particle filter approximates the true posterior state distribution p(s_t | y_{1:t−1}) by a set of n samples, called particles, {s_t^i}_{i=1}^n, with corresponding importance weights {w_t^i}_{i=1}^n which sum to 1. Particles are drawn from an importance distribution q(s_t | s_{1:t−1}, y_{1:t}). For cell tracking, the state variable s represents the position of the cell, which is denoted as z = {x, y, h, l}. Here, a normal distribution is proposed to model each dimension of q(s_t | s_{t−1}) independently. The particle filter is used as a motion model for trackers to generate the probable states of the target at each time step. The tracking result is the particle with the largest probability output determined by the observation model, for which the deep neural network is used in this work. For the cell tracking task, as shown in Fig. 2, the particle filter algorithm is used for sampling the candidate positions
Fig. 1 Overview of the proposed framework. Blue: tracking part; green: particle filter-based sampling process; and red: mitosis detection part (color figure online)
Fig. 2 Particle filter method in tracking phase
in the current frame based on the tracking result of the previous frame. The n candidate positions are denoted as Z = {z^1, z^2, ..., z^n}, and each candidate is denoted as z^i = {a^i, b^i, h^i, l^i}. Next, the TDNNs are used as the observation model, which accepts the images of these candidates as inputs and outputs both the probability of each being the cell or background and the probability of mitosis. Instead of simply taking the particle (candidate position) with the maximum probability, the top K probabilities are used to determine the final result. First, the K positions with the maximum probabilities S_i, i = 1, 2, ..., K, are chosen. Next, the position with the minimum summed distance to the other chosen positions is selected:

S* = arg min_S Σ_{i=1}^{K} (S − S_i)^2.   (3)
The complete algorithm is shown in Algorithm 1.
Fig. 3 Deep neural networks
Algorithm 1 The particle filter-based method
Require:
  Parameters:
    N: the number of particles sampled by the particle filter algorithm in each step
    K: the number of top-K particles
    Z = {z_1, z_2, ..., z_N}: the position set of the N particles
    S = {S_1, S_2, ..., S_N}: the corresponding probabilities of the particles being the cell or background
    z_0: the initial position of the target cell
    z_t: the output position of the target cell in frame t
  Input data:
    F = {F_0, F_1, F_2, ..., F_T}: image sequence F with length T
Ensure:
  for all F_t ∈ F do
    if t = 0 then
      Initialize the position of the target cell at the beginning frame: z_0;
    else
      Sample N particles based on F_{t−1} using the particle filter algorithm and get their position set Z = {z_1, z_2, ..., z_N};
      for all z_n ∈ Z, n = 1, 2, ..., N do
        Compute the probability S_n that position z_n is the target cell or background using the tracking network model;
      end for
      Get the set S = {S_1, S_2, ..., S_N};
      Get the K positions Z^K = {Z_1, Z_2, ..., Z_K} from Z with the maximum probabilities in S;
      z_t = arg min_j Σ_{i=1}^{K} (Z_j − Z_i)^2
    end if
  end for
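The two numerical ingredients of Algorithm 1 — Gaussian candidate sampling around the previous state and the top-K minimum-sum-distance selection of Eq. 3 — might be sketched as follows. The standard deviations are illustrative values, not the paper's settings:

```python
import numpy as np

def sample_particles(z_prev, n=1500, sigma=(4.0, 4.0, 0.5, 0.5), rng=None):
    """Draw n candidate states around z_prev = (x, y, h, l), perturbing each
    dimension independently with Gaussian noise (the normal importance
    distribution q(s_t | s_{t-1}) of Sect. 2.2)."""
    rng = np.random.default_rng() if rng is None else rng
    return np.asarray(z_prev, float) + rng.normal(0.0, sigma, size=(n, 4))

def select_position(positions, scores, k=40):
    """Among the k highest-scoring candidates, return the one with the minimum
    summed squared distance to the other top-k candidates (Eq. 3)."""
    positions = np.asarray(positions, float)
    top = np.argsort(scores)[-k:]                  # indices of the k best scores
    cand = positions[top]
    # Pairwise squared distances among the top-k candidates.
    d2 = ((cand[:, None, :] - cand[None, :, :]) ** 2).sum(axis=-1)
    return cand[d2.sum(axis=1).argmin()]           # most central top-k candidate
```

The min-sum-distance step acts as an outlier filter: a high-scoring candidate far from the other confident candidates is rejected in favor of the most central one.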
2.3 Tracking and detection neural networks

Broadly speaking, deep neural networks (DNNs) [27] are artificial neural networks with multiple hidden layers between the input and output layers and are trained with special techniques [28], as shown in Fig. 3. DNNs can be used to model complex nonlinear relationships for a variety of applications. The tracking and detection neural networks (TDNNs), deep networks with a convolutional structure, are proposed in this work for both tracking and mitosis detection tasks. CNNs, a popular deep learning structure achieving state-of-the-art performance on computer vision tasks, are used in the framework. After training, the network can extract high-level
abstraction features of the input data through alternating convolutional structures and use them for the final classification or regression task. The structure of the TDNNs is shown in Fig. 4. It can be divided into three parts: convolutional feature extraction, tracking, and detection. The convolutional part contains a number of convolutional layers followed by max-pooling layers and two fully connected layers. Although complex models are more powerful, they cost considerable computing time for real-time tracking and are very difficult to fine-tune using limited cell samples. This part is pre-trained on the CIFAR-10 benchmark dataset [29], a classification dataset containing 60,000 color images.
Fig. 4 Structure of the proposed networks
CIFAR-10 is reported to be suitable for simple CNNs to learn robust shallow features [30]. In this manner, the earlier layers of the TDNNs learn edge and contour features for recognition. After pre-training, the network has learned features useful for recognizing stem cells, and the weights of this part are fixed. Two output parts follow these pre-trained layers, one for the tracking task and the other for the mitosis detection task. The tracking part outputs the probability of a target being a cell or background, while the detection part outputs whether or not the cell is splitting. They share the weights of the convolutional part, but are trained with different targets. The weights of the tracking part are updated online, while the weights of the mitosis detection part are trained offline.
Tracking The tracking part, which follows the convolutional feature extraction part, consists of two fully connected layers with sigmoid units. It is first fine-tuned using cell images sampled from the beginning frame, with weights updated during the online tracking procedure. The sampling process is shown in Fig. 5. Positive samples are selected in a 3 × 3 region around the center of the target cell. A morphological transformation is used here (horizontal symmetry combined with four 90° rotations, yielding eight variants). Thus, 72 (3 × 3 × 8) positive samples are produced after the sampling process. For negative samples, we randomly select 100 points as center points in an annular region spanning radii (r, 1.2r) (shown in yellow). Here, r is computed as √(h² + l²). All samples are
Fig. 5 Sampling phase. Different strategies are used for positive and negative samples. Positive samples are selected in the 3 × 3 central region, while negative samples are randomly selected in an annular region
used to fine-tune the tracking part by minimizing the mean square error (MSE), as shown in Eq. 4:

L(x; Θ) = (1/m) Σ_{i=1}^{m} (F(x_i) − y_i)²,   (4)

where Θ denotes the parameter set, x_i denotes the ith sample of a training batch of size m, y_i is the label, and F(x_i) denotes the output of the tracking part for input x_i. After fine-tuning, the TDNNs accept the images of candidate positions selected by the particle filter algorithm as inputs and output the probability that they are cells or background. As shown in Fig. 6, the tracking part of the TDNNs is updated in an online mode. In frame n, the TDNNs are trained to recognize the target cell by sampling around its position in frame n−1. After that, N candidate positions for the target cell are first chosen in frame n+1 (based on the tracking result of frame n with the particle filter algorithm) and then checked using the tracking part of the TDNNs. Weights of the tracking part are updated if necessary to better fit the changing appearance of the cell. If the maximum output probability of the current network over the candidates is less than a threshold, this may indicate that the current model has not learned efficient features to recognize the target cell in this frame. Thus, it samples again in this frame (frame n+1) and immediately updates the weights of the tracking part.

Mitosis detection The mitosis detection part has almost the same structure as the tracking part, with the exception of the number of neurons in the first fully connected layer (512 vs. 256). To train this part, a dataset containing images of cells in two states (normal cells and cells in mitosis) is constructed. All cells are chosen from stem cell images that were not part of the training process. In total, 133 positive samples (mitosis) and 150 negative samples (normal) are labeled for training, as shown in Fig. 7. All samples are resized to 32 × 32 to match the input size of the TDNNs. A softmax function is added at the end of the detection part, and the cross-entropy cost is minimized:

L(x; Θ) = −(1/m) Σ_{i=1}^{m} [ y_i log(P(x_i)) + (1 − y_i) log(1 − P(x_i)) ],   (5)

where Θ is the parameter set of this part, x denotes the input, and y denotes the label for whether the cell sample is splitting or not. P(x_i) denotes the output of the detection part for input x_i. A back-propagation algorithm is used to train this network. In contrast to tracking, this part is trained offline. After training, the weights of the mitosis detection part remain unchanged. As shown in Fig. 1, when the tracking part outputs the probability of a candidate position being a cell or background, the mitosis detection part simultaneously outputs the mitosis probability of the cell. After the position of the cell in the current frame is determined, the mitosis probability of this cell is also confirmed. If the probability is larger than a threshold, it is treated as a splitting cell and this result is output. In this manner, the framework can discover the splitting event of a cell during its tracking process.
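The two cost functions (Eqs. 4 and 5) and the online update test described above might be sketched as follows. The epsilon clamp in the cross-entropy is a numerical-safety detail added here, not part of the paper, and the 0.85 threshold and 15-frame interval are the experimental settings reported in Sect. 3.2:

```python
import numpy as np

def mse_loss(outputs, labels):
    """Mean square error of Eq. 4, averaged over a batch of m samples."""
    outputs, labels = np.asarray(outputs, float), np.asarray(labels, float)
    return float(((outputs - labels) ** 2).mean())

def cross_entropy(probs, labels, eps=1e-12):
    """Cross-entropy cost of Eq. 5; eps guards the logarithms."""
    p = np.clip(np.asarray(probs, float), eps, 1.0 - eps)
    y = np.asarray(labels, float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def needs_update(max_score, frames_since_update, threshold=0.85, max_interval=15):
    """Online update test for the tracking head: re-sample and fine-tune when
    the best candidate score drops below the confidence threshold, or when the
    model has gone more than max_interval frames without an update."""
    return max_score < threshold or frames_since_update > max_interval
```

A confident, recently updated model skips fine-tuning entirely, which is what keeps the per-frame cost low enough for real-time tracking.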
3 Experiment and results

Details of the dataset and experiments are described in this section. The dataset is introduced in Sect. 3.1. In our experiments, the TDNNs are used for tracking a single stem cell's motion and detecting its mitosis. Experimental results are shown in Sect. 3.3.
Fig. 6 Model update strategy for the tracking process
Fig. 7 Mitosis samples (splitting and normal)
3.1 Dataset and processing

The source dataset used in the experiments comprises low-contrast RGB image sequences of stem cells captured by a microscope at 5-min intervals, containing 376 frames with a size of 3322 × 2496 pixels. It records various activities of stem cells, which develop from a single stem cell into cell clusters or complex tissues. A local field of this image sequence with a size of 400 × 400 is shown in Fig. 8; frames were selected to reflect the activity of the cells. To process and display data more conveniently, we randomly chose a 400 × 400 patch, at the same position in each frame, to construct new sequences for our experiment. Finally, the dataset is divided into three parts (different positions) with a smaller size, as shown in Fig. 9. Notably, Data3 is only used to select cells for training the mitosis detection model.
As it does not require pre-processing and segmentation, the framework performs equally well on the whole image. An established computer vision dataset for object recognition, CIFAR-10, is used for the pre-training phase. It consists of 60,000 32 × 32 color images covering 10 object classes, with 6000 images per class. We first converted them to gray images and trained a 10-class classification network. Next, the classification layer is removed to initialize the convolutional feature extraction part of the TDNNs. Weights of the tracking and detection parts are initialized randomly following a normal distribution. For the mitosis detection model, fifty frames are first randomly selected from Data3. Next, a manually annotated cell dataset is built from these frames to train the mitosis detection part, which consists of 133 splitting samples and 150 normal samples, as shown in Fig. 7.
Fig. 8 Stem cell sequence. The target cell being tracked and detected is circled in yellow. Mitotic events are also shown (frame 14 → 16, 30 → 32) (color figure online)
As is common in deep learning, more labeled samples will improve detection accuracy. When training the tracking and mitosis detection networks, a morphological transformation (rotation and symmetry) is performed on all input image patches to increase the generalization ability of the model. For each image, a horizontal symmetry transformation is performed first, and then the image is rotated by 90° counterclockwise four times. Each input sample thus generates eight final images for training.

Fig. 9 Datasets. Data1 and Data2 are used to verify the performance of TDNNs. Data3 is only used for selecting cells to train the mitosis detection model

3.2 Implementation details

The proposed framework is implemented in Python based on Keras and Theano. As shown in Fig. 4, the structure of the networks in our framework is: {input: 32 × 32}-{conv1: 32 @ 3 × 3}-{conv2: 32 @ 3 × 3}-{max-pooling1: 2 × 2}-{dropout1: 0.25}-{conv3: 64 @ 3 × 3}-{conv4: 64 @ 3 × 3}-{max-pooling2: 2 × 2}-{dropout2: 0.25}-{fc1: 1024}-{fc2: 512}, followed by two heads: 1. {tracking fc: 512}-{tracking output: 2}; 2. {detection fc: 256}-{detection output: 2}. Here, 32 @ 3 × 3 denotes that this convolution layer has 32 convolutional kernels of size 3 × 3. The size of the max-pooling layers is 2 × 2. Dropout layers are added behind the convolution layers with different dropout rates, as this helps improve the generalization ability of the model. The mean square error (MSE) cost is used for tracking, while the cross-entropy cost is used for mitosis detection. For the choice of deep network architecture, a convolutional neural network (CNN) is chosen for its ability to learn shift-invariant features. As the stem cell images are small, a five-layer convolutional structure with a small kernel size (3 × 3), inspired by the CIFAR networks, is used.
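As a sanity check on the layer sizes listed above, the spatial dimensions can be traced by hand. The sketch below assumes unpadded ("valid") 3 × 3 convolutions and 2 × 2 max-pooling; with "same" padding the spatial sizes would instead stay at 32 and 16, so the exact flattened size depends on a padding choice the paper does not state:

```python
# Pure shape arithmetic for the conv stack, assuming "valid" 3x3 convolutions
# and 2x2 max-pooling on a 32x32 grayscale input.

def trace_shapes(size=32):
    shapes = [("input", size)]
    for name in ("conv1", "conv2"):
        size -= 2                      # 3x3 valid convolution: 32 -> 30 -> 28
        shapes.append((name, size))
    size //= 2                         # 2x2 max-pooling: 28 -> 14
    shapes.append(("pool1", size))
    for name in ("conv3", "conv4"):
        size -= 2                      # 14 -> 12 -> 10
        shapes.append((name, size))
    size //= 2                         # 10 -> 5
    shapes.append(("pool2", size))
    shapes.append(("flatten", 64 * size * size))   # 64 feature maps of 5x5
    return shapes
```

Under this assumption the flattened feature vector has 64 × 5 × 5 = 1600 entries, which fc1 then maps down to 1024.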
Sampling During the tracking process, the particle filter algorithm randomly draws 1000–2000 particles in each frame; it is set to 1500 in our experiment, which balances performance and speed. We choose the top K particles with the maximum output scores of the tracking networks. K, which can be set from 20 to 50, was set to 40 in our experiment. The update threshold for tracking is 0.85: weights of the tracking part are updated if the maximum output score of the 1500 particles is less than 0.85 or if the model has not been updated for more than 15 frames.

Tracking The tracking model updates online. For the first frame, the tracking model is fine-tuned using samples drawn around the ground-truth box, as shown in Fig. 5. Seventy-two (3 × 3 × 8) positive samples and 100 negative samples are selected. Next, the stochastic gradient descent (SGD) algorithm is used to update the network's parameters. The learning rate is set to 0.005, and an early-stop strategy is used in the training process with a maximum of 15 training epochs. In the following frames, the learning rate is set to 0.0005, and the maximum number of training epochs is set to 4 whenever the tracking model needs to be updated.

Mitosis detection To train the mitosis detection part, images of 133 positive samples and 150 negative samples are annotated first. As cells exhibit various shapes and sizes during the division process, samples are expanded by symmetry, rotation, and translation operations to improve the generalization ability of the model. Finally, the detection model is trained using 133 × 8 positive samples and 150 × 8 negative samples. It outputs the mitosis probability of the tracked cell. The SGD algorithm is used to train the mitosis detection part offline; after training, the weights of the detection part remain unchanged during the tracking process. The learning rate for this model is set to 0.001, and the model converges after 50 epochs.
In our experiment, if the output probability of the model is larger than 0.90, it will be defined as a mitotic event and pointed out during the tracking process.
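The eight-fold morphological expansion used for training samples (a horizontal mirror combined with four 90° rotations) might be sketched as follows; `expand_patch` is an illustrative name, not from the paper:

```python
import numpy as np

# Eight-fold morphological expansion of a training patch: the original and its
# horizontal mirror, each rotated by 0, 90, 180, and 270 degrees, so 133
# mitotic samples become 133 x 8, and so on.

def expand_patch(patch):
    patch = np.asarray(patch)
    variants = []
    for img in (patch, np.fliplr(patch)):      # original + horizontal mirror
        for k in range(4):                     # four 90-degree rotations
            variants.append(np.rot90(img, k))
    return variants
```

For a patch with no internal symmetry, all eight variants are distinct, which is what makes the expansion useful against arbitrary cell rotation.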
Table 1 Performance of the tracking model on the OTB15 dataset. Values are success rate in %, with central location error in pixels in parentheses

Sequence   Ours          DLT           MIL           MTT           VTD
Car4       100 (5.8)     100 (6.0)     24.7 (81.8)   100 (3.4)     35.2 (41.5)
Car11      86.4 (13.2)   100 (1.2)     68.7 (19.3)   100 (1.3)     65.6 (23.9)
Women      69.2 (8.9)    67.1 (9.4)    12.2 (123.7)  19.8 (257.8)  17.1 (133.6)
Animal     88.5 (8.9)    87.3 (10.2)   63.4 (16.1)   88.7 (11.1)   91.5 (10.8)
Walking    100 (0.9)     29.8 (11.2)   68.4 (19.3)   100 (1.2)     65.6 (23.9)
Surfer     100 (1.0)     86.5 (4.6)    42.4 (16.2)   82.1 (7.5)    90.3 (5.8)
Football   77.2 (5.4)    54.4 (7.1)    55.1 (16.0)   71.1 (6.5)    80.8 (4.1)
Box        82.2 (15.6)   72.5 (24.0)   65.1 (109.0)  25.6 (54.8)   34.1 (114.1)
The proposed model is compared with some state-of-the-art trackers on eight challenging benchmark video sequences. These trackers are as follows: DLT [26], MIL [32], MTT [33], and VTD [34]. We use the original implementations and default parameter settings of these trackers provided by their authors, as these give their best results.
Fig. 10 Comparison of different trackers in terms of the bounding box reported
3.3 Results

Both tracking and mitosis detection results are illustrated in this section. The tracking performance of the proposed framework is evaluated first and compared with some commonly used tracking algorithms on the OTB15 dataset [31]. Next, the TDNNs are tested for tracking stem cells in the same frame. After that, the TDNNs are used to track different stem cells and simultaneously detect their mitosis. As mitotic events occur along with the motion of stem cells, we manually marked all of the mitotic events of the target stem cells in this experiment to verify the detection accuracy of the TDNNs.
Tracking The tracking part of the TDNNs is first tested on videos of the OTB15 dataset [31] to verify its efficacy. Results are shown in Table 1. Two common tracking performance metrics are used for quantitative comparison: successful tracking rate and central location error. A tracker is considered successful in a frame if the overlap percentage of its tracking result (denoted as op) is larger than 0.5, where op is computed as:

op = area(bb_tracker ∩ bb_truth) / area(bb_tracker ∪ bb_truth),   (6)

where bb_tracker denotes the bounding box produced by the tracker, and bb_truth denotes the ground-truth bounding box. The central location error denotes the Euclidean distance between the centers of bb_tracker and bb_truth.
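The overlap score op of Eq. 6 might be computed as follows for the (x, y, h, l) box convention used in this paper, assuming (x, y) is the box center as defined in Sect. 2:

```python
# Intersection-over-union of Eq. 6 for axis-aligned boxes given as
# (x, y, h, l) with (x, y) the center, h the height, and l the width.

def overlap(bb_a, bb_b):
    def to_corners(bb):
        x, y, h, l = bb
        return x - l / 2, y - h / 2, x + l / 2, y + h / 2
    ax0, ay0, ax1, ay1 = to_corners(bb_a)
    bx0, by0, bx1, by1 = to_corners(bb_b)
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))   # intersection width
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))   # intersection height
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0
```

A frame then counts as a tracking success when this score exceeds 0.5 (or 0.4 for the small stem cells, per Sect. 3.3).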
Fig. 11 Tracking results of three different cells in the same frame
Table 2 Comparison of different trackers on selected stem cells. Values are success rate in %, with central location error in pixels in parentheses

Sequence   Ours                      DLT          VTD           MIL           MTT
Cell1      89.2 (3.7) / 84.4 (4.7)   83.4 (6.0)   23.5 (33.8)   51.0 (21.4)   81.4 (4.9)
Cell2      95.1 (3.2) / 89.3 (7.4)   85.5 (6.4)   28.4 (31.5)   59.8 (21.2)   85.3 (5.7)
Cell3      92.2 (3.6) / 87.1 (8.2)   82.6 (7.9)   19.6 (35.2)   42.2 (26.9)   89.1 (4.5)
For "Ours", the first value is the result with morphological transformation and the second value (after the slash) is the result without morphological transformation during the sampling phase

Table 3 The detection results of TDNNs for selected cells. It achieves a detection accuracy of 88.9% for normal cells and 63.2% for mitotic cells. Numbers in parentheses are correct detections

Stem cell   All frames   Mitosis   Normal      Accuracy (%)
Cell1       102          3 (2)     99 (89)     89.2
Cell2       102          3 (2)     99 (92)     92.1
Cell3       102          4 (2)     98 (81)     81.4
Cell4       102          4 (4)     98 (91)     93.1
Cell5       102          2 (1)     100 (88)    87.3
Cell6       102          3 (1)     99 (86)     85.3
Total       612          19 (12)   593 (527)   88.0
The best results are highlighted. The TDNNs achieve the best results on Car4, Women, Walking, Surfer, and Box, and also the best results on average over all eight video sequences compared with the other trackers. For Football and Animal, the targets are a fast-moving athlete and an animal with motion blur (like cells); other methods failed in some frames, while the TDNNs track the targets to the end with fewer mistakes. Parameters for the particle filter-based sampling algorithm are set to N = 1500 and K = 40, as previously mentioned. The same network architecture is used in this experiment. Only the original image patches are used for training in this sampling phase; morphological transformation is not used here. The model is pre-trained on the CIFAR-10 dataset and fine-tuned using images sampled from the current frames, as described in Sect. 2.3. Figure 10 shows a comparison of different trackers in terms of the reported bounding boxes on some video sequences; compared with other approaches, the proposed method demonstrates better results. Next, the framework is used for the cell tracking task. To address the various morphological deformations of stem cells, a morphological transformation is used during the sampling phase, as introduced in Fig. 5. Three stem cells are randomly selected, and their ground-truth bounding boxes are labeled manually. Tracking results are shown in Fig. 11. As the stem cells are very small (about 10 × 10 pixels), the overlap threshold is set to 0.4. The first 102 frames of all 300 frames were
chosen for this experiment because they contain the most frequent cell movement. The results are shown in Table 2, where the bounding boxes of the target cells are given only in the first frame. The TDNNs show strong robustness to cell deformation during tracking: among all compared trackers, they achieve the best results in both tracking success rate and central-pixel error. With the morphological transformation phase, the central-pixel errors remain small for all cell sequences throughout the tracking process. The results without morphological transformation during the sampling phase are also shown; they reveal that the morphological transformation plays an important part in the TDNNs and improves the generalization ability of the model.

Mitosis detection

As mentioned above, the mitosis detection part is trained on a selected dataset containing 133 mitotic and 150 normal (non-mitotic) samples. The dataset is first split randomly into a training set (100 mitotic and 110 normal samples) and a validation set (33 mitotic and 40 normal samples). The model is trained on the training set and tested on the validation set, reaching an average accuracy of 87.6% under fivefold cross-validation. Next, all samples are used to train the mitosis detection network to detect mitotic events during the tracking process. Six cells, each of which has split more than once, are chosen for this experiment, and all mitotic events (the frames in which mitosis occurs during the cells' motion) are annotated manually. For each stem cell, the initial position is given to the TDNNs, which then track the cell's motion and detect its mitotic events frame by frame. Notably, as the TDNNs are in essence a single-target tracking model, when mitosis occurs they continue to track only the one of the two daughter cells with the maximum probability. As shown in Eq.
7, if the mitosis frame detected by the detection model, denoted F_detect, is near the labeled mitosis frame F_label, it is considered a correct detection. As shown in Table 3, the numbers in parentheses denote the correct detections among all mitotic and normal frames. The detection model achieved an overall accuracy of 88.0% and correctly identified 12 of the 19 mitotic events. Some of the detection results are shown in Figs. 12 and 13. Finally, the proposed TDNNs
Fig. 12 Mitosis detection results of cells. The cell in the mitosis frame detected by the detection model is labeled with a black bounding box
Fig. 13 Tracking results of one cell
framework achieves an average frame rate of 1.6 fps (frames per second) on an i7 3.6 GHz dual-core PC with an Nvidia TitanZ GPU for both the tracking and detection tasks. The correctness criterion for a detected mitosis frame is

|F_detect − F_label| ≤ 1.  (7)
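Under Eq. 7, a detected mitosis frame counts as correct when it lies within one frame of the manual annotation. A minimal sketch of this matching step with hypothetical frame indices (the paper does not publish its evaluation code, so the greedy matching shown here is an assumption):

```python
def match_mitosis(detected, labeled, tol=1):
    """Greedily match detected mitosis frames to labeled ones.

    A detection F_detect is correct when |F_detect - F_label| <= tol (Eq. 7).
    Each labeled event can be matched at most once.
    Returns the number of correctly detected mitotic events.
    """
    unmatched = sorted(labeled)
    correct = 0
    for f in sorted(detected):
        for g in unmatched:
            if abs(f - g) <= tol:
                unmatched.remove(g)  # consume this labeled event
                correct += 1
                break
    return correct

# Hypothetical example: three labeled events; the detector fires near two of them.
print(match_mitosis(detected=[12, 40, 77], labeled=[11, 50, 78]))  # -> 2
```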
4 Conclusion and future work

In this manuscript, a deep learning-based framework for stem cell tracking, TDNNs, is proposed and applied to a real stem cell dataset. By sharing the same features from the convolutional layers, the TDNNs can detect mitosis of the tracked cell as a simultaneous assistant task. The results show that the framework achieves better performance on single-cell tracking tasks than several other tracking algorithms and performs well on the detection task during tracking. Although accurate automated cell tracking remains challenging, deep learning approaches certainly offer insights into technological improvements. Our future work will attempt to expand the TDNNs to multi-target tracking, since the framework can currently follow only one of the two daughter cells after mitosis occurs.

Acknowledgements This work was supported by the Key Program of National Natural Science Foundation of China (61402306, 61432012, U1435213).
Compliance with ethical standards

Conflict of interest We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature in any product, service, or company that could be construed as influencing the position presented in, or the review of, this manuscript.