APPLIED PROBLEMS
A Real-Time Algorithm for Small Group Detection in Medium Density Crowds

Jie Shao (a,*), Nan Dong (b), and Qian Zhao (a)

(a) Department of Electronic and Information Engineering, Shanghai University of Electric Power, Shanghai, China
(b) Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, China
*e-mail: [email protected]

Received March 5, 2017
Abstract—In this paper, we focus on the task of small group detection in crowded scenarios. Small groups are widely considered one of the basic elements of a crowd, so distinguishing group members from lone individuals is a major challenge, and a basic problem in video surveillance and scene understanding. We propose a solution that runs in real time and works in both low and medium density crowded scenes. In particular, we build a social-force-based collision avoidance model for each individual to predict its goal direction, and employ the predicted goal directions, instead of the traditional positions and velocities, in collective motion detection to find group members. We evaluate our approach on three datasets comprising tens of challenging crowded scenarios. The experimental results demonstrate that our approach is not only highly accurate but also much faster in practice than other state-of-the-art methods.

Keywords: small group detection, medium density crowded scenes, goal prediction, goal based coherent filtering

DOI: 10.1134/S1054661818020074
INTRODUCTION

Detection and tracking of moving subjects is one of the foundational tasks in crowd analysis. Traditionally, individuals are treated as the only moving subjects in a crowd. Recently, however, small groups have been recognized as another important basic element of which crowds are composed [1]. Research on group detection falls into two categories. The first is group clustering based on motion characteristics: some of these works are similar to crowd segmentation, dividing crowds into large groups with different motions, such as [2, 3], while others are interested in the semantic understanding of human motions, such as [4]. The second category is small group detection, which focuses on distinguishing individuals with social relationships from strangers [5-10]. Technically, the small groups involved in the latter category are closer to the concept of a group defined in [1].

Among previous small group detection approaches, some [7] focused on the interactions inside the group and ran experiments on simulated scenarios. Others [9] concentrated on low density crowds, experimenting on datasets such as the BIWI Walking Pedestrians dataset [11]. Clustering is a common approach: Ge et al. [6] suggested a hierarchical clustering method over trajectory and velocity features to detect groups. Alternatively, the social force model (SFM) is widely used in the crowd simulation community [8, 10, 17]. Šochman and Hogg [8] tried to directly estimate the group clustering parameters of the SFM from tracking observations. Further, Mazzon et al. [10] employed a buffered greedy graph-based multi-target tracker for group tracking after SFM-based detection. All the above solutions rely on positions and velocities, which are transient features and may change with the environment. Besides, most approaches report experimental results mainly on low density crowds and lack evaluation on medium density crowds. Lately, Solera et al. [5] provided a solution for grouping in medium density crowds. They jointly adopted sociologically grounded features and a structural SVM learning framework in their grouping algorithm, and presented state-of-the-art results when relying on ground truth trajectories previously extracted by a separate detector/tracker system. Unfortunately, their method is based on off-line learning and is time consuming: even omitting training time, it still needs seconds or even minutes per frame for feature extraction and inference.

This paper offers an effective small group detection solution that can not only deal with both low and medium density crowded scenes, but also run in real time. The key novelty of our approach is the joint adoption of an improved social force model for goal prediction and a goal-based coherent filtering algorithm for group detection.
[Fig. 1 appears here. Pipeline blocks: video acquisition → individual detection results → goal direction prediction model → goal prediction results → goal based coherent filtering → group detection results. Legend: pedestrian, velocity, driving force, repulsive force, evasive force, trajectory, obstacle; center, invariant neighbor, removed neighbor, unrelated, neighborhood, goal direction.]
Fig. 1. Overview of our group detection algorithm. Circles of different colors indicate individuals' positions in the "Detection results" block. Arrows indicate the predicted goal directions of individuals in the "Goal prediction" block. Individuals in the same group are connected by lines in the "Group detection results" block.
To this end, given the detected trajectories, we build a social-force-based collision avoidance model [12] for each individual. After computing the repulsive, evasive, and driving forces, the goal direction of each individual is predicted. The K nearest neighbor algorithm is then applied to find the K neighbors of each individual. In the resulting clusters, we embed goal directions into Goal based Coherent Filtering (G-CF), so that individuals characterized by coherent goal invariance are classified into the same group. The whole solution is shown in six schematic blocks in Fig. 1. The details of each step are described in the next section.

IMPLEMENTATION

We derive a novel solution for small group detection in low and medium density crowds. Our algorithm runs at 25 frames/second on 720 × 576 resolution videos and is highly accurate across various scenarios. Our major contribution is that it delivers high-quality results on crowds of different densities and runs at high speed even on denser scenes. Since pedestrian detection and tracking have already achieved outstanding results, we start our small group detection process from the pedestrian trajectories produced by the detector of Dollar et al. [13] and the tracker of Milan et al. [14].
Goal Prediction

Let P = {p_1, ..., p_i, ..., p_n} be the set of n people in the scene, and let o_i = (x_i, y_i) be the feet position of person p_i. The Predictive Collision Avoidance Model [12] has been used to simulate how an individual moves while avoiding possible collisions; in our approach, we apply it to goal prediction. The model defines three forces: the driving force, the repulsive force, and the evasive force. The driving force F_g makes the individual p_i move towards its goal g_i. The repulsive force F_r avoids collisions with the environment, such as walls or barriers. The evasive force F_e comes from the intent of the individual p_i to keep a certain distance from others in order to feel comfortable. In this paper we focus only on the prediction of the goal direction.

The behavior of an individual p_i derives from the following assumptions. The individual p_i is at position o_i and tries to reach its desired goal position g_i under a driving force F_g, defined as

$$F_g = \frac{1}{\tau}\left(v_i^{des} - v_i\right) \;\Rightarrow\; v_i^{des} = \tau F_g + v_i. \tag{1}$$

Here v_i^des is the velocity that p_i would like to have in order to reach its goal g_i, and p_i plans to change its velocity from the current v_i to v_i^des within a certain time τ; v_i^des is computed from Eq. (1) given F_g. The desired moving direction is then defined as

$$\eta_i = \frac{v_i^{des}}{\|v_i^{des}\|} = \frac{g_i - o_i}{\|g_i - o_i\|}. \tag{2}$$

Since the desired moving direction η_i of p_i also points towards its goal, we also call it the goal direction in this paper.
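To make Eqs. (1) and (2) concrete, here is a minimal NumPy sketch (not the authors' code) that recovers the predicted goal direction once the driving force is known; the value of τ is an assumed placeholder, since the paper does not report the one it uses.

```python
import numpy as np

def goal_direction(F_g, v_i, tau=0.5):
    """Predict the goal direction eta_i of Eqs. (1)-(2).

    F_g : (2,) driving force acting on pedestrian p_i
    v_i : (2,) current velocity of p_i
    tau : relaxation time; 0.5 s is an assumed placeholder value
    """
    v_des = tau * F_g + v_i          # Eq. (1), rearranged for v_i^des
    norm = np.linalg.norm(v_des)
    if norm < 1e-9:                  # degenerate case: no preferred direction
        return np.zeros(2)
    return v_des / norm              # Eq. (2): unit vector towards the goal

# Example: a driving force pushing right while currently moving upward.
eta = goal_direction(np.array([2.0, 0.0]), np.array([0.0, 1.0]))
```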
We then need to consider how to obtain F_g. Since the movement of an individual p_i is controlled by the three forces simultaneously, the velocity of p_i at time t + 1 derives from the following equation:

$$v_i(t + 1) = v_i(t) + (F_g + F_r + F_e)\,\Delta t. \tag{3}$$

After computing F_r and F_e, F_g can be obtained from Eq. (3). The definition of F_r is as follows:

$$F_r = \begin{cases} n_{iw}\,\dfrac{d_{safe} - d_{iw}}{(d_{iw})^{\kappa}}, & d_{iw} < d_{safe}, \\[2mm] 0, & \text{otherwise}, \end{cases} \tag{4}$$

where d_iw is the shortest distance between p_i and the barrier, n_iw is the unit direction from the barrier towards p_i, d_safe is the safety threshold between p_i and the barrier, and κ is a pre-defined parameter; here κ = 1.

The evasive force F_e exerted on the individual p_i by another individual p_j is determined by two factors: the relative position o_i − o_j of p_i with respect to p_j, and their velocities v_i and v_j. We therefore define F_e as follows. Taking p_i as the center and Thr_n as the radius of its neighborhood, all the p_j in this neighborhood are called candidates. We compute the angle ∠((o_j − o_i), v_i) between p_i's velocity and the direction towards each candidate. If ∠((o_j − o_i), v_i) < Thr_angle, the candidate p_j may meet p_i later, so we assume that it exerts an evasive force on p_i. The possible meeting time t_c of p_i and p_j is found from

$$\bigl|(o_j - o_i) - (v_i - v_j)\,t_c\bigr| = Thr_w, \tag{5}$$

where Thr_w is the precaution threshold, the nearest distance that an individual can accept. Denoting the solutions of Eq. (5) by t_1 and t_2, t_c is computed as

$$t_c = \begin{cases} 0, & t_1 < 0,\ t_2 < 0, \\ t_1, & t_1 > 0 > t_2, \\ t_2, & t_2 > 0 > t_1, \\ \min(t_1, t_2), & t_1 > 0,\ t_2 > 0. \end{cases} \tag{6}$$
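The sketch below illustrates Eq. (4) and the meeting-time computation of Eqs. (5) and (6), obtained by solving the quadratic that Eq. (5) implies. It is an illustrative reconstruction: the default values of d_safe and Thr_w are assumptions, as the paper only fixes κ = 1.

```python
import numpy as np

def repulsive_force(n_iw, d_iw, d_safe=2.0, kappa=1.0):
    """Eq. (4): repulsion from the nearest wall/barrier.

    n_iw   : (2,) unit vector from the barrier towards p_i (assumed meaning)
    d_iw   : shortest distance between p_i and the barrier
    d_safe : safety threshold; 2.0 is an assumed placeholder value
    """
    if d_iw < d_safe:
        return n_iw * (d_safe - d_iw) / d_iw ** kappa
    return np.zeros(2)

def meeting_time(o_i, o_j, v_i, v_j, thr_w=0.5):
    """Eqs. (5)-(6): earliest time t_c at which p_i and p_j come within the
    precaution distance Thr_w; thr_w = 0.5 is an assumed placeholder.
    """
    d = o_j - o_i
    w = v_i - v_j
    # |d - w t| = thr_w  <=>  (w.w) t^2 - 2 (d.w) t + (d.d - thr_w^2) = 0
    a, b, c = w @ w, -2.0 * (d @ w), d @ d - thr_w ** 2
    if a < 1e-12:
        return 0.0                  # relative velocity ~ 0: gap never shrinks
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return 0.0                  # paths never come within thr_w of each other
    t1 = (-b + np.sqrt(disc)) / (2.0 * a)
    t2 = (-b - np.sqrt(disc)) / (2.0 * a)
    if t1 < 0.0 and t2 < 0.0:       # Eq. (6), case by case
        return 0.0
    if t1 > 0.0 > t2:
        return t1
    if t2 > 0.0 > t1:
        return t2
    return min(t1, t2)
```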
The positions of p_i and p_j after t_c frames can then be predicted; we denote them by o_i' and o_j', respectively. D_j = |o_i' − o_i| + |o_i' − o_j'| is the distance p_i must cover to avoid a collision with p_j. Following [12], the evasive force from p_j on p_i is defined through a linear function f(D_j); the definition of f(·) is described in detail in [12]. Finally, the total evasive force on p_i from M people is

$$F_e = \sum_{j}^{M} \omega_j\, f(D_j)\, \frac{o_i' - o_j'}{|o_i' - o_j'|}. \tag{7}$$
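A minimal sketch of the evasive-force sum of Eq. (7) follows. The ramp function f and the weights ω_j are placeholders here, since their exact definitions are given in [12] rather than in this paper; uniform weights are assumed when none are supplied.

```python
import numpy as np

def evasive_force(o_i, v_i, neighbours, f, weights=None):
    """Eq. (7): total evasive force on p_i from M candidate neighbours.

    o_i, v_i   : (2,) position and velocity of p_i
    neighbours : list of (o_j, v_j, t_c) with t_c from Eqs. (5)-(6)
    f          : distance-to-force mapping f(D_j); a linear function per [12]
    weights    : per-neighbour omega_j; uniform weights assumed when absent
    """
    if weights is None:
        weights = np.ones(len(neighbours))
    F_e = np.zeros(2)
    for w_j, (o_j, v_j, t_c) in zip(weights, neighbours):
        oi_p = o_i + v_i * t_c             # predicted position of p_i after t_c
        oj_p = o_j + v_j * t_c             # predicted position of p_j after t_c
        D_j = np.linalg.norm(oi_p - o_i) + np.linalg.norm(oi_p - oj_p)
        gap = oi_p - oj_p
        n = np.linalg.norm(gap)
        if n > 1e-9:
            F_e += w_j * f(D_j) * gap / n  # push away from the predicted encounter
    return F_e

# Example with an assumed linear ramp for f (coefficients are placeholders):
f_lin = lambda D: max(0.0, 2.0 - 0.25 * D)
```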
The third block in Fig. 1 sketches these three forces together with the velocity and goal direction of an individual. Individuals are represented by brown points; we take the topmost point as an example. It receives three forces from different directions, derived from its own motivation, the obstacle in front of it, and another individual by its side; these are indicated by the yellow, blue, and red arrows, respectively. The motion of the individual is decided by the combination of the three forces, so the green arrow, indicating its velocity, points in another direction. According to Eqs. (1)-(3), the dotted brown arrow is the direction pointing to its goal, i.e., the predicted goal direction.

Small Group Detection
A group is defined as two or more people interacting to reach a common goal and perceiving a shared membership, based on both physical and social identity. From this definition we summarize two key properties of group members: they share a common goal and they stay in each other's neighborhood. Accordingly, we distinguish group members from other people by two requirements: (1) they form the majority with coherent motion in a neighborhood within a certain distance; (2) they maintain high goal relevance over a certain time (Table 1).

We propose our group detection algorithm following these two requirements. As defined in the previous subsection, the individual p_i has goal direction η_i and feet position o_i. The K nearest neighbors of p_i, N_i(p_1, ..., p_j, ..., p_K), form the initial neighborhood of p_i. Based on the common observation that group members usually stay closer to each other than strangers do, we require the distance between group members to be shorter than a threshold Thr, defined as three times a person's average diameter in the scene. Individuals more than Thr away from p_i are filtered out. To find individuals staying in p_i's neighborhood for a consecutive time T, we associate p_i's neighborhoods from time t to time t + T; individuals that stay in the neighborhood for the whole period qualify. The resulting group is G_i(t). The final step is to compute the correlation of goal directions between p_i and its neighbors and retain its average value over the T consecutive frames. Consequently, we use the following definitions:
$$C_{i,j}^{t}(p_i, p_j) = \frac{\eta_i \cdot \eta_j}{|\eta_i|\,|\eta_j|}, \qquad \bar{C}_{i,j}(p_i, p_j) = \frac{1}{T} \sum_{t}^{t+T} C_{i,j}^{t}(p_i, p_j). \tag{8}$$
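Eq. (8) is the cosine similarity of the two goal directions averaged over T consecutive frames; a direct NumPy transcription (illustrative, with an assumed array layout):

```python
import numpy as np

def goal_correlation(etas_i, etas_j):
    """Eq. (8): goal-direction correlation averaged over T consecutive frames.

    etas_i, etas_j : (T, 2) arrays of goal directions of p_i and p_j
    """
    num = np.sum(etas_i * etas_j, axis=1)               # eta_i . eta_j per frame
    den = np.linalg.norm(etas_i, axis=1) * np.linalg.norm(etas_j, axis=1)
    return float(np.mean(num / np.maximum(den, 1e-9)))  # time-averaged cosine
```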
Table 1. Algorithm: Goal based coherent filtering

Input: {p_i(o_i, η_i) | i = 1 : n}
for d = 0 to T
    for i = 1 to n
        Search the K nearest neighbors of p_i, resulting in N_i(t + d) = {p_j(o_j(t + d), η_j(t + d))}
        if dist(p_i, p_j) > Thr_g then
            delete p_j from N_i(t + d)
        end if
    end for
end for
if p_j ∈ {N_i(t) ∩ … ∩ N_i(t + T)} then
    p_j → G_i(t)
end if
for ∀ p_j ∈ G_i(t)
    compute C̄_{i,j}(p_i, p_j) based on Eq. (8)
    if C̄_{i,j}(p_i, p_j) < λ then
        delete p_j from G_i(t)
    end if
end for
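For illustration, here is a compact Python sketch of Table 1 (not the authors' Matlab implementation). The per-frame dictionaries of positions and goal directions are an assumed data layout; K = 10 and λ = 0.8 follow the experiments section, while the values of Thr_g and T are placeholders (the text sets the distance threshold to three times a person's average diameter and does not report T).

```python
import numpy as np

def goal_based_coherent_filtering(pos, eta, t, K=10, T=10, thr_g=60.0, lam=0.8):
    """Sketch of G-CF (Table 1).

    pos, eta : dicts frame -> {person_id: (2,) position / goal direction}
    t        : starting frame
    Returns  : dict person_id -> set of detected group members G_i(t)
    """
    groups = {}
    for i in pos[t]:
        surviving = None
        for d in range(T + 1):
            frame_pos = pos[t + d]
            if i not in frame_pos:
                surviving = set()              # p_i left the scene: no group
                break
            o_i = np.asarray(frame_pos[i])
            # K nearest neighbours of p_i at frame t + d ...
            cand = sorted((j for j in frame_pos if j != i),
                          key=lambda j: np.linalg.norm(np.asarray(frame_pos[j]) - o_i))
            # ... kept only if within the distance threshold Thr_g
            neigh = {j for j in cand[:K]
                     if np.linalg.norm(np.asarray(frame_pos[j]) - o_i) <= thr_g}
            surviving = neigh if surviving is None else surviving & neigh
        groups[i] = set()
        for j in surviving:
            # Eq. (8): time-averaged goal-direction correlation over [t, t + T]
            e_i = np.array([eta[t + d][i] for d in range(T + 1)])
            e_j = np.array([eta[t + d][j] for d in range(T + 1)])
            num = np.sum(e_i * e_j, axis=1)
            den = np.maximum(np.linalg.norm(e_i, axis=1)
                             * np.linalg.norm(e_j, axis=1), 1e-9)
            if float(np.mean(num / den)) >= lam:
                groups[i].add(j)
    return groups
```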
Table 2. Parameters of testing videos

Video        #p     #g    d_in (pixel)   d_out (pixel)   d_i/o
student003   30     8     34             106             0.32
GVEII        39     10    36             100             0.36
1dawei5      45     4     26             72              0.36
1shatian3    225    40    11             23              0.48
Finally, we apply a threshold λ to C̄_{i,j}(p_i, p_j) to decide whether a neighbor's moving direction is highly relevant to that of p_i. Referring to the fifth block in Fig. 1, the black point in the center is assumed to be p_i. The green and red points are individuals in its neighborhood after filtering, and the black arrows are their goal directions. After the computation, the correlations between p_i and the green points fall below λ, so only the red point is a group member of p_i.

EXPERIMENTS

As far as we can tell, among previous related works, that of Solera et al. [5] includes the most experimental results on videos of medium density crowds. We therefore evaluate our approach by comparing with theirs on three datasets. We first give a brief description of the experimental datasets, then show our results, and finally present the comparisons.

The first dataset is the crowds-by-example dataset [15], briefly named student003. It includes 406 people and 108 groups and is widely used in small group detection work. The second dataset is the Vittorio Emanuele II Gallery dataset, briefly named GVEII. It includes 117 people and 18 groups and was recently provided by Bandini et al. [16]. The third dataset, provided by Solera et al. and named MPT-20X100, is composed of 20 medium density crowd videos of 100 frames each. All the videos in MPT-20X100 are characterized by a high number of pedestrians.

We compare density parameters in Table 2: d_in is the average distance between small group members in the video; d_out is the shortest distance between group members and non-group members; d_i/o = d_in/d_out is the density measure (the more crowded the scene, the larger d_i/o); #p is the average number of persons in the video; #g is the average number of small groups. As shown in Table 2, the d_i/o values of the four videos range from 0.3 to 0.5, which indicates that they all belong to medium density crowd scenes.

Figure 2 shows results on four videos from the above three datasets. The first and second rows show results on student003 and GVEII; the third and fourth rows show results on the videos 1dawei5 and 1shatian3 from MPT-20X100. Each individual is indicated by a point at its feet position, and we use red lines to indicate individuals in the same group. To show the results more clearly, only the central areas of the scenes are presented. As can be seen in Fig. 2, although d_i/o of video student003 is the lowest in Table 2, the motion patterns in the scene are variable, including groups of two, three, and four, and pedestrians may be occluded by one another. Besides, the video is captured outdoors, so tracking results may be unstable due to light variation and shadows. The d_i/o values of GVEII and 1dawei5 are higher than that of student003; small groups in these two scenes mostly walk in pairs. The scene in video 1shatian3 is more complex than the other three: it not only contains a mass of people but also has the highest density, and the resolution of individuals is too low to detect their trajectories accurately. Even so, our method performs well on all these videos.

Fig. 2. Qualitative results on sequences of the four test videos. Group members are connected by red lines. 1st row: video student003; 2nd row: video GVEII; 3rd and 4th rows: videos 1dawei5 and 1shatian3 from the MPT-20X100 dataset.

Quantitative comparison results are given in Table 3, which reports precision, recall, and running speed. Our experiments were implemented in Matlab on a PC with an Intel Core i5 CPU and 4 GB of RAM; λ is set to 0.8 and K to 10. We manually annotated the ground truth of all the videos. The results show that our method is much faster than the one proposed in [5], and we obtain much better results on GVEII than [5]. On student003 and 1dawei5, our method is on par with theirs. The false negative rate is high in our results on video 1shatian3, which implies that our method performs very well on low and medium density crowds, but the number of undetected groups grows once the density increases beyond a certain extent.
Table 3. Comparative results on the challenging datasets

(a) Our method

Dataset      Precision (%)   Recall (%)   Time (s/frame)
student003   87.69           78.08        0.04
GVEII        92.55           84.5         0.03
1dawei5      94.11           87.45        0.06
1shatian3    92.30           62.13        0.18

(b) Solera et al. [5]

Dataset      Precision (%)   Recall (%)   Time (s/frame)
student003   81.91           81.27        12.3
GVEII        84.12           84.11        16.75
1dawei5      100             91.95        10.18
1shatian3    84.9            83.59        492
CONCLUSION

This paper proposes a small group detection approach for crowded scenes of higher density. The group detector improves on state-of-the-art approaches by leveraging goal direction invariance instead of velocities. The goal direction of each individual is predicted with an improved social force model, and group members are selected strictly from among neighboring people who share a similar direction for a consecutive period. We show that our solution for group detection outperforms state-of-the-art methods. As future work, we are considering modeling the interactions inside a group and analyzing them.

REFERENCES

1. M. Moussaid, N. Perozo, S. Garnier, et al., "The walking behaviour of pedestrian social groups and its impact on crowd dynamics," PLoS ONE 5 (4), e10047 (2010).
2. J. Shao, N. Dong, and Q. Zhao, "An adaptive clustering approach for group detection in the crowd," in Proc. 2015 22nd Int. Conf. on Systems, Signals and Image Processing (IWSSIP) (IEEE, 2015), pp. 77–80.
3. B. Zhou, X. Tang, and X. Wang, "Coherent filtering: Detecting coherent motions from crowd clutters," in Computer Vision — ECCV 2012, Part II, Ed. by A. Fitzgibbon, Lecture Notes in Computer Science (Springer, Heidelberg, 2012), Vol. 7573, pp. 857–871.
4. I. O. Nunes, P. O. S. Vaz de Melo, and A. A. F. Loureiro, "Group mobility: Detection, tracking and characterization," in Proc. 2016 IEEE Int. Conf. on Communications (ICC) (IEEE, 2016), pp. 4940–4945.
5. F. Solera, S. Calderara, and R. Cucchiara, "Socially constrained structural learning for groups detection in crowd," IEEE Trans. Pattern Anal. Mach. Intellig. 38, 995–1008 (2015).
6. W. Ge, R. Collins, and R. Ruback, "Vision-based analysis of small groups in pedestrian crowds," IEEE Trans. Pattern Anal. Mach. Intellig. 34, 1003–1016 (2012).
7. I. Karamouzas and M. Overmars, "Simulating and evaluating the local behavior of small pedestrian groups," IEEE Trans. Visualization Comput. Graphics 18, 394–406 (2011).
8. J. Šochman and D. C. Hogg, "Who knows who — Inverting the Social Force Model for finding groups," in Proc. 2011 IEEE Int. Conf. on Computer Vision Workshops (ICCV Workshops) (IEEE, 2011), pp. 830–837.
9. L. Bazzani, M. Cristani, and V. Murino, "Decentralized particle filter for joint individual-group tracking," in Proc. 2012 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2012), pp. 1886–1893.
10. R. Mazzon, F. Poiesi, and A. Cavallaro, "Detection and tracking of groups in crowd," in Proc. 2013 10th IEEE Int. Conf. on Advanced Video and Signal Based Surveillance (AVSS) (IEEE, 2013), pp. 202–207.
11. S. Pellegrini, A. Ess, and K. Schindler, "You'll never walk alone: Modeling social behavior for multi-target tracking," in Proc. 2009 IEEE 12th Int. Conf. on Computer Vision (IEEE, 2009), pp. 261–268.
12. I. Karamouzas, P. Heil, P. van Beek, and M. H. Overmars, "A predictive collision avoidance model for pedestrian simulation," in Motion in Games, Ed. by A. Egges, R. Geraerts, and M. Overmars, Lecture Notes in Computer Science (Springer, Berlin, 2009), Vol. 5884, pp. 41–52.
13. P. Dollar, R. Appel, S. Belongie, and P. Perona, "Fast feature pyramids for object detection," IEEE Trans. Pattern Anal. Mach. Intellig. 36, 1532–1545 (2014).
14. A. Milan, S. Roth, and K. Schindler, "Continuous energy minimization for multitarget tracking," IEEE Trans. Pattern Anal. Mach. Intellig. 36, 58–72 (2014).
15. A. Lerner, Y. Chrysanthou, and D. Lischinski, "Crowds by example," Computer Graphics Forum 26, 655–664 (2007).
16. S. Bandini, A. Gorrini, and G. Vizzari, "Towards an integrated approach to crowd analysis and crowd synthesis: A case study and first results," Pattern Recognit. Lett. 44, 16–29 (2014).
17. V. Kaltsa, A. Briassouli, I. Kompatsiaris, and M. G. Strintzis, "Swarm-based motion features for anomaly detection in crowds," in Proc. 2014 IEEE Int. Conf. on Image Processing (ICIP) (IEEE, 2014), pp. 2353–2357.
Jie Shao received the B.S. and M.S. degrees from Nanjing University of Aeronautics and Astronautics and the Ph.D. degree from Tongji University in 2012. She is currently an associate professor at Shanghai University of Electric Power. Her current research interests include video surveillance, facial expression recognition, and human activity analysis.
Nan Dong received the B.Eng. and M.Sc. degrees from Yantai University in 2004 and 2007, respectively, and the Ph.D. degree in control theory and control engineering from Tongji University in 2011. He is currently an associate research fellow at the Shanghai Advanced Research Institute, Chinese Academy of Sciences. His research interests are in computer vision and machine learning, including object recognition, human activity recognition, and image/video understanding.
Qian Zhao received the B.S. degree from Zhejiang University in 1992 and the M.S. and Ph.D. degrees from Shanghai University in 2005 and 2012, respectively. She is currently an associate professor and Master's Tutor at Shanghai University of Electric Power. Her main research directions include machine vision, pattern recognition, and IC design.