Semi-supervised Elastic net for pedestrian counting

doi:10.1016/j.patcog.2010.10.002

Pattern Recognition

Volume 44, Issues 10–11, October–November 2011, Pages 2297-2304

Abstract

Pedestrian counting plays an important role in public safety and intelligent transportation. Most pedestrian counting algorithms based on supervised learning require much labeling work and rarely exploit the topological information of unlabelled data in a video. In this paper, we propose a Semi-Supervised Elastic Net (SSEN) regression method by utilizing sequential information between unlabelled samples and their temporally neighboring samples as a regularization term. Compared with a state-of-the-art algorithm, extensive experiments indicate that our algorithm can not only select sparse representative features from the original feature space without losing their interpretability, but also attain superior prediction performance with only very few labelled frames.

Introduction

Pedestrian counting in public places plays a key role in many applications, such as evacuating people from a dense region to a sparse one when an emergency occurs [1], or optimizing the design of traffic infrastructure to provide better transportation services. Furthermore, social security and surveillance strongly depend on the effectiveness of pedestrian counting [2].

Generally speaking, the number of pedestrians can be estimated in two different ways. The first approach is based on detection or tracking methods, e.g. Leibe et al. [3] and Kim and Cipolla [4]. Such methods can count pedestrians accurately when pedestrian density is low. As the density increases, however, their performance deteriorates because occlusion becomes severe in a dense crowd. Other factors, such as variations in pedestrian height and pose or the presence of large bags, also impair performance. The second approach directly estimates the number of pedestrians in a scene from the pedestrians’ pixel and texture information. Because it does not identify individuals, one notable advantage of this approach is that pedestrians’ privacy is well preserved [5]. Assuming that the number of pixels occupied by pedestrians has a linear relationship with pedestrian counts, Davies et al. [6] used the pixel information to count pedestrians. However, this approach suffers from (1) the perspective effect, which means that human size in an image changes as the distance from the subject to the camera varies, and (2) overlap between pedestrians. For example, two overlapping pedestrians occupy fewer pixels than two non-overlapping pedestrians at the same distance from the camera. Both effects therefore invalidate the linear relationship assumed above.

To solve the perspective issue, Chan et al. [5] proposed a perspective normalization map in which each entry is a normalization factor for the corresponding pixel, with larger factors assigned to pixels whose scene areas are farther from the camera. Ma et al. [7] proposed another perspective correction matrix, based on the fact that a person in the scene is perpendicular to the ground plane, whether standing or walking. As for the overlapping issue, one remedy is to estimate the pedestrian density directly from each image. Marana et al. [8] claimed that some textures, such as the Gray Level Co-occurrence Matrix (GLCM), can describe the crowd density well since a dense crowd always has finer texture, whereas a sparse crowd has coarser texture. Observing that the pedestrian edge is also an effective feature for describing pedestrian density, Marana et al. [9] proposed the Minkowski fractal dimension and Kong et al. [10] employed an edge orientation histogram based on the pedestrian edge. Chan et al. [5] further improved performance by combining the Minkowski fractal dimension and edge orientation histogram features.
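The idea of a perspective normalization map can be illustrated with a minimal sketch. It is not the construction of Chan et al. [5] or Ma et al. [7]; it assumes that the apparent pedestrian height varies linearly with the image row and that area-like features should be scaled by the squared height ratio. The function name `perspective_weight_map` and all of its parameters are hypothetical.

```python
import numpy as np

def perspective_weight_map(n_rows, n_cols, row_far, height_far, row_near, height_near):
    """Per-pixel weights that grow for rows farther from the camera.

    Assumption: apparent pedestrian height (in pixels) is linear in the row
    index, measured at two reference rows (far = small height, near = large).
    """
    rows = np.arange(n_rows, dtype=float)
    slope = (height_near - height_far) / (row_near - row_far)
    apparent_height = height_far + slope * (rows - row_far)
    # Guard against non-positive heights when extrapolating beyond the references.
    apparent_height = np.maximum(apparent_height, 1.0)
    # Area scales with height squared, so normalise relative to the near row.
    row_weights = (height_near / apparent_height) ** 2
    return np.tile(row_weights[:, None], (1, n_cols))

def weighted_foreground_area(mask, weights):
    """Perspective-normalised pixel count of a binary foreground mask."""
    return float(np.sum(mask * weights))
```

With such a map, a distant pedestrian who covers few pixels contributes roughly as much weighted area as a nearby pedestrian who covers many, which is the property the linear pixel-count assumption needs.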

Note that all the aforementioned pedestrian counting algorithms operate within the supervised learning framework. A major disadvantage of these algorithms is that good performance requires a large number of labelled frames in the learning process. However, labeling many frames is tedious and labor-intensive. In addition, mislabeling happens easily during manual labeling.

Fortunately, a large number of unlabelled frames are available, and they provide useful topological information that can benefit pedestrian counting. In the machine learning community, the semi-supervised learning framework, which exploits unlabelled samples effectively, has been extensively investigated over the last decade [11]. One representative strategy of semi-supervised learning is to use a regularization technique to exploit the structure of unlabelled data. Zhu [11] proved that if the unlabelled data structure can approximate the true population distribution well, an expected result will be obtained. To better exploit information in unlabelled data, Zhu and Goldberg [12], Zhang et al. [13] and Hou et al. [14] introduced domain knowledge to build the regularization term for regression and dimension reduction, respectively. It is worth noting that semi-supervised regression is less studied than semi-supervised classification. The main reason is that in classification tasks the number of categories is finite, whereas target values are continuous in regression tasks. Zhou and Li [15] proposed a co-training style algorithm in which two different regressors make their respective estimations on unlabelled data and refine their performance by learning from each other's results. Rwebangira and Lafferty [16] proposed a semi-supervised locally linear regression by adding a weighted regularization term to the regression function. However, it is not easy to directly generalize these semi-supervised approaches to pedestrian counting since (1) it is difficult to find two proper views or distance metrics on pedestrian data for co-training, and (2) the local linearity property does not always hold for pedestrian data with rich features.

Another common issue in pedestrian counting is that a collection of features extracted from pedestrian video contains many redundant features that should be removed. Although the Lasso proposed by Tibshirani [17] can help to reduce such features, its efficiency depends greatly on the number of dimensions. Moreover, when the pairwise correlation between a group of features is high, the lasso chooses only one feature from the group. To overcome these two limitations, the Elastic net, a variant of the Lasso [17], was proposed by adding an L₂ norm constraint [18]. Therefore, we attempt to utilize the Elastic net to remove the redundant features that are only weakly related to the properties of pedestrians.
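The difference between the two penalties can be seen in a small coordinate-descent sketch. It is illustrative only and is not the solver used in this paper: it assumes the penalized (Lagrangian) form (1/2n)‖y − Xβ‖² + λ₁‖β‖₁ + (λ₂/2)‖β‖₂² rather than the constrained form, and the names `elastic_net_cd`, `lam1` and `lam2` are hypothetical. Setting `lam2 = 0` gives the lasso.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink a scalar towards zero by `threshold` (the lasso proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def elastic_net_cd(X, y, lam1, lam2, n_sweeps=200):
    """Cyclic coordinate descent for the penalized elastic net (assumed form).

    Minimises (1/2n)||y - X b||^2 + lam1 * ||b||_1 + (lam2/2) * ||b||_2^2.
    """
    n, m = X.shape
    beta = np.zeros(m)
    col_scale = (X ** 2).sum(axis=0) / n      # (1/n) * x_j^T x_j for each feature
    residual = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(m):
            residual += X[:, j] * beta[j]     # remove feature j's contribution
            rho = X[:, j] @ residual / n      # correlation with the partial residual
            beta[j] = soft_threshold(rho, lam1) / (col_scale[j] + lam2)
            residual -= X[:, j] * beta[j]     # add the updated contribution back
    return beta

# Two identical features plus one unrelated feature (synthetic data).
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
X = np.column_stack([x, x, rng.standard_normal(200)])
y = 2.0 * x

beta_lasso = elastic_net_cd(X, y, lam1=0.1, lam2=0.0)
beta_enet = elastic_net_cd(X, y, lam1=0.1, lam2=0.5)
```

In this toy setting the lasso run places essentially all of the weight on one of the two duplicated columns, whereas the L₂ term makes the elastic net share the weight between them, which is the grouping behaviour described above.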

To better reflect the degree of pedestrian density and effectively utilize the structural information in the unlabelled pedestrian image sequence, we also introduce statistical landscape features (SLF), proposed by Xu and Chen [19], to extract statistical features closely related to pedestrian counts. We then propose the semi-supervised Elastic net (SSEN) by incorporating sequential correlation between frames into the Elastic net. In this way, we can remove redundant features from SLF and, at the same time, utilize sequential information in the unlabelled frames to improve the performance of pedestrian counting.
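One plausible way to encode sequential correlation, shown here only as a sketch and not as the exact SSEN formulation, is to penalize differences between the predicted counts of temporally adjacent frames. The assumed objective is Σ(yᵢ − βᵀxᵢ)² over labelled frames, plus λ₁‖β‖₁ + λ₂‖β‖₂², plus λ₃ Σₜ (βᵀfₜ₊₁ − βᵀfₜ)² over consecutive frames fₜ. All function names and the λ symbols are hypothetical.

```python
import numpy as np

def temporal_differences(frames):
    """Feature differences between each frame and its successor (rows in time order)."""
    return frames[1:] - frames[:-1]

def ssen_objective(beta, X_lab, y_lab, frames, lam1, lam2, lam3):
    """Assumed objective: elastic net on labelled frames plus temporal smoothness."""
    fit = np.sum((y_lab - X_lab @ beta) ** 2)
    smooth = np.sum((temporal_differences(frames) @ beta) ** 2)
    penalty = lam1 * np.sum(np.abs(beta)) + lam2 * np.sum(beta ** 2)
    return fit + penalty + lam3 * smooth

def augment_with_sequence(X_lab, y_lab, frames, lam3):
    """Fold the quadratic smoothness term into extra rows with zero response."""
    D = np.sqrt(lam3) * temporal_differences(frames)
    X_aug = np.vstack([X_lab, D])
    y_aug = np.concatenate([y_lab, np.zeros(D.shape[0])])
    return X_aug, y_aug
```

Because the smoothness term is quadratic in β, the augmented pair `(X_aug, y_aug)` can be handed to any ordinary elastic net solver, so under this assumption the unlabelled frames enter the problem without needing labels of their own.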

The main contributions of the proposed SSEN algorithm include: (1) To the best of our knowledge, this is the first work to employ semi-supervised learning to solve the pedestrian counting problem. (2) Domain knowledge, namely sequential information extracted from pedestrian videos, is elegantly utilized. (3) The original features can be effectively reduced by the proposed SSEN algorithm without losing interpretability. (4) Compared with the state-of-the-art pedestrian counting algorithm [5], which generally requires hundreds of labelled frames to obtain a relatively good result, SSEN achieves better performance, even with only very few labelled frames.

The remainder of this paper is organized as follows. In Section 2, we describe the procedure of feature extraction, and in Section 3, we detail our SSEN algorithm. Experimental results are provided and analyzed in Section 4. Finally, Section 5 concludes this work.

Feature extraction

To extract a collection of pedestrian features, it is necessary to segment foreground from the background image. We use a moving average [20] to compute a foreground mask, followed by smoothing the mask image with a median filter and mathematical morphology. Then, we obtain the foreground image by multiplying the smoothed mask image with the corresponding frame. Fig. 1 shows an example.
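The segmentation pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the learning rate `alpha`, the difference `threshold`, the 3×3 window size and the choice of a morphological opening are all assumed, and the function names are hypothetical.

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Moving-average background update; alpha is an assumed learning rate."""
    return (1.0 - alpha) * background + alpha * frame

def _neighbourhoods(mask):
    """Stack the nine 3x3-shifted copies of a 2-D mask (edge-padded)."""
    padded = np.pad(mask, 1, mode="edge")
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def median3(mask):
    """3x3 median filter, which removes isolated noisy pixels from the mask."""
    return np.median(_neighbourhoods(mask), axis=0).astype(np.uint8)

def opening3(mask):
    """3x3 morphological opening: erosion (min) followed by dilation (max)."""
    eroded = _neighbourhoods(mask).min(axis=0)
    return _neighbourhoods(eroded).max(axis=0)

def extract_foreground(frame, background, threshold=30.0):
    """Return the smoothed binary mask and the masked foreground image."""
    raw_mask = (np.abs(frame - background) > threshold).astype(np.uint8)
    mask = opening3(median3(raw_mask))
    return mask, frame * mask
```

In practice, `update_background` would be called on every incoming frame so that slow illumination changes are absorbed into the background, while `extract_foreground` supplies the mask and foreground images from which features are computed.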

We extract six sets of features from the mask and foreground images so that the properties of pedestrians in

Elastic net

To make the proposed SSEN algorithm easier to follow, we first introduce the lasso. Given a collection of data points $\{(x_{i}, y_{i}) \mid i = 1, 2, \dots, n\}$, where each independent variable $x_{i} = (x_{i1}, x_{i2}, \dots, x_{im})$ consists of $m$ features and $y_{i}$ is the corresponding response variable, the lasso solves the following optimization problem [17]: $\hat{β} = \arg\min_{β} \sum_{i = 1}^{n} \left( y_{i} - \sum_{j = 1}^{m} β_{j} x_{ij} \right)^{2} \quad \text{s.t.} \quad \sum_{j = 1}^{m} | β_{j} | \leq t,$ where $β = (β_{1}, β_{2}, \dots, β_{m})^{T}$ is the vector of feature weights, and $t \geq 0$ is a threshold. When $t$ is large enough, lasso will obtain a similar result as the linear

Semi-supervised Elastic net with sequential data

One common way to construct a semi-supervised algorithm is to add unlabelled data as a regularization term to refine the performance of learning. Such regularization implicitly assumes some topological structures of data, for example, manifold structure [23]. If the assumption does not hold, however, the performance may be deteriorated by the introduction of regularization. The other way is to use domain knowledge to achieve semi-supervised learning. Although the knowledge has no direct

Experiments

To evaluate the performance of the proposed SSEN algorithm, we carry out a series of experiments on two benchmark datasets, i.e., the UCSD pedestrian dataset [5] and the Fudan pedestrian dataset.¹ The UCSD dataset consists of 2000 frames of size 238×158 extracted from a video and provided with ground truth. Fig. 5(a) shows several sequential frames from the dataset. The Fudan pedestrian dataset contains 1500 sequential frames of size 320×240, as shown in Fig.

Conclusion

In this paper we have proposed a semi-supervised Elastic net algorithm to count pedestrians in an image sequence. By utilizing sequential information in the video as a regularization term, the proposed algorithm can select representative features from the original high-dimensional features without sacrificing interpretability, and attain better prediction performance than a state-of-the-art algorithm with only a few training samples. In the future, we will study a more effective

Acknowledgements

This work was supported in part by the NSFC (nos. 60635030, 60975044), the 973 program (no. 2010CB327900), the Shanghai Leading Academic Discipline Project (no. B114) and the Scientific research start-up fund of the hundred talents program of CASIA (Y0J2021MZ1). The authors would like to thank the reviewers of the first version of this paper for their various comments, which helped to greatly improve the presentation.

Ben Tan received his B.S. degree from the Department of Computer Science and Technology at Tongji University, Shanghai, China, in 2007. He is now a master's student in the School of Computer Science, Fudan University, Shanghai, China. His current research interests include image processing and machine learning.

References (26)

C.P. Hou et al., Multiple view semi-supervised dimensionality reduction, Pattern Recognition (2010)
J.J. Verbeek et al., Gaussian fields for semi-supervised regression and correspondence learning, Pattern Recognition (2006)
L. Nanni et al., Dynamic plan generation and real-time management techniques for traffic evacuation, IEEE Transactions on Intelligent Transportation Systems (2008)
B.B. Zhan et al., Crowd analysis: a survey, Journal of Machine Vision and Applications (2008)
B. Leibe, E. Seemann, B. Schiele, Pedestrian detection in crowded scenes, in: IEEE Conference on Computer Vision and...
T.-K. Kim, R. Cipolla, Mcboost: multiple classifier boosting for perceptual co-clustering of images and visual...
A.B. Chan, Z.J. Liang, N. Vasconcelos, Privacy preserving crowd monitoring: counting people without people models or...
A.C. Davies et al., Crowd monitoring using image processing, Electronics & Communication Engineering Journal (1995)
R. Ma, L. Li, W. Huang, On pixel count based crowd density estimation for visual surveillance, in: IEEE Conference on...
A.N. Marana et al., Automatic estimation of crowd density using texture, Safety Science (1997)
A.N. Marana, L.F. Costa, R.A. Lotufo, S.A. Velastin, Estimating crowd density with Minkowski fractal dimension, in:...
D. Kong, D. Gray, H. Tao, A viewpoint invariant approach for crowd counting, in: IEEE International Conference on...
X.J. Zhu, Semi-supervised learning literature survey, Technical Report 1530, Computer Sciences, University of...

Junping Zhang received the M.S. degree in control theory and control engineering from Hunan University, Changsha, China, in 2000 and the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2003. Since 2003, he has been an Associate Professor with the School of Computer Science, Fudan University, Shanghai, China. His research interests include machine learning, pattern recognition, image processing, biometric authentication, and intelligent transportation systems. He has been an associate editor of IEEE Intelligent Systems since 2009 and of IEEE Transactions on Intelligent Transportation Systems since 2010.

Liang Wang received both the B.Eng. and M.Eng. degrees in Electronic Engineering from Anhui University, China and the Ph.D. degree in Pattern Recognition and Intelligent System from the Institute of Automation, Chinese Academy of Sciences (CAS). From 2004 to 2009, he was with Imperial College London, UK, Monash University, and University of Melbourne, Australia, respectively. Currently, he is a Lecturer with the Department of Computer Science, University of Bath, United Kingdom. His major research interests include machine learning, pattern recognition, computer vision, multimedia processing, and data mining. He has widely published at highly-ranked international journals such as IEEE TPAMI, IEEE TIP, IEEE TKDE, and IEEE TCSVT and leading international conferences such as CVPR, ICCV and ICDM. He has been serving with more than 20 major international journals and more than 40 major international conferences. He is an associate editor of IEEE Transactions on Systems, Man and Cybernetics: Part B, International Journal of Image and Graphics (WorldSci), International Journal of Signal Processing (Elsevier), and Neurocomputing (Elsevier). He is a lead guest editor of three special issues appearing in PRL (Pattern Recognition Letters), IJPRAI (International Journal of Pattern Recognition and Artificial Intelligence) and IEEE TSMC-B, as well as a co-editor of five edited books.
