DOI: https://doi.org/10.20535/2523-4455.2019.24.6.197449

An Improved Method for Determining the Positions of Human Skeleton Joints in Video Sequences

Denys Volodymyrovych Soldatov, Anton Yuriiovych Varfolomieiev

Abstract


This paper proposes a number of improvements to the method for determining the positions of human skeleton joints in video sequences, with the goal of increasing the accuracy of predicting a person's position in space. This is achieved through the following modifications: taking into account information about the angles of displacement and about the person approaching or moving away, which makes it possible to distinguish movements that look similar in centered frames but differ in displacement; using an adaptive window size for computing HOG3D features; and using a neural network to extrapolate joint positions in space when a prediction is missing or not accurate enough. An experimental evaluation on the HumanEva-I dataset showed that the proposed modifications improve joint localization accuracy by 11 pixels on average, and confirmed that the improved method is promising for further work on the problem of motion recognition.
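The first two modifications rely on cues that can be read off a person's track across frames. The sketch below shows one hypothetical way to compute them, assuming a per-frame bounding box (x, y, w, h) of the person is already available. The function names, the use of box height as a proxy for distance to the camera, and the base window size of 24 px at a reference height of 200 px are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Tuple

# Hypothetical per-frame bounding box of the person: (x, y, w, h) in pixels,
# where (x, y) is the top-left corner and image y grows downwards.
Box = Tuple[float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    """Return the centre point of a bounding box."""
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0


def displacement_angle(prev: Box, curr: Box) -> float:
    """Angle in degrees, in [-180, 180], of the person's image-plane displacement
    between two frames. dy is taken as y0 - y1 so that upward motion in the
    image gives a positive angle (usual counter-clockwise convention)."""
    (x0, y0), (x1, y1) = center(prev), center(curr)
    return math.degrees(math.atan2(y0 - y1, x1 - x0))


def scale_ratio(prev: Box, curr: Box) -> float:
    """Ratio of box heights between frames. Assumption: a value > 1 means the
    person is approaching, < 1 means the person is moving away."""
    return curr[3] / prev[3]


def adaptive_window(box: Box, base: int = 24, ref_height: float = 200.0) -> int:
    """Hypothetical adaptive HOG3D spatial window: scale a base size in
    proportion to the person's apparent height, round to an even number of
    pixels, and never go below 8 px."""
    size = base * box[3] / ref_height
    return max(8, 2 * round(size / 2))


def motion_cues(track: List[Box]) -> List[Tuple[float, float, int]]:
    """For each consecutive pair of frames return
    (displacement angle, scale ratio, window size for the later frame)."""
    return [
        (displacement_angle(a, b), scale_ratio(a, b), adaptive_window(b))
        for a, b in zip(track, track[1:])
    ]


if __name__ == "__main__":
    # A person moving right while getting closer (the box grows).
    track = [(100, 50, 80, 200), (110, 45, 84, 210), (121, 40, 88, 220)]
    for angle, ratio, win in motion_cues(track):
        print(f"angle={angle:.1f} deg, scale={ratio:.3f}, window={win}px")
```

Appending the angle and scale ratio to the features of a centered frame is what would let two poses that look alike after centering (for example, stepping in place versus walking towards the camera) produce different descriptors; the exact encoding used in the paper is given in the full text.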


Keywords


motion recognition; prediction; CNN; HOG3D

Copyright (c) 2019 D. V. Soldatov, A. Yu. Varfolomieiev

This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN: 2523-4447
e-ISSN: 2523-4455