Robot Learning Manipulation Action Plans by “Watching” Unconstrained Videos from the World Wide Web

Robots Learn by Watching Videos – “Researchers at the University of Maryland Institute for Advanced Computer Studies (UMIACS) partnered with a scientist at the National Information Communications Technology Research Centre of Excellence in Australia (NICTA) to develop robotic systems that are able to teach themselves. Specifically, these robots are able to learn the intricate grasping and manipulation movements required for cooking by watching online cooking videos. The key breakthrough is that the robots can “think” for themselves, determining the best combination of observed motions that will allow them to efficiently accomplish a given task. The work [was] presented on Jan. 29, 2015, at the Association for the Advancement of Artificial Intelligence Conference in Austin, Texas. The researchers achieved this milestone by combining approaches from three distinct research areas: artificial intelligence, or the design of computers that can make their own decisions; computer vision, or the engineering of systems that can accurately identify shapes and movements; and natural language processing, or the development of robust systems that can understand spoken commands. Although the underlying work is complex, the team wanted the results to reflect something practical and relatable to people’s daily lives.”

Association for the Advancement of Artificial Intelligence, 2015. Authors – Yezhou Yang, University of Maryland; Yi Li, NICTA, Australia; Cornelia Fermuller, University of Maryland; Yiannis Aloimonos, University of Maryland.  “In order to advance action generation and creation in robots beyond simple learned schemas we need computational tools that allow us to automatically interpret and represent human actions. This paper presents a system that learns manipulation action plans by processing unconstrained videos from the World Wide Web. Its goal is to robustly generate the sequence of atomic actions of seen longer actions in video in order to acquire knowledge for robots. The lower level of the system consists of two convolutional neural network (CNN) based recognition modules, one for classifying the hand grasp type and the other for object recognition. The higher level is a probabilistic manipulation action grammar based parsing module that aims at generating visual sentences for robot manipulation. Experiments conducted on a publicly available unconstrained video dataset show that the system is able to learn manipulation actions by “watching” unconstrained videos with high accuracy.”
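The abstract describes a two-level architecture: per-frame CNN recognizers for grasp type and object feed a probabilistic manipulation-action grammar that parses their outputs into a "visual sentence" the robot can execute. For readers who want a concrete picture of that flow, below is a minimal, hypothetical Python sketch. The stub classifiers, label sets, probabilities, and grammar rules are all invented for illustration and are not taken from the paper; the real system uses trained convolutional networks and a much richer grammar.

```python
# Illustrative sketch of the two-level pipeline described in the abstract:
# stubbed-out recognizers stand in for the grasp-type and object CNNs, and a
# toy rule table stands in for the probabilistic manipulation action grammar.
# All labels, probabilities, and rules below are hypothetical.

from dataclasses import dataclass
import math

# --- Stand-ins for the two CNN-based recognition modules --------------------

def classify_grasp(frame) -> dict:
    """Hypothetical grasp-type classifier: returns label -> probability."""
    return {"power-grasp": 0.7, "precision-grasp": 0.3}

def classify_object(frame) -> dict:
    """Hypothetical object recognizer: returns label -> probability."""
    return {"knife": 0.6, "tomato": 0.3, "bowl": 0.1}

# --- A tiny probabilistic action "grammar" ----------------------------------
# Each rule scores an atomic action as a (tool, verb, object) triple.

@dataclass
class Rule:
    tool: str
    verb: str
    obj: str
    prob: float  # made-up prior for the rule

RULES = [
    Rule("knife", "cut", "tomato", 0.5),
    Rule("knife", "spread", "bread", 0.2),
    Rule("bowl", "pour", "tomato", 0.3),
]

def parse_frame(frame):
    """Combine recognizer outputs with the rule priors and return the
    best-scoring atomic action for this frame (a toy stand-in for the
    grammar-based parsing module)."""
    grasps = classify_grasp(frame)
    objects = classify_object(frame)
    best, best_score = None, -math.inf
    for rule in RULES:
        # A rule is supported when both its tool and its object are plausible
        # in the frame; the dominant grasp type gates the overall score.
        p_tool = objects.get(rule.tool, 1e-6)
        p_obj = objects.get(rule.obj, 1e-6)
        p_grasp = max(grasps.values())
        score = (math.log(rule.prob) + math.log(p_tool)
                 + math.log(p_obj) + math.log(p_grasp))
        if score > best_score:
            best, best_score = rule, score
    return best, best_score

if __name__ == "__main__":
    # A real frame would be an image; the stubs above ignore their input.
    action, score = parse_frame(frame=None)
    print(f"visual sentence: ({action.tool}, {action.verb}, {action.obj}), "
          f"log-score = {score:.2f}")
```

Running the sketch prints a single "visual sentence" such as (knife, cut, tomato); the actual system chains many such atomic actions, parsed from unconstrained video, into a manipulation plan.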
