New Approach May Make More Reliable and Safe Robots

Northwestern University researchers have developed a method that allows robots to learn to perform new tasks reliably and safely using less data.
Most robots operate in regimented, rigidly structured manufacturing cells, where they systematically obey programmed instructions. But the real world is far from structured. “If we want robots to operate collaboratively with people, they have to be able to be adaptive in ways that they have not needed to before,” said Todd Murphey, Professor of Mechanical Engineering at Northwestern University.

Northwestern University Professor Todd Murphey, along with graduate students Allison Pinosky (left) and Thomas Berrueta, is piggybacking on strides made in machine learning over the past decade.
To be functional in the real world, robots need to adapt to chaotic, unstructured environments and learn to deal with novel situations on their own. “We won't accept [any robot] that isn't safe, understandably, and we also won't pay for anything that isn't effective,” Murphey said.

In aiding the development of such useful robots, Murphey, along with his graduate students Thomas Berrueta and Allison Pinosky, is piggybacking on the tremendous strides machine learning has made over the past decade. Machine learning suits real-world robots because it lets a system improve from its own experience rather than relying entirely on preprogrammed behavior. The research was published in the journal Nature Machine Intelligence.

For the past few decades, researchers have tried applying machine learning to robots, but the results have not been reliable. The robots often fail to learn to perform their tasks successfully. The question of how much machine learning, as currently understood, really applies to robotics is not easily answered. But the researchers are trying.

Their work finds roots in reinforcement learning, a type of machine learning. Reinforcement learning is a decision-making framework for an autonomous agent, such as a robot, trying to achieve a goal in its environment. Given enough data about prior experiences, the agent uses trial and error to learn how to achieve that goal. Murphey and his team have labeled their approach maximum diffusion reinforcement learning (MaxDiff RL), which allows robots to learn to perform new tasks reliably and safely and has shown impressive results.
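The trial-and-error loop described above can be sketched with tabular Q-learning, one of the simplest reinforcement-learning algorithms. The five-cell environment, reward, and parameters below are invented purely for illustration and are unrelated to the team's robots.

```python
import random

# Toy reinforcement-learning example: an agent on a 5-cell line earns a
# reward only by reaching the rightmost cell. Environment and rewards are
# invented for illustration.

N_STATES = 5          # positions 0..4; reaching 4 yields reward 1
ACTIONS = [-1, +1]    # step left or step right

def step(state, action):
    """Apply an action; clamp to the line, reward only at the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Trial and error: mostly exploit current knowledge, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            # Q-learning update: nudge the estimate toward the observed
            # reward plus the discounted value of the best next action.
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

q = train()
# After training, the learned policy should prefer moving right from every state.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

Even in this tiny setting, the agent needs many episodes of experience to discover the rewarding behavior, which hints at why data efficiency matters so much for real robots.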


Data in real robots


Despite decades of research, getting real-world robots to learn to perform new tasks on their own in unstructured environments like homes and kitchens has not been easy. Machine learning on robots has not produced reliable results.

A significant challenge: machine learning was not designed for embodied systems. It works well for training abstract algorithms like ChatGPT or characters in games, but the edifice comes crumbling down when machine learning must be applied to a body operating in the real world. “Machine learning is not intended for systems that obey the laws of physics,” Murphey said.

The other challenge is that reinforcement learning needs a lot of data about the robot’s experience within its environment, which is difficult to generate. It’s easy to get such data in a simulated setting, with a robot control program running within a virtual environment. But simulated data is difficult to apply to a real robot in a physical environment. Good data for training robots needs results derived from commanding motors in the real world.

Worse, it’s not enough for the training data to be based on the real world; it also needs to be truly random. 

A robot’s control system delivers desired outcomes: it is what makes a mobile robot move from place to place or a robot arm manipulate a tool. So, to gather random data for machine learning, a straightforward method would be to mix the robot’s control signal with some amount of random noise, generating variation in whatever the robot is doing.

Unfortunately, this approach does not lead to random data. The recorded experiences of the robot tend to repeat the same sequences over and over, and the resulting record does not represent a randomized sample of all possible states.
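A toy simulation makes the point concrete (the system and all numbers here are invented for illustration, not the researchers' setup). Because physics integrates the noisy commands, neighboring states in the record are nearly identical rather than independent:

```python
import random
import statistics

# Toy demonstration: perturbing a control signal with random noise does not
# make the *visited states* random. The body integrates the noisy commands,
# so consecutive states stay strongly correlated.

def simulate(steps=2000, dt=0.01, noise=1.0, seed=0):
    """First-order system x' = u, driven by a noise-perturbed control signal."""
    rng = random.Random(seed)
    x, states = 0.0, []
    for _ in range(steps):
        u = rng.gauss(0.0, noise)   # commanded control mixed with random noise
        x += u * dt                 # the physical body integrates the command
        states.append(x)
    return states

def lag1_autocorr(xs):
    """Correlation between each sample and the next one in the record."""
    mean = statistics.fmean(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den

states = simulate()
r = lag1_autocorr(states)  # near 1.0: the samples are far from independent
```

A lag-1 autocorrelation near 1.0 means each recorded state is almost entirely predictable from the previous one, which is exactly the opposite of the independent random sample that learning algorithms assume.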


The MaxDiff RL approach


MaxDiff RL outlines a robot control mechanism for generating truly random data to prime the system for machine learning.

Murphey and his team arrived at a theoretical formulation that starts with the goal that the generated data should be random, then calculates how the control signal should be perturbed to make it so. Their method statistically computes perturbations in such a way that the robot systematically explores random states of its environment.
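The article does not give MaxDiff RL's actual formulation, so the sketch below is only a loose, hypothetical illustration of the underlying idea: deliberately steering the controller toward under-visited states so that the collected data covers the state space evenly. This simple count-based explorer is not the team's method; every name and number in it is invented.

```python
import math
import random
from collections import Counter

# Hypothetical illustration only (NOT the MaxDiff RL algorithm): a controller
# that deliberately steps toward its least-visited neighboring state, so the
# recorded data spreads evenly over the state space instead of clustering.

N_BINS = 20  # a 1-D state space discretized into 20 bins

def explore(steps=2000, seed=0):
    rng = random.Random(seed)
    counts = Counter()
    state = N_BINS // 2
    visits = []
    for _ in range(steps):
        counts[state] += 1
        # Candidate next states, staying within bounds.
        candidates = [s for s in (state - 1, state + 1) if 0 <= s < N_BINS]
        # Deliberate exploration: prefer the least-visited neighbor,
        # breaking ties randomly.
        least = min(counts[s] for s in candidates)
        state = rng.choice([s for s in candidates if counts[s] == least])
        visits.append(state)
    return visits

def entropy(xs):
    """Shannon entropy of the visit distribution (max is log(N_BINS))."""
    counts = Counter(xs)
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

visits = explore()
coverage = len(set(visits))  # every bin should get visited
```

The visit distribution ends up close to uniform (entropy near log 20), in contrast to the tight clusters produced by naive noise injection in the earlier sketch.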

Machine learning algorithms can then be applied effectively to the collected data. Robots trained this way demonstrate much higher reliability and the ability to learn very quickly.

The takeaway message is that on the one hand, embodied systems like robots present a problem for machine learning because they produce highly correlated data. On the other hand, because their control system can be programmed, they can deliberately explore their environment. “The advantage,” Murphey said, “is that they can respond to these deficiencies, and collect data that makes up for these deficiencies.”

One of the criticisms of machine learning is that it learns via brute force computation, utterly unlike the efficient way that biological systems like animals learn. 

MaxDiff RL turns this idea of brute-force machine learning on its head. It opens up the possibility that robots too could one day sample their environment more intelligently with less data, the way animals can.

The exciting results so far have been on a purely academic benchmark. “Over the next couple of years,” Murphey said, “we will be trying this on academic experimental systems.” And that should pave the way to try them on practical robots at the development level, and after that, in real-world deployment. 

“This research is part of a nascent movement to understand how control architectures should play a role in understanding learning. Most of the data that we have available to us was collected in one way or another by people. At some point, we might actually need automation systems and robotic systems [themselves] to be collecting that data,” Murphey said. 

And it’s in this data collection that MaxDiff RL can help.

Poornima Apte is a technology writer based in Walpole, Mass.
