Robots are getting better at recognising objects, navigating buildings and following spoken instructions, but remembering places over long periods has remained a surprisingly difficult task.
While people can often recall where they left a bag, a set of keys or an unfinished project without consciously thinking about it, machines usually treat every new journey as a fresh experience unless they are carefully programmed otherwise. A team at the Massachusetts Institute of Technology (MIT) has been working to address that gap by building a memory system designed to help robots connect objects, locations and time in a much more natural way.
Rather than simply mapping walls and corridors, the system creates a richer record of the world around it, allowing a robot to answer questions about things it has previously encountered using ordinary language instead of technical commands.
How MIT is teaching robots to remember more than just maps
Most robots already rely on maps to understand where they are. These digital maps help them avoid obstacles, move through buildings, and return to familiar locations. What they generally do not capture is the kind of everyday context that humans remember without much effort.
A person walking through an office may remember that a blue backpack was left beside a meeting room or that a broken chair sits near the entrance. Standard robotic maps usually record the physical layout rather than these small but useful details. The MIT team wanted to close that gap by combining spatial awareness with descriptive information. Their framework allows a robot to build a long-term memory that links locations with detailed observations gathered while exploring an environment over days or even longer.

"If we want robots to work side-by-side with humans and interact better with humans, they must speak the same language. The robot must be able to reason about time and space the same way humans do. That is essentially what our method is doing. It is turning a traditional map into a language-based map that is easier for the robot to think about and access using language," says Luca Carlone, an associate professor in MIT's Department of Aeronautics and Astronautics (AeroAstro), principal investigator in the Laboratory for Information and Decision Systems (LIDS), and director of the MIT SPARK Laboratory, as reported by MIT News.
How DAAAM creates meaningful spatial memories
The new system has been given the name Describe Anything, Anywhere, at Any Moment, shortened to DAAAM. Rather than storing only positions and distances, it records descriptive information about the things a robot encounters. If the robot moves through a university campus, for example, it might recognise a well-known building, identify architectural features, notice a bicycle rack nearby and distinguish between several different bicycles.
Instead of treating these as isolated observations, it stores them together within the same area of its digital map. That means a robot could later connect a red bicycle with a flat tyre to the building beside which it was seen, creating a much richer memory than a traditional navigation system could provide.
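To make the idea of filing descriptions by place more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not MIT's code: the `Observation` and `SpatialMemory` names, the square grid cells and the 10-metre cell size are all hypothetical choices made for this example, and the text descriptions stand in for captions a vision model might produce.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Observation:
    """One thing the robot saw (hypothetical record format)."""
    label: str                     # short class name, e.g. "bicycle"
    description: str               # free-text caption, e.g. from a vision model
    position: tuple[float, float]  # (x, y) in metres in the map frame
    timestamp: float               # seconds since exploration began


class SpatialMemory:
    """Hypothetical store that files observations under square map regions."""

    def __init__(self, cell_size: float = 10.0):
        self.cell_size = cell_size
        self.regions = defaultdict(list)  # (col, row) -> list of Observation

    def _cell(self, position: tuple[float, float]) -> tuple[int, int]:
        # Floor division maps every point to the grid cell that contains it.
        x, y = position
        return (int(x // self.cell_size), int(y // self.cell_size))

    def add(self, obs: Observation) -> None:
        self.regions[self._cell(obs.position)].append(obs)

    def context_for(self, obs: Observation) -> list[Observation]:
        """Everything remembered in the same region as obs, except obs itself."""
        return [o for o in self.regions[self._cell(obs.position)] if o is not obs]


memory = SpatialMemory(cell_size=10.0)
library = Observation("building", "brick library with arched windows", (12.0, 3.0), 0.0)
red_bike = Observation("bicycle", "red bicycle with a flat tyre", (14.5, 6.0), 42.0)
blue_bike = Observation("bicycle", "blue bicycle", (55.0, 3.0), 90.0)
for obs in (library, red_bike, blue_bike):
    memory.add(obs)

print([o.description for o in memory.context_for(red_bike)])
# → ['brick library with arched windows']
```

Because the library and the red bicycle fall in the same cell, a later question about the bicycle can pull in the building beside it, while the blue bicycle seen elsewhere stays separate. A real system would probably use a richer map structure than a fixed grid.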
How MIT merged two different AI approaches
The project brings together technologies that have largely developed separately. Modern computer vision systems are remarkably good at describing objects inside photographs. They can recognise furniture, vehicles, buildings and many everyday items while generating detailed written descriptions. These systems, though, are generally designed to analyse one image or scene at a time rather than maintaining a long-term understanding of an entire environment.

Robotic mapping systems take the opposite approach. They build detailed three-dimensional models of buildings, streets or campuses so that machines can move safely through them, but they often lack rich descriptions of the objects contained within those spaces.

DAAAM blends these two ideas into a single framework, giving robots both an accurate map and an organised memory of what they have seen.
How robots save time by analysing multiple objects at once
Recording detailed descriptions takes time, and that quickly becomes a problem for mobile robots. A machine travelling through a factory, warehouse or university could encounter hundreds of objects within only a few minutes. Analysing every one of them individually would slow the system too much for practical use.

To avoid that bottleneck, the MIT team developed a method that groups nearby objects and chooses only the most useful camera views for detailed analysis. These selected images capture several objects at once, allowing the system to describe multiple items in parallel instead of processing each separately. That simple adjustment greatly reduces the amount of work required while still preserving detailed information about the environment.
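One way to picture the view-selection step is as a covering problem: pick as few camera frames as possible so that every object appears in at least one of them. The Python sketch below uses greedy set cover, a standard heuristic swapped in here for illustration; it is an assumption, not necessarily the method the MIT team uses, and it leaves out the grouping of nearby objects. The frame ids and object names are hypothetical.

```python
def select_views(visibility: dict[str, set[str]]) -> list[str]:
    """Greedy set cover: choose camera frames until every object appears in one.

    `visibility` maps a (hypothetical) frame id to the ids of objects visible
    in that frame. Each chosen frame would then be described once, covering
    several objects in a single pass.
    """
    uncovered = set().union(*visibility.values())
    chosen: list[str] = []
    while uncovered:
        # Pick the frame showing the most not-yet-covered objects.
        # Iterating in sorted order makes ties resolve to the smallest frame id.
        best = max(sorted(visibility), key=lambda f: len(visibility[f] & uncovered))
        chosen.append(best)
        uncovered -= visibility[best]
    return chosen


frames = {
    "f1": {"chair", "desk"},
    "f2": {"chair", "desk", "lamp", "backpack"},
    "f3": {"lamp"},
    "f4": {"printer", "backpack"},
}
print(select_views(frames))  # → ['f2', 'f4']
```

Here five objects are covered by two frames rather than five separate analyses. Greedy selection is not guaranteed to find the smallest possible set of frames, but it is a cheap and widely used approximation.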
How robots quickly recall objects and places from past journeys
Remembering information is only part of the challenge. A robot also needs to retrieve it quickly when someone asks a question.

The researchers addressed this by linking the memory framework with a large language model that can decide which search tools to use depending on the request. Instead of relying on a single search method, the system selects different ways of finding information based on meaning, location or other clues. If someone asks about a sculpture seen near a particular campus building, the robot can search both for the artwork itself and for the surrounding location before producing an answer. This approach also helps reduce incorrect responses, because the language model answers from information retrieved directly from the stored memory rather than guessing.
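The retrieval step can be pictured as a small set of search tools plus a plan that chooses among them. In the Python sketch below, which is hypothetical throughout, a hard-coded plan stands in for the tool calls a language model might choose for the sculpture question, and simple word overlap stands in for the embedding-based semantic search a real system would more likely use. The tool names, the plan format, the stored memories and the 10-metre radius are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    description: str
    position: tuple[float, float]  # (x, y) in metres in the map frame


# Hypothetical stored memories.
MEMORIES = [
    Memory("bronze sculpture of a seated figure", (10.0, 4.0)),
    Memory("red brick library building", (12.0, 6.0)),
    Memory("bronze plaque on a wall", (80.0, 40.0)),
    Memory("glass-fronted engineering building", (78.0, 42.0)),
]


def search_by_meaning(query: str, memories: list[Memory], k: int = 1) -> list[Memory]:
    """Stand-in for semantic search: rank memories by words shared with the query.

    Memories with no overlap are dropped, so an unknown subject yields no
    result instead of a forced guess.
    """
    words = set(query.lower().split())
    scored = [(len(words & set(m.description.lower().split())), m) for m in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [m for score, m in scored if score > 0][:k]


def search_by_location(anchor: tuple[float, float], memories: list[Memory],
                       radius: float) -> list[Memory]:
    """Return every memory within `radius` metres of `anchor`."""
    return [m for m in memories if math.dist(m.position, anchor) <= radius]


def run_plan(plan: list[dict], memories: list[Memory]) -> list[Memory]:
    """Execute tool calls in order, each working on the previous step's results."""
    candidates = list(memories)
    for step in plan:
        if step["tool"] == "search_by_meaning":
            candidates = search_by_meaning(step["query"], candidates, step.get("k", 1))
        elif step["tool"] == "search_by_location":
            if not candidates:
                return []
            # Search the whole memory around the best hit from the previous step.
            candidates = search_by_location(candidates[0].position, memories, step["radius"])
        else:
            raise ValueError(f"unknown tool: {step['tool']}")
    return candidates


# Plan a language model might produce for "What sculpture did you see near the library?"
plan = [
    {"tool": "search_by_meaning", "query": "library"},
    {"tool": "search_by_location", "radius": 10.0},
    {"tool": "search_by_meaning", "query": "sculpture"},
]
print([m.description for m in run_plan(plan, MEMORIES)])
# → ['bronze sculpture of a seated figure']
```

Keeping the tools separate lets the planner combine them in whatever order a question needs, and because an answer can only come from stored memories, a question about something the robot never saw returns nothing rather than an invented reply.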
Better performance across different questions
During testing, DAAAM consistently outperformed existing systems when answering questions about environments it had explored.
Depending on the type of query, its accuracy improved by between 21 and 53 per cent compared with earlier approaches, according to MIT. Those gains came from combining detailed descriptions with organised spatial information rather than relying on either method alone. The system is also fast enough to operate while a robot is actively moving, making it suitable for real-world environments instead of controlled demonstrations.
Where this technology could be used
Factories are an obvious setting because workers and robots often share the same workspace. A robot with this type of memory could retrieve tools, unfinished components or equipment based on spoken instructions referring to earlier tasks instead of requiring exact coordinates.

Other possibilities extend beyond manufacturing. Maintenance staff wearing augmented reality devices could receive assistance locating equipment or identifying unexpected changes inside large buildings. Visitors navigating complex campuses or public spaces might also benefit from systems capable of recalling landmarks and answering location-based questions naturally.