Physical AI - Real World Application
Archetype AI Team. November 1, 2023.
Generative AI and large language models (LLMs) have opened new avenues for innovation and understanding across the digital domain. This became possible because LLMs trained on massive internet text datasets can learn the fundamental structure of language. What if AI models could also learn the fundamental structure of physical behaviors – how the physical world around us changes over space and time? What if AI could help us understand the behaviors of real world systems, objects, and people?
At Archetype AI, we believe that this understanding could help to solve humanity’s most important problems. That is why we are building a new type of AI: physical AI, the fusion of artificial intelligence with real world sensor data, enabling real time perception, understanding, and reasoning about the physical world. Our vision is to encode the entire physical world, capturing the fundamental structures and hidden patterns of physical behaviors.
We’re building the first AI foundation model that learns about the physical world directly from sensor data, with the goal of helping humanity understand the complex behavior patterns of the world around us all. We call it a Large Behavior Model, or LBM. Here’s how it works.
Getting Data From The Real World
LLMs get their data from text – the vast corpus of writing humans have added to the internet over the last several decades. But that is the extent of their knowledge.
The web captures just a small glimpse of the true nature of the physical world. To truly understand real world behaviors, an AI model must be able to interpret much more than human-friendly data like text and images. Many physical phenomena are beyond direct human perception: too fast-moving, too complex, or simply outside the realm of what our biological senses can detect. For example, we cannot see or hear the temperature of objects or the chemical composition of air. An AI model trained only on a human interpretation of the physical world is inherently limited and biased in its understanding.
Our foundation model learns about the physical world directly from sensor data – any and all sensor data. Not just cameras and microphones, but temperature sensors, inertial sensors, radar, LIDAR, infrared, pressure sensors, chemical and environmental sensors, and more. We are surrounded by sensors that capture rich spatial, temporal, and material properties of the world.
Different sensors measure different physical properties and phenomena, which are related and correlated. To encode the entire physical world, information from all these sensors must be encoded, fused, and processed into a single representation. This fusion enables a holistic approach to perceiving and comprehending the physical world. And by allowing the model to ingest sensor data directly from the physical world, we can have it learn about the world without human biases or perceptual limitations.
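To make the idea of fusing heterogeneous sensors into one representation concrete, here is a minimal sketch. Everything in it is a simplifying assumption, not Archetype AI's actual architecture: each hypothetical modality gets its own learned projection into a shared embedding space, and the per-modality embeddings are pooled into one fused vector describing a moment in time.

```python
import numpy as np

# Hypothetical raw dimensionalities for three very different sensors.
# (Illustrative numbers only; real streams would be far richer.)
EMBED_DIM = 8
modalities = {
    "camera": 48,      # e.g. a flattened image patch
    "radar": 16,       # e.g. a range-Doppler slice
    "temperature": 1,  # a single scalar reading
}

rng = np.random.default_rng(0)

# One linear projection per modality maps every sensor into the SAME
# shared embedding space. In a trained model these weights are learned;
# here they are random placeholders.
projections = {
    name: rng.normal(size=(dim, EMBED_DIM)) / np.sqrt(dim)
    for name, dim in modalities.items()
}

def embed(name: str, reading: np.ndarray) -> np.ndarray:
    """Project one raw sensor reading into the shared embedding space."""
    return reading @ projections[name]

# Take one reading per modality and fuse by mean-pooling the embeddings.
readings = {name: rng.normal(size=dim) for name, dim in modalities.items()}
embeddings = [embed(name, r) for name, r in readings.items()]
fused = np.mean(embeddings, axis=0)  # a single vector for this moment

print(fused.shape)  # (8,)
```

The key property the sketch illustrates: after projection, camera patches, radar slices, and thermometer readings are interchangeable vectors, so downstream layers never need modality-specific code.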
The radar signals are difficult to interpret. We are applying powerful physical AI to make sense of complex signals like this one.
Just as the petabytes of writing on the web underpin LLMs’ human-like reasoning capability, this wealth of real world sensing data can enable physical AI to develop the capability to interpret and reason about the physical world.
But sensor data isn’t text. Incorporating multimodal sensor data into an AI model isn’t trivial because different sensors produce very different raw signals: not just in the format of the data, but also in the information that the data represents. Current ML techniques require a custom processing pipeline or model for each kind of sensor. This approach makes it infeasible to fuse and interpret data across multiple sensing modalities at scale.
Our approach to solving this is to develop a single foundation model for all sensor data, enabled by a universal sensor language.
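One way to picture what a "universal sensor language" could mean is tokenization: any 1-D sensor stream, whatever its units or scale, is normalized and quantized into a small shared vocabulary of discrete tokens tagged with its modality. This is a toy illustration under my own assumptions (the bin count, the normalization, the token format are all invented here), not the company's actual encoding.

```python
import numpy as np

# Assumed shared vocabulary: 16 coarse amplitude bins used by ALL sensors.
VOCAB_SIZE = 16

def tokenize(modality: str, signal) -> list:
    """Map a raw 1-D signal to a sequence of (modality, token_id) pairs."""
    s = np.asarray(signal, dtype=float)
    # Per-stream min-max normalization removes unit and scale differences,
    # so radar amplitudes and temperatures land in the same [0, 1] range.
    s = (s - s.min()) / (np.ptp(s) + 1e-9)
    ids = np.minimum((s * VOCAB_SIZE).astype(int), VOCAB_SIZE - 1)
    return [(modality, int(i)) for i in ids]

# Two very different sensors become the same kind of token sequence.
radar = np.sin(np.linspace(0, 3, 6)) * 1e4     # large arbitrary units
thermo = [21.0, 21.5, 23.0, 22.4]              # degrees Celsius

sequence = tokenize("radar", radar) + tokenize("temperature", thermo)
print(sequence[:3])
```

Once every stream is a sequence of tagged tokens, a single sequence model can consume all of them, which is exactly the property that makes per-sensor custom pipelines unnecessary.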
A Universal Embedding: The “Rosetta Stone” for the Physical World
[The concept of creating