We've come a long way with large language models since 2022, the year ChatGPT was born. In just two years, LLMs have become the hottest topic in AI, changing more than we could have expected. In 2024, a new term is trending: large action models (LAMs).
LLMs, as the name suggests, are language models whose main function is generating text. Large action models, on the other hand, "generate" or perform actions from the instructions they're given. These actions can range from moving a cursor to ordering an Uber on the user's behalf.
Large action models are an exciting step toward artificial general intelligence (AGI), a concept we've discussed and sometimes feared, and one that's starting to feel a lot more real. In this blog post, we'll delve into large action models: how they operate, how they relate to LLM agents, and who the key players in the still-small LAM field are.
What is a large action model (LAM)?
A large action model is a generative AI model that performs actions based on user commands. LAMs are at the heart of modern AI agents and are designed to act like humans: analyzing data and then acting on it.
Unlike large language models (LLMs) that focus on processing and generating text, LAMs are built to take concrete actions. For example, while an LLM might be used to understand and respond to customer inquiries, an LAM-based AI agent could autonomously handle the tasks those inquiries concern, like adjusting a service or processing a return, without human intervention.
Although LAM is a relatively new concept and not yet widely implemented, its applications can reach wide audiences. It can be especially helpful in processes where human time can be saved. Take healthcare: a large action model can automate routine tasks like scheduling and managing patient records. Or manufacturing: a well-programmed LAM can control and optimize production processes, reducing the need for human oversight.
It all boils down to one essential point: designing a large action model that truly reduces or nearly eliminates the need for human intervention is not easy. LAMs need advanced infrastructure and significant investment in training to work effectively. Any organization considering LAMs should fully appreciate not just the advantages but also the limitations they come with.
LAMs and AI agents
A more popular name for what LAMs enable is AI or LLM agents. Both terms describe the same concept: task automation using AI systems. To be more precise, a LAM is what works under the hood of an agentic system: to complete a given task, an AI agent needs a well-designed large action model behind the scenes.
So, while an AI agent is the broader entity that acts and makes decisions, a large action model is a sophisticated component that helps the agent understand and execute complex tasks. It's like if the AI agent is a person, the large action model would be the brain that plans and acts efficiently.
How does an LAM agent work?
A large action model agent is usually built from a few components layered on top of each other. They work much like an LLM agent, except they may require multimodal functionality and broader access to external tools. Here's a very high-level overview of LAM agent components:
Large language model (LLM): A large action model starts with a foundational LLM working under the hood. This is the groundwork of any LAM.
LLM fine-tuning or RLHF: Then, to make the LLM fluent in specific areas, you apply alignment techniques like fine-tuning, RLHF/RLAIF, or DPO. Depending on the application of the LAM, this can require training on multimodal data, including text, images, and audio.
External tools: This is the step that turns an LLM into an LAM agent. Here, the fine-tuned LLM is connected to external tools that will allow it to perform actions on its own.
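The three components above can be sketched as a simple agent loop. In this minimal illustration, a hard-coded stub stands in for the fine-tuned LLM, and the tool names and return values are invented for the example, not any vendor's API:

```python
# Minimal LAM-agent loop: the model proposes an action, the agent
# executes it with an external tool, and the observation is fed back
# until the model signals it is done.

def stub_model(task: str, history: list) -> dict:
    """Pretend LLM: first returns a tool call, then a final answer."""
    if not history:
        return {"tool": "search_flights", "args": {"route": task}}
    return {"tool": None, "answer": f"Booked based on: {history[-1]}"}

# External tools the agent is allowed to invoke.
TOOLS = {
    "search_flights": lambda route: f"cheapest flight found for {route}",
}

def run_agent(task: str) -> str:
    history = []
    while True:
        step = stub_model(task, history)
        if step["tool"] is None:            # model decided it is finished
            return step["answer"]
        result = TOOLS[step["tool"]](**step["args"])
        history.append(result)              # feed the observation back

print(run_agent("SFO->JFK"))
```

In a real system, the stub would be replaced by a fine-tuned model that emits structured tool calls, and each tool would wrap a real API or UI action.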
Large action model examples
Talk of LAMs took off with Rabbit AI's release of the R1, but there are a few other players in the game. In particular, the recent release of Anthropic's computer use feature for Claude shook the AI community with what's possible in agentic AI.
Claude’s computer use
Anthropic recently released a mind-blowing feature with Claude 3.5 Sonnet's latest update – computer use.
With computer use, developers can direct Claude to use computers the way people do: by looking at a screen, moving a cursor, clicking buttons, and typing text. At this stage, the feature is still experimental and at times error-prone. Even so, computer use is a groundbreaking capability that makes Claude what we call a large action model: an agent that can act in place of a human.
There are some wild examples on X of people trying computer use.
1. Claude orders an Uber for the user.
2. Claude posts on X.
3. Claude solves CAPTCHAs.
You can find other fascinating use cases under this thread.
Rabbit AI's R1
The concept of large action models started trending when Rabbit AI released its trainable AI assistant – the R1 device. It uses a large action model to mimic and automate human interactions across different tech interfaces. R1 is trained to perform tasks like making reservations, ordering services, or providing directions.
R1 is still in the pre-order stage, meaning it's relatively new on the market and doesn't have a wide range of reviews or use cases yet. However, its introduction is starting to challenge the notion that large action models are still far from practical use across industries.
Rabbit AI also introduced LAM playground – one of their new experiments to continue building their vision for a cross-platform generic agent system that can help users perform actions.
Adept AI’s ACT-1
Adept AI has put a lot of effort into large action model development and is building the next frontier of models that can take action in the digital world.
In 2022, they announced their first large action model, the Action Transformer (ACT-1). Now, Adept is focused on building agentic workflows that follow the large action model logic.
Salesforce xLAM
Salesforce released its xLAM family of LAMs on September 6, 2024.
The xLAM family consists of several models, including:
- xLAM-1B: A compact model that has shown impressive performance in function-calling tasks despite having only 1 billion parameters. It is open-source and available for community experimentation.
- xLAM-7B: A larger model designed for more demanding applications.
- xLAM-8x22B: A high-performance model aimed at industrial applications requiring significant computational resources.
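The xLAM models are trained for function calling: given a user query and a set of tool schemas, they emit a structured call rather than free text. The sketch below shows the general shape of that exchange; the schema fields and the model output string are illustrative of the common JSON convention, not the exact format any specific xLAM model uses:

```python
import json

# A tool schema in the JSON style commonly passed to function-calling
# models (field names follow convention; exact formats are model-specific).
get_weather = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Instead of prose, a function-calling model returns structured JSON:
model_output = '{"tool_calls": [{"name": "get_weather", "arguments": {"city": "Paris"}}]}'

call = json.loads(model_output)["tool_calls"][0]
assert call["name"] == get_weather["name"]  # the agent validates the call
print(call["arguments"]["city"])            # then executes the real function
```

The agent layer is responsible for validating the call against the schema and executing the matching function, which is what turns the model's output into an actual action.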
LAMs vs. LLMs
Large language models (LLMs) are primarily focused on processing and generating language. They're trained to grasp natural language, enabling them to excel at tasks like creating text, translating languages, and summarizing complex documents. These models rely heavily on analyzing vast amounts of text to detect patterns and meanings.
In contrast, large action models (LAMs) are built to act based on their analysis and the data they receive. They do more than process text; they interact with various systems and interfaces to perform tasks that involve actual actions, such as controlling robots or managing software processes. LAMs can handle a variety of data types, including text, images, and other sensor data, allowing them to function in more dynamic environments.
While LLMs produce text-based outputs and insights, they don't interact with the external world. LAMs, however, are designed to perform tasks that involve real-world interfaces and responses. They also learn from the results of their actions, using this feedback to improve and adapt over time. This learning capability is essential for applications that require continuous adjustments to changing conditions.
LAM use cases and applications
The primary use of any large action model is automation, by which we mean a considerable reduction of human intervention in a given task. So it's safe to say that LAMs can find a use case anywhere humans would rather concentrate on other, non-repetitive work. Here are a few areas where we think LAMs can make big changes.
1. Smart device helpers: Imagine having a tiny gadget like Rabbit's R1, which debuted at CES 2024. This little device uses an LAM to understand and respond to your voice commands, making it feel like you're just chatting with a friend. Some early R1 adopters did find it redundant to carry another device apart from their phones, but with further development, R1 could become a fun and helpful gadget.
2. Smarter customer support: In the near future, we may have customer support chatbots that do more than answer questions. If LAM technology reaches the point where it can execute tasks quickly and reliably, chatbots could book appointments, process returns, or update account information, all through automated interactions.
3. Personalized marketing: LAMs can automate parts of the marketing process by reacting to customer behaviors. For example, if many customers show interest in a particular type of product through their interactions, an LAM could automatically adjust email campaigns to highlight that product, streamlining the process without ongoing manual intervention.
4. Content generation: LAMs can help manage the timing and type of content created based on data-driven insights. For businesses with seasonal trends, an LAM could prepare and schedule marketing content in advance, ensuring it's relevant and timely based on previous successful strategies.
5. Enhanced data insights: In practical terms, LAMs can be programmed to handle routine data monitoring tasks. In supply chain management, an LAM could track inventory levels, identify potential shortages based on trends, and initiate orders with suppliers to prevent stockouts, all without needing manual checks.
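Several of these use cases reduce to the same pattern: the model classifies an intent, and the agent maps it to an executable action. A toy dispatcher for the customer-support case, with made-up intents and handlers, might look like this:

```python
# Toy action dispatcher: an LAM classifies the customer's intent,
# and the agent routes it to a handler. Intents, handlers, and
# return strings here are illustrative.

def process_return(order_id: str) -> str:
    return f"return started for order {order_id}"

def book_appointment(slot: str) -> str:
    return f"appointment booked for {slot}"

ACTIONS = {"return": process_return, "appointment": book_appointment}

def handle(intent: str, **kwargs) -> str:
    if intent not in ACTIONS:
        return "escalated to a human agent"  # fall back when unsure
    return ACTIONS[intent](**kwargs)

print(handle("return", order_id="A123"))
```

The fallback branch matters in practice: an agent that acts autonomously should hand off to a human whenever the classified intent falls outside its known action set.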
Wrapping up
2024 has been the year of AI agents – autonomous, complex, often multi-agent systems that can do more than plain text generation. The 'more' here means taking action and replacing humans in repetitive, easily automatable tasks. These agents run on large action models: complex AI systems that can take actions on your laptop or mobile device.
R1, Claude, and ACT-1 are pioneering examples of large action models, but with the ongoing discussion and development around AI agents, we expect this list to expand in the near future – an actionable, agentic future.