Overview
Build computer use agents with one unified SDK — any model, any tool
Act SDK
The Act SDK is a unified SDK for building computer use agents with Python and TypeScript. It provides a simple interface for executing looping agentic actions with support for many models and tools. Build production-ready computer use agents with pre-built tools to connect to Scrapybara instances.
How it works
act
initiates an interaction loop that continues until the agent achieves its objective. Each iteration of the loop is called a step
, which consists of the agent’s text response, the agent’s tool calls, and the results of those tool calls. The loop terminates when the agent returns a message without invoking any tools, and returns messages
, steps
, text
, output
(if schema
is provided), and usage
after the agent’s execution.
Python
TypeScript
An act
call consists of 3 core components:
Model
The model specifies the base LLM for the agent. At each step, the model examines the previous messages, the current state of the computer, and uses tools to take action. Currently, the SDK supports Anthropic models, with more providers coming soon.
Python
TypeScript
Tools
Tools are functions that enable agents to interact with the computer. Each tool is defined by a name
, description
, and how it can be executed with parameters
and an execution function. A tool can take in a Scrapybara instance to interact with it directly. Learn more about pre-built tools and how to define custom tools here.
Python
TypeScript
Prompt
The prompt is split into two parts, the system
prompt and a user prompt
. system
defines the general behavior of the agent, such as its capabilities and constraints. You can use our provided SYSTEM_PROMPT
to get started, or define your own. prompt
should denote the agent’s current objective. Alternatively, you can provide messages
instead of prompt
to start the agent with a history of messages. act
conveniently returns messages
after the agent’s execution, so you can reuse it in another act
call.
Python
TypeScript
Structured output
Use the schema
parameter to define a desired structured output. The response’s output
field will contain the typed data returned by the model. This is particularly useful when scraping or collecting structured data from websites.
Under the hood, we pass in a StructuredOutputTool
to enforce and parse the schema.
Python
TypeScript
Agent credits
Consume agent credits or bring your own API key. Without an API key, each step consumes 1 agent credit. With your own API key, model charges are billed directly to your provider API key.
Full example
Here is how you can build a hybrid computer use agent that can take action graphically, control the browser programmatically, and output structured data.