Act SDK
Build computer use agents with one unified SDK — any model, any tool
What is the Act SDK?
The Act SDK is a unified SDK for building computer use agents with Python and TypeScript. It provides a simple interface for executing looping agentic actions with support for many models and tools. Build production-ready computer use agents with pre-built tools to connect to Scrapybara instances.
How it works
act initiates an interaction loop that continues until the agent achieves its objective. Each iteration of the loop is called a step, which consists of the agent’s text response, the agent’s tool calls, and the results of those tool calls. The loop terminates when the agent returns a message without invoking any tools, and returns messages, steps, text, output (if schema is provided), and usage after the agent’s execution.
Python
TypeScript
An act call consists of 3 core components:
Model
The model specifies the base LLM for the agent. At each step, the model examines the previous messages, the current state of the computer, and uses tools to take action. Each step will cost an amount of agent credits depending on the model. You can also bring your own API key to bill model charges directly.
Python
TypeScript
Tools
Tools are functions that enable agents to interact with the computer. Each tool is defined by a name, description, and how it can be executed with parameters and an execution function. A tool can take in a Scrapybara instance to interact with it directly. Learn more about pre-built tools and how to define custom tools here.
Python
TypeScript
Prompt
The prompt is split into two parts, the system prompt and a user prompt. system defines the general behavior of the agent, such as its capabilities and constraints. You can use our provided UBUNTU_SYSTEM_PROMPT, BROWSER_SYSTEM_PROMPT, and WINDOWS_SYSTEM_PROMPT to get started, or define your own. prompt should denote the agent’s current objective. Alternatively, you can provide messages instead of prompt to start the agent with a history of messages. act conveniently returns messages after the agent’s execution, so you can reuse it in another act call.
Python
TypeScript
Structured output
Use the schema parameter to define a desired structured output. The response’s output field will contain the typed data returned by the model. This is particularly useful when scraping or collecting structured data from websites.
Under the hood, we pass in a StructuredOutputTool to enforce and parse the schema.
Python
TypeScript
Agent credits
Consume agent credits or bring your own API key. Without an API key, each step consumes 1 agent credit. With your own API key, model charges are billed directly to your provider API key.
Full example
Here is how you can build a computer use agent that can output structured data.
