Act SDK | Scrapybara Docs

What is the Act SDK?

The Act SDK is a unified SDK for building computer use agents with Python and TypeScript. It provides a simple interface for executing looping agentic actions with support for many models and tools. Build production-ready computer use agents with pre-built tools to connect to Scrapybara instances.

How it works

act initiates an interaction loop that continues until the agent achieves its objective. Each iteration of the loop is called a step, which consists of the agent’s text response, the agent’s tool calls, and the results of those tool calls. The loop terminates when the agent returns a message without invoking any tools, and returns messages, steps, text, output (if schema is provided), and usage after the agent’s execution.

Python

TypeScript

1 response = client.act(
2     model=OpenAI(),
3     tools=[
4         BashTool(instance),
5         ComputerTool(instance),
6         EditTool(instance),
7     ],
8     system=UBUNTU_SYSTEM_PROMPT,
9     prompt="Go to the top link on Hacker News",
10     on_step=lambda step: print(step.text),
11 )
12 messages = response.messages
13 steps = response.steps
14 text = response.text
15 usage = response.usage

An act call consists of 3 core components:

Model

The model specifies the base LLM for the agent. At each step, the model examines the previous messages, the current state of the computer, and uses tools to take action. Each step will cost an amount of agent credits depending on the model. You can also bring your own API key to bill model charges directly.

Python

TypeScript

1 from scrapybara.openai import OpenAI
2 
3 model = OpenAI()
4 
5 # Use your own API key
6 model = OpenAI(api_key="your_api_key")

Tools

Tools are functions that enable agents to interact with the computer. Each tool is defined by a name, description, and how it can be executed with parameters and an execution function. A tool can take in a Scrapybara instance to interact with it directly. Learn more about pre-built tools and how to define custom tools here.

Python

TypeScript

1 from scrapybara import Scrapybara
2 from scrapybara.tools import BashTool, ComputerTool, EditTool
3 
4 client = Scrapybara()
5 instance = client.start_ubuntu()
6 
7 tools = [
8     BashTool(instance),
9     ComputerTool(instance),
10     EditTool(instance),
11 ]

Prompt

The prompt is split into two parts, the system prompt and a user prompt. system defines the general behavior of the agent, such as its capabilities and constraints. You can use our provided UBUNTU_SYSTEM_PROMPT, BROWSER_SYSTEM_PROMPT, and WINDOWS_SYSTEM_PROMPT to get started, or define your own. prompt should denote the agent’s current objective. Alternatively, you can provide messages instead of prompt to start the agent with a history of messages. act conveniently returns messages after the agent’s execution, so you can reuse it in another act call.

Python

TypeScript

1 from scrapybara.prompts import UBUNTU_SYSTEM_PROMPT
2 
3 system = UBUNTU_SYSTEM_PROMPT
4 prompt = "Go to the top link on Hacker News"

Structured output

Use the schema parameter to define a desired structured output. The response’s output field will contain the typed data returned by the model. This is particularly useful when scraping or collecting structured data from websites.

Under the hood, we pass in a StructuredOutputTool to enforce and parse the schema.

Python

TypeScript

1 from pydantic import BaseModel
2 from typing import List
3 
4 class HNSchema(BaseModel):
5     class Post(BaseModel):
6         title: str
7         url: str 
8         points: int
9     
10     posts: List[Post]
11 
12 response = client.act(
13     model=OpenAI(),
14     tools=[
15         ComputerTool(instance),
16     ],
17     schema=HNSchema,
18     system=UBUNTU_SYSTEM_PROMPT,
19     prompt="Get the top 10 posts on Hacker News",
20 )
21 
22 posts = response.output.posts

Agent credits

Consume agent credits or bring your own API key. Without an API key, each step consumes 1 agent credit. With your own API key, model charges are billed directly to your provider API key.

Full example

Here is how you can build a computer use agent that can output structured data.

Python

TypeScript

1 from scrapybara import Scrapybara
2 from scrapybara.openai import OpenAI
3 from scrapybara.prompts import UBUNTU_SYSTEM_PROMPT
4 from scrapybara.tools import ComputerTool
5 from pydantic import BaseModel
6 from typing import List
7 
8 client = Scrapybara()
9 instance = client.start_ubuntu()
10 
11 class HNSchema(BaseModel):
12     class Post(BaseModel):
13         title: str
14         url: str 
15         points: int
16     
17     posts: List[Post]
18 
19 response = client.act(
20     model=OpenAI(),
21     tools=[
22         ComputerTool(instance),
23     ],
24     system=UBUNTU_SYSTEM_PROMPT,
25     prompt="Get the top 10 posts on Hacker News",
26     schema=HNSchema,
27     on_step=lambda step: print(step.text),
28 )
29 
30 posts = response.output.posts
31 print(posts)
32 
33 instance.stop()