Conversations | Scrapybara Docs

Understanding multi-turn conversations

Multi-turn conversations in Scrapybara enable your agents to maintain context and state across multiple interactions. The Act SDK provides a structured way to manage these conversations through its message architecture.

Message architecture

The Act SDK uses a structured message system with three primary message types and five different part types. Understanding these components is crucial for building sophisticated multi-turn agents.

Message types

# Message types
from typing import List, Optional, Union

class UserMessage:
    role: str = "user"  # Always "user"
    content: List[Union[TextPart, ImagePart]]  # What the user sends

class AssistantMessage:
    role: str = "assistant"  # Always "assistant"
    content: List[Union[TextPart, ToolCallPart, ReasoningPart]]  # The agent's response
    response_id: Optional[str] = None  # Unique identifier for the response

class ToolMessage:
    role: str = "tool"  # Always "tool"
    content: List[ToolResultPart]  # Results from tool operations

Message = Union[UserMessage, AssistantMessage, ToolMessage]

Message part types

Each message type contains various “parts” that serve different purposes:

# Message part types
from typing import Any, Optional

class TextPart:
    type: str = "text"  # Always "text"
    text: str  # Plain text content

class ImagePart:
    type: str = "image"  # Always "image"
    image: str  # Base64 encoded image or URL
    mime_type: Optional[str] = None  # e.g., "image/png", "image/jpeg"

class ToolCallPart:
    type: str = "tool-call"  # Always "tool-call"
    id: Optional[str] = None  # Unique identifier for the tool call
    tool_call_id: str  # ID matching the tool result
    tool_name: str  # Name of the tool being called
    args: dict[str, Any]  # Arguments passed to the tool

class ToolResultPart:
    type: str = "tool-result"  # Always "tool-result"
    tool_call_id: str  # ID matching the original tool call
    tool_name: str  # Name of the tool that was called
    result: Any  # Result returned by the tool
    is_error: Optional[bool] = False  # Whether the tool execution resulted in an error

class ReasoningPart:
    type: str = "reasoning"  # Always "reasoning"
    id: Optional[str] = None  # Unique identifier for the reasoning part
    reasoning: str  # The agent's internal reasoning
    signature: Optional[str] = None  # Cryptographic signature for verification
    instructions: Optional[str] = None  # Additional context about the reasoning
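
To make the pairing between message roles and part types concrete, here is a minimal sketch that checks a message dictionary against the definitions above. The `ALLOWED_PARTS` table and the `invalid_parts` helper are illustrative names, not part of the Act SDK, and the sketch assumes messages are written as plain dictionaries like those used in the examples on this page.

```python
from typing import Any, Dict, List, Set

# Part types each role may carry, transcribed from the class definitions above.
# This table is an illustration, not something exported by the SDK.
ALLOWED_PARTS: Dict[str, Set[str]] = {
    "user": {"text", "image"},
    "assistant": {"text", "tool-call", "reasoning"},
    "tool": {"tool-result"},
}

def invalid_parts(message: Dict[str, Any]) -> List[str]:
    """Return the part types that the message's role is not expected to carry."""
    allowed = ALLOWED_PARTS.get(message["role"], set())
    return [part["type"] for part in message["content"] if part["type"] not in allowed]

ok_message = {"role": "user", "content": [{"type": "text", "text": "List the files in /tmp"}]}
bad_message = {"role": "tool", "content": [{"type": "text", "text": "not a tool result"}]}

print(invalid_parts(ok_message))
print(invalid_parts(bad_message))
```

An empty list means every part is one the role is expected to carry. A non-empty list names the offending part types.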

Building multi-turn conversations

Instead of providing a single prompt, you can pass a complete message history using the messages parameter. This allows you to maintain the full conversation context. The Act SDK returns a messages field in the response that contains the complete conversation history. You can reuse this directly in your next act call.

Python

from scrapybara import Scrapybara
from scrapybara.anthropic import Anthropic
from scrapybara.prompts import UBUNTU_SYSTEM_PROMPT
from scrapybara.tools import BashTool, ComputerTool, EditTool

client = Scrapybara()
instance = client.start_ubuntu()

# Initial conversation
response = client.act(
    model=Anthropic(),
    tools=[
        BashTool(instance),
        ComputerTool(instance),
        EditTool(instance),
    ],
    on_step=lambda step: print(step.text),
    system=UBUNTU_SYSTEM_PROMPT,
    prompt="Create a file called hello.py that prints 'Hello, World!'",
)

print('--------------------------------')

# Continue the conversation with the previous messages
follow_up_response = client.act(
    model=Anthropic(),
    tools=[
        BashTool(instance),
        ComputerTool(instance),
        EditTool(instance),
    ],
    on_step=lambda step: print(step.text),
    system=UBUNTU_SYSTEM_PROMPT,
    messages=response.messages + [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Now modify the file to accept a name as a command line argument and print 'Hello, {name}!'"
                }
            ]
        }
    ]
)

instance.stop()
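
The "append a user turn to the previous messages" step can be factored into a small helper. The sketch below is one possible shape for it. `with_user_text` is a hypothetical name rather than an SDK function, and the `previous` list is a stand-in for `response.messages` so the snippet runs without a live instance.

```python
from typing import Any, Dict, List

def with_user_text(history: List[Any], text: str) -> List[Any]:
    """Return a new history with one user text message appended.

    The input list is not mutated, so the messages returned by the
    previous call can still be inspected or reused afterwards.
    """
    follow_up: Dict[str, Any] = {
        "role": "user",
        "content": [{"type": "text", "text": text}],
    }
    return history + [follow_up]

# Stand-in for response.messages from an earlier act call.
previous = [{"role": "user", "content": [{"type": "text", "text": "Create hello.py"}]}]
updated = with_user_text(previous, "Now accept a name as a command line argument")

print(len(previous), len(updated))
print(updated[-1]["content"][0]["text"])
```

Returning a new list instead of appending in place keeps each turn's history separate, which is convenient if you want to retry a follow-up from the same starting point.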

Including screenshots in messages

Screenshots are a powerful way to provide visual context to your agent. You can include them in user messages using the ImagePart type.

Python

from scrapybara import Scrapybara
from scrapybara.anthropic import Anthropic
from scrapybara.prompts import UBUNTU_SYSTEM_PROMPT

client = Scrapybara()
instance = client.start_ubuntu()

# Take a screenshot
screenshot = instance.screenshot().base_64_image

# Send the screenshot to the agent
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What do you see in this screenshot? Describe the desktop environment."
            },
            {
                "type": "image",
                "image": "data:image/png;base64," + screenshot,
                "mime_type": "image/png"
            }
        ]
    }
]

response = client.act(
    model=Anthropic(),
    system=UBUNTU_SYSTEM_PROMPT,
    messages=messages,
)

print(response.text)
instance.stop()
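
The example above starts from a screenshot that is already Base64 encoded. If you instead hold raw image bytes (for instance, read from a file), a helper along the following lines could build the same kind of image part. `to_image_part` is a hypothetical name, and the data URL layout simply mirrors the string concatenation used above.

```python
import base64
from typing import Dict

def to_image_part(data: bytes, mime_type: str = "image/png") -> Dict[str, str]:
    """Encode raw image bytes as an image part carrying a Base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image",
        "image": f"data:{mime_type};base64,{encoded}",
        "mime_type": mime_type,
    }

# The eight-byte PNG signature stands in for real image data here.
png_signature = b"\x89PNG\r\n\x1a\n"
part = to_image_part(png_signature)

print(part["type"], part["mime_type"])
print(part["image"].startswith("data:image/png;base64,"))
```

The resulting dictionary can be placed in a user message's `content` list next to a text part, in the same position as the image entry in the example above.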

Working with tools and reasoning

The Act SDK captures both tool calls and agent reasoning in its message architecture. Here’s how you can access and work with this information:

Examining tool calls and results

Python

from scrapybara import Scrapybara
from scrapybara.anthropic import Anthropic
from scrapybara.prompts import UBUNTU_SYSTEM_PROMPT
from scrapybara.tools import BashTool

client = Scrapybara()
instance = client.start_ubuntu()

response = client.act(
    model=Anthropic(),
    tools=[BashTool(instance)],
    system=UBUNTU_SYSTEM_PROMPT,
    prompt="Show me the current directory structure",
)

# Analyze the conversation steps
for message in response.messages:
    if message.role == "assistant":
        for part in message.content:
            if part.type == "tool-call":
                print(f"Tool called: {part.tool_name}")
                print(f"Arguments: {part.args}")
    elif message.role == "tool":
        for part in message.content:
            print(f"Tool result from {part.tool_name}: {part.result}")

instance.stop()
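
Because each tool result carries the `tool_call_id` of the call that produced it, calls and results can also be joined into one record per invocation. The sketch below does this over plain dictionaries shaped like the classes defined earlier. With the SDK's message objects you would use attribute access (`part.tool_call_id`) as in the example above. The `pair_tool_calls` helper and the sample history, including the tool name and argument shape, are made up for illustration.

```python
from typing import Any, Dict, List

def pair_tool_calls(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Match each tool-call part to the tool-result part with the same tool_call_id."""
    # Index every tool result by the ID of the call it answers.
    results = {
        part["tool_call_id"]: part
        for message in messages
        if message["role"] == "tool"
        for part in message["content"]
    }
    pairs = []
    for message in messages:
        if message["role"] != "assistant":
            continue
        for part in message["content"]:
            if part["type"] != "tool-call":
                continue
            result = results.get(part["tool_call_id"])  # None if no result was recorded
            pairs.append({
                "tool_name": part["tool_name"],
                "args": part["args"],
                "result": None if result is None else result["result"],
                "is_error": bool(result and result.get("is_error")),
            })
    return pairs

# Hand-written sample history; real histories come from response.messages.
history = [
    {"role": "user", "content": [{"type": "text", "text": "List /tmp"}]},
    {"role": "assistant", "content": [
        {"type": "tool-call", "tool_call_id": "call_1", "tool_name": "bash",
         "args": {"command": "ls /tmp"}},
    ]},
    {"role": "tool", "content": [
        {"type": "tool-result", "tool_call_id": "call_1", "tool_name": "bash",
         "result": "a.txt", "is_error": False},
    ]},
]

for pair in pair_tool_calls(history):
    print(pair["tool_name"], pair["args"], pair["result"], pair["is_error"])
```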

Accessing agent reasoning

Python

from scrapybara import Scrapybara
from scrapybara.anthropic import Anthropic
from scrapybara.prompts import UBUNTU_SYSTEM_PROMPT
from scrapybara.tools import BashTool, ComputerTool

client = Scrapybara()
instance = client.start_ubuntu()

response = client.act(
    model=Anthropic(name="claude-3-7-sonnet-20250219-thinking"),
    tools=[
        BashTool(instance),
        ComputerTool(instance),
    ],
    system=UBUNTU_SYSTEM_PROMPT,
    prompt="Open Firefox and navigate to scrapybara.com",
)

# Extract reasoning parts from assistant messages
for message in response.messages:
    if message.role == "assistant":
        for part in message.content:
            if part.type == "reasoning":
                print("Agent reasoning:")
                print(part.reasoning)

# Or access reasoning directly from steps
for step in response.steps:
    if step.reasoning_parts:
        print(f"Step reasoning: {step.reasoning_parts}")

instance.stop()

Best practices for multi-turn conversations

Maintain message history: Always use the returned messages from each call to maintain conversation context.
Clear instructions: Provide clear, specific instructions in each new user message.
Handle context length: For very long conversations, consider summarizing or truncating older messages to avoid exceeding model context limits.
Include visual context: Use screenshots when appropriate to provide additional context to the agent.
Monitor token usage: Track token usage through the usage field to prevent exceeding quotas or limits.
Process message parts: Parse and handle different message parts appropriately based on their type.
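
As one way to act on the context-length advice above, the following sketch keeps only the most recent messages and then trims the window so it begins on a user turn. That second step reflects a conservative assumption: a tool result or assistant reply should not be sent without the turn that led to it. Whether a given model requires this is not stated here. `truncate_history` is a hypothetical helper operating on plain message dictionaries, not an SDK function.

```python
from typing import Any, Dict, List

def truncate_history(messages: List[Dict[str, Any]], max_messages: int) -> List[Dict[str, Any]]:
    """Keep at most max_messages recent messages, starting on a user turn."""
    if max_messages <= 0:
        return []
    recent = list(messages[-max_messages:])
    # Drop leading messages until the window starts with a user message, so no
    # tool result or assistant reply is kept without the turn that preceded it.
    while recent and recent[0]["role"] != "user":
        recent.pop(0)
    return recent

def text(role: str, value: str) -> Dict[str, Any]:
    """Build a one-part text message for the sample history below."""
    return {"role": role, "content": [{"type": "text", "text": value}]}

# Simplified sample history; the tool message uses a text part only for brevity.
history = [
    text("user", "Create hello.py"),
    text("assistant", "Running a command"),
    text("tool", "file written"),
    text("assistant", "Created hello.py"),
    text("user", "Now add a name argument"),
    text("assistant", "Updated hello.py"),
]

print([m["role"] for m in truncate_history(history, 4)])
print(len(truncate_history(history, 10)))
```

A count-based window is the simplest policy. Summarizing the dropped messages into a single user message is an alternative when earlier context still matters.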

Simple multi-turn example

Here’s an interactive Read-Eval-Print Loop (REPL) implementation that allows you to have ongoing conversations with your agent:

Python

from scrapybara import Scrapybara
from scrapybara.anthropic import Anthropic
from scrapybara.prompts import UBUNTU_SYSTEM_PROMPT
from scrapybara.tools import BashTool, ComputerTool, EditTool

def agent_repl():
    client = Scrapybara()
    instance = client.start_ubuntu()
    tools = [BashTool(instance), ComputerTool(instance), EditTool(instance)]
    messages = []

    print("Scrapybara REPL started. Type 'exit' to quit")

    try:
        while True:
            # Get user input
            user_input = input("\n> ")

            # Exit command
            if user_input.lower() == 'exit':
                break

            # Regular text command
            messages.append({
                "role": "user",
                "content": [{"type": "text", "text": user_input}]
            })

            # Process with agent
            print("Processing...")
            response = client.act(
                model=Anthropic(),
                tools=tools,
                system=UBUNTU_SYSTEM_PROMPT,
                on_step=lambda step: print(step.text),
                messages=messages
            )

            # Update conversation history
            messages = response.messages
    finally:
        instance.stop()
        print("Session ended.")

if __name__ == "__main__":
    agent_repl()