Conversations

Build stateful agents with persistent conversations

Understanding multi-turn conversations

Multi-turn conversations in Scrapybara enable your agents to maintain context and state across multiple interactions. The Act SDK provides a structured way to manage these conversations through its message architecture.

Message architecture

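The Act SDK represents a conversation as a list of messages. There are three message types, and the content of each message is a list of parts drawn from five part types. Knowing which parts can appear in which message type is what lets you build, inspect, and extend a conversation history yourself.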

Message types

# Message types
class UserMessage:
    role: str = "user"  # Always "user"
    content: List[Union[TextPart, ImagePart]]  # What the user sends

class AssistantMessage:
    role: str = "assistant"  # Always "assistant"
    content: List[Union[TextPart, ToolCallPart, ReasoningPart]]  # The agent's response
    response_id: Optional[str] = None  # Unique identifier for the response

class ToolMessage:
    role: str = "tool"  # Always "tool"
    content: List[ToolResultPart]  # Results from tool operations

Message = Union[UserMessage, AssistantMessage, ToolMessage]

Message part types

Each message type contains various “parts” that serve different purposes:

# Message part types
class TextPart:
    type: str = "text"  # Always "text"
    text: str  # Plain text content

class ImagePart:
    type: str = "image"  # Always "image"
    image: str  # Base64 encoded image or URL
    mime_type: Optional[str] = None  # e.g., "image/png", "image/jpeg"

class ToolCallPart:
    type: str = "tool-call"  # Always "tool-call"
    id: Optional[str] = None  # Unique identifier for the tool call
    tool_call_id: str  # ID matching the tool result
    tool_name: str  # Name of the tool being called
    args: dict[str, Any]  # Arguments passed to the tool

class ToolResultPart:
    type: str = "tool-result"  # Always "tool-result"
    tool_call_id: str  # ID matching the original tool call
    tool_name: str  # Name of the tool that was called
    result: Any  # Result returned by the tool
    is_error: Optional[bool] = False  # Whether the tool execution resulted in an error

class ReasoningPart:
    type: str = "reasoning"  # Always "reasoning"
    id: Optional[str] = None  # Unique identifier for the reasoning part
    reasoning: str  # The agent's internal reasoning
    signature: Optional[str] = None  # Cryptographic signature for verification
    instructions: Optional[str] = None  # Additional context about the reasoning
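
To illustrate how the five part types might be told apart in practice, the sketch below dispatches on the type field of a part. It assumes parts are plain dictionaries using the field names listed above; describe_part is a hypothetical helper written for this example and is not part of the Act SDK.

```python
from typing import Any


def describe_part(part: dict[str, Any]) -> str:
    """Return a one-line summary of a message part (hypothetical helper)."""
    kind = part.get("type")
    if kind == "text":
        return f"text: {part['text']}"
    if kind == "image":
        # Report only the MIME type so base64 data is never dumped to the log.
        return f"image ({part.get('mime_type') or 'unknown type'})"
    if kind == "tool-call":
        return f"tool call {part['tool_name']} with args {part['args']}"
    if kind == "tool-result":
        status = "error" if part.get("is_error") else "ok"
        return f"tool result from {part['tool_name']} [{status}]"
    if kind == "reasoning":
        return f"reasoning: {part['reasoning']}"
    # Unknown types are reported rather than raising, so new part types do not crash a logger.
    return f"unknown part type: {kind!r}"


print(describe_part({"type": "text", "text": "Hello"}))
```

If you work with the objects returned by the SDK rather than dictionaries, the same dispatch would presumably use attribute access (part.type, part.text) instead of key lookups.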

Building multi-turn conversations

Instead of providing a single prompt, you can pass a complete message history using the messages parameter. This allows you to maintain the full conversation context. The Act SDK returns a messages field in the response that contains the complete conversation history. You can reuse this directly in your next act call.

from scrapybara import Scrapybara
from scrapybara.anthropic import Anthropic
from scrapybara.prompts import UBUNTU_SYSTEM_PROMPT
from scrapybara.tools import BashTool, ComputerTool, EditTool

client = Scrapybara()
instance = client.start_ubuntu()

# Initial conversation
response = client.act(
    model=Anthropic(),
    tools=[
        BashTool(instance),
        ComputerTool(instance),
        EditTool(instance),
    ],
    on_step=lambda step: print(step.text),
    system=UBUNTU_SYSTEM_PROMPT,
    prompt="Create a file called hello.py that prints 'Hello, World!'",
)

print('--------------------------------')

# Continue the conversation with the previous messages
follow_up_response = client.act(
    model=Anthropic(),
    tools=[
        BashTool(instance),
        ComputerTool(instance),
        EditTool(instance),
    ],
    on_step=lambda step: print(step.text),
    system=UBUNTU_SYSTEM_PROMPT,
    messages=response.messages + [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Now modify the file to accept a name as a command line argument and print 'Hello, {name}!'"
                }
            ]
        }
    ]
)

instance.stop()
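
Appending a user turn to the returned history is a pattern you will likely repeat, so it can be convenient to wrap it in a small function. The sketch below is one way to do that. The name with_user_text is hypothetical and not part of the SDK; the function only builds the list and makes no API call.

```python
from typing import Any


def with_user_text(history: list[Any], text: str) -> list[Any]:
    """Return a new history with a user text message appended (hypothetical helper)."""
    user_message = {
        "role": "user",
        "content": [{"type": "text", "text": text}],
    }
    # Build a new list so the caller's original history is left untouched.
    return list(history) + [user_message]


previous = [{"role": "user", "content": [{"type": "text", "text": "Create hello.py"}]}]
updated = with_user_text(previous, "Now accept a name argument")
print(len(previous), len(updated))  # prints: 1 2
```

With a helper like this, the follow-up call above could pass messages=with_user_text(response.messages, "...") instead of concatenating the lists inline.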

Including screenshots in messages

Screenshots are a powerful way to provide visual context to your agent. You can include them in user messages using the ImagePart type.

from scrapybara import Scrapybara
from scrapybara.anthropic import Anthropic
from scrapybara.prompts import UBUNTU_SYSTEM_PROMPT

client = Scrapybara()
instance = client.start_ubuntu()

# Take a screenshot
screenshot = instance.screenshot().base_64_image

# Send the screenshot to the agent
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What do you see in this screenshot? Describe the desktop environment."
            },
            {
                "type": "image",
                "image": 'data:image/png;base64,' + screenshot,
                "mime_type": "image/png"
            }
        ]
    }
]

response = client.act(
    model=Anthropic(),
    system=UBUNTU_SYSTEM_PROMPT,
    messages=messages,
)

print(response.text)
instance.stop()
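
If the image comes from somewhere other than instance.screenshot(), for example a file read from disk, you need to build the base64 data URL yourself. The following sketch shows one way to do so with the standard library. It mirrors the data URL format used in the example above; image_part_from_bytes is a hypothetical helper, and it is an assumption that the SDK treats such images the same way as its own screenshots.

```python
import base64
from typing import Any


def image_part_from_bytes(data: bytes, mime_type: str = "image/png") -> dict[str, Any]:
    """Build an image part dict from raw image bytes (hypothetical helper)."""
    # Base64 output is pure ASCII, so decoding to str is always safe.
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image",
        "image": f"data:{mime_type};base64,{encoded}",
        "mime_type": mime_type,
    }


part = image_part_from_bytes(b"abc")
print(part["image"])  # prints: data:image/png;base64,YWJj
```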

Working with tools and reasoning

The Act SDK captures both tool calls and agent reasoning in its message architecture. Here’s how you can access and work with this information:

Examining tool calls and results

from scrapybara import Scrapybara
from scrapybara.anthropic import Anthropic
from scrapybara.prompts import UBUNTU_SYSTEM_PROMPT
from scrapybara.tools import BashTool

client = Scrapybara()
instance = client.start_ubuntu()

response = client.act(
    model=Anthropic(),
    tools=[BashTool(instance)],
    system=UBUNTU_SYSTEM_PROMPT,
    prompt="Show me the current directory structure",
)

# Analyze the conversation steps
for message in response.messages:
    if message.role == "assistant":
        for part in message.content:
            if part.type == "tool-call":
                print(f"Tool called: {part.tool_name}")
                print(f"Arguments: {part.args}")
    elif message.role == "tool":
        for part in message.content:
            print(f"Tool result from {part.tool_name}: {part.result}")

instance.stop()
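
Because a tool call and its result live in separate messages, it can help to join them on tool_call_id when auditing what the agent did. The sketch below does this over messages expressed as plain dictionaries with the field names from the message architecture above. pair_tool_calls and the sample data are hypothetical and exist only for illustration.

```python
from typing import Any, Optional


def pair_tool_calls(
    messages: list[dict[str, Any]],
) -> list[tuple[dict[str, Any], Optional[dict[str, Any]]]]:
    """Match each tool-call part with its tool-result part, or None if missing."""
    # First pass: index every tool result by the call ID it answers.
    results: dict[str, dict[str, Any]] = {}
    for message in messages:
        if message["role"] == "tool":
            for part in message["content"]:
                results[part["tool_call_id"]] = part

    # Second pass: walk assistant messages in order and look up each call's result.
    pairs = []
    for message in messages:
        if message["role"] == "assistant":
            for part in message["content"]:
                if part["type"] == "tool-call":
                    pairs.append((part, results.get(part["tool_call_id"])))
    return pairs


sample_messages = [
    {"role": "user", "content": [{"type": "text", "text": "List files"}]},
    {"role": "assistant", "content": [
        {"type": "tool-call", "tool_call_id": "call_1", "tool_name": "bash",
         "args": {"command": "ls"}},
    ]},
    {"role": "tool", "content": [
        {"type": "tool-result", "tool_call_id": "call_1", "tool_name": "bash",
         "result": "hello.py", "is_error": False},
    ]},
]

for call, result in pair_tool_calls(sample_messages):
    outcome = result["result"] if result is not None else "<no result>"
    print(f"{call['tool_name']}({call['args']}) -> {outcome}")
```

A call paired with None indicates that no matching result was found in the history, which may be worth checking after truncating older messages.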

Accessing agent reasoning

from scrapybara import Scrapybara
from scrapybara.anthropic import Anthropic
from scrapybara.prompts import UBUNTU_SYSTEM_PROMPT
from scrapybara.tools import BashTool, ComputerTool

client = Scrapybara()
instance = client.start_ubuntu()

response = client.act(
    model=Anthropic(name="claude-3-7-sonnet-20250219-thinking"),
    tools=[
        BashTool(instance),
        ComputerTool(instance),
    ],
    system=UBUNTU_SYSTEM_PROMPT,
    prompt="Open Firefox and navigate to scrapybara.com",
)

# Extract reasoning parts from assistant messages
for message in response.messages:
    if message.role == "assistant":
        for part in message.content:
            if part.type == "reasoning":
                print("Agent reasoning:")
                print(part.reasoning)

# Or access reasoning directly from steps
for step in response.steps:
    if step.reasoning_parts:
        print(f"Step reasoning: {step.reasoning_parts}")

instance.stop()

Best practices for multi-turn conversations

  1. Maintain message history: Always use the returned messages from each call to maintain conversation context.

  2. Clear instructions: Provide clear, specific instructions in each new user message.

  3. Handle context length: For very long conversations, consider summarizing or truncating older messages to avoid exceeding model context limits.

  4. Include visual context: Use screenshots when appropriate to provide additional context to the agent.

  5. Monitor token usage: Track token usage through the usage field to prevent exceeding quotas or limits.

  6. Process message parts: Parse and handle different message parts appropriately based on their type.
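
As a rough illustration of the truncation mentioned in point 3, the sketch below keeps only the most recent messages and then trims the front of the window until it begins with a user message, so the kept history never opens with a tool result whose originating call was cut off. It assumes messages are plain dictionaries with a role key; truncate_history is a hypothetical helper, and starting the window on a user turn is a conservative assumption rather than a documented SDK requirement. Summarizing older messages is not shown.

```python
from typing import Any


def truncate_history(messages: list[dict[str, Any]], max_messages: int) -> list[dict[str, Any]]:
    """Keep at most max_messages recent messages, starting on a user turn."""
    if max_messages <= 0:
        return []
    recent = list(messages[-max_messages:])
    # Drop leading assistant/tool messages so the window starts with a user message.
    while recent and recent[0]["role"] != "user":
        recent.pop(0)
    return recent


history = [
    {"role": "user", "content": []},
    {"role": "assistant", "content": []},
    {"role": "tool", "content": []},
    {"role": "assistant", "content": []},
    {"role": "user", "content": []},
    {"role": "assistant", "content": []},
]
print([m["role"] for m in truncate_history(history, 3)])  # prints: ['user', 'assistant']
```

Note that the result can be shorter than max_messages, and is empty if the window contains no user message at all.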

Simple multi-turn example

Here’s an interactive Read-Eval-Print Loop (REPL) implementation that allows you to have ongoing conversations with your agent:

from scrapybara import Scrapybara
from scrapybara.anthropic import Anthropic
from scrapybara.prompts import UBUNTU_SYSTEM_PROMPT
from scrapybara.tools import BashTool, ComputerTool, EditTool

def agent_repl():
    client = Scrapybara()
    instance = client.start_ubuntu()
    tools = [BashTool(instance), ComputerTool(instance), EditTool(instance)]
    messages = []

    print("Scrapybara REPL started. Type 'exit' to quit")

    try:
        while True:
            # Get user input
            user_input = input("\n> ")

            # Exit command
            if user_input.lower() == 'exit':
                break

            # Regular text command
            messages.append({
                "role": "user",
                "content": [{"type": "text", "text": user_input}]
            })

            # Process with agent
            print("Processing...")
            response = client.act(
                model=Anthropic(),
                tools=tools,
                system=UBUNTU_SYSTEM_PROMPT,
                on_step=lambda step: print(step.text),
                messages=messages
            )

            # Update conversation history
            messages = response.messages
    finally:
        instance.stop()
        print("Session ended.")

if __name__ == "__main__":
    agent_repl()