Strands TypeScript SDK - Building Production AI Agents
Introduction
Building AI agents that work in production requires more than wrapping an LLM API. You need tool execution, streaming responses, cost management, and integration with existing systems. After evaluating several frameworks for ai-secure.dev, I chose the Strands TypeScript SDK from AWS.
Why Strands over alternatives?
| Framework | Pros | Cons |
|---|---|---|
| LangChain | Feature-rich, large ecosystem | Heavy, complex abstractions |
| crewAI | Multi-agent orchestration, role-based agents | Python-focused, heavier runtime |
| Raw Anthropic/OpenAI API | Full control | Too low-level, no tool orchestration |
| Strands SDK | Lightweight, AWS-native, streaming-first | Newer, smaller community |
Strands hits the sweet spot: enough abstraction to be productive, but low-level enough to keep control. It's what I used to build the security audit agent behind ai-secure.dev.
Agent Creation Basics
Creating an agent requires three things: a model, a system prompt, and tools.
```typescript
import { Agent, tool } from '@strands-agents/sdk'
import { z } from 'zod'

const agent = new Agent({
  model, // BedrockModel or custom provider
  systemPrompt: `You are a security auditor...`,
  tools: [httpSecurityCheck, dnsLookup, browserNavigate],
})

// Invoke the agent
const response = await agent.invoke('Audit https://example.com')

// Or stream for real-time updates
for await (const event of agent.stream(prompt)) {
  // Handle events: text deltas, tool calls, metadata
}
```

The SDK handles the agentic loop: model generates response → tool calls extracted → tools executed → results fed back → repeat until done.
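That loop is easy to picture in plain TypeScript. Here is a simplified sketch of what the SDK does internally; the ModelTurn and ToolCall shapes and the callModel parameter are illustrative stand-ins, not the SDK's real types:

```typescript
// Simplified sketch of the agentic loop the SDK runs for you.
// ModelTurn, ToolCall, and callModel are illustrative stand-ins,
// not the SDK's actual types or API.
type ToolCall = { name: string; input: unknown }
type ModelTurn = { text?: string; toolCalls: ToolCall[] }

async function runAgentLoop(
  callModel: (history: string[]) => Promise<ModelTurn>,
  tools: Record<string, (input: unknown) => Promise<string>>,
  prompt: string,
): Promise<string> {
  const history = [prompt]
  while (true) {
    const turn = await callModel(history) // 1. model generates a response
    if (turn.toolCalls.length === 0) {
      return turn.text ?? ''              // 4. no tool calls left → done
    }
    for (const call of turn.toolCalls) {  // 2. tool calls extracted
      const run = tools[call.name]
      if (!run) throw new Error(`Unknown tool: ${call.name}`)
      const result = await run(call.input)          // 3. tools executed
      history.push(`[${call.name}] ${result}`)      //    results fed back
    }
  }
}
```

The real loop also handles streaming, errors, and token limits, but the shape is the same: keep calling the model until it stops asking for tools.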
Defining Tools
Tools are functions the agent can call. The tool() helper wraps them with Zod schema validation:
```typescript
const calculatorTool = tool({
  name: 'calculator',
  description: 'Performs arithmetic. Params: operation, a, b',
  inputSchema: z.object({
    operation: z.enum(['add', 'subtract', 'multiply', 'divide']),
    a: z.number(),
    b: z.number(),
  }),
  callback: (input) => {
    let result: number
    switch (input.operation) {
      case 'add': result = input.a + input.b; break
      case 'subtract': result = input.a - input.b; break
      case 'multiply': result = input.a * input.b; break
      case 'divide': result = input.a / input.b; break
    }
    return `Result: ${result}`
  },
})
```

For domain-specific agents, design tools around your use case. My security agent has tools like:
- `http_security_check` - Headers, TLS inspection, redirect chain
- `dns_lookup` - SPF/DMARC/CAA records
- `browser_navigate` - Navigate and interact with pages
- `totp` - Generate 2FA codes for authenticated scans
Complex tool example (abbreviated):
```typescript
const httpSecurityCheckTool = tool({
  name: 'http_security_check',
  description: 'HTTP security analysis: headers, TLS cert, redirects',
  inputSchema: z.object({
    url: z.string().describe('URL to check'),
    method: z.enum(['GET', 'HEAD', 'OPTIONS']).optional(),
    includeTls: z.boolean().optional(),
  }),
  callback: async (input) => {
    // Make request, inspect TLS socket, check headers
    const securityHeaders = ['strict-transport-security', 'content-security-policy', ...]
    // ... implementation
    return JSON.stringify({ url, statusCode, securityHeaders, tls })
  },
})
```

Tools are the agent's "hands" - design them for your domain, not as generic utilities.
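A nice side effect of this design: tool callbacks are plain functions, so you can unit-test them without spinning up an agent or making a model call. A sketch that restates the calculator callback standalone (the `calculate` helper here is mine, not part of the SDK):

```typescript
// Standalone version of the calculator callback, extracted for unit testing.
// No agent or model needed: it's just a function from validated input to string.
type CalcInput = { operation: 'add' | 'subtract' | 'multiply' | 'divide'; a: number; b: number }

function calculate(input: CalcInput): string {
  let result: number
  switch (input.operation) {
    case 'add': result = input.a + input.b; break
    case 'subtract': result = input.a - input.b; break
    case 'multiply': result = input.a * input.b; break
    case 'divide': result = input.a / input.b; break
  }
  return `Result: ${result}`
}
```

Keeping the domain logic in a testable function and letting `tool()` handle only schema validation made iterating on my security tools much faster.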
Custom Model Provider
The SDK includes BedrockModel for AWS Bedrock, but you can create custom providers. I built AnthropicModel for direct Anthropic API access with features like message caching:
```typescript
import Anthropic from '@anthropic-ai/sdk'

export class AnthropicModel {
  private client: Anthropic
  private config: { modelId: string; maxTokens: number; enableMessageCaching: boolean }

  constructor(config: AnthropicModelConfig) {
    this.client = new Anthropic({ apiKey: config.apiKey })
    this.config = {
      modelId: config.modelId || 'claude-sonnet-4-5-20250929',
      maxTokens: config.maxTokens || 16000,
      enableMessageCaching: config.enableMessageCaching ?? true,
    }
  }

  async *stream(messages, options) {
    // Convert messages to Anthropic format
    // Add cache_control blocks for cost reduction
    // Yield SDK-compatible events
  }
}
```

Message caching reduces costs by 90% on repeated context. Add cache_control to strategic messages:
```typescript
// Cache the system prompt (reused every call)
request.system = [{
  type: 'text',
  text: systemPrompt,
  cache_control: { type: 'ephemeral', ttl: '1h' },
}]

// Cache the last tool definition (this caches all tool definitions up to that point)
tools[tools.length - 1].cache_control = { type: 'ephemeral', ttl: '1h' }
```

Cost tracking is built into the model:
```typescript
const MODEL_PRICING = {
  'claude-sonnet-4-5-20250929': { input: 3.00, output: 15.00, cacheRead: 0.30 },
  'claude-haiku-4-5-20251001': { input: 1.00, output: 5.00, cacheRead: 0.10 },
}

function calculateCost(modelId, inputTokens, outputTokens, cacheReadTokens) {
  const pricing = MODEL_PRICING[modelId]
  return (inputTokens * pricing.input + outputTokens * pricing.output
    + cacheReadTokens * pricing.cacheRead) / 1_000_000
}

// e.g. calculateCost('claude-sonnet-4-5-20250929', 15420, 2341, 0) ≈ 0.0814 dollars
```

Model Routing for Cost Optimization
Not every request needs your most powerful model. Route simple tasks to cheaper models:
```typescript
function classifyTask(prompt: string) {
  // Complex patterns → Sonnet
  const complexPatterns = [
    /security|vulnerabil|audit/i,
    /iso\s*27001|compliance/i,
    /investigate|analyze|assess/i,
  ]
  // Simple patterns → Haiku (10x cheaper); the /i flag already makes these case-insensitive
  const simplePatterns = [
    /^(hi|hello|hey)/i,
    /^(thanks|thank\s*you)/i,
    /^(yes|no|ok)/i,
  ]

  for (const pattern of complexPatterns) {
    if (pattern.test(prompt)) {
      return { complexity: 'complex', model: 'claude-sonnet-4-5' }
    }
  }
  for (const pattern of simplePatterns) {
    if (pattern.test(prompt)) {
      return { complexity: 'simple', model: 'claude-haiku-4-5' }
    }
  }

  // URLs always complex (security audits need full power)
  if (prompt.includes('http://') || prompt.includes('https://')) {
    return { complexity: 'complex', model: 'claude-sonnet-4-5' }
  }

  return { complexity: 'complex', model: 'claude-sonnet-4-5' } // Default safe
}

// classifyTask('hi there')                  → simple  → claude-haiku-4-5
// classifyTask('Audit https://example.com') → complex → claude-sonnet-4-5
```

Log cost comparisons in production to validate routing:
```
📊 Tokens: 15420 in, 2341 out | $0.0812 (sonnet-4-5)
   Alternative: $0.4102 (opus-4-5) → +$0.329 (+405%)
```

Streaming Architecture
For real-time UX, stream agent events via Server-Sent Events (SSE):
```typescript
app.post('/invocations', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream')
  res.setHeader('Cache-Control', 'no-cache')

  const sendEvent = (type: string, data: object) => {
    res.write(`data: ${JSON.stringify({ type, ...data })}\n\n`)
  }

  let totalTokens = 0
  let currentTool = ''

  for await (const event of agent.stream(prompt)) {
    // Text streaming
    if (event.type === 'modelContentBlockDeltaEvent') {
      const delta = event.delta
      if (delta?.type === 'textDelta') {
        sendEvent('text', { content: delta.text })
      }
    }
    // Tool execution tracking
    if (event.type === 'modelContentBlockStartEvent') {
      const start = event.start
      if (start?.type === 'toolUseStart') {
        currentTool = start.name
        sendEvent('tool_start', { tool: start.name })
      }
    }
    if (event.type === 'afterToolsEvent') {
      sendEvent('tool_end', { tool: currentTool })
    }
    // Token usage
    if (event.type === 'modelMetadataEvent') {
      totalTokens += event.usage?.totalTokens || 0
    }
  }

  sendEvent('done', { usage: { totalTokens } })
  res.end()
})
```

Key event types:
| Event | When | Use |
|---|---|---|
| `modelContentBlockDeltaEvent` | Text/tool input streaming | Real-time display |
| `modelContentBlockStartEvent` | Tool call begins | Show "Analyzing..." |
| `afterToolsEvent` | Tool finished | Show result |
| `modelMetadataEvent` | Tokens counted | Cost tracking |
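On the client, those `data:` lines have to be parsed back into events. A minimal parser sketch, assuming the `{ type, ... }` payload shape produced by sendEvent on the server:

```typescript
// Parse raw SSE text into the { type, ... } event objects the server emits.
// The payload shape matches the sendEvent sketch above; adapt it to your events.
type AgentEvent = { type: string } & Record<string, unknown>

function parseSseChunk(chunk: string): AgentEvent[] {
  const events: AgentEvent[] = []
  for (const line of chunk.split('\n')) {
    if (!line.startsWith('data: ')) continue // ignore blank lines and comments
    events.push(JSON.parse(line.slice('data: '.length)) as AgentEvent)
  }
  return events
}
```

In a browser you would read the fetch response body incrementally and buffer partial lines between chunks; this sketch assumes each chunk contains whole lines.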
OpenAI-Compatible Adapter
Why build this? During development, I needed to chat with my agent without building a UI first. By exposing an OpenAI-compatible endpoint, I could use Cline (VS Code extension) as my interface - instant chat UI for free.
This let me iterate on tools and prompts rapidly before touching frontend code.
```typescript
export function createOpenAIAdapter(config) {
  const router = Router()

  router.get('/v1/models', (_, res) => {
    res.json({
      data: [{ id: config.modelName, owned_by: 'strands-agents' }],
    })
  })

  router.post('/v1/chat/completions', async (req, res) => {
    const { messages, stream } = req.body
    const prompt = extractPromptFromMessages(messages)
    const { agent } = config.createAgent()

    if (stream) {
      // Stream SSE chunks in OpenAI format
      res.setHeader('Content-Type', 'text/event-stream')
      for await (const event of agent.stream(prompt)) {
        const chunk = toOpenAIChunk(event) // convert to OpenAI chunk format (helper elided)
        res.write(`data: ${JSON.stringify(chunk)}\n\n`)
      }
      res.write('data: [DONE]\n\n')
    } else {
      // Collect and return
      const response = await agent.invoke(prompt)
      res.json({ choices: [{ message: { content: response } }] })
    }
  })

  return router
}

// Mount the adapter
app.use(createOpenAIAdapter({ modelName: 'security-agent', createAgent }))
```

Now point Cline at http://localhost:8080/v1 and it works.
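The extractPromptFromMessages helper is elided in the adapter above; one reasonable sketch takes the most recent user message (the `ChatMessage` shape here is the standard OpenAI message format, simplified to string content):

```typescript
// One way to flatten OpenAI-style chat messages into a single agent prompt:
// take the most recent user message (the agent keeps its own conversation state).
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

function extractPromptFromMessages(messages: ChatMessage[]): string {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === 'user') return messages[i].content
  }
  return ''
}
```

If your agent is stateless per request, you might instead concatenate the whole history; either way, keep the choice explicit, since Cline sends the full conversation on every call.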
Production Tips
Session management with TTL:
```typescript
const sessions = new Map<string, Session>()
const SESSION_TTL_MS = 30 * 60 * 1000 // 30 min

setInterval(() => {
  const now = Date.now()
  for (const [id, session] of sessions) {
    if (now - session.lastAccessedAt > SESSION_TTL_MS) {
      sessions.delete(id)
    }
  }
}, 60 * 1000)
```

Issue tracking during scans:
```typescript
const issueTrackerTool = tool({
  name: 'issue_tracker',
  description: 'Track problems during audit: auth failures, timeouts, etc.',
  inputSchema: z.object({
    type: z.enum(['auth_failed', 'access_denied', 'timeout', 'credentials_required']),
    title: z.string(),
    description: z.string(),
  }),
  callback: (input) => {
    session.issues.push(input)
    return `Issue tracked: ${input.title}`
  },
})
```

Include issues in the final report so users know what couldn't be tested.
Architecture Overview
```
┌─────────────┐     ┌──────────────────┐     ┌─────────────┐
│  Frontend   │────▶│   Agent Server   │────▶│    Tools    │
│  (Next.js)  │     │  (Strands SDK)   │     │ (http, dns, │
└─────────────┘     │                  │     │  browser)   │
       ▲            └────────┬─────────┘     └─────────────┘
       │                     │
  SSE Events            ┌────▼────┐
                        │  Model  │
                        │ Provider│
                        └─────────┘
```

Cost comparison (per 1M tokens):
| Model | Input | Output | Cache Read | Best For |
|---|---|---|---|---|
| Haiku 4.5 | $1.00 | $5.00 | $0.10 | Simple queries, greetings |
| Sonnet 4.5 | $3.00 | $15.00 | $0.30 | Security audits, analysis |
| Opus 4.5 | $5.00 | $25.00 | $0.50 | Complex reasoning |
With routing + caching, typical security audit costs ~$0.08-0.15 vs $0.40+ without.
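That estimate can be sanity-checked against the Sonnet 4.5 prices in the table. The token counts below are illustrative assumptions (120k input of which 100k are cache reads, plus 4k output), not measured data:

```typescript
// Back-of-envelope check of the routing + caching claim, using the
// Sonnet 4.5 prices from the table above. Token counts are assumed.
const INPUT = 3.00, OUTPUT = 15.00, CACHE_READ = 0.30 // $ per 1M tokens

function auditCost(freshIn: number, cachedIn: number, out: number): number {
  return (freshIn * INPUT + cachedIn * CACHE_READ + out * OUTPUT) / 1_000_000
}

const withoutCache = auditCost(120_000, 0, 4_000)   // every input token at full price
const withCache = auditCost(20_000, 100_000, 4_000) // 100k input tokens served from cache
// withoutCache ≈ $0.42, withCache ≈ $0.15
```

Caching alone roughly cuts the bill by two thirds here; routing the simple turns to Haiku pushes the typical audit toward the bottom of the $0.08-0.15 range.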
Conclusion
The Strands TypeScript SDK provides a solid foundation for building production AI agents. Key takeaways:
- Tools are everything - Design domain-specific tools, not generic utilities
- Cache aggressively - Message caching saves 90% on repeated context
- Route by complexity - Not every request needs your best model
- Stream for UX - Users need to see progress during long operations
- Track costs - Log token usage and compare models in production
The SDK handles the agentic loop so you can focus on domain logic. For ai-secure.dev, that meant security analysis - not prompt engineering infrastructure.
Questions or building your own agent? Connect on LinkedIn.