Strands TypeScript SDK - Building Production AI Agents
Introduction
Building AI agents that work in production requires more than wrapping an LLM API. You need tool execution, streaming responses, cost management, and integration with existing systems. After evaluating several frameworks for ai-secure.dev, I chose the Strands TypeScript SDK from AWS.
Why Strands over alternatives?
| Framework | Pros | Cons |
|---|---|---|
| LangChain | Feature-rich, large ecosystem | Heavy, complex abstractions |
| crewAI | Multi-agent orchestration, role-based agents | Python-focused, heavier runtime |
| Raw Anthropic/OpenAI API | Full control | Too low-level, no tool orchestration |
| Strands SDK | Lightweight, AWS-native, streaming-first | Newer, smaller community |
Strands hits the sweet spot: enough abstraction to be productive, but low-level enough to keep control. It's what I used to build the security audit agent behind ai-secure.dev.
Agent Creation Basics
Creating an agent requires three things: a model, a system prompt, and tools.
```typescript
import { Agent, tool } from '@strands-agents/sdk'
import { z } from 'zod'

const agent = new Agent({
  model, // BedrockModel or custom provider
  systemPrompt: `You are a security auditor...`,
  tools: [httpSecurityCheck, dnsLookup, browserNavigate],
})

// Invoke the agent
const response = await agent.invoke('Audit https://example.com')

// Or stream for real-time updates
for await (const event of agent.stream(prompt)) {
  // Handle events: text deltas, tool calls, metadata
}
```

The SDK handles the agentic loop: model generates response → tool calls extracted → tools executed → results fed back → repeat until done.
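That loop is easy to picture in plain TypeScript. Here is a simplified sketch of what the SDK does internally; the ModelTurn and ToolCall shapes and the callModel parameter are illustrative stand-ins, not the SDK's real types:

```typescript
// Simplified sketch of the agentic loop the SDK runs for you.
// ModelTurn, ToolCall, and callModel are illustrative stand-ins,
// not the SDK's actual types or API.
type ToolCall = { name: string; input: unknown }
type ModelTurn = { text?: string; toolCalls: ToolCall[] }

async function runAgentLoop(
  callModel: (history: string[]) => Promise<ModelTurn>,
  tools: Record<string, (input: unknown) => Promise<string>>,
  prompt: string,
): Promise<string> {
  const history = [prompt]
  while (true) {
    const turn = await callModel(history) // 1. model generates a response
    if (turn.toolCalls.length === 0) {
      return turn.text ?? ''              // 4. no tool calls left → done
    }
    for (const call of turn.toolCalls) {  // 2. tool calls extracted
      const run = tools[call.name]
      if (!run) throw new Error(`Unknown tool: ${call.name}`)
      const result = await run(call.input)          // 3. tools executed
      history.push(`[${call.name}] ${result}`)      //    results fed back
    }
  }
}
```

The real loop also handles streaming, errors, and token limits, but the shape is the same: keep calling the model until it stops asking for tools.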
Defining Tools
Tools are functions the agent can call. The tool() helper wraps them with Zod schema validation:
```typescript
const calculatorTool = tool({
  name: 'calculator',
  description: 'Performs arithmetic. Params: operation, a, b',
  inputSchema: z.object({
    operation: z.enum(['add', 'subtract', 'multiply', 'divide']),
    a: z.number(),
    b: z.number(),
  }),
  callback: (input) => {
    let result: number
    switch (input.operation) {
      case 'add': result = input.a + input.b; break
      case 'subtract': result = input.a - input.b; break
      case 'multiply': result = input.a * input.b; break
      case 'divide': result = input.a / input.b; break
    }
    return `Result: ${result}`
  },
})
```

For domain-specific agents, design tools around your use case. My security agent has tools like:
- `http_security_check` - Headers, TLS inspection, redirect chain
- `dns_lookup` - SPF/DMARC/CAA records
- `browser_navigate` - Navigate and interact with pages
- `totp` - Generate 2FA codes for authenticated scans
Complex tool example (abbreviated):
```typescript
const httpSecurityCheckTool = tool({
  name: 'http_security_check',
  description: 'HTTP security analysis: headers, TLS cert, redirects',
  inputSchema: z.object({
    url: z.string().describe('URL to check'),
    method: z.enum(['GET', 'HEAD', 'OPTIONS']).optional(),
    includeTls: z.boolean().optional(),
  }),
  callback: async (input) => {
    // Make request, inspect TLS socket, check headers
    const securityHeaders = ['strict-transport-security', 'content-security-policy', ...]
    // ... implementation
    return JSON.stringify({ url, statusCode, securityHeaders, tls })
  },
})
```

Tools are the agent's "hands" - design them for your domain, not as generic utilities.
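A nice side effect of this design: tool callbacks are plain functions, so you can unit-test them without spinning up an agent or making a model call. A sketch that restates the calculator callback standalone (the `calculate` helper here is mine, not part of the SDK):

```typescript
// Standalone version of the calculator callback, extracted for unit testing.
// No agent or model needed: it's just a function from validated input to string.
type CalcInput = { operation: 'add' | 'subtract' | 'multiply' | 'divide'; a: number; b: number }

function calculate(input: CalcInput): string {
  let result: number
  switch (input.operation) {
    case 'add': result = input.a + input.b; break
    case 'subtract': result = input.a - input.b; break
    case 'multiply': result = input.a * input.b; break
    case 'divide': result = input.a / input.b; break
  }
  return `Result: ${result}`
}
```

Keeping the domain logic in a testable function and letting `tool()` handle only schema validation made iterating on my security tools much faster.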
Custom Model Provider
The SDK includes BedrockModel for AWS Bedrock, but you can create custom providers. I built AnthropicModel for direct Anthropic API access with features like message caching:
```typescript
import Anthropic from '@anthropic-ai/sdk'

export class AnthropicModel {
  private client: Anthropic
  private config: { modelId: string; maxTokens: number; enableMessageCaching: boolean }

  constructor(config: AnthropicModelConfig) {
    this.client = new Anthropic({ apiKey: config.apiKey })
    this.config = {
      modelId: config.modelId || 'claude-sonnet-4-5-20250929',
      maxTokens: config.maxTokens || 16000,
      enableMessageCaching: config.enableMessageCaching ?? true,
    }
  }

  async *stream(messages, options) {
    // Convert messages to Anthropic format
    // Add cache_control blocks for cost reduction
    // Yield SDK-compatible events
  }
}
```

Message caching reduces costs by 90% on repeated context. Add cache_control to strategic messages:
```typescript
// Cache the system prompt (reused every call)
request.system = [{
  type: 'text',
  text: systemPrompt,
  cache_control: { type: 'ephemeral', ttl: '1h' },
}]

// Cache the last tool definition (this caches all tool definitions up to that point)
tools[tools.length - 1].cache_control = { type: 'ephemeral', ttl: '1h' }
```

Cost tracking is built into the model:
```typescript
const MODEL_PRICING = {
  'claude-sonnet-4-5-20250929': { input: 3.00, output: 15.00, cacheRead: 0.30 },
  'claude-haiku-4-5-20251001': { input: 1.00, output: 5.00, cacheRead: 0.10 },
}

function calculateCost(modelId, inputTokens, outputTokens, cacheReadTokens) {
  const pricing = MODEL_PRICING[modelId]
  return (inputTokens * pricing.input + outputTokens * pricing.output
    + cacheReadTokens * pricing.cacheRead) / 1_000_000
}

// e.g. calculateCost('claude-sonnet-4-5-20250929', 15420, 2341, 0) ≈ 0.0814 dollars
```

Model Routing for Cost Optimization
Not every request needs your most powerful model. Route simple tasks to cheaper models:
```typescript
function classifyTask(prompt: string) {
  // Complex patterns → Sonnet
  const complexPatterns = [
    /security|vulnerabil|audit/i,
    /iso\s*27001|compliance/i,
    /investigate|analyze|assess/i,
  ]
  // Simple patterns → Haiku (10x cheaper); the /i flag already makes these case-insensitive
  const simplePatterns = [
    /^(hi|hello|hey)/i,
    /^(thanks|thank\s*you)/i,
    /^(yes|no|ok)/i,
  ]

  for (const pattern of complexPatterns) {
    if (pattern.test(prompt)) {
      return { complexity: 'complex', model: 'claude-sonnet-4-5' }
    }
  }
  for (const pattern of simplePatterns) {
    if (pattern.test(prompt)) {
      return { complexity: 'simple', model: 'claude-haiku-4-5' }
    }
  }

  // URLs always complex (security audits need full power)
  if (prompt.includes('http://') || prompt.includes('https://')) {
    return { complexity: 'complex', model: 'claude-sonnet-4-5' }
  }

  return { complexity: 'complex', model: 'claude-sonnet-4-5' } // Default safe
}

// classifyTask('hi there')                  → simple  → claude-haiku-4-5
// classifyTask('Audit https://example.com') → complex → claude-sonnet-4-5
```

Log cost comparisons in production to validate routing:
```
📊 Tokens: 15420 in, 2341 out | $0.0812 (sonnet-4-5)
   Alternative: $0.4102 (opus-4-5) → +$0.329 (+405%)
```

Streaming Architecture
For real-time UX, stream agent events via Server-Sent Events (SSE):
```typescript
app.post('/invocations', async (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream')
  res.setHeader('Cache-Control', 'no-cache')

  const sendEvent = (type: string, data: object) => {
    res.write(`data: ${JSON.stringify({ type, ...data })}\n\n`)
  }

  let totalTokens = 0
  let currentTool = ''

  for await (const event of agent.stream(prompt)) {
    // Text streaming
    if (event.type === 'modelContentBlockDeltaEvent') {
      const delta = event.delta
      if (delta?.type === 'textDelta') {
        sendEvent('text', { content: delta.text })
      }
    }
    // Tool execution tracking
    if (event.type === 'modelContentBlockStartEvent') {
      const start = event.start
      if (start?.type === 'toolUseStart') {
        currentTool = start.name
        sendEvent('tool_start', { tool: start.name })
      }
    }
    if (event.type === 'afterToolsEvent') {
      sendEvent('tool_end', { tool: currentTool })
    }
    // Token usage
    if (event.type === 'modelMetadataEvent') {
      totalTokens += event.usage?.totalTokens || 0
    }
  }

  sendEvent('done', { usage: { totalTokens } })
  res.end()
})
```

Key event types:
| Event | When | Use |
|---|---|---|
| `modelContentBlockDeltaEvent` | Text/tool input streaming | Real-time display |
| `modelContentBlockStartEvent` | Tool call begins | Show "Analyzing..." |
| `afterToolsEvent` | Tool finished | Show result |
| `modelMetadataEvent` | Tokens counted | Cost tracking |
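On the client, those `data:` lines have to be parsed back into events. A minimal parser sketch, assuming the `{ type, ... }` payload shape produced by sendEvent on the server:

```typescript
// Parse raw SSE text into the { type, ... } event objects the server emits.
// The payload shape matches the sendEvent sketch above; adapt it to your events.
type AgentEvent = { type: string } & Record<string, unknown>

function parseSseChunk(chunk: string): AgentEvent[] {
  const events: AgentEvent[] = []
  for (const line of chunk.split('\n')) {
    if (!line.startsWith('data: ')) continue // ignore blank lines and comments
    events.push(JSON.parse(line.slice('data: '.length)) as AgentEvent)
  }
  return events
}
```

In a browser you would read the fetch response body incrementally and buffer partial lines between chunks; this sketch assumes each chunk contains whole lines.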
OpenAI-Compatible Adapter
Why build this? During development, I needed to chat with my agent without building a UI first. By exposing an OpenAI-compatible endpoint, I could use Cline (VS Code extension) as my interface - instant chat UI for free.
This let me iterate on tools and prompts rapidly before touching frontend code.
```typescript
export function createOpenAIAdapter(config) {
  const router = Router()

  router.get('/v1/models', (_, res) => {
    res.json({
      data: [{ id: config.modelName, owned_by: 'strands-agents' }],
    })
  })

  router.post('/v1/chat/completions', async (req, res) => {
    const { messages, stream } = req.body
    const prompt = extractPromptFromMessages(messages)
    const { agent } = config.createAgent()

    if (stream) {
      // Stream SSE chunks in OpenAI format
      res.setHeader('Content-Type', 'text/event-stream')
      for await (const event of agent.stream(prompt)) {
        const chunk = toOpenAIChunk(event) // convert to OpenAI chunk format (helper elided)
        res.write(`data: ${JSON.stringify(chunk)}\n\n`)
      }
      res.write('data: [DONE]\n\n')
    } else {
      // Collect and return
      const response = await agent.invoke(prompt)
      res.json({ choices: [{ message: { content: response } }] })
    }
  })

  return router
}

// Mount the adapter
app.use(createOpenAIAdapter({ modelName: 'security-agent', createAgent }))
```

Now point Cline at http://localhost:8080/v1 and it works.
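The extractPromptFromMessages helper is elided in the adapter above; one reasonable sketch takes the most recent user message (the `ChatMessage` shape here is the standard OpenAI message format, simplified to string content):

```typescript
// One way to flatten OpenAI-style chat messages into a single agent prompt:
// take the most recent user message (the agent keeps its own conversation state).
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

function extractPromptFromMessages(messages: ChatMessage[]): string {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === 'user') return messages[i].content
  }
  return ''
}
```

If your agent is stateless per request, you might instead concatenate the whole history; either way, keep the choice explicit, since Cline sends the full conversation on every call.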
Production Tips
Session management with TTL:
```typescript
const sessions = new Map<string, Session>()
const SESSION_TTL_MS = 30 * 60 * 1000 // 30 min

setInterval(() => {
  const now = Date.now()
  for (const [id, session] of sessions) {
    if (now - session.lastAccessedAt > SESSION_TTL_MS) {
      sessions.delete(id)
    }
  }
}, 60 * 1000)
```

Issue tracking during scans:
```typescript
const issueTrackerTool = tool({
  name: 'issue_tracker',
  description: 'Track problems during audit: auth failures, timeouts, etc.',
  inputSchema: z.object({
    type: z.enum(['auth_failed', 'access_denied', 'timeout', 'credentials_required']),
    title: z.string(),
    description: z.string(),
  }),
  callback: (input) => {
    session.issues.push(input)
    return `Issue tracked: ${input.title}`
  },
})
```

Include issues in the final report so users know what couldn't be tested.
Architecture Overview
```
┌─────────────┐     ┌──────────────────┐     ┌─────────────┐
│  Frontend   │────▶│   Agent Server   │────▶│    Tools    │
│  (Next.js)  │     │  (Strands SDK)   │     │ (http, dns, │
└─────────────┘     │                  │     │  browser)   │
       ▲            └────────┬─────────┘     └─────────────┘
       │                     │
  SSE Events            ┌────▼────┐
                        │  Model  │
                        │ Provider│
                        └─────────┘
```

Cost comparison (per 1M tokens):
| Model | Input | Output | Cache Read | Best For |
|---|---|---|---|---|
| Haiku 4.5 | $1.00 | $5.00 | $0.10 | Simple queries, greetings |
| Sonnet 4.5 | $3.00 | $15.00 | $0.30 | Security audits, analysis |
| Opus 4.5 | $5.00 | $25.00 | $0.50 | Complex reasoning |
With routing + caching, typical security audit costs ~$0.08-0.15 vs $0.40+ without.
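That estimate can be sanity-checked against the Sonnet 4.5 prices in the table. The token counts below are illustrative assumptions (120k input of which 100k are cache reads, plus 4k output), not measured data:

```typescript
// Back-of-envelope check of the routing + caching claim, using the
// Sonnet 4.5 prices from the table above. Token counts are assumed.
const INPUT = 3.00, OUTPUT = 15.00, CACHE_READ = 0.30 // $ per 1M tokens

function auditCost(freshIn: number, cachedIn: number, out: number): number {
  return (freshIn * INPUT + cachedIn * CACHE_READ + out * OUTPUT) / 1_000_000
}

const withoutCache = auditCost(120_000, 0, 4_000)   // every input token at full price
const withCache = auditCost(20_000, 100_000, 4_000) // 100k input tokens served from cache
// withoutCache ≈ $0.42, withCache ≈ $0.15
```

Caching alone roughly cuts the bill by two thirds here; routing the simple turns to Haiku pushes the typical audit toward the bottom of the $0.08-0.15 range.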
Conclusion
The Strands TypeScript SDK provides a solid foundation for building production AI agents. Key takeaways:
- Tools are everything - Design domain-specific tools, not generic utilities
- Cache aggressively - Message caching saves 90% on repeated context
- Route by complexity - Not every request needs your best model
- Stream for UX - Users need to see progress during long operations
- Track costs - Log token usage and compare models in production
The SDK handles the agentic loop so you can focus on domain logic. For ai-secure.dev, that meant security analysis - not prompt engineering infrastructure.
Questions or building your own agent? Connect on LinkedIn.