A technical deep-dive into how I integrate AI capabilities across four production Go projects.
I use the regular Anthropic HTTP API (sometimes via SDK, sometimes custom client) - NOT the Claude Agent SDK. The Agent SDK is for building autonomous AI agents; my projects use AI as a feature within larger applications.
| Insight | Why It Matters |
|---|---|
| Custom HTTP > SDK when you need cost tracking | SDKs abstract away token counts. For billing features, you need raw access to usage data |
| Local models (Ollama) for high-volume tasks | Scraping 10,000 pages at $0.01/page = $100. With Ollama = $0 |
| Model selection should be user-facing | Haiku is 5x cheaper than Opus. Let users choose their cost/quality tradeoff |
| Always have fallbacks | AI fails. Have: expensive model → cheap model → local model → non-AI fallback |
| Async + polling for AI tasks | AI calls take 10-60+ seconds. Never block the UI |
| Task | Best Choice | Why |
|---|---|---|
| Complex reasoning, detailed instructions | Claude Opus 4.5 | Best instruction following, highest quality |
| Simple extraction, cost-sensitive | Claude Haiku 4.5 | 5x cheaper, fast, good enough for simple tasks |
| High-volume batch processing | Ollama (local) | $0/request, no rate limits |
| Image generation | Gemini | Claude doesn't generate images |
| Embeddings for RAG | OpenAI text-embedding-3-small | Best price/performance for vectors |
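For what it's worth, the table collapses to a small helper. The task labels here are made up for illustration; the model IDs are the same ones the Time Tracker client uses later in the post:

```go
package main

import "fmt"

// pickModel encodes the decision table above. Task labels are
// illustrative - real routing would key off request metadata.
func pickModel(task string) string {
	switch task {
	case "complex-reasoning":
		return "claude-opus-4-5-20251101" // best instruction following
	case "simple-extraction":
		return "claude-haiku-4-5-20251001" // 5x cheaper, good enough
	case "batch":
		return "ollama" // local, $0/request, no rate limits
	default:
		return "claude-sonnet-4-5-20250929" // balanced middle ground
	}
}

func main() {
	fmt.Println(pickModel("simple-extraction")) // claude-haiku-4-5-20251001
}
```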
| Project | AI Provider | Why This Choice |
|---|---|---|
| Time Tracker | Claude (custom HTTP) | Need per-request cost tracking for billing visibility |
| Gopher Guides | Claude SDK + OpenAI embeddings | Simpler integration, RAG needs vectors |
| Logan's 3D | Google Gemini | Need image generation (Claude can't do this) |
| Pyro.show | Ollama (local) | Scraping thousands of pages - API costs would be insane |
What it does: AI-powered time entry review and invoice auditing for freelance billing
This project tracks AI costs per user session and displays them in the UI. The official SDK abstracts away token counts, making it hard to:
- Calculate exact cost per API call
- Store token usage in the database
- Show users "this review cost $0.12"
Building a custom client gave me direct access to the usage field in every response.
// service/claude/client.go
//
// WHY CUSTOM: I need to track every token for cost visibility.
// The SDK doesn't expose usage data in a way that's easy to persist.
package claude
const (
apiURL = "https://api.anthropic.com/v1/messages"
anthropicVersion = "2023-06-01"
defaultMaxTokens = 8192
// Model IDs - Dec 2025
ModelHaiku45 = "claude-haiku-4-5-20251001"
ModelSonnet45 = "claude-sonnet-4-5-20250929"
ModelOpus45 = "claude-opus-4-5-20251101"
// Pricing per million tokens - hardcoded for cost calculation
haikuInputPricePerMTok = 1.0
haikuOutputPricePerMTok = 5.0
sonnetInputPricePerMTok = 3.0
sonnetOutputPricePerMTok = 15.0
opusInputPricePerMTok = 5.0
opusOutputPricePerMTok = 25.0
)
type Client struct {
apiKey string
httpClient *http.Client
model string
maxTokens int
}
// Usage is exposed directly - this is why I didn't use the SDK
type Usage struct {
InputTokens int `json:"input_tokens"`
OutputTokens int `json:"output_tokens"`
}
type Response struct {
ID string `json:"id"`
Content []ContentBlock `json:"content"`
Model string `json:"model"`
Usage Usage `json:"usage"` // <-- Direct access to this
}
func NewClientWithConfig(apiKey, model string, maxTokens int) *Client {
return &Client{
apiKey: apiKey,
httpClient: &http.Client{
Timeout: 120 * time.Second, // AI processing is slow
},
model: model,
maxTokens: maxTokens,
}
}
func (c *Client) SendMessage(ctx context.Context, system, userMessage string) (*Response, error) {
req := Request{
Model: c.model,
MaxTokens: c.maxTokens,
System: system,
Messages: []Message{{Role: "user", Content: userMessage}},
}
return c.send(ctx, req)
}
func (c *Client) send(ctx context.Context, req Request) (*Response, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return nil, fmt.Errorf("marshaling request: %w", err)
	}
	httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, apiURL, bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("building request: %w", err)
	}
	httpReq.Header.Set("Content-Type", "application/json")
	httpReq.Header.Set("x-api-key", c.apiKey)
	httpReq.Header.Set("anthropic-version", anthropicVersion)
	resp, err := c.httpClient.Do(httpReq)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("anthropic API returned status %d", resp.StatusCode)
	}
	var response Response
	if err := json.NewDecoder(resp.Body).Decode(&response); err != nil {
		return nil, fmt.Errorf("decoding response: %w", err)
	}
	return &response, nil
}

Every AI call gets logged with a cost breakdown:
// WHY: Users see "This review cost $0.08" in the UI.
// Also helps me understand which features are expensive.
func CalculateCostForModel(model string, inputTokens, outputTokens int) float64 {
	var inputPrice, outputPrice float64
	switch model {
	case ModelHaiku45:
		inputPrice, outputPrice = haikuInputPricePerMTok, haikuOutputPricePerMTok
	case ModelOpus45:
		inputPrice, outputPrice = opusInputPricePerMTok, opusOutputPricePerMTok
	default: // Sonnet
		inputPrice, outputPrice = sonnetInputPricePerMTok, sonnetOutputPricePerMTok
	}
	inputCost := (float64(inputTokens) / 1_000_000) * inputPrice
	outputCost := (float64(outputTokens) / 1_000_000) * outputPrice
	return inputCost + outputCost
}

-- WHY: Full audit trail of every AI interaction.
-- Can analyze: which clients use AI most, average cost per review, etc.
CREATE TABLE ai_review_sessions (
id TEXT PRIMARY KEY,
client_id TEXT,
user_id TEXT,
entity_type TEXT, -- 'time_entry' or 'invoice'
status TEXT, -- pending, processing, completed, failed
request_data JSON,
response_data JSON,
model_used TEXT,
input_tokens INTEGER,
output_tokens INTEGER,
cost_usd REAL,
created_at TIMESTAMP,
completed_at TIMESTAMP
);

What it does: Marketing content generation + MCP-based code review RAG system
Unlike Time Tracker, this project doesn't need per-request cost tracking. Marketing campaigns are infrequent (maybe 10/month), so cost visibility isn't critical. The SDK is simpler:
// pkg/claude/client.go
//
// WHY SDK: Marketing content generation is low-volume.
// Don't need granular cost tracking. SDK is simpler.
package claude
import (
"context"
"github.com/anthropics/anthropic-sdk-go"
"github.com/anthropics/anthropic-sdk-go/option"
)
type Client struct {
client anthropic.Client
httpClient *http.Client
}
func NewClient(apiKey string) *Client {
return &Client{
client: anthropic.NewClient(option.WithAPIKey(apiKey)),
httpClient: &http.Client{Timeout: 30 * time.Second},
}
}
func (c *Client) GenerateMarketingContent(ctx context.Context, url, direction string) (*MarketingContent, error) {
// Fetch the page content to use as context
pageContent, err := c.fetchPageContent(ctx, url)
if err != nil {
return nil, err
}
prompt := buildMarketingPrompt(url, pageContent, direction)
// SDK makes this clean - no manual JSON marshaling
message, err := c.client.Messages.New(ctx, anthropic.MessageNewParams{
Model: anthropic.ModelClaudeOpus4_5_20251101,
MaxTokens: 4096,
Messages: []anthropic.MessageParam{
anthropic.NewUserMessage(anthropic.NewTextBlock(prompt)),
},
})
if err != nil {
return nil, err
}
	var content MarketingContent
	if err := json.Unmarshal([]byte(message.Content[0].Text), &content); err != nil {
		return nil, fmt.Errorf("parsing marketing content: %w", err)
	}
	return &content, nil
}

OpenAI doesn't have an official Go SDK I liked. Plus, embeddings have tricky batching requirements that I wanted full control over:
// pkg/mcp/rag/embeddings.go
//
// WHY CUSTOM: OpenAI's batch limits are complex:
// - 8,191 tokens per input
// - ~300K tokens per batch (undocumented!)
// - Need intelligent batching to maximize throughput without hitting limits
type EmbeddingGenerator struct {
apiKey string
model string // "text-embedding-3-small"
batchSize int
client *http.Client
}
func (eg *EmbeddingGenerator) GenerateEmbeddings(chunks []Chunk) ([][]float32, error) {
	const maxTokensPerInput = 8191   // OpenAI's documented limit
	const maxTokensPerBatch = 250000 // Discovered through trial and error
	var allEmbeddings [][]float32
	var currentBatch []string
	var currentTokens int
	for i, chunk := range chunks {
		chunkTokens, ok := chunk.Metadata["tokens"].(int)
		if !ok {
			return nil, fmt.Errorf("chunk %d is missing a token count", i)
		}
		// Validate - fail fast if a chunk is too big
		if chunkTokens > maxTokensPerInput {
			return nil, fmt.Errorf("chunk %d exceeds limit: %d tokens", i, chunkTokens)
		}
		// Smart batching - send when we'd exceed the limit
		if currentTokens+chunkTokens > maxTokensPerBatch {
			embeddings, err := eg.callOpenAI(currentBatch)
			if err != nil {
				return nil, fmt.Errorf("embedding batch: %w", err)
			}
			allEmbeddings = append(allEmbeddings, embeddings...)
			currentBatch = nil
			currentTokens = 0
			time.Sleep(100 * time.Millisecond) // Respect rate limits
		}
		currentBatch = append(currentBatch, chunk.Content)
		currentTokens += chunkTokens
	}
	// Don't forget the final batch
	if len(currentBatch) > 0 {
		embeddings, err := eg.callOpenAI(currentBatch)
		if err != nil {
			return nil, fmt.Errorf("embedding final batch: %w", err)
		}
		allEmbeddings = append(allEmbeddings, embeddings...)
	}
	return allEmbeddings, nil
}
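To sanity-check the grouping above, the same greedy policy can be isolated into a pure function (simplified types; the 250K budget is the trial-and-error number from the comment):

```go
package main

import "fmt"

// batchCounts greedily groups chunk token counts so no batch
// exceeds maxPerBatch - the same policy GenerateEmbeddings uses.
func batchCounts(tokens []int, maxPerBatch int) [][]int {
	var batches [][]int
	var current []int
	sum := 0
	for _, t := range tokens {
		// Flush the current batch before this chunk would overflow it.
		if sum+t > maxPerBatch && len(current) > 0 {
			batches = append(batches, current)
			current, sum = nil, 0
		}
		current = append(current, t)
		sum += t
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	// Three 100K-token chunks against a 250K budget -> 2 batches.
	fmt.Println(len(batchCounts([]int{100_000, 100_000, 100_000}, 250_000))) // 2
}
```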
// Retry with exponential backoff - OpenAI has aggressive rate limits
func (eg *EmbeddingGenerator) callOpenAI(texts []string) ([][]float32, error) {
	const maxRetries = 3
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		embeddings, err := eg.callOpenAIOnce(texts)
		if err == nil {
			return embeddings, nil
		}
		if !isRetryableError(err) {
			return nil, err // Don't retry client errors
		}
		lastErr = err
		// Exponential backoff with jitter: 1s, 2s, 4s, 8s
		delay := time.Duration(1<<attempt) * time.Second
		jitter := time.Duration(rand.Float64() * float64(delay) * 0.1)
		time.Sleep(delay + jitter)
	}
	return nil, fmt.Errorf("failed after %d retries: %w", maxRetries, lastErr)
}

This exposes Gopher Guides training content as tools that Claude Code can use:
// pkg/mcp/server/server.go
//
// WHY MCP: Lets Claude Code query our training materials.
// "audit_code" tool checks code against our best practices.
import "github.com/mark3labs/mcp-go/mcp"
func (s *Server) RegisterTools(mcpServer *mcp.Server) {
mcpServer.AddTool(mcp.Tool{
Name: "audit_code",
Description: "Audit Go code against Gopher Guides best practices",
InputSchema: mcp.ToolInputSchema{
Type: "object",
Properties: map[string]interface{}{
"code": map[string]string{"type": "string", "description": "Go code to audit"},
"focus": map[string]string{"type": "string", "description": "Focus area"},
},
Required: []string{"code"},
},
}, s.handleAuditCode)
}
func (s *Server) handleAuditCode(args map[string]interface{}) (interface{}, error) {
	code, ok := args["code"].(string)
	if !ok || code == "" {
		return nil, fmt.Errorf("missing required argument: code")
	}
	// Generate embedding for semantic search
	embedding, err := s.search.eg.GenerateEmbedding(code)
	if err != nil {
		return nil, fmt.Errorf("embedding code: %w", err)
	}
	// Find relevant training content via vector similarity
	results, err := s.search.Search(embedding, 10)
	if err != nil {
		return nil, fmt.Errorf("vector search: %w", err)
	}
	// Re-rank to prioritize prescriptive content (DO/DON'T) over examples
	ranked := s.rerank(results)
	return formatAuditResults(ranked), nil
}

What it does: E-commerce site for 3D-printed products with AI-generated OG images
Claude can't generate images. For product preview images, I needed a model that can create visuals. Gemini's multimodal capabilities let me:
- Send product photos as input
- Get a composed marketing image as output
// internal/ogimage/ai_generator.go
//
// WHY GEMINI: Claude doesn't do image generation.
// Gemini can take product photos and create composed marketing images.
type AIGenerator struct {
apiKey string
httpClient *http.Client
}
const (
// Primary model - best quality
primaryModel = "gemini-3-pro-image-preview"
// Fallback - faster, cheaper, more reliable
fallbackModel = "gemini-2.5-flash-preview-05-20"
)
func (g *AIGenerator) GenerateMultiVariantOGImage(
product Product,
images [][]byte,
) ([]byte, string, error) {
// WHY FALLBACK: AI services fail. Always have a backup.
// Try best model first, fall back if it fails.
	result, err := g.generateWithModel(primaryModel, product, images)
	if err == nil {
		return result, primaryModel, nil
	}
	result, err = g.generateWithModel(fallbackModel, product, images)
	if err == nil {
		return result, fallbackModel, nil
	}
	return nil, "", fmt.Errorf("all models failed, last error: %w", err)
}
func (g *AIGenerator) generateWithModel(model string, product Product, images [][]byte) ([]byte, error) {
url := fmt.Sprintf(
"https://generativelanguage.googleapis.com/v1beta/models/%s:generateContent?key=%s",
model, g.apiKey,
)
// Multimodal request - send images + text prompt
parts := []map[string]interface{}{}
// Add product images as base64
for _, img := range images {
parts = append(parts, map[string]interface{}{
"inline_data": map[string]interface{}{
"mime_type": "image/jpeg",
"data": base64.StdEncoding.EncodeToString(img),
},
})
}
// Add the generation prompt
parts = append(parts, map[string]interface{}{
"text": buildImagePrompt(product),
})
request := map[string]interface{}{
"contents": []map[string]interface{}{
{"parts": parts},
},
"generationConfig": map[string]interface{}{
"responseModalities": []string{"TEXT", "IMAGE"},
"responseMimeType": "image/png",
},
}
// Make API call...
}

Gemini has rate limits. A semaphore prevents hammering the API:
// internal/jobs/og_image_refresh.go
//
// WHY SEMAPHORE: Gemini rate limits are aggressive.
// Without this, we'd get 429s constantly during batch refreshes.
import "golang.org/x/sync/semaphore"
const MaxConcurrentOGGenerations = 5
func (r *OGImageRefresher) RefreshAllProducts(ctx context.Context) error {
	sem := semaphore.NewWeighted(MaxConcurrentOGGenerations)
	products, err := r.storage.GetAllProducts(ctx)
	if err != nil {
		return fmt.Errorf("loading products: %w", err)
	}
	var wg sync.WaitGroup
	for _, product := range products {
		// Skip recent images - no need to regenerate
		if time.Since(product.OGImageUpdatedAt) < 7*24*time.Hour {
			continue
		}
		wg.Add(1)
		go func(p Product) {
			defer wg.Done()
			// Block if 5 requests already in flight
			if err := sem.Acquire(ctx, 1); err != nil {
				return // context canceled - stop quietly
			}
			defer sem.Release(1)
			r.regenerateOGImage(ctx, p)
		}(product)
	}
	wg.Wait()
	return nil
}

What it does: Scrapes fireworks retailer websites to build a product database
Cost. Scraping means thousands of pages. At $0.01/page with Claude, scraping 10,000 pages = $100. With Ollama running locally = $0.
Trade-off: Lower quality extraction, but good enough for structured data.
// internal/scraper/browser/ollama.go
//
// WHY OLLAMA: Scraping = high volume.
// 10,000 pages × $0.01/page = $100 with Claude
// 10,000 pages × $0/page = $0 with Ollama
// Quality is "good enough" for structured extraction.
type OllamaClient struct {
baseURL string // default: http://localhost:11434
model string
httpClient *http.Client
}
type OllamaRequest struct {
Model string `json:"model"`
Prompt string `json:"prompt"`
Stream bool `json:"stream"`
}
type OllamaResponse struct {
Model string `json:"model"`
Response string `json:"response"`
Done bool `json:"done"`
}
func NewOllamaClient(baseURL, model string) *OllamaClient {
return &OllamaClient{
baseURL: baseURL,
model: model,
httpClient: &http.Client{
Timeout: 120 * time.Second,
},
}
}
func (c *OllamaClient) GenerateText(ctx context.Context, prompt string) (string, error) {
	req := OllamaRequest{
		Model:  c.model,
		Prompt: prompt,
		Stream: false, // Get complete response at once
	}
	reqBody, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost,
		c.baseURL+"/api/generate", bytes.NewReader(reqBody))
	if err != nil {
		return "", err
	}
	httpReq.Header.Set("Content-Type", "application/json")
	resp, err := c.httpClient.Do(httpReq)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("ollama returned status %d", resp.StatusCode)
	}
	var ollamaResp OllamaResponse
	if err := json.NewDecoder(resp.Body).Decode(&ollamaResp); err != nil {
		return "", err
	}
	return ollamaResp.Response, nil
}
// Health check - for graceful degradation
func (c *OllamaClient) IsAvailable(ctx context.Context) bool {
req, _ := http.NewRequestWithContext(ctx, "GET", c.baseURL+"/api/tags", nil)
resp, err := c.httpClient.Do(req)
if err != nil {
return false
}
defer resp.Body.Close()
return resp.StatusCode == http.StatusOK
}

Local models are less consistent than Claude. They sometimes return strings instead of arrays, wrap JSON in markdown, or add explanations. This type normalizes the chaos:
// WHY: Local models are inconsistent.
// Sometimes: "effects": "red stars"
// Sometimes: "effects": ["red stars", "blue peony"]
// This type handles both.
type FlexibleStringArray []string
func (f *FlexibleStringArray) UnmarshalJSON(data []byte) error {
// Try array first
var arr []string
if err := json.Unmarshal(data, &arr); err == nil {
*f = arr
return nil
}
// Fall back to string
var s string
if err := json.Unmarshal(data, &s); err == nil {
if s != "" {
*f = []string{s}
} else {
*f = []string{}
}
return nil
}
// Handle null
if string(data) == "null" {
*f = []string{}
return nil
}
return fmt.Errorf("expected string or array, got %s", string(data))
}

Different Ollama models excel at different things:
// WHY DIFFERENT MODELS: No single model is best at everything.
// - llama3: General purpose, good instruction following
// - qwen2.5-coder:7b: Technical content, understands HTML/code
// - nuextract: Specialized for pulling structured data from text
// General scraping
func (e *Executor) EnableJSScraper() error {
e.ollama = browser.NewOllamaClient("http://localhost:11434", "llama3")
return nil
}
// Structured extraction
func DefaultConfig() GenericScraperConfig {
return GenericScraperConfig{
OllamaURL: "http://localhost:11434",
OllamaModel: "nuextract", // Best for JSON extraction
}
}
// Technical/code content
func NewAgent() *Agent {
return &Agent{
model: "ollama:qwen2.5-coder:7b", // Understands HTML structure
}
}

When Ollama fails or isn't running, fall back to pattern-based extraction:
// WHY FALLBACK: Ollama might not be running, or might fail.
// Pattern-based extraction is lower quality but always works.
func (e *Executor) scrapeProduct(ctx context.Context, url string) (*Product, error) {
// Try AI first
if e.ollama != nil && e.ollama.IsAvailable(ctx) {
product, err := e.scrapeWithOllama(ctx, url)
if err == nil {
return product, nil
}
slog.Warn("Ollama failed, falling back", "error", err)
}
// Fall back to regex/CSS selector patterns
return e.scrapeWithPatterns(ctx, url)
}

# Install
curl -fsSL https://ollama.com/install.sh | sh
# Pull models
ollama pull llama3
ollama pull qwen2.5-coder:7b
ollama pull nuextract
# Runs on http://localhost:11434 - no API key needed

All LLMs sometimes wrap JSON in markdown or add explanations:
// Every project needs this. LLMs love adding ```json blocks.
func extractJSON(text string) string {
	// Try a fenced ```json ... ``` block first
	if idx := strings.Index(text, "```json"); idx != -1 {
		start := idx + len("```json")
		if endIdx := strings.Index(text[start:], "```"); endIdx != -1 {
			return strings.TrimSpace(text[start : start+endIdx])
		}
	}
	// Otherwise assume the trimmed text is raw JSON and let the
	// caller's json.Unmarshal report anything that isn't.
	return strings.TrimSpace(text)
}

Let users choose their cost/quality tradeoff:
// Users can pick: fast+cheap (Haiku) vs slow+expensive (Opus)
type ReviewRequest struct {
ClientID string `json:"client_id"`
Model string `json:"model"` // "haiku", "sonnet", "opus"
}
func (s *Service) StartReview(req ReviewRequest) error {
var model string
switch req.Model {
case "haiku":
model = claude.ModelHaiku45 // $1/$5 per MTok
case "sonnet":
model = claude.ModelSonnet45 // $3/$15 per MTok
default:
model = claude.ModelOpus45 // $5/$25 per MTok
}
client := claude.NewClientWithConfig(apiKey, model, 8192)
// ...
}

AI is slow. Never block the UI:
// Start task, return immediately, poll for status
func (h *Handler) StartReview(c echo.Context) error {
sessionID := uuid.New().String()
h.db.CreateSession(sessionID, "pending")
// Process in background
go func() {
h.db.UpdateSession(sessionID, "processing")
result, err := h.processWithAI(sessionID)
if err != nil {
h.db.UpdateSession(sessionID, "failed")
return
}
h.db.SaveResult(sessionID, result)
h.db.UpdateSession(sessionID, "completed")
}()
return c.JSON(http.StatusAccepted, map[string]string{
"session_id": sessionID,
})
}
// Frontend polls this every 2 seconds
func (h *Handler) GetStatus(c echo.Context) error {
	sessionID := c.Param("id")
	session, err := h.db.GetSession(sessionID)
	if err != nil {
		return c.JSON(http.StatusNotFound, map[string]string{"error": "session not found"})
	}
	return c.JSON(http.StatusOK, session)
}

// go.mod entries across projects
// Official Anthropic SDK (when cost tracking isn't needed)
require github.com/anthropics/anthropic-sdk-go v1.19.0
// MCP Protocol for tool integration
require github.com/mark3labs/mcp-go v0.41.1
// pgvector for embeddings storage
require github.com/pgvector/pgvector-go v0.2.2
// Concurrency control
require golang.org/x/sync v0.10.0
// Ollama (local models) - uses direct HTTP, but SDK available:
require github.com/ollama/ollama v0.11.8