@corylanou
Last active January 3, 2026 16:47

AI/LLM Usage in My Go Projects

A technical deep-dive into how I integrate AI capabilities across four production Go projects.


Executive Summary

The Short Answer

I use the regular Anthropic HTTP API (sometimes via SDK, sometimes custom client) - NOT the Claude Agent SDK. The Agent SDK is for building autonomous AI agents; my projects use AI as a feature within larger applications.

Key Insights

| Insight | Why It Matters |
| --- | --- |
| Custom HTTP > SDK when you need cost tracking | SDKs abstract away token counts. For billing features, you need raw access to usage data |
| Local models (Ollama) for high-volume tasks | Scraping 10,000 pages at $0.01/page = $100. With Ollama = $0 |
| Model selection should be user-facing | Haiku is 5x cheaper than Opus. Let users choose their cost/quality tradeoff |
| Always have fallbacks | AI fails. Have: expensive model → cheap model → local model → non-AI fallback |
| Async + polling for AI tasks | AI calls take 10-60+ seconds. Never block the UI |

Right Tool for the Job

| Task | Best Choice | Why |
| --- | --- | --- |
| Complex reasoning, detailed instructions | Claude Opus 4.5 | Best instruction following, highest quality |
| Simple extraction, cost-sensitive | Claude Haiku 4.5 | 5x cheaper, fast, good enough for simple tasks |
| High-volume batch processing | Ollama (local) | $0/request, no rate limits |
| Image generation | Gemini | Claude doesn't generate images |
| Embeddings for RAG | OpenAI text-embedding-3-small | Best price/performance for vectors |

Project Overview

| Project | AI Provider | Why This Choice |
| --- | --- | --- |
| Time Tracker | Claude (custom HTTP) | Need per-request cost tracking for billing visibility |
| Gopher Guides | Claude SDK + OpenAI embeddings | Simpler integration, RAG needs vectors |
| Logan's 3D | Google Gemini | Need image generation (Claude can't do this) |
| Pyro.show | Ollama (local) | Scraping thousands of pages - API costs would be insane |

Project 1: Time Tracker

What it does: AI-powered time entry review and invoice auditing for freelance billing

Why Custom HTTP Client (No SDK)?

This project tracks AI costs per user session and displays them in the UI. The official SDK abstracts away token counts, making it hard to:

  • Calculate exact cost per API call
  • Store token usage in the database
  • Show users "this review cost $0.12"

Building a custom client gave me direct access to the usage field in every response.

// service/claude/client.go
//
// WHY CUSTOM: I need to track every token for cost visibility.
// The SDK doesn't expose usage data in a way that's easy to persist.

package claude

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "net/http"
    "time"
)

const (
    apiURL           = "https://api.anthropic.com/v1/messages"
    anthropicVersion = "2023-06-01"
    defaultMaxTokens = 8192

    // Model IDs - Dec 2025
    ModelHaiku45  = "claude-haiku-4-5-20251001"
    ModelSonnet45 = "claude-sonnet-4-5-20250929"
    ModelOpus45   = "claude-opus-4-5-20251101"

    // Pricing per million tokens - hardcoded for cost calculation
    haikuInputPricePerMTok   = 1.0
    haikuOutputPricePerMTok  = 5.0
    sonnetInputPricePerMTok  = 3.0
    sonnetOutputPricePerMTok = 15.0
    opusInputPricePerMTok    = 5.0
    opusOutputPricePerMTok   = 25.0
)

type Client struct {
    apiKey     string
    httpClient *http.Client
    model      string
    maxTokens  int
}

// Request mirrors the Messages API payload this client sends.
type Request struct {
    Model     string    `json:"model"`
    MaxTokens int       `json:"max_tokens"`
    System    string    `json:"system,omitempty"`
    Messages  []Message `json:"messages"`
}

type Message struct {
    Role    string `json:"role"`
    Content string `json:"content"`
}

type ContentBlock struct {
    Type string `json:"type"`
    Text string `json:"text"`
}

// Usage is exposed directly - this is why I didn't use the SDK
type Usage struct {
    InputTokens  int `json:"input_tokens"`
    OutputTokens int `json:"output_tokens"`
}

type Response struct {
    ID      string         `json:"id"`
    Content []ContentBlock `json:"content"`
    Model   string         `json:"model"`
    Usage   Usage          `json:"usage"`  // <-- Direct access to this
}

func NewClientWithConfig(apiKey, model string, maxTokens int) *Client {
    return &Client{
        apiKey: apiKey,
        httpClient: &http.Client{
            Timeout: 120 * time.Second, // AI processing is slow
        },
        model:     model,
        maxTokens: maxTokens,
    }
}

func (c *Client) SendMessage(ctx context.Context, system, userMessage string) (*Response, error) {
    req := Request{
        Model:     c.model,
        MaxTokens: c.maxTokens,
        System:    system,
        Messages:  []Message{{Role: "user", Content: userMessage}},
    }
    return c.send(ctx, req)
}

func (c *Client) send(ctx context.Context, req Request) (*Response, error) {
    body, err := json.Marshal(req)
    if err != nil {
        return nil, err
    }

    httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, apiURL, bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    httpReq.Header.Set("Content-Type", "application/json")
    httpReq.Header.Set("x-api-key", c.apiKey)
    httpReq.Header.Set("anthropic-version", anthropicVersion)

    resp, err := c.httpClient.Do(httpReq)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("anthropic API returned %s", resp.Status)
    }

    var response Response
    if err := json.NewDecoder(resp.Body).Decode(&response); err != nil {
        return nil, err
    }
    return &response, nil
}

Cost Tracking

Every AI call gets logged with cost breakdown:

// WHY: Users see "This review cost $0.08" in the UI.
// Also helps me understand which features are expensive.

func CalculateCostForModel(model string, inputTokens, outputTokens int) float64 {
    var inputPrice, outputPrice float64

    switch model {
    case ModelHaiku45:
        inputPrice, outputPrice = haikuInputPricePerMTok, haikuOutputPricePerMTok
    case ModelOpus45:
        inputPrice, outputPrice = opusInputPricePerMTok, opusOutputPricePerMTok
    default: // Sonnet
        inputPrice, outputPrice = sonnetInputPricePerMTok, sonnetOutputPricePerMTok
    }

    inputCost := (float64(inputTokens) / 1_000_000) * inputPrice
    outputCost := (float64(outputTokens) / 1_000_000) * outputPrice
    return inputCost + outputCost
}

Database Schema

-- WHY: Full audit trail of every AI interaction.
-- Can analyze: which clients use AI most, average cost per review, etc.

CREATE TABLE ai_review_sessions (
    id TEXT PRIMARY KEY,
    client_id TEXT,
    user_id TEXT,
    entity_type TEXT,  -- 'time_entry' or 'invoice'
    status TEXT,       -- pending, processing, completed, failed
    request_data JSON,
    response_data JSON,
    model_used TEXT,
    input_tokens INTEGER,
    output_tokens INTEGER,
    cost_usd REAL,
    created_at TIMESTAMP,
    completed_at TIMESTAMP
);

Project 2: Gopher Guides Corp

What it does: Marketing content generation + MCP-based code review RAG system

Why the Official Anthropic SDK Here?

Unlike Time Tracker, this project doesn't need per-request cost tracking. Marketing campaigns are infrequent (maybe 10/month), so cost visibility isn't critical. The SDK is simpler:

// pkg/claude/client.go
//
// WHY SDK: Marketing content generation is low-volume.
// Don't need granular cost tracking. SDK is simpler.

package claude

import (
    "context"
    "encoding/json"
    "fmt"
    "net/http"
    "time"

    "github.com/anthropics/anthropic-sdk-go"
    "github.com/anthropics/anthropic-sdk-go/option"
)

type Client struct {
    client     anthropic.Client
    httpClient *http.Client
}

func NewClient(apiKey string) *Client {
    return &Client{
        client: anthropic.NewClient(option.WithAPIKey(apiKey)),
        httpClient: &http.Client{Timeout: 30 * time.Second},
    }
}

func (c *Client) GenerateMarketingContent(ctx context.Context, url, direction string) (*MarketingContent, error) {
    // Fetch the page content to use as context
    pageContent, err := c.fetchPageContent(ctx, url)
    if err != nil {
        return nil, err
    }

    prompt := buildMarketingPrompt(url, pageContent, direction)

    // SDK makes this clean - no manual JSON marshaling
    message, err := c.client.Messages.New(ctx, anthropic.MessageNewParams{
        Model:     anthropic.ModelClaudeOpus4_5_20251101,
        MaxTokens: 4096,
        Messages: []anthropic.MessageParam{
            anthropic.NewUserMessage(anthropic.NewTextBlock(prompt)),
        },
    })
    if err != nil {
        return nil, err
    }

    if len(message.Content) == 0 {
        return nil, fmt.Errorf("empty response from model")
    }

    var content MarketingContent
    if err := json.Unmarshal([]byte(message.Content[0].Text), &content); err != nil {
        return nil, fmt.Errorf("parsing marketing content: %w", err)
    }
    return &content, nil
}

Why Custom HTTP for OpenAI Embeddings?

I didn't find a Go SDK for OpenAI that I liked. Plus, embeddings have tricky batching requirements that I wanted full control over:

// pkg/mcp/rag/embeddings.go
//
// WHY CUSTOM: OpenAI's batch limits are complex:
// - 8,191 tokens per input
// - ~300K tokens per batch (undocumented!)
// - Need intelligent batching to maximize throughput without hitting limits

type EmbeddingGenerator struct {
    apiKey    string
    model     string  // "text-embedding-3-small"
    batchSize int
    client    *http.Client
}

func (eg *EmbeddingGenerator) GenerateEmbeddings(chunks []Chunk) ([][]float32, error) {
    const maxTokensPerInput = 8191   // OpenAI's documented limit
    const maxTokensPerBatch = 250000 // Discovered through trial and error

    var allEmbeddings [][]float32
    var currentBatch []string
    var currentTokens int

    for i, chunk := range chunks {
        chunkTokens := chunk.Metadata["tokens"].(int)

        // Validate - fail fast if a chunk is too big
        if chunkTokens > maxTokensPerInput {
            return nil, fmt.Errorf("chunk %d exceeds limit: %d tokens", i, chunkTokens)
        }

        // Smart batching - send when we'd exceed the limit
        if currentTokens+chunkTokens > maxTokensPerBatch {
            embeddings, err := eg.callOpenAI(currentBatch)
            if err != nil {
                return nil, err
            }
            allEmbeddings = append(allEmbeddings, embeddings...)
            currentBatch = nil
            currentTokens = 0
            time.Sleep(100 * time.Millisecond) // Respect rate limits
        }

        currentBatch = append(currentBatch, chunk.Content)
        currentTokens += chunkTokens
    }

    // Don't forget the final batch
    if len(currentBatch) > 0 {
        embeddings, err := eg.callOpenAI(currentBatch)
        if err != nil {
            return nil, err
        }
        allEmbeddings = append(allEmbeddings, embeddings...)
    }

    return allEmbeddings, nil
}

// Retry with exponential backoff - OpenAI has aggressive rate limits
func (eg *EmbeddingGenerator) callOpenAI(texts []string) ([][]float32, error) {
    const maxRetries = 3

    for attempt := 0; attempt <= maxRetries; attempt++ {
        embeddings, err := eg.callOpenAIOnce(texts)
        if err == nil {
            return embeddings, nil
        }

        if !isRetryableError(err) {
            return nil, err  // Don't retry client errors
        }

        // Exponential backoff: 1s, 2s, 4s, ... plus up to 10% jitter
        delay := time.Duration(1<<attempt) * time.Second
        jitter := time.Duration(rand.Float64() * float64(delay) * 0.1)
        time.Sleep(delay + jitter)
    }

    return nil, fmt.Errorf("failed after %d retries", maxRetries)
}

MCP Server for Code Review

This exposes Gopher Guides training content as tools that Claude Code can use:

// pkg/mcp/server/server.go
//
// WHY MCP: Lets Claude Code query our training materials.
// "audit_code" tool checks code against our best practices.

import "github.com/mark3labs/mcp-go/mcp"

func (s *Server) RegisterTools(mcpServer *mcp.Server) {
    mcpServer.AddTool(mcp.Tool{
        Name:        "audit_code",
        Description: "Audit Go code against Gopher Guides best practices",
        InputSchema: mcp.ToolInputSchema{
            Type: "object",
            Properties: map[string]interface{}{
                "code":  map[string]string{"type": "string", "description": "Go code to audit"},
                "focus": map[string]string{"type": "string", "description": "Focus area"},
            },
            Required: []string{"code"},
        },
    }, s.handleAuditCode)
}

func (s *Server) handleAuditCode(args map[string]interface{}) (interface{}, error) {
    code, ok := args["code"].(string)
    if !ok {
        return nil, fmt.Errorf("'code' argument must be a string")
    }

    // Generate embedding for semantic search
    embedding, err := s.search.eg.GenerateEmbedding(code)
    if err != nil {
        return nil, err
    }

    // Find relevant training content via vector similarity
    results, err := s.search.Search(embedding, 10)
    if err != nil {
        return nil, err
    }

    // Re-rank to prioritize prescriptive content (DO/DON'T) over examples
    ranked := s.rerank(results)

    return formatAuditResults(ranked), nil
}

Project 3: Logan's 3D Creations

What it does: E-commerce site for 3D printed products with AI-generated OG images

Why Google Gemini (Not Claude)?

Claude can't generate images. For product preview images, I needed a model that can create visuals. Gemini's multimodal capabilities let me:

  • Send product photos as input
  • Get a composed marketing image as output

// internal/ogimage/ai_generator.go
//
// WHY GEMINI: Claude doesn't do image generation.
// Gemini can take product photos and create composed marketing images.

type AIGenerator struct {
    apiKey     string
    httpClient *http.Client
}

const (
    // Primary model - best quality
    primaryModel  = "gemini-3-pro-image-preview"
    // Fallback - faster, cheaper, more reliable
    fallbackModel = "gemini-2.5-flash-preview-05-20"
)

func (g *AIGenerator) GenerateMultiVariantOGImage(
    product Product,
    images [][]byte,
) ([]byte, string, error) {
    // WHY FALLBACK: AI services fail. Always have a backup.
    // Try best model first, fall back if it fails.

    result, err := g.generateWithModel(primaryModel, product, images)
    if err == nil {
        return result, primaryModel, nil
    }

    result, err = g.generateWithModel(fallbackModel, product, images)
    if err == nil {
        return result, fallbackModel, nil
    }

    return nil, "", fmt.Errorf("all models failed: %w", err)
}

func (g *AIGenerator) generateWithModel(model string, product Product, images [][]byte) ([]byte, error) {
    url := fmt.Sprintf(
        "https://generativelanguage.googleapis.com/v1beta/models/%s:generateContent?key=%s",
        model, g.apiKey,
    )

    // Multimodal request - send images + text prompt
    parts := []map[string]interface{}{}

    // Add product images as base64
    for _, img := range images {
        parts = append(parts, map[string]interface{}{
            "inline_data": map[string]interface{}{
                "mime_type": "image/jpeg",
                "data":      base64.StdEncoding.EncodeToString(img),
            },
        })
    }

    // Add the generation prompt
    parts = append(parts, map[string]interface{}{
        "text": buildImagePrompt(product),
    })

    request := map[string]interface{}{
        "contents": []map[string]interface{}{
            {"parts": parts},
        },
        "generationConfig": map[string]interface{}{
            "responseModalities": []string{"TEXT", "IMAGE"},
            "responseMimeType":   "image/png",
        },
    }

    // Make API call...
}

Concurrency Control

Gemini has rate limits. Semaphore prevents hammering the API:

// internal/jobs/og_image_refresh.go
//
// WHY SEMAPHORE: Gemini rate limits are aggressive.
// Without this, we'd get 429s constantly during batch refreshes.

import "golang.org/x/sync/semaphore"

const MaxConcurrentOGGenerations = 5

func (r *OGImageRefresher) RefreshAllProducts(ctx context.Context) error {
    sem := semaphore.NewWeighted(MaxConcurrentOGGenerations)

    products, err := r.storage.GetAllProducts(ctx)
    if err != nil {
        return err
    }

    var wg sync.WaitGroup
    for _, product := range products {
        // Skip recent images - no need to regenerate
        if time.Since(product.OGImageUpdatedAt) < 7*24*time.Hour {
            continue
        }

        wg.Add(1)
        go func(p Product) {
            defer wg.Done()

            // Block if 5 requests already in flight;
            // Acquire only fails if ctx is cancelled
            if err := sem.Acquire(ctx, 1); err != nil {
                return
            }
            defer sem.Release(1)

            r.regenerateOGImage(ctx, p)
        }(product)
    }

    wg.Wait()
    return nil
}

Project 4: Pyro.show (Fireworks Product Scraper)

What it does: Scrapes fireworks retailer websites to build a product database

Why Local Ollama (Not Cloud APIs)?

Cost. Scraping means thousands of pages. At $0.01/page with Claude, scraping 10,000 pages = $100. With Ollama running locally = $0.

Trade-off: Lower quality extraction, but good enough for structured data.

// internal/scraper/browser/ollama.go
//
// WHY OLLAMA: Scraping = high volume.
// 10,000 pages × $0.01/page = $100 with Claude
// 10,000 pages × $0/page = $0 with Ollama
// Quality is "good enough" for structured extraction.

type OllamaClient struct {
    baseURL    string  // default: http://localhost:11434
    model      string
    httpClient *http.Client
}

type OllamaRequest struct {
    Model  string `json:"model"`
    Prompt string `json:"prompt"`
    Stream bool   `json:"stream"`
}

type OllamaResponse struct {
    Model     string `json:"model"`
    Response  string `json:"response"`
    Done      bool   `json:"done"`
}

func NewOllamaClient(baseURL, model string) *OllamaClient {
    return &OllamaClient{
        baseURL: baseURL,
        model:   model,
        httpClient: &http.Client{
            Timeout: 120 * time.Second,
        },
    }
}

func (c *OllamaClient) GenerateText(ctx context.Context, prompt string) (string, error) {
    req := OllamaRequest{
        Model:  c.model,
        Prompt: prompt,
        Stream: false,  // Get complete response at once
    }

    reqBody, err := json.Marshal(req)
    if err != nil {
        return "", err
    }

    httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost,
        c.baseURL+"/api/generate", bytes.NewReader(reqBody))
    if err != nil {
        return "", err
    }
    httpReq.Header.Set("Content-Type", "application/json")

    resp, err := c.httpClient.Do(httpReq)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        return "", fmt.Errorf("ollama returned %s", resp.Status)
    }

    var ollamaResp OllamaResponse
    if err := json.NewDecoder(resp.Body).Decode(&ollamaResp); err != nil {
        return "", err
    }
    return ollamaResp.Response, nil
}

// Health check - for graceful degradation
func (c *OllamaClient) IsAvailable(ctx context.Context) bool {
    req, _ := http.NewRequestWithContext(ctx, "GET", c.baseURL+"/api/tags", nil)
    resp, err := c.httpClient.Do(req)
    if err != nil {
        return false
    }
    defer resp.Body.Close()
    return resp.StatusCode == http.StatusOK
}

Handling Inconsistent Local Model Output

Local models are less consistent than Claude. They sometimes return strings instead of arrays, wrap JSON in markdown, or add explanations. This type normalizes the chaos:

// WHY: Local models are inconsistent.
// Sometimes: "effects": "red stars"
// Sometimes: "effects": ["red stars", "blue peony"]
// This type handles both.

type FlexibleStringArray []string

func (f *FlexibleStringArray) UnmarshalJSON(data []byte) error {
    // Handle null first - json.Unmarshal would otherwise
    // silently accept it in the array branch below
    if string(data) == "null" {
        *f = []string{}
        return nil
    }

    // Try array first
    var arr []string
    if err := json.Unmarshal(data, &arr); err == nil {
        *f = arr
        return nil
    }

    // Fall back to string
    var s string
    if err := json.Unmarshal(data, &s); err == nil {
        if s != "" {
            *f = []string{s}
        } else {
            *f = []string{}
        }
        return nil
    }

    return fmt.Errorf("expected string or array, got %s", string(data))
}

Model Selection by Task

Different Ollama models excel at different things:

// WHY DIFFERENT MODELS: No single model is best at everything.
// - llama3: General purpose, good instruction following
// - qwen2.5-coder:7b: Technical content, understands HTML/code
// - nuextract: Specialized for pulling structured data from text

// General scraping
func (e *Executor) EnableJSScraper() error {
    e.ollama = browser.NewOllamaClient("http://localhost:11434", "llama3")
    return nil
}

// Structured extraction
func DefaultConfig() GenericScraperConfig {
    return GenericScraperConfig{
        OllamaURL:   "http://localhost:11434",
        OllamaModel: "nuextract",  // Best for JSON extraction
    }
}

// Technical/code content
func NewAgent() *Agent {
    return &Agent{
        model: "ollama:qwen2.5-coder:7b",  // Understands HTML structure
    }
}

Smart Fallback Chain

When Ollama fails or isn't running, fall back to pattern-based extraction:

// WHY FALLBACK: Ollama might not be running, or might fail.
// Pattern-based extraction is lower quality but always works.

func (e *Executor) scrapeProduct(ctx context.Context, url string) (*Product, error) {
    // Try AI first
    if e.ollama != nil && e.ollama.IsAvailable(ctx) {
        product, err := e.scrapeWithOllama(ctx, url)
        if err == nil {
            return product, nil
        }
        slog.Warn("Ollama failed, falling back", "error", err)
    }

    // Fall back to regex/CSS selector patterns
    return e.scrapeWithPatterns(ctx, url)
}

Running Ollama

# Install
curl -fsSL https://ollama.com/install.sh | sh

# Pull models
ollama pull llama3
ollama pull qwen2.5-coder:7b
ollama pull nuextract

# Runs on http://localhost:11434 - no API key needed

Common Patterns Across All Projects

1. JSON Extraction from LLM Responses

All LLMs sometimes wrap JSON in markdown or add explanations:

// Every project needs this. LLMs love adding ```json blocks.

func extractJSON(text string) string {
    // Try ```json ... ``` fences first
    if idx := strings.Index(text, "```json"); idx != -1 {
        start := idx + len("```json")
        if endIdx := strings.Index(text[start:], "```"); endIdx != -1 {
            return strings.TrimSpace(text[start : start+endIdx])
        }
    }

    // Otherwise return trimmed text - the length guard avoids
    // indexing into an empty string
    text = strings.TrimSpace(text)
    if len(text) >= 2 &&
        ((text[0] == '{' && text[len(text)-1] == '}') ||
            (text[0] == '[' && text[len(text)-1] == ']')) {
        return text
    }

    return text
}

2. User-Selectable Models

Let users choose their cost/quality tradeoff:

// Users can pick: fast+cheap (Haiku) vs slow+expensive (Opus)

type ReviewRequest struct {
    ClientID string `json:"client_id"`
    Model    string `json:"model"` // "haiku", "sonnet", "opus"
}

func (s *Service) StartReview(req ReviewRequest) error {
    var model string
    switch req.Model {
    case "haiku":
        model = claude.ModelHaiku45  // $1/$5 per MTok
    case "sonnet":
        model = claude.ModelSonnet45 // $3/$15 per MTok
    default:
        model = claude.ModelOpus45   // $5/$25 per MTok
    }

    client := claude.NewClientWithConfig(apiKey, model, 8192)
    // ...
}

3. Async Processing with Status Polling

AI is slow. Never block the UI:

// Start task, return immediately, poll for status

func (h *Handler) StartReview(c echo.Context) error {
    sessionID := uuid.New().String()
    h.db.CreateSession(sessionID, "pending")

    // Process in background
    go func() {
        h.db.UpdateSession(sessionID, "processing")
        result, err := h.processWithAI(sessionID)
        if err != nil {
            h.db.UpdateSession(sessionID, "failed")
            return
        }
        h.db.SaveResult(sessionID, result)
        h.db.UpdateSession(sessionID, "completed")
    }()

    return c.JSON(http.StatusAccepted, map[string]string{
        "session_id": sessionID,
    })
}

// Frontend polls this every 2 seconds
func (h *Handler) GetStatus(c echo.Context) error {
    sessionID := c.Param("id")
    session, err := h.db.GetSession(sessionID)
    if err != nil {
        return c.JSON(http.StatusNotFound, map[string]string{"error": "unknown session"})
    }
    return c.JSON(http.StatusOK, session)
}

Dependencies

// go.mod entries across projects

// Official Anthropic SDK (when cost tracking isn't needed)
require github.com/anthropics/anthropic-sdk-go v1.19.0

// MCP Protocol for tool integration
require github.com/mark3labs/mcp-go v0.41.1

// pgvector for embeddings storage
require github.com/pgvector/pgvector-go v0.2.2

// Concurrency control
require golang.org/x/sync v0.10.0

// Ollama (local models) - uses direct HTTP, but SDK available:
require github.com/ollama/ollama v0.11.8