| name | overview | todos | isProject | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MongoDB Query Agent |
Build a secure, company-scoped MongoDB query capability for Bizy using RubyLLM's native tool pattern, with defense-in-depth security architecture. |
|
false |
A secure MongoDB query capability for Bizy that allows company admins to explore their data through natural language. The LLM can formulate and execute MongoDB queries, with all results automatically scoped to the user's company.
| Goal | Why It Matters |
|---|---|
| Admin-only access | Only company admins can query raw data |
| Company scoping | All queries automatically filtered to user's company |
| Read-only | Zero ability to modify data, enforced at multiple layers |
| Flexible queries | Support find, aggregate, count operations |
| LLM-friendly | Documentation and discovery tools help LLM formulate queries |
This capability integrates with the Bizy Agent Framework as a specialized sub-agent:
┌─────────────────────────────────────────────────────────────┐
│ Bizy::Chat (Main Orchestrator) │
│ Core tools: GetOrgContext, GetRecognition, GetMilestones │
│ Agents: MeetingAgent, AnalyticsAgent, DataAgent │
│ │
│ User (admin): "How many bonuses were given last month │
│ grouped by department?" │
│ ↓ │
│ Bizy decides: This needs raw data → invoke DataAgent │
│ ↓ │
│ ┌─────────────────────────────────────────┐ │
│ │ DataAgent │ │
│ │ Tools: MongodbFind, MongodbAggregate │ │
│ │ Runs aggregation pipeline │ │
│ │ Returns: Grouped results │ │
│ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
sequenceDiagram
participant User
participant BizyChat as Bizy::Chat
participant DataAgent as DataAgent
participant Tool as MongodbFind/Aggregate
participant Scoper as QueryScoper
participant Validator as SecurityValidator
participant Executor as QueryExecutor
participant DB as MongoDB Secondary
User->>BizyChat: "Show me top recognizers"
BizyChat->>BizyChat: Check admin? Yes
BizyChat->>DataAgent: Delegate question
DataAgent->>Tool: execute(collection, pipeline)
Tool->>Scoper: scope_aggregate(collection, pipeline)
Scoper->>Validator: validate_pipeline!
Validator-->>Scoper: Valid
Scoper-->>Tool: Scoped pipeline
Tool->>Executor: aggregate(collection, pipeline)
Executor->>DB: Execute on secondary
DB-->>Executor: Results
Executor-->>Tool: Documents
Tool-->>DataAgent: Formatted results
DataAgent-->>BizyChat: Answer
BizyChat-->>User: "Here are your top recognizers..."
This approach balances flexibility with security:
- Admin-only access - Only company admins can use the DataAgent
- Allowlist collections - Only permit queries against pre-defined collections
- Auto-inject scoping filter - Based on collection type:
- Company-scoped: Inject
company_id = user.company_id - User-scoped: Validate
user_id IN (all company user IDs)
- Company-scoped: Inject
- Validate aggregation pipelines - Ensure $match is first stage, inject scoping filter
- Block dangerous operators - Disallow $out, $merge, and unrestricted $lookup
- Read-only - Only support find, aggregate, count operations
flowchart TD
subgraph auth [Authorization]
A[Request from Bizy::Chat] --> B[DataAgent.available?]
B --> C{Is user admin?}
C -->|No| D[Agent not available]
C -->|Yes| E[Get company from user]
end
subgraph scope [Scoping]
E --> F{Collection type?}
F -->|company_id field| G["Inject: company_id = X"]
F -->|user_id field| H[Validate all user_ids in company]
H --> I["Keep filter if valid"]
end
G --> J[Execute query]
I --> J
Company-scoped collections (most collections):
- Have a
company_idfield - Scoper injects:
{ company_id: company.id } - Any user-provided
company_idis overwritten (prevents injection)
User-scoped collections (e.g., app_notifications, llm_runs):
- Have a
user_idfield but nocompany_id - Caller must specify
user_idor{ user_id: { $in: [...] } } - Scoper validates ALL specified user_ids belong to the company
- Rejects entire query if any user_id is outside the company
CRITICAL: The query capability must have ZERO ability to modify data. This is enforced at multiple layers:
Create a dedicated MongoDB user with read-only permissions:
// MongoDB shell command to create read-only user
db.createUser({
user: "bizy_readonly",
pwd: "<secure-password>",
roles: [
{ role: "read", db: "bonusly_production" }
]
})Configure a separate Mongoid client with secondary-only reads:
# config/mongoid.yml
production:
clients:
default:
# ... normal read/write connection
bizy_readonly:
uri: <%= ENV['MONGODB_BIZY_READONLY_URI'] %>
options:
read:
mode: :secondary # NOT secondary_preferred - ONLY secondary replicas
max_pool_size: 5
server_selection_timeout: 10 # Fail fast if no secondary availableImportant: Use :secondary mode, NOT :secondary_preferred. This ensures:
- Queries NEVER touch the primary database
- If no secondary is available, the query fails (rather than falling back to primary)
- Production primary is completely isolated from query load
Why this is the strongest layer: Even if application code has bugs, MongoDB itself will reject any write operations with a permission error.
Only expose read operations - no insert, update, delete, or bulk write methods:
# lib/bizy/mongodb_query_executor.rb
module Bizy
class MongodbQueryExecutor
MAX_LIMIT = 100
DEFAULT_LIMIT = 20
QUERY_TIMEOUT_MS = 30_000
def initialize
@client = Mongoid.client(:bizy_readonly)
end
# The ONLY three public methods - no write operations exist
def find(collection:, filter:, projection: nil, sort: nil, limit: DEFAULT_LIMIT)
limit = [limit.to_i, MAX_LIMIT].min
cursor = @client[collection]
.find(filter)
.projection(projection || {})
.sort(sort || { _id: -1 })
.limit(limit)
.max_time_ms(QUERY_TIMEOUT_MS)
cursor.to_a
end
def aggregate(collection:, pipeline:)
@client[collection].aggregate(
pipeline,
max_time_ms: QUERY_TIMEOUT_MS
).to_a
end
def count(collection:, filter:)
@client[collection]
.count_documents(filter, max_time_ms: QUERY_TIMEOUT_MS)
end
# NO insert, update, delete, bulk_write methods exist
# The class literally cannot write - it only has read methods
private
attr_reader :client
end
endBlock ALL aggregation stages that can write or modify data:
# lib/bizy/mongodb_pipeline_validator.rb
module Bizy
class MongodbPipelineValidator
FORBIDDEN_STAGES = %w[
$out # Writes to collection
$merge # Writes/updates to collection
$lookup # Cross-collection join (data leakage)
$graphLookup # Recursive cross-collection join
$unionWith # Unions unfiltered collection data
$documents # Creates documents from scratch
].freeze
def self.validate!(pipeline)
new.validate!(pipeline)
end
def validate!(pipeline)
validate_pipeline_recursive!(pipeline)
end
private
def validate_pipeline_recursive!(pipeline)
pipeline.each do |stage|
stage_name = stage.keys.first.to_s
if FORBIDDEN_STAGES.include?(stage_name)
raise Bizy::ForbiddenOperatorError,
"Aggregation stage '#{stage_name}' is not allowed"
end
# Recursively validate $facet sub-pipelines
if stage_name == "$facet"
stage["$facet"].each_value do |sub_pipeline|
validate_pipeline_recursive!(sub_pipeline)
end
end
end
end
end
endReject any query operators that could bypass security:
# lib/bizy/mongodb_security_validator.rb
module Bizy
class MongodbSecurityValidator
# Filter operators that can bypass field-level security
FORBIDDEN_FILTER_OPERATORS = %w[
$where # JavaScript execution - can bypass any filter
$function # JavaScript execution - can bypass any filter
].freeze
# NOTE: $expr is ALLOWED - it cannot bypass top-level field matches
# and is required for existing analytics pipelines
MAX_REGEX_LENGTH = 100
VALID_FIELD_NAME = /\A[\w.$]+\z/
def self.validate_filter!(filter)
new.validate_filter!(filter)
end
def validate_filter!(filter)
validate_filter_recursive!(filter)
end
private
def validate_filter_recursive!(filter)
return unless filter.is_a?(Hash)
filter.each do |key, value|
validate_field_name!(key)
validate_not_forbidden_operator!(key)
validate_regex_length!(key, value)
validate_filter_recursive!(value) if value.is_a?(Hash)
value.each { |v| validate_filter_recursive!(v) if v.is_a?(Hash) } if value.is_a?(Array)
end
end
def validate_field_name!(key)
unless key.to_s.match?(VALID_FIELD_NAME)
raise Bizy::InvalidFilterError, "Invalid field name format"
end
end
def validate_not_forbidden_operator!(key)
if FORBIDDEN_FILTER_OPERATORS.include?(key.to_s)
raise Bizy::ForbiddenOperatorError,
"Operator '#{key}' is not allowed"
end
end
def validate_regex_length!(key, value)
if key.to_s == "$regex" && value.to_s.length > MAX_REGEX_LENGTH
raise Bizy::InvalidFilterError, "Regex pattern too long"
end
end
end
end| Layer | What it blocks | Failure mode |
|---|---|---|
| Read-Only DB User | All writes | MongoDB permission error |
| Secondary-Only Mode | Primary DB access | Query fails if no secondary available |
| API Surface | No write methods exposed | Method doesn't exist |
| Pipeline Validation | $out, $merge, $lookup | Application error before query runs |
| Query Validation | $where, $function | Application error before query runs |
All layers are mandatory:
- Read-only DB user: Prevents any data modification at the database level
- Secondary-only mode: Isolates primary from query load; queries can never affect write performance
- API surface: Defense against code bugs - write methods don't exist
- Pipeline validation: Defense against LLM-crafted malicious aggregation pipelines
- Query validation: Defense against JavaScript injection and regex attacks
This section documents all considered attack vectors and their mitigations.
Attack:
filter: {
"$or": [
{ "company_id": "other_company_id", "amount": { "$gt": 0 } },
{ "amount": { "$lt": 0 } }
]
}Problem: Simple filter.merge("company_id" => X) only sets top-level company_id. The $or clause still contains the attacker's company_id and will match documents from other companies.
Mitigation: Recursively sanitize company_id from entire filter structure:
def sanitize_scope_field!(filter, scope_field)
filter.each do |key, value|
if key.to_s == scope_field
filter.delete(key) # Remove any user-provided scope field
elsif value.is_a?(Hash)
sanitize_scope_field!(value, scope_field)
elsif value.is_a?(Array)
value.each { |v| sanitize_scope_field!(v, scope_field) if v.is_a?(Hash) }
end
end
end
# Then inject our scope field at top level
filter[scope_field] = @company_idAttack:
filter: { "$where": "this.company_id == 'other_company_id'" }Problem: $where executes JavaScript and can bypass any field-level filters.
Mitigation: Block dangerous filter operators (see MongodbSecurityValidator above).
Attack:
pipeline: [
{ "$match": { "company_id": my_company } },
{ "$unionWith": { "coll": "bonuses" } }, # Unions ALL bonuses, unfiltered!
{ "$match": { "company_id": "other_company" } }
]Problem: $unionWith brings in data from another collection without applying the company filter.
Mitigation: Block in forbidden stages list (see MongodbPipelineValidator above).
Initial concern:
filter: { "$expr": { "$eq": ["$company_id", "other_company_id"] } }Analysis: After deeper review, $expr cannot bypass top-level field matches.
MongoDB query semantics treat all top-level conditions as an implicit AND:
# After our injection:
{
"company_id" => my_company, # Our top-level field match
"$expr" => { "$eq": ["$company_id", "other"] } # Attacker's $expr
}Both conditions must be true:
company_id == my_company(TRUE for company's documents)$company_id == "other"→my_company == "other"(FALSE)
Result: No documents match. The attacker's $expr cannot override our field match.
Decision: $expr is ALLOWED - removed from FORBIDDEN_FILTER_OPERATORS.
Attack:
# Attacker sends string ID
filter: { "user_id": "507f1f77bcf86cd799439011" }
# Validation might check ObjectId, but filter uses stringProblem: Mismatch between validation type and query type could allow bypass.
Mitigation: Normalize IDs in the filter itself:
def normalize_user_ids_in_filter!(filter, scope_field)
user_id_value = filter[scope_field]
return unless user_id_value
filter[scope_field] = case user_id_value
when String
BSON::ObjectId(user_id_value)
when BSON::ObjectId
user_id_value
when Hash
if user_id_value["$in"]
{ "$in" => user_id_value["$in"].map { |id| BSON::ObjectId(id.to_s) } }
else
user_id_value
end
when Array
{ "$in" => user_id_value.map { |id| BSON::ObjectId(id.to_s) } }
end
endAttack:
filter: { "reason": { "$regex": "(a+)+$" } }Problem: Catastrophic backtracking in regex causes CPU exhaustion (ReDoS).
Mitigation: Limit regex length (see MongodbSecurityValidator above).
Attack:
# Probe for collection names
collection: "secret_internal_collection"
# Error reveals: "Collection 'secret_internal_collection' is not available"Problem: Detailed error messages leak information about data existence.
Mitigation: Use generic error messages:
# Bad - reveals information
raise "User ID 'abc' does not belong to this company"
raise "Collection 'xyz' is not available"
# Good - generic errors
raise Bizy::NotAuthorizedError, "Query not permitted"Attack:
# Systematically extract all company data
loop do
results = query(filter: { "_id": { "$gt": last_id } }, limit: 100)
break if results.empty?
last_id = results.last["_id"]
endProblem: Even with proper scoping, attackers can extract entire datasets.
Mitigation: Audit logging for security review:
# lib/bizy/mongodb_audit_logger.rb
module Bizy
class MongodbAuditLogger
def self.log_query(user:, company:, collection:, filter:)
Rails.logger.info("[Bizy MongoDB Query]", {
user_id: user.id.to_s,
company_id: company.id.to_s,
collection: collection,
filter_keys: filter.keys, # Log structure, not values (PII)
timestamp: Time.current.iso8601
})
end
end
endAttack:
pipeline: [
{ "$match": { "company_id": my_company } },
{ "$facet": {
"legitimate": [{ "$match": { "amount": 10 } }],
"attack": [{ "$unionWith": { "coll": "users" } }] # Nested forbidden stage!
}}
]Problem: $facet contains nested pipelines that might contain forbidden stages.
Mitigation: Recursively validate nested pipelines (see MongodbPipelineValidator above).
Attack:
filter: { "company_id\x00": "other" } # Null byte
filter: { "company_id\u200B": "other" } # Zero-width spaceProblem: Unicode tricks might bypass string matching for field names.
Mitigation: Validate field name characters (see MongodbSecurityValidator above).
Attack: Steal or replay a valid session token to impersonate another user.
Mitigation: Already handled by existing authentication:
- Users authenticated via standard Rails session
- Company context derived from authenticated user
- No separate token system needed (unlike MCP approach)
| Priority | Attack Vector | Mitigation | Status |
|---|---|---|---|
| CRITICAL | Nested filter injection | Recursive sanitization of scope field | Required |
| CRITICAL | $where JavaScript | Block $where, $function | Required |
| CRITICAL | $unionWith bypass | Block $unionWith, $documents | Required |
| SAFE | $expr override | ALLOWED - cannot bypass field matches | Safe to allow |
| MEDIUM | Type confusion | Normalize IDs in filter | Required |
| MEDIUM | ReDoS | Limit regex length | Required |
| MEDIUM | Info disclosure | Generic error messages | Required |
| MEDIUM | Enumeration | Audit logging | Required |
| MEDIUM | $facet nesting | Recursive pipeline validation | Required |
| LOW | Field name tricks | Validate field name characters | Required |
| LOW | Token theft | Existing authentication | Already handled |
Defines which collections are queryable and their scoping rules:
# lib/bizy/mongodb_collection_registry.rb
module Bizy
class MongodbCollectionRegistry
# Scoping types:
# - :company - Filter by company_id field
# - :user - Filter by user_id IN (validated company user IDs)
COLLECTIONS = {
# Company-scoped collections (have company_id field)
"bonuses" => {
scope_type: :company,
scope_field: "company_id",
description: "Recognition given between employees",
docs: ["collections/bonuses"]
},
"users" => {
scope_type: :company,
scope_field: "company_id",
description: "Employee records",
docs: ["collections/users"]
},
"check_ins" => {
scope_type: :company,
scope_field: "company_id",
description: "Employee check-ins",
docs: ["collections/check_ins"]
},
"monthly_user_statistics" => {
scope_type: :company,
scope_field: "company_id",
description: "Monthly aggregated user metrics",
docs: ["collections/monthly_user_statistics"]
},
# User-scoped collections (have user_id but no company_id)
"app_notifications" => {
scope_type: :user,
scope_field: "user_id",
description: "User notifications",
docs: ["collections/app_notifications"]
},
"llm_runs" => {
scope_type: :user,
scope_field: "user_id",
description: "LLM query history",
docs: ["collections/llm_runs"]
},
"user_content_dismissals" => {
scope_type: :user,
scope_field: "user_id",
description: "Dismissed content by user",
docs: ["collections/user_content_dismissals"]
},
# ... add more as needed
}.freeze
class << self
def allowed?(collection_name)
COLLECTIONS.key?(collection_name)
end
def config(collection_name)
COLLECTIONS[collection_name]
end
def scope_type(collection_name)
COLLECTIONS.dig(collection_name, :scope_type)
end
def scope_field(collection_name)
COLLECTIONS.dig(collection_name, :scope_field)
end
def available_names
COLLECTIONS.keys
end
def all_with_descriptions
COLLECTIONS.map do |name, config|
{
name: name,
description: config[:description],
scope_type: config[:scope_type],
doc_paths: config[:docs] || []
}
end
end
end
end
endHandles query rewriting for both company-scoped and user-scoped collections:
# lib/bizy/mongodb_query_scoper.rb
module Bizy
class MongodbQueryScoper
def initialize(company:)
@company = company
@company_id = company.id
end
def scope_find(collection:, filter:)
validate_collection!(collection)
filter = filter.deep_dup
MongodbSecurityValidator.validate_filter!(filter)
merge_scope_filter(collection, filter)
end
def scope_aggregate(collection:, pipeline:)
validate_collection!(collection)
pipeline = pipeline.deep_dup
MongodbPipelineValidator.validate!(pipeline)
inject_scope_into_pipeline(collection, pipeline)
end
private
def validate_collection!(collection)
unless MongodbCollectionRegistry.allowed?(collection)
raise Bizy::UnauthorizedCollectionError, "Query not permitted"
end
end
def merge_scope_filter(collection, filter)
config = MongodbCollectionRegistry.config(collection)
case config[:scope_type]
when :company
scope_company_filter(config[:scope_field], filter)
when :user
scope_user_filter(config[:scope_field], filter)
else
raise "Unknown scope type: #{config[:scope_type]}"
end
end
def scope_company_filter(scope_field, filter)
# Remove any nested company_id to prevent injection
sanitize_scope_field!(filter, scope_field)
# Inject our company_id at top level
filter.merge(scope_field => @company_id)
end
def scope_user_filter(scope_field, filter)
requested_user_ids = extract_user_ids(filter[scope_field])
if requested_user_ids.present?
# Validate ALL requested user_ids belong to this company
validate_user_ids_in_company!(requested_user_ids)
# Normalize IDs in the filter
normalize_user_ids_in_filter!(filter, scope_field)
filter
else
# No user_ids specified - require them for user-scoped queries
raise Bizy::InvalidFilterError,
"user_id is required for this collection"
end
end
def sanitize_scope_field!(filter, scope_field)
filter.each do |key, value|
if key.to_s == scope_field
filter.delete(key)
elsif value.is_a?(Hash)
sanitize_scope_field!(value, scope_field)
elsif value.is_a?(Array)
value.each { |v| sanitize_scope_field!(v, scope_field) if v.is_a?(Hash) }
end
end
end
def extract_user_ids(user_id_filter)
case user_id_filter
when BSON::ObjectId, String
[normalize_id(user_id_filter)]
when Hash
user_id_filter["$in"]&.map { |id| normalize_id(id) }
when Array
user_id_filter.map { |id| normalize_id(id) }
else
nil
end
end
def validate_user_ids_in_company!(user_ids)
valid_count = @company.users.where(:_id.in => user_ids).count
unless valid_count == user_ids.length
raise Bizy::UnauthorizedUserError, "Query not permitted"
end
end
def normalize_id(id)
id.is_a?(BSON::ObjectId) ? id : BSON::ObjectId(id.to_s)
end
def normalize_user_ids_in_filter!(filter, scope_field)
user_id_value = filter[scope_field]
return unless user_id_value
filter[scope_field] = case user_id_value
when String
BSON::ObjectId(user_id_value)
when BSON::ObjectId
user_id_value
when Hash
if user_id_value["$in"]
{ "$in" => user_id_value["$in"].map { |id| BSON::ObjectId(id.to_s) } }
else
user_id_value
end
when Array
{ "$in" => user_id_value.map { |id| BSON::ObjectId(id.to_s) } }
end
end
def inject_scope_into_pipeline(collection, pipeline)
config = MongodbCollectionRegistry.config(collection)
case config[:scope_type]
when :company
inject_company_filter_into_pipeline(config[:scope_field], pipeline)
when :user
validate_user_filter_in_pipeline(config[:scope_field], pipeline)
end
pipeline
end
def inject_company_filter_into_pipeline(scope_field, pipeline)
if pipeline.first&.key?("$match")
# Sanitize and inject into existing $match
sanitize_scope_field!(pipeline.first["$match"], scope_field)
pipeline.first["$match"][scope_field] = @company_id
else
# Prepend new $match stage
pipeline.unshift({ "$match" => { scope_field => @company_id } })
end
end
def validate_user_filter_in_pipeline(scope_field, pipeline)
first_match = pipeline.first&.dig("$match")
unless first_match
raise Bizy::InvalidFilterError,
"Pipeline must start with $match containing user_id"
end
user_ids = extract_user_ids(first_match[scope_field])
unless user_ids.present?
raise Bizy::InvalidFilterError,
"user_id is required in first $match stage"
end
validate_user_ids_in_company!(user_ids)
end
end
end# lib/bizy/errors.rb
module Bizy
class UnauthorizedCollectionError < StandardError; end
class UnauthorizedUserError < StandardError; end
class InvalidFilterError < StandardError; end
class ForbiddenOperatorError < StandardError; end
end# app/lib/bizy/agents/data.rb
class Bizy::Agents::Data < Bizy::Agents::Base
description <<~DESC
Query MongoDB collections for data analysis and reporting.
Use this agent when the user asks about:
- Raw data from collections (bonuses, users, check_ins)
- Custom aggregations, groupings, or counts
- Data exploration and discovery
Only available to company admins. Results are automatically
scoped to the user's company. Start with list_collections
or describe_collection to understand the data model.
DESC
def self.available?(user:, metadata:)
user.company_admin? || user.can_administer?(:global)
end
private
def tools
[
Bizy::Tools::MongodbFind.new(company: @user.company),
Bizy::Tools::MongodbAggregate.new(company: @user.company),
Bizy::Tools::MongodbCount.new(company: @user.company),
Bizy::Tools::MongodbListCollections.new,
Bizy::Tools::MongodbDescribeCollection.new(company: @user.company),
Bizy::Tools::MongodbDocs.new,
]
end
def instructions
<<~PROMPT
You are a data query assistant. Help admins explore their company's data.
Available collections: #{Bizy::MongodbCollectionRegistry.available_names.join(', ')}
All queries are automatically scoped to the user's company.
For user-scoped collections (app_notifications, llm_runs), you must specify user_id(s).
Workflow:
1. Use list_collections to see available collections
2. Use describe_collection to understand fields
3. Use mongodb_docs for detailed field documentation
4. Use mongodb_find for simple queries, mongodb_aggregate for complex analysis
PROMPT
end
end# app/lib/bizy/tools/mongodb_find.rb
class Bizy::Tools::MongodbFind < RubyLLM::Tool
description <<~DESC
Query documents from a MongoDB collection. Results are automatically
scoped to your company. Maximum 100 documents per query.
Use this for simple lookups and filtered queries. For complex
aggregations with grouping or calculations, use mongodb_aggregate instead.
Example filters:
- { "amount": { "$gt": 10 } } - bonuses over 10 points
- { "created_at": { "$gte": "2025-01-01" } } - recent records
- { "hashtags": "#teamwork" } - records with specific hashtag
DESC
param :collection, desc: "Collection name (e.g., 'bonuses', 'users')"
param :filter, type: :object, desc: "MongoDB query filter", required: false
param :projection, type: :object, desc: "Fields to include/exclude (e.g., { 'amount': 1, 'reason': 1 })", required: false
param :sort, type: :object, desc: "Sort order (e.g., { 'created_at': -1 })", required: false
param :limit, type: :integer, desc: "Max documents to return (default: 20, max: 100)", required: false
def initialize(company:)
@company = company
end
def execute(collection:, filter: {}, projection: nil, sort: nil, limit: 20)
scoper = Bizy::MongodbQueryScoper.new(company: @company)
scoped_filter = scoper.scope_find(collection: collection, filter: filter)
executor = Bizy::MongodbQueryExecutor.new
results = executor.find(
collection: collection,
filter: scoped_filter,
projection: projection,
sort: sort,
limit: limit
)
Bizy::MongodbAuditLogger.log_query(
user: @company.users.first, # Will be replaced with actual user
company: @company,
collection: collection,
filter: filter
)
{ collection: collection, count: results.length, documents: results }
rescue Bizy::UnauthorizedCollectionError, Bizy::InvalidFilterError,
Bizy::ForbiddenOperatorError => e
{ error: "Query not permitted" }
end
end# app/lib/bizy/tools/mongodb_aggregate.rb
class Bizy::Tools::MongodbAggregate < RubyLLM::Tool
description <<~DESC
Run an aggregation pipeline on a MongoDB collection. The first $match
stage is automatically scoped to your company.
Forbidden stages: $out, $merge, $lookup, $graphLookup, $unionWith, $documents
Use for grouping, counting, and complex data analysis.
Example pipeline:
[
{ "$match": { "created_at": { "$gte": "2025-01-01" } } },
{ "$group": { "_id": "$giver_id", "total": { "$sum": "$amount" } } },
{ "$sort": { "total": -1 } },
{ "$limit": 10 }
]
DESC
param :collection, desc: "Collection name"
param :pipeline, type: :array, desc: "MongoDB aggregation pipeline stages"
def initialize(company:)
@company = company
end
def execute(collection:, pipeline:)
scoper = Bizy::MongodbQueryScoper.new(company: @company)
scoped_pipeline = scoper.scope_aggregate(collection: collection, pipeline: pipeline)
executor = Bizy::MongodbQueryExecutor.new
results = executor.aggregate(collection: collection, pipeline: scoped_pipeline)
{ collection: collection, result_count: results.length, results: results }
rescue Bizy::UnauthorizedCollectionError, Bizy::InvalidFilterError,
Bizy::ForbiddenOperatorError => e
{ error: "Query not permitted" }
end
end# app/lib/bizy/tools/mongodb_count.rb
class Bizy::Tools::MongodbCount < RubyLLM::Tool
description <<~DESC
Count documents in a MongoDB collection matching a filter.
Results are automatically scoped to your company.
Use this for quick counts without fetching documents.
DESC
param :collection, desc: "Collection name"
param :filter, type: :object, desc: "MongoDB query filter", required: false
def initialize(company:)
@company = company
end
def execute(collection:, filter: {})
scoper = Bizy::MongodbQueryScoper.new(company: @company)
scoped_filter = scoper.scope_find(collection: collection, filter: filter)
executor = Bizy::MongodbQueryExecutor.new
count = executor.count(collection: collection, filter: scoped_filter)
{ collection: collection, count: count }
rescue Bizy::UnauthorizedCollectionError, Bizy::InvalidFilterError,
Bizy::ForbiddenOperatorError => e
{ error: "Query not permitted" }
end
end# app/lib/bizy/tools/mongodb_list_collections.rb
class Bizy::Tools::MongodbListCollections < RubyLLM::Tool
description <<~DESC
List available MongoDB collections with descriptions and scope types.
Use this first to discover what data is available to query.
DESC
def execute
collections = Bizy::MongodbCollectionRegistry.all_with_descriptions
{ collections: collections }
end
end# app/lib/bizy/tools/mongodb_describe_collection.rb
class Bizy::Tools::MongodbDescribeCollection < RubyLLM::Tool
description <<~DESC
Get detailed information about a collection including sample fields
and links to related documentation. Use this to understand the
structure before writing queries.
DESC
param :collection, desc: "Collection name to describe"
def initialize(company:)
@company = company
end
def execute(collection:)
config = Bizy::MongodbCollectionRegistry.config(collection)
return { error: "Collection not available" } unless config
# Get sample document to show field structure
scoper = Bizy::MongodbQueryScoper.new(company: @company)
executor = Bizy::MongodbQueryExecutor.new
begin
scoped_filter = scoper.scope_find(collection: collection, filter: {})
sample = executor.find(collection: collection, filter: scoped_filter, limit: 1).first
sample_fields = sample&.keys || []
rescue Bizy::InvalidFilterError
# User-scoped collection without user_id - can't get sample
sample_fields = ["(requires user_id to sample)"]
end
{
name: collection,
description: config[:description],
scope_type: config[:scope_type],
scope_field: config[:scope_field],
sample_fields: sample_fields,
doc_paths: config[:docs] || []
}
end
end# app/lib/bizy/tools/mongodb_docs.rb
class Bizy::Tools::MongodbDocs < RubyLLM::Tool
description <<~DESC
Get detailed documentation about a MongoDB collection's fields,
semantics, and common query patterns. Use this to understand
what data means and how to query it effectively.
DESC
DOCS_PATH = Rails.root.join("doc/bizy_data")
param :collection, desc: "Collection name to get documentation for"
def execute(collection:)
# Validate collection name to prevent path traversal
unless collection.match?(/\A[\w]+\z/)
return { error: "Invalid collection name" }
end
doc_path = DOCS_PATH.join("collections/#{collection}.md")
# Ensure resolved path is within DOCS_PATH (prevent traversal)
unless doc_path.to_s.start_with?(DOCS_PATH.to_s)
return { error: "Invalid collection name" }
end
return { error: "No documentation available for #{collection}" } unless File.exist?(doc_path)
{ documentation: File.read(doc_path) }
end
end# spec/lib/bizy/mongodb_query_scoper_spec.rb
RSpec.describe Bizy::MongodbQueryScoper do
let(:company) { create(:company) }
let(:other_company) { create(:company) }
let(:user1) { create(:user, company: company) }
let(:user2) { create(:user, company: company) }
let(:other_user) { create(:user, company: other_company) }
subject { described_class.new(company: company) }
describe "#scope_find" do
context "company-scoped collections" do
it "injects company_id into filter" do
result = subject.scope_find(collection: "bonuses", filter: { "amount" => 10 })
expect(result).to eq({ "amount" => 10, "company_id" => company.id })
end
it "overwrites any user-provided company_id" do
result = subject.scope_find(
collection: "bonuses",
filter: { "company_id" => other_company.id }
)
expect(result["company_id"]).to eq(company.id)
end
it "sanitizes nested company_id in $or clauses" do
result = subject.scope_find(
collection: "bonuses",
filter: {
"$or" => [
{ "company_id" => other_company.id, "amount" => 10 },
{ "amount" => 20 }
]
}
)
expect(result["company_id"]).to eq(company.id)
expect(result["$or"].first).not_to have_key("company_id")
end
end
context "user-scoped collections" do
it "allows querying with valid user_id" do
result = subject.scope_find(
collection: "app_notifications",
filter: { "user_id" => user1.id, "seen_at" => nil }
)
expect(result["user_id"]).to eq(user1.id)
end
it "allows querying with array of valid user_ids" do
result = subject.scope_find(
collection: "app_notifications",
filter: { "user_id" => { "$in" => [user1.id, user2.id] } }
)
expect(result["user_id"]["$in"]).to contain_exactly(user1.id, user2.id)
end
it "rejects user_id from another company" do
expect {
subject.scope_find(
collection: "app_notifications",
filter: { "user_id" => other_user.id }
)
}.to raise_error(Bizy::UnauthorizedUserError)
end
it "rejects if any user_id in array is from another company" do
expect {
subject.scope_find(
collection: "app_notifications",
filter: { "user_id" => { "$in" => [user1.id, other_user.id] } }
)
}.to raise_error(Bizy::UnauthorizedUserError)
end
it "requires user_id for user-scoped collections" do
expect {
subject.scope_find(collection: "app_notifications", filter: {})
}.to raise_error(Bizy::InvalidFilterError, /user_id is required/)
end
end
it "rejects non-allowlisted collections" do
expect {
subject.scope_find(collection: "secret_data", filter: {})
}.to raise_error(Bizy::UnauthorizedCollectionError)
end
end
end# spec/lib/bizy/mongodb_security_validator_spec.rb
RSpec.describe Bizy::MongodbSecurityValidator do
describe ".validate_filter!" do
it "blocks $where operator" do
expect {
described_class.validate_filter!({ "$where" => "this.x == 1" })
}.to raise_error(Bizy::ForbiddenOperatorError, /\$where/)
end
it "blocks $function operator" do
expect {
described_class.validate_filter!({ "$function" => { body: "..." } })
}.to raise_error(Bizy::ForbiddenOperatorError, /\$function/)
end
it "allows $expr operator" do
expect {
described_class.validate_filter!({ "$expr" => { "$eq" => ["$a", "$b"] } })
}.not_to raise_error
end
it "blocks overly long regex patterns" do
long_regex = "a" * 200
expect {
described_class.validate_filter!({ "field" => { "$regex" => long_regex } })
}.to raise_error(Bizy::InvalidFilterError, /Regex pattern too long/)
end
it "blocks field names with null bytes" do
expect {
described_class.validate_filter!({ "field\x00" => "value" })
}.to raise_error(Bizy::InvalidFilterError, /Invalid field name/)
end
it "validates nested filters recursively" do
expect {
described_class.validate_filter!({
"$or" => [{ "$where" => "1==1" }]
})
}.to raise_error(Bizy::ForbiddenOperatorError, /\$where/)
end
end
end# spec/lib/bizy/mongodb_pipeline_validator_spec.rb
RSpec.describe Bizy::MongodbPipelineValidator do
describe ".validate!" do
it "blocks $out stage" do
pipeline = [{ "$match" => {} }, { "$out" => "output_collection" }]
expect { described_class.validate!(pipeline) }
.to raise_error(Bizy::ForbiddenOperatorError, /\$out/)
end
it "blocks $merge stage" do
pipeline = [{ "$match" => {} }, { "$merge" => { into: "target" } }]
expect { described_class.validate!(pipeline) }
.to raise_error(Bizy::ForbiddenOperatorError, /\$merge/)
end
it "blocks $lookup stage" do
pipeline = [{ "$lookup" => { from: "other", as: "joined" } }]
expect { described_class.validate!(pipeline) }
.to raise_error(Bizy::ForbiddenOperatorError, /\$lookup/)
end
it "blocks $unionWith stage" do
pipeline = [{ "$match" => {} }, { "$unionWith" => { coll: "other" } }]
expect { described_class.validate!(pipeline) }
.to raise_error(Bizy::ForbiddenOperatorError, /\$unionWith/)
end
it "blocks forbidden stages nested in $facet" do
pipeline = [
{ "$match" => {} },
{ "$facet" => {
"good" => [{ "$match" => { "x" => 1 } }],
"bad" => [{ "$unionWith" => { coll: "other" } }]
}}
]
expect { described_class.validate!(pipeline) }
.to raise_error(Bizy::ForbiddenOperatorError, /\$unionWith/)
end
it "allows safe stages like $match, $group, $project" do
pipeline = [
{ "$match" => { status: "active" } },
{ "$group" => { _id: "$type", count: { "$sum" => 1 } } },
{ "$project" => { type: "$_id", count: 1 } }
]
expect { described_class.validate!(pipeline) }.not_to raise_error
end
end
end# spec/lib/bizy/mongodb_query_executor_spec.rb
RSpec.describe Bizy::MongodbQueryExecutor do
describe "read-only enforcement" do
subject { described_class.new }
it "does not expose insert methods" do
expect(subject).not_to respond_to(:insert)
expect(subject).not_to respond_to(:insert_one)
expect(subject).not_to respond_to(:insert_many)
end
it "does not expose update methods" do
expect(subject).not_to respond_to(:update)
expect(subject).not_to respond_to(:update_one)
expect(subject).not_to respond_to(:update_many)
end
it "does not expose delete methods" do
expect(subject).not_to respond_to(:delete)
expect(subject).not_to respond_to(:delete_one)
expect(subject).not_to respond_to(:delete_many)
end
it "uses the bizy_readonly MongoDB client" do
client = subject.send(:client)
expect(client).to eq(Mongoid.client(:bizy_readonly))
end
it "enforces secondary-only read mode" do
client = subject.send(:client)
read_preference = client.options[:read]
expect(read_preference[:mode]).to eq(:secondary)
end
end
end# spec/lib/bizy/agents/data_spec.rb
RSpec.describe Bizy::Agents::Data do
describe ".available?" do
let(:company) { create(:company) }
context "with non-admin user" do
let(:user) { create(:user, company: company, company_admin: false) }
it "returns false" do
expect(described_class.available?(user: user, metadata: {})).to be false
end
end
context "with company admin" do
let(:user) { create(:user, company: company, company_admin: true) }
it "returns true" do
expect(described_class.available?(user: user, metadata: {})).to be true
end
end
context "with global admin" do
let(:user) { create(:user, company: company, company_admin: false) }
before do
allow(user).to receive(:can_administer?).with(:global).and_return(true)
end
it "returns true" do
expect(described_class.available?(user: user, metadata: {})).to be true
end
end
end
endThe DataAgent's effectiveness depends on the LLM having enough context to formulate meaningful queries. Collection and field names alone don't explain what the data means or how it's calculated. This section describes the full documentation infrastructure.
Create a dedicated folder for LLM-consumable documentation, separate from engineering docs:
doc/
├── internal/ # Engineering docs (NOT exposed to LLM)
│ └── security/
│ └── how_scoping_works.md
│
└── bizy_data/ # Curated for LLM consumption
├── collections/
│ ├── bonuses.md
│ ├── users.md
│ ├── check_ins.md
│ ├── monthly_user_statistics.md
│ ├── app_notifications.md
│ ├── llm_runs.md
│ └── user_content_dismissals.md
└── examples/
└── common_queries.md
Exposing internal docs to the LLM introduces risks:
| Risk | Example | Mitigation |
|---|---|---|
| Implementation detail leakage | Class names, file paths, dependencies | Curated docs only - not entire /doc folder |
| Security mechanism exposure | "Filter by company_id at query layer" | Separate user-facing docs from engineering docs |
| Accidental secrets | API keys, hostnames in examples | Review process + automated scanning |
| Collection enumeration | References to non-allowlisted collections | Docs only reference allowlisted collections |
Each collection doc includes YAML frontmatter for automated freshness tracking:
---
collection: bonuses
documented_fields: [giver_id, receiver_ids, amount, reason, created_at, hashtags]
registry_description: "Recognition given between employees"
last_generated: 2025-02-01T14:30:00Z
---
# Bonuses Collection
...The frontmatter enables the freshness checker to detect when:
- New fields appear in the collection that aren't documented
- Documented fields have been removed from the collection
- The registry description has changed
Curated docs should:
- Be written for admin audience - Not engineering implementation details
- Focus on data meaning - What does this field represent? What questions can it answer?
- Include example queries - Show common use cases with actual query patterns
- Reference only allowlisted collections - Don't mention internal collections
- Avoid implementation details - No class names, file paths, or internal architecture
---
collection: bonuses
documented_fields: [_id, giver_id, receiver_ids, amount, reason, created_at, hashtags, via]
registry_description: "Recognition given between employees"
last_generated: 2025-02-01T14:30:00Z
---
# Bonuses Collection
Recognition events where employees appreciate each other's contributions.
## Key Fields
- `giver_id` - User who gave the recognition (ObjectId)
- `receiver_ids` - Users who received the recognition (array of ObjectIds)
- `amount` - Points awarded (integer)
- `reason` - Message explaining the recognition (string)
- `created_at` - When the recognition was given (datetime)
- `hashtags` - Company values tagged (array of strings, e.g., "#teamwork")
- `via` - Source of the recognition (e.g., "web", "slack", "ms_teams")
## Common Query Patterns
**Recognition given by a specific user:**
```json
{ "giver_id": "<user_id>" }Recognition in a date range:
{ "created_at": { "$gte": "2025-01-01", "$lte": "2025-01-31" } }Recognition mentioning a specific value:
{ "hashtags": "#teamwork" }Top givers aggregation:
[
{ "$match": { "created_at": { "$gte": "2025-01-01" } } },
{ "$group": { "_id": "$giver_id", "total_given": { "$sum": "$amount" } } },
{ "$sort": { "total_given": -1 } },
{ "$limit": 10 }
]
### Review Process for Documentation
Before any doc is added to `doc/bizy_data/`:
1. **Security review** - No internal paths, class names, dependencies, or security mechanisms
2. **No secrets** - Scan for API keys, tokens, hostnames, real user/company data
3. **Allowlisted references only** - Only mention collections in the registry
4. **User-facing language** - Written for admins, not engineers
5. **PR approval required** - Standard review process
### Documentation Freshness Automation
Keep docs in sync with collection schemas using **schema-based detection** (not source file tracking). The process runs locally first, with CI as a backup safety net.
#### Detection Strategy
Only regenerate docs when the **actual data structure** changes:
| Trigger | Detected By | Frequency |
| -------------------------------- | ----------------------- | --------- |
| New field added to collection | Sample doc comparison | Rare |
| Field removed from collection | Sample doc comparison | Rare |
| Registry description changed | Registry vs frontmatter | Rare |
| New collection added to registry | Missing doc file | Rare |
| Source code refactoring | **Not tracked** | N/A |
#### Doc Parser
Parses YAML frontmatter from documentation files:
```ruby
# lib/bizy/mongodb_doc_parser.rb
module Bizy
class MongodbDocParser
Result = Struct.new(:metadata, :content, keyword_init: true)
def self.parse(path)
new.parse(path)
end
def parse(path)
raw = File.read(path)
if raw.start_with?("---")
parts = raw.split("---", 3)
metadata = YAML.safe_load(parts[1], symbolize_names: true) || {}
content = parts[2]&.strip || ""
else
metadata = {}
content = raw
end
Result.new(metadata: metadata, content: content)
end
end
end
Detects stale documentation by comparing frontmatter against actual collection state:
# lib/bizy/mongodb_doc_freshness_checker.rb
module Bizy
class MongodbDocFreshnessChecker
DOCS_PATH = Rails.root.join("doc/bizy_data/collections")
def stale_docs
stale = []
MongodbCollectionRegistry::COLLECTIONS.each do |name, config|
doc_path = DOCS_PATH.join("#{name}.md")
unless File.exist?(doc_path)
stale << { collection: name, reason: { missing: true } }
next
end
doc = MongodbDocParser.parse(doc_path)
sample = fetch_sample_document(name)
current_fields = sample.keys.sort
documented_fields = doc.metadata[:documented_fields]&.sort || []
added = current_fields - documented_fields
removed = documented_fields - current_fields
description_changed = config[:description] != doc.metadata[:registry_description]
if added.any? || removed.any? || description_changed
stale << {
collection: name,
reason: {
added_fields: added,
removed_fields: removed,
description_changed: description_changed
}
}
end
end
stale
end
private
def fetch_sample_document(name)
Mongoid.client(:bizy_readonly)[name].find({}).limit(1).first || {}
end
end
endGenerates or regenerates documentation using an LLM, with PII-scrubbed sample data:
# lib/bizy/mongodb_doc_generator.rb
module Bizy
class MongodbDocGenerator
DOCS_PATH = Rails.root.join("doc/bizy_data/collections")
def generate(collection:, config:)
sample = fetch_sample_document(collection)
content = generate_with_llm(
collection: collection,
description: config[:description],
fields: sample.keys,
sample: sanitize_sample(sample)
)
frontmatter = {
collection: collection,
documented_fields: sample.keys.sort,
registry_description: config[:description],
last_generated: Time.current.iso8601
}
write_doc(collection, frontmatter, content)
end
private
def fetch_sample_document(collection)
Mongoid.client(:bizy_readonly)[collection].find({}).limit(1).first || {}
end
def generate_with_llm(collection:, description:, fields:, sample:)
prompt = <<~PROMPT
Generate user-facing documentation for a MongoDB collection.
Collection: #{collection}
Description: #{description}
Fields: #{fields.join(', ')}
Sample document (sanitized): #{sample.to_json}
Guidelines:
- Write for admin users, not engineers
- Explain what each field means semantically
- Include 3-5 common query patterns with example filters
- Do not mention internal class names, file paths, or implementation details
- Use markdown format with ## headers for sections
Required sections:
1. Brief description (1-2 sentences)
2. Key Fields (list with descriptions)
3. Common Query Patterns (with example filters)
PROMPT
Bizy::LlmClient.complete(prompt)
end
def sanitize_sample(sample)
# Remove PII, replace IDs with placeholders
sample.transform_values do |v|
case v
when BSON::ObjectId then "<ObjectId>"
when String then v.truncate(50)
when Array then v.first(3)
else v
end
end
end
def write_doc(collection, frontmatter, content)
path = DOCS_PATH.join("#{collection}.md")
FileUtils.mkdir_p(DOCS_PATH)
File.write(path, <<~DOC)
---
#{frontmatter.to_yaml.lines[1..].join}---
#{content}
DOC
end
end
endLocal workflow commands for managing documentation:
# lib/tasks/bizy_docs.rake
namespace :bizy do
desc "Check for stale MongoDB documentation"
task check_docs: :environment do
checker = Bizy::MongodbDocFreshnessChecker.new
stale = checker.stale_docs
if stale.empty?
puts "All Bizy data docs are up to date."
else
puts "Stale documentation detected:"
stale.each do |s|
reasons = []
if s[:reason].is_a?(Hash)
reasons << "new fields: #{s[:reason][:added_fields].join(', ')}" if s[:reason][:added_fields]&.any?
reasons << "removed fields: #{s[:reason][:removed_fields].join(', ')}" if s[:reason][:removed_fields]&.any?
reasons << "description changed" if s[:reason][:description_changed]
reasons << "doc missing" if s[:reason][:missing]
else
reasons << s[:reason].to_s
end
puts " #{s[:collection]}: #{reasons.join(', ')}"
end
puts "\nRun 'rake bizy:generate_docs' to update."
exit 1
end
end
desc "Generate/update documentation for collections"
task :generate_docs, [:collection] => :environment do |_, args|
generator = Bizy::MongodbDocGenerator.new
collections = if args[:collection]
{ args[:collection] => Bizy::MongodbCollectionRegistry.config(args[:collection]) }
else
Bizy::MongodbCollectionRegistry::COLLECTIONS
end
collections.each do |name, config|
puts "Generating docs for #{name}..."
generator.generate(collection: name, config: config)
end
puts "Done. Review changes and commit."
end
end# Check which docs are stale
rake bizy:check_docs
# Regenerate all stale docs using LLM
rake bizy:generate_docs
# Regenerate a specific collection
rake bizy:generate_docs[bonuses]A spec that fails CI if docs are stale, ensuring developers don't forget to update docs when collection schemas change:
# spec/lib/bizy/mongodb_documentation_freshness_spec.rb
RSpec.describe "Bizy Data Documentation Freshness" do
it "has up-to-date documentation for all collections" do
checker = Bizy::MongodbDocFreshnessChecker.new
stale = checker.stale_docs
if stale.any?
message = stale.map do |s|
if s[:reason].is_a?(Hash) && s[:reason][:missing]
"#{s[:collection]}: missing doc file"
elsif s[:reason].is_a?(Hash)
reasons = []
reasons << "new fields: #{s[:reason][:added_fields]}" if s[:reason][:added_fields]&.any?
reasons << "removed fields: #{s[:reason][:removed_fields]}" if s[:reason][:removed_fields]&.any?
reasons << "description changed" if s[:reason][:description_changed]
"#{s[:collection]}: #{reasons.join(', ')}"
else
"#{s[:collection]}: #{s[:reason]}"
end
end.join("\n")
fail "Stale Bizy data docs detected:\n#{message}\n\nRun locally: rake bizy:generate_docs"
end
end
endflowchart TD
A[Developer adds/changes collection] --> B[Run rake bizy:check_docs locally]
B --> C{Stale docs?}
C -->|No| D[Continue with PR]
C -->|Yes| E[Run rake bizy:generate_docs]
E --> F[LLM generates updated docs]
F --> G[Developer reviews & edits]
G --> H[Commit changes]
H --> D
D --> I[CI runs freshness spec]
I --> J{Passes?}
J -->|Yes| K[PR mergeable]
J -->|No| L[CI fails with instructions]
L --> B
Pre-define parameterized query templates that the LLM fills in:
TEMPLATES = {
"recognition_by_user" => {
collection: "bonuses",
filter: ->(params) { { receiver_ids: params[:user_id] } }
}
}Pros: Maximum security, no query parsing Cons: Very limited flexibility, new use cases require code changes
Rejected because it defeats the purpose of flexible queries and requires constant maintenance.
Create company-filtered views in MongoDB itself.
Pros: Database-level security Cons: Can't create per-company views (impractical), views are static
The original plan used MCP (Model Context Protocol) with JSON-RPC endpoints.
Pros: Standard protocol, credential separation Cons: Extra network hop, more infrastructure, doesn't align with new Bizy framework
Superseded by RubyLLM native tools which provide the same security with less complexity.
- Create read-only MongoDB user (ops/infra)
- Create
bizy_readonlyuser withreadrole only - Store credentials in environment variable
- Create
- Configure Mongoid client
- Add
bizy_readonlyclient toconfig/mongoid.yml - Configure secondary-only reads
- Add
- MongodbQueryExecutor + spec
- Read-only client, only find/aggregate/count methods
- Spec: read-only enforcement, secondary-only mode
- MongodbCollectionRegistry + spec
- Collection allowlist with scope types and doc references
- Spec: allowed/disallowed collections
- MongodbSecurityValidator + spec
- Forbidden operators, recursive validation, field validation
- Spec: all attack vectors ($where, $function, regex, field names)
- MongodbPipelineValidator + spec
- Forbidden stages, recursive $facet validation
- Spec: $out, $merge, $lookup, $unionWith, nested stages
- MongodbQueryScoper + spec
- Company/user filter injection using validators
- Spec: company scoping, user scoping, nested injection protection
- Error classes and audit logger
- MongodbFind tool + spec
- Uses scoper and executor
- Spec: scoped results, error handling
- MongodbAggregate tool + spec
- Pipeline validation
- Spec: pipeline validation, scoped aggregation
- MongodbCount tool + spec
- Scoped count queries
- Discovery tools + specs
- MongodbListCollections
- MongodbDescribeCollection
- MongodbDocs (with path traversal protection)
- Bizy::Agents::Data + spec
- Admin availability check
- Tool registration
- Instructions
- Register in Bizy::Chat
- Add to SUB_AGENTS array
- Integration tests
- End-to-end query execution
- Create
MongodbDocParser+ spec- Parse YAML frontmatter from doc files
- Spec: parsing with/without frontmatter
- Create
MongodbDocFreshnessChecker+ spec- Schema-based staleness detection using frontmatter
- Spec: missing docs, new fields, removed fields, description changes
- Create
MongodbDocGenerator+ spec- LLM-assisted documentation generation with PII scrubbing
- Spec: sample sanitization, frontmatter generation
- Create rake tasks (
bizy:check_docs,bizy:generate_docs)- Local workflow for checking and regenerating docs
- Support single-collection targeting
- Generate initial curated docs in
doc/bizy_data/collections/- One file per allowlisted collection with frontmatter
- Example queries per collection
- Create common query examples in
doc/bizy_data/examples/- Cross-collection query patterns
- Create CI freshness spec as safety net
- Fails CI if docs are stale with actionable instructions
| File | Purpose |
|---|---|
lib/bizy/mongodb_query_executor.rb |
Read-only query execution |
lib/bizy/mongodb_collection_registry.rb |
Collection allowlist |
lib/bizy/mongodb_security_validator.rb |
Filter validation |
lib/bizy/mongodb_pipeline_validator.rb |
Pipeline validation |
lib/bizy/mongodb_query_scoper.rb |
Company/user scoping |
lib/bizy/mongodb_audit_logger.rb |
Query audit logging |
lib/bizy/errors.rb |
Error classes |
| File | Purpose |
|---|---|
app/lib/bizy/agents/data.rb |
Data query agent |
app/lib/bizy/tools/mongodb_find.rb |
Find tool |
app/lib/bizy/tools/mongodb_aggregate.rb |
Aggregate tool |
app/lib/bizy/tools/mongodb_count.rb |
Count tool |
app/lib/bizy/tools/mongodb_list_collections.rb |
Discovery tool |
app/lib/bizy/tools/mongodb_describe_collection.rb |
Discovery tool |
app/lib/bizy/tools/mongodb_docs.rb |
Documentation tool |
| File | Purpose |
|---|---|
lib/bizy/mongodb_doc_parser.rb |
Parse frontmatter from doc files |
lib/bizy/mongodb_doc_freshness_checker.rb |
Schema-based staleness detection |
lib/bizy/mongodb_doc_generator.rb |
LLM-assisted doc generation |
lib/tasks/bizy_docs.rake |
Rake tasks for doc management |
| File | Purpose |
|---|---|
config/mongoid.yml |
Add bizy_readonly client |
| File | Purpose |
|---|---|
doc/bizy_data/collections/bonuses.md |
Bonuses documentation |
doc/bizy_data/collections/users.md |
Users documentation |
doc/bizy_data/collections/check_ins.md |
Check-ins documentation |
doc/bizy_data/collections/monthly_user_statistics.md |
Monthly metrics documentation |
doc/bizy_data/collections/app_notifications.md |
Notifications documentation |
doc/bizy_data/collections/llm_runs.md |
LLM runs documentation |
doc/bizy_data/collections/user_content_dismissals.md |
Dismissals documentation |
doc/bizy_data/examples/common_queries.md |
Cross-collection query examples |
| File | Purpose |
|---|---|
spec/lib/bizy/mongodb_query_executor_spec.rb |
Read-only tests |
spec/lib/bizy/mongodb_collection_registry_spec.rb |
Allowlist tests |
spec/lib/bizy/mongodb_security_validator_spec.rb |
Attack vector tests |
spec/lib/bizy/mongodb_pipeline_validator_spec.rb |
Pipeline tests |
spec/lib/bizy/mongodb_query_scoper_spec.rb |
Scoping tests |
spec/lib/bizy/mongodb_doc_parser_spec.rb |
Doc parser tests |
spec/lib/bizy/mongodb_doc_freshness_checker_spec.rb |
Freshness detection tests |
spec/lib/bizy/mongodb_doc_generator_spec.rb |
Doc generator tests |
spec/lib/bizy/mongodb_documentation_freshness_spec.rb |
CI safety net spec |
spec/lib/bizy/agents/data_spec.rb |
Agent availability tests |
spec/lib/bizy/tools/mongodb_find_spec.rb |
Tool integration tests |
spec/lib/bizy/tools/mongodb_aggregate_spec.rb |
Tool integration tests |