An intelligent AI assistant built with FastAPI, LangChain, and Groq AI. JARVIS provides two modes of interaction: General Chat (pure LLM, no web search) and Realtime Chat (with Tavily web search). The system learns from user data files and past conversations, maintaining context across sessions.
- Python 3.8+ with pip
- Operating System: Windows, macOS, or Linux (fully cross-platform)
- API Keys (set in a `.env` file):
  - `GROQ_API_KEY` - Get from https://console.groq.com (required). You can add more keys for round-robin and fallback (see Multiple Groq API keys).
  - `TAVILY_API_KEY` - Get from https://tavily.com (optional, for realtime mode)
  - `GROQ_MODEL` - Optional, defaults to `llama-3.3-70b-versatile`
1. Clone/download this repository
2. Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Create a `.env` file in the project root:

   ```bash
   GROQ_API_KEY=your_groq_api_key_here
   # Optional: add more keys for round-robin and fallback (GROQ_API_KEY_2, GROQ_API_KEY_3, ...)
   TAVILY_API_KEY=your_tavily_api_key_here
   GROQ_MODEL=llama-3.3-70b-versatile
   # Optional: assistant name (default: Jarvis). Tone and personality stay the same.
   # ASSISTANT_NAME=Jarvis
   # Optional: how to address the user; otherwise uses learning data/chats.
   # JARVIS_USER_TITLE=Sir
   ```

4. Start the server:

   ```bash
   python run.py
   ```

   The server will start at http://localhost:8000

5. Test the system (in another terminal):

   ```bash
   python test.py
   ```

- ✅ Dual Chat Modes: General chat (pure LLM, no web search) and Realtime chat (with Tavily search)
- ✅ Session Management: Conversations persist across server restarts
- ✅ Learning System: Learns from user data files and past conversations via semantic search (no token-limit blow-up). No hardcoded names: assistant name and user title come from `ASSISTANT_NAME` and `JARVIS_USER_TITLE` in `.env`, or from learning data and chats.
- ✅ Learning data on restart: Add or edit `.txt` files in `database/learning_data/` and restart the server to pick them up
- ✅ Vector Store: FAISS index of learning data + past chats; only relevant chunks are sent to the LLM, so you never hit token limits
- ✅ Assistant Personality: Sophisticated, witty, professional tone with British humor (name configurable via `ASSISTANT_NAME` in `.env`)
- Learning data: All `.txt` files in `database/learning_data/` are indexed in the vector store. The AI answers from this data by retrieving relevant chunks per question (not by sending all text in every prompt), so you can add many files without exceeding token limits.
- Hot-reload: A background check runs every 15 seconds. If any `.txt` in `learning_data/` is new or modified, the vector store is rebuilt so new content is learned instantly.
- Curly Brace Escaping: Prevents LangChain template variable errors
- Smart Response Length: Adapts answer length based on question complexity
- Clean Formatting: No markdown, asterisks, or emojis in responses
- Time Awareness: AI knows current date and time
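The 15-second hot-reload check described above can be sketched as a background polling loop. This is a minimal, dependency-free illustration; `find_changes` and `watch_learning_data` are hypothetical names, not the project's actual API:

```python
import threading
import time
from pathlib import Path

def find_changes(directory, seen):
    """Return .txt paths that are new or modified since the last check.

    `seen` maps path -> last known mtime and is updated in place."""
    changed = []
    for f in sorted(Path(directory).glob("*.txt")):
        mtime = f.stat().st_mtime
        if seen.get(f) != mtime:
            seen[f] = mtime
            changed.append(f)
    return changed

def watch_learning_data(directory, rebuild, interval=15):
    """Background thread: call rebuild() when any learning-data file changes."""
    seen = {}
    find_changes(directory, seen)  # prime with current state; no rebuild at startup
    def loop():
        while True:
            time.sleep(interval)
            if find_changes(directory, seen):
                rebuild()  # e.g. re-index learning data into the vector store
    threading.Thread(target=loop, daemon=True).start()
```

Polling modification times keeps the watcher portable across Windows, macOS, and Linux without extra dependencies.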
```
User Input
    ↓
FastAPI Endpoints (/chat or /chat/realtime)
    ↓
ChatService (Session Management)
    ↓
GroqService or RealtimeGroqService
    ↓
VectorStoreService (Context Retrieval)
    ↓
Groq AI (LLM Response Generation)
```
- FastAPI Application (`app/main.py`)
  - REST API endpoints
  - CORS middleware
  - Application lifespan management
- Chat Service (`app/services/chat_service.py`)
  - Session creation and management
  - Message storage (in-memory and on disk)
  - Conversation history formatting
- Groq Service (`app/services/groq_service.py`)
  - General chat mode (pure LLM, no web search)
  - Retrieves relevant context from the vector store (learning data + past chats) per request; no full-text dump, so token usage stays bounded
- Realtime Service (`app/services/realtime_service.py`)
  - Extends GroqService
  - Adds Tavily web search
  - Combines search results with AI knowledge
- Vector Store Service (`app/services/vector_store.py`)
  - FAISS vector database
  - Embeddings generation (HuggingFace)
  - Semantic search for context retrieval
- Configuration (`config.py`)
  - Centralized settings
  - User context loading
  - System prompt definition
```
JARVIS/
├── app/
│   ├── __init__.py
│   ├── main.py                 # FastAPI application and API endpoints
│   ├── models.py               # Pydantic data models
│   ├── services/
│   │   ├── __init__.py
│   │   ├── chat_service.py     # Session and conversation management
│   │   ├── groq_service.py     # General chat AI service
│   │   ├── realtime_service.py # Realtime chat with web search
│   │   └── vector_store.py     # FAISS vector store and embeddings
│   └── utils/
│       ├── __init__.py
│       └── time_info.py        # Current date/time information
├── database/
│   ├── learning_data/          # User data files (.txt)
│   │   ├── userdata.txt        # Personal information (auto-loaded)
│   │   ├── system_context.txt  # System context (auto-loaded)
│   │   └── *.txt               # Any other .txt files (auto-loaded)
│   ├── chats_data/             # Saved conversations (.json)
│   └── vector_store/           # FAISS index files
├── config.py                   # Configuration and settings
├── run.py                      # Server startup script
├── test.py                     # CLI test interface
├── requirements.txt            # Python dependencies
└── README.md                   # This file
```
General chat endpoint (pure LLM, no web search).

Request:

```json
{
  "message": "What is Python?",
  "session_id": "optional-session-id"
}
```

Response:

```json
{
  "response": "Python is a high-level programming language...",
  "session_id": "session-id-here"
}
```

Realtime chat endpoint (with Tavily web search).
Request:

```json
{
  "message": "What's the latest AI news?",
  "session_id": "optional-session-id"
}
```

Response:

```json
{
  "response": "Based on recent search results...",
  "session_id": "session-id-here"
}
```

Get chat history for a session.
Response:

```json
{
  "session_id": "session-id",
  "messages": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Good day. How may I assist you?"}
  ]
}
```

Health check endpoint.
Response:

```json
{
  "status": "healthy",
  "vector_store": true,
  "groq_service": true,
  "realtime_service": true,
  "chat_service": true
}
```

API information endpoint.
- At startup: All `.txt` files in `database/learning_data/` and all past chats in `chats_data/` are loaded, chunked, embedded, and stored in a FAISS vector store.
- Restart for new learning data: Restart the server after adding or changing `.txt` files in `learning_data/`; the vector store is rebuilt on startup.
- No full dump: Learning data is never sent in full in the prompt. Only the top-k retrieved chunks (from learning data + past conversations) are sent per request, so token usage stays bounded.
On startup (and when learning_data changes):

1. Loads all `.txt` files from `learning_data/`
2. Loads all past conversations from `chats_data/`
3. Converts text to embeddings using a HuggingFace model
4. Creates a FAISS index for fast similarity search
5. Saves the index to disk
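The indexing pipeline above can be sketched in miniature. This is a dependency-free illustration of the load/chunk/embed/store steps: `chunk`, `embed`, and `build_index` are hypothetical stand-ins, and the toy bag-of-words "embedding" replaces the real HuggingFace sentence-transformers model and FAISS index:

```python
import json
from pathlib import Path

def chunk(text, size=500, overlap=50):
    """Split text into overlapping character chunks before embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]

def embed(text):
    """Toy bag-of-words embedding; the real system uses sentence-transformers."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def build_index(learning_dir, chats_dir):
    """Load .txt learning files and .json chats, chunk and embed them."""
    index = []  # list of (embedding, chunk_text); FAISS stores dense vectors instead
    for f in Path(learning_dir).glob("*.txt"):
        for c in chunk(f.read_text(encoding="utf-8")):
            index.append((embed(c), c))
    for f in Path(chats_dir).glob("*.json"):
        for msg in json.loads(f.read_text(encoding="utf-8")).get("messages", []):
            index.append((embed(msg["content"]), msg["content"]))
    return index
```

Chunking with overlap keeps related sentences together across chunk boundaries, which improves retrieval quality.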
1. User sends a message via the `/chat` endpoint
2. ChatService creates/retrieves the session
3. User message is stored in the session
4. GroqService retrieves relevant context from the vector store:
   - Relevant chunks from learning data (`.txt` files) and past conversations (semantic search)
   - Current time information
5. System prompt is built with all context
6. Groq AI generates the response
7. Response is stored in the session
8. Session is saved to disk
1. User sends a message via the `/chat/realtime` endpoint
2. Same session management as general mode
3. RealtimeGroqService:
   - Searches Tavily for real-time information
   - Retrieves relevant context (same as general mode)
   - Combines search results with context
   - Generates a response with current information
4. Response is stored and saved
When answering a question:

1. Vector store performs semantic search
2. Finds the most relevant documents (k=6 by default)
3. Documents can be from:
   - Learning data files
   - Past conversations
4. Context is escaped (curly braces) to prevent template errors
5. Context is added to the system prompt
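The top-k retrieval step can be illustrated with a small cosine-similarity ranking. This is a conceptual sketch only: `embed` is a toy bag-of-words stand-in, and `retrieve` mimics what FAISS does over dense vectors (the function names are hypothetical):

```python
import math

def embed(text):
    """Toy bag-of-words embedding standing in for the real sentence-transformers model."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, index, k=6):
    """Return the k chunks most similar to the question (FAISS does this, but faster)."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]
```

Because only these k chunks (not the whole corpus) are placed in the prompt, token usage stays bounded no matter how much learning data you add.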
- Server-managed: If no `session_id` is provided, the server generates a UUID
- User-managed: If a `session_id` is provided, the server uses it
- Sessions persist across server restarts (loaded from disk)
- Both `/chat` and `/chat/realtime` share the same session
- Sessions are saved to `database/chats_data/` as JSON files
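The session lifecycle above can be sketched as follows. This is a minimal illustration, assuming hypothetical helper names (`get_session_id`, `save_session`, `load_session`), not the project's actual ChatService API:

```python
import json
import uuid
from pathlib import Path

CHATS_DIR = Path("database/chats_data")  # matches the project layout

def get_session_id(provided=None):
    """Use the client-supplied session_id if given, else generate a UUID."""
    return provided or str(uuid.uuid4())

def save_session(session_id, messages, chats_dir=CHATS_DIR):
    """Persist a session to disk as JSON after each message."""
    chats_dir.mkdir(parents=True, exist_ok=True)
    path = chats_dir / f"{session_id}.json"
    data = {"session_id": session_id, "messages": messages}
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")

def load_session(session_id, chats_dir=CHATS_DIR):
    """Reload a session from disk, so conversations survive restarts."""
    path = chats_dir / f"{session_id}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["messages"]
    return []
```

Writing to disk after every message means a crash or restart loses at most the in-flight response.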
```bash
python test.py
```

Commands:

- `1` - Switch to General Chat mode
- `2` - Switch to Realtime Chat mode
- `/history` - View chat history
- `/clear` - Start new session
- `/quit` - Exit
```python
import requests

# General chat
response = requests.post(
    "http://localhost:8000/chat",
    json={
        "message": "What is machine learning?",
        "session_id": "my-session-id"
    }
)
print(response.json()["response"])

# Realtime chat
response = requests.post(
    "http://localhost:8000/chat/realtime",
    json={
        "message": "What's happening in AI today?",
        "session_id": "my-session-id"  # Same session continues
    }
)
print(response.json()["response"])
```

Create a `.env` file in the project root:
```bash
# Required
GROQ_API_KEY=your_groq_api_key

# Optional: add more keys for round-robin and fallback when one hits a rate limit
# GROQ_API_KEY_2=second_key
# GROQ_API_KEY_3=third_key

# Optional (for realtime mode)
TAVILY_API_KEY=your_tavily_api_key

# Optional (defaults to llama-3.3-70b-versatile)
GROQ_MODEL=llama-3.3-70b-versatile

# Optional: assistant name (default: Jarvis). Tone and personality stay the same.
# ASSISTANT_NAME=Jarvis

# Optional: how to address the user (e.g. "Sir", "Mr. Smith"). If not set, the AI uses
# only learning data and conversation history to address the user (no hardcoded names).
# JARVIS_USER_TITLE=
```

You can add multiple Groq API keys so the server uses every key one-by-one in rotation and falls back to the next key if one fails.
- Round-robin (one-by-one): The server uses each key in order: 1st request → 1st key, 2nd request → 2nd key, 3rd request → 3rd key, then back to the 1st key, and so on. Every key you give is used in turn; no key is skipped.
- Fallback: If the chosen key fails (e.g. rate limit 429 or any error), the server tries the next key, then the next, until one succeeds or all have been tried.
In your `.env`, set as many keys as you want using this pattern:

```bash
GROQ_API_KEY=your_first_key
GROQ_API_KEY_2=your_second_key
GROQ_API_KEY_3=your_third_key
# Add more: GROQ_API_KEY_4, GROQ_API_KEY_5, ... (no upper limit)
```

Only `GROQ_API_KEY` is required. Add `GROQ_API_KEY_2`, `GROQ_API_KEY_3`, etc. for extra keys. Each key has its own daily token limit on Groq's free tier, so multiple keys give you more capacity. The round-robin and fallback code lives in `app/services/groq_service.py` (see `_invoke_llm` and the module docstring for a line-by-line explanation).
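The rotation-with-fallback behavior described above can be sketched roughly like this. It is a simplified illustration, not the project's actual `_invoke_llm` implementation; `load_groq_keys` and `KeyRotator` are hypothetical names:

```python
import os

def load_groq_keys():
    """Collect GROQ_API_KEY, GROQ_API_KEY_2, GROQ_API_KEY_3, ... from the environment."""
    keys = []
    if os.getenv("GROQ_API_KEY"):
        keys.append(os.environ["GROQ_API_KEY"])
    n = 2
    while os.getenv(f"GROQ_API_KEY_{n}"):
        keys.append(os.environ[f"GROQ_API_KEY_{n}"])
        n += 1
    return keys

class KeyRotator:
    """Round-robin over keys, falling back to the next key when a call fails."""
    def __init__(self, keys):
        self.keys = keys
        self.pos = 0

    def invoke(self, call):
        """call(key) -> result; tries each key once, starting at the round-robin position."""
        start = self.pos
        self.pos = (self.pos + 1) % len(self.keys)  # advance for the next request
        last_err = None
        for i in range(len(self.keys)):
            key = self.keys[(start + i) % len(self.keys)]
            try:
                return call(key)  # e.g. a Groq chat-completion request using this key
            except Exception as e:  # rate limit (429) or any other error: try next key
                last_err = e
        raise last_err
```

Separating "which key goes first" (round-robin) from "what happens on failure" (fallback) is what guarantees every key is exercised while no single failure blocks a response.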
Edit `config.py` to modify:

- Assistant personality and tone (the assistant name is set via `ASSISTANT_NAME` in `.env`)
- Response length guidelines
- Formatting rules
- General behavior guidelines
Add any `.txt` files to `database/learning_data/`:

- Files are automatically detected and loaded
- Content is indexed in the vector store and retrieved when relevant (not dumped into every prompt)
- Files are loaded in alphabetical order
- No need to modify code when adding new files
Example files:

- `userdata.txt` - Personal information
- `system_context.txt` - System context
- `usersinterest.txt` - User interests
- Any other `.txt` file you add
- FastAPI: Modern Python web framework
- LangChain: LLM application framework
- Groq AI: Fast LLM inference (Llama 3.3 70B)
- Tavily: AI-optimized web search API
- FAISS: Vector similarity search
- HuggingFace: Embeddings model (sentence-transformers)
- Pydantic: Data validation
- Uvicorn: ASGI server
- JSON Files: Chat session storage
- FAISS Index: Vector embeddings storage
- Text Files: User learning data
- Indexing: All `.txt` files in `database/learning_data/` are indexed in the vector store (together with past chats). The AI retrieves only relevant chunks per question, so token usage stays bounded and you can add many files without hitting limits.
- Restart to pick up new files: New or changed `.txt` files in `learning_data/` are loaded when you restart the server (the vector store is rebuilt on startup).
- No full dump: The system does not send all learning data in every prompt; it uses semantic search to pull only what's relevant, so you never hit the token limit.
The `escape_curly_braces()` function:

- Prevents LangChain from interpreting `{variable}` as a template variable
- Escapes braces by doubling them: `{` → `{{`, `}` → `}}`
- Applied to all context before adding it to the system prompt

Why this matters: Prevents template variable errors when retrieved content contains curly braces.
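Given the behavior described above, the function amounts to doubling every brace. A minimal sketch (the project's actual implementation may differ in detail):

```python
def escape_curly_braces(text: str) -> str:
    """Double every brace so LangChain prompt templates treat it as a literal.

    LangChain's f-string-style templates interpret {name} as a variable;
    {{ and }} render as literal { and }."""
    return text.replace("{", "{{").replace("}", "}}")
```

This matters whenever retrieved context contains JSON or code snippets, which are full of braces.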
The vector store:
- Converts text to numerical embeddings
- Stores embeddings in FAISS index
- Enables fast similarity search
- Rebuilt on every startup (always current)
Why this matters: Allows JARVIS to find relevant information from past conversations and learning data.
Sessions:
- Stored in memory during active use
- Saved to disk after each message
- Loaded from disk on server restart
- Shared between general and realtime modes
Why this matters: Conversations continue seamlessly across server restarts.
- Check that `GROQ_API_KEY` is set in `.env`
- Ensure all dependencies are installed: `pip install -r requirements.txt`
- Check that port 8000 is not in use
- Make sure the server is running: `python run.py`
- Check that the server is on http://localhost:8000
- Verify no firewall is blocking the connection
- Ensure the `database/` directories exist
- Check file permissions on the `database/` directory
- Delete `database/vector_store/` to rebuild the index
- Should be fixed by curly brace escaping
- Check for any unescaped `{` or `}` in learning data files
- Restart the server after fixing
- Check that `TAVILY_API_KEY` is set in `.env`
- Verify the Tavily API key is valid
- Check internet connection
- Session IDs are validated to prevent path traversal (checks for both `/` and `\`)
- API keys are stored in `.env` (not in code)
.env(not in code) - CORS enabled for all origins (adjust for production)
- No authentication (add for production use)
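Since session IDs become filenames under `database/chats_data/`, the path-traversal check above can be sketched like this (a hypothetical `is_safe_session_id` helper; the project's actual validation may be stricter or named differently):

```python
def is_safe_session_id(session_id: str) -> bool:
    """Reject session IDs that could escape the chats directory.

    Checks both separator styles (/ on POSIX, \\ on Windows), parent-directory
    references, and empty IDs before using the ID as a filename."""
    if not session_id:
        return False
    if "/" in session_id or "\\" in session_id or ".." in session_id:
        return False
    return True
```

Without this check, a crafted ID like `../../etc/cron.d/job` could read or write files outside `database/chats_data/`.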
This code is fully cross-platform and works on:
- ✅ Windows (Windows 10/11)
- ✅ macOS (all versions)
- ✅ Linux (all distributions)
Why it's cross-platform:
- Uses `pathlib.Path` for all file paths (handles `/` vs `\` automatically)
- Explicit UTF-8 encoding for all file operations
- No hardcoded path separators
- No shell commands or platform-specific code
- Standard Python libraries only
- Session ID validation checks both `/` and `\` for security
Tested on:
- macOS (Darwin)
- Windows (should work - uses standard Python practices)
- Linux (should work - uses standard Python practices)
```bash
python run.py
```

Auto-reload is enabled, so code changes restart the server automatically.
```bash
# CLI test interface
python test.py

# Or use curl
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello"}'
```

- Separation of Concerns: Each service handles one responsibility
- Configuration Centralization: All settings in `config.py`
- Type Safety: Pydantic models for validation
- Documentation: Comprehensive docstrings in all modules
J.A.R.V.I.S was developed by Shreshth Kaushik, an online educator, businessman, and programmer known for simplifying complex topics with innovative teaching methods.
- Website: theshreshthkaushik.com
- Instagram: @theshreshthkaushik
- Telegram: t.me/theshreshthkaushik
- YouTube: Shreshth Kaushik
- Jarvis for Everyone: jarvis4everyone.com
Latest version of Jarvis: For the latest version of Jarvis and updates, visit Jarvis for Everyone.
MIT
Made with ❤️ for intelligent conversations
Start chatting: `python run.py` then `python test.py`