Gemini Chat Agent

Stable Release
TypeScript · Cloudflare Workers · Google Gemini · WebRTC

Overview

An advanced AI chat agent powered by Google Gemini 2.5 Pro with sophisticated capabilities including deep thinking, code execution, real-time web search, voice interaction, and text-to-speech. Built on Cloudflare Workers for global edge deployment.

The agent features a transparent reasoning process with collapsible thinking sections, allowing users to understand the AI's thought process. It supports multi-modal interactions through text, voice, and audio, with real-time streaming responses using Server-Sent Events.

Designed with a serverless architecture, the agent automatically scales to handle varying loads while maintaining low latency through Cloudflare's global edge network. Chat history is persisted using Cloudflare KV storage for seamless conversation continuity.

Key Features

Advanced AI Capabilities

  • Deep Thinking: Transparent reasoning process with collapsible thinking sections
  • Code Execution: Real-time Python code execution with live results
  • Web Search: Real-time Google Search integration for current information
  • Multi-modal Support: Text, voice, and audio interactions

Voice Integration

  • Voice Recording: Hold-to-record voice messages with automatic transcription
  • Text-to-Speech: High-quality speech synthesis with multiple voice options
  • Audio Processing: Advanced audio handling with Web Audio API
  • Voice Selection: Multiple voice options for a personalized experience

Chat Experience

  • Real-time Streaming: Live response streaming with Server-Sent Events
  • Chat History: Persistent conversation history with Cloudflare KV storage
  • Markdown Support: Rich text formatting with syntax highlighting
  • Session Management: Automatic session handling and restoration

Performance & Deployment

  • Edge Computing: Deployed on Cloudflare Workers for global low latency
  • Scalable Architecture: Serverless design with automatic scaling
  • Static Assets: Optimized asset delivery through Cloudflare CDN
  • Global Distribution: Available worldwide with consistent performance

Architecture

System Design

The Gemini Chat Agent follows a modular architecture that separates concerns across the frontend interface, edge worker processing, and external service integrations:

Frontend Layer

  • Voice recording with Web Audio API
  • Real-time audio playback
  • Markdown rendering with syntax highlighting
  • Server-Sent Events for streaming responses

Edge Worker

  • Request processing and routing
  • Session management
  • API integration orchestration
  • Response streaming

Storage Layer

  • Cloudflare KV for chat history persistence
  • Session data storage
  • User preference management
  • Conversation context maintenance

AI Integration

  • Google Gemini 2.5 Pro API integration
  • Thinking engine for transparent reasoning
  • Code execution environment
  • Web search capabilities

Technical Implementation

The agent is built using TypeScript and modern web technologies, leveraging Cloudflare's edge computing platform for optimal performance and scalability.
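
A minimal sketch of the Worker entry point is shown below. The binding names match the wrangler.toml in the Configuration section; handleChat is a hypothetical helper standing in for the actual chat handler, and the ambient types (KVNamespace, Fetcher) come from @cloudflare/workers-types.

// Sketch of the Worker entry point (types from @cloudflare/workers-types).
export interface Env {
  GEMINI_API_KEY: string;    // secret set with `wrangler secret put`
  CHAT_HISTORY: KVNamespace; // KV binding from wrangler.toml
  ASSETS: Fetcher;           // static assets binding from wrangler.toml
}

// Hypothetical handler implemented elsewhere in the Worker.
declare function handleChat(request: Request, env: Env): Promise<Response>;

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    // Chat requests go to the API handler; everything else is a static asset.
    if (url.pathname === '/api/chat' && request.method === 'POST') {
      return handleChat(request, env);
    }
    return env.ASSETS.fetch(request);
  },
};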

Voice Processing

Voice recording requires a secure context (HTTPS or localhost). The implementation uses the Web Audio API for recording and processing, with automatic transcription through the Gemini API.
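
The repository's exact recording code is not reproduced here, but a minimal browser-side sketch might look like the following, using MediaRecorder for capture and the audioData/sessionId fields from the API reference below; the function name and the fixed stop timeout are illustrative.

// Capture microphone audio and send it to /api/chat as base64 (sketch only).
async function recordAndSend(sessionId?: string): Promise<void> {
  // getUserMedia requires a secure context (HTTPS or localhost).
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];

  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = async () => {
    const bytes = new Uint8Array(await new Blob(chunks).arrayBuffer());

    // Base64-encode the raw bytes for the JSON request body.
    let binary = '';
    for (const b of bytes) binary += String.fromCharCode(b);

    await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ audioData: btoa(binary), sessionId }),
    });
    stream.getTracks().forEach((t) => t.stop());
  };

  recorder.start();
  // A hold-to-record UI would call recorder.stop() on release; stop after 5s here.
  setTimeout(() => recorder.stop(), 5000);
}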

Streaming Architecture

Server-Sent Events enable real-time streaming of AI responses, providing immediate feedback as the AI processes and generates content. This includes separate streams for thinking, code execution, and final responses.
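
On the Worker side, such a stream can be produced with a ReadableStream whose chunks are data: lines. A sketch, assuming a hypothetical async iterable of the event objects listed in the API reference:

// Turn an async iterable of event objects into an SSE response (sketch).
function streamEvents(events: AsyncIterable<object>): Response {
  const encoder = new TextEncoder();
  const body = new ReadableStream({
    async start(controller) {
      for await (const event of events) {
        // Each SSE message is a "data:" line followed by a blank line.
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(event)}\n\n`));
      }
      controller.close();
    },
  });

  return new Response(body, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
    },
  });
}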

Session Management

Each chat session is assigned a unique ID, with conversation history stored in Cloudflare KV. The system maintains up to 20 messages per session for optimal context while managing storage efficiently.
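
A sketch of how that history could be read and trimmed in KV is shown below; the message shape and key layout are assumptions, while the 20-message cap mirrors the limit described above.

// Append a message to a session's history in KV, keeping only the last 20.
interface StoredMessage {
  role: 'user' | 'model';
  content: string;
}

const MAX_HISTORY = 20;

async function appendToHistory(
  kv: KVNamespace,
  sessionId: string,
  message: StoredMessage,
): Promise<StoredMessage[]> {
  const existing = (await kv.get<StoredMessage[]>(sessionId, 'json')) ?? [];
  const history = [...existing, message].slice(-MAX_HISTORY);
  await kv.put(sessionId, JSON.stringify(history));
  return history;
}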

Code Execution

The agent can execute Python code in real time, displaying both the code and its output. This feature is particularly useful for mathematical computations, algorithm demonstrations, and data processing tasks.
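
The execution itself is presumably delegated to the Gemini API's built-in code execution tool rather than run inside the Worker. A rough sketch of a request with that tool enabled is below; the endpoint, model ID, and field names follow the public Gemini REST API but should be verified against current documentation.

// Sketch: ask Gemini to solve a task with its built-in code execution tool.
async function askGeminiWithCode(apiKey: string, prompt: string): Promise<unknown> {
  const url =
    'https://generativelanguage.googleapis.com/v1beta/models/' +
    `gemini-2.5-pro:generateContent?key=${apiKey}`;

  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      contents: [{ role: 'user', parts: [{ text: prompt }] }],
      tools: [{ code_execution: {} }], // lets the model write and run Python
    }),
  });
  return response.json();
}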

API Reference

POST /api/chat

Send a chat message or audio data to the AI agent.

Request Body

{
  "message": "Your text message",
  "audioData": "base64_encoded_audio_data",
  "sessionId": "optional_session_id",
  "tts": true,
  "voice": "Kore"
}
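
In TypeScript, this body could be typed roughly as below; a request carries either a text message or base64 audio, and the remaining fields appear to be optional (the field names match the example above).

// Rough shape of the POST /api/chat request body (illustrative only).
interface ChatRequest {
  message?: string;   // text prompt
  audioData?: string; // base64-encoded voice message
  sessionId?: string; // reuse an existing session
  tts?: boolean;      // request a spoken (text-to-speech) reply
  voice?: string;     // e.g. "Kore"
}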

Response Events (Server-Sent Events)

// Thinking process
{"type": "thinking", "content": "AI reasoning..."}

// Code execution
{"type": "code", "content": "print('hello')", "language": "python"}
{"type": "codeResult", "content": "hello"}

// Web search
{"type": "search", "content": "Searched: current weather"}

// Text response
{"type": "text", "content": "Response text..."}

// Audio response
{"type": "audio", "audioData": "base64_audio_data"}

// Completion
{"type": "complete"}

Usage Examples

Basic Text Chat

fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    message: "Explain quantum computing",
    tts: true
  })
});
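
The snippet above only issues the request; to consume the streamed reply, the response body can be read incrementally and each data: line parsed as JSON. A minimal sketch, with event handling reduced to logging:

// Send a message and log streamed text events until completion (sketch).
async function streamChat(message: string): Promise<void> {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE messages are separated by blank lines; keep any partial message.
    const messages = buffer.split('\n\n');
    buffer = messages.pop() ?? '';

    for (const raw of messages) {
      const line = raw.split('\n').find((l) => l.startsWith('data: '));
      if (!line) continue;
      const event = JSON.parse(line.slice('data: '.length));
      if (event.type === 'text') console.log(event.content);
      if (event.type === 'complete') return;
    }
  }
}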

Voice Message

// Send audio data
fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    audioData: base64AudioData,
    sessionId: "user_session_123"
  })
});

Mathematical Algorithm Implementation

Ask the AI: "Please use Python to determine the 300th prime number using the segmented sieve algorithm"

The AI will demonstrate advanced algorithmic problem-solving:

  • Deep Thinking Process: Show transparent reasoning through multiple thinking iterations
  • Algorithm Explanation: Explain the segmented sieve approach and its advantages
  • Code Implementation: Generate optimized Python code with detailed comments
  • Real-time Execution: Execute the algorithm and display results
  • Verification: Test with known smaller primes to validate correctness

Expected Output

The 300th prime number is: 1987
The 1st prime number is: 2
The 6th prime number is: 13
The 10th prime number is: 29
The 16th prime number is: 53
The 100th prime number is: 541

Deployment Guide

Prerequisites

  • Node.js 18+ and npm
  • Cloudflare account
  • Google AI Studio API key
  • Wrangler CLI installed globally

Setup Steps

# Clone the repository
git clone https://github.com/objones25/gemini-chat-agent.git
cd gemini-chat-agent

# Install dependencies
npm install

# Create environment file
echo "GEMINI_API_KEY=your_gemini_api_key_here" > .dev.vars

# Create KV namespaces
wrangler kv:namespace create "CHAT_HISTORY"
wrangler kv:namespace create "CHAT_HISTORY" --preview

# Update wrangler.toml with namespace IDs
# Then run locally
npm run dev

Production Deployment

# Set production secrets
wrangler secret put GEMINI_API_KEY

# Deploy to Cloudflare Workers
npm run deploy

# Your app will be available at:
# https://gemini-agent.your-subdomain.workers.dev

Configuration

Wrangler Configuration

name = "gemini-agent"
main = "src/worker.ts"
compatibility_date = "2024-05-28"

[assets]
directory = "./public"
binding = "ASSETS"

[[kv_namespaces]]
binding = "CHAT_HISTORY"
id = "your_kv_namespace_id"
preview_id = "your_preview_kv_namespace_id"

Performance Optimization

  • Chat History: Limited to 20 messages per session
  • Audio Files: Maximum 20MB for voice messages
  • TTS: Text limited to 800 characters for optimal performance
  • Session Timeout: Automatic cleanup of inactive sessions
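
If centralized in the Worker, these limits might be expressed as constants along the following lines (names are illustrative; the values come from the list above):

// Illustrative constants for the limits listed above.
const MAX_HISTORY_MESSAGES = 20;          // messages kept per session
const MAX_AUDIO_BYTES = 20 * 1024 * 1024; // 20 MB cap on voice messages
const MAX_TTS_CHARS = 800;                // text length limit for TTS

// Truncate overly long text before requesting speech synthesis.
function truncateForTts(text: string): string {
  return text.length > MAX_TTS_CHARS ? text.slice(0, MAX_TTS_CHARS) : text;
}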