Gemini Chat Agent

Stable Release
TypeScript · Cloudflare Workers · Google Gemini · WebRTC

Overview

An advanced AI chat agent powered by Google Gemini 2.5 Pro with sophisticated capabilities including deep thinking, code execution, real-time web search, voice interaction, and text-to-speech. Built on Cloudflare Workers for global edge deployment.

The agent features a transparent reasoning process with collapsible thinking sections, allowing users to understand the AI's thought process. It supports multi-modal interactions through text, voice, and audio, with real-time streaming responses using Server-Sent Events.

Designed with a serverless architecture, the agent automatically scales to handle varying loads while maintaining low latency through Cloudflare's global edge network. Chat history is persisted using Cloudflare KV storage for seamless conversation continuity.

Key Features

Advanced AI Capabilities

  • Deep Thinking: Transparent reasoning process with collapsible thinking sections
  • Code Execution: Real-time Python code execution with live results
  • Web Search: Real-time Google Search integration for current information
  • Multi-modal Support: Text, voice, and audio interactions

Voice Integration

  • Voice Recording: Hold-to-record voice messages with automatic transcription
  • Text-to-Speech: High-quality speech synthesis with multiple voice options
  • Audio Processing: Advanced audio handling with Web Audio API
  • Voice Selection: Multiple voice options for a personalized experience

Chat Experience

  • Real-time Streaming: Live response streaming with Server-Sent Events
  • Chat History: Persistent conversation history with Cloudflare KV storage
  • Markdown Support: Rich text formatting with syntax highlighting
  • Session Management: Automatic session handling and restoration

Performance & Deployment

  • Edge Computing: Deployed on Cloudflare Workers for global low latency
  • Scalable Architecture: Serverless design with automatic scaling
  • Static Assets: Optimized asset delivery through Cloudflare CDN
  • Global Distribution: Available worldwide with consistent performance

Architecture

System Design

The Gemini Chat Agent follows a modular architecture that separates concerns across the frontend interface, edge worker processing, and external service integrations:

Frontend Layer

  • Voice recording with Web Audio API
  • Real-time audio playback
  • Markdown rendering with syntax highlighting
  • Server-Sent Events for streaming responses

Edge Worker

  • Request processing and routing
  • Session management
  • API integration orchestration
  • Response streaming

Storage Layer

  • Cloudflare KV for chat history persistence
  • Session data storage
  • User preference management
  • Conversation context maintenance

AI Integration

  • Google Gemini 2.5 Pro API integration
  • Thinking engine for transparent reasoning
  • Code execution environment
  • Web search capabilities

Technical Implementation

The agent is built using TypeScript and modern web technologies, leveraging Cloudflare's edge computing platform for optimal performance and scalability.
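
A minimal sketch of the Worker entry point is shown below. The binding names match the wrangler.toml in the Configuration section; handleChat is a hypothetical helper standing in for the actual chat handler, and the ambient types (KVNamespace, Fetcher) come from @cloudflare/workers-types.

// Sketch of the Worker entry point (types from @cloudflare/workers-types).
export interface Env {
  GEMINI_API_KEY: string;    // secret set with `wrangler secret put`
  CHAT_HISTORY: KVNamespace; // KV binding from wrangler.toml
  ASSETS: Fetcher;           // static assets binding from wrangler.toml
}

// Hypothetical handler implemented elsewhere in the Worker.
declare function handleChat(request: Request, env: Env): Promise<Response>;

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    // Chat requests go to the API handler; everything else is a static asset.
    if (url.pathname === '/api/chat' && request.method === 'POST') {
      return handleChat(request, env);
    }
    return env.ASSETS.fetch(request);
  },
};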

Voice Processing

Voice recording requires a secure context (HTTPS or localhost). The implementation uses the Web Audio API for recording and processing, with automatic transcription through the Gemini API.
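
The repository's exact recording code is not reproduced here, but a minimal browser-side sketch might look like the following, using MediaRecorder for capture and the audioData/sessionId fields from the API reference below; the function name and the fixed stop timeout are illustrative.

// Capture microphone audio and send it to /api/chat as base64 (sketch only).
async function recordAndSend(sessionId?: string): Promise<void> {
  // getUserMedia requires a secure context (HTTPS or localhost).
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];

  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = async () => {
    const bytes = new Uint8Array(await new Blob(chunks).arrayBuffer());

    // Base64-encode the raw bytes for the JSON request body.
    let binary = '';
    for (const b of bytes) binary += String.fromCharCode(b);

    await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ audioData: btoa(binary), sessionId }),
    });
    stream.getTracks().forEach((t) => t.stop());
  };

  recorder.start();
  // A hold-to-record UI would call recorder.stop() on release; stop after 5s here.
  setTimeout(() => recorder.stop(), 5000);
}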

Streaming Architecture

Server-Sent Events enable real-time streaming of AI responses, providing immediate feedback as the AI processes and generates content. This includes separate streams for thinking, code execution, and final responses.
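
On the Worker side, such a stream can be produced with a ReadableStream whose chunks are data: lines. A sketch, assuming a hypothetical async iterable of the event objects listed in the API reference:

// Turn an async iterable of event objects into an SSE response (sketch).
function streamEvents(events: AsyncIterable<object>): Response {
  const encoder = new TextEncoder();
  const body = new ReadableStream({
    async start(controller) {
      for await (const event of events) {
        // Each SSE message is a "data:" line followed by a blank line.
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(event)}\n\n`));
      }
      controller.close();
    },
  });

  return new Response(body, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
    },
  });
}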

Session Management

Each chat session is assigned a unique ID, with conversation history stored in Cloudflare KV. The system maintains up to 20 messages per session for optimal context while managing storage efficiently.
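
A sketch of how that history could be read and trimmed in KV is shown below; the message shape and key layout are assumptions, while the 20-message cap mirrors the limit described above.

// Append a message to a session's history in KV, keeping only the last 20.
interface StoredMessage {
  role: 'user' | 'model';
  content: string;
}

const MAX_HISTORY = 20;

async function appendToHistory(
  kv: KVNamespace,
  sessionId: string,
  message: StoredMessage,
): Promise<StoredMessage[]> {
  const existing = (await kv.get<StoredMessage[]>(sessionId, 'json')) ?? [];
  const history = [...existing, message].slice(-MAX_HISTORY);
  await kv.put(sessionId, JSON.stringify(history));
  return history;
}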

Code Execution

The agent can execute Python code in real time, displaying both the code and its output. This feature is particularly useful for mathematical computations, algorithm demonstrations, and data processing tasks.
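
The execution itself is presumably delegated to the Gemini API's built-in code execution tool rather than run inside the Worker. A rough sketch of a request with that tool enabled is below; the endpoint, model ID, and field names follow the public Gemini REST API but should be verified against current documentation.

// Sketch: ask Gemini to solve a task with its built-in code execution tool.
async function askGeminiWithCode(apiKey: string, prompt: string): Promise<unknown> {
  const url =
    'https://generativelanguage.googleapis.com/v1beta/models/' +
    `gemini-2.5-pro:generateContent?key=${apiKey}`;

  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      contents: [{ role: 'user', parts: [{ text: prompt }] }],
      tools: [{ code_execution: {} }], // lets the model write and run Python
    }),
  });
  return response.json();
}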

API Reference

POST /api/chat

Send a chat message or audio data to the AI agent.

Request Body

{
  "message": "Your text message",
  "audioData": "base64_encoded_audio_data",
  "sessionId": "optional_session_id",
  "tts": true,
  "voice": "Kore"
}
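
In TypeScript, this body could be typed roughly as below; a request carries either a text message or base64 audio, and the remaining fields appear to be optional (the field names match the example above).

// Rough shape of the POST /api/chat request body (illustrative only).
interface ChatRequest {
  message?: string;   // text prompt
  audioData?: string; // base64-encoded voice message
  sessionId?: string; // reuse an existing session
  tts?: boolean;      // request a spoken (text-to-speech) reply
  voice?: string;     // e.g. "Kore"
}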

Response Events (Server-Sent Events)

// Thinking process
{"type": "thinking", "content": "AI reasoning..."}

// Code execution
{"type": "code", "content": "print('hello')", "language": "python"}
{"type": "codeResult", "content": "hello"}

// Web search
{"type": "search", "content": "Searched: current weather"}

// Text response
{"type": "text", "content": "Response text..."}

// Audio response
{"type": "audio", "audioData": "base64_audio_data"}

// Completion
{"type": "complete"}

Usage Examples

Basic Text Chat

fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    message: "Explain quantum computing",
    tts: true
  })
});
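
The snippet above only issues the request; to consume the streamed reply, the response body can be read incrementally and each data: line parsed as JSON. A minimal sketch, with event handling reduced to logging:

// Send a message and log streamed text events until completion (sketch).
async function streamChat(message: string): Promise<void> {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // SSE messages are separated by blank lines; keep any partial message.
    const messages = buffer.split('\n\n');
    buffer = messages.pop() ?? '';

    for (const raw of messages) {
      const line = raw.split('\n').find((l) => l.startsWith('data: '));
      if (!line) continue;
      const event = JSON.parse(line.slice('data: '.length));
      if (event.type === 'text') console.log(event.content);
      if (event.type === 'complete') return;
    }
  }
}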

Voice Message

// Send audio data
fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    audioData: base64AudioData,
    sessionId: "user_session_123"
  })
});

Mathematical Algorithm Implementation

Ask the AI: "Please use Python to determine the 300th prime number using the segmented sieve algorithm"

The AI will demonstrate advanced algorithmic problem-solving:

  • Deep Thinking Process: Show transparent reasoning through multiple thinking iterations
  • Algorithm Explanation: Explain the segmented sieve approach and its advantages
  • Code Implementation: Generate optimized Python code with detailed comments
  • Real-time Execution: Execute the algorithm and display results
  • Verification: Test with known smaller primes to validate correctness

Expected Output

The 300th prime number is: 1987
The 1st prime number is: 2
The 6th prime number is: 13
The 10th prime number is: 29
The 16th prime number is: 53
The 100th prime number is: 541

Deployment Guide

Prerequisites

  • Node.js 18+ and npm
  • Cloudflare account
  • Google AI Studio API key
  • Wrangler CLI installed globally

Setup Steps

# Clone the repository
git clone https://github.com/objones25/gemini-chat-agent.git
cd gemini-chat-agent

# Install dependencies
npm install

# Create environment file
echo "GEMINI_API_KEY=your_gemini_api_key_here" > .dev.vars

# Create KV namespaces
wrangler kv:namespace create "CHAT_HISTORY"
wrangler kv:namespace create "CHAT_HISTORY" --preview

# Update wrangler.toml with namespace IDs
# Then run locally
npm run dev

Production Deployment

# Set production secrets
wrangler secret put GEMINI_API_KEY

# Deploy to Cloudflare Workers
npm run deploy

# Your app will be available at:
# https://gemini-agent.your-subdomain.workers.dev

Configuration

Wrangler Configuration

name = "gemini-agent"
main = "src/worker.ts"
compatibility_date = "2024-05-28"

[assets]
directory = "./public"
binding = "ASSETS"

[[kv_namespaces]]
binding = "CHAT_HISTORY"
id = "your_kv_namespace_id"
preview_id = "your_preview_kv_namespace_id"

Performance Optimization

  • Chat History: Limited to 20 messages per session
  • Audio Files: Maximum 20MB for voice messages
  • TTS: Text limited to 800 characters for optimal performance
  • Session Timeout: Automatic cleanup of inactive sessions
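
If centralized in the Worker, these limits might be expressed as constants along the following lines (names are illustrative; the values come from the list above):

// Illustrative constants for the limits listed above.
const MAX_HISTORY_MESSAGES = 20;          // messages kept per session
const MAX_AUDIO_BYTES = 20 * 1024 * 1024; // 20 MB cap on voice messages
const MAX_TTS_CHARS = 800;                // text length limit for TTS

// Truncate overly long text before requesting speech synthesis.
function truncateForTts(text: string): string {
  return text.length > MAX_TTS_CHARS ? text.slice(0, MAX_TTS_CHARS) : text;
}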