Go Embeddings Library

Overview

A high-performance, production-ready embedding service written in Go, supporting multiple transformer models with macOS Metal acceleration. The library combines efficient memory management, optimized batch processing, and hardware acceleration to deliver fast, reliable text embeddings in production environments.

Key Features

  • Production-ready transformer-based text embeddings with comprehensive error handling and logging
  • macOS Metal/CoreML hardware acceleration with Apple Neural Engine (ANE) support, delivering optimized performance on Apple Silicon
  • Multi-model support with dynamic loading and efficient model management
  • Optimized batch processing with auto-tuning based on input characteristics
  • Two-level caching system (memory + disk) with LRU eviction for improved performance
  • Prometheus metrics integration for comprehensive monitoring
  • Support for both synchronous and asynchronous operations with worker pools
  • Intelligent chunking support for processing long documents
  • Thread-safe implementation with graceful shutdown capabilities
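
The two-level cache mentioned above keeps hot embeddings in memory and evicts the least-recently-used entry once capacity is reached. As an illustration of the in-memory half of that design (the type and function names below are a sketch, not the library's actual API), an LRU cache in Go can be built from `container/list` plus a map:

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache is a minimal sketch of an in-memory LRU layer:
// most-recently-used entries sit at the front of a doubly linked
// list, and the entry at the back is evicted when capacity is hit.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> list node
}

type entry struct {
	key   string
	value []float32
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

func (c *lruCache) Get(key string) ([]float32, bool) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el) // touching an entry keeps it alive
		return el.Value.(*entry).value, true
	}
	return nil, false
}

func (c *lruCache) Put(key string, value []float32) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		el.Value.(*entry).value = value
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
}

func main() {
	cache := newLRUCache(2)
	cache.Put("a", []float32{1})
	cache.Put("b", []float32{2})
	cache.Get("a")               // touch "a" so "b" becomes oldest
	cache.Put("c", []float32{3}) // evicts "b"
	_, ok := cache.Get("b")
	fmt.Println(ok) // false: "b" was evicted
}
```

The real service adds a disk layer beneath this and thread-safety around it; the eviction logic is the same idea.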

Example Usage

Basic Implementation

The library can be used both as a standalone service and integrated into existing Go applications. Here's a basic example:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/objones25/go-embeddings/pkg/embedding"
)

func main() {
    // Initialize configuration
    config := &embedding.Config{
        ModelPath:         "./models/all-MiniLM-L6-v2",
        MaxSequenceLength: 512,
        Dimension:        384,  // 384 for MiniLM-L6
        BatchSize:        32,
        EnableMetal:      true,  // Enable CoreML
        CoreMLConfig: &embedding.CoreMLConfig{
            EnableCaching: true,
            RequireANE:    false,
        },
        Options: embedding.Options{
            CacheEnabled:   true,
            Normalize:      true,
            PadToMaxLength: false,
        },
    }

    // Create service with timeout
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    service, err := embedding.NewService(ctx, config)
    if err != nil {
        log.Fatalf("Failed to create service: %v", err)
    }
    defer service.Close()

    // Generate single embedding
    text := "Hello, world!"
    vector, err := service.Embed(ctx, text)
    if err != nil {
        log.Fatalf("Failed to generate embedding: %v", err)
    }
    fmt.Printf("Single embedding (len=%d): %v\n", len(vector), vector[:5])
}
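
Once you have embeddings, a common next step is comparing them. Since the config above sets Normalize: true, vectors come back unit-length and cosine similarity reduces to a plain dot product; the helper below computes the general form and works either way (it is a standalone sketch, not part of the library's API):

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between two
// vectors: 1.0 for identical directions, 0.0 for orthogonal ones.
// For unit-length (normalized) vectors this equals the dot product.
func cosineSimilarity(a, b []float32) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	a := []float32{0.6, 0.8}
	b := []float32{0.8, 0.6}
	fmt.Printf("%.2f\n", cosineSimilarity(a, b)) // 0.96
}
```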

Asynchronous Batch Processing

For handling large volumes of text, the library provides asynchronous batch processing capabilities:

// Texts to process
texts := []string{
    "First text to embed",
    "Second text to embed",
    "Third text to embed",
}

// Buffered channels so workers never block if the consumer exits early
results := make(chan embedding.Result, len(texts))
errors := make(chan error, len(texts))

err := service.BatchEmbedAsync(ctx, texts, results, errors)
if err != nil {
    log.Fatal(err)
}

// Process results as they arrive
for i := 0; i < len(texts); i++ {
    select {
    case result := <-results:
        fmt.Printf("Embedding: %v\n", result.Embedding[:5])
    case err := <-errors:
        fmt.Printf("Error: %v\n", err)
    case <-ctx.Done():
        fmt.Println("Operation timed out")
        return
    }
}
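
Under the hood, asynchronous batch processing is a worker-pool pattern: a fixed number of goroutines pull texts from a jobs channel and send results back. The sketch below shows that pattern in isolation with a stand-in embed function; it illustrates the mechanism, not the library's internal implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// processBatch fans texts out to a fixed pool of workers and returns
// a channel of results. Results arrive in completion order, not input
// order. The embed parameter stands in for the real model call.
func processBatch(texts []string, workers int, embed func(string) []float32) <-chan []float32 {
	jobs := make(chan string)
	results := make(chan []float32, len(texts)) // buffered: workers never block

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for text := range jobs {
				results <- embed(text)
			}
		}()
	}

	go func() {
		for _, t := range texts {
			jobs <- t
		}
		close(jobs)    // no more work; workers drain and exit
		wg.Wait()      // wait for all workers to finish
		close(results) // signal the consumer we are done
	}()
	return results
}

func main() {
	fakeEmbed := func(s string) []float32 { return []float32{float32(len(s))} }
	for vec := range processBatch([]string{"a", "bb", "ccc"}, 2, fakeEmbed) {
		fmt.Println(vec)
	}
}
```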

Document Chunking

For processing longer documents, the library includes intelligent chunking capabilities:

// Initialize tokenizer
tokConfig := embedding.TokenizerConfig{
    ModelID:        "all-MiniLM-L6-v2",
    SequenceLength: 512,
}
tokenizer, err := embedding.NewTokenizer(tokConfig)
if err != nil {
    log.Fatal(err)
}

// Configure chunking
opts := embedding.DefaultChunkingOptions()
opts.Strategy = embedding.ChunkByParagraph
opts.MaxTokens = 256

// Process long document
longText := `First paragraph with content.

Second paragraph with different content.
This is still part of the second paragraph.

Third paragraph here.`

chunks, err := tokenizer.ChunkDocument(longText, opts)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Document split into %d chunks\n", len(chunks))
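
The core of a ChunkByParagraph strategy is splitting on blank lines before any token budget is applied. The stand-in below shows that first step in plain Go (the real tokenizer additionally counts tokens and splits oversized paragraphs to respect MaxTokens):

```go
package main

import (
	"fmt"
	"strings"
)

// splitParagraphs is an illustrative stand-in for paragraph-based
// chunking: split the document on blank lines and drop empty pieces.
func splitParagraphs(doc string) []string {
	var chunks []string
	for _, p := range strings.Split(doc, "\n\n") {
		if p = strings.TrimSpace(p); p != "" {
			chunks = append(chunks, p)
		}
	}
	return chunks
}

func main() {
	doc := "First paragraph.\n\nSecond paragraph,\nstill second.\n\nThird."
	for i, c := range splitParagraphs(doc) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```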

Technical Capabilities

  • Vector Operations: SIMD-optimized vector similarity computations with multiple distance metrics
  • Index Management: efficient vector storage with dynamic index updates and persistence
  • Search Capabilities: configurable KNN search with filtering and metadata support
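
To make the KNN capability concrete, here is a brute-force sketch of nearest-neighbor search over an in-memory index: score every stored vector against the query and keep the top k. The types and functions are illustrative only; a production index avoids the full scan, but the scoring logic is the same. The dot product assumes normalized vectors, so higher scores mean more similar:

```go
package main

import (
	"fmt"
	"sort"
)

// item is an indexed vector with an ID; Score is filled in per query.
type item struct {
	ID    string
	Vec   []float32
	Score float64
}

// dot computes the dot product; for unit-length vectors this is
// cosine similarity.
func dot(a, b []float32) float64 {
	var s float64
	for i := range a {
		s += float64(a[i]) * float64(b[i])
	}
	return s
}

// knn scores every indexed item against the query and returns the
// k highest-scoring matches, best first.
func knn(query []float32, index []item, k int) []item {
	scored := make([]item, len(index))
	copy(scored, index)
	for i := range scored {
		scored[i].Score = dot(query, scored[i].Vec)
	}
	sort.Slice(scored, func(i, j int) bool { return scored[i].Score > scored[j].Score })
	if k > len(scored) {
		k = len(scored)
	}
	return scored[:k]
}

func main() {
	index := []item{
		{ID: "x", Vec: []float32{1, 0}},
		{ID: "y", Vec: []float32{0, 1}},
		{ID: "z", Vec: []float32{0.7, 0.7}},
	}
	for _, m := range knn([]float32{1, 0}, index, 2) {
		fmt.Printf("%s %.2f\n", m.ID, m.Score)
	}
	// x 1.00
	// z 0.70
}
```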

Development Status

  • Optimizing CoreML integration for improved performance and expanding model support
  • Implementing advanced caching strategies for better resource utilization
  • Developing comprehensive documentation and usage examples
  • Fine-tuning batch processing performance for various input scenarios