Go Embeddings Library

Overview

A high-performance, production-ready embedding service written in Go, supporting multiple transformer models with macOS Metal acceleration. The library combines efficient memory management, optimized batch processing, and hardware acceleration to deliver fast, reliable text embeddings in production environments.

Key Features

  • Production-ready transformer-based text embeddings with comprehensive error handling and logging
  • macOS Metal/CoreML hardware acceleration with Apple Neural Engine (ANE) support, delivering optimized performance on Apple Silicon
  • Multi-model support with dynamic loading and efficient model management
  • Optimized batch processing with auto-tuning based on input characteristics
  • Two-level caching system (memory + disk) with LRU eviction for improved performance
  • Prometheus metrics integration for comprehensive monitoring
  • Support for both synchronous and asynchronous operations with worker pools
  • Intelligent chunking support for processing long documents
  • Thread-safe implementation with graceful shutdown capabilities
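
The two-level cache mentioned above keeps hot embeddings in memory and evicts the least-recently-used entry once capacity is reached. As an illustration of the in-memory half of that design (the type and function names below are a sketch, not the library's actual API), an LRU cache in Go can be built from `container/list` plus a map:

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache is a minimal sketch of an in-memory LRU layer:
// most-recently-used entries sit at the front of a doubly linked
// list, and the entry at the back is evicted when capacity is hit.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> list node
}

type entry struct {
	key   string
	value []float32
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

func (c *lruCache) Get(key string) ([]float32, bool) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el) // touching an entry keeps it alive
		return el.Value.(*entry).value, true
	}
	return nil, false
}

func (c *lruCache) Put(key string, value []float32) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		el.Value.(*entry).value = value
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
}

func main() {
	cache := newLRUCache(2)
	cache.Put("a", []float32{1})
	cache.Put("b", []float32{2})
	cache.Get("a")               // touch "a" so "b" becomes oldest
	cache.Put("c", []float32{3}) // evicts "b"
	_, ok := cache.Get("b")
	fmt.Println(ok) // false: "b" was evicted
}
```

The real service adds a disk layer beneath this and thread-safety around it; the eviction logic is the same idea.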

Example Usage

Basic Implementation

The library can be used both as a standalone service and integrated into existing Go applications. Here's a basic example:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/objones25/go-embeddings/pkg/embedding"
)

func main() {
    // Initialize configuration
    config := &embedding.Config{
        ModelPath:         "./models/all-MiniLM-L6-v2",
        MaxSequenceLength: 512,
        Dimension:        384,  // 384 for MiniLM-L6
        BatchSize:        32,
        EnableMetal:      true,  // Enable CoreML
        CoreMLConfig: &embedding.CoreMLConfig{
            EnableCaching: true,
            RequireANE:    false,
        },
        Options: embedding.Options{
            CacheEnabled:   true,
            Normalize:      true,
            PadToMaxLength: false,
        },
    }

    // Create service with timeout
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    service, err := embedding.NewService(ctx, config)
    if err != nil {
        log.Fatalf("Failed to create service: %v", err)
    }
    defer service.Close()

    // Generate single embedding
    text := "Hello, world!"
    vector, err := service.Embed(ctx, text)
    if err != nil {
        log.Fatalf("Failed to generate embedding: %v", err)
    }
    fmt.Printf("Single embedding (len=%d): %v\n", len(vector), vector[:5])
}
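
Once you have embeddings, a common next step is comparing them. Since the config above sets Normalize: true, vectors come back unit-length and cosine similarity reduces to a plain dot product; the helper below computes the general form and works either way (it is a standalone sketch, not part of the library's API):

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between two
// vectors: 1.0 for identical directions, 0.0 for orthogonal ones.
// For unit-length (normalized) vectors this equals the dot product.
func cosineSimilarity(a, b []float32) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	a := []float32{0.6, 0.8}
	b := []float32{0.8, 0.6}
	fmt.Printf("%.2f\n", cosineSimilarity(a, b)) // 0.96
}
```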

Asynchronous Batch Processing

For handling large volumes of text, the library provides asynchronous batch processing capabilities:

// Texts to process
texts := []string{
    "First text to embed",
    "Second text to embed",
    "Third text to embed",
}

// Buffered channels so workers never block if the consumer exits early
results := make(chan embedding.Result, len(texts))
errors := make(chan error, len(texts))

err := service.BatchEmbedAsync(ctx, texts, results, errors)
if err != nil {
    log.Fatal(err)
}

// Process results as they arrive
for i := 0; i < len(texts); i++ {
    select {
    case result := <-results:
        fmt.Printf("Embedding: %v\n", result.Embedding[:5])
    case err := <-errors:
        fmt.Printf("Error: %v\n", err)
    case <-ctx.Done():
        fmt.Println("Operation timed out")
        return
    }
}
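
Under the hood, asynchronous batch processing is a worker-pool pattern: a fixed number of goroutines pull texts from a jobs channel and send results back. The sketch below shows that pattern in isolation with a stand-in embed function; it illustrates the mechanism, not the library's internal implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// processBatch fans texts out to a fixed pool of workers and returns
// a channel of results. Results arrive in completion order, not input
// order. The embed parameter stands in for the real model call.
func processBatch(texts []string, workers int, embed func(string) []float32) <-chan []float32 {
	jobs := make(chan string)
	results := make(chan []float32, len(texts)) // buffered: workers never block

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for text := range jobs {
				results <- embed(text)
			}
		}()
	}

	go func() {
		for _, t := range texts {
			jobs <- t
		}
		close(jobs)    // no more work; workers drain and exit
		wg.Wait()      // wait for all workers to finish
		close(results) // signal the consumer we are done
	}()
	return results
}

func main() {
	fakeEmbed := func(s string) []float32 { return []float32{float32(len(s))} }
	for vec := range processBatch([]string{"a", "bb", "ccc"}, 2, fakeEmbed) {
		fmt.Println(vec)
	}
}
```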

Document Chunking

For processing longer documents, the library includes intelligent chunking capabilities:

// Initialize tokenizer
tokConfig := embedding.TokenizerConfig{
    ModelID:        "all-MiniLM-L6-v2",
    SequenceLength: 512,
}
tokenizer, err := embedding.NewTokenizer(tokConfig)
if err != nil {
    log.Fatal(err)
}

// Configure chunking
opts := embedding.DefaultChunkingOptions()
opts.Strategy = embedding.ChunkByParagraph
opts.MaxTokens = 256

// Process long document
longText := `First paragraph with content.

Second paragraph with different content.
This is still part of the second paragraph.

Third paragraph here.`

chunks, err := tokenizer.ChunkDocument(longText, opts)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Document split into %d chunks\n", len(chunks))
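
The core of a ChunkByParagraph strategy is splitting on blank lines before any token budget is applied. The stand-in below shows that first step in plain Go (the real tokenizer additionally counts tokens and splits oversized paragraphs to respect MaxTokens):

```go
package main

import (
	"fmt"
	"strings"
)

// splitParagraphs is an illustrative stand-in for paragraph-based
// chunking: split the document on blank lines and drop empty pieces.
func splitParagraphs(doc string) []string {
	var chunks []string
	for _, p := range strings.Split(doc, "\n\n") {
		if p = strings.TrimSpace(p); p != "" {
			chunks = append(chunks, p)
		}
	}
	return chunks
}

func main() {
	doc := "First paragraph.\n\nSecond paragraph,\nstill second.\n\nThird."
	for i, c := range splitParagraphs(doc) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```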

Technical Capabilities

  • Vector Operations: SIMD-optimized vector similarity computations with multiple distance metrics
  • Index Management: efficient vector storage with dynamic index updates and persistence
  • Search Capabilities: configurable KNN search with filtering and metadata support
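
To make the KNN capability concrete, here is a brute-force sketch of nearest-neighbor search over an in-memory index: score every stored vector against the query and keep the top k. The types and functions are illustrative only; a production index avoids the full scan, but the scoring logic is the same. The dot product assumes normalized vectors, so higher scores mean more similar:

```go
package main

import (
	"fmt"
	"sort"
)

// item is an indexed vector with an ID; Score is filled in per query.
type item struct {
	ID    string
	Vec   []float32
	Score float64
}

// dot computes the dot product; for unit-length vectors this is
// cosine similarity.
func dot(a, b []float32) float64 {
	var s float64
	for i := range a {
		s += float64(a[i]) * float64(b[i])
	}
	return s
}

// knn scores every indexed item against the query and returns the
// k highest-scoring matches, best first.
func knn(query []float32, index []item, k int) []item {
	scored := make([]item, len(index))
	copy(scored, index)
	for i := range scored {
		scored[i].Score = dot(query, scored[i].Vec)
	}
	sort.Slice(scored, func(i, j int) bool { return scored[i].Score > scored[j].Score })
	if k > len(scored) {
		k = len(scored)
	}
	return scored[:k]
}

func main() {
	index := []item{
		{ID: "x", Vec: []float32{1, 0}},
		{ID: "y", Vec: []float32{0, 1}},
		{ID: "z", Vec: []float32{0.7, 0.7}},
	}
	for _, m := range knn([]float32{1, 0}, index, 2) {
		fmt.Printf("%s %.2f\n", m.ID, m.Score)
	}
	// x 1.00
	// z 0.70
}
```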

Development Status

  • Optimizing CoreML integration for improved performance and expanding model support
  • Implementing advanced caching strategies for better resource utilization
  • Developing comprehensive documentation and usage examples
  • Fine-tuning batch processing performance for various input scenarios