IngestIQ

Architecture

System architecture and design overview

Overview

IngestIQ is built on Clean Architecture principles with an event-driven microservices design.

High-Level Architecture

[Diagram: high-level architecture]

Clean Architecture Layers

┌─────────────────────────────────────┐
│        Presentation Layer           │
│  (Controllers, Routes, Middleware)  │
├─────────────────────────────────────┤
│        Application Layer            │
│      (Use Cases, Services)          │
├─────────────────────────────────────┤
│          Domain Layer               │
│   (Entities, Interfaces, Errors)    │
├─────────────────────────────────────┤
│       Infrastructure Layer          │
│  (Repositories, External Services)  │
└─────────────────────────────────────┘

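Layer Responsibilities

| Layer          | Responsibility                                  |
|----------------|-------------------------------------------------|
| Presentation   | HTTP handling, validation, response formatting  |
| Application    | Business logic, orchestration, use cases        |
| Domain         | Core entities, business rules, interfaces       |
| Infrastructure | Database, external APIs, storage                |

Following the Clean Architecture dependency rule, dependencies point inward: the Domain layer depends on no other layer, and the Infrastructure layer implements the interfaces that the Domain layer defines.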
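The split between the Domain and Presentation layers is easy to see in error handling: domain code throws errors that carry no HTTP knowledge, and the presentation layer translates them into responses. This is a minimal sketch; the error classes, error codes, and response shape are hypothetical illustrations, not IngestIQ's actual types.

```typescript
// Hypothetical domain errors; the real classes under domain/errors/ may differ.
abstract class DomainError extends Error {
  abstract readonly code: string;
}

class NotFoundError extends DomainError {
  readonly code = "NOT_FOUND";
}

class ValidationError extends DomainError {
  readonly code = "VALIDATION_FAILED";
}

// Presentation-layer mapping: only this layer knows about HTTP status codes.
const STATUS_BY_CODE: { [code: string]: number | undefined } = {
  NOT_FOUND: 404,
  VALIDATION_FAILED: 400,
};

interface HttpErrorResponse {
  status: number;
  body: { error: { code: string; message: string } };
}

function toHttpResponse(err: unknown): HttpErrorResponse {
  if (err instanceof DomainError) {
    return {
      status: STATUS_BY_CODE[err.code] ?? 500,
      body: { error: { code: err.code, message: err.message } },
    };
  }
  // Unexpected errors are not leaked to clients.
  return { status: 500, body: { error: { code: "INTERNAL", message: "Internal server error" } } };
}
```

A controller would wrap a use-case call in `try/catch` and pass any caught error to `toHttpResponse`, so use cases never need to import anything HTTP-specific.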

Project Structure

src/
├── presentation/     # HTTP Layer
│   ├── controllers/  # Request handlers
│   ├── routes/       # Route definitions
│   ├── middleware/   # Auth, validation
│   └── schemas/      # Request validation
│
├── application/      # Business Logic
│   └── useCases/     # Feature implementations
│       ├── auth/
│       ├── document/
│       ├── knowledgebase/
│       └── pipeline/
│
├── domain/           # Core Business
│   ├── entities/     # Domain objects
│   ├── interfaces/   # Repository contracts
│   └── errors/       # Domain errors
│
├── infrastructure/   # External Integrations
│   ├── repository/   # Database implementations
│   └── services/     # External service adapters
│
├── common/           # Shared Utilities
│   ├── ai/           # AI provider factory
│   ├── db/           # Database setup
│   ├── embedding/    # Embedding providers
│   └── storage/      # S3 storage
│
├── mcp/              # MCP Server
├── nats-events/      # Event handlers
└── scheduler/        # Job scheduling
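The folders map onto the dependency rule: `domain/interfaces` declares contracts, `application/useCases` depends only on those contracts, and `infrastructure/repository` supplies concrete implementations. A minimal sketch, where `DocumentRecord`, `DocumentRepository`, `GetDocumentUseCase`, and `InMemoryDocumentRepository` are hypothetical names chosen for illustration:

```typescript
// domain/entities: a plain object with no framework dependencies (fields are hypothetical).
interface DocumentRecord {
  id: string;
  knowledgeBaseId: string;
  title: string;
}

// domain/interfaces: the contract the application layer depends on.
interface DocumentRepository {
  findById(id: string): Promise<DocumentRecord | null>;
  save(doc: DocumentRecord): Promise<void>;
}

// application/useCases/document: depends only on the interface, never on a database driver.
class GetDocumentUseCase {
  private readonly documents: DocumentRepository;

  constructor(documents: DocumentRepository) {
    this.documents = documents;
  }

  async execute(id: string): Promise<DocumentRecord> {
    const doc = await this.documents.findById(id);
    if (!doc) throw new Error(`Document ${id} not found`);
    return doc;
  }
}

// infrastructure/repository: one concrete implementation. A PostgreSQL-backed
// repository would implement the same interface.
class InMemoryDocumentRepository implements DocumentRepository {
  private readonly rows = new Map<string, DocumentRecord>();

  async findById(id: string): Promise<DocumentRecord | null> {
    return this.rows.get(id) ?? null;
  }

  async save(doc: DocumentRecord): Promise<void> {
    this.rows.set(doc.id, { ...doc }); // store a copy so callers cannot mutate stored state
  }
}
```

Because the use case only sees the interface, tests can inject the in-memory repository while production wiring injects a database-backed one.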

Event-Driven Processing

Document processing is fully asynchronous via NATS JetStream:

[Diagram: asynchronous document processing via NATS JetStream]

Event Types

| Event                        | Purpose                   |
|------------------------------|---------------------------|
| document.processing.request  | Start document processing |
| document.embedding.request   | Generate embeddings       |
| document.processing.complete | Processing finished       |
| document.processing.failed   | Processing error          |
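The subjects in the table can be kept as typed constants so that publishers and the handlers under `nats-events/` agree on names. The sketch below routes a raw message (subject plus JSON body) to the handler registered for its subject. The payload fields and the `EventRouter` class are assumptions, and the NATS client itself is left out so the sketch runs standalone.

```typescript
// Subject names from the Event Types table.
const Subjects = {
  ProcessingRequest: "document.processing.request",
  EmbeddingRequest: "document.embedding.request",
  ProcessingComplete: "document.processing.complete",
  ProcessingFailed: "document.processing.failed",
} as const;

type Subject = (typeof Subjects)[keyof typeof Subjects];

// Assumed payload shape; real messages may carry more fields.
interface DocumentEventPayload {
  documentId: string;
  knowledgeBaseId: string;
  error?: string; // only meaningful on document.processing.failed
}

type Handler = (payload: DocumentEventPayload) => Promise<void>;

class EventRouter {
  private readonly handlers = new Map<Subject, Handler>();

  on(subject: Subject, handler: Handler): this {
    this.handlers.set(subject, handler);
    return this;
  }

  // Decodes the JSON body and awaits the matching handler.
  // Returns false when no handler is registered, leaving ack/dead-letter policy to the caller.
  async dispatch(subject: string, body: string): Promise<boolean> {
    const handler = this.handlers.get(subject as Subject);
    if (!handler) return false;
    await handler(JSON.parse(body) as DocumentEventPayload);
    return true;
  }
}
```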

Data Flow

Document Ingestion

[Diagram: document ingestion flow]
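Given that processing is event-driven, the synchronous part of ingestion can be sketched as: store the raw file, record the document as pending, and publish `document.processing.request`; everything after that runs in consumers. This is a minimal sketch assuming hypothetical ports (`ObjectStorage`, `DocumentStore`, `EventPublisher`) and a hypothetical storage key layout; the actual order of steps in IngestIQ may differ.

```typescript
// Hypothetical ports; real adapters would live under common/storage, the repositories, and nats-events.
interface ObjectStorage {
  put(key: string, data: Uint8Array): Promise<void>;
}

interface DocumentStore {
  insert(record: { id: string; knowledgeBaseId: string; storageKey: string; status: "pending" }): Promise<void>;
}

interface EventPublisher {
  publish(subject: string, payload: object): Promise<void>;
}

class IngestDocumentUseCase {
  private readonly storage: ObjectStorage;
  private readonly store: DocumentStore;
  private readonly events: EventPublisher;

  constructor(storage: ObjectStorage, store: DocumentStore, events: EventPublisher) {
    this.storage = storage;
    this.store = store;
    this.events = events;
  }

  // Stores the raw file, records it as pending, then hands processing off to the event bus.
  async execute(
    id: string,
    knowledgeBaseId: string,
    fileName: string,
    data: Uint8Array,
  ): Promise<{ id: string; status: "pending" }> {
    const storageKey = `${knowledgeBaseId}/${id}/${fileName}`; // hypothetical key layout
    await this.storage.put(storageKey, data);
    await this.store.insert({ id, knowledgeBaseId, storageKey, status: "pending" });
    await this.events.publish("document.processing.request", { documentId: id, knowledgeBaseId });
    // The request returns before any parsing or embedding happens.
    return { id, status: "pending" };
  }
}
```

If publishing fails after the insert, the document stays pending; a transactional outbox is one common way to close that gap.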

Search Query

[Diagram: search query flow]
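A vector search answers a query by embedding the query text and returning the stored vectors closest to it. The sketch below does this exactly (brute force) in memory, which is the result an HNSW index approximates at scale. The `StoredVector` shape and the choice of cosine similarity as the metric are assumptions.

```typescript
// Assumed shape of a stored entry: an ID plus its embedding vector.
interface StoredVector {
  id: string;
  embedding: number[];
}

// Cosine similarity in [-1, 1]; 1 means the vectors point the same way.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact k-nearest-neighbor search: scores every entry, so cost grows linearly with corpus size.
function searchTopK(
  queryEmbedding: number[],
  entries: StoredVector[],
  k: number,
): Array<{ id: string; score: number }> {
  return entries
    .map((e) => ({ id: e.id, score: cosineSimilarity(queryEmbedding, e.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```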

Component Details

PostgreSQL + pgvector

  • HNSW indexing for fast approximate nearest-neighbor (ANN) search
  • Supports millions of vectors
  • Metadata filtering alongside vector search
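These capabilities map onto standard pgvector SQL. The sketch builds the HNSW index statement and a parameterized similarity query in the `{ text, values }` shape that node-postgres accepts. The table `document_embeddings` and its columns are hypothetical, and cosine distance (the `<=>` operator with `vector_cosine_ops`) is an assumption about the configured metric.

```typescript
// Hypothetical schema: document_embeddings(id, document_id, knowledge_base_id, embedding vector(N)).
const CREATE_HNSW_INDEX = `
  CREATE INDEX IF NOT EXISTS document_embeddings_embedding_hnsw
  ON document_embeddings USING hnsw (embedding vector_cosine_ops)`;

interface SqlQuery {
  text: string;
  values: unknown[];
}

// pgvector accepts vectors in text form, e.g. '[0.1,0.2,0.3]'.
function toVectorLiteral(v: number[]): string {
  return `[${v.join(",")}]`;
}

// `<=>` is pgvector's cosine distance operator. Ordering by it ascending lets the
// planner use an HNSW index built with vector_cosine_ops; the WHERE clause is the metadata filter.
function buildSimilaritySearch(knowledgeBaseId: string, queryEmbedding: number[], k: number): SqlQuery {
  if (!Number.isInteger(k) || k <= 0) throw new Error("k must be a positive integer");
  return {
    text: `
      SELECT document_id, 1 - (embedding <=> $2::vector) AS similarity
      FROM document_embeddings
      WHERE knowledge_base_id = $1
      ORDER BY embedding <=> $2::vector
      LIMIT $3`,
    values: [knowledgeBaseId, toVectorLiteral(queryEmbedding), k],
  };
}
```

With an approximate index, pgvector applies the `WHERE` filter to the candidates the index scan returns, so a very selective filter can yield fewer than `k` rows; the pgvector documentation describes tuning such as `hnsw.ef_search` for this case.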

NATS JetStream

  • Durable message streaming
  • At-least-once delivery
  • Consumer groups for scaling
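At-least-once delivery means a message can arrive more than once (for example, when a worker crashes after doing the work but before acknowledging), so handlers should be idempotent. A minimal sketch, assuming each message carries a stable ID that survives redelivery; the `DeliveredMessage` shape is an assumption, not the NATS client's type.

```typescript
// Assumed minimal view of a delivered message.
interface DeliveredMessage {
  id: string; // stable across redeliveries (assumed to be set by the producer)
  payload: string;
  ack(): void;
}

class IdempotentConsumer {
  // An in-memory Set only deduplicates within one process; a durable store
  // (for example a unique constraint in PostgreSQL) would be needed across restarts and replicas.
  private readonly processed = new Set<string>();
  private readonly handle: (payload: string) => Promise<void>;

  constructor(handle: (payload: string) => Promise<void>) {
    this.handle = handle;
  }

  async onMessage(msg: DeliveredMessage): Promise<void> {
    if (this.processed.has(msg.id)) {
      msg.ack(); // duplicate: side effects already applied, just acknowledge
      return;
    }
    await this.handle(msg.payload); // if this throws, no ack is sent and the message is redelivered
    this.processed.add(msg.id);
    msg.ack();
  }
}
```

This check-then-act sequence is not safe against two concurrent deliveries of the same message; a database uniqueness constraint handles that case as well.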

BullMQ + Redis

  • Scheduled job execution
  • Retry with backoff
  • Job status tracking
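Retry with backoff is configured per job. The sketch declares a local `RetryOptions` type mirroring BullMQ's `attempts` and `backoff: { type: "exponential", delay }` job options, so it runs without Redis or the bullmq package, and computes the resulting retry schedule as `delay * 2^(attemptsMade - 1)`. The values (5 attempts, 1 s base delay) are illustrative; check the BullMQ documentation for the exact formula in your version.

```typescript
// Local type mirroring the shape of BullMQ job options used for retries.
interface RetryOptions {
  attempts: number; // total attempts, including the first run
  backoff: { type: "exponential"; delay: number }; // base delay in ms
}

// Illustrative options; with bullmq these would be passed as the third argument to queue.add().
const embeddingJobOptions: RetryOptions = {
  attempts: 5,
  backoff: { type: "exponential", delay: 1000 },
};

// Delay before the retry that follows the n-th failed attempt: delay * 2^(n - 1).
function retryDelay(opts: RetryOptions, attemptsMade: number): number {
  return opts.backoff.delay * 2 ** (attemptsMade - 1);
}

// Full schedule: one delay per retry, i.e. attempts - 1 entries.
function retrySchedule(opts: RetryOptions): number[] {
  const delays: number[] = [];
  for (let made = 1; made < opts.attempts; made++) {
    delays.push(retryDelay(opts, made));
  }
  return delays;
}
```

With these options a job is retried after 1, 2, 4, and 8 seconds before it is marked as failed.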

MCP Server Pool

  • Per-knowledge-base server isolation
  • Connection pooling
  • Automatic cleanup
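One way to combine per-knowledge-base isolation, reuse, and automatic cleanup is a map keyed by knowledge base ID with idle-time eviction. `McpServerHandle`, the factory function, and the TTL policy are hypothetical; the injectable clock exists only to make the sketch testable.

```typescript
// Hypothetical handle for one MCP server instance bound to a single knowledge base.
interface McpServerHandle {
  knowledgeBaseId: string;
  close(): void;
}

class McpServerPool {
  private readonly entries = new Map<string, { server: McpServerHandle; lastUsed: number }>();
  private readonly create: (kbId: string) => McpServerHandle;
  private readonly idleTtlMs: number;
  private readonly now: () => number;

  constructor(create: (kbId: string) => McpServerHandle, idleTtlMs: number, now: () => number = Date.now) {
    this.create = create;
    this.idleTtlMs = idleTtlMs;
    this.now = now;
  }

  // Returns the server for this knowledge base, creating it on first use (isolation + reuse).
  acquire(kbId: string): McpServerHandle {
    const existing = this.entries.get(kbId);
    if (existing) {
      existing.lastUsed = this.now();
      return existing.server;
    }
    const server = this.create(kbId);
    this.entries.set(kbId, { server, lastUsed: this.now() });
    return server;
  }

  // Closes and removes servers idle longer than the TTL; intended to run periodically.
  evictIdle(): number {
    const cutoff = this.now() - this.idleTtlMs;
    let evicted = 0;
    this.entries.forEach((entry, kbId) => {
      if (entry.lastUsed < cutoff) {
        entry.server.close();
        this.entries.delete(kbId); // deleting during Map.forEach is safe in JavaScript
        evicted++;
      }
    });
    return evicted;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

In a running service, `evictIdle()` could be called from a `setInterval` timer or a scheduled job.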

Scaling Considerations

Horizontal Scaling

| Component          | Scaling Strategy                            |
|--------------------|---------------------------------------------|
| API Server         | Multiple instances behind a load balancer   |
| Document Processor | Consumer groups in NATS                     |
| Database           | Read replicas, connection pooling           |
| Object Storage     | Handled natively by S3/MinIO                |
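For the database row, a common pattern is to send writes to the primary and spread reads across replicas. The sketch is generic over the connection type (in practice, for example, one connection pool per server); round-robin replica selection and the `ReadWriteRouter` name are assumptions, not necessarily what IngestIQ does.

```typescript
class ReadWriteRouter<C> {
  private readonly primary: C;
  private readonly replicas: C[];
  private next = 0;

  constructor(primary: C, replicas: C[]) {
    this.primary = primary;
    this.replicas = replicas;
  }

  // Writes, and reads that must see the latest write, go to the primary.
  forWrite(): C {
    return this.primary;
  }

  // Other reads are spread round-robin over the replicas; with none configured, use the primary.
  forRead(): C {
    if (this.replicas.length === 0) return this.primary;
    const conn = this.replicas[this.next % this.replicas.length];
    this.next++;
    return conn;
  }
}
```

Replicas can lag slightly behind the primary, so a read that must observe a just-committed write should use `forWrite()`.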

Bottlenecks

| Bottleneck           | Solution                          |
|----------------------|-----------------------------------|
| Embedding generation | Batch processing, caching         |
| Database writes      | Connection pooling, async commits |
| File storage         | CDN, distributed storage          |
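Batching and caching for embedding generation combine naturally: deduplicate the requested texts, skip those already cached, and send the rest to the provider in fixed-size batches. `EmbedBatchFn` stands in for whatever provider the embedding factory returns; keying the cache by raw text and leaving it unbounded are simplifications (a production cache would likely key by a content hash and cap its size).

```typescript
// Hypothetical provider call: one request embeds a whole batch of texts, in order.
type EmbedBatchFn = (texts: string[]) => Promise<number[][]>;

class CachingBatchEmbedder {
  private readonly cache = new Map<string, number[]>();
  private readonly embedBatch: EmbedBatchFn;
  private readonly batchSize: number;

  constructor(embedBatch: EmbedBatchFn, batchSize: number) {
    if (batchSize < 1) throw new Error("batchSize must be >= 1");
    this.embedBatch = embedBatch;
    this.batchSize = batchSize;
  }

  async embed(texts: string[]): Promise<number[][]> {
    // Only uncached texts are sent to the provider, each at most once.
    const missing = Array.from(new Set(texts.filter((t) => !this.cache.has(t))));
    for (let i = 0; i < missing.length; i += this.batchSize) {
      const batch = missing.slice(i, i + this.batchSize);
      const vectors = await this.embedBatch(batch); // one provider call per batch
      if (vectors.length !== batch.length) throw new Error("provider returned wrong number of vectors");
      batch.forEach((t, j) => this.cache.set(t, vectors[j]));
    }
    // Results are returned in the caller's order, served from the cache.
    return texts.map((t) => this.cache.get(t) as number[]);
  }
}
```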

Security Architecture

  • JWT authentication with refresh tokens
  • Organization-level isolation
  • Role-based access control
  • API key encryption at rest
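Organization-level isolation and role-based access control can be enforced in a single check: deny anything outside the caller's organization, then consult a role-to-permission table. The roles (`admin`, `editor`, `viewer`), the actions, and the `authorize` function are hypothetical; in practice the `Principal` would be built from the verified JWT claims.

```typescript
// Hypothetical roles and actions; the real role model may differ.
type Role = "admin" | "editor" | "viewer";
type KbAction = "read" | "write" | "delete";

const PERMISSIONS: Record<Role, ReadonlyArray<KbAction>> = {
  admin: ["read", "write", "delete"],
  editor: ["read", "write"],
  viewer: ["read"],
};

interface Principal {
  userId: string;
  organizationId: string; // taken from the verified token, never from the request body
  role: Role;
}

interface OrgScopedResource {
  organizationId: string;
}

function authorize(principal: Principal, resource: OrgScopedResource, action: KbAction): boolean {
  // Organization isolation first: no role grants access across organizations.
  if (principal.organizationId !== resource.organizationId) return false;
  return PERMISSIONS[principal.role].includes(action);
}
```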