IngestIQ

Connectors

Source and destination connectors for data ingestion

What are Connectors?#

Connectors are the integration points that define where data comes from (sources) and where it's stored (destinations). Each pipeline uses one source connector and one destination connector.

Connector Categories#

  • Source Connectors: define where data originates
  • Destination Connectors: define where processed data is stored

Available Source Connectors#

| Connector | Code | Description |
| --- | --- | --- |
| File Upload | file_upload | Direct file uploads (PDF, CSV, Excel) |
| Web Scraping | web_scrape | Crawl4AI web content extraction |
| Google Drive | google_drive | OAuth2 Google Drive integration |
| Video/YouTube | video | Video transcription via Whisper |
| Audio | audio | Audio file transcription |
| Image | image | Image OCR via Gemini Vision |

Available Destination Connectors#

| Connector | Code | Description |
| --- | --- | --- |
| pgvector | pgvector | PostgreSQL with pgvector extension |

Connector Configuration#

Each connector type has a configuration schema that defines required and optional settings.

Creating a Connector Config#

curl -X POST http://localhost:3000/api/v2/connector-configs \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My File Upload Config",
    "connectorTypeId": 1,
    "config": {
      "maxFileSize": "50MB",
      "allowedFormats": ["pdf", "csv", "xlsx"]
    }
  }'
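For callers scripting this from Python, the same request can be sketched with the standard library alone. The URL, headers, and payload are copied from the curl example; the helper name `build_create_config_request` is illustrative and not part of IngestIQ. The request is only built here, not sent.

```python
import json
import urllib.request

# Base URL taken from the curl example above; adjust for your deployment.
BASE_URL = "http://localhost:3000/api/v2"


def build_create_config_request(
    token: str, name: str, connector_type_id: int, config: dict
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a connector config."""
    body = json.dumps(
        {"name": name, "connectorTypeId": connector_type_id, "config": config}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/connector-configs",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_create_config_request(
    token="YOUR_JWT_TOKEN",
    name="My File Upload Config",
    connector_type_id=1,
    config={"maxFileSize": "50MB", "allowedFormats": ["pdf", "csv", "xlsx"]},
)
# To send it against a running server: urllib.request.urlopen(request)
```

Separating request construction from sending makes the payload easy to inspect or log before anything reaches the API.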

Source Connector Details#

File Upload (file_upload)

Description: Upload files directly via API

Supported Formats: PDF, CSV, Excel, Word, Images

Configuration:

{
  "maxFileSize": "50MB",
  "allowedFormats": ["pdf", "csv", "xlsx", "docx"]
}
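As a rough illustration of how these two settings might be checked on the client before uploading, here is a minimal sketch. It assumes `maxFileSize` is a number followed by a unit and that 1 MB means 1024 × 1024 bytes; the server's actual parsing rules are not documented here, and `parse_size` and `check_upload` are hypothetical helper names.

```python
import re
from pathlib import Path

# Assumption: binary multiples. The server may interpret units differently.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size string such as "50MB" into a byte count."""
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)\s*(B|KB|MB|GB)\s*", text, flags=re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


def check_upload(filename: str, size_bytes: int, config: dict) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in config["allowedFormats"]:
        problems.append(f"format {extension!r} is not allowed")
    if size_bytes > parse_size(config["maxFileSize"]):
        problems.append(f"file exceeds {config['maxFileSize']}")
    return problems


upload_config = {"maxFileSize": "50MB", "allowedFormats": ["pdf", "csv", "xlsx", "docx"]}
```

Calling `check_upload("report.pdf", 1_000_000, upload_config)` returns an empty list, while a `.txt` file or an oversized file returns a description of each problem.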


Web Scraping (web_scrape)

Description: Extract content from web pages using Crawl4AI

Configuration:

{
  "url": "https://example.com",
  "depth": 2,
  "maxPages": 100
}
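To show how `depth` and `maxPages` could bound a crawl together, the sketch below runs a breadth-first traversal over a hypothetical in-memory link graph instead of fetching real pages. It assumes `depth` counts link hops from the starting URL and `maxPages` caps the total number of pages visited; Crawl4AI's own semantics may differ, and `plan_crawl` is not a Crawl4AI or IngestIQ function.

```python
from collections import deque


def plan_crawl(start_url: str, links: dict, depth: int, max_pages: int) -> list:
    """Return URLs in breadth-first visit order, honoring both limits.

    `links` maps each URL to the URLs it links to. It stands in for
    real page fetching and link extraction.
    """
    if max_pages < 1:
        return []
    visited = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, level = queue.popleft()
        if level == depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target in seen:
                continue
            if len(visited) >= max_pages:
                return visited  # page budget exhausted
            seen.add(target)
            visited.append(target)
            queue.append((target, level + 1))
    return visited


# Hypothetical site structure used only for illustration.
site = {
    "https://example.com": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/1"],
    "https://example.com/a/1": ["https://example.com/a/1/x"],
}
pages = plan_crawl("https://example.com", site, depth=2, max_pages=100)
```

With `depth=2`, the page three hops away (`/a/1/x`) is never visited, and lowering `max_pages` cuts the traversal short regardless of depth.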


Google Drive (google_drive)

Description: Sync documents from Google Drive folders

Requirements: OAuth2 authentication

Configuration:

{
  "folderId": "google-drive-folder-id",
  "includeSubfolders": true
}
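The effect of `includeSubfolders` can be pictured with a small recursive walk. This is a sketch over a hypothetical in-memory folder tree, not a call to the Google Drive API, and it assumes the flag simply switches between listing one folder and descending into every nested folder.

```python
def collect_files(folder_id: str, tree: dict, include_subfolders: bool) -> list:
    """List file names in a folder, optionally descending into subfolders.

    `tree` maps a folder id to {"files": [...], "folders": [...]} and is a
    stand-in for real Drive listings.
    """
    node = tree.get(folder_id, {})
    files = list(node.get("files", []))
    if include_subfolders:
        for child_id in node.get("folders", []):
            files.extend(collect_files(child_id, tree, True))
    return files


# Hypothetical folder layout used only for illustration.
drive = {
    "root": {"files": ["handbook.pdf"], "folders": ["specs"]},
    "specs": {"files": ["api.docx", "schema.csv"], "folders": []},
}
```

Here `collect_files("root", drive, False)` yields only `handbook.pdf`, while passing `True` also returns the two files under `specs`.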


Video/YouTube (video)

Description: Transcribe video content using OpenAI Whisper

Supported Sources: YouTube URLs, direct video URLs

Configuration:

{
  "maxDuration": 10800,
  "extractAudio": true,
  "whisperModel": "whisper-1"
}
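The unit of `maxDuration` is not stated above. Assuming it is seconds, 10800 corresponds to three hours, and a pre-flight check before submitting a video might look like the sketch below. The helper names are illustrative only.

```python
from datetime import timedelta

video_config = {"maxDuration": 10800, "extractAudio": True, "whisperModel": "whisper-1"}


def within_max_duration(duration_seconds: float, config: dict) -> bool:
    """Return True when the video is short enough (assumes seconds)."""
    return duration_seconds <= config["maxDuration"]


def describe_limit(config: dict) -> str:
    """Render the limit as H:MM:SS for log or error messages."""
    return str(timedelta(seconds=config["maxDuration"]))
```

Under the seconds assumption, `describe_limit(video_config)` gives `3:00:00`, so a 90-minute video (5400 seconds) passes and a 4-hour video does not.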


Destination Connector Details#

pgvector (pgvector)

Description: PostgreSQL with the pgvector extension for efficient similarity search

Features:

  • HNSW indexing for fast approximate nearest neighbor search
  • Metadata filtering
  • Hybrid search (vector + keyword)

Configuration:

{
  "tableName": "document_embeddings",
  "dimensions": 1536,
  "indexType": "hnsw"
}
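To make the three settings concrete, the sketch below turns a config into the kind of pgvector DDL it plausibly maps to. The column names (`id`, `content`, `metadata`, `embedding`) and the cosine operator class are assumptions for illustration; the schema IngestIQ actually creates is not documented here. The `vector(n)` type and the `USING hnsw` index syntax are standard pgvector.

```python
import re


def build_pgvector_ddl(config: dict) -> list:
    """Return SQL statements for a table and index matching the config."""
    table = config["tableName"]
    # Reject anything that is not a plain identifier to avoid SQL injection.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unsafe table name: {table!r}")
    dimensions = int(config["dimensions"])

    statements = [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        # Column layout is hypothetical; only the embedding width comes from the config.
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id bigserial PRIMARY KEY, content text, metadata jsonb, "
        f"embedding vector({dimensions}));",
    ]
    if config.get("indexType") == "hnsw":
        # Cosine distance is an assumption; pgvector also offers L2 and inner product.
        statements.append(
            f"CREATE INDEX IF NOT EXISTS {table}_embedding_hnsw "
            f"ON {table} USING hnsw (embedding vector_cosine_ops);"
        )
    return statements


ddl = build_pgvector_ddl(
    {"tableName": "document_embeddings", "dimensions": 1536, "indexType": "hnsw"}
)
```

The `dimensions` value must match the output size of the embedding model in use, since pgvector rejects vectors whose length differs from the column definition.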

Connector Type Properties#

| Property | Type | Description |
| --- | --- | --- |
| id | Integer | Unique identifier |
| name | String | Connector name |
| category | Enum | source or destination |
| description | String | Connector description |
| configSchema | Object | JSON Schema for config validation |
| uniqueCode | String | Unique connector code |
| isActive | Boolean | Whether connector is enabled |
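On the client side, these properties map naturally onto a small typed record. The sketch below is one possible shape, not an official SDK type: it checks the `category` enum and uses only the `required` keyword of `configSchema` to report missing settings. A full JSON Schema validator would cover far more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorType:
    id: int
    name: str
    category: str  # "source" or "destination"
    description: str
    config_schema: dict
    unique_code: str
    is_active: bool

    @classmethod
    def from_api(cls, record: dict) -> "ConnectorType":
        """Build a ConnectorType from a dict using the property names above."""
        if record["category"] not in ("source", "destination"):
            raise ValueError(f"unknown category: {record['category']!r}")
        return cls(
            id=record["id"],
            name=record["name"],
            category=record["category"],
            description=record["description"],
            config_schema=record["configSchema"],
            unique_code=record["uniqueCode"],
            is_active=record["isActive"],
        )

    def missing_required(self, config: dict) -> list:
        """List keys named in the schema's `required` array but absent from config."""
        return [key for key in self.config_schema.get("required", []) if key not in config]


# Hypothetical record; the configSchema contents are invented for illustration.
web_scrape = ConnectorType.from_api({
    "id": 2,
    "name": "Web Scraping",
    "category": "source",
    "description": "Crawl4AI web content extraction",
    "configSchema": {"type": "object", "required": ["url"]},
    "uniqueCode": "web_scrape",
    "isActive": True,
})
```

With this record, `web_scrape.missing_required({"depth": 2})` reports `["url"]`, which is a cheap check to run before submitting a config.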

Listing Connector Types#

curl http://localhost:3000/api/v2/connector-configs/types \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
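A script that needs the `connectorTypeId` for a given connector could look it up from this endpoint. The sketch below assumes the response body is a JSON array of objects carrying the properties listed under Connector Type Properties; the real response envelope may differ, and the sample body is invented for illustration. The request is built but not sent.

```python
import json
import urllib.request


def build_list_types_request(
    token: str, base_url: str = "http://localhost:3000/api/v2"
) -> urllib.request.Request:
    """Build the GET request from the curl example above."""
    return urllib.request.Request(
        f"{base_url}/connector-configs/types",
        headers={"Authorization": f"Bearer {token}"},
    )


def find_type_id(records: list, unique_code: str):
    """Return the id of the active connector type with this code, or None."""
    for record in records:
        if record["uniqueCode"] == unique_code and record.get("isActive", True):
            return record["id"]
    return None


list_request = build_list_types_request("YOUR_JWT_TOKEN")

# Hypothetical response body, standing in for the server's reply.
sample_body = json.dumps([
    {"id": 1, "uniqueCode": "file_upload", "category": "source", "isActive": True},
    {"id": 7, "uniqueCode": "pgvector", "category": "destination", "isActive": True},
])
records = json.loads(sample_body)
```

Looking up ids by `uniqueCode` avoids hard-coding numeric ids, which may differ between environments.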

Best Practices#

Create connector configs once and reuse them across multiple pipelines:

  • Create a "PDF Upload Config" used by all PDF pipelines
  • Create a "Marketing Website Scraper" for marketing content

Name configs descriptively:

  • ✅ "Engineering Docs - Google Drive"
  • ✅ "Customer Support Videos"
  • ❌ "Config 1"