Connectors

Source and destination connectors for data ingestion

What are Connectors?

Connectors are the integration points that define where data comes from (sources) and where it's stored (destinations). Each pipeline uses one source connector and one destination connector.

Connector Categories

Source Connectors

Define where data originates from

Destination Connectors

Define where processed data is stored

Available Source Connectors

Connector	Code	Description
File Upload	`file_upload`	Direct file uploads (PDF, CSV, Excel, Word, images)
Web Scraping	`web_scrape`	Web content extraction via Crawl4AI
Google Drive	`google_drive`	OAuth2 Google Drive integration
Video/YouTube	`video`	Video transcription via Whisper
Audio	`audio`	Audio file transcription
Image	`image`	Image OCR via Gemini Vision

Available Destination Connectors

Connector	Code	Description
pgvector	`pgvector`	PostgreSQL with pgvector extension

Connector Configuration

Each connector type has a configuration schema (its `configSchema` property, a JSON Schema) that defines the required and optional settings for configs of that type.

Creating a Connector Config

curl -X POST http://localhost:3000/api/v2/connector-configs \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My File Upload Config",
    "connectorTypeId": 1,
    "config": {
      "maxFileSize": "50MB",
      "allowedFormats": ["pdf", "csv", "xlsx"]
    }
  }'
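
The same request can be assembled from Python. This is a minimal sketch using only the standard library; the endpoint, headers, and body fields are copied from the curl example above, while the helper name `build_create_config_request` is hypothetical. It builds the request without sending it, so it can be inspected first.

```python
import json
import urllib.request

# Host, port, and path prefix are taken from the curl example.
BASE_URL = "http://localhost:3000/api/v2"


def build_create_config_request(
    token: str, name: str, connector_type_id: int, config: dict
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a connector config."""
    body = json.dumps(
        {"name": name, "connectorTypeId": connector_type_id, "config": config}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/connector-configs",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_create_config_request(
    token="YOUR_JWT_TOKEN",
    name="My File Upload Config",
    connector_type_id=1,
    config={"maxFileSize": "50MB", "allowedFormats": ["pdf", "csv", "xlsx"]},
)
```

Passing `request` to `urllib.request.urlopen` would send it; the shape of the response is not documented here, so it is left out of the sketch.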

Source Connector Details

File Upload

Description: Upload files directly via API

Supported Formats: PDF, CSV, Excel, Word, Images

Configuration:

{
  "maxFileSize": "50MB",
  "allowedFormats": ["pdf", "csv", "xlsx", "docx"]
}
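
To illustrate how these two settings might be applied to an incoming file, here is a small sketch. The helper names `parse_size` and `is_upload_allowed` are hypothetical, and treating `"50MB"` as 50 × 1024 × 1024 bytes is an assumption; the service's actual size parsing and validation may differ.

```python
import re
from pathlib import PurePosixPath

# Assumption: size suffixes are binary multiples (1MB = 1024 * 1024 bytes).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size string such as "50MB" into a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*(B|KB|MB|GB)\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


def is_upload_allowed(filename: str, size_bytes: int, config: dict) -> bool:
    """Check the file extension and size against a file upload config."""
    extension = PurePosixPath(filename).suffix.lstrip(".").lower()
    return (
        extension in config["allowedFormats"]
        and size_bytes <= parse_size(config["maxFileSize"])
    )


upload_config = {
    "maxFileSize": "50MB",
    "allowedFormats": ["pdf", "csv", "xlsx", "docx"],
}
```

For example, `is_upload_allowed("report.PDF", 1024, upload_config)` accepts the file because the extension comparison is case-insensitive, while a `.txt` file or a file one byte over the limit is rejected.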

Web Scraping

Description: Extract content from web pages using Crawl4AI

Configuration:

{
  "url": "https://example.com",
  "depth": 2,
  "maxPages": 100
}
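
The interaction between `depth` and `maxPages` can be pictured as a bounded breadth-first traversal. The sketch below runs over an in-memory link map rather than the network. It assumes `depth` counts link hops from the start URL and `maxPages` caps the total number of pages; neither meaning is confirmed by this page, and the function `plan_crawl` is hypothetical, not part of Crawl4AI.

```python
from collections import deque


def plan_crawl(start_url: str, links: dict, depth: int, max_pages: int) -> list:
    """Return the URLs a crawl would visit, in breadth-first order."""
    if max_pages < 1:
        return []
    visited = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue and len(visited) < max_pages:
        url, level = queue.popleft()
        if level == depth:
            continue  # pages at the depth limit are kept but not expanded
        for target in links.get(url, []):
            if target in seen:
                continue
            if len(visited) >= max_pages:
                break  # page budget exhausted
            seen.add(target)
            visited.append(target)
            queue.append((target, level + 1))
    return visited


# Hypothetical link structure: page "a" links to "b" and "c", and so on.
site = {"a": ["b", "c"], "b": ["d"], "d": ["e"]}
pages = plan_crawl("a", site, depth=2, max_pages=100)
```

Under these assumptions, page `e` is never reached because it sits three hops from the start, and lowering `max_pages` truncates the list in breadth-first order.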

Google Drive

Description: Sync documents from Google Drive folders

Requirements: OAuth2 authentication

Configuration:

{
  "folderId": "google-drive-folder-id",
  "includeSubfolders": true
}

Video/YouTube

Description: Transcribe video content using OpenAI Whisper

Supported Sources: YouTube URLs, direct video URLs

Configuration:

{
  "maxDuration": 10800,
  "extractAudio": true,
  "whisperModel": "whisper-1"
}

Destination Connector Details

pgvector

Description: PostgreSQL with the pgvector extension for efficient similarity search

Features:

HNSW indexing for fast approximate nearest neighbor search
Metadata filtering
Hybrid search (vector + keyword)

Configuration:

{
  "tableName": "document_embeddings",
  "dimensions": 1536,
  "indexType": "hnsw"
}
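
As a rough picture of what this configuration could correspond to in PostgreSQL, the sketch below turns it into pgvector DDL statements. The column names (`id`, `content`, `metadata`, `embedding`), the index name, and the cosine operator class are assumptions made for illustration; the schema the service actually creates is not documented here. The function `build_pgvector_ddl` is hypothetical.

```python
import re

# Only plain identifiers are accepted, since the table name is interpolated into SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_pgvector_ddl(config: dict) -> list[str]:
    """Translate a pgvector connector config into DDL statements."""
    table = config["tableName"]
    dimensions = int(config["dimensions"])
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if config.get("indexType") != "hnsw":
        raise ValueError("this sketch only covers the hnsw index type")
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        # Assumed layout: text content, JSON metadata for filtering, one embedding column.
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id bigserial PRIMARY KEY, content text, metadata jsonb, "
        f"embedding vector({dimensions}));",
        # Assumed distance metric: cosine (vector_cosine_ops).
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_hnsw_idx "
        f"ON {table} USING hnsw (embedding vector_cosine_ops);",
    ]


ddl = build_pgvector_ddl(
    {"tableName": "document_embeddings", "dimensions": 1536, "indexType": "hnsw"}
)
```

The `dimensions` value has to match the output size of the embedding model in use, because a `vector(1536)` column rejects vectors of any other length.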

Connector Type Properties

Property	Type	Description
`id`	Integer	Unique identifier
`name`	String	Connector name
`category`	Enum	`source` or `destination`
`description`	String	Connector description
`configSchema`	Object	JSON Schema for config validation
`uniqueCode`	String	Unique connector code
`isActive`	Boolean	Whether connector is enabled
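
The properties above map naturally onto a small record type. The following sketch models a connector type as a Python dataclass and uses the `required` list of its `configSchema` to report missing settings. The example schema is hypothetical, and checking only `required` is a simplification of full JSON Schema validation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConnectorType:
    id: int
    name: str
    category: str  # "source" or "destination"
    description: str
    uniqueCode: str
    isActive: bool
    configSchema: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.category not in ("source", "destination"):
            raise ValueError(f"invalid category: {self.category!r}")

    def missing_required(self, config: dict) -> list[str]:
        """Return the required schema keys that are absent from config."""
        return [
            key for key in self.configSchema.get("required", []) if key not in config
        ]


# Hypothetical schema for the web scraping connector, for illustration only.
web_scrape = ConnectorType(
    id=2,
    name="Web Scraping",
    category="source",
    description="Web content extraction",
    uniqueCode="web_scrape",
    isActive=True,
    configSchema={
        "type": "object",
        "required": ["url"],
        "properties": {"url": {"type": "string"}, "depth": {"type": "integer"}},
    },
)
```

Calling `web_scrape.missing_required({"depth": 2})` reports that `url` is missing, which is the kind of check a config could go through before it is submitted.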

Listing Connector Types

curl http://localhost:3000/api/v2/connector-configs/types \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"
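
A client will often want to pick out the active connector codes for one category. The sketch below assumes the endpoint returns a JSON array of objects carrying the properties listed under Connector Type Properties; that response shape is an assumption, and the sample payload and helper names are hypothetical.

```python
import json
import urllib.request


def build_list_types_request(token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request from the curl example."""
    return urllib.request.Request(
        "http://localhost:3000/api/v2/connector-configs/types",
        headers={"Authorization": f"Bearer {token}"},
    )


def active_codes(payload: str, category: str) -> list[str]:
    """Return uniqueCode values of active connector types in one category."""
    # Assumed response shape: a JSON array of connector type objects.
    types = json.loads(payload)
    return [
        item["uniqueCode"]
        for item in types
        if item["category"] == category and item["isActive"]
    ]


# Hypothetical response body, used here in place of a live call.
sample_payload = json.dumps(
    [
        {"uniqueCode": "file_upload", "category": "source", "isActive": True},
        {"uniqueCode": "audio", "category": "source", "isActive": False},
        {"uniqueCode": "pgvector", "category": "destination", "isActive": True},
    ]
)
source_codes = active_codes(sample_payload, "source")
```

If the real response wraps the array in an envelope object, only the first line of `active_codes` would need to change.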

Best Practices

Create connector configs once and reuse them across multiple pipelines:

Create a "PDF Upload Config" used by all PDF pipelines
Create a "Marketing Website Scraper" for marketing content

Name configs descriptively:

✅ "Engineering Docs - Google Drive"
✅ "Customer Support Videos"
❌ "Config 1"

Related

File Upload: Upload PDFs and documents
Web Scraping: Extract content from websites
Google Drive: Sync from Google Drive
Video: Transcribe video content