IngestIQ

File Upload

Upload documents directly via API

Overview

The File Upload connector allows you to upload documents directly to IngestIQ via the REST API. This is the simplest way to ingest documents.

Supported File Types

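| Type   | Extensions         | Max Size |
|--------|--------------------|----------|
| PDF    | .pdf               | 100MB    |
| CSV    | .csv               | 50MB     |
| Excel  | .xlsx, .xls        | 50MB     |
| Word   | .docx              | 50MB     |
| Images | .png, .jpg, .jpeg  | 20MB     |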
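
A client can check a file against the limits above before uploading, which avoids sending a file the connector would reject. The following is a minimal Python sketch: the `LIMITS_MB` mapping mirrors the table, while the `check_upload` helper and the choice of 1 MB = 1024 × 1024 bytes are assumptions, since this page does not say how sizes are measured.

```python
from pathlib import Path
from typing import Optional

# Size limits in MB per extension, copied from the supported file types table.
LIMITS_MB = {
    ".pdf": 100,
    ".csv": 50,
    ".xlsx": 50,
    ".xls": 50,
    ".docx": 50,
    ".png": 20,
    ".jpg": 20,
    ".jpeg": 20,
}

# Assumption: 1 MB = 1024 * 1024 bytes; the server may count differently.
BYTES_PER_MB = 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return None if the file looks acceptable, otherwise a short reason."""
    ext = Path(filename).suffix.lower()
    limit_mb = LIMITS_MB.get(ext)
    if limit_mb is None:
        return f"unsupported extension: {ext or '(none)'}"
    if size_bytes > limit_mb * BYTES_PER_MB:
        return f"{filename} exceeds the {limit_mb}MB limit"
    return None
```

A pipeline's connector config may narrow these limits further, so a passing check here does not guarantee that the server accepts the file.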

Uploading Files

Single File Upload

curl -X POST http://localhost:3000/api/v2/knowledgebases/{kbId}/pipelines/{pipelineId}/execute \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -F "files=@/path/to/document.pdf"

Multiple Files

curl -X POST http://localhost:3000/api/v2/knowledgebases/{kbId}/pipelines/{pipelineId}/execute \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -F "files=@document1.pdf" \
  -F "files=@document2.pdf" \
  -F "files=@data.csv"
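
The same request can be built from Python. The sketch below uses only the standard library and repeats the `files` form field once per file, as the curl command does. The helper names `build_multipart` and `execute_pipeline` are hypothetical, and the base URL, IDs, and token are placeholders to be supplied by the caller.

```python
import mimetypes
import urllib.request
import uuid


def build_multipart(files, boundary=None):
    """Encode (filename, content_bytes) pairs as multipart/form-data.

    Every part uses the field name "files", matching the repeated
    -F "files=@..." flags in the curl command. Filenames are inserted
    as-is, so names containing double quotes are not handled here.
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for filename, content in files:
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def execute_pipeline(base_url, kb_id, pipeline_id, token, files):
    """POST the files to the execute endpoint and return the raw response body."""
    body, content_type = build_multipart(files)
    url = f"{base_url}/api/v2/knowledgebases/{kb_id}/pipelines/{pipeline_id}/execute"
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Keeping the encoding step separate from the network call makes the request body easy to inspect without a running server.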

Response

{
  "executionId": "exec-uuid",
  "status": "processing",
  "documentsQueued": 3,
  "message": "Pipeline execution started"
}
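
The `processing` status and the "Pipeline execution started" message suggest that the call returns before the documents are fully processed, so a client will usually want to keep the `executionId`. A minimal Python sketch of reading the response follows; the `ExecutionResponse` dataclass and `parse_execution_response` are hypothetical names, and only the four fields shown above are assumed to be present.

```python
import json
from dataclasses import dataclass


@dataclass
class ExecutionResponse:
    execution_id: str
    status: str
    documents_queued: int
    message: str


def parse_execution_response(raw: str) -> ExecutionResponse:
    """Map the camelCase JSON fields of the response onto a dataclass."""
    data = json.loads(raw)
    return ExecutionResponse(
        execution_id=data["executionId"],
        status=data["status"],
        documents_queued=int(data["documentsQueued"]),
        message=data["message"],
    )
```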

File Processing Details

PDF Processing

PDFs are processed with full text extraction:

  • Text content is extracted page by page
  • Page numbers are preserved in metadata
  • Charts and tables are processed via Gemini Vision
  • Embedded images are extracted for OCR

Scanned PDFs (image-only) are automatically detected and processed with OCR.
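
To make the page-level behavior concrete, the sketch below shows one way per-page text could become chunks that carry their page number, along with a simple check for image-only PDFs. It illustrates the idea only and is not IngestIQ's implementation: the function names, the metadata keys `source` and `page`, and the "no extractable text on any page" heuristic are all assumptions.

```python
def pages_to_chunks(pages, source):
    """Turn per-page text into chunks tagged with 1-based page numbers.

    Pages with no text are skipped, but numbering still follows the
    original page order.
    """
    chunks = []
    for number, text in enumerate(pages, start=1):
        text = text.strip()
        if text:
            chunks.append({"text": text, "metadata": {"source": source, "page": number}})
    return chunks


def looks_scanned(pages):
    """Heuristic: a PDF that has pages but no extractable text is likely image-only."""
    return len(pages) > 0 and all(not text.strip() for text in pages)
```

A document flagged by `looks_scanned` would then be routed to OCR instead of plain text extraction.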

CSV/Excel Processing

Structured data is handled specially:

  • Each row can become a separate chunk
  • Headers are preserved as context
  • Column names are included in metadata
  • Large files are streamed for memory efficiency
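
The row-per-chunk approach can be sketched in a few lines of Python. This is an illustration under assumptions, not the connector's actual code: the `csv_row_chunks` name, the `"header: value"` text layout, and the metadata keys are hypothetical. The generator reads one row at a time, which is the property that keeps memory use flat for large files.

```python
import csv
import io


def csv_row_chunks(lines):
    """Yield one chunk per data row, reading rows lazily.

    `lines` is any iterable of text lines (an open file works), so the
    whole file never has to be held in memory.
    """
    reader = csv.reader(lines)
    headers = next(reader, None)
    if headers is None:
        return  # empty input: no header row, no chunks
    for row_number, row in enumerate(reader, start=1):
        # Pair each value with its header so the chunk carries its own context.
        text = "; ".join(f"{name}: {value}" for name, value in zip(headers, row))
        yield {"text": text, "metadata": {"columns": headers, "row": row_number}}


sample = io.StringIO("name,region\nAcme,EMEA\nGlobex,APAC\n")
for chunk in csv_row_chunks(sample):
    print(chunk["text"])
# name: Acme; region: EMEA
# name: Globex; region: APAC
```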

Word Documents

Word documents are converted to PDF first (via Gotenberg), then processed:

  • Formatting is preserved where possible
  • Images and charts are included
  • TOC and headers are extracted

Configuration

Connector Config Schema

{
  "maxFileSize": "100MB",
  "allowedFormats": ["pdf", "csv", "xlsx", "docx"],
  "processImages": true,
  "extractTables": true
}
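
Before sending a config to the API, it can help to validate it locally. The sketch below checks the four keys shown above. The validation rules are assumptions inferred from the example values (a size string such as `"100MB"`, a list of format strings, two booleans); the `parse_size` and `validate_config` helpers are hypothetical, and the server's own validation may differ.

```python
import re

# Keys that appear in the schema example.
KNOWN_KEYS = {"maxFileSize", "allowedFormats", "processImages", "extractTables"}


def parse_size(value: str) -> int:
    """Convert a string such as "100MB" to bytes (assumes 1 MB = 1024 * 1024)."""
    match = re.fullmatch(r"(\d+)\s*MB", value.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    return int(match.group(1)) * 1024 * 1024


def validate_config(config: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"unknown key: {key}" for key in sorted(set(config) - KNOWN_KEYS)]
    if "maxFileSize" in config:
        try:
            parse_size(config["maxFileSize"])
        except (ValueError, AttributeError):
            # AttributeError covers non-string values such as a bare integer.
            problems.append("maxFileSize must look like '100MB'")
    formats = config.get("allowedFormats", [])
    if not isinstance(formats, list) or not all(isinstance(f, str) for f in formats):
        problems.append("allowedFormats must be a list of strings")
    for key in ("processImages", "extractTables"):
        if key in config and not isinstance(config[key], bool):
            problems.append(f"{key} must be a boolean")
    return problems
```

Flagging unknown keys catches typos such as `extractTable`, which might otherwise be ignored without any visible error.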

Creating File Upload Config

curl -X POST http://localhost:3000/api/v2/connector-configs \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Standard File Upload",
    "connectorTypeId": 1,
    "config": {
      "maxFileSize": "50MB",
      "allowedFormats": ["pdf", "csv", "xlsx"]
    }
  }'

Best Practices

Compress large PDFs before upload to reduce processing time:

  • Use PDF compression tools
  • Remove unnecessary images
  • Split very large documents

Name files descriptively for better organization:

  • Good: q4-2024-financial-report.pdf
  • Avoid: doc123.pdf

Upload related documents in a single execution for consistent processing.

Error Handling

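| Error              | Cause                 | Solution                    |
|--------------------|-----------------------|-----------------------------|
| FILE_TOO_LARGE     | Exceeds max size      | Compress or split the file  |
| UNSUPPORTED_FORMAT | File type not allowed | Check allowed formats       |
| EXTRACTION_FAILED  | Cannot read content   | Verify file isn't corrupted |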
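
A client can turn these codes into actionable messages with a simple lookup. The sketch below assumes the error code is available to the caller as a plain string; this page does not describe how errors are delivered (HTTP status, JSON field, or otherwise), so extracting the code is left out. The `remedy_for` helper is a hypothetical name.

```python
# Suggested remedies, copied from the error table.
REMEDIES = {
    "FILE_TOO_LARGE": "Compress or split the file",
    "UNSUPPORTED_FORMAT": "Check allowed formats",
    "EXTRACTION_FAILED": "Verify file isn't corrupted",
}


def remedy_for(error_code: str) -> str:
    """Return the documented remedy, or a generic hint for unknown codes."""
    return REMEDIES.get(error_code, "Unrecognized error; check the API response for details")
```

Each listed remedy involves changing the file or the config, so resubmitting the same file unchanged is unlikely to help.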