Custom Prompts

Fine-tune document parsing and metadata extraction

Overview#

Custom prompts let you control how IngestIQ parses your documents. You can customize:

- Parsing prompts: how documents are chunked
- Metadata prompts: what metadata is extracted

Parsing Prompts#

Parsing prompts tell the AI how to split your documents into chunks.

Setting a Parsing Prompt#

curl -X POST http://localhost:3000/api/v2/knowledgebases/{kbId}/pipelines \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Technical Docs Pipeline",
    "parsingPrompt": "Parse this technical documentation. Preserve all code blocks exactly. Keep function documentation with the code. Group related concepts together.",
    ...
  }'
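The same request can be assembled from Python. This is a minimal sketch using only the standard library; the helper name build_pipeline_request and the kb-123 ID are hypothetical, and the payload carries only the two fields shown in the curl call (the elided fields are not filled in).

```python
import json
import urllib.request


def build_pipeline_request(base_url, kb_id, token, name, parsing_prompt):
    """Build (but do not send) a POST request that creates a pipeline.

    Hypothetical helper: only the fields shown in the curl example are set;
    add whatever other pipeline fields your deployment requires.
    """
    payload = {"name": name, "parsingPrompt": parsing_prompt}
    return urllib.request.Request(
        url=f"{base_url}/api/v2/knowledgebases/{kb_id}/pipelines",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_pipeline_request(
    "http://localhost:3000",
    "kb-123",  # placeholder knowledge base ID
    "YOUR_JWT_TOKEN",
    "Technical Docs Pipeline",
    "Parse this technical documentation. Preserve all code blocks exactly.",
)
# To send it: urllib.request.urlopen(request)
```

Building the body with json.dumps avoids the shell-quoting problems that appear once a prompt contains apostrophes or line breaks.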

Example Prompts by Document Type#

Parse this technical documentation:

1. Keep code blocks complete and unmodified
2. Group explanations with their code examples
3. Preserve command-line examples exactly
4. Keep API endpoints with their descriptions
5. Maintain hierarchical structure (h1 > h2 > h3)

Metadata Extraction#

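Enable automatic extraction of structured metadata from documents. The request below sets isMetadataPrompt to true and lists the fields to extract in metadataParsingPrompt.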

Enabling Metadata Extraction#

curl -X POST http://localhost:3000/api/v2/knowledgebases/{kbId}/pipelines \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Docs with Metadata",
    "isMetadataPrompt": true,
    "metadataParsingPrompt": "Extract: title, author, date, keywords, summary (max 200 words)",
    ...
  }'

Metadata Prompt Examples#

Extract the following metadata:
- title: Document title
- author: Author name
- date: Publication or creation date
- summary: Brief 2-3 sentence summary
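A prompt in this shape can be generated from a field list so that the prompt and any downstream code share one definition. The sketch below is illustrative: build_metadata_prompt is a hypothetical helper, and the payload includes only the pipeline fields shown in the curl example.

```python
import json


def build_metadata_prompt(fields):
    """Render {field: description} as a bulleted extraction prompt."""
    lines = ["Extract the following metadata:"]
    lines += [f"- {name}: {description}" for name, description in fields.items()]
    return "\n".join(lines)


fields = {
    "title": "Document title",
    "author": "Author name",
    "date": "Publication or creation date",
    "summary": "Brief 2-3 sentence summary",
}

# Pipeline fields taken from the curl example; other required fields are omitted.
payload = {
    "name": "Docs with Metadata",
    "isMetadataPrompt": True,
    "metadataParsingPrompt": build_metadata_prompt(fields),
}
print(json.dumps(payload, indent=2))
```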

Metadata Response Format#

Extracted metadata is stored on the document under the metadata key:

{
  "id": "doc-uuid",
  "filename": "api-guide.pdf",
  "metadata": {
    "title": "IngestIQ API Reference",
    "author": "Engineering Team",
    "version": "2.0",
    "date": "2024-01-15",
    "keywords": ["API", "REST", "authentication", "search"],
    "summary": "Complete API reference for IngestIQ platform..."
  }
}
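When consuming a document like this, it helps to read fields defensively, since a field the model could not find may be absent. A minimal sketch, assuming the document JSON has already been retrieved (how it is fetched is not covered here) and using a hypothetical read_metadata helper:

```python
import json

document_json = """
{
  "id": "doc-uuid",
  "filename": "api-guide.pdf",
  "metadata": {
    "title": "IngestIQ API Reference",
    "author": "Engineering Team",
    "version": "2.0",
    "date": "2024-01-15",
    "keywords": ["API", "REST", "authentication", "search"],
    "summary": "Complete API reference for IngestIQ platform..."
  }
}
"""


def read_metadata(document):
    """Return the metadata dict, tolerating documents without one."""
    return document.get("metadata") or {}


document = json.loads(document_json)
metadata = read_metadata(document)

# Use .get() with a fallback because any extracted field may be missing.
title = metadata.get("title", document["filename"])
keywords = metadata.get("keywords", [])
print(f"{title}: {', '.join(keywords)}")
```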

Prompt Engineering Tips#

Be Specific#

Too vague:

Parse this document nicely.

Specific:

Parse this document by:
1. Splitting at heading boundaries
2. Keeping paragraphs together
3. Preserving list items as units

Use Numbered Instructions#

The AI tends to follow numbered instructions more reliably than unstructured prose:

1. First, identify section headings
2. Group content under each heading
3. Keep code blocks with explanations
4. Preserve table structure
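If prompts are kept in code, the numbering can be generated so that it stays correct when rules are added or reordered. A small sketch; numbered_prompt and the intro line are illustrative, not part of IngestIQ:

```python
def numbered_prompt(intro, rules):
    """Join an intro line and a list of rules into a numbered prompt."""
    numbered = [f"{i}. {rule}" for i, rule in enumerate(rules, start=1)]
    return "\n".join([intro, ""] + numbered)


prompt = numbered_prompt(
    "Parse this document:",
    [
        "First, identify section headings",
        "Group content under each heading",
        "Keep code blocks with explanations",
        "Preserve table structure",
    ],
)
print(prompt)
```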

Include Examples#

For metadata extraction, show the expected output format:

Extract metadata in this format:
{
  "title": "...",
  "date": "YYYY-MM-DD",
  "keywords": ["keyword1", "keyword2"]
}
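One way to keep the embedded format example valid JSON is to serialize it from a Python dict instead of typing it by hand. A sketch under that assumption, with the template values copied from the example above:

```python
import json

# Illustrative template; the field names match the example above.
expected_format = {
    "title": "...",
    "date": "YYYY-MM-DD",
    "keywords": ["keyword1", "keyword2"],
}

metadata_prompt = (
    "Extract metadata in this format:\n"
    + json.dumps(expected_format, indent=2)
)
print(metadata_prompt)
```

The resulting string can be used as the value of metadataParsingPrompt.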

Handle Edge Cases#

Specify what to do with edge cases:

If no author is specified, use "Unknown".
If date cannot be determined, omit the field.
For multi-part documents, include part numbers.
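Edge-case instructions are easier to trust when the output is checked against them. The sketch below is a client-side check, not something IngestIQ is known to perform; check_metadata is hypothetical, and the YYYY-MM-DD date rule is an assumption borrowed from the format example earlier on this page.

```python
from datetime import datetime


def check_metadata(metadata):
    """Return a list of problems found in extracted metadata.

    The rules mirror the edge-case instructions above; they are assumptions
    about your own prompt, not checks that IngestIQ performs.
    """
    problems = []
    # The prompt asked for "Unknown" when no author is given, so the
    # field should never be missing or empty.
    if not metadata.get("author"):
        problems.append("author missing (expected a name or 'Unknown')")
    # The date may be omitted, but if present it should parse as YYYY-MM-DD.
    if "date" in metadata:
        try:
            datetime.strptime(metadata["date"], "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append(f"date not in YYYY-MM-DD format: {metadata['date']!r}")
    return problems


print(check_metadata({"author": "Unknown"}))
print(check_metadata({"date": "Jan 2024"}))
```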

Best Practices#

Create test pipelines with different prompts:

1. Create a test pipeline
2. Upload a sample document
3. Review the chunks and metadata
4. Iterate on the prompt
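The first step of that loop can be sketched as generating one pipeline payload per prompt variant. The variant labels and the variant_payloads helper are hypothetical; only the name and parsingPrompt fields come from the examples on this page, and the upload and review steps are left as comments because their endpoints are not shown here.

```python
import json

# Prompt variants to compare; the labels are arbitrary.
variants = {
    "v1-basic": "Parse this technical documentation.",
    "v2-code": "Parse this technical documentation. Preserve all code blocks exactly.",
}


def variant_payloads(variants, base_name="Prompt Test"):
    """Return one pipeline payload per prompt variant, labeled in the name."""
    return [
        {"name": f"{base_name} ({label})", "parsingPrompt": prompt}
        for label, prompt in variants.items()
    ]


payloads = variant_payloads(variants)
for payload in payloads:
    # POST each payload to /api/v2/knowledgebases/{kbId}/pipelines, upload the
    # same sample document to every pipeline, then compare the chunks.
    print(json.dumps(payload))
```

Putting the variant label in the pipeline name makes it clear which prompt produced which chunks when results are compared.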

Record each prompt change in the pipeline description or in external docs so that results can be traced back to a prompt version.

Start with a basic prompt and add specificity only where the results fall short.

Use a separate pipeline for each document type, each with its own tailored prompt.

Common Issues#

Issue	Cause	Solution
Chunks too small	Overly aggressive splitting	Add "keep related content together"
Missing code	Code not preserved	Add "preserve code blocks exactly"
Wrong metadata	Ambiguous instructions	Use specific field names and formats
Inconsistent results	Vague prompts	Use numbered, specific instructions
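The two rows whose solution is a one-line addition can be applied mechanically. This is an illustrative sketch only; FIXES and apply_fixes are hypothetical names, and the remaining rows still call for rewriting the prompt by hand.

```python
# Issue -> instruction to add, taken from the table above.
FIXES = {
    "chunks too small": "Keep related content together.",
    "missing code": "Preserve code blocks exactly.",
}


def apply_fixes(prompt, issues):
    """Append the suggested instruction for each observed issue, once."""
    for issue in issues:
        fix = FIXES.get(issue.lower())
        # Skip unknown issues and instructions the prompt already contains.
        if fix and fix.lower() not in prompt.lower():
            prompt = f"{prompt.rstrip()} {fix}"
    return prompt


print(apply_fixes("Parse this document.", ["Missing code"]))
```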