Custom Prompts
Fine-tune document parsing and metadata extraction
Overview
Custom prompts let you control how IngestIQ parses your documents. You can customize:
- How documents are chunked
- What metadata is extracted
Parsing Prompts
Parsing prompts tell the AI how to split your documents into chunks.
Setting a Parsing Prompt
```bash
curl -X POST http://localhost:3000/api/v2/knowledgebases/{kbId}/pipelines \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Technical Docs Pipeline",
    "parsingPrompt": "Parse this technical documentation. Preserve all code blocks exactly. Keep function documentation with the code. Group related concepts together.",
    ...
  }'
```
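For scripted setups, the same request can be built with Python's standard library. This is a minimal sketch, assuming the local base URL and token placeholder from the curl example; build_pipeline_request is a hypothetical helper. Because the curl body elides its remaining fields with "...", the sketch sends only name and parsingPrompt, which your server may reject if it requires more.

```python
import json
import urllib.request

# Assumption: a local IngestIQ instance, as in the curl example.
BASE_URL = "http://localhost:3000/api/v2"


def build_pipeline_request(kb_id: str, token: str, name: str, parsing_prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a pipeline with a parsing prompt."""
    body = {
        "name": name,
        "parsingPrompt": parsing_prompt,
        # The source elides other fields ("..."); add whatever your pipeline needs.
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/knowledgebases/{kb_id}/pipelines",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_pipeline_request(
    kb_id="my-kb-id",  # hypothetical knowledge base ID
    token="YOUR_JWT_TOKEN",
    name="Technical Docs Pipeline",
    parsing_prompt="Parse this technical documentation. Preserve all code blocks exactly.",
)
# To send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call.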
Example Prompts by Document Type
Technical documentation:
```text
Parse this technical documentation:
1. Keep code blocks complete and unmodified
2. Group explanations with their code examples
3. Preserve command-line examples exactly
4. Keep API endpoints with their descriptions
5. Maintain hierarchical structure (h1 > h2 > h3)
```
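Rule 1 in this prompt is easy to verify after a test run. The sketch below reports every fenced code block in the source that no single chunk contains intact. It assumes the source is Markdown with fenced code blocks and that you already have the resulting chunks as a list of strings (how you fetch chunks from IngestIQ is not covered here); broken_code_blocks is a hypothetical helper.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way so it can't close this snippet's own fence
# Non-greedy match from an opening fence to the next fence, across newlines.
CODE_BLOCK_RE = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)


def broken_code_blocks(source_text: str, chunks: list[str]) -> list[str]:
    """Return fenced code blocks from the source that no single chunk contains intact."""
    blocks = CODE_BLOCK_RE.findall(source_text)
    return [block for block in blocks if not any(block in chunk for chunk in chunks)]


source = f"Install it:\n{FENCE}\npip install example\n{FENCE}\nThen run it."
good_chunks = [f"Install it:\n{FENCE}\npip install example\n{FENCE}", "Then run it."]
bad_chunks = [f"Install it:\n{FENCE}\npip install", f"example\n{FENCE}\nThen run it."]
print(broken_code_blocks(source, good_chunks))  # []
print(broken_code_blocks(source, bad_chunks))   # the split block is reported
```

An empty result means every code block survived chunking in one piece; anything returned is a candidate for a stronger "preserve code blocks" instruction.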
Metadata Extraction
Enable automatic extraction of structured metadata from documents.
Enabling Metadata Extraction
```bash
curl -X POST http://localhost:3000/api/v2/knowledgebases/{kbId}/pipelines \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Docs with Metadata",
    "isMetadataPrompt": true,
    "metadataParsingPrompt": "Extract: title, author, date, keywords, summary (max 200 words)",
    ...
  }'
```
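A client can catch one obvious mistake before sending: turning metadata extraction on without a prompt. This is a minimal sketch, assuming that combination is an error (the source does not say how the server treats it); metadata_pipeline_payload is a hypothetical helper and, like the curl example, leaves out the elided fields.

```python
import json


def metadata_pipeline_payload(name: str, metadata_prompt: str) -> dict:
    """Build the JSON body for a metadata-extracting pipeline (other fields omitted)."""
    if not metadata_prompt.strip():
        # Assumption: isMetadataPrompt=true with a blank prompt is a client-side mistake.
        raise ValueError("metadataParsingPrompt must not be empty when isMetadataPrompt is true")
    return {
        "name": name,
        "isMetadataPrompt": True,
        "metadataParsingPrompt": metadata_prompt,
    }


payload = metadata_pipeline_payload(
    "Docs with Metadata",
    "Extract: title, author, date, keywords, summary (max 200 words)",
)
print(json.dumps(payload, indent=2))  # True is serialized as JSON true
```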
Metadata Prompt Examples
```text
Extract the following metadata:
- title: Document title
- author: Author name
- date: Publication or creation date
- summary: Brief 2-3 sentence summary
```
Metadata Response Format
Extracted metadata is stored in the document's metadata field:
```json
{
  "id": "doc-uuid",
  "filename": "api-guide.pdf",
  "metadata": {
    "title": "IngestIQ API Reference",
    "author": "Engineering Team",
    "version": "2.0",
    "date": "2024-01-15",
    "keywords": ["API", "REST", "authentication", "search"],
    "summary": "Complete API reference for IngestIQ platform..."
  }
}
```
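Once documents carry metadata, client code can filter on it. This sketch parses the example document above and finds documents by keyword; documents_with_keyword is a hypothetical helper, and it reads fields defensively because extraction may omit them (or return no metadata object at all).

```python
import json

# The document JSON shown above, as a client might receive it.
response_text = """
{
  "id": "doc-uuid",
  "filename": "api-guide.pdf",
  "metadata": {
    "title": "IngestIQ API Reference",
    "author": "Engineering Team",
    "version": "2.0",
    "date": "2024-01-15",
    "keywords": ["API", "REST", "authentication", "search"],
    "summary": "Complete API reference for IngestIQ platform..."
  }
}
"""


def documents_with_keyword(documents: list[dict], keyword: str) -> list[str]:
    """Return filenames of documents whose extracted keywords include keyword (case-insensitive)."""
    wanted = keyword.casefold()
    matches = []
    for doc in documents:
        # "or {}" also covers a metadata value of null.
        keywords = (doc.get("metadata") or {}).get("keywords", [])
        if any(k.casefold() == wanted for k in keywords):
            matches.append(doc["filename"])
    return matches


doc = json.loads(response_text)
print(doc["metadata"]["title"])              # IngestIQ API Reference
print(documents_with_keyword([doc], "rest"))  # ['api-guide.pdf']
```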
Prompt Engineering Tips
Be Specific
Too vague:
```text
Parse this document nicely.
```
Specific:
```text
Parse this document by:
1. Splitting at heading boundaries
2. Keeping paragraphs together
3. Preserving list items as units
```
Use Numbered Instructions
The AI tends to follow numbered instructions more reliably than unstructured prose:
```text
1. First, identify section headings
2. Group content under each heading
3. Keep code blocks with explanations
4. Preserve table structure
```
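If you keep prompt instructions as a list in code, rendering them in this numbered form is simple. A sketch with a hypothetical numbered_prompt helper; the intro line "Parse this document:" is illustrative, not part of IngestIQ.

```python
def numbered_prompt(intro: str, steps: list[str]) -> str:
    """Render an intro line followed by steps numbered 1., 2., ... one per line."""
    lines = [intro]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)


prompt = numbered_prompt(
    "Parse this document:",
    [
        "First, identify section headings",
        "Group content under each heading",
        "Keep code blocks with explanations",
        "Preserve table structure",
    ],
)
print(prompt)
```

Storing steps as a list also makes it easy to add, reorder, or drop a single instruction between test runs without retyping the whole prompt.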
Include Examples
For metadata extraction, show the expected output format:
```text
Extract metadata in this format:
{
  "title": "...",
  "date": "YYYY-MM-DD",
  "keywords": ["keyword1", "keyword2"]
}
```
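Because the model can still drift from the requested format, it is worth checking its output. This is a sketch of such a check for the format above; format_problems and is_iso_date are hypothetical helpers, and treating a missing date as acceptable matches the edge-case rule in this guide that an undeterminable date is omitted.

```python
import re
from datetime import datetime


def is_iso_date(value) -> bool:
    """True only for strings shaped YYYY-MM-DD that are also real calendar dates."""
    if not isinstance(value, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")  # rejects impossible dates such as 2024-02-30
    except ValueError:
        return False
    return True


def format_problems(metadata: dict) -> list[str]:
    """List every way the metadata departs from the requested format (empty list means OK)."""
    problems = []
    title = metadata.get("title")
    if not isinstance(title, str) or not title.strip():
        problems.append("title must be a non-empty string")
    # A missing date is allowed: it should be omitted when it cannot be determined.
    if "date" in metadata and not is_iso_date(metadata["date"]):
        problems.append("date must be YYYY-MM-DD")
    keywords = metadata.get("keywords", [])
    if not isinstance(keywords, list) or not all(isinstance(k, str) for k in keywords):
        problems.append("keywords must be a list of strings")
    return problems


print(format_problems({"title": "IngestIQ API Reference", "date": "2024-01-15", "keywords": ["API", "REST"]}))  # []
print(format_problems({"title": "", "date": "15/01/2024", "keywords": "API, REST"}))
```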
Handle Edge Cases
Specify what to do with edge cases:
```text
If no author is specified, use "Unknown".
If date cannot be determined, omit the field.
For multi-part documents, include part numbers.
```
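The first two rules can also be enforced after extraction, in case the model ignores them. This is a minimal sketch, assuming an undetermined author or date comes back as missing, null, or an empty string; apply_edge_case_rules is hypothetical, and the part-number rule is left to the prompt because it cannot be checked from the metadata alone.

```python
def apply_edge_case_rules(metadata: dict) -> dict:
    """Return a copy of metadata with the author and date edge-case rules applied."""
    cleaned = dict(metadata)  # copy so the caller's dict is left untouched
    # "If no author is specified, use 'Unknown'."
    if not cleaned.get("author"):
        cleaned["author"] = "Unknown"
    # "If date cannot be determined, omit the field." Assumes this shows up as None or "".
    if "date" in cleaned and not cleaned["date"]:
        del cleaned["date"]
    return cleaned


raw = {"title": "Release Notes", "author": None, "date": ""}
print(apply_edge_case_rules(raw))  # {'title': 'Release Notes', 'author': 'Unknown'}
```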
Best Practices
Create test pipelines with different prompts and iterate:
1. Create a test pipeline.
2. Upload a sample document.
3. Review the resulting chunks and metadata.
4. Refine the prompts and repeat.
Keep track of prompt changes in your pipeline descriptions or in external docs.
Start with basic prompts and add specificity based on results.
Use separate pipelines for different document types, each with its own tailored prompts.
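For the review step, a few summary numbers make two prompt variants easier to compare than reading every chunk. A sketch with a hypothetical chunk_report helper; it assumes you already have each test pipeline's chunks as strings, and the default 200-character "too small" threshold is arbitrary and should be tuned to your corpus.

```python
from statistics import mean


def chunk_report(chunks: list[str], min_chars: int = 200) -> dict:
    """Summarize chunk sizes; min_chars is an arbitrary 'too small' threshold."""
    lengths = [len(chunk) for chunk in chunks]
    return {
        "count": len(lengths),
        "mean_chars": round(mean(lengths)) if lengths else 0,
        "max_chars": max(lengths, default=0),
        "too_small": sum(1 for n in lengths if n < min_chars),
    }


# Illustrative chunks from two test pipelines run on the same sample document.
variant_a = ["Installation", "Download the package.", "Run the installer and restart the service."]
variant_b = ["Installation. Download the package. Run the installer and restart the service."]
for name, chunks in [("prompt A", variant_a), ("prompt B", variant_b)]:
    print(name, chunk_report(chunks, min_chars=40))
```

A variant with many "too_small" chunks is usually splitting too aggressively, which is the first row of the Common Issues table.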
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| Chunks too small | Overly aggressive splitting | Add "keep related content together" |
| Missing code | Code not preserved | Add "preserve code blocks exactly" |
| Wrong metadata | Ambiguous instructions | Use specific field names and formats |
| Inconsistent results | Vague prompts | Use numbered, specific instructions |
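The first two fixes in this table are concrete phrases, so they can be appended to a prompt automatically between test runs. A sketch with hypothetical issue labels (chunks_too_small, missing_code) and a hypothetical refine_prompt helper; the last two rows call for rewriting the prompt rather than appending to it, so they are left out.

```python
# Prompt additions taken from the Common Issues table. The keys are our own
# hypothetical labels, not IngestIQ identifiers.
FIXES = {
    "chunks_too_small": "Keep related content together.",
    "missing_code": "Preserve code blocks exactly.",
}


def refine_prompt(prompt: str, issues: list[str]) -> str:
    """Append each observed issue's fix once, skipping fixes the prompt already contains."""
    additions = []
    for issue in dict.fromkeys(issues):  # de-duplicate while keeping order
        fix = FIXES[issue]  # raises KeyError for an unknown label, deliberately
        if fix.lower() not in prompt.lower():
            additions.append(fix)
    return " ".join([prompt.rstrip(), *additions]) if additions else prompt


v1 = "Parse this document by splitting at heading boundaries."
v2 = refine_prompt(v1, ["chunks_too_small", "missing_code"])
print(v2)
```

Because a fix already present in the prompt is skipped, running the refinement again on the same issues leaves the prompt unchanged.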