Features
All the features you need
A complete platform to manage your company's data lifecycle
01 · Ingestion
Intelligent Ingestion
Import documents from any source. We handle extraction, cleaning, and preparation.
- Multi-format support: PDF, DOCX, TXT, MD, HTML
- Native connectors: Notion, Google Drive, SharePoint
- Built-in OCR for scanned documents
- Automatic language detection
- Bulk import with queuing
PDF Documents · 1,284 docs
Notion · 47 pages
Google Drive · 312 files
Confluence · 89 pages
GitHub · 12 repos
Raw document: contract_2026.pdf · 2.4 MB
chunk_1 · ~500 tok
chunk_2 · ~500 tok
chunk_3 · ~500 tok
Vectorized: 1536 dim · pgvector
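The split from a raw document into chunks of roughly 500 tokens could look something like the sketch below. It is only an illustration: the function name `chunk_tokens`, the 50-token overlap, and the use of whitespace-separated words as a stand-in for real tokens are all assumptions, not a description of the platform's actual chunker.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 500, overlap: int = 50) -> List[List[str]]:
    """Split `tokens` into windows of at most `size` items.

    Consecutive windows share `overlap` items so that text cut at a
    boundary still appears with some surrounding context.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")

    step = size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window has reached the end of the input.
        if start + size >= len(tokens):
            break
    return chunks


# Whitespace splitting is a crude approximation of a model tokenizer.
words = ("lorem ipsum " * 600).split()  # 1,200 pseudo-tokens
chunks = chunk_tokens(words)
print([len(c) for c in chunks])
```

Each chunk would then be passed to an embedding model; the 1536-dimension figure above depends on which model is used.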
02 · Processing
Advanced Processing
Intelligent chunking, cleaning, and metadata extraction for optimal search.
- Smart chunking with context preservation
- Automatic cleaning and normalization
- Metadata extraction (author, date, tags)
- Content deduplication
- Queue with automatic retry
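As a rough idea of how cleaning and deduplication can work together, the sketch below normalizes whitespace and case and then drops chunks whose normalized text has already been seen. The helper names `normalize` and `deduplicate` are hypothetical, and exact-match hashing is an assumed strategy; the platform may well use something more tolerant, such as near-duplicate detection.

```python
import hashlib
import re
from typing import Iterable, List


def normalize(text: str) -> str:
    """Collapse runs of whitespace, trim the ends, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(chunks: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each chunk, comparing normalized text."""
    seen = set()
    unique: List[str] = []
    for chunk in chunks:
        digest = hashlib.sha256(normalize(chunk).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)  # the original, un-normalized text is kept
    return unique


print(deduplicate(["Our refund policy", "our  refund\npolicy ", "Shipping terms"]))
```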
03 · Search API
Powerful Search
Semantic search with filters, reranking, and a complete RESTful API.
- Vector similarity with pgvector
- Metadata filters (date, author, source)
- Hybrid search (vector + full-text)
- Result reranking
- RESTful API with key authentication
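This page does not say how vector and full-text results are merged for hybrid search. One common technique is reciprocal rank fusion (RRF), sketched below purely as an illustration. The function name and the constant `k = 60` are assumptions, not documented behavior of this API.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document ids into one.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked well by both the vector and the full-text search rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["policies.pdf", "faq.md", "terms.docx"]
fulltext_hits = ["faq.md", "returns.html", "policies.pdf"]
print(reciprocal_rank_fusion([vector_hits, fulltext_hits]))
```

RRF uses only rank positions, so it avoids having to put cosine similarity and full-text relevance scores on a common scale.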
POST /api/v1/search
// Request
{
  "query": "What is our refund policy?",
  "limit": 5,
  "threshold": 0.7
}
// Response
{
  "results": [{
    "content": "Our refund policy...",
    "score": 0.92,
    "document": "policies.pdf"
  }]
}
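A client for the endpoint above might be put together along these lines, using only the Python standard library. The host `https://api.example.com` is a placeholder, and sending the key as an `Authorization: Bearer` header is an assumption: the page mentions key authentication but does not name the header. Only the path, the request fields, and the response fields come from the example.

```python
import json
import urllib.request
from typing import List, Tuple

BASE_URL = "https://api.example.com"  # hypothetical host, replace with your own


def build_search_request(api_key: str, query: str, limit: int = 5,
                         threshold: float = 0.7) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/search request."""
    body = json.dumps({"query": query, "limit": limit, "threshold": threshold})
    return urllib.request.Request(
        BASE_URL + "/api/v1/search",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed header scheme
        },
    )


def top_results(response_body: str) -> List[Tuple[str, float]]:
    """Extract (document, score) pairs from a JSON response body."""
    payload = json.loads(response_body)
    return [(r["document"], r["score"]) for r in payload["results"]]


request = build_search_request("YOUR_API_KEY", "What is our refund policy?")
# To send it: urllib.request.urlopen(request).read().decode("utf-8")
```

Separating request construction from sending lets the payload be inspected or logged before any network call is made.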