extract endpoint.
Key Changes
Deprecated: quality parameter
The quality parameter previously used with values "low" and "high" for both PDF and MP4 file extraction has been deprecated.
New: model parameter
The new model parameter replaces quality and provides more granular control over the extraction process with different model options.
New: Enhanced processing options for video files
For MP4 files, chunking preferences previously set through the quality parameter are now explicitly configured in the processing_options parameter.
Migration Table
| File Type | Old Approach | New Approach | Notes |
|---|---|---|---|
| PDF | quality="low" | model="aurelio-base" | Fastest, cheapest option for clean PDFs |
| PDF | quality="high" | model="docling-base" | Code-based OCR method for high precision |
| PDF | - | model="gemini-2-flash-lite" | New! State-of-the-art VLM-based extraction |
| MP4 | quality="low", chunk=True | model="aurelio-base", processing_options={"chunking": {"chunker_type": "regex"}} | Basic chunking for videos |
| MP4 | quality="high", chunk=True | model="aurelio-base", processing_options={"chunking": {"chunker_type": "semantic"}} | Semantic chunking for videos |
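
The table can also be read as a lookup. The sketch below is a hypothetical helper, not part of the SDK, that translates old-style keyword arguments into their v0.0.19+ equivalents. The function name `migrate_extract_kwargs` and its `file_type` argument are assumptions made for illustration; only the parameter values come from the table.

```python
from typing import Any, Dict

# Old quality value -> new model, per the PDF rows of the migration table.
PDF_MODEL_BY_QUALITY = {"low": "aurelio-base", "high": "docling-base"}

# Old quality value -> chunker_type, per the MP4 rows of the migration table.
MP4_CHUNKER_BY_QUALITY = {"low": "regex", "high": "semantic"}


def migrate_extract_kwargs(file_type: str, **old_kwargs: Any) -> Dict[str, Any]:
    """Return a copy of old_kwargs with `quality` replaced by the new parameters.

    Hypothetical helper: all other arguments are passed through unchanged.
    """
    new_kwargs = dict(old_kwargs)
    quality = new_kwargs.pop("quality", None)
    if quality is None:
        return new_kwargs  # nothing to migrate
    if quality not in ("low", "high"):
        raise ValueError(f"unknown quality value: {quality!r}")

    kind = file_type.lower()
    if kind == "pdf":
        new_kwargs["model"] = PDF_MODEL_BY_QUALITY[quality]
    elif kind == "mp4":
        new_kwargs["model"] = "aurelio-base"
        # The table only covers the chunk=True case for videos.
        if new_kwargs.get("chunk"):
            new_kwargs["processing_options"] = {
                "chunking": {"chunker_type": MP4_CHUNKER_BY_QUALITY[quality]}
            }
    else:
        raise ValueError(f"unsupported file type: {file_type!r}")
    return new_kwargs
```

The helper never selects gemini-2-flash-lite, because no old quality value maps to it; opting into the VLM model is a deliberate choice rather than a mechanical migration.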
Code Examples
PDF Extraction
Before (pre-v0.0.19):
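
A minimal sketch of the old call style, assuming `client` is an already-configured SDK client object that exposes `extract_file`. The wrapper name and the `file_path` keyword are assumptions for illustration; only `quality` comes from the migration table.

```python
def extract_pdf_legacy(client, file_path: str):
    """Pre-v0.0.19 style: extraction fidelity is selected with `quality`."""
    # `client` is assumed to expose extract_file(); `file_path` is an
    # assumed keyword name, not confirmed by this guide.
    return client.extract_file(
        file_path=file_path,
        quality="low",  # deprecated: accepted values were "low" and "high"
    )
```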
After (v0.0.19+):
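
A minimal sketch of the new call style, assuming `client` is an already-configured SDK client object that exposes `extract_file`. The wrapper name and the `file_path` keyword are assumptions for illustration; the `model` values are the ones listed in the migration table.

```python
def extract_pdf(client, file_path: str, model: str = "aurelio-base"):
    """v0.0.19+ style: the extraction method is selected with `model`."""
    # `client` is assumed to expose extract_file(); `file_path` is an
    # assumed keyword name, not confirmed by this guide.
    # Other documented choices: "docling-base", "gemini-2-flash-lite".
    return client.extract_file(
        file_path=file_path,
        model=model,
    )
```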
Video Extraction
Before (pre-v0.0.19):
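
A minimal sketch of the old video call, assuming `client` is an already-configured SDK client object that exposes `extract_file`. The wrapper name and the `file_path` keyword are assumptions for illustration; `quality` and `chunk` come from the migration table.

```python
def extract_video_legacy(client, file_path: str):
    """Pre-v0.0.19 style: `quality` implicitly selected the chunking strategy."""
    # `client` is assumed to expose extract_file(); `file_path` is an
    # assumed keyword name, not confirmed by this guide.
    return client.extract_file(
        file_path=file_path,
        quality="low",  # deprecated: "low" gave basic chunking, "high" semantic
        chunk=True,
    )
```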
After (v0.0.19+):
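
A minimal sketch of the new video call, assuming `client` is an already-configured SDK client object that exposes `extract_file`. The wrapper name, the `semantic` flag, and the `file_path` keyword are assumptions for illustration; the `model` and `processing_options` values come from the migration table.

```python
def extract_video(client, file_path: str, semantic: bool = False):
    """v0.0.19+ style: chunking is configured explicitly in processing_options."""
    # "regex" replaces the old quality="low" behaviour,
    # "semantic" replaces the old quality="high" behaviour.
    chunker_type = "semantic" if semantic else "regex"
    # The migration table lists only these two new arguments; whether
    # chunk=True should still be passed alongside them is not stated here.
    return client.extract_file(
        file_path=file_path,  # assumed keyword name
        model="aurelio-base",
        processing_options={"chunking": {"chunker_type": chunker_type}},
    )
```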
URL Extraction
The changes for extract_url are identical to those for extract_file: replace the quality parameter with the appropriate model parameter, and for videos, specify chunking preferences in processing_options.
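
As a rough illustration, the same substitution applied to a URL call might look like the sketch below. It assumes `client` is an already-configured SDK client object that exposes `extract_url`; the wrapper name and the `url` keyword are assumptions, not confirmed by this guide.

```python
def extract_pdf_from_url(client, url: str, model: str = "docling-base"):
    """v0.0.19+ style URL extraction: `model` takes the place of `quality`."""
    # Before v0.0.19 this call would have passed quality="high" instead.
    return client.extract_url(
        url=url,  # assumed keyword name
        model=model,
    )
```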
VLM-based Extraction (New Feature)
The new gemini-2-flash-lite model uses a Vision Language Model to process PDF content, offering state-of-the-art accuracy. This can be especially valuable for:
- Scanned documents with complex layouts
- Documents with tables, charts, and diagrams
- Documents where context and visual understanding are important
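
One way to act on these criteria is a small selection function. The sketch below is a hypothetical heuristic, not an SDK feature: the function name and its boolean flags are assumptions, and the thresholds for when a document counts as "scanned" or "visual" are left to the caller.

```python
def choose_pdf_model(
    scanned: bool = False,
    has_visual_elements: bool = False,
    needs_high_precision: bool = False,
) -> str:
    """Pick a model name for a PDF based on coarse document traits."""
    # Scans, tables, charts, and diagrams are where the VLM is described
    # as most valuable.
    if scanned or has_visual_elements:
        return "gemini-2-flash-lite"
    # Code-based OCR for high precision on other documents.
    if needs_high_precision:
        return "docling-base"
    # Fastest, cheapest option for clean PDFs.
    return "aurelio-base"
```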
Pricing Considerations
- aurelio-base pricing is equivalent to the old "low" quality setting
- Both docling-base and gemini-2-flash-lite are priced equivalent to the old "high" quality setting
Transition Period
The quality parameter will continue to work during a transition period but will be removed in a future release. We recommend updating your code to use the new model parameter as soon as possible.
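
To find remaining uses of quality in your own codebase during the transition, one option is a small guard around your call sites. The sketch below is a hypothetical helper, not part of the SDK; it relies only on Python's standard warnings module.

```python
import warnings


def check_legacy_kwargs(**kwargs) -> bool:
    """Warn and return True if the deprecated `quality` argument is present."""
    if "quality" in kwargs:
        warnings.warn(
            "`quality` is deprecated; pass `model` instead "
            "(and `processing_options` for video chunking).",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
        return True
    return False
```

Running a test suite with `python -W error::DeprecationWarning` turns these warnings into failures, which makes leftover call sites easy to locate.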
