What are the differences between `vlm_pipeline_model_local` and `picture_description_local` in Docling, and how do image descriptions, OCR, and table extraction work together? Also, how do the `include_annotations` and `mark_annotations` properties affect exported output?
Type: Answer
Status: Published
Created: Dec 23, 2025
Updated: Apr 10, 2026
Created by: Dosu Bot
Updated by: Dosu Bot

Classification Filters for Picture Descriptions

Deprecation Notice:

  • The parameters picture_description_local and picture_description_api are deprecated. Use picture_description_preset or picture_description_custom_config instead to specify picture description models and options.
  • Similarly, for VLM pipelines, use vlm_pipeline_preset or vlm_pipeline_custom_config instead of the deprecated vlm_pipeline_model, vlm_pipeline_model_local, or vlm_pipeline_model_api fields.

For migration:

  • Use picture_description_preset to select a stable, admin-controlled preset (e.g., granite_vision, default).
  • Use picture_description_custom_config to specify a custom model and engine configuration (if allowed by your admin).
  • For VLM pipelines, use vlm_pipeline_preset or vlm_pipeline_custom_config in the same way.
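When request options are built as plain dictionaries, the migration can be sketched as a small helper. This is a minimal sketch, assuming the options are sent as a flat dict and that a preset is referenced by its name as a string; migrate_options is a hypothetical helper, not part of Docling or docling-serve, and the legacy config value is a placeholder.

```python
from typing import Any, Dict, Optional, Tuple

# Deprecated fields named in the notice above, grouped by their replacement.
DEPRECATED_PICTURE_FIELDS: Tuple[str, ...] = ("picture_description_local", "picture_description_api")
DEPRECATED_VLM_FIELDS: Tuple[str, ...] = (
    "vlm_pipeline_model",
    "vlm_pipeline_model_local",
    "vlm_pipeline_model_api",
)


def migrate_options(
    options: Dict[str, Any],
    picture_preset: Optional[str] = None,
    vlm_preset: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: replace deprecated model fields with preset fields."""
    migrated = dict(options)  # never mutate the caller's dict
    for fields, new_key, preset in (
        (DEPRECATED_PICTURE_FIELDS, "picture_description_preset", picture_preset),
        (DEPRECATED_VLM_FIELDS, "vlm_pipeline_preset", vlm_preset),
    ):
        found = [name for name in fields if name in migrated]
        if found and preset is None:
            # Refuse to silently drop a model choice without a replacement.
            raise ValueError(f"no replacement preset given for {found}")
        for name in found:
            del migrated[name]
        if preset is not None:
            migrated[new_key] = preset
    return migrated


legacy = {
    "do_picture_description": True,
    "picture_description_local": {"repo_id": "<model repo>"},  # placeholder value
}
print(migrate_options(legacy, picture_preset="granite_vision"))
```

Raising an error, rather than quietly deleting a legacy field, keeps a model choice from disappearing without a replacement preset.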

Administrator Configuration for Custom Configs:
Custom configuration options are disabled by default and must be explicitly enabled by administrators via environment variables:

  • DOCLING_SERVE_ALLOW_CUSTOM_VLM_CONFIG - enables vlm_pipeline_custom_config
  • DOCLING_SERVE_ALLOW_CUSTOM_PICTURE_DESCRIPTION_CONFIG - enables picture_description_custom_config
  • DOCLING_SERVE_ALLOW_CUSTOM_CODE_FORMULA_CONFIG - enables code_formula_custom_config
  • DOCLING_SERVE_ALLOW_CUSTOM_TABLE_STRUCTURE_CONFIG - enables custom table structure configuration
  • DOCLING_SERVE_ALLOW_CUSTOM_LAYOUT_CONFIG - enables custom layout configuration
  • DOCLING_SERVE_ALLOW_CUSTOM_PICTURE_CLASSIFICATION_CONFIG - enables custom picture classification configuration
  • DOCLING_SERVE_ALLOW_CUSTOM_OCR_CONFIG - enables custom OCR configuration

These all default to false. A request that uses a custom configuration parameter whose flag has not been enabled by an administrator is rejected with an HTTP 422 error response. Confirm with your administrator that the relevant flags are enabled before using custom configuration options. For more details on these settings, see docs/configuration.md.
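The gating rule can be sketched as a preflight check, for example in a deployment script that inspects the server's environment before requests are sent. This is a minimal sketch: fields_rejected_by_server is a hypothetical helper, it covers only the three request fields the list above names explicitly, and the set of accepted truthy spellings is an assumption; docling-serve's own parsing of these variables may differ.

```python
import os
from typing import Dict, List, Mapping

# Request field -> environment variable that unlocks it (from the list above).
CUSTOM_CONFIG_FLAGS: Dict[str, str] = {
    "vlm_pipeline_custom_config": "DOCLING_SERVE_ALLOW_CUSTOM_VLM_CONFIG",
    "picture_description_custom_config": "DOCLING_SERVE_ALLOW_CUSTOM_PICTURE_DESCRIPTION_CONFIG",
    "code_formula_custom_config": "DOCLING_SERVE_ALLOW_CUSTOM_CODE_FORMULA_CONFIG",
}


def _is_enabled(value: str) -> bool:
    # Assumption: common truthy spellings; unset or anything else means false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def fields_rejected_by_server(options: Mapping[str, object], env: Mapping[str, str]) -> List[str]:
    """Hypothetical helper: list custom-config fields the server would refuse (HTTP 422)."""
    rejected: List[str] = []
    for field_name, env_var in CUSTOM_CONFIG_FLAGS.items():
        if options.get(field_name) is not None and not _is_enabled(env.get(env_var, "false")):
            rejected.append(field_name)
    return rejected


request_options = {"picture_description_custom_config": {"<custom config>": "..."}}
# On the server host, check against the real environment:
print(fields_rejected_by_server(request_options, os.environ))
```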

The new options provide more flexibility and future-proofing for model selection and configuration. Deprecated fields may be removed in a future release.

The classification-based filtering described below is unchanged and works with the new preset and custom config options.

Picture Classification Categories:
The DocumentFigureClassifier-v2.5 model supports a broad range of classification categories (15+) for identifying different kinds of figures and images, including:

  • Chart types: bar_chart, pie_chart, line_chart, scatter_plot, box_plot
  • Images: photograph, full_page_image, page_thumbnail
  • Maps: geographical_map, topographical_map
  • Engineering: engineering_drawing
  • Chemistry: chemistry_structure
  • Other categories: music, calendar, crossword_puzzle, screenshot_from_computer, screenshot_from_manual, table
  • And additional types for various document figures

These classification labels are available for filtering and working with different types of figures/images in your document processing pipeline.

Classification-Based Picture Description Filters:
Docling provides fine-grained control over which pictures receive descriptions based on their classification results and confidence scores. Both PictureDescriptionLocal and PictureDescriptionApi configurations support the following filtering parameters:

  • classification_allow (List[PictureClassificationLabel] or None): Allow-list of predicted classes. When set, only pictures whose predicted class matches one of these labels receive descriptions.

  • classification_deny (List[PictureClassificationLabel] or None): Deny-list of predicted classes. When set, pictures whose predicted class matches any of these labels do not receive descriptions.

  • classification_min_confidence (float): Minimum classification confidence a picture's predicted class must reach before the picture can be described. Use it to skip pictures whose category the classifier is unsure about.

These parameters give you precise control over which images are processed for description generation, helping optimize performance and focus on the most relevant visual content in your documents.
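How the three filters combine can be sketched in plain Python. This is an illustrative stand-in, not Docling's implementation: DescriptionFilter is a hypothetical class that uses plain label strings, and three behaviors are assumptions: the deny-list wins when a label appears in both lists, the confidence check is inclusive (>=), and the default threshold is 0.0.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DescriptionFilter:
    """Hypothetical stand-in for the classification_* fields (not Docling's class)."""

    classification_allow: Optional[Sequence[str]] = None
    classification_deny: Optional[Sequence[str]] = None
    classification_min_confidence: float = 0.0  # assumed default

    def should_describe(self, label: str, confidence: float) -> bool:
        # Below the confidence floor: skip regardless of the label.
        if confidence < self.classification_min_confidence:
            return False
        # A label on the deny-list is always skipped.
        if self.classification_deny is not None and label in self.classification_deny:
            return False
        # With an allow-list, only listed labels pass.
        if self.classification_allow is not None:
            return label in self.classification_allow
        return True


charts_only = DescriptionFilter(
    classification_allow=["bar_chart", "pie_chart", "line_chart"],
    classification_min_confidence=0.8,
)
print(charts_only.should_describe("bar_chart", 0.93))   # True
print(charts_only.should_describe("bar_chart", 0.55))   # False: confidence too low
print(charts_only.should_describe("photograph", 0.99))  # False: not on the allow-list
```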

Chart Extraction Enrichment

You can now extract structured tabular data from bar, pie, and line charts using the chart extraction enrichment model. This feature uses a vision-language model to convert supported chart images into CSV data, which is then parsed into table metadata.

  • To enable chart extraction, use the --enrich-chart-extraction CLI flag or set do_chart_extraction=True in pipeline options.
  • Chart extraction runs after picture classification and only processes images classified as bar_chart, pie_chart, or line_chart.
  • When chart extraction is enabled, picture classification is automatically turned on (since chart type predictions are required).
  • Extracted table data is stored in the tabular_chart field of the image metadata and can be exported or used in downstream processing.
  • Non-chart images are unaffected by this enrichment step.
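The step of turning the model's CSV output into a table can be illustrated with the standard csv module. This is a minimal sketch under stated assumptions: csv_to_table and its dict output are hypothetical and do not mirror the structure Docling stores in tabular_chart, and the sample CSV is made up for the example.

```python
import csv
import io
from typing import Dict, List, Union


def csv_to_table(csv_text: str) -> Dict[str, Union[int, List[List[str]]]]:
    """Hypothetical helper: parse CSV text into a rectangular grid of cells."""
    rows = [row for row in csv.reader(io.StringIO(csv_text.strip())) if row]
    width = max((len(row) for row in rows), default=0)
    # Pad ragged rows so every row has the same number of cells.
    grid = [row + [""] * (width - len(row)) for row in rows]
    return {"num_rows": len(grid), "num_cols": width, "grid": grid}


sample = """Quarter,Revenue
Q1,10
Q2,14
"""
table = csv_to_table(sample)
print(table["num_rows"], table["num_cols"])  # 3 2
```

Padding ragged rows is a defensive choice: model-generated CSV is not guaranteed to be well formed.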

Example CLI usage:

docling convert --enrich-chart-extraction ...

Note: Chart extraction is independent from image description and OCR. You can enable any combination of these enrichment steps as needed for your workflow.

Model Variants and Output Formats:
The chart_extraction_options field provides fine-grained control over chart extraction behavior:

  • Model variants:
    • granite-vision-v4 (GraniteVision 4.0): Default model, based on ibm-granite/granite-4.0-3b-vision
    • granite-vision (GraniteVision 3.3): Alternative model, based on ibm-granite/granite-vision-3.3-2b-chart2csv-preview
  • Output formats:
    • chart2csv: Extract chart data to CSV/table format (enabled by default)
    • chart2code: Generate Python code to recreate the chart (disabled by default)
    • chart2summary: Generate natural-language description of the chart (disabled by default)

Example configuration:

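from docling.datamodel.chart_extraction_options import ChartExtractionModelKind, ChartExtractionModelOptions
from docling.datamodel.pipeline_options import PdfPipelineOptions

# Assumes the chart extraction fields are available on PdfPipelineOptions in your Docling version
pipeline_options = PdfPipelineOptions()
pipeline_options.do_chart_extraction = True
pipeline_options.chart_extraction_options = ChartExtractionModelOptions(
    model=ChartExtractionModelKind.GRANITE_VISION_V4,  # or GRANITE_VISION for GraniteVision 3.3
    chart2csv=True,       # chart data as CSV/table (default)
    chart2code=False,     # Python code that recreates the chart
    chart2summary=False,  # natural-language summary of the chart
)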

Export Functions and Picture Handling

Export Parameters:
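The export_to_markdown() and export_to_text() methods accept the following keyword parameters; some apply to Markdown export only:

  • include_annotations (bool): Whether to include picture annotations, such as generated picture descriptions, in the exported output (default: True; Markdown only, not applicable to text export)
  • mark_annotations (bool): Whether to wrap exported annotations in explicit markers that set them apart from the document's own text (default: False); it has no effect when include_annotations is False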
  • compact_tables (bool): Whether to use compact table format without column padding (default: False, Markdown only)
  • traverse_pictures (bool): Whether to traverse into picture items and serialize their text children (default: False)
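The interaction between include_annotations and mark_annotations can be shown with a toy serializer. This is a hypothetical model, not docling-core code: ToyPicture and export_picture are invented for illustration, and the "[annotation]" prefix stands in for whatever markers your docling-core version actually emits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyPicture:
    """Hypothetical picture: a caption plus annotation texts (e.g. descriptions)."""

    caption: str
    annotations: List[str] = field(default_factory=list)


def export_picture(pic: ToyPicture, include_annotations: bool = True,
                   mark_annotations: bool = False) -> str:
    lines = [pic.caption]
    if include_annotations:
        for text in pic.annotations:
            # Invented marker, only to make "marked" output visible.
            lines.append(f"[annotation] {text}" if mark_annotations else text)
    return "\n".join(lines)


pic = ToyPicture("Figure 1: Sales", ["A bar chart of quarterly sales."])
print(export_picture(pic))
print(export_picture(pic, mark_annotations=True))
print(export_picture(pic, include_annotations=False, mark_annotations=True))
```

The last call shows that mark_annotations has nothing to mark once include_annotations is False.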

Handling OCR Text in Scanned/Image-Based PDFs:
When processing scanned or image-based PDFs with force_full_page_ocr=True, the layout model classifies full-page scans as PictureItem nodes. OCR text items are added as children of that picture node in the document tree.

To export OCR text from these documents, you must set traverse_pictures=True when calling export_to_markdown() or export_to_text(). Without this parameter, the export functions will not traverse into picture nodes to retrieve child text items, resulting in empty or incomplete output despite OCR text being present in doc.texts.

Example usage for scanned PDFs:

# Required for scanned/image-based PDFs processed with full-page OCR.
# doc is the DoclingDocument from conversion, e.g. DocumentConverter().convert(source).document
text = doc.export_to_text(traverse_pictures=True)
md = doc.export_to_markdown(traverse_pictures=True)
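Why the default export of a scanned page comes back empty can be modeled with a toy document tree. This is a hypothetical sketch, not Docling's serializer: ToyNode and toy_export_text are invented names, and the tree is reduced to text and picture nodes so that only the traversal rule remains.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a document tree node (not a Docling type)."""

    kind: str  # "text" or "picture"
    text: str = ""
    children: List["ToyNode"] = field(default_factory=list)


def toy_export_text(body: List[ToyNode], traverse_pictures: bool = False) -> str:
    lines: List[str] = []

    def visit(node: ToyNode) -> None:
        if node.kind == "text":
            lines.append(node.text)
        elif node.kind == "picture" and traverse_pictures:
            # Descend into a picture's children only when asked to.
            for child in node.children:
                visit(child)

    for node in body:
        visit(node)
    return "\n".join(lines)


# A scanned page: one full-page picture whose OCR text lives in its children.
scan = ToyNode("picture", children=[ToyNode("text", "Invoice 123"), ToyNode("text", "Total: 40.00")])
print(repr(toy_export_text([scan])))  # ''
print(toy_export_text([scan], traverse_pictures=True))
```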

Annotation Export Configuration:

  • The preferred way to configure vision-language models (VLM) and picture description models is via the vlm_pipeline_preset, vlm_pipeline_custom_config, picture_description_preset, and picture_description_custom_config fields. The legacy fields (vlm_pipeline_model, vlm_pipeline_model_local, vlm_pipeline_model_api, picture_description_local, picture_description_api) are deprecated and will be removed in a future release.
  • Annotation export controls (_keep_deprecated_annotations, include_annotations, mark_annotations) behave the same with both the new and legacy configuration fields. For future compatibility, migrate to the new preset/custom config fields.
  • The _keep_deprecated_annotations option controls whether the deprecated annotations attribute is populated in picture description and classification results. This option defaults to True for backward compatibility but may change to False in a future release.
  • For more details on the new configuration fields and migration, see the API usage documentation and model management guide.