Interactive HTML Evaluation Reports

The interactive HTML reports generated by the visualization module provide a shareable summary of LLM experiment results. They are aimed at both technical and non-technical audiences and combine visualizations, filtering, and export controls in a responsive layout. Reports are generated by the create_interactive_html_report function, which writes a single HTML file with embedded data and interactive controls [source].

Dashboard Layout

The report dashboard consists of several key sections:

Header: Displays the report title and action buttons for toggling dark/light mode, exporting data as JSON or CSV, and other controls.
Summary Statistics Grid: Shows high-level metrics such as the number of experiments, total results, success rate, average accuracy, average latency, total tokens, number of models, and number of probes.
Filters: Provides dropdowns for model, probe, and provider, as well as a text search input. These filters apply to the results table and model comparison cards.
Visualizations Section: Contains interactive Plotly charts organized in tabs.
Model Comparison Section: Displays cards for each model, summarizing average accuracy, latency, success rate, and total tokens.
Results Table: A sortable, paginated table listing experiment results, with expandable rows for detailed metrics and sample results [source].
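As a rough illustration of how the values in the summary statistics grid relate to the underlying results, the sketch below aggregates a small list of result records. The record fields (`model`, `probe`, `status`, `accuracy`, `latency_ms`, `tokens`) and the `summarize` helper are hypothetical and are not taken from the insideLLMs schema.

```python
from statistics import mean

# Hypothetical result records; the field names are illustrative only.
results = [
    {"model": "model-a", "probe": "logic", "status": "success",
     "accuracy": 0.9, "latency_ms": 120.0, "tokens": 300},
    {"model": "model-a", "probe": "bias", "status": "error",
     "accuracy": None, "latency_ms": 95.0, "tokens": 0},
    {"model": "model-b", "probe": "logic", "status": "success",
     "accuracy": 0.7, "latency_ms": 200.0, "tokens": 500},
]


def summarize(records):
    """Aggregate the kind of high-level metrics a summary grid displays."""
    # Records without an accuracy score (e.g. errors) are left out of the average.
    scored = [r["accuracy"] for r in records if r["accuracy"] is not None]
    successes = sum(1 for r in records if r["status"] == "success")
    return {
        "total_results": len(records),
        "success_rate": successes / len(records) if records else 0.0,
        "avg_accuracy": mean(scored) if scored else None,
        "avg_latency_ms": mean(r["latency_ms"] for r in records) if records else None,
        "total_tokens": sum(r["tokens"] for r in records),
        "num_models": len({r["model"] for r in records}),
        "num_probes": len({r["probe"] for r in records}),
    }


summary = summarize(results)
```

Whether failed runs should count toward the accuracy average is a design choice; this sketch excludes them, which may differ from what the report does.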

Available Charts

Charts are rendered using Plotly and are accessible via tab navigation. The following visualizations are included:

Accuracy Comparison: Bar chart comparing model accuracies, grouped by model or probe. Hovering reveals provider and success rate details.
Latency Distribution: Box, violin, or histogram chart showing response latency distributions per model and probe.
Performance Radar: Radar (spider) chart for multi-metric comparison (accuracy, precision, recall, F1 score) across models.
Performance Heatmap: Heatmap visualizing a chosen metric (e.g., accuracy) across models and probes.
Token Usage: Bar chart displaying total token usage by model and probe.
Status Breakdown: Stacked bar chart summarizing success, error, and other result statuses per model and probe [source].
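A heatmap over models and probes needs the flat results pivoted into a two-dimensional grid first. The sketch below shows one way to do that in plain Python; the `pivot_metric` helper and the record fields are assumptions made for illustration, not the module's internals. The returned lists have the shape that a Plotly heatmap trace expects for its `y`, `x`, and `z` arguments.

```python
def pivot_metric(records, metric="accuracy"):
    """Pivot flat records into a model-by-probe grid for a heatmap.

    Returns (models, probes, grid), where grid[i][j] is the mean of `metric`
    for models[i] on probes[j], or None when that pair has no data.
    """
    buckets = {}
    for record in records:
        value = record.get(metric)
        if value is not None:
            buckets.setdefault((record["model"], record["probe"]), []).append(value)

    models = sorted({r["model"] for r in records})
    probes = sorted({r["probe"] for r in records})
    grid = [
        [
            sum(buckets[(m, p)]) / len(buckets[(m, p)]) if (m, p) in buckets else None
            for p in probes
        ]
        for m in models
    ]
    return models, probes, grid


# Hypothetical records; repeated (model, probe) pairs are averaged.
records = [
    {"model": "model-a", "probe": "logic", "accuracy": 0.9},
    {"model": "model-a", "probe": "logic", "accuracy": 0.7},
    {"model": "model-b", "probe": "bias", "accuracy": 0.5},
]
models, probes, grid = pivot_metric(records)
```

Cells with no data are left as None so they can be rendered as gaps rather than as zero.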

Filtering and Sorting

Filtering controls allow users to refine the displayed data:

Dropdown Filters: Select a specific model, probe, or provider to filter results.
Text Search: Enter keywords to search across all result fields.
Reset Filters: Clear all filters with a single button.

Filters apply to both the results table and the model comparison cards. The results table supports column sorting by clicking on headers, toggling between ascending and descending order with visual indicators. Pagination is provided, displaying 20 rows per page with navigation controls [source].
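The report performs these steps in the browser, but the logic is easy to follow in a short Python sketch: exact-match dropdown filters, a case-insensitive text search across every field, a sort on one column, and 20-row pages. The function names and row fields here are hypothetical and only mirror the behavior described above.

```python
PAGE_SIZE = 20  # rows per page, matching the figure quoted above


def filter_rows(rows, model=None, probe=None, provider=None, query=""):
    """Apply dropdown-style exact matches plus a case-insensitive text search."""
    needle = query.strip().lower()
    selected = {"model": model, "probe": probe, "provider": provider}
    kept = []
    for row in rows:
        # A dropdown left unset (None) does not restrict the result.
        if any(want and row.get(key) != want for key, want in selected.items()):
            continue
        # The text search matches if any field contains the query.
        if needle and not any(needle in str(value).lower() for value in row.values()):
            continue
        kept.append(row)
    return kept


def sort_rows(rows, column, descending=False):
    """Sort by one column, as a click on a table header would."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def paginate(rows, page, page_size=PAGE_SIZE):
    """Return (rows on the 1-indexed page, total number of pages)."""
    total_pages = max(1, -(-len(rows) // page_size))  # ceiling division
    page = min(max(page, 1), total_pages)  # clamp out-of-range pages
    start = (page - 1) * page_size
    return rows[start:start + page_size], total_pages


# Hypothetical rows: 45 results alternating between two models.
rows = [
    {"model": "model-a" if i % 2 == 0 else "model-b",
     "probe": "logic", "provider": "acme", "accuracy": i / 100}
    for i in range(45)
]
visible = sort_rows(filter_rows(rows, model="model-a"), "accuracy", descending=True)
page_rows, total_pages = paginate(visible, page=2)
```

Filtering before sorting and paginating keeps the page count consistent with what is actually visible.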

Rows in the results table can be expanded to reveal detailed experiment information, including metrics and sample results.

Theming Options

The report supports dark and light themes. A toggle button in the header switches between modes, updating CSS variables and Plotly chart backgrounds accordingly. The selected theme is persisted in the browser's localStorage, so user preferences are retained across sessions [source].

Export Functionality

Export buttons in the header allow users to download the experiment data:

Export JSON: Downloads the full embedded experiment data as a JSON file.
Export CSV: Downloads a CSV file containing key metrics and metadata for each experiment.

The export functions operate entirely client-side, extracting the embedded data from the HTML report [source].
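The same JSON-to-CSV conversion can be reproduced outside the browser, for example on a file saved with the Export JSON button. The sketch below is a minimal version using only the standard library; the column names and the `experiments_to_csv` helper are assumptions, since the exact set of exported columns is not listed here.

```python
import csv
import io
import json


def experiments_to_csv(experiments, columns):
    """Write the selected keys of each experiment dict as one CSV row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for experiment in experiments:
        # Missing keys become empty cells; nested fields are simply not selected.
        writer.writerow({col: experiment.get(col, "") for col in columns})
    return buffer.getvalue()


# Hypothetical exported data; the real export may be structured differently.
data = json.loads(
    '[{"model": "model-a", "probe": "logic", "accuracy": 0.9, "samples": [1, 2]},'
    ' {"model": "model-b", "probe": "bias, v2", "accuracy": 0.7}]'
)
csv_text = experiments_to_csv(data, ["model", "probe", "accuracy"])
```

Using the `csv` module rather than joining strings by hand ensures that values containing commas or quotes are escaped correctly.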

Embedding Experiment JSON

All experiment data is embedded directly in the HTML report within a <script id="experiment-data" type="application/json"> tag. This enables:

Client-side interactivity and filtering without server requests.
Easy extraction and reuse of the underlying data for further analysis or sharing.
Export functionality for JSON and CSV.

To extract the embedded JSON, locate the <script id="experiment-data" type="application/json"> tag in the HTML source or use the provided export buttons [source].
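For scripted extraction, a minimal sketch using only the Python standard library could look like the following. It relies solely on the `experiment-data` element id given above; the shape of the decoded JSON (list or object) is not assumed, and the sample HTML is made up for the example.

```python
import json
from html.parser import HTMLParser


class EmbeddedJSONExtractor(HTMLParser):
    """Collect the text content of a <script> element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self._capturing = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.element_id:
            self._capturing = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._capturing = False

    def handle_data(self, data):
        # Script content may arrive in several pieces, so accumulate it.
        if self._capturing:
            self.chunks.append(data)


def extract_embedded_json(html_text, element_id="experiment-data"):
    """Return the decoded JSON embedded in the report's data script tag."""
    parser = EmbeddedJSONExtractor(element_id)
    parser.feed(html_text)
    parser.close()
    if not parser.chunks:
        raise ValueError(f"no <script id={element_id!r}> element found")
    return json.loads("".join(parser.chunks))


# Made-up stand-in for a generated report.
sample_html = (
    '<html><body>'
    '<script id="experiment-data" type="application/json">'
    '[{"model": "model-a", "accuracy": 0.9}]'
    '</script>'
    '<script>console.log("unrelated")</script>'
    '</body></html>'
)
experiments = extract_embedded_json(sample_html)
```

For a real report, read the file as UTF-8 text and pass its contents to `extract_embedded_json`.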

Customizing the Report

The create_interactive_html_report function provides several parameters for customizing the report:

from insideLLMs.visualization import create_interactive_html_report

create_interactive_html_report(
    experiments, # List of ExperimentResult objects
    title="My Custom Report", # Custom report title
    save_path="my_report.html", # Output file name
    include_raw_results=True, # Include the results summary table
    include_individual_results=True, # Include expandable individual results
    embed_plotly_js=False, # Embed Plotly.js (True for offline, False for CDN)
    generated_at=None # Custom generation timestamp (optional)
)

Set title to customize the report heading.
Use save_path to specify the output HTML file name.
Toggle include_raw_results and include_individual_results to control the presence of the results table and expandable details.
Set embed_plotly_js=True to embed Plotly.js for fully offline reports (note: increases file size).
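The report is written as a single HTML file that can be shared directly. With embed_plotly_js=False, the charts load Plotly.js from a CDN and therefore need network access when the file is opened [source].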

Best Practices

The report is responsive and suitable for both desktop and mobile viewing.
Print styles are included for generating hard copies.
For large experiment sets, use filtering and pagination to navigate results efficiently.
Share the HTML file directly; the experiment data is embedded, and generating the report with embed_plotly_js=True lets the charts render offline as well.

For further details, refer to the implementation in insideLLMs/visualization.py and the enhanced interactive HTML report pull request.