Backup Retention Category System in zerobyte#
Overview#
zerobyte implements a comprehensive retention category system that classifies backup snapshots based on their retention policies. The system uses retention category badges to visually indicate which snapshots are being kept and why, helping users understand their backup retention policies at a glance.
Retention Category Types#
Category Definitions#
The system supports six retention categories:
- last (Latest): The most recent snapshots
- hourly: Snapshots retained by the hourly policy
- daily: Snapshots retained by the daily policy
- weekly: Snapshots retained by the weekly policy
- monthly: Snapshots retained by the monthly policy
- yearly: Snapshots retained by the yearly policy
These categories correspond to the retention policy configuration options keepLast, keepHourly, keepDaily, keepWeekly, keepMonthly, and keepYearly. The seventh option, keepWithinDuration, has no category of its own.
Visual Representation#
Each category has a distinct color scheme implemented using Tailwind CSS classes:
- Last: Blue (bg-blue-500/20 text-blue-700)
- Hourly: Cyan (bg-cyan-500/20 text-cyan-700)
- Daily: Green (bg-green-500/20 text-green-700)
- Weekly: Orange (bg-orange-500/20 text-orange-700)
- Monthly: Purple (bg-purple-500/20 text-purple-700)
- Yearly: Red (bg-red-500/20 text-red-700)
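The badge component indexes two lookup tables named categoryColors and categoryLabels. The following is a minimal sketch of what such tables could look like, using the Tailwind classes listed above. The exact definitions in zerobyte are not shown in this document, so the label strings (other than "Latest") and the `badgeClass` helper are assumptions made for illustration.

```typescript
type RetentionCategory = "last" | "hourly" | "daily" | "weekly" | "monthly" | "yearly";

// Tailwind classes per category, copied from the color list above.
const categoryColors: Record<RetentionCategory, string> = {
  last: "bg-blue-500/20 text-blue-700",
  hourly: "bg-cyan-500/20 text-cyan-700",
  daily: "bg-green-500/20 text-green-700",
  weekly: "bg-orange-500/20 text-orange-700",
  monthly: "bg-purple-500/20 text-purple-700",
  yearly: "bg-red-500/20 text-red-700",
};

// Display labels. Assumed values: only "Latest" is implied by the category definitions.
const categoryLabels: Record<RetentionCategory, string> = {
  last: "Latest",
  hourly: "Hourly",
  daily: "Daily",
  weekly: "Weekly",
  monthly: "Monthly",
  yearly: "Yearly",
};

// Hypothetical helper: the full class string for one badge.
function badgeClass(category: RetentionCategory): string {
  return `${categoryColors[category]} border text-xs`;
}
```

Typing both tables as `Record<RetentionCategory, string>` means the compiler flags a missing entry if a new category is ever added to the union.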
Snapshot Analysis and Retention Classification#
Classification Process Overview#
zerobyte leverages restic's native forget command in dry-run mode to determine retention categories rather than reimplementing the retention logic. The restic functionality is implemented in the @zerobyte/core package using a dependency injection pattern with the ResticDeps interface for better testability and modularity. The process involves:
- Fetching retention categories when snapshots are listed
- Running restic forget in dry-run mode
- Parsing restic's output to extract retention reasons
- Mapping reasons to category types
- Attaching categories to snapshot data
Step 1: Fetch Retention Categories#
When snapshots are listed, the repositories controller calls getRetentionCategories in parallel with snapshot listing:
const [res, retentionCategories] = await Promise.all([
  repositoriesService.listSnapshots(id, backupId),
  repositoriesService.getRetentionCategories(id, backupId),
]);
Step 2: Run Restic Forget in Dry-Run Mode#
The getRetentionCategories function executes the analysis:
- Only runs if a scheduleId is provided (snapshots must be associated with a backup schedule)
- Validates the repository exists and fetches it from the database to get the stable database ID
- Checks cache using the database ID (see Cache Retrieval section below)
- Returns an empty Map if the repository is invalid or not found (graceful error handling)
- Retrieves the schedule's retention policy
- Calls restic.forget (from @zerobyte/core/restic/server) with dryRun: true and the --no-lock flag
- Uses --group-by tags to group snapshots and --tag scheduleId to filter by schedule
The restic instance is created in app/server/core/restic.ts using dependency injection, passing in required dependencies like secret resolution, organization password retrieval, and configuration paths.
The restic command includes retention policy arguments:
- --keep-last N
- --keep-hourly N
- --keep-daily N
- --keep-weekly N
- --keep-monthly N
- --keep-yearly N
- --keep-within DURATION (e.g., "7d", "30d"), set from keepWithinDuration
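To make the mapping from policy options to command-line arguments concrete, here is a minimal sketch of an argument builder. The `RetentionPolicy` interface and the `buildForgetArgs` function are hypothetical and are not taken from @zerobyte/core; the flags themselves are standard restic flags, and restic's flag for duration-based retention is `--keep-within`.

```typescript
// Hypothetical policy shape mirroring the configuration option names in this document.
interface RetentionPolicy {
  keepLast?: number;
  keepHourly?: number;
  keepDaily?: number;
  keepWeekly?: number;
  keepMonthly?: number;
  keepYearly?: number;
  keepWithinDuration?: string;
}

// Builds the argument list for a dry-run forget scoped to one schedule tag.
function buildForgetArgs(policy: RetentionPolicy, scheduleId: string): string[] {
  const args = [
    "forget", "--dry-run", "--no-lock", "--json",
    "--group-by", "tags",
    "--tag", scheduleId,
  ];

  // Count-based options and their restic flags, in a fixed order.
  const counted: Array<[number | undefined, string]> = [
    [policy.keepLast, "--keep-last"],
    [policy.keepHourly, "--keep-hourly"],
    [policy.keepDaily, "--keep-daily"],
    [policy.keepWeekly, "--keep-weekly"],
    [policy.keepMonthly, "--keep-monthly"],
    [policy.keepYearly, "--keep-yearly"],
  ];
  for (const [value, flag] of counted) {
    // Skip unset or zero values so no empty rule is passed to restic.
    if (typeof value === "number" && value > 0) {
      args.push(flag, String(value));
    }
  }

  if (policy.keepWithinDuration) {
    args.push("--keep-within", policy.keepWithinDuration);
  }
  return args;
}

const exampleArgs = buildForgetArgs({ keepLast: 3, keepDaily: 7, keepWithinDuration: "30d" }, "abc123");
```

Because `--dry-run` is part of the list, restic only reports which snapshots it would keep or remove; nothing is deleted.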
Step 3: Parse Restic Output#
The parseRetentionCategories function processes the JSON output from restic's dry-run forget command. This function imports the ResticForgetResponse type from @zerobyte/core/restic:
export const parseRetentionCategories = (dryRunResults: ResticForgetResponse) => {
  const categories = new Map<string, RetentionCategory[]>();
  for (const group of dryRunResults) {
    for (const reason of group.reasons) {
      const { short_id } = reason.snapshot;
      const categoryList: RetentionCategory[] = [];
      for (const match of reason.matches) {
        const category = MATCH_TO_CATEGORY[match];
        if (category && !categoryList.includes(category)) {
          categoryList.push(category);
        }
      }
      if (categoryList.length > 0) {
        categories.set(short_id, categoryList);
      }
    }
  }
  return categories;
};
The function uses a mapping table to convert restic's reason strings to category types:
const MATCH_TO_CATEGORY: Record<string, RetentionCategory> = {
  "last snapshot": "last",
  "hourly snapshot": "hourly",
  "daily snapshot": "daily",
  "weekly snapshot": "weekly",
  "monthly snapshot": "monthly",
  "yearly snapshot": "yearly",
  "oldest hourly snapshot": "hourly",
  "oldest daily snapshot": "daily",
  "oldest weekly snapshot": "weekly",
  "oldest monthly snapshot": "monthly",
  "oldest yearly snapshot": "yearly",
};
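As a worked example of the parsing step, the sketch below runs the same logic over a small hand-written input. The `ForgetGroup` and `ForgetReason` interfaces are a reduced, assumed stand-in for ResticForgetResponse (only the fields the parser reads), the snapshot IDs are invented, and "within 7d" is used only as an example of a reason string that has no entry in the table.

```typescript
type RetentionCategory = "last" | "hourly" | "daily" | "weekly" | "monthly" | "yearly";

// Assumed minimal shapes: only the fields the parser actually reads.
interface ForgetReason {
  snapshot: { short_id: string };
  matches: string[];
}
interface ForgetGroup {
  reasons: ForgetReason[];
}

const MATCH_TO_CATEGORY: Record<string, RetentionCategory> = {
  "last snapshot": "last",
  "hourly snapshot": "hourly",
  "daily snapshot": "daily",
  "weekly snapshot": "weekly",
  "monthly snapshot": "monthly",
  "yearly snapshot": "yearly",
  "oldest hourly snapshot": "hourly",
  "oldest daily snapshot": "daily",
  "oldest weekly snapshot": "weekly",
  "oldest monthly snapshot": "monthly",
  "oldest yearly snapshot": "yearly",
};

function parseRetentionCategories(groups: ForgetGroup[]): Map<string, RetentionCategory[]> {
  const categories = new Map<string, RetentionCategory[]>();
  for (const group of groups) {
    for (const reason of group.reasons) {
      const list: RetentionCategory[] = [];
      for (const match of reason.matches) {
        const category = MATCH_TO_CATEGORY[match];
        // Ignore unknown reasons and avoid duplicate categories.
        if (category && list.indexOf(category) === -1) list.push(category);
      }
      if (list.length > 0) categories.set(reason.snapshot.short_id, list);
    }
  }
  return categories;
}

// Invented sample data for illustration.
const sample: ForgetGroup[] = [
  {
    reasons: [
      { snapshot: { short_id: "a1b2c3d4" }, matches: ["last snapshot", "daily snapshot"] },
      { snapshot: { short_id: "e5f6a7b8" }, matches: ["oldest daily snapshot", "daily snapshot"] },
      { snapshot: { short_id: "c9d0e1f2" }, matches: ["within 7d"] }, // no table entry
    ],
  },
];

const parsed = parseRetentionCategories(sample);
// a1b2c3d4 -> ["last", "daily"], e5f6a7b8 -> ["daily"], c9d0e1f2 has no entry
```

The second snapshot shows why the duplicate check matters: both "oldest daily snapshot" and "daily snapshot" map to the same category, which should appear only once.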
Step 4: Attach Categories to Snapshots#
The controller merges retention categories with snapshot data:
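const snapshots = res.map((snapshot) => {
  // summary comes from the snapshot itself (destructuring assumed; the excerpt omitted it)
  const { summary } = snapshot;
  return {
    short_id: snapshot.short_id,
    duration: getSnapshotDuration(summary),
    paths: snapshot.paths,
    tags: snapshot.tags ?? [],
    size: summary?.total_bytes_processed ?? 0,
    time: new Date(snapshot.time).getTime(),
    // Snapshots with no entry in the map get an empty array
    retentionCategories: retentionCategories.get(snapshot.short_id) ?? [],
    summary: summary,
  };
});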
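Snapshots without matching retention categories receive an empty array. When a retention policy was evaluated, this usually indicates the snapshot would be deleted by that policy. An empty array also results when no categories could be computed (for example, no scheduleId was provided or the repository lookup failed), and a snapshot kept only for a reason that is missing from MATCH_TO_CATEGORY would likewise show no badge.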
Time-Based Classification Rules#
Restic determines retention categories based on snapshot timestamps:
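- keepLast: The N most recent snapshots, regardless of time
- keepHourly: The most recent snapshot in each of the last N hours that contain snapshots
- keepDaily: The most recent snapshot in each of the last N days that contain snapshots
- keepWeekly: The most recent snapshot in each of the last N weeks that contain snapshots
- keepMonthly: The most recent snapshot in each of the last N months that contain snapshots
- keepYearly: The most recent snapshot in each of the last N years that contain snapshots
- keepWithinDuration: All snapshots within a duration (e.g., "7d", "30d") of the latest snapshot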
Important: A single snapshot can match multiple retention rules. For example, the most recent snapshot might be classified as both "last" and "daily" if it's also the newest snapshot from today.
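The time-bucket rules can be illustrated with a simplified sketch of a "keep N daily" rule. This is not restic's implementation: restic evaluates all rules together and uses local time, whereas the hypothetical `keepDaily` function below handles a single rule and buckets by UTC calendar day to keep the example deterministic.

```typescript
// Returns the snapshots a "keep N daily" rule would retain: the newest snapshot
// in each of the N most recent calendar days that contain at least one snapshot.
function keepDaily(times: Date[], n: number): Date[] {
  // Work on a copy, newest first.
  const newestFirst = times.slice().sort((a, b) => b.getTime() - a.getTime());
  const kept: Date[] = [];
  let lastBucket = "";

  for (const t of newestFirst) {
    if (kept.length >= n) break;
    const bucket = t.toISOString().slice(0, 10); // UTC day, e.g. "2024-05-03"
    if (bucket !== lastBucket) {
      // First (newest) snapshot seen for this day: keep it.
      kept.push(t);
      lastBucket = bucket;
    }
  }
  return kept;
}

const snapshotTimes = [
  new Date("2024-05-03T10:00:00Z"),
  new Date("2024-05-03T22:00:00Z"),
  new Date("2024-05-01T08:00:00Z"),
  new Date("2024-04-30T09:00:00Z"),
];

const keptDaily = keepDaily(snapshotTimes, 2).map((d) => d.toISOString());
// keptDaily: ["2024-05-03T22:00:00.000Z", "2024-05-01T08:00:00.000Z"]
```

Note that 2 May has no snapshots, so it does not use up one of the two daily slots; the rule counts days that contain snapshots rather than calendar days elapsed.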
Cache Invalidation Mechanisms#
Cache Storage#
Retention data is cached in a SQLite database located at cache.db:
- Key format: Generated using cacheKeys.repository.retention(repository.id, scheduleId) from the centralized cache key management system, where repository.id is the UUID from the database
- Value: JSON-serialized retention categories mapping snapshot IDs to category arrays
- Default expiration: 24 hours (86400 seconds)
Cache keys are managed through the cacheKeys object in app/server/utils/cache.ts, which provides centralized cache key generation functions to prevent typos and improve maintainability.
Cache Retrieval#
The getRetentionCategories function validates the repository and generates cache keys using the database ID:
try {
  // First, fetch and validate the repository using its shortId
  const repository = await findRepository(repositoryId);
  if (!repository) {
    return new Map<string, RetentionCategory[]>();
  }

  // Generate cache key using the stable database ID (repository.id), not the shortId parameter
  const cacheKey = cacheKeys.repository.retention(repository.id, scheduleId);
  const cached = cache.get<Record<string, RetentionCategory[]>>(cacheKey);
  if (cached) {
    return new Map(Object.entries(cached));
  }

  // Cache miss: fetch the backup schedule
  const schedule = await backupsService.getScheduleByShortId(scheduleId);
  if (!schedule?.retentionPolicy) {
    return new Map<string, RetentionCategory[]>();
  }

  // Fetch fresh data via restic forget --dry-run
  const dryRunResults = await restic.forget(repository.config, schedule.retentionPolicy, {
    tag: scheduleId,
    organizationId,
    dryRun: true,
  });
  const categories = parseRetentionCategories(dryRunResults.data);

  // Store as a plain object so the value can be JSON-serialized
  cache.set(cacheKey, Object.fromEntries(categories));
  return categories;
} catch (error) {
  // Handle errors gracefully
  return new Map<string, RetentionCategory[]>();
}
Important: The function looks up the repository before generating the cache key and returns an empty Map, rather than throwing, if the lookup fails or returns null. The cache key uses repository.id (the UUID from the database) rather than the repositoryId parameter (the shortId). This matters because invalidation calls cache.delByPrefix(cacheKeys.repository.all(repository.id)) after backup operations: a key built from the shortId would not share that prefix, so stale retention data could persist until it expired. Test coverage verifies that cache invalidation triggers recomputation of retention categories.
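The conversions between Map and plain object in the retrieval code exist because a Map does not survive JSON serialization. The sketch below shows the round trip in isolation; `toCacheValue` and `fromCacheValue` are hypothetical helper names used only for illustration, not functions from the zerobyte codebase.

```typescript
type RetentionCategory = "last" | "hourly" | "daily" | "weekly" | "monthly" | "yearly";

// Serialize: Map -> plain object -> JSON string.
function toCacheValue(categories: Map<string, RetentionCategory[]>): string {
  return JSON.stringify(Object.fromEntries(categories));
}

// Deserialize: JSON string -> plain object -> Map.
function fromCacheValue(raw: string): Map<string, RetentionCategory[]> {
  const parsedValue = JSON.parse(raw) as Record<string, RetentionCategory[]>;
  return new Map(Object.entries(parsedValue));
}

const original = new Map<string, RetentionCategory[]>([
  ["a1b2c3d4", ["last", "daily"]],
  ["e5f6a7b8", ["weekly"]],
]);

// Serializing the Map directly loses every entry.
const naive = JSON.stringify(original); // "{}"

const restored = fromCacheValue(toCacheValue(original));
```

After the round trip, `restored` holds the same keys and category arrays as `original`, which is what the cache-hit branch relies on.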
Cache Invalidation Triggers#
The retention cache is invalidated using cache.delByPrefix() in two specific scenarios:
1. After Successful Backup Completion#
In backups.execution.ts line 154, after a successful backup:
cache.delByPrefix(cacheKeys.repository.all(ctx.repository.id));
This single call clears all repository-related cache entries, including both snapshots and retention data.
The restic instance used by the backup execution code is imported from app/server/core/restic, which creates it with the appropriate dependency injection.
2. After Applying Retention Policy (Forget Operation)#
In backups.execution.ts line 332:
await restic.forget(repository.config, schedule.retentionPolicy, {
  tag: schedule.shortId,
  organizationId,
});
cache.delByPrefix(cacheKeys.repository.all(repository.id));
Similar to backup completion, all repository cache entries are cleared after applying retention policies.
Cache Design Principles#
- Centralized key management: Cache keys are generated using the cacheKeys object from app/server/utils/cache.ts, providing a consistent API and preventing typos across the codebase
- Hierarchical invalidation: The cacheKeys.repository.all(repositoryId) prefix allows clearing all repository-related cache entries (snapshots, retention data, file listings) with a single operation
- Schedule-specific retention data: Retention cache keys include both repository and schedule IDs, so updates to one backup schedule don't invalidate retention data for other schedules
- Proactive invalidation: Cache is cleared immediately after operations that modify snapshot state, not lazily on next read
- Coordinated invalidation: All repository-related cache entries are cleared together using prefix-based deletion to maintain data consistency
- Automatic expiration: Expired entries are automatically removed when accessed via get(), preventing stale data from persisting beyond 24 hours
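The combination of expiration on read and prefix-based deletion can be sketched with a small in-memory cache. This swaps the SQLite storage described above for a Map purely for illustration; the `MemoryCache` class, its injectable clock, and the key strings are all assumptions, and only the method names get, set, and delByPrefix come from this document.

```typescript
interface CacheEntry {
  value: string; // JSON-serialized payload
  expiresAt: number; // epoch milliseconds
}

class MemoryCache {
  private entries = new Map<string, CacheEntry>();
  private now: () => number;

  // The clock is injectable so expiration can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  set(key: string, value: unknown, ttlSeconds: number = 86400): void {
    this.entries.set(key, {
      value: JSON.stringify(value),
      expiresAt: this.now() + ttlSeconds * 1000,
    });
  }

  get<T>(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired entries are removed when they are read.
      this.entries.delete(key);
      return undefined;
    }
    return JSON.parse(entry.value) as T;
  }

  delByPrefix(prefix: string): void {
    // Copy the keys first so deletion does not interfere with iteration.
    Array.from(this.entries.keys()).forEach((key) => {
      if (key.startsWith(prefix)) this.entries.delete(key);
    });
  }
}

let fakeNow = 0;
const demoCache = new MemoryCache(() => fakeNow);
demoCache.set("repo:r1:retention:s1", { a1b2c3d4: ["last"] });
demoCache.set("repo:r1:snapshots:all", []);
demoCache.set("repo:r2:retention:s9", { e5f6a7b8: ["daily"] });

// One call clears everything cached for repository r1 and leaves r2 untouched.
demoCache.delByPrefix("repo:r1:");
```

The trailing delimiter in the prefix is a deliberate choice in this sketch: a prefix of "repo:r1" without it would also match keys for a repository named "r10".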
Available Cache Key Functions#
The cacheKeys.repository object provides the following cache key generation functions:
- all(repositoryId): Returns the prefix for all repository-related cache entries (used for bulk invalidation)
- stats(repositoryId): Cache key for repository statistics
- snapshots(repositoryId, backupId?): Cache key for snapshot listings (defaults to "all" backups)
- ls(repositoryId, snapshotId, path, offset, limit): Cache key for snapshot file listings
- retention(repositoryId, scheduleId): Cache key for retention category data
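One way such a key builder could be structured is sketched below. The function names and parameters follow the list above, but the key string format and the parameter types are assumptions; the actual format in app/server/utils/cache.ts is not shown in this document. The property the sketch aims to demonstrate is that every key begins with the value returned by all(), which is what makes prefix-based invalidation work.

```typescript
// Assumed key layout: "repository:<id>:<kind>:<details>".
const cacheKeys = {
  repository: {
    all: (repositoryId: string): string => `repository:${repositoryId}:`,
    stats: (repositoryId: string): string => `repository:${repositoryId}:stats`,
    snapshots: (repositoryId: string, backupId?: string): string =>
      `repository:${repositoryId}:snapshots:${backupId ?? "all"}`,
    ls: (repositoryId: string, snapshotId: string, path: string, offset: number, limit: number): string =>
      `repository:${repositoryId}:ls:${snapshotId}:${path}:${offset}:${limit}`,
    retention: (repositoryId: string, scheduleId: string): string =>
      `repository:${repositoryId}:retention:${scheduleId}`,
  },
};

const repoId = "3f2a"; // invented ID for illustration
const exampleKeys = [
  cacheKeys.repository.stats(repoId),
  cacheKeys.repository.snapshots(repoId),
  cacheKeys.repository.ls(repoId, "a1b2c3d4", "/data", 0, 100),
  cacheKeys.repository.retention(repoId, "sched1"),
];

// Every key shares the bulk-invalidation prefix.
const allSharePrefix = exampleKeys.every((key) => key.startsWith(cacheKeys.repository.all(repoId)));
```

Because the schedule ID is part of the retention key, two schedules on the same repository get separate entries, while a single prefix delete still removes both.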
UI Implementation#
Retention Badge Component#
The RetentionCategoryBadges component implements intelligent badge rendering:
Single Category Display#
When a snapshot has only one category, it displays a single badge:
{sortedCategories.length === 1 ? (
  <Badge className={cn(categoryColors[sortedCategories[0]], "border text-xs", className)}>
    {categoryLabels[sortedCategories[0]]}
  </Badge>
) : (
  {/* Multiple category display */}
)}
Multiple Categories Display#
When multiple categories exist, it shows a summary badge (e.g., "3 tags") that expands on hover using a hover card:
<HoverCard>
  <HoverCardTrigger>
    <Badge className={cn("border text-xs cursor-pointer", className)}>
      {sortedCategories.length} tags
    </Badge>
  </HoverCardTrigger>
  <HoverCardContent className="w-auto p-2">
    <div className="flex flex-wrap gap-1">
      {sortedCategories.map((category) => (
        <Badge key={category} className={cn(categoryColors[category], "border text-xs")}>
          {categoryLabels[category]}
        </Badge>
      ))}
    </div>
  </HoverCardContent>
</HoverCard>
Category Sorting#
Categories are sorted in retention hierarchy order:
const categoryOrder: RetentionCategory[] = ["last", "hourly", "daily", "weekly", "monthly", "yearly"];
const sortedCategories = [...categories].sort((a, b) => {
  return categoryOrder.indexOf(a as RetentionCategory) - categoryOrder.indexOf(b as RetentionCategory);
});
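The sorting and the single-versus-summary decision can be exercised outside React with two small pure functions. `sortCategories` and `badgeText` are hypothetical helpers written for this example; the assumption that nothing is rendered for an empty list is not confirmed by the component code shown here, and the real component displays categoryLabels values rather than raw category names.

```typescript
type RetentionCategory = "last" | "hourly" | "daily" | "weekly" | "monthly" | "yearly";

const categoryOrder: RetentionCategory[] = ["last", "hourly", "daily", "weekly", "monthly", "yearly"];

// Sorts a copy of the input into retention hierarchy order.
function sortCategories(categories: RetentionCategory[]): RetentionCategory[] {
  return categories.slice().sort((a, b) => categoryOrder.indexOf(a) - categoryOrder.indexOf(b));
}

// Text for the visible badge: the single category, or an "N tags" summary.
// Returns null for an empty list (assumed behavior: render nothing).
function badgeText(categories: RetentionCategory[]): string | null {
  const sorted = sortCategories(categories);
  const first = sorted[0];
  if (first === undefined) return null;
  return sorted.length === 1 ? first : `${sorted.length} tags`;
}

const sortedExample = sortCategories(["weekly", "last", "daily"]);
// sortedExample: ["last", "daily", "weekly"]
```

Sorting a copy rather than the input keeps the component's props untouched, which matches the spread in the original snippet.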
Snapshot Timeline View#
The SnapshotTimeline component displays retention badges in a horizontally scrollable card layout:
<RetentionCategoryBadges categories={snapshot.retentionCategories} className="mt-1" />
The timeline displays snapshots as clickable cards showing:
- Snapshot date
- Snapshot time
- Snapshot size
- Retention badges (at the bottom of each card)
This component is used in the backup details page:
<SnapshotTimeline
  loading={isLoading}
  snapshots={snapshots ?? []}
  snapshotId={selectedSnapshot?.short_id}
  error={failureReason?.message}
  onSnapshotSelect={setSelectedSnapshotId}
/>
Snapshot Details Pages#
Important: Retention category badges are not currently displayed on individual snapshot detail pages.
The snapshot detail page:
- Fetches the retentionCategories field from the API
- Has access to the retention data
- Displays other snapshot information (ID, hostname, time, backup schedule, volume, paths)
However, it does not import or render the RetentionCategoryBadges component. Retention badges are currently visible only in the snapshot timeline view on backup schedule pages.
Other Views#
The repository snapshots table shows snapshots in a traditional table format but also does not display retention category badges, only showing snapshot ID, schedule, date/time, size, and duration.
Data Flow Summary#
- API Request: Client requests snapshots for a repository/backup schedule
- Backend Processing:
  - Server checks cache for retention categories
  - On a cache miss, runs restic forget --dry-run
  - Parses retention categories from restic output
  - Caches results for 24 hours
  - Merges categories with snapshot data
- API Response: Snapshots include a retentionCategories: Array<string> field
- Client Rendering: UI components render color-coded badges based on categories
- Cache Invalidation: Cache is cleared after backup completion or retention policy application