Owl.Data ANSI Escape Sequence Conversion#
Overview#
The Owl.Data module converts ANSI escape sequences into an internal tag-based data representation. This conversion transforms raw chardata containing escape sequences into structured Owl.Tag structures that encapsulate both content and styling. This approach eliminates the need to manually track ANSI sequence state and enables composable, type-safe terminal output formatting.
Internal Data Representation#
The Owl.Tag Structure#
The internal representation uses the Owl.Tag struct:
%Owl.Tag{
  sequences: [Owl.Data.sequence()], # List of ANSI sequences (atoms or binaries)
  data: data                        # The actual content
}
Each tag wraps content with a list of sequences representing styling attributes like colors, text effects, and hyperlinks. Sequences can be:
- Atoms for named attributes (:red, :underline, :bright)
- Binaries for extended color codes
- Tuples for hyperlinks ({:hyperlink, url})
Recursive Type Definition#
The Owl.Data.t() type is defined recursively to support nested, composable structures:
@type t :: [binary() | non_neg_integer() | t() | Owl.Tag.t(t())] | Owl.Tag.t(t()) | binary()
Key recursive elements:
- Lists can contain t() itself, allowing arbitrary nesting of data structures
- Owl.Tag.t(t()) wraps data of type t(), enabling tags to contain other tags
- Tags are themselves valid t() values, making them first-class composable elements
- Includes non_neg_integer() to support charlists (lists of integer codepoints)
This recursive definition mirrors Elixir's IO.chardata but extends it with first-class styling support, enabling unlimited nesting like Owl.Data.tag(["Hello ", Owl.Data.tag("world", :green), "!"], :red).
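The recursion in the type can be illustrated with a small walker that collects the plain text of a nested value. This is only a sketch and does not use Owl: the module name DataSketch, the function plain_text/1, and the {:tag, data, sequences} tuple (a stand-in for the %Owl.Tag{} struct) are assumptions made for illustration.

```elixir
defmodule DataSketch do
  # Hypothetical stand-in for %Owl.Tag{}: {:tag, data, sequences}.

  # A binary is a leaf of the structure.
  def plain_text(data) when is_binary(data), do: data

  # An integer is a codepoint, as found in a charlist.
  def plain_text(cp) when is_integer(cp), do: <<cp::utf8>>

  # A tag contributes its wrapped data; the styling is ignored here.
  def plain_text({:tag, data, _sequences}), do: plain_text(data)

  # A list recurses into every element, whatever its kind.
  def plain_text(list) when is_list(list), do: Enum.map_join(list, &plain_text/1)
end

nested = {:tag, ["Hello ", {:tag, "world", [:green]}, ?!], [:red]}
IO.puts(DataSketch.plain_text(nested))
# prints: Hello world!
```

Each clause corresponds to one alternative of the type: binaries, integers, tags, and lists of any of these.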
The Owl.Data.Sequence Module#
The Owl.Data.Sequence module handles the low-level work of identifying and grouping ANSI escape sequences. It's an internal module that translates between raw escape codes and Owl's structured representation.
Identifying Escape Sequences#
The parse_many/1 function handles two types of ANSI sequences:
- CSI (Control Sequence Introducer) sequences starting with "\e[", used for colors and text effects
- OSC (Operating System Command) sequences starting with "\e]8", used for hyperlinks
The parse/1 function identifies individual sequences through pattern matching:
- Named color and effect sequences (:red, :underline, :bright, etc.)
- Hyperlink sequences
- 256-color sequences ("\e[38;5;" for foreground, "\e[48;5;" for background)
- RGB color sequences ("\e[38;2;" for foreground, "\e[48;2;" for background)
Grouping Escape Sequences#
ANSI sequences often contain multiple parameters separated by semicolons. The Sequence module groups these parameters into complete semantic units, so that a parameter is never interpreted apart from the sequence it belongs to.
extract_csi_attributes/1 parses CSI parameters by splitting on semicolons and extracting integers. For example, "\e[31;42m" becomes [31, 42].
chunk_csi_attributes/1 groups related attributes into complete sequences:
- 256-color patterns: [38, 5, n] or [48, 5, n] (foreground/background with color index)
- RGB patterns: [38, 2, r, g, b] or [48, 2, r, g, b] (foreground/background with RGB values)
- Individual attributes: Formatted as standalone sequences
This grouping ensures that multi-parameter sequences like true color codes are kept together rather than being split into meaningless individual numbers.
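The two steps can be sketched with plain pattern matching. The code below is not Owl's implementation: the module CsiSketch and its function names are hypothetical, it assumes a well-formed CSI sequence ending in "m", and it returns each group as a list of integers rather than formatting it back into a sequence.

```elixir
defmodule CsiSketch do
  # Turns "\e[31;42m" into [31, 42]. Assumes well-formed input.
  def extract_attributes("\e[" <> rest) do
    rest
    |> String.trim_trailing("m")
    |> String.split(";")
    |> Enum.map(&String.to_integer/1)
  end

  def chunk_attributes([]), do: []

  # 256-color: 38 or 48, then 5, then the color index.
  def chunk_attributes([kind, 5, n | rest]) when kind in [38, 48],
    do: [[kind, 5, n] | chunk_attributes(rest)]

  # RGB: 38 or 48, then 2, then the three channel values.
  def chunk_attributes([kind, 2, r, g, b | rest]) when kind in [38, 48],
    do: [[kind, 2, r, g, b] | chunk_attributes(rest)]

  # Anything else is a standalone attribute.
  def chunk_attributes([attr | rest]), do: [[attr] | chunk_attributes(rest)]
end

"\e[4;38;2;166;226;46;48;5;17m"
|> CsiSketch.extract_attributes()
|> CsiSketch.chunk_attributes()
|> IO.inspect(charlists: :as_lists)
# => [[4], [38, 2, 166, 226, 46], [48, 5, 17]]
```

The extended-color clauses come before the catch-all clause, so a leading 38 or 48 is consumed together with its parameters.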
Sequence Types#
The module categorizes sequences by type:
- Colors: :foreground and :background (8 basic colors plus light variants)
- Text effects: :blink, :intensity, :underline, :italic, :overlined, :inverse, :reverse
- Hyperlinks: :hyperlink for OSC 8 sequences
This categorization enables the conversion algorithm to track which sequences are active and handle conflicts (e.g., when multiple foreground colors are specified).
Conversion Process: from_chardata/1#
The from_chardata/1 function is the main entry point for converting raw ANSI sequences into tagged structures:
def from_chardata(data) do
  data =
    Regex.split(~r/(\e\[(\d+;)*\d+m)|(\e\]8;.*?;.*?\e\\)/, IO.chardata_to_string(data),
      include_captures: true,
      trim: true
    )

  {data, _open_tags} = do_from_chardata(data, %{})
  data
end
The conversion process:
- Normalizes input using IO.chardata_to_string/1 (handles strings, charlists, and mixed data)
- Splits input using a regex that captures both CSI and OSC sequences
- Recursively processes the split data with do_from_chardata/2
- Returns the structured tagged data
The regex preserves the escape sequences in the split output (include_captures: true), allowing them to be processed alongside the text content.
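The effect of the split can be checked in isolation with the standard library alone. The snippet below reuses the regex from from_chardata/1; the sample input and the variable names pattern and parts are chosen only for illustration.

```elixir
pattern = ~r/(\e\[(\d+;)*\d+m)|(\e\]8;.*?;.*?\e\\)/

parts =
  Regex.split(pattern, IO.chardata_to_string(["\e[31m", ~c"Hello", "\e[0m world"]),
    include_captures: true,
    trim: true
  )

IO.inspect(parts)
# => ["\e[31m", "Hello", "\e[0m", " world"]
```

With trim: true, the empty string that would otherwise precede the leading escape sequence is dropped, so the result alternates cleanly between sequences and text.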
Recursive Construction#
The do_from_chardata/2 Algorithm#
The do_from_chardata/2 function processes data recursively using pattern matching with multiple clauses:
Base cases:
- Binary strings are parsed for sequences or tagged based on active sequences
- Empty lists return immediately to avoid unnecessary structure
- Single-element lists unwrap and recurse to simplify the output
Recursive case for lists:
The list processing clause follows a classic recursive pattern:
- Recursively processes the head element
- Recursively processes the tail with the updated state
- Attempts to merge adjacent tags with identical sequences
- Returns the combined result with updated state
This recursive approach naturally handles arbitrarily nested structures while maintaining consistent state throughout the traversal.
State Management#
The algorithm maintains an open_tags map that tracks which sequences are currently active. This state is threaded through all recursive calls to:
- Properly nest tags based on active sequences
- Handle sequence updates: :reset clears all tags, default values remove specific tag types
- Enable intelligent merging of adjacent identical tags
The state threading ensures that when a sequence like "\e[31m" (red) is encountered, all subsequent content is wrapped in a red tag until a reset or different color is encountered.
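The state threading can be sketched as a fold over the split tokens. This is a simplified, flat illustration rather than Owl's recursive do_from_chardata/2: the module StateSketch, the small code table, and the {:tag, data, sequences} tuple are all assumptions made for the example.

```elixir
defmodule StateSketch do
  # A tiny, illustrative subset of SGR codes mapped to {type, name}.
  @codes %{
    "\e[31m" => {:foreground, :red},
    "\e[32m" => {:foreground, :green},
    "\e[4m" => {:underline, :underline}
  }

  def convert(tokens) do
    {out, _open} =
      Enum.reduce(tokens, {[], %{}}, fn
        # Reset clears every active sequence.
        "\e[0m", {out, _open} ->
          {out, %{}}

        # The default foreground removes only the foreground entry.
        "\e[39m", {out, open} ->
          {out, Map.delete(open, :foreground)}

        token, {out, open} ->
          case @codes do
            # A known sequence replaces any active one of the same type.
            %{^token => {type, name}} -> {out, Map.put(open, type, name)}
            # Text with no active sequences stays plain.
            _ when map_size(open) == 0 -> {[token | out], open}
            # Otherwise text is wrapped in the currently active sequences.
            _ -> {[{:tag, token, Enum.sort(Map.values(open))} | out], open}
          end
      end)

    Enum.reverse(out)
  end
end

IO.inspect(StateSketch.convert(["\e[31m", "Hello", "\e[4m", " there", "\e[0m", "!"]))
# => [{:tag, "Hello", [:red]}, {:tag, " there", [:red, :underline]}, "!"]
```

Keying the map by sequence type is what lets a later foreground color replace an earlier one instead of stacking on top of it.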
Merging Neighboring Identical Tags#
One of the key optimizations in Owl.Data is the automatic merging of adjacent tags with identical sequences. This creates cleaner, more efficient data structures.
The Merging Logic#
Tag merging is implemented in do_from_chardata/2 and checks two patterns:
Pattern 1: Two consecutive tags with identical sequences
{%Owl.Tag{data: p1, sequences: s}, %Owl.Tag{data: p2, sequences: s}} ->
  data =
    if is_list(p2) do
      [p1 | p2]
    else
      [p1, p2]
    end

  {tag(data, s), open_tags}
Pattern 2: Tag followed by a list starting with a tag with identical sequences
{%Owl.Tag{data: p1, sequences: s}, [%Owl.Tag{data: p2, sequences: s} | rest]} ->
  data =
    if is_list(p2) do
      [p1 | p2]
    else
      [p1, p2]
    end

  {[tag(data, s) | rest], open_tags}
Both patterns verify that the sequences field is exactly identical and combine the data into a single tag. The merged tag wraps the combined content with the shared sequences.
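The merge rule can be tried out on a flat list with a small stand-alone sketch. It is not Owl's code: the module MergeSketch is hypothetical, tags are modelled as {:tag, data, sequences} tuples instead of %Owl.Tag{} structs, and the result is always a list, even when a single tag remains.

```elixir
defmodule MergeSketch do
  def merge([]), do: []

  def merge([head | tail]) do
    # The tail is merged first, so the head meets an already merged neighbor.
    case {head, merge(tail)} do
      # The repeated variable s only matches when both sequence lists are equal.
      {{:tag, p1, s}, [{:tag, p2, s} | rest]} ->
        data = if is_list(p2), do: [p1 | p2], else: [p1, p2]
        [{:tag, data, s} | rest]

      {_other, merged_tail} ->
        [head | merged_tail]
    end
  end
end

IO.inspect(MergeSketch.merge([{:tag, "Hello", [:red]}, {:tag, "?!", [:red]}, "plain"]))
# => [{:tag, ["Hello", "?!"], [:red]}, "plain"]
```

Processing the tail before the head mirrors the clauses above: when the neighbor's data is already a list, the head's data is prepended to it, which keeps the merged data flat.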
When Merging Occurs#
Tag merging happens:
- During from_chardata/1 operation when reconstructing from external sources
- When the recursive traversal encounters adjacent tags with the same sequence list
- Only when sequences are exactly identical (same elements, same order)
The merging does not occur during the reverse operation (to_chardata/1), which means adjacent identical tags will generate redundant ANSI escape sequences in the output.
Optimization Benefits#
- Reduced memory overhead: A single tag wraps all related content instead of multiple separate tag structures
- Simpler data structure: The resulting structure is easier to inspect and debug
- More semantic representation: Logically groups content that shares formatting, making the intent clearer
For example, when processing output from an external program that outputs "\e[31mHello\e[0m\e[31m?!\e[0m", the merging optimization recognizes that both "Hello" and "?!" share the same red color and combines them into a single tag.
Before/After Conversion Examples#
Example 1: Basic Red Text Merging#
Before merging:
[Owl.Data.tag("Hello", :red), Owl.Data.tag("?!", :red)]
After merging:
Owl.Data.tag(["Hello", "?!"], :red)
The two separate red tags are merged into a single tag containing both strings.
Example 2: TrueColor Merging#
Before merging:
[
  Owl.Data.tag("#", Owl.TrueColor.color(253, 151, 31)),
  Owl.Data.tag(" ", Owl.TrueColor.color(253, 151, 31)),
  Owl.Data.tag("Owl", Owl.TrueColor.color(253, 151, 31))
]
After merging:
Owl.Data.tag(["#", " ", "Owl"], Owl.TrueColor.color(253, 151, 31))
Three separate tags using the same TrueColor orange are merged into one, significantly simplifying the structure.
Example 3: ANSI Sequence Conversion#
# Basic conversion
[:red, "hello"] |> IO.ANSI.format() |> Owl.Data.from_chardata()
# => Owl.Data.tag("hello", :red)
# Multiple attributes in one sequence
Owl.Data.from_chardata("\e[31;42mHello\e[0m")
# => Owl.Data.tag("Hello", [:red, :green_background])
# True color support with multiple attributes
Owl.Data.from_chardata("\e[4;38;2;166;226;46;48;2;33;39;112mHello\e[0m")
# => Owl.Data.tag("Hello", [:underline, Owl.TrueColor.color(166, 226, 46),
# Owl.TrueColor.color_background(33, 39, 112)])
These examples demonstrate how raw ANSI sequences are parsed into structured tags, with multiple attributes correctly grouped together.
Example 4: Nested Tag Construction#
Owl.Data.tag(
  [
    "hi",
    "a\nand new",
    " line ",
    Owl.Data.tag(" hey\n aloha", :red), # Nested tag
    "!!"
  ],
  :green
)
This example shows nested tags where red text is embedded within green text. The recursive structure naturally represents this hierarchy, and operations like splitting or slicing preserve the nesting.
Charlist Handling Improvements#
Owl.Data provides robust support for charlists (lists of integer codepoints), treating them as first-class data types alongside strings. This improves compatibility with Elixir's standard IO.chardata and enables seamless handling of mixed data types.
Native Integer Support#
The type definition explicitly includes non_neg_integer(), making charlists first-class citizens alongside strings and tags. This design choice ensures that any valid IO.chardata can be processed through Owl.Data without special handling.
Automatic Character Conversion#
Integer codepoints are automatically converted to UTF-8 strings when processed by internal functions:
defp do_chunk_by(value, chunk_acc, chunk_fun, acc, acc_sequences) when is_integer(value) do
  do_chunk_by(<<value::utf8>>, chunk_acc, chunk_fun, acc, acc_sequences)
end
This pattern appears throughout the module, ensuring integers are handled correctly during:
- Length calculation
- Untagging operations
- Chunking and splitting operations
Seamless ANSI Integration#
The from_chardata/1 function uses IO.chardata_to_string/1 to normalize input at the start of conversion. This allows charlists mixed with ANSI escape sequences to be parsed into tagged structures without any special handling.
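The normalization step relies only on the standard library and can be observed on its own. The input below is made up for illustration; it mixes a binary escape sequence, a charlist, and a nested list holding bare codepoints.

```elixir
# 32 is the codepoint for a space, ?o the codepoint for "o".
mixed = ["\e[31m", ~c"Hello", [32, "w", ?o], "rld"]

normalized = IO.chardata_to_string(mixed)
IO.inspect(normalized)
# => "\e[31mHello world"
```

Once the input is a single binary, the escape sequence is found by the same regex split regardless of how the original chardata was nested.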
Charlist Examples#
# Converting charlists with ANSI codes
Owl.Data.from_chardata(["\e[31m", ~c"Hello"]) == Owl.Data.tag("Hello", :red)
# Splitting charlists
Owl.Data.split(~c"hello", "e") == ["h", ["l", "l", "o"]]
# Untagging preserves charlist structure
Owl.Data.tag([72, 101, 108, 108, 111], :red) |> Owl.Data.untag()
# => ~c"Hello"
These examples show that charlists work seamlessly with ANSI sequences, splitting operations preserve structure, and untagging can return charlists when appropriate.
Benefits#
- Compatibility: Works with standard Elixir IO.chardata types without conversion
- Preservation: Charlists maintain their structure through tag operations
- Consistency: Operations like splitting, slicing, and length calculation work uniformly across strings, charlists, and mixed data
- ANSI sequence handling: Charlists can be seamlessly mixed with escape sequences and converted to tagged representations
This design ensures that code accepting IO.chardata can be easily extended to use Owl.Data's tagged representation without breaking compatibility with existing charlists.