Module utf8

This module provides basic utilities for working with Unicode text.

Lily ensures that String values are valid UTF-8, but the String class provides no other Unicode functionality. This module provides it instead.

Notes:

This module works with codepoints, not graphemes, since the latter are significantly more complicated to handle.
There are currently no utilities for inspecting properties of codepoints.
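
As a minimal sketch of the codepoint/grapheme distinction, assuming the module is loaded with `import utf8` and that `Option.unwrap` is available: the letter e followed by a combining acute accent displays as one character but counts as two codepoints.

```lily
import utf8

# U+0065 (e) followed by U+0301 (combining acute accent).
# Rendered, the pair forms a single grapheme.
var decomposed = utf8.encode_list([0x65, 0x301]).unwrap()

# The module counts codepoints, so this prints 2, not 1.
print(utf8.length(decomposed))
```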

Functions

`define as_list(string: String): List[Integer]`

Return a List of all codepoints in string.
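
A short sketch of how this might be used, assuming the module is loaded with `import utf8`. The input is plain ASCII, so each letter maps to one codepoint.

```lily
import utf8

# h is codepoint 104 and i is codepoint 105.
var codepoints = utf8.as_list("hi")

print(codepoints.size())  # 2
print(codepoints[0])      # 104
print(codepoints[1])      # 105
```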

`define compare(a: String, b: String): Integer`

Compare a and b in terms of codepoints.

If a is less than b, -1 is returned. If a is greater than b, 1 is returned. If they are identical, 0 is returned. This is the same return format that List.sort uses, so this function can be used as a custom comparator for it.
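
The three return values can be sketched as follows, assuming the module is loaded with `import utf8`. The final call is an assumption: it supposes that List.sort accepts the comparator as its argument, and since this page does not say whether List.sort sorts in place or returns a new List, the result is printed directly.

```lily
import utf8

print(utf8.compare("apple", "banana"))  # -1
print(utf8.compare("banana", "apple"))  # 1
print(utf8.compare("apple", "apple"))   # 0

# Assumption: List.sort takes the comparator as its argument.
var words = ["pear", "apple", "fig"]
print(words.sort(utf8.compare))
```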

`define each_codepoint(string: String, fn: Function(Integer))`

Call fn for each codepoint within string.
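
A minimal sketch, assuming the module is loaded with `import utf8`, that passes a lambda as fn and prints each codepoint on its own line.

```lily
import utf8

# Prints 97, 98 and 99, the codepoints of a, b and c.
utf8.each_codepoint("abc", (|cp| print(cp) ))
```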

`define each_codepoint_with_index(string: String, fn: Function(Integer, Integer))`

Call fn for each codepoint within string, additionally passing the index of each codepoint as a second argument.
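
The argument order can be sketched like this, assuming the module is loaded with `import utf8`. The parameter names `cp` and `index` are illustrative. The page does not say where indices start; the sketch presumes they begin at 0, as List indices do in Lily.

```lily
import utf8

# The callback receives the codepoint first and its index second.
utf8.each_codepoint_with_index("abc", (|cp, index|
    print(cp)
    print(index)
))
```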

`define encode(codepoint: Integer): Option[String]`

Attempt to encode codepoint as UTF-8.

On success, the encoded character is returned as a String inside a Some. Otherwise, None is returned.
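
One way to handle both outcomes is a match on the result. This is a sketch assuming the module is loaded with `import utf8`. The failing value 0x110000 is chosen because it lies beyond U+10FFFF, the largest Unicode codepoint, so encoding is expected to yield None.

```lily
import utf8

# 0x20AC is the euro sign, so this should take the Some branch.
match utf8.encode(0x20AC): {
    case Some(s):
        print(s)
    case None:
        print("not encodable")
}

# Expected to print true, since 0x110000 is not a Unicode codepoint.
print(utf8.encode(0x110000).is_none())
```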

`define encode_list(codepoints: List[Integer]): Option[String]`

Attempt to encode every codepoint in codepoints as UTF-8.

On success, the encoded characters are joined into a String, which is then returned inside a Some. Otherwise, None is returned.
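
A brief sketch, assuming the module is loaded with `import utf8`. The second call presumes that a single unencodable value makes the whole call return None, which the description suggests but does not spell out.

```lily
import utf8

# 104 and 105 are the codepoints of h and i.
match utf8.encode_list([104, 105]): {
    case Some(s):
        print(s)   # hi
    case None:
        print("encoding failed")
}

# 0x110000 is not a Unicode codepoint, so None is expected here.
print(utf8.encode_list([104, 0x110000]).is_none())
```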

`define get(string: String, index: Integer): Integer`

Return the codepoint at index in string.

If a negative index is given, it is treated as an offset from the end of string, with -1 being considered the last element.

Errors

IndexError if index is out of range.
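
The indexing rules and the error case can be sketched as follows, assuming the module is loaded with `import utf8` and that non-negative indices count from 0, as List indices do in Lily.

```lily
import utf8

var text = "hello"

print(utf8.get(text, -1))  # 111, the codepoint of the final o

# Assumption: index 0 refers to the first codepoint.
print(utf8.get(text, 0))

# An index past the end raises IndexError.
try: {
    print(utf8.get(text, 99))
except IndexError as e:
    print(e.message)
}
```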

`define length(string: String): Integer`

Return the length of string in codepoints.
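
A small sketch of the difference between codepoints and bytes, assuming the module is loaded with `import utf8`, that `String.size` returns a byte count, and that the literal below is saved with a precomposed é (U+00E9), which takes two bytes in UTF-8.

```lily
import utf8

var text = "héllo"

print(utf8.length(text))  # 5 codepoints
print(text.size())        # 6, if String.size counts bytes
```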

`define slice(string: String, start: Integer, stop: Integer): String`

Create a new String copying a section of string from start to stop. Unlike String.slice, the indices refer to codepoints, not bytes.

If a negative index is given, it is treated as an offset from the end of string, with -1 being considered the last element.

On error, this generates an empty String. Error conditions are:

Either start or stop is out of range.
The start is larger than the stop (reversed).
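
The error conditions above can be sketched like this, assuming the module is loaded with `import utf8`. The description does not say whether stop is included in the result, so the first call is shown without an expected output.

```lily
import utf8

var text = "hello"

# A valid range: start and stop are both within the string.
print(utf8.slice(text, 1, 3))

# Both error conditions produce an empty String.
print(utf8.slice(text, 3, 1) == "")   # true: start is larger than stop
print(utf8.slice(text, 0, 99) == "")  # true: stop is out of range
```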
