Module utf8

This package provides some basic utilities for working with Unicode text.

Lily ensures that String values are valid UTF-8, but the String class does not otherwise provide Unicode functionality. This module provides it instead.

Functions

define as_list(string: String): List[Integer]

Return a List of all codepoints in string.
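As a rough illustration of what "all codepoints" means, here is a minimal Python sketch of the same idea. The Python function `as_list` is a stand-in written for this example, not the Lily implementation; it only models the behavior described above.

```python
from typing import List


def as_list(string: str) -> List[int]:
    # Python's str is a sequence of codepoints, so ord() on each
    # character yields one integer per codepoint.
    return [ord(ch) for ch in string]


# "é" (U+00E9) takes two bytes in UTF-8 but is a single codepoint.
print(as_list("héllo"))  # [104, 233, 108, 108, 111]
```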

define compare(a: String, b: String): Integer

Compare a and b in terms of codepoints.

If a is less than b, -1 is returned. If a is greater than b, 1 is returned. If they are identical, 0 is returned. This is the same return format that List.sort uses, so this function can be passed to it as a custom comparator.
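The -1/0/1 comparator contract can be sketched in Python as follows. This is a model of the described behavior under the assumption that "in terms of codepoints" means ordering by codepoint value; the Python `compare` here is a stand-in, and `sorted` with `cmp_to_key` plays the role that List.sort plays in Lily.

```python
from functools import cmp_to_key


def compare(a: str, b: str) -> int:
    # Python orders str values codepoint by codepoint, which matches
    # the ordering this sketch assumes.
    if a < b:
        return -1
    if a > b:
        return 1
    return 0


words = ["zebra", "éclair", "apple"]
# "é" is U+00E9 (233), which sorts after "z" (122).
print(sorted(words, key=cmp_to_key(compare)))  # ['apple', 'zebra', 'éclair']
```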

define each_codepoint(string: String, fn: Function(Integer))

Call fn for each codepoint within string.
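A callback-style traversal like this might look as follows in Python. The function below is an illustrative stand-in, not the Lily implementation, and it assumes codepoints are visited in order from the start of the string.

```python
from typing import Callable, List


def each_codepoint(string: str, fn: Callable[[int], None]) -> None:
    # Invoke the callback once per codepoint, in string order.
    for ch in string:
        fn(ord(ch))


seen: List[int] = []
each_codepoint("añb", seen.append)
print(seen)  # [97, 241, 98]
```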

define encode(codepoint: Integer): Option[String]

Attempt to encode codepoint as UTF-8.

On success, the encoded character is returned as a String inside a Some. Otherwise, None is returned.

define encode_list(codepoints: List[Integer]): Option[String]

Attempt to encode all codepoints in codepoints as UTF-8.

On success, the encoded characters are joined into a String, which is then returned inside a Some. Otherwise, None is returned.
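The success-or-nothing behavior of encode and encode_list can be modeled in Python, with `None` standing in for Lily's None and a plain `str` standing in for Some. The source does not say which codepoints are rejected; this sketch assumes that values outside 0 to 0x10FFFF and the surrogate range 0xD800 to 0xDFFF fail, since neither can be encoded as valid UTF-8.

```python
from typing import List, Optional


def encode(codepoint: int) -> Optional[str]:
    # Assumption: out-of-range values and surrogates are the failure cases.
    if codepoint < 0 or codepoint > 0x10FFFF:
        return None
    if 0xD800 <= codepoint <= 0xDFFF:
        return None
    return chr(codepoint)


def encode_list(codepoints: List[int]) -> Optional[str]:
    pieces = []
    for cp in codepoints:
        encoded = encode(cp)
        if encoded is None:
            # One bad codepoint fails the whole list in this sketch.
            return None
        pieces.append(encoded)
    return "".join(pieces)


print(encode(0x20AC))               # €
print(encode(0xD800))               # None
print(encode_list([104, 233]))      # hé
print(encode_list([104, 0x110000])) # None
```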

define get(string: String, index: Integer): Integer

Return the codepoint at index in string.

If a negative index is given, it is treated as an offset from the end of string, with -1 being considered the last element.
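The negative-index rule matches Python's own indexing, so a small Python model shows it directly. This stand-in `get` assumes index counts codepoints; what the Lily function does for an out-of-range index is not stated in the source and is not modeled here.

```python
def get(string: str, index: int) -> int:
    # Python indexes str by codepoint and treats -1 as the last element,
    # mirroring the rule described above.
    return ord(string[index])


print(get("héllo", 1))   # 233
print(get("héllo", -1))  # 111
```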

define length(string: String): Integer

Return the length of string in codepoints.
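The difference between a length in codepoints and a length in bytes is easiest to see with a non-ASCII string. The Python sketch below is only a model: the function name `length` mirrors the Lily one, and the byte count is shown for contrast.

```python
def length(string: str) -> int:
    # len() on a Python str counts codepoints, not encoded bytes.
    return len(string)


text = "héllo"
print(length(text))               # 5
print(len(text.encode("utf-8")))  # 6, because "é" occupies two bytes
```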

define slice(string: String, start: *Integer, stop: *Integer): String

Create a new String copying a section of string from start to stop. Unlike String.slice, the indices refer to codepoints, not bytes.

If a negative index is given, it is treated as an offset from the end of string, with -1 being considered the last element.
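To show why codepoint indices matter, the following Python sketch contrasts a codepoint slice with a byte slice over the same positions. The name `slice_codepoints` and its defaults (start at 0, stop at the end of the string) are assumptions made for this example; the source marks both arguments optional without stating their defaults. Python slicing also clamps out-of-range indices rather than producing an empty string, so error handling is not modeled.

```python
from typing import Optional


def slice_codepoints(string: str, start: int = 0, stop: Optional[int] = None) -> str:
    # Python slices str by codepoint; negative values count from the end.
    return string[start:stop]


text = "héllo"
print(slice_codepoints(text, 1, 3))  # él
print(slice_codepoints(text, -3))    # llo

# The same positions applied to the UTF-8 bytes select only "é",
# because that character spans bytes 1 and 2.
print(text.encode("utf-8")[1:3].decode("utf-8"))  # é
```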

On error, this generates an empty String. Error conditions are: