name = "UTF-8" file = "src/string/utf8.rs"
We will focus on the String type and its borrowed variant &str. These are UTF-8 strings, and to enforce this, every function that creates a string either returns a valid UTF-8 string or fails (with the error types we encountered before).
deepening
A word about UTF-8: it is a string encoding in which characters (or "codepoints") are encoded using a variable number of bytes. ASCII characters are a subset of UTF-8 and are thus all encoded in 1 byte, while other characters take 2, 3, or 4 (maximum) bytes. Because of this, random access is difficult: you cannot compute the position in memory of the Nth codepoint without iterating through the string from the start up to that codepoint.
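To make the variable width concrete, here is a small sketch that reports how many bytes each char of a string occupies. The helper name utf8_widths is made up for this illustration; char::len_utf8 and str::chars come from the standard library.

```rust
/// Returns the number of bytes each char of `s` occupies once encoded in UTF-8.
/// (`utf8_widths` is a hypothetical helper, not part of the exercise.)
pub fn utf8_widths(s: &str) -> Vec<usize> {
    // `char::len_utf8` gives the encoded size of a single char: 1 to 4 bytes.
    s.chars().map(char::len_utf8).collect()
}
```

For a string made of 'a', U+00E9 (é), U+20AC (€) and U+1F9D0 (🧐), the widths are 1, 2, 3 and 4 bytes, so str::len (which counts bytes, not chars) returns 10 for a string of only 4 chars.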
This is why Rust cannot make char accessible with direct indexing ([] operator), but allows iterating over chars.
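As an illustration of why the position of a char has to be found by walking the string, the sketch below computes the byte offset at which the Nth char starts. The function name byte_offset_of_nth is an assumption made for this example; str::char_indices is the standard iterator that yields each char together with its byte offset.

```rust
/// Returns the byte offset at which the `n`th char of `s` starts,
/// or `None` if `s` has fewer than `n + 1` chars.
/// (`byte_offset_of_nth` is a hypothetical helper, not part of the exercise.)
pub fn byte_offset_of_nth(s: &str, n: usize) -> Option<usize> {
    // `char_indices` decodes the string from the start, so reaching
    // the nth char costs O(n) rather than O(1).
    s.char_indices().nth(n).map(|(offset, _ch)| offset)
}
```

In a string made of 'a', U+00E9 (é) and 'b', the third char starts at byte offset 3 rather than 2, because é takes two bytes.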
Let's implement a function accessing the Nth char of a string:
note
You can use str::chars to iterate over a string's chars. If you want some challenge, you can also read the UTF-8 spec and iterate over single bytes of the string.
/// Returns the char at the requested position (if not out of bounds)
pub fn char_at(s: &str, n: usize) -> Option<char> {
    unimplemented!()
}

fn main() {
    assert_eq!(char_at("abcdef", 2), Some('c'));
    assert_eq!(char_at("", 1), None);
    assert_eq!(char_at("🧐", 0), Some('🧐'));
}
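If you get stuck, here is one possible sketch, not necessarily the intended solution. The first function relies on str::chars as the note suggests. The second, char_at_bytes (a name invented here), takes a step toward the byte-level challenge: it assumes only that UTF-8 continuation bytes have the bit pattern 10xxxxxx, locates the first byte of the Nth char by skipping them, and then lets the standard library decode that single char.

```rust
/// Returns the char at the requested position (if not out of bounds).
pub fn char_at(s: &str, n: usize) -> Option<char> {
    // `nth` walks the iterator, decoding one char at a time.
    s.chars().nth(n)
}

/// Same result, but finds the char boundary by inspecting raw bytes.
/// (`char_at_bytes` is a hypothetical variant for the optional challenge.)
pub fn char_at_bytes(s: &str, n: usize) -> Option<char> {
    let start = s
        .bytes()
        .enumerate()
        // Keep only bytes that begin a char: continuation bytes
        // match 0b10xx_xxxx, everything else is a leading byte.
        .filter(|&(_, b)| (b & 0xC0) != 0x80)
        .map(|(i, _)| i)
        .nth(n)?;
    // `start` is a char boundary, so slicing here cannot panic.
    s[start..].chars().next()
}
```

Both versions run in time proportional to the position requested, which is the cost that direct indexing would otherwise hide.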