2025-03-15 10:50:05 +01:00

---
name = "UTF-8"
file = "src/string/utf8.rs"
---
We will focus on the `String` type and its borrowed variant `&str`. Both hold UTF-8 text, and to enforce this, every function that creates a string either returns a valid UTF-8 string or fails (with the error types we encountered before).
> # deepening
> A word about UTF-8:
> It is a text encoding in which each character (or "code point") is encoded using a variable number of bytes.
> ASCII characters are a subset of UTF-8 and are thus all encoded in 1 byte, while other characters
> take 2, 3, or 4 (the maximum) bytes. Because of this, random access is difficult: you cannot compute
> the position in memory of the Nth code point without walking through the string from its start.
>
> This is why Rust does not make `char`s accessible through direct indexing (the `[]` operator), but allows iterating over them.
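
To make the variable width concrete, here is a small sketch using only standard-library methods (`char::len_utf8`, `str::len`, `str::is_char_boundary`). The helper name `utf8_widths` is hypothetical and only serves the illustration:

```rust
/// Hypothetical helper: the number of bytes each char of `s` occupies in UTF-8.
fn utf8_widths(s: &str) -> Vec<usize> {
    s.chars().map(char::len_utf8).collect()
}

fn main() {
    // 'a' (ASCII), 'é' (U+00E9), '€' (U+20AC) and '🧐' (U+1F9D0).
    let s = "a\u{e9}€🧐";
    // One byte for the ASCII char, then 2, 3 and 4 bytes.
    assert_eq!(utf8_widths(s), vec![1, 2, 3, 4]);
    // `len` counts bytes, not chars.
    assert_eq!(s.len(), 10);
    assert_eq!(s.chars().count(), 4);
    // Byte offset 1 starts a char, byte offset 2 falls inside the 2-byte 'é'.
    assert!(s.is_char_boundary(1));
    assert!(!s.is_char_boundary(2));
}
```

Since the byte length and the char count differ as soon as one non-ASCII char appears, a byte offset cannot stand in for a char position.
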
Let's implement a function accessing the Nth char of a string:
> # note
> You can use [`str::chars`](https://doc.rust-lang.org/std/primitive.str.html#method.chars) to iterate over
> a string's chars. If you want a challenge, you can also read the [UTF-8 spec](https://fr.wikipedia.org/wiki/UTF-8) and iterate over the individual bytes of the string.
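
As a warm-up that deliberately stops short of the exercise, the sketch below shows what iterating with `str::chars` looks like, along with its sibling `str::char_indices`, which also reports the byte offset where each char starts. The helper name `char_offsets` is hypothetical:

```rust
/// Hypothetical helper: pairs each char of `s` with its starting byte offset.
fn char_offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

fn main() {
    let s = "h€llo";
    // `chars` yields decoded chars in order, whatever their encoded width.
    for c in s.chars() {
        println!("{c}");
    }
    // '€' takes 3 bytes, so 'b' starts at byte offset 4.
    assert_eq!(char_offsets("a€b"), vec![(0, 'a'), (1, '€'), (4, 'b')]);
    // The iterator can also be walked from the end.
    assert_eq!(s.chars().rev().collect::<String>(), "oll€h");
}
```
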
```prototype
/// Returns the char at the requested position, or `None` if it is out of bounds
pub fn char_at(s: &str, n: usize) -> Option<char> {
    unimplemented!()
}
```
```example
fn main() {
    assert_eq!(char_at("abcdef", 2), Some('c'));
    assert_eq!(char_at("", 1), None);
    assert_eq!(char_at("🧐", 0), Some('🧐'));
}
```