---
name = "UTF-8"
file = "src/string/utf8.rs"
---

We will focus on the `String` type and its borrowed variant `&str`. Both are guaranteed to hold valid UTF-8: to enforce this, every safe function that creates a string either returns a valid UTF-8 string or fails (with the error types we encountered before).

> # deepening
> A word about UTF-8:
> It is a string format where characters (or "codepoints") are encoded using a variable number of bytes.
> ASCII characters are a subset of UTF-8, and thus are all encoded in 1 byte, but other characters
> can take 2, 3, or 4 (maximum) bytes. Because of this, random access is difficult: you cannot compute
> the position in memory of the Nth codepoint without walking through the string from the start.
>
> This is why Rust does not make `char`s accessible with direct indexing (`[]` operator), but allows iterating over `char`s.
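
To make the variable-width encoding concrete, here is a minimal sketch that lists, for each `char` of a string, the byte offset where it starts and how many bytes it takes. The helper name `char_layout` is an assumption made for this illustration; `str::char_indices` and `char::len_utf8` come from the standard library.

```rust
/// Lists every `char` of `s` with the byte offset where it starts and the
/// number of bytes it occupies in UTF-8.
/// (`char_layout` is a hypothetical helper, named only for this illustration.)
pub fn char_layout(s: &str) -> Vec<(usize, char, usize)> {
    s.char_indices()
        .map(|(offset, c)| (offset, c, c.len_utf8()))
        .collect()
}
```

For instance, `char_layout("a\u{e9}\u{20ac}\u{1f9d0}")` (the string `aé€🧐`) yields `[(0, 'a', 1), (1, 'é', 2), (3, '€', 3), (6, '🧐', 4)]`: the string holds 4 `char`s, but its `len()` is 10 bytes, and the offset of each `char` depends on the widths of all the `char`s before it.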

Let's implement a function that returns the Nth char of a string (counting from 0):

> # note
> You can use [`str::chars`](https://doc.rust-lang.org/std/primitive.str.html#method.chars) to iterate over
> the chars of a string. If you want a challenge, you can also read the [UTF-8 article on Wikipedia](https://fr.wikipedia.org/wiki/UTF-8) (in French) and iterate over the individual bytes of the string.
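
If you go for the byte-level challenge, the first byte of each encoded char tells you how long the sequence is. The sketch below is one possible starting point, not the expected solution: `sequence_len` and `char_starts` are hypothetical helper names, and the code relies on the fact that a `&str` always contains valid UTF-8.

```rust
/// Length in bytes of the UTF-8 sequence that starts with `first`,
/// or `None` if `first` is a continuation byte or can never start a sequence.
/// (`sequence_len` is a hypothetical helper, not part of the standard library.)
pub fn sequence_len(first: u8) -> Option<usize> {
    match first {
        0x00..=0x7F => Some(1), // 0xxxxxxx: ASCII
        0xC2..=0xDF => Some(2), // 110xxxxx (0xC0 and 0xC1 would be overlong)
        0xE0..=0xEF => Some(3), // 1110xxxx
        0xF0..=0xF4 => Some(4), // 11110xxx (up to U+10FFFF)
        _ => None,              // 10xxxxxx (continuation) or never valid
    }
}

/// Byte offsets at which each char of `s` starts, found by hopping from
/// lead byte to lead byte instead of calling `str::chars`.
pub fn char_starts(s: &str) -> Vec<usize> {
    let bytes = s.as_bytes();
    let mut starts = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        starts.push(i);
        // A `&str` is always valid UTF-8, so `i` always lands on a lead byte.
        i += sequence_len(bytes[i]).expect("&str is valid UTF-8");
    }
    starts
}
```

Decoding the actual `char` from the bytes of each sequence is left as the remaining part of the challenge.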

```prototype
/// Returns the char at position `n` (counting from 0), or `None` if `n` is out of bounds
pub fn char_at(s: &str, n: usize) -> Option<char> {
    unimplemented!()
}
```

```example
fn main() {
    assert_eq!(char_at("abcdef", 2), Some('c'));
    assert_eq!(char_at("", 1), None);
    assert_eq!(char_at("🧐", 0), Some('🧐'));
}
```