NetSurf
UTF-8 manipulation functions (interface).
Functions
uint32_t utf8_to_ucs4(const char *s, size_t l)
Convert a UTF-8 multibyte sequence into a single UCS4 character.
size_t utf8_from_ucs4(uint32_t c, char *s)
Convert a single UCS4 character into a UTF-8 multibyte sequence.
size_t utf8_length(const char *s)
Calculate the length (in characters) of a NUL-terminated UTF-8 string.
size_t utf8_bounded_length(const char *s, size_t l)
Calculate the length (in characters) of a bounded UTF-8 string.
size_t utf8_bounded_byte_length(const char *s, size_t l, size_t c)
Calculate the length (in bytes) of a bounded UTF-8 string.
size_t utf8_char_byte_length(const char *s)
Calculate the length (in bytes) of a UTF-8 character.
size_t utf8_prev(const char *s, size_t o)
Find previous legal UTF-8 char in string.
size_t utf8_next(const char *s, size_t l, size_t o)
Find next legal UTF-8 char in string.
nserror utf8_to_enc(const char *string, const char *encname, size_t len, char **result)
Convert a UTF-8 string into the named encoding.
nserror utf8_from_enc(const char *string, const char *encname, size_t len, char **result, size_t *result_len)
Convert a string in the named encoding into a UTF-8 string.
nserror utf8_to_html(const char *string, const char *encname, size_t len, char **result)
Convert a UTF-8 encoded string into a string of the given encoding, applying HTML escape sequences where necessary.
bool utf8_save_text(const char *utf8_text, const char *path)
Save the given UTF-8 text to a file, converting to local encoding.
nserror utf8_finalise(void)
Finalise the UTF-8 library.
Definition in file utf8.h.
size_t utf8_bounded_byte_length(const char *s, size_t l, size_t c)
Calculate the length (in bytes) of a bounded UTF-8 string.
Parameters
s: The string
l: Maximum length of input (in bytes)
c: Maximum number of characters to measure
Definition at line 93 of file utf8.c.
References utf8_next().
Referenced by textarea_insert_text(), and textarea_set_caret().
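The relationship between the byte bound `l` and the character bound `c` can be illustrated with a small self-contained sketch. It is not NetSurf's implementation: `sketch_next` and `sketch_bounded_byte_length` are hypothetical stand-ins, and the code assumes well-formed UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for utf8_next(): offset of the next character start,
 * never exceeding l. Assumes well-formed UTF-8. */
static size_t sketch_next(const char *s, size_t l, size_t o)
{
    assert(s != NULL);
    if (o >= l)
        return l;
    o++; /* step over the current lead byte */
    while (o < l && ((unsigned char) s[o] & 0xC0) == 0x80)
        o++; /* skip continuation bytes (10xxxxxx) */
    return o;
}

/* Bytes occupied by at most c characters within the first l bytes of s. */
size_t sketch_bounded_byte_length(const char *s, size_t l, size_t c)
{
    size_t len = 0;

    while (len < l && c > 0) {
        len = sketch_next(s, l, len);
        c--;
    }
    return len;
}
```

For the 6-byte string "a", "é", "€" (1 + 2 + 3 bytes), asking for two characters yields 3 bytes, while asking for more characters than the string holds yields the byte bound itself.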
size_t utf8_bounded_length(const char *s, size_t l)
Calculate the length (in characters) of a bounded UTF-8 string.
Parameters
s: The string
l: Maximum length of input (in bytes)
Definition at line 80 of file utf8.c.
Referenced by nsfont_width(), textarea_insert_text(), textarea_replace_text_internal(), and utf8_length().
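One common way to count characters in a bounded UTF-8 buffer is to count every byte that is not a continuation byte. The sketch below uses that technique under the assumption of well-formed input; `sketch_bounded_length` is a hypothetical name and the approach is not confirmed to be what utf8.c does.

```c
#include <assert.h>
#include <stddef.h>

/* Count characters in the first l bytes of s by counting non-continuation
 * bytes. A character truncated by the bound is still counted once. */
size_t sketch_bounded_length(const char *s, size_t l)
{
    size_t i, count = 0;

    assert(s != NULL);
    for (i = 0; i < l; i++) {
        if (((unsigned char) s[i] & 0xC0) != 0x80)
            count++; /* ASCII byte or lead byte starts a new character */
    }
    return count;
}
```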
size_t utf8_char_byte_length(const char *s)
Calculate the length (in bytes) of a UTF-8 character.
Parameters
s: Pointer to start of character
Definition at line 104 of file utf8.c.
Referenced by ami_key_to_nskey().
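The byte length of a UTF-8 character is determined entirely by its lead byte. A minimal sketch of that lookup follows; `sketch_char_byte_length` is a hypothetical name, the 5- and 6-byte forms exist only in RFC 2279, and returning 0 for an invalid lead byte is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Derive the sequence length from the lead byte's high bits.
 * Returns 0 for a continuation byte or an invalid lead byte (assumption). */
size_t sketch_char_byte_length(const char *s)
{
    unsigned char b;

    assert(s != NULL);
    b = (unsigned char) s[0];

    if (b < 0x80)           return 1; /* 0xxxxxxx */
    if ((b & 0xE0) == 0xC0) return 2; /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3; /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4; /* 11110xxx */
    if ((b & 0xFC) == 0xF8) return 5; /* 111110xx, RFC 2279 only */
    if ((b & 0xFE) == 0xFC) return 6; /* 1111110x, RFC 2279 only */
    return 0;
}
```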
nserror utf8_finalise(void)
Finalise the UTF-8 library.
Definition at line 197 of file utf8.c.
References last_cd, NSERROR_OK, and utf8_clear_cd_cache().
Referenced by netsurf_exit().
nserror utf8_from_enc(const char *string, const char *encname, size_t len, char **result, size_t *result_len)
Convert a string in the named encoding into a UTF-8 string.
Parameters
string: The NUL-terminated string to convert
encname: The encoding name (suitable for passing to iconv)
len: Length of input string to consider (in bytes), or 0 to use the full string length
result: Pointer to location to store result (allocated on heap)
result_len: The length of the data placed in result
Definition at line 321 of file utf8.c.
References result, and utf8_convert().
Referenced by ami_clipboard_cat_collection(), nsgtk_viewsource(), and utf8_from_local_encoding().
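Since the encoding name is described as suitable for iconv, the conversion can be pictured as a thin wrapper around the POSIX iconv API. The sketch below is an illustration only: `sketch_from_enc` is a hypothetical name, it returns an `int` rather than `nserror`, it treats `len == 0` as "use strlen", and it sizes the output at four bytes per input byte instead of growing the buffer on demand.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Convert len bytes of string from encname to UTF-8.
 * Returns 0 on success, -1 on failure. On success *result is a heap
 * buffer (NUL-terminated) that the caller must free(). */
int sketch_from_enc(const char *string, const char *encname, size_t len,
        char **result, size_t *result_len)
{
    iconv_t cd;
    char *in, *out, *buf;
    size_t in_left, out_left, cap;

    assert(string != NULL && encname != NULL && result != NULL);

    if (len == 0)
        len = strlen(string); /* assumption: 0 means the whole string */

    cd = iconv_open("UTF-8", encname); /* (to, from) */
    if (cd == (iconv_t) -1)
        return -1;

    cap = len * 4 + 1; /* simplification: generous fixed-size output */
    buf = malloc(cap);
    if (buf == NULL) {
        iconv_close(cd);
        return -1;
    }

    in = (char *) string; /* iconv() takes a non-const input pointer */
    in_left = len;
    out = buf;
    out_left = cap - 1;

    if (iconv(cd, &in, &in_left, &out, &out_left) == (size_t) -1) {
        free(buf);
        iconv_close(cd);
        return -1;
    }
    iconv_close(cd);

    *out = '\0';
    *result = buf;
    if (result_len != NULL)
        *result_len = (size_t) (out - buf);
    return 0;
}
```

The References line above mentions a cached conversion descriptor being cleared at finalisation; this sketch opens and closes a descriptor on every call for simplicity.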
size_t utf8_from_ucs4(uint32_t c, char *s)
Convert a single UCS4 character into a UTF-8 multibyte sequence.
RFC 3629 removed the encoding of UCS values beyond the range reachable by UTF-16 (above U+10FFFF). This function nevertheless conforms to the older RFC 2279.
Parameters
c: The character to process (0 <= c <= 0x7FFFFFFF)
s: Pointer to 6 byte long output buffer
Definition at line 56 of file utf8.c.
Referenced by fire_dom_keyboard_event(), ro_textarea_key_press(), and textarea_keypress().
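The 6-byte output buffer and the 0x7FFFFFFF upper bound follow from the RFC 2279 layout, which allows sequences of one to six bytes. The sketch below encodes a code point that way; `sketch_from_ucs4` is a hypothetical name, and returning 0 for out-of-range input is an assumption made for this example rather than documented behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode c as an RFC 2279 style UTF-8 sequence into s (at least 6 bytes).
 * Returns the number of bytes written, or 0 if c > 0x7FFFFFFF. */
size_t sketch_from_ucs4(uint32_t c, char *s)
{
    size_t len, i;
    unsigned char lead;

    assert(s != NULL);

    if (c < 0x80) {
        s[0] = (char) c;
        return 1;
    } else if (c < 0x800) {
        len = 2; lead = 0xC0;
    } else if (c < 0x10000) {
        len = 3; lead = 0xE0;
    } else if (c < 0x200000) {
        len = 4; lead = 0xF0;
    } else if (c < 0x4000000) {
        len = 5; lead = 0xF8;
    } else if (c <= 0x7FFFFFFF) {
        len = 6; lead = 0xFC;
    } else {
        return 0;
    }

    /* Fill continuation bytes from the end, six payload bits each. */
    for (i = len - 1; i > 0; i--) {
        s[i] = (char) (0x80 | (c & 0x3F));
        c >>= 6;
    }
    s[0] = (char) (lead | c); /* remaining high bits go in the lead byte */
    return len;
}
```

Under RFC 3629 only the one- to four-byte forms up to U+10FFFF are valid, so output from the five- and six-byte branches should not be exchanged with strict UTF-8 consumers.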
size_t utf8_length(const char *s)
Calculate the length (in characters) of a NUL-terminated UTF-8 string.
Parameters
s: The string
Definition at line 74 of file utf8.c.
References utf8_bounded_length().
Referenced by ro_textarea_insert_text(), ro_textarea_key_press(), ro_textarea_replace_text(), ro_textarea_set_caret(), textarea_replace_text_internal(), and textarea_set_text().
size_t utf8_next(const char *s, size_t l, size_t o)
Find next legal UTF-8 char in string.
Parameters
s: The string
l: Maximum offset in string
o: Offset in the string to start at
Definition at line 129 of file utf8.c.
Referenced by ami_font_bm_convert_local_to_utf8_offset(), amiga_nsfont_position_in_string(), amiga_nsfont_split(), fb_font_position(), fb_font_split(), fb_font_width(), framebuffer_plot_text(), nsgtk_cw_input_method_commit(), nsgtk_window_input_method_commit(), ro_gui_window_import_text(), ro_textarea_get_caret(), ro_textarea_insert_text(), ro_textarea_replace_text(), ro_textarea_set_caret(), ro_textarea_set_caret_xy(), textarea_char_to_byte_offset(), textarea_keypress(), textplain_coord_from_offset(), textplain_offset_from_coords(), textplain_redraw(), utf8_bounded_byte_length(), utf8_convert_html_chunk(), utf8_to_html(), and utf8_to_local_encoding().
size_t utf8_prev(const char *s, size_t o)
Find previous legal UTF-8 char in string.
Parameters
s: The string
o: Offset in the string to start at
Definition at line 117 of file utf8.c.
Referenced by textarea_keypress().
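Stepping backwards through UTF-8 amounts to moving left until a byte that is not a continuation byte is found. The sketch below shows that idea; `sketch_prev` is a hypothetical stand-in that assumes well-formed input and is not NetSurf's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the character that starts before offset o.
 * At offset 0 there is nothing earlier, so 0 is returned. */
size_t sketch_prev(const char *s, size_t o)
{
    assert(s != NULL);
    while (o > 0) {
        o--;
        if (((unsigned char) s[o] & 0xC0) != 0x80)
            break; /* reached an ASCII byte or a lead byte */
    }
    return o;
}
```

A caller moving a caret left, as a text area key handler might, would replace its current byte offset with the returned value.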
bool utf8_save_text(const char *utf8_text, const char *path)
Save the given UTF-8 text to a file, converting to local encoding.
Parameters
utf8_text: Text to save to file
path: Pathname to save to
Definition at line 467 of file utf8.c.
References guit, NSERROR_OK, NSLOG, path(), netsurf_table::utf8, and gui_utf8_table::utf8_to_local.
Referenced by ro_gui_save_content().
nserror utf8_to_enc(const char *string, const char *encname, size_t len, char **result)
Convert a UTF-8 string into the named encoding.
Parameters
string: The NUL-terminated string to convert
encname: The encoding name (suitable for passing to iconv)
len: Length of input string to consider (in bytes), or 0 to use the full string length
result: Pointer to location to store result (allocated on heap)
Definition at line 314 of file utf8.c.
References result, and utf8_convert().
Referenced by ami_font_unicode_width(), amiga_nsfont_position_in_string(), amiga_nsfont_split(), amiga_nsfont_text(), form_encode_item(), utf8_to_font_encoding(), utf8_to_local(), and utf8_to_local_encoding().
nserror utf8_to_html(const char *string, const char *encname, size_t len, char **result)
Convert a UTF-8 encoded string into a string of the given encoding, applying HTML escape sequences where necessary.
Parameters
string: String to convert (NUL-terminated)
encname: Name of encoding to convert to
len: Length, in bytes, of the input string, or 0
result: Pointer to location to receive result
Definition at line 369 of file utf8.c.
References cd, get_cached_cd(), NSERROR_NOMEM, NSERROR_OK, result, utf8_clear_cd_cache(), utf8_convert_html_chunk(), and utf8_next().
Referenced by global_history_export_enter_cb(), hotlist_export_enter_cb(), save_complete_node_handler(), save_complete_rewrite_url_value(), and save_complete_write_value().
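To make "applying HTML escape sequences where necessary" concrete, the sketch below handles one fixed case: the target encoding is assumed to be US-ASCII, and every character that ASCII cannot represent is written as a hexadecimal numeric character reference. The names `sketch_decode` and `sketch_to_ascii_html`, the reference format, and the use of U+FFFD for malformed input are all assumptions for illustration; the real function takes an arbitrary encoding name and reports errors through `nserror`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode one UTF-8 character of up to 4 bytes; *used receives the number
 * of bytes consumed. Malformed input yields U+FFFD. Overlong forms are
 * not rejected in this sketch. */
static uint32_t sketch_decode(const unsigned char *p, size_t avail, size_t *used)
{
    size_t n, i;
    uint32_t c;

    if (p[0] < 0x80) { *used = 1; return p[0]; }
    else if ((p[0] & 0xE0) == 0xC0) { n = 2; c = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { n = 3; c = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { n = 4; c = p[0] & 0x07; }
    else { *used = 1; return 0xFFFD; }

    if (n > avail) { *used = avail; return 0xFFFD; }
    for (i = 1; i < n; i++) {
        if ((p[i] & 0xC0) != 0x80) { *used = i; return 0xFFFD; }
        c = (c << 6) | (p[i] & 0x3F);
    }
    *used = n;
    return c;
}

/* Convert UTF-8 to ASCII, escaping non-ASCII characters as "&#xHHHH;".
 * Returns a heap string the caller must free(), or NULL on allocation
 * failure. len == 0 is treated as "use strlen" (assumption). */
char *sketch_to_ascii_html(const char *string, size_t len)
{
    const unsigned char *p = (const unsigned char *) string;
    char *out;
    size_t i = 0, o = 0, used;

    assert(string != NULL);
    if (len == 0)
        len = strlen(string);

    /* A reference is at most 10 bytes long and consumes at least one
     * input byte, so len * 10 + 1 bytes is always enough. */
    out = malloc(len * 10 + 1);
    if (out == NULL)
        return NULL;

    while (i < len) {
        uint32_t c = sketch_decode(p + i, len - i, &used);
        if (c < 0x80)
            out[o++] = (char) c; /* representable: copy through */
        else
            o += (size_t) sprintf(out + o, "&#x%lX;", (unsigned long) c);
        i += used;
    }
    out[o] = '\0';
    return out;
}
```

This sketch does not escape markup characters such as `&` or `<`; it only covers characters the target encoding cannot represent.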
uint32_t utf8_to_ucs4(const char *s, size_t l)
Convert a UTF-8 multibyte sequence into a single UCS4 character.
RFC 3629 removed the encoding of UCS values beyond the range reachable by UTF-16 (above U+10FFFF). This function nevertheless conforms to the older RFC 2279.
Parameters
s [in]: The sequence to process
l [in]: Length of sequence
Definition at line 41 of file utf8.c.
Referenced by ami_key_to_nskey(), fb_font_position(), fb_font_split(), fb_font_width(), framebuffer_plot_text(), nsbeos_window_keypress_event(), nsgtk_cw_input_method_commit(), nsgtk_window_input_method_commit(), and utf8_convert_html_chunk().
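Decoding under RFC 2279 means accepting lead bytes for sequences of up to six bytes and accumulating six payload bits from each continuation byte. The sketch below shows one way to do that; `sketch_to_ucs4` is a hypothetical name, and returning U+FFFD for truncated or malformed input is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the UTF-8 sequence starting at s, reading at most l bytes.
 * Returns the code point, or 0xFFFD if the sequence is truncated or
 * malformed. Overlong forms are not rejected in this sketch. */
uint32_t sketch_to_ucs4(const char *s, size_t l)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t n, i;
    uint32_t c;

    assert(s != NULL);
    if (l == 0)
        return 0xFFFD;

    if (p[0] < 0x80) return p[0];
    else if ((p[0] & 0xE0) == 0xC0) { n = 2; c = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { n = 3; c = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { n = 4; c = p[0] & 0x07; }
    else if ((p[0] & 0xFC) == 0xF8) { n = 5; c = p[0] & 0x03; }
    else if ((p[0] & 0xFE) == 0xFC) { n = 6; c = p[0] & 0x01; }
    else return 0xFFFD; /* stray continuation byte, 0xFE or 0xFF */

    if (n > l)
        return 0xFFFD; /* sequence runs past the supplied length */

    for (i = 1; i < n; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0xFFFD; /* expected a continuation byte */
        c = (c << 6) | (p[i] & 0x3F);
    }
    return c;
}
```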