NetSurf
utf8.h File Reference

UTF-8 manipulation functions (interface).

#include <stdbool.h>
#include <stdint.h>
#include "utils/errors.h"

Functions

uint32_t utf8_to_ucs4 (const char *s, size_t l)
 Convert a UTF-8 multibyte sequence into a single UCS4 character.

size_t utf8_from_ucs4 (uint32_t c, char *s)
 Convert a single UCS4 character into a UTF-8 multibyte sequence.

size_t utf8_length (const char *s)
 Calculate the length (in characters) of a NUL-terminated UTF-8 string.

size_t utf8_bounded_length (const char *s, size_t l)
 Calculate the length (in characters) of a bounded UTF-8 string.

size_t utf8_bounded_byte_length (const char *s, size_t l, size_t c)
 Calculate the length (in bytes) of a bounded UTF-8 string.

size_t utf8_char_byte_length (const char *s)
 Calculate the length (in bytes) of a UTF-8 character.

size_t utf8_prev (const char *s, size_t o)
 Find previous legal UTF-8 char in string.

size_t utf8_next (const char *s, size_t l, size_t o)
 Find next legal UTF-8 char in string.

nserror utf8_to_enc (const char *string, const char *encname, size_t len, char **result)
 Convert a UTF-8 string into the named encoding.

nserror utf8_from_enc (const char *string, const char *encname, size_t len, char **result, size_t *result_len)
 Convert a string in the named encoding into a UTF-8 string.

nserror utf8_to_html (const char *string, const char *encname, size_t len, char **result)
 Convert a UTF-8 encoded string into a string of the given encoding, applying HTML escape sequences where necessary.

bool utf8_save_text (const char *utf8_text, const char *path)
 Save the given UTF-8 text to a file, converting to local encoding.

nserror utf8_finalise (void)
 Finalise the UTF-8 library.
 

Detailed Description

UTF-8 manipulation functions (interface).

Definition in file utf8.h.

Function Documentation

◆ utf8_bounded_byte_length()

size_t utf8_bounded_byte_length ( const char *  s,
size_t  l,
size_t  c 
)

Calculate the length (in bytes) of a bounded UTF-8 string.

Parameters
s  The string
l  Maximum length of input (in bytes)
c  Maximum number of characters to measure
Returns
Length of string, in bytes

Definition at line 93 of file utf8.c.

References utf8_next().

Referenced by textarea_insert_text(), and textarea_set_caret().
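
The interplay between the byte bound l and the character bound c can be sketched as follows. This is only an illustration, not the code in utf8.c: the name sketch_bounded_byte_length is hypothetical, and the loop assumes well-formed input in which continuation bytes have the bit pattern 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of utf8.h): number of bytes occupied by
 * the first `c` characters of `s`, never reading past `l` bytes.
 * Assumes well-formed UTF-8. */
size_t sketch_bounded_byte_length(const char *s, size_t l, size_t c)
{
    size_t off = 0;

    assert(s != NULL);

    while (c > 0 && off < l) {
        off++; /* step over the lead byte */
        /* then over any continuation bytes (10xxxxxx) */
        while (off < l && ((unsigned char)s[off] & 0xC0) == 0x80)
            off++;
        c--;
    }
    return off;
}
```

For the 6-byte string "h\xC3\xA9llo" with c = 2, the sketch returns 3: one byte for 'h' and two for the two-byte character that follows it.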


◆ utf8_bounded_length()

size_t utf8_bounded_length ( const char *  s,
size_t  l 
)

Calculate the length (in characters) of a bounded UTF-8 string.

Parameters
s  The string
l  Maximum length of input (in bytes)
Returns
Length of string, in characters

Definition at line 80 of file utf8.c.

Referenced by nsfont_width(), textarea_insert_text(), textarea_replace_text_internal(), and utf8_length().
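
One common way to count characters in a bounded UTF-8 buffer is to count every byte that is not a continuation byte. The sketch below illustrates that idea under the assumption of well-formed input; it is not the implementation in utf8.c, and both function names are hypothetical. The second helper mirrors the documented fact that utf8_length() references utf8_bounded_length(); passing strlen(s) as the bound is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the characters in the first `l` bytes of
 * `s` by counting bytes that are not continuation bytes (10xxxxxx). */
size_t sketch_bounded_length(const char *s, size_t l)
{
    size_t chars = 0;

    assert(s != NULL);

    for (size_t i = 0; i < l; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}

/* Hypothetical NUL-terminated variant built on the bounded one.
 * Using strlen(s) as the byte bound is an assumption of this sketch. */
size_t sketch_length(const char *s)
{
    return sketch_bounded_length(s, strlen(s));
}
```

Note that a bound which cuts a multibyte character in half still counts its lead byte, so callers would normally pass a bound that falls on a character boundary.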


◆ utf8_char_byte_length()

size_t utf8_char_byte_length ( const char *  s)

Calculate the length (in bytes) of a UTF-8 character.

Parameters
s  Pointer to start of character
Returns
Length of character, in bytes

Definition at line 104 of file utf8.c.

Referenced by ami_key_to_nskey().
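
The byte length of a UTF-8 character is determined by its lead byte alone. The following sketch shows that mapping, including the 5- and 6-byte forms that exist only in RFC 2279. The name sketch_char_byte_length is hypothetical, and returning 1 for a byte that cannot start a character is an arbitrary choice of this sketch, not documented behaviour of utf8_char_byte_length().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte length of the UTF-8 character starting at
 * `s`, derived from the lead byte only. */
size_t sketch_char_byte_length(const char *s)
{
    unsigned char b;

    assert(s != NULL);
    b = (unsigned char)s[0];

    if (b < 0x80)           return 1; /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2; /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3; /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4; /* 11110xxx */
    if ((b & 0xFC) == 0xF8) return 5; /* 111110xx, RFC 2279 only */
    if ((b & 0xFE) == 0xFC) return 6; /* 1111110x, RFC 2279 only */

    /* Continuation byte or 0xFE/0xFF: not a valid lead byte.
     * Treating it as a single byte is an assumption of this sketch. */
    return 1;
}
```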


◆ utf8_finalise()

nserror utf8_finalise ( void  )

Finalise the UTF-8 library.

Definition at line 197 of file utf8.c.

References last_cd, NSERROR_OK, and utf8_clear_cd_cache().

Referenced by netsurf_exit().


◆ utf8_from_enc()

nserror utf8_from_enc ( const char *  string,
const char *  encname,
size_t  len,
char **  result,
size_t *  result_len 
)

Convert a string in the named encoding into a UTF-8 string.

Parameters
string      The NUL-terminated string to convert
encname     The encoding name (suitable for passing to iconv)
len         Length of input string to consider (in bytes), or 0
result      Pointer to location to store result (allocated on heap)
result_len  The length of the data placed in result
Returns
standard nserror value

Definition at line 321 of file utf8.c.

References result, and utf8_convert().

Referenced by ami_clipboard_cat_collection(), nsgtk_viewsource(), and utf8_from_local_encoding().


◆ utf8_from_ucs4()

size_t utf8_from_ucs4 ( uint32_t  c,
char *  s 
)

Convert a single UCS4 character into a UTF-8 multibyte sequence.

RFC 3629 removed the encoding of UCS values outside the range reachable by UTF-16 (above U+10FFFF). This function conforms to the earlier RFC 2279, however, so it can emit sequences of up to six bytes.

Parameters
c  The character to process (0 <= c <= 0x7FFFFFFF)
s  Pointer to a 6-byte output buffer
Returns
Length of multibyte sequence

Definition at line 56 of file utf8.c.

Referenced by fire_dom_keyboard_event(), ro_textarea_key_press(), and textarea_keypress().
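
The RFC 2279 encoding described above can be sketched as follows: the value range selects the sequence length and lead-byte prefix, and each continuation byte carries six payload bits. This is an illustration only, not the code in utf8.c; the name sketch_from_ucs4 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode `c` (0 <= c <= 0x7FFFFFFF) as an
 * RFC 2279 UTF-8 sequence into `s`, which must hold at least 6 bytes.
 * Returns the number of bytes written. */
size_t sketch_from_ucs4(uint32_t c, char *s)
{
    size_t len;
    unsigned char lead;

    assert(s != NULL);
    assert(c <= 0x7FFFFFFF);

    if (c < 0x80) {
        s[0] = (char)c;
        return 1;
    } else if (c < 0x800) {
        len = 2; lead = 0xC0;
    } else if (c < 0x10000) {
        len = 3; lead = 0xE0;
    } else if (c < 0x200000) {
        len = 4; lead = 0xF0;
    } else if (c < 0x4000000) {
        len = 5; lead = 0xF8; /* RFC 2279 only */
    } else {
        len = 6; lead = 0xFC; /* RFC 2279 only */
    }

    /* Fill continuation bytes from the end, six payload bits each. */
    for (size_t i = len - 1; i > 0; i--) {
        s[i] = (char)(0x80 | (c & 0x3F));
        c >>= 6;
    }
    /* Whatever bits remain go into the lead byte. */
    s[0] = (char)(lead | c);
    return len;
}
```

With this sketch, U+20AC encodes to the three bytes E2 82 AC, and the maximum value 0x7FFFFFFF fills all six bytes of the buffer.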


◆ utf8_length()

size_t utf8_length ( const char *  s)

Calculate the length (in characters) of a NUL-terminated UTF-8 string.

Parameters
s  The string
Returns
Length of string, in characters

Definition at line 74 of file utf8.c.

References utf8_bounded_length().

Referenced by ro_textarea_insert_text(), ro_textarea_key_press(), ro_textarea_replace_text(), ro_textarea_set_caret(), textarea_replace_text_internal(), and textarea_set_text().


◆ utf8_next()

size_t utf8_next ( const char *  s,
size_t  l,
size_t  o 
)

Find next legal UTF-8 char in string.

Parameters
s  The string
l  Maximum offset in string
o  Offset in the string to start at
Returns
Offset of first byte of next legal character

Definition at line 129 of file utf8.c.

Referenced by ami_font_bm_convert_local_to_utf8_offset(), amiga_nsfont_position_in_string(), amiga_nsfont_split(), fb_font_position(), fb_font_split(), fb_font_width(), framebuffer_plot_text(), nsgtk_cw_input_method_commit(), nsgtk_window_input_method_commit(), ro_gui_window_import_text(), ro_textarea_get_caret(), ro_textarea_insert_text(), ro_textarea_replace_text(), ro_textarea_set_caret(), ro_textarea_set_caret_xy(), textarea_char_to_byte_offset(), textarea_keypress(), textplain_coord_from_offset(), textplain_offset_from_coords(), textplain_redraw(), utf8_bounded_byte_length(), utf8_convert_html_chunk(), utf8_to_html(), and utf8_to_local_encoding().
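
Stepping forward by one character amounts to moving past the current byte and then past any continuation bytes, without exceeding the maximum offset. The sketch below illustrates this, together with the typical loop a caller might build on it. Both function names are hypothetical, the input is assumed to be well-formed, and returning l when o is already at or beyond l is an assumption of the sketch rather than documented behaviour of utf8_next().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of the first byte of the character that
 * follows the one at offset `o`, bounded by `l`. */
size_t sketch_next(const char *s, size_t l, size_t o)
{
    assert(s != NULL);

    if (o >= l)
        return l; /* assumption: clamp to the end of the string */

    o++; /* past the current byte */
    while (o < l && ((unsigned char)s[o] & 0xC0) == 0x80)
        o++; /* past continuation bytes (10xxxxxx) */
    return o;
}

/* Hypothetical usage: visit each character once and count them. */
size_t sketch_count_by_walking(const char *s, size_t l)
{
    size_t n = 0;

    for (size_t o = 0; o < l; o = sketch_next(s, l, o))
        n++;
    return n;
}
```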


◆ utf8_prev()

size_t utf8_prev ( const char *  s,
size_t  o 
)

Find previous legal UTF-8 char in string.

Parameters
s  The string
o  Offset in the string to start at
Returns
Offset of first byte of previous legal character

Definition at line 117 of file utf8.c.

Referenced by textarea_keypress().
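
Stepping backward works the other way round: move back one byte, then keep moving back while the byte under the offset is a continuation byte. The sketch below illustrates this for well-formed input. The name sketch_prev is hypothetical, and returning 0 when o is already 0 is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of the first byte of the character that
 * precedes offset `o` in `s`. */
size_t sketch_prev(const char *s, size_t o)
{
    assert(s != NULL);

    if (o == 0)
        return 0; /* assumption: nothing before the start */

    o--; /* back one byte */
    while (o > 0 && ((unsigned char)s[o] & 0xC0) == 0x80)
        o--; /* back over continuation bytes (10xxxxxx) */
    return o;
}
```

A caller handling a backspace key, for example, could delete the bytes between sketch_prev(s, caret) and caret to remove exactly one character.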


◆ utf8_save_text()

bool utf8_save_text ( const char *  utf8_text,
const char *  path 
)

Save the given UTF-8 text to a file, converting to local encoding.

Parameters
utf8_text  Text to save to file
path       Pathname to save to
Returns
true iff the save succeeded

Definition at line 467 of file utf8.c.

References guit, NSERROR_OK, NSLOG, path(), netsurf_table::utf8, and gui_utf8_table::utf8_to_local.

Referenced by ro_gui_save_content().


◆ utf8_to_enc()

nserror utf8_to_enc ( const char *  string,
const char *  encname,
size_t  len,
char **  result 
)

Convert a UTF-8 string into the named encoding.

Parameters
string   The NUL-terminated string to convert
encname  The encoding name (suitable for passing to iconv)
len      Length of input string to consider (in bytes), or 0
result   Pointer to location to store result (allocated on heap)
Returns
standard nserror value

Definition at line 314 of file utf8.c.

References result, and utf8_convert().

Referenced by ami_font_unicode_width(), amiga_nsfont_position_in_string(), amiga_nsfont_split(), amiga_nsfont_text(), form_encode_item(), utf8_to_font_encoding(), utf8_to_local(), and utf8_to_local_encoding().


◆ utf8_to_html()

nserror utf8_to_html ( const char *  string,
const char *  encname,
size_t  len,
char **  result 
)

Convert a UTF-8 encoded string into a string of the given encoding, applying HTML escape sequences where necessary.

Parameters
string   String to convert (NUL-terminated)
encname  Name of encoding to convert to
len      Length, in bytes, of the input string, or 0
result   Pointer to location to receive result
Returns
standard nserror code

Definition at line 369 of file utf8.c.

References cd, get_cached_cd(), NSERROR_NOMEM, NSERROR_OK, result, utf8_clear_cd_cache(), utf8_convert_html_chunk(), and utf8_next().

Referenced by global_history_export_enter_cb(), hotlist_export_enter_cb(), save_complete_node_handler(), save_complete_rewrite_url_value(), and save_complete_write_value().
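
To illustrate what applying HTML escape sequences can look like, the sketch below converts UTF-8 to plain ASCII and replaces every character that ASCII cannot represent with a hexadecimal numeric character reference. It is a simplified stand-in, not the code in utf8.c: the name sketch_to_ascii_html is hypothetical, the target encoding is fixed to ASCII instead of being named by encname, ASCII bytes are copied through unchanged, and the exact reference format that utf8_to_html() emits is not stated in this reference. The input is assumed to be well-formed with at most four bytes per character.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated, NUL-terminated ASCII
 * copy of the first `len` bytes of `in`, with each non-ASCII character
 * written as "&#xHEX;". Returns NULL on allocation failure.
 * The caller frees the result. */
char *sketch_to_ascii_html(const char *in, size_t len)
{
    char *out;
    size_t i = 0, o = 0;

    assert(in != NULL);

    /* The longest reference, "&#x1FFFFF;", is 10 bytes, so 10 output
     * bytes per input byte plus the terminator is always enough. */
    out = malloc(len * 10 + 1);
    if (out == NULL)
        return NULL;

    while (i < len) {
        unsigned char b = (unsigned char)in[i];

        if (b < 0x80) { /* representable in ASCII: copy as-is */
            out[o++] = (char)b;
            i++;
            continue;
        }

        /* Decode one multibyte character (2 to 4 bytes). */
        size_t n = (b & 0xE0) == 0xC0 ? 2 : (b & 0xF0) == 0xE0 ? 3 : 4;
        uint32_t c = b & (0xFFu >> (n + 1));
        for (size_t k = 1; k < n && i + k < len; k++)
            c = (c << 6) | ((unsigned char)in[i + k] & 0x3F);

        /* Emit the numeric character reference. */
        o += (size_t)snprintf(out + o, 11, "&#x%X;", (unsigned int)c);
        i += n;
    }
    out[o] = '\0';
    return out;
}
```

Under these assumptions, the 5-byte input "caf\xC3\xA9" becomes the ASCII string "caf&#xE9;".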


◆ utf8_to_ucs4()

uint32_t utf8_to_ucs4 ( const char *  s,
size_t  l 
)

Convert a UTF-8 multibyte sequence into a single UCS4 character.

RFC 3629 removed the encoding of UCS values outside the range reachable by UTF-16 (above U+10FFFF). This function conforms to the earlier RFC 2279, however.

Parameters
[in]  s  The sequence to process
[in]  l  Length of sequence
Returns
UCS4 character

Definition at line 41 of file utf8.c.

Referenced by ami_key_to_nskey(), fb_font_position(), fb_font_split(), fb_font_width(), framebuffer_plot_text(), nsbeos_window_keypress_event(), nsgtk_cw_input_method_commit(), nsgtk_window_input_method_commit(), and utf8_convert_html_chunk().
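
Decoding reverses the encoding: the lead byte gives the sequence length and the top payload bits, and each continuation byte contributes six more. The sketch below illustrates an RFC 2279 style decoder. It is not the code in utf8.c: the name sketch_to_ucs4 is hypothetical, returning U+FFFD for truncated or malformed input is an assumption of the sketch, and overlong encodings are not rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the UTF-8 sequence at `s` (at most `l`
 * bytes available) into one UCS4 value. Returns 0xFFFD on truncated
 * or malformed input (an assumption of this sketch). */
uint32_t sketch_to_ucs4(const char *s, size_t l)
{
    const unsigned char *u = (const unsigned char *)s;
    uint32_t c;
    size_t n;

    assert(s != NULL);

    if (l == 0)
        return 0xFFFD;

    if (u[0] < 0x80)
        return u[0]; /* single ASCII byte */
    else if ((u[0] & 0xE0) == 0xC0) { n = 2; c = u[0] & 0x1F; }
    else if ((u[0] & 0xF0) == 0xE0) { n = 3; c = u[0] & 0x0F; }
    else if ((u[0] & 0xF8) == 0xF0) { n = 4; c = u[0] & 0x07; }
    else if ((u[0] & 0xFC) == 0xF8) { n = 5; c = u[0] & 0x03; }
    else if ((u[0] & 0xFE) == 0xFC) { n = 6; c = u[0] & 0x01; }
    else
        return 0xFFFD; /* not a valid lead byte */

    if (n > l)
        return 0xFFFD; /* sequence is cut short */

    for (size_t i = 1; i < n; i++) {
        if ((u[i] & 0xC0) != 0x80)
            return 0xFFFD; /* expected a continuation byte */
        c = (c << 6) | (u[i] & 0x3F);
    }
    return c;
}
```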
