Hubbub $Id$
#include <inttypes.h>
#include <parserutils/errors.h>
Functions
| parserutils_error | hubbub_charset_extract (const uint8_t *data, size_t len, uint16_t *mibenum, uint32_t *source) | Extract a charset from a chunk of data. |
| uint16_t | hubbub_charset_parse_content (const uint8_t *value, uint32_t valuelen) | Parse a content= attribute's value. |
| void | hubbub_charset_fix_charset (uint16_t *charset) | Fix charsets, according to the override table in HTML5, section 8.2.2.2. |
parserutils_error hubbub_charset_extract (const uint8_t *data, size_t len, uint16_t *mibenum, uint32_t *source)
Extract a charset from a chunk of data.
| data | Pointer to buffer containing data |
| len | Buffer length |
| mibenum | Pointer to location containing current MIB enum |
| source | Pointer to location containing current charset source |
Both mibenum and source will be updated on exit.
The larger a chunk of data fed to this routine, the better, as it allows charset autodetection access to a larger dataset for analysis.
Meaning of *source on entry:
CONFIDENT - Do not pass Go, do not attempt auto-detection.
TENTATIVE - We've tried to autodetect already, but subsequently discovered that we don't actually support the detected charset. Thus, we've defaulted to Windows-1252. Don't perform auto-detection again, as it would be futile. (This bit diverges from the spec)
UNKNOWN - No autodetection performed yet. Get on with it.
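The entry states above amount to a gate on auto-detection: only the UNKNOWN state lets detection run. A minimal sketch of that gate follows. The enum `charset_source` and the helper `should_autodetect` are hypothetical names introduced for illustration, and the numeric values are assumptions; none of this is taken from Hubbub's own headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three source states described above.
 * Names and numeric values are assumptions, not Hubbub's definitions. */
enum charset_source {
	SOURCE_UNKNOWN = 0,   /* no auto-detection performed yet */
	SOURCE_TENTATIVE = 1, /* detection already ran; defaulted to Windows-1252 */
	SOURCE_CONFIDENT = 2  /* charset is trusted; never auto-detect */
};

/* Return true only when the entry state permits auto-detection. */
static bool should_autodetect(const uint32_t *source)
{
	assert(source != NULL);
	return *source == SOURCE_UNKNOWN;
}
```

A caller following this pattern would start with the source set to the UNKNOWN state and reuse the same variable across calls, so that a TENTATIVE or CONFIDENT result from an earlier chunk suppresses further detection.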
void hubbub_charset_fix_charset (uint16_t *charset)
Fix charsets, according to the override table in HTML5, section 8.2.2.2.
Character encoding requirements http://www.whatwg.org/specs/web-apps/current-work/#character0
| charset | Pointer to charset value to fix |
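An override table of this kind can be pictured as a lookup that rewrites the MIB enum in place. The sketch below is illustrative only: `fix_charset_sketch` is a hypothetical helper, the table is deliberately partial (two rows that, to the best of our knowledge, appeared in the HTML5 draft's override table), and the MIB enum values are hard-coded from the IANA registry rather than resolved by name as a real implementation might do. Consult the specification for the full table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IANA MIB enum values used by this sketch. */
#define MIB_US_ASCII     3
#define MIB_ISO_8859_1   4
#define MIB_WINDOWS_1252 2252

/* Partial, illustrative override table: only two rows are shown. */
static const struct {
	uint16_t from;
	uint16_t to;
} overrides[] = {
	{ MIB_ISO_8859_1, MIB_WINDOWS_1252 },
	{ MIB_US_ASCII,   MIB_WINDOWS_1252 },
};

/* Hypothetical helper: rewrite *charset in place if it has an override;
 * leave it untouched otherwise. */
static void fix_charset_sketch(uint16_t *charset)
{
	assert(charset != NULL);

	for (size_t i = 0; i < sizeof overrides / sizeof overrides[0]; i++) {
		if (*charset == overrides[i].from) {
			*charset = overrides[i].to;
			return;
		}
	}
}
```

Charsets without a row in the table, such as UTF-8 (MIB enum 106), pass through unchanged.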
uint16_t hubbub_charset_parse_content (const uint8_t *value, uint32_t valuelen)
Parse a content= attribute's value.
| value | Attribute's value |
| valuelen | Length of value |
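To illustrate what parsing a content= value involves, the sketch below locates the charset name inside a string such as `text/html; charset=UTF-8`. It is a simplified reading of the HTML5 approach, not Hubbub's implementation: `find_charset_name` and `is_space` are hypothetical helpers, and the final step of mapping the name to a MIB enum (which is what the documented function returns) is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Whitespace as used in HTML attribute values. */
static int is_space(uint8_t c)
{
	return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* Hypothetical helper: find the charset name in a content= value.
 * Returns a pointer into value (not NUL-terminated) and stores the
 * name's length in *namelen, or returns NULL if none is found. */
static const uint8_t *find_charset_name(const uint8_t *value,
		uint32_t valuelen, uint32_t *namelen)
{
	static const char key[] = "charset";
	const uint32_t keylen = sizeof key - 1;
	uint32_t i, j;

	assert(value != NULL && namelen != NULL);

	for (i = 0; i + keylen <= valuelen; i++) {
		/* Case-insensitive match of "charset" at offset i */
		for (j = 0; j < keylen; j++) {
			if (tolower(value[i + j]) != key[j])
				break;
		}
		if (j < keylen)
			continue;

		/* Expect optional whitespace, '=', optional whitespace */
		uint32_t p = i + keylen;
		while (p < valuelen && is_space(value[p]))
			p++;
		if (p >= valuelen || value[p] != '=')
			continue;
		p++;
		while (p < valuelen && is_space(value[p]))
			p++;
		if (p >= valuelen)
			return NULL;

		uint32_t start, end;
		if (value[p] == '"' || value[p] == '\'') {
			/* Quoted name: runs up to the matching quote */
			uint8_t quote = value[p];
			start = ++p;
			while (p < valuelen && value[p] != quote)
				p++;
			if (p >= valuelen)
				return NULL; /* unterminated quote */
			end = p;
		} else {
			/* Unquoted name: runs up to whitespace or ';' */
			start = p;
			while (p < valuelen && !is_space(value[p]) &&
					value[p] != ';')
				p++;
			end = p;
		}

		if (end == start)
			return NULL;
		*namelen = end - start;
		return value + start;
	}

	return NULL;
}
```

Because the input is a pointer plus a length rather than a NUL-terminated string, the sketch never reads past `valuelen` and returns a sub-range of the caller's buffer instead of allocating a copy.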