Joomla CMS
3.10.11 (with JPlatform 13.1 included)
API documentation for Joomla CMS version 3.10.11 and the bundled Joomla Platform framework
Define UTF8_CORE as required.
Wrapper around mb_strlen. Assumes mb_internal_encoding is already set to UTF-8. Note: this function does not count bad bytes in the string; they are simply ignored.
string | UTF-8 string |
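To make the byte/character distinction concrete, here is a minimal sketch that calls PHP's mbstring extension directly instead of the wrapper documented above. It assumes mbstring is installed and passes the encoding explicitly rather than relying on mb_internal_encoding; the sample string and variable names are illustrative only.

```php
<?php
// Sketch only: uses mbstring directly, not the library wrapper.
// The \u{...} escapes require PHP 7.0 or later.
$text = "na\u{00EF}ve caf\u{00E9}"; // "naïve café"

$bytes = strlen($text);             // native strlen() counts bytes
$chars = mb_strlen($text, 'UTF-8'); // mb_strlen() counts characters

echo $bytes, " bytes, ", $chars, " characters\n";
```

The two accented letters occupy two bytes each, which is why the byte count is larger than the character count.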
Assumes mbstring internal encoding is set to UTF-8. Wrapper around mb_strpos. Finds the position of the first occurrence of a string.
string | haystack |
string | needle (you should validate this with utf8_is_valid) |
integer | offset in characters (from left) |
Assumes mbstring internal encoding is set to UTF-8. Wrapper around mb_strrpos. Finds the position of the last occurrence of a character in a string.
string | haystack |
string | needle (you should validate this with utf8_is_valid) |
integer | (optional) offset (from left) |
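The two position functions above report offsets in characters, not bytes. The following sketch, which calls mbstring directly and uses a made-up haystack, compares them with the native byte-oriented functions; it is an illustration under those assumptions, not the library's code.

```php
<?php
// Illustrative only: calls mbstring directly; \u{...} escapes need PHP 7+.
$haystack = "\u{00E9}t\u{00E9} \u{00E9}t\u{00E9}"; // "été été"

// Character offsets, as the wrappers documented above report them.
$firstChar = mb_strpos($haystack, 't', 0, 'UTF-8');
$lastChar  = mb_strrpos($haystack, 't', 0, 'UTF-8');

// Byte offsets from the native functions, for comparison.
$firstByte = strpos($haystack, 't');
$lastByte  = strrpos($haystack, 't');

// A missing needle yields false, so compare with === rather than ==.
$missing = mb_strpos($haystack, 'x', 0, 'UTF-8');

printf("chars: %d/%d, bytes: %d/%d\n", $firstChar, $lastChar, $firstByte, $lastByte);
var_dump($missing === false);
```

Because a match at the very start of the string returns 0, a loose comparison would confuse "found at position 0" with "not found".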
Assumes mbstring internal encoding is set to UTF-8. Wrapper around mb_substr. Returns part of a string given a character offset (and optionally a length).
string | |
integer | number of UTF-8 characters offset (from left) |
integer | (optional) length in UTF-8 characters from offset |
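A short sketch of why the offset and length are counted in UTF-8 characters: cutting by bytes can split a multi-byte character. It assumes the mbstring extension and an illustrative sample string, and calls mb_substr directly rather than the wrapper.

```php
<?php
// Sketch only: mbstring called directly; \u{...} escapes need PHP 7+.
$text = "Gr\u{00FC}\u{00DF}e aus K\u{00F6}ln"; // "Grüße aus Köln"

$byChars = mb_substr($text, 0, 5, 'UTF-8');     // first five characters
$tail    = mb_substr($text, -4, null, 'UTF-8'); // last four characters
$byBytes = substr($text, 0, 5);                 // first five bytes

// The character-based slice is still well-formed UTF-8;
// the byte-based slice ends in the middle of a two-byte character.
var_dump(mb_check_encoding($byChars, 'UTF-8'));
var_dump(mb_check_encoding($byBytes, 'UTF-8'));
```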
Assumes mbstring internal encoding is set to UTF-8. Wrapper around mb_strtolower. Makes a string lowercase. Note: The concept of a character's "case" exists only in some alphabets, such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does not exist in the Chinese script, for example. See Unicode Standard Annex #21: Case Mappings.
string |
Assumes mbstring internal encoding is set to UTF-8. Wrapper around mb_strtoupper. Makes a string uppercase. Note: The concept of a character's "case" exists only in some alphabets, such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does not exist in the Chinese script, for example. See Unicode Standard Annex #21: Case Mappings.
string |
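The note about case applying only to some scripts can be illustrated as follows. This is a minimal sketch that assumes the mbstring extension and calls it directly; the Greek and Chinese sample strings are chosen for illustration.

```php
<?php
// Sketch only: mbstring called directly; \u{...} escapes need PHP 7+.
$greekUpper = "\u{0391}\u{0398}\u{0397}\u{039D}\u{0391}"; // "ΑΘΗΝΑ"
$chinese    = "\u{4E2D}\u{6587}";                         // "中文"

// Greek has a case distinction, so the string changes.
$greekLower = mb_strtolower($greekUpper, 'UTF-8');

// Chinese characters have no case, so the string comes back unchanged.
$chineseUpper = mb_strtoupper($chinese, 'UTF-8');

echo $greekLower, "\n";
var_dump($chineseUpper === $chinese);
```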
Define UTF8_CORE as required.
Unicode-aware replacement for strlen(). Returns the number of characters in the string (not the number of bytes) by replacing multibyte characters with a single-byte equivalent: utf8_decode() converts characters that are not in ISO-8859-1 to '?', which, for the purpose of counting, is alright. It is much faster than iconv_strlen. Note: this function does not count bad UTF-8 bytes in the string.
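For readers without mbstring, the sketch below counts characters using only PCRE. Note that it deliberately swaps in a different technique from the utf8_decode() approach described above (utf8_decode() is deprecated as of PHP 8.2): it counts every byte that is not a UTF-8 continuation byte. The function name count_utf8_chars is hypothetical, and on malformed input the result may differ from the library's.

```php
<?php
// Hypothetical helper, not part of the library.
function count_utf8_chars(string $s): int
{
    // Continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF);
    // every other byte starts a new character, so counting those
    // bytes gives the character count for well-formed UTF-8.
    return (int) preg_match_all('/[^\x80-\xBF]/', $s);
}

echo count_utf8_chars("na\u{00EF}ve caf\u{00E9}"), "\n"; // "naïve café"
```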
if ( utf8_is_ascii($someString) ) {
    // It's just ASCII - use the native PHP version
    $someString = strtolower($someString);
} else {
    $someString = utf8_strtolower($someString);
}
string | |
Returns: boolean | TRUE if it's all ASCII |
See also: utf8_is_ascii_ctrl
Tests whether a string contains only 7-bit ASCII bytes, with device control codes omitted. The device control codes are listed in the second table at http://www.w3schools.com/tags/ref_ascii.asp
string | |
Returns: boolean | TRUE if it's all ASCII without device control codes |
See also: utf8_is_ascii
Strips out all non-7-bit ASCII bytes. If you need to transmit a string to a system which you know can only support 7-bit ASCII, you could use this function.
string | |
Returns: string | with non-ASCII bytes removed |
See also: utf8_strip_non_ascii_ctrl
Strips out all non-7-bit ASCII bytes and ASCII device control codes. For a list of ASCII device control codes, see the second table at http://www.w3schools.com/tags/ref_ascii.asp
string | |
Returns: string | with non-ASCII bytes and device control codes removed |
Replaces accented UTF-8 characters with unaccented ASCII-7 "equivalents". The purpose of this function is to replace characters commonly found in Latin alphabets with something more or less equivalent from the ASCII range. This can be useful for converting a UTF-8 string to something ready for a filename, for example. Following the use of this function, you would probably also pass the string through utf8_strip_non_ascii to clean out any other non-ASCII chars. Use the optional parameter to deaccent just lowercase ($case = -1) or uppercase ($case = 1) letters. The default is to deaccent both cases ($case = 0). For a more complete implementation of transliteration, see the utf8_to_ascii package available from the phputf8 project downloads: http://prdownloads.sourceforge.net/phputf8
string | UTF-8 string |
int | (optional) -1 lowercase only, +1 uppercase only, 0 both cases |
Returns: string | UTF-8 string with accented characters replaced by ASCII equivalents |
Author: Andreas Gohr andi@splitbrain.org
Tools for locating / replacing bad bytes in UTF-8 strings. The Original Code is Mozilla Communicator client code. The Initial Developer of the Original Code is Netscape Communications Corporation.
Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer. All Rights Reserved. Ported to PHP by Henri Sivonen (http://hsivonen.iki.fi). Slight modifications to fit with phputf8 library by Harry Fuecks (hfuecks gmail com).
See also: http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUTF8ToUnicode.cpp
See also: http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUnicodeToUTF8.cpp
See also: http://hsivonen.iki.fi/php-utf8/
See also: utf8_is_valid
Locates the first bad byte in a UTF-8 string, returning its byte index in the string. PCRE pattern to locate bad bytes in a UTF-8 string. Comes from W3 FAQ: Multilingual Forms. Note: modified to include full ASCII range including control chars.
See also: http://www.w3.org/International/questions/qa-forms-utf-8
string | |
Returns: mixed | integer byte index or FALSE if no bad byte found |
Locates all bad bytes in a UTF-8 string and returns a list of their byte indexes in the string. PCRE pattern to locate bad bytes in a UTF-8 string. Comes from W3 FAQ: Multilingual Forms. Note: modified to include full ASCII range including control chars.
See also: http://www.w3.org/International/questions/qa-forms-utf-8
string | |
Returns: mixed | array of integers or FALSE if no bad byte found |
Strips out any bad bytes from a UTF-8 string and returns the rest. PCRE pattern to locate bad bytes in a UTF-8 string. Comes from W3 FAQ: Multilingual Forms. Note: modified to include full ASCII range including control chars.
See also: http://www.w3.org/International/questions/qa-forms-utf-8
string | |
Returns: string | |
Replaces bad bytes with an alternative character. An ASCII character is recommended as the replacement char. PCRE pattern to locate bad bytes in a UTF-8 string. Comes from W3 FAQ: Multilingual Forms. Note: modified to include full ASCII range including control chars.
See also: http://www.w3.org/International/questions/qa-forms-utf-8
string | to search |
string | to replace bad bytes with (defaults to '?') - use ASCII |
Returns: string | |
Return code from utf8_bad_identify() when a five-octet sequence is detected.
Note: 5-octet sequences are valid UTF-8 but are not supported by Unicode, so they do not represent a useful character.
See also: utf8_bad_identify
Return code from utf8_bad_identify() when a six-octet sequence is detected. Note: 6-octet sequences are valid UTF-8 but are not supported by Unicode, so they do not represent a useful character.
See also: utf8_bad_identify
Return code from utf8_bad_identify(). Invalid octet for use as start of multi-byte UTF-8 sequence.
See also: utf8_bad_identify
Return code from utf8_bad_identify(). From Unicode 3.1, non-shortest form is illegal.
See also: utf8_bad_identify
Return code from utf8_bad_identify(). From Unicode 3.2, surrogate characters are illegal.
See also: utf8_bad_identify
Return code from utf8_bad_identify(). Codepoints outside the Unicode range are illegal.
See also: utf8_bad_identify
Return code from utf8_bad_identify(). Incomplete multi-octet sequence. Note: this is kind of a "catch-all".
See also: utf8_bad_identify
Reports on the type of bad byte found in a UTF-8 string. Returns a status code on the first bad byte found. Joomla modification - As of PHP 7.4, curly brace access has been deprecated. As a result, this function has been modified to use square brace syntax. See https://github.com/php/php-src/commit/d574df63dc375f5fc9202ce5afde23f866b6450a for additional references.
Author: hsivonen@iki.fi
string | UTF-8 encoded string |
Returns: mixed | integer constant describing problem or FALSE if valid UTF-8 |
See also: utf8_bad_explain
See also: http://hsivonen.iki.fi/php-utf8/
Takes a return code from utf8_bad_identify() and returns a message (in English) explaining what the problem is.
int | return code from utf8_bad_identify |
Returns: mixed | string message or FALSE if return code unknown |
See also: utf8_bad_identify
PCRE regular expressions for UTF-8. Note: this file is not actually used by the rest of the library, but these regular expressions can be useful to have available.
See also: http://www.w3.org/International/questions/qa-forms-utf-8
PCRE pattern to check a UTF-8 string is valid. Comes from W3 FAQ: Multilingual Forms. Note: modified to include full ASCII range including control chars.
See also: http://www.w3.org/International/questions/qa-forms-utf-8
PCRE pattern to match single UTF-8 characters. Comes from W3 FAQ: Multilingual Forms. Note: modified to include full ASCII range including control chars.
See also: http://www.w3.org/International/questions/qa-forms-utf-8
PCRE pattern to locate bad bytes in a UTF-8 string. Comes from W3 FAQ: Multilingual Forms. Note: modified to include full ASCII range including control chars.
See also: http://www.w3.org/International/questions/qa-forms-utf-8
Locates a byte index given a UTF-8 character index. Given a string and a character index in the string, in terms of the UTF-8 character position, returns the byte index of that character. Can be useful when you want to use PHP's native string functions, but be warned: locating the byte can be expensive. Takes a variable number of parameters: the first must be the search string, followed by 1 to n UTF-8 character positions to obtain byte indexes for. It is more efficient to search the string for multiple characters at once than to make repeated calls to this function.
Author: Chris Smith chris@jalakai.co.uk
string | string to locate index in |
int | (n times) |
Returns: mixed | int if only one input int, array if more |
Given a string and any byte index, returns the byte index of the start of the current UTF-8 character, relative to the supplied position. If the current character begins at the same place as the supplied byte index, that byte index will be returned.
Otherwise, this function will step backwards, looking for the index where the current UTF-8 character begins.
Author: Chris Smith chris@jalakai.co.uk
string | |
int | byte index in the string |
Returns: int | byte index of start of current UTF-8 character |
Given a string and any byte index, returns the byte index of the start of the next UTF-8 character, relative to the supplied position. If the next character begins at the same place as the supplied byte index, that byte index will be returned.
Author: Chris Smith chris@jalakai.co.uk
string | |
int | byte index in the string |
Returns: int | byte index of start of next UTF-8 character |
Utilities for processing "special" characters in UTF-8. "Special" largely means anything which would be regarded as a non-word character, like ASCII control characters and punctuation. This has a "Roman" bias - it would be unaware of modern Chinese "punctuation" characters, for example. Note: requires utils/unicode.php to be loaded.
See also: utf8_is_valid
Used internally. Builds a PCRE pattern from the $UTF8_SPECIAL_CHARS array defined in this file. The $UTF8_SPECIAL_CHARS array should contain all special characters (non-letter/non-digit) defined in the various local charsets - it's not a complete list of non-alphanumeric characters in UTF-8. It's not perfect but should match most cases of special chars. This function adds the control chars 0x00 to 0x19 to the array of special chars (they are not included in $UTF8_SPECIAL_CHARS).
Returns: string | |
See also: utf8_from_unicode, utf8_is_word_chars, utf8_strip_specials
Checks a string for whether it contains only word characters. This is logically equivalent to the \w PCRE meta character. Note that this is not a 100% guarantee that the string only contains alphanumeric characters, but just that common non-alphanumeric characters are not in the string, including ASCII device control characters.
string | to check |
Returns: boolean | TRUE if the string only contains word characters |
See also: utf8_specials_pattern
Removes special characters (non-alphanumeric) from a UTF-8 string. This can be useful as a helper for sanitizing a string for use as something like a file name or a unique identifier. Be warned, though: it does not handle all possible non-alphanumeric characters and is not intended as some kind of security / injection filter.
Author: Andreas Gohr andi@splitbrain.org
string | $string The UTF8 string to strip of special chars |
string | (optional) $repl Replace special with this string |
Returns: string | with common non-alphanumeric characters removed |
See also: utf8_specials_pattern
Tools for conversion between UTF-8 and unicode. The Original Code is Mozilla Communicator client code. The Initial Developer of the Original Code is Netscape Communications Corporation. Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer. All Rights Reserved. Ported to PHP by Henri Sivonen (http://hsivonen.iki.fi). Slight modifications to fit with phputf8 library by Harry Fuecks (hfuecks gmail com).
See also: http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUTF8ToUnicode.cpp
See also: http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUnicodeToUTF8.cpp
See also: http://hsivonen.iki.fi/php-utf8/
Takes a UTF-8 string and returns an array of ints representing the Unicode characters. Astral planes are supported, i.e. the ints in the output can be > 0xFFFF. Occurrences of the BOM are ignored. Surrogates are not allowed. Returns false if the input string isn't a valid UTF-8 octet sequence and raises a PHP error at level E_USER_WARNING. Note: this function has been modified slightly in this library to trigger errors on encountering bad bytes. Joomla modification - As of PHP 7.4, curly brace access has been deprecated.
As a result, this function has been modified to use square brace syntax. See https://github.com/php/php-src/commit/d574df63dc375f5fc9202ce5afde23f866b6450a for additional references.
Author: hsivonen@iki.fi
string | UTF-8 encoded string |
Returns: mixed | array of unicode code points or FALSE if UTF-8 invalid |
See also: utf8_from_unicode
See also: http://hsivonen.iki.fi/php-utf8/
Takes an array of ints representing the Unicode characters and returns a UTF-8 string. Astral planes are supported, i.e. the ints in the input can be > 0xFFFF. Occurrences of the BOM are ignored. Surrogates are not allowed. Returns false if the input array contains ints that represent surrogates or are outside the Unicode range, and raises a PHP error at level E_USER_WARNING. Note: this function has been modified slightly in this library to use output buffering to concatenate the UTF-8 string (faster), as well as to reference the array by its keys.
array | of unicode code points representing a string |
Returns: mixed | UTF-8 string or FALSE if array contains invalid code points |
Author: hsivonen@iki.fi
See also: utf8_to_unicode
See also: http://hsivonen.iki.fi/php-utf8/
Tools for validating that a UTF-8 string is well formed. The Original Code is Mozilla Communicator client code. The Initial Developer of the Original Code is Netscape Communications Corporation. Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer. All Rights Reserved.
Ported to PHP by Henri Sivonen (http://hsivonen.iki.fi). Slight modifications to fit with phputf8 library by Harry Fuecks (hfuecks gmail com).
See also: http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUTF8ToUnicode.cpp
See also: http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUnicodeToUTF8.cpp
See also: http://hsivonen.iki.fi/php-utf8/
Tests a string as to whether it's valid UTF-8 and supported by the Unicode standard. Note: this function has been modified to simply return true or false.
Author: hsivonen@iki.fi
string | UTF-8 encoded string |
Returns: boolean | true if valid |
See also: http://hsivonen.iki.fi/php-utf8/
See also: utf8_compliant
Tests whether a string complies as UTF-8. This will be much faster than utf8_is_valid but will pass five- and six-octet UTF-8 sequences, which are not supported by Unicode and so cannot be displayed correctly in a browser. In other words, it is not as strict as utf8_is_valid, but it's faster. If you use it to validate user input, you place yourself at the risk that attackers will be able to inject 5- and 6-byte sequences (which may or may not be a significant risk, depending on what you are doing).
See also: utf8_is_valid
See also: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php#54805
string | UTF-8 string to check |
Returns: boolean | TRUE if string is valid UTF-8 |
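
The fast compliance check described above relies on the PCRE u modifier. The sketch below shows one way such a check might look; the function name looks_like_utf8 is hypothetical and this is not presented as the library's implementation. Which sequences PCRE accepts depends on the PCRE version bundled with PHP, so treat the strictness as an assumption to verify on your own installation.

```php
<?php
// Hypothetical helper in the spirit of the check described above.
function looks_like_utf8(string $s): bool
{
    // With the /u modifier, PCRE validates the subject as UTF-8 before
    // matching; an invalid subject makes preg_match() return false
    // instead of 1. The empty pattern matches any valid subject.
    return preg_match('//u', $s) === 1;
}

var_dump(looks_like_utf8("caf\u{00E9}")); // complete two-byte sequence
var_dump(looks_like_utf8("caf\xC3"));     // truncated two-byte sequence
var_dump(looks_like_utf8(""));            // empty string
```

For user input where stricter guarantees matter, the trade-off described above still applies: a quick check like this is a first filter, not a replacement for full validation.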