What is Unicode?
6 November 2022 (Updated 6 November 2022)
In a nutshell
Unicode (a.k.a. the Unicode Standard) is a character encoding standard maintained by the Unicode Consortium that assigns a unique code point to each of ~149,000 characters (as of version 15.0). It covers 161 modern and historic scripts, as well as symbols, emoji, and non-visual control and formatting codes such as carriage return and delete.
Here are some examples of character to code point mappings:
Character | Code point |
A | U+0041 |
B | U+0042 |
a | U+0061 |
b | U+0062 |
Ā | U+0100 |
[ | U+005B |
ช | U+0E0A |
DEL | U+007F |
You can use this website to browse through all the Unicode characters. For example, here’s the Unicode info for the Latin character “a”.
Get Unicode code point for a character using JavaScript
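Most programming languages let you convert between characters and Unicode code points. For example, JavaScript’s String.prototype.charCodeAt() method returns the UTF-16 code unit at a given index. For characters in the Basic Multilingual Plane (U+0000 to U+FFFF), which covers every example here, that value equals the code point: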
'a'.charCodeAt(0) // 97
'b'.charCodeAt(0) // 98
// Map each UTF-16 code unit in a string to its numeric value.
function strToUnicode(str) {
  return str.split('').map(char => char.charCodeAt(0))
}
strToUnicode('hello') // [104, 101, 108, 108, 111]
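The examples above work because every character involved fits in a single UTF-16 code unit. For characters outside that range, such as most emoji, charCodeAt() returns only half of a surrogate pair. The following is a minimal sketch of one way to handle that, using the built-in String.prototype.codePointAt() and String.fromCodePoint(). The helper names (toCodePointLabel, fromCodePointLabel, strToCodePoints) are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helpers for illustration; only codePointAt, fromCodePoint,
// and code-point-aware string iteration are built into JavaScript.

// Format the code point of the first character as "U+XXXX".
function toCodePointLabel(char) {
  const codePoint = char.codePointAt(0)
  return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0')
}

// Convert a "U+XXXX" label back to its character.
function fromCodePointLabel(label) {
  return String.fromCodePoint(parseInt(label.slice(2), 16))
}

// Array.from iterates a string by code point, not by UTF-16 code unit.
function strToCodePoints(str) {
  return Array.from(str, char => char.codePointAt(0))
}

toCodePointLabel('a')        // 'U+0061'
toCodePointLabel('ช')        // 'U+0E0A'
fromCodePointLabel('U+0100') // 'Ā'

'🙏'.charCodeAt(0)           // 55357 (high surrogate only)
'🙏'.codePointAt(0)          // 128591 (U+1F64F)
strToCodePoints('hi🙏')      // [104, 105, 128591]
```

Iterating with Array.from rather than split('') keeps a surrogate pair together, so each element of the result corresponds to one code point.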
Other notes
- Unicode is implemented in most modern operating systems and programming languages.
- TODO: What’s the difference between UTF-8, UTF-16, UTF-32?