What is Unicode? | Sajad Torkamani

In a nutshell

Unicode (a.k.a. the Unicode Standard) is a character encoding standard maintained by the Unicode Consortium that assigns a unique code point to each of roughly 149,000 characters. It covers 161 modern and historic scripts, as well as symbols, emoji, and non-printing control and formatting characters, such as the ones associated with the Enter and Delete keys.

Here are some examples of character to code point mappings:

Character	Code point
A	U+0041
B	U+0042
a	U+0061
b	U+0062
Ā	U+0100
[	U+005B
ช	U+0E0A
`DEL`	U+007F
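
The U+ notation in the table is the code point written in hexadecimal and padded to at least four digits. As a minimal JavaScript sketch of that formatting (the helper name toCodePointLabel is made up for illustration and is not part of any library):

```javascript
// Hypothetical helper: formats the first code point of a string in U+XXXX notation.
function toCodePointLabel(char) {
  const codePoint = char.codePointAt(0)
  // Hexadecimal, upper case, left-padded with zeros to at least four digits.
  return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0')
}

console.log(toCodePointLabel('A')) // U+0041
console.log(toCodePointLabel('ช')) // U+0E0A
```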

You can use this website to browse through all the Unicode characters. For example, here's the Unicode info for the Latin character “a”.

Get Unicode code point for a character using JavaScript

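Most programming languages let you convert between characters and Unicode code points. For example, JavaScript’s String.prototype.charCodeAt() method returns the UTF-16 code unit at a given index. For characters in the Basic Multilingual Plane (U+0000 to U+FFFF, surrogates aside), that code unit is the same number as the code point:

'a'.charCodeAt(0) // 97
'b'.charCodeAt(0) // 98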

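// Returns the UTF-16 code unit of each element of the string.
// Characters above U+FFFF (e.g. most emoji) show up as two code units.
function strToUnicode(str) {
  return str.split('').map(char => char.charCodeAt(0))
}

strToUnicode('hello') // [104, 101, 108, 108, 111]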
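
Because charCodeAt() works on UTF-16 code units, it returns only half of a surrogate pair for characters above U+FFFF. A possible code-point-aware variant, sketched here with the built-in String.prototype.codePointAt() and String.fromCodePoint() (the helper names strToCodePoints and codePointsToStr are hypothetical):

```javascript
// Hypothetical helper: returns one code point per character,
// including characters outside the Basic Multilingual Plane.
function strToCodePoints(str) {
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  return Array.from(str, char => char.codePointAt(0))
}

// Hypothetical helper for the inverse direction: code points back to a string.
function codePointsToStr(codePoints) {
  return String.fromCodePoint(...codePoints)
}

const points = strToCodePoints('hi😀')
console.log(points)                  // [104, 105, 128512]
console.log('😀'.charCodeAt(0))      // 55357 (the high surrogate only)
console.log(codePointsToStr(points)) // hi😀
```

Iterating with Array.from rather than split('') is the key design choice here: split('') cuts the string between the two halves of a surrogate pair, while the string iterator keeps them together.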

Other notes

Unicode is implemented in most modern operating systems and programming languages.
TODO: What’s the difference between UTF-8, UTF-16, UTF-32?

Tagged: Computing
