sajad torkamani

In a nutshell

Unicode (a.k.a. the Unicode Standard) is a text encoding standard maintained by the Unicode Consortium that assigns a unique code point to each of roughly 149,000 characters. It covers 161 modern and historic scripts, as well as symbols, emoji, and non-printing control and formatting characters such as line feed (U+000A) and delete (U+007F).

Here are some examples of character to code point mappings:

Character   Code point
A           U+0041
B           U+0042
a           U+0061
b           U+0062
Ā           U+0100
[           U+005B
ช           U+0E0A
DEL         U+007F
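The U+XXXX notation in the table is just the code point written in hexadecimal, padded to at least four digits. A minimal sketch of that formatting follows; the helper name toCodePointNotation is made up for illustration and is not part of any library:

```javascript
// Hypothetical helper: format the first code point of a string as U+XXXX.
function toCodePointNotation(char) {
  const codePoint = char.codePointAt(0);
  // Convert to uppercase hex and left-pad to at least four digits.
  return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0');
}

console.log(toCodePointNotation('A'));      // U+0041
console.log(toCodePointNotation('\u0100')); // U+0100 (Ā)
console.log(toCodePointNotation('['));      // U+005B
console.log(toCodePointNotation('\u007F')); // U+007F (DEL)
```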

You can use this website to browse through all the Unicode characters. For example, here’s the Unicode info for the Latin character “a”.

Get Unicode code point for a character using JavaScript

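Most programming languages let you convert between characters and Unicode code points. For example, JavaScript’s String.prototype.charCodeAt() method returns the UTF-16 code unit at a given index, which is the same number as the code point for any character in the Basic Multilingual Plane (U+0000 to U+FFFF):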

'a'.charCodeAt(0) // 97
'b'.charCodeAt(0) // 98

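// Array.from iterates by code point rather than by UTF-16 code unit,
// so characters outside the BMP (such as emoji) are not split in two.
function strToUnicode(str) {
  return Array.from(str, char => char.codePointAt(0))
}

strToUnicode('hello') // [104, 101, 108, 108, 111]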
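For characters above U+FFFF, charCodeAt() only sees half of a surrogate pair, so String.prototype.codePointAt() is the safer choice when you want the actual code point. The sketch below illustrates the difference and uses String.fromCodePoint() to go back from numbers to text; the helper name codePointsToStr is made up for illustration:

```javascript
const emoji = '\u{1F600}'; // grinning face, outside the BMP

// UTF-16 stores this character as two code units (a surrogate pair).
console.log(emoji.length);         // 2
// charCodeAt() returns only the first code unit (the high surrogate).
console.log(emoji.charCodeAt(0));  // 55357 (0xD83D)
// codePointAt() combines the pair into the full code point.
console.log(emoji.codePointAt(0)); // 128512 (0x1F600)

// Hypothetical helper: turn an array of code points back into a string.
function codePointsToStr(codePoints) {
  return String.fromCodePoint(...codePoints);
}

console.log(codePointsToStr([104, 101, 108, 108, 111])); // hello
console.log(codePointsToStr([0x1F600]) === emoji);       // true
```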

