One of the challenges in working with Oracle and Globalization is in identifying how much space oracle requires to store a particular alphabet,especially when the database uses a Unicode Character Set. The Unicode DB Characterset AL32UTF8 is a multi byte characterset. For an ASCII character, oracle(with AL32UTF8) need only 1 byte, where as for a Japanese character (hiragana,katakana or kanji), it would require either 3 or 4 bytes. Most of the Japanese characters are 3 bytes, but as more number of kanaji’s are accepeted for daily use, it becomes evident that oracle would require more bytes to store them.
To see how much space a character occupies, one can use the oracle vsize(‘<string>’) or dump(‘string’) functions. Both these functions gives the size in bytes.
Let’s see how much space(in bytes) Oracle Db with AL32UTF8 characterset requires tp store the word ‘tokyo’ in English and Japanese (Hiragana, Full Width Katakana, Half Width Katakana and Kanji).
select vsize(‘tokyo’) from dual;
select dump(‘tokyo’) from dual;
Typ=96 Len=5: 116,111,107,121,111
select vsize(‘とうきょう’) from dual;
select dump(‘とうきょう’) from dual;
Typ=96 Len=15: 227,129,168,227,129,134,227,129,141,227,130,135,227,129,134
Japanese-Full Width Katakana
select vsize(‘トウキョウ’) from dual;
select dump(‘トウキョウ’) from dual;
Typ=96 Len=15: 227,131,136,227,130,166,227,130,173,227,131,167,227,130,166
Japanese-Half Width Katakana
select vsize(‘ﾄｳｷｮｳ’) from dual;
select dump(‘ﾄｳｷｮｳ’) from dual;
Typ=96 Len=15: 239,190,132,239,189,179,239,189,183,239,189,174,239,189,179
select vsize(‘東京’) from dual;
select dump(‘東京’) from dual;
Typ=96 Len=6: 230,157,177,228,186,172