The Unicode Profiling Project was designed to gather statistics on Unicode support across systems. The software checks each symbol in your system's Unicode catalog (65,535 glyphs) to see which are visible on your computer, using <canvas> and JavaScript.
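As a rough illustration of how such a check could work (a minimal sketch, not the project's actual code): draw each character on a small <canvas>, serialize the pixels, and treat the character as visible only if its pixels differ from both an empty rendering and the rendering of a code point assumed to be unsupported. The names `makeCanvasRenderer` and `scanVisible`, the 16-pixel canvas, the font string, and the choice of U+FFFF as the "missing" reference are all assumptions made for this example.

```javascript
// Hypothetical helper (browser only): draws one character on a small canvas
// and returns its pixel data as a string that can be compared for equality.
function makeCanvasRenderer(size = 16, font = "12px sans-serif") {
  const canvas = document.createElement("canvas");
  canvas.width = size;
  canvas.height = size;
  const ctx = canvas.getContext("2d");
  ctx.font = font;
  ctx.textBaseline = "top";
  return function render(codePoint) {
    ctx.clearRect(0, 0, size, size);
    ctx.fillText(String.fromCodePoint(codePoint), 0, 0);
    return ctx.getImageData(0, 0, size, size).data.join(",");
  };
}

// Scans the code points first..last and returns a string with one digit per
// code point: "1" if it appears to render, "0" otherwise.
// Assumption: U+FFFF (a noncharacter) renders as the "missing glyph" symbol
// and U+0020 (space) renders as nothing.
function scanVisible(render, first, last, missingCodePoint = 0xffff) {
  const missing = render(missingCodePoint);
  const blank = render(0x20);
  let bits = "";
  for (let cp = first; cp <= last; cp++) {
    const pixels = render(cp);
    bits += pixels !== missing && pixels !== blank ? "1" : "0";
  }
  return bits;
}
```

In a browser, `scanVisible(makeCanvasRenderer(), 0x41, 0x5a)` would scan the letters A to Z. Because `scanVisible` takes the renderer as a parameter, the comparison logic can also be exercised outside a browser with a stand-in function. Note that a single reference glyph is a simplification: it will miscount wherever a browser draws more than one kind of "missing" symbol.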
The data generated from your computer will help profile the state of Unicode support on the web. Your computer's Unicode support, remote address, user agent and processing time will be submitted to the server upon completion of the test. A statistical analysis of the data will be published, but no specific information about your computer will be.
The code behind this project is an extension of isFontSupported (font detection in <canvas>). As with isFontSupported, the code behind the Unicode Profiling Project is released under CC0, so feel free to use it in creative ways in your own projects.
Once the test is initiated you’ll be able to watch the glyphs as they’re scanned, along with the name of the Unicode block each one belongs to. It typically takes over a minute to scan the entire collection of Unicode characters. This is what the acid test looks like while being processed:
Once processing has completed you will be presented with a binary string (65,535 digits) showing which characters are visible and which are unavailable or invisible. Here are the results from my Chromium browser running on OS X 10.6.4:
Now for the fun part: click on “Show Available”. This may take a few seconds, as you’re referencing tens of thousands of Unicode characters at once:
The project fully supports Chrome, Firefox, Safari, and Opera. Internet Explorer produces some false positives because it draws a unique “missing symbol” for every Unicode block.
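The binary result string described earlier lends itself to simple post-processing, such as counting the visible glyphs or listing them. The sketch below assumes the string holds one digit per code point ("1" for visible, "0" otherwise) and that consecutive digits describe consecutive code points starting at `firstCodePoint`. That starting offset, its default of 1, and both function names are guesses made for illustration, not details confirmed by the project.

```javascript
// Counts the "1" digits, i.e. the code points reported as visible.
function countVisible(bits) {
  let count = 0;
  for (const digit of bits) {
    if (digit === "1") count++;
  }
  return count;
}

// Returns the characters whose digit is "1".
// Assumption: bits[0] describes firstCodePoint, bits[1] the next one, etc.
function availableCharacters(bits, firstCodePoint = 1) {
  const chars = [];
  for (let i = 0; i < bits.length; i++) {
    if (bits[i] === "1") chars.push(String.fromCodePoint(firstCodePoint + i));
  }
  return chars;
}

console.log(countVisible("1011"));                       // 3
console.log(availableCharacters("1011", 0x41).join("")); // "ACD"
```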
Results on my Mac:
- Safari: 49,493 visible glyphs
- Firefox 3.6: 49,428 visible glyphs
  NOTE: Each undefined symbol has a unique hash unless text size is <=11
- Google Chrome 7.0: 49,493 visible glyphs
- Chromium 8.0: 49,492 visible glyphs
- Opera 10.6: 47,672 visible glyphs
  NOTE: Supports different fonts in <canvas> than regular DOM
Results on my Windows machine:
- IE 9.0: 50,826 visible glyphs
  NOTE: Some false positives, since each range has its own undefined symbol.
- Firefox 3.6: 51,208 visible glyphs
  NOTE: Each undefined symbol has a unique hash unless text size is <=10
- Google Chrome 7.0: 47,267 visible glyphs
  NOTE: Textarea can have different Unicode support than Div in some cases. For instance, on my computer ﰿ works in Textarea, but not in Div.
- Opera 10.6: 56,024 visible glyphs
  NOTE: Supports different Unicode characters in Canvas than in Div and Textarea. Also, Opera supports far more Unicode characters than the other browsers, possibly included in the package?
Further Research:
http://unicode.org/
http://en.wikipedia.org/wiki/Unicode
http://www.fileformat.info/info/unicode/






Aug 29, 2011 @ 15:04:55
Thank you, thank you, thank you! You have shed so much light on a murky subject.
Sep 07, 2011 @ 19:58:51
I don’t quite understand the claim that the systems UTF-8 catalog is 65,535 glyphs. It’s not. UTF-8 is a byte encoding scheme that has nothing to do with how many glyphs exist. That’s the Unicode specification’s job. Prior to 1996 there were only 65,535 codepoints, which was quickly fixed with Unicode 2.0, which accommodated 1,114,112 codepoints instead, because you can’t even fit CJK in 65,535 characters. Since 1996, not even the assigned codepoints fit in 65,535 places (http://babelstone.blogspot.com/2005/11/how-many-unicode-characters-are-there.html has a nice up-to-date summary of the number of available and assigned codepoints per version of Unicode up to version 6). The current standard has 109,242 visible glyphs, which would be useful for building your matrix.
But quite importantly, remember that UTF-8 is NOT the same as Unicode. It’s just a convenient byte encoding scheme because it’s not a fixed number of bytes to represent a number. Unicode is the milk. UTF-8 is just one of the many brands of cups you can pour it into.
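The distinction is easy to check in a few lines. The sketch below is an illustration only (`utf8Length` is a made-up helper name). It uses the standard `TextEncoder` to show that a codepoint's number and the bytes UTF-8 spends on it are separate things, and it reproduces the 1,114,112 figure as 17 planes of 65,536 codepoints each.

```javascript
// Number of bytes UTF-8 needs to encode a single code point.
function utf8Length(codePoint) {
  return new TextEncoder().encode(String.fromCodePoint(codePoint)).length;
}

console.log(utf8Length(0x41));    // 1  (U+0041, "A")
console.log(utf8Length(0xe9));    // 2  (U+00E9, "é")
console.log(utf8Length(0x20ac));  // 3  (U+20AC, "€")
console.log(utf8Length(0x1f600)); // 4  (U+1F600, beyond the first 65,536)

// The code space is 17 planes of 65,536 code points: U+0000 to U+10FFFF.
console.log(17 * 65536);   // 1114112
console.log(0x10ffff + 1); // 1114112
```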
Sep 13, 2011 @ 04:42:00
Thank you for your corrections!
I’ll make the modifications to the script when some free time is available.