Unicode at gigabytes per second
We often represent text using Unicode formats (UTF-8 and UTF-16). UTF-8 is increasingly popular (XML, HTML, JSON, Rust, Go, Swift, Ruby). UTF-16 is most common in Java, .NET, and inside operating systems such as Windows. Software systems frequently have to validate text or convert it from one encoding to the other. While recent disks have bandwidths of 5 GB/s or more, conventional approaches transcode non-ASCII text at a fraction of a gigabyte per second. We show that we can transcode between UTF-8 and UTF-16 at gigabytes per second on current systems (x64 and ARM) without sacrificing safety. Our open-source library can be ten times faster than the popular ICU library on non-ASCII strings and even faster on ASCII strings.
Invited talk at SPIRE 2021, 28th International Symposium on String Processing and Information Retrieval (October 4-6th, 2021 - Lille, France)
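
To make the task concrete, here is a minimal sketch of what "validate, then transcode" means for UTF-8 input and UTF-16 output. It uses only Python's built-in codecs and is not the library presented in the talk. The function names `is_valid_utf8` and `utf8_to_utf16le` are hypothetical, chosen for illustration, and the sketch says nothing about speed.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 (hypothetical helper)."""
    try:
        # Strict decoding rejects overlong forms, stray continuation
        # bytes, truncated sequences, and encoded surrogates.
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def utf8_to_utf16le(data: bytes) -> bytes:
    """Validate UTF-8 input and transcode it to UTF-16 little-endian.

    Hypothetical helper: raises UnicodeDecodeError on invalid input
    rather than emitting replacement characters, so malformed bytes
    never reach the output. No byte-order mark is written.
    """
    text = data.decode("utf-8", errors="strict")  # validation step
    return text.encode("utf-16-le")               # transcoding step


if __name__ == "__main__":
    sample = "Lille, été 2021 😀".encode("utf-8")
    print(is_valid_utf8(sample))            # well-formed input
    print(is_valid_utf8(b"\xc0\x80"))       # overlong encoding of NUL
    print(utf8_to_utf16le(sample).hex(" "))
```

Rejecting malformed input up front is one way to read "without sacrificing safety": the transcoder never silently passes invalid bytes through.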