UTF-8 CPP - A simple, portable generic library for handling UTF-8 encoded strings.

Many C++ developers miss an easy and portable way of handling Unicode-encoded strings. The original C++ Standard (known as C++98 or C++03) is Unicode-agnostic, and while some work is being done to introduce Unicode support into the next incarnation, called C++0x, nothing of the sort is available for the moment. In the meantime, developers use third-party libraries like ICU, OS-specific capabilities, or simply roll their own solutions.

In order to easily handle UTF-8 encoded Unicode strings, I came up with a small generic library. Anybody used to working with STL algorithms and iterators should find it easy and natural to use. The code is freely available for any purpose; check out the license at the beginning of the utf8.h file. If you run into bugs or performance issues, please let me know and I'll do my best to address them.

The purpose of this article is not to offer an introduction to Unicode in general, or to UTF-8 in particular. If you are not familiar with Unicode, be sure to check out the Unicode Home Page or some other source of information on Unicode. Also, it is not my aim to advocate the use of UTF-8 encoded strings in C++ programs; if you want to handle UTF-8 encoded strings from C++, I am sure you have good reasons for it.

Operating Systems

  • All platforms that support ANSI C++ and PThreads
  • Platform-independent

Compilers

  • Visual C++
  • GCC
  • Any standard C++ compiler

Added: 2010-05-14
Amended: 2010-05-14
Licensing: Boost

  • SourceForge home page
  • Submitted by: noname
