How to Use Basic String and Unicode String in Modern C++
In programming, text strings are among the most used variable types, and they are often essential for storing and retrieving valuable data. It is important to store your data safely, in the correct language and localization. Most programming languages have had to deal with problems when storing text written in many languages and alphabets. In C++, there is a very old, well-known string type (arrays of chars) and the modern std::basic_string types such as std::string and std::wstring. In addition to these modern string types, C++ Builder has another amazing string feature, UnicodeString. In this post, we explain what a basic string and UnicodeString are in modern C++ and how to use them.
What are the string types in C++?
In general, there are three kinds of string declarations in C++:
- Array of chars (see Fundamental Types): a char occupies 1 byte (8 bits on all mainstream platforms), so it can hold 256 different values; the ASCII character set itself uses only the values 0 to 127.
- Basic String (std::basic_string): the basic_string (std::basic_string and std::pmr::basic_string) is a class template that stores and manipulates sequences of characters (char, wchar_t, …). A basic string is used to define the string, wstring, u8string, u16string and u32string data types.
- String or UnicodeString: UnicodeString is the default String type of RAD Studio, C++ Builder and Delphi. It uses the UTF-16 format, which means each character takes 2 or 4 bytes (one or two 16-bit code units). In C++ Builder and Delphi, the Char and PChar types are now WideChar and PWideChar, respectively. There is a good article about Unicode in RAD Studio.
In addition, C++ Builder and Delphi used some older string types before:
- AnsiString: previously, String was an alias for AnsiString. In RAD Studio, C++ Builder and Delphi, the format of AnsiString has changed: CodePage and ElemSize fields have been added. This makes the format of AnsiString identical to the format of the new UnicodeString.
- WideString: WideString was previously used for Unicode character data. Its format is essentially the same as the Windows BSTR, and WideString is still appropriate for use in COM applications.
What is basic_string?
The basic_string (std::basic_string and std::pmr::basic_string) is a class template that stores and manipulates sequences of characters (char, wchar_t, …). For example, std::string and std::wstring are data types defined from std::basic_string. In other words, basic_string is not a single string type. It is a template from which string types for different character types are generated. A basic string is used to define the string, wstring, u8string, u16string and u32string data types.
The basic_string class template depends neither on the character type nor on the nature of operations on that type. The definitions of the operations are supplied via the Traits template parameter, which is a specialization of std::char_traits or a compatible traits class. A basic_string stores its elements contiguously.
The standard library provides several typedefs of basic_string for common character types:
String Type | Basic String Definition | Standard |
std::string | std::basic_string<char> | |
std::wstring | std::basic_string<wchar_t> | |
std::u8string | std::basic_string<char8_t> | (C++20) |
std::u16string | std::basic_string<char16_t> | (C++11) |
std::u32string | std::basic_string<char32_t> | (C++11) |
The std::pmr namespace (C++17) provides corresponding typedefs that use a polymorphic allocator, such as std::pmr::string and std::pmr::wstring.
Note that you can use both std::basic_string types (std::string, std::wstring, std::u16string, …) and UnicodeString in C++ Builder. Here are more details about basic string types and their literals.
What is UnicodeString (String) in C++ Builder?
The Unicode standard provides a unique number (a code point) for every character. It covers far more characters than ASCII, which has only 128. These code points are stored in 8-, 16- or 32-bit code units in the UTF-8, UTF-16 and UTF-32 encodings, and UnicodeString uses UTF-16. Unicode strings are widely used because they support the world's languages and emojis. In modern C++ nowadays, two kinds of strings are used: arrays of chars (char strings) and modern strings such as std::string, std::wstring or UnicodeString (the default type for String). Most compilers and IDEs use these newer string types in their GUI forms and components, so that applications can support all languages worldwide.
In C++ Builder, there were other string types, such as WideString and AnsiString. They are older and not compatible with all the features of modern programming. More information about the structure of Unicode strings can be found here. RAD Studio, Delphi & C++ Builder use Unicode-based strings: that is, the type String is a Unicode string (System.UnicodeString) instead of an ANSI string. If you want to migrate your code to Unicode strings, we recommend you read this article.
How can we use UnicodeStrings in C++ Builder?
Here are some modern examples of how you can use strings with the UnicodeString type:
How to declare Unicode Strings
Here, L is a string literal prefix that denotes a wchar_t (wide) literal. The u8, u and U prefixes can be used too. Depending on your editor and compiler options, you may not need to add a prefix at all if you know the default. A string literal is a sequence of characters surrounded by double quotes, optionally prefixed by R, u8, u8R, u, uR, U, UR, L, or LR, as in "…", R"(…)", u8"…", u8R"(…)", u"…", uR"~(…)~", U"…", UR"zzz(…)zzz", L"…", or LR"(…)", respectively. Please see the String Literals section of the Working Draft, Standard for Programming Language C++. Here are some examples:
```cpp
UnicodeString ustr = L"مرحبا";
UnicodeString ustr2 = ustr + L"Hello " + L"DEF";
ustr = L"こんにちは";
ustr.printf(L"Pi is %8.2f", 3.14);
```
Examples of String Literals for String Definitions
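- str = "abcd"; an ordinary (narrow) string literal of type "array of n const char"; its encoding depends on the compiler/IDE options.
- str = u8"abcd"; a UTF-8 string literal, initialized with the given characters as encoded in UTF-8, including the null terminator (its element type is char before C++20 and char8_t since C++20).
- str = u"abcd"; a char16_t string literal. A char16_t string literal has type "array of n const char16_t", including the null terminator.
- str = U"abcd"; a char32_t string literal. A char32_t string literal has type "array of n const char32_t", including the null terminator.
- str = L"abcd"; a wide string literal. A wide string literal has type "array of n const wchar_t", including the null terminator.
- str = R"(abcd)"; a raw string literal. Backslashes and quotes between the parentheses are taken literally, so no escape sequences are needed. R can be combined with the other prefixes (u8R, uR, UR, LR).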
What is the difference between L”” and U”” and u”” literals in C++?
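- L denotes a wide string literal of type "array of n const wchar_t". The size and encoding of wchar_t depend on the platform and compiler. On Windows (including C++ Builder), wchar_t is 16 bits and wide literals are UTF-16, while on Linux and macOS wchar_t is usually 32 bits and wide literals are UTF-32.
- u is for the UTF-16 format (char16_t).
- U is for the UTF-32 format (char32_t).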
Length of Unicode String
```cpp
UnicodeString ustr = L"ABCDE";
int length = ustr.Length();
if (ustr.Length() > 45) ShowMessage(L"Too Long");
```
Size of Unicode String
```cpp
UnicodeString ustr = L"ABCDEF";
int size = ustr.Length() * ustr.ElementSize();   // size in bytes
```
How to read characters of Unicode String
```cpp
UnicodeString ustr = L"ABCDEF";
Char Ch = ustr[3];   // Ch is L'C': UnicodeString indexing is 1-based
```
How to change characters of Unicode String
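```cpp
ustr[3] = L'九';    // a single UTF-16 character
ustr[3] = 0x4E5D;   // the same character, given by its code point
// Characters above U+FFFF, such as U+1F603 (😃), need two UTF-16 elements
// (a surrogate pair), so they cannot be stored in a single ustr[i].
```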
How to find position of a string in Unicode String
```cpp
UnicodeString ustr = L"Hey,Hello";
int pos = ustr.Pos(L"Hello");   // 1-based position (5 here), 0 if not found
if (pos > 0) ShowMessage(L"Found Hello");
```
Converting Unicode String to Integer
```cpp
UnicodeString ustr = L"987";
int i = ustr.ToInt();
int j = ustr.ToIntDef(0);   // returns 0 if the string is not a valid integer
```
Converting Unicode String to Double or Float
```cpp
UnicodeString ustr = L"8.45";
double d = ustr.ToDouble();
float f = static_cast<float>(ustr.ToDouble());
```
Converting Unicode String to LowerCase
```cpp
UnicodeString ustr = L"This is Unicode";
UnicodeString ustr2 = ustr.LowerCase();
```
Converting Unicode String to UpperCase
```cpp
UnicodeString ustr = L"This is Unicode";
UnicodeString ustr2 = ustr.UpperCase();
```
Converting Unicode String to char String
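```cpp
UnicodeString src = L" ABC DEF";
// We can't use c_str() of UnicodeString directly (it returns a wide pointer),
// so convert to AnsiString first. Keep the AnsiString in a named variable:
// a pointer taken from a temporary would dangle as soon as the statement ends.
AnsiString ansi = src;
const char* dest = ansi.c_str();   // valid only while 'ansi' is alive

// or this can be used in some compilers (UTF-8 into a caller-provided buffer):
char buffer[256];
System::UnicodeToUtf8(buffer, sizeof(buffer), src.w_str(), src.Length());
```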
Converting from wider character types to narrower ones is not recommended. If you must convert to a narrow (char) string, the target encoding has to be able to represent Unicode. Otherwise, you can lose some Unicode characters, which results in missing or incorrect characters in your char strings. An ANSI code page covers only a small subset of Unicode, whereas UTF-8 can encode every Unicode character.
Converting Unicode String to ANSI String
```cpp
UnicodeString src = L"This is Unicode";
AnsiString dest = (AnsiString)src;
```
Converting Unicode String to Wide String
```cpp
UnicodeString ustr = L"This is unicode";
WideChar wstr[255];             // must be large enough for the text and its null
StrCopy(wstr, ustr.w_str());
```
Substring of a Unicode String
```cpp
UnicodeString ustr = L"ABCDEF";
UnicodeString ustr2 = ustr.SubString(5, 3);   // 1-based start; ustr2 is "EF"
```
Insert a String to UnicodeString
```cpp
UnicodeString ustr = L"ABCDEF";
ustr.Insert(L"-insert-", 3);   // ustr is now "AB-insert-CDEF"
```
Deleting / Trimming Part of Unicode String
```cpp
UnicodeString ustr = L"ABCDEF";
UnicodeString ustr2 = ustr.Delete(2, 4);   // UnicodeString& Delete(int index, int count); result is "AF"
```
Compare Unicode Strings
```cpp
UnicodeString ustr = L"ABCDEF";
UnicodeString ustr2 = L"AbCdEf";
if (ustr.Compare(ustr2) == 0) ShowMessage(L"Both strings are the same (case-sensitive)");      // case-sensitive
if (ustr.CompareIC(ustr2) == 0) ShowMessage(L"Both strings are the same (case-insensitive)");  // case-insensitive
```
Trimming spaces and control characters from a Unicode String
```cpp
UnicodeString ustr = L" ABC DEF";
UnicodeString ustr2 = ustr.Trim();   // "ABC DEF"
```
For more details about the methods and properties listed above, please check UnicodeString Methods & Properties and UnicodeString Types in the UnicodeString wiki.
C++ Builder is the easiest and fastest C and C++ compiler and IDE for building simple or professional applications on the Windows operating system. It is also easy for beginners to learn with its wide range of samples, tutorials, help files, and LSP support for code. RAD Studio’s C++ Builder version comes with the award-winning VCL framework for high-performance native Windows apps and the powerful FireMonkey (FMX) framework for UIs.
There is a free C++ Builder Community Edition for students, beginners, and startups; it can be downloaded from here. For professional developers, there are Professional, Architect, or Enterprise versions of C++ Builder and there is a trial version you can download from here.