Chinese character encoding in Qt cross-platform programming

2021-01-23 16:57:47 magicdmer

Preface

While developing with Qt5, I ran into several cross-platform problems with Chinese character-encoding conversion. After some investigation, here is a summary that I hope will help others.

Character encoding

Let's start with Unicode. Unicode is a character set standard maintained by the Unicode Consortium, whose members include Microsoft and other well-known companies. Its concrete encoding forms are UTF-8, UTF-16, and UTF-32. GBK belongs to a separate character set, comparable to Big5.

Next is UCS. UCS is a character set standard from ISO that is similar to Unicode. The two later began to merge, and their code assignments are now essentially the same. What we need to know here is the relationship between UCS-2 and UTF-16: UCS-2 is effectively a subset of UTF-16. UCS-2 is a fixed two bytes per character, while UTF-16 is variable-length and uses four bytes (a surrogate pair) for extended characters.
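To make the UCS-2 versus UTF-16 difference concrete, here is a minimal sketch in standard C++ (no Qt required). The function name encodeUtf16 is my own for illustration, and the code assumes the input is a valid code point. Anything up to U+FFFF fits in one 16-bit unit, exactly as in UCS-2; anything above it needs the two-unit surrogate pair that UCS-2 cannot express.

```cpp
#include <cstdint>
#include <vector>

// Encode one Unicode code point as UTF-16 code units.
// Assumes cp is a valid code point (<= 0x10FFFF and not itself a surrogate).
std::vector<std::uint16_t> encodeUtf16(std::uint32_t cp) {
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit, identical to UCS-2.
        return { static_cast<std::uint16_t>(cp) };
    }
    // Supplementary planes: split the remaining 20 bits into a surrogate pair.
    cp -= 0x10000;
    std::uint16_t high = static_cast<std::uint16_t>(0xD800 + (cp >> 10));
    std::uint16_t low  = static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF));
    return { high, low };
}
```

For example, U+4E2D (中) stays a single unit 0x4E2D, while U+20000 becomes the pair 0xD840 0xDC00.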

Windows character encoding

Windows has used the Unicode character set since Windows 2000, with UTF-16 as the concrete encoding. Starting with Vista it supports the Unicode 5.0 standard, and its UTF-16 support is more complete, including display of UTF-16 surrogate (extended) characters. Windows developers need to know which data type represents UTF-16 encoded data. Here I use the Visual Studio IDE and its bundled compiler as the example; Qt is also used with the MSVC compiler environment.

  1. The data type that represents UTF-16 is wchar_t, that is, the wide character selected in the project settings. You may ask: UTF-16 is a variable-length encoding and extended characters take 4 bytes, so how does wchar_t represent them? Unfortunately I can't give a full answer; it looks more like the UCS-2 standard. What we do need to know is that the Windows core really is UTF-16, and the programming interface only provides a two-byte wchar_t. These extended characters are rarely used, so we don't have to think about them too much. If you need to know more, look it up yourself; I haven't gone deep into this yet.
  2. Source files created by Visual Studio are ANSI-encoded by default, so char strings are GBK-encoded by default. Microsoft's development tools and compilers support GBK well.
  3. Of course, we can also set Visual Studio up to use UTF-8: save the source files as UTF-8 and add the following pragma to the source code. char strings are then UTF-8 encoded by default. The same setup applies to Qt + MSVC.
#if defined(_MSC_VER) && (_MSC_VER >= 1600)  
# pragma execution_character_set("utf-8")  
#endif 

Note: when the file is saved as UTF-8, you may get "error C2001: newline in constant". In that case, saving the file as UTF-8 with BOM solves the problem. There are other solutions as well, which you can search for.
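The BOM mentioned in the note is just the three bytes EF BB BF at the very start of the file. As a rough sketch (the helper name hasUtf8Bom is hypothetical, and this is plain standard C++ rather than anything from Qt or MSVC), checking whether a file's contents start with it could look like this:

```cpp
#include <string>

// Returns true if the buffer starts with the UTF-8 byte order mark EF BB BF.
bool hasUtf8Bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}
```

Reading the first few bytes of a source file into a std::string and passing it to this function tells you whether the file was saved as "UTF-8" or "UTF-8 with BOM".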

Linux character encoding

Linux uses UTF-8 by default, so things are much simpler for Linux developers: just use UTF-8 everywhere. Qt Creator + gcc is used as the example here.

If you need to develop entirely in GBK, that also works. Add the following settings to the main function of the Qt project, and save the source files in GBK format.

QTextCodec *codec = QTextCodec::codecForName("gbk");
QTextCodec::setCodecForLocale(codec);

Qt5 character encoding

QString is Unicode, specifically UTF-16. For a Qt program to display Chinese correctly, strings in other encodings must be converted to Unicode. Here are several ways to construct a QString:

// The const char* constructor treats the input as UTF-8 by default
QString str1 = QString("中文");

// Construct explicitly from a UTF-8 string; same as above
QString str2 = QString::fromUtf8("中文");

// tr() also treats the string as UTF-8 by default
QString str3 = QObject::tr("中文");

// Uses the system's local encoding: GBK on Windows, UTF-8 on Linux
QString str4 = QString::fromLocal8Bit("中文");

// Treats the input as Latin-1 (single-byte, a superset of ASCII)
QString str5 = QString::fromLatin1("ascii text");
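To show why the choice of constructor matters, here is a simplified sketch in standard C++ of what "interpret these bytes as Latin-1" versus "interpret these bytes as UTF-8" means. The function names decodeLatin1 and decodeUtf8 are my own, not Qt APIs, and the UTF-8 decoder assumes well-formed input with no error handling, so treat it as an illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Interpret each byte as one Latin-1 character (code point == byte value).
std::vector<std::uint32_t> decodeLatin1(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    for (unsigned char b : bytes) out.push_back(b);
    return out;
}

// Decode UTF-8 into code points.
// Simplified: assumes well-formed input and performs no validation.
std::vector<std::uint32_t> decodeUtf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        int extra = 0;          // number of continuation bytes that follow
        std::uint32_t cp = 0;
        if (lead < 0x80)      { extra = 0; cp = lead; }
        else if (lead < 0xE0) { extra = 1; cp = lead & 0x1F; }
        else if (lead < 0xF0) { extra = 2; cp = lead & 0x0F; }
        else                  { extra = 3; cp = lead & 0x07; }
        ++i;
        // Each continuation byte contributes its low 6 bits.
        for (int k = 0; k < extra && i < bytes.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

The three bytes E4 B8 AD decode as the single character U+4E2D (中) when read as UTF-8, but as three unrelated characters (U+00E4, U+00B8, U+00AD) when read as Latin-1. That mismatch is the garbled text you see when the wrong conversion function is used.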

Notes

  1. QString is UTF-16, so QChar is two bytes, just like WCHAR.
  2. fromLocal8Bit is a system-dependent function: on Windows it converts from GBK to Unicode, but on Linux it converts from UTF-8 to Unicode. So if you need a fixed GBK-to-Unicode conversion, use the following approach:
QTextCodec* gbkcodec = QTextCodec::codecForName("gbk");
// A GBK-encoded string (assumes the literal's bytes really are GBK, e.g. a GBK source file)
const char* bdata = "你好，世界";
// Use QTextCodec to convert from GBK to Unicode
QString strGBK = gbkcodec->toUnicode(bdata);
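One consequence of the first note is that a character outside the two-byte range takes two 16-bit units, so the number of units in a UTF-16 string can be larger than the number of characters. A small sketch in standard C++ follows; it uses std::u16string as a stand-in for QString's storage, the name countCodePoints is hypothetical, and the input is assumed to be well-formed UTF-16.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-16 string; a surrogate pair counts once.
// Assumes well-formed UTF-16 (every high surrogate is followed by a low one).
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t u : s) {
        // Low surrogates (0xDC00-0xDFFF) only complete a pair, so skip them.
        if (u < 0xDC00 || u > 0xDFFF) ++n;
    }
    return n;
}
```

A string holding U+4E2D followed by U+20000 has three 16-bit units but only two characters.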

Finally, I recommend using UTF-8 encoding throughout, which makes it easy to stay compatible with Windows, Linux, and other platforms.

Reference: https://wiki.qt.io/Strings_and_encodings_in_Qt

Copyright notice
This article was written by [magicdmer]. Please include a link to the original when reposting. Thanks.
https://chowdera.com/2021/01/20210123165709783m.html