C++ way of reading a Unicode file
-
Hello!
I have a std::wstring containing a file name.
I need to:
- open this file (it is a text file);
- detect whether it is UTF-8 encoded;
- read the contents of the file into a std::wstring (applying the system default code page if it is not UTF-8).
Requirements:
a) C++ classes, where possible;
b) platform-independent code, where possible (otherwise assume Windows).
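Steps 1 and 2 can start from the raw bytes. A minimal sketch, assuming the whole file fits in memory: read it unchanged into a std::string, then look for the UTF-8 byte order mark EF BB BF, which Windows Notepad has traditionally written at the start of files saved as UTF-8. A missing BOM does not prove the file is not UTF-8, so this is only a first, cheap test. The helper names read_file_bytes and strip_utf8_bom are hypothetical, and the sketch takes a narrow file name; opening a file by std::wstring name is discussed further down the thread.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: return every byte of the file unchanged.
// std::ios::binary stops the library from translating "\r\n" on Windows,
// so the bytes can be inspected before any decoding happens.
std::string read_file_bytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: if `bytes` starts with the UTF-8 byte order mark
// (EF BB BF), remove it and return true; otherwise leave it unchanged.
bool strip_utf8_bom(std::string& bytes)
{
    static const char bom[] = "\xEF\xBB\xBF";
    if (bytes.compare(0, 3, bom) == 0) {
        bytes.erase(0, 3);
        return true;
    }
    return false;
}
```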
-
Just do a compile-time check that char has 8 bits and use std::string for the file's contents. Converting is unnecessary; if you really need it, you can use a library like utfcpp. It also has checking functions, so at least you'll know whether the file is UTF-8 or not (but just assuming it is is a lot faster ;)).
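utfcpp's checking functions (such as utf8::is_valid) answer the "is this UTF-8?" question. Where adding the library is not an option, a hand-written validator can stand in for it; the sketch below is such a substitute, not utfcpp's code. It rejects stray continuation bytes, truncated sequences, overlong encodings, UTF-16 surrogates and code points above U+10FFFF. The name is_valid_utf8 is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `s` is well-formed UTF-8
// (no overlong forms, no surrogates, nothing above U+10FFFF).
bool is_valid_utf8(const std::string& s)
{
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }                 // ASCII byte

        std::size_t len;
        unsigned long cp;
        if (c >= 0xC2 && c <= 0xDF)      { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;   // continuation byte as lead, or C0, C1, F5..FF

        if (n - i < len) return false;                   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;       // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 3 && cp < 0x800) return false;        // overlong 3-byte form
        if (len == 4 && cp < 0x10000) return false;      // overlong 4-byte form
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        i += len;
    }
    return true;
}
```

Pure-ASCII text passes, which is harmless here because ASCII bytes mean the same in UTF-8 and in Windows-1251. Windows-1251 text with Cyrillic letters rarely forms valid multi-byte sequences by accident, which is why a validity check is a usable heuristic for telling the two apart.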
-
First of all, I cannot open the file this way:
std::wstring fileName( _T("αβγ.txt") );
std::ifstream file( fileName ); // compile error here
I cannot just assume UTF-8, because half of my .txt files are in the Windows-1251 code page and half are UTF-8 encoded.
Notepad.exe first tries to save as Windows-1251; if some characters cannot be represented, it saves the file as UTF-8 instead.
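For files that fail a UTF-8 check, the fallback decoding can be sketched as follows. On Windows, the system default code page can be applied with the Win32 function MultiByteToWideChar and CP_ACP. The portable sketch below instead hard-codes Windows-1251 and, as a simplifying assumption, only its Russian letters: bytes C0..FF map to U+0410..U+044F, A8/B8 map to Ё/ё, ASCII passes through, and every other byte becomes U+FFFD. decode_cp1251_basic is a hypothetical name.

```cpp
#include <string>

// Hypothetical, deliberately partial Windows-1251 decoder.
// Covers ASCII and the Russian alphabet; all other bytes in 0x80..0xBF
// (Ukrainian/Serbian letters, punctuation, symbols) become U+FFFD.
std::wstring decode_cp1251_basic(const std::string& bytes)
{
    std::wstring out;
    out.reserve(bytes.size());
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (c < 0x80)
            out += static_cast<wchar_t>(c);                   // ASCII is unchanged
        else if (c >= 0xC0)
            out += static_cast<wchar_t>(0x0410 + (c - 0xC0)); // А..Я, а..я
        else if (c == 0xA8)
            out += static_cast<wchar_t>(0x0401);              // Ё
        else if (c == 0xB8)
            out += static_cast<wchar_t>(0x0451);              // ё
        else
            out += static_cast<wchar_t>(0xFFFD);              // not handled here
    }
    return out;
}
```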
-
std::wstring fileName( _T("αβγ.txt") );
std::wifstream file( fileName ); // use wifstream!
-
Sorry. The code
std::wstring fileName( _T("αβγ.txt") );
std::ifstream file( fileName );
works perfectly: the ifstream constructor here supports both char and wchar_t file names (a Visual C++ extension). Reading about utfcpp...
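The constructor taking a wchar_t file name is a Visual C++ extension; before C++17, standard std::ifstream accepts only char-based names. Since C++17, passing a std::filesystem::path built from the std::wstring works on any conforming implementation. A minimal sketch, assuming a C++17 compiler (GCC 8 additionally needs -lstdc++fs); open_by_wide_name is a hypothetical name.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: open a file named by a std::wstring in binary mode.
// std::filesystem::path converts the wide name to the platform's native form
// (kept as wchar_t on Windows, converted to a narrow string on POSIX).
std::ifstream open_by_wide_name(const std::wstring& fileName)
{
    return std::ifstream(std::filesystem::path(fileName), std::ios::binary);
}
```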
-
In case you just need a function that converts UTF-8 to std::wstring, you can use one of the functions in this thread:
http://www.c-plusplus.net/forum/viewtopic-var-t-is-223921.html
Bye
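For comparison, a minimal hand-written UTF-8 to std::wstring converter (a sketch, not the code from the linked thread). It assumes the bytes were already validated: continuation bytes are not re-checked, a bad lead byte becomes U+FFFD, and a sequence cut off at the end of the input becomes a single U+FFFD. Where wchar_t is 16 bits wide, as on Windows, code points above U+FFFF are emitted as UTF-16 surrogate pairs. utf8_to_wstring is a hypothetical name. (C++11 also offers std::wstring_convert with std::codecvt_utf8, but both are deprecated since C++17.)

```cpp
#include <string>

// Hypothetical converter: UTF-8 bytes -> std::wstring.
// Assumes validated input; continuation bytes are not re-checked.
std::wstring utf8_to_wstring(const std::string& s)
{
    std::wstring out;
    std::string::size_type i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        unsigned long cp;
        std::string::size_type len;
        if (c < 0x80)                   { cp = c;        len = 1; }
        else if (c >= 0xC0 && c < 0xE0) { cp = c & 0x1F; len = 2; }
        else if (c >= 0xE0 && c < 0xF0) { cp = c & 0x0F; len = 3; }
        else if (c >= 0xF0 && c < 0xF8) { cp = c & 0x07; len = 4; }
        else {                                  // invalid lead byte
            out += static_cast<wchar_t>(0xFFFD);
            ++i;
            continue;
        }
        if (i + len > s.size()) {               // sequence cut off at the end
            out += static_cast<wchar_t>(0xFFFD);
            break;
        }
        for (std::string::size_type k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        i += len;

        if (cp > 0xFFFF && sizeof(wchar_t) == 2) {
            // 16-bit wchar_t: split into a UTF-16 surrogate pair
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```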
-
filmor wrote:
Just do a compile-time check if char has 8 bits and ...
How?
-
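// C++98: the primary template is only declared, the specialization for 0 is
// defined. If char has 8 bits, (char) 0x100 wraps to 0 on common compilers and
// the variable below compiles; otherwise it names an incomplete type and fails.
template <int I> struct Assertion_CharIsLargerThanEightBits;
template <> struct Assertion_CharIsLargerThanEightBits <0> {};
Assertion_CharIsLargerThanEightBits <(char) 0x100> assertion_CharIsLargerThanEightBits;

// C++0x
static_assert ((char) 0x100 == (char) 0x00, "sizeof (char) > 8 bits!");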
-
// C++0x
static_assert ((char) 0x100 == (char) 0x00, "sizeof (char) > 8 bits!");
Why not this way?
#include <climits> // CHAR_BIT
// C++0x
static_assert (CHAR_BIT == 8, "sizeof (char) > 8 bits!");
-
Why do it the simple way when you can do it the complicated way?
-
If you have the Boost library, you can just use its BOOST_STATIC_ASSERT macro.
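Without Boost and without C++11, a similar compile-time check can be written by hand with the classic negative-array-size trick: the typedef is ill-formed, and compilation stops, whenever the condition is false. This is a generic technique, not Boost's implementation; the macro name MY_STATIC_ASSERT and the typedef name char_has_8_bits are hypothetical.

```cpp
#include <climits>

// C++98-compatible compile-time check: when `cond` is false the array size
// becomes -1, which is ill-formed, so the translation unit won't compile.
#define MY_STATIC_ASSERT(cond, name) typedef char name[(cond) ? 1 : -1]

// Fails to compile on platforms where char is not exactly 8 bits.
MY_STATIC_ASSERT(CHAR_BIT == 8, char_has_8_bits);
```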