CSV export doesn't support UTF8 characters #5729

Closed
mattab opened this issue Jul 20, 2008 · 1 comment
Labels
Bug For errors / faults / flaws / inconsistencies etc. Major Indicates the severity or impact or benefit of an issue is much higher than normal but not critical.
Milestone

Comments

@mattab
Member

mattab commented Jul 20, 2008

Solution provided by Bruno Costa:
Your header looks very similar to ours, we just have a few extra parameters.

header("Content-Type: application/vnd.ms-excel");
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Content-Disposition: attachment; filename=keywords.csv");

The key seems to be the encoding: MS Excel expects UTF-16LE, and with any
other encoding the results become unreliable. It also expects the byte-order
mark, 0xFF 0xFE, as the first two bytes of the file, which identifies it as a
UTF-16LE file.

This is how we’re doing it: after sending the headers, with $content holding
the CSV lines in UTF-8, we send the byte-order mark followed by the text
converted to UTF-16LE:

echo chr(255) . chr(254) . mb_convert_encoding($content, 'UTF-16LE', 'UTF-8');
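As a minimal sketch of the step above, the BOM and the conversion can be wrapped in one helper so the first two bytes are easy to check. The function name to_excel_utf16le() and the sample CSV rows are hypothetical, not part of the original fix; it assumes PHP 7+ with the mbstring extension loaded.

```php
<?php
// Hypothetical helper (not from the original fix): prepend the UTF-16LE
// byte-order mark and convert UTF-8 CSV text to UTF-16LE.
function to_excel_utf16le(string $utf8Csv): string
{
    $bom = "\xFF\xFE"; // same two bytes as chr(255) . chr(254)
    return $bom . mb_convert_encoding($utf8Csv, 'UTF-16LE', 'UTF-8');
}

// Sample data is made up for illustration; \u{00E9} is "é" in UTF-8.
$content = "keyword,visits\ncaf\u{00E9},3\n";
$payload = to_excel_utf16le($content);

// Show the first two bytes of the payload in hex (the byte-order mark).
echo bin2hex(substr($payload, 0, 2)), "\n";
```

In a real export the headers would be sent first and $payload echoed afterwards, as in the one-liner above.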

Another fundamental point is that the CSV content has to represent the text
properly in the first place: stored as UTF-8 if we want to support any
alphabet, or as GB2312 if only Chinese is needed, as could happen in our case.
ISO-8859-15, for instance, obviously could not represent such text.
In the end we just need to use mb_convert_encoding() to convert from our
internal representation to what MS Excel expects.
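To illustrate the point about the internal representation, here is a rough sketch in which the source encoding is passed explicitly, so the same text stored as UTF-8 or as GB2312 ends up as identical output for Excel. The function convert_for_excel() and the sample string are assumptions made for illustration; it again assumes PHP 7+ with mbstring.

```php
<?php
// Hypothetical helper: convert from whatever encoding the application
// stores its text in to BOM-prefixed UTF-16LE.
function convert_for_excel(string $content, string $internalEncoding): string
{
    return "\xFF\xFE" . mb_convert_encoding($content, 'UTF-16LE', $internalEncoding);
}

// The same two Chinese characters held in two internal representations.
$asUtf8   = "\u{4E2D}\u{6587}";
$asGb2312 = mb_convert_encoding($asUtf8, 'GB2312', 'UTF-8');

$fromUtf8   = convert_for_excel($asUtf8, 'UTF-8');
$fromGb2312 = convert_for_excel($asGb2312, 'GB2312');

// Both paths should produce the same bytes when the declared source
// encoding matches how the text is really stored.
var_dump($fromUtf8 === $fromGb2312);
```

The third argument to mb_convert_encoding() has to name the encoding the text is actually stored in; declaring the wrong one converts without error but produces garbled characters.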

In case it is useful, I’m also sending a few links we gathered while
studying the issue; the 4th one explains the key BOM issue:

UTF BOM
http://www.opentag.com/xfaq_enc.htm
http://en.wikipedia.org/wiki/Byte_Order_Mark
http://www.unicode.org/unicode/faq/utf_bom.html

PHP Multibyte String Functions
http://www.php.net/manual/en/ref.mbstring.php#50298

@mattab
Member Author

mattab commented Jul 20, 2008

fixed in #309

@mattab mattab added this to the RobotRock milestone Jul 8, 2014
This issue was closed.