
setting a UTF-8 in java and csv file [duplicate]


I spent some time on this and found a solution to your problem.

First I opened Notepad and wrote the following line: שלום, hello, привет. Then I saved it as he-en-ru.csv using UTF-8 encoding. Then I opened it with MS Excel and everything displayed correctly.

Next, I wrote a simple Java program that writes this line to a file as follows:

    PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));
    w.print(line);
    w.flush();
    w.close();

When I opened this file in Excel, I saw gibberish.

Then I compared the raw bytes of the two files and, as expected, saw that the file generated by Notepad starts with a 3-byte prefix:

    239  EF
    187  BB
    191  BF
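To reproduce the comparison above, a small helper along these lines can dump the first bytes of each file in the same decimal/hex form. It is only a sketch: the class name `BomDump` and the `formatPrefix` helper are hypothetical, and the whole file is read into memory, which is fine for small test files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class BomDump {

    // Formats up to `count` leading bytes as "decimal HEX" pairs, e.g. "239 EF".
    static String formatPrefix(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(count, data.length);
        for (int i = 0; i < n; i++) {
            int unsigned = data[i] & 0xFF; // Java bytes are signed; mask to 0..255
            if (i > 0) {
                sb.append("    ");
            }
            sb.append(unsigned).append(' ').append(String.format("%02X", unsigned));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Pass the Notepad file and the Java-generated file as arguments.
        for (String name : args) {
            byte[] data = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": " + formatPrefix(data, 3));
        }
    }
}
```

Running `java BomDump he-en-ru.csv <your-java-file>.csv` should show the `239 EF    187 BB    191 BF` prefix only for the file saved by Notepad.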

So I modified my code to write this prefix first, followed by the text:

    String line = "שלום, hello, привет";
    OutputStream os = new FileOutputStream("c:/temp/j.csv");
    // UTF-8 byte order mark: EF BB BF
    os.write(239);
    os.write(187);
    os.write(191);
    PrintWriter w = new PrintWriter(new OutputStreamWriter(os, "UTF-8"));
    w.print(line);
    w.flush();
    w.close();

And it worked! I opened the file in Excel and saw the text as expected.

Bottom line: write these 3 bytes before writing the content. This prefix is the UTF-8 byte order mark (BOM); it marks the content as 'UTF-8 with BOM' (without it, the file is plain 'UTF-8 without BOM').
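The same three bytes can also be produced by writing the Unicode character U+FEFF through the UTF-8 writer, which avoids hard-coding the byte values. A minimal sketch, assuming Java 7+ for `StandardCharsets` and try-with-resources; `Utf8BomCsv` and `writeWithBom` are hypothetical names:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class Utf8BomCsv {

    // U+FEFF encoded in UTF-8 is exactly the three bytes EF BB BF.
    static final char BOM = '\uFEFF';

    // Writes the BOM and then `text` as UTF-8; closing the writer also closes `os`.
    static void writeWithBom(OutputStream os, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(os, StandardCharsets.UTF_8)) {
            w.write(BOM);
            w.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java Utf8BomCsv <output.csv>");
            return;
        }
        // The sample line from above ("shalom, hello, privet"), written with
        // Unicode escapes so the source compiles under any javac source encoding.
        String line = "\u05E9\u05DC\u05D5\u05DD, hello, \u043F\u0440\u0438\u0432\u0435\u0442";
        writeWithBom(Files.newOutputStream(Paths.get(args[0])), line);
    }
}
```

One thing to keep in mind: Java's own UTF-8 decoder does not strip the BOM when you read such a file back, so Java code that reads it may need to skip a leading U+FEFF itself.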


Unfortunately, CSV is a very ad hoc format with no metadata and no real standard that would mandate a flexible encoding. As long as you use CSV, you can't reliably use any characters outside of ASCII.

Your alternatives:

  • Write to XML (which does have encoding metadata if you do it right) and have the users import the XML into Excel.
  • Use Apache POI to create actual Excel documents.
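For the XML route, the JDK's StAX API (`javax.xml.stream`) writes the encoding declaration for you. A rough sketch; the `rows`/`row`/`cell` element names and the `XmlRows` class are hypothetical, and whether Excel's XML import maps this structure the way you want is not something this example shows:

```java
import java.io.OutputStream;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class XmlRows {

    // Writes rows as <rows><row><cell>...</cell></row></rows> in UTF-8.
    // The element names are hypothetical; use whatever structure your import expects.
    static void writeRows(OutputStream os, List<List<String>> rows) throws XMLStreamException {
        XMLStreamWriter x = XMLOutputFactory.newInstance().createXMLStreamWriter(os, "UTF-8");
        x.writeStartDocument("UTF-8", "1.0"); // XML declaration carrying encoding="UTF-8"
        x.writeStartElement("rows");
        for (List<String> row : rows) {
            x.writeStartElement("row");
            for (String cell : row) {
                x.writeStartElement("cell");
                x.writeCharacters(cell); // escapes markup characters such as & and <
                x.writeEndElement();
            }
            x.writeEndElement();
        }
        x.writeEndElement();
        x.writeEndDocument();
        x.flush();
        x.close(); // per the StAX contract, this does not close `os`
    }

    public static void main(String[] args) throws XMLStreamException {
        // Sample row: Hebrew "shalom", English, Russian "privet" (Unicode escapes).
        List<List<String>> rows = List.of(List.of(
                "\u05E9\u05DC\u05D5\u05DD", "hello", "\u043F\u0440\u0438\u0432\u0435\u0442"));
        writeRows(System.out, rows);
        System.out.println();
    }
}
```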


Excel doesn't use UTF-8 by default to open CSV files. That's a known problem. The encoding it actually uses depends on the locale settings of Microsoft Windows. With a German locale, for example, Excel would open a CSV file as CP1252.
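One way to see the limitation is to check whether the text can be represented in the locale's code page at all. A sketch using `CharsetEncoder.canEncode`; the class name and the sample words are illustrative assumptions. Windows-1252 covers German umlauts and ß but has no Arabic-script letters, so Persian text cannot survive a trip through it.

```java
import java.nio.charset.Charset;

class CodePageCheck {

    // True if every character of `text` can be represented in `charsetName`.
    static boolean fits(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String german = "Gr\u00FC\u00DFe";           // German "Gruesse" with u-umlaut and sharp s
        String persian = "\u0633\u0644\u0627\u0645"; // Persian "salam"
        for (String cs : new String[] {"windows-1252", "UTF-8"}) {
            System.out.println(cs + ": German " + fits(german, cs) + ", Persian " + fits(persian, cs));
        }
    }
}
```

This prints `windows-1252: German true, Persian false` followed by `UTF-8: German true, Persian true`.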

You could create an Excel file containing some Persian characters and save it as a CSV file. Then write a small Java program that reads this file and tries some common encodings. That's how I figured out the correct encoding for German umlauts in CSV files.
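The experiment described above could look roughly like this: decode the same bytes with several candidate charsets and report which ones are even valid. The candidate list is an assumption (windows-1256 is the Windows Arabic-script code page; it is skipped if the JRE does not ship it), and a decoding that succeeds is not proof that it is the right one, so you still compare the printed text by eye:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

class EncodingProbe {

    // Candidate encodings to try (an assumption; extend for your locale).
    static final String[] CANDIDATES = {"UTF-8", "windows-1252", "windows-1256"};

    // Decodes `data` with each supported candidate. Invalid input maps to null
    // instead of being silently replaced with U+FFFD.
    static Map<String, String> probe(byte[] data, String... charsetNames) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : charsetNames) {
            if (!Charset.isSupported(name)) {
                continue; // e.g. a JRE without the extended charsets
            }
            try {
                String text = Charset.forName(name).newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(data))
                        .toString();
                results.put(name, text);
            } catch (CharacterCodingException e) {
                results.put(name, null);
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java EncodingProbe <file.csv>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        probe(data, CANDIDATES).forEach((cs, text) ->
                System.out.println(cs + ": " + (text == null ? "<invalid>" : text)));
    }
}
```

Run it on the CSV that Excel saved, e.g. `java EncodingProbe persian.csv`, and pick the candidate whose output shows the characters you typed.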