Counting characters in a UTF-8 file

Use the -m (or --chars) option of wc.

For example (the file named text contains two Korean characters and a newline):

   falsetru@jmlee12:~$ cat text
   안녕
   falsetru@jmlee12:~$ wc -c text
   7 text
   falsetru@jmlee12:~$ wc -m text
   3 text

According to wc(1):

   -c, --bytes          print the byte counts
   -m, --chars          print the character counts
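
The session above can be reproduced with a small script. This is only a sketch: it assumes a POSIX-like shell with mktemp available, and the variable names are made up for illustration. The file is written with octal escapes so the script does not depend on the encoding of the editor or terminal. The count from -m equals 3 only when the active locale uses UTF-8.

```shell
#!/bin/sh
# Recreate the sample file: the UTF-8 bytes of the two Korean characters
# (3 bytes each) followed by a newline (1 byte).
file=$(mktemp)
printf '\354\225\210\353\205\225\n' > "$file"

# Byte count; reading from stdin keeps the file name out of the output,
# and tr strips the padding that some wc implementations add.
bytes=$(wc -c < "$file" | tr -d ' ')

# Character count as decoded by the current locale.
chars=$(wc -m < "$file" | tr -d ' ')

echo "bytes=$bytes chars=$chars"
rm -f "$file"
```

If chars comes out equal to bytes, the shell is most likely running in a non-UTF-8 locale such as C or POSIX.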


Don't confuse chars, characters, and bytes. A byte is 8 bits long, and -c counts the bytes in your file regardless of what they contain. In many programming languages a char is also 8 bits long, which is why the byte-counting option is named -c. To count the characters of a given alphabet in a file, you have to specify in some way which character encoding was used, and some encodings use more than one byte per character. Read the manual for wc: it will tell you that -m uses your current locale (roughly, your language and charset preferences) to decode the file and count its characters.
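
The locale dependence described in this comment can be seen directly. The following is a rough sketch, assuming a POSIX-like shell with mktemp; the variable names are illustrative. It forces the C locale, where every byte counts as one character, and then counts UTF-8 code points without relying on any locale by deleting continuation bytes (0x80 to 0xBF, octal 200 to 277). That second trick is an alternative to wc -m, not what wc does internally, and it is only meaningful for valid UTF-8 input.

```shell
#!/bin/sh
# Same sample data: two 3-byte UTF-8 characters plus a newline.
file=$(mktemp)
printf '\354\225\210\353\205\225\n' > "$file"

# In the C locale each byte is one character, so -m reports the byte count.
c_locale_chars=$(LC_ALL=C wc -m < "$file" | tr -d ' ')

# Locale-independent count: every UTF-8 character has exactly one
# non-continuation byte, so drop the continuation bytes and count the rest.
code_points=$(LC_ALL=C tr -d '\200-\277' < "$file" | wc -c | tr -d ' ')

echo "C locale chars=$c_locale_chars, UTF-8 code points=$code_points"
rm -f "$file"
```

Running wc -m under a UTF-8 locale that is installed on the system (the exact locale name varies between systems) should agree with the code point count.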

