Difference between Text and String in Hadoop


The binary representation of a Text object is a variable-length integer containing the number of bytes in the UTF-8 representation of the string, followed by the UTF-8 bytes themselves.
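The layout described above can be sketched with the JDK alone. This is a minimal illustration, not Hadoop's serialization code: TextLayoutSketch and serializeShort are hypothetical names, and the sketch assumes the length prefix fits in a single byte, which it restricts to payloads of at most 127 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class TextLayoutSketch {
    // Hypothetical helper: a length prefix followed by the UTF-8 bytes.
    // Assumption: for payloads of at most 127 bytes the variable-length
    // integer occupies exactly one byte, so longer inputs are rejected.
    static byte[] serializeShort(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 127) {
            throw new IllegalArgumentException("sketch only covers payloads up to 127 bytes");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(utf8.length);          // one-byte length prefix
        out.write(utf8, 0, utf8.length); // the UTF-8 bytes themselves
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "caf" plus e-acute: 4 chars, but 5 bytes in UTF-8.
        byte[] encoded = serializeShort("caf\u00e9");
        System.out.println(encoded.length); // prints 6 (1 prefix byte + 5 payload bytes)
        System.out.println(encoded[0]);     // prints 5 (byte count, not char count)
    }
}
```

Note that the prefix counts bytes, not characters, which is why the non-ASCII string above reports 5 rather than 4.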

Text is a replacement for the UTF8 class, which was deprecated because it didn’t support strings whose encoding was over 32,767 bytes, and because it used Java’s modified UTF-8.

Furthermore, Text uses standard UTF-8, which makes it potentially easier to interoperate with other tools that understand UTF-8.
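To see how Java's modified UTF-8 differs from standard UTF-8, the following JDK-only sketch compares the two encodings. It does not use the deprecated Hadoop UTF8 class. It relies on DataOutputStream.writeUTF, which writes modified UTF-8 after a two-byte length prefix. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ModifiedUtf8Sketch {
    // Number of bytes in the standard UTF-8 encoding.
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of payload bytes in Java's modified UTF-8, as produced by
    // DataOutputStream.writeUTF, excluding its two-byte length prefix.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size() - 2;
    }

    public static void main(String[] args) {
        String supplementary = "\uD801\uDC00"; // one code point, U+10400
        System.out.println(standardUtf8Length(supplementary)); // prints 4
        System.out.println(modifiedUtf8Length(supplementary)); // prints 6

        String nul = "\u0000";
        System.out.println(standardUtf8Length(nul)); // prints 1
        System.out.println(modifiedUtf8Length(nul)); // prints 2
    }
}
```

A tool expecting standard UTF-8 would not decode the six-byte and two-byte forms as intended, which is the interoperability concern mentioned above.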

In brief, Text differs from String in the following ways:

Indexing: Because of its emphasis on using standard UTF-8, there are some differences between Text and the Java String class. Indexing for the Text class is in terms of position in the encoded byte sequence, not the Unicode character in the string or the Java char code unit (as it is for String).

For instance, charAt() returns an int representing a Unicode code point, unlike the String variant that returns a char.
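The gap between char indexes and byte offsets shows up as soon as a string contains non-ASCII characters. The sketch below uses only the JDK to compute the UTF-8 byte offset that corresponds to a String index. ByteOffsetSketch and byteOffset are hypothetical names, and the helper assumes the index does not fall in the middle of a surrogate pair.

```java
import java.nio.charset.StandardCharsets;

class ByteOffsetSketch {
    // Byte offset, within the UTF-8 encoding of s, at which the char at
    // charIndex begins. Assumes charIndex does not split a surrogate pair.
    static int byteOffset(String s, int charIndex) {
        return s.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Four code points needing 1, 2, 3 and 4 UTF-8 bytes respectively.
        String s = "\u0041\u00DF\u6771\uD801\uDC00";

        System.out.println(s.length());                                 // prints 5 (char code units)
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);  // prints 10 (UTF-8 bytes)

        System.out.println(s.indexOf("\u6771"));                        // prints 2 (String index)
        System.out.println(byteOffset(s, s.indexOf("\u6771")));         // prints 3 (byte offset)

        System.out.println(s.indexOf("\uD801\uDC00"));                  // prints 3 (String index)
        System.out.println(byteOffset(s, s.indexOf("\uD801\uDC00")));   // prints 6 (byte offset)
    }
}
```

The second number in each pair is the kind of position that byte-based indexing works with.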

Iteration: Iterating over the Unicode characters in Text is complicated by the use of byte offsets for indexing, since you can’t just increment the index.
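Here is a rough sketch of why the index cannot simply be incremented: each code point occupies between one and four bytes, so the step size depends on the lead byte. The decoder below is a hand-written JDK-only illustration that assumes well-formed UTF-8 input. It is not Hadoop's code. With Hadoop on the classpath, the static Text.bytesToCodePoint method applied to a ByteBuffer serves a similar purpose.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class CodePointIterationSketch {
    // Walks a well-formed UTF-8 byte array and returns its code points.
    // No validation is performed; malformed input gives meaningless results.
    static List<Integer> codePoints(byte[] utf8) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < utf8.length) {
            int lead = utf8[i] & 0xFF;
            int len;
            int cp;
            if (lead < 0x80) {                  // 0xxxxxxx: one byte
                len = 1; cp = lead;
            } else if ((lead & 0xE0) == 0xC0) { // 110xxxxx: two bytes
                len = 2; cp = lead & 0x1F;
            } else if ((lead & 0xF0) == 0xE0) { // 1110xxxx: three bytes
                len = 3; cp = lead & 0x0F;
            } else {                            // 11110xxx: four bytes
                len = 4; cp = lead & 0x07;
            }
            // Each continuation byte contributes its low six bits.
            for (int k = 1; k < len; k++) {
                cp = (cp << 6) | (utf8[i + k] & 0x3F);
            }
            out.add(cp);
            i += len; // advance by the width of this code point, not by 1
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = "\u0041\u00DF\u6771\uD801\uDC00".getBytes(StandardCharsets.UTF_8);
        for (int cp : codePoints(bytes)) {
            System.out.println(Integer.toHexString(cp)); // prints 41, df, 6771, 10400
        }
    }
}
```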

Mutable: Another difference with String is that Text is mutable (like all Writable implementations in Hadoop, except NullWritable, which is a singleton). You can reuse a Text instance by calling one of the set() methods on it.
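The reuse pattern can be illustrated with a small stand-in class. ReusableUtf8 below is hypothetical and is not Hadoop's Text; in particular, the grow-only backing buffer is an assumption of this sketch, included to show how a mutable holder can avoid allocating a new object for every value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a mutable UTF-8 string holder (not Hadoop's Text).
class ReusableUtf8 {
    private byte[] bytes = new byte[0];
    private int length = 0;

    // Replaces the current contents in place.
    void set(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > bytes.length) {
            bytes = new byte[utf8.length]; // grow only when the buffer is too small
        }
        System.arraycopy(utf8, 0, bytes, 0, utf8.length);
        length = utf8.length;
    }

    // Number of valid bytes; may be smaller than the buffer capacity.
    int getLength() { return length; }

    int capacity() { return bytes.length; }

    @Override
    public String toString() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableUtf8 t = new ReusableUtf8();
        t.set("hadoop");
        t.set("pig");                      // same object, new contents
        System.out.println(t);             // prints pig
        System.out.println(t.getLength()); // prints 3
        System.out.println(t.capacity());  // prints 6 (buffer kept from "hadoop")
    }
}
```

A String, by contrast, can never change after construction, so each new value requires a new object.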

Resorting to String: Text doesn’t have as rich an API for manipulating strings as java.lang.String, so in many cases you need to convert the Text object to a String. This is done in the usual way, using the toString() method.
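Conceptually, the conversion amounts to decoding the valid UTF-8 bytes into a java.lang.String, after which the full String API is available. The JDK-only sketch below mirrors that step. ToStringSketch and decode are hypothetical names, and with Hadoop available you would simply call toString() on the Text object instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ToStringSketch {
    // Decodes the first `length` bytes of a UTF-8 buffer into a String.
    static String decode(byte[] utf8, int length) {
        return new String(utf8, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] buffer = "hadoop text string".getBytes(StandardCharsets.UTF_8);
        String s = decode(buffer, buffer.length);

        // Once converted, ordinary String methods apply.
        System.out.println(s.toUpperCase());              // prints HADOOP TEXT STRING
        System.out.println(Arrays.toString(s.split(" "))); // prints [hadoop, text, string]
    }
}
```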

For more details, see Hadoop: The Definitive Guide.
