How can I use Mahout's sequencefile API code?


You can do something like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path outputPath = new Path("c:\\temp"); // path of the sequence file to write

Text key = new Text();   // Example, this can be another Writable type
Text value = new Text(); // Example, this can be another Writable type

// This constructor is deprecated in newer Hadoop releases in favor of SequenceFile.createWriter(...)
SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, outputPath, key.getClass(), value.getClass());
while (condition) { // "condition" is a placeholder for your own loop test
    key.set("Some text");
    value.set("Some text");
    writer.append(key, value);
}
writer.close();

You can find more information here and here

Additionally, you can get the same functionality you described directly from Mahout by using the org.apache.mahout.text.SequenceFilesFromDirectory class.

Then the call looks something like this:

ToolRunner.run(new SequenceFilesFromDirectory(), args); // args is a String[] holding your parameters

ToolRunner is the class org.apache.hadoop.util.ToolRunner.
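To make the parameters part more concrete, here is a rough sketch of how the String[] could be put together before it is handed to ToolRunner. The SeqDirArgs class and its build method are made up for this illustration and are not part of Mahout, the paths are placeholders, and the option names (-i for the input directory, -o for the output directory, -c for the charset) are what I would expect from the seqdirectory tool, so check them against the help output of your Mahout version. The sketch uses only the JDK, so the actual ToolRunner call is left as a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Mahout): assembles the String[] that is
// passed to ToolRunner.run(new SequenceFilesFromDirectory(), args).
final class SeqDirArgs {

    // Assumed option names: -i (input dir), -o (output dir), -c (charset).
    // Verify them against the help output of your Mahout version.
    static String[] build(String inputDir, String outputDir, String charset) {
        List<String> args = new ArrayList<>();
        args.add("-i");
        args.add(inputDir);
        args.add("-o");
        args.add(outputDir);
        // The charset option is only added when a charset is given.
        if (charset != null && !charset.isEmpty()) {
            args.add("-c");
            args.add(charset);
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // Placeholder paths; replace them with your own directories.
        String[] args = build("/path/to/text-files", "/path/to/seqfiles", "UTF-8");
        System.out.println(String.join(" ", args));
        // With Hadoop and Mahout on the classpath, the call would then be:
        // int exitCode = ToolRunner.run(new SequenceFilesFromDirectory(), args);
    }
}
```

Building the array in one place keeps the option names out of the rest of your code, so you only have one spot to change if your Mahout version expects different flags.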

Hope this was of help.