How can I use Mahout's sequencefile API code?
You can do something like this:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path outputPath = new Path("c:\\temp"); // the sequence file to write

Text key = new Text();   // Example, this can be another type of class
Text value = new Text(); // Example, this can be another type of class

SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, outputPath, key.getClass(), value.getClass());

// "condition" is a placeholder: loop for as long as you have records to write
while (condition) {
    key.set("Some text");
    value.set("Some text");
    writer.append(key, value);
}
writer.close();
You can find more information here and here
Additionally, you can get the same functionality you described directly from Mahout by using the org.apache.mahout.text.SequenceFilesFromDirectory class.
Then the call looks something like this:
ToolRunner.run(new SequenceFilesFromDirectory(), args); // args is a String[] holding your parameters
ToolRunner comes from org.apache.hadoop.util.ToolRunner.
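If it helps, here is a rough sketch of how the String[] parameters for that ToolRunner.run call could be put together. The SeqDirectoryArgs class and its build method are hypothetical names of my own, not part of Mahout, and the flag names (--input, --output, --charset) are what I recall from the seqdirectory command-line help, so treat them as an assumption and check them against your Mahout version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Mahout): assembles the String[] that would be
// passed as the second argument of ToolRunner.run(new SequenceFilesFromDirectory(), args).
class SeqDirectoryArgs {

    // Builds the argument array as flag/value pairs.
    // Assumption: the flag names match the seqdirectory options of your Mahout version.
    static String[] build(String inputDir, String outputDir, String charset) {
        List<String> args = new ArrayList<>();
        args.add("--input");
        args.add(inputDir);
        args.add("--output");
        args.add(outputDir);
        args.add("--charset");
        args.add(charset);
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // Example paths only; replace them with your own directories.
        String[] args = build("/tmp/docs", "/tmp/docs-seq", "UTF-8");
        // With Hadoop and Mahout on the classpath, the call would then be:
        // int exitCode = ToolRunner.run(new SequenceFilesFromDirectory(), args);
        System.out.println(String.join(" ", args));
    }
}
```

Building the array in one place keeps the flag names easy to adjust if your Mahout release spells them differently.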
Hope this was of help.