
Mahout : To read a custom input file


One way to do this is by creating an extension of FileDataModel. You'll need to override the readUserIDFromString(String value) method to use some kind of resolver to do the conversion. You can use one of the implementations of IDMigrator, as Sean suggests.

For example, assuming you have an initialized MemoryIDMigrator, you could do this:

@Override
protected long readUserIDFromString(String stringID) {
    long result = memoryIDMigrator.toLongID(stringID);
    memoryIDMigrator.storeMapping(result, stringID);
    return result;
}

This way you could use memoryIDMigrator to do the reverse mapping, too. If you don't need that, you can just hash it the way it's done in their implementation (it's in AbstractIDMigrator).
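To illustrate the hashing alternative, here is a minimal Python sketch of that idea, assuming the migrator hashes the string with MD5 and keeps the first 8 bytes as a signed long (this mirrors the spirit of AbstractIDMigrator; the exact byte handling in Mahout may differ):

```python
import hashlib
import struct

def to_long_id(string_id: str) -> int:
    """Hash a string ID to a deterministic 64-bit long.

    Sketch of the one-way hashing approach: no reverse mapping is
    kept, so you cannot recover the original string from the ID.
    """
    digest = hashlib.md5(string_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the 16-byte digest as a
    # big-endian signed 64-bit integer.
    return struct.unpack(">q", digest[:8])[0]
```

The trade-off is exactly the one mentioned above: hashing is stateless and needs no memory, but if you need to map recommended IDs back to strings you must store the mapping, as MemoryIDMigrator does.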


userId and itemId can be strings, so this is the CustomFileDataModel that will convert your strings into integers and keep the (String, ID) map in memory; after computing recommendations you can map each ID back to its string.
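As a language-neutral sketch of that bidirectional mapping (the class and method names here are illustrative, not Mahout's API):

```python
class StringIDMap:
    """Minimal two-way String <-> long ID map, kept in memory."""

    def __init__(self):
        self._to_id = {}      # string -> assigned integer ID
        self._to_string = {}  # integer ID -> original string

    def to_long_id(self, s: str) -> int:
        """Return the ID for s, assigning a new one on first sight."""
        if s not in self._to_id:
            new_id = len(self._to_id) + 1
            self._to_id[s] = new_id
            self._to_string[new_id] = s
        return self._to_id[s]

    def to_string_id(self, long_id: int) -> str:
        """Reverse lookup: recover the string for a previously assigned ID."""
        return self._to_string.get(long_id)
```

After the recommender returns numeric IDs, to_string_id gives you back the original user or item strings.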


Assuming that your input fits in memory, loop through it. Track the ID for each string in a dictionary. If it does not fit in memory, use sort and then group by to accomplish the same idea.

In python:

import sys

next_id = 0
str_to_id = {}
for line in sys.stdin:
    fields = line.strip().split(',')
    this_id = str_to_id.get(fields[0])
    if this_id is None:
        next_id += 1
        this_id = next_id
        str_to_id[fields[0]] = this_id
    fields[0] = str(this_id)
    print(','.join(fields))
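The same logic can be packaged as a reusable function instead of a stdin filter, which makes the transformation easy to test (the function name is illustrative):

```python
def assign_ids(lines):
    """Replace the string key in the first CSV field of each line
    with a small integer ID, assigning IDs in order of first sight."""
    next_id = 0
    str_to_id = {}
    out = []
    for line in lines:
        fields = line.strip().split(',')
        if fields[0] not in str_to_id:
            next_id += 1
            str_to_id[fields[0]] = next_id
        fields[0] = str(str_to_id[fields[0]])
        out.append(','.join(fields))
    return out
```

For example, the input lines `alice,1,5.0`, `bob,2,3.0`, `alice,3,4.0` become `1,1,5.0`, `2,2,3.0`, `1,3,4.0`, with `alice` consistently mapped to 1.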