If I have a constructor that requires a path to a file, how can I "fake" that if it is packaged into a jar?
Dump your data to a temp file, and pass the temp file's path to the constructor.
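// createTempFile requires a prefix of at least three characters
File tmpFile = File.createTempFile("XX-", ".dat");
tmpFile.deleteOnExit();
InputStream is = MyClass.class.getResourceAsStream("/path/in/jar/XX.dat");
OutputStream os = new FileOutputStream(tmpFile);
// read from is, write to os
byte[] buf = new byte[8192];
int n;
while ((n = is.read(buf)) != -1) os.write(buf, 0, n);
// close both streams
os.close();
is.close();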
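For what it's worth, the same idea can be wrapped in a small helper. This is only a sketch: the class name ResourceExtractor and its method names are made up for illustration, and it assumes Java 7 or later so that java.nio.file.Files is available.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of any library.
public class ResourceExtractor {

    // Copies the stream into a fresh temp file and returns the file's path.
    public static Path copyToTempFile(InputStream in, String prefix, String suffix)
            throws IOException {
        Path tmp = Files.createTempFile(prefix, suffix);
        tmp.toFile().deleteOnExit();
        // try-with-resources closes the input stream even if the copy fails
        try (InputStream src = in) {
            // the temp file already exists, so it has to be replaced
            Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }

    // Looks the resource up on the classpath (e.g. inside the jar) first.
    public static Path extractResource(Class<?> owner, String resourcePath,
                                       String prefix, String suffix)
            throws IOException {
        InputStream in = owner.getResourceAsStream(resourcePath);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + resourcePath);
        }
        return copyToTempFile(in, prefix, suffix);
    }
}
```

You would then hand the result of extractResource(MyClass.class, "/path/in/jar/XX.dat", "XX-", ".dat").toString() to the constructor that wants a file path.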
One recommended way is to use the Distributed Cache rather than trying to bundle it into a jar.
Zip GeoIP.dat and copy it to hdfs://host:port/path/GeoIP.dat.zip, then add these options to the Pig command:
pig ... -Dmapred.cache.archives=hdfs://host:port/path/GeoIP.dat.zip#GeoIP.dat -Dmapred.create.symlink=yes ...
After that,
LookupService lookupService = new LookupService("./GeoIP.dat");
should work in your UDF, since the file will be present locally to the tasks on each node.
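If the cache options are mistyped, the symlink will simply be missing and the error you get may not be very descriptive. A small guard like the one below can make that failure easier to spot. It is just a sketch: LocalFileCheck and requireReadableFile are hypothetical names, not part of Pig, Hadoop, or the GeoIP library.

```java
import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical guard, not part of any library.
public class LocalFileCheck {

    // Returns the path unchanged if it points to a readable regular file,
    // otherwise throws with the absolute location that was checked.
    public static String requireReadableFile(String path) throws FileNotFoundException {
        File f = new File(path);
        if (!f.isFile() || !f.canRead()) {
            throw new FileNotFoundException(
                "Expected a readable local file at " + f.getAbsolutePath()
                + " (was it added to the distributed cache with a symlink?)");
        }
        return path;
    }
}
```

In the UDF you could then write new LookupService(LocalFileCheck.requireReadableFile("./GeoIP.dat")).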