Any quick sorting for a huge CSV file?


Calling the Unix sort program should be efficient. It works in multiple passes, spilling to temporary files and merging them, so it does not hog memory. You can fork a process with Java's Runtime, but the process's output streams are handed back to the parent rather than to a file, so you have to do some juggling to get the redirect to work right:

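import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

public static void sortInUnix(File fileIn, File sortedFile)
        throws IOException, InterruptedException {
    // Run sort through a command interpreter so that "> file" is honored.
    // Note: paths containing spaces would need quoting here.
    String[] cmd = {
            "cmd", "/c",
            // above should be changed to "sh", "-c" if on a Unix system
            "sort " + fileIn.getAbsolutePath() + " > "
                    + sortedFile.getAbsolutePath() };
    Process sortProcess = Runtime.getRuntime().exec(cmd);
    // capture error messages (if any)
    BufferedReader reader = new BufferedReader(new InputStreamReader(
            sortProcess.getErrorStream()));
    String outputS = reader.readLine();
    while (outputS != null) {
        System.err.println(outputS);
        outputS = reader.readLine();
    }
    // block until sort has finished writing the output file
    sortProcess.waitFor();
}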
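
If you would rather skip the shell and the string-built redirect, ProcessBuilder can attach the child's standard output to a file directly. This is only a minimal sketch, not a drop-in replacement: it assumes a `sort` executable is on the PATH, and the class name `UnixSort` and method name `sortWithProcessBuilder` are made up for illustration.

```java
import java.io.File;
import java.io.IOException;

final class UnixSort {
    /**
     * Sorts fileIn into sortedFile by running the external sort program.
     * Assumption: a `sort` executable is available on the PATH.
     */
    static void sortWithProcessBuilder(File fileIn, File sortedFile)
            throws IOException, InterruptedException {
        // Each argument is passed as-is, so paths with spaces need no quoting.
        ProcessBuilder pb = new ProcessBuilder("sort", fileIn.getAbsolutePath());
        // Wire the child's stdout straight to the target file (no shell needed).
        pb.redirectOutput(sortedFile);
        // Forward the child's error messages to this process's stderr.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);

        Process sortProcess = pb.start();
        int exitCode = sortProcess.waitFor();
        if (exitCode != 0) {
            throw new IOException("sort exited with status " + exitCode);
        }
    }
}
```

Because nothing is read through a pipe here, there is no stream to drain, and a non-zero exit status surfaces as an exception instead of being silently ignored.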


How does the data get into CSV format? Does it come from a relational database? If so, you can have whatever process creates the file write its entries in the right order to begin with, so you don't have to solve this problem down the line.

If you only need a simple lexicographic order, you can try the Unix sort, but I am not sure how it will perform on a file of that size.
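
To make "lexicographic" concrete: plain string comparison orders numeric fields differently from numeric comparison, which matters if the CSV key is a number. The sketch below is purely illustrative (the class `LexicographicDemo` and its methods are hypothetical names). Java's string ordering is only roughly what sort does by default, since sort's exact ordering depends on the locale.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class LexicographicDemo {
    /** Returns a copy ordered by plain string comparison. */
    static List<String> lexicographic(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    /** Returns a copy ordered by integer value, comparable to `sort -n`. */
    static List<String> numeric(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparingLong((String s) -> Long.parseLong(s)));
        return copy;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("10", "9", "100", "2");
        System.out.println(lexicographic(ids)); // [10, 100, 2, 9]
        System.out.println(numeric(ids));       // [2, 9, 10, 100]
    }
}
```

If the key column is numeric, the lexicographic result is probably not what you want, and the sort invocation would need a numeric option such as `-n`.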