
Compress EACH LINE of a file individually and independently of one another? (or preserve newlines)


Are you sure you're running out of memory (RAM) with your sort?

My experience debugging sort problems leads me to believe that you have probably run out of disk space for sort to create its temporary files. Also recall that the disk space used for sorting is usually in /tmp or /var/tmp.

So check out your available disk space with:

df -g 

(some systems don't support -g; try -m for megabytes or -k for kilobytes)
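If your df has neither, a human-readable alternative available on GNU and most BSD systems is -h, and you can point it straight at the usual temp locations:

 df -h /tmp /var/tmp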

If you have an undersized /tmp partition, do you have another partition with 10-20 GB free? If yes, then tell your sort to use that directory with:

 sort -T /alt/dir

Note that for sort version

sort (GNU coreutils) 5.97

The help says

 -T, --temporary-directory=DIR  use DIR for temporaries, not $TMPDIR or /tmp;
                                multiple options specify multiple directories

I'm not sure if this means you can combine a bunch of -T /dir1 -T /dir2 ... options to get to your 10GB*sortFactor of space or not. My experience was that it only used the last dir in the list, so try to use one dir that is big enough.
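For example, here is a minimal sketch of pointing sort at one big scratch directory; /bigdisk/sorttmp is just a placeholder for whatever partition has the room:

 mkdir -p /bigdisk/sorttmp                          # placeholder scratch dir on a roomy partition
 sort -T /bigdisk/sorttmp big_file > big_file.sorted
 # equivalently, since sort falls back to $TMPDIR when -T is not given:
 TMPDIR=/bigdisk/sorttmp sort big_file > big_file.sorted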

Also, note that you can go to whatever dir you are using for sort, and you'll see the activity of the temporary files used for sorting.
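For example, while the sort is running (again using the placeholder directory from above):

 ls -lh /bigdisk/sorttmp    # sort's temporary files appear (and grow) here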

I hope this helps.

As you appear to be a new user here on S.O., allow me to welcome you and remind you of four things we do:

1) Read the FAQs

2) Please accept the answer that best solves your problem, if any, by pressing the checkmark sign. This gives the respondent with the best answer 15 points of reputation. It is not subtracted (as some people seem to think) from your reputation points ;-)

3) When you see good Q&A, vote them up by using the gray triangles, as the credibility of the system is based on the reputation that users gain by sharing their knowledge.

4) As you receive help, try to give it too, answering questions in your area of expertise.


There are some possible solutions:

1 - Use any text-processing language (perl, awk) to extract each line, save the line number and a hash for that line, and then compare the hashes (see the sketch after this list).

2 - Can / want to remove the duplicate lines, leaving just one occurrence per file? You could use a command like:

 awk '!x[$0]++' oldfile > newfile

(the awk array x counts how many times each line has been seen; only the first occurrence of a line, where the count is still zero, gets printed)

3 - Why not split the file, but with some criteria? Supposing all your lines begin with letters:

 - break your original_file into smaller files, one per starting letter: grep "^a" original_file > a_file
 - sort each small file: a_file, b_file, and so on
 - verify the duplicates, count them, do whatever you want.
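A minimal sketch of option 1, assuming perl with the core Digest::MD5 module is available (the file names are placeholders):

 # print "<line number><TAB><md5 of the line>" for every line of the file
 perl -MDigest::MD5=md5_hex -ne 'print "$.\t", md5_hex($_), "\n"' original_file > hashes.txt
 # sort on the hash column, then report adjacent entries whose hashes match
 sort -k2,2 hashes.txt | awk 'prev == $2 { print "lines " prevnr " and " $1 " look identical" } { prev = $2; prevnr = $1 }'

Hash collisions are theoretically possible, so treat a matching pair as a candidate and compare the actual lines before deleting anything.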