Can Git detect if two source files are essentially copies of each others? Can Git detect if two source files are essentially copies of each others? git git

Can Git detect if two source files are essentially copies of each others?


Why use git at all? A simple but effective technique would be to compare the sizes of the diffs between all of the different submissions, and then to manually inspect and compare those with the smallest differences.


Moss is a tool that was developed by a Stanford CS prof. I think they use it there as well. It's like diff for source code.


Adding to the other answers, you could use diff -- but I don't think the answers will be that useful by themselves. What you want is the number of lines that match, minus the number of non-blank lines, and to get that automatically you need to do a fair bit of magic with wc -l and grep to compute the sum of the lengths of the files, minus the length of the diff file, minus the number of blank lines that diff included as matching. And even then you'll miss some cases where diff decided that identical lines didn't match because of different things inserted before them.

A much better option is one of the suggestions listed in https://stackoverflow.com/questions/5294447/how-can-i-find-source-code-copying (or in https://stackoverflow.com/questions/4131900/how-to-detect-plagiarized-code, though the answers seem to duplicate).