git filter-branch led to a disconnected history: how to get rid of the old commits?


I managed to solve my problem by changing the way I used cvs2git: instead of converting the whole CVS base and then using the subdirectory-filter option of git filter-branch, I converted each of the submodules I wanted separately. In my case, this meant running 18 different cvs2git commands:

Before

    cvs2git --blobfile=blob --dump=dump /path/to/cvs/base
    # Module 1
    git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter "path/to/module1" -- --all
    # Module 2
    git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter "path/to/module2" -- --all

Now

    # Module 1
    cvs2git --blobfile=blob_module1 --dump=dump_module1 /path/to/cvs/base/path/to/module1
    # Module 2
    cvs2git --blobfile=blob_module2 --dump=dump_module2 /path/to/cvs/base/path/to/module2
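
With 18 modules, the per-module commands can be generated in a loop. The following is only a sketch: `CVS_BASE`, `MODULES`, `DRY_RUN` and `convert_cmd` are names I made up for illustration, and only the `--blobfile` and `--dump` options shown above are used. By default it just prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: one cvs2git run per module instead of one big conversion.
# CVS_BASE and MODULES are placeholders (assumptions); adjust to your layout.
# Module paths containing spaces are not handled by this simple loop.
CVS_BASE="/path/to/cvs/base"
MODULES="path/to/module1 path/to/module2"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Build the cvs2git command line for one module.
# $1 = module path relative to the CVS base
convert_cmd() {
    name=$(basename "$1")
    echo "cvs2git --blobfile=blob_$name --dump=dump_$name $CVS_BASE/$1"
}

for module in $MODULES; do
    cmd=$(convert_cmd "$module")
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        sh -c "$cmd"
    fi
done
```

Running it with `DRY_RUN=0` executes each command in turn; each module then gets its own blob and dump files, named after the last component of the module path.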

Each repository now has a perfect history.

Why didn't the previous method work? My guess is that cvs2git was confused by all the submodules (some of them had their directory renamed during their history).

@Michael @CharlesB Thank you for taking your time to answer and help me.


I bet you are getting hit with this:

  • Differences between CVS and git branch/tag models: CVS allows a branch or tag to be created from arbitrary combinations of source revisions from multiple source branches. It even allows file revisions that were never contemporaneous to be added to a single branch/tag. Git, on the other hand, only allows the full source tree, as it existed at some instant in the history, to be branched or tagged as a unit. Moreover, the ancestry of a git revision makes implications about the contents of that revision. This difference means that it is fundamentally impossible to represent an arbitrary CVS history in a git repository 100% faithfully. cvs2git uses the following workarounds:

    • cvs2git tries to create a branch from a single source, but if it can't figure out how to, it creates the branch using a "merge" from multiple source branches. In pathological situations, the number of merge sources for a branch can be arbitrarily large. The resulting history implies that whenever any file was added to a branch, the entire source branch was merged into the destination branch, which is clearly incorrect. (The alternative, to omit the merge, would discard the information that some content was moved from one branch to the other.)

    • If cvs2git cannot determine that a CVS tag can be created from a single revision, then it creates a tag fixup branch named TAG.FIXUP, then tags this branch. (This is a necessary workaround for the fact that git only allows existing revisions to be tagged.) The TAG.FIXUP branch is created as a merge between all of the branches that contain file revisions included in the tag, which involves the same tradeoff described above for branches. The TAG.FIXUP branch is cleared at the end of the conversion, but (due to a technical limitation of the git fast-import file format) not deleted. There are some situations when a tag could be created from a single revision, but cvs2git does not realize it and creates a superfluous tag fixup branch. It is possible to delete superfluous tag fixup branches after the conversion by running the contrib/git-move-refs.py script within the resulting git repository.

  • There are no checks that CVS branch and tag names are legal git names. There are probably other git constraints that should also be checked.

(Quoted from the cvs2git documentation.)
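
To check whether a converted repository still carries a leftover TAG.FIXUP branch, something like the following could be used. This is a minimal sketch run against a throwaway repository (the `repo` and `fixup_left` names are mine), and it simply deletes the branch with plain git commands; it is not a replacement for the contrib/git-move-refs.py script mentioned above, which deals with the tags themselves.

```shell
#!/bin/sh
set -e
# Throwaway repository standing in for a cvs2git conversion result (assumption).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "initial"
git branch TAG.FIXUP          # simulate the leftover branch

# Delete the branch only if it actually exists.
if git show-ref --verify --quiet refs/heads/TAG.FIXUP; then
    git branch -q -D TAG.FIXUP
fi

# Verify that the ref is gone.
if git show-ref --verify --quiet refs/heads/TAG.FIXUP; then
    fixup_left=yes
else
    fixup_left=no
fi
echo "TAG.FIXUP still present: $fixup_left"
```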

Are you showing the refs directory of the new dirs or of the large repo after conversion? You could delete the tags in your single large export repo before you filter and split the large repo.

You can delete a tag in the large repo by just deleting its file in the refs/tags directory - a tag ref is just a reference to a SHA.
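
If you would rather not touch the files by hand, `git tag -d` does the same job and also covers tags that have been packed into .git/packed-refs. A small sketch on a throwaway repository (the tag names and the `repo` and `remaining` variables are made up for the demo):

```shell
#!/bin/sh
set -e
# Throwaway repository so the demo does not touch real data.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "initial"
git tag old-tag-1
git tag old-tag-2

# Each tag ref maps a name to a SHA.
git show-ref --tags

# Delete every tag through git instead of removing files.
for t in $(git tag -l); do
    git tag -d "$t"
done

remaining=$(git tag -l | wc -l | tr -d ' ')
echo "tags left: $remaining"
```

In the real case you would run only the deletion loop, inside the large export repo, before filtering and splitting it.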