Create a submodule repository from a folder and keep its git commit history
Detailed Solution
See the note at the end of this answer (last paragraph) for a quick alternative to git submodules using npm ;)
This answer shows how to extract a folder from a repository, turn it into its own git repository, and then include it back as a submodule in place of the folder.
Inspired by Greg Bayer's article Moving Files from one Git Repository to Another, Preserving History
At the beginning, we have something like this:
<git repository A>
    someFolders
    someFiles
    someLib <-- we want this to be a new repo and a git submodule!
    some files
In the steps below, I will refer to someLib as <directory 1>.
At the end, we will have something like this:
<git repository A>
    someFolders
    someFiles
    @submodule --> <git repository B>

<git repository B>
    someFolders
    someFiles
Create a new git repository from a folder in another repository
Step 1
Get a fresh copy of the repository to split.
git clone <git repository A url>
cd <git repository A directory>
Step 2
The current folder will be the new repository, so remove the current remote.
git remote rm origin
Step 3
Extract history of the desired folder and commit it
git filter-branch --subdirectory-filter <directory 1> -- --all
You should now have a git repository with the files from <directory 1> in your repo's root, with all related commit history.
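To see what Step 3 does before running it on a real project, here is a minimal sketch on a throwaway repository. Everything in it (the temp directory, the demo repo, lib.txt, app.txt, the commit messages) is made up for illustration; only the filter-branch command itself comes from the step above. FILTER_BRANCH_SQUELCH_WARNING only silences the warning that recent git versions print, and older versions ignore it.

```shell
#!/bin/sh
set -e

# Throwaway sandbox: all names below are hypothetical.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.email "demo@example.com"
git config user.name "Demo"

# Three commits: two touch someLib, one does not.
mkdir someLib
echo "lib v1" > someLib/lib.txt
echo "app v1" > app.txt
git add .
git commit -q -m "initial commit"
echo "lib v2" > someLib/lib.txt
git commit -q -am "update lib"
echo "app v2" > app.txt
git commit -q -am "update app only"

# Same command as Step 3, with someLib as <directory 1>.
FILTER_BRANCH_SQUELCH_WARNING=1 \
  git filter-branch --subdirectory-filter someLib -- --all >/dev/null 2>&1

# The content of someLib is now the repository root,
# and only the commits that touched it remain.
files=$(git ls-tree --name-only HEAD)
commit_count=$(git rev-list --count HEAD)
echo "files at root: $files"
echo "commits kept:  $commit_count"
```

In this sandbox the "update app only" commit disappears from the new history because it never touched someLib.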
Step 4
Create your online repository and push your new repository!
git remote add origin <git repository B url>
git push
You may need to set the upstream branch for your first push:
git push --set-upstream origin master
Clean <git repository A> (optional, see comments)
We want to delete all traces (files and commit history) of <git repository B> from <git repository A>, so that the history for this folder exists in only one place.
This is based on Removing sensitive data from github.
Go to a new folder and run:
git clone <git repository A url>
cd <git repository A directory>
git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch <directory 1> -r' --prune-empty --tag-name-filter cat -- --all
Replace <directory 1> with the folder you want to remove. -r will do it recursively inside the specified directory :). Now push to origin/master with --force:
git push origin master --force
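Since this step rewrites history and ends with a force push, it can be worth trying it on a scratch repository first. The following is a rough sketch under the same assumptions as before: the temp directory, file names and commit messages are hypothetical, and someLib stands in for <directory 1>. Note that filter-branch keeps backups of the old refs under refs/original, so the checks below look at HEAD only.

```shell
#!/bin/sh
set -e

# Throwaway sandbox: all names below are hypothetical.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir someLib
echo "lib v1" > someLib/lib.txt
echo "app v1" > app.txt
git add .
git commit -q -m "initial commit"
echo "lib v2" > someLib/lib.txt
git commit -q -am "update lib only"
echo "app v2" > app.txt
git commit -q -am "update app"

# Same command as above, with someLib as the folder to remove.
FILTER_BRANCH_SQUELCH_WARNING=1 \
  git filter-branch --force \
    --index-filter 'git rm --cached --ignore-unmatch -r someLib' \
    --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# someLib is gone from every commit reachable from HEAD, and the commit
# that only touched someLib was dropped by --prune-empty.
remaining_files=$(git ls-tree -r --name-only HEAD)
commits_total=$(git rev-list --count HEAD)
commits_touching_lib=$(git rev-list --count HEAD -- someLib)
echo "files in HEAD:            $remaining_files"
echo "commits left:             $commits_total"
echo "commits touching someLib: $commits_touching_lib"
```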
Boss Stage (See Note below)
Create a submodule from <git repository B> into <git repository A>
git submodule add <git repository B url>
git submodule update
git commit
Verify that everything worked as expected and push
git push origin master
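If you want to check what the submodule step leaves behind without touching your real repositories, here is a small local sketch. The two scratch repositories (repoA, repoB), the someLib path and the file names are assumptions for the demo. Recent git versions refuse local file paths as submodule URLs by default, so the sketch passes -c protocol.file.allow=always; that is only needed because the demo uses a local path instead of a real remote URL.

```shell
#!/bin/sh
set -e

# Throwaway sandbox: repoA, repoB and someLib are hypothetical names.
work=$(mktemp -d)
cd "$work"

# Stand-in for <git repository B>
git init -q repoB
cd repoB
git config user.email "demo@example.com"
git config user.name "Demo"
echo "lib" > lib.txt
git add lib.txt
git commit -q -m "lib commit"
cd ..

# Stand-in for <git repository A>
git init -q repoA
cd repoA
git config user.email "demo@example.com"
git config user.name "Demo"
echo "app" > app.txt
git add app.txt
git commit -q -m "app commit"

# Add B as a submodule at someLib, then commit the result.
git -c protocol.file.allow=always \
  submodule add "$work/repoB" someLib >/dev/null 2>&1
git commit -q -m "Add someLib as a submodule"

# .gitmodules records the path and URL; the tree stores a commit pointer.
cat .gitmodules
entry=$(git ls-tree HEAD someLib)
echo "$entry"
```

The someLib entry in the tree has mode 160000, which is how git records a submodule: a pointer to one commit of <git repository B> rather than a copy of its files.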
Note
After doing all of this, I realized that in my case it was more appropriate to use npm to manage my own dependencies instead. We can specify git urls and versions; see package.json git urls as dependencies.
If you do it this way, the repository you want to use as a requirement must be an npm module, so it must contain a package.json file or you'll get this error: Error: ENOENT, open 'tmp.tgz-unpack/package.json'.
tldr (alternative solution)
You may find it easier to use npm and manage dependencies with git urls:
- Move the folder to a new repository
- Run npm init inside both repositories
- Run npm install --save git://github.com/user/project.git#commit-ish where you want your dependencies installed
The solution by @GabLeRoux squashes the branches, and the related commits.
A simple way to clone and keep all those extra branches and commits:
1 - Make sure you have this git alias
git config --global alias.clone-branches '! git branch -a | sed -n "/\/HEAD /d; /\/master$/d; /remotes/p;" | xargs -L1 git checkout -t'
2 - Clone the remote, pull all branches, change the remote, filter your directory, push
git clone git@github.com:user/existing-repo.git new-repo
cd new-repo
git clone-branches
git remote rm origin
git remote add origin git@github.com:user/new-repo.git
git remote -v
git filter-branch --subdirectory-filter my_directory/ -- --all
git push --all
git push --tags
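To see which branches the clone-branches alias from step 1 picks up, you can run its sed filter on its own. The sketch below feeds it a made-up sample of git branch -a output (the branch names are hypothetical), so it does not need a repository at all.

```shell
#!/bin/sh
# Hypothetical sample of what `git branch -a` prints right after a clone.
sample_output='* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/develop
  remotes/origin/feature-x
  remotes/origin/master'

# Same sed program as in the alias: drop the HEAD pointer and the
# remote master, then print what is left under remotes/.
branches=$(printf '%s\n' "$sample_output" \
  | sed -n "/\/HEAD /d; /\/master$/d; /remotes/p;")

printf '%s\n' "$branches"
```

Each surviving line is then handed to git checkout -t by xargs -L1, which creates a local tracking branch for it. The remote master is filtered out because the clone already has a local master.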
GabLeRoux's solution works well except if you use git lfs and have large files under the directory you want to detach. In that case, after step 3 all the large files will remain pointer files instead of real files. I guess it's probably due to the .gitattributes file being removed in the filter-branch process.
Realizing this, I found the following solution works for me:
cp .gitattributes .git/info/attributes
This copies .gitattributes, which git lfs uses to track large files, into the .git/ directory so that it is not deleted.
When filter-branch is done, don't forget to put back the .gitattributes if you still want to use git lfs for the new repository:
mv .git/info/attributes .gitattributes
git add .gitattributes
git commit -m 'added back .gitattributes'
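As a quick sanity check of why the copy helps: git also reads attributes from .git/info/attributes, so the lfs filter stays assigned even when .gitattributes is missing from the working tree. The sketch below shows this with git check-attr on a scratch repository. The *.bin pattern and the big.bin file name are assumptions for the demo, and git lfs does not need to be installed because only the attribute lookup is inspected.

```shell
#!/bin/sh
set -e

# Throwaway sandbox: the repo, pattern and file name are hypothetical.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

# A typical line that git lfs writes into .gitattributes.
echo '*.bin filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# Keep a copy where filter-branch will not touch it.
mkdir -p .git/info
cp .gitattributes .git/info/attributes

# Simulate .gitattributes disappearing from the working tree.
rm .gitattributes

# The attribute is still resolved from .git/info/attributes.
attr=$(git check-attr filter -- big.bin)
echo "$attr"   # big.bin: filter: lfs
```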