Importing a git history to another repository

2026-01-01

A few weeks ago at work, a new repository was created from a subset of an existing one. However, the files were simply copied over, so the entire git history was lost.

Creating the new repository

Let’s call the new repository new-repo, and the first one src-repo.

The structure of src-repo is that of a classic webapp, with the following directories:

backend/
frontend/
images/

The images directory contains Docker files and scripts (for software and tools unrelated to backend and frontend), and it was moved to the new-repo repository for convenience.
The new repository was created roughly like this:

$ git init new-repo
$ cd new-repo
$ cp -a /path/to/src-repo/images images
$ git add .
$ git commit -m "init: copied files from src-repo/images"

Then the result was pushed to GitHub, and the new repository started living its life with new GitHub Actions. A few days ago, the images directory was deleted from src-repo as everything had been migrated.

Note: I personally much prefer having everything in a single monolithic repository (up to a certain size), as it simplifies a lot of things. But due to a split between deployment and software teams, they wanted that directory to live in its own repository, as the two had different life cycles.

A bug appeared in production

new-repo started living its life.
A few days ago, changes to some core files were made and then deployed to production.

The bug was serious, but it was only triggered by an edge case that the integration tests did not cover, which is a first big miss.
The changes refactored a part of the code that was there for historical reasons, without anyone knowing what those reasons were. As the git history was no longer available in new-repo, the explanation was not at hand. It was obvious deep in the git history of src-repo, but since the images directory had been deleted there, nobody looked at it.

I am also pretty sure that the bug would have occurred even with the git history available, as the changes looked quite harmless.

Anyway, I decided to merge back the src-repo/images history into new-repo to have everything at hand for next time.

Importing Git history

Cleaning the first repository

In a previous company, I had already used git subtree to import the history of a repository into a new directory of another repository, which is the standard and easy use case.
I wanted to do a bit of the same thing here, but it had a few challenges:

there is no sub-directory to import, we want to import the history for existing files
the new-repo started living its own life, so files cannot be overwritten like that

Anyway, the first step was to extract the history of the src-repo/images cleanly to be reimported.
After some ~~googling~~LLM search, I found a new tool called git-filter-repo that did exactly what I needed.
Install was easy: brew install git-filter-repo.

I first cloned a new instance of src-repo to be able to work inside cleanly without fear of breaking anything on my development setup.
As the relevant directory had been removed in a recent commit, I needed to go back to the commit just before the deletion. Instead of creating a new branch, I simply ran git reset --hard on that commit, as I did not care about the rest of the history.

I then did a bit of trial and error with some git filter-repo and --subdirectory-filter, but that did not suit what I wanted:

it removed the top-level directory (images here), which I wanted to keep, because it moves the files to the repository root
it kept the history of all branches, but I wanted to only keep main

I settled for a simple git filter-repo --path images --refs main which allowed me to keep the images directory of the main branch to match what was done in new-repo.
In the end, I had my src-repo with only the history of images, and all other files removed.

# before filtering
src-repo-main
A [images] --- B [backend] --- C [images] --- D [frontend]

       |
       |  "git filter-repo --path images --refs main"
       v

src-repo-main (Cleaned)
A' [images] ----------------- C' [images]
# after filtering: 
# - commits B and D have been removed
# - history is now specific to "images" with A' and C' having new commit hashes
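
A quick way to sanity-check the filtered clone is to list every path its history ever touched and look for anything outside images/. The following is only a sketch using plain git: paths_outside_prefix is a hypothetical helper name, and the throwaway repository merely stands in for src-repo. Paths with unusual characters may be quoted by git and are not handled here.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: print every path touched by the history of <ref>
# that does NOT live under <prefix>/. Prints nothing when the history is clean.
# usage: paths_outside_prefix <repo> <ref> <prefix>
paths_outside_prefix() {
    git -C "$1" log --format= --name-only "$2" \
        | sort -u \
        | grep -v -e '^$' -e "^$3/" || true
}

# Throwaway repository standing in for src-repo (assumption for the demo).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
mkdir images backend
echo 'FROM alpine' > images/Dockerfile
git add images && git commit -q -m 'A [images]'
echo 'print("hello")' > backend/app.py
git add backend && git commit -q -m 'B [backend]'

# Unfiltered history: the backend file is reported.
leftover=$(paths_outside_prefix "$repo" main images)
# History up to commit A only touches images/: nothing is reported.
clean=$(paths_outside_prefix "$repo" main~1 images)
echo "outside images/ on main:   ${leftover:-none}"
echo "outside images/ on main~1: ${clean:-none}"
```

Run against the real filtered clone, an empty result for main would suggest the filter kept only what was intended.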

Importing the history into the new repository

Now that src-repo had a “cleaned” importable state, I could import it into new-repo. However, as new-repo already had its own new history, I could not just overwrite everything, so I had to create a new “empty” branch with: git switch --orphan src-repo-main.

I could then easily import the history of src-repo by pulling it into the empty branch with: git pull /path/to/src-repo main. This imported the cleaned history into the src-repo-main branch of new-repo.
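
The two commands above can be tried safely in a scratch directory first. This sketch builds two throwaway repositories (their names mirror the article, their content is made up), runs the orphan switch and the pull, then checks that the imported branch shares no commit with main. It assumes a git version that provides git switch.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in for the filtered clone: two commits under images/.
git init -q "$work/src-repo-cleaned"
cd "$work/src-repo-cleaned"
git symbolic-ref HEAD refs/heads/main
mkdir images
echo 'FROM alpine:3.18' > images/Dockerfile
git add . && git commit -q -m "A' [images]"
echo 'RUN true' >> images/Dockerfile
git commit -q -am "C' [images]"

# Stand-in for new-repo, created by copying files (no shared history).
git init -q "$work/new-repo"
cd "$work/new-repo"
git symbolic-ref HEAD refs/heads/main
mkdir images
cp "$work/src-repo-cleaned/images/Dockerfile" images/
git add . && git commit -q -m 'init: copied files from src-repo/images'

# The two steps from the text: empty branch, then pull the cleaned history.
git switch -q --orphan src-repo-main
git pull -q "$work/src-repo-cleaned" main

imported=$(git rev-list --count src-repo-main)
# git merge-base exits non-zero when two branches have no common ancestor.
if git merge-base main src-repo-main >/dev/null; then
    related=yes
else
    related=no
fi
echo "imported commits: $imported, shares history with main: $related"
```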

Merging the two histories

At that point, there are two distinct branches in new-repo that do not share any commit between them:

main which is the original one which started to live its life
src-repo-main which was the one just imported

# imported from src-repo         while "main" started its life
branch: src-repo-main            branch: main
      |                                |
      A'                               X
      |                                |
      C'                               Y
      |                                |
      E'                               Z (HEAD)

Two options from here:

merge the two histories with a merge commit
rebase main on top of src-repo-main which would become the new default branch

Using a merge commit

Using a merge commit:

keeps the original histories separate
does not touch the commits of the original branch
leaves the two histories disjoint, however: git blame only shows the last branch, and GitHub has trouble displaying the whole history of a single file (it works much better within IDEs like IntelliJ). The full history can be displayed via the CLI with git log --full-history -- path/to/file

A' --- C' --- E' (src-repo-main)
                   \
                    \   <-- merge Commit (M)
                     \
      X --- Y --- Z -- M  (main)

Steps are simple, inside new-repo:

merge main into the new branch: git merge origin/main --allow-unrelated-histories. The files on src-repo-main are updated to the versions from main
push to origin: git push -u origin src-repo-main
create a Pull Request in GitHub to merge the new branch back into main. At that point, the diff displayed in the Pull Request must be empty, as only the history is imported and no file is changed, added or deleted
merge the Pull Request with a merge commit to keep the whole history

# init and filter old repository
$ brew install git-filter-repo
$ git clone src-repo src-repo-cleaned
$ cd src-repo-cleaned
$ git reset --hard ${COMMIT_BEFORE_DELETION}
$ git filter-repo --path images --refs main
# import the branch to the new-repo
$ cd /path/to/new-repo
$ git switch --orphan src-repo-main
$ git pull /path/to/src-repo-cleaned main
# import using merge strategy
$ git merge origin/main --allow-unrelated-histories
$ git push -u origin src-repo-main
# create a Pull Request to target "main"
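
To see the merge strategy end to end without touching a real repository, here is a small sketch on a throwaway repository with made-up commits. It deliberately uses a shared file with identical content on both sides so the merge is conflict-free; in a real import, a file whose content differs between the two histories would surface as an add/add conflict to resolve first. It then checks the two claims from the text: the diff against main is empty, and git log --full-history shows commits from both histories.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/new-repo"
cd "$work/new-repo"
git symbolic-ref HEAD refs/heads/main

# "main": the history that started with a plain copy of the files.
mkdir images
echo 'FROM alpine:3.18' > images/Dockerfile
git add . && git commit -q -m 'X: init, copied files'
echo 'echo build' > images/build.sh
git add . && git commit -q -m 'Y: add build script'

# "src-repo-main": unrelated history containing the same Dockerfile.
git switch -q --orphan src-repo-main
mkdir -p images
echo 'FROM alpine:3.18' > images/Dockerfile
git add . && git commit -q -m "A': original Dockerfile"

# Merge main into the imported branch.
git merge -q --allow-unrelated-histories -m 'Merge main into src-repo-main' main

# The Pull Request diff (src-repo-main vs main) should be empty.
if git diff --quiet main src-repo-main; then pr_diff=empty; else pr_diff=non-empty; fi
# The file history now spans both sides of the merge.
history=$(git log --full-history --format=%s -- images/Dockerfile)
echo "PR diff: $pr_diff"
echo "$history"
```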

Rebasing main on top of the new branch

Using a rebase strategy:

keeps the history linear
rewrites every commit of main, so the result is incompatible with the existing main branch
as it creates new commits, the commit hashes referenced by previous Pull Requests are also lost
all currently open Pull Requests need to be redone from the start and target the new branch, as the commits are all new

# before with 2 histories in different branches
      A'--- C'--- E' (src-repo-main)
      X --- Y --- Z  (main)

# after "git rebase src-repo-main":
#     ("src-repo-main")   (rebased "main" commits)
      A' --- C' --- E' --- X' --- Y' --- Z'  (main-rebased)

Steps are also quite simple:

create a new branch from main to be safe: git switch -c main-rebased origin/main
rebase the changes from main on top of the new branch, only keeping changes from main: git rebase src-repo-main -X theirs. This will remove all merge commits and only keep commits from the original main

Note: using git rebase, the meanings of ours and theirs are swapped compared to a merge. Here, -X theirs tells Git to use the commits being replayed (the ones from main) over the upstream history if a conflict occurs.

With the above steps, a new history in main-rebased is available with the history from src-repo-main followed by the one from main.
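
The behaviour of -X theirs during a rebase is easy to get backwards, so here is a sketch that exercises it on a throwaway repository. The commits and file contents are invented for the demo: the imported history has an older Dockerfile than main, which forces a conflict when the first commit of main is replayed. With -X theirs, the version from main should win, and the rebased branch should end up with the same content as main.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/new-repo"
cd "$work/new-repo"
git symbolic-ref HEAD refs/heads/main

# "main": started from a copy of the files, then evolved.
mkdir images
echo 'FROM alpine:3.19' > images/Dockerfile
git add . && git commit -q -m 'X: init, copied files'
echo 'echo build' > images/build.sh
git add . && git commit -q -m 'Y: add build script'

# "src-repo-main": imported history with an older version of the same file.
git switch -q --orphan src-repo-main
mkdir -p images
echo 'FROM alpine:3.18' > images/Dockerfile
git add . && git commit -q -m "A': original Dockerfile"

# Replay the commits of main on top of the imported history.
# During a rebase, "theirs" refers to the commits being replayed (from main).
git switch -q -c main-rebased main
git rebase -q -X theirs src-repo-main

total=$(git rev-list --count main-rebased)   # A' + X' + Y'
if git diff --quiet main main-rebased; then content=identical; else content=different; fi
echo "commits on main-rebased: $total, content vs main: $content"
```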

From here, there are two choices:

the main-rebased branch becomes the new default one, but that breaks some workflows and automation
changes from main-rebased are force-pushed to main to overwrite it. Developers then need to reset their local main to the new one

# init and filter old repository
$ brew install git-filter-repo
$ git clone src-repo src-repo-cleaned
$ cd src-repo-cleaned
$ git reset --hard ${COMMIT_BEFORE_DELETION}
$ git filter-repo --path images --refs main
# import the branch to the new-repo
$ cd /path/to/new-repo
$ git switch --orphan src-repo-main
$ git pull /path/to/src-repo-cleaned main
# rebase main on top of src-repo-main
$ git switch -c main-rebased origin/main
$ git rebase src-repo-main -X theirs
# overwrite "main" on origin
$ git checkout main
$ git reset --hard main-rebased
$ git push --force

Squashing after rebasing

In the case of this repository, there have only been a few Pull Requests since it has been created. However the history was still kind of a mess as most of the Pull Requests were merged with merge commits.

I used the opportunity to clean up the history by rebasing and squashing commits that belonged to the same Pull Request. In the process, I renamed the first commit of each Pull Request to the same format as a squash merge: <Title> (#PR), so that it is then tracked correctly.
The operation is then:

$ git rebase -i <COMMIT_BEFORE_REBASE>
# reword first commit
# fixup following commits until the next ones

I looked at each Pull Request to determine the commits to squash together.
This now gives a linear and clean history for new commits on the new branch.

I also tried git rebase -i --rebase-merges, which keeps track of all the merge commits and changes on the branch, but it was too complex for my use case, and just having the commits themselves was enough.

The changes were validated by diffing the two branches: git diff main main-rebased to ensure their content was the same.
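
An equivalent check that is convenient in scripts is to compare the tree hashes of the two branch tips: if they match, the content is identical even though the commits differ. The sketch below illustrates this on a throwaway repository with made-up content, where main-rebased is simulated by a single squashed commit.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main

# "main": two commits.
echo one > file.txt
git add . && git commit -q -m 'first'
echo two >> file.txt
git commit -q -am 'second'

# "main-rebased": same final content, different history (one squashed commit).
git switch -q --orphan main-rebased
printf 'one\ntwo\n' > file.txt
git add . && git commit -q -m 'squashed'

# A commit's tree hash only depends on the files, not on the history.
main_tree=$(git rev-parse 'main^{tree}')
rebased_tree=$(git rev-parse 'main-rebased^{tree}')
if [ "$main_tree" = "$rebased_tree" ]; then
    echo "same content"
else
    echo "content differs"
fi
```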

Update process and communication

Both strategies described above have their pros and cons. After discussion with the team, we decided to go with the rebase + squash strategy, as a single unified history is the best option in the long term.

To migrate the branch, we followed a few simple steps:

force only squash merges on the repository
tell the team not to merge new Pull Requests during a given time slot
deactivate protected branches if any
execute the git commands:

$ git switch main # original branch
$ git pull --rebase # ensure it is up to date
$ git reset --hard origin/main # ensure it is at the same version as origin
$ git switch -c main-old # create the "backup" branch
$ git push -u origin main-old # push it (a new branch has no upstream yet)
$ git switch main-rebased # switch to the rebased branch
$ git reset --hard origin/main-rebased # ensure it is up to date
$ git diff main main-rebased # make sure there is no difference
$ git push --force origin main-rebased:main # overwrite main with main-rebased

check in GitHub that everything is good
notify the users to update their repository:

$ git fetch --all
$ git reset --hard origin/main

they can now work normally again
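
Since git reset --hard discards local commits, a developer may want to check for unpushed work on their old main before running it. This is only a sketch of one possible check, not part of the process we followed: it counts the commits on the local main that are not in origin/main-old. The bare repository and clone below are throwaway stand-ins created just for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Throwaway "origin" and a developer clone.
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/dev" 2>/dev/null
cd "$work/dev"
git symbolic-ref HEAD refs/heads/main
echo v1 > file.txt
git add . && git commit -q -m 'old history'
git push -q origin main main:main-old       # main plus its backup branch

# Simulate the migration: main on origin is overwritten by a new history.
git switch -q --orphan rewritten
echo v1 > file.txt
git add . && git commit -q -m 'new history'
git push -q --force origin rewritten:main

# The developer still has the old main locally, with one unpushed commit.
git switch -q main
echo wip >> file.txt
git commit -q -am 'local work'

# Check before resetting: commits on local main that main-old does not have.
git fetch -q origin
unpushed=$(git rev-list --count origin/main-old..main)
echo "unpushed commits on local main: $unpushed"
# If this is not 0, keep them first, for example: git branch my-backup main
```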

Existing Pull Requests

When doing the migration, the original main branch was kept and renamed to main-old. As there were a few open Pull Requests, using a rebase + squash strategy messes up their diff in GitHub.
So we:

changed the target of the existing Pull Requests from main to main-old
closed them with a message to recreate them by cherry-picking the new commits on the new branch

That way developers could still see their diff with the original main branch and ensure that the diff in their new Pull Request is identical.
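
Recreating a Pull Request by cherry-picking could look like the sketch below. The branch names feature and feature-new are hypothetical, and the repository is a throwaway one in which main is simulated as a rewritten history with the same content as main-old. The final check compares tree hashes to confirm that the recreated branch carries exactly the same content as the original one.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main-old

# Old main, plus a feature branch with two commits (an open Pull Request).
echo base > file.txt
git add . && git commit -q -m 'base'
git switch -q -c feature
echo change-1 >> file.txt
git commit -q -am 'feature: first change'
echo change-2 >> file.txt
git commit -q -am 'feature: second change'

# New main: different history, same content as main-old.
git switch -q --orphan main
echo base > file.txt
git add . && git commit -q -m 'rewritten history'

# Recreate the Pull Request branch on top of the new main.
git switch -q -c feature-new main
git cherry-pick main-old..feature >/dev/null

old_tree=$(git rev-parse 'feature^{tree}')
new_tree=$(git rev-parse 'feature-new^{tree}')
picked=$(git rev-list --count main..feature-new)
echo "cherry-picked commits: $picked"
```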

Conclusion

The migration went very smoothly, as there were not a lot of risks. No problems were encountered, and users could re-create their Pull Requests easily.

For next time:

the history from src-repo should have been imported at the beginning, when creating new-repo
a test to ensure that such a regression is detected before it reaches production is of course needed!