Using Git Submodules for Private Content

My website has been open source for as long as it has existed. Originally, it was a WordPress site, but only the layout was out there for everyone to see since the data was saved in a database. Once I moved to Gatsby, I kept all the images and posts in a content directory. This was way better, as my content is all conveniently stored in one easy-to-save folder and the posts are all in beautiful markdown.

However, people often like my layout and want to use it, so they clone and deploy this site. Sometimes they will just leave up all the posts and images and update the name and image. Although I subscribe to the Zenhabits Uncopyright philosophy towards content - my content is out there for the world to see and do what they want with it, and it doesn't bother me - I don't think I should make it quite so easy to just clone everything I've written in a moment. If you're going to plagiarize, you should at least have to do a bit of work.

So I decided to store my content in a private git submodule. If you go to the repo for this site now, you'll see a folder that looks like content @ <hash>. If you click on it, you'll be taken to a 404 page. If I click on it, I'll be taken to a separate, private repo that contains all my images and posts.


A lot of people have asked me how to use private git submodules, so I'll go over it here. Note that this is not a deep-dive into submodules, but just the basics of adding, updating, and cloning a repo with submodules.

Git Submodules

Git submodules allow you to keep a git repository as a subdirectory of another git repository.

This could be useful if you have a lot of projects within a project. One example of this is the Dracula code theme repo. Every folder is a git submodule. This allows people to add a new theme for a new program by creating their own repo, and the owner of the parent repository only needs to reference the child repos. You can tell they're all submodules because of the @ <hash> after each subdirectory name.

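Before doing anything with submodules, I would recommend running this command to set submodule.recurse to true in your config, which makes commands like git pull and git checkout update submodules automatically. Note that the setting does not apply to git clone, which still needs the --recurse-submodules flag.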

git config --global submodule.recurse true

Command                     Description
git submodule add <repo>    Add a submodule within a repository
git submodule update        Update existing submodules within a repository (add --remote to pull from a remote location)
git submodule init          Initialize local submodules file (only necessary if repo not cloned with --recurse-submodules)
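
If you'd rather not change your global config, the same setting can be applied to a single repo by leaving off --global. Here's a minimal sketch that tries it in a throwaway repo (the temp directory is just for illustration):

```shell
#!/bin/sh
set -eu

# Throwaway repo so nothing touches a real project
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Without --global, the setting is written to this repo's .git/config only
git config submodule.recurse true

# Read the value back to confirm it was stored
git config --get submodule.recurse
```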

Adding a submodule

Let's imagine that you want a public blog, located on the blog repo, to contain a submodule with all the posts, located in the posts repo. So it will look like:

  • A public repo at github.com/you/blog
  • A private repo at github.com/you/posts

I'm just using GitHub as an example; it doesn't matter where the repo is hosted. Git submodules can also be used for both private and public repos.

First you can add the submodule. From the root of blog, you would run this command.

git submodule add https://github.com/you/posts

This would clone the posts repo into a folder in blog.

Cloning into '/Users/you/blog/posts'...

You will now have two new entries in the blog repo: a .gitmodules file and the new posts subdirectory.

git status
Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   .gitmodules
	new file:   posts

.gitmodules will look like this:

.gitmodules
[submodule "posts"]
	path = posts
	url = https://github.com/you/posts

At this point, you have a reference to the posts repo as a submodule, so the directory structure will look like this:

.git
.gitmodules
posts/

As a note, if you cd into .git, you'll see a modules directory. This will contain a folder called posts, and this is where git is storing references and other data about your submodules.
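
If you want to try this without touching a real project, here's a rough sketch that does the whole thing in a temp directory. The local posts repo, the Demo identity, and the sandbox path are all stand-ins I made up for the example. The protocol.file.allow override is only there because recent versions of Git refuse to add a submodule from a local path by default; you shouldn't need it for a GitHub URL.

```shell
#!/bin/sh
set -eu

# Throwaway identity so commits work on a machine with no git config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
cd "$sandbox"

# Local stand-in for the posts repo, with a single commit
git init -q posts
echo "Hello" > posts/hello.md
git -C posts add hello.md
git -C posts commit -q -m "First post"

# The parent repo
git init -q blog
cd blog

# Add posts as a submodule (the -c override is only needed for local paths)
git -c protocol.file.allow=always submodule add "$sandbox/posts" posts

# .gitmodules and posts should both be staged
git status --short
cat .gitmodules

# Git keeps the submodule's own repository data under .git/modules
ls .git/modules
```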

Updating a submodule

To update submodule content, you'll pull in any changes made to the remote submodule repo with the update command. Since you would be updating content from a remote location, you'll add the --remote flag. From the root of the blog repo, you would run the command:

git submodule update --remote

It's important to note that when working with submodules, you shouldn't edit or commit files inside the submodule directory of the parent repo. If you make changes there, your local copy will be out of sync with the submodule's own repo.

You just want to treat a submodule as an entirely separate repo, but linked. This is much like code found in node_modules for a Node project, where the references to the projects are listed in package.json and you know any local changes you make to a dependency in node_modules will not be persisted.
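
Here's a small sketch of that update flow, again using made-up local repos in a temp directory rather than GitHub. A new commit lands in the upstream posts repo, and the parent pulls it in with update --remote. The protocol.file.allow override is only needed because the demo fetches from a local path.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in for the remote posts repo
git init -q posts
echo "Hello" > posts/hello.md
git -C posts add hello.md
git -C posts commit -q -m "First post"

# Parent repo with posts already added and committed as a submodule
git init -q blog
cd blog
git -c protocol.file.allow=always submodule --quiet add "$sandbox/posts" posts
git commit -q -m "Add posts submodule"

# A new post is committed in the upstream posts repo, not in blog/posts
echo "Second" > "$sandbox/posts/second.md"
git -C "$sandbox/posts" add second.md
git -C "$sandbox/posts" commit -q -m "Second post"

# Pull the new commit into the submodule checkout
git -c protocol.file.allow=always submodule update --remote

# The parent now reports posts as modified
git status --short
```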

Modified content

If you make changes locally and run a git status, you will see modified content next to the modified submodule.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
  (commit or discard the untracked or modified content in submodules)
	modified:   posts (modified content)

New commits

If you make changes to the submodule and bring those commits in properly and run a git status, you will see new commits next to the modified submodule.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   posts (new commits)

If you see (new commits), you can commit those changes on the parent repo. You can also check this by viewing the diff.

git diff
diff --git a/posts b/posts
index abc..def 160000
--- a/posts
+++ b/posts
@@ -1 +1 @@
-Subproject commit abc...
+Subproject commit def...
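
The raw hashes in that diff aren't very readable. As a sketch, git diff --submodule=log lists the commit subjects that were pulled in instead, and then the new pointer can be staged and committed like any other change. The repos below are local stand-ins in a temp directory, and the commit messages are just examples.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in posts repo and a parent that already tracks it
git init -q posts
echo "Hello" > posts/hello.md
git -C posts add hello.md
git -C posts commit -q -m "First post"

git init -q blog
cd blog
git -c protocol.file.allow=always submodule --quiet add "$sandbox/posts" posts
git commit -q -m "Add posts submodule"

# New upstream commit, then bring it into the submodule
echo "Second" > "$sandbox/posts/second.md"
git -C "$sandbox/posts" add second.md
git -C "$sandbox/posts" commit -q -m "Second post"
git -c protocol.file.allow=always submodule --quiet update --remote

# Show the submodule change as a list of commits rather than two hashes
git --no-pager diff --submodule=log

# Record the new submodule commit in the parent repo
git add posts
git commit -q -m "Update posts submodule"

# Nothing should be left to commit
git status --short
```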

Cloning a repo with submodules

If you clone an existing repository and it has submodules within it, you'll have to init and update to pull in all the submodule content.

git clone https://github.com/you/blog
cd blog && git submodule init && git submodule update

You can bypass this by using the --recurse-submodules flag when you clone. (The submodule.recurse setting does not apply to git clone, so the flag is still needed.)

git clone --recurse-submodules https://github.com/you/blog

This will clone the repository along with all submodule content.
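
To see the difference between the two approaches, here's a sketch that clones the same parent repo both ways. Everything lives in a temp directory, and the repo names (plain, full) are just labels for the example. The protocol.file.allow override is only needed because the submodule is fetched from a local path.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in posts repo and a parent repo that tracks it as a submodule
git init -q posts
echo "Hello" > posts/hello.md
git -C posts add hello.md
git -C posts commit -q -m "First post"

git init -q blog
git -C blog -c protocol.file.allow=always submodule --quiet add "$sandbox/posts" posts
git -C blog commit -q -m "Add posts submodule"

# Plain clone: the posts directory exists but has no content yet
git clone -q blog plain
[ -e plain/posts/hello.md ] || echo "plain clone: posts/ is empty"

# init and update fill it in afterwards
cd plain
git submodule --quiet init
git -c protocol.file.allow=always submodule --quiet update
cd ..

# Recursive clone: the submodule content arrives in one step
git -c protocol.file.allow=always clone -q --recurse-submodules blog full
ls full/posts
```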

Deployment

For my site, I've made the content submodule private. If you have a Netlify site and want to know what to do to allow Netlify to pull from the private repo, here is an overview of the steps.

  • Generate a deploy key from Netlify.
  • Add the key as a read-only deploy key on the settings for your private repo (found at github.com/you/repo/settings/keys).
  • Netlify will now have permissions to fetch the submodules that it reads from your .gitmodules file.
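
One thing to check if the build can't authenticate: GitHub deploy keys are SSH keys, so a submodule URL in HTTPS form won't use them. I'm assuming here that the build service fetches over SSH, in which case the URL recorded in .gitmodules would need the SSH form. This sketch rewrites the URL in a throwaway repo; the you/posts path is the placeholder from earlier.

```shell
#!/bin/sh
set -eu

# Throwaway repo standing in for the blog repo
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Recreate the .gitmodules entry shown earlier (HTTPS form)
git config -f .gitmodules submodule.posts.path posts
git config -f .gitmodules submodule.posts.url https://github.com/you/posts

# Switch the recorded URL to the SSH form
git config -f .gitmodules submodule.posts.url git@github.com:you/posts.git

cat .gitmodules
```

In a real repo, you would follow this with git submodule sync so your local checkout picks up the new URL, then commit the updated .gitmodules.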

Summary

Here are the main points from the article:

  • Submodules are used when a subdirectory in a repo should consist of all the data from another repo.
  • You can add a submodule to a project with git submodule add <submodule-repo>.
  • You can update submodules within a project with git submodule update --remote.
  • You should clone a project that has submodules with --recurse-submodules, and set submodule.recurse in your config so commands like git pull keep submodules updated by default.
  • You should not work on any submodule files directly within the parent repo. The submodule directory should be treated only as a reference to another existing repo.

My current process for updating the site looks like:

  • Make changes to content repo.
  • Commit changes to content and push to private submodule hosted on GitHub: git commit && git push.
  • Pull new updates into local taniarascia.com repo: git submodule update --remote.
  • Commit the new submodule changes and push to public GitHub repo: git commit && git push.
  • Netlify deploys the new site.
