What

What this is about

Git-Annex is an interesting extension for Git that I’m testing to synchronize files between my Android devices and computers. It can be run in Termux on an Android phone without rooting the phone.

When

Synchronizing files with mobile devices

Copying photos, screenshots, notes and downloads off the phone and onto the computer is a recurring task, both to free up space and to synchronize data to and from my phone.

Most people synchronize their phones with Google Drive, Apple iCloud or Dropbox. That’s a lot more convenient, but it also means giving up a lot of privacy.

Background

Git-Annex

Git-Annex is an extension for Git, a versioning system for source code well-known among software developers.

With Git, source code lives in “repositories”. Each developer can “clone” or “checkout” their own working copy. Files are staged with “add” and then “commit”ed to the local repository, before they are “push”ed to the remote repository on a server.

This allows multiple developers to work on the same code simultaneously without destroying each other’s work. Git keeps a history of all changes and can automatically resolve most of the issues that arise when bringing the working copies together.
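As a rough illustration of that cycle, here is a minimal sketch that uses a local bare repository as a stand-in for the server. The temporary paths, the file name hello.py and the identity values are placeholders chosen for this example only.

```shell
# Sketch of the clone / add / commit / push cycle.
# A bare repository in a temporary directory plays the role of the server.
demo=$(mktemp -d)
git init --quiet --bare "$demo/server.git"           # the "remote repository"
git clone --quiet "$demo/server.git" "$demo/work"    # a developer's working copy

git -C "$demo/work" config user.email "dev@example.com"   # placeholder identity
git -C "$demo/work" config user.name "Example Dev"

echo 'print("hello")' > "$demo/work/hello.py"
git -C "$demo/work" add hello.py                     # stage the new file
git -C "$demo/work" commit --quiet -m "add hello.py" # record it locally
git -C "$demo/work" push --quiet origin HEAD         # publish it to the server

# The commit is now visible in the server repository.
git --git-dir="$demo/server.git" log --all --oneline
```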

As git is designed for source code and is built to analyse line-by-line differences in text files, it’s inefficient on large binary files. Each change requires storing a new copy of the entire binary file, and the size of working copies grows because every file is always downloaded when cloning. Checking for differences in binary files is also slow and makes little sense.

Large binary files can still be very useful, for instance as input files for software testing, and so extensions to Git have been built to enable storing large files.

Git-Annex is one such extension for Git that I am testing for synchronizing files between my Android devices and computer.

The benefit is a cloud-less, decentralized solution to the problem of synchronizing files. The drawback lies in its complexity compared to other solutions.

How

Setting up Git-Annex on Android

Setting up Git-Annex with Android devices is a bit involved. The following describes how I’ve configured it on my Android devices.

Note: there is also a very helpful walk-through guide I’ve linked below [1].

For each Android device

The first step is to install Termux. Termux is a Linux environment for Android that is installed like a regular Android app. The Google Play Store versions are currently outdated and use incompatible libssl builds that break curl and wget, so it’s best to install from F-Droid.

On the Android device, in Termux we first install and configure an SSH-Server:

pkg install openssh # Install SSH Server
passwd # set a password
ifconfig # check the device ip address
sshd # run the ssh server

Now we can log in from a computer:

ssh-copy-id -p 8022 root@<device-ip> # configure keyless login
ssh root@<device-ip> -p 8022 # login without password prompt

pkg update # update package references
pkg install git wget proot # install git and git-annex dependencies

uname -a # check if it's a 64-bit (aarch64) or 32-bit device

The output of the last command includes the machine architecture: aarch64 on a device with a 64-bit processor, or a 32-bit ARM name (for example armv7l) otherwise. There are different prebuilt versions of git-annex depending on the type of processor.
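The choice between the two prebuilt archives can also be scripted. The following is a minimal sketch, assuming that `uname -m` reports aarch64 on 64-bit devices and a name starting with "arm" on 32-bit ones. The function name annex_tarball_for is made up for this example; the archive names are the ones used in the download steps of this guide.

```shell
# Map a machine name (as printed by `uname -m`) to a git-annex archive.
# Assumption: 32-bit ARM devices report a name starting with "arm".
annex_tarball_for() {
    case "$1" in
        aarch64|arm64) echo "git-annex-standalone-arm64.tar.gz" ;;
        arm*)          echo "git-annex-standalone-armel.tar.gz" ;;
        *)             echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Print the download URL for the current device, if it is an ARM device.
if tarball=$(annex_tarball_for "$(uname -m)"); then
    echo "https://downloads.kitenet.net/git-annex/linux/current/$tarball"
fi
```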

Get the Git-Annex prebuilts

If it’s a 32-Bit Android device:

wget https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-armel.tar.gz
tar -xvf git-annex-standalone-armel.tar.gz

If it’s a 64-Bit Android device, instead:

wget https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-arm64.tar.gz
tar -xvf git-annex-standalone-arm64.tar.gz

Now we can configure git-annex:

cd git-annex.linux/
unset LD_PRELOAD # required for git-annex to find libraries
./runshell # properly configures git-annex PATH variables

git config --global user.email "<YourEMail>"
git config --global user.name "<YourName>"

termux-setup-storage # maps Android storage directories (photos, screenshots, downloads, etc) so they can be accessed from termux

exit

Close and re-open the Termux-App, so that the new git-annex configuration is properly loaded.

On most devices there are two directories

/data/data/com.termux/files/home/storage/dcim/Camera

and

/storage/emulated/0/DCIM/Camera 

that both point to the same directory. Take care not to mix the two, as this can confuse git-annex.
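To check whether two paths really are the same directory, one can compare their resolved physical locations. This is a small sketch; the helper name same_dir is made up for this example, and it assumes the Termux path is a symlink into shared storage.

```shell
# Succeeds if both arguments resolve to the same physical directory.
same_dir() {
    a=$(cd "$1" 2>/dev/null && pwd -P) || return 1   # resolve symlinks in path 1
    b=$(cd "$2" 2>/dev/null && pwd -P) || return 1   # resolve symlinks in path 2
    [ "$a" = "$b" ]
}

if same_dir "$HOME/storage/dcim/Camera" /storage/emulated/0/DCIM/Camera; then
    echo "both paths are the same directory, so use only one of them"
fi
```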

cd /data/data/com.termux/files/home/storage/dcim/Camera # enter camera photo directory

git init # create a new git repository
git-annex init # initialize git-annex with the repository

Android uses a file-system layer called “sdcardfs” for shared storage that behaves like FAT32 and as such doesn’t allow symbolic links.

This is inconvenient for use with git-annex, as it would usually place symlinks in the repository’s main directory that link to the locations of the actual binaries in the git tree.

Without symlinks git-annex has to resort to using copies instead. This means there are always two copies of the data: one in the git repository’s directory tree and one in the main directory. This of course uses more storage, but works like the symlinks for most use-cases.

Git-Annex occasionally points this out in its output, saying that it has detected a “crippled filesystem” and is using “direct mode”.
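Whether a given directory supports symlinks can be probed directly. The sketch below is only an illustration of the idea, not how git-annex itself performs its check; the helper name supports_symlinks and the probe file name are made up for this example.

```shell
# Succeeds if a symbolic link can be created inside the given directory.
supports_symlinks() {
    probe="$1/.symlink-probe.$$"            # temporary link name inside the directory
    if ln -s "$1" "$probe" 2>/dev/null && [ -L "$probe" ]; then
        rm -f "$probe"
        return 0
    fi
    rm -f "$probe"                          # clean up anything left behind
    return 1
}

dir=${1:-$PWD}
if supports_symlinks "$dir"; then
    echo "$dir supports symlinks"
else
    echo "$dir does not support symlinks, so git-annex will fall back to copies"
fi
```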

The Android file-system also doesn’t support changing file ownership or write permissions, which causes errors with git and git-annex. Git’s ownership check can be disabled for this directory by marking it as safe:

git config --global --add safe.directory /data/data/com.termux/files/home/storage/dcim/Camera # trust this directory despite the ownership/permission mismatch

Now we can add and commit files:

git annex add . # stage all files in the directory
git commit -am "phone photos" # commit the files

The Android device is now ready to be synced with git-annex from other devices.

For convenience you may want to add a script “sync.sh” to /data/data/com.termux/files/home/.shortcuts.

#!/bin/bash
cd /data/data/com.termux/files/home/storage/dcim/Camera || exit 1 # enter camera photo directory, abort if it is missing
git annex add . # stage all files in the directory
git commit -am "phone photos" # commit the files
git annex sync --content

and make it executable:

chmod a+x /data/data/com.termux/files/home/.shortcuts/sync.sh

With the “Termux:Widget” app installed and its widget added to the home screen, synchronization can be triggered directly from the home screen by tapping sync.sh in the widget’s list of scripts.

Setting up Git-Annex on Linux

On the Linux PC (or Windows WSL):

mkdir files/ && cd files/ # create some folder and enter it
apt install git git-annex # install git and git-annex

git init # create a new git repository
git-annex init # initialize git-annex with the repository

We can then add one or more Android devices as remotes:

git remote add <deviceName> ssh://<username>@<ip-address>:8022/data/data/com.termux/files/home/storage/dcim/Camera

Files can be synchronized by running

git annex sync --content

In case of errors

annex ignore

I ran into some issues with git-annex marking a remote with “annex-ignore = true”. This happens when the remote is not correctly configured. Once a remote is labeled with “annex-ignore”, it is no longer synchronised.

After fixing whatever the issue was, we can remove the flag by editing the config. To do this, open

vim .git/config

and remove the line

annex-ignore = true
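The same flag can also be removed without an editor, since entries in .git/config are reachable through `git config`. This is a minimal sketch; the helper name clear_annex_ignore and the remote name "phone" in the usage comment are placeholders.

```shell
# Remove the annex-ignore flag of the named remote in the current repository.
# The setting lives in the [remote "<name>"] section of .git/config.
clear_annex_ignore() {
    git config --unset "remote.$1.annex-ignore"
}

# Usage, run inside the repository (remote name is a placeholder):
#   clear_annex_ignore phone
```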

vanished files

Another issue I ran into while messing around with git-annex was that at one point all files appeared to have vanished. This was likely caused by re-running “git-annex init”, which created a new repo and repo UUID.

I was able to see the files still existed under

.git/annex/objects

In this state one has to be careful not to run “git” commands that may automatically trigger a “git gc”, because that may remove the files.

The commands

git annex repair

and

git annex fsck

can be run, but didn’t help.

However

git annex unused

showed the files that had vanished and

git annex whereused --historical --unused

listed their file names.

I found that I had a faulty commit that could be seen with

git log --stat

Though I have no idea where it came from, reverting it by its commit hash solved the issue:

git revert <commit hash>

After re-committing

git commit -am "photos"

and syncing on all devices

git annex sync --content

the files reappeared on all devices.

accidental 'git add' instead of 'git annex add'

Git-Annex extends Git and as such we can use mixed repositories with source and binaries.

Especially when one is used to plain git, it is easy to accidentally add a file with “git add” instead of “git annex add”.

If the added file has also been committed, the usual painful approaches for getting a file out of a git repository are required. One would have to “git revert” the commit and then clean the git tree to make sure the unused file doesn’t increase the size of the git repository.

Progress

Conclusions

So far git-annex works well for me. There is a learning curve and it does require familiarity with Git and the Linux command-line. At the stage I’m currently using it, it probably doesn’t beat simple “rsync” approaches.

However git-annex has some more advanced features. One can, for instance, specify how many copies to keep on which remotes to enforce redundancy.

It might be very useful to be able to keep the last 30 days of photos on the phone and drop older photos, if they have already been synchronized to two (or more) other remotes.
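A rough sketch of how this could look is shown below. It only relies on `find` to select old files and leaves the safety check to git-annex, which refuses to drop content unless enough other copies are known to exist. The helper name old_files is made up for this example, and I have not run this setup on my own devices yet.

```shell
# List regular files under DIR that were last modified more than DAYS days ago,
# skipping git's own metadata directory.
old_files() {
    find "$1" -path "$1/.git" -prune -o -type f -mtime +"$2" -print
}

# Hypothetical usage inside the phone's repository:
#   git annex numcopies 2        # require two other copies before any drop
#   old_files . 30 | while IFS= read -r f; do git annex drop "$f"; done
```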

Backup hard drives can be added and can be automatically checked for data consistency.

In my view the approach is also superior to cloud-based backups as it preserves data privacy, is a lot faster in synchronisation speed and is decentralized - it doesn’t require an always-on and power consuming server. It can even be combined with off-site cloud storage by adding the cloud storage as an additional git remote.

With git-annex I can sync my Android devices to my laptop or desktop computer, depending on which is available at a given time, and then sync the two with each other later without having to manually sort out conflicts.

Because git and git-annex make heavy use of checksums, data corruption can also be detected.

And for software developers, git-annex is a useful extension to git that one should know about for adding large binary files, for instance test input files, to source-code repositories.


[1] https://git-annex.branchable.com/walkthrough/