On Tue, Apr 28, 2015 at 09:17:21AM -0400, Robert P. J. Day wrote:
> i'm curious if git will recognize identical underlying content from
> two different repositories depending on how that content was added and
> committed.
If git did this, both repositories would have to live on the same
filesystem, and the sharing would use either reflinks or hardlinks.
Reflinks are safer, because a write to one copy never shows through the
other, but hardlinks are good enough if you trust the object store never
to be modified in place.
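To make the difference concrete, here is a minimal sketch using throwaway
temp files rather than a real object store. It assumes GNU cp, whose
`--reflink=auto` falls back to a plain copy on filesystems without reflink
support; the file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: a hardlink shares one inode, so an in-place write through either
# name shows up under both. A reflink (or plain) copy keeps its own data.
set -eu
tmp=$(mktemp -d)

printf 'object data\n' > "$tmp/a"
ln "$tmp/a" "$tmp/hard"                # second name for the same inode
cp --reflink=auto "$tmp/a" "$tmp/ref"  # GNU cp: CoW clone if supported, else copy

printf 'modified\n' > "$tmp/a"         # truncate and rewrite the inode in place

hard_content=$(cat "$tmp/hard")        # sees the modification
ref_content=$(cat "$tmp/ref")          # still holds the original bytes
echo "hardlink reads: $hard_content"
echo "reflink reads:  $ref_content"

rm -rf "$tmp"
```

Git normally writes objects once and never rewrites them, which is why
hardlinks are usually acceptable there.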
$ cd /tmp
$ git clone ~/Documents/Projects/suckless/st
$ git clone st st2
$ cd st2/.git
$ stat ./objects/66/30912ed9c7123c7b20a80b90c44810a96ad9cc
File: ‘./objects/66/30912ed9c7123c7b20a80b90c44810a96ad9cc’
Size: 194 Blocks: 8 IO Block: 4096 regular file
Device: 17h/23d Inode: 1895687 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 1000/ alp) Gid: ( 100/ users)
Access: 2015-03-16 22:23:04.000000000 -0400
Modify: 2015-03-16 22:23:04.000000000 -0400
Change: 2015-04-28 10:58:59.650934072 -0400
Birth: -
$ cd /tmp/st/.git
$ stat ./objects/66/30912ed9c7123c7b20a80b90c44810a96ad9cc
File: ‘./objects/66/30912ed9c7123c7b20a80b90c44810a96ad9cc’
Size: 194 Blocks: 8 IO Block: 4096 regular file
Device: 17h/23d Inode: 1895687 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 1000/ alp) Gid: ( 100/ users)
Access: 2015-03-16 22:23:04.000000000 -0400
Modify: 2015-03-16 22:23:04.000000000 -0400
Change: 2015-04-28 10:58:59.650934072 -0400
Birth: -
Note the identical inode number and the link count of 2 in both outputs.
Another way to find such files is `find -type f -links +1`, which lists
regular files that have more than one link.
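The same check can be scripted. This is only a sketch: `same_inode` and
`list_shared` are hypothetical helper names, and the demo uses plain temp
files standing in for the two object directories. It relies on the `-ef`
test operator, which bash and dash both provide.

```shell
#!/bin/sh
# same_inode A B: succeed if both paths name the same file (device + inode).
same_inode() { [ "$1" -ef "$2" ]; }

# list_shared DIR: print regular files under DIR that have more than one link.
list_shared() { find "$1" -type f -links +1; }

# Demo on temp files standing in for two repos' object stores.
tmp=$(mktemp -d)
mkdir "$tmp/repo" "$tmp/clone"
printf 'blob\n' > "$tmp/repo/obj"
ln "$tmp/repo/obj" "$tmp/clone/obj"       # shared, like a hardlinked object
printf 'other\n' > "$tmp/clone/private"   # not shared

if same_inode "$tmp/repo/obj" "$tmp/clone/obj"; then shared=yes; else shared=no; fi
shared_count=$(list_shared "$tmp/clone" | wc -l)
echo "same inode: $shared, files with extra links in clone: $shared_count"

rm -rf "$tmp"
```

With the clones above, the equivalent call would compare the object path
under /tmp/st/.git with the same path under /tmp/st2/.git.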
So it does work if you clone one repo from another on the same
filesystem. However, cloning the same repo again will not result in
deduplication. That makes sense: it would potentially be a lot of work
to find another repo without keeping a list of recent clones, which I'm
sure some people would balk at.
> in what circumstances would an identical tree object result in less
> work, or however you want to phrase it?
See also gitnamespaces(7). Apparently that's a way to deduplicate.