
Re: [OCLUG-Tech] a (simple?) question about what git can "reuse" across repos

  • Subject: Re: [OCLUG-Tech] a (simple?) question about what git can "reuse" across repos
  • From: Alex Pilon <alp [ at ] alexpilon [ dot ] ca>
  • Date: Tue, 28 Apr 2015 11:03:45 -0400
On Tue, Apr 28, 2015 at 09:17:21AM -0400, Robert P. J. Day wrote:
>   i'm curious if git will recognize identical underlying content from
> two different repositories depending on how that content was added and
> committed.

If git were to do this, both repos would have to be on the same
filesystem, and it would have to use either reflinks or hardlinks.
Reflinks are safer, but hardlinks are good enough if you trust that the
object store will never be modified in place.

    $ cd /tmp
    $ git clone ~/Documents/Projects/suckless/st
    $ git clone st st2
    $ cd st2/.git
    $ stat ./objects/66/30912ed9c7123c7b20a80b90c44810a96ad9cc
      File: ‘./objects/66/30912ed9c7123c7b20a80b90c44810a96ad9cc’
      Size: 194             Blocks: 8          IO Block: 4096   regular file
    Device: 17h/23d Inode: 1895687     Links: 2
    Access: (0644/-rw-r--r--)  Uid: ( 1000/     alp)   Gid: (  100/   users)
    Access: 2015-03-16 22:23:04.000000000 -0400
    Modify: 2015-03-16 22:23:04.000000000 -0400
    Change: 2015-04-28 10:58:59.650934072 -0400
     Birth: -
    $ cd /tmp/st/.git
    $ stat ./objects/66/30912ed9c7123c7b20a80b90c44810a96ad9cc
      File: ‘./objects/66/30912ed9c7123c7b20a80b90c44810a96ad9cc’
      Size: 194             Blocks: 8          IO Block: 4096   regular file
    Device: 17h/23d Inode: 1895687     Links: 2
    Access: (0644/-rw-r--r--)  Uid: ( 1000/     alp)   Gid: (  100/   users)
    Access: 2015-03-16 22:23:04.000000000 -0400
    Modify: 2015-03-16 22:23:04.000000000 -0400
    Change: 2015-04-28 10:58:59.650934072 -0400
     Birth: -

Note the identical inode number and the link count of 2. Another way to
find such files is `find . -type f -links +1`.
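
To compare two specific paths without reading `stat` output by eye, a
small helper can do the inode check directly. This is only a sketch:
`same_file` and the throwaway file names are made up for illustration,
and it assumes a shell whose `test` supports `-ef` (bash and dash do).

```shell
# Hypothetical helper: succeeds when both paths refer to the same inode
# on the same device, i.e. they are hardlinks to one file.
same_file() {
    [ "$1" -ef "$2" ]
}

# Self-contained demo with throwaway files (names are illustrative).
demo=$(mktemp -d)
echo data > "$demo/a"
ln "$demo/a" "$demo/b"    # hardlink: same inode as a
cp "$demo/a" "$demo/c"    # plain copy: gets its own inode

same_file "$demo/a" "$demo/b" && echo "a and b share an inode"
same_file "$demo/a" "$demo/c" || echo "a and c do not"
```

Pointed at the same object path under `/tmp/st/.git` and
`/tmp/st2/.git`, it should succeed for the clone shown above.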

So it does work if you clone one repo from another on the same
filesystem. However, cloning the same repo again will not result in
deduplication. That makes sense: it'd potentially be a lot of work to
find another repo without keeping a list of recent clones, which I'm
sure some people would balk at.
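
If you already know where an earlier clone lives, you can point git at
it yourself. The sketch below uses `git clone --reference`, which
records the earlier clone's object store in `objects/info/alternates`
rather than hardlinking anything. It is not what was tested above, and
the repo names (`upstream`, `first`, `second`) and the demo identity are
placeholders.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Throwaway repo standing in for the upstream being cloned twice.
git init -q upstream
echo hello > upstream/file.txt
git -C upstream add file.txt
git -C upstream -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m init

# First clone: an ordinary local clone.
git clone -q upstream first

# Second clone: go through the git transport (file:// URL) but borrow
# objects from the first clone instead of storing them again.
git clone -q --reference "$work/first" "file://$work/upstream" second

# The borrowed object store is recorded here.
cat second/.git/objects/info/alternates
```

The catch is that `second` now depends on `first`: if `first` is deleted
or its objects are pruned, `second` breaks. `git clone --dissociate`
exists for copying the borrowed objects in afterwards.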

>   in what circumstances would an identical tree object result in less
> work, or however you want to phrase it?

See also gitnamespaces(7). Apparently that's a way to deduplicate.
