kmbnw/fuse-sha1

fuse-sha1 is a FUSE filesystem based on the Python FUSE example mirroring filesystem Xmp (included 
in xmp.py).

If you did not receive these files from github.com/kmbnw/fuse-sha1, you may have an old,
corrupted, or malicious version of the software.

For any questions, please contact Krysta Bouzek (nwkrystab on google).

fuse-sha1 (like xmp) works by mounting a directory as the "root" filesystem for the mirror.  In
addition, fuse-sha1 updates SHA1 checksums in an SQLite database whenever a file is removed,
updated, or added.  The path stored in the SQLite database is the path under the "root", not the
path in the mirrored filesystem.  Please note that when you modify a file in the mirrored
filesystem, the file will actually be changed in the "root" filesystem.
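
To make the checksum-and-store idea concrete, here is a minimal sketch of hashing a file and
recording it under its "root" path.  The table name `files`, its columns, and the function names are
assumptions made for illustration; they are not taken from the fuse-sha1 source.

```python
import hashlib
import sqlite3


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksum(conn, root_path):
    """Insert or update the checksum row for a file under the "root"."""
    # Hypothetical schema: one row per file, keyed by its "root" path.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, sha1 TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO files (path, sha1) VALUES (?, ?)",
        (root_path, sha1_of_file(root_path)),
    )
    conn.commit()
```

Reading in chunks keeps memory use flat for large files, and keying on the path means a changed
file overwrites its old checksum instead of adding a second row.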

fuse-sha1 will log errors and warnings to a file named LOG that is created in whatever directory
fuse-sha1 is run from.
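
As a rough illustration of that logging behavior, the sketch below writes warnings and errors to a
file named LOG in the current working directory.  The logger name, the message format, and the
`make_logger` helper are assumptions for illustration only; the real tool may set up logging
differently.

```python
import logging
import os


def make_logger(directory=None):
    """Return a logger that writes warnings and errors to <directory>/LOG."""
    # Default to the directory the program is run from.
    directory = directory or os.getcwd()
    logger = logging.getLogger("fuse-sha1-sketch")  # hypothetical logger name
    logger.setLevel(logging.WARNING)  # drop anything below a warning
    handler = logging.FileHandler(os.path.join(directory, "LOG"))
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```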

== Usages for fuse-sha1 ==

## 1. Start with an empty directory to mirror ##

python fuse-sha1.py -o root=/path/to/root/dir --database=/path/to/sqlite/dbfile /path/to/fuse/mirror

For example, suppose you want to mirror /home/user/myfiles, which is currently empty (this is the
"root"), and you want to put files from /home/user/mypics into it.  Suppose you want your SQLite database
stored at /home/user/mysqlitedb.db, and that you want to use /home/user/fusetmp as the mount
directory (this is the "mirrored" directory).  First mount the mirror:

python fuse-sha1.py -o root=/home/user/myfiles --database=/home/user/mysqlitedb.db /home/user/fusetmp

Now you can copy files from /home/user/mypics:

cp -Ruv /home/user/mypics /home/user/fusetmp

Each file will be checksummed after the copy, and the path and checksum placed into the database.
Supposing you had 1.jpg, 2.png, and 3.gif inside of /home/user/mypics, you will end up with:

mirrored files (disappear after unmounting the FUSE filesystem):

/home/user/fusetmp/1.jpg
/home/user/fusetmp/2.png
/home/user/fusetmp/3.gif

actual files:

/home/user/myfiles/1.jpg
/home/user/myfiles/2.png
/home/user/myfiles/3.gif

database entries:

<checksum1> /home/user/myfiles/1.jpg
<checksum2> /home/user/myfiles/2.png
<checksum3> /home/user/myfiles/3.gif

## 2. Mirror an existing directory ##

This works the same way as for an empty directory, but you have to tell fuse-sha1 that it should 
scan the filesystem at mount time.  Using the same example as in use case 1, you would do:

python fuse-sha1.py -o root=/home/user/myfiles --database=/home/user/mysqlitedb.db --rescan /home/user/fusetmp

Note that --rescan can be used at any time.  It simply runs an insert or update for each file,
then removes any entries whose path no longer exists.  It can be time-consuming on a large
filesystem, which is why it is not on by default.
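
The insert-or-update pass of a rescan could look roughly like the sketch below, which walks the
"root" and checksums every regular file.  It covers only that first pass, not the removal of stale
entries.  The `files` table and the function names are assumptions for illustration, not the actual
fuse-sha1 schema or code.

```python
import hashlib
import os
import sqlite3


def file_sha1(path):
    """Return the hex SHA1 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upsert_tree(conn, root):
    """Insert or update a checksum row for every regular file under `root`."""
    # Hypothetical schema: one row per file, keyed by its "root" path.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, sha1 TEXT)"
    )
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip dangling symlinks and special files
            conn.execute(
                "INSERT OR REPLACE INTO files (path, sha1) VALUES (?, ?)",
                (path, file_sha1(path)),
            )
            count += 1
    conn.commit()
    return count
```

Every file is read in full on each pass, which is where the cost on a large filesystem comes from.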

== Handling nonexistent files ==

If you need to remove nonexistent files (e.g. if you deleted files from the root without going through
the mirrored filesystem), just run sha1db.py with --vacuum, e.g.

python sha1db.py /home/user/mysqlitedb.db --vacuum

This will scan the database and remove any entries for which the file does not exist.
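
In spirit, this kind of cleanup amounts to checking each stored path against the filesystem and
deleting the rows that no longer match.  The sketch below assumes a hypothetical `files` table with
a `path` column; the real schema and the `prune_missing` name are not taken from sha1db.py.

```python
import os
import sqlite3


def prune_missing(conn):
    """Delete rows whose path no longer exists; return the removed paths."""
    # Collect stale paths first so rows are not deleted mid-iteration.
    stale = [
        path
        for (path,) in conn.execute("SELECT path FROM files ORDER BY path")
        if not os.path.exists(path)
    ]
    conn.executemany(
        "DELETE FROM files WHERE path = ?", [(path,) for path in stale]
    )
    conn.commit()
    return stale
```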

== Handling duplicates ==

One thing that I like to do is clean out duplicates for a directory.  The FUSE tools don't have an
easy way to directly handle that (I think ZFS might), so I wrote some tooling to handle it.  

It doesn't actually delete duplicate files; it moves them to a directory you specify.  It will
preserve the subdirectory hierarchy of each moved file.  For example, if you had duplicate files in

/home/user/myfiles/other

and wanted to have them moved to 

/home/user/duplicates

it will make the directory 

/home/user/duplicates/other

and move the files there.  This makes it relatively easy to move files back by hand.  Note that
the move will fail if the duplicate directory is not empty.
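
The path mapping described above can be sketched as follows: the duplicate's path relative to the
"root" is re-created under the holding directory.  The function names are hypothetical, and
refusing to overwrite an existing destination is a choice made for this sketch, not a statement
about what sha1db.py does.

```python
import os
import shutil


def holding_path(root, holding_dir, duplicate_path):
    """Map a path under `root` to the same relative path under `holding_dir`."""
    relative = os.path.relpath(duplicate_path, root)
    return os.path.join(holding_dir, relative)


def move_duplicate(root, holding_dir, duplicate_path):
    """Move one duplicate file, creating intermediate directories as needed."""
    destination = holding_path(root, holding_dir, duplicate_path)
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    if os.path.exists(destination):
        # Sketch-only safety check: never overwrite a file already held.
        raise FileExistsError(destination)
    shutil.move(duplicate_path, destination)
    return destination
```

With root /home/user/myfiles and holding directory /home/user/duplicates, a duplicate at
/home/user/myfiles/other/1.jpg maps to /home/user/duplicates/other/1.jpg.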

To use this feature, issue the following command:

python sha1db.py /path/to/sqlite/dbfile --dedup /path/to/duplicate/holding/dir

Using the example from use case 1 above, you would run

python sha1db.py /home/user/mysqlitedb.db --dedup /home/user/duplicates

The dedup operation also logs the file moves to the LOG file mentioned at the beginning of this 
readme.

Note that symlinks are treated specially; they are not considered duplicates and thus will not
be removed from either the database or the filesystem.
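
One way to picture duplicate detection with the symlink exception is to group stored paths by
checksum and skip any path that is a symlink.  This is only a sketch: the `files` table is a
hypothetical schema, and treating the alphabetically first path as the copy to keep is an
assumption, since this readme does not say which copy sha1db.py keeps.

```python
import os
import sqlite3
from collections import defaultdict


def find_duplicates(conn):
    """Return paths whose checksum matches an earlier, non-symlink path."""
    by_checksum = defaultdict(list)
    rows = conn.execute("SELECT path, sha1 FROM files ORDER BY path")
    for path, sha1 in rows:
        if os.path.islink(path):
            continue  # symlinks are never counted as duplicates
        by_checksum[sha1].append(path)
    duplicates = []
    for paths in by_checksum.values():
        # Assumption: keep the first path per checksum, flag the rest.
        duplicates.extend(paths[1:])
    return sorted(duplicates)
```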
