Author: Chinory

Description: Hardlink duplicate files.
Language: JavaScript
Repository: git://github.com/Chinory/lndup.git
Created: 2018-04-26T02:03:57Z
Homepage: https://github.com/Chinory/lndup

License: MIT License


lndup

Hardlink duplicate files. Ultra fast!

  • asynchronous I/O
  • one stat() call per file
  • hashes as few files as possible, at full speed
  • custom filters and keys written in JavaScript

Installation

    $ npm i -g lndup

Usage

    Usage: lndup [OPTION]... [PATH]...
    Hardlink duplicate files.

      -n, --dry-run   don't link
      -v, --verbose   explain what is being done
      -q, --quiet     don't output extra information
      -i, --stdin     read more paths from stdin
      -f, --file      add a file filter
                      (stats: fs.Stats, path: string): boolean
      -d, --dir       add a directory filter
                      (stats: fs.Stats, path: string, files: string[]): boolean
      -k, --key       add a key to differentiate files
                      (stats: fs.Stats, path: string): any
      -H, --hash      select a digest algorithm, default: sha1
                      run 'openssl list -digest-algorithms' for available algorithms.
      -h, --help      display this help and exit
      -V, --version   output version information and exit

    See <https://github.com/chinory/lndup>

  • Symbolic links are not followed.

Example

All output is executable Unix shell code; extra information is carried in comments.

    $ lndup -v .
    #Stat: probe: readdir 204B 3
    #Stat: probe: stat 144.02MiB 23
    #Stat: probe: select 144.02MiB 19
    #Time: probe: 7.351ms
    #Stat: verify: internal 0B 0
    #Stat: verify: external 144.00MiB 9
    #Stat: verify: total 144.00MiB 9
    #Time: verify: 183.209ms
    #Stat: solve: current 112.00MiB 7
    #Time: solve: 0.110ms
    ln -f -- '16M/null_2' '16M/null_3'
    ln -f -- '16M/null_2' '16M/null_1'
    ln -f -- '16M/ran1_1' '16M/ran1_2'
    ln -f -- 'root/ran4_1' 'root/ran4_2'
    ln -f -- 'root/ran4_1' 'root/ran4_2' #Error: EACCES: permission denied, rename 'root/ran4_2' -> 'root/ran4_2.e8c70ebe0635ab41'
    #Stat: execute: todo 64.00MiB 3 4
    #Stat: execute: done 48.00MiB 2 3
    #Stat: execute: fail 16.00MiB 1 1
    #Time: execute: 8.331ms

Customize filter & key

File filters: to skip files smaller than 1024 bytes:

    $ lndup /path -f 'stats=>stats.size>=1024'

Directory filters: to also skip any directory that contains more than 100 files:

    $ lndup /path -f 'stats=>stats.size>=1024' -d '(s,p,f)=>f.length<=100'

Extra keys: to avoid hardlinking identical files that differ in user, group, or mode:

    $ lndup /path -k 's=>s.uid' -k 's=>s.gid' -k 's=>s.mode'

Require more: for anything more elaborate, load the key and filter functions from your own modules:

    $ lndup /path -k 'require("/path/to/keyfunc.js")' -f 'require("/path/to/filter.js")'
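As a rough sketch of what such modules might contain: the functions below follow the signatures given in the help text, `(stats, path) => any` for a key and `(stats, path) => boolean` for a file filter. The names `extensionKey` and `fileFilter`, and the idea of putting both in one file, are made up for illustration. Since the command line above passes `require(...)` itself as the function, a real `keyfunc.js` or `filter.js` would presumably assign a single function to `module.exports`.

```javascript
// hooks.js (hypothetical file): a custom key and a custom file filter.
const path = require("path");

// Extra key, signature (stats, path) => any.
// Assumption for illustration: only link files that share an extension.
const extensionKey = (stats, filePath) => path.extname(filePath).toLowerCase();

// File filter, signature (stats, path) => boolean.
// Keeps files of at least 1024 bytes that are not inside a .git directory.
const fileFilter = (stats, filePath) =>
  stats.size >= 1024 && !filePath.split(path.sep).includes(".git");

// For a one-function module, this would be `module.exports = extensionKey`.
module.exports = { extensionKey, fileFilter };
```

Keeping the functions free of side effects makes them easy to check on their own, with a plain object standing in for `fs.Stats`.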

Notice

Failed operations are written to stderr, like this:

    ln -f -- 'root/ran4_1' 'root/ran4_2' #Error: EACCES: permission denied, rename 'root/ran4_2' -> 'root/ran4_2.e8c70ebe0635ab41'

Watch for failed mv and rm operations: they are the remedies for a failed link operation, and if they fail you must complete them manually. Fortunately, the remedies rarely fail, because each one reverses an operation that has just succeeded.
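The error message above shows the destination being renamed to a name with a random suffix, which suggests a rename, link, remove sequence. The sketch below is one plausible reading of that sequence and is not taken from lndup's source: `linkReplace` is a hypothetical helper, and the 8-byte suffix is a guess based on the 16 hex digits in the message.

```javascript
const fs = require("fs");
const crypto = require("crypto");

// Hypothetical helper: replace `dst` with a hardlink to `src` while keeping
// the old `dst` recoverable until the new link exists.
function linkReplace(src, dst) {
  // Assumption: the backup name is dst plus a random hex suffix.
  const backup = `${dst}.${crypto.randomBytes(8).toString("hex")}`;
  fs.renameSync(dst, backup); // step 1: move the old file out of the way
  try {
    fs.linkSync(src, dst); // step 2: create the hardlink
  } catch (err) {
    fs.renameSync(backup, dst); // remedy (mv): undo step 1, then report
    throw err;
  }
  fs.unlinkSync(backup); // step 3 (rm): drop the old copy
}
```

In this reading, the mv only reverses a rename that succeeded a moment earlier, which would explain why it is unlikely to fail.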

Requirement

  • Node.js >=9

Introduction

data structure

    // nested Maps, from outermost to innermost
    devMap                                                 // Map keyed by stat.dev
    sizeMap    = devMap.get(stat.dev)                      // Map keyed by stat.size
    exkeyMap   = sizeMap.get(stat.size)                    // Map keyed by the value of the extra keys
    contentMap = exkeyMap.get(/* value of extra keys */)   // Map keyed by the content digest
    inoMap     = contentMap.get(/* digest of the content */) // Map keyed by stat.ino
    paths      = inoMap.get(stat.ino)                      // Array of paths sharing that inode
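A minimal sketch of filling such a structure. The helpers `getOrCreate` and `addPath` are hypothetical, `exkey` stands for the combined value of the extra keys, and `digest` for the content hash. The sketch files each path in a single step for brevity; in lndup the digest is only known after the verify stage, so the real structure is presumably built up in stages.

```javascript
// Return map.get(key), creating the entry with make() on first use.
function getOrCreate(map, key, make) {
  let value = map.get(key);
  if (value === undefined) {
    value = make();
    map.set(key, value);
  }
  return value;
}

// Hypothetical helper: file one path under dev -> size -> exkey -> digest -> ino.
function addPath(devMap, stat, exkey, digest, path) {
  const newMap = () => new Map();
  const sizeMap = getOrCreate(devMap, stat.dev, newMap);
  const exkeyMap = getOrCreate(sizeMap, stat.size, newMap);
  const contentMap = getOrCreate(exkeyMap, exkey, newMap);
  const inoMap = getOrCreate(contentMap, digest, newMap);
  getOrCreate(inoMap, stat.ino, () => []).push(path);
}

// Made-up sample data: two paths on inode 10 and one path on inode 11.
const devMap = new Map();
addPath(devMap, { dev: 1, size: 4, ino: 10 }, "", "abc", "x/a");
addPath(devMap, { dev: 1, size: 4, ino: 10 }, "", "abc", "x/a-link");
addPath(devMap, { dev: 1, size: 4, ino: 11 }, "", "abc", "x/b");
```

Files can only be duplicates of each other when they end up in the same `inoMap`, and hardlinks can only be made within one device, which is why `stat.dev` is the outermost key.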

processing

    probe(paths).then(verify).then(solve).then(execute)

probe: Traverse the input paths asynchronously, using each stat() result to group the files.

verify: Work out the smallest set of files that must be hashed, then group the files by digest.
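One way to picture the verify step, as a sketch rather than lndup's actual code: within a group of files that already share device, size and extra keys, a group with a single inode needs no hashing at all, and otherwise one read per inode is enough, because all paths on an inode have the same content. `hashFile` and `groupByDigest` are hypothetical names.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// Digest one file with a streaming read; "sha1" mirrors the CLI default.
function hashFile(path, algorithm = "sha1") {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash(algorithm);
    fs.createReadStream(path)
      .on("error", reject)
      .on("data", (chunk) => hash.update(chunk))
      .on("end", () => resolve(hash.digest("hex")));
  });
}

// Hypothetical helper: inoMap is Map<ino, string[]> for files that already
// share dev, size and extra keys. Returns Map<digest, Map<ino, string[]>>.
async function groupByDigest(inoMap, algorithm = "sha1") {
  const contentMap = new Map();
  if (inoMap.size < 2) return contentMap; // nothing to compare against
  for (const [ino, paths] of inoMap) {
    const digest = await hashFile(paths[0], algorithm); // one read per inode
    if (!contentMap.has(digest)) contentMap.set(digest, new Map());
    contentMap.get(digest).set(ino, paths);
  }
  return contentMap;
}
```

The sketch hashes sequentially to stay short; a faster version would presumably run several reads concurrently.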

solve: Build a solution that, for each group of identical files, hardlinks the file whose inode already has the most paths over all the other files.
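The majority rule can be sketched as follows. This is an illustration under stated assumptions, not lndup's implementation: `solve` here is a hypothetical function taking a `Map<ino, string[]>` of files with identical content and returning `[source, destination]` pairs, each standing for one `ln -f -- source destination`.

```javascript
// Hypothetical helper: pick the inode with the most paths as the link source
// and plan one link for every path on every other inode.
function solve(inoMap) {
  let majority = null;
  for (const paths of inoMap.values()) {
    if (majority === null || paths.length > majority.length) majority = paths;
  }
  const links = [];
  if (majority === null) return links; // empty group: nothing to do
  for (const paths of inoMap.values()) {
    if (paths === majority) continue; // already on the chosen inode
    for (const dst of paths) links.push([majority[0], dst]);
  }
  return links;
}

// Made-up sample group: inode 10 has two paths, inodes 11 and 12 have one each.
const plan = solve(new Map([
  [10, ["a/1", "a/2"]],
  [11, ["b/1"]],
  [12, ["c/1"]],
]));
// plan is [["a/1", "b/1"], ["a/1", "c/1"]]
```

Choosing the inode with the most paths as the source leaves the largest number of paths untouched, so the plan needs the fewest link operations.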

execute: Carry out the solution, or only print it when doing a dry run.

License

MIT © Chinory