Project author: EdDuarte

Project description:
(WIP) A fork of Google's DiffMatchPatch in Java where diffs include string index pointers
Primary language: Java
Repository: git://github.com/EdDuarte/indexed-diff-match-patch.git
Created: 2016-06-25T22:06:42Z
Project page: https://github.com/EdDuarte/indexed-diff-match-patch

License: Apache License 2.0



Indexed DiffMatchPatch in Java


The Diff, Match and Patch Library, originally developed by Neil Fraser and hosted at http://code.google.com/p/google-diff-match-patch/, is a library for detecting in-line text differences (‘inserted’ and ‘deleted’).

This is a fork of the DiffMatchPatch project that changes multiple parts of the algorithm so that every returned diff carries a startIndex and an endIndex value.
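To illustrate what these indexes mean, the sketch below derives them by walking a plain, unindexed diff list: EQUAL segments advance a cursor in both texts, DELETE segments advance only the old-text cursor, and INSERT segments advance only the new-text cursor. The `Op`, `Segment`, `IndexedSegment` and `IndexSketch` names, and the inclusive-start/exclusive-end convention, are hypothetical and not this library's API (the fork changes the algorithm itself rather than post-processing its output). It assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; not the library's Diff/Operation classes.
enum Op { DELETE, INSERT, EQUAL }

record Segment(Op op, String text) {}

// Assumption: startIndex is inclusive and endIndex exclusive, measured in oldText
// for DELETE/EQUAL segments and in newText for INSERT segments.
record IndexedSegment(Op op, String text, int startIndex, int endIndex) {}

class IndexSketch {
    static List<IndexedSegment> addIndexes(List<Segment> diffs) {
        List<IndexedSegment> out = new ArrayList<>();
        int oldPos = 0; // cursor into oldText
        int newPos = 0; // cursor into newText
        for (Segment s : diffs) {
            int len = s.text().length();
            switch (s.op()) {
                case INSERT -> {
                    // inserted text only exists in newText
                    out.add(new IndexedSegment(s.op(), s.text(), newPos, newPos + len));
                    newPos += len;
                }
                case DELETE -> {
                    // deleted text only exists in oldText
                    out.add(new IndexedSegment(s.op(), s.text(), oldPos, oldPos + len));
                    oldPos += len;
                }
                case EQUAL -> {
                    // unchanged text exists in both, so both cursors move
                    out.add(new IndexedSegment(s.op(), s.text(), oldPos, oldPos + len));
                    oldPos += len;
                    newPos += len;
                }
            }
        }
        return out;
    }
}
```

For "the cat sat" versus "the dog sat", the diffs EQUAL "the ", DELETE "cat", INSERT "dog", EQUAL " sat" would get the ranges [0, 4), [4, 7), [4, 7) and [7, 11).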

The main idea behind this fork is that most available NLP tools (e.g., sentence splitters, roi taggers) require character-based index pointers. Using String.indexOf(diffText) or String.lastIndexOf(diffText) to obtain these indexes is not a viable solution, since the same diff text often occurs multiple times in a document, and a search only returns the first or last occurrence rather than the one that actually changed.
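A small sketch of that problem, using made-up text: "cat" occurs three times, and we assume the diff deleted the middle occurrence at index 8. Neither String.indexOf nor String.lastIndexOf lands on it. The class and field names are illustrative only.

```java
// Shows why searching for the diff text cannot recover where the change happened.
class IndexOfPitfall {
    // The document contains "cat" three times; assume the diff deleted the middle one.
    static final String OLD_TEXT = "cat and cat and cat";
    static final int DELETED_START = 8; // known from the diff itself, not from searching

    public static void main(String[] args) {
        int first = OLD_TEXT.indexOf("cat");    // 0, the first occurrence
        int last = OLD_TEXT.lastIndexOf("cat"); // 16, the last occurrence
        // Neither search finds the occurrence that actually changed.
        System.out.println(first == DELETED_START || last == DELETED_START); // false
    }
}
```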

Other use cases include sorting diffs by startIndex, so that they are ordered by their occurrence in the original document, and using String.substring() to retrieve the text surrounding a changed section (snippets).
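Both use cases can be sketched with a hypothetical `IndexedDiff` record standing in for the library's Diff class (which exposes getStartIndex() and getEndIndex() instead). The sketch assumes all diffs index the same text (for example, DELETE diffs, which point into oldText) and that endIndex is exclusive; the snippet window is clamped so substring() stays in bounds. Requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an indexed diff; not the library's Diff class.
record IndexedDiff(String text, int startIndex, int endIndex) {}

class DiffOrdering {
    // Returns a copy of the diffs ordered by where they start in the document.
    static List<IndexedDiff> byOccurrence(List<IndexedDiff> diffs) {
        List<IndexedDiff> sorted = new ArrayList<>(diffs);
        sorted.sort(Comparator.comparingInt(IndexedDiff::startIndex));
        return sorted;
    }

    // Returns the diff text plus up to `context` characters on each side,
    // clamped to the bounds of the document.
    static String snippet(String document, IndexedDiff d, int context) {
        int start = Math.max(0, d.startIndex() - context);
        int end = Math.min(document.length(), d.endIndex() + context);
        return document.substring(start, end);
    }
}
```

For the document "alpha beta gamma delta", diffs for "gamma" [11, 16) and "beta" [6, 10) sort with "beta" first, and a 2-character snippet around "beta" is "a beta g".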

Usage

Maven

  <dependency>
      <groupId>com.edduarte</groupId>
      <artifactId>diff-match-patch</artifactId>
      <version>1.0.0</version>
  </dependency>

Gradle

  dependencies {
      // 'compile' was removed in Gradle 7; on Gradle versions before 3.4, use 'compile' instead
      implementation 'com.edduarte:diff-match-patch:1.0.0'
  }

Example

  import java.util.LinkedList;
  // (also import DiffMatchPatch and Diff from this library)

  // oldText and newText are the two String versions being compared.
  DiffMatchPatch dmp = new DiffMatchPatch();
  LinkedList<Diff> diffs = dmp.diff_main(oldText, newText);
  int snippetOffset = 50;
  for (Diff d : diffs) {
      // for diffs with operation 'INSERT', the indexes point into newText;
      // for diffs with operation 'DELETE' or 'EQUAL', they point into oldText
      String source = d.getOperation() == DiffMatchPatch.Operation.INSERT
              ? newText
              : oldText;
      // clamp the window so substring() cannot throw
      // StringIndexOutOfBoundsException near the start or end of the text
      int start = Math.max(0, d.getStartIndex() - snippetOffset);
      int end = Math.min(source.length(), d.getEndIndex() + snippetOffset);
      String snippet = source.substring(start, end);
  }

Projects using this library

You can see this library in use at https://github.com/vokter/vokter.

License

  Copyright 2016 Eduardo Duarte

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.