| 1 | +# Diff/AST: A Fine-Grained Source Code Differencing Tool |
| 2 | + |
| 3 | +The tool is currently able to recognize Python, Java, Verilog, Fortran, and C/C++ via dedicated parsers. |
| 4 | +It compares abstract syntax trees (ASTs) node by node, whereas popular `diff` tools compare arbitrary text files line by line. |
| 5 | +Its differencing algorithm is based on [an algorithm](https://doi.org/10.1137/0218082) for computing the *tree edit distance (TED)* between two ordered labeled trees. The TED between two trees is the minimum (weighted) number of edit operations needed to transform one tree into the other. |
| 6 | +Unfortunately, applying TED algorithms directly to real-world ASTs is generally infeasible because [their computational complexity is, at best, essentially quadratic in the number of AST nodes](https://doi.org/10.1016/j.tcs.2004.12.030). |
| 7 | +Therefore, Diff/AST applies a TED algorithm sparingly, in a divide-and-conquer manner backed by elaborate heuristics, to approximate tree edit distances. |
| 8 | +Even so, Diff/AST can still take a long time on large, non-trivial inputs, so it always caches the results. |
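To make the notion of tree edit distance concrete, here is a minimal Python sketch. It is not Diff/AST's implementation and not the cited algorithm: it is the textbook forest-distance recursion with memoization, assuming unit costs for insert, delete, and relabel. The names `tree`, `forest_distance`, and `tree_edit_distance` are hypothetical and exist only for this illustration.

```python
from functools import lru_cache


def tree(label, *children):
    """Build an ordered labeled tree as a hashable (label, children) pair."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not g:
        # Delete the rightmost root of f; its children take its place.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1
    if not f:
        # Insert the rightmost root of g.
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + f_kids, g) + 1
    insert = forest_distance(f, g[:-1] + g_kids) + 1
    # Map the two rightmost roots onto each other (relabel if they differ),
    # then match their child forests and the remaining siblings separately.
    relabel = (forest_distance(f_kids, g_kids)
               + forest_distance(f[:-1], g[:-1])
               + (f_label != g_label))
    return min(delete, insert, relabel)


def tree_edit_distance(a, b):
    """Minimum number of unit-cost edits turning tree a into tree b."""
    return forest_distance((a,), (b,))


# Two tiny example trees: one wraps the leaf "a" in an extra node "g".
wrapped = tree("f", tree("g", tree("a")), tree("b"))
plain = tree("f", tree("a"), tree("b"))
distance = tree_edit_distance(wrapped, plain)  # deleting "g" is enough
```

This naive recursion is only practical for small trees, which is consistent with the point above that exact TED does not scale to large ASTs.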
| 9 | + |
| 10 | +Diff/AST is able to export ASTs, changes between them, and other syntactic/semantic information as *facts* in |
| 11 | +[XML](https://www.w3.org/TR/xml11/) or [N-Triples](https://www.w3.org/2001/sw/RDFCore/ntriples/). |
| 12 | +In particular, facts in N-Triples format can be loaded into an RDF store such as |
| 13 | +[Virtuoso](https://github.com/openlink/virtuoso-opensource) to build a *factbase* or a database of facts. |
| 14 | +Factbases are intended to be queried for software engineering tasks such as |
| 15 | +[code comprehension](https://github.com/ebt-hpc/cca), |
| 16 | +[debugging](https://stair.center/archives/research/ddj-esecfse2018), |
| 17 | +[change pattern mining](https://ieeexplore.ieee.org/document/7081845), and |
| 18 | +[code homology analysis](https://link.springer.com/chapter/10.1007/978-3-642-12029-9_7). |
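As an illustration of what working with facts in N-Triples form can look like before they are loaded into an RDF store, the following Python sketch reads simple N-Triples statements and counts how often each predicate occurs. It is a simplified reader for illustration only, not a complete N-Triples parser, and the IRIs in `SAMPLE` are hypothetical placeholders, not the vocabulary that Diff/AST actually emits.

```python
import re
from collections import Counter

# Simplified statement pattern: subject (IRI or blank node), predicate (IRI),
# object (anything up to the terminating " ."). Assumes one statement per line.
TRIPLE = re.compile(r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$')


def parse_ntriples(text):
    """Return a list of (subject, predicate, object) strings."""
    triples = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"not an N-Triples statement: {line!r}")
        triples.append(match.groups())
    return triples


def predicate_counts(triples):
    """Count how many statements use each predicate."""
    return Counter(predicate for _, predicate, _ in triples)


# Hypothetical facts; the example.org IRIs are assumptions for this sketch.
SAMPLE = '''
# three made-up facts about two AST nodes
<http://example.org/node/1> <http://example.org/vocab#label> "MethodDeclaration" .
<http://example.org/node/2> <http://example.org/vocab#label> "Block" .
<http://example.org/node/2> <http://example.org/vocab#parent> <http://example.org/node/1> .
'''

facts = parse_ntriples(SAMPLE)
counts = predicate_counts(facts)
```

For real factbases, a dedicated RDF store and its query language are the intended route; a sketch like this is only useful for quick sanity checks on exported files.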
| 19 | + |
| 20 | +Diff/AST is an experimental implementation of the tree differencing algorithm |
| 21 | +reported in the following paper: |
| 22 | + |
| 23 | +Masatomo Hashimoto and Akira Mori, "Diff/TS: A Tool for Fine-Grained Structural Change Analysis," |
| 24 | +In *Proc. 15th Working Conference on Reverse Engineering*, 2008, pp. 279-288, |
| 25 | +DOI: [10.1109/WCRE.2008.44](https://doi.org/10.1109/WCRE.2008.44). |
| 26 | + |
| 27 | +## Screenshots |
| 28 | + |
| 29 | +You can see the results of comparing some pairs of source files taken from [samples](samples) [here](https://codinuum.github.io/gallery-cca). |
| 30 | + |
| 31 | +## Quick start |
| 32 | + |
| 33 | +You can instantly try Diff/AST by utilizing [Docker](https://www.docker.com/) and [a ready-made container image](https://hub.docker.com/r/codinuum/diffast). |
| 34 | + |
| 35 | + $ docker pull codinuum/diffast |
| 36 | + |
| 37 | +The following command line runs Diff/AST within a container to compare sample Java programs and saves the results in the `results` directory on the host. |
| 38 | + |
| 39 | + $ ./cca.py diffast -c results samples/java/0/Test.java samples/java/1/Test.java |
| 40 | + |
| 41 | +Once you have built [DiffViewer](diffviewer), you can inspect the AST differences in a viewer window. See [`diffviewer/README.md`](diffviewer/README.md) for details. |
| 42 | + |
| 43 | + $ diffviewer/run.py -c results samples/java/0/Test.java samples/java/1/Test.java |
| 44 | + |
| 45 | +You can run both Diff/AST and DiffViewer with the following command line. |
| 46 | + |
| 47 | + $ ./cca.py diffast -c results --view samples/java/0/Test.java samples/java/1/Test.java |
| 48 | + |
| 49 | +## Installing parsers and Diff/AST |
| 50 | + |
| 51 | +### Requirements |
| 52 | + |
| 53 | +* [OCaml](http://ocaml.org/) (>=4.14) |
| 54 | +* [OPAM](https://opam.ocaml.org/) |
| 55 | + |
| 56 | +### Installation |
| 57 | + |
| 58 | +The following will install `parsesrc` and `diffast`. |
| 59 | + |
| 60 | + $ opam install diffast |
| 61 | + |
| 62 | +## Building parsers and Diff/AST |
| 63 | + |
| 64 | +You can also build the parsers and Diff/AST yourself. |
| 65 | + |
| 66 | +### Requirements |
| 67 | + |
| 68 | +* [OCaml](http://ocaml.org/) (>=4.14) |
| 69 | +* [Dune](https://github.com/ocaml/dune) |
| 70 | +* [OPAM](https://opam.ocaml.org/) (for installing bytesrw, camlzip, cryptokit, csv, git-unix, markup, menhir, sedlex, uuidm, and vlt) |
| 71 | + |
| 72 | +### Compilation |
| 73 | + |
| 74 | +The following will create `./dist/bin/{parsesrc,diffast}`. |
| 75 | + |
| 76 | + $ dune build --relocatable --prefix ./dist |
| 77 | + |
| 78 | +## Using with Git |
| 79 | + |
| 80 | +If you have built Diff/AST, you can use it with Git. Add the following lines to your `.gitconfig`. Note that `PATH_TO_THIS_REPO` should be replaced by your local path to this repository. |
| 81 | + |
| 82 | + [diff] |
| 83 | + tool = diffast |
| 84 | + [difftool] |
| 85 | + prompt = false |
| 86 | + [difftool "diffast"] |
| 87 | + cmd = PATH_TO_THIS_REPO/git_ext_diff "$LOCAL" "$REMOTE" |
| 88 | + [alias] |
| 89 | + diffast = difftool |
| 90 | + |
| 91 | +Then you should be able to use `git diffast` like `git diff`. You will be prompted to launch diffast for each source file comparison. Other file comparisons will be ignored. |
| 92 | + |
| 93 | + |
| 94 | +## Building a Docker image |
| 95 | + |
| 96 | +The following command line creates a Docker image named `diffast`. |
| 97 | + |
| 98 | + $ docker build -t diffast . |
| 99 | + |
| 100 | +## License |
| 101 | + |
| 102 | +Apache License, Version 2.0 |