Commit 2c3b79a

committed
initial
0 parents  commit 2c3b79a

3 files changed

Lines changed: 444 additions & 0 deletions

File tree

.gitignore

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
*~
2+
*.cmo
3+
*.cmi
4+
*.cmx
5+
*.cmxs
6+
*.cmxa
7+
*.annot
8+
*.output
9+
*.automaton
10+
*.conflicts
11+
*.o
12+
*.a
13+
*.d
14+
*.di
15+
*.opt
16+
tmp
17+
_build
18+

README.md

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
1+
# Diff/AST: A Fine-Grained Source Code Differencing Tool
2+
3+
Diff/AST currently supports Python, Java, Verilog, Fortran, and C/C++ via dedicated parsers.
4+
It compares abstract syntax trees (ASTs) node by node, whereas conventional `diff` tools compare arbitrary text files line by line.
5+
Its differencing algorithm builds on [an algorithm](https://doi.org/10.1137/0218082) for computing the *tree edit distance (TED)* between two ordered labeled trees. The TED between two trees is the minimum (weighted) number of edit operations needed to transform one tree into the other.
6+
Unfortunately, applying TED algorithms directly to real-world ASTs is generally infeasible because [their computational complexity is, at best, essentially quadratic in the number of AST nodes](https://doi.org/10.1016/j.tcs.2004.12.030).
7+
Diff/AST therefore applies a TED algorithm sparingly, in a divide-and-conquer manner, backed by elaborate heuristics that approximate tree edit distances.
8+
Nevertheless, Diff/AST can still take a long time on large, non-trivial inputs, so it always caches its results.
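
To make the notion of tree edit distance concrete, here is a minimal sketch of the textbook recursive definition for ordered labeled trees. It assumes unit costs for insert, delete, and relabel, and represents a tree as a hypothetical `(label, children)` tuple. It is neither the optimized algorithm cited above nor Diff/AST's own implementation, and it is only practical for very small trees.

```python
from functools import lru_cache


def tree(label, *children):
    """Build an immutable ordered tree: (label, (child, child, ...))."""
    return (label, tuple(children))


def size(forest):
    """Count the nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Edit distance between two ordered forests, unit costs assumed."""
    if not f1:
        return size(f2)  # insert every remaining node of f2
    if not f2:
        return size(f1)  # delete every remaining node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    rest1, rest2 = f1[:-1], f2[:-1]
    return min(
        # Delete the rightmost root of f1; its children take its place.
        forest_dist(rest1 + kids1, f2) + 1,
        # Insert the rightmost root of f2; its children take its place.
        forest_dist(f1, rest2 + kids2) + 1,
        # Map the two rightmost roots onto each other (relabel if needed).
        forest_dist(rest1, rest2) + forest_dist(kids1, kids2) + (label1 != label2),
    )


def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))


old = tree("f", tree("a"), tree("b"))
new = tree("f", tree("a"), tree("g", tree("c")))
print(tree_edit_distance(old, new))  # 2: relabel b to g, insert c
```

Even with memoization, the number of distinct subforest pairs grows quickly with tree size, which illustrates why a direct application to large ASTs is impractical.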
9+
10+
Diff/AST is able to export ASTs, changes between them, and other syntactic/semantic information as *facts* in
11+
[XML](https://www.w3.org/TR/xml11/) or [N-Triples](https://www.w3.org/2001/sw/RDFCore/ntriples/).
12+
In particular, facts in N-Triples format can be loaded into an RDF store such as
13+
[Virtuoso](https://github.com/openlink/virtuoso-opensource) to build a *factbase* or a database of facts.
14+
Factbases are intended to be queried for software engineering tasks such as
15+
[code comprehension](https://github.com/ebt-hpc/cca),
16+
[debugging](https://stair.center/archives/research/ddj-esecfse2018),
17+
[change pattern mining](https://ieeexplore.ieee.org/document/7081845), and
18+
[code homology analysis](https://link.springer.com/chapter/10.1007/978-3-642-12029-9_7).
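
As an illustration of the N-Triples format mentioned above, the following sketch serializes a single fact as one N-Triples line. The `http://example.org/fact/` namespace, the `label` predicate, and the helper names are hypothetical and chosen only for this example; they are not the vocabulary that Diff/AST actually emits.

```python
EX = "http://example.org/fact/"  # hypothetical namespace, not Diff/AST's


def iri(value):
    """Wrap an absolute IRI in angle brackets."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping the characters N-Triples requires."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Format one statement; each N-Triples line ends with ' .'."""
    return f"{subject} {predicate} {obj} ."


print(triple(iri(EX + "node/1"), iri(EX + "label"), literal("MethodDeclaration")))
# <http://example.org/fact/node/1> <http://example.org/fact/label> "MethodDeclaration" .
```

Because every statement occupies exactly one line, files in this format can be streamed or split before bulk loading into an RDF store.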
19+
20+
Diff/AST is an experimental implementation of the tree differencing algorithm
21+
reported in the following paper:
22+
23+
Masatomo Hashimoto and Akira Mori, "Diff/TS: A Tool for Fine-Grained Structural Change Analysis,"
24+
In *Proc. 15th Working Conference on Reverse Engineering*, 2008, pp. 279-288,
25+
DOI: [10.1109/WCRE.2008.44](https://doi.org/10.1109/WCRE.2008.44).
26+
27+
## Screenshots
28+
29+
You can see the results of comparing some pairs of source files taken from [samples](samples) [here](https://codinuum.github.io/gallery-cca).
30+
31+
## Quick start
32+
33+
You can try Diff/AST right away using [Docker](https://www.docker.com/) and [a ready-made container image](https://hub.docker.com/r/codinuum/diffast).
34+
35+
$ docker pull codinuum/diffast
36+
37+
The following command runs Diff/AST within a container to compare sample Java programs and then saves the results in the `results` directory on the host.
38+
39+
$ ./cca.py diffast -c results samples/java/0/Test.java samples/java/1/Test.java
40+
41+
Once you have built [DiffViewer](diffviewer), you can inspect the AST differences in a viewer window. See [`diffviewer/README.md`](diffviewer/README.md) for details.
42+
43+
$ diffviewer/run.py -c results samples/java/0/Test.java samples/java/1/Test.java
44+
45+
You can run both Diff/AST and DiffViewer with a single command:
46+
47+
$ ./cca.py diffast -c results --view samples/java/0/Test.java samples/java/1/Test.java
48+
49+
## Installing parsers and Diff/AST
50+
51+
### Requirements
52+
53+
* [OCaml](http://ocaml.org/) (>=4.14)
54+
* [OPAM](https://opam.ocaml.org/)
55+
56+
### Installation
57+
58+
The following will install `parsesrc` and `diffast`.
59+
60+
$ opam install diffast
61+
62+
## Building parsers and Diff/AST
63+
64+
You can also build the parsers and Diff/AST yourself.
65+
66+
### Requirements
67+
68+
* [OCaml](http://ocaml.org/) (>=4.14)
69+
* [Dune](https://github.com/ocaml/dune)
70+
* [OPAM](https://opam.ocaml.org/) (for installing bytesrw, camlzip, cryptokit, csv, git-unix, markup, menhir, sedlex, uuidm, and vlt.)
71+
72+
### Compilation
73+
74+
The following will create `./dist/bin/{parsesrc,diffast}`.
75+
76+
$ dune build --relocatable --prefix ./dist
77+
78+
## Using with Git
79+
80+
If you have built Diff/AST, you can use it with Git. Add the following lines to your `.gitconfig`. Note that `PATH_TO_THIS_REPO` should be replaced by your local path to this repository.
81+
82+
[diff]
83+
tool = diffast
84+
[difftool]
85+
prompt = false
86+
[difftool "diffast"]
87+
cmd = PATH_TO_THIS_REPO/git_ext_diff "$LOCAL" "$REMOTE"
88+
[alias]
89+
diffast = difftool
90+
91+
Then you should be able to use `git diffast` like `git diff`. You will be prompted to launch diffast for each source file comparison. Other file comparisons will be ignored.
92+
93+
94+
## Building a Docker image
95+
96+
The following command creates a Docker image named `diffast`.
97+
98+
$ docker build -t diffast .
99+
100+
## License
101+
102+
Apache License, Version 2.0
