Commit 6ba645a: update readme (parent 6cea5db)

1 file changed (`README.md`): 55 additions, 1 deletion.

[Updated paper](docparser.pdf)

### Installation and requirements

Tested on Ubuntu 18.04 and 20.04.

A GPU significantly speeds up the generation of detection outputs, but the inference demo code can also run on a CPU.
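
The CPU/GPU choice comes up again in step 4, where two requirement files are offered. The following is a minimal, hypothetical helper (not part of DocParser) that picks one of them. It assumes an NVIDIA GPU and uses the presence of the `nvidia-smi` tool on `PATH` as a rough proxy for GPU support; the function names are illustrative.

```python
"""Hypothetical helper (not part of DocParser) for choosing the step-4 requirements file."""
import shutil


def nvidia_gpu_likely() -> bool:
    """Rough heuristic: an NVIDIA driver install usually puts `nvidia-smi` on PATH."""
    return shutil.which("nvidia-smi") is not None


def requirements_file(use_gpu: bool) -> str:
    """Return the requirements file named in step 4 for a GPU or CPU-only installation."""
    return "requirements_gpu.txt" if use_gpu else "requirements.txt"


if __name__ == "__main__":
    # Print the suggested install command; the user still decides whether to run it.
    print("pip install -r " + requirements_file(nvidia_gpu_likely()))
```

Finding `nvidia-smi` does not guarantee that the installed CUDA/cuDNN versions match what `requirements_gpu.txt` expects, so treat the printed command as a suggestion only.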

To set up via Anaconda, follow these steps:

1. Install Anaconda. Up-to-date instructions can be found at: https://docs.anaconda.com/anaconda/install/

2. Set up a Python 3.6 environment:
   `conda create -n docparser python=3.6`

3. Activate the environment:
   `source activate docparser` (or `conda activate docparser` on conda 4.4 and later)

4. Install all requirements:
   `pip install -r requirements.txt`
   - (for a GPU-enabled installation: `pip install -r requirements_gpu.txt`)

5. Install the Mask R-CNN library:
   - We used a slightly modified version of https://github.com/matterport/Mask_RCNN, though the original version should still be usable, possibly with minor adaptations.
   - Clone the repository from https://github.com/j-rausch/Mask_RCNN
   - Change into the `Mask_RCNN` directory
   - Run `python setup.py develop`

6. Install DocParser:
   - Change into the `DocParser` directory
   - Run `python setup.py develop`

7. Prepare the datasets:
   - Download the arxivdocs-target and ICDAR files as described at https://github.com/DS3Lab/arXivDocs
   - Extract the datasets into the `DocParser` directory (resulting in the structure `DocParser/datasets`).

8. Prepare the trained models:
   - Download from URL:
   - Extract the pretrained models to the `default_models` subdirectory in `DocParser/docparser/` (resulting in the structure `DocParser/docparser/default_models/`).
   - For convenience, we include the COCO pre-trained weights from https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 in the zip file.

9. To run the ICDAR demo:
   - Note that running the ICDAR 2013 evaluation script provided by the competition organizers requires a Java installation. We used `openjdk 11.0.7 2020-04-14` in our experiments.
   - If necessary, make the evaluation script executable (on Linux systems):
     `chmod a+x DocParser/docparser/utils/dataset-tools-20180206.jar`

10. From the `DocParser` directory, run `python demos/demo_inference.py` with one or more of the following command-line arguments:
    - `--page`
    - `--table`
    - `--icdar`

    For example: `python demos/demo_inference.py --page --table`
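
Steps 7 to 10 imply a specific directory layout. As a sketch, a hypothetical check script (the name `preflight_check.py` and its functions are illustrative, not part of the repository) could verify that layout from the `DocParser` directory before launching the demo. It assumes the paths given in steps 7 to 9 and treats any flag other than the three listed in step 10 as unknown; the demo script itself may accept more.

```python
"""Hypothetical pre-flight check for the DocParser demo (not part of the repository).

Run from the DocParser directory; the checked paths mirror steps 7-9 of the README.
"""
import shutil
import sys
from pathlib import Path
from typing import List, Sequence

# Flags listed in step 10; anything else is reported as unknown (an assumption).
DEMO_FLAGS = ("--page", "--table", "--icdar")

# ICDAR 2013 evaluation jar from step 9, relative to the DocParser directory.
ICDAR_JAR = Path("docparser") / "utils" / "dataset-tools-20180206.jar"


def missing_prerequisites(repo_root: Path, flags: Sequence[str]) -> List[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if not flags:
        problems.append("pass at least one of: " + ", ".join(DEMO_FLAGS))
    unknown = [flag for flag in flags if flag not in DEMO_FLAGS]
    if unknown:
        problems.append("unknown demo flag(s): " + ", ".join(unknown))
    # Step 7: datasets extracted to DocParser/datasets
    if not (repo_root / "datasets").is_dir():
        problems.append("missing datasets/ directory (step 7)")
    # Step 8: pretrained models extracted to DocParser/docparser/default_models/
    if not (repo_root / "docparser" / "default_models").is_dir():
        problems.append("missing docparser/default_models/ directory (step 8)")
    # Step 9: the ICDAR evaluation needs a Java runtime and the evaluation jar
    if "--icdar" in flags:
        if shutil.which("java") is None:
            problems.append("java executable not found on PATH (step 9)")
        if not (repo_root / ICDAR_JAR).is_file():
            problems.append("missing {} (step 9)".format(ICDAR_JAR))
    return problems


def demo_command(flags: Sequence[str]) -> List[str]:
    """Build the step-10 command line as a list of arguments."""
    return ["python", "demos/demo_inference.py"] + list(flags)


if __name__ == "__main__":
    requested = sys.argv[1:]
    issues = missing_prerequisites(Path.cwd(), requested)
    for issue in issues:
        print("problem:", issue)
    if not issues:
        print("All checks passed. Run:", " ".join(demo_command(requested)))
```

For example, `python preflight_check.py --page --icdar` would report missing directories, a missing `java` executable, or a missing evaluation jar, and otherwise print the matching demo command. It only checks that the paths exist, not that the downloaded files are complete.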

### Evaluations