1 | 1 | # SeFNet: Bridging Tabular Datasets with Semantic Feature Nets |
| 2 | +This is the repository on which the website https://sefnet.mi2.ai is hosted. To access the files relating to the article, please change the branch to main. |
2 | 3 |
3 | | -This repository contains code and resources that can be used to reproduce the results presented in the article "SeFNet: Bridging Tabular Datasets with Semantic Feature Nets". |
4 | | - |
5 | | -## Reproducing results
6 | | -### 0. Annotating datasets <br> |
7 | | -The annotation of datasets' features is a tedious process, so the annotations we made manually have been made available in the `annotations` directory. Every annotation file is in .csv format and it consists of two columns: column_name (original feature names) and term_id (SNOMED-CT term ids). |
8 | | - |
9 | | -### 1. Calculating similarity between terms <br> |
10 | | -Term similarity is calculated by a Java project built with Maven. The necessary dependency information and Java configuration are contained in the file pom.xml. Key functionalities, such as computing semantic similarity between terms, are implemented in the [slib-sml](https://github.com/sharispe/slib) library.
11 | | - |
12 | | -To reproduce our results, you first have to get access to the [SNOMED-CT ontology](https://www.snomed.org/get-snomed). After downloading the ontology, place the folder in the root directory of the repository. In our research we used the US version released on March 1, 2023.
13 | | - |
14 | | -Once the ontology files are present, all that is needed is to execute AllTermsSimilarity.java.
15 | | - |
16 | | -### 2. Calculating DOSS matrix <br> |
17 | | -Before the DOSS matrix can be calculated, Python and the necessary packages (`numpy` and `pandas`) must be installed. We used Python 3.9 and the package versions specified in requirements.txt.
18 | | -``` |
19 | | -pip install -r requirements.txt |
20 | | -``` |
21 | | -Now all that is required is to execute the script:
22 | | -``` |
23 | | -python DOSS.py |
24 | | -``` |
25 | | - |
26 | | -## Repository structure |
27 | | -``` |
28 | | -├── annotations - directory containing datasets annotations |
29 | | -├── calculate-term-similarities |
30 | | -│ ├── src/main/java |
31 | | -│ │ ├── AllTermsSimilarity.java - calculate semantic similarity between all annotated terms (term_similarities.csv) |
32 | | -│ │ ├── Dataset2DatasetSimilarity.java - calculate semantic similarity between terms in two datasets |
33 | | -│ │ ├── SingleTermSimilarity.java - calculate semantic similarity between two terms |
34 | | -│ ├── pom.xml - maven project configuration |
35 | | -├── datasets - directory containing datasets which could be shared |
36 | | -├── DOSS.py - python script which creates DOSS_matrix.csv |
37 | | -├── DOSS_matrix.csv |
38 | | -├── README.md |
39 | | -├── annotations.csv - annotations of all used terms |
40 | | -├── requirements.txt - required Python packages
41 | | -└── term_similarities.csv - semantic similarity between all annotated terms calculated in AllTermsSimilarity.java |
42 | | -``` |
43 | | - |
44 | | -## Citation |
45 | | -``` |
46 | | -TBC |
47 | | -``` |
| 4 | +The website is powered by [Hugo](https://gohugo.io/) and the [PaperMod theme](https://github.com/adityatelange/hugo-PaperMod/).