MolJSON-data

This repository contains the data and analysis scripts associated with the MolJSON paper. To use MolJSON in your own work, please see: https://github.com/oxpig/MolJSON.

Warning

This is the data and analysis repository for the MolJSON paper, not the main MolJSON package repository.

Contents

questions/: benchmark question files
model_responses/raw/: raw model outputs
model_responses/checked/: evaluated model outputs with correctness labels
evaluation_scripts/: scripts used to evaluate model responses
analysis_scripts/: scripts used to generate analysis figures
analysis_outputs/plots/: generated figures
submission_scripts/: scripts used to generate model responses

Dependencies

To create a conda environment for the evaluation and analysis scripts:

conda create -n moljson-data -c conda-forge rdkit pandas numpy matplotlib selfies tqdm openjdk
conda activate moljson-data

To run the submission scripts you will also need:

pip install openai anthropic
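As a quick sanity check after installation, the sketch below reports which of the packages named above cannot be imported and whether a java executable is on PATH. It is not part of the repository's scripts; the helper names missing_modules and java_available are illustrative, and the check assumes each package's import name matches its install name.

```python
import importlib.util
import shutil

# Import names for the packages listed in the install commands above
# (assumed to match the install names).
CORE_MODULES = ["rdkit", "pandas", "numpy", "matplotlib", "selfies", "tqdm"]
SUBMISSION_MODULES = ["openai", "anthropic"]


def missing_modules(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def java_available():
    """Return True if a `java` executable is on PATH (openjdk provides one)."""
    return shutil.which("java") is not None


if __name__ == "__main__":
    print("Missing core modules:", missing_modules(CORE_MODULES) or "none")
    print("Missing submission modules:", missing_modules(SUBMISSION_MODULES) or "none")
    print("java on PATH:", java_available())
```

Run it inside the activated moljson-data environment. The submission modules are only needed if you intend to run the submission scripts.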

Notes

The question and response files are stored as gzip-compressed .gz files. Decompress them with a command such as gzip -d filename.gz.
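The compressed files can also be read directly from Python without decompressing them on disk. The sketch below uses the standard-library gzip module to preview the first few lines of a file. It assumes the file is UTF-8 text inside the archive; the function name preview_gz is illustrative and does not come from the repository.

```python
import gzip
from itertools import islice


def preview_gz(path, n_lines=5):
    """Return the first n_lines of a gzip-compressed text file as strings."""
    # "rt" opens the archive in text mode, so lines come back as str, not bytes.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, n_lines)]
```

For example, preview_gz("filename.gz") returns up to five lines, which is usually enough to see how a file is structured before loading it in full.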

The model_responses/checked files contain the evaluated benchmark outputs and correctness labels used for analysis.

The OPSIN parser used for IUPAC-to-SMILES conversion must be downloaded separately from https://github.com/dan2097/opsin. In this work we used OPSIN version 2.9.0.
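For orientation, here is a minimal sketch of calling a downloaded OPSIN jar from Python. It follows the command-line form shown in the OPSIN documentation (the -osmi option, with an input file of one name per line and an output file of SMILES). The jar path is a placeholder you must supply, the function names are illustrative, and the evaluation scripts in this repository may invoke OPSIN differently.

```python
import subprocess


def opsin_command(jar_path, names_file, smiles_file):
    """Build the OPSIN CLI call that writes one SMILES per input name."""
    # jar_path is a placeholder: point it at the OPSIN jar you downloaded.
    return ["java", "-jar", jar_path, "-osmi", names_file, smiles_file]


def run_opsin(jar_path, names_file, smiles_file):
    """Run OPSIN; raises CalledProcessError if the process exits with an error."""
    subprocess.run(opsin_command(jar_path, names_file, smiles_file), check=True)
```

The java executable comes from the openjdk package in the conda environment above.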

Citation

Please use the following citation when referencing or using MolJSON.

@article{runcie2026moljson,
  title = {Molecular Representations for Large Language Models},
  author = {Runcie, Nicholas T. and Imrie, Fergus and Deane, Charlotte M.},
  year = {2026},
  journal = {arXiv preprint arXiv:2605.01822},
  doi = {10.48550/arXiv.2605.01822},
  url = {http://arxiv.org/abs/2605.01822},
}
