Indian Supreme Court Judgments

Summary

This repo contains the code to download judgments from ecourts website. Data for bulk download is available freely from AWS. It contains judgments from 1950 to Present, along with raw metadata (in json format) and structured metadata(parquet format). Judgments are available in both English and regional Indian languages in tar format for easier download.

The data is licensed under Creative Commons Attribution 4.0 (CC-BY-4.0), which means you are free to use, share, and adapt the data as long as you provide appropriate attribution.
AWS sponsors the storage and data transfer costs of the data.
Join discord server if you want to collaborate on this repo or have any questions.
Be responsible, considerate, and think about the maintainers of the ecourts website. Avoid scraping with high concurrency.
Data downloaded using previous version of this repo from an API is availavble in Kaggle. Code for that is available in the branch old

Data

From 1950 to Present
~35K judgments in english, some of which have regional language versions.
~52.24GB of data (see dataset_sizes.csv for detailed breakdown)
Any metadata about judgment like Disposal nature, Decision date etc have also been part of the dataset.

Structure of the data in the s3 bucket

s3://indian-supreme-court-judgments/
├── data/
│   └── tar/
│       └── year=YYYY/
│           ├── english/
│           │   ├── english.tar              # Main archive (or part-*.tar for large archives)
│           │   └── english.index.json       # Index with parts info
│           └── regional/
│               ├── regional.tar
│               └── regional.index.json
└── metadata/
    ├── tar/
    │   └── year=YYYY/
    │       ├── metadata.tar
    │       └── metadata.index.json
    └── parquet/
        └── year=YYYY/
            └── metadata.parquet

Where YYYY represents the year (1950-2025).

Each year has following data:

English judgments (english.tar, or multiple part-*.tar files for large archives)
Regional language judgments (regional.tar)
Metadata (metadata.tar and metadata.parquet)
index.json files that contain info about the files in the tar files

Index File Structure (V2)

The index files use a V2 format that supports multiple archive parts. When an archive exceeds 1GB, new parts are created with timestamped names:

{
  "year": 2025,
  "archive_type": "english",
  "file_count": 8131,
  "total_size": 5740309504,
  "total_size_human": "5.35 GB",
  "created_at": "2025-01-15T07:12:14.860503Z",
  "updated_at": "2025-12-27T10:30:00.000000Z",
  "parts": [
    {
      "name": "english.tar",
      "files": ["judgment1.pdf", "judgment2.pdf"],
      "file_count": 5000,
      "size": 4000000000,
      "size_human": "3.73 GB",
      "created_at": "2025-01-15T07:11:49.283600Z"
    },
    {
      "name": "part-20251227T103000.tar",
      "files": ["judgment5001.pdf", "judgment5002.pdf"],
      "file_count": 3131,
      "size": 1740309504,
      "size_human": "1.62 GB",
      "created_at": "2025-12-27T10:30:00.000000Z"
    }
  ]
}

Columns/fields in the metadata.parquet are

title
petitioner
respondent
description
judge
author_judge
citation
case_id
cnr
decision_date
disposal_nature
court
available_languages
raw_html
path
nc_display
scraped_at
year

Working with the data in AWS

Example command to list all available years: aws s3 ls s3://indian-supreme-court-judgments/data/tar/ --no-sign-request
Example command to download English judgments for 2023: aws s3 cp s3://indian-supreme-court-judgments/data/tar/year=2023/english/english.tar . --no-sign-request
Example command to view metadata index for 2023: aws s3 cp s3://indian-supreme-court-judgments/metadata/tar/year=2023/metadata.index.json . --no-sign-request
Since the S3 bucket is public, files can also be downloaded using links like https://indian-supreme-court-judgments.s3.amazonaws.com/data/tar/year=2023/english/english.tar

See the AWS tutorials for more detailed examples of:

Downloading and extracting judgment data
Querying metadata using AWS Athena here

Local development

install uv
install dependencies: uv sync
source .venv/bin/activate
python3 download.py
VS Code extensions: Python, Pylance, ruff

Name		Name	Last commit message	Last commit date
Latest commit History 475 Commits
.github		.github
.vscode		.vscode
opendata		opendata
src		src
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.python-version		.python-version
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
LICENSE.md		LICENSE.md
README.md		README.md
archive_manager.py		archive_manager.py
calculate_dataset_sizes.py		calculate_dataset_sizes.py
clean-metadata.py		clean-metadata.py
count_judgments.py		count_judgments.py
dataset_sizes.csv		dataset_sizes.csv
download.py		download.py
package_tar_files.py		package_tar_files.py
process_metadata.py		process_metadata.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
sc-requests.md		sc-requests.md
sync_s3.py		sync_s3.py
sync_s3_fill.py		sync_s3_fill.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Indian Supreme Court Judgments

Summary

Data

Structure of the data in the s3 bucket

Index File Structure (V2)

Working with the data in AWS

Local development

About

Uh oh!

Releases

Sponsor this project

Uh oh!

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Indian Supreme Court Judgments

Summary

Data

Structure of the data in the s3 bucket

Index File Structure (V2)

Working with the data in AWS

Local development

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Sponsor this project

Uh oh!

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages