GeoLlama Logo

🌍 GeoLlama

Large Language and Vision Assistant for Earth Observation

Kha Do


A Visual Large Language Assistant for Multi-spectral Remote Sensing Data

Installation · Datasets · Training · Evaluation


Overview

Contributions

  • GeoLlama: Introduce GeoLlama, a large Vision-Language Model (VLM) for multi-spectral remote-sensing imagery.

    • Propose a novel Grounded Spectral-aware Connector (GSC) module that integrates visible and non-visible spectral bands.
    • Inject spectral knowledge into all cross-attention layers of the language model, enabling the full exploitation of spectral cues and expert knowledge across diverse remote sensing tasks.
  • GeoLlamaInstData: Construct GeoLlamaInstData, a large-scale instruction-following dataset for multi-spectral imagery.

    • Pair spectral images with rich, object-centric, and deep analysis conversations.
    • Provide rigorous training and benchmarking for VLMs in remote-sensing applications.
  • Experimental Results: Demonstrate that GeoLlama, by effectively leveraging spectral information, outperforms both general-purpose VLMs and specialized remote sensing VLMs.

    • Achieve superior performance in multi-label classification, image description, and visual question answering tasks.
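The spectral-knowledge injection described above — language hidden states attending over visible and non-visible band features at cross-attention layers — can be sketched roughly as follows. This is a minimal single-head illustration; all names (`spectral_cross_attention`), dimensions, and the residual-injection form are assumptions, not the actual GSC implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spectral_cross_attention(text_hidden, spectral_tokens, d_k=64, seed=0):
    """Hypothetical single-head cross-attention: language hidden states act
    as queries over spectral-band tokens (keys/values), with a residual add."""
    rng = np.random.default_rng(seed)
    d_text = text_hidden.shape[-1]
    d_spec = spectral_tokens.shape[-1]
    # Randomly initialized projections stand in for learned weights.
    Wq = rng.normal(scale=0.02, size=(d_text, d_k))
    Wk = rng.normal(scale=0.02, size=(d_spec, d_k))
    Wv = rng.normal(scale=0.02, size=(d_spec, d_text))
    q = text_hidden @ Wq
    k = spectral_tokens @ Wk
    v = spectral_tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_k))  # (n_text, n_spectral)
    return text_hidden + attn @ v           # inject spectral context residually

# Toy shapes: 8 text tokens (dim 32); RGB + non-visible bands -> 12 spectral tokens (dim 16)
out = spectral_cross_attention(np.zeros((8, 32)), np.ones((12, 16)))
print(out.shape)  # (8, 32)
```

In the real model this fusion would repeat at every cross-attention layer of the language backbone, so spectral cues remain available throughout decoding.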



Installation

⚠️ Important: This project is designed for Linux; other operating systems are not officially supported.

1. Clone Repository

git clone https://github.com/ikhado/GeoLlama.git
cd GeoLlama

2. Setup Environment

# Create and activate conda environment
conda create -n GeoLlama python=3.10 -y
conda activate GeoLlama

# Upgrade pip and install package
pip install --upgrade pip
pip install -e .

3. Install Training Dependencies

# Install training-specific packages
pip install -e ".[train]"
pip install flash-attn --no-build-isolation

Updating to Latest Version

git pull
pip install -e .

📊 Datasets

🌐 Primary Dataset

Download the image-caption pairs from ChatEarthNet.

Data construction process:

Data Construction Process

📁 Available JSON Files

| Dataset Type | File | Description |
| --- | --- | --- |
| Pre-training | `GeoLlama_Pre_train.json` | Initial training dataset |
| Instruction Tuning | `GeoLlama_Instruct.json` | Fine-tuning dataset |
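The exact schema is defined by the JSON files above; the record below follows the common LLaVA-style instruction format and is an illustrative assumption only (field names, paths, and text are hypothetical).

```python
import json

# Hypothetical record in the common LLaVA-style instruction format;
# the actual schema of GeoLlama_Instruct.json may differ.
record = {
    "id": "sample_0001",
    "image": "patches/S2_tile_0001.png",
    "conversations": [
        {"from": "human", "value": "<image>\nDescribe the land cover in this scene."},
        {"from": "gpt", "value": "The scene is dominated by cropland, with a river crossing the north."},
    ],
}

# Round-trip through JSON to confirm the record is serializable.
loaded = json.loads(json.dumps([record]))
print(len(loaded), loaded[0]["conversations"][0]["from"])  # 1 human
```

Checking a few sample records against the downloaded files is a quick way to confirm the real field names before writing a data loader.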

🎯 Training

Model architecture:

GeoLlama Model Architecture

We utilize the pre-trained backbone Llama-3.2-11B-Vision-Instruct and train the projector from scratch.

Training Scripts

| Training Phase | Script | Dataset |
| --- | --- | --- |
| Pre-training | `pre_train_llama32.sh` | Pre-training dataset |
| Visual Instruction Tuning | `fine_tune_llama32.sh` | Instruction fine-tuning dataset |

📈 Evaluation

For evaluation procedures, refer to the evaluation script:

./geollama/eval/test.py
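Multi-label classification (one of the evaluated tasks) is commonly scored with sample-averaged F1. The sketch below shows that metric; it is a generic illustration and not necessarily the exact metric implemented in `test.py`.

```python
import numpy as np

def multilabel_f1(y_true, y_pred):
    """Sample-averaged F1 for binary multi-label predictions, shape (N, C)."""
    tp = (y_true & y_pred).sum(axis=1)
    pred_pos = y_pred.sum(axis=1)
    true_pos = y_true.sum(axis=1)
    # Guard against division by zero for empty prediction/label sets.
    precision = np.divide(tp, pred_pos, out=np.zeros_like(tp, float), where=pred_pos > 0)
    recall = np.divide(tp, true_pos, out=np.zeros_like(tp, float), where=true_pos > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(denom), where=denom > 0)
    return f1.mean()

y_true = np.array([[1, 0, 1], [0, 1, 0]])  # ground-truth label sets per image
y_pred = np.array([[1, 0, 0], [0, 1, 0]])  # model predictions
print(round(multilabel_f1(y_true, y_pred), 3))  # 0.833
```

The first sample scores F1 = 2/3 (perfect precision, recall 0.5) and the second scores 1.0, so the sample average is 5/6 ≈ 0.833.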

Visualization Results:

Example Results


Acknowledgements

We gratefully acknowledge the following projects upon which our work is built:

GeoChat · Llama-3.2 Vision


Made with ❤️ for the Remote Sensing Community
