Requesting scores for GPT 5.1 and GPT 5.2 (Possibility of crowd-sourcing evaluations for new models?) #6

@apoorvagnihotri

Hello, I found this work to be one of the first that can actually test LLMs on their SWE performance independently of the SWE agent being used.

Is there any way I could contribute evaluation numbers for newer models as they are released? I probably would not be able to run the whole benchmark, but it would be great if I could contribute results for some of the issues. That way the evaluations could be democratized: whenever a new model comes out, people could each contribute part of the evaluation.
