Add structured accessors on Reader for actions, signing info, and ingredients #289

Description

@HamzaJavaid-gh

Summary

Every consumer of Reader that needs more than the raw JSON string ends up writing the same small walker to reach three sub-structures of the manifest store: the c2pa.actions action list, signature_info, and ingredients. I'd like to propose three thin, additive accessors on Reader (get_actions(), get_signing_info(), and get_ingredients()) that replace this boilerplate while staying consistent with the existing get_active_manifest() / get_manifest(label) shape.

Before I open a PR, I want to align on the API shape and a handful of design decisions.

Motivation

Here is a concrete example from a downstream project that uses c2pa-python to detect AI-generated images. Today, reading actions, signing info, and ingredients from a Reader looks like this (~30 lines):

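import json

# Parse the raw JSON string from Reader.json() into a dict.
manifest_store = json.loads(reader.json())
manifest = manifest_store["manifests"][manifest_store["active_manifest"]]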
sig_info = manifest.get("signature_info", {})

actions = []
assertion_labels = []
for assertion in manifest.get("assertions", []):
    label = assertion.get("label", "")
    assertion_labels.append(label)
    if label.startswith("c2pa.actions"):  # handle both v1 and v2 label variants
        for action in assertion.get("data", {}).get("actions", []):
            agent = action.get("softwareAgent", {})
            agent_name = agent.get("name") if isinstance(agent, dict) else agent
            actions.append({"action": action.get("action"), "software_agent": agent_name, ...})

ingredients = manifest.get("ingredients", [])

With the proposed accessors, the same three reads become:

sig_info = reader.get_signing_info() or {}
actions = reader.get_actions() or []
ingredients = reader.get_ingredients() or []

The walker code is duplicated across every project that consumes Reader. It's also easy to get subtly wrong: for example, by missing that c2pa.actions.v2 exists alongside c2pa.actions, or by forgetting that data.actions can be absent on some assertions.

Proposed public API

Three new methods on Reader, layered on top of the existing get_active_manifest() / get_manifest(label):

class Reader(ManagedResource):
    def get_actions(self, manifest_label: Optional[str] = None) -> Optional[list[dict]]: ...
    def get_signing_info(self, manifest_label: Optional[str] = None) -> Optional[dict]: ...
    def get_ingredients(self, manifest_label: Optional[str] = None) -> Optional[list[dict]]: ...

Behavior contract:

| Scenario | get_actions() | get_signing_info() | get_ingredients() |
| --- | --- | --- | --- |
| Active manifest present, sub-structure present | list[dict] | dict | list[dict] |
| Active manifest present, sub-structure missing | [] | None | [] |
| No active manifest (unsigned asset) | None | None | None |
| manifest_label is a known label | Result from that manifest | Result from that manifest | Result from that manifest |
| manifest_label is an unknown string | KeyError (from get_manifest) | KeyError | KeyError |
| Reader is closed | C2paError (matches Reader.json()) | C2paError | C2paError |
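To make the contract above concrete, here is a minimal sketch written as free functions over an already-parsed manifest-store dict rather than as Reader methods. The helper `_select_manifest`, the `store` parameter, and the sample data are hypothetical and exist only for illustration. The closed-Reader row is not modeled, since it depends on Reader internals. Label matching uses the same `startswith("c2pa.actions")` test as the walker in the Motivation section.

```python
from typing import Any, Optional


def _select_manifest(store: dict[str, Any], manifest_label: Optional[str]) -> Optional[dict]:
    """Hypothetical helper: pick a manifest by label, or the active one when label is None."""
    manifests = store.get("manifests", {})
    if manifest_label is not None:
        return manifests[manifest_label]  # unknown label raises KeyError
    active = store.get("active_manifest")
    if active is None:
        return None  # no active manifest (unsigned asset)
    return manifests.get(active)


def get_actions(store: dict[str, Any], manifest_label: Optional[str] = None) -> Optional[list[dict]]:
    manifest = _select_manifest(store, manifest_label)
    if manifest is None:
        return None
    actions: list[dict] = []
    for assertion in manifest.get("assertions", []):
        # Covers both c2pa.actions and c2pa.actions.v2, concatenated in encounter order.
        if assertion.get("label", "").startswith("c2pa.actions"):
            actions.extend(assertion.get("data", {}).get("actions", []))
    return actions


def get_signing_info(store: dict[str, Any], manifest_label: Optional[str] = None) -> Optional[dict]:
    manifest = _select_manifest(store, manifest_label)
    if manifest is None:
        return None
    return manifest.get("signature_info")  # None when the sub-structure is missing


def get_ingredients(store: dict[str, Any], manifest_label: Optional[str] = None) -> Optional[list[dict]]:
    manifest = _select_manifest(store, manifest_label)
    if manifest is None:
        return None
    return manifest.get("ingredients", [])


# Made-up sample data, shaped like the dict walked in the Motivation section.
store = {
    "active_manifest": "urn:example:1",
    "manifests": {
        "urn:example:1": {
            "signature_info": {"issuer": "Example CA"},
            "assertions": [
                {"label": "c2pa.actions", "data": {"actions": [{"action": "c2pa.created"}]}},
                {"label": "c2pa.hash.data", "data": {}},
                {"label": "c2pa.actions.v2", "data": {"actions": [{"action": "c2pa.edited"}]}},
            ],
        },
        "urn:example:2": {"assertions": []},
    },
}

active_actions = get_actions(store)
other_signing_info = get_signing_info(store, "urn:example:2")
```

With this sample, `active_actions` holds the two actions in the order their assertions appear, and `other_signing_info` is None because the second manifest has no signature_info.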

Design decisions I'd like feedback on

  1. Return dict / list[dict], not typed dataclasses.
    Rationale: consistency with get_active_manifest() and get_manifest(label), resilience to C2PA JSON schema drift (new fields pass through transparently), and no new public types to export from __init__.py. Consumers who want typing can wrap. Open to going the other way if the project has a direction toward typed accessors.

  2. manifest_label=None targets the active manifest; a string targets a specific manifest.
    Rationale: this mirrors the existing split between get_active_manifest() for None and get_manifest(label) for a specific manifest. It reuses the existing exception semantics (KeyError on unknown label) rather than inventing a third convention.

  3. get_actions() handles both c2pa.actions and c2pa.actions.v2.
    Rationale: the C2PA spec permits both labels on a single manifest. Concatenating in encounter order is the least-surprising default; alternatives (v2-only-if-present, dict-keyed-by-label) either lose information or break the flat-list contract.

  4. Do not coerce digitalSourceType to C2paDigitalSourceType.
    Rationale: the enum can trail the IPTC/CAWG taxonomy, and coercing would mean either silently dropping unknown URIs or raising on them. A separate helper for AI-source-type classification would compose cleanly on top of this without leaking an enum into a passthrough accessor.

  5. Pure Python change, no c2pa-rs or c2pa_c_ffi work.
    The accessors read from the same parsed dict that get_active_manifest() already populates via Reader.json(). No new FFI symbols and no changes to the native library.
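As a rough illustration of decision 4, the sketch below shows how a separate classifier could sit on top of the passthrough list returned by get_actions(). The name is_ai_digital_source_type() comes from the ancillary ideas in this proposal; `has_ai_action` and `AI_SOURCE_TYPES` are hypothetical names. Which IPTC URIs count as AI-generated is an assumption made here, not something the library defines.

```python
from typing import Optional

# Assumption: these two IPTC digital source type URIs are treated as AI-generated.
AI_SOURCE_TYPES = frozenset({
    "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia",
    "http://cv.iptc.org/newscodes/digitalsourcetype/compositeWithTrainedAlgorithmicMedia",
})


def is_ai_digital_source_type(uri: Optional[str]) -> bool:
    """Classify a raw digitalSourceType URI; unknown or missing URIs are simply not AI."""
    return uri in AI_SOURCE_TYPES


def has_ai_action(actions: Optional[list[dict]]) -> bool:
    """Hypothetical helper: True if any action carries an AI digital source type."""
    return any(is_ai_digital_source_type(a.get("digitalSourceType")) for a in actions or [])


# Stand-in for the list a passthrough get_actions() would return (made-up data).
actions = [
    {"action": "c2pa.created",
     "digitalSourceType": "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"},
    {"action": "c2pa.edited", "digitalSourceType": "http://example.com/some/future/type"},
]

ai_detected = has_ai_action(actions)
```

Because the accessor leaves digitalSourceType as a plain string, the unrecognized URI on the second action is neither dropped nor rejected; the classifier just returns False for it.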

Non-goals / out of scope for this proposal

  • No new public types exported from c2pa/__init__.py.
  • No typed-schema conversion of returned dicts.
  • No changes to Reader.json() / Reader.detailed_json() / crjson() output.
  • No enum coercion (see decision 4).
  • No changes to c2pa-rs.
  • Ancillary ideas (a validation-category enum on top of validation_status, a Settings helper for the official trust list, an is_ai_digital_source_type() classifier) are deliberately kept separate. Happy to pursue any of them in follow-up issues if the maintainers see value.
