Skip to content

[runtime] Support per-event-type configurable log levels for event log#609

Merged
wenjin272 merged 13 commits into
apache:mainfrom
weiqingy:541-impl
May 12, 2026
Merged

[runtime] Support per-event-type configurable log levels for event log#609
wenjin272 merged 13 commits into
apache:mainfrom
weiqingy:541-impl

Conversation

@weiqingy
Copy link
Copy Markdown
Collaborator

@weiqingy weiqingy commented Apr 7, 2026

Linked issue: #541 , #552

Purpose of change

Add per-event-type configurable log levels (OFF, STANDARD, VERBOSE) to the event log system, as designed in #552.

Key changes:

  • EventLogLevel enum (OFF, STANDARD, VERBOSE) with hierarchical config key inheritance (event-log.type.<EVENT_TYPE>.level)
  • At STANDARD level, large content fields are truncated using configurable per-field thresholds (max-string-length, max-array-elements, max-depth) with JSON-parseable wrapper objects (truncatedString, truncatedList, truncatedObject)
  • Top-level logLevel and eventType fields added to JSON log record schema for downstream filtering
  • EventFilter removed — fully subsumed by the log level system
  • eventLogTruncatedEvents counter metric for observability
  • Python config options and E2E tests

Config options:

Key Default Description
event-log.level STANDARD Root default log level
event-log.type.<EVENT_TYPE>.level (inherits) Per-type override
event-log.standard.max-string-length 2000 Max string chars at STANDARD
event-log.standard.max-array-elements 20 Max array elements at STANDARD
event-log.standard.max-depth 5 Max nesting depth at STANDARD

Tests

  • EventLogLevelResolverTest — 11 tests: exact match, hierarchy inheritance, root/built-in defaults, case insensitivity, edge cases
  • JsonTruncatorTest — 8 tests: string/array/depth truncation, wrappers, protected fields, composition, disabled thresholds
  • FileEventLoggerTest — 7 new tests: STANDARD truncation, VERBOSE preservation, OFF suppression, per-type override, hierarchical inheritance, JSON schema fields, backward-compatible deserialization
  • Python E2E tests — 3 new: VERBOSE, OFF, STANDARD truncation

API

  • New public API: EventLogLevel enum in api module
  • Removed public API: EventFilter interface (0.x pre-release, no known users)
  • Modified: EventLoggerConfig — removed eventFilter field/builder method

Documentation

  • doc-needed

weiqingy added 5 commits April 6, 2026 22:15
- Create EventLogLevel enum (OFF, STANDARD, VERBOSE) with case-insensitive
  fromString() parser in api module
- Add 4 new config options to AgentConfigOptions: event-log.level,
  event-log.standard.max-string-length, event-log.standard.max-array-elements,
  event-log.standard.max-depth
- Delete EventFilter.java — subsumed by the log level system
- Remove eventFilter field and references from EventLoggerConfig and
  FileEventLogger
- Remove 6 EventFilter-related tests from FileEventLoggerTest

Part of apache#541: per-event-type configurable log levels.
- EventLogLevelResolver: resolves effective EventLogLevel for event type
  strings using hierarchical config key inheritance (exact match → parent
  package walk-up → root default → built-in STANDARD). Caches results in
  ConcurrentHashMap. Takes Map<String, Object> to avoid runtime→plan
  module dependency.
- JsonTruncator: truncates Jackson JsonNode trees per configurable
  thresholds (max-string-length, max-array-elements, max-depth). Produces
  JSON-parseable wrapper objects (truncatedString, truncatedList,
  truncatedObject). Protected fields (eventType, id, attributes) at top
  level are never truncated.
- Comprehensive unit tests for both classes covering exact match,
  hierarchy inheritance, edge cases, all truncation types, composition,
  and disabled thresholds.

Part of apache#541: per-event-type configurable log levels.
…operator

- EventLogRecord: add logLevel, eventType, truncator, and
  truncatedEventsCounter fields. Old 2-arg constructor defaults to
  VERBOSE for backward compatibility.
- EventLogRecordJsonSerializer: write logLevel and eventType as
  top-level JSON fields. Apply JsonTruncator when level is STANDARD,
  increment truncatedEventsCounter when truncation occurs.
- EventLogRecordJsonDeserializer: read optional logLevel field,
  default to VERBOSE for old records (backward compat).
- FileEventLogger: resolve log level per event via
  EventLogLevelResolver, skip OFF events, initialize truncator from
  agent config. Accept truncation counter via setter.
- ActionExecutionOperator: pass full AgentConfiguration.getConfData()
  to logger via agentConfig property. Wire truncation counter from
  BuiltInMetrics to FileEventLogger after open().
- BuiltInMetrics: add eventLogTruncatedEvents counter with getter.
- FileEventLoggerTest: add 7 new tests for STANDARD truncation,
  VERBOSE preservation, OFF suppression, per-type override,
  hierarchical inheritance, new JSON fields, backward deserialization.

Part of apache#541: per-event-type configurable log levels.
- Add EVENT_LOG_LEVEL, EVENT_LOG_MAX_STRING_LENGTH,
  EVENT_LOG_MAX_ARRAY_ELEMENTS, EVENT_LOG_MAX_DEPTH config options to
  Python AgentConfigOptions (keys and defaults match Java)
- Update existing test_python_event_logging to verify new logLevel and
  top-level eventType fields in JSON output
- Add test_event_log_verbose_level: verifies VERBOSE produces full
  content without truncation
- Add test_event_log_off_level: verifies OFF suppresses all event
  logging
- Add test_event_log_standard_truncation: verifies STANDARD with low
  max-string-length threshold produces truncatedString wrappers
- Extract _run_event_logging_pipeline() and _read_log_records() helpers

Part of apache#541: per-event-type configurable log levels.
- FileEventLogger: use PythonEvent.getEventType() (Python module path)
  for level resolution instead of EventContext.getEventType() (Java
  wrapper class name). Without this, per-type Python config keys like
  event-log.type.flink_agents.api.events.event.OutputEvent.level would
  never match.
- EventLogRecordJsonSerializer: use Python module path for top-level
  eventType field for PythonEvents, consistent with the eventType
  inside the event object.
- BuiltInMetrics: remove dead markEventTruncated() method — counter is
  incremented via serializer path directly.

Part of apache#541: per-event-type configurable log levels.
@github-actions github-actions Bot added doc-needed Your PR changes impact docs. fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. priority/major Default priority of the PR or issue. labels Apr 7, 2026
@weiqingy
Copy link
Copy Markdown
Collaborator Author

weiqingy commented Apr 7, 2026

Hey @wenjin272 @xintongsong could you please take a look at the PR? Feedback is welcome. Thanks!

Copy link
Copy Markdown
Collaborator

@wenjin272 wenjin272 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @weiqingy, thanks for your work. I left some minor comments.

Map<String, Object> data = confData != null ? confData : Collections.emptyMap();

// Parse root default
Object rootValue = data.get(ROOT_LEVEL_KEY);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we just use EVENT_LOG_LEVEL.getKey() here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. Changed to ConfigOption<EventLogLevel> with EventLogLevel.STANDARD as the default. AgentConfiguration.get() already supports enum types via ConfigurationUtils.convertValue(), so this works end-to-end.

if (pythonType != null) {
topLevelEventType = pythonType;
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for extracting the python event type is the same as in FileEventLogger. We can extract a common util function.

Copy link
Copy Markdown
Collaborator Author

@weiqingy weiqingy Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Extracted EventUtil.resolveEventType(Event, EventContext) and replaced the duplicated logic in both FileEventLogger.append() and EventLogRecordJsonSerializer.serialize().

private final EventLogLevel logLevel;
private final String eventType;
private final transient JsonTruncator truncator;
private final transient Counter truncatedEventsCounter;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can mark truncator and truncatedEventsCounter as @Nullable.
Additionally, these two members are actually required variables for the serialization of EventLogRecord, rather than part of its content. During deserialization, we have to set them to null. I wonder if they could be removed from within EventLogRecord itself.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added @Nullable on both fields.

Regarding removing them from EventLogRecord — I agree they are serialization plumbing rather than content. However, Jackson's custom serializer (EventLogRecordJsonSerializer) receives only the record object via serialize(EventLogRecord record, ...). Since the ObjectMapper is shared/static, there's no clean way to pass the truncator and counter to the serializer except through the record itself. The transient + @Nullable markers signal that these are not part of the logical content. Happy to explore alternatives if you have a preferred pattern in mind.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple approach might be to move the truncation from the Record serialization process into FileEventLogger.append(), like:

@Override
public void append(EventContext context, Event event) throws Exception {
    if (writer == null) {
        throw new IllegalStateException("FileEventLogger not initialized. Call open() first.");
    }

    String eventTypeStr = EventUtil.resolveEventType(event, context);

    // Resolve log level and skip OFF events
    EventLogLevel level =
            levelResolver != null ? levelResolver.resolve(eventTypeStr) : EventLogLevel.VERBOSE;
    if (level == EventLogLevel.OFF) {
        return;
    }

    EventLogRecord record = new EventLogRecord(context, event, level);
    // All events should be JSON serializable, since we check it when sending events to context:
    // RunnerContextImpl.sendEvent
    JsonNode tree = MAPPER.valueToTree(record);
    if (level == EventLogLevel.STANDARD && truncator != null) {
        JsonNode eventNode = tree.get("event");
        if (eventNode instanceof ObjectNode) {
            boolean truncated = truncator.truncate((ObjectNode) eventNode);
            if (truncated && truncatedEventsCounter != null) {
                truncatedEventsCounter.inc();
            }
        }
    }
    String json =
            prettyPrint
                    ? MAPPER.writerWithDefaultPrettyPrinter().writeValueAsString(tree)
                    : MAPPER.writeValueAsString(tree);
    writer.println(json);
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the concrete sketch — followed it in c4596bc, and combined it with Xintong's broader concern to drop logLevel from EventLogRecord as well, since it shares the same "not part of the event content" character as the truncator. EventLogRecord is now a pure (context, event) carrier, and FileEventLogger.append() owns truncation orchestration plus injection of the logLevel field on the way out.

The on-disk JSON layout is unchanged — entries still serialize as timestamp, logLevel, eventType, event — and the deserializer silently ignores any logLevel field present in older log files, so existing logs keep replaying without changes.

* values are "OFF", "STANDARD", and "VERBOSE". Defaults to "STANDARD".
*/
public static final ConfigOption<String> EVENT_LOG_LEVEL =
new ConfigOption<>("event-log.level", String.class, "STANDARD");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can just use enum here

public static final ConfigOption<EventLogLevel> EVENT_LOG_LEVEL =
            new ConfigOption<>("event-log.level", EventLogLevel.class, EventLogLevel.STANDARD);

Copy link
Copy Markdown
Collaborator Author

@weiqingy weiqingy Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR.

this.rootDefault =
rootValue != null
? EventLogLevel.fromString(rootValue.toString())
: BUILT_IN_DEFAULT;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we just use EVENT_LOG_LEVEL.getDefaultValue() here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Replaced BUILT_IN_DEFAULT with AgentConfigOptions.EVENT_LOG_LEVEL.getDefaultValue(). Both removed constants are now derived from the single config option definition.

return;
}
eventLogger.open(new EventLoggerOpenParams(runtimeContext));
if (eventLogger instanceof FileEventLogger) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it necessary to specifically check the type of eventLogger here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instanceof check is needed because setTruncatedEventsCounter() is specific to FileEventLogger — it's not on the EventLogger interface. The truncation counter is registered centrally in BuiltInMetrics (alongside numOfEventProcessed, numOfActionsExecuted, etc.) so all operator metrics live in one place under the same FlinkAgentsMetricGroupImpl hierarchy. The operator then passes the counter reference to FileEventLogger after open().

We considered adding the counter to the EventLogger interface or having the logger self-register, but both options would either leak Flink metrics concerns into the public API or break the centralized metrics pattern. The instanceof check keeps the interface clean while allowing FileEventLogger to participate in the operator's metrics.

Open to suggestions if you'd prefer a different approach here.

- Use ConfigOption<EventLogLevel> enum type instead of String
- Derive root key and default from EVENT_LOG_LEVEL config constant
- Add @nullable on transient truncator/counter fields in EventLogRecord
- Extract EventUtil.resolveEventType() for PythonEvent type resolution
@weiqingy
Copy link
Copy Markdown
Collaborator Author

@wenjin272 Thanks for the review! I’ve addressed your comments - could you take another look?

Comment on lines +127 to +128
truncated |= truncateArrayContents((ArrayNode) child, depth + 1);
JsonNode replacement = truncateArray((ArrayNode) child);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we truncate the array before truncating the array contents? If an element is omitted, we should not bother to truncate its content at all.

Copy link
Copy Markdown
Collaborator Author

@weiqingy weiqingy Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — applied in 4356979. Both call sites now truncate the array first and only recurse into the elements that survive: the truncateObject array branch and the symmetric path in truncateArrayContents for nested arrays.

Added testRecursionSkipsDroppedTailElements, which holds references to the dropped tail elements and asserts their content remains unmutated, so the invariant is locked in against future regressions.

Comment on lines +47 to +50
private final EventLogLevel logLevel;
private final String eventType;
@Nullable private final transient JsonTruncator truncator;
@Nullable private final transient Counter truncatedEventsCounter;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the log-level and truncator in the record?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. In c4596bc EventLogRecord is reduced to just (context, event); logLevel, eventType, truncator, and truncatedEventsCounter are all gone from the record.

Truncation orchestration and the counter live in FileEventLogger.append() now, and the logLevel field in the on-disk JSON is emitted by the logger at write time rather than carried on the record.

EventLogRecord no longer carries serialization-time state (logLevel,
eventType, truncator, truncatedEventsCounter). It returns to a pure
(context, event) data carrier. FileEventLogger.append() now resolves
the level, builds the JSON tree, injects logLevel, truncates the
event subtree at STANDARD, and writes — keeping the same on-disk
JSON layout (timestamp, logLevel, eventType, event).

The deserializer silently ignores any logLevel field present in
older log files for backward compatibility.

Addresses review feedback from @wenjin272 (apache#609 r3107948831, replying
to round-1 r3061919972) and @xintongsong (apache#609 r3108788804).
In JsonTruncator, run truncateArray before truncateArrayContents at
both call sites (truncateObject array branch and the nested-array
branch in truncateArrayContents). When an array is over the size cap,
recursion now visits only the retained subset inside the wrapper's
truncatedList — elements that will be dropped are never traversed.

Adds a regression test that holds references to dropped tail
ObjectNodes and asserts their long-string fields remain unmutated
after truncation.

Addresses review feedback from @xintongsong (apache#609 r3108615942).
@wenjin272
Copy link
Copy Markdown
Collaborator

LGTM

Copy link
Copy Markdown
Contributor

@xintongsong xintongsong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wenjin272
Copy link
Copy Markdown
Collaborator

Hi, @weiqingy. Due to the refactoring of Event in #631, this PR needs to resolve some conflicts. Mainly about the SerDer for Event.

weiqingy added 3 commits May 10, 2026 23:52
Resolve conflicts introduced by the Event SerDer refactor in apache#631
(commit 071c3a0). The approved on-disk log layout is preserved —
{timestamp, logLevel, eventType, event:{...}} — with the eventType
value adopting the short type string (e.g. "_input_event") that
apache#631 introduced repo-wide.

Conflict resolution:
- EventLogRecordJsonDeserializer: drop polymorphic typed reconstruction
  (deserialize as base Event); retain backward-compat note that older
  records' top-level logLevel is silently ignored.
- EventLogRecordJsonSerializer: keep top-level eventType for grep
  ergonomics, source it from event.getType(), drop PythonEvent path.
- FileEventLoggerTest: drop reintroduced EventFilter tests (this PR
  replaces filter with EventLogLevel).

Adapt to apache#631:
- EventUtil: remove resolveEventType() — it referenced the now-deleted
  PythonEvent. FileEventLogger.append uses event.getType() directly.
- JsonTruncator: traverse the attributes envelope so STANDARD-level
  truncation still reaches the user payload, which apache#631 moved from
  event fields into event.attributes.
- FileEventLoggerTest: update assertions to read fields from
  event.attributes.<field>; rewrite testHierarchicalInheritance using
  custom dotted event types since built-in events no longer have
  dot-separated parents.
Resolve conflicts with the ActionExecutionOperator manager-class
refactor in apache#546 (commit 1931403).

- Drop our PR's private initEventLogger from ActionExecutionOperator —
  the refactor moved this responsibility into EventRouter.
- Inline the FileEventLogger truncated-events-counter wiring into
  EventRouter.initEventLogger so the metric is still attached after
  the move.
- Drop the duplicate manager-managed helpers (maybeInitActionState,
  maybePersistTaskResult, setupDurableExecutionContext, persist,
  processEligibleWatermarks, createOrGetRunnerContext,
  createEventLogger, maybeInitActionStateStore,
  getCurrentSubtaskKeyGroupRange, isKeyOwnedByCurrentSubtask) from
  ActionExecutionOperator — their call sites already route through
  the manager classes after the auto-merge.
…e#546 merge

The per-event-type log-level feature reads `event-log.level` and
`event-log.standard.*` from the agent config map that the operator
forwards to FileEventLogger via AGENT_CONFIG_PROPERTY_KEY. Our
original wiring lived in ActionExecutionOperator.createEventLogger.
When apache#546 moved event-logger setup into EventRouter, it carried
forward baseLogDir and prettyPrint but not the agent-config map, so
the resolver always fell back to the built-in STANDARD default and
ignored every user override.

This broke three Python e2e tests added by this PR:
test_event_log_verbose_level, test_event_log_off_level, and
test_event_log_standard_truncation. The Java FileEventLoggerTest unit
tests still passed because they construct EventLoggerConfig directly.

Re-add the AGENT_CONFIG_PROPERTY_KEY property in
EventRouter.createEventLogger.
@weiqingy
Copy link
Copy Markdown
Collaborator Author

Thanks @wenjin272 @xintongsong . I’ve updated the PR to resolve the branch conflicts.

@wenjin272 wenjin272 merged commit c72f8f6 into apache:main May 12, 2026
23 checks passed
@weiqingy weiqingy mentioned this pull request May 12, 2026
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-needed Your PR changes impact docs. fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. priority/major Default priority of the PR or issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants