refactor!: include grpc error message in tracing#422
Conversation
There was a problem hiding this comment.
@xiaoyawei I'm on board with adding this functionality in, it does match the protocol spec, but some concerns with the specific implementation, see comments.
Please let me know if you are interested in working on this within ~1 month, I know it's a very old PR. Else I might push my suggestions on top since I'm anyway planning a breaking release soon.
General feedback:
This is a breaking change to a public type (which I see no good way to avoid), so let's make the commit message and pr description something like:
refactor!: include grpc error message in tracing
We do need test coverage as well:
- happy path (percent-encoded value)
- invalid percent-encoding
- invalid utf-8
- unknown status code
- empty grpc-message header value
| /// | ||
| /// [gRPC status codes]: https://github.com/grpc/grpc/blob/master/doc/statuscodes.md#status-codes-and-their-use-in-grpc | ||
| #[derive(Clone, Copy, Debug)] | ||
| #[derive(Clone, Copy, Debug, PartialEq, Eq)] |
There was a problem hiding this comment.
We should add #[repr(i32)] rather than relying on the default isize repr while casting
| let status = or_else!(status.to_str().ok(), HeaderNotString); | ||
| let status = or_else!(status.parse::<i32>().ok(), HeaderNotInt); | ||
| let code = or_else!(headers.get("grpc-status"), GrpcStatusHeaderMissing); | ||
| let code = or_else!(GrpcCode::from_bytes(code.as_ref()), HeaderNotGrpcCode); |
There was a problem hiding this comment.
This is a behavioral regression. Previously, >16 would be propagated as non-success, now they are converted to HeaderNotGrpcCode.
The spec states:
gRPC libraries that encounter values outside this range must either propagate them directly or convert them to UNKNOWN.
There was a problem hiding this comment.
Let's make sure we have a test for this after fixing.
| percent_decode(header.as_bytes()) | ||
| .decode_utf8() | ||
| .map(|cow| cow.to_string()) | ||
| .ok() |
There was a problem hiding this comment.
Currently we are throwing away the full value in case of non-utf8. It would be more in the spirit of the spec to do a lossy decode:
percent_decode(header.as_bytes()).decode_utf8_lossy()
Implementations must not error due to receiving an invalid ASCII-Value that's a valid field-value in HTTP, but the precise behavior is not strictly defined: they may throw the value away or accept the value. If accepted, care must be taken to make sure that the application is permitted to echo the value back as metadata. For example, if the metadata is provided to the application as a list in a request, the application should not trigger an error by providing that same list as the metadata in the response.
| /// [gRPC status codes]: https://github.com/grpc/grpc/blob/master/doc/statuscodes.md#status-codes-and-their-use-in-grpc | ||
| #[derive(Clone, Copy, Debug)] | ||
| #[derive(Clone, Copy, Debug, PartialEq, Eq)] | ||
| pub enum GrpcCode { |
There was a problem hiding this comment.
Let's tag this as #[non_exhaustive] while we are anyway breaking semver
| @@ -248,15 +276,17 @@ impl ClassifyEos for GrpcEosErrorsAsFailures { | |||
| #[derive(Debug)] | |||
| pub enum GrpcFailureClass { | |||
There was a problem hiding this comment.
While we are anyway breaking semver, let's make this #[non_exhaustive].
| Error(String), | ||
| } | ||
|
|
||
| impl fmt::Display for GrpcFailureClass { |
There was a problem hiding this comment.
Let's also impl std::error::Error for GrpcFailureClass while we are anyway touching it
| } | ||
|
|
||
| impl GrpcStatus { | ||
| pub(crate) fn code(&self) -> GrpcCode { |
There was a problem hiding this comment.
Let's promote this from pub(crate) to pub, it seems useful
| message: Option<String>, | ||
| } | ||
|
|
||
| impl GrpcStatus { |
There was a problem hiding this comment.
Let's also add a pub fn message()-> Option<&str> to avoid forcing usage of the format machinery to peek the message
|
@jlizen Thanks for the review. I will try to address your comments and push something in the next a couple of days |
039613e to
68d4bdd
Compare
- Add #[repr(i32)] and #[non_exhaustive] to GrpcCode - Add #[non_exhaustive] to GrpcFailureClass - Fix behavioral regression: status codes >16 now correctly treated as non-success - Use decode_utf8_lossy() instead of discarding on invalid UTF-8 - Impl std::error::Error for GrpcFailureClass - Promote GrpcStatus methods to pub, add code_raw() and message() accessors - Export GrpcStatus from classify module - Add tests for percent-encoding, invalid UTF-8, unknown codes, and empty message
68d4bdd to
aaca45e
Compare
|
Hey @jlizen , would you take another look when you have time? |
jlizen
left a comment
There was a problem hiding this comment.
The refactor looks great, a few more small things, then it's ready to merge.
| #[cfg(not(feature = "trace"))] | ||
| let message = headers | ||
| .get("grpc-message") | ||
| .and_then(|header| header.to_str().ok().map(|s| s.to_owned())); |
There was a problem hiding this comment.
this path also needs the .decode_utf_lossy() handling
| classify_grpc_metadata_test! { | ||
| name: invalid_percent_encoding, | ||
| status: "13", | ||
| message: "bad%2Gencode", |
There was a problem hiding this comment.
this is good, but let's also have a similar test that is valid utf-8
| } | ||
| } | ||
|
|
||
| #[cfg(feature = "trace")] |
There was a problem hiding this comment.
As best as i can tell this split only exists to avoid depending on percent-encoding for non-trace, but it's a tiny no-dependencies dependency, I probably wouldn't bother with it in favor of simplifying.
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { |
There was a problem hiding this comment.
Let's also have a test probing that GrpcCode::Ok (code 0) is classified as a success
- Remove cfg(feature = "trace") gate on percent-decoding: always use percent_decode().decode_utf8_lossy() for grpc-message extraction - Make percent-encoding a non-optional dependency to simplify the code - Add test for valid UTF-8 percent-encoded message - Add test that GrpcCode::Ok is classified as success via ClassifyResponse
|
@jlizen See how your comments are addressed in the 3rd commit of this PR |
jlizen
left a comment
There was a problem hiding this comment.
Looks great! Thanks for the contribution.
Aiming to get this released within the next ~2 weeks.
Motivation
Currently the tracing middleware only captures the gRPC status code when extracting the error out of a http response. Besides the code, gRPC also requires a human-readable message for error handling, and it'll be useful if this message is extracted from the response as well.
Solution
Per gRPC over HTTP/2 Protocol, a percent-encoded message is included in the
grpc-messagefield in the response header/trailer. This PR extracts this value and includes it in a newGrpcStatusstruct withinGrpcFailureClass, such that downstream usages can use this message for tracing.Breaking changes:
GrpcFailureClass::Code(NonZeroI32)→GrpcFailureClass::Status(GrpcStatus)GrpcCodeandGrpcFailureClassare now#[non_exhaustive]GrpcCodeis now#[repr(i32)]GrpcFailureClassimplementsstd::error::Error