Commit 34714f9

observability quick tidy-up
1 parent c9af449 commit 34714f9

7 files changed

Lines changed: 89 additions & 35 deletions
Lines changed: 35 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -1,30 +1,60 @@
11
= Failure Considerations
22
:description: Data durability refers to the fault tolerance and persistence of data in the face of software or hardware failure.
3-
:page-topic-type: concept
4-
// :page-aliases: ROOT:failure-considerations,ROOT:durability,ROOT:enhanced-durability,7.6@server:developer-guide:durability.adoc
3+
:page-toclevels: 2
4+
// :page-aliases: ROOT:failure-considerations.adoc,ROOT:durability.adoc,ROOT:enhanced-durability.adoc,7.6@server:developer-guide:durability.adoc
55

66
include::project-docs:partial$attributes.adoc[]
77

88
[abstract]
99
{description}
10+
Prepare your app for the inevitable challenges of working in a distributed network environment.
11+
12+
13+
1014
Even the most reliable software and hardware might fail at some point, and along with the failures, introduce a chance of data loss.
11-
Couchbases durability features include Synchronous Replication, and the possibility to use distributed, multi-document ACID transactions.
15+
Couchbase's durability features include Synchronous Replication, and the possibility to use distributed, multi-document ACID transactions.
1216
It is the responsibility of the development team and the software architect to evaluate the best choice for each use case.
1317

18+
This page covers the durability options offered by Couchbase Server,
19+
with the rest of this section covering logging, health check, and observability --
20+
all key to understanding the health of a complex, distributed environment.
21+
22+
23+
1424
include::{version-common}@sdk:shared:partial$durability-replication-failure-considerations.adoc[tag=intro]
1525

1626
include::{version-common}@sdk:shared:partial$durability-replication-failure-considerations.adoc[tag=syncrep]
1727
include::{version-common}@sdk:shared:partial$durability-replication-failure-considerations.adoc[tag=syncrep2]
1828
include::{version-common}@sdk:shared:partial$durability-replication-failure-considerations.adoc[tag=syncrep3]
1929

20-
include::{version-common}@sdk:shared:partial$durability-replication-failure-considerations.adoc[tag=older]
21-
2230
include::{version-common}@sdk:shared:partial$durability-replication-failure-considerations.adoc[tag=performance]
2331

32+
33+
=== Legacy Durability
34+
35+
Early versions of Couchbase Server used client-verified durability.
36+
This is still available in the SDK --
37+
see the https://docs.couchbase.com/sdk-api/couchbase-scala-client/com/couchbase/client/scala/durability/index.html[API documentation on durability] for details of `PersistTo` and `ReplicateTo` --
38+
but in almost every case with current Couchbase Server versions it's best to use the guarantees offered by the Server.
39+
40+
41+
42+
2443
include::{version-common}@sdk:shared:partial$durability-replication-failure-considerations.adoc[tag=txns]
2544

45+
46+
// placeholder for discussions about what happens when a node goes down.
47+
2648
// include::{version-common}@sdk:shared:partial$durability-replication-failure-considerations.adoc[tag=failover]
2749

50+
51+
52+
53+
54+
55+
56+
////
2857
== Further Reading
2958
3059
For now, much of the discussion (concept-level documentation) can still be found interleaved in the xref:howtos:error-handling.adoc#exception-handling[practical error handling howto doc].
60+
////

modules/concept-docs/pages/response-time-observability.adoc

Lines changed: 2 additions & 3 deletions
@@ -1,10 +1,9 @@
11
= Tracing
22
:description: Tracing and Metrics provide fine-grained insight into how an application is performing, and help to diagnose when it is not.
3-
:nav-title: Request Tracing and Metrics
4-
:page-topic-type: concept
3+
// :nav-title: Request Tracing and Metrics
54
:page-aliases: ROOT:threshold-logging.adoc
5+
:page-toclevels: 2
66

7-
include::project-docs:partial$attributes.adoc[]
87

98
[abstract]
109
{description}

modules/howtos/pages/collecting-information-and-logging.adoc

Lines changed: 5 additions & 3 deletions
@@ -1,7 +1,7 @@
11
= Logging
22
:description: Configuring logging; working with the event bus; and log redaction for data security.
3-
:page-topic-type: howto
4-
:page-aliases: ROOT:logging
3+
:page-toclevels: 3
4+
:page-aliases: ROOT:logging.adoc
55

66
[abstract]
77
{description}
@@ -84,7 +84,7 @@ NOTE: Gradle automatically uses the correct SLF4J API 2.x dependency required by
8484
====
8585

8686
[configuring-log4j]
87-
==== Configuring Log4j 2 output
87+
==== Configuring Log4j 2 Output
8888

8989
Log4j 2 needs a configuration file to tell it which messages to log, where to write them, and how each message should be formatted.
9090

@@ -163,6 +163,7 @@ Add these as children of the `dependencies` element.
163163
TIP: An alternate way to ensure Maven uses the correct version of the SLF4J API is to declare the dependency on `slf4j-jdk14` *before* the dependency on the Couchbase SDK.
164164
See the Maven documentation on https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Transitive_Dependencies[Transitive Dependencies] to learn more about how Maven resolves transitive dependency version conflicts.
165165
--
166+
166167
Gradle::
167168
+
168169
--
@@ -176,6 +177,7 @@ NOTE: Gradle automatically uses the correct SLF4J API 2.x dependency required by
176177
--
177178
====
178179

180+
179181
[configuring-the-jdk-logger]
180182
==== Configuring a JUL Logger
181183

Lines changed: 37 additions & 17 deletions
@@ -1,24 +1,42 @@
1-
= Diagnosing and preventing Network Problems with Health Check
2-
:description: In today's distributed and virtual environments, users will often not have full administrative control over their whole network.
3-
:navtitle: Health Check
4-
:page-topic-type: howto
1+
= Health Check
2+
:description: pass:q[Health Check provides `ping()` and `diagnostics()` tests for the health of the network and the cluster.]
3+
:page-aliases: concept-docs:health-check.adoc
4+
// :page-aliases: ROOT:health-check.adoc
5+
:page-toclevels: 2
6+
7+
58

69
[abstract]
710
{description}
8-
Health Check introduces _Ping_ to check nodes are still healthy, and to force idle connections to be kept alive in environments with eager shutdowns of unused resources.
9-
_Diagnostics_ requests a report from all the connected sockets against the cluster (from a client point of view), giving instant, but passive health check information.
1011

1112

12-
Diagnosing problems in distributed environments is far from easy, so Couchbase provides a _Health Check API_ with `Ping()` for active monitoring, and `Diagnostics()` for a look at what the client believes is the current state of the cluster.
13-
More extensive discussion of the uses of Health Check can be found in the xref:concept-docs:health-check.adoc[Health Check Concept Guide].
13+
14+
In today's distributed and virtual environments, users will often not have full administrative control over their whole network.
15+
Working in distributed environments is hard. Latencies come and go, so do connections in their entirety.
16+
Is it a network glitch, or is the remote cluster down?
17+
Sometimes just knowing the likely cause is enough to get a good start on a workaround, or at least avoid hours wasted on an inappropriate solution.
18+
19+
Health Check features _Ping_ to check nodes are still healthy, and to force idle connections to be kept alive in environments with eager shutdowns of unused resources.
20+
_Diagnostics_ requests a report from a node, giving instant health check information.
21+
22+
23+
24+
// Uses
25+
include::{version-common}@sdk:pages:partial$health-check.adoc[tag="uses"]
26+
27+
1428

1529
== Ping
1630

31+
32+
`Ping` _actively_ queries the status of the specified services, giving status and latency information for every node reachable.
33+
In addition to its use as a monitoring tool, a regular `Ping` can be used in an environment which does not respect keep alive values for a connection.
34+
1735
At its simplest, `ping` provides information about the current state of the connections in the Couchbase Cluster, by actively polling:
1836

1937
[source,java]
2038
----
21-
include::example$HealthCheck.java[tag=ping-basic]
39+
include::devguide:example$java/HealthCheck.java[tag=ping-basic]
2240
----
2341

2442
This will print the latency for each socket (endpoint) connected per service. More information is available on the classes.
@@ -27,24 +45,25 @@ This is made easy by the `exportToJson` method:
2745

2846
[source,java]
2947
----
30-
include::example$HealthCheck.java[tag=ping-json-export]
48+
include::devguide:example$java/HealthCheck.java[tag=ping-json-export]
3149
----
3250

3351
By default the SDK will ping all services available on the target cluster.
3452
You can customize the type of services to ping through the `PingOptions`:
3553

3654
[source,java]
3755
----
38-
include::example$HealthCheck.java[tag=ping-options]
56+
include::devguide:example$java/HealthCheck.java[tag=ping-options]
3957
----
4058

4159
In this example, only the Query service is included in the ping report.
4260

43-
Note that `ping` is available both on the `Cluster` and the `Bucket` level.
44-
The difference is that at the cluster level, the key-value service might not be
61+
Note that `ping` is available both at the `Cluster` and the `Bucket` level.
62+
The difference is that at the cluster level, the key-value (Data) service might not be
4563
included, depending on the Couchbase Server version in use.
4664
If you want to make sure the key-value service is included, perform it at the bucket level.
4765

66+
4867
== Diagnostics
4968

5069
Diagnostics works in a similar fashion to `ping` in the sense that it returns a report of how all the sockets/endpoints are doing, but the main difference is that it is passive.
@@ -53,17 +72,18 @@ This makes it much cheaper to call on a regular basis, but does not provide any
5372

5473
[source,java]
5574
----
56-
include::example$HealthCheck.java[tag=diagnostics-basic]
75+
include::devguide:example$java/HealthCheck.java[tag=diagnostics-basic]
5776
----
5877

5978
Because it is passive, diagnostics are only available at the `Cluster` level and cover everything in the current SDK state. Also, because it does no I/O, you cannot proactively filter the list of services that are returned; instead, look only at the ones that interest you.
6079

61-
A `DiagnosticsResult` has one interesting property over a ping result: It provides a cumulative `ClusterState` through the `state()` method.
62-
The state can be `ONLINE`, `DEGRADED` or `OFFLINE`. This allows to give a single, although simplistic, view on how your cluster is doing from a client point of view.
80+
A `DiagnosticsResult` has one interesting property over a ping result -- it provides a cumulative `ClusterState` through the `state()` method.
81+
The state can be `ONLINE`, `DEGRADED` or `OFFLINE`.
82+
This gives a single, although simplistic, view of how your cluster is doing from a client point of view.
6383
The state is determined as follows:
6484

6585
* If at least one socket is open and all of them are connected, it is `ONLINE`
6686
* If at least one is connected but not all are, it is `DEGRADED`
6787
* If none are connected, it is `OFFLINE`
6888

69-
Of course you can iterate over the individual states and apply a different algorithm if needed.
89+
You can iterate over the individual states and apply a different algorithm if needed.
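
The state rules listed above can be sketched in plain Java. This is a minimal illustration only: the `EndpointState` and `ClusterState` enums and the `aggregate` helper are hypothetical names defined here, not the SDK's own types, and treating an empty endpoint list as `OFFLINE` is an assumption.

```java
import java.util.List;

// Hypothetical sketch; these types are illustrative and are not the SDK's classes.
class ClusterStateSketch {

    // Simplified per-socket state (assumption: the real SDK may track more states).
    enum EndpointState { CONNECTED, CONNECTING, DISCONNECTED }

    enum ClusterState { ONLINE, DEGRADED, OFFLINE }

    // Applies the three rules to a list of endpoint states.
    static ClusterState aggregate(List<EndpointState> endpoints) {
        long connected = endpoints.stream()
            .filter(s -> s == EndpointState.CONNECTED)
            .count();
        if (connected == 0) {
            return ClusterState.OFFLINE;   // none connected (also covers an empty list)
        }
        if (connected == endpoints.size()) {
            return ClusterState.ONLINE;    // at least one socket, and all are connected
        }
        return ClusterState.DEGRADED;      // some, but not all, are connected
    }

    public static void main(String[] args) {
        // One connected and one disconnected socket: a partially healthy view.
        System.out.println(aggregate(List.of(
            EndpointState.CONNECTED, EndpointState.DISCONNECTED)));
    }
}
```

A different policy, for example treating any disconnected Data service socket as `OFFLINE`, could be written the same way by iterating over the individual states.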

modules/howtos/pages/observability-metrics.adoc

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
= Metrics Reporting
22
:description: Individual request tracing presents a very specific (though isolated) view of the system.
3-
:page-topic-type: howto
3+
:page-toclevels: 2
4+
45

56
[abstract]
67
{description}
@@ -21,7 +22,7 @@ By default the metrics will be emitted every 10 minutes, but you can customize t
2122

2223
[source,java]
2324
----
24-
include::example$Metrics.java[tag=metrics-enable-custom,indent=0]
25+
include::devguide:example$java/Metrics.java[tag=metrics-enable-custom,indent=0]
2526
----
2627

2728
Once enabled, there is no further configuration needed. The `LoggingMeter` will emit the collected request statistics every interval.
@@ -128,7 +129,7 @@ For metrics, add this logic to the application:
128129

129130
[source,java]
130131
----
131-
include::example$Metrics.java[tag=metrics-otel-prometheus,indent=0]
132+
include::devguide:example$java/Metrics.java[tag=metrics-otel-prometheus,indent=0]
132133
----
133134

134135

@@ -242,7 +243,7 @@ See the Micrometer documentation for details.
242243

243244
[source,java]
244245
----
245-
include::example$MetricsMicrometer.java[tag=metrics-micrometer-prometheus,indent=0]
246+
include::devguide:example$java/MetricsMicrometer.java[tag=metrics-micrometer-prometheus,indent=0]
246247
----
247248

248249

modules/howtos/pages/observability-tracing.adoc

Lines changed: 4 additions & 2 deletions
@@ -1,6 +1,6 @@
11
= Request Tracing
22
:description: Collecting information about an individual request and its response is an essential feature of every observability stack.
3-
:page-topic-type: howto
3+
:page-toclevels: 2
44
:page-aliases: ROOT:tracing-from-the-sdk.adoc
55

66
[abstract]
@@ -18,7 +18,7 @@ It is possible to customize this behavior by modifying the configuration:
1818

1919
[source,java]
2020
----
21-
include::example$Tracing.java[tag=tracing-configure,indent=0]
21+
include::devguide:example$java/Tracing.java[tag=tracing-configure,indent=0]
2222
----
2323

2424
In this case the emit interval is one minute and Key/Value requests will only be considered if their latency is greater than or equal to two seconds.
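
The threshold rule described here can be sketched in plain Java. This is a minimal illustration rather than the SDK's implementation: the `isOverThreshold` helper and the constant names are hypothetical, and the values simply mirror the one-minute interval and two-second Key/Value threshold mentioned in the text.

```java
import java.time.Duration;

// Hypothetical sketch of threshold filtering; not the SDK's internal logic.
class ThresholdSketch {

    // Values taken from the surrounding text; the names are assumptions.
    static final Duration EMIT_INTERVAL = Duration.ofMinutes(1);
    static final Duration KV_THRESHOLD = Duration.ofSeconds(2);

    // A request is a reporting candidate only if its latency is
    // greater than or equal to the threshold.
    static boolean isOverThreshold(Duration latency, Duration threshold) {
        return latency.compareTo(threshold) >= 0;
    }

    public static void main(String[] args) {
        System.out.println("Emit interval: " + EMIT_INTERVAL);
        System.out.println(isOverThreshold(Duration.ofMillis(1500), KV_THRESHOLD));
        System.out.println(isOverThreshold(Duration.ofSeconds(2), KV_THRESHOLD));
    }
}
```

A request that takes exactly two seconds counts as over the threshold under this reading, because the comparison is inclusive.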
@@ -69,9 +69,11 @@ More information will be provided as we get closer to stabilization.
6969

7070

7171
== OpenTelemetry Integration
72+
7273
The built-in tracer is great if you do not have a centralized monitoring system, but if you already plug into the OpenTelemetry ecosystem we want to make sure to provide first-class support.
7374

7475
=== Exporting to OpenTelemetry
76+
7577
This method exports tracing telemetry in OpenTelemetry's standard format (OTLP), which can be sent to any OTLP-compatible receiver such as Jaeger, Zipkin or `opentelemetry-collector`.
7678

7779
Add this to your Maven, or the equivalent to your build tool of choice:

modules/howtos/pages/slow-operations-logging.adoc

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
= Slow Operations Logging
22
:description: Tracing information on slow operations can be found in the logs as threshold logging, orphan logging, and other span metrics.
3-
:page-topic-type: howto
3+
:page-toclevels: 2
44

55
[abstract]
66
{description}
