**Building Distributed Tracing in Go: A Complete Guide to Request Tracking Across Services**
Source: Dev.to
1. The Tracer – Central Piece of the System
The implementation revolves around a Tracer struct. It manages the entire tracing process. When creating a tracer you provide:
- Service name – identifies the service in the trace.
- Sampling rate – the percentage of requests you actually want to record (recording every request would be prohibitively expensive in a high‑traffic system).
```go
tracer := NewTracer("order-service", 0.1) // Sample 10 % of traces
```
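Filling in the blanks, a minimal `Tracer` might look like the sketch below. The field names are assumptions inferred from the snippets later in the article, and the sampler is reduced to a plain rate for brevity:

```go
package main

import (
	"fmt"
	"sync"
)

// Span is a placeholder for the span type described in the next section.
type Span struct{}

// Tracer sketch; field names are assumptions based on later snippets.
type Tracer struct {
	serviceName  string
	samplingRate float64
	spanPool     sync.Pool // reused Span objects (see section 2)
}

func NewTracer(serviceName string, samplingRate float64) *Tracer {
	return &Tracer{
		serviceName:  serviceName,
		samplingRate: samplingRate,
		spanPool:     sync.Pool{New: func() interface{} { return &Span{} }},
	}
}

func main() {
	tracer := NewTracer("order-service", 0.1)
	fmt.Printf("%s sampling %.0f%% of traces\n", tracer.serviceName, tracer.samplingRate*100)
}
```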
2. Spans – Units of Work
A span represents one unit of work (e.g., a DB query, an HTTP handler). The StartSpan method is where the magic begins. It:
- Looks for a parent span in the supplied context. If one exists, the new span becomes its child, building the trace hierarchy.
- Asks the sampler whether this span should be recorded.
```go
func (t *Tracer) StartSpan(ctx context.Context, name string, opts ...SpanOption) (context.Context, *Span) {
	// trace.SpanFromContext returns a no-op span (not nil) when the context
	// carries no span, so check the span context's validity instead.
	var parentSpanContext trace.SpanContext
	if sc := trace.SpanFromContext(ctx).SpanContext(); sc.IsValid() {
		parentSpanContext = sc
	}

	samplingResult := t.sampler.ShouldSample(SamplingParameters{
		TraceID:       generateTraceID(),
		ParentContext: parentSpanContext,
		Name:          name,
		Attributes:    make(map[string]interface{}),
	})
	// ... create span based on the sampling decision
}
```
No‑op Span
If the sampler decides not to record, we return a no‑op span. It does nothing, keeping the overhead near zero while allowing the same code paths to run.
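A minimal no-op implementation might look like the following sketch. The `spanAPI` interface and method names are illustrative assumptions, since the article's span type isn't shown in full:

```go
package main

import "fmt"

// spanAPI is the small surface callers use; names are illustrative.
type spanAPI interface {
	SetAttribute(key string, value interface{})
	End()
}

// noopSpan satisfies the same interface but records nothing, so unsampled
// requests run the same code paths at near-zero cost.
type noopSpan struct{}

func (noopSpan) SetAttribute(key string, value interface{}) {}
func (noopSpan) End()                                       {}

func main() {
	var s spanAPI = noopSpan{}
	s.SetAttribute("http.method", "GET") // silently discarded
	s.End()                              // does nothing
	fmt.Println("no-op span finished")
}
```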
Real Span & Object Pool
If the sampler says yes, we obtain a span from a sync.Pool. Reusing span objects reduces pressure on Go’s garbage collector.
```go
span := t.spanPool.Get().(*Span)
// ... configure the span
return ctx, span
```
Span Lifecycle
A span stores:
- Unique ID
- Parent ID
- Start & end timestamps
- Attributes (key‑value pairs, e.g., `http.method="GET"` or `db.query="SELECT * FROM users"`)
When the work finishes, call EndSpan:
- Calculates duration
- Sets final status (success, error, etc.)
- Sends the span to a buffered channel for export
- Resets the span and returns it to the pool
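A minimal span type matching that lifecycle might look like this sketch. Field and method names are assumptions; the real `EndSpan` would also hand the span to the exporter channel and return it to the pool:

```go
package main

import (
	"fmt"
	"time"
)

// Span sketch matching the fields listed above; names are assumptions.
type Span struct {
	SpanID     string
	ParentID   string
	Name       string
	StartTime  time.Time
	EndTime    time.Time
	Attributes map[string]interface{}
	Status     string
}

func newSpan(name string) *Span {
	return &Span{Name: name, StartTime: time.Now(), Attributes: map[string]interface{}{}}
}

// End records the end time, computes the duration, and sets the final
// status; the full design would then enqueue the span for export and
// reset it back into the pool.
func (s *Span) End(status string) time.Duration {
	s.EndTime = time.Now()
	s.Status = status
	return s.EndTime.Sub(s.StartTime)
}

func main() {
	s := newSpan("db.query")
	s.Attributes["db.query"] = "SELECT * FROM users"
	fmt.Println(s.End("ok") >= 0, s.Status)
}
```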
3. Context Propagation – Carrying Trace Data Across Services
Propagation moves trace information from one service to another. For HTTP, the trace ID and span ID are encoded in headers.
Extracting Incoming Context
```go
ctx := tracer.Extract(r.Context(), propagation.HeaderCarrier(r.Header))
```
Injecting Outgoing Context
```go
tracer.Inject(ctx, propagation.HeaderCarrier(r.Header))
```
The same pattern works for gRPC, message queues, or any transport—just use the appropriate carrier type.
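For concreteness, a carrier is just a Get/Set adapter over the transport's metadata. The sketch below rolls a minimal one by hand and stores a W3C-style `traceparent` value, the header format that OpenTelemetry's default propagator uses:

```go
package main

import "fmt"

// headerCarrier is a minimal hand-rolled carrier over an http.Header-like
// map, sketched here to show what Inject/Extract actually read and write.
type headerCarrier map[string][]string

func (c headerCarrier) Get(key string) string {
	if v := c[key]; len(v) > 0 {
		return v[0]
	}
	return ""
}

func (c headerCarrier) Set(key, value string) { c[key] = []string{value} }

func main() {
	h := headerCarrier{}
	// W3C trace context format: version-traceID-spanID-flags.
	h.Set("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(h.Get("traceparent"))
}
```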
4. Sampling Strategies
4.1 Probability Sampling
The simplest method: roll the dice for each new trace. With a rate of 0.1 (10 %), the trace is sampled when a random number in [0, 1) falls below 0.1.
Pros: Easy to understand, predictable.
Cons: During traffic spikes, 10 % of a huge volume can still overwhelm the backend.
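A probability sampler can be sketched in a few lines. The type and constructor names are guesses at the article's API, and a fixed seed is used here only to make the demo reproducible:

```go
package main

import (
	"fmt"
	"math/rand"
)

// ProbabilitySampler sketch: samples each new trace independently with
// probability rate. Type and field names are assumptions.
type ProbabilitySampler struct {
	rate float64
	rng  *rand.Rand
}

func NewProbabilitySampler(rate float64) *ProbabilitySampler {
	// Fixed seed keeps the example deterministic; real code would not seed.
	return &ProbabilitySampler{rate: rate, rng: rand.New(rand.NewSource(42))}
}

func (p *ProbabilitySampler) ShouldSample() bool {
	return p.rng.Float64() < p.rate
}

func main() {
	s := NewProbabilitySampler(0.1)
	sampled := 0
	for i := 0; i < 100000; i++ {
		if s.ShouldSample() {
			sampled++
		}
	}
	fmt.Printf("sampled %d of 100000 traces (~10%%)\n", sampled)
}
```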
4.2 Rate‑Limiting Sampling
A more sophisticated approach that caps the number of spans per second.
- Credit system – each second the sampler gains a fixed number of credits (e.g., 100).
- When a span is created, it spends one credit. If no credits remain, the span is dropped.
This keeps the load on the tracing backend bounded, even under sudden traffic surges.
5. Putting It All Together
```go
func handler(w http.ResponseWriter, r *http.Request) {
	// 1️⃣ Extract incoming trace context
	ctx := tracer.Extract(r.Context(), propagation.HeaderCarrier(r.Header))

	// 2️⃣ Start a new span for this handler
	ctx, span := tracer.StartSpan(ctx, "http.handler", trace.WithAttributes(
		attribute.String("http.method", r.Method),
		attribute.String("http.path", r.URL.Path),
	))
	defer span.EndSpan()

	// 3️⃣ Do some work (e.g., DB query)
	doDBWork(ctx)

	// 4️⃣ Call downstream service, injecting trace context
	req, _ := http.NewRequestWithContext(ctx, http.MethodGet, "http://service-b/api", nil)
	tracer.Inject(ctx, propagation.HeaderCarrier(req.Header))
	resp, err := http.DefaultClient.Do(req)
	if err == nil {
		resp.Body.Close() // always close the body to avoid leaking connections
	}

	// 5️⃣ Respond to the client
	fmt.Fprintln(w, "OK")
}
```
TL;DR
- Tracer – central manager, holds service name & sampling rate.
- Span – unit of work; created via `StartSpan`, finished via `EndSpan`.
- Propagation – `Extract` incoming headers, `Inject` outgoing headers.
- Sampling – probability vs. rate‑limiting to control data volume.
With these building blocks you can instrument any Go service, get end‑to‑end visibility, and keep the overhead under control. Happy tracing!
Rate‑Limiting Sampler
If there are no credits left, new spans are not sampled until more credits accumulate. This gives a hard upper limit on data volume.
```go
func (rls *RateLimitingSampler) ShouldSample(params SamplingParameters) SamplingResult {
	rls.mu.Lock()
	defer rls.mu.Unlock()

	// Accrue credits for the time elapsed since the last update, then
	// advance the timestamp so the same interval isn't counted twice.
	// (Production code would also cap the balance so a long idle period
	// can't bank an unbounded burst.)
	now := time.Now()
	elapsed := now.Sub(rls.lastCreditUpdate).Seconds()
	rls.lastCreditUpdate = now
	rls.currentCredits += elapsed * rls.creditsPerSecond

	// Spend a credit if we have one
	if rls.currentCredits >= 1.0 {
		rls.currentCredits -= 1.0
		return SamplingResult{Decision: RecordAndSample}
	}
	return SamplingResult{Decision: Drop}
}
```
An even smarter system might use adaptive sampling. This could increase the sampling rate automatically if it detects a rise in HTTP error codes, giving more visibility during failures. The sampler interface makes it easy to plug in these different strategies.
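As a sketch of that pluggability, the interface can be tiny, and an adaptive strategy is just another implementation that wraps a base sampler. All names below are hypothetical, not the article's actual interface:

```go
package main

import "fmt"

// Sketch of a pluggable sampler interface; names are assumptions.
type SamplingDecision int

const (
	Drop SamplingDecision = iota
	RecordAndSample
)

type SamplingResult struct{ Decision SamplingDecision }

type Sampler interface {
	ShouldSample(name string) SamplingResult
}

// errorBoostSampler is a toy adaptive sampler: it records everything while
// the observed error rate is above a threshold, otherwise defers to a base.
type errorBoostSampler struct {
	base      Sampler
	errorRate func() float64 // supplied by the metrics system (assumption)
	threshold float64
}

func (s *errorBoostSampler) ShouldSample(name string) SamplingResult {
	if s.errorRate() > s.threshold {
		return SamplingResult{Decision: RecordAndSample}
	}
	return s.base.ShouldSample(name)
}

// alwaysDrop stands in for any base strategy in this demo.
type alwaysDrop struct{}

func (alwaysDrop) ShouldSample(string) SamplingResult { return SamplingResult{Decision: Drop} }

func main() {
	s := &errorBoostSampler{base: alwaysDrop{}, errorRate: func() float64 { return 0.2 }, threshold: 0.05}
	fmt.Println(s.ShouldSample("http.handler").Decision == RecordAndSample)
}
```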
Exporting Spans
Collecting spans is one thing; sending them somewhere useful is another. The TraceExporter handles this. Spans are sent into a buffered channel (batchCh in the code below). A separate goroutine reads from this channel and groups spans into batches. Batching is critical for efficiency: sending one span per HTTP request would be wasteful. By grouping them, network overhead can be reduced dramatically.
The batch processor either waits for a batch to fill up (e.g., 100 spans) or for a timer to fire (e.g., every 5 seconds). This way, spans are exported quickly during high traffic, but a partial batch isn’t left waiting forever during low traffic.
```go
func (te *TraceExporter) processBatches() {
	batch := make([]*SpanData, 0, te.batchSize)
	// A ticker gives a steady flush cadence; time.After inside the select
	// would be reset every time a span arrived, so under a steady trickle
	// of traffic the timeout branch could be starved indefinitely.
	ticker := time.NewTicker(te.flushInterval)
	defer ticker.Stop()

	for {
		select {
		case span := <-te.batchCh:
			batch = append(batch, span)
			if len(batch) >= te.batchSize {
				te.sendBatch(batch)
				batch = batch[:0]
			}
		case <-ticker.C:
			if len(batch) > 0 {
				te.sendBatch(batch)
				batch = batch[:0]
			}
		}
	}
}
```