Masking PII in Logs and Traces: Manual vs Automated

December 11, 2025 • Written by Jakob Serlier

Tags: Observability, Security, Kubernetes, OpenTelemetry, Healthcare
---

I’ve recently been experimenting with PII masking in observability pipelines using Presidio. Comparing four approaches, three automated and one manual, revealed significant operational differences.

Approaches

| Approach | Data flow |
| --- | --- |
| Presidio Proxy (Two-Collector) | App → Frontend Collector → Presidio Proxy → Backend Collector → Loki/Tempo/Grafana |
| Custom Go Collector (Built-in Masking) | App → Custom Go Collector (with masking) → Loki/Tempo/Grafana |
| Sidecar Interceptor (Pod-local) | App → Sidecar (localhost:4318) → Presidio → Collector → Loki/Tempo/Grafana |
| Manual In-App Masking | App → mask_pii() → Normal logging/OTLP export → Loki/Tempo/Grafana |

Automated masking

I tested three automated approaches: an OTLP proxy, a custom Go collector, and a pod-local sidecar interceptor. All three work, but they share similar failure modes:

1. Model correctness

NLP models are imperfect. In practice:

  • Patient IDs flagged as SSNs
  • Internal IPs flagged as phone numbers
  • Organization names removed entirely
  • ICD medical codes misclassified

You can tune confidence thresholds and entity lists, but you are always trading false positives against false negatives. This is true for automated and manual masking, but automated masking amplifies the impact because over-masking can hide information that is critical for debugging.
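The trade-off can be made concrete with a small sketch. The detections below are hypothetical (entity type, confidence, matched text) tuples of the kind an NLP analyzer such as Presidio returns; the per-entity thresholds and helper are illustrative assumptions, not Presidio's API:

```python
# Hypothetical detections: (entity_type, confidence_score, matched_text).
# A real pipeline would get these from an NLP analyzer such as Presidio.
detections = [
    ("US_SSN", 0.85, "123-45-6789"),      # true positive
    ("US_SSN", 0.45, "PT-2024-0917"),     # patient ID misread as SSN
    ("PHONE_NUMBER", 0.40, "10.0.3.17"),  # internal IP misread as phone
]

# Per-entity confidence thresholds: raising a threshold reduces
# false positives (over-masking) but risks false negatives (leaks).
THRESHOLDS = {"US_SSN": 0.6, "PHONE_NUMBER": 0.6}

def should_mask(entity_type, score, default_threshold=0.5):
    """Return True if this detection is confident enough to redact."""
    return score >= THRESHOLDS.get(entity_type, default_threshold)

masked = [d for d in detections if should_mask(*d[:2])]
kept = [d for d in detections if not should_mask(*d[:2])]
```

With these thresholds the real SSN is masked while the patient ID and internal IP survive, but lower the `US_SSN` threshold and the patient ID disappears from your logs too; there is no setting that eliminates both error types at once.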

2. Coverage gaps

Automated masking only sees the telemetry you intercept. If PII is logged anywhere in the request path before your proxy or collector, it leaks.

3. Latency and failure modes

Presidio latency varies widely depending on the workload. During local testing of the k8s demo, it added around 40 to 50 ms per call.

Automated masking puts this in the critical path. If Presidio stalls, your telemetry backs up. If your proxy or collector crashes, the entire pipeline stalls.
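One mitigation worth sketching is an explicit timeout with a deliberate failure policy. This is a minimal illustration, not a production pattern: `call_presidio` is a hypothetical stand-in for the real HTTP call, here hard-coded to stall so the fallback path is visible.

```python
import concurrent.futures

REDACTED = "[masking unavailable - record dropped]"

# Dedicated pool so a stalled masking call never blocks the caller forever.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def call_presidio(text: str) -> str:
    """Hypothetical stand-in for the HTTP call to a Presidio service."""
    raise TimeoutError("masking service stalled")  # simulate a stall

def mask_with_timeout(text: str, timeout_s: float = 0.05,
                      fail_open: bool = False) -> str:
    """Bound masking latency and choose the failure policy explicitly.

    fail_open=True  -> emit raw text on failure (risk: PII leak)
    fail_open=False -> drop the record on failure (risk: telemetry loss)
    """
    future = _POOL.submit(call_presidio, text)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        return text if fail_open else REDACTED
```

The point is that "leak PII" versus "lose telemetry" is a policy decision; if the masking service sits in the critical path, someone has to make it on purpose rather than discover it during an outage.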

For brownfield systems where code changes are impossible, automated masking may still be the only viable starting point. For anything latency sensitive or regulated, it is a risk multiplier.

Manual masking

Manual masking means developers call a masking function before logging or setting span attributes. It is explicit and predictable.

The advantages are simple:

  • Developers know which fields are actually sensitive
  • No redaction surprises in production logs
  • PII never leaves the service if masked
  • No hot-path dependency on external services
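A minimal sketch of such a `mask_pii()` function, assuming a deterministic, regex-based approach; the pattern set and placeholder format are illustrative choices, not part of Presidio:

```python
import re

# Deterministic patterns for fields this service knows it handles.
# Unlike NLP detection, these only match what developers deliberately listed.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace known PII patterns before the text reaches any logger."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

Usage is the unremarkable part, which is the appeal: `log.info(mask_pii(f"reset link sent to {email}"))`. There is no model, no confidence threshold, and no network hop; the failure mode is limited to a pattern the team forgot to list.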

The obvious downside is human error: someone forgets to mask a field. Linting, wrappers, and code review mitigate this, but coverage gaps still exist wherever a developer fails to mask, and relying on developer discipline alone is a systemic risk.
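The "wrappers" mitigation can be sketched with the standard library alone: a `logging.Filter` attached to the application logger masks every record centrally, so a forgotten call site is still covered. The filter class and pattern here are illustrative assumptions:

```python
import logging
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

class PiiMaskFilter(logging.Filter):
    """Mask known PII patterns on every record, even if a call site forgets."""
    def filter(self, record: logging.LogRecord) -> bool:
        # A fuller version would also mask record.args and extra fields.
        record.msg = SSN_RE.sub("<SSN>", str(record.msg))
        return True

logger = logging.getLogger("app")
logger.addFilter(PiiMaskFilter())
```

This moves the discipline problem from "every log call" to "every logger setup", which is a much smaller surface to review, though span attributes set outside the logging path still need their own wrapper.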

Practical considerations

A few themes became obvious across the demos:

  • Mask before data leaves the application. Anything emitted unmasked can land in places you never intended.
  • Validate your language models. The recommended English spaCy models behave reasonably well; the xx multilingual model performs noticeably worse for Arabic and must be tested.
  • Automated masking isn’t a compliance solution. It only covers the telemetry you intercept. Infrastructure logs and platform components need their own controls.

These demos assume Presidio is a fixed requirement, so alternatives like deterministic field scrubbing, schema-driven masking, or service-mesh request-body redaction aren’t covered—though they’re valid in many systems. Even with in-app masking, PII may still leak through frameworks, reverse proxies, service meshes, or debug logging paths that observe data before masking occurs. In practice, masking only guarantees that application-generated telemetry is clean when applied correctly; everything around the application still needs review.

Closing thoughts

If correctness matters, use manual masking. It avoids most operational hazards and makes data flow explicit. But to repeat: relying on developer discipline alone is a systemic risk.

If you cannot modify application code, automated masking can work, but do not underestimate its complexity and the operational risk it adds to your observability pipeline, given the failure modes above. The examples in the automated repository are intentionally minimal, and real deployments require productionizing.

Resources

The two demo repositories for PII masking with Microsoft Presidio can be found here:
