feat: Complete observability and monitoring
Complete implementation of the observability stack for monitoring the multi-tenant Classeo platform.

## Error Tracking (GlitchTip)
- Sentry SDK integration with self-hosted GlitchTip
- PII scrubber applied before sending (GDPR: emails, JWT tokens, French NIR)
- Enriched context: tenant_id, user_id, correlation_id
- Backend (sentry.yaml) and frontend (sentry.ts) configuration

## Metrics (Prometheus)
- /metrics endpoint with IP restriction in production
- HTTP metrics: requests_total, request_duration_seconds (histogram)
- Security metrics: login_failures_total per tenant
- Health metrics: health_check_status (postgres, redis, rabbitmq)
- Redis storage for persistence across requests

## Logs (Loki)
- Monolog processors: CorrelationIdLogProcessor, PiiScrubberLogProcessor
- PII detection: emails, French phone numbers, JWT tokens, French NIR
- Structured labels: tenant_id, correlation_id, level

## Dashboards (Grafana)
- Main dashboard: P50/P95/P99 latency, error rate, RPS
- Per-tenant dashboard: metrics isolated by subdomain
- Infrastructure dashboard: postgres/redis/rabbitmq health
- Datasources with fixed UIDs for portability

## Alerts (Alertmanager)
- HighApiLatencyP95/P99: SLA monitoring (200ms/500ms)
- HighErrorRate: error rate > 1% for 2 min
- ExcessiveLoginFailures: brute-force detection
- ApplicationUnhealthy: health check failures

## Infrastructure
- InfrastructureHealthChecker: shared service (DRY)
- HealthCheckController: /health endpoint for load balancers
- Pre-push hook: make ci && make e2e before push
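For illustration only, the HighErrorRate and HighApiLatencyP95 alerts named in the message could be expressed as Prometheus alerting rules roughly as follows. This is a sketch, not the rules file from this commit: the group name, the `status` label, the 5-minute rate windows, the severity labels, and the `for:` duration on the latency alert are all assumptions. Only the metric names and the 1% / 2 min / 200ms thresholds come from the commit message.

```yaml
# Hypothetical rules file; label names, rate windows, and severities are assumptions.
groups:
  - name: classeo-api            # assumed group name
    rules:
      - alert: HighErrorRate
        # Share of 5xx responses among all requests, compared against 1%.
        # Assumes requests_total carries a `status` label with the HTTP code.
        expr: |
          sum(rate(requests_total{status=~"5.."}[5m]))
            / sum(rate(requests_total[5m])) > 0.01
        for: 2m                  # "for 2 min" per the commit message
        labels:
          severity: critical     # assumed
        annotations:
          summary: "API error rate above 1% for 2 minutes"

      - alert: HighApiLatencyP95
        # P95 latency derived from the histogram buckets, compared against 200ms.
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(request_duration_seconds_bucket[5m]))
          ) > 0.2
        for: 5m                  # assumed; the message gives no duration
        labels:
          severity: warning      # assumed
        annotations:
          summary: "API P95 latency above 200ms"
```

If the real metrics are exported with a namespace prefix or different label names, the expressions would need adjusting accordingly.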
44
monitoring/grafana/provisioning/datasources/datasources.yml
Normal file
@@ -0,0 +1,44 @@
# Grafana Datasources Provisioning
# Auto-configures Prometheus and Loki connections

apiVersion: 1

datasources:
  # Prometheus - Metrics
  - name: Prometheus
    uid: prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
    jsonData:
      timeInterval: "15s"
      httpMethod: POST

  # Loki - Logs
  - name: Loki
    uid: loki
    type: loki
    access: proxy
    url: http://loki:3100
    editable: false
    jsonData:
      maxLines: 1000
      derivedFields:
        # Link correlation_id to traces
        - name: correlation_id
          matcherRegex: '"correlation_id":"([^"]+)"'
          url: '/explore?orgId=1&left=["now-1h","now","Loki",{"expr":"{correlation_id=\"$${__value.raw}\"}"}]'
          datasourceUid: loki
          urlDisplayLabel: "View correlated logs"

  # Alertmanager
  - name: Alertmanager
    uid: alertmanager
    type: alertmanager
    access: proxy
    url: http://alertmanager:9093
    editable: false
    jsonData:
      implementation: prometheus