📊 Monitoring Guide - Gunei ERP (Enterprise Architecture)

Complete multi-environment monitoring and observability setup for the Gunei ERP system.

Version 2.1 - Multi-Environment Enterprise (Staging + Production)

📋 Table of Contents

Overview
Monitoring Architecture
Health Checks by Environment
Monitoring Scripts
Centralized Logs
- Docker Logging Configuration
- Timezone
Cron Jobs
Discord Notifications
PostgreSQL Shared Monitoring
Cross-Environment Metrics Comparison
Metrics and Alerts
Consolidated Dashboard
Troubleshooting

🎯 Overview

Goals

Detect problems in each environment before they affect users
Centralize logs from multiple services and environments
Automate health checks every 5 minutes (staging + production)
Send failure notifications to Discord that identify the affected environment
Keep a per-environment event history
Monitor shared infrastructure (PostgreSQL, Caddy)

Components by Environment

┌─ STAGING ──────────────────────────┐
│ Frontend Staging → Health Check    │
│ Backend Staging  → Health Check    │
│        ↓                           │
│   Logs + Metrics                   │
└────────────────────────────────────┘
                ↓
┌─ PRODUCTION ───────────────────────┐
│ Frontend Production → Health Check │
│ Backend Production  → Health Check │
│        ↓                           │
│   Logs + Metrics                   │
└────────────────────────────────────┘
                ↓
┌─ INFRASTRUCTURE ───────────────────┐
│ PostgreSQL Shared → Health Check   │
│ Caddy Shared     → Health Check    │
│        ↓                           │
│   Logs + Metrics                   │
└────────────────────────────────────┘
                ↓
        Logs Centralizados
                ↓
         Discord Webhooks

Monitoring Philosophy

Per environment: staging and production are monitored independently
Shared infrastructure: PostgreSQL and Caddy are monitored as critical services, because a failure in either affects both environments
Contextual alerts: every notification clearly identifies the affected environment
Segregated logs: logs are tagged per environment but centralized for analysis
Isolation: if staging fails, production keeps operating (and vice versa), as long as the shared infrastructure is healthy

🏗️ Monitoring Architecture

Monitored Services

Component	Type	Affects	Health Endpoint
Caddy Shared	Infrastructure	Both environments	N/A (container process)
PostgreSQL Shared	Infrastructure	Both environments	`pg_isready` (host port 5433, container port 5432)
Frontend Staging	Application	Staging	`https://staging-erpfront.gunei.xyz/health`
Backend Staging	Application	Staging	`https://staging-erpback.gunei.xyz/status`
Frontend Production	Application	Production	(URL pending)
Backend Production	Application	Production	(URL pending)

Script Locations

bash

/root/scripts/
├── monitor-logs.sh         # Multi-environment log viewer (updated)
├── health-check.sh         # Full health check (updated)
├── alert-check.sh          # Health check + alerts (updated)
├── check-staging.sh        # Staging-only check (new)
├── check-production.sh     # Production-only check (new)
├── check-infrastructure.sh # Shared services check (new)
├── compare-environments.sh # Staging vs production comparison
└── metrics-dashboard.sh    # Consolidated dashboard (new)

🥦 Health Checks by Environment

Endpoints by Environment

Staging Environment

Frontend Staging:

bash

curl -s https://staging-erpfront.gunei.xyz/health

Response:
{
  "status": "healthy",
  "timestamp": "2026-01-12T12:34:56.789Z",
  "service": "gunei-erp-frontend",
  "version": "0.0.1",
  "runtime": "bun",
  "environment": "staging"
}

Backend Staging:

bash

curl -s https://staging-erpback.gunei.xyz/status

Response:
{
  "status": "ok",
  "timestamp": "2026-01-12T12:34:56.789Z",
  "environment": "staging",
  "database": "connected",
  "uptime": 123456
}
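The HTTP checks in this guide look only at the status code. If the backend ever answered 200 while its database was down, those checks would still pass. The sketch below also checks the `database` field without needing `jq`. It assumes the `/status` body keeps the shape shown above, and the helper name `db_connected` is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: succeeds only when a /status JSON body reports
# "database": "connected" (whitespace around the colon is tolerated).
db_connected() {
    local body=$1
    local re='"database"[[:space:]]*:[[:space:]]*"connected"'
    [[ $body =~ $re ]]
}

# Usage (commented out so the snippet has no side effects):
# body=$(curl -s --max-time 5 https://staging-erpback.gunei.xyz/status)
# db_connected "$body" && echo "DB OK" || echo "DB not connected"
```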

Production Environment (once active)

Frontend Production:

bash

curl -s https://erpfront.gunei.xyz/health  # URL not configured yet

Response:
{
  "status": "healthy",
  "timestamp": "2026-01-12T12:34:56.789Z",
  "service": "gunei-erp-frontend",
  "environment": "production"
}

Backend Production:

bash

curl -s https://erpback.gunei.xyz/status  # URL not configured yet

Response:
{
  "status": "ok",
  "timestamp": "2026-01-12T12:34:56.789Z",
  "environment": "production",
  "database": "connected"
}

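Infrastructure (Shared)

PostgreSQL Shared:

bash

# pg_isready only reports whether the server accepts connections. It does not
# authenticate, so -U/-d do not prove the staging/production users work
# (the psql "SELECT 1" checks in health-check.sh do that).

# Staging database target
docker exec postgres-shared pg_isready -U gunei_staging_user -d gunei_erp_staging

# Production database target
docker exec postgres-shared pg_isready -U gunei_prod_user -d gunei_erp_production

# Server
docker exec postgres-shared pg_isready -U postgres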

Caddy Shared:

bash

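# Check that the container is running
docker ps --filter name=caddy-shared --format '{{.Names}}: {{.Status}}'

# Scan recent logs for errors (Caddy logs to stderr, hence 2>&1)
docker logs caddy-shared --tail 50 2>&1 | grep -i error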

Port Mapping by Environment

Environment	Service	Internal Port	Host Port
Shared	PostgreSQL	5432	5433
Staging	Backend	3000	3000
Staging	Frontend	3001	3001
Production	Backend	3000	3100
Production	Frontend	3001	3101

Port verification:

bash

# List listening TCP sockets on the stack's host ports
ss -tlnp | grep -E ":(3000|3001|3100|3101|5433)\b"

# Same check with netstat (legacy net-tools package, may not be installed)
netstat -tlnp | grep -E ":(3000|3001|3100|3101|5433)\b"
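Matching raw `ss` output with `grep -E` can also catch unrelated ports such as 13000. The sketch below parses the local-address column instead and prints each port from the table above that has no listener. `missing_ports` is a hypothetical helper that reads `ss -tln` output on stdin:

```bash
#!/bin/bash
# Hypothetical helper: prints each expected port that has no listener.
# stdin: output of `ss -tln`. Column 4 is "Local Address:Port", for example
# 0.0.0.0:3000, [::]:5433 or *:3001; the port is the text after the last ":".
missing_ports() {
    local listening port
    listening=$(awk 'NR > 1 { n = split($4, a, ":"); print a[n] }' | sort -u)
    for port in "$@"; do
        grep -qx "$port" <<< "$listening" || echo "$port"
    done
}

# Usage:
# missing_ports 3000 3001 3100 3101 5433 < <(ss -tln)
```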

Database Connections by Environment

Environment	Database	User	Host
Staging	`gunei_erp_staging`	`gunei_staging_user`	`postgres-shared:5432`
Production	`gunei_erp_production`	`gunei_prod_user`	`postgres-shared:5432`

Verify the correct connection:

bash

# Staging - must connect to gunei_erp_staging
docker exec postgres-shared psql -U gunei_staging_user -d gunei_erp_staging -c "\conninfo"

# Production - must connect to gunei_erp_production
docker exec postgres-shared psql -U gunei_prod_user -d gunei_erp_production -c "\conninfo"

# Active connections per database
docker exec postgres-shared psql -U postgres -c "SELECT datname, count(*) FROM pg_stat_activity GROUP BY datname;"

Health Criteria

Per service:

Status 200: service operational
Status 5xx: critical failure
Timeout (5s): service not responding
Database: connection verified (the backend /status reports "database": "connected")

Per environment:

Staging healthy: both services (frontend + backend) respond OK
Production healthy: both services (frontend + backend) respond OK
Infrastructure healthy: PostgreSQL and Caddy operational
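The criteria above can be expressed as a small classifier. This sketch assumes the probe runs without `curl -f`, so the HTTP code stays available; curl exits with code 28 when `--max-time` expires. The function names and the `UNREACHABLE`/`UNEXPECTED` states are hypothetical, added for cases the criteria do not cover:

```bash
#!/bin/bash
# Hypothetical helper: maps curl's exit code and the HTTP status to a state.
# $1 = curl exit code, $2 = HTTP status printed by -w '%{http_code}'
classify_probe() {
    local rc=$1 code=$2
    if [ "$rc" -eq 28 ]; then
        echo "TIMEOUT"        # --max-time expired: service does not respond
    elif [ "$rc" -ne 0 ]; then
        echo "UNREACHABLE"    # DNS, TLS or connection error
    elif [ "$code" = "200" ]; then
        echo "OK"
    elif [[ $code == 5?? ]]; then
        echo "CRITICAL"       # 5xx
    else
        echo "UNEXPECTED"     # any other status (3xx, 4xx)
    fi
}

# An environment is healthy only when both frontend and backend are OK.
# $1 = frontend state, $2 = backend state
environment_healthy() {
    [ "$1" = "OK" ] && [ "$2" = "OK" ]
}

# Usage:
# code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url"); rc=$?
# state=$(classify_probe "$rc" "$code")
```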

🔧 Monitoring Scripts

1. monitor-logs.sh (Updated)

Purpose: view logs from all services and environments at once

bash

# Usage: show logs from every environment on one screen
/root/scripts/monitor-logs.sh

# Logs for a single environment
/root/scripts/monitor-logs.sh staging
/root/scripts/monitor-logs.sh production

# Infrastructure logs
/root/scripts/monitor-logs.sh infrastructure

Example implementation (updated):

bash

#!/bin/bash
# /root/scripts/monitor-logs.sh
# Usage: monitor-logs.sh [all|staging|production|infrastructure]  (default: all)

ENVIRONMENT=${1:-all}

case "$ENVIRONMENT" in
    all|staging|production|infrastructure) ;;
    *) echo "Usage: $0 [all|staging|production|infrastructure]" >&2; exit 1 ;;
esac

echo "==================================="
echo "📋 Gunei ERP - Log Monitor"
echo "==================================="
echo ""

# Returns 0 if a container with exactly this name is running
is_running() {
    docker ps --format '{{.Names}}' | grep -qx "$1"
}

show_logs() {
    local service=$1
    local title=$2
    echo "📦 === $title ==="
    if is_running "$service"; then
        # Last 10 lines; 2>&1 because many services log to stderr
        docker logs --tail 10 "$service" 2>&1
    else
        echo "⚠️  Container $service is not running"
    fi
    echo ""
}

if [ "$ENVIRONMENT" = "all" ] || [ "$ENVIRONMENT" = "infrastructure" ]; then
    echo "🏗️ === INFRASTRUCTURE ==="
    show_logs "postgres-shared" "PostgreSQL Shared"
    show_logs "caddy-shared" "Caddy Shared"
fi

if [ "$ENVIRONMENT" = "all" ] || [ "$ENVIRONMENT" = "staging" ]; then
    echo "🟢 === STAGING ==="
    show_logs "gunei-backend-staging" "Backend Staging"
    show_logs "gunei-frontend-staging" "Frontend Staging"
fi

# Production is shown only once it has been deployed
if [ "$ENVIRONMENT" = "all" ] || [ "$ENVIRONMENT" = "production" ]; then
    if is_running "gunei-backend-production"; then
        echo "🔵 === PRODUCTION ==="
        show_logs "gunei-backend-production" "Backend Production"
        show_logs "gunei-frontend-production" "Frontend Production"
    fi
fi

echo "📊 === Container Status ==="
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
echo ""

echo "💾 === Resource Usage ==="
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

Monitored services:

Infrastructure: PostgreSQL Shared, Caddy Shared
Staging: Backend Staging, Frontend Staging
Production: Backend Production, Frontend Production (once deployed)

Features:

Emoji markers per environment (🏗️ infrastructure, 🟢 staging, 🔵 production)
Filter by environment with the first argument
Timestamps as written by each service (all containers share the Buenos Aires timezone)
Automatic detection of active environments (production is shown only if its backend container is running)

2. health-check.sh (Updated)

Purpose: check the state of the whole system, per environment

bash

# Usage: full multi-environment check
/root/scripts/health-check.sh

# Check a single environment
/root/scripts/health-check.sh staging
/root/scripts/health-check.sh production

Example implementation (updated):

bash

#!/bin/bash
# /root/scripts/health-check.sh
# Usage: health-check.sh [all|staging|production]  (default: all)
# Exit status: 0 if every check passed, 1 otherwise.

ENVIRONMENT=${1:-all}
TIMEOUT=5     # seconds, matches the "Timeout (5s)" health criterion
FAILURES=0

echo "🥦 Health Check - Gunei ERP (Multi-Environment)"
echo "================================================"
echo ""

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

ok()   { echo -e "${GREEN}✅ $1: OK${NC}"; }
fail() { echo -e "${RED}❌ $1: FAIL${NC}"; FAILURES=$((FAILURES + 1)); }

# HTTP check: -f fails on HTTP >= 400, --max-time turns a hang into a failure
check_service() {
    if curl -f -s -o /dev/null --max-time "$TIMEOUT" "$2"; then ok "$1"; else fail "$1"; fi
}

# Infrastructure (always checked, because it affects every environment)
echo "🏗️  Infrastructure:"
if docker exec postgres-shared pg_isready -U postgres > /dev/null 2>&1; then
    ok "PostgreSQL Shared"
else
    fail "PostgreSQL Shared"
fi

if docker ps --format '{{.Names}}' | grep -qx caddy-shared; then
    ok "Caddy Shared"
else
    fail "Caddy Shared"
fi

# Staging
if [ "$ENVIRONMENT" = "all" ] || [ "$ENVIRONMENT" = "staging" ]; then
    echo ""
    echo "🟢 Staging Environment:"
    check_service "Backend Staging /status" "http://localhost:3000/status"
    check_service "Frontend Staging /health" "http://localhost:3001/health"
    check_service "HTTPS staging-erpback.gunei.xyz" "https://staging-erpback.gunei.xyz/status"
    check_service "HTTPS staging-erpfront.gunei.xyz" "https://staging-erpfront.gunei.xyz/health"
fi

# Production (only if deployed)
if [ "$ENVIRONMENT" = "all" ] || [ "$ENVIRONMENT" = "production" ]; then
    if docker ps --format '{{.Names}}' | grep -qx gunei-backend-production; then
        echo ""
        echo "🔵 Production Environment:"
        check_service "Backend Production /status" "http://localhost:3100/status"
        check_service "Frontend Production /health" "http://localhost:3101/health"
        # Public URLs, once configured:
        # check_service "HTTPS erpback.gunei.xyz" "https://erpback.gunei.xyz/status"
        # check_service "HTTPS erpfront.gunei.xyz" "https://erpfront.gunei.xyz/health"
    fi
fi

# Databases: psql opens a real session as each application user
echo ""
echo "🗄️  Databases:"
if docker exec postgres-shared psql -U gunei_staging_user -d gunei_erp_staging -c "SELECT 1;" > /dev/null 2>&1; then
    ok "DB Staging"
else
    fail "DB Staging"
fi

if docker exec postgres-shared psql -U gunei_prod_user -d gunei_erp_production -c "SELECT 1;" > /dev/null 2>&1; then
    ok "DB Production"
else
    # Not counted as a failure while production is not deployed
    echo -e "${YELLOW}⚠️  DB Production: not available or not configured${NC}"
fi

# Running containers
echo ""
echo "📦 Docker Containers:"
docker ps --format "table {{.Names}}\t{{.Status}}"

# Resource usage
echo ""
echo "💾 System Resources:"
df -h / | tail -1 | awk '{print "Disk: "$3" / "$2" ("$5" used)"}'
free -h | grep Mem | awk '{print "RAM:  "$3" / "$2" (used/total)"}'

[ "$FAILURES" -eq 0 ] || exit 1

Example output:

🥦 Health Check - Gunei ERP (Multi-Environment)
================================================

🏗️  Infrastructure:
✅ PostgreSQL Shared: OK
✅ Caddy Shared: OK

🟢 Staging Environment:
✅ Backend Staging /status: OK
✅ Frontend Staging /health: OK
✅ HTTPS staging-erpback.gunei.xyz: OK
✅ HTTPS staging-erpfront.gunei.xyz: OK

🔵 Production Environment:
✅ Backend Production /status: OK
✅ Frontend Production /health: OK

🗄️  Databases:
✅ DB Staging: OK
✅ DB Production: OK

📦 Docker Containers:
NAMES                        STATUS
caddy-shared                 Up 5 days
postgres-shared              Up 5 days
gunei-backend-staging        Up 2 days
gunei-frontend-staging       Up 2 days
gunei-backend-production     Up 1 day
gunei-frontend-production    Up 1 day

💾 System Resources:
Disk: 45G / 200G (23% used)
RAM:  4.2G / 8.0G (used/total)

3. alert-check.sh (Updated)

Purpose: automated health check with per-environment notifications

bash

#!/bin/bash
# Run by cron every 5 minutes

# What it does:
# - Runs the health checks for each environment
# - Detects environment-specific failures
# - Sends Discord alerts that identify the environment
# - Writes to the centralized log with environment tags

Alerting logic (updated):

First failure (staging): immediate "🟢 Staging Down" alert
First failure (production): immediate "🔵 Production Down" alert
Infrastructure failure: critical "🚨 Infrastructure Down (affects all environments)" alert
Consecutive failures: the alert is repeated every 15 minutes while the service stays down
Recovery: "service restored" notification that includes the downtime

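Example implementation:

bash

#!/bin/bash
# /root/scripts/alert-check.sh
# cron does not load your shell profile: define DISCORD_WEBHOOK_URL in the
# crontab (or export it) before this script runs.

LOG_FILE="/var/log/gunei-health.log"
STATE_DIR="/var/tmp/gunei-health-state"
DISCORD_WEBHOOK="${DISCORD_WEBHOOK_URL:?DISCORD_WEBHOOK_URL is not set}"
TIMEOUT=5             # seconds before an HTTP check counts as failed
REALERT_INTERVAL=900  # repeat the alert every 15 minutes while down

mkdir -p "$STATE_DIR"

# Appends "[TIMESTAMP] [LEVEL] [ENVIRONMENT] [COMPONENT] Message" to the log.
# Writes only to the file: cron already redirects stdout there, so tee would
# duplicate every line.
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$1] [${2^^}] [$3] $4" >> "$LOG_FILE"
}

# Probes: each returns 0 when the target is healthy
http_ok()           { curl -f -s -o /dev/null --max-time "$TIMEOUT" "$1"; }
postgres_ok()       { docker exec postgres-shared pg_isready -U postgres > /dev/null 2>&1; }
container_running() { [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]; }

# POSTs a JSON payload to Discord; the response body is discarded
send_discord() {
    curl -s -o /dev/null --max-time 10 -X POST "$DISCORD_WEBHOOK" \
        -H "Content-Type: application/json" -d "$1"
}

send_discord_alert() {
    local env=$1 service=$2 target=$3 color emoji
    case "$env" in
        staging)    color=3066993;  emoji="🟢" ;;  # green
        production) color=15548997; emoji="🔵" ;;  # red
        *)          color=15105570; emoji="🚨" ;;  # orange (infrastructure)
    esac
    send_discord "{
        \"embeds\": [{
            \"title\": \"${emoji} ${env^^} - ${service} Down\",
            \"color\": ${color},
            \"description\": \"Service is not responding\",
            \"fields\": [
                {\"name\": \"Environment\", \"value\": \"$env\", \"inline\": true},
                {\"name\": \"Service\", \"value\": \"$service\", \"inline\": true},
                {\"name\": \"Endpoint\", \"value\": \"$target\"},
                {\"name\": \"Time\", \"value\": \"$(date '+%Y-%m-%d %H:%M:%S')\"}
            ]
        }]
    }"
}

send_discord_recovery() {
    local env=$1 service=$2 downtime=$3
    send_discord "{
        \"embeds\": [{
            \"title\": \"✅ ${env^^} - ${service} Recovered\",
            \"color\": 5763719,
            \"description\": \"Service is responding normally\",
            \"fields\": [
                {\"name\": \"Environment\", \"value\": \"$env\", \"inline\": true},
                {\"name\": \"Downtime\", \"value\": \"${downtime} seconds\", \"inline\": true}
            ]
        }]
    }"
}

# Usage: check_and_alert ENV SERVICE TARGET PROBE [ARGS...]
# The state file stores the epoch second of the first failure;
# "<state>.last_alert" stores when the last Discord alert was sent.
check_and_alert() {
    local env=$1 service=$2 target=$3
    shift 3
    local state_file="$STATE_DIR/${env}_${service}_state"
    local now downtime last_alert
    now=$(date +%s)

    if "$@"; then
        log INFO "$env" HEALTH_CHECK "$service health: OK"
        if [ -f "$state_file" ]; then
            # Was down, now recovered: downtime = now - first failure
            downtime=$(( now - $(cat "$state_file") ))
            send_discord_recovery "$env" "$service" "$downtime"
            log INFO "$env" ALERT "Discord notification sent: $service recovered (downtime: ${downtime}s)"
            rm -f "$state_file" "$state_file.last_alert"
        fi
    elif [ ! -f "$state_file" ]; then
        # First failure: remember when it started and alert immediately
        echo "$now" > "$state_file"
        echo "$now" > "$state_file.last_alert"
        log ERROR "$env" HEALTH_CHECK "$service is DOWN ($target)"
        send_discord_alert "$env" "$service" "$target"
        log ERROR "$env" ALERT "Discord notification sent: $service down"
    else
        # Still down: log every run, alert again every REALERT_INTERVAL seconds
        downtime=$(( now - $(cat "$state_file") ))
        log ERROR "$env" HEALTH_CHECK "$service still DOWN (${downtime}s)"
        last_alert=$(cat "$state_file.last_alert" 2>/dev/null || echo 0)
        if (( now - last_alert >= REALERT_INTERVAL )); then
            echo "$now" > "$state_file.last_alert"
            send_discord_alert "$env" "$service" "$target"
            log ERROR "$env" ALERT "Discord notification sent: $service still down (${downtime}s)"
        fi
    fi
}

# Main - check all services
log INFO all HEALTH_CHECK "=== Health Check Started ==="

# Infrastructure: not HTTP services, so they use dedicated probes
check_and_alert infrastructure PostgreSQL "pg_isready (postgres-shared)" postgres_ok
check_and_alert infrastructure Caddy "container caddy-shared" container_running caddy-shared

# Staging
check_and_alert staging Backend "https://staging-erpback.gunei.xyz/status" \
    http_ok "https://staging-erpback.gunei.xyz/status"
check_and_alert staging Frontend "https://staging-erpfront.gunei.xyz/health" \
    http_ok "https://staging-erpfront.gunei.xyz/health"

# Production (only if deployed)
if container_running gunei-backend-production; then
    check_and_alert production Backend "http://localhost:3100/status" \
        http_ok "http://localhost:3100/status"
    check_and_alert production Frontend "http://localhost:3101/health" \
        http_ok "http://localhost:3101/health"
fi

log INFO all HEALTH_CHECK "=== Health Check Completed ==="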

4. check-staging.sh (New)

Purpose: quick health check of the staging environment only

bash

#!/bin/bash
# Quick staging-only check (the exit status comes from health-check.sh)

exec /root/scripts/health-check.sh staging

5. check-production.sh (New)

Purpose: quick health check of the production environment only

bash

#!/bin/bash
# Quick production-only check (the exit status comes from health-check.sh)

exec /root/scripts/health-check.sh production

6. check-infrastructure.sh (New)

Purpose: health check of the critical shared services

bash

#!/bin/bash
# Shared infrastructure check

echo "🏗️  Infrastructure Health Check"
echo "=============================="
echo ""

# PostgreSQL
echo "PostgreSQL Shared:"
if docker exec postgres-shared pg_isready -U postgres > /dev/null 2>&1; then
    echo "  ✅ Server: OK"
else
    echo "  ❌ Server: FAIL"
fi

# psql opens a real session as each application user
if docker exec postgres-shared psql -U gunei_staging_user -d gunei_erp_staging -c "SELECT 1;" > /dev/null 2>&1; then
    echo "  ✅ Staging DB: OK"
else
    echo "  ❌ Staging DB: FAIL"
fi

if docker exec postgres-shared psql -U gunei_prod_user -d gunei_erp_production -c "SELECT 1;" > /dev/null 2>&1; then
    echo "  ✅ Production DB: OK"
else
    echo "  ⚠️  Production DB: Not available"
fi

# Caddy
echo ""
echo "Caddy Shared:"
if docker ps --format '{{.Names}}' | grep -qx caddy-shared; then
    echo "  ✅ Container: Running"

    # Count certificate files; listing the directory itself only counts issuers
    CERT_COUNT=$(docker exec caddy-shared find /data/caddy/certificates -name '*.crt' 2>/dev/null | wc -l)
    if [ "$CERT_COUNT" -gt 0 ]; then
        echo "  ✅ SSL Certificates: $CERT_COUNT"
    else
        echo "  ⚠️  SSL Certificates: none found"
    fi

    # Recent errors (Caddy logs to stderr, hence 2>&1)
    ERROR_COUNT=$(docker logs caddy-shared --tail 100 2>&1 | grep -i error | wc -l)
    if [ "$ERROR_COUNT" -eq 0 ]; then
        echo "  ✅ No recent errors"
    else
        echo "  ⚠️  Recent errors: $ERROR_COUNT"
    fi
else
    echo "  ❌ Container: Not running"
fi

# System resources
echo ""
echo "System Resources:"
# -P (POSIX format) keeps each filesystem on one line, so field 5 is always Use%
DISK_USAGE=$(df -P / | awk 'NR==2 {print $5}' | tr -d '%')
echo "  Disk: $DISK_USAGE% used"
if [ "$DISK_USAGE" -gt 80 ]; then
    echo "    ⚠️  WARNING: High disk usage"
fi

MEM_USAGE=$(free | awk 'NR==2 {printf "%.0f", $3/$2*100}')
echo "  Memory: $MEM_USAGE% used"
if [ "$MEM_USAGE" -gt 85 ]; then
    echo "    ⚠️  WARNING: High memory usage"
fi

7. metrics-dashboard.sh (New)

Purpose: consolidated multi-environment metrics dashboard

See the Consolidated Dashboard section for the full implementation.

📝 Centralized Logs

Main File

bash

/var/log/gunei-health.log

Log Format (updated)

[TIMESTAMP] [LEVEL] [ENVIRONMENT] [COMPONENT] Message

Example:

[2026-01-12 12:34:56] [INFO] [STAGING] [HEALTH_CHECK] Backend health: OK
[2026-01-12 12:39:56] [ERROR] [STAGING] [HEALTH_CHECK] Backend not responding (timeout)
[2026-01-12 12:40:01] [ERROR] [STAGING] [ALERT] Discord notification sent: Backend down
[2026-01-12 12:44:56] [INFO] [STAGING] [HEALTH_CHECK] Backend health: OK
[2026-01-12 12:45:01] [INFO] [STAGING] [ALERT] Discord notification sent: Backend recovered

[2026-01-12 13:15:22] [INFO] [PRODUCTION] [HEALTH_CHECK] Backend health: OK
[2026-01-12 13:15:22] [INFO] [PRODUCTION] [HEALTH_CHECK] Frontend health: OK

[2026-01-12 13:20:30] [ERROR] [INFRASTRUCTURE] [HEALTH_CHECK] PostgreSQL connection slow
[2026-01-12 13:20:30] [WARNING] [INFRASTRUCTURE] [ALERT] PostgreSQL performance degraded

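Log Rotation

bash

# Configuration: /etc/logrotate.d/gunei-health
# Also covers the backup and metrics logs written by cron
/var/log/gunei-health.log /var/log/gunei-backups.log /var/log/gunei-metrics.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    # "size" would override "daily" (rotate only above 100M);
    # "maxsize" keeps daily rotation and also rotates any file above 100M
    maxsize 100M
}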

View Logs by Environment

bash

# Staging logs
grep "\[STAGING\]" /var/log/gunei-health.log | tail -n 50

# Production logs
grep "\[PRODUCTION\]" /var/log/gunei-health.log | tail -n 50

# Infrastructure logs
grep "\[INFRASTRUCTURE\]" /var/log/gunei-health.log | tail -n 50

# Staging errors
grep "\[STAGING\]" /var/log/gunei-health.log | grep "\[ERROR\]"

# Production alerts
grep "\[PRODUCTION\]" /var/log/gunei-health.log | grep "\[ALERT\]"

# Real time, per environment
tail -f /var/log/gunei-health.log | grep "\[STAGING\]"
tail -f /var/log/gunei-health.log | grep "\[PRODUCTION\]"
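To see all environments at once, you can count the tags in a single pass instead of grepping for each one. This sketch assumes every line follows the `[TIMESTAMP] [LEVEL] [ENVIRONMENT] [COMPONENT] Message` format described above; `error_summary` is a hypothetical name:

```bash
#!/bin/bash
# Hypothetical helper: counts ERROR lines per environment tag.
# Reads log lines on stdin; prints "ENVIRONMENT COUNT", sorted by environment.
error_summary() {
    awk '
        # Split on "] [" (bracket expressions avoid regex escapes), giving
        # 1=[TIMESTAMP 2=LEVEL 3=ENVIRONMENT 4=COMPONENT] Message
        BEGIN { FS = "[]] [[]" }
        $2 == "ERROR" { count[$3]++ }
        END { for (env in count) print env, count[env] }
    ' | sort
}

# Usage:
# error_summary < /var/log/gunei-health.log
```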

Docker Logging Configuration

All containers use the json-file logging driver with automatic rotation:

yaml

logging:
  driver: "json-file"
  options:
    max-size: "10m"    # at most 10 MB per file
    max-file: "3"      # at most 3 files
    tag: "{{.Name}}/{{.ID}}"

Verification:

bash

# Show a container's logging configuration
docker inspect -f '{{json .HostConfig.LogConfig}}' gunei-backend-staging

# Logs rotate automatically, so they cannot fill the disk
docker logs gunei-backend-staging --tail 100

Benefits:

Logs cannot fill the disk (at most ~30 MB per container: 3 files × 10 MB)
Rotation is automatic, with no manual intervention
The tag identifies the container in centralized logs

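Timezone

All containers use the Argentina timezone:

yaml

environment:
  - TZ=America/Argentina/Buenos_Aires

Verification:

bash

# Show the time inside each container
docker exec gunei-backend-staging date
docker exec gunei-frontend-staging date
docker exec postgres-shared date

# PostgreSQL has its own timezone setting, separate from the container clock
docker exec postgres-shared psql -U postgres -c "SHOW timezone;"

# Expected output: Argentina time (UTC-3). TZ only takes effect if the image
# ships timezone data (/usr/share/zoneinfo); otherwise the container falls back to UTC.

Implications:

Log timestamps are in Argentina time
Cron jobs run on the host, in the host's local timezone (configured separately from the containers' TZ)
Backup file names use Argentina time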

Container Logs by Environment

Staging:

bash

# Backend staging
docker logs gunei-backend-staging -f
docker logs gunei-backend-staging --tail 100
docker logs gunei-backend-staging --since 1h

# Frontend staging
docker logs gunei-frontend-staging -f
docker logs gunei-frontend-staging --tail 100

Production:

bash

# Backend production
docker logs gunei-backend-production -f
docker logs gunei-backend-production --tail 100

# Frontend production
docker logs gunei-frontend-production -f
docker logs gunei-frontend-production --tail 100

Infrastructure:

bash

# PostgreSQL (affects both environments)
docker logs postgres-shared -f
docker logs postgres-shared --tail 100

# Caddy (routes both environments)
docker logs caddy-shared -f
docker logs caddy-shared --tail 100

# Caddy access logs per environment (tail inside the container, no full copy)
docker exec caddy-shared tail -n 50 /var/log/caddy/staging-erpfront.log
docker exec caddy-shared tail -n 50 /var/log/caddy/staging-erpback.log
docker exec caddy-shared tail -n 50 /var/log/caddy/production-erpfront.log
docker exec caddy-shared tail -n 50 /var/log/caddy/production-erpback.log

⏰ Cron Jobs

Current Configuration

bash

# Show the crontab
crontab -l

# Edit the crontab
crontab -e

Full Schedule (updated)

bash

# cron does not read shell profiles, so variables the scripts need go here
DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/...

# Health checks every 5 minutes (multi-environment)
*/5 * * * * /root/scripts/alert-check.sh >> /var/log/gunei-health.log 2>&1

# Daily backups at 2 AM (both environments)
0 2 * * * /root/scripts/backup-postgres.sh >> /var/log/gunei-backups.log 2>&1

# Backup verification at 3 AM
0 3 * * * /root/scripts/verify-backup.sh >> /var/log/gunei-backups.log 2>&1

# Weekly cleanup (Sundays at 4 AM)
0 4 * * 0 /root/scripts/cleanup-backups.sh >> /var/log/gunei-backups.log 2>&1

# Daily disk check at 5 AM
0 5 * * * /root/scripts/check-disk-space.sh >> /var/log/gunei-health.log 2>&1

# Metrics dashboard every hour (optional)
0 * * * * /root/scripts/metrics-dashboard.sh >> /var/log/gunei-metrics.log 2>&1

Verify Execution

bash

# Last health-check run (any environment)
grep "HEALTH_CHECK" /var/log/gunei-health.log | tail -n 1

# Last run per environment
grep "\[STAGING\]" /var/log/gunei-health.log | grep "HEALTH_CHECK" | tail -n 1
grep "\[PRODUCTION\]" /var/log/gunei-health.log | grep "HEALTH_CHECK" | tail -n 1

# Check that cron is running (the service is called "crond" on RHEL-based systems)
systemctl status cron

# Cron logs (Debian/Ubuntu syslog; on journald-only systems use journalctl -u cron)
grep CRON /var/log/syslog | tail -n 20

# Recent alert-check runs
grep "alert-check.sh" /var/log/syslog | tail -n 10
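Reading the last run by eye does not catch a cron job that silently stopped. Below is a minimal staleness check (a dead man's switch). It assumes the log format above and GNU `date` for `-d`; `last_run_age` and the 600-second limit (two missed 5-minute runs) are assumptions, not part of the existing scripts:

```bash
#!/bin/bash
# Hypothetical helper: prints how many seconds ago the last [HEALTH_CHECK]
# line was written. $1 = log file, $2 = current epoch (defaults to now).
last_run_age() {
    local log_file=$1 now=${2:-$(date +%s)}
    local last ts
    last=$(grep "\[HEALTH_CHECK\]" "$log_file" | tail -n 1)
    [ -n "$last" ] || return 1               # no health check logged yet
    ts=${last:1:19}                          # "YYYY-MM-DD HH:MM:SS" after "["
    echo $(( now - $(date -d "$ts" +%s) ))   # GNU date parses the timestamp
}

# Usage: warn when more than two 5-minute runs were missed
# age=$(last_run_age /var/log/gunei-health.log) && [ "$age" -gt 600 ] \
#     && echo "alert-check.sh has not run for ${age}s"
```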

📢 Discord Notifications

Configured Webhook

bash

# Environment variable (in GitHub Secrets and on the VPS)
DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/...

Notification Types (updated)

1. Deployment Success (Staging)

json

{
  "embeds": [{
    "title": "✅ Deployment Successful - Staging",
    "color": 5763719,
    "fields": [
      {"name": "Environment", "value": "staging", "inline": true},
      {"name": "Branch", "value": "develop", "inline": true},
      {"name": "Commit", "value": "abc1234"},
      {"name": "Duration", "value": "2m 34s"}
    ]
  }]
}

2. Deployment Success (Production)

json

{
  "embeds": [{
    "title": "✅ Deployment Successful - Production",
    "color": 3447003,
    "fields": [
      {"name": "Environment", "value": "production", "inline": true},
      {"name": "Branch", "value": "main", "inline": true},
      {"name": "Commit", "value": "abc1234"},
      {"name": "Duration", "value": "2m 45s"}
    ]
  }]
}

3. Staging Backend Down

json

{
  "embeds": [{
    "title": "🟢 Staging - Backend Down",
    "color": 3066993,
    "description": "Backend staging is not responding",
    "fields": [
      {"name": "Environment", "value": "staging", "inline": true},
      {"name": "Service", "value": "Backend", "inline": true},
      {"name": "Endpoint", "value": "https://staging-erpback.gunei.xyz/status"},
      {"name": "Response", "value": "Timeout (5s)"},
      {"name": "Time", "value": "2026-01-12 14:30:00"}
    ]
  }]
}

4. Production Backend Down (Critical)

Mentions notify people only in the top-level content field; Discord does not ping @here or @everyone written inside an embed.

json

{
  "content": "@here Immediate attention required",
  "allowed_mentions": {"parse": ["everyone"]},
  "embeds": [{
    "title": "🔵 Production - Backend Down",
    "color": 15548997,
    "description": "🚨 CRITICAL: Production backend is not responding",
    "fields": [
      {"name": "Environment", "value": "production", "inline": true},
      {"name": "Service", "value": "Backend", "inline": true},
      {"name": "Endpoint", "value": "https://erpback.gunei.xyz/status"},
      {"name": "Response", "value": "Timeout (5s)"},
      {"name": "Time", "value": "2026-01-12 14:30:00"}
    ]
  }]
}

5. Infrastructure Down (Very Critical)

json

{
  "content": "@everyone CRITICAL - ALL ENVIRONMENTS AFFECTED",
  "allowed_mentions": {"parse": ["everyone"]},
  "embeds": [{
    "title": "🚨 Infrastructure Down - Affects All Environments",
    "color": 15105570,
    "description": "CRITICAL: Shared infrastructure is failing",
    "fields": [
      {"name": "Component", "value": "PostgreSQL Shared", "inline": true},
      {"name": "Impact", "value": "All environments", "inline": true},
      {"name": "Time", "value": "2026-01-12 14:30:00"}
    ]
  }]
}

6. Service Recovered

json

{
  "embeds": [{
    "title": "✅ Staging - Backend Recovered",
    "color": 5763719,
    "description": "Backend staging is responding normally",
    "fields": [
      {"name": "Environment", "value": "staging", "inline": true},
      {"name": "Downtime", "value": "15 minutes", "inline": true}
    ]
  }]
}
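The scripts build these payloads by string interpolation. A value that contains a double quote, backslash or newline, such as an error message, therefore produces invalid JSON. Below is a minimal escaping helper in plain Bash. `json_escape` and `discord_embed` are hypothetical names, and control characters other than newline, carriage return and tab are not handled:

```bash
#!/bin/bash
# Hypothetical helper: escapes a string for use inside a JSON string literal.
# Patterns and replacements are quoted variables, so no backslash-escaping
# rules of ${var//pattern/replacement} come into play.
json_escape() {
    local s=$1 bs='\' q='"'
    s=${s//"$bs"/"$bs$bs"}      # backslash first, so later escapes are not doubled
    s=${s//"$q"/"$bs$q"}        # double quote
    s=${s//$'\n'/"${bs}n"}      # newline
    s=${s//$'\r'/"${bs}r"}      # carriage return
    s=${s//$'\t'/"${bs}t"}      # tab
    printf '%s' "$s"
}

# Builds a minimal embed payload: $1 = title, $2 = decimal color, $3 = description
discord_embed() {
    printf '{"embeds": [{"title": "%s", "color": %d, "description": "%s"}]}' \
        "$(json_escape "$1")" "$2" "$(json_escape "$3")"
}

# Usage:
# curl -s -o /dev/null -X POST "$DISCORD_WEBHOOK_URL" \
#     -H "Content-Type: application/json" \
#     -d "$(discord_embed "🟢 Staging - Backend Down" 3066993 "curl: (28) timeout")"
```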

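Rate Limits

Discord: 30 messages / minute
Our system: alerts are sent only on state changes (first failure, a repeat every 15 minutes while a service stays down, recovery), so normal operation with every service healthy sends no messages
Worst case: all six monitored services (2 infrastructure, 2 staging, 2 production) fail in the same run → 6 messages, well under the limit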

🐘 PostgreSQL Shared Monitoring

Server Configuration

bash

# Port exposed on the host: 5433
# Connect from the host
psql -h localhost -p 5433 -U postgres

# From other containers (via the Docker network):
# host postgres-shared, internal port 5432

PostgreSQL Health Checks

bash

# Basic check - server accepts connections
docker exec postgres-shared pg_isready -U postgres -p 5432

# pg_isready does not authenticate: the -U/-d variants below only confirm that
# the server is up, not that the application users or databases work

# Staging database target
docker exec postgres-shared pg_isready -U gunei_staging_user -d gunei_erp_staging

# Production database target
docker exec postgres-shared pg_isready -U gunei_prod_user -d gunei_erp_production

PostgreSQL Metrics

bash

# Connections per database (total, active, idle)
docker exec postgres-shared psql -U postgres -c "
SELECT datname as database,
       count(*) as connections,
       count(*) FILTER (WHERE state = 'active') as active,
       count(*) FILTER (WHERE state = 'idle') as idle
FROM pg_stat_activity
WHERE datname IS NOT NULL
GROUP BY datname;"

# Database sizes
docker exec postgres-shared psql -U postgres -c "
SELECT datname as database,
       pg_size_pretty(pg_database_size(datname)) as size
FROM pg_database
WHERE datname LIKE 'gunei_%';"

# Longest-running active queries (top 10)
docker exec postgres-shared psql -U postgres -c "
SELECT pid, datname, usename,
       now() - query_start as duration,
       left(query, 50) as query_preview
FROM pg_stat_activity
WHERE state = 'active' AND pid <> pg_backend_pid()
ORDER BY duration DESC
LIMIT 10;"

# Cache hit ratio (should be > 99%)
docker exec postgres-shared psql -U postgres -c "
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) as cache_hit_ratio
FROM pg_stat_database
WHERE datname LIKE 'gunei_%';"

PostgreSQL Alerts

Metric	Warning	Critical
Total connections	> 80	> 95
Staging connections	> 40	> 50
Production connections	> 40	> 50
Cache hit ratio	< 95%	< 90%
Query duration	> 5s	> 30s
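The table mixes metrics where high values are bad (connections, query duration) with one where low values are bad (cache hit ratio). The sketch below uses a single classifier for both directions and lets awk compare decimal values. `threshold_level` is a hypothetical name:

```bash
#!/bin/bash
# Hypothetical helper: prints OK, WARNING or CRITICAL for a metric value.
# $1 = value, $2 = warning threshold, $3 = critical threshold,
# $4 = "above" (default: alert when the value exceeds) or "below" (alert when under)
threshold_level() {
    awk -v v="$1" -v w="$2" -v c="$3" -v dir="${4:-above}" 'BEGIN {
        v += 0; w += 0; c += 0          # force numeric comparison
        # Negating all three turns a "below" check into an "above" check
        if (dir == "below") { v = -v; w = -w; c = -c }
        if (v > c)      { print "CRITICAL" }
        else if (v > w) { print "WARNING" }
        else            { print "OK" }
    }'
}

# Usage with the thresholds from the table:
# total=$(docker exec postgres-shared psql -U postgres -tAc "SELECT count(*) FROM pg_stat_activity;")
# threshold_level "$total" 80 95
# threshold_level "$cache_hit_ratio" 95 90 below
```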

📊 Cross-Environment Metrics Comparison

Script: compare-environments.sh

bash

#!/bin/bash
# /root/scripts/compare-environments.sh

echo "📊 Staging vs Production Comparison"
echo "====================================="
echo ""

# Response times: both measured on localhost so TLS and Caddy do not skew the comparison
echo "⏱️  Response Times:"
STAGING_TIME=$(curl -o /dev/null -s --max-time 5 -w '%{time_total}' http://localhost:3000/status)
echo "   Staging Backend:    ${STAGING_TIME}s"

if docker ps --format '{{.Names}}' | grep -qx gunei-backend-production; then
    PROD_TIME=$(curl -o /dev/null -s --max-time 5 -w '%{time_total}' http://localhost:3100/status)
    echo "   Production Backend: ${PROD_TIME}s"
fi
echo ""

# Container resources
echo "💾 Container Resources:"
echo "   CONTAINER                    CPU%   MEM USAGE"
docker stats --no-stream --format "   {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep gunei
echo ""

# Database connections
echo "🗄️  Database Connections:"
docker exec postgres-shared psql -U postgres -t -c "
SELECT '   ' || datname || ': ' || count(*) || ' connections'
FROM pg_stat_activity
WHERE datname LIKE 'gunei_%'
GROUP BY datname;"
echo ""

# Database sizes
echo "📦 Database Sizes:"
docker exec postgres-shared psql -U postgres -t -c "
SELECT '   ' || datname || ': ' || pg_size_pretty(pg_database_size(datname))
FROM pg_database
WHERE datname LIKE 'gunei_%';"

Comparative Metrics

Metric	Staging	Production	Notes
Response time	< 500ms	< 200ms	Production should be faster
Memory usage	< 512MB	< 1GB	Production may use more resources
DB connections	< 20	< 50	Production carries more load
Error rate	< 5%	< 0.1%	Production must be more stable
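The response-time targets in the table can be checked against curl's `%{time_total}` value. The sketch below assumes the staging and production budgets above (500 ms and 200 ms). `within_budget` is a hypothetical helper, and awk does the comparison because Bash arithmetic is integer-only:

```bash
#!/bin/bash
# Hypothetical helper: succeeds when a response time (seconds, as printed by
# curl -w '%{time_total}') is within the target for the environment.
# Returns 2 for an unknown environment.
within_budget() {
    local env=$1 seconds=$2 budget
    case "$env" in
        staging)    budget=0.5 ;;   # < 500ms
        production) budget=0.2 ;;   # < 200ms
        *) return 2 ;;
    esac
    awk -v t="$seconds" -v b="$budget" 'BEGIN { exit !(t + 0 < b + 0) }'
}

# Usage:
# t=$(curl -o /dev/null -s -w '%{time_total}' --max-time 5 http://localhost:3100/status)
# within_budget production "$t" || echo "production backend slow: ${t}s"
```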

📈 Metrics and Alerts

Metrics Monitored per Environment

System (Shared)

CPU usage: via top / htop
Memory usage: via free -h
Disk space: via df -h
Network: via netstat / ss

Staging

Response time: health check timing
HTTP status codes: 200, 5xx
Database connections: active connections to gunei_erp_staging
Container health: docker ps status of the staging containers

Production

Response time: health check timing
HTTP status codes: 200, 5xx
Database connections: active connections to gunei_erp_production
Container health: docker ps status of the production containers

Infrastructure

PostgreSQL: total connections, queries/second, cache hit ratio
Caddy: requests/second, 5xx errors, SSL certificate expiry
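One way to track SSL certificate expiry is to read the certificate that Caddy is actually serving. This sketch assumes `openssl` and GNU `date` are available on the host; `days_until` and `cert_days_left` are hypothetical names:

```bash
#!/bin/bash
# Hypothetical helpers for the "SSL certificate expiry" metric.
# Requires openssl and GNU date (for -d).

# Whole days from $2 (epoch seconds) until the date string $1, e.g. the
# notAfter value printed by `openssl x509 -enddate`.
days_until() {
    local end_epoch
    end_epoch=$(date -d "$1" +%s) || return 1
    echo $(( (end_epoch - $2) / 86400 ))
}

# Days left on the certificate currently served for host $1 on port 443.
cert_days_left() {
    local host=$1 not_after
    not_after=$(echo | openssl s_client -connect "$host:443" -servername "$host" 2>/dev/null \
        | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
    [ -n "$not_after" ] || return 1
    days_until "$not_after" "$(date +%s)"
}

# Usage:
# cert_days_left staging-erpfront.gunei.xyz
```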

Umbrales de Alerta

bash

# Disk space (system)
WARNING: 80%
CRITICAL: 90%

# Memory (system)
WARNING: 85%
CRITICAL: 95%

# Response time (per environment)
WARNING: > 2 seconds
CRITICAL: > 5 seconds (timeout)

# Consecutive failures (per environment)
STAGING: 2 consecutive failures → alert
PRODUCTION: 1 failure → immediate alert
INFRASTRUCTURE: 1 failure → immediate critical alert

# PostgreSQL connections (shared)
WARNING: > 80 connections
CRITICAL: > 95 connections (max 100)
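The thresholds above can be applied with one small classification function. The following is a minimal sketch that treats every threshold as a strict "greater than". That is how the response-time and connection thresholds are written, but it is an assumption for disk and memory. `classify`, `disk_pct`, `mem_pct` and `pg_conns` are hypothetical helpers. `df --output` requires GNU coreutils.

```bash
#!/bin/bash
# Hypothetical helper: map a measured value onto WARNING/CRITICAL thresholds.

# classify VALUE WARN CRIT -> prints OK, WARNING or CRITICAL (strict ">").
classify() {
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v + 0 > c + 0)      print "CRITICAL"
        else if (v + 0 > w + 0) print "WARNING"
        else                    print "OK"
    }'
}

# Used space on / as an integer percentage (GNU df).
disk_pct() { df --output=pcent / | tail -n 1 | tr -dc '0-9'; }

# Used memory as an integer percentage of total (free reports KiB).
mem_pct() { free | awk '/^Mem:/ { printf "%d\n", $3 * 100 / $2 }'; }

# Total rows in pg_stat_activity on the shared server (as in Troubleshooting).
pg_conns() {
    docker exec postgres-shared psql -U postgres -tA -c "SELECT count(*) FROM pg_stat_activity;"
}

# Example, using the thresholds listed above:
#   echo "Disk:   $(classify "$(disk_pct)" 80 90)"
#   echo "Memory: $(classify "$(mem_pct)" 85 95)"
#   echo "PG:     $(classify "$(pg_conns)" 80 95)"
```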

📊 Consolidated Dashboard

Script: metrics-dashboard.sh (New)

Purpose: a consolidated visual dashboard covering all environments

bash

#!/bin/bash
# /root/scripts/metrics-dashboard.sh

echo "╔═══════════════════════════════════════════════════════════╗"
echo "║        Gunei ERP - Multi-Environment Dashboard           ║"
echo "╚═══════════════════════════════════════════════════════════╝"
echo ""

# Function to check service
check_url() {
    if curl -f -s -o /dev/null "$1"; then
        echo "✅"
    else
        echo "❌"
    fi
}

# Infrastructure
echo "┌─ Infrastructure (Shared) ──────────────────────────────────"
POSTGRES_STATUS=$(docker exec postgres-shared pg_isready -U postgres >/dev/null 2>&1 && echo "✅" || echo "❌")
CADDY_STATUS=$(docker ps | grep -q caddy-shared && echo "✅" || echo "❌")
echo "│ PostgreSQL Shared: $POSTGRES_STATUS"
echo "│ Caddy Shared:      $CADDY_STATUS"
echo "└────────────────────────────────────────────────────────────"
echo ""

# Staging
echo "┌─ Staging Environment 🟢 ───────────────────────────────────"
STAGING_BACKEND=$(check_url "https://staging-erpback.gunei.xyz/status")
STAGING_FRONTEND=$(check_url "https://staging-erpfront.gunei.xyz/health")
STAGING_DB=$(docker exec postgres-shared psql -U gunei_staging_user -d gunei_erp_staging -c "SELECT 1;" > /dev/null 2>&1 && echo "✅" || echo "❌")

echo "│ Backend:  $STAGING_BACKEND  https://staging-erpback.gunei.xyz"
echo "│ Frontend: $STAGING_FRONTEND  https://staging-erpfront.gunei.xyz"
echo "│ Database: $STAGING_DB  gunei_erp_staging"

# Container status (STATUS column of docker ps, e.g. "Up 2 days")
STAGING_BACKEND_CONTAINER=$(docker ps --filter "name=gunei-backend-staging" --format '{{.Status}}')
STAGING_FRONTEND_CONTAINER=$(docker ps --filter "name=gunei-frontend-staging" --format '{{.Status}}')
echo "│   Backend container:  ${STAGING_BACKEND_CONTAINER:-not running}"
echo "│   Frontend container: ${STAGING_FRONTEND_CONTAINER:-not running}"
echo "└────────────────────────────────────────────────────────────"
echo ""

# Production (only if it is running)
if docker ps | grep -q "gunei-backend-production"; then
    echo "┌─ Production Environment 🔵 ─────────────────────────────────"
    PROD_BACKEND=$(check_url "http://localhost:3100/status")
    PROD_FRONTEND=$(check_url "http://localhost:3101/health")
    PROD_DB=$(docker exec postgres-shared psql -U gunei_prod_user -d gunei_erp_production -c "SELECT 1;" > /dev/null 2>&1 && echo "✅" || echo "❌")
    
    echo "│ Backend:  $PROD_BACKEND  (port 3100)"
    echo "│ Frontend: $PROD_FRONTEND  (port 3101)"
    echo "│ Database: $PROD_DB  gunei_erp_production"
    
    PROD_BACKEND_CONTAINER=$(docker ps --filter "name=gunei-backend-production" --format '{{.Status}}')
    PROD_FRONTEND_CONTAINER=$(docker ps --filter "name=gunei-frontend-production" --format '{{.Status}}')
    echo "│   Backend container:  ${PROD_BACKEND_CONTAINER:-not running}"
    echo "│   Frontend container: ${PROD_FRONTEND_CONTAINER:-not running}"
    echo "└────────────────────────────────────────────────────────────"
    echo ""
fi

# System Resources
echo "┌─ System Resources ──────────────────────────────────────────"
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}')
MEM_USAGE=$(free -h | awk '/^Mem:/ {print $3 "/" $2}')
CPU_LOAD=$(cut -d' ' -f1 /proc/loadavg)   # 1-minute load average, no trailing comma

echo "│ Disk:   $DISK_USAGE used"
echo "│ Memory: $MEM_USAGE"
echo "│ Load:   $CPU_LOAD"
echo "└────────────────────────────────────────────────────────────"
echo ""

# Recent Events
echo "┌─ Recent Events (Last 5) ────────────────────────────────────"
grep -E "\[ERROR\]|\[WARNING\]" /var/log/gunei-health.log 2>/dev/null | tail -n 5 | while IFS= read -r line; do
    echo "│ $line"
done
echo "└────────────────────────────────────────────────────────────"
echo ""

echo "Last updated: $(date '+%Y-%m-%d %H:%M:%S')"

Usage:

bash

# Run the dashboard
/root/scripts/metrics-dashboard.sh

# It already runs hourly via cron;
# it can also be run manually whenever needed

Example output:

╔═══════════════════════════════════════════════════════════╗
║        Gunei ERP - Multi-Environment Dashboard           ║
╚═══════════════════════════════════════════════════════════╝

┌─ Infrastructure (Shared) ──────────────────────────────────
│ PostgreSQL Shared: ✅
│ Caddy Shared:      ✅
└────────────────────────────────────────────────────────────

┌─ Staging Environment 🟢 ───────────────────────────────────
│ Backend:  ✅  https://staging-erpback.gunei.xyz
│ Frontend: ✅  https://staging-erpfront.gunei.xyz
│ Database: ✅  gunei_erp_staging
│   Backend container:  Up 2 days
│   Frontend container: Up 2 days
└────────────────────────────────────────────────────────────

┌─ Production Environment 🔵 ─────────────────────────────────
│ Backend:  ✅  (port 3100)
│ Frontend: ✅  (port 3101)
│ Database: ✅  gunei_erp_production
│   Backend container:  Up 1 day
│   Frontend container: Up 1 day
└────────────────────────────────────────────────────────────

┌─ System Resources ──────────────────────────────────────────
│ Disk:   23% used
│ Memory: 4.2G/8.0G
│ Load:   0.45
└────────────────────────────────────────────────────────────

┌─ Recent Events (Last 5) ────────────────────────────────────
│ [2026-01-12 13:15:00] [WARNING] [STAGING] High response time
└────────────────────────────────────────────────────────────

Last updated: 2026-01-12 14:30:00

🔧 Troubleshooting

Health Check Failing in Staging but Production OK

Symptom: staging alerts fire while production works normally

Diagnosis:

bash

# Staging-specific entries in the health log
grep "\[STAGING\]" /var/log/gunei-health.log | tail -n 20

# Manual staging check
curl -v https://staging-erpback.gunei.xyz/status
curl -v https://staging-erpfront.gunei.xyz/health

# Staging container logs
docker logs gunei-backend-staging --tail 50
docker logs gunei-frontend-staging --tail 50

# Check for resource contention with production
docker stats --no-stream | grep staging

Solution:

bash

# If staging is slow or crashing
docker restart gunei-backend-staging
docker restart gunei-frontend-staging

# Check the staging database specifically
docker exec postgres-shared psql -U gunei_staging_user -d gunei_erp_staging -c "SELECT 1;"

Production Down but Staging OK

Symptom: production does not respond; staging is operational

Diagnosis:

bash

# Production entries in the health log
grep "\[PRODUCTION\]" /var/log/gunei-health.log | tail -n 20

# Manual production check
curl -v http://localhost:3100/status
curl -v http://localhost:3101/health

# Production container logs
docker logs gunei-backend-production --tail 50
docker logs gunei-frontend-production --tail 50

Solution:

bash

# Restart production services
cd /opt/apps/gunei-erp/backend/production
docker compose restart backend

cd /opt/apps/gunei-erp/frontend/production
docker compose restart frontend

Infrastructure Down (Affects Both Environments)

Symptom: staging and production fail at the same time

Diagnosis:

bash

# Check infrastructure
/root/scripts/check-infrastructure.sh

# PostgreSQL logs
docker logs postgres-shared --tail 100

# Caddy logs
docker logs caddy-shared --tail 100

# Confirm the shared containers are running
docker ps | grep -E "postgres-shared|caddy-shared"

Solution:

bash

# Restart PostgreSQL (CAUTION: affects both environments)
docker restart postgres-shared
sleep 10   # give PostgreSQL time to start accepting connections

# Restart Caddy
docker restart caddy-shared

# Verify that everything is back
/root/scripts/health-check.sh
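A fixed `sleep 10` can be too short after a slow PostgreSQL start, or needlessly long after a fast one. The following is a minimal sketch of a bounded retry loop that could replace it. It assumes that `pg_isready` inside `postgres-shared` is an adequate readiness signal. `wait_for` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: retry a command once per second until it succeeds
# or TIMEOUT_SECONDS have elapsed.
# Usage: wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
wait_for() {
    local timeout=$1 waited=0
    shift
    until "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Example:
#   docker restart postgres-shared
#   wait_for 60 docker exec postgres-shared pg_isready -U postgres
#   docker restart caddy-shared
#   /root/scripts/health-check.sh
```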

Too Many Discord Notifications (Multi-Environment)

Symptom: a flood of alerts from several environments

Diagnosis:

bash

# Alert frequency per environment
grep "Discord notification sent" /var/log/gunei-health.log | grep "\[STAGING\]" | tail -n 20
grep "Discord notification sent" /var/log/gunei-health.log | grep "\[PRODUCTION\]" | tail -n 20

# Identify the problematic environment
grep "\[ERROR\]" /var/log/gunei-health.log | tail -n 50
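Counting alerts per environment gives a quicker picture than scrolling through grep output. The following is a minimal sketch that assumes the `[date] [LEVEL] [ENV] message` line format shown in the dashboard's Recent Events example. `count_by_env` is a hypothetical name, and `INFRASTRUCTURE` is an assumed tag for infrastructure entries.

```bash
#!/bin/bash
# Hypothetical helper: count log lines of one level per environment tag.
# Usage: count_by_env LEVEL < /var/log/gunei-health.log
# Output: one "ENV COUNT" line per environment, sorted by name.
count_by_env() {
    awk -v level="[$1]" '
        index($0, level) {
            # INFRASTRUCTURE is an assumed tag name
            if (match($0, /\[(STAGING|PRODUCTION|INFRASTRUCTURE)\]/))
                counts[substr($0, RSTART + 1, RLENGTH - 2)]++
        }
        END { for (env in counts) print env, counts[env] }
    ' | sort
}
```

For example, `count_by_env ERROR < /var/log/gunei-health.log` shows which environment produces most of the errors.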

Solution:

bash

# Options:
#   - Add a per-environment cooldown in alert-check.sh
#   - Or reduce the cron frequency
#   - Or use different thresholds per environment:
#       - Staging: 3 consecutive failures before alerting
#       - Production: alert on the first failure
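The cooldown and per-environment thresholds mentioned above could be sketched as follows. This is not the actual alert-check.sh logic. The state directory `/var/lib/gunei-alerts`, the 30-minute cooldown and the functions `threshold_for`, `record_result` and `should_alert` are all assumptions. The failure counts follow the consecutive-failure thresholds listed earlier in this guide (2 for staging, 1 for production and infrastructure); staging can be raised to 3 as suggested here.

```bash
#!/bin/bash
# Hypothetical per-environment failure counter with an alert cooldown.
STATE_DIR=${STATE_DIR:-/var/lib/gunei-alerts}   # assumed location, survives cron runs
COOLDOWN=${COOLDOWN:-1800}                      # assumed: 30 minutes between alerts

# Consecutive failures required before alerting.
threshold_for() {
    case "$1" in
        STAGING) echo 2 ;;
        PRODUCTION | INFRASTRUCTURE) echo 1 ;;
        *) echo 1 ;;
    esac
}

# record_result ENV ok|fail
# A failure increments the counter; a success resets counter and cooldown.
record_result() {
    local file="$STATE_DIR/$1.failures" count=0
    mkdir -p "$STATE_DIR"
    if [ "$2" = ok ]; then
        rm -f "$file" "$STATE_DIR/$1.last_alert"
        return 0
    fi
    [ -f "$file" ] && count=$(cat "$file")
    echo $((count + 1)) > "$file"
}

# should_alert ENV [NOW_EPOCH]
# Exits 0 (and records the alert time) when the threshold is reached
# and the cooldown has expired; exits 1 otherwise.
should_alert() {
    local env=$1 now=${2:-$(date +%s)} count=0 last=0
    [ -f "$STATE_DIR/$env.failures" ] && count=$(cat "$STATE_DIR/$env.failures")
    [ -f "$STATE_DIR/$env.last_alert" ] && last=$(cat "$STATE_DIR/$env.last_alert")
    [ "$count" -ge "$(threshold_for "$env")" ] || return 1
    [ $((now - last)) -ge "$COOLDOWN" ] || return 1
    echo "$now" > "$STATE_DIR/$env.last_alert"
}
```

In a cron-driven check, `record_result STAGING fail` (or `ok`) would run after each probe, and the Discord notification would be sent only when `should_alert STAGING` succeeds.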

Logs Not Rotating / Disk Full

Symptom: disk usage above 90%, very large logs from several environments

Diagnosis:

bash

# Size of the health logs
du -sh /var/log/gunei-*.log

# Size of each container directory (includes its json-file log)
du -sh /var/lib/docker/containers/*/

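Solution:

bash

# Force rotation
logrotate -f /etc/logrotate.d/gunei-health

# Delete compressed logs older than 30 days
find /var/log -name "*.gz" -mtime +30 -delete

# Clean up Docker: removes stopped containers (and their logs), unused
# networks, unused images and build cache.
# Avoid --volumes here: it also removes unused volumes, and a database
# volume counts as unused whenever its container is stopped or removed.
docker system prune -a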
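Container logs live under `/var/lib/docker/containers/<container-id>/`, so `du` alone does not say which service a log belongs to. The following is a minimal sketch that pairs each container name with its log path through `docker inspect` (the `.Name` and `.LogPath` fields) and sorts the files by size. `log_sizes` is a hypothetical helper. The commented `truncate -s 0` line is a common way to empty a running container's json-file log without restarting it, but it discards that log's history.

```bash
#!/bin/bash
# Hypothetical helper: list container log files by size, largest first.
# Input (stdin): "NAME LOGPATH" lines. Output: "BYTES NAME LOGPATH" lines.
log_sizes() {
    local name path
    while read -r name path; do
        [ -f "$path" ] || continue      # skip containers without a log file
        printf '%s %s %s\n' "$(wc -c < "$path" | tr -d ' ')" "$name" "$path"
    done | sort -rn
}

# Example (as root, since the log files belong to the Docker daemon):
#   docker inspect --format '{{.Name}} {{.LogPath}}' $(docker ps -aq) | log_sizes | head
#
# Empty one container's log without restarting it (its history is lost):
#   truncate -s 0 "$(docker inspect --format '{{.LogPath}}' gunei-backend-staging)"
```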

Cron Does Not Run the Scripts

See the troubleshooting section in the previous version of this guide; it has not changed.

Conflicts Between Environments

Symptom: one environment affects the other (staging slows down when production is under load, or vice versa)

Diagnosis:

bash

# Resource usage per container
docker stats --no-stream | grep gunei

# DB connections per environment and state
docker exec postgres-shared psql -U postgres -c "
SELECT datname, count(*), state
FROM pg_stat_activity
WHERE datname LIKE 'gunei_%'
GROUP BY datname, state
ORDER BY datname;"

# Confirm that each environment connects to its own database
docker logs gunei-backend-staging 2>&1 | grep -i "database\|connection"
docker logs gunei-backend-production 2>&1 | grep -i "database\|connection"

Possible causes:

Crossed DB connections: the staging backend connects to the production database
Wrong ports: production uses a staging port
Saturated shared resources: PostgreSQL or Caddy at capacity
Wrong environment variables: DATABASE_URL points to the wrong environment

Solution:

bash

# Check that the ports do not collide
ss -tlnp | grep -E ':(3000|3001|3100|3101)\b'

# Expected owner of each port (ss usually reports the process as docker-proxy):
# :3000 - gunei-backend-staging
# :3001 - gunei-frontend-staging
# :3100 - gunei-backend-production
# :3101 - gunei-frontend-production

# If there is a conflict, recreate the container with the correct mapping (host:container)
# The staging backend must map 3000:3000
# The production backend must map 3100:3000

# Check DATABASE_URL in each environment (the output may include credentials)
docker exec gunei-backend-staging printenv | grep DATABASE
# Must show: gunei_erp_staging

docker exec gunei-backend-production printenv | grep DATABASE
# Must show: gunei_erp_production

# If it is misconfigured, fix the docker-compose file and recreate the container
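Grepping `printenv` prints the whole connection string, credentials included. The following is a minimal sketch that extracts only the database name from `DATABASE_URL` and compares it with the expected one. It assumes the variable holds a standard `postgresql://user:password@host:port/dbname?params` URL whose password is percent-encoded. `db_from_url` and `check_db` are hypothetical helpers.

```bash
#!/bin/bash
# Hypothetical helper: database name from a PostgreSQL URL, e.g.
#   postgresql://user:pass@postgres-shared:5432/gunei_erp_staging?schema=public
db_from_url() {
    local rest=${1#*://}      # drop the scheme
    rest=${rest#*/}           # drop credentials, host and port
    echo "${rest%%[?]*}"      # drop query parameters
}

# check_db CONTAINER EXPECTED_DB -> 0 on match, 1 on mismatch, 2 if unreadable.
check_db() {
    local url db
    url=$(docker exec "$1" printenv DATABASE_URL) || return 2
    db=$(db_from_url "$url")
    if [ "$db" = "$2" ]; then
        echo "✅ $1 -> $db"
    else
        echo "❌ $1 -> $db (expected $2)"
        return 1
    fi
}

# Example:
#   check_db gunei-backend-staging gunei_erp_staging
#   check_db gunei-backend-production gunei_erp_production
```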

Port Already in Use by Another Process

Symptom: a container does not start and reports "port already in use"

Diagnosis:

bash

# Find which process holds each port (5433 is PostgreSQL Shared)
lsof -i :3000
lsof -i :3001
lsof -i :3100
lsof -i :3101
lsof -i :5433

# List all containers, including stopped ones
docker ps -a | grep gunei

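Solution:

bash

# If it is a leftover (zombie) container
docker rm -f <container_id>

# If it is another process
kill <PID>

# Start the affected container again, from its environment's compose
# directory (e.g. /opt/apps/gunei-erp/backend/production)
docker compose up -d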

Database Connections Exhausted

Symptom: "too many connections" errors in the backend logs

Diagnosis:

bash

# Total connections
docker exec postgres-shared psql -U postgres -c "SELECT count(*) FROM pg_stat_activity;"

# Connections per database
docker exec postgres-shared psql -U postgres -c "
SELECT datname, count(*)
FROM pg_stat_activity
WHERE datname IS NOT NULL
GROUP BY datname;"

# Longest-idle connections
docker exec postgres-shared psql -U postgres -c "
SELECT datname, usename, state, query_start, now() - query_start as duration
FROM pg_stat_activity
WHERE state = 'idle'
ORDER BY duration DESC
LIMIT 20;"

Solution:

bash

# Terminate idle staging connections whose last query started over 30 minutes ago
docker exec postgres-shared psql -U postgres -c "
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'gunei_erp_staging'
AND state = 'idle'
AND query_start < now() - interval '30 minutes';"

# Same for production if needed (more conservatively: 1 hour)
docker exec postgres-shared psql -U postgres -c "
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'gunei_erp_production'
AND state = 'idle'
AND query_start < now() - interval '1 hour';"

# Restart the backends to reset their connection pools
# (restarting production causes a brief outage)
docker restart gunei-backend-staging
docker restart gunei-backend-production
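Editing database names and ages by hand in the queries above is error-prone. The following is a minimal sketch of a helper that builds the same `pg_terminate_backend` query but accepts only the two known databases and a whole number of minutes. `terminate_idle_sql` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: build the idle-connection termination query above.
# Usage: terminate_idle_sql DATABASE MINUTES
terminate_idle_sql() {
    case "$1" in
        gunei_erp_staging | gunei_erp_production) ;;
        *) echo "unknown database: $1" >&2; return 1 ;;
    esac
    case "$2" in
        '' | *[!0-9]*) echo "minutes must be a whole number" >&2; return 1 ;;
    esac
    printf "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = '%s' AND state = 'idle' AND query_start < now() - interval '%s minutes';\n" "$1" "$2"
}

# Example:
#   docker exec postgres-shared psql -U postgres -c "$(terminate_idle_sql gunei_erp_staging 30)"
#   docker exec postgres-shared psql -U postgres -c "$(terminate_idle_sql gunei_erp_production 60)"
```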

📚 References

Monitoring Scripts

/root/scripts/monitor-logs.sh: view multi-environment logs
/root/scripts/health-check.sh: full check
/root/scripts/alert-check.sh: automated alerts
/root/scripts/check-staging.sh: quick staging check
/root/scripts/check-production.sh: quick production check
/root/scripts/check-infrastructure.sh: infrastructure check
/root/scripts/metrics-dashboard.sh: consolidated dashboard
/root/scripts/compare-environments.sh: compare staging vs production

Monitoring URLs

Staging:

Frontend: https://staging-erpfront.gunei.xyz/health
Backend: https://staging-erpback.gunei.xyz/status

Production:

Frontend: https://erpfront.gunei.xyz/health
Backend: https://erpback.gunei.xyz/status

Last updated: January 14, 2026
Version: 2.1.0

Changes in v2.1:

✅ Detailed documentation of PostgreSQL Shared (port 5433)
✅ Port mapping per environment (staging 3000/3001, production 3100/3101)
✅ Documented DB connections per environment
✅ PostgreSQL metrics: connections, sizes, slow queries, cache hit ratio
✅ Metric comparison between environments (compare-environments.sh)
✅ Docker Logging Configuration (json-file, automatic rotation, max 10m/3 files)
✅ Argentina timezone (TZ=America/Argentina/Buenos_Aires)
✅ Troubleshooting: conflicts between environments
✅ Troubleshooting: ports in use
✅ Troubleshooting: database connections exhausted

Changes in v2.0:

✅ Multi-environment support (staging + production)
✅ Monitoring of shared infrastructure (PostgreSQL, Caddy)
✅ Scripts updated with automatic environment detection
✅ Centralized logs with environment tags
✅ Context-aware Discord notifications per environment
✅ Consolidated multi-environment dashboard
✅ Independent health checks per environment
✅ Multi-environment troubleshooting
✅ New scripts: check-staging, check-production, check-infrastructure, metrics-dashboard
