
📊 Monitoring Guide - Gunei ERP (Enterprise Architecture)

Complete multi-environment monitoring and observability setup for the Gunei ERP system.

Version 2.1 - Multi-Environment Enterprise (Staging + Production)



🎯 Overview

Goals

  • Detect problems in each environment before they affect users
  • Centralize logs from multiple services and environments
  • Automate health checks every 5 minutes (staging + production)
  • Send failure notifications to Discord, identifying the affected environment
  • Keep an event history per environment
  • Monitor shared infrastructure (PostgreSQL, Caddy)

Components per Environment

┌─ STAGING ──────────────────────────┐
│ Frontend Staging → Health Check    │
│ Backend Staging  → Health Check    │
│        ↓                           │
│   Logs + Metrics                   │
└────────────────────────────────────┘

┌─ PRODUCTION ───────────────────────┐
│ Frontend Production → Health Check │
│ Backend Production  → Health Check │
│        ↓                           │
│   Logs + Metrics                   │
└────────────────────────────────────┘

┌─ INFRASTRUCTURE ───────────────────┐
│ PostgreSQL Shared → Health Check   │
│ Caddy Shared     → Health Check    │
│        ↓                           │
│   Logs + Metrics                   │
└────────────────────────────────────┘

        Centralized Logs

         Discord Webhooks

Monitoring Philosophy

  • Per Environment: each environment (staging/production) is monitored independently
  • Shared Infrastructure: PostgreSQL and Caddy are monitored as critical services that affect both environments
  • Contextual Alerts: notifications clearly identify the affected environment
  • Segregated Logs: logs are tagged per environment but stored centrally for analysis
  • Redundancy: if staging fails, production can keep running (and vice versa)

🏗️ Monitoring Architecture

Monitored Services

| Component | Type | Affects | Health Endpoint |
|---|---|---|---|
| Caddy Shared | Infrastructure | Both environments | N/A (process check) |
| PostgreSQL Shared | Infrastructure | Both environments | pg_isready (host port 5433) |
| Frontend Staging | Application | Staging | https://staging-erpfront.gunei.xyz/health |
| Backend Staging | Application | Staging | https://staging-erpback.gunei.xyz/status |
| Frontend Production | Application | Production | (URL pending) |
| Backend Production | Application | Production | (URL pending) |

Script Locations

bash
/root/scripts/
├── monitor-logs.sh         # View multi-environment logs (updated)
├── health-check.sh         # Full health verification (updated)
├── alert-check.sh          # Health check + alerts (updated)
├── check-staging.sh        # Staging-only check (new)
├── check-production.sh     # Production-only check (new)
├── check-infrastructure.sh # Shared services check (new)
└── metrics-dashboard.sh    # Consolidated dashboard (new)

🥦 Health Checks per Environment

Endpoints per Environment

Staging Environment

Frontend Staging:

bash
GET https://staging-erpfront.gunei.xyz/health

Response:
{
  "status": "healthy",
  "timestamp": "2026-01-12T12:34:56.789Z",
  "service": "gunei-erp-frontend",
  "version": "0.0.1",
  "runtime": "bun",
  "environment": "staging"
}

Backend Staging:

bash
GET https://staging-erpback.gunei.xyz/status

Response:
{
  "status": "ok",
  "timestamp": "2026-01-12T12:34:56.789Z",
  "environment": "staging",
  "database": "connected",
  "uptime": 123456
}

Production Environment (once it is active)

Frontend Production:

bash
GET https://erpfront.gunei.xyz/health  # URL not yet configured

Response:
{
  "status": "healthy",
  "timestamp": "2026-01-12T12:34:56.789Z",
  "service": "gunei-erp-frontend",
  "environment": "production"
}

Backend Production:

bash
GET https://erpback.gunei.xyz/status  # URL not yet configured

Response:
{
  "status": "ok",
  "timestamp": "2026-01-12T12:34:56.789Z",
  "environment": "production",
  "database": "connected"
}

Infrastructure (Shared)

PostgreSQL Shared:

bash
# Check staging database
docker exec postgres-shared pg_isready -U gunei_staging_user -d gunei_erp_staging

# Check production database
docker exec postgres-shared pg_isready -U gunei_prod_user -d gunei_erp_production

# Check server
docker exec postgres-shared pg_isready -U postgres

Caddy Shared:

bash
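# Check that the container is running
docker ps | grep caddy-shared

# Check logs for errors (2>&1 is needed because docker logs replays the container's stderr on stderr)
docker logs caddy-shared --tail 50 2>&1 | grep -i error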

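Port Mapping per Environment

| Environment | Service | Internal Port | Host Port |
|---|---|---|---|
| Shared | PostgreSQL | 5432 | 5433 |
| Staging | Backend | 3000 | 3000 |
| Staging | Frontend | 3001 | 3001 |
| Production | Backend | 3000 | 3100 |
| Production | Frontend | 3001 | 3101 |

Port verification:

bash
# List listening sockets on the mapped ports (the leading colon avoids matching PIDs)
ss -tlnp | grep -E ":(3000|3001|3100|3101|5433)[[:space:]]"

# Verify there are no conflicts (each port should appear only once per address family)
netstat -tlnp | grep -E ":300[01]|:310[01]|:5433"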

Database Connections per Environment

| Environment | Database | User | Host |
|---|---|---|---|
| Staging | gunei_erp_staging | gunei_staging_user | postgres-shared:5432 |
| Production | gunei_erp_production | gunei_prod_user | postgres-shared:5432 |

Verify that each user reaches the right database:

bash
# Staging - must connect to gunei_erp_staging
docker exec postgres-shared psql -U gunei_staging_user -d gunei_erp_staging -c "\conninfo"

# Production - must connect to gunei_erp_production
docker exec postgres-shared psql -U gunei_prod_user -d gunei_erp_production -c "\conninfo"

# Show active connections per database
docker exec postgres-shared psql -U postgres -c "SELECT datname, count(*) FROM pg_stat_activity GROUP BY datname;"

Health Criteria

Per service:

  • Status 200: service operational
  • Status 5xx: critical failure
  • Timeout (5s): service not responding
  • Database: connection verified

Per environment:

  • Staging Healthy: both services (frontend + backend) respond OK
  • Production Healthy: both services respond OK
  • Infrastructure Healthy: PostgreSQL + Caddy operational
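
The per-service criteria can be expressed as a small classifier. The sketch below is illustrative and is not one of the scripts in /root/scripts: the function name classify_http_status and the labels it prints are assumptions, as is mapping curl's 000 code (no HTTP response received, for example after a timeout) to "no-response" and every other code to "unknown".

```bash
#!/bin/bash
# Hypothetical helper: map an HTTP status code to a health label.
classify_http_status() {
    local code="$1"
    case "$code" in
        200) echo "healthy" ;;       # service operational
        5??) echo "critical" ;;      # any 5xx response
        000) echo "no-response" ;;   # curl got no HTTP response (timeout, refused connection)
        *)   echo "unknown" ;;       # not covered by the criteria above
    esac
}

# Typical call site (commented out because it needs network access):
# code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "https://staging-erpback.gunei.xyz/status")
# classify_http_status "$code"

classify_http_status 200   # prints: healthy
classify_http_status 503   # prints: critical
```

Keeping the classification separate from the curl call makes it easy to exercise the rules without touching a live service.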

🔧 Monitoring Scripts

1. monitor-logs.sh (Updated)

Purpose: View the logs of all services and environments in one place

bash
#!/bin/bash
# View multi-environment logs on a single screen

# Usage
/root/scripts/monitor-logs.sh

# View logs for a specific environment
/root/scripts/monitor-logs.sh staging
/root/scripts/monitor-logs.sh production

# View infrastructure logs
/root/scripts/monitor-logs.sh infrastructure

Example of the updated implementation:

bash
#!/bin/bash
# /root/scripts/monitor-logs.sh

ENVIRONMENT=${1:-all}

echo "==================================="
echo "📋 Gunei ERP - Log Monitor"
echo "==================================="
echo ""

# Print the last 10 log lines of a container, or a warning if it is not running
show_logs() {
    service=$1
    title=$2
    echo "📦 === $title ==="
    # Match the exact container name so similarly named containers do not collide
    if docker ps --format '{{.Names}}' | grep -qx "$service"; then
        docker logs --tail 10 "$service" 2>&1
    else
        echo "⚠️  Container $service is not running"
    fi
    echo ""
}

if [ "$ENVIRONMENT" = "all" ] || [ "$ENVIRONMENT" = "infrastructure" ]; then
    echo "🏗️ === INFRASTRUCTURE ==="
    show_logs "postgres-shared" "PostgreSQL Shared"
    show_logs "caddy-shared" "Caddy Shared"
    echo ""
fi

if [ "$ENVIRONMENT" = "all" ] || [ "$ENVIRONMENT" = "staging" ]; then
    echo "🟢 === STAGING ==="
    show_logs "gunei-backend-staging" "Backend Staging"
    show_logs "gunei-frontend-staging" "Frontend Staging"
    echo ""
fi

# Production is shown only when its backend container exists
if [ "$ENVIRONMENT" = "all" ] || [ "$ENVIRONMENT" = "production" ]; then
    if docker ps | grep -q "gunei-backend-production"; then
        echo "🔵 === PRODUCTION ==="
        show_logs "gunei-backend-production" "Backend Production"
        show_logs "gunei-frontend-production" "Frontend Production"
        echo ""
    fi
fi

echo "📊 === Container Status ==="
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
echo ""

echo "💾 === Resource Usage ==="
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

Monitored services:

  • Infrastructure: PostgreSQL Shared, Caddy Shared
  • Staging: Backend Staging, Frontend Staging
  • Production: Backend Production, Frontend Production (once deployed)

Features:

  • Output colored per environment
  • Filtering by a specific environment
  • Synchronized timestamps
  • Automatic detection of active environments

2. health-check.sh (Updated)

Purpose: Verify the state of the whole system, per environment

bash
#!/bin/bash
# Full multi-environment health check

# Usage
/root/scripts/health-check.sh

# Check a single environment
/root/scripts/health-check.sh staging
/root/scripts/health-check.sh production

Example of the updated implementation:

bash
#!/bin/bash
# /root/scripts/health-check.sh

ENVIRONMENT=${1:-all}

echo "🥦 Health Check - Gunei ERP (Multi-Environment)"
echo "================================================"
echo ""

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

check_service() {
    service=$1
    url=$2

    # --max-time 5 enforces the 5-second timeout from the health criteria
    if curl -f -s -o /dev/null --max-time 5 "$url"; then
        echo -e "${GREEN}✅ $service: OK${NC}"
        return 0
    else
        echo -e "${RED}❌ $service: FAIL${NC}"
        return 1
    fi
}

# Infrastructure (always checked, regardless of the selected environment)
echo "🏗️  Infrastructure:"
if docker exec postgres-shared pg_isready -U postgres > /dev/null 2>&1; then
    echo -e "${GREEN}✅ PostgreSQL Shared: OK${NC}"
else
    echo -e "${RED}❌ PostgreSQL Shared: FAIL${NC}"
fi

if docker ps | grep -q caddy-shared; then
    echo -e "${GREEN}✅ Caddy Shared: OK${NC}"
else
    echo -e "${RED}❌ Caddy Shared: FAIL${NC}"
fi

# Staging
if [ "$ENVIRONMENT" = "all" ] || [ "$ENVIRONMENT" = "staging" ]; then
    echo ""
    echo "🟢 Staging Environment:"
    check_service "Backend Staging /status" "http://localhost:3000/status"
    check_service "Frontend Staging /health" "http://localhost:3001/health"
    check_service "HTTPS staging-erpback.gunei.xyz" "https://staging-erpback.gunei.xyz/status"
    check_service "HTTPS staging-erpfront.gunei.xyz" "https://staging-erpfront.gunei.xyz/health"
fi

# Production (only if it exists)
if [ "$ENVIRONMENT" = "all" ] || [ "$ENVIRONMENT" = "production" ]; then
    if docker ps | grep -q "gunei-backend-production"; then
        echo ""
        echo "🔵 Production Environment:"
        check_service "Backend Production /status" "http://localhost:3100/status"
        check_service "Frontend Production /health" "http://localhost:3101/health"
        # Public URLs, once they are configured:
        # check_service "HTTPS erpback.gunei.xyz" "https://erpback.gunei.xyz/status"
        # check_service "HTTPS erpfront.gunei.xyz" "https://erpfront.gunei.xyz/health"
    fi
fi

# Databases
echo ""
echo "🗄️  Databases:"
if docker exec postgres-shared psql -U gunei_staging_user -d gunei_erp_staging -c "SELECT 1;" > /dev/null 2>&1; then
    echo -e "${GREEN}✅ DB Staging: OK${NC}"
else
    echo -e "${RED}❌ DB Staging: FAIL${NC}"
fi

if docker exec postgres-shared psql -U gunei_prod_user -d gunei_erp_production -c "SELECT 1;" > /dev/null 2>&1; then
    echo -e "${GREEN}✅ DB Production: OK${NC}"
else
    echo -e "${YELLOW}⚠️  DB Production: Not available or not configured${NC}"
fi

# Running containers
echo ""
echo "📦 Docker Containers:"
docker ps --format "table {{.Names}}\t{{.Status}}"

# Resource usage
echo ""
echo "💾 System Resources:"
df -h / | tail -1 | awk '{print "Disk: "$3" / "$2" ("$5" used)"}'
free -h | grep Mem | awk '{print "RAM:  "$3" / "$2" (used/total)"}'

Example output:

🥦 Health Check - Gunei ERP (Multi-Environment)
================================================

🏗️  Infrastructure:
✅ PostgreSQL Shared: OK
✅ Caddy Shared: OK

🟢 Staging Environment:
✅ Backend Staging /status: OK
✅ Frontend Staging /health: OK
✅ HTTPS staging-erpback.gunei.xyz: OK
✅ HTTPS staging-erpfront.gunei.xyz: OK

🔵 Production Environment:
✅ Backend Production /status: OK
✅ Frontend Production /health: OK

🗄️  Databases:
✅ DB Staging: OK
✅ DB Production: OK

📦 Docker Containers:
NAMES                        STATUS
caddy-shared                 Up 5 days
postgres-shared              Up 5 days
gunei-backend-staging        Up 2 days
gunei-frontend-staging       Up 2 days
gunei-backend-production     Up 1 day
gunei-frontend-production    Up 1 day

💾 System Resources:
Disk: 45G / 200G (23% used)
RAM:  4.2G / 8.0G (used/total)

3. alert-check.sh (Updated)

Purpose: Automated health check with per-environment notifications

bash
#!/bin/bash
# Run by cron every 5 minutes

# What it does:
# - Runs health checks per environment
# - Detects environment-specific failures
# - Sends Discord alerts that identify the environment
# - Writes to the centralized log with environment tags

Updated alert logic:

  • First failure (staging): immediate alert "🟢 Staging Down"
  • First failure (production): immediate alert "🔵 Production Down"
  • Infrastructure failure: critical alert "🚨 Infrastructure Down (affects all environments)"
  • Consecutive failures: one alert every 15 minutes
  • Recovery: notification that the service is restored, including the downtime
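
The "alert immediately, then every 15 minutes" rule can be isolated in a single decision function. This is a minimal sketch under stated assumptions: should_alert and REALERT_INTERVAL are hypothetical names, timestamps are epoch seconds, and an empty last-alert value stands for "no alert sent yet for this outage".

```bash
#!/bin/bash
# Hypothetical helper: decide whether a Discord alert should be sent on this run.
REALERT_INTERVAL=900  # 15 minutes, in seconds

# $1 = epoch seconds of the last alert (empty if none yet), $2 = current epoch seconds
should_alert() {
    local last_alert="$1" now="$2"
    if [ -z "$last_alert" ]; then
        return 0   # first failure: alert immediately
    fi
    [ $((now - last_alert)) -ge "$REALERT_INTERVAL" ]
}

# Simulate an outage observed by a 5-minute cron: alerts fire at minute 0 and minute 15.
last=""
for minute in 0 5 10 15; do
    now=$((minute * 60))
    if should_alert "$last" "$now"; then
        echo "minute $minute: alert"
        last=$now
    else
        echo "minute $minute: skip"
    fi
done
```

In a real cron run the interval would likely need a small tolerance, since two runs scheduled 15 minutes apart can start a second or two closer together.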

Example implementation:

bash
#!/bin/bash
# /root/scripts/alert-check.sh

LOG_FILE="/var/log/gunei-health.log"
STATE_DIR="/var/tmp/gunei-health-state"
DISCORD_WEBHOOK="${DISCORD_WEBHOOK_URL}"
REALERT_INTERVAL=870  # Re-alert roughly every 15 minutes (slightly under 900s to tolerate cron jitter)

mkdir -p "$STATE_DIR"

log() {
    # Append to the file only: cron already redirects stdout to this same file,
    # so also echoing to stdout (tee) would write every line twice
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}

# Return 0 when the target is healthy. The two pseudo-targets cover
# services that have no HTTP endpoint; anything else is treated as a URL.
probe() {
    case "$1" in
        direct-pg-check) docker exec postgres-shared pg_isready -U postgres > /dev/null 2>&1 ;;
        docker-ps-check) docker ps --format '{{.Names}}' | grep -qx "caddy-shared" ;;
        *)               curl -f -s -o /dev/null --max-time 5 "$1" ;;
    esac
}

check_and_alert() {
    ENV=$1
    SERVICE=$2
    URL=$3
    # State file: line 1 = first failure, line 2 = last alert (epoch seconds)
    STATE_FILE="$STATE_DIR/${ENV}_${SERVICE}_state"
    NOW=$(date +%s)

    if probe "$URL"; then
        # Service is UP
        if [ -f "$STATE_FILE" ]; then
            # Was down, now recovered: downtime = now - first failure
            START_TIME=$(head -n 1 "$STATE_FILE")
            DOWNTIME=$((NOW - START_TIME))
            log "[INFO] [${ENV^^}] [HEALTH_CHECK] $SERVICE recovered (downtime: ${DOWNTIME}s)"
            send_discord_recovery "$ENV" "$SERVICE" "$DOWNTIME"
            rm "$STATE_FILE"
        fi
    else
        # Service is DOWN
        if [ ! -f "$STATE_FILE" ]; then
            # First failure: record the start time and alert immediately
            printf '%s\n%s\n' "$NOW" "$NOW" > "$STATE_FILE"
            log "[ERROR] [${ENV^^}] [HEALTH_CHECK] $SERVICE is DOWN"
            send_discord_alert "$ENV" "$SERVICE" "$URL"
        else
            # Still down: log every run, re-alert only after the interval has passed
            START_TIME=$(head -n 1 "$STATE_FILE")
            LAST_ALERT=$(tail -n 1 "$STATE_FILE")
            DOWNTIME=$((NOW - START_TIME))
            log "[ERROR] [${ENV^^}] [HEALTH_CHECK] $SERVICE still DOWN (${DOWNTIME}s)"
            if [ $((NOW - LAST_ALERT)) -ge "$REALERT_INTERVAL" ]; then
                printf '%s\n%s\n' "$START_TIME" "$NOW" > "$STATE_FILE"
                send_discord_alert "$ENV" "$SERVICE" "$URL"
            fi
        fi
    fi
}

send_discord_alert() {
    ENV=$1
    SERVICE=$2
    URL=$3
    
    if [ "$ENV" = "staging" ]; then
        COLOR="3066993"  # Dark green
        EMOJI="🟢"
    elif [ "$ENV" = "production" ]; then
        COLOR="15548997"  # Red
        EMOJI="🔵"
    else
        COLOR="15105570"  # Orange
        EMOJI="🚨"
    fi
    
    curl -s -o /dev/null -X POST "$DISCORD_WEBHOOK" \
        -H "Content-Type: application/json" \
        -d "{
            \"embeds\": [{
                \"title\": \"${EMOJI} ${ENV^^} - ${SERVICE} Down\",
                \"color\": ${COLOR},
                \"description\": \"Service is not responding\",
                \"fields\": [
                    {\"name\": \"Environment\", \"value\": \"$ENV\", \"inline\": true},
                    {\"name\": \"Service\", \"value\": \"$SERVICE\", \"inline\": true},
                    {\"name\": \"URL\", \"value\": \"$URL\"},
                    {\"name\": \"Time\", \"value\": \"$(date '+%Y-%m-%d %H:%M:%S')\"}
                ]
            }]
        }"
}

send_discord_recovery() {
    ENV=$1
    SERVICE=$2
    DOWNTIME=$3
    
    curl -s -o /dev/null -X POST "$DISCORD_WEBHOOK" \
        -H "Content-Type: application/json" \
        -d "{
            \"embeds\": [{
                \"title\": \"✅ ${ENV^^} - ${SERVICE} Recovered\",
                \"color\": 5763719,
                \"description\": \"Service is responding normally\",
                \"fields\": [
                    {\"name\": \"Environment\", \"value\": \"$ENV\", \"inline\": true},
                    {\"name\": \"Downtime\", \"value\": \"${DOWNTIME} seconds\", \"inline\": true}
                ]
            }]
        }"
}

# Main - Check all services
log "[INFO] [ALL] [HEALTH_CHECK] === Health Check Started ==="

# Infrastructure (pseudo-targets resolved by probe)
check_and_alert "infrastructure" "PostgreSQL" "direct-pg-check"
check_and_alert "infrastructure" "Caddy" "docker-ps-check"

# Staging
check_and_alert "staging" "Backend" "https://staging-erpback.gunei.xyz/status"
check_and_alert "staging" "Frontend" "https://staging-erpfront.gunei.xyz/health"

# Production (only if it is active)
if docker ps | grep -q "gunei-backend-production"; then
    check_and_alert "production" "Backend" "http://localhost:3100/status"
    check_and_alert "production" "Frontend" "http://localhost:3101/health"
fi

log "[INFO] [ALL] [HEALTH_CHECK] === Health Check Completed ==="

4. check-staging.sh (New)

Purpose: Quick health check of the staging environment only

bash
#!/bin/bash
# Quick staging check

/root/scripts/health-check.sh staging

5. check-production.sh (New)

Purpose: Quick health check of the production environment only

bash
#!/bin/bash
# Quick production check

/root/scripts/health-check.sh production

6. check-infrastructure.sh (New)

Purpose: Health check of the critical shared services

bash
#!/bin/bash
# Shared infrastructure check

echo "🏗️  Infrastructure Health Check"
echo "=============================="
echo ""

# PostgreSQL
echo "PostgreSQL Shared:"
if docker exec postgres-shared pg_isready -U postgres > /dev/null 2>&1; then
    echo "  ✅ Server: OK"
else
    echo "  ❌ Server: FAIL"
fi

if docker exec postgres-shared psql -U gunei_staging_user -d gunei_erp_staging -c "SELECT 1;" > /dev/null 2>&1; then
    echo "  ✅ Staging DB: OK"
else
    echo "  ❌ Staging DB: FAIL"
fi

if docker exec postgres-shared psql -U gunei_prod_user -d gunei_erp_production -c "SELECT 1;" > /dev/null 2>&1; then
    echo "  ✅ Production DB: OK"
else
    echo "  ⚠️  Production DB: Not available"
fi

# Caddy
echo ""
echo "Caddy Shared:"
if docker ps | grep -q caddy-shared; then
    echo "  ✅ Container: Running"
    
    # Count issued certificate files (listing the directory itself only counts issuer folders)
    CERT_COUNT=$(docker exec caddy-shared find /data/caddy/certificates -name '*.crt' 2>/dev/null | wc -l)
    echo "  ✅ SSL Certificates: $CERT_COUNT"
    
    # Check logs for errors
    ERROR_COUNT=$(docker logs caddy-shared --tail 100 2>&1 | grep -ci error)
    if [ "$ERROR_COUNT" -eq 0 ]; then
        echo "  ✅ No recent errors"
    else
        echo "  ⚠️  Recent errors: $ERROR_COUNT"
    fi
else
    echo "  ❌ Container: Not running"
fi

# Disk space
echo ""
echo "System Resources:"
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | tr -d '%')
echo "  Disk: $DISK_USAGE% used"
if [ "$DISK_USAGE" -gt 80 ]; then
    echo "    ⚠️  WARNING: High disk usage"
fi

MEM_USAGE=$(free | awk 'NR==2 {printf "%.0f", $3/$2*100}')
echo "  Memory: $MEM_USAGE% used"
if [ "$MEM_USAGE" -gt 85 ]; then
    echo "    ⚠️  WARNING: High memory usage"
fi

7. metrics-dashboard.sh (New)

Purpose: Consolidated dashboard of multi-environment metrics

See the Consolidated Dashboard section for the full implementation.


📝 Centralized Logs

Main File

bash
/var/log/gunei-health.log

Updated Log Format

[TIMESTAMP] [LEVEL] [ENVIRONMENT] [COMPONENT] Message

Example:

[2026-01-12 12:34:56] [INFO] [STAGING] [HEALTH_CHECK] Backend health: OK
[2026-01-12 12:39:56] [ERROR] [STAGING] [HEALTH_CHECK] Backend not responding (timeout)
[2026-01-12 12:40:01] [ERROR] [STAGING] [ALERT] Discord notification sent: Backend down
[2026-01-12 12:44:56] [INFO] [STAGING] [HEALTH_CHECK] Backend health: OK
[2026-01-12 12:45:01] [INFO] [STAGING] [ALERT] Discord notification sent: Backend recovered

[2026-01-12 13:15:22] [INFO] [PRODUCTION] [HEALTH_CHECK] Backend health: OK
[2026-01-12 13:15:22] [INFO] [PRODUCTION] [HEALTH_CHECK] Frontend health: OK

[2026-01-12 13:20:30] [WARNING] [INFRASTRUCTURE] [HEALTH_CHECK] PostgreSQL connection slow
[2026-01-12 13:20:30] [WARNING] [INFRASTRUCTURE] [ALERT] PostgreSQL performance degraded
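
A short sketch of how lines in this format could be produced and parsed. Both function names (format_log_line, log_environment) are hypothetical and not part of the existing scripts; the environment is upper-cased on write so that filters such as grep "\[STAGING\]" match.

```bash
#!/bin/bash
# Hypothetical helper: build one line in the
# [TIMESTAMP] [LEVEL] [ENVIRONMENT] [COMPONENT] Message format.
format_log_line() {
    local level="$1" env="$2" component="$3" message="$4"
    env=$(printf '%s' "$env" | tr '[:lower:]' '[:upper:]')
    printf '[%s] [%s] [%s] [%s] %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$env" "$component" "$message"
}

# Hypothetical helper: extract the ENVIRONMENT field (third bracketed group) from a line.
log_environment() {
    printf '%s\n' "$1" | sed -E 's/^\[[^]]*\] \[[^]]*\] \[([^]]*)\].*/\1/'
}

line=$(format_log_line "INFO" "staging" "HEALTH_CHECK" "Backend health: OK")
echo "$line"
log_environment "$line"   # prints: STAGING
```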

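Log Rotation

bash
# Config file: /etc/logrotate.d/gunei-health
/var/log/gunei-health.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    # maxsize keeps the daily schedule and adds a size cap;
    # plain "size" would make logrotate ignore "daily"
    maxsize 100M
}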

Viewing Logs per Environment

bash
# Staging logs
grep "\[STAGING\]" /var/log/gunei-health.log | tail -n 50

# Production logs
grep "\[PRODUCTION\]" /var/log/gunei-health.log | tail -n 50

# Infrastructure logs
grep "\[INFRASTRUCTURE\]" /var/log/gunei-health.log | tail -n 50

# Staging errors
grep "\[STAGING\]" /var/log/gunei-health.log | grep ERROR

# Production alerts
grep "\[PRODUCTION\]" /var/log/gunei-health.log | grep ALERT

# Real time, per environment
tail -f /var/log/gunei-health.log | grep "\[STAGING\]"
tail -f /var/log/gunei-health.log | grep "\[PRODUCTION\]"

Docker Logging Configuration

All containers use the json-file logging driver with automatic rotation:

yaml
logging:
  driver: "json-file"
  options:
    max-size: "10m"    # At most 10 MB per file
    max-file: "3"      # At most 3 files
    tag: "{{.Name}}/{{.ID}}"

Verification:

bash
# Show the logging configuration of a container
docker inspect gunei-backend-staging | grep -A 10 LogConfig

# Logs rotate automatically (they cannot fill the disk)
docker logs gunei-backend-staging --tail 100

Benefits:

  • Logs cannot fill the disk (at most ~30 MB per container)
  • Automatic rotation with no manual intervention
  • Tags identify the container in centralized logs
Timezone

All containers use the Argentina timezone:

yaml
environment:
  - TZ=America/Argentina/Buenos_Aires

Verification:

bash
# Show the timezone of each container
docker exec gunei-backend-staging date
docker exec gunei-frontend-staging date
docker exec postgres-shared date

# Expected output: Argentina time (UTC-3)

Implications:

  • Log timestamps are in Argentina time
  • Cron jobs run in the server's local time
  • Backup file names use Argentina time

Container Logs per Environment

Staging:

bash
# Backend staging
docker logs gunei-backend-staging -f
docker logs gunei-backend-staging --tail 100
docker logs gunei-backend-staging --since 1h

# Frontend staging
docker logs gunei-frontend-staging -f
docker logs gunei-frontend-staging --tail 100

Production:

bash
# Backend production
docker logs gunei-backend-production -f
docker logs gunei-backend-production --tail 100

# Frontend production
docker logs gunei-frontend-production -f
docker logs gunei-frontend-production --tail 100

Infrastructure:

bash
# PostgreSQL (affects both environments)
docker logs postgres-shared -f
docker logs postgres-shared --tail 100

# Caddy (routes both environments)
docker logs caddy-shared -f
docker logs caddy-shared --tail 100

# Caddy access logs per environment
docker exec caddy-shared tail -n 50 /var/log/caddy/staging-erpfront.log
docker exec caddy-shared tail -n 50 /var/log/caddy/staging-erpback.log
docker exec caddy-shared tail -n 50 /var/log/caddy/production-erpfront.log
docker exec caddy-shared tail -n 50 /var/log/caddy/production-erpback.log

⏰ Cron Jobs

Current Configuration

bash
# View the crontab
crontab -l

# Edit the crontab
crontab -e

Full Schedule (Updated)

bash
# Health checks every 5 minutes (multi-environment)
*/5 * * * * /root/scripts/alert-check.sh >> /var/log/gunei-health.log 2>&1

# Daily backups at 2 AM (both environments)
0 2 * * * /root/scripts/backup-postgres.sh >> /var/log/gunei-backups.log 2>&1

# Backup verification at 3 AM
0 3 * * * /root/scripts/verify-backup.sh >> /var/log/gunei-backups.log 2>&1

# Weekly cleanup (Sundays at 4 AM)
0 4 * * 0 /root/scripts/cleanup-backups.sh >> /var/log/gunei-backups.log 2>&1

# Daily disk check at 5 AM
0 5 * * * /root/scripts/check-disk-space.sh >> /var/log/gunei-health.log 2>&1

# Metrics dashboard every hour (optional)
0 * * * * /root/scripts/metrics-dashboard.sh >> /var/log/gunei-metrics.log 2>&1

Verifying Execution

bash
# Show the last health check run (any environment)
grep "HEALTH_CHECK" /var/log/gunei-health.log | tail -n 1

# Show the last run per environment
grep "\[STAGING\]" /var/log/gunei-health.log | grep "HEALTH_CHECK" | tail -n 1
grep "\[PRODUCTION\]" /var/log/gunei-health.log | grep "HEALTH_CHECK" | tail -n 1

# Verify that cron is active
systemctl status cron

# Show cron logs
grep CRON /var/log/syslog | tail -n 20

# Show recent alert-check runs
grep "alert-check.sh" /var/log/syslog | tail -n 10

📢 Discord Notifications

Configured Webhook

bash
# Environment variable (set in GitHub Secrets and on the VPS)
DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/...

Updated Notification Types

1. Deployment Success (Staging)

json
{
  "embeds": [{
    "title": "✅ Deployment Successful - Staging",
    "color": 5763719,
    "fields": [
      {"name": "Environment", "value": "staging", "inline": true},
      {"name": "Branch", "value": "develop", "inline": true},
      {"name": "Commit", "value": "abc1234"},
      {"name": "Duration", "value": "2m 34s"}
    ]
  }]
}

2. Deployment Success (Production)

json
{
  "embeds": [{
    "title": "✅ Deployment Successful - Production",
    "color": 3447003,
    "fields": [
      {"name": "Environment", "value": "production", "inline": true},
      {"name": "Branch", "value": "main", "inline": true},
      {"name": "Commit", "value": "abc1234"},
      {"name": "Duration", "value": "2m 45s"}
    ]
  }]
}

3. Staging Backend Down

json
{
  "embeds": [{
    "title": "🟢 Staging - Backend Down",
    "color": 3066993,
    "description": "Backend staging is not responding",
    "fields": [
      {"name": "Environment", "value": "staging", "inline": true},
      {"name": "Service", "value": "Backend", "inline": true},
      {"name": "Endpoint", "value": "https://staging-erpback.gunei.xyz/status"},
      {"name": "Response", "value": "Timeout (5s)"},
      {"name": "Time", "value": "2026-01-12 14:30:00"}
    ]
  }]
}

4. Production Backend Down (Critical)

Mentions such as @here only notify when they are sent in the top-level content field; inside an embed they do not trigger a notification.

json
{
  "content": "@here Immediate attention required",
  "embeds": [{
    "title": "🔵 Production - Backend Down",
    "color": 15548997,
    "description": "🚨 CRITICAL: Production backend is not responding",
    "fields": [
      {"name": "Environment", "value": "production", "inline": true},
      {"name": "Service", "value": "Backend", "inline": true},
      {"name": "Endpoint", "value": "https://erpback.gunei.xyz/status"},
      {"name": "Response", "value": "Timeout (5s)"},
      {"name": "Time", "value": "2026-01-12 14:30:00"}
    ]
  }]
}

5. Infrastructure Down (Most Critical)

json
{
  "content": "@everyone CRITICAL - ALL ENVIRONMENTS AFFECTED",
  "embeds": [{
    "title": "🚨 Infrastructure Down - Affects All Environments",
    "color": 15105570,
    "description": "CRITICAL: Shared infrastructure is failing",
    "fields": [
      {"name": "Component", "value": "PostgreSQL Shared", "inline": true},
      {"name": "Impact", "value": "All environments", "inline": true},
      {"name": "Time", "value": "2026-01-12 14:30:00"}
    ]
  }]
}

6. Service Recovered

json
{
  "embeds": [{
    "title": "✅ Staging - Backend Recovered",
    "color": 5763719,
    "description": "Backend staging is responding normally",
    "fields": [
      {"name": "Environment", "value": "staging", "inline": true},
      {"name": "Downtime", "value": "15 minutes", "inline": true}
    ]
  }]
}

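Rate Limits

  • Discord: 30 messages / minute
  • Staging + production checks: at most 4 alert messages per 5-minute run (one per failing service)
  • Infrastructure checks: at most 2 alert messages per 5-minute run (PostgreSQL, Caddy)

Worst case: 6 messages per 5-minute run, well below the Discord limit. While every service is healthy, no messages are sent.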


🐘 PostgreSQL Shared Monitoring

Server Configuration

bash
# Port exposed on the host: 5433

# Connect from the host
psql -h localhost -p 5433 -U postgres

# Connect from other containers (via the Docker network)
# Host: postgres-shared
# Port: 5432 (internal port)

PostgreSQL Health Checks

bash
# Basic check - the server responds
docker exec postgres-shared pg_isready -U postgres -p 5432

# Check the staging database
docker exec postgres-shared pg_isready -U gunei_staging_user -d gunei_erp_staging

# Check the production database
docker exec postgres-shared pg_isready -U gunei_prod_user -d gunei_erp_production

PostgreSQL Metrics

bash
# Active connections per database
docker exec postgres-shared psql -U postgres -c "
SELECT datname as database,
       count(*) as connections,
       count(*) FILTER (WHERE state = 'active') as active,
       count(*) FILTER (WHERE state = 'idle') as idle
FROM pg_stat_activity
WHERE datname IS NOT NULL
GROUP BY datname;"

# Database sizes
docker exec postgres-shared psql -U postgres -c "
SELECT datname as database,
       pg_size_pretty(pg_database_size(datname)) as size
FROM pg_database
WHERE datname LIKE 'gunei_%';"

# Longest-running queries that are currently active (top 10)
docker exec postgres-shared psql -U postgres -c "
SELECT pid, datname, usename,
       now() - query_start as duration,
       left(query, 50) as query_preview
FROM pg_stat_activity
WHERE state = 'active' AND query NOT LIKE '%pg_stat_activity%'
ORDER BY duration DESC
LIMIT 10;"

# Cache hit ratio (should be > 99%)
docker exec postgres-shared psql -U postgres -c "
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) as cache_hit_ratio
FROM pg_stat_database
WHERE datname LIKE 'gunei_%';"

PostgreSQL Alerts

| Metric | Warning | Critical |
|---|---|---|
| Total connections | > 80 | > 95 |
| Staging connections | > 40 | > 50 |
| Production connections | > 40 | > 50 |
| Cache hit ratio | < 95% | < 90% |
| Query duration | > 5s | > 30s |
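
The connection thresholds in this table translate directly into a comparison helper. The sketch below is an assumption about how they could be applied, not an existing script; pg_connection_level is a hypothetical name and the thresholds are passed in so the same function serves the total and the per-database rows.

```bash
#!/bin/bash
# Hypothetical helper: classify a connection count against warning/critical thresholds.
# $1 = current connections, $2 = warning threshold, $3 = critical threshold
pg_connection_level() {
    local count="$1" warn="$2" crit="$3"
    if [ "$count" -gt "$crit" ]; then
        echo "CRITICAL"
    elif [ "$count" -gt "$warn" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# Typical call site (commented out because it needs the postgres-shared container):
# total=$(docker exec postgres-shared psql -U postgres -t -A -c "SELECT count(*) FROM pg_stat_activity;")
# pg_connection_level "$total" 80 95

pg_connection_level 42 80 95   # prints: OK
pg_connection_level 85 80 95   # prints: WARNING
pg_connection_level 45 40 50   # prints: WARNING (per-database thresholds)
```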

📊 Metric Comparison Between Environments

Script: compare-environments.sh

bash
#!/bin/bash
# /root/scripts/compare-environments.sh

echo "📊 Staging vs Production Comparison"
echo "====================================="
echo ""

# Response times (capped at 5 seconds so an unresponsive service cannot hang the script)
echo "⏱️  Response Times:"
STAGING_TIME=$(curl -o /dev/null -s --max-time 5 -w '%{time_total}' https://staging-erpback.gunei.xyz/status)
echo "   Staging Backend:    ${STAGING_TIME}s"

if docker ps | grep -q "gunei-backend-production"; then
    PROD_TIME=$(curl -o /dev/null -s --max-time 5 -w '%{time_total}' http://localhost:3100/status)
    echo "   Production Backend: ${PROD_TIME}s"
fi
echo ""

# Container resources
echo "💾 Container Resources:"
echo "   CONTAINER                    CPU%   MEM USAGE"
docker stats --no-stream --format "   {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}" | grep gunei
echo ""

# Database connections
echo "🗄️  Database Connections:"
docker exec postgres-shared psql -U postgres -t -c "
SELECT '   ' || datname || ': ' || count(*) || ' connections'
FROM pg_stat_activity
WHERE datname LIKE 'gunei_%'
GROUP BY datname;"
echo ""

# Database sizes
echo "📦 Database Sizes:"
docker exec postgres-shared psql -U postgres -t -c "
SELECT '   ' || datname || ': ' || pg_size_pretty(pg_database_size(datname))
FROM pg_database
WHERE datname LIKE 'gunei_%';"

Comparative Metrics

| Metric | Staging | Production | Notes |
|---|---|---|---|
| Response time | < 500ms | < 200ms | Production should be faster |
| Memory usage | < 512MB | < 1GB | Production may use more resources |
| DB connections | < 20 | < 50 | Production carries more load |
| Error rate | < 5% | < 0.1% | Production should be more stable |

📈 Metrics and Alerts

Metrics Monitored per Environment

System (Shared)

  • CPU usage: via top / htop
  • Memory usage: via free -h
  • Disk space: via df -h
  • Network: via netstat / ss

Staging

  • Response time: health check timing
  • HTTP status codes: 200, 5xx
  • Database connections: active connections to gunei_erp_staging
  • Container health: docker ps status of the staging containers

Production

  • Response time: health check timing
  • HTTP status codes: 200, 5xx
  • Database connections: active connections to gunei_erp_production
  • Container health: docker ps status of the production containers

Infrastructure

  • PostgreSQL: total connections, queries/second, cache hit ratio
  • Caddy: requests/second, 5xx errors, SSL certificate expiry

Alert Thresholds

bash
# Disk space (system)
WARNING: 80%
CRITICAL: 90%

# Memory (system)
WARNING: 85%
CRITICAL: 95%

# Response time (per environment)
WARNING: > 2 seconds
CRITICAL: > 5 seconds (timeout)

# Consecutive failures (per environment)
STAGING: 1 failure → immediate alert
PRODUCTION: 1 failure → immediate alert
INFRASTRUCTURE: 1 failure → immediate critical alert

# PostgreSQL connections (shared)
WARNING: > 80 connections
CRITICAL: > 95 connections (max 100)
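
Response times come back from curl as decimal seconds, which shell integer tests cannot compare. A minimal sketch of applying the response-time thresholds with awk; the function name response_time_level is an assumption, not part of the existing scripts.

```bash
#!/bin/bash
# Hypothetical helper: classify a response time given in decimal seconds.
# Thresholds follow the values above: > 2s warning, > 5s critical.
response_time_level() {
    awk -v t="$1" 'BEGIN {
        if (t > 5)      print "CRITICAL"
        else if (t > 2) print "WARNING"
        else            print "OK"
    }'
}

# Typical call site (commented out because it needs network access):
# t=$(curl -o /dev/null -s --max-time 5 -w '%{time_total}' https://staging-erpback.gunei.xyz/status)
# response_time_level "$t"

response_time_level 0.183   # prints: OK
response_time_level 2.5     # prints: WARNING
```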

📊 Consolidated Dashboard

Script: metrics-dashboard.sh (New)

Purpose: Consolidated visual dashboard covering all environments

bash
#!/bin/bash
# /root/scripts/metrics-dashboard.sh

echo "╔═══════════════════════════════════════════════════════════╗"
echo "║        Gunei ERP - Multi-Environment Dashboard           ║"
echo "╚═══════════════════════════════════════════════════════════╝"
echo ""

# Print ✅ if the URL responds without an HTTP error within 5 seconds, ❌ otherwise
check_url() {
    if curl -f -s -o /dev/null --max-time 5 "$1"; then
        echo "✅"
    else
        echo "❌"
    fi
}

# Infrastructure
echo "┌─ Infrastructure (Shared) ──────────────────────────────────"
# pg_isready prints its result to stdout, so silence it to capture only the icon
POSTGRES_STATUS=$(docker exec postgres-shared pg_isready -U postgres > /dev/null 2>&1 && echo "✅" || echo "❌")
CADDY_STATUS=$(docker ps | grep -q caddy-shared && echo "✅" || echo "❌")
echo "│ PostgreSQL Shared: $POSTGRES_STATUS"
echo "│ Caddy Shared:      $CADDY_STATUS"
echo "└────────────────────────────────────────────────────────────"
echo ""

# Staging
echo "┌─ Staging Environment 🟢 ───────────────────────────────────"
STAGING_BACKEND=$(check_url "https://staging-erpback.gunei.xyz/status")
STAGING_FRONTEND=$(check_url "https://staging-erpfront.gunei.xyz/health")
STAGING_DB=$(docker exec postgres-shared psql -U gunei_staging_user -d gunei_erp_staging -c "SELECT 1;" > /dev/null 2>&1 && echo "✅" || echo "❌")

echo "│ Backend:  $STAGING_BACKEND  https://staging-erpback.gunei.xyz"
echo "│ Frontend: $STAGING_FRONTEND  https://staging-erpfront.gunei.xyz"
echo "│ Database: $STAGING_DB  gunei_erp_staging"

# Container status, taken from docker's own Status column (e.g. "Up 2 days")
STAGING_BACKEND_CONTAINER=$(docker ps --filter "name=gunei-backend-staging" --format '{{.Status}}')
STAGING_FRONTEND_CONTAINER=$(docker ps --filter "name=gunei-frontend-staging" --format '{{.Status}}')
echo "│   Backend container:  ${STAGING_BACKEND_CONTAINER:-not running}"
echo "│   Frontend container: ${STAGING_FRONTEND_CONTAINER:-not running}"
echo "└────────────────────────────────────────────────────────────"
echo ""

# Production (only if it is running)
if docker ps | grep -q "gunei-backend-production"; then
    echo "┌─ Production Environment 🔵 ─────────────────────────────────"
    PROD_BACKEND=$(check_url "http://localhost:3100/status")
    PROD_FRONTEND=$(check_url "http://localhost:3101/health")
    PROD_DB=$(docker exec postgres-shared psql -U gunei_prod_user -d gunei_erp_production -c "SELECT 1;" > /dev/null 2>&1 && echo "✅" || echo "❌")

    echo "│ Backend:  $PROD_BACKEND  (port 3100)"
    echo "│ Frontend: $PROD_FRONTEND  (port 3101)"
    echo "│ Database: $PROD_DB  gunei_erp_production"

    # Container status, taken from docker's own Status column (e.g. "Up 1 day")
    PROD_BACKEND_CONTAINER=$(docker ps --filter "name=gunei-backend-production" --format '{{.Status}}')
    PROD_FRONTEND_CONTAINER=$(docker ps --filter "name=gunei-frontend-production" --format '{{.Status}}')
    echo "│   Backend container:  ${PROD_BACKEND_CONTAINER:-not running}"
    echo "│   Frontend container: ${PROD_FRONTEND_CONTAINER:-not running}"
    echo "└────────────────────────────────────────────────────────────"
    echo ""
fi

# System Resources
echo "┌─ System Resources ──────────────────────────────────────────"
DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}')
MEM_USAGE=$(free -h | awk '/^Mem:/ {print $3 "/" $2}')
CPU_LOAD=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}')

echo "│ Disk:   $DISK_USAGE used"
echo "│ Memory: $MEM_USAGE"
echo "│ Load:   $CPU_LOAD"
echo "└────────────────────────────────────────────────────────────"
echo ""

# Recent Events
echo "┌─ Recent Events (Last 5) ────────────────────────────────────"
grep -E "\[ERROR\]|\[WARNING\]" /var/log/gunei-health.log | tail -n 5 | while IFS= read -r line; do
    echo "│ $line"
done
echo "└────────────────────────────────────────────────────────────"
echo ""

echo "Last updated: $(date '+%Y-%m-%d %H:%M:%S')"

Usage:

bash
# Run the dashboard
/root/scripts/metrics-dashboard.sh

# Runs automatically every hour (already configured in cron)
# Or run it manually whenever needed

Sample output:

╔═══════════════════════════════════════════════════════════╗
║        Gunei ERP - Multi-Environment Dashboard           ║
╚═══════════════════════════════════════════════════════════╝

┌─ Infrastructure (Shared) ──────────────────────────────────
│ PostgreSQL Shared: ✅
│ Caddy Shared:      ✅
└────────────────────────────────────────────────────────────

┌─ Staging Environment 🟢 ───────────────────────────────────
│ Backend:  ✅  https://staging-erpback.gunei.xyz
│ Frontend: ✅  https://staging-erpfront.gunei.xyz
│ Database: ✅  gunei_erp_staging
│   Backend container:  Up 2 days
│   Frontend container: Up 2 days
└────────────────────────────────────────────────────────────

┌─ Production Environment 🔵 ─────────────────────────────────
│ Backend:  ✅  (port 3100)
│ Frontend: ✅  (port 3101)
│ Database: ✅  gunei_erp_production
│   Backend container:  Up 1 day
│   Frontend container: Up 1 day
└────────────────────────────────────────────────────────────

┌─ System Resources ──────────────────────────────────────────
│ Disk:   23% used
│ Memory: 4.2G/8.0G
│ Load:   0.45
└────────────────────────────────────────────────────────────

┌─ Recent Events (Last 5) ────────────────────────────────────
│ [2026-01-12 13:15:00] [WARNING] [STAGING] High response time
└────────────────────────────────────────────────────────────

Last updated: 2026-01-12 14:30:00

🔧 Troubleshooting

Health Check Failing in Staging but Production OK

Symptom: Staging alerts fire while production works normally

Diagnosis:

bash
# View staging-specific log entries
grep "\[STAGING\]" /var/log/gunei-health.log | tail -n 20

# Manual staging check
curl -v https://staging-erpback.gunei.xyz/status
curl -v https://staging-erpfront.gunei.xyz/health

# View staging container logs
docker logs gunei-backend-staging --tail 50
docker logs gunei-frontend-staging --tail 50

# Check staging resource usage for contention with production
docker stats --no-stream | grep staging

Solution:

bash
# If staging is slow or crashing
docker restart gunei-backend-staging
docker restart gunei-frontend-staging

# Check the staging database specifically
docker exec postgres-shared psql -U gunei_staging_user -d gunei_erp_staging -c "SELECT 1;"

Production Down but Staging OK

Symptom: Production does not respond while staging is operational

Diagnosis:

bash
# View production log entries
grep "\[PRODUCTION\]" /var/log/gunei-health.log | tail -n 20

# Manual production check
curl -v http://localhost:3100/status
curl -v http://localhost:3101/health

# View production container logs
docker logs gunei-backend-production --tail 50
docker logs gunei-frontend-production --tail 50

Solution:

bash
# Restart production services
cd /opt/apps/gunei-erp/backend/production
docker compose restart backend

cd /opt/apps/gunei-erp/frontend/production
docker compose restart frontend

Infrastructure Down (Affects Both Environments)

Symptom: Staging and production fail at the same time

Diagnosis:

bash
# Check infrastructure
/root/scripts/check-infrastructure.sh

# View PostgreSQL logs
docker logs postgres-shared --tail 100

# View Caddy logs
docker logs caddy-shared --tail 100

# Verify that the containers are running
docker ps | grep -E "postgres-shared|caddy-shared"

Solution:

bash
# Restart PostgreSQL (CAUTION: affects both environments)
docker restart postgres-shared
sleep 10

# Restart Caddy
docker restart caddy-shared

# Verify that everything came back
/root/scripts/health-check.sh
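The fixed `sleep 10` after restarting PostgreSQL may be too short or needlessly long. A possible alternative, sketched below, is to poll until the service reports ready. The `wait_until` helper is hypothetical and not part of the existing scripts.

```shell
#!/bin/bash
# Hypothetical helper; not part of the existing Gunei scripts.

# wait_until <attempts> <delay_seconds> <command...>
# Runs the command up to <attempts> times, sleeping <delay_seconds> between tries.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_until() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        # No need to sleep after the final failed attempt
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, `docker restart postgres-shared && wait_until 30 1 docker exec postgres-shared pg_isready -U postgres` would wait up to roughly 30 seconds and continue as soon as PostgreSQL accepts connections.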

Too Many Discord Notifications (Multi-Environment)

Symptom: Alert spam coming from multiple environments

Diagnosis:

bash
# View alert frequency per environment
grep "Discord notification sent" /var/log/gunei-health.log | grep "\[STAGING\]" | tail -n 20
grep "Discord notification sent" /var/log/gunei-health.log | grep "\[PRODUCTION\]" | tail -n 20

# Identify the problematic environment
grep "\[ERROR\]" /var/log/gunei-health.log | tail -n 50

Solution:

bash
# Implement a per-environment cooldown in alert-check.sh
# Or adjust the cron frequency
# Or use different thresholds per environment:
#   - Staging: 3 consecutive failures before alerting (instead of the current 2)
#   - Production: alert immediately on the first failure
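A per-environment cooldown could be implemented with one timestamp file per environment. The following is a minimal sketch under stated assumptions: the `should_alert` function and the `COOLDOWN_DIR` path are hypothetical, and how alert-check.sh actually sends its Discord notification is not shown here.

```shell
#!/bin/bash
# Hypothetical helper; not part of the existing alert-check.sh.

# Directory for per-environment timestamps (assumed path; override via env var).
COOLDOWN_DIR="${COOLDOWN_DIR:-/var/tmp/gunei-alert-cooldown}"

# should_alert <environment> <cooldown_seconds>
# Returns 0 (and records the current time) when no alert was recorded for this
# environment during the last <cooldown_seconds>; returns 1 otherwise.
should_alert() {
    local env="$1" cooldown="$2" stamp now last
    stamp="$COOLDOWN_DIR/$env.last"
    now=$(date +%s)
    mkdir -p "$COOLDOWN_DIR"
    last=0
    if [ -f "$stamp" ]; then
        last=$(cat "$stamp")
    fi
    if [ $((now - last)) -ge "$cooldown" ]; then
        echo "$now" > "$stamp"
        return 0
    fi
    return 1
}
```

The notification call would then be wrapped as `if should_alert STAGING 1800; then ...; fi`, which allows at most one staging alert every 30 minutes. Because each environment has its own file, a noisy staging never silences production.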

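Logs Not Rotating / Disk Full

Symptom: Disk > 90%, huge logs from multiple environments

Diagnosis:

bash
# View log sizes
du -sh /var/log/gunei-*.log

# View size per container
du -sh /var/lib/docker/containers/*/

Solution:

bash
# Force rotation
logrotate -f /etc/logrotate.d/gunei-health

# Delete compressed logs older than 30 days
find /var/log -name "*.gz" -mtime +30 -delete

# Reclaim Docker disk space (stopped containers, unused images and networks, build cache)
# CAUTION: this does not shrink the logs of running containers, and adding
# --volumes would also delete unused volumes (possible data loss)
docker system prune -a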
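To see which container logs are actually taking the space, the json-file logs can be listed by size. This is a sketch only: `large_container_logs` is a hypothetical helper, and it assumes the containers use Docker's json-file logging driver, whose files end in `-json.log`.

```shell
#!/bin/bash
# Hypothetical helper; not part of the existing Gunei scripts.

# large_container_logs <dir> <min_kib>
# Lists Docker json-file logs under <dir> larger than <min_kib> KiB, biggest first.
# Output: "<size in KiB><TAB><path>" per file.
large_container_logs() {
    local dir="$1" min_kib="$2"
    find "$dir" -type f -name '*-json.log' -size +"${min_kib}k" -exec du -k {} + | sort -rn
}
```

For example, `large_container_logs /var/lib/docker/containers 10240` lists logs above 10 MiB. Any file that keeps growing past the configured rotation limit suggests that the logging options were not applied to that container.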

Cron Not Running Scripts

See the troubleshooting steps in the previous version of this guide; they are unchanged.

Conflicts Between Environments

Symptom: One environment affects the other (staging is slow when production is under load, or vice versa)

Diagnosis:

bash
# View resources per container
docker stats --no-stream | grep gunei

# View DB connections per environment
docker exec postgres-shared psql -U postgres -c "
SELECT datname, count(*), state
FROM pg_stat_activity
WHERE datname LIKE 'gunei_%'
GROUP BY datname, state
ORDER BY datname;"

# Verify that each environment uses its own database
docker logs gunei-backend-staging 2>&1 | grep -i "database\|connection"
docker logs gunei-backend-production 2>&1 | grep -i "database\|connection"

Possible causes:

  1. Crossed DB connections: Staging backend connecting to the production database
  2. Wrong ports: Production using a staging port
  3. Saturated shared resources: PostgreSQL/Caddy at their limit
  4. Wrong environment variables: DATABASE_URL pointing to the wrong environment

Solution:

bash
# Verify that ports do not collide
ss -tlnp | grep -E "3000|3001|3100|3101"

# Expected output:
# :3000 - gunei-backend-staging
# :3001 - gunei-frontend-staging
# :3100 - gunei-backend-production
# :3101 - gunei-frontend-production

# If there is a conflict, recreate the container with the correct port
# Staging backend must map 3000:3000
# Production backend must map 3100:3000

# Verify DATABASE_URL in each environment
docker exec gunei-backend-staging printenv | grep DATABASE
# Should show: gunei_erp_staging

docker exec gunei-backend-production printenv | grep DATABASE
# Should show: gunei_erp_production

# If it is misconfigured, update docker-compose and recreate the container
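The manual DATABASE_URL check above could be automated by extracting the database name from the connection string and comparing it with the expected one. This is a sketch under assumptions: `db_name_from_url` and `verify_database_url` are hypothetical helpers, and the URL is assumed to follow the usual `postgresql://user:pass@host:port/dbname?options` shape.

```shell
#!/bin/bash
# Hypothetical helpers; not part of the existing Gunei scripts.

# Extract the database name from a connection URL such as
# postgresql://user:pass@host:5432/dbname?schema=public
db_name_from_url() {
    local url="$1" name
    name="${url%%\?*}"    # drop any "?query" suffix first
    name="${name##*/}"    # keep everything after the last "/"
    echo "$name"
}

# verify_database_url <container> <expected_db>
# Assumes the container exposes its connection string as DATABASE_URL.
# Returns 2 if the variable cannot be read, 1 on mismatch, 0 on match.
verify_database_url() {
    local container="$1" expected="$2" url actual
    url=$(docker exec "$container" printenv DATABASE_URL) || return 2
    actual=$(db_name_from_url "$url")
    if [ "$actual" = "$expected" ]; then
        echo "OK: $container -> $actual"
    else
        echo "MISMATCH: $container -> $actual (expected $expected)"
        return 1
    fi
}
```

Typical calls would be `verify_database_url gunei-backend-staging gunei_erp_staging` and `verify_database_url gunei-backend-production gunei_erp_production`. Only the database name is printed, which keeps credentials out of the terminal and logs.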

Port in Use by Another Process

Symptom: Container does not start, error "port already in use"

Diagnosis:

bash
# See which process is using the port
lsof -i :3000
lsof -i :3001
lsof -i :3100
lsof -i :3101
lsof -i :5433

# List all containers (including stopped ones)
docker ps -a | grep gunei

Solution:

bash
# If it is a leftover (zombie) container
docker rm -f <container_id>

# If it is another process
kill <PID>

# Restart the affected container (run from its compose directory)
docker compose up -d

Database Connections Exhausted

Symptom: "too many connections" error in the backend logs

Diagnosis:

bash
# View total connections
docker exec postgres-shared psql -U postgres -c "SELECT count(*) FROM pg_stat_activity;"

# View connections per database
docker exec postgres-shared psql -U postgres -c "
SELECT datname, count(*)
FROM pg_stat_activity
WHERE datname IS NOT NULL
GROUP BY datname;"

# View the longest-idle connections
# (state_change is when the connection became idle; query_start is when its last query began)
docker exec postgres-shared psql -U postgres -c "
SELECT datname, usename, state, state_change, now() - state_change AS idle_for
FROM pg_stat_activity
WHERE state = 'idle'
ORDER BY idle_for DESC
LIMIT 20;"

Solution:

bash
# Terminate staging connections that have been idle for more than 30 minutes
docker exec postgres-shared psql -U postgres -c "
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'gunei_erp_staging'
AND state = 'idle'
AND pid <> pg_backend_pid()
AND state_change < now() - interval '30 minutes';"

# Same for production if necessary (with more care)
docker exec postgres-shared psql -U postgres -c "
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'gunei_erp_production'
AND state = 'idle'
AND pid <> pg_backend_pid()
AND state_change < now() - interval '1 hour';"

# Restart the backends to reset their connection pools
docker restart gunei-backend-staging
docker restart gunei-backend-production

📚 References

Monitoring Scripts

  • /root/scripts/monitor-logs.sh: View multi-environment logs
  • /root/scripts/health-check.sh: Full check
  • /root/scripts/alert-check.sh: Automated alerts
  • /root/scripts/check-staging.sh: Quick staging check
  • /root/scripts/check-production.sh: Quick production check
  • /root/scripts/check-infrastructure.sh: Infrastructure check
  • /root/scripts/metrics-dashboard.sh: Consolidated dashboard
  • /root/scripts/compare-environments.sh: Compare staging vs production

Monitoring URLs

Staging:

Production:


Last updated: 14 January 2026
Version: 2.1.0

Changes in v2.1:

  • ✅ Detailed documentation of PostgreSQL Shared (port 5433)
  • ✅ Port mapping per environment (staging 3000/3001, production 3100/3101)
  • ✅ Database connections per environment documented
  • ✅ PostgreSQL metrics: connections, sizes, slow queries, cache hit ratio
  • ✅ Metric comparison between environments (compare-environments.sh)
  • ✅ Docker Logging Configuration (json-file, automatic rotation, max 10m/3 files)
  • ✅ Argentina timezone (TZ=America/Argentina/Buenos_Aires)
  • ✅ Troubleshooting: conflicts between environments
  • ✅ Troubleshooting: ports in use
  • ✅ Troubleshooting: database connections exhausted

Changes in v2.0:

  • ✅ Multi-environment support (staging + production)
  • ✅ Monitoring of shared infrastructure (PostgreSQL, Caddy)
  • ✅ Scripts updated with automatic environment detection
  • ✅ Centralized logs with environment tags
  • ✅ Contextual Discord notifications per environment
  • ✅ Consolidated multi-environment dashboard
  • ✅ Independent health checks per environment
  • ✅ Multi-environment-specific troubleshooting
  • ✅ New scripts: check-staging, check-production, check-infrastructure, metrics-dashboard