Deployment Operations

Learn how to manage the deployment lifecycle in OEC.SH with retry, rollback, cancellation, redeployment, and destroy operations.


Overview

OEC.SH provides deployment lifecycle operations for recovering from failures, rolling back problematic deployments, and keeping environments stable. A state machine tracks each deployment's progress, and long-running work runs as background tasks on ARQ (Async Redis Queue), with automatic retries for transient failures.

Available Operations

| Operation | Purpose | Use Case |
|---|---|---|
| Retry | Retry a failed deployment with the same configuration | Transient network errors, temporary resource unavailability |
| Rollback | Revert to a previous successful deployment | New deployment introduced bugs or issues |
| Cancel | Cancel a pending or in-progress deployment | Incorrect configuration, need to make changes |
| Redeploy | Force redeployment of the current configuration | Configuration changes, dependency updates |
| Destroy | Remove the environment and all its containers | Environment cleanup, resource reclamation |

Deployment State Machine

OEC.SH uses a state machine to track deployment lifecycle and ensure valid state transitions.

Deployment Statuses

# From models/deployment.py
import enum

class DeploymentStatus(enum.Enum):
    PENDING = "pending"          # Deployment created, waiting to start
    QUEUED = "queued"           # Queued in ARQ for processing
    BUILDING = "building"       # Building Docker image
    DEPLOYING = "deploying"     # Actively deploying to server
    RUNNING = "running"         # Legacy - use DEPLOYING instead
    SUCCESS = "success"         # Deployment completed successfully
    FAILED = "failed"           # Deployment failed with error
    CANCELLED = "cancelled"     # Deployment cancelled by user
    ROLLED_BACK = "rolled_back" # Previous deployment rolled back
    DESTROYED = "destroyed"     # Environment was destroyed

State Transitions

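Based on the status descriptions above, a deployment normally moves from PENDING through QUEUED, BUILDING, and DEPLOYING to either SUCCESS or FAILED. The operations below add further transitions: cancelling moves a PENDING or RUNNING deployment to CANCELLED, and the stuck-deployment check moves one that has been in progress for more than 30 minutes to FAILED.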

Terminal States

Terminal states (no further automatic transitions):

  • SUCCESS - Can be used as a rollback target
  • FAILED - Can be retried (the retry creates a new deployment)
  • CANCELLED - A new deployment can be triggered manually
  • DESTROYED - Environment removed; a full redeploy is required
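Because retry, rollback, and redeploy never reopen a terminal deployment (each creates a new record), the valid transitions can be expressed as a lookup table. The sketch below is a hypothetical table inferred from the behavior described on this page, not the platform's own definition; `ALLOWED_TRANSITIONS`, `can_transition`, and `is_terminal` are illustrative names, and ROLLED_BACK is left out because its transitions are not described here.

```python
from enum import Enum


class DeploymentStatus(Enum):
    PENDING = "pending"
    QUEUED = "queued"
    BUILDING = "building"
    DEPLOYING = "deploying"
    RUNNING = "running"  # legacy in-progress status
    SUCCESS = "success"
    FAILED = "failed"
    CANCELLED = "cancelled"
    ROLLED_BACK = "rolled_back"
    DESTROYED = "destroyed"


S = DeploymentStatus

# Hypothetical table inferred from this page; not the platform's own definition.
ALLOWED_TRANSITIONS: dict[DeploymentStatus, frozenset[DeploymentStatus]] = {
    S.PENDING: frozenset({S.QUEUED, S.CANCELLED, S.FAILED}),
    S.QUEUED: frozenset({S.BUILDING, S.FAILED}),
    S.BUILDING: frozenset({S.DEPLOYING, S.FAILED}),
    S.DEPLOYING: frozenset({S.SUCCESS, S.FAILED}),
    S.RUNNING: frozenset({S.SUCCESS, S.FAILED, S.CANCELLED}),
}

TERMINAL = frozenset({S.SUCCESS, S.FAILED, S.CANCELLED, S.DESTROYED})


def can_transition(current: DeploymentStatus, new: DeploymentStatus) -> bool:
    """True if `new` is a permitted next status for a deployment in `current`."""
    return new in ALLOWED_TRANSITIONS.get(current, frozenset())


def is_terminal(status: DeploymentStatus) -> bool:
    """Terminal deployments are never reopened; retry/rollback create new records."""
    return status in TERMINAL
```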

Retry Failed Deployments

Retry a failed deployment with the same configuration to recover from transient failures.

When to Retry vs Redeploy

| Scenario | Action | Reason |
|---|---|---|
| Network timeout during git clone | Retry | Transient network issue |
| SSH connection dropped | Retry | Temporary connection problem |
| Docker registry rate limit | Retry | Wait for the limit to reset, then retry |
| Invalid git branch name | Redeploy | Fix the configuration first |
| Missing environment variables | Redeploy | Fix the configuration first |
| Wrong Odoo version | Redeploy | Update project settings first |

API Endpoint

POST /api/v1/deployments/{deployment_id}/retry
Authorization: Bearer <token>

Request

curl -X POST https://api.oec.sh/api/v1/deployments/a1b2c3d4-5678-90ab-cdef-1234567890ab/retry \
  -H "Authorization: Bearer YOUR_TOKEN"

Response

{
  "id": "e5f6a7b8-90ab-cdef-1234-567890abcdef",
  "project_id": "proj_abc123",
  "environment_id": "env_xyz789",
  "version": 5,
  "status": "pending",
  "trigger": "manual",
  "git_commit": "a1b2c3d",
  "git_branch": "main",
  "extra_data": {
    "retry_of": "a1b2c3d4-5678-90ab-cdef-1234567890ab"
  },
  "created_at": "2025-01-15T10:30:00Z"
}
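A minimal client sketch using only the Python standard library. The helper names (`build_retry_request`, `summarize_retry`) are hypothetical, and the parser reads only the fields shown in the example response above; the request is built but not sent, so it can be inspected without a token or network access.

```python
import json
import urllib.request

API_BASE = "https://api.oec.sh/api/v1"


def build_retry_request(deployment_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that retries a failed deployment."""
    return urllib.request.Request(
        url=f"{API_BASE}/deployments/{deployment_id}/retry",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def summarize_retry(response_body: str) -> tuple[str, str, int]:
    """Return (new deployment ID, original deployment ID, version) from a retry response."""
    data = json.loads(response_body)
    return data["id"], data["extra_data"]["retry_of"], data["version"]


# Sending the request requires a real token and network access:
# with urllib.request.urlopen(build_retry_request(dep_id, token)) as resp:
#     new_id, original_id, version = summarize_retry(resp.read().decode())
```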

What Gets Retained

When retrying a deployment, the following are retained from the original:

  • Git configuration: Same branch, same commit (if specified)
  • Environment configuration: CPU, RAM, disk limits
  • Environment variables: All environment-specific variables
  • Addon repositories: Same platform, org, and project addons
  • Database configuration: Same PostgreSQL settings
  • Trigger metadata: Marked as a retry, with extra_data.retry_of referencing the original deployment

What Gets Reset

The following are reset on retry:

  • Deployment ID: New UUID generated
  • Version number: Incremented to next version
  • Status: Starts at PENDING
  • Timestamps: New created_at, started_at, completed_at
  • Logs: Fresh log entries
  • Container ID: New container created on success

Retry Limits

Automatic retry limits:

  • ARQ task retries: up to 3 attempts in total (configured via QUEUE_MAX_TRIES)
  • Exponential backoff: 5s before the second attempt and 15s before the third, capped at 300s
  • Dead Letter Queue (DLQ): tasks that still fail after the last attempt are moved there for inspection

Manual retry limits:

  • No hard limit on manual retries via API
  • Rate limited: 10 retries per minute per organization

Implementation Details

# From routes/deployments.py - Line 570 (excerpt)
@router.post("/{deployment_id}/retry")
async def retry_deployment(
    db: DBSession,
    current_user: CurrentUser,
    deployment_id: UUID,
) -> DeploymentResponse:
    # (Loading `original` by deployment_id and computing `next_version` are omitted.)

    # Validate deployment status
    if original.status != DeploymentStatus.FAILED:
        raise HTTPException(400, "Can only retry failed deployments")

    # Check for ongoing deployments
    if has_ongoing_deployment(original.project_id):
        raise HTTPException(409, "Deployment already in progress")

    # Create new deployment with same config
    new_deployment = Deployment(
        project_id=original.project_id,
        version=next_version,
        git_branch=original.git_branch,
        git_commit=original.git_commit,
        extra_data={"retry_of": str(original.id)},
    )

    # Queue for execution via ARQ
    await enqueue_task("deploy_environment", ...)

Permission Requirements

  • Required permission: project.environments.deploy
  • Organization scope: Must be member of environment's organization
  • Project scope: Must have deploy permission on project

Deployment Cancellation

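Cancel a pending or running deployment. Cancellation marks the deployment as CANCELLED so it no longer blocks new deployments; it does not interrupt work already in progress or clean up partially created resources.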

When to Cancel

  • Incorrect configuration detected before deployment completes
  • Wrong branch/commit selected and need to redeploy
  • Need to make changes to environment variables or settings
  • Server issues detected (maintenance, high load)
  • Cost control - stop expensive operation

API Endpoint

POST /api/v1/deployments/{deployment_id}/cancel
Authorization: Bearer <token>

Request

curl -X POST https://api.oec.sh/api/v1/deployments/abc123/cancel \
  -H "Authorization: Bearer YOUR_TOKEN"

Response

{
  "message": "Deployment cancelled"
}

Cancellation Behavior

Graceful Cancellation (Default)

When you cancel a deployment, the system:

  1. Marks deployment as CANCELLED in database
  2. Updates environment status to last known stable state
  3. Logs cancellation event with user ID and timestamp
  4. Does NOT kill running processes; they run to completion
  5. Does NOT roll back partial changes; containers may remain

State After Cancellation

| Deployment Step | State After Cancel | Cleanup Required |
|---|---|---|
| PENDING | Clean, nothing created | None |
| CONNECTING | Clean | None |
| CLONING_REPO | Partial git clone on server | Files remain in /opt/paasportal/ |
| CREATING_POSTGRES | PostgreSQL container running | Container and volume remain |
| STARTING_CONTAINER | Odoo container may be running | Container remains |
| COMPLETED | Too late to cancel | None (deployment succeeded) |

Cleanup Operations

After cancellation, you may need to:

  1. Stop running containers manually via environment actions
  2. Remove partial clones via environment destroy
  3. Delete DNS records if auto-created
  4. Check for orphaned Docker networks

Implementation

# From routes/deployments.py - Line 504 (excerpt)
@router.post("/{deployment_id}/cancel")
async def cancel_deployment(
    db: DBSession,
    current_user: CurrentUser,
    deployment_id: UUID,
) -> dict[str, str]:
    # (Loading `deployment` by deployment_id is omitted.)

    # Validate current status
    if deployment.status not in (DeploymentStatus.PENDING, DeploymentStatus.RUNNING):
        raise HTTPException(400, f"Cannot cancel deployment in {deployment.status.value} state")

    # Mark as cancelled
    deployment.status = DeploymentStatus.CANCELLED

    # Log cancellation (the entry must be added to the session to be persisted)
    log_entry = DeploymentLog(
        deployment_id=deployment.id,
        level=LogLevel.WARNING,
        message="Deployment cancelled by user",
        data={"cancelled_by": str(current_user.id)},
    )
    db.add(log_entry)

    await db.commit()
    return {"message": "Deployment cancelled"}

Limitations

  • Only PENDING and RUNNING deployments can be cancelled; any other status (including the terminal SUCCESS and FAILED states) returns 400
  • No automatic cleanup of partial resources
  • An ARQ task that is already executing a step may still run to completion
  • In-flight Docker operations will finish (they cannot be interrupted mid-step)

Permission Requirements

  • Required permission: project.deployments.cancel
  • Organization scope: Must be member of environment's organization

Rollback to Previous Deployment

Rollback to a previous successful deployment to quickly recover from problematic deployments.

When to Rollback

  • New deployment introduced bugs that affect production users
  • Performance regression detected after deployment
  • Breaking changes deployed accidentally
  • Need immediate recovery while investigating root cause
  • Database migration issues (with caveats - see below)

API Endpoint

POST /api/v1/deployments/{deployment_id}/rollback
Authorization: Bearer <token>

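Note: This endpoint rolls back TO the specified deployment (not FROM). In the response, both rollback_from_id and extra_data.rollback_to hold the ID of that target deployment.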

Request

# Rollback TO deployment abc123 (the last known good deployment)
curl -X POST https://api.oec.sh/api/v1/deployments/abc123/rollback \
  -H "Authorization: Bearer YOUR_TOKEN"

Response

{
  "id": "new_deployment_id",
  "project_id": "proj_abc123",
  "environment_id": "env_xyz789",
  "version": 8,
  "status": "pending",
  "trigger": "rollback",
  "git_commit": "a1b2c3d",
  "git_branch": "main",
  "rollback_from_id": "abc123",
  "extra_data": {
    "rollback_to": "abc123"
  }
}

Rollback Target Selection

The system identifies rollback targets as deployments with:

  1. Status: SUCCESS only
  2. Same environment: belongs to the environment being rolled back
  3. Rollback allowed: can_rollback = true (the default for successful deployments)

# Get deployment history to find a rollback target (quote the URL so the shell does not interpret &)
curl "https://api.oec.sh/api/v1/deployments?project_id=proj_abc123&status=success" \
  -H "Authorization: Bearer YOUR_TOKEN"

What Gets Rolled Back

When rolling back to a previous deployment:

| Component | Rolled Back | Details |
|---|---|---|
| Code | ✅ Yes | Git repository reverted to target commit |
| Addon Repositories | ✅ Yes | All platform/org/project addons reverted |
| Docker Image | ✅ Yes | Same Odoo version + Docker image |
| Configuration | ✅ Yes | odoo.conf from target deployment |
| Environment Variables | ✅ Yes | Environment-specific variables from target |
| Database Schema | ⚠️ Partial | See Database Rollback Limitations |
| Filestore | ❌ No | Uploaded files NOT rolled back |
| PostgreSQL Data | ❌ No | Database data NOT rolled back |

Database Rollback Limitations

⚠️ CRITICAL: Database schema rollback is NOT automatic

When rolling back code, the database may have migrations from the newer deployment that are NOT reverted:

# Scenario: Rollback from v2 to v1
v1 deployment:
  - Odoo modules: sale, purchase
  - Database: odoo_schema_v1
 
v2 deployment (added CRM):
  - Odoo modules: sale, purchase, crm
  - Database: odoo_schema_v2 (crm tables added)
 
Rollback to v1:
  - Code: Reverted to v1 (no CRM module)
  - Database: ⚠️ Still has crm tables from v2
  - Result: May cause errors if CRM dependencies exist
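A quick pre-rollback check is to compare module lists: anything the current deployment installed that the target does not ship leaves its tables behind, as in the CRM example. The helper below is a hypothetical sketch; how you obtain each deployment's module list (for example, from the modules installed in the database) is left open.

```python
def modules_left_behind(target_modules: set[str], current_modules: set[str]) -> set[str]:
    """Modules installed by the newer deployment that the rollback target does not ship.

    Their tables and data stay in the database after a code-only rollback.
    """
    return current_modules - target_modules


v1 = {"sale", "purchase"}
v2 = {"sale", "purchase", "crm"}
print(sorted(modules_left_behind(v1, v2)))  # ['crm']
```

A non-empty result is a signal to plan a backup restore rather than a code-only rollback.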

Rollback Best Practices

  1. Test the rollback in staging before production
  2. Check database migrations before rolling back
  3. Consider restoring from backup if database changes are incompatible
  4. Monitor logs immediately after rollback
  5. Keep a runbook ready for rollback procedures
  6. Document breaking changes that require special rollback handling

Rollback Process

The rollback operation creates a new deployment (not a revert):

# From routes/deployments.py - Line 673
@router.post("/{deployment_id}/rollback")
async def rollback_deployment(...):
    # (Loading `target` and computing `next_version` are omitted.)
    # Validate target
    if target.status != DeploymentStatus.SUCCESS:
        raise HTTPException(400, "Can only rollback to successful deployments")
 
    # Create new deployment with rollback trigger
    rollback = Deployment(
        project_id=target.project_id,
        trigger=DeploymentTrigger.ROLLBACK,
        version=next_version,
        git_branch=target.git_branch,
        git_commit=target.git_commit,
        rollback_from_id=target.id
    )
 
    # Deploy using OdooDeployer (same as normal deploy)
    await enqueue_task("deploy_environment", ...)

Rollback with Database Restore

For full rollback including database, use backup restore:

# 1. Find backup from before problematic deployment
GET /api/v1/environments/{env_id}/backups
 
# 2. Restore backup (includes database + filestore)
POST /api/v1/environments/{env_id}/restore
{
  "backup_id": "backup_from_before_issue"
}
 
# 3. Then rollback code to match
POST /api/v1/deployments/{old_deployment_id}/rollback

Rollback Safety Checks

Before rollback, the system checks:

  1. No deployment in progress for the environment
  2. Target deployment exists and is accessible
  3. Target is a successful deployment (not failed or cancelled)
  4. User has deploy permission (rollback requires deploy rights)
  5. Environment is active and not deleted

Permission Requirements

  • Required permission: project.environments.deploy
  • Rollback requires same permissions as deploy
  • Organization scope: Must be member of environment's organization

Manual Redeployment

Force redeployment of the current configuration to apply changes or resolve inconsistencies.

When to Redeploy

  • Configuration changes: Environment variables, resource limits
  • Dependency updates: apt packages, Python requirements
  • DNS changes: New domain or subdomain
  • Addon repository changes: Added/removed addon repos
  • Container corruption: Broken Odoo container
  • After restore: Database restored from backup

Redeploy vs Deploy

| Operation | Behavior | Use Case |
|---|---|---|
| Deploy | Creates a new deployment if none is running | Initial deployment, after destroy |
| Redeploy | Forces a new deployment even if one is running | Apply config changes, fix issues |

API Endpoint

POST /api/v1/environments/{environment_id}/deploy
Authorization: Bearer <token>
Content-Type: application/json
 
{
  "force": true
}

Force Flag Behavior

Without force: true:

  • Checks if deployment in progress → returns 409 Conflict
  • Checks if environment already running → may skip deployment

With force: true:

  • Stops existing container gracefully
  • Creates new deployment record
  • Deploys fresh containers with current configuration
  • Preserves database and filestore (no data loss)

Request

# Force redeploy of environment
curl -X POST https://api.oec.sh/api/v1/environments/env_xyz789/deploy \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "force": true,
    "git_branch": "main"
  }'

What Gets Redeployed

| Component | Redeployed | Details |
|---|---|---|
| Docker Container | ✅ Yes | New container with current config |
| Git Code | ✅ Yes | Fresh clone from repository |
| Addon Repositories | ✅ Yes | Re-clones all addon repos |
| odoo.conf | ✅ Yes | Regenerated from current settings |
| Dependencies | ✅ Yes | Reinstalls apt.txt + requirements.txt |
| PostgreSQL | ❌ No | Existing database reused |
| Filestore | ❌ No | Existing filestore preserved |

Configuration Changes Requiring Redeploy

| Setting Changed | Requires Redeploy | Reason |
|---|---|---|
| Environment variables | ✅ Yes | Must regenerate odoo.conf |
| Resource limits (CPU/RAM) | ✅ Yes | Docker container limits |
| Odoo version | ✅ Yes | Different Docker image |
| Addon repositories | ✅ Yes | Must re-clone repos |
| Domain/subdomain | ✅ Yes | Traefik routing update |
| Git branch | ✅ Yes | Switch to different code |
| Database credentials | ⚠️ Maybe | If changed, must redeploy |
| PgBouncer settings | ✅ Yes | PostgreSQL config regenerated |
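One way to act on this table is to diff two settings snapshots and redeploy only when a listed setting changed. The snake_case keys below are hypothetical names for the table's rows (cpu_cores and ram_mb match the configuration snapshot shown later on this page); `needs_redeploy` is an illustrative helper, not a platform API.

```python
# Settings that, per the table above, require a redeploy when changed.
# The keys are hypothetical names for those settings.
REDEPLOY_SETTINGS = {
    "environment_variables",
    "cpu_cores",
    "ram_mb",
    "odoo_version",
    "addon_repositories",
    "domain",
    "git_branch",
    "database_credentials",
    "pgbouncer",
}


def changed_settings(old: dict, new: dict) -> set[str]:
    """Keys whose values differ between two settings snapshots."""
    return {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


def needs_redeploy(old: dict, new: dict) -> bool:
    """True if any changed key is one of the redeploy-requiring settings."""
    return bool(changed_settings(old, new) & REDEPLOY_SETTINGS)
```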

Redeploy Implementation

# From tasks/worker.py - Line 108
async def redeploy_environment(
    ctx: dict,
    environment_id: str,
    task_id: str | None = None,
    **kwargs
) -> dict:
    # Delegate to deploy with force=True
    return await execute_deploy(
        ctx,
        environment_id,
        task_id=task_id,
        force=True,  # Force redeploy
        **kwargs
    )

Redeploy vs Destroy + Deploy

| Approach | Downtime | Data Loss | Speed |
|---|---|---|---|
| Redeploy | ~2-5 min | None | Fast |
| Destroy + Deploy | ~3-7 min | Database always; filestore too with delete_data=true | Slow |

Use redeploy when:

  • Applying configuration changes
  • Updating dependencies
  • Fixing container issues
  • Preserving data is critical

Use destroy + deploy when:

  • Complete cleanup needed
  • Database corrupted
  • Starting fresh environment

Permission Requirements

  • Required permission: project.environments.deploy
  • Organization scope: Must be member of environment's organization

Environment Destroy

Completely remove an environment and all associated resources including containers, networks, and volumes.

Destroy Operation

Removes all infrastructure for an environment:

  • Odoo container: Stops and removes
  • PostgreSQL container: Stops and removes
  • PostgreSQL volume: Always removed (required for clean redeploy)
  • Docker network: Removes isolated network
  • odoo.conf: Deletes configuration file
  • Optional: Odoo data directory (addons, filestore)

API Endpoint

DELETE /api/v1/deployments/environment/{environment_id}/destroy?delete_data=false
Authorization: Bearer <token>

Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| delete_data | boolean | false | Also delete the Odoo data directory (filestore, custom addons) |

Request Examples

# Destroy containers only (preserve filestore and custom addons)
curl -X DELETE https://api.oec.sh/api/v1/deployments/environment/env_xyz789/destroy \
  -H "Authorization: Bearer YOUR_TOKEN"
 
# Destroy everything including data
curl -X DELETE "https://api.oec.sh/api/v1/deployments/environment/env_xyz789/destroy?delete_data=true" \
  -H "Authorization: Bearer YOUR_TOKEN"

Response

{
  "message": "Environment destroyed successfully",
  "environment_id": "env_xyz789",
  "deleted_containers": true,
  "deleted_database": true,
  "deleted_data": false
}

What Gets Deleted

Always Deleted (Default)

# Docker containers
docker stop {env_id}_odoo
docker rm {env_id}_odoo
docker stop {env_id}_db
docker rm {env_id}_db
 
# Docker network
docker network rm paasportal_net_{env_id}
 
# PostgreSQL volume (CRITICAL: Required for fresh DB credentials)
docker volume rm paasportal_pgdata_{env_id}
 
# Odoo configuration
rm /opt/paasportal/{env_id}/odoo.conf

Conditionally Deleted (delete_data=true)

# Entire environment directory
rm -rf /opt/paasportal/{env_id}/
  ├── odoo.conf          # Config file
  ├── addons/           # Custom addons (git clones)
  ├── filestore/        # User uploaded files
  └── logs/             # Odoo log files

Why PostgreSQL Volume is Always Removed

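Critical Requirement: PostgreSQL stores role password hashes in its data directory, which lives in the volume. With the standard postgres Docker image, the configured password is applied only when an empty data directory is initialized, so if the volume persists across deployments with different credentials, authentication fails.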

# Problem scenario if volume NOT removed:
1. Deploy environment with password "abc123"
    PostgreSQL volume created with password hash
2. Destroy environment
3. Redeploy with password "xyz789"
    New password in odoo.conf
    Volume still has old password hash
    Connection fails: FATAL: password authentication failed

Solution: Always remove PostgreSQL volume on destroy to ensure fresh credentials.

Destroy Deployment Record

When you destroy an environment, a special deployment record is created:

# From routes/deployments.py - Line 1187
destroy_record = Deployment(
    project_id=project.id,
    environment_id=environment_id,
    trigger=DeploymentTrigger.DESTROY,
    status=DeploymentStatus.DESTROYED,
    started_at=datetime.now(UTC),
    completed_at=datetime.now(UTC)
)

This record provides an audit trail of when the environment was destroyed and by whom.

Environment Status After Destroy

environment.status = EnvironmentStatus.PENDING
environment.container_id = None
environment.container_name = None
environment.is_active = False  # CRITICAL: Releases quota

⚠️ CRITICAL: Setting is_active = False releases resource quota. Without this, the organization's quota remains allocated even after containers are destroyed.

Quota Implications

# From services/quota_service.py
def _get_total_resources(org_id):
    # Only count active environments
    result = db.execute(
        select(func.sum(ProjectEnvironment.cpu_cores))
        .where(
            ProjectEnvironment.organization_id == org_id,
            ProjectEnvironment.is_active == True  # ← Must be set to False on destroy
        )
    )

If is_active is not set to False:

  • Quota remains allocated
  • Cannot deploy new environments (quota exceeded)
  • Organization appears over quota despite no running containers
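The effect of the is_active filter can be seen with an in-memory version of the query above. This is a simplified sketch, not the quota service itself: `Environment` is a hypothetical stand-in for the ProjectEnvironment model, and the real computation runs in SQL.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    # Hypothetical stand-in for the ProjectEnvironment model
    organization_id: str
    cpu_cores: float
    is_active: bool


def total_cpu(environments: list[Environment], org_id: str) -> float:
    """Sum CPU cores over the organization's active environments only."""
    return sum(
        e.cpu_cores for e in environments
        if e.organization_id == org_id and e.is_active
    )


envs = [
    Environment("org1", 2.0, True),
    Environment("org1", 4.0, False),  # destroyed: is_active cleared, quota released
    Environment("org2", 8.0, True),
]
print(total_cpu(envs, "org1"))  # 2.0
```

Had the destroyed environment kept is_active = True, org1 would be charged 6.0 cores for only 2.0 cores of running containers.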

Destroy vs Pause

| Operation | Containers | Data | Quota | Restart |
|---|---|---|---|---|
| Pause | Stopped | Preserved | Still allocated | Can restart anytime |
| Destroy | Removed | Optionally preserved | Released | Must redeploy |

Permission Requirements

  • Required permission: project.environments.deploy
  • Destroy requires deploy permission (destructive operation)
  • Organization scope: Must be member of environment's organization

Deployment Triggers

Deployments can be triggered through multiple methods, each tracked separately.

Trigger Types

# From models/deployment.py
class DeploymentTrigger(enum.Enum):
    MANUAL = "manual"              # User clicked deploy button
    GIT_PUSH = "git_push"          # Git webhook (GitHub/GitLab push)
    WEBHOOK = "webhook"            # Generic webhook trigger
    SCHEDULED = "scheduled"        # Scheduled deployment (future)
    ROLLBACK = "rollback"          # Rollback operation
    AUTO_DEPLOY = "auto_deploy"    # Auto-deploy on branch update
    DESTROY = "destroy"            # Environment destroyed
    CONFIG_UPDATE = "config_update" # Configuration changed

Manual Deployment

Triggered by user via dashboard or API:

POST /api/v1/deployments
{
  "project_id": "proj_abc123",
  "environment_id": "env_xyz789",
  "git_branch": "main"
}

Properties:

  • Tracked to specific user (triggered_by field)
  • Immediate execution
  • Full deployment logs visible in dashboard

Webhook-Triggered Deployment

Automatic deployment on git push via GitHub/GitLab webhooks.

GitHub Webhook Setup

# Webhook URL
POST https://api.oec.sh/api/v1/webhooks/github
 
# Headers
X-Hub-Signature-256: sha256=<hmac_signature>
X-GitHub-Event: push
X-GitHub-Delivery: <delivery_id>
 
# Payload
{
  "ref": "refs/heads/main",
  "repository": {
    "full_name": "your-org/your-repo"
  },
  "commits": [...]
}

Webhook Security

# From routes/webhooks.py - Line 53
import hashlib
import hmac


def verify_github_signature(payload: bytes, signature: str, secret: str) -> bool:
    expected = "sha256=" + hmac.new(
        secret.encode("utf-8"),
        payload,
        hashlib.sha256,
    ).hexdigest()

    # Constant-time comparison avoids leaking the signature through timing
    return hmac.compare_digest(expected, signature)

Security requirements:

  • HMAC-SHA256 signature validation
  • Per-project webhook secret
  • Rejects unsigned requests
  • Rate limited: 30 requests/minute

Auto-Deploy Configuration

Enable auto-deploy in project settings:

PATCH /api/v1/projects/{project_id}/environments/{env_id}
{
  "auto_deploy": true,
  "git_branch": "main"
}

When auto_deploy = true:

  • Push to git_branch triggers deployment
  • Creates deployment with trigger = "git_push"
  • triggered_by is null (webhook, not user)
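The branch match compares the push payload's `ref` (shown in the GitHub payload above) against the environment's git_branch. A minimal sketch under that assumption; both helper names are hypothetical, and the returned fields restate the list above rather than the platform's actual record construction.

```python
def push_triggers_deploy(payload: dict, auto_deploy: bool, git_branch: str) -> bool:
    """True if a GitHub push payload targets the environment's tracked branch."""
    # Branch pushes carry the full ref, e.g. "refs/heads/main"; tag pushes use "refs/tags/..."
    return auto_deploy and payload.get("ref") == f"refs/heads/{git_branch}"


def push_deployment_fields(git_branch: str) -> dict:
    """Fields a webhook-created deployment carries, per the list above."""
    return {"trigger": "git_push", "triggered_by": None, "git_branch": git_branch}
```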

Module Repository Auto-Deploy

Webhook also triggers addon repository deployments:

# From routes/webhooks.py - Line 100
async def _trigger_module_repo_deploys(repo_url, branch):
    # Find all module repos with:
    # - Same git_url
    # - Same git_branch
    # - auto_deploy = true
    # - status = ACTIVE
 
    for repo in matching_repos:
        # Queue deployment for each environment
        await enqueue_task("deploy_module_repository", ...)

Scheduled Deployments (Future)

Planned feature: Deploy at specific times

{
  "schedule": {
    "cron": "0 2 * * *",
    "timezone": "UTC",
    "enabled": true
  }
}

Use cases:

  • Deploy during maintenance windows
  • Off-peak deployments
  • Coordinated multi-environment updates

Deployment Queue

OEC.SH uses ARQ (Async Redis Queue) to run deployments and other long-running operations as background jobs, with configurable concurrency, timeouts, retries, and a dead letter queue.

Queue Configuration

# From config.py - Line 472
class QueueSettings(BaseSettings):
    max_jobs: int = 10              # Max concurrent jobs
    job_timeout: int = 600           # 10 minutes per job
    max_tries: int = 3               # Retry failed jobs 3 times
    queue_name: str = "paasportal:arq:queue"
    deployment_timeout: int = 1800   # 30 min deployment timeout

Concurrent Deployment Limits

System-wide: 10 concurrent jobs across all organizations

Per-organization: No hard limit, but rate limited:

  • Deployments: 10/minute
  • Heavy operations: 10/minute

Per-environment: 1 deployment at a time:

# From routes/deployments.py - Line 113
# Check for ongoing deployments
result = await db.execute(
    select(Deployment).where(
        Deployment.environment_id == environment_id,
        Deployment.status.in_([
            DeploymentStatus.PENDING,
            DeploymentStatus.RUNNING
        ])
    )
)
 
if result.scalars().first() is not None:
    raise HTTPException(409, "Deployment already in progress")
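The per-organization limits (10 deployments per minute) are enforced by the server, whose algorithm is not documented here. A client that triggers many deployments, such as a CI job fanning out to several environments, can stay under the limit by throttling itself. The sliding-window limiter below is a hypothetical client-side sketch; the injectable clock exists only so it can be tested without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds (e.g. 10 deployments/minute)."""

    def __init__(self, limit: int = 10, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.events: deque[float] = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have slid out of the window
        while self.events and now - self.events[0] >= self.window:
            self.events.popleft()
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False
```

Call `allow()` before each deployment request and wait (or queue the request locally) when it returns False.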

Queue Priority

FIFO (First In, First Out) - no priority system currently.

All deployments processed in order of enqueue time:

# From tasks/worker.py - Line 78
async def deploy_environment(ctx, environment_id, task_id, **kwargs):
    # Jobs processed in order queued
    logger.info(f"Starting deployment {environment_id}")

Future priority system may include:

  • Production deployments first
  • Paid tier customers first
  • Critical hotfix deployments

Deployment Timeout Handling

Stuck deployment detection: ARQ cron job runs every 5 minutes

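# From tasks/worker.py - Line 457 (excerpt; obtaining the `db` session is omitted)
from datetime import UTC, datetime, timedelta


async def check_stuck_deployments(ctx: dict) -> None:
    # Find deployments still in progress more than 30 minutes after starting
    cutoff_time = datetime.now(UTC) - timedelta(seconds=1800)

    result = await db.execute(
        select(Deployment).where(
            Deployment.status.in_([DeploymentStatus.PENDING, DeploymentStatus.RUNNING]),
            Deployment.started_at < cutoff_time,
        )
    )

    for deployment in result.scalars():
        deployment.status = DeploymentStatus.FAILED
        deployment.error_message = "Deployment timed out after 30 minutes"
        # (Loading the deployment's `environment` is omitted.)
        environment.status = EnvironmentStatus.ERROR

    await db.commit()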

Timeout behaviors:

  • Deployment marked as FAILED
  • Environment status → ERROR
  • Error message logged
  • No automatic retry (manual retry available)

Task Retry Logic

Failed tasks automatically retry with exponential backoff:

# Retry schedule (from config)
retry_base_delay: 5.0 seconds
retry_max_delay: 300.0 seconds (5 minutes)
 
# Actual retries
Attempt 1: Immediate
Attempt 2: 5 seconds later
Attempt 3: 15 seconds later (exponential backoff)

After 3 failed attempts, task moves to Dead Letter Queue (DLQ).
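The schedule above is consistent with a base delay of 5 seconds multiplied by 3 on each retry and capped at retry_max_delay. The factor of 3 is inferred from the delays listed on this page rather than taken from the configuration; `backoff_delays` is a hypothetical helper for reasoning about how raising max_tries would stretch the schedule.

```python
def backoff_delays(max_tries: int = 3, base: float = 5.0, factor: float = 3.0,
                   max_delay: float = 300.0) -> list[float]:
    """Delay before each attempt: the first runs immediately, later ones back off."""
    delays = [0.0]
    for retry in range(1, max_tries):
        # base, base*factor, base*factor**2, ... capped at max_delay
        delays.append(min(base * factor ** (retry - 1), max_delay))
    return delays


print(backoff_delays())  # [0.0, 5.0, 15.0]
```

With max_tries=6, the fifth retry's uncapped delay of 405 seconds would be clipped to 300.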

Dead Letter Queue (DLQ)

Failed tasks after max retries are stored for manual inspection:

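# DLQ storage (redis-py: hash fields go in mapping=, and values must be strings
# or numbers, so the argument list is JSON-encoded; job_args is the task's arguments)
import json

dlq_key = f"paasportal:dlq:{job_id}"
redis.hset(dlq_key, mapping={
    "function_name": "deploy_environment",
    "args": json.dumps(job_args),
    "error": "SSH connection timeout",
    "retry_count": 3,
    "traceback": "...",
})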

Accessing DLQ:

# Future API endpoint (planned)
GET /api/v1/admin/dead-letter-queue

Deployment Artifacts

Each deployment creates and stores artifacts for rollback and audit purposes.

Container Images

Docker images pulled from registry:

# From models/deployment.py
deployment.image_tag = "odoo:17.0"  # Image reference (name:tag, or a full registry path for custom images)

Image sources:

  • Official Odoo images: odoo:17.0, odoo:18.0
  • Custom registry images: Configured in OdooVersion model
  • Registry authentication: username and password are stored encrypted

Configuration Snapshots

Each deployment captures full configuration:

deployment.extra_data = {
    "git": {
        "committer_name": "John Doe",
        "committer_email": "john@example.com",
        "commit_date": "2025-01-15T10:00:00Z"
    },
    "config": {
        "cpu_cores": 2,
        "ram_mb": 4096,
        "disk_gb": 20,
        "workers": 4
    },
    "addons": [
        {"name": "platform-addons", "branch": "17.0"},
        {"name": "org-addons", "branch": "main"}
    ]
}

Git Commit Information

Captured from git repository during deployment:

# From services/odoo_deployer.py - Line 453
git_info = {
    "git_commit": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0",  # Full 40-character SHA
    "git_message": "Fix authentication bug",
    "git_branch": "main",
    "committer_name": "Jane Smith",
    "committer_email": "jane@example.com",
    "commit_date": "2025-01-15T14:23:45Z"
}

Display in UI:

Deployment #12 - main@a1b2c3d
"Fix authentication bug"
by Jane Smith on Jan 15, 2025 at 2:23 PM
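The summary above can be derived from the captured git_info fields. A minimal sketch, assuming the UI abbreviates the commit to 7 characters and renders the commit time in UTC (the real UI may localize it); `format_deployment_header` is a hypothetical name.

```python
from datetime import datetime


def format_deployment_header(number: int, git_info: dict) -> str:
    """Render the three-line deployment summary from captured git metadata."""
    when = datetime.fromisoformat(git_info["commit_date"].replace("Z", "+00:00"))
    hour = when.hour % 12 or 12  # 12-hour clock without a leading zero
    ampm = "AM" if when.hour < 12 else "PM"
    date_part = f"{when:%b} {when.day}, {when.year} at {hour}:{when.minute:02d} {ampm}"
    return "\n".join([
        f"Deployment #{number} - {git_info['git_branch']}@{git_info['git_commit'][:7]}",
        f'"{git_info["git_message"]}"',
        f"by {git_info['committer_name']} on {date_part}",
    ])
```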

Database Dumps (for Rollback)

NOT automatically created on each deployment.

For rollback with database, use backup system:

# Create backup before risky deployment
POST /api/v1/environments/{env_id}/backups
{
  "backup_type": "manual",
  "comment": "Before v2.0 deployment"
}
 
# If deployment fails, restore backup
POST /api/v1/environments/{env_id}/restore
{
  "backup_id": "backup_xyz"
}

Artifact Retention

| Artifact Type | Retention | Cleanup |
|---|---|---|
| Deployment records | Forever | Manual deletion only |
| Deployment logs | 90 days | Automatic cleanup |
| Container images | On server until destroy | Docker prune |
| Git clones | Until destroy | Remain in /opt/paasportal/ |
| Backup files | Per retention policy | GFS (Grandfather-Father-Son) |

Health Checks

Post-deployment health validation ensures the environment is accessible and functioning.

Health Check Process

# From services/odoo_deployer.py - Line 598
result.step = DeploymentStep.HEALTH_CHECK
 
healthy = await self._health_check(config)
 
if healthy:
    result.add_log("Health check passed!")
else:
    result.add_log("Warning: Health check failed, but container is running")

Health Check Types

1. Container Health Check

Verifies container is running:

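# Container state: expected "running"
docker inspect --format='{{.State.Status}}' {container_name}

# Docker health status (only if the image defines a HEALTHCHECK): expected "healthy"
docker inspect --format='{{.State.Health.Status}}' {container_name}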

2. HTTP Health Check

Attempts to connect to Odoo web interface:

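import logging

import httpx

logger = logging.getLogger(__name__)


async def _health_check(config) -> bool:
    # httpx is shown for illustration; the HTTP client the platform uses is not specified
    url = f"https://{config.subdomain}.{config.apps_domain}"

    try:
        async with httpx.AsyncClient(timeout=30) as client:
            response = await client.get(url)
        return response.status_code == 200
    except Exception as e:
        logger.warning(f"Health check failed: {e}")
        return False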

3. Database Connection Check

Verifies PostgreSQL connectivity:

docker exec {db_container} pg_isready -U odoo
 
# Expected: "accepting connections"

Health Check Endpoints

Configurable health check endpoints (future feature):

{
  "health_checks": [
    {
      "type": "http",
      "path": "/web/health",
      "expected_status": 200,
      "timeout": 10
    },
    {
      "type": "tcp",
      "port": 8069,
      "timeout": 5
    }
  ]
}

Health Check Failure Handling

Current behavior (non-fatal):

  • Health check fails → Warning logged
  • Deployment continues → Status = SUCCESS
  • Environment status → RUNNING
  • Container remains running

Future behavior (optional automatic rollback):

{
  "health_check_policy": {
    "enabled": true,
    "rollback_on_failure": true,
    "retries": 3,
    "retry_interval": 30
  }
}

Post-Deployment Validation

After health checks pass, additional validations run:

  1. DNS resolution check: Verify domain resolves to correct IP
  2. SSL certificate check: Ensure HTTPS working (if configured)
  3. Odoo database accessibility: Can connect to database
  4. Addon loading: All modules loaded successfully

Deployment Logs

Every deployment step writes structured log entries, which are stored in PostgreSQL.

Log Structure

# From models/deployment.py - Line 175
class DeploymentLog(Base):
    id: UUID
    deployment_id: UUID
    timestamp: datetime
    level: LogLevel  # DEBUG, INFO, WARNING, ERROR
    message: str
    step: str  # Deployment step identifier
    data: dict  # Additional structured data

Log Levels

class LogLevel(enum.Enum):
    DEBUG = "debug"      # Verbose debugging info
    INFO = "info"        # Normal operational messages
    WARNING = "warning"  # Warning but not fatal
    ERROR = "error"      # Fatal errors causing failure

Deployment Steps Logged

# From services/odoo_deployer.py - Line 49
DEPLOYMENT_STEPS = [
    "initializing",             # Preparing deployment config
    "connecting",               # SSH connection to server
    "creating_network",         # Docker network creation
    "configuring_dns",          # DNS record creation
    "creating_postgres",        # PostgreSQL container
    "cloning_platform_repos",   # Platform addon repos
    "cloning_org_repos",        # Organization addon repos
    "cloning_repo",             # Project primary repository
    "pulling_image",            # Odoo Docker image
    "generating_config",        # odoo.conf generation
    "starting_container",       # Odoo container start
    "installing_dependencies",  # apt.txt + requirements.txt
    "initializing_database",    # Database initialization
    "verifying_dns",            # DNS propagation check
    "configuring_traefik",      # Traefik routing setup
    "health_check",             # Post-deployment health check
    "completed",                # Deployment success
]
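Because the steps run in a fixed order, a client can estimate progress from the position of the current step. The formula below divides the current step's index by the index of `"completed"`. This is an assumption for illustration. The platform's `progress_percent` may weight steps differently, for example by expected duration. `estimate_progress` is a hypothetical helper.

```python
DEPLOYMENT_STEPS = [
    "initializing", "connecting", "creating_network", "configuring_dns",
    "creating_postgres", "cloning_platform_repos", "cloning_org_repos",
    "cloning_repo", "pulling_image", "generating_config", "starting_container",
    "installing_dependencies", "initializing_database", "verifying_dns",
    "configuring_traefik", "health_check", "completed",
]


def estimate_progress(current_step: str) -> int:
    """Step-count estimate: 0 at "initializing", 100 at "completed" (hypothetical)."""
    index = DEPLOYMENT_STEPS.index(current_step)  # ValueError for an unknown step
    last = len(DEPLOYMENT_STEPS) - 1
    return round(100 * index / last)
```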

Log API Endpoints

Get Deployment Logs

GET /api/v1/deployments/{deployment_id}/logs?level=error&skip=0&limit=500
Authorization: Bearer <token>

Query parameters:

  • level (optional): Filter by log level (debug, info, warning, error)
  • skip (optional): Pagination offset (default: 0)
  • limit (optional): Max logs to return (default: 500, max: 500)

Response:

{
  "deployment_id": "abc123",
  "logs": [
    {
      "id": "log_001",
      "level": "info",
      "message": "Connecting to 192.168.1.100...",
      "timestamp": "2025-01-15T10:30:05Z",
      "data": {
        "vm_ip": "192.168.1.100",
        "ssh_port": 22
      }
    },
    {
      "id": "log_002",
      "level": "info",
      "message": "SSH connection established (1.2s)",
      "timestamp": "2025-01-15T10:30:06Z",
      "data": {
        "duration_seconds": 1.2
      }
    }
  ]
}
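Because `limit` is capped at 500, fetching every log entry for a long deployment means advancing `skip` until a short page comes back. Below is a sketch of that loop. `iter_deployment_logs` is hypothetical and takes a `fetch_page(skip, limit)` callable, so any HTTP client can be plugged in, for instance one that calls the endpoint above with a Bearer token and returns the `logs` array. It assumes the server returns entries in a stable order across pages.

```python
from typing import Callable, Iterator, List

MAX_LOG_PAGE = 500  # documented maximum for `limit`


def iter_deployment_logs(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = MAX_LOG_PAGE,
) -> Iterator[dict]:
    """Yield every log entry by paging with skip/limit until a short page is returned."""
    if not 1 <= page_size <= MAX_LOG_PAGE:
        raise ValueError(f"page_size must be between 1 and {MAX_LOG_PAGE}")
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means there is nothing left to fetch
        skip += page_size
```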

Log Examples

Successful Deployment

[
  {"level": "info", "message": "Initializing deployment..."},
  {"level": "info", "message": "Connecting to 192.168.1.100..."},
  {"level": "info", "message": "SSH connection established (1.2s)"},
  {"level": "info", "message": "Creating network paasportal_net_abc123..."},
  {"level": "info", "message": "Network created (0.3s)"},
  {"level": "info", "message": "Configuring DNS record (early for propagation)..."},
  {"level": "info", "message": "DNS record created: env1.apps.oec.sh -> 192.168.1.100 (0.8s)"},
  {"level": "info", "message": "Creating PostgreSQL container abc123_db..."},
  {"level": "info", "message": "PostgreSQL container running (3.2s)"},
  {"level": "info", "message": "Cloning 2 platform addon repositories..."},
  {"level": "info", "message": "[1/2] Cloning platform-addons (17.0)..."},
  {"level": "info", "message": "[1/2] platform-addons ready (2.1s)"},
  {"level": "info", "message": "Pulling Odoo 17.0 image..."},
  {"level": "info", "message": "Image ready (8.3s)"},
  {"level": "info", "message": "Starting Odoo container abc123_odoo..."},
  {"level": "info", "message": "Container started: 9f8e7d6c5b4a (2.7s)"},
  {"level": "info", "message": "Initializing Odoo database..."},
  {"level": "info", "message": "Database initialization complete (45.2s)"},
  {"level": "info", "message": "Performing health check..."},
  {"level": "info", "message": "Health check passed! (3.1s)"},
  {"level": "info", "message": "Deployment completed! Total: 68.9s (1.1 min)"}
]

Failed Deployment

[
  {"level": "info", "message": "Initializing deployment..."},
  {"level": "info", "message": "Connecting to 192.168.1.100..."},
  {"level": "error", "message": "SSH connection timeout after 30 seconds"},
  {"level": "error", "message": "Deployment failed: Failed to connect to server via SSH"}
]
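In these examples, many messages end with a timing suffix such as `(1.2s)`, and a failure appears as the first `error` entry. The small sketch below extracts both from a log list. It depends only on the message format visible in these examples, which is an assumption rather than a documented contract. `first_error` and `timed_messages` are hypothetical helpers.

```python
import re
from typing import List, Optional, Tuple

# Matches a "(1.2s)"-style duration; "Total: 68.9s (1.1 min)" deliberately does not match.
DURATION_RE = re.compile(r"\((\d+(?:\.\d+)?)s\)")


def first_error(logs: List[dict]) -> Optional[str]:
    """Return the message of the first error-level entry, or None."""
    for entry in logs:
        if entry.get("level") == "error":
            return entry.get("message")
    return None


def timed_messages(logs: List[dict]) -> List[Tuple[str, float]]:
    """Return (message, seconds) for every message carrying a "(Ns)" suffix."""
    out = []
    for entry in logs:
        match = DURATION_RE.search(entry.get("message", ""))
        if match:
            out.append((entry["message"], float(match.group(1))))
    return out
```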

Real-Time Log Streaming

SSE (Server-Sent Events) for real-time log updates:

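// Frontend: subscribe to organization events, keeping one deployment's logs
const eventSource = new EventSource(
  `/api/v1/events?organizationId=${orgId}`
);

eventSource.addEventListener('deployment.log', (event) => {
  const log = JSON.parse(event.data);
  // The event stream is organization-wide, so filter by deployment
  if (log.deployment_id !== deploymentId) return;
  console.log(`[${log.level}] ${log.message}`);
});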

SSE event format:

{
  "event": "deployment.log",
  "data": {
    "deployment_id": "abc123",
    "level": "info",
    "message": "Container started: 9f8e7d6c5b4a (2.7s)",
    "step": "starting_container",
    "timestamp": "2025-01-15T10:35:22Z"
  }
}
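On the wire, an SSE stream is a sequence of `event:` and `data:` lines, and a blank line terminates each event. A non-browser consumer, such as a script or CLI, could parse that framing and keep only one deployment's log events. The sketch below assumes the server puts the JSON object shown under `data` on the `data:` line. It handles only the `event` and `data` fields and comment lines of the SSE format, and ignores `id:` and `retry:`. `parse_sse` and `deployment_logs` are hypothetical names.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data_string) pairs from raw SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # a blank line dispatches the buffered event
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if name == "event":
                event = value
            elif name == "data":
                data.append(value)


def deployment_logs(lines: Iterable[str], deployment_id: str) -> Iterator[dict]:
    """Yield decoded deployment.log payloads that belong to `deployment_id`."""
    for event, payload in parse_sse(lines):
        if event == "deployment.log":
            log = json.loads(payload)
            if log.get("deployment_id") == deployment_id:
                yield log
```

With `urllib.request`, the response object returned by `urlopen` can be iterated line by line, so `(line.decode("utf-8") for line in response)` can be passed as `lines`.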

Log Retention

  • Storage: PostgreSQL deployment_logs table
  • Retention: Logs retained for lifetime of deployment record
  • Cleanup: Manual deletion only (no automatic cleanup currently)
  • Indexing: deployment_id + timestamp, for efficient per-deployment queries

Permissions

All deployment operations are protected by OEC.SH's Permission Matrix system (Sprint 2E21).

Required Permissions

| Operation | Permission Code | Description |
|---|---|---|
| Deploy | project.environments.deploy | Create new deployment |
| Redeploy | project.environments.deploy | Force redeploy existing |
| Retry | project.environments.deploy | Retry failed deployment |
| Rollback | project.environments.deploy | Rollback to previous |
| Cancel | project.deployments.cancel | Cancel in-progress deployment |
| View Deployments | project.deployments.list | List deployment history |
| View Logs | project.deployments.view | View deployment logs |
| Destroy | project.environments.deploy | Destroy environment |
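The table can be expressed as a lookup from operation to permission code. This helps with client-side checks such as hiding actions a user cannot perform, although the server still enforces permissions. `OPERATION_PERMISSIONS`, `required_permission`, and `allowed_operations` are hypothetical names. The codes are copied from the table.

```python
OPERATION_PERMISSIONS = {
    "deploy": "project.environments.deploy",
    "redeploy": "project.environments.deploy",
    "retry": "project.environments.deploy",
    "rollback": "project.environments.deploy",
    "cancel": "project.deployments.cancel",
    "view_deployments": "project.deployments.list",
    "view_logs": "project.deployments.view",
    "destroy": "project.environments.deploy",
}


def required_permission(operation: str) -> str:
    """Return the permission code guarding `operation`."""
    try:
        return OPERATION_PERMISSIONS[operation]
    except KeyError:
        raise ValueError(f"unknown deployment operation: {operation!r}") from None


def allowed_operations(granted: set) -> set:
    """Operations a user may perform, given the permission codes they hold."""
    return {op for op, code in OPERATION_PERMISSIONS.items() if code in granted}
```

The mapping shows that Destroy shares `project.environments.deploy` with Deploy, Redeploy, Retry, and Rollback. Anyone who can deploy to an environment can therefore also destroy it.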

Permission Hierarchy

Portal Admin (55+ permissions)
  └─ Organization Owner (40+ permissions)
      └─ Organization Admin (30+ permissions)
          └─ Project Admin (20+ permissions)
              └─ Project Member (10+ permissions)

System Roles

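| Role | Deploy | Rollback | Cancel | View Logs | Destroy |
|---|---|---|---|---|---|
| Portal Admin | | | | | |
| Org Owner | | | | | |
| Org Admin | | | | | |
| Org Member | | | | | |
| Project Admin | | | | | |
| Project Member | | | | | |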

Permission Checks

# From routes/deployments.py - Line 66
@router.post("", response_model=DeploymentResponse)
async def create_deployment(...):
    # Check permission
    has_permission = await check_permission(
        db=db,
        user=current_user,
        permission_code="project.environments.deploy",
        organization_id=project.organization_id,
        project_id=project.id,
    )
 
    if not has_permission:
        raise HTTPException(403, "You don't have permission to deploy.")

Production Environment Protection

Planned additional protection for production environments (configurable):

# Future feature (not yet enforced)
if environment.type == "production":
    # Require an additional permission for production deployments
    has_prod_permission = await check_permission(
        db=db,
        user=current_user,
        permission_code="project.production.deploy",
        organization_id=project.organization_id,
        project_id=project.id,
    )

    if not has_prod_permission:
        raise HTTPException(403, "Production deployment requires special permission")

API Reference

Complete API reference for all deployment operations.

Create Deployment

Endpoint: POST /api/v1/deployments

Request:

{
  "project_id": "proj_abc123",
  "environment_id": "env_xyz789",
  "git_branch": "main",
  "git_commit": "a1b2c3d4e5f6"
}

Response: 201 Created

{
  "id": "deploy_001",
  "project_id": "proj_abc123",
  "environment_id": "env_xyz789",
  "version": 1,
  "status": "pending",
  "trigger": "manual",
  "triggered_by": "user_123",
  "git_branch": "main",
  "git_commit": "a1b2c3d4e5f6",
  "created_at": "2025-01-15T10:30:00Z"
}

Rate Limit: 10 requests/minute
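Below is a minimal standard-library sketch of calling this endpoint from Python. The base URL, the token handling, and the `create_deployment_request` helper are assumptions. The body fields come from the request example above. The function only builds the request, so it can be inspected before it is sent with `urllib.request.urlopen`.

```python
import json
import urllib.request


def create_deployment_request(
    base_url: str,
    token: str,
    project_id: str,
    environment_id: str,
    git_branch: str,
    git_commit: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/deployments request (hypothetical helper)."""
    payload = {
        "project_id": project_id,
        "environment_id": environment_id,
        "git_branch": git_branch,
        "git_commit": git_commit,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/deployments",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` returns the `201 Created` body shown above. Because of the 10 requests/minute limit, scripts that create many deployments should pace their calls.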


List Deployments

Endpoint: GET /api/v1/deployments

Query Parameters:

  • project_id (optional): Filter by project
  • status (optional): Filter by status (pending, success, failed, etc.)
  • skip (optional): Pagination offset (default: 0)
  • limit (optional): Results per page (default: 50, max: 100)

Response: 200 OK

[
  {
    "id": "deploy_001",
    "project_id": "proj_abc123",
    "environment_id": "env_xyz789",
    "version": 5,
    "status": "success",
    "trigger": "manual",
    "git_commit": "a1b2c3d",
    "git_message": "Fix authentication bug",
    "started_at": "2025-01-15T10:30:00Z",
    "completed_at": "2025-01-15T10:32:15Z",
    "duration_seconds": 135.2
  }
]

Get Deployment

Endpoint: GET /api/v1/deployments/{deployment_id}

Response: 200 OK

{
  "id": "deploy_001",
  "project_id": "proj_abc123",
  "environment_id": "env_xyz789",
  "vm_id": "vm_001",
  "version": 5,
  "status": "success",
  "trigger": "manual",
  "triggered_by": "user_123",
  "git_commit": "a1b2c3d4e5f6",
  "git_branch": "main",
  "git_message": "Fix authentication bug",
  "started_at": "2025-01-15T10:30:00Z",
  "completed_at": "2025-01-15T10:32:15Z",
  "duration_seconds": 135.2,
  "container_id": "9f8e7d6c5b4a",
  "image_tag": "odoo:17.0",
  "extra_data": {
    "git": {
      "committer_name": "Jane Smith",
      "committer_email": "jane@example.com",
      "commit_date": "2025-01-15T09:00:00Z"
    }
  }
}

Get Deployment Logs

Endpoint: GET /api/v1/deployments/{deployment_id}/logs

Query Parameters:

  • level (optional): Filter by log level (debug, info, warning, error)
  • skip (optional): Pagination offset (default: 0)
  • limit (optional): Max logs (default: 500, max: 500)

Response: 200 OK

{
  "deployment_id": "deploy_001",
  "logs": [
    {
      "id": "log_001",
      "level": "info",
      "message": "Connecting to 192.168.1.100...",
      "timestamp": "2025-01-15T10:30:05Z",
      "data": {"vm_ip": "192.168.1.100"}
    }
  ]
}

Retry Deployment

Endpoint: POST /api/v1/deployments/{deployment_id}/retry

Response: 200 OK

{
  "id": "deploy_new",
  "project_id": "proj_abc123",
  "version": 6,
  "status": "pending",
  "trigger": "manual",
  "git_branch": "main",
  "git_commit": "a1b2c3d",
  "extra_data": {
    "retry_of": "deploy_001"
  }
}

Error Responses:

  • 400 Bad Request: Can only retry failed deployments
  • 404 Not Found: Deployment not found
  • 409 Conflict: Deployment already in progress
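A retry returns a new deployment in `pending` status, so a script typically polls `GET /api/v1/deployments/{deployment_id}` until the deployment reaches a terminal state. In the sketch below, `fetch_status` stands in for that HTTP call. Treating `success`, `failed`, `cancelled`, `rolled_back`, and `destroyed` as terminal is an assumption based on the status enum. `wait_for_deployment` is a hypothetical helper, and its 30-minute default matches the automatic timeout described under Troubleshooting.

```python
import time
from typing import Callable

# Statuses after which no further automatic transition is expected (assumption).
TERMINAL_STATUSES = {"success", "failed", "cancelled", "rolled_back", "destroyed"}


def wait_for_deployment(
    fetch_status: Callable[[], str],
    timeout: float = 30 * 60,
    interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal status is reached; raise TimeoutError after `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"deployment still {status!r} after {timeout}s")
        sleep(interval)
```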

Rollback Deployment

Endpoint: POST /api/v1/deployments/{deployment_id}/rollback

Response: 200 OK

{
  "id": "deploy_rollback",
  "project_id": "proj_abc123",
  "version": 7,
  "status": "pending",
  "trigger": "rollback",
  "git_branch": "main",
  "git_commit": "previous_commit",
  "rollback_from_id": "deploy_005",
  "extra_data": {
    "rollback_to": "deploy_001"
  }
}

Error Responses:

  • 400 Bad Request: Can only rollback to successful deployments
  • 404 Not Found: Deployment not found
  • 409 Conflict: Deployment already in progress
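Choosing a rollback target usually means finding the newest successful deployment of the same environment that is older than the current one. The sketch below works on the list returned by `GET /api/v1/deployments?project_id=...`. The selection rule and `pick_rollback_target` are assumptions. The returned `id` is the `{deployment_id}` to use in the rollback URL.

```python
from typing import List, Optional


def pick_rollback_target(
    deployments: List[dict], current_version: int, environment_id: str
) -> Optional[dict]:
    """Return the newest successful deployment older than current_version, or None."""
    candidates = [
        d for d in deployments
        if d.get("environment_id") == environment_id
        and d.get("status") == "success"
        and d.get("version", 0) < current_version
    ]
    return max(candidates, key=lambda d: d["version"], default=None)
```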

Cancel Deployment

Endpoint: POST /api/v1/deployments/{deployment_id}/cancel

Response: 200 OK

{
  "message": "Deployment cancelled"
}

Error Responses:

  • 400 Bad Request: Cannot cancel deployment in {status} state
  • 404 Not Found: Deployment not found

Destroy Environment

Endpoint: DELETE /api/v1/deployments/environment/{environment_id}/destroy

Query Parameters:

  • delete_data (optional): Also delete Odoo data directory (default: false)

Response: 200 OK

{
  "message": "Environment destroyed successfully",
  "environment_id": "env_xyz789",
  "deleted_containers": true,
  "deleted_database": true,
  "deleted_data": false
}

Get Deployment Progress

Endpoint: GET /api/v1/deployments/{deployment_id}/progress

Response: 200 OK

{
  "deployment_id": "deploy_001",
  "status": "deploying",
  "progress_percent": 65,
  "steps": [
    {
      "id": "initializing",
      "name": "Initializing",
      "description": "Preparing deployment configuration",
      "status": "completed",
      "logs": [
        {"message": "Initializing deployment...", "timestamp": "2025-01-15T10:30:00Z"}
      ]
    },
    {
      "id": "connecting",
      "name": "Connecting",
      "description": "Connecting to server via SSH",
      "status": "completed",
      "logs": [
        {"message": "SSH connection established (1.2s)", "timestamp": "2025-01-15T10:30:01Z"}
      ]
    },
    {
      "id": "pulling_image",
      "name": "Pulling Image",
      "description": "Downloading Odoo Docker image",
      "status": "running",
      "logs": []
    },
    {
      "id": "starting_container",
      "name": "Starting Container",
      "status": "pending"
    }
  ],
  "current_step": "pulling_image",
  "started_at": "2025-01-15T10:30:00Z"
}
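To render this response, a client only needs each step's `name` and `status`. The sketch below turns the payload into a plain-text checklist. The markers and `render_progress` are illustrative choices, not part of the API.

```python
MARKERS = {"completed": "[x]", "running": "[>]", "pending": "[ ]", "failed": "[!]"}


def render_progress(progress: dict) -> str:
    """Format a progress response as a text checklist (hypothetical helper)."""
    lines = [f"{progress['status']} ({progress['progress_percent']}%)"]
    for step in progress.get("steps", []):
        marker = MARKERS.get(step.get("status"), "[?]")  # unknown statuses get "[?]"
        lines.append(f"{marker} {step['name']}")
    return "\n".join(lines)
```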

Best Practices

Guidelines for safe and effective deployment operations.

When to Retry vs Redeploy

Retry when:

  • Transient network errors (SSH timeouts, failed git clones)
  • Docker registry rate limits
  • Temporary server resource exhaustion
  • No configuration changes needed

Redeploy when:

  • Configuration changed (env vars, resource limits)
  • Dependencies updated (apt packages, pip requirements)
  • DNS or domain changed
  • Need to apply new settings
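The rules of thumb above can be condensed into a small decision helper for automation scripts. `choose_action` and the keyword list are hypothetical. They encode only this guidance, plus the fact that the retry endpoint accepts only failed deployments.

```python
TRANSIENT_HINTS = ("timeout", "rate limit", "temporarily unavailable")


def choose_action(status: str, config_changed: bool, error_message: str = "") -> str:
    """Return "redeploy", "retry", "investigate", or "none" (hypothetical helper)."""
    if config_changed:
        return "redeploy"  # new settings must be applied; retry reuses the old configuration
    if status == "failed":
        msg = error_message.lower()
        if any(hint in msg for hint in TRANSIENT_HINTS):
            return "retry"  # likely transient: same configuration, try again
        return "investigate"  # a retry would probably fail with the same error
    return "none"
```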

Rollback Safety Considerations

  1. Check database migrations: Rollback reverts code, not data, so ensure the rollback target's code is compatible with the current database schema
  2. Test in staging first: Always test rollback in a non-production environment
  3. Consider backup restore: For full rollback including database, use backup restore + code rollback
  4. Document breaking changes: Maintain runbook of changes that prevent rollback
  5. Monitor after rollback: Watch logs and metrics after rollback completes

Testing Before Production Deployment

  1. Use staging environment: Deploy to staging first
  2. Run automated tests: Execute test suite after deployment
  3. Manual QA: Verify critical workflows
  4. Performance testing: Check response times and load
  5. Rollback drill: Practice rollback procedure before production deploy

Deployment Runbook Template

# Deployment Runbook: [Feature Name]
 
## Pre-Deployment
- [ ] Code review completed
- [ ] Tests passing in CI/CD
- [ ] Database migrations reviewed
- [ ] Backup created (ID: _________)
- [ ] Staging deployment successful
- [ ] Rollback procedure documented
 
## Deployment Steps
1. Deploy to production via dashboard
2. Monitor deployment logs for errors
3. Verify health checks pass
4. Run smoke tests
5. Monitor error rates for 15 minutes
 
## Rollback Plan
**If deployment fails**:
1. Cancel deployment if in progress
2. Check error logs: /api/v1/deployments/{id}/logs
3. Rollback to deployment ID: _________
4. If rollback fails, restore backup: _________
 
## Breaking Changes
- Database migration: Adds `new_column` (compatible with v1)
- API endpoint changed: `/v1/old` → `/v2/new` (v1 still supported)
 
## Success Criteria
- [ ] All health checks pass
- [ ] Error rate < 0.1%
- [ ] Response time < 500ms p95
- [ ] No user-reported issues in first 30 min

Troubleshooting

Common issues and solutions for deployment operations.

Stuck Deployments

Symptom: Deployment status stuck at PENDING or DEPLOYING for > 30 minutes

Diagnosis:

# Check deployment logs
GET /api/v1/deployments/{id}/logs?level=error
 
# Check ARQ worker status
# (Admin only - via SSH to backend server)
docker logs paasportal_worker

Solutions:

  1. Wait for automatic timeout: System automatically fails stuck deployments after 30 minutes
  2. Cancel deployment: POST /api/v1/deployments/{id}/cancel
  3. Redeploy: After cancellation, start a new deployment from the dashboard (the retry endpoint only accepts failed deployments)
  4. Check server resources: SSH to server, check docker ps, df -h, free -m
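A monitoring script could apply the 30-minute threshold to the results of `GET /api/v1/deployments`. In the sketch below, `is_stuck` is hypothetical. It assumes `started_at` (or `created_at` for deployments that have not started yet) is an ISO 8601 UTC timestamp, as in the examples. Counting every in-progress status as potentially stuck is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

IN_PROGRESS = {"pending", "queued", "building", "deploying", "running"}
STUCK_AFTER = timedelta(minutes=30)


def _parse(ts: str) -> datetime:
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def is_stuck(deployment: dict, now: Optional[datetime] = None) -> bool:
    """True if the deployment has been in progress for longer than 30 minutes."""
    if deployment.get("status") not in IN_PROGRESS:
        return False
    since = deployment.get("started_at") or deployment.get("created_at")
    if not since:
        return False
    now = now or datetime.now(timezone.utc)
    return now - _parse(since) > STUCK_AFTER
```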

Failed Retries

Symptom: Retry fails with same error repeatedly

Diagnosis:

# Get detailed error message
GET /api/v1/deployments/{id}
 
# Common error patterns
"SSH connection timeout"         → Server unreachable
"Git clone failed"               → Invalid credentials or repo URL
"Docker pull failed"             → Registry authentication issue
"PostgreSQL connection failed"   → Database container not starting

Solutions by Error Type:

SSH Connection Timeout:

# Check VM is reachable
ping {vm_ip}
 
# Check SSH port open
nc -zv {vm_ip} 22
 
# Update VM SSH credentials if changed
PATCH /api/v1/vms/{vm_id}

Git Clone Failed:

# Verify repository URL
curl -I {git_repo_url}
 
# Check git credentials
# - For GitHub: OAuth token in user profile
# - For private repos: Ensure git connection configured
GET /api/v1/organizations/{org_id}/git-connections

Docker Pull Failed:

# Check Docker registry credentials
# - For custom registries: Update OdooVersion config
GET /api/v1/admin/odoo-versions/{version_id}
 
# Test pull manually on server
docker pull odoo:17.0
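The error patterns listed under Diagnosis lend themselves to a substring lookup, which a script can use to attach a likely cause to a failed deployment's error message. In the sketch below, the patterns and causes are the ones listed in this section, and `diagnose` is a hypothetical helper.

```python
from typing import Optional

ERROR_PATTERNS = {
    "ssh connection timeout": "Server unreachable",
    "git clone failed": "Invalid credentials or repo URL",
    "docker pull failed": "Registry authentication issue",
    "postgresql connection failed": "Database container not starting",
}


def diagnose(error_message: str) -> Optional[str]:
    """Return the likely cause for a known error pattern, or None."""
    msg = error_message.lower()
    for pattern, cause in ERROR_PATTERNS.items():
        if pattern in msg:
            return cause
    return None
```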

Rollback Failures

Symptom: Rollback creates new deployment but it fails

Diagnosis:

# Check if target deployment is too old
GET /api/v1/deployments/{target_id}
 
# Compare database schemas
# (Manual - requires database access)
psql -U postgres -d {db_name} -c "\d"

Solutions:

Database Schema Incompatibility:

  1. Restore backup from before problematic deployment:

    POST /api/v1/environments/{env_id}/restore
    {"backup_id": "backup_before_issue"}
  2. Then rollback code:

    POST /api/v1/deployments/{old_deployment_id}/rollback

Missing Addon Dependencies:

  • Check if rolled-back code depends on addons that were removed
  • Restore missing addon repositories
  • Redeploy with correct addon configuration

Resource Exhaustion:

  • Server out of disk space for rollback
  • Check: df -h /var/lib/docker
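  • Clean up: docker system prune -a (removes all stopped containers, unused networks, images not used by any container, and build cache; images the rollback needs may have to be pulled again)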
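The free-space check can also be scripted on the server before starting a rollback. The sketch below uses `shutil.disk_usage` from the standard library. The 10% threshold is an arbitrary example, not a platform setting, and `low_on_disk` is a hypothetical name.

```python
import shutil


def low_on_disk(path: str = "/var/lib/docker", min_free_ratio: float = 0.10) -> bool:
    """Return True if the filesystem holding `path` has less than min_free_ratio free."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free) in bytes
    return usage.free / usage.total < min_free_ratio
```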

Deployment Logs Not Showing

Symptom: Deployment progress shows no logs or logs frozen

Diagnosis:

# Check deployment status
GET /api/v1/deployments/{id}
 
# Check SSE connection
# (Frontend - browser console)
eventSource.readyState  // Expected: 1 (OPEN); 0 = CONNECTING, 2 = CLOSED

Solutions:

  1. Refresh page: SSE connection may have dropped
  2. Check network: A proxy or firewall may be buffering or closing the long-lived HTTP connection (SSE uses the same HTTP(S) port as the API)
  3. Backend issue: Check worker logs (admin only)
  4. Fetch logs directly: Use logs API endpoint instead of SSE

Environment Destroyed But Quota Not Released

Symptom: After destroying environment, organization shows as over quota

Diagnosis:

# Check environment is_active status
GET /api/v1/environments/{env_id}
 
# Check quota usage
GET /api/v1/organizations/{org_id}/quota

Solution:

-- Manually set is_active to false
-- (Admin only - direct database update)
UPDATE project_environments
SET is_active = false
WHERE id = '{env_id}';

-- Quota will recalculate on next check

Prevention: Always use the destroy API endpoint rather than deleting containers manually.



Last Updated: January 2025 Platform Version: 2E41