Deployment Operations
Learn how to manage the deployment lifecycle with retry, rollback, cancellation, and redeployment operations in OEC.SH.
Overview
OEC.SH provides deployment lifecycle operations for handling failures, rolling back problematic deployments, and keeping environments stable. Each deployment is tracked by a state machine, and long-running work runs as ARQ background tasks with automatic retries.
Available Operations
| Operation | Purpose | Use Case |
|---|---|---|
| Retry | Retry a failed deployment with same configuration | Transient network errors, temporary resource unavailability |
| Rollback | Revert to a previous successful deployment | New deployment introduced bugs or issues |
| Cancel | Stop an in-progress deployment | Incorrect configuration, need to make changes |
| Redeploy | Force redeployment of current configuration | Configuration changes, dependency updates |
| Destroy | Remove environment and all containers | Environment cleanup, resource reclamation |
Deployment State Machine
OEC.SH uses a state machine to track deployment lifecycle and ensure valid state transitions.
Deployment Statuses
# From models/deployment.py
import enum


class DeploymentStatus(enum.Enum):
    PENDING = "pending"          # Deployment created, waiting to start
    QUEUED = "queued"            # Queued in ARQ for processing
    BUILDING = "building"        # Building Docker image
    DEPLOYING = "deploying"      # Actively deploying to server
    RUNNING = "running"          # Legacy - use DEPLOYING instead
    SUCCESS = "success"          # Deployment completed successfully
    FAILED = "failed"            # Deployment failed with error
    CANCELLED = "cancelled"      # Deployment cancelled by user
    ROLLED_BACK = "rolled_back"  # Previous deployment rolled back
    DESTROYED = "destroyed"      # Environment was destroyed
State Transitions
Valid state transitions for deployment operations:
Terminal States
Terminal states (no further automatic transitions):
- SUCCESS - Can be rolled back to or used as rollback target
- FAILED - Can be retried (creates new deployment)
- CANCELLED - Can trigger new deployment manually
- DESTROYED - Environment removed, requires full redeploy
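The status rules stated in this document (retry only FAILED deployments, roll back only to SUCCESS deployments, cancel only PENDING or RUNNING deployments) can be sketched as a small lookup. This is a minimal illustration, not the platform's code: `ALLOWED_OPERATIONS` and `can_apply` are hypothetical names, and transitions the document does not spell out are simply rejected.

```python
import enum


class DeploymentStatus(enum.Enum):
    PENDING = "pending"
    QUEUED = "queued"
    BUILDING = "building"
    DEPLOYING = "deploying"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    CANCELLED = "cancelled"
    ROLLED_BACK = "rolled_back"
    DESTROYED = "destroyed"


# Hypothetical lookup: statuses in which each operation is accepted.
# Only the rules stated in this document are encoded here.
ALLOWED_OPERATIONS = {
    "retry": {DeploymentStatus.FAILED},
    "rollback": {DeploymentStatus.SUCCESS},  # status of the rollback *target*
    "cancel": {DeploymentStatus.PENDING, DeploymentStatus.RUNNING},
}


def can_apply(operation: str, status: DeploymentStatus) -> bool:
    """Return True if `operation` is accepted for a deployment in `status`."""
    return status in ALLOWED_OPERATIONS.get(operation, set())
```

For example, `can_apply("retry", DeploymentStatus.FAILED)` is true, while retrying a successful deployment is rejected.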
Retry Failed Deployments
Retry a failed deployment with the same configuration to recover from transient failures.
When to Retry vs Redeploy
| Scenario | Action | Reason |
|---|---|---|
| Network timeout during git clone | Retry | Transient network issue |
| SSH connection dropped | Retry | Temporary connection problem |
| Docker registry rate limit | Retry | Wait and retry |
| Invalid git branch name | Redeploy | Fix configuration first |
| Missing environment variables | Redeploy | Fix config first |
| Wrong Odoo version | Redeploy | Update project settings |
API Endpoint
POST /api/v1/deployments/{deployment_id}/retry
Authorization: Bearer <token>
Request
curl -X POST https://api.oec.sh/api/v1/deployments/a1b2c3d4-5678-90ab-cdef-1234567890ab/retry \
  -H "Authorization: Bearer YOUR_TOKEN"
Response
{
  "id": "e5f6a7b8-90ab-cdef-1234-567890abcdef",
  "project_id": "proj_abc123",
  "environment_id": "env_xyz789",
  "version": 5,
  "status": "pending",
  "trigger": "manual",
  "git_commit": "a1b2c3d",
  "git_branch": "main",
  "extra_data": {
    "retry_of": "a1b2c3d4-5678-90ab-cdef-1234567890ab"
  },
  "created_at": "2025-01-15T10:30:00Z"
}
What Gets Retained
When retrying a deployment, the following are retained from the original:
- Git configuration: Same branch, same commit (if specified)
- Environment configuration: CPU, RAM, disk limits
- Environment variables: All environment-specific variables
- Addon repositories: Same platform, org, and project addons
- Database configuration: Same PostgreSQL settings
- Trigger metadata: Marked as retry with reference to original
What Gets Reset
The following are reset on retry:
- Deployment ID: New UUID generated
- Version number: Incremented to next version
- Status: Starts at PENDING
- Timestamps: New created_at, started_at, completed_at
- Logs: Fresh log entries
- Container ID: New container created on success
Retry Limits
Automatic retry limits:
- ARQ task retries: 3 attempts (configured via QUEUE_MAX_TRIES)
- Exponential backoff: 5s → 15s → 45s between retries
- Dead Letter Queue (DLQ): Receives tasks that still fail after the maximum number of retries
Manual retry limits:
- No hard limit on manual retries via API
- Rate limited: 10 retries per minute per organization
Implementation Details
# From routes/deployments.py - Line 570
@router.post("/{deployment_id}/retry")
async def retry_deployment(
    db: DBSession,
    current_user: CurrentUser,
    deployment_id: UUID,
) -> DeploymentResponse:
    # Validate deployment status
    if original.status != DeploymentStatus.FAILED:
        raise HTTPException(400, "Can only retry failed deployments")
    # Check for ongoing deployments
    if has_ongoing_deployment(project_id):
        raise HTTPException(409, "Deployment already in progress")
    # Create new deployment with same config
    new_deployment = Deployment(
        project_id=original.project_id,
        version=next_version,
        git_branch=original.git_branch,
        git_commit=original.git_commit,
        extra_data={"retry_of": str(original.id)}
    )
    # Queue for execution via ARQ
    await enqueue_task("deploy_environment", ...)
Permission Requirements
- Required permission: project.environments.deploy
- Organization scope: Must be member of environment's organization
- Project scope: Must have deploy permission on project
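The retry call above can also be issued from a script. The following is a minimal sketch using only the standard library: it builds the request without sending it and reads the `retry_of` reference from a response body. `build_retry_request` and `retry_source` are hypothetical helper names; the base URL is taken from the curl example.

```python
import urllib.request
from typing import Optional

API_BASE = "https://api.oec.sh/api/v1"  # base URL as used in the curl example


def build_retry_request(deployment_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the retry endpoint."""
    return urllib.request.Request(
        url=f"{API_BASE}/deployments/{deployment_id}/retry",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def retry_source(response_body: dict) -> Optional[str]:
    """Return the ID of the deployment this one retries, or None if absent."""
    return response_body.get("extra_data", {}).get("retry_of")
```

Passing the request to `urllib.request.urlopen` would send it. A 400 response indicates the deployment is not in the FAILED state, and a 409 indicates another deployment is already in progress.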
Deployment Cancellation
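Cancel a pending or running deployment to stop it from progressing further. Cancellation is graceful: work already in flight finishes, and partially created resources are not cleaned up automatically.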
When to Cancel
- Incorrect configuration detected before deployment completes
- Wrong branch/commit selected and need to redeploy
- Need to make changes to environment variables or settings
- Server issues detected (maintenance, high load)
- Cost control - stop expensive operation
API Endpoint
POST /api/v1/deployments/{deployment_id}/cancel
Authorization: Bearer <token>
Request
curl -X POST https://api.oec.sh/api/v1/deployments/abc123/cancel \
  -H "Authorization: Bearer YOUR_TOKEN"
Response
{
  "message": "Deployment cancelled"
}
Cancellation Behavior
Graceful Cancellation (Default)
When you cancel a deployment, the system:
- Marks deployment as CANCELLED in database
- Updates environment status to last known stable state
- Logs cancellation event with user ID and timestamp
- Does NOT kill running processes - they complete naturally
- Does NOT rollback partial changes - containers may remain
State After Cancellation
| Deployment Step | State After Cancel | Cleanup Required |
|---|---|---|
| PENDING | Clean, nothing created | None |
| CONNECTING | Clean | None |
| CLONING_REPO | Partial git clone on server | Files remain in /opt/paasportal/ |
| CREATING_POSTGRES | PostgreSQL container running | Container + volume remain |
| STARTING_CONTAINER | Odoo container may be running | Container remains |
| COMPLETED | Too late to cancel | Deployment succeeded |
Cleanup Operations
After cancellation, you may need to:
- Stop running containers manually via environment actions
- Remove partial clones via environment destroy
- Delete DNS records if auto-created
- Check for orphaned Docker networks
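The "State After Cancellation" table can be turned into a lookup that tells an operator what to check after a cancel. This is only a sketch: `LEFTOVERS_BY_STEP` and `leftovers_after_cancel` are hypothetical names, and the entries are transcribed per step from the table, so resources created by earlier steps may also still exist.

```python
# What may remain on the server, keyed by the step at which cancellation landed.
# Transcribed from the "State After Cancellation" table; names are illustrative.
LEFTOVERS_BY_STEP = {
    "pending": [],
    "connecting": [],
    "cloning_repo": ["partial git clone under /opt/paasportal/"],
    "creating_postgres": ["PostgreSQL container", "PostgreSQL volume"],
    "starting_container": ["Odoo container"],
}


def leftovers_after_cancel(step: str) -> list[str]:
    """Return the resources that may need manual cleanup for `step`."""
    if step == "completed":
        # The table marks this step as too late to cancel.
        raise ValueError("deployment already completed; it cannot be cancelled")
    if step not in LEFTOVERS_BY_STEP:
        raise KeyError(f"no cleanup information documented for step {step!r}")
    return LEFTOVERS_BY_STEP[step]
```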
Implementation
# From routes/deployments.py - Line 504
@router.post("/{deployment_id}/cancel")
async def cancel_deployment(
    db: DBSession,
    current_user: CurrentUser,
    deployment_id: UUID,
) -> dict[str, str]:
    # Validate current status
    if deployment.status not in (DeploymentStatus.PENDING, DeploymentStatus.RUNNING):
        raise HTTPException(400, f"Cannot cancel deployment in {deployment.status} state")
    # Mark as cancelled
    deployment.status = DeploymentStatus.CANCELLED
    # Log cancellation
    log_entry = DeploymentLog(
        deployment_id=deployment.id,
        level=LogLevel.WARNING,
        message="Deployment cancelled by user",
        data={"cancelled_by": str(current_user.id)}
    )
    await db.commit()
Limitations
- Cannot cancel SUCCESS or FAILED deployments (already terminal states)
- No automatic cleanup of partial resources
- An ARQ task that is already executing a step may run to completion
- Docker operations in-flight will finish (cannot be interrupted mid-step)
Permission Requirements
- Required permission: project.deployments.cancel
- Organization scope: Must be member of environment's organization
Rollback to Previous Deployment
Roll back to a previous successful deployment to recover quickly from a problematic release.
When to Rollback
- New deployment introduced bugs that affect production users
- Performance regression detected after deployment
- Breaking changes deployed accidentally
- Need immediate recovery while investigating root cause
- Database migration issues (with caveats - see below)
API Endpoint
POST /api/v1/deployments/{deployment_id}/rollback
Authorization: Bearer <token>
Note: This endpoint rolls back TO the specified deployment (not FROM).
Request
# Rollback TO deployment abc123 (the last known good deployment)
curl -X POST https://api.oec.sh/api/v1/deployments/abc123/rollback \
  -H "Authorization: Bearer YOUR_TOKEN"
Response
{
  "id": "new_deployment_id",
  "project_id": "proj_abc123",
  "environment_id": "env_xyz789",
  "version": 8,
  "status": "pending",
  "trigger": "rollback",
  "git_commit": "a1b2c3d",
  "git_branch": "main",
  "rollback_from_id": "abc123",
  "extra_data": {
    "rollback_to": "abc123"
  }
}
Rollback Target Selection
The system identifies rollback targets as deployments with:
- Status: SUCCESS only
- Same environment: Must be for same environment
- Can rollback: can_rollback = true (default for successful deployments)
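The three criteria above can be applied to a deployment history client-side. The sketch below is illustrative only: `DeploymentSummary` and `pick_rollback_target` are hypothetical names, and choosing the most recent eligible version older than the current one is an assumption, not documented platform behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeploymentSummary:
    # Hypothetical, trimmed-down view of a deployment record.
    id: str
    environment_id: str
    version: int
    status: str
    can_rollback: bool = True


def pick_rollback_target(
    history: list[DeploymentSummary],
    environment_id: str,
    current_version: int,
) -> Optional[DeploymentSummary]:
    """Return the newest eligible deployment older than `current_version`."""
    candidates = [
        d for d in history
        if d.status == "success"                 # Status: SUCCESS only
        and d.environment_id == environment_id   # Same environment
        and d.can_rollback                       # can_rollback = true
        and d.version < current_version          # assumption: only older versions
    ]
    return max(candidates, key=lambda d: d.version, default=None)
```

The `id` of the returned record is what would be passed to the rollback endpoint.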
# Get deployment history to find rollback target
# (URL quoted so the shell does not treat "&" as a background operator)
curl "https://api.oec.sh/api/v1/deployments?project_id=proj_abc123&status=success" \
  -H "Authorization: Bearer YOUR_TOKEN"
What Gets Rolled Back
When rolling back to a previous deployment:
| Component | Rolled Back | Details |
|---|---|---|
| Code | ✅ Yes | Git repository reverted to target commit |
| Addon Repositories | ✅ Yes | All platform/org/project addons reverted |
| Docker Image | ✅ Yes | Same Odoo version + Docker image |
| Configuration | ✅ Yes | odoo.conf from target deployment |
| Environment Variables | ✅ Yes | Environment-specific variables from target |
| Database Schema | ⚠️ Partial | See Database Rollback section |
| Filestore | ❌ No | Uploaded files NOT rolled back |
| PostgreSQL Data | ❌ No | Database data NOT rolled back |
Database Rollback Limitations
⚠️ CRITICAL: Database schema rollback is NOT automatic
When rolling back code, the database may have migrations from the newer deployment that are NOT reverted:
# Scenario: Rollback from v2 to v1
v1 deployment:
  - Odoo modules: sale, purchase
  - Database: odoo_schema_v1
v2 deployment (added CRM):
  - Odoo modules: sale, purchase, crm
  - Database: odoo_schema_v2 (crm tables added)
Rollback to v1:
  - Code: ✅ Reverted to v1 (no CRM module)
  - Database: ⚠️ Still has crm tables from v2
  - Result: May cause errors if CRM dependencies exist
Rollback Best Practices
- Test rollback in staging first before production
- Check database migrations before rolling back
- Consider restore from backup if database changes are incompatible
- Monitor logs immediately after rollback
- Have runbook ready for rollback procedures
- Document breaking changes that require special rollback handling
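For the v1/v2 scenario described earlier, the "check database migrations" step amounts to comparing what the database has installed against what the target code ships. The sketch below shows that comparison as a set difference; `modules_left_in_database` is a hypothetical helper, and how the two sets are gathered is left to the operator.

```python
def modules_left_in_database(
    installed_in_db: set[str],
    modules_in_target_code: set[str],
) -> set[str]:
    """Modules the database still knows about but the rollback target lacks."""
    return installed_in_db - modules_in_target_code


# The scenario from this section: v2 added crm, the rollback target is v1.
v2_database = {"sale", "purchase", "crm"}
v1_code = {"sale", "purchase"}
orphaned = modules_left_in_database(v2_database, v1_code)
```

A non-empty result (here `{"crm"}`) suggests restoring a backup alongside the code rollback rather than rolling back code alone.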
Rollback Process
The rollback operation creates a new deployment (not a revert):
# From routes/deployments.py - Line 673
@router.post("/{deployment_id}/rollback")
async def rollback_deployment(...):
    # Validate target
    if target.status != DeploymentStatus.SUCCESS:
        raise HTTPException(400, "Can only rollback to successful deployments")
    # Create new deployment with rollback trigger
    rollback = Deployment(
        project_id=target.project_id,
        trigger=DeploymentTrigger.ROLLBACK,
        version=next_version,
        git_branch=target.git_branch,
        git_commit=target.git_commit,
        rollback_from_id=target.id
    )
    # Deploy using OdooDeployer (same as normal deploy)
    await enqueue_task("deploy_environment", ...)
Rollback with Database Restore
For full rollback including database, use backup restore:
# 1. Find backup from before problematic deployment
GET /api/v1/environments/{env_id}/backups
# 2. Restore backup (includes database + filestore)
POST /api/v1/environments/{env_id}/restore
{
  "backup_id": "backup_from_before_issue"
}
# 3. Then rollback code to match
POST /api/v1/deployments/{old_deployment_id}/rollback
Rollback Safety Checks
Before rollback, the system checks:
- No deployment in progress for the environment
- Target deployment exists and is accessible
- Target is successful deployment (not failed/cancelled)
- User has deploy permission (rollback requires deploy rights)
- Environment is active and not deleted
Permission Requirements
- Required permission: project.environments.deploy
- Rollback requires same permissions as deploy
- Organization scope: Must be member of environment's organization
Manual Redeployment
Force redeployment of the current configuration to apply changes or resolve inconsistencies.
When to Redeploy
- Configuration changes: Environment variables, resource limits
- Dependency updates: apt packages, Python requirements
- DNS changes: New domain or subdomain
- Addon repository changes: Added/removed addon repos
- Container corruption: Broken Odoo container
- After restore: Database restored from backup
Redeploy vs Deploy
| Operation | Behavior | Use Case |
|---|---|---|
| Deploy | Creates new deployment if none running | Initial deployment, after destroy |
| Redeploy | Forces new deployment even if running | Apply config changes, fix issues |
API Endpoint
POST /api/v1/environments/{environment_id}/deploy
Authorization: Bearer <token>
Content-Type: application/json
{
  "force": true
}
Force Flag Behavior
Without force: true:
- Checks if deployment in progress → returns 409 Conflict
- Checks if environment already running → may skip deployment
With force: true:
- Stops existing container gracefully
- Creates new deployment record
- Deploys fresh containers with current configuration
- Preserves database and filestore (no data loss)
Request
# Force redeploy of environment
curl -X POST https://api.oec.sh/api/v1/environments/env_xyz789/deploy \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "force": true,
    "git_branch": "main"
  }'
What Gets Redeployed
| Component | Redeployed | Details |
|---|---|---|
| Docker Container | ✅ Yes | New container with current config |
| Git Code | ✅ Yes | Fresh clone from repository |
| Addon Repositories | ✅ Yes | Re-clones all addon repos |
| odoo.conf | ✅ Yes | Regenerated from current settings |
| Dependencies | ✅ Yes | Reinstalls apt.txt + requirements.txt |
| PostgreSQL | ❌ No | Existing database reused |
| Filestore | ❌ No | Existing filestore preserved |
Configuration Changes Requiring Redeploy
| Setting Changed | Requires Redeploy | Reason |
|---|---|---|
| Environment variables | ✅ Yes | Must regenerate odoo.conf |
| Resource limits (CPU/RAM) | ✅ Yes | Docker container limits |
| Odoo version | ✅ Yes | Different Docker image |
| Addon repositories | ✅ Yes | Must re-clone repos |
| Domain/subdomain | ✅ Yes | Traefik routing update |
| Git branch | ✅ Yes | Switch to different code |
| Database credentials | ⚠️ Maybe | If changed, must redeploy |
| PgBouncer settings | ✅ Yes | PostgreSQL config regenerated |
Redeploy Implementation
# From tasks/worker.py - Line 108
async def redeploy_environment(
    ctx: dict,
    environment_id: str,
    task_id: str | None = None,
    **kwargs
) -> dict:
    # Delegate to deploy with force=True
    return await execute_deploy(
        ctx,
        environment_id,
        task_id=task_id,
        force=True,  # Force redeploy
        **kwargs
    )
Redeploy vs Destroy + Deploy
| Approach | Downtime | Data Loss | Speed |
|---|---|---|---|
| Redeploy | ~2-5 min | None | Fast |
| Destroy + Deploy | ~3-7 min | Database always lost; filestore lost only with delete_data=true | Slow |
Use redeploy when:
- Applying configuration changes
- Updating dependencies
- Fixing container issues
- Preserving data is critical
Use destroy + deploy when:
- Complete cleanup needed
- Database corrupted
- Starting fresh environment
Permission Requirements
- Required permission: project.environments.deploy
- Organization scope: Must be member of environment's organization
Environment Destroy
Completely remove an environment and all associated resources including containers, networks, and volumes.
Destroy Operation
Removes all infrastructure for an environment:
- Odoo container: Stops and removes
- PostgreSQL container: Stops and removes
- PostgreSQL volume: Always removed (required for clean redeploy)
- Docker network: Removes isolated network
- odoo.conf: Deletes configuration file
- Optional: Odoo data directory (addons, filestore)
API Endpoint
DELETE /api/v1/deployments/environment/{environment_id}/destroy?delete_data=false
Authorization: Bearer <token>
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| delete_data | boolean | false | Also delete Odoo data directory (filestore, custom addons) |
Request Examples
# Destroy containers only (preserve filestore and custom addons)
curl -X DELETE https://api.oec.sh/api/v1/deployments/environment/env_xyz789/destroy \
  -H "Authorization: Bearer YOUR_TOKEN"
# Destroy everything including data (URL quoted so the shell does not interpret "?")
curl -X DELETE "https://api.oec.sh/api/v1/deployments/environment/env_xyz789/destroy?delete_data=true" \
  -H "Authorization: Bearer YOUR_TOKEN"
Response
{
  "message": "Environment destroyed successfully",
  "environment_id": "env_xyz789",
  "deleted_containers": true,
  "deleted_database": true,
  "deleted_data": false
}
What Gets Deleted
Always Deleted (Default)
# Docker containers
docker stop {env_id}_odoo
docker rm {env_id}_odoo
docker stop {env_id}_db
docker rm {env_id}_db
# Docker network
docker network rm paasportal_net_{env_id}
# PostgreSQL volume (CRITICAL: Required for fresh DB credentials)
docker volume rm paasportal_pgdata_{env_id}
# Odoo configuration
rm /opt/paasportal/{env_id}/odoo.conf
Conditionally Deleted (delete_data=true)
# Entire environment directory
rm -rf /opt/paasportal/{env_id}/
├── odoo.conf # Config file
├── addons/ # Custom addons (git clones)
├── filestore/ # User uploaded files
└── logs/ # Odoo log files
Why PostgreSQL Volume is Always Removed
Critical Security Requirement: PostgreSQL stores password hashes in the volume. If the volume persists across deployments with different credentials, authentication will fail.
# Problem scenario if volume NOT removed:
1. Deploy environment with password "abc123"
   → PostgreSQL volume created with password hash
2. Destroy environment
3. Redeploy with password "xyz789"
   → New password in odoo.conf
   → Volume still has old password hash
   → Connection fails: FATAL: password authentication failed
Solution: Always remove PostgreSQL volume on destroy to ensure fresh credentials.
Destroy Deployment Record
When you destroy an environment, a special deployment record is created:
# From routes/deployments.py - Line 1187
destroy_record = Deployment(
    project_id=project.id,
    environment_id=environment_id,
    trigger=DeploymentTrigger.DESTROY,
    status=DeploymentStatus.DESTROYED,
    started_at=datetime.now(UTC),
    completed_at=datetime.now(UTC)
)
This provides an audit trail of when the environment was destroyed and by whom.
Environment Status After Destroy
environment.status = EnvironmentStatus.PENDING
environment.container_id = None
environment.container_name = None
environment.is_active = False  # CRITICAL: Releases quota
⚠️ CRITICAL: Setting is_active = False releases resource quota. Without this, the organization's quota remains allocated even after containers are destroyed.
Quota Implications
# From services/quota_service.py
def _get_total_resources(org_id):
    # Only count active environments
    result = db.execute(
        select(func.sum(ProjectEnvironment.cpu_cores))
        .where(
            ProjectEnvironment.organization_id == org_id,
            ProjectEnvironment.is_active == True  # ← Must be set to False on destroy
        )
    )
If is_active is not set to False:
- Quota remains allocated
- Cannot deploy new environments (quota exceeded)
- Organization appears over quota despite no running containers
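The effect of the `is_active` filter can be seen with a plain in-memory version of the same sum. This is a sketch of the idea rather than the quota service itself: `Env` and `allocated_cpu` are hypothetical names standing in for the database model and query.

```python
from dataclasses import dataclass


@dataclass
class Env:
    # Hypothetical stand-in for a ProjectEnvironment row.
    organization_id: str
    cpu_cores: int
    is_active: bool


def allocated_cpu(envs: list[Env], org_id: str) -> int:
    """Sum CPU cores over the organization's active environments only."""
    return sum(
        e.cpu_cores
        for e in envs
        if e.organization_id == org_id and e.is_active
    )


envs = [
    Env("org1", 2, True),
    Env("org1", 4, False),  # destroyed: is_active was set to False
    Env("org2", 8, True),
]
```

Here `allocated_cpu(envs, "org1")` counts only the 2 cores of the active environment. Had the destroyed environment kept `is_active = True`, its 4 cores would still count against the quota.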
Destroy vs Pause
| Operation | Containers | Data | Quota | Restart |
|---|---|---|---|---|
| Pause | Stopped | Preserved | Still allocated | Can restart anytime |
| Destroy | Removed | Optional preserve | Released | Must redeploy |
Permission Requirements
- Required permission: project.environments.deploy
- Destroy requires deploy permission (destructive operation)
- Organization scope: Must be member of environment's organization
Deployment Triggers
Deployments can be triggered through multiple methods, each tracked separately.
Trigger Types
# From models/deployment.py
class DeploymentTrigger(enum.Enum):
    MANUAL = "manual"                # User clicked deploy button
    GIT_PUSH = "git_push"            # Git webhook (GitHub/GitLab push)
    WEBHOOK = "webhook"              # Generic webhook trigger
    SCHEDULED = "scheduled"          # Scheduled deployment (future)
    ROLLBACK = "rollback"            # Rollback operation
    AUTO_DEPLOY = "auto_deploy"      # Auto-deploy on branch update
    DESTROY = "destroy"              # Environment destroyed
    CONFIG_UPDATE = "config_update"  # Configuration changed
Manual Deployment
Triggered by user via dashboard or API:
POST /api/v1/deployments
{
  "project_id": "proj_abc123",
  "environment_id": "env_xyz789",
  "git_branch": "main"
}
Properties:
- Tracked to specific user (triggered_by field)
- Immediate execution
- Full deployment logs visible in dashboard
Webhook-Triggered Deployment
Automatic deployment on git push via GitHub/GitLab webhooks.
GitHub Webhook Setup
# Webhook URL
POST https://api.oec.sh/api/v1/webhooks/github
# Headers
X-Hub-Signature-256: sha256=<hmac_signature>
X-GitHub-Event: push
X-GitHub-Delivery: <delivery_id>
# Payload
{
  "ref": "refs/heads/main",
  "repository": {
    "full_name": "your-org/your-repo"
  },
  "commits": [...]
}
Webhook Security
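# From routes/webhooks.py - Line 53
import hashlib
import hmac


def verify_github_signature(payload: bytes, signature: str, secret: str) -> bool:
    # Recompute the HMAC-SHA256 of the raw request body with the webhook secret
    expected = "sha256=" + hmac.new(
        secret.encode("utf-8"),
        payload,
        hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(expected, signature)
Security requirements: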
- HMAC-SHA256 signature validation
- Per-project webhook secret
- Rejects unsigned requests
- Rate limited: 30 requests/minute
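To see the signature check end to end, the sketch below signs a payload the way a sender would and then verifies it. `sign_payload` is a hypothetical helper added for illustration, and the verifier is repeated so the block runs on its own. Unlike the excerpt above, this version compares bytes, because `hmac.compare_digest` raises `TypeError` for non-ASCII `str` input.

```python
import hashlib
import hmac


def sign_payload(payload: bytes, secret: str) -> str:
    """Compute the value a sender would put in X-Hub-Signature-256."""
    digest = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify_github_signature(payload: bytes, signature: str, secret: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_payload(payload, secret)
    # Compare as bytes so a malformed header cannot trigger a TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


body = b'{"ref": "refs/heads/main"}'
header = sign_payload(body, "per-project-secret")
```

The check must run against the raw request bytes. Re-serializing parsed JSON can change whitespace or key order and invalidate the signature.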
Auto-Deploy Configuration
Enable auto-deploy in project settings:
PATCH /api/v1/projects/{project_id}/environments/{env_id}
{
  "auto_deploy": true,
  "git_branch": "main"
}
When auto_deploy = true:
- Push to git_branch triggers deployment
- Creates deployment with trigger = "git_push"
- triggered_by is null (webhook, not user)
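Matching a push to an environment comes down to extracting the branch from the webhook's `ref` field and comparing it with the configured `git_branch`. The following is a minimal sketch of that check: `branch_from_ref` and `should_auto_deploy` are hypothetical names, and the actual matching logic in the platform may differ.

```python
from typing import Optional

HEADS_PREFIX = "refs/heads/"


def branch_from_ref(ref: str) -> Optional[str]:
    """Return the branch name for a branch ref, or None for tags and other refs."""
    if ref.startswith(HEADS_PREFIX):
        return ref[len(HEADS_PREFIX):]
    return None


def should_auto_deploy(ref: str, auto_deploy: bool, git_branch: str) -> bool:
    """True when auto-deploy is enabled and the push targets the configured branch."""
    return bool(auto_deploy) and branch_from_ref(ref) == git_branch
```

Stripping only the `refs/heads/` prefix keeps branch names that contain slashes, such as `feature/login`, intact.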
Module Repository Auto-Deploy
Webhook also triggers addon repository deployments:
# From routes/webhooks.py - Line 100
async def _trigger_module_repo_deploys(repo_url, branch):
    # Find all module repos with:
    # - Same git_url
    # - Same git_branch
    # - auto_deploy = true
    # - status = ACTIVE
    for repo in matching_repos:
        # Queue deployment for each environment
        await enqueue_task("deploy_module_repository", ...)
Scheduled Deployments (Future)
Planned feature: Deploy at specific times
{
  "schedule": {
    "cron": "0 2 * * *",
    "timezone": "UTC",
    "enabled": true
  }
}
Use cases:
- Deploy during maintenance windows
- Off-peak deployments
- Coordinated multi-environment updates
Deployment Queue
OEC.SH uses ARQ (Async Redis Queue) for background task processing. This section covers queue configuration, concurrency limits, timeouts, retries, and the dead letter queue.
Queue Architecture
Queue Configuration
# From config.py - Line 472
class QueueSettings(BaseSettings):
    max_jobs: int = 10              # Max concurrent jobs
    job_timeout: int = 600          # 10 minutes per job
    max_tries: int = 3              # Retry failed jobs 3 times
    queue_name: str = "paasportal:arq:queue"
    deployment_timeout: int = 1800  # 30 min deployment timeout
Concurrent Deployment Limits
System-wide: 10 concurrent jobs across all organizations
Per-organization: No hard limit, but rate limited:
- Deployments: 10/minute
- Heavy operations: 10/minute
Per-environment: 1 deployment at a time:
# From routes/deployments.py - Line 113
# Check for ongoing deployments
result = await db.execute(
    select(Deployment).where(
        Deployment.environment_id == environment_id,
        Deployment.status.in_([
            DeploymentStatus.PENDING,
            DeploymentStatus.RUNNING
        ])
    )
)
if result.scalar_one_or_none():
    raise HTTPException(409, "Deployment already in progress")
Queue Priority
FIFO (First In, First Out) - no priority system currently.
All deployments processed in order of enqueue time:
# From tasks/worker.py - Line 78
async def deploy_environment(ctx, environment_id, task_id, **kwargs):
    # Jobs processed in order queued
    logger.info(f"Starting deployment {environment_id}")
Future priority system may include:
- Production deployments first
- Paid tier customers first
- Critical hotfix deployments
Deployment Timeout Handling
Stuck deployment detection: ARQ cron job runs every 5 minutes
# From tasks/worker.py - Line 457
async def check_stuck_deployments(ctx: dict) -> None:
    # Find deployments stuck for > 30 minutes
    cutoff_time = datetime.now(UTC) - timedelta(seconds=1800)
    stuck = db.execute(
        select(Deployment).where(
            Deployment.status.in_([PENDING, RUNNING]),
            Deployment.started_at < cutoff_time
        )
    )
    for deployment in stuck:
        deployment.status = DeploymentStatus.FAILED
        deployment.error_message = "Deployment timed out after 30 minutes"
        environment.status = EnvironmentStatus.ERROR
Timeout behaviors:
- Deployment marked as FAILED
- Environment status → ERROR
- Error message logged
- No automatic retry (manual retry available)
Task Retry Logic
Failed tasks automatically retry with exponential backoff:
# Retry schedule (from config)
retry_base_delay: 5.0 seconds
retry_max_delay: 300.0 seconds (5 minutes)
# Actual retries
Attempt 1: Immediate
Attempt 2: 5 seconds later
Attempt 3: 15 seconds later (exponential backoff)
After 3 failed attempts, the task moves to the Dead Letter Queue (DLQ).
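The schedule above is consistent with a delay that starts at `retry_base_delay`, triples on each further attempt, and is capped at `retry_max_delay`. The sketch below reproduces those numbers. The growth factor of 3 is inferred from the 5s → 15s → 45s sequence rather than taken from the configuration, and `retry_delay` is a hypothetical function name.

```python
def retry_delay(
    attempt: int,
    base_delay: float = 5.0,   # retry_base_delay
    max_delay: float = 300.0,  # retry_max_delay
    factor: float = 3.0,       # assumption: inferred from 5s -> 15s -> 45s
) -> float:
    """Seconds to wait before `attempt` (1-based); the first attempt is immediate."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt == 1:
        return 0.0
    # Attempt 2 waits base_delay, attempt 3 waits base_delay * factor, and so on.
    return min(base_delay * factor ** (attempt - 2), max_delay)


schedule = [retry_delay(n) for n in range(1, 5)]
```

With the defaults, the cap only matters from the sixth attempt onward, which the configured limit of 3 tries never reaches.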
Dead Letter Queue (DLQ)
Failed tasks after max retries are stored for manual inspection:
# DLQ storage
dlq_key = f"paasportal:dlq:{job_id}"
redis.hset(dlq_key, {
    "function_name": "deploy_environment",
    "args": [...],
    "error": "SSH connection timeout",
    "retry_count": 3,
    "traceback": "..."
})
Accessing DLQ:
# Future API endpoint (planned)
GET /api/v1/admin/dead-letter-queue
Deployment Artifacts
Each deployment creates and stores artifacts for rollback and audit purposes.
Container Images
Docker images pulled from registry:
# From models/deployment.py
deployment.image_tag = "odoo:17.0" # Full image URL
Image sources:
- Official Odoo images: odoo:17.0, odoo:18.0
- Custom registry images: Configured in OdooVersion model
- With authentication: Username/password encrypted
Configuration Snapshots
Each deployment captures full configuration:
deployment.extra_data = {
    "git": {
        "committer_name": "John Doe",
        "committer_email": "john@example.com",
        "commit_date": "2025-01-15T10:00:00Z"
    },
    "config": {
        "cpu_cores": 2,
        "ram_mb": 4096,
        "disk_gb": 20,
        "workers": 4
    },
    "addons": [
        {"name": "platform-addons", "branch": "17.0"},
        {"name": "org-addons", "branch": "main"}
    ]
}
Git Commit Information
Captured from git repository during deployment:
# From services/odoo_deployer.py - Line 453
git_info = {
    "git_commit": "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678",  # Full SHA (40 hex characters)
    "git_message": "Fix authentication bug",
    "git_branch": "main",
    "committer_name": "Jane Smith",
    "committer_email": "jane@example.com",
    "commit_date": "2025-01-15T14:23:45Z"
}
Display in UI:
Deployment #12 - main@a1b2c3d
"Fix authentication bug"
by Jane Smith on Jan 15, 2025 at 2:23 PM
Database Dumps (for Rollback)
Database dumps are NOT created automatically on each deployment.
For a rollback that includes the database, use the backup system:
# Create backup before risky deployment
POST /api/v1/environments/{env_id}/backups
{
  "backup_type": "manual",
  "comment": "Before v2.0 deployment"
}
# If deployment fails, restore backup
POST /api/v1/environments/{env_id}/restore
{
  "backup_id": "backup_xyz"
}
Artifact Retention
| Artifact Type | Retention | Cleanup |
|---|---|---|
| Deployment records | Forever | Manual deletion only |
| Deployment logs | 90 days | Automatic cleanup |
| Container images | On server until destroy | Docker prune |
| Git clones | Until destroy | Remains in /opt/paasportal/ |
| Backup files | Per retention policy | GFS (Grandfather-Father-Son) |
Health Checks
Post-deployment health validation ensures the environment is accessible and functioning.
Health Check Process
# From services/odoo_deployer.py - Line 598
result.step = DeploymentStep.HEALTH_CHECK
healthy = await self._health_check(config)
if healthy:
    result.add_log("Health check passed!")
else:
    result.add_log("Warning: Health check failed, but container is running")
Health Check Types
1. Container Health Check
Verifies container is running:
docker inspect {container_name} --format='{{.State.Health.Status}}'
# Expected: "healthy" (reported only when the image defines a HEALTHCHECK; {{.State.Status}} reports "running")
2. HTTP Health Check
Attempts to connect to Odoo web interface:
async def _health_check(config):
    url = f"https://{config.subdomain}.{config.apps_domain}"
    try:
        response = await http.get(url, timeout=30)
        return response.status_code == 200
    except Exception as e:
        logger.warning(f"Health check failed: {e}")
        return False
3. Database Connection Check
Verifies PostgreSQL connectivity:
docker exec {db_container} pg_isready -U odoo
# Expected: "accepting connections"
Health Check Endpoints
Configurable health check endpoints (future feature):
{
  "health_checks": [
    {
      "type": "http",
      "path": "/web/health",
      "expected_status": 200,
      "timeout": 10
    },
    {
      "type": "tcp",
      "port": 8069,
      "timeout": 5
    }
  ]
}
Health Check Failure Handling
Current behavior (non-fatal):
- Health check fails → Warning logged
- Deployment continues → Status = SUCCESS
- Environment status → RUNNING
- Container remains running
Future behavior (optional automatic rollback):
{
  "health_check_policy": {
    "enabled": true,
    "rollback_on_failure": true,
    "retries": 3,
    "retry_interval": 30
  }
}
Post-Deployment Validation
After health checks pass, additional validations run:
- DNS resolution check: Verify domain resolves to correct IP
- SSL certificate check: Ensure HTTPS working (if configured)
- Odoo database accessibility: Can connect to database
- Addon loading: All modules loaded successfully
Deployment Logs
Every deployment step is logged as a structured entry stored in PostgreSQL.
Log Structure
# From models/deployment.py - Line 175
class DeploymentLog(Base):
    id: UUID
    deployment_id: UUID
    timestamp: datetime
    level: LogLevel  # DEBUG, INFO, WARNING, ERROR
    message: str
    step: str        # Deployment step identifier
    data: dict       # Additional structured data
Log Levels
class LogLevel(enum.Enum):
    DEBUG = "debug"      # Verbose debugging info
    INFO = "info"        # Normal operational messages
    WARNING = "warning"  # Warning but not fatal
    ERROR = "error"      # Fatal errors causing failure
Deployment Steps Logged
# From services/odoo_deployer.py - Line 49
DEPLOYMENT_STEPS = [
    "initializing",             # Preparing deployment config
    "connecting",               # SSH connection to server
    "creating_network",         # Docker network creation
    "configuring_dns",          # DNS record creation
    "creating_postgres",        # PostgreSQL container
    "cloning_platform_repos",   # Platform addon repos
    "cloning_org_repos",        # Organization addon repos
    "cloning_repo",             # Project primary repository
    "pulling_image",            # Odoo Docker image
    "generating_config",        # odoo.conf generation
    "starting_container",       # Odoo container start
    "installing_dependencies",  # apt.txt + requirements.txt
    "initializing_database",    # Database initialization
    "verifying_dns",            # DNS propagation check
    "configuring_traefik",      # Traefik routing setup
    "health_check",             # Post-deployment health check
    "completed"                 # Deployment success
]
Log API Endpoints
Get Deployment Logs
GET /api/v1/deployments/{deployment_id}/logs?level=error&skip=0&limit=500
Authorization: Bearer <token>
Query parameters:
- level (optional): Filter by log level (debug, info, warning, error)
- skip (optional): Pagination offset (default: 0)
- limit (optional): Max logs to return (default: 500, max: 500)
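A client can assemble the query string from these three parameters and clamp `limit` to the documented maximum before sending. The sketch below builds the URL only. `logs_url` is a hypothetical helper, the base URL is taken from the curl examples elsewhere in this document, and client-side clamping is a convenience rather than documented API behavior.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.oec.sh/api/v1"  # base URL as used in the curl examples
MAX_LIMIT = 500                         # documented maximum for `limit`
VALID_LEVELS = {"debug", "info", "warning", "error"}


def logs_url(
    deployment_id: str,
    level: Optional[str] = None,
    skip: int = 0,
    limit: int = 500,
) -> str:
    """Build the logs URL, validating `level` and clamping `skip` and `limit`."""
    if level is not None and level not in VALID_LEVELS:
        raise ValueError(f"unknown log level: {level!r}")
    params = {}
    if level is not None:
        params["level"] = level
    params["skip"] = max(skip, 0)
    params["limit"] = min(max(limit, 1), MAX_LIMIT)
    return f"{API_BASE}/deployments/{deployment_id}/logs?{urlencode(params)}"
```

To page through a long log, increase `skip` by `limit` on each request until a response returns fewer than `limit` entries.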
Response:
{
  "deployment_id": "abc123",
  "logs": [
    {
      "id": "log_001",
      "level": "info",
      "message": "Connecting to 192.168.1.100...",
      "timestamp": "2025-01-15T10:30:05Z",
      "data": {
        "vm_ip": "192.168.1.100",
        "ssh_port": 22
      }
    },
    {
      "id": "log_002",
      "level": "info",
      "message": "SSH connection established (1.2s)",
      "timestamp": "2025-01-15T10:30:06Z",
      "data": {
        "duration_seconds": 1.2
      }
    }
  ]
}
Log Examples
Successful Deployment
[
{"level": "info", "message": "Initializing deployment..."},
{"level": "info", "message": "Connecting to 192.168.1.100..."},
{"level": "info", "message": "SSH connection established (1.2s)"},
{"level": "info", "message": "Creating network paasportal_net_abc123..."},
{"level": "info", "message": "Network created (0.3s)"},
{"level": "info", "message": "Configuring DNS record (early for propagation)..."},
{"level": "info", "message": "DNS record created: env1.apps.oec.sh -> 192.168.1.100 (0.8s)"},
{"level": "info", "message": "Creating PostgreSQL container abc123_db..."},
{"level": "info", "message": "PostgreSQL container running (3.2s)"},
{"level": "info", "message": "Cloning 2 platform addon repositories..."},
{"level": "info", "message": "[1/2] Cloning platform-addons (17.0)..."},
{"level": "info", "message": "[1/2] platform-addons ready (2.1s)"},
{"level": "info", "message": "Pulling Odoo 17.0 image..."},
{"level": "info", "message": "Image ready (8.3s)"},
{"level": "info", "message": "Starting Odoo container abc123_odoo..."},
{"level": "info", "message": "Container started: 9f8e7d6c5b4a (2.7s)"},
{"level": "info", "message": "Initializing Odoo database..."},
{"level": "info", "message": "Database initialization complete (45.2s)"},
{"level": "info", "message": "Performing health check..."},
{"level": "info", "message": "Health check passed! (3.1s)"},
{"level": "info", "message": "Deployment completed! Total: 68.9s (1.1 min)"}
]
Failed Deployment
[
{"level": "info", "message": "Initializing deployment..."},
{"level": "info", "message": "Connecting to 192.168.1.100..."},
{"level": "error", "message": "SSH connection timeout after 30 seconds"},
{"level": "error", "message": "Deployment failed: Failed to connect to server via SSH"}
]
Real-Time Log Streaming
SSE (Server-Sent Events) for real-time log updates:
// Frontend: Subscribe to deployment logs
const eventSource = new EventSource(
`/api/v1/events?organizationId=${orgId}`
);
eventSource.addEventListener('deployment.log', (event) => {
  const log = JSON.parse(event.data);
  console.log(`[${log.level}] ${log.message}`);
});
SSE event format:
{
"event": "deployment.log",
"data": {
"deployment_id": "abc123",
"level": "info",
"message": "Container started: 9f8e7d6c5b4a (2.7s)",
"step": "starting_container",
"timestamp": "2025-01-15T10:35:22Z"
}
}
Log Retention
- Storage: PostgreSQL deployment_logs table
- Retention: Logs retained for lifetime of deployment record
- Cleanup: Manual deletion only (no automatic cleanup currently)
- Size: Indexed by deployment_id + timestamp for efficient queries
Permissions
All deployment operations are protected by OEC.SH's Permission Matrix system (Sprint 2E21).
Required Permissions
| Operation | Permission Code | Description |
|---|---|---|
| Deploy | project.environments.deploy | Create new deployment |
| Redeploy | project.environments.deploy | Force redeploy existing |
| Retry | project.environments.deploy | Retry failed deployment |
| Rollback | project.environments.deploy | Rollback to previous |
| Cancel | project.deployments.cancel | Cancel in-progress deployment |
| View Deployments | project.deployments.list | List deployment history |
| View Logs | project.deployments.view | View deployment logs |
| Destroy | project.environments.deploy | Destroy environment |
Permission Hierarchy
Portal Admin (55+ permissions)
└─ Organization Owner (40+ permissions)
   └─ Organization Admin (30+ permissions)
      └─ Project Admin (20+ permissions)
         └─ Project Member (10+ permissions)
System Roles
| Role | Deploy | Rollback | Cancel | View Logs | Destroy |
|---|---|---|---|---|---|
| Portal Admin | ✅ | ✅ | ✅ | ✅ | ✅ |
| Org Owner | ✅ | ✅ | ✅ | ✅ | ✅ |
| Org Admin | ✅ | ✅ | ✅ | ✅ | ✅ |
| Org Member | ❌ | ❌ | ❌ | ✅ | ❌ |
| Project Admin | ✅ | ✅ | ✅ | ✅ | ✅ |
| Project Member | ❌ | ❌ | ❌ | ✅ | ❌ |
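For client-side use (for example, hiding buttons a user cannot use), the System Roles table can be transcribed into a lookup. This is only an illustrative sketch: the role and operation keys below are hypothetical identifiers, and the platform itself resolves access through `check_permission` and permission codes, not through a table like this.

```python
# Hypothetical transcription of the System Roles table above.
# Role and operation names are illustrative, not platform identifiers.
ALL_OPERATIONS = frozenset({"deploy", "rollback", "cancel", "view_logs", "destroy"})
VIEW_ONLY = frozenset({"view_logs"})

ROLE_OPERATIONS = {
    "portal_admin": ALL_OPERATIONS,
    "org_owner": ALL_OPERATIONS,
    "org_admin": ALL_OPERATIONS,
    "org_member": VIEW_ONLY,
    "project_admin": ALL_OPERATIONS,
    "project_member": VIEW_ONLY,
}


def role_allows(role: str, operation: str) -> bool:
    """Return True if the table grants `operation` to `role`.

    Unknown roles are denied everything.
    """
    return operation in ROLE_OPERATIONS.get(role, frozenset())
```

A lookup like this is useful only as a UI hint; the server-side permission check remains the source of truth.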
Permission Checks
# From routes/deployments.py - Line 66
@router.post("", response_model=DeploymentResponse)
async def create_deployment(...):
    # Check permission
    has_permission = await check_permission(
        db=db,
        user=current_user,
        permission_code="project.environments.deploy",
        organization_id=project.organization_id,
        project_id=project.id,
    )
    if not has_permission:
        raise HTTPException(403, "You don't have permission to deploy.")
Production Environment Protection
Additional, configurable protection for production environments is planned as a future feature:
# Future feature
if environment.type == "production":
    # Require special permission for production deploy
    has_prod_permission = await check_permission(
        db, user,
        "project.production.deploy",
        organization_id, project_id
    )
    if not has_prod_permission:
        raise HTTPException(403, "Production deployment requires special permission")
API Reference
Complete API reference for all deployment operations.
Create Deployment
Endpoint: POST /api/v1/deployments
Request:
{
"project_id": "proj_abc123",
"environment_id": "env_xyz789",
"git_branch": "main",
"git_commit": "a1b2c3d4e5f6"
}
Response: 201 Created
{
"id": "deploy_001",
"project_id": "proj_abc123",
"environment_id": "env_xyz789",
"version": 1,
"status": "pending",
"trigger": "manual",
"triggered_by": "user_123",
"git_branch": "main",
"git_commit": "a1b2c3d4e5f6",
"created_at": "2025-01-15T10:30:00Z"
}
Rate Limit: 10 requests/minute
List Deployments
Endpoint: GET /api/v1/deployments
Query Parameters:
- project_id (optional): Filter by project
- status (optional): Filter by status (pending, success, failed, etc.)
- skip (optional): Pagination offset (default: 0)
- limit (optional): Results per page (default: 50, max: 100)
Response: 200 OK
[
{
"id": "deploy_001",
"project_id": "proj_abc123",
"environment_id": "env_xyz789",
"version": 5,
"status": "success",
"trigger": "manual",
"git_commit": "a1b2c3d",
"git_message": "Fix authentication bug",
"started_at": "2025-01-15T10:30:00Z",
"completed_at": "2025-01-15T10:32:15Z",
"duration_seconds": 135.2
}
]
Get Deployment
Endpoint: GET /api/v1/deployments/{deployment_id}
Response: 200 OK
{
"id": "deploy_001",
"project_id": "proj_abc123",
"environment_id": "env_xyz789",
"vm_id": "vm_001",
"version": 5,
"status": "success",
"trigger": "manual",
"triggered_by": "user_123",
"git_commit": "a1b2c3d4e5f6",
"git_branch": "main",
"git_message": "Fix authentication bug",
"started_at": "2025-01-15T10:30:00Z",
"completed_at": "2025-01-15T10:32:15Z",
"duration_seconds": 135.2,
"container_id": "9f8e7d6c5b4a",
"image_tag": "odoo:17.0",
"extra_data": {
"git": {
"committer_name": "Jane Smith",
"committer_email": "jane@example.com",
"commit_date": "2025-01-15T09:00:00Z"
}
}
}
Get Deployment Logs
Endpoint: GET /api/v1/deployments/{deployment_id}/logs
Query Parameters:
- level (optional): Filter by log level (debug, info, warning, error)
- skip (optional): Pagination offset (default: 0)
- limit (optional): Max logs (default: 500, max: 500)
Response: 200 OK
{
"deployment_id": "deploy_001",
"logs": [
{
"id": "log_001",
"level": "info",
"message": "Connecting to 192.168.1.100...",
"timestamp": "2025-01-15T10:30:05Z",
"data": {"vm_ip": "192.168.1.100"}
}
]
}
Retry Deployment
Endpoint: POST /api/v1/deployments/{deployment_id}/retry
Response: 200 OK
{
"id": "deploy_new",
"project_id": "proj_abc123",
"version": 6,
"status": "pending",
"trigger": "manual",
"git_branch": "main",
"git_commit": "a1b2c3d",
"extra_data": {
"retry_of": "deploy_001"
}
}
Error Responses:
- 400 Bad Request: Can only retry failed deployments
- 404 Not Found: Deployment not found
- 409 Conflict: Deployment already in progress
Rollback Deployment
Endpoint: POST /api/v1/deployments/{deployment_id}/rollback
Response: 200 OK
{
"id": "deploy_rollback",
"project_id": "proj_abc123",
"version": 7,
"status": "pending",
"trigger": "rollback",
"git_branch": "main",
"git_commit": "previous_commit",
"rollback_from_id": "deploy_005",
"extra_data": {
"rollback_to": "deploy_001"
}
}
Error Responses:
- 400 Bad Request: Can only rollback to successful deployments
- 404 Not Found: Deployment not found
- 409 Conflict: Deployment already in progress
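Retry and rollback return the same set of status codes, so a client can handle both with one helper. The sketch below maps each documented code to a suggested next step. The `NextAction` names and the suggested reactions are assumptions made for illustration; only the status codes themselves come from this reference.

```python
from enum import Enum


class NextAction(Enum):
    """Hypothetical client-side reactions to a retry or rollback response."""
    PROCEED = "proceed"          # 200: a new deployment record was created
    WAIT = "wait"                # 409: another deployment is already in progress
    FIX_REQUEST = "fix_request"  # 400 or 404: the target deployment is not usable


def next_action(status_code: int) -> NextAction:
    """Map a documented retry/rollback status code to a client reaction."""
    if status_code == 200:
        return NextAction.PROCEED
    if status_code == 409:
        return NextAction.WAIT
    if status_code in (400, 404):
        return NextAction.FIX_REQUEST
    # Anything else is outside the documented contract for these endpoints.
    raise ValueError(f"Unexpected status code: {status_code}")
```

Treating 409 as "wait" reflects that the conflict clears once the in-progress deployment reaches a terminal state, whereas 400 and 404 will not succeed without choosing a different deployment ID.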
Cancel Deployment
Endpoint: POST /api/v1/deployments/{deployment_id}/cancel
Response: 200 OK
{
"message": "Deployment cancelled"
}
Error Responses:
- 400 Bad Request: Cannot cancel deployment in {status} state
- 404 Not Found: Deployment not found
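The 400 response implies that only in-progress deployments can be cancelled, so a client may want to check the status before calling the endpoint. The sketch below assumes the cancellable set is pending, queued, building, and deploying, taken from the non-terminal values of the DeploymentStatus enum. The server's exact rule is not shown in this document, so treat the set as a guess.

```python
# Assumption: statuses treated as in-progress, based on the DeploymentStatus
# enum. The server's actual cancellation rule may differ.
CANCELLABLE_STATUSES = frozenset({"pending", "queued", "building", "deploying"})


def can_cancel(status: str) -> bool:
    """Return True if a cancel request is expected to be accepted."""
    return status.lower() in CANCELLABLE_STATUSES


def cancel_path(deployment_id: str) -> str:
    """Build the documented cancel endpoint path for a deployment."""
    return f"/api/v1/deployments/{deployment_id}/cancel"
```

Even with this pre-check, the deployment can finish between the status read and the cancel call, so the 400 response still needs handling.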
Destroy Environment
Endpoint: DELETE /api/v1/deployments/environment/{environment_id}/destroy
Query Parameters:
- delete_data (optional): Also delete Odoo data directory (default: false)
Response: 200 OK
{
"message": "Environment destroyed successfully",
"environment_id": "env_xyz789",
"deleted_containers": true,
"deleted_database": true,
"deleted_data": false
}
Get Deployment Progress
Endpoint: GET /api/v1/deployments/{deployment_id}/progress
Response: 200 OK
{
"deployment_id": "deploy_001",
"status": "deploying",
"progress_percent": 65,
"steps": [
{
"id": "initializing",
"name": "Initializing",
"description": "Preparing deployment configuration",
"status": "completed",
"logs": [
{"message": "Initializing deployment...", "timestamp": "2025-01-15T10:30:00Z"}
]
},
{
"id": "connecting",
"name": "Connecting",
"description": "Connecting to server via SSH",
"status": "completed",
"logs": [
{"message": "SSH connection established (1.2s)", "timestamp": "2025-01-15T10:30:01Z"}
]
},
{
"id": "pulling_image",
"name": "Pulling Image",
"description": "Downloading Odoo Docker image",
"status": "running",
"logs": []
},
{
"id": "starting_container",
"name": "Starting Container",
"status": "pending"
}
],
"current_step": "pulling_image",
"started_at": "2025-01-15T10:30:00Z"
}
Best Practices
Guidelines for safe and effective deployment operations.
When to Retry vs Redeploy
Retry when:
- Transient network errors (SSH timeout, git clone failed)
- Docker registry rate limits
- Temporary server resource exhaustion
- No configuration changes needed
Redeploy when:
- Configuration changed (env vars, resource limits)
- Dependencies updated (apt packages, pip requirements)
- DNS or domain changed
- Need to apply new settings
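The two lists above can be read as a small decision rule: any change to configuration or dependencies calls for a redeploy, and a retry is appropriate only when the failure was transient and nothing changed. A minimal sketch of that rule follows. The `FailureContext` fields and the "investigate" fallback are hypothetical names introduced here, not part of the platform.

```python
from dataclasses import dataclass


@dataclass
class FailureContext:
    """Hypothetical summary of what is known about a failed deployment."""
    transient_error: bool        # e.g. SSH timeout, git clone failure, registry rate limit
    config_changed: bool         # env vars, resource limits, DNS or domain
    dependencies_changed: bool   # apt packages, pip requirements


def choose_operation(ctx: FailureContext) -> str:
    """Pick "redeploy", "retry", or "investigate" from the guidance above."""
    # A retry reuses the same configuration, so any change requires a redeploy.
    if ctx.config_changed or ctx.dependencies_changed:
        return "redeploy"
    if ctx.transient_error:
        return "retry"
    # Neither transient nor caused by a pending change: read the logs first.
    return "investigate"
```

Checking for changes first matters because a retry would otherwise rerun the old configuration and silently ignore the new settings.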
Rollback Safety Considerations
- Check database migrations: Ensure rollback target's database schema is compatible
- Test in staging first: Always test rollback in non-production environment
- Consider backup restore: For full rollback including database, use backup restore + code rollback
- Document breaking changes: Maintain runbook of changes that prevent rollback
- Monitor after rollback: Watch logs and metrics after rollback completes
Testing Before Production Deployment
- Use staging environment: Deploy to staging first
- Run automated tests: Execute test suite after deployment
- Manual QA: Verify critical workflows
- Performance testing: Check response times and load
- Rollback drill: Practice rollback procedure before production deploy
Deployment Runbook Template
# Deployment Runbook: [Feature Name]
## Pre-Deployment
- [ ] Code review completed
- [ ] Tests passing in CI/CD
- [ ] Database migrations reviewed
- [ ] Backup created (ID: _________)
- [ ] Staging deployment successful
- [ ] Rollback procedure documented
## Deployment Steps
1. Deploy to production via dashboard
2. Monitor deployment logs for errors
3. Verify health checks pass
4. Run smoke tests
5. Monitor error rates for 15 minutes
## Rollback Plan
**If deployment fails**:
1. Cancel deployment if in progress
2. Check error logs: /api/v1/deployments/{id}/logs
3. Rollback to deployment ID: _________
4. If rollback fails, restore backup: _________
## Breaking Changes
- Database migration: Adds `new_column` (compatible with v1)
- API endpoint changed: `/v1/old` → `/v2/new` (v1 still supported)
## Success Criteria
- [ ] All health checks pass
- [ ] Error rate < 0.1%
- [ ] Response time < 500ms p95
- [ ] No user-reported issues in first 30 min
Troubleshooting
Common issues and solutions for deployment operations.
Stuck Deployments
Symptom: Deployment status stuck at PENDING or DEPLOYING for > 30 minutes
Diagnosis:
# Check deployment logs
GET /api/v1/deployments/{id}/logs?level=error
# Check ARQ worker status
# (Admin only - via SSH to backend server)
docker logs paasportal_worker
Solutions:
- Wait for automatic timeout: System automatically fails stuck deployments after 30 minutes
- Cancel deployment: POST /api/v1/deployments/{id}/cancel
- Redeploy: After cancellation, trigger a new deployment from the dashboard (the retry endpoint accepts only failed deployments)
- Check server resources: SSH to server, check docker ps, df -h, free -m
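A monitoring script can flag stuck deployments using the same 30-minute threshold. The sketch below is a client-side check only, assuming timestamps arrive in the ISO 8601 format shown in the API responses. The function name and the choice of which timestamp field to pass in (for example `started_at` or `created_at`) are assumptions.

```python
from datetime import datetime, timedelta

STUCK_AFTER = timedelta(minutes=30)           # threshold from the symptom above
IN_PROGRESS_STATUSES = {"pending", "deploying"}


def is_stuck(status: str, since: str, now: datetime) -> bool:
    """Return True if a deployment has been in progress for over 30 minutes.

    `since` is an ISO 8601 timestamp such as "2025-01-15T10:30:00Z".
    `now` must be timezone-aware so the subtraction is valid.
    """
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    started = datetime.fromisoformat(since.replace("Z", "+00:00"))
    return status in IN_PROGRESS_STATUSES and now - started > STUCK_AFTER
```

Passing `now` as a parameter keeps the function deterministic and easy to test; callers would normally supply `datetime.now(timezone.utc)`.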
Failed Retries
Symptom: Retry fails with same error repeatedly
Diagnosis:
# Get detailed error message
GET /api/v1/deployments/{id}
# Common error patterns
"SSH connection timeout" → Server unreachable
"Git clone failed" → Invalid credentials or repo URL
"Docker pull failed" → Registry authentication issue
"PostgreSQL connection failed" → Database container not starting
Solutions by Error Type:
SSH Connection Timeout:
# Check VM is reachable
ping {vm_ip}
# Check SSH port open
nc -zv {vm_ip} 22
# Update VM SSH credentials if changed
PATCH /api/v1/vms/{vm_id}
Git Clone Failed:
# Verify repository URL
curl -I {git_repo_url}
# Check git credentials
# - For GitHub: OAuth token in user profile
# - For private repos: Ensure git connection configured
GET /api/v1/organizations/{org_id}/git-connections
Docker Pull Failed:
# Check Docker registry credentials
# - For custom registries: Update OdooVersion config
GET /api/v1/admin/odoo-versions/{version_id}
# Test pull manually on server
docker pull odoo:17.0
Rollback Failures
Symptom: Rollback creates new deployment but it fails
Diagnosis:
# Check if target deployment is too old
GET /api/v1/deployments/{target_id}
# Compare database schemas
# (Manual - requires database access)
psql -U postgres -d {db_name} -c "\d"
Solutions:
Database Schema Incompatibility:
- Restore backup from before problematic deployment:
  POST /api/v1/environments/{env_id}/restore {"backup_id": "backup_before_issue"}
- Then roll back code:
  POST /api/v1/deployments/{old_deployment_id}/rollback
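The order matters here: the database must be restored before the code is rolled back. A sketch of that sequence follows, with the HTTP call injected as a `post` callable so the ordering is explicit. The `post` signature is hypothetical, and the sketch assumes the restore call returns only once the restore has finished, which this document does not confirm.

```python
from typing import Callable, Optional

# Hypothetical HTTP helper: takes a path and an optional JSON body,
# returns the decoded JSON response.
Post = Callable[[str, Optional[dict]], dict]


def restore_then_rollback(
    post: Post, env_id: str, backup_id: str, target_deployment_id: str
) -> dict:
    """Restore a backup, then roll back code to `target_deployment_id`."""
    # Step 1: bring the database back to a schema the old code understands.
    post(f"/api/v1/environments/{env_id}/restore", {"backup_id": backup_id})
    # Step 2: roll back the code (assumes step 1 has fully completed).
    return post(f"/api/v1/deployments/{target_deployment_id}/rollback", None)
```

If the restore turns out to run as a background task, the caller would need to wait for it to complete before issuing the rollback request.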
Missing Addon Dependencies:
- Check if rolled-back code depends on addons that were removed
- Restore missing addon repositories
- Redeploy with correct addon configuration
Resource Exhaustion:
- Server out of disk space for rollback
- Check: df -h /var/lib/docker
- Clean up: docker system prune -a (removes all unused images, not only dangling ones)
Deployment Logs Not Showing
Symptom: Deployment progress shows no logs or logs frozen
Diagnosis:
# Check deployment status
GET /api/v1/deployments/{id}
# Check SSE connection
# (Frontend - browser console)
EventSource readyState: 1 (connected)
Solutions:
- Refresh page: SSE connection may have dropped
- Check network: A firewall or proxy may be blocking or buffering the SSE connection
- Backend issue: Check worker logs (admin only)
- Fetch logs directly: Use logs API endpoint instead of SSE
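When falling back to the logs endpoint, keep in mind that `limit` is capped at 500, so longer deployments need paging with `skip`. The sketch below shows one way to page through all logs. `fetch_page` is a hypothetical callable that wraps `GET /api/v1/deployments/{id}/logs?skip=...&limit=...` and returns the `logs` array from the response.

```python
from typing import Callable, Iterator

PAGE_SIZE = 500  # documented maximum for the `limit` parameter


def iter_logs(
    fetch_page: Callable[[int, int], list[dict]],
    page_size: int = PAGE_SIZE,
) -> Iterator[dict]:
    """Yield every log entry by advancing `skip` until a short page arrives."""
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        yield from page
        # A page shorter than the limit means there is nothing left to fetch.
        if len(page) < page_size:
            return
        skip += len(page)
```

For a deployment that is still running, the same loop can be rerun later starting from the last `skip` value to pick up new entries.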
Environment Destroyed But Quota Not Released
Symptom: After destroying environment, organization shows as over quota
Diagnosis:
# Check environment is_active status
GET /api/v1/environments/{env_id}
# Check quota usage
GET /api/v1/organizations/{org_id}/quota
Solution:
-- Manually set is_active to false
-- (Admin only - direct database update)
UPDATE project_environments
SET is_active = false
WHERE id = '{env_id}';
-- Quota will recalculate on next check
Prevention: Always use the destroy API endpoint, not manual container deletion
Related Documentation
- Deployment Overview - Introduction to deployments
- Deployment Configuration - Configure deployment settings
- Environment Management - Manage environments lifecycle
- Backup & Restore - Backup before risky deployments
- Permissions Matrix - Deployment permission system
- ARQ Task Queue - Background task processing architecture
Last Updated: January 2025
Platform Version: 2E41