Retry Failed Deployment

Feature ID: DEP-002
Category: Deployments
Required Permission: project.environments.deploy
API Endpoint: POST /api/v1/deployments/{deployment_id}/retry

Overview

Retry a failed deployment with the exact same code and configuration. Unlike redeploy (which pulls the latest code) or rollback (which returns to a previous successful state), retry deploys the same Git commit that failed. This lets you run the deployment again after fixing external issues (server connectivity, permissions, etc.) without changing your code.

Use this feature when you need to:

Retry after fixing server connectivity issues
Retry after resolving Git repository access problems
Retry after correcting Docker registry credentials
Retry after increasing server resources (disk space, memory)
Retry after fixing DNS configuration
Retry without changing the code commit

What retry does:

Creates a new deployment record with same Git commit
Increments version number (e.g., v2 failed → v3 retry)
Preserves Git branch and commit SHA from failed deployment
Clears error messages and timing data
Sets status to PENDING and trigger to MANUAL
Links to original deployment via extra_data.retry_of

What retry does NOT do:

Pull latest code from Git (use Redeploy for that)
Revert to previous successful deployment (use Rollback for that)
Automatically fix code issues (you need to push fixes first)

Prerequisites

Required Conditions

Failed Deployment: Original deployment must have status FAILED
No Concurrent Deployments: No other deployment is PENDING or RUNNING for the same project
Permission: User has project.environments.deploy permission
Environment Accessible: Target environment still exists and is accessible

What You Need

A failed deployment to retry
Understanding of why the original deployment failed
External issues resolved (connectivity, permissions, resources, etc.)

When to Use Retry vs Redeploy vs Rollback

Retry (Same Commit)

Use when: External issues caused failure, code is correct

Examples:

✅ Server ran out of disk space (freed space, now retry)
✅ Git repository was temporarily unavailable (now accessible)
✅ SSH key permissions were incorrect (fixed permissions)
✅ Docker registry rate limit exceeded (waited, now retry)

Behavior:

Same Git commit: abc123def456
Same Git branch: main
New version number: v3 (if original was v2)
Trigger: MANUAL

Redeploy (Latest Commit)

Use when: You want to deploy the latest code from the branch

Examples:

✅ Fixed bugs in code and pushed to Git
✅ Updated dependencies in requirements.txt
✅ Added new Odoo modules
✅ Modified configuration files

Behavior:

Latest Git commit: xyz789abc012 (different from before)
Same Git branch: main
New version number: v4
Trigger: MANUAL

Rollback (Previous Success)

Use when: Current deployment is problematic, need to revert to known-good state

Examples:

✅ Latest deployment introduced bugs in production
✅ Performance degraded after update
✅ Need to revert to last stable version quickly

Behavior:

Target Git commit: old789abc456 (from previous successful deployment)
Sets rollback_from_id: Links to deployment being rolled back from
New version number: v5
Trigger: ROLLBACK
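
The choice between the three actions can be summarized as a small decision helper. This is an illustrative sketch only: `FailureKind` and `choose_action` are hypothetical names that do not exist in the platform, and the three categories are a simplification of the examples listed above.

```python
from enum import Enum


class FailureKind(Enum):
    """Hypothetical categories for why a deployment needs attention."""
    EXTERNAL = "external"        # connectivity, credentials, disk space, DNS
    CODE_FIXED = "code_fixed"    # a fix was pushed to the branch
    BAD_RELEASE = "bad_release"  # the deployed release misbehaves in production


def choose_action(kind: FailureKind) -> str:
    """Map a failure category to the action recommended in this section."""
    if kind is FailureKind.EXTERNAL:
        return "retry"     # same commit, external issue already resolved
    if kind is FailureKind.CODE_FIXED:
        return "redeploy"  # latest commit from the branch
    return "rollback"      # previous successful deployment


print(choose_action(FailureKind.EXTERNAL))
```

In practice the category comes from reading the error message and logs of the failed deployment, which is a judgment call rather than something this helper can decide.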

How to Retry a Failed Deployment

Method 1: Via API (Current Implementation)

Step 1: Get Failed Deployment ID

List deployments to find the failed one:

GET /api/v1/projects/{project_id}/deployments?status=failed
 
Response:
{
  "items": [
    {
      "id": "deploy-uuid",
      "version": 2,
      "status": "failed",
      "git_commit": "abc123def456",
      "git_branch": "main",
      "error_message": "Failed to connect to server: Connection timeout",
      "created_at": "2024-12-11T10:00:00Z"
    }
  ]
}
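
A client would typically pick the most recent failed deployment out of this list. The following is a minimal sketch that works on the sample response shown above; `latest_failed_deployment` is a hypothetical helper, and it assumes that the highest `version` is the most recent attempt.

```python
import json
from typing import Optional

SAMPLE_RESPONSE = """
{
  "items": [
    {"id": "deploy-uuid", "version": 2, "status": "failed",
     "git_commit": "abc123def456", "git_branch": "main",
     "error_message": "Failed to connect to server: Connection timeout",
     "created_at": "2024-12-11T10:00:00Z"}
  ]
}
"""


def latest_failed_deployment(payload: dict) -> Optional[dict]:
    """Return the failed deployment with the highest version, or None."""
    failed = [d for d in payload.get("items", []) if d.get("status") == "failed"]
    return max(failed, key=lambda d: d["version"], default=None)


deployment = latest_failed_deployment(json.loads(SAMPLE_RESPONSE))
if deployment is not None:
    # The error message is worth reading before deciding to retry.
    print(deployment["id"], deployment["error_message"])
```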

Step 2: Create Retry Request

Send POST request to retry endpoint:

POST /api/v1/deployments/{deployment_id}/retry
 
Request: (no body required)
 
Response (200 OK):
{
  "id": "new-deploy-uuid",
  "project_id": "proj-uuid",
  "environment_id": "env-uuid",
  "version": 3,
  "status": "pending",
  "trigger": "manual",
  "triggered_by": "user-uuid",
  "git_commit": "abc123def456",  // Same as failed deployment
  "git_branch": "main",  // Same as failed deployment
  "extra_data": {
    "retry_of": "deploy-uuid"  // Links to original failed deployment
  },
  "created_at": "2024-12-11T11:00:00Z"
}
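
From a script, the retry call could be issued with the Python standard library. This sketch only builds the request and does not send it. `BASE_URL` and the bearer-token header are assumptions, since this page does not specify the host or the authentication scheme.

```python
import urllib.request

# Assumptions: placeholder host and bearer-token authentication.
BASE_URL = "https://example.invalid"


def build_retry_request(deployment_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the retry endpoint."""
    url = f"{BASE_URL}/api/v1/deployments/{deployment_id}/retry"
    return urllib.request.Request(
        url,
        method="POST",  # the endpoint takes no request body
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


request = build_retry_request("deploy-uuid", token="example-token")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the `id` in the JSON response is the new deployment to monitor in Step 3.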

Step 3: Monitor New Deployment

Once background task execution is implemented, the new deployment will start automatically. Until then it remains in PENDING status (see Issue 4 under Troubleshooting Failed Retries). Monitor progress via:

DeploymentProgress Component:

Automatically shows progress for latest deployment
Polls every 2 seconds for updates
Displays step-by-step progress

SSE Real-Time Updates:

useSSEEvent("deployment_progress", (event) => {
  if (event.data.deployment_id === newDeploymentId) {
    // Update UI with progress
  }
});
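
Outside the frontend, the same 2-second polling approach can be sketched in a script. `wait_for_deployment` is a hypothetical helper, and `fetch_status` stands in for a call to `GET /api/v1/deployments/{deployment_id}` that returns the `status` field. The set of terminal statuses is assumed from the statuses named on this page.

```python
import time
from typing import Callable

# Assumed terminal statuses, taken from the statuses mentioned in this document.
TERMINAL_STATUSES = {"success", "failed", "cancelled"}


def wait_for_deployment(
    fetch_status: Callable[[], str],
    interval_seconds: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the deployment reaches a terminal status or polls run out."""
    status = fetch_status()
    for _ in range(max_polls):
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_seconds)
        status = fetch_status()
    return status  # still non-terminal: the caller decides what to do


# Demo with canned statuses instead of real HTTP calls.
canned = iter(["pending", "running", "running", "success"])
final_status = wait_for_deployment(lambda: next(canned), sleep=lambda _: None)
print(final_status)
```

Injecting `sleep` keeps the loop testable without real delays. Given the current limitation around background execution, a status that never leaves `pending` is a realistic outcome, which is why the loop is bounded.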

Method 2: Via UI (Proposed Implementation)

Note: The retry button is not yet implemented in the frontend. When implemented, it would work like this:

Navigate to Environment → Deployments tab
Find failed deployment in deployment history
Click "Retry" button (next to failed deployment)
Confirmation dialog appears:
- Shows original deployment details
- Shows Git commit and branch
- Shows error message from failure
- Asks "Retry deployment with same code?"
Click "Retry Deployment"
New deployment starts with same code

Proposed UI Component:

{deployment.status === 'failed' && (
  <button
    onClick={() => handleRetry(deployment.id)}
    className="btn-secondary"
  >
    <RotateCcw className="h-4 w-4" />
    Retry
  </button>
)}

What Data is Preserved vs Reset

Preserved from Failed Deployment

Field	Value	Notes
`project_id`	Same	Same project context
`git_commit`	Same	Exact same commit SHA
`git_branch`	Same	Branch reference preserved
`extra_data.retry_of`	Set	Links to original deployment ID

Reset/New for Retry Deployment

Field	Value	Notes
`id`	New UUID	Completely new deployment record
`version`	Incremented	e.g., v2 → v3
`status`	PENDING	Starts fresh
`trigger`	MANUAL	User-initiated retry
`triggered_by`	Current user	May differ from original user
`started_at`	NULL	Cleared
`completed_at`	NULL	Cleared
`duration_seconds`	NULL	Cleared
`error_message`	NULL	Cleared
`error_code`	NULL	Cleared
`container_id`	NULL	Cleared

Not Copied (Implementation Issue)

Critical Note: The current retry implementation does NOT copy:

environment_id (target environment reference)
vm_id (target server reference)

This means the retry creates a deployment record but lacks target information. The implementation needs to be completed to include these fields.
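
The field handling described in this section can be sketched with plain dictionaries. This is not the actual implementation: `build_retry_record` and its `copy_target` flag are hypothetical, and the flag exists only to contrast the current behavior (target fields omitted) with the completed behavior the note above calls for.

```python
import uuid
from typing import Any, Dict

# Fields listed above as reset to NULL on the new record.
CLEARED_FIELDS = (
    "started_at", "completed_at", "duration_seconds",
    "error_message", "error_code", "container_id",
)


def build_retry_record(
    original: Dict[str, Any],
    next_version: int,
    current_user_id: str,
    copy_target: bool = False,
) -> Dict[str, Any]:
    """Build the retry record as a plain dict from a failed deployment."""
    record: Dict[str, Any] = {
        "id": str(uuid.uuid4()),
        "project_id": original["project_id"],
        "git_commit": original["git_commit"],
        "git_branch": original["git_branch"],
        "version": next_version,
        "status": "pending",
        "trigger": "manual",
        "triggered_by": current_user_id,
        "extra_data": {"retry_of": original["id"]},
    }
    record.update({field: None for field in CLEARED_FIELDS})
    if copy_target:
        # Assumption: this is the missing step, not current behavior.
        record["environment_id"] = original.get("environment_id")
        record["vm_id"] = original.get("vm_id")
    return record


failed = {
    "id": "deploy-uuid", "project_id": "proj-uuid", "version": 2,
    "git_commit": "abc123def456", "git_branch": "main",
    "environment_id": "env-uuid", "vm_id": "vm-uuid",
    "error_message": "Failed to connect to server: Connection timeout",
}
retry = build_retry_record(failed, next_version=3, current_user_id="user-uuid")
print(retry["version"], retry["extra_data"], "environment_id" in retry)
```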

Retry Conditions and Validations

Pre-Condition Checks

Before retry is allowed, the system validates:

1. Deployment Status = FAILED

if original.status != DeploymentStatus.FAILED:
    raise HTTPException(400, "Can only retry failed deployments")

Valid Statuses for Retry: Only FAILED

Invalid Statuses:

❌ SUCCESS - Already succeeded, use Redeploy instead
❌ PENDING - Not yet executed
❌ RUNNING - Currently executing
❌ CANCELLED - Was cancelled, not failed

2. No Concurrent Deployments

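# Look for any deployment of this project that is still PENDING or RUNNING
active_deployments = await db.execute(
    select(Deployment).where(
        Deployment.project_id == original.project_id,
        Deployment.status.in_([DeploymentStatus.PENDING, DeploymentStatus.RUNNING])
    )
)
# first() tolerates several matching rows; scalar_one_or_none() would raise
# MultipleResultsFound if more than one active deployment existed
if active_deployments.scalars().first():
    raise HTTPException(409, "A deployment is already in progress for this project")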

Why: Prevents resource conflicts and ensures server stability

Resolution: Wait for active deployment to complete or cancel it first

3. User Has Deploy Permission

has_permission = await check_permission(
    db=db,
    user=current_user,
    permission_code="project.environments.deploy",
    organization_id=original.project.organization_id,
    project_id=original.project_id,
)
if not has_permission:
    raise HTTPException(403, "You don't have permission to deploy")

Required Role: At least org_member or project_admin
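
The three checks and the HTTP status each one produces can be modeled without a database or web framework. In this sketch `validate_retry` is a hypothetical function; the order of the checks follows the order of this section and is an assumption about the real endpoint, which may check permissions first.

```python
from typing import Iterable, Optional, Tuple


def validate_retry(
    original_status: str,
    project_statuses: Iterable[str],
    has_deploy_permission: bool,
) -> Optional[Tuple[int, str]]:
    """Return (http_status, message) for the first failed check, else None."""
    if original_status != "failed":
        return 400, "Can only retry failed deployments"
    if any(s in ("pending", "running") for s in project_statuses):
        return 409, "A deployment is already in progress for this project"
    if not has_deploy_permission:
        return 403, "You don't have permission to deploy"
    return None  # all checks passed, the retry may be created


print(validate_retry("failed", ["success", "failed"], True))
print(validate_retry("success", [], True))
print(validate_retry("failed", ["running"], True))
```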

Version Management

Each retry increments the deployment version number:

Version Calculation

# Get max version for project
result = await db.execute(
    select(func.max(Deployment.version)).where(
        Deployment.project_id == original.project_id
    )
)
next_version = (result.scalar() or 0) + 1
 
# Create retry with incremented version
new_deployment.version = next_version

Version Sequence Example

Deployment Timeline:
─────────────────────────────────────────────
v1: SUCCESS  (initial deploy)
v2: FAILED   (deployment error)
v3: SUCCESS  (retry of v2) ← extra_data.retry_of = ID of v2
v4: FAILED   (Docker registry rate limit exceeded)
v5: SUCCESS  (retry of v4) ← extra_data.retry_of = ID of v4

Key Points:

Version numbers only ever increase; a number is never reused
Retry creates a new version (v3), not v2 again
Original failed deployment (v2) remains in database
extra_data.retry_of field links retry to original
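
The numbering rule can be checked in isolation. This sketch mirrors the `(max(version) or 0) + 1` calculation over a plain list of integers; `next_version` is a hypothetical stand-in for the database query.

```python
from typing import Iterable


def next_version(existing_versions: Iterable[int]) -> int:
    """Return one more than the highest existing version (1 if there are none)."""
    return max(existing_versions, default=0) + 1


versions = []
for _ in range(3):
    # Each new deployment, retry or not, takes the next number.
    versions.append(next_version(versions))
print(versions)
```

Because the rule uses the maximum rather than a count, a gap in the history (for example after a deleted record) never causes a number to be reused.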

Troubleshooting Failed Retries

Issue 1: Retry Also Failed

Symptoms: Second deployment also fails with same error

Common Causes:

Issue Not Actually Fixed
- Verified server connectivity: ✓ Working
- Verified Git access: ✗ Still failing
- Solution: Actually fix the Git SSH key
Different Error Occurred
- Original error: "Connection timeout"
- New error: "Disk full"
- Solution: Address the new error, retry again
Code Issue (Not External)
- Error: "Module 'sale' not found in addons path"
- Solution: Don't use retry, fix code and redeploy

Issue 2: "Can Only Retry Failed Deployments"

Symptoms: API returns 400 Bad Request

Cause: Trying to retry a deployment that's not in FAILED status

Resolution:

Check deployment status:
```
GET /api/v1/deployments/{deployment_id}
```
Verify status is exactly "failed"
If status is "success", use Redeploy instead
If status is "cancelled", create new deployment

Issue 3: "Deployment Already in Progress"

Symptoms: API returns 409 Conflict

Cause: Another deployment is currently PENDING or RUNNING for the same project

Resolution:

Find active deployment:

GET /api/v1/projects/{project_id}/deployments?status=running,pending

Wait for it to complete (monitor progress)

Or cancel it if safe:

POST /api/v1/deployments/{active_id}/cancel

Then retry failed deployment
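
The wait-then-retry sequence above can be automated on the client side. This is a sketch under assumptions: `retry_when_idle` is a hypothetical helper, `list_active` stands in for the `?status=running,pending` query, and `post_retry` stands in for the retry call.

```python
import time
from typing import Callable, List


def retry_when_idle(
    list_active: Callable[[], List[str]],
    post_retry: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_checks: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Wait until no deployment is active, then issue the retry."""
    for _ in range(max_checks):
        if not list_active():
            return post_retry()
        sleep(interval_seconds)
    raise TimeoutError("Active deployment did not finish in time")


# Demo with canned data: one active deployment that finishes after two checks.
snapshots = iter([["active-uuid"], ["active-uuid"], []])
result = retry_when_idle(
    list_active=lambda: next(snapshots),
    post_retry=lambda: {"id": "new-deploy-uuid", "status": "pending"},
    sleep=lambda _: None,
)
print(result["status"])
```

Another deployment can still start between the check and the retry call, so a caller should be prepared to receive a 409 Conflict anyway.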

Issue 4: Retry Created But Not Executing

Symptoms: Retry deployment stuck in PENDING status

Cause: Background task execution not implemented (current limitation)

Temporary Workaround:

Trigger deployment manually via Deploy button
Or use Redeploy action (which triggers background task)

Permanent Fix: Implementation needs to call background task:

# After creating retry deployment
await enqueue_task(
    "run_environment_deployment",
    deployment_id=str(new_deployment.id),
    environment_id=str(new_deployment.environment_id),
    user_id=str(current_user.id),
)

Issue 5: Different User Retrying

Symptoms: Retry succeeds but audit logs show different user

Explanation: This is expected behavior

Details:

Original deployment: triggered_by = user-abc
Retry deployment: triggered_by = user-xyz (current user)
Retry reference: extra_data.retry_of = original-deploy-id

Why: Allows team members to retry each other's failed deployments (common in collaborative environments)
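
For audit purposes, the `extra_data.retry_of` links can be followed back to the original attempt to see who triggered each one. A minimal sketch over in-memory records follows; `retry_chain` is a hypothetical helper and the record IDs are placeholders.

```python
from typing import Dict, List


def retry_chain(deployments: Dict[str, dict], deployment_id: str) -> List[dict]:
    """Follow extra_data.retry_of links from a retry back to the original."""
    chain: List[dict] = []
    seen = set()  # guards against a malformed, cyclic chain
    current = deployments.get(deployment_id)
    while current is not None and current["id"] not in seen:
        seen.add(current["id"])
        chain.append(current)
        parent_id = (current.get("extra_data") or {}).get("retry_of")
        current = deployments.get(parent_id) if parent_id else None
    return chain


records = {
    "original-deploy-id": {"id": "original-deploy-id", "triggered_by": "user-abc",
                           "extra_data": {}},
    "retry-deploy-id": {"id": "retry-deploy-id", "triggered_by": "user-xyz",
                        "extra_data": {"retry_of": "original-deploy-id"}},
}
for record in retry_chain(records, "retry-deploy-id"):
    print(record["id"], record["triggered_by"])
```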

Comparison Table

Feature	Retry	Redeploy	Rollback
Purpose	Fix external issues	Deploy latest code	Revert to stable
Git Commit	Same as failed	Latest from branch	From previous success
Version	New (v+1)	New (v+1)	New (v+1)
Trigger	MANUAL	MANUAL	ROLLBACK
Pre-Condition	Status = FAILED	Any status	Target = SUCCESS
Use Case	Server was down	Code was updated	Rollback bad deploy
Risk Level	Low (same code)	Medium (new code)	Low (proven code)
Speed	Fast	Fast	Fast

Best Practices

1. Understand Failure Before Retry

✅ Do: Read error messages and deployment logs
✅ Do: Identify root cause (server issue vs code issue)
✅ Do: Fix external issues before retry
❌ Don't: Blindly retry hoping it will work

2. Use Correct Action for Situation

✅ Do: Use Retry for external failures (connectivity, permissions)
✅ Do: Use Redeploy after pushing code fixes
✅ Do: Use Rollback for quick revert to stable state
❌ Don't: Use Retry when code needs fixing (it won't help)

3. Monitor Retry Attempts

✅ Do: Watch deployment progress in real-time
✅ Do: Check if retry encounters same error
✅ Do: Note version numbers for audit trail
❌ Don't: Retry more than 2-3 times without investigating

4. Document Retry Reasons

✅ Do: Add comments in project tracking system
✅ Do: Note what external issue was fixed
✅ Do: Update runbooks with learnings
❌ Don't: Silently retry without team communication

5. Check for Concurrent Operations

✅ Do: Verify no other deployments running
✅ Do: Coordinate with team before retrying
✅ Do: Use deployment queue to avoid conflicts
❌ Don't: Force multiple concurrent deployments

API Reference