
Retry Failed Deployment

Feature ID: DEP-002
Category: Deployments
Required Permission: project.environments.deploy
API Endpoint: POST /api/v1/deployments/{deployment_id}/retry


Overview

Retry a failed deployment with the exact same code and configuration. Unlike Redeploy (which pulls the latest code) or Rollback (which returns to a previous successful state), Retry deploys the same Git commit that failed. This lets you fix external issues (server connectivity, permissions, etc.) and try again without changing your code.

Use this feature when you need to:

  • Retry after fixing server connectivity issues
  • Retry after resolving Git repository access problems
  • Retry after correcting Docker registry credentials
  • Retry after increasing server resources (disk space, memory)
  • Retry after fixing DNS configuration
  • Retry without changing the code commit

What retry does:

  1. Creates a new deployment record with the same Git commit
  2. Increments the version number (e.g., v2 failed → v3 retry)
  3. Preserves the Git branch and commit SHA from the failed deployment
  4. Clears error messages and timing data
  5. Sets status to PENDING and trigger to MANUAL
  6. Links to the original deployment via extra_data.retry_of
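The six steps above can be sketched as a pure function over plain dictionaries. This is an illustration only, not the platform's implementation: `build_retry_record` and its parameters are hypothetical names, and the field set simply mirrors the fields this page lists.

```python
import uuid
from typing import Any, Dict


def build_retry_record(
    failed: Dict[str, Any], next_version: int, current_user_id: str
) -> Dict[str, Any]:
    """Build a new deployment dict that retries `failed` (hypothetical helper)."""
    if failed["status"] != "failed":
        raise ValueError("Can only retry failed deployments")
    return {
        "id": str(uuid.uuid4()),                  # step 1: brand-new record
        "project_id": failed["project_id"],
        "version": next_version,                  # step 2: incremented version
        "git_commit": failed["git_commit"],       # step 3: same commit SHA
        "git_branch": failed["git_branch"],       # step 3: same branch
        "started_at": None,                       # step 4: timing data cleared
        "completed_at": None,
        "duration_seconds": None,
        "error_message": None,                    # step 4: errors cleared
        "error_code": None,
        "status": "pending",                      # step 5
        "trigger": "manual",                      # step 5
        "triggered_by": current_user_id,
        "extra_data": {"retry_of": failed["id"]}, # step 6: link to the original
    }
```

The original record is not modified; the retry is a separate record that only points back to it.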

What retry does NOT do:

  • Pull latest code from Git (use Redeploy for that)
  • Revert to previous successful deployment (use Rollback for that)
  • Automatically fix code issues (you need to push fixes first)

Prerequisites

Required Conditions

  • Failed Deployment: Original deployment must have status FAILED
  • No Concurrent Deployments: No other deployments running for the same project
  • Permission: User has project.environments.deploy permission
  • Environment Accessible: Target environment still exists and is accessible

What You Need

  • A failed deployment to retry
  • Understanding of why the original deployment failed
  • External issues resolved (connectivity, permissions, resources, etc.)

When to Use Retry vs Redeploy vs Rollback

Retry (Same Commit)

Use when: External issues caused failure, code is correct

Examples:

  • ✅ Server ran out of disk space (freed space, now retry)
  • ✅ Git repository was temporarily unavailable (now accessible)
  • ✅ SSH key permissions were incorrect (fixed permissions)
  • ✅ Docker registry rate limit exceeded (waited, now retry)

Behavior:

  • Same Git commit: abc123def456
  • Same Git branch: main
  • New version number: v3 (if original was v2)
  • Trigger: MANUAL

Redeploy (Latest Commit)

Use when: You want to deploy the latest code from the branch

Examples:

  • ✅ Fixed bugs in code and pushed to Git
  • ✅ Updated dependencies in requirements.txt
  • ✅ Added new Odoo modules
  • ✅ Modified configuration files

Behavior:

  • Latest Git commit: xyz789abc012 (different from before)
  • Same Git branch: main
  • New version number: v4
  • Trigger: MANUAL

Rollback (Previous Success)

Use when: Current deployment is problematic, need to revert to known-good state

Examples:

  • ✅ Latest deployment introduced bugs in production
  • ✅ Performance degraded after update
  • ✅ Need to revert to last stable version quickly

Behavior:

  • Target Git commit: old789abc456 (from previous successful deployment)
  • Sets rollback_from_id: Links to deployment being rolled back from
  • New version number: v5
  • Trigger: ROLLBACK
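The guidance in this section can be condensed into a small decision helper. This is a sketch under assumptions: the `Action` enum, the `choose_action` function, and its three inputs are hypothetical and only encode the rules of thumb described above.

```python
from enum import Enum


class Action(Enum):
    RETRY = "retry"
    REDEPLOY = "redeploy"
    ROLLBACK = "rollback"


def choose_action(
    *, need_known_good_state: bool, new_code_pushed: bool, last_status: str
) -> Action:
    """Pick a deployment action from the situation (hypothetical helper)."""
    if need_known_good_state:
        # The current deployment is problematic: revert to the last success.
        return Action.ROLLBACK
    if new_code_pushed:
        # A fix or update is on the branch: deploy the latest commit.
        return Action.REDEPLOY
    if last_status == "failed":
        # Code is unchanged and an external issue was fixed: same commit again.
        return Action.RETRY
    raise ValueError("Retry requires a failed deployment; nothing else applies")
```

For example, `choose_action(need_known_good_state=False, new_code_pushed=False, last_status="failed")` returns `Action.RETRY`.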

How to Retry a Failed Deployment

Method 1: Via API (Current Implementation)

Step 1: Get Failed Deployment ID

List deployments to find the failed one:

GET /api/v1/projects/{project_id}/deployments?status=failed
 
Response:
{
  "items": [
    {
      "id": "deploy-uuid",
      "version": 2,
      "status": "failed",
      "git_commit": "abc123def456",
      "git_branch": "main",
      "error_message": "Failed to connect to server: Connection timeout",
      "created_at": "2024-12-11T10:00:00Z"
    }
  ]
}

Step 2: Create Retry Request

Send POST request to retry endpoint:

POST /api/v1/deployments/{deployment_id}/retry
 
Request: (no body required)
 
Response (200 OK):
{
  "id": "new-deploy-uuid",
  "project_id": "proj-uuid",
  "environment_id": "env-uuid",
  "version": 3,
  "status": "pending",
  "trigger": "manual",
  "triggered_by": "user-uuid",
  "git_commit": "abc123def456",  // Same as failed deployment
  "git_branch": "main",  // Same as failed deployment
  "extra_data": {
    "retry_of": "deploy-uuid"  // Links to original failed deployment
  },
  "created_at": "2024-12-11T11:00:00Z"
}

Step 3: Monitor New Deployment

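The new deployment is created with status PENDING. It will start automatically once background task execution is implemented; until then it remains PENDING (see Issue 4 under Troubleshooting). Monitor progress via: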

DeploymentProgress Component:

  • Automatically shows progress for latest deployment
  • Polls every 2 seconds for updates
  • Displays step-by-step progress

SSE Real-Time Updates:

useSSEEvent("deployment_progress", (event) => {
  if (event.data.deployment_id === newDeploymentId) {
    // Update UI with progress
  }
});
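Outside the frontend, the same monitoring can be approximated by polling the deployment until it reaches a terminal status. The sketch below is an assumption-laden illustration: `wait_for_deployment` is a hypothetical helper, the `fetch` callable stands in for a call to GET /api/v1/deployments/{deployment_id}, and the 2-second interval mirrors the polling rate of the DeploymentProgress component.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which the deployment no longer changes (taken from this page).
TERMINAL_STATUSES = {"success", "failed", "cancelled"}


def wait_for_deployment(
    fetch: Callable[[str], Dict[str, Any]],
    deployment_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll `fetch` until the deployment finishes or `timeout` seconds of waiting pass."""
    waited = 0.0
    while True:
        deployment = fetch(deployment_id)
        if deployment["status"] in TERMINAL_STATUSES:
            return deployment
        if waited >= timeout:
            raise TimeoutError(f"Deployment {deployment_id} is still {deployment['status']}")
        sleep(interval)
        waited += interval
```

Passing `fetch` and `sleep` as parameters keeps the loop independent of any HTTP client and makes it easy to exercise with a fake. Given the current limitation that retries stay PENDING, the timeout matters.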

Method 2: Via UI (Proposed Implementation)

Note: The retry button is not yet implemented in the frontend. When implemented, it would work like this:

  1. Navigate to Environment → Deployments tab
  2. Find failed deployment in deployment history
  3. Click "Retry" button (next to failed deployment)
  4. Confirmation dialog appears:
    • Shows original deployment details
    • Shows Git commit and branch
    • Shows error message from failure
    • Asks "Retry deployment with same code?"
  5. Click "Retry Deployment"
  6. New deployment starts with same code

Proposed UI Component:

{deployment.status === 'failed' && (
  <button
    onClick={() => handleRetry(deployment.id)}
    className="btn-secondary"
  >
    <RotateCcw className="h-4 w-4" />
    Retry
  </button>
)}

What Data is Preserved vs Reset

Preserved from Failed Deployment

Field                Value  Notes
project_id           Same   Same project context
git_commit           Same   Exact same commit SHA
git_branch           Same   Branch reference preserved
extra_data.retry_of  Set    Links to original deployment ID

Reset/New for Retry Deployment

Field             Value         Notes
id                New UUID      Completely new deployment record
version           Incremented   e.g., v2 → v3
status            PENDING       Starts fresh
trigger           MANUAL        User-initiated retry
triggered_by      Current user  May differ from original user
started_at        NULL          Cleared
completed_at      NULL          Cleared
duration_seconds  NULL          Cleared
error_message     NULL          Cleared
error_code        NULL          Cleared
container_id      NULL          Cleared

Not Copied (Implementation Issue)

Critical Note: The current retry implementation does NOT copy:

  • environment_id (target environment reference)
  • vm_id (target server reference)

This means the retry creates a deployment record but lacks target information. The implementation needs to be completed to include these fields.
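Until the endpoint copies these fields, a client can at least detect the gap before relying on a retry record. A minimal sketch, assuming the response is available as a dictionary; `missing_target_fields` is a hypothetical helper, not part of the API.

```python
from typing import Any, Dict, List

# Target references that this page says the retry does not copy.
REQUIRED_TARGET_FIELDS = ("environment_id", "vm_id")


def missing_target_fields(deployment: Dict[str, Any]) -> List[str]:
    """Return the target fields that are absent or null on a deployment dict."""
    return [name for name in REQUIRED_TARGET_FIELDS if deployment.get(name) is None]
```

An empty list means both target references are present; anything else names the fields that still need to be filled in.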


Retry Conditions and Validations

Pre-Condition Checks

Before retry is allowed, the system validates:

1. Deployment Status = FAILED

if original.status != DeploymentStatus.FAILED:
    raise HTTPException(400, "Can only retry failed deployments")

Valid Statuses for Retry: Only FAILED

Invalid Statuses:

  • SUCCESS - Already succeeded, use Redeploy instead
  • PENDING - Not yet executed
  • RUNNING - Currently executing
  • CANCELLED - Was cancelled, not failed

2. No Concurrent Deployments

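# Look for any deployment of this project that is still pending or running
active_deployments = await db.execute(
    select(Deployment).where(
        Deployment.project_id == original.project_id,
        Deployment.status.in_([DeploymentStatus.PENDING, DeploymentStatus.RUNNING])
    )
)
# first() tolerates several matching rows; scalar_one_or_none() raises when more than one is returned
if active_deployments.scalars().first():
    raise HTTPException(409, "A deployment is already in progress for this project")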

Why: Prevents resource conflicts and ensures server stability

Resolution: Wait for active deployment to complete or cancel it first


3. User Has Deploy Permission

has_permission = await check_permission(
    db=db,
    user=current_user,
    permission_code="project.environments.deploy",
    organization_id=original.project.organization_id,
    project_id=original.project_id,
)
if not has_permission:
    raise HTTPException(403, "You don't have permission to deploy")

Required Role: At least org_member or project_admin
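The three pre-condition checks can be read together as one function that returns the first failing check. This is a simplified sketch without a database or web framework: `validate_retry` is a hypothetical name, statuses are plain lowercase strings, and the check order follows the order in which this page lists them, which may differ from the real endpoint.

```python
from typing import Iterable, Optional, Tuple

ACTIVE_STATUSES = {"pending", "running"}


def validate_retry(
    original_status: str,
    project_statuses: Iterable[str],
    has_deploy_permission: bool,
) -> Optional[Tuple[int, str]]:
    """Return (http_status, message) for the first failed check, or None if retry is allowed."""
    # 1. Only failed deployments can be retried.
    if original_status != "failed":
        return 400, "Can only retry failed deployments"
    # 2. No other deployment of the same project may be pending or running.
    if any(status in ACTIVE_STATUSES for status in project_statuses):
        return 409, "A deployment is already in progress for this project"
    # 3. The caller needs project.environments.deploy.
    if not has_deploy_permission:
        return 403, "You don't have permission to deploy"
    return None
```

`project_statuses` is assumed to hold the statuses of the project's other deployments.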


Version Management

Each retry increments the deployment version number:

Version Calculation

# Get max version for project
result = await db.execute(
    select(func.max(Deployment.version)).where(
        Deployment.project_id == original.project_id
    )
)
next_version = (result.scalar() or 0) + 1
 
# Create retry with incremented version
new_deployment.version = next_version

Version Sequence Example

Deployment Timeline:
─────────────────────────────────────────────
v1: SUCCESS  (initial deploy)
v2: FAILED   (deployment error)
v3: SUCCESS  (retry of v2) ← extra_data.retry_of = v2
v4: FAILED   (new code introduced bug)
v5: SUCCESS  (retry of v4) ← extra_data.retry_of = v4

Key Points:

  • Version numbers only ever increase; a number is never reused
  • A retry creates a new version (v3); it does not rerun v2
  • The original failed deployment (v2) remains in the database
  • The extra_data.retry_of field links the retry to the original
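The version rule can be checked without a database. The following sketch restates the calculation over a plain list of existing version numbers; `next_version` is a hypothetical stand-in for the SQL query shown earlier.

```python
from typing import Iterable


def next_version(existing_versions: Iterable[int]) -> int:
    """Return max(existing) + 1, or 1 when the project has no deployments yet."""
    return max(existing_versions, default=0) + 1


# v1 succeeded and v2 failed, so the retry of v2 becomes v3.
print(next_version([1, 2]))  # 3
```

Because the next number is derived from the maximum rather than from the failed deployment's own version, a retry of v2 issued after v4 already exists would become v5.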

Troubleshooting Failed Retries

Issue 1: Retry Also Failed

Symptoms: Second deployment also fails with same error

Common Causes:

  1. Issue Not Actually Fixed

    • Verified server connectivity: ✓ Working
    • Verified Git access: ✗ Still failing
    • Solution: Actually fix the Git SSH key
  2. Different Error Occurred

    • Original error: "Connection timeout"
    • New error: "Disk full"
    • Solution: Address the new error, retry again
  3. Code Issue (Not External)

    • Error: "Module 'sale' not found in addons path"
    • Solution: Don't use retry, fix code and redeploy

Issue 2: "Can Only Retry Failed Deployments"

Symptoms: API returns 400 Bad Request

Cause: Trying to retry a deployment that's not in FAILED status

Resolution:

  1. Check deployment status:
    GET /api/v1/deployments/{deployment_id}
  2. Verify status is exactly "failed"
  3. If status is "success", use Redeploy instead
  4. If status is "cancelled", create new deployment

Issue 3: "Deployment Already in Progress"

Symptoms: API returns 409 Conflict

Cause: Another deployment is currently PENDING or RUNNING for the same project

Resolution:

  1. Find active deployment:
    GET /api/v1/projects/{project_id}/deployments?status=running,pending
  2. Wait for it to complete (monitor progress)
  3. Or cancel it if safe:
    POST /api/v1/deployments/{active_id}/cancel
  4. Then retry failed deployment

Issue 4: Retry Created But Not Executing

Symptoms: Retry deployment stuck in PENDING status

Cause: Background task execution not implemented (current limitation)

Temporary Workaround:

  1. Trigger deployment manually via Deploy button
  2. Or use Redeploy action (which triggers background task)

Permanent Fix: Implementation needs to call background task:

# After creating retry deployment
await enqueue_task(
    "run_environment_deployment",
    deployment_id=str(new_deployment.id),
    environment_id=str(new_deployment.environment_id),
    user_id=str(current_user.id),
)

Issue 5: Different User Retrying

Symptoms: Retry succeeds but audit logs show different user

Explanation: This is expected behavior

Details:

  • Original deployment: triggered_by = user-abc
  • Retry deployment: triggered_by = user-xyz (current user)
  • Retry reference: extra_data.retry_of = original-deploy-id

Why: Allows team members to retry each other's failed deployments (common in collaborative environments)
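For audit purposes, the retry_of links can be followed back to the deployment that originally failed. A minimal sketch, assuming deployments are available as dictionaries keyed by ID; `retry_chain` is a hypothetical helper.

```python
from typing import Any, Dict, List


def retry_chain(
    deployments_by_id: Dict[str, Dict[str, Any]], deployment_id: str
) -> List[Dict[str, Any]]:
    """Follow extra_data.retry_of links from a deployment back to the original."""
    chain: List[Dict[str, Any]] = []
    seen = set()
    current = deployments_by_id.get(deployment_id)
    while current is not None and current["id"] not in seen:
        seen.add(current["id"])  # guards against accidental cycles in the data
        chain.append(current)
        parent_id = (current.get("extra_data") or {}).get("retry_of")
        current = deployments_by_id.get(parent_id) if parent_id else None
    return chain
```

The result lists the newest deployment first, so `[d["triggered_by"] for d in chain]` shows who triggered each attempt, and `len(chain) - 1` counts how many retries have been made.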


Comparison Table

Feature        Retry                Redeploy            Rollback
Purpose        Fix external issues  Deploy latest code  Revert to stable
Git Commit     Same as failed       Latest from branch  From previous success
Version        New (v+1)            New (v+1)           New (v+1)
Trigger        MANUAL               MANUAL              ROLLBACK
Pre-Condition  Status = FAILED      Any status          Target = SUCCESS
Use Case       Server was down      Code was updated    Rollback bad deploy
Risk Level     Low (same code)      Medium (new code)   Low (proven code)
Speed          Fast                 Fast                Fast

Best Practices

1. Understand Failure Before Retry

  • ✅ Do: Read error messages and deployment logs
  • ✅ Do: Identify root cause (server issue vs code issue)
  • ✅ Do: Fix external issues before retry
  • ❌ Don't: Blindly retry hoping it will work


2. Use Correct Action for Situation

  • ✅ Do: Use Retry for external failures (connectivity, permissions)
  • ✅ Do: Use Redeploy after pushing code fixes
  • ✅ Do: Use Rollback for quick revert to stable state
  • ❌ Don't: Use Retry when code needs fixing (it won't help)


3. Monitor Retry Attempts

  • ✅ Do: Watch deployment progress in real-time
  • ✅ Do: Check if retry encounters same error
  • ✅ Do: Note version numbers for audit trail
  • ❌ Don't: Retry more than 2-3 times without investigating


4. Document Retry Reasons

  • ✅ Do: Add comments in project tracking system
  • ✅ Do: Note what external issue was fixed
  • ✅ Do: Update runbooks with learnings
  • ❌ Don't: Silently retry without team communication


5. Check for Concurrent Operations

  • ✅ Do: Verify no other deployments running
  • ✅ Do: Coordinate with team before retrying
  • ✅ Do: Use deployment queue to avoid conflicts
  • ❌ Don't: Force multiple concurrent deployments


API Reference

Retry Failed Deployment

Endpoint: POST /api/v1/deployments/{deployment_id}/retry

Authentication: Required (Bearer token)

Path Parameters:

  • deployment_id (UUID, required) - ID of failed deployment to retry

Request Body: None

Response (200 OK):

{
  "id": "new-deploy-uuid",
  "project_id": "proj-uuid",
  "environment_id": "env-uuid",
  "vm_id": "server-uuid",
  "version": 3,
  "status": "pending",
  "trigger": "manual",
  "triggered_by": "user-uuid",
  "git_commit": "abc123def456",
  "git_branch": "main",
  "git_message": "Original commit message",
  "extra_data": {
    "retry_of": "original-deploy-uuid"
  },
  "created_at": "2024-12-11T11:00:00Z",
  "started_at": null,
  "completed_at": null,
  "duration_seconds": null,
  "error_message": null
}

Errors:

  • 400 Bad Request - Can only retry failed deployments
  • 403 Forbidden - Missing project.environments.deploy permission
  • 404 Not Found - Deployment not found
  • 409 Conflict - Another deployment already in progress
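A client can map these status codes to a suggested next step. The sketch below is illustrative only: `RETRY_ERROR_HINTS` and `explain_retry_error` are hypothetical names, and the hints paraphrase the resolutions given in the Troubleshooting section.

```python
from typing import Dict

# Suggested next step per documented error status of the retry endpoint.
RETRY_ERROR_HINTS: Dict[int, str] = {
    400: "Deployment is not in FAILED status; check its status before retrying.",
    403: "Missing project.environments.deploy permission.",
    404: "Deployment not found; verify the deployment ID.",
    409: "Another deployment is in progress; wait for it or cancel it first.",
}


def explain_retry_error(status_code: int) -> str:
    """Return a human-readable hint for a retry error status code."""
    return RETRY_ERROR_HINTS.get(
        status_code, f"Unexpected status {status_code}; inspect the response body."
    )
```

For instance, `explain_retry_error(409)` points the caller at the active deployment instead of suggesting another retry.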

Get Deployment Details

Endpoint: GET /api/v1/deployments/{deployment_id}

Response includes extra_data.retry_of field for retry deployments:

{
  "id": "deploy-uuid",
  "version": 3,
  "status": "success",
  "extra_data": {
    "retry_of": "original-failed-deploy-uuid"
  }
}

Code Examples

Example 1: Retry via cURL

# Step 1: Find failed deployment
curl -X GET "https://api.oec.sh/api/v1/projects/proj-uuid/deployments?status=failed" \
  -H "Authorization: Bearer YOUR_TOKEN"
 
# Response shows deployment ID: deploy-abc123
 
# Step 2: Retry the deployment
curl -X POST "https://api.oec.sh/api/v1/deployments/deploy-abc123/retry" \
  -H "Authorization: Bearer YOUR_TOKEN"
 
# Response: New deployment created with status "pending"

Example 2: Retry via Python

import requests
 
# Configuration
API_URL = "https://api.oec.sh/api/v1"
TOKEN = "your_jwt_token"
headers = {"Authorization": f"Bearer {TOKEN}"}
 
# Find failed deployment
response = requests.get(
    f"{API_URL}/projects/proj-uuid/deployments",
    params={"status": "failed"},
    headers=headers,
    timeout=30,
)
response.raise_for_status()  # Stop early on authentication or server errors
deployments = response.json()["items"]
 
if deployments:
    failed_deployment = deployments[0]
    print(f"Found failed deployment v{failed_deployment['version']}")
    print(f"Error: {failed_deployment['error_message']}")
 
    # Retry the deployment
    retry_response = requests.post(
        f"{API_URL}/deployments/{failed_deployment['id']}/retry",
        headers=headers
    )
 
    if retry_response.status_code == 200:
        new_deployment = retry_response.json()
        print(f"Retry created: v{new_deployment['version']}")
        print(f"Status: {new_deployment['status']}")
    else:
        print(f"Retry failed: {retry_response.json()['detail']}")

Example 3: Retry via TypeScript (Frontend)

import { deploymentsApi } from '@/lib/api/deployments';
 
const handleRetry = async (failedDeploymentId: string) => {
  try {
    // Call retry endpoint
    const response = await deploymentsApi.retry(failedDeploymentId);
 
    // Show success message
    toast.success(`Retry queued: Deployment v${response.data.version}`);
 
    // Navigate to deployment progress
    router.push(`/deployments/${response.data.id}`);
  } catch (error) {
    // Handle errors (catch variables are `unknown` under strict TypeScript, so narrow first)
    const status = (error as { response?: { status?: number } }).response?.status;
    if (status === 400) {
      toast.error("Can only retry failed deployments");
    } else if (status === 409) {
      toast.error("Another deployment is already in progress");
    } else {
      toast.error("Failed to retry deployment");
    }
  }
};


Last Updated: December 11, 2025
Applies to: OEC.SH v2.0+
Related Sprint: Sprint 2E41 - Documentation System

Implementation Status: ⚠️ Partially implemented - Retry endpoint exists but requires completion:

  • ✅ Creates new deployment record
  • ✅ Preserves Git commit and branch
  • ✅ Validates pre-conditions
  • ⚠️ Missing: Copy environment_id and vm_id
  • ⚠️ Missing: Trigger background task execution
  • ⚠️ Missing: Frontend UI component