Retry Failed Deployment
Feature ID: DEP-002
Category: Deployments
Required Permission: project.environments.deploy
API Endpoint: POST /api/v1/deployments/{deployment_id}/retry
Overview
Retry a failed deployment with the exact same code and configuration. Unlike redeploy (which pulls latest code) or rollback (which returns to a previous successful state), retry attempts to deploy the same Git commit that failed, allowing you to fix external issues (server connectivity, permissions, etc.) without changing your code.
Use this feature when you need to:
- Retry after fixing server connectivity issues
- Retry after resolving Git repository access problems
- Retry after correcting Docker registry credentials
- Retry after increasing server resources (disk space, memory)
- Retry after fixing DNS configuration
- Retry without changing the code commit
What retry does:
- Creates a new deployment record with same Git commit
- Increments version number (e.g., v2 failed → v3 retry)
- Preserves Git branch and commit SHA from failed deployment
- Clears error messages and timing data
- Sets status to PENDING and trigger to MANUAL
- Links to original deployment via extra_data.retry_of
What retry does NOT do:
- Pull latest code from Git (use Redeploy for that)
- Revert to previous successful deployment (use Rollback for that)
- Automatically fix code issues (you need to push fixes first)
Prerequisites
Required Conditions
- Failed Deployment: Original deployment must have status FAILED
- No Concurrent Deployments: No other deployments running for the same project
- Permission: User has project.environments.deploy permission
- Environment Accessible: Target environment still exists and is accessible
What You Need
- A failed deployment to retry
- Understanding of why the original deployment failed
- External issues resolved (connectivity, permissions, resources, etc.)
When to Use Retry vs Redeploy vs Rollback
Retry (Same Commit)
Use when: External issues caused failure, code is correct
Examples:
- ✅ Server ran out of disk space (freed space, now retry)
- ✅ Git repository was temporarily unavailable (now accessible)
- ✅ SSH key permissions were incorrect (fixed permissions)
- ✅ Docker registry rate limit exceeded (waited, now retry)
Behavior:
- Same Git commit: abc123def456
- Same Git branch: main
- New version number: v3 (if original was v2)
- Trigger: MANUAL
Redeploy (Latest Commit)
Use when: You want to deploy the latest code from the branch
Examples:
- ✅ Fixed bugs in code and pushed to Git
- ✅ Updated dependencies in requirements.txt
- ✅ Added new Odoo modules
- ✅ Modified configuration files
Behavior:
- Latest Git commit: xyz789abc012 (different from before)
- Same Git branch: main
- New version number: v4
- Trigger: MANUAL
Rollback (Previous Success)
Use when: Current deployment is problematic, need to revert to known-good state
Examples:
- ✅ Latest deployment introduced bugs in production
- ✅ Performance degraded after update
- ✅ Need to revert to last stable version quickly
Behavior:
- Target Git commit: old789abc456 (from previous successful deployment)
- Sets rollback_from_id: Links to the deployment being rolled back from
- New version number: v5
- Trigger: ROLLBACK
How to Retry a Failed Deployment
Method 1: Via API (Current Implementation)
Step 1: Get Failed Deployment ID
List deployments to find the failed one:
GET /api/v1/projects/{project_id}/deployments?status=failed
Response:
{
"items": [
{
"id": "deploy-uuid",
"version": 2,
"status": "failed",
"git_commit": "abc123def456",
"git_branch": "main",
"error_message": "Failed to connect to server: Connection timeout",
"created_at": "2024-12-11T10:00:00Z"
}
]
}
Step 2: Create Retry Request
Send POST request to retry endpoint:
POST /api/v1/deployments/{deployment_id}/retry
Request: (no body required)
Response (200 OK):
{
"id": "new-deploy-uuid",
"project_id": "proj-uuid",
"environment_id": "env-uuid",
"version": 3,
"status": "pending",
"trigger": "manual",
"triggered_by": "user-uuid",
"git_commit": "abc123def456", // Same as failed deployment
"git_branch": "main", // Same as failed deployment
"extra_data": {
"retry_of": "deploy-uuid" // Links to original failed deployment
},
"created_at": "2024-12-11T11:00:00Z"
}
Step 3: Monitor New Deployment
The new deployment will start automatically (when background task execution is implemented). Monitor progress via:
DeploymentProgress Component:
- Automatically shows progress for latest deployment
- Polls every 2 seconds for updates
- Displays step-by-step progress
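In addition to the DeploymentProgress component described above, API consumers can watch the retry with a small polling loop. This is a sketch only, using the documented GET /api/v1/deployments/{deployment_id} endpoint and mirroring the UI's 2-second cadence; the helper name and timeout value are illustrative:
import time
import requests

API_URL = "https://api.oec.sh/api/v1"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

def wait_for_deployment(deployment_id: str, interval: float = 2.0, timeout: float = 900.0) -> dict:
    """Poll a deployment until it reaches a terminal status (sketch, not an official client)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(f"{API_URL}/deployments/{deployment_id}", headers=headers)
        resp.raise_for_status()
        deployment = resp.json()
        print(f"v{deployment['version']}: {deployment['status']}")
        if deployment["status"] in ("success", "failed", "cancelled"):
            return deployment  # terminal state reached
        time.sleep(interval)  # mirrors the UI's 2-second polling
    raise TimeoutError(f"Deployment {deployment_id} did not finish within {timeout}s")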
SSE Real-Time Updates:
useSSEEvent("deployment_progress", (event) => {
  if (event.data.deployment_id === newDeploymentId) {
    // Update UI with progress
  }
});
Method 2: Via UI (Proposed Implementation)
Note: The retry button is not yet implemented in the frontend. When implemented, it would work like this:
- Navigate to Environment → Deployments tab
- Find failed deployment in deployment history
- Click "Retry" button (next to failed deployment)
- Confirmation dialog appears:
- Shows original deployment details
- Shows Git commit and branch
- Shows error message from failure
- Asks "Retry deployment with same code?"
- Click "Retry Deployment"
- New deployment starts with same code
Proposed UI Component:
{deployment.status === 'failed' && (
  <button
    onClick={() => handleRetry(deployment.id)}
    className="btn-secondary"
  >
    <RotateCcw className="h-4 w-4" />
    Retry
  </button>
)}
What Data is Preserved vs Reset
Preserved from Failed Deployment
| Field | Value | Notes |
|---|---|---|
| project_id | Same | Same project context |
| git_commit | Same | Exact same commit SHA |
| git_branch | Same | Branch reference preserved |
| extra_data.retry_of | Set | Links to original deployment ID |
Reset/New for Retry Deployment
| Field | Value | Notes |
|---|---|---|
| id | New UUID | Completely new deployment record |
| version | Incremented | e.g., v2 → v3 |
| status | PENDING | Starts fresh |
| trigger | MANUAL | User-initiated retry |
| triggered_by | Current user | May differ from original user |
| started_at | NULL | Cleared |
| completed_at | NULL | Cleared |
| duration_seconds | NULL | Cleared |
| error_message | NULL | Cleared |
| error_code | NULL | Cleared |
| container_id | NULL | Cleared |
Not Copied (Implementation Issue)
Critical Note: The current retry implementation does NOT copy:
- environment_id (target environment reference)
- vm_id (target server reference)
This means the retry creates a deployment record but lacks target information. The implementation needs to be completed to include these fields.
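For illustration, here is a sketch of what the completed copy could look like, using the field names from the tables above. The Deployment constructor, the DeploymentTrigger enum, and the session handling are assumptions based on the snippets in this document, not the final implementation:
# Sketch: build the retry record from the failed original, including the
# environment_id and vm_id that the current implementation does not copy.
new_deployment = Deployment(
    project_id=original.project_id,          # preserved
    environment_id=original.environment_id,  # currently missing - should be copied
    vm_id=original.vm_id,                    # currently missing - should be copied
    git_commit=original.git_commit,          # same commit SHA
    git_branch=original.git_branch,          # same branch
    version=next_version,                    # incremented (see Version Management)
    status=DeploymentStatus.PENDING,
    trigger=DeploymentTrigger.MANUAL,        # enum name assumed
    triggered_by=current_user.id,
    extra_data={"retry_of": str(original.id)},
)
db.add(new_deployment)
await db.commit()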
Retry Conditions and Validations
Pre-Condition Checks
Before retry is allowed, the system validates:
1. Deployment Status = FAILED
if original.status != DeploymentStatus.FAILED:
    raise HTTPException(400, "Can only retry failed deployments")
Valid Statuses for Retry: Only FAILED
Invalid Statuses:
- ❌ SUCCESS - Already succeeded, use Redeploy instead
- ❌ PENDING - Not yet executed
- ❌ RUNNING - Currently executing
- ❌ CANCELLED - Was cancelled, not failed
2. No Concurrent Deployments
active_deployments = await db.execute(
    select(Deployment).where(
        Deployment.project_id == original.project_id,
        Deployment.status.in_([DeploymentStatus.PENDING, DeploymentStatus.RUNNING])
    )
)
if active_deployments.scalar_one_or_none():
    raise HTTPException(409, "A deployment is already in progress for this project")
Why: Prevents resource conflicts and ensures server stability
Resolution: Wait for active deployment to complete or cancel it first
3. User Has Deploy Permission
has_permission = await check_permission(
    db=db,
    user=current_user,
    permission_code="project.environments.deploy",
    organization_id=original.project.organization_id,
    project_id=original.project_id,
)
if not has_permission:
    raise HTTPException(403, "You don't have permission to deploy")
Required Role: At least org_member or project_admin
Version Management
Each retry increments the deployment version number:
Version Calculation
# Get max version for project
result = await db.execute(
    select(func.max(Deployment.version)).where(
        Deployment.project_id == original.project_id
    )
)
next_version = (result.scalar() or 0) + 1

# Create retry with incremented version
new_deployment.version = next_version
Version Sequence Example
Deployment Timeline:
─────────────────────────────────────────────
v1: SUCCESS (initial deploy)
v2: FAILED (deployment error)
v3: SUCCESS (retry of v2) ← extra_data.retry_of = v2
v4: FAILED (new code introduced bug)
v5: SUCCESS (retry of v4) ← extra_data.retry_of = v4
Key Points:
- Versions are always incrementing (never reuse a version number)
- Retry creates a new version (v3), not v2 again
- Original failed deployment (v2) remains in database
- extra_data.retry_of field links the retry to the original
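Because each retry stores its origin in extra_data.retry_of, any deployment can be traced back to the failure it replaced. Here is a sketch using the documented GET /api/v1/deployments/{deployment_id} endpoint (the helper name and example IDs are illustrative):
import requests

API_URL = "https://api.oec.sh/api/v1"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

def retry_chain(deployment_id: str) -> list[dict]:
    """Follow extra_data.retry_of links back to the original failed deployment (sketch)."""
    chain = []
    current_id = deployment_id
    while current_id:
        resp = requests.get(f"{API_URL}/deployments/{current_id}", headers=headers)
        resp.raise_for_status()
        deployment = resp.json()
        chain.append(deployment)
        # Deployments that are not retries have no retry_of entry, which ends the loop
        current_id = (deployment.get("extra_data") or {}).get("retry_of")
    return chain

# Example: for the timeline above, starting at v5 yields v5 -> v4
for d in retry_chain("deploy-uuid-of-v5"):
    print(f"v{d['version']}: {d['status']}")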
Troubleshooting Failed Retries
Issue 1: Retry Also Failed
Symptoms: Second deployment also fails with same error
Common Causes:
- Issue Not Actually Fixed
  - Verified server connectivity: ✓ Working
  - Verified Git access: ✗ Still failing
  - Solution: Actually fix the Git SSH key
- Different Error Occurred
  - Original error: "Connection timeout"
  - New error: "Disk full"
  - Solution: Address the new error, retry again
- Code Issue (Not External)
  - Error: "Module 'sale' not found in addons path"
  - Solution: Don't use retry; fix the code and redeploy
Issue 2: "Can Only Retry Failed Deployments"
Symptoms: API returns 400 Bad Request
Cause: Trying to retry a deployment that's not in FAILED status
Resolution:
- Check deployment status: GET /api/v1/deployments/{deployment_id}
- Verify status is exactly "failed"
- If status is "success", use Redeploy instead
- If status is "cancelled", create a new deployment
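The status check in the resolution above can be automated so the 400 never occurs. A small guard sketch against the documented endpoints (the helper name is illustrative):
import requests

API_URL = "https://api.oec.sh/api/v1"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

def retry_if_failed(deployment_id: str) -> dict | None:
    """Call the retry endpoint only when the deployment status is exactly 'failed' (sketch)."""
    resp = requests.get(f"{API_URL}/deployments/{deployment_id}", headers=headers)
    resp.raise_for_status()
    status = resp.json()["status"]
    if status != "failed":
        # "success" -> use Redeploy instead; "cancelled" -> create a new deployment
        print(f"Not retryable: status is '{status}'")
        return None
    retry = requests.post(f"{API_URL}/deployments/{deployment_id}/retry", headers=headers)
    retry.raise_for_status()
    return retry.json()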
Issue 3: "Deployment Already in Progress"
Symptoms: API returns 409 Conflict
Cause: Another deployment is currently PENDING or RUNNING for the same project
Resolution:
- Find the active deployment: GET /api/v1/projects/{project_id}/deployments?status=running,pending
- Wait for it to complete (monitor progress)
- Or cancel it if safe: POST /api/v1/deployments/{active_id}/cancel
- Then retry the failed deployment
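The resolution steps above can be scripted: list active deployments for the project, wait for them to finish (or cancel one if safe), then retry. A sketch using only the endpoints listed above (the helper name is illustrative):
import time
import requests

API_URL = "https://api.oec.sh/api/v1"
headers = {"Authorization": "Bearer YOUR_TOKEN"}

def retry_when_idle(project_id: str, failed_deployment_id: str, poll: float = 5.0) -> dict:
    """Wait until no deployment is pending/running for the project, then retry (sketch)."""
    while True:
        resp = requests.get(
            f"{API_URL}/projects/{project_id}/deployments",
            params={"status": "running,pending"},
            headers=headers,
        )
        resp.raise_for_status()
        if not resp.json()["items"]:
            break  # no active deployments, safe to retry
        print("Waiting for active deployment(s) to finish...")
        time.sleep(poll)
    retry = requests.post(f"{API_URL}/deployments/{failed_deployment_id}/retry", headers=headers)
    retry.raise_for_status()
    return retry.json()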
Issue 4: Retry Created But Not Executing
Symptoms: Retry deployment stuck in PENDING status
Cause: Background task execution not implemented (current limitation)
Temporary Workaround:
- Trigger deployment manually via Deploy button
- Or use Redeploy action (which triggers background task)
Permanent Fix: Implementation needs to call background task:
# After creating retry deployment
await enqueue_task(
    "run_environment_deployment",
    deployment_id=str(new_deployment.id),
    environment_id=str(new_deployment.environment_id),
    user_id=str(current_user.id),
)
Issue 5: Different User Retrying
Symptoms: Retry succeeds but audit logs show different user
Explanation: This is expected behavior
Details:
- Original deployment: triggered_by = user-abc
- Retry deployment: triggered_by = user-xyz (current user)
- Retry reference: extra_data.retry_of = original-deploy-id
Why: Allows team members to retry each other's failed deployments (common in collaborative environments)
Comparison Table
| Feature | Retry | Redeploy | Rollback |
|---|---|---|---|
| Purpose | Fix external issues | Deploy latest code | Revert to stable |
| Git Commit | Same as failed | Latest from branch | From previous success |
| Version | New (v+1) | New (v+1) | New (v+1) |
| Trigger | MANUAL | MANUAL | ROLLBACK |
| Pre-Condition | Status = FAILED | Any status | Target = SUCCESS |
| Use Case | Server was down | Code was updated | Rollback bad deploy |
| Risk Level | Low (same code) | Medium (new code) | Low (proven code) |
| Speed | Fast | Fast | Fast |
Best Practices
1. Understand Failure Before Retry
- ✅ Do: Read error messages and deployment logs
- ✅ Do: Identify root cause (server issue vs code issue)
- ✅ Do: Fix external issues before retry
- ❌ Don't: Blindly retry hoping it will work
2. Use Correct Action for Situation
- ✅ Do: Use Retry for external failures (connectivity, permissions)
- ✅ Do: Use Redeploy after pushing code fixes
- ✅ Do: Use Rollback for quick revert to stable state
- ❌ Don't: Use Retry when code needs fixing (it won't help)
3. Monitor Retry Attempts
- ✅ Do: Watch deployment progress in real-time
- ✅ Do: Check if the retry encounters the same error
- ✅ Do: Note version numbers for the audit trail
- ❌ Don't: Retry more than 2-3 times without investigating
4. Document Retry Reasons
- ✅ Do: Add comments in the project tracking system
- ✅ Do: Note what external issue was fixed
- ✅ Do: Update runbooks with learnings
- ❌ Don't: Silently retry without team communication
5. Check for Concurrent Operations
- ✅ Do: Verify no other deployments are running
- ✅ Do: Coordinate with the team before retrying
- ✅ Do: Use the deployment queue to avoid conflicts
- ❌ Don't: Force multiple concurrent deployments
API Reference
Retry Failed Deployment
Endpoint: POST /api/v1/deployments/{deployment_id}/retry
Authentication: Required (Bearer token)
Path Parameters:
- deployment_id (UUID, required) - ID of the failed deployment to retry
Request Body: None
Response (200 OK):
{
"id": "new-deploy-uuid",
"project_id": "proj-uuid",
"environment_id": "env-uuid",
"vm_id": "server-uuid",
"version": 3,
"status": "pending",
"trigger": "manual",
"triggered_by": "user-uuid",
"git_commit": "abc123def456",
"git_branch": "main",
"git_message": "Original commit message",
"extra_data": {
"retry_of": "original-deploy-uuid"
},
"created_at": "2024-12-11T11:00:00Z",
"started_at": null,
"completed_at": null,
"duration_seconds": null,
"error_message": null
}
Errors:
- 400 Bad Request - Can only retry failed deployments
- 403 Forbidden - Missing project.environments.deploy permission
- 404 Not Found - Deployment not found
- 409 Conflict - Another deployment already in progress
Get Deployment Details
Endpoint: GET /api/v1/deployments/{deployment_id}
Response includes extra_data.retry_of field for retry deployments:
{
"id": "deploy-uuid",
"version": 3,
"status": "success",
"extra_data": {
"retry_of": "original-failed-deploy-uuid"
}
}
Code Examples
Example 1: Retry via cURL
# Step 1: Find failed deployment
curl -X GET "https://api.oec.sh/api/v1/projects/proj-uuid/deployments?status=failed" \
-H "Authorization: Bearer YOUR_TOKEN"
# Response shows deployment ID: deploy-abc123
# Step 2: Retry the deployment
curl -X POST "https://api.oec.sh/api/v1/deployments/deploy-abc123/retry" \
-H "Authorization: Bearer YOUR_TOKEN"
# Response: New deployment created with status "pending"
Example 2: Retry via Python
import requests
# Configuration
API_URL = "https://api.oec.sh/api/v1"
TOKEN = "your_jwt_token"
headers = {"Authorization": f"Bearer {TOKEN}"}
# Find failed deployment
response = requests.get(
    f"{API_URL}/projects/proj-uuid/deployments",
    params={"status": "failed"},
    headers=headers
)
deployments = response.json()["items"]

if deployments:
    failed_deployment = deployments[0]
    print(f"Found failed deployment v{failed_deployment['version']}")
    print(f"Error: {failed_deployment['error_message']}")

    # Retry the deployment
    retry_response = requests.post(
        f"{API_URL}/deployments/{failed_deployment['id']}/retry",
        headers=headers
    )

    if retry_response.status_code == 200:
        new_deployment = retry_response.json()
        print(f"Retry created: v{new_deployment['version']}")
        print(f"Status: {new_deployment['status']}")
    else:
        print(f"Retry failed: {retry_response.json()['detail']}")
Example 3: Retry via TypeScript (Frontend)
import { deploymentsApi } from '@/lib/api/deployments';
const handleRetry = async (failedDeploymentId: string) => {
  try {
    // Call retry endpoint
    const response = await deploymentsApi.retry(failedDeploymentId);
    // Show success message
    toast.success(`Retry queued: Deployment v${response.data.version}`);
    // Navigate to deployment progress
    router.push(`/deployments/${response.data.id}`);
  } catch (error) {
    // Handle errors
    if (error.response?.status === 400) {
      toast.error("Can only retry failed deployments");
    } else if (error.response?.status === 409) {
      toast.error("Another deployment is already in progress");
    } else {
      toast.error("Failed to retry deployment");
    }
  }
};
Related Documentation
- View Deployment Logs - Understand why deployment failed
- Deploy Environment - Initial deployment process
- Environment Actions - Redeploy and other actions
Last Updated: December 11, 2025
Applies to: OEC.SH v2.0+
Related Sprint: Sprint 2E41 - Documentation System
Implementation Status: ⚠️ Partially implemented - Retry endpoint exists but requires completion:
- ✅ Creates new deployment record
- ✅ Preserves Git commit and branch
- ✅ Validates pre-conditions
- ⚠️ Missing: Copy environment_id and vm_id
- ⚠️ Missing: Trigger background task execution
- ⚠️ Missing: Frontend UI component