CreateReasoningEngine fails with code 13 after deleting previous resource (Dockerfile/image_spec path, us-central1) #6754

#### Environment details

  - OS type and version: macOS (Darwin kernel 24.6.0, arm64)
  - Python version: N/A (using REST API directly via curl, not the Python SDK)
  - pip version: N/A
  - `google-cloud-aiplatform` version: N/A — REST API v1beta1

#### Summary

I deployed a Dockerfile-based reasoning engine to `us-central1` via the REST API (`image_spec: {}` + `inline_source.source_archive`), then deleted it with `force=true`. Since that delete, **every `CreateReasoningEngine` operation in `us-central1` fails with code 13**. The same request succeeds in `us-east4` within the same project.

Cloud Logging shows the build completing and the container starting and listening on port 8080, so the failure appears to occur in Agent Engine's internal post-deploy verification rather than in the build or the container itself.

#### Steps to reproduce

1. Deploy a Dockerfile-based reasoning engine via REST API to `us-central1`:
   ```bash
   curl -X POST \
     "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/{PROJECT}/locations/us-central1/reasoningEngines" \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -d '{
       "display_name": "my-dockerfile-agent",
       "spec": {
         "source_code_spec": {
           "inline_source": { "source_archive": "<base64-tar-gz-of-Dockerfile-and-source>" },
           "image_spec": {}
         },
         "agent_framework": "custom",
         "class_methods": [{"name": "query", "api_mode": ""}],
         "deployment_spec": {
           "env": [{"name": "SOME_VAR", "value": "some_value"}],
           "min_instances": 1,
           "max_instances": 1,
           "resource_limits": {"cpu": "4", "memory": "8Gi"}
         }
       }
     }'
   ```
   Result: **Success** — operation completes, resource created, `:query` works.

2. Delete the reasoning engine:
   ```bash
   curl -X DELETE \
     "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/{PROJECT}/locations/us-central1/reasoningEngines/{RESOURCE_ID}?force=true" \
     -H "Authorization: Bearer $(gcloud auth print-access-token)"
   ```
   Result: Delete succeeds (`done: true`).

3. Create a new reasoning engine with the same or different payload:
   ```bash
   # Same curl as step 1, different display_name
   ```
   Result: **Fails with code 13** every time.

4. Deploy the **exact same payload** to `us-east4` in the same project:
   ```bash
   # Same curl but with us-east4 in the URL
   ```
   Result: **Success** — deploys fine, container starts, `:query` works.
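
For clarity on what "same payload" means in steps 3 and 4, here is a minimal sketch of how the two requests relate. The field names and values are copied from the curl in step 1; the helper name `buildCreateRequest`, its parameters, and the `my-project` and display-name values are illustrative placeholders, not part of any SDK.

```javascript
// Build the CreateReasoningEngine REST request for a given region.
// The body mirrors the curl payload in step 1; only the region in the URL
// and the display name vary between attempts.
function buildCreateRequest({ project, region, displayName, sourceArchiveBase64 }) {
  const url =
    `https://${region}-aiplatform.googleapis.com/v1beta1/` +
    `projects/${project}/locations/${region}/reasoningEngines`;
  const body = {
    display_name: displayName,
    spec: {
      source_code_spec: {
        inline_source: { source_archive: sourceArchiveBase64 },
        image_spec: {},
      },
      agent_framework: "custom",
      class_methods: [{ name: "query", api_mode: "" }],
      deployment_spec: {
        env: [{ name: "SOME_VAR", value: "some_value" }],
        min_instances: 1,
        max_instances: 1,
        resource_limits: { cpu: "4", memory: "8Gi" },
      },
    },
  };
  return { url, body };
}

// Placeholder values (hypothetical project ID and archive).
const shared = { project: "my-project", sourceArchiveBase64: "<base64-tar-gz>" };
const central = buildCreateRequest({ ...shared, region: "us-central1", displayName: "agent-central" });
const east = buildCreateRequest({ ...shared, region: "us-east4", displayName: "agent-east" });

// The spec is byte-for-byte identical; only the URL and display name differ.
console.log(JSON.stringify(central.body.spec) === JSON.stringify(east.body.spec));
```

The resulting `url` and `JSON.stringify(body)` correspond to the curl URL and `-d` argument in step 1.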

#### Observed behavior

- The operation is accepted and returns an operation name
- Cloud Logging (`reasoning_engine_build`) shows the Dockerfile build completing successfully ("DONE", image pushed with SHA digest)
- Cloud Logging (`reasoning_engine_stdout`) shows the container starting and logging that it's listening on port 8080
- Despite the container being healthy, the operation completes with code 13

#### Expected behavior

`CreateReasoningEngine` should succeed since the build completes and the container starts healthy. Deleting and recreating a reasoning engine should not leave the region unusable for the project.

#### Minimal Dockerfile used for testing

```dockerfile
FROM node:22-slim
WORKDIR /app
COPY server.js ./
CMD ["node", "server.js"]
```

```javascript
// server.js
const http = require("http");
http.createServer((req, res) => {
  if (req.url === "/ping") { res.end(JSON.stringify({status:"ok"})); return; }
  let body = "";
  req.on("data", c => body += c);
  req.on("end", () => {
    res.writeHead(200, {"content-type":"application/json"});
    res.end(JSON.stringify({output: "echo: " + body}));
  });
}).listen(8080, () => console.log("listening on 8080"));
```

Even this minimal 2-file container fails with code 13 in `us-central1` after the delete, but deploys fine in `us-east4`.
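
To rule out the container itself, the handler can be exercised locally before deploying. The sketch below wraps the same handler logic as `server.js` in a function, starts it on an ephemeral port, and probes `/ping` plus one echo request. The helper names (`createEchoServer`, `requestJson`, `smokeTest`) and the `/query` path are illustrative assumptions; the server echoes any path other than `/ping`.

```javascript
const http = require("http");

// Same handler logic as server.js, wrapped so it can listen on any port.
function createEchoServer() {
  return http.createServer((req, res) => {
    if (req.url === "/ping") { res.end(JSON.stringify({ status: "ok" })); return; }
    let body = "";
    req.on("data", (c) => (body += c));
    req.on("end", () => {
      res.writeHead(200, { "content-type": "application/json" });
      res.end(JSON.stringify({ output: "echo: " + body }));
    });
  });
}

// Send one request to localhost and resolve with the parsed JSON response.
// agent: false avoids keep-alive sockets so the server can close promptly.
function requestJson(port, method, path, payload) {
  return new Promise((resolve, reject) => {
    const options = { host: "127.0.0.1", port, method, path, agent: false };
    const req = http.request(options, (res) => {
      let data = "";
      res.on("data", (c) => (data += c));
      res.on("end", () => {
        try { resolve(JSON.parse(data)); } catch (err) { reject(err); }
      });
    });
    req.on("error", reject);
    req.end(payload);
  });
}

// Start on an ephemeral port, probe both code paths, then shut down.
async function smokeTest() {
  const server = createEchoServer();
  await new Promise((resolve) => server.listen(0, "127.0.0.1", resolve));
  const { port } = server.address();
  try {
    const ping = await requestJson(port, "GET", "/ping");
    const echo = await requestJson(port, "POST", "/query", "hello");
    return { ping, echo };
  } finally {
    await new Promise((resolve) => server.close(resolve));
  }
}
```

Calling `smokeTest()` returns a promise for the two parsed responses. This only checks the handler locally; it says nothing about how Agent Engine probes the container after deployment.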

#### Error response

```json
{
  "name": "projects/{NUMBER}/locations/us-central1/reasoningEngines/{ID}/operations/{OP_ID}",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1beta1.CreateReasoningEngineOperationMetadata",
    "genericMetadata": {
      "createTime": "2026-05-07T23:59:49.660727Z",
      "updateTime": "2026-05-07T23:59:49.660727Z"
    }
  },
  "done": true,
  "error": {
    "code": 13,
    "message": "Please refer to our documentation (https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/troubleshooting/deploy) for checking logs and other troubleshooting tips."
  }
}
```
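
For reference, code 13 is `INTERNAL` in the standard `google.rpc.Code` enum, which is consistent with a server-side failure rather than a request or permission problem. A minimal sketch of how a polling script might classify the operation response above; the helper name `summarizeOperation` and its return shape are illustrative, not part of any Google client library.

```javascript
// Canonical google.rpc.Code names, indexed by numeric code (0 through 16).
const RPC_CODE_NAMES = [
  "OK", "CANCELLED", "UNKNOWN", "INVALID_ARGUMENT", "DEADLINE_EXCEEDED",
  "NOT_FOUND", "ALREADY_EXISTS", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED",
  "FAILED_PRECONDITION", "ABORTED", "OUT_OF_RANGE", "UNIMPLEMENTED",
  "INTERNAL", "UNAVAILABLE", "DATA_LOSS", "UNAUTHENTICATED",
];

// Classify a long-running operation response as pending, failed, or succeeded.
function summarizeOperation(op) {
  if (!op.done) return { state: "pending" };
  if (op.error) {
    const code = op.error.code;
    return {
      state: "failed",
      code,
      codeName: RPC_CODE_NAMES[code] || "UNRECOGNIZED",
      message: op.error.message || "",
    };
  }
  return { state: "succeeded" };
}

// Trimmed version of the error response shown above.
const failed = summarizeOperation({
  done: true,
  error: { code: 13, message: "Please refer to our documentation ..." },
});
console.log(failed.state, failed.code, failed.codeName); // failed 13 INTERNAL
```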

#### Additional context

- Region: `us-central1` is broken, `us-east4` works — same project, same payload, same permissions
- The issue started immediately after deleting a previously deployed reasoning engine
- All IAM roles verified (reasoningEngineServiceAgent, artifactregistry.reader, storage.objectAdmin, logging.logWriter)
- Staging bucket exists and is accessible
- Cloud Resource Manager API is enabled
- No VPC-SC configured
- Multiple retries over 2+ hours — issue does not self-heal
- Operations cannot be cancelled via the API ("not cancellable")

#### Hypothesis

Deleting the reasoning engine left orphaned internal state (Cloud Run revision, internal AR image reference, or routing configuration) in `us-central1` that blocks new reasoning engine deployments from completing their post-deploy verification step. The build and container startup succeed, but the orchestration layer's readiness check fails against stale internal state.
