Environment details
- OS type and version: macOS 24.6.0 (darwin arm64)
- Python version: N/A (using REST API directly via curl, not the Python SDK)
- pip version: N/A
- google-cloud-aiplatform version: N/A — REST API v1beta1
Summary
After successfully deploying a Dockerfile-based reasoning engine via the REST API (image_spec: {} + inline_source.source_archive), deleting that resource with force=true, and then attempting to create a new one, every subsequent CreateReasoningEngine operation fails with code 13 (INTERNAL) in us-central1. The same request succeeds in us-east4 within the same project.
Cloud Logging confirms that the build completes and the container starts healthy, so the failure appears to be in Agent Engine's internal post-deploy verification.
Steps to reproduce
1. Deploy a Dockerfile-based reasoning engine via the REST API to us-central1:
curl -X POST \
  "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/{PROJECT}/locations/us-central1/reasoningEngines" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "my-dockerfile-agent",
    "spec": {
      "source_code_spec": {
        "inline_source": { "source_archive": "<base64-tar-gz-of-Dockerfile-and-source>" },
        "image_spec": {}
      },
      "agent_framework": "custom",
      "class_methods": [{"name": "query", "api_mode": ""}],
      "deployment_spec": {
        "env": [{"name": "SOME_VAR", "value": "some_value"}],
        "min_instances": 1,
        "max_instances": 1,
        "resource_limits": {"cpu": "4", "memory": "8Gi"}
      }
    }
  }'
Result: Success — operation completes, resource created, :query works (verification call sketched after these steps).
2. Delete the reasoning engine:
curl -X DELETE \
  "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/{PROJECT}/locations/us-central1/reasoningEngines/{RESOURCE_ID}?force=true" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)"
Result: Delete succeeds (done: true).
3. Create a new reasoning engine with the same or different payload:
# Same curl as step 1, different display_name
Result: Fails with code 13 every time.
4. Deploy the exact same payload to us-east4 in the same project:
# Same curl but with us-east4 in the URL
Result: Success — deploys fine, container starts, :query works.
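For reference, the ":query works" checks above were done against the reasoningEngines :query method, roughly as follows (the request body shape reflects my reading of the v1beta1 QueryReasoningEngine API, and the input payload is just an illustrative echo test, so treat this as a sketch):
# Sketch of the post-deploy verification call; {RESOURCE_ID} comes from the create operation's response
curl -X POST \
  "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/{PROJECT}/locations/us-central1/reasoningEngines/{RESOURCE_ID}:query" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{"class_method": "query", "input": {"text": "hello"}}'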
Observed behavior
- The operation is accepted and returns an operation name
- Cloud Logging (reasoning_engine_build) shows the Dockerfile build completing successfully ("DONE", image pushed with SHA digest)
- Cloud Logging (reasoning_engine_stdout) shows the container starting and logging that it is listening on port 8080 (log queries sketched after this list)
- Despite the container being healthy, the operation completes with code 13
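For completeness, the operation state and the two log streams mentioned above were inspected roughly like this (filters and flags are approximate; adjust as needed):
# Poll the long-running create operation; its final state is the error shown under "Error response" below
curl -s \
  "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/{PROJECT}/locations/us-central1/reasoningEngines/{ID}/operations/{OP_ID}" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)"

# Build logs: the Dockerfile build reaches DONE and the image is pushed with a SHA digest
gcloud logging read 'logName:"reasoning_engine_build"' --project={PROJECT} --freshness=1h --order=desc --limit=50

# Container stdout: shows "listening on 8080"
gcloud logging read 'logName:"reasoning_engine_stdout"' --project={PROJECT} --freshness=1h --order=desc --limit=50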
Expected behavior
CreateReasoningEngine should succeed since the build completes and the container starts healthy. Deleting and recreating a reasoning engine should not permanently break the region for the project.
Minimal container (Dockerfile + server.js) used for testing
FROM node:22-slim
WORKDIR /app
COPY server.js ./
CMD ["node", "server.js"]
// server.js
const http = require("http");
http.createServer((req, res) => {
  if (req.url === "/ping") { res.end(JSON.stringify({status: "ok"})); return; }
  let body = "";
  req.on("data", c => body += c);
  req.on("end", () => {
    res.writeHead(200, {"content-type": "application/json"});
    res.end(JSON.stringify({output: "echo: " + body}));
  });
}).listen(8080, () => console.log("listening on 8080"));
Even this minimal 2-file container fails with code 13 in us-central1 after the delete, but deploys fine in us-east4.
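If it helps with reproduction, the container can be sanity-checked locally along these lines (hypothetical commands, assuming Docker; image and container names are arbitrary and this is not part of the Agent Engine deployment path):
# Build and run the two-file container locally, then exercise both routes
docker build -t agent-echo-test .
docker run -d --rm -p 8080:8080 --name agent-echo-test agent-echo-test
curl -s http://localhost:8080/ping                       # expected: {"status":"ok"}
curl -s -d '{"hello":"world"}' http://localhost:8080/    # expected: {"output":"echo: {\"hello\":\"world\"}"}
docker stop agent-echo-test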
Error response
{
  "name": "projects/{NUMBER}/locations/us-central1/reasoningEngines/{ID}/operations/{OP_ID}",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1beta1.CreateReasoningEngineOperationMetadata",
    "genericMetadata": {
      "createTime": "2026-05-07T23:59:49.660727Z",
      "updateTime": "2026-05-07T23:59:49.660727Z"
    }
  },
  "done": true,
  "error": {
    "code": 13,
    "message": "Please refer to our documentation (https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/troubleshooting/deploy) for checking logs and other troubleshooting tips."
  }
}
Additional context
- Region: us-central1 is broken, us-east4 works — same project, same payload, same permissions
- The issue started immediately after deleting a previously deployed reasoning engine
- All IAM roles verified (reasoningEngineServiceAgent, artifactregistry.reader, storage.objectAdmin, logging.logWriter)
- Staging bucket exists and is accessible
- Cloud Resource Manager API is enabled
- No VPC-SC configured
- Multiple retries over 2+ hours — issue does not self-heal
- Operations cannot be cancelled via the API ("not cancellable")
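The cancel attempts were plain long-running-operation :cancel calls, roughly as below (sketch; the "not cancellable" wording is paraphrased from the error returned):
# Hypothetical cancel attempt against the failing create operation
curl -X POST \
  "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/{PROJECT}/locations/us-central1/reasoningEngines/{ID}/operations/{OP_ID}:cancel" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)"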
Hypothesis
Deleting the reasoning engine left orphaned internal state (Cloud Run revision, internal AR image reference, or routing configuration) in us-central1 that blocks new reasoning engine deployments from completing their post-deploy verification step. The build and container startup succeed, but the orchestration layer's readiness check fails against stale internal state.
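For triage, the only surface I can check from the outside is the reasoning engines list itself (sketch below); any internal Cloud Run or Artifact Registry state that Agent Engine keeps would not show up here:
# List reasoning engines in the affected region; internal build/serving state is not visible through this API
curl -s \
  "https://us-central1-aiplatform.googleapis.com/v1beta1/projects/{PROJECT}/locations/us-central1/reasoningEngines" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)"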