diff --git a/SKILLS.md b/SKILLS.md
index e4cc2c4..9c4babc 100644
--- a/SKILLS.md
+++ b/SKILLS.md
@@ -55,17 +55,23 @@ These tools walk the Collibra asset relation graph to answer lineage and semanti
 
 These tools query the technical lineage graph — a map of all data objects and transformations across external systems, including unregistered assets, temporary tables, and source code. Unlike business lineage (which only covers assets in the Collibra Data Catalog), technical lineage covers the full physical data flow.
 
-**`search_lineage_entities`** — Search for data entities in the technical lineage graph by name, type, or DGC UUID. Use this as a starting point when you don't have an entity ID. Supports partial name matching and type filtering (e.g. `table`, `column`, `report`). Paginated.
+**Workflow**: Almost all lineage questions follow the same pattern: **(1)** `search_lineage_entities` → **(2)** `get_lineage_upstream` or `get_lineage_downstream` → **(3)** optionally `get_lineage_entity` for the most relevant entities only. Do not resolve every entity ID — summarize from the graph structure and only look up entities the user specifically needs details on. Only call `get_lineage_transformation` when the user asks to see actual SQL or logic.
 
-**`get_lineage_entity`** — Get full metadata for a specific lineage entity by ID: name, type, source systems, parent entity, and linked DGC identifier. Use after obtaining an entity ID from a search or lineage traversal.
+**IMPORTANT — ID types**: Lineage tools use their own internal entity IDs, which are **not** the same as DGC asset UUIDs. You cannot pass a DGC asset UUID directly to `get_lineage_upstream` or `get_lineage_downstream`. To bridge from the catalog to the lineage graph, call `search_lineage_entities` with the asset's UUID as `dgcId` to obtain the lineage entity ID first.
 
-**`get_lineage_upstream`** — Get all upstream entities (sources) for a data entity, along with the transformations connecting them. Use to answer "where does this data come from?". Paginated.
+**LIMITATION — Column-level lineage**: Columns cannot be searched by name in `search_lineage_entities` (`nameContains` does not work for columns). The `dgcId` parameter also does not reliably resolve columns because there is no consistent mapping between Collibra catalog column UUIDs and technical lineage entity IDs. To reach a column in the lineage graph, first find its parent table (by name or `dgcId`), then use `get_lineage_upstream` or `get_lineage_downstream` on the table to discover its columns in the lineage graph.
 
-**`get_lineage_downstream`** — Get all downstream entities (consumers) for a data entity, along with the transformations connecting them. Use to answer "what depends on this data?" or "what is impacted if this changes?". Paginated.
+**`search_lineage_entities`** *(entry point)* — Search by name, type, or DGC UUID. **Start here** for almost all lineage questions to resolve an entity name or DGC asset UUID to a lineage entity ID. Supports partial name matching and type filtering (e.g. `table`, `column`, `report`). Paginated. **Note**: name search and DGC UUID lookup do not work reliably for columns — see limitation above.
 
-**`search_lineage_transformations`** — Search for transformations by name. Returns lightweight summaries. Use to discover ETL jobs or SQL queries by name.
+**`get_lineage_upstream`** *(step 2: trace sources)* — Given a lineage entity ID (not a DGC UUID), returns all upstream source entities and connecting transformations. Use to answer "where does this data come from?". Results contain entity IDs only. Paginated.
 
-**`get_lineage_transformation`** — Get the full details of a transformation, including its SQL or script logic. Use after finding a transformation ID in an upstream/downstream result or search.
+**`get_lineage_downstream`** *(step 2: trace consumers)* — Given a lineage entity ID (not a DGC UUID), returns all downstream consumer entities and connecting transformations. Use for impact analysis: "what depends on this?", "what breaks if this changes?". Results contain entity IDs only. Paginated.
+
+**`get_lineage_entity`** *(follow-up: resolve IDs)* — Get full metadata for a specific lineage entity by its lineage ID (not a DGC UUID): name, type, source systems, parent entity, and linked DGC identifier. Only call this for the most relevant entity IDs from upstream/downstream results — do not resolve every ID.
+
+**`get_lineage_transformation`** *(terminal: view logic)* — Get the full details of a transformation, including its SQL or script logic. Only call when the user explicitly asks about the transformation code. Do not call just to understand the lineage graph.
+
+**`search_lineage_transformations`** *(specialized)* — Search for transformations by name. Only use when the user explicitly asks about a transformation by name. This is **not** a general entry point for lineage questions — start with `search_lineage_entities` instead.
 
 ### Data Contracts
 
@@ -103,13 +109,13 @@ These tools query the technical lineage graph — a map of all data objects and
 ### Trace upstream lineage for a data asset
 1. `search_lineage_entities` with the asset name → get entity ID
 2. `get_lineage_upstream` → relations with source entity IDs and transformation IDs
-3. `get_lineage_entity` for any source entity to get its details
-4. `get_lineage_transformation` for any transformation ID to see the logic
+3. Summarize based on the graph structure — only call `get_lineage_entity` for the most relevant source entities, not all of them
+4. Only call `get_lineage_transformation` if the user explicitly asks to see the SQL or logic
 
 ### Perform impact analysis (downstream)
 1. `search_lineage_entities` with the asset name → get entity ID
 2. `get_lineage_downstream` → relations with consumer entity IDs
-3. Follow up with `get_lineage_entity` for specific consumers as needed
+3. Summarize based on the graph structure — only call `get_lineage_entity` for the most relevant consumers, not all of them
 
 ### Manage a data contract
 1. `list_data_contract` to find the contract UUID
diff --git a/pkg/tools/get_lineage_downstream.go b/pkg/tools/get_lineage_downstream.go
index 7d2b492..0240e60 100644
--- a/pkg/tools/get_lineage_downstream.go
+++ b/pkg/tools/get_lineage_downstream.go
@@ -9,7 +9,7 @@ import (
 )
 
 type GetLineageDownstreamInput struct {
-	EntityId   string `json:"entityId" jsonschema:"Required. ID of the entity to trace downstream from. Can be numeric string or DGC UUID."`
+	EntityId   string `json:"entityId" jsonschema:"Required. The lineage entity ID to trace downstream from (obtained from search_lineage_entities). This is NOT a DGC asset UUID — to go from a catalog asset to a lineage entity ID, first call search_lineage_entities with the asset's UUID as dgcId."`
 	EntityType string `json:"entityType,omitempty" jsonschema:"Optional. Filter to only include entities of this type (e.g. 'table', 'report'). Useful when you only care about specific downstream asset types."`
 	Limit      int    `json:"limit,omitempty" jsonschema:"Optional. Max relations per page. Default: 20, Min: 1, Max: 100."`
 	Cursor     string `json:"cursor,omitempty" jsonschema:"Optional. Pagination cursor from a previous response. Do not construct manually."`
@@ -17,8 +17,12 @@ type GetLineageDownstreamInput struct {
 
 func NewGetLineageDownstreamTool(collibraClient *http.Client) *chip.Tool[GetLineageDownstreamInput, clients.GetLineageDirectionalOutput] {
 	return &chip.Tool[GetLineageDownstreamInput, clients.GetLineageDirectionalOutput]{
-		Name:        "get_lineage_downstream",
-		Description: "Get the downstream technical lineage graph for a data entity -- all direct and indirect consumer entities that are impacted by it, along with the transformations connecting them. This traces through all data objects across external systems (including unregistered assets, temporary tables, and source code), not just assets in the Collibra Data Catalog. Use this to answer \"What depends on this data?\" or \"If this table changes, what else is affected?\" Essential for impact analysis before modifying or deprecating a data asset. Results are paginated.",
+		Name: "get_lineage_downstream",
+		Description: `WORKFLOW: Call this AFTER search_lineage_entities has given you an entity ID. This is the tool for impact analysis and tracing data consumers.
+		Use when the user asks: "what depends on this data?", "what uses this table?", "what breaks if this column changes?", "what reports use this data?", "what is the impact of changing this?".
+		Typical workflow: (1) search_lineage_entities to find the entity ID → (2) get_lineage_downstream with that ID → (3) optionally get_lineage_entity for the most relevant consumer entities only.
+		Returns: a paginated list of relations, each connecting the source entity to a downstream consumer entity ID through transformation IDs. Results contain IDs only — summarize what you can from the graph structure and only call get_lineage_entity for entities the user specifically needs details on.
+		Do not call get_lineage_transformation unless the user explicitly asks about the SQL or transformation logic.`,
 		Handler:     handleGetLineageDownstream(collibraClient),
 		Permissions: []string{},
 	}
diff --git a/pkg/tools/get_lineage_entity.go b/pkg/tools/get_lineage_entity.go
index 9a08e1d..f14a10f 100644
--- a/pkg/tools/get_lineage_entity.go
+++ b/pkg/tools/get_lineage_entity.go
@@ -9,13 +9,16 @@ import (
 )
 
 type GetLineageEntityInput struct {
-	EntityId string `json:"entityId" jsonschema:"Required. Unique identifier of the data entity. Can be a numeric string (e.g. '12345') or a DGC UUID (e.g. '550e8400-e29b-41d4-a716-446655440000')."`
+	EntityId string `json:"entityId" jsonschema:"Required. The lineage entity ID (obtained from search_lineage_entities, get_lineage_upstream, or get_lineage_downstream). This is NOT a DGC asset UUID."`
 }
 
 func NewGetLineageEntityTool(collibraClient *http.Client) *chip.Tool[GetLineageEntityInput, clients.GetLineageEntityOutput] {
 	return &chip.Tool[GetLineageEntityInput, clients.GetLineageEntityOutput]{
-		Name:        "get_lineage_entity",
-		Description: "Get detailed metadata about a specific data entity in the technical lineage graph. Technical lineage covers all data objects across external systems -- including source code, transformations, and temporary tables -- regardless of whether they are registered in Collibra (unlike business lineage, which only covers assets ingested into the Data Catalog). An entity represents any tracked data asset such as a table, column, file, report, API endpoint, or topic. Returns the entity's name, type, source systems, parent entity, and linked Data Governance Catalog (DGC) identifier. Use this when you have an entity ID from a lineage traversal, search result, or user input and need its full details.",
+		Name: "get_lineage_entity",
+		Description: `WORKFLOW: This is a FOLLOW-UP tool for resolving entity IDs you already have. Do not call this as a first step — start with search_lineage_entities instead.
+		Use when you have an entity ID from get_lineage_upstream or get_lineage_downstream results and need to know the entity's name, type, or other metadata. Returns: name, type, source systems, parent entity, and linked DGC identifier.
+		IMPORTANT: Upstream/downstream results return entity IDs only. You do NOT need to resolve every ID — summarize based on entity IDs and only call this tool for the most relevant entities the user asked about. Resolving all IDs wastes tool calls.
+		Do not call this if search_lineage_entities already returned the information you need.`,
 		Handler:     handleGetLineageEntity(collibraClient),
 		Permissions: []string{},
 	}
diff --git a/pkg/tools/get_lineage_transformation.go b/pkg/tools/get_lineage_transformation.go
index 07ea994..e479080 100644
--- a/pkg/tools/get_lineage_transformation.go
+++ b/pkg/tools/get_lineage_transformation.go
@@ -14,8 +14,11 @@ type GetLineageTransformationInput struct {
 
 func NewGetLineageTransformationTool(collibraClient *http.Client) *chip.Tool[GetLineageTransformationInput, clients.GetLineageTransformationOutput] {
 	return &chip.Tool[GetLineageTransformationInput, clients.GetLineageTransformationOutput]{
-		Name:        "get_lineage_transformation",
-		Description: "Get detailed information about a specific data transformation, including its SQL or script logic. A transformation represents a data processing activity (ETL job, SQL query, script, etc.) that connects source entities to target entities in the lineage graph. Use this when you found a transformation ID in an upstream/downstream lineage result and want to see what the transformation actually does -- the SQL query, script content, or processing logic.",
+		Name: "get_lineage_transformation",
+		Description: `WORKFLOW: This is a TERMINAL tool — only call it when the user explicitly wants to see the actual SQL, script, or transformation logic. Requires a transformation ID from a prior get_lineage_upstream or get_lineage_downstream result.
+		Use when the user asks: "show me the SQL", "what logic transforms this data?", "how is this ETL job defined?".
+		Do NOT call this just to understand the lineage graph — get_lineage_upstream and get_lineage_downstream already show which transformations connect entities, which is sufficient for most lineage questions. Only call this when the user wants the actual code or logic.
+		Do NOT call search_lineage_transformations to find a transformation ID if you already have it from upstream/downstream results.`,
 		Handler:     handleGetLineageTransformation(collibraClient),
 		Permissions: []string{},
 	}
diff --git a/pkg/tools/get_lineage_upstream.go b/pkg/tools/get_lineage_upstream.go
index 6fa5ca0..9d8df55 100644
--- a/pkg/tools/get_lineage_upstream.go
+++ b/pkg/tools/get_lineage_upstream.go
@@ -9,7 +9,7 @@ import (
 )
 
 type GetLineageUpstreamInput struct {
-	EntityId   string `json:"entityId" jsonschema:"Required. ID of the entity to trace upstream from. Can be numeric string or DGC UUID."`
+	EntityId   string `json:"entityId" jsonschema:"Required. The lineage entity ID to trace upstream from (obtained from search_lineage_entities). This is NOT a DGC asset UUID — to go from a catalog asset to a lineage entity ID, first call search_lineage_entities with the asset's UUID as dgcId."`
 	EntityType string `json:"entityType,omitempty" jsonschema:"Optional. Filter to only include entities of this type (e.g. 'table', 'column'). Useful when you only care about specific upstream asset types."`
 	Limit      int    `json:"limit,omitempty" jsonschema:"Optional. Max relations per page. Default: 20, Min: 1, Max: 100."`
 	Cursor     string `json:"cursor,omitempty" jsonschema:"Optional. Pagination cursor from a previous response. Do not construct manually."`
@@ -17,8 +17,12 @@ type GetLineageUpstreamInput struct {
 
 func NewGetLineageUpstreamTool(collibraClient *http.Client) *chip.Tool[GetLineageUpstreamInput, clients.GetLineageDirectionalOutput] {
 	return &chip.Tool[GetLineageUpstreamInput, clients.GetLineageDirectionalOutput]{
-		Name:        "get_lineage_upstream",
-		Description: "Get the upstream technical lineage graph for a data entity -- all direct and indirect source entities that feed data into it, along with the transformations connecting them. This traces through all data objects across external systems (including unregistered assets, temporary tables, and source code), not just assets in the Collibra Data Catalog. Use this to answer \"Where does this data come from?\" or \"What are the sources feeding this table?\" Each relation in the result connects a source entity to a target entity through one or more transformations. Results are paginated.",
+		Name: "get_lineage_upstream",
+		Description: `WORKFLOW: Call this AFTER search_lineage_entities has given you an entity ID. This is the tool for tracing data sources.
+		Use when the user asks: "where does this data come from?", "what are the sources for this table?", "how is this column calculated?", "what feeds into this report?".
+		Typical workflow: (1) search_lineage_entities to find the entity ID → (2) get_lineage_upstream with that ID → (3) optionally get_lineage_entity for the most relevant source entities only.
+		Returns: a paginated list of relations, each connecting a source entity ID to the target through transformation IDs. Results contain IDs only — summarize what you can from the graph structure and only call get_lineage_entity for entities the user specifically needs details on.
+		Do not call get_lineage_transformation unless the user explicitly asks about the SQL or transformation logic. The upstream graph already shows which transformations connect entities.`,
 		Handler:     handleGetLineageUpstream(collibraClient),
 		Permissions: []string{},
 	}
diff --git a/pkg/tools/search_lineage_entities.go b/pkg/tools/search_lineage_entities.go
index 11b5db6..969755a 100644
--- a/pkg/tools/search_lineage_entities.go
+++ b/pkg/tools/search_lineage_entities.go
@@ -11,15 +11,19 @@ import (
 type SearchLineageEntitiesInput struct {
 	NameContains string `json:"nameContains,omitempty" jsonschema:"Optional. Partial match on entity name (case insensitive). Min: 1, Max: 256 chars. Example: 'sales'"`
 	Type         string `json:"type,omitempty" jsonschema:"Optional. Exact match on entity type. Common types: table, column, file, report, apiEndpoint, topic. Example: 'table'"`
-	DgcId        string `json:"dgcId,omitempty" jsonschema:"Optional. Filter by Data Governance Catalog UUID. Use to find the lineage entity linked to a specific Collibra catalog asset."`
+	DgcId        string `json:"dgcId,omitempty" jsonschema:"Optional. Filter by Data Governance Catalog UUID. Use to find the lineage entity linked to a specific Collibra catalog asset. Tip: you can pass the 'assetId' from get_asset_details or discover_data_assets here to bridge from the catalog to the lineage graph."`
 	Limit        int    `json:"limit,omitempty" jsonschema:"Optional. Max results per page. Default: 20, Min: 1, Max: 100."`
 	Cursor       string `json:"cursor,omitempty" jsonschema:"Optional. Pagination cursor from a previous response. Do not construct manually."`
 }
 
 func NewSearchLineageEntitiesTool(collibraClient *http.Client) *chip.Tool[SearchLineageEntitiesInput, clients.SearchLineageEntitiesOutput] {
 	return &chip.Tool[SearchLineageEntitiesInput, clients.SearchLineageEntitiesOutput]{
-		Name:        "search_lineage_entities",
-		Description: "Search for data entities in the technical lineage graph by name, type, or DGC identifier. Technical lineage covers all data objects across external systems -- including source code, transformations, and temporary tables -- regardless of whether they are registered in Collibra (unlike business lineage, which only covers assets ingested into the Data Catalog). Returns a paginated list of matching entities. This is typically the starting tool when you don't have a specific entity ID -- for example, to find all tables with \"sales\" in the name, or to find the lineage entity linked to a specific Collibra catalog asset via its DGC UUID. Supports partial name matching (case insensitive).",
+		Name: "search_lineage_entities",
+		Description: `WORKFLOW: This is the ENTRY POINT for almost all lineage questions. Call this first to find entity IDs before using any other lineage tool.
+		Use when the user asks: "where does this data come from?", "what columns are in this report?", "what feeds into this table?", "what depends on this dataset?". Start here to resolve the entity name to an ID.
+		Searches the technical lineage graph (all data objects across external systems, including unregistered assets, temporary tables, and source code — not just Collibra catalog assets). Supports partial name matching (case insensitive), type filtering (table, column, file, report, apiEndpoint, topic), and DGC UUID lookup. Returns entity IDs and names (paginated).
+		LIMITATIONS — Column-level lineage lookups: Columns cannot be searched by name (nameContains does not work for columns). The dgcId parameter also does not reliably resolve columns because there is no consistent mapping between Collibra catalog column UUIDs and technical lineage entity IDs. To reach a column in the lineage graph, first find its parent table (by name or dgcId), then use get_lineage_upstream or get_lineage_downstream on the table to discover its columns in the lineage graph.
+		NEXT STEPS: Use the returned entity ID with get_lineage_upstream (to trace sources) or get_lineage_downstream (to trace consumers). Do not call get_lineage_entity unless you need metadata not already in the search results.`,
 		Handler:     handleSearchLineageEntities(collibraClient),
 		Permissions: []string{},
 	}
diff --git a/pkg/tools/search_lineage_transformations.go b/pkg/tools/search_lineage_transformations.go
index e77669d..c67432a 100644
--- a/pkg/tools/search_lineage_transformations.go
+++ b/pkg/tools/search_lineage_transformations.go
@@ -16,8 +16,10 @@ type SearchLineageTransformationsInput struct {
 
 func NewSearchLineageTransformationsTool(collibraClient *http.Client) *chip.Tool[SearchLineageTransformationsInput, clients.SearchLineageTransformationsOutput] {
 	return &chip.Tool[SearchLineageTransformationsInput, clients.SearchLineageTransformationsOutput]{
-		Name:        "search_lineage_transformations",
-		Description: "Search for transformations in the technical lineage graph by name. Returns a paginated list of matching transformation summaries. Use this to discover ETL jobs, SQL queries, or other processing activities without knowing their IDs. For example, find all transformations with \"etl\" or \"sales\" in the name. To see the full transformation logic (SQL/script), use get_lineage_transformation with the returned ID.",
+		Name: "search_lineage_transformations",
+		Description: `WORKFLOW: This is a SPECIALIZED tool — only use it when the user explicitly asks about a transformation by name (e.g. "find the ETL job called X"). This is NOT a general entry point for lineage questions.
+		For most lineage questions ("where does this data come from?", "what depends on this?"), start with search_lineage_entities instead — that is the correct entry point. Transformation IDs are normally obtained from get_lineage_upstream or get_lineage_downstream results, not from this search.
+		Use when the user asks: "find the transformation named X", "search for ETL jobs matching Y", "list transformations with 'sales' in the name". Returns paginated transformation summaries (ID and name). Use get_lineage_transformation with a returned ID to see the full SQL/logic.`,
 		Handler:     handleSearchLineageTransformations(collibraClient),
 		Permissions: []string{},
 	}