Issue
What is the issue?
Two internal data-access methods issue a growing number of database queries as the dataset scales, creating N+1 query patterns.
__make_cre_links fetches all Links rows for a CRE in one query, then issues a separate SELECT for each linked Node. For a CRE with L links this is 1 + L queries. Since __make_cre_links is called from get_CREs for every CRE being loaded, the total across M CREs becomes M x (1 + L).
__get_all_nodes_and_cres fetches all primary keys first, then re-queries each record individually. For CREs this chains into get_cre_by_db_id, which adds a further SELECT external_id FROM cre WHERE id = ? per CRE before calling get_CREs. Each CRE costs at minimum three round-trips before __make_cre_links is even reached. The same IDs-first pattern is present in the node branch.
__get_all_nodes_and_cres is called on every with_graph() invocation, which loads the in-memory CRE graph, making this the most impactful path.
Expected Behaviour
__make_cre_links should resolve all links and their associated nodes in a single JOIN query rather than one query per node.
__get_all_nodes_and_cres should fetch full ORM objects directly instead of fetching IDs first and re-querying each record individually.
Actual Behaviour
__make_cre_links issues 1 + N queries per CRE call where N is the number of linked nodes.
__get_all_nodes_and_cres issues multiple redundant round-trips per record due to the IDs-first fetch pattern and the get_cre_by_db_id chain.
Steps to reproduce
The behaviour can be observed by enabling SQLAlchemy query logging (SQLALCHEMY_ECHO=True) and calling any endpoint that triggers with_graph() or get_CREs() on a dataset with a reasonable number of CREs and links.
Issue
What is the issue?
Two internal data-access methods issue a growing number of database queries as the dataset scales, creating N+1 query patterns.
__make_cre_linksfetches allLinksrows for a CRE in one query, then issues a separateSELECTfor each linkedNode. For a CRE with L links this is 1 + L queries. Since__make_cre_linksis called fromget_CREsfor every CRE being loaded, the total across M CREs becomes M x (1 + L).__get_all_nodes_and_cresfetches all primary keys first, then re-queries each record individually. For CREs this chains intoget_cre_by_db_id, which adds a furtherSELECT external_id FROM cre WHERE id = ?per CRE before callingget_CREs. Each CRE costs at minimum three round-trips before__make_cre_linksis even reached. The same IDs-first pattern is present in the node branch.__get_all_nodes_and_cresis called on everywith_graph()invocation, which loads the in-memory CRE graph, making this the most impactful path.Expected Behaviour
__make_cre_linksshould resolve all links and their associated nodes in a single JOIN query rather than one query per node.__get_all_nodes_and_cresshould fetch full ORM objects directly instead of fetching IDs first and re-querying each record individually.Actual Behaviour
__make_cre_linksissues 1 + N queries per CRE call where N is the number of linked nodes.__get_all_nodes_and_cresissues multiple redundant round-trips per record due to the IDs-first fetch pattern and theget_cre_by_db_idchain.Steps to reproduce
The behaviour can be observed by enabling SQLAlchemy query logging (
SQLALCHEMY_ECHO=True) and calling any endpoint that triggerswith_graph()orget_CREs()on a dataset with a reasonable number of CREs and links.