
Relatenta v1.1.0 — Research Insight Enhancement Development Specification

Version: 1.1.0
Date: 2026-03-13
Status: Development
Goal: Transform from "visualization tool" to "research insight platform"


1. Overview

1.1 Objective

Add seven new analysis features that provide actionable research insights beyond basic visualization. All features use the existing data model and open-source libraries; no external API or ML model dependencies are introduced.

1.2 Feature List

| # | Feature | New File | Benchmarked From |
|----|---------|----------|------------------|
| F1 | Community Detection & Cluster Visualization | app/services_insight.py | VOSviewer (Leiden algorithm) |
| F2 | Burst Detection (Emerging Topics) | app/services_insight.py | CiteSpace (Kleinberg's burst) |
| F3 | Collaborator Recommendation | app/services_insight.py | ResearchRabbit |
| F4 | Shortest Path Analysis | app/services_insight.py | Inciteful (Literature Connector) |
| F5 | Research Gap Detection | app/services_insight.py | Inciteful + SciSpace (Structural Holes) |
| F6 | Strategic Diagram | app/services_insight.py | SciMAT + Bibliometrix |
| F7 | Thematic Evolution | app/services_insight.py | Bibliometrix (Thematic Evolution Map) |

1.3 Architecture Decision

  • All 7 features are implemented in a single new module: app/services_insight.py
  • Each feature is a standalone function that takes a SQLAlchemy Session and returns structured data
  • UI integration in streamlit_app.py via a new "Insights" tab
  • No new database tables required — all features compute from existing tables
  • New dependency: community (python-louvain) for community detection
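
For illustration, a minimal sketch of that per-feature convention (the function name and parameter list mirror F1 below, but are not final):

```python
# Illustrative shape of a feature function in app/services_insight.py;
# the name and parameters mirror F1 below but are assumptions, not final.
from sqlalchemy.orm import Session

def detect_communities(
    db: Session,
    layer: str = "authors",
    resolution: float = 1.0,
    year_min: int | None = None,
    year_max: int | None = None,
) -> dict:
    """Take a Session plus feature-specific parameters; return plain
    dicts/lists that the Streamlit layer renders without further DB access."""
    ...
```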

2. Detailed Feature Specifications

F1: Community Detection & Cluster Visualization

Purpose: Automatically identify research communities/clusters within the network and color-code them for visual insight.

Algorithm: Louvain community detection (NetworkX + python-louvain)

  • Chosen over Leiden for compatibility (python-louvain is pip-installable; leidenalg requires a C++ build)
  • Modularity-based optimization, resolution parameter exposed to user

Input:

  • db: Session — database session
  • layer: str — "authors", "keywords", "orgs", "nations"
  • resolution: float — resolution parameter (default 1.0, higher = more communities)
  • year_min/year_max: int | None — optional year filter

Processing:

  1. Build NetworkX Graph from existing edge tables (CoauthorEdge, keyword co-occurrence, OrgEdge, NationEdge)
  2. Apply year filter to restrict works/edges
  3. Run community.best_partition(G, resolution=resolution) to assign each node to a community
  4. Compute per-community statistics: size, density, top members
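
A minimal sketch of steps 3-4, assuming the graph G from steps 1-2 has already been built (edge loading from the DB is omitted):

```python
# Sketch of steps 3-4; G is the already-built NetworkX graph.
import networkx as nx
import community as community_louvain  # pip package: python-louvain

def louvain_summary(G: nx.Graph, resolution: float = 1.0) -> dict:
    # Step 3: assign each node to a community
    partition = community_louvain.best_partition(G, resolution=resolution)

    # Step 4: per-community statistics
    communities: dict[int, dict] = {}
    for node, cid in partition.items():
        communities.setdefault(cid, {"nodes": []})["nodes"].append(node)
    for info in communities.values():
        sub = G.subgraph(info["nodes"])
        info["size"] = sub.number_of_nodes()
        info["density"] = round(nx.density(sub), 3)

    return {
        "communities": communities,
        "partition": partition,
        "modularity": community_louvain.modularity(partition, G),
        "num_communities": len(communities),
    }
```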

Output:

{
    "communities": {
        0: {"nodes": ["A1", "A2", ...], "size": 5, "density": 0.8, "label": "Top keyword or author"},
        1: {"nodes": ["A3", "A4", ...], "size": 3, "density": 0.6, "label": "..."},
        ...
    },
    "partition": {"A1": 0, "A2": 0, "A3": 1, ...},  # node_id -> community_id
    "modularity": 0.45,
    "num_communities": 4
}

UI:

  • Dropdown to select resolution parameter (0.5, 1.0, 1.5, 2.0)
  • Graph nodes colored by community assignment
  • Summary table showing community statistics
  • Expandable details per community (members, internal density, keywords)

Complexity: O(n log n) for the Louvain algorithm


F2: Burst Detection (Emerging Topics)

Purpose: Identify keywords/topics that have experienced sudden growth in recent years — indicating emerging research fronts.

Algorithm: Growth rate-based burst detection

  • Calculate year-over-year growth rate for each keyword
  • Identify keywords with sustained high growth over a configurable window
  • Rank by burst score = (recent_count - baseline_count) / baseline_count

Input:

  • db: Session
  • window_years: int — burst detection window (default 3 years)
  • min_papers: int — minimum total papers for a keyword to be considered (default 3)

Processing:

  1. Query WorkKeyword joined with Work.year to get per-keyword per-year counts
  2. Calculate baseline (average count in years before window) and recent (average in window)
  3. Compute burst_score = (recent - baseline) / max(baseline, 1)
  4. Filter keywords with min_papers threshold
  5. Rank by burst_score descending
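
A minimal sketch of steps 2-5, assuming step 1 has produced a counts mapping of keyword to per-year paper counts (with gap years zero-filled); all names are illustrative:

```python
# Sketch of steps 2-5; counts maps keyword -> {year: paper_count}.
def burst_scores(counts: dict[str, dict[int, int]],
                 window_years: int = 3, min_papers: int = 3) -> list[dict]:
    results = []
    for kw, per_year in counts.items():
        if sum(per_year.values()) < min_papers:
            continue  # step 4: too few papers overall
        years = sorted(per_year)
        recent = years[-window_years:]      # the burst window
        baseline = years[:-window_years]    # everything before it
        recent_avg = sum(per_year[y] for y in recent) / len(recent)
        baseline_avg = (sum(per_year[y] for y in baseline) / len(baseline)
                        if baseline else 0.0)
        score = (recent_avg - baseline_avg) / max(baseline_avg, 1)  # step 3
        results.append({"keyword": kw, "burst_score": round(score, 2),
                        "baseline_avg": baseline_avg, "recent_avg": recent_avg})
    # Step 5: rank by burst_score descending
    return sorted(results, key=lambda r: r["burst_score"], reverse=True)
```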

Output:

[
    {
        "keyword_id": 42,
        "keyword": "Federated Learning",
        "burst_score": 4.5,       # 450% growth
        "baseline_avg": 2.0,      # avg papers/year before window
        "recent_avg": 11.0,       # avg papers/year in window
        "trend": [0, 1, 2, 3, 5, 8, 15, 20],  # yearly counts
        "years": [2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025],
        "status": "burst"         # burst | growing | stable | declining
    },
    ...
]

UI:

  • Bar chart of top 15 bursting keywords with burst scores
  • Sparkline mini-charts showing per-keyword trend
  • Status badges (Burst / Growing / Stable / Declining)
  • Configurable window_years slider

F3: Collaborator Recommendation

Purpose: Suggest potential collaborators based on research interest overlap and network proximity.

Algorithm: Multi-signal scoring

  1. Keyword Similarity (Jaccard): |shared_keywords| / |union_keywords| between two authors
  2. Network Proximity: Common neighbor count in co-authorship network
  3. Complementarity Bonus: Reward authors with some overlap but also unique expertise

Input:

  • db: Session
  • author_id: int — target author for recommendations
  • top_n: int — number of recommendations (default 10)

Processing:

  1. Get target author's keyword set from WorkKeyword via WorkAuthor
  2. For each other author NOT already a co-author:
    a. Compute Jaccard similarity of keyword sets
    b. Count common co-authorship neighbors (friends-of-friends)
    c. Compute complementarity = |their_unique_keywords| / |total_keywords|
  3. Score = 0.5 * jaccard + 0.3 * normalized_common_neighbors + 0.2 * complementarity
  4. Rank by score descending, return top_n
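
A minimal sketch of step 3's composite score; the keyword sets and the common-neighbor counts are assumed to come from steps 1-2, and all names are illustrative:

```python
# Sketch of step 3; inputs come from steps 1-2 (keyword sets per author,
# common co-author neighbor counts). max_common is the largest neighbor
# count among candidates, used to normalize that signal to 0-1.
def collaboration_score(target_kw: set, cand_kw: set,
                        common_neighbors: int, max_common: int) -> float:
    union = target_kw | cand_kw
    jaccard = len(target_kw & cand_kw) / len(union) if union else 0.0
    complementarity = len(cand_kw - target_kw) / len(union) if union else 0.0
    neighbors_norm = common_neighbors / max(max_common, 1)
    return 0.5 * jaccard + 0.3 * neighbors_norm + 0.2 * complementarity
```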

Output:

[
    {
        "author_id": 15,
        "author_name": "Dr. Jane Smith",
        "score": 0.72,
        "jaccard_similarity": 0.65,
        "common_neighbors": 3,
        "common_neighbor_names": ["Alice", "Bob", "Carol"],
        "shared_keywords": ["Deep Learning", "NLP"],
        "unique_keywords": ["Reinforcement Learning", "Robotics"],
        "path_length": 2  # shortest path distance in co-authorship network
    },
    ...
]

UI:

  • Select target author from dropdown
  • Ranked table of recommendations with similarity scores
  • Visual breakdown: shared keywords, common neighbors, unique expertise
  • Click to highlight recommended author in graph

F4: Shortest Path Analysis

Purpose: Find the shortest collaboration path between any two researchers — enabling networking opportunity discovery.

Algorithm: NetworkX shortest_path (BFS-based for unweighted graphs)

Input:

  • db: Session
  • source_id: int — starting author
  • target_id: int — destination author
  • layer: str — "authors" (default), extensible to other layers

Processing:

  1. Build NetworkX Graph from CoauthorEdge table
  2. Run nx.shortest_path(G, source, target) to find shortest path
  3. If no path exists, report disconnected
  4. For each intermediate node, fetch author details
  5. For each edge in path, fetch shared publications
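
A minimal sketch of steps 2-3, plus the alternative paths for the all_paths field, assuming the graph G from step 1:

```python
# Sketch of steps 2-3; G is the co-authorship graph from step 1.
# Author details and shared publications (steps 4-5) are attached later.
from itertools import islice
import networkx as nx

def find_path(G: nx.Graph, source_id: int, target_id: int) -> dict:
    try:
        path = nx.shortest_path(G, source_id, target_id)  # BFS on unweighted G
    except (nx.NodeNotFound, nx.NetworkXNoPath):
        return {"path_exists": False}
    # Up to 5 alternative shortest paths for the "all_paths" field
    alternatives = list(islice(nx.all_shortest_paths(G, source_id, target_id), 5))
    return {"path_exists": True, "path_length": len(path) - 1,
            "path": path, "all_paths": alternatives}
```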

Output:

{
    "path_exists": True,
    "path_length": 3,
    "path": [
        {"author_id": 1, "name": "Source Author"},
        {"author_id": 5, "name": "Intermediary 1", "shared_papers_with_prev": 4},
        {"author_id": 8, "name": "Intermediary 2", "shared_papers_with_prev": 2},
        {"author_id": 12, "name": "Target Author", "shared_papers_with_prev": 1}
    ],
    "all_paths": [...]  # up to 5 alternative paths (if available)
}

UI:

  • Two dropdown selectors for source and target authors
  • Visual path display with author names and edge weights
  • "No path found" message if disconnected
  • Alternative paths (if any)

F5: Research Gap Detection (Structural Holes)

Purpose: Identify under-explored research areas by finding "structural holes" in the keyword co-occurrence network — pairs of active keyword clusters that rarely appear together.

Algorithm: Burt's Structural Holes + Bridge Detection

  1. Build keyword co-occurrence network
  2. Detect communities (reuse F1)
  3. Identify cross-community edges with low weight relative to intra-community edges
  4. Compute bridging score for each cross-community keyword pair

Input:

  • db: Session
  • year_min/year_max: int | None
  • min_keyword_count: int — minimum papers for a keyword to be included (default 3)
  • top_n: int — number of gaps to return (default 15)

Processing:

  1. Build keyword co-occurrence Graph from works (filtered by year)
  2. Run community detection on keyword network
  3. For each pair of communities (C_i, C_j):
    a. Count inter-community edges and their total weight
    b. Count intra-community edges for both
    c. Compute gap_score = (intra_density_avg - inter_density) / intra_density_avg
  4. For each gap, identify the most representative keywords from each community
  5. Rank by gap_score * community_size_product
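
A minimal sketch of the gap score in step 3 for one community pair, assuming the keyword co-occurrence graph G and both node lists come from steps 1-2:

```python
# Sketch of step 3 for one community pair; G and the node lists come
# from steps 1-2. Uses unweighted densities for simplicity.
import networkx as nx

def gap_score(G: nx.Graph, nodes_a: list, nodes_b: list) -> float:
    intra_a = nx.density(G.subgraph(nodes_a))
    intra_b = nx.density(G.subgraph(nodes_b))
    inter_edges = sum(1 for u in nodes_a for v in nodes_b if G.has_edge(u, v))
    inter_density = inter_edges / (len(nodes_a) * len(nodes_b))
    intra_avg = (intra_a + intra_b) / 2
    return (intra_avg - inter_density) / intra_avg if intra_avg else 0.0
```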

Output:

[
    {
        "community_a": {"id": 0, "top_keywords": ["Deep Learning", "CNN"]},
        "community_b": {"id": 2, "top_keywords": ["Medical Ethics", "Policy"]},
        "gap_score": 0.85,
        "inter_edges": 2,
        "potential_bridges": ["AI Ethics"],  # keywords that weakly connect both
        "suggestion": "Deep Learning + Medical Ethics: active individually but rarely combined"
    },
    ...
]

UI:

  • Table of research gaps ranked by score
  • Each gap shows the two keyword clusters and their weak connection
  • "Bridge keywords" that weakly span both clusters
  • Expandable suggestion text

F6: Strategic Diagram

Purpose: Map research themes on a 2D strategic diagram (centrality vs density) to classify them as Motor/Niche/Emerging/Declining.

Algorithm: Callon's centrality-density analysis (SciMAT methodology)

  • Centrality (X-axis): External cohesion — how strongly a keyword cluster connects to other clusters
  • Density (Y-axis): Internal cohesion — how strongly keywords within a cluster are interconnected

Input:

  • db: Session
  • year_min/year_max: int | None
  • min_keyword_count: int — minimum papers for inclusion (default 3)

Processing:

  1. Build keyword co-occurrence network (filtered by year)
  2. Run community detection (reuse F1 result)
  3. For each community C_i:
    a. Density = sum of intra-community edge weights / (|C_i| * (|C_i| - 1) / 2)
    b. Centrality = sum of inter-community edge weights (C_i to all other clusters) / (|C_i| * |all_other_nodes|)
  4. Normalize both to 0-1 range
  5. Classify quadrant:
    • Q1 (high centrality, high density) = Motor themes — well-developed and central
    • Q2 (low centrality, high density) = Niche themes — well-developed but peripheral
    • Q3 (low centrality, low density) = Emerging or Declining themes
    • Q4 (high centrality, low density) = Basic/Transversal themes — central but underdeveloped
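
A minimal sketch of step 3 for a single community, following the formulas above; G is the weighted keyword co-occurrence graph and members is one cluster's node set:

```python
# Sketch of step 3 for one community; G is the weighted keyword
# co-occurrence graph, members one cluster's node set.
import networkx as nx

def callon_metrics(G: nx.Graph, members: set) -> tuple[float, float]:
    n = len(members)
    # Edges incident to the cluster, split into intra and inter weights
    intra = sum(d.get("weight", 1) for u, v, d in G.edges(members, data=True)
                if u in members and v in members)
    inter = sum(d.get("weight", 1) for u, v, d in G.edges(members, data=True)
                if (u in members) != (v in members))
    max_intra_pairs = n * (n - 1) / 2
    outside = G.number_of_nodes() - n
    density = intra / max_intra_pairs if max_intra_pairs else 0.0
    centrality = inter / (n * outside) if outside else 0.0
    return centrality, density
```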

Output:

{
    "themes": [
        {
            "cluster_id": 0,
            "label": "Deep Learning / CNN",
            "top_keywords": ["Deep Learning", "CNN", "Image Recognition"],
            "centrality": 0.82,
            "density": 0.75,
            "quadrant": "Motor",
            "size": 12,  # number of keywords
            "total_papers": 150
        },
        ...
    ],
    "median_centrality": 0.5,
    "median_density": 0.5
}

UI:

  • Plotly scatter plot with 4 quadrants
  • X-axis: Centrality, Y-axis: Density
  • Bubble size = number of papers, color = quadrant
  • Quadrant labels: Motor / Niche / Emerging or Declining / Basic & Transversal
  • Hover tooltip showing top keywords and paper count
  • Median lines dividing the quadrants

F7: Thematic Evolution

Purpose: Show how research themes evolve over time — how keyword clusters form, merge, split, or disappear across time periods.

Algorithm: Temporal keyword clustering + Sankey/alluvial flow

  1. Divide time range into periods
  2. Run community detection on each period's keyword network
  3. Map cluster continuity across periods by keyword overlap

Input:

  • db: Session
  • n_periods: int — number of time slices (default 3)
  • min_keyword_count: int — minimum papers per keyword per period (default 2)

Processing:

  1. Get year range from data, divide into n_periods equal intervals
  2. For each period:
    a. Build keyword co-occurrence network (works within that period)
    b. Run community detection
    c. Label each community by its top-2 keywords
  3. For consecutive periods (P_i, P_{i+1}):
    a. For each pair of communities C in P_i and D in P_{i+1}, compute overlap = |keywords_in_C ∩ keywords_in_D| / |keywords_in_C ∪ keywords_in_D|
    b. Create flow edges where overlap > threshold (default 0.1)
    c. Flow weight = overlap * min(size_C, size_D)
  4. Classify evolution events:
    • Continuation: One cluster maps primarily to one cluster
    • Merge: Multiple clusters map to one
    • Split: One cluster maps to multiple
    • Emergence: Cluster with no predecessor
    • Disappearance: Cluster with no successor
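
A minimal sketch of step 3, assuming each period's clusters are available as sets of keywords keyed by node ids like "P0_C0" (ids and names are illustrative):

```python
# Sketch of step 3: flow edges between clusters of consecutive periods.
# prev/curr map a node id (e.g. "P0_C0") to that cluster's keyword set.
def evolution_flows(prev: dict[str, set], curr: dict[str, set],
                    threshold: float = 0.1) -> list[dict]:
    flows = []
    for cid, c_kw in prev.items():
        for did, d_kw in curr.items():
            union = c_kw | d_kw
            overlap = len(c_kw & d_kw) / len(union) if union else 0.0
            if overlap > threshold:  # step 3b
                flows.append({
                    "source": cid,
                    "target": did,
                    "overlap": round(overlap, 2),
                    "weight": overlap * min(len(c_kw), len(d_kw)),  # step 3c
                })
    return flows
```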

Output:

{
    "periods": [
        {"label": "2018-2020", "start": 2018, "end": 2020},
        {"label": "2021-2023", "start": 2021, "end": 2023},
        {"label": "2024-2026", "start": 2024, "end": 2026}
    ],
    "nodes": [
        {"id": "P0_C0", "label": "Deep Learning / CNN", "period": 0, "size": 45},
        {"id": "P0_C1", "label": "NLP / RNN", "period": 0, "size": 30},
        {"id": "P1_C0", "label": "Deep Learning / Transformer", "period": 1, "size": 60},
        ...
    ],
    "flows": [
        {"source": "P0_C0", "target": "P1_C0", "weight": 35, "overlap": 0.65},
        {"source": "P0_C1", "target": "P1_C0", "weight": 20, "overlap": 0.40},
        ...
    ],
    "events": [
        {"type": "merge", "description": "Deep Learning/CNN + NLP/RNN merged into Deep Learning/Transformer"}
    ]
}

UI:

  • Plotly Sankey diagram showing flows between periods
  • Each column = one time period
  • Node height = cluster size (paper count)
  • Flow width = keyword overlap strength
  • Color coding by cluster identity
  • Evolution event annotations

3. UI Integration Plan

3.1 New Tab: "Insights"

Add a 5th tab to the main interface (after Reports, before How-to):

Graph | Heatmaps | Reports | Insights | How-to

3.2 Insights Tab Layout

+--------------------------------------------------+
| Insights                                         |
|                                                  |
| [Analysis Type ▼]  [Run Analysis]                |
|                                                  |
| ┌──── Analysis Options ──────────────────────┐   |
| │ (Dynamic controls based on selected type)  │   |
| └────────────────────────────────────────────┘   |
|                                                  |
| ┌──── Results ───────────────────────────────┐   |
| │ (Charts, tables, visualizations)           │   |
| └────────────────────────────────────────────┘   |
+--------------------------------------------------+

3.3 Analysis Type Options

  1. Community Detection
  2. Emerging Topics (Burst Detection)
  3. Collaborator Recommendation
  4. Shortest Path (Networking Path)
  5. Research Gap Detection
  6. Strategic Diagram
  7. Thematic Evolution
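
A hypothetical sketch of the tab wiring and dispatch in streamlit_app.py; the labels follow the layout above, and the dispatch body is illustrative:

```python
# Hypothetical Insights tab wiring; the call into app/services_insight.py
# and the dynamic options panel are omitted here.
import streamlit as st

ANALYSIS_TYPES = [
    "Community Detection",
    "Emerging Topics (Burst Detection)",
    "Collaborator Recommendation",
    "Shortest Path (Networking Path)",
    "Research Gap Detection",
    "Strategic Diagram",
    "Thematic Evolution",
]

tabs = st.tabs(["Graph", "Heatmaps", "Reports", "Insights", "How-to"])
with tabs[3]:  # the new Insights tab
    analysis = st.selectbox("Analysis Type", ANALYSIS_TYPES)
    if st.button("Run Analysis"):
        # Render feature-specific options and dispatch to services_insight
        st.info(f"Running: {analysis}")
```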

4. Dependencies

4.1 New Package

python-louvain>=0.16    # community detection (Louvain algorithm)

4.2 Existing Packages (no changes)

  • networkx>=3.0 — graph construction, shortest path, centrality
  • plotly>=5.15.0 — Sankey diagram, scatter plot, bar charts
  • pandas>=2.0.0 — data processing

5. File Changes Summary

| File | Change Type | Description |
|------|-------------|-------------|
| app/services_insight.py | NEW | All 7 insight analysis functions |
| streamlit_app.py | MODIFY | Add Insights tab, import services_insight |
| requirements.txt | MODIFY | Add python-louvain |
| VERSION | MODIFY | Update to 1.1.0 |
| CHANGELOG.md | MODIFY | Add v1.1.0 entry |
| README.md | MODIFY | Update feature list and documentation links |

6. Testing Strategy

6.1 Unit Test Scenarios

Each function tested with:

  • Empty database (should return empty/default results)
  • Single author/keyword (edge cases)
  • Normal dataset (Geoffrey Hinton demo data)
  • Year filtering applied

6.2 Integration Test

  • Full workflow: Ingest data -> Run each analysis -> Verify UI renders without error

6.3 Performance Expectations

  • All analyses should complete within 5 seconds for datasets up to 1,000 works
  • Community detection: O(n log n)
  • Shortest path: O(V + E)
  • Burst detection: O(K * Y) where K=keywords, Y=years
  • Strategic diagram: O(K^2) worst case
  • Thematic evolution: O(P * K^2) where P=periods

7. Risk Assessment

| Risk | Mitigation |
|------|------------|
| python-louvain not installable | Fall back to NetworkX greedy_modularity_communities |
| Large dataset performance | Apply top-N filtering before expensive computations |
| No data for analysis | Show informative empty-state messages |
| Community detection returns 1 community | Show message "Network too small or uniform for community detection" |
| No path between authors | Show "No collaboration path found" with explanation |
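
As a sketch of the first mitigation, the fallback can return the same node-to-community mapping shape that python-louvain produces:

```python
# Sketch of the python-louvain fallback; greedy_modularity_communities
# ships with NetworkX itself, so no extra dependency is needed.
import networkx as nx

try:
    import community as community_louvain  # python-louvain

    def partition_graph(G: nx.Graph) -> dict:
        return community_louvain.best_partition(G)

except ImportError:
    def partition_graph(G: nx.Graph) -> dict:
        # Convert frozensets of nodes into a node -> community_id mapping
        comms = nx.community.greedy_modularity_communities(G)
        return {node: cid for cid, nodes in enumerate(comms) for node in nodes}
```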