This tutorial shows how to build a small Streamlit dashboard on top of data retrieved from the 4TU.ResearchData API. The goal is not to build a production application immediately, but to create a clean, teachable foundation that other people can run, extend, and publish.
This is part of a series of tutorials on different ways of using the API of 4TU.ResearchData relevant for researchers and support staff at technical universities.
You will create a small web app that:
- fetches metadata from the 4TU.ResearchData API,
- transforms the API response into a tabular structure with
pandas, - presents the result in Streamlit,
- adds a text search filter,
- adds a simple visualization,
- and can be deployed online with Streamlit Community Cloud.
A dashboard is easier to explain and maintain when you separate it into three layers:
This layer is responsible for talking to the API.
Typical responsibilities:
- define the API endpoint,
- send HTTP requests,
- pass authentication headers when needed,
- handle errors,
- return raw JSON.
In Python, this is usually implemented with requests.
This layer converts the raw JSON into a structure that is easier to use in the UI.
Typical responsibilities:
- extract the list of records from the API response,
- flatten nested JSON,
- rename columns,
- clean missing values,
- derive helper columns such as year, author string, or record URL.
In Python, this is typically done with pandas.
This layer renders the dashboard.
Typical responsibilities:
- page title and introduction,
- controls such as filters and search boxes,
- metrics and tables,
- charts,
- links back to the original records.
In this tutorial, this layer is built with Streamlit.
That gives you a simple mental model:
4TU.ResearchData API -> Python request -> pandas cleanup -> Streamlit dashboard
This separation is worth emphasizing in the tutorial because it makes later extensions much easier. For example, if the API response changes, you usually only update the access or transformation code, not the whole dashboard.
Before starting, complete the environment setup described in the workshop materials repo.
At minimum, make sure you have:
- Python 3.10+
- streamlit
- pandas
- requests
A simple structure for this lesson is:
project/
├── minimal_dashboard.py
├── requirements.txt
├── installation.md
└── tutorial_dashboard.md
-
The original python scripts lives in the workshop materials repo
-
repository file:
minimal_dashboard.py -
tutorial section: “Full minimal example” with the full code
-
follow-up subsections: “How it works” and “Extensions”
from __future__ import annotations
# ============================================================
# 1. Import the libraries we need
# ============================================================
import pandas as pd
import requests
import streamlit as st
# ============================================================
# 2. Basic page setup
# ============================================================
st.set_page_config(page_title="4TU Dataset Dashboard", layout="wide")
st.title("4TU.ResearchData Minimal Dashboard")
st.write("A simple Streamlit dashboard that reads dataset metadata from the 4TU.ResearchData API.")
# ============================================================
# 3. API settings
# ============================================================
BASE_URL = "https://data.4tu.nl"
ARTICLES_ENDPOINT = f"{BASE_URL}/v2/articles"
GROUPS_ENDPOINT = f"{BASE_URL}/v3/groups"
# ============================================================
# 4. Small helper function to request JSON from an endpoint
# ============================================================
def get_json(url: str, params: dict | None = None) -> list | dict:
response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
return response.json()
# ============================================================
# 5. Load institutional groups
# We use this to translate group IDs into group names
# ============================================================
group_map = {}
try:
groups_data = get_json(GROUPS_ENDPOINT)
if isinstance(groups_data, list):
for group in groups_data:
group_id = group.get("id")
group_name = group.get("name")
if group_id is not None and group_name:
group_map[group_id] = group_name
except requests.RequestException as e:
st.warning(f"Could not load groups from the API: {e}")
# ============================================================
# 6. Load a small batch of datasets
# We keep the number small to make the workshop faster
# ============================================================
try:
articles_data = get_json(
ARTICLES_ENDPOINT,
params={
"limit": 500,
"offset": 0,
},
)
except requests.RequestException as e:
st.error(f"Could not load datasets from the API: {e}")
st.stop()
# ============================================================
# 7. Turn the JSON into a simpler table
# We only keep a few fields for the workshop
# ============================================================
rows = []
if isinstance(articles_data, list):
for article in articles_data:
article_id = article.get("id")
title = article.get("title", "No title")
published_date = article.get("published_date")
group_id = article.get("group_id")
institution = group_map.get(group_id, f"Unknown group ({group_id})")
rows.append(
{
"id": article_id,
"title": title,
"published_date": published_date,
"institution": institution,
}
)
df = pd.DataFrame(rows)
# ============================================================
# 8. Clean the date column
# We convert text dates into pandas datetime values
# ============================================================
if not df.empty:
df["published_date"] = pd.to_datetime(df["published_date"], errors="coerce")
# ============================================================
# 9. Sidebar filters
# ============================================================
st.sidebar.header("Filters")
# ---- Institution filter ----
institution_options = ["All"]
if not df.empty and "institution" in df.columns:
unique_institutions = sorted(df["institution"].dropna().unique().tolist())
institution_options.extend(unique_institutions)
selected_institution = st.sidebar.selectbox(
"Institution",
institution_options,
)
# ---- Date filter ----
min_date = None
max_date = None
if not df.empty and df["published_date"].notna().any():
min_date = df["published_date"].min().date()
max_date = df["published_date"].max().date()
selected_start_date = None
selected_end_date = None
if min_date and max_date:
selected_start_date = st.sidebar.date_input("Start date", value=min_date)
selected_end_date = st.sidebar.date_input("End date", value=max_date)
else:
st.sidebar.info("No valid dates available for filtering.")
# ============================================================
# 10. Apply the filters
# ============================================================
filtered_df = df.copy()
if selected_institution != "All":
filtered_df = filtered_df[filtered_df["institution"] == selected_institution]
if selected_start_date and selected_end_date:
filtered_df = filtered_df[
(filtered_df["published_date"].dt.date >= selected_start_date)
& (filtered_df["published_date"].dt.date <= selected_end_date)
]
# ============================================================
# 11. Show a few simple dashboard elements
# ============================================================
col1, col2 = st.columns(2)
with col1:
st.metric("Datasets loaded", len(df))
with col2:
st.metric("Datasets after filtering", len(filtered_df))
st.subheader("Filtered datasets")
st.dataframe(filtered_df, use_container_width=True)
# Optional CSV download
csv_data = filtered_df.to_csv(index=False).encode("utf-8")
st.download_button(
label="Download filtered data as CSV",
data=csv_data,
file_name="filtered_datasets.csv",
mime="text/csv",
)From the project directory, run:
streamlit run minimal_dashboard.pyStreamlit will start a local development server and open the app in your browser.
The minimal dashboard code is organized as a short end-to-end pipeline: it configures the Streamlit page, requests data from the 4TU.ResearchData API, converts the JSON response into a pandas table, applies a couple of filters, and then presents the results in the browser.
The script starts by importing three libraries:
requeststo communicate with the APIpandasto organize the returned JSON into a tablestreamlitto build the dashboard interface
This reflects the three core parts of the workflow:
data access (requests), data transformation (pandas), and data presentation (streamlit).
Next, the script defines some basic interface elements:
- the browser tab title with
st.set_page_config(...) - the main page title with
st.title(...) - a short introductory text with
st.write(...)
This is the first visible part of the dashboard and helps users immediately understand what the application does.
The script then stores the base URL of 4TU.ResearchData and builds two endpoint URLs:
/v2/articlesto retrieve dataset records/v3/groupsto retrieve institutional group information
Using separate variables for these endpoints makes the code easier to read and easier to modify later.
The function get_json() wraps the requests.get() call. It sends a request to a given URL, optionally includes query parameters, waits up to 30 seconds, checks whether the request succeeded, and returns the JSON response.
This is a good teaching pattern because it avoids repeating the same request logic throughout the script. Instead of writing requests.get(...) multiple times, the dashboard can reuse one small function.
Before loading datasets, the script requests the institutional groups from the groups endpoint. It stores them in a dictionary called group_map, where the key is the group ID and the value is the group name.
This matters because dataset records contain a group_id, which is not very meaningful to a human reader on its own. By creating this lookup table first, the dashboard can later translate group IDs into readable institution names. If this request fails, the script does not stop completely; instead, it shows a warning in the app.
The script then requests dataset metadata from the articles endpoint, using two parameters:
limit=500offset=0
This means the app loads the first 500 dataset records. For a workshop, this is a practical choice: it keeps the example fast and simple while still providing enough data to explore. If the dataset request fails, the script shows an error and stops execution with st.stop().
Once the dataset JSON is loaded, the script loops through the returned records and extracts only a small set of fields:
idtitlepublished_dateinstitution
For the institution field, it uses the earlier group_map dictionary to convert group_id into a readable group name. If no match is found, it labels it as an unknown group.
Each simplified record is added to a list called rows, and at the end that list is converted into a pandas DataFrame. This is the main transformation step: a nested JSON response becomes a flat table that is much easier to filter and display.
After creating the DataFrame, the script converts the published_date column into pandas datetime format using pd.to_datetime(...).
This is an important preparation step because date filters work much better when dates are stored as actual datetime values instead of plain text strings. The option errors="coerce" ensures that invalid date values are converted safely rather than crashing the app.
The dashboard then creates a sidebar with two filters:
- an institution filter
- a date range filter
For the institution filter, the script collects the unique institution names from the DataFrame, sorts them, and adds them to a selectbox, with "All" as the default option.
For the date filter, the script calculates the minimum and maximum publication dates present in the data and uses them as the default values in two date_input widgets. If no valid dates are available, the app shows an informational message instead.
The script creates a copy of the original DataFrame called filtered_df and then applies the selected filters step by step.
- If the user selects a specific institution, the table is restricted to rows from that institution.
- If the user selects a start and end date, the table is further restricted to rows whose publication date falls within that range.
This is a clear and useful pattern for teaching dashboard logic: start with the full table, then progressively narrow it down according to user choices.
Finally, the dashboard presents the filtered results in three ways:
- a metric showing the total number of loaded datasets
- a metric showing how many datasets remain after filtering
- a table displaying the filtered records
The metrics are placed side by side using st.columns(2), which gives the app a slightly more dashboard-like layout. Then st.dataframe(...) shows the filtered dataset table in an interactive format.
As a final convenience, the script converts the filtered DataFrame to CSV and adds a Download filtered data as CSV button.
This is a small but very useful feature: it allows users not only to explore the data in the browser, but also to export the current filtered results for reuse in another tool.
Now, make a modification to the code such as it allows users to search datasets by title. Add a search box so, users can filter datasets containing a word in the title
Add the following code bellow the section 10 "Applying the filters" of the main web app code:
# ============================================================
# 10. Apply the filters
# ============================================================
filtered_df = df.copy()Add this below,
# ---- Text search title filter ----
search_term = st.sidebar.text_input("Search in title") #Streamlit widgets return Python variables.
if search_term:
filtered_df = filtered_df[
filtered_df["title"].str.contains(search_term, case=False, na=False)
]Now, make another modification such there is chart showing the number of datasets per institution.
Insert this block in the last module of the main web app code:
# Plotting
st.subheader("Datasets per institution")
institution_counts = filtered_df["institution"].value_counts()
st.bar_chart(institution_counts)The easiest way to publish this dashboard for other people is Streamlit Community Cloud.
Your repository should contain at least:
minimal_dashboard.pyrequirements.txt
A minimal requirements.txt could be:
streamlit
pandas
requestsIf you use extra libraries, add them there as well.
Make sure the app script is committed and pushed.
Go to Streamlit Community Cloud and connect your GitHub account.
Choose:
- the GitHub repository,
- the branch,
- and the main file, for example
minimal_dashboard.py.
Click Deploy. Streamlit will install dependencies and launch the app.
Once deployment succeeds, you will get a public app URL ending in streamlit.app.
If your 4TU.ResearchData API workflow requires a token or other secret, do not hard-code it in the Python file.
For local development, use Streamlit secrets or environment variables. For deployment, add the secret in the Streamlit Community Cloud app settings.
In Streamlit, you would typically read a secret like this:
api_token = st.secrets["API_TOKEN"]And then pass it in the request headers.
Example:
headers = {"Authorization": f"Bearer {api_token}"}
response = requests.get(API_URL, headers=headers, timeout=TIMEOUT)