An R package for efficiently processing customer relationship data, or more generally interval data, to identify and merge consecutive periods separated by minimal gaps. It is built with Rcpp, so the period merging runs in C++, and with data.table for scalable data manipulation.
In essence, the package exists in order to calculate the overall customer relationship timeline per customer from many fragmented activity inputs. It is particularly useful for CRMs with fragmented data such as SAP or Salesforce. It transforms smaller fragments of orders or positions into continuous periods where the subject has been active without a meaningful pause in relationship.
The package now supports two related timeline styles:
| Input type | Best for | Continuity rule | Example threshold |
|---|---|---|---|
| `Date` | tenure, churn, lifecycle, campaign attribution | treat gaps in whole days | `gap_threshold = 1` |
| `POSIXct` | sessions, handoffs, SLA windows, intraday journeys | treat gaps in seconds, minutes, or hours | `gap_threshold = 30, gap_units = "mins"` |
That means you can use the same package to answer both "how long has this customer been active overall?" and "which events belong to the same session or operational window?"
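To make the two continuity rules concrete, here is a minimal base-R sketch that checks whether a gap exceeds a threshold for both input types. It does not call the package. The helper `gap_exceeds()` and the sample vectors are assumptions made purely for illustration.

```r
# Illustrative only: base R, no package calls.
# `gap_exceeds()` is a hypothetical helper, not part of customerRelationship.
gap_exceeds <- function(prev_to, next_from, threshold, units) {
  # Express the gap in the requested units, then compare with the threshold.
  gap <- as.numeric(difftime(next_from, prev_to, units = units))
  gap > threshold
}

# Day-level rule: a 1-day gap is within gap_threshold = 1, a 29-day gap is not.
day_new <- gap_exceeds(
  prev_to   = as.Date(c("2020-01-03", "2020-01-03")),
  next_from = as.Date(c("2020-01-04", "2020-02-01")),
  threshold = 1, units = "days"
)

# Intraday rule: a 15-minute gap is within 30 minutes, a 60-minute gap is not.
min_new <- gap_exceeds(
  prev_to   = as.POSIXct(c("2020-01-01 10:30:00", "2020-01-01 11:00:00"), tz = "UTC"),
  next_from = as.POSIXct(c("2020-01-01 10:45:00", "2020-01-01 12:00:00"), tz = "UTC"),
  threshold = 30, units = "mins"
)

print(day_new)  # FALSE TRUE
print(min_new)  # FALSE TRUE
```

In both cases `TRUE` means the next fragment would open a new period under the stated threshold.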
The package is best understood as an interval-collapsing engine: it merges overlapping or near-adjacent fragments into continuous periods.
Best when continuity is measured in whole days and the collapsed result represents a relationship span:
- Customer tenure and lifecycle duration
- Subscription, contract, membership, or account active spans
- Churn, lapse, and reactivation windows
- Campaign exposure and loyalty analysis
- Coverage, entitlement, or status ranges
Best when continuity is measured within the day and the collapsed result represents an episode or operational window:
- Session stitching for web, app, or product behavior
- Call-center, support, or ownership handoff windows
- SLA coverage and escalation timelines
- Machine uptime or downtime episodes
- Logistics, fulfillment, or delivery event windows
- Same-day pause-and-return behavior
- Solid Performance: C++ implementation via Rcpp for fast period merging
- Scalable: Uses data.table for efficient memory management with large datasets
- Clean API: Simple, well-documented functions for customer timeline processing
- Validated Input: Automatic data validation and type coercion
- Flexible Time Granularity: Handles both day-level `Date` ranges and intra-day `POSIXct` timelines
- Informative: Execution timing information and record counts in output
- R >= 4.0.0
- C++14 compatible compiler (Rtools for Windows, Xcode for macOS, gcc for Linux)
devtools::install_github("patrikios/customerRelationship")

# Set working directory to package root
devtools::load_all()
# Or build and install
devtools::install()

library(customerRelationship)
library(data.table)
# Basic usage
data <- data.table(
ID = c("CUS001", "CUS001", "CUS001", "CUS002"),
From = as.Date(c("2020-01-01", "2020-01-02", "2020-02-01", "2020-01-15")),
To = as.Date(c("2020-01-01", "2020-01-03", "2020-02-05", "2020-01-20")),
CharacteristicBeg = c("Active", "Active", "Active", "Active"),
CharacteristicEnd1 = c("Type1", "Type1", "Type1", "Type1"),
CharacteristicEnd2 = c("Cat_A", "Cat_B", "Cat_B", "Cat_C")
)
timeline <- calculate_customer_timeline(
data,
id_column = "ID",
from_column = "From",
to_column = "To",
characteristic_beg_columns = "CharacteristicBeg",
characteristic_end_columns = c("CharacteristicEnd1", "CharacteristicEnd2")
)
print(timeline)
# Custom column names with multiple characteristics
data2 <- data.table(
CustomerID = c("A", "A", "B"),
StartDate = c("2020-01-01", "2020-01-02", "2020-02-01"),
EndDate = c("2020-01-01", "2020-01-03", "2020-02-05"),
StatusBeg = c("New", "New", "Returning"),
StatusEnd = c("Active", "Active", "Active"),
TypeBeg = c("Basic", "Basic", "Premium"),
TypeEnd = c("Basic", "Premium", "Gold")
)
timeline2 <- calculate_customer_timeline(
data2,
id_column = "CustomerID",
from_column = "StartDate",
to_column = "EndDate",
characteristic_beg_columns = c("StatusBeg", "TypeBeg"),
characteristic_end_columns = c("StatusEnd", "TypeEnd")
)
print(timeline2)
# Datetime timelining with a 30-minute continuity window
events <- data.table(
ID = c("CUS001", "CUS001", "CUS001"),
From = as.POSIXct(
c("2020-01-01 10:00:00", "2020-01-01 10:45:00", "2020-01-01 12:00:00"),
tz = "UTC"
),
To = as.POSIXct(
c("2020-01-01 10:30:00", "2020-01-01 11:00:00", "2020-01-01 12:30:00"),
tz = "UTC"
),
CharacteristicBeg = c("Active", "Active", "Active"),
CharacteristicEnd1 = c("Checkout", "Checkout", "Support"),
CharacteristicEnd2 = c("Web", "Web", "Phone")
)
session_timeline <- calculate_customer_timeline(
events,
id_column = "ID",
from_column = "From",
to_column = "To",
characteristic_beg_columns = "CharacteristicBeg",
characteristic_end_columns = c("CharacteristicEnd1", "CharacteristicEnd2"),
gap_threshold = 30,
gap_units = "mins"
)
print(session_timeline)

For repeated timeline calculations with the same column mapping and options, use
the CustomerTimeline S7 class. It stores the configuration once, then applies it
with calculate_timeline().
processor <- CustomerTimeline(
id_column = "CustomerID",
from_column = "StartDate",
to_column = "EndDate",
characteristic_beg_columns = c("StatusBeg", "TypeBeg"),
characteristic_end_columns = c("StatusEnd", "TypeEnd"),
gap_threshold = 1,
gap_units = "days",
verbose = FALSE
)
timeline <- calculate_timeline(processor, data2)
print(timeline)

The same pattern works for intraday timelines:
session_processor <- CustomerTimeline(
id_column = "ID",
from_column = "From",
to_column = "To",
characteristic_beg_columns = "CharacteristicBeg",
characteristic_end_columns = c("CharacteristicEnd1", "CharacteristicEnd2"),
gap_threshold = 30,
gap_units = "mins",
keep_all_periods = TRUE,
output_columns = c("ID", "From", "To", "period_start"),
verbose = FALSE
)
session_debug <- calculate_timeline(session_processor, events)
print(session_debug)

Use the function API when each call has different options. Use the S7 processor when you want a reusable, validated timeline configuration.
Create an S7 processor that stores the same options accepted by
calculate_customer_timeline(), including column mappings, gap thresholds, time
granularity, debug output, and copy behavior.
processor <- CustomerTimeline(verbose = FALSE)
calculate_timeline(processor, data)

The constructor validates configuration immediately. Scalar column arguments
such as id_column, from_column, and to_column must be single non-empty
character strings; characteristic and output column arguments can be character
vectors.
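As a rough illustration of the scalar-string rule described above, the check could look like the base-R sketch below. `is_scalar_string()` is a hypothetical helper written for this example. It is not the package's internal validator, which may differ in details and error messages.

```r
# Hypothetical helper for illustration; not the package's actual validator.
is_scalar_string <- function(x) {
  # Exactly one value, of type character, not NA, and not the empty string.
  is.character(x) && length(x) == 1L && !is.na(x) && nzchar(x)
}

is_scalar_string("CustomerID")      # TRUE
is_scalar_string("")                # FALSE: empty string
is_scalar_string(c("From", "To"))   # FALSE: more than one value
is_scalar_string(NA_character_)     # FALSE: missing value
```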
Apply a CustomerTimeline processor to a data.frame or data.table. The result is
the same data.table shape returned by calculate_customer_timeline().
Process customer relationship data and merge consecutive periods with gaps <= gap_threshold.
Parameters:
- `data_frame`: A data.frame or data.table with customer relationship records
- `gap_threshold`: Maximum gap between periods to merge. Numeric values are interpreted as days by default, preserving the legacy API. For datetime workflows you can also pass `difftime` values or combine numeric thresholds with `gap_units`. A new period starts only when `From - previous To > gap_threshold`. (default: 1 day)
- `gap_units`: Units for numeric `gap_threshold` values. One of `"auto"`, `"days"`, `"hours"`, `"mins"`, or `"secs"` (default: `"auto"`)
- `id_column`: Name of the customer ID column
- `from_column`: Name of the start date column
- `to_column`: Name of the end date column
- `time_class`: One of `"auto"`, `"date"`, or `"datetime"` to control whether the package preserves daily or intra-day granularity (default: `"auto"`)
- `characteristic_beg_columns`: Column names that should preserve beginning values
- `characteristic_end_columns`: Column names that should take ending values
- `keep_all_periods`: If TRUE, keep the raw internal rows with gap diagnostics for debugging, including a `period_start` column that marks which rows are included in the normal merged-period output (default: FALSE)
- `verbose`: If TRUE, print processing time and result summary (default: TRUE)
- `output_columns`: Columns to include in output. If NULL, includes all relevant columns (default: NULL)
- `include_gap_column`: If TRUE and `keep_all_periods` is TRUE, include the `Difference` column (default: TRUE)
- `copy_data`: If TRUE, work on a copy of the input data; if FALSE, work on the input object directly without copying it (default: TRUE)
Returns: A data.table with merged periods
Output Columns:
- ID column (name specified by `id_column`)
- From column (name specified by `from_column`)
- To column (name specified by `to_column`)
- Beginning characteristic columns (preserve first period values)
- Ending characteristic columns (take last period values)
- `Difference`: Gap between the current row's start and the end of the active merged period. Returned in days for `Date` timelines and as `difftime` seconds for datetime timelines when `keep_all_periods = TRUE` and `include_gap_column = TRUE`
- `period_start`: Logical flag returned when `keep_all_periods = TRUE`. `TRUE` marks rows that start a merged period and would be kept in normal output; `FALSE` marks internal rows that were merged into an active period.
Difference Column:
Difference is a debugging diagnostic that shows how the merge decision was made for each raw row. Internally, the algorithm tracks the first row of the active merged period and extends that period's end whenever another row merges into it. For each later row in the same customer group, Difference is calculated as:
`current From - active merged period To`

For example, with `gap_threshold = 1`:
| ID | From | To |
|---|---|---|
| CUS001 | 2020-01-01 | 2020-01-01 |
| CUS001 | 2020-01-02 | 2020-01-03 |
| CUS001 | 2020-02-01 | 2020-02-05 |
The raw debug output includes:
| ID | From | To | Difference | period_start | Meaning |
|---|---|---|---|---|---|
| CUS001 | 2020-01-01 | 2020-01-03 | NA | TRUE | First row for this customer; starts the active merged period |
| CUS001 | 2020-01-02 | 2020-01-03 | 1 | FALSE | 2020-01-02 - 2020-01-01 = 1, so this row merges and extends the active period to 2020-01-03 |
| CUS001 | 2020-02-01 | 2020-02-05 | 29 | TRUE | 2020-02-01 - 2020-01-03 = 29, so this row starts a new merged period |
The first row for each customer has Difference = NA because there is no active period before it.
To recreate the normal merged-period output from raw debug rows, filter to period_start == TRUE.
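The diagnostics in the table above can be reproduced with a short base-R loop. This is only a sketch for a single customer with `Date` inputs already sorted by start. It is not the package's C++ implementation, and the variable names are chosen for illustration.

```r
# Illustrative base-R sketch for one customer; not the package's C++ code.
from <- as.Date(c("2020-01-01", "2020-01-02", "2020-02-01"))
to   <- as.Date(c("2020-01-01", "2020-01-03", "2020-02-05"))
gap_threshold <- 1

n <- length(from)
difference   <- rep(NA_real_, n)   # first row has no active period before it
period_start <- rep(FALSE, n)
active <- 1L                       # index of the row that opened the active period
period_start[1] <- TRUE

for (i in seq_len(n)[-1]) {
  # Gap in days between this row's start and the active period's current end.
  difference[i] <- as.numeric(from[i] - to[active])
  if (difference[i] > gap_threshold) {
    active <- i                    # gap too large: this row opens a new period
    period_start[i] <- TRUE
  } else {
    to[active] <- max(to[active], to[i])  # merge: extend the active period's end
  }
}

debug_rows <- data.frame(From = from, To = to,
                         Difference = difference, period_start = period_start)
print(debug_rows)

# Filtering to period_start == TRUE leaves one row per merged period.
merged <- debug_rows[debug_rows$period_start, ]
print(merged)
```

Note that the first row's `To` ends up as 2020-01-03 because the loop extends the end of the row that opened the period, which mirrors the first row of the debug table.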
The continuity rule is simple:
- A new period starts only when `From - active merged period To > gap_threshold`
- Overlapping periods merge automatically
- Back-to-back periods merge when the gap is within the threshold
- The first period keeps beginning characteristics, while the last merged fragment contributes ending characteristics
The package implements a period-merging algorithm:
- Sorts records by customer ID and start time
- Iterates through sorted records, tracking each customer's current merged period
- Calculates the gap between the current row and the active merged period for the same customer
- Merges periods if the gap is less than or equal to `gap_threshold` by:
  - Extending the current period's end
  - Updating ending characteristics to the later period's values
- Returns one row per merged period, with optional gap diagnostics when `keep_all_periods = TRUE`
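The steps above can be sketched in plain R for several customers at once. The function `collapse_periods_sketch()` and the column names `ID`, `From`, `To`, `Beg`, and `End` are assumptions made for this example. The package itself performs the merge in C++ and takes column names as arguments. The sketch measures gaps in days, so it only covers the day-level case.

```r
# Hypothetical base-R sketch of the merge steps; not the package's implementation.
# Assumes columns ID, From, To (Date), Beg and End (one characteristic each).
collapse_periods_sketch <- function(df, gap_threshold = 1) {
  df <- df[order(df$ID, df$From), ]        # sort by customer ID and start
  keep <- logical(nrow(df))
  active <- 0L                             # row index of the active merged period
  for (i in seq_len(nrow(df))) {
    same_id <- active > 0L && df$ID[i] == df$ID[active]
    gap <- if (same_id) {
      as.numeric(difftime(df$From[i], df$To[active], units = "days"))
    } else {
      Inf                                  # a new customer always opens a period
    }
    if (gap > gap_threshold) {
      active <- i
      keep[i] <- TRUE
    } else {
      df$To[active]  <- max(df$To[active], df$To[i])  # extend the period's end
      df$End[active] <- df$End[i]          # ending characteristic from later row
    }
  }
  df[keep, ]                               # one row per merged period
}

fragments <- data.frame(
  ID   = c("CUS001", "CUS001", "CUS001", "CUS002"),
  From = as.Date(c("2020-01-01", "2020-01-02", "2020-02-01", "2020-01-15")),
  To   = as.Date(c("2020-01-01", "2020-01-03", "2020-02-05", "2020-01-20")),
  Beg  = c("Active", "Active", "Active", "Active"),
  End  = c("Cat_A", "Cat_B", "Cat_B", "Cat_C")
)

result <- collapse_periods_sketch(fragments, gap_threshold = 1)
print(result)
```

With this input the first two CUS001 fragments collapse into one period ending 2020-01-03 whose `End` value comes from the later fragment, while the February fragment and the CUS002 fragment each remain their own period.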
The package started as a daily relationship engine, but it becomes much richer when you preserve time-of-day:
- Daily tenure timelines: the original CRM use case where continuity means "no break of more than 1 day"
- Session stitching: merge browsing, app, or call-center activity windows separated by only a few minutes
- Same-day reactivation: distinguish a return within 2 hours from a return next week
- SLA coverage: track support ownership, escalation windows, or response continuity inside one business day
- Stateful journeys: compress event logs into phases like onboarding, active use, pause, and reactivation
That gives the package two complementary modes: Date for lifecycle and tenure questions, and POSIXct for true timeline reconstruction.
- Compilation: C++ code is pre-compiled into the package
- Memory: Efficient with data.table's reference semantics
- Speed: Typically processes 1M+ records in seconds on modern hardware
- 1M-row test benchmark: On this workspace, the 1,000,000-row heterogeneous fixture ran 5 times with min 0.11s, median 0.13s, and max 0.15s elapsed time.
# Generate documentation from roxygen comments
devtools::document()
# Check package
devtools::check()
# Run tests
devtools::test()

# Windows/macOS/Linux
R CMD build customerRelationship
R CMD check customerRelationship_*.tar.gz

MIT License - see LICENSE file for details
Contributions are welcome. Please open an issue or submit a pull request.