customerRelationship

An R package for efficiently processing customer relationship data, or more generally interval data, to identify and merge consecutive periods separated by minimal gaps. Built with Rcpp for fast C++ computation and data.table for scalable data manipulation.

Overview

The package calculates each customer's overall relationship timeline from many fragmented activity records. It is particularly useful for CRM systems that store fragmented data, such as SAP or Salesforce. It transforms small fragments, such as orders or positions, into continuous periods during which the customer was active without a meaningful pause in the relationship.

Choose Your Time Granularity

The package now supports two related timeline styles:

| Input type | Best for | Continuity rule | Example threshold |
| --- | --- | --- | --- |
| `Date` | tenure, churn, lifecycle, campaign attribution | treat gaps in whole days | `gap_threshold = 1` |
| `POSIXct` | sessions, handoffs, SLA windows, intraday journeys | treat gaps in seconds, minutes, or hours | `gap_threshold = 30, gap_units = "mins"` |

That means you can use the same package to answer both "how long has this customer been active overall?" and "which events belong to the same session or operational window?"
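To make the two granularities concrete, the base R sketch below computes one gap on `Date` values and one on `POSIXct` values. It does not call the package; the variable names are illustrative only.

```r
# Day-level gap: the end of one fragment versus the start of the next
prev_to   <- as.Date("2020-01-03")
next_from <- as.Date("2020-01-05")
gap_days  <- as.numeric(next_from - prev_to)   # 2
merge_daily <- gap_days <= 1                   # FALSE: the gap exceeds one day

# Intraday gap: the same idea, measured in minutes
prev_to_dt   <- as.POSIXct("2020-01-01 10:30:00", tz = "UTC")
next_from_dt <- as.POSIXct("2020-01-01 10:45:00", tz = "UTC")
gap_mins <- as.numeric(difftime(next_from_dt, prev_to_dt, units = "mins"))  # 15
merge_session <- gap_mins <= 30                # TRUE: within a 30-minute window
```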

Use Cases

The package is best understood as an interval-collapsing engine: it merges overlapping or near-adjacent fragments into continuous periods.

Date-Based Collapsing

Best when continuity is measured in whole days and the collapsed result represents a relationship span:

Customer tenure and lifecycle duration
Subscription, contract, membership, or account active spans
Churn, lapse, and reactivation windows
Campaign exposure and loyalty analysis
Coverage, entitlement, or status ranges

Datetime-Based Collapsing

Best when continuity is measured within the day and the collapsed result represents an episode or operational window:

Session stitching for web, app, or product behavior
Call-center, support, or ownership handoff windows
SLA coverage and escalation timelines
Machine uptime or downtime episodes
Logistics, fulfillment, or delivery event windows
Same-day pause-and-return behavior

Features

Solid Performance: C++ implementation via Rcpp for fast period merging
Scalable: Uses data.table for efficient memory management with large datasets
Clean API: Simple, well-documented functions for customer timeline processing
Validated Input: Automatic data validation and type coercion
Flexible Time Granularity: Handles both day-level Date ranges and intra-day POSIXct timelines
Informative: Execution timing information and record counts in output

Installation

Prerequisites

R >= 4.0.0
C++14 compatible compiler (Rtools for Windows, Xcode for macOS, gcc for Linux)

From GitHub

# install.packages("devtools")  # if devtools is not yet installed
devtools::install_github("patrikios/customerRelationship")

Local Installation

# Set working directory to package root
devtools::load_all()
# Or build and install
devtools::install()

Quick Start

library(customerRelationship)
library(data.table)

# Basic usage
data <- data.table(
  ID = c("CUS001", "CUS001", "CUS001", "CUS002"),
  From = as.Date(c("2020-01-01", "2020-01-02", "2020-02-01", "2020-01-15")),
  To = as.Date(c("2020-01-01", "2020-01-03", "2020-02-05", "2020-01-20")),
  CharacteristicBeg = c("Active", "Active", "Active", "Active"),
  CharacteristicEnd1 = c("Type1", "Type1", "Type1", "Type1"),
  CharacteristicEnd2 = c("Cat_A", "Cat_B", "Cat_B", "Cat_C")
)

timeline <- calculate_customer_timeline(
  data,
  id_column = "ID",
  from_column = "From",
  to_column = "To",
  characteristic_beg_columns = "CharacteristicBeg",
  characteristic_end_columns = c("CharacteristicEnd1", "CharacteristicEnd2")
)
print(timeline)

# Custom column names with multiple characteristics
data2 <- data.table(
  CustomerID = c("A", "A", "B"),
  StartDate = c("2020-01-01", "2020-01-02", "2020-02-01"),
  EndDate = c("2020-01-01", "2020-01-03", "2020-02-05"),
  StatusBeg = c("New", "New", "Returning"),
  StatusEnd = c("Active", "Active", "Active"),
  TypeBeg = c("Basic", "Basic", "Premium"),
  TypeEnd = c("Basic", "Premium", "Gold")
)

timeline2 <- calculate_customer_timeline(
  data2,
  id_column = "CustomerID",
  from_column = "StartDate",
  to_column = "EndDate",
  characteristic_beg_columns = c("StatusBeg", "TypeBeg"),
  characteristic_end_columns = c("StatusEnd", "TypeEnd")
)
print(timeline2)

# Datetime timelining with a 30-minute continuity window
events <- data.table(
  ID = c("CUS001", "CUS001", "CUS001"),
  From = as.POSIXct(
    c("2020-01-01 10:00:00", "2020-01-01 10:45:00", "2020-01-01 12:00:00"),
    tz = "UTC"
  ),
  To = as.POSIXct(
    c("2020-01-01 10:30:00", "2020-01-01 11:00:00", "2020-01-01 12:30:00"),
    tz = "UTC"
  ),
  CharacteristicBeg = c("Active", "Active", "Active"),
  CharacteristicEnd1 = c("Checkout", "Checkout", "Support"),
  CharacteristicEnd2 = c("Web", "Web", "Phone")
)

session_timeline <- calculate_customer_timeline(
  events,
  id_column = "ID",
  from_column = "From",
  to_column = "To",
  characteristic_beg_columns = "CharacteristicBeg",
  characteristic_end_columns = c("CharacteristicEnd1", "CharacteristicEnd2"),
  gap_threshold = 30,
  gap_units = "mins"
)
print(session_timeline)
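As a sanity check on the datetime example, the gaps between its three fragments can be computed by hand in base R. This sketch does not call the package; it only applies the stated rule (a new period starts when the gap exceeds the threshold) to the same timestamps.

```r
from <- as.POSIXct(
  c("2020-01-01 10:00:00", "2020-01-01 10:45:00", "2020-01-01 12:00:00"),
  tz = "UTC"
)
to <- as.POSIXct(
  c("2020-01-01 10:30:00", "2020-01-01 11:00:00", "2020-01-01 12:30:00"),
  tz = "UTC"
)

# Gap between each fragment's start and the previous fragment's end, in minutes.
# Comparing with the previous row is sufficient here because the fragments
# do not overlap.
gaps <- as.numeric(difftime(from[-1], to[-length(to)], units = "mins"))
gaps                               # 15 60

# With a 30-minute threshold, only the 60-minute gap starts a new period
new_period <- c(TRUE, gaps > 30)
sum(new_period)                    # 2
```

Under that rule, one would expect the first two fragments to collapse into a single window and the third to stand alone.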

S7 Processor Workflow

For repeated timeline calculations with the same column mapping and options, use the CustomerTimeline S7 class. It stores the configuration once, then applies it with calculate_timeline().

processor <- CustomerTimeline(
  id_column = "CustomerID",
  from_column = "StartDate",
  to_column = "EndDate",
  characteristic_beg_columns = c("StatusBeg", "TypeBeg"),
  characteristic_end_columns = c("StatusEnd", "TypeEnd"),
  gap_threshold = 1,
  gap_units = "days",
  verbose = FALSE
)

timeline <- calculate_timeline(processor, data2)
print(timeline)

The same pattern works for intraday timelines:

session_processor <- CustomerTimeline(
  id_column = "ID",
  from_column = "From",
  to_column = "To",
  characteristic_beg_columns = "CharacteristicBeg",
  characteristic_end_columns = c("CharacteristicEnd1", "CharacteristicEnd2"),
  gap_threshold = 30,
  gap_units = "mins",
  keep_all_periods = TRUE,
  output_columns = c("ID", "From", "To", "period_start"),
  verbose = FALSE
)

session_debug <- calculate_timeline(session_processor, events)
print(session_debug)

Use the function API when each call has different options. Use the S7 processor when you want a reusable, validated timeline configuration.

Function Reference

`CustomerTimeline(...)`

Create an S7 processor that stores the same options accepted by calculate_customer_timeline(), including column mappings, gap thresholds, time granularity, debug output, and copy behavior.

processor <- CustomerTimeline(verbose = FALSE)
calculate_timeline(processor, data)

The constructor validates configuration immediately. Scalar column arguments such as id_column, from_column, and to_column must be single non-empty character strings; characteristic and output column arguments can be character vectors.
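The scalar-string rule can be pictured with a small base R check. The helper `is_scalar_string()` is hypothetical and written only for illustration; it is not part of the package and may differ from the validation the constructor actually performs.

```r
# Hypothetical check: a single, non-missing, non-empty character string
is_scalar_string <- function(x) {
  is.character(x) && length(x) == 1L && !is.na(x) && nzchar(x)
}

is_scalar_string("CustomerID")       # TRUE
is_scalar_string(c("From", "To"))    # FALSE: more than one value
is_scalar_string("")                 # FALSE: empty string
is_scalar_string(1)                  # FALSE: not character
```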

`calculate_timeline(processor, data_frame, ...)`

Apply a CustomerTimeline processor to a data.frame or data.table. The result is the same data.table shape returned by calculate_customer_timeline().

`calculate_customer_timeline(data_frame, ...)`

Process customer relationship data and merge consecutive periods with gaps <= gap_threshold.

Parameters:

data_frame: A data.frame or data.table with customer relationship records
gap_threshold: Maximum gap between periods to merge. Numeric values are interpreted as days by default, preserving the legacy API. For datetime workflows you can also pass difftime values or combine numeric thresholds with gap_units. A new period starts only when From - previous To > gap_threshold. (default: 1 day)
gap_units: Units for numeric gap_threshold values. One of "auto", "days", "hours", "mins", or "secs" (default: "auto")
id_column: Name of the customer ID column
from_column: Name of the column holding each period's start (date or datetime)
to_column: Name of the column holding each period's end (date or datetime)
time_class: One of "auto", "date", or "datetime" to control whether the package preserves daily or intra-day granularity (default: "auto")
characteristic_beg_columns: Column names that should preserve beginning values
characteristic_end_columns: Column names that should take ending values
keep_all_periods: If TRUE, keep the raw internal rows with gap diagnostics for debugging, including a period_start column that marks which rows are included in the normal merged-period output (default: FALSE)
verbose: If TRUE, print processing time and result summary (default: TRUE)
output_columns: Columns to include in output. If NULL, includes all relevant columns (default: NULL)
include_gap_column: If TRUE and keep_all_periods is TRUE, include the Difference column (default: TRUE)
copy_data: If TRUE, work on a copy of the input data; if FALSE, work on the input object directly without copying it (default: TRUE)
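To see how a numeric gap_threshold relates to gap_units, the sketch below expresses a few thresholds in seconds using base R's `difftime` tools. The helper `threshold_seconds()` is hypothetical and not part of the package; it only illustrates the unit arithmetic and does not cover the "auto" setting.

```r
# Hypothetical helper: express a numeric threshold in seconds for a given unit
threshold_seconds <- function(gap_threshold, gap_units = "days") {
  as.numeric(as.difftime(gap_threshold, units = gap_units), units = "secs")
}

threshold_seconds(1)               # 86400 (1 day)
threshold_seconds(30, "mins")      # 1800
threshold_seconds(2, "hours")      # 7200

# A difftime threshold carries its own unit, so these two are equivalent
as.difftime(30, units = "mins") == as.difftime(0.5, units = "hours")  # TRUE
```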

Returns: A data.table with merged periods

Output Columns:

ID column (name specified by id_column)
From column (name specified by from_column)
To column (name specified by to_column)
Beginning characteristic columns (preserve first period values)
Ending characteristic columns (take last period values)
Difference: Gap between the current row's start and the end of the active merged period. Included only when keep_all_periods = TRUE and include_gap_column = TRUE. Returned in days for Date timelines and as a difftime in seconds for datetime timelines
period_start: Logical flag returned when keep_all_periods = TRUE. TRUE marks rows that start a merged period and would be kept in normal output; FALSE marks internal rows that were merged into an active period.

Difference Column: Difference is a debugging diagnostic that shows how the merge decision was made for each raw row. Internally, the algorithm tracks the first row of the active merged period and extends that period's end whenever another row merges into it. For each later row in the same customer group, Difference is calculated as:

current From - active merged period To

For example, with gap_threshold = 1:

| ID | From | To |
| --- | --- | --- |
| CUS001 | 2020-01-01 | 2020-01-01 |
| CUS001 | 2020-01-02 | 2020-01-03 |
| CUS001 | 2020-02-01 | 2020-02-05 |

The raw debug output includes:

| ID | From | To | Difference | period_start | Meaning |
| --- | --- | --- | --- | --- | --- |
| CUS001 | 2020-01-01 | 2020-01-03 | `NA` | `TRUE` | First row for this customer; starts the active merged period |
| CUS001 | 2020-01-02 | 2020-01-03 | `1` | `FALSE` | `2020-01-02 - 2020-01-01 = 1`, so this row merges and extends the active period to `2020-01-03` |
| CUS001 | 2020-02-01 | 2020-02-05 | `29` | `TRUE` | `2020-02-01 - 2020-01-03 = 29`, so this row starts a new merged period |

The first row for each customer has Difference = NA because there is no active period before it. To recreate the normal merged-period output from raw debug rows, filter to period_start == TRUE.
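The worked example can be reproduced with a few lines of base R. This is a sketch of the described bookkeeping for a single customer, not the package's C++ implementation; the variable names are illustrative.

```r
from <- as.Date(c("2020-01-01", "2020-01-02", "2020-02-01"))
to   <- as.Date(c("2020-01-01", "2020-01-03", "2020-02-05"))
gap_threshold <- 1

difference   <- rep(NA_real_, length(from))
period_start <- rep(TRUE, length(from))
active <- 1L  # index of the row that starts the active merged period

for (i in seq_along(from)[-1]) {
  # current From - active merged period To
  difference[i] <- as.numeric(from[i] - to[active])
  if (difference[i] <= gap_threshold) {
    period_start[i] <- FALSE               # row merges into the active period
    to[active] <- max(to[active], to[i])   # extend the active period's end
  } else {
    active <- i                            # row starts a new merged period
  }
}

debug_rows <- data.frame(From = from, To = to,
                         Difference = difference, period_start = period_start)
debug_rows[debug_rows$period_start, ]      # the normal merged-period rows
```

Here `difference` comes out as `NA, 1, 29` and `period_start` as `TRUE, FALSE, TRUE`, matching the table above.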

Merge Semantics

The continuity rule is simple:

A new period starts only when From - active merged period To > gap_threshold
Overlapping periods merge automatically
Back-to-back periods merge when the gap is within the threshold
The first period keeps beginning characteristics, while the last merged fragment contributes ending characteristics
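The continuity rule reduces to a single comparison. The predicate below is a hypothetical base R illustration for `Date` inputs, not a function exported by the package.

```r
# Hypothetical predicate for the continuity rule
starts_new_period <- function(from, active_to, gap_threshold = 1) {
  as.numeric(from - active_to) > gap_threshold
}

active_to <- as.Date("2020-01-10")
starts_new_period(as.Date("2020-01-08"), active_to)  # FALSE: overlap (negative gap)
starts_new_period(as.Date("2020-01-11"), active_to)  # FALSE: back-to-back, gap of 1
starts_new_period(as.Date("2020-01-12"), active_to)  # TRUE: gap of 2 exceeds 1
```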

Algorithm

The package implements a period-merging algorithm:

1. Sorts records by customer ID and start time
2. Iterates through sorted records, tracking each customer's current merged period
3. Calculates the gap between the current row and the active merged period for the same customer
4. Merges periods if the gap is less than or equal to gap_threshold by:
   - Extending the current period's end
   - Updating ending characteristics to the later period's values
5. Returns one row per merged period, with optional gap diagnostics when keep_all_periods = TRUE
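The steps above can be sketched in plain R. This is a simplified illustration under stated assumptions, not the package's Rcpp code: `merge_periods()` and the column names `ID`, `From`, `To`, `Beg`, and `End` are hypothetical, and only one beginning and one ending characteristic are handled.

```r
merge_periods <- function(df, gap_threshold = 1) {
  # Step 1: sort by customer and start time
  df <- df[order(df$ID, df$From), ]
  keep <- logical(nrow(df))
  active <- 0L  # row index of the active merged period (0 = none yet)

  # Step 2: iterate, tracking the active merged period
  for (i in seq_len(nrow(df))) {
    same_id <- active > 0L && df$ID[i] == df$ID[active]
    # Step 3: gap between this row and the active merged period
    if (same_id && as.numeric(df$From[i] - df$To[active]) <= gap_threshold) {
      # Step 4: merge by extending the end and taking the later ending value
      df$To[active]  <- max(df$To[active], df$To[i])
      df$End[active] <- df$End[i]
    } else {
      active <- i
      keep[i] <- TRUE
    }
  }

  # Step 5: one row per merged period
  out <- df[keep, ]
  rownames(out) <- NULL
  out
}

fragments <- data.frame(
  ID   = c("CUS001", "CUS001", "CUS001", "CUS002"),
  From = as.Date(c("2020-01-01", "2020-01-02", "2020-02-01", "2020-01-15")),
  To   = as.Date(c("2020-01-01", "2020-01-03", "2020-02-05", "2020-01-20")),
  Beg  = c("Active", "Active", "Active", "Active"),
  End  = c("Cat_A", "Cat_B", "Cat_B", "Cat_C")
)
merged <- merge_periods(fragments)
print(merged)
```

With the default threshold of one day, the first two CUS001 fragments collapse into a single period from 2020-01-01 to 2020-01-03 whose ending characteristic is `Cat_B`, giving three merged rows in total.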

Timelining Ideas

The package started as a daily relationship engine, but it becomes much richer when you preserve time-of-day:

Daily tenure timelines: the original CRM use case where continuity means "no break of more than 1 day"
Session stitching: merge browsing, app, or call-center activity windows separated by only a few minutes
Same-day reactivation: distinguish a return within 2 hours from a return next week
SLA coverage: track support ownership, escalation windows, or response continuity inside one business day
Stateful journeys: compress event logs into phases like onboarding, active use, pause, and reactivation

That gives the package two complementary modes: Date for lifecycle and tenure questions, and POSIXct for true timeline reconstruction.

Performance

Compilation: C++ code is compiled when the package is installed, so no compilation happens at call time
Memory: Efficient with data.table's reference semantics
Speed: Typically processes 1M+ records in seconds on modern hardware
1M-row test benchmark: On this workspace, the 1,000,000-row heterogeneous fixture ran 5 times with min 0.11s, median 0.13s, and max 0.15s elapsed time.

Development

Building The Package

# Generate documentation from roxygen comments
devtools::document()

# Check package
devtools::check()

# Run tests
devtools::test()

Building From Source

# Windows/macOS/Linux: run from the directory that contains the package folder
R CMD build customerRelationship
R CMD check customerRelationship_*.tar.gz

License

MIT License - see LICENSE file for details

Contributing

Contributions are welcome. Please open an issue or submit a pull request.
