CausalTrack: Audio-Based Say-Do Gap Detection

A personal research project exploring multimodal AI for detecting the Say-Do Gap in user research.

πŸŽ“ Part of my Computational Product Research learning journey
πŸ“… February 2026
πŸ”¬ Methodology validation study


Knowledge graph of 50 synthetic user interviews revealing the Say-Do Gap: 611 nodes (2 user segments, 50 interviews, 559 behavioral cues)



The Problem

Research shows that 42% of startup failures are attributed to misreading market demand: building products that users said they wanted during research but refused to adopt at launch.

One potential culprit: Social Desirability Bias, the tendency of users to soften negative feedback in order to be polite.

Someone might say: "This feature is easy to use."

But their audio reveals: "It's... [pause 3s]... easy." [frustrated tone]

Traditional text-based research misses this gap entirely.


The Hypothesis

Can we automatically detect the Say-Do Gap by analyzing audio behavioral cues with multimodal AI? The study targets five cue types:

  • Pauses (>2 seconds)
  • Vocal hesitation ("um", "uh")
  • Frustrated tone
  • Confused tone
  • Sentiment mismatch
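As an illustration, the cue types above can be encoded as a small rule set. This is a hypothetical sketch, not the project's actual detector: only the 2-second pause threshold, the filler words ("um", "uh"), and the tone labels come from the list above; the data shapes and function name are assumptions.

```python
# Hypothetical rule set for the behavioral cues listed above.
# Thresholds and labels come from the README; everything else is illustrative.

PAUSE_THRESHOLD_S = 2.0
FILLERS = {"um", "uh"}
NEGATIVE_TONES = {"frustrated", "confused"}

def classify_cues(segments):
    """segments: list of dicts with optional keys
    'pause_before_s' (float), 'tokens' (list[str]), 'tone' (str)."""
    cues = []
    for seg in segments:
        # Pauses longer than 2 seconds count as a cue
        if seg.get("pause_before_s", 0) > PAUSE_THRESHOLD_S:
            cues.append(("pause", seg["pause_before_s"]))
        # Vocal hesitation markers ("um", "uh")
        for tok in seg.get("tokens", []):
            if tok.lower().strip(",.") in FILLERS:
                cues.append(("vocal_marker", tok))
        # Frustrated or confused tone flagged by the audio model
        if seg.get("tone") in NEGATIVE_TONES:
            cues.append((seg["tone"], seg.get("tone_confidence")))
    return cues

example = [
    {"pause_before_s": 3.0, "tokens": ["It's", "easy."], "tone": "frustrated"},
]
print(classify_cues(example))  # [('pause', 3.0), ('frustrated', None)]
```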

The Experiment

Dataset: 50 synthetic user interviews (AcmeCal - a fictional camping gear rental marketplace)

Bias Injection:

  • 40 Admin users (smooth experience)
  • 10 End Users (friction-filled experience)

Analysis Pipeline:

  1. Text β†’ Audio (OpenAI TTS)
  2. Audio β†’ Behavioral Cues (Gemini 3 Pro)
  3. Cues β†’ Knowledge Graph (Neo4j)
  4. Graph β†’ Say-Do Consistency Score

The Results

Say-Do Consistency Scores:

| User Segment | Score | Interpretation |
|--------------|-------|----------------|
| Admin Users  | 83.0% | HIGH CONSISTENCY - Trustworthy feedback ✅ |
| End Users    | 3.2%  | LOW CONSISTENCY - Hidden problems detected ⚠️ |

Bias Gap: 79.8 percentage points
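The headline numbers are internally consistent. A quick check, using only figures reported above (the bias gap and the friction ratio from the Key Findings):

```python
# Sanity-check the reported figures (all values taken from the results above).
admin_score, end_user_score = 83.0, 3.2   # say-do consistency scores (%)
bias_gap = admin_score - end_user_score
print(f"Bias gap: {bias_gap:.1f} percentage points")  # 79.8

admin_cues, end_user_cues = 9.3, 18.8     # behavioral cues per interview
friction_ratio = end_user_cues / admin_cues
print(f"Friction ratio: {friction_ratio:.1f}x")       # 2.0x
```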


Key Findings

  1. End Users showed 2X more behavioral friction (18.8 vs 9.3 cues per interview)
  2. But expressed similar verbal sentiment to Admin users
  3. The audio revealed what the text concealed

Example: Frustrated Moments (End Users)

| Timestamp | Quote | Behavioral Cue |
|-----------|-------|----------------|
| 02:24 | "I can't tell if I selected it or not." | Frustrated tone |
| 03:06 | "Honestly, it's kind of confusing..." | Pause (4s) + vocal markers |
| 03:38 | "This is taking a while." | Frustrated tone |

Yet in verbal summaries, these users described the experience as "generally positive."


Architecture

High-Level System Design

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Text Scripts    β”‚
β”‚ (50 interviews) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Audio Files     β”‚
β”‚ (OpenAI TTS)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Gemini 3 Pro            β”‚
β”‚ (Audio-Direct Analysis) β”‚
β”‚ - Extract pauses        β”‚
β”‚ - Detect vocal markers  β”‚
β”‚ - Analyze tone          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Behavioral Cues     β”‚
β”‚ (559 cues extracted)β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Neo4j Knowledge     β”‚
β”‚ Graph               β”‚
β”‚ - 50 Interviews     β”‚
β”‚ - 559 Cues          β”‚
β”‚ - 2 User Segments   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Say-Do Consistency  β”‚
β”‚ Score Calculation   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Tech Stack

Intelligence:

  • Gemini 3 Pro (gemini-3.0-pro) - Multimodal audio analysis
  • OpenAI TTS (tts-1) - Synthetic audio generation

Infrastructure:

  • Neo4j Aura - Knowledge graph storage
  • Python 3.11+ - Analysis pipeline
  • Google Cloud Run - Deployment (planned)

Demo Video

🎬 Watch the demo

See the full methodology in action, including:

  • Live Neo4j graph queries
  • Say-Do Score calculation
  • Behavioral cue extraction examples
  • Evidence of the 79.8 point bias gap

White Paper

πŸ“„ Read the full methodology paper

Detailed explanation of:

  • Theoretical foundation (Social Desirability Bias)
  • Experimental design
  • Results analysis
  • Limitations & future work

Limitations & Future Research

This is a methodology validation study with important limitations:

  1. Synthetic data only - Real human validation needed
  2. English language only - Cross-cultural generalization unknown
  3. Controlled TTS voices - Natural speech variation not tested
  4. Single product domain - B2B/Enterprise contexts may differ

Next step: a validation study across diverse populations and contexts, with research collaborators sought for academic publication (e.g. a CHI 2027 paper).


Example Output

Sample behavioral cue extraction:

{
  "interview_id": "end_user_script_03",
  "behavioral_cues": [
    {
      "timestamp": "01:38",
      "type": "pause",
      "duration": "3 seconds",
      "context": "Before confirming selection"
    },
    {
      "timestamp": "02:24",
      "type": "frustrated",
      "quote": "I can't tell if I selected it or not.",
      "context": "Attempting to complete task"
    },
    {
      "timestamp": "03:06",
      "type": "vocal_marker",
      "quote": "Um, uh, this is kind of confusing",
      "context": "Navigation confusion"
    }
  ],
  "say_do_score": 3.2,
  "interpretation": "LOW CONSISTENCY - Hidden problems detected"
}
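The sample output above can be post-processed directly. A minimal sketch that tallies cue types from the JSON (the field names are taken from the example; the aggregation itself is an assumption, not part of the pipeline):

```python
import json
from collections import Counter

# Sample report, copied from the README's example output above.
sample = """{
  "interview_id": "end_user_script_03",
  "behavioral_cues": [
    {"timestamp": "01:38", "type": "pause", "duration": "3 seconds",
     "context": "Before confirming selection"},
    {"timestamp": "02:24", "type": "frustrated",
     "quote": "I can't tell if I selected it or not.",
     "context": "Attempting to complete task"},
    {"timestamp": "03:06", "type": "vocal_marker",
     "quote": "Um, uh, this is kind of confusing",
     "context": "Navigation confusion"}
  ],
  "say_do_score": 3.2,
  "interpretation": "LOW CONSISTENCY - Hidden problems detected"
}"""

report = json.loads(sample)
counts = Counter(cue["type"] for cue in report["behavioral_cues"])
print(report["interview_id"], dict(counts))
# end_user_script_03 {'pause': 1, 'frustrated': 1, 'vocal_marker': 1}
```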

Methodology Validation Study

Seeking research collaborators for academic conferences (e.g. CHI 2027 paper).

If you're a UX researcher, product researcher, or in a similar product role and interested in validating this methodology:

πŸ“§ Sign up here

What's involved:

  • Provide 1-2 real user research sessions (audio recordings)
  • Receive CausalTrack analysis report
  • Validate findings against your expert judgment
  • Co-author credit on CHI submission (if desired)

Target: Researchers or relevant roles across diverse domains


License

Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

You are free to:

  • Use this methodology for academic research
  • Share and adapt the approach
  • Cite this work in publications

Under these terms:

  • Attribution required - Credit this work appropriately
  • Non-commercial - Not for commercial use without permission

For commercial licensing inquiries: kgkazakos@gmail.com


Citation

If you use this methodology in your research:

@misc{causaltrack2026,
  author = {Kostas Kazakos},
  title = {CausalTrack: Audio-Based Behavioral Truth Detection for User Research},
  year = {2026},
  month = {February},
  url = {https://github.com/kgkazakos/causaltrack},
  note = {Methodology validation study}
}

Acknowledgments

Built with:

  • Gemini 3 Pro by Google DeepMind
  • Neo4j graph database
  • OpenAI TTS for synthetic audio

Inspired by decades of research on Social Desirability Bias and the Say-Do Gap in behavioral science.


Contact

Personal Research Project
Not affiliated with any employer.

πŸ“§ Email: kgkazakos@gmail.com
πŸ’Ό LinkedIn: www.linkedin.com/in/kazakosk/


Last Updated: February 23, 2026
Status: Methodology Validation Phase
Next Milestone: Scaled validation for academic conference
