blog: cohort analysis #4088
nathan-contino
left a comment
Lots of feedback on minor style issues. The incorrect chart is the biggest issue, but hopefully you'll incorporate most or all of this feedback.
| @@ -0,0 +1,301 @@ | |||
| --- | |||
| publish_date: 2025-12-01 | |||
Let's update this to the current year, at least :)
| publish_date: 2025-12-01 | ||
| title: "Analyzing Why Customers Leave — Cohort Analysis in FusionAuth" | ||
| description: "Using login data to calculate helpful business statistics." | ||
| authors: Person McPersonface |
Let's use a real person. Would you like to use your own name here?
I've added Dan's name as the author, as I'm not comfortable recommending that readers install modules without the safety of containers.
| - [Introduction](#introduction) | ||
| - [FusionAuth Native Reports](#fusionauth-native-reports) | ||
| - [An Overview Of Cohort Analysis](#an-overview-of-cohort-analysis) | ||
| - [Understanding Additional User Charts](#understanding-additional-user-charts) | ||
| - [Total Users Charts](#total-users-charts) | ||
| - [User Acquisition Charts](#user-acquisition-charts) | ||
| - [User Age Chart](#user-age-chart) | ||
| - [Logins Per Year Charts](#logins-per-year-charts) | ||
| - [Percent Logins Per Year Charts](#percent-logins-per-year-charts) | ||
| - [Abandonment Charts](#abandonment-charts) | ||
| - [Activity Cohort Chart](#activity-cohort-chart) | ||
| - [Returning Users Chart](#returning-users-chart) | ||
| - [Friction Chart](#friction-chart) | ||
| - [Login Frequency Chart](#login-frequency-chart) | ||
| - [Cohort Analysis Chart](#cohort-analysis-chart) | ||
| - [How To Run The Chart Code For Your FusionAuth Instance](#how-to-run-the-chart-code-for-your-fusionauth-instance) | ||
| - [Extract Customer Data](#extract-customer-data) | ||
| - [Calculate And Display The Charts](#calculate-and-display-the-charts) | ||
| - [Summary](#summary) | ||
| - [Appendix — How To Create Fake Customer Data In FusionAuth](#appendix--how-to-create-fake-customer-data-in-fusionauth) |
I don't think a TOC at the top of the page is especially helpful, particularly for blog posts. Can we remove this?
| - [Appendix — How To Create Fake Customer Data In FusionAuth](#appendix--how-to-create-fake-customer-data-in-fusionauth) | ||
| ## Introduction |
I prefer to omit the Introduction header for docs content. It takes up a lot of vertical space at no benefit to the user.
| If you use an authentication gateway like FusionAuth and want to improve your business's customer acquisition and retention, you need to perform user cohort analysis. Without analyzing your users' behavior, you have no information to guide your product development. Cohort analysis allows you to answer questions like: | ||
| - How frequently do customers use your app? | ||
| - How many people register but quickly lose interest in your service? | ||
| - Did new features or advertising campaigns entice new users or irritate existing users? | ||
| This article demonstrates how to use FusionAuth to track customer statistics: retention rates, user age analyses, and customer cohorts (groups). While you need to investigate your own application database for the reasons *why* customers stay or leave, having a base of FusionAuth login statistics to work from lets you know *whom* to analyze. |
| Understanding users' behavior can help guide product development and improve customer retention. This article demonstrates how to use FusionAuth to track customer statistics. Understanding *why* customers stay or leave might require data from your application database. But FusionAuth login statistics can help you understand *who* your customers are, and help you sort them into cohorts (groups). Cohort analysis allows you to answer questions like: | |
| - How frequently do customers use your app? | |
| - How many people register but quickly lose interest in your service? | |
| - Did new features or advertising campaigns entice new users or irritate existing users? |
The first paragraph here is a shade too wordy for my liking. I think we can remove all of the auth gateway/FusionAuth mentions, as well as the rather unwieldy phrase "business's customer acquisition and retention" -- after all, cohort analysis is useful for just about any software! Even better, if we rephrase "analyzing your users' behavior" to "understanding", we sound more like humans and less like aliens.
Additionally, this section puts the cart before the horse by defining "cohort analysis" before explaining what a cohort is (or how it relates to FusionAuth). My suggestion above moves the bullet points defining cohort analysis to the bottom of this section, so users get a chance to understand our context before we start introducing new terms.
| Browse to http://localhost:7777 to view your charts. | ||
| The app reads all data from `users.json` in the `main()` function, calculates the charts, then inserts them into `5page.html`, and returns that page when requested. It calculates each chart in parallel, using the `runParallel()` function in the `getChartData()` function. The chart calculations themselves are just loops through the users and login dates. |
| The web server reads all data from `users.json` in the `main()` function, calculates the charts, then inserts them into `5page.html`, and returns that page when requested. Each chart calculates in parallel, using the `runParallel()` function in the `getChartData()` function. |
Suggestion: let's not start using the term 'app' here, if we haven't used it before. Slight rephrases to active voice.
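For reviewers who want to sanity-check the "calculates in parallel" claim, here is a minimal sketch of how a helper like `runParallel()` could fan out chart calculations with goroutines. The `User` fields, the `chartFunc` type, the two sample chart functions, and this `runParallel` signature are all assumptions for illustration and may not match the code in `4app.go`.

```go
package main

import (
	"fmt"
	"sync"
)

// User is a hypothetical, trimmed-down record; the real shape in users.json may differ.
type User struct {
	Email      string
	LoginDates []int64 // assumed epoch milliseconds, oldest first
}

// chartFunc computes one chart's data series from the full user list.
type chartFunc func(users []User) []int

// runParallel runs each chart calculation in its own goroutine and
// collects the results by chart name once all of them have finished.
func runParallel(users []User, charts map[string]chartFunc) map[string][]int {
	results := make(map[string][]int, len(charts))
	var mu sync.Mutex
	var wg sync.WaitGroup
	for name, fn := range charts {
		wg.Add(1)
		go func(name string, fn chartFunc) {
			defer wg.Done()
			data := fn(users) // users is only read, so no lock is needed here
			mu.Lock()         // the shared results map needs a lock for writes
			results[name] = data
			mu.Unlock()
		}(name, fn)
	}
	wg.Wait()
	return results
}

// totalUsers is a sample chart: a single value with the user count.
func totalUsers(users []User) []int { return []int{len(users)} }

// totalLogins is a sample chart: a single value with the login count.
func totalLogins(users []User) []int {
	n := 0
	for _, u := range users {
		n += len(u.LoginDates)
	}
	return []int{n}
}

func main() {
	users := []User{
		{Email: "a@example.com", LoginDates: []int64{1, 2}},
		{Email: "b@example.com", LoginDates: []int64{3}},
	}
	out := runParallel(users, map[string]chartFunc{
		"totalUsers":  totalUsers,
		"totalLogins": totalLogins,
	})
	fmt.Println(out["totalUsers"], out["totalLogins"]) // prints "[2] [3]"
}
```

Because every chart function only reads the user slice, the only shared state that needs protecting is the results map.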
| You can also edit `4app.go` and `5page.html` to add any new charts you need. LLMs handle Go and HTML code very well due to the languages' simplicity. If you provide both files to any chatbot, it should instantly be able to add any chart you want. | ||
| If you want even deeper analysis on a regular basis, you need to dive into the world of business intelligence (BI): data extraction, denormalization, and dashboards. | ||
We already told users how to add and edit charts. Let's not repeat ourselves and oversell.
| If you haven't done so already, use Git to clone the [accompanying repository](https://github.com/ritza-co/fusionauth-user-charts), or download and unzip it. | ||
| ```sh | ||
| git clone https://github.com/ritza-co/fusionauth-user-charts.git | ||
| cd fusionauth-user-charts | ||
| ``` |
Same feedback as on the copy/pasted snippet above: split these commands up, remove 'download and unzip it', and mark the snippets as console.
| Edit the `1createMockData.go` file. Adjust the URL, authorization key, and application Id in the code above to match your FusionAuth instance. Then run the file with the following command: | ||
| ```sh | ||
| docker run --init -it --rm --platform linux/amd64 --name "app" --network faNetwork -v .:/app -v ./gocache:/go/pkg -v ./buildcache:/root/.cache/go-build -w /app golang:1.25-bookworm sh -c "go run 1createMockData.go" | ||
| ``` | ||
| The only parameter you need to change is `--network faNetwork`. Ensure your FusionAuth instance is running on the same network as the extraction code (`localhost` won't work in Docker). Alternatively, if you run FusionAuth remotely, remove the network parameter entirely. | ||
| If you installed Go directly on your computer, you can run the following command instead of using Docker: | ||
| ```sh | ||
| go run 1createMockData.go | ||
| ``` |
Same feedback as on the copy/pasted snippet above regarding the placeholder, etc.
| Once the users are registered, you need to randomize their registration dates, set 5% of user email addresses to unverified, and create thousands of login dates. To do this, you can either connect to your FusionAuth database in a SQL browser and run the contents of `2createMockData.sql`, or use the following command with Docker: | ||
| ```sh |
| ```console |
|
@nathan-contino sorry, I should have mentioned I was reviewing this as well and will have feedback for the author too. |
| @@ -0,0 +1,301 @@ | |||
| --- | |||
| publish_date: 2025-12-01 | |||
| title: "Analyzing Why Customers Leave — Cohort Analysis in FusionAuth" | |||
| title: Cohort Analysis |
| - **Time Horizon:** The period over which you track that metric (for example, Day 0, Day 1, or Week 4). | ||
| Cohort analysis is useful for distinguishing between **growth** and **retention**. For example, a product may have 10,000 active users, which looks healthy. However, cohort analysis reveals that 9,000 of those users registered this month, and only 5% of users from six months ago ever returned. This indicates an abandonment problem, where the organization successfully acquires users but fails to keep them. | ||
In the same vein, add another example with an app that has only 1,000 users but 95% of them returning. That would be a healthier app, even if it has fewer users.
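To make the contrast concrete, here is a minimal sketch of the arithmetic behind both examples. The cohort sizes (2,000 and 200 users) are made-up numbers chosen only to reproduce the 5% and 95% figures, and `returningShare` is a hypothetical helper, not a function from the accompanying repository.

```go
package main

import "fmt"

// returningShare returns the percentage of a registration cohort that
// logged in again after the period in which they registered.
func returningShare(cohortSize, returned int) float64 {
	if cohortSize == 0 {
		return 0 // avoid dividing by zero for an empty cohort
	}
	return 100 * float64(returned) / float64(cohortSize)
}

func main() {
	// Product A: large user base, but the six-month-old cohort has mostly left.
	// Assumed cohort: 2,000 users registered six months ago, 100 came back.
	fmt.Printf("Product A retention: %.0f%%\n", returningShare(2000, 100))

	// Product B: small user base, but nearly everyone keeps returning.
	// Assumed cohort: 200 users registered six months ago, 190 came back.
	fmt.Printf("Product B retention: %.0f%%\n", returningShare(200, 190))
}
```

The totals (10,000 versus 1,000 active users) never enter the calculation, which is why a headline user count on its own cannot distinguish growth from retention.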
| The age chart groups users by the number of years they have used your service. It's an indication of how much experience the average user has with your business. | ||
| Notice that this chart is the mirror image of the new users per year chart. |
| This chart is a mirror image of the new users per year chart. If a user registered in 2020, then in 2026, they will have been registered for six years. |
| The logins per year and month charts show how active your users are. These charts, and some of the later charts, depend on the expiry duration of your login tokens. Longer expiry times mean that users log in less frequently. You can adjust the sample code to analyze a period that makes sense for your configuration. | ||
| Sometimes it's more meaningful to work with deduplicated logins for a period. In other words, if a user logs in more than once in a year or month, they are counted as logging in only once in that period. This prevents a small fraction of users who log in frequently from making the service appear more used than it actually is. |
Don't mention this unless we show how to do deduplication.
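If the paragraph stays, one possible way to show deduplication is sketched below. It counts each user at most once per calendar year. The function name `uniqueLoginsPerYear`, the map-based input, and the assumption that login dates are epoch milliseconds interpreted in UTC are all illustrative choices, not the repository's implementation.

```go
package main

import (
	"fmt"
	"time"
)

// uniqueLoginsPerYear counts each user at most once per calendar year (UTC).
// logins maps a user identifier to that user's login times in epoch milliseconds.
func uniqueLoginsPerYear(logins map[string][]int64) map[int]int {
	counts := make(map[int]int)
	for _, dates := range logins {
		seen := make(map[int]bool) // years already counted for this user
		for _, ms := range dates {
			year := time.UnixMilli(ms).UTC().Year()
			if !seen[year] {
				seen[year] = true
				counts[year]++
			}
		}
	}
	return counts
}

func main() {
	// ms converts a calendar date to epoch milliseconds for the sample data.
	ms := func(y int, m time.Month, d int) int64 {
		return time.Date(y, m, d, 12, 0, 0, 0, time.UTC).UnixMilli()
	}
	logins := map[string][]int64{
		// A frequent user: three logins in 2024 and one in 2025.
		"heavy@example.com": {ms(2024, 1, 5), ms(2024, 3, 9), ms(2024, 11, 2), ms(2025, 2, 1)},
		// An occasional user: one login in 2025.
		"light@example.com": {ms(2025, 6, 30)},
	}
	counts := uniqueLoginsPerYear(logins)
	fmt.Println(counts[2024], counts[2025]) // prints "1 2"
}
```

The raw login count for 2024 would be 3, but the deduplicated count is 1, which is the effect the paragraph describes.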
| ``` | ||
| <Aside type="caution"> | ||
| If you have more than 999,999 users, you need to edit `numberOfResults=999999` in the script, and possibly extract your users in batches and concatenate the JSON output files. |
See above about 999999 not working. In particular, I got this error message when I printed out the body:
body: {"fieldErrors":{"numberOfResults":[{"code":"[invalid]numberOfResults","message":"The [numberOfResults] is invalid because it has exceeded the maximum result window size. The sum of [startRow] and [numberOfResults] must be less than or equal to [10000]."}]},"generalErrors":[]}
httpError! status: 400
| If you're using Docker, run the following command: | ||
| ```sh | ||
| docker run --init -it --rm --platform linux/amd64 --name "app" -p 7777:7777 -v .:/app -v ./gocache:/go/pkg -v ./buildcache:/root/.cache/go-build -w /app golang:1.25-bookworm sh -c "go mod tidy && go run 4app.go" |
Again, any reason not to assume the reader has golang installed locally?
| email: string | ||
| isVerified: bool | ||
| registeredDate: string // timestamp since 1970 | ||
| loginDates: string[] // oldest dates first |
As noted above, if the user is only registered to one application, the first loginDate will always match the registered date.
Also, why are these being held as strings? Why not uint64?
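As a starting point for that discussion, here is a sketch of what a numeric layout could look like in Go. It assumes the timestamps are epoch milliseconds and that the extracted JSON stores them as numbers rather than quoted strings; neither is confirmed by the snippet above. It uses `int64` rather than `uint64` because that is the type `time.UnixMilli` accepts.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// User mirrors the field names in the snippet above, but stores timestamps
// as int64 epoch milliseconds instead of strings (an assumed layout).
type User struct {
	Email          string  `json:"email"`
	IsVerified     bool    `json:"isVerified"`
	RegisteredDate int64   `json:"registeredDate"`
	LoginDates     []int64 `json:"loginDates"` // oldest dates first
}

// parseUsers decodes a JSON array of users.
func parseUsers(data []byte) ([]User, error) {
	var users []User
	err := json.Unmarshal(data, &users)
	return users, err
}

// registrationYear converts the numeric timestamp to a calendar year in UTC.
func registrationYear(u User) int {
	return time.UnixMilli(u.RegisteredDate).UTC().Year()
}

func main() {
	sample := []byte(`[{"email":"a@example.com","isVerified":true,
		"registeredDate":1704067200000,
		"loginDates":[1704067200000,1735689600000]}]`)
	users, err := parseUsers(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(users[0].Email, registrationYear(users[0])) // prints "a@example.com 2024"
}
```

With numeric fields, date arithmetic and sorting work directly on the values, with no string parsing inside each chart loop.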
| cd fusionauth-user-charts | ||
| ``` | ||
| Edit the `1createMockData.go` file. Adjust the URL, authorization key, and application Id in the code above to match your FusionAuth instance. Then run the file with the following command: |
| ```sh | ||
| docker run --init -it --rm --platform linux/amd64 --name "app" --network faNetwork -v .:/app -v ./gocache:/go/pkg -v ./buildcache:/root/.cache/go-build -w /app golang:1.25-bookworm sh -c "go run 1createMockData.go" |
let's assume the user has golang installed locally.
|
Hi @nathan-contino. I just ran a QA pass on this one. Some small things I've touched up:
The Let me know if anything needs adjusting. |
e4a316b to f521c5e
|
Pinging @mooreds so we don't step on each other's feet: is this ready for a final copy edit and merge? |
|
No, there were some issues with the code before so I wanted to run it locally. |
mooreds
left a comment
Hi folks,
There's still an issue with pagination and running the golang code on a system with > 10,000 users. (Worked great when I had 8 users.)
Here's the error message I got when I tried to do this:
% go mod tidy && go run extractUsers.go
Extracted 1000 users (1000 total)
Extracted 1000 users (2000 total)
Extracted 1000 users (3000 total)
Extracted 1000 users (4000 total)
Extracted 1000 users (5000 total)
Extracted 1000 users (6000 total)
Extracted 1000 users (7000 total)
Extracted 1000 users (8000 total)
Extracted 1000 users (9000 total)
Extracted 1000 users (10000 total)
httpError: numberOfResults: The [numberOfResults] is invalid because it has exceeded the maximum result window size. The sum of [startRow] and [numberOfResults] must be less than or equal to [10000].
Here are our pagination docs, but basically make sure you use the `nextResults` token.
Please make sure this code works, as that is a prerequisite to delivering the blog post.
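To illustrate the token-based loop being asked for, here is a minimal sketch that never adds a start row to a page size, so it cannot exceed a result window. Only the `nextResults` name comes from the comment above. The `page` type, `fetchFunc`, `extractAll`, and the in-memory `fakeServer` (including its `row-N` token format) are hypothetical stand-ins so the example runs without a FusionAuth instance; a real `fetchFunc` would make the HTTP search call and pass the token along.

```go
package main

import "fmt"

// page is one batch of search results plus the token for the next batch.
// An empty NextResults means there is nothing more to fetch.
type page struct {
	Users       []string
	NextResults string
}

// fetchFunc abstracts the search request. An empty token asks for the first page.
type fetchFunc func(nextResults string) (page, error)

// extractAll follows the token from page to page until the server stops returning one.
func extractAll(fetch fetchFunc) ([]string, error) {
	var all []string
	token := ""
	for {
		p, err := fetch(token)
		if err != nil {
			return all, err
		}
		all = append(all, p.Users...)
		if p.NextResults == "" || len(p.Users) == 0 {
			return all, nil
		}
		token = p.NextResults
	}
}

// fakeServer simulates a server holding `total` users and serving `pageSize` per call.
func fakeServer(total, pageSize int) fetchFunc {
	return func(token string) (page, error) {
		start := 0
		if token != "" {
			if _, err := fmt.Sscanf(token, "row-%d", &start); err != nil {
				return page{}, fmt.Errorf("bad token %q: %w", token, err)
			}
		}
		end := start + pageSize
		if end > total {
			end = total
		}
		p := page{}
		for i := start; i < end; i++ {
			p.Users = append(p.Users, fmt.Sprintf("user%d@example.com", i))
		}
		if end < total {
			p.NextResults = fmt.Sprintf("row-%d", end)
		}
		return p, nil
	}
}

func main() {
	users, err := extractAll(fakeServer(25000, 1000))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(users)) // prints "25000"
}
```

The client treats the token as opaque and only checks whether it is empty, so the loop does not depend on how the server encodes its position.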
We could not reproduce your error. I tested FusionAuth versions 1.64.1, 1.66.0, and even 1.47.1 (where the 10,000 limitation is supposed to occur), all successfully with 20,000 users. We've tested on Linux and Mac, with and without OpenSearch. Are you using the latest code from https://github.com/ritza-co/fusionauth-user-charts? If so, please zip your entire FusionAuth folder, with .env files, kickstart files, docker-compose file, and mail it to us so we can try to reproduce the problem locally. |
|
@RichardJECooke I'm running this against a live server in the cloud. I am using the latest code, and am running 1.66.0. Can't share access to this particular server, but let me stand up a new one and verify I see the same behavior. Will share details in slack. |
|
@RichardJECooke slack thread with details on the server that I used to replicate the issue: https://inversoft.slack.com/archives/C0363LCJ267/p1778712502330409?thread_ts=1776801470.181799&cid=C0363LCJ267 |
|
@mooreds Fixed, thanks. |
|
@RichardJECooke thanks! the search retrieves all the users now (took about 13 minutes to process from the fake server on my machine). However, there's no code that actually writes the charts.html file. It appears to be checked in, and when I removed it and ran This is problematic because we want anyone downloading this repo to be able to build the charts for themselves, based on their instance. |
|
@mooreds The Or if you're asking how readers can add new custom charts, they need to edit the .html and .go file. |
|
Thanks @RichardJECooke . When I run the command above with the |
|
@mooreds Yes, no files changed. The Go file reads the user.json file, calculates the chart statistics (in memory), loads the html file, injects the chart data, and serves the file to the user. If you'd like us to explain more in the guide, please let me know. I was trying to keep it from getting verbose. The guide currently says
|
|
Ah, my bad! I wasn't following the guide, just trying to work from the repo. No changes needed at this time, but appreciate the explanation. |
|
@mooreds is that an LGTM i hear? |
|
nope, not yet @nathan-contino . Still working through my errors on the chart side, then need to review the blog post :) . But progress! |
Dan reported the extract took ~13 minutes for ~20k users and was unsure if it had stalled. Add a short note after the extract command so readers expect the wait and know progress is being printed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The note was added when the example repo defaulted to 20,000 mock users (a ~13 minute extraction). The default has been reverted to 1,000, which completes in well under a minute, so the warning is no longer warranted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous "Charts created" line was incomplete — charts.go also prints "charts-output.html written" and the actual ready signal "Server listening at http://0.0.0.0:7777". The first line appears before the server is up; a reader watching only for it would think the program had hung after rendering. Also surface charts-output.html, the standalone HTML the program writes on startup. It's mentioned in the output but had no explanation; without one, readers would wonder what the file is. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two prose fixes in the cohort analysis article.
1. Abandonment chart description. The previous wording ("haven't
logged in for more than one, two, six, or 12 months") naturally
reads as cumulative thresholds, but the code (and the rendered
image) use mutually exclusive buckets, with the last one
open-ended. Cumulative would be mathematically impossible given
the image: it shows ~560 users in the "12" column and only ~60
in "1", but if 560 users were inactive for 12+ months they would
all also count as inactive for 1+ month, so "1" would have to be
at least 560. Rewrite to describe the actual exclusive bucketing
and explain why "12+" dominates.
2. Appendix instruction "Adjust the URL, authorization key, and
application Id in the code above" was nonsensical — those values
live in .env, and no code appears above. Drop "in the code
above".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
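For reference, the exclusive bucketing described in item 1 can be sketched as follows. The lower bounds (1, 2, 6, and 12 months), the bucket labels, and the function names are assumptions based on the commit message, not a copy of the chart code.

```go
package main

import "fmt"

// bucketFor returns the label of the single bucket a user falls into.
// Each bucket starts at its lower bound and ends just before the next one;
// the last bucket is open-ended. Users inactive for less than one month get "".
func bucketFor(monthsInactive int) string {
	switch {
	case monthsInactive >= 12:
		return "12+"
	case monthsInactive >= 6:
		return "6"
	case monthsInactive >= 2:
		return "2"
	case monthsInactive >= 1:
		return "1"
	default:
		return ""
	}
}

// abandonmentCounts tallies users per bucket, counting each user exactly once.
func abandonmentCounts(monthsInactive []int) map[string]int {
	counts := make(map[string]int)
	for _, m := range monthsInactive {
		if label := bucketFor(m); label != "" {
			counts[label]++
		}
	}
	return counts
}

func main() {
	// Months since last login for seven sample users.
	sample := []int{0, 1, 3, 7, 12, 30, 48}
	counts := abandonmentCounts(sample)
	fmt.Println(counts["1"], counts["2"], counts["6"], counts["12+"]) // prints "1 1 1 3"
}
```

Because the last bucket has no upper bound, it collects every long-inactive user, which is why it can be far larger than the "1" bucket without any contradiction.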
The accompanying repo no longer ships a tracked .env (it now ships .env.example with placeholder values). Update both spots that referenced .env to point at the new copy-and-edit flow. The appendix spot also gets a "if you skipped the extraction step" qualifier — a reader running the appendix after the main extraction section already has a working .env and doesn't need to do this again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The accompanying repo no longer writes a static HTML copy on startup; the server at localhost:7777 is the only output. Update the expected terminal output to match and remove the now-stale sentence about a standalone copy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses #2990 and FusionAuth/fusionauth-issues#2303