statsim
diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml
Lines changed: 7 additions & 1 deletion
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 24 additions & 5 deletions
diff --git a/LICENSE b/LICENSE
Lines changed: 21 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 34 additions & 6 deletions
diff --git a/css/main.css b/css/main.css
Lines changed: 11 additions & 0 deletions
diff --git a/index.html b/index.html
Lines changed: 7 additions & 1 deletion
@@ -27,8 +27,14 @@ jobs:
           cache: npm
       - run: npm ci
       - run: mkdir -p dist && npm run build
+      - run: |
+          mkdir -p _site
+          cp index.html _site/
+          cp -r css fonts dist _site/
+          if [ -f CNAME ]; then cp CNAME _site/; fi
+          touch _site/.nojekyll
       - uses: actions/upload-pages-artifact@v3
         with:
-          path: .
+          path: _site
       - id: deployment
         uses: actions/deploy-pages@v4
@@ -14,27 +14,46 @@ Use **2-space indentation** and **semicolon-free** syntax. Use **single quotes**
 
 ## Quick start
 - Install: `npm install`
-- Tests: `npm test` (runs Playwright E2E against a local static server)
+- Tests: `npm test` (runs unit tests + Playwright E2E)
 - Lint/format: Not configured; use `npm run build-dev` as a fast sanity build
+- CLI:
+  - `node src/cli/index.js data.csv` (auto: summary on TTY, JSON when piped)
+  - `node src/cli/index.js data.csv --format json|summary|serve`
 
 ## Repo map
-- `src/main.js`: core browser app (streaming CSV parse, stats aggregation, output rendering)
+- `src/core/`: pure-JS profiling engine (no DOM deps)
+  - `index.js`: `profileStream(readable, opts)` — main API
+  - `constants.js`: shared constants (missing markers, thresholds, stats conventions)
+  - `classify.js`: `classifyValue()`, `getVariableType()` — pure functions
+  - `columns.js`: `initColumns()`, `updateColumns()` — online-stats wrappers
+  - `result.js`: `finalizeResult()` — builds versioned ProfileResult (v1)
+- `src/render/index.js`: DOM/chart rendering (tui-chart), consumes ProfileResult
+- `src/worker/profile-worker.js`: Web Worker — runs core in background thread (file + URL jobs)
+- `src/main.js`: browser entry — DnD/file/url handlers, Worker dispatch, render
+- `src/cli/index.js`: CLI entry — `fs.createReadStream`/stdin → core → formatter (`json|summary|serve`)
+- `src/cli/progress.js`: CLI progress renderer (spinner/progress bar)
+- `src/cli/format-summary.js`: ANSI terminal summary formatter
+- `src/cli/serve.js`: local report server for `--format serve` (`/api/result` + browser open)
 - `index.html`: UI shell and app entrypoint
 - `css/`: app styles and vendor chart CSS
 - `dist/bundle.js`: built browser bundle (generated)
+- `dist/worker-bundle.js`: built worker bundle (generated)
 - `fonts/`: local Roboto font assets
+- `tests/unit/`: tape unit tests for core modules
 - `tests/e2e/`: Playwright browser-level regression tests
 - `tests/support/`: local static server used by E2E tests
-- `docs/architecture.md`: not present in this repo
 
 ## Definition of done
 - Run: `npm run build` (or `npm run build-dev` during iteration)
-- Add/adjust tests for: browser upload flow, streaming parse behavior, missing-value classification, numeric stats gating, and top-value counting in `src/main.js`
-- If you can’t run tests: explain + add a minimal verification note
+- Add/adjust unit tests for core logic (classify, columns, result) in `tests/unit/`
+- Add/adjust E2E tests for browser upload flow + URL flow in `tests/e2e/`
+- If you can't run tests: explain + add a minimal verification note
 
 ## Constraints
 - Don’t add new production dependencies without asking.
 - No DB migrations exist in this repo; ask before introducing any persistence layer or migration tooling.
+- Update `README.md`, `CHANGELOG.md`, and `index.html` whenever a change is user-facing or adds a feature.
+- Commit messages should be clear and descriptive, following the format: `feature|fix|test|docs: short description` (e.g., `feature: add new column type classification`).
 
 ## Conventions
 - Formatting: no formatter is configured; preserve existing style (2-space indentation, semicolon-free, single quotes)
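The CLI's auto mode noted in Quick start (summary on a TTY, JSON when piped) could be sketched roughly as follows. This is an illustration only, not the code in `src/cli/index.js`: `pickFormat`, `FORMATS`, and the error message are hypothetical names, and the only real API relied on is Node's `process.stdout.isTTY`.

```javascript
// Hypothetical helper: choose an output format for the CLI.
// `explicit` is the value given via --format (or undefined);
// `isTTY` would normally come from process.stdout.isTTY.
const FORMATS = ['json', 'summary', 'serve']

function pickFormat (explicit, isTTY) {
  if (explicit !== undefined) {
    if (!FORMATS.includes(explicit)) {
      throw new Error(`Unknown format: ${explicit}`)
    }
    return explicit
  }
  // No explicit format: a person at a terminal gets the summary,
  // pipes and redirects get machine-readable JSON.
  return isTTY ? 'summary' : 'json'
}

// process.stdout.isTTY is undefined when output is piped or redirected
console.log(pickFormat(undefined, Boolean(process.stdout.isTTY)))
```

Keeping the decision in a pure function like this would make it easy to cover in `tests/unit/` without spawning a process.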
 
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 Anton Zemlyansky
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -1,8 +1,36 @@
-## Profile
+## StatSim Profile
 
-Generate data profiles in the browser. Data is processed locally as a stream. In theory you can process really big files because online algorithms are awesome!
+Generate data profiles in the browser or from the command line. Data is processed locally as a stream using online algorithms, so you can handle very large files without loading them into memory.
 
-* CSV
-* Count missing values
-* Statistics (min, max, mean, std)
-* Top N
+### Features
+
+* CSV and TSV support
+* Streaming processing via Web Workers (UI stays responsive)
+* Missing value detection (empty, NA, NULL, NaN, etc.)
+* Descriptive statistics (min, max, mean, variance, std)
+* Histograms and top-N value counts
+* Variable type classification (Number, String, Boolean, Categorical, Mixed)
+* Load files from URL via query param: `?file=https://example.com/data.csv`
+* CLI output modes: `summary`, `json`, and `serve`
+* npm package: `@statsim/profile`
+
+### Usage
+
+**Browser:** Open [statsim.com/profile](https://statsim.com/profile/), drag a CSV file, paste a URL, or open a prefilled link such as `?file=https://example.com/data.csv`.
+
+**CLI:**
+```
+npx @statsim/profile data.csv
+sprofile data.csv                       # summary (TTY)
+sprofile data.csv --format json         # raw JSON
+sprofile data.csv --format serve        # open local browser report
+cat data.csv | sprofile --stdin         # stdin mode
+```
+
+### Development
+
+```
+npm install
+npm run build-dev   # build browser bundles
+npm test            # run unit + E2E tests
+```
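For readers unfamiliar with the online algorithms the README refers to, here is a minimal sketch of Welford's single-pass method for mean and variance. It is not taken from `src/core/columns.js`: `createRunningStats` is a hypothetical name, and the sample-variance convention (dividing by n - 1) is an assumption, since the diff does not say which convention the project uses.

```javascript
// Welford's online algorithm: one pass over the data, O(1) memory per column.
function createRunningStats () {
  let n = 0
  let mean = 0
  let m2 = 0 // running sum of squared deviations from the current mean
  let min = Infinity
  let max = -Infinity

  return {
    push (x) {
      n += 1
      const delta = x - mean
      mean += delta / n
      m2 += delta * (x - mean)
      if (x < min) min = x
      if (x > max) max = x
    },
    summary () {
      // Assumed convention: sample variance; NaN with fewer than two values
      const variance = n > 1 ? m2 / (n - 1) : NaN
      return { n, min, max, mean, variance, std: Math.sqrt(variance) }
    }
  }
}

const stats = createRunningStats()
for (const x of [2, 4, 4, 4, 5, 5, 7, 9]) stats.push(x)
console.log(stats.summary())
```

Because each value is folded into a handful of running totals and then discarded, memory use stays constant regardless of file size, which is what makes streaming multi-gigabyte files feasible.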
@@ -62,6 +62,17 @@ dd {
   height: 3px;
 }
 
+#progress.indeterminate {
+  width: 100% !important;
+  animation: indeterminate 1.5s infinite ease-in-out;
+}
+
+@keyframes indeterminate {
+  0% { opacity: 0.3; }
+  50% { opacity: 1; }
+  100% { opacity: 0.3; }
+}
+
 #output h2 {
   font-size: 42px;
   font-weight: 700;
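The `#progress.indeterminate` rule above uses `width: 100% !important`, which overrides any inline width set on the element. One way it might be driven from script is sketched below. This is an assumption about usage, not code from this commit: `setProgress` and its parameters are hypothetical, and the idea that the class is applied when the total size is unknown (for example, a URL response without a content length) is a guess.

```javascript
// Hypothetical helper: update the #progress bar from a progress event.
// `el` is the progress element; `loaded` and `total` are byte counts.
function setProgress (el, loaded, total) {
  if (!total || total <= 0) {
    // Unknown size: switch to the pulsing full-width bar
    el.classList.add('indeterminate')
    return
  }
  el.classList.remove('indeterminate')
  const pct = Math.min(100, Math.round((loaded / total) * 100))
  el.style.width = pct + '%'
}

// In the browser this could be called as:
// setProgress(document.getElementById('progress'), 512, 2048)
```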
 
@@ -48,6 +48,12 @@
           <div class="drag-text">
             <h4>Drag & drop a CSV file</h4>
             <p>Or <label for="input" style="color:#039be5; cursor: pointer; font-size: 14px;">choose a file</label></p>
+            <div class="url-input" style="margin-top: 16px;">
+              <input id="url-input" type="text" placeholder="Or paste a CSV URL (https://...)" style="width: 400px; max-width: 80%; padding: 4px 8px; font-size: 14px; border: 1px solid #ccc; border-radius: 3px;">
+              <button id="url-load" style="padding: 4px 12px; font-size: 14px; cursor: pointer; margin-left: 4px;">Load</button>
+            </div>
+            <p style="font-size: 13px; color: #666; margin-top: 8px;">Tip: you can also append <code>?file=https://example.com/data.csv</code> to the page URL</p>
+            <p id="url-error" style="color: #e53935; font-size: 13px; display: none;"></p>
           </div>
         </div>
       </div>
@@ -60,7 +66,7 @@ <h4>Drag & drop a CSV file</h4>
             <h1>Data profiling online</h1>
             <h2>Use this free and open-source web app to profile data and generate visual summaries of your CSV datasets</h2>
             <p>
-              In many industries, understanding data is a vital skill. However, most people struggle to recognize patterns and extract insights from raw tabular datasets because we are not computers. That's why data visualization and profiling are invaluable tools, frequently utilized to transform raw numbers into comprehensible elements like charts, trends, and statistics. Data profiling enables the creation of overviews of tabular files, offering detailed information and descriptive statistics for each variable contained in a dataset. <b>StatSim Profile</b> is a browser-based data profiling tool that is free and open-source. It processes files locally without uploading them to a web server and can handle large datasets, even those in gigabytes.
+              In many industries, understanding data is a vital skill. However, most people struggle to recognize patterns and extract insights from raw tabular datasets. That's why data visualization and profiling are invaluable tools, often used to turn raw numbers into comprehensible elements like charts, trends, and statistics. Data profiling creates overviews of tabular files, with detailed information and descriptive statistics for each variable in a dataset. <b>StatSim Profile</b> is a free and open-source browser-based data profiling tool. It processes files locally without uploading them to a web server and can handle large datasets, even multi-gigabyte ones. Processing runs in a Web Worker to keep the UI responsive. You can also load CSV files directly from a URL or use the <a href="https://github.com/statsim/profile">command-line tool</a> with <code>summary</code>, <code>json</code>, and <code>serve</code> output modes.
             </p>
           </div>
         </div>
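The `?file=` link mentioned in the tip could be read on page load along these lines. This is a sketch under stated assumptions rather than the handler in `src/main.js`: `getFileUrl` is a hypothetical name, and restricting the value to absolute `http:` or `https:` URLs is an assumed safeguard, not something the diff confirms. `URL` and `searchParams` are standard in browsers and Node.js.

```javascript
// Hypothetical helper: read the ?file= parameter from a page URL and
// return it only if it is an absolute http(s) URL; otherwise return null.
function getFileUrl (pageHref) {
  const value = new URL(pageHref).searchParams.get('file')
  if (!value) return null
  try {
    const target = new URL(value)
    return ['http:', 'https:'].includes(target.protocol) ? target.href : null
  } catch (err) {
    return null // not a valid absolute URL
  }
}

console.log(getFileUrl('https://statsim.com/profile/?file=https://example.com/data.csv'))
```

In the browser the argument would be `window.location.href`. A `null` result could be surfaced through the `#url-error` paragraph added in this hunk.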