 - `tests/support/`: local static server used by E2E tests
-- `docs/architecture.md`: not present in this repo

 ## Definition of done
 - Run: `npm run build` (or `npm run build-dev` during iteration)
-- Add/adjust tests for: browser upload flow, streaming parse behavior, missing-value classification, numeric stats gating, and top-value counting in `src/main.js`
-- If you can’t run tests: explain + add a minimal verification note
+- Add/adjust unit tests for core logic (classify, columns, result) in `tests/unit/`
+- Add/adjust E2E tests for browser upload flow + URL flow in `tests/e2e/`
+- If you can't run tests: explain + add a minimal verification note

 ## Constraints
 - Don’t add new production dependencies without asking.
 - No DB migrations exist in this repo; ask before introducing any persistence layer or migration tooling.
+- Update the `README.md`, `CHANGELOG.md` and `index.html` with any user-facing changes or new features.
+- Commit messages should be clear and descriptive, following the format: `feature|fix|test|docs: short description` (e.g., `feature: add new column type classification`).

 ## Conventions
 - Formatting: no formatter is configured; preserve existing style (2-space indentation, semicolon-free, single quotes)
-Generate data profiles in the browser. Data is processed locally as a stream. In theory you can process really big files because online algorithms are awesome!
+Generate data profiles in the browser or from the command line. Data is processed locally as a stream using online algorithms, so you can handle very large files without loading them into memory.
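Reviewer note: the "online algorithms" claim in the description can be illustrated with a minimal sketch of single-pass statistics. It uses Welford's method to update min, max, mean, and standard deviation one value at a time, so memory use stays constant regardless of file size. The helper name `createRunningStats` and the choice of the sample (n - 1) standard deviation are assumptions for illustration; the diff does not show how the project computes these values.

```javascript
// Hypothetical helper, not taken from the repository.
// Keeps O(1) state and updates it once per incoming value (Welford's method).
function createRunningStats () {
  let count = 0
  let mean = 0
  let m2 = 0 // running sum of squared deviations from the current mean
  let min = Infinity
  let max = -Infinity

  return {
    push (x) {
      count += 1
      const delta = x - mean
      mean += delta / count
      m2 += delta * (x - mean) // uses the updated mean on purpose
      if (x < min) min = x
      if (x > max) max = x
    },
    summary () {
      // Assumption: sample standard deviation; NaN with fewer than two values
      const std = count > 1 ? Math.sqrt(m2 / (count - 1)) : NaN
      return { count, min, max, mean, std }
    }
  }
}

// Usage: feed values as they arrive from a stream
const stats = createRunningStats()
for (const value of [2, 4, 4, 4, 5, 5, 7, 9]) stats.push(value)
console.log(stats.summary())
```

Because each value is consumed once and then discarded, the same pattern would apply to rows arriving from a streaming CSV parser.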
-* CSV
-* Count missing values
-* Statistics (min, max, mean, std)
-* Top N
+### Features
+
+* CSV and TSV support
+* Streaming processing via Web Workers (UI stays responsive)
+* Missing value detection (empty, NA, NULL, NaN, etc.)
+* Variable type classification (Number, String, Boolean, Categorical, Mixed)
+* Load files from URL via query param: `?file=https://example.com/data.csv`
+* CLI output modes: `summary`, `json`, and `serve`
+* npm package: `@statsim/profile`
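Reviewer note: the `?file=` feature listed above could be read from the page address roughly as follows. This is a sketch under assumptions, not the project's code: `getFileParam` is a hypothetical name, and restricting the value to `http:`/`https:` URLs is a design choice made here for safety rather than something the diff states. It relies only on the standard `URL` and `URLSearchParams` APIs.

```javascript
// Hypothetical helper, not taken from the repository.
// Returns the `file` query parameter when it is an absolute http(s) URL, else null.
function getFileParam (href) {
  const value = new URL(href).searchParams.get('file')
  if (!value) return null
  try {
    const target = new URL(value)
    const allowed = target.protocol === 'http:' || target.protocol === 'https:'
    return allowed ? target.href : null
  } catch (err) {
    return null // value was not an absolute URL
  }
}

// Usage: in a browser this would typically receive window.location.href
console.log(getFileParam('https://statsim.com/profile/?file=https://example.com/data.csv'))
```

Rejecting other schemes keeps values such as `javascript:` links from being passed on to a fetch call.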
+
+### Usage
+
+**Browser:** Open [statsim.com/profile](https://statsim.com/profile/), drag a CSV file, paste a URL, or open a prefilled link such as `?file=https://example.com/data.csv`.
+
+**CLI:**
+```
+npx @statsim/profile data.csv
+sprofile data.csv # summary (TTY)
+sprofile data.csv --format json # raw JSON
+sprofile data.csv --format serve # open local browser report
 <h2>Use this free and open-source web app to profile data and generate visual summaries of your CSV datasets</h2>
 <p>
-In many industries, understanding data is a vital skill. However, most people struggle to recognize patterns and extract insights from raw tabular datasets because we are not computers. That's why data visualization and profiling are invaluable tools, frequently utilized to transform raw numbers into comprehensible elements like charts, trends, and statistics. Data profiling enables the creation of overviews of tabular files, offering detailed information and descriptive statistics for each variable contained in a dataset. <b>StatSim Profile</b> is a browser-based data profiling tool that is free and open-source. It processes files locally without uploading them to a web server and can handle large datasets, even those in gigabytes.
+In many industries, understanding data is a vital skill. However, most people struggle to recognize patterns and extract insights from raw tabular datasets because we are not computers. That's why data visualization and profiling are invaluable tools, frequently utilized to transform raw numbers into comprehensible elements like charts, trends, and statistics. Data profiling enables the creation of overviews of tabular files, offering detailed information and descriptive statistics for each variable contained in a dataset. <b>StatSim Profile</b> is a browser-based data profiling tool that is free and open-source. It processes files locally without uploading them to a web server and can handle large datasets, even those in gigabytes. Processing runs in a Web Worker to keep the UI responsive. You can also load CSV files directly from a URL or use the <a href="https://github.com/statsim/profile">command-line tool</a> with <code>summary</code>, <code>json</code>, and <code>serve</code> output modes.