Commit 884bf76

Merge branch 'main' of https://github.com/jwhco/scripts
2 parents 3c49145 + 787e5d8 commit 884bf76

10 files changed

Lines changed: 198 additions & 56 deletions

.vscode/extensions.json

Lines changed: 9 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -2,17 +2,17 @@
22
"recommendations": [
33
"github.copilot-chat",
44
"github.vscode-pull-request-github",
5-
"mark-wiemer.vscode-autohotkey-plus-plus",
6-
"janisdd.vscode-edit-csv",
7-
"ltex-plus.vscode-ltex-plus",
8-
"ms-vscode.makefile-tools",
9-
"ms-vscode-remote.remote-wsl",
10-
"ms-toolsai.jupyter",
11-
"ms-toolsai.jupyter-keymap",
12-
"ms-toolsai.jupyter-renderers",
135
"ms-python.python",
146
"ms-python.vscode-pylance",
157
"ms-python.vscode-python-envs",
16-
"yzhang.markdown-all-in-one"
8+
"ms-toolsai.jupyter",
9+
"ms-toolsai.jupyter-keymap",
10+
"ms-toolsai.jupyter-renderers",
11+
"yzhang.markdown-all-in-one",
12+
"ms-vscode.makefile-tools",
13+
"ms-vscode-remote.remote-wsl",
14+
"mark-wiemer.vscode-autohotkey-plus-plus",
15+
"janisdd.vscode-edit-csv",
16+
"ltex-plus.vscode-ltex-plus"
1717
]
1818
}

.vscode/k8s.code-workspace

Lines changed: 5 additions & 5 deletions
@@ -1,11 +1,11 @@
11
{
22
"folders": [
33
{
4-
"path": "/workspace/scripts"
4+
"path": "/workspaces/scripts",
55
},
66
{
7-
"path": "/workspace/obsidian"
8-
}
7+
"path": "/workspaces/obsidian",
8+
},
99
],
10-
"settings": {}
11-
}
10+
"settings": {},
11+
}

AGENTS.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
22

33
## Project Overview
44

5-
- A library of scripts to support text conversion, editirial productivity, and quality assurance of markdown.
5+
- A library of scripts to support text conversion, editorial productivity, and quality assurance of markdown.
66

77
## Conciseness
88

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
1+
# Installation Guide for Markdown Tools
2+
3+
## Environment
4+
5+
1. Set up the Python3 virtual environment,
6+
2. Install required Python3 modules,
7+
8+
```bash
9+
python3 -m venv .venv
10+
source .venv/bin/activate
11+
python3 -m pip install -r MarkdownTools/requirements.txt
12+
```
13+
14+
3. Open a Jupyter notebook in VS Code to run,
15+
1. Correct any errors in execution,
16+
1. Change the kernel to the Python virtual environment,
17+
2. Make corrections in `requirements.txt`
18+
3. Check that it is running in the right environment,
19+
4. FINISH
20+
21+
## Execution
22+
23+
- Make sure you are pointing at the right markdown corpus,
24+
25+
26+
/EOF/
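
For the third sub-step under step 3 of the guide above (confirming that the notebook runs in the right environment), one quick check is to compare `sys.prefix` with `sys.base_prefix` from a notebook cell or a shell. This is a minimal sketch and not part of the commit; the helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


# Show which interpreter the kernel or shell is actually using.
print("interpreter:", sys.executable)
print("inside virtual environment:", in_virtualenv())
```

If the second line prints `False` after selecting the `.venv` kernel, the notebook is likely still attached to the system interpreter.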

MarkdownTools/docs/extract-ngram-phrases-README.md

Lines changed: 6 additions & 3 deletions
@@ -2,11 +2,14 @@
22

33
## Use Case
44

5-
- Extract from a file all of the ngrams (trigram by default), then print on the screen.
5+
- Extract from a file all of the ngrams (trigram by default), then print on the screen. Use to better understand a single file or a corpus of files.
6+
- The n-grams can be piped into a script for further analysis, or handled by another tool for clustering.
67

7-
## Configuration
8+
## Configure Run-Time Environment
89

9-
- Install NLTK data sets, https://www.nltk.org/data.html
10+
1. Work from a `.venv` Python Virtual Environment,
11+
2. Prepare packages, `pip install -r requirements.txt`,
12+
3. Install NLTK data sets, https://www.nltk.org/data.html
1013

1114
```python
1215
import nltk

MarkdownTools/docs/visualize-content-clusters-README.md

Lines changed: 46 additions & 9 deletions
@@ -2,7 +2,7 @@
22

33
## Use Case
44

5-
- For a directory of markdown notes, determine what are the top five topical clusters.
5+
- For a directory of markdown notes, determine what are the top five topical clusters.
66
- Because hashtags and front matter tags are normalized, related terms will group on tags.
77
- Works with markdown note-taking applications like Obsidian, Zettlr, LogSeq, and FOAM.
88

@@ -12,21 +12,58 @@
1212

1313
## Requirements
1414

15-
- Break out YAML front matter tags and Camel case hash tags as plain words.
16-
- Example, `key-word` becomes `key word` for analysis.
17-
- Example, `KeyWord` becomes `key word` for analysis.
18-
- Conversion happens before n-gram analysis of body text.
15+
- Break out YAML front matter tags and Camel case hash tags as plain words.
16+
- Example, `key-word` becomes `key word` for analysis.
17+
- Example, `KeyWord` becomes `key word` for analysis.
18+
- Conversion happens before n-gram analysis of body text.
1919
- Ignore short common headers. The best way is to tokenize only headers three words or longer.
2020
- The ability to have custom stop words to clean up cluster results. Use this for brands, fractional words, and other words that show up in clusters but aren't useful.
21+
- Use Jupyter for concepts; for implementation, use a command-line script that can focus on specific directories.
22+
23+
24+
25+
## Interpretation
26+
27+
### Scatter Plot: Content Semantic Map
28+
29+
Each dot represents one markdown note from your corpus `ZETTEL_ROOT`, a markdown repo.
30+
31+
Here's how to interpret the scatter plot it produces:
32+
33+
34+
35+
- **Color/cluster membership** indicates semantic similarity—notes of the same color share similar concepts and vocabulary
36+
- **Physical proximity** means notes are highly semantically related; dots clustered together contain overlapping ideas
37+
- **Distance between clusters** shows conceptual separation—far clusters represent distinct topics
38+
- **Cluster density** reflects thematic cohesion—tight clusters have focused meaning; loose clusters contain diverse but related concepts
39+
- **Isolated outliers** (dots far from clusters) represent unique notes that don't align well with major themes
40+
- **Top terms printed for each cluster** (C0, C1, etc.) reveal the dominant concepts defining that cluster
41+
- **Dimensionality reduction caveat**: the 2D plot compresses a high-dimensional semantic space, so visual distance is approximate
42+
43+
The key insight: **examine cluster labels and look for outliers**, then review the notes associated with them to validate whether the semantic grouping makes sense for your content.
44+
45+
2146

2247
## User Story
2348

49+
### "Is my writing on topic?"
50+
51+
- User has a markdown note-taking application with files stored as plain text. They want to get an idea of what they have been writing about.
52+
- After running the script, they can see the top eight clusters of note-taking topics.
53+
- After careful consideration, the user focuses on a specific cluster to create a report.
54+
- For the desired cluster, the tool reports observed context. User sees tight mapping of dots.
55+
56+
### "Where to prune research set? Tighten work up?"
2457

25-
- User have a markdown note-taking application with files stored as plain text. They want to get an idea of what they have been writing about.
26-
- After running the script, they can see the top eight clusters of note-taking topics.
27-
- After careful consideration, the user focuses on a specific cluster to create a report.
58+
- User is examining a body of research, looking for a concentration to write a paper, but also wants awareness when it comes to distractions.
59+
- All relevant research, the proposal, and the paper outline are put in the same directory.
60+
- This may include draft materials, relevant commentary, and research notes.
61+
- User runs script against directory to see if there are any outliers to validate. Decision on tangents.
62+
- An outlier is found, a cluster of n-grams that has out-of-place words. User searches corpus to move those notes out of the project.
63+
- There is a level of curation, determining if the note is on purpose for the project.
64+
- In some cases, the outlier indicates a relevant topic that needs more research or expanding of context.
2865

2966

3067
> Copyright 2026 [JWH Consolidated LLC](https://www.jwhco.com/?utm_source=repository&utm_medium=github.com&utm_content=visualize-content-clusters) All rights reserved.
3168
32-
/EOF/
69+
/EOF/
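
The tag conversion described in the Requirements hunk above (`key-word` and `KeyWord` both becoming `key word`) could be sketched as follows. This is an illustration only, not the notebook's actual code: the function name `tag_to_words` is hypothetical, and stripping a leading `#` and splitting on underscores are assumptions that the README does not state.

```python
import re


def tag_to_words(tag: str) -> str:
    """Convert a hashtag or front matter tag to lowercase plain words."""
    # Assumption: a leading '#' marks an inline hashtag and carries no meaning.
    text = tag.lstrip("#")
    # Split CamelCase at each lower-to-upper boundary: KeyWord -> Key Word
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # Treat hyphens (and, by assumption, underscores) as word separators.
    text = re.sub(r"[-_]+", " ", text)
    # Lowercase and collapse any repeated whitespace.
    return " ".join(text.lower().split())


for raw in ("key-word", "KeyWord", "#KeyWord"):
    print(raw, "->", tag_to_words(raw))
```

Running the conversion before tokenizing the body text means both tag styles feed the same words into the n-gram analysis.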

MarkdownTools/extract-hashtag-terms.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
99

1010
# Configuration
1111
# ZETTEL_ROOT = "/home/hittjw/Documents/GitHub/obsidian/Zettelkasten" # Ubuntu
12-
ZETTEL_ROOT = "/workspace/obsidian/Zettelkasten" # K8S
12+
ZETTEL_ROOT = "/workspaces/obsidian/Zettelkasten" # K8S
1313

1414
WHITELIST = {
1515
"vscode", "latex", "zettlr", "github", "obsidian", "python", "jupyter",

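The hunk above swaps one hard-coded `ZETTEL_ROOT` for another and keeps the Ubuntu path as a comment. One way to avoid editing the script per machine is to read the path from the environment with a fallback. This is a hedged sketch, not what the commit does; the environment variable name `ZETTEL_ROOT` and the helper `resolve_zettel_root` are hypothetical.

```python
import os

# Default taken from the K8S path introduced in this commit.
DEFAULT_ZETTEL_ROOT = "/workspaces/obsidian/Zettelkasten"


def resolve_zettel_root(environ=os.environ) -> str:
    """Return the corpus root, preferring an environment override."""
    # Hypothetical variable name; an empty value falls back to the default.
    return environ.get("ZETTEL_ROOT") or DEFAULT_ZETTEL_ROOT


ZETTEL_ROOT = resolve_zettel_root()
print("Using corpus root:", ZETTEL_ROOT)
```

On the Ubuntu host, exporting `ZETTEL_ROOT` before launching the script would then replace the commented-out line.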
MarkdownTools/visualize-content-clusters.ipynb

Lines changed: 90 additions & 7 deletions
Large diffs are not rendered by default.

Text2Markdown/Is-PDF-Machine-Readable.ipynb

Lines changed: 7 additions & 10 deletions
@@ -53,18 +53,15 @@
5353
},
5454
{
5555
"cell_type": "code",
56-
"execution_count": null,
56+
"execution_count": 1,
5757
"id": "4",
5858
"metadata": {},
5959
"outputs": [
6060
{
61-
"ename": "",
62-
"evalue": "",
63-
"output_type": "error",
64-
"traceback": [
65-
"\u001b[1;31mRunning cells with 'venv (3.10.12) (Python 3.10.12)' requires the ipykernel package.\n",
66-
"\u001b[1;31mInstall 'ipykernel' into the Python environment. \n",
67-
"\u001b[1;31mCommand: '/workspaces/scripts/venv/bin/python -m pip install ipykernel -U --force-reinstall'"
61+
"name": "stdout",
62+
"output_type": "stream",
63+
"text": [
64+
"Hello World\n"
6865
]
6966
}
7067
],
@@ -84,7 +81,7 @@
8481
],
8582
"metadata": {
8683
"kernelspec": {
87-
"display_name": "venv (3.10.12)",
84+
"display_name": ".venv (3.12.3)",
8885
"language": "python",
8986
"name": "python3"
9087
},
@@ -98,7 +95,7 @@
9895
"name": "python",
9996
"nbconvert_exporter": "python",
10097
"pygments_lexer": "ipython3",
101-
"version": "3.10.12"
98+
"version": "3.12.3"
10299
}
103100
},
104101
"nbformat": 4,

TidyObsidian/docs/markdown-tasks-quality-README.md

Lines changed: 7 additions & 11 deletions
@@ -1,6 +1,6 @@
11
# Markdown Task Quality Checker
22

3-
## Purpose
3+
## Purpose
44

55
- Find non-standard markdown tasks, fix them or highlight for user, report all tasks.
66
- The script itself doesn't change the markdown files, it reports a higher quality version of the tasks.
@@ -12,8 +12,8 @@
1212

1313
```markdown
1414
- [ ] Task description. (Est: 8h)
15-
- [ ] Sub-Task Description (2h)
16-
- [ ] Sub-Task Two Description.
15+
- [ ] Sub-Task Description (2h)
16+
- [ ] Sub-Task Two Description.
1717
```
1818

1919
- Because Obsidian is my primary note-taking application, favor syntax compatible with the Task plugin.
@@ -27,11 +27,10 @@
2727
## Requirements
2828

2929
- When cleaning up a task, don't change layout. Don't change indentation, tab spacing in front of bullet list. The task could have sub-tasks for details in a list.
30-
- Find all the markdown tasks like `grep -r -E '^[\t ]*[-*]\s*\[.?\].*' /workspace/obsidian --include=*.md` which works well. It finds things the script mixed.
30+
- Find all the markdown tasks like `grep -r -E '^[\t ]*[-*]\s*\[.?\].*' /workspaces/obsidian --include=*.md` which works well. It finds things the script mixed.
3131
- Script needs to know if a task is in a `---` or code block as an example. Wholesale updating format may be okay, except in documentation showing poor syntax.
3232
- Understand tasks that are hierarchical, attributing the indented sub-tasks as inherent dependencies of the higher-level task. An outline of tasks implies the highest-level tasks are completed after the sub-tasks, or sub-sub-tasks, are completed.
3333

34-
3534
## Workflow Pseudocode
3635

3736
1. Isolate leading structure (indentation + marker + checkbox). Find the task via basic formatting. Only looking for `- [ ]` task in various forms.
@@ -45,17 +44,14 @@
4544
9. Report best quality markdown task. Make sure that every task is hashed in a way to match back with original when updates are available.
4645
10. END
4746

48-
49-
50-
5147
## Notes
5248

53-
- Python library `markdown-checklist` can create task lists with checkboxes in Markdown format.
49+
- Python library `markdown-checklist` can create task lists with checkboxes in Markdown format.
5450
- Python library `markdown-analysis` can parse markdown, extracting headers, paragraphs, and links. https://pypi.org/project/markdown-analysis/
5551

5652
## Reference
5753

5854
- Matthew Rathbone. (2025, August 19) Markdown Task Lists and Checkboxes: Complete Guide for Project Management. https://blog.markdowntools.com/posts/markdown-task-lists-and-checkboxes-complete-guide
59-
- Highlights good and bad syntax for basic task list. As well as some platform specific.
55+
- Highlights good and bad syntax for basic task list. As well as some platform specific.
6056

61-
/EOF/
57+
/EOF/
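
The `grep` pattern in the Requirements hunk above, together with workflow step 1 (isolate indentation, marker, and checkbox), can be approximated in Python as below. This is a sketch under assumptions: the names `TASK_RE` and `find_tasks` are hypothetical, and the code does not yet skip tasks that sit inside fenced code blocks or front matter, which the requirements call for.

```python
import re

# Same shape as the grep pattern, with named groups so the leading
# structure (indentation + marker + checkbox) can be kept unchanged.
TASK_RE = re.compile(
    r"^(?P<indent>[\t ]*)(?P<marker>[-*])\s*\[(?P<state>.?)\](?P<body>.*)$"
)


def find_tasks(text: str):
    """Return (line_number, indent, state, body) for each task line."""
    tasks = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = TASK_RE.match(line)
        if match:
            tasks.append(
                (number, match["indent"], match["state"], match["body"].strip())
            )
    return tasks


sample = (
    "- [ ] Task description. (Est: 8h)\n"
    "\t- [ ] Sub-Task Description (2h)\n"
    "* [x] Finished item\n"
    "Plain paragraph text\n"
    "-[] Non-standard task\n"
)
for task in find_tasks(sample):
    print(task)
```

Capturing the indentation separately lets a later step rewrite only the task body, so the layout of nested sub-tasks stays untouched.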
