diff --git a/.claude/skills/book.read_and_summarize_chapter/SKILL.md b/.claude/skills/book.read_chapter/SKILL.md similarity index 69% rename from .claude/skills/book.read_and_summarize_chapter/SKILL.md rename to .claude/skills/book.read_chapter/SKILL.md index 827b323db..72a5f2ac3 100644 --- a/.claude/skills/book.read_and_summarize_chapter/SKILL.md +++ b/.claude/skills/book.read_chapter/SKILL.md @@ -49,20 +49,18 @@ the book markdown file. Return the content to the user for context. -## Step 4: Summarize Content +## Step 4: Print Content Structure -- Write a summary using the same structure of the chapter and subchapter in - markdown headers +- Write the structure of the chapter and subchapters as markdown headers - Use numbers of chapter (e.g., 1.) and subchapters (e.g., 1.1) - Use the chapter numbers that come from the book -- For each chunk of text, summarize the text using rules from - @.claude/skills/text.summarize_in_bullet_points/SKILL.md - ## Step 5: Answer Follow-up Questions -Answer any questions the user asks about the content just read, referencing -specific sections or concepts from the chapter summary. +- Do not do anything else (e.g., do not summarize the content); wait for the user + to ask questions +- Answer any questions the user asks about the content just read, referencing + specific sections or concepts from the chapter. # Example @@ -77,14 +75,4 @@ specific sections or concepts from the chapter summary.
- Chapter 3: Graphical Models - 3.1 Directed Acyclic Graphs (DAGs) - 3.2 d-separation and Conditional Independence - - # Summary: - - - **Chapter 3: Graphical Models**: DAGs represent causal relationships through - nodes and directed edges; d-separation criterion determines conditional - independence from graph structure - - **3.1 DAGs**: Nodes represent variables, edges represent direct causal - effects, acyclicity prevents circular reasoning - - **3.2 d-separation**: Variables are d-separated if no open path exists; - d-separation implies conditional independence under causal model assumptions ``` diff --git a/.claude/skills/graphviz.causal_kg_style/SKILL.md b/.claude/skills/graphviz.causal_kg_style/SKILL.md index 0097e839f..400babd79 100644 --- a/.claude/skills/graphviz.causal_kg_style/SKILL.md +++ b/.claude/skills/graphviz.causal_kg_style/SKILL.md @@ -8,10 +8,10 @@ I will give you a description or an image and your task is to produce a Graphviz/DOT representation of that graph that follows the rules below exactly. The resulting graph should allow a knowledgeable reader to -- distinguish causation from correlation at a glance -- identify exogenous vs endogenous variables -- identify latent vs observable variables -- recognize interventions and counterfactuals +- Distinguish causation from correlation at a glance +- Identify exogenous vs endogenous variables +- Identify latent vs observable variables +- Recognize interventions and counterfactuals Use color to distinguish variable types consistently. @@ -22,7 +22,6 @@ Use color to distinguish variable types consistently. 
- Use Graphviz DOT syntax - Use a directed graph (`digraph`) - Set `rankdir=LR` for left-to-right causal flow -- Prefer readability over compactness - Use both `color` (border) and `fillcolor` + `style=filled` to encode variable type (do not rely on color alone; keep shape conventions too) diff --git a/.claude/skills/markdown.fix_bullet_points/SKILL.md b/.claude/skills/markdown.fix_bullet_points/SKILL.md index 7561e7207..863851bf5 100644 --- a/.claude/skills/markdown.fix_bullet_points/SKILL.md +++ b/.claude/skills/markdown.fix_bullet_points/SKILL.md @@ -4,10 +4,17 @@ description: Reorganize a markdown file to use bullet points and ensure all fenc - Given a markdown file passed from the user -# Step 1 -- Make sure the text is organized in bullet points +# Organize text in bullet points +- Make sure all the text is organized in bullet points + ``` + **What it does**: + - Extracts each page of a PDF file as a separate PNG image + - Numbers output files sequentially (slides001.png, slides002.png, etc.) 
+ - Supports customizable DPI for image quality control + - Creates output directory automatically with optional from-scratch mode + ``` -# Step 2 +# Handle fenced div - Make sure that all fenced div have a syntax description (e.g., python, markdown, verbatim) - Bad @@ -15,7 +22,7 @@ description: Reorganize a markdown file to use bullet points and ensure all fenc The simplest ripgrep command searches for a pattern in the current directory: ```bash - > rg "pattern" + > rg "pattern" ``` ```` - Good @@ -23,15 +30,58 @@ description: Reorganize a markdown file to use bullet points and ensure all fenc - The simplest ripgrep command searches for a pattern in the current directory: ```bash - > rg "pattern" + > rg "pattern" ``` ```` -# Step 3 +# Format commands - Make sure Linux / MacOS shell commands are prepended with: - `>` when they are bash commands - `docker>` when they are commands run inside Docker - `claude>` when they are commands run inside Claude -# Step 4 -- Run `lint_txt.py -i ` to reformat the text +# Do not abuse level 3 headers +- Do not use level 3 headers; use bold instead when there are many of them, + each with little content +- E.g., convert + ``` + ### What It Does + ... + + ### Examples + ... + ``` + to + ``` + **What it does** + ... + + **Examples** + ...
+ ``` + +# Do not abuse bold +- Do not abuse bold in the explanation of commands + - Bad + ``` + - **Extract with higher DPI** for better image quality: + ``` + - Good + ``` + - Extract with higher DPI for better image quality: + ``` + +# Add table + +- If the file contains a description of commands add a table at the beginning + with a summary of all the commands + - E.g., + ``` + | Script | Location | Description | + | :------------------------- | :------------------------------------------------ | :---------------------------------------------------------------------------------------------------------------------- | + | `concatenate_pdfs.py` | `helpers_root/dev_scripts_helpers/documentation/` | Combines multiple PDF files into a single PDF (used for creating full book from chapters) | + | `count_book_pages.py` | `class_scripts/` | Counts pages in all PDF files in `{DIR}/book/` directory using macOS `mdls` command | + ``` + +# Lint +- At the end of the process, run `lint_txt.py -i ` to reformat the text diff --git a/.claude/skills/markdown.format_rules/SKILL.md b/.claude/skills/markdown.format_rules/SKILL.md index a71d8fa45..f15ad5ea0 100644 --- a/.claude/skills/markdown.format_rules/SKILL.md +++ b/.claude/skills/markdown.format_rules/SKILL.md @@ -3,10 +3,6 @@ description: Format markdown files according to conventions for clarity, structu model: haiku --- -# Summary - -- This file contains conventions for writing markdown files - # Goals and Philosophy - Make the text easy to consume for both humans and AI diff --git a/.claude/skills/slides.add_visuals/SKILL.md b/.claude/skills/slides.add_visuals/SKILL.md new file mode 100644 index 000000000..05ad272e4 --- /dev/null +++ b/.claude/skills/slides.add_visuals/SKILL.md @@ -0,0 +1,26 @@ +--- +description: Propose visuals for each slides +--- + +- Given a markdown file with slides for a college class, where each slide title + is prepended with `*` + +# Leave structure unchanged +- Maintain the structure of the text 
and keep the content of the existing text + +# For each slide +- If a slide doesn't contain a picture or a diagram (e.g., graphviz), consider + what can be used to illustrate the concepts visually, e.g., + - Propose a graphviz diagram + - Find an image on the Internet (download and save it in a dir + `proposed_images`) + - Propose the description of an image in the format + ``` + + Description of the image + + ``` + +# Ask user to confirm and decide +- Make a numbered list of proposed changes for the user +- Once the user confirms, perform the changes diff --git a/.claude/skills/slides.criticize_structure/SKILL.md b/.claude/skills/slides.criticize_structure/SKILL.md new file mode 100644 index 000000000..a2aea36ea --- /dev/null +++ b/.claude/skills/slides.criticize_structure/SKILL.md @@ -0,0 +1,44 @@ +--- +description: Criticize and suggest improvements for class slides +--- + +- Given a Markdown file storing slides for a class (where each slide title is + prepended with `*`) + +# Propose improvements +- Read the content and make suggestions on how to improve the class, numbering + each suggestion so that it's easy to refer to + - Do not make changes but only make proposals + +## Change order of slides +- Propose how to organize the slides in a different flow, separating cohesive + chunks with level 1 and 2 headers + - E.g., + ``` + # Topic 1 + + ## Topic 1.1 + + * Slide 1 + + * Slide 2 + ``` + +## Slides to remove +- Propose removing slides whose content is redundant or unclear + +## Slides to merge +- Propose merging slides to remove redundant content + +## Fix content of slides +- If a slide's content is incorrect, propose how to fix it + +## Ignore TODOs and comments +- Leave the TODOs or comments in the format + ``` + // ...
+ ``` + untouched + +# Ask the user which improvements to make +- After the user approves a subset of changes, perform the changes in place diff --git a/.claude/skills/slides.fix_errors/SKILL.md b/.claude/skills/slides.fix_errors/SKILL.md new file mode 100644 index 000000000..2b01233a0 --- /dev/null +++ b/.claude/skills/slides.fix_errors/SKILL.md @@ -0,0 +1,13 @@ +--- +description: Fix slides without changing their structure +--- + +- Given a markdown file with slides for a college class, where each slide title + is prepended with `*` + +# Leave structure unchanged +- Maintain the structure of the text and keep the content of the existing text + +# Fix mistakes +- Fix English grammar +- Fix any mistake only if you are sure about the correction diff --git a/.claude/skills/slides.fix_slides/SKILL.md b/.claude/skills/slides.fix_slides/SKILL.md deleted file mode 100644 index 133654120..000000000 --- a/.claude/skills/slides.fix_slides/SKILL.md +++ /dev/null @@ -1,15 +0,0 @@ ---- -description: Fix and improve markdown slides by adding bullet points, examples, and correcting grammar ---- - -- Given the slides in Markdown - - Each slide titles are prepended with `*` - -- You will: - - Maintain the structure of the text and keep the content of the existing text - - Add bullet points to the text that are important or missing - - Add examples to clarify the text and help intuition - - Fix the English grammar - - Fix any mistake only if you are sure about the correction - -Print only the markdown without any explanation. diff --git a/.claude/skills/slides.format_rules/SKILL.md b/.claude/skills/slides.format_rules/SKILL.md index 53b43c23e..7dc3eaae5 100644 --- a/.claude/skills/slides.format_rules/SKILL.md +++ b/.claude/skills/slides.format_rules/SKILL.md @@ -2,36 +2,26 @@ description: Format slides for technical audiences following structured presentation conventions --- -You are an expert writer of slides and presentation.
- -# Audience - -- Your target is college graduate in computer science +You are an expert writer of slides and presentations for college students + - Your target audience is college graduates in computer science + - You need to be clear and precise # Formatting - - Don't use emoji - Don't use page separators -- Make examples for concepts whenever possible +- Don't use Unicode characters; use LaTeX symbols if needed + - Instead of → use $\to$ -# When I Ask You to Create a Slide - -- Each slide should start with <* Slide title> -- Each slide contains at most 8 bullet point arranged in a hierarchical - structure +# Slide format +- Each slide should start with: + ``` + * Slide title + ``` +- Each slide contains bullet points arranged in a hierarchical structure - Every line starts with a bullet point - Do not use period at the end of a phrase -# Suggest Images When Possible - -- When an image might help with clarity, add a description of it like: +- Use italics and add quotes for questions ``` - - Description of the image - + - E.g., _"If we lower prices by 10%, will revenue increase?"_ ``` - -# Output Is Markdown - -- Write the output in a markdown format in the form of code
slide +description: Reduce the text in slides, leaving the structure unchanged --- -You are an expert writer of slides and presentations. +You are an expert writer of slides and presentations for a college audience ## Subset of slides - If there are tokens and you will process only the text between those tokens -- Otherwise you can process the entire file +- Otherwise you process the entire file + +## Follow conventions +- Follow the conventions in `.claude/skills/slides.format_rules/SKILL.md` ## Slide title - If a line starts with an asterisk `*`, it's the slide title and leave it unchanged - Examples: - - * Slide title - - This is a very long bullet point that is not clear and should be removed - - This is a clear bullet point that should be kept - - - - * Slide title - - This is a clear bullet point that should be kept - ## Keep the structure +- Maintain the sequence of slides, the comments, and the headers - Maintain the structure of the text in terms of bullet and sub-bullet points - Keep all the figures +- Leave bold lines untouched + - E.g., + ``` + - **Collections of data** + - Aggregated, organized data sets for analysis + - E.g., customer purchase histories in a CRM system + ``` + - Good + ``` + - **Collections of data** + - Organized datasets for analysis + - E.g., customer purchase histories in CRM + ``` + - Bad + ``` + - **Collections of data**: organized datasets for analysis + - E.g., customer purchase histories in CRM + ``` + +## Reduce text +- Reduce text, keeping the bullet structure untouched + - Write directly to the audience using "you" + - Be concise: remove filler words (e.g., "the", "that", "very") + - Use active voice (e.g., "Improve accuracy," not "Accuracy can be improved") + - Prefer short phrases over full sentences +- E.g., + + ``` + * Slide title + - This is a very long bullet point that is not clear and should be removed + - This is a clear bullet point that should be kept + ``` + + + + ``` + * Slide title + - This is a clear
bullet point that should be kept + ``` + + +- Example + + - **Collections of data** + - Aggregated, organized data sets for analysis + - E.g., customer purchase histories in a CRM system + + - **Descriptive statistics** + - Summary metrics: mean, median, mode, standard deviation + - E.g., average sales per quarter to understand trends + + - **Historical reports** + - Examination of _past performance_ + - E.g., monthly sales reports for past fiscal year + + - **Dashboards** + - Visual displays of key metrics for insights + - E.g., dashboard showing quarterly revenue, expenses + + - **Models** + - Statistical representations to _forecast, explain phenomena_ + - E.g., model to anticipate customer churn based on behavioral data + + + + - **Collections of data** + - Organized datasets for analysis + - E.g., customer purchase histories in a CRM + + - **Descriptive statistics** + - Key metrics: mean, median, mode, standard deviation + - E.g., average quarterly sales to track trends + + - **Historical reports** + - Review of _past performance_ + - E.g., monthly sales reports from last year -## Improve text -- Make the text clean and readable -- Remove all the words that are not needed and that are not important -- Use "you" instead of "we" -- Be concise: Drop filler words ("the", "that", etc.) -- Use active voice: "Improve accuracy" instead of "Accuracy can be improved." 
+ - **Dashboards** + - Visuals of key metrics for quick insights + - E.g., quarterly revenue and expense dashboard + - **Models** + - Statistical tools to _forecast, explain_ + - E.g., churn prediction from customer behavior + diff --git a/.claude/skills/slides.suggest_improvements/SKILL.md b/.claude/skills/slides.suggest_improvements/SKILL.md deleted file mode 100644 index 046a50ae9..000000000 --- a/.claude/skills/slides.suggest_improvements/SKILL.md +++ /dev/null @@ -1,17 +0,0 @@ ---- -description: Suggest which markdown slides to remove or merge to improve a presentation ---- - -- Given a file storing slides in Markdown - - Each slide title is prepended with `*` - -- Follow the @.claude/skills/slides.suggest_improvements/SKILL.md - -# Slides to remove -- Remove slides whose content is redundant or unclear - -# Slides to merge -- Merge slides and remove redundant content - -# Fix content of slides -- If a slide content is incorrect, fix it or clarify it diff --git a/.claude/skills/slides.write_slides/SKILL.md b/.claude/skills/slides.write_slides/SKILL.md index 2ad1ae2f5..04b60b3fc 100644 --- a/.claude/skills/slides.write_slides/SKILL.md +++ b/.claude/skills/slides.write_slides/SKILL.md @@ -2,52 +2,72 @@ description: Write lecture slides for a graduate-level ML course following academic formatting and pedagogical style --- -You are a college professor in CS. - -You are tasked with creating lecture slides for MSML610: Advanced Machine -Learning. - -- Follow this format exactly +You are a college professor in Computer Science, machine learning, and artificial +intelligence and you are tasked with creating lecture slides for a college class. 
## Pedagogical Style - When writing slides, maintain academic rigor while ensuring clarity for - graduate-level ML students + graduate-level students - Balance mathematical formalism with intuitive explanations and concrete examples -- Progressive Complexity: Start simple, build to complex -- Multiple Representations: Text, math, diagrams, tables, examples -- Concrete Examples: Burglar alarm, wet grass, car insurance, medical diagnosis -- Clear Terminology: Bold new terms on first use -- Intuition Before Formalism: Explain concept, then formalize -- Connections: Reference earlier concepts when building on them +- Progressive complexity: start simple, build to complex +- Multiple representations: text, math, diagrams, tables, examples +- Use concrete examples + - Label clearly as "**Example**" with real-world scenarios +- Intuition before formalism: explain concept, then formalize +- Reference earlier concepts when building on them + +- Introduce Problem/Motivation: Start with why the topic matters +- Formal Definitions: Use clear, mathematical definitions +- Visualizations: Include GraphViz diagrams for relationships/networks +- Comparisons: Use "vs" or side-by-side columns +- Algorithms: Number steps clearly +- Pros/Cons: Use bullet lists with `**Pros**` and `**Cons**` headers ## Sections -- Major Sections are delimited with: +- Major sections are delimited with: ``` # ############################################################################## # Section Title # ############################################################################## ``` -- Subsections: Use `##` for subsections or just section names without `#` +- Use `##` for subsections ## Formatting style - Write slides in markdown - Do not use page separators -- Special definitions: `\defeq` for "defined as" - Group font size changes: `\begingroup \large ... 
\endgroup` -- Reference figures from `msml610/lectures_source/figures/` -- Use `\iff` for "if and only if" -- Use `\perp` for independence symbol -- Do not use non ASCII characters but use Latex when neede - - E.g., instead of ε use $\epsilon$ -- Instead of → use $\to$ -- Use $\EE[...]$ and $\VV[...]$ +- Do not use non-ASCII characters for symbols; use LaTeX when needed + - Instead of ε use $\varepsilon$ + - Instead of → use $\to$ + +## Mathematical Notation +- Inline math: `$\Pr(X | Y)$` +- Display math: + ```markdown + $$ + \Pr(X | Y) = \frac{\Pr(Y | X) \Pr(X)}{\Pr(Y)} + $$ + ``` +- Multi-line equations: + ``` + \begin{align*} + & \Pr(x_1, x_2) \\ + & = \Pr(x_1) \Pr(x_2 | x_1) + \end{align*} + ``` +- Special definitions: + - `\defeq` for "defined as" + - `\iff` for "if and only if" + - `\perp` for independence symbol + - $\EE[...]$ for expected value (mean) + - $\VV[...]$ for variance ## Slide formats - Use `*` for slide title/bullets: - ``` + ```markdown * Slide Title - Main point @@ -76,13 +96,15 @@ Learning. ```` ## Tables -``` -\begingroup \scriptsize -| **Column1** | **Column2** | -| ----------- | ----------- | -| Value | Value | -\endgroup -``` + +- Use markdown tables whenever possible + ```markdown + \begingroup \scriptsize + | **Column1** | **Column2** | + | ----------- | ----------- | + | Value | Value | + \endgroup + ```
Visualizations: Include GraphViz diagrams for relationships/networks -5. Comparisons: Use "vs" or side-by-side columns -6. Algorithms: Number steps clearly -7. Pros/Cons: Use bullet lists with `**Pros**` and `**Cons**` headers +- Examples of different slide formats are below ## Definition Slide ``` -* Term: Definition +* : Definition - **Term** is [definition] - Property 1 @@ -132,7 +137,7 @@ Content on right ## Example Slide ``` -* Topic: Example +* : Example - **Example**: [scenario description] - Given: [conditions] diff --git a/.claude/skills/text.summarize_hn_in_bullet_points/SKILL.md b/.claude/skills/text.summarize_hn_in_bullet_points/SKILL.md index 1b805288a..b12f325fd 100644 --- a/.claude/skills/text.summarize_hn_in_bullet_points/SKILL.md +++ b/.claude/skills/text.summarize_hn_in_bullet_points/SKILL.md @@ -3,20 +3,40 @@ description: Summarize the discussion on Hacker News on a topic model: haiku --- -Analyze the Hacker News comment section for the linked article. +- Given a pointer to a discussion on Hacker News in the form of a URL + - E.g., https://news.ycombinator.com/item?id=47743628 -From all comments, select the 5 most interesting ones based on the following -criteria: -- Thought-provoking or insightful -- Presents a unique perspective or uncommon knowledge -- Sparks discussion or debate -- Technically informative or educational -- Controversial but well-argued +- All the output text should be structured as markdown bullet points following + the rules in `.claude/skills/text.summarize_in_bullet_points/SKILL.md` -Avoid selecting comments that are: -- Simple jokes or memes -- Very short reactions -- Repetitive or low-effort +# Step 1: Summarize article +- Summarize the main article in 5 bullet points + ``` + # The peril of laziness lost + - ... + - ...
+ ``` -- Convert the text into structured markdown bullet points following the rules in - @.claude/skills/text.summarize_in_bullet_points/SKILL.md +# Step 2: Summarize comments +- Analyze the Hacker News comment section for the linked article. + +- From all comments, summarize the 5 most interesting ones based on the following + criteria: + - Thought-provoking or insightful + - Presents a unique perspective or uncommon knowledge + - Sparks discussion or debate + - Technically informative or educational + - Controversial but well-argued +- Do not print the names of the commenters + +- Avoid selecting comments that are: + - Simple jokes or memes + - Very short reactions + - Repetitive or low-effort + +# Step 3: Save output +- Do not print any commentary on screen +- Output the result in a file `hn.txt` without bold or other markdown formatting, + but also +- Run `lint_txt.py -i hn.txt` +- Run `cat hn.txt` diff --git a/.claude/skills/tikz.create_plot/SKILL.md b/.claude/skills/tikz.create_plot/SKILL.md new file mode 100644 index 000000000..e1f37f126 --- /dev/null +++ b/.claude/skills/tikz.create_plot/SKILL.md @@ -0,0 +1,70 @@ +--- +description: Create a tikz plot +--- + +You are an expert LaTeX/TikZ developer. + +Your task is to convert the given input (which may be an image or a textual +description) into clean, compilable TikZ code. + +# Step 1: Generate TikZ description + +## Output valid TikZ description +- Output ONLY valid LaTeX code using the TikZ package +- Wrap everything inside a complete minimal working example: + ``` + \documentclass{standalone} + \usepackage{tikz} + \begin{document} + \begin{tikzpicture} + ... + \end{tikzpicture} + \end{document} + ``` + +## Preserve layout, if needed +- If an image was given, accurately reproduce the layout: + - Preserve proportions, relative positions, and symmetry. + - Use coordinates and scaling where appropriate. + - Approximate complex curves with TikZ paths when needed.
+ +## Create TikZ code +- Use appropriate TikZ features: + - Nodes for labeled elements + - draw, fill, shade for shapes + - arrows and edge styles when relevant + - positioning and calc libraries if helpful + +- Keep the code clean and readable: + - Use indentation + - Define reusable styles if repeated elements exist + +- If the input is ambiguous: + - Make reasonable assumptions + - Prefer clarity and visual correctness over perfection + +- Do NOT include explanations, comments, or markdown. + Only output the LaTeX code. + +# Step 2: Save File + +- Save the output in a `tikz_figure.tex` file +- Output only valid tikz code without triple backticks +- Do not explain the code in natural language + +# Step 3: Render Graph + +- After the graph description is generated, generate an image with: + ``` + > ./helpers_root/dev_scripts_helpers/documentation/dockerized_tikz_to_bitmap.py \ + -i tikz_figure.tex -o output.png + + > open output.png + ``` + +# Step 4: Read the PNG file + +- If an image was specified, read the PNG file +- If the generated PNG image is very different from the input image: + - Find the differences in terms of layout + - Apply changes to `tikz_figure.tex` to approximate the input image diff --git a/dev_scripts_helpers/documentation/convert_pdf_to_md.py b/dev_scripts_helpers/documentation/convert_pdf_to_md.py deleted file mode 100755 index c5c713abd..000000000 --- a/dev_scripts_helpers/documentation/convert_pdf_to_md.py +++ /dev/null @@ -1,166 +0,0 @@ -#!/usr/bin/env -S uv run - -# /// script -# dependencies = ["pymupdf", "pyyaml"] -# /// - -""" -Convert PDF file to markdown and extract figures. - -# Usage: - -1) Run this command to convert a PDF to markdown: - -> IN_FILE_NAME="document.pdf" -> OUT_FILE_NAME="document.md" -> convert_pdf_to_md.py --input $IN_FILE_NAME --output $OUT_FILE_NAME - -Figures will be extracted to a directory `$OUT_FILE_NAME.figs/` in the same location -as the markdown file.
-""" - -import argparse -import logging -import os - -import fitz - -import helpers.hdbg as hdbg -import helpers.hio as hio -import helpers.hparser as hparser - -_LOG = logging.getLogger(__name__) - -# ############################################################################# -# PDF Processing -# ############################################################################# - - -def _extract_text_and_images(pdf_path: str, output_dir: str) -> str: - """ - Extract text and images from PDF using pymupdf. - - :param pdf_path: path to the input PDF file - :param output_dir: directory to save extracted images - :return: markdown text with image references - """ - hdbg.dassert_file_exists(pdf_path) - hio.create_dir(output_dir, incremental=False) - _LOG.info("Opening PDF: %s", pdf_path) - pdf_document = fitz.open(pdf_path) - markdown_content = [] - image_counter = 0 - fig_dir_name = os.path.basename(output_dir) - # Process each page. - for page_num in range(len(pdf_document)): - _LOG.debug("Processing page %d", page_num + 1) - page = pdf_document[page_num] - # Extract text from the page. - text = page.get_text() # type: ignore - if text.strip(): - markdown_content.append(text) - # Extract images from the page. - images = page.get_images() - for img_ref in images: - image_counter += 1 - # Get image data. - xref = img_ref[0] - base_image = pdf_document.extract_image(xref) - image_bytes = base_image["image"] - image_ext = base_image["ext"] - # Save image. - image_filename = f"figure_{image_counter:03d}.{image_ext}" - image_path = os.path.join(output_dir, image_filename) - with open(image_path, "wb") as f: - f.write(image_bytes) - _LOG.debug("Saved image: %s", image_path) - # Add markdown reference. 
- markdown_content.append( - f"\n![Figure {image_counter}]({fig_dir_name}/{image_filename})\n" - ) - pdf_document.close() - _LOG.info("Extracted %d images from PDF", image_counter) - return "\n".join(markdown_content) - - -def _clean_markdown(content: str) -> str: - """ - Clean up extracted markdown content. - - :param content: raw markdown content - :return: cleaned markdown content - """ - # Remove excessive blank lines. - lines = content.split("\n") - cleaned_lines = [] - prev_blank = False - for line in lines: - is_blank = line.strip() == "" - if is_blank and prev_blank: - continue - cleaned_lines.append(line) - prev_blank = is_blank - return "\n".join(cleaned_lines).strip() - - -# ############################################################################# -# CLI -# ############################################################################# - - -def _parse() -> argparse.ArgumentParser: - parser = argparse.ArgumentParser( - description=__doc__, - formatter_class=argparse.RawDescriptionHelpFormatter, - ) - parser.add_argument( - "--input", - "-i", - action="store", - required=True, - type=str, - help="The PDF file to convert to Markdown", - ) - parser.add_argument( - "--output", - "-o", - action="store", - required=True, - type=str, - help="The output Markdown file", - ) - hparser.add_verbosity_arg(parser) - return parser - - -def _main(parser: argparse.ArgumentParser) -> None: - args = parser.parse_args() - hdbg.init_logger(verbosity=args.log_level, use_exec_path=True) - pdf_file = args.input - md_file = args.output - # Validate input file exists. - hdbg.dassert_file_exists( - pdf_file, "Input PDF file does not exist:", pdf_file - ) - # Create the folder for the figures. - md_file_figs = md_file.replace(".md", ".figs") - _LOG.info("Creating figures directory: %s", md_file_figs) - # Extract text and images. - _LOG.info("Extracting text and images from PDF...") - markdown_content = _extract_text_and_images(pdf_file, md_file_figs) - # Clean up markdown. 
- _LOG.info("Cleaning up markdown content...") - markdown_content = _clean_markdown(markdown_content) - # Write markdown to file. - _LOG.info("Writing markdown to: %s", md_file) - hio.to_file(md_file, markdown_content) - _LOG.info( - "Successfully converted '%s' to '%s'", - pdf_file, - md_file, - ) - _LOG.info("Figures saved to: %s", md_file_figs) - - -if __name__ == "__main__": - _main(_parse()) diff --git a/dev_scripts_helpers/documentation/convert_pdf_to_md.sh b/dev_scripts_helpers/documentation/convert_pdf_to_md.sh deleted file mode 100755 index 338a54b7c..000000000 --- a/dev_scripts_helpers/documentation/convert_pdf_to_md.sh +++ /dev/null @@ -1,13 +0,0 @@ -#!/bin/bash -xe -EXEC='/Applications/calibre.app/Contents/MacOS/ebook-convert' -SRC_FILE='/Users/saggese/Library/CloudStorage/GoogleDrive-saggese@gmail.com/My Drive/books/Math - Bayesian methods/2023 - Facure - Causal Inference in Python_ Applying Causal Inference in the Tech Industry.pdf' -DST_FILE1="paper.epub" -DST_FILE2="paper.md" - -"$EXEC" "$SRC_FILE" "$DST_FILE1" --enable-heuristics --chapter "//*[name()='h1' or name()='h2']" - -pandoc $DST_FILE1 \ - --to gfm \ - --wrap=none \ - --extract-media=images \ - -o $DST_FILE2 diff --git a/dev_scripts_helpers/documentation/epub_to_md.sh b/dev_scripts_helpers/documentation/epub_to_md.sh new file mode 100644 index 000000000..5cc83b145 --- /dev/null +++ b/dev_scripts_helpers/documentation/epub_to_md.sh @@ -0,0 +1,5 @@ +pandoc facure.epub \ + --to=gfm \ + --wrap=none \ + --extract-media=images \ + -o output.md diff --git a/dev_scripts_helpers/documentation/latex_abbrevs.sty b/dev_scripts_helpers/documentation/latex_abbrevs.sty index 23ee6c94f..04fd368a7 100644 --- a/dev_scripts_helpers/documentation/latex_abbrevs.sty +++ b/dev_scripts_helpers/documentation/latex_abbrevs.sty @@ -1,5 +1,9 @@ % From https://tex.stackexchange.com/questions/75667/change-colour-on-chapter-section-headings-lyx \usepackage{xcolor} +\definecolor{darkgreen}{rgb}{0,0.5,0} + 
+\usepackage{cancel} + %\usepackage{sectsty} %\chapterfont{\color{blue}} %\sectionfont{\color{red}} @@ -18,17 +22,39 @@ % Work-around for too deeply nested error. \usepackage{enumitem} \setlistdepth{9} -\setlist[itemize,1]{label=\textbullet} +\setlist[itemize,1]{label=\textendash} \setlist[itemize,2]{label=\textendash} -\setlist[itemize,3]{label=\textasteriskcentered} -\setlist[itemize,4]{label=\textperiodcentered} -\setlist[itemize,5]{label=\textbullet} -\setlist[itemize,6]{label=\textbullet} -\setlist[itemize,7]{label=\textbullet} -\setlist[itemize,8]{label=\textbullet} -\setlist[itemize,9]{label=\textbullet} +\setlist[itemize,3]{label=\textendash} +\setlist[itemize,4]{label=\textendash} +\setlist[itemize,5]{label=\textendash} +\setlist[itemize,6]{label=\textendash} +\setlist[itemize,7]{label=\textendash} +\setlist[itemize,8]{label=\textendash} +\setlist[itemize,9]{label=\textendash} \renewlist{itemize}{itemize}{9} +% \setlist[itemize,1]{label=\textbullet} +% \setlist[itemize,2]{label=\textendash} +% \setlist[itemize,3]{label=\textasteriskcentered} +% \setlist[itemize,4]{label=\textperiodcentered} +% \setlist[itemize,5]{label=\textbullet} +% \setlist[itemize,6]{label=\textbullet} +% \setlist[itemize,7]{label=\textbullet} +% \setlist[itemize,8]{label=\textbullet} +% \setlist[itemize,9]{label=\textbullet} + +% Configure enumerate lists to prevent infinite recursion in Beamer themes.
+\setlist[enumerate,1]{label=\arabic*.,ref=\arabic*} +\setlist[enumerate,2]{label=\alph*.,ref=\theenumi.\alph*} +\setlist[enumerate,3]{label=\roman*.,ref=\theenumi.\theenumii.\roman*} +\setlist[enumerate,4]{label=\Alph*.,ref=\theenumi.\theenumii.\theenumiii.\Alph*} +\setlist[enumerate,5]{label=\Roman*.,ref=\theenumi.\theenumii.\theenumiii.\theenumiv.\Roman*} +\setlist[enumerate,6]{label=\arabic*.,ref=\theenumi.\theenumii.\theenumiii.\theenumiv.\theenumv.\arabic*} +\setlist[enumerate,7]{label=\alph*.,ref=\theenumi.\theenumii.\theenumiii.\theenumiv.\theenumv.\theenumvi.\alph*} +\setlist[enumerate,8]{label=\roman*.,ref=\theenumi.\theenumii.\theenumiii.\theenumiv.\theenumv.\theenumvi.\theenumvii.\roman*} +\setlist[enumerate,9]{label=\arabic*.,ref=\theenumi.\theenumii.\theenumiii.\theenumiv.\theenumv.\theenumvi.\theenumvii.\theenumviii.\arabic*} +\renewlist{enumerate}{enumerate}{9} + % ############################################################################# % Vector / matrix notation % ############################################################################# diff --git a/dev_scripts_helpers/documentation/pandoc.latex b/dev_scripts_helpers/documentation/pandoc.latex index 67496ed4b..c144bf1d4 100644 --- a/dev_scripts_helpers/documentation/pandoc.latex +++ b/dev_scripts_helpers/documentation/pandoc.latex @@ -22,6 +22,7 @@ $endif$ \usepackage{amsfonts} % This creates problems. %\usepackage{dot2texi} +\usepackage{xcolor} \usepackage{tikz} \usepackage[pdf]{graphviz} \usetikzlibrary{shapes,arrows} diff --git a/dev_scripts_helpers/documentation/pdf_to_md.py b/dev_scripts_helpers/documentation/pdf_to_md.py index e6faf4392..8eac79a50 100644 --- a/dev_scripts_helpers/documentation/pdf_to_md.py +++ b/dev_scripts_helpers/documentation/pdf_to_md.py @@ -13,16 +13,16 @@ Automatically installs dependencies via `uv` if missing. Usage: - # Convert PDF to markdown with images. 
- uv run ./helpers_root/dev_scripts_helpers/documentation/pdf_to_md.py \ - --input document.pdf \ - --output output_dir - - # With verbose logging. - uv run ./helpers_root/dev_scripts_helpers/documentation/pdf_to_md.py \ - --input document.pdf \ - --output output_dir \ - -v DEBUG +# Convert PDF to markdown with images. +> pdf_to_md.py \ + --input document.pdf \ + --output output_dir + +# With verbose logging. +> pdf_to_md.py \ + --input document.pdf \ + --output output_dir \ + -v DEBUG Import as: diff --git a/dev_scripts_helpers/scraping_script/extract_hn_article.py b/dev_scripts_helpers/scraping_script/process_hn_article.py similarity index 98% rename from dev_scripts_helpers/scraping_script/extract_hn_article.py rename to dev_scripts_helpers/scraping_script/process_hn_article.py index 75841c9d5..74f8c4796 100755 --- a/dev_scripts_helpers/scraping_script/extract_hn_article.py +++ b/dev_scripts_helpers/scraping_script/process_hn_article.py @@ -5,10 +5,12 @@ This script processes Hacker News item URLs from CSV files and uses the Firebase API to extract selected fields: -- The submission title (--extract_title) -- The original article URL that the submission links to (--extract_url) -- The submission timestamp converted to date format YYYY-MM-DD in UTC (--extract_timestamp) -- Optionally, classify articles into categories using LLM (--tag_articles, requires --extract_title) +- `--extract_title`: The submission title +- `--extract_url`: The original article URL that the submission links to +- `--extract_timestamp`: The submission timestamp converted to date format + YYYY-MM-DD in UTC +- `--tag_articles`: Classify articles into categories using an LLM (requires + `--extract_title`) All extraction options are opt-in and must be explicitly enabled.
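The `--extract_timestamp` option in the `process_hn_article.py` docstring converts the submission time to a `YYYY-MM-DD` date in UTC. The following is a minimal sketch of that pipeline, not the script's actual code. It assumes that the Firebase item's `time` field holds Unix seconds, as the public Hacker News API documents, and that item URLs carry an `id` query parameter. The helpers `item_id_from_url`, `item_api_url`, and `timestamp_to_utc_date` are hypothetical names.

```python
import datetime
import urllib.parse


def item_id_from_url(url: str) -> int:
    """
    Extract the numeric item id from a Hacker News item URL (hypothetical helper).

    E.g., `https://news.ycombinator.com/item?id=8863` -> 8863.
    """
    query = urllib.parse.urlparse(url).query
    ids = urllib.parse.parse_qs(query).get("id")
    if not ids:
        raise ValueError(f"No 'id' parameter in URL: {url}")
    return int(ids[0])


def item_api_url(item_id: int) -> str:
    """
    Build the Firebase API endpoint for an item (hypothetical helper).
    """
    return f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"


def timestamp_to_utc_date(unix_time: int) -> str:
    """
    Convert a Unix timestamp in seconds to a `YYYY-MM-DD` date in UTC.
    """
    # Use an explicit UTC timezone so the result does not depend on the
    # machine's local timezone.
    dt = datetime.datetime.fromtimestamp(unix_time, tz=datetime.timezone.utc)
    return dt.strftime("%Y-%m-%d")
```

For example, `timestamp_to_utc_date(1175714200)` returns `"2007-04-04"`. The sketch passes a timezone-aware `fromtimestamp()` call instead of using `datetime.utcfromtimestamp()`, which is deprecated since Python 3.12.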
diff --git a/dev_scripts_helpers/slides/README.md b/dev_scripts_helpers/slides/README.md index 007a65ce5..5267fa54d 100644 --- a/dev_scripts_helpers/slides/README.md +++ b/dev_scripts_helpers/slides/README.md @@ -1,50 +1,29 @@ # Summary +This directory contains tools for processing lecture slides and generating +images using AI services. -This directory contains tools for processing lecture slides and generating images -using AI services. - -# Structure of the Dir - -- No subdirectories in this directory - -# Description of Files - -- `extract_png_from_pdf.py` - - Extracts PNG images from PDF files with one image per page using sequential - numbering -- `generate_book_chapter.py` - - Generates book chapter from markdown with PNG or PDF, YAML preamble with - title, and centered headers -- `header-style.tex` - - LaTeX header customization file for pandoc PDF conversion with styled - section headers -- `generate_class_images.py` - - Generates multiple images using OpenAI's DALL-E API from text prompts with - quality options -- `generate_slide_script.py` - - Generates presentation scripts from markdown slides using LLM processing -- `process_lessons.py` - - Orchestrates generation of PDF slides, scripts, and book chapters with - pattern and range support -- `process_slides.py` - - Processes markdown slides using LLM prompts for transformation and quality - checks -- `slides_utils.py` - - Utility functions for extracting slides from markdown and processing slide - images # Description of Executables +| Script | Description | +| :------------------------- | :----------------------------------------------------------------- | +| `extract_png_from_pdf.py` | Extracts PNG images from PDF files with sequential numbering | +| `generate_book_chapter.py` | Generates book chapters from markdown and PDF/PNG images | +| `generate_class_images.py` | Generates images using OpenAI's DALL-E API from text prompts | +| `generate_slide_script.py` | Generates presentation scripts from 
markdown slides using LLM | +| `header-style.tex` | LaTeX header customization file for pandoc PDF conversion | +| `process_lessons.py` | Orchestrates PDF, script, and book chapter generation for lectures | +| `process_slides.py` | Processes markdown slides with LLM transformations and validations | +| `slides_utils.py` | Utility functions for extracting and processing slide content | ## `extract_png_from_pdf.py` - -### What It Does +**What it does**: - Extracts each page of a PDF file as a separate PNG image - Numbers output files sequentially (slides001.png, slides002.png, etc.) - Supports customizable DPI for image quality control - Creates output directory automatically with optional from-scratch mode -### Examples +**Examples**: - Extract all pages from a PDF with default settings: ```bash @@ -67,20 +46,19 @@ using AI services. ``` ## `generate_book_chapter.py` - -### What It Does +**What it does**: - Processes markdown slides with PNG images or PDF file to create book chapter format -- Extracts title from markdown file (e.g., from `\text{\blue{Lesson 2.1: - Git}}`) and adds YAML preamble for pandoc metadata +- Extracts title from markdown file (e.g., from `\text{\blue{Lesson 2.1: Git}}`) + and adds YAML preamble for pandoc metadata - Extracts PNG images from PDF automatically when --input_pdf_file is provided - Validates that the number of slides in markdown matches the number of PNG files (expects num_slides + 1 = num_pngs to account for title slide) - Properly aligns title slide (first PNG) with content slides (remaining PNGs) to ensure header, slide image, and commentary are synchronized -- First slide (PNG 1) is treated as title slide with only the image (no title - or commentary) +- First slide (PNG 1) is treated as title slide with only the image (no title or + commentary) - Content slides (PNG 2+) are paired with corresponding markdown slides, with centered headers formatted as "idx / tot: title" and LLM-based commentary - Supports optional page breaks 
via --add_new_page flag to insert `\newpage` @@ -91,7 +69,7 @@ using AI services. - Creates markdown output with PNG references and detailed commentary for each slide -### Examples +**Examples**: - Generate book chapter from markdown and PNG directory: ```bash @@ -123,7 +101,7 @@ using AI services. > ./generate_book_chapter.py --input_file lecture.txt --input_pdf_file lecture.pdf --output_dir ./book_chapters/ --add_new_page ``` -### Converting to PDF with pandoc +**Converting to PDF with pandoc**: After generating the book chapter markdown, convert it to PDF using pandoc with custom header styling: @@ -139,15 +117,14 @@ custom header styling: ``` ## `generate_class_images.py` - -### What It Does +**What it does**: - Generates multiple images using OpenAI's DALL-E 3 API from text prompts - Supports both standard and HD quality image generation in 1024x1024 resolution - Includes special workload mode for generating predefined image sets for course materials -### Examples +**Examples**: - Generate 5 HD quality images from a prompt: ```bash @@ -170,14 +147,13 @@ custom header styling: ``` ## `generate_slide_script.py` - -### What It Does +**What it does**: - Processes markdown slides and generates presentation scripts using LLM - Groups slides for batch processing to optimize LLM API calls - Supports limiting slide ranges and customizable grouping strategies -### Examples +**Examples**: - Generate script from markdown slides with default settings: ```bash @@ -200,16 +176,15 @@ custom header styling: ``` ## `process_lessons.py` - -### What It Does +**What it does**: Orchestrates the generation of multiple outputs from lecture source files for educational materials. This is the main entry point for processing lecture content into various formats. 
-**Key Features:** +**Key features**: -- Converts lecture text source files to PDF slides using notes_to_pdf.py +- Converts lecture text source files to PDF slides using `notes_to_pdf.py` - Generates reading scripts from lecture materials with transition text - Applies LLM-based transformations for slide reduction and quality checking - Generates book chapters from lecture content @@ -217,7 +192,7 @@ content into various formats. - Provides slide range limiting for focused processing - Includes dry-run mode for previewing commands -**Supported Actions:** +**Supported actions**: - `generate_pdf`: Generate presentation slides from text source files - `generate_script`: Generate instructor reading scripts with commentary @@ -225,15 +200,15 @@ content into various formats. - `check_slide`: Apply LLM validation to check slide quality - `improve_slide`: Apply LLM transformation to improve slide content - `book_chapter`: Generate book chapter PDF from lecture content -- `generate_class_quizzes`: Generate multiple choice quizzes from lecture content - using LLM +- `generate_class_quizzes`: Generate multiple choice quizzes from lecture + content using LLM - `generate_class_recap`: Generate open-ended discussion/review questions from lecture content using LLM -**Workflow:** +**Workflow**: -1. Parse lecture patterns or ranges from command line arguments (e.g., '01\*', - '01.1', '01\*:03\*', '01.1-03.2') +1. Parse lecture patterns or ranges from command line arguments (e.g., '01*', + '01.1', '01*:03\*', '01.1-03.2') 2. Find matching lecture source files in `/lectures_source/` directory 3. For each matching file, execute specified actions in sequence 4. Output generated files to appropriate directories: @@ -243,11 +218,11 @@ content into various formats. 
- Multiple choice quizzes → `/lectures_quizzes/` - Discussion/recap questions → `/lectures_recap/` -**Command Line Arguments:** +**Command line arguments**: - `--lectures`: Lecture(s) to process (required) - Single pattern: '01.1' or '01\*' - - Union of patterns (colon-separated): '01\*:02\*:03.1' + - Union of patterns (colon-separated): '01*:02*:03.1' - Continuous range (hyphen-separated): '01.1-03.2' (inclusive) - Note: Range and union syntax cannot be mixed - `--class`: Class directory name (required, choices: data605, msml610) @@ -258,7 +233,7 @@ content into various formats. - `--dry_run`: Print commands without executing them - `-v/--log_level`: Set logging verbosity (DEBUG, INFO, WARNING, ERROR) -**Dependencies:** +**Dependencies**: - `notes_to_pdf.py`: Converts text source to PDF slides - `generate_slide_script.py`: Creates instructor scripts @@ -267,7 +242,7 @@ content into various formats. - `class_scripts/gen_quizzes.py`: Generates quizzes from lecture content - `lint_txt.py`: Lints generated text files -### Examples +**Examples**: - Generate PDF slides for all lectures in lesson 01: ```bash @@ -335,15 +310,14 @@ content into various formats. 
``` ## `process_slides.py` - -### What It Does +**What it does**: - Extracts individual slides from markdown files and processes each with LLM prompts - Supports various actions like slide reduction, text checking, and improvement - Provides parallel processing with incremental execution and error recovery -### Examples +**Examples**: - Process slides with LLM transformation: ```bash diff --git a/dev_scripts_helpers/slides/process_lessons.py b/dev_scripts_helpers/slides/process_lessons.py index 34e0c35c5..ed934ce66 100755 --- a/dev_scripts_helpers/slides/process_lessons.py +++ b/dev_scripts_helpers/slides/process_lessons.py @@ -30,15 +30,15 @@ _LOG = logging.getLogger(__name__) _VALID_ACTIONS = [ - "generate_pdf", - "generate_tex", - "generate_script", - "reduce_slide", "check_slide", - "improve_slide", - "book_chapter", + "generate_book_chapter", "generate_class_quizzes", "generate_class_recap", + "generate_pdf", + "generate_script", + "generate_tex", + "improve_slide", + "reduce_slide", ] _DEFAULT_ACTIONS = ["generate_pdf"] @@ -497,7 +497,7 @@ def _process_lecture_file( elif action == "improve_slide": # TODO: Implement _slide_improve function. hdbg.dfatal("improve_slide action not yet implemented") - elif action == "book_chapter": + elif action == "generate_book_chapter": _generate_book_chapter(class_dir, source_path, source_name) elif action == "generate_class_quizzes": _generate_class_quizzes(class_dir, source_path, source_name) diff --git a/helpers/hgit.py b/helpers/hgit.py index 9644d0dfb..c7b1effa4 100644 --- a/helpers/hgit.py +++ b/helpers/hgit.py @@ -202,31 +202,39 @@ def find_git_root(path: str = ".") -> str: lines = txt.split("\n") for line in lines: # Look for a `gitdir:` line that specifies the linked directory. - # Example: `gitdir: ../.git/modules/helpers_root`. + # Example: `gitdir: ../.git/modules/helpers_root` (submodule) + # or `gitdir: /path/to/.git/worktrees/name` (worktree). 
if line.startswith("gitdir:"): git_dir_path = line.split(":", 1)[1].strip() _LOG.debug("git_dir_path=%s", git_dir_path) - # Resolve the relative path to the absolute path of the Git directory. - abs_git_dir = os.path.abspath( - os.path.join(path, git_dir_path) - ) - # Traverse up to find the top-level `.git` directory. - while True: - # Check if the current directory is a `.git` directory. - if os.path.basename(abs_git_dir) == ".git": - git_root_dir = os.path.dirname(abs_git_dir) - # Found the root. - break - # Move one level up in the directory structure. - parent = os.path.dirname(abs_git_dir) - # Reached the filesystem root without finding the `.git` directory. - hdbg.dassert_ne( - parent, - abs_git_dir, - "Top-level .git directory not found.", + # For worktrees, the current path is the root of the worktree. + # The worktree's `.git` file points to the shared git directory + # (e.g., main_repo/.git/worktrees/worktree_name). + if ".git/worktrees/" in git_dir_path: + git_root_dir = path + else: + # For other linked setups (submodules, custom .git directory), + # traverse up to find the root of the target repository. + abs_git_dir = os.path.abspath( + os.path.join(path, git_dir_path) ) - # Continue traversing up. - abs_git_dir = parent + # Traverse up to find the top-level `.git` directory. + while True: + # Check if the current directory is a `.git` directory. + if os.path.basename(abs_git_dir) == ".git": + git_root_dir = os.path.dirname(abs_git_dir) + # Found the root. + break + # Move one level up in the directory structure. + parent = os.path.dirname(abs_git_dir) + # Reached the filesystem root without finding the `.git` directory. + hdbg.dassert_ne( + parent, + abs_git_dir, + "Top-level .git directory not found.", + ) + # Continue traversing up. + abs_git_dir = parent break # Exit the loop if the Git root directory is found. 
if git_root_dir is not None: diff --git a/helpers/test/test_hgit.py b/helpers/test/test_hgit.py index c328e629b..7a3a40ba6 100644 --- a/helpers/test/test_hgit.py +++ b/helpers/test/test_hgit.py @@ -768,3 +768,57 @@ def test2(self) -> None: Top-level .git directory not found. """ self.assert_equal(actual, expected, purify_text=True, fuzzy_match=True) + + +# ############################################################################# +# Test_find_git_root6 +# ############################################################################# + + +class Test_find_git_root6(hunitest.TestCase): + """ + Check that the function returns the correct git root if: + - the repo is a worktree + + Directory structure: + main_repo/ + `-- .git/ + |-- config + `-- worktrees/ + `-- csfy2/ + |-- HEAD + `-- config + csfy2/ (worktree) + `-- .git (points to /main_repo/.git/worktrees/csfy2) + """ + + def set_up_test(self) -> None: + temp_dir = self.get_scratch_space() + # Create main repo with a .git directory. + self.main_repo_dir = os.path.join(temp_dir, "main_repo") + hio.create_dir(self.main_repo_dir, incremental=False) + self.git_dir = os.path.join(self.main_repo_dir, ".git") + hio.create_dir(self.git_dir, incremental=False) + # Create worktree git metadata directory. + self.worktree_git_dir = os.path.join( + self.git_dir, "worktrees", "csfy2" + ) + hio.create_dir(self.worktree_git_dir, incremental=False) + # Create worktree directory. + self.worktree_dir = os.path.join(temp_dir, "csfy2") + hio.create_dir(self.worktree_dir, incremental=False) + # Create pointer from worktree to the git directory. + worktree_git_file = os.path.join(self.worktree_dir, ".git") + txt = f"gitdir: {self.worktree_git_dir}\n" + hio.to_file(worktree_git_file, txt) + + def test1(self) -> None: + """ + Check that the function returns the worktree root when called from a worktree. 
+ """ + self.set_up_test() + with hsystem.cd(self.worktree_dir): + git_root = hgit.find_git_root(".") + # For worktrees, the function should return the worktree root, + # not the main repository root. + self.assert_equal(git_root, self.worktree_dir)
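The new branch in `find_git_root()` reads the `gitdir:` line of a `.git` file to decide between two cases. For a worktree, the directory that contains the `.git` file is returned. For a submodule or another linked layout, the function walks up from the target directory until it reaches a `.git` directory. The following is a simplified, standalone sketch of that decision. It uses the same `.git/worktrees/` substring check as the diff, but it is not the helper itself. `resolve_root_from_gitdir` is a hypothetical name, and unlike the original it returns absolute paths and raises `ValueError` instead of calling `hdbg.dassert_ne()`.

```python
import os


def resolve_root_from_gitdir(path: str, git_dir_path: str) -> str:
    """
    Return the root for a `.git` file containing `gitdir: <git_dir_path>`.

    :param path: directory containing the `.git` file
    :param git_dir_path: value after `gitdir:` (absolute or relative to `path`)
    :return: absolute path of the root directory
    """
    if ".git/worktrees/" in git_dir_path:
        # Worktree: the directory holding the `.git` file is the root.
        return os.path.abspath(path)
    # Submodule or custom layout: walk up until a `.git` directory is found.
    abs_git_dir = os.path.abspath(os.path.join(path, git_dir_path))
    while os.path.basename(abs_git_dir) != ".git":
        parent = os.path.dirname(abs_git_dir)
        if parent == abs_git_dir:
            # Reached the filesystem root without finding `.git`.
            raise ValueError("Top-level .git directory not found.")
        abs_git_dir = parent
    return os.path.dirname(abs_git_dir)
```

With the layout from `Test_find_git_root6`, a `.git` file in `csfy2/` that points to `main_repo/.git/worktrees/csfy2` resolves to `csfy2/` itself. A submodule pointing to `../.git/modules/helpers_root` resolves to the superproject directory. Like the diff, the worktree check is a plain substring test on POSIX-style separators.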