| 1 | +# File Loading Example |
| 2 | + |
| 3 | +This example demonstrates loading markdown files into HTM's long-term memory with automatic chunking, YAML frontmatter extraction, and source tracking. |
| 4 | + |
| 5 | +**Source:** [`examples/file_loader_usage.rb`](https://github.com/madbomber/htm/blob/main/examples/file_loader_usage.rb) |
| 6 | + |
| 7 | +## Overview |
| 8 | + |
| 9 | +The file loading example shows: |
| 10 | + |
| 11 | +- Loading single markdown files |
| 12 | +- Loading directories with glob patterns |
| 13 | +- YAML frontmatter extraction |
| 14 | +- Querying nodes from loaded files |
| 15 | +- Re-sync behavior for changed files |
| 16 | +- Unloading files from memory |
| 17 | + |
| 18 | +## Running the Example |
| 19 | + |
| 20 | +```bash |
| 21 | +export HTM_DATABASE__URL="postgresql://user@localhost:5432/htm_development" |
| 22 | +ruby examples/file_loader_usage.rb |
| 23 | +``` |
| 24 | + |
| 25 | +## Code Walkthrough |
| 26 | + |
| 27 | +### Loading a Single File |
| 28 | + |
| 29 | +```ruby |
| 30 | +htm = HTM.new(robot_name: "FileLoaderDemo") |
| 31 | + |
| 32 | +# Load a markdown file |
| 33 | +result = htm.load_file("docs/guide.md") |
| 34 | +# => { |
| 35 | +# file_source_id: 1, |
| 36 | +# chunks_created: 5, |
| 37 | +# chunks_updated: 0, |
| 38 | +# skipped: false |
| 39 | +# } |
| 40 | +``` |
| 41 | + |
| 42 | +### YAML Frontmatter |
| 43 | + |
| 44 | +Files with frontmatter have metadata extracted automatically: |
| 45 | + |
| 46 | +```markdown |
| 47 | +--- |
| 48 | +title: PostgreSQL Guide |
| 49 | +author: HTM Team |
| 50 | +tags: |
| 51 | + - database |
| 52 | + - postgresql |
| 53 | +--- |
| 54 | + |
| 55 | +# PostgreSQL Guide |
| 56 | + |
| 57 | +Content starts here... |
| 58 | +``` |
| 59 | + |
| 60 | +Access the extracted frontmatter through the `HTM::Models::FileSource` record: |
| 61 | + |
| 62 | +```ruby |
| 63 | +source = HTM::Models::FileSource.find(result[:file_source_id]) |
| 64 | +source.title # => "PostgreSQL Guide" |
| 65 | +source.author # => "HTM Team" |
| 66 | +source.frontmatter_tags # => ["database", "postgresql"] |
| 67 | +source.frontmatter # => { "title" => "...", ... } |
| 68 | +``` |
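
To make the extraction step concrete, here is a minimal sketch of how a frontmatter block can be separated from the markdown body using only Ruby's standard `yaml` library. The `split_frontmatter` helper is hypothetical and written for illustration; it is not HTM's implementation, which may handle more edge cases.

```ruby
require "yaml"

# Hypothetical helper (not part of HTM): split a markdown string into
# its YAML frontmatter hash and the remaining body text.
def split_frontmatter(text)
  match = text.match(/\A---[ \t]*\n(.*?)\n---[ \t]*(?:\n|\z)(.*)\z/m)
  return [{}, text] unless match # no frontmatter block found

  metadata = YAML.safe_load(match[1]) || {}
  [metadata, match[2].lstrip]
end

doc = <<~MARKDOWN
  ---
  title: PostgreSQL Guide
  author: HTM Team
  tags:
    - database
    - postgresql
  ---

  # PostgreSQL Guide

  Content starts here...
MARKDOWN

metadata, body = split_frontmatter(doc)
puts metadata["title"]        # the title field from the frontmatter
puts metadata["tags"].inspect # the tags list as a Ruby array
puts body.lines.first         # first line of the body after the frontmatter
```

A file without a leading `---` block comes back with an empty hash and its text unchanged, so callers can treat both cases the same way.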
| 69 | + |
| 70 | +### Loading a Directory |
| 71 | + |
| 72 | +```ruby |
| 73 | +# Load all markdown files |
| 74 | +results = htm.load_directory("docs/", pattern: "**/*.md") |
| 75 | +# => [ |
| 76 | +# { file_path: "docs/guide.md", chunks_created: 3, ... }, |
| 77 | +# { file_path: "docs/api.md", chunks_created: 5, ... } |
| 78 | +# ] |
| 79 | + |
| 80 | +# Load with specific pattern |
| 81 | +results = htm.load_directory("docs/guides/", pattern: "*.md") |
| 82 | +``` |
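
The two patterns above differ in whether they descend into subdirectories. The sketch below shows that difference with Ruby's own `Dir.glob` on a throwaway directory; it assumes HTM resolves `pattern:` with standard glob semantics, which this page does not confirm. The directory and file names are made up for the demonstration.

```ruby
require "tmpdir"
require "fileutils"

# Build a small throwaway tree to compare the two glob patterns.
root = Dir.mktmpdir("htm_glob_demo")
begin
  FileUtils.mkdir_p(File.join(root, "guides"))
  File.write(File.join(root, "index.md"), "# Index\n")
  File.write(File.join(root, "notes.txt"), "not markdown\n")
  File.write(File.join(root, "guides", "setup.md"), "# Setup\n")

  # "**/*.md" matches markdown files at any depth below root.
  recursive = Dir.glob("**/*.md", base: root).sort
  # "*.md" matches only markdown files directly inside root.
  top_level = Dir.glob("*.md", base: root).sort
ensure
  FileUtils.remove_entry(root) # clean up the temporary tree
end

puts "recursive: #{recursive.inspect}"
puts "top level: #{top_level.inspect}"
```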
| 83 | + |
| 84 | +### Querying Loaded Files |
| 85 | + |
| 86 | +```ruby |
| 87 | +# Get all nodes from a specific file |
| 88 | +nodes = htm.nodes_from_file("docs/guide.md") |
| 89 | + |
| 90 | +nodes.each do |node| |
| 91 | + puts "#{node.id}: #{node.content[0..50]}..." |
| 92 | +end |
| 93 | +``` |
| 94 | + |
| 95 | +### Re-Sync Behavior |
| 96 | + |
| 97 | +HTM tracks file modification times for efficient updates: |
| 98 | + |
| 99 | +```ruby |
| 100 | +# First load - creates chunks |
| 101 | +htm.load_file("docs/guide.md") |
| 102 | +# => { skipped: false, chunks_created: 5 } |
| 103 | + |
| 104 | +# Second load - skipped (unchanged) |
| 105 | +htm.load_file("docs/guide.md") |
| 106 | +# => { skipped: true } |
| 107 | + |
| 108 | +# After editing file - re-syncs |
| 109 | +htm.load_file("docs/guide.md") |
| 110 | +# => { skipped: false, chunks_updated: 2, chunks_created: 1 } |
| 111 | + |
| 112 | +# Force reload even if the file is unchanged |
| 113 | +htm.load_file("docs/guide.md", force: true) |
| 114 | +``` |
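
The skip-or-reload decision can be pictured as a comparison between a file's current modification time and the one recorded at the last load. The following is a minimal sketch of that idea; `MtimeTracker` and its `sync` method are hypothetical names, and HTM's actual bookkeeping (stored in the database) may differ.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical tracker, not HTM's implementation: it remembers the
# modification time seen at the last load and skips unchanged files.
class MtimeTracker
  def initialize
    @seen = {} # path => File.mtime recorded at the last load
  end

  def sync(path, force: false)
    mtime = File.mtime(path)
    return { skipped: true } if !force && @seen[path] == mtime

    @seen[path] = mtime
    { skipped: false }
  end
end

dir = Dir.mktmpdir("htm_mtime_demo")
path = File.join(dir, "guide.md")
File.write(path, "# Guide\n")

tracker = MtimeTracker.new
first  = tracker.sync(path) # never seen before, so it loads
second = tracker.sync(path) # same mtime, so it is skipped

# Simulate an edit; bump the mtime explicitly so the change is visible
# even on filesystems with coarse timestamp resolution.
File.write(path, "# Guide\n\nNew section\n")
later = File.mtime(path) + 60
File.utime(later, later, path)
third  = tracker.sync(path)              # mtime changed, so it reloads
forced = tracker.sync(path, force: true) # reloads despite unchanged mtime

FileUtils.remove_entry(dir) # clean up the temporary directory
p first, second, third, forced
```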
| 115 | + |
| 116 | +### Unloading Files |
| 117 | + |
| 118 | +```ruby |
| 119 | +# Soft delete all chunks from a file |
| 120 | +count = htm.unload_file("docs/guide.md") |
| 121 | +puts "Removed #{count} chunks" |
| 122 | +``` |
| 123 | + |
| 124 | +## Chunking Configuration |
| 125 | + |
| 126 | +```ruby |
| 127 | +HTM.configure do |config| |
| 128 | + config.chunk_size = 1024 # Characters per chunk (default) |
| 129 | + config.chunk_overlap = 64 # Overlap between chunks (default) |
| 130 | +end |
| 131 | +``` |
| 132 | + |
| 133 | +Or via environment variables: |
| 134 | + |
| 135 | +```bash |
| 136 | +export HTM_CHUNK_SIZE=512 |
| 137 | +export HTM_CHUNK_OVERLAP=50 |
| 138 | +``` |
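
To show how the two settings interact, here is a simplified character-window sketch: each chunk is at most `chunk_size` characters long, and each new chunk starts `chunk_size - chunk_overlap` characters after the previous one. The `chunk_text` helper is hypothetical; HTM's real chunker may also take markdown structure into account (see the chunking strategy guide linked under See Also).

```ruby
# Hypothetical helper (not HTM's chunker): fixed-size character windows
# in which consecutive chunks share `chunk_overlap` characters.
def chunk_text(text, chunk_size:, chunk_overlap:)
  unless chunk_overlap < chunk_size
    raise ArgumentError, "chunk_overlap must be smaller than chunk_size"
  end

  step = chunk_size - chunk_overlap # distance between chunk starts
  chunks = []
  start = 0
  while start < text.length
    chunks << text[start, chunk_size]
    break if start + chunk_size >= text.length # last window reached the end

    start += step
  end
  chunks
end

alphabet = ("a".."z").to_a.join
chunks = chunk_text(alphabet, chunk_size: 10, chunk_overlap: 2)
chunks.each { |chunk| puts chunk }
# prints:
#   abcdefghij
#   ijklmnopqr
#   qrstuvwxyz
```

A larger overlap preserves more context across chunk boundaries at the cost of storing more duplicated text.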
| 139 | + |
| 140 | +## Expected Output |
| 141 | + |
| 142 | +``` |
| 143 | +HTM File Loader Example |
| 144 | +============================================================ |
| 145 | + |
| 146 | +1. Configuring HTM with Ollama provider... |
| 147 | + Configured with Ollama provider |
| 148 | + |
| 149 | +2. Initializing HTM... |
| 150 | + Robot: FileLoaderDemo (ID: 1) |
| 151 | + |
| 152 | +3. Creating sample markdown files... |
| 153 | + Created: /tmp/htm_demo/postgresql_guide.md |
| 154 | + Created: /tmp/htm_demo/ruby_intro.md |
| 155 | + |
| 156 | +4. Loading single file with frontmatter... |
| 157 | + File: postgresql_guide.md |
| 158 | + Source ID: 1 |
| 159 | + Chunks created: 3 |
| 160 | + Frontmatter title: PostgreSQL Guide |
| 161 | + Frontmatter author: HTM Team |
| 162 | + Frontmatter tags: database, postgresql |
| 163 | + |
| 164 | +5. Loading directory... |
| 165 | + Files processed: 2 |
| 166 | + - postgresql_guide.md: skipped |
| 167 | + - ruby_intro.md: 2 chunks |
| 168 | + |
| 169 | +... |
| 170 | + |
| 171 | +============================================================ |
| 172 | +Example completed successfully! |
| 173 | +``` |
| 174 | + |
| 175 | +## Rake Tasks |
| 176 | + |
| 177 | +```bash |
| 178 | +# Load a single file |
| 179 | +rake 'htm:files:load[docs/guide.md]' |
| 180 | + |
| 181 | +# Load directory |
| 182 | +rake 'htm:files:load_dir[docs/]' |
| 183 | +rake 'htm:files:load_dir[docs/,**/*.md]' |
| 184 | + |
| 185 | +# List loaded files |
| 186 | +rake htm:files:list |
| 187 | + |
| 188 | +# Show file details |
| 189 | +rake 'htm:files:info[docs/guide.md]' |
| 190 | + |
| 191 | +# Unload a file |
| 192 | +rake 'htm:files:unload[docs/guide.md]' |
| 193 | + |
| 194 | +# Sync all files |
| 195 | +rake htm:files:sync |
| 196 | + |
| 197 | +# Show statistics |
| 198 | +rake htm:files:stats |
| 199 | + |
| 200 | +# Force reload |
| 201 | +FORCE=true rake 'htm:files:load[docs/guide.md]' |
| 202 | +``` |
| 203 | + |
| 204 | +## See Also |
| 205 | + |
| 206 | +- [File Loading Guide](../guides/file-loading.md) |
| 207 | +- [Basic Usage Example](basic-usage.md) |
| 208 | +- [Markdown Chunking](../guides/file-loading.md#chunking-strategy) |