Add sanitizeInput() and improve normalizeStreet() for edge cases #24

Merged
NickHamby merged 2 commits into main from copilot/add-string-sanitization-function on Apr 6, 2026

Copilot AI commented Apr 6, 2026

Adds a sanitizeInput() function for cleaning raw user input before Nominatim queries, and fixes normalizeStreet() to correctly handle ZIP codes, positional directionals, and address fractions.

New: web/js/sanitize.js

Strips parentheticals, removes query-breaking characters, collapses whitespace — preserves commas (Nominatim separators), hyphens, and slashes:

sanitizeInput('123 Main St. (near park)!') // → '123 Main St'

web/js/hazards.js: normalizeStreet() rewrite

  • Splits ABBR_MAP into STREET_TYPE_ABBRS + DIRECTIONAL_EXPAND — old flat map expanded E/W/N/S mid-string (e.g. inside "New" or "West"), causing false mismatches
  • Strips trailing ZIP codes before expansion ("8 1/2 W Canal St, 23220" → "west canal street")
  • Expands directionals only at string prefix/suffix, not mid-token
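
The difference between expanding directionals everywhere and expanding them only by position can be sketched as follows. The PR does not show the old ABBR_MAP, so `FLAT_MAP` and `expandEverywhere()` are hypothetical stand-ins for a flat, unanchored expansion; `expandPositional()` uses the same two anchored patterns as the new normalizeStreet().

```javascript
// Hypothetical flat map for illustration only; the old ABBR_MAP is not shown in this PR.
const FLAT_MAP = { W: 'West', E: 'East', N: 'North', S: 'South' };

// Unanchored expansion: replaces every matching capital letter, wherever it occurs.
function expandEverywhere(str) {
  return str.replace(/[WENS]/g, (d) => FLAT_MAP[d]);
}

// Positional expansion: only a standalone letter at the very start or very end.
function expandPositional(str) {
  return str
    .replace(/^(W|E|N|S)\b\s*/, (_, d) => FLAT_MAP[d] + ' ')
    .replace(/\s+(W|E|N|S)$/, (_, d) => ' ' + FLAT_MAP[d]);
}

console.log(expandEverywhere('West Broad Street')); // 'Westest Broad Southtreet'
console.log(expandPositional('West Broad Street')); // 'West Broad Street'
console.log(expandPositional('W Broad Street'));    // 'West Broad Street'
console.log(expandPositional('Broad Street N'));    // 'Broad Street North'
```

The `\b` after the prefix group is what keeps "West" from matching: there is no word boundary between "W" and "e".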

web/js/app.js

  • attachAutocomplete() and run(): replaced .value.trim() with sanitizeInput(inputEl.value)

web/index.html

  • Added <script src="js/sanitize.js"></script> as the first script tag so sanitizeInput is available to all subsequent scripts
Original prompt

Implement the string sanitization plan as described in docs/string-sanitization-plan.md. This adds a new sanitizeInput() function and improves normalizeStreet() to handle ZIP codes, positional directionals, and other edge cases.

Changes required

1. Create web/js/sanitize.js (new file)

// sanitize.js — sanitizes raw user input before geocoding or internal matching

/**
 * Sanitizes a raw user-supplied address string for use as a Nominatim query.
 *
 * - Trims leading/trailing whitespace
 * - Collapses multiple spaces into one
 * - Strips parenthetical context (e.g. "(near the park)")
 * - Removes characters that break queries: # " ' . ; : ! ? @ ^ * [ ] { } | \ ~ ` = + < > % & _
 * - Preserves commas (Nominatim field separators), hyphens (address ranges), slashes (fractions)
 * - Does NOT expand abbreviations (leave that to normalizeStreet in hazards.js)
 * - Does NOT strip ZIP codes (Nominatim handles them; strip only inside normalizeStreet)
 * - Must never be called on coordinate (lat/lng) values
 *
 * @param {string} str  Raw input string
 * @returns {string}    Sanitized string safe for Nominatim queries
 */
function sanitizeInput(str) {
  if (typeof str !== 'string') return '';
  let s = str;
  // Remove parenthetical content
  s = s.replace(/\([^)]*\)/g, '');
  // Keep: letters, digits, spaces, commas, hyphens, forward slashes
  s = s.replace(/[^a-zA-Z0-9\s,\-\/]/g, '');
  // Collapse multiple spaces
  s = s.replace(/\s{2,}/g, ' ');
  return s.trim();
}
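
A few sample calls may help show what the function keeps and what it drops. The function body is repeated from above so the snippet runs on its own; the sample inputs are made up for illustration.

```javascript
// Same logic as sanitizeInput() above, repeated so this snippet is self-contained.
function sanitizeInput(str) {
  if (typeof str !== 'string') return '';
  let s = str;
  s = s.replace(/\([^)]*\)/g, '');              // drop parenthetical content
  s = s.replace(/[^a-zA-Z0-9\s,\-\/]/g, '');    // keep letters, digits, whitespace, , - /
  s = s.replace(/\s{2,}/g, ' ');                // collapse runs of whitespace
  return s.trim();
}

console.log(sanitizeInput('123 Main St. (near park)!'));      // '123 Main St'
console.log(sanitizeInput('  8 1/2   W Canal St, 23220 '));   // '8 1/2 W Canal St, 23220'
console.log(sanitizeInput("O'Brien Rd #4"));                  // 'OBrien Rd 4'
console.log(sanitizeInput('100-120 Broad St'));               // '100-120 Broad St'
console.log(sanitizeInput('(near park)'));                    // ''
console.log(sanitizeInput(42));                               // ''
```

Note that the character filter is ASCII-only, so accented letters are removed along with punctuation.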

2. Update web/js/hazards.js

Replace the existing ABBR_MAP constant and normalizeStreet() function with the following improved version that handles ZIP codes and positional directionals:

const STREET_TYPE_ABBRS = [
  [/\bSt\b/g, 'Street'],
  [/\bAve\b/g, 'Avenue'],
  [/\bBlvd\b/g, 'Boulevard'],
  [/\bDr\b/g, 'Drive'],
  [/\bRd\b/g, 'Road'],
  [/\bPkwy\b/g, 'Parkway'],
  [/\bLn\b/g, 'Lane'],
  [/\bCt\b/g, 'Court'],
];

const DIRECTIONAL_EXPAND = {
  W: 'West',
  E: 'East',
  N: 'North',
  S: 'South',
};

function normalizeStreet(str) {
  let s = str;

  // 1. Strip trailing ZIP code (5 digits, preceded by at least one comma or whitespace character)
  s = s.replace(/[,\s]+\d{5}\s*$/, '');

  // 2. Expand street type abbreviations
  for (const [pattern, replacement] of STREET_TYPE_ABBRS) {
    s = s.replace(pattern, replacement);
  }

  // 3. Strip leading house numbers (including fractional like "8 1/2").
  //    This runs before directional expansion so that a directional following
  //    the house number ("8 1/2 W Canal St") sits at the start of the string.
  s = s.replace(/^\d+(\s+\d+\/\d+)?\s+/, '');

  // 4. Expand directionals at start of string (prefix directional)
  s = s.replace(/^(W|E|N|S)\b\s*/, (_, d) => DIRECTIONAL_EXPAND[d] + ' ');

  // 5. Expand directionals at end of string (suffix directional)
  s = s.replace(/\s+(W|E|N|S)$/, (_, d) => ' ' + DIRECTIONAL_EXPAND[d]);

  // 6. Strip remaining punctuation
  s = s.replace(/[^a-zA-Z0-9\s]/g, '');

  // 7. Collapse whitespace and trim, lowercase
  s = s.replace(/\s{2,}/g, ' ');
  return s.trim().toLowerCase();
}
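
The ZIP-stripping pattern from step 1 has a few edge cases worth seeing in isolation. This is a small sketch; `stripZip()` is a hypothetical wrapper around the same regex, and the sample addresses are invented.

```javascript
// Same pattern as step 1 of normalizeStreet(): a separator, five digits, end of string.
const TRAILING_ZIP = /[,\s]+\d{5}\s*$/;

// Hypothetical helper, used here only to exercise the pattern on its own.
function stripZip(str) {
  return str.replace(TRAILING_ZIP, '');
}

console.log(stripZip('8 1/2 W Canal St, 23220'));   // '8 1/2 W Canal St'
console.log(stripZip('W Canal St 23220  '));        // 'W Canal St'
console.log(stripZip('12345 Main St'));             // '12345 Main St' (leading number is not at the end)
console.log(stripZip('23220'));                     // '23220' (no separator before the digits)
console.log(stripZip('W Canal St, 23220-1234'));    // 'W Canal St, 23220-1234' (ZIP+4 is not matched)
```

ZIP+4 codes pass through unchanged, so if such inputs are expected the pattern would need an optional `-\d{4}` group. Any street name that happens to end in a five-digit number would also lose that number.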

Do NOT change anything else in hazards.js: getAllHazards(), getHazardsOnRoute(), and the coordinate bounding box logic from the previous PR are all unchanged.

3. Update web/js/app.js

In attachAutocomplete(): Replace inputEl.value.trim() with sanitizeInput(inputEl.value):

// Before:
const query = inputEl.value.trim();

// After:
const query = sanitizeInput(inputEl.value);

In run(): Replace the two .value.trim() calls with sanitizeInput():

// Before:
const origin = document.getElementById('origin').value.trim();
const destination = document.getElementById('destination').value.trim();

// After:
const origin = sanitizeInput(document.getElementById('origin').value);
const destination = sanitizeInput(document.getElementById('destination').value);
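
One behavioral difference from .trim() is that sanitizeInput() can turn a non-empty input into an empty string, for example when the field holds only a parenthetical or only punctuation. A caller may want to guard against that before sending a request. The sketch below assumes a hypothetical helper named `toQuery()`; it is not part of this PR.

```javascript
// Same logic as sanitizeInput() in web/js/sanitize.js, repeated so this snippet runs alone.
function sanitizeInput(str) {
  if (typeof str !== 'string') return '';
  return str
    .replace(/\([^)]*\)/g, '')
    .replace(/[^a-zA-Z0-9\s,\-\/]/g, '')
    .replace(/\s{2,}/g, ' ')
    .trim();
}

// Hypothetical helper: returns the sanitized query, or null when nothing usable is left.
function toQuery(rawValue) {
  const query = sanitizeInput(rawValue);
  return query.length > 0 ? query : null;
}

console.log(toQuery('  900 E Broad St '));      // '900 E Broad St'
console.log(toQuery('  (behind the mall) '));   // null
console.log(toQuery('!!!'));                    // null
```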

4. Update web/index.html

Add <script src="js/sanitize.js"></script> as the first script tag, before geocode.js:

<!-- Before: -->
<script src="js/geocode.js"></script>

<!-- After: -->
<script src="js/sanitize.js"></script>
<script src="js/geocode.js"></script>

5. Update web/js/geocode.js

Add Richmond bounding box to the Nominatim search URL (this was identified as a bug in a previous session — geocodeAddress currently has no bounding box, so addresses can resolve outside Richmond):

// Before:
const url = `https://nominatim.openstreetmap.org/search?format=json&q=${encodeURIComponent(address)}`;

// After:
const url = `https://nominatim.openstreetmap.org/search?format=json&bounded=1&viewbox=-77.6,37.7,-77.2,37.4&countrycodes=us&q=${encodeURIComponent(address)}`;
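
As an alternative to a long template string, the same query could be assembled with the standard URL API, which keeps each parameter on its own line. This is only a sketch: `buildSearchUrl()` and `RICHMOND_VIEWBOX` are hypothetical names, and the parameter values are the ones given above.

```javascript
// Viewbox value copied from the prompt above; the constant name is hypothetical.
const RICHMOND_VIEWBOX = '-77.6,37.7,-77.2,37.4';

// Hypothetical helper that builds the bounded Nominatim search URL.
function buildSearchUrl(address) {
  const url = new URL('https://nominatim.openstreetmap.org/search');
  url.searchParams.set('format', 'json');
  url.searchParams.set('bounded', '1');
  url.searchParams.set('viewbox', RICHMOND_VIEWBOX);
  url.searchParams.set('countrycodes', 'us');
  url.searchParams.set('q', address); // searchParams handles the encoding
  return url.toString();
}

// Round-trip check: parsing the built URL yields the original values.
const parsed = new URL(buildSearchUrl('8 1/2 W Canal St'));
console.log(parsed.searchParams.get('q'));        // '8 1/2 W Canal St'
console.log(parsed.searchParams.get('viewbox'));  // '-77.6,37.7,-77.2,37.4'
```

The serialized string differs slightly from the encodeURIComponent version (spaces become `+` and commas become `%2C`), but both decode to the same parameter values.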

Also add countrycodes=us to the autocomplete URL in web/js/app.js (line with viewbox=-77.6,37.7,-77.2,37.4):

// Before:
const url = `https://nominatim.openstreetmap.org/search?format=json&addressdetails=0&limit=5&bounded=1&viewbox=-77.6,37.7,-77.2,37.4&q=${encodeURIComponent(query)}`;

// After:
const url = `https://nominatim.openstreetmap.org/search?format=json&addressdetails=0&limit=5&bound...

This pull request was created from Copilot chat.

Copilot AI changed the title from "[WIP] Add input sanitization and improve street normalization" to "Add sanitizeInput() and improve normalizeStreet() for edge cases" Apr 6, 2026
Copilot AI requested a review from NickHamby April 6, 2026 17:29
NickHamby marked this pull request as ready for review April 6, 2026 17:40
NickHamby merged commit c255904 into main Apr 6, 2026