Data cleaning tool #15

Open
sharananurag998 wants to merge 1 commit into main from vk/6a29-data-clean

@sharananurag998
Collaborator

dataCleaningTool.ts (src/tools/dataCleaningTool.ts:1-240):

  • Email canonicalization with Gmail-specific handling (dots, plus addresses)
  • Phone number normalization supporting international formats
  • Whitespace trimming and normalization
  • CSV data cleaning with deduplication and header normalization
  • Batch processing capabilities
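As a rough illustration of the whitespace and email rules listed above, here is a minimal sketch. The function names `normalizeWhitespace` and `canonicalizeEmail` are hypothetical and not taken from `dataCleaningTool.ts`. The sketch assumes that Gmail handling means dropping dots and any `+tag` suffix from the local part, and that lowercasing the whole address is acceptable.

```typescript
// Hypothetical helpers for illustration; not the code in dataCleaningTool.ts.

// Collapse every run of whitespace to a single space and trim both ends.
function normalizeWhitespace(input: string): string {
  return input.replace(/\s+/g, " ").trim();
}

// Canonicalize an email address: trim, lowercase, and (assumption) for Gmail
// domains remove dots and any "+tag" suffix from the local part.
function canonicalizeEmail(raw: string): string {
  const email = raw.trim().toLowerCase();
  const at = email.lastIndexOf("@");
  if (at <= 0 || at === email.length - 1) {
    throw new Error(`Invalid email: ${raw}`);
  }
  let local = email.slice(0, at);
  const domain = email.slice(at + 1);
  if (domain === "gmail.com" || domain === "googlemail.com") {
    const plus = local.indexOf("+");
    if (plus !== -1) {
      local = local.slice(0, plus); // drop the "+tag" suffix
    }
    local = local.replace(/\./g, ""); // Gmail ignores dots in the local part
  }
  return `${local}@${domain}`;
}
```

Non-Gmail addresses are only trimmed and lowercased here, since other providers may treat dots and plus tags as significant.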

Example output successfully demonstrates:

  • Phone normalization: " +91-98765 43210 " → "+919876543210"
  • Email canonicalization: Gmail addresses properly cleaned
  • CSV deduplication: Removes duplicate rows based on specified columns
  • Batch processing with error handling
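The phone and CSV behaviour described above could be sketched roughly as follows. `normalizePhone`, `normalizeHeader`, and `cleanRows` are hypothetical names, not the functions in this PR. The sketch assumes rows are already parsed into objects (CSV parsing itself is not shown), and that header normalization means lowercasing and replacing whitespace with underscores.

```typescript
// Hypothetical helpers for illustration; not the code in dataCleaningTool.ts.

// Keep only digits, preserving a single leading "+" if the input had one.
function normalizePhone(raw: string): string {
  const trimmed = raw.trim();
  const digits = trimmed.replace(/\D/g, "");
  if (digits.length === 0) {
    throw new Error(`No digits found in phone number: ${raw}`);
  }
  return trimmed.startsWith("+") ? `+${digits}` : digits;
}

type Row = Record<string, string>;

// Assumed header rule: trim, lowercase, whitespace runs become "_".
function normalizeHeader(header: string): string {
  return header.trim().toLowerCase().replace(/\s+/g, "_");
}

// Normalize headers, trim values, and keep the first row seen for each
// combination of values in keyColumns (given as normalized header names).
function cleanRows(rows: Row[], keyColumns: string[]): Row[] {
  const seen = new Set<string>();
  const result: Row[] = [];
  for (const row of rows) {
    const cleaned: Row = {};
    for (const [header, value] of Object.entries(row)) {
      cleaned[normalizeHeader(header)] = value.trim();
    }
    const key = JSON.stringify(keyColumns.map((column) => cleaned[column] ?? ""));
    if (!seen.has(key)) {
      seen.add(key);
      result.push(cleaned);
    }
  }
  return result;
}
```

With this sketch, `normalizePhone(" +91-98765 43210 ")` returns `"+919876543210"`, matching the example in the description. It does not validate country codes or number lengths.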

The tool can be used by LangChain agents and returns detailed metadata about the cleaning operations it performs.
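For reviewers who want a concrete picture of batch processing with error handling and metadata, here is a minimal sketch in plain TypeScript. The `BatchResult` shape and the `cleanBatch` name are assumptions made for illustration. The actual metadata fields returned by the tool, and how it is wrapped for LangChain, are not shown in this description.

```typescript
// Hypothetical result shape; the real tool's metadata fields may differ.
interface BatchResult<T> {
  results: T[];
  errors: { index: number; input: string; message: string }[];
  metadata: { total: number; succeeded: number; failed: number };
}

// Apply a cleaning function to each input, collecting failures instead of
// aborting the whole batch on the first bad value.
function cleanBatch<T>(inputs: string[], clean: (input: string) => T): BatchResult<T> {
  const results: T[] = [];
  const errors: BatchResult<T>["errors"] = [];
  inputs.forEach((input, index) => {
    try {
      results.push(clean(input));
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push({ index, input, message });
    }
  });
  return {
    results,
    errors,
    metadata: { total: inputs.length, succeeded: results.length, failed: errors.length },
  };
}

// Example: a stand-in cleaner that keeps digits and rejects inputs without any.
const report = cleanBatch([" 123 ", "abc", "4-5"], (value) => {
  const digits = value.replace(/\D/g, "");
  if (digits.length === 0) {
    throw new Error("no digits");
  }
  return digits;
});
```

Recording the index of each failed input lets a caller map errors back to the original rows without re-running the batch.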
