Overview
Sidikjari is a Python-based metadata extraction and analysis tool designed for cybersecurity professionals, penetration testers, digital forensic analysts, and security researchers. It systematically crawls websites to discover documents, extracts comprehensive metadata from various file types, and generates detailed interactive reports that highlight potential security risks, information leakage, and organizational relationships.
Sidikjari helps identify sensitive information unintentionally exposed in document metadata that could be leveraged during security assessments, investigations, or by malicious actors.
Key Features
- Advanced Website Crawling: Recursively crawls websites to discover documents with configurable depth and threading
- Form Discovery and Analysis: Identifies and captures sensitive web forms, including login, registration, and contact forms
- Intelligent Document Handling: Automatically downloads and analyzes discovered documents
- Comprehensive Metadata Extraction: Extracts detailed metadata from multiple document types:
  - PDF files (author information, creation/modification dates, software used)
  - Microsoft Office documents (DOCX, XLSX, PPTX)
  - Image files with EXIF data (JPG, JPEG, PNG, GIF)
  - CSV files and structured data
- SSL Certificate Analysis: Evaluates the security of the target's SSL implementation
- Website Screenshot Capture: Takes full-page screenshots of target websites
- Domain Intelligence Collection: Gathers WHOIS data, DNS records, and domain status information
- IP Address Analysis: Collects detailed information about discovered IP addresses, including geolocation, ASN data, and reverse DNS
- GPS Data Extraction and Mapping: Creates interactive maps for documents containing geolocation data
- Entity Relationship Analysis: Builds interactive visualization graphs showing connections between users, domains, emails, and IP addresses
- Multi-Threaded Performance: Runs crawling and analysis concurrently across a configurable number of threads
- Interactive HTML Reporting: Generates comprehensive reports with collapsible sections, interactive visualizations, and detailed findings
- Local File Analysis Mode: Can analyze directories of local files instead of crawling websites
- Customizable Crawling Parameters: Configurable user-agent types and request throttling
- Extensive Logging: Maintains detailed logs of all actions for audit purposes
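The document discovery step in the feature list can be illustrated with a small sketch. This is not Sidikjari's own code: the `LinkCollector` class, the `find_document_links` function, and the `DOCUMENT_EXTENSIONS` set are hypothetical names, and the extension set is simply derived from the file types listed above. It uses only the standard library rather than the libraries named under Requirements.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: extensions derived from the file types listed in this README;
# the set the tool actually checks may differ.
DOCUMENT_EXTENSIONS = {
    ".pdf", ".docx", ".xlsx", ".pptx",
    ".jpg", ".jpeg", ".png", ".gif", ".csv",
}


class LinkCollector(HTMLParser):
    """Collects the values of href and src attributes from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_document_links(html, base_url):
    """Return absolute URLs on the page that point to known document types."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for link in collector.links:
        absolute = urljoin(base_url, link)
        # Compare against the lowercased path so "TEAM.JPG" still matches.
        path = urlparse(absolute).path.lower()
        is_document = any(path.endswith(ext) for ext in DOCUMENT_EXTENSIONS)
        if is_document and absolute not in found:
            found.append(absolute)
    return found


page = (
    '<a href="/files/report.pdf">Report</a> '
    '<a href="about.html">About</a> '
    '<img src="img/team.JPG">'
)
print(find_document_links(page, "https://example.com/docs/"))
```

A recursive crawler would call a function like this on each fetched page and follow the remaining same-site links until the configured depth is reached.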
Advanced Data Extraction Capabilities
- Extracts user/author information across documents to identify organizational structure
- Discovers email addresses and internal domains from document content
- Identifies internal file paths that may reveal system architecture
- Detects software versions for potential vulnerability assessment
- Extracts device information from document metadata
- Maps relationships between entities to reveal organizational connections
- Collects temporal data to establish document timelines
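As a rough illustration of the extraction ideas above, the following sketch pulls user names, email addresses, and Windows-style file paths out of a metadata dictionary. The field names, the regular expressions, and the `extract_indicators` function are assumptions made for this example; they are not taken from Sidikjari's source, and real metadata will need broader patterns.

```python
import re

# Simplified patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches drive-letter paths (C:\...) and UNC paths (\\server\share\...).
WINDOWS_PATH_RE = re.compile(r'(?:[A-Za-z]:\\|\\\\)[^\s"<>|]+')

# Assumption: metadata keys that typically hold a person's name or login.
USER_FIELDS = {"author", "lastmodifiedby"}


def extract_indicators(metadata):
    """Collect users, emails, and internal paths from one metadata record."""
    users, emails, paths = set(), set(), set()
    for key, value in metadata.items():
        text = str(value)
        emails.update(EMAIL_RE.findall(text))
        paths.update(WINDOWS_PATH_RE.findall(text))
        if key.lower() in USER_FIELDS and text.strip():
            users.add(text.strip())
    return {
        "users": sorted(users),
        "emails": sorted(emails),
        "paths": sorted(paths),
    }


# Hypothetical metadata record, as an extraction step might produce it.
sample = {
    "Author": "jsmith",
    "Creator": "Microsoft Word 2016",
    "Template": r"C:\Users\jsmith\Templates\report.dotx",
    "Comment": "Contact jane.doe@corp.example for edits",
}
print(extract_indicators(sample))
```

Aggregating these per-document results across a whole crawl is what makes relationship mapping possible: the same user name or path prefix appearing in many files links those files together.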
Requirements
- Python 3.6+
- ExifTool (must be installed and available in PATH)
- wkhtmltoimage (for screenshot capabilities)
- Python libraries listed in requirements.txt (including requests, BeautifulSoup, magic, PyPDF2, PIL, docx, and openpyxl)
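Because ExifTool and wkhtmltoimage are external programs rather than Python packages, it can help to confirm they are on PATH before running a scan. The snippet below is a minimal sketch of such a check using the standard library; the `missing_tools` helper and the `REQUIRED_TOOLS` list are hypothetical and not part of Sidikjari.

```python
import shutil

# Executable names taken from the requirements above.
REQUIRED_TOOLS = ["exiftool", "wkhtmltoimage"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing external tools:", ", ".join(missing))
else:
    print("All external tools found.")
```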
Practical Applications
- Security assessments and penetration testing
- Digital forensic investigations
- OSINT (Open Source Intelligence) gathering
- Security vulnerability assessments
- Document metadata security auditing
- Organizational relationship mapping
Installation
- Clone the repository:
git clone https://github.com/sec0ps/sidikjari.git
cd sidikjari
- Install required Python dependencies:
pip install -r requirements.txt
- Install the external tools (on Debian/Ubuntu, the wkhtmltopdf package provides wkhtmltoimage):
sudo apt install wkhtmltopdf exiftool
Usage
- Scan a website:
python sidikjari.py --url example.com --output results --depth 2 --threads 10
- Analyze a local directory of files:
python sidikjari.py --local /path/to/documents --output results --threads 10
- --url, -u: Target URL to scan
- --output, -o: Output directory (default: "output")
- --depth, -d: Crawl depth (0=homepage only, 1=direct links, etc.)
- --threads, -t: Number of concurrent threads (default: 10)
- --local, -l: Local directory of files to analyze (instead of URL)
- --time-delay: Delay in seconds between web requests (default: 0.0)
- --user-agent: User agent type to use ("default", "chrome", "firefox", "safari", "edge", "mobile", or "random")
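For readers who want to wrap or script the tool, the options above could be declared with argparse roughly as follows. This is a sketch based only on the documented flags, not the tool's actual source. Three details are assumptions: that --url and --local are mutually exclusive, that --user-agent defaults to "default", and that --depth has no default here because none is documented.

```python
import argparse

USER_AGENT_TYPES = ["default", "chrome", "firefox", "safari", "edge", "mobile", "random"]


def build_parser():
    """Build a parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(
        prog="sidikjari.py",
        description="Metadata extraction and analysis (illustrative parser)",
    )
    # Assumption: exactly one of --url or --local must be supplied.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--url", "-u", help="Target URL to scan")
    source.add_argument("--local", "-l", help="Local directory of files to analyze")

    parser.add_argument("--output", "-o", default="output", help="Output directory")
    # No default is documented for depth, so none is set in this sketch.
    parser.add_argument("--depth", "-d", type=int, help="Crawl depth (0=homepage only)")
    parser.add_argument("--threads", "-t", type=int, default=10,
                        help="Number of concurrent threads")
    parser.add_argument("--time-delay", type=float, default=0.0,
                        help="Delay in seconds between web requests")
    # Assumption: "default" is used when the flag is omitted.
    parser.add_argument("--user-agent", choices=USER_AGENT_TYPES, default="default",
                        help="User agent type to use")
    return parser


args = build_parser().parse_args(["--url", "example.com", "--depth", "2"])
print(args.url, args.depth, args.threads, args.output, args.time_delay, args.user_agent)
```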
The HTML report generated by Sidikjari includes:
- Domain Information: Registration details, WHOIS data
- SSL Certificate Analysis: Validation, algorithms, expiration, security assessment
- Document Metadata: Comprehensive details extracted from each document
- Discovered Information: Users, email addresses, internal paths, software versions
- Interactive Maps: Visual representation of GPS coordinates found in documents
- Relationship Graphs: Interactive visualization of connections between discovered entities
- Website Screenshots: Visual capture of the target website
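Examples
- Scan a website with default settings:
python sidikjari.py --url targetcompany.com
- Crawl three levels deep with a 0.5-second delay between requests and a random user agent:
python sidikjari.py --url targetcompany.com --depth 3 --time-delay 0.5 --user-agent random
- Analyze a local directory of documents using 20 threads:
python sidikjari.py --local /path/to/client/documents --threads 20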
For professional services, integrations, or support contact: operations@redcellsecurity.org
Author: Keith Pachulski
Company: Red Cell Security, LLC
Email: keith@redcellsecurity.org
Website: www.redcellsecurity.org
© 2025 Keith Pachulski. All rights reserved.
This software is provided "as-is" under the MIT License. It should only be used for legitimate security assessments with proper authorization. The authors are not responsible for misuse or any damages resulting from the use of this software.
DISCLAIMER:
This software is provided "as-is," without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software.
This project is a modern Python implementation inspired by the original FOCA tool developed by ElevenPaths.