Sidikjari is a powerful Python-based metadata extraction and analysis tool designed for cybersecurity professionals, penetration testers, and security researchers. It crawls websites to discover documents, extracts comprehensive metadata from various file types, and generates detailed reports highlighting potential security risks.

Sidikjari: Advanced Metadata Extraction and Analysis Tool

Overview

Sidikjari is a Python-based metadata extraction and analysis tool designed for cybersecurity professionals, penetration testers, digital forensic analysts, and security researchers. It systematically crawls websites to discover documents, extracts comprehensive metadata from various file types, and generates detailed interactive reports that highlight potential security risks, information leakage, and organizational relationships.

Sidikjari helps identify sensitive information unintentionally exposed in document metadata that could be leveraged during security assessments, investigations, or by malicious actors.

Key Features

  • Advanced Website Crawling: Recursively crawls websites to discover documents with configurable depth and threading
  • Form Discovery and Analysis: Identifies and captures sensitive web forms, including login, registration, and contact forms
  • Intelligent Document Handling: Automatically downloads and analyzes discovered documents
  • Comprehensive Metadata Extraction: Extracts detailed metadata from multiple document types:
    • PDF files (author information, creation/modification dates, software used)
    • Microsoft Office documents (DOCX, XLSX, PPTX)
    • Image files with EXIF data (JPG, JPEG, PNG, GIF)
    • CSV files and structured data
  • SSL Certificate Analysis: Evaluates the security of the target's SSL/TLS implementation
  • Website Screenshot Capture: Takes full-page screenshots of target websites
  • Domain Intelligence Collection: Gathers WHOIS data, DNS records, and domain status information
  • IP Address Analysis: Collects detailed information about discovered IP addresses, including geolocation, ASN data, and reverse DNS
  • GPS Data Extraction and Mapping: Creates interactive maps for documents containing geolocation data
  • Entity Relationship Analysis: Builds interactive visualization graphs showing connections between users, domains, emails, and IP addresses
  • Multi-Threaded Performance: Runs operations concurrently across a configurable number of threads for faster analysis
  • Interactive HTML Reporting: Generates comprehensive reports with collapsible sections, interactive visualizations, and detailed findings
  • Local File Analysis Mode: Can analyze directories of local files instead of crawling websites
  • Customizable Crawling Parameters: Configurable user-agent types and request throttling
  • Extensive Logging: Maintains detailed logs of all actions for audit purposes
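
To make the metadata extraction feature more concrete, here is a minimal sketch of reading author and timestamp fields from an Office document using only the Python standard library. It is not Sidikjari's own code. It relies on the fact that DOCX, XLSX, and PPTX files are ZIP packages that normally carry a docProps/core.xml part. The names read_core_properties and build_sample_package, and the sample values "jdoe" and "asmith", are hypothetical and exist only for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output field name -> element path inside core.xml.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(source):
    """Return core properties from a DOCX/XLSX/PPTX path or file-like object."""
    with zipfile.ZipFile(source) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")
        except KeyError:
            return {}  # the package has no core-properties part
    root = ET.fromstring(xml_bytes)
    result = {}
    for name, path in FIELDS.items():
        element = root.find(path, NS)
        if element is not None and element.text:
            result[name] = element.text
    return result


def build_sample_package():
    """Build a tiny in-memory package with made-up metadata (illustration only)."""
    core_xml = (
        "<cp:coreProperties"
        ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
        ' xmlns:dcterms="http://purl.org/dc/terms/">'
        "<dc:creator>jdoe</dc:creator>"
        "<cp:lastModifiedBy>asmith</cp:lastModifiedBy>"
        "<dcterms:created>2024-01-15T09:30:00Z</dcterms:created>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_core_properties(build_sample_package()))
```

Passing a real file path instead of the in-memory sample works the same way. Fields that are absent from the document are simply omitted from the result.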

Advanced Data Extraction Capabilities

  • Extracts user/author information across documents to identify organizational structure
  • Discovers email addresses and internal domains from document content
  • Identifies internal file paths that may reveal system architecture
  • Detects software versions for potential vulnerability assessment
  • Extracts device information from document metadata
  • Maps relationships between entities to reveal organizational connections
  • Collects temporal data to establish document timelines
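
The kind of discovery listed above can be approximated with regular expressions. The sketch below is an illustration under assumptions, not the patterns Sidikjari actually uses: the expressions are deliberately simple, cover only email addresses, Windows drive paths, and UNC paths, and the function name find_indicators and the sample text are hypothetical.

```python
import re

# Deliberately simple patterns; real-world text may need stricter rules.
EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')
WINDOWS_PATH_RE = re.compile(r'[A-Za-z]:\\[^\s"<>|*?]+')
UNC_PATH_RE = re.compile(r'\\\\[A-Za-z0-9._-]+\\[^\s"<>|*?]+')


def find_indicators(text):
    """Return de-duplicated emails, their domains, and file paths found in text."""
    emails = sorted({match.lower() for match in EMAIL_RE.findall(text)})
    domains = sorted({email.split("@", 1)[1] for email in emails})
    paths = sorted(set(WINDOWS_PATH_RE.findall(text) + UNC_PATH_RE.findall(text)))
    return {"emails": emails, "domains": domains, "paths": paths}


if __name__ == "__main__":
    sample = (
        r"Saved by J.Doe@corp.example.com at C:\Users\jdoe\Documents\report.docx "
        r"and \\fileserver01\share\plans.xlsx"
    )
    for kind, values in find_indicators(sample).items():
        print(kind, values)
```

Lower-casing the email addresses before de-duplication keeps the same mailbox from being counted twice when documents differ only in capitalization.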

Requirements

  • Python 3.6+
  • ExifTool (must be installed and available in PATH)
  • wkhtmltoimage (for screenshot capabilities)
  • Python libraries, including requests, BeautifulSoup, magic, PyPDF2, PIL, docx, and openpyxl (requirements.txt lists the full set)

Practical Applications

  • Security assessments and penetration testing
  • Digital forensic investigations
  • OSINT (Open Source Intelligence) gathering
  • Security vulnerability assessments
  • Document metadata security auditing
  • Organizational relationship mapping

Installation

  1. Clone the repository:
git clone https://github.com/sec0ps/sidikjari.git
cd sidikjari
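  2. Install required Python dependencies:
pip install -r requirements.txt
  3. Install the system dependencies (Debian/Ubuntu; the wkhtmltopdf package also provides wkhtmltoimage):
sudo apt install wkhtmltopdf exiftool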

Usage

Analyze a Website

python sidikjari.py --url example.com --output results --depth 2 --threads 10

Analyze Local Files

python sidikjari.py --local /path/to/documents --output results --threads 10

Command Line Options

  • --url, -u: Target URL to scan
  • --output, -o: Output directory (default: "output")
  • --depth, -d: Crawl depth (0=homepage only, 1=direct links, etc.)
  • --threads, -t: Number of concurrent threads (default: 10)
  • --local, -l: Local directory of files to analyze (instead of URL)
  • --time-delay: Delay in seconds between web requests (default: 0.0)
  • --user-agent: User agent type to use ("default", "chrome", "firefox", "safari", "edge", "mobile", or "random")

Report Features

The HTML report generated by Sidikjari includes:

  • Domain Information: Registration details, WHOIS data
  • SSL Certificate Analysis: Validation, algorithms, expiration, security assessment
  • Document Metadata: Comprehensive details extracted from each document
  • Discovered Information: Users, email addresses, internal paths, software versions
  • Interactive Maps: Visual representation of GPS coordinates found in documents
  • Relationship Graphs: Interactive visualization of connections between discovered entities
  • Website Screenshots: Visual capture of the target website
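
Plotting GPS coordinates on a map requires decimal degrees, while EXIF stores latitude and longitude as degrees, minutes, and seconds plus a hemisphere reference (N, S, E, or W). The conversion can be sketched as follows. The function name dms_to_decimal and the sample coordinates are hypothetical, and the sketch assumes the values have already been read from the image.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds to signed decimal degrees."""
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    if ref.upper() in ("S", "W"):
        decimal = -decimal
    return decimal


if __name__ == "__main__":
    latitude = dms_to_decimal(40, 26, 46, "N")
    longitude = dms_to_decimal(79, 58, 56, "W")
    print(f"{latitude:.6f}, {longitude:.6f}")
```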

Examples

Basic Website Scan

python sidikjari.py --url targetcompany.com

Deep Website Scan with Delay

python sidikjari.py --url targetcompany.com --depth 3 --time-delay 0.5 --user-agent random

Local Document Analysis

python sidikjari.py --local /path/to/client/documents --threads 20
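
Scripting Several Scans

When several targets need the same settings, the commands above can be assembled from Python. The following is a minimal sketch that uses only the options documented in this README. The helper names build_command and run_scan are hypothetical, and the sketch assumes sidikjari.py is in the current working directory.

```python
import subprocess
import sys

# User agent types listed under Command Line Options.
USER_AGENTS = {"default", "chrome", "firefox", "safari", "edge", "mobile", "random"}


def build_command(url=None, local=None, output="output", depth=None,
                  threads=10, time_delay=None, user_agent=None):
    """Build the argument list for one Sidikjari run (URL mode or local mode)."""
    if (url is None) == (local is None):
        raise ValueError("pass exactly one of url or local")
    if user_agent is not None and user_agent not in USER_AGENTS:
        raise ValueError(f"unsupported user agent type: {user_agent}")
    cmd = [sys.executable, "sidikjari.py"]
    cmd += ["--url", url] if url is not None else ["--local", local]
    cmd += ["--output", output, "--threads", str(threads)]
    # Optional flags are only added when the caller sets them explicitly.
    if depth is not None:
        cmd += ["--depth", str(depth)]
    if time_delay is not None:
        cmd += ["--time-delay", str(time_delay)]
    if user_agent is not None:
        cmd += ["--user-agent", user_agent]
    return cmd


def run_scan(**options):
    """Run one scan and raise CalledProcessError if the tool exits non-zero."""
    return subprocess.run(build_command(**options), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call run_scan to execute.
    for target in ("targetcompany.com", "example.com"):
        print(" ".join(build_command(url=target, depth=2, time_delay=0.5)))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a local path contains spaces.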

Contact

For professional services, integrations, or support, contact: operations@redcellsecurity.org

License

Author: Keith Pachulski
Company: Red Cell Security, LLC
Email: keith@redcellsecurity.org
Website: www.redcellsecurity.org

© 2025 Keith Pachulski. All rights reserved.

This software is provided "as-is" under the MIT License. It should only be used for legitimate security assessments with proper authorization. The authors are not responsible for misuse or any damages resulting from the use of this software.

Support My Work

If you find my work useful and want to support continued development, you can donate here:

Donate

DISCLAIMER:
This software is provided "as-is," without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and non-infringement. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the software or the use or other dealings in the software.

Acknowledgements

This project is a modern Python implementation inspired by the original FOCA tool developed by ElevenPaths.