Side note: First of all, thank you for an awesome gem. Over the past few years I've reached for this gem numerous times for purposes big and small, and it's always a joy to use - thank you! 🙌
Simple Command Line Interface (CLI)
Rationale
I've often found myself wanting to do a "quick and dirty" crawl of different websites, for example to find 4XX, 5XX, etc. responses. So far I've written small Ruby scripts using spidr with the things I need.
Many of these use cases could be solved with a fairly simple CLI.
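To illustrate the kind of one-off script described above, here is a minimal sketch that crawls a site and prints every URL that returned a 4XX or 5XX status. The `error_status?` predicate and the `SPIDR_CRAWL_URL` environment variable are illustrative, not part of spidr; the crawl itself is guarded so the script only hits the network when explicitly asked to.

```ruby
# Report whether an HTTP status code is an error worth listing.
def error_status?(code)
  (400..599).cover?(code)
end

# Only crawl when a target URL is explicitly provided.
if ENV['SPIDR_CRAWL_URL']
  require 'spidr'

  Spidr.site(ENV['SPIDR_CRAWL_URL']) do |agent|
    agent.every_page do |page|
      puts "#{page.code} #{page.url}" if error_status?(page.code)
    end
  end
end
```

A simple CLI would essentially wrap this pattern, with the predicate and output columns driven by flags instead of being hard-coded.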
Usage: spidr [options] <url>
--columns=[val1,val2] Columns in output
--content-types=[val1,val2] Formats to output (html, javascript, css, json, ..)
--[no-]header Include the header
--open-timeout=val Optional open timeout
--read-timeout=val Optional read timeout
--ssl-timeout=val Optional ssl timeout
--continue-timeout=val Optional continue timeout
--keep-alive-timeout=val Optional keep_alive timeout
--proxy-host=val The host the proxy is running on
--proxy-port=val The port the proxy is running on
--proxy-user=val The user to authenticate as with the proxy
--proxy-password=val The password to authenticate with
--default-headers=[key1=val1,key2=val2]
Default headers to set for every request
--host-header=val The HTTP Host header to use with each request
--host-headers=[key1=val1,key2=val2]
The HTTP Host headers to use for specific hosts
--user-agent=val The User-Agent string to send with each request
--referer=val The Referer URL to send with each request
--queue=[val1,val2] The initial queue of URLs to visit
--history=[val1,val2] The initial list of visited URLs
--limit=val The maximum number of pages to visit
--max-depth=val The maximum link depth to follow
--[no-]robots Respect robots.txt
-h, --help Show this help message
--version Show version
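The flags above could be parsed with Ruby's stdlib OptionParser; the following sketch handles just a few of them to show the shape of the parser. The `parse_options` helper and the defaults are illustrative, and the option names simply mirror the proposal.

```ruby
require 'optparse'

# Parse a subset of the proposed spidr CLI flags from argv.
# Returns a Hash of options; the remaining argument is taken as the URL.
def parse_options(argv)
  options = {columns: %w[code url], robots: false}

  OptionParser.new do |opts|
    opts.banner = 'Usage: spidr [options] <url>'

    opts.on('--columns=LIST', Array, 'Columns in output') do |cols|
      options[:columns] = cols
    end
    opts.on('--limit=NUM', Integer, 'The maximum number of pages to visit') do |n|
      options[:limit] = n
    end
    opts.on('--[no-]robots', 'Respect robots.txt') do |bool|
      options[:robots] = bool
    end
  end.parse!(argv)

  options[:url] = argv.shift
  options
end
```

For example, `parse_options(%w[--limit=10 --robots https://example.com/])` would yield a limit of 10, robots enabled, and the URL as the positional argument.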
todo
Communicate that the --[no-]robots option requires gem install robots?
If you don't want to include this here, then this could be a separate gem, something like spidr_cli (with your blessing, unless you object?). However, it would probably be easier for others to find if it's here.
I've created a spidr_cli gem which includes the above-mentioned functionality, plus accept/reject hosts, ports, links, and urls arguments, and the ability to choose which method to use: Spidr::site|host|start_at.
Sorry for not noticing this sooner. If I were to add a CLI, it would need to be a class called Spidr::CLI. It would also need to catch Interrupt and Errno::EPIPE exceptions (see how command_kit handles this), and would need a --format or --output-format option to control plain text, CSV, or JSON output. It would also need specs that invoke the command and use RSpec's .to output(...).to_stdout.
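The exception handling described above could look roughly like the following. This is a sketch, not spidr's actual API: the `#run`/`#crawl` method names are hypothetical, and the exit statuses follow the common shell conventions (130 for SIGINT, a quiet 0 when the downstream pipe closes, e.g. `spidr ... | head`).

```ruby
module Spidr
  # Hypothetical CLI class; #run returns an exit status.
  class CLI
    def run(argv)
      crawl(argv)
      0
    rescue Interrupt
      130                 # conventional exit status for Ctrl-C (128 + SIGINT)
    rescue Errno::EPIPE
      0                   # reader closed the pipe; exit quietly
    end

    private

    # Placeholder for the actual crawling logic.
    def crawl(argv)
    end
  end
end
```

A real implementation would call `exit(Spidr::CLI.new.run(ARGV))` from the `spidr` executable, keeping the rescue logic in one testable place.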
Not to plug my own code too much, but you might want to consider using command_kit for your spidr-cli gem?
If you want to get this merged, check out the CLI class from wordlist.rb. Feel free to copy its zero-dependency boilerplate CLI code.
Examples
It supports all Spidr::Agent arguments. You can output multiple values (CSV-style); the columns argument maps to methods on page.
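The "columns map to methods on page" idea can be sketched with stdlib CSV: each requested column name is sent as a method to the page object, and the results become one CSV row. `FakePage` and `row_for` are illustrative stand-ins here, not spidr's real `Spidr::Page`.

```ruby
require 'csv'

# Stand-in for Spidr::Page with a few of its reader methods.
FakePage = Struct.new(:code, :url, :title)

# Build one CSV row by calling each column name as a method on the page.
def row_for(page, columns)
  CSV.generate_line(columns.map { |col| page.public_send(col) }).chomp
end
```

So `--columns=code,url` would print `200,https://example.com/` for a page, and adding any other `Spidr::Page` method name to the flag adds a column.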
Thanks!