downflux


Abstract Class: BaseProvider<TExec>

Defined in: packages/base/BaseProvider.ts:38

Base provider API for every supported site.

Remarks

Providers are the public entry points because callers should not need to know about parsers, transformers, pipelines, or transport details. A provider owns URL validation, provider metadata, fluent job configuration, and the typed methods that turn a site URL into an execution request.
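
The fluent configuration style described above can be sketched with a small stand-in class. `SketchProvider`, `SketchHttpOptions`, and `snapshot()` are hypothetical names used only for illustration. The setter names come from this page, but their bodies here are assumptions, not the library's implementation.

```typescript
// Hypothetical stand-in for a provider; not the real BaseProvider.
interface SketchHttpOptions {
  timeoutMs?: number;
  retries?: number;
  headers?: Record<string, string>;
}

class SketchProvider {
  protected httpOptions: SketchHttpOptions = {};

  constructor(protected readonly url: string) {}

  // Each setter returns `this`, so subclasses keep their own type when chaining.
  setTimeout(timeoutMs: number): this {
    this.httpOptions.timeoutMs = timeoutMs;
    return this;
  }

  setRetries(retries: number): this {
    this.httpOptions.retries = retries;
    return this;
  }

  setHeaders(headers: Record<string, string>): this {
    // Assumption: later calls merge into earlier headers rather than replace them.
    this.httpOptions.headers = { ...this.httpOptions.headers, ...headers };
    return this;
  }

  // Exposed only so the sketch can be inspected.
  snapshot(): SketchHttpOptions {
    return { ...this.httpOptions };
  }
}

const configured = new SketchProvider("https://example.com/gallery/1")
  .setTimeout(15000)
  .setRetries(3)
  .setHeaders({ Accept: "application/json" });
```

Returning `this` rather than the class name is what lets a subclass add its own methods and still chain them after the inherited setters.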

Extended by

Type Parameters

TExec

TExec extends ExecutionArgs<ExecutionShape>

Constructors

Constructor

new BaseProvider<TExec>(url, config): BaseProvider<TExec>

Defined in: packages/base/BaseProvider.ts:51

Parameters

url

string

config

ProviderConfig

Returns

BaseProvider<TExec>

Properties

executionOptions

protected executionOptions: ExecutionOptions = {}

Defined in: packages/base/BaseProvider.ts:39


httpOptions

protected httpOptions: HttpFetchOptions = {}

Defined in: packages/base/BaseProvider.ts:40


deps

protected readonly deps: CoordinatorDependencies

Defined in: packages/base/BaseProvider.ts:41


provider

protected readonly provider: Provider

Defined in: packages/base/BaseProvider.ts:42


urlPattern

protected readonly urlPattern: RegExp

Defined in: packages/base/BaseProvider.ts:43


providerMetadata

protected readonly providerMetadata: ProviderMetadata

Defined in: packages/base/BaseProvider.ts:44


url

protected readonly url: string

Defined in: packages/base/BaseProvider.ts:52


config

protected config: ProviderConfig

Defined in: packages/base/BaseProvider.ts:53

Accessors

metadata

Get Signature

get protected metadata(): ProviderMetadata

Defined in: packages/base/BaseProvider.ts:47

Provider capabilities, integration status, and access restrictions.

Returns

ProviderMetadata


ORIGIN

Get Signature

get protected ORIGIN(): string

Defined in: packages/base/BaseProvider.ts:81

Returns

string


HOST_NAME

Get Signature

get protected HOST_NAME(): string

Defined in: packages/base/BaseProvider.ts:85

Returns

string

Methods

isValidHostName()

protected isValidHostName(): boolean

Defined in: packages/base/BaseProvider.ts:89

Returns

boolean


setAuth()

setAuth(auth): this

Defined in: packages/base/BaseProvider.ts:110

Sets authentication credentials for the provider.

Parameters

auth

AuthenticatedCrawlOptions

Authentication options including cookie, bearer token, CSRF token, API key, client ID, and user agent

Returns

this

Remarks

Configures HTTP headers and user agent based on provided authentication credentials. Supports multiple authentication methods: cookies, bearer tokens, CSRF tokens, API keys, and client IDs.
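
As a rough illustration of how credentials might map to request headers, consider the sketch below. The `SketchAuth` field names and the header names are assumptions based on common conventions. The real `AuthenticatedCrawlOptions` type and the headers that `setAuth()` actually writes may differ.

```typescript
// Hypothetical credential shape; the real AuthenticatedCrawlOptions may differ.
interface SketchAuth {
  cookie?: string;
  bearerToken?: string;
  csrfToken?: string;
  apiKey?: string;
  clientId?: string;
  userAgent?: string;
}

// Builds a header map from whichever credentials are present.
// The header names below are common conventions, assumed for illustration.
function authToHeaders(auth: SketchAuth): Record<string, string> {
  const headers: Record<string, string> = {};
  if (auth.cookie) headers["Cookie"] = auth.cookie;
  if (auth.bearerToken) headers["Authorization"] = `Bearer ${auth.bearerToken}`;
  if (auth.csrfToken) headers["X-CSRF-Token"] = auth.csrfToken;
  if (auth.apiKey) headers["X-API-Key"] = auth.apiKey;
  if (auth.clientId) headers["Client-ID"] = auth.clientId;
  if (auth.userAgent) headers["User-Agent"] = auth.userAgent;
  return headers;
}

const authHeaders = authToHeaders({ bearerToken: "abc123", userAgent: "sketch/1.0" });
```

Only the credentials that are supplied produce headers, so several authentication methods can be combined in one call.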


setHeaders()

setHeaders(headers): this

Defined in: packages/base/BaseProvider.ts:129

Sets custom HTTP headers.

Parameters

headers

Record<string, string>

Request header map

Returns

this


setTimeout()

setTimeout(timeoutMs): this

Defined in: packages/base/BaseProvider.ts:138

Sets HTTP timeout.

Parameters

timeoutMs

number

Timeout in milliseconds

Returns

this


setRetries()

setRetries(retries): this

Defined in: packages/base/BaseProvider.ts:147

Sets fetch retry count.

Parameters

retries

number

Retry attempt count

Returns

this


setTransformOutput()

setTransformOutput(transform?): this

Defined in: packages/base/BaseProvider.ts:156

Transforms the output into the provider-specific result type.

Parameters

transform?

boolean = true

Default is true, which applies the default transformation. Set to false to return raw extracted data.

Returns

this


setHttpOptions()

setHttpOptions(opts): this

Defined in: packages/base/BaseProvider.ts:165

Sets HTTP fetch options.

Parameters

opts

HttpFetchOptions

HTTP options to merge

Returns

this


setNoDownload()

setNoDownload(noDownload?): this

Defined in: packages/base/BaseProvider.ts:175

Sets the no-download flag.

Parameters

noDownload?

boolean = false

Whether to skip the download phase

Returns

this

Default Value

false - set to true to skip the download phase and only perform extraction (useful for debugging or when you only need metadata)

setTranscodeOptions()

setTranscodeOptions(opts): this

Defined in: packages/base/BaseProvider.ts:188

Sets transcode options.

Parameters

opts

TranscodeOptions

On some operating systems, a downloaded video may not play as-is.

In that case, set transcodeOptions to re-encode the video with ffmpeg, which should resolve most compatibility issues. Make sure your system can handle the re-encoding.

Returns

this


setPreferredFormat()

setPreferredFormat(format): this

Defined in: packages/base/BaseProvider.ts:197

Sets preferred video format.

Parameters

format

VideoFormat

Video format (hls or mp4)

Returns

this


setPreferredCodec()

setPreferredCodec(codec): this

Defined in: packages/base/BaseProvider.ts:211

Sets preferred video codec.

Parameters

codec

VideoCodec

Video codec (h264 or av1)

This feature is still experimental and not yet implemented for all providers.

It allows you to specify a preferred video codec which can help with compatibility or performance in some cases. If the provider supports it, it will try to download the video in the specified codec. If not available, it will fall back to the default behavior.

Returns

this


setJobOptions()

setJobOptions(opts): this

Defined in: packages/base/BaseProvider.ts:220

Sets ExecutionCoordinator options.

Parameters

opts

ExecutionOptions

Job options to merge

Returns

this


setAgentOptions()

setAgentOptions(opts): this

Defined in: packages/base/BaseProvider.ts:229

Sets HTTP agent options.

Parameters

opts

HttpAgentOptions

HTTP agent options to merge

Returns

this


setMaxDownloads()

setMaxDownloads(maxDownloads): this

Defined in: packages/base/BaseProvider.ts:238

Sets maximum downloads.

Parameters

maxDownloads

number

Download limit

Returns

this


setAllowedExtensions()

setAllowedExtensions(...extensions): this

Defined in: packages/base/BaseProvider.ts:247

Sets allowed file extensions.

Parameters

extensions

...AllowedExtension[]

File extensions such as jpg or png

Returns

this


onProgress()

onProgress(handler): this

Defined in: packages/base/BaseProvider.ts:256

Sets progress handler.

Parameters

handler

(event) => void

Progress event callback

Returns

this


setProgressLogging()

setProgressLogging(enabled?): this

Defined in: packages/base/BaseProvider.ts:266

Enables or disables console progress logging.

Parameters

enabled?

boolean = true

Console logging flag

Returns

this

Default Value

true

setOutput()

setOutput(type, config?): this

Defined in: packages/base/BaseProvider.ts:277

Sets output type.

Parameters

type

OutputType

Job output mode

config?

DirectoryOutputOptions = {}

Directory output configuration

Returns

this

Default Value

OutputType.JSON

setExecutionType()

setExecutionType(type): this

Defined in: packages/base/BaseProvider.ts:298

Sets execution strategy.

Parameters

type

ExecutionType

Execution mode

Returns

this

Default Value

ExecutionType.SEQUENTIAL

This feature is still experimental and not yet implemented for all providers. It allows you to specify the execution strategy for the extraction and download process.

  • SEQUENTIAL: Extracts and downloads items one by one. This is the most compatible mode and should work with all providers, but can be slower for large batches.

  • PARALLEL: Extracts all items first, then downloads them in parallel. This can be faster for large batches, but may cause issues with providers that have strict rate limits or anti-bot measures. Use with caution and test thoroughly if you choose to use PARALLEL execution.
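
The difference between the two modes is mostly one of ordering, which the sketch below makes explicit. `SketchExecutionType` and `planSteps` are hypothetical names. The sketch only plans the order of steps and does not perform any extraction or download.

```typescript
// Hypothetical enum mirroring the two modes named above.
enum SketchExecutionType {
  SEQUENTIAL = "sequential",
  PARALLEL = "parallel",
}

// Returns the order of steps each mode implies for a batch of items.
// Steps inside one inner array may run concurrently.
function planSteps(items: string[], type: SketchExecutionType): string[][] {
  const steps: string[][] = [];
  if (type === SketchExecutionType.SEQUENTIAL) {
    // One item at a time: extract it, download it, then move on.
    for (const item of items) {
      steps.push([`extract:${item}`], [`download:${item}`]);
    }
    return steps;
  }
  // Assumption: extraction itself still runs item by item in PARALLEL mode.
  for (const item of items) {
    steps.push([`extract:${item}`]);
  }
  // All downloads start together once every item has been extracted.
  if (items.length > 0) {
    steps.push(items.map((item) => `download:${item}`));
  }
  return steps;
}

const sequentialPlan = planSteps(["a", "b"], SketchExecutionType.SEQUENTIAL);
const parallelPlan = planSteps(["a", "b"], SketchExecutionType.PARALLEL);
```

In the parallel plan every download lands in the same final step, which is why strict rate limits are more likely to be hit in that mode.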


buildRequest()

protected buildRequest(overrides?): TExec

Defined in: packages/base/BaseProvider.ts:309

Builds the execution request passed to the coordinator layer.

Parameters

overrides?

Partial<TExec>

Provider method options that should override defaults.

Returns

TExec

A typed request containing provider metadata and execution options.
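
One plausible way to apply overrides on top of defaults is an object spread, sketched below. `SketchRequest` and `buildSketchRequest` are hypothetical. The real `TExec` shape and the merge logic inside `buildRequest()` are not shown on this page.

```typescript
// Hypothetical request shape; the real TExec carries more fields.
interface SketchRequest {
  provider: string;
  entryUrl: string;
  noDownload: boolean;
  maxDownloads?: number;
}

// Defaults come first, so any key present in `overrides` wins.
function buildSketchRequest(
  defaults: SketchRequest,
  overrides: Partial<SketchRequest> = {},
): SketchRequest {
  return { ...defaults, ...overrides };
}

const mergedRequest = buildSketchRequest(
  { provider: "example", entryUrl: "https://example.com/a", noDownload: false },
  { noDownload: true, maxDownloads: 5 },
);
```

With a plain spread, a key that is explicitly set to `undefined` in the overrides still replaces the default, so callers of such a helper should omit keys they do not want to change.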


execute()

protected execute<TResult>(overrides): Promise<TResult>

Defined in: packages/base/BaseProvider.ts:330

Runs extraction and optional downloads through the shared coordinator.

Type Parameters

TResult

TResult

Parameters

overrides

TExec | { entryUrl?: string; } & object

Provider method request data, including execution shape.

Returns

Promise<TResult>

Extracted output in the shape requested by the provider method.


makeTargets()

protected makeTargets(sourceUrl, range, provider, method, addTrailingSlash?): object

Defined in: packages/base/BaseProvider.ts:359

Builds paginated target URLs for list-like provider methods.

Parameters

sourceUrl

string

Base URL before the page number.

range

Range

Page or start/end range to expand.

provider

Provider

Provider used for range validation errors.

method

string

Provider method used for range validation errors.

addTrailingSlash?

boolean = true

Whether generated target URLs should end with /.

Returns

object

Provider, method, and generated target URLs.

targets

targets: string[]

provider

provider: Provider

method

method: string
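
The range expansion can be sketched as follows. `SketchRange`, `SketchTargets`, and `makeSketchTargets` are hypothetical. The shape of `Range`, the validation rules, and the way the page number is joined to the URL are assumptions rather than the library's behavior.

```typescript
// Hypothetical range shape: a single page number or an inclusive start/end pair.
type SketchRange = number | { start: number; end: number };

interface SketchTargets {
  targets: string[];
  provider: string;
  method: string;
}

function makeSketchTargets(
  sourceUrl: string,
  range: SketchRange,
  provider: string,
  method: string,
  addTrailingSlash = true,
): SketchTargets {
  const { start, end } =
    typeof range === "number" ? { start: range, end: range } : range;
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 1 || end < start) {
    // Provider and method are included so the error identifies the caller.
    throw new RangeError(`${provider}.${method}: invalid page range ${start}-${end}`);
  }
  const suffix = addTrailingSlash ? "/" : "";
  const targets: string[] = [];
  for (let page = start; page <= end; page++) {
    // Assumption: the page number is appended directly to the base URL.
    targets.push(`${sourceUrl}${page}${suffix}`);
  }
  return { targets, provider, method };
}

const pages = makeSketchTargets(
  "https://example.com/list/page/",
  { start: 2, end: 4 },
  "example",
  "list",
);
```

Passing a single number yields exactly one target, and passing `false` as the last argument drops the trailing slash.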