downflux / WikimediaProvider
Defined in: packages/providers/wikimedia/WikimediaProvider.ts:5
Generic provider for sites that can use the default extraction pipeline while site-specific parsers are still being built.
new WikimediaProvider(url): WikimediaProvider
Defined in: packages/providers/wikimedia/WikimediaProvider.ts:6
string
WikimediaProvider
GenericContentProvider.constructor
protected executionOptions: ExecutionOptions = {}
Defined in: packages/base/BaseProvider.ts:39
GenericContentProvider.executionOptions
protected httpOptions: HttpFetchOptions = {}
Defined in: packages/base/BaseProvider.ts:40
protected readonly deps: CoordinatorDependencies
Defined in: packages/base/BaseProvider.ts:41
protected readonly provider: Provider
Defined in: packages/base/BaseProvider.ts:42
GenericContentProvider.provider
protected readonly urlPattern: RegExp
Defined in: packages/base/BaseProvider.ts:43
protected readonly providerMetadata: ProviderMetadata
Defined in: packages/base/BaseProvider.ts:44
DefaultProvider.providerMetadata
protected readonly url: string
Defined in: packages/base/BaseProvider.ts:52
protected config: ProviderConfig
Defined in: packages/base/BaseProvider.ts:53
protected get metadata(): ProviderMetadata
Defined in: packages/base/BaseProvider.ts:47
Provider capabilities, integration status, and access restrictions.
protected get ORIGIN(): string
Defined in: packages/base/BaseProvider.ts:81
string
protected get HOST_NAME(): string
Defined in: packages/base/BaseProvider.ts:85
string
protected isValidHostName(): boolean
Defined in: packages/base/BaseProvider.ts:89
boolean
GenericContentProvider.isValidHostName
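As a rough illustration of how ORIGIN, HOST_NAME, and isValidHostName could relate to url and urlPattern, here is a minimal sketch built on the standard URL class. The HostCheckSketch class and the example pattern are hypothetical; the real BaseProvider may derive or validate these values differently.

```typescript
// Hypothetical stand-in, not the library's BaseProvider.
class HostCheckSketch {
  private readonly url: string;
  private readonly urlPattern: RegExp;

  constructor(url: string, urlPattern: RegExp) {
    this.url = url;
    this.urlPattern = urlPattern;
  }

  // Scheme plus host (and port, if any).
  get ORIGIN(): string {
    return new URL(this.url).origin;
  }

  // Host name without scheme or port.
  get HOST_NAME(): string {
    return new URL(this.url).hostname;
  }

  // Assumption: a host name is valid when it matches urlPattern.
  isValidHostName(): boolean {
    return this.urlPattern.test(this.HOST_NAME);
  }
}

// Example pattern, assumed for illustration only.
const wikimediaHostPattern = /(^|\.)wikimedia\.org$/;
const validCheck = new HostCheckSketch(
  "https://commons.wikimedia.org/wiki/Main_Page",
  wikimediaHostPattern,
);
const invalidCheck = new HostCheckSketch("https://example.com/page", wikimediaHostPattern);
```

Here validCheck.isValidHostName() is true and invalidCheck.isValidHostName() is false, because only the first host name ends in wikimedia.org.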
setAuth(auth): this
Defined in: packages/base/BaseProvider.ts:110
Sets authentication credentials for the provider.
Authentication options including cookie, bearer token, CSRF token, API key, client ID, and user agent
this
Configures HTTP headers and user agent based on provided authentication credentials. Supports multiple authentication methods: cookies, bearer tokens, CSRF tokens, API keys, and client IDs.
GenericContentProvider.setAuth
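To make the credential-to-header mapping concrete, the following sketch shows one way such options might be turned into request headers. The AuthSketch field names and every header name are assumptions chosen from common conventions; the source does not state which headers setAuth actually writes.

```typescript
// Hypothetical option shape; field names are assumptions.
interface AuthSketch {
  cookie?: string;
  bearerToken?: string;
  csrfToken?: string;
  apiKey?: string;
  clientId?: string;
  userAgent?: string;
}

// Copies whichever credentials are present onto a header map.
// Header names follow common conventions and are assumed here.
function authToHeaders(auth: AuthSketch): Record<string, string> {
  const result: Record<string, string> = {};
  if (auth.cookie) result["Cookie"] = auth.cookie;
  if (auth.bearerToken) result["Authorization"] = `Bearer ${auth.bearerToken}`;
  if (auth.csrfToken) result["X-CSRF-Token"] = auth.csrfToken;
  if (auth.apiKey) result["X-API-Key"] = auth.apiKey;
  if (auth.clientId) result["Client-ID"] = auth.clientId;
  if (auth.userAgent) result["User-Agent"] = auth.userAgent;
  return result;
}

const exampleAuthHeaders = authToHeaders({
  bearerToken: "token-123",
  userAgent: "my-tool/1.0",
});
```

Only the credentials that are supplied produce headers, so exampleAuthHeaders contains just the Authorization and User-Agent entries.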
setHeaders(headers): this
Defined in: packages/base/BaseProvider.ts:129
Sets custom HTTP headers.
Record<string, string>
Request header map
this
GenericContentProvider.setHeaders
setTimeout(timeoutMs): this
Defined in: packages/base/BaseProvider.ts:138
Sets HTTP timeout.
number
Timeout in milliseconds
this
GenericContentProvider.setTimeout
setRetries(retries): this
Defined in: packages/base/BaseProvider.ts:147
Sets fetch retry count.
number
Retry attempt count
this
GenericContentProvider.setRetries
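Because setHeaders, setTimeout, and setRetries each return this, calls can be chained. The sketch below mirrors only that chaining style with a hypothetical FluentSketch class; the default values and the choice to merge headers rather than replace them are assumptions, not documented behavior.

```typescript
// Hypothetical stand-in that mirrors the chaining style only.
class FluentSketch {
  private headerMap: Record<string, string> = {};
  private timeoutMs = 30000; // assumed default
  private retries = 0; // assumed default

  // Assumption: new headers are merged over existing ones.
  setHeaders(headers: Record<string, string>): this {
    this.headerMap = { ...this.headerMap, ...headers };
    return this;
  }

  setTimeout(timeoutMs: number): this {
    this.timeoutMs = timeoutMs;
    return this;
  }

  setRetries(retries: number): this {
    this.retries = retries;
    return this;
  }

  // Exposes the collected settings so the result can be inspected.
  snapshot(): { headers: Record<string, string>; timeoutMs: number; retries: number } {
    return { headers: { ...this.headerMap }, timeoutMs: this.timeoutMs, retries: this.retries };
  }
}

const configuredSketch = new FluentSketch()
  .setHeaders({ Accept: "text/html" })
  .setTimeout(10000)
  .setRetries(3);
```

Returning this (rather than the class name) keeps the chain typed correctly in subclasses, which is presumably why the documented setters use it.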
setTransformOutput(transform?): this
Defined in: packages/base/BaseProvider.ts:156
Transform output to provider-specific result type.
boolean = true
Default is true, which applies the default transformation. Set to false to return raw extracted data.
this
GenericContentProvider.setTransformOutput
setHttpOptions(opts): this
Defined in: packages/base/BaseProvider.ts:165
Sets HTTP fetch options.
HTTP options to merge
this
GenericContentProvider.setHttpOptions
setNoDownload(noDownload?): this
Defined in: packages/base/BaseProvider.ts:175
Sets no download flag.
boolean = false
No download flag
this
Default: false. Set to true to skip the download phase and only perform extraction (useful for debugging or when you only need metadata).
GenericContentProvider.setNoDownload
setTranscodeOptions(opts): this
Defined in: packages/base/BaseProvider.ts:188
Sets transcode options.
Depending on the operating system, a downloaded video may not play.
In that case, set transcodeOptions to re-encode the video with ffmpeg, which should resolve most compatibility issues. Make sure your system can handle the re-encoding workload.
this
GenericContentProvider.setTranscodeOptions
setPreferredFormat(format): this
Defined in: packages/base/BaseProvider.ts:197
Sets preferred video format.
Video format (hls or mp4)
this
GenericContentProvider.setPreferredFormat
setPreferredCodec(codec): this
Defined in: packages/base/BaseProvider.ts:211
Sets preferred video codec.
Video codec (h264 or av1)
This feature is still experimental and not yet implemented for all providers.
It lets you specify a preferred video codec, which can help with compatibility or performance in some cases. If the provider supports the codec, it tries to download the video in that codec; otherwise it falls back to the default behavior.
this
GenericContentProvider.setPreferredCodec
setJobOptions(opts): this
Defined in: packages/base/BaseProvider.ts:220
Sets ExecutionCoordinator options.
Job options to merge
this
GenericContentProvider.setJobOptions
setAgentOptions(opts): this
Defined in: packages/base/BaseProvider.ts:229
Sets HTTP agent options.
HTTP agent options to merge
this
GenericContentProvider.setAgentOptions
setMaxDownloads(maxDownloads): this
Defined in: packages/base/BaseProvider.ts:238
Sets maximum downloads.
number
Download limit
this
GenericContentProvider.setMaxDownloads
setAllowedExtensions(...extensions): this
Defined in: packages/base/BaseProvider.ts:247
Sets allowed file extensions.
...AllowedExtension[]
File extensions such as jpg or png
this
GenericContentProvider.setAllowedExtensions
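As an illustration of what restricting downloads to allowed extensions can look like, here is a small sketch that filters URLs by the extension of their path. The filterByExtension helper is hypothetical; how the library matches extensions (case handling, query strings) is not stated in the source.

```typescript
// Hypothetical helper; the library's own filtering may differ.
function filterByExtension(urls: string[], allowed: string[]): string[] {
  // Normalize to lowercase and drop a leading dot, so ".JPG" and "jpg" match.
  const allowedSet = new Set(allowed.map((ext) => ext.toLowerCase().replace(/^\./, "")));
  return urls.filter((candidate) => {
    // Use the URL path so query strings do not affect the extension.
    const pathName = new URL(candidate).pathname;
    const dotIndex = pathName.lastIndexOf(".");
    if (dotIndex === -1) return false;
    return allowedSet.has(pathName.slice(dotIndex + 1).toLowerCase());
  });
}

const keptUrls = filterByExtension(
  [
    "https://example.org/a/photo.JPG?size=large",
    "https://example.org/a/clip.mp4",
    "https://example.org/a/diagram.png",
  ],
  ["jpg", "png"],
);
```

With the inputs above, keptUrls holds the photo.JPG and diagram.png URLs and drops clip.mp4.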
onProgress(handler): this
Defined in: packages/base/BaseProvider.ts:256
Sets progress handler.
(event) => void
Progress event callback
this
GenericContentProvider.onProgress
setProgressLogging(enabled?): this
Defined in: packages/base/BaseProvider.ts:266
Enables console progress logging.
boolean = true
Console logging flag
this
Default: true
GenericContentProvider.setProgressLogging
setOutput(type, config?): this
Defined in: packages/base/BaseProvider.ts:277
Sets output type.
Job output mode
Directory output configuration
this
Default: OutputType.JSON
GenericContentProvider.setOutput
setExecutionType(type): this
Defined in: packages/base/BaseProvider.ts:298
Sets execution strategy.
Execution mode
this
Default: ExecutionType.SEQUENTIAL
This feature is still experimental and not yet implemented for all providers.
It allows you to specify the execution strategy for the extraction and download process.
- SEQUENTIAL: Extracts and downloads items one by one. This is the most compatible mode and should work with all providers, but can be slower for large batches.
- PARALLEL: Extracts all items first, then downloads them in parallel. This can be faster for large batches, but may cause issues with providers that have strict rate limits or anti-bot measures. Use with caution and test thoroughly if you choose PARALLEL execution.
GenericContentProvider.setExecutionType
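The difference between the two strategies can be sketched with plain async functions. This is a simplified model under stated assumptions, not the coordinator's implementation: runSequential, runParallel, and the recorder helpers are hypothetical names, and real extraction and download steps would perform network I/O.

```typescript
type ExtractFn<T> = (target: string) => Promise<T>;
type DownloadFn<T> = (item: T) => Promise<void>;

// SEQUENTIAL: extract one item, download it, then move to the next.
async function runSequential<T>(
  targets: string[],
  extract: ExtractFn<T>,
  download: DownloadFn<T>,
): Promise<void> {
  for (const target of targets) {
    const item = await extract(target);
    await download(item);
  }
}

// PARALLEL: extract every item first, then start all downloads together.
async function runParallel<T>(
  targets: string[],
  extract: ExtractFn<T>,
  download: DownloadFn<T>,
): Promise<void> {
  const items: T[] = [];
  for (const target of targets) {
    items.push(await extract(target));
  }
  await Promise.all(items.map((item) => download(item)));
}

// Demo helpers that record the order in which steps start.
function makeRecorder(log: string[]) {
  return {
    extract: async (target: string): Promise<string> => {
      log.push(`extract:${target}`);
      return target;
    },
    download: async (item: string): Promise<void> => {
      log.push(`download:${item}`);
    },
  };
}

const sequentialLog: string[] = [];
const parallelLog: string[] = [];
const sequentialRecorder = makeRecorder(sequentialLog);
const parallelRecorder = makeRecorder(parallelLog);

const demoDone = runSequential(["a", "b"], sequentialRecorder.extract, sequentialRecorder.download)
  .then(() => runParallel(["a", "b"], parallelRecorder.extract, parallelRecorder.download));
// sequentialLog: extract:a, download:a, extract:b, download:b
// parallelLog:   extract:a, extract:b, download:a, download:b
```

In the parallel sketch all downloads are in flight at once, which is where rate limits and anti-bot measures tend to come into play.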
protected buildRequest(overrides?): WikimediaExecArgs
Defined in: packages/base/BaseProvider.ts:309
Builds the execution request passed to the coordinator layer.
Partial<WikimediaExecArgs>
Provider method options that should override defaults.
A typed request containing provider metadata and execution options.
GenericContentProvider.buildRequest
protected execute<TResult>(overrides): Promise<TResult>
Defined in: packages/base/BaseProvider.ts:330
Runs extraction and optional downloads through the shared coordinator.
TResult
{ entryUrl?: string; } | WikimediaExecArgs & object
Provider method request data, including execution shape.
Promise<TResult>
Extracted output in the shape requested by the provider method.
GenericContentProvider.execute
protected makeTargets(sourceUrl, range, provider, method, addTrailingSlash?): object
Defined in: packages/base/BaseProvider.ts:359
Builds paginated target URLs for list-like provider methods.
string
Base URL before the page number.
Page or start/end range to expand.
Provider used for range validation errors.
string
Provider method used for range validation errors.
boolean = true
Whether generated target URLs should end with /.
object
Provider, method, and generated target URLs.
targets: string[]
provider: Provider
method: string
GenericContentProvider.makeTargets
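The parameter descriptions suggest that each target is the base URL followed by a page number and an optional trailing slash. A minimal sketch of that expansion follows; the PageRange shape, the validation rules, and the use of a plain string for the provider are assumptions made for illustration.

```typescript
// Assumed range shape: a single page or an inclusive start/end pair.
type PageRange = number | { start: number; end: number };

// Hypothetical sketch; the provider is modeled as a plain string here.
function makeTargetsSketch(
  sourceUrl: string,
  range: PageRange,
  provider: string,
  method: string,
  addTrailingSlash: boolean = true,
): { targets: string[]; provider: string; method: string } {
  const { start, end } =
    typeof range === "number" ? { start: range, end: range } : range;

  // Assumed validation: positive integers with start <= end.
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 1 || end < start) {
    throw new RangeError(`${provider}.${method}: invalid page range ${start}-${end}`);
  }

  const targets: string[] = [];
  for (let page = start; page <= end; page++) {
    targets.push(`${sourceUrl}${page}${addTrailingSlash ? "/" : ""}`);
  }
  return { targets, provider, method };
}

const pagedTargets = makeTargetsSketch(
  "https://example.org/page/",
  { start: 2, end: 4 },
  "wikimedia",
  "getLinks",
);
const singleTarget = makeTargetsSketch("https://example.org/page/", 1, "wikimedia", "getLinks", false);
```

Here pagedTargets.targets lists pages 2 through 4, each ending in /, while singleTarget.targets contains only https://example.org/page/1.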
getMetadata(): Promise<DefaultExecutionResult<unknown>>
Defined in: packages/providers/shared/GenericContentProvider.ts:29
Promise<DefaultExecutionResult<unknown>>
GenericContentProvider.getMetadata
getLinks(): Promise<string[]>
Defined in: packages/providers/shared/GenericContentProvider.ts:39
Promise<string[]>
GenericContentProvider.getLinks
getImages(): Promise<string[]>
Defined in: packages/providers/shared/GenericContentProvider.ts:44
Promise<string[]>
GenericContentProvider.getImages
getVideos(): Promise<string[]>
Defined in: packages/providers/shared/GenericContentProvider.ts:49
Promise<string[]>
GenericContentProvider.getVideos
getAudio(): Promise<string[]>
Defined in: packages/providers/shared/GenericContentProvider.ts:54
Promise<string[]>
GenericContentProvider.getAudio
getAllUrls(): Promise<string[]>
Defined in: packages/providers/shared/GenericContentProvider.ts:59
Promise<string[]>
GenericContentProvider.getAllUrls
getDownloadableResources(): Promise<string[]>
Defined in: packages/providers/shared/GenericContentProvider.ts:72
Promise<string[]>