archivekit provides a mechanism for storing a (large) set of immutable documents and data files in an organized way. Transformed versions of each file can be stored alongside the original data in order to reflect a complete processing chain. Metadata is kept with the data as a YAML file.
This library is inspired by OFS, BagIt and Pairtree. It replaces a previous project, docstash.
The easiest way of using archivekit is via PyPI:

```bash
$ pip install archivekit
```

Alternatively, check out the repository from GitHub and install it locally:

```bash
$ git clone https://github.com/pudo/archivekit.git
$ cd archivekit
$ python setup.py develop
```

archivekit manages Packages, which contain one or several Resources and their associated metadata. Each Package is part of a Collection.
```python
from archivekit import open_collection, Source

# open a collection of packages
collection = open_collection('file', path='/tmp')
# or via S3:
collection = open_collection('s3', aws_key_id='..', aws_secret='..',
                             bucket_name='test.pudo.org')

# import a file from the local working directory:
collection.ingest('README.md')
# import an http resource:
collection.ingest('http://pudo.org/index.html')
# ingest will also accept file objects and httplib/urllib/requests responses

# iterate through each document and set a metadata value:
for package in collection:
    for source in package.all(Source):
        with source.fh() as fh:
            source.meta['body_length'] = len(fh.read())
    package.save()
```

The code for this library is very compact; go check it out.
If AWS credentials are not supplied for an S3-based collection, archivekit will fall back to the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. `AWS_BUCKET_NAME` is also supported.
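Assuming the variable names above, configuring an S3 collection through the environment might look like this (the credential values are placeholders, not real keys):

```shell
# placeholder credentials; archivekit picks these up when the collection is opened
export AWS_ACCESS_KEY_ID='AKIAEXAMPLEKEY'
export AWS_SECRET_ACCESS_KEY='example-secret'
export AWS_BUCKET_NAME='test.pudo.org'
```

With these set, the S3 collection can be opened without passing credentials explicitly in code.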
archivekit is open source, licensed under a standard MIT license (included in this repository as LICENSE).
