Merged
6 changes: 3 additions & 3 deletions .github/dependabot.yml
@@ -5,9 +5,9 @@ updates:
schedule:
interval: "weekly"
groups:
actions:
patterns:
- "*"
actions:
patterns:
- "*"
- package-ecosystem: "pip"
directory: "/"
schedule:
7 changes: 3 additions & 4 deletions .github/workflows/ci.yml
@@ -2,22 +2,21 @@ name: CI

on:
push:
branches: [ main ]
branches: [main]

pull_request:
branches: [ main ]
branches: [main]

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

jobs:

Test:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [ "3.10", "3.11", "3.12", "3.13" ]
python-version: ["3.10", "3.11", "3.12", "3.13"]
steps:
- uses: actions/checkout@v6
- name: Set up Python ${{ matrix.python-version }}
22 changes: 21 additions & 1 deletion .pre-commit-config.yaml
@@ -1,3 +1,7 @@
ci:
autoupdate_schedule: monthly
autoupdate_commit_msg: "chore: Update pre-commit hooks"
autofix_commit_msg: "style: Pre-commit fixes"
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v6.0.0
@@ -11,6 +15,22 @@ repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.14.7
hooks:
- id: ruff
- id: ruff-check
types_or: [python, pyi, jupyter]
args: [--fix, --show-fixes, --exit-non-zero-on-fix]
- id: ruff-format

- repo: https://github.com/pre-commit/pygrep-hooks
rev: v1.10.0
hooks:
- id: python-no-log-warn
- id: rst-backticks
- id: rst-directive-colons
- id: rst-inline-touching-normal
- id: text-unicode-replacement-char

- repo: https://github.com/rbubley/mirrors-prettier
rev: v3.7.3
hooks:
- id: prettier
args: ["--cache-location=.prettier_cache/cache"]
2 changes: 2 additions & 0 deletions .readthedocs.yaml
@@ -2,6 +2,8 @@
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

version: 2

build:
os: ubuntu-24.04
tools:
224 changes: 130 additions & 94 deletions CHANGES.md

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions CITATION.cff
@@ -2,14 +2,14 @@ cff-version: 1.2.0
type: software
title: bioframe
license: MIT
repository-code: 'https://github.com/open2c/bioframe'
repository-code: "https://github.com/open2c/bioframe"
message: >-
If you use this software, please cite it using the
metadata from this file.
authors:
- given-names: Nezar
family-names: Abdennur
orcid: 'https://orcid.org/0000-0001-5814-0864'
orcid: "https://orcid.org/0000-0001-5814-0864"
- given-names: Geoffrey
family-names: Fudenberg
orcid: "https://orcid.org/0000-0001-5905-6517"
@@ -57,7 +57,7 @@ preferred-citation:
- family-names: Open2C
- given-names: Nezar
family-names: Abdennur
orcid: 'https://orcid.org/0000-0001-5814-0864'
orcid: "https://orcid.org/0000-0001-5814-0864"
- given-names: Geoffrey
family-names: Fudenberg
orcid: "https://orcid.org/0000-0001-5905-6517"
12 changes: 4 additions & 8 deletions CONTRIBUTING.md
@@ -1,22 +1,19 @@
# Contributing


## General guidelines

If you haven't contributed to open-source before, we recommend you read [this excellent guide by GitHub on how to contribute to open source](https://opensource.guide/how-to-contribute). The guide is long, so you can gloss over things you're familiar with.

If you're not already familiar with it, we follow the [fork and pull model](https://help.github.com/articles/about-collaborative-development-models) on GitHub. Also, check out this recommended [git workflow](https://www.asmeurer.com/git-workflow/).


## Contributing Code

This project has a number of requirements for all code contributed.

* We follow the [PEP-8 style](https://www.python.org/dev/peps/pep-0008/) convention.
* We use [NumPy-style docstrings](https://numpydoc.readthedocs.io/en/latest/format.html).
* It's ideal if user-facing API changes or new features have documentation added.
* It is best if all new functionality and/or bug fixes have unit tests added with each use-case.

- We follow the [PEP-8 style](https://www.python.org/dev/peps/pep-0008/) convention.
- We use [NumPy-style docstrings](https://numpydoc.readthedocs.io/en/latest/format.html).
- It's ideal if user-facing API changes or new features have documentation added.
- It is best if all new functionality and/or bug fixes have unit tests added with each use-case.

## Setting up Your Development Environment

@@ -96,7 +93,6 @@ This will build the documentation and serve it on a local http server which list

Documentation from the `main` branch and tagged releases is automatically built and hosted on [readthedocs](https://readthedocs.org/).


## Acknowledgments

This document is based off of the [guidelines from the sparse project](https://github.com/pydata/sparse/blob/master/docs/contributing.rst).
15 changes: 8 additions & 7 deletions README.md
@@ -14,9 +14,9 @@ Bioframe enables flexible and scalable operations on genomic interval dataframes

Bioframe is built directly on top of [Pandas](https://pandas.pydata.org/). Bioframe provides:

* A variety of genomic interval operations that work directly on dataframes.
* Operations for special classes of genomic intervals, including chromosome arms and fixed-size bins.
* Conveniences for diverse tabular genomic data formats and loading genome assembly summary information.
- A variety of genomic interval operations that work directly on dataframes.
- Operations for special classes of genomic intervals, including chromosome arms and fixed-size bins.
- Conveniences for diverse tabular genomic data formats and loading genome assembly summary information.

Read the [documentation](https://bioframe.readthedocs.io/en/latest/), including the [guide](https://bioframe.readthedocs.io/en/latest/guide-intervalops.html), as well as the [publication](https://doi.org/10.1093/bioinformatics/btae088) for more information.

@@ -34,10 +34,10 @@ pip install bioframe

Interested in contributing to bioframe? That's great! To get started, check out the [contributing guide](https://github.com/open2c/bioframe/blob/main/CONTRIBUTING.md). Discussions about the project roadmap take place on the [Open2C Discord](https://discord.com/invite/qVfSbDYHNG) server and regular developer meetings scheduled there. Anyone can join and participate!


## Interval operations

Key genomic interval operations in bioframe include:

- `overlap`: Find pairs of overlapping genomic intervals between two dataframes.
- `closest`: For every interval in a dataframe, find the closest intervals in a second dataframe.
- `cluster`: Group overlapping intervals in a dataframe into clusters.
@@ -46,6 +46,7 @@ Key genomic interval operations in bioframe include:
Bioframe additionally has functions that are frequently used for genomic interval operations and can be expressed as combinations of these core operations and dataframe operations, including: `coverage`, `expand`, `merge`, `select`, and `subtract`.
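
As a rough illustration of what `overlap` and `merge` compute, here is a plain-Python sketch. It is not bioframe's implementation and does not use its API: the `Interval` tuple, the function names `overlap_pairs` and `merge_intervals`, and the half-open `[start, end)` convention are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical representation: (chrom, start, end), assumed half-open [start, end).
Interval = Tuple[str, int, int]


def overlap_pairs(a: List[Interval], b: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (x, y) pair, x from a and y from b, that shares at least one base."""
    pairs = []
    for x in a:
        for y in b:
            same_chrom = x[0] == y[0]
            # Half-open intervals overlap when each one starts before the other ends.
            if same_chrom and x[1] < y[2] and y[1] < x[2]:
                pairs.append((x, y))
    return pairs


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse strictly overlapping intervals on each chromosome into single spans."""
    merged: List[Interval] = []
    # Sorting by (chrom, start, end) lets one left-to-right pass do the merging.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


a = [("chr1", 0, 10), ("chr1", 20, 30)]
b = [("chr1", 5, 25), ("chr2", 0, 10)]
pairs = overlap_pairs(a, b)
merged = merge_intervals(a + b)
```

The nested loop is quadratic and only meant to make the overlap condition explicit; a dataframe-based library would typically sort or index the intervals instead.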

To `overlap` two dataframes, call:

```python
import bioframe as bf

@@ -62,8 +63,8 @@ For these two input dataframes, with intervals all on the same chromosome:
<img src="https://github.com/open2c/bioframe/raw/main/docs/figs/overlap_inner_0.png" width=60%>
<img src="https://github.com/open2c/bioframe/raw/main/docs/figs/overlap_inner_1.png" width=60%>


To `merge` all overlapping intervals in a dataframe, call:

```python
import bioframe as bf

@@ -90,12 +91,12 @@ ctcf_motif_calls = bioframe.read_table(jaspar_url, schema='jaspar', skiprows=1)
```

## Tutorials
See this [jupyter notebook](https://github.com/open2c/bioframe/tree/master/docs/tutorials/tutorial_assign_motifs_to_peaks.ipynb) for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.

See this [jupyter notebook](https://github.com/open2c/bioframe/tree/master/docs/tutorials/tutorial_assign_motifs_to_peaks.ipynb) for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.

## Citing

If you use ***bioframe*** in your work, please cite:
If you use **_bioframe_** in your work, please cite:

```bibtex
@article{bioframe_2024,
15 changes: 7 additions & 8 deletions docs/api-resources.rst
@@ -8,7 +8,7 @@ Bioframe provides a collection of genome assembly metadata for commonly used
genomes. These are accessible through a convenient dataclass interface via :func:`bioframe.assembly_info`.

The assemblies are listed in a manifest YAML file, and each assembly
has a mandatory companion file called `seqinfo` that contains the sequence
has a mandatory companion file called _seqinfo_ that contains the sequence
names, lengths, and other information. The records in the manifest file contain
the following fields:

@@ -22,7 +22,7 @@ the following fields:
- ``default_units``: default assembly units to include from the seqinfo file
- ``url``: URL to where the corresponding sequence files can be downloaded

The `seqinfo` file is a TSV file with the following columns (with header):
The _seqinfo_ file is a TSV file with the following columns (with header):

- ``name``: canonical sequence name
- ``length``: sequence length
@@ -31,21 +31,20 @@ The `seqinfo` file is a TSV file with the following columns (with header):
- ``unit``: assembly unit of the chromosome (e.g., "primary", "non-nuclear", "decoy")
- ``aliases``: comma-separated list of aliases for the sequence name

We currently do not include sequences with "alt" or "patch" roles in `seqinfo` files, but we
We currently do not include sequences with "alt" or "patch" roles in _seqinfo_ files, but we
do support the inclusion of additional decoy sequences (as used by so-called NGS *analysis
sets* for human genome assemblies) by marking them as members of a "decoy" assembly unit.

The `cytoband` file is an optional TSV file with the following columns (with header):

The _cytoband_ file is an optional TSV file with the following columns (with header):
- ``chrom``: chromosome name
- ``start``: start position
- ``end``: end position
- ``band``: cytogenetic coordinate (name of the band)
- ``stain``: Giemsa stain result

The order of the sequences in the `seqinfo` file is treated as canonical.
The ordering of the chromosomes in the `cytobands` file should match the order
of the chromosomes in the `seqinfo` file.
The order of the sequences in the _seqinfo_ file is treated as canonical.
The ordering of the chromosomes in the _cytobands_ file should match the order
of the chromosomes in the _seqinfo_ file.

The manifest and companion files are stored in the ``bioframe/io/data`` directory.
New assemblies can be requested by opening an issue on GitHub or by submitting a pull request.
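
The file layout described in `docs/api-resources.rst` above can be sketched with the standard library alone. This is an illustrative sketch, not bioframe's loader: the TSV contents are invented, only the `name`, `length`, `unit`, and `aliases` columns visible in this diff are used (the real seqinfo files have more columns), and `read_tsv` and `cytobands_follow_seqinfo_order` are hypothetical helper names.

```python
import csv
import io

# Invented seqinfo content; real files contain additional columns and rows.
SEQINFO_TSV = (
    "name\tlength\tunit\taliases\n"
    "chr1\t1000\tprimary\t1,CM000001.1\n"
    "chr2\t800\tprimary\t2\n"
    "chrM\t16\tnon-nuclear\tMT,M\n"
)

# Invented cytoband content using the five documented columns.
CYTOBAND_TSV = (
    "chrom\tstart\tend\tband\tstain\n"
    "chr1\t0\t400\tp11\tgneg\n"
    "chr1\t400\t1000\tq11\tgpos50\n"
    "chr2\t0\t800\tp11\tgneg\n"
)


def read_tsv(text):
    """Parse a TSV with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def cytobands_follow_seqinfo_order(cytobands, order):
    """Check that chromosomes appear in the cytoband rows in seqinfo order."""
    rank = {name: i for i, name in enumerate(order)}
    seen = []
    for row in cytobands:
        # Collapse consecutive rows of the same chromosome into one entry.
        if not seen or seen[-1] != row["chrom"]:
            seen.append(row["chrom"])
    if any(chrom not in rank for chrom in seen):
        return False
    ranks = [rank[chrom] for chrom in seen]
    return ranks == sorted(ranks)


seqinfo = read_tsv(SEQINFO_TSV)
cytobands = read_tsv(CYTOBAND_TSV)

# The row order of the seqinfo file is treated as the canonical sequence order.
canonical_order = [row["name"] for row in seqinfo]
chromsizes = {row["name"]: int(row["length"]) for row in seqinfo}
aliases = {row["name"]: row["aliases"].split(",") for row in seqinfo}
order_ok = cytobands_follow_seqinfo_order(cytobands, canonical_order)
```

A check like `cytobands_follow_seqinfo_order` could be useful when preparing files for a new assembly request, since the cytoband ordering is expected to match the seqinfo ordering.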
3 changes: 0 additions & 3 deletions docs/guide-bedtools.md
@@ -14,7 +14,6 @@ kernelspec:

# Bioframe for bedtools users


Bioframe is built around the analysis of genomic intervals as a pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) in memory, rather than working with tab-delimited text files saved on disk.

Bioframe supports reading a number of standard genomics text file formats via [`read_table`](https://bioframe.readthedocs.io/en/latest/api-fileops.html#bioframe.io.fileops.read_table), including BED files (see [schemas](https://github.com/open2c/bioframe/blob/main/bioframe/io/schemas.py)), which will load them as pandas DataFrames. A complete list of helper functions is [available here](API_fileops).
@@ -25,7 +24,6 @@ For example, with gtf files, you do not need to turn them into bed files, you ca

Finally, if needed, bioframe provides a convenience function to write dataframes to a standard BED file using [`to_bed`](https://bioframe.readthedocs.io/en/latest/api-fileops.html#bioframe.io.bed.to_bed).


## `bedtools intersect`

### Select unique entries from the first bed overlapping the second bed `-u`
@@ -107,7 +105,6 @@ out = bf.overlap(A, B, how='inner', suffixes=('_', ''))[B.columns]

> **Note:** This gives one row per overlap and can contain duplicates. The output dataframe of the former method will use the same pandas index as the input dataframe `B`, while the latter result --- the join output --- will have an integer range index, like a pandas merge.


### Intersect multiple beds against A

```sh