Commit b2d3ad7

Merge pull request #40 from theochem/update_license
Clean up and fix PyPI packaging configurations
2 parents f0aa915 + dfb35ac commit b2d3ad7

4 files changed

Lines changed: 47 additions & 140 deletions

MANIFEST.in

Lines changed: 1 addition & 14 deletions
````diff
@@ -1,4 +1,4 @@
-# Explicitly include only what we want
+# Only include essential files
 include README.md
 include LICENSE
 include CITATION.cff
@@ -10,16 +10,3 @@ recursive-include B3DB *.tsv *.tsv.gz *.csv
 
 # Exclude everything else
 global-exclude *
-exclude *.pyc
-exclude __pycache__
-exclude *.egg-info
-exclude .git*
-exclude .github
-exclude build
-exclude dist
-exclude .tox
-exclude .pytest_cache
-exclude .coverage
-exclude htmlcov
-exclude .mypy_cache
-exclude .ruff_cache
````
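`MANIFEST.in` only shapes the source distribution, so one way to confirm that the trimmed template still ships the essential root files (and nothing the removed `exclude` lines used to guard against) is to inspect a built sdist. The following is a minimal sketch that uses only the standard library's `tarfile`. It assumes the archive was produced by something like `python -m build --sdist` and follows the usual `<name>-<version>/` top-level layout. `EXPECTED` is copied from the `include` lines above, and `UNWANTED` is a hand-picked, illustrative subset of the old exclude patterns.

```python
import tarfile
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Root-level files named by the `include` lines in MANIFEST.in above.
EXPECTED = {"README.md", "LICENSE", "CITATION.cff"}
# Illustrative subset of what the removed `exclude` lines guarded against.
UNWANTED = ("*.pyc", "*__pycache__*", ".tox/*", ".pytest_cache/*")


def sdist_members(sdist_path):
    """Return member paths with the leading '<name>-<version>/' directory stripped."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        names = archive.getnames()
    members = set()
    for name in names:
        parts = PurePosixPath(name).parts
        if len(parts) > 1:  # skip the top-level directory entry itself
            members.add(str(PurePosixPath(*parts[1:])))
    return members


def missing_essentials(sdist_path, expected=EXPECTED):
    """Expected root-level files that are absent from the sdist."""
    return sorted(expected - sdist_members(sdist_path))


def unwanted_members(sdist_path, patterns=UNWANTED):
    """Members matching any of the unwanted glob patterns."""
    return sorted(
        member
        for member in sdist_members(sdist_path)
        if any(fnmatch(member, pattern) for pattern in patterns)
    )
```

For example, if `missing_essentials("dist/<name>-<version>.tar.gz")` returns an empty list, `README.md`, `LICENSE` and `CITATION.cff` all made it into the archive. The path is a placeholder for whatever file the build actually writes.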

README.md

Lines changed: 0 additions & 15 deletions
````diff
@@ -139,18 +139,3 @@ pip install -r requirements.txt
 
 The materials and data under this repo are distributed under the
 [CC0 Licence](http://creativecommons.org/publicdomain/zero/1.0/).
-
-## Update: New External Dataset Available
-
-We’ve expanded the B3DB dataset by adding a new file: `B3DB_classification_external.tsv`. This file introduces additional compounds (171 BBB+ and 4 BBB-) that were not present in the original B3DB dataset. These compounds were carefully selected and incorporated to further enrich B3DB.
-
-### Usage
-
-To load and work with the new classification data in Python, you can use the following code snippet:
-
-```python
-import pandas as pd
-
-# Load the new external classification dataset
-external_classification_data = pd.read_csv("B3DB/B3DB_classification_external.tsv", sep="\t")
-```
````
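The removed "Update" section says that `B3DB_classification_external.tsv` adds 171 BBB+ and 4 BBB- compounds. If you rely on those figures, you can check them against the file itself. Below is a minimal sketch using the standard-library `csv` module. The label column name `BBB+/BBB-` is an assumption about the file's header and is exposed as a parameter so it can be overridden.

```python
import csv
from collections import Counter


def count_labels(tsv_path, label_column="BBB+/BBB-"):
    """Count rows per permeability label in a tab-separated B3DB-style file.

    label_column is an assumed header name; pass the real one if it differs.
    """
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if label_column not in (reader.fieldnames or []):
            raise KeyError(f"column {label_column!r} not found in {tsv_path}")
        return Counter(row[label_column] for row in reader)
```

Calling `count_labels("B3DB/B3DB_classification_external.tsv")` on a checkout that still contains the file gives a `Counter` that can be compared with the 171/4 split stated above. No particular result is assumed here.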

data_curation/README.md

Lines changed: 1 addition & 98 deletions
````diff
@@ -1,83 +1,4 @@
-# About *B3DB*
-
-In this repo, we present a large benchmark dataset, [Blood-Brain Barrier Database (B3DB)](https://www.nature.com/articles/s41597-021-01069-5), compiled
-from 50 published resources (as summarized at
-[raw_data/raw_data_summary.tsv](raw_data/raw_data_summary.tsv)) and categorized based on
-the consistency between different experimental references/measurements. This dataset was [published in Scientific Data](https://www.nature.com/articles/s41597-021-01069-5) and this repository is occasionally uploaded with new experimental data. Scientists who would like to contribute data should contact the database's maintainers (e.g., by creating a new Issue in this database).
-
-A subset of the
-molecules in B3DB has numerical `logBB` values (1058 compounds), while the whole dataset
-has categorical (BBB+ or BBB-) BBB permeability labels (7807 compounds). Some physicochemical properties
-of the molecules are also provided.
-
-## Citation
-
-Please use the following citation in any publication using our *B3DB* dataset:
-
-```md
-@article{Meng_A_curated_diverse_2021,
-author = {Meng, Fanwang and Xi, Yang and Huang, Jinfeng and Ayers, Paul W.},
-doi = {10.1038/s41597-021-01069-5},
-journal = {Scientific Data},
-number = {289},
-title = {A curated diverse molecular database of blood-brain barrier permeability with chemical descriptors},
-volume = {8},
-year = {2021},
-url = {https://www.nature.com/articles/s41597-021-01069-5},
-publisher = {Springer Nature}
-}
-```
-
-## Features of *B3DB*
-
-1. The largest dataset with numerical and categorical values for Blood-Brain Barrier small molecules
-(to the best of our knowledge, as of February 25, 2021).
-
-2. Inclusion of stereochemistry information with isomeric SMILES with chiral specifications if
-available. Otherwise, canonical SMILES are used.
-
-3. Characterization of uncertainty of experimental measurements by grouping the collected molecular
-data records.
-
-4. Extended datasets for numerical and categorical data with precomputed physicochemical properties
-using [mordred](https://github.com/mordred-descriptor/mordred).
-
-## Usage
-
-There are two types of dataset in [B3DB](B3DB), [regression data](B3DB/B3DB_regression.tsv)
-and [classification data](B3DB/B3DB_classification.tsv) and they can be loaded simply using *pandas*. For example
-
-```python
-import pandas as pd
-
-# load regression dataset
-regression_data = pd.read_csv("B3DB/B3DB_regression.tsv",
-sep="\t")
-
-# load classification dataset
-classification_data = pd.read_csv("B3DB/B3DB_classification.tsv",
-sep="\t")
-
-# load extended regression dataset
-regression_data_extended = pd.read_csv("B3DB/B3DB_regression_extended.tsv.gz",
-sep="\t", compression="gzip")
-
-# load extended classification dataset
-classification_data_extended = pd.read_csv("B3DB/B3DB_classification_extended.tsv.gz",
-sep="\t", compression="gzip")
-
-```
-
-We also have three examples to show how to use our dataset,
-[numerical_data_analysis.ipynb](notebooks/numerical_data_analysis.ipynb),
-[PCA_projection_fingerprint.ipynb](notebooks/PCA_projection_fingerprint.ipynb) and
-[PCA_projection_descriptors.ipynb](notebooks/PCA_projection_descriptors.ipynb).
-[PCA_projection_descriptors.ipynb](notebooks/PCA_projection_descriptors.ipynb) uses precomputed
-chemical descriptors for visualization of chemical space of `B3DB`, and can be used directly
-using *MyBinder*,
-[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/theochem/B3DB/main?filepath=notebooks%2FPCA_projection_descriptors.ipynb).
-Due to the difficulty of installing `RDKit` in *MyBinder*, only `PCA_projection_descriptors.
-ipynb` is set up in *MyBinder*.
+# Data Curation Process of B3DB
 
 ## Working environment setting up
 
@@ -136,21 +57,3 @@ pip install -r requirements.txt
 ```
 
 `ALOGPS` version 2.1 can be accessed at http://www.vcclab.org/lab/alogps/.
-
-The materials and data under this repo are distributed under the
-[CC0 Licence](http://creativecommons.org/publicdomain/zero/1.0/).
-
-## Update: New External Dataset Available
-
-We’ve expanded the B3DB dataset by adding a new file: `B3DB_classification_external.tsv`. This file introduces additional compounds (171 BBB+ and 4 BBB-) that were not present in the original B3DB dataset. These compounds were carefully selected and incorporated to further enrich B3DB.
-
-### Usage
-
-To load and work with the new classification data in Python, you can use the following code snippet:
-
-```python
-import pandas as pd
-
-# Load the new external classification dataset
-external_classification_data = pd.read_csv("B3DB/B3DB_classification_external.tsv", sep="\t")
-```
````
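The Usage section removed from `data_curation/README.md` loaded the four tables one call at a time. The following compact sketch loads the same files into a dictionary. It assumes the file names shown in that snippet and a local checkout where they sit under `B3DB/`. The helper relies on pandas' default `compression="infer"`, which selects gzip from the `.gz` suffix, so the explicit `compression="gzip"` in the original snippet is optional.

```python
from pathlib import Path

import pandas as pd

# File names as they appear in the removed Usage snippet.
B3DB_FILES = {
    "regression": "B3DB_regression.tsv",
    "classification": "B3DB_classification.tsv",
    "regression_extended": "B3DB_regression_extended.tsv.gz",
    "classification_extended": "B3DB_classification_extended.tsv.gz",
}


def load_b3db(data_dir="B3DB", names=None):
    """Load selected B3DB tables from data_dir into a dict of DataFrames.

    read_csv's default compression="infer" decompresses *.tsv.gz files
    automatically based on their suffix.
    """
    data_dir = Path(data_dir)
    selected = names if names is not None else list(B3DB_FILES)
    return {name: pd.read_csv(data_dir / B3DB_FILES[name], sep="\t") for name in selected}
```

Passing `names=["regression", "classification"]` skips the larger extended files when only the core tables are needed.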

pyproject.toml

Lines changed: 45 additions & 13 deletions
````diff
@@ -19,7 +19,7 @@ description = "A rich molecule dataset for Blood-Brain Barrier (BBB) permeabilit
 readme = {file = 'README.md', content-type='text/markdown'}
 requires-python = ">=3.9"
 # "LICENSE" is name of the license file, which must be in root of project folder
-license = {file = "LICENSE"}
+license = "CC0-1.0"
 authors = [
 {name = "QC-Devs Community", email = "qcdevs@gmail.com"},
 ]
@@ -37,7 +37,6 @@ keywords = [
 classifiers = [
 "Development Status :: 5 - Production/Stable",
 "Environment :: Console",
-"License :: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication",
 "Natural Language :: English",
 "Operating System :: MacOS",
 "Operating System :: Microsoft :: Windows",
@@ -51,6 +50,7 @@ classifiers = [
 "Programming Language :: Python :: 3.11",
 "Programming Language :: Python :: 3.12",
 "Programming Language :: Python :: 3.13",
+"Programming Language :: Python :: 3.14",
 "Topic :: Scientific/Engineering",
 "Topic :: Scientific/Engineering :: Chemistry",
 ]
````
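The hunks above switch `license` from a file table to the SPDX expression `CC0-1.0` and drop the `License ::` trove classifier. This follows the PEP 639 style, under which build tools may raise an error when a license expression and license classifiers appear together. Below is a small, hypothetical check for that pattern. It uses the standard-library `tomllib` (Python 3.11+). Because the project supports Python 3.9, the sketch falls back to the third-party `tomli` package, which offers the same `loads` function.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # Python 3.9/3.10: third-party backport with the same API
    import tomli as tomllib


def license_metadata_problems(pyproject_text):
    """Return PEP 639-style license issues found in pyproject.toml text."""
    project = tomllib.loads(pyproject_text).get("project", {})
    problems = []
    # A license expression is a plain string; the old form was a {file = ...} table.
    if not isinstance(project.get("license"), str):
        problems.append("`license` should be an SPDX expression string such as 'CC0-1.0'")
    legacy = [c for c in project.get("classifiers", []) if c.startswith("License ::")]
    if legacy:
        problems.append(f"legacy license classifiers present: {legacy}")
    return problems
```

Run against the file itself, for example `license_metadata_problems(open("pyproject.toml", encoding="utf-8").read())`, the new configuration should yield an empty list. The old one would be flagged twice.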
````diff
@@ -83,7 +83,7 @@ build-backend = "setuptools.build_meta"
 
 [tool.setuptools.dynamic]
 dependencies = {file = ["requirements.txt"]}
-optional-dependencies = {dev = { file = ["requirements_dev.txt"] }}
+# optional-dependencies = {dev = { file = ["requirements_dev.txt"] }}
 
 [tool.setuptools_scm]
 # can be empty if no extra settings are needed, presence enables setuptools-scm
````
````diff
@@ -95,15 +95,11 @@ include-package-data = true
 # This just means it's safe to zip up the bdist
 zip-safe = true
 
-# Non-code data that should be included in the package source code
-# https://setuptools.pypa.io/en/latest/userguide/datafiles.html
-[tool.setuptools.package-data]
-B3DB = ["*.csv", "*.tsv", "*.tsv.gz"]
-
 # Python modules and packages that are included in the
 # distribution package (and therefore become importable)
 [tool.setuptools.packages.find]
-where = ["B3DB"]
+where = ["."]
+include = ["B3DB"]
 exclude = [
 "*/*/tests",
 "tests_*",
@@ -114,8 +110,44 @@ exclude = [
 "grouping",
 "preprocessing",
 "raw_data",
+"data_curation/**/*",
+"notebooks/**/*",
+]
+
+# Non-code data that should be included in the package source code
+# https://setuptools.pypa.io/en/latest/userguide/datafiles.html
+[tool.setuptools.package-data]
+B3DB = ["*.csv", "*.tsv", "*.tsv.gz"]
+
+
+
+
+
+[tool.setuptools.exclude-package-data]
+# exclude all files in data_curation folder from all packages
+"*" = [
+"data_curation/*",
+"data_curation/**/*",
+"notebooks/*",
+"notebooks/**/*",
+"*.pyc",
+"__pycache__/*",
+"*.egg-info/*",
+".git*",
+".github/*",
+"build/*",
+"dist/*",
+".tox/*",
+".pytest_cache/*",
+".coverage",
+"htmlcov/*",
+".mypy_cache/*",
+".ruff_cache/*",
 ]
 
+
+
+
 [tool.black]
 line-length = 100
 
````
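The `packages.find` change is the core packaging fix. `where` names the directory that is searched, so `where = ["B3DB"]` looks for packages *inside* `B3DB/` and never returns `B3DB` itself. The following sketch reproduces that difference with setuptools' `find_namespace_packages` on a throwaway directory tree. That function is used because the pyproject-based `packages.find` enables namespace discovery by default. The tree built by `make_demo_tree` is a made-up miniature of the repository, not its real contents.

```python
import tempfile
from pathlib import Path

from setuptools import find_namespace_packages


def make_demo_tree(root):
    """Create a tiny, hypothetical layout loosely resembling the repository root."""
    root = Path(root)
    (root / "B3DB").mkdir()
    (root / "B3DB" / "__init__.py").write_text("")
    (root / "B3DB" / "B3DB_classification.tsv").write_text("SMILES\tBBB+/BBB-\n")
    (root / "notebooks").mkdir()
    (root / "data_curation" / "raw_data").mkdir(parents=True)
    return root


def compare_find_settings():
    """Return (old, new): packages found with the old and new packages.find settings."""
    with tempfile.TemporaryDirectory() as tmp:
        root = make_demo_tree(tmp)
        # Old: search *inside* B3DB/, which has no subdirectories here.
        old = find_namespace_packages(where=str(root / "B3DB"))
        # New: search the project root, keeping only the top-level B3DB package.
        new = find_namespace_packages(where=str(root), include=["B3DB"])
    return old, new
```

In this miniature tree the old setting finds nothing, while the new one finds `B3DB`. Note that `include = ["B3DB"]` matches only that exact dotted name. A pattern such as `"B3DB*"` would be needed if subpackages were ever added.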
````diff
@@ -269,10 +301,10 @@ markers = [
 # Configuration for coverage.py
 [tool.coverage.run]
 # files or directories to exclude from coverage calculations
-omit = [
-'B3DB/measures/tests/*',
-'B3DB/methods/tests/*',
-]
+# omit = [
+# 'B3DB/measures/tests/*',
+# 'B3DB/methods/tests/*',
+# ]
 
 
 # Configuration for vulture
````
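Now that `B3DB` is discovered as a package and `[tool.setuptools.package-data]` lists `*.csv`, `*.tsv` and `*.tsv.gz`, the data files are meant to travel inside the installed package. One way to confirm this after `pip install` is to enumerate them with `importlib.resources.files` (available since Python 3.9). The helper below is a hypothetical sketch that takes the package name as a parameter. Which files the installed project actually exposes is what the check reveals, not something it assumes.

```python
from importlib.resources import files

# Suffixes mirrored from [tool.setuptools.package-data] above.
DATA_SUFFIXES = (".csv", ".tsv", ".tsv.gz")


def packaged_data_files(package="B3DB"):
    """List data files shipped at the top level of an importable package."""
    root = files(package)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.name.endswith(DATA_SUFFIXES)
    )
```

An empty result after installing from PyPI would suggest that the package-data rules are still not picking up the dataset files.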