This tool is an integrated CLI (converter_cli.py) for converting compound data between three formats: SMILES, one-line graph CSV, and gSpan. The converted data can be used directly in pmopt and other tools.
Dependencies:
- pysmiles: reading molecules from SMILES notation
- pandas: DataFrame operations
- networkx: graph operations
- rdkit: molecular structure conversion
Installation example:
pip install pysmiles pandas networkx rdkit

Usage examples:
python converter_cli.py --mode smiles_to_graph --input input.csv --output output.csv --smiles-column SMILES --objective-column y --hydrogen 0
python converter_cli.py --mode graph_to_gspan --input input.csv --output output.graph
python converter_cli.py --mode gspan_to_graph --input input.graph --output output.csv
python converter_cli.py --mode graph_to_smiles --input input.csv --output output.csv

--mode smiles_to_graph
- Required: --input (input CSV/Excel), --output (output CSV)
- Optional: --smiles-column (SMILES column name), --objective-column (objective variable column name), --hydrogen (include hydrogen: 1/0)

--mode graph_to_gspan
- Required: --input (input CSV), --output (output .graph)
- Optional: --graph-column (graph column name), --spacing (blank lines between graphs), --target-column (objective variable column name, written to the gSpan header)

--mode gspan_to_graph
- Required: --input (input .graph), --output (output CSV)
- Optional: --gspan-target-column (objective variable column name; default: objective)

--mode graph_to_smiles
- Required: --input (input CSV), --output (output CSV)
- Optional: --graph-column (graph column name), --other-columns (other column names to output together)
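The smiles_to_graph mode turns each SMILES string into a single line of `v <id> <element>` and `e <i> <j> <bond>` tokens. The sketch below is a minimal reconstruction under assumptions, not the tool's source: it reads SMILES with pysmiles' `read_smiles` (imported lazily so the serializer works on its own), and the bond-order-to-symbol table `BOND_SYMBOLS` is hypothetical apart from `=` for a double bond, which the examples in this README show. `to_oneline` and `smiles_to_oneline` are illustrative names.

```python
# Hypothetical mapping from pysmiles bond orders to one-line graph symbols.
# Only "=" (double bond) is confirmed by the examples; the rest are assumptions.
BOND_SYMBOLS = {1: "-", 1.5: ":", 2: "=", 3: "#"}


def to_oneline(atoms, bonds):
    """Serialize atoms [(idx, element)] and bonds [(i, j, order)] to one line.

    Vertices come first, then edges, matching the layout of the example
    output ("v 0 O v 1 C ... e 0 1 = ...").
    """
    tokens = []
    for idx, element in atoms:
        tokens += ["v", str(idx), element]
    for i, j, order in bonds:
        if order not in BOND_SYMBOLS:
            # Unknown bonds are treated as an error, as the notes describe.
            raise ValueError(f"unknown bond order {order!r} between {i} and {j}")
        tokens += ["e", str(i), str(j), BOND_SYMBOLS[order]]
    return " ".join(tokens)


def smiles_to_oneline(smiles, hydrogen=False):
    """Read a SMILES string with pysmiles and return its one-line graph."""
    from pysmiles import read_smiles  # lazy import: only needed here

    mol = read_smiles(smiles, explicit_hydrogen=hydrogen)
    atoms = [(n, data["element"]) for n, data in sorted(mol.nodes(data=True))]
    bonds = [(u, v, data.get("order", 1)) for u, v, data in mol.edges(data=True)]
    return to_oneline(atoms, bonds)
```

The `hydrogen` flag plays the role of `--hydrogen 1/0`: with `explicit_hydrogen=False`, pysmiles folds hydrogens into an `hcount` attribute, which this serializer drops, so hydrogen counts are lost in the output.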
Example: smiles_to_graph
Input CSV:
objective,SMILES
-5.89,O=C1c3ccccc3[Se]N1c2ccccc2
Command:
python converter_cli.py --mode smiles_to_graph --input input.csv --output output.csv --smiles-column SMILES --objective-column objective --hydrogen 1
Output CSV:
objective,graph
-5.89,v 0 O v 1 C ... e 0 1 = ...

Example: graph_to_gspan
Input CSV:
graph,objective
v 0 O v 1 C ... e 0 1 = ...,100
Command:
python converter_cli.py --mode graph_to_gspan --input input.csv --output output.graph --target-column objective
Output .graph:
t # 0 objective 100
v 0 8
v 1 6
...
e 0 1 4
...
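The gSpan output above appears to encode vertices by atomic number (O → 8, C → 6) and bonds by integer labels (`=` → 4 in this example), with the `--target-column` name and value appended after `t # <id>`. A minimal sketch of both directions under those assumptions: `ATOMIC_NUMBERS` covers only a few elements, and because only `=` → 4 is attested, the bond-label table is passed in by the caller instead of guessed. `parse_oneline`, `to_gspan` and `from_gspan` are hypothetical names.

```python
# Partial element -> atomic number table (extend as needed).
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9, "P": 15,
                  "S": 16, "Cl": 17, "Se": 34, "Br": 35, "I": 53}


def parse_oneline(line):
    """Split a one-line graph into vertex and edge lists."""
    tokens = line.split()
    vertices, edges = [], []
    pos = 0
    while pos < len(tokens):
        if tokens[pos] == "v":
            vertices.append((int(tokens[pos + 1]), tokens[pos + 2]))
            pos += 3
        elif tokens[pos] == "e":
            edges.append((int(tokens[pos + 1]), int(tokens[pos + 2]), tokens[pos + 3]))
            pos += 4
        else:
            raise ValueError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return vertices, edges


def to_gspan(graph_id, line, bond_labels, target=None):
    """Render one graph as a gSpan block.

    target is an optional (column_name, value) pair written after "t # id".
    Unknown elements or bond symbols raise KeyError.
    """
    vertices, edges = parse_oneline(line)
    header = f"t # {graph_id}"
    if target is not None:
        header += f" {target[0]} {target[1]}"
    lines = [header]
    lines += [f"v {idx} {ATOMIC_NUMBERS[element]}" for idx, element in vertices]
    lines += [f"e {i} {j} {bond_labels[bond]}" for i, j, bond in edges]
    return "\n".join(lines)


def from_gspan(block, bond_labels):
    """Inverse of to_gspan: return (target value or None, one-line graph)."""
    symbols = {num: sym for sym, num in ATOMIC_NUMBERS.items()}
    bonds = {label: sym for sym, label in bond_labels.items()}
    rows = [r.split() for r in block.strip().splitlines() if r.strip()]
    header = rows[0]  # e.g. ["t", "#", "0", "objective", "100"]
    target = header[4] if len(header) >= 5 else None
    tokens = []
    for row in rows[1:]:
        if row[0] == "v":
            tokens += ["v", row[1], symbols[int(row[2])]]
        elif row[0] == "e":
            tokens += ["e", row[1], row[2], bonds[int(row[3])]]
    return target, " ".join(tokens)
```

For example, `to_gspan(0, "v 0 O v 1 C e 0 1 =", {"=": 4}, target=("objective", 100))` produces the header `t # 0 objective 100` followed by `v 0 8`, `v 1 6` and `e 0 1 4`. To read a whole .graph file, split it at each `t #` line and call `from_gspan` once per block.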

Example: gspan_to_graph
Input .graph:
t # 0 objective 100
v 0 8
v 1 6
...
e 0 1 4
...
Command:
python converter_cli.py --mode gspan_to_graph --input input.graph --output output.csv
Output CSV:
objective,graph
100,v 0 O v 1 C ... e 0 1 = ...

Example: graph_to_smiles
Input CSV:
graph
v 0 O v 1 C ... e 0 1 = ...
Command:
python converter_cli.py --mode graph_to_smiles --input input.csv --output output.csv
Output CSV:
smiles
O=C1c3ccccc3[Se]N1c2ccccc2

Option reference:
- --smiles-column: column containing SMILES (default: smiles)
- --objective-column: objective variable column name (default: objective)
- --hydrogen: whether to include hydrogen (1: include, 0: exclude; default: 0)
- --graph-column: column containing the one-line graph (default: graph)
- --spacing: insert blank lines between graphs in the gSpan output
- --target-column: column holding the objective variable (target value) to write into the gSpan header, e.g. objective or y (optional)
- --gspan-target-column: objective variable column name for gSpan input (default: objective)
- --other-columns: other column names to output together when generating SMILES
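The graph_to_smiles mode has to rebuild a molecule from element and bond symbols before RDKit can write SMILES. A minimal sketch, assuming the one-line graph uses the SMILES bond symbols `-`, `=` and `#` (only `=` appears in the examples) and carries no charges or aromatic flags. The RDKit import is deferred so the parser runs without RDKit; `parse_oneline`, `oneline_to_smiles` and `BOND_NAMES` are hypothetical names.

```python
# Assumed bond symbols -> names of RDKit BondType members.
BOND_NAMES = {"-": "SINGLE", "=": "DOUBLE", "#": "TRIPLE"}


def parse_oneline(line):
    """Return ([(idx, element)], [(i, j, bond_symbol)]) from a one-line graph."""
    tokens = line.split()
    atoms, bonds, pos = [], [], 0
    while pos < len(tokens):
        kind = tokens[pos]
        if kind == "v":
            atoms.append((int(tokens[pos + 1]), tokens[pos + 2]))
            pos += 3
        elif kind == "e":
            bonds.append((int(tokens[pos + 1]), int(tokens[pos + 2]), tokens[pos + 3]))
            pos += 4
        else:
            raise ValueError(f"unexpected token {kind!r}")
    return atoms, bonds


def oneline_to_smiles(line):
    """Rebuild an RDKit molecule from a one-line graph and return SMILES."""
    from rdkit import Chem  # deferred: only needed for the conversion itself

    atoms, bonds = parse_oneline(line)
    mol = Chem.RWMol()
    index = {}  # one-line vertex id -> RDKit atom index
    for idx, element in atoms:
        index[idx] = mol.AddAtom(Chem.Atom(element))
    for i, j, symbol in bonds:
        if symbol not in BOND_NAMES:
            raise ValueError(f"unknown bond symbol {symbol!r}")
        mol.AddBond(index[i], index[j], getattr(Chem.BondType, BOND_NAMES[symbol]))
    # Sanitization computes implicit hydrogens and perceives aromaticity; it
    # raises (for example a kekulization error) if the graph is not a valid molecule.
    Chem.SanitizeMol(mol)
    return Chem.MolToSmiles(mol)
```

With RDKit installed, `oneline_to_smiles("v 0 O v 1 C e 0 1 =")` rebuilds formaldehyde. `Chem.SanitizeMol` is the step where the kekulization errors noted in this README would surface.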
- Make sure the column names and format of the input file match the options you pass.
- Unknown element symbols or bond symbols cause an error.
- If an input contains multiple SMILES, the longest one is selected.
- Converted data can be used directly in pmopt.
- Converting SMILES → one-line graph loses the following information:
  - Stereochemistry ([C@H], E/Z, etc.)
  - Explicit aromaticity flags
  - Hydrogen counts (when --hydrogen 0)
  - Atomic charges and radicals
- Consequently, converting a one-line graph back to SMILES may not reproduce the original SMILES exactly.
- In particular, for molecules containing aromatic rings or stereochemistry, RDKit's kekulization step may fail (Can't kekulize mol.), making conversion back to SMILES impossible.
- This follows from the design of this tool and of RDKit/pysmiles, and whether it happens depends on the structure and information content of the input molecule.
- If an exact round trip is required, keep the original SMILES alongside the converted data.
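The note on multiple SMILES does not say how "multiple" is detected. One common reading, assumed here, is an input whose components are joined by `.` (for example a salt or mixture), from which the longest component is kept. A minimal sketch of that reading; `pick_longest` is a hypothetical helper and measures length in characters, not atoms.

```python
def pick_longest(smiles):
    """Keep the longest dot-separated component of a SMILES string.

    Assumption: "multiple SMILES" means components joined by "." (e.g. a
    salt or mixture). Length is counted in characters, not atoms, and on a
    tie the first component wins (max() keeps the first maximum).
    """
    parts = [p for p in smiles.split(".") if p]
    if not parts:
        raise ValueError("empty SMILES")
    return max(parts, key=len)
```

For instance, `pick_longest("CCCCO.Cl")` keeps `CCCCO` and discards the counter-ion. Character length is only a proxy for molecular size, so a real implementation might instead compare heavy-atom counts after parsing.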
draw_smiles_to_svg.py: a tool that renders molecular structures given in SMILES notation as SVG images.
Usage:
python draw_smiles_to_svg.py input.csv output_dir
python draw_smiles_to_svg.py input.csv output_dir --smiles_col SMILES

Arguments:
- csv_path: input CSV file containing SMILES
- out_dir: output directory where the SVG files are saved
- --smiles_col: column containing SMILES (default: smiles)
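The arguments above map directly onto Python's argparse. This is a hypothetical reconstruction of the interface, not the script's actual source; only the argument names and the `smiles` default come from the list above, and `build_parser` is an illustrative name.

```python
import argparse


def build_parser():
    """Argument parser mirroring the documented draw_smiles_to_svg.py interface."""
    parser = argparse.ArgumentParser(
        description="Render SMILES from a CSV file as SVG images.")
    parser.add_argument("csv_path", help="input CSV file containing SMILES")
    parser.add_argument("out_dir", help="directory where SVG files are written")
    parser.add_argument("--smiles_col", default="smiles",
                        help="column containing SMILES (default: smiles)")
    return parser
```

Calling `build_parser().parse_args(["input.csv", "svg_output"])` yields `csv_path="input.csv"`, `out_dir="svg_output"` and `smiles_col="smiles"`, matching the first usage line.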
Example:
Input CSV (input.csv):
smiles,name
O=C1c3ccccc3[Se]N1c2ccccc2,compound1
CC(C)CC1=CC=C(C=C1)C(C)C(=O)O,compound2
Command:
python draw_smiles_to_svg.py input.csv svg_output
Output:
- svg_output/0.svg: SVG image of the molecular structure in row 1
- svg_output/1.svg: SVG image of the molecular structure in row 2
- Rows with invalid SMILES are skipped, and an error message is displayed for each.
- Output SVG images are 300x300 pixels in size
- If the output directory doesn't exist, it will be created automatically
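The behavior described above (one file per row named by its 0-based index, invalid rows skipped with a message, 300x300 images, output directory created on demand) can be sketched as follows. This is an assumed reconstruction, not the script itself: it reads the CSV with the standard-library `csv` module to stay dependency-free, keeps the row index in file names even when earlier rows are skipped (the README does not say), and takes the renderer as a parameter. `rdkit_svg` shows one way to produce the SVG with RDKit's `MolDraw2DSVG`; `draw_csv` and `rdkit_svg` are hypothetical names.

```python
import csv
import os
import sys


def rdkit_svg(smiles, size=300):
    """Render one SMILES string as an SVG string with RDKit (assumed approach)."""
    from rdkit import Chem
    from rdkit.Chem.Draw import rdMolDraw2D

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for unparsable SMILES
        raise ValueError(f"invalid SMILES: {smiles!r}")
    drawer = rdMolDraw2D.MolDraw2DSVG(size, size)
    drawer.DrawMolecule(mol)
    drawer.FinishDrawing()
    return drawer.GetDrawingText()


def draw_csv(csv_path, out_dir, smiles_col="smiles", render=rdkit_svg):
    """Write out_dir/<row index>.svg for every row whose SMILES renders.

    Returns the list of written paths; failing rows are reported on stderr.
    """
    os.makedirs(out_dir, exist_ok=True)  # create the directory if missing
    written = []
    with open(csv_path, newline="") as handle:
        for i, row in enumerate(csv.DictReader(handle)):
            try:
                svg = render(row[smiles_col])
            except ValueError as exc:
                print(f"row {i}: skipped ({exc})", file=sys.stderr)
                continue
            path = os.path.join(out_dir, f"{i}.svg")
            with open(path, "w") as out:
                out.write(svg)
            written.append(path)
    return written
```

Passing the renderer in keeps the file-handling loop testable without RDKit; with RDKit installed, `draw_csv("input.csv", "svg_output")` follows the example above.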