`README.md`: 30 additions and 48 deletions
````diff
@@ -61,53 +61,33 @@ Fedrann generates the following output files to help you understand your analysi
 
 - `fedrann.log`: A log file that details the pipeline's progress, from start to finish.
 
-- `overlaps.tsv`: This file contains a list of every sequence analyzed, including its name and orientation.
-
-- `metadata.tsv`: This file lists all identified candidate overlaps and their similarity metrics.
+- `overlaps.tsv`: This file lists all identified candidate overlaps with sequence names, orientations, and their similarity metrics.
 
 - `feature_matrix.npz`: (Optional) This sparse-format file contains the feature matrix generated during the analysis.
 
 ### `overlaps.tsv`
-This file serves as a reference for all input sequences. The index column provides a numerical identifier for each sequence. The `read_name` column contains the original name of the sequence, while the `strand` column specifies its orientation. A strand value of `0` denotes the original sequence, and a value of `1` denotes its reverse complement.
+This file details the candidate overlaps identified by the tool. Each row represents a potential overlap between two sequences.
 
 Example `overlaps.tsv` file:
 ```
-index read_name strand
-0 c2924806-d5c6-4564-b31a-c701c0226fbc 0
-1 c29246d8-d6c6-4564-b31a-c701c0226fbc 1
-2 b5ec3070-2ba3-430d-a55d-1d7b178c8d36 0
-3 b5ec3070-2ba3-430d-a55d-1d7b178c8d36 1
-4 5b1d405b-08b4-448d-b601-dd922aa9380c 0
-5 5b1d405b-08b4-448d-b601-dd922aa9380c 1
-6 f3d50991-ad3e-4564-98de-97bf986f992c 0
-7 f3d50991-ad3e-4564-98de-97bf986f992c 1
-8 316caa78-7d27-4d49-b0f3-684fa17063e4 0
-9 316caa78-7d27-4d49-b0f3-684fa17063e4 1
-10 7b2ee773-fc72-4b54-b6fe-a412df3f8744 0
-```
-
-### `metadata.tsv`
-This file details the candidate overlaps identified by the tool. You can use the `query_index` and `target_index` to look up the full sequence names in the `overlaps.tsv` file.
-`distance`: Measures the dissimilarity between the embedded vectors of the query and target sequences. A smaller value indicates higher similarity between the sequences.
 
-`rank`: The similarity rank of the `target_index` sequence among all potential matches for the `query_index`. A lower rank (closer to 1) signifies a better match.
+Column descriptions:
+- `query_name`: The name of the query sequence
+- `query_orientation`: The orientation of the query sequence (`+` for forward, `-` for reverse complement)
+- `target_name`: The name of the target sequence
+- `target_orientation`: The orientation of the target sequence (`+` for forward, `-` for reverse complement)
+- `neighbor_rank`: The similarity rank of the target sequence among all potential matches for the query sequence, where `1` is the best match, `2` is the second best, and so on.
+- `distance`: Measures the dissimilarity between the embedded vectors of the query and target sequences. A smaller value indicates higher similarity between the sequences.
 
 
 
@@ -215,16 +195,18 @@ The following flowchart illustrates the main steps of the FEDRANN pipeline. The
````
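After this change, the new `overlaps.tsv` carries read names and orientations directly, so it can be used without a second lookup file. The sketch below reads it with Python's standard `csv` module and keeps the rank-1 target for each query. It assumes the file is tab-separated with the six columns in the order given under "Column descriptions", and it tolerates an optional header row; the new example rows were not captured in this diff, so neither detail is confirmed. `read_overlaps`, `best_hits`, and `Overlap` are hypothetical helpers, not part of Fedrann.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterable, List, TextIO, Tuple

# Assumed column order, copied from the "Column descriptions" list.
COLUMNS = ["query_name", "query_orientation", "target_name",
           "target_orientation", "neighbor_rank", "distance"]


@dataclass
class Overlap:
    query_name: str
    query_orientation: str   # "+" forward, "-" reverse complement
    target_name: str
    target_orientation: str
    neighbor_rank: int       # 1 = best match for this query
    distance: float          # smaller = more similar


def read_overlaps(handle: TextIO) -> List[Overlap]:
    """Parse a tab-separated overlaps table; a header row is optional."""
    rows: List[Overlap] = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0] == "query_name":
            continue  # skip blank lines and the (assumed) header row
        rec = dict(zip(COLUMNS, fields))
        rows.append(Overlap(
            query_name=rec["query_name"],
            query_orientation=rec["query_orientation"],
            target_name=rec["target_name"],
            target_orientation=rec["target_orientation"],
            neighbor_rank=int(rec["neighbor_rank"]),
            distance=float(rec["distance"]),
        ))
    return rows


def best_hits(overlaps: Iterable[Overlap]) -> Dict[Tuple[str, str], Overlap]:
    """Keep the lowest-rank target for each (query_name, query_orientation)."""
    best: Dict[Tuple[str, str], Overlap] = {}
    for ov in overlaps:
        key = (ov.query_name, ov.query_orientation)
        if key not in best or ov.neighbor_rank < best[key].neighbor_rank:
            best[key] = ov
    return best


if __name__ == "__main__":
    # Hypothetical reads, for illustration only.
    sample = io.StringIO(
        "read_A\t+\tread_B\t-\t1\t0.12\n"
        "read_A\t+\tread_C\t+\t2\t0.30\n"
    )
    for (name, orient), ov in best_hits(read_overlaps(sample)).items():
        print(name, orient, "->", ov.target_name, ov.target_orientation, ov.distance)
```

Keying on `(query_name, query_orientation)` keeps the forward and reverse-complement forms of a read separate, mirroring how the old format listed each read once per strand. When reading a real file, open it with `newline=""`, as the `csv` module documentation recommends.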
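For results produced before this change, the old `metadata.tsv` referred to reads only by `query_index` and `target_index`, which had to be resolved through the old `overlaps.tsv` (`index`, `read_name`, `strand`). The following is a minimal sketch of that join, emitting rows in the new column order and mapping `strand` `0`/`1` to `+`/`-`. It assumes both old files are tab-separated with header rows, and that the old `metadata.tsv` header uses the names `query_index`, `target_index`, `distance`, and `rank`; the old text mentions these names, but its full column layout was not captured here. `load_index` and `convert` are hypothetical names.

```python
import csv
import io
import sys
from typing import Dict, List, TextIO, Tuple

# Old `strand` codes (0 = original, 1 = reverse complement) -> new symbols.
STRAND_TO_ORIENTATION = {"0": "+", "1": "-"}

NEW_HEADER = ["query_name", "query_orientation", "target_name",
              "target_orientation", "neighbor_rank", "distance"]


def load_index(handle: TextIO) -> Dict[str, Tuple[str, str]]:
    """Map old `index` values to (read_name, orientation) from the old overlaps.tsv."""
    table: Dict[str, Tuple[str, str]] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        table[row["index"]] = (row["read_name"], STRAND_TO_ORIENTATION[row["strand"]])
    return table


def convert(index_handle: TextIO, metadata_handle: TextIO) -> List[List[str]]:
    """Resolve old metadata.tsv indices to names, in the new column order."""
    index = load_index(index_handle)
    out: List[List[str]] = [NEW_HEADER]
    for row in csv.DictReader(metadata_handle, delimiter="\t"):
        q_name, q_orient = index[row["query_index"]]
        t_name, t_orient = index[row["target_index"]]
        # Old `rank` becomes `neighbor_rank`; `distance` is carried over unchanged.
        out.append([q_name, q_orient, t_name, t_orient, row["rank"], row["distance"]])
    return out


if __name__ == "__main__":
    # Hypothetical inputs in the assumed old layout.
    old_index = io.StringIO("index\tread_name\tstrand\n0\tread_A\t0\n1\tread_A\t1\n2\tread_B\t0\n")
    old_meta = io.StringIO("query_index\ttarget_index\tdistance\trank\n0\t2\t0.25\t1\n")
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    writer.writerows(convert(old_index, old_meta))
```

Keeping `rank` and `distance` as strings passes the original number formatting through untouched; the rows can be written back out with `csv.writer(handle, delimiter="\t")`.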