
Commit 74eb00b

Modified Python and JS code to handle extranumerical lines and alternative readings
1 parent ab2be77 commit 74eb00b

14 files changed: 380 additions & 68 deletions

src/.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -2,6 +2,7 @@
 d3_practice.md
 network_practice.md
 sankey_test.md
+sandbox.md
 components/sankey.js
 components/sankey_original.js
 components
```

src/about.md

Lines changed: 44 additions & 9 deletions
```diff
@@ -96,6 +96,8 @@ While the project overall is strictly concerned with Latin poetry, Greek texts a
 
 Finally, each `word-level intertext` records at least one scholarly source (sometimes the original publication proposing the intertext, and sometimes a commentary), which are collectively stored in a `publication` table. (It is also possible to record an ancient work as the scholarly source, since occasionally the explicit recognition of an intertext goes back to a grammarian of antiquity.) This information is not currently displayed in any fashion, but it will eventually be shown when a passage is selected.
 
+Some additional information about particulars of the database and project can be found on the [Frequently Asked Questions page](./faq).
+
 ### Data Pipeline
 
 *Non-coders may wish to [skip this part](#visualizations)!*
```
```diff
@@ -161,7 +163,7 @@ def table_to_df(table, cols_dict):
 
 The data loader then joins the disparate metrical data into a single dataframe and then returns it to a single restructured JSON object; and it converts each of the other dataframes to a JSON object, which are collectively stored in an array. These are all saved to files that are automatically committed to GitHub.
 
-The same Python data loader also creates network nodes and edges from the data in order to enable visualization of the intertexts as [Sankey diagrams](https://en.wikipedia.org/wiki/Sankey_diagram). (I chose these over traditional [network graphs](https://guides.library.yale.edu/dh/graphs) since the sequential nature of an intertextual network makes it well-suited to visualizing as a flow-path.) While part of the network creation is done automatically by the d3 Sankey module, the initial preparation of nodes and edges is performed in the data loader; further filtering, when necessary, is done on the fly based on the user's selections.
+The same Python data loader also creates network nodes and edges from the data in order to enable visualization of the intertexts as [Sankey diagrams](https://en.wikipedia.org/wiki/Sankey_diagram). (I chose these over traditional [network graphs](https://guides.library.yale.edu/dh/graphs) since the sequential nature of an intertextual network makes it well-suited to visualizing as a flow-path.) While part of the network creation is done automatically by the d3 Sankey module, the initial preparation of nodes and edges is performed in the data loader; further filtering, when necessary, is done on the fly based on the user’s selections.
 
 <p><details>
 <summary>Click to view the two custom functions for this stage.</summary>
```
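The export step described in the prose above (join the metrical dataframes, restructure them as JSON records, write to files) can be sketched roughly as follows. This is illustrative only: the dataframe names and columns here are hypothetical stand-ins, not the loader's actual variables.

```python
import json
import pandas as pd

# Hypothetical stand-ins for the loader's dataframes (illustrative names).
meter_df = pd.DataFrame({"word_id": ["w1", "w2"], "foot": [1, 2]})
lemma_df = pd.DataFrame({"word_id": ["w1", "w2"], "lemma": ["arma", "uirum"]})

# Join the disparate data into a single dataframe on a shared key,
# then restructure it as a JSON array of records.
joined = meter_df.merge(lemma_df, on="word_id")
records = joined.to_dict(orient="records")

# The records list can then be dumped to a file for the frontend to consume.
json_text = json.dumps(records)
```

`to_dict(orient="records")` is the usual pandas idiom for producing one JSON object per row, which matches the "array of JSON objects" shape the prose describes.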
```diff
@@ -451,12 +453,28 @@ for (let meter in meters) {
 
 // Define grid height based on number of lines.
 
-const gridY = (lineRange.lastLine - lineRange.firstLine) + 1; // I may need to modify this to accomodate passages with extra lines
+let gridYInterim = (lineRange.lastLine - lineRange.firstLine) + 1;
+let extraLineSet;
+
+// make a set of any extranumerical lines
+
+if (wordsFiltered.filter(word => word.line_num_modifier).length > 0) {
+  extraLineSet = new Set(
+    wordsFiltered.filter(word => word.line_num_modifier)
+      .map(word => ({lineNum: word.line_num, lineNumMod: word.line_num_modifier, lineNumString: `${word.line_num}${word.line_num_modifier}`}))
+  );
+  gridYInterim += extraLineSet.size; // if there are extranumerical lines, increase the height multiplier accordingly, so that cells remain square
+}
+
+const extraLines = extraLineSet ? Array.from(extraLineSet) : [];
+
+const gridY = gridYInterim;
 
 const cellSize = 20;
 const gridHeight = gridY * cellSize;
 const gridWidth = gridX * cellSize;
 
+
 // Create plot, conditional on the existence of intertexts
 
 // set tick range; increase step every ten (max) intertexts
```
```diff
@@ -469,6 +487,18 @@ else {
 };
 let tickRange = d3.range(Math.min(...intxtCnts), Math.max(...intxtCnts)+1, step);
 
+let lineVals = d3.range(lineRange.firstLine, lineRange.lastLine +1);
+
+// if there are extranumerical lines, insert them into the line values array
+
+for (let line of extraLines) {
+  let insertAfter = line.lineNum;
+  let insertAfterIndex = lineVals.indexOf(insertAfter) + 1;
+  let insertString = line.lineNumString;
+  lineVals.splice(insertAfterIndex, 0, insertString);
+}
+
+
 const plotDisplay = intertextsArr.every(intxt => intxt.intxtCnt === 0) ? null : Plot.plot({
   grid: true,
   x: {
```
```diff
@@ -479,7 +509,8 @@ const plotDisplay = intertextsArr.every(intxt => intxt.intxtCnt === 0) ? null :
   },
   y: {
     label: 'Line',
-    domain: d3.range(lineRange.firstLine, lineRange.lastLine +1),
+    // domain: d3.range(lineRange.firstLine, lineRange.lastLine +1),
+    domain: lineVals,
     tickSize: 0,
   },
   color: {scheme: "Greens",
```
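The splice logic in the JS above (insert a label like `845a` directly after its base line in the y-axis domain) can be sketched language-agnostically. Here is a Python analogue with illustrative values; the real code operates on d3 ranges and the `wordsFiltered` array.

```python
# Insert extranumerical line labels (e.g. "845a") into the ordered list of
# line numbers, immediately after their base line, mirroring the JS splice.
first_line, last_line = 843, 847
extra_lines = [{"lineNum": 845, "lineNumString": "845a"}]  # illustrative data

line_vals = list(range(first_line, last_line + 1))
for line in extra_lines:
    # find the base line, then insert the modified label right after it
    insert_after_index = line_vals.index(line["lineNum"]) + 1
    line_vals.insert(insert_after_index, line["lineNumString"])
# line_vals is now [843, 844, 845, "845a", 846, 847]
```

Using the full ordered label list as the axis domain (rather than a numeric range) is what lets the extranumerical rows render as their own grid cells.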
```diff
@@ -550,24 +581,28 @@ if (plotCurrSelect) {
 
 currWordId = plotCurrSelect.wordObj.obj_id; // set current word ID to the selected word
 
+let intertextsTableExtended = intertextsTable.concat(intertextsModTable);
+
 // create functions for getting a word's immediate ancestors or descendants
 function getWordAncestors(currWordId){
-  for (let i in intertextsTable) {
-    let intxt = intertextsTable[i];
+  for (let i in intertextsTableExtended) {
+    let intxt = intertextsTableExtended[i];
     // for each intertext in the intertexts table, if its target ID matches the focus word (either the selected word or one of its ancestors), add it to the list of ancestor intertexts and add its source to the list of words to be processed.
     if (currWordId === intxt.target_word_id) {
       ancestorIntertexts.push(intxt);
       ancestorWordIDs.push(intxt.source_word_id);
+      wordSankeyIntxtIDs.push(intxt.intxt_id);
     }
   }
 }
 function getWordDescendants(currWordId){
-  for (let i in intertextsTable) {
-    let intxt = intertextsTable[i];
+  for (let i in intertextsTableExtended) {
+    let intxt = intertextsTableExtended[i];
     // for each intertext in the intertexts table, if its source ID matches the focus word (either the selected word or one of its descendants), add it to the list of descendant intertexts and add its target to the list of words to be processed.
     if (currWordId === intxt.source_word_id) {
       descendantIntertexts.push(intxt);
       descendantWordIDs.push(intxt.target_word_id);
+      wordSankeyIntxtIDs.push(intxt.intxt_id);
     }
   }
 }
```
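The traversal pattern in `getWordAncestors` above is a simple edge-list scan: collect every intertext whose target is the focus word. A minimal Python sketch of the same idea, with hypothetical IDs (the real data comes from the concatenated original and modified intertext tables):

```python
# Illustrative edge list standing in for intertextsTableExtended.
intertexts_extended = [
    {"intxt_id": "i1", "source_word_id": "w1", "target_word_id": "w2"},
    {"intxt_id": "i2", "source_word_id": "w2", "target_word_id": "w3"},
]

def get_word_ancestors(curr_word_id, table):
    """Collect edges pointing at curr_word_id, plus their source-word IDs."""
    ancestor_intertexts, ancestor_word_ids, sankey_intxt_ids = [], [], []
    for intxt in table:
        if curr_word_id == intxt["target_word_id"]:
            ancestor_intertexts.append(intxt)
            ancestor_word_ids.append(intxt["source_word_id"])
            sankey_intxt_ids.append(intxt["intxt_id"])
    return ancestor_intertexts, ancestor_word_ids, sankey_intxt_ids

ancestors, ancestor_ids, sankey_ids = get_word_ancestors("w3", intertexts_extended)
```

In the actual code the returned source-word IDs are fed back in as new focus words, so repeated application walks the whole ancestor chain.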
```diff
@@ -623,9 +658,9 @@ The colors (which distinguish between authors in the passage-level and full inte
 
 ## Next Steps
 
-In addition to continuing database input, the code needs to be tweaked in order to handle extranumerical lines (such as 845a, which would come between 845 and 846) and alternate readings.
+The main focus for the near future is on entering additional intertexts into the database. Once sufficient intertexts have been entered, work can begin on the creation of analytical tools, enabling researchers to ask and answer questions about the data.
 
-Beyond those crucial improvements, a few additional potential long-term developments are:
+A few additional potential long-term developments are:
 
 - an option to view only direct intertext density
 - an option to view &ldquo;descendant&rdquo; intertexts instead of &ldquo;ancestor&rdquo; intertexts in the density display
```

src/data/intxt_network_graph.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

src/data/intxts_full.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

src/data/intxts_full_modified.json

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+[{"intxt_grp_id": "21049513", "intxt_id": "21049515", "source_word_id": "21049507", "target_word_id": "21049502", "source_author_id": "20336404", "source_work_id": "20336405", "source_work_seg_id": "20336406", "source_line_num": "718", "target_author_id": "20215016", "target_work_id": "20215018", "target_work_seg_id": "20238543", "target_line_num": 829, "match_type_ids": ["20215033", "20240810"], "original_id": "21049508", "original_grp_id": "21049513"}]
```

src/data/model_json_backup.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

src/data/nodegoat_data.json.py

Lines changed: 91 additions & 18 deletions
```diff
@@ -5,6 +5,7 @@
 import requests
 import pandas as pd
 import networkx as nx
+import copy
 
 # set parameters
 api_token = os.getenv("NODEGOAT_API_TOKEN")
```
```diff
@@ -155,6 +156,18 @@ def get_object_ids(model):
                 authorship_prob_class_table = objtype["objects"]
                 tables_dict["authorship_prob_class_table"] = authorship_prob_class_table
                 break
+            elif objtype["objects"][id_num]["object"]["type_id"] == 23064:
+                textual_prob_table = objtype["objects"]
+                tables_dict["textual_prob_table"] = textual_prob_table
+                break
+            elif objtype["objects"][id_num]["object"]["type_id"] == 23065:
+                alternate_reading_table = objtype["objects"]
+                tables_dict["alternate_reading_table"] = alternate_reading_table
+                break
+            elif objtype["objects"][id_num]["object"]["type_id"] == 23066:
+                word_lvl_intxt_mod_table = objtype["objects"]
+                tables_dict["word_lvl_intxt_mod_table"] = word_lvl_intxt_mod_table
+                break
             else:
                 pass
 # end of inner for loop
```
```diff
@@ -272,7 +285,7 @@ def remove_decimal(id_string):
     "max_length": {"67537": "objval"},
     #"unit_line": {"68127": "objval"}
 }
-### The rest aren't necessary for the actual visualization ###
+### The next three aren't necessary for the actual visualization ###
 publication_cols = {"author_ids": {"67416": "refid"},
     "publication_date": {"67417": "objval"},
     "article_chapter_title": {"67418": "objval"},
```
```diff
@@ -286,6 +299,26 @@ def remove_decimal(id_string):
     "PID": {"67431": "objval"},
     # in future, may add latitude and longitude from sub-object, but that would require additional logic
 }
+###
+textual_prob_cols = {
+    "work_segment_id": {"71932": "refid"},
+    "line_num": {"71933": "objval"},
+    "line_num_modifier": {"71934": "objval"},
+    "start_pos_id": {"71935": "refid"},
+    "stop_pos_id": {"71936": "refid"}
+}
+alternate_reading_cols = {
+    "textual_prob_id": {"71937": "refid"},
+    "word_inst_ids": {"71938": "refid"},
+    "default_reading": {"71939": "objval"}
+}
+wd_lvl_intxt_mod_cols = {
+    "wd_lvl_intxt_id": {"71940": "refid"},
+    "wd_to_replace_id": {"71941": "refid"},
+    "wd_sub_id": {"71942": "refid"},
+    "match_type_remove_ids": {"71943": "refid"},
+    "match_type_add_ids": {"71944": "refid"}
+}
 
 # Convert tables to dataframes based on specified columns
 word_instance_df = table_to_df(word_instance_table, wd_inst_cols)
```
```diff
@@ -303,6 +336,9 @@ def remove_decimal(id_string):
 scholar_df = table_to_df(scholar_table,scholar_cols)
 publication_df = table_to_df(publication_table,publication_cols)
 pleiades_df = table_to_df(pleiades_table,pleiades_cols)
+textual_prob_df = table_to_df(textual_prob_table,textual_prob_cols)
+alternate_reading_df = table_to_df(alternate_reading_table,alternate_reading_cols)
+word_lvl_intxt_mod_df = table_to_df(word_lvl_intxt_mod_table,wd_lvl_intxt_mod_cols)
 
 # For `word instance` df, make sure that elided_monosyllable is either False or True, not None:
 word_instance_df['elided_monosyllable'] = word_instance_df['elided_monosyllable'].apply(lambda x: False if x is None else x)
```
```diff
@@ -437,19 +473,20 @@ def remove_decimal(id_string):
     tables_df_to_dict[df_name] = new_dict
 
 sources_table = []
-for obj_id in word_lvl_intxt_table:
-    intxt_sources = word_lvl_intxt_table[obj_id]['object']['object_sources']
-    if isinstance(intxt_sources, dict):
-        for source_type_id in intxt_sources.keys():
-            for source in intxt_sources[source_type_id]:
-                sources_dict = {}
-                sources_dict['obj_id'] = obj_id
-                sources_dict['source_type_id'] = source_type_id
-                source_id = source['object_source_ref_object_id']
-                sources_dict['source_id'] = str(source_id)
-                source_location = source['object_source_link']
-                sources_dict['source_location'] = source_location
-                sources_table.append(sources_dict)
+for table in [word_lvl_intxt_table, word_lvl_intxt_mod_table]:
+    for obj_id in table:
+        intxt_sources = table[obj_id]['object']['object_sources']
+        if isinstance(intxt_sources, dict):
+            for source_type_id in intxt_sources.keys():
+                for source in intxt_sources[source_type_id]:
+                    sources_dict = {}
+                    sources_dict['obj_id'] = obj_id
+                    sources_dict['source_type_id'] = source_type_id
+                    source_id = source['object_source_ref_object_id']
+                    sources_dict['source_id'] = str(source_id)
+                    source_location = source['object_source_link']
+                    sources_dict['source_location'] = source_location
+                    sources_table.append(sources_dict)
 # else:
 #     sources_dict = {'obj_id': obj_id, 'source_type_id': None, 'source_id': None}
 #     sources_table.append(sources_dict)
```
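The rewritten loop above applies the same source-flattening logic to both the original and the modifier intertext tables by iterating over a list of tables. A condensed sketch of the pattern (the table entries here are illustrative; the real nodegoat objects carry many more fields):

```python
# Illustrative stand-ins for the two nodegoat tables.
word_lvl_intxt_table = {
    "100": {"object": {"object_sources": {"7": [
        {"object_source_ref_object_id": 55, "object_source_link": None}]}}},
}
word_lvl_intxt_mod_table = {
    "200": {"object": {"object_sources": "none"}},  # non-dict entries are skipped
}

sources_table = []
for table in [word_lvl_intxt_table, word_lvl_intxt_mod_table]:
    for obj_id, entry in table.items():
        intxt_sources = entry["object"]["object_sources"]
        if isinstance(intxt_sources, dict):  # only flatten structured sources
            for source_type_id, sources in intxt_sources.items():
                for source in sources:
                    sources_table.append({
                        "obj_id": obj_id,
                        "source_type_id": source_type_id,
                        "source_id": str(source["object_source_ref_object_id"]),
                        "source_location": source["object_source_link"],
                    })
```

Looping over `[table_a, table_b]` keeps one copy of the flattening logic; the alternative of duplicating the loop body per table is what the commit replaces.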
```diff
@@ -470,14 +507,13 @@ def remove_decimal(id_string):
 def build_intxt_dict(intxt_ids):
     for intxt in intxt_ids:
         intxt_id = str(intxt)
-        for row2 in word_lvl_intxt_df[word_lvl_intxt_df.obj_id == intxt_id].iterrows():
+        for i, row2 in word_lvl_intxt_df[word_lvl_intxt_df.obj_id == intxt_id].iterrows():
             row_dict = {}
             if intxt_id in grp_intxts_list:
                 row_dict["intxt_grp_id"] = intxt_grp_id
             else:
                 row_dict["intxt_grp_id"] = None
             row_dict["intxt_id"] = intxt_id
-            row2 = row2[1]
             source_id = row2.source_word_id
             target_id = row2.target_word_id
             if isinstance(row2.match_type_ids, list):
```
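The change from `for row2 in df.iterrows()` to `for i, row2 in df.iterrows()` matters because `DataFrame.iterrows()` yields `(index, Series)` pairs; unpacking in the for-statement removes the need for the old `row2 = row2[1]` workaround. A quick illustration with a toy dataframe:

```python
import pandas as pd

df = pd.DataFrame({"obj_id": ["a", "b"], "val": [1, 2]})

# iterrows() yields (index, Series) tuples, so without unpacking
# each loop item is a tuple, not a row.
index, row = next(df.iterrows())

# Unpacking in the for-statement gives direct attribute access on each row.
ids = []
for i, row in df.iterrows():
    ids.append(row.obj_id)
```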
```diff
@@ -509,18 +545,55 @@ def build_intxt_dict(intxt_ids):
             row_dict["match_type_ids"] = match_type_ids
             intxt_grp_list.append(row_dict)
 
-for row in word_lvl_intxt_grp_df.iterrows():
-    row = row[1]
+for i, row in word_lvl_intxt_grp_df.iterrows():
     intxt_grp_id = row.obj_id
     intxt_ids = row.word_intxt_ids
     build_intxt_dict(intxt_ids)
+
+# do the same for intertexts not included in a group
 build_intxt_dict([intxt for intxt in word_lvl_intxt_df.obj_id if intxt not in grp_intxts_list])
 
 intxt_full_df = pd.DataFrame.from_dict(intxt_grp_list)
 
 with open(scriptdir+"/intxts_full.json", "w") as intxts_full:
     json.dump(intxt_grp_list, intxts_full)
 
+# make a list of full intertexts modified based on potential word substitutions due to alternate readings
+
+intxts_to_modify_df = intxt_full_df.query(f"intxt_id in {word_lvl_intxt_mod_df['wd_lvl_intxt_id'].to_list()}").copy().reset_index(drop=True)
+intxt_full_mod = []
+
+for i, row in intxts_to_modify_df.iterrows():  # take each original full intertext that needs to be modified
+    intxt_mod_subset = word_lvl_intxt_mod_df.query("wd_lvl_intxt_id == @row.intxt_id")  # get possible modifications for the current original intertext
+    for j, row2 in intxt_mod_subset.iterrows():
+        new_intxt_full = copy.deepcopy({key: val for key, val in row.items()})  # new deep copy dictionary of unmodified full intertext
+        new_intxt_full['intxt_id'] = row2['obj_id']
+        for st in ['source','target']:
+            if row[f'{st}_word_id'] == row2['wd_to_replace_id']:
+                new_intxt_full[f'{st}_word_id'] = row2['wd_sub_id']
+                new_word = word_instance_df.query(f"obj_id == '{row2['wd_sub_id']}'").reset_index(drop=True)
+                new_intxt_full[f'{st}_line_num'] = new_word.loc[0, 'line_num']
+                new_workseg = new_word.loc[0, "work_segment_id"]
+                if row[f'{st}_work_seg_id'] != new_workseg:
+                    new_intxt_full[f'{st}_work_seg_id'] = new_workseg
+                    new_work = work_seg_df.query("obj_id == @new_workseg").reset_index(drop=True).loc[0, "work_id"]
+                    new_intxt_full[f'{st}_work_id'] = new_work
+                    new_author = work_df.query("obj_id == @new_work").reset_index(drop=True).loc[0, "author_id"]
+                    new_intxt_full[f'{st}_author_id'] = new_author
+        for id in row2.match_type_remove_ids:
+            new_intxt_full['match_type_ids'].remove(id)
+        for id in row2.match_type_add_ids:
+            new_intxt_full['match_type_ids'].append(id)
+        new_intxt_full['original_id'] = row.intxt_id
+        new_intxt_full['original_grp_id'] = row.intxt_grp_id
+        if new_intxt_full['source_work_seg_id'] != row['source_work_seg_id'] or new_intxt_full['target_word_id'] != row['target_word_id'] and new_intxt_full['target_work_seg_id'] != row['target_work_seg_id']:
+            new_intxt_full['intxt_grp_id'] = None
+
+        intxt_full_mod.append(new_intxt_full)
+
+with open(scriptdir+"/intxts_full_modified.json", "w") as intxts_full_mod_file:
+    json.dump(intxt_full_mod, intxts_full_mod_file)
 
 ######### CREATE AND EXPORT NETWORK ####################
```
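The alternate-reading block above deep-copies each affected intertext record and swaps in the substituted word and match types. A condensed sketch of that substitution step, using the field names from the diff but entirely illustrative data (the real code also re-derives line number, work segment, work, and author from the dataframes):

```python
import copy

# Illustrative original intertext record and its modifier.
original = {"intxt_id": "i1", "source_word_id": "w1", "target_word_id": "w2",
            "match_type_ids": ["m1", "m2"]}
modifier = {"obj_id": "i1-mod", "wd_to_replace_id": "w2", "wd_sub_id": "w9",
            "match_type_remove_ids": ["m2"], "match_type_add_ids": ["m3"]}

# Deep copy so mutable fields like match_type_ids are not shared with the original.
modified = copy.deepcopy(original)
modified["intxt_id"] = modifier["obj_id"]

# Swap the substituted word in on whichever side it appears.
for st in ["source", "target"]:
    if modified[f"{st}_word_id"] == modifier["wd_to_replace_id"]:
        modified[f"{st}_word_id"] = modifier["wd_sub_id"]

# Adjust the match types, then record where the modified record came from.
for mt in modifier["match_type_remove_ids"]:
    modified["match_type_ids"].remove(mt)
for mt in modifier["match_type_add_ids"]:
    modified["match_type_ids"].append(mt)
modified["original_id"] = original["intxt_id"]
```

The deep copy is the important design choice: a shallow copy would share the `match_type_ids` list, so the `remove`/`append` calls would silently corrupt the unmodified record.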
src/data/nodegoat_tables.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

src/data/objects_json_backup.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

src/data/sankey_data.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.
