Main Workflow for creating data models

Contents:

Scrapy Workflow (NEWEST)

-- Scrapy Workflow: the newest way to collect data, using the Scrapy framework. After parsing the data, proceed to the MainWorkflow step "Processing collected data".

!!!IMPORTANT!!! Store all files in the !WORKFLOW directory. Run all scripts from the MyNamesEnvironment directory:

iMac-Anton:MyNamesEnvironment antonnovoselov$ pwd
/Users/antonnovoselov/Documents/Development/Nickname generator wrap/MyNamesEnvironment

iMac-Anton:MyNamesEnvironment antonnovoselov$ python 07.\ Collect\ images/getImages.py

Collecting source data

  1. Collect ENG names data using parser. Collect name, description, gender, URL.
  • Use:
    • Datacol parser
    • Last working examples:
      • lotr4.par (DEPRECATED)
      • lotr4-eng.par
  • Configure parser:
    • Add name, description, and gender to the collected fields.
    • Datacol collects the URL automatically.
  • Result: Spreadsheet with ENG data. Name it sourcetableStage1.xlsx
  • Rename the sheet with the data to 'sheet1'
  • Put sourcetableStage1.xlsx in the !WORKFLOW directory

Collecting RUS data

  1. For every ENG URL, collect corresponding RUS URL if it can be found.
  • Use:
    • get_links_from_web.py
  • Configure script:
    • cell_start_number
    • cell_end_number
  • Source: sourcetableStage1.xlsx
  • Result: resulttableStage1.xlsx (with added RUS URLs to column 'E')
  • Delete sourcetableStage1.xlsx
  2. For every RUS URL, collect the corresponding RUS data (name, description, URL)
  • Use:
    • Datacol parser
  • Last working examples:
    • lotr eng-rus from URLs.par
  • Source: Copy column E contents from resulttableStage1.xlsx
  • Result: Spreadsheet with RUS data. It is saved under the name 'lotr eng-rus from URLs' in the MyDocuments directory
  3. Add RUS data to sourcesheet
  • In resulttableStage1.xlsx create new sheet sourcesheet
  • Copy contents of lotr eng-rus from URLs to sourcesheet of the resulttableStage1.xlsx
  • Use:
    • proceed_ruslinks.py
  • Configure script:
    • sheet1 sheet:
      • cell_start_number
      • cell_end_number
    • sourcesheet sheet:
      • cell_source_start_number
      • cell_source_end_number
  • Source: resulttableStage1.xlsx
  • Result: resulttableStage2.xlsx (with added RUS name to column 'G' and Rus bio to column 'H')
  • Drag column E to column H (replace). F - rus name, G - rus bio, H - rus url
  • Delete resulttableStage1.xlsx
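
The merge in step 3 can be pictured as a lookup by RUS URL. The sketch below is not the code of proceed_ruslinks.py. It works on plain dictionaries instead of spreadsheet cells, and it assumes that rows are matched on the RUS URL. The function name, the dictionary keys, and the sample rows are hypothetical and serve only as an illustration.

```python
def merge_rus_data(eng_rows, rus_rows):
    """Attach the RUS name and bio to each ENG row, matching on the RUS URL."""
    # Index the RUS rows by their URL for constant-time lookup.
    rus_by_url = {row["url"]: row for row in rus_rows}
    merged = []
    for eng in eng_rows:
        rus = rus_by_url.get(eng.get("rus_url"))
        merged.append({
            **eng,
            # Leave the fields empty when no RUS page was found.
            "rus_name": rus["name"] if rus else "",
            "rus_bio": rus["description"] if rus else "",
        })
    return merged


# Hypothetical sample rows, for illustration only.
eng_rows = [
    {"name": "Frodo", "url": "https://example.com/en/frodo",
     "rus_url": "https://example.com/ru/frodo"},
    {"name": "Sam", "url": "https://example.com/en/sam", "rus_url": None},
]
rus_rows = [
    {"name": "Фродо", "description": "Хоббит",
     "url": "https://example.com/ru/frodo"},
]
merged = merge_rus_data(eng_rows, rus_rows)
```

ENG rows without a RUS URL keep empty RUS fields, so the row count and order of the ENG sheet stay unchanged.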

Processing collected data

  1. Transfer data from resulttableStage2.xlsx to TemplateTable.xlsx:
  • Correct column 'C': specify the correct gender. If there is a race, concatenate race + gender. For example:
    • HobbitMasc - if it's hobbits race
    • Masc - no race
  • Correct cell 'N3'. Use format: 'category ID'.'gender ID.'. For example:
    • 02.02.0. - Fiction.Tolkien.Masc.
  • Correct cell 'O3'. Specify '.race ID'. If there's no race, delete cell content. For example:
    • .03
  • Enumerate column 'A' according to names list count
  • Delete resulttableStage2.xlsx
  2. Use the script to fill the imageName column in TemplateTable.xlsx:
  • Use:
    • workbookDiacriticRemover.py
  • Configure:
    • cell_start_number
    • cell_end_number
  • Source: TemplateTable.xlsx
  • Result: DoneTable.xlsx (with imageName filled to column 'G' for every name).
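
As an illustration of the diacritic removal in step 2, here is a minimal sketch using only the standard library. It is not the code of workbookDiacriticRemover.py, and it does not show how that script builds the final imageName. The function name remove_diacritics is hypothetical.

```python
import unicodedata


def remove_diacritics(text):
    """Return text with combining accent marks stripped from its letters."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_diacritics("Éowyn"))
print(remove_diacritics("Lúthien Tinúviel"))
```

Letters that have no Unicode decomposition, such as 'ø', pass through unchanged and would need an explicit replacement table.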

Collecting images for names

  1. Collect images for names using script
  • Use:
    • getImagesFromSRCLinks.py
  • Configure:
    • cell_start_number
    • cell_end_number
    • macos = True/False (for the Selenium version of the script)
    • dirPath - the path where images downloaded from the URLs are saved
  • Source: DoneTable.xlsx
  • Result: images for the names are downloaded and saved to dirPath under the correct image names.
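
The download step can be sketched with the standard library alone. This is not the code of getImagesFromSRCLinks.py (which also has a Selenium version). The function names, the '.jpg' fallback extension, and the example URL are assumptions made for illustration.

```python
import os
import urllib.request
from urllib.parse import urlparse


def target_path(dir_path, image_name, src_url):
    """Build the local file path for an image, reusing the URL's extension."""
    # Take the extension from the URL path only, ignoring any query string.
    ext = os.path.splitext(urlparse(src_url).path)[1]
    # Assumption: fall back to '.jpg' when the URL has no extension.
    return os.path.join(dir_path, image_name + (ext or ".jpg"))


def download_image(src_url, image_name, dir_path):
    """Download one image into dir_path under the given image name."""
    os.makedirs(dir_path, exist_ok=True)
    path = target_path(dir_path, image_name, src_url)
    urllib.request.urlretrieve(src_url, path)
    return path
```

A call such as download_image("https://example.com/pics/eowyn.png", "Eowyn", dirPath) would save the file as Eowyn.png inside dirPath.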

Transfer data from DoneTable.xlsx to Xcode Plists, and upload images to Firebase Storage

  1. Copy column 'H' and column 'M' contents of DoneTable.xlsx to Xcode project as plists.
  • In Xcode create 2 plists, named as 'CategoryAliasGender.plist' or 'CategoryAliasGenderRace.plist'
  • Localize created plists - enable Eng and Rus localizations.
  • Copy the column contents to the standard macOS Notes app.
  • Then copy from Notes to Xcode. This removes unnecessary quote characters.
  • 'H' - ENG plist. 'M' - RUS plist
  2. Upload the names images using the simulator working directory.
  • Pay attention to the ANViewController uploadUsingFileManager() method. Both parameters are configured automatically when the category, gender and race are selected:
    • pathName. It is used as the directory name in Firebase Storage
    • checkingPrefix. It prevents uploading images from another category, race or gender
  • Copy the images from dirPath to the !ToUpload/ directory of the simulator working directory, then press the Upload button.
  3. Move DoneTable.xlsx to the NamesDB storage directory. Rename the file using the template:
  • 'AreaCategoryGenderRace.xlsx' - if there's a race
  • 'AreaCategoryGender.xlsx' - no race
  4. Move the parsed images from dirPath to the NamesImages storage directory.
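
For the first step of this section (copying columns 'H' and 'M' into plists), a scripted alternative to the Notes round trip might look like the sketch below. It assumes that each plist is a flat array of strings, which the workflow does not state, so check it against an existing plist in the Xcode project first. The function name is hypothetical.

```python
import plistlib


def names_to_plist_bytes(names):
    """Serialize a list of names as an XML plist containing an array of strings."""
    # Drop empty cells and strip the stray quotes that spreadsheet copies add.
    cleaned = [n.strip().strip('"') for n in names if n.strip()]
    return plistlib.dumps(cleaned)


# Hypothetical column contents, for illustration only.
data = names_to_plist_bytes(['"Frodo"', " Sam ", ""])
print(data.decode("utf-8"))
```

Writing the returned bytes to a file opened in binary mode produces a plist that can be added to the Xcode project.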

Other scripts

-- OtherScripts

Alternative workflow (not so clean and correct)

-- ImagesFromHeap (DEPRECATED)