- Scrapy Workflow
- Collecting source data
- Collecting RUS data
- Processing collected data
- Collecting images for names
- Upload images to Firebase Storage
- Other scripts
- Alternative workflow
-- Scrapy Workflow
This is the newest way to collect data; it uses the Scrapy framework. After parsing the data, proceed to the MainWorkflow step 'Processing collected data'.
!!!IMPORTANT!!! Store all files in the !WORKFLOW directory. Run all scripts from the MyNamesEnvironment directory:
iMac-Anton:MyNamesEnvironment antonnovoselov$ pwd
/Users/antonnovoselov/Documents/Development/Nickname generator wrap/MyNamesEnvironment
iMac-Anton:MyNamesEnvironment antonnovoselov$ python 07.\ Collect\ images/getImages.py
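Because every script is expected to be launched from MyNamesEnvironment, a small guard at the top of a script could catch a wrong working directory early. This is only a minimal sketch; the helper name `in_environment_dir` is hypothetical and is not part of the existing scripts.

```python
from pathlib import Path

ENV_DIR_NAME = "MyNamesEnvironment"  # directory name taken from the session above


def in_environment_dir(cwd=None):
    """Return True if cwd (default: the current directory) is MyNamesEnvironment."""
    path = Path(cwd) if cwd is not None else Path.cwd()
    return path.name == ENV_DIR_NAME


# Possible use at the top of a script:
#     if not in_environment_dir():
#         raise SystemExit("Run this script from the MyNamesEnvironment directory")
```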
- Collect ENG names data using parser. Collect name, description, gender, URL.
- Use:
- Datacol parser
- Last working examples:
- lotr4.par (DEPRECATED)
- lotr4-eng.par
- Configure parser:
- Add name, description, and gender to the collected fields.
- Datacol collects the URL automatically.
- Result: a spreadsheet with ENG data. Name it sourcetableStage1.xlsx
- Rename the sheet that holds the data to 'sheet1'
- Put sourcetableStage1.xlsx in the !WORKFLOW directory
- For every ENG URL, collect corresponding RUS URL if it can be found.
- Use:
- get_links_from_web.py
- Configure script:
- cell_start_number
- cell_end_number
- Source: sourcetableStage1.xlsx
- Result: resulttableStage1.xlsx (with RUS URLs added to column 'E')
- Delete sourcetableStage1.xlsx
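The `cell_start_number` and `cell_end_number` settings select which rows a script processes. Below is a minimal sketch of how such a range could map to cell addresses in column 'E'. It assumes that both bounds are inclusive row numbers, which the real script may handle differently; `column_cells` is a hypothetical helper.

```python
def column_cells(column, cell_start_number, cell_end_number):
    """Return spreadsheet addresses such as 'E2' for an inclusive row range."""
    if cell_start_number < 1 or cell_end_number < cell_start_number:
        raise ValueError("expected 1 <= cell_start_number <= cell_end_number")
    return [f"{column}{row}" for row in range(cell_start_number, cell_end_number + 1)]


# Rows 2 to 5 of column 'E', the column that receives the RUS URLs
print(column_cells("E", 2, 5))
```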
- For every RUS URL, collect corresponding RUS data (name, description, URL)
- Use:
- Datacol parser
- Last working examples:
- lotr eng-rus from URLs.par
- Source: the contents of column E, copied from resulttableStage1.xlsx
- Result: a spreadsheet with RUS data. It is saved under the name 'lotr eng-rus from URLs' in the MyDocuments directory
- Add RUS data to sourcesheet
- In resulttableStage1.xlsx, create a new sheet named sourcesheet
- Copy the contents of 'lotr eng-rus from URLs' to the sourcesheet sheet of resulttableStage1.xlsx
- Use:
- proceed_ruslinks.py
- Configure script:
- sheet1 sheet:
- cell_start_number
- cell_end_number
- sourcesheet sheet:
- cell_source_start_number
- cell_source_end_number
- Source: resulttableStage1.xlsx
- Result: resulttableStage2.xlsx (with RUS name added to column 'G' and RUS bio added to column 'H')
- Drag column E onto column H (replace). Resulting columns: F - RUS name, G - RUS bio, H - RUS URL
- Delete resulttableStage1.xlsx
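The merge step can be pictured as a lookup from RUS URL to RUS data. The sketch below is not the code of proceed_ruslinks.py. It assumes that rows are matched on the RUS URL, and it uses plain dictionaries with hypothetical keys (`rus_url`, `url`, `name`, `description`) in place of spreadsheet cells; the example.org URLs are placeholders.

```python
def merge_rus_data(sheet1_rows, sourcesheet_rows):
    """Attach RUS name and bio to each sheet1 row whose RUS URL appears in sourcesheet."""
    # Index the parsed RUS rows by URL so each lookup is a single dict access
    by_url = {row["url"]: row for row in sourcesheet_rows}
    merged = []
    for row in sheet1_rows:
        rus = by_url.get(row.get("rus_url"))
        merged.append({
            **row,
            "rus_name": rus["name"] if rus else None,
            "rus_bio": rus["description"] if rus else None,
        })
    return merged


sheet1 = [
    {"name": "Frodo", "rus_url": "https://example.org/ru/frodo"},
    {"name": "Unmatched", "rus_url": None},  # no RUS page was found
]
sourcesheet = [
    {"url": "https://example.org/ru/frodo", "name": "Фродо", "description": "Хоббит"},
]
result = merge_rus_data(sheet1, sourcesheet)
```

Rows without a matching RUS URL keep empty (`None`) RUS fields, so they can be reviewed by hand afterwards.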
- Transfer data from resulttableStage2.xlsx to TemplateTable.xlsx:
- Correct column 'C': specify the correct gender. If there is a race, concatenate race + gender. For example:
- HobbitMasc - for the hobbit race
- Masc - no race
- Correct cell 'N3'. Use the format 'category ID'.'gender ID'. (with a trailing dot). For example:
- 02.02.0. - Fiction.Tolkien.Masc.
- Correct cell 'O3'. Specify '.race ID'. If there is no race, clear the cell. For example:
- .03
- Number column 'A' sequentially to match the count of names in the list
- Delete resulttableStage2.xlsx
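The rules for column 'C' and for cells 'N3' and 'O3' can be expressed as small helpers. This is a sketch under two assumptions: IDs are already zero-padded strings, and 'category ID' covers everything before the gender ID (for example `02.02`). The function names are hypothetical.

```python
def gender_label(gender, race=None):
    """Column 'C': race + gender when a race exists, otherwise the gender alone."""
    return f"{race}{gender}" if race else gender


def n3_value(category_id, gender_id):
    """Cell 'N3': category ID, a dot, the gender ID, and a trailing dot."""
    return f"{category_id}.{gender_id}."


def o3_value(race_id=None):
    """Cell 'O3': a dot followed by the race ID, or an empty string without a race."""
    return f".{race_id}" if race_id else ""


print(gender_label("Masc", race="Hobbit"))  # HobbitMasc
print(n3_value("02.02", "0"))               # 02.02.0.
print(o3_value("03"))                       # .03
```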
- Use a script to fill the imageName column in the names table TemplateTable.xlsx:
- Use:
- workbookDiacriticRemover.py
- Configure:
- cell_start_number
- cell_end_number
- Source: TemplateTable.xlsx
- Result: DoneTable.xlsx (with imageName filled in column 'G' for every name).
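As the script name suggests, image names are derived from names with their diacritics removed. A minimal sketch of diacritic removal with the standard library follows. How workbookDiacriticRemover.py builds the rest of imageName is not shown here, and `remove_diacritics` is a hypothetical name.

```python
import unicodedata


def remove_diacritics(text):
    """Strip combining marks, for example turning 'Éowyn' into 'Eowyn'."""
    # NFD splits an accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_diacritics("Éowyn"))
print(remove_diacritics("Lúthien Tinúviel"))
```

This approach is meant for Latin-script names. Cyrillic letters such as 'й' and 'ё' also decompose under NFD, so it should not be applied to the RUS columns.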
- Collect images for names using a script
- Use:
- getImagesFromSRCLinks.py
- Configure:
- cell_start_number
- cell_end_number
- macos = True/False (for Selenium version of script)
- dirPath - directory where the images downloaded from the URLs are saved
- Source: DoneTable.xlsx
- Result: name images are downloaded and saved to dirPath under the correct image names.
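The download step amounts to fetching each image URL and writing the bytes to dirPath under the row's image name. The sketch below uses `urllib` rather than Selenium, so it is not the code of getImagesFromSRCLinks.py. The `(image_name, url)` row shape, the `.jpg` fallback extension, and the function names are assumptions.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url):
    """Download a URL and return the response body as bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def save_images(rows, dir_path, fetch=fetch_bytes):
    """Save each (image_name, url) pair into dir_path, keeping the URL's extension."""
    target_dir = Path(dir_path)
    target_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for image_name, url in rows:
        # Assumed fallback: use '.jpg' when the URL path has no extension
        extension = Path(urlparse(url).path).suffix or ".jpg"
        target = target_dir / f"{image_name}{extension}"
        target.write_bytes(fetch(url))
        saved.append(target)
    return saved
```

The `fetch` parameter makes it possible to try the function without network access by passing a stand-in that returns fixed bytes.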
- Copy the contents of column 'H' and column 'M' of DoneTable.xlsx to the Xcode project as plists.
- In Xcode, create 2 plists, named 'CategoryAliasGender.plist' or 'CategoryAliasGenderRace.plist'
- Localize the created plists: enable the Eng and Rus localizations.
- Copy the column contents to the standard macOS Notes app.
- Then copy from Notes to Xcode. This removes unnecessary quote characters.
- 'H' - ENG plist. 'M' - RUS plist
- Upload the name images using the simulator working directory.
- Pay attention to the ANViewController uploadUsingFileManager() method. Both parameters are configured automatically once category, gender, and race are selected:
- pathName. It is used as the directory name in Firebase Storage
- checkingPrefix. It prevents uploading images from another category, race, or gender
- Copy the images from dirPath to the !ToUpload/ directory of the simulator working directory, then press the Upload button.
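The role of checkingPrefix can be illustrated as a simple file-name filter. The real check lives in the app's uploadUsingFileManager() method, not in Python, so the sketch below only shows the idea. It assumes that image file names begin with the category/gender/race code; the file names in the example are hypothetical.

```python
def files_to_upload(file_names, checking_prefix):
    """Keep only the files whose name starts with checking_prefix."""
    return [name for name in file_names if name.startswith(checking_prefix)]


candidates = ["02.02.0.03.Frodo.jpg", "02.02.1.Arwen.jpg", "notes.txt"]
print(files_to_upload(candidates, "02.02.0.03."))
```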
- Move DoneTable.xlsx to the NamesDB storage directory. Rename the file using the template:
- 'AreaCategoryGenderRace.xlsx' - if there's race
- 'AreaCategoryGender.xlsx' - no race
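The renaming template can be captured in one helper. This is a sketch; the function name `done_table_name` and the example values are illustrative.

```python
def done_table_name(area, category, gender, race=None):
    """Build 'AreaCategoryGenderRace.xlsx', or 'AreaCategoryGender.xlsx' without a race."""
    return f"{area}{category}{gender}{race or ''}.xlsx"


print(done_table_name("Fiction", "Tolkien", "Masc", race="Hobbit"))
print(done_table_name("Fiction", "Tolkien", "Masc"))
```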
- Move the parsed images from dirPath to the NamesImages storage directory.
-- OtherScripts
-- ImagesFromHeap (DEPRECATED)