Skip to content

ejlchappin/literature-analysis

Repository files navigation

Table of Contents

Scripts to analyse a the network in a body of literature (linux)

Step 1. Search in Scopus

  • Use the document search to query for a number of search terms in titles, abstracts and keywords, where the search terms are separated with AND.
  • Select all documents found and use the export function. Select the complete format and export to a csv file.

Step 2. Use the scripts to reformat the resulting csv file to generate an overview of authors, papers and keywords.

  • Use generateAuthorOverview.sh to give an overview of the authors. The resulting .edges file contains an overview of the links between authors.
  • Use generateDocumentOverview.sh to give an overview of the documents. The resulting .edges file contains an overview of the citation links between documents.
  • Use generateKeywordOverview.sh to give an overview of the keywords listed. The resulting .edges file contains an overview of the keywords that are mentioned together. The keywords-count.txt files gives an ordered count of the keywords.

Step 3. Harmonise the result in Google Refine

  • The result may need to be improved to resolve unmatched duplicates with Google Refine.
  • Especially authors and documents need to be checked, because the same author and documents have different identifiers (for example ‘Nelson, R.’ and ‘Nelson, R.R.’).
  • Use the Cluster and edit function to find similar values and determine which should be duplicates. There are various clustering algorithms implemented.
  • Also use this function to combine various editions of the same publication into one document identifier.
  • Change everything to lower case. Remove malformatted references.
  • Use the Facet by Blank function to deselect empty cells.
  • Export the result as a tab-separated file.

Step 4. Visualise and explore literature network with Gephi.

  • All three lists are imported in Gephi to study the network.
  • Check by hand for duplicate nodes and use the Merge nodes function.
  • Remove erroneous nodes (such as commas only, or ‘from china’).
  • Use the data explorer function to calculate general statistics of the networks and get an overview of the mostly cited papers.
  • The network may be visualized best by using the ForceAtlas layout, by making the size of the nodes based on the degree.
  • Export the resulting networks.

About

Scripts for the literature network of authors, papers and keywords, using the result from a Scopus search

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages