Snigda0402/Factors-shaping-Data-Science

Factors shaping Data Science

Objective

The project aims to understand the underlying reasons for the success and failure of data science initiatives by analyzing a collection of news articles related to Data Science, Machine Learning, and Artificial Intelligence.

Skills/Tools Used:

  1. Python programming language
  2. Natural Language Processing
  3. Text cleaning
  4. Named Entity Recognition
  5. Topic Modelling
  6. Sentiment Analysis

Project Overview

1. Data Collection and Preprocessing:

  • A dataset of news articles on Data Science, Machine Learning, and AI was provided by my university (University of Chicago).
  • Noise Cleaning:
    • Lowercasing
    • Removed HTML tags, URLs and web crawl remnants
    • Removed punctuation and digits
    • Removed symbols and non-printable characters
    • Removed newlines, tabs and extra white spaces
  • Pre-processing:
    • Removed stopwords
    • Lemmatization using WordNetLemmatizer()
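
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the function names, the tiny `STOPWORDS` set, and the regular expressions are all assumptions, and lemmatization is left as a pluggable callable so the sketch runs without NLTK.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to", "is", "are", "for"}


def clean_text(raw):
    """Apply the noise-cleaning steps in the order listed above."""
    text = raw.lower()                                  # lowercasing
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"(https?://|www\.)\S+", " ", text)   # URLs
    # Punctuation, digits, symbols and non-printable characters.
    # Assumption: English text, so everything outside a-z is dropped.
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()            # newlines, tabs, extra spaces


def preprocess(raw, lemmatize=lambda token: token):
    """Clean, tokenize on whitespace, drop stopwords, then lemmatize each token."""
    tokens = clean_text(raw).split()
    return [lemmatize(t) for t in tokens if t not in STOPWORDS]
```

If NLTK and its WordNet data are installed, `WordNetLemmatizer().lemmatize` could be passed as the `lemmatize` argument; the default simply leaves tokens unchanged.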

2. Topic Detection:

  • Use topic modeling techniques to categorize articles into major themes or topics.
  • Assign each article to the appropriate topic for analysis.
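
As a rough sketch of the assignment step: once a topic model (for example LDA) has produced a weight per topic for each article, every article can be mapped to its highest-weighted topic. The README does not say which model was used, so the input format and the function names here are assumptions.

```python
from collections import defaultdict


def assign_topics(doc_topic_weights):
    """Map each article id to the index of its highest-weighted topic.

    `doc_topic_weights` is assumed to look like {article_id: [w0, w1, ...]}.
    Ties go to the lowest topic index.
    """
    return {
        doc_id: max(range(len(weights)), key=weights.__getitem__)
        for doc_id, weights in doc_topic_weights.items()
    }


def group_by_topic(assignments):
    """Invert the assignment so each topic lists its articles."""
    groups = defaultdict(list)
    for doc_id, topic in assignments.items():
        groups[topic].append(doc_id)
    return dict(groups)
```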

3. Sentiment Analysis:

  • Perform sentiment analysis to determine the sentiment (positive or negative) expressed in the articles.
  • Customize sentiment analysis to fit the context of data science initiatives.
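
One simple way to tailor sentiment to this domain is a small lexicon of data-science terms. The sketch below is only an illustration of that idea, not necessarily the approach used in this project; the words and weights in `DOMAIN_LEXICON` are made up.

```python
# Hypothetical domain lexicon: word -> polarity weight. Values are illustrative only.
DOMAIN_LEXICON = {
    "breakthrough": 1.0, "success": 1.0, "accurate": 0.5, "adoption": 0.5,
    "failure": -1.0, "failed": -1.0, "abandoned": -1.0,
    "bias": -0.5, "overfitting": -0.5,
}


def sentiment_score(tokens, lexicon=DOMAIN_LEXICON):
    """Sum the polarity of known tokens; unknown tokens contribute 0."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def sentiment_label(tokens, lexicon=DOMAIN_LEXICON):
    """Binary label matching the positive/negative split above.

    Assumption: a score of exactly 0 is treated as positive.
    """
    return "positive" if sentiment_score(tokens, lexicon) >= 0 else "negative"
```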

4. Reasons for Failure:

  • Identify articles with negative sentiment discussing failures in data science projects.
  • Extract reasons for these failures, such as technology issues, data challenges, or project management problems.
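
A keyword lookup is one lightweight way to bucket failure reasons into the categories named above. The sketch assumes tokenized article text; the keyword sets are hypothetical and chosen only to mirror those three examples.

```python
# Hypothetical category keywords, for illustration only.
REASON_KEYWORDS = {
    "technology": {"infrastructure", "scalability", "tooling", "deployment"},
    "data": {"quality", "missing", "bias", "labels"},
    "project management": {"budget", "deadline", "stakeholder", "scope"},
}


def tag_reasons(tokens, keywords=REASON_KEYWORDS):
    """Return the sorted categories that have at least one keyword in `tokens`."""
    present = set(tokens)
    return sorted(cat for cat, words in keywords.items() if present & words)
```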

5. Reasons for Success:

  • Identify articles with positive sentiment discussing achievements in data science projects.
  • Extract reasons for these achievements.

6. Sentiment Over Time Analysis:

  • Create a timeline to visualize how sentiment changes over different time periods.
  • Investigate whether sentiment patterns align with specific events or technological advancements.
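
For the timeline, per-article scores can be averaged per month before plotting. A minimal sketch, assuming each article carries a publication date and a numeric sentiment score (the record format and the function name are assumptions):

```python
from collections import defaultdict
from datetime import date


def monthly_sentiment(records):
    """Average sentiment per calendar month.

    `records` is an iterable of (datetime.date, score) pairs.
    Returns {"YYYY-MM": mean score}, ordered by month.
    """
    buckets = defaultdict(list)
    for day, score in records:
        buckets[day.strftime("%Y-%m")].append(score)
    return {month: sum(v) / len(v) for month, v in sorted(buckets.items())}
```

The resulting mapping can be passed to any plotting library to draw the timeline.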

7. Entity Identification:

  • Use Named Entity Recognition to identify organizations, people, and locations mentioned in the articles.
  • Compile a list of these entities for further analysis.
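
Compiling the entity list can be as simple as counting (text, label) pairs. The sketch below assumes the NER step yields such pairs, for example from spaCy's `ent.text` and `ent.label_`, and that the labels `ORG`, `PERSON` and `GPE` stand for organizations, people and locations; the function name is hypothetical.

```python
from collections import Counter

# Assumed label names (spaCy-style) for organizations, people and locations.
KEEP_LABELS = {"ORG", "PERSON", "GPE"}


def compile_entities(tagged_entities, keep=KEEP_LABELS):
    """Count entities with a label of interest, most frequent first."""
    counts = Counter(
        (text.strip(), label) for text, label in tagged_entities if label in keep
    )
    return counts.most_common()
```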

8. Targeted Sentiment Analysis:

  • Analyze the sentiment associated with specific entities mentioned in the articles.
  • Determine how organizations and people are portrayed in the context of data science projects.
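
A simple approximation of targeted sentiment is to score only the sentences that mention the entity and average the result. This sketch assumes a caller-supplied `score_fn` (any sentence-level scorer) and uses a naive regex sentence splitter; both are illustrative choices, not the project's confirmed method.

```python
import re


def targeted_sentiment(text, entity, score_fn):
    """Mean score of the sentences that mention `entity`, or None if none do."""
    # Naive split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = [s for s in sentences if entity.lower() in s.lower()]
    if not hits:
        return None
    return sum(score_fn(s) for s in hits) / len(hits)
```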

9. Insights and Recommendations:

  • Analyze the reasons for failure and success to extract insights.
  • Develop actionable recommendations to enhance the success rates of data science initiatives.
