Juanan4290/masterDataScience

Data Science Curriculum

Juan Antonio Morales - Data Scientist in the making

Summary and objectives:

The goal of this master's degree is to learn the whole Data Science cycle: from raw data to building dashboards, through machine learning and the application of statistical methods to common Data Science problems.

By the end of the master's degree, the student will be able to turn data into products and services using the most common statistical methods in the field of Data Science, write their own code to analyze large amounts of data, and apply machine learning techniques to that data using Spark.

Technologies and Programming Languages

  • Git / GitHub
  • Shell Script
  • SQL databases / PostgreSQL
  • Python / IPython
  • Jupyter Notebook
  • R / RStudio
  • RMarkdown
  • Shiny

Topics covered:

  • Version control with Git and GitHub.
  • Handling files and cleaning data on the Shell.
  • SQL queries with PostgreSQL.
  • Data wrangling with Python (pandas, NumPy and matplotlib) and R (dplyr, data.table and ggplot2).
  • Jupyter Notebook and RMarkdown reporting.
  • Applications for Data Visualization using Shiny.
  • Web Scraping with the BeautifulSoup and requests libraries in Python, and rvest and tidytext in R.
  • Advanced Statistical methods:
    · Sampling and Hypothesis Tests.
    · Statistical Modeling.
    · Generalized Linear Models.
    · Bayesian Statistics.
    · Ridge and Lasso Regularization methods.
    · PCA Dimensionality Reduction and Singular Value Decomposition.
    · Matrix Factorization and Singular Value Decomposition for Recommender Systems.
    
  • Machine Learning models:
    · Linear and Logistic Regression.
    · Support Vector Machines.
    · K nearest neighbors.
    · Decision Trees.
    · Ensemble models: Bagging and Boosting Trees.
    · Introduction to Deep Learning.
    
  • Big Data technologies:
    · Introduction to distributed systems. MapReduce.
    · PySpark SQL: RDDs, DataFrames and SQL querying.
    
