The goal of this master's program is to teach the whole Data Science cycle: from raw data to building dashboards, by way of machine learning and the application of statistical methods to common Data Science problems.
By the end of the master's degree, students will be able to turn data into products and services using the most common statistical methods in Data Science, write their own code to analyze large amounts of data, and apply machine learning techniques to that data using Spark.
- Git / GitHub
- Shell Script
- SQL databases / PostgreSQL
- Python / IPython
- Jupyter Notebook
- R / RStudio
- RMarkdown
- Shiny
- Version control with Git and GitHub.
- Handling files and cleaning data in the shell.
- SQL queries with PostgreSQL.
- Data wrangling with Python (pandas, NumPy and matplotlib) and R (dplyr, data.table and ggplot2).
- Reporting with Jupyter Notebook and RMarkdown.
- Applications for Data Visualization using Shiny.
- Web scraping with the BeautifulSoup and requests libraries in Python, and rvest and tidytext in R.
- Advanced statistical methods:
  - Sampling and Hypothesis Tests.
  - Statistical Modeling.
  - Generalized Linear Models.
  - Bayesian Statistics.
  - Ridge and Lasso Regularization methods.
  - PCA Dimensionality Reduction and Singular Value Decomposition.
  - Matrix Factorization and Singular Value Decomposition for Recommender Systems.
- Machine Learning models:
  - Linear and Logistic Regression.
  - Support Vector Machines.
  - K-nearest neighbors.
  - Decision Trees.
  - Ensemble models: Bagging and Boosting Trees.
  - Introduction to Deep Learning.
- Big Data technologies:
  - Introduction to distributed systems. MapReduce.
  - PySpark SQL: RDDs, DataFrames and SQL querying.