Open-sourced, detailed and reproducible solutions to data engineering problems provided at MIPT.
The repository currently contains solutions to problems on the following topics:
- Writing a custom MapReduce pipeline in Python for an HDFS environment (./hdfs_mapreduce)
- Deploying an NGINX server on Alpine that serves a static HTML page using Docker (./nginx_docker)
- Deploying a Python app with Docker and connecting it to a Postgres database with Alembic migrations (./postgres_docker)
- Deploying a Svelte app on a Node.js backend using a multi-stage build within a Dockerfile (./svelte_docker)
- Writing custom Spark pipelines for graph searches and dataframe manipulations in an HDFS environment (./spark)