MapReduce Sorting Project

This project implements a distributed integer values sorting system using a MapReduce-like paradigm, with a master-worker architecture and gRPC for communication.

Overview

Master:
- Reads a configuration file (config.yaml) which includes the list of worker addresses and the number of mappers/reducers.
- Assigns mapper and reducer roles to workers.
- Assigns integer ranges to reducers and notifies them to mappers.
- Distributes chunks of input data to the mappers.
Workers:
- Mappers receive chunks of unsorted integers, sort the data and split it into sub-chunks, basing division on reducers' assigned interval ranges, send the data to reducers, and then notify reducers when done.
- Reducers wait for all mappers to finish sending their data, then sort the collected data and write the output to a local file.

Project Structure

.
├── main.go
├── go.mod
├── master
│   └── master.go
├── worker
│   └── worker.go
├── proto
│   ├── mapreduce.proto
│   ├── mapreduce.pb.go
│   └── mapreduce_grpc.pb.go
├── generate_random_input.sh
├── config.yaml
└── input

Configuration File

config.yaml should list the workers and specify how many of them are mappers. For example:

workers:
  - "localhost:50051"
  - "localhost:50052"
  - "localhost:50053"
  - "localhost:50054"
  - "localhost:50055"
  - "localhost:50056"
  - "localhost:50057"
  - "localhost:50058"
mappers: 4

Input File

The input file should contain one integer per line, for example:

Running the System

Start the Workers

Open multiple terminals, one for each worker defined in config.yaml:
```
./mapreduce --mode=worker --port=:50051
./mapreduce --mode=worker --port=:50052
./mapreduce --mode=worker --port=:50053
./mapreduce --mode=worker --port=:50054
```
Each worker will print a message indicating which port its listening on and will wait for role assignment.

Run the Master

In a separate terminal:

./mapreduce --mode=master --config=config.yaml --input=input

Processing Steps

The master:
- Reads the config and input file.
- Computes data ranges for the reducers.
- Assigns mappers and reducers roles, while advertising reducer ranges to mappers, and mappers total count to reducers.
- Distributes input data chunks to the mappers.
- Once done, the master exits.
The mappers:
- Receive input data chunks.
- Sort the data.
- Send sub-chunks to reducers based on the reducers’ assigned intervals.
- Notify every reducer when they've done.
The reducers:
- Wait for all mappers to finish sending data.
- Merge the received sub-chunks.
- Sort the merged data.
- Write data to output files.

Output Files

Each reducer produces its own sorted output file, marking it with its port number. For example:

reducer__XXXXX_output.txt

These files contain the sorted integers that the reducer processed.

Cleanup

To stop the workers, press Ctrl+C in their respective terminals.

Notes

The reducers do not produce a single merged file, each reducer’s output file contains a portion of the input data.
Adjust config.yaml and input file as necessary for your use case.
There is a generate_random_input.sh script that can be used to generate a large input file with one million random integers.
Ensure all workers are running before starting the master.
For subsequent runs, the workers can remain running, but the master must be restarted each time.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MapReduce Sorting Project

Overview

Project Structure

Configuration File

Input File

Running the System

Output Files

Cleanup

Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.idea		.idea
master		master
proto		proto
worker		worker
README.md		README.md
config.yaml		config.yaml
generate_random_input.sh		generate_random_input.sh
go.mod		go.mod
go.sum		go.sum
input		input
main.go		main.go

Folders and files

Latest commit

History

Repository files navigation

MapReduce Sorting Project

Overview

Project Structure

Configuration File

Input File

Running the System

Output Files

Cleanup

Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages