Releases · mtcs/faster

17 Jan 13:38

mtcs

alpha-0.0.4

2269e65

0.0.4 Alpha Pre-release

Pre-release

Change summary:

Some new functions.
Bug Fixes.
New latency test.
Memory Optimizations.
Pagerank and Pagerank-bulk examples fully functional.

Fast Distributed Dataset (FDD) types:

Simple ( char, int, long int, float and double ).
Pointer ( char *, int *, long int *, float * and double * ) (WILL BE DISCONTINUED IN THE FUTURE).
Containers ( std::vector, std::string ).
Indexed FDDs - pair of Key (simple or string) and Data (simple, pointer or container).
Grouped (a group of two or three datasets).

Data Functions

Map - transform a data item into any other type ( 1 to 1 ).
Reduce - reduce all elements into one ( 2 to 1 ).
FlatMap - generate a new set of data ( 1 to n ).
Bulk Map and FlatMap - performance-efficient functions that enable sub-iteration implementations.
MapByKey - transform all items of an indexed dataset that share the same key ( n to 1 ).
FlatMapByKey - export a new set of data from entries grouped by keys.
UpdateByKey - a function to modify a dataset content.

Other Functions

FDD creation from local memory ( through constructor ).
Distributed read from file through constructor - each process reads from a global file offset.
collect - get a local copy of the dataset ( send the distributed data to the driver process ).
coutByKey - just like a histogram ( counts the occurrences of every key and sends them to the driver process ).
groupByKey - Group dataset items by key; items with the same key migrate to a single machine.
printInfo - Prints runtime information of all tasks
printHeader - Prints the header of the runtime information
updateInfo - Prints runtime information for all tasks called since the last updateInfo call (useful for program status updates).
Global variables - Global variables that can be modified by the driver process transparently.

Release Optimizations:

Plugged memory leaks.

Examples:

Pagerank - (with and without bulk) http://en.wikipedia.org/wiki/PageRank
Latency test - Tests framework latency with O(1) functions.

Implementation planned for next releases:

(in order of priority)

Cogroup Optimization.
Load Redistribution/Tune
Additional function arguments - arguments passed along with the custom function pointer, e.g.: myFdd->map(&mapFunc, arg1, arg2);
Distributed directory read - Each process reads a local file from a directory (simulate a DFS)
HDFS support
Fault Tolerance
- Dataset data replication
- Process restart/replacement

P.S.: general progress tracking will move to: https://github.com/mtcs/faster/wiki/Implementation-Progress


23 Oct 14:41

mtcs

alpha-0.0.3

29ee163

0.0.3 Alpha Pre-release

Pre-release

Fast Distributed Dataset (FDD) types:

Simple ( char, int, long int, float and double ).
Pointer ( char *, int *, long int *, float * and double * ) (WILL BE DISCONTINUED IN THE FUTURE).
Containers ( std::vector, std::string ).
Indexed FDDs - pair of Key (simple or string) and Data (simple, pointer or container).
Grouped (a group of two or three datasets).

Data Functions

Map - transform a data item into any other type ( 1 to 1 ).
Reduce - reduce all elements into one ( 2 to 1 ).
FlatMap - generate a new set of data ( 1 to n ).
Bulk Map and FlatMap - performance-efficient functions that enable sub-iteration implementations.
MapByKey - transform all items of an indexed dataset that share the same key ( n to 1 ).

Other Functions

FDD creation from local memory ( through constructor ).
Distributed read from file through constructor - each process reads from a global file offset.
collect - get a local copy of the dataset ( send the distributed data to the driver process ).
coutByKey - just like a histogram ( counts the occurrences of every key and sends them to the driver process ).
groupByKey - Group dataset items by key; items with the same key migrate to a single machine.
printInfo** - Prints runtime information of all tasks
printHeader** - Prints the header of the runtime information
updateInfo** - Prints runtime information for all tasks called since the last updateInfo call (useful for program status updates).

Release Optimizations

ByKey functions no longer need sorted data, lowering overhead.

Implementation planned for next releases:

(in order of priority)

Global variables - Global variables that can be modified by the driver process.
Load Redistribution/Tune
Additional function arguments - arguments passed along with the custom function pointer, e.g.: myFdd->map(&mapFunc, arg1, arg2);
Distributed directory read - Each process reads a local file from a directory (simulate a DFS)
HDFS support
Fault Tolerance
- Dataset data replication
- Process restart/replacement

** new implementation


26 Aug 23:25

mtcs

alpha-0.0.2

3986984

0.0.2 Alpha Pre-release

Pre-release

Fast Distributed Dataset (FDD) types:

Simple ( char, int, long int, float and double ).
Pointer ( char *, int *, long int *, float * and double * ).
Containers ( std::vector, std::string ).
Indexed FDDs - pair of Key (simple or string) and Data (simple, pointer or container).
Grouped** (a group of two or three datasets).

Data Functions

Map - transform a data item into any other type ( 1 to 1 ).
Reduce - reduce all elements into one ( 2 to 1 ).
FlatMap - generate a new set of data ( 1 to n ).
Bulk Map and FlatMap - performance-efficient functions that enable sub-iteration implementations.
MapByKey** - transform all items of an indexed dataset that share the same key ( n to 1 ).

Other Functions

FDD creation from local memory ( through constructor ).
Distributed read from file through constructor - each process reads from a global file offset.
Collect - get a local copy of the dataset ( send the distributed data to the driver process ).
CoutByKey** - just like a histogram ( counts the occurrences of every key and sends them to the driver process ).
GroupByKey** - Group dataset items by key; items with the same key migrate to a single machine.

Implementation planned for next releases:

(in order of priority)

Global variables - Global variables that can be modified by the driver process.
Load Redistribution/Tune
Additional function arguments - arguments passed along with the custom function pointer, e.g.: myFdd->map(&mapFunc, arg1, arg2);
Distributed directory read - Each process reads a local file from a directory (simulate a DFS)
HDFS support
Fault Tolerance
- Dataset data replication
- Process restart/replacement

** new implementation


20 Jul 19:43

mtcs

0.0.1-alpha

238f221

0.0.1 Alpha Pre-release

Pre-release

Fast Distributed Dataset (FDD) types:

Simple ( char, int, long int, float and double )
Pointer ( char *, int *, long int *, float * and double * )
Containers ( std::vector, std::string )
Indexed FDDs - pair of Key (simple or string) and Data (simple, pointer or container)

Data Functions

Map - transform a data item into any other type ( 1 to 1 )
Reduce - reduce all elements into one ( n to 1 )
FlatMap - generate a new set of data ( 1 to n )
Bulk Map and FlatMap - performance-efficient functions that enable sub-iteration implementations

Other Functions

FDD creation from local memory ( through constructor )
Distributed read from file through constructor ( each process reads from a file offset )
Collect ( send the distributed data to the driver process )

Initial implementation:

(implementation started but not finished or tested)

GroupByKey ( send all data with the same index to a specific node )
CoutByKey ( count the occurrences of every key and send them to the driver process )

Implementation planned:

ByKey functions ( mapByKey and reduceByKey )
Global variables ( passed along with the custom function pointer ), e.g.: myFdd->map(&mapFunc, globalVar1, globalVar2);


21 Jun 16:09

mtcs

0.0-alpha

0879a92

0.0 Alpha Pre-release

Pre-release

Basic simple FDD functionality with no block balancing
