Releases: mtcs/faster

0.0.4 Alpha

17 Jan 13:38

0.0.4 Alpha Pre-release

Change summary:

  • Some new functions.
  • Bug Fixes.
  • New latency test.
  • Memory Optimizations.
  • Pagerank and Pagerank-bulk examples fully functional.

Fast Distributed Dataset (FDD) types:

  • Simple ( char, int, long int, float and double ).
  • Pointer ( char *, int *, long int *, float * and double * ) (WILL BE DISCONTINUED IN THE FUTURE).
  • Containers ( std::vector, std::string ).
  • Indexed FDDs - pair of Key (simple or string) and Data (simple, pointer or container).
  • Grouped (a group of two or three datasets).

Data Functions

  • Map - transform a data item into any other type ( 1 to 1 ).
  • Reduce - reduce all elements into one ( 2 to 1 ).
  • FlatMap - generate a new set of data ( 1 to n ).
  • Bulk Map and FlatMap - performance-efficient variants that enable sub-iteration implementations.
  • MapByKey - transform all items of an indexed dataset that share the same key ( n to 1 ).
  • FlatMapByKey - export a new set of data from entries grouped by keys.
  • UpdateByKey - a function to modify a dataset's contents.
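
The arities listed above ( 1 to 1, 2 to 1, 1 to n, n to 1 ) can be illustrated with a minimal sketch in plain C++ on local vectors. This is not the library's API: all function names below are hypothetical, and a real FDD would run these operations on distributed data rather than on a std::vector.

```cpp
#include <cassert>
#include <map>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Map ( 1 to 1 ): each input item yields exactly one output item,
// possibly of a different type (here int -> std::string).
std::vector<std::string> mapToString(const std::vector<int>& in) {
    std::vector<std::string> out;
    out.reserve(in.size());
    for (int x : in) out.push_back(std::to_string(x));
    return out;
}

// Reduce ( 2 to 1 ): a binary function combines two items into one
// and is applied repeatedly until a single item remains.
int reduceSum(const std::vector<int>& in) {
    assert(!in.empty());
    return std::accumulate(in.begin() + 1, in.end(), in.front(),
                           [](int a, int b) { return a + b; });
}

// FlatMap ( 1 to n ): each input item may yield zero or more output items
// (here the value x is emitted x times).
std::vector<int> flatMapRepeat(const std::vector<int>& in) {
    std::vector<int> out;
    for (int x : in)
        for (int i = 0; i < x; ++i) out.push_back(x);
    return out;
}

// MapByKey ( n to 1 ): all items sharing a key are combined into one
// result per key (here their values are summed).
std::map<std::string, int> mapByKeySum(
        const std::vector<std::pair<std::string, int>>& in) {
    std::map<std::string, int> out;
    for (const auto& kv : in) out[kv.first] += kv.second;
    return out;
}
```

The sketch only shows the shape of each user function; how the framework schedules and distributes the calls is outside its scope.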

Other Functions

  • FDD creation from local memory ( through constructor ).
  • Distributed read from file through constructor - each process reads from a global file offset.
  • collect - get a local copy of the dataset ( send the distributed data to the driver process ).
  • countByKey - just like a histogram ( count occurrences of every key and send them to the driver process ).
  • groupByKey - group a dataset's data by key; data with the same key migrates to a single machine.
  • printInfo - prints runtime information of all tasks.
  • printHeader - prints the header of the runtime information.
  • updateInfo - prints runtime information for all tasks called after the last updateInfo (useful for program status updates).
  • Global variables - Global variables that can be modified by the driver process transparently.
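
The by-key helpers in the list above can be sketched on local data as follows. This is an illustration under stated assumptions, not the library's implementation: the function names are reused only for readability, the `Pair` alias is hypothetical, and "machines" are modelled as in-memory partitions chosen by hashing the key, which is one common way to make equal keys land in the same place.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Pair = std::pair<std::string, int>;  // hypothetical (key, data) item

// Histogram of keys: how many items carry each key.
std::unordered_map<std::string, std::size_t> countByKey(
        const std::vector<Pair>& data) {
    std::unordered_map<std::string, std::size_t> counts;
    for (const auto& kv : data) ++counts[kv.first];
    return counts;
}

// Assigns every item to one of numParts partitions based on its key, so
// that all items with the same key end up in the same partition.
// Assumption: hash partitioning; the real framework may choose differently.
std::vector<std::vector<Pair>> groupByKey(const std::vector<Pair>& data,
                                          std::size_t numParts) {
    assert(numParts > 0);
    std::vector<std::vector<Pair>> parts(numParts);
    std::hash<std::string> hasher;
    for (const auto& kv : data)
        parts[hasher(kv.first) % numParts].push_back(kv);
    return parts;
}
```

In a distributed setting the partition index would identify the process that owns the key, and the histogram would be merged on the driver process.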

Release Optimizations:

  • Memory leak plugged.

Examples:

Pagerank - (w/ and wo/ bulk) http://en.wikipedia.org/wiki/PageRank
Latency test - tests framework latency with O(1) functions.

Implementation planned for next releases:

(in order of priority)

  • Cogroup Optimization.
  • Load Redistribution/Tune
  • Additional function arguments - arguments passed along with the custom function pointer, e.g.: myFdd->map(&mapFunc, arg1, arg2);
  • Distributed directory read - Each process reads a local file from a directory (simulate a DFS)
  • HDFS support
  • Fault Tolerance
    • Dataset data replication
    • Process restart/replacement

P.S.: general progress tracking will move to: https://github.com/mtcs/faster/wiki/Implementation-Progress

0.0.3 Alpha

23 Oct 14:41

0.0.3 Alpha Pre-release

Fast Distributed Dataset (FDD) types:

  • Simple ( char, int, long int, float and double ).
  • Pointer ( char *, int *, long int *, float * and double * ) (WILL BE DISCONTINUED IN THE FUTURE).
  • Containers ( std::vector, std::string ).
  • Indexed FDDs - pair of Key (simple or string) and Data (simple, pointer or container).
  • Grouped (a group of two or three datasets).

Data Functions

  • Map - transform a data item into any other type ( 1 to 1 ).
  • Reduce - reduce all elements into one ( 2 to 1 ).
  • FlatMap - generate a new set of data ( 1 to n ).
  • Bulk Map and FlatMap - performance-efficient variants that enable sub-iteration implementations.
  • MapByKey - transform all items of an indexed dataset that share the same key ( n to 1 ).

Other Functions

  • FDD creation from local memory ( through constructor ).
  • Distributed read from file through constructor - each process reads from a global file offset.
  • collect - get a local copy of the dataset ( send the distributed data to the driver process ).
  • countByKey - just like a histogram ( count occurrences of every key and send them to the driver process ).
  • groupByKey - group a dataset's data by key; data with the same key migrates to a single machine.
  • printInfo** - prints runtime information of all tasks.
  • printHeader** - prints the header of the runtime information.
  • updateInfo** - prints runtime information for all tasks called after the last updateInfo (useful for program status updates).

Release Optimizations

ByKey functions no longer need sorted data, lowering overhead.

Implementation planned for next releases:

(in order of priority)

  • Global variables - Global variables that can be modified by the driver process.
  • Load Redistribution/Tune
  • Additional function arguments - arguments passed along with the custom function pointer, e.g.: myFdd->map(&mapFunc, arg1, arg2);
  • Distributed directory read - Each process reads a local file from a directory (simulate a DFS)
  • HDFS support
  • Fault Tolerance
    • Dataset data replication
    • Process restart/replacement

** new implementation

0.0.2 Alpha

26 Aug 23:25

0.0.2 Alpha Pre-release

Fast Distributed Dataset (FDD) types:

  • Simple ( char, int, long int, float and double ).
  • Pointer ( char *, int *, long int *, float * and double * ).
  • Containers ( std::vector, std::string ).
  • Indexed FDDs - pair of Key (simple or string) and Data (simple, pointer or container).
  • Grouped** (a group of two or three datasets).

Data Functions

  • Map - transform a data item into any other type ( 1 to 1 ).
  • Reduce - reduce all elements into one ( 2 to 1 ).
  • FlatMap - generate a new set of data ( 1 to n ).
  • Bulk Map and FlatMap - performance-efficient variants that enable sub-iteration implementations.
  • MapByKey** - transform all items of an indexed dataset that share the same key ( n to 1 ).

Other Functions

  • FDD creation from local memory ( through constructor ).
  • Distributed read from file through constructor - each process reads from a global file offset.
  • Collect - get a local copy of the dataset ( send the distributed data to the driver process ).
  • CountByKey** - just like a histogram ( count occurrences of every key and send them to the driver process ).
  • GroupByKey** - group a dataset's data by key; data with the same key migrates to a single machine.

Implementation planned for next releases:

(in order of priority)

  • Global variables - Global variables that can be modified by the driver process.
  • Load Redistribution/Tune
  • Additional function arguments - arguments passed along with the custom function pointer, e.g.: myFdd->map(&mapFunc, arg1, arg2);
  • Distributed directory read - Each process reads a local file from a directory (simulate a DFS)
  • HDFS support
  • Fault Tolerance
    • Dataset data replication
    • Process restart/replacement

** new implementation

0.0.1 Alpha

20 Jul 19:43

0.0.1 Alpha Pre-release

Fast Distributed Dataset (FDD) types:

  • Simple ( char, int, long int, float and double )
  • Pointer ( char *, int *, long int *, float * and double * )
  • Containers ( std::vector, std::string )
  • Indexed FDDs - pair of Key (simple or string) and Data (simple, pointer or container)

Data Functions

  • Map - transform a data item into any other type ( 1 to 1 )
  • Reduce - reduce all elements into one ( n to 1 )
  • FlatMap - generate a new set of data ( 1 to n )
  • Bulk Map and FlatMap - performance-efficient variants that enable sub-iteration implementations

Other Functions

  • FDD creation from local memory ( through constructor )
  • Distributed read from file through constructor ( each process reads from its own file offset )
  • Collect ( send the distributed data to the driver process )

Initial implementation:

(implementation started but not finished or tested)

  • GroupByKey ( send all data with the same index to a specific node )
  • CountByKey ( count occurrences of every key and send them to the driver process )

Implementation planned:

  • ByKey functions ( mapByKey and reduceByKey )
  • Global variables ( passed with custom function pointer ) ex.: myFdd->map(&mapFunc, globalVar1, globalVar2);

0.0 Alpha

21 Jun 16:09

0.0 Alpha Pre-release

Basic simple FDD functionality with no block balancing