Fixed broken links in header.html file. by mandadipavan · Pull Request #4 · j143-zz/systemml

mandadipavan · 2020-07-10T10:20:28Z

Simply changed systemml links to systemds in header.html file.

This patch makes some minor performance improvements to the new builtin function for functional dependency discovery, including vectorized handling of known dependencies, less indexing overhead, removal of unnecessary operations (rmEmpty, distinct, agg), and a fixed-size table computation. On a scenario of 10K rows x 1K columns (499500 pairs) with domain 1:100, this patch improved performance from 96.5s to 62.2s. In the future, we should restrict ourselves to integer/boolean types, stick to simple pairs, and only enumerate candidates of relevant pairs.
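A minimal Python sketch of the fixed-size table idea for one candidate pair, assuming both columns are integer-coded with domain 1..d. The scoring rule used here (fraction of rows that agree with the most frequent B value for their A value) and the name `fd_score` are illustrative assumptions, not necessarily what the discoverFD builtin computes.

```python
from typing import List


def fd_score(a: List[int], b: List[int], domain: int) -> float:
    """Score the candidate dependency a -> b (hypothetical helper).

    Both columns are assumed to be integer-coded in 1..domain, so the
    contingency table is allocated once with a fixed size.
    """
    if len(a) != len(b) or not a:
        raise ValueError("columns must be non-empty and of equal length")
    # Fixed-size domain x domain table of (a, b) co-occurrence counts.
    table = [[0] * domain for _ in range(domain)]
    for x, y in zip(a, b):
        table[x - 1][y - 1] += 1
    # Rows that agree with the most frequent b value of their a value.
    explained = sum(max(row) for row in table)
    return explained / len(a)


# A score of 1.0 means every value of a maps to exactly one value of b.
print(fd_score([1, 1, 2, 2], [3, 3, 1, 1], domain=3))  # 1.0
print(fd_score([1, 1, 2, 2], [1, 2, 1, 1], domain=2))  # 0.75
```

Enumerating all unordered pairs of a 1K-column input gives 1000 * 999 / 2 = 499500 candidates, matching the pair count above.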

This patch adds the unary builtin functions is.na (NA or NaN), is.nan (NaN), and is.infinite (-INF, +INF). All matrix text readers are now aware of NAs but convert them to NaNs, which fall under the definition of NAs. Furthermore, this patch removes unnecessary builtin function reuse for all individual builtin operations, which has no performance impact but reduces the code size.
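The semantics of the three predicates can be sketched in Python. Following the reader behavior described above, the sketch parses an `NA` token as NaN, so `is_na` and `is_nan` agree on parsed values. The helper names and the plain-list representation (1/0 results instead of a matrix) are assumptions for illustration.

```python
import math
from typing import List


def parse_cell(token: str) -> float:
    # Mirror the text readers: an NA token becomes NaN, which counts as NA.
    return math.nan if token.strip() == "NA" else float(token)


def is_na(values: List[float]) -> List[int]:
    # NA or NaN; after parsing, both are represented as NaN.
    return [int(math.isnan(v)) for v in values]


def is_nan(values: List[float]) -> List[int]:
    return [int(math.isnan(v)) for v in values]


def is_infinite(values: List[float]) -> List[int]:
    # -INF and +INF only; NaN is not infinite.
    return [int(math.isinf(v)) for v in values]


cells = [parse_cell(t) for t in ["1.5", "NA", "NaN", "inf", "-inf"]]
print(is_na(cells))        # [0, 1, 1, 0, 0]
print(is_infinite(cells))  # [0, 0, 0, 1, 1]
```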

- Upgrade the startup of the Federated Environment
- Support for Default Port
- Relative and static file path URL for Federated Worker
- Minor Startup cleanup in Federated Worker
- No need for extra file argument to start a federated worker

Closes apache#98.

Minor change to pom to update and re-enable code coverage in testing. When testing with the following command (replace ??? with the package name): `mvn test -DskipTests=false -Dtest=org.apache.sysds.???`, JaCoCo produces a folder in `target/site` containing a webpage that shows the coverage. Closes apache#956

This commit moves the documentation back to the master branch. It also cleans up the previous documentation (by deleting it) so that we have a clean start. Furthermore, this commit merges the documentation on master back into the webpage documentation. Related PRs: apache#949, apache#922. Discussion mails: https://tinyurl.com/yal7fd3r https://preview.tinyurl.com/yal7fd3r

This patch fixes the logic of IPA scalar propagation into functions with multiple function calls. Similar to sizes, we check if literal function arguments have consistent values and propagate valid ones. However, this check had a logic problem: it only checked whether the first call was a literal. This missed cases where the first call had a scalar variable but the second call a valid scalar literal that could have been propagated individually.

This patch adds basic lineage support to the MLContext API. Since in-memory objects are directly bound to the symbol table, lineage tracing viewed these objects as literals and incorrectly reused intermediates even if different in-memory objects were used in subsequent MLContext invocations.

This patch makes a major refactoring of the lineage deduplication framework, including removed indirections and support for while loops and nested if program blocks. We now drop support for nested loops, but this is acceptable because they are split into many items anyway and the biggest benefit comes from the last-level loop. In contrast, nested if blocks are critical in practice, and they required a more generic collection of the lineage patches for all distinct paths (which we still do in a single pass over the loop body program). Additionally, we now support while loops with an integration very similar to for loops.

This patch fixes size propagation issues during parsing and recompilation for rbind/cbind operations over lists into a single matrix. Together with other rewrites, the incorrect size propagation led to invalid runtime plans. However, the additional tests with CV-lm still require an assertion to allow function inlining as a precondition for the fold-rewrite to eliminate redundancy. Solving this remaining issue requires a principled size propagation approach for matrix objects in lists.

- Privacy Constraint support for GLM
- Privacy tests for GLM
- Improved exception handling of federated responses
- Log of checked privacy constraints

This is to give the federated master information about which privacy constraints were violated and to be able to throw the actual exception on the master side.

Add Initial Implementation of Checked Privacy Constraints Log: this will enable the user to check which privacy constraints were retrieved during handling of a federated instruction. This is an initial implementation since the checked privacy constraints are added to the federated response, but they are never retrieved by the federated master. If the privacy constraint is null for a checked data object, this is currently not logged. This could easily be changed by moving the put operation before the privacy constraint null check in the PrivacyMonitor. Closes apache#946

- New builtin for identifying cells which violate a length constraint.
- Replacing OutputInfo.CSVOutputInfo with Types.FileFormat.CSV

1. Operations are now consistent with their semantics, i.e., dropInvalidLength and dropInvalidType
2. Instead of identifying the invalid cells, "dropInvalidLength" now replaces the invalid values with null and returns a frame
3. Binary method changed from MMBinaryMethod.MR_BINARY_R to MMBinaryMethod.MR_BINARY_M
4. Spark broadcast replaced with PartitionedBroadcast
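A minimal Python sketch of the revised dropInvalidLength behavior (item 2). Representing the frame as a list of string rows and the length constraint as one maximum length per column are assumptions for illustration; the builtin's actual inputs may differ.

```python
from typing import List, Optional

Frame = List[List[Optional[str]]]


def drop_invalid_length(frame: Frame, max_len: List[int]) -> Frame:
    """Replace cells longer than their column's maximum length with None.

    Hypothetical helper; returns a new frame and leaves the input unchanged.
    """
    result = []
    for row in frame:
        if len(row) != len(max_len):
            raise ValueError("row width does not match the constraint vector")
        result.append([
            None if cell is not None and len(cell) > limit else cell
            for cell, limit in zip(row, max_len)
        ])
    return result


print(drop_invalid_length([["ab", "xyz"], ["abcd", "x"]], [3, 2]))
# [['ab', None], [None, 'x']]
```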

This patch makes the following performance improvements in the context of basic lineage tracing and lineage-based reuse probing:

1) Avoid string handling: Materialize the flag indicating whether a createvar instruction has a persistent-read prefix in the name, which avoids unnecessary string comparisons for ALL createvar instructions (almost 30% of all instructions).
2) Apply the existing constant folding rewrite not just during static rewrites but now also as a cleanup rewrite in order to remove remaining constant expressions (introduced by rewrites) inside loops. This has an especially large impact on lineage because constructing the lineage item is more expensive than the entire scalar operation.
3) Leverage the materialized hash code in lineage items as an early-out condition in the recursive equals check of lineage DAGs. This is especially useful where all lineage DAGs have the same repeated structure (e.g., from unrolled iterations) but a different input. The equals check would go all the way to the first difference, while the hash codes (aggregates over all inputs) very likely differ earlier.

On a mini-batch scenario of 250,000 iterations, batch size 8, and 40 operations per iteration, the runtime w/o lineage was 65.6s, and changes (1) and (2) improved the runtime with lineage tracing from 76.5s to 72.6s. Furthermore, we also saw some improvements for reuse probing in this scenario, but this requires more work too.
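Item 3 can be illustrated with a small Python sketch of a lineage node whose hash is computed once at construction and checked before recursing. The class `LineageNode`, its fields, and the hash aggregation are hypothetical and only mirror the idea, not the actual SystemDS code.

```python
from typing import Tuple


class LineageNode:
    """Hypothetical lineage DAG node with a materialized hash code."""

    def __init__(self, opcode: str, inputs: Tuple["LineageNode", ...] = ()):
        self.opcode = opcode
        self.inputs = inputs
        # Computed once: aggregates the opcode and the hashes of all inputs.
        self._hash = hash((opcode, tuple(i._hash for i in inputs)))

    def __hash__(self) -> int:
        return self._hash

    def __eq__(self, other: object) -> bool:
        if self is other:
            return True
        if not isinstance(other, LineageNode):
            return NotImplemented
        # Early out: different hash codes imply different DAGs, so the
        # recursive comparison only runs for likely-equal candidates.
        if self._hash != other._hash:
            return False
        return (self.opcode == other.opcode
                and len(self.inputs) == len(other.inputs)
                and all(x == y for x, y in zip(self.inputs, other.inputs)))


def unrolled(leaf: LineageNode, iterations: int) -> LineageNode:
    # Same repeated structure per iteration, e.g. X = X + X in a loop.
    node = leaf
    for _ in range(iterations):
        node = LineageNode("+", (node, node))
    return node
```

Two unrolled DAGs that differ only in their leaf almost always differ in their root hash, so the comparison stops at the root instead of walking down to the first differing input.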

This patch fixes an interesting performance bug caused by the recursive hash computation of lineage items. Due to repeated operation sequences (from loop iterations) and integer overflows during the hash computation, there were systematic hash sequences within one lineage DAG. This in turn led to less pruning power on recursive equals computations, and to collisions in the lineage cache, leading to even more recursive equals comparisons. The fix is simple: we now handle such overflows on hash aggregation (e.g., hash(int,int)) with a long instead of an int hash function on demand. On the test scenario `for(i in 1:1000) X = ((X + X) * 2 - X) / 3`, the previous runtime was 162s, while with this patch it dropped to 0.244s. Even with 10K iterations, the runtime is still 1.1s, which suggests that any super-linear behavior has been eliminated.
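A Python sketch of on-demand long hashing, emulating Java's 32-bit `int` arithmetic. It aggregates two hash codes with the common `31 * a + b` scheme and, only when that result would overflow the int range, hashes the exact 64-bit value the way Java's `Long.hashCode` does (`(int)(v ^ (v >>> 32))`) instead of silently wrapping. The constant 31 and this exact fallback are assumptions, not necessarily the code of this patch.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def to_int32(v: int) -> int:
    # Keep the low 32 bits as a signed value, like a Java (int) cast.
    v &= 0xFFFFFFFF
    return v - 2**32 if v > INT_MAX else v


def long_hash_code(v: int) -> int:
    # Java Long.hashCode: (int) (v ^ (v >>> 32)) on the 64-bit pattern of v.
    u = v & 0xFFFFFFFFFFFFFFFF
    return to_int32(u ^ (u >> 32))


def combine_hash(a: int, b: int) -> int:
    """Aggregate two int hash codes, switching to a long on demand.

    Assumed scheme: 31 * a + b. For int inputs the exact result always
    fits into 64 bits, so the long fallback never loses information.
    """
    exact = 31 * a + b  # Python ints never overflow
    if INT_MIN <= exact <= INT_MAX:
        return exact
    # Would overflow an int: hash the exact 64-bit value instead of wrapping.
    return long_hash_code(exact)
```

The high bits that plain int wrapping would discard are folded back into the result, which breaks the systematic patterns that wrapping can produce.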

This patch makes some minor performance improvements to the lineage reuse probing and cache put operations. Specifically, we now avoid unnecessary lineage hashing and comparisons by using lists instead of hash maps, move the time computations into the reuse path (to not affect the code path without lineage reuse), avoid unnecessary branching, and materialize the score of cache entries to avoid repeated computation for the log N comparisons per add/remove/contains operation. For 100K iterations and ~40 ops per iteration, lineage tracing w/ reuse improved from 41.9s to 38.8s (pure lineage tracing: 27.9s).
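A minimal sketch of materializing an entry's score, assuming a hypothetical cost model of compute time divided by size (not necessarily the one used by the lineage cache). The score is computed once when the entry is created, so each of the O(log N) heap comparisons reads a stored field instead of recomputing it.

```python
import heapq
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass(order=True)
class CacheEntry:
    """Hypothetical cache entry; only the materialized score is compared."""
    score: float = field(init=False)
    key: str = field(compare=False)
    compute_time_ms: float = field(compare=False)
    size_bytes: int = field(compare=False)

    def __post_init__(self) -> None:
        # Computed once on creation instead of inside every comparison.
        self.score = self.compute_time_ms / max(self.size_bytes, 1)


def evict_order(entries: Iterable[CacheEntry], n: int) -> List[str]:
    # Min-heap on the stored score: entries with the least compute time
    # per byte are evicted first.
    heap = list(entries)
    heapq.heapify(heap)
    return [heapq.heappop(heap).key for _ in range(min(n, len(heap)))]
```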

This patch makes a minor performance improvement to the important partial rewrite tsmm(cbind(X,v)) to tsmm(X) + compensation plan, by avoiding cbind(X, v)[,1:n-1] to extract X if X is still available in the lineage cache. This avoids unnecessary allocation and copies.
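The rewrite relies on the block structure of the product: with tsmm(Z) = t(Z) %*% Z and Z = cbind(X, v), the result is [[t(X) %*% X, t(X) %*% v], [t(v) %*% X, t(v) %*% v]], so a cached tsmm(X) only needs one extra column and row as compensation. A pure-Python sketch with hypothetical helper names and lists of lists as matrices:

```python
from typing import List

Matrix = List[List[float]]


def tsmm(z: Matrix) -> Matrix:
    # Transpose-self matrix multiply: t(Z) %*% Z.
    cols = len(z[0])
    return [[sum(row[i] * row[j] for row in z) for j in range(cols)]
            for i in range(cols)]


def tsmm_cbind_compensated(xtx: Matrix, x: Matrix, v: List[float]) -> Matrix:
    """Compute tsmm(cbind(X, v)) from a cached xtx = tsmm(X)."""
    n = len(xtx)
    # t(X) %*% v forms the new last column and, by symmetry, the last row.
    xtv = [sum(x[r][i] * v[r] for r in range(len(x))) for i in range(n)]
    vtv = sum(val * val for val in v)
    top = [xtx[i] + [xtv[i]] for i in range(n)]
    return top + [xtv + [vtv]]
```

Here X is passed in directly, which corresponds to taking X from the lineage cache instead of slicing it out of cbind(X, v).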

Shafaq-Siddiqi and others added 30 commits January 26, 2020 20:46

[SYSTEMDS-197] Builtin function for functional dependency discovery

6589c93

Closes apache#88.

Function results caching updates and bug fixes.

06e5de3

This patch contains:
1. Refactoring of function results caching
2. Code to skip caching if function contains Rand/Sample
3. A few bug fixes.
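A rough Python sketch of the caching rule in item 2: function results are memoized by name and arguments unless the function body contains a nondeterministic operation such as rand or sample. Representing a function body as a list of opcode strings, the set `NONDETERMINISTIC`, and the class `FunctionCache` are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

NONDETERMINISTIC = {"rand", "sample"}


class FunctionCache:
    """Hypothetical function-result cache keyed by (name, args)."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, Tuple[object, ...]], object] = {}

    @staticmethod
    def is_cacheable(body_ops: List[str]) -> bool:
        # Skip caching if any operation can return different results per call.
        return not any(op in NONDETERMINISTIC for op in body_ops)

    def call(self, name: str, body_ops: List[str],
             fn: Callable[..., object], *args: object) -> object:
        if not self.is_cacheable(body_ops):
            return fn(*args)
        key = (name, args)
        if key not in self._results:
            self._results[key] = fn(*args)
        return self._results[key]
```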

Extends lineagecache statistics.

d978638

- Added entry for multilevel caching
- Fixed bugs in rewrite stats

[MINOR] Upgrading URL handling for initiating federated commands

b3f4d28

Closes apache#90

Bug fix

91b305b

[BUGFIX] two more try-block issues fixed

ce0ae27

[SYSTEMDS-241] New GPU dense cumulative aggregates

6282ec8

Closes apache#67.

[SYSTEMDS-156] Initial code style eclipse template

48faea4

Closes apache#79.

[SYSTEMDS-193] New builtin function for outlier detection via std dev

e2b903c

Outlier detection using standard deviation and repair using row deletion, mean and median imputation. Closes apache#89.
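A minimal Python sketch of detection via standard deviation with the three repair strategies, on a single column. The threshold parameter `k`, computing the fill value from the non-outlier values, and the function names are assumptions of the sketch, not the builtin's actual signature.

```python
import statistics
from typing import List


def detect_outliers_sd(x: List[float], k: float = 3.0) -> List[bool]:
    # Outlier: more than k sample standard deviations away from the mean.
    mu = statistics.mean(x)
    sd = statistics.stdev(x)
    return [abs(v - mu) > k * sd for v in x]


def repair(x: List[float], outliers: List[bool], method: str) -> List[float]:
    inliers = [v for v, o in zip(x, outliers) if not o]
    if method == "delete":
        # Row deletion: drop the flagged values entirely.
        return inliers
    if method == "mean":
        fill = statistics.mean(inliers)
    elif method == "median":
        fill = statistics.median(inliers)
    else:
        raise ValueError(f"unknown repair method: {method}")
    # Imputation: replace flagged values, keep all others unchanged.
    return [fill if o else v for v, o in zip(x, outliers)]
```

Note that a single extreme value also inflates the standard deviation, so small samples may need a smaller `k` to flag it.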

[SYSTEMDS-194] New builtin function for outlier detection via IQR

da30e27

Outlier detection using IQR - Initial commit Closes apache#91.
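A minimal Python sketch of IQR-based detection: values outside [Q1 - f * IQR, Q3 + f * IQR] are flagged. The conventional factor f = 1.5 and the quartile method of Python's `statistics.quantiles` (default "exclusive") are assumptions; the builtin may compute quartiles differently.

```python
import statistics
from typing import List, Tuple


def iqr_bounds(x: List[float], factor: float = 1.5) -> Tuple[float, float]:
    # Quartiles Q1 and Q3 (the middle cut point is the median, unused here).
    q1, _, q3 = statistics.quantiles(x, n=4)
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def detect_outliers_iqr(x: List[float], factor: float = 1.5) -> List[bool]:
    lo, hi = iqr_bounds(x, factor)
    return [v < lo or v > hi for v in x]


print(iqr_bounds([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # (-5.0, 15.0)
```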

[SYSTEMDS-226] New federated rbind and cbind operations

645d0ba

Optimizes rbind and cbind to only append federated metadata for the result. Closes apache#92.

[261] Stable Marriage Algorithm

c42c90b

Closes apache#99

[MINOR] Fix strict compilation errors (types, statements)

c46c738

[SYSTEMDS-12] Cleanup unnecessary reorg hop/lop type indirections

149da7b

[SYSTEMDS-12] Cleanup unnecessary hop/lop indirections (op3,op4,opn,dnn)

8e8ea36

[SYSTEMDS-51] New JSON frame reader and writers

c804c91

Basic JSONL Reader Implementation. Basic JSONL Writer Implementation. Basic Parallel JSONL Reader/Writer Implementation. Test Utils and WriteRead Tests DIA project. Closes apache#93.
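A minimal sketch of the JSON Lines layout for a frame: one JSON object per row, keyed by column name. The column-name keys and the use of Python's `json` module are assumptions for illustration; the actual reader/writer may encode rows differently.

```python
import io
import json
from typing import List, TextIO


def write_jsonl(rows: List[list], columns: List[str], out: TextIO) -> None:
    # One JSON object per line, keyed by column name (assumed layout).
    for row in rows:
        out.write(json.dumps(dict(zip(columns, row))) + "\n")


def read_jsonl(src: TextIO, columns: List[str]) -> List[list]:
    rows = []
    for line in src:
        if line.strip():  # tolerate blank lines
            obj = json.loads(line)
            # Keys missing from a line become None (null) cells.
            rows.append([obj.get(c) for c in columns])
    return rows


if __name__ == "__main__":
    buf = io.StringIO()
    write_jsonl([[1, "a"], [2, None]], ["id", "name"], buf)
    print(buf.getvalue(), end="")
    # {"id": 1, "name": "a"}
    # {"id": 2, "name": null}
    buf.seek(0)
    print(read_jsonl(buf, ["id", "name"]))  # [[1, 'a'], [2, None]]
```

Because each line is an independent record, a parallel reader can split the input at line boundaries, which fits the parallel variant mentioned above.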

[SYSTEMDS-235] Add lineage support for parfor spark workers

bf07122

DIA project, part 2 Closes apache#94.

[SYSTEMDS-197] Improved functional dependency discovery (discoverFD)

8df07bf

1. Included Mask and threshold as input parameters
2. Output Matrix FD contains the scores for FDs

Closes apache#95.

[SYSTEMDS-262] Data augmentation tool for data cleaning primitives

f5394e9

DIA project data augmentation for data cleaning (outliers, missing values, typos, swapped columns) Closes apache#101.

[MINOR] Various fixes and cleanups of recent changes

525dc7a

[SYSTEMDS-206] Improved codegen outer template compilation

5067b9a

Closes apache#97.

[SYSTEMDS-236] Extended multi-level lineage-based reuse (block-level)

e44e412

Closes apache#102.

[SYSTEMDS-198] Extended slice finding for classification tasks

2abddc1

Closes apache#103.

[SYSTEMDS-236] Fix parfor ID handling (hidden statement block ID)

b5aa876

[MINOR] Various fixes and cleanups of recent changes, part 2

c8c3d41

1) Fixed corrupted print of stop/parse issues to stderr
2) Fixed missing handling of tensors in federated instruction wrapping

[SYSTEMDS-199] Predict builtin for Multinomial Logistic Regression

db2dbeb

- Builtin function for Multinomial Logistic Regression
- Function test verifying integration

Closes apache#107
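Prediction for multinomial logistic regression amounts to a softmax over linear scores followed by an argmax. A pure-Python sketch, assuming a weight matrix with one column per class, no intercept, and 1-based class labels; these and the function name are assumptions, not the builtin's exact interface.

```python
import math
from typing import List, Tuple


def multilogreg_predict(x: List[List[float]],
                        b: List[List[float]]) -> Tuple[List[List[float]], List[int]]:
    """Return class probabilities and 1-based predicted labels (hypothetical)."""
    probs, labels = [], []
    for row in x:
        # Linear scores: one per class (column of b).
        scores = [sum(xi * b[i][k] for i, xi in enumerate(row))
                  for k in range(len(b[0]))]
        # Numerically stable softmax: subtract the max score first.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        p = [e / total for e in exps]
        probs.append(p)
        labels.append(p.index(max(p)) + 1)
    return probs, labels
```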

[SYSTEMDS-36] AWS deploy script

6c6506b

- Added scripts for deployment of SystemDS on Amazon EMR
- Located in /scripts/aws/*

Closes apache#112

Supratick Dey and others added 30 commits June 13, 2020 15:27

[DOC] Documentation for builtin cvlm function

fb9a480

Closes apache#962.

[DOC][1/2] imputeByFD and discoverFD builtin func.

0c2a2b3

Closes apache#969.

[DOC] multiLogReg and intersect builtin func.

ade2bd7

Closes apache#954. Closes apache#961.

[SYSTEMDS-411] Fix bugs in cache eviction.

8751390

This patch fixes bugs in handling of multi-level cache duplicates, eviction and reading from disk. This also adds a new test, l2svm.

[MINOR] Add fed prefix for stats

a4853cf

Adds Federated prefix to instructions, so the statistics returned show federated instruction executions just like Spark or GPU instructions. Minor fix in Startup of worker allowing log4j to work again. Closes apache#970

[MINOR] Fix paths to resources docs

7326e6f

[MINOR] Reuse rand and others, bug fixes

2671a55

This patch enables reuse for rand(matrix) and a few more instructions. Furthermore, it fixes a bug in eviction logic that was forming cycles in the linked lists.

[SYSTEMDS-414] New rewrite for PCA -> lmDS pipeline

882760b

This patch contains a rewrite to reuse the tsmm result in lmDS when it is called after PCA incrementally for an increasing number of columns.

[SYSTEMDS-310] Python Bindings Extension

67439d8

Extend Python API with more operations:
- rev, t, order, cholesky, trigonometric ops (sin, cos, tan, asin, acos, atan, sinh, cosh, tanh)

Also including test cases and docs update. Closes apache#975.

[MINOR] Additional lineage parfor remote tests, and cleanups

c6d7a52

This patch adds msvm w/ remote_spark parfor workers to the test suite and fixes missing support for tak+ operators in the recompute-by-lineage utility.

[SYSTEMDS-511] Add protobuf support

b76915c

Adds support for the Protobuf file format, for both reads and writes. AMLS project SS2020, part 1. Closes apache#971

[MINOR] recompiled Linux native blas libs (stopped working after sysml/sysds merge)

73a25a6

[MINOR] Improve robustness in partial reuse.

ba075af

[SYSTEMDS-417] Add a new rewrite for StepLM

93e0930

This patch adds a new rewrite to partially reuse tsmm results in StepLM (forward).

[MINOR] Fix robustness lineage tests (less explain/print/lineage output)

d9d3288

[MINOR] Fix JDocs

dd742ab

exclude protobuf in Jdocs Closes apache#923

[SYSTEMDS-2568] Privacy Runtime Extended

433f638

Add FederatedWorkerHandlerException And Improved Handling of Exceptions in FederatedWorkerHandler

Updated header file.

2900494

simply updated systemml links to systemds in header.html file

mandadipavan wants to merge 581 commits into j143-zz:master from mandadipavan:master

20 participants
