Fixed broken links in header.html file. #4

Open
mandadipavan wants to merge 581 commits into j143-zz:master from mandadipavan:master

@mandadipavan

Simply changed systemml links to systemds in header.html file.

Shafaq-Siddiqi and others added 30 commits January 26, 2020 20:46
This patch makes some minor performance improvements to the new builtin
function for functional dependency discovery, including vectorized
handling of known dependencies, less indexing overhead, removed
unnecessary operations (rmEmpty, distinct, agg), and fixed-size table
computation.

On a scenario of 10K rows x 1K columns (499500 column pairs) with domain
1:100, this patch improved performance from 96.5s to 62.2s.

In the future, we should restrict ourselves to integer/boolean types,
stick to simple pairs, and only enumerate candidates of relevant pairs.
This patch contains:
1. Refactoring of function results caching
2. Code to skip caching if function contains Rand/Sample
3. A few bug fixes:
- Added entry for multilevel caching
- Fixed bugs in rewrite stats
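
A minimal sketch of item 2 above, assuming a cache keyed by function name and arguments. `FunctionCache`, `NONDETERMINISTIC_OPS`, and the `ops` parameter are hypothetical names chosen for illustration; they are not SystemDS classes or APIs.

```python
# Hypothetical sketch: names below are illustrative, not SystemDS classes.
NONDETERMINISTIC_OPS = {"rand", "sample"}


class FunctionCache:
    def __init__(self):
        self._results = {}
        self.hits = 0

    def call(self, name, ops, fn, *args):
        """Run fn(*args), reusing a cached result when that is safe.

        ops is the set of operation names used inside the function body.
        """
        if NONDETERMINISTIC_OPS & set(ops):
            # Results may differ per call, so never cache or reuse them.
            return fn(*args)
        key = (name, args)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        result = fn(*args)
        self._results[key] = result
        return result
```

The check runs before any lookup, so a function containing Rand/Sample neither reads from nor writes to the cache.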
Outlier detection using standard deviation, with repair by row
deletion, mean imputation, or median imputation

Closes apache#89.
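
The detection and the three repair modes named in this commit could look roughly like the following on a single column. This is a sketch, not the builtin itself: the function name, the threshold `k`, the repair labels, and the choice to compute the fill value from the non-outlier values are all assumptions.

```python
import statistics


def outlier_by_sd(values, k=3.0, repair="delete"):
    """Flag values more than k standard deviations from the mean, then repair.

    repair: "delete" drops flagged values, "mean"/"median" replace them.
    k and the repair names are assumptions made for this sketch.
    """
    if repair not in ("delete", "mean", "median"):
        raise ValueError(f"unknown repair mode: {repair}")
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    is_outlier = [abs(v - mean) > k * sd for v in values]
    inliers = [v for v, out in zip(values, is_outlier) if not out]
    if repair == "delete":
        return inliers
    # Assumption: the fill value is computed from the non-outlier values.
    fill = statistics.fmean(inliers) if repair == "mean" else statistics.median(inliers)
    return [fill if out else v for v, out in zip(values, is_outlier)]
```

With only ten values, a population z-score cannot exceed about 2.85, so small samples need `k` below 3 for anything to be flagged.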
Outlier detection using IQR - Initial commit

Closes apache#91.
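
For comparison, an IQR-based detector on a single column can be sketched as below. The multiplier `k=1.5` is the conventional Tukey fence and is an assumption here, as is the function name; the commit does not state which parameters the builtin uses.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]
```

Unlike the standard-deviation rule, the quartiles are barely affected by a single extreme value, so the fences stay tight around the bulk of the data.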
Optimizes rbind and cbind to only append federated metadata for the
result.

Closes apache#92.
This patch adds the unary builtin functions is.na (NA or NaN), is.nan
(NaN), and is.infinite (-INF, +INF). All matrix text readers are now
aware of NAs, but convert them to NaNs, which fall under the definition
of NAs. Furthermore, this patch also removes unnecessary builtin
function reuse for all individual builtin operations, which has no
performance impact but reduces the code size.
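
The semantics described above can be sketched on scalar cells as follows. `parse_cell` and the accepted token spellings are assumptions for illustration; the actual reader behavior may differ.

```python
import math


def parse_cell(token):
    """Parse one text cell; NA is stored as NaN, as the patch describes."""
    return math.nan if token.strip() == "NA" else float(token)


def is_nan(x):
    return math.isnan(x)


def is_na(x):
    # NA was converted to NaN on read, so NA and NaN are indistinguishable here.
    return math.isnan(x)


def is_infinite(x):
    return math.isinf(x)
```

Because NA is converted on read, `is_na` and `is_nan` return the same result for every parsed value in this sketch.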
Basic JSONL Reader Implementation.
Basic JSONL Writer Implementation.
Basic Parallel JSONL Reader/Writer Implementation.
Test Utils and WriteRead Tests

DIA project.

Closes apache#93.
1. Included Mask and threshold as input parameters
2. Output Matrix FD contains the scores for FDs

Closes apache#95.
DIA project data augmentation for data cleaning (outliers, missing
values, typos, swapped columns)

Closes apache#101.
- Upgrade the startup of the Federated Environment
- Support for Default Port
- Relative and static file path URL for Federated Worker
- Minor Startup cleanup in Federated Worker
- No need for extra file argument to start a federated worker

Closes apache#98.
1) Fixed corrupted print of stop/parse issues to stderr
2) Fixed missing handling of tensors in federated instruction wrapping
- Builtin function for Multinomial Logistic Regression
- Function test verifying integration

Closes apache#107
- Added scripts for deployment of SystemDS on the Amazon EMR
- Located in /scripts/aws/*

Closes apache#112
Supratick Dey and others added 30 commits June 13, 2020 15:27
This patch fixes bugs in handling of multi-level cache duplicates,
eviction and reading from disk. This also adds a new test, l2svm.
Minor change to the pom to update and re-enable code coverage when
testing with the following command (replace ??? with the package name):

`mvn test -DskipTests=false -Dtest=org.apache.sysds.???`

This uses jacoco to produce a folder containing a webpage in
`target/site` that shows the coverage.

Closes apache#956
Adds Federated prefix to instructions, so the statistics returned
show federated instruction executions just like Spark or GPU
instructions.

Minor fix in Startup of worker allowing log4j to work again.

Closes apache#970
This commit moves the documentation back to the master branch.
It also cleans up the previous documentation (by deleting it),
so that we have a clean start.

Furthermore, this commit merges the documentation on master
back into the webpage documentation.

Related PRs: apache#949 apache#922

Discussion Mails:
https://tinyurl.com/yal7fd3r
https://preview.tinyurl.com/yal7fd3r
This patch enables reuse for rand(matrix) and a few more
instructions. Furthermore, it fixes a bug in the eviction
logic that was forming cycles in the linked lists.
This patch contains a rewrite to reuse the tsmm result in lmDS if it is
called incrementally after PCA for an increasing number of columns.
Extend Python API with more operations:
- rev, t, order, cholesky, trigonometric ops
  (sin, cos, tan, asin, acos, atan, sinh, cosh, tanh)
Also including Test cases and Docs update.

Closes apache#975.
This patch fixes the logic of IPA scalar propagation into functions with
multiple function calls. Similar to sizes, we check if literal function
arguments have consistent values and propagate valid ones. However, this
check had a logic problem of only checking if the first call was a
literal. This missed cases where the first call had a scalar variable
but the second call a valid scalar literal that could have been
propagated individually.
This patch adds msvm w/ remote_spark parfor workers to the test suite
and fixes missing support for tak+ operators in the recompute-by-lineage
utility.
Adds support for Protobuf file format, for both reads and write.

AMLS project SS2020, part 1

Closes apache#971
This patch adds basic lineage support to the MLContext API. Since
in-memory objects are directly bound to the symbol table, lineage
tracing views these objects as literals and incorrectly reused
intermediates even if different in-memory objects were used in
subsequent MLContext invocations.
This patch makes a major refactoring of the lineage deduplication
framework, including removed indirections and support for while loops
and nested if program blocks. We now drop support for nested loops but
this is fine as they are anyway split into many items and the biggest
benefit comes from the last-level loop. In contrast, nested if blocks
are critical in practice and this required a more generic collection of
the lineage patches for all distinct paths (which we still do in a
single pass over the loop body program). Additionally, we now support
while loops with an integration very similar to for loops.
This patch fixes size propagation issues during parsing and
recompilation for rbind/cbind operations over lists into a single
matrix. Together with other rewrites, the incorrect size propagation led
to invalid runtime plans.

However, the additional tests with CV-lm still require an assertion to
allow function inlining as a precondition for the fold-rewrite to
eliminate redundancy. Solving this remaining issue requires a principled
size propagation approach for matrix objects in lists.
This patch adds a new rewrite to partially reuse tsmm
results in StepLM (forward).
- Privacy Constraint support for GLM
- Privacy tests for GLM
- Improved exception handling of federated responses
- Log of checked privacy constraints

This is to give the federated master information about which privacy
constraints were violated and to be able to throw the actual exception
on the master side.

Add Initial Implementation of Checked Privacy Constraints Log

This will enable the user to check which privacy constraints were
retrieved during handling of federated instruction. This is an initial
implementation since the checked privacy constraints are added to the
federated response, but this is never retrieved by the federated master.
If the privacy constraint is null for a checked data object, this is
currently not logged. This could easily be changed by moving the put
operation before the privacy constraint null check in the
PrivacyMonitor.

Closes apache#946
- New builtin for identifying cells which violate length constraints.
- Replacing OutputInfo.CSVOutputInfo with Types.FileFormat.CSV

1. Operations are now consistent with their semantics i.e.,
   dropInvalidLength and dropInvalidType
2. Instead of identifying the invalid cells, "dropInvalidLength"
   now replaces the invalid values with null and returns a frame
3. Binary method changed from MMBinaryMethod.MR_BINARY_R to
   MMBinaryMethod.MR_BINARY_M
4. Spark broadcast replaced with PartitionedBroadcast
exclude protobuf in Jdocs

Closes apache#923
Add FederatedWorkerHandlerException And Improved Handling of
Exceptions in FederatedWorkerHandler
This patch makes the following performance improvements in the context
of basic lineage tracing and lineage-based reuse probing:

1) Avoid string handling: Materialize the flag if a createvar
instruction has a persistent-read prefix in the name, which avoids
unnecessary string comparisons for ALL createvar instructions, i.e.,
almost 30% of all instructions.

2) Apply the existing constant folding rewrite not just during static
rewrites but now also as a cleanup rewrite in order to remove remaining
constant expressions (introduced by rewrites) inside loops. This has
especially large impact in lineage because constructing the lineage item
is more expensive than the entire scalar operation.

3) Leverage the materialized hash code in lineage items as an early-out
condition in the recursive equals check of lineage DAGs. This is
especially useful where all lineage DAGs have the same repeated
structure (e.g., from unrolled iterations) but a different input. The
equals check would go all the way to the first difference, while the
comparison of hash codes (aggregates over all inputs) very likely
differs earlier.

On a mini-batch scenario of 250,000 iterations with batch size 8 and 40
operations per iteration, the runtime w/o lineage was 65.6s, and the
changes (1) and (2) improved the runtime with lineage tracing from 76.5s
to 72.6s. Furthermore, we have also seen some improvements for reuse
probing in this scenario, but this requires more work too.
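
Item (3) above can be illustrated with a small stand-in for a lineage DAG node. This is a sketch under assumptions: the class below is not the SystemDS `LineageItem`, and Python's built-in `hash` replaces whatever hash function the system uses.

```python
class LineageItem:
    """Illustrative stand-in for a lineage DAG node (not the SystemDS class)."""

    def __init__(self, opcode, inputs=()):
        self.opcode = opcode
        self.inputs = tuple(inputs)
        # Materialize the hash once; it aggregates over all transitive inputs.
        self._hash = hash((opcode, tuple(i._hash for i in self.inputs)))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, LineageItem):
            return NotImplemented
        if self is other:
            return True
        # Early out: different hashes imply different DAGs, no recursion needed.
        if self._hash != other._hash:
            return False
        return (self.opcode == other.opcode
                and len(self.inputs) == len(other.inputs)
                and all(a == b for a, b in zip(self.inputs, other.inputs)))
```

Two long chains that differ only in their leaf input are then rejected at the root by one integer comparison, instead of recursing down to the leaf.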
This patch fixes an interesting performance bug caused by the recursive
hash computation of lineage items. Due to repeated operation sequences
(from loop iterations) and integer overflows during the hash
computation, there were systematic hash sequences within one lineage DAG.
This in turn led to less pruning power on recursive equals
computations, and collisions in the lineage cache, leading to even more
recursive equals comparisons.

The fix is simple. We now handle such overflows on hash aggregation
(e.g., hash(int,int)) with a long instead of int hash function on
demand. On the following test scenario

for(i in 1:1000)
  X = ((X + X) * 2 - X) / 3

the previous runtime was 162s while with this patch it reduced to
0.244s. Even with 10K iterations, the runtime is still 1.1s, which
suggests that any super-linear behavior has been eliminated.
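
One way to read "a long instead of int hash function on demand" is sketched below: combine the two hashes in wide arithmetic and fold the 64-bit value only when the result does not fit into 32 bits. The function names and the `31 * (31 + h1) + h2` combination are assumptions for illustration, not taken from the patch.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def to_int32(x):
    """Wrap an integer to signed 32 bits, like Java int overflow."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x > INT_MAX else x


def int_hash(h1, h2):
    """Plain 32-bit combination; silently wraps on overflow."""
    return to_int32(31 * (31 + h1) + h2)


def robust_hash(h1, h2):
    """Combine in wide arithmetic; fold 64 bits only when 32 bits overflow."""
    wide = 31 * (31 + h1) + h2  # Python ints do not overflow
    if INT_MIN <= wide <= INT_MAX:
        return wide
    bits = wide & 0xFFFFFFFFFFFFFFFF      # two's-complement 64-bit view
    return to_int32(bits ^ (bits >> 32))  # same folding as Java's Long.hashCode
```

For small inputs both functions agree; they diverge only when the 32-bit combination would wrap, which is the case the commit message identifies as the source of systematic repetitions.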
This patch makes some minor performance improvements to the lineage
reuse probing and cache put operations. Specifically, we now avoid
unnecessary lineage hashing and comparisons by using lists instead of
hash maps, move the time computations into the reuse path (to not affect
the code path without lineage reuse), avoid unnecessary branching, and
materialize the score of cache entries to avoid repeated computation
for the log N comparisons per add/remove/contains operation.

For 100K iterations and ~40 ops per iteration, lineage tracing w/ reuse
improved from 41.9s to 38.8s (pure lineage tracing: 27.9s).
This patch makes a minor performance improvement to the important
partial rewrite tsmm(cbind(X,v)) to tsmm(X) + compensation plan, by
avoiding cbind(X, v)[,1:n-1] to extract X if X is still available in the
lineage cache. This avoids unnecessary allocation and copies.
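
The partial rewrite relies on the block structure of t(cbind(X,v)) %*% cbind(X,v): the top-left block is tsmm(X), and only t(X) %*% v and t(v) %*% v need to be computed. A plain-Python sketch on list-of-lists matrices follows; the function names are illustrative, and passing `X` directly mirrors the case where X is still available in the cache.

```python
def tsmm(X):
    """Compute t(X) %*% X for a row-major list-of-lists matrix."""
    n = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(n)]
            for i in range(n)]


def tsmm_cbind(tsmm_X, X, v):
    """Build tsmm(cbind(X, v)) from a cached tsmm(X) plus a compensation plan."""
    # Compensation: the new last column/row t(X) %*% v and the corner t(v) %*% v.
    xtv = [sum(row[i] * vk for row, vk in zip(X, v)) for i in range(len(X[0]))]
    vtv = sum(vk * vk for vk in v)
    top = [list(r) + [xtv[i]] for i, r in enumerate(tsmm_X)]
    return top + [xtv + [vtv]]
```

For an m x n matrix X, the compensation costs O(m n) operations, compared with O(m n^2) for recomputing the full product.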
simply updated systemml links to systemds in header.html file