
Added Developers-tools-systemds.md #5

Open
omkarghugarkar007 wants to merge 795 commits into j143-zz:master from omkarghugarkar007:master


@omkarghugarkar007


Added the doc for IntelliJ instructions

arnabp and others added 30 commits June 8, 2020 16:05
This patch
 - extends reusable opcodes, which primarily improves multilevel (statementblock) cache hits,
 - removes most of the System.nanoTime calls from the cache logic,
 - replaces operand names with placeholders in datagen lineage items,
   (Note: this fix is temporarily commented out due to a bug in parfor-lineage)
 - fixes a bug in lineage item creation for multilevel caching,
 - updates the grid search lineage test with a loss function.
This patch makes further robustness improvements to the handling of
large lineage DAGs via non-recursive primitives. In this context,
explain needed special treatment to preserve the previous output in DFS
order w/ post-append.

Furthermore, this also fixes a number of issues in the parfor
integration such as (1) invalid cached hashes after sub-DAG replacement,
(2) introduced cycles during parfor lineage merge, (3) steplm script
improvements (disabled parfor dependency analysis was hiding the issue
that introduced the cycles), and (4) some debugging functionality to
reliably detect cycles in lineage DAGs.
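The non-recursive traversal with post-append order and the cycle detection described above can be illustrated with a generic iterative DFS. This is a minimal Python sketch under stated assumptions, not the SystemDS (Java) implementation: lineage items are modeled as hashable keys, and the `children` mapping and the `postorder_nonrecursive` name are hypothetical. A node is appended only after all its inputs, and an input that is still on the current path signals a cycle.

```python
from typing import Dict, Hashable, Iterable, List

_DONE = object()  # sentinel marking an exhausted child iterator

def postorder_nonrecursive(root: Hashable,
                           children: Dict[Hashable, Iterable[Hashable]]) -> List[Hashable]:
    """Return the DAG nodes reachable from root in DFS post-order.

    Uses an explicit stack instead of recursion, so very deep DAGs cannot
    overflow the call stack. Raises ValueError if a cycle is reachable.
    """
    ON_PATH, FINISHED = 1, 2
    state: Dict[Hashable, int] = {root: ON_PATH}
    order: List[Hashable] = []
    stack = [(root, iter(children.get(root, ())))]
    while stack:
        node, inputs = stack[-1]
        child = next(inputs, _DONE)
        if child is _DONE:
            # All inputs were emitted, so post-append the node itself.
            stack.pop()
            state[node] = FINISHED
            order.append(node)
        elif state.get(child) == ON_PATH:
            # The child is an ancestor on the current path: a back edge.
            raise ValueError(f"cycle detected: {node!r} -> {child!r}")
        elif child not in state:
            state[child] = ON_PATH
            stack.append((child, iter(children.get(child, ()))))
        # FINISHED children (shared sub-DAGs) are skipped, so each node is emitted once.
    return order
```

For the DAG `{"c": ["a", "b"], "a": ["x"], "b": ["x"]}`, `postorder_nonrecursive("c", ...)` yields `["x", "a", "b", "c"]`; the shared input `x` appears only once.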
This patch disables lineage-based reuse for update-in-place left indexing
operations, as reuse would create incorrect results due to later in-place
updates that change the cached data object.

Furthermore, this patch also aims to make the codegen tests more robust
w.r.t. the surefire GitHub Actions integration (less explain output).
This patch enables caching of multi-return instructions like eigen.
Furthermore, this includes a new partial rewrite and adds PCA
as a test case.
   * img_brightness()
   * img_crop()
   * img_mirror()

Closes apache#959.
This patch fixes bugs in handling of multi-level cache duplicates,
eviction and reading from disk. This also adds a new test, l2svm.
Minor change to the pom to update and re-enable code coverage reporting
when testing with the following command (replace ??? with the package name):

`mvn test -DskipTests=false -Dtest=org.apache.sysds.???`

This uses jacoco to produce a folder in `target/site` containing a webpage
that shows the coverage.

Closes apache#956
Adds Federated prefix to instructions, so the statistics returned
show federated instruction executions just like Spark or GPU
instructions.

Minor fix in the worker startup that allows log4j to work again.

Closes apache#970
This commit moves the documentation back to the master branch.
It also cleans up the previous documentation (by deleting it),
such that we have a clean start.

Furthermore, this commit merges the documentation on master
back into the webpage documentation.

Related PRs: apache#949 apache#922

Discussion Mails:
https://tinyurl.com/yal7fd3r
https://preview.tinyurl.com/yal7fd3r
This patch enables reuse for rand(matrix) and a few more
instructions. Furthermore, it fixes a bug in the eviction
logic that was forming cycles in the linked lists.
This patch contains a rewrite to reuse the tsmm result in lmDS when lmDS
is called after PCA incrementally for an increasing number of columns.
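Partial tsmm reuse rests on a block identity: if new columns `Y` are appended to `X`, then `t(cbind(X, Y)) %*% cbind(X, Y)` consists of the cached `t(X) %*% X`, the cross block `t(X) %*% Y` (plus its transpose), and `t(Y) %*% Y`. The following is a minimal pure-Python sketch of that identity, assuming dense row-major lists of lists; the function names are hypothetical, and this is not the SystemDS rewrite itself.

```python
from typing import List

Matrix = List[List[float]]

def tsmm(X: Matrix) -> Matrix:
    """Compute t(X) %*% X for a dense row-major matrix."""
    n_cols = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(n_cols)]
            for i in range(n_cols)]

def tsmm_append_cols(G: Matrix, X: Matrix, Y: Matrix) -> Matrix:
    """Extend a cached G = t(X) %*% X to t(cbind(X, Y)) %*% cbind(X, Y)."""
    k, m = len(X[0]), len(Y[0])
    # Cross block t(X) %*% Y: the only new products involving old columns.
    XtY = [[sum(xr[i] * yr[j] for xr, yr in zip(X, Y)) for j in range(m)]
           for i in range(k)]
    YtY = tsmm(Y)
    # Top rows: [G | t(X) Y]; bottom rows: [t(Y) X | t(Y) Y].
    top = [G[i] + XtY[i] for i in range(k)]
    bottom = [[XtY[i][j] for i in range(k)] + YtY[j] for j in range(m)]
    return top + bottom
```

The cached block `G` is reused as-is, so only the products that involve the newly added columns are computed.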
Extend Python API with more operations:
- rev, t, order, cholesky, trigonometric ops
  (sin, cos, tan, asin, acos, atan, sinh, cosh, tanh)
Also includes test cases and a docs update.

Closes apache#975.
This patch fixes the logic of IPA scalar propagation into functions with
multiple function calls. Similar to sizes, we check if literal function
arguments have consistent values and propagate valid ones. However, this
check had a logic problem of only checking if the first call was a
literal. This missed cases where the first call had a scalar variable
but the second call a valid scalar literal that could have been
propagated individually.
This patch adds msvm w/ remote_spark parfor workers to the test suite
and fixes missing support for tak+ operators in the recompute-by-lineage
utility.
Adds support for the Protobuf file format, for both reads and writes.

AMLS project SS2020, part 1

Closes apache#971
This patch adds basic lineage support to the MLContext API. Since
in-memory objects are directly bound to the symbol table, lineage
tracing viewed these objects as literals and incorrectly reused
intermediates even if different in-memory objects were used in
subsequent MLContext invocations.
This patch makes a major refactoring of the lineage deduplication
framework, including removed indirections and support for while loops
and nested if program blocks. We now drop support for nested loops, but
this is acceptable as they are split into many items anyway and the biggest
benefit comes from the last-level loop. In contrast, nested if blocks
are critical in practice, and supporting them required a more generic
collection of the lineage patches for all distinct paths (which we still
do in a single pass over the loop body program). Additionally, we now
support while loops with an integration very similar to for loops.
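The idea of keeping one lineage patch per distinct control-flow path through a loop body can be sketched as follows. This is a simplified Python illustration under our own assumptions, not SystemDS's deduplication code: a hypothetical loop-body function returns its branch decisions as a tuple (the path key) together with a placeholder trace of operations, and each iteration is then stored as a compact (path key, input) pair that refers to a shared patch.

```python
from typing import Callable, Dict, List, Tuple

Path = Tuple[bool, ...]  # branch decisions of the nested if blocks in one iteration
Patch = Tuple[str, ...]  # placeholder for a lineage sub-DAG with input placeholders

def dedup_loop_lineage(inputs: List[int],
                       body: Callable[[int], Tuple[Path, Patch]]
                       ) -> Tuple[Dict[Path, Patch], List[Tuple[Path, int]]]:
    patches: Dict[Path, Patch] = {}
    items: List[Tuple[Path, int]] = []
    for x in inputs:
        path, patch = body(x)
        # Keep only one patch per distinct path; later iterations on the
        # same path refer to it instead of storing a full lineage trace.
        patches.setdefault(path, patch)
        items.append((path, x))
    return patches, items

def body(x: int) -> Tuple[Path, Patch]:
    """Hypothetical loop body with a nested if block."""
    outer = x % 2 == 0
    inner = outer and x % 4 == 0
    if outer:
        patch = ("mult", "plus") if inner else ("mult",)
    else:
        patch = ("minus",)
    return (outer, inner), patch
```

Eight iterations over `range(8)` produce only three patches, one per distinct `(outer, inner)` path, while every iteration keeps a small item pointing to its patch.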
This patch fixes size propagation issues during parsing and
recompilation for rbind/cbind operations over lists into a single
matrix. Together with other rewrites, the incorrect size propagation led
to invalid runtime plans.

However, the additional tests with CV-lm still require an assertion to
allow function inlining as a precondition for the fold-rewrite to
eliminate redundancy. Solving this remaining issue requires a principled
size propagation approach for matrix objects in lists.
This patch adds a new rewrite to partially reuse tsmm
results in StepLM (forward).
- Privacy Constraint support for GLM
- Privacy tests for GLM
- Improved exception handling of federated responses
- Log of checked privacy constraints

This is to give the federated master information about which privacy
constraints were violated and to be able to throw the actual exception
on the master side.

Add Initial Implementation of Checked Privacy Constraints Log

This will enable the user to check which privacy constraints were
retrieved during the handling of a federated instruction. This is an
initial implementation: the checked privacy constraints are added to the
federated response, but they are never retrieved by the federated master.
If the privacy constraint is null for a checked data object, this is
currently not logged. This could easily be changed by moving the put
operation before the privacy constraint null check in the
PrivacyMonitor.

Closes apache#946
- New builtin for identifying cells which violate length constraints.
- Replacing OutputInfo.CSVOutputInfo with Types.FileFormat.CSV

1. Operations are now consistent with their semantics i.e.,
   dropInvalidLength and dropInvalidType
2. Instead of identifying the invalid cells, "dropInvalidLength"
   now replaces the invalid values with null and returns a frame
3. Binary method changed from MMBinaryMethod.MR_BINARY_R to
   MMBinaryMethod.MR_BINARY_M
4. Spark broadcast replaced with PartitionedBroadcast
Exclude protobuf from Javadocs

Closes apache#923
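The revised `dropInvalidLength` behavior (replacing invalid values with null instead of only identifying invalid cells) can be illustrated with a small sketch. It assumes, for illustration only, that the length constraint is a maximum string length per column and that the frame is a list of rows with `None` as null; the Python function below is hypothetical and does not reproduce the SystemDS builtin's signature.

```python
from typing import List, Optional

Frame = List[List[Optional[str]]]

def drop_invalid_length(frame: Frame, max_len: List[int]) -> Frame:
    """Return a new frame where cells longer than their column limit become None."""
    return [[None if v is not None and len(v) > max_len[j] else v
             for j, v in enumerate(row)]
            for row in frame]
```

For example, with limits `[3, 1]`, the row `["abcd", "yz"]` becomes `[None, None]`, while `["ab", "x"]` is returned unchanged.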
arnabp and others added 30 commits September 25, 2020 17:00
Add strings to the Python interface to enable calling SystemDS without
enforcing a transfer and parsing into Python.
This PR improves propagation of fine-grained privacy constraints for
matrix multiplications. The PR also provides a new structure for
fine-grained privacy propagations by introducing a "Propagator"
interface which are implemented by different propagator classes. This
interface will be used in the following implementations of privacy
propagation for other operators.

The new matrix multiplication propagation is more efficient than the
previous implementation since it builds arrays with the summarized
privacy levels of the rows of the first matrix and the columns of the
second matrix.
Furthermore, it takes the operator type into account. This means that
if a row or column contains only a single non-zero value, it cannot be
considered an aggregation, hence the output in case of the
PrivateAggregation privacy level in the input should still be
PrivateAggregation. The rules of propagation are implemented in the
method "PrivacyPropagator.corePropagation", where the comment also
details the privacy "truth table".
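The row/column summarization for matrix multiplication can be sketched as follows. Assumptions, stated loudly: privacy levels are ordered None < PrivateAggregation < Private; output cell (i, j) starts from the strongest level found in row i of the first matrix and column j of the second; and a cell counts as an aggregation only if both that row and that column contain more than one non-zero value. The truth table in `core_propagation` (PrivateAggregation inputs to a genuine aggregation yield None) is an illustrative assumption; the authoritative rules are the ones documented in `PrivacyPropagator.corePropagation`. All Python names are hypothetical.

```python
from enum import IntEnum
from typing import List, Tuple

class Level(IntEnum):
    NONE = 0
    PRIVATE_AGGREGATION = 1
    PRIVATE = 2

def core_propagation(level: Level, is_aggregation: bool) -> Level:
    # Assumed rule: a real aggregation releases PrivateAggregation data;
    # Private always stays Private, None always stays None.
    if level == Level.PRIVATE_AGGREGATION and is_aggregation:
        return Level.NONE
    return level

def summarize(levels: List[List[Level]], values: List[List[float]]) -> List[Tuple[Level, int]]:
    """Per row: (strongest privacy level, number of non-zero values)."""
    return [(max(lr), sum(1 for v in vr if v != 0)) for lr, vr in zip(levels, values)]

def transpose(m: list) -> list:
    return [list(c) for c in zip(*m)]

def propagate_matmul(levels_a, a, levels_b, b) -> List[List[Level]]:
    rows = summarize(levels_a, a)                        # rows of the first matrix
    cols = summarize(transpose(levels_b), transpose(b))  # columns of the second matrix
    out = []
    for r_level, r_nnz in rows:
        out_row = []
        for c_level, c_nnz in cols:
            # A single non-zero in the row or column means no real aggregation.
            is_agg = r_nnz > 1 and c_nnz > 1
            out_row.append(core_propagation(max(r_level, c_level), is_agg))
        out.append(out_row)
    return out
```

Summarizing rows and columns once makes each output cell an O(1) lookup instead of a scan over the full row and column.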

- Edit Fine-Grained Constraint Propagation in Matrix Multiplications
- The new version will take operator type into account when propagating
  and will summarize the privacy level of rows and columns of the input
  matrices to make a faster propagation. The new implementation needs
  further test cases, which will be added in future commits.
- Add Tests of Matrix Multiplication Privacy Propagation
- Refactor Matrix Multiplication Propagation By Introducing
  the Propagator Interface
- Add Optimized PrivateFirst Propagator

Closes apache#1060
Adds FederatedLocalData so that we can use local data without the
necessity to send it to a worker. This allows reusing a lot of code, but
might lead to overhead. Other options to handle this scenario exist.

- Adds support for local data rbind and cbind.
- Fix federated rbind/cbind with support for local data
- Adds `FederatedLocalData` so that we can use local data without the
  necessity to send it to a worker. This allows reusing a lot of code,
  but might lead to overhead.
- Add return comment to `FederatedData.copyWithNewID()`
- Ignore failing privacy transfer tests

Closes apache#1062
… Also:

* Used EXCEPTION_EXPECTED instead of "true" with one test case because it makes the code clearer and silences a warning.
- tweak heading numbering and tex syntax
- clustering header site name and table link
- minor changes to regression
- minor tweaks in matrix factorization
- minor tweaks in survival analysis
- correct the syntax
- correct the syntax for descriptive statistics
Fix relative path when inside site

closes apache#1072
replace the rbind/cbind with indexing
rand call is updated with a seed value
for loop converted to parfor
syntax reformations suggested by Arnab
After parfor operations, a result-merge implementation merges the partial
results from parfor workers into the final result variables. In case of
remote parfor, we have (#result variables x #parfor tasks) files, which,
in case of in-memory result merge, are read into the driver, aggregated,
and finally deleted. In sub-optimal cluster configurations, the delete can
have substantial latency (independent of file size). To mitigate this
latency, we now delete these files in a multi-threaded manner, which
showed good performance. Note that we refrain from asynchronous deletion
to avoid synchronization in case of parfor loops in surrounding
while/for loops (where the same files might be written multiple times).
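The synchronous but multi-threaded deletion described above follows a common thread-pool pattern. This Python version only illustrates that pattern (SystemDS itself is Java and deletes files through its own I/O layer); `delete_files_parallel` and its default thread count are hypothetical. The call returns only after every delete has finished, which mirrors the decision to avoid asynchronous deletion.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable

def delete_files_parallel(paths: Iterable[str], num_threads: int = 8) -> None:
    """Delete files concurrently and block until all deletes are done."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Consuming the map iterator waits for every task and re-raises the
        # first deletion error, so callers never see half-finished cleanup.
        list(pool.map(os.remove, paths))
```

Because the per-file latency is independent of file size, overlapping many small deletes hides most of it while keeping the surrounding loop free of extra synchronization.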
SYSTEMDS 2543-2544 Federated Aggregations:

- Federated Min + Max col and row aggregation
- Federated mean and sum aggregations

Closes apache#1040
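For federated aggregations over row-partitioned data, each worker can return partial results that the coordinator combines. The sketch below illustrates this for column means and maxima; it is a generic illustration with hypothetical names, not the SystemDS federated instruction code, and it assumes every worker holds at least one row. Column means are combined from sums and row counts, because averaging the local means would be wrong for partitions of unequal size.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def local_col_stats(part: Matrix) -> Tuple[List[float], List[float], int]:
    """Worker side: per-column sums and maxima plus the local row count."""
    cols = list(zip(*part))
    return [sum(c) for c in cols], [max(c) for c in cols], len(part)

def federated_col_mean_max(parts: List[Matrix]) -> Tuple[List[float], List[float]]:
    """Coordinator side: combine the partial results of all workers."""
    stats = [local_col_stats(p) for p in parts]  # remote calls in a real system
    n_rows = sum(n for _, _, n in stats)
    sums = [sum(col) for col in zip(*(s for s, _, _ in stats))]
    maxs = [max(col) for col in zip(*(m for _, m, _ in stats))]
    # Weight by row counts: divide global sums by the global row count.
    return [s / n_rows for s in sums], maxs
```

With partitions `[[1, 2], [3, 4]]` and `[[5, 6]]`, the combined column means are `[3.0, 4.0]`, whereas averaging the local means would give 3.5 for the first column.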
This patch adds more robust error handling for the scenario of
dml-bodied builtin functions whose scripts are loaded but the
functions are unavailable (e.g., due to typos).

Furthermore, this also includes a fix of the lasso builtin function,
where last minute renaming of function arguments (for consistency)
failed the tests.