Added Developers-tools-systemds.md#5
Open
omkarghugarkar007 wants to merge 795 commits into
This patch
- extends reusable opcodes, which primarily improves multilevel (statementblock) cache hits,
- removes most of the System.nanoTime calls from cache logic,
- replaces operand names with placeholders in datagen lineage items (note: this fix is temporarily commented out due to a bug in parfor-lineage),
- fixes a bug in lineage item creation for multilevel caching,
- updates the grid search lineage test with a loss function.
This patch makes further robustness improvements to the handling of large lineage DAGs via non-recursive primitives. In this context, the explain output needed special treatment to preserve the previous output in DFS order w/ post-append. Furthermore, this also fixes a number of issues of the parfor integration such as (1) invalid cached hashes after sub-DAG replacement, (2) introduced cycles during parfor lineage merge, (3) steplm script improvements (disabled parfor dependency analysis was hiding the issue that introduced the cycles), and (4) some debugging functionality to reliably detect cycles in lineage DAGs.
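The non-recursive DFS with post-append mentioned above can be illustrated with a minimal sketch. It is not the SystemDS implementation: the function name `postorder_iterative` and the dictionary-based DAG representation are assumptions made for illustration. It shows how an explicit stack reproduces the order of a recursive post-order traversal without risking a stack overflow on very deep DAGs.

```python
def postorder_iterative(root, inputs):
    """Return the nodes reachable from `root` in DFS post-order, without recursion.

    `inputs` maps a node to the list of its input nodes (a hypothetical,
    simplified stand-in for a lineage DAG).
    """
    visited = {root}
    order = []
    # Each stack entry is (node, index of the next input to explore).
    stack = [(root, 0)]
    while stack:
        node, i = stack.pop()
        children = inputs.get(node, [])
        if i < len(children):
            # Revisit this node after input i has been fully explored.
            stack.append((node, i + 1))
            child = children[i]
            if child not in visited:
                visited.add(child)
                stack.append((child, 0))
        else:
            # All inputs are done: post-append the node itself.
            order.append(node)
    return order


dag = {"out": ["a", "b"], "a": ["x"], "b": ["x", "y"]}
print(postorder_iterative("out", dag))
```

Shared inputs such as `x` are appended only once, at their first visit, which matches what a recursive traversal with a visited set would produce.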
This patch disables lineage-based reuse for update-inplace left indexing operations, as reuse would create incorrect results due to later in-place updates that change the cached data object. Furthermore, this patch also aims to make the codegen tests more robust wrt the surefire github action integration (less explain output).
This patch enables caching of multi-return instructions like eigen. Furthermore, this includes a new partial rewrite and adds PCA as a test case.
* img_brightness()
* img_crop()
* img_mirror()

Closes apache#959.
Closes apache#954. Closes apache#961.
This patch fixes bugs in handling of multi-level cache duplicates, eviction and reading from disk. This also adds a new test, l2svm.
Minor change to the pom to update and re-enable code coverage in testing. To test, use the following command (replace ??? with the package name): `mvn test -DskipTests=false -Dtest=org.apache.sysds.???` This uses jacoco to produce a folder in `target/site` containing a webpage that shows coverage. Closes apache#956
Adds Federated prefix to instructions, so the statistics returned show federated instruction executions just like Spark or GPU instructions. Minor fix in Startup of worker allowing log4j to work again. Closes apache#970
This commit moves the documentation back to the master branch. It also cleans up the previous documentation (by deleting it), so that we have a clean start. Furthermore, this commit merges the documentation on master back into the webpage documentation. Related PRs: apache#949 apache#922 Discussion Mails: https://tinyurl.com/yal7fd3r https://preview.tinyurl.com/yal7fd3r
This patch enables reuse for rand(matrix) and a few more instructions. Furthermore, it fixes a bug in eviction logic that was forming cycles in the linked lists.
This patch contains a rewrite to reuse tsmm result in lmDS if called after PCA incrementally for increasing number of columns.
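One way such a partial reuse can work is sketched below. This illustrates the underlying algebra and is not the rewrite from the patch; the helper names `tsmm` and `tsmm_extend` and the list-of-rows matrix layout are assumptions. When a column `d` is appended to `X`, the new `t(X) %*% X` contains the cached result as its upper-left block, so only one extra row and column have to be computed.

```python
def tsmm(X):
    """Compute t(X) %*% X for a matrix given as a list of rows."""
    n = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(n)]
            for i in range(n)]


def tsmm_extend(cached, X, d):
    """Extend a cached t(X) %*% X to t(cbind(X, d)) %*% cbind(X, d).

    Only t(X) %*% d and t(d) %*% d are computed; the cached block is reused.
    """
    n = len(cached)
    xtd = [sum(row[i] * dv for row, dv in zip(X, d)) for i in range(n)]
    dtd = sum(dv * dv for dv in d)
    out = [cached[i] + [xtd[i]] for i in range(n)]  # old block plus new column
    out.append(xtd + [dtd])                         # new last row
    return out


X = [[1, 2], [3, 4], [5, 6]]
d = [7, 8, 9]
cached = tsmm(X)
full = tsmm([row + [dv] for row, dv in zip(X, d)])
print(tsmm_extend(cached, X, d) == full)
```

The extension touches each row of `X` once per existing column, instead of once per pair of columns as in the full recomputation.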
Extend Python API with more operations: - rev, t, order, cholesky, trigonometric ops (sin, cos, tan, asin, acos, atan, sinh, cosh, tanh) Also including Test cases and Docs update. Closes apache#975.
This patch fixes the logic of IPA scalar propagation into functions with multiple function calls. Similar to sizes, we check if literal function arguments have consistent values and propagate valid ones. However, this check had a logic problem of only checking if the first call was a literal. This missed cases where the first call had a scalar variable but the second call a valid scalar literal that could have been propagated individually.
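The consistency check described above might look roughly like the following sketch. It is one reading of the commit message, not the IPA pass itself: the function name `propagatable_literals` and the `("lit", value)` / `("var", name)` argument encoding are hypothetical. Every call site is inspected per argument position, so a variable in one position of the first call does not block a consistent literal in another position.

```python
def propagatable_literals(calls):
    """Find argument positions whose value can be propagated into a function.

    `calls` holds one argument list per call site of the same function; each
    argument is ("lit", value) or ("var", name). All call sites are assumed
    to have the same number of arguments.
    Returns {position: value} for positions where every call site passes the
    same literal.
    """
    if not calls:
        return {}
    result = {}
    for pos in range(len(calls[0])):
        args = [call[pos] for call in calls]
        # Check all call sites, not only the first one.
        if all(kind == "lit" for kind, _ in args):
            values = {value for _, value in args}
            if len(values) == 1:
                result[pos] = args[0][1]
    return result


calls = [
    [("var", "a"), ("lit", 2)],  # first call: variable, literal
    [("lit", 1), ("lit", 2)],    # second call: literal, literal
]
print(propagatable_literals(calls))
```

Here only the second argument is reported, because the first argument is a variable at one of the call sites.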
This patch adds msvm w/ remote_spark parfor workers to the test suite and fixes missing support for tak+ operators in the recompute-by-lineage utility.
Adds support for Protobuf file format, for both reads and write. AMLS project SS2020, part 1 Closes apache#971
This patch adds basic lineage support to the MLContext API. Since in-memory objects are directly bound to the symbol table, lineage tracing views these objects as literals and incorrectly reused intermediates even if different in-memory objects were used in subsequent mlcontext invocations.
This patch makes a major refactoring of the lineage deduplication framework, including removed indirections and support for while loops and nested if program blocks. We now drop support for nested loops but this is fine as they are anyway split into many items and the biggest benefit comes from the last-level loop. In contrast, nested if blocks are critical in practice and this required a more generic collection of the lineage patches for all distinct paths (which we still do in a single pass over the loop body program). Additionally, we now support while loops with an integration very similar to for loops.
This patch fixes size propagation issues during parsing and recompilation for rbind/cbind operations over lists into a single matrix. Together with other rewrites, the incorrect size propagation led to invalid runtime plans. However, the additional tests with CV-lm still require an assertion to allow function inlining as a precondition for the fold-rewrite to eliminate redundancy. Solving this remaining issue requires a principled size propagation approach for matrix objects in lists.
This patch adds a new rewrite to partially reuse tsmm results in StepLM (forward).
- Privacy Constraint support for GLM
- Privacy tests for GLM
- Improved exception handling of federated responses
- Log of checked privacy constraints

This is to give the federated master information about which privacy constraints were violated and to be able to throw the actual exception on the master side.

Add Initial Implementation of Checked Privacy Constraints Log

This will enable the user to check which privacy constraints were retrieved during handling of federated instruction. This is an initial implementation since the checked privacy constraints are added to the federated response, but this is never retrieved by the federated master. If the privacy constraint is null for a checked data object, this is currently not logged. This could easily be changed by moving the put operation before the privacy constraint null check in the PrivacyMonitor.

Closes apache#946
- New builtin for identifying cells which violate the length constraint.
- Replacing OutputInfo.CSVOutputInfo with Types.FileFormat.CSV

1. Operations are now consistent with their semantics, i.e., dropInvalidLength and dropInvalidType
2. Instead of identifying the invalid cells, "dropInvalidLength" now replaces the invalid values with null and returns a frame
3. Binary method changed from MMBinaryMethod.MR_BINARY_R to MMBinaryMethod.MR_BINARY_M
4. Spark broadcast replaced with PartitionedBroadcast
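The replace-with-null semantics described for "dropInvalidLength" can be sketched as follows. This is an illustration only and does not reflect the builtin's actual signature: the function name `drop_invalid_length`, the list-of-rows frame, and the per-column `max_len` list are assumptions.

```python
def drop_invalid_length(frame, max_len):
    """Replace every cell whose string length exceeds its column limit with None.

    `frame` is a list of rows; `max_len[j]` is the assumed maximum length
    for column j. Returns a new frame of the same shape.
    """
    return [
        [v if v is None or len(str(v)) <= max_len[j] else None
         for j, v in enumerate(row)]
        for row in frame
    ]


frame = [["anna", "12345"], ["bartholomew", "123"]]
print(drop_invalid_length(frame, [6, 4]))
```

Returning a frame of the same shape, rather than a mask of invalid cells, lets the result feed directly into later cleaning steps.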
exclude protobuf in Jdocs Closes apache#923
Add strings to the python interface to enable calling SystemDS without enforcing a transfer and parsing into python.
This PR improves propagation of fine-grained privacy constraints for matrix multiplications. The PR also provides a new structure for fine-grained privacy propagations by introducing a "Propagator" interface which is implemented by different propagator classes. This interface will be used in the following implementations of privacy propagation for other operators. The new matrix multiplication propagation is more efficient than the previous implementation since it makes an array with the summarized privacy level of the rows of the first matrix and the columns of the second matrix. Furthermore, it takes the operator type into account. This means that if a row or column contains only a single non-zero value, it cannot be considered an aggregation, hence the output in case of the PrivateAggregation privacy level in the input should still be PrivateAggregation. The rules of propagation are implemented in the method "PrivacyPropagator.corePropagation", where the comment also details the privacy "truth table".
- Edit Fine-Grained Constraint Propagation in Matrix Multiplications
- The new version will take operator type into account when propagating and will summarize the privacy level of rows and columns of the input matrices to make a faster propagation. The new implementation needs further test cases, which will be added in future commits.
- Add Tests of Matrix Multiplication Privacy Propagation
- Refactor Matrix Multiplication Propagation By Introducing the Propagator Interface
- Add Optimized PrivateFirst Propagator

Closes apache#1060
Adds FederatedLocalData so that we can use local data without the necessity to send it to a worker. This allows reusing a lot of code, but might lead to overhead. Other options to handle this scenario exist. - Adds support for local data rbind and cbind. - Fix federated rbind/cbind with support for local data - Adds `FederatedLocalData` so that we can use local data without the necessity to send it to a worker. This allows reusing a lot of code, but might lead to overhead. - Add return comment to `FederatedData.copyWithNewID()` - Ignore failing privacy transfer tests Closing apache#1062
… Also: * Used EXCEPTION_EXPECTED instead of "true" with one test case because it makes the code clearer and silences a warning.
- tweak heading numbering and tex syntax
- clustering header site name and table link
- minor changes to regression
- minor tweaks in matrix factorization
- minor tweaks in survival analysis
- correct the syntax
- correct the syntax for descriptive statistics
Fix relative path when inside site closes apache#1072
replace the rbind/cbind with indexing; rand call is updated with a seed value
for loop converted to parfor; syntax reformatting suggested by Arnab
After parfor operations, a result-merge implementation merges the partial results from parfor workers into the final result variables. In case of remote parfor, we have #result variables x #parfor tasks files, which - in case of in-memory result merge - are read into the driver, aggregated, and finally deleted. In sub-optimal cluster configurations, the delete can have substantial latency (independent of file size). To mitigate this latency we now delete these files in a multi-threaded manner, which showed good performance. Note that we refrain from asynchronous deletion to avoid synchronization in case of parfor loops in surrounding while/for loops (where the same files might be written multiple times).
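The idea of a multi-threaded but still synchronous delete can be sketched as below. This is not the SystemDS code: the function name `delete_files_parallel`, the thread count, and the use of local files instead of a distributed file system are assumptions. The call blocks until every file is gone, so a surrounding loop can safely rewrite the same paths afterwards.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def delete_files_parallel(paths, num_threads=8):
    """Delete all given files concurrently and return the number removed.

    The function only returns once all deletes have finished, i.e. the
    deletion is multi-threaded but not asynchronous.
    """
    def _delete(path):
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            return False

    # Leaving the `with` block waits for all submitted deletes to complete.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return sum(pool.map(_delete, paths))


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for k in range(20):
        p = os.path.join(tmp, "result_%d.bin" % k)
        with open(p, "w") as f:
            f.write("partial result")
        paths.append(p)
    print(delete_files_parallel(paths))
```

Overlapping the per-file latency across threads is what helps when each delete is slow regardless of file size.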
SYSTEMDS 2543-2544 Federated Aggregations: - Federated Min + Max col and row aggregation - Federated mean and sum aggregations Closes apache#1040
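A federated column aggregation can be pictured as a two-step computation: each worker aggregates its local partition, and the master combines the partial results. The sketch below assumes row-partitioned data and uses hypothetical helper names (`local_col_stats`, `combine_col_stats`); it does not reflect the federated instruction implementation. Note that the mean is derived from partial sums and row counts, since averaging the workers' local means would be wrong for partitions of unequal size.

```python
def local_col_stats(part):
    """Per-worker step: column-wise sum, min, max, and row count of one partition."""
    cols = list(zip(*part))
    return {
        "sum": [sum(c) for c in cols],
        "min": [min(c) for c in cols],
        "max": [max(c) for c in cols],
        "rows": len(part),
    }


def combine_col_stats(partials):
    """Master step: merge the partial results of all workers."""
    n = sum(p["rows"] for p in partials)
    sums = [sum(v) for v in zip(*(p["sum"] for p in partials))]
    return {
        "sum": sums,
        "mean": [s / n for s in sums],
        "min": [min(v) for v in zip(*(p["min"] for p in partials))],
        "max": [max(v) for v in zip(*(p["max"] for p in partials))],
    }


# Two hypothetical workers holding 2 and 1 rows of the same 2-column matrix.
parts = [[[1, 10], [2, 20]], [[3, 30]]]
print(combine_col_stats([local_col_stats(p) for p in parts]))
```

Only the small per-column summaries travel to the master, not the partitions themselves.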
AMLS project SS2020. Closes apache#1070.
AMLS project SS2020. Closes apache#1071.
This patch adds more robust error handling for the scenario of dml-bodied builtin functions whose scripts are loaded but the functions are unavailable (e.g., due to typos). Furthermore, this also includes a fix of the lasso builtin function, where last minute renaming of function arguments (for consistency) failed the tests.
Added the doc for IntelliJ instructions