feat(gui): add the notebook migration tool #3819
…to workflow migration tool
…pache#3795) Bumps [@babel/helpers](https://github.com/babel/babel/tree/HEAD/packages/babel-helpers) from 7.25.7 to 7.28.4. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/babel/babel/releases"><code>@babel/helpers</code>'s releases</a>.</em></p> <blockquote> <h2>v7.28.4 (2025-09-05)</h2> <p>Thanks <a href="https://github.com/gwillen"><code>@gwillen</code></a> and <a href="https://github.com/mrginglymus"><code>@mrginglymus</code></a> for your first PRs!</p> <h4>:house: Internal</h4> <ul> <li><code>babel-core</code>, <code>babel-helper-check-duplicate-nodes</code>, <code>babel-traverse</code>, <code>babel-types</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17493">#17493</a> Update Jest to v30.1.1 (<a href="https://github.com/JLHwung"><code>@JLHwung</code></a>)</li> </ul> </li> <li><code>babel-plugin-transform-regenerator</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17455">#17455</a> chore: Clean up <code>transform-regenerator</code> (<a href="https://github.com/liuxingbaoyu"><code>@liuxingbaoyu</code></a>)</li> </ul> </li> <li><code>babel-core</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17474">#17474</a> Switch to <code>@jridgewell/remapping</code> (<a href="https://github.com/mrginglymus"><code>@mrginglymus</code></a>)</li> </ul> </li> </ul> <h4>Committers: 5</h4> <ul> <li>Babel Bot (<a href="https://github.com/babel-bot"><code>@babel-bot</code></a>)</li> <li>Bill Collins (<a href="https://github.com/mrginglymus"><code>@mrginglymus</code></a>)</li> <li>Glenn Willen (<a href="https://github.com/gwillen"><code>@gwillen</code></a>)</li> <li>Huáng Jùnliàng (<a href="https://github.com/JLHwung"><code>@JLHwung</code></a>)</li> <li><a href="https://github.com/liuxingbaoyu"><code>@liuxingbaoyu</code></a></li> </ul> <h2>v7.28.3 (2025-08-14)</h2> <h4>:eyeglasses: Spec Compliance</h4> <ul> <li><code>babel-helper-create-class-features-plugin</code>, 
<code>babel-plugin-proposal-decorators</code>, <code>babel-plugin-transform-class-static-block</code>, <code>babel-preset-env</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17443">#17443</a> [static blocks] Do not inject new static fields after static code (<a href="https://github.com/nicolo-ribaudo"><code>@nicolo-ribaudo</code></a>)</li> </ul> </li> </ul> <h4>:bug: Bug Fix</h4> <ul> <li><code>babel-parser</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17465">#17465</a> fix(parser/typescript): parse <code>import("./a", {with:{},})</code> (<a href="https://github.com/easrng"><code>@easrng</code></a>)</li> <li><a href="https://redirect.github.com/babel/babel/pull/17478">#17478</a> fix(parser): stop subscript parsing on async arrow (<a href="https://github.com/JLHwung"><code>@JLHwung</code></a>)</li> </ul> </li> </ul> <h4>:nail_care: Polish</h4> <ul> <li><code>babel-plugin-transform-regenerator</code>, <code>babel-plugin-transform-runtime</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17363">#17363</a> Do not save last yield in call in temp var (<a href="https://github.com/nicolo-ribaudo"><code>@nicolo-ribaudo</code></a>)</li> </ul> </li> </ul> <h4>:memo: Documentation</h4> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17448">#17448</a> move eslint-{parser,plugin} docs to the website (<a href="https://github.com/JLHwung"><code>@JLHwung</code></a>)</li> </ul> <h4>:house: Internal</h4> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17454">#17454</a> Enable type checking for <code>scripts</code> and <code>babel-worker.cjs</code> (<a href="https://github.com/JLHwung"><code>@JLHwung</code></a>)</li> </ul> <h4>:microscope: Output optimization</h4> <ul> <li><code>babel-plugin-proposal-destructuring-private</code>, <code>babel-plugin-proposal-do-expressions</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17444">#17444</a> Optimize do expression output 
(<a href="https://github.com/JLHwung"><code>@JLHwung</code></a>)</li> </ul> </li> </ul> <h4>Committers: 5</h4> <ul> <li>Babel Bot (<a href="https://github.com/babel-bot"><code>@babel-bot</code></a>)</li> <li>Huáng Jùnliàng (<a href="https://github.com/JLHwung"><code>@JLHwung</code></a>)</li> <li>Jam Balaya (<a href="https://github.com/JamBalaya56562"><code>@JamBalaya56562</code></a>)</li> <li>Nicolò Ribaudo (<a href="https://github.com/nicolo-ribaudo"><code>@nicolo-ribaudo</code></a>)</li> <li>easrng (<a href="https://github.com/easrng"><code>@easrng</code></a>)</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/babel/babel/blob/main/CHANGELOG.md"><code>@babel/helpers</code>'s changelog</a>.</em></p> <blockquote> <h2>v7.28.4 (2025-09-05)</h2> <h4>:house: Internal</h4> <ul> <li><code>babel-core</code>, <code>babel-helper-check-duplicate-nodes</code>, <code>babel-traverse</code>, <code>babel-types</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17493">#17493</a> Update Jest to v30.1.1 (<a href="https://github.com/JLHwung"><code>@JLHwung</code></a>)</li> </ul> </li> <li><code>babel-plugin-transform-regenerator</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17455">#17455</a> chore: Clean up <code>transform-regenerator</code> (<a href="https://github.com/liuxingbaoyu"><code>@liuxingbaoyu</code></a>)</li> </ul> </li> <li><code>babel-core</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17474">#17474</a> Switch to <code>@jridgewell/remapping</code> (<a href="https://github.com/mrginglymus"><code>@mrginglymus</code></a>)</li> </ul> </li> </ul> <h2>v7.28.3 (2025-08-14)</h2> <h4>:eyeglasses: Spec Compliance</h4> <ul> <li><code>babel-helper-create-class-features-plugin</code>, <code>babel-plugin-proposal-decorators</code>, <code>babel-plugin-transform-class-static-block</code>, 
<code>babel-preset-env</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17443">#17443</a> [static blocks] Do not inject new static fields after static code (<a href="https://github.com/nicolo-ribaudo"><code>@nicolo-ribaudo</code></a>)</li> </ul> </li> </ul> <h4>:bug: Bug Fix</h4> <ul> <li><code>babel-parser</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17465">#17465</a> fix(parser/typescript): parse <code>import("./a", {with:{},})</code> (<a href="https://github.com/easrng"><code>@easrng</code></a>)</li> <li><a href="https://redirect.github.com/babel/babel/pull/17478">#17478</a> fix(parser): stop subscript parsing on async arrow (<a href="https://github.com/JLHwung"><code>@JLHwung</code></a>)</li> </ul> </li> </ul> <h4>:nail_care: Polish</h4> <ul> <li><code>babel-plugin-transform-regenerator</code>, <code>babel-plugin-transform-runtime</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17363">#17363</a> Do not save last yield in call in temp var (<a href="https://github.com/nicolo-ribaudo"><code>@nicolo-ribaudo</code></a>)</li> </ul> </li> </ul> <h4>:memo: Documentation</h4> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17448">#17448</a> move eslint-{parser,plugin} docs to the website (<a href="https://github.com/JLHwung"><code>@JLHwung</code></a>)</li> </ul> <h4>:house: Internal</h4> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17454">#17454</a> Enable type checking for <code>scripts</code> and <code>babel-worker.cjs</code> (<a href="https://github.com/JLHwung"><code>@JLHwung</code></a>)</li> </ul> <h4>:microscope: Output optimization</h4> <ul> <li><code>babel-plugin-proposal-destructuring-private</code>, <code>babel-plugin-proposal-do-expressions</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17444">#17444</a> Optimize do expression output (<a href="https://github.com/JLHwung"><code>@JLHwung</code></a>)</li> </ul> </li> </ul> <h2>v7.28.2 
(2025-07-24)</h2> <h4>:bug: Bug Fix</h4> <ul> <li><code>babel-types</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17445">#17445</a> [babel 7] Make <code>operator</code> param in <code>t.tsTypeOperator</code> optional (<a href="https://github.com/nicolo-ribaudo"><code>@nicolo-ribaudo</code></a>)</li> </ul> </li> <li><code>babel-helpers</code>, <code>babel-plugin-transform-async-generator-functions</code>, <code>babel-plugin-transform-regenerator</code>, <code>babel-preset-env</code>, <code>babel-runtime-corejs3</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17441">#17441</a> fix: <code>regeneratorDefine</code> compatibility with es5 strict mode (<a href="https://github.com/liuxingbaoyu"><code>@liuxingbaoyu</code></a>)</li> </ul> </li> </ul> <h2>v7.28.1 (2025-07-12)</h2> <h4>:bug: Bug Fix</h4> <ul> <li><code>babel-plugin-transform-async-generator-functions</code>, <code>babel-plugin-transform-regenerator</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17426">#17426</a> fix: <code>regenerator</code> correctly handles <code>throw</code> outside of <code>try</code> (<a href="https://github.com/liuxingbaoyu"><code>@liuxingbaoyu</code></a>)</li> </ul> </li> </ul> <h4>:memo: Documentation</h4> <ul> <li><code>babel-types</code> <ul> <li><a href="https://redirect.github.com/babel/babel/pull/17422">#17422</a> Add missing FunctionParameter docs (<a href="https://github.com/JLHwung"><code>@JLHwung</code></a>)</li> </ul> </li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/babel/babel/commit/35055e392079a65830b7bf5b1d1c1fc4de90a78f"><code>35055e3</code></a> v7.28.4</li> <li><a href="https://github.com/babel/babel/commit/18d88b83c67c8dbbe63e4ac423e6006c4c01b85c"><code>18d88b8</code></a> Improve <code>@babel/core</code> typings (<a href="https://github.com/babel/babel/tree/HEAD/packages/babel-helpers/issues/17471">#17471</a>)</li> <li><a href="https://github.com/babel/babel/commit/ef155f5ca83c73dbc1ea8d95216830b7dc3b0ac2"><code>ef155f5</code></a> v7.28.3</li> <li><a href="https://github.com/babel/babel/commit/741cbd2381ac0cda3afd42bc04454a87d9d8762a"><code>741cbd2</code></a> chore: fix various typos across codebase (<a href="https://github.com/babel/babel/tree/HEAD/packages/babel-helpers/issues/17476">#17476</a>)</li> <li><a href="https://github.com/babel/babel/commit/cac0ff4c3426eed30b4d27e7971b348da7c9f1e6"><code>cac0ff4</code></a> v7.28.2</li> <li><a href="https://github.com/babel/babel/commit/f743094585b39bd9f7a9e3a3561215b2103e2474"><code>f743094</code></a> fix: <code>regeneratorDefine</code> compatibility with es5 strict mode (<a href="https://github.com/babel/babel/tree/HEAD/packages/babel-helpers/issues/17441">#17441</a>)</li> <li><a href="https://github.com/babel/babel/commit/baa4cb8b9f8a551d7dae9042b19ea2f74df6b110"><code>baa4cb8</code></a> v7.27.6</li> <li><a href="https://github.com/babel/babel/commit/fdbf1b32b3aa3705761ff820661e81c0aececab7"><code>fdbf1b3</code></a> fix: <code>finally</code> causes unexpected return value (<a href="https://github.com/babel/babel/tree/HEAD/packages/babel-helpers/issues/17366">#17366</a>)</li> <li><a href="https://github.com/babel/babel/commit/7d069309fdfcedda2928a043f6f7c98135c1242a"><code>7d06930</code></a> v7.27.4</li> <li><a href="https://github.com/babel/babel/commit/5b9468d9bf1ab4f427241673e9f03593da115a69"><code>5b9468d</code></a> Reduce <code>regenerator</code> size more (<a 
href="https://github.com/babel/babel/tree/HEAD/packages/babel-helpers/issues/17287">#17287</a>)</li> <li>Additional commits viewable in <a href="https://github.com/babel/babel/commits/v7.28.4/packages/babel-helpers">compare view</a></li> </ul> </details> <br /> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xinyuan Lin <xinyual3@uci.edu> Co-authored-by: Chris <143021053+kunwp1@users.noreply.github.com>
…e#3818)

### **Purpose**
This PR fixes apache#3804, in which the upload status panel behaved unexpectedly: when there were no queued or active uploads, the UI still rendered empty panels, which was confusing. This PR hides empty panels and restores the clear empty state.

### **Changes**
- Introduce a flag `hasAnyActivity = queuedCount > 0 || activeCount > 0 || pendingChangesCount > 0`
- Conditionally render status panels:
  - Pending only when `queuedCount > 0`
  - Uploading only when `activeCount > 0`
  - Finished only when `hasAnyActivity`
- Restore the empty state: when there is no activity, render `<texera-dataset-staged-objects-list>` outside the collapse so “No pending changes” is visible
- Add a bottom divider beneath the staged list to improve visual separation (the previous `[nzBorder]` was removed to avoid overlapping with the vertical divider)

### **Demonstration**
**Datasets page:**
<img width="1315" height="870" alt="main" src="https://github.com/user-attachments/assets/9112999c-24bc-4076-b139-eb3b405c2288" />

**Finished panel:**
| Collapsed | Expanded (delete) | Expanded (adds) |
|---|---|---|
| <img width="260" src="https://github.com/user-attachments/assets/58fe8ba4-2b13-45a6-b049-318b288ec37e" alt="collapsed" /> | <img width="260" src="https://github.com/user-attachments/assets/21dada5c-0e39-4a9f-b1ee-ab0ede345aca" alt="delete" /> | <img width="260" src="https://github.com/user-attachments/assets/d3a25bea-bee3-4bb7-8023-b407741b01fe" alt="expand" /> |

**Uploading files:** https://github.com/user-attachments/assets/413f9480-330e-456a-8a0a-7f87511fbf13
**Remove files:** https://github.com/user-attachments/assets/04fc0dc0-4f59-4e18-bb1d-72b0132ac943

Co-authored-by: Xinyuan Lin <xinyual3@uci.edu>
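The rendering rules listed under Changes can be sketched as a small pure function. This is only an illustration of the stated conditions, not the component code from the PR: the names `UploadCounts`, `PanelVisibility`, `panelVisibility`, and `showEmptyState` are assumptions introduced here, while `queuedCount`, `activeCount`, `pendingChangesCount`, and `hasAnyActivity` come from the description.

```typescript
// Hypothetical shape for the three counters named in the PR description.
interface UploadCounts {
  queuedCount: number;
  activeCount: number;
  pendingChangesCount: number;
}

// Hypothetical result type: one boolean per panel, plus the empty state.
interface PanelVisibility {
  showPending: boolean;
  showUploading: boolean;
  showFinished: boolean;
  showEmptyState: boolean;
}

function panelVisibility(counts: UploadCounts): PanelVisibility {
  // Flag as written in the PR description.
  const hasAnyActivity =
    counts.queuedCount > 0 || counts.activeCount > 0 || counts.pendingChangesCount > 0;

  return {
    showPending: counts.queuedCount > 0, // Pending only when something is queued
    showUploading: counts.activeCount > 0, // Uploading only when something is active
    showFinished: hasAnyActivity, // Finished only when there is any activity
    showEmptyState: !hasAnyActivity, // otherwise show the staged list outside the collapse
  };
}
```

With all three counters at zero, only `showEmptyState` is true, which corresponds to the restored “No pending changes” view.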
…#3597)

### Purpose
This PR fixes a text-wrapping issue in workflow comments, where words were broken apart at line wraps, reducing readability. Closes apache#3595.

### Changes
- Edited CSS in `nz-modal-comment-box.component.scss` to fix wrapping

### Before:
<img width="565" height="434" alt="Screenshot (229)" src="https://github.com/user-attachments/assets/e05d44b5-c11d-45a8-88ed-a60d5bd48776" />

### After:
<img width="530" height="433" alt="Screenshot (230)" src="https://github.com/user-attachments/assets/970a8aa7-54d6-41ba-a774-0fb47b93db52" />

Co-authored-by: Xinyuan Lin <xinyual3@uci.edu>
This PR improves the template for creating GitHub issues of type "Bug", based on the feedback provided in [this comment](apache#3812 (comment)):
1. Added a Pre-release Version option to the version selection field.
2. Added an optional Commit Hash field for developers to specify the exact commit associated with the issue.
…pache#3635) Bumps [transformers](https://github.com/huggingface/transformers) from 4.44.2 to 4.53.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/huggingface/transformers/releases">transformers's releases</a>.</em></p> <blockquote> <h2>Release v4.53.0</h2> <h3>Gemma3n</h3> <p>Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input, handling text, image, video, and audio input, and generating text outputs, with open weights for pre-trained and instruction-tuned variants. These models were trained with data in over 140 spoken languages.</p> <p>Gemma 3n models use selective parameter activation technology to reduce resource requirements. This technique allows the models to operate at an effective size of 2B and 4B parameters, which is lower than the total number of parameters they contain. For more information on Gemma 3n's efficient parameter management technology, see the <a href="https://ai.google.dev/gemma/docs/gemma-3n#parameters">Gemma 3n</a> page.</p> <p><img src="https://github.com/user-attachments/assets/858cb034-364d-4eb6-8de8-4a0b5eaff3d7" alt="image" /></p> <pre lang="python"><code>from transformers import pipeline import torch <p>pipe = pipeline( "image-text-to-text", torch_dtype=torch.bfloat16, model="google/gemma-3n-e4b", device="cuda", ) output = pipe( "<a href="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg">https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg</a>", text="<image_soft_token> in this image, there is" )</p> <p>print(output) </code></pre></p> <h3>Dia</h3> <p><img src="https://github.com/user-attachments/assets/bf86e887-e4f4-4222-993d-f5eac58f8040" alt="image" /></p> <p>Dia is an opensource text-to-speech (TTS) model (1.6B parameters) developed by <a href="https://huggingface.co/nari-labs">Nari Labs</a>. 
It can generate highly realistic dialogue from transcript including nonverbal communications such as laughter and coughing. Furthermore, emotion and tone control is also possible via audio conditioning (voice cloning).</p> <p><strong>Model Architecture:</strong> Dia is an encoder-decoder transformer based on the original transformer architecture. However, some more modern features such as rotational positional embeddings (RoPE) are also included. For its text portion (encoder), a byte tokenizer is utilized while for the audio portion (decoder), a pretrained codec model <a href="https://github.com/huggingface/transformers/blob/HEAD/dac.md">DAC</a> is used - DAC encodes speech into discrete codebook tokens and decodes them back into audio.</p> <ul> <li>Add Dia model by <a href="https://github.com/buttercrab"><code>@buttercrab</code></a> in <a href="https://redirect.github.com/huggingface/transformers/issues/38405">#38405</a></li> </ul> <h3>Kyutai Speech-to-Text</h3> <!-- raw HTML omitted --> <p>Kyutai STT is a speech-to-text model architecture based on the <a href="https://huggingface.co/docs/transformers/en/model_doc/mimi">Mimi codec</a>, which encodes audio into discrete tokens in a streaming fashion, and a <a href="https://huggingface.co/docs/transformers/en/model_doc/moshi">Moshi-like</a> autoregressive decoder. Kyutai’s lab has released two model checkpoints:</p> <ul> <li><a href="https://huggingface.co/kyutai/stt-1b-en_fr">kyutai/stt-1b-en_fr</a>: a 1B-parameter model capable of transcribing both English and French</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/huggingface/transformers/commit/67ddc82fbc7e52c6f42a395b4a6d278c55b77a39"><code>67ddc82</code></a> Release: v4.53.0</li> <li><a href="https://github.com/huggingface/transformers/commit/0a8081b03d118da9a8c3fa143a03afe54a5c624e"><code>0a8081b</code></a> [Modeling] Fix encoder CPU offloading for whisper (<a href="https://redirect.github.com/huggingface/transformers/issues/38994">#38994</a>)</li> <li><a href="https://github.com/huggingface/transformers/commit/c63cfd6a833d629a74c098933017c61dd755969d"><code>c63cfd6</code></a> Gemma 3n (<a href="https://redirect.github.com/huggingface/transformers/issues/39059">#39059</a>)</li> <li><a href="https://github.com/huggingface/transformers/commit/3e5cc1285503bbdb6a0a3e173b5ae90566862215"><code>3e5cc12</code></a> [tests] remove tests from libraries with deprecated support (flax, tensorflow...</li> <li><a href="https://github.com/huggingface/transformers/commit/cfff7ca9a27280338c6a57dfa7722dcf44f51a87"><code>cfff7ca</code></a> [Whisper] Pipeline: handle long form generation (<a href="https://redirect.github.com/huggingface/transformers/issues/35750">#35750</a>)</li> <li><a href="https://github.com/huggingface/transformers/commit/02ecdcfc0f7d81e90a9c8e7f9e6d636123a84254"><code>02ecdcf</code></a> add _keep_in_fp32_modules_strict (<a href="https://redirect.github.com/huggingface/transformers/issues/39058">#39058</a>)</li> <li><a href="https://github.com/huggingface/transformers/commit/d973e62fdd86d64259f87debc46bbcbf6c7e5de2"><code>d973e62</code></a> fix condition where torch_dtype auto collides with model_kwargs. 
(<a href="https://redirect.github.com/huggingface/transformers/issues/39054">#39054</a>)</li> <li><a href="https://github.com/huggingface/transformers/commit/44b231671db25974cfebcdae34402ad5099bf37a"><code>44b2316</code></a> [qwen2-vl] fix vision attention scaling (<a href="https://redirect.github.com/huggingface/transformers/issues/39043">#39043</a>)</li> <li><a href="https://github.com/huggingface/transformers/commit/ae15715df138949328d18e1dd95fd9cb4efb8e09"><code>ae15715</code></a> polishing docs: error fixes for clarity (<a href="https://redirect.github.com/huggingface/transformers/issues/39042">#39042</a>)</li> <li><a href="https://github.com/huggingface/transformers/commit/3abeaba7e53512ef9c1314163dd7e462ab405ce6"><code>3abeaba</code></a> Create test for <a href="https://redirect.github.com/huggingface/transformers/issues/38916">#38916</a> (custom generate from local dir with imports) (<a href="https://redirect.github.com/huggingface/transformers/issues/39015">#39015</a>)</li> <li>Additional commits viewable in <a href="https://github.com/huggingface/transformers/compare/v4.44.2...v4.53.0">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) You can trigger a rebase of this PR by commenting `@dependabot rebase`. 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xinyuan Lin <xinyual3@uci.edu>
yunyad
left a comment
- Hardcoded Localhost URLs and Tokens
  There are multiple instances of hardcoded URLs and ports (e.g., http://localhost:5000/, http://localhost:8889/). It's recommended to move these into a configuration file or environment variable for better maintainability and portability.
- Use of fetch Instead of Angular HttpClient
  The current implementation uses native fetch inside Angular components. It is recommended to use Angular’s HttpClient service instead.
- Move Mapping Logic to Backend
  The mapping logic appears to reside on the frontend. For consistency and scalability, consider moving it to the backend so that it can be reused across sessions and clients.
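As a rough illustration of the first point, the two addresses could live in a single configuration object with overridable defaults. This is a sketch under assumptions, not code from the PR: `MigrationEndpoints`, `resolveEndpoints`, `joinUrl`, the two field names, and which service listens on which port are all hypothetical; only the two localhost URLs come from the review.

```typescript
// Hypothetical configuration shape; the field names are illustrative only.
interface MigrationEndpoints {
  migrationServiceUrl: string;
  notebookServerUrl: string;
}

// Local-development defaults, taken from the URLs mentioned in the review.
const DEFAULT_ENDPOINTS: MigrationEndpoints = {
  migrationServiceUrl: "http://localhost:5000/",
  notebookServerUrl: "http://localhost:8889/",
};

// Merge deployment-specific overrides (e.g. read from an environment file)
// over the defaults, so components never hardcode an address themselves.
function resolveEndpoints(overrides: Partial<MigrationEndpoints> = {}): MigrationEndpoints {
  return {
    migrationServiceUrl: overrides.migrationServiceUrl ?? DEFAULT_ENDPOINTS.migrationServiceUrl,
    notebookServerUrl: overrides.notebookServerUrl ?? DEFAULT_ENDPOINTS.notebookServerUrl,
  };
}

// Join a base URL and a path with exactly one slash between them.
function joinUrl(base: string, path: string): string {
  return base.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}
```

A component would then build request URLs with something like `joinUrl(resolveEndpoints(env).migrationServiceUrl, "convert")`, where `env` and the `convert` path are again placeholders. The same object could be passed to an Angular service that wraps `HttpClient`.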
When a PR is created or updated, automatically set the author as the assignee. Signed-off-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
## Purpose
This PR enables the user system by default in the configuration.
Currently, the flag defaults to disabled (a.k.a. the non-user mode).
As no one uses the non-user mode and all developers are required to
enable the user system, we have decided to abandon the non-user mode.
## Challenge & Design
The major blocker of setting the flag to be enabled by default is two
e2e test suites that rely on the non-user mode. These two test suites
execute a workflow in the Amber engine in each of their test cases.
Enabling the user mode would require texera_db in the test environment,
as in the user-system mode, the execution of a workflow requires an
`eid` (and subsequently a `vid`, `wid`, and `uid`) in `texera_db`.
We could use `MockTexeraDB`, which is currently used by many unit tests.
`MockTexeraDB` creates an embedded postgres instance per test suite, and
the embedded db is destroyed at the end of each such test suite.
However, a complication is that both e2e test suites access a
singleton resource, `WorkflowExecutionsResource`, which caches the DSL
context from `SqlServer` (i.e., it is evaluated only once per JVM):
```
final private lazy val context = SqlServer
  .getInstance()
  .createDSLContext()
```
In fact, most of the singleton resources in our current codebase cache
the `DSLContext` / Dao, as the `DSLContext` never changes in a real
Texera environment (i.e., the real `texera_db`'s address never
changes).
In the test environment, however, when working with `MockTexeraDB`, that
assumption does not hold, as each instance of `MockTexeraDB` has a
different address and is destroyed before the next test suite runs.
Since all the test suites are executed in the same JVM during a CI run,
using `MockTexeraDB` would cause the 2nd of the two e2e test suites to
fail, because it would still use the DSL context from the 1st test
suite's `MockTexeraDB`.
The diagrams below show what happens when using the embedded
`MockTexeraDB` to run two e2e test suites that both need to access the
same singleton resource during their execution.
The 1st test suite creates an embedded DB (`DB1`) and lets the singleton
`SqlServer` object set its `DSLContext` to point to `DB1`. When the test
cases first access `WorkflowExecutionsResource` (`WER`), WER grabs the
`DSLContext` from `SqlServer` and caches it. `WER` then queries `DB1`
for all the test cases of test suite 1. When test suite 1 finishes,
`DB1` gets destroyed.

Later, in the same JVM, when test suite 2 starts, it also creates its
own embedded DB (`DB2`) and lets `SqlServer` point to `DB2`. However, as
the `DSLContext` in `WER` is cached, it does not get updated when the
test cases access `WER`, so `WER` still points to `DB1`, which is
already destroyed, causing failures.
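The failure sequence described above can be reproduced in a few lines. The real code is Scala and uses jOOQ's `DSLContext`; the TypeScript model below only mirrors the mechanism, and every name in it (`FakeDb`, `FakeSqlServer`, `CachingResource`, `runTwoSuites`) is a hypothetical stand-in rather than part of the Texera codebase.

```typescript
// Stand-in for an embedded database that lives for one test suite.
class FakeDb {
  readonly name: string;
  private alive = true;

  constructor(name: string) {
    this.name = name;
  }

  destroy(): void {
    this.alive = false;
  }

  query(): string {
    if (!this.alive) {
      throw new Error(`${this.name} is already destroyed`);
    }
    return `rows from ${this.name}`;
  }
}

// Stand-in for SqlServer: always knows the database it currently points to.
class FakeSqlServer {
  private current: FakeDb | undefined;

  pointTo(db: FakeDb): void {
    this.current = db;
  }

  getContext(): FakeDb {
    if (this.current === undefined) {
      throw new Error("no database configured");
    }
    return this.current;
  }
}

// Stand-in for WER: fetches the context on first use and caches it,
// which is the behaviour a `lazy val` gives in the Scala code.
class CachingResource {
  private readonly server: FakeSqlServer;
  private cached: FakeDb | undefined;

  constructor(server: FakeSqlServer) {
    this.server = server;
  }

  query(): string {
    if (this.cached === undefined) {
      this.cached = this.server.getContext();
    }
    return this.cached.query();
  }
}

// One server and one resource model the single JVM shared by both suites.
function runTwoSuites(): string[] {
  const log: string[] = [];
  const server = new FakeSqlServer();
  const resource = new CachingResource(server);

  const db1 = new FakeDb("DB1"); // suite 1
  server.pointTo(db1);
  log.push(resource.query());
  db1.destroy();

  const db2 = new FakeDb("DB2"); // suite 2
  server.pointTo(db2);
  try {
    log.push(resource.query());
  } catch (e) {
    log.push((e as Error).message); // the cached context still targets DB1
  }
  return log;
}
```

In this model the second suite's query fails even though the server already points to `DB2`, because the resource never asks the server again.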

To solve this problem, we could either:
1. Avoid caching DSLContext/Dao in the codebase, or
2. Let the two e2e test suites use the same real, external database (as
in the production environment) instead of `MockTexeraDB`.
**We choose the 2nd design, as these two are e2e tests which should
emulate production behavior with a real database.** To avoid polluting
the developer's local `texera_db`, we use a separate test database with
the same schema.
## Changes
- Sets `user-sys` to be enabled by default.
- Introduces a `texera_db_for_test_cases` specifically for test cases
and CIs. `texera_ddl.sql` is updated to allow creating the database with
a name other than `texera_db` (and still defaults to `texera_db`), and
CIs will automatically create `texera_db_for_test_cases` with the same
schema as `texera_db`.
- Updates `DataProcessingSpec` and `PauseSpec` to use
`texera_db_for_test_cases`. The two test suites now populate and clean up
this database during their run.
- Updates `MockTexeraDB` to incorporate the changes to the DDL
script.
- Adds `clearInstance` logic to `SqlServer` so that other
unit tests that use `MockTexeraDB` can properly clear their instance in
`SqlServer` and do not interfere with the two e2e
tests.
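The idea behind the `clearInstance` change can be sketched as a resettable singleton. This is a TypeScript model written under assumptions, not the Scala implementation in `SqlServer`: `ConnectionHolder`, `initialize`, `runSuite`, and the URLs in the usage note are hypothetical, and only the name `clearInstance` is taken from the description.

```typescript
// Hypothetical stand-in for a process-wide connection holder such as SqlServer.
class ConnectionHolder {
  private static instance: ConnectionHolder | undefined;
  readonly url: string;

  private constructor(url: string) {
    this.url = url;
  }

  // Point the singleton at a database, e.g. the embedded DB of one test suite.
  static initialize(url: string): void {
    ConnectionHolder.instance = new ConnectionHolder(url);
  }

  static getInstance(): ConnectionHolder {
    if (ConnectionHolder.instance === undefined) {
      throw new Error("ConnectionHolder is not initialized");
    }
    return ConnectionHolder.instance;
  }

  // Drop the current instance so a later suite cannot pick up a stale connection.
  static clearInstance(): void {
    ConnectionHolder.instance = undefined;
  }
}

// Run one suite against its own database and always clear the singleton afterwards,
// even when the suite body throws.
function runSuite<T>(url: string, body: () => T): T {
  ConnectionHolder.initialize(url);
  try {
    return body();
  } finally {
    ConnectionHolder.clearInstance();
  }
}
```

For example, `runSuite("db-one", () => ConnectionHolder.getInstance().url)` returns `"db-one"`, and a `getInstance()` call made after the suite has finished throws instead of silently reusing the old database.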
## Next Step
Remove the `user-sys` `enabled` flag and its `if-else` handling logic
completely.
---------
Co-authored-by: Xinyuan Lin <xinyual3@uci.edu>
…3796) Bumps [prismjs](https://github.com/PrismJS/prism) from 1.29.0 to 1.30.0.

Release notes (sourced from [prismjs's releases](https://github.com/PrismJS/prism/releases)):

v1.30.0, What's Changed:

- check that `currentScript` is set by a script tag by [@lkuechler](https://github.com/lkuechler) in [PrismJS/prism#3863](https://redirect.github.com/PrismJS/prism/pull/3863)

New Contributors:

- [@lkuechler](https://github.com/lkuechler) made their first contribution in [PrismJS/prism#3863](https://redirect.github.com/PrismJS/prism/pull/3863)

**Full Changelog**: https://github.com/PrismJS/prism/compare/v1.29.0...v1.30.0

Commits:

- [`76dde18`](https://github.com/PrismJS/prism/commit/76dde18a575831c91491895193f56081ac08b0c5) Release 1.30.0
- [`93cca40`](https://github.com/PrismJS/prism/commit/93cca40b364215210f23a9e35f085a682a2b8175) npm pkg fix
- [`99c5ca9`](https://github.com/PrismJS/prism/commit/99c5ca970f18f744d75e473573d4679100f87086) Add release script
- [`8e8b935`](https://github.com/PrismJS/prism/commit/8e8b9352dac64457194dd9e51096b4772532e53d) check that currentScript is set by a script tag ([#3863](https://redirect.github.com/PrismJS/prism/issues/3863))
- [`f894dc2`](https://github.com/PrismJS/prism/commit/f894dc2cbb507f565a046fed844fd541f07aa191) Fix logo in the footer
- [`ac38dce`](https://github.com/PrismJS/prism/commit/ac38dcec9bea6bac064a7264b7aeba086e3102bf) Delete CNAME
- [`9b5b09a`](https://github.com/PrismJS/prism/commit/9b5b09aef4dc2c18c28d2f5a6244d4efcc6ab5cb) Enable CORS
- See full diff in [compare view](https://github.com/PrismJS/prism/compare/v1.29.0...v1.30.0)

Maintainer changes: This version was pushed to npm by [dmitrysharabin](https://www.npmjs.com/~dmitrysharabin), a new releaser for prismjs since your current version.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xinyuan Lin <xinyual3@uci.edu>
Co-authored-by: yunyad <114192306+yunyad@users.noreply.github.com>
Reverts apache#3835. The added action `technote-space/assign-author@v1` is not approved by apache.
## Update

This PR fixes formatting issues that introduced redundant file changes in the core [PR](apache#3598).
…ze the requests to `/wsapi` and `Computing Unit` endpoints (apache#3598)

## Access Control Service

This service is currently used only by Envoy as its authorization service. It acts as a third-party service that authorizes any request sent to a computing unit to open a socket connection through `/wsapi`. It parses the `user-token` from the URL parameters, checks the user's access to the computing unit against the database, and adds the corresponding information to the following headers:

- x-user-cu-access
- x-user-id
- x-user-name
- x-user-email

If the service cannot parse the token or fails for any reason, Envoy denies access to the computing unit. If the authorization succeeds, the user is connected directly to the computing unit using `Upgrade` on the first `HTTP` handshake request, so the latency does not change.

## The new connection flow

<img width="1282" height="577" alt="489656839-e09b06ee-3915-4c18-9584-e880bc06011d" src="https://github.com/user-attachments/assets/f7b0d29e-f30b-4e7f-9a0d-966f52d8d48a" />

1. A user initiates an `HTTP` request to connect to a specific Computing Unit.
2. The request is first routed through the **Gateway** to **Envoy**.
3. Envoy pauses the request and sends a query to the **Access Control Service** to get an authorization decision.
4. The Access Control Service verifies the user's token and checks a PostgreSQL database to see if the user has the necessary permissions for the target Computing Unit.
5. **If authorized**, the service injects specific HTTP headers (`x-user-cu-access`, `x-user-id`, `x-user-name`) into the request and sends an approval back to Envoy.
6. Envoy then forwards the approved request to the Computing Unit.
7. The connection is then upgraded to a WebSocket, establishing a secure, interactive session.

If authorization fails at any point, Envoy immediately denies the connection request, and the user is prevented from accessing the Computing Unit.
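The decision made in steps 4 and 5 can be sketched roughly as follows. This is a minimal Python illustration, not the service's Scala code: the `TOKENS` and `CU_ACCESS` dictionaries are hypothetical stand-ins for token verification and the PostgreSQL lookup, and treating a `NONE` privilege as a denial is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory stand-ins for token verification and the database lookup.
TOKENS = {"token-alice": {"id": 1, "name": "alice", "email": "alice@example.org"}}
CU_ACCESS = {(1, 42): "WRITE"}  # (user id, computing unit id) -> privilege


@dataclass
class Decision:
    allowed: bool
    headers: dict


def authorize(user_token: Optional[str], cu_id: int) -> Decision:
    """Return the allow/deny decision that Envoy would act on."""
    user = TOKENS.get(user_token) if user_token else None
    if user is None:
        # The token is missing or cannot be parsed: deny.
        return Decision(False, {})
    privilege = CU_ACCESS.get((user["id"], cu_id), "NONE")
    if privilege == "NONE":
        # Assumption: no recorded privilege means the request is denied.
        return Decision(False, {})
    # On success, the headers named in the description are attached.
    return Decision(True, {
        "x-user-cu-access": privilege,
        "x-user-id": str(user["id"]),
        "x-user-name": user["name"],
        "x-user-email": user["email"],
    })
```

Because the check runs only once per handshake, a lookup like this adds no cost to messages sent over the established WebSocket.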
This new process provides **enhanced security** and **centralized authorization logic**, and is designed to have **no performance impact** on the established WebSocket connection since the check is performed only on the initial handshake.

## Summary of file changes

| Component/Flow | File | Description |
| :--- | :--- | :--- |
| **Database Access Logic** | `core/auth/src/main/scala/edu/uci/ics/texera/auth/util/ComputingUnitAccess.scala` | Implements the logic to query the PostgreSQL database and determine a user's access privilege (`READ`, `WRITE`, `NONE`) for a given Computing Unit. |
| | `core/auth/src/main/scala/edu/uci/ics/texera/auth/util/HeaderField.scala` | Defines constants for the custom HTTP headers (`x-user-cu-access`, `x-user-id`, etc.) that are injected by the Access Control Service. |
| **WebSocket Connection Handling** | `core/amber/src/main/scala/edu/uci/ics/texera/web/ServletAwareConfigurator.scala` | Modified to read the new authorization headers during the WebSocket handshake. If headers are present, it creates the `User` object from them; otherwise, it falls back to the old method of parsing the JWT from URL parameters for single-node mode. |
| | `core/amber/src/main/scala/edu/uci/ics/texera/web/SessionState.scala` | Updated to store the user's access privilege level for the current computing unit within the session. |
| | `core/amber/src/main/scala/edu/uci/ics/texera/web/resource/WorkflowWebsocketResource.scala` | Enforces the access control by checking if the user has `WRITE` privilege before allowing a `WorkflowExecuteRequest`. |
| **Deployment & Routing** | `deployment/access-control-service.dockerfile` | New Dockerfile for building and containerizing the Access Control Service. |
| | `deployment/k8s/texera-helmchart/templates/access-control-service-deployment.yaml` | New Kubernetes manifest to deploy the Access Control Service. |
| | `deployment/k8s/texera-helmchart/templates/access-control-service-service.yaml` | New Kubernetes service manifest to expose the Access Control Service within the cluster. |
| | `deployment/k8s/texera-helmchart/templates/envoy-config.yaml` | **Key change:** Configures Envoy to use the new service as an external authorization filter (`ext_authz`). It intercepts relevant requests, forwards them for an authorization check, and then passes the injected headers to the upstream service (AmberMaster). |
| | `deployment/k8s/texera-helmchart/values.yaml` | Adds the configuration parameters for the new Access Control Service to the Helm chart. |
| **Frontend UI** | `core/gui/src/app/workspace/component/menu/menu.component.ts` & `.html` | The frontend is updated to disable the "Run" button if the connected user does not have `WRITE` access to the selected Computing Unit, providing immediate visual feedback. |
| **Build & Configuration** | `core/build.sbt` | The root SBT build file is updated to include the new `AccessControlService` module. |
| | `core/config/src/main/scala/edu/uci/ics/amber/util/PathUtils.scala` | Adds a path helper for the new service's directory structure. |

---------

Co-authored-by: Ali Risheh <alirisheh@dhcp-172-31-175-237.mobile.uci.edu>
…ough parameters (apache#3820)

## Summary

- Fixed a non-deterministic parameter ordering issue when creating Dataset objects from JOOQ records
- Used `createdDataset.into(classOf[Dataset])` to convert the DatasetRecord to a Dataset POJO instead of calling the constructor manually

Fixes apache#3821

---------

Co-authored-by: Claude <noreply@anthropic.com>
# Purpose

This PR is a successor of apache#3782. As the non-user system mode is no longer used or maintained, we can remove the flag to switch between user-system being enabled/disabled, and keep only the mode of user-system being enabled.

# Content

- Removed the `user-sys.enabled` flag, both in the frontend and backend.
- Removed all the if-else statements based on this flag in the codebase. Only the cases of user system being enabled are kept.
- Removed `ExecutionResourceMapping` in the backend as it is no longer needed.
- Removed `WorkflowCacheService` in the frontend as it is no longer needed.

---------

Co-authored-by: Xinyuan Lin <xinyual3@uci.edu>
…pache#3836)

## Purpose

apache#3571 disabled frontend undo/redo due to an existing bug with the undo/redo manager during shared editing. This PR fixes that bug and re-enables undo/redo.

## Bug with shared editing

The bug can be minimally reproduced as follows with two users editing the same workflow (or two tabs opened by the same user):

1. User A deletes a link E from operator X to Y on the canvas.
2. User B deletes operator Y.
3. User A clicks "undo", and the workflow reaches an erroneous state, where there is a link E that connects to an operator Y that no longer exists. Note E exists in the frontend data but is not visible on the UI.

The following gif shows this process.

## Shared-editing Architecture

Shared editing (apache#1674) is achieved by letting the frontend rely on data structures from yjs (a CRDT library) as its data model, as any manipulation to these data structures can be propagated to other users with automatic conflict resolution. There are two layers of data on each user's Texera frontend, one being the UI data (jointjs), and the other being this shared "Y data". The two layers in each user's UI are synced by our application code, and the Y data between users of a shared-editing session is kept in sync with automatic conflict resolution by relying on yjs.

The following diagram shows what happens when a user adds a link and how the other user sees this change in real-time.

Yjs's CRDT guarantees the eventual **consistency** of this underlying data model among concurrent editors, i.e., it makes sure this data model is correctly synced in each editor's frontend.
## The core problem

Yjs does not offer a "graph" data structure, and currently in Texera, the shared data structures for operators and links are two separate `Map`s:

- `operatorIDMap`: `operatorID` -> `Operator`
- `operatorLinkMap`: `linkID` -> `OperatorLink`

There is an application-specific "referential constraint" in Texera's frontend that "a link must connect to an operator that exists", and this kind of sanity checking on the data is not the concern of CRDT. It can only be enforced by the application (i.e., ourselves).

Ideally, before making any changes to the shared data model, we should do sanity checking and reject changes that violate our application-specific constraints. As shown below, in each user's frontend, there are 3 paths where the shared data model can be modified.

**Path 1**: The first path includes the changes initiated by a user's UI actions (e.g., adding a link on the UI). For this path, we do have existing sanity checking logic:

```
public addLink(link: OperatorLink): void {
  this.assertLinkNotExists(link);
  this.assertLinkIsValid(link);
  this.sharedModel.operatorLinkMap.set(link.linkID, link);
}
```

**Path 2**: Another path is undo/redo, which is purely managed by an `UndoManager`, also offered by Yjs. This module is local to each user's frontend, and it automatically tracks local changes to the shared data model. When a user clicks "undo", `UndoManager` directly applies changes to the shared data model. **The core of the problem is that there is no sanity checking on this path.**

**Path 3**: The third path is remote changes from another collaborator. There is also no sanity checking on this path, but the correctness of such changes depends on whether the change was sanity-checked on the collaborator's side (i.e., if it is a UI change from User A, the change propagated to User B's frontend would already be sanity-checked; if it is an undo change, however, the change propagated to User B would not be sanity-checked and could cause issues).
## Cause of the bug

The following diagram shows how the bug happens from the perspective of the shared model.

When user A clicks "Undo" after 2), the `UndoManager` simply applies the reverse operation of "Delete E", and adds the link `E` to `operatorLinkMap`. As there is no sanity checking during this process, this operation succeeds, and the shared model reaches a state that violates the constraint.

## Solution

Unfortunately, due to the limitations of Yjs's APIs, it is not possible to add sanity checking to Path 2 or 3 **before** a change is applied, as an undo/redo operation on the `UndoManager`'s stack is not exposed as a meaningful action (i.e., there is no way to tell that an action to be applied to the shared model is an `addLink` if it is an undo operation).

Nevertheless, we can react to a change to the shared model that is initiated from Path 2 or Path 3 after the change has been applied, and add sanity checking logic there to "repair" unsanitary changes. This place (`SharedModelChangeHandler`) is exactly where we sync the changes from the shared model to the UI: any changes to the shared model not initiated by the UI (i.e., changes from the `UndoManager` or remote changes by other users) go through this place, and such changes are parsed as meaningful changes such as "add a link", "delete an operator", etc.

Currently, the only sanity checking needed is to check that a newly added link connects to operators / ports that exist and that it is not a duplicate link. We add such checking logic in `SharedModelChangeHandler`, and revert unsanitary operations before they are reflected on the UI.

## Demo

The following gif shows the experience after the fix. When an unsanitary action caused by undo happens, it fails and we log it in the console. The workflow JSON no longer gets damaged.

---------

Co-authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
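The "repair after apply" idea from the Solution section can be sketched as follows. This is a simplified Python illustration rather than the `SharedModelChangeHandler` code: the function name `repair_added_link`, the plain dictionaries standing in for `operatorIDMap` and `operatorLinkMap`, and the `source`/`target` fields are all hypothetical, and port checks are omitted.

```python
def repair_added_link(operators: dict, links: dict, link_id: str) -> bool:
    """Validate a link that was already added to the shared model.

    Returns True if the link is kept, False if it was reverted.
    """
    link = links[link_id]
    # Referential constraint: both endpoints must still exist.
    endpoints_exist = link["source"] in operators and link["target"] in operators
    # A second link between the same endpoints counts as a duplicate.
    duplicate = any(
        other_id != link_id
        and other["source"] == link["source"]
        and other["target"] == link["target"]
        for other_id, other in links.items()
    )
    if endpoints_exist and not duplicate:
        return True
    # Revert the unsanitary change before it would be mirrored to the UI.
    del links[link_id]
    return False
```

In the reproduction scenario, the undo re-adds link E after operator Y is gone, so a check of this kind removes E again and the model returns to a consistent state.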
)

### **Purpose**

This PR resolves apache#3844, where pending uploads cannot be removed before they start. It enables removing/canceling items directly from the Pending panel, improving queue control and flexibility in managing uploads.

### **Changes**

- Add a Remove action to Pending items; behavior and styling match the Uploading panel’s remove action.
- Refactor `cancelExistingUpload(fileName: string)`:
  - Uploading/Initializing → reuse the abort path to properly finalize server-side and prevent leaks.
  - Pending → front-end cleanup only (remove from queue and tasks) with no backend abort call.

### **Demonstration**

https://github.com/user-attachments/assets/aa4aa40c-bf7a-45fd-9257-fcfac4a00da9
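The two branches of the refactored cancel logic can be sketched like this. It is a Python illustration only, not the frontend's TypeScript: the `tasks` dictionary, the `pending_queue` list, the status strings, and the `abort_on_server` callback are hypothetical names, and removing the task entry in both branches is an assumption.

```python
from typing import Callable


def cancel_existing_upload(
    tasks: dict,
    pending_queue: list,
    file_name: str,
    abort_on_server: Callable[[str], None],
) -> None:
    """Cancel an upload, contacting the backend only if it already started."""
    task = tasks.get(file_name)
    if task is None:
        return
    if task["status"] in ("uploading", "initializing"):
        # The server already holds state for this upload: finalize it there.
        abort_on_server(file_name)
    elif task["status"] == "pending":
        # Nothing has reached the server yet: local cleanup is enough.
        if file_name in pending_queue:
            pending_queue.remove(file_name)
    del tasks[file_name]
```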
This PR changes the following:

- remove `version` attribute
- update container names to avoid conflicts
- set default named volumes for data persistence

resolves apache#3816

---------

Co-authored-by: Jiadong Bai <43344272+bobbai00@users.noreply.github.com>
…e#3772)

### Description:

Implemented a restriction on `export result` to prevent users from exporting workflow results that depend on non-downloadable datasets they don't own. This ensures the dataset download restriction cannot be circumvented through workflow execution and result export.

Closes apache#3766

### Changes:

**Backend**

- Added server-side validation to analyze workflow dependencies and block export of operators that depend on non-downloadable datasets
- Implemented algorithm to propagate restrictions to downstream operators

**Frontend**

- Updated export dialog component to show restriction warnings, filter exportable operators, and display blocking dataset information

### Video:

The video demonstrates how `export result` behaves on:

- workflows with downloadable datasets
- workflows with non-downloadable datasets
- workflows with both downloadable and non-downloadable datasets

https://github.com/user-attachments/assets/56b78aeb-dbcc-40fc-89b4-9c4238f8bc56

---------

Signed-off-by: Seongjin Yoon <75426413+seongjinyoon@users.noreply.github.com>
Co-authored-by: Seongjin Yoon <seongjin@Seongjins-MacBook-Pro.local>
Co-authored-by: Xinyuan Lin <xinyual3@uci.edu>
Co-authored-by: Seongjin Yoon <seongjin@dhcp-172-31-219-219.mobile.uci.edu>
Co-authored-by: Seongjin Yoon <seongjin@seongjins-mbp.lan>
Co-authored-by: Seongjin Yoon <seongjin@dhcp-172-31-230-246.mobile.uci.edu>
Co-authored-by: Jiadong Bai <43344272+bobbai00@users.noreply.github.com>
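Propagating a restriction to downstream operators amounts to a reachability search over the workflow graph. The sketch below shows one way to do it in Python, assuming the workflow is given as a list of `(upstream, downstream)` operator pairs; the function name and the operator names in the usage note are hypothetical, and the PR's actual algorithm may differ.

```python
from collections import deque


def blocked_operators(edges, restricted_sources):
    """Return every operator whose result depends on a restricted source."""
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)

    blocked = set(restricted_sources)
    queue = deque(restricted_sources)
    # Breadth-first traversal: anything reachable from a restricted
    # operator inherits the restriction.
    while queue:
        op = queue.popleft()
        for nxt in downstream.get(op, []):
            if nxt not in blocked:
                blocked.add(nxt)
                queue.append(nxt)
    return blocked
```

For a workflow with edges `scan_private -> filter -> join` and `scan_public -> join`, `scan_public -> sort`, only `sort` and `scan_public` remain exportable when `scan_private` reads a non-downloadable dataset.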
…e#4065)

### What changes were proposed in this PR?

This PR fixes a bug where editing user data on the admin dashboard would result in user data jumping around. The issue is caused by the part that fetches the user list again after editing. The original implementation called `ngOnInit` after editing to re-fetch the whole user list from the backend, causing the changed data to be out of order.

The new implementation does the following:

- Creates a new `User` instance with the affected user's data along with the updated attribute
- After the backend successfully updates the user in the database, the frontend uses the helper function `replaceOneImmutable` to update `userList` and `listOfDisplayUser` to reflect the change.

This allows the user data to be changed in place without fetching the whole list after every update.

### Any related issues, documentation, discussions?

Closes apache#4064

### Before Change video

https://github.com/user-attachments/assets/6769e32f-d7a4-4817-956d-773e97fae57e

### Proposed Change video

https://github.com/user-attachments/assets/01b4a0b1-3f56-437f-9b29-637854e3dd79

### How was this PR tested?

None.

### Was this PR authored or co-authored using generative AI tooling?

No.

---------

Co-authored-by: ali risheh <ali.risheh876@gmail.com>
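The in-place update described above can be illustrated with a small Python sketch. The real `replaceOneImmutable` helper lives in the TypeScript frontend and its signature is not shown in this PR, so the function below, including its `key` parameter and the `uid` field, is a hypothetical equivalent.

```python
def replace_one_immutable(items, updated, key=lambda user: user["uid"]):
    """Return a new list in which the matching item is swapped for `updated`.

    The input list is not modified and the original order is preserved,
    which is what keeps rows from jumping around after an edit.
    """
    return [updated if key(item) == key(updated) else item for item in items]
```

Since a new list is returned instead of mutating the old one, change detection that compares list references would still notice the update.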
### What changes were proposed in this PR?

1. **Centralize and extend `AttributeType` operations**

   Move and refactor the existing attribute-type helpers into `AttributeTypeUtils`:
   * `compare`, `add`, `zeroValue`, `minValue`, `maxValue`.
   * Unify null-handling semantics across these operations. (use of match-case instead of if + match)

   Extend support to additional types:
   * Add comparison/aggregation support for `BOOLEAN`, `STRING`, and `BINARY`.

   Change numeric coercion strategy:
   * Coerce numeric values to `Number` instead of a specific primitive type (e.g., `Double`) to reduce `ClassCastException`s when the input is not strictly schema-validated.
   * Preserve existing comparison semantics for doubles by delegating to `java.lang.Double.compare` (including handling of ±∞ and `NaN`).

   Introduce “identity” helpers:
   * `zeroValue` returns an additive identity for numeric/timestamp types, and `Array.emptyByteArray` for `BINARY` as a safe, non-throwing identity.
   * `minValue` / `maxValue`: provide lower/upper bounds for supported numeric and timestamp types.

2. **Refactor operators to reuse `AttributeTypeUtils`**
   * `AggregationOperation`: implement `SUM` / `MIN` / `MAX` using the centralized helpers instead of custom per-operator logic.
   * `StableMergeSortOpExec`: reuse the typed compare logic from `AttributeTypeUtils`.
   * `SortPartitionsOpExec`: simplify to use a one-liner comparator based on `AttributeTypeUtils.compare` (or a thin wrapper) for clarity and reuse.

3. **Add tests**
   * workflow-core/src/test/scala/org/apache/amber/core/tuple/AttributeTypeUtilsSpec.scala
     * **compare**: Verifies correct null-handling and ordering for INTEGER, BOOLEAN, TIMESTAMP, STRING, and BINARY values.
     * **add**: Ensures `null` acts as identity and confirms correct addition for INTEGER, LONG, DOUBLE, and TIMESTAMP.
     * **zeroValue**: Checks that numeric/timestamp zero identities and empty binary array for BINARY are returned, and that unsupported types (e.g., STRING) throw.
     * **minValue / maxValue**: Validate correct numeric and timestamp bounds, BINARY minimum, and exceptions for unsupported types (e.g., BOOLEAN, STRING).
   * workflow-operator/src/test/scala/org/apache/amber/operator/aggregate/AggregateOpSpec.scala
     * Verifies `getAggregationAttribute` chooses the correct result type for different functions (SUM keeps input type, COUNT → INTEGER, CONCAT → STRING).
     * Checks `getAggFunc` SUM behavior for INTEGER and DOUBLE columns, ensuring correct totals and preserved fractional values.
     * Tests COUNT, CONCAT, MIN, MAX, and AVERAGE aggregations, including correct handling of `null` values and edge cases like “no rows”.
     * Confirms `getFinal` rewrites COUNT into a SUM on the intermediate count column and rewires attributes correctly for non-COUNT functions.
     * Exercises `AggregateOpExec` end-to-end: SUM grouped by a key (city) and combined global SUM+COUNT with no group-by keys, validating the produced tuples.

4. **Scope / non-goals / Extras**
   * No change to external APIs
   * Main behavior changes are localized to `AttributeType` operations and the operators that consume them.
---

**Any related issues, documentation, discussions?**

* Closes: apache#3923

**How was this PR tested?**

Workflow Image:

<img width="1684" height="859" alt="image" src="https://github.com/user-attachments/assets/2682ebdc-0f45-40c6-b304-0cea0b76b44f" />

Workflow file: [agg_test_1.json](https://github.com/user-attachments/files/23540242/agg_test_1.json)

Python benchmark:

```
import pandas as pd

df = pd.read_csv("/mnt/data/test.csv")

# Limit BEFORE sorting
df_limited = df.head(1000)

# Now sort ascending
df_sorted = df_limited.sort_values("rna_umis", ascending=True)

# Group by pass_all_filters with aggregations
agg = df_sorted.groupby("pass_all_filters")["rna_umis"].agg(
    min="min",
    max="max",
    count="count",
    avg="mean",
    sum="sum"
).reset_index()

agg
```

Python Result:

<img width="928" height="188" alt="image" src="https://github.com/user-attachments/assets/69da33cd-ada4-4b05-a3f9-ae139f8575b9" />

Texera Result (Avg), with columns in the same order as the Python aggregation above:

| pass_all_filters | min | max | count | avg | sum |
| -- | -- | -- | -- | -- | -- |
| False | 0 | 80926 | 240 | 15987.68 | 3837043 |
| True | 11893 | 102559 | 760 | 35557.93 | 27024027 |

For timestamps test:

- 1970-01-01T00:00:00Z
- 2000-02-29T12:00:00Z
- 2024-12-31T23:59:59Z

1. Avg:
   - New version: 909835199750
   - Previous version: 909835199750
2. Sum:
   - New version: 2055-03-01T05:59:59.000Z (UTC)
   - Previous version: 2055-03-01T11:59:59.000Z (UTC-6; Mexico City Time)

**Was this PR authored or co-authored using generative AI tooling?**

* Co-authored with ChatGPT.
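The unified null handling for `compare` and `add` can be illustrated with a short Python sketch. It is not the Scala code in `AttributeTypeUtils`: ordering `None` before any value is an assumption (the PR only says null handling is unified), and the NaN rule mirrors `java.lang.Double.compare`, which treats NaN as equal to itself and greater than every other double. Signed zeros are ignored here.

```python
import math


def compare_double(a: float, b: float) -> int:
    """Order doubles the way Double.compare does for NaN and infinities."""
    if math.isnan(a) or math.isnan(b):
        # NaN equals NaN and sorts after everything else, including +inf.
        return int(math.isnan(a)) - int(math.isnan(b))
    return (a > b) - (a < b)


def compare(a, b) -> int:
    """Null-aware three-way comparison. Assumption: None sorts first."""
    if a is None and b is None:
        return 0
    if a is None:
        return -1
    if b is None:
        return 1
    if isinstance(a, float) or isinstance(b, float):
        return compare_double(float(a), float(b))
    return (a > b) - (a < b)


def add(a, b):
    """Null-aware addition: None acts as the identity element."""
    if a is None:
        return b
    if b is None:
        return a
    return a + b
```

With helpers like these, an aggregation such as `SUM` can fold `add` over a column without special-casing missing values in each operator.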
### What changes were proposed in this PR?

This PR updates all Texera service images in the single-node `docker-compose.yml` to use the Apache registry with `latest` tags, aligning with the naming convention established in the CI/CD workflow (apache#4055).

The following image references have been updated:

- `texera/file-service:single-node-release-1-0-0` → `apache/texera-file-service:latest`
- `texera/workflow-compiling-service:single-node-release-1-0-0` → `apache/texera-workflow-compiling-service:latest`
- `texera/computing-unit-master:single-node-release-1-0-0` → `apache/texera-workflow-execution-coordinator:latest`
- `texera/texera-web-application:single-node-release-1-0-0` → `apache/texera-dashboard-service:latest`
- `texera/texera-example-data-loader:single-node-release-1-0-0` → `apache/texera-example-data-loader:latest`

This change ensures that the docker-compose configuration uses the correct image names and registry that are now being built and pushed by the GitHub Actions workflow.

### Any related issues, documentation, discussions?

Related to apache#4055 which introduced the GitHub Actions workflow for building and pushing images to the Apache registry.

### How was this PR tested?

This PR only updates image references in the docker-compose.yml configuration file. No code changes were made.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)

Co-authored-by: Claude <noreply@anthropic.com>
…apache#4067)

### What changes were proposed in this PR?

This PR introduces a new attribute type, `big_object`, that lets Java operators pass data larger than 2 GB to downstream operators. Instead of storing large data directly in the tuple, the data is uploaded to MinIO, and the tuple stores a pointer to that object. Future PRs will add support for Python and R UDF operators.

#### Main changes:

1. MinIO
   - Added a new bucket: `texera-big-objects`.
   - Implemented multipart upload (separate from LakeFS) to efficiently handle large uploads
2. BigObjectManager (Internal Java API)
   - `create()` → Generates a unique S3 URI, registers it in the database, and returns the URI string
   - `deleteAllObjects()` → Deletes all big objects from S3 (Please check the Note section below)
3. Streaming I/O Classes
   - `BigObjectOutputStream`: Streams data to S3 using background multipart upload
   - `BigObjectInputStream`: Lazily streams data from S3 when reading
4. Iceberg Integration
   - BigObject pointers are stored as strings in Iceberg
   - A magic suffix is added to attribute names to differentiate them from normal strings

#### User API

##### Creating and Writing a BigObject:

```java
// In an OperatorExecutor
BigObject bigObject = new BigObject();
try (BigObjectOutputStream out = new BigObjectOutputStream(bigObject)) {
    out.write(myLargeDataBytes);
    // or: out.write(byteArray, offset, length);
}
// bigObject is now ready to be added to tuples
```

##### Reading a BigObject:

```java
// Option 1: Read all data at once
try (BigObjectInputStream in = new BigObjectInputStream(bigObject)) {
    byte[] allData = in.readAllBytes();
    // ... process data
}

// Option 2: Read a specific amount
try (BigObjectInputStream in = new BigObjectInputStream(bigObject)) {
    byte[] chunk = in.readNBytes(1024); // Read 1KB
    // ... process chunk
}

// Option 3: Use as a standard InputStream
try (BigObjectInputStream in = new BigObjectInputStream(bigObject)) {
    int bytesRead = in.read(buffer, offset, length);
    // ... process data
}
```

#### Note

This PR does NOT handle lifecycle management for big objects. For now, when a workflow or workflow execution is deleted, all related big objects in S3 are deleted immediately. We will add proper lifecycle management in a future update.

#### System Diagram

<img width="3444" height="2684" alt="BigObject-Page-1 drawio (4)" src="https://github.com/user-attachments/assets/98eded06-03b2-41be-b50b-0520a654ddca" />

### Any related issues, documentation, discussions?

Related to apache#3787.

### How was this PR tested?

Tested by running this workflow multiple times and checking the MinIO dashboard to see whether three big objects are created and deleted. Set the file scan operator's property to use any file bigger than 2GB.

[Big Object Java UDF.json](https://github.com/user-attachments/files/23666312/Big.Object.Java.UDF.json)

### Was this PR authored or co-authored using generative AI tooling?

Yes.

---------

Signed-off-by: Chris <143021053+kunwp1@users.noreply.github.com>
# Conflicts: # frontend/src/app/app.module.ts # frontend/src/app/workspace/component/workflow-editor/workflow-editor.component.ts
Please see this [wiki page](https://github.com/apache/texera/wiki/Guide-to-enable-the-LLM%E2%80%90based-Texera-copilot) to learn how to enable this feature.

### What changes were proposed in this PR?

This PR introduces an LLM agent management and chat panel in the workflow workspace to help users with their workflows.

#### Demo

1. Manage agents using the panel
2. Ask the agent questions about the available Texera operators
3. Ask the agent about the user's current workflow

#### Architecture Diagram

See apache#4034

#### Major Changes

1. Frontend: introduces the agent management and chat panel
2. Backend:
   - A new microservice, `litellm`, is introduced. It is an open-source service that manages the communication between the app and LLM APIs.
   - `AccessControlService` is modified to add the logic for routing `litellm`-related requests.

### Any related issues, documentation, discussions?

Related to apache#4034

#### Current PR limitation and future PR plans

In the current PR, the agent can only act in a "read-only" way: it can answer questions about operators but cannot change the user's workflow.

In future PRs:
- The agent will be able to edit the user's workflow
- The agent feature will be added to the k8s deployment architecture

### How was this PR tested?

Frontend unit test cases are added. To test the PR end to end:

1. Launch litellm by following the instructions in `bin/litellm-config.yaml`
2. Launch `AccessControlService`
3. All set! You can now test the agent in the workflow workspace.

### Was this PR authored or co-authored using generative AI tooling?

The code content is co-authored with Claude Code. This PR is not generated by generative AI.

---------

Co-authored-by: Xinyuan Lin <xinyual3@uci.edu>
Co-authored-by: Claude <noreply@anthropic.com>
…ry (apache#4072)

### What changes were proposed in this PR?

This PR updates all Texera service images in the Kubernetes Helm chart (`bin/k8s/values.yaml`) to use the Apache registry with `latest` tags, aligning with the naming convention established in the CI/CD workflow (apache#4055).

The following image references have been updated:
- `texera/texera-example-data-loader:cluster` → `apache/texera-example-data-loader:latest`
- `texera/texera-web-application:cluster` → `apache/texera-dashboard-service:latest`
- `texera/workflow-computing-unit-managing-service:cluster` → `apache/texera-workflow-computing-unit-managing-service:latest`
- `texera/workflow-compiling-service:cluster` → `apache/texera-workflow-compiling-service:latest`
- `texera/file-service:cluster` → `apache/texera-file-service:latest`
- `texera/config-service:cluster` → `apache/texera-config-service:latest`
- `texera/access-control-service:cluster` → `apache/texera-access-control-service:latest`
- `texera/computing-unit-master:cluster` → `apache/texera-workflow-execution-coordinator:latest`

This ensures that the Kubernetes Helm chart uses the correct image names and registry that are now being built and pushed by the GitHub Actions workflow.

### Any related issues, documentation, discussions?

Related to apache#4055, which introduced the GitHub Actions workflow for building and pushing images to the Apache registry.

### How was this PR tested?

This PR only updates image references in the Kubernetes Helm chart configuration file. No code changes were made.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Chen Li <chenli@gmail.com>
### What changes were proposed in this PR?

Move the dependency `transformers` from `requirements.txt` to `operator-requirements.txt`.

### Any related issues, documentation, discussions?

The dependency was introduced in apache#2600 to support Hugging Face operators. It should not have been a dependency of pyamber, only of the specific operator.

- apache#2600

This blocks apache#4088

### How was this PR tested?

Existing tests.

### Was this PR authored or co-authored using generative AI tooling?

No
### What changes were proposed in this PR? Pin external GitHub Actions ### Any related issues, documentation, discussions? Per https://infra.apache.org/github-actions-policy.html ### Was this PR authored or co-authored using generative AI tooling? No
### What changes were proposed in this PR?

Bump `transformers` from 4.53.0 to 4.57.3 to support Hugging Face operators.

### Any related issues, documentation, discussions?

Resolves apache#4091 by updating the `transformers` dependency to support Hugging Face operators.

### How was this PR tested?

Tested by running the Hugging Face operators in Texera and verifying that the models load and run successfully (see screenshot below).

<img width="453" height="295" alt="image" src="https://github.com/user-attachments/assets/208d9721-24a2-4da9-9488-81da5ad3219a" />

### Was this PR authored or co-authored using generative AI tooling?

No.
### What changes were proposed in this PR? Bump `pandas` version to 2.2.3 to be [compatible with Python 3.13](https://pandas.pydata.org/pandas-docs/stable/whatsnew/v2.2.3.html#pandas-2-2-3-is-now-compatible-with-python-3-13). ### Any related issues, documentation, discussions? Resolves apache#4095 ### How was this PR tested? CI ### Was this PR authored or co-authored using generative AI tooling? No Signed-off-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
### What changes were proposed in this PR? Bump numpy version to 2.1.0 to be [compatible with Python 3.13](https://numpy.org/news/#numpy-210-released). ### Any related issues, documentation, discussions? Closes apache#4097 ### How was this PR tested? CI ### Was this PR authored or co-authored using generative AI tooling? No --------- Signed-off-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
…artifacts (apache#4076)

### What changes were proposed in this PR?

This PR adds a CI workflow file for uploading the release artifacts to [dist.apache.org](https://dist.apache.org/repos/dist/dev/incubator/texera/).

The following secrets need to be set:

| Secret | Purpose |
|-----------------|-----------------------------------------------|
| GPG_PRIVATE_KEY | The GPG private key used to sign the release tarball. Imported via `gpg --import` to create the `.asc` signature file. |
| GPG_PASSPHRASE | Passphrase for the GPG private key. Used with `--passphrase-fd` to unlock the key during signing. |
| SVN_USERNAME | Apache SVN username for committing artifacts to dist.apache.org. Used to authenticate with the ASF distribution repository. |
| SVN_PASSWORD | Apache SVN password. Paired with SVN_USERNAME to push release artifacts to the staging directory (`dist/dev/incubator/texera/`). |

### Any related issues, documentation, discussions?

Closes apache#4081

### How was this PR tested?

This PR was tested manually using GitHub Actions on my own fork. See: https://github.com/bobbai00/texera/actions/runs/19608186790

### Was this PR authored or co-authored using generative AI tooling?

Yes, co-authored with Claude Code

---------

Co-authored-by: Claude <noreply@anthropic.com>
### What changes were proposed in this PR?

This PR refactors the package structure by moving all Amber engine code from `org.apache.amber` to `org.apache.texera.amber`. This aligns the package naming with the Texera project organization and ensures all components are properly namespaced under the Apache Texera organization.

**Key Changes:**

1. **Directory Structure Migration** - Moved all source directories:
   - Scala/Java sources: 8 modules moved
   - Protobuf definitions: 14 files moved
   - Python proto generated code: moved under new namespace
   - Frontend TypeScript proto: moved under new namespace
2. **Code Updates** - Updated across 707 files:
   - Package declarations in 576 Scala/Java files
   - Import statements across all Scala/Java files
   - 57 Python files updated for new proto imports
   - 14 Protobuf files updated with new Java package
   - 2 TypeScript files updated with new import paths
   - Configuration files (cluster.conf)
   - String literals containing class names for reflection/dynamic loading
3. **Package Namespace Changes:**

```diff
- org.apache.amber.engine.common
- org.apache.amber.operator.*
- org.apache.amber.core.*
- org.apache.amber.compiler.*
+ org.apache.texera.amber.engine.common
+ org.apache.texera.amber.operator.*
+ org.apache.texera.amber.core.*
+ org.apache.texera.amber.compiler.*
```

### Any related issues, documentation, discussions?

Closes apache#4003

### How was this PR tested?

CI

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.5 (Cursor IDE)
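The namespace change above amounts to a textual prefix rewrite. The following is a minimal sketch of that rewrite in Python, not the script actually used for the refactor (the PR does not say how the change was applied). The function name `migrate_namespace` and the regular expression are assumptions made for illustration.

```python
import re

# Hypothetical constant for illustration; the new root is taken from the PR text.
NEW_ROOT = "org.apache.texera.amber"

# Match the old root only as a whole dotted prefix:
# - the lookbehind skips occurrences embedded in a longer name (e.g. "com.org.apache.amber")
# - the lookahead skips identifiers that merely start with "amber" (e.g. "org.apache.amberx")
_OLD_ROOT_PATTERN = re.compile(r"(?<![\w.])org\.apache\.amber(?!\w)")


def migrate_namespace(text: str) -> str:
    """Rewrite org.apache.amber references to org.apache.texera.amber.

    Already-migrated text is left unchanged, because "org.apache.texera.amber"
    does not contain the old root as a dotted prefix.
    """
    return _OLD_ROOT_PATTERN.sub(NEW_ROOT, text)


if __name__ == "__main__":
    print(migrate_namespace("package org.apache.amber.engine.common"))
```

Because the pattern never matches already-migrated text, running the function twice gives the same result as running it once, which matters when a rewrite is applied across hundreds of files in several passes.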
…4087)

### What changes were proposed in this PR?

This PR adds pre-configured IntelliJ run configurations for:
- launching all 8 backend microservices,
- the frontend service,
- and lakeFS via Docker Compose.

With these changes, developers can now launch the backend services, lakeFS, and frontend directly from IntelliJ’s run menu, eliminating the need to manually locate and configure each relevant class or compose file. This leverages IntelliJ’s built-in Compound and individual run configurations, so no additional plugins are required.

https://github.com/user-attachments/assets/9ef8fb13-2dc3-4598-ba44-0540d37202db

### Any related issues, documentation, discussions?

Fixes apache#4045

### How was this PR tested?

Verified on a local IntelliJ IDEA environment. The Compound run config cleanly launches all backend microservices in parallel.

### Was this PR authored or co-authored using generative AI tooling?

No

---------

Co-authored-by: Xinyuan Lin <xinyual3@uci.edu>
Co-authored-by: Chen Li <chenli@gmail.com>
…est architecture (apache#4077)

### What changes were proposed in this PR?

This PR improves the single-node docker-compose configuration with the following changes:

1. **Added microservices**:
   - `config-service` (port 9094): Provides endpoints for configuration management
   - `access-control-service` (port 9096): Handles user permissions and access control
   - `workflow-computing-unit-managing-service` (port 8888): Provides endpoints for managing computing units
   - All services are added with proper health checks and dependencies on postgres
   - Nginx reverse proxy routes are configured for `/api/config` and `/api/computing-unit`
2. **Removed outdated environment variables** from `.env`:
   - `USER_SYS_ENABLED=true`
   - `STORAGE_ICEBERG_CATALOG_TYPE=postgres`
3. **Removed the unused example data loader**: the example data will be loaded by other means and no longer through a container.

### Any related issues, documentation, discussions?

Closes apache#4083

### How was this PR tested?

docker-compose tested locally.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-5-20250101)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Bumps [pg8000](https://github.com/tlocke/pg8000) from 1.31.2 to 1.31.5. See the full diff in the [compare view](https://github.com/tlocke/pg8000/commits).

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xiaozhen Liu <xiaozl3@uci.edu>
### What changes were proposed in this PR? Add a configuration option to automatically shorten file paths for Windows users when the original path exceeds the system’s maximum length. After this PR, Windows users should not see this error anymore. <img width="612" height="157" alt="image" src="https://github.com/user-attachments/assets/73a23ef2-0fad-4f2f-bc99-c7f2e576a4d9" /> ### Any related issues, documentation, discussions? Follow-up of PR apache#4087 ### How was this PR tested? Tested manually. ### Was this PR authored or co-authored using generative AI tooling? No
### What changes were proposed in this PR?

Removed official support for R-UDF. The frontend is not changed, but during execution users will receive an error stating that R-UDF is not officially supported. We plan to move R-UDF to a third-party hosted repo so users can install R-UDF support as a plugin.

### Any related issues, documentation, discussions?

This change was made because the R-UDF runtime requires `rpy2`, which is not Apache-license friendly. Resolves apache#4084

### How was this PR tested?

Added test suite `TestExecutorManager`.

### Was this PR authored or co-authored using generative AI tooling?

Tests generated by Cursor.

---------

Co-authored-by: Yicong Huang <yicong.huang+data@databricks.com>
Co-authored-by: Chen Li <chenli@gmail.com>
### What changes were proposed in this PR?

1. Replace flake8 and black with Ruff in CI.
2. Format existing code using Ruff.

Basic Ruff commands:

Under amber/src/main/python
```cd amber/src/main/python```

Run Ruff’s formatter in dry mode
```ruff format --check .```

Run Ruff’s formatter
```ruff format .```

Run Ruff’s linter
```ruff check .```

### Any related issues, documentation, discussions?

Closes apache#4078

### How was this PR tested?

I created a PR on my own fork to ensure CI is working.

### Was this PR authored or co-authored using generative AI tooling?

No

---------

Co-authored-by: Xinyuan Lin <xinyual3@uci.edu>
### What changes were proposed in this PR?

This PR bumps the project version from `1.0.0` to `1.1.0-incubating` across all relevant configuration files:

- **`build.sbt`**: Updated `version := "1.0.0"` to `version := "1.1.0-incubating"`
- **`bin/single-node/docker-compose.yml`**:
  - Updated project name from `texera-single-node-release-1-0-0` to `texera-single-node-release-1-1-0-incubating`
  - Updated network name from `texera-single-node-release-1-0-0` to `texera-single-node-release-1-1-0-incubating`
  - Updated all 7 Texera service image tags from `:latest` to `:1.1.0-incubating`
  - Updated the R operator comment reference
- **`bin/k8s/values.yaml`**: Updated all 8 Texera service image tags from `:latest` to `:1.1.0-incubating`

### Any related issues, documentation, discussions?

Closes apache#4082

### How was this PR tested?

This is a configuration-only change.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.5)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
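The names listed above follow a simple pattern: the docker-compose project and network names use the version with dots replaced by hyphens, and image tags use the version verbatim. As a rough sketch of that pattern (the helper names `compose_project_name` and `image_ref` are hypothetical and not part of the repository):

```python
def compose_project_name(version: str) -> str:
    """Derive the single-node compose project/network name from a version string.

    Follows the pattern in the PR text: dots in the version become hyphens.
    """
    return "texera-single-node-release-" + version.replace(".", "-")


def image_ref(service: str, version: str) -> str:
    """Build an image reference under the apache/ registry namespace.

    `service` is an image name such as "texera-file-service".
    """
    return f"apache/{service}:{version}"


if __name__ == "__main__":
    print(compose_project_name("1.1.0-incubating"))
    print(image_ref("texera-file-service", "1.1.0-incubating"))
```

Deriving both strings from one version value is one way to keep `build.sbt`, the compose file, and the Helm values from drifting apart on the next bump.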
### What changes were proposed in this PR?

This PR renames the `BigObject` type to `LargeBinary`. The original feature was introduced in apache#4067, but we decided to adopt the `LargeBinary` terminology to align with naming conventions used in other systems (e.g., Arrow). This change is purely a renaming/terminology update and does not modify the underlying functionality.

### Any related issues, documentation, discussions?

apache#4100 (comment)

### How was this PR tested?

Run this workflow, check that it completes successfully, and confirm that three objects are created in the MinIO console.

[Java UDF.json](https://github.com/user-attachments/files/23976766/Java.UDF.json)

### Was this PR authored or co-authored using generative AI tooling?

No.

---------

Signed-off-by: Chris <143021053+kunwp1@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…4124)

### What changes were proposed in this PR?

This PR removes the `WITH_R_SUPPORT` build argument and all R-related installation logic from the Docker build configuration:

1. **Dockerfiles** (`computing-unit-master.dockerfile` and `computing-unit-worker.dockerfile`):
   - Removed `ARG WITH_R_SUPPORT` build argument
   - Removed conditional R runtime dependencies installation
   - Removed R compilation and installation steps (R 4.3.3)
   - Removed R packages installation (arrow, coro, dplyr)
   - Removed `LD_LIBRARY_PATH` environment variable for R libraries
   - Removed `r-requirements.txt` copy in worker dockerfile
   - Simplified to Python-only dependencies
2. **GitHub Actions Workflow** (`.github/workflows/build-and-push-images.yml`):
   - Removed `with_r_support` workflow input parameter
   - Removed `with_r_support` from job outputs and parameter passing
   - Removed `WITH_R_SUPPORT` build args from both AMD64 and ARM64 build steps
   - Removed R Support from build summary

### Any related issues, documentation, discussions?

Related to apache#4090

### How was this PR tested?

Verified that the Dockerfile and CI yml syntax are valid.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) via Claude Code CLI
NOTE: This tool is still in development. The design choices and features currently present are not finalized.
PR Description
This PR reintroduces the migration tool branch to the Texera repository after it was removed during our transition to an Apache project. The code changes included in this PR are purely front-end GUI changes, as the back-end is currently a standalone micro-service separate from the Texera codebase.
Purpose
Currently, users who have existing code outside of Texera and want to migrate it to Texera must create a workflow from scratch. Depending on the complexity of the code, this can take a long time. This tool aims to reduce the time spent migrating to Texera by using large language models to convert Jupyter Notebooks into Texera workflows.
Tool Overview (Demo Videos Below)
The user uploads a Jupyter Notebook, which is sent to the OpenAI LLM API to be migrated into a Texera workflow. Once the workflow is generated, the user can modify it alongside the original notebook until they are satisfied with the migration results.
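To make the upload step concrete, here is a minimal sketch of how a notebook file could be reduced to its cells before being sent to an LLM. It is not the tool's actual back-end, which is a separate micro-service outside this PR. It relies only on the standard notebook JSON layout (a top-level `cells` list whose entries have `cell_type` and `source`). The function names `extract_cells` and `build_prompt` and the prompt wording are assumptions made for illustration.

```python
import json


def extract_cells(notebook_json: str) -> list[dict]:
    """Return the non-empty code and markdown cells of a notebook, in order."""
    notebook = json.loads(notebook_json)
    cells = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", "")
        # The notebook format allows `source` to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") in ("code", "markdown") and source.strip():
            cells.append({"index": index, "type": cell["cell_type"], "source": source})
    return cells


def build_prompt(cells: list[dict]) -> str:
    """Assemble a hypothetical prompt; the real tool's prompt is not shown in this PR."""
    parts = ["Convert the following Jupyter Notebook cells into a Texera workflow."]
    for cell in cells:
        parts.append(f"[cell {cell['index']} | {cell['type']}]\n{cell['source']}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    sample = json.dumps({
        "cells": [
            {"cell_type": "markdown", "source": ["# Load data\n"]},
            {"cell_type": "code", "source": ["import pandas as pd\n", "df = pd.read_csv('a.csv')"]},
            {"cell_type": "code", "source": []},
        ]
    })
    print(build_prompt(extract_cells(sample)))
```

Keeping each cell's original index makes it possible to show a generated operator next to the notebook cell it came from, which fits the side-by-side editing described above.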
Design
Future Work
Demo
1. User starts with a Jupyter Notebook they want to migrate into Texera.
1.show.original.notebook.mp4
2. User uploads the Jupyter Notebook using the new tool button.
2.show.import.notebook.mp4
3. User can view the uploaded notebook from within Texera.
3.show.jupyter.window.mp4
4. Depending on the notebook's size and complexity, generation can take between one and three minutes. After the workflow is generated, the user can begin editing.
4.show.workflow.mp4