Skip to content

html支持#116

Open
tangtao0722 wants to merge 2 commits intomini-software:mainfrom
tangtao0722:supportHtml
Open

html支持#116
tangtao0722 wants to merge 2 commits intomini-software:mainfrom
tangtao0722:supportHtml

Conversation

@tangtao0722
Copy link

1.因为HtmlToOpenXml依赖,调整项目依赖项为net462和netstandard2.0
2.项目添加依赖HtmlToOpenXml和codePages
3.新增类:MiniWordHtml
4.MiniWord.Implment.cs文件中,添加了html解析代码
5.调用示例:
new MiniWordHtml {
HtmlText = File.ReadAllText(@"TestData\sample2.html")
}
6.HtmlToOpenXml开源地址:https://github.com/onizet/html2openxml

@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

此拉取请求旨在增强MiniWord库的功能,使其能够处理和嵌入HTML内容到生成的Word文档中。通过集成HtmlToOpenXml库,MiniWord现在可以解析HTML字符串并将其转换为OpenXML元素,从而极大地扩展了其处理富文本和从Web源导入内容的能力。

Highlights

  • HTML支持: 增加了对MiniWord中HTML内容的支持。
  • 目标框架更新: 项目目标框架已调整为net462和netstandard2.0。
  • 新依赖项: 引入了HtmlToOpenXml和System.Text.Encoding.CodePages NuGet包。
  • MiniWordHtml类: 新增了MiniWordHtml类,用于封装HTML文本。
  • HTML解析逻辑: 在MiniWord.Implment.cs中实现了HTML解析和插入逻辑,允许将HTML内容转换为Word文档元素。

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • src/MiniWord/MiniWord.Implment.cs
    • 调整了using语句,引入了HtmlToOpenXml命名空间。
    • 在ReplaceText方法中增加了对MiniWordHtml对象及其集合的处理逻辑。
    • 新增了AddHtmls方法,用于将HTML内容通过HtmlConverter解析并插入到Word文档中。
    • 新增了FindTopPara方法,用于查找顶层段落以正确插入HTML内容。
  • src/MiniWord/MiniWord.csproj
    • 更新了项目目标框架,从net45改为net462。
    • 添加了HtmlToOpenXml.dll和System.Text.Encoding.CodePages的NuGet包引用。
    • 更新了DocumentFormat.OpenXml的包版本。
  • src/MiniWord/MiniWordHtml.cs
    • 新增了MiniWordHtml类,作为HTML内容的参数对象,包含一个HtmlText属性。
Activity
  • 此拉取请求目前没有记录任何评论或审查活动。
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

本次代码变更主要是为了增加对HTML内容的支持。整体实现思路清晰,通过引入 HtmlToOpenXml 库来转换HTML,并将其集成到现有的模板渲染流程中。

我发现了一些可以改进的地方:

  1. AddHtmls 方法中存在一个空的 catch 块,这会隐藏潜在的错误。同时,该方法缺少对 FindTopPara 返回值的空检查,这可能导致在处理位于表格内的占位符时程序崩溃。
  2. FindTopPara 方法的循环逻辑可以优化,以提高效率和可读性。
  3. ReplaceText 方法中对不同HTML集合类型的处理可以进行重构,以减少代码重复。

具体的修改建议请见代码行评论。

Comment on lines +1202 to +1224
private static void AddHtmls(WordprocessingDocument docx, Run run, MiniWordHtml[] miniWordHtmls)
{
//找到当前顶级段落(body)添加,html中的表格不能直接放在run或者段落里
Paragraph topPara = FindTopPara(run);
foreach (var miniWordHtml in miniWordHtmls)
{
try
{
//实例化转换对象
HtmlConverter converter = new HtmlConverter(docx.MainDocumentPart);
//解析
var paras = converter.Parse(miniWordHtml.HtmlText);
//倒排插入(因为都是插入到标记位置后面所以需要倒排)
for (var i = paras.Count - 1; i >= 0; i--)
{
var item = paras[i];
topPara.Parent.InsertAfter(item, topPara);
}
}
catch (Exception)
{ }
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

AddHtmls 方法存在两个主要问题:

  1. 潜在的 NullReferenceException: 当占位符在表格等嵌套结构中时,FindTopPara 会返回 null。当前代码没有检查 topPara 是否为 null,会导致程序在尝试访问 topPara.Parent 时崩溃。
  2. 异常被静默吞噬: catch (Exception) 块是空的,这意味着任何在HTML解析或插入过程中发生的错误都将被忽略,这会使问题排查变得极为困难。

建议重构此方法以解决这些问题,提高代码的健壮性和可维护性。

        private static void AddHtmls(WordprocessingDocument docx, Run run, MiniWordHtml[] miniWordHtmls)
        {
            //找到当前顶级段落(body)添加,html中的表格不能直接放在run或者段落里
            Paragraph topPara = FindTopPara(run);
            if (topPara == null)
            {
                // TODO: Consider logging a warning or throwing an exception as HTML cannot be inserted here.
                return;
            }

            foreach (var miniWordHtml in miniWordHtmls)
            {
                try
                {
                    //实例化转换对象
                    HtmlConverter converter = new HtmlConverter(docx.MainDocumentPart);
                    //解析
                    var paras = converter.Parse(miniWordHtml.HtmlText);
                    //倒排插入(因为都是插入到标记位置后面所以需要倒排)
                    for (var i = paras.Count - 1; i >= 0; i--)
                    {
                        var item = paras[i];
                        topPara.Parent.InsertAfter(item, topPara);
                    }
                }
                catch (Exception ex)
                {
                    // TODO: Add logging here to trace potential issues.
                    // For example: System.Diagnostics.Trace.WriteLine($"Error parsing HTML: {ex.Message}");
                }
            }
        }

Comment on lines +657 to +671
else if (value is MiniWordHtml html)
{
AddHtmls(docx, run, new[] { html });
t.Remove();
}
else if (value is MiniWordHtml[] htmls)
{
AddHtmls(docx, run, htmls);
t.Remove();
}
else if (value is IEnumerable<MiniWordHtml> htmlList)
{
AddHtmls(docx, run, htmlList.ToArray());
t.Remove();
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

MiniWordHtml, MiniWordHtml[]IEnumerable<MiniWordHtml> 的处理存在代码重复,并且对 IEnumerable<MiniWordHtml> 调用 ToArray() 会带来不必要的性能开销。可以通过修改 AddHtmls 方法的签名,使其接受 IEnumerable<MiniWordHtml>,来简化这部分逻辑。这样可以合并 MiniWordHtml[]IEnumerable<MiniWordHtml> 的处理分支,并避免不必要的内存分配和集合转换。

                            else if (value is MiniWordHtml html)
                            {
                                AddHtmls(docx, run, new[] { html });
                                t.Remove();
                            }
                            else if (value is IEnumerable<MiniWordHtml> htmlList)
                            {
                                AddHtmls(docx, run, htmlList);
                                t.Remove();
                            }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已调整

Comment on lines +1231 to +1243
private static Paragraph FindTopPara(Run run)
{
Paragraph result = null;
for (var pnode = run.Parent; pnode != null;)
{
if (pnode is Paragraph para && pnode.Parent != null && pnode.Parent is Body)
{
result = para;
}
pnode = pnode.Parent;
}
return result;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

FindTopPara 方法中的 for 循环在找到匹配的段落后没有立即返回,而是继续向上遍历父元素直到根节点。这造成了不必要的性能开销。此外,循环的写法(for (var pnode = run.Parent; pnode != null;) 并在循环体内更新 pnode)也略显不寻常,可以改写为更常规的 forwhile 循环以提高可读性。

        private static Paragraph FindTopPara(Run run)
        {
            for (var pnode = run.Parent; pnode != null; pnode = pnode.Parent)
            {
                if (pnode is Paragraph para && pnode.Parent is Body)
                {
                    return para;
                }
            }
            return null;
        }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已调整

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant