<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>dlee 的 Blog</title>
<link href="/atom.xml" rel="self"/>
<link href="http://dlee-libo.github.io/"/>
<updated>2025-06-26T08:08:59.628Z</updated>
<id>http://dlee-libo.github.io/</id>
<author>
<name>dlee</name>
</author>
<generator uri="http://hexo.io/">Hexo</generator>
<entry>
<title>Why Is Storytelling the Most Effective Way to Convey Information?</title>
<link href="http://dlee-libo.github.io/2025/06/14/%E4%B8%BA%E4%BB%80%E4%B9%88%E8%AE%B2%E6%95%85%E4%BA%8B%E6%98%AF%E6%9C%80%E6%9C%89%E6%95%88%E7%9A%84%E4%BF%A1%E6%81%AF%E4%BC%A0%E9%80%92%E6%96%B9%E5%BC%8F%EF%BC%9F/"/>
<id>http://dlee-libo.github.io/2025/06/14/为什么讲故事是最有效的信息传递方式?/</id>
<published>2025-06-14T06:45:25.000Z</published>
<updated>2025-06-26T08:08:59.628Z</updated>
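<content type="html"><![CDATA[<p>Storytelling may well be the way humans are best at conveying information.</p><p>Why are we so good at hearing and telling stories? The reason probably lies in how our memory works. Some research suggests that human memory tends to organize content along narrative structures. In other words, it is as if a background process were continuously turning our experiences into a linear, coherent “story” that strings together fragments of the past. What we call a “blackout”, such as the memory loss that comes with being drunk, also looks a lot like an interruption of that narrative structure.</p><p>This is also why information delivered as a story tends to be easier for people to understand, remember, and empathize with. From this angle, it is worth looking at how different artistic media “tell stories”.</p><hr><h2 id="各种媒介与讲故事的能力"><a href="#各种媒介与讲故事的能力" class="headerlink" title="Media and their storytelling power"></a>Media and their storytelling power</h2><h3 id="🎨-绘画:情绪大于信息"><a href="#🎨-绘画:情绪大于信息" class="headerlink" title="🎨 Painting: emotion over information"></a>🎨 Painting: emotion over information</h3><p>A painting can carry a story, but it depends heavily on background knowledge. Some works convey only a mood and make no attempt to recount a specific event. Every viewer reads them differently: where you see poetry, someone else feels oppression. And what the artist meant to say is not necessarily what the viewer picks up.</p><p>Its strength lies in evoking feeling rather than delivering explicit information.</p><h3 id="🎶-音乐:情绪流动的语言"><a href="#🎶-音乐:情绪流动的语言" class="headerlink" title="🎶 Music: a language of flowing emotion"></a>🎶 Music: a language of flowing emotion</h3><p>Music, instrumental music in particular, resembles painting in that it is better at conveying emotion than specific events. But it has a stronger grip on emotion: through melody and rhythm it can directly shape the listener's emotional ups and downs.</p><p>Combined with lyrics or visuals (a music video, for example), its storytelling power grows.</p><h3 id="📖-文学:信息密度的跃迁"><a href="#📖-文学:信息密度的跃迁" class="headerlink" title="📖 Literature: a leap in information density"></a>📖 Literature: a leap in information density</h3><p>Words, whether written or spoken, greatly expand the capacity to tell stories. They can express not only plot and character but also ideas, metaphor, and structural logic.</p><p>Oral storytelling becomes even more compelling when combined with vocal variation, intonation, and background music, which also explains why radio dramas, podcasts, and pingshu (traditional Chinese oral storytelling) have sizable audiences.</p><h3 id="🏛-建筑:空间里的“静态叙事”"><a href="#🏛-建筑:空间里的“静态叙事”" class="headerlink" title="🏛 Architecture: “static narrative” in space"></a>🏛 Architecture: “static narrative” in space</h3><p>I will not go into detail here, but architecture, as a spatial medium, can also “tell stories” through paths, light and shadow, and structure. It is static and non-linear, yet it has great potential in narrative spatial design (museum exhibitions, for example).</p><h3 id="🎬-电影:讲故事的集大成者"><a href="#🎬-电影:讲故事的集大成者" class="headerlink" title="🎬 Film: the grand synthesis of storytelling"></a>🎬 Film: the grand synthesis of storytelling</h3><p>Film integrates image, sound, rhythm, and language, and is one of the most powerful narrative tools of the modern era. Through montage editing it can also enlist the audience's ability to fill in the gaps, moving freely between depicting reality and fiction.</p><p>It is a systematic amplification of the human “gift for storytelling”.</p><hr><h2 id="🎮-游戏:最具潜力也最难掌握的故事形式"><a href="#🎮-游戏:最具潜力也最难掌握的故事形式" class="headerlink" title="🎮 Games: the story form with the most potential, and the hardest to master"></a>🎮 Games: the story form with the most potential, and the hardest to master</h2><p>In theory, games are a superset of film. They add interactivity: the audience no longer just receives passively but takes part.</p><p>But this brings a challenge too: interaction often breaks the continuity of the narrative. You are immersed in a horror atmosphere, and suddenly the game asks you to solve a puzzle; you are enjoying the plot, and the next second you are thrown into a meaningless fight. Broken pacing and scattered attention are problems many games face.</p><h3 id="讲故事-vs-可玩性:魂类游戏的例子"><a href="#讲故事-vs-可玩性:魂类游戏的例子" class="headerlink" title="Storytelling vs. playability: the example of Souls-like games"></a>Storytelling vs. playability: the example of Souls-like games</h3><p>“Souls-like” games offer an interesting solution. They play down linear narrative in the traditional sense and instead use fragments, deliberate gaps, and environmental detail to encourage players to piece the story together themselves. Mechanically, they also deliberately <strong>leave out strongly directive UI
elements such as maps and quest guidance</strong>, so players have to advance through their own observation and exploration.</p><p>This design not only heightens the sense of exploration but also <strong>effectively cuts down on distractions</strong>: with no arrows, no quest list, and no little dots on a map, attention naturally settles on combat and the environment. Every bit of progress is something you earned yourself, so you are naturally more invested and remember it better.</p><p>In the end, what you piece together is not just the game's lore but your own story.</p><p>At the other end are games that emphasize system design, such as competitive games like CS, which drop the plot almost entirely and draw players in through a rule-driven loop of practice, challenge, and achievement.</p><p>Souls-likes sit between the two and strike a good balance.</p><hr><h2 id="🤖-AI-Chatbot:一种新型的交互性知识传递方式?"><a href="#🤖-AI-Chatbot:一种新型的交互性知识传递方式?" class="headerlink" title="🤖 AI chatbots: a new, interactive way of transferring knowledge?"></a>🤖 AI chatbots: a new, interactive way of transferring knowledge?</h2><p>The recently risen AI chatbot may not be a new art form, but as a way of transferring knowledge it closely resembles games: it emphasizes “interaction”.</p><p>You can bring your own questions, doubts, and analogies; the chatbot can not only answer but also correct where your analogy goes astray and broaden your thinking. Compared with traditional MOOCs or video lectures (even those with bullet comments and quizzes), this greatly increases initiative and personalization.</p><p>Active learning is really much like what we said at the start: the closer something is to “your own story”, the easier it is to remember.</p><hr><h2 id="总结"><a href="#总结" class="headerlink" title="Summary"></a>Summary</h2><p>Stories are the way humans are best at, and most fond of, taking in information. And advances in media keep enriching the ways we tell them.</p><ul><li>Painting and music express emotion;</li><li>Literature and film express plot;</li><li>Games and AI open up the possibility of interactive, personalized narrative.</li></ul><p>In the end, what we remember is always the story we lived through ourselves, even if it is the version of us that failed a hundred times in a virtual world.</p>]]></content>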
<summary type="html">
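<p>Storytelling may well be the way humans are best at conveying information.</p>
<p>Why are we so good at hearing and telling stories? The reason probably lies in how our memory works. Some research suggests that human memory tends to organize content along narrative structures. In other words, it is as if a background process were continuously turning our experiences into a linear, coherent “story” that strings together fragments of the past.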
</summary>
</entry>
<entry>
<title>Open Source, Hardware, and the Reshaping of the Software Industry: Growth, Consolidation, and Cultural Lag</title>
<link href="http://dlee-libo.github.io/2025/06/13/Open-Source-Hardware-and-the-Reshaping-of-the-Software-Industry-Growth-Consolidation-and-Cultural-Lag/"/>
<id>http://dlee-libo.github.io/2025/06/13/Open-Source-Hardware-and-the-Reshaping-of-the-Software-Industry-Growth-Consolidation-and-Cultural-Lag/</id>
<published>2025-06-13T09:22:46.000Z</published>
<updated>2025-06-13T09:23:20.562Z</updated>
<content type="html"><![CDATA[<p>The software industry has undergone explosive growth over the past few decades, with two key forces behind its rapid transformation: the <strong>open source movement</strong> and the <strong>dramatic advancement in hardware</strong>. These forces have democratized access to tools and knowledge, fueled innovation, and reshaped global competitiveness — especially in countries like China. But these same forces have also introduced subtle challenges: labor displacement, cultural gaps in management, and increasing consolidation around dominant technologies.</p><h2 id="Open-Source-as-Information-Democracy"><a href="#Open-Source-as-Information-Democracy" class="headerlink" title="Open Source as Information Democracy"></a>Open Source as Information Democracy</h2><p>The open source movement is often seen as a triumph of collaboration, but at its core, it’s a powerful engine of <strong>information transparency</strong> and <strong>knowledge dissemination</strong>. By making software source code publicly available, open source allowed developers anywhere in the world to study, learn from, and build upon cutting-edge technology.</p><p>This movement <strong>flattened global barriers</strong> to software innovation. In China, for example, access to open source projects enabled rapid catch-up with Western counterparts. Developers could immediately access high-quality code, best practices, and community-driven knowledge without needing access to elite universities or proprietary corporate systems.</p><p>Open source has created a shared foundation for software development that transcends borders. It has empowered individuals, startups, and even entire nations to build competitive products without reinventing the wheel. 
It is, in many ways, a form of <strong>global public infrastructure</strong> for the digital age.</p><h2 id="Hardware-Improvement-Lowering-the-Barrier-to-Entry"><a href="#Hardware-Improvement-Lowering-the-Barrier-to-Entry" class="headerlink" title="Hardware Improvement: Lowering the Barrier to Entry"></a>Hardware Improvement: Lowering the Barrier to Entry</h2><p>Running parallel to the open source revolution is the relentless improvement in hardware capabilities. The increasing power and affordability of CPUs, memory, and storage significantly reduced the technical prerequisites for building software.</p><p>In earlier eras, developers needed to understand low-level system behavior to write performant code. Today, however, high-level abstractions and frameworks, supported by powerful hardware, allow developers to write functional applications even if the underlying code is inefficient. This shift has opened the door to many more participants in the industry.</p><p>Especially in China, this hardware-fueled democratization played a key role. Without needing deep system-level knowledge, a large population of new developers could quickly become productive, contributing to the country’s rapid software industry expansion.</p><h2 id="The-Winner-Take-All-Effect-of-Open-Source"><a href="#The-Winner-Take-All-Effect-of-Open-Source" class="headerlink" title="The Winner-Take-All Effect of Open Source"></a>The Winner-Take-All Effect of Open Source</h2><p>Despite the decentralizing ethos of open source, it has also led to <strong>a new kind of monopolization</strong>. In many domains — databases, web servers, orchestration, machine learning frameworks — a single open source project becomes the de facto standard.</p><p>Once a high-quality solution is widely adopted, it eliminates the need for alternatives. Network effects kick in. Tooling, talent, and documentation all concentrate around the dominant choice. 
While this avoids wasteful duplication and drives consistency, it also <strong>limits diversity</strong> and stifles alternative experimentation.</p><p>This standardization creates a paradox: open source makes software more accessible, yet at the same time <strong>centralizes power</strong> in a handful of dominant ecosystems and contributors.</p><h2 id="Labor-Market-Consequences-Infrastructure-Work-Disappears"><a href="#Labor-Market-Consequences-Infrastructure-Work-Disappears" class="headerlink" title="Labor Market Consequences: Infrastructure Work Disappears"></a>Labor Market Consequences: Infrastructure Work Disappears</h2><p>One of the least-discussed side effects of this consolidation is its impact on the labor market. When open source infrastructure becomes ubiquitous, the demand for engineers to build and maintain alternative systems disappears. What once required dedicated teams inside every company is now outsourced to a few core maintainers or cloud service providers.</p><p>As a result, <strong>infrastructure engineering jobs shrink</strong>, and developers are pushed <strong>up the stack</strong> toward application development. While this shift enables faster product delivery, it also narrows the career paths available to engineers and concentrates specialized knowledge in fewer hands.</p><h2 id="Management-The-Missing-Link-in-Information-Sharing"><a href="#Management-The-Missing-Link-in-Information-Sharing" class="headerlink" title="Management: The Missing Link in Information Sharing"></a>Management: The Missing Link in Information Sharing</h2><p>Software development was successfully democratized through open source — anyone with internet access could learn from and build upon public code. Hardware, by contrast, was democratized through physical delivery: end users received powerful tools, but not the knowledge of how they were made. <strong>Management</strong>, however, experienced neither. 
It remained locked in practice, largely undocumented and deeply dependent on tacit experience. This lack of transparent, transferable knowledge made it far harder for organizations to improve their leadership structures by simply observing or replicating successful models elsewhere.</p><p>This gap is particularly visible in the Chinese tech industry. While the technical side advanced rapidly, management practices lagged behind. The result: widespread inefficiencies, poor planning, and toxic work cultures like <strong>996</strong> (working from 9 a.m. to 9 p.m., six days a week).</p><p>Many workers in China have openly criticized the <strong>waste of labor hours</strong> and lack of respect for personal time — signs that while the tools of production advanced, the organizational systems managing those tools did not evolve at the same pace.</p><h2 id="A-Complex-Transformation"><a href="#A-Complex-Transformation" class="headerlink" title="A Complex Transformation"></a>A Complex Transformation</h2><p>The software industry’s transformation over the past few decades has been dramatic, global, and uneven. Open source and hardware improvements made software development more inclusive and faster-moving. But this same transformation led to consolidation of tools, loss of infrastructure engineering roles, and exposed deep cultural gaps in leadership and management.</p><p>We’re left with a layered reality:</p><ul><li>Open source enabled learning and growth, but also centralization.</li><li>Hardware progress reduced the need for deep CS expertise, but made inefficiency tolerable.</li><li>Management remains a bottleneck in many regions, especially where organizational culture hasn’t caught up with technical capability.</li></ul><h2 id="Conclusion"><a href="#Conclusion" class="headerlink" title="Conclusion"></a>Conclusion</h2><p>The evolution of the software industry is a story of <strong>information freedom</strong> and <strong>structural tradeoffs</strong>. 
Open source and hardware democratized access and supercharged global development, but they also introduced new dependencies, shifted labor dynamics, and revealed the limits of what information alone can solve.</p><p>As we prepare for the next phase — AI-driven development, decentralized systems, or sovereign tech stacks — we must recognize that <strong>not all parts of the stack evolve equally</strong>. Tools may be global, but management is still deeply local. And understanding that tension will be key to building sustainable and humane systems in the decades to come.</p>]]></content>
<summary type="html">
<p>The software industry has undergone explosive growth over the past few decades, with two key forces behind its rapid transformation: the
</summary>
</entry>
<entry>
<title>Kernel journey with bpftrace</title>
<link href="http://dlee-libo.github.io/2020/05/31/bpftrace-kernel-journey/"/>
<id>http://dlee-libo.github.io/2020/05/31/bpftrace-kernel-journey/</id>
<published>2020-05-31T07:55:19.000Z</published>
<updated>2025-06-13T09:17:24.720Z</updated>
<content type="html"><![CDATA[<h2 id="背景"><a href="#背景" class="headerlink" title="Background"></a>Background</h2><p>A few days ago a colleague and I were discussing how parts of calico are implemented, and he gave me a script to try out how a veth device on Linux lets a separate network namespace communicate with the host. To my surprise, the setup did not work correctly on my laptop. Since there was little documentation to go on, I used bpftrace alongside reading the kernel source and eventually figured out the cause. I am taking this opportunity to write it down and show the basic usage of bpftrace and a few related tools.</p><h2 id="问题描述"><a href="#问题描述" class="headerlink" title="Problem description"></a>Problem description</h2><p>Running the script below creates a new network namespace, <code>ns0</code>, and a pair of veth devices, <code>v-ns0</code> and <code>v-ns0-peer</code>. We move <code>v-ns0</code> into <code>ns0</code> and leave <code>v-ns0-peer</code> on the host. With proxy_arp enabled on <code>v-ns0-peer</code>, we should see <code>v-ns0-peer</code> answer ARP requests sent by <code>v-ns0</code> with its own MAC address. With the appropriate forwarding and routing rules added, processes in <code>ns0</code> would then be able to talk to containers on other machines.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#!/bin/bash</span></span><br><span class="line">NS=ns0</span><br><span class="line">VETH=v-<span class="variable">$NS</span></span><br><span class="line"></span><br><span class="line">ip netns add <span class="variable">$NS</span></span><br><span class="line"></span><br><span class="line">ip link add <span class="variable">$VETH</span> <span class="built_in">type</span> veth peer name <span class="variable">$VETH</span>-peer</span><br><span class="line"></span><br><span class="line">ip link <span 
class="built_in">set</span> <span class="variable">$VETH</span>-peer up</span><br><span class="line">ip link <span class="built_in">set</span> <span class="variable">$VETH</span> netns <span class="variable">$NS</span></span><br><span class="line">ip netns <span class="built_in">exec</span> <span class="variable">$NS</span> ip link <span class="built_in">set</span> <span class="variable">$VETH</span> up</span><br><span class="line"></span><br><span class="line">ip netns <span class="built_in">exec</span> <span class="variable">$NS</span> ip addr add 10.6.0.1/32 dev <span class="variable">$VETH</span></span><br><span class="line">ip netns <span class="built_in">exec</span> <span class="variable">$NS</span> ip route add 169.254.0.1 dev <span class="variable">$VETH</span> scope link</span><br><span class="line">ip netns <span class="built_in">exec</span> <span class="variable">$NS</span> ip route add default via 169.254.0.1 dev <span class="variable">$VETH</span></span><br><span class="line"></span><br><span class="line"><span class="built_in">echo</span> 1 > /proc/sys/net/ipv4/conf/<span class="variable">$VETH</span>-peer/proxy_arp</span><br><span class="line"><span class="built_in">echo</span> 0 > /proc/sys/net/ipv4/conf/<span class="variable">$VETH</span>-peer/rp_filter</span><br></pre></td></tr></table></figure><p>The main problem was that after running the script on my laptop, proxy_arp on <code>v-ns0-peer</code> did not take effect. Capturing packets on <code>v-ns0-peer</code> with wireshark while running <code>ip netns exec ns0 ping 192.168.1.1</code> clearly showed ARP requests for 169.254.0.1, the default gateway inside <code>ns0</code>, but no ARP replies at all.<br><img src="/images/no-arp-reply.png" alt="no arp reply"></p><h2 id="问题分析"><a href="#问题分析" class="headerlink" title="Analysis"></a>Analysis</h2><p>Since proxy_arp on <code>v-ns0-peer</code> was not working, my first guess was that some setting needed to be turned on, possibly something related to security policy. I searched Google for a while but found nothing particularly helpful, so I decided to trace the kernel code and see which condition was not being met. Analyzing the kernel purely by reading its source is hard, so I chose to read the code while using a tracing tool to quickly pin down the kernel's execution path. The tool I picked is <a href="https://github.com/iovisor/bpftrace" target="_blank" 
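rel="noopener">bpftrace</a>.</p><h3 id="工具准备"><a href="#工具准备" class="headerlink" title="Preparing the tools"></a>Preparing the tools</h3><p>My laptop runs Ubuntu 20.04; installing and preparing the tools on other operating systems should be similar.</p><ol><li>Get the source code of the running kernel: run <code>apt-get source linux-image-unsigned-$(uname -r)</code>.</li><li>Get the debug info for the running kernel: add the ddebs.ubuntu.com repository, then run <code>apt-get install linux-image-$(uname -r)-dbgsym</code>.</li><li>Install bpftrace and <a href="https://github.com/iovisor/bcc" target="_blank" rel="noopener">bcc</a>: run <code>apt-get install bpftrace bpfcc-tools linux-headers-$(uname -r)</code>.</li></ol><h3 id="定位内核代码"><a href="#定位内核代码" class="headerlink" title="Locating the kernel code"></a>Locating the kernel code</h3><p>Our goal is to find out why proxy_arp is not working. ARP requests are handled by the <code>arp_process</code> function in net/ipv4/arp.c. Skimming that function quickly shows that the proxy_arp-related code is around line 813, shown below.<br><img src="/images/code-813.png" alt="code-813"></p><h3 id="trace-内核"><a href="#trace-内核" class="headerlink" title="Tracing the kernel"></a>Tracing the kernel</h3><p>Once our ARP request is received on <code>v-ns0-peer</code>, the kernel reaches line 813, where the check on arp->ar_op certainly sees ARPOP_REQUEST, so the first thing to determine is the return value of <code>ip_route_input_noref</code>. We use bpftrace for this: running <code>bpftrace -e 'kretprobe:ip_route_input_noref { printf("pid %d. ret: %d\n", pid, retval); }'</code> gives us the return value of every call to <code>ip_route_input_noref</code>.</p><p>After starting the trace, we already see some output before doing anything in <code>ns0</code>:<br><img src="/images/trace1.png" alt="trace1"><br>This is evidently the machine handling other ARP requests. To get rid of the noise, I disconnected the laptop from the network, restarted the trace, and ran <code>ip netns exec ns0 ping 192.168.1.1</code> to trigger an ARP request.<br><img src="/images/trace2.png" alt="trace2"><br>There is a lot of output, some from the ping process and some with PID 0, and every return value is 0. So we can be sure execution entered line 816. After that come two branches, at lines 819 and 836. Because <code>skb_rtable</code> is inlined, we cannot trace it with bpftrace. To confirm the execution path, we can take advantage of the fact that bpftrace's kprobe supports function offsets and place probes at specific offsets.</p><h4 id="反汇编内核"><a href="#反汇编内核" class="headerlink" title="Disassembling the kernel"></a>Disassembling the kernel</h4><p>To find the offsets of the code for the branches at lines 819 and 837 relative to <code>arp_process</code>, we disassemble with gdb, <code>gdb -q /usr/lib/debug/boot/vmlinux-$(uname -r) --ex 'disassemble arp_process'</code>, and scroll to the neighborhood of the call to <code>ip_route_input_noref</code>:<br><img src="/images/disasm.png" alt="disassemble"><br>The screenshot shows that the call to <code>ip_route_input_noref</code> is at +1173, and that +1191 tests whether its return value is 0; if it is nonzero, execution jumps to arp_process+289. We can use addr2line to confirm which source location that corresponds to. Running <code>addr2line -e /usr/lib/debug/boot/vmlinux-$(uname -r) 0xffffffff819ed861</code> yields linux-5.4.0/include/net/neighbour.h:516. Reading the code shows this is the implementation of <code>__neigh_lookup</code>, which is actually called at line 865 and has been inlined.</p><p>Since the trace above told us that <code>ip_route_input_noref</code> returns 0, execution should enter line 816, that is, continue at +1197. The instruction at +1197 reads a value from memory, masks off its low bits, and uses the result as an address to load another value; from the source we know this is the body of <code>skb_rtable</code>. The test at +1210 is then the check on line 819; the source tells us that <code>RTN_LOCAL</code> is 2 and <code>RTN_UNICAST</code> is 1. To find out whether execution enters line 820, we can place a probe at +1219. Unfortunately, the bpftrace installed on my laptop was compiled without <code>ALLOW_UNSAFE_PROBE</code>, so running <code>bpftrace -e 'kprobe:arp_process+1219 { printf("executed\n"); } '</code> on my laptop fails with an error:<br><img src="/images/unsafe.png" alt="unsafe-error"></p><h4 id="手工使用-bcc"><a href="#手工使用-bcc" class="headerlink" title="Using bcc by hand"></a>Using bcc by hand</h4><p>Since bpftrace will not do it, we can write a program directly against bcc to trace the kernel. The code is very simple:</p><figure class="highlight python"><table><tr><td 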
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># -*- coding: utf-8 -*-</span></span><br><span class="line"><span class="keyword">from</span> bcc <span class="keyword">import</span> BPF</span><br><span class="line"><span class="keyword">from</span> time <span class="keyword">import</span> strftime</span><br><span class="line"></span><br><span class="line"><span class="comment"># load BPF program</span></span><br><span class="line">bpf_text = <span class="string">"""</span></span><br><span class="line"><span class="string">#include <uapi/linux/ptrace.h></span></span><br><span class="line"><span class="string"></span></span><br><span class="line"><span class="string">int print_called(struct pt_regs *ctx) {</span></span><br><span class="line"><span class="string"> bpf_trace_printk("executed!\\n");</span></span><br><span class="line"><span class="string"></span></span><br><span class="line"><span class="string"> return 0;</span></span><br><span class="line"><span class="string">}</span></span><br><span class="line"><span 
class="string">"""</span></span><br><span class="line"></span><br><span class="line">b = BPF(text=bpf_text)</span><br><span class="line">b.attach_kprobe(event=<span class="string">'arp_process'</span>, event_off=<span class="number">1219</span>, fn_name=<span class="string">'print_called'</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># header</span></span><br><span class="line">print(<span class="string">'%-9s %-6s %s'</span> % (<span class="string">'TIME'</span>, <span class="string">'PID'</span>, <span class="string">'MSG'</span>))</span><br><span class="line"></span><br><span class="line"><span class="comment"># format output</span></span><br><span class="line"><span class="keyword">while</span> <span class="number">1</span>:</span><br><span class="line"> <span class="keyword">try</span>:</span><br><span class="line"> (task, pid, cpu, flags, ts, msg) = b.trace_fields()</span><br><span class="line"> <span class="keyword">except</span> ValueError:</span><br><span class="line"> <span class="keyword">continue</span></span><br><span class="line"> print(<span class="string">'%-9s %-6d %s'</span> % (strftime(<span class="string">'%H:%M:%S'</span>), pid, msg))</span><br></pre></td></tr></table></figure><p>Start the bcc program, run ping, and we find that +1219 is reached, which means <code>addr_type</code> is not <code>RTN_LOCAL</code>. Execution therefore proceeds to the <code>IN_DEV_FORWARD(in_dev)</code> check on line 836. Looking at the assembly, it is easy to see that the key is the <code>je</code> instruction at +1229: if the jump is taken, line 837 is not entered. So tracing +1235 tells us the answer. We run ping and find that +1235 is not reached! The <code>IN_DEV_FORWARD(in_dev)</code> check failed. Searching the source tree shows that it is a macro whose main job is to check whether the device's forwarding option is enabled. Running <code>cat /proc/sys/net/ipv4/conf/v-ns0-peer/forwarding</code> shows 0, so we change it to 1 and keep tracing. Now execution reaches +1235, but the packet capture still shows no ARP reply.</p><p>Continuing with the assembly, the test at +1235 and +1238 corresponds to the first condition on line 837, which checks whether <code>addr_type</code> is <code>RTN_UNICAST</code>. We trace +1244 to verify whether this condition holds. Running ping shows that the check fails, so execution jumps past the block and proxy_arp does not work.</p><p>A little reading of the code tells us that <code>addr_type</code> is obtained by looking up the target IP address of our
ARP request in the routing table; the logic here only wants to make sure that the IP address is a unicast address. The address we are asking about, 169.254.0.1, is a link-local address and certainly unicast, so this behavior is rather odd. Thinking further, and taking into account a passage from Wikipedia's description of proxy ARP:</p><blockquote><p>The proxy is aware of the location of the traffic’s destination, and offers its own MAC address as the (ostensibly final) destination.</p></blockquote><p>My guess was that, with the network disconnected, my laptop did not know how to reach 169.254.0.1. Indeed, <code>ip route get 169.254.0.1</code> failed with: RTNETLINK answers: Network is unreachable. I reconnected the laptop to the network and continued testing. First, <code>ip route get 169.254.0.1</code> now returned correctly. Then we go back to the beginning and trace <code>ip_route_input_noref</code> again; to keep the noise down, I closed most programs on the laptop. Starting the trace and running ping showed that <code>ip_route_input_noref</code> now returned a nonzero value: -18.</p><h4 id="深入-ip-route-input-noref"><a href="#深入-ip-route-input-noref" class="headerlink" title="Digging into ip_route_input_noref"></a>Digging into <code>ip_route_input_noref</code></h4><p>Reading the kernel source, <code>ip_route_input_noref</code> is implemented in net/ipv4/route.c, and a simplified call chain is <code>ip_route_input_noref</code> -> <code>ip_route_input_rcu</code> -> <code>ip_route_input_slow</code>. The function <code>ip_route_input_slow</code> is fairly complex, and all we want is the reason for the -18, so we can follow the places where the return value is produced. Setting aside the mostly constant values that do not match what we are looking for, the first place that could produce -18 is the call to <code>fib_validate_source</code>. Running <code>bpftrace -e 'kretprobe:fib_validate_source { printf("pid %d. ret: %d\n", pid, retval); }'</code> and starting ping, we get lucky: this function does return -18!</p><h4 id="追随-fib-validate-source"><a href="#追随-fib-validate-source" class="headerlink" title="Following fib_validate_source"></a>Following <code>fib_validate_source</code></h4><p><code>fib_validate_source</code> is implemented in net/ipv4/fib_frontend.c. The code is short, and the place that can produce -18 is <code>__fib_validate_source</code>. Looking inside <code>__fib_validate_source</code>, the <code>return -EXDEV</code> at the end stands out, and a quick C program confirms that errno EXDEV is exactly 18! Reading the code shows that EXDEV is returned because the rpf parameter is nonzero. That parameter is passed in by the caller, <code>fib_validate_source</code>, and its value is obtained like this: <code>int r = secpath_exists(skb) ? 
0 : IN_DEV_RPFILTER(idev);</code>. So <code>secpath_exists(skb)</code> returned 0 and <code>IN_DEV_RPFILTER</code> returned a nonzero value. Yet our original script clearly contains the line <code>echo 0 > /proc/sys/net/ipv4/conf/$VETH-peer/rp_filter</code>, so this is rather odd.</p><p>At this point we know the direction but still not the root cause, so let us see whether anything is unusual about the implementation of <code>IN_DEV_RPFILTER</code>. In include/linux/inetdevice.h we find the macro defined as <code>#define IN_DEV_RPFILTER(in_dev) IN_DEV_MAXCONF((in_dev), RP_FILTER)</code>, so we look at the definition of <code>IN_DEV_MAXCONF</code> next:</p><figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">define</span> IN_DEV_MAXCONF(in_dev, attr) \</span></span><br><span class="line">(<span class="built_in">max</span>(IPV4_DEVCONF_ALL(dev_net(in_dev->dev), attr), \</span><br><span class="line"> IN_DEV_CONF_GET((in_dev), attr)))</span><br></pre></td></tr></table></figure><p>Aha! It takes the larger of the global value and the per-device value. Running <code>cat /proc/sys/net/ipv4/conf/all/rp_filter</code> shows that the value is 2. After changing it to 0 and testing again, the ARP reply appears as expected!<br><img src="/images/arp-reply.png" alt="arp reply"></p>]]></content>
<summary type="html">
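<h2 id="背景"><a href="#背景" class="headerlink" title="Background"></a>Background</h2><p>A few days ago a colleague and I were discussing how parts of calico are implemented, and he gave me a script to try out how a veth device on Linux lets a separate net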
</summary>
</entry>
<entry>
<title>UNIX process UID model</title>
<link href="http://dlee-libo.github.io/2019/05/13/UNIX-process-UID-model/"/>
<id>http://dlee-libo.github.io/2019/05/13/UNIX-process-UID-model/</id>
<published>2019-05-13T06:53:41.000Z</published>
<updated>2025-06-13T09:17:24.720Z</updated>
<content type="html"><![CDATA[<p>In a UNIX environment, each process has three user IDs: the real user ID, the effective user ID, and the saved set-user-ID. How these three UIDs are set depends on whether the setuid bit of the executable file is set. The table below shows how the three UIDs change after an <code>exec</code> system call.</p><table><thead><tr><th align="left">setuid bit</th><th align="left">real user ID</th><th align="left">effective user ID</th><th align="left">saved set-user-ID</th></tr></thead><tbody><tr><td align="left">set</td><td align="left">unchanged</td><td align="left">the owner of the executable file</td><td align="left">copied from the effective user ID</td></tr><tr><td align="left">unset</td><td align="left">unchanged</td><td align="left">unchanged</td><td align="left">copied from the effective user ID</td></tr></tbody></table><p>Here is a demo. I first compile a simple program into <code>a.out</code>, then use <code>chown root:root a.out</code> to change the owner of the executable file, and after that use <code>chmod +s a.out</code> to set the setuid bit. The output of <code>ls -l a.out</code> is: <code>-rwsrwsr-x 1 root root 14600 Apr 24 08:02 a.out</code>. 
After starting <code>a.out</code>, the output of <code>ps -o pid,ppid,euid,ruid,suid,cmd -p 16990,17210</code> is:</p><table><thead><tr><th align="left">PID</th><th align="left">PPID</th><th align="left">EUID</th><th align="left">RUID</th><th align="left">SUID</th><th align="left">CMD</th></tr></thead><tbody><tr><td align="left">16990</td><td align="left">16989</td><td align="left">1001</td><td align="left">1001</td><td align="left">1001</td><td align="left">-bash</td></tr><tr><td align="left">17210</td><td align="left">16990</td><td align="left">0</td><td align="left">1001</td><td align="left">0</td><td align="left">./a.out</td></tr></tbody></table><p>Notice that the EUID of <code>a.out</code> is root (0), the RUID stays the same as that of its parent process, and the SUID is the same as the EUID.</p><hr><p>Permission checks are based on the effective user ID. UNIX systems provide these system calls to manipulate the three UIDs: <a href="http://man7.org/linux/man-pages/man2/setuid.2.html" target="_blank" rel="noopener">setuid</a> and <a href="http://man7.org/linux/man-pages/man2/seteuid.2.html" target="_blank" rel="noopener">seteuid</a>. How these system calls affect the three UIDs depends on whether the process has root privilege.</p><table><thead><tr><th align="left">system call</th><th align="left">ID</th><th align="left">root privilege</th><th align="left">non-root privilege</th></tr></thead><tbody><tr><td align="left"><code>setuid(uid)</code></td><td align="left">real user ID</td><td align="left">set to uid</td><td align="left">unchanged</td></tr><tr><td align="left"><code>setuid(uid)</code></td><td align="left">effective user ID</td><td align="left">set to uid</td><td align="left">set to uid; 
uid must equal the ruid or suid, otherwise an error is returned</td></tr><tr><td align="left"><code>setuid(uid)</code></td><td align="left">saved set-user-ID</td><td align="left">set to uid</td><td align="left">unchanged</td></tr></tbody></table><table><thead><tr><th align="left">system call</th><th align="left">ID</th><th align="left">root privilege</th><th align="left">non-root privilege</th></tr></thead><tbody><tr><td align="left"><code>seteuid(uid)</code></td><td align="left">real user ID</td><td align="left">unchanged</td><td align="left">unchanged</td></tr><tr><td align="left"><code>seteuid(uid)</code></td><td align="left">effective user ID</td><td align="left">set to uid</td><td align="left">set to uid; uid must equal the ruid or suid, otherwise an error is returned</td></tr><tr><td align="left"><code>seteuid(uid)</code></td><td align="left">saved set-user-ID</td><td align="left">unchanged</td><td align="left">unchanged</td></tr></tbody></table><p>One use case of this model is a program we are all familiar with, <code>sudo</code>. The output of <code>ll /usr/bin/sudo</code> is: <code>-rwsr-xr-x 1 root root 149080 Jan 18 2018 /usr/bin/sudo</code>.<br>We can see that <code>sudo</code> has the setuid bit set. What happens after we type <code>sudo some-command</code>? The shell starts <code>sudo</code> with <code>some-command</code> as its argument. Because <code>sudo</code> has the setuid bit set, it runs with its ruid set to the normal user and its euid and suid set to root. Then <code>sudo</code> calls <code>setuid(0)</code> to change all three UIDs to root, after which it <code>fork</code>s and <code>exec</code>s our command, so our command runs with all three UIDs set to root. This is only a brief outline; the permission checks and the password prompt are left out.</p>]]></content>
<summary type="html">
<p>In a UNIX environment, each process has three user IDs: real user ID, effective user ID, and saved set-user-ID. How these three UIDs are set is d
</summary>
<category term="UNIX" scheme="http://dlee-libo.github.io/tags/UNIX/"/>
</entry>
<entry>
<title>Go dependency management introduction</title>
<link href="http://dlee-libo.github.io/2019/03/14/Go-dependency-management-introduction/"/>
<id>http://dlee-libo.github.io/2019/03/14/Go-dependency-management-introduction/</id>
<published>2019-03-14T03:24:27.000Z</published>
<updated>2025-06-13T09:17:24.720Z</updated>
<content type="html"><![CDATA[<p>As far as I know, we usually have three ways to manage dependencies for a Go project.</p><ol><li>Manage manually.</li><li>Use <a href="https://golang.github.io/dep/" target="_blank" rel="noopener">dep</a></li><li>Use <a href="https://github.com/golang/go/wiki/Modules" target="_blank" rel="noopener">Go module</a></li></ol><h2 id="Manually-management"><a href="#Manually-management" class="headerlink" title="Manually management"></a>Manual management</h2><p>To manage dependencies manually, we copy all our external dependencies into the <code>vendor</code> folder, then add the <code>vendor</code> folder to the version control system. That’s all!</p><h2 id="Dep"><a href="#Dep" class="headerlink" title="Dep"></a>Dep</h2><p>The typical usage of dep is that we write our code first and then run <code>dep ensure</code>, which scans our code to find its external dependencies. Dep then downloads those dependencies and stores them inside our project’s <code>vendor</code> folder. Dep maintains a <code>Gopkg.lock</code> file to record version information. As programmers we don’t edit <code>Gopkg.lock</code> directly; we edit a file named <code>Gopkg.toml</code> to specify the rules that guide dep in generating <code>Gopkg.lock</code>. With dep, we usually don’t need to add the <code>vendor</code> folder to the version control system; instead we commit <code>Gopkg.lock</code> and <code>Gopkg.toml</code>.</p><h2 id="Go-module"><a href="#Go-module" class="headerlink" title="Go module"></a>Go module</h2><p>With dep or manual management, we need to place our project properly inside <code>$GOPATH/src</code> if it contains more than one package. That’s very unpleasant for some people. So we can use <a href="https://github.com/golang/go/wiki/Modules" target="_blank" rel="noopener">Go module</a> instead. 
For a project using Go module, we should have two files: <code>go.mod</code>, which is generated via <code>go mod init</code> for a new project, and <code>go.sum</code>, which the go tool creates once dependencies are added.<br>Inside <code>go.mod</code> we specify our module’s identity, which is used as a mapping to <code>$GOPATH/src</code>. For example, one of my projects has <code>module github.com/dlee/admin</code>, so once I invoke <code>env GO111MODULE=on go build</code> the go tool treats my project root path as <code>$GOPATH/src/github.com/dlee/admin</code>. Then I can import packages defined inside my project properly, e.g. <code>import "github.com/dlee/admin/pkg1"</code> makes the go build tool look for pkg1 in my project folder rather than in <code>$GOPATH/src/github.com/dlee/admin/pkg1</code>.</p><p>During daily development, we use <code>env GO111MODULE=on go get</code> to add or upgrade dependencies, which updates the <code>go.mod</code> file. If needed, we can also edit <code>go.mod</code> directly. So if we clone a project that uses Go modules, we can run <code>go mod download</code> to download its dependencies and then build the project. To upgrade a dependency, we use <code>env GO111MODULE=on go get foo@version</code> to move to a specific version or <code>env GO111MODULE=on go get foo</code> to move to the latest version. After the upgrade, <code>go.mod</code> and <code>go.sum</code> are updated automatically.</p>]]></content>
<summary type="html">
<p>As far as I know, we usually have three ways to manage dependencies for a Go project.</p>
<ol>
<li>Manage manually.</li>
<li>Use <a href=
</summary>
</entry>
<entry>
<title>Derive Turing fixed point with Python explained</title>
<link href="http://dlee-libo.github.io/2018/12/11/derive-turing-fixed-point/"/>
<id>http://dlee-libo.github.io/2018/12/11/derive-turing-fixed-point/</id>
<published>2018-12-11T01:27:44.000Z</published>
<updated>2025-06-13T09:17:24.720Z</updated>
<content type="html"><![CDATA[<p>In lambda calculus, all functions are anonymous, so we cannot define a recursive function in the usual way, i.e. by calling the function by name inside its own body.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># we cannot refer to the function itself like this</span></span><br><span class="line"><span class="keyword">lambda</span> x: <span class="number">1</span> <span class="keyword">if</span> x < <span class="number">2</span> <span class="keyword">else</span> x * invoke_self(x - <span class="number">1</span>)</span><br></pre></td></tr></table></figure><p>To tackle this issue, we can assume we already have a function <code>f</code> that calculates the factorial of <code>x</code>. Then we use this helper function <code>f</code> to implement our factorial function.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># we use currying here rather than writing lambda f, x:</span></span><br><span class="line"><span class="keyword">lambda</span> f: <span class="keyword">lambda</span> x: <span class="number">1</span> <span class="keyword">if</span> x < <span class="number">2</span> <span class="keyword">else</span> x * f(x - <span class="number">1</span>)</span><br></pre></td></tr></table></figure><p>To use this function, we need to pass two arguments: the helper function <code>f</code> and the number whose factorial we want.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">F = <span class="keyword">lambda</span> f: <span class="keyword">lambda</span> x: <span 
class="number">1</span> <span class="keyword">if</span> x < <span class="number">2</span> <span class="keyword">else</span> x * f(x - <span class="number">1</span>)</span><br><span class="line">F(f)(<span class="number">5</span>) == <span class="number">120</span></span><br></pre></td></tr></table></figure><p>Now we have a function <code>F1 = F(f)</code>, and with this <code>F1</code> we can calculate the factorial of any number. This reveals one thing: <code>F1</code> is actually our helper function <code>f</code>!<br>Let’s look closer: <code>F1 = F(f)</code> and <code>F1 is f</code>, so <code>F1 = F(F1)</code>. If we know the concept of a fixed point in mathematics, we’ll see that <code>F1</code> is a fixed point of <code>F</code>.<br>Suppose we have a function <code>T</code> such that, for any given <code>F</code>, <code>T(F) == F(T(F))</code>, i.e. <code>T(F)</code> is a fixed point of <code>F</code>. Then we can easily write recursive functions. For our factorial example:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">factorial = T(<span class="keyword">lambda</span> f: <span class="keyword">lambda</span> x: <span class="number">1</span> <span class="keyword">if</span> x < <span class="number">2</span> <span class="keyword">else</span> x * f(x - <span class="number">1</span>))</span><br></pre></td></tr></table></figure><p>OK, let’s figure out the definition of <code>T</code>. Wait, we already <em>have</em> the definition of <code>T</code>: <code>T(F) == F(T(F))</code>. The only problem is that this definition is recursive. 
But we are good at eliminating recursion.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># T = lambda y: y(T(y))</span></span><br><span class="line"><span class="comment"># we define T1, which accepts two arguments; the first one is T1 itself</span></span><br><span class="line">T1 = <span class="keyword">lambda</span> x: <span class="keyword">lambda</span> y: y(x(x)(y))</span><br><span class="line"><span class="comment"># to get T, we just apply T1 to T1</span></span><br><span class="line">T = T1(T1)</span><br><span class="line"><span class="comment"># in conclusion</span></span><br><span class="line">T = (<span class="keyword">lambda</span> x: <span class="keyword">lambda</span> y: y(x(x)(y)))(<span class="keyword">lambda</span> x: <span class="keyword">lambda</span> y: y(x(x)(y)))</span><br></pre></td></tr></table></figure><p>This <code>T</code> is the Turing fixed-point combinator. But if we test it in Python, we get <code>RuntimeError: maximum recursion depth exceeded</code> (a <code>RecursionError</code> in Python 3). This is because Python uses a strict evaluation strategy: when we call <code>T(F)</code>, it tries to evaluate the argument <code>x(x)(y)</code> immediately, which amounts to calling <code>T(F)</code> again, so the evaluation never ends. 
To work around this, we don’t return the value directly in <code>T1</code>; we return a function that computes it only when called:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">T1 = <span class="keyword">lambda</span> x: <span class="keyword">lambda</span> y: <span class="keyword">lambda</span> z: y(x(x)(y))(z)</span><br><span class="line">T = T1(T1)</span><br><span class="line"><span class="keyword">print</span>(T(<span class="keyword">lambda</span> f: <span class="keyword">lambda</span> x: <span class="number">1</span> <span class="keyword">if</span> x < <span class="number">2</span> <span class="keyword">else</span> x * f(x - <span class="number">1</span>))(<span class="number">5</span>))</span><br></pre></td></tr></table></figure>]]></content>
<summary type="html">
<p>In lambda calculus, all functions are anonymous, so we can not define a recursive function in usual way, e.g. call function itself by nam
</summary>
</entry>
<entry>
<title>Edit distance: solution and proof</title>
<link href="http://dlee-libo.github.io/2018/06/13/edit-distance/"/>
<id>http://dlee-libo.github.io/2018/06/13/edit-distance/</id>
<published>2018-06-13T10:17:42.000Z</published>
<updated>2025-06-13T09:17:24.720Z</updated>
<content type="html"><![CDATA[<h2 id="问题介绍"><a href="#问题介绍" class="headerlink" title="问题介绍"></a>Problem statement</h2><p>Given two strings w1 and w2, and the following operations on a string:</p><ul><li>delete the character at a given position</li><li>insert a given character into the string</li><li>replace a given character in the string with another character</li></ul><p>the edit distance between w1 and w2 is the minimum number of these operations needed to turn w1 into w2.</p><a id="more"></a><h2 id="解法介绍"><a href="#解法介绍" class="headerlink" title="解法介绍"></a>Solution</h2><p>Edit distance is a classic dynamic programming problem. Define <code>f(i, j)</code> as the edit distance of <code>w1[0, i]</code> => <code>w2[0, j]</code> (both ranges inclusive); the state transition equation is then:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line">f(i, j) = f(i - <span class="number">1</span>, j - <span class="number">1</span>) <span class="keyword">if</span> w1[i] == w2[j] <span class="keyword">else</span> min(f(i - <span class="number">1</span>, j), f(i - <span class="number">1</span>, j - <span class="number">1</span>), f(i, j - <span class="number">1</span>)) + <span class="number">1</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># boundary conditions</span></span><br><span class="line">f(<span class="number">0</span>, <span class="number">0</span>) = <span class="number">0</span> <span class="keyword">if</span> w1[<span class="number">0</span>] == w2[<span class="number">0</span>] <span class="keyword">else</span> <span class="number">1</span></span><br><span class="line">f(<span class="number">0</span>, j) = j <span class="keyword">if</span> w1[<span class="number">0</span>] == w2[j] <span class="keyword">else</span> f(<span class="number">0</span>, j - <span class="number">1</span>) + <span class="number">1</span></span><br><span class="line">f(i, <span class="number">0</span>) = i <span class="keyword">if</span> w1[i] == w2[<span 
class="number">0</span>] <span class="keyword">else</span> f(i - <span class="number">1</span>, <span class="number">0</span>) + <span class="number">1</span></span><br></pre></td></tr></table></figure><h2 id="证明"><a href="#证明" class="headerlink" title="证明"></a>Proof</h2><p>When <code>w1[i] == w2[j]</code>, clearly <code>f(i, j) = f(i - 1, j - 1)</code>. When <code>w1[i] != w2[j]</code>, write the two words as <code>w1 a</code> and <code>w2 b</code>, where a and b are the differing last characters and, from here on, w1 and w2 denote the remaining prefixes.<br>Suppose <code>w1 a</code> => <code>w2 b</code> takes a minimum of x steps, <code>w1 a</code> => <code>w2</code> takes a minimum of y1 steps, <code>w1</code> => <code>w2</code> takes a minimum of y2 steps, and <code>w1</code> => <code>w2 b</code> takes a minimum of y3 steps.<br>Then we can infer: y1 + 1 >= x; y2 + 1 >= x; y3 + 1 >= x, which can be proved by contradiction.</p><p>Next we prove: x == min(y1 + 1, y2 + 1, y3 + 1)</p><p>First, it is easy to show that any operation sequence can be reordered so that all deletions (d operations) come first, then all insertions (i operations), and finally all replacements (r operations), and that operations of the same kind can be reordered to run from the front of the string to the back. This reordering does not change the number of operations and makes the discussion easier.</p><p>Consider every possible combination of operation kinds in a minimal (x-step) sequence for <code>w1 a</code> => <code>w2 b</code>:</p><p>If <code>w1 a</code> => <code>w2 b</code> consists only of d operations, the last step must delete the trailing a, i.e. <code>w1 a</code> => <code>w2 b a</code> => <code>w2 b</code>, so x - 1 steps are guaranteed to turn <code>w1</code> into <code>w2 b</code>.<br>So x - 1 >= y3 && x <= y3 + 1, hence x == y3 + 1</p><p>If <code>w1 a</code> => <code>w2 b</code> consists only of i operations, the last step must insert the trailing b, i.e. <code>w1 a</code> => <code>w2</code> => <code>w2 b</code>, so x - 1 steps can turn <code>w1 a</code> into <code>w2</code>.<br>So x - 1 >= y1 && x <= y1 + 1, hence x == y1 + 1</p><p>If <code>w1 a</code> => <code>w2 b</code> consists only of r operations, the last step must replace the trailing a with b, i.e. <code>w1 a</code> => <code>w2 a</code> => <code>w2 b</code>, so x - 1 steps can turn <code>w1</code> into <code>w2</code>.<br>So x - 1 >= y2 && x <= y2 + 1, hence x == y2 + 1</p><p>If <code>w1 a</code> => <code>w2 b</code> consists of d and i operations, consider two cases:</p><ol><li>If no d operation is applied to a, the sequence is <code>w1 a</code> => [a series of d operations] => <code>w1' a</code> => [a series of i operations] => <code>w2</code> => [i] => <code>w2 b</code><br>We construct <code>w1 a</code> => [the same d operations] => <code>w1' a</code> => [the same i operations] => 
<code>w2</code>, so x - 1 steps can turn <code>w1 a</code> => <code>w2</code>, so x - 1 >= y1 && x <= y1 + 1, hence x == y1 + 1</li><li>If a d operation deletes a, the sequence is <code>w1 a</code> => [a series of d operations] => <code>w1' a</code> => [d] => <code>w1'</code> => [a series of i operations] => <code>w2 b</code><br>We construct <code>w1</code> => [the same d operations] => <code>w1'</code> => [the same i operations] => <code>w2 b</code>, so x - 1 steps can achieve <code>w1</code> => <code>w2 b</code>, so x - 1 >= y3 && x <= y3 + 1, hence x == y3 + 1</li></ol><p>If <code>w1 a</code> => <code>w2 b</code> consists of d and r operations, consider two cases:</p><ol><li>If no d operation is applied to a, the sequence is <code>w1 a</code> => [a series of d operations] => <code>w1' a</code> => [a series of r operations] => <code>w2 a</code> => [r] => <code>w2 b</code><br>We construct <code>w1</code> => [a series of d operations] => <code>w1'</code> => [a series of r operations] => <code>w2</code>, so x - 1 steps can turn <code>w1</code> => <code>w2</code>, so x - 1 >= y2 && x <= y2 + 1, hence x == y2 + 1</li><li>If a d operation deletes a, the sequence is <code>w1 a</code> => [a series of d operations] => <code>w1' a</code> => [d] => <code>w1'</code> => [a series of r operations] => <code>w2 b</code><br>We construct <code>w1</code> => [a series of d operations] => <code>w1'</code> => [a series of r operations] => <code>w2 b</code>, so x - 1 steps can turn <code>w1</code> => <code>w2 b</code>, so x - 1 >= y3 && x <= y3 + 1, hence x == y3 + 1</li></ol><p>If <code>w1 a</code> => <code>w2 b</code> consists of i and r operations, consider two cases:</p><ol><li>If no i operation inserts past a, the sequence is <code>w1 a</code> => [a series of i operations] => <code>w1' a</code> => [a series of r operations] => <code>w2 a</code> => [r] => <code>w2 b</code><br>We construct <code>w1</code> => [a series of i operations] => <code>w1'</code> => [a series of r operations] => <code>w2</code>, so x - 1 steps can turn <code>w1</code> => <code>w2</code>, so x - 1 >= y2 && x <= y2 + 1, hence x == y2 + 1</li><li>If i operations insert past a, the sequence is <code>w1 a</code> => [a series of i operations] => <code>w1' a</code> => [a series of i operations] => <code>w1' a w1''</code> => [the i operation that inserts the final b] => <code>w1' a w1'' b</code> => [a series of r operations] (the r operations never go beyond the position of <code>w1'a</code>, because every character after it was newly inserted by an i operation) => <code>w2 b</code><br>We construct <code>w1 a</code> => [a series of i operations] => <code>w1' a</code> => [a series of i operations] => <code>w1' a w1''</code> => [a series of r operations] => 
<code>w2</code>, so x - 1 steps can turn <code>w1 a</code> => <code>w2</code>, so x - 1 >= y1 && x <= y1 + 1, hence x == y1 + 1</li></ol><p>If <code>w1 a</code> => <code>w2 b</code> consists of d, i and r operations, consider two cases:</p><ol><li>If no d operation is applied to a, the sequence is <code>w1 a</code> => [a series of d operations] => <code>w1' a</code> => [a series of i operations] => [a series of r operations] => <code>w2 b</code>. We now have <code>w1' a</code> => [a series of i operations] => [a series of r operations] => <code>w2 b</code>; following the same approach as above, consider two cases:<ul><li>If no i operation inserts past a, the sequence is <code>w1 a</code> => [a series of d operations] => <code>w1' a</code> => [a series of i operations] => <code>w1'' a</code> => [a series of r operations] => <code>w2 a</code> => [r] => <code>w2 b</code><br>We construct <code>w1</code> => [a series of d operations] => <code>w1'</code> => [a series of i operations] => <code>w1''</code> => [a series of r operations] => <code>w2</code>, so x - 1 steps can turn <code>w1</code> => <code>w2</code>, so x - 1 >= y2 && x <= y2 + 1, hence x == y2 + 1</li><li>If i operations insert past a, the sequence is <code>w1 a</code> => [a series of d operations] => <code>w1' a</code> => [a series of i operations] => <code>w1'' a</code> => [a series of i operations] => <code>w1'' a w1'''</code> => [the i operation that inserts the final b] => <code>w1'' a w1''' b</code> => [a series of r operations] => <code>w2 b</code><br>We construct <code>w1 a</code> => [a series of d operations] => <code>w1' a</code> => [a series of i operations] => <code>w1'' a</code> => [a series of i operations] => <code>w1'' a w1'''</code> => [a series of r operations] => <code>w2</code>, so x - 1 steps can turn <code>w1 a</code> => <code>w2</code>, so x - 1 >= y1 && x <= y1 + 1, hence x == y1 + 1</li></ul></li><li>If a d operation deletes a, the sequence is <code>w1 a</code> => [a series of d operations] => <code>w1' a</code> => [d] => <code>w1'</code> => [a series of i operations] => [a series of r operations] => <code>w2 b</code><br>We construct <code>w1</code> => [a series of d operations] => <code>w1'</code> => [a series of i operations] => [a series of r operations] => <code>w2 b</code>, so x - 1 steps can turn <code>w1</code> => <code>w2 b</code>, so x - 1 >= y3 && x <= y3 + 1, hence x == y3 + 1</li></ol><p>By enumerating every combination of operation kinds, we conclude that x always equals one of y1 + 1, y2 + 1, y3 + 1. Combined with the earlier inference that x is no larger than any of the three, this gives: x == min(y1 + 1, y2 + 1, y3 + 1)</p>]]></content>
<summary type="html">
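<h2 id="问题介绍"><a href="#问题介绍" class="headerlink" title="问题介绍"></a>Problem statement</h2><p>Given two strings w1 and w2, and the following operations on a string:</p>
<ul>
<li>delete the character at a given position</li>
<li>insert a given character into the string</li>
<li>replace a given character in the string with another character</li>
</ul>
<p>the edit distance between w1 and w2 is the minimum number of these operations needed to turn w1 into w2.</p>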
</summary>
</entry>
</feed>