{
    "version": "https://jsonfeed.org/version/1",
    "title": "AI浪潮下的小小避风港",
    "subtitle": "思考ing：智能的本质是什么？",
    "icon": "https://yumengmeng.cn/assets/favicon.ico",
    "description": "只要不停下脚步，道路就会一直延伸",
    "home_page_url": "https://yumengmeng.cn",
    "items": [
        {
            "id": "https://yumengmeng.cn/2026/06/03/CS231n%E2%80%94%E2%80%94lecture6%E5%8D%B7%E7%A7%AF%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E6%9E%B6%E6%9E%84/index/",
            "url": "https://yumengmeng.cn/2026/06/03/CS231n%E2%80%94%E2%80%94lecture6%E5%8D%B7%E7%A7%AF%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E6%9E%B6%E6%9E%84/index/",
            "title": "CS231n——Lecture6 卷积神经网络架构",
            "date_published": "2026-06-02T16:00:00.000Z",
            "content_html": "<p>&lt;!-- 内容待补充 --&gt;</p>\n",
            "tags": [
                "CS231n学习笔记",
                "CS231n",
                "计算机视觉",
                "深度学习",
                "CNN",
                "架构设计"
            ]
        },
        {
            "id": "https://yumengmeng.cn/2026/06/02/CS231n%E2%80%94%E2%80%94lecture5%E5%9F%BA%E4%BA%8ECNN%E7%9A%84%E5%9B%BE%E5%83%8F%E5%88%86%E7%B1%BB/index/",
            "url": "https://yumengmeng.cn/2026/06/02/CS231n%E2%80%94%E2%80%94lecture5%E5%9F%BA%E4%BA%8ECNN%E7%9A%84%E5%9B%BE%E5%83%8F%E5%88%86%E7%B1%BB/index/",
            "title": "CS231n——Lecture5 基于CNN的图像分类",
            "date_published": "2026-06-02T08:58:43.000Z",
            "content_html": "<h2 id=\"第一部分回顾深度学习基础lecture2lecture4\"><a class=\"anchor\" href=\"#第一部分回顾深度学习基础lecture2lecture4\">#</a> 第一部分回顾：深度学习基础（Lecture2—Lecture4）</h2>\n<p>在进入卷积网络之前，先回顾前几讲建立的深度学习基础框架。</p>\n<h3 id=\"图像分类与线性分类器\"><a class=\"anchor\" href=\"#图像分类与线性分类器\">#</a> 图像分类与线性分类器</h3>\n<p>第一步是<strong>定义问题</strong>：输入一张图像（展开为张量），输出一个分数向量，表示各标签与图像的匹配程度。通过权重矩阵 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">W</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span> 进行预测：</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>f</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo separator=\"true\">,</mo><mi>W</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mi>W</mi><mi>x</mi></mrow><annotation encoding=\"application/x-tex\">f(x, W) = Wx\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" 
style=\"margin-right:0.13889em;\">W</span><span class=\"mord mathnormal\">x</span></span></span></span></span></p>\n<p>The question then becomes: <strong>how do we choose a good <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">W</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span>?</strong> This is where the loss function comes in.</p>\n<h3 id=\"损失函数\"><a class=\"anchor\" href=\"#损失函数\">#</a> Loss Functions</h3>\n<p>A loss function tells us, given a weight matrix <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">W</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span> and a dataset, how well this <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">W</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span> solves the problem at hand. Commonly used loss functions include:</p>\n<ul>\n<li><strong>Multiclass SVM loss (hinge loss)</strong>: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>L</mi><mi>i</mi></msub><mo>=</mo><msub><mo>∑</mo><mrow><mi>j</mi><mo mathvariant=\"normal\">≠</mo><msub><mi>y</mi><mi>i</mi></msub></mrow></msub><mi>max</mi><mo>⁡</mo><mo 
stretchy=\"false\">(</mo><mn>0</mn><mo separator=\"true\">,</mo><msub><mi>s</mi><mi>j</mi></msub><mo>−</mo><msub><mi>s</mi><msub><mi>y</mi><mi>i</mi></msub></msub><mo>+</mo><mi mathvariant=\"normal\">Δ</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">L_i = \\sum_{j \\neq y_i} \\max(0, s_j - s_{y_i} + \\Delta)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1858em;vertical-align:-0.4358em;\"></span><span class=\"mop\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1864em;\"><span style=\"top:-2.4003em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span><span class=\"mrel mtight\"><span class=\"mrel mtight\"><span class=\"mord vbox mtight\"><span 
class=\"thinbox mtight\"><span class=\"rlap mtight\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"inner\"><span class=\"mord mtight\"><span class=\"mrel mtight\"></span></span></span><span class=\"fix\"></span></span></span></span></span><span class=\"mrel mtight\">=</span></span><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:-0.0359em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4358em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8694em;vertical-align:-0.2861em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:-0.0359em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">Δ</span><span class=\"mclose\">)</span></span></span></span></li>\n<li><strong>Softmax loss (cross-entropy loss)</strong>: <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>L</mi><mi>i</mi></msub><mo>=</mo><mo>−</mo><mi>log</mi><mo>⁡</mo><mrow><mo fence=\"true\">(</mo><mfrac><msup><mi>e</mi><msub><mi>s</mi><msub><mi>y</mi><mi>i</mi></msub></msub></msup><mrow><msub><mo>∑</mo><mi>j</mi></msub><msup><mi>e</mi><msub><mi>s</mi><mi>j</mi></msub></msup></mrow></mfrac><mo fence=\"true\">)</mo></mrow></mrow><annotation encoding=\"application/x-tex\">L_i = -\\log\\left(\\frac{e^{s_{y_i}}}{\\sum_j e^{s_j}}\\right)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.8275em;vertical-align:-0.6775em;\"></span><span class=\"mord\">−</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">lo<span style=\"margin-right:0.01389em;\">g</span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size2\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span 
class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9421em;\"><span style=\"top:-2.6447em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mop mtight\"><span class=\"mop op-symbol small-op mtight\" style=\"position:relative;top:0em;\">∑</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1496em;\"><span style=\"top:-2.1786em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4603em;\"><span></span></span></span></span></span></span><span class=\"mspace mtight\" style=\"margin-right:0.1952em;\"></span><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.779em;\"><span style=\"top:-2.9714em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em;\"><span style=\"top:-2.3448em;margin-left:0em;margin-right:0.1em;\"><span class=\"pstrut\" style=\"height:2.6595em;\"></span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.5092em;\"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.783em;\"><span style=\"top:-2.9754em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2306em;\"><span style=\"top:-2.3em;margin-left:0em;margin-right:0.1em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em;\"><span style=\"top:-2.3448em;margin-left:-0.0359em;margin-right:0.1em;\"><span class=\"pstrut\" style=\"height:2.6595em;\"></span><span class=\"mord mathnormal mtight\">i</span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3147em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
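style=\"height:0.5147em;\"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6775em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size2\">)</span></span></span></span></span></span></li>\n</ul>\n<p>We now have a problem definition (the linear classifier) and a yardstick (the loss function), but we still need to <strong>find a good solution</strong>, which brings us to optimization.</p>\n<h3 id=\"优化\"><a class=\"anchor\" href=\"#优化\">#</a> Optimization</h3>\n<p>Picture the optimization landscape as a surface in a high-dimensional space: each point on the x-axis (or on the whole horizontal plane) corresponds to one set of weights <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">W</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span>, and the y-axis is the value of the loss. A higher loss means a worse model, so the goal of optimization is to make the loss <strong>slide downhill</strong> and find a set of weights near the lowest point of the surface.</p>\n<p>Commonly used optimizers evolved along the following path:</p>\n<table>\n<thead>\n<tr>\n<th>Method</th>\n<th>Key improvement</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td><strong>SGD</strong></td>\n<td>Approximates the full-batch gradient with a mini-batch gradient, which addresses computational cost</td>\n</tr>\n<tr>\n<td><strong>SGD + Momentum</strong></td>\n<td>Adds a velocity (inertia) term that helps carry the iterate past saddle points and local minima</td>\n</tr>\n<tr>\n<td><strong>RMSProp</strong></td>\n<td>Per-parameter adaptive learning rates, which address ill-conditioning</td>\n</tr>\n<tr>\n<td><strong>Adam</strong></td>\n<td>Momentum + adaptive learning rates + bias correction; a strong all-rounder</td>\n</tr>\n<tr>\n<td><strong>AdamW</strong></td>\n<td>Adam + decoupled weight decay; the current mainstream default</td>\n</tr>\n</tbody>\n</table>\n<h3 id=\"线性分类器的局限\"><a class=\"anchor\" href=\"#线性分类器的局限\">#</a> Limitations of Linear Classifiers</h3>\n<p>A linear classifier essentially has to summarize each class with a single <strong>template</strong>. When one class is multimodal in feature space (for example, spread over two opposite quadrants), a single template cannot cover both modes. This is the root reason linear classifiers fail on problems such as parity patterns and concentric circles.</p>\n<p><img loading=\"lazy\" 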
src=\"https://cs231n.github.io/assets/nn1/linear_classifier_limitation.png\" alt=\"Multimodal distributions that a linear classifier cannot handle\" /></p>\n<h3 id=\"神经网络的引入\"><a class=\"anchor\" href=\"#神经网络的引入\">#</a> Introducing Neural Networks</h3>\n<p>Stack two weight matrices and insert a nonlinear activation function between them:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>f</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo separator=\"true\">,</mo><msub><mi>W</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><msub><mi>W</mi><mn>2</mn></msub><mo stretchy=\"false\">)</mo><mo>=</mo><msub><mi>W</mi><mn>2</mn></msub><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mn>0</mn><mo separator=\"true\">,</mo><msub><mi>W</mi><mn>1</mn></msub><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">f(x, W_1, W_2) = W_2 \\max(0, W_1 x)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" 
style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>This simple change gives the model the <strong>ability to classify nonlinearly</strong>. The key point: if we remove the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mn>0</mn><mo separator=\"true\">,</mo><mo>⋅</mo><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\max(0, \\cdot)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">⋅</span><span class=\"mclose\">)</span></span></span></span> in the middle, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>W</mi><mn>2</mn></msub><msub><mi>W</mi><mn>1</mn></msub></mrow><annotation encoding=\"application/x-tex\">W_2 W_1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span 
style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> collapses into a single matrix <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>W</mi><mn>3</mn></msub></mrow><annotation encoding=\"application/x-tex\">W_3</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">3</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
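style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span>, and we are back to a linear classifier. Stacking any number of linear transformations is equivalent to a single linear transformation: <strong>expressive power does not increase at all</strong>. The core role of a nonlinear activation function is always to <strong>introduce nonlinearity</strong>.</p>\n<h3 id=\"计算图与反向传播\"><a class=\"anchor\" href=\"#计算图与反向传播\">#</a> Computational Graphs and Backpropagation</h3>\n<p>To optimize the model parameters, we need the gradient of the loss with respect to every weight. A <strong>computational graph</strong> is designed for exactly this: it is a directed acyclic graph (DAG) whose nodes are operations and whose edges carry the data flow.</p>\n<ul>\n<li><strong>Forward pass</strong>: data flows from left to right through the intermediate nodes until the loss is computed</li>\n<li><strong>Backward pass</strong>: once the loss is computed, we walk back through the graph from right to left, using the chain rule to compute the gradient at every node automatically</li>\n</ul>\n<p>The core formula is remarkably simple:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>downstream gradient</mtext><mo>=</mo><mtext>upstream gradient</mtext><mo>×</mo><mtext>local gradient</mtext></mrow><annotation encoding=\"application/x-tex\">\\text{downstream gradient} = \\text{upstream gradient} \\times \\text{local gradient}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord text\"><span class=\"mord\">downstream gradient</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord text\"><span class=\"mord\">upstream gradient</span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord text\"><span class=\"mord\">local gradient</span></span></span></span></span></span></p>\n<p>An important shape rule: <strong>the upstream gradient always has the same shape as the node's output, and the downstream gradient always has the same shape as the node's input</strong>. Shape analysis is usually enough to derive the backward pass of a tensor operation, with no need to memorize formulas.</p>\n<h3 id=\"完整流程总结\"><a class=\"anchor\" href=\"#完整流程总结\">#</a> 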
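Putting the Whole Pipeline Together</h3>\n<p>For any problem we want to solve:</p>\n<ol>\n<li>Encode the input as a tensor</li>\n<li>Write down the computational graph and compute the output tensor</li>\n<li>Collect a dataset and define a loss function</li>\n<li>Minimize the loss with gradient descent, using backpropagation to compute the gradients automatically</li>\n</ol>\n<p>This pipeline <strong>underpins essentially every deep learning application</strong>.</p>\n<hr />\n<h2 id=\"图像特征表示-image-features\"><a class=\"anchor\" href=\"#图像特征表示-image-features\">#</a> Image Features</h2>\n<h3 id=\"为什么需要特征\"><a class=\"anchor\" href=\"#为什么需要特征\">#</a> Why Do We Need Features?</h3>\n<p>In fact, the input to the model <strong>does not have to be raw pixels</strong>. We can define other kinds of functions that extract features, turning the raw pixel values into a more meaningful representation before feeding it to a linear classifier.</p>\n<p>Two classic examples:</p>\n<p><strong>Color histogram</strong>: assign every pixel to a bin (bucket) according to its color value and count only the color distribution, ignoring spatial position entirely. In a photo of a blue sky, for example, blue pixels make up a far larger share than any other color.</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/colorhistogram.jpg\" alt=\"Color histogram: pixels are binned by color and spatial structure is ignored\" /></p>\n<p><strong>Histogram of Oriented Gradients (HOG)</strong>: discard color information and look only at the <strong>structure</strong> of the image by computing the distribution of edge orientations within each local region. It asks &quot;which way do the edges in this region run&quot; rather than &quot;what color is this region&quot;.</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/hog.jpg\" alt=\"Histogram of Oriented Gradients: only edge orientations are kept and color is discarded\" /></p>\n<h3 id=\"特征提取-分类器的组合范式\"><a class=\"anchor\" href=\"#特征提取-分类器的组合范式\">#</a> The Feature Extraction + Classifier Paradigm</h3>\n<p>Before the rise of deep learning, the mainstream paradigm in computer vision was:</p>\n<ol>\n<li><strong>Hand-design feature extractors</strong> (color histograms, HOG, SIFT, and so on)</li>\n<li><strong>Concatenate</strong> the different features into a single feature vector</li>\n<li>Feed the feature vector to a linear classifier (such as an SVM)</li>\n</ol>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/feature_pipeline.jpg\" alt=\"Traditional paradigm: image → feature extraction → feature vector concatenation → linear classifier\" /></p>\n<p>A good feature representation takes a great deal of domain knowledge and engineering experience. The problem is that <strong>it is very hard for a human to hand-write a perfect feature extractor</strong>; the data itself, combined with end-to-end learning, often does better.</p>\n<h3 id=\"端到端神经网络\"><a class=\"anchor\" href=\"#端到端神经网络\">#</a> End-to-End Neural Networks</h3>\n<p>Neural networks take a completely different approach:</p>\n<ul>\n<li>Input: <strong>raw pixel values</strong></li>\n<li>Output: prediction scores</li>\n<li>The whole system <strong>learns all of its parameters from the training data</strong> via gradient descent</li>\n</ul>\n<p>No hand-designed features are needed: the network learns for itself which features to extract. The question becomes: <strong>how do we design the neural network architecture?</strong> That is, how do we choose the sequence of operators and the sizes of the intermediate tensors?</p>\n<hr />\n<h2 id=\"从全连接层到卷积层\"><a class=\"anchor\" href=\"#从全连接层到卷积层\">#</a> From Fully Connected Layers to Convolutional Layers</h2>\n<h3 id=\"全连接层的局限\"><a class=\"anchor\" href=\"#全连接层的局限\">#</a> 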
Limitations of Fully Connected Layers</h3>\n<p>Recall the fully connected layer: a CIFAR-10 image (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>32</mn><mo>×</mo><mn>32</mn><mo>×</mo><mn>3</mn><mo>=</mo><mn>3072</mn></mrow><annotation encoding=\"application/x-tex\">32 \\times 32 \\times 3 = 3072</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">32</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">32</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">3</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">3072</span></span></span></span> dimensions) is flattened into a one-dimensional vector and multiplied by the weight matrix <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>W</mi><mo>∈</mo><msup><mi mathvariant=\"double-struck\">R</mi><mrow><mn>10</mn><mo>×</mo><mn>3072</mn></mrow></msup></mrow><annotation encoding=\"application/x-tex\">W \\in \\mathbb{R}^{10 \\times 3072}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7224em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\" 
style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8141em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">10</span><span class=\"mbin mtight\">×</span><span class=\"mord mtight\">3072</span></span></span></span></span></span></span></span></span></span></span></span> 做矩阵乘法，得到 10 个类别的预测分数。</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/nn1/neural_net.jpeg\" alt=\"全连接层：将三维张量压扁为一维向量\" /></p>\n<p>全连接层的底层原理是<strong>向量点积</strong>：两个向量方向接近时点积结果高，正交时结果为零（或很小）。每一行权重可视为该类别的&quot;模板&quot;。</p>\n<p><strong>致命的缺陷</strong>：将图像压扁成一维向量，<strong>完全损失了空间结构</strong>。一个像素与它上下左右的像素之间的空间关系在压扁后荡然无存——左邻像素和右邻像素在 3072 维向量中相隔 32 个位置，与一个完全无关的远处像素相邻。</p>\n<h3 id=\"如何尊重二维信息\"><a class=\"anchor\" href=\"#如何尊重二维信息\">#</a> 如何尊重二维信息？</h3>\n<p>这就是<strong>卷积神经网络 CNN</strong> 的诞生原因。</p>\n<p>CNN 的核心洞察：<strong>图像具有平移不变性</strong>。一只猫在图像偏左还是偏右的位置，它仍然是猫。全连接层会把&quot;左上角的猫&quot;和&quot;右下角的猫&quot;视为完全不同的输入模式，需要分别学习——这极其浪费。卷积层通过<strong>参数共享</strong>直接内置了这一归纳偏置：同一个滤波器在整个图像上滑动检测，无论特征出现在哪个位置。</p>\n<hr />\n<h2 id=\"卷积层-convolutional-layer\"><a class=\"anchor\" href=\"#卷积层-convolutional-layer\">#</a> 卷积层 Convolutional Layer</h2>\n<h3 id=\"基本思想局部连接-参数共享\"><a class=\"anchor\" href=\"#基本思想局部连接-参数共享\">#</a> 基本思想：局部连接 + 参数共享</h3>\n<p>卷积层改变了全连接层的连接方式：</p>\n<ul>\n<li><strong>全连接层</strong>：每个输出神经元连接所有输入像素，权重数量 <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>C</mi><mrow><mi>i</mi><mi>n</mi></mrow></msub><mo>×</mo><mi>H</mi><mo>×</mo><mi>W</mi><mo>×</mo><msub><mi>C</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">C_{in} \\times H \\times W \\times C_{out}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">in</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></li>\n<li><strong>Convolutional layer</strong>: each filter looks at only <strong>a small patch</strong> of the input (its local receptive field), and the same filter <strong>slides across the whole image with shared weights</strong>, so the number of weights is only <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>C</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub><mo>×</mo><msub><mi>C</mi><mrow><mi>i</mi><mi>n</mi></mrow></msub><mo>×</mo><msub><mi>K</mi><mi>h</mi></msub><mo>×</mo><msub><mi>K</mi><mi>w</mi></msub></mrow><annotation encoding=\"application/x-tex\">C_{out} \\times C_{in} \\times K_h \\times K_w</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">in</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing 
reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">h</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02691em;\">w</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></li>\n</ul>\n<h3 id=\"卷积运算的直观过程\"><a class=\"anchor\" href=\"#卷积运算的直观过程\">#</a> The Convolution Operation, Intuitively</h3>\n<p>Place a small <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>5</mn><mo>×</mo><mn>5</mn><mo>×</mo><mn>3</mn></mrow><annotation encoding=\"application/x-tex\">5 \\times 5 \\times 3</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">3</span></span></span></span> filter (also called a kernel) at some position on the input image and compute the <strong>sum of the elementwise products</strong> (a dot product) between the filter and the image patch beneath it. The result is a single scalar that measures how well the patch &quot;matches&quot; the filter: the larger the value, the more closely the region resembles the pattern the filter detects.</p>\n<p>Then <strong>repeat the process</strong>: slide the filter to every possible position on the input image, compute the match score at each one, and arrange the scores in a two-dimensional grid, which for a 32 × 32 × 3 CIFAR-10 input has size <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>28</mn><mo>×</mo><mn>28</mn><mo>×</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">28 \\times 28 \\times 1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">28</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">28</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span> (called an <strong>activation map</strong> or <strong>feature map</strong>).</p>\n<p>A single filter detects only one kind of pattern (an edge, a texture, a color contrast, and so on). Detecting several patterns takes several filters:</p>\n<ul>\n<li>A second <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>5</mn><mo>×</mo><mn>5</mn><mo>×</mo><mn>3</mn></mrow><annotation 
encoding=\"application/x-tex\">5 \\times 5 \\times 3</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">3</span></span></span></span> filter does the same thing and produces another <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>28</mn><mo>×</mo><mn>28</mn><mo>×</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">28 \\times 28 \\times 1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">28</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">28</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span> feature map</li>\n<li>Add as many filters as you like in the same way</li>\n<li>Stack all the feature maps along the depth dimension to get a 3D tensor of size <span 
class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>28</mn><mo>×</mo><mn>28</mn><mo>×</mo><msub><mi>C</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">28 \\times 28 \\times C_{out}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">28</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">28</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></li>\n</ul>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/cnn/depthcol.jpeg\" alt=\"Convolution: the filter slides over the input and computes dot products\" 
/></p>\n<p>Each convolutional layer also adds a <strong>bias term</strong>, one scalar per filter, which shifts every value in that filter's feature map by a constant before the activation function is applied.</p>\n<h3 id=\"卷积层的定义\"><a class=\"anchor\" href=\"#卷积层的定义\">#</a> Definition of a Convolutional Layer</h3>\n<p>The key operator in a convolutional layer:</p>\n<blockquote>\n<p><strong>Input</strong>: an image plus a set of filters (randomly initialized)</p>\n<p><strong>Hyperparameters</strong>: the number of filters <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>C</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">C_{out}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> (the number of output channels) and the filter size <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>K</mi><mi>h</mi></msub><mo>×</mo><msub><mi>K</mi><mi>w</mi></msub></mrow><annotation encoding=\"application/x-tex\">K_h \\times K_w</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" 
style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">h</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02691em;\">w</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></p>\n</blockquote>\n<p>As with fully connected layers, the output of a convolutional layer must be followed by a <strong>nonlinear activation function</strong> (such as ReLU); otherwise a stack of convolutional layers is still equivalent to a single linear transformation.</p>\n<h3 id=\"批量处理四维张量\"><a class=\"anchor\" href=\"#批量处理四维张量\">#</a> Batch Processing: 4D Tensors</h3>\n<p>In practice we process <strong>a batch of images</strong> (a mini-batch) at a time:</p>\n<table>\n<thead>\n<tr>\n<th>Tensor</th>\n<th>Shape</th>\n<th>Meaning</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Input</td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi><mo>×</mo><msub><mi>C</mi><mrow><mi>i</mi><mi>n</mi></mrow></msub><mo>×</mo><mi>H</mi><mo>×</mo><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">N \\times C_{in} \\times H \\times W</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">in</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span></td>\n<td>A batch of N images</td>\n</tr>\n<tr>\n<td>Filters</td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>C</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub><mo>×</mo><msub><mi>C</mi><mrow><mi>i</mi><mi>n</mi></mrow></msub><mo>×</mo><msub><mi>K</mi><mi>h</mi></msub><mo>×</mo><msub><mi>K</mi><mi>w</mi></msub></mrow><annotation encoding=\"application/x-tex\">C_{out} \\times C_{in} \\times K_h \\times K_w</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">in</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">h</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02691em;\">w</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></td>\n<td>A set of filters</td>\n</tr>\n<tr>\n<td>Output</td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi><mo>×</mo><msub><mi>C</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub><mo>×</mo><msup><mi>H</mi><mo mathvariant=\"normal\" lspace=\"0em\" rspace=\"0em\">′</mo></msup><mo>×</mo><msup><mi>W</mi><mo mathvariant=\"normal\" lspace=\"0em\" rspace=\"0em\">′</mo></msup></mrow><annotation encoding=\"application/x-tex\">N \\times C_{out} \\times H&#x27; \\times W&#x27;</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal 
mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8352em;vertical-align:-0.0833em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7519em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">′</span></span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7519em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7519em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">′</span></span></span></span></span></span></span></span></span></span></span></span></td>\n<td>A batch of output feature maps</td>\n</tr>\n</tbody>\n</table>\n<h3 id=\"空间维度计算\"><a class=\"anchor\" href=\"#空间维度计算\">#</a> Computing the Spatial Dimensions</h3>\n<p>The spatial size of the feature map after a convolution is determined by three factors:</p>\n<p><strong>No padding, stride 1</strong>:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msup><mi>H</mi><mo mathvariant=\"normal\" lspace=\"0em\" rspace=\"0em\">′</mo></msup><mo>=</mo><mi>H</mi><mo>−</mo><mi>K</mi><mo>+</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">H&#x27; = H - K + 1\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8019em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8019em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">′</span></span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span></span></p>\n<p>For example, a <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>32</mn><mo>×</mo><mn>32</mn></mrow><annotation encoding=\"application/x-tex\">32 \\times 32</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">32</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">32</span></span></span></span> input with a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>5</mn><mo>×</mo><mn>5</mn></mrow><annotation encoding=\"application/x-tex\">5 \\times 5</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">5</span></span></span></span> kernel gives a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>28</mn><mo>×</mo><mn>28</mn></mrow><annotation encoding=\"application/x-tex\">28 \\times 28</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">28</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span 
class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">28</span></span></span></span> output.</p>\n<p><strong>With zero padding</strong>:</p>\n<p>Add <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>P</mi></mrow><annotation encoding=\"application/x-tex\">P</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span></span></span></span> rings of zero-valued pixels around the border of the image so that the output size can be controlled:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msup><mi>H</mi><mo mathvariant=\"normal\" lspace=\"0em\" rspace=\"0em\">′</mo></msup><mo>=</mo><mi>H</mi><mo>−</mo><mi>K</mi><mo>+</mo><mn>1</mn><mo>+</mo><mn>2</mn><mi>P</mi></mrow><annotation encoding=\"application/x-tex\">H&#x27; = H - K + 1 + 2P\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8019em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8019em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">′</span></span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span 
class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span></span></span></span></span></p>\n<p>如果需要输出尺寸与输入相同（<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msup><mi>H</mi><mo mathvariant=\"normal\" lspace=\"0em\" rspace=\"0em\">′</mo></msup><mo>=</mo><mi>H</mi></mrow><annotation encoding=\"application/x-tex\">H&#x27; = H</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7519em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7519em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord 
mtight\">′</span></span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span></span></span></span>），设置 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>P</mi><mo>=</mo><mo stretchy=\"false\">(</mo><mi>K</mi><mo>−</mo><mn>1</mn><mo stretchy=\"false\">)</mo><mi mathvariant=\"normal\">/</mi><mn>2</mn></mrow><annotation encoding=\"application/x-tex\">P = (K-1)/2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span><span class=\"mord\">/2</span></span></span></span>。这也是 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>K</mi></mrow><annotation encoding=\"application/x-tex\">K</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span 
class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span></span></span></span> 通常取<strong>奇数</strong>（3, 5, 7）的原因——保证填充为整数。</p>\n<p><strong>带步幅 Stride</strong>：</p>\n<p>默认滤波器每次滑动 1 步。将步幅设为 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>S</mi></mrow><annotation encoding=\"application/x-tex\">S</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;\">S</span></span></span></span> 可以加速卷积、减少计算量：</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msup><mi>H</mi><mo mathvariant=\"normal\" lspace=\"0em\" rspace=\"0em\">′</mo></msup><mo>=</mo><mfrac><mrow><mi>H</mi><mo>−</mo><mi>K</mi><mo>+</mo><mn>2</mn><mi>P</mi></mrow><mi>S</mi></mfrac><mo>+</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">H&#x27; = \\frac{H - K + 2P}{S} + 1\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8019em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8019em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">′</span></span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0463em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;\">S</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span></span></p>\n<p>通用公式（三个参数同时出现，通常不引入膨胀）：</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span 
class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msup><mi>H</mi><mo mathvariant=\"normal\" lspace=\"0em\" rspace=\"0em\">′</mo></msup><mo>=</mo><mrow><mo fence=\"true\">⌊</mo><mfrac><mrow><mi>H</mi><mo>−</mo><mi>K</mi><mo>+</mo><mn>2</mn><mi>P</mi></mrow><mi>S</mi></mfrac><mo fence=\"true\">⌋</mo></mrow><mo>+</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">H&#x27; = \\left\\lfloor \\frac{H - K + 2P}{S} \\right\\rfloor + 1\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8019em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8019em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">′</span></span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">⌊</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;\">S</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">⌋</span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span></span></p>\n<p>其中 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">⌊</mo><mo>⋅</mo><mo stretchy=\"false\">⌋</mo></mrow><annotation encoding=\"application/x-tex\">\\lfloor \\cdot \\rfloor</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">⌊</span><span class=\"mord\">⋅</span><span class=\"mclose\">⌋</span></span></span></span> 表示下取整——当 <span 
class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">(</mo><mi>H</mi><mo>−</mo><mi>K</mi><mo>+</mo><mn>2</mn><mi>P</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">(H - K + 2P)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mclose\">)</span></span></span></span> 不能被 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>S</mi></mrow><annotation encoding=\"application/x-tex\">S</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;\">S</span></span></span></span> 整除时，多余的边缘像素直接丢弃。</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/cnn/stride.jpeg\" alt=\"不同步幅对输出尺寸的影响\" /></p>\n<h3 id=\"感受野-receptive-field\"><a class=\"anchor\" href=\"#感受野-receptive-field\">#</a> 感受野 
Receptive Field</h3>\n<p><strong>为什么更深的层能&quot;看到&quot;更大的结构？</strong></p>\n<p>卷积层中每个输出神经元只查看其输入的<strong>局部区域</strong>。但多层堆叠后，感受野会逐层扩大（以下假设各层步幅均为 1）：</p>\n<ul>\n<li>第一层：一个神经元能看到 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>K</mi><mo>×</mo><mi>K</mi></mrow><annotation encoding=\"application/x-tex\">K \\times K</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span></span></span></span> 的原始输入区域</li>\n<li>第二层：一个神经元能看到 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>K</mi><mo>×</mo><mi>K</mi></mrow><annotation encoding=\"application/x-tex\">K \\times K</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span></span></span></span> 的第一层输出区域 → 对应 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo 
stretchy=\"false\">(</mo><mn>2</mn><mi>K</mi><mo>−</mo><mn>1</mn><mo stretchy=\"false\">)</mo><mo>×</mo><mo stretchy=\"false\">(</mo><mn>2</mn><mi>K</mi><mo>−</mo><mn>1</mn><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">(2K-1) \\times (2K-1)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span></span></span></span> 的原始输入区域</li>\n<li>第三层：对应 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">(</mo><mn>3</mn><mi>K</mi><mo>−</mo><mn>2</mn><mo stretchy=\"false\">)</mo><mo>×</mo><mo stretchy=\"false\">(</mo><mn>3</mn><mi>K</mi><mo>−</mo><mn>2</mn><mo 
stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">(3K-2) \\times (3K-2)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">3</span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">2</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">3</span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">2</span><span class=\"mclose\">)</span></span></span></span> 的原始输入区域</li>\n</ul>\n<p>这种逐层扩大的效应使得：<strong>浅层学习边缘和纹理等局部特征，深层学习语义和整体结构等全局特征</strong>。</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/cnn/receptive_field.jpeg\" alt=\"感受野逐层扩大示意图\" /></p>\n<h3 id=\"1-times-1-卷积\"><a class=\"anchor\" href=\"#1-times-1-卷积\">#</a> <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mo>×</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">1 \\times 1</annotation></semantics></math></span><span 
class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span> 卷积</h3>\n<p>当 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>K</mi><mo>=</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">K = 1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span> 时，卷积核退化为一个&quot;逐点&quot;操作——它不混合空间信息，仅在通道维度上做线性组合。<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mo>×</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">1 \\times 1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span> 
卷积本质上是一个<strong>作用于每个像素位置的全连接层</strong>，常用于：</p>\n<ul>\n<li><strong>降维/升维</strong>：减少或增加通道数，与大核卷积相比可大幅减少参数量</li>\n<li><strong>增加非线性</strong>：在 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mo>×</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">1 \\times 1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span> 卷积后接 ReLU，在不改变空间尺寸的前提下增强模型表达能力</li>\n<li>构建 ResNet 瓶颈结构（Bottleneck Block）和 Inception 架构的核心模块</li>\n</ul>\n<h3 id=\"卷积的其他变体\"><a class=\"anchor\" href=\"#卷积的其他变体\">#</a> 卷积的其他变体</h3>\n<p>除了标准的二维卷积，同一原理可以推广到不同维度：</p>\n<table>\n<thead>\n<tr>\n<th>类型</th>\n<th>输入维度</th>\n<th>单个卷积核维度</th>\n<th>典型应用</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td><strong>1D 卷积</strong></td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi><mo>×</mo><mi>C</mi><mo>×</mo><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">N \\times C \\times L</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">L</span></span></span></span></td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>C</mi><mo>×</mo><mi>K</mi></mrow><annotation encoding=\"application/x-tex\">C \\times K</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span></span></span></span></td>\n<td>序列数据（文本、音频、时间序列）</td>\n</tr>\n<tr>\n<td><strong>2D 卷积</strong></td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi><mo>×</mo><mi>C</mi><mo>×</mo><mi>H</mi><mo>×</mo><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">N \\times C \\times H \\times W</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span></td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>C</mi><mo>×</mo><msub><mi>K</mi><mi>h</mi></msub><mo>×</mo><msub><mi>K</mi><mi>w</mi></msub></mrow><annotation encoding=\"application/x-tex\">C \\times K_h \\times K_w</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span 
style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">h</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02691em;\">w</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></td>\n<td>图像处理</td>\n</tr>\n<tr>\n<td><strong>3D 卷积</strong></td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi><mo>×</mo><mi>C</mi><mo>×</mo><mi>D</mi><mo>×</mo><mi>H</mi><mo>×</mo><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">N \\times C \\times D \\times H \\times W</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span></td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>C</mi><mo>×</mo><msub><mi>K</mi><mi>d</mi></msub><mo>×</mo><msub><mi>K</mi><mi>h</mi></msub><mo>×</mo><msub><mi>K</mi><mi>w</mi></msub></mrow><annotation encoding=\"application/x-tex\">C \\times K_d \\times K_h \\times K_w</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">d</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">h</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02691em;\">w</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></td>\n<td>Video analysis, medical imaging (CT/MRI)</td>\n</tr>\n</tbody>\n</table>\n<p><strong>Dilated Convolution</strong>: inserts gaps between filter elements (dilation rate <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>d</mi></mrow><annotation encoding=\"application/x-tex\">d</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">d</span></span></span></span>), which <strong>enlarges the receptive field</strong> without adding parameters; stacking layers whose dilation rates double at each layer makes the receptive field grow exponentially with depth. For example, a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>3</mn><mo>×</mo><mn>3</mn></mrow><annotation encoding=\"application/x-tex\">3 \\times 3</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">3</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span 
class=\"mord\">3</span></span></span></span> kernel with dilation rate <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>d</mi><mo>=</mo><mn>2</mn></mrow><annotation encoding=\"application/x-tex\">d = 2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">d</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">2</span></span></span></span> covers the same receptive field as a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>5</mn><mo>×</mo><mn>5</mn></mrow><annotation encoding=\"application/x-tex\">5 \\times 5</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">5</span></span></span></span> kernel. It is commonly used in semantic segmentation and other dense prediction tasks that need a large receptive field.</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/cnn/dilated.jpeg\" alt=\"Dilated convolution diagram\" /></p>\n<p><strong>Grouped Convolution</strong>: splits the input channels into <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>G</mi></mrow><annotation encoding=\"application/x-tex\">G</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" 
style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">G</span></span></span></span> groups, convolves each group independently, and concatenates the outputs. This cuts both the parameter count and the computation to <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mi mathvariant=\"normal\">/</mi><mi>G</mi></mrow><annotation encoding=\"application/x-tex\">1/G</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">1/</span><span class=\"mord mathnormal\">G</span></span></span></span> of a standard convolution. The idea first appeared in AlexNet, where it was used for multi-GPU training, and later became a core design element of efficient or high-accuracy architectures such as MobileNet (depthwise separable convolution) and ResNeXt.</p>\n<p><strong>Depthwise Separable Convolution</strong>: its depthwise step is the extreme case of grouped convolution, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>G</mi><mo>=</mo><msub><mi>C</mi><mrow><mi>i</mi><mi>n</mi></mrow></msub><mo>=</mo><msub><mi>C</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">G = C_{in} = C_{out}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">G</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span 
class=\"mord mtight\"><span class=\"mord mathnormal mtight\">in</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span>, where each channel gets its own <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>K</mi><mo>×</mo><mi>K</mi><mo>×</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">K \\times K \\times 1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span 
class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span> filter; a 1×1 pointwise convolution then mixes the channels. The computation is roughly <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mi mathvariant=\"normal\">/</mi><msup><mi>K</mi><mn>2</mn></msup><mo>+</mo><mn>1</mn><mi mathvariant=\"normal\">/</mi><msub><mi>C</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">1/K^2 + 1/C_{out}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.0641em;vertical-align:-0.25em;\"></span><span class=\"mord\">1/</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">1/</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> of a standard convolution, which makes it a foundation of mobile-oriented networks such as MobileNet and Xception.</p>\n<h3 id=\"卷积层参数计算\"><a class=\"anchor\" href=\"#卷积层参数计算\">#</a> Convolution Layer Parameter Count</h3>\n<p>Take a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>32</mn><mo>×</mo><mn>32</mn><mo>×</mo><mn>3</mn></mrow><annotation encoding=\"application/x-tex\">32 \\times 32 \\times 3</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">32</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">32</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">3</span></span></span></span> input and a convolution layer with <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>6</mn></mrow><annotation encoding=\"application/x-tex\">6</annotation></semantics></math></span><span class=\"katex-html\" 
aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">6</span></span></span></span> filters of size <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>5</mn><mo>×</mo><mn>5</mn></mrow><annotation encoding=\"application/x-tex\">5 \\times 5</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">5</span></span></span></span>, stride 1, and no padding:</p>\n<ul>\n<li>Parameters per filter: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>5</mn><mo>×</mo><mn>5</mn><mo>×</mo><mn>3</mn><mo>+</mo><mn>1</mn><mo>=</mo><mn>76</mn></mrow><annotation encoding=\"application/x-tex\">5 \\times 5 \\times 3 + 1 = 76</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span 
class=\"mord\">3</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">76</span></span></span></span> (weights + bias)</li>\n<li>Total parameters: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>76</mn><mo>×</mo><mn>6</mn><mo>=</mo><mn>456</mn></mrow><annotation encoding=\"application/x-tex\">76 \\times 6 = 456</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">76</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">6</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">456</span></span></span></span></li>\n<li>Output size: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>28</mn><mo>×</mo><mn>28</mn><mo>×</mo><mn>6</mn></mrow><annotation encoding=\"application/x-tex\">28 \\times 28 \\times 6</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span 
class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">28</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">28</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">6</span></span></span></span></li>\n<li><strong>Total connections (multiply-accumulate operations)</strong>: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>28</mn><mo>×</mo><mn>28</mn><mo>×</mo><mn>6</mn><mo>×</mo><mo stretchy=\"false\">(</mo><mn>5</mn><mo>×</mo><mn>5</mn><mo>×</mo><mn>3</mn><mo stretchy=\"false\">)</mo><mo>=</mo><mn>352</mn><mo separator=\"true\">,</mo><mn>800</mn></mrow><annotation encoding=\"application/x-tex\">28 \\times 28 \\times 6 \\times (5 \\times 5 \\times 3) = 352,800</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">28</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">28</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">6</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">3</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em;\"></span><span class=\"mord\">352</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">800</span></span></span></span></li>\n</ul>\n<hr />\n<h2 id=\"池化层-pooling-layer\"><a class=\"anchor\" href=\"#池化层-pooling-layer\">#</a> Pooling Layer</h2>\n<h3 id=\"为什么需要池化\"><a class=\"anchor\" href=\"#为什么需要池化\">#</a> Why Pooling?</h3>\n<p>Strided convolution is one way to downsample, but it has learnable parameters and costs more computation. A <strong>pooling layer</strong> offers a lighter alternative: no parameters, cheap to compute, and downsampling built in.</p>\n<h3 id=\"最大池化-max-pooling\"><a class=\"anchor\" href=\"#最大池化-max-pooling\">#</a> Max 
Pooling</h3>\n<p>The most common form of pooling: partition the feature map into non-overlapping regions (for example <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>2</mn><mo>×</mo><mn>2</mn></mrow><annotation encoding=\"application/x-tex\">2 \\times 2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">2</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">2</span></span></span></span>) and output the maximum of each region.</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/cnn/maxpool.jpeg\" alt=\"Max pooling: take the maximum of each 2×2 region\" /></p>\n<p>Max pooling serves two main purposes:</p>\n<ul>\n<li><strong>Dimensionality reduction</strong>: height and width are each halved (with <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>2</mn><mo>×</mo><mn>2</mn></mrow><annotation encoding=\"application/x-tex\">2 \\times 2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">2</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">2</span></span></span></span> pooling), which cuts computation and memory</li>\n<li><strong>Translation invariance</strong>: a small shift usually leaves the maximum unchanged, so the network is more robust to small changes in position</li>\n</ul>\n<p>Pooling is a long-established signal processing technique that was rediscovered and folded into the deep learning pipeline.</p>\n<h3 id=\"池化的设计约定\"><a class=\"anchor\" href=\"#池化的设计约定\">#</a> Pooling Design Conventions</h3>\n<ul>\n<li><strong>No padding</strong>: the main purpose of a pooling layer is to shrink the feature map, and zero padding works against that</li>\n<li><strong>Most common setting</strong>: <span 
class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>2</mn><mo>×</mo><mn>2</mn></mrow><annotation encoding=\"application/x-tex\">2 \\times 2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">2</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">2</span></span></span></span> region with stride 2 (that is, non-overlapping <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>2</mn><mo>×</mo><mn>2</mn></mrow><annotation encoding=\"application/x-tex\">2 \\times 2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">2</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">2</span></span></span></span> windows)</li>\n<li><strong>Window size equal to stride</strong>: the windows do not overlap; if the stride is smaller than the window, they overlap</li>\n</ul>\n<h3 id=\"平均池化-average-pooling\"><a class=\"anchor\" href=\"#平均池化-average-pooling\">#</a> Average Pooling</h3>\n<p>Takes the mean of each region instead of the maximum. It was widely used historically, but in classification networks it has largely been replaced by max pooling.</p>\n<p>One important difference: if average pooling replaces max pooling, it should be followed by a <strong>nonlinear activation function</strong>, because averaging is a linear operation. Max pooling takes the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>max</mi><mo>⁡</mo></mrow><annotation 
encoding=\"application/x-tex\">\\max</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mop\">max</span></span></span></span>, which is <strong>already nonlinear</strong>, so it needs no extra activation.</p>\n<h3 id=\"池化层的形状公式\"><a class=\"anchor\" href=\"#池化层的形状公式\">#</a> Pooling Output Shape Formula</h3>\n<p>Treat pooling as a &quot;special convolution&quot; without padding:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msup><mi>H</mi><mo mathvariant=\"normal\" lspace=\"0em\" rspace=\"0em\">′</mo></msup><mo>=</mo><mfrac><mrow><mi>H</mi><mo>−</mo><mi>K</mi></mrow><mi>S</mi></mfrac><mo>+</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">H&#x27; = \\frac{H - K}{S} + 1\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8019em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8019em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">′</span></span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0463em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em;\"><span style=\"top:-2.314em;\"><span 
class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;\">S</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span></span></p>\n<p>Here the pooling window size <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>K</mi></mrow><annotation encoding=\"application/x-tex\">K</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span></span></span></span> usually equals the stride <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>S</mi></mrow><annotation encoding=\"application/x-tex\">S</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span 
class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;\">S</span></span></span></span> (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>K</mi><mo>=</mo><mi>S</mi><mo>=</mo><mn>2</mn></mrow><annotation encoding=\"application/x-tex\">K=S=2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;\">S</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">2</span></span></span></span> is the most common configuration), so for an even input height:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msup><mi>H</mi><mo mathvariant=\"normal\" lspace=\"0em\" rspace=\"0em\">′</mo></msup><mo>=</mo><mfrac><mi>H</mi><mn>2</mn></mfrac></mrow><annotation encoding=\"application/x-tex\">H&#x27; = \\frac{H}{2}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8019em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8019em;\"><span 
style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">′</span></span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0463em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">2</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<h3 id=\"其他池化方法\"><a class=\"anchor\" href=\"#其他池化方法\">#</a> Other Pooling Methods</h3>\n<table>\n<thead>\n<tr>\n<th>Method</th>\n<th>Description</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Max Pooling</td>\n<td>Takes the maximum of each region; inherently nonlinear; the most common choice</td>\n</tr>\n<tr>\n<td>Average Pooling</td>\n<td>Takes the mean of each region; needs a separate nonlinear activation</td>\n</tr>\n<tr>\n<td>Global Max Pooling</td>\n<td>Takes the maximum over the entire feature map; can replace fully connected layers at the end of a CNN</td>\n</tr>\n<tr>\n<td>Global Average Pooling</td>\n<td>Takes the mean over the entire feature map, compressing <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>H</mi><mo>×</mo><mi>W</mi><mo>×</mo><mi>C</mi></mrow><annotation encoding=\"application/x-tex\">H \\times W \\times C</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span></span></span></span> to <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mo>×</mo><mn>1</mn><mo>×</mo><mi>C</mi></mrow><annotation encoding=\"application/x-tex\">1 \\times 1 \\times C</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span></span></span></span> so it can feed the classifier directly, which greatly reduces the parameter count</td>\n</tr>\n</tbody>\n</table>\n<h3 id=\"输入尺寸不统一怎么办\"><a class=\"anchor\" href=\"#输入尺寸不统一怎么办\">#</a> What If Input Sizes Differ?</h3>\n<ul>\n<li><strong>Resize to a common size</strong>: resize every input image to a fixed size</li>\n<li><strong>Padding</strong>: pad smaller images with zeros or another value up to a common size</li>\n<li><strong>Adaptive pooling</strong>: use Global Average Pooling at the end of the CNN; whatever the size of the input feature map, the output is always <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mo>×</mo><mn>1</mn><mo>×</mo><mi>C</mi></mrow><annotation encoding=\"application/x-tex\">1 \\times 1 \\times C</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span></span></span></span>, which removes the size mismatch</li>\n</ul>\n<hr />\n<h2 id=\"cnn-架构设计\"><a class=\"anchor\" href=\"#cnn-架构设计\">#</a> CNN Architecture Design</h2>\n<h3 id=\"典型架构模式\"><a class=\"anchor\" href=\"#典型架构模式\">#</a> Typical Architecture Pattern</h3>\n<p>Pooling layers and convolution layers are <strong>interleaved</strong> through the network:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>INPUT</mtext><mo>→</mo><mo stretchy=\"false\">[</mo><mtext>CONV</mtext><mo>→</mo><mtext>ReLU</mtext><msub><mo stretchy=\"false\">]</mo><mi>N</mi></msub><mo>→</mo><mtext>POOL</mtext><mo>→</mo><mo stretchy=\"false\">[</mo><mtext>CONV</mtext><mo>→</mo><mtext>ReLU</mtext><msub><mo stretchy=\"false\">]</mo><mi>M</mi></msub><mo>→</mo><mtext>POOL</mtext><mo>→</mo><mtext>FC</mtext><mo>→</mo><mtext>FC</mtext></mrow><annotation encoding=\"application/x-tex\">\\text{INPUT} \\to [\\text{CONV} \\to \\text{ReLU}]_N \\to \\text{POOL} \\to [\\text{CONV} \\to \\text{ReLU}]_M \\to \\text{POOL} \\to \\text{FC} \\to \\text{FC}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord text\"><span class=\"mord\">INPUT</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord text\"><span class=\"mord\">CONV</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord text\"><span class=\"mord\">ReLU</span></span><span class=\"mclose\"><span class=\"mclose\">]</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.10903em;\">N</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord text\"><span class=\"mord\">POOL</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord text\"><span class=\"mord\">CONV</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord text\"><span class=\"mord\">ReLU</span></span><span class=\"mclose\"><span class=\"mclose\">]</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">M</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span 
class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord text\"><span class=\"mord\">POOL</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord text\"><span class=\"mord\">FC</span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord text\"><span class=\"mord\">FC</span></span></span></span></span></span></p>\n<p>一个更具体的视觉模式：</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mn>32</mn><mo>×</mo><mn>32</mn><mo>×</mo><mn>3</mn><mover><mo stretchy=\"true\" minsize=\"3.0em\">→</mo><mpadded width=\"+0.6em\" lspace=\"0.3em\"><mtext>CONV+ReLU</mtext></mpadded></mover><mn>28</mn><mo>×</mo><mn>28</mn><mo>×</mo><mn>6</mn><mover><mo stretchy=\"true\" minsize=\"3.0em\">→</mo><mpadded width=\"+0.6em\" lspace=\"0.3em\"><mtext>POOL</mtext></mpadded></mover><mn>14</mn><mo>×</mo><mn>14</mn><mo>×</mo><mn>6</mn><mover><mo stretchy=\"true\" minsize=\"3.0em\">→</mo><mpadded width=\"+0.6em\" lspace=\"0.3em\"><mtext>CONV+ReLU</mtext></mpadded></mover><mn>10</mn><mo>×</mo><mn>10</mn><mo>×</mo><mn>10</mn><mover><mo stretchy=\"true\" minsize=\"3.0em\">→</mo><mpadded width=\"+0.6em\" lspace=\"0.3em\"><mtext>POOL</mtext></mpadded></mover><mn>5</mn><mo>×</mo><mn>5</mn><mo>×</mo><mn>10</mn><mover><mo stretchy=\"true\" minsize=\"3.0em\">→</mo><mpadded width=\"+0.6em\" lspace=\"0.3em\"><mtext>Flatten</mtext></mpadded></mover><mtext>FC</mtext></mrow><annotation encoding=\"application/x-tex\">32 \\times 32 \\times 3 
\\xrightarrow{\\text{CONV+ReLU}} 28 \\times 28 \\times 6 \\xrightarrow{\\text{POOL}} 14 \\times 14 \\times 6 \\xrightarrow{\\text{CONV+ReLU}} 10 \\times 10 \\times 10 \\xrightarrow{\\text{POOL}} 5 \\times 5 \\times 10 \\xrightarrow{\\text{Flatten}} \\text{FC}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">32</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">32</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1113em;vertical-align:-0.011em;\"></span><span class=\"mord\">3</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel x-arrow\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1003em;\"><span style=\"top:-3.322em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight x-arrow-pad\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">CONV+ReLU</span></span></span></span></span><span class=\"svg-align\" style=\"top:-2.689em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"hide-tail\" style=\"height:0.522em;min-width:1.469em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"0.522em\" viewBox=\"0 0 400000 522\" preserveAspectRatio=\"xMaxYMin slice\"><path d=\"M0 241v40h399891c-47.3 35.3-84 78-110 128\n-16.7 32-27.7 63.7-33 95 0 1.3-.2 2.7-.5 4-.3 1.3-.5 2.3-.5 3 0 7.3 6.7 11 20\n 11 8 0 
13.2-.8 15.5-2.5 2.3-1.7 4.2-5.5 5.5-11.5 2-13.3 5.7-27 11-41 14.7-44.7\n 39-84.5 73-119.5s73.7-60.2 119-75.5c6-2 9-5.7 9-11s-3-9-9-11c-45.3-15.3-85\n-40.5-119-75.5s-58.3-74.8-73-119.5c-4.7-14-8.3-27.3-11-40-1.3-6.7-3.2-10.8-5.5\n-12.5-2.3-1.7-7.5-2.5-15.5-2.5-14 0-21 3.7-21 11 0 2 2 10.3 6 25 20.7 83.3 67\n 151.7 139 205zm0 0v40h399900v-40z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.011em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">28</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">28</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1113em;vertical-align:-0.011em;\"></span><span class=\"mord\">6</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel x-arrow\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1003em;\"><span style=\"top:-3.322em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight x-arrow-pad\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">POOL</span></span></span></span></span><span class=\"svg-align\" style=\"top:-2.689em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"hide-tail\" style=\"height:0.522em;min-width:1.469em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" 
width=\"400em\" height=\"0.522em\" viewBox=\"0 0 400000 522\" preserveAspectRatio=\"xMaxYMin slice\"><path d=\"M0 241v40h399891c-47.3 35.3-84 78-110 128\n-16.7 32-27.7 63.7-33 95 0 1.3-.2 2.7-.5 4-.3 1.3-.5 2.3-.5 3 0 7.3 6.7 11 20\n 11 8 0 13.2-.8 15.5-2.5 2.3-1.7 4.2-5.5 5.5-11.5 2-13.3 5.7-27 11-41 14.7-44.7\n 39-84.5 73-119.5s73.7-60.2 119-75.5c6-2 9-5.7 9-11s-3-9-9-11c-45.3-15.3-85\n-40.5-119-75.5s-58.3-74.8-73-119.5c-4.7-14-8.3-27.3-11-40-1.3-6.7-3.2-10.8-5.5\n-12.5-2.3-1.7-7.5-2.5-15.5-2.5-14 0-21 3.7-21 11 0 2 2 10.3 6 25 20.7 83.3 67\n 151.7 139 205zm0 0v40h399900v-40z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.011em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">14</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">14</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1113em;vertical-align:-0.011em;\"></span><span class=\"mord\">6</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel x-arrow\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1003em;\"><span style=\"top:-3.322em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight x-arrow-pad\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord 
mtight\">CONV+ReLU</span></span></span></span></span><span class=\"svg-align\" style=\"top:-2.689em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"hide-tail\" style=\"height:0.522em;min-width:1.469em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"0.522em\" viewBox=\"0 0 400000 522\" preserveAspectRatio=\"xMaxYMin slice\"><path d=\"M0 241v40h399891c-47.3 35.3-84 78-110 128\n-16.7 32-27.7 63.7-33 95 0 1.3-.2 2.7-.5 4-.3 1.3-.5 2.3-.5 3 0 7.3 6.7 11 20\n 11 8 0 13.2-.8 15.5-2.5 2.3-1.7 4.2-5.5 5.5-11.5 2-13.3 5.7-27 11-41 14.7-44.7\n 39-84.5 73-119.5s73.7-60.2 119-75.5c6-2 9-5.7 9-11s-3-9-9-11c-45.3-15.3-85\n-40.5-119-75.5s-58.3-74.8-73-119.5c-4.7-14-8.3-27.3-11-40-1.3-6.7-3.2-10.8-5.5\n-12.5-2.3-1.7-7.5-2.5-15.5-2.5-14 0-21 3.7-21 11 0 2 2 10.3 6 25 20.7 83.3 67\n 151.7 139 205zm0 0v40h399900v-40z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.011em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">10</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">10</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1113em;vertical-align:-0.011em;\"></span><span class=\"mord\">10</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel x-arrow\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:1.1003em;\"><span style=\"top:-3.322em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight x-arrow-pad\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">POOL</span></span></span></span></span><span class=\"svg-align\" style=\"top:-2.689em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"hide-tail\" style=\"height:0.522em;min-width:1.469em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"0.522em\" viewBox=\"0 0 400000 522\" preserveAspectRatio=\"xMaxYMin slice\"><path d=\"M0 241v40h399891c-47.3 35.3-84 78-110 128\n-16.7 32-27.7 63.7-33 95 0 1.3-.2 2.7-.5 4-.3 1.3-.5 2.3-.5 3 0 7.3 6.7 11 20\n 11 8 0 13.2-.8 15.5-2.5 2.3-1.7 4.2-5.5 5.5-11.5 2-13.3 5.7-27 11-41 14.7-44.7\n 39-84.5 73-119.5s73.7-60.2 119-75.5c6-2 9-5.7 9-11s-3-9-9-11c-45.3-15.3-85\n-40.5-119-75.5s-58.3-74.8-73-119.5c-4.7-14-8.3-27.3-11-40-1.3-6.7-3.2-10.8-5.5\n-12.5-2.3-1.7-7.5-2.5-15.5-2.5-14 0-21 3.7-21 11 0 2 2 10.3 6 25 20.7 83.3 67\n 151.7 139 205zm0 0v40h399900v-40z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.011em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:1.1191em;vertical-align:-0.011em;\"></span><span class=\"mord\">10</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel x-arrow\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1081em;\"><span style=\"top:-3.322em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight x-arrow-pad\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">Flatten</span></span></span></span></span><span class=\"svg-align\" style=\"top:-2.689em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"hide-tail\" style=\"height:0.522em;min-width:1.469em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"0.522em\" viewBox=\"0 0 400000 522\" preserveAspectRatio=\"xMaxYMin slice\"><path d=\"M0 241v40h399891c-47.3 35.3-84 78-110 128\n-16.7 32-27.7 63.7-33 95 0 1.3-.2 2.7-.5 4-.3 1.3-.5 2.3-.5 3 0 7.3 6.7 11 20\n 11 8 0 13.2-.8 15.5-2.5 2.3-1.7 4.2-5.5 5.5-11.5 2-13.3 5.7-27 11-41 14.7-44.7\n 39-84.5 73-119.5s73.7-60.2 119-75.5c6-2 9-5.7 9-11s-3-9-9-11c-45.3-15.3-85\n-40.5-119-75.5s-58.3-74.8-73-119.5c-4.7-14-8.3-27.3-11-40-1.3-6.7-3.2-10.8-5.5\n-12.5-2.3-1.7-7.5-2.5-15.5-2.5-14 0-21 3.7-21 11 0 2 2 10.3 6 25 20.7 83.3 67\n 151.7 139 205zm0 0v40h399900v-40z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.011em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord text\"><span class=\"mord\">FC</span></span></span></span></span></span></p>\n<p>核心规律：</p>\n<ul>\n<li><strong>空间尺寸逐步缩小</strong>：<span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>32</mn><mo>→</mo><mn>28</mn><mo>→</mo><mn>14</mn><mo>→</mo><mn>10</mn><mo>→</mo><mn>5</mn></mrow><annotation encoding=\"application/x-tex\">32 \\to 28 \\to 14 \\to 10 \\to 5</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">32</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">28</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">14</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">10</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">5</span></span></span></span>（通过卷积缩小 + 池化加速缩小）</li>\n<li><strong>通道数逐步增加</strong>：<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>3</mn><mo>→</mo><mn>6</mn><mo>→</mo><mn>10</mn></mrow><annotation encoding=\"application/x-tex\">3 \\to 6 \\to 10</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span 
class=\"mord\">3</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">6</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">10</span></span></span></span>（从低级纹理到高级语义，需要更多模式来描述）</li>\n<li>尾部使用全连接层进行分类</li>\n</ul>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/cnn/convnetvis.jpeg\" alt=\"典型 CNN 架构中各层特征图可视化\" /></p>\n<h3 id=\"参数量分布规律\"><a class=\"anchor\" href=\"#参数量分布规律\">#</a> 参数量分布规律</h3>\n<p>以经典架构为例，各层的参数量和计算量分布呈明显的&quot;倒置&quot;结构：</p>\n<ul>\n<li><strong>早期层</strong>：参数量极少（滤波器尺寸小），但计算量最大（空间尺寸大，卷积在大量空间位置上滑动）</li>\n<li><strong>后期全连接层</strong>：参数量巨大（<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>5</mn><mo>×</mo><mn>5</mn><mo>×</mo><mn>10</mn><mo>=</mo><mn>250</mn></mrow><annotation encoding=\"application/x-tex\">5 \\times 5 \\times 10 = 250</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">5</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.6444em;\"></span><span class=\"mord\">10</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">250</span></span></span></span> 个输入神经元 × 分类数 = 大量参数），但计算量较小</li>\n</ul>\n<p>这个规律驱动了此后多年 CNN 架构的设计：<strong>将全连接层替换为全局平均池化以削减参数，用更深的卷积层换取表达能力</strong>。</p>\n<h3 id=\"设计中的关键归纳偏置\"><a class=\"anchor\" href=\"#设计中的关键归纳偏置\">#</a> 设计中的关键归纳偏置</h3>\n<p>卷积网络的两个核心假设（归纳偏置），使其天然适合处理图像：</p>\n<ol>\n<li><strong>局部性 Locality</strong>：像素之间的相关性随空间距离增大而衰减，相邻像素彼此相关的可能性远大于相距较远的像素</li>\n<li><strong>平移等变性 Translation Equivariance</strong>（常被笼统地称为平移不变性）：一个特征无论出现在图像的哪个位置，都由同一滤波器检测，输出响应随特征位置一起平移；参数共享保证了这一点</li>\n</ol>\n<p>这两个假设不是从数据中学来的，而是被<strong>硬编码</strong>进网络结构中的设计选择。它们大幅减少了自由度：一个全连接层需要学习 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>C</mi><mrow><mi>i</mi><mi>n</mi></mrow></msub><mi>H</mi><mi>W</mi><mo>×</mo><msub><mi>C</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub><mi>H</mi><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">C_{in}HW \\times C_{out}HW</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">in</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span> 个连接模式，而卷积层只需学习 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>C</mi><mrow><mi>i</mi><mi>n</mi></mrow></msub><msub><mi>K</mi><mi>h</mi></msub><msub><mi>K</mi><mi>w</mi></msub><mo>×</mo><msub><mi>C</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">C_{in}K_hK_w \\times C_{out}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" 
style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">in</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">h</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02691em;\">w</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> 个，同时天然过滤掉大量无意义的假关联。</p>\n<hr />\n<h2 id=\"代码实现\"><a class=\"anchor\" href=\"#代码实现\">#</a> 代码实现</h2>\n<h3 id=\"基本卷积运算的-numpy-实现\"><a class=\"anchor\" href=\"#基本卷积运算的-numpy-实现\">#</a> 基本卷积运算的 NumPy 实现</h3>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">import</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> numpy </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">as</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> conv2d_forward</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">X</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> b</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> stride</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> pad</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    X: input (N, C_in, H, W)</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    W: filters (C_out, C_in, K_h, K_w)</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    b: bias (C_out,)</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    N</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> C_in</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> H</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_in </span><span 
style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    C_out</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> _</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> K_h</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> K_w </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    H_out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">H </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 2</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> pad </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> K_h</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> //</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> stride </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 
1</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    W_out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W_in </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 2</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> pad </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> K_w</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> //</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> stride </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">    # Zero-pad the height and width axes</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    X_padded </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">pad</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">X</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span 
style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">pad</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">pad</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">pad</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">pad</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">)</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> C_out</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> H_out</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_out</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> n </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> c </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">C_out</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">            for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> h </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">H_out</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">                for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W_out</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">                    h_start </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> h </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> stride</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">                    w_start </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> stride</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">                    patch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X_padded</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">n</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#999999;--shiki-dark:#666666\"> :,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> 
h_start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">h_start</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">K_h</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w_start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">w_start</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">K_w</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">                    out</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">n</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> c</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> h</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sum</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">patch </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">c</span><span 
style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> b</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">c</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> out</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> max_pool2d_forward</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">X</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> pool_size</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> stride</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    Max pooling forward pass</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    X: input (N, C, H, W)</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span 
class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    N</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> C</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> H</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_in </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    H_out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">H </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> pool_size</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> //</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> stride </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    W_out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W_in </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> pool_size</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> //</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> stride </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> C</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> H_out</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_out</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> n </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        for</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> c </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">C</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">            for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> h </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">H_out</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">                for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W_out</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">                    h_start </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> h </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> stride</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">                    w_start </span><span 
style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> stride</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">                    out</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">n</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> c</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> h</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">max</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">                        X</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">n</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> c</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> h_start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">h_start</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">pool_size</span><span 
style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w_start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">w_start</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">pool_size</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span></span>\n<span class=\"line\"><span style=\"color:#999999;--shiki-dark:#666666\">                    </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> out</span></span></code></pre>\n<h3 id=\"cnn-架构示例类似-lenet-5-风格\"><a class=\"anchor\" href=\"#cnn-架构示例类似-lenet-5-风格\">#</a> CNN Architecture Example (LeNet-5 Style)</h3>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">class</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\"> SimpleCNN</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    def</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> __init__</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">self</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # First conv layer: 3 → 6 channels, 5x5 kernel</span></span>\n<span class=\"line\"><span style=\"color:#A65E2B;--shiki-dark:#C99076\">        self</span><span 
style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">conv1_W </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">randn</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">6</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 3</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 5</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 5</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0.01</span></span>\n<span class=\"line\"><span style=\"color:#A65E2B;--shiki-dark:#C99076\">        self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">conv1_b </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">6</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # Second conv layer: 6 → 16 channels, 5x5 kernel</span></span>\n<span class=\"line\"><span 
style=\"color:#A65E2B;--shiki-dark:#C99076\">        self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">conv2_W </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">randn</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">16</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 6</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 5</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 5</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0.01</span></span>\n<span class=\"line\"><span style=\"color:#A65E2B;--shiki-dark:#C99076\">        self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">conv2_b </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">16</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # 
Fully connected layers: 16*5*5 = 400 → 120 → 10</span></span>\n<span class=\"line\"><span style=\"color:#A65E2B;--shiki-dark:#C99076\">        self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">fc1_W </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">randn</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">400</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 120</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0.01</span></span>\n<span class=\"line\"><span style=\"color:#A65E2B;--shiki-dark:#C99076\">        self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">fc1_b </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">120</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#A65E2B;--shiki-dark:#C99076\">        self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">fc2_W </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">randn</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">120</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 10</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0.01</span></span>\n<span class=\"line\"><span style=\"color:#A65E2B;--shiki-dark:#C99076\">        self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">fc2_b </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">10</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> forward</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">self</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span 
style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # Input X: (N, 3, 32, 32)</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # Conv1: (N,3,32,32) → (N,6,28,28)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> conv2d_forward</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">X</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\"> self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">conv1_W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\"> self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">conv1_b</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">maximum</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> out</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  # ReLU</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # Pool1: (N,6,28,28) → 
(N,6,14,14)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> max_pool2d_forward</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">out</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> pool_size</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> stride</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # Conv2: (N,6,14,14) → (N,16,10,10)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> conv2d_forward</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">out</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\"> self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">conv2_W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\"> self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">conv2_b</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">maximum</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> out</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  # ReLU</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # Pool2: (N,16,10,10) → (N,16,5,5)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> max_pool2d_forward</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">out</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> pool_size</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> stride</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # Flatten: (N,16,5,5) → 
(N,400)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        N </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> out</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> out</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">reshape</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # FC1: (N,400) → (N,120)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> out</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dot</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">fc1_W</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\"> self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">fc1_b</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        out </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">maximum</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> out</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  # ReLU</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # FC2: (N,120) → (N,10)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        scores </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> out</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dot</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">fc2_W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\"> self</span><span 
style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">fc2_b</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> scores</span></span></code></pre>\n<hr />\n<h2 id=\"convolution-summary\"><a class=\"anchor\" href=\"#convolution-summary\">#</a> Convolution Summary</h2>\n<table>\n<thead>\n<tr>\n<th>Concept</th>\n<th>Description</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td><strong>Convolutional layer</strong></td>\n<td>Local connectivity + parameter sharing; preserves spatial structure</td>\n</tr>\n<tr>\n<td><strong>Filter</strong></td>\n<td>A small <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>K</mi><mi>h</mi></msub><mo>×</mo><msub><mi>K</mi><mi>w</mi></msub><mo>×</mo><msub><mi>C</mi><mrow><mi>i</mi><mi>n</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">K_h \\times K_w \\times C_{in}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">h</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02691em;\">w</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">in</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> tensor; each filter learns one pattern</td>\n</tr>\n<tr>\n<td><strong>Output size</strong></td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">⌊</mo><mo stretchy=\"false\">(</mo><mi>W</mi><mo>−</mo><mi>K</mi><mo>+</mo><mn>2</mn><mi>P</mi><mo 
stretchy=\"false\">)</mo><mi mathvariant=\"normal\">/</mi><mi>S</mi><mo stretchy=\"false\">⌋</mo><mo>+</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">\\lfloor (W - K + 2P) / S \\rfloor + 1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">⌊(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mclose\">)</span><span class=\"mord\">/</span><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;\">S</span><span class=\"mclose\">⌋</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span></td>\n</tr>\n<tr>\n<td><strong>Parameter count</strong></td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>C</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub><mo>×</mo><mo 
stretchy=\"false\">(</mo><msub><mi>C</mi><mrow><mi>i</mi><mi>n</mi></mrow></msub><mo>×</mo><msub><mi>K</mi><mi>h</mi></msub><mo>×</mo><msub><mi>K</mi><mi>w</mi></msub><mo>+</mo><mn>1</mn><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">C_{out} \\times (C_{in} \\times K_h \\times K_w + 1)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord 
mathnormal mtight\">in</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">h</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:-0.0715em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02691em;\">w</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span></span></span></span>, independent of the input's spatial size</td>\n</tr>\n<tr>\n<td><strong>Stride</strong></td>\n<td>Sets the step of the sliding window; affects output size and compute cost</td>\n</tr>\n<tr>\n<td><strong>Zero padding</strong></td>\n<td>Controls behavior at the borders; <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>P</mi><mo>=</mo><mo stretchy=\"false\">(</mo><mi>K</mi><mo>−</mo><mn>1</mn><mo stretchy=\"false\">)</mo><mi mathvariant=\"normal\">/</mi><mn>2</mn></mrow><annotation encoding=\"application/x-tex\">P = (K-1)/2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">K</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span><span class=\"mord\">/2</span></span></span></span> keeps the spatial size unchanged (for stride 1 and odd K)</td>\n</tr>\n<tr>\n<td><strong>Receptive field</strong></td>\n<td>Grows layer by layer as convolutions are stacked; deeper layers capture more global structure</td>\n</tr>\n<tr>\n<td><strong><span class=\"katex\"><span 
class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mo>×</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">1 \\times 1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span> convolution</strong></td>\n<td>A linear combination across channels; used to raise or lower the channel count and, together with the activation that follows, to add nonlinearity</td>\n</tr>\n<tr>\n<td><strong>Dilated convolution</strong></td>\n<td>Enlarges the receptive field without adding parameters</td>\n</tr>\n<tr>\n<td><strong>Grouped convolution</strong></td>\n<td>Convolves channel groups independently, reducing parameters and compute</td>\n</tr>\n</tbody>\n</table>\n<h2 id=\"pooling-summary\"><a class=\"anchor\" href=\"#pooling-summary\">#</a> Pooling Summary</h2>\n<table>\n<thead>\n<tr>\n<th>Concept</th>\n<th>Description</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td><strong>Pooling layer</strong></td>\n<td>Lightweight downsampling with no learnable parameters; reduces spatial size and compute</td>\n</tr>\n<tr>\n<td><strong>Max pooling</strong></td>\n<td>The most common choice; inherently nonlinear; adds some translation invariance</td>\n</tr>\n<tr>\n<td><strong>Average pooling</strong></td>\n<td>Linear, so it needs a separate nonlinear activation; largely replaced by max pooling</td>\n</tr>\n<tr>\n<td><strong>Typical setting</strong></td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>2</mn><mo>×</mo><mn>2</mn></mrow><annotation encoding=\"application/x-tex\">2 \\times 2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">2</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.6444em;\"></span><span class=\"mord\">2</span></span></span></span> window, stride 2, no padding</td>\n</tr>\n<tr>\n<td><strong>Global average pooling</strong></td>\n<td>Reduces <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>H</mi><mo>×</mo><mi>W</mi><mo>×</mo><mi>C</mi></mrow><annotation encoding=\"application/x-tex\">H \\times W \\times C</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span></span></span></span> to <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mo>×</mo><mn>1</mn><mo>×</mo><mi>C</mi></mrow><annotation encoding=\"application/x-tex\">1 \\times 1 \\times C</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span></span></span></span>, used in place of fully connected layers</td>\n</tr>\n<tr>\n<td><strong>Placement</strong></td>\n<td>Interleaved with convolutional layers in a repeating CONV-ReLU-POOL pattern</td>\n</tr>\n</tbody>\n</table>\n<hr />\n<h2 id=\"声明\"><a class=\"anchor\" href=\"#声明\">#</a> Disclaimer</h2>\n<p>This post was compiled by Yumengmeng from the video lectures of <a href=\"https://www.bilibili.com/video/BV1YJ3PzLEiW?spm_id_from=333.788.videopod.episodes&amp;vd_source=9f80ac68a038439c43f542a83ffa7b69&amp;p=3\">Fei-Fei Li's Stanford CS231n computer vision course (Spring 2025)</a>, combined with open-source notes gathered from the web with Claude Code and then polished and typeset. It is intended for personal review only.</p>\n",
            "tags": [
                "CS231n学习笔记",
                "CS231n",
                "计算机视觉",
                "深度学习",
                "卷积神经网络",
                "CNN"
            ]
        },
        {
            "id": "https://yumengmeng.cn/2026/06/02/CS231n%E2%80%94%E2%80%94lecture4%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E4%B8%8E%E5%8F%8D%E5%90%91%E4%BC%A0%E6%92%AD/index/",
            "url": "https://yumengmeng.cn/2026/06/02/CS231n%E2%80%94%E2%80%94lecture4%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E4%B8%8E%E5%8F%8D%E5%90%91%E4%BC%A0%E6%92%AD/index/",
            "title": "CS231n: Lecture 4, Neural Networks and Backpropagation",
            "date_published": "2026-06-02T05:06:16.000Z",
            "content_html": "<h2 id=\"从线性分类器到神经网络\"><a class=\"anchor\" href=\"#从线性分类器到神经网络\">#</a> From Linear Classifiers to Neural Networks</h2>\n<h3 id=\"回顾线性函数\"><a class=\"anchor\" href=\"#回顾线性函数\">#</a> Review: Linear Functions</h3>\n<p>The core formula of a linear classifier:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>f</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo separator=\"true\">,</mo><mi>W</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mi>W</mi><mi>x</mi></mrow><annotation encoding=\"application/x-tex\">f(x, W) = Wx\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mord mathnormal\">x</span></span></span></span></span></p>\n<p>where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi><mo>∈</mo><msup><mi mathvariant=\"double-struck\">R</mi><mi>D</mi></msup></mrow><annotation encoding=\"application/x-tex\">x \\in \\mathbb{R}^D</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5782em;vertical-align:-0.0391em;\"></span><span class=\"mord 
mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8413em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8413em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">D</span></span></span></span></span></span></span></span></span></span></span>，<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>W</mi><mo>∈</mo><msup><mi mathvariant=\"double-struck\">R</mi><mrow><mi>C</mi><mo>×</mo><mi>D</mi></mrow></msup></mrow><annotation encoding=\"application/x-tex\">W \\in \\mathbb{R}^{C \\times D}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7224em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8413em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8413em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.07153em;\">C</span><span class=\"mbin mtight\">×</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">D</span></span></span></span></span></span></span></span></span></span></span></span>. D is the input dimension and C is the number of classes (the number of output labels).</p>\n<p>In Lecture 2 we saw that a linear classifier can learn only <strong>one template</strong> per class, so it fails on multimodal distributions, concentric circles, and similar problems: no single straight line can separate two classes that alternate across the four quadrants.</p>\n<h3 id=\"双层神经网络\"><a class=\"anchor\" href=\"#双层神经网络\">#</a> Two-Layer Neural Network</h3>\n<p>A neural network builds on the linear classifier by inserting a <strong>hidden layer</strong> between the input and the output:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>f</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo separator=\"true\">,</mo><msub><mi>W</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><msub><mi>W</mi><mn>2</mn></msub><mo stretchy=\"false\">)</mo><mo>=</mo><msub><mi>W</mi><mn>2</mn></msub><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mn>0</mn><mo separator=\"true\">,</mo><msub><mi>W</mi><mn>1</mn></msub><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">f(x, W_1, W_2) = W_2 \\max(0, W_1 x)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span 
class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord\">0</span><span 
class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>where:</p>\n<ul>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi><mo>∈</mo><msup><mi mathvariant=\"double-struck\">R</mi><mi>D</mi></msup></mrow><annotation encoding=\"application/x-tex\">x \\in \\mathbb{R}^D</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5782em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8413em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8413em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.02778em;\">D</span></span></span></span></span></span></span></span></span></span></span>: the input vector</li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>W</mi><mn>1</mn></msub><mo>∈</mo><msup><mi mathvariant=\"double-struck\">R</mi><mrow><mi>H</mi><mo>×</mo><mi>D</mi></mrow></msup></mrow><annotation encoding=\"application/x-tex\">W_1 \\in \\mathbb{R}^{H \\times D}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8413em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8413em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.08125em;\">H</span><span class=\"mbin mtight\">×</span><span 
class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">D</span></span></span></span></span></span></span></span></span></span></span></span>: the first-layer weights, mapping the input into an H-dimensional hidden space</li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>W</mi><mn>2</mn></msub><mo>∈</mo><msup><mi mathvariant=\"double-struck\">R</mi><mrow><mi>C</mi><mo>×</mo><mi>H</mi></mrow></msup></mrow><annotation encoding=\"application/x-tex\">W_2 \\in \\mathbb{R}^{C \\times H}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8413em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8413em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.07153em;\">C</span><span class=\"mbin mtight\">×</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.08125em;\">H</span></span></span></span></span></span></span></span></span></span></span></span>: the second-layer weights, mapping the hidden representation to C class scores</li>\n<li>H: the number of hidden units, a <strong>hyperparameter</strong></li>\n</ul>\n<p>The weight-matrix dimensions must chain consistently, D → H → C, for the matrix multiplications to be well defined.</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/nn1/neural_net.jpeg\" alt=\"Structure of a two-layer neural network\" /></p>\n<h3 id=\"为什么要引入非线性\"><a class=\"anchor\" href=\"#为什么要引入非线性\">#</a> Why Introduce Nonlinearity?</h3>\n<p>This is <strong>the most important question</strong> in neural network design.</p>\n<p>If we remove the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mn>0</mn><mo separator=\"true\">,</mo><mo>⋅</mo><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\max(0, \\cdot)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">⋅</span><span class=\"mclose\">)</span></span></span></span> in the middle, the two-layer network collapses to:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>f</mi><mo>=</mo><msub><mi>W</mi><mn>2</mn></msub><mo stretchy=\"false\">(</mo><msub><mi>W</mi><mn>1</mn></msub><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mo stretchy=\"false\">(</mo><msub><mi>W</mi><mn>2</mn></msub><msub><mi>W</mi><mn>1</mn></msub><mo stretchy=\"false\">)</mo><mi>x</mi><mo>=</mo><msub><mi>W</mi><mn>3</mn></msub><mi>x</mi></mrow><annotation encoding=\"application/x-tex\">f = W_2 (W_1 x) = (W_2 W_1) x = W_3 
x\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span 
style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">3</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">x</span></span></span></span></span></p>\n<p>A single matrix <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>W</mi><mn>3</mn></msub><mo>=</mo><msub><mi>W</mi><mn>2</mn></msub><msub><mi>W</mi><mn>1</mn></msub></mrow><annotation encoding=\"application/x-tex\">W_3 = W_2 W_1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">3</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> therefore gives <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>f</mi><mo>=</mo><msub><mi>W</mi><mn>3</mn></msub><mi>x</mi></mrow><annotation encoding=\"application/x-tex\">f = W_3 x</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">3</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">x</span></span></span></span>, so we are back to a linear classifier. <strong>A stack of any number of linear transformations is equivalent to a single linear transformation</strong>: no matter how many layers are added, the expressive power does not increase.</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>linear → linear → linear = linear</mtext></mrow><annotation encoding=\"application/x-tex\">\\text{linear → linear → linear = linear}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord text\"><span class=\"mord\">linear</span><span class=\"mord\"> → </span><span class=\"mord\">linear</span><span class=\"mord\"> → </span><span class=\"mord\">linear</span><span class=\"mord\"> = </span><span class=\"mord\">linear</span></span></span></span></span></span></p>\n<p>The role of the nonlinear activation function is to <strong>warp one space into another</strong>, so that data that were not linearly separable can become linearly separable in the new space. Each hidden neuron can be read as a &quot;feature template&quot; for part of an output class: for example, one neuron might learn to detect &quot;cat ears&quot; and another &quot;cat eyes&quot;, and the output layer combines them to make the classification decision.</p>\n<hr />\n<h2 id=\"激活函数-activation-functions\"><a class=\"anchor\" href=\"#激活函数-activation-functions\">#</a> Activation Functions</h2>\n<p>Besides <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mn>0</mn><mo separator=\"true\">,</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\max(0, 
x)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span> (ReLU), which is used above and revisited below, common activation functions include:</p>\n<h3 id=\"sigmoid\"><a class=\"anchor\" href=\"#sigmoid\">#</a> Sigmoid</h3>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>σ</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mfrac><mn>1</mn><mrow><mn>1</mn><mo>+</mo><msup><mi>e</mi><mrow><mo>−</mo><mi>x</mi></mrow></msup></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\sigma(x) = \\frac{1}{1+e^{-x}}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0908em;vertical-align:-0.7693em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6973em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">−</span><span class=\"mord mathnormal mtight\">x</span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7693em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>Squashes values into (0,1). It is prone to <strong>vanishing gradients</strong>: in the saturated regions at both ends the gradient approaches zero. The gradient has a simple closed form: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msup><mi>σ</mi><mo mathvariant=\"normal\" lspace=\"0em\" rspace=\"0em\">′</mo></msup><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mo stretchy=\"false\">(</mo><mn>1</mn><mo>−</mo><mi>σ</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><mi>σ</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\sigma&#x27;(x) = (1-\\sigma(x)) \\cdot \\sigma(x)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span 
class=\"strut\" style=\"height:1.0019em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7519em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">′</span></span></span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span>, which never exceeds 0.25 (its value at x = 0).</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/nn1/sigmoid.jpeg\" alt=\"Sigmoid\" /></p>\n<h3 id=\"tanh\"><a 
class=\"anchor\" href=\"#tanh\">#</a> Tanh</h3>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>tanh</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mfrac><mrow><msup><mi>e</mi><mi>x</mi></msup><mo>−</mo><msup><mi>e</mi><mrow><mo>−</mo><mi>x</mi></mrow></msup></mrow><mrow><msup><mi>e</mi><mi>x</mi></msup><mo>+</mo><msup><mi>e</mi><mrow><mo>−</mo><mi>x</mi></mrow></msup></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\tanh(x) = \\frac{e^x-e^{-x}}{e^x+e^{-x}}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mop\">tanh</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.2177em;vertical-align:-0.7693em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4483em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.5904em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">x</span></span></span></span></span></span></span></span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6973em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">−</span><span class=\"mord mathnormal mtight\">x</span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6644em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">x</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7713em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">−</span><span class=\"mord mathnormal 
mtight\">x</span></span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7693em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>Squashes values into (-1,1). Its <strong>zero-centered</strong> output is an advantage over Sigmoid, but it suffers from the same vanishing-gradient problem.</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/nn1/tanh.jpeg\" alt=\"Tanh\" /></p>\n<h3 id=\"relu\"><a class=\"anchor\" href=\"#relu\">#</a> ReLU</h3>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>ReLU</mtext><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mn>0</mn><mo separator=\"true\">,</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\text{ReLU}(x) = \\max(0, x)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord text\"><span class=\"mord\">ReLU</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>Cheap to compute and <strong>the default choice in most settings</strong> (applicable to CNNs and Transformers 
alike). Its drawback is the &quot;<strong>dead neuron</strong>&quot; problem: if the pre-activation of a neuron becomes negative for every input (for example after a large weight update), both its output and its gradient are zero, so its weights stop updating and it never activates again. Krizhevsky et al. (2012) reported that a network with ReLUs reached 25% training error on CIFAR-10 about six times faster than the same network with Tanh:</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/nn1/relu.jpeg\" alt=\"ReLU\" /></p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/nn1/alexplot.jpeg\" alt=\"ReLU vs. Tanh convergence speed (from the AlexNet paper)\" /></p>\n<h3 id=\"leaky-relu\"><a class=\"anchor\" href=\"#leaky-relu\">#</a> Leaky ReLU</h3>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>LeakyReLU</mtext><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mn>0.01</mn><mi>x</mi><mo separator=\"true\">,</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\text{LeakyReLU}(x) = \\max(0.01x, x)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord text\"><span class=\"mord\">LeakyReLU</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord\">0.01</span><span class=\"mord mathnormal\">x</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>Keeps a small gradient on the negative side, which mitigates the dead-neuron problem.</p>\n<h3 id=\"elu\"><a class=\"anchor\" href=\"#elu\">#</a> ELU</h3>\n<p><span 
class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>f</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mrow><mo fence=\"true\">{</mo><mtable rowspacing=\"0.36em\" columnalign=\"left left\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mi>x</mi></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>x</mi><mo>≥</mo><mn>0</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>α</mi><mo stretchy=\"false\">(</mo><msup><mi>e</mi><mi>x</mi></msup><mo>−</mo><mn>1</mn><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>x</mi><mo>&lt;</mo><mn>0</mn></mrow></mstyle></mtd></mtr></mtable></mrow></mrow><annotation encoding=\"application/x-tex\">f(x) = \\begin{cases} x &amp; x \\ge 0 \\\\ \\alpha(e^x-1) &amp; x &lt; 0 \\end{cases}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:3em;vertical-align:-1.25em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size4\">{</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.69em;\"><span style=\"top:-3.69em;\"><span 
class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6644em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">x</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.19em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:1em;\"></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.69em;\"><span style=\"top:-3.69em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\">0</span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&lt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span 
class=\"mord\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.19em;\"><span></span></span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>Smooth in the negative region, with mean outputs closer to zero; this tends to make training more stable, at the cost of a slightly more expensive exponential.</p>\n<h3 id=\"gelu-与-silu-swish\"><a class=\"anchor\" href=\"#gelu-与-silu-swish\">#</a> GELU and SiLU (Swish)</h3>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>GELU</mtext><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mi>x</mi><mo>⋅</mo><mi mathvariant=\"normal\">Φ</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo separator=\"true\">,</mo><mspace width=\"1em\"/><mtext>SILU</mtext><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mi>x</mi><mo>⋅</mo><mi>σ</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\text{GELU}(x) = x \\cdot \\Phi(x), \\quad \\text{SILU}(x) = x \\cdot \\sigma(x)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord text\"><span class=\"mord\">GELU</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4445em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">Φ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:1em;\"></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord text\"><span class=\"mord\">SILU</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4445em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>GELU is the standard activation in Transformer architectures (both BERT and GPT use it). SiLU (also written SILU, or Swish) is not an approximation of GELU but a closely related smooth, gated activation: it replaces the Gaussian CDF Φ(x) with the sigmoid σ(x). It can perform better on some vision tasks.</p>\n<h3 id=\"选择建议\"><a class=\"anchor\" href=\"#选择建议\">#</a> Recommendations</h3>\n<ul>\n<li><strong>ReLU</strong>: the default starting point and the first choice for most CNNs</li>\n<li><strong>GELU / SiLU</strong>: the mainstream choice in modern Transformer architectures</li>\n<li><strong>Sigmoid / Tanh</strong>: rarely used in hidden layers (severe vanishing gradients); seen mainly in output layers or gating mechanisms</li>\n<li>The core role of an activation function is always to <strong>introduce non-linearity</strong></li>\n</ul>\n<hr />\n<h2 id=\"一个完整的双层神经网络\"><a class=\"anchor\" href=\"#一个完整的双层神经网络\">#</a> A Complete Two-Layer Neural Network</h2>\n<p>A few dozen lines of NumPy are enough to build one. There are four steps:</p>\n<h3 id=\"第一步定义模型\"><a class=\"anchor\" href=\"#第一步定义模型\">#</a> Step 1: Define the model</h3>\n<pre class=\"shiki shiki-themes vitesse-light 
vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">import</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> numpy </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">as</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Sizes and hyperparameters</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 100</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # number of samples</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">D </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 3072</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">       # input dimension (e.g. CIFAR-10: 32×32×3)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">H </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 100</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # number of hidden units</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">C </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 10</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">         # number of output classes</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Random data (for illustration only)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">X </span><span 
style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">randn</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> D</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">y </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">randint</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> C</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> N</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Initialize weights: small random numbers times a scale factor</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W1 </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">randn</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">D</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> H</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0.01</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">b1 </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">H</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W2 </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">randn</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">H</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> C</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span 
style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0.01</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">b2 </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">C</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span></code></pre>\n<h3 id=\"第二步前向传播-forward-pass\"><a class=\"anchor\" href=\"#第二步前向传播-forward-pass\">#</a> Step 2: Forward pass</h3>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Hidden layer: affine transform + ReLU activation</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">h </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W1 </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> b1          </span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># [N, H]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">h_relu </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">maximum</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span 
style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> h</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"> # ReLU</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Output layer: affine transform producing class scores</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">scores </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> h_relu </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W2 </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> b2 </span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># [N, C]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Softmax cross-entropy loss (subtract the row max for numerical stability)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">exp_scores </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">exp</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">scores </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">max</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">scores</span><span 
style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> axis</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> keepdims</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">True</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">probs </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> exp_scores </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">/</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sum</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">exp_scores</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> axis</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> keepdims</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">True</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Data loss: mean negative log-likelihood of the correct class</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">loss </span><span 
style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mean</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">log</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">probs</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">arange</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N</span><span style=\"color:#a13865;--shiki-dark:#d9739f\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Regularization loss: L2 penalty on the weights</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">reg </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0.001</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">np</span><span 
style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sum</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">**</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sum</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W2</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">**</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">total_loss </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> loss </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> reg</span></span></code></pre>\n<h3 id=\"第三步反向传播-backward-pass\"><a class=\"anchor\" href=\"#第三步反向传播-backward-pass\">#</a> Step 3: Backward pass</h3>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Gradient of the softmax loss w.r.t. the scores</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dscores </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> probs</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">copy</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dscores</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">arange</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> -=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dscores </span><span style=\"color:#999999;--shiki-dark:#666666\">/=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> N</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Gradients for W2 and b2</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dW2 </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> h_relu</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">T </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dscores </span><span 
style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 2</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0.001</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W2  </span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># includes the L2 regularization gradient</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">db2 </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sum</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dscores</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> axis</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Backprop through ReLU: the gradient flows only to active units</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dh </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dscores </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W2</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">T</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dh</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">h </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">   # ReLU backward: zero gradient wherever h ≤ 0</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Gradients for W1 and b1</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dW1 </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">T </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dh </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 2</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0.001</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W1</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">db1 </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sum</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dh</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span 
style=\"color:#B07D48;--shiki-dark:#BD976A\"> axis</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span></code></pre>\n<h3 id=\"第四步梯度下降更新\"><a class=\"anchor\" href=\"#第四步梯度下降更新\">#</a> Step 4: Gradient descent update</h3>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">learning_rate </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1e-3</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W1 </span><span style=\"color:#999999;--shiki-dark:#666666\">-=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW1</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">b1 </span><span style=\"color:#999999;--shiki-dark:#666666\">-=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> db1</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W2 </span><span style=\"color:#999999;--shiki-dark:#666666\">-=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW2</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">b2 </span><span style=\"color:#999999;--shiki-dark:#666666\">-=</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> db2</span></span></code></pre>\n<hr />\n<h2 id=\"模型容量与正则化\"><a class=\"anchor\" href=\"#模型容量与正则化\">#</a> Model Capacity and Regularization</h2>\n<p><strong>More neurons = more representational capacity = more complex decision boundaries</strong>. Without regularization, though, excess capacity makes overfitting likely: the model memorizes noise in the training data instead of the underlying pattern.</p>\n<p>In practice, the recommended approach is:</p>\n<blockquote>\n<p><strong>Tune the regularization hyperparameters first, rather than shrinking the model.</strong></p>\n</blockquote>\n<p>A network with ample capacity and proper regularization usually outperforms one whose capacity is only just sufficient. The regularization strength <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>λ</mi></mrow><annotation encoding=\"application/x-tex\">\\lambda</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">λ</span></span></span></span> is the key knob:</p>\n<ul>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>λ</mi></mrow><annotation encoding=\"application/x-tex\">\\lambda</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">λ</span></span></span></span> too large → weights are over-penalized → underfitting</li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>λ</mi></mrow><annotation encoding=\"application/x-tex\">\\lambda</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">λ</span></span></span></span> too small → the model has too much freedom → overfitting</li>\n<li>Find the best value by tuning on a validation set</li>\n</ul>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/nn1/layer_sizes.jpeg\" alt=\"Decision boundaries for different numbers of hidden neurons\" 
/></p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/nn1/reg_strengths.jpeg\" alt=\"不同正则化强度下的决策边界\" /></p>\n<p>损失函数的完整组成（以双层网络为例）：</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>得分: </mtext><mi>s</mi><mo>=</mo><msub><mi>W</mi><mn>2</mn></msub><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mn>0</mn><mo separator=\"true\">,</mo><msub><mi>W</mi><mn>1</mn></msub><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\text{得分: } s = W_2 \\max(0, W_1 x)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord text\"><span class=\"mord cjk_fallback\">得分</span><span class=\"mord\">: </span></span><span class=\"mord mathnormal\">s</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord\">0</span><span 
class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>数据损失（Hinge）: </mtext><msub><mi>L</mi><mi>i</mi></msub><mo>=</mo><munder><mo>∑</mo><mrow><mi>j</mi><mo mathvariant=\"normal\">≠</mo><msub><mi>y</mi><mi>i</mi></msub></mrow></munder><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mn>0</mn><mo separator=\"true\">,</mo><msub><mi>s</mi><mi>j</mi></msub><mo>−</mo><msub><mi>s</mi><msub><mi>y</mi><mi>i</mi></msub></msub><mo>+</mo><mn>1</mn><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\text{数据损失（Hinge）: } L_i = \\sum_{j \\neq y_i} \\max(0, s_j - s_{y_i} + 1)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8778em;vertical-align:-0.1944em;\"></span><span class=\"mord text\"><span class=\"mord cjk_fallback\">数据损失（</span><span class=\"mord\">Hinge</span><span class=\"mord cjk_fallback\">）</span><span class=\"mord\">: </span></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span 
class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4882em;vertical-align:-1.4382em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8479em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span><span class=\"mrel mtight\"><span class=\"mrel mtight\"><span class=\"mord vbox mtight\"><span class=\"thinbox mtight\"><span class=\"rlap mtight\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"inner\"><span class=\"mord mtight\"><span class=\"mrel mtight\"></span></span></span><span class=\"fix\"></span></span></span></span></span><span class=\"mrel mtight\">=</span></span><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:-0.0359em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span 
class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4382em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8694em;vertical-align:-0.2861em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span 
class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:-0.0359em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>正则化（L2）: </mtext><mi>R</mi><mo stretchy=\"false\">(</mo><mi>W</mi><mo stretchy=\"false\">)</mo><mo>=</mo><munder><mo>∑</mo><mi>k</mi></munder><munder><mo>∑</mo><mi>l</mi></munder><msubsup><mi>W</mi><mrow><mi>k</mi><mo separator=\"true\">,</mo><mi>l</mi></mrow><mn>2</mn></msubsup></mrow><annotation encoding=\"application/x-tex\">\\text{正则化（L2）: } R(W) = \\sum_k \\sum_l W_{k,l}^2\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span 
class=\"mord text\"><span class=\"mord cjk_fallback\">正则化（</span><span class=\"mord\">L2</span><span class=\"mord cjk_fallback\">）</span><span class=\"mord\">: </span></span><span class=\"mord mathnormal\" style=\"margin-right:0.00773em;\">R</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3521em;vertical-align:-1.3021em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8479em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3021em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8479em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3021em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8641em;\"><span style=\"top:-2.453em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3831em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>总损失: </mtext><mi>L</mi><mo>=</mo><mfrac><mn>1</mn><mi>N</mi></mfrac><munder><mo>∑</mo><mi>i</mi></munder><msub><mi>L</mi><mi>i</mi></msub><mo>+</mo><mi>λ</mi><mi>R</mi><mo stretchy=\"false\">(</mo><msub><mi>W</mi><mn>1</mn></msub><mo stretchy=\"false\">)</mo><mo>+</mo><mi>λ</mi><mi>R</mi><mo stretchy=\"false\">(</mo><msub><mi>W</mi><mn>2</mn></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\text{总损失: } L = \\frac{1}{N} \\sum_i L_i + \\lambda R(W_1) + \\lambda R(W_2)\n</annotation></semantics></math></span><span class=\"katex-html\" 
aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord text\"><span class=\"mord cjk_fallback\">总损失</span><span class=\"mord\">: </span></span><span class=\"mord mathnormal\">L</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.5991em;vertical-align:-1.2777em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">λ</span><span class=\"mord mathnormal\" style=\"margin-right:0.00773em;\">R</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">λ</span><span class=\"mord mathnormal\" style=\"margin-right:0.00773em;\">R</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span></span></p>\n<hr />\n<h2 id=\"计算图与反向传播\"><a class=\"anchor\" href=\"#计算图与反向传播\">#</a> Computational Graphs and Backpropagation</h2>\n<h3 id=\"什么是计算图\"><a class=\"anchor\" href=\"#什么是计算图\">#</a> What Is a Computational Graph?</h3>\n<p>A computational graph is a <strong>directed acyclic graph (DAG)</strong>: nodes are operations (addition, multiplication, max, sigmoid, and so on), and edges carry the data (scalars, vectors, matrices) that flows between them. With a computational graph we can apply the chain rule systematically and compute the gradient of an arbitrarily complex function automatically. This is the core idea behind automatic differentiation in frameworks such as PyTorch and JAX.</p>\n<p>The core mechanism of backpropagation is very simple: each node receives the gradient passed in from upstream, multiplies it by its own <strong>local gradient</strong> (the derivative of its output with respect to its input), and passes the result downstream. The computation is entirely local; each node works independently of the rest of the graph.</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>下游梯度</mtext><mo>=</mo><mtext>上游梯度</mtext><mo>×</mo><mtext>局部梯度</mtext></mrow><annotation encoding=\"application/x-tex\">\\text{下游梯度} = \\text{上游梯度} \\times \\text{局部梯度}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord text\"><span class=\"mord cjk_fallback\">下游梯度</span></span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord text\"><span class=\"mord cjk_fallback\">上游梯度</span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord text\"><span class=\"mord cjk_fallback\">局部梯度</span></span></span></span></span></span></p>\n<p>Backpropagation on a computational graph is essentially a recursive application of the chain rule. Depending on the dimensions of the input <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi></mrow><annotation encoding=\"application/x-tex\">x</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">x</span></span></span></span> and the output <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>y</mi></mrow><annotation encoding=\"application/x-tex\">y</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span></span></span></span>, there are three cases: when both are scalars, the derivative is a scalar; when <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi></mrow><annotation encoding=\"application/x-tex\">x</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">x</span></span></span></span> is a vector and <span class=\"katex\"><span 
class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>y</mi></mrow><annotation encoding=\"application/x-tex\">y</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span></span></span></span> is a scalar, the derivative is a vector (the gradient); and when <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi></mrow><annotation encoding=\"application/x-tex\">x</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">x</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>y</mi></mrow><annotation encoding=\"application/x-tex\">y</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span></span></span></span> are both vectors, the derivative is a Jacobian matrix.</p>\n<hr />\n<h3 id=\"x-是标量y-是标量-导数也是标量\"><a class=\"anchor\" href=\"#x-是标量y-是标量-导数也是标量\">#</a> x is a scalar, y is a scalar → the derivative is also a scalar</h3>\n<p><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi></mrow><annotation encoding=\"application/x-tex\">x</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">x</span></span></span></span> 和 <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>y</mi></mrow><annotation encoding=\"application/x-tex\">y</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span></span></span></span> 以及所有中间变量都是标量，导数就是普通的 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi>d</mi><mi>y</mi></mrow><mrow><mi>d</mi><mi>x</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{dy}{dx}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2772em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9322em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">d</span><span class=\"mord mathnormal mtight\">x</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.4461em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">d</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose 
nulldelimiter\"></span></span></span></span></span>。最基础的情形。</p>\n<p>经典例子 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>f</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo separator=\"true\">,</mo><mi>y</mi><mo separator=\"true\">,</mo><mi>z</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mo stretchy=\"false\">(</mo><mi>x</mi><mo>+</mo><mi>y</mi><mo stretchy=\"false\">)</mo><mo>⋅</mo><mi>z</mi></mrow><annotation encoding=\"application/x-tex\">f(x, y, z) = (x + y) \\cdot z</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.04398em;\">z</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span 
class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.04398em;\">z</span></span></span></span>，设 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi><mo>=</mo><mo>−</mo><mn>2</mn><mo separator=\"true\">,</mo><mi>y</mi><mo>=</mo><mn>5</mn><mo separator=\"true\">,</mo><mi>z</mi><mo>=</mo><mo>−</mo><mn>4</mn></mrow><annotation encoding=\"application/x-tex\">x=-2, y=5, z=-4</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em;\"></span><span class=\"mord\">−</span><span class=\"mord\">2</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em;\"></span><span class=\"mord\">5</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.04398em;\">z</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">−</span><span class=\"mord\">4</span></span></span></span>：</p>\n<p>Forward pass: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>q</mi><mo>=</mo><mi>x</mi><mo>+</mo><mi>y</mi><mo>=</mo><mn>3</mn></mrow><annotation encoding=\"application/x-tex\">q = x + y = 3</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">3</span></span></span></span>, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>f</mi><mo>=</mo><mi>q</mi><mo>⋅</mo><mi>z</mi><mo>=</mo><mo>−</mo><mn>12</mn></mrow><annotation encoding=\"application/x-tex\">f = q \\cdot z = -12</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" 
style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.04398em;\">z</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">−</span><span class=\"mord\">12</span></span></span></span></p>\n<p>Backward pass (starting from <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>f</mi></mrow></mfrac><mo>=</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial f}{\\partial f}=1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.4133em;vertical-align:-0.4811em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9322em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 
mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.4461em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4811em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span>):</p>\n<table>\n<thead>\n<tr>\n<th>Step</th>\n<th>Computation</th>\n<th>Result</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>z</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial f}{\\partial z}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2772em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.9322em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.04398em;\">z</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.4461em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>q</mi><mo>×</mo><mn>1</mn><mo>=</mo><mn>3</mn></mrow><annotation encoding=\"application/x-tex\">q \\times 1 = 3</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7778em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">q</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">3</span></span></span></span></td>\n<td><strong>3</strong></td>\n</tr>\n<tr>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>q</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial f}{\\partial q}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.4133em;vertical-align:-0.4811em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9322em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">q</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.4461em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4811em;\"><span></span></span></span></span></span><span class=\"mclose 
nulldelimiter\"></span></span></span></span></span></td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>z</mi><mo>×</mo><mn>1</mn><mo>=</mo><mo>−</mo><mn>4</mn></mrow><annotation encoding=\"application/x-tex\">z \\times 1 = -4</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.04398em;\">z</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">−</span><span class=\"mord\">4</span></span></span></span></td>\n<td><strong>-4</strong></td>\n</tr>\n<tr>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>x</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial f}{\\partial x}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2772em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9322em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">x</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.4461em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></td>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>q</mi></mrow></mfrac><mo>⋅</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>q</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>x</mi></mrow></mfrac><mo>=</mo><mo>−</mo><mn>4</mn><mo>×</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial f}{\\partial q} \\cdot \\frac{\\partial q}{\\partial x} = -4 \\times 1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.4133em;vertical-align:-0.4811em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9322em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">q</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.4461em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4811em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.2772em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9322em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">x</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.4461em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing 
reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">q</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">−</span><span class=\"mord\">4</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1</span></span></span></span></td>\n<td><strong>-4</strong></td>\n</tr>\n<tr>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>f</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>y</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial f}{\\partial y}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.4133em;vertical-align:-0.4811em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9322em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" 
style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.4461em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4811em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></td>\n<td>Likewise</td>\n<td><strong>-4</strong></td>\n</tr>\n</tbody>\n</table>\n<p>Even in this simplest example the basic gates start to show their own &quot;personalities&quot;. Add and multiply appear above; the max gate is listed alongside them for completeness:</p>\n<ul>\n<li><strong>Add gate</strong>: a <strong>distributor</strong>. The upstream gradient is passed unchanged to both inputs (the local gradient is always 1)</li>\n<li><strong>Multiply gate</strong>: a <strong>swapper</strong>. <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mo stretchy=\"false\">(</mo><mi>a</mi><mi>b</mi><mo stretchy=\"false\">)</mo></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>a</mi></mrow></mfrac><mo>=</mo><mi>b</mi></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial (ab)}{\\partial a} = b</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.355em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.01em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">a</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.485em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mopen mtight\">(</span><span class=\"mord mathnormal mtight\">ab</span><span class=\"mclose mtight\">)</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">b</span></span></span></span>, so the upstream gradient is multiplied by the value of the other input. If one input of a multiply gate is very small and the other very large, the gradients become badly unbalanced: the small input receives a very large gradient and the large input a very small one. This is a key reason why <strong>data preprocessing affects training stability</strong></li>\n<li><strong>Max gate</strong>: a <strong>router</strong>. The gradient flows only to the input that had the larger value in the forward pass; the other input receives 0</li>\n</ul>\n<p>When the same variable is used more than once (that is, the computational graph branches), the gradients arriving along the different paths must be <strong>accumulated</strong> (<code>+=</code> rather than <code>=</code>); this follows directly from the multivariable chain rule.</p>\n<p>The sigmoid function <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>σ</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mfrac><mn>1</mn><mrow><mn>1</mn><mo>+</mo><msup><mi>e</mi><mrow><mo>−</mo><mi>x</mi></mrow></msup></mrow></mfrac></mrow><annotation 
encoding=\"application/x-tex\">\\sigma(x) = \\frac{1}{1+e^{-x}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.2484em;vertical-align:-0.4033em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8451em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">1</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7027em;\"><span style=\"top:-2.786em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">−</span><span class=\"mord mathnormal mtight\">x</span></span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord 
mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4033em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span> can be broken down into several basic gates (multiply by −1 → exp → add 1 → reciprocal), but its local gradient has a compact closed form:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi>d</mi><mi>σ</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><mrow><mi>d</mi><mi>x</mi></mrow></mfrac><mo>=</mo><mo stretchy=\"false\">(</mo><mn>1</mn><mo>−</mo><mi>σ</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo><mo>⋅</mo><mi>σ</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\frac{d\\sigma(x)}{dx} = (1 - \\sigma(x)) \\cdot \\sigma(x)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.113em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.427em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">d</span><span class=\"mord mathnormal\">x</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">d</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span 
class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">))</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">σ</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>Packing the whole sigmoid into a single &quot;compound gate&quot; greatly simplifies the computational graph:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">import</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> math</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Example inputs (assumed for illustration); w[2] is the bias term</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">w = [2.0, -3.0, -3.0]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">x = [-1.0, -2.0]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Forward pass</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dot </span><span 
style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">x</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">x</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">f </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1.0</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> /</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> math</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">exp</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dot</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">   # sigmoid as a single compound gate</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Backward pass: the closed-form local gradient gives ddot in one line</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">ddot </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> f</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> f               </span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># σ'(x) = (1-σ)σ</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dx </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">w</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> ddot</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> w</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> ddot</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dw </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">x</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> ddot</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> ddot</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1.0</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> ddot</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span></code></pre>\n<hr />\n<h3 id=\"x-是向量y-是标量-导数是向量\"><a class=\"anchor\" 
href=\"#x-是向量y-是标量-导数是向量\">#</a> x is a vector, y is a scalar → the derivative is a vector</h3>\n<p>This is the most common case in real neural networks: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi></mrow><annotation encoding=\"application/x-tex\">x</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">x</span></span></span></span> is a vector (for example, a hidden-layer activation <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>h</mi><mo>∈</mo><msup><mi mathvariant=\"double-struck\">R</mi><mi>H</mi></msup></mrow><annotation encoding=\"application/x-tex\">h \\in \\mathbb{R}^H</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7335em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\">h</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8413em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8413em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.08125em;\">H</span></span></span></span></span></span></span></span></span></span></span>), and the loss <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">L</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span 
class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">L</span></span></span></span> is a scalar. In this case <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>x</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial L}{\\partial x}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2251em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8801em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">x</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span> is a vector with the same shape as <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi></mrow><annotation 
encoding=\"application/x-tex\">x</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">x</span></span></span></span>, and each component <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>x</mi><mi>i</mi></msub></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial L}{\\partial x_i}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.3252em;vertical-align:-0.4451em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8801em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" 
style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4451em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span> measures how a change in <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>x</mi><mi>i</mi></msub></mrow><annotation encoding=\"application/x-tex\">x_i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> affects the final loss.</p>\n<p><strong>Key principle</strong>: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>v</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial L}{\\partial v}</annotation></semantics></math></span><span 
class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2251em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8801em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">v</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span> 的形状永远与 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>v</mi></mrow><annotation encoding=\"application/x-tex\">v</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span></span></span></span> 完全相同。这是向量化反向传播的第一性原理，后面所有推导都以此为出发点。</p>\n<p>对于 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>y</mi><mo>=</mo><mi>f</mi><mo 
stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">y = f(x)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span>，<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi><mo>∈</mo><msup><mi mathvariant=\"double-struck\">R</mi><mi>n</mi></msup><mo separator=\"true\">,</mo><mi>y</mi><mo>∈</mo><msup><mi mathvariant=\"double-struck\">R</mi><mi>m</mi></msup></mrow><annotation encoding=\"application/x-tex\">x \\in \\mathbb{R}^n, y \\in \\mathbb{R}^m</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5782em;vertical-align:-0.0391em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8833em;vertical-align:-0.1944em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6644em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6889em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6644em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">m</span></span></span></span></span></span></span></span></span></span></span>，<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">L</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">L</span></span></span></span> 为标量：</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>x</mi></mrow></mfrac><mo>=</mo><msup><mrow><mo fence=\"true\">(</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>y</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>x</mi></mrow></mfrac><mo fence=\"true\">)</mo></mrow><mi>T</mi></msup><mo>⋅</mo><mfrac><mrow><mi 
mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>y</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial L}{\\partial x} = \\left(\\frac{\\partial y}{\\partial x}\\right)^T \\cdot \\frac{\\partial L}{\\partial y}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.0574em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\">x</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\">L</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.6313em;vertical-align:-0.95em;\"></span><span class=\"minner\"><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t 
vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\">x</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size3\">)</span></span></span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6812em;\"><span style=\"top:-3.9029em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">T</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.2519em;vertical-align:-0.8804em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span 
class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\">L</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8804em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>Here <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>y</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>x</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial y}{\\partial x}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2772em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9322em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">x</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.4461em;\"><span 
class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span> is the Jacobian matrix (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>m</mi><mo>×</mo><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">m \\times n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\">m</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span>), and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>y</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial L}{\\partial y}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.3612em;vertical-align:-0.4811em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8801em;\"><span 
style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4811em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span> is the gradient vector (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>m</mi></mrow><annotation encoding=\"application/x-tex\">m</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">m</span></span></span></span>-dimensional). But <strong>in practice we almost never construct the Jacobian explicitly</strong>: a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1000</mn><mo>×</mo><mn>1000</mn></mrow><annotation encoding=\"application/x-tex\">1000 \\times 1000</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1000</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span 
class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">1000</span></span></span></span> Jacobian already has one million entries, and real networks have far larger dimensions.</p>\n<p>Take ReLU as an example: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>y</mi><mo>=</mo><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mn>0</mn><mo separator=\"true\">,</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">y = \\max(0, x)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span>, with <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi><mo separator=\"true\">,</mo><mi>y</mi><mo>∈</mo><msup><mi mathvariant=\"double-struck\">R</mi><mi>n</mi></msup></mrow><annotation encoding=\"application/x-tex\">x, y \\in \\mathbb{R}^n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7335em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mpunct\">,</span><span 
class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6889em;\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6644em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span></span></span></span></span></span></span></span>. Its Jacobian is an <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi><mo>×</mo><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n \\times n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\">n</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span> <strong>diagonal matrix</strong>: each diagonal entry is either 0 (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>x</mi><mi>i</mi></msub><mo>≤</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">x_i \\leq 0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" 
style=\"height:0.786em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≤</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span>) or 1 (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>x</mi><mi>i</mi></msub><mo>&gt;</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">x_i &gt; 0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6891em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&gt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span>), and every off-diagonal entry is 0. Backpropagation therefore reduces to an element-wise conditional:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dh </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dscores </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W2</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">T      </span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># upstream gradient w.r.t. h, shape [N, H]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dh</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">h </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">           # gradient flows only to activated neurons; no Jacobian matrix is built</span></span></code></pre>\n<hr />\n<h3 id=\"x-是向量y-是向量-导数是雅可比矩阵重点\"><a class=\"anchor\" href=\"#x-是向量y-是向量-导数是雅可比矩阵重点\">#</a> x is a vector, y is a vector → the derivative is a Jacobian matrix (key point)</h3>\n<p>When <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi></mrow><annotation encoding=\"application/x-tex\">x</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">x</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>y</mi></mrow><annotation encoding=\"application/x-tex\">y</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span></span></span></span> are both vectors, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>y</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>x</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial y}{\\partial x}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2772em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9322em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">x</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span 
style=\"top:-3.4461em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span> becomes a matrix, the Jacobian, of shape <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>m</mi><mo>×</mo><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">m \\times n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\">m</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span> (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>m</mi></mrow><annotation encoding=\"application/x-tex\">m</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">m</span></span></span></span> is the dimension of <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>y</mi></mrow><annotation encoding=\"application/x-tex\">y</annotation></semantics></math></span><span class=\"katex-html\" 
aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span></span></span></span>, and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">n</span></span></span></span> is the dimension of <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi></mrow><annotation encoding=\"application/x-tex\">x</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">x</span></span></span></span>). In practice, however, we almost never construct it explicitly; we exploit its sparsity to avoid doing so.</p>\n<p>Take the fully connected layer <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>Y</mi><mo>=</mo><mi>X</mi><mo>⋅</mo><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">Y = X \\cdot W</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span> as an example: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>X</mi></mrow><annotation encoding=\"application/x-tex\">X</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span></span></span></span> has shape <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">[</mo><mi>N</mi><mo separator=\"true\">,</mo><mi>D</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[N, D]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mclose\">]</span></span></span></span>, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">W</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span> has shape <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">[</mo><mi>D</mi><mo separator=\"true\">,</mo><mi>M</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[D, M]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mclose\">]</span></span></span></span>, and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>Y</mi></mrow><annotation encoding=\"application/x-tex\">Y</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span></span></span></span> has shape <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">[</mo><mi>N</mi><mo separator=\"true\">,</mo><mi>M</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[N, M]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mclose\">]</span></span></span></span>:</p>\n<ul>\n<li><span 
class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>X</mi></mrow><annotation encoding=\"application/x-tex\">X</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span></span></span></span>：<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">[</mo><mi>N</mi><mo separator=\"true\">,</mo><mi>D</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[N, D]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mclose\">]</span></span></span></span>，<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">W</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span>：<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">[</mo><mi>D</mi><mo separator=\"true\">,</mo><mi>M</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[D, M]</annotation></semantics></math></span><span 
class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mclose\">]</span></span></span></span>，<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>Y</mi></mrow><annotation encoding=\"application/x-tex\">Y</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span></span></span></span>：<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">[</mo><mi>N</mi><mo separator=\"true\">,</mo><mi>M</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[N, M]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mclose\">]</span></span></span></span></li>\n<li>已知上游梯度 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>Y</mi></mrow></mfrac></mrow><annotation 
encoding=\"application/x-tex\">\\frac{\\partial L}{\\partial Y}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2251em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8801em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">Y</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span>，形状也是 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">[</mo><mi>N</mi><mo separator=\"true\">,</mo><mi>M</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[N, M]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" 
style=\"margin-right:0.10903em;\">N</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mclose\">]</span></span></span></span>（与 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>Y</mi></mrow><annotation encoding=\"application/x-tex\">Y</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span></span></span></span> 一致）</li>\n</ul>\n<p>不需要死记公式。<strong>维度分析法</strong>四步就能推出来：</p>\n<p><strong>推导 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>W</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial L}{\\partial W}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2251em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8801em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">W</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span 
style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></strong>：目标形状 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">[</mo><mi>D</mi><mo separator=\"true\">,</mo><mi>M</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[D, M]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mclose\">]</span></span></span></span>（与 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">W</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span> 一致）。手上有的矩阵：<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>X</mi><mo stretchy=\"false\">[</mo><mi>N</mi><mo separator=\"true\">,</mo><mi>D</mi><mo 
stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">X [N, D]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mclose\">]</span></span></span></span> 和 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>Y</mi></mrow></mfrac><mo stretchy=\"false\">[</mo><mi>N</mi><mo separator=\"true\">,</mo><mi>M</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial L}{\\partial Y} [N, M]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2251em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8801em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">Y</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span 
style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mclose\">]</span></span></span></span>。唯一能拼出 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">[</mo><mi>D</mi><mo separator=\"true\">,</mo><mi>M</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[D, M]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mclose\">]</span></span></span></span> 的组合：<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msup><mi>X</mi><mi>T</mi></msup><mo stretchy=\"false\">[</mo><mi>D</mi><mo>×</mo><mi>N</mi><mo stretchy=\"false\">]</mo><mo>⋅</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi 
mathvariant=\"normal\">∂</mi><mi>Y</mi></mrow></mfrac><mo stretchy=\"false\">[</mo><mi>N</mi><mo>×</mo><mi>M</mi><mo stretchy=\"false\">]</mo><mo>=</mo><mo stretchy=\"false\">[</mo><mi>D</mi><mo>×</mo><mi>M</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">X^T [D \\times N] \\cdot \\frac{\\partial L}{\\partial Y} [N \\times M] = [D \\times M]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.0913em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8413em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">T</span></span></span></span></span></span></span></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mclose\">]</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.2251em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8801em;\"><span 
style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">Y</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mclose\">]</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span 
class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mclose\">]</span></span></span></span>。</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>W</mi></mrow></mfrac><mo>=</mo><msup><mi>X</mi><mi>T</mi></msup><mo>⋅</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>Y</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial L}{\\partial W} = X^T \\cdot \\frac{\\partial L}{\\partial Y}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.0574em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\">L</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8913em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8913em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">T</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0574em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\">L</span></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p><strong>推导 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>X</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial L}{\\partial X}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2251em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8801em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.07847em;\">X</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></strong>：目标形状 <span class=\"katex\"><span 
class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">[</mo><mi>N</mi><mo separator=\"true\">,</mo><mi>D</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">[N, D]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mclose\">]</span></span></span></span>。<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>Y</mi></mrow></mfrac><mo stretchy=\"false\">[</mo><mi>N</mi><mo>×</mo><mi>M</mi><mo stretchy=\"false\">]</mo><mo>⋅</mo><msup><mi>W</mi><mi>T</mi></msup><mo stretchy=\"false\">[</mo><mi>M</mi><mo>×</mo><mi>D</mi><mo stretchy=\"false\">]</mo><mo>=</mo><mo stretchy=\"false\">[</mo><mi>N</mi><mo>×</mo><mi>D</mi><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial L}{\\partial Y} [N \\times M] \\cdot W^T [M \\times D] = [N \\times D]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2251em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8801em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord 
mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em;\">Y</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mclose\">]</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.0913em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8413em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.13889em;\">T</span></span></span></span></span></span></span></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">M</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mclose\">]</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mclose\">]</span></span></span></span>。</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>X</mi></mrow></mfrac><mo>=</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><mi>Y</mi></mrow></mfrac><mo>⋅</mo><msup><mi>W</mi><mi>T</mi></msup></mrow><annotation encoding=\"application/x-tex\">\\frac{\\partial L}{\\partial X} = \\frac{\\partial L}{\\partial Y} \\cdot W^T\n</annotation></semantics></math></span><span class=\"katex-html\" 
aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.0574em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\">L</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0574em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" 
style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal\">L</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8913em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8913em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">T</span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>In code, this takes just two lines:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dW </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">T </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dY    </span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># [D, N] @ [N, M] 
= [D, M]  ✓</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dX </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dY </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">@</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">T    </span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># [N, M] @ [M, D] = [N, D]  ✓</span></span></code></pre>\n<p>Look at these two formulas: they are the matrix-level counterpart of the multiply gate's &quot;swap the inputs&quot; behavior. The gradient flowing to <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">W</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span> is multiplied by <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msup><mi>X</mi><mi>T</mi></msup></mrow><annotation encoding=\"application/x-tex\">X^T</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8413em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8413em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">T</span></span></span></span></span></span></span></span></span></span></span>, and the gradient flowing to <span 
class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>X</mi></mrow><annotation encoding=\"application/x-tex\">X</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span></span></span></span> is multiplied by <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msup><mi>W</mi><mi>T</mi></msup></mrow><annotation encoding=\"application/x-tex\">W^T</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8413em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8413em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">T</span></span></span></span></span></span></span></span></span></span></span>. At no point is <strong>a full Jacobian ever constructed</strong> (its size would be <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">(</mo><mi>N</mi><mi>M</mi><mo stretchy=\"false\">)</mo><mo>×</mo><mo stretchy=\"false\">(</mo><mi>N</mi><mi>D</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">(NM) \\times (ND)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" 
style=\"margin-right:0.10903em;\">NM</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"mclose\">)</span></span></span></span>, far too large to fit in memory), which is a key reason deep networks with tens of millions of parameters can be trained efficiently.</p>\n<p>This shape-analysis trick applies very broadly: fully connected layers, convolutional layers, attention layers, einsum, and so on. For the backward pass of any tensor operation, remembering that &quot;gradient shape = variable shape&quot; and matching dimensions backward is usually enough to recover the correct expression.</p>\n<hr />\n<h2 id=\"实践要点\"><a class=\"anchor\" href=\"#实践要点\">#</a> Practical Takeaways</h2>\n<ol>\n<li><strong>Compute in stages</strong>: break a complex function into simple intermediate variables, differentiate each one on its own, and combine the results with the chain rule</li>\n<li><strong>Cache forward values</strong>: the backward pass needs intermediate results from the forward pass, so do not recompute them</li>\n<li><strong>Use <code>+=</code> at branches</strong>: when the same variable is used more than once, its gradients must be accumulated</li>\n<li><strong>Gradient shape = variable shape</strong>: the starting point for deriving any vectorized backward pass</li>\n<li><strong>Run a numerical gradient check first, then switch to the analytic gradient</strong>: while debugging, verify the implementation with finite differences</li>\n</ol>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">class</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\"> MultiplyGate</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> forward</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">self</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#A65E2B;--shiki-dark:#C99076\">        self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">x</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\"> self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">y </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y  </span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># cache inputs for the backward pass</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> backward</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">self</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dout</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        return</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\"> self</span><span 
style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">y </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dout</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\"> self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">x </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dout  </span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># swap the inputs, then multiply by the upstream gradient</span></span></code></pre>\n<p>Once every operation implements <code>forward()</code> and <code>backward()</code>, modules can be stacked like building blocks into arbitrarily complex networks, which is the design philosophy behind frameworks such as PyTorch.</p>\n<hr />\n<h2 id=\"声明\"><a class=\"anchor\" href=\"#声明\">#</a> Notice</h2>\n<p>This blog post was put together by Yumengmeng from the video lectures of <a href=\"https://www.bilibili.com/video/BV1YJ3PzLEiW?spm_id_from=333.788.videopod.episodes&amp;vd_source=9f80ac68a038439c43f542a83ffa7b69&amp;p=3\">Fei-Fei Li's Stanford CS231n computer vision course (Spring 2025)</a>, combined with open-source notes collected from the web with Claude Code, then polished and typeset. It is intended for personal review only.</p>\n",
            "tags": [
                "CS231n学习笔记",
                "CS231n",
                "计算机视觉",
                "深度学习",
                "神经网络",
                "反向传播"
            ]
        },
        {
            "id": "https://yumengmeng.cn/2026/05/31/CS231n%E2%80%94%E2%80%94lecture3%E6%AD%A3%E5%88%99%E5%8C%96%E4%B8%8E%E4%BC%98%E5%8C%96/index/",
            "url": "https://yumengmeng.cn/2026/05/31/CS231n%E2%80%94%E2%80%94lecture3%E6%AD%A3%E5%88%99%E5%8C%96%E4%B8%8E%E4%BC%98%E5%8C%96/index/",
            "title": "CS231n: Lecture 3 Regularization and Optimization",
            "date_published": "2026-05-31T07:27:56.000Z",
            "content_html": "<h2 id=\"正则化-regularization\"><a class=\"anchor\" href=\"#正则化-regularization\">#</a> Regularization</h2>\n<h3 id=\"为什么需要正则化\"><a class=\"anchor\" href=\"#为什么需要正则化\">#</a> Why Do We Need Regularization?</h3>\n<p>In Lecture 2 we defined the full loss function:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>L</mi><mo stretchy=\"false\">(</mo><mi>W</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mfrac><mn>1</mn><mi>N</mi></mfrac><munder><mo>∑</mo><mi>i</mi></munder><msub><mi>L</mi><mi>i</mi></msub><mo stretchy=\"false\">(</mo><mi>f</mi><mo stretchy=\"false\">(</mo><msub><mi>x</mi><mi>i</mi></msub><mo separator=\"true\">,</mo><mi>W</mi><mo stretchy=\"false\">)</mo><mo separator=\"true\">,</mo><msub><mi>y</mi><mi>i</mi></msub><mo stretchy=\"false\">)</mo><mo>+</mo><mi>λ</mi><mi>R</mi><mo stretchy=\"false\">(</mo><mi>W</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">L(W) = \\frac{1}{N} \\sum_i L_i(f(x_i, W), y_i) + \\lambda R(W)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">L</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.5991em;vertical-align:-1.2777em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span 
class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span 
class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mclose\">)</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">λ</span><span class=\"mord mathnormal\" 
style=\"margin-right:0.00773em;\">R</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>The first term is the <strong>data loss</strong>, which measures the model's prediction error on the training set; the second is the <strong>regularization term</strong>.</p>\n<p>If we minimize only the data loss, the model tends to <strong>overfit</strong>: it memorizes every detail, noise pattern, and irrelevant feature of the training data, so it does extremely well on the training set (loss near zero) but poorly on unseen test data. The core purpose of regularization is to <strong>keep the model from doing &quot;too well&quot; on the training data</strong> and thereby improve generalization.</p>\n<p>Behind this is <strong>Occam's Razor</strong>: among all hypotheses that explain the same phenomenon, the simplest one is usually the best. By penalizing complex models, regularization steers optimization toward simple solutions that generalize.</p>\n<h3 id=\"正则化的三个核心作用\"><a class=\"anchor\" href=\"#正则化的三个核心作用\">#</a> Three Core Roles of Regularization</h3>\n<ul>\n<li><strong>Express preferences</strong>: among the many W that fit the training data, regularization encodes our prior preference for &quot;what kind of weights are better&quot; (for example, weights that are spread out and not too large)</li>\n<li><strong>Improve generalization</strong>: lower model complexity and reduce overfitting, so the model does better on unseen test data</li>\n<li><strong>Stabilize optimization</strong>: L2 regularization adds quadratic curvature to the loss, making the optimization surface more &quot;bowl-shaped&quot; and easier for gradient descent to navigate</li>\n</ul>\n<p>The hyperparameter <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>λ</mi></mrow><annotation encoding=\"application/x-tex\">\\lambda</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">λ</span></span></span></span> controls the strength of regularization:</p>\n<ul>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>λ</mi></mrow><annotation encoding=\"application/x-tex\">\\lambda</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">λ</span></span></span></span> too large → model too simple → <strong>underfitting</strong> (it cannot even fit the training data)</li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>λ</mi></mrow><annotation encoding=\"application/x-tex\">\\lambda</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">λ</span></span></span></span> too small → model too complex → <strong>overfitting</strong> (it memorizes the noise in the training data)</li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>λ</mi></mrow><annotation encoding=\"application/x-tex\">\\lambda</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">λ</span></span></span></span> has to be tuned by hand on a validation set and is an important hyperparameter to tune during training</li>\n</ul>\n<hr />\n<h3 id=\"l1-正则化lasso\"><a class=\"anchor\" href=\"#l1-正则化lasso\">#</a> L1 Regularization (Lasso)</h3>\n<p><strong>Formula:</strong></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>R</mi><mo stretchy=\"false\">(</mo><mi>W</mi><mo stretchy=\"false\">)</mo><mo>=</mo><munder><mo>∑</mo><mi>k</mi></munder><munder><mo>∑</mo><mi>l</mi></munder><mi mathvariant=\"normal\">∣</mi><msub><mi>W</mi><mrow><mi>k</mi><mo separator=\"true\">,</mo><mi>l</mi></mrow></msub><mi mathvariant=\"normal\">∣</mi></mrow><annotation encoding=\"application/x-tex\">R(W) = \\sum_k \\sum_l |W_{k,l}|\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.00773em;\">R</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mclose\">)</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3521em;vertical-align:-1.3021em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8479em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3021em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8479em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3021em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">∣</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mord\">∣</span></span></span></span></span></p>\n<p>L1 regularization uses the <strong>sum of absolute values</strong> of all weights as the penalty. The L1 penalty grows linearly, so its gradient has the same magnitude no matter how small a weight already is, which leads to a key property: <strong>sparsity</strong>. Optimization tends to push many weights to exactly zero and keep only a few truly important nonzero weights.</p>\n<p><strong>Why does L1 produce sparse solutions?</strong> Consider a simple case: the loss is a quadratic surface and L1 regularization corresponds to a diamond-shaped constraint region. The optimum often lands on a <strong>vertex</strong> of the diamond (on a coordinate axis), where some weights are exactly zero. The L2 constraint region is a circle, so the optimum usually lands at a generic point on its boundary, away from the axes, where the weights are small but nonzero. This is why L1 is a natural fit for <strong>feature selection</strong>.</p>\n<p><strong>Implementation:</strong></p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">import</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> numpy </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">as</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> l1_regularization</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> lambda_reg</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    L1 
regularization loss</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    reg_loss </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> lambda_reg </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sum</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">abs</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> reg_loss</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> l1_gradient</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> lambda_reg</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    L1 
regularization gradient: d(lambda * R)/dW = lambda * sign(W)</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    Note: not differentiable at W = 0; np.sign returns 0 there, which is a valid subgradient</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    dW </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> lambda_reg </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sign</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW</span></span></code></pre>\n<hr />\n<h3 id=\"l2-正则化weight-decay-ridge\"><a class=\"anchor\" href=\"#l2-正则化weight-decay-ridge\">#</a> L2 Regularization (Weight Decay / Ridge)</h3>\n<p><strong>Formula:</strong></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>R</mi><mo stretchy=\"false\">(</mo><mi>W</mi><mo stretchy=\"false\">)</mo><mo>=</mo><munder><mo>∑</mo><mi>k</mi></munder><munder><mo>∑</mo><mi>l</mi></munder><msubsup><mi>W</mi><mrow><mi>k</mi><mo separator=\"true\">,</mo><mi>l</mi></mrow><mn>2</mn></msubsup></mrow><annotation encoding=\"application/x-tex\">R(W) = \\sum_k \\sum_l W_{k,l}^2\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord 
mathnormal\" style=\"margin-right:0.00773em;\">R</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3521em;vertical-align:-1.3021em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8479em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3021em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8479em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3021em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord 
mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8641em;\"><span style=\"top:-2.453em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3831em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p>L2 regularization uses the <strong>sum of squares</strong> of all weights as the penalty term. Because the penalty grows quadratically with a weight's magnitude, optimization tends to keep every weight small and to <strong>spread the weights evenly</strong> across all dimensions.</p>\n<p><strong>An intuitive example</strong>: suppose the input vector is <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi><mo>=</mo><mo stretchy=\"false\">[</mo><mn>1</mn><mo separator=\"true\">,</mo><mn>1</mn><mo separator=\"true\">,</mo><mn>1</mn><mo separator=\"true\">,</mo><mn>1</mn><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">x = [1, 1, 1, 1]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">1</span><span class=\"mclose\">]</span></span></span></span>, and consider two weight vectors:</p>\n<ul>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>w</mi><mn>1</mn></msub><mo>=</mo><mo stretchy=\"false\">[</mo><mn>1</mn><mo separator=\"true\">,</mo><mn>0</mn><mo separator=\"true\">,</mo><mn>0</mn><mo separator=\"true\">,</mo><mn>0</mn><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">w_1 = [1, 0, 0, 0]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">0</span><span class=\"mclose\">]</span></span></span></span>: inner product with x = 1, L2 penalty = 1</li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>w</mi><mn>2</mn></msub><mo>=</mo><mo stretchy=\"false\">[</mo><mn>0.25</mn><mo separator=\"true\">,</mo><mn>0.25</mn><mo separator=\"true\">,</mo><mn>0.25</mn><mo separator=\"true\">,</mo><mn>0.25</mn><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">w_2 = [0.25, 0.25, 0.25, 0.25]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span 
class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">[</span><span class=\"mord\">0.25</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">0.25</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">0.25</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">0.25</span><span class=\"mclose\">]</span></span></span></span>: inner product with x = 1, L2 penalty = 0.25</li>\n</ul>\n<p>Both weight vectors produce the same score (the inner product is 1 in each case), but L2 regularization <strong>strongly prefers <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>w</mi><mn>2</mn></msub></mrow><annotation encoding=\"application/x-tex\">w_2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0269em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></strong>: it draws on every input dimension, so its weights are more evenly spread and each one is smaller. This &quot;don't put all your eggs in one basket&quot; strategy makes the model more robust to input noise.</p>\n<p><strong>Code implementation:</strong></p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span 
class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">import</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> numpy </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">as</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> l2_regularization</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> lambda_reg</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    L2 regularization loss</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    W: weight matrix</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    lambda_reg: regularization strength (hyperparameter)</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    reg_loss </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> lambda_reg </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sum</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> reg_loss</span></span>\n<span 
class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> l2_gradient</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> lambda_reg</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    Gradient of the L2 penalty: d(R)/dW = 2 * lambda * W</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    dW </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 2</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> lambda_reg </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW</span></span></code></pre>\n<hr />\n<h3 id=\"elastic-net\"><a class=\"anchor\" href=\"#elastic-net\">#</a> Elastic Net</h3>\n<p><strong>Formula:</strong></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>R</mi><mo stretchy=\"false\">(</mo><mi>W</mi><mo stretchy=\"false\">)</mo><mo>=</mo><munder><mo>∑</mo><mi>k</mi></munder><munder><mo>∑</mo><mi>l</mi></munder><mrow><mo 
fence=\"true\">(</mo><mi>β</mi><mo>⋅</mo><msubsup><mi>W</mi><mrow><mi>k</mi><mo separator=\"true\">,</mo><mi>l</mi></mrow><mn>2</mn></msubsup><mo>+</mo><mi mathvariant=\"normal\">∣</mi><msub><mi>W</mi><mrow><mi>k</mi><mo separator=\"true\">,</mo><mi>l</mi></mrow></msub><mi mathvariant=\"normal\">∣</mi><mo fence=\"true\">)</mo></mrow></mrow><annotation encoding=\"application/x-tex\">R(W) = \\sum_k \\sum_l \\left(\\beta \\cdot W_{k,l}^2 + |W_{k,l}|\\right)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.00773em;\">R</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3521em;vertical-align:-1.3021em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8479em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3021em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:1.05em;\"><span style=\"top:-1.8479em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3021em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size1\">(</span></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8641em;\"><span style=\"top:-2.453em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3831em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">∣</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mord\">∣</span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size1\">)</span></span></span></span></span></span></span></p>\n<p>Elastic Net is a linear combination of the L1 and L2 penalties that aims to get the benefits of both: the <strong>weight shrinkage and stability</strong> of L2 plus the <strong>sparsity and feature selection</strong> of L1. The parameter <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span> controls the relative strength of the two terms.</p>\n<p>In practical deep learning, plain L1/L2 
weight penalties are used less often on their own (although L2 weight decay remains common) and are frequently supplemented or replaced by the following <strong>structured regularization methods</strong>:</p>\n<ul>\n<li><strong>Dropout</strong>: during training, randomly &quot;drops&quot; (zeroes out) a fraction of the neurons, forcing the network to learn redundant representations and preventing neurons from becoming overly dependent on one another</li>\n<li><strong>Batch Normalization</strong>: normalizes the activations of each layer, which stabilizes their distribution and also has a mild regularizing effect</li>\n<li><strong>Data Augmentation</strong>: applies random flips, crops, color jitter, and similar transforms to the training images, effectively enlarging the training set for free</li>\n<li><strong>Stochastic Depth</strong>: randomly drops entire layers (in practice, residual blocks) during training, forcing gradients to flow through different sub-networks</li>\n</ul>\n<hr />\n<h2 id=\"优化-optimization\"><a class=\"anchor\" href=\"#优化-optimization\">#</a> Optimization</h2>\n<h3 id=\"优化问题概述\"><a class=\"anchor\" href=\"#优化问题概述\">#</a> Overview of the Optimization Problem</h3>\n<p>Once the loss function is fixed, the next question is: <strong>how do we find the weights W that minimize the loss?</strong> This is the optimization problem.</p>\n<p>The loss function can be pictured as a surface over a high-dimensional space (for example, a CIFAR-10 linear classifier has <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>10</mn><mo>×</mo><mn>3072</mn><mo>=</mo><mn>30720</mn></mrow><annotation encoding=\"application/x-tex\">10 \\times 3072 = 30720</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">10</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">3072</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">30720</span></span></span></span> weights, not counting the bias terms), and our goal is to reach the <strong>lowest point</strong> of that surface. Wandering blindly clearly will not work, so let us first see why &quot;random guessing&quot; fails.</p>\n<hr />\n<h3 id=\"策略零随机搜索-random-search反面教材\"><a class=\"anchor\" href=\"#策略零随机搜索-random-search反面教材\">#</a> Strategy Zero: Random Search (a Cautionary Example)</h3>\n<p>The most naive idea: randomly generate many sets of weights, compute the loss for each, and keep the set with the lowest loss.</p>\n<pre 
class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">best_loss </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> float</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">'</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">inf</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">'</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">best_W </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#1E754F;--shiki-dark:#4D9375\"> None</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> _ </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1000</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    W </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">randn</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span 
style=\"color:#2F798A;--shiki-dark:#4C9A91\">10</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 3073</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0.001</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  # random weights; 3073 assumes 3072 pixels + 1 bias column</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    loss </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> compute_loss</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X_train</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y_train</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    if</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> loss </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">&#x3C;</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> best_loss</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        best_loss </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> loss</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        best_W </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W</span></span></code></pre>\n<p>On CIFAR-10, the best result from random search is only <strong>15.5%</strong> accuracy, barely above the 10% expected from guessing among 10 classes, whereas the best methods at the time reached about 
95%. Random search has no sense of direction in a high-dimensional space: it is like throwing a dart at random into the Sahara and hoping to hit one particular grain of sand.</p>\n<p><strong>We need to use the geometry of the loss surface to guide the search direction.</strong> This leads to gradient descent.</p>\n<hr />\n<h2 id=\"梯度下降法-gradient-descent\"><a class=\"anchor\" href=\"#梯度下降法-gradient-descent\">#</a> Gradient Descent</h2>\n<h3 id=\"介绍\"><a class=\"anchor\" href=\"#介绍\">#</a> Introduction</h3>\n<p>Gradient descent is the <strong>cornerstone</strong> of deep learning optimization; the more advanced optimizers that follow (SGD, Momentum, Adam) are all, in essence, variants or refinements of it.</p>\n<h3 id=\"思路\"><a class=\"anchor\" href=\"#思路\">#</a> Intuition</h3>\n<p>Imagine standing in mountainous terrain shrouded in thick fog, unable to see where the valley floor is. The only thing you can sense is the <strong>slope</strong> of the ground under your feet, that is, which direction leads downhill. The strategy of gradient descent is simple: at each step, move a short distance in the currently steepest downhill direction, and repeat until the ground flattens out (the gradient approaches zero).</p>\n<p>The gradient generalizes the derivative to multiple dimensions. It is a <strong>vector</strong> whose components are the partial derivatives of the loss with respect to each parameter. The gradient points in the direction of <strong>steepest increase</strong>, so moving along the <strong>negative gradient</strong> gives the steepest descent.</p>\n<h3 id=\"原理\"><a class=\"anchor\" href=\"#原理\">#</a> Principle</h3>\n<p><strong>One-dimensional case</strong>: the derivative <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mrow><mi>d</mi><mi>f</mi></mrow><mrow><mi>d</mi><mi>x</mi></mrow></mfrac><mo>=</mo><msub><mrow><mi>lim</mi><mo>⁡</mo></mrow><mrow><mi>h</mi><mo>→</mo><mn>0</mn></mrow></msub><mfrac><mrow><mi>f</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo>+</mo><mi>h</mi><mo stretchy=\"false\">)</mo><mo>−</mo><mi>f</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><mi>h</mi></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{df}{dx} = \\lim_{h \\to 0} \\frac{f(x+h) - f(x)}{h}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2772em;vertical-align:-0.345em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9322em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span 
class=\"mord mathnormal mtight\">d</span><span class=\"mord mathnormal mtight\">x</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.4461em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">df</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.355em;vertical-align:-0.345em;\"></span><span class=\"mop\"><span class=\"mop\">lim</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">h</span><span class=\"mrel mtight\">→</span><span class=\"mord mtight\">0</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.01em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">h</span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.485em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen mtight\">(</span><span class=\"mord mathnormal mtight\">x</span><span class=\"mbin mtight\">+</span><span class=\"mord mathnormal mtight\">h</span><span class=\"mclose mtight\">)</span><span class=\"mbin mtight\">−</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen mtight\">(</span><span class=\"mord mathnormal mtight\">x</span><span class=\"mclose mtight\">)</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span> is the instantaneous rate of change of the function at that point.</p>\n<p><strong>Multi-dimensional case</strong>: the gradient <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi mathvariant=\"normal\">∇</mi><mi>W</mi></msub><mi>L</mi><mo>=</mo><mrow><mo fence=\"true\">(</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>w</mi><mn>1</mn></msub></mrow></mfrac><mo separator=\"true\">,</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>w</mi><mn>2</mn></msub></mrow></mfrac><mo separator=\"true\">,</mo><mi mathvariant=\"normal\">.</mi><mi mathvariant=\"normal\">.</mi><mi 
mathvariant=\"normal\">.</mi><mo separator=\"true\">,</mo><mfrac><mrow><mi mathvariant=\"normal\">∂</mi><mi>L</mi></mrow><mrow><mi mathvariant=\"normal\">∂</mi><msub><mi>w</mi><mi>n</mi></msub></mrow></mfrac><mo fence=\"true\">)</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\nabla_W L = \\left(\\frac{\\partial L}{\\partial w_1}, \\frac{\\partial L}{\\partial w_2}, ..., \\frac{\\partial L}{\\partial w_n}\\right)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord\">∇</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">W</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">L</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.8em;vertical-align:-0.65em;\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size2\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8801em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord 
mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3173em;\"><span style=\"top:-2.357em;margin-left:-0.0269em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4451em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8801em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3173em;\"><span style=\"top:-2.357em;margin-left:-0.0269em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4451em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">...</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8801em;\"><span style=\"top:-2.655em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.02691em;\">w</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1645em;\"><span style=\"top:-2.357em;margin-left:-0.0269em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.394em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\" style=\"margin-right:0.05556em;\">∂</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4451em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size2\">)</span></span></span></span></span></span>, which is an n-dimensional vector.</p>\n<p><strong>Update rule:</strong></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>W</mi><mo>←</mo><mi>W</mi><mo>−</mo><mi>α</mi><mo>⋅</mo><msub><mi mathvariant=\"normal\">∇</mi><mi>W</mi></msub><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">W \\leftarrow W - \\alpha \\cdot \\nabla_W L\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" 
style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">←</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4445em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord\">∇</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">W</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">L</span></span></span></span></span></p>\n<p>Here <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>α</mi></mrow><annotation encoding=\"application/x-tex\">\\alpha</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span 
class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span></span> is the <strong>learning rate</strong>; it controls how far each step moves and is one of the most important hyperparameters. With an explicit iteration index t, parameters θ, and objective J, the same update reads:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>θ</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><msub><mi>θ</mi><mi>t</mi></msub><mo>−</mo><mi>α</mi><mo>⋅</mo><msub><mi mathvariant=\"normal\">∇</mi><mi>θ</mi></msub><mi>J</mi><mo stretchy=\"false\">(</mo><msub><mi>θ</mi><mi>t</mi></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\theta_{t+1} = \\theta_t - \\alpha \\cdot \\nabla_\\theta J(\\theta_t)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.9028em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.8444em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4445em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord\">∇</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em;\">θ</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.09618em;\">J</span><span class=\"mopen\">(</span><span class=\"mord\"><span 
class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/stepsize.jpg\" alt=\"Effect of different step sizes (learning rates) on gradient descent\" /></p>\n<h3 id=\"数值梯度-vs-解析梯度\"><a class=\"anchor\" href=\"#数值梯度-vs-解析梯度\">#</a> Numerical Gradient vs. Analytic Gradient</h3>\n<p>There are two ways to compute the gradient:</p>\n<p><strong>Numerical gradient</strong> (approximated from the limit definition of the derivative, here with a centered difference):</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi>d</mi><mi>f</mi></mrow><mrow><mi>d</mi><mi>x</mi></mrow></mfrac><mo>≈</mo><mfrac><mrow><mi>f</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo>+</mo><mi>h</mi><mo stretchy=\"false\">)</mo><mo>−</mo><mi>f</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo>−</mo><mi>h</mi><mo stretchy=\"false\">)</mo></mrow><mrow><mn>2</mn><mi>h</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{df}{dx} \\approx \\frac{f(x+h) - f(x-h)}{2h}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.0574em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.314em;\"><span 
class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">d</span><span class=\"mord mathnormal\">x</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">df</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.113em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.427em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mord mathnormal\">h</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">h</span><span class=\"mclose\">)</span><span 
class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">h</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">import</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> numpy </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">as</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> eval_numerical_gradient</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">f</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> h</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1e-5</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">Compute the numerical gradient of f at x (a float array) using centered differences</span><span 
style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    grad </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros_like</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">x</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    it </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">nditer</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">x</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> flags</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">[</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">'</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">multi_index</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">'</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">]</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    while</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> not</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> it</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">finished</span><span 
style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        idx </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> it</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">multi_index</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        old_val </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> x</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        x</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> old_val </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> h</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        fxh1 </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> f</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">x</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        x</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> old_val </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> h</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        fxh2 </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> f</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">x</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        grad</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">fxh1 </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> fxh2</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> /</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">2</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> h</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        x</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> old_val</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        it</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">iternext</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> grad</span></span></code></pre>\n<ul>\n<li>Pros: simple and intuitive; no derivation required</li>\n<li>Cons: <strong>extremely slow</strong> (two forward passes per parameter), and the accuracy depends on the choice of h; <strong>used only for gradient checking, not for training</strong></li>\n</ul>\n<p><strong>Analytic gradient</strong> (derived with calculus):</p>\n<p>Derive an exact expression for the gradient directly from the loss formula. For example, for the SVM loss, a single margin term max(0, s_j - s_yi + 1) contributes the following gradient with respect to the weight vector w_j of an incorrect class j (written loosely as the gradient with respect to W):</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi mathvariant=\"normal\">∇</mi><mi>W</mi></msub><msub><mi>L</mi><mi>i</mi></msub><mo>=</mo><mrow><mo fence=\"true\">{</mo><mtable rowspacing=\"0.36em\" columnalign=\"left left\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mn>0</mn></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mtext>if </mtext><msub><mi>s</mi><msub><mi>y</mi><mi>i</mi></msub></msub><mo>≥</mo><msub><mi>s</mi><mi>j</mi></msub><mo>+</mo><mn>1</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>x</mi><mi>i</mi></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mtext>if 
</mtext><msub><mi>s</mi><msub><mi>y</mi><mi>i</mi></msub></msub><mo>&lt;</mo><msub><mi>s</mi><mi>j</mi></msub><mo>+</mo><mn>1</mn></mrow></mstyle></mtd></mtr></mtable></mrow></mrow><annotation encoding=\"application/x-tex\">\\nabla_W L_i = \\begin{cases} 0 &amp; \\text{if } s_{y_i} \\geq s_j + 1 \\\\ x_i &amp; \\text{if } s_{y_i} &lt; s_j + 1 \\end{cases}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord\">∇</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">W</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:3em;vertical-align:-1.25em;\"></span><span 
class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em;\"><span class=\"delimsizing size4\">{</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.69em;\"><span style=\"top:-3.69em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\">0</span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.19em;\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:1em;\"></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.69em;\"><span style=\"top:-3.69em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">if </span></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing 
reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:-0.0359em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">1</span></span></span><span style=\"top:-2.25em;\"><span class=\"pstrut\" style=\"height:3.008em;\"></span><span class=\"mord\"><span class=\"mord text\"><span 
class=\"mord\">if </span></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:-0.0359em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">&lt;</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.19em;\"><span></span></span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<ul>\n<li>Pros: <strong>exact and fast</strong> (a single computation yields all gradients)</li>\n<li>Cons: the derivation is error-prone</li>\n</ul>\n<p><strong>Rule of thumb in practice</strong>: train with the analytic gradient; when debugging, use the numerical gradient for a <strong>gradient check</strong>. If the two differ substantially, there is a bug in the analytic gradient's formula or implementation.</p>\n<h3 id=\"基本梯度下降的代码实现\"><a class=\"anchor\" href=\"#基本梯度下降的代码实现\">#</a> A Basic Gradient Descent Implementation</h3>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> gradient_descent</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">X</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_init</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> num_iters</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span 
style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    Full-batch gradient descent (Batch Gradient Descent).</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    Every iteration computes the gradient from all of the training data.</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    W </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_init</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">copy</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    loss_history </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_iters</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # Compute the gradient over the entire training set</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        scores </span><span 
style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dot</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        loss</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> svm_loss</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">scores</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  # gradient of data loss + regularization loss</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # Step along the negative gradient</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        W </span><span style=\"color:#999999;--shiki-dark:#666666\">-=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        loss_history</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">append</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">loss</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> loss_history</span></span></code></pre>\n<p>The fatal flaw of full-batch gradient descent: when the training set is large (ImageNet, for example, has about 1.2 million images), every single step needs the gradient over all of the data, and the compute cost becomes unacceptable.</p>\n<hr />\n<h2 id=\"随机梯度下降-stochastic-gradient-descent-sgd\"><a class=\"anchor\" href=\"#随机梯度下降-stochastic-gradient-descent-sgd\">#</a> Stochastic Gradient Descent (SGD)</h2>\n<h3 id=\"介绍-2\"><a class=\"anchor\" href=\"#介绍-2\">#</a> Introduction</h3>\n<p>SGD is the most basic practical optimization algorithm in deep learning. It solves the efficiency problem of full-batch gradient descent with one key insight: <strong>an exact gradient is not needed; an approximate one is enough.</strong></p>\n<h3 id=\"思路-2\"><a class=\"anchor\" href=\"#思路-2\">#</a> Idea</h3>\n<p>Rather than computing the exact gradient from all N samples at every step, randomly draw a small batch (a <strong>mini-batch</strong>, typically 32/64/128/256 samples) and use its gradient as an <strong>unbiased estimate</strong> of the full gradient. The cost of each step drops from O(N) to O(batch_size), which makes training on large datasets feasible.</p>\n<blockquote>\n<p>Terminology note: in practice &quot;SGD&quot; almost always means <strong>mini-batch SGD</strong>, not the extreme version that uses a single sample per step.</p>\n</blockquote>\n<h3 id=\"原理-2\"><a class=\"anchor\" href=\"#原理-2\">#</a> How it works</h3>\n<p>The update rule for full-batch gradient descent is:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>W</mi><mo>←</mo><mi>W</mi><mo>−</mo><mi>α</mi><mo>⋅</mo><mfrac><mn>1</mn><mi>N</mi></mfrac><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>N</mi></munderover><msub><mi mathvariant=\"normal\">∇</mi><mi>W</mi></msub><msub><mi>L</mi><mi>i</mi></msub></mrow><annotation encoding=\"application/x-tex\">W \\leftarrow W - \\alpha \\cdot \\frac{1}{N}\\sum_{i=1}^N \\nabla_W L_i\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span 
class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">←</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4445em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:3.106em;vertical-align:-1.2777em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" 
style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.8283em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"mrel mtight\">=</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em;\">N</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord\">∇</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">W</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p>SGD replaces it with an approximation over a mini-batch:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>W</mi><mo>←</mo><mi>W</mi><mo>−</mo><mi>α</mi><mo>⋅</mo><mfrac><mn>1</mn><mi>m</mi></mfrac><munderover><mo>∑</mo><mrow><mi>k</mi><mo>=</mo><mn>1</mn></mrow><mi>m</mi></munderover><msub><mi mathvariant=\"normal\">∇</mi><mi>W</mi></msub><msub><mi>L</mi><msub><mi>i</mi><mi>k</mi></msub></msub></mrow><annotation encoding=\"application/x-tex\">W \\leftarrow W - \\alpha \\cdot \\frac{1}{m}\\sum_{k=1}^m \\nabla_W L_{i_k}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">←</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4445em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.9535em;vertical-align:-1.3021em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">m</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.6514em;\"><span style=\"top:-1.8479em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span><span class=\"mrel mtight\">=</span><span class=\"mord mtight\">1</span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span><span style=\"top:-4.3em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">m</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:1.3021em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord\">∇</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">W</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">i</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em;\"><span style=\"top:-2.3488em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1512em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.2559em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p>where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>m</mi></mrow><annotation encoding=\"application/x-tex\">m</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">m</span></span></span></span> is the mini-batch size and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>i</mi><mi>k</mi></msub></mrow><annotation encoding=\"application/x-tex\">i_k</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8095em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">i</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> is a randomly sampled index. One <strong>epoch</strong> is one full pass over the training set (that is, the mini-batch updates taken together have covered all of the data).</p>\n<h3 id=\"sgd-面临的三大挑战\"><a class=\"anchor\" href=\"#sgd-面临的三大挑战\">#</a> Three challenges for SGD</h3>\n<p>SGD solves the compute-cost problem, but it has three core weaknesses of its own:</p>\n<p><strong>Challenge 1: Ill-Conditioning</strong></p>\n<p>The curvature of the loss surface differs enormously between directions: some are steep (large gradient) and some are flat (small gradient). SGD zigzags back and forth along the steep directions while barely advancing along the flat ones. Picture a long, narrow canyon: you bounce between the walls, yet progress along the canyon floor is extremely slow.</p>\n<p><strong>Challenge 2: Saddle 
Points</strong></p>\n<p>In high-dimensional spaces, <strong>saddle points far outnumber local minima</strong>. At a saddle point the gradient is zero, yet the surface rises along some directions (a minimum along them) and falls along others (a maximum along them). A classic example is <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>f</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo separator=\"true\">,</mo><mi>y</mi><mo stretchy=\"false\">)</mo><mo>=</mo><msup><mi>x</mi><mn>2</mn></msup><mo>−</mo><msup><mi>y</mi><mn>2</mn></msup></mrow><annotation encoding=\"application/x-tex\">f(x, y) = x^2 - y^2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8974em;vertical-align:-0.0833em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.0085em;vertical-align:-0.1944em;\"></span><span class=\"mord\"><span 
class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span></span>, which at the origin is a minimum along x and a maximum along y. When plain SGD reaches a saddle point the gradient tends to zero and the updates nearly stall, even though the point is not a true optimum.</p>\n<p><strong>Challenge 3: Gradient Noise</strong></p>\n<p>The gradient from each mini-batch is a noisy estimate of the full gradient: different mini-batches give slightly different directions and magnitudes. This randomness keeps the SGD update path &quot;jittering&quot;, so convergence is not smooth.</p>\n<p>Gradient noise does have one unexpected benefit: it can sometimes help SGD jump out of <strong>shallow local minima</strong>, which noise-free full-batch gradient descent cannot do.</p>\n<h3 id=\"代码实现\"><a class=\"anchor\" href=\"#代码实现\">#</a> Implementation</h3>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">import</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> numpy </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">as</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> sgd</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">X</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_init</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> num_epochs</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    Mini-batch stochastic gradient descent</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    N </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    W </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_init</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">copy</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    loss_history </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    num_iters_per_epoch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> N 
</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">//</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  # a trailing partial batch is skipped</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> epoch </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_epochs</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">        # Shuffle the data order at the start of every epoch</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        idx </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">permutation</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        X_shuffled </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        y_shuffled 
</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_iters_per_epoch</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # Slice out one mini-batch</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            start </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            end </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> start </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            X_batch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X_shuffled</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">end</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            y_batch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y_shuffled</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">end</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # Compute the mini-batch gradient</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            scores </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X_batch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dot</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            loss</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> svm_loss</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">scores</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y_batch</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # Parameter update</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            W </span><span style=\"color:#999999;--shiki-dark:#666666\">-=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            loss_history</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">append</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">loss</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> loss_history</span></span></code></pre>\n<hr />\n<h2 id=\"带动量的随机梯度下降-sgd-with-momentum\"><a class=\"anchor\" href=\"#带动量的随机梯度下降-sgd-with-momentum\">#</a> SGD with Momentum</h2>\n<h3 id=\"介绍-3\"><a class=\"anchor\" href=\"#介绍-3\">#</a> Introduction</h3>\n<p>SGD with momentum is the <strong>first major upgrade</strong> to SGD. On top of plain SGD it introduces a <strong>velocity</strong> variable, which gives the parameter updates &quot;inertia&quot;.</p>\n<h3 id=\"思路-3\"><a class=\"anchor\" href=\"#思路-3\">#</a> Idea</h3>\n<p>Picture SGD's optimization as a small ball rolling on the loss surface. With plain SGD 
the ball has no mass: each step looks only at the slope where it currently stands, so it easily oscillates back and forth inside a canyon or gets &quot;stuck&quot; at a saddle point. SGD with momentum gives the ball <strong>mass and inertia</strong>: it remembers the direction and speed of its earlier motion, so even when the current gradient is tiny or zero, the accumulated momentum keeps pushing it forward, through saddle points and flat regions.</p>\n<p>At the same time, because the velocity is an exponentially decaying weighted sum of past gradients, it naturally <strong>smooths out gradient noise</strong>: the random fluctuations of individual mini-batches are damped, and the overall direction of travel becomes more consistent and stable.</p>\n<h3 id=\"原理-3\"><a class=\"anchor\" href=\"#原理-3\">#</a> How it works</h3>\n<p>The momentum method maintains a <strong>velocity vector v</strong> and, at every step, updates it to a weighted combination of the previous velocity and the current gradient:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>v</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><mi>ρ</mi><mo>⋅</mo><msub><mi>v</mi><mi>t</mi></msub><mo>+</mo><msub><mi mathvariant=\"normal\">∇</mi><mi>W</mi></msub><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">v_{t+1} = \\rho \\cdot v_t + \\nabla_W L\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:0.6389em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord\">∇</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">W</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">L</span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span 
class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>W</mi><mo>←</mo><mi>W</mi><mo>−</mo><mi>α</mi><mo>⋅</mo><msub><mi>v</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow></msub></mrow><annotation encoding=\"application/x-tex\">W \\leftarrow W - \\alpha \\cdot v_{t+1}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">←</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4445em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span 
class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p>where:</p>\n<ul>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ρ</mi></mrow><annotation encoding=\"application/x-tex\">\\rho</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span></span></span></span> (the momentum coefficient, also written <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>β</mi></mrow><annotation encoding=\"application/x-tex\">\\beta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span></span></span></span> or momentum) is typically set to <strong>0.9</strong> or <strong>0.99</strong></li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>v</mi><mi>t</mi></msub></mrow><annotation encoding=\"application/x-tex\">v_t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> is an exponentially weighted sum of past gradients, that is, an Exponential Moving Average up to the constant factor 1/(1 − ρ)</li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ρ</mi><mo>=</mo><mn>0.9</mn></mrow><annotation encoding=\"application/x-tex\">\\rho = 0.9</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0.9</span></span></span></span> means the velocity effectively aggregates about the last 1/(1 − ρ) = 10 gradients, with older gradients decaying geometrically</li>\n</ul>\n<p><strong>Why can momentum get past saddle points?</strong> At a saddle point <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi mathvariant=\"normal\">∇</mi><mi>W</mi></msub><mi>L</mi><mo>≈</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">\\nabla_W L \\approx 0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord\">∇</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 
mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">W</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">L</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span>, so vanilla SGD essentially stops updating. With momentum, however:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>v</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>≈</mo><mi>ρ</mi><mo>⋅</mo><msub><mi>v</mi><mi>t</mi></msub></mrow><annotation encoding=\"application/x-tex\">v_{t+1} \\approx \\rho \\cdot v_t\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6915em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p>The velocity does not vanish immediately; instead it decays by a factor of <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ρ</mi></mrow><annotation encoding=\"application/x-tex\">\\rho</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span></span></span></span> per step and keeps pushing the parameters forward. It is like a ball rolling into a shallow dip: the slope alone would never get it out, but with enough speed it rolls straight through.</p>\n<h3 id=\"nesterov-加速梯度nag\"><a class=\"anchor\" href=\"#nesterov-加速梯度nag\">#</a> Nesterov Accelerated Gradient (NAG)</h3>\n<p>Nesterov momentum is a clever refinement of standard momentum: <strong>first &quot;look ahead&quot; one step along the velocity direction and compute the gradient at that point</strong>, rather than at the current position. In effect, it looks one step ahead before moving.</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span 
class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>v</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><mi>ρ</mi><mo>⋅</mo><msub><mi>v</mi><mi>t</mi></msub><mo>−</mo><mi>α</mi><mo>⋅</mo><msub><mi mathvariant=\"normal\">∇</mi><mi>W</mi></msub><mi>L</mi><mo stretchy=\"false\">(</mo><mi>W</mi><mo>+</mo><mi>ρ</mi><mo>⋅</mo><msub><mi>v</mi><mi>t</mi></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">v_{t+1} = \\rho \\cdot v_t - \\alpha \\cdot \\nabla_W L(W + \\rho \\cdot v_t)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span 
class=\"base\"><span class=\"strut\" style=\"height:0.7333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4445em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord\">∇</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">W</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">L</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" 
style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>W</mi><mo>←</mo><mi>W</mi><mo>+</mo><msub><mi>v</mi><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow></msub></mrow><annotation encoding=\"application/x-tex\">W \\leftarrow W + v_{t+1}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">←</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span></span></span></span></span></p>\n<p>Note that these two equations fold the learning rate α and the minus sign into the velocity, so the parameters are updated by adding the velocity directly, whereas the standard-momentum equations earlier in this section keep α outside and subtract α times the velocity. For a fixed α the two conventions differ only in how the velocity is scaled.</p>\n<p>NAG has stronger theoretical convergence guarantees for convex optimization, and in non-convex deep learning practice it also tends to work slightly better than standard momentum.</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/nn3/nesterov.jpeg\" alt=\"Comparison of the update steps of Nesterov momentum and standard momentum\" /></p>\n<h3 id=\"代码实现-2\"><a class=\"anchor\" href=\"#代码实现-2\">#</a> Implementation</h3>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> sgd_momentum</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">X</span><span 
style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_init</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> momentum</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> num_epochs</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    Stochastic gradient descent with momentum</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    momentum: momentum coefficient, typically 0.9</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    N </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    W </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_init</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">copy</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    v </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros_like</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  # initialize the velocity to zero</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    loss_history </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    num_iters_per_epoch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> N </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">//</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> epoch </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_epochs</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        idx </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">permutation</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        X_shuffled </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        y_shuffled </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span 
style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_iters_per_epoch</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            start </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            end </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> start </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            X_batch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X_shuffled</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">end</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            y_batch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y_shuffled</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">start</span><span 
style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">end</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            scores </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X_batch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dot</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            loss</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> svm_loss</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">scores</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y_batch</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # Core step: velocity update + parameter update</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            v </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> momentum </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> v </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW          </span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># accumulate past gradients</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            W </span><span style=\"color:#999999;--shiki-dark:#666666\">-=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> v          </span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># update with the velocity (not the raw gradient)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            loss_history</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">append</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">loss</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> loss_history</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Nesterov momentum version</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> sgd_nesterov</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">X</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_init</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> momentum</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> num_epochs</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    Nesterov accelerated gradient descent</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    N </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    W </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_init</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">copy</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    v </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros_like</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    loss_history </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    num_iters_per_epoch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> N </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">//</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> epoch </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_epochs</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        idx </span><span 
style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">permutation</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        X_shuffled </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        y_shuffled </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_iters_per_epoch</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            start </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            end </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> start </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            X_batch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X_shuffled</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">end</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            y_batch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y_shuffled</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">end</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # compute the gradient at the \"look-ahead\" position</span></span>\n<span class=\"line\"><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            W_ahead </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> momentum </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> v</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            scores </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X_batch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dot</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W_ahead</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            loss</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> svm_loss</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">scores</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y_batch</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            v </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> momentum </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> v </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            W </span><span style=\"color:#999999;--shiki-dark:#666666\">-=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> v</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            loss_history</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">append</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">loss</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> loss_history</span></span></code></pre>\n<hr />\n<h2 id=\"rmsprop\"><a class=\"anchor\" href=\"#rmsprop\">#</a> RMSProp</h2>\n<h3 id=\"介绍-4\"><a class=\"anchor\" href=\"#介绍-4\">#</a> Introduction</h3>\n<p>RMSProp (Root Mean Square Propagation) was proposed by Geoff Hinton in his Coursera course and is one of the best-known <strong>adaptive learning rate methods</strong>. It targets one of the most criticized limitations of SGD: a single global learning rate shared by every parameter.</p>\n<h3 id=\"思路-4\"><a class=\"anchor\" href=\"#思路-4\">#</a> Intuition</h3>\n<p>Recall the <strong>ill-conditioning</strong> problem that SGD faces: along some parameter directions the gradient is consistently large (steep directions), while along others it is consistently small (flat directions). A single learning rate for all parameters forces a trade-off: if it is large enough to make progress along the flat directions, it oscillates or diverges along the steep ones; if it is small enough to stay stable along the steep directions, it barely moves along the flat ones.</p>\n<p>RMSProp 
rests on one core idea: <strong>each parameter should have its own learning rate</strong>. If a parameter's gradients have been consistently large, it takes smaller steps (to damp oscillation); if they have been consistently small, it takes larger steps (to speed up progress).</p>\n<p>Concretely, it maintains an <strong>exponential moving average of the squared gradient</strong> for each parameter and divides the learning rate <strong>element-wise</strong> by the square root of that average: directions with large gradients are divided by a large number (smaller step), and directions with small gradients by a small number (larger step).</p>\n<h3 id=\"原理-4\"><a class=\"anchor\" href=\"#原理-4\">#</a> Formulation</h3>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mtext>cache</mtext><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><mi>ρ</mi><mo>⋅</mo><msub><mtext>cache</mtext><mi>t</mi></msub><mo>+</mo><mo stretchy=\"false\">(</mo><mn>1</mn><mo>−</mo><mi>ρ</mi><mo stretchy=\"false\">)</mo><mo>⋅</mo><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"normal\">∇</mi><mi>W</mi></msub><mi>L</mi><msup><mo stretchy=\"false\">)</mo><mn>2</mn></msup></mrow><annotation encoding=\"application/x-tex\">\\text{cache}_{t+1} = \\rho \\cdot \\text{cache}_t + (1 - \\rho) \\cdot (\\nabla_W L)^2\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.9028em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">cache</span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span 
class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8444em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">cache</span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">ρ</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1141em;vertical-align:-0.25em;\"></span><span 
class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord\">∇</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">W</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">L</span><span class=\"mclose\"><span class=\"mclose\">)</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8641em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>W</mi><mo>←</mo><mi>W</mi><mo>−</mo><mfrac><mi>α</mi><mrow><msqrt><msub><mtext>cache</mtext><mrow><mi>t</mi><mo>+</mo><mn>1</mn></mrow></msub></msqrt><mo>+</mo><mi>ϵ</mi></mrow></mfrac><mo>⋅</mo><msub><mi mathvariant=\"normal\">∇</mi><mi>W</mi></msub><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">W \\leftarrow W - \\frac{\\alpha}{\\sqrt{\\text{cache}_{t+1}} + \\epsilon} \\cdot \\nabla_W L\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">←</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0376em;vertical-align:-0.93em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.2819em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8281em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">cache</span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.7881em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" 
style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewBox=\"0 0 400000 1080\" preserveAspectRatio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2119em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">ϵ</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.93em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord\">∇</span><span class=\"msupsub\"><span 
class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">W</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">L</span></span></span></span></span></p>\n<p>where:</p>\n<ul>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mtext>cache</mtext></mrow><annotation encoding=\"application/x-tex\">\\text{cache}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord text\"><span class=\"mord\">cache</span></span></span></span></span>: the exponential moving average of the squared gradient; note that the square is taken <strong>element-wise</strong></li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ρ</mi></mrow><annotation encoding=\"application/x-tex\">\\rho</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord mathnormal\">ρ</span></span></span></span>: the decay rate, typically <strong>0.9</strong> or <strong>0.99</strong>; it controls how long past gradients are remembered</li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ϵ</mi></mrow><annotation encoding=\"application/x-tex\">\\epsilon</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span 
class=\"mord mathnormal\">ϵ</span></span></span></span>: a small constant (e.g. <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msup><mn>10</mn><mrow><mo>−</mo><mn>7</mn></mrow></msup></mrow><annotation encoding=\"application/x-tex\">10^{-7}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8141em;\"></span><span class=\"mord\">1</span><span class=\"mord\"><span class=\"mord\">0</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">−</span><span class=\"mord mtight\">7</span></span></span></span></span></span></span></span></span></span></span></span>) that prevents division by zero</li>\n</ul>\n<p><strong>Why an exponential moving average instead of a plain running sum?</strong> RMSProp's predecessor, AdaGrad, simply sums all past squared gradients:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mtext>cache</mtext><mtext>AdaGrad</mtext></msub><mo>=</mo><mtext>cache</mtext><mo>+</mo><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"normal\">∇</mi><mi>W</mi></msub><mi>L</mi><msup><mo stretchy=\"false\">)</mo><mn>2</mn></msup></mrow><annotation encoding=\"application/x-tex\">\\text{cache}_{\\text{AdaGrad}} = \\text{cache} + (\\nabla_W L)^2\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8444em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">cache</span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">AdaGrad</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7778em;vertical-align:-0.0833em;\"></span><span class=\"mord text\"><span class=\"mord\">cache</span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1141em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord\">∇</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">W</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">L</span><span class=\"mclose\"><span class=\"mclose\">)</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8641em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p>As a result, the cache grows monotonically and the effective learning rate shrinks monotonically; on non-convex problems this can drive the learning rate toward zero too early and stall training. RMSProp uses an exponential moving average instead, so old gradients are gradually &quot;forgotten&quot; and recent ones carry more weight, which keeps the effective learning rate from decaying all the way to zero.</p>\n<h3 id=\"代码实现-3\"><a class=\"anchor\" href=\"#代码实现-3\">#</a> Implementation</h3>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> rmsprop</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">X</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_init</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> decay_rate</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> num_epochs</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> eps</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1e-7</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span 
style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    RMSProp optimizer</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    decay_rate: decay rate of the squared-gradient average (rho), typically 0.9 or 0.99</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    N </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    W </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_init</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">copy</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    cache </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros_like</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  # moving average of the squared gradients</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    loss_history </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    num_iters_per_epoch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> N </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">//</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> epoch </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_epochs</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        idx </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">permutation</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        X_shuffled </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        y_shuffled </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_iters_per_epoch</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            start </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            end </span><span 
style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> start </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            X_batch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X_shuffled</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">end</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            y_batch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y_shuffled</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">end</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            scores </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X_batch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dot</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">          
  loss</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> svm_loss</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">scores</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y_batch</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # update the moving average of squared gradients</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            cache </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> decay_rate </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> cache </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> decay_rate</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">**</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 2</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # element-wise adaptive learning rate</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            W 
</span><span style=\"color:#999999;--shiki-dark:#666666\">-=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">/</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sqrt</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">cache</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> eps</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            loss_history</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">append</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">loss</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> loss_history</span></span></code></pre>\n<hr />\n<h2 id=\"adamadaptive-moment-estimation\"><a class=\"anchor\" href=\"#adamadaptive-moment-estimation\">#</a> Adam（Adaptive Moment Estimation）</h2>\n<h3 id=\"介绍-5\"><a class=\"anchor\" 
href=\"#介绍-5\">#</a> Introduction</h3>\n<p>Adam was proposed by Kingma and Ba in 2015 and is one of the most widely used optimizers in deep learning today. It combines the ideas of <strong>Momentum (first moment)</strong> and <strong>RMSProp (second moment / adaptive learning rate)</strong>, giving both the smooth acceleration of momentum and the robustness of per-parameter adaptive learning rates.</p>\n<p>Adam is the <strong>default optimizer</strong> in many projects: if you are unsure what to use, starting with Adam is usually a safe choice.</p>\n<h3 id=\"思路-5\"><a class=\"anchor\" href=\"#思路-5\">#</a> Idea</h3>\n<p>Adam maintains two exponential moving averages at the same time:</p>\n<ul>\n<li><strong>First moment <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>m</mi><mi>t</mi></msub></mrow><annotation encoding=\"application/x-tex\">m_t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">m</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></strong>: an exponential moving average of the gradient (like momentum), which tracks the gradient's &quot;direction&quot; and &quot;trend&quot;</li>\n<li><strong>Second moment <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>v</mi><mi>t</mi></msub></mrow><annotation encoding=\"application/x-tex\">v_t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span 
class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></strong>: an exponential moving average of the squared gradient (like RMSProp), which tracks the gradient's &quot;magnitude of fluctuation&quot;</li>\n</ul>\n<p>Adam then uses <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>m</mi><mi>t</mi></msub></mrow><annotation encoding=\"application/x-tex\">m_t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">m</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> as the update direction (in place of the raw gradient) and uses <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msqrt><msub><mi>v</mi><mi>t</mi></msub></msqrt></mrow><annotation encoding=\"application/x-tex\">\\sqrt{v_t}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" 
style=\"height:1.04em;vertical-align:-0.3147em;\"></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7253em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.6853em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewBox=\"0 0 400000 1080\" preserveAspectRatio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3147em;\"><span></span></span></span></span></span></span></span></span> 
as the per-parameter scaling factor for the learning rate. Ignoring bias correction for now, the update is roughly:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>W</mi><mo>←</mo><mi>W</mi><mo>−</mo><mi>α</mi><mo>⋅</mo><mfrac><msub><mi>m</mi><mi>t</mi></msub><mrow><msqrt><msub><mi>v</mi><mi>t</mi></msub></msqrt><mo>+</mo><mi>ϵ</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">W \\leftarrow W - \\alpha \\cdot \\frac{m_t}{\\sqrt{v_t} + \\epsilon}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">←</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4445em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.1083em;vertical-align:-1.0007em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span 
class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7253em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.6853em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewBox=\"0 0 400000 1080\" preserveAspectRatio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3147em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">ϵ</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">m</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.0007em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p><strong>But there is a key problem</strong>: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>m</mi><mi>t</mi></msub></mrow><annotation encoding=\"application/x-tex\">m_t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">m</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span 
class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>v</mi><mi>t</mi></msub></mrow><annotation encoding=\"application/x-tex\">v_t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> are both initialized to zero. Because of how an exponential moving average works, their values in the first few training steps are strongly biased toward zero. This distorts the step size: the second moment sits in the denominator and is usually averaged over a longer window than the first moment, so the earliest updates can be too large rather than too small. Adam uses <strong>bias correction</strong> to address this.</p>\n<h3 id=\"原理-5\"><a class=\"anchor\" href=\"#原理-5\">#</a> How it works</h3>\n<p><strong>Step 1: update the first and second moments</strong></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>m</mi><mi>t</mi></msub><mo>=</mo><msub><mi>β</mi><mn>1</mn></msub><mo>⋅</mo><msub><mi>m</mi><mrow><mi>t</mi><mo>−</mo><mn>1</mn></mrow></msub><mo>+</mo><mo stretchy=\"false\">(</mo><mn>1</mn><mo>−</mo><msub><mi>β</mi><mn>1</mn></msub><mo stretchy=\"false\">)</mo><mo>⋅</mo><msub><mi 
mathvariant=\"normal\">∇</mi><mi>W</mi></msub><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">m_t = \\beta_1 \\cdot m_{t-1} + (1 - \\beta_1) \\cdot \\nabla_W L\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">m</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span 
class=\"base\"><span class=\"strut\" style=\"height:0.7917em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">m</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">−</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span 
class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord\">∇</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">W</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">L</span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>v</mi><mi>t</mi></msub><mo>=</mo><msub><mi>β</mi><mn>2</mn></msub><mo>⋅</mo><msub><mi>v</mi><mrow><mi>t</mi><mo>−</mo><mn>1</mn></mrow></msub><mo>+</mo><mo stretchy=\"false\">(</mo><mn>1</mn><mo>−</mo><msub><mi>β</mi><mn>2</mn></msub><mo stretchy=\"false\">)</mo><mo>⋅</mo><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"normal\">∇</mi><mi>W</mi></msub><mi>L</mi><msup><mo stretchy=\"false\">)</mo><mn>2</mn></msup></mrow><annotation encoding=\"application/x-tex\">v_t = \\beta_2 \\cdot v_{t-1} + (1 - \\beta_2) \\cdot (\\nabla_W L)^2\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span 
class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7917em;vertical-align:-0.2083em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">t</span><span class=\"mbin mtight\">−</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.1141em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord\">∇</span><span 
class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em;\">W</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">L</span><span class=\"mclose\"><span class=\"mclose\">)</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8641em;\"><span style=\"top:-3.113em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span></span></span></p>\n<p><strong>Step 2: bias correction</strong></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mover accent=\"true\"><mi>m</mi><mo>^</mo></mover><mi>t</mi></msub><mo>=</mo><mfrac><msub><mi>m</mi><mi>t</mi></msub><mrow><mn>1</mn><mo>−</mo><msubsup><mi>β</mi><mn>1</mn><mi>t</mi></msubsup></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\hat{m}_t = \\frac{m_t}{1 - \\beta_1^t}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8444em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6944em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord 
mathnormal\">m</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.25em;\"><span class=\"mord\">^</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0599em;vertical-align:-0.9523em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7754em;\"><span style=\"top:-2.4337em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span><span 
style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2663em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">m</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9523em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mover accent=\"true\"><mi>v</mi><mo>^</mo></mover><mi>t</mi></msub><mo>=</mo><mfrac><msub><mi>v</mi><mi>t</mi></msub><mrow><mn>1</mn><mo>−</mo><msubsup><mi>β</mi><mn>2</mn><mi>t</mi></msubsup></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\hat{v}_t = \\frac{v_t}{1 - 
\\beta_2^t}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8444em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6944em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">^</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0599em;vertical-align:-0.9523em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span 
class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7754em;\"><span style=\"top:-2.4337em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span><span style=\"top:-3.0448em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2663em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9523em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>Intuition behind the bias correction: at 
t=1, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>m</mi><mn>1</mn></msub><mo>=</mo><mo stretchy=\"false\">(</mo><mn>1</mn><mo>−</mo><msub><mi>β</mi><mn>1</mn></msub><mo stretchy=\"false\">)</mo><mo>⋅</mo><msub><mi>g</mi><mn>1</mn></msub></mrow><annotation encoding=\"application/x-tex\">m_1 = (1-\\beta_1) \\cdot g_1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">m</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span 
style=\"top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">g</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> (note that <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>m</mi><mn>0</mn></msub><mo>=</mo><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">m_0=0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">m</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span 
class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">0</span></span></span></span>), and dividing by <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mo>−</mo><msub><mi>β</mi><mn>1</mn></msub></mrow><annotation encoding=\"application/x-tex\">1-\\beta_1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> recovers exactly <span class=\"katex\"><span 
class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>g</mi><mn>1</mn></msub></mrow><annotation encoding=\"application/x-tex\">g_1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">g</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span>. As t grows, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mo>−</mo><msubsup><mi>β</mi><mn>1</mn><mi>t</mi></msubsup></mrow><annotation encoding=\"application/x-tex\">1 - \\beta_1^t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em;\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.0417em;vertical-align:-0.2481em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7936em;\"><span 
style=\"top:-2.4519em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2481em;\"><span></span></span></span></span></span></span></span></span></span> approaches 1 and the bias correction gradually stops having any effect, which is exactly what we want, because by then the moving average is already accurate enough.</p>\n<p><strong>Step 3: Parameter update</strong></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>W</mi><mo>←</mo><mi>W</mi><mo>−</mo><mi>α</mi><mo>⋅</mo><mfrac><msub><mover accent=\"true\"><mi>m</mi><mo>^</mo></mover><mi>t</mi></msub><mrow><msqrt><msub><mover accent=\"true\"><mi>v</mi><mo>^</mo></mover><mi>t</mi></msub></msqrt><mo>+</mo><mi>ϵ</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">W \\leftarrow W - \\alpha \\cdot \\frac{\\hat{m}_t}{\\sqrt{\\hat{v}_t} + \\epsilon}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">←</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4445em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3014em;vertical-align:-0.93em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.2528em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8572em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6944em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">^</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.8172em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewBox=\"0 0 400000 1080\" preserveAspectRatio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1828em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">ϵ</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6944em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">m</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.25em;\"><span 
class=\"mord\">^</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.93em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p><strong>Recommended hyperparameters</strong> (the paper's defaults, which can be used as-is in most cases):</p>\n<table>\n<thead>\n<tr>\n<th>Parameter</th>\n<th>Recommended value</th>\n<th>Notes</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>α</mi></mrow><annotation encoding=\"application/x-tex\">\\alpha</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span></span> (lr)</td>\n<td>1e-3</td>\n<td>Learning rate; 5e-4 is sometimes used</td>\n</tr>\n<tr>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>β</mi><mn>1</mn></msub></mrow><annotation encoding=\"application/x-tex\">\\beta_1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" 
style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></td>\n<td>0.9</td>\n<td>Decay rate of the first moment (momentum term)</td>\n</tr>\n<tr>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>β</mi><mn>2</mn></msub></mrow><annotation encoding=\"application/x-tex\">\\beta_2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05278em;\">β</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0528em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span></td>\n<td>0.999</td>\n<td>Decay rate of the second moment (scaling term)</td>\n</tr>\n<tr>\n<td><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ϵ</mi></mrow><annotation encoding=\"application/x-tex\">\\epsilon</annotation></semantics></math></span><span class=\"katex-html\" 
aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\">ϵ</span></span></span></span></td>\n<td>1e-8</td>\n<td>Constant for numerical stability</td>\n</tr>\n</tbody>\n</table>\n<h3 id=\"adamw解耦权重衰减\"><a class=\"anchor\" href=\"#adamw解耦权重衰减\">#</a> AdamW: Decoupled Weight Decay</h3>\n<p>Standard Adam has a subtle problem: when L2 regularization is used, the gradient of the regularization term enters the computation of <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>m</mi><mi>t</mi></msub></mrow><annotation encoding=\"application/x-tex\">m_t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">m</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>v</mi><mi>t</mi></msub></mrow><annotation encoding=\"application/x-tex\">v_t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span 
style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span>, so <strong>the weight decay effect is distorted by the adaptive learning rate</strong>. Different parameters end up with different effective decay strengths, which defeats the purpose of the regularization.</p>\n<p><strong>AdamW</strong> (Loshchilov &amp; Hutter, 2019) has a very simple fix: <strong>decouple weight decay from the gradient computation</strong> and apply it as a separate update step:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>W</mi><mo>←</mo><mi>W</mi><mo>−</mo><mi>α</mi><mo>⋅</mo><mfrac><msub><mover accent=\"true\"><mi>m</mi><mo>^</mo></mover><mi>t</mi></msub><mrow><msqrt><msub><mover accent=\"true\"><mi>v</mi><mo>^</mo></mover><mi>t</mi></msub></msqrt><mo>+</mo><mi>ϵ</mi></mrow></mfrac><mo>−</mo><mi>α</mi><mo>⋅</mo><mi>λ</mi><mo>⋅</mo><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">W \\leftarrow W - \\alpha \\cdot \\frac{\\hat{m}_t}{\\sqrt{\\hat{v}_t} + \\epsilon} - \\alpha \\cdot \\lambda \\cdot W\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">←</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" 
style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4445em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3014em;vertical-align:-0.93em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em;\"><span style=\"top:-2.2528em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8572em;\"><span class=\"svg-align\" style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\" style=\"padding-left:0.833em;\"><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6944em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">v</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.2222em;\"><span class=\"mord\">^</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.8172em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewBox=\"0 0 400000 1080\" preserveAspectRatio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1828em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord mathnormal\">ϵ</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6944em;\"><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord mathnormal\">m</span></span><span style=\"top:-3em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"accent-body\" style=\"left:-0.25em;\"><span 
class=\"mord\">^</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.93em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4445em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">λ</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span></span></p>\n<p>The last term <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>α</mi><mo>⋅</mo><mi>λ</mi><mo>⋅</mo><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">\\alpha \\cdot \\lambda 
\\cdot W</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4445em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">λ</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span></span></span></span> is plain weight decay applied directly to the weights, bypassing any adaptive scaling. In large-scale experiments such as ImageNet, this simple change is reported to consistently <strong>improve generalization</strong>. AdamW has gradually become the <strong>default choice of optimizer</strong> for training modern deep learning models.</p>\n<h3 id=\"代码实现-4\"><a class=\"anchor\" href=\"#代码实现-4\">#</a> Code Implementation</h3>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> adam</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">X</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_init</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta1</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta2</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> num_epochs</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> eps</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1e-8</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    Adam optimizer</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    beta1: decay rate of the first moment, typically 0.9</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    beta2: decay rate of the second moment, typically 0.999</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    N </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    W </span><span 
style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_init</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">copy</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    m </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros_like</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  # first moment (momentum)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    v </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros_like</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">  # second moment (RMSProp cache)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    t </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">                 # time step counter</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    loss_history </span><span 
style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    num_iters_per_epoch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> N </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">//</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> epoch </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_epochs</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        idx </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">permutation</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        X_shuffled </span><span 
style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        y_shuffled </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_iters_per_epoch</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            t </span><span style=\"color:#999999;--shiki-dark:#666666\">+=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            start </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            end </span><span 
style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> start </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            X_batch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X_shuffled</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">end</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            y_batch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y_shuffled</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">end</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            scores </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X_batch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dot</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">          
  loss</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> svm_loss</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">scores</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y_batch</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # update the first and second moment estimates</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            m </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta1 </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> m </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta1</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            v </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta2 </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> v </span><span 
style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta2</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dW </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">**</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 2</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # bias correction</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            m_unbiased </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> m </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">/</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta1 </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">**</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> t</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            v_unbiased </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> v </span><span 
style=\"color:#AB5959;--shiki-dark:#CB7676\">/</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta2 </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">**</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> t</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # parameter update</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            W </span><span style=\"color:#999999;--shiki-dark:#666666\">-=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> m_unbiased </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">/</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sqrt</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">v_unbiased</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> eps</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            loss_history</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">append</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">loss</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> loss_history</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> adamw</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">X</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_init</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta1</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta2</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">          weight_decay</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> num_epochs</span><span 
style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> eps</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1e-8</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    AdamW optimizer: decouples weight decay from the adaptive learning rate</span></span>\n<span class=\"line\"><span style=\"color:#B56959;--shiki-dark:#C98A7D\">    weight_decay: weight decay coefficient (the lambda of the original L2 regularization)</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">    \"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    N </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    W </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W_init</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">copy</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    m </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> 
np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros_like</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    v </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros_like</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    t </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 0</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    loss_history </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    num_iters_per_epoch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> N </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">//</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> epoch </span><span 
style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_epochs</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        idx </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">random</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">permutation</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">N</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        X_shuffled </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        y_shuffled </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">idx</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        for</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_iters_per_epoch</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            t </span><span style=\"color:#999999;--shiki-dark:#666666\">+=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 1</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            start </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            end </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> start </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> batch_size</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            X_batch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X_shuffled</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">end</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            y_batch </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y_shuffled</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">start</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">end</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # Note: the gradient computation does not include the L2 regularization term</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            scores </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X_batch</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dot</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">W</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            loss</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> svm_loss_without_reg</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">scores</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y_batch</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        
    m </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta1 </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> m </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta1</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> dW</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            v </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta2 </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> v </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">+</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta2</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> *</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dW </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">**</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 2</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            m_unbiased </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> m </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">/</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta1 </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">**</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> t</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            v_unbiased </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> v </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">/</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> -</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> beta2 </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">**</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> t</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # adaptive update + decoupled weight decay</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            W </span><span style=\"color:#999999;--shiki-dark:#666666\">-=</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> m_unbiased </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">/</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sqrt</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">v_unbiased</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#AB5959;--shiki-dark:#CB7676\"> +</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> eps</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            W </span><span style=\"color:#999999;--shiki-dark:#666666\">-=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> learning_rate </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> weight_decay </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">*</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W  </span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Separate weight decay step</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            loss_history</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">append</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">loss</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span 
class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> W</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> loss_history</span></span></code></pre>\n<hr />\n<h2 id=\"优化器对比总结\"><a class=\"anchor\" href=\"#优化器对比总结\">#</a> Optimizer Comparison Summary</h2>\n<table>\n<thead>\n<tr>\n<th>Method</th>\n<th>Momentum</th>\n<th>Adaptive learning rate</th>\n<th>Bias correction</th>\n<th>Key characteristics</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>SGD</td>\n<td>✗</td>\n<td>✗</td>\n<td>✗</td>\n<td>The simplest baseline</td>\n</tr>\n<tr>\n<td>SGD+Momentum</td>\n<td>✓</td>\n<td>✗</td>\n<td>✗</td>\n<td>Inertia carries it past saddle points and smooths gradient noise</td>\n</tr>\n<tr>\n<td>RMSProp</td>\n<td>✗</td>\n<td>✓ (EMA)</td>\n<td>✗</td>\n<td>Per-parameter adaptive step sizes; helps with ill-conditioned loss surfaces</td>\n</tr>\n<tr>\n<td>Adam</td>\n<td>✓</td>\n<td>✓ (EMA)</td>\n<td>✓</td>\n<td>Momentum + adaptive step sizes + bias correction; a strong all-round default</td>\n</tr>\n<tr>\n<td>AdamW</td>\n<td>✓</td>\n<td>✓ (EMA)</td>\n<td>✓</td>\n<td>Adam + decoupled weight decay; often generalizes better</td>\n</tr>\n</tbody>\n</table>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/nn3/opt1.gif\" alt=\"Convergence trajectories of different optimizers on a loss surface\" /></p>\n<hr />\n<h2 id=\"学习率调度策略-learning-rate-scheduling\"><a class=\"anchor\" href=\"#学习率调度策略-learning-rate-scheduling\">#</a> Learning Rate Scheduling</h2>\n<p>The learning rate <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>α</mi></mrow><annotation encoding=\"application/x-tex\">\\alpha</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span></span></span></span> does not have to stay constant throughout training. A good learning rate schedule often improves final performance noticeably.</p>\n<h3 id=\"常用策略\"><a class=\"anchor\" href=\"#常用策略\">#</a> Common Schedules</h3>\n<p><strong>Step Decay</strong>: multiply the learning rate by a decay factor (e.g. 0.1) at preset epochs. For example, ResNet decays it at epochs 30, 60, and 90. This was the standard practice in the CNN era.</p>\n<p><strong>Cosine 
Annealing</strong>：</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>α</mi><mi>t</mi></msub><mo>=</mo><mfrac><mn>1</mn><mn>2</mn></mfrac><mo>⋅</mo><msub><mi>α</mi><mn>0</mn></msub><mo>⋅</mo><mo stretchy=\"false\">(</mo><mn>1</mn><mo>+</mo><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mfrac><mi>t</mi><mi>T</mi></mfrac><mi>π</mi><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\alpha_t = \\frac{1}{2} \\cdot \\alpha_0 \\cdot (1 + \\cos(\\frac{t}{T}\\pi))\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0074em;vertical-align:-0.686em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" 
style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">2</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.5945em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span 
class=\"strut\" style=\"height:1.9781em;vertical-align:-0.686em;\"></span><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2921em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">π</span><span class=\"mclose\">))</span></span></span></span></span></p>\n<p>The learning rate decays smoothly from its initial value to zero along a cosine curve. It is currently the mainstream choice for training Transformers and large models.</p>\n<p><strong>Linear Decay</strong>:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>α</mi><mi>t</mi></msub><mo>=</mo><msub><mi>α</mi><mn>0</mn></msub><mo>⋅</mo><mo stretchy=\"false\">(</mo><mn>1</mn><mo>−</mo><mi>t</mi><mi mathvariant=\"normal\">/</mi><mi>T</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\alpha_t = \\alpha_0 \\cdot (1 - t/T)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord 
mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.5945em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.0037em;\">α</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0037em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">0</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mopen\">(</span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" 
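style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">t</span><span class=\"mord\">/</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">T</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>Simple and stable; well suited to small and medium-sized models.</p>\n<p><strong>Linear Warmup</strong>: over the first steps of training (typically the first 5-10 epochs), increase the learning rate linearly from 0 to its target value. This keeps randomly initialized weights from producing overly large gradients in the first few steps and destabilizing training.</p>\n<h3 id=\"实践建议\"><a class=\"anchor\" href=\"#实践建议\">#</a> Practical Advice</h3>\n<ul>\n<li><strong>Default choice</strong>: <strong>AdamW + linear warmup + cosine annealing</strong>, the mainstream recipe in the deep learning community as of 2025</li>\n<li><strong>Alternative</strong>: if compute is plentiful and you need the best possible performance, try SGD + Momentum with careful tuning (learning rate, momentum, decay schedule); on some large-scale vision tasks it can still outperform Adam</li>\n<li><strong>Rule of thumb</strong>: when you double the batch size, double the learning rate as well (the linear scaling rule)</li>\n<li><strong>Do not optimize prematurely</strong>: start with a fixed learning rate and watch the loss curve; add a decay schedule only once the model clearly trains properly</li>\n</ul>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/nn3/learningrates.jpeg\" alt=\"Effect of different learning rates on loss convergence\" /></p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/dataflow.jpeg\" alt=\"Data flow overview: input → scores → loss → gradient → update\" /></p>\n<hr />\n<h2 id=\"声明\"><a class=\"anchor\" href=\"#声明\">#</a> Disclaimer</h2>\n<p>This blog post was prepared by Yumengmeng from the video lectures of <a href=\"https://www.bilibili.com/video/BV1YJ3PzLEiW?spm_id_from=333.788.videopod.episodes&amp;vd_source=9f80ac68a038439c43f542a83ffa7b69&amp;p=3\">Stanford CS231n (Computer Vision), Spring 2025, taught by Fei-Fei Li</a>, combined with open-source notes collected from the web using Claude Code, then polished and typeset. It is intended for personal review only.</p>\n",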
            "tags": [
                "CS231n学习笔记",
                "CS231n",
                "计算机视觉",
                "深度学习",
                "正则化",
                "优化算法"
            ]
        },
        {
            "id": "https://yumengmeng.cn/2026/05/31/CS231n%E2%80%94%E2%80%94lecture2%E4%BD%BF%E7%94%A8%E7%BA%BF%E6%80%A7%E5%88%86%E7%B1%BB%E5%99%A8%E8%BF%9B%E8%A1%8C%E5%9B%BE%E5%83%8F%E5%88%86%E7%B1%BB/index/",
            "url": "https://yumengmeng.cn/2026/05/31/CS231n%E2%80%94%E2%80%94lecture2%E4%BD%BF%E7%94%A8%E7%BA%BF%E6%80%A7%E5%88%86%E7%B1%BB%E5%99%A8%E8%BF%9B%E8%A1%8C%E5%9B%BE%E5%83%8F%E5%88%86%E7%B1%BB/index/",
            "title": "CS231n——Lecture2 使用线性分类器进行图像分类",
            "date_published": "2026-05-31T06:27:50.000Z",
            "content_html": "<h2 id=\"图像分类-image-classification\"><a class=\"anchor\" href=\"#图像分类-image-classification\">#</a> 图像分类 Image Classification</h2>\n<p><strong>图像分类</strong>的核心任务：给定一张图像和一组类别标签，设计算法将其中一个标签分配给此图像。</p>\n<p>图像在计算机中就是一个巨大的数字网格，每个像素值介于 [0,255] 之间。对于一个 800×600 分辨率的彩色图像，其数据张量为 <strong>800 × 600 × 3</strong>，因为有 RGB 三个颜色通道（红、绿、蓝）。</p>\n<p><strong>语义鸿沟 Semantic Gap</strong>：人类看到图像能轻松识别物体，但计算机看到的只是一个巨大的整数矩阵。这个差距就是我们需要跨越的核心问题。</p>\n<ul>\n<li>图像分类面临的六大挑战：\n<ul>\n<li><strong>视角变化 viewpoint variation</strong>：即使物体完全静止，只要相机视角发生微小变化，数据张量就可能完全不同</li>\n<li><strong>光照条件 illumination conditions</strong>：RGB 像素值是表面材料颜色与光源共同作用的函数，同一物体在不同光线下数值差异巨大</li>\n<li><strong>形变 deformation</strong>：物体本身具有非刚性，姿态变化导致像素分布改变</li>\n<li><strong>遮挡 occlusion</strong>：物体被部分遮挡，人类可以通过部分信息做出精确判断，计算机则非常困难</li>\n<li><strong>背景杂乱 background clutter</strong>：物体与背景混杂在一起，难以分离前景和背景</li>\n<li><strong>类内变化 intraclass variation</strong>：同一类别内的个体差异本身就很大（比如不同品种、不同颜色的猫）</li>\n</ul>\n</li>\n</ul>\n<p>传统的基于边缘检测器的方法（找到图像边缘 → 提取特征 → 映射成输出类）效果有限，无法应对这些复杂性。</p>\n<p><strong>数据驱动方法 Data-driven approach</strong> 的核心思路：</p>\n<ul>\n<li>收集足够多的图像及其标签的数据集</li>\n<li>使用机器学习算法训练分类器——用训练函数接受图像与标签，建立一个将图像与标签关联的模型</li>\n<li>在新的图像上测试分类器——创建预测函数，输入测试图像，预测标签并返回</li>\n</ul>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/trainset.jpg\" alt=\"CIFAR-10 训练集示例\" /></p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> train</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">images</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> labels</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">    # Memorize the data and labels</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> model</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> predict</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">model</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> test_images</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">    # For each test image, find the most similar training image and output its label</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">    return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> test_labels</span></span></code></pre>\n<hr />\n<h2 id=\"最近邻分类器-nearest-neighbor-classifier\"><a class=\"anchor\" href=\"#最近邻分类器-nearest-neighbor-classifier\">#</a> Nearest Neighbor Classifier</h2>\n<p>The simplest data-driven method: <strong>memorize all the training data</strong>; at prediction time, find the training image most similar to the test image and output its label.</p>\n<p>We need a <strong>distance function</strong> to measure how similar two images are.</p>\n<p><strong>L1 distance</strong> (Manhattan distance):</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>d</mi><mn>1</mn></msub><mo stretchy=\"false\">(</mo><msub><mi>I</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><msub><mi>I</mi><mn>2</mn></msub><mo 
stretchy=\"false\">)</mo><mo>=</mo><munder><mo>∑</mo><mi>p</mi></munder><mi mathvariant=\"normal\">∣</mi><msubsup><mi>I</mi><mn>1</mn><mi>p</mi></msubsup><mo>−</mo><msubsup><mi>I</mi><mn>2</mn><mi>p</mi></msubsup><mi mathvariant=\"normal\">∣</mi></mrow><annotation encoding=\"application/x-tex\">d_1(I_1, I_2) = \\sum_p |I_1^p - I_2^p|\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">d</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0785em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0785em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4361em;vertical-align:-1.3861em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.9em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3861em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">∣</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4337em;margin-left:-0.0785em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span><span 
style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2663em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.0486em;vertical-align:-0.2663em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4337em;margin-left:-0.0785em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2663em;\"><span></span></span></span></span></span></span><span class=\"mord\">∣</span></span></span></span></span></p>\n<p>For each pixel position p, take the absolute value of the pixel difference and sum over all positions. If two images are identical, the L1 distance is zero; the more they differ, the larger the distance.</p>\n<p><strong>L2 distance</strong> (Euclidean distance):</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>d</mi><mn>2</mn></msub><mo stretchy=\"false\">(</mo><msub><mi>I</mi><mn>1</mn></msub><mo separator=\"true\">,</mo><msub><mi>I</mi><mn>2</mn></msub><mo 
stretchy=\"false\">)</mo><mo>=</mo><msqrt><mrow><munder><mo>∑</mo><mi>p</mi></munder><mo stretchy=\"false\">(</mo><msubsup><mi>I</mi><mn>1</mn><mi>p</mi></msubsup><mo>−</mo><msubsup><mi>I</mi><mn>2</mn><mi>p</mi></msubsup><msup><mo stretchy=\"false\">)</mo><mn>2</mn></msup></mrow></msqrt></mrow><annotation encoding=\"application/x-tex\">d_2(I_1, I_2) = \\sqrt{\\sum_p (I_1^p - I_2^p)^2}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">d</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0785em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span 
class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em;\"><span style=\"top:-2.55em;margin-left:-0.0785em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:3.04em;vertical-align:-1.5742em;\"></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4658em;\"><span class=\"svg-align\" style=\"top:-5em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span class=\"mord\" style=\"padding-left:1em;\"><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.9em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3861em;\"><span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span 
style=\"top:-2.4337em;margin-left:-0.0785em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2663em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">I</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7823em;\"><span style=\"top:-2.4337em;margin-left:-0.0785em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span><span style=\"top:-3.1809em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2663em;\"><span></span></span></span></span></span></span><span class=\"mclose\"><span class=\"mclose\">)</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em;\"><span style=\"top:-2.989em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord 
mtight\">2</span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.4258em;\"><span class=\"pstrut\" style=\"height:5em;\"></span><span class=\"hide-tail\" style=\"min-width:1.02em;height:3.08em;\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"3.08em\" viewBox=\"0 0 400000 3240\" preserveAspectRatio=\"xMinYMin slice\"><path d=\"M473,2793\nc339.3,-1799.3,509.3,-2700,510,-2702 l0 -0\nc3.3,-7.3,9.3,-11,18,-11 H400000v40H1017.7\ns-90.5,478,-276.2,1466c-185.7,988,-279.5,1483,-281.5,1485c-2,6,-10,9,-24,9\nc-8,0,-12,-0.7,-12,-2c0,-1.3,-5.3,-32,-16,-92c-50.7,-293.3,-119.7,-693.3,-207,-1200\nc0,-1.3,-5.3,8.7,-16,30c-10.7,21.3,-21.3,42.7,-32,64s-16,33,-16,33s-26,-26,-26,-26\ns76,-153,76,-153s77,-151,77,-151c0.7,0.7,35.7,202,105,604c67.3,400.7,102,602.7,104,\n606zM1001 80h400000v40H1017.7z\"/></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.5742em;\"><span></span></span></span></span></span></span></span></span></span></p>\n<p>L2 is the straight-line distance in the geometric sense.</p>\n<p>A complete Nearest Neighbor classifier implementation:</p>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">import</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> numpy </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">as</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">class</span><span style=\"color:#2E8F82;--shiki-dark:#5DA994\"> NearestNeighbor</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    def</span><span 
style=\"color:#998418;--shiki-dark:#B8A965\"> __init__</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">self</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        pass</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> train</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">self</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">        \"\"\"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">Training just memorizes all the training data, which takes O(1) time</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#A65E2B;--shiki-dark:#C99076\">        self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Xtr </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span></span>\n<span class=\"line\"><span style=\"color:#A65E2B;--shiki-dark:#C99076\">        self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">ytr </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span 
style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> y</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#AB5959;--shiki-dark:#CB7676\">    def</span><span style=\"color:#59873A;--shiki-dark:#80A665\"> predict</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">self</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">        \"\"\"</span><span style=\"color:#B56959;--shiki-dark:#C98A7D\">预测：对每个测试样本，遍历训练集找到最近邻</span><span style=\"color:#B5695977;--shiki-dark:#C98A7D77\">\"\"\"</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        num_test </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">shape</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">0</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">        Ypred </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">zeros</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_test</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> dtype</span><span 
style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">ytr</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">dtype</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> i </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#998418;--shiki-dark:#B8A965\"> range</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">num_test</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # 使用 L1 距离：广播计算测试样本与所有训练样本的差值</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            distances </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">sum</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">abs</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\">self</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Xtr </span><span 
style=\"color:#AB5959;--shiki-dark:#CB7676\">-</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> X</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">i</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#999999;--shiki-dark:#666666\"> :</span><span style=\"color:#a65e2b;--shiki-dark:#d4976c\">]</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> axis</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # 找到距离最小的训练样本索引</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            min_index </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">argmin</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">distances</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">            # 输出该训练样本的标签</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">            Ypred</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">i</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\"> =</span><span style=\"color:#A65E2B;--shiki-dark:#C99076\"> self</span><span 
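style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">ytr</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">min_index</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">        return</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Ypred</span></span></code></pre>\n<p>The trade-off built into this classifier is <strong>fast training (O(1)) and slow prediction (O(N) per test sample)</strong>. Real applications need the opposite: slower training is acceptable, but prediction has to be fast, because users will not wait.</p>\n<ul>\n<li>Key differences between L1 and L2 distance:\n<ul>\n<li><strong>L1 distance depends on the choice of coordinate axes</strong>: its unit ball is a diamond aligned with the axes, so rotating the coordinate frame changes L1 distances, and its decision boundaries tend to follow the axis directions</li>\n<li><strong>L2 distance does not depend on the choice of coordinate axes</strong>: its unit ball is a circle, so rotating the coordinate frame leaves L2 distances unchanged and the decision boundaries can take any orientation</li>\n<li>Which one to use depends on what each dimension of the feature vector means: if individual dimensions carry a specific meaning, L1 may work better; if the vector is just a generic point in some space, L2 is the more natural choice</li>\n</ul>\n</li>\n</ul>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/samenorm.png\" alt=\"Sensitivity of L1 and L2 distance to the coordinate system\" /></p>\n<hr />\n<h2 id=\"k-邻近算法-k-nearest-neighbor\"><a class=\"anchor\" href=\"#k-邻近算法-k-nearest-neighbor\">#</a> K-Nearest Neighbor</h2>\n<p>The nearest-neighbor algorithm extends naturally to <strong>K-Nearest Neighbors</strong>: take the K closest training samples and let them cast a <strong>majority vote</strong>; the class with the most votes is the prediction.</p>\n<ul>\n<li>With K &gt; 1 the decision boundary becomes smoother and more robust to noise and outliers</li>\n<li>The white regions in the figure are where the K neighbors tie, so no label can be assigned. This suggests the data there is too sparse and that collecting more would help</li>\n</ul>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/nneg.jpeg\" alt=\"Decision boundaries for different values of K (K=1,3,5)\" /></p>\n<p>On <strong>CIFAR-10</strong> (10 classes, 50000 training images, 10000 test images), KNN reaches an accuracy of only about <strong>28% to 29%</strong>, somewhat better than random guessing (10%).</p>\n<p><strong>Hyperparameters</strong> are settings chosen by the user (such as the value of K or the distance function) that the model cannot learn from the data. Choosing them well is a central problem in machine learning.</p>\n<ul>\n<li>Four strategies for setting hyperparameters, each improving on the last:\n<ol>\n<li>❌ <strong>Pick the hyperparameters that perform best on the training set</strong>: the model simply &quot;memorizes&quot; the training data (with K = 1, training accuracy is always 100%), so this choice overfits badly and does not generalize to new data</li>\n<li>❌ <strong>Pick the hyperparameters that perform best on the test set</strong>: this is &quot;cheating&quot;. Information from the test set leaks into the hyperparameter choice, so we can no longer tell how the algorithm performs on truly new data</li>\n<li>✅ <strong>Split the data into train, validation, and 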
test</strong>: train with different hyperparameters on the training set, evaluate on the validation set to pick the best ones, and then <strong>run on the test set exactly once</strong> to get the final result. The test set is an extremely precious resource; never touch it before that last step. The difficulty with this approach is choosing a suitable validation split</li>\n<li>✅ <strong>K-Fold Cross-Validation</strong>: split the training data into K equal folds; each fold takes one turn as the validation set while the other K-1 folds are used for training. After K runs, average the validation accuracy. Finally, retrain on all the training data with the best hyperparameters and evaluate once on the test set. This gives more reliable results, but it is computationally expensive on large datasets and is rarely used in deep learning practice</li>\n</ol>\n</li>\n</ul>\n<pre class=\"shiki shiki-themes vitesse-light vitesse-dark\" style=\"background-color:#ffffff;--shiki-dark-bg:#121212;color:#393a34;--shiki-dark:#dbd7caee\" tabindex=\"0\"><code class=\"language-python\"><span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Validation-split example: finding the best K for KNN on CIFAR-10</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\"># Assume Xtr_rows and Ytr hold the training data</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Xval_rows </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Xtr_rows</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1000</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#999999;--shiki-dark:#666666\"> :</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">    # hold out the first 1000 as the validation set</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Yval </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Ytr</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1000</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Xtr_rows 
</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Xtr_rows</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1000</span><span style=\"color:#999999;--shiki-dark:#666666\">:,</span><span style=\"color:#999999;--shiki-dark:#666666\"> :</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">     # keep the rest as the training set</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Ytr </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Ytr</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1000</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">validation_accuracies </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span></span>\n<span class=\"line\"><span style=\"color:#1E754F;--shiki-dark:#4D9375\">for</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> k </span><span style=\"color:#1E754F;--shiki-dark:#4D9375\">in</span><span style=\"color:#999999;--shiki-dark:#666666\"> </span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">[</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\">1</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 3</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 5</span><span 
style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 10</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 20</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 50</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#2F798A;--shiki-dark:#4C9A91\"> 100</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">]</span><span style=\"color:#999999;--shiki-dark:#666666\">:</span></span>\n<span class=\"line\"><span style=\"color:#A0ADA0;--shiki-dark:#758575DD\">    # Evaluate validation accuracy for this k; predict() is assumed to accept k (the class above implements only K = 1)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    nn </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> NearestNeighbor</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    nn</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">train</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Xtr_rows</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Ytr</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    Yval_predict </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> nn</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">predict</span><span 
style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Xval_rows</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#B07D48;--shiki-dark:#BD976A\"> k</span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">k</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    acc </span><span style=\"color:#999999;--shiki-dark:#666666\">=</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> np</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">mean</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">Yval_predict </span><span style=\"color:#AB5959;--shiki-dark:#CB7676\">==</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> Yval</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span>\n<span class=\"line\"><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">    validation_accuracies</span><span style=\"color:#999999;--shiki-dark:#666666\">.</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">append</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">(</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">(</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\">k</span><span style=\"color:#999999;--shiki-dark:#666666\">,</span><span style=\"color:#393A34;--shiki-dark:#DBD7CAEE\"> acc</span><span style=\"color:#1e754f;--shiki-dark:#4d9375\">)</span><span style=\"color:#2993a3;--shiki-dark:#5eaab5\">)</span></span></code></pre>\n<p>Experiments show that the best K on CIFAR-10 is around <strong>7</strong>.</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/cvplot.png\" alt=\"Classification accuracy for different values of K under cross-validation\" /></p>\n<p><img loading=\"lazy\" 
src=\"https://cs231n.github.io/assets/crossval.jpeg\" alt=\"Diagram of K-Fold cross-validation\" /></p>\n<p><strong>Three major weaknesses of KNN</strong>:</p>\n<ul>\n<li><strong>Prediction is extremely slow</strong>: training is O(1) and prediction is O(N) per test sample, the exact opposite of what we need. What matters is test-time efficiency; users will not wait several minutes to classify one image</li>\n<li><strong>Pixel distance is not semantic distance</strong>: images that look clearly different to a human can all sit at exactly the same L2 pixel distance from a reference image. Changes in background, pose, or lighting make pixel-by-pixel comparison unreliable</li>\n<li><strong>Curse of Dimensionality</strong>: in high-dimensional spaces distance becomes counterintuitive. Points all tend to be far apart, so the nearest neighbor carries little meaning, and covering the space densely takes a number of samples that grows exponentially with the dimension</li>\n</ul>\n<p>Even so, KNN remains an excellent starting point for understanding data-driven methods and hyperparameter tuning.</p>\n<hr />\n<h2 id=\"线性分类器-linear-classifier\"><a class=\"anchor\" href=\"#线性分类器-linear-classifier\">#</a> Linear Classifier</h2>\n<p>The linear classifier is a <strong>basic building block</strong> of neural networks and convolutional networks. Large neural networks are essentially these basic units stacked layer upon layer.</p>\n<h3 id=\"核心公式\"><a class=\"anchor\" href=\"#核心公式\">#</a> Core Formula</h3>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>f</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo separator=\"true\">,</mo><mi>W</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mi>W</mi><mi>x</mi><mo>+</mo><mi>b</mi></mrow><annotation encoding=\"application/x-tex\">f(x, W) = Wx + b\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7667em;vertical-align:-0.0833em;\"></span><span class=\"mord mathnormal\" 
style=\"margin-right:0.13889em;\">W</span><span class=\"mord mathnormal\">x</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">b</span></span></span></span></span></p>\n<ul>\n<li><strong>x</strong>: the input image, flattened into a column vector. A CIFAR-10 image has 32×32×3 = <strong>3072 dimensions</strong></li>\n<li><strong>W</strong>: the <strong>weight</strong> matrix, also called the parameters. The output holds scores for 10 classes, so W has shape <strong>10 × 3072</strong></li>\n<li><strong>b</strong>: the <strong>bias</strong> vector, of shape <strong>10 × 1</strong>. It encodes a per-class preference that does not depend on the input (for example, if the dataset contains more cat images than dog images, the bias for cat will be larger)</li>\n<li><strong>f(x, W)</strong>: the output is a 10-dimensional vector in which each element is the <strong>score</strong> of one class</li>\n</ul>\n<p>Each row of W is the classifier for one class. Geometrically, a linear classifier <strong>gives each class its own hyperplane in the 3072-dimensional space, separating that class from the others</strong>.</p>\n<p>Without the bias b, every decision boundary would have to pass through the origin and the classifier would lose flexibility. For example, when all the samples of a class sit in the first quadrant, a line forced through the origin may not separate them well from the rest.</p>\n<h3 id=\"技巧将-w-和-b-合并\"><a class=\"anchor\" href=\"#技巧将-w-和-b-合并\">#</a> Trick: Merging W and b</h3>\n<p>Append a constant 1 to the end of the vector x and append b to W as an extra column, so that x becomes 3073-dimensional and W becomes 10 × 3073:</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>f</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo separator=\"true\">,</mo><mi>W</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mi>W</mi><mi>x</mi></mrow><annotation encoding=\"application/x-tex\">f(x, W) = Wx\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10764em;\">f</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" 
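style=\"margin-right:0.13889em;\">W</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mord mathnormal\">x</span></span></span></span></span></p>\n<p>This makes the formula more compact and the implementation more convenient.</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/wb.jpeg\" alt=\"Bias trick: folding b into W\" /></p>\n<h3 id=\"模板匹配视角-template-matching\"><a class=\"anchor\" href=\"#模板匹配视角-template-matching\">#</a> The Template Matching View</h3>\n<p>A linear classifier can also be understood from another angle:</p>\n<ul>\n<li>Each row of W can be reshaped to the same size as the input image (32×32×3) and viewed as a &quot;template&quot; image</li>\n<li>Linear classification then <strong>uses an inner product to compare the input image with each class template and picks the most similar one</strong></li>\n<li>Take the <strong>ship</strong> classifier as an example: the blue channel (water and sky) has many positive weights, while the red and green channels are mostly negative. The result is a blurry template of a ship outline against a blue sea</li>\n</ul>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/templates.jpg\" alt=\"Weight templates learned on CIFAR-10\" /></p>\n<h3 id=\"图像预处理\"><a class=\"anchor\" href=\"#图像预处理\">#</a> Image Preprocessing</h3>\n<p>Before feeding images to the classifier, it is recommended to normalize pixel values from [0, 255] to the range [-1, 1], which helps keep training stable.</p>\n<h3 id=\"线性分类器的局限性\"><a class=\"anchor\" href=\"#线性分类器的局限性\">#</a> Limitations of the Linear Classifier</h3>\n<p>A linear classifier can learn only <strong>one template</strong> per class, so it cannot handle multimodal data. Take classification by the parity of a pixel count as an example: the blue class occupies two opposite quadrants of the plane, and <strong>no single straight line can cover both separate blue regions at once</strong>.</p>\n<p>A linear classifier cannot solve the following three kinds of problems:</p>\n<ul>\n<li>The parity problem (classes split by odd and even counts across two opposite quadrants)</li>\n<li>The concentric-circle problem (the inner and outer rings belong to different classes)</li>\n<li>Multimodal distributions (one class spread over several non-adjacent regions of the space)</li>\n</ul>\n<p>These limitations are exactly why neural networks and nonlinear activation functions are introduced later.</p>\n<hr />\n<h2 id=\"损失函数-loss-function\"><a class=\"anchor\" href=\"#损失函数-loss-function\">#</a> Loss Function</h2>\n<p>The question now is: <strong>how do we quantify how good W is?</strong> A loss function measures how &quot;bad&quot; the classifier is. Our goal is to find a W that minimizes the loss.</p>\n<h3 id=\"多分类-svm-损失-multiclass-svm-losshinge-loss\"><a class=\"anchor\" href=\"#多分类-svm-损失-multiclass-svm-losshinge-loss\">#</a> Multiclass SVM Loss (Hinge Loss)</h3>\n<p><span 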
class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>L</mi><mi>i</mi></msub><mo>=</mo><munder><mo>∑</mo><mrow><mi>j</mi><mo mathvariant=\"normal\">≠</mo><msub><mi>y</mi><mi>i</mi></msub></mrow></munder><mi>max</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mn>0</mn><mo separator=\"true\">,</mo><msub><mi>s</mi><mi>j</mi></msub><mo>−</mo><msub><mi>s</mi><msub><mi>y</mi><mi>i</mi></msub></msub><mo>+</mo><mi mathvariant=\"normal\">Δ</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">L_i = \\sum_{j \\neq y_i} \\max(0, s_j - s_{y_i} + \\Delta)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4882em;vertical-align:-1.4382em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8479em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 
size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span><span class=\"mrel mtight\"><span class=\"mrel mtight\"><span class=\"mord vbox mtight\"><span class=\"thinbox mtight\"><span class=\"rlap mtight\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em;\"></span><span class=\"inner\"><span class=\"mord mtight\"><span class=\"mrel mtight\"></span></span></span><span class=\"fix\"></span></span></span></span></span><span class=\"mrel mtight\">=</span></span><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:-0.0359em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4382em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">max</span><span class=\"mopen\">(</span><span class=\"mord\">0</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span 
style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8694em;vertical-align:-0.2861em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:-0.0359em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span 
class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">Δ</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<ul>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>s</mi><msub><mi>y</mi><mi>i</mi></msub></msub></mrow><annotation encoding=\"application/x-tex\">s_{y_i}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7167em;vertical-align:-0.2861em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:-0.0359em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.143em;\"><span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span></span>：正确类别的得分</li>\n<li><span class=\"katex\"><span 
class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>s</mi><mi>j</mi></msub></mrow><annotation encoding=\"application/x-tex\">s_j</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7167em;vertical-align:-0.2861em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span></span></span></span>：错误类别的得分</li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"normal\">Δ</mi></mrow><annotation encoding=\"application/x-tex\">\\Delta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\">Δ</span></span></span></span>：安全间隔 margin，通常取 1</li>\n</ul>\n<p>直观理解：我们希望正确类别的得分比所有错误类别高出至少 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"normal\">Δ</mi></mrow><annotation encoding=\"application/x-tex\">\\Delta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\">Δ</span></span></span></span>。如果某个错误类别的得分与正确类别差距小于 <span class=\"katex\"><span 
class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"normal\">Δ</mi></mrow><annotation encoding=\"application/x-tex\">\\Delta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\">Δ</span></span></span></span>，则产生损失；如果差距足够大（大于 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"normal\">Δ</mi></mrow><annotation encoding=\"application/x-tex\">\\Delta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\">Δ</span></span></span></span>），则损失为零。这个损失被称为 <strong>Hinge Loss</strong>，因为它的形状像合页一样在 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"normal\">Δ</mi></mrow><annotation encoding=\"application/x-tex\">\\Delta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord\">Δ</span></span></span></span> 处弯折。</p>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/margin.jpg\" alt=\"多分类 SVM 损失示意图\" /></p>\n<h3 id=\"softmax-分类器交叉熵损失-cross-entropy-loss\"><a class=\"anchor\" href=\"#softmax-分类器交叉熵损失-cross-entropy-loss\">#</a> Softmax 分类器（交叉熵损失 Cross-Entropy Loss）</h3>\n<p>将得分转化为概率分布：</p>\n<ol>\n<li><strong>指数化</strong>：取 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msup><mi>e</mi><msub><mi>s</mi><mi>k</mi></msub></msup></mrow><annotation encoding=\"application/x-tex\">e^{s_k}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span 
class=\"strut\" style=\"height:0.6644em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6644em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em;\"><span style=\"top:-2.3488em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1512em;\"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>，确保所有值 &gt; 0</li>\n<li><strong>归一化</strong>：除以所有指数之和 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mo>∑</mo><mi>j</mi></msub><msup><mi>e</mi><msub><mi>s</mi><mi>j</mi></msub></msup></mrow><annotation encoding=\"application/x-tex\">\\sum_j e^{s_j}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.1858em;vertical-align:-0.4358em;\"></span><span class=\"mop\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.162em;\"><span style=\"top:-2.4003em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" 
style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4358em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6644em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2819em;\"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>，使所有概率之和为 1</li>\n</ol>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>P</mi><mo stretchy=\"false\">(</mo><mi>Y</mi><mo>=</mo><mi>k</mi><mi mathvariant=\"normal\">∣</mi><mi>X</mi><mo>=</mo><msub><mi>x</mi><mi>i</mi></msub><mo 
stretchy=\"false\">)</mo><mo>=</mo><mfrac><msup><mi>e</mi><msub><mi>s</mi><mi>k</mi></msub></msup><mrow><munder><mo>∑</mo><mi>j</mi></munder><msup><mi>e</mi><msub><mi>s</mi><mi>j</mi></msub></msup></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">P(Y=k|X=x_i) = \\frac{e^{s_k}}{\\sum_j e^{s_j}}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em;\">Y</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em;\">k</span><span class=\"mord\">∣</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4632em;vertical-align:-1.1218em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3414em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mop\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.162em;\"><span style=\"top:-2.4003em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4358em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6065em;\"><span style=\"top:-3.0051em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3281em;\"><span style=\"top:-2.357em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 
mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em;\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2819em;\"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">e</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6644em;\"><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em;\"><span style=\"top:-2.3488em;margin-left:0em;margin-right:0.0714em;\"><span class=\"pstrut\" style=\"height:2.5em;\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1512em;\"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1218em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p>例如模型输出 [cat: 0.13, car: 0.87, frog: 0.00]，意味着模型认为这张图是猫的概率为 13%，是车的概率为 
87%，是青蛙的概率为 0%。</p>\n<p>损失函数直接取<strong>负对数似然 Negative Log Likelihood</strong>：</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>L</mi><mi>i</mi></msub><mo>=</mo><mo>−</mo><mi>log</mi><mo>⁡</mo><mi>P</mi><mo stretchy=\"false\">(</mo><mi>Y</mi><mo>=</mo><msub><mi>y</mi><mi>i</mi></msub><mi mathvariant=\"normal\">∣</mi><mi>X</mi><mo>=</mo><msub><mi>x</mi><mi>i</mi></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">L_i = -\\log P(Y=y_i|X=x_i)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">−</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">lo<span style=\"margin-right:0.01389em;\">g</span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mopen\">(</span><span class=\"mord 
mathnormal\" style=\"margin-right:0.22222em;\">Y</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em;\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mord\">∣</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em;\">X</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>本质是在做<strong>最大似然估计 Maximum Likelihood 
Estimation</strong>——我们希望正确类别的概率越大越好。</p>\n<h3 id=\"交叉熵与-kl-散度的关系\"><a class=\"anchor\" href=\"#交叉熵与-kl-散度的关系\">#</a> 交叉熵与 KL 散度的关系</h3>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>H</mi><mo stretchy=\"false\">(</mo><mi>P</mi><mo separator=\"true\">,</mo><mi>Q</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mo>−</mo><munder><mo>∑</mo><mi>x</mi></munder><mi>P</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mi>log</mi><mo>⁡</mo><mi>Q</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">H(P, Q) = -\\sum_x P(x) \\log Q(x)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">Q</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3em;vertical-align:-1.25em;\"></span><span class=\"mord\">−</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.9em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">x</span></span></span><span style=\"top:-3.05em;\"><span 
class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">lo<span style=\"margin-right:0.01389em;\">g</span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">Q</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>D</mi><mrow><mi>K</mi><mi>L</mi></mrow></msub><mo stretchy=\"false\">(</mo><mi>P</mi><mo>∥</mo><mi>Q</mi><mo stretchy=\"false\">)</mo><mo>=</mo><munder><mo>∑</mo><mi>x</mi></munder><mi>P</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo><mi>log</mi><mo>⁡</mo><mfrac><mrow><mi>P</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow><mrow><mi>Q</mi><mo stretchy=\"false\">(</mo><mi>x</mi><mo stretchy=\"false\">)</mo></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">D_{KL}(P \\parallel Q) = \\sum_x P(x) \\log \\frac{P(x)}{Q(x)}\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"msupsub\"><span class=\"vlist-t 
vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.07153em;\">K</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">Q</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.677em;vertical-align:-1.25em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.9em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">x</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em;\"><span></span></span></span></span></span><span class=\"mspace\" 
style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">lo<span style=\"margin-right:0.01389em;\">g</span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.427em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">Q</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">x</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.936em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>H</mi><mo stretchy=\"false\">(</mo><mi>P</mi><mo separator=\"true\">,</mo><mi>Q</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mi>H</mi><mo stretchy=\"false\">(</mo><mi>P</mi><mo 
stretchy=\"false\">)</mo><mo>+</mo><msub><mi>D</mi><mrow><mi>K</mi><mi>L</mi></mrow></msub><mo stretchy=\"false\">(</mo><mi>P</mi><mo>∥</mo><mi>Q</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">H(P, Q) = H(P) + D_{KL}(P \\parallel Q)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord mathnormal\">Q</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em;\">H</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em;\">D</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em;\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.07153em;\">K</span><span class=\"mord mathnormal mtight\">L</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">∥</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">Q</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<ul>\n<li>P 是真实概率分布，Q 是模型预测的概率分布</li>\n<li>H(P) 是真实分布的熵（常数，真实标签固定不变）</li>\n<li><strong>交叉熵 = KL 散度 + 常数</strong>，所以最小化交叉熵就是在最小化 KL 散度，让预测分布逼近真实分布</li>\n</ul>\n<p><img loading=\"lazy\" src=\"https://cs231n.github.io/assets/svmvssoftmax.png\" alt=\"SVM vs Softmax 损失函数对比\" /></p>\n<h3 id=\"两个实用的-debug-问题\"><a class=\"anchor\" href=\"#两个实用的-debug-问题\">#</a> 两个实用的 Debug 问题</h3>\n<p><strong>Q1：Softmax 损失函数的最大值是多少？</strong></p>\n<p>理论上为<strong>无穷大</strong>。当正确类别的概率趋近于 0 时，<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo>−</mo><mi>log</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mn>0</mn><mo stretchy=\"false\">)</mo><mo>→</mo><mi mathvariant=\"normal\">∞</mi></mrow><annotation encoding=\"application/x-tex\">-\\log(0) \\to \\infty</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">−</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">lo<span style=\"margin-right:0.01389em;\">g</span></span><span class=\"mopen\">(</span><span class=\"mord\">0</span><span class=\"mclose\">)</span><span 
class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em;\"></span><span class=\"mord\">∞</span></span></span></span>。数学上，只要得分有限，softmax 概率恒大于零，因此损失始终有限，但没有上界、可以任意大；实际计算中浮点下溢可能使概率变为 0，此时损失为 inf。</p>\n<p><strong>Q2：初始化时所有权重为小随机数，所有 <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>s</mi><mi>i</mi></msub></mrow><annotation encoding=\"application/x-tex\">s_i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">s</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span></span></span></span> 近似相等，损失函数的值是多少？</strong></p>\n<p>所有类别概率相等，<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>P</mi><mo>=</mo><mn>1</mn><mi mathvariant=\"normal\">/</mi><mi>C</mi></mrow><annotation encoding=\"application/x-tex\">P = 1/C</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">P</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">1/</span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span></span></span></span>：</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>L</mi><mi>i</mi></msub><mo>=</mo><mo>−</mo><mi>log</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mn>1</mn><mi mathvariant=\"normal\">/</mi><mi>C</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mi>log</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mi>C</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">L_i = -\\log(1/C) = \\log(C)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord\">−</span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\">lo<span 
style=\"margin-right:0.01389em;\">g</span></span><span class=\"mopen\">(</span><span class=\"mord\">1/</span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mop\">lo<span style=\"margin-right:0.01389em;\">g</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em;\">C</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<p>当 C = 10 时，<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>L</mi><mi>i</mi></msub><mo>=</mo><mi>log</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mn>10</mn><mo stretchy=\"false\">)</mo><mo>≈</mo><mn>2.3</mn></mrow><annotation encoding=\"application/x-tex\">L_i = \\log(10) \\approx 2.3</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mop\">lo<span style=\"margin-right:0.01389em;\">g</span></span><span class=\"mopen\">(</span><span class=\"mord\">10</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">≈</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em;\"></span><span class=\"mord\">2.3</span></span></span></span>（这里的 log 是自然对数）。这是一个很有用的 sanity check：训练刚开始时，如果不含正则化项的数据损失偏离这个值太多，说明实现中很可能有 bug。</p>\n<h3 id=\"正则化-regularization\"><a class=\"anchor\" href=\"#正则化-regularization\">#</a> 正则化 Regularization</h3>\n<p>完整损失函数 = <strong>数据损失 Data Loss + 正则化项 Regularization</strong>：</p>\n<p><span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>L</mi><mo>=</mo><mfrac><mn>1</mn><mi>N</mi></mfrac><munder><mo>∑</mo><mi>i</mi></munder><msub><mi>L</mi><mi>i</mi></msub><mo>+</mo><mi>λ</mi><mi>R</mi><mo stretchy=\"false\">(</mo><mi>W</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">L = \\frac{1}{N} \\sum_i L_i + \\lambda R(W)\n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em;\"></span><span class=\"mord mathnormal\">L</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.5991em;vertical-align:-1.2777em;\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:1.3214em;\"><span style=\"top:-2.314em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10903em;\">N</span></span></span><span style=\"top:-3.23em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em;\"></span></span><span style=\"top:-3.677em;\"><span class=\"pstrut\" style=\"height:3em;\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em;\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em;\"><span style=\"top:-1.8723em;margin-left:0em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span><span style=\"top:-3.05em;\"><span class=\"pstrut\" style=\"height:3.05em;\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.2777em;\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\">L</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em;\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\">λ</span><span class=\"mord mathnormal\" style=\"margin-right:0.00773em;\">R</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mclose\">)</span></span></span></span></span></p>\n<ul>\n<li><strong>Data loss</strong>: fits the model to the training data by minimizing prediction error</li>\n<li><strong>Regularization term</strong>: penalizes complex models and favors simpler weights to prevent overfitting</li>\n<li><strong><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>λ</mi></mrow><annotation encoding=\"application/x-tex\">\\lambda</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em;\"></span><span class=\"mord mathnormal\">λ</span></span></span></span></strong>: a hyperparameter that balances the two objectives</li>\n</ul>\n<p>Common forms of regularization:</p>\n<ul>\n<li><strong>L2 regularization</strong>: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>R</mi><mo stretchy=\"false\">(</mo><mi>W</mi><mo stretchy=\"false\">)</mo><mo>=</mo><msub><mo>∑</mo><mi>k</mi></msub><msub><mo>∑</mo><mi>l</mi></msub><msubsup><mi>W</mi><mrow><mi>k</mi><mo separator=\"true\">,</mo><mi>l</mi></mrow><mn>2</mn></msubsup></mrow><annotation encoding=\"application/x-tex\">R(W) = \\sum_k \\sum_l W_{k,l}^2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.00773em;\">R</span><span 
class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.2333em;vertical-align:-0.4192em;\"></span><span class=\"mop\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1864em;\"><span style=\"top:-2.4003em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2997em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1864em;\"><span style=\"top:-2.4003em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2997em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t 
vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em;\"><span style=\"top:-2.4169em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span style=\"top:-3.063em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.4192em;\"><span></span></span></span></span></span></span></span></span></span>, which encourages weights to spread across all dimensions and penalizes individual overly large weights</li>\n<li><strong>L1 regularization</strong>: <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>R</mi><mo stretchy=\"false\">(</mo><mi>W</mi><mo stretchy=\"false\">)</mo><mo>=</mo><msub><mo>∑</mo><mi>k</mi></msub><msub><mo>∑</mo><mi>l</mi></msub><mi mathvariant=\"normal\">∣</mi><msub><mi>W</mi><mrow><mi>k</mi><mo separator=\"true\">,</mo><mi>l</mi></mrow></msub><mi mathvariant=\"normal\">∣</mi></mrow><annotation encoding=\"application/x-tex\">R(W) = \\sum_k \\sum_l |W_{k,l}|</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em;\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.00773em;\">R</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em;\"></span><span class=\"mrel\">=</span><span class=\"mspace\" 
style=\"margin-right:0.2778em;\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.0497em;vertical-align:-0.2997em;\"></span><span class=\"mop\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1864em;\"><span style=\"top:-2.4003em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2997em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mop\"><span class=\"mop op-symbol small-op\" style=\"position:relative;top:0em;\">∑</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1864em;\"><span style=\"top:-2.4003em;margin-left:0em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2997em;\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em;\"></span><span class=\"mord\">∣</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em;\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em;\"><span class=\"pstrut\" style=\"height:2.7em;\"></span><span 
class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em;\">k</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em;\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em;\"><span></span></span></span></span></span></span><span class=\"mord\">∣</span></span></span></span>, which encourages sparse weights</li>\n<li><strong>Elastic Net</strong>: a combination of L1 and L2</li>\n</ul>\n<hr />\n<h2 id=\"声明\"><a class=\"anchor\" href=\"#声明\">#</a> Disclaimer</h2>\n<p>This blog post was compiled by Yumengmeng from the video lectures of <a href=\"https://www.bilibili.com/video/BV1YJ3PzLEiW?spm_id_from=333.788.videopod.episodes&amp;vd_source=9f80ac68a038439c43f542a83ffa7b69&amp;p=3\">Fei-Fei Li's Stanford CS231n computer vision course (Spring 2025)</a>, combined with open-source notes gathered from the web using Claude Code, then polished and typeset. It is intended for personal review only.</p>\n",
            "tags": [
                "CS231n学习笔记",
                "CS231n",
                "计算机视觉",
                "深度学习"
            ]
        }
    ]
}