AI Engineer · LLM Systems · Multi-Agent

Hi, I'm Do Pho Hieu Anh.

Final-year Computer Science student building self-hosted, production-grade AI ecosystems — from LangGraph ReAct orchestration and Model Context Protocol microservices, to hybrid retrieval, GPU-accelerated OCR, and hallucination guardrails. Zero managed cloud AI required.

📍 Hanoi, Vietnam 🎂 Born 11 May 2004 🎓 NEU · Computer Science · 2022–2026 📊 GPA 3.32 / 4.0 🏆 2nd · AI Olympics 2025 (NCT)
12 Docker services
3 MCP microservices
~75% VRAM saved (NF4)
119 credits earned
About me

Translating AI research into shipping systems.

I designed and implemented every layer of a multi-agent system from the ground up — agent orchestration, tool protocol, retrieval, vision, and observability. My strongest competency is taking cutting-edge research and turning it into containerized, deployment-ready infrastructure that runs entirely on private hardware.

Tech stack

Technical skills

Agent & Reasoning
LangGraph (ReAct) · Model Context Protocol · Multi-step Tool Calling · CoT / Few-shot · Loop Detection & Budgets
LLM & Inference
HuggingFace Transformers · 4-bit NF4 (bitsandbytes) · Mistral-7B · Llama-3 · Qwen · bfloat16
RAG & Search
Hybrid (BM25 + KNN) · Elasticsearch 8.x · ChromaDB · Sentence Transformers · Reranking
Computer Vision / OCR
GOT-OCR 2.0 · PyMuPDF · pdf2image / Tesseract · PIL preprocessing
Quality & Guardrails
BERTScore · LLaMA-1B Inspector · Fuzzy Duplicate Detection · Unicode Normalization
Web Intelligence
Firecrawl + Playwright · SearXNG · RabbitMQ · Markdown Extraction
Backend & API
FastAPI · httpx / asyncio · REST microservices · LangGraph state machines
Databases & Caching
MongoDB · Redis · Elasticsearch
Infra & DevOps
Docker / Compose · Nginx Reverse Proxy · NVIDIA GPU pinning · Arize Phoenix (OTEL)
AI workflow

AI Tools & IDEs I Use Daily

The AI toolchain and IDEs I use every day for research, coding, and model ops — combining CLI coding agents, local LLM runtimes, knowledge tools, and cross-platform development environments.

AGENT CLI

Coding Agents

  • Claude Code — Anthropic's official agentic CLI for refactoring, code generation, and orchestrating multi-file tasks directly in the terminal.
  • OpenClaw — Open-source CLI agent used as an alternative to Claude Code for local automation pipelines.
  • Cline + Local LLM Coder — Cline (VS Code) paired with a local code model (Qwen-Coder / DeepSeek-Coder via Ollama) for fully offline coding with no cloud dependency.
LOCAL LLM

Local Model Runtimes

  • Ollama — Local LLM runtime; pulls and serves 7B–14B models (Qwen, Llama, Mistral) over a REST API on the RTX 5080.
  • LM Studio — Desktop GUI for running GGUF models offline; used for quick benchmarks and prompt iteration before promoting to the production pipeline.
RESEARCH

Knowledge & Research

  • Perplexity — Cited-source AI search; used for rapid surveys of new papers, frameworks, and best practices.
  • NotebookLM — Google's grounded notebook; condenses internal docs (PDFs, slides) into source-cited briefs for fast onboarding.
IDE

Development Environments

  • VS Code — Primary IDE for Python / TypeScript / Docker, integrated with Cline, Copilot, and remote-WSL.
  • Antigravity IDE — Google's agentic IDE; experimented with for multi-agent workflows on the same codebase.
  • Android Studio — Building and debugging Android apps, especially for the on-device AI products on my roadmap.
Self-hosted infrastructure

Server & Hardware Management

My entire multi-agent system runs directly on a single personal workstation: no EC2, no vendor lock-in. I administer every layer myself, from the OS, GPU driver, and container runtime up to observability, using WSL2 as a production-grade Linux environment on Windows.

WORKSTATION

Personal AI Rig — 24/7 Inference Node

  • CPU: Intel Core Ultra 7 265K · 20 cores / 20 threads
  • RAM: 32 GB DDR5
  • GPU: NVIDIA RTX 5080 · 16 GB VRAM (Blackwell, CUDA 12.x)
  • Storage: 500 GB NVMe SSD
  • Host OS: Windows 11
  • Dev/Runtime: WSL2 · Ubuntu 22.04 LTS
OPS

Operations & Tuning

  • VRAM budgeting on 16 GB: 4-bit NF4 quantization + bfloat16 to fit Mistral-7B + LLaMA-3.2-1B + GOT-OCR concurrently.
  • GPU pinning via Docker --gpus & CUDA_VISIBLE_DEVICES; isolated per-service to prevent OOM cascades.
  • WSL2 tuning: custom .wslconfig (memory cap, swap, kernel args) + systemd enabled for clean Docker/Nginx services.
  • NVIDIA Container Toolkit bridging Windows driver → WSL2 → Docker, validated with nvidia-smi inside containers.
  • Reverse proxy: Nginx terminates traffic, routes to FastAPI / 3 MCP services / Phoenix UI on a private Docker network.
  • Observability: Arize Phoenix (OTEL) traces every ReAct step; Docker healthchecks + restart policies for unattended operation.
  • Storage hygiene on 500 GB: volume-mounted model cache, periodic prune of dangling images/layers, logs rotated.
Selected work

Projects

FLAGSHIP

AIchat — Self-Hosted Multi-Agent AI Ecosystem

2024 – 2025 · Python · LangGraph · MCP · Docker · CUDA

A Vietnamese-native, on-prem agentic RAG platform: a LangGraph ReAct loop driving Qwen2.5-7B across three Model Context Protocol microservices (Local Data, Web Agent, OCR Vision), with hybrid retrieval, GPU-accelerated OCR, and BERTScore hallucination middleware — wired together as a 12-container Docker Compose stack. No external LLM keys.

Agent Orchestration (ReAct)

  • Multi-node LangGraph StateGraph driving a full ReAct loop with structured output parsing.
  • 3-layer loop prevention: tool budget, exact-duplicate, and fuzzy duplicate (Unicode normalization + sorted word sets).
  • Auto web-search intent detection with 35+ keyword patterns enabling zero-config search routing.
  • Force-Final-Answer injection & server-side query sanitization protecting internal metadata.
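
The three loop-prevention layers listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `LoopGuard` class, the `normalize_call` helper, the default budget of 5 calls, and the choice of NFKC normalization are all assumptions made for the example.

```python
import unicodedata


def normalize_call(tool: str, query: str) -> tuple[str, tuple[str, ...]]:
    """Build a fuzzy key: Unicode-normalize, casefold, then sort the unique words."""
    text = unicodedata.normalize("NFKC", query).casefold()
    return tool, tuple(sorted(set(text.split())))


class LoopGuard:
    """Hypothetical guard combining a tool budget with exact and fuzzy duplicate checks."""

    def __init__(self, max_calls: int = 5):
        self.max_calls = max_calls  # assumed budget; the real limit is not stated
        self.calls = 0
        self.exact = set()
        self.fuzzy = set()

    def check(self, tool: str, query: str) -> str:
        """Return "ok" and record the call, or return the reason it is blocked."""
        if self.calls >= self.max_calls:
            return "budget_exhausted"
        if (tool, query) in self.exact:
            return "exact_duplicate"
        key = normalize_call(tool, query)
        if key in self.fuzzy:
            return "fuzzy_duplicate"
        self.calls += 1
        self.exact.add((tool, query))
        self.fuzzy.add(key)
        return "ok"
```

With this sketch, a query that only reorders words, changes case, or uses a decomposed form of an accented character maps to the same fuzzy key as the original call and is rejected before the tool runs again.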

MCP Architecture

  • Decoupled reasoning from tool execution via 3 MCP services: Local Data (:8011), Web Agent (:8012), OCR Vision (:8013).
  • Runtime tool discovery via /tools/list, building a live routing map per user with graceful degradation.
  • Hard security blocks at the tool-node level to strictly control internet-search access.
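
One way to picture the discovery step described above is the sketch below. It is only an illustration under assumptions: the `localhost` hostnames, the response shape (a list of objects with a `name` key), the `build_routing_map` function, and the `blocked_services` parameter are hypothetical; only the ports and the `/tools/list` path come from the text.

```python
from typing import Callable, Dict, FrozenSet, List

# Ports are taken from the list above; hostnames are placeholders.
SERVICES: Dict[str, str] = {
    "local_data": "http://localhost:8011",
    "web_agent": "http://localhost:8012",
    "ocr_vision": "http://localhost:8013",
}


def build_routing_map(
    fetch_tools: Callable[[str], List[dict]],
    services: Dict[str, str] = SERVICES,
    blocked_services: FrozenSet[str] = frozenset(),
) -> Dict[str, str]:
    """Map each discovered tool name to the base URL of the service exposing it."""
    routing: Dict[str, str] = {}
    for name, base_url in services.items():
        if name in blocked_services:
            continue  # hard block: this user never sees the service's tools
        try:
            tools = fetch_tools(f"{base_url}/tools/list")
        except Exception:
            continue  # graceful degradation: skip services that are down
        for tool in tools:
            routing[tool["name"]] = base_url
    return routing
```

The fetcher is injected so the mapping logic can be exercised without a network. In a running system it could be something like `lambda url: httpx.get(url, timeout=2.0).json()`, assuming the endpoint returns a JSON list.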

Hybrid Search & RAG

  • Elasticsearch 8.x hybrid search: dense KNN combined with BM25 in a single query.
  • Ingestion supporting digital extraction + OCR fallback for scanned docs with adaptive DPI.
  • Dual-write to Chroma & Elasticsearch with thread-level metadata filtering.
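
A single-request hybrid query of the kind described above might look like the following sketch. The field names (`content`, `embedding`, `thread_id`), the `hybrid_search_body` function, and the `num_candidates` heuristic are assumptions for illustration; the overall shape relies on the Elasticsearch 8.x search API accepting a top-level `knn` section alongside `query`.

```python
from typing import Any, Dict, List


def hybrid_search_body(
    query_text: str,
    query_vector: List[float],
    thread_id: str,
    k: int = 5,
) -> Dict[str, Any]:
    """Build one search body that combines BM25 and approximate kNN."""
    # Same thread-level filter applied to both the lexical and the vector side.
    thread_filter = {"term": {"thread_id": thread_id}}
    return {
        "size": k,
        # Lexical side: BM25 scoring on the text field.
        "query": {
            "bool": {
                "must": [{"match": {"content": query_text}}],
                "filter": [thread_filter],
            }
        },
        # Dense side: kNN over the embedding field.
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": max(50, k * 10),
            "filter": thread_filter,
        },
    }
```

The resulting dictionary can be sent as the body of a `_search` request. How the two scores are weighted against each other (for example with `boost`) is left out here because the source does not specify it.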

Vision & Hallucination Guardrails

  • GOT-OCR 2.0 on GPU containers with aggressive cleaners filtering repetition loops & junk sequences.
  • BERTScore middleware: scores each Final Answer against the tool observations and intercepts answers with F1 < 0.15.
  • SearXNG + Firecrawl for multi-threaded scraping into clean Markdown via RabbitMQ queue.
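
The threshold check behind the hallucination middleware can be sketched as below. Only the 0.15 cutoff comes from the text. The `guard_final_answer` function, the fallback message, and the choice to take the best score across observations are assumptions, and `token_overlap_f1` is a deliberately simple stand-in so the example runs without downloading a model; it is not BERTScore.

```python
from typing import Callable, Sequence, Tuple

F1_THRESHOLD = 0.15  # cutoff stated in the list above


def guard_final_answer(
    answer: str,
    observations: Sequence[str],
    score_f1: Callable[[str, str], float],
    threshold: float = F1_THRESHOLD,
    fallback: str = "I could not ground this answer in the retrieved sources.",
) -> Tuple[str, float]:
    """Return (answer, score) if grounded, else (fallback, score)."""
    if not observations:
        return fallback, 0.0
    # Assumption: an answer counts as grounded if any single observation supports it.
    best = max(score_f1(answer, obs) for obs in observations)
    return (answer if best >= threshold else fallback), best


def token_overlap_f1(candidate: str, reference: str) -> float:
    """Stand-in scorer for the demo only: F1 over unique lowercase tokens."""
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    common = len(cand & ref)
    if common == 0:
        return 0.0
    precision, recall = common / len(cand), common / len(ref)
    return 2 * precision * recall / (precision + recall)
```

In a real deployment the scorer would presumably wrap the `bert_score` package, whose `score(cands, refs, lang=...)` call returns precision, recall, and F1 tensors; the injection point keeps that dependency out of the guard logic itself.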

Infrastructure & Performance

  • 4-bit NF4 + bfloat16 reduced VRAM usage by ≈75% while preserving quality.
  • Dual-model architecture: Mistral-7B for generation + LLaMA-3.2-1B as a semantic content filter.
  • Full stack via Docker Compose, Nginx proxy, GPU pinning, and Arize Phoenix tracing.
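
The ≈75% figure is consistent with simple weight-storage arithmetic, sketched below. This is a back-of-the-envelope estimate under assumptions: a 16-bit baseline, a nominal 7e9 parameter count, and weights only (quantization constants, activations, and KV cache are ignored). The function names are illustrative.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def savings_fraction(baseline_bits: float, quant_bits: float) -> float:
    """Fraction of weight memory saved when moving to fewer bits per weight."""
    return 1 - quant_bits / baseline_bits


if __name__ == "__main__":
    n_params = 7e9  # nominal size of a "7B" model; an assumption, not a measured count
    fp16 = weight_memory_gib(n_params, 16)
    nf4 = weight_memory_gib(n_params, 4)
    print(f"16-bit weights: {fp16:.2f} GiB")
    print(f"4-bit weights:  {nf4:.2f} GiB")
    print(f"saved:          {savings_fraction(16, 4):.0%}")
```

In Hugging Face Transformers, this kind of loading is typically requested with `BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16)`; whether the project uses exactly these settings is not stated in the source.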
RESEARCH

Transformer Research & Advanced Prompt Engineering

2023 · Python · PyTorch · Transformers
  • Deep-dived into Transformer internals (Self-Attention, Multi-Head Attention) to understand failure modes.
  • Benchmarked CoT and Few-shot prompting; findings applied directly to AIchat's ReAct system prompts.
ROADMAP

Upcoming AI Products

A pipeline of AI-native products in active design — each solving a real problem with on-prem or edge AI.

  • 📵 AI Child-Safe Phones: On-device content filtering & screen-time AI for kids
  • 🆘 AI SOS: Emergency detection & auto-alert system using edge AI
  • 📈 AI Sales & Marketing: Agentic campaign builder & lead intelligence platform
  • 📚 AI Book: AI-assisted authoring & interactive reading experience
  • 🗣️ AI English: Adaptive English tutor with pronunciation & grammar AI
Education

Education & Achievements

B.Sc. Computer Science — National Economics University (NEU)

Faculty of IT · CS-64 · 2022 – 2026 · Full-time
  • GPA: 3.32 / 4.0  ·  8.01 / 10
  • Total credits earned: 119
  • Conduct: Excellent (Tốt)
  • Student ID: 11220583

Selected high-grade courses

Graduation Thesis — CS · 9.80 · A+
AI in Business & Management · 9.70 · A+
Object-Oriented Programming · 9.50 · A+
CS Capstone Project · 9.50 · A+
.NET Programming · 9.20 · A+
Operating Systems · 9.00 · A+
Database Management Systems · 9.00 · A+
Modern IT Technologies · 9.00 · A+
Application Programming · 8.90 · A
Web Design · 8.90 · A
Unstructured Data · 8.80 · A
Web Programming · 8.80 · A
Computer Networks · 8.70 · A
Mobile App Development · 8.50 · A
Machine Learning · 8.40 · B+
Java Programming · 8.20 · B+

Achievements & Languages

Work Experience

IT Staff — Công ty TNHH MH Rental · 2026 – Present

Managing internal IT infrastructure, network systems, and deploying AI-powered tools for operational efficiency.

Marketing Collaborator — SEC English Center · 2023 – 2024

Executed digital marketing campaigns across social media, increasing brand awareness and student enrollment.

References

Dr. Luu Minh Tuan — Vice Dean, Faculty of IT, NEU
Lmtneu@gmail.com · (+84) 904 143 460

Contact

Let's build something.

I'm open to AI engineering roles, research collaborations, and ambitious on-prem AI projects. The fastest way to reach me is email.

📍 41 Pho Vong, Hai Ba Trung, Hanoi