How I Organized 3,674 Obsidian Vault Files in One Day with Claude Code

Published: February 16, 2026 at 08:59 AM EST
8 min read
Source: Dev.to

Introduction

I had 3,674 Markdown files exported from Evernote, Apple Notes, and Apple Journal sitting in my Obsidian Vault. No front‑matter, no tags, no classification. Web clippings were mixed with my own writing, and duplicate files were everywhere.
I organized all of it in one day using Claude Code (Opus 4.6).

| Metric | Before | After |
| --- | --- | --- |
| Total files | 3,674 | ~1,000 |
| Files with front‑matter | 0 | All |
| Duplicate files | 2,751 | 0 |
| MOCs (Maps of Content) | 0 | 5 |
| Plugins (configured) | 0 | 10 |
| Templates | 0 | 6 |

Obsidian has a rich ecosystem of community plugins, but some jobs are not feasible through GUI operations alone: bulk‑inserting front‑matter into 3,674 files, tagging based on folder structure, detecting and removing duplicates, and batch‑editing plugin settings in JSON.

With Claude Code I could drive all of this from the shell:

  • Generate and run Python scripts – write classification logic in Python and execute immediately.
  • Per‑file content judgment – read 100+ files one by one in mixed folders and classify them.
  • JSON editing of plugin settings – batch‑update configurations for 10 plugins.
  • Rapid iteration – fix a failing script and re‑run instantly.

In short, “applying a mix of rule‑based and judgment‑based processing to a large number of files” is exactly where Claude Code excels.

Front‑matter insertion

I had Claude Code generate and run a Python script to insert front‑matter with the following structure into 3,491 Markdown files:

---
category: tech        # tech / personal / creative / reference
type: fleeting        # fleeting / literature / permanent / moc
status: draft         # draft / review / done
tags:
  - journal
  - reflection
source: evernote      # evernote / apple-notes / apple-journal
date: "2020-01-15"
---

I defined folder‑to‑tag mappings for 26 folders, and bulk_frontmatter.py processed all 3,491 files with zero errors.
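A minimal sketch of what such a bulk insertion can look like. This is not the author's bulk_frontmatter.py: the FOLDER_RULES mapping, the function names, and the fixed type/status defaults are all assumptions for illustration, and the date field is left out because the article does not say how dates were derived.

```python
from pathlib import Path

# Hypothetical folder rules; the real mapping in the article covered 26 folders.
FOLDER_RULES = {
    "Journal": {"category": "personal", "tags": ["journal", "reflection"]},
    "Study": {"category": "reference", "tags": ["study"]},
}


def build_front_matter(category: str, tags: list, source: str) -> str:
    """Render a YAML front-matter block in the layout shown above."""
    lines = ["---", f"category: {category}", "type: fleeting", "status: draft", "tags:"]
    lines += [f"  - {tag}" for tag in tags]
    lines += [f"source: {source}", "---", ""]
    return "\n".join(lines)


def add_front_matter(path: Path, rule: dict, source: str = "evernote") -> bool:
    """Prepend front-matter unless the file already starts with a '---' block."""
    text = path.read_text(encoding="utf-8")
    if text.startswith("---"):
        return False
    header = build_front_matter(rule["category"], rule["tags"], source)
    path.write_text(header + "\n" + text, encoding="utf-8")
    return True


def process_vault(vault: Path) -> int:
    """Apply the folder rules to every Markdown file; return the number of files changed."""
    changed = 0
    for md in sorted(vault.rglob("*.md")):
        rule = FOLDER_RULES.get(md.parent.name)
        if rule is not None and add_front_matter(md, rule):
            changed += 1
    return changed
```

Because files that already start with a front‑matter block are skipped, re‑running the script after a partial failure does not insert a second block.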

macOS filename normalization

Files exported from Apple Notes had mysterious filename‑matching failures. The cause was a Unicode normalization mismatch: macOS returned the filenames in NFD (Normalization Form D, decomposed), while the string literals in my Python scripts were NFC (composed).

# macOS filesystem stores filenames in NFD
# Python string literals are NFC
# → They don't match

import unicodedata

# Solution: Normalize to NFC before comparison
normalized = unicodedata.normalize("NFC", filename)

Tip: If you’re writing Python scripts that handle non‑ASCII filenames on macOS, normalizing with unicodedata.normalize("NFC", name) is essential. This applies to any file operation, not just Obsidian.
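To make the mismatch concrete, here is a small self‑contained example. The helper names nfc, same_filename, and index_by_nfc are made up for illustration; only unicodedata.normalize comes from the standard library.

```python
import unicodedata


def nfc(name: str) -> str:
    """Return the NFC (composed) form of a filename."""
    return unicodedata.normalize("NFC", name)


def same_filename(a: str, b: str) -> bool:
    """Compare two filenames regardless of their normalization form."""
    return nfc(a) == nfc(b)


def index_by_nfc(names: list) -> dict:
    """Build a lookup table keyed by NFC form, mapping to the name as stored on disk."""
    return {nfc(name): name for name in names}


composed = "\u30ac.md"          # "ガ" as a single code point (NFC)
decomposed = "\u30ab\u3099.md"  # "カ" plus a combining voiced sound mark (NFD)

print(composed == decomposed)               # False: a naive comparison fails
print(same_filename(composed, decomposed))  # True: equal after normalization
```

Keying dictionaries by the NFC form, as index_by_nfc does, avoids the same mismatch in lookups while keeping the on‑disk name available for the actual file operation.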

Duplicate & low‑content removal

  • Evernote exports generate massive numbers of duplicate files following the 2.md pattern. I detected them with a regex, verified diffs against originals, and deleted them.
  • I also removed files with extremely low character counts (≤ 50 characters, stats‑only tables, etc.). Thresholds varied by folder—journal folders were excluded since even short entries have value.
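A hedged sketch of the duplicate check. The article does not spell out the exact filename pattern, so the regex below (a name ending in a space and a number, such as "Note 2.md" next to "Note.md") is an assumption, as are the function names. It also swaps the diff review described above for a stricter byte‑for‑byte hash comparison, and it only reports candidates instead of deleting them.

```python
import hashlib
import re
from pathlib import Path

# Assumed naming scheme: "Note 2.md" is a copy of "Note.md". Adjust to the real export.
DUPLICATE_NAME = re.compile(r"^(?P<stem>.+) \d+\.md$")


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used to confirm two files are identical."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_exact_duplicates(folder: Path) -> list:
    """Return numbered copies whose content is byte-identical to their original."""
    duplicates = []
    for candidate in sorted(folder.glob("*.md")):
        match = DUPLICATE_NAME.match(candidate.name)
        if not match:
            continue
        original = candidate.with_name(match.group("stem") + ".md")
        if original.exists() and file_digest(original) == file_digest(candidate):
            duplicates.append(candidate)
    return duplicates
```

Numbered files that differ from their original, or that have no original at all, are left out of the result so they can be reviewed by hand.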

Failed approach: Hiragana‑ratio heuristic

I hypothesized that my own writing would have a higher hiragana ratio and tried filtering on that basis. It failed because Japanese web articles also have high hiragana ratios, causing massive misclassification.

# This didn't work
def is_personal(text: str) -> bool:
    hiragana = sum(1 for c in text if '\u3040' <= c <= '\u309f')
    ratio = hiragana / max(len(text), 1)  # guard against empty files
    return ratio > 0.3  # No threshold gave acceptable accuracy

Successful approach: Folder‑level classification

In the end, the Evernote‑era folder structure was the most reliable classification criterion.

  • Folders clearly containing my own writing (journals, reflections) were kept.
  • Web‑clip‑heavy folders (Study, Lifehack, etc.) were moved to an archive wholesale.

Only mixed folders (e.g., Inbox) required Claude Code to read and judge files individually. Out of 124 files, 47 were classified as personal writing and 77 as web clips, with high accuracy.

Lesson: Distinguishing content origin by statistical features of natural language is difficult. Metadata (folder structure, filename patterns) is far more reliable.

Plugin configuration

After the vault structure settled, I installed and configured the following plugins:

| Plugin | Purpose | Configuration Highlight |
| --- | --- | --- |
| Dataview | Metadata‑based queries & dashboards | JS queries & inline enabled |
| Linter | Auto‑format front‑matter | Fixed YAML key order, duplicate tag removal |
| Templater | Template engine | Folder‑template linking |
| Tag Wrangler | Bulk tag rename & merge | Organized 77 tags |
| Calendar | Calendar view | Japanese locale, week starts Monday |
| Auto Note Mover | Auto‑sort new notes | Funnel to Inbox, exclude legacy folders |
| QuickAdd | Quick capture | 3 purpose‑specific commands |
| Graph | Knowledge graph visualization | Color‑coding by folder |
| Smart Connections | Semantic search & related notes | Switched to multilingual model (see below) |
| Periodic Notes | Daily/Weekly/Monthly notes | Template integration |

Obsidian stores plugin settings in .obsidian/plugins/{plugin‑name}/data.json. Editing this JSON directly with Claude Code saves the tedium of clicking through the GUI. However, there’s an important caveat:

Do not edit settings while Obsidian is running. Obsidian holds settings in memory, so CLI edits will be overwritten on the next save. Always quit Obsidian before editing.
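One way to script such an edit is sketched below. The helper names are hypothetical, and the script cannot tell whether Obsidian is running, so quitting the app first remains a manual step. It keeps a .bak copy next to the file and merges nested keys instead of overwriting the whole document.

```python
import json
import shutil
from pathlib import Path


def merge_settings(current: dict, updates: dict) -> dict:
    """Recursively merge updates into current without dropping untouched keys."""
    merged = dict(current)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = value
    return merged


def update_plugin_settings(data_json: Path, updates: dict) -> dict:
    """Back up a settings file, apply the updates, and write the result back."""
    current = {}
    if data_json.exists():
        current = json.loads(data_json.read_text(encoding="utf-8"))
        # Keep a copy such as data.json.bak so the edit is easy to undo.
        shutil.copy2(data_json, data_json.with_name(data_json.name + ".bak"))
    merged = merge_settings(current, updates)
    data_json.write_text(
        json.dumps(merged, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return merged
```

The keys passed in updates depend entirely on the plugin; check the existing data.json for the real names before writing anything.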

Vault root confusion

During the session I hit an issue where plugin‑settings changes had zero effect. Investigation revealed that the vault was pointing to the parent directory, causing two .obsidian directories to exist:

Documents/                  ← Obsidian recognized this as vault root
├── .obsidian/              ← This config was active
└── Obsidian Vault/
    ├── .obsidian/          ← This was being ignored
    └── (actual notes)

Merging the correct .obsidian directory fixed the problem.

Lesson: When plugin settings won’t take effect, first verify where the .obsidian directories are:

find . -name ".obsidian" -type d
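The same check can be done from Python, which is handy inside a larger audit script. This is a small sketch with a made‑up function name; it relies on pathlib's rglob, which also matches dot‑directories.

```python
from pathlib import Path


def find_config_dirs(root: Path) -> list:
    """Return every .obsidian directory at or below root."""
    return sorted(p for p in root.rglob(".obsidian") if p.is_dir())


if __name__ == "__main__":
    dirs = find_config_dirs(Path("."))
    for d in dirs:
        print(d)
    if len(dirs) > 1:
        print("More than one .obsidian directory found; check which vault root Obsidian opened.")
```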

Smart Connections quirks

  • The default embedding model TaylorAI/bge-micro-v2 is English‑focused and performs poorly on non‑English vaults.
  • Moreover, this plugin’s settings are stored not in the usual data.json but in .smart-env/smart_env.json, so that is the file to edit.

Switching to a multilingual model (Xenova/multilingual-e5-small) restored useful semantic search for my Japanese‑heavy vault.

Takeaways

  1. Bulk operations (front‑matter insertion, tag bulk‑rename, duplicate removal) are best driven by scripts rather than the GUI.
  2. Filesystem normalization matters on macOS; always normalize filenames to NFC before comparison.
  3. Metadata beats heuristics – folder structure and filename patterns are far more reliable than language‑statistical tricks.
  4. Plugin settings should be edited offline and you must ensure you’re editing the correct .obsidian directory.
  5. For multilingual vaults, choose an appropriate embedding model for semantic‑search plugins.

With Claude Code handling the heavy lifting, I turned a chaotic 3,674‑file mess into a tidy, searchable, and well‑structured Obsidian vault in a single day.

Summary

  • Problem: Obsidian Smart Connections was using the default English embedding model, which didn’t capture multilingual relationships.
  • Solution: Switched the embedding model to Xenova/multilingual-e5-small and re‑indexed the vault.
  • Result: Related notes now show accurate connections across languages.

Lesson: Obsidian plugin settings aren’t always stored in data.json. When a setting appears to have no effect, also check files outside .obsidian/plugins/{plugin}/ (e.g., .smart-env/).

Configuration Change

// .smart-env/smart_env.json
{
  "smart_sources": {
    "embed_model": {
      "model_key": "Xenova/multilingual-e5-small"
    }
  }
}

After updating the model key and clearing the embedding cache for a fresh re‑index, the multilingual connections appeared correctly.

Organizing the Vault

1. Maps of Content (MOCs)

A MOC is a note that lists other notes on a specific theme, using a mix of manual links and Dataview queries. Example query:

TABLE date, status
FROM #books
SORT date DESC

2. Dataview Dashboard

The dashboard shows:

  • Note counts by status
  • Recently updated notes
  • Uncategorized notes

Enabling Dataview’s JS queries provides more flexible aggregation.

Automation with Claude Code

I ran Claude Code against an iCloud‑hosted Obsidian vault (3,674 .md files). Directly feeding all files to Claude is impractical because of token limits, so the workflow became:

  1. Generate a Python script with Claude.
  2. Run the script locally.
  3. Inspect results, fix any issues, and repeat.

Scripts Used

| Script | Purpose | Files Processed |
| --- | --- | --- |
| bulk_frontmatter.py | Bulk front‑matter insertion | 3,491 |
| apple_notes_folders.py | Sub‑folder‑based classification | 66 |
| apple_notes_root.py | Content‑analysis‑based classification | 86 |
| vault_audit.py | Tag distribution & gap detection | All |
| tag_cleanup.py | Tag merge & rename | All |
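As an illustration of the audit step, the sketch below counts tags and lists files that have no front‑matter. It is not the author's vault_audit.py: the function names are hypothetical, and the parser only understands the simple block‑list "tags:" layout shown in the front‑matter example, not arbitrary YAML.

```python
from collections import Counter
from pathlib import Path
from typing import List, Optional


def read_tags(text: str) -> Optional[List[str]]:
    """Return the tags of a leading front-matter block, or None if there is no block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    tags: List[str] = []
    in_tags = False
    for line in lines[1:]:
        if line.strip() == "---":
            return tags  # closing delimiter reached
        if line.startswith("tags:"):
            in_tags = True
        elif in_tags and line.lstrip().startswith("- "):
            tags.append(line.lstrip()[2:].strip())
        else:
            in_tags = False
    return None  # block was never closed


def audit_vault(vault: Path):
    """Return (tag counts, files without front-matter) for all Markdown files."""
    tag_counts: Counter = Counter()
    missing = []
    for md in sorted(vault.rglob("*.md")):
        tags = read_tags(md.read_text(encoding="utf-8"))
        if tags is None:
            missing.append(md)
        else:
            tag_counts.update(tags)
    return tag_counts, missing
```

Calling tag_counts.most_common() on the result gives the tag distribution, and the second return value is the list of gaps to fix.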

Technical Nuggets

  • macOS + non‑ASCII filenames → must normalize paths with unicodedata.normalize("NFC", path).
    • NFD/NFC mismatches break file searches, pattern matching, and dictionary key look‑ups.
  • Backup strategy:
    • Created a tar.gz backup before bulk operations.
    • The hiragana‑ratio classification failed, requiring a restore.
    • Having a quick‑restore backup enables rapid “try → fail → restore” cycles.
  • Plugin settings:
    • Direct JSON editing works, but only after quitting Obsidian.
    • Watch for vault‑root issues and non‑standard settings files.
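The backup step can be sketched with the standard tarfile module. The function names and the timestamped archive name are assumptions; the article only states that a tar.gz backup was taken before bulk operations. Keep the backup directory outside the vault so the archive does not try to include itself.

```python
import tarfile
import time
from pathlib import Path


def backup_vault(vault: Path, backup_dir: Path) -> Path:
    """Write a timestamped .tar.gz of the vault into backup_dir and return its path."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = backup_dir / f"{vault.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the vault's own folder name.
        tar.add(vault, arcname=vault.name)
    return archive


def restore_vault(archive: Path, target_parent: Path) -> None:
    """Extract the archive into target_parent, overwriting files with the same name."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target_parent)
```

Note that restore_vault overwrites files that exist in the archive but does not remove files created after the backup; for a clean rollback, move the damaged vault aside before extracting.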

Key Takeaways

  1. Bulk processing = code generation + local execution
    • Don’t feed every file to Claude; encode the logic as Python (or another language) and run it yourself.
  2. Metadata beats heuristics
    • Folder structure and front‑matter are far more reliable than statistical heuristics (e.g., hiragana‑ratio).
  3. Smart Connections multilingual support
    • Switch to Xenova/multilingual-e5-small in .smart-env/smart_env.json.
  4. Unicode normalization is essential on macOS for non‑ASCII filenames.
  5. Claude Code is a general‑purpose organizer
    • It excels when paired with rule‑based batch scripts and occasional content‑based human judgment.

All of the above enabled me to reorganize the entire 3,674‑note vault in a single day.
