feat: Initial commit - Content Extractor for YouTube, Instagram, and blogs

- YouTube extraction with transcript support
- Instagram reel extraction via browser automation
- Blog/article web scraping
- Auto-save to Obsidian vaults
- Smart key point generation
- Configurable via .env file
- Quick extract shell script

Tech stack: Python, requests, beautifulsoup4, playwright, youtube-transcript-api
naki
2026-03-05 13:02:58 +05:30
commit c997e764b5
12 changed files with 1302 additions and 0 deletions

.env.example

@@ -0,0 +1,21 @@
# Content Extractor Configuration
# Obsidian vault path (default: ~/Obsidian Vault)
OBSIDIAN_VAULT_PATH=~/Obsidian Vault
# Browser settings (for Instagram extraction)
BROWSER_HEADLESS=true
BROWSER_TIMEOUT=30000
# Content extraction settings
MAX_CONTENT_LENGTH=10000
GENERATE_SUMMARY=true
# YouTube settings
YOUTUBE_LANGUAGE=en
# Instagram settings
INSTAGRAM_WAIT_TIME=5
# Logging
LOG_LEVEL=INFO
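
A minimal sketch of how the settings above might be read at startup, assuming they are already present as environment variables (for example after a dotenv loader has processed the .env file). The `Settings` class, the `load_settings` function, and the field names are hypothetical and not taken from this commit; the defaults mirror the values in .env.example. The units (milliseconds for `BROWSER_TIMEOUT`, seconds for `INSTAGRAM_WAIT_TIME`) are assumptions.

```python
import os
from dataclasses import dataclass
from pathlib import Path


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    # Hypothetical container; field names are illustrative only.
    obsidian_vault_path: Path
    browser_headless: bool
    browser_timeout: int        # assumed to be milliseconds
    max_content_length: int
    generate_summary: bool
    youtube_language: str
    instagram_wait_time: int    # assumed to be seconds
    log_level: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        # "~" is not expanded when read from the environment,
        # so expand it explicitly.
        obsidian_vault_path=Path(
            env.get("OBSIDIAN_VAULT_PATH", "~/Obsidian Vault")
        ).expanduser(),
        browser_headless=_as_bool(env.get("BROWSER_HEADLESS", "true")),
        browser_timeout=int(env.get("BROWSER_TIMEOUT", "30000")),
        max_content_length=int(env.get("MAX_CONTENT_LENGTH", "10000")),
        generate_summary=_as_bool(env.get("GENERATE_SUMMARY", "true")),
        youtube_language=env.get("YOUTUBE_LANGUAGE", "en"),
        instagram_wait_time=int(env.get("INSTAGRAM_WAIT_TIME", "5")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the loader easy to test, e.g. `load_settings({"BROWSER_HEADLESS": "false"})`.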