feat: Initial commit - Content Extractor for YouTube, Instagram, and blogs

- YouTube extraction with transcript support
- Instagram reel extraction via browser automation
- Blog/article web scraping
- Auto-save to Obsidian vaults
- Smart key point generation
- Configurable via .env file
- Quick extract shell script

Tech stack: Python, requests, beautifulsoup4, playwright, youtube-transcript-api
2026-03-05 13:02:58 +05:30
commit c997e764b5
12 changed files with 1302 additions and 0 deletions
@@ -0,0 +1,21 @@
+# Content Extractor Configuration
+
+# Obsidian vault path (default: ~/Obsidian Vault)
+OBSIDIAN_VAULT_PATH=~/Obsidian Vault
+
+# Browser settings (for Instagram extraction)
+BROWSER_HEADLESS=true
+BROWSER_TIMEOUT=30000
+
+# Content extraction settings
+MAX_CONTENT_LENGTH=10000
+GENERATE_SUMMARY=true
+
+# YouTube settings
+YOUTUBE_LANGUAGE=en
+
+# Instagram settings
+INSTAGRAM_WAIT_TIME=5
+
+# Logging
+LOG_LEVEL=INFO
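
The settings above can be sketched as a typed configuration object. This is a minimal illustration, not the project's actual loader: the `Settings` class, the `load_settings` function, and the `_as_bool` helper are hypothetical names. It assumes the variables have already been placed in the process environment (the commit does not show how the `.env` file is read), that `BROWSER_TIMEOUT` is in milliseconds, and that `INSTAGRAM_WAIT_TIME` is in seconds. The defaults mirror the values in the file.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    # Hypothetical container; field names are illustrative only.
    vault_path: Path
    browser_headless: bool
    browser_timeout: int      # assumed to be milliseconds
    max_content_length: int
    generate_summary: bool
    youtube_language: str
    instagram_wait_time: int  # assumed to be seconds
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    return Settings(
        # expanduser() resolves the leading "~", which is not expanded
        # automatically when the value is read as a plain string.
        vault_path=Path(env.get("OBSIDIAN_VAULT_PATH", "~/Obsidian Vault")).expanduser(),
        browser_headless=_as_bool(env.get("BROWSER_HEADLESS", "true")),
        browser_timeout=int(env.get("BROWSER_TIMEOUT", "30000")),
        max_content_length=int(env.get("MAX_CONTENT_LENGTH", "10000")),
        generate_summary=_as_bool(env.get("GENERATE_SUMMARY", "true")),
        youtube_language=env.get("YOUTUBE_LANGUAGE", "en"),
        instagram_wait_time=int(env.get("INSTAGRAM_WAIT_TIME", "5")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of relying on `os.environ` keeps the loader easy to test, for example `load_settings({"BROWSER_HEADLESS": "false"})` to force a visible browser window.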