feat: Initial commit - Content Extractor for YouTube, Instagram, and blogs

- YouTube extraction with transcript support
- Instagram reel extraction via browser automation
- Blog/article web scraping
- Auto-save to Obsidian vaults
- Smart key point generation
- Configurable via .env file
- Quick extract shell script

Tech stack: Python, requests, beautifulsoup4, playwright, youtube-transcript-api
naki
2026-03-05 13:02:58 +05:30
commit c997e764b5
12 changed files with 1302 additions and 0 deletions

requirements.txt

@@ -0,0 +1,23 @@
# Content Extractor Dependencies

# Web scraping
requests>=2.31.0
beautifulsoup4>=4.12.0
lxml>=4.9.0

# YouTube
youtube-transcript-api>=0.6.0
pytube>=15.0.0

# Browser automation (for Instagram and dynamic content)
playwright>=1.40.0

# Text processing
markdown>=3.5.0

# Utilities
python-dotenv>=1.0.0
pydantic>=2.5.0

# Date handling
python-dateutil>=2.8.0
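
The minimum versions above can be checked against the current environment before running the extractor. The following is a minimal sketch and is not part of this commit: the helper names `parse_requirements`, `version_tuple`, and `check_installed` are hypothetical, it only understands plain `name>=version` lines, and its version comparison is deliberately crude (a real check would more likely rely on `pip` itself or the `packaging` library for full PEP 440 handling).

```python
import re
from importlib import metadata

# Copy of the pins from requirements.txt, inlined so the sketch is self-contained.
REQUIREMENTS = """\
requests>=2.31.0
beautifulsoup4>=4.12.0
lxml>=4.9.0
youtube-transcript-api>=0.6.0
pytube>=15.0.0
playwright>=1.40.0
markdown>=3.5.0
python-dotenv>=1.0.0
pydantic>=2.5.0
python-dateutil>=2.8.0
"""


def parse_requirements(text):
    """Return {package: minimum_version} for simple `name>=version` lines."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = re.fullmatch(r"([A-Za-z0-9._-]+)\s*>=\s*([0-9][0-9.]*)", line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def version_tuple(version):
    """Turn '2.31.0' into (2, 31); trailing zeros are dropped so 3.5 == 3.5.0.

    Assumption: purely numeric dotted versions. Segments such as 'rc1'
    are ignored, which is good enough for a sanity check only.
    """
    parts = [int(p) for p in version.split(".") if p.isdigit()]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def check_installed(pins):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if ok else "too old"
    return report


if __name__ == "__main__":
    for package, status in check_installed(parse_requirements(REQUIREMENTS)).items():
        print(f"{package}: {status}")
```

Note that `playwright` additionally needs its browser binaries downloaded (`playwright install`) after the Python package is installed; the sketch above only checks the package version.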