Async Document Processor
An enterprise-grade platform for document extraction, processing, and review, built on an asynchronous, event-driven architecture.
What is this?
A system that takes raw PDF and Word documents and converts them into structured JSON data using rule-based extraction.
Why Async?
Heavy processing is offloaded to background workers (Celery), so the UI stays responsive while documents are being processed.
Cloud Ready
Integrates with Google Cloud Storage for file persistence, and with Neon (PostgreSQL) and Upstash for managed, scalable data services.
How to Use
1. Upload Document
Go to the "Upload" page and drop a PDF or DOCX file (up to 10 MB). The file is stored in Google Cloud Storage as soon as the upload completes.
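A pre-upload check might look like the sketch below. The accepted formats and the size limit come from the Key Metrics section; the function name validate_upload and the reading of "10MB" as 10 MiB are assumptions.

```python
from pathlib import PurePath

ALLOWED_EXTENSIONS = {".pdf", ".docx"}  # formats listed under Key Metrics
MAX_FILE_SIZE = 10 * 1024 * 1024        # assumption: "10MB" means 10 MiB


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file is not an accepted format or is too large."""
    extension = PurePath(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format: {extension or 'none'}")
    if size_bytes > MAX_FILE_SIZE:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_FILE_SIZE}")
```

Checking the extension alone does not prove the content is a real PDF or DOCX, so a production service would likely inspect the file bytes as well.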
2. Async Extraction
A background worker parses the file and extracts line items, taxes, and metadata. Processing typically takes about 15 seconds, and the page stays usable while it runs.
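To illustrate what rule-based extraction can mean in practice, here is a small sketch that works on text already pulled out of a document. The invoice layout, the three patterns, and the output keys are all hypothetical; the project's actual rules are not described here.

```python
import re

# Hypothetical layout: "<description> <qty> x <unit price>" per line item.
LINE_ITEM = re.compile(r"^(?P<desc>.+?)\s+(?P<qty>\d+)\s*x\s*(?P<price>\d+\.\d{2})$")
TAX = re.compile(r"^Tax:\s*(?P<amount>\d+\.\d{2})$")
INVOICE_NO = re.compile(r"^Invoice #:\s*(?P<number>\S+)$")


def extract(text: str) -> dict:
    """Apply each rule to every line and collect the matches as plain JSON types."""
    result = {"metadata": {}, "line_items": [], "taxes": []}
    for raw in text.splitlines():
        line = raw.strip()
        if m := INVOICE_NO.match(line):
            result["metadata"]["invoice_number"] = m["number"]
        elif m := TAX.match(line):
            # Amounts stay as strings to avoid float rounding.
            result["taxes"].append(m["amount"])
        elif m := LINE_ITEM.match(line):
            result["line_items"].append({
                "description": m["desc"],
                "quantity": int(m["qty"]),
                "unit_price": m["price"],
            })
    return result
```

Lines that match no rule are skipped, which is one reason a human review step is useful afterwards.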
3. Human Review
Review the extracted data in the visual dashboard. Correct any errors directly in the built-in JSON editor.
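Edits made in a JSON editor need to be checked before they replace the extracted data. The sketch below shows one way to do that; the required keys are a hypothetical schema and apply_correction is an illustrative name, not part of the product.

```python
import json

REQUIRED_KEYS = {"metadata", "line_items", "taxes"}  # hypothetical schema


def apply_correction(edited_json: str) -> dict:
    """Parse the reviewer's edited JSON and reject it if it is malformed."""
    try:
        data = json.loads(edited_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON at line {exc.lineno}: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```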
4. Finalize & Export
Once satisfied, click "Finalize". You can then export the clean data as a structured JSON or CSV file.
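The two export formats can be produced with Python's standard json and csv modules. This is a sketch under assumptions: the column names (description, quantity, unit_price) and the function names are illustrative, and the real export schema may differ.

```python
import csv
import io
import json

CSV_COLUMNS = ["description", "quantity", "unit_price"]  # hypothetical columns


def to_json(document: dict) -> str:
    """Serialize the finalized document with stable key order."""
    return json.dumps(document, indent=2, sort_keys=True)


def to_csv(line_items: list[dict]) -> str:
    """Flatten line items into CSV text, one row per item."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(line_items)
    return buffer.getvalue()
```

JSON keeps the nested structure (metadata, taxes, line items), while CSV is flat, so this sketch exports only the line items to CSV.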
Technical Architecture
Frontend
Next.js 15+
App Router, Server Components, Client Hydration
Tailwind CSS
Utility-first styling for a polished modern UI
Lucide Icons
Consistent, beautiful iconography
Context API
Efficient global state management for documents
Backend
FastAPI
High-performance Python API framework
Celery & Redis
Distributed task queue for async processing
SQLAlchemy
Async ORM with Neon (PostgreSQL) support
PyPDF & Docx
Real document parsing and text extraction
Infrastructure
The system runs on managed cloud services: Google Cloud Storage for file persistence, Neon for PostgreSQL, and Upstash for scalable data services, so that documents are stored durably and processed quickly.
Key Metrics
Processing Time: ~15 s
Max File Size: 10 MB
Formats: PDF, DOCX
Architecture: Async