Project Guide

Async Document Processor

An enterprise-grade platform for intelligent document extraction, processing, and review using an asynchronous event-driven architecture.

What is this?

A system that takes raw PDF/Word documents and converts them into structured, actionable JSON data using rule-based AI.
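To make "rule-based" concrete, here is a minimal sketch of how extracted text could be turned into structured fields. The field names, the regular expressions, and the `extract_fields` helper are all hypothetical illustrations, not the project's actual rules.

```python
import json
import re
from decimal import Decimal

# Hypothetical rules: each output field maps to a regex with one capture group.
RULES = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "tax": re.compile(r"\bTax\b\s*:?\s*\$?([0-9][0-9,]*\.[0-9]{2})", re.IGNORECASE),
    "total": re.compile(r"\bTotal\b\s*:?\s*\$?([0-9][0-9,]*\.[0-9]{2})", re.IGNORECASE),
}

# Fields whose captured value is a monetary amount.
MONEY_FIELDS = {"tax", "total"}


def extract_fields(text: str) -> dict:
    """Apply every rule to the text; fields with no match are set to None."""
    result = {}
    for field, pattern in RULES.items():
        match = pattern.search(text)
        if match is None:
            result[field] = None
            continue
        value = match.group(1)
        if field in MONEY_FIELDS:
            # Drop thousands separators; keep amounts as strings so the JSON
            # output does not pick up float rounding errors.
            value = str(Decimal(value.replace(",", "")))
        result[field] = value
    return result


SAMPLE_TEXT = """ACME Corp
Invoice No: INV-2024-001
Subtotal: $1,200.00
Tax: $96.00
Total: $1,296.00
"""

if __name__ == "__main__":
    print(json.dumps(extract_fields(SAMPLE_TEXT), indent=2))
```

The word-boundary anchors keep the "total" rule from matching inside "Subtotal". A real rule set would likely need per-vendor patterns and line-item handling.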

Why Async?

Heavy processing is offloaded to background workers (Celery), so the UI stays responsive while documents are being parsed.
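The offloading pattern can be sketched with the standard library alone. This is a stand-in for Celery, not Celery itself: `enqueue`, `worker`, `process_document`, and the in-memory `jobs` store are hypothetical names used only to show that the caller gets a job ID back immediately while the work happens elsewhere.

```python
import queue
import threading
import uuid

# In-memory stand-ins for the broker and the result store (assumptions;
# the real system uses Celery with Redis).
task_queue = queue.Queue()
jobs = {}  # job_id -> {"status": ..., "result": ...}


def process_document(path: str) -> dict:
    """Placeholder for the slow parsing step (hypothetical)."""
    return {"path": path, "pages": 1}


def worker() -> None:
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            task_queue.task_done()
            return
        job_id, path = item
        jobs[job_id]["status"] = "processing"
        try:
            jobs[job_id]["result"] = process_document(path)
            jobs[job_id]["status"] = "done"
        except Exception as exc:
            jobs[job_id]["status"] = "failed"
            jobs[job_id]["error"] = str(exc)
        finally:
            task_queue.task_done()


def enqueue(path: str) -> str:
    """Register the job and return at once; the worker does the heavy part."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "queued", "result": None}
    task_queue.put((job_id, path))
    return job_id


def start_worker() -> threading.Thread:
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

In this sketch the request handler would call `enqueue(...)` and return the job ID, and the UI would poll `jobs[job_id]["status"]` until it reads "done" or "failed".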

Cloud Ready

Integrated with Google Cloud Storage for file persistence, and with Neon (PostgreSQL) and Upstash (Redis) as managed, scalable data services.

How to Use

1. Upload Document

Go to the "Upload" page and drop a PDF or Word file. The system will immediately store it in the cloud.

2. Async Extraction

The system triggers a background worker that parses the file and extracts line items, taxes, and metadata.

3. Human Review

Review the extracted data in the visual dashboard. Correct any errors directly in the built-in JSON editor.

4. Finalize & Export

Once satisfied, click "Finalize". You can then export the clean data as a structured JSON or CSV file.
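As an illustration of the export step, the sketch below serializes a finalized document to JSON and its line items to CSV using only the standard library. The document shape (`invoice_number`, `line_items`) and the CSV columns are assumptions made for the example, not the project's actual schema.

```python
import csv
import io
import json

# Hypothetical CSV columns for one line item.
LINE_ITEM_COLUMNS = ["description", "quantity", "unit_price"]


def to_json(document: dict) -> str:
    """Serialize the whole reviewed document as pretty-printed JSON."""
    return json.dumps(document, indent=2, sort_keys=True)


def line_items_to_csv(document: dict) -> str:
    """Write one CSV row per line item; unknown keys are ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=LINE_ITEM_COLUMNS,
        restval="",             # fill missing columns with an empty string
        extrasaction="ignore",  # skip keys that are not CSV columns
        lineterminator="\n",
    )
    writer.writeheader()
    for item in document.get("line_items", []):
        writer.writerow(item)
    return buffer.getvalue()


if __name__ == "__main__":
    doc = {
        "invoice_number": "INV-1",
        "line_items": [
            {"description": "Widget, large", "quantity": 2, "unit_price": "9.50"},
        ],
    }
    print(to_json(doc))
    print(line_items_to_csv(doc))
```

CSV is flat, so this sketch exports only the line items, while the JSON export keeps the full nested document.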

Technical Architecture

Frontend

Next.js 15+: App Router, Server Components, Client Hydration
Tailwind CSS: Utility-first styling for a polished modern UI
Lucide Icons: Consistent, beautiful iconography
Context API: Efficient global state management for documents

Backend

FastAPI: High-performance Python API framework
Celery & Redis: Distributed task queue for async processing
SQLAlchemy: Async ORM with Neon (PostgreSQL) support
PyPDF & Docx: Real document parsing and text extraction

Infrastructure

The system relies on managed infrastructure providers for file storage, the database, and Redis, so that data is stored durably, processed quickly, and kept available.

GCP Storage

Neon DB

Upstash Redis

Key Metrics

Processing Time: ~15s
Max File Size: 10MB
Formats: PDF, DOCX
Architecture: Async
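The size and format limits above could be enforced before a file is accepted. The following is a minimal sketch under stated assumptions: `validate_upload` is a hypothetical helper, 10MB is read as 10 × 1024 × 1024 bytes, the limit is treated as inclusive, and the format check uses only the file extension.

```python
from pathlib import Path

# Limits taken from the Key Metrics; the binary-megabyte reading is an assumption.
MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024
ALLOWED_EXTENSIONS = {".pdf", ".docx"}


def validate_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the upload is acceptable."""
    errors = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes <= 0:
        errors.append("file is empty")
    elif size_bytes > MAX_FILE_SIZE_BYTES:
        errors.append("file exceeds 10MB limit")
    return errors


if __name__ == "__main__":
    print(validate_upload("report.pdf", 2_000_000))
    print(validate_upload("notes.txt", 2_000_000))
```

An extension check is easy to spoof, so a stricter version might also inspect the file's leading bytes before handing it to the parser.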