Project Guide

Async Document Processor

An enterprise-grade platform for intelligent document extraction, processing, and review using an asynchronous event-driven architecture.

What is this?

A system that converts raw PDF and Word documents into structured JSON data using rule-based AI.

Why Async?

Heavy processing is offloaded to background workers (Celery), so the UI stays responsive while documents are being processed.
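As a rough illustration of the offloading idea, the sketch below uses Python's standard-library thread pool as a stand-in for Celery workers, so it runs without a broker. The names `enqueue`, `job_status`, `job_result`, and `process_document` are hypothetical and not part of the project's API.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for a Celery worker pool (assumption: the real system uses
# Celery with a Redis broker; this sketch only mimics the pattern).
_executor = ThreadPoolExecutor(max_workers=2)
_jobs: dict[str, Future] = {}


def process_document(path: str) -> dict:
    """Hypothetical slow job that pretends to parse a document."""
    time.sleep(0.2)  # simulates heavy parsing work
    return {"path": path, "status": "processed"}


def enqueue(path: str) -> str:
    """Hand the work to the pool and return a job id without waiting."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = _executor.submit(process_document, path)
    return job_id


def job_status(job_id: str) -> str:
    """Report whether the background job has finished."""
    return "done" if _jobs[job_id].done() else "pending"


def job_result(job_id: str, timeout: float = 5.0) -> dict:
    """Block until the job finishes (or the timeout expires)."""
    return _jobs[job_id].result(timeout=timeout)
```

The caller gets a job id back right away and can poll `job_status` later, which is the same shape a UI would use to stay responsive while a worker runs.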

Cloud Ready

Integrated with Google Cloud Storage for file persistence, Neon for the PostgreSQL database, and Upstash for Redis.

How to Use

1. Upload Document

Go to the "Upload" page and drop a PDF or Word (DOCX) file of up to 10MB. The system stores it in Google Cloud Storage right away.
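A minimal sketch of the checks an upload step like this might run before storing a file, based on the formats and size limit listed under Key Metrics. The function name `validate_upload` and the error messages are assumptions, as is reading "10MB" as 10 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Limits taken from the Key Metrics section; the byte interpretation
# of "10MB" is an assumption.
MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024
ALLOWED_EXTENSIONS = {".pdf", ".docx"}


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file is acceptable."""
    errors = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes <= 0:
        errors.append("file is empty")
    elif size_bytes > MAX_FILE_SIZE_BYTES:
        errors.append("file exceeds 10MB limit")
    return errors
```

Returning a list of problems rather than raising on the first one lets the upload page show every issue at once.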

2. Async Extraction

The system triggers a background worker that parses the file and extracts line items, taxes, and metadata. Processing typically takes about 15 seconds.
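To make the extraction step concrete, here is a small rule-based sketch that turns already-extracted text into JSON-ready data with regular expressions. The line formats it recognizes (`Invoice No: ...`, `<description> <qty> x <price>`, `Tax (N%): ...`) and the output field names are invented for illustration; the project's actual rules are not documented here.

```python
import json
import re

# Hypothetical text patterns; real documents will need their own rules.
INVOICE_NO_RE = re.compile(
    r"^Invoice\s*(?:No\.?|#)\s*:?\s*(?P<number>\S+)$", re.IGNORECASE)
TAX_RE = re.compile(
    r"^Tax\s*\((?P<rate>\d+(?:\.\d+)?)%\)\s*:?\s*(?P<amount>\d+\.\d{2})$",
    re.IGNORECASE)
LINE_ITEM_RE = re.compile(
    r"^(?P<description>.+?)\s+(?P<qty>\d+)\s*x\s*(?P<price>\d+\.\d{2})$")

SAMPLE = """\
Invoice No: INV-001
Widget 2 x 9.99
Consulting hours 3 x 150.00
Tax (10%): 46.99
"""


def extract(text: str) -> dict:
    """Classify each non-empty line as metadata, a tax, or a line item."""
    result = {"metadata": {}, "line_items": [], "taxes": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if m := INVOICE_NO_RE.match(line):
            result["metadata"]["invoice_number"] = m["number"]
        elif m := TAX_RE.match(line):
            result["taxes"].append(
                {"rate_percent": float(m["rate"]), "amount": float(m["amount"])})
        elif m := LINE_ITEM_RE.match(line):
            result["line_items"].append({
                "description": m["description"],
                "qty": int(m["qty"]),
                "unit_price": float(m["price"]),
            })
    return result


if __name__ == "__main__":
    print(json.dumps(extract(SAMPLE), indent=2))
```

Lines that match no rule are skipped silently, which is one reason a human review step after extraction is useful.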

3. Human Review

Review the extracted data in the visual dashboard. Correct any errors directly in the built-in JSON editor.
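One way a reviewer's corrections could be applied to the extracted JSON is as a set of dotted-path updates, sketched below. The `apply_corrections` helper and the path syntax (`line_items.0.qty`) are hypothetical; the built-in editor may simply replace the whole document instead.

```python
import copy


def apply_corrections(extracted: dict, corrections: dict) -> dict:
    """Return a corrected copy; the original extraction is left untouched.

    Keys in `corrections` are dotted paths; numeric segments index lists.
    """
    updated = copy.deepcopy(extracted)
    for path, value in corrections.items():
        *parents, last = path.split(".")
        node = updated
        for key in parents:
            # Walk down one level, treating the segment as a list index
            # when the current node is a list.
            node = node[int(key)] if isinstance(node, list) else node[key]
        if isinstance(node, list):
            node[int(last)] = value
        else:
            node[last] = value
    return updated
```

Working on a deep copy keeps the machine-extracted version available for comparison or audit after the reviewer's changes are saved.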

4. Finalize & Export

Once satisfied, click "Finalize". You can then export the clean data as a structured JSON or CSV file.
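The export step could look roughly like the sketch below, which serializes a finalized document to JSON and its line items to CSV using only the standard library. The field names (`description`, `qty`, `unit_price`) and both function names are assumptions about the data shape, not the project's schema.

```python
import csv
import io
import json

# Assumed column layout for exported line items.
CSV_FIELDS = ["description", "qty", "unit_price"]


def export_json(doc: dict) -> str:
    """Serialize the whole document as pretty-printed JSON."""
    return json.dumps(doc, indent=2, sort_keys=True)


def export_line_items_csv(doc: dict) -> str:
    """Write one CSV row per line item, with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(doc.get("line_items", []))
    return buffer.getvalue()
```

CSV is flat, so this sketch exports only the line items; nested metadata and taxes survive intact in the JSON export.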

Technical Architecture

Frontend

  • Next.js 15+

    App Router, Server Components, Client Hydration

  • Tailwind CSS

    Utility-first styling for a polished modern UI

  • Lucide Icons

    Consistent, beautiful iconography

  • Context API

    Efficient global state management for documents

Backend

  • FastAPI

    High-performance Python API framework

  • Celery & Redis

    Distributed task queue for async processing

  • SQLAlchemy

    Async ORM with Neon (PostgreSQL) support

  • PyPDF & Docx

    Real document parsing and text extraction
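The backend components above have to agree on where each document is in its lifecycle. The sketch below models that as a small state machine mirroring the four How to Use steps. The status names and the `advance` helper are assumptions made for illustration; the project's actual status values are not documented here.

```python
from enum import Enum


class DocStatus(str, Enum):
    # Hypothetical statuses, one per step in "How to Use".
    UPLOADED = "uploaded"
    PROCESSING = "processing"
    IN_REVIEW = "in_review"
    FINALIZED = "finalized"


# Each status lists the statuses it may move to next.
ALLOWED_TRANSITIONS = {
    DocStatus.UPLOADED: {DocStatus.PROCESSING},
    DocStatus.PROCESSING: {DocStatus.IN_REVIEW},
    DocStatus.IN_REVIEW: {DocStatus.FINALIZED},
    DocStatus.FINALIZED: set(),
}


def advance(current: DocStatus, target: DocStatus) -> DocStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Keeping the allowed transitions in one table means the API and the background worker can both reject out-of-order updates, such as finalizing a document that has not been reviewed.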

Infrastructure

The system runs on managed infrastructure: Google Cloud Storage holds uploaded files, Neon hosts the PostgreSQL database, and Upstash provides Redis for the task queue.

GCP Storage
Neon DB
Upstash Redis

Key Metrics

  • Processing Time: ~15s

  • Max File Size: 10MB

  • Formats: PDF, DOCX

  • Architecture: Async