Overview

Our n8n workflow collects an invoice in PDF format from the user and sends it to the OCR.space API to extract its text. A Mistral AI agent then parses and cleans that text, the correct HSN codes are looked up in a Google Sheet, and the invoice is returned with the appropriate codes assigned. None of this is done manually; every step runs through an API.

Objective

The primary objective of this workflow documentation for an OCR-based invoice processing system with automatic HSN code assignment is to standardize, optimize, and automate the entire invoice lifecycle in order to reduce manual effort, improve data accuracy, and support regulatory compliance.

How does it work?

Step 1: On Form Submission (Trigger Node)

  • Type: n8n Form Trigger (Form Submission)
  • Goal: Trigger the workflow when a user uploads an invoice through an n8n form.
  • Process:
    • Receives the uploaded invoice PDF.
    • Passes the file data to the next node.

Step 2: OCR Response (HTTP Request Node)

  • Type: HTTP Request
  • Purpose: Sends the uploaded invoice to an OCR API (such as OCR.space) to extract the text from the PDF.
  • Configuration:
    • Method: POST
    • URL: https://api.ocr.space/parse/image
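
A minimal sketch of how the OCR request fields and response could be handled, written outside n8n for illustration. It assumes the form fields `apikey`, `language`, and `filetype`, and a JSON response containing `IsErroredOnProcessing`, `ErrorMessage`, and a `ParsedResults` list whose entries carry `ParsedText`. The function names and the sample response are hypothetical.

```python
from typing import Any

OCR_URL = "https://api.ocr.space/parse/image"  # URL from Step 2


def build_form_fields(api_key: str) -> dict[str, str]:
    """Form fields sent with the PDF in the multipart POST (assumed names)."""
    return {"apikey": api_key, "language": "eng", "filetype": "PDF"}


def extract_ocr_text(response: dict[str, Any]) -> str:
    """Join the text of every parsed page; raise if the API reported an error."""
    if response.get("IsErroredOnProcessing"):
        raise ValueError(f"OCR failed: {response.get('ErrorMessage')}")
    pages = response.get("ParsedResults") or []
    return "\n".join(page.get("ParsedText", "").strip() for page in pages)


# Hypothetical two-page response, for illustration only.
sample = {
    "IsErroredOnProcessing": False,
    "ParsedResults": [
        {"ParsedText": "INVOICE 001\r\n"},
        {"ParsedText": "Steel bolts 10 250.00\r\n"},
    ],
}
ocr_text = extract_ocr_text(sample)
print(ocr_text)
```

In the workflow itself, the HTTP Request node does the sending; a helper like this only shows which part of the response the later nodes would need.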

Step 3: AI Agent (Integration of the Mistral Cloud Chat Model)

  • Type: AI Agent Node
  • Goal: Clean and structure the raw OCR text.
  • Logic:
    • Prompt: “Take all structured data from this OCR text and return it in usable JSON format. Fields may include item name, qty, value of items, category, etc.”

Step 4: Code Node (Pre-HSN Processing)

  • Type: Code
  • Purpose: Parses the AI-generated JSON into structured item objects, ready for the HSN lookup in the following steps.
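
The parsing in this step could look roughly like the sketch below. It assumes the agent may wrap its reply in a Markdown code fence and may return either a bare list of items or an object with an `items` key. The function name `parse_agent_json`, the `items` key, and the sample reply are hypothetical.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # Markdown code-fence marker some models wrap JSON in


def parse_agent_json(raw: str) -> list[dict[str, Any]]:
    """Parse the agent's reply into a list of item dicts."""
    text = raw.strip()
    # Drop a surrounding code fence (optionally tagged "json") if present.
    fenced = re.match(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    data = json.loads(text)
    # Accept a bare list, or an object holding the list under "items" (assumed).
    if isinstance(data, dict):
        data = data.get("items", [data])
    return [item for item in data if isinstance(item, dict)]


# Hypothetical agent reply, for illustration only.
reply = FENCE + 'json\n[{"item_name": "Steel bolts", "qty": 10}]\n' + FENCE
items = parse_agent_json(reply)
print(items)
```

Stripping the fence before calling `json.loads` matters because a fenced reply is not valid JSON on its own and would otherwise fail the whole run.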

Step 5: Get Row(s) from Sheet (Google Sheets Node)

  • Type: Google Sheets → Read
  • Purpose: Reads the HSN Master Sheet, which maps keywords/categories to HSN codes.
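
Once the rows are read, they can be turned into a keyword lookup. The sketch below is one way to do that, assuming each row arrives as a dictionary with `keyword` and `hsn_code` columns. Those column names, the function name, and the dummy codes are hypothetical and would need to match the real sheet.

```python
def build_hsn_lookup(rows: list[dict[str, str]]) -> dict[str, str]:
    """Map each lowercase keyword to its HSN code, skipping incomplete rows."""
    lookup: dict[str, str] = {}
    for row in rows:
        keyword = str(row.get("keyword", "")).strip().lower()
        hsn = str(row.get("hsn_code", "")).strip()
        if keyword and hsn:
            # Keep the first mapping seen if a keyword is repeated.
            lookup.setdefault(keyword, hsn)
    return lookup


# Dummy rows with placeholder codes, for illustration only.
rows = [
    {"keyword": "Bolt ", "hsn_code": "0001"},
    {"keyword": "", "hsn_code": "0002"},
    {"keyword": "bolt", "hsn_code": "0003"},
]
lookup = build_hsn_lookup(rows)
print(lookup)
```

Normalizing case and whitespace here keeps later matching from depending on how the sheet happens to be typed.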

Step 6: Code1 Node (HSN Auto-Assignment)

  • Type: Code
  • Objective: Associate the HSN codes in the Google Sheet with invoice item categories.
  • Logic:
    • For each invoice item, search the master sheet for keyword matches.
    • Assign the corresponding HSN code.
    • If HSN not available – mention as “HSN Not
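
The matching logic above can be sketched as follows. This is an illustration under assumptions, not the workflow's actual Code node: it assumes items carry `item_name` and `category` fields, that master rows carry `keyword` and `hsn_code` columns, and that a simple case-insensitive substring match is enough. The fallback label is a placeholder, because the exact wording is cut off in the text above.

```python
from typing import Any

# Placeholder only: the source text cuts off after "HSN Not", so the exact
# label used by the workflow is unknown.
FALLBACK_LABEL = "HSN_FALLBACK_PLACEHOLDER"


def assign_hsn(
    items: list[dict[str, Any]],
    master_rows: list[dict[str, str]],
    fallback: str = FALLBACK_LABEL,
) -> list[dict[str, Any]]:
    """Return copies of the items with an "hsn_code" field added."""
    assigned = []
    for item in items:
        # Search both the item name and its category (assumed field names).
        haystack = f"{item.get('item_name', '')} {item.get('category', '')}".lower()
        code = fallback
        for row in master_rows:
            keyword = str(row.get("keyword", "")).strip().lower()
            if keyword and keyword in haystack:
                # First matching row wins; an empty code falls back too.
                code = str(row.get("hsn_code", "")).strip() or fallback
                break
        assigned.append({**item, "hsn_code": code})
    return assigned


# Dummy data with a placeholder code, for illustration only.
master = [{"keyword": "bolt", "hsn_code": "0001"}]
invoice_items = [
    {"item_name": "Steel Bolts", "category": "Hardware"},
    {"item_name": "Mystery item"},
]
result = assign_hsn(invoice_items, master)
print(result)
```

Substring matching is the simplest choice and can over-match short keywords, so a real sheet may need longer keywords or word-boundary matching.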

Technology Stack Included

Key Benefits

Accurate OCR Data Extraction

Leverages an OCR API to capture text from uploaded invoices with a high degree of accuracy to reduce data entry errors.

AI-Powered Parsing & Cleaning

The AI Agent takes the raw OCR output, cleans it, and structures it into usable JSON.

Dynamic HSN Code Assignment

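Matches extracted product descriptions against your HSN Google Sheet database in real time. If there are any changes to HSN codes in Google Sheets, the workflow picks them up the next time it runs, since the sheet is read on every execution.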

Scalable & Reusable

Can be extended to support various types of documents (invoices, purchase orders, receipts) with ease.

Improved Compliance & GST Readiness

Assigns HSN codes automatically, which helps prevent mismatches in GST returns and avoid penalties.

Time & Cost Savings

Automates manual searching and saves your accounting team hours.

Real-Time Processing

Outputs structured invoice data immediately after you submit the form.

Closure

This documentation outlines the step-by-step process, defines roles, and specifies the technology used to achieve increased efficiency. The workflow it describes automates collecting invoice PDFs from the user and sending them to the OCR.space API to extract the fields we need.