# Gemina API Documentation

Integrate Gemina's production harness for AI document extraction. It combines specialized agents, validation layers, and compliance controls, and is accessible through a REST API and an MCP server.

- HTML version: https://www.gemina.co/docs
- FileTag docs (HTML): https://www.gemina.co/docs/filetag
- FileTag docs (markdown): https://www.gemina.co/docs/filetag.md
- MCP manifest: https://www.gemina.co/.well-known/mcp.json

## Getting Started

The Gemina API enables you to extract structured data from documents programmatically. Upload documents via file or URL and receive structured JSON with extracted fields, coordinates, and confidence scores.

- **Base URL:** `https://api.gemina.co`
- **Authentication:** `X-API-Key: your-api-key` header
- **Get an API key:** https://console.gemina.co/registration/create-account?planId=trial

## Python

### Quick Start

Install the required package and set up your environment:

```bash
pip install requests
```

```python
import os

BASE_URL = os.getenv("GEMINA_BASE_URL", "https://api.gemina.co")
API_KEY = os.getenv("GEMINA_API_KEY", "")
HEADERS = {"X-API-Key": API_KEY}
```

### Authentication

All API requests require authentication via the `X-API-Key` header:

```python
import requests

headers = {"X-API-Key": "your-api-key"}
response = requests.get(
    "https://api.gemina.co/api/v1/documents/",
    headers=headers
)
```

### Upload Document

Upload a document for extraction using multipart form data:

```python
import requests

url = "https://api.gemina.co/api/v1/documents/uploads"
headers = {"X-API-Key": "your-api-key"}

form_data = [
    ("extraction_types", "invoice_headers"),
    ("extraction_types", "invoice_line_items"),
    ("external_id", "inv-2025-0001"),
    ("model_type", "invictus"),
]

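# Open the file in a context manager so the handle is closed after the upload.
with open("./invoice.pdf", "rb") as pdf:
    files = {"file": ("invoice.pdf", pdf, "application/pdf")}
    response = requests.post(
        url, headers=headers, data=form_data, files=files, timeout=90
    )

response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body
result = response.json()
print(result)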
```

## Node.js

### Quick Start

Install the required package and set up your environment:

```bash
npm install axios form-data
```

```typescript
import axios from "axios";

const BASE_URL = process.env.GEMINA_BASE_URL || "https://api.gemina.co";
const API_KEY = process.env.GEMINA_API_KEY || "";

const client = axios.create({
  baseURL: BASE_URL,
  headers: { "X-API-Key": API_KEY },
  timeout: 90_000,
});
```

### Authentication

All API requests require authentication via the `X-API-Key` header:

```typescript
import axios from "axios";

const client = axios.create({
  baseURL: "https://api.gemina.co",
  headers: { "X-API-Key": "your-api-key" },
});

const response = await client.get("/api/v1/documents/");
```

### Upload Document

Upload a document for extraction using multipart form data:

```typescript
import fs from "fs";
import FormData from "form-data";
import axios from "axios";

const form = new FormData();
form.append("extraction_types", "invoice_headers");
form.append("extraction_types", "invoice_line_items");
form.append("external_id", "inv-2025-0001");
form.append("model_type", "invictus");
form.append("file", fs.createReadStream("./invoice.pdf"));

const response = await axios.post(
  "https://api.gemina.co/api/v1/documents/uploads",
  form,
  {
    headers: {
      "X-API-Key": "your-api-key",
      ...form.getHeaders(),
    },
  }
);

console.log(response.data);
```

## Java

### Quick Start

Add the OkHttp dependency to your Maven `pom.xml` and set up your environment:

```xml
<dependency>
    <groupId>com.squareup.okhttp3</groupId>
    <artifactId>okhttp</artifactId>
    <version>4.12.0</version>
</dependency>
```

```java
import java.util.concurrent.TimeUnit;
import okhttp3.OkHttpClient;

String BASE_URL = System.getenv("GEMINA_BASE_URL") != null
    ? System.getenv("GEMINA_BASE_URL") : "https://api.gemina.co";
String API_KEY = System.getenv("GEMINA_API_KEY");

OkHttpClient client = new OkHttpClient.Builder()
    .connectTimeout(90, TimeUnit.SECONDS)
    .readTimeout(90, TimeUnit.SECONDS)
    .build();
```

### Authentication

All API requests require authentication via the `X-API-Key` header:

```java
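import okhttp3.Request;
import okhttp3.Response;

// `client` is the OkHttpClient built in the Quick Start snippet.
Request request = new Request.Builder()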
    .url("https://api.gemina.co/api/v1/documents/")
    .header("X-API-Key", "your-api-key")
    .get()
    .build();

Response response = client.newCall(request).execute();
```

### Upload Document

Upload a document for extraction using multipart form data:

```java
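import java.io.File;
import okhttp3.MediaType;
import okhttp3.MultipartBody;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

File invoiceFile = new File("invoice.pdf");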

RequestBody requestBody = new MultipartBody.Builder()
    .setType(MultipartBody.FORM)
    .addFormDataPart("extraction_types", "invoice_headers")
    .addFormDataPart("extraction_types", "invoice_line_items")
    .addFormDataPart("external_id", "inv-2025-0001")
    .addFormDataPart("model_type", "invictus")
    .addFormDataPart("file", invoiceFile.getName(),
        RequestBody.create(invoiceFile, MediaType.parse("application/pdf")))
    .build();

Request request = new Request.Builder()
    .url("https://api.gemina.co/api/v1/documents/uploads")
    .header("X-API-Key", "your-api-key")
    .post(requestBody)
    .build();

// try-with-resources closes the response body and releases the connection.
try (Response response = client.newCall(request).execute()) {
    System.out.println(response.body().string());
}
```

## C#

### Quick Start

```csharp
using System;           // Environment, Uri, TimeSpan
using System.Net.Http;  // HttpClient

var baseUrl = Environment.GetEnvironmentVariable("GEMINA_BASE_URL")
    ?? "https://api.gemina.co";
var apiKey = Environment.GetEnvironmentVariable("GEMINA_API_KEY") ?? "";

var client = new HttpClient
{
    BaseAddress = new Uri(baseUrl),
    Timeout = TimeSpan.FromSeconds(90)
};
client.DefaultRequestHeaders.Add("X-API-Key", apiKey);
```

### Authentication

All API requests require authentication via the `X-API-Key` header:

```csharp
using var client = new HttpClient();
client.DefaultRequestHeaders.Add("X-API-Key", "your-api-key");

var response = await client.GetAsync(
    "https://api.gemina.co/api/v1/documents/"
);
```

### Upload Document

Upload a document for extraction using multipart form data:

```csharp
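using System.Net.Http.Headers;  // MediaTypeHeaderValue

// `client` is the HttpClient configured in the Quick Start snippet.
using var form = new MultipartFormDataContent();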

form.Add(new StringContent("invoice_headers"), "extraction_types");
form.Add(new StringContent("invoice_line_items"), "extraction_types");
form.Add(new StringContent("inv-2025-0001"), "external_id");
form.Add(new StringContent("invictus"), "model_type");

var fileBytes = await File.ReadAllBytesAsync("invoice.pdf");
var fileContent = new ByteArrayContent(fileBytes);
fileContent.Headers.ContentType =
    new MediaTypeHeaderValue("application/pdf");
form.Add(fileContent, "file", "invoice.pdf");

var response = await client.PostAsync(
    "https://api.gemina.co/api/v1/documents/uploads",
    form
);
var result = await response.Content.ReadAsStringAsync();
Console.WriteLine(result);
```

## PHP

### Quick Start

```php
<?php

define('GEMINA_BASE_URL', getenv('GEMINA_BASE_URL') ?: 'https://api.gemina.co');
define('GEMINA_API_KEY', getenv('GEMINA_API_KEY') ?: '');

if (empty(GEMINA_API_KEY)) {
    throw new Exception('Set GEMINA_API_KEY environment variable');
}
```

### Authentication

All API requests require authentication via the `X-API-Key` header:

```php
<?php

$ch = curl_init('https://api.gemina.co/api/v1/documents/');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER => [
        'X-API-Key: your-api-key',
    ],
]);

$response = curl_exec($ch);
curl_close($ch);
```

### Upload Document

Upload a document for extraction using multipart form data:

```php
<?php

$url = 'https://api.gemina.co/api/v1/documents/uploads';
$cfile = new CURLFile('./invoice.pdf', 'application/pdf', 'invoice.pdf');

$postData = [
    'extraction_types[0]' => 'invoice_headers',
    'extraction_types[1]' => 'invoice_line_items',
    'external_id'         => 'inv-2025-0001',
    'model_type'          => 'invictus',
    'file'                => $cfile,
];

$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $postData,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => [
        'X-API-Key: your-api-key',
    ],
]);

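$response = curl_exec($ch);
if ($response === false) {
    // curl_exec returns false on transport errors (DNS, TLS, timeout).
    $error = curl_error($ch);
    curl_close($ch);
    throw new Exception('Upload failed: ' . $error);
}
curl_close($ch);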

echo $response;
```

## Response Format

Successful extractions return structured JSON with field values and confidence scores:

```json
{
  "status": "success",
  "meta": {
    "documentId": "9860df92-64fe-4b53-9663-5c11b38a3051",
    "externalId": "inv-2025-0001",
    "filename": "invoice.pdf"
  },
  "data": {
    "extractions": [
      {
        "extractionType": "invoice_headers",
        "status": "success",
        "values": {
          "vendorName": {"value": "Acme Beverages Ltd.", "confidence": "high"},
          "invoiceNumber": {"value": "IL-2026-04812", "confidence": "high"},
          "invoiceDate": {"value": "2026-05-12", "confidence": "high"},
          "currency": {"value": "ILS", "confidence": "high"},

          "grossSubtotalAmount": {"value": 7663.63, "confidence": "high"},
          "discountAmount":      {"value": 229.91,  "confidence": "high"},
          "discountPercentage":  {"value": 3.0,     "confidence": "high"},
          "roundingAmount":      {"value": -0.18,   "confidence": "high"},

          "subtotalAmount": {"value": 7433.90, "confidence": "high"},
          "taxes": [
            {"type": "vat", "name": "VAT 18%", "rate": 18.0, "amount": 1338.10, "confidence": "high"}
          ],
          "totalAmount": {"value": 8772.00, "confidence": "high"}
        }
      },
      {
        "extractionType": "invoice_line_items",
        "status": "success",
        "values": {
          "line_items": [
            {
              "lineNumber": 1,
              "description": "Premium 6-pack 330ml beer cans",
              "itemCode": "BV-330-6",
              "quantity": 12.0,
              "listPrice": 65.00,
              "unitPrice": 58.50,
              "discountAmount": 6.50,
              "discountPercentage": 10.0,
              "taxRate": 18.0,
              "packagingAmount": 0.30,
              "depositAmount": 1.20,
              "unitsPerPackage": 6,
              "packageQuantity": 2.0,
              "lineTotal": 703.50
            },
            {
              "lineNumber": 2,
              "description": "Olive oil 1L",
              "itemCode": "OO-1L",
              "quantity": 0.5,
              "listPrice": null,
              "unitPrice": 45.00,
              "discountAmount": null,
              "discountPercentage": null,
              "taxRate": 18.0,
              "packagingAmount": null,
              "depositAmount": null,
              "unitsPerPackage": null,
              "packageQuantity": null,
              "lineTotal": 22.50
            }
          ],
          "total_lines": 2
        }
      }
    ]
  }
}
```
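As a minimal sketch of reading this payload in Python, the snippet below indexes the `data.extractions` array by `extractionType`. It assumes the response shape shown above; the helper name `index_extractions` and the trimmed `sample` payload are illustrative and not part of the API.

```python
import json

# Trimmed copy of the sample response above, used as stand-in data.
sample = json.loads("""
{
  "status": "success",
  "data": {
    "extractions": [
      {"extractionType": "invoice_headers", "status": "success",
       "values": {"totalAmount": {"value": 8772.00, "confidence": "high"}}},
      {"extractionType": "invoice_line_items", "status": "success",
       "values": {"line_items": [{"lineNumber": 1, "lineTotal": 703.50},
                                 {"lineNumber": 2, "lineTotal": 22.50}],
                  "total_lines": 2}}
    ]
  }
}
""")


def index_extractions(response: dict) -> dict:
    """Map each extractionType to its `values` object (hypothetical helper)."""
    extractions = response.get("data", {}).get("extractions", [])
    return {item["extractionType"]: item.get("values") or {} for item in extractions}


by_type = index_extractions(sample)
header_values = by_type["invoice_headers"]
line_items = by_type["invoice_line_items"]["line_items"]

print(header_values["totalAmount"]["value"])
print(len(line_items))
```

Looking up by type rather than by array position avoids depending on the order in which extractions are returned, which this document does not specify.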

## Response Fields

The `data.extractions` array holds one entry per requested extraction type, each with its own `values` object. Unpopulated fields are `null`.

### `invoice_headers` fields

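Each header field uses an envelope shape: `{ value, coordinates, confidence }`. When the invoice does not print the value, the whole envelope is `null`, so a defensive client can guard with a simple null check before reading `value`, for example `if (values.discountAmount) { … }`, where `values` is the `values` object of the `invoice_headers` extraction.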

- `grossSubtotalAmount` — Sum of line items before any header-level discount or rounding.
- `discountAmount` — Header-level discount in the document’s currency. Sign is verbatim from the invoice: some templates print positives (`229.91`), others negatives (`-229.91`) or parenthesized values. Clients that subtract on their side must handle both.
- `discountPercentage` — Header-level discount as a percentage (e.g. `3.0` means 3%). Only populated when the invoice prints it.
- `roundingAmount` — Rounding adjustment (e.g. "round off", agorot rounding). Signed as printed; magnitude typically < 1.0 in document currency.
- `subtotalAmount` — The tax base: the value the invoice’s VAT/tax percentage is calculated against, after any header-level discount and rounding, before tax.
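The null-envelope guard and the discount sign handling described above could be sketched in Python as follows. The helper names `envelope_value` and `discount_magnitude` are hypothetical, and the sketch assumes `value` arrives as a JSON number; how a parenthesized discount is serialized is not stated here, so that case is not handled.

```python
from typing import Optional


def envelope_value(values: dict, field: str) -> Optional[float]:
    """Return a header field's value, or None when the envelope is null or absent."""
    envelope = values.get(field)
    if envelope is None:
        return None
    return envelope.get("value")


def discount_magnitude(values: dict) -> float:
    """Header-level discount as a non-negative number, whatever sign was printed."""
    raw = envelope_value(values, "discountAmount")
    return abs(raw) if raw is not None else 0.0


# The same discount printed three ways: positive, negative, and not printed.
positive = {"discountAmount": {"value": 229.91, "confidence": "high"}}
negative = {"discountAmount": {"value": -229.91, "confidence": "high"}}
missing = {"discountAmount": None}

for sample in (positive, negative, missing):
    print(discount_magnitude(sample))
```

Normalizing to a magnitude once, at the edge of the client, means downstream arithmetic can always subtract the discount without checking its sign again.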

The reconciliation identity (modulo printing artifacts):

```text
subtotalAmount + Σ taxes[].amount ≈ totalAmount
```
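A client may want to verify this identity before trusting the totals. The sketch below is one way to do that in Python; the function name `reconciles` and the `0.05` tolerance are assumptions chosen for illustration, not values defined by the API.

```python
from decimal import Decimal


def reconciles(values: dict, tolerance: str = "0.05") -> bool:
    """Check subtotalAmount + sum(taxes[].amount) ≈ totalAmount within a tolerance."""
    subtotal = values.get("subtotalAmount")
    total = values.get("totalAmount")
    if subtotal is None or total is None:
        return False  # Cannot reconcile when either envelope is null.
    # Convert through str() so binary float noise does not leak into the sum.
    tax_sum = sum(
        (Decimal(str(tax["amount"]))
         for tax in values.get("taxes") or []
         if tax.get("amount") is not None),
        Decimal("0"),
    )
    expected = Decimal(str(subtotal["value"])) + tax_sum
    return abs(expected - Decimal(str(total["value"]))) <= Decimal(tolerance)


# Header values taken from the sample response in this document.
header_values = {
    "subtotalAmount": {"value": 7433.90, "confidence": "high"},
    "taxes": [{"type": "vat", "name": "VAT 18%", "rate": 18.0,
               "amount": 1338.10, "confidence": "high"}],
    "totalAmount": {"value": 8772.00, "confidence": "high"},
}

print(reconciles(header_values))
```

A failed check does not necessarily mean the extraction is wrong, since the identity holds only modulo printing artifacts; it is better treated as a signal to route the document for review.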

### `invoice_line_items` fields

Each item in the `line_items` array is a flat object (no envelope). Unpopulated fields are `null`.

- `listPrice` — Gross/catalog unit price before any line-level discount. Populated only when the invoice prints a dedicated "list price" / "catalog price" / "MSRP" column. Documentation-only — do not use it in line-total math.
- `unitPrice` — NET price per unit, after any line-level discount. The `lineTotal` math uses this value, so the per-line `discountAmount` and `discountPercentage` should not be subtracted again. For the gross/catalog price, use `listPrice` when populated.
- `packagingAmount` — Additive packaging charge (crate fee, palletizing fee). Positive. Contributes to `lineTotal`.
- `depositAmount` — Additive deposit/refund charge (bottle deposit, container deposit). Positive. Contributes to `lineTotal`.
- `unitsPerPackage` — Structural pack size: whole-number count of units per package (e.g. `24` cans per case). Informational; never a volume or weight.
- `packageQuantity` — Order quantity in package units; may be fractional (e.g. `2.1` cartons). Informational. Most invoices print only one of `unitsPerPackage` or `packageQuantity` — both can be `null` independently.

Line-total math contract:

```text
lineTotal ≈ quantity × unitPrice
          + taxAmount        (if present)
          + packagingAmount  (if present)
          + depositAmount    (if present)
```
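The contract above can be checked per line with a few lines of Python. This is a sketch under stated assumptions: `quantity`, `unitPrice`, and `lineTotal` are taken to be populated, the helper names are hypothetical, and the `0.05` tolerance is an arbitrary illustrative choice.

```python
import math


def expected_line_total(item: dict) -> float:
    """quantity × unitPrice plus any additive amounts that are present (non-null)."""
    total = item["quantity"] * item["unitPrice"]
    for field in ("taxAmount", "packagingAmount", "depositAmount"):
        amount = item.get(field)
        if amount is not None:
            total += amount
    # discountAmount / discountPercentage are deliberately ignored:
    # unitPrice is already net of the line-level discount.
    return total


def line_total_matches(item: dict, tolerance: float = 0.05) -> bool:
    """True when the printed lineTotal agrees with the contract within the tolerance."""
    return math.isclose(expected_line_total(item), item["lineTotal"], abs_tol=tolerance)


# The two line items from the sample response in this document (trimmed).
line_1 = {"quantity": 12.0, "unitPrice": 58.50, "discountAmount": 6.50,
          "packagingAmount": 0.30, "depositAmount": 1.20, "lineTotal": 703.50}
line_2 = {"quantity": 0.5, "unitPrice": 45.00, "discountAmount": None,
          "packagingAmount": None, "depositAmount": None, "lineTotal": 22.50}

print(line_total_matches(line_1), line_total_matches(line_2))
```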

When both pack-size fields are present, `quantity ≈ packageQuantity × unitsPerPackage` — the relationship is approximate, not enforced.
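Because the pack-size relationship is approximate and either field may be `null`, a check for it is best kept three-valued. The sketch below returns `None` when the relationship cannot be evaluated; the function name and the 2% relative tolerance are assumptions for illustration.

```python
import math
from typing import Optional


def pack_size_consistent(item: dict, rel_tol: float = 0.02) -> Optional[bool]:
    """None when a pack-size field is null, else whether
    quantity ≈ packageQuantity × unitsPerPackage."""
    units = item.get("unitsPerPackage")
    packages = item.get("packageQuantity")
    if units is None or packages is None:
        return None
    return math.isclose(item["quantity"], packages * units, rel_tol=rel_tol)


# Pack-size fields from the two sample line items in this document.
beer = {"quantity": 12.0, "unitsPerPackage": 6, "packageQuantity": 2.0}
olive_oil = {"quantity": 0.5, "unitsPerPackage": None, "packageQuantity": None}

print(pack_size_consistent(beer), pack_size_consistent(olive_oil))
```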

## API Reference

### Extraction Types

- `invoice_headers` — Invoice header fields (see [field list](#invoice_headers-fields))
- `invoice_line_items` — Line item details (see [field list](#invoice_line_items-fields))
- `ocr` — Full text extraction
- `document_details_hebrew` — Hebrew documents

### Model Types

- `velox` — Fast processing
- `praetorian` — Balanced accuracy
- `invictus` — Highest accuracy

### Endpoints

- `POST /api/v1/documents/uploads`
- `POST /api/v1/documents/uploads/web`
- `GET /api/v1/documents/{id}`
- `GET /api/v1/documents/results/{id}`

### Response Statuses

- `success` — Extraction completed
- `pending` — Job queued
- `in_process` — Processing
- `failed` — Error occurred
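Since `pending` and `in_process` are non-terminal, a client typically polls until it sees `success` or `failed`. The sketch below shows one such loop in Python. It assumes the status is reported in the top-level `status` field, as in the sample response; the function name, polling interval, and attempt limit are illustrative. The HTTP call is passed in as a `fetch` callable, which here is replaced by canned payloads so the example runs without network access.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"success", "failed"}


def wait_for_result(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until the payload's status is terminal, then return the payload."""
    for attempt in range(max_attempts):
        payload = fetch()
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        if attempt < max_attempts - 1:
            sleep(interval)  # "pending" or "in_process": wait, then retry
    raise TimeoutError(f"No terminal status after {max_attempts} attempts")


# Stand-in for an HTTP call such as GET /api/v1/documents/results/{id}.
canned = iter([{"status": "pending"}, {"status": "in_process"}, {"status": "success"}])
result = wait_for_result(lambda: next(canned), sleep=lambda _seconds: None)
print(result["status"])
```

In a real client, `fetch` would wrap the authenticated GET request and return the parsed JSON body; a `failed` payload is returned rather than raised so the caller can inspect it.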

### FileTag API

- Document tagging via REST + MCP
- Free tier: 1,500 tags/month
- HTML docs: https://www.gemina.co/docs/filetag
- Markdown docs: https://www.gemina.co/docs/filetag.md
