# Google Cloud Vision OCR Setup Guide

## Why Switch from Tesseract?

ChatGPT's OCR (likely GPT-4 Vision) is noticeably more accurate than Tesseract, especially for:
- Low quality or scanned images
- Complex layouts with multiple columns
- Text with poor contrast or lighting
- Mixed fonts and sizes

The Google Cloud Vision API offers accuracy comparable to ChatGPT's OCR.

## Setup Steps

### 1. Create Google Cloud Project

1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select an existing one
3. Enable the **Cloud Vision API**:
   - Navigate to "APIs & Services" > "Library"
   - Search for "Cloud Vision API"
   - Click "Enable"

### 2. Create Service Account Credentials

1. Go to "APIs & Services" > "Credentials"
2. Click "Create Credentials" > "Service Account"
3. Fill in details:
   - Name: `constituency-ocr`
   - Description: `OCR service for voter data extraction`
4. Click "Create and Continue"
5. Grant role: **Cloud Vision AI Service Agent**
6. Click "Done"
7. Click on the created service account
8. Go to "Keys" tab
9. Click "Add Key" > "Create new key"
10. Choose **JSON** format
11. Download the JSON file

### 3. Configure Laravel Application

1. Copy the downloaded JSON credentials file to:
   ```
   /Volumes/Workspace/Client Projects/ConstituencyApi/storage/app/google/vision-credentials.json
   ```

2. Add to your `.env` file:
   ```env
   # OCR Configuration
   OCR_ENGINE=google_vision
   # The path contains a space, so the value must be quoted
   GOOGLE_VISION_CREDENTIALS_PATH="/Volumes/Workspace/Client Projects/ConstituencyApi/storage/app/google/vision-credentials.json"
   ```

3. To use Tesseract instead (as a fallback or for testing), set:
   ```env
   OCR_ENGINE=tesseract
   ```
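
Before running an import, it can help to confirm that the file at `GOOGLE_VISION_CREDENTIALS_PATH` really is a service account key. The following is a minimal sketch in plain PHP, not part of the application: `checkVisionCredentials` is a hypothetical helper, and the fields it checks (`type`, `project_id`, `private_key`, `client_email`) are the ones a downloaded service account JSON key normally contains.

```php
<?php

declare(strict_types=1);

/**
 * Return a list of problems found in a service account key file.
 * An empty list means the file looks usable.
 * Hypothetical helper for illustration; not part of the application.
 */
function checkVisionCredentials(string $path): array
{
    if (!is_file($path) || !is_readable($path)) {
        return ["Credentials file not found or not readable: {$path}"];
    }

    $data = json_decode((string) file_get_contents($path), true);
    if (!is_array($data)) {
        return ['Credentials file is not valid JSON'];
    }

    $problems = [];

    // Keys created under "Keys" > "Create new key" declare this type.
    if (($data['type'] ?? null) !== 'service_account') {
        $problems[] = 'Expected "type" to be "service_account"';
    }

    // Fields needed to authenticate as the service account.
    foreach (['project_id', 'private_key', 'client_email'] as $field) {
        if (empty($data[$field])) {
            $problems[] = "Missing field: {$field}";
        }
    }

    return $problems;
}
```

Calling `checkVisionCredentials()` with the path from your `.env` and printing the returned messages gives a quick check that does not touch the API.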

### 4. Test the Integration

Run a test import:
```bash
php artisan tinker --execute="
use App\Services\OcrService;
\$ocr = new OcrService();
\$words = \$ocr->extractStructuredData('Output/nellithope 3.png');
echo 'Extracted ' . count(\$words) . ' words' . PHP_EOL;
echo 'First 5 words: ' . implode(', ', array_slice(array_column(\$words, 'text'), 0, 5)) . PHP_EOL;
"
```

## Cost Information

### Google Cloud Vision Pricing
- First 1,000 images/month: **FREE**
- 1,001 - 5,000,000 images: **$1.50 per 1,000 images**
- 5,000,001+ images: **$0.60 per 1,000 images**

### Example Cost Calculation
- Processing 10,000 voter roll pages:
  - First 1,000 pages: $0 (free tier)
  - Next 9,000 pages: 9 × $1.50 = **$13.50**
  - Total: **$13.50**
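
The tier arithmetic above can be sketched as a small function. This is only an estimate under the rates listed in this guide, which are hard-coded here; `estimateVisionCost` is a hypothetical helper, not part of the application, and the actual bill may differ.

```php
<?php

declare(strict_types=1);

/**
 * Estimate the monthly cost in USD for a number of images,
 * using the three pricing tiers listed in this guide.
 */
function estimateVisionCost(int $images): float
{
    $freeTier = 1000;        // first 1,000 images are free
    $standardCap = 5000000;  // standard rate applies up to 5,000,000 images
    $standardRate = 1.50;    // USD per 1,000 images
    $volumeRate = 0.60;      // USD per 1,000 images above the cap

    // Images billed at the standard rate (after the free tier, up to the cap).
    $standardImages = max(0, min($images, $standardCap) - $freeTier);
    // Images billed at the reduced volume rate.
    $volumeImages = max(0, $images - $standardCap);

    $cost = $standardImages / 1000 * $standardRate
          + $volumeImages / 1000 * $volumeRate;

    return round($cost, 2);
}

// The 10,000-page example from this section.
printf('Estimated cost: $%.2f' . PHP_EOL, estimateVisionCost(10000));
```

For 10,000 pages this prints `Estimated cost: $13.50`, matching the manual calculation.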

## Benefits Over Tesseract

- ✅ **Accuracy**: Roughly 95%+ vs. Tesseract's 70-80% (approximate figures)
- ✅ **OCR Error Handling**: Fewer misreads between look-alike characters such as `$`/`S` and `O`/`0`
- ✅ **Layout Detection**: Superior column and structure detection
- ✅ **Text Quality**: Works better with poor quality scans
- ✅ **Maintenance**: No image preprocessing step to maintain

## Fallback Mechanism

The application automatically falls back to Tesseract if:
- Google Vision credentials are not configured
- The API call fails
- A network issue occurs
- A rate limit is hit

This ensures your application continues working even if Google Cloud Vision is unavailable.
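
The pattern behind this behaviour can be sketched in plain PHP. This is an illustration under assumptions, not the actual `OcrService` code: `extractWithFallback` and the two stand-in closures are hypothetical, and the sketch assumes that every failure of the primary engine surfaces as an exception.

```php
<?php

declare(strict_types=1);

/**
 * Try the primary OCR engine; on any failure, use the fallback engine.
 * Both engines are passed in as callables that take an image path
 * and return an array of words.
 */
function extractWithFallback(callable $primary, callable $fallback, string $imagePath): array
{
    try {
        return ['engine' => 'google_vision', 'words' => $primary($imagePath)];
    } catch (\Throwable $e) {
        // Missing credentials, API errors, network failures and rate limits
        // all end up here in this sketch.
        error_log('Google Vision failed, falling back to Tesseract: ' . $e->getMessage());

        return ['engine' => 'tesseract', 'words' => $fallback($imagePath)];
    }
}

// Stand-in engines for illustration only; real ones would call the OCR backends.
$vision = function (string $path): array {
    throw new \RuntimeException('rate limit exceeded');
};
$tesseract = fn (string $path): array => [['text' => 'example']];

$result = extractWithFallback($vision, $tesseract, 'Output/nellithope 3.png');
echo $result['engine'], PHP_EOL; // prints "tesseract"
```

Returning the engine name alongside the words makes it easy to log how often the fallback is used.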

## Security Notes

⚠️ **Important**: Add the credentials file to `.gitignore`:
```
storage/app/google/vision-credentials.json
```

Never commit API credentials to version control!

## Monitoring Usage

Monitor your Google Cloud Vision usage:
1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Select your project
3. Navigate to "APIs & Services" > "Dashboard"
4. View "Cloud Vision API" usage statistics

## Testing Both Engines

You can switch engines at any time by setting `OCR_ENGINE` in `.env` to one of the following values:

```env
# Use Google Vision (recommended for production)
OCR_ENGINE=google_vision

# Use Tesseract (free, lower accuracy)
OCR_ENGINE=tesseract
```

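No code changes are needed: `OcrService` handles both engines automatically. If you have cached the configuration with `php artisan config:cache`, run `php artisan config:clear` for the change to take effect.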
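
How a service might turn the `OCR_ENGINE` value into an engine choice can be sketched as follows. This is an assumption about the behaviour, not the actual `OcrService` implementation: `resolveOcrEngine` is a hypothetical helper that defaults to Tesseract when the value is unknown or the credentials file is missing.

```php
<?php

declare(strict_types=1);

/**
 * Pick the OCR engine from the configured value.
 * Google Vision is chosen only when it is requested and the
 * credentials file exists; everything else resolves to Tesseract.
 */
function resolveOcrEngine(?string $configured, ?string $credentialsPath): string
{
    // Tolerate stray whitespace and casing in the .env value.
    $engine = strtolower(trim((string) $configured));

    if ($engine === 'google_vision'
        && $credentialsPath !== null
        && is_file($credentialsPath)) {
        return 'google_vision';
    }

    return 'tesseract';
}

// With a credentials path that does not exist, the sketch picks Tesseract.
echo resolveOcrEngine('google_vision', '/path/that/does/not/exist.json'), PHP_EOL;
```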
