Task: Redesign streetview module #1232

## Describe the task
Redesign the streetview module to work with the new pipeline and cloud-based data architecture. The old implementation (since deleted) read from a deprecated PostgreSQL database; the new one should read GeoParquet files from cloud storage instead. The new system should identify newly vacant properties by comparing monthly datasets, apply rate limiting and parallelization to API calls, and avoid re-fetching images that are already up to date.

## Acceptance Criteria
- [ ] **Data Source Migration**: Replace PostgreSQL queries with GeoParquet file processing from cloud storage
- [ ] **Incremental Processing**: Implement logic to compare current and previous month's datasets using `opa_id` and `vacant` columns to identify newly vacant properties
- [ ] **Image Refresh Logic**: Add functionality to pull new imagery for properties whose stored image has not been refreshed in the past year
- [ ] **Parallelization**: Implement concurrent processing with proper rate limiting to respect Google Street View API limits
- [ ] **ZenSVI Integration**: Evaluate and potentially integrate [ZenSVI](https://github.com/koito19960406/ZenSVI) for improved street view image collection
- [ ] **Geometry-based Queries**: Consider replacing address-based queries with parcel geometry coordinates for more accurate image positioning
- [ ] **Enhanced Error Handling**: Implement comprehensive error handling for API failures, network issues, and data processing errors
- [ ] **Improved Logging**: Add structured logging with appropriate log levels for monitoring and debugging
- [ ] **Configuration Management**: Externalize configuration parameters (API keys, rate limits, image parameters, etc.)
- [ ] **Testing**: Include unit tests for core functionality and integration tests for the full pipeline
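
As a starting point for the **Incremental Processing** item, here is a minimal sketch of the month-over-month comparison. It is not a proposed final design. The function name `newly_vacant_ids` and the `(opa_id, vacant)` pair format are hypothetical. It assumes that a parcel missing from last month's dataset counts as newly vacant, and that `opa_id` has the same type in both datasets (the old module cast it to `str`).

```python
from typing import Hashable, Iterable, Set, Tuple

# One row of a monthly dataset, reduced to the two columns we compare.
Record = Tuple[Hashable, bool]  # (opa_id, vacant)


def newly_vacant_ids(
    current: Iterable[Record], previous: Iterable[Record]
) -> Set[Hashable]:
    """Return opa_ids that are vacant this month but were not vacant last month.

    Assumption: parcels absent from `previous` are treated as newly vacant.
    """
    previously_vacant = {opa_id for opa_id, vacant in previous if vacant}
    currently_vacant = {opa_id for opa_id, vacant in current if vacant}
    return currently_vacant - previously_vacant


# Example with made-up ids.
previous = [("1001", True), ("1002", False)]
current = [("1001", True), ("1002", True), ("1003", True), ("1004", False)]
print(sorted(newly_vacant_ids(current, previous)))  # ['1002', '1003']
```

With the real data, the pairs could come from the two GeoParquet files, for example `zip(gdf["opa_id"], gdf["vacant"])` after loading each month with `geopandas.read_parquet`.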

## Additional context
Our previous module looked like this:
```python
import os
import time
from urllib.parse import quote

import pandas as pd
import requests
from classes.featurelayer import google_cloud_bucket
from config.psql import conn

# Configure Google
bucket = google_cloud_bucket()
key = os.environ["CLEAN_GREEN_GOOGLE_KEY"]
bucket_name = bucket.name


# Helper Functions
def get_streetview_metadata(address):
    """Fetches metadata from the Street View API."""
    url = f"https://maps.googleapis.com/maps/api/streetview/metadata?location={quote(address)}, Philadelphia, PA&key={key}"
    response = requests.get(url)
    return response.json()


def get_streetview_image(address):
    """Fetches an image from the Street View API."""
    image_url = f"https://maps.googleapis.com/maps/api/streetview?location={quote(address)}, Philadelphia, PA&key={key}&size=600x400"
    response = requests.get(image_url)
    return response.content


def update_blob_metadata(blob, metadata):
    """Updates the metadata of an existing blob."""
    blob.metadata = metadata
    blob.patch()
    print(f"Metadata updated for {blob.name}")


def upload_image_with_metadata(blob, image_content, metadata):
    """Uploads an image with metadata to a GCP bucket."""
    blob.metadata = metadata
    blob.upload_from_string(image_content, content_type="image/jpeg")
    print(f"Image uploaded to {bucket_name}/{blob.name} with metadata")


# Load Data
properties = pd.read_sql("select * from vacant_properties_end", conn)
print(len(properties), "properties loaded from database")

# Get list of all filenames in bucket
blobs = bucket.list_blobs()
blobs = [blob.name.split(".")[0] for blob in blobs]
print(f"Found {len(blobs)} images in bucket")

# Remove from properties any value of opa_id that is in blobs
properties = properties[~properties.opa_id.astype(str).isin(blobs)]
print(f"Found {len(properties)} images to fetch")


for idx, row in properties.iterrows():
    opa_id = row["opa_id"]
    file_name = f"{opa_id}.jpg"
    blob = bucket.blob(file_name)

    # Check if file already exists (shouldn't happen based on above code, but just to confirm)
    if blob.exists():
        print(f"Image {file_name} already exists")

    else:
        # Get streetview image
        print(f"Fetching image for {row['address']}")
        image_content = get_streetview_image(row["address"])
        metadata = get_streetview_metadata(row["address"])
        upload_image_with_metadata(blob, image_content, metadata)

    time.sleep(0.5)
```
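
The old loop above fetches one image at a time with a fixed `time.sleep(0.5)` and queries by address. For the **Parallelization** and **Geometry-based Queries** items, one possible shape is sketched below using only the standard library. `build_image_url`, `RateLimiter`, and `fetch_all` are hypothetical names, the defaults for `max_per_second` and `max_workers` are placeholders rather than actual Google quota values, and the `fetch` callable (for example a wrapper around `requests.get`) is left to the caller. Retries and error handling are deliberately left out.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple
from urllib.parse import urlencode

STREETVIEW_URL = "https://maps.googleapis.com/maps/api/streetview"


def build_image_url(lat: float, lon: float, key: str, size: str = "600x400") -> str:
    """Build a Street View image URL from coordinates instead of an address."""
    params = {"location": f"{lat},{lon}", "size": size, "key": key}
    return f"{STREETVIEW_URL}?{urlencode(params)}"


class RateLimiter:
    """Spaces calls at least 1 / max_per_second seconds apart, across threads."""

    def __init__(self, max_per_second: float) -> None:
        self._interval = 1.0 / max_per_second
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def fetch_all(
    items: Iterable[Tuple[str, str]],
    fetch: Callable[[str], bytes],
    max_per_second: float = 2.0,  # placeholder, not a real quota value
    max_workers: int = 4,
) -> Dict[str, bytes]:
    """Fetch every (opa_id, url) pair concurrently under a shared rate limit."""
    limiter = RateLimiter(max_per_second)

    def task(item: Tuple[str, str]) -> Tuple[str, bytes]:
        opa_id, url = item
        limiter.wait()
        return opa_id, fetch(url)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map re-raises the first exception from any task; a real
        # implementation would catch and log failures per item instead.
        return dict(pool.map(task, items))
```

The coordinates passed to `build_image_url` could be a point derived from each parcel's geometry, such as its centroid reprojected to latitude and longitude. Passing `fetch` in as a parameter keeps the network call out of the scheduling logic, so unit tests can swap in a fake.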
