@frayle-ons frayle-ons commented Feb 3, 2026

✨ Summary

Package Level Error Handling

These changes add a custom error class, ClassifaiError, which is used to catch and handle internal package errors so that the code no longer surfaces raw exceptions or third-party error messages directly.

Additional subclasses of ClassifaiError are also introduced for specific workflows within the package code including:

  • ConfigurationError
  • DataValidationError
  • ExternalServiceError
  • VectorisationError
  • IndexBuildError
  • HookError
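The PR description does not show exceptions.py itself, so the following is only a minimal sketch of how such a hierarchy could look. The constructor signature, the `context` attribute and the message layout in `__str__` are assumptions based on how the errors are raised in the examples in this description, not the package's actual implementation.

```python
from __future__ import annotations


class ClassifaiError(Exception):
    """Base class for package errors (sketch; the real class may differ)."""

    def __init__(self, message: str, context: dict | None = None) -> None:
        self.message = message
        self.context = dict(context or {})
        super().__init__(message)

    def __str__(self) -> str:
        # Assumed layout: append the context as key=value pairs, if any.
        if not self.context:
            return self.message
        details = ", ".join(f"{key}={value!r}" for key, value in self.context.items())
        return f"{self.message} [{details}]"


class ConfigurationError(ClassifaiError):
    """Invalid package or server configuration."""


class DataValidationError(ClassifaiError):
    """Invalid input supplied by the caller."""


class ExternalServiceError(ClassifaiError):
    """A third-party service or model API call failed."""


class VectorisationError(ClassifaiError):
    """Embedding generation failed."""


class IndexBuildError(ClassifaiError):
    """Building the vector index failed."""


class HookError(ClassifaiError):
    """A user-supplied hook failed or returned invalid data."""


# Catching the base class also catches every subclass.
try:
    raise DataValidationError(
        "file_name must be a non-empty string.", context={"file_name": ""}
    )
except ClassifaiError as err:
    caught = err

print(type(caught).__name__, "-", caught)
```

With this layout a caller can handle all package errors with a single `except ClassifaiError` clause, or pick out a specific subclass where a more targeted response is useful.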

The Vectoriser, VectorStore and start_api code has also been updated to use these custom error classes where needed.

Examples include:

  • Trying to call a Vectoriser embedding model API
        try:
            response = ollama.embed(model=self.model_name, input=texts)
        except Exception as e:
            raise ExternalServiceError(
                "Failed to generate embeddings using Ollama.",
                context={
                    "vectoriser": "ollama",
                    "model": self.model_name,
                    "cause": str(e),
                    "cause_type": type(e).__name__,
                },
            ) from e
  • Trying to execute a custom user hook and catching errors
        if "reverse_search_preprocess" in self.hooks:
            try:
                modified_query = self.hooks["reverse_search_preprocess"](query)
                query = VectorStoreReverseSearchInput.validate(modified_query)
            except Exception as e:
                raise HookError(
                    "reverse_search_preprocess hook raised an exception.",
                    context={
                        "hook": "reverse_search_preprocess",
                        "cause_type": type(e).__name__,
                        "cause_message": str(e),
                    },
                ) from e

In these examples a core error message isolates the failure to a specific part of the code, and additional context arguments appropriate to that runtime error are passed alongside it. ClassifaiError builds these context arguments into the error message at runtime to give the user extra information. For example, the 'cause_message' context in the hook error above tells the user whether the error occurred in the hook itself or in the validation step that follows it.
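To illustrate how the context separates the two failure modes of a hook, here is a small self-contained sketch. `HookError` is redefined as a stand-in with an assumed signature, and `validate_query`, `run_hook` and the two hook functions are hypothetical helpers written for this example; `validate_query` only plays the role that `VectorStoreReverseSearchInput.validate` plays in the package.

```python
class HookError(Exception):
    """Stand-in for the package's HookError (assumed signature)."""

    def __init__(self, message, context=None):
        super().__init__(message)
        self.context = context or {}


def validate_query(value):
    """Hypothetical stand-in for VectorStoreReverseSearchInput.validate."""
    if not isinstance(value, dict) or "id" not in value:
        raise ValueError("query must be a mapping with an 'id' key")
    return value


def run_hook(hook, query):
    """Run a user hook, then validate its output, wrapping any failure."""
    try:
        return validate_query(hook(query))
    except Exception as e:
        raise HookError(
            "reverse_search_preprocess hook raised an exception.",
            context={
                "hook": "reverse_search_preprocess",
                "cause_type": type(e).__name__,
                "cause_message": str(e),
            },
        ) from e


def crashing_hook(query):
    return query["missing_key"]  # fails inside the hook itself (KeyError)


def wrong_type_hook(query):
    return "not a mapping"  # hook runs fine, validation then rejects the result


causes = []
for hook in (crashing_hook, wrong_type_hook):
    try:
        run_hook(hook, {"id": ["1"]})
    except HookError as err:
        causes.append((err.context["cause_type"], err.context["cause_message"]))

for cause_type, cause_message in causes:
    print(f"{cause_type}: {cause_message}")
```

Both failures surface as the same HookError, but the recorded cause type and message differ, which is what lets the user tell a broken hook from a hook that returned the wrong kind of object.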

However, some errors need less context, and the cause_message/cause_type entries are omitted. This applies to some of the DataValidationErrors, such as the one raised for an invalid file path:

# ---- Input validation (caller mistakes) -> DataValidationError / ConfigurationError
if not isinstance(file_name, str) or not file_name.strip():
    raise DataValidationError("file_name must be a non-empty string.", context={"file_name": file_name})

Finally, while the error classes above are intuitively named and raised in the corresponding parts of the code, the parent ClassifaiError is used for more general processing errors, such as those from the core search and reverse search logic.
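One way to get this behaviour is to let specific subclasses pass through unchanged and wrap anything unexpected in the parent class. The sketch below shows that pattern under stated assumptions: the class definitions are stand-ins, `embed` and `search` are hypothetical functions written for this example, and the `except ClassifaiError: raise` clause is a design choice made here, not something the PR description confirms.

```python
class ClassifaiError(Exception):
    """Stand-in for the package's base error (assumed signature)."""

    def __init__(self, message, context=None):
        super().__init__(message)
        self.context = context or {}


class VectorisationError(ClassifaiError):
    """Stand-in for the package's VectorisationError."""


def embed(texts):
    """Hypothetical vectoriser step that raises a specific subclass."""
    if not texts:
        raise VectorisationError("No texts to embed.")
    return [[float(len(text)), 1.0] for text in texts]


def search(texts, doc_embeddings):
    """Score each query against each document with a dot product."""
    try:
        query_vectors = embed(texts)
        return [
            [sum(q * d for q, d in zip(query, doc)) for doc in doc_embeddings]
            for query in query_vectors
        ]
    except ClassifaiError:
        raise  # already a specific package error, do not wrap it again
    except Exception as e:
        raise ClassifaiError(
            "Search failed.",
            context={"cause_type": type(e).__name__, "cause_message": str(e)},
        ) from e


# A corrupted embeddings value surfaces as the general parent error.
try:
    search(["cat"], "apple")
except ClassifaiError as err:
    general = err

# A failure in the vectoriser step keeps its specific subclass.
try:
    search([], [[1.0, 2.0]])
except ClassifaiError as err:
    specific = err

print(type(general).__name__, general.context["cause_type"])
print(type(specific).__name__)
```

Using `raise ... from e` keeps the original exception attached as `__cause__`, so the full traceback remains available to a developer even though the user sees the package-level error.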

ClassifAI examples:

The following are taken from a Jupyter notebook session where I intentionally passed bad arguments to the package API:

Passing a made-up model name to the Hugging Face vectoriser triggers an ExternalServiceError with the corresponding message from Hugging Face:
Screenshot 2026-02-09 at 15 40 28

Passing 'paper' as the file type to the VectorStore init method raises a DataValidationError, which needs no context beyond the user's input:
Screenshot 2026-02-09 at 15 42 37

Pandera dataclass schema validation errors remain untouched so that we do not lose the power of Pydantic and Pandera. Omitting the 'id' column returns:

from classifai.indexers.dataclasses import VectorStoreSearchInput

input_data = VectorStoreSearchInput(
    {"query": ["I am a fruit farmer"]}
)
Screenshot 2026-02-09 at 15 44 34

I manually broke the VectorStore search method to showcase the ClassifaiError that catches general errors. Here I set the loaded document embeddings np.array to the string "apple" before it could be used:
Screenshot 2026-02-09 at 15 45 09

FastAPI Level Error Handling

I experimented with adding FastAPI custom exception handlers, which would let us handle ClassifAI errors in a specific way, i.e. choosing the response code (400s, 500s, etc.) and the content returned by the API. It is possible to pass the entire error message this way, showing which ClassifAI error was the cause. However, the default FastAPI behaviour is to return a 500 Internal Server Error, which I believe is fairly robust for now because it returns no details of the internal package mechanisms. The errors are currently logged to the terminal by default, so a developer running a FastAPI ClassifAI instance can still observe the issue.

In future we may want an optional parameter on the server setup that returns ClassifaiError details in the FastAPI JSON response body. This can be achieved with FastAPI custom exception handlers as described above.
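As a rough sketch of what such an opt-in switch could look like, the function below builds the status code and JSON body for an error, hiding details unless a flag is set. It is kept free of any FastAPI dependency so it runs on its own. The `expose_details` flag, the `error_response` helper, the generic body and the `settings.expose_errors` name in the wiring comments are all hypothetical; only `@app.exception_handler` and `JSONResponse` are existing FastAPI features.

```python
class ClassifaiError(Exception):
    """Stand-in for the package's base error (assumed signature)."""

    def __init__(self, message, context=None):
        super().__init__(message)
        self.context = context or {}


class DataValidationError(ClassifaiError):
    """Stand-in for the package's DataValidationError."""


def error_response(exc, expose_details=False):
    """Return (status_code, json_body) for an exception raised in a request."""
    # Generic body chosen for this sketch; it reveals nothing about internals.
    body = {"detail": "Internal Server Error"}
    if expose_details and isinstance(exc, ClassifaiError):
        body = {
            "detail": str(exc),
            "error_type": type(exc).__name__,
            "context": exc.context,
        }
    return 500, body


# Possible wiring in the server module (assumption, shown as comments only):
#
#   from fastapi.responses import JSONResponse
#
#   @app.exception_handler(ClassifaiError)
#   async def handle_classifai_error(request, exc):
#       status, body = error_response(exc, expose_details=settings.expose_errors)
#       return JSONResponse(status_code=status, content=body)

err = DataValidationError(
    "file_name must be a non-empty string.", context={"file_name": ""}
)
hidden = error_response(err)
exposed = error_response(err, expose_details=True)
print(hidden)
print(exposed)
```

Keeping the flag off by default preserves the current behaviour, where details stay in the server logs and never reach the API client.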

The exception to this rule is that API data validation done with the Pydantic models still returns 422 response codes, indicating which data is missing from the API request.

FastAPI examples:

Example of the experimental setup, which was not implemented in this version, where a ClassifAI error is passed directly through the API by custom exception handlers registered for ClassifaiError:
Screenshot 2026-02-09 at 14 46 54

Example of final choice, where the same error is concealed by FastAPI default behaviour but the ClassifaiError is still logged in the console.

Screenshot 2026-02-09 at 15 25 47

Example of a data validation error returning a 422 response through FastAPI. I removed one of the essential columns, 'description', from the API request:
Screenshot 2026-02-09 at 15 27 44

📜 Changes Introduced

  • feat: Added exceptions.py file to src directory, which defines the ClassifaiError class
  • feat: Added error subclasses for different sections of the classifai workflow - hooks, data validation, external calls, etc.
  • refactor: Updated Vectoriser, VectorStore and server logic to use the error classes

✅ Checklist

Please confirm you've completed these checks before requesting a review.

  • Code passes linting with Ruff
  • Security checks pass using Bandit
  • API and Unit tests are written and pass using pytest
  • Terraform files (if applicable) follow best practices and have been validated (terraform fmt & terraform validate)
  • DocStrings follow Google-style and are added as per Pylint recommendations
  • Documentation has been updated if needed

🔍 How to Test

Running several of the demo notebooks with deliberately bad inputs should show the tester several of the new error types. For example, calling the VectorStore constructor with bad parameters such as:

  • vectoriser="apple"
  • data_type="paper"

will result in errors that inherit from ClassifaiError, of types:

  • DataValidationError
  • ConfigurationError

Creating a bad hook function, one that either fails in its own code or returns a data object that is not of the correct dataclass type, will raise a HookError detailing the bad content.

The codebase can also be intentionally sabotaged. For example, in the VectorStore search method I abruptly set the loaded vdb doc embeddings to "apple":

        # ---- Main search (wrap operational failures) -> SearchError / VectorisationError
        try:
            doc_embeddings = self.vectors["embeddings"].to_numpy()
            doc_embeddings = "apple"  # deliberate sabotage for testing

            all_results: list[pl.DataFrame] = []

            # remainder of search method code
            ...

Doing so and rebuilding the package will show the more general ClassifaiError, which is raised when an error occurs in the general processing steps of the package and is not specifically related to one of the subclass errors.

@frayle-ons frayle-ons marked this pull request as ready for review February 9, 2026 16:00
@github-actions github-actions bot added the enhancement New feature or request label Feb 9, 2026
rileyok-ons previously approved these changes Feb 11, 2026
@rileyok-ons rileyok-ons left a comment

Really like how this works, should really enhance user experience

@rileyok-ons rileyok-ons left a comment

Great stuff

@frayle-ons frayle-ons merged commit f21f873 into main Feb 11, 2026
5 checks passed