feat: ClassifaiError class for Errors and logging #121
+853
−317
✨ Summary
Package Level Error Handling
These changes add a custom error class, `ClassifaiError`, used to catch and handle internal package errors, so the code avoids throwing raw errors or third-party error messages.
Additional subclasses of `ClassifaiError` are also introduced for specific workflows within the package code, including:
- `ConfigurationError`
- `DataValidationError`
- `ExternalServiceError`
- `VectorisationError`
- `IndexBuildError`
- `HookError`

The codebase for the Vectorisers, VectorStore and start_api functionality has also been updated to use these custom error classes where needed.
Examples include:
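A minimal sketch of how such a hierarchy might carry context arguments into its messages. The constructor signature, the `run_hook` helper and the `cause_type`/`cause_message` keyword names are assumptions made for illustration; the package's actual implementation may differ.

```python
class ClassifaiError(Exception):
    """Base class for package errors (illustrative sketch, not the package's code)."""

    def __init__(self, message: str, **context):
        self.message = message
        self.context = context
        super().__init__(self._format())

    def _format(self) -> str:
        # Append any context arguments to the core message at construction time.
        if not self.context:
            return self.message
        details = ", ".join(f"{key}={value!r}" for key, value in self.context.items())
        return f"{self.message} [{details}]"


class DataValidationError(ClassifaiError):
    """Raised for invalid user input (hypothetical definition)."""


class HookError(ClassifaiError):
    """Raised when a user-supplied hook fails (hypothetical definition)."""


def run_hook(hook, payload):
    # Hypothetical helper: wrap a failing hook so the original cause is kept as context.
    try:
        return hook(payload)
    except Exception as exc:
        raise HookError(
            "Hook raised an exception",
            cause_type=type(exc).__name__,
            cause_message=str(exc),
        ) from exc
```

An error that needs no extra context, such as `DataValidationError("File not found")`, simply prints its core message, while a failing hook yields a `HookError` whose message also names the underlying cause.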
In these examples there is a core error message that isolates the error to a specific part of the code, plus additional context arguments appropriate to that specific runtime error. `ClassifaiError` builds these context arguments into the error message at runtime to provide extra information for certain errors. For example, the `cause_message` context of a hook error tells the user whether the error happened in the hook itself or in the validation that follows it.
However, some errors require less additional context, and the `cause_message/type` is not included. This applies to some of the `DataValidationError`s, such as when an error is thrown for an invalid file path.
Finally, while many of the above error classes are intuitively named and appear in the corresponding locations of the code, the generalised parent `ClassifaiError` catches more general processing errors, such as those from the search core logic or reverse search core logic.
Classifai Examples:
The following are taken from a Jupyter notebook session where I intentionally passed bad arguments to the package API:
Passing a made-up model name to the Hugging Face vectoriser triggers an `ExternalServiceError` with the corresponding message from Hugging Face:

Passing 'paper' as the file type to the VectorStore init method returns a `DataValidationError`, which does not need additional context messages beyond the user's input:

Pandera dataclass schema validation errors remain untouched so that we do not lose the power of Pydantic and Pandera. Not setting the 'id' column will return:

I manually broke the VectorStore search method to showcase the `ClassifaiError` that catches general errors. Here I set the loaded document embeddings np.array to the string "apple" before it could be used:

FastAPI Level Error Handling
I experimented with adding FastAPI custom exception handlers, which would allow us to handle ClassifAI errors in a specific way, i.e. choosing the response code (400s, 500s, etc.) and what content is returned by the API. It is possible to pass the entire error message this way, showing which ClassifAI error was the cause. However, FastAPI's default behaviour is to return a 500 Internal Server Error, which I believe is fairly robust for now and returns no details of the internal package mechanisms. Currently the errors are logged to the terminal by default, so a developer running a FastAPI ClassifAI instance can still observe the issue.
In the future we may want an optional parameter on the server setup that returns `ClassifaiError` details in the FastAPI JSON response body. This can be achieved with FastAPI custom exception handlers as described above.
The exception to this rule is that API data validation done with the Pydantic models still returns 422 error response codes, indicating which data is missing from the API request.
FastAPI examples:
Example of the experimental setup, which was not implemented in this version, where a ClassifAI error is passed directly through the API by setting up custom exception handlers for `ClassifaiError`s.

Example of the final choice, where the same error is concealed by FastAPI's default behaviour but the `ClassifaiError` is still logged in the console.
Example of a data validation error returning a 422 error through FastAPI. I removed one of the essential columns, 'description', from the API request.

📜 Changes Introduced
✅ Checklist
🔍 How to Test
Running several of the demo notebooks should show the tester several different types of errors. For example, running the VectorStore constructor with several bad parameters, including:
- `vectoriser="apple"`
- `data_type="paper"`

will result in several errors which inherit from `ClassifaiError`, of types:
- `DataValidationError`
- `ConfigurationError`

Creating a bad hook function that either raises an error in its own code or returns a data object which is not of the correct dataclass type will raise a `HookError` that details the bad content.
Intentionally sabotaging some of the codebase also works; for example, in the VectorStore search method I abruptly set the loaded vdb doc embeddings to "apple".
Doing so and rebuilding the package will surface the more general `ClassifaiError`, which occurs when an error arises in the general processing steps of the package and is not specifically related to one of the subclass errors.
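The catch-all behaviour described above can be illustrated with a small, self-contained sketch. The `search` function below is a hypothetical stand-in for the package's search logic (a plain dot product over lists rather than the real VectorStore code), and the wrapping pattern is an assumption about how unexpected errors might be converted.

```python
class ClassifaiError(Exception):
    """Stand-in for the package's base error (assumption for this sketch)."""


def search(query_vector, doc_embeddings):
    # Hypothetical stand-in for the core search step: dot product per document.
    try:
        return [
            sum(q * d for q, d in zip(query_vector, doc))
            for doc in doc_embeddings
        ]
    except ClassifaiError:
        # Already a package error: let it propagate unchanged.
        raise
    except Exception as exc:
        # Anything unexpected is re-raised as the general parent error,
        # keeping the original exception as the chained cause.
        raise ClassifaiError(
            f"Search failed: {type(exc).__name__}: {exc}"
        ) from exc
```

Calling `search([1.0, 0.0], "apple")` mimics the sabotage described above: the underlying `TypeError` is surfaced as a `ClassifaiError`, with the original exception still available via `__cause__`.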