Inconsistent Output Formats in sem_extract #199

@superctj

Bug Description

Values in a sem_extract output column have inconsistent formats, even when extract_quotes is set to False. See the screenshot below.

Expected Behavior

All values in an output column should share a consistent format.

Steps to Reproduce

Environment Information

Operating System:

  • macOS
  • Linux
  • Windows
  • Other (please specify)

Python Version:

3.10

Package Versions:

1.1.3

Error Messages and Logs

N/A

Screenshots

Image

Minimal Reproduction Example

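# Minimal code to reproduce the bug
import pandas as pd
import lotus  # assumption: sem_extract is the DataFrame accessor registered by the lotus-ai package

# args.text_filepath is the path to the CSV file linked below
text_df = pd.read_csv(args.text_filepath, sep=",", quotechar='"')

# Note: prompt is defined here but never passed to sem_extract
prompt = "Analyze the movie description and extract the director name."
text_input_cols = ["text"]
text_output_cols = {
    "director": "The director of the movie",
}

processed_text_df = text_df.sem_extract(
    text_input_cols,
    text_output_cols,
    # extract_quotes=False,
    # return_raw_outputs=False,
)
print(processed_text_df.head())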
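
Since the screenshot is not preserved here, the exact kind of inconsistency is not visible. One way to make it concrete is to count how the values in the output column are formatted. The sketch below is only an illustration: `classify_format` and `summarize_formats` are hypothetical helpers, not part of any library, and the format labels (plain string, JSON-quoted string, JSON list, and so on) are assumptions about what the inconsistency might look like. The sample values are made up.

```python
import json
import math
from collections import Counter
from typing import Any, Iterable


def classify_format(value: Any) -> str:
    """Return a coarse label describing how a single cell is formatted.

    Hypothetical helper: the labels are illustrative guesses, not the
    formats actually shown in the missing screenshot.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "missing"
    if not isinstance(value, str):
        # e.g. a real Python list or dict stored in the cell
        return type(value).__name__
    try:
        parsed = json.loads(value.strip())
    except ValueError:
        # Not valid JSON, so treat it as an ordinary string
        return "plain string"
    if isinstance(parsed, (list, dict)):
        return f"JSON {type(parsed).__name__}"
    if isinstance(parsed, str):
        # The cell holds a string wrapped in an extra pair of quotes
        return "JSON-quoted string"
    return "plain string"


def summarize_formats(values: Iterable[Any]) -> Counter:
    """Count how many cells fall under each format label."""
    return Counter(classify_format(v) for v in values)


# Made-up sample values standing in for an output column
sample = [
    "Christopher Nolan",
    '"Greta Gerwig"',
    '["Lana Wachowski", "Lilly Wachowski"]',
    None,
]
print(summarize_formats(sample))
```

With the reproduction above, the same check could be run as `summarize_formats(processed_text_df["director"])`; more than one label in the result would indicate mixed formats in that column.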

Link to download the CSV file.

Additional Context

Checklist

  • I have searched existing issues to avoid duplicates
  • I have provided all required information
  • I have tested with the latest version of the package
  • I have included a minimal reproduction example (if applicable)

Labels

bug: Something isn't working
