NLP With Deep Learning (W266)

Submission by Carolina Arriaga, Ayman, Abhi Sharma

Winter 2021 | UC Berkeley

ShapSum: A Framework to Predict Human Judgement Multi-Dimensional Quality Scores for Text Summarization

Text summarization is the task of producing a shorter version of a document. Summarization models have mainly been compared with one another on their ROUGE score (Lin, 2004). The metric has been widely criticized because it only assesses content selection and does not account for other quality dimensions such as fluency, grammaticality, coherence, consistency and relevance (Ruder). Combined metrics like BLEND or DPMFcomb incorporate lexical, syntactic and semantic metrics and achieve high correlation with human judgement (Yu et al., 2015) in the machine translation (MT) and text generation context. However, none of these combined metrics has been tested on summaries, and in particular none has moved away from human scores based on Pyramid and Responsiveness. Our findings show that multiple metrics used in the summarization field are predictive of multidimensional quality evaluations from experts. We produced four saturated models using decision trees, together with the corresponding surrogate Shapley explanation models, to measure each metric's contribution to four dimensions of evaluation (fluency, relevance, consistency, coherence). We hope that our work can be used as a standard evaluation framework for comparing summary quality between new summarization models.
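To make the idea of "metric contribution" concrete, the sketch below computes exact Shapley values for a single summary. It is an illustration only and not the project's code: the metric names (`rouge_l`, `bert_score`, `length_ratio`), the hand-written `toy_tree` standing in for a fitted decision tree, and the instance and baseline values are all hypothetical assumptions. Features outside a coalition are replaced by the baseline value, which is one common way to define the value function.

```python
from itertools import combinations
from math import factorial

# Hypothetical automatic metrics; not the feature set used in the paper.
FEATURES = ["rouge_l", "bert_score", "length_ratio"]


def toy_tree(x):
    """Hand-written stand-in for a fitted decision tree predicting a fluency score."""
    if x["bert_score"] > 0.85:
        return 4.5 if x["rouge_l"] > 0.30 else 4.0
    return 3.0 if x["length_ratio"] > 0.5 else 2.0


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features in the coalition take the instance's value; all others
    take the baseline's value (an assumption of this sketch).
    """
    n = len(FEATURES)

    def value(coalition):
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(x)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Made-up metric values for one summary and for a reference "average" summary.
instance = {"rouge_l": 0.42, "bert_score": 0.90, "length_ratio": 0.30}
baseline = {"rouge_l": 0.20, "bert_score": 0.80, "length_ratio": 0.60}

phi = shapley_values(toy_tree, instance, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the prediction for the summary and the prediction for the baseline, which is what allows them to be read as a per-metric breakdown of a predicted quality score. Exhaustive enumeration is only practical for a handful of features; with many metrics, an approximate or tree-specific method would be needed.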

If you are looking for the team's auxiliary analysis of varying-length summary output versus summary length, along with additional exploration, it can be found here.

Project outputs

Link to Google Drive folder.
Link to paper.
Link to presentation.

