
From Script to Summary: A Smarter Way to Condense Movies

by Scripting Technology, April 9th, 2025

Too Long; Didn't Read

This paper introduces a new dataset of 100 movie scripts with human-annotated salient scenes and proposes a two-stage model, SELECT & SUMM, which first identifies key scenes and then generates summaries using only those scenes. The approach outperforms prior models in accuracy and efficiency, making movie script summarization more scalable and informative.


Authors:

(1) Rohit Saxena, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh;

(2) Frank Keller, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh.

Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 6

A. Further Implementation Details

All experiments were performed on an A100 GPU with 80GB of memory. Fully fine-tuning the LED model took approximately 22 hours, and the Pegasus-X model took 30 hours. The LED-based models have 161M parameters, all of which were fine-tuned. Our Scene Saliency Model has 60.2M parameters, for a total of 221.2M parameters. Pegasus-X has 568M parameters, but its performance is lower than that of LED.
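For reference, the following is a minimal sketch of what full fine-tuning of LED might look like with Hugging Face Transformers. The dataset fields (`script`, `summary`), the `train_dataset` object, and the hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of fully fine-tuning LED for summarization with
# Hugging Face Transformers. Hyperparameters and dataset fields
# ("script", "summary") are illustrative assumptions.
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "allenai/led-base-16384"  # ~161M parameters
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def preprocess(example):
    # Tokenize the (salient-scene) input and the gold summary.
    inputs = tokenizer(example["script"], max_length=16384, truncation=True)
    labels = tokenizer(example["summary"], max_length=1024, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

args = Seq2SeqTrainingArguments(
    output_dir="led-moviesum",
    per_device_train_batch_size=1,   # long inputs: one example per step
    gradient_accumulation_steps=8,   # assumed value
    learning_rate=3e-5,              # assumed value
    num_train_epochs=5,              # assumed value
    fp16=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset.map(       # `train_dataset` assumed to be
        preprocess,                        # a datasets.Dataset of scripts
        remove_columns=["script", "summary"],
    ),
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```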


For evaluation, we used Benjamin Heinzerling's implementation of ROUGE [5] and BERTScore with the microsoft/deberta-xlarge-mnli model.
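A hedged sketch of these evaluation calls is shown below, using the `bert-score` package and the pyrouge wrapper [5]. The summary texts and the on-disk directory layout are placeholders.

```python
# Sketch of the evaluation setup: pyrouge [5] for ROUGE and the
# bert-score package for BERTScore. Summary texts and file layout
# are placeholders.
from bert_score import score
from pyrouge import Rouge155

candidates = ["A model-generated summary ..."]   # placeholder
references = ["The gold reference summary ..."]  # placeholder

# BERTScore with the microsoft/deberta-xlarge-mnli backbone.
P, R, F1 = score(candidates, references,
                 model_type="microsoft/deberta-xlarge-mnli", lang="en")
print(f"BERTScore F1: {F1.mean().item():.4f}")

# ROUGE via pyrouge, which expects one summary per file on disk.
r = Rouge155()
r.system_dir = "generated/"                      # assumed layout
r.model_dir = "reference/"
r.system_filename_pattern = r"summary.(\d+).txt"
r.model_filename_pattern = "summary.#ID#.txt"
print(r.convert_and_evaluate())
```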

B. Scene Encoder Experiment

Table 8: Performance of the Scene Saliency Model with different base models as the scene encoder.


We compared the performance of RoBERTa with that of BART (Lewis et al., 2020) and LED (encoder only) as base models for computing scene embeddings in the classification of salient scenes. For each model, we used the large variant and extracted the encoder's last hidden state as the scene embedding. Table 8 reports scene saliency classification results for the different base models. Among these, RoBERTa's embeddings performed marginally better, and the model also has fewer parameters.
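To make the procedure concrete, here is a minimal sketch of extracting a scene embedding from roberta-large. Mean pooling over non-padding tokens is an assumption on our part; the appendix does not specify how the last hidden state is pooled.

```python
# Sketch: scene embeddings from the encoder's last hidden state.
# Mean pooling over non-padding tokens is an assumed choice.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
encoder = AutoModel.from_pretrained("roberta-large").eval()

@torch.no_grad()
def scene_embedding(scene_text: str) -> torch.Tensor:
    inputs = tokenizer(scene_text, truncation=True, max_length=512,
                       return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, 1024)
    mask = inputs["attention_mask"].unsqueeze(-1)  # (1, seq_len, 1)
    # Average the last hidden state over non-padding tokens.
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

emb = scene_embedding("INT. OVAL OFFICE - NIGHT. Lincoln paces the room.")
print(emb.shape)  # torch.Size([1, 1024])
```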


Table 9: Cross-validation results for the scene saliency classifier.

C. Classifier Robustness

To study the robustness of the scene saliency classifier, we performed k-fold cross-validation with k = 5. Table 9 reports the mean and standard deviation across all folds. The low standard deviation shows that the classifier's performance is robust across folds.
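The protocol can be sketched as follows; the random data and the logistic-regression model are stand-ins for the paper's scene embeddings and saliency classifier, not the actual implementation.

```python
# Sketch of the 5-fold cross-validation protocol. The data and the
# logistic-regression model are stand-ins for the paper's scene
# embeddings and saliency classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 1024))   # placeholder scene embeddings
y = rng.integers(0, 2, size=200)   # placeholder saliency labels

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True,
                                random_state=42).split(X):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(f1_score(y[val_idx], clf.predict(X[val_idx])))

scores = np.array(scores)
print(f"F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```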


Table 10: ROUGE F1 scores with 95% confidence intervals for our model and the closest baseline.

D. Statistics for Summarization Result

All ROUGE scores reported in the paper are mean F1 scores computed with bootstrap resampling (1,000 samples). To assess the significance of the results, Table 10 reports 95% confidence intervals for our model and the closest baseline.
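For clarity, this is how a bootstrap mean with a 95% confidence interval can be computed; the per-example ROUGE F1 scores below are placeholders.

```python
# Sketch of bootstrap resampling (1,000 resamples) for a mean ROUGE F1
# with a 95% confidence interval. Per-example scores are placeholders.
import numpy as np

rng = np.random.default_rng(0)
per_example_f1 = rng.uniform(0.2, 0.5, size=100)  # placeholder scores

boot_means = np.array([
    rng.choice(per_example_f1, size=per_example_f1.size, replace=True).mean()
    for _ in range(1000)
])
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={per_example_f1.mean():.3f}  95% CI=[{lower:.3f}, {upper:.3f}]")
```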

E. Samples of Movie Summaries

Table 11: Gold reference summary for the movie Lincoln, with a sample of the question-answer pairs generated for evaluation.


Table 12: Model-generated summary of the movie Lincoln, with answers to the generated questions. Correct answers are shown in green and incorrect answers in red; partially correct answers are shown in both colors.


Table 13: Gold reference summary for the movie Black Panther, with a sample of the question-answer pairs generated for evaluation.


Table 14: Model-generated summary of the movie Black Panther, with answers to the generated questions. Correct answers are shown in green and incorrect answers in red; partially correct answers are shown in both colors.


This paper is available on arXiv under the CC BY 4.0 DEED license.


[5] https://github.com/bheinzerling/pyrouge