
Authors:
(1) Rohit Saxena, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh;
(2) Frank Keller, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh.
All experiments were performed on an A100 GPU with 80GB memory. Fully fine-tuning the LED model took approximately 22 hours, and the Pegasus-X model took 30 hours. The LED-based models have 161M parameters, all of which were fine-tuned. Our scene saliency model has 60.2M parameters, for a total of 221.2M parameters. Pegasus-X has 568M parameters but performs worse than LED.
For evaluation, we used Benjamin Heinzerling’s implementation of ROUGE [5] and BERTScore with the microsoft/deberta-xlarge-mnli model.
We compared the performance of RoBERTa with that of BART (Lewis et al., 2020) and LED (encoder only) as the base models for computing scene embeddings in salient scene classification. For each model, we used the large variant and extracted the encoder’s last hidden state as scene embeddings. We report the results of scene saliency classification with different base models in Table 8. Among these models, RoBERTa’s embeddings performed marginally better while also having fewer parameters.
To study the robustness of the scene saliency classifier we performed k-fold cross-validation with k = 5. We report mean results with standard deviation across all folds in Table 9. The low standard deviation shows that the performance of the scene classifier is robust across different folds.
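The cross-validation protocol above can be sketched as follows. This is a minimal illustration, not the paper's code: `score_fold` is a hypothetical stand-in for training the scene saliency classifier on the remaining folds and scoring it on the held-out fold.

```python
import random
import statistics

def kfold_indices(n, k=5, seed=0):
    """Shuffle n example indices and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(score_fold, n, k=5):
    """Evaluate score_fold on each held-out fold; report mean and std dev.

    score_fold(held_out) is a placeholder for: train on all indices not in
    held_out, then return the evaluation metric on held_out.
    """
    folds = kfold_indices(n, k)
    scores = [score_fold(held_out) for held_out in folds]
    return statistics.mean(scores), statistics.stdev(scores)
```

A low standard deviation returned by `cross_validate` is what Table 9 summarizes across the five folds.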
All ROUGE scores reported in the paper are mean F1 scores computed with bootstrap resampling using 1,000 samples. To assess the significance of the results, we report 95% confidence intervals for our model and the closest baseline in Table 10.
This paper is available on arXiv under a CC BY 4.0 DEED license.
[5] https://github.com/bheinzerling/pyrouge