
Inside the Quietly Powerful Architecture That Could Power User Recommendations

by Vignesh Kamath, April 8th, 2025 (5 min read)

Too Long; Didn't Read

This article uses a personalized playlist generation system to explain event-driven architecture built on AWS services.


What Are Event-Driven Systems?

An event-driven system is an architecture where components react to events rather than waiting for direct requests. In contrast to traditional synchronous systems, where services make blocking calls and wait for responses, event-driven architectures allow services to communicate asynchronously via event notifications.

Key Components of Event-Driven Systems

  1. Event Producers – Generate events when actions occur (e.g., a user skips a song).
  2. Event Routers – Deliver events to interested subscribers (e.g., SNS, EventBridge).
  3. Event Consumers – Process events and take necessary actions (e.g., Lambda functions, services reading from an SQS queue, database updates).
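
To make the three roles concrete, here is a minimal in-memory sketch. It is not how SNS or EventBridge work internally, and unlike a real router it delivers events synchronously in-process. The class `EventRouter` and the consumer names are made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class EventRouter:
    """Toy in-memory router: delivers each event to every subscriber of its type."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: Event) -> int:
        """Deliver the event and return how many consumers received it."""
        handlers = self._subscribers.get(event["type"], [])
        for handler in handlers:
            handler(event)
        return len(handlers)


# Two independent consumers (hypothetical names) reacting to the same event.
skip_counts: Dict[str, int] = defaultdict(int)
audit_log: List[str] = []


def count_skips(event: Event) -> None:
    skip_counts[event["user_id"]] += 1


def record_audit(event: Event) -> None:
    audit_log.append(f'{event["type"]}:{event["song_id"]}')


router = EventRouter()
router.subscribe("SongSkipped", count_skips)
router.subscribe("SongSkipped", record_audit)

# The producer only knows about the router, not about the consumers.
delivered = router.publish({"type": "SongSkipped", "user_id": "u1", "song_id": "s42"})
```

The point of the sketch is the decoupling: adding a third consumer means one more `subscribe` call, with no change to the producer.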

How Are They Different from Synchronous Systems?

| Feature | Synchronous Systems | Event-Driven (Asynchronous) Systems |
| --- | --- | --- |
| Processing | Request-response model (blocking) | Event-based (non-blocking) |
| Latency | Users wait for responses | Immediate response while processing happens in the background |
| Fault Tolerance | Failures propagate through services | Failures are isolated from other processes |

When Should Event-Driven Systems Be Used?

As the name suggests, the obvious use cases are systems that revolve around events (for example, a package delivery tracker). However, the use case I want to focus on here is moving expensive data processing into asynchronous background work, so that request-time retrieval stays low latency.

Explained Using an Example: Personalized Playlists

What Are Personalized Playlists?

Spotify continuously updates personalized playlists (like Discover Weekly) based on user interactions. For example, when a user skips/likes a song, that data is processed asynchronously to update their music preferences.


A synchronous system would try to generate a playlist on request, making the user wait while the work is done. Instead, an event-driven system ensures:

  1. Fast User Experience – The UI does not wait for processing.
  2. Efficient Processing – Large-scale data updates happen asynchronously.
  3. Scalability – Millions of users can trigger song events without overwhelming the system.

Personalized Playlist Event-Driven Architecture Using AWS


Architecture example


Such architecture would work as follows:

Step 1: The Event Producer – User Skips/likes a Song

  • When a user skips or likes a song in the Spotify app, an event (SongSkipped, SongLiked) is generated.
  • This event is published to Amazon SNS (Simple Notification Service), which acts as an event router.
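
As a rough sketch of the producer side, the snippet below builds an event payload and publishes it to a topic. The field names (`eventType`, `userId`, `songId`, `occurredAt`) and both function names are assumptions for illustration, not a schema from any real service. The client is passed in so the example runs without AWS credentials; in practice it would be `boto3.client("sns")`, whose `publish` call takes the keyword arguments used here.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def build_song_event(event_type: str, user_id: str, song_id: str) -> Dict[str, Any]:
    """Build the event payload; the field names are illustrative assumptions."""
    if event_type not in ("SongSkipped", "SongLiked"):
        raise ValueError(f"unsupported event type: {event_type}")
    return {
        "eventType": event_type,
        "userId": user_id,
        "songId": song_id,
        "occurredAt": datetime.now(timezone.utc).isoformat(),
    }


def publish_song_event(sns_client: Any, topic_arn: str, event: Dict[str, Any]) -> str:
    """Publish the event to an SNS topic and return the SNS message ID.

    `sns_client` is expected to behave like boto3.client("sns").
    """
    response = sns_client.publish(
        TopicArn=topic_arn,
        Message=json.dumps(event),
        # Message attributes let subscribers filter by event type.
        MessageAttributes={
            "eventType": {"DataType": "String", "StringValue": event["eventType"]}
        },
    )
    return response["MessageId"]
```

Recording `occurredAt` at the producer matters later: consumers can use it to reason about ordering and about events replayed long after the fact.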

Step 2: Event Routing – SNS Distributes the Event

  • SNS broadcasts the event to multiple subscribers:
    • SQS Queue (for batch processing of user preferences).
    • AWS Lambda (for real-time metrics and analytics updates).
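    • EventBridge (if needed for integration with third-party recommendation models). SNS cannot subscribe an EventBridge bus directly, so this path would need a forwarder such as a small Lambda function.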
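
The fan-out itself is just a set of subscriptions on the topic. A minimal sketch, assuming a client that behaves like `boto3.client("sns")` and a hypothetical helper named `fan_out`:

```python
from typing import Any, Dict, List


def fan_out(
    sns_client: Any,
    topic_arn: str,
    queue_arns: List[str],
    lambda_arns: List[str],
) -> Dict[str, str]:
    """Subscribe each queue and function to the topic.

    Returns a mapping of endpoint ARN -> subscription ARN.
    """
    subscriptions: Dict[str, str] = {}
    targets = [("sqs", arn) for arn in queue_arns] + [("lambda", arn) for arn in lambda_arns]
    for protocol, endpoint in targets:
        response = sns_client.subscribe(
            TopicArn=topic_arn,
            Protocol=protocol,
            Endpoint=endpoint,
            ReturnSubscriptionArn=True,
        )
        subscriptions[endpoint] = response["SubscriptionArn"]
    return subscriptions
```

Subscribing is only half of the setup: each queue also needs an access policy that lets the topic send to it, and each function needs a permission that lets SNS invoke it. Those pieces are omitted here.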

Step 3: Event Consumer – Processing in SQS and Lambda

  • Amazon SQS (Simple Queue Service) receives the event and stores it in a queue. Note that multiple queues can subscribe to the same SNS topic.

  • A personalization microservice (running in AWS Fargate or Lambda) pulls events from the SQS queue and processes them:

    1. Updates the User Preferences Store in Amazon DynamoDB.
    2. Updates an aggregated machine learning model that improves recommendations.
    3. The machine learning model regenerates the playlist with the new information and stores it in DynamoDB.
  • A user activity Lambda listens through its own queue (subscribed to multiple activity topics) and updates the activity table.

  • A playlist preview service listens to DynamoDB stream events and saves some playlists in ElastiCache with song previews.
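
To illustrate the consumer side, the sketch below folds a batch of queue messages into per-user song scores. It assumes the default SNS-to-SQS delivery, where the SQS body is a JSON envelope whose `Message` field holds the published payload. The payload fields (`eventType`, `userId`, `songId`) and the weights are invented for the example, and a plain dictionary stands in for the DynamoDB preferences store.

```python
import json
from typing import Any, Dict, Iterable

# Hypothetical weights: how much each event type shifts a song's score.
EVENT_WEIGHTS = {"SongLiked": 1.0, "SongSkipped": -0.5}


def unwrap_sns_message(sqs_body: str) -> Dict[str, Any]:
    """Extract the original event from an SQS body containing an SNS envelope."""
    envelope = json.loads(sqs_body)
    return json.loads(envelope["Message"])


def apply_events(
    preferences: Dict[str, Dict[str, float]],
    sqs_bodies: Iterable[str],
) -> Dict[str, Dict[str, float]]:
    """Fold a batch of events into per-user song scores (stand-in for the store)."""
    for body in sqs_bodies:
        event = unwrap_sns_message(body)
        weight = EVENT_WEIGHTS.get(event["eventType"])
        if weight is None:
            continue  # Ignore event types this consumer does not understand.
        user_scores = preferences.setdefault(event["userId"], {})
        user_scores[event["songId"]] = user_scores.get(event["songId"], 0.0) + weight
    return preferences
```

Processing in batches is what makes the queue worthwhile: the service can read many messages at once and write to the store once per user instead of once per event.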

Step 4: Real-Time API Call for Updated Playlists

  • When the user later requests a playlist, the Spotify API fetches recommendations from:

    1. ElastiCache (Redis) – Fast retrieval of recommendations with previews.
    2. DynamoDB (on a cache miss, for example when the Redis entry has expired).
  • The user instantly sees an updated personalized playlist based on their past interactions.
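
The read path above is essentially a cache lookup with a fallback. Here is a minimal sketch using in-memory stand-ins: `TtlCache` plays the role of Redis and a dictionary plays the role of the DynamoDB table. Both names are hypothetical. Writing the result back to the cache on a miss is an assumption of this sketch; in the architecture described, the preview service populates the cache.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TtlCache:
    """Tiny in-memory stand-in for Redis with per-key expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, key: str) -> Optional[List[str]]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # Expired: treat as a miss.
            return None
        return value

    def set(self, key: str, value: List[str]) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def get_playlist(
    user_id: str,
    cache: TtlCache,
    store: Dict[str, List[str]],
) -> Tuple[List[str], str]:
    """Return (playlist, source), reading the cache first and the store on a miss."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached, "cache"
    playlist = store.get(user_id, [])  # Stand-in for a DynamoDB read.
    cache.set(user_id, playlist)
    return playlist, "store"
```

Either way, the request never triggers playlist generation. It only reads what the background pipeline has already computed.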


AWS Services Used and Their Role

| AWS Service | Role in Architecture |
| --- | --- |
| Amazon SNS | Publishes and routes the events (e.g., SongSkipped) to multiple subscribers. |
| Amazon SQS | Stores events for batch processing, ensuring scalability. |
| AWS Lambda | Processes real-time analytics updates and sends metrics. |
| Amazon DynamoDB | Stores user preferences and recommendation history. |
| Amazon ElastiCache (Redis) | Provides low-latency playlist retrieval for API calls. |
| Amazon SageMaker | Hosts and runs machine learning models. |
| AWS Fargate | Compute platform for heavy-logic microservices. |
| Amazon S3 | Cold storage for large data (music files). |

Advantages of this Architecture

  • Instant User Experience – Playlists update asynchronously without blocking requests.
  • Scalable Event Processing – SNS and SQS handle millions of concurrent events and event bursts. Because consumers pull from SQS at their own pace, the queue absorbs bursts and keeps the load on backend systems even.
  • Optimized Costs – AWS Lambda and Fargate scale based on demand, reducing idle costs.
  • Decoupled Services – Microservices can evolve independently.

Disadvantages of Event-Driven Systems

While event-driven systems provide scalability and flexibility, they have some trade-offs:

  • Complex Debugging & Monitoring – Events flow asynchronously, making debugging harder.
  • Event Ordering Challenges – Handling event order across distributed services can be tricky. Events may arrive out of order, and consumers must be designed to handle that.
  • Increased Latency for Data Consistency – Updates propagate asynchronously, leading to eventual consistency instead of immediate consistency.
  • Error Handling Complexity – Failed events require dead-letter queues (DLQs) to prevent message loss. Redriving a DLQ has its own challenges, since replayed events are processed long after they actually occurred and may no longer reflect the current state.
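
Two of these trade-offs can be softened in consumer code. The sketch below shows one possible approach, not a prescribed one. The first function is a "newer wins" guard that ignores stale or duplicate events. The second is a retry loop that parks repeatedly failing events in a dead-letter list. It assumes each event carries a numeric `occurredAt` timestamp set by the producer. The function names and the retry budget are made up.

```python
from typing import Any, Callable, Dict, Iterable, List

MAX_ATTEMPTS = 3  # Hypothetical retry budget before an event is dead-lettered.


def apply_if_newer(state: Dict[str, Dict[str, Any]], event: Dict[str, Any]) -> bool:
    """Apply the event only if it is newer than what is stored for that user and song."""
    key = f'{event["userId"]}:{event["songId"]}'
    current = state.get(key)
    if current is not None and current["occurredAt"] >= event["occurredAt"]:
        return False  # Stale or duplicate event: skip it so replays are harmless.
    state[key] = {"eventType": event["eventType"], "occurredAt": event["occurredAt"]}
    return True


def process_with_dlq(
    events: Iterable[Dict[str, Any]],
    handler: Callable[[Dict[str, Any]], None],
    dead_letters: List[Dict[str, Any]],
) -> int:
    """Run handler on each event, retrying failures; return the number that succeeded."""
    succeeded = 0
    for event in events:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                handler(event)
                succeeded += 1
                break
            except Exception:
                if attempt == MAX_ATTEMPTS:
                    dead_letters.append(event)  # Park it for inspection or redrive.
    return succeeded
```

Because the guard compares the producer's timestamp rather than the arrival time, an event redriven from the DLQ hours later cannot overwrite newer state.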


Despite these challenges, AWS tools such as X-Ray and CloudWatch for monitoring, together with SQS DLQs, can mitigate many of these risks. Good logging and metrics are paramount for auditing and understanding what happened and why.

Conclusion

Event-driven architectures allow platforms to process user interactions at scale while keeping API responses fast. By leveraging AWS services such as SNS, SQS, DynamoDB, Lambda, and ElastiCache, systems can asynchronously update recommendations without affecting the user experience.


This loose coupling of services supports high availability, cost efficiency, and scalability, making event-driven systems a strong choice for large-scale applications.


Would you adopt an event-driven approach in your applications? Let me know your thoughts!
