How I configured the DynamoDB stream that sends newly created articles to a lambda function for processing.
Goal
Whenever I insert a new article into my table, I want a lambda function to read the record and process it (a rough sketch of this processing follows the list below). Processing includes:
- extracting header, abstract and date
- noting last change from the stream event metadata
- compiling the markdown content to html
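To make this concrete, here is what that processing step could look like. This is only a sketch: the Python runtime, the attribute names (Header, Abstract, Date, Body) and the markdown package are assumptions made for illustration, not something this post's actual handler dictates.

# Rough sketch of the processing step. Attribute names, the Python runtime
# and the "markdown" package are assumptions for illustration.
from datetime import datetime, timezone

import markdown  # third-party package, assumed to be bundled with the function


def process_article(item: dict, approximate_creation_time: float) -> dict:
    """Process one already-deserialized article item from the stream."""
    return {
        # The plain article attributes (names are assumptions for this sketch).
        "header": item["Header"],
        "abstract": item["Abstract"],
        "date": item["Date"],
        # "Last change" comes from the stream event metadata, not from the item itself.
        "last_change": datetime.fromtimestamp(approximate_creation_time, tz=timezone.utc),
        # Compile the markdown body to HTML.
        "html": markdown.markdown(item["Body"]),
    }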
Creating the table
Creating a table in AWS SAM is really easy; it takes just a few lines in the template.yaml file.
My table definition looks like this:
BlogTable:
  Type: AWS::DynamoDB::Table
  Properties:
    AttributeDefinitions:
      - AttributeName: PK
        AttributeType: S
      - AttributeName: SK
        AttributeType: S
    KeySchema:
      - AttributeName: PK
        KeyType: HASH
      - AttributeName: SK
        KeyType: RANGE
    BillingMode: PAY_PER_REQUEST
    DeletionProtectionEnabled: true
    StreamSpecification:
      StreamViewType: NEW_IMAGE
The most important choices here are the keys and the BillingMode. I like to use generic key names: PK for the partition key and SK for the sort key. This way, the keys can be reused for many types of entities without introducing confusion. Imagine having a partition key called ArticleType and a sort key called ArticleDateTime, and then wanting to save a link to an image and its metadata in the same table. It would work, but it would be confusing. It also makes queries in your code more convenient: the partition key is always PK and the sort key is always SK, no matter which table you are working with.
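As a quick sketch of how this plays out in code, two very different entity types can share the same generic key names. The key values, attribute names and table name below are made up for illustration:

import boto3

# The table name is a placeholder; with SAM the physical name is usually generated.
table = boto3.resource("dynamodb").Table("BlogTable")

# An article, keyed by entity type and creation time.
table.put_item(Item={
    "PK": "ARTICLE",
    "SK": "2024-01-15T10:00:00Z",
    "Header": "My article header",
    "Body": "# Markdown content ...",
})

# An image and its metadata live in the same table, using the same key names.
table.put_item(Item={
    "PK": "IMAGE",
    "SK": "cat-picture.png",
    "Url": "https://example.com/cat-picture.png",
    "Width": 800,
})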
The last two lines of the code specify that I want to create a stream from the table and send events containing the new state of the inserted record.
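For illustration, a single stream record for a newly inserted article then looks roughly like this (trimmed down; all attribute names besides PK and SK are made up):

# Illustrative, trimmed-down stream record for an INSERT with NEW_IMAGE.
sample_record = {
    "eventName": "INSERT",
    "dynamodb": {
        "ApproximateCreationDateTime": 1705312800.0,
        "Keys": {"PK": {"S": "ARTICLE"}, "SK": {"S": "2024-01-15T10:00:00Z"}},
        "NewImage": {
            "PK": {"S": "ARTICLE"},
            "SK": {"S": "2024-01-15T10:00:00Z"},
            "Header": {"S": "My article header"},
            "Abstract": {"S": "A short abstract"},
            "Body": {"S": "# Markdown content ..."},
        },
        "StreamViewType": "NEW_IMAGE",
    },
}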
Joining a lambda to the stream
Joining a lambda function to a DynamoDB stream is quite straightforward.
This is what the Properties.Events.Stream section of my function looks like:
Events:
  Stream:
    Type: DynamoDB
    Properties:
      Stream: !GetAtt BlogTable.StreamArn
      BatchSize: 100
      StartingPosition: TRIM_HORIZON
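With a BatchSize of 100, one invocation can carry many records, and the stream delivers MODIFY and REMOVE events as well as INSERTs, so the handler needs to pick out the newly inserted articles. A minimal sketch of that loop, again assuming a Python runtime:

# Minimal sketch of a handler that walks a stream batch and picks out new inserts.
from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()


def handler(event, context):
    for record in event["Records"]:
        # The stream also delivers MODIFY and REMOVE events; only new articles matter here.
        if record["eventName"] != "INSERT":
            continue

        image = record["dynamodb"]["NewImage"]
        # Turn the DynamoDB-typed attributes ({"S": "..."} etc.) into plain Python values.
        item = {name: deserializer.deserialize(value) for name, value in image.items()}

        # Hand the plain item off to the actual processing described in the Goal section.
        print("processing", item.get("PK"), item.get("SK"))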
I created the lambda, deployed it to AWS and … nothing happened.
I did some googling and it led me to a good resource, Deep Dive into DynamoDB streams and the Lambda integration.
After I had spent some time reading the article, everything started working as expected. The queues and processing behind the stream can simply take a while to do their thing, since it is an asynchronous distributed system. Lesson learned: sometimes you have to be patient with AWS resources.
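One thing that would have saved me some head-scratching is checking the state of the event source mapping while waiting. A small boto3 sketch (the function name is a placeholder, not the real name from my stack):

import boto3

lambda_client = boto3.client("lambda")

# List the event source mappings attached to the function.
response = lambda_client.list_event_source_mappings(FunctionName="BlogProcessingFunction")

for mapping in response["EventSourceMappings"]:
    # State moves from "Creating" to "Enabled"; LastProcessingResult hints at recent outcomes.
    print(mapping["EventSourceArn"], mapping["State"], mapping.get("LastProcessingResult"))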