How I configured the DynamoDB stream that sends newly created articles to a lambda function for processing.
Goal
Whenever I insert a new article into my table, I want a lambda function to read the record and process it (a rough sketch of this processing follows the list below). Processing includes:
- extracting header, abstract and date
- noting last change from the stream event metadata
- compiling the markdown content to html
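To make this concrete, here is what that processing step could look like. This is only a sketch: the Python runtime, the attribute names (Header, Abstract, Date, Body) and the markdown package are assumptions made for illustration, not something this post's actual handler dictates.

# Rough sketch of the processing step. Attribute names, the Python runtime
# and the "markdown" package are assumptions for illustration.
from datetime import datetime, timezone

import markdown  # third-party package, assumed to be bundled with the function


def process_article(item: dict, approximate_creation_time: float) -> dict:
    """Process one already-deserialized article item from the stream."""
    return {
        # The plain article attributes (names are assumptions for this sketch).
        "header": item["Header"],
        "abstract": item["Abstract"],
        "date": item["Date"],
        # "Last change" comes from the stream event metadata, not from the item itself.
        "last_change": datetime.fromtimestamp(approximate_creation_time, tz=timezone.utc),
        # Compile the markdown body to HTML.
        "html": markdown.markdown(item["Body"]),
    }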
Creating the table
Creating a table in AWS SAM is really easy; it takes just a few lines in the template.yaml file.
My table definition looks like this:
BlogTable:
  Type: AWS::DynamoDB::Table
  Properties:
    AttributeDefinitions:
      - AttributeName: PK
        AttributeType: S
      - AttributeName: SK
        AttributeType: S
    KeySchema:
      - AttributeName: PK
        KeyType: HASH
      - AttributeName: SK
        KeyType: RANGE
    BillingMode: PAY_PER_REQUEST
    DeletionProtectionEnabled: true
    StreamSpecification:
      StreamViewType: NEW_IMAGE
The most important choices here are the keys and the BillingMode. I like to use generic key names: PK for the partition key and SK for the sort key. This way, the keys can be reused for many types of entities without introducing confusion. Imagine having a partition key called ArticleType and a sort key called ArticleDateTime, and then wanting to save a link to an image and its metadata in the same table. It would work, but it would be confusing. It also makes queries in your code more convenient: the partition key is always PK and the sort key is always SK, no matter which table you are working with.
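As a quick sketch of how this plays out in code, two very different entity types can share the same generic key names. The key values, attribute names and table name below are made up for illustration:

import boto3

# The table name is a placeholder; with SAM the physical name is usually generated.
table = boto3.resource("dynamodb").Table("BlogTable")

# An article, keyed by entity type and creation time.
table.put_item(Item={
    "PK": "ARTICLE",
    "SK": "2024-01-15T10:00:00Z",
    "Header": "My article header",
    "Body": "# Markdown content ...",
})

# An image and its metadata live in the same table, using the same key names.
table.put_item(Item={
    "PK": "IMAGE",
    "SK": "cat-picture.png",
    "Url": "https://example.com/cat-picture.png",
    "Width": 800,
})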
The last two lines of the code specify that I want to create a stream from the table and send events containing the new state of the inserted record.
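For illustration, a single stream record for a newly inserted article then looks roughly like this (trimmed down; all attribute names besides PK and SK are made up):

# Illustrative, trimmed-down stream record for an INSERT with NEW_IMAGE.
sample_record = {
    "eventName": "INSERT",
    "dynamodb": {
        "ApproximateCreationDateTime": 1705312800.0,
        "Keys": {"PK": {"S": "ARTICLE"}, "SK": {"S": "2024-01-15T10:00:00Z"}},
        "NewImage": {
            "PK": {"S": "ARTICLE"},
            "SK": {"S": "2024-01-15T10:00:00Z"},
            "Header": {"S": "My article header"},
            "Abstract": {"S": "A short abstract"},
            "Body": {"S": "# Markdown content ..."},
        },
        "StreamViewType": "NEW_IMAGE",
    },
}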
Joining a lambda to the stream
Joining a lambda function to a DynamoDB stream is quite straightforward.
This is what the Properties.Events.Stream section of my function looks like:
Events:
  Stream:
    Type: DynamoDB
    Properties:
      Stream: !GetAtt BlogTable.StreamArn
      BatchSize: 100
      StartingPosition: TRIM_HORIZON
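With a BatchSize of 100, one invocation can carry many records, and the stream delivers MODIFY and REMOVE events as well as INSERTs, so the handler needs to pick out the newly inserted articles. A minimal sketch of that loop, again assuming a Python runtime:

# Minimal sketch of a handler that walks a stream batch and picks out new inserts.
from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()


def handler(event, context):
    for record in event["Records"]:
        # The stream also delivers MODIFY and REMOVE events; only new articles matter here.
        if record["eventName"] != "INSERT":
            continue

        image = record["dynamodb"]["NewImage"]
        # Turn the DynamoDB-typed attributes ({"S": "..."} etc.) into plain Python values.
        item = {name: deserializer.deserialize(value) for name, value in image.items()}

        # Hand the plain item off to the actual processing described in the Goal section.
        print("processing", item.get("PK"), item.get("SK"))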
I created the lambda, deployed it to AWS and … nothing happened.
I did some googling and it led me to a good resource, Deep Dive into DynamoDB streams and the Lambda integration.
After I had spent some time reading the article, everything started working as expected. The queues and processing behind the stream can simply take a while to do their thing, since it is an asynchronous distributed system. Lesson learned: sometimes you have to be patient with AWS resources.
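One thing that would have saved me some head-scratching is checking the state of the event source mapping while waiting. A small boto3 sketch (the function name is a placeholder, not the real name from my stack):

import boto3

lambda_client = boto3.client("lambda")

# List the event source mappings attached to the function.
response = lambda_client.list_event_source_mappings(FunctionName="BlogProcessingFunction")

for mapping in response["EventSourceMappings"]:
    # State moves from "Creating" to "Enabled"; LastProcessingResult hints at recent outcomes.
    print(mapping["EventSourceArn"], mapping["State"], mapping.get("LastProcessingResult"))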