How a simple mistake blocked my stream processing and kept me debugging, for the past week, something I was sure was working.
A simple bug brings down my whole article processing pipeline
When I created my DynamoDB stream, I knew I would only ever be adding articles; I was not considering removing records from the table at all.
So I created a Python Lambda function that always reads the NewImage field of each incoming record and processes it. Everything worked smoothly for a few days, until I decided to change an article's sort key.
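
The original handler looked roughly like this (a simplified sketch; process_article stands in for the actual article processing):

def handler(event, context):
    for record in event["Records"]:
        # Assumes every record is an INSERT or MODIFY and therefore carries a NewImage
        new_image = record["dynamodb"]["NewImage"]
        process_article(new_image)
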
DynamoDB does not support changing item keys, so when you change a key in the administration console, it offers to transactionally remove the old record and insert a new one. And this is exactly where the problem came up.
The stream is reliable
DynamoDB Streams do their best to deliver records to your consumer, retrying until the records expire after 24 hours. They are also designed to deliver records in order (within a shard). The result is that when your processing Lambda fails on one record, it is retried again and again until that record expires.
When a REMOVE event arrived instead of a MODIFY or INSERT as a result of the sort key change, it did not contain the NewImage field and the Lambda crashed on it. All subsequent records in the stream were therefore blocked by this single record.
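
For illustration, the REMOVE record looks roughly like this (shape simplified, attribute names and values hypothetical):

record = {
    "eventName": "REMOVE",
    "dynamodb": {
        # Only the key attributes are guaranteed; OldImage appears only if the
        # stream view type includes old images. There is no NewImage at all.
        "Keys": {"articleId": {"S": "42"}, "publishedAt": {"S": "2020-01-01"}},
    },
}
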
Takeaway:
Always be ready to process all types of events from your stream. A simple:
if record["eventName"] in ["INSERT", "MODIFY"]:
can save you a lot of headaches.
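
Put together, a guarded handler might look something like this (still a sketch; process_article and the logging are placeholders for your own logic):

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] in ["INSERT", "MODIFY"]:
            # Only INSERT and MODIFY records carry a NewImage
            process_article(record["dynamodb"]["NewImage"])
        else:
            # REMOVE records only carry Keys (and OldImage, depending on the stream view type)
            print(f"Ignoring {record['eventName']} for {record['dynamodb']['Keys']}")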