# Full-text Fuzzy Search with DynamoDB and Typesense

This walk-through will show you how to ingest data from a DynamoDB table into Typesense, and then use Typesense to search through the data with typo-tolerance, filtering, faceting, etc.

At a high level, we'll set up a Lambda function that listens for change events using DynamoDB Streams and writes the data into Typesense.

Typesense DynamoDB Integration Chart

# Step 1: Create Typesense Cluster

Sign up for an account on Typesense Cloud, spin up a cluster, and get the Endpoint URL, Port number, and API key.

We're using Typesense Cloud for this walk-through since we need a public Typesense endpoint for the Lambda function to be able to write to.

You can also self-host Typesense on a server/provider of your choice. See Typesense Installation for more details on how to self-host Typesense.

# Step 2: Create a DynamoDB table

Create a DynamoDB table with your choice of name and partition key ("id" is recommended). After creating the table you want to enable streams in the Overview section of the AWS console.

You can also do this using AWS CLI:

aws dynamodb create-table \
    --table-name YourTableName \
    --attribute-definitions AttributeName=id,AttributeType=N \
    --key-schema AttributeName=id,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST \
    --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES
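If you would rather create the table from Python, the sketch below mirrors the CLI command as keyword arguments for boto3's `create_table`. The helper names are hypothetical, and the on-demand `BillingMode` is an assumption made here so that no provisioned throughput has to be specified. The boto3 client is passed in by the caller rather than created inside the helper.

```python
def build_create_table_kwargs(table_name):
    """Return keyword arguments that mirror the CLI command above."""
    return {
        "TableName": table_name,
        # Numeric partition key named "id", as in the CLI example.
        "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "N"}],
        "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
        # Assumption: on-demand billing, so no read/write capacity is needed.
        "BillingMode": "PAY_PER_REQUEST",
        # Streams must be enabled for the Lambda trigger to receive changes.
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    }


def create_table(dynamodb_client, table_name):
    """Create the table with a boto3 DynamoDB client supplied by the caller."""
    return dynamodb_client.create_table(**build_create_table_kwargs(table_name))
```

A call would look like `create_table(boto3.client("dynamodb"), "YourTableName")`.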

# Step 3: Create Lambda Execution Role

Now let's create a "Lambda Execution Role", i.e., give your function permission to access the resources it needs.

Head over to the IAM Roles section in the AWS Console and create a new IAM role with three main permissions:

  • AmazonDynamoDBFullAccess
  • AmazonDynamoDBFullAccesswithDataPipeline
  • AWSLambdaBasicExecutionRole

WARNING

These IAM role permissions are just examples for the purposes of this guide. Before deploying for production, please consult the IAM documentation to only grant the minimal permissions needed for your particular use case.

You can also do this using AWS CLI:

Create a file named trust-relationship.json with the following contents.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Then, execute the following command:

aws iam create-role --role-name YourLambdaRole \
    --path "/service-role/" \
    --assume-role-policy-document file://trust-relationship.json

Now, create role-policy.json with the following contents. (Replace accountID and region with your own values, and replace typesense in the stream ARN with your table name.)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:region:accountID:function:typesense-indexing*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:region:accountID:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeStream",
        "dynamodb:GetRecords",
        "dynamodb:GetShardIterator",
        "dynamodb:ListStreams"
      ],
      "Resource": "arn:aws:dynamodb:region:accountID:table/typesense/stream/*"
    }
  ]
}

The policy has three statements that allow YourLambdaRole to do the following:

  • Run a Lambda function typesense-indexing. We'll be creating the function later in this tutorial.
  • Access Amazon CloudWatch Logs. The Lambda function writes diagnostics to CloudWatch Logs at runtime.
  • Read data from the DynamoDB stream for typesense.
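Substituting the region, account ID, and table name by hand is easy to get wrong, so here is a minimal sketch that renders the same three statements from parameters. The function name `render_role_policy` and its arguments are hypothetical; the statements themselves are copied from the policy above.

```python
import json


def render_role_policy(region, account_id, table_name,
                       function_prefix="typesense-indexing"):
    """Build the role policy document with the placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Allow invoking the indexing function.
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": f"arn:aws:lambda:{region}:{account_id}:function:{function_prefix}*",
            },
            {   # Allow writing diagnostics to CloudWatch Logs.
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": f"arn:aws:logs:{region}:{account_id}:*",
            },
            {   # Allow reading from the table's stream.
                "Effect": "Allow",
                "Action": [
                    "dynamodb:DescribeStream",
                    "dynamodb:GetRecords",
                    "dynamodb:GetShardIterator",
                    "dynamodb:ListStreams",
                ],
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}/stream/*",
            },
        ],
    }


def write_role_policy(path, region, account_id, table_name):
    """Write the rendered policy to a JSON file such as role-policy.json."""
    with open(path, "w") as handle:
        json.dump(render_role_policy(region, account_id, table_name), handle, indent=2)
```

For example, `write_role_policy("role-policy.json", "us-east-2", "123456789012", "YourTableName")` would produce a file that can be passed to `aws iam put-role-policy`; the account ID shown is a placeholder.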

Now, attach the above policy to the IAM execution role we created:

aws iam put-role-policy --role-name YourLambdaRole \
    --policy-name TypesenseLambdaRolePolicy \
    --policy-document file://role-policy.json

# Step 4: Create a Lambda Function

Head over to the Lambda section of the AWS console and create a new Lambda function with the execution role created above. See the AWS Lambda Execution Role documentation for detailed information.

For context, here's an example of the event that DynamoDB will call our Lambda function with:

{
  "Records": [
    {
      "eventID": "2",
      "eventVersion": "1.0",
      "dynamodb": {
        "OldImage": {
          // Existing values
        },
        "SequenceNumber": "222",
        "Keys": {
          // your partition key and sort key
        },
        "SizeBytes": 59,
        "NewImage": {
          // New values
        }
      },
      "awsRegion": "us-east-2",
      "eventName": "MODIFY", // this can be 'INSERT', 'MODIFY' or 'REMOVE'
      "eventSourceARN": "<AWS-ARN>",
      "eventSource": "aws:dynamodb"
    }
  ]
}

Now let's add the following code to our Lambda function. We're using Python for this example, but you can also use Node, Ruby, or any other language that AWS Lambda supports.

import typesense


def lambda_handler(event, context):
    client = typesense.Client({
        'nodes': [{
            'host': '<Endpoint URL>',
            'port': '<Port Number>',
            'protocol': 'https',
        }],
        'api_key': '<API Key>',
        'connection_timeout_seconds': 2
    })
    documents = client.collections['<collection-name>'].documents

    processed = 0
    for record in event['Records']:
        ddb_record = record['dynamodb']
        if record['eventName'] == 'REMOVE':
            # Keys is present on every stream record; Typesense document ids are strings.
            res = documents[str(ddb_record['Keys']['id']['N'])].delete()
        else:
            # Format your document here, then use the upsert function to index it.
            document = ddb_record['NewImage']
            res = documents.upsert(document)
        print(res)
        processed = processed + 1
    print('Successfully processed {} records'.format(processed))
    return processed
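The `NewImage` in a stream record uses DynamoDB's typed JSON (for example `{"id": {"N": "1"}}`), so it usually needs flattening before it is indexed. The following is one possible sketch of that formatting step; the helper names are hypothetical, binary attributes (`B`, `BS`) are deliberately not handled, and the `id` is converted to a string because Typesense expects document ids to be strings.

```python
def from_dynamodb_value(value):
    """Convert one typed DynamoDB attribute value into a plain Python value."""
    (type_tag, raw), = value.items()
    if type_tag == "S":
        return raw
    if type_tag == "N":
        # Numbers arrive as strings; keep integers as int where possible.
        return float(raw) if "." in raw or "e" in raw.lower() else int(raw)
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_dynamodb_value(item) for item in raw]
    if type_tag == "M":
        return {key: from_dynamodb_value(item) for key, item in raw.items()}
    if type_tag == "SS":
        return list(raw)
    if type_tag == "NS":
        return [from_dynamodb_value({"N": number}) for number in raw]
    raise ValueError("Unsupported DynamoDB type: {}".format(type_tag))


def to_typesense_document(image):
    """Flatten a NewImage into a plain dict with a string id."""
    document = {name: from_dynamodb_value(value) for name, value in image.items()}
    document["id"] = str(document["id"])
    return document
```

In the handler, this would replace the placeholder step: `document = to_typesense_document(ddb_record['NewImage'])`.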

See the Typesense API documentation for detailed information about all the parameters available to create collections and documents.
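The handler upserts into an existing collection, so the collection has to be created once beforehand. Here is a minimal sketch, assuming a hypothetical schema with `title`, `genre`, and `year` fields; adjust the fields to match your own table. The helper names are illustrative, and `client` is a `typesense.Client` configured as in the handler.

```python
def build_collection_schema(name):
    """Return an example schema; the field names are placeholders."""
    return {
        "name": name,
        "fields": [
            {"name": "title", "type": "string"},
            # facet=True lets search results be grouped by this field.
            {"name": "genre", "type": "string", "facet": True},
            {"name": "year", "type": "int32"},
        ],
    }


def ensure_collection(client, schema):
    """Create the collection if it is missing; return True when it was created."""
    existing = {collection["name"] for collection in client.collections.retrieve()}
    if schema["name"] in existing:
        return False
    client.collections.create(schema)
    return True
```

Running `ensure_collection(client, build_collection_schema("<collection-name>"))` once, for example from a setup script, is enough; later calls leave the existing collection untouched.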

TIP

Install all your dependencies using pip install <dependency-name> -t . (note the trailing dot, which stands for the current directory). This installs the dependencies for the function in the current directory, which is what Lambda expects.

After this, zip up your current directory and upload it to your Lambda function via the AWS Console.
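If you prefer to script the packaging step, the sketch below zips a directory with Python's standard `zipfile` module so that files sit at the root of the archive. The function name `build_deployment_zip` is hypothetical.

```python
import os
import zipfile


def build_deployment_zip(source_dir, zip_path):
    """Zip everything under source_dir, with paths relative to source_dir."""
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for filename in files:
                full_path = os.path.join(root, filename)
                # Skip the archive itself if it lives inside source_dir.
                if os.path.abspath(full_path) == zip_abs:
                    continue
                archive.write(full_path, os.path.relpath(full_path, source_dir))
    return zip_path
```

For example, `build_deployment_zip(".", "YourZipFile.zip")` packages the current directory, including the dependencies installed with `pip install -t .`.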

You can also do this using AWS CLI:

  • Get the ARN for the execution role you created:

    aws iam get-role --role-name YourLambdaRole
    

    In the output, look for the ARN:

    ...
    "Arn": "arn:aws:iam::accountID:role/service-role/YourLambdaRole"
    ...
    
  • Now, create the Lambda function:

    aws lambda create-function \
      --region us-east-2 \
      --function-name YourLambdaFunction \
      --zip-file fileb://YourZipFile.zip \
      --role YourRoleARN \
      --handler lambda_function.lambda_handler \
      --timeout 5 \
      --runtime python3.12
    

# Step 5: Set up a trigger

Now, navigate to your DynamoDB table in the AWS Console, visit the Triggers section and add this existing Lambda function to that table.

You can also do this using the AWS CLI:

  • Get the ARN for the DynamoDB table's stream:
    aws dynamodb describe-table --table-name YourTableName
    
    Note the ARN for the stream:
    ...
    "LatestStreamArn": "arn:aws:dynamodb:`region`:`accountID`:table/`table-name`/stream/`timestamp`"
    ...
    
  • Now, add this ARN to Lambda:
    aws lambda create-event-source-mapping \
      --region us-east-2 \
      --function-name YourLambdaFunction \
      --event-source-arn YourStreamARN \
      --batch-size 1 \
      --starting-position TRIM_HORIZON
    

TIP

When dealing with a large number of changes in a high-traffic environment, we highly recommend batching writes into Typesense. You can use something like Kinesis to stage the DynamoDB events and then batch-write the changes into Typesense using the import endpoint.
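As a rough illustration of the batch-write half of this tip, the sketch below indexes a whole batch of stream records with a single call to the import endpoint. It does not cover staging through Kinesis and assumes the Lambda trigger is configured with a batch size greater than 1. The function name `index_batch` and the `to_document` callable (whatever formatting you apply to `NewImage`) are hypothetical.

```python
def index_batch(client, collection_name, records, to_document):
    """Upsert and delete a batch of DynamoDB stream records in Typesense."""
    # Keep only the latest change per key: stream records for one key arrive
    # in order, so a later record supersedes an earlier one.
    latest = {}
    for record in records:
        ddb_record = record["dynamodb"]
        doc_id = str(ddb_record["Keys"]["id"]["N"])
        if record["eventName"] == "REMOVE":
            latest[doc_id] = None
        else:
            latest[doc_id] = to_document(ddb_record["NewImage"])

    upserts = [doc for doc in latest.values() if doc is not None]
    deletes = [doc_id for doc_id, doc in latest.items() if doc is None]

    documents = client.collections[collection_name].documents
    results = []
    if upserts:
        # One request for all upserts instead of one request per document.
        results = documents.import_(upserts, {"action": "upsert"})
    for doc_id in deletes:
        documents[doc_id].delete()
    return len(upserts), len(deletes), results
```

Inside a handler this could be called as `index_batch(client, '<collection-name>', event['Records'], to_document)`. The import endpoint reports a result per document, so the returned `results` are worth checking for failures.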

And that's a wrap! Now any data you create, update, or delete in your DynamoDB table will be automatically indexed in your Typesense cluster.
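To confirm that indexed data is searchable, here is a short sketch of a typo-tolerant search with an optional filter and a facet. It assumes hypothetical `title` and `genre` fields in your collection (with `genre` declared as a facet); the helper names are illustrative, and `client` is a `typesense.Client` pointed at your cluster.

```python
def build_search_params(query, genre=None):
    """Build search parameters for a typo-tolerant, faceted query."""
    params = {
        "q": query,
        "query_by": "title",   # field(s) to match the query against
        "facet_by": "genre",   # return counts per genre
        "num_typos": 2,        # tolerate up to two typos per word
    }
    if genre is not None:
        # Exact-match filter on the genre field.
        params["filter_by"] = "genre:={}".format(genre)
    return params


def search(client, collection_name, query, genre=None):
    """Run the search against a collection and return the raw response."""
    params = build_search_params(query, genre)
    return client.collections[collection_name].documents.search(params)
```

A misspelled query such as `search(client, '<collection-name>', 'dnue')` should still be able to match a document titled "Dune", provided such a document has been indexed.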

Last Updated: 10/7/2021, 10:16:33 AM