A Developer Workflow for Modern AWS Serverless Applications

Modern serverless applications on AWS are complex, with a lot of moving parts. Mapping a developer workflow onto those applications can be difficult. This article discusses the developer workflow I have developed for complex serverless applications at aleph0, with example CloudFormation template and GitHub Actions snippets to illustrate the concepts.

Goals for Developer Workflow

An efficient developer workflow compatible with team growth and modern practices, e.g., CI/CD, Infrastructure as Code, etc.
Separate environments for staging, production, etc. are a requirement.

Assumptions for Developer Workflow

An AWS serverless architecture, per the title. A straw man architecture is documented below for the sake of this discussion.
Serverless components are managed independently. While managing everything together (i.e., in a single CloudFormation template) can simplify things, it’s important that the process survive growth in team size, complexity, etc.

Example Architecture

This article will use the following architecture to drive the discussion:

The architecture uses an API Gateway REST API to define endpoints that use StartSyncExecution to invoke an Express Step Function, which in turn invokes Lambda function(s) and/or other AWS services.

Building APIs with this architecture involves coordinating changes to three different components, all of which must work together:

API Gateway mapping template, one per endpoint
Express Step Function, one per endpoint
Lambda Function, one per microservice

The remainder of this discussion focuses on a developer workflow for building an application with this architecture.

Tools and Techniques

The developer workflow makes use of several modern best practices, which are outlined here. The tech stack this workflow uses to implement these practices is as follows:

Certainly, these tools could easily be substituted for others — GitLab for GitHub, Jenkins for GitHub Actions, and so on — but this workflow uses the above tooling.

Continuous Integration and Continuous Delivery

CI/CD, or Continuous Integration and Continuous Delivery, is a form of developer tooling that takes code changes and builds, validates, and deploys them automatically. It’s a cornerstone of modern development workflows, and all developer workflows should use it. There are many reasons why, which have already been discussed at length online, but here are just a few:

Efficiency. Developers should be able to see the results of their code changes live on demand, preferably in 5 minutes or less. Wait is waste.
Consistency. Computers are better at performing tasks the same way than humans are. Putting computers in charge of deployments reduces deployment errors.
Security. Fewer people need access to the staging and production environments.
Definition of Done. Features are not done until their tests are passing in CI, and they are working in staging as deployed by CD.

This developer workflow uses the following (simple, bog-standard) CI/CD pipeline:

When a developer is ready to submit a code change, they push to GitHub. After the change has passed the team’s code review process and been accepted, a CI/CD process runs as a GitHub Actions workflow, which ultimately deploys the code change to AWS.

References to CI/CD in the rest of the workflow are talking about this pipeline.

Infrastructure as Code

IaC — or Infrastructure as Code — is another form of developer tooling that takes a description of a software deployment and assembles it automatically. IaC is another cornerstone of modern development practices that all workflows should use. Its virtues have also been extolled online, but here are a few for reference:

Efficiency. Developers should be able to see the results of their code changes live on demand, preferably in 5 minutes or less. Wait is waste.
Consistency. Computers are better at performing tasks the same way than humans are. Putting computers in charge of deployments reduces deployment errors.
Security. Fewer people need access to the staging and production environments.
Definition of Done. Features are not done until their tests are passing in CI, and they are working in staging as deployed by CD.

The list should look familiar.

Given the nature of the task — building serverless architectures in AWS — this developer workflow uses AWS CloudFormation for IaC with the Serverless Application Model (SAM) transformation for ease of use.

References to IaC in the rest of the workflow, including sample templates, are talking about this stack.

Developer Workflow

Per the above architecture and assumptions, the Lambda Functions, Step Functions, and the REST API are all managed separately. It’s possible to manage all of these things together in one CloudFormation template, and doing so makes things simpler, so this discussion covers the harder case where everything is managed independently. It would not be difficult to adapt this process to the simpler case where everything is managed together.

Lambda Function Workflow

Each Lambda Function implements a microservice, and therefore has its own repository and can be deployed independently. The developer workflow for Lambda Functions is the standard CI/CD pipeline. The deployment is implemented as a SAM template, which allows it to publish a new Lambda Function version and alias automatically on each deploy.

So when developers push a code change to a Lambda Function repository, the updated code becomes available in the cloud shortly thereafter under an alias, e.g., “stag” short for “staging”.

CloudFormation Template Snippet

This snippet shows the relevant details of the CloudFormation template. The below CD workflow assumes the template is stored at /cfn.yml in the GitHub repository.

Note that the resource is of type AWS::Serverless::Function, not AWS::Lambda::Function.

AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31      # Must use SAM xformation
Parameters:
  BuildId:
    Type: String
    Description: The build identifier
    AllowedPattern: "^[0-9A-Za-z]+$"
Resources:
  LambdaFunction:
    Type: AWS::Serverless::Function
    Properties:   
      # Many other properties for architecture, runtime, code, IAM, ... 
      FunctionName: xyz
      AutoPublishAlias: stag               # Auto-publish stag alias
      Timeout: 30                          # API Gateway times out at 29sec by default
      VersionDescription: !Ref BuildId     # Label for reference only
Outputs:
  LambdaFunctionArn:
    Value: !GetAtt LambdaFunction.Arn
    Description: The ARN of the XYZ Lambda function
    Export:
      Name: xyz-lambda-function-arn

  LambdaFunctionName:
    Value: !Ref LambdaFunction             # Ref returns the function name
    Description: The Name of the XYZ Lambda function
    Export:
      Name: xyz-lambda-function-name

GitHub Actions CD Workflow Snippet

This snippet shows the skeleton of a CD workflow for the Lambda function. It can easily be adapted to an integration workflow as well. Note that it assumes the CloudFormation template is stored at /cfn.yml in the GitHub repository.

name: delivery
on:
  push:
    branches:
      - main
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source code
        uses: actions/checkout@v3
      - # Set up platform, e.g., Java actions/setup-java
      - # Configure build tool if necessary, e.g., mvn
      - # Set up caching if desired, e.g., maven repo actions/cache
      # See below for setting up OIDC for AWS permissions
      # https://github.com/aws-actions/configure-aws-credentials?tab=readme-ov-file#sample-iam-oidc-cloudformation-template
      - name: Configure AWS credentials for us-east-2
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/github-oidc-role"
          aws-region: "${{ vars.AWS_REGION }}"
      - # Perform build with build tool, e.g., mvn
      - name: Prepare CloudFormation stack
        # You must make cfn.yml in repo yourself, see above snippet
        run: aws cloudformation package --template-file cfn.yml --s3-bucket "$S3_BUCKET" --s3-prefix "artifacts/$REPOSITORY" >cfn.packaged.yml
        env:
          REPOSITORY: "${{ github.event.repository.name }}"
          S3_BUCKET: "${{ vars.S3_BUCKET }}"
      - name: Deploy CloudFormation stack
        uses: aws-actions/aws-cloudformation-github-deploy@v1
        with:
          name: "${{ github.event.repository.name }}"
          template: cfn.packaged.yml
          parameter-overrides: >-
            BuildId=${{ github.sha }}
          no-fail-on-empty-changeset: 1
          # You must make this role yourself with proper IAM perms
          role-arn: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/cloudformation-deploy-role"
          capabilities: CAPABILITY_IAM, CAPABILITY_NAMED_IAM, CAPABILITY_AUTO_EXPAND

Step Function Workflow

Each Step Function implements a service, which is composed of orchestrated calls to the Lambda microservices from above and other AWS services. The developer workflow for Step Functions is the standard CI/CD pipeline. The repository only needs to contain the Step Function definition, the relevant GitHub Actions workflows, and a way to deploy the Step Function, e.g., a CloudFormation template.

Also, the Step Function must take as part of its input a value indicating the environment in which it’s been invoked, e.g., “stag” vs. “prod”. (Surprisingly, if a Step Function is invoked using an alias, that alias is not available inside the Step Function, even in the State Machine ARN, which is why this is necessary.) When calling a Lambda Function, the Step Function must use the environment value to resolve the alias to use to perform the invocation, typically using the environment value as the alias itself.

So when developers push a code change to a Step Function repository, the updated Step Function becomes available in the cloud shortly thereafter under an alias, e.g., “stag” short for “staging”. The Step Function must also be an Express Step Function (as opposed to Standard Step Function).
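
The alias resolution described above can be sketched as a fragment of the state machine definition. This is a minimal sketch rather than a definitive implementation: it assumes the caller passes the environment in an input field named environment (a hypothetical name chosen for illustration), and that lambdaFunctionArn is supplied through DefinitionSubstitutions.

```yaml
# Fragment of a state machine definition (Amazon States Language in YAML).
# Assumes execution input such as {"environment": "stag", ...}, where
# "environment" is a hypothetical field name.
LambdaInvoke:
  Type: Task
  Resource: arn:aws:states:::lambda:invoke
  OutputPath: "$.Payload"
  Parameters:
    # Builds "<function ARN>:<alias>" at run time from the input value
    FunctionName.$: "States.Format('${lambdaFunctionArn}:{}', $.environment)"
    Payload.$: "$"
  End: true
```

With this shape, the same Step Function version can call the “stag” or “prod” Lambda alias depending only on the input it receives.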

Starting Points

Users may find the template examples in the AWS Console to be very helpful in creating new State Machines.

The suggested templates are uncommonly useful!

CloudFormation Template Snippet

This snippet shows the relevant details of the CloudFormation template. The below CD workflow assumes the template is stored at /cfn.yml in the GitHub repository.

Note that the resource is of type AWS::Serverless::StateMachine, not AWS::StepFunctions::StateMachine.

AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31      # Must use SAM xformation
Parameters:
  BuildId:
    Type: String
    Description: The build identifier
    AllowedPattern: "^[0-9A-Za-z]+$"
Resources:
  StateMachine:
    Type: AWS::Serverless::StateMachine
    Properties:
      # Many other properties for logging, tracing, IAM, ...
      Name: xyz
      Type: EXPRESS                        # Has to be express
      AutoPublishAlias: stag               # Auto-publish stag alias
      # This example definition simply calls the Lambda function above.
      # Obviously, it can do anything you want it to.
      Definition:
        Comment: "${buildId}"
        StartAt: LambdaInvoke
        States:
          LambdaInvoke:
            Type: Task
            Resource: arn:aws:states:::lambda:invoke
            OutputPath: "$.Payload"
            Parameters:
              FunctionName: "${lambdaFunctionArn}:stag"
              Payload.$: "$"
            Retry:
            - ErrorEquals:
              - Lambda.ServiceException
              - Lambda.AWSLambdaException
              - Lambda.SdkClientException
              - Lambda.TooManyRequestsException
              IntervalSeconds: 1
              MaxAttempts: 3
              BackoffRate: 2
            End: true
      DefinitionSubstitutions:
        buildId: !Ref BuildId
        lambdaFunctionArn: !ImportValue xyz-lambda-function-arn
      Policies:
        - LambdaInvokePolicy:
            FunctionName: !ImportValue xyz-lambda-function-name
Outputs:
  StateMachineArn:
    Value: !Ref StateMachine
    Description: The ARN of the XYZ State Machine
    Export:
      Name: xyz-state-machine-arn
  StateMachineName:
    Value: !GetAtt StateMachine.Name
    Description: The name of the XYZ State Machine
    Export:
      Name: xyz-state-machine-name

For reference, this step function renders as follows in the excellent Workflow Studio:

GitHub Actions CD Workflow Snippet

This snippet shows the skeleton of a CD workflow for the State Machine. It can easily be adapted to an integration workflow as well. Note that it assumes the CloudFormation template is stored at /cfn.yml in the GitHub repository.

name: delivery
on:
  push:
    branches:
      - main
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source code
        uses: actions/checkout@v3
      # See below for setting up OIDC for AWS permissions
      # https://github.com/aws-actions/configure-aws-credentials?tab=readme-ov-file#sample-iam-oidc-cloudformation-template
      - name: Configure AWS credentials for us-east-2
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/github-oidc-role"
          aws-region: "${{ vars.AWS_REGION }}"
      - name: Prepare CloudFormation stack
        # You must make cfn.yml in repo yourself, see above snippet
        run: aws cloudformation package --template-file cfn.yml --s3-bucket "$S3_BUCKET" --s3-prefix "artifacts/$REPOSITORY" >cfn.packaged.yml
        env:
          REPOSITORY: "${{ github.event.repository.name }}"
          S3_BUCKET: "${{ vars.S3_BUCKET }}"
      - name: Deploy CloudFormation stack
        uses: aws-actions/aws-cloudformation-github-deploy@v1
        with:
          name: "${{ github.event.repository.name }}"
          template: cfn.packaged.yml
          parameter-overrides: >-
            BuildId=${{ github.sha }}
          no-fail-on-empty-changeset: 1
          # You must make this role yourself with proper IAM perms
          role-arn: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/cloudformation-deploy-role"
          capabilities: CAPABILITY_IAM, CAPABILITY_NAMED_IAM, CAPABILITY_AUTO_EXPAND

REST API Workflow

The REST API should be defined as an OpenAPI spec. A GitHub repository will be dedicated to the API and its deployment. The spec itself must either (a) reside in the repository, or (b) be publicly available over HTTP using a well-known URL. A process may be used to prepare the OpenAPI spec for deployment (for example, by adding API Gateway Extensions to the spec), or the spec may be stored already prepared.

The developer workflow for the REST API is the standard CI/CD pipeline. By default, the pipeline should deploy changes to a staging environment, which is represented as a REST API stage with the name “stag”. In addition to the normal “git push” trigger, the CI/CD pipeline may also have a manual trigger for cases when an externally-stored OpenAPI spec changes.
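
As a small illustration of the manual trigger, a GitHub Actions workflow can declare workflow_dispatch alongside the usual push trigger. This sketch shows only the trigger section and assumes the main branch used elsewhere in this article.

```yaml
on:
  push:
    branches:
      - main
  # Adds a manual "Run workflow" option, e.g., for when an
  # externally-stored OpenAPI spec changes
  workflow_dispatch:
```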

In general, endpoint implementations should invoke Step Function services using StartSyncExecution (which is why they must be Express Step Functions) and passing the stage name (available as $context.stage in mapping templates) as the required environment input.
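
One way an endpoint might do this is sketched here as an OpenAPI fragment with API Gateway extensions. Treat it as an illustration under stated assumptions, not the exact integration used at aleph0: the path, region, account ID, role name, state machine name, and the environment and body input fields are all placeholders.

```yaml
# Fragment of an OpenAPI spec with API Gateway extensions.
# All ARNs, names, and input field names are hypothetical placeholders.
paths:
  /xyz:
    post:
      responses:
        "200":
          description: OK
      x-amazon-apigateway-integration:
        type: aws
        httpMethod: POST
        uri: arn:aws:apigateway:us-east-2:states:action/StartSyncExecution
        # Role that API Gateway assumes; needs states:StartSyncExecution
        credentials: arn:aws:iam::123456789012:role/apigateway-states-role
        passthroughBehavior: never
        requestTemplates:
          application/json: |
            #set($body = $util.escapeJavaScript($input.json('$')))
            {
              "stateMachineArn": "arn:aws:states:us-east-2:123456789012:stateMachine:xyz:$context.stage",
              "input": "{\"environment\": \"$context.stage\", \"body\": $body}"
            }
        responses:
          default:
            statusCode: "200"
```

Because the stage name doubles as the Step Function alias and as the environment input, the “stag” stage calls the “stag” alias and the “prod” stage calls the “prod” alias with no per-stage configuration. Request bodies containing single quotes may need extra handling, since escapeJavaScript escapes those as well.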

So when developers push a code change to the REST API repository, the updated stag stage of the REST API becomes available in the cloud shortly thereafter.

CloudFormation Template Snippet

Serverless templates for REST APIs are essentially thin wrappers around OpenAPI specs with API Gateway extensions. The AWS Management Console has an outstanding builder for REST APIs that can export the definition automatically to help users get started. This is left as an exercise for the reader.
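
As a possible starting point only, a thin wrapper of this kind might look like the following. The API name and the spec file name openapi.yml are assumptions made for illustration; the BuildId parameter is included so the template accepts the parameter override passed by the CD workflow in this article.

```yaml
AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31
Parameters:
  BuildId:
    Type: String
    Description: The build identifier
Resources:
  RestApi:
    Type: AWS::Serverless::Api
    Properties:
      Name: xyz-api                 # Hypothetical API name
      StageName: stag               # Staging stage, per the workflow
      # Local path; "aws cloudformation package" uploads it to S3
      DefinitionUri: openapi.yml    # Hypothetical spec location in the repo
```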

GitHub Actions CD Workflow Snippet

This snippet shows the skeleton of a CD workflow for the API Gateway. It can easily be adapted to an integration workflow as well. Note that it assumes the CloudFormation template is stored at /cfn.yml in the GitHub repository.

name: delivery
on:
  push:
    branches:
      - main
permissions:
  id-token: write
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source code
        uses: actions/checkout@v3
      # See below for setting up OIDC for AWS permissions
      # https://github.com/aws-actions/configure-aws-credentials?tab=readme-ov-file#sample-iam-oidc-cloudformation-template
      - name: Configure AWS credentials for us-east-2
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/github-oidc-role"
          aws-region: "${{ vars.AWS_REGION }}"
      - name: Prepare CloudFormation stack
        # You must make cfn.yml in repo yourself, see above snippet
        run: aws cloudformation package --template-file cfn.yml --s3-bucket "$S3_BUCKET" --s3-prefix "artifacts/$REPOSITORY" >cfn.packaged.yml
        env:
          REPOSITORY: "${{ github.event.repository.name }}"
          S3_BUCKET: "${{ vars.S3_BUCKET }}"
      - name: Deploy CloudFormation stack
        uses: aws-actions/aws-cloudformation-github-deploy@v1
        with:
          name: "${{ github.event.repository.name }}"
          template: cfn.packaged.yml
          parameter-overrides: >-
            BuildId=${{ github.sha }}
          no-fail-on-empty-changeset: 1
          # You must make this role yourself with proper IAM perms
          role-arn: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/cloudformation-deploy-role"
          capabilities: CAPABILITY_IAM, CAPABILITY_NAMED_IAM, CAPABILITY_AUTO_EXPAND

Workflow Examples

This section assumes that the API has already been deployed and contains at least one endpoint implemented as a Step Function that calls a Lambda Function.

If a developer needs to make a change to an existing Lambda Function, then they simply push the change to the corresponding GitHub repository. This causes the updated Lambda Function to be deployed with the “stag” alias automatically. Note that this makes the updated Lambda Function available over the “stag” API immediately with no additional changes or deployments to Step Functions or the REST API required. Any Step Functions automatically call this updated version due to the updated alias.

If a developer needs to make a change to an existing Step Function, then they simply push the change to the corresponding GitHub repository. This causes the updated Step Function to be deployed with the “stag” alias automatically. Note that this makes the updated Step Function available over the “stag” API immediately with no additional changes or deployments to Lambda Functions or the REST API required. Any REST API endpoints automatically call this updated version due to the updated alias.

If a developer needs to make a (backwards-compatible) change to the REST API or OpenAPI spec, then they simply push the change, and perhaps trigger the manual workflow in the corresponding GitHub repository. Ideally, any Step Functions and Lambda Functions it depends on will already have been deployed first. A version of this process can be used to bootstrap the API for the first time.

Promoting to Production

The API owner should write a set of integration tests that run against an API deployment and determine whether its behavior is acceptable. The REST API repository should have an action that runs these tests and performs the following steps if they all pass:

Apply the alias “prod” to all Lambda Function versions that currently have the “stag” alias
Apply the alias “prod” to all Step Function versions that currently have the “stag” alias
Deploy the “prod” REST API stage
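
The three steps above could be sketched as a GitHub Actions workflow that calls the AWS CLI. This is a rough sketch under several assumptions: a single Lambda Function and State Machine named xyz, repository variables STATE_MACHINE_ARN and REST_API_ID (hypothetical names), and “prod” aliases that already exist. A real workflow would run the integration tests first and loop over every function and state machine.

```yaml
name: promote
on:
  workflow_dispatch:
permissions:
  id-token: write
  contents: read
jobs:
  promote:
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: "arn:aws:iam::${{ vars.AWS_ACCOUNT_ID }}:role/github-oidc-role"
          aws-region: "${{ vars.AWS_REGION }}"
      # Run the integration tests against the stag stage before this point
      - name: Point Lambda prod alias at the current stag version
        # First-time setup would use "aws lambda create-alias" instead
        run: |
          VERSION=$(aws lambda get-alias --function-name xyz --name stag \
            --query FunctionVersion --output text)
          aws lambda update-alias --function-name xyz --name prod \
            --function-version "$VERSION"
      - name: Point Step Function prod alias at the current stag version
        # First-time setup would use "create-state-machine-alias" instead
        run: |
          VERSION_ARN=$(aws stepfunctions describe-state-machine-alias \
            --state-machine-alias-arn "$STATE_MACHINE_ARN:stag" \
            --query 'routingConfiguration[0].stateMachineVersionArn' --output text)
          aws stepfunctions update-state-machine-alias \
            --state-machine-alias-arn "$STATE_MACHINE_ARN:prod" \
            --routing-configuration "stateMachineVersionArn=$VERSION_ARN,weight=100"
        env:
          STATE_MACHINE_ARN: "${{ vars.STATE_MACHINE_ARN }}"
      - name: Deploy prod REST API stage
        run: aws apigateway create-deployment --rest-api-id "$REST_API_ID" --stage-name prod
        env:
          REST_API_ID: "${{ vars.REST_API_ID }}"
```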

The API owner can choose to have this action run:

Automatically, after a successful API “stag” deployment, which implements Continuous Deployment
Manually, on demand, by the API owner only, which implements Continuous Delivery

Note that as long as all changes being deployed are backwards compatible, no service interruption is implied.

On Breaking Changes

The above developer workflow works well for non-breaking changes, i.e., changes not involving updates to the call-level interface between the API and Step Function, or Step Function and Lambda Function. However, because changes to Step Functions and Lambda Functions are not transactional, breaking changes can result in (a) a brief period where incompatible components are in production, at best, or (b) incomplete deployments on partial failure resulting in broken endpoint(s), at worst.

To work around this, it may be necessary to hard-code a Lambda Function version in a Step Function (instead of using the alias) temporarily, or to hard-code a Step Function version in the REST API (instead of using the alias) temporarily, in order to avoid the issue. After a complete deployment with these hard-coded values, a normal deployment with appropriate aliases should be possible.
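
For example, temporarily pinning a Lambda Function version in a Step Function might look like the following fragment of the task's Parameters. The version number 42 is purely hypothetical, and lambdaFunctionArn is assumed to be supplied through DefinitionSubstitutions.

```yaml
Parameters:
  # Temporary: pin an explicit, known-compatible version (42 is hypothetical)
  # instead of an alias until the breaking change is fully deployed
  FunctionName: "${lambdaFunctionArn}:42"
  Payload.$: "$"
```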

I suspect there is a version of this developer workflow that uses a different, dynamic alias (e.g., timestamps) instead of “prod” for the production environment that would solve this more transparently, but I haven’t run it to ground yet.

Conclusion

Hopefully, this developer workflow makes sense. I suspect it’s simpler in practice than the reader might infer. I’m very interested to hear users’ experiences!