AWS Step Functions Distributed Map ResultWriter Example

I’m using AWS Step Functions to do some complex orchestration of services that could span more than 25,000 state transitions and exchange data sets larger than 256KB, so I’m making heavy use of the new distributed map feature. It definitely makes things easier than the old everything-is-a-child-execution approach! However, the ResultWriter field is not particularly well-documented, so I’m hoping to shed some light on it here with a simple example.

Simple Example Step Function

I built the following (very) simple step function to test the distributed map feature. It basically just runs a distributed map over the items attribute of its input:

{
  "Comment": "Simple example of distributed map state",
  "StartAt": "Map",
  "States": {
    "Map": {
      "Type": "Map",
      "ItemProcessor": {
        "ProcessorConfig": {
          "Mode": "DISTRIBUTED",
          "ExecutionType": "STANDARD"
        },
        "StartAt": "Pass",
        "States": {
          "Pass": {
            "Type": "Pass",
            "End": true
          }
        }
      },
      "End": true,
      "Label": "Map",
      "MaxConcurrency": 1000,
      "ResultWriter": {
        "Resource": "arn:aws:states:::s3:putObject",
        "Parameters": {
          "Bucket": "my-bucket-name",
          "Prefix": "garbage"
        }
      },
      "InputPath": "$.items"
    }
  }
}

I then executed this step function with the following (very) simple input:

{
  "items": [
    "alpha",
    "bravo",
    "charlie",
    "delta",
    "echo"
  ]
}
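
As an aside, here is a minimal sketch of how you might start that execution with boto3; the state machine ARN below is a placeholder, not a real one:

import json

import boto3

sfn = boto3.client("stepfunctions")

# Placeholder ARN -- substitute the ARN of your own state machine.
state_machine_arn = "arn:aws:states:us-east-2:123456789012:stateMachine:MyStateMachine"

response = sfn.start_execution(
    stateMachineArn=state_machine_arn,
    input=json.dumps({"items": ["alpha", "bravo", "charlie", "delta", "echo"]}),
)
print(response["executionArn"])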

The output from the workflow was:

{
  "MapRunArn": "arn:aws:states:us-east-2:123456789012:mapRun:MyStateMachine-deafh9ubj/Map:6dbfba93-bcff-410b-9b5c-abf41ae9eddb",
  "ResultWriterDetails": {
    "Bucket": "my-bucket-name",
    "Key": "garbage/6dbfba93-bcff-410b-9b5c-abf41ae9eddb/manifest.json"
  }
}

The map run produced the following (not so) simple ResultWriter output at the configured prefix in S3:

ResultWriter S3 Output

You can download the raw files above if you’re curious. Their (abridged) contents follow here:

# manifest.json
{
  "DestinationBucket": "my-bucket-name",
  "MapRunArn": "arn:aws:states:us-east-2:123456789012:mapRun:MyStateMachine/Map:4977ded7-b097-3bee-81e1-7863c5660f29",
  "ResultFiles": {
    "FAILED": [],
    "PENDING": [],
    "SUCCEEDED": [
      {
        "Key": "garbage/4977ded7-b097-3bee-81e1-7863c5660f29/SUCCEEDED_0.json",
        "Size": 2328
      }
    ]
  }
}
# SUCCEEDED_0.json
[
  {
    "ExecutionArn": "arn:aws:states:us-east-2:123456789012:execution:MyStateMachine/Map:a47c247f-5416-3386-89b8-8a7a81662ac3",
    "Input": "\"alpha\"",
    "InputDetails": {
      "Included": true
    },
    "Name": "a47c247f-5416-3386-89b8-8a7a81662ac3",
    "Output": "\"alpha\"",
    "OutputDetails": {
      "Included": true
    },
    "StartDate": "2023-04-28T02:24:27.999Z",
    "StateMachineArn": "arn:aws:states:us-east-2:123456789012:stateMachine:MyStateMachine/Map",
    "Status": "SUCCEEDED",
    "StopDate": "2023-04-28T02:24:28.093Z"
  },
  {
    "ExecutionArn": "arn:aws:states:us-east-2:123456789012:execution:MyStateMachine/Map:7b7f4229-c44e-35c3-bed2-8d0ed218230a",
    "Input": "\"bravo\"",
    "InputDetails": {
      "Included": true
    },
    "Name": "7b7f4229-c44e-35c3-bed2-8d0ed218230a",
    "Output": "\"bravo\"",
    "OutputDetails": {
      "Included": true
    },
    "StartDate": "2023-04-28T02:24:28.089Z",
    "StateMachineArn": "arn:aws:states:us-east-2:123456789012:stateMachine:MyStateMachine/Map",
    "Status": "SUCCEEDED",
    "StopDate": "2023-04-28T02:24:28.158Z"
  },
  # more here...
]

The ResultWriter output includes all of the information about each child execution of the distributed map, including its input and output. However, based on the contents of the manifest and the naming of the SUCCEEDED_0.json file, it looks like the data can be spread across more than one file.
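
If you want those results back as a single collection, you have to walk the manifest yourself. Here's a rough sketch with boto3, assuming the bucket and manifest key from the ResultWriterDetails output above:

import json

import boto3

s3 = boto3.client("s3")

# Taken from the ResultWriterDetails in the workflow output above.
bucket = "my-bucket-name"
manifest_key = "garbage/6dbfba93-bcff-410b-9b5c-abf41ae9eddb/manifest.json"

manifest = json.loads(s3.get_object(Bucket=bucket, Key=manifest_key)["Body"].read())

# Each SUCCEEDED entry points at a file holding a JSON array of child
# execution records; gather them all, however many files there are.
records = []
for part in manifest["ResultFiles"]["SUCCEEDED"]:
    body = s3.get_object(Bucket=bucket, Key=part["Key"])["Body"].read()
    records.extend(json.loads(body))

# Each record's Output field is itself a JSON-encoded string.
outputs = [json.loads(record["Output"]) for record in records]
print(outputs)  # ['alpha', 'bravo', 'charlie', 'delta', 'echo']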

The distributed map state can read data to process from S3 using ItemReader. This is obviously enormously handy when trying to process data larger than 256KB! It would be nice to have an "inverse" operation to write data to S3, too. That would let users move data into and out of S3 using only the Step Functions framework, so that the underlying code could focus on business logic and ignore storage concerns. Unfortunately, ResultWriter is not that natural opposite, since its output can be spread across multiple files. ItemReader can also work on lists of files, but then the processors would have to read their data from S3 themselves, which defeats the point.
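
Until something like that exists, the closest workaround I can see is to do the consolidation yourself after the map run finishes. Here's a sketch that folds the manifest-reading logic from above into a helper and writes the child execution outputs back to S3 as a single object; the output key is hypothetical and just for illustration:

import json

import boto3

s3 = boto3.client("s3")


def consolidate_map_outputs(bucket: str, manifest_key: str, output_key: str) -> None:
    """Gather a map run's ResultWriter parts and write one consolidated object."""
    manifest = json.loads(s3.get_object(Bucket=bucket, Key=manifest_key)["Body"].read())

    outputs = []
    for part in manifest["ResultFiles"]["SUCCEEDED"]:
        body = s3.get_object(Bucket=bucket, Key=part["Key"])["Body"].read()
        outputs.extend(json.loads(record["Output"]) for record in json.loads(body))

    s3.put_object(
        Bucket=bucket,
        Key=output_key,
        Body=json.dumps(outputs).encode("utf-8"),
        ContentType="application/json",
    )


# The output key here is hypothetical -- name it however suits your workflow.
consolidate_map_outputs(
    bucket="my-bucket-name",
    manifest_key="garbage/6dbfba93-bcff-410b-9b5c-abf41ae9eddb/manifest.json",
    output_key="garbage/6dbfba93-bcff-410b-9b5c-abf41ae9eddb/outputs.json",
)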

I hope this was a useful example of how the AWS Step Functions distributed map ResultWriter field works and of the data that it generates in S3. Happy building!