Event-Driven File-Processing Programming Model

SCAR supports an event-driven programming model suitable for executing highly parallel file-processing applications that require a customized runtime environment.

The following command:

cat > darknet.yaml << EOF
functions:
  aws:
  - lambda:
      name: scar-darknet-s3
      memory: 2048
      init_script: yolo.sh
      container:
        image: grycap/darknet
      input:
      - storage_provider: s3
        path: scar-darknet/input
      output:
      - storage_provider: s3
        path: scar-darknet/output
EOF

scar init -f darknet.yaml

This command creates a Lambda function that executes the shell script yolo.sh inside a Docker container created from the grycap/darknet Docker image stored in Docker Hub.
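
Assuming your SCAR installation provides the ls subcommand (an assumption worth checking against your installed version), you can list the deployed functions to verify the creation:

scar ls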

The following workflow summarises the programming model:

  1. The Amazon S3 bucket scar-darknet is created, together with an input folder inside it, if it does not already exist.
  2. The Lambda function is triggered upon uploading a file into the input folder.
  3. The Lambda function retrieves the file from the Amazon S3 bucket and makes it available to the shell script running inside the container in the path $TMP_INPUT_DIR. The $INPUT_FILE_PATH environment variable points to the location of the input file.
  4. The shell script processes the input file and produces the output (either one or multiple files) in the folder specified by the $TMP_OUTPUT_DIR environment variable (see the sketch after this list).
  5. The output files are automatically uploaded by the Lambda function into the output folder created inside the scar-darknet bucket.
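
A minimal sketch of what a compatible yolo.sh could look like is shown below. The darknet binary location, the model files and the detect subcommand are assumptions about the grycap/darknet image; only the environment variables are the ones provided by SCAR:

#!/bin/bash
# Hypothetical init script: read the file staged by SCAR and write the
# detector output to the folder that SCAR uploads to S3 afterwards.
OUTPUT_FILE="$TMP_OUTPUT_DIR/$(basename "$INPUT_FILE_PATH").txt"
cd /opt/darknet
# The config and weights paths below are assumptions for illustration.
./darknet detect cfg/yolov3.cfg yolov3.weights "$INPUT_FILE_PATH" > "$OUTPUT_FILE"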

Many instances of the Lambda function may run concurrently and independently, depending on the number of files to be processed in the S3 bucket. Initial invocations may require retrieving the Docker image from Docker Hub, but the image is cached for subsequent invocations, thus speeding up execution.
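
For example, uploading a local image (the file name photo.jpg is hypothetical) into the input folder triggers one invocation, using the same put command that appears later in this document:

scar put -b scar-darknet/input -p photo.jpg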

For further information, examples of such applications are included in the examples/ffmpeg folder, to run the FFmpeg video transcoding tool, and in the examples/imagemagick folder, to run the ImageMagick image manipulation tool, both on AWS Lambda.

Event-Driven Processing of Existing Files

SCAR also supports a second way of executing highly parallel file-processing applications that require a customized runtime environment: processing files that already exist in the input bucket.

After creating a function with the configuration file defined in the previous section, you can activate the SCAR event launcher using the run command like this:

scar run -f darknet.yaml

This command lists the files in the input folder of the scar-darknet bucket and sends the required events (one per file) to the Lambda function.

Note

The input path must already exist and must contain files in order to launch the functions. The bucket may have been defined beforehand; you do not need to create it with SCAR.
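
For instance, the input folder can be populated with the put command used elsewhere in this guide (the image names are hypothetical) before firing the events:

scar put -b scar-darknet/input -p img1.jpg
scar put -b scar-darknet/input -p img2.jpg
scar run -f darknet.yaml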

The following workflow summarises this variant of the programming model; the differences from the main programming model lie in the first two steps:

  1. The input folder inside the Amazon S3 bucket scar-darknet is searched for files.
  2. The Lambda function is triggered once for each file found in the folder. The first execution is of type ‘request-response’ and the rest are ‘asynchronous’ (this ensures that the Docker image is cached, accelerating subsequent executions; see the AWS CLI equivalent after this list).
  3. The Lambda function retrieves the file from the Amazon S3 bucket and makes it available to the shell script running inside the container. The $INPUT_FILE_PATH environment variable points to the location of the input file.
  4. The shell script processes the input file and produces the output (either one or multiple files) in the path specified by the $TMP_OUTPUT_DIR environment variable.
  5. The output files are automatically uploaded by the Lambda function into the output folder of the scar-darknet bucket.
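
For reference, these two execution types correspond to the standard AWS Lambda invocation modes. A rough AWS CLI equivalent of what the event launcher does (the payload file names are hypothetical) would be:

# First file: synchronous call, so the container image is pulled and cached
aws lambda invoke --function-name scar-darknet-s3 \
    --invocation-type RequestResponse --payload file://event-1.json out-1.json
# Remaining files: asynchronous calls that reuse the warmed-up function
aws lambda invoke --function-name scar-darknet-s3 \
    --invocation-type Event --payload file://event-2.json out-2.json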

Function Definition Language (FDL)

In the last update of SCAR, the language used to define functions was improved: several functions, each with its complete configuration, can now be defined in a single configuration file. Additionally, different storage providers with different configurations can be used.

A complete working example of this functionality can be found here.

In this example, two functions are created: one with Batch delegation to process videos, and the other in Lambda to process the generated images. The functions are connected by their linked buckets, as can be seen in the configuration file:

cat > scar-video-process.yaml << EOF
functions:
  aws:
  - lambda:
      name: scar-batch-ffmpeg-split
      init_script: split-video.sh
      execution_mode: batch
      container:
        image: grycap/ffmpeg
      input:
      - storage_provider: s3
        path: scar-video/input
      output:
      - storage_provider: s3
        path: scar-video/split-images
  - lambda:
      name: scar-lambda-darknet
      init_script: yolo-sample-object-detection.sh
      memory: 3008
      container:
        image: grycap/darknet
      input:
      - storage_provider: s3
        path: scar-video/split-images
      output:
      - storage_provider: s3
        path: scar-video/output
EOF

scar init -f scar-video-process.yaml

Using the common folder split-images, these functions are connected to create a workflow. None of these buckets or folders needs to be created beforehand; SCAR manages the creation of the required buckets and folders.
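
A minimal sketch of what split-video.sh might contain is shown below, assuming FFmpeg extracts one frame per second of the input video (the exact flags are an illustration; the environment variables are the ones described in the workflow above):

#!/bin/bash
# Hypothetical init script: extract one frame per second from the video
# staged by SCAR into the folder whose contents SCAR uploads to S3.
ffmpeg -i "$INPUT_FILE_PATH" -vf fps=1 "$TMP_OUTPUT_DIR/frame-%03d.jpg"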

To launch this workflow, you only need to upload a video to the input folder of the scar-video bucket with the command:

scar put -b scar-video/input -p seq1.avi

This launches, first, the splitting function, which creates 68 images (one for each second of the video), and then 68 invocations of the Lambda function, which process and analyze the created images.
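
Once all the invocations finish, the analyzed images end up in the output folder of the scar-video bucket. Assuming your SCAR version offers a get command symmetric to put (an assumption worth checking), the results could be downloaded with:

scar get -b scar-video/output -p /tmp/video-results/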