Using AWS X-Ray to Trace and Understand Your Application

Mar 09, 2020 | By Trung Dinh


As FloQast has moved toward microservices and a decoupled services infrastructure, we have faced a number of challenges. One of them is going beyond monitoring and logging our systems to understanding the internals of our application through tracing. To begin solving this problem, FloQast recently implemented AWS X-Ray.

The problem

As a fast-growing company, we add more business logic to the codebase every single day. We began to notice that some code was creating bottlenecks. We wanted any decisions we made in addressing these bottlenecks to be data-driven and testable, so we started gathering metrics with the goal of improving performance for our end users.

A brief history of X-Ray

X-Ray is an AWS service that gives you insight into the requests your application serves. It comes with a dashboard for viewing, filtering, and collecting meaningful data, which helps identify bugs and opportunities for optimization. You can also trace requests across AWS resources and other microservices.

A proof of concept for X-Ray

Wrapping the code

To monitor the business logic and database logic sections, we created our own X-Ray wrapper, using the AWS SDK as an example.

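// Wraps a callback-style function so each call is timed in its own X-Ray subsegment.
Xray.callbackCapture = (identifier, segment, obj, func) => {
    return (...args) => {
        const subsegment = segment.addNewSubsegment(identifier);
        // By Node convention, the callback is the last argument.
        const callbackIdx = args.length - 1;
        const callback = args[callbackIdx];
        args[callbackIdx] = (...callbackArgs) => {
            // Error-first callbacks: record the error on the subsegment if there is one.
            if (callbackArgs[0]) {
                subsegment.close(callbackArgs[0]);
            } else {
                subsegment.close();
            }

            return callback.apply(this, callbackArgs);
        };

        return func.apply(obj, args);
    };
};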

Figure 1: An illustration of our X-Ray wrapper implementation

We also created these wrappers with flexibility in mind, so that they could work on both X-Ray-wrapped routes and non-X-Ray routes. It was important for us not to break the application if X-Ray failed to work correctly.
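To make the "don't break the application" requirement concrete, here is a minimal sketch of a fail-safe variant of the wrapper. It is not our production code: `safeCallbackCapture`, `makeFakeSegment`, and the `db` object are hypothetical names used only for illustration, and the fake segment stands in for a real X-Ray segment so the example runs without the SDK. It assumes the callback is the last argument and follows the error-first convention.

```javascript
// Hypothetical fail-safe wrapper: traces when a segment is available,
// and otherwise calls the wrapped function untouched.
function safeCallbackCapture(identifier, segment, obj, func) {
    return (...args) => {
        let subsegment = null;
        try {
            // Only open a subsegment when the route actually has a segment.
            subsegment = segment ? segment.addNewSubsegment(identifier) : null;
        } catch (err) {
            subsegment = null; // tracing failed; keep serving the request
        }

        const callbackIdx = args.length - 1;
        const callback = args[callbackIdx];
        if (subsegment && typeof callback === 'function') {
            args[callbackIdx] = (...callbackArgs) => {
                try {
                    // Pass the error (if any) so it is recorded on the subsegment.
                    subsegment.close(callbackArgs[0] || undefined);
                } catch (err) {
                    // Never let a tracing error reach the caller.
                }
                return callback(...callbackArgs);
            };
        }
        return func.apply(obj, args);
    };
}

// Stand-in for an X-Ray segment (assumption: only the two methods used above).
function makeFakeSegment() {
    const closed = [];
    return {
        closed,
        addNewSubsegment(name) {
            return { close: (err) => closed.push({ name, error: err || null }) };
        },
    };
}

// Hypothetical callback-style data access object.
const db = {
    rows: ['a', 'b'],
    find(query, cb) { cb(null, this.rows); },
};

const fake = makeFakeSegment();
const tracedFind = safeCallbackCapture('db.find', fake, db, db.find);
const untracedFind = safeCallbackCapture('db.find', undefined, db, db.find);

let tracedResult;
let untracedResult;
tracedFind({}, (err, rows) => { tracedResult = rows; });
untracedFind({}, (err, rows) => { untracedResult = rows; });
```

Both calls return the same rows; only the traced one records a subsegment. The trade-off of swallowing tracing errors is that a broken X-Ray setup fails silently, so in practice you would probably log inside the `catch` blocks.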

The Daemon

AWS recommends running an X-Ray daemon to batch X-Ray requests before sending them to AWS. To do that, we built this daemon and ran it as a sidecar to our main task in AWS ECS.


Figure 2: X-Ray daemon helps in batching segment requests
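For a sense of what the daemon receives, the sketch below builds a segment datagram by hand. In a real application the X-Ray SDK does this for you; this is only an illustration of the documented daemon format, in which each UDP datagram starts with the header `{"format": "json", "version": 1}` followed by a newline and the segment document. The helper names (`buildSegmentDatagram`, `sendToDaemon`), the segment name, and the address are assumptions made for the example.

```javascript
const dgram = require('dgram');
const crypto = require('crypto');

// Header the X-Ray daemon expects before each segment document.
const DAEMON_HEADER = '{"format": "json", "version": 1}\n';

// Hypothetical helper: builds a minimal, completed segment document.
// startTime and endTime are epoch seconds (fractions allowed).
function buildSegmentDatagram(name, startTime, endTime) {
    const epochHex = Math.floor(startTime).toString(16).padStart(8, '0');
    const segment = {
        name,
        id: crypto.randomBytes(8).toString('hex'), // 16 hex digits
        trace_id: `1-${epochHex}-${crypto.randomBytes(12).toString('hex')}`,
        start_time: startTime,
        end_time: endTime,
    };
    return Buffer.from(DAEMON_HEADER + JSON.stringify(segment));
}

// Hypothetical helper: fire-and-forget UDP send to the daemon.
function sendToDaemon(datagram, host, port) {
    const socket = dgram.createSocket('udp4');
    socket.send(datagram, port, host, () => socket.close());
}

const now = Date.now() / 1000;
const datagram = buildSegmentDatagram('example-service', now - 0.25, now);

// Uncomment to send to a daemon listening locally (address is an assumption):
// sendToDaemon(datagram, '127.0.0.1', 2000);
```

Because the transport is UDP, the application never blocks on tracing; the daemon collects these datagrams and uploads them to AWS in batches.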

On the infrastructure side, we initially considered running the X-Ray daemon in its own ECS service on its own EC2 instance. However, our EC2 performance metrics showed that we had enough capacity to run the daemon side by side with the main task. On top of that, we realized the original approach would add cost and require an independent deploy pipeline for AWS X-Ray. As a result, we decided it was more efficient to have a new task definition that runs the AWS X-Ray daemon side by side with the main app task on the same EC2 instance.

The following resources should be added to your Terraform in the terraform-infrastructure repository:

cloudwatch.tf:

resource "aws_cloudwatch_log_group" "xray" {
  name = "/fq/${var.environment}/${var.service_name}-watcher/instance/xray"
}

ecs.tf:

resource "aws_ecs_service" "ecs-watcher" {
  ...
  depends_on = [
    ...
    "aws_cloudwatch_log_group.xray"
  ]
  ...
}

policies/ecsServiceInstanceRolePolicy.json:

    {
        "Sid": "XRayDaemonWriteAccess",
        "Effect": "Allow",
        "Action": [
            "xray:PutTraceSegments",
            "xray:PutTelemetryRecords",
            "xray:GetSamplingRules",
            "xray:GetSamplingTargets",
            "xray:GetSamplingStatisticSummaries"
        ],
        "Resource": [
            "*"
        ]
    },

task-definitions/task-definition.json: (NOTE: This file may exist in your application's repository and have a different name)

[
  {
    ...
    "environment": [
      {
          "name": "AWS_XRAY_DAEMON_ADDRESS",
          "value": "fq-${ENVIRONMENT}-xray:2000"
      }
    ],
    ...
    "links": [
      "fq-${ENVIRONMENT}-xray"
    ]
    ...
  },
  {
    "name": "fq-${ENVIRONMENT}-xray",
    "essential": true,
    "image": "${AWS_ACCOUNT}.dkr.ecr.us-west-2.amazonaws.com/xray:latest",
    "cpu": 32,
    "memoryReservation": 256,
    "portMappings" : [
      {
        "hostPort": 0,
        "containerPort": 2000,
        "protocol": "udp"
      }
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/fq/${ENVIRONMENT}/${SERVICE_NAME}/instance/xray",
        "awslogs-region": "${REGION}",
        "awslogs-stream-prefix": "xray"
      }
    }
  }
]

Sampling

Since costs depend on the number of segment API calls, we wanted to monitor only the routes we need to, and to be able to add or remove X-Ray from a route quickly if we ever needed to make a change. This led us to configure the X-Ray routes as code in the server instead of setting them up through Terraform and the AWS Console.


Figure 3: Our X-Ray sampling rule
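One way to keep the traced routes in code is an allowlist that a small middleware consults before handing the request to the tracing middleware. The sketch below is an illustration under assumptions, not our actual configuration: the route patterns, `TRACED_ROUTES`, `shouldTrace`, and `traceOnlyListedRoutes` are all hypothetical names, and the request object is assumed to be Express-style with `method` and `path` properties.

```javascript
// Hypothetical in-code list of routes that should be traced.
const TRACED_ROUTES = [
    { method: 'GET', path: /^\/api\/reports\/[^/]+$/ },
    { method: 'POST', path: /^\/api\/uploads$/ },
];

// Returns true when the method and path match an entry in the list.
function shouldTrace(method, path) {
    return TRACED_ROUTES.some(
        (route) => route.method === method.toUpperCase() && route.path.test(path)
    );
}

// Wraps any Express-style tracing middleware so it only runs on listed routes.
function traceOnlyListedRoutes(tracingMiddleware) {
    return (req, res, next) => {
        if (shouldTrace(req.method, req.path)) {
            return tracingMiddleware(req, res, next);
        }
        return next(); // untraced route: no segment, no segment API cost
    };
}

// Demo with a fake tracing middleware that records which paths it saw.
const tracedPaths = [];
const fakeTracing = (req, res, next) => { tracedPaths.push(req.path); next(); };
const middleware = traceOnlyListedRoutes(fakeTracing);

let nextCalls = 0;
middleware({ method: 'GET', path: '/api/reports/42' }, {}, () => { nextCalls += 1; });
middleware({ method: 'GET', path: '/health' }, {}, () => { nextCalls += 1; });
```

With the X-Ray SDK for Node.js, the tracing middleware passed in would typically be the one returned by `AWSXRay.express.openSegment(...)`. Adding or removing a route from tracing then becomes a one-line code change that ships with the normal deploy.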

It’s alive!

As the data started flowing in, we were able to look at the production X-Ray dashboard to identify the code sections that could be improved.


Figure 4: An illustration of our working X-Ray trace!

Learnings going forward

As we continue to grow as a data-driven organization, monitoring data for our application lets the team direct effort to the right areas of the code.

Monitoring services also usually require a quick implementation with a focus on scalability. Third-party monitoring tools like AWS X-Ray, which work out of the box and are easy to implement, are a great answer to these challenges.


