Fargate and Cribl (Stream): How We Got It Working

There has been (and will be) a lot of talk about the ever-growing data companies are generating and the need to monitor it. Log centralization, SIEM (Security Information and Event Management), and the accompanying price tags have many looking to get better control over log ingestion. Cribl offers Stream as an observability pipeline that gives you much more control over the flow and structure of logs than typically available with ingestion to SIEM.

One of the nice things about Panther (our SIEM – learn more here, here, and here) is that you can ingest whatever you can get into an S3 bucket. We have handled custom ingestion with AWS Lambdas, and while that worked well, the logs were not always as pretty as we would have liked. JSON is great, but sometimes things come in nested more deeply than works easily for detections, or carry information that could be organized better.

That’s where Cribl Stream (Cribl) comes in. We can use it to bring the logs in, shape them as we see fit, and ship them on to wherever. For the most part, that means an S3 bucket. With Cribl, we are able to easily archive data to replay to Panther later, process the logs to make them more usable, and just about anything else we want to do. There are a ton of options, and for most of those, you should go to the Cribl resource page. The Sandbox and Cribl University are especially helpful.

But if you are looking for info on deploying Cribl using AWS Fargate, you’ve come to the right place. This isn’t a typical approach to running Cribl in containers – they have Helm charts available that you can use with EKS, but ECS and Fargate are our preferred approaches. Plus, who wants to follow documentation anyway?

Yeah, I do. At least for things like this. The existing docs are enough to get most of the way, but there are some gaps. My goal is to help fill those gaps so if you need to use Fargate for…reasons, you can do so without much difficulty. Setting up a dev/testing cluster via the AWS console and prod via Infrastructure as Code makes a lot of sense. It will also give you a nice testing area for when you need to make version upgrades. This won’t be a complete walkthrough – you’ll need working knowledge of AWS and/or Terraform. If you are familiar with working with AWS via the console and CLI and deploying resources via Terraform, the info here should fill in the missing pieces from the Cribl documentation. If you are not, seek out the appropriate documentation for AWS/Terraform as needed.

Is deploying via Fargate frowned upon by Cribl? Not really; it’s just not something they’ve got documented. Their Slack community has been fantastic when I’ve run into issues. Depending on your environmental requirements, you may also want to consider using EC2 for the leader. There are pros/cons either way, so YMMV.

Overview

Some architectural basics – for this example, we’re looking at a leader with a single group of 2 workers. The containers will be in their own VPC with internal and external network load balancers. The load balancers must be the network variety – the other options will not work. I would break out your security groups into small chunks for maximum isolation.

A basic breakdown of the infrastructure:
  • VPC and associated things – The typical 3 public and 3 private subnet approach will be fine
  • Internal network load balancer – NOT application or classic load balancers
  • External network load balancer – Ditto
  • Security groups – See diagram for ingress rules needed
    • Worker(s) – I would make one per Cribl group
    • Leader
    • EFS
  • Fargate Cluster
  • Leader Fargate Service and Task Definition
  • Worker Fargate Service and Task Definition
  • EFS (persistent storage) with access point and mount target for the leader
  • ECR image – If you don’t want to just pull latest
  • IAM roles
    • Leader task and task execution roles
    • Worker task and task execution roles
A rough diagram of the architecture

If you go the Terraform route, your repo will need docker and terraform directories, and including a docker-compose.yaml file will make local testing a breeze. The basic structure would look something like this. I/we chose to split the Terraform to separate the main stuff from the EFS and VPC stuff because that made the most sense for us.

  • docker
    • Dockerfile
    • docker-compose.yaml
  • terraform
    • cribl – TF files related to the cluster
      • policies – IAM policies
      • task_definition – JSON task definitions
    • efs – TF files to set up EFS for persistent storage
    • vpc – TF files to set up the VPC and associated infrastructure
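As a sketch of what that docker-compose.yaml might look like for local testing – the cribl/cribl image and the CRIBL_DIST_MODE/CRIBL_DIST_MASTER_URL variables are real, but the auth token placeholder and port choices here are assumptions to adapt to your setup:

```yaml
# Minimal local leader + worker pair for testing config changes.
version: "3.8"
services:
  leader:
    image: cribl/cribl:latest
    environment:
      CRIBL_DIST_MODE: master                          # run this container as the leader
      CRIBL_DIST_MASTER_URL: tcp://<auth_token>@0.0.0.0:4200
    ports:
      - "9000:9000"                                    # leader GUI
      - "4200:4200"                                    # leader <> worker comms
  worker:
    image: cribl/cribl:latest
    depends_on:
      - leader
    environment:
      CRIBL_DIST_MODE: worker                          # register with the leader above
      CRIBL_DIST_MASTER_URL: tcp://<auth_token>@leader:4200
```

A `docker compose up` with this gives you a throwaway leader/worker pair to poke at before any AWS resources exist.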

You can split your Terraform files however you like, or however is accepted practice where you are. Whatever linter you use will probably complain about the format of the task definitions, but they are probably OK. Watch your environment variables here – protect sensitive ones using the secrets block with SSM parameters or Secrets Manager.
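For example, the leader URL (which carries the auth token) can be pulled from SSM Parameter Store via the secrets block instead of sitting in plaintext in the environment block – the parameter ARN below is a hypothetical placeholder:

```json
{
  "name": "cribl-leader",
  "environment": [
    { "name": "CRIBL_DIST_MODE", "value": "master" }
  ],
  "secrets": [
    {
      "name": "CRIBL_DIST_MASTER_URL",
      "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/cribl/dist-master-url"
    }
  ]
}
```

ECS injects the parameter value at task launch, so the token never appears in the task definition itself (the task execution role needs read access to the parameter).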

Nothing scary yet, right?

Nuances

Where you will run into some fun is that sometimes the AWS console will not let you do what you need to do. Specifically, you will need to add additional target groups to the services, but the console only lets you attach a single target group. The additional ones must be added via the CLI: pass every load balancer entry (including the one added via the console) to the --load-balancers parameter, each with its target group ARN, container name, and container port.

aws ecs update-service --cluster <cluster_name> --service <service_name> \
  --load-balancers "targetGroupArn=<arn_target_group_1>,containerName=<container_name>,containerPort=<port_1>" \
                   "targetGroupArn=<arn_target_group_2>,containerName=<container_name>,containerPort=<port_2>"

Deploying with Terraform allows you to set up the networking as needed, so if you go that route for production, it’s pretty straightforward. The Terraform documentation on each of the resources will get you where you need to be.

For the leader, the Terraform will look something like…

resource "aws_ecs_service" "leader" {
  name                 = "cribl-leader"
  cluster              = aws_ecs_cluster.cribl.id
  task_definition      = aws_ecs_task_definition.cribl-leader.arn
  desired_count        = 1
  launch_type          = "FARGATE"
  platform_version     = "1.4.0"
  force_new_deployment = true

  network_configuration {
    subnets          = data.aws_subnet.private.*.id
    security_groups  = [aws_security_group.cribl-leader.id]
    assign_public_ip = false
  }

  # Internal load balancer for leader <> worker communication
  load_balancer {
    target_group_arn = aws_lb_target_group.cribl-coms.arn
    container_name   = "cribl-leader"
    container_port   = 4200
  }

  # External load balancer for access to leader GUI
  load_balancer {
    target_group_arn = aws_lb_target_group.cribl-gui.arn
    container_name   = "cribl-leader"
    container_port   = 9000
  }

  # And whatever else you need
}

Follow the same basic outline for the worker service(s). I would recommend putting the default workers in a specific subnet (not all the subnets) so you can easily put workers into different groups down the line. Using different subnets will let you create logic in Cribl to put the worker in a specific group based on IP.

Now on to the task definitions. Creating in the console is straightforward. You will want separate task definitions for the leader and each worker group. To create via Terraform, you can get the JSON either by creating the task definition in the console and pulling the JSON or working from an example. Give yourself enough compute to get the job done. You can adjust upward as needed, but I found the smallest options to not be enough. You might be able to get by with the 0.5 vCPU level, but I would size up to something in the 1 vCPU range for your testing. Use the sizing docs to help figure out what you need for prod.
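As a starting point, a hedged sketch of a leader task definition at the 1 vCPU / 2 GB size mentioned above – the image URI is a placeholder for your ECR repo, and the ports match the load balancer blocks from the service example:

```json
{
  "family": "cribl-leader",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "cribl-leader",
      "image": "<account>.dkr.ecr.<region>.amazonaws.com/cribl:<tag>",
      "portMappings": [
        { "containerPort": 9000, "protocol": "tcp" },
        { "containerPort": 4200, "protocol": "tcp" }
      ],
      "environment": [
        { "name": "CRIBL_DIST_MODE", "value": "master" }
      ]
    }
  ]
}
```

The worker task definitions follow the same shape with CRIBL_DIST_MODE set to worker and only the ports that group actually listens on.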

Once you have a cluster where the worker(s) can talk to the leader (confirm by looking at the logs), you will also want persistent storage (add a volume) in the form of EFS so your configuration will persist – it’s also required for HA. However, setting both remote git and persistent storage as environment variables will break things. Set up persistent storage in the environment variables, and add a remote git, if you want one, via the Cribl GUI.
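Wiring the EFS volume into the leader task definition via Terraform looks something like the sketch below – the resource names follow this example’s layout, and the container definition JSON will also need a matching mountPoints entry pointing at wherever you set Cribl’s volume directory:

```hcl
resource "aws_ecs_task_definition" "cribl-leader" {
  family                   = "cribl-leader"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 1024
  memory                   = 2048
  container_definitions    = file("task_definition/leader.json")

  # Persistent storage so config survives task replacement (and for HA)
  volume {
    name = "cribl-config"

    efs_volume_configuration {
      file_system_id     = aws_efs_file_system.cribl.id
      transit_encryption = "ENABLED"

      authorization_config {
        access_point_id = aws_efs_access_point.cribl-leader.id
        iam             = "ENABLED"
      }
    }
  }
}
```

With IAM authorization enabled, the leader task role also needs the EFS client access permissions on the file system.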

Important Note: Currently, the Cribl documentation shows putting the personal access token for GitHub into the URL of the remote git. DO NOT DO THIS. I mean, you can, but it will put the token into the repo in plain text. Which is just icky. Instead, use the basic authentication option and use the token in place of the password – this will encrypt the value. You should be using a private repo, which would limit the risk of logging the token, but it’s just not something you should do. And make sure to scope the token to the specific repo needed and limit permissions. (They are working on updating the docs – I got help in the Cribl Slack to address this.)

As you add storage, sources, and destinations, remember that you will need to update the infrastructure to allow this. So if you want to add syslog, you’ll need to add ingress to the worker security group and a target group on the external load balancer. You’ll need to give the leader access to the EFS storage and to things like S3 storage.
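For the syslog example, the extra infrastructure amounts to an ingress rule plus another target group – a hypothetical sketch, with the port and CIDR as placeholders for your senders:

```hcl
# Hypothetical example: opening TCP syslog (1514) to the workers
resource "aws_security_group_rule" "worker-syslog" {
  type              = "ingress"
  from_port         = 1514
  to_port           = 1514
  protocol          = "tcp"
  cidr_blocks       = ["10.0.0.0/8"]   # wherever your senders live
  security_group_id = aws_security_group.cribl-worker.id
}

resource "aws_lb_target_group" "cribl-syslog" {
  name        = "cribl-syslog"
  port        = 1514
  protocol    = "TCP"                  # NLBs speak TCP/UDP/TLS, not HTTP
  target_type = "ip"                   # required for Fargate tasks
  vpc_id      = data.aws_vpc.main.id
}
```

The new target group then gets attached to the worker service as another load_balancer block, just like the leader examples above.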

Updating versions

I’ve found following the version update process in the Cribl GUI to work well. Update the leader, then the workers. Test as required. Then, when you are ready, you can update your ECR image to pull in the latest. I prefer creating an ECR image to have more control over the image version.

I also include a dev update that pulls directly from the Cribl latest version in Docker to see how much will change when the new image is pulled in. That may or may not be an extra step you want to take. With persistent storage, you can use the old version of the image and just update it again if you have to. But I like updating the ECR image, too. Depending on how much changes in dev between the previous and latest images, you may want to update prod services too, so they are running off the new image. Do it at a time that makes sense for you. It’s fairly seamless, but it takes some time and you might drop some events.

The full process is:

  • Update dev leader via GUI
  • Update dev workers via GUI
  • Test until you are satisfied
  • Deploy dev from cribl/cribl:latest if desired
  • Update prod leader via GUI
  • Update prod workers via GUI
  • Update ECR image
  • Update services if desired
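The ECR image update step is a standard pull/tag/push – a sketch with hypothetical account, region, and repo values to substitute:

```
# Hypothetical values – substitute your own
REGION=us-east-1
ACCOUNT=123456789012
REPO=$ACCOUNT.dkr.ecr.$REGION.amazonaws.com/cribl
VERSION=x.y.z   # the version you tested in dev, rather than latest

aws ecr get-login-password --region $REGION | \
  docker login --username AWS --password-stdin $ACCOUNT.dkr.ecr.$REGION.amazonaws.com

docker pull cribl/cribl:$VERSION
docker tag cribl/cribl:$VERSION $REPO:$VERSION
docker push $REPO:$VERSION

# Then force a new deployment so the services pick up the image
aws ecs update-service --cluster <cluster_name> --service <service_name> --force-new-deployment
```

Pinning a specific tested tag here is what gives you the version control over the image that pulling latest doesn’t.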

The End

That should be enough info to navigate the little oddities of deploying Cribl via Fargate. I tried to provide enough detail to navigate the hiccups without writing a novel. This setup works quite well. If you can go to Helm and EKS, that’s an easier path. Cribl updates the Helm charts regularly. The Cribl Slack, University, and Sandbox are all excellent resources to help you experiment or get unstuck. It’s really a question of what works best in your environment.

Page Glave

Page is a Security Engineer at FloQast focusing on detection engineering and incident response. Outside of the infosec bubble, she enjoys music, creative pursuits, and the outdoors.


