Elasticsearch

Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack, and it is where the indexing, search, and analysis happen. The ElastiFlow Unified Collectors can be configured to store the collected, processed, and enriched records in Elasticsearch. Kibana enables you to interactively explore, visualize, and share insights into your data, and to manage and monitor the stack.

Elasticsearch provides real-time search and analytics for all types of data. It efficiently indexes and stores records in a way that supports fast queries. As your data and query volume grows, the distributed nature of Elasticsearch enables your deployment to grow seamlessly along with it.

Sizing

Elasticsearch can be deployed as a single-node server or a multi-node cluster. The latter provides for horizontal scaling to handle very high ingest rates and longer retention periods. This section describes multiple deployment scenarios, from a single "lab" server to a multi-node cluster.

System Resources

Elasticsearch was engineered to run on "commodity" hardware. This is partially due to the Java Virtual Machine (JVM) and its loss of efficiency with heap sizes of 32GB and above. For this reason the provided architectures scale by adding Elasticsearch nodes to form larger clusters, rather than increasing the resources allocated to each node.

:::tip
Hardware has grown more and more powerful. Servers with 128 cores (256 threads) and 512GB-1TB of memory are now common. While engineered to run on "commodity" hardware, Elasticsearch can still be deployed on such systems. To take full advantage of the available resources multiple instances of Elasticsearch should be deployed on the server. Such a cluster can provide very good performance and reliability as long as:

1. each Elasticsearch node has its own dedicated disks; and
2. rack-awareness features are used to ensure that primary and replica shards are not stored on the same physical server.
:::

To understand the provided architectures, the following should be considered.

CPU

The provided CPU core counts refer to actual physical CPU cores. CPUs which provide SMT/Hyperthreading will have a thread count twice the core count. For example, if the architecture refers to 16 cores, this would be a 16 core/32 thread processor.

:::info When deploying in virtualized or cloud environments, a vCPU is not the same as a physical core. For example, 2 vCPUs are the equivalent of 1 physical core and 1 SMT thread. In such environments the number of allocated vCPUs should be double the number of indicated cores. :::
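The vCPU rule above can be written as a one-line calculation. The following is a minimal sketch; the function name `vcpus_for_cores` and the `threads_per_core` parameter are illustrative and assume the 2 vCPUs per physical core ratio described above.

```python
def vcpus_for_cores(physical_cores: int, threads_per_core: int = 2) -> int:
    """Return the number of vCPUs to allocate for a given physical core count.

    Assumes each physical core is exposed as `threads_per_core` vCPUs
    (2 with SMT/Hyperthreading, as described in the text).
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return physical_cores * threads_per_core


# An architecture that calls for 16 cores needs 32 vCPUs.
print(vcpus_for_cores(16))  # 32
```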

Memory

The configured JVM heap size for Elasticsearch should be approximately 1/3, and no more than 1/2, of the total memory. However, the heap should never be set to more than 31GB. Any additional memory will be used by the operating system as page cache. This allows many queries against recent data to be answered without significant disk I/O. For this reason, more memory will usually result in better query performance.
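As a rough illustration of the heap guidance above, the sketch below derives a target heap (1/3 of memory) and an upper bound (1/2 of memory), both capped at 31GB. The function and constant names are hypothetical and not part of any ElastiFlow or Elasticsearch tooling.

```python
HEAP_CAP_GB = 31  # the heap should never exceed this, per the guidance above


def recommended_heap_gb(total_memory_gb: float) -> tuple[float, float]:
    """Return (target, upper_bound) JVM heap sizes in GB.

    target      ~ 1/3 of total memory
    upper_bound ~ 1/2 of total memory
    Both values are capped at HEAP_CAP_GB.
    """
    if total_memory_gb <= 0:
        raise ValueError("total_memory_gb must be positive")
    target = min(total_memory_gb / 3, HEAP_CAP_GB)
    upper_bound = min(total_memory_gb / 2, HEAP_CAP_GB)
    return target, upper_bound


# A 48GB server: aim for about 16GB of heap, and no more than 24GB.
print(recommended_heap_gb(48))  # (16.0, 24.0)
```

On a server with 128GB of memory both values are capped at 31GB, and the remaining memory is left to the operating system page cache.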

Storage

Determining the necessary storage capacity is generally a straightforward math problem. The indexed size of a flow record is usually 450-550 bytes. A size of 500 bytes is typically used to estimate the required storage capacity. This results in a storage requirement of 43.2GB/day for each 1000 flows/sec (500 bytes x 1000 flows/sec x 86,400 sec/day). Each replica requires the same capacity again.
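The storage arithmetic above can be reproduced with a short calculation. This is a minimal sketch using the 500-byte estimate and decimal gigabytes (1GB = 10^9 bytes); the function name and parameters are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def daily_storage_gb(flows_per_sec: float,
                     bytes_per_record: int = 500,
                     replicas: int = 0) -> float:
    """Estimate the indexed storage consumed per day, in decimal GB.

    Each replica stores a full additional copy of the data.
    """
    bytes_per_day = flows_per_sec * bytes_per_record * SECONDS_PER_DAY
    return bytes_per_day * (1 + replicas) / 1e9


print(f"{daily_storage_gb(1000):.1f}")              # 43.2
print(f"{daily_storage_gb(1000, replicas=1):.1f}")  # 86.4
```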

Elasticsearch will not allocate shards to nodes that have used more than 85% of their storage capacity. This low watermark is configurable, but should only be changed in special circumstances. This means that the effective storage capacity of a node will only be 85% of the actual capacity. For example, only 6.8TB of an 8TB volume should be considered for capacity planning purposes.
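Combining the 85% low watermark with the per-record size estimate gives a rough retention figure for a given amount of raw storage. The sketch below assumes 500 bytes per record and decimal terabytes; the names are hypothetical, and the result is a planning estimate rather than a guarantee, since it ignores other disk usage and differences between nodes.

```python
LOW_WATERMARK = 0.85      # default low disk watermark described above
SECONDS_PER_DAY = 86_400


def usable_capacity_tb(raw_tb: float, watermark: float = LOW_WATERMARK) -> float:
    """Return the capacity that should be used for planning, in TB."""
    return raw_tb * watermark


def retention_days(raw_tb: float,
                   flows_per_sec: float,
                   bytes_per_record: int = 500,
                   replicas: int = 0) -> float:
    """Estimate how many days of records fit below the low watermark."""
    usable_bytes = usable_capacity_tb(raw_tb) * 1e12
    bytes_per_day = (flows_per_sec * bytes_per_record
                     * SECONDS_PER_DAY * (1 + replicas))
    return usable_bytes / bytes_per_day


print(f"{usable_capacity_tb(8):.1f}")    # 6.8
print(f"{retention_days(8, 2000):.1f}")  # 78.7
```

Adding one replica halves the estimate, because every record is then stored twice.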

Deployment Architectures

The following example deployment architectures are provided to help with planning for your own needs and environment.

:::note The following retention periods assume that the recommended maximum ingest rate is sustained. If the 24-hour average ingest rate is lower, the retention period will be proportionately longer. :::
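To make the note above concrete, retention scales inversely with the average ingest rate. The following sketch uses a hypothetical helper, `scaled_retention_days`, and takes its example figures from the Single Server (small) architecture below.

```python
def scaled_retention_days(retention_at_max_days: float,
                          max_rate: float,
                          avg_rate: float) -> float:
    """Scale a published retention period to a lower average ingest rate.

    retention_at_max_days: retention when max_rate is sustained
    max_rate:              recommended maximum ingest rate (flows/sec)
    avg_rate:              actual 24-hour average ingest rate (flows/sec)
    """
    if avg_rate <= 0:
        raise ValueError("avg_rate must be positive")
    return retention_at_max_days * max_rate / avg_rate


# 10 days at 16000 flows/sec becomes 20 days at an average of 8000 flows/sec.
print(scaled_retention_days(10, 16000, 8000))  # 20.0
```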

Single "Lab" Server (x-small)

The Single "Lab" Server (x-small) deployment is for lab environments and testing with a smaller volume of records.

| Sizing Parameter | Value |
| --- | --- |
| Licensed Units | up to 2 |
| Recommended Max. Ingest Rate | 2000 flows/sec |
| Retention at Max. Rate | 19 days |
| Redundancy | No |

Single Server (small)

The Single Server (small) deployment is suitable for moderate ingest rates where redundancy is not a requirement, and downtime can be tolerated for activities such as upgrades.

| Sizing Parameter | Value |
| --- | --- |
| Licensed Units | up to 6 |
| Recommended Max. Ingest Rate | 16000 flows/sec |
| Retention at Max. Rate | 10 days |
| Redundancy | No |

Basic Cluster (medium)

The Basic Cluster (medium) deployment is suitable for moderate ingest rates where redundancy is a requirement. It also allows for minimal to no downtime for most maintenance tasks.

| Sizing Parameter | Value |
| --- | --- |
| Licensed Units | up to 8 |
| Recommended Max. Ingest Rate | 24000 flows/sec |
| Retention at Max. Rate | 10 days |
| Redundancy | Yes |

Advanced Cluster (large)

The Advanced Cluster (large) deployment is suitable for high ingest rates and is easily expanded as necessary.

| Sizing Parameter | Value |
| --- | --- |
| Licensed Units | up to 16 |
| Recommended Max. Ingest Rate | 48000 flows/sec |
| Retention at Max. Rate | 10 days |
| Redundancy | Yes |

Multi-Tier Cluster (x-large)

The Multi-Tier Cluster (x-large) deployment is suitable for high ingest rates, while also supporting longer retention periods.

| Sizing Parameter | Value |
| --- | --- |
| Licensed Units | up to 16 |
| Recommended Max. Ingest Rate | 48000 flows/sec |
| Retention at Max. Rate | 30 days |
| Redundancy | Yes |
