author: Patrick Merlot
summary: Learning Elasticsearch by running it in Docker
id: learning-elasticsearch-running-in-docker
categories: education,elastic,elk,elasticsearch,docker,docker-compose
environments: Codelabs
status: draft
feedback link: github.com/Patechoc/codelabs
analytics account: UA-72074624-2
Duration: 5:00
Elasticsearch is a powerful search engine based on the Apache Lucene library.
This tutorial covers the basics of Elasticsearch and aims to eventually go through all the topics one needs to know to pass the certification exam (Elasticsearch Engineer I & II).
It doesn't exactly follow the content of the official training, but you will learn to run Elasticsearch as a small single-node cluster, locally and for free, on your laptop.
More specifically, you will learn to do the following:
- install Elasticsearch using Docker
- install Elasticsearch by customizing your own Docker image, and configure Elasticsearch as you wish
- configure it to load data by restoring a snapshot [incomplete]
- configure it to process raw files and load their data into Elasticsearch [incomplete]
- ... more basic features of ELK (more coming!!!)
Duration: 2:30
If you are looking for the official Elastic training and certifications, check these links:
- Elasticsearch Engineer I: Learn how to manage deployments and develop solutions.
- Elasticsearch Engineer II: Develop a deeper understanding of how Elasticsearch works and master advanced deployment techniques.
- Elastic Certified Engineer: Test your Elasticsearch skills with our performance-based certification exam.
- How to register for a course?
https://training.elastic.co/instructor-led-training/ElasticsearchEngineerI-Virtual
https://training.elastic.co/instructor-led-training/ElasticsearchEngineerII-Virtual
Negative : The trainings are often virtual courses you can follow from anywhere, but they are live, so you need to register in advance and plan 4 hours a day for 4 days in a row.
Duration: 15:00
The normal steps to install a single-node Elastic environment include:
- Install Java
- Download and set up Elasticsearch
- Run Elasticsearch:
<path_to_elasticsearch_root_dir>/bin/elasticsearch
- Run Kibana:
<path_to_kibana_root_dir>/bin/kibana
- Verify that your installation is working:
- http://localhost:5601 -- should display Kibana UI.
- http://localhost:9200 -- should return status code 200.
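The verification step can also be done from a terminal. The following is a minimal sketch, assuming `curl` is installed and that both services listen on the default local ports listed above; the function names `http_status` and `check_stack` are made up for this example.

```shell
# http_status URL: print the HTTP status code returned by URL
# (curl prints 000 when nothing answers).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# check_stack: report the status of the two default local endpoints.
check_stack() {
    for url in http://localhost:5601 http://localhost:9200; do
        echo "$url -> $(http_status "$url")"
    done
}

# Usage, once Elasticsearch and Kibana are running:
#   check_stack
```

The sketch only prints the codes instead of failing on anything other than 200, so you can see what each service answers while it is still starting.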
Described below is another way to run Elasticsearch: from within a container using Docker. This allows for a simpler and cleaner (but temporary!) installation of the Elastic stack, which makes it practical for learning purposes.
Negative : A vm.max_map_count limit that is too low is the most frequent reason for Elasticsearch failing to start since Elasticsearch version 5 was released. Among other prerequisites, Elasticsearch alone needs at least 2GB of RAM to run, so assign a minimum of 4GB of RAM to Docker!
On Linux, use sysctl vm.max_map_count on the host to view the current value, and see Elasticsearch's documentation on virtual memory for guidance on how to change this value. Note that the limits must be changed on the host; they cannot be changed from within a container.
~$ sysctl vm.max_map_count
vm.max_map_count = 65530
On Linux, you can increase the limits by running the following command as root:
sysctl -w vm.max_map_count=262144
- To set this value permanently, update the vm.max_map_count setting in /etc/sysctl.conf.
- To verify after rebooting, run sysctl vm.max_map_count.
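The check and the suggested fix can be combined in a few lines. This is a sketch, assuming a Linux host that exposes the limit under `/proc/sys/vm/max_map_count`; the helper name `mmap_ok` is made up for this example, and 262144 is the value used in the command above.

```shell
REQUIRED_MAP_COUNT=262144

# mmap_ok VALUE: succeed when VALUE meets the required minimum.
mmap_ok() {
    [ "$1" -ge "$REQUIRED_MAP_COUNT" ]
}

# Read the current limit from /proc when available (Linux hosts only).
if [ -r /proc/sys/vm/max_map_count ]; then
    current=$(cat /proc/sys/vm/max_map_count)
    if mmap_ok "$current"; then
        echo "vm.max_map_count=$current is high enough"
    else
        echo "vm.max_map_count=$current is too low; run as root:"
        echo "  sysctl -w vm.max_map_count=$REQUIRED_MAP_COUNT"
    fi
fi
```

The script only prints the command to run; it does not change the limit itself, since that requires root on the host.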
Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package. By doing so, thanks to the container, the developer can rest assured that the application will run on any other Linux machine regardless of any customized settings that machine might have that could differ from the machine used for writing and testing the code.
This section describes how to use the sebp/elk Docker image, which provides a convenient centralised log server and log management web interface, by packaging Elasticsearch, Logstash, and Kibana, collectively known as ELK.
This type of installation is recommended to get started, but you might be limited later on when you need to configure Elasticsearch and restart it to apply the changes. So if you need to change your configuration, you will have to (re-)build your Docker image locally.
The RPM and Debian packages of Elasticsearch configure the vm.max_map_count setting automatically, so no further configuration is required for those installations.
To pull the image from the Docker registry, open a shell prompt and enter:
docker pull sebp/elk
or this one if you haven't configured Docker for your own user:
sudo docker pull sebp/elk
Run a container from the image with the following command:
docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -it --name elk sebp/elk
Note - The whole ELK stack will be started. See the Starting services selectively section of the sebp/elk documentation to selectively start part of the stack.
This command publishes the following ports, which are needed for proper operation of the ELK stack:
- 5601 (Kibana web interface): http://localhost:5601
- 9200 (Elasticsearch JSON interface): http://localhost:9200/
- 5044 (Logstash Beats interface, receives logs from Beats such as Filebeat; see the Forwarding logs with Filebeat section of the sebp/elk documentation).
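The stack takes a while to come up after the container starts, so the Elasticsearch port may not answer immediately. Below is a small polling sketch, assuming `curl` is available on the host; the function name `wait_for_es` and the retry values in the usage line are arbitrary choices for this example.

```shell
# wait_for_es URL TRIES DELAY: poll URL until curl succeeds
# or TRIES attempts have been used up.
wait_for_es() {
    wait_url=$1
    wait_tries=$2
    wait_delay=$3
    while [ "$wait_tries" -gt 0 ]; do
        # -f makes curl fail on HTTP errors, -s keeps it quiet
        if curl -fs -o /dev/null "$wait_url"; then
            echo "Elasticsearch is answering on $wait_url"
            return 0
        fi
        wait_tries=$((wait_tries - 1))
        sleep "$wait_delay"
    done
    echo "gave up waiting for $wait_url" >&2
    return 1
}

# Usage: try 30 times, 5 seconds apart.
#   wait_for_es http://localhost:9200 30 5
```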
This procedure is of course more cumbersome than pulling a pre-made docker image, but this will allow us to tweak the configuration of our Elasticsearch instance.
Here we will:
- clone an official Docker recipe to build the Elasticsearch image
- make changes to our configuration of Elasticsearch
- build the ELK containers from that image
- run the same way we would run the official Docker image
>> Cloning the official image:
/$ cd /tmp
/tmp$ git clone https://github.com/spujadas/elk-docker.git
>> Make your changes to the configuration of Elasticsearch:
... coming example to be copied from the next section...
>> Build and run the ELK containers from that image:
You may need to remove any former container named 'elk' first (this may not be necessary).
/tmp$ cd elk-docker/
/tmp/elk-docker$ docker-compose build elk
/tmp/elk-docker$ docker-compose up
Go take a cup of coffee or 3 ☕☕☕, it might take 5-10 minutes!
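If a container from the earlier `docker run --name elk` step is still around, it may still hold the published ports. One hedged way to clean it up before `docker-compose up` is sketched below; `remove_container` is a hypothetical helper name, and the sketch assumes your user can run `docker` without `sudo`.

```shell
# remove_container NAME: delete the container NAME if it exists,
# otherwise do nothing.
remove_container() {
    if docker container inspect "$1" >/dev/null 2>&1; then
        # -f also stops the container if it is still running
        docker rm -f "$1"
    else
        echo "no container named $1"
    fi
}

# Usage, before `docker-compose up`:
#   remove_container elk
```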
Elasticsearch ships with good defaults and requires very little configuration. Most settings can be changed on a running cluster using the Cluster Update Settings API.
The configuration files should contain settings which are node-specific (such as node.name and paths), or settings which a node requires in order to be able to join a cluster, such as cluster.name and network.host.
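As an illustration of changing a setting on a running cluster, here is a sketch of a call to the Cluster Update Settings API with `curl`. The helper names are made up, the node is assumed to listen on http://localhost:9200, and the setting and value in the usage line are only an example of a dynamic setting, not a recommendation.

```shell
# settings_body KEY VALUE: print a JSON body that sets one
# persistent cluster setting.
settings_body() {
    printf '{ "persistent": { "%s": "%s" } }\n' "$1" "$2"
}

# update_setting KEY VALUE: send the body to the cluster settings endpoint.
update_setting() {
    curl -s -XPUT 'http://localhost:9200/_cluster/settings' \
        -H 'Content-Type: application/json' \
        -d "$(settings_body "$1" "$2")"
}

# Usage against a running node (setting and value are only examples):
#   update_setting indices.recovery.max_bytes_per_sec 50mb
```

Settings changed this way live in the cluster state rather than in the configuration files, which is why the files can stay limited to node-specific values.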
You can configure:
- the Elasticsearch Java Virtual Machine (JVM) with jvm.options
- the Elasticsearch logging with log4j2.properties
- Elasticsearch itself with elasticsearch.yml
The important Elasticsearch configuration consists mostly of settings that need to be considered before going into production:
- Path settings
- Cluster name
- Node name
- Network host
- Discovery settings
- Heap size
- Heap dump path
- GC logging
- Temp directory
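To make a few of these settings concrete, the sketch below writes a small elasticsearch.yml to a file of your choice. Every value is a placeholder picked for this example (cluster and node names, paths, and the `discovery.type: single-node` choice for a one-node setup), and `write_es_config` is a hypothetical helper name.

```shell
# write_es_config FILE: write a small single-node elasticsearch.yml to FILE.
# All values below are placeholders chosen for this sketch.
write_es_config() {
    cat > "$1" <<'EOF'
cluster.name: my-learning-cluster
node.name: node-1
network.host: 0.0.0.0
discovery.type: single-node
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
EOF
}

# Usage: write to a scratch file, review it, then copy it into the config folder.
#   write_es_config /tmp/elasticsearch.yml
```

Heap size, heap dump path and GC logging are not covered here because they belong in jvm.options rather than in elasticsearch.yml.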
Duration: 20:00
You have 2 options to index the data into Elasticsearch.
- You can either use the Elasticsearch snapshot and restore API to directly restore a dataset index from a snapshot,
- or you can download the raw data from your favorite source and then process it to index the data.
Enter your local Elasticsearch single-node cluster by entering the Docker container running it:
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bab86b772b05 sebp/elk "/usr/local/bin/star…" 2 hours ago Up 2 hours 5044/tcp, 5601/tcp, 9200/tcp, 9300/tcp infallible_saha
$ docker exec -i -t bab86b772b05 /bin/bash
Using the option to restore a snapshot involves 4 easy steps:
- Download and uncompress the index snapshot .tar.gz file into a local folder
# Create snapshots directory
mkdir elastic_restaurants
cd elastic_restaurants
# Download index snapshot to elastic_restaurants directory
wget http://download.elasticsearch.org/demos/nyc_restaurants/nyc_restaurants-5-4-3.tar.gz
# Uncompress snapshot file
tar -xf nyc_restaurants-5-4-3.tar.gz
This adds a nyc_restaurants subfolder containing the index snapshots.
- Add the nyc_restaurants dir to the path.repo variable in elasticsearch.yml in the <path_to_elasticsearch_root_dir>/config/ folder. See example here. Restart Elasticsearch for the change to take effect.
With Docker, any changes to your Elasticsearch configuration will be lost after a restart of the Docker container.
One solution is therefore to edit our Docker image before re-running it, so that the changes mentioned above are built into the image.
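Whichever copy of elasticsearch.yml you end up editing (a local install, or the file used when building your image), the path.repo change itself is a one-line addition. The following sketch appends it only if it is not already there; `add_repo_path` is a hypothetical helper, and the paths in the usage line are the same placeholders used in this tutorial.

```shell
# add_repo_path CONFIG DIR: append a path.repo entry to CONFIG
# unless one is already present.
add_repo_path() {
    if grep -q '^path\.repo:' "$1" 2>/dev/null; then
        echo "path.repo already set in $1"
    else
        printf 'path.repo: ["%s"]\n' "$2" >> "$1"
        echo "added path.repo to $1"
    fi
}

# Usage (replace both placeholders with your own paths):
#   add_repo_path <path_to_elasticsearch_root_dir>/config/elasticsearch.yml <path_to_nyc_restaurants>
```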
Register a file system repository for the snapshot (change the value of the “location” parameter below to the location of your restaurants_backup directory)
curl -H "Content-Type: application/json" -XPUT 'http://localhost:9200/_snapshot/restaurants_backup' -d '{ "type": "fs", "settings": { "location": "<path_to_nyc_restaurants>/", "compress": true, "max_snapshot_bytes_per_sec": "1000mb", "max_restore_bytes_per_sec": "1000mb" } }'
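Once the repository is registered, the remaining step is to restore from it. The sketch below only builds the two snapshot API URLs involved: one to list the snapshots in the repository, one to restore a snapshot by name. The snapshot name is not given in this tutorial, so it stays a placeholder to be read from the listing; the helper names are made up, and the node is assumed to listen on http://localhost:9200.

```shell
ES_URL=http://localhost:9200
REPO=restaurants_backup

# list_snapshots_url: endpoint listing every snapshot in the repository.
list_snapshots_url() {
    echo "$ES_URL/_snapshot/$REPO/_all"
}

# restore_url SNAPSHOT: endpoint that restores the named snapshot.
restore_url() {
    echo "$ES_URL/_snapshot/$REPO/$1/_restore"
}

# Usage against a running node; take <snapshot_name> from the listing:
#   curl -s "$(list_snapshots_url)?pretty"
#   curl -s -XPOST "$(restore_url <snapshot_name>)"
```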
Duration: 0:30
- Starting Elasticsearch (video)
- Introduction to Kibana (video)
- Logstash Starter Guide (video)
- Elastic Cloud (video series)
- Non-codelabs-friendly old tutorial