I followed the directions at https://medium.com/@bossjones/how-i-setup-a-raspberry-pi-3-cluster-using-the-new-docker-swarm-mode-in-29-minutes-aa0e4f3b1768#.ma06iyonf but tweaked them a bit.
First off, I wanted to have my cluster using eth0 to connect to my laptop and then share its WiFi connection. Using this technique means that my WiFi network name and password are not on the cluster. So the cluster should be able to plug into any laptop or server without changes. Follow instructions at https://t.co/2jRbNAOiCU to share your eth0 connection.
Use lsblk to find anything mounted from the SD cards you'll be using, then umount those mount points. See http://affy.blogspot.com/2016/06/how-did-i-prepare-my-picocluster-for.html for a bit of information about lsblk.
Now flash the SD cards using the flash tool from hypriot. Notice that *no* network information is provided.
I used the piX naming convention so that I can easily loop over all five RPIs in the PicoCluster.
flash --hostname pi1 --device /dev/mmcblk0 https://github.com/hypriot/image-builder-rpi/releases/download/v0.8.1/hypriotos-rpi-v0.8.1.img.zip
flash --hostname pi2 --device /dev/mmcblk0 https://github.com/hypriot/image-builder-rpi/releases/download/v0.8.1/hypriotos-rpi-v0.8.1.img.zip
flash --hostname pi3 --device /dev/mmcblk0 https://github.com/hypriot/image-builder-rpi/releases/download/v0.8.1/hypriotos-rpi-v0.8.1.img.zip
flash --hostname pi4 --device /dev/mmcblk0 https://github.com/hypriot/image-builder-rpi/releases/download/v0.8.1/hypriotos-rpi-v0.8.1.img.zip
flash --hostname pi5 --device /dev/mmcblk0 https://github.com/hypriot/image-builder-rpi/releases/download/v0.8.1/hypriotos-rpi-v0.8.1.img.zip
Using this function, you can find the IP addresses for the RPI.
function getip() { (traceroute $1 2>&1 | head -n 1 | cut -d\( -f 2 | cut -d\) -f 1) }
List the IP addresses.
for i in `seq 1 5`; do echo "HOST: pi$i IP: $(getip pi$i.local)"; done
Remove any fingerprints for the RPI.
for i in `seq 1 5`; do ssh-keygen -R pi${i}.local 2>/dev/null; done
Copy your PKI identity to the RPI.
for i in `seq 1 5`; do ssh-copy-id -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi${i}.local; done
Download the deb file for Docker v1.12
curl -O https://jenkins.hypriot.com/job/armhf-docker/17/artifact/bundles/latest/build-deb/raspbian-jessie/docker-engine_1.12.0%7Erc4-0%7Ejessie_armhf.deb
Copy the deb file to the RPI
for i in `seq 1 5`; do scp -oStrictHostKeyChecking=no -oCheckHostIP=no docker-engine_1.12.0%7Erc4-0%7Ejessie_armhf.deb pirate@pi$i.local:.; done
Remove older Docker version from the RPI
for i in `seq 1 5`; do ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi$i.local sudo apt-get purge -y docker-hypriot; done
Install Docker
for i in `seq 1 5`; do ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi$i.local sudo dpkg -i docker-engine_1.12.0%7Erc4-0%7Ejessie_armhf.deb; done
Initialize the Swarm
ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi1.local docker swarm init
Join slaves to Swarm - replace the join command below with the specific one displayed by the init command.
for i in `seq 2 5`; do
ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi$i.local docker swarm join --secret ceuok9jso0klube8m3ih9gcsv --ca-hash sha256:f0864eb57963e3f9cd1756e691d0b609903e3a0bb48785272ea53155809025ee 10.42.0.49:2377;
done
Exercise the Swarm
ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi1.local
docker service create --name ping hypriot/rpi-alpine-scratch ping 8.8.8.8
docker service tasks ping
docker service update --replicas 10 ping
docker service tasks ping
docker service rm ping
I've read several blog posts about people running Apache Spark on a Raspberry PI. It didn't seem too hard, so I thought I'd have a go at it. But the results were disappointing. Bear in mind that I am a Spark novice, so some setting is probably wrong. I ran into two issues - memory and heartbeats.
So, this is what I did.
I based my work on these pages:
* https://darrenjw2.wordpress.com/2015/04/17/installing-apache-spark-on-a-raspberry-pi-2/
* https://darrenjw2.wordpress.com/2015/04/18/setting-up-a-standalone-apache-spark-cluster-of-raspberry-pi-2/
* http://www.openkb.info/2014/11/memory-settings-for-spark-standalone_27.html
I created five SD cards according to my previous blog post (see http://affy.blogspot.com/2016/06/how-did-i-prepare-my-picocluster-for.html).
Installation of Apache Spark
* install Oracle Java and Python
for i in `seq 1 5`; do (ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi0${i}.local sudo apt-get install -y oracle-java8-jdk python2.7 &); done
* download Spark
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz
* Copy Spark to all RPI
for i in `seq 1 5`; do (scp -q -oStrictHostKeyChecking=no -oCheckHostIP=no spark-1.6.2-bin-hadoop2.6.tgz pirate@pi0${i}.local:. && echo "Copy complete to pi0${i}" &); done
* Uncompress Spark
for i in `seq 1 5`; do (ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi0${i}.local tar xfz spark-1.6.2-bin-hadoop2.6.tgz && echo "Uncompress complete to pi0${i}" &); done
* Remove tgz file
for i in `seq 1 5`; do (ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi0${i}.local rm spark-1.6.2-bin-hadoop2.6.tgz); done
* Add the following to your .bashrc file on each RPI. I can't figure out how to put this into a loop, but one possible approach is sketched below.
export SPARK_LOCAL_IP="$(ip route get 1 | awk '{print $NF;exit}')"
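One possible way to script the append (an untested sketch): a quoted heredoc keeps the command substitution from expanding on the laptop, so each RPI receives the literal line.
for i in `seq 1 5`; do
  ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi0${i}.local 'cat >> ~/.bashrc' <<'EOF'
export SPARK_LOCAL_IP="$(ip route get 1 | awk '{print $NF;exit}')"
EOF
done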
* Run Standalone Spark Shell
ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi01.local
cd spark-1.6.2-bin-hadoop2.6
bin/run-example SparkPi 10
bin/spark-shell --master local[4]
# This takes several minutes to display a prompt.
# While the shell is running, visit http://pi01.local:4040/
scala> sc.textFile("README.md").count
# After the job is complete, visit the monitor page.
scala> exit
* Run PyShark Shell
bin/pyspark --master local[4]
>>> sc.textFile("README.md").count()
>>> exit()
CLUSTER
Now for the clustering...
* Enable password-less SSH between nodes
ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi01.local
for i in `seq 1 5`; do avahi-resolve --name pi0${i}.local -4 | awk ' { t = $1; $1 = $2; $2 = t; print; } ' | sudo tee --append /etc/hosts; done
echo "$(ip route get 1 | awk '{print $NF;exit}') $(hostname).local" | sudo tee --append /etc/hosts
ssh-keygen
for i in `seq 1 5`; do ssh-copy-id pirate@pi0${i}.local; done
* Configure Spark for Cluster
cd spark-1.6.2-bin-hadoop2.6/conf
Create a slaves file with the following contents:
pi01.local
pi02.local
pi03.local
pi04.local
pi05.local
cp spark-env.sh.template spark-env.sh
In spark-env.sh:
Set SPARK_MASTER_IP to the result of "ip route get 1 | awk '{print $NF;exit}'"
SPARK_WORKER_MEMORY=512m
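If you prefer to script it, the same two settings can be appended from a shell (a small sketch; the command substitution evaluates on whichever machine runs it, so run this on pi01):
echo "SPARK_MASTER_IP=$(ip route get 1 | awk '{print $NF;exit}')" >> spark-env.sh
echo "SPARK_WORKER_MEMORY=512m" >> spark-env.sh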
* Copy the spark environment script to the other RPI
for i in `seq 2 5`; do scp spark-env.sh pirate@pi0${i}.local:spark-1.6.2-bin-hadoop2.6/conf/; done
* Start the cluster
cd ..
sbin/start-all.sh
* Visit the monitor page
http://192.168.1.8:8080
And everything is working so far! But ...
* Start a Spark Shell
bin/spark-shell --executor-memory 500m --driver-memory 500m --master spark://pi01.local:7077 --conf spark.executor.heartbeatInterval=45s
And this fails...
At the end of this article, I have a working Docker Swarm running on a five-node PicoCluster. Please flash your SD cards according to http://affy.blogspot.com/2016/06/how-did-i-prepare-my-picocluster-for.html. Stop following that article after copying the SSH ids to the RPI.
I am controlling the PicoCluster using my laptop. Therefore, my laptop is the HOST in the steps below.
There is no guarantee these commands are correct. They just seem to work for me. And please don't ever, ever depend on this information for anything non-prototype without doing your own research.
* On the HOST, create the Docker Machine to hold the consul service.
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--generic-ip-address=$(getip pi01.local) \
--generic-ssh-user "pirate" \
consul-machine
* Connect to the consul-machine Docker Machine
eval $(docker-machine env consul-machine)
* Start Consul.
docker run \
-d \
-p 8500:8500 \
hypriot/rpi-consul \
agent -dev -client 0.0.0.0
* Reset docker environment to talk with host docker.
unset DOCKER_TLS_VERIFY DOCKER_HOST DOCKER_CERT_PATH DOCKER_MACHINE_NAME
* Visit the consul dashboard to prove it is working and accessible.
firefox http://$(getip pi01.local):8500
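If you'd rather check from the command line, Consul's standard HTTP status endpoint should answer on the same port; an empty or error response means the agent isn't reachable:
curl http://$(getip pi01.local):8500/v1/status/leader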
* Create the swarm-master machine. Note that eth0 is being used instead of eth1.
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--swarm \
--swarm-master \
--swarm-image hypriot/rpi-swarm:latest \
--swarm-discovery="consul://$(docker-machine ip consul-machine):8500" \
--generic-ip-address=$(getip pi02.local) \
--generic-ssh-user "pirate" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul-machine):8500" \
--engine-opt="cluster-advertise=eth0:2376" \
swarm-master
* Create the first slave node.
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--swarm \
--swarm-image hypriot/rpi-swarm:latest \
--swarm-discovery="consul://$(docker-machine ip consul-machine):8500" \
--generic-ip-address=$(getip pi03.local) \
--generic-ssh-user "pirate" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul-machine):8500" \
--engine-opt="cluster-advertise=eth0:2376" \
swarm-slave01
* List nodes in the swarm. I don't know why, but this command must be run from one of the RPI. Otherwise, I see a "malformed HTTP response" message.
eval $(docker-machine env swarm-master)
docker -H $(docker-machine ip swarm-master):3376 run \
--rm \
hypriot/rpi-swarm:latest \
list consul://$(docker-machine ip consul-machine):8500
* Create the second slave node.
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--swarm \
--swarm-image hypriot/rpi-swarm:latest \
--swarm-discovery="consul://$(docker-machine ip consul-machine):8500" \
--generic-ip-address=$(getip pi04.local) \
--generic-ssh-user "pirate" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul-machine):8500" \
--engine-opt="cluster-advertise=eth0:2376" \
swarm-slave02
* Create the third slave node.
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--swarm \
--swarm-image hypriot/rpi-swarm:latest \
--swarm-discovery="consul://$(docker-machine ip consul-machine):8500" \
--generic-ip-address=$(getip pi05.local) \
--generic-ssh-user "pirate" \
--engine-opt="cluster-store=consul://$(docker-machine ip consul-machine):8500" \
--engine-opt="cluster-advertise=eth0:2376" \
swarm-slave03
* Check that docker machine sees all of the nodes
$ docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS
consul-machine - generic Running tcp://192.168.1.8:2376 v1.11.1
swarm-master - generic Running tcp://192.168.1.7:2376 swarm-master (master) v1.11.1
swarm-slave01 - generic Running tcp://192.168.1.2:2376 swarm-master v1.11.1
swarm-slave02 - generic Running tcp://192.168.1.5:2376 swarm-master v1.11.1
swarm-slave03 - generic Running tcp://192.168.1.4:2376 swarm-master v1.11.1
* List the swarm nodes in Firefox using Consul.
firefox http://$(docker-machine ip consul-machine):8500/ui/#/dc1/kv/docker/swarm/nodes/
* Is my cluster working? First, switch to the swarm-master environment. Then view its information. You should see the slaves listed. Next, run the hello-world container. And finally, list the containers.
eval $(docker-machine env swarm-master)
docker -H $(docker-machine ip swarm-master):3376 info
docker -H $(docker-machine ip swarm-master):3376 run hypriot/armhf-hello-world
docker -H $(docker-machine ip swarm-master):3376 ps -a
This post tells how I attached a USB Thumb drive to my Raspberry PI and used it to hold Docker's Root Directory.
The first step is to connect to the RPI.
$ ssh -o 'StrictHostKeyChecking=no' -o 'CheckHostIP=no' 'pirate@pi02.local'
Now create a mount point. This is just a directory, nothing fancy. It should be owned by root because Docker runs as root. Don't try to use "pirate" as the owner. I tried that. It failed. Leave the owner as root.
$ sudo mkdir /media/usb
Then look at the attached USB devices.
$ sudo blkid
/dev/mmcblk0: PTTYPE="dos"
/dev/mmcblk0p1: SEC_TYPE="msdos" LABEL="HypriotOS" UUID="D6D9-1D76" TYPE="vfat"
/dev/mmcblk0p2: LABEL="root" UUID="81e5bfc7-0701-4a09-80aa-fe5bc3eecbcf" TYPE="ext4"
/dev/sda1: LABEL="STORE N GO" UUID="F171-FAE6" TYPE="vfat" PARTUUID="f11d6f2b-01"
Note that the USB thumb drive is /dev/sda1. The information above is for the original formatting of the drive. After formatting the drive to use "ext3" the information looks like:
/dev/sda1: LABEL="PI02" UUID="801b666c-ea47-4f6f-ab6b-b88acceff08f" TYPE="ext3" PARTUUID="f11d6f2b-01"
This is the command that I used to format the drive to use ext3. Notice that I named the drive the same as the hostname. I have no particular reason to do this. It just seemed right. Only run this formatting command once.
$ sudo mkfs.ext3 -L "PI02" /dev/sda1
Now it's time to mount the thumb drive. Here we connect the device (/dev/sda1) to the mount point. After this command is run you'll be able to use /media/usb as a normal directory.
$ sudo mount /dev/sda1 /media/usb
Next, we set up the thumb drive to be available whenever the RPI is rebooted. First, find the UUID. It's whatever UUID is associated with sda1.
$ sudo ls -l /dev/disk/by-uuid
total 0
lrwxrwxrwx 1 root root 10 Jul 3 2014 801b666c-ea47-4f6f-ab6b-b88acceff08f -> ../../sda1
lrwxrwxrwx 1 root root 15 Jul 3 2014 81e5bfc7-0701-4a09-80aa-fe5bc3eecbcf -> ../../mmcblk0p2
lrwxrwxrwx 1 root root 15 Jul 3 2014 D6D9-1D76 -> ../../mmcblk0p1
Now add that UUID to the /etc/fstab file so it will be recognized across reboots. If you re-flash your SD card, you'll need to execute this step again.
$ echo "UUID=801b666c-ea47-4f6f-ab6b-b88acceff08f /media/usb nofail 0 0" | sudo tee -a /etc/fstab
Some images are already on the Hypriot SD card. We'll make sure they are still available after we move the Docker Root directory.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hypriot/rpi-swarm 1.2.2 f13b7205f2db 5 weeks ago 13.97 MB
hypriot/rpi-consul 0.6.4 879ac05d5353 6 weeks ago 19.71 MB
Stop Docker to ensure that the Docker root directory does not change.
$ sudo systemctl stop docker
Copy files to the new location. Don't bother deleting the original files.
$ sudo cp --no-preserve=mode --recursive /var/lib/docker /media/usb/docker
If you are paranoid, you can compare the two directory trees.
$ sudo diff /var/lib/docker /media/usb/docker
Common subdirectories: /var/lib/docker/containers and /media/usb/docker/containers
Common subdirectories: /var/lib/docker/image and /media/usb/docker/image
Common subdirectories: /var/lib/docker/network and /media/usb/docker/network
Common subdirectories: /var/lib/docker/overlay and /media/usb/docker/overlay
Common subdirectories: /var/lib/docker/tmp and /media/usb/docker/tmp
Common subdirectories: /var/lib/docker/trust and /media/usb/docker/trust
Common subdirectories: /var/lib/docker/volumes and /media/usb/docker/volumes
Edit the docker service file to add --graph "/media/usb/docker" to the end of the ExecStart line.
$ sudo vi /etc/systemd/system/docker.service
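If you'd rather script the edit, a sed one-liner can do the same thing. This is a sketch that assumes the unit file contains a single line beginning with ExecStart=:
$ sudo sed -i '/^ExecStart=/ s|$| --graph "/media/usb/docker"|' /etc/systemd/system/docker.service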
Now reload the systemctl daemon and start docker.
sudo systemctl daemon-reload
sudo systemctl start docker
Confirm that the ExecStart is correct - that is, that it has the graph parameter.
$ sudo systemctl show docker | grep ExecStart
Confirm that the Docker Root Directory has changed.
$ docker info | grep "Root Dir"
And finally, confirm that you can see docker images.
$ docker images
How did I prepare my PicoCluster?
DOCKER VERSION: 1.11.1
HYPRIOT VERSION: 0.8
RASPBERRY PI: 3
From my Linux laptop, I created five SD cards using the flash utility from Hypriot.
As I plugged each SD card into my laptop, I ran 'lsblk'. Then I used 'umount' for anything mounted from the SD card. For example:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 111.8G 0 disk
├─sda1 8:1 0 79.9G 0 part /
├─sda2 8:2 0 1K 0 part
└─sda5 8:5 0 31.9G 0 part [SWAP]
sdb 8:16 0 894.3G 0 disk
└─sdb1 8:17 0 894.3G 0 part /data
sr0 11:0 1 1024M 0 rom
mmcblk0 179:0 0 15G 0 disk
├─mmcblk0p1 179:1 0 64M 0 part /media/medined/3ABE-55E4
└─mmcblk0p2 179:2 0 14.9G 0 part /media/medined/root
umount any mount points for mmcblk0 (or your SD card). For example,
umount /media/medined/3ABE-55E4
umount /media/medined/root
If the SD cards were flashed in the past then you'll need to run
umount /media/medined/HypriotOS
umount /media/medined/root
Here are the five flash commands that I used. Of course, I used my real SSID and PASSWORD. Note that this command leaves your password in your shell history. If this is a concern, please research alternatives.
As you flash the SD cards, use a gold sharpie to indicate the hostname of the SD card. This will make it much easier to make sure they are in the right RPI.
flash --hostname pi01 --ssid NETWORK --password PASSWORD --device /dev/mmcblk0 https://downloads.hypriot.com/hypriotos-rpi-v0.8.0.img.zip
flash --hostname pi02 --ssid NETWORK --password PASSWORD --device /dev/mmcblk0 https://downloads.hypriot.com/hypriotos-rpi-v0.8.0.img.zip
flash --hostname pi03 --ssid NETWORK --password PASSWORD --device /dev/mmcblk0 https://downloads.hypriot.com/hypriotos-rpi-v0.8.0.img.zip
flash --hostname pi04 --ssid NETWORK --password PASSWORD --device /dev/mmcblk0 https://downloads.hypriot.com/hypriotos-rpi-v0.8.0.img.zip
flash --hostname pi05 --ssid NETWORK --password PASSWORD --device /dev/mmcblk0 https://downloads.hypriot.com/hypriotos-rpi-v0.8.0.img.zip
Next, after the SD cards were placed into the PicoCluster, I plugged it into power.
As a sidenote, each time you restart the RPIs, their SSH fingerprint changes. You'll need to remove the old fingerprint. One technique is the following:
for i in `seq 1 5`; do ssh-keygen -R pi0${i}.local 2>/dev/null; done
I dislike questions about server fingerprints when connecting. Therefore, you'll see me using the "StrictHostKeyChecking=no" option with SSH. I take no stance on the security ramifications of this choice. I'm connecting to my local PicoCluster, not some public server. Make your own security decisions.
Ensure that you have an SSH key set. Look for "~/.ssh/id_rsa". If you don't have that file, use ssh-keygen to make one.
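For example, something like this creates a key only when one doesn't already exist (the -N "" means no passphrase; make your own call on that):
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa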
Now copy your PKI credential to the five RPIs to enable password-less SSH. You will be asked for the password, which should be "hypriot", five times.
for i in `seq 1 5`; do ssh-copy-id -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi0${i}.local; done
Next you can check that password-less SSH is working. After each SSH, you'll see a prompt like "HypriotOS/armv7: pirate@pi01 in ~". Just check that the hostname is correct and then type exit to move on to the next RPI.
for i in `seq 1 5`; do ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi0${i}.local; done
You can use the following shell function to determine the IP address of an RPI. I also found it handy to log into my router to see the list of attached devices. By the way, if you haven't changed the default password for the admin user of your router, do it. This article will wait...
function getip() { (traceroute $1 2>&1 | head -n 1 | cut -d\( -f 2 | cut -d\) -f 1) }
It's probably a good idea to place that function in your .bashrc file so that you'll always have it handy.
for i in `seq 1 5`; do echo "PI0${i}.local: $(getip pi0${i}.local)"; done
Now comes the fun part, setting up the Docker Swarm. Fair warning. I don't know if these steps are correct.
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--swarm \
--swarm-master \
--swarm-image hypriot/rpi-swarm:latest \
--generic-ip-address=$(getip pi01.local) \
--generic-ssh-user "pirate" \
--swarm-discovery="token://01" \
swarm
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--swarm \
--swarm-image hypriot/rpi-swarm:latest \
--generic-ip-address=$(getip pi02.local) \
--generic-ssh-user "pirate" \
--swarm-discovery="token://01" \
swarm-slave01
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--swarm \
--swarm-image hypriot/rpi-swarm:latest \
--generic-ip-address=$(getip pi03.local) \
--generic-ssh-user "pirate" \
--swarm-discovery="token://01" \
swarm-slave02
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--swarm \
--swarm-image hypriot/rpi-swarm:latest \
--generic-ip-address=$(getip pi04.local) \
--generic-ssh-user "pirate" \
--swarm-discovery="token://01" \
swarm-slave03
docker-machine create \
-d generic \
--engine-storage-driver=overlay \
--swarm \
--swarm-image hypriot/rpi-swarm:latest \
--generic-ip-address=$(getip pi05.local) \
--generic-ssh-user "pirate" \
--swarm-discovery="token://01" \
swarm-slave04
Now you can list the nodes in the cluster using Docker Machine:
$ docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS
swarm - generic Running tcp://192.168.1.12:2376 swarm (master) v1.11.1
swarm-slave01 - generic Running tcp://192.168.1.7:2376 swarm v1.11.1
swarm-slave02 - generic Running tcp://192.168.1.11:2376 swarm v1.11.1
swarm-slave03 - generic Running tcp://192.168.1.23:2376 swarm v1.11.1
swarm-slave04 - generic Running tcp://192.168.1.22:2376 swarm v1.11.1
Notice that a master node is indicated but it is not marked as active. I don't know why.
Before moving on, let's look at what containers are being run. There should be six.
for i in `seq 1 5`; do echo "RPI ${i}"; ssh -oStrictHostKeyChecking=no -oCheckHostIP=no pirate@pi0${i}.local docker ps -a; done
RPI 1
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ceb4a5255dc2 hypriot/rpi-swarm:latest "/swarm join --advert" About an hour ago Up About an hour 2375/tcp swarm-agent
e9d3bf308284 hypriot/rpi-swarm:latest "/swarm manage --tlsv" About an hour ago Up About an hour 2375/tcp, 0.0.0.0:3376->3376/tcp swarm-agent-master
RPI 2
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e2dca97c23fe hypriot/rpi-swarm:latest "/swarm join --advert" About an hour ago Up About an hour 2375/tcp swarm-agent
RPI 3
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
07d0b4fc4490 hypriot/rpi-swarm:latest "/swarm join --advert" 11 minutes ago Up 11 minutes 2375/tcp swarm-agent
RPI 4
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
88712d8df693 hypriot/rpi-swarm:latest "/swarm join --advert" 6 minutes ago Up 6 minutes 2375/tcp swarm-agent
RPI 5
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b7738fb8c4b8 hypriot/rpi-swarm:latest "/swarm join --advert" 2 minutes ago Up 2 minutes 2375/tcp swarm-agent
Currently, when you type "docker ps" you're looking at containers running on your local computer. You can switch so that "docker" connects to one of the "docker machines" using this command:
eval $(docker-machine env swarm)
Now "docker ps" returns information about containers running on pi01.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ceb4a5255dc2 hypriot/rpi-swarm:latest "/swarm join --advert" About an hour ago Up About an hour 2375/tcp swarm-agent
e9d3bf308284 hypriot/rpi-swarm:latest "/swarm manage --tlsv" About an hour ago Up About an hour 2375/tcp, 0.0.0.0:3376->3376/tcp swarm-agent-master
One neat "trick" is to look at the information from the "swarm-agent-master" container. This is done using Docker's -H option. Notice that the results indicate there are six containers running. Count the number of containers found using the "for..loop" earlier. They are the same number.
$ docker -H $(docker-machine ip swarm):3376 info
Containers: 6
Running: 6
Paused: 0
Stopped: 0
Images: 15
Server Version: swarm/1.2.3
Role: primary
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint
Nodes: 5
swarm: 192.168.1.12:2376
└ ID: P4OH:AB7Q:T2T3:P6OK:BW5F:YSIB:NACW:Q2F3:FKU4:IJFD:AUJQ:74CZ
└ Status: Healthy
└ Containers: 2
└ Reserved CPUs: 0 / 4
└ Reserved Memory: 0 B / 971.7 MiB
└ Labels: executiondriver=, kernelversion=4.4.10-hypriotos-v7+, operatingsystem=Raspbian GNU/Linux 8 (jessie), provider=generic, storagedriver=overlay
└ UpdatedAt: 2016-06-22T01:39:56Z
└ ServerVersion: 1.11.1
swarm-slave01: 192.168.1.7:2376
└ ID: GDQI:WYHS:OD2W:EE67:CKMU:A2PW:6K5T:YZSK:B5KL:SPCZ:6GVX:5MCO
└ Status: Healthy
└ Containers: 1
└ Reserved CPUs: 0 / 4
└ Reserved Memory: 0 B / 971.7 MiB
└ Labels: executiondriver=, kernelversion=4.4.10-hypriotos-v7+, operatingsystem=Raspbian GNU/Linux 8 (jessie), provider=generic, storagedriver=overlay
└ UpdatedAt: 2016-06-22T01:39:45Z
└ ServerVersion: 1.11.1
swarm-slave02: 192.168.1.11:2376
└ ID: CA7H:C7UA:5V5N:NY4C:KECT:JK57:HDGN:2DNH:ASXQ:UJFQ:A5A4:US3Y
└ Status: Healthy
└ Containers: 1
└ Reserved CPUs: 0 / 4
└ Reserved Memory: 0 B / 971.7 MiB
└ Labels: executiondriver=, kernelversion=4.4.10-hypriotos-v7+, operatingsystem=Raspbian GNU/Linux 8 (jessie), provider=generic, storagedriver=overlay
└ UpdatedAt: 2016-06-22T01:39:32Z
└ ServerVersion: 1.11.1
swarm-slave03: 192.168.1.23:2376
└ ID: 6H6D:P6EN:PTBL:Q5E3:MP32:T6CI:XU33:PCQV:KT6H:KRJ4:LYSN:76EJ
└ Status: Healthy
└ Containers: 1
└ Reserved CPUs: 0 / 4
└ Reserved Memory: 0 B / 971.7 MiB
└ Labels: executiondriver=, kernelversion=4.4.10-hypriotos-v7+, operatingsystem=Raspbian GNU/Linux 8 (jessie), provider=generic, storagedriver=overlay
└ UpdatedAt: 2016-06-22T01:39:25Z
└ ServerVersion: 1.11.1
swarm-slave04: 192.168.1.22:2376
└ ID: 2ZBK:3DJE:D23C:7QAB:TLFS:L7EO:L4L4:IQ6Y:EC7D:UG7S:3WU6:QJ5D
└ Status: Healthy
└ Containers: 1
└ Reserved CPUs: 0 / 4
└ Reserved Memory: 0 B / 971.7 MiB
└ Labels: executiondriver=, kernelversion=4.4.10-hypriotos-v7+, operatingsystem=Raspbian GNU/Linux 8 (jessie), provider=generic, storagedriver=overlay
└ UpdatedAt: 2016-06-22T01:39:32Z
└ ServerVersion: 1.11.1
Plugins:
Volume:
Network:
Kernel Version: 4.4.10-hypriotos-v7+
Operating System: linux
Architecture: arm
CPUs: 20
Total Memory: 4.745 GiB
Name: e9d3bf308284
Docker Root Dir:
Debug mode (client): false
Debug mode (server): false
WARNING: No kernel memory limit support
And that's as far as I've gotten.
It took me a bit of time to get this simple program working so I'm sharing for other people new to Go.
Yesterday, I showed how to run NodeJS inside a Docker container. Today, I updated my Github project (https://github.com/medined/docker-nodejs) so that the Example server works correctly.
The trick is for the NodeJS code inside the container to find the container's IP address and listen on that address instead of localhost or 127.0.0.1. This is not difficult.
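The actual JavaScript is in the repository above. As a rough shell illustration of the same idea (not the repository's code): Docker writes the container's hostname and IP address into /etc/hosts, so from inside a container you can see the address a server should bind to.
docker run --rm ubuntu bash -c 'hostname -i'
# prints the container's own address, e.g. something in 172.17.0.0/16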
In my continuing quest to run my development tools from within Docker containers, I looked at Node today.
The Github project is at https://github.com/medined/docker-nodejs.
My Dockerfile is fairly simple:
FROM ubuntu:14.04
This is another in my series of very short entries about Docker. I've been trying to avoid installing Maven on my development laptop, but I still want to use spring-boot:run to launch my applications. Here is the Docker command I am using. Notice that server.port is specified on the command line so that I can change it as needed.
docker run \
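  -it \
  --rm \
  -v "$PWD":/usr/src/mymaven \
  -w /usr/src/mymaven \
  -p 8080:8080 \
  maven:3.3-jdk-8 \
  mvn spring-boot:run -Dserver.port=8080
# Everything after "docker run \" above is a hypothetical sketch: it assumes the
# official maven:3.3-jdk-8 image, the project mounted at /usr/src/mymaven, and
# port 8080. Adjust -Dserver.port to taste.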
I recently reinstalled Ubuntu on my zareason laptop. As I was thinking about installing my development tools, I thought about how to integrate Docker into the process. Below I show how simple using the Maven container can be:
* Create an alias to the Maven container.
alias mvn="docker run \
-it \
--rm \
--name my-maven-project \
-v "$PWD":/usr/src/mymaven \
-w /usr/src/mymaven \
maven:3.3-jdk-8 \
mvn"
* Clone my ragnvald Java project.
git clone git@github.com:medined/ragnvald.git
* cd ragnvald
* Package the project.
mvn package
That's it. You're using Maven without installing onto your laptop! The results of the compilation are placed into the target directory.
If you need to specify a Maven settings.xml file that's fairly easy as well. Simply create it alongside the pom.xml file. Then slightly modify your alias:
alias mvn="docker run \
-it \
--rm \
--name my-maven-project \
-v "$PWD":/root/.m2 \
-v "$PWD":/usr/src/mymaven \
-w /usr/src/mymaven \
maven:3.3-jdk-8 \
mvn"
The ragnvald project goes one step farther to use an Artifactory container so that I can use the Artifactory web interface if needed. That's quite convenient!
This entry doesn't reveal any hidden secrets, just the simple steps to start using MySQL on Docker.
* Install docker
* Install docker-compose
* mkdir firstdb
* cd firstdb
* vi docker-compose.yml
mysql:
  image: mysql:latest
  environment:
    MYSQL_DATABASE: sample
    MYSQL_USER: mysql
    MYSQL_PASSWORD: mysql
    MYSQL_ROOT_PASSWORD: supersecret
* docker-compose up
* docker-compose ps
Name Command State Ports
-----------------------------------------------------------------
firstdb_mysql_1 /entrypoint.sh mysqld Up 3306/tcp
* Use a one-shot Docker instance to display environment variables. Notice
the variables that start with MYSQL? Your programs can use these variables
to make the database connection.
docker run --link=firstdb_mysql_1:mysql ubuntu env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=abfc8d50633b
MYSQL_PORT=tcp://172.17.0.23:3306
MYSQL_PORT_3306_TCP=tcp://172.17.0.23:3306
MYSQL_PORT_3306_TCP_ADDR=172.17.0.23
MYSQL_PORT_3306_TCP_PORT=3306
MYSQL_PORT_3306_TCP_PROTO=tcp
MYSQL_NAME=/nostalgic_rosalind/mysqldb
MYSQL_ENV_MYSQL_PASSWORD=mysql
MYSQL_ENV_MYSQL_ROOT_PASSWORD=supersecret
MYSQL_ENV_MYSQL_USER=mysql
MYSQL_ENV_MYSQL_DATABASE=sample
MYSQL_ENV_MYSQL_MAJOR=5.6
MYSQL_ENV_MYSQL_VERSION=5.6.24
HOME=/root
* Use a one-shot Docker instance for a MySQL command-line interface. Once this
is running, you'll be able to use command like 'show databases'.
docker run -it \
--link=firstdb_mysql_1:mysql \
--rm \
mysql/mysql-server:latest \
sh -c 'exec mysql -h"$MYSQL_PORT_3306_TCP_ADDR" -P"$MYSQL_PORT_3306_TCP_PORT" -uroot -p"$MYSQL_ENV_MYSQL_ROOT_PASSWORD"'
That's all it takes to start.
Witness a tale of two Dockerfiles that perform the same task. See the size difference. Imagine how it might change infrastructure costs.
FROM debian:wheezy
RUN apt-get update && apt-get install -y openjdk-7-jre && rm -rf /var/lib/apt/lists/*
ADD target/si-standalone-sample-1.0-SNAPSHOT.jar /
ENV JAVA_HOME /usr/lib/jvm/java-7-openjdk-amd64
ENV CLASSPATH si-standalone-sample-1.0-SNAPSHOT.jar
CMD [ "java", "org.springframework.boot.loader.JarLauncher" ]
FROM debian:wheezy
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 0x219BD9C9 && \
    echo "deb http://repos.azulsystems.com/ubuntu precise main" >> /etc/apt/sources.list.d/zulu.list && \
    apt-get -qq update && \
    apt-get -qqy install zulu-7 && \
    rm -rf /var/lib/apt/lists/*
ADD target/si-standalone-sample-1.0-SNAPSHOT.jar /
ENV JAVA_HOME /usr/lib/jvm/zulu-7-amd64
ENV CLASSPATH si-standalone-sample-1.0-SNAPSHOT.jar
CMD [ "java", "org.springframework.boot.loader.JarLauncher" ]
REPOSITORY           TAG       SIZE
spring-integration   openjdk   549.1 MB
spring-integration   azul      261.3 MB
Update from Jan 2015: The Zulu team added formal Debian support last October, I just did not know about it. Look at the version history for Zulu 8.4, 7.7, and 6.6 at http://www.azulsystems.com/zulurelnotes. Also look on DockerHub for their 8.4.x Docker files. They don't use lsb_release -cs in Debian Dockerfiles anymore, and instead allow the Zulu repository to honor 'stable' as release name. 'stable' always pushes the highest level for a Java major version. - I am paraphrasing the comments from Matthew Schuetze below.
I saw the following line in a Dockerfile
RUN echo "deb http://repos.azulsystems.com/ubuntu `lsb_release -cs` main" >> /etc/apt/sources.list.d/zulu.list
The lsb_release command can be installed from either the full lsb package or the much smaller lsb-release package. Compare the container changes after installing each:
$ apt-get update && apt-get install -y lsb
$ docker diff 09 | wc -l
30013
$ apt-get update && apt-get install -y lsb-release
$ docker diff 23 | wc -l
1689
While I dabble in System Administration, I don't have a deep knowledge of how packages are created or maintained. Today, we'll see how to use Docker to increase my understanding of "apt-get update". I was curious about this command because I read that it's good practice to remove the files created during the update process.
I started a small container using
docker run -i -t debian:wheezy /bin/bash
docker diff "45"
C /var
C /var/lib
C /var/lib/apt
C /var/lib/apt/lists
A /var/lib/apt/lists/http.debian.net_debian_dists_wheezy-updates_Release
A /var/lib/apt/lists/http.debian.net_debian_dists_wheezy-updates_Release.gpg
A /var/lib/apt/lists/http.debian.net_debian_dists_wheezy-updates_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/http.debian.net_debian_dists_wheezy_Release
A /var/lib/apt/lists/http.debian.net_debian_dists_wheezy_Release.gpg
A /var/lib/apt/lists/http.debian.net_debian_dists_wheezy_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/lock
C /var/lib/apt/lists/partial
A /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_Release
A /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_Release.gpg
A /var/lib/apt/lists/security.debian.org_dists_wheezy_updates_main_binary-amd64_Packages.gz
# ls -lh /var/lib/apt/lists
total 8.0M
-rw-r--r-- 1 root root 121K Nov 23 02:49 http.debian.net_debian_dists_wheezy-updates_Release
-rw-r--r-- 1 root root  836 Nov 23 02:49 http.debian.net_debian_dists_wheezy-updates_Release.gpg
-rw-r--r-- 1 root root    0 Nov 23 02:37 http.debian.net_debian_dists_wheezy-updates_main_binary-amd64_Packages
-rw-r--r-- 1 root root 165K Oct 18 10:33 http.debian.net_debian_dists_wheezy_Release
-rw-r--r-- 1 root root 1.7K Oct 18 10:44 http.debian.net_debian_dists_wheezy_Release.gpg
-rw-r--r-- 1 root root 7.3M Oct 18 10:07 http.debian.net_debian_dists_wheezy_main_binary-amd64_Packages.gz
-rw-r----- 1 root root    0 Nov 23 04:09 lock
drwxr-xr-x 2 root root 4.0K Nov 23 04:09 partial
-rw-r--r-- 1 root root 100K Nov 20 16:31 security.debian.org_dists_wheezy_updates_Release
-rw-r--r-- 1 root root  836 Nov 20 16:31 security.debian.org_dists_wheezy_updates_Release.gpg
-rw-r--r-- 1 root root 270K Nov 20 16:31 security.debian.org_dists_wheezy_updates_main_binary-amd64_Packages.gz
gzip -d http.debian.net_debian_dists_wheezy_main_binary-amd64_Packages.gz
# more http.debian.net_debian_dists_wheezy_main_binary-amd64_Packages
Package: 0ad
Version: 0~r11863-2
Installed-Size: 8260
Maintainer: Debian Games Team
Architecture: amd64
Depends: 0ad-data (>= 0~r11863), 0ad-data (<= 0~r11863-2), gamin | fam, libboost-signals1.49.0 (>= 1.49.0-1), libc6 (>= 2.11), libcurl3-gnutls (>= 7.16.2), libenet1a, libgamin0 | libfam0, libgcc1 (>= 1:4.1.1), libgl1-mesa-glx | libgl1, libjpeg8 (>= 8c), libmozjs185-1.0 (>= 1.8.5-1.0.0+dfsg), libnvtt2, libopenal1, libpng12-0 (>= 1.2.13-4), libsdl1.2debian (>= 1.2.11), libstdc++6 (>= 4.6), libvorbisfile3 (>= 1.1.2), libwxbase2.8-0 (>= 2.8.12.1), libwxgtk2.8-0 (>= 2.8.12.1), libx11-6, libxcursor1 (>> 1.1.2), libxml2 (>= 2.7.4), zlib1g (>= 1:1.2.0)
Pre-Depends: dpkg (>= 1.15.6~)
Description: Real-time strategy game of ancient warfare
Homepage: http://www.wildfiregames.com/0ad/
Description-md5: d943033bedada21853d2ae54a2578a7b
Tag: game::strategy, implemented-in::c++, interface::x11, role::program, uitoolkit::sdl, uitoolkit::wxwidgets, use::gameplaying, x11::application
Section: games
Priority: optional
Filename: pool/main/0/0ad/0ad_0~r11863-2_amd64.deb
Size: 2260694
MD5sum: cf71a0098c502ec1933dea41610a79eb
SHA1: aa4a1fdc36498f230b9e38ae0116b23be4f6249e
SHA256: e28066103ecc6996e7a0285646cd2eff59288077d7cc0d22ca3489d28d215c0a
...
# grep "Package" http.debian.net_debian_dists_wheezy_main_binary-amd64_Packages | wc -l 36237
rm -rf /var/lib/apt/lists/*
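This is why Dockerfiles usually chain the update, the install, and the cleanup into a single RUN instruction, so the package lists never persist in any image layer. A minimal sketch (curl is just a stand-in package):
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*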
Brooklyn is a large project with a lot of dependencies. I wanted to compile it, but I also wanted to remove all traces of the project when I was done experimenting. I used Docker to accomplish this goal.
See the files below at https://github.com/medined/docker-brooklyn.
First, I created a Dockerfile to load java, maven, and clone the repository.
$ cat Dockerfile
FROM ubuntu:14.04
MAINTAINER David Medinets
#
# Install Java
#
RUN apt-get update && \
    apt-get install -y software-properties-common && \
    add-apt-repository -y ppa:webupd8team/java && \
    echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections && \
    echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections && \
    apt-get update && \
    apt-get install -y oracle-java8-installer
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
#
# Install Maven
#
RUN echo "deb http://ppa.launchpad.net/natecarlson/maven3/ubuntu precise main" >> /etc/apt/sources.list && \
    echo "deb-src http://ppa.launchpad.net/natecarlson/maven3/ubuntu precise main" >> /etc/apt/sources.list && \
    apt-get update && \
    apt-get -y --force-yes install maven3 && \
    rm -f /usr/bin/mvn && \
    ln -s /usr/share/maven3/bin/mvn /usr/bin/mvn
RUN mkdir -p /root/.m2
ADD settings.xml /root/.m2/settings.xml
#
# Clone the brooklyn project
#
RUN apt-get install -y git
RUN git clone https://github.com/apache/incubator-brooklyn.git
WORKDIR /incubator-brooklyn
RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
$ cat build_image.sh
#!/bin/bash
sudo DOCKER_HOST=$DOCKER_HOST docker build --no-cache --rm=true -t medined/brooklyn.build .
$ cat run_image.sh
#!/bin/bash
#####
# Make sure that Artifactory is running.
#
ARTIFACTORY_COUNT=$(docker ps --filter=status=running | grep artifactory | wc -l)
if [ "${ARTIFACTORY_COUNT}" != "1" ]
then
  echo "Starting Artifactory"
  docker run --name "artifactorydata" -v /opt/artifactory/data -v /opt/artifactory/logs tianon/true
  docker run -d -p 8081:8081 --name "artifactory" --volumes-from artifactorydata codingtony/artifactory
fi
IMAGEID=$(docker ps -a | grep "brooklyn.build" | awk '{print $1}')
if [ "$IMAGEID" != "" ]
then
  echo "Stopping $IMAGEID"
  IMAGEID=$(sudo DOCKER_HOST=$DOCKER_HOST docker stop $IMAGEID | xargs docker rm)
fi
sudo DOCKER_HOST=$DOCKER_HOST \
  docker run \
  --link artifactory:artifactory \
  -i \
  -t medined/brooklyn.build \
  /bin/bash
This document shows how to extract a dataset from an HTML page.
We’ll start by loading two libraries. RCurl is used to read an HTML page. XML is used to parse HTML which can be viewed as a form of XML.
library(RCurl)
## Loading required package: bitops
library(XML)
Let R know where to find the HTML page. Then download and parse it.
theurl <- "http://bulbapedia.bulbagarden.net/wiki/List_of_Pok%C3%A9mon_Trading_Card_Game_expansions"
webpage <- getURL(theurl)
webpage <- readLines(tc <- textConnection(webpage)); close(tc)
doc <- htmlTreeParse(webpage, error=function(...){}, useInternalNodes = TRUE)
Use XPATH to extract all tr (table row) nodes from the HTML page. There is a lot of extraneous information in those tr nodes so we’ll filter the list from 70 elements to 67 elements.
tr <- getNodeSet(doc, "//*/tr")
tr_with_pokemon_sets <- tr[4:length(tr)-1]
Let’s look at one example of the HTML. It holds information about one Pokemon set. The pound signs at the start of the lines are not part of the data, they are just part of the printing.
tr_with_pokemon_sets[1]
[[1]]
<tr><th> 1
</th>
<td> 1
</td>
<td>
</td>
<td> <a href="/wiki/Base_Set_(TCG)" title="Base Set (TCG)">Base Set</a>
</td>
<td> Expansion Pack
</td>
<td> 102
</td>
<td> 102
</td>
<td> January 9, 1999
</td>
<td> October 20, 1996
</td></tr>
In order to make sense of that HTML, we’ll use a custom function to manipulate each element in tr_with_pokemon_sets. Generally speaking, the function removes newlines and HTML syntax. It also provides data types and column names.
xmlToCsv <- function(xml) {
  # Collapse the row node to text, then turn the blank-line cell
  # separators into tabs and clean up stray spaces around them.
  a <- gsub('\n\n','\t', xmlValue(xml))
  b <- gsub('\t\t','\t \t', a)
  d <- gsub('\t\t','\t', b)
  e <- gsub('^ |\t$','', d)
  f <- gsub('\t ','\t', e)
  # Intended column types and names for the nine cells in each row.
  cc <- c("numeric", "numeric", "character", "character", "character", "character", "character", "character", "character")
  cn <- c("EngNumber", "JpNumber", "Icon", "EngSet", "JpSet", "EngCardCount", "JpCardCount", "EngDate", "JpDate")
  # Parse the tab-separated text into a one-row data.frame and keep
  # only the English-language columns.
  g <- read.table(text=f, sep="\t", header=FALSE)
  colnames(g) <- cn
  keeps <- c("EngNumber", "EngSet", "EngCardCount")
  return(g[keeps])
}
Magic happens next. We apply the custom function, convert the results to a data.frame, and remove NA values.
pokemon_set_dataframe <- na.omit(do.call(rbind, lapply(tr_with_pokemon_sets, xmlToCsv)))
The information is displayed so you can see the data so far.
pokemon_set_dataframe
EngNumber EngSet EngCardCount
1 1 Base Set 102
2 2 Jungle 64
3 3 Fossil 62
4 4 Base Set 2 130
5 5 Team Rocket 83*
6 7 Gym Challenge 132
7 8 Neo Genesis 111
8 9 Neo Discovery 75
9 10 Neo Revelation 66*
10 11 Neo Destiny 113*
11 12 Legendary Collection 110
14 13 Expedition Base Set 165
15 14 Aquapolis 186*
16 14 Aquapolis 186*
17 15 Skyridge 182*
18 15 Skyridge 182*
19 16 EX Ruby & Sapphire 109
20 17 EX Sandstorm 100
21 18 EX Dragon 100*
22 19 EX Team Magma vs Team Aqua 97*
23 20 EX Hidden Legends 102*
24 21 EX FireRed & LeafGreen 116*
25 22 EX Team Rocket Returns 111*
26 23 EX Deoxys 108*
27 24 EX Emerald 107*
28 25 EX Unseen Forces 145*
29 26 EX Delta Species 114*
30 27 EX Legend Maker 93*
31 28 EX Holon Phantoms 111*
32 29 EX Crystal Guardians 100
33 30 EX Dragon Frontiers 101
34 31 EX Power Keepers 108
35 32 Diamond & Pearl 130
36 33 Mysterious Treasures 124*
37 34 Secret Wonders 132
38 35 Great Encounters 106
39 36 Majestic Dawn 100
40 37 Legends Awakened 146
41 38 Stormfront 106*
42 40 Rising Rivals 120*
43 41 Supreme Victors 153*
44 42 Arceus 111*
45 43 HeartGold & SoulSilver 124*
46 44 Unleashed 96*
47 45 Undaunted 91*
48 46 Triumphant 103*
49 47 Call of Legends 106
50 48 Black & White 115*
51 49 Emerging Powers 98
52 50 Noble Victories 102*
53 51 Next Destinies 103*
54 52 Dark Explorers 111*
55 53 Dragons Exalted 128*
56 54 Boundaries Crossed 153*
57 55 Plasma Storm 138*
58 56 Plasma Freeze 122*
59 57 Plasma Blast 105*
60 58 Legendary Treasures 138*
61 59 XY 146
62 60 Flashfire 109*
63 61 Furious Fists 113*
64 62 Phantom Forces 122*
65 63 Primal Clash 150+
Notice those extra asterisks and plus signs? The next bit of code removes them.
pokemon_set_dataframe$EngCardCount <- gsub("\\*|\\+", "", pokemon_set_dataframe$EngCardCount)
Here is the final dataset.
pokemon_set_dataframe
EngNumber EngSet EngCardCount
1 1 Base Set 102
2 2 Jungle 64
3 3 Fossil 62
4 4 Base Set 2 130
5 5 Team Rocket 83
6 7 Gym Challenge 132
7 8 Neo Genesis 111
8 9 Neo Discovery 75
9 10 Neo Revelation 66
10 11 Neo Destiny 113
11 12 Legendary Collection 110
14 13 Expedition Base Set 165
15 14 Aquapolis 186
16 14 Aquapolis 186
17 15 Skyridge 182
18 15 Skyridge 182
19 16 EX Ruby & Sapphire 109
20 17 EX Sandstorm 100
21 18 EX Dragon 100
22 19 EX Team Magma vs Team Aqua 97
23 20 EX Hidden Legends 102
24 21 EX FireRed & LeafGreen 116
25 22 EX Team Rocket Returns 111
26 23 EX Deoxys 108
27 24 EX Emerald 107
28 25 EX Unseen Forces 145
29 26 EX Delta Species 114
30 27 EX Legend Maker 93
31 28 EX Holon Phantoms 111
32 29 EX Crystal Guardians 100
33 30 EX Dragon Frontiers 101
34 31 EX Power Keepers 108
35 32 Diamond & Pearl 130
36 33 Mysterious Treasures 124
37 34 Secret Wonders 132
38 35 Great Encounters 106
39 36 Majestic Dawn 100
40 37 Legends Awakened 146
41 38 Stormfront 106
42 40 Rising Rivals 120
43 41 Supreme Victors 153
44 42 Arceus 111
45 43 HeartGold & SoulSilver 124
46 44 Unleashed 96
47 45 Undaunted 91
48 46 Triumphant 103
49 47 Call of Legends 106
50 48 Black & White 115
51 49 Emerging Powers 98
52 50 Noble Victories 102
53 51 Next Destinies 103
54 52 Dark Explorers 111
55 53 Dragons Exalted 128
56 54 Boundaries Crossed 153
57 55 Plasma Storm 138
58 56 Plasma Freeze 122
59 57 Plasma Blast 105
60 58 Legendary Treasures 138
61 59 XY 146
62 60 Flashfire 109
63 61 Furious Fists 113
64 62 Phantom Forces 122
65 63 Primal Clash 150
With a bit more complexity the first column of numbers can be removed.
x <- as.matrix(format(pokemon_set_dataframe))
rownames(x) <- rep("", nrow(x))
print(x, quote=FALSE)
EngNumber EngSet EngCardCount
1 Base Set 102
2 Jungle 64
3 Fossil 62
4 Base Set 2 130
5 Team Rocket 83
7 Gym Challenge 132
8 Neo Genesis 111
9 Neo Discovery 75
10 Neo Revelation 66
11 Neo Destiny 113
12 Legendary Collection 110
13 Expedition Base Set 165
14 Aquapolis 186
14 Aquapolis 186
15 Skyridge 182
15 Skyridge 182
16 EX Ruby & Sapphire 109
17 EX Sandstorm 100
18 EX Dragon 100
19 EX Team Magma vs Team Aqua 97
20 EX Hidden Legends 102
21 EX FireRed & LeafGreen 116
22 EX Team Rocket Returns 111
23 EX Deoxys 108
24 EX Emerald 107
25 EX Unseen Forces 145
26 EX Delta Species 114
27 EX Legend Maker 93
28 EX Holon Phantoms 111
29 EX Crystal Guardians 100
30 EX Dragon Frontiers 101
31 EX Power Keepers 108
32 Diamond & Pearl 130
33 Mysterious Treasures 124
34 Secret Wonders 132
35 Great Encounters 106
36 Majestic Dawn 100
37 Legends Awakened 146
38 Stormfront 106
40 Rising Rivals 120
41 Supreme Victors 153
42 Arceus 111
43 HeartGold & SoulSilver 124
44 Unleashed 96
45 Undaunted 91
46 Triumphant 103
47 Call of Legends 106
48 Black & White 115
49 Emerging Powers 98
50 Noble Victories 102
51 Next Destinies 103
52 Dark Explorers 111
53 Dragons Exalted 128
54 Boundaries Crossed 153
55 Plasma Storm 138
56 Plasma Freeze 122
57 Plasma Blast 105
58 Legendary Treasures 138
59 XY 146
60 Flashfire 109
61 Furious Fists 113
62 Phantom Forces 122
63 Primal Clash 150
And we can plot the number of cards per set against the set number.
plot(pokemon_set_dataframe[c(1,3)])
The EngCardCount column is actually a character data type, which is not correct. The transform method changes the datatype.
pokemon_set_dataframe <- transform(pokemon_set_dataframe, EngCardCount = as.numeric(EngCardCount))
Now it’s possible to sum the card counts.
noquote(format(sum(pokemon_set_dataframe$EngCardCount), big.mark=","))
[1] 7,372
http://www.cyberciti.biz/faq/bash-shell-change-the-color-of-my-shell-prompt-under-linux-or-unix/
https://github.com/medined/D4M_Schema provides a step-by-step introduction to the D4M nosql schema used by many organizations.
D4M is a breakthrough in computer programming that combines the advantages of five distinct processing technologies (sparse linear algebra, associative arrays, fuzzy algebra, distributed arrays, and triple-store/NoSQL databases such as Hadoop HBase and Apache Accumulo) to provide a database and computation system that addresses the problems associated with Big Data.
Recently I wanted to provide the same configuration file to two different Docker containers. I choose to solve this using a Docker volume. The configuration file will be sourced from within each container and looks like this:
$ cat bridge-env.sh
export BRIDGENAME=brbob
export IMAGENAME=bob
export IPADDR=10.0.10.1/24
Before any explanations, let's look at the files we'll be using:
./configuration/build_image.sh - wrapper for _docker build_.
./configuration/run_image.sh - wrapper for _docker run_.
./configuration/Dockerfile - control file for Docker image.
./configuration/files/bridge-env.sh - environment setting script.
All of the files are fairly small. Since our main topic today is Docker, let's look at the Docker configuration file first.
$ cat Dockerfile
FROM stackbrew/busybox:latest
MAINTAINER David Medinets <david.medinets@gmail.com>
RUN mkdir /configuration
VOLUME /configuration
ADD files /configuration
And you can build this image.
$ cat build_image.sh
sudo DOCKER_HOST=$DOCKER_HOST docker build --rm=true -t medined/shared-configuration .
I set up my docker to use a port instead of a UNIX socket. Therefore my DOCKER_HOST is "tcp://0.0.0.0:4243". Since sudo is being used, the environment variable needs to be set inside the sudo environment. If you want to use the default UNIX socket, leave DOCKER_HOST empty. The command will still work.
Then run it.
$ cat run_image.sh
sudo DOCKER_HOST=$DOCKER_HOST docker run --name shared-configuration -t medined/shared-configuration true
This command runs a docker container called shared-configuration. You'll notice that the _true_ command is run, which exits immediately. Since this container will only hold files, it's OK that there are no processes running in it. However, be very careful not to delete this container. Here is the output from docker ps showing the container.
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d4a2aa46b5d9 medined/shared-configuration:latest true 7 seconds ago Exited (0) 7 seconds ago shared-configuration
Now it's time to spin up two plain Ubuntu containers that can access the shared file.
$ sudo DOCKER_HOST=$DOCKER_HOST docker run --name A --volumes-from=shared-configuration -d -t ubuntu /bin/bash
94638de8b615f356f1240bbe602c0b7862e0589f1711fbff242b6d6f74c7de7d
$ sudo DOCKER_HOST=$DOCKER_HOST docker run --name B --volumes-from=shared-configuration -d -t ubuntu /bin/bash
How can we see the shared file? Let's turn to a very useful tool called nsenter (or namespace enter). The following command installs nsenter if it isn't already installed.
hash nsenter 2>/dev/null \
|| { echo >&2 "Installing nsenter"; \
sudo DOCKER_HOST=$DOCKER_HOST \
docker run -v /usr/local/bin:/target jpetazzo/nsenter; }
I use a little script file to make nsenter easier to use:
$ cat enter_image.sh
#!/bin/bash
IMAGENAME=$1
usage() {
echo "Usage: $0 [image name]"
exit 1
}
if [ -z "$IMAGENAME" ]
then
echo "Error: missing image name parameter."
usage
fi
PID=$(sudo DOCKER_HOST=$DOCKER_HOST docker inspect --format {{.State.Pid}} $IMAGENAME)
sudo nsenter --target $PID --mount --uts --ipc --net --pid
This script is used by specifying the image name to use. For example,
$ ./enter_image.sh A
root@94638de8b615:/# cat /configuration/bridge-env.sh
export BRIDGENAME=brbob
export IMAGENAME=bob
export IPADDR=10.0.10.1/24
root@94638de8b615:/# exit
logout
$ ./enter_image.sh B
root@925365faded2:/# cat /configuration/bridge-env.sh
export BRIDGENAME=brbob
export IMAGENAME=bob
export IPADDR=10.0.10.1/24
root@925365faded2:/# exit
logout
We see the same information in both containers. Let's prove that the bridge-env.sh file is shared instead of being two copies.
$ ./enter_image.sh A
root@94638de8b615:/# echo "export NEW_VARIABLE=VALUE" >> /configuration/bridge-env.sh
root@94638de8b615:/# exit
logout
$ ./enter_image.sh B
root@925365faded2:/# cat /configuration/bridge-env.sh
export BRIDGENAME=brbob
export IMAGENAME=bob
export IPADDR=10.0.10.1/24
export NEW_VARIABLE=VALUE
We changed the file in the first container and saw the changes in the second container. As an alternative to using nsenter, you can simply run a container to list the files.
$ docker run --volumes-from shared-configuration busybox ls -al /configuration
Based on the work by sroegner, I have a github project at https://github.com/medined/docker-accumulo which lets you run multiple single-node Accumulo instances using Docker.
First, create the image.
git clone https://github.com/medined/docker-accumulo.git
cd docker-accumulo/single_node
./make_image.sh
Now start your first container.
export HOSTNAME=bellatrix
export IMAGENAME=bellatrix
export BRIDGENAME=brbellatrix
export SUBNET=10.0.10
export NODEID=1
export HADOOPHOST=10.0.10.1
./make_container.sh $HOSTNAME $IMAGENAME $BRIDGENAME $SUBNET $NODEID $HADOOPHOST yes
export HOSTNAME=rigel
export IMAGENAME=rigel
export BRIDGENAME=brrigel
export SUBNET=10.0.11
export NODEID=1
export HADOOPHOST=10.0.11.1
./make_container.sh $HOSTNAME $IMAGENAME $BRIDGENAME $SUBNET $NODEID $HADOOPHOST no
export HOSTNAME=saiph
export IMAGENAME=saiph
export BRIDGENAME=brbellatrix
export SUBNET=10.0.12
export NODEID=1
export HADOOPHOST=10.0.12.1
./make_container.sh $HOSTNAME $IMAGENAME $BRIDGENAME $SUBNET $NODEID $HADOOPHOST no
The SUBNET is different for all containers. This isolates the Accumulo containers from each other.
Look at the running containers:
$ docker ps
CONTAINER ID        IMAGE                     COMMAND                CREATED          STATUS          PORTS                                                                                                                                                                                                                                                                                                                                       NAMES
41da6f17261f        medined/accumulo:latest   /docker/run.sh saiph   4 seconds ago    Up 2 seconds    0.0.0.0:49179->19888/tcp, 0.0.0.0:49180->2181/tcp, 0.0.0.0:49181->50070/tcp, 0.0.0.0:49182->50090/tcp, 0.0.0.0:49183->8141/tcp, 0.0.0.0:49184->10020/tcp, 0.0.0.0:49185->22/tcp, 0.0.0.0:49186->50095/tcp, 0.0.0.0:49187->8020/tcp, 0.0.0.0:49188->8025/tcp, 0.0.0.0:49189->8030/tcp, 0.0.0.0:49190->8050/tcp, 0.0.0.0:49191->8088/tcp   saiph
23692dfe3f1e        medined/accumulo:latest   /docker/run.sh rigel   10 seconds ago   Up 9 seconds    0.0.0.0:49166->19888/tcp, 0.0.0.0:49167->2181/tcp, 0.0.0.0:49168->50070/tcp, 0.0.0.0:49169->8025/tcp, 0.0.0.0:49170->8088/tcp, 0.0.0.0:49171->10020/tcp, 0.0.0.0:49172->22/tcp, 0.0.0.0:49173->50090/tcp, 0.0.0.0:49174->50095/tcp, 0.0.0.0:49175->8020/tcp, 0.0.0.0:49176->8030/tcp, 0.0.0.0:49177->8050/tcp, 0.0.0.0:49178->8141/tcp   rigel
63f8f1a7141f        medined/accumulo:latest   /docker/run.sh bella   21 seconds ago   Up 20 seconds   0.0.0.0:49153->19888/tcp, 0.0.0.0:49154->50070/tcp, 0.0.0.0:49155->8020/tcp, 0.0.0.0:49156->8025/tcp, 0.0.0.0:49157->8030/tcp, 0.0.0.0:49158->8050/tcp, 0.0.0.0:49159->8088/tcp, 0.0.0.0:49160->8141/tcp, 0.0.0.0:49161->10020/tcp, 0.0.0.0:49162->2181/tcp, 0.0.0.0:49163->22/tcp, 0.0.0.0:49164->50090/tcp, 0.0.0.0:49165->50095/tcp   bellatrix
You can connect to running instances using the public ports. Especially useful is the public zookeeper port. Rather than searching through the ports listed above, here is an easier way.
$ docker port saiph 2181
0.0.0.0:49180
$ docker port rigel 2181
0.0.0.0:49167
$ docker port bellatrix 2181
0.0.0.0:49162
Having '0.0.0.0' in the response means that any IP address can connect.
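That output is also easy to consume from a script. A small sketch; the zkCli.sh line only illustrates handing the port to whatever Zookeeper client you use:
ZK_HOSTPORT=$(docker port saiph 2181)
echo "Zookeeper for saiph is reachable at ${ZK_HOSTPORT}"
# e.g. zkCli.sh -server "127.0.0.1:${ZK_HOSTPORT##*:}"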
You can enter the namespace of a container (i.e., access a bash shell) this way.
$ ./enter_image.sh rigel
-bash-4.1# hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - accumulo accumulo            0 2014-07-12 09:13 /accumulo
drwxr-xr-x   - hdfs     supergroup          0 2014-07-11 21:06 /user
-bash-4.1# accumulo shell -u root -p secret
Shell - Apache Accumulo Interactive Shell
-
- version: 1.5.1
- instance name: accumulo
- instance id: bb713243-3546-487f-b6d6-cfaa272efb30
-
- type 'help' for a list of available commands
-
root@accumulo> tables
!METADATA
Now let's start an edge node. For my purposes, an edge node can connect to Hadoop, Zookeeper and Accumulo without running any of those processes. All of the edge node's resources are dedicated to client work.
export HOSTNAME=rigeledge
export IMAGENAME=rigeledge
export BRIDGENAME=brrigel
export SUBNET=10.0.11
export NODEID=2
export HADOOPHOST=10.0.11.1
./make_container.sh $HOSTNAME $IMAGENAME $BRIDGENAME $SUBNET $NODEID $HADOOPHOST no
As this container is started, the 'no' means that the supervisor configuration files will be deleted. So while supervisor will be running, it won't be managing any processes. This is not a best practice. It's just the way I chose for this prototype.
After I spin up Accumulo in a Docker container, well-known ports (like 2181 for Zookeeper) are not well-known any more. The internal private port (i.e., 2181) is exposed as a different public port (i.e., 49143). A Java program trying to connect to Accumulo must find the public port numbers automatically.
The Java code below finds the public port for Zookeeper for a Docker container named "walt". I don't know why the slash is needed in the container name.
int wantedPublicPort = -1;
String wantedContainerName = "/walt";
int wantedPrivatePort = 2181;

String dockerURL = "http://127.0.0.1:4243";
String dockerUser = "medined";
String dockerPassword = "XXXXX";
String dockerEmail = "david.medinets@gmail.com";

DockerClient docker = new DockerClient(dockerURL);
docker.setCredentials(dockerUser, dockerPassword, dockerEmail);

List<Container> containers = docker.listContainersCmd().exec();
for (Container container : containers) {
    for (String name : container.getNames()) {
        if (name.equals(wantedContainerName)) {
            for (Container.Port port : container.getPorts()) {
                if (port.getPrivatePort() == wantedPrivatePort) {
                    wantedPublicPort = port.getPublicPort();
                }
            }
        }
    }
}
System.out.println("Zookeeper Port: " + wantedPublicPort);
The Maven dependency for the docker-java client:
<dependency>
  <groupId>com.github.docker-java</groupId>
  <artifactId>docker-java</artifactId>
  <version>0.9.0</version>
</dependency>
As a simple lay programmer, I sometimes have trouble figuring out where log files are stored on unix systems. Sometimes logs are within application directories. Other times they are in /var/log. With Docker containers, this uncertainty is eliminated. How? By the 'docker diff' command. I will show why. When connecting to a Docker-based system, you can see the running containers:
$ docker ps
CONTAINER ID        IMAGE                     COMMAND        CREATED       STATUS       PORTS                                                                                                                                                                                                                                                                                                                                       NAMES
90a9f7122c02        medined/accumulo:latest   /run.sh walt   9 hours ago   Up 9 hours   0.0.0.0:49153->50070/tcp, 0.0.0.0:49154->50090/tcp, 0.0.0.0:49155->50095/tcp, 0.0.0.0:49156->8025/tcp, 0.0.0.0:49157->8030/tcp, 0.0.0.0:49158->8088/tcp, 0.0.0.0:49159->10020/tcp, 0.0.0.0:49160->19888/tcp, 0.0.0.0:49161->2181/tcp, 0.0.0.0:49162->22/tcp, 0.0.0.0:49163->8020/tcp, 0.0.0.0:49164->8050/tcp, 0.0.0.0:49165->8141/tcp   walt
$ docker diff walt
...
D /data1/hdfs/dn/current/BP-1274135865-172.17.0.10-1404767453280/current/finalized/blk_1073741825_1001.meta
...
A /var/log/supervisor/accumulo-gc-stderr---supervisor-5H7Rr7.log
A /var/log/supervisor/accumulo-gc-stdout---supervisor-LK8wDU.log
...
A /var/log/supervisor/namenode-stdout---supervisor-mciN4u.log
A /var/log/supervisor/secondarynamenode-stderr---supervisor-EaluLZ.log
A /var/log/supervisor/secondarynamenode-stdout---supervisor-Ap4Fri.log
C /var/log/supervisor/supervisord.log
A /var/log/supervisor/zookeeper-stderr---supervisor-CCwUGw.log
A /var/log/supervisor/zookeeper-stdout---supervisor-lDiuIF.log
C /var/run
C /var/run/sshd.pid
C /var/run/supervisord.pid
Here is another quick note, this time about Docker: detaching from and reattaching to a running container.
# Run the standard Ubuntu image
docker run --name=bash -i -t ubuntu /bin/bash

# Do something ...

# Detach by typing Ctrl-p and Ctrl-q.

# Look at the container while on the host system.
docker ps

# Reattach to the Ubuntu container
docker attach bash
This note shows the difference between an Accumulo query without and with a WholeRowIterator. The code snippet below picks up the narrative after you've initialized a Connector object. First, we can see what a plain scan looks like:
// Read from the tEdge table of the D4M schema.
String tableName = "tEdge";
// Read from 5 tablets at a time.
int numQueryThreads = 5;

Text startRow = new Text("6000");
Text endRow = new Text("6001");
List<Range> range = Collections.singletonList(new Range(startRow, endRow));

BatchScanner scanner = connector.createBatchScanner(tableName, new Authorizations(), numQueryThreads);
scanner.setRanges(range);
for (Entry<Key, Value> entry : scanner) {
    System.out.println(entry.getKey());
}
scanner.close();
600006a870bb4c8471a27c9bd0f3f064265d062d :a00100|0.0001 [] 1401023353637 false
600006a870bb4c8471a27c9bd0f3f064265d062d :a00200|0.0001 [] 1401023353637 false
...
600006a870bb4c8471a27c9bd0f3f064265d062d :state|UT [] 1401023353637 false
600006a870bb4c8471a27c9bd0f3f064265d062d :zipcode|84521 [] 1401023353637 false
6000338cbf2daede3efd4355165c98771b3e2b66 :a00100|29673.0000 [] 1401023273694 false
6000338cbf2daede3efd4355165c98771b3e2b66 :a00200|20421.0000 [] 1401023273694 false
...
6000338cbf2daede3efd4355165c98771b3e2b66 :state|OR [] 1401023273694 false
6000338cbf2daede3efd4355165c98771b3e2b66 :zipcode|97365 [] 1401023273694 false
Now the same scan, this time with a WholeRowIterator added:

BatchScanner scanner = connector.createBatchScanner(tableName, new Authorizations(), numQueryThreads);
scanner.setRanges(range);

// Wrap each row into a single entry.
IteratorSetting iteratorSetting = new IteratorSetting(1, WholeRowIterator.class);
scanner.addScanIterator(iteratorSetting);

for (Entry<Key, Value> entry : scanner) {
    System.out.println(entry.getKey());
}
scanner.close();
600006a870bb4c8471a27c9bd0f3f064265d062d : [] 9223372036854775807 false
6000338cbf2daede3efd4355165c98771b3e2b66 : [] 9223372036854775807 false
Each row now comes back as a single entry whose Value encodes all of the row's columns. Use WholeRowIterator.decodeRow to unpack them:

for (Entry<Key, Value> entry : scanner) {
    try {
        SortedMap<Key, Value> wholeRow = WholeRowIterator.decodeRow(entry.getKey(), entry.getValue());
        System.out.println(wholeRow);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
{600006a870bb4c8471a27c9bd0f3f064265d062d :a00100|0.0001 [] 1401023353637 false=1,
 600006a870bb4c8471a27c9bd0f3f064265d062d :a00200|0.0001 [] 1401023353637 false=1,
 ...
 600006a870bb4c8471a27c9bd0f3f064265d062d :state|UT [] 1401023353637 false=1,
 600006a870bb4c8471a27c9bd0f3f064265d062d :zipcode|84521 [] 1401023353637 false=1}
{6000338cbf2daede3efd4355165c98771b3e2b66 :a00100|29673.0000 [] 1401023273694 false=1,
 6000338cbf2daede3efd4355165c98771b3e2b66 :a00200|20421.0000 [] 1401023273694 false=1,
 ...
 6000338cbf2daede3efd4355165c98771b3e2b66 :state|OR [] 1401023273694 false=1,
 6000338cbf2daede3efd4355165c98771b3e2b66 :zipcode|97365 [] 1401023273694 false=1}
Here is a longer note about how Accumulo organizes data into tablets. At its simplest, Accumulo stores key-value pairs:

-----------  ---------
|   key   |  | value |
-----------  ---------
| [nothing here yet] |
-----------  ---------
What is a Key? See below. First, notice that every entry belongs to a tablet; a new table has a single tablet named "default".
-----------  -----------  ---------
| tablet  |  |   key   |  | value |
-----------  -----------  ---------
| default |  |         |  |       |
-----------  -----------  ---------
A tablet is responsible for a range of keys, from a start key to an end key. The first tablet covers every possible key:

-infinity ==> ALL DATA <== +infinity

This concept of start and end keys can be shown in our tablet depiction as well.
-----------  -----------  ---------
| tablet  |  |   key   |  | value |
-----------  -----------  ---------
|      start key: -infinity      |
----------------------------------
| default |  |         |  |      |
----------------------------------
|      end key: +infinity        |
-----------  -----------  ---------

After inserting three records into a new table, you'll have the following situation. Notice that Accumulo always stores keys in lexically sorted order. So far, the start and end keys have not been changed.
-----------  -------  ---------
| tablet  |  | key |  | value |
-----------  -------  ---------
| default |  | 01  |  |   X   |
| default |  | 03  |  |   X   |
| default |  | 05  |  |   X   |
-----------  -------  ---------

Accumulo stores all entries for a tablet on a single node in the cluster. Since our table has only one tablet, the information can't spread beyond one node. In order to distribute information, you'll need to create more than one tablet for your table.
The tablet's range is still from -infinity to +infinity. That hasn't changed yet.
Split point - the place where one tablet becomes two.

Let's add two split points to see what happens. As the split points are added, new tablets are created.
-----------  -------  ---------
| tablet  |  | key |  | value |
-----------  -------  ---------
|    A    |  | 01  |  |   X   |   range: -infinity to 02 (inclusive)
|       split point 02        |
|    B    |  | 03  |  |   X   |   range: 02 (exclusive) to +infinity
|    B    |  | 05  |  |   X   |
-----------  -------  ---------

The split point does not need to exist as an entry. This feature means that you can pre-split a table by simply giving Accumulo a list of split points.
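Pre-splitting can also be done programmatically. Here is a minimal sketch using the Java TableOperations API, assuming an initialized Connector (as in the scan examples above) and a hypothetical table named "people":

import java.util.SortedSet;
import java.util.TreeSet;
import org.apache.hadoop.io.Text;

// Hand Accumulo the split points up front. The keys "02" and "04"
// do not need to exist as entries in the table.
SortedSet<Text> splits = new TreeSet<Text>();
splits.add(new Text("02"));
splits.add(new Text("04"));
connector.tableOperations().addSplits("people", splits);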
Let's look at the same progression from the Tablet Server's point of view. Initially, a single server holds the single tablet.

--------------------------------
|        Tablet Server         |
--------------------------------
|                              |
|  -- Tablet ----------------  |
|  | -infinity to +infinity |  |
|  --------------------------  |
|                              |
--------------------------------

Then the first split point is added. Now there are two tablets. However, they are still on a single server. And this also makes sense. Think about adding a split point to a table with millions of entries. While the two tablets reside on one server, adding a split is just an accounting change.
-----------------------------------------------------------------------
|                            Tablet Server                            |
-----------------------------------------------------------------------
|                                                                     |
|  -- Tablet ---------------------   -- Tablet ---------------------  |
|  | -infinity to 02 (inclusive) |   | 02 (exclusive) to +infinity |  |
|  -------------------------------   -------------------------------  |
|                                                                     |
-----------------------------------------------------------------------

At some future point, Accumulo might move the second tablet to another Tablet Server.
-------------------------------------   -------------------------------------
|           Tablet Server           |   |           Tablet Server           |
-------------------------------------   -------------------------------------
|                                   |   |                                   |
|  -- Tablet ---------------------  |   |  -- Tablet ---------------------  |
|  | -infinity to 02 (inclusive) |  |   |  | 02 (exclusive) to +infinity |  |
|  -------------------------------  |   |  -------------------------------  |
|                                   |   |                                   |
-------------------------------------   -------------------------------------
-----------  -------  ---------
| tablet  |  | key |  | value |
-----------  -------  ---------
|    A    |  | 01  |  |   X   |   range: -infinity to 02 (inclusive)
|       split point 02        |
|    B    |  | 03  |  |   X   |   range: 02 (exclusive) to 04 (inclusive)
|       split point 04        |
|    C    |  | 05  |  |   X   |   range: 04 (exclusive) to +infinity
-----------  -------  ---------

The table now has three tablets. When enough tablets are created, some process inside Accumulo moves one or more tablets onto different nodes. Once that happens, the data is distributed. Hopefully, you can now figure out which tablet any specific key inserts into. For example, key "00" goes into tablet "A".
-----------  -------  ---------
| tablet  |  | key |  | value |
-----------  -------  ---------
|    A    |  | 00  |  |   X   |   range: -infinity to 02 (inclusive)
|    A    |  | 01  |  |   X   |
|       split point 02        |
|    B    |  | 03  |  |   X   |   range: 02 (exclusive) to 04 (inclusive)
|       split point 04        |
|    C    |  | 05  |  |   X   |   range: 04 (exclusive) to +infinity
-----------  -------  ---------

Internally, the first tablet ("A") has a starting key of -infinity. Any entry with a key between -infinity and "02" inserts into the first tablet. The last tablet has an ending key of +infinity, so any key between "05" and +infinity inserts into the last tablet. Accumulo automatically creates split points under some conditions; for example, when a tablet grows too large. However, that's a whole 'nother conversation.
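How large is "too large"? That's controlled by the per-table property table.split.threshold (1G by default). As a small sketch, again assuming an initialized Connector and the hypothetical "people" table, you could lower the threshold to encourage automatic splits:

// Once a tablet's data files exceed this size, Accumulo picks a
// median key and splits the tablet in two.
connector.tableOperations().setProperty("people", "table.split.threshold", "256M");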
-------------------------------------------------------------------
| row | column family | column qualifier | visibility | timestamp |
-------------------------------------------------------------------

These five components, combined, make up the _Key_.
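When writing data you don't build a Key directly; a Mutation collects the components for you. Here is a minimal sketch using the book example below; the "public" visibility label and the timestamp are made-up values:

import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.ColumnVisibility;
import org.apache.hadoop.io.Text;

Mutation mutation = new Mutation(new Text("book"));  // row
mutation.put(new Text("140122317"),                  // column family
             new Text("Batman: Hush"),               // column qualifier
             new ColumnVisibility("public"),         // visibility
             1401023353637L,                         // timestamp
             new Value("1".getBytes()));             // value
// The mutation would then be handed to a BatchWriter.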
-------------------------------------------------
| row  | column family | column qualifier       |
-------------------------------------------------
| book | 140122317     | Batman: Hush           |
| book | 1401216676    | Batman: A Killing Joke |
-------------------------------------------------

You can see how the _book_ row could grow to millions of entries, potentially causing memory issues inside your TServer. Many people add a _shard_ value to the row to introduce potential split points. With shard values, the above table might look like this:
---------------------------------------------------
| row    | column family | column qualifier       |
---------------------------------------------------
| book_0 | 140122317     | Batman: Hush           |
| book_5 | 1401216676    | Batman: A Killing Joke |
---------------------------------------------------

With this style of row values, Accumulo could use book_5 as a split point so that the rows no longer grow unmanageably large. Of course, this technique adds a bit of complexity to the query process. I'll leave the query issue to a future note. Let's explore how shard values can be generated.
---------------------------------------
|      RELATIONAL REPRESENTATION      |
---------------------------------------
|  SK  | First Name | Last Name | Age |
---------------------------------------
| 1001 | John       | Kloplick  | 36  |
---------------------------------------

Key-value databases spread information across several rows, using the synthetic key (SK) to tie them together. In simplified form, the information is stored in three key-value combinations (or three entries).
----------------------------------
|   KEY VALUE REPRESENTATION     |
----------------------------------
| ROW  | CF         | CQ         |
----------------------------------
| 1001 | first_name | John       |
| 1001 | last_name  | Kloplick   |
| 1001 | age        | 36         |
----------------------------------

If the coin flip sharding strategy were used (assigning each entry a random shard), the information might look like the following. The potential split point shows that the entries can be spread across two tablets.
-------------------------------------
| ROW     | CF         | CQ         |
-------------------------------------
| 1001_01 | first_name | John       |
| 1001_01 | age        | 36         |
| 1001_02 | last_name  | Kloplick   |  <-- potential split point
-------------------------------------

To retrieve the information, you'd need to scan both servers! This coin flip sharding technique is not going to scale. Imagine information about a person spread over 40 servers. Collating that information would be prohibitively time-consuming.
A better strategy derives the shard value from the data itself. Suppose we base the row on the first name and compute the shard by hashing it. If hashing "John" yields 2,314,539 and we use five shards:

2,314,539 modulo 5 = 4
-------------------------------------
| ROW     | CF         | CQ         |
-------------------------------------
| John_04 | first_name | John       |
| John_04 | age        | 36         |
| John_04 | last_name  | Kloplick   |
-------------------------------------
Note that the shard value is _not_ related to any specific node. It's just a potential split point for Accumulo. It's time to look at a specific use case to see if this sharding strategy is sound. What if we need to add a set of friends for John? It's unlikely that the information about John's friends includes his first name, but very likely that his synthetic key of 1001 is there. We can now see that choosing the first_name field as the base of the sharding strategy was unwise.
Hashing the synthetic key instead avoids that problem. If hashing 1001 yields 1,507,424 and we use 997 shards:

1,507,424 modulo 997 = 957
--------------------------------------
| ROW      | CF         | CQ         |
--------------------------------------
| 1001_957 | first_name | John       |
| 1001_957 | age        | 36         |
| 1001_957 | last_name  | Kloplick   |
--------------------------------------

Using this technique makes it simple to add a height field.
---------------------------------------------
| ROW      | CF               | CQ          |
---------------------------------------------
| 1001_957 | height_in_inches | 68          |
---------------------------------------------
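To wrap up, here is a minimal sketch of a shard helper based on the synthetic key. The 997 shard count, the zero-padded formatting, and the use of String.hashCode are my assumptions; any deterministic hash works:

// Computes a sharded row value such as "1001_957". The bitmask keeps
// the hash non-negative (Math.abs fails for Integer.MIN_VALUE).
public static String shardedRow(String syntheticKey, int shardCount) {
    int shard = (syntheticKey.hashCode() & Integer.MAX_VALUE) % shardCount;
    return syntheticKey + "_" + String.format("%03d", shard);
}

Because the hash is deterministic, every entry for synthetic key 1001 gets the same suffix and therefore lands in the same tablet, while different keys scatter across the 997 potential split points.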