Back to index
Mesos is fun!

I was recently involved in writing a mesos framework to autoscale GoCD agents.

My development setup involves:

The framework runs two threads. First thread is the actual framework implementation which listens to messages from mesos-master. The second thread is a Http poller, which polls the GoCD server to find out the demand and supply of agents. If the demand > supply, the framework launches a new go agent.

The Poller part was pretty straight forward to implement. While the framework part is based on standard interface all mesos frameworks follow, there were minor gotchas to get it working.

Setting mesos native lib in the PATH

Exception in thread "main" java.lang.UnsatisfiedLinkError: no mesos in java.library.path

The above error is because the framework can't find the mesos native lib. Explicitly set MESOS_NATIVE_JAVA_LIBRARY=/path/to/mesos/native/lib In Mac OSX, it is usually inside /usr/local/lib/libmesos.dylib, after you brew install mesos.

Making framework visible to the mesos-master

Mesos frameworks are usually run on the same machine as the master or in a machine is publicly accesible from the master. During development there is a high chance that your framework is binding to the localhost ip 127.0.0.1, which is not visible in the public network. Which will result in very cryptic errors like below.

On framework logs in local, it will be stuck at

sched.cpp:264] No credentials provided. Attempting to register without authentication

In the mesos-master logs you will see messages like:

master.cpp:1423] Received re-registration request from framework GOCD-Mesos-1456332472341 at scheduler-3bfce855-b59c-4ebe-bb04-567770e04f5a@0.0.0.0:57816 master.cpp:1474] Re-registering framework GOCD-Mesos-1456332472341 at scheduler-3bfce855-b59c-4ebe-bb04-567770e04f5a@0.0.0.0:57816 master.cpp:1501] Framework GOCD-Mesos-1456332472341 failed over hierarchical_allocator_process.hpp:375] Activated framework GOCD-Mesos-1456332472341 master.cpp:3559] Sending 1 offers to framework GOCD-Mesos-1456332472341 master.cpp:725] Framework GOCD-Mesos-1456332472341 disconnected master.cpp:1655] Deactivating framework GOCD-Mesos-1456332472341 hierarchical_allocator_process.hpp:405] Deactivated framework GOCD-Mesos-1456332472341 hierarchical_allocator_process.hpp:563] Recovered cpus():1; mem():378; disk():32808; ports():[31000-32000] (total allocatable: cpus():1; mem():378; disk():32808; ports():[31000-32000]) on slave 20160224-163417-169978048-5050-1266-0 from framework GOCD-Mesos-1456332472341

The problem here is the mesos-master isn't able to communicate back the framework because the framework isn't visible on the public interface. The solution is to set LIBPROCESS_IP=public_interface_ip and restart the framework.

Specify exact user to run command on slaves

The gocd-mesos framework launches go agents as docker containers. When the framework launches a new mesos task, the task needs a explicit user to be specified along with the Task Config. If else it assumes the same user as the framwework is running. This is problematic when your framework and mesos-slaves are running in different machines, which have different uid. To solve this, the mesos-slave should be started with a --switch-user option enabled.

All the above problems are not easily found on a google search. Hope this may help a soul not loose sleepless nights figuring it out. Also mesos framework development is fun. If you are running a instance of GoCD, checkout the progress and if possible support/contribute the development here.

Published at : Wednesday Feb,24 2016 at 22:40 | Tags : hacking, mesos, scala, | Comments
blog comments powered by Disqus Browse Archives
I am..
Azhagu Selvan SP
Atheist, FOSS enthusiast,
Pythonist, Student
Legal
Creative Commons License