Simple self-hosting and on-premises options

Kameleoon is usually used in a Software as a Service (SaaS) model. This means that customer data (such as experiments, personalizations, and accounts) is hosted on a common platform. About 150 physical servers (as of 2023) support all the functional features of our software, such as script hosting, data collection, storage, and creation of analytical reports. Data for every customer is stored on common servers, and the separation between customers' data is logical. The Kameleoon platform prevents a customer from accessing another customer's data, but these restrictions are implemented in our application's code rather than through physical separation.

More specifically, customers are grouped into clusters that correspond to their geographic location (country). For example, French customers share data inside the French data cluster, while German customers share data inside the German cluster. A cluster for customers of a particular country is always physically hosted in a data center on that country's soil. This is done for data confidentiality reasons and to comply with regional data privacy laws. So even if you use the default SaaS model of Kameleoon, we ensure that the data collected on your website is hosted in your country and that local laws are correctly taken into account.

Sometimes customers wish to use our software in self-hosting / on-premises mode. This is usually done for performance, data confidentiality, or security reasons. Kameleoon fully supports the on-premises model and offers customers three different options to self-host our software. The first option lets customers host the critical Kameleoon application file (and optionally, public resources such as images) on their own servers or CDN rather than Kameleoon's CDN. It is quick and easy to set up (2-3 days). The second option consists of setting up a dedicated data storage cluster and is of medium difficulty (1-2 weeks). The third option is the full on-premises configuration of Kameleoon, where absolutely everything runs on dedicated servers. Setup time varies with the customer's specific requirements but is usually in the range of 1-2 months.

Application file & public resources self-hosting

The simplest option to consider when thinking about using Kameleoon in On-Premises mode is the ability to self-host the application file. The Kameleoon application file can safely reside either on the Kameleoon CDN (default, SaaS setup) or on your own servers or CDN. The required configuration option can be set in the Kameleoon back-office, in the websites setup section. Four values are possible: no self-hosting at all, self-hosting only for the application file, self-hosting only for public resources, and full self-hosting, both for the application file and other public resources (images).

note

In addition to choosing the correct self-hosting option in the corresponding Kameleoon back-office select box, please also enter the planned hosting URL(s) in the text field below it. This value is used to generate a correct installation script, and is also needed for images self-hosting (see details below).

Application file self-hosting

Hosting the Kameleoon application file on your own servers can give a small but noticeable performance boost, by removing the additional DNS query and SSL handshake needed if you use the Kameleoon CDN. You may also wish to self-host for security reasons: if the Kameleoon application file is served from your own servers, you can ensure that your internal security policies are followed, and you are responsible for the security of the hosting servers instead of entrusting us with this responsibility. Self-hosting can also prevent adblockers from affecting the solution.

To self-host the Kameleoon application file, the following two steps are required:

  1. Once you know the URL where the application file will be hosted on your side, you need to provide this URL in the installation tag (this results in a slightly modified installation tag compared with the default one).

For example, if you are using the JavaScript File (Asynchronous Loading with Anti-Flicker) implementation method, the Kameleoon application file is by default hosted on //SITE_CODE.kameleoon.io/kameleoon.js. You need to change this URL in the installation tag, replacing it with your own URL (such as https://www.customerdomain.com/resources/scripts/kameleoon.js).

  2. Implement synchronization between the file hosted on your servers / CDN and the original file generated by the Kameleoon platform. This is mandatory because the application file is not a static file: its contents change regularly. To be more precise, its contents change every time an experiment or personalization changes its status on our platform, and also when some configuration changes are carried out. For instance, if you start a new experiment, or pause or stop a running one, the contents of the application file will change.

The appropriate way to perform this synchronization depends on your exact setup. CDNs have their own interface to configure this, which is CDN dependent and out of the scope of this article. For standard web hosting on your own HTTP server such as nginx or Apache, we recommend a simple cron job that runs a wget command to retrieve the appropriate file. We recommend running this job every 5 minutes.

Once these two steps are completed, you are ready to use the Kameleoon platform with a self-hosted application file.

note

If you want to obtain a hash of the contents of our original file, this is possible via our Automation API. You can use it to make sure the copied file is equal to the original one, or to be alerted when contents change on our side (if you don't want to perform automatic synchronizations, for instance, but rather only trigger synchronization when it is actually needed).
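
As an illustration of such a check, the sketch below compares a local copy of the application file with an expected hash. It rests on assumptions that this article does not confirm: the expected value is taken to be a lowercase hexadecimal SHA-256 digest, and the helper names file_sha256 and matches_hash are hypothetical. How the hash is obtained from the Automation API is not shown; refer to the API documentation for the actual algorithm and endpoint.

```shell
#!/bin/sh
# Compare a local copy of the application file with an expected hash.
# ASSUMPTION: the expected value is a lowercase hex SHA-256 digest. The
# article does not state which algorithm the Automation API returns.

# file_sha256 FILE -> prints the hex SHA-256 digest of FILE
file_sha256() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# matches_hash FILE EXPECTED -> exit status 0 if the digests are equal
matches_hash() {
    [ "$(file_sha256 "$1")" = "$2" ]
}
```

A synchronization script could call matches_hash on the self-hosted file and trigger a new download, or raise an alert, only when it returns a non-zero status.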

The following example provides a ready-to-use installation tag and synchronization commands. Copying and pasting them, after adapting the URLs and paths to your setup, is usually enough to get it working.

Example: Instructions for self-hosting of the Kameleoon application file

<script type="text/javascript">
    // Duration in milliseconds to wait while the Kameleoon application file is loaded
    var kameleoonLoadingTimeout = 1000;

    var kameleoonQueue = kameleoonQueue || [];
    var kameleoonStartLoadTime = new Date().getTime();
    if (! document.getElementById("kameleoonLoadingStyleSheet") && ! window.kameleoonDisplayPageTimeOut)
    {
        var kameleoonS = document.getElementsByTagName("script")[0];
        var kameleoonCc = "* { visibility: hidden !important; background-image: none !important; }";
        var kameleoonStn = document.createElement("style");
        kameleoonStn.type = "text/css";
        kameleoonStn.id = "kameleoonLoadingStyleSheet";
        if (kameleoonStn.styleSheet)
        {
            kameleoonStn.styleSheet.cssText = kameleoonCc;
        }
        else
        {
            kameleoonStn.appendChild(document.createTextNode(kameleoonCc));
        }
        kameleoonS.parentNode.insertBefore(kameleoonStn, kameleoonS);
        window.kameleoonDisplayPage = function(fromEngine)
        {
            if (!fromEngine)
            {
                window.kameleoonTimeout = true;
            }
            if (kameleoonStn.parentNode)
            {
                kameleoonStn.parentNode.removeChild(kameleoonStn);
            }
        };
        window.kameleoonDisplayPageTimeOut = window.setTimeout(window.kameleoonDisplayPage, kameleoonLoadingTimeout);
    }
</script>
<script type="text/javascript" src="//www.customerdomain.com/resources/scripts/kameleoon.js" async="true"></script>

In the installation tag, the source of the script has been changed to our own URL: //www.customerdomain.com/resources/scripts/kameleoon.js. Below are working examples of synchronization commands.

# wget command

wget https://SITE_CODE.kameleoon.io/kameleoon.js -O /var/www/html/resources/scripts/kameleoon.js -T 30 -t 3

# cron entry

*/5 * * * * wget https://SITE_CODE.kameleoon.io/kameleoon.js -O /var/www/html/resources/scripts/kameleoon.js -T 30 -t 3
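
One limitation of writing directly to the served path with -O is that a failed or interrupted download can leave an empty or truncated file in place. A possible hardening, sketched below under assumptions, is to download to a temporary file in the same directory and replace the served copy only when the download is non-empty and differs from the current file. The function names install_if_changed and sync_kameleoon are hypothetical, and the file mode 644 is an assumption about your web server setup.

```shell
#!/bin/sh
# Sketch of a safer synchronization step: download to a temporary file,
# then replace the served copy only if the download is usable and differs.

# install_if_changed NEW DEST
# Moves NEW over DEST when NEW is non-empty and differs from DEST.
# Returns 0 if DEST was updated, 1 if it was left untouched.
install_if_changed() {
    new=$1
    dest=$2
    if [ ! -s "$new" ]; then
        # Empty or missing download: keep the current copy.
        rm -f "$new"
        return 1
    fi
    if [ -f "$dest" ] && cmp -s "$new" "$dest"; then
        # Identical contents: nothing to do.
        rm -f "$new"
        return 1
    fi
    chmod 644 "$new"        # mktemp creates files readable by the owner only
    mv -f "$new" "$dest"    # a rename within one filesystem is atomic
}

# sync_kameleoon URL DEST
# Downloads URL to a temporary file next to DEST, then installs it.
sync_kameleoon() {
    tmp=$(mktemp "$2.XXXXXX") || return 1
    if wget -q "$1" -O "$tmp" -T 30 -t 3; then
        install_if_changed "$tmp" "$2"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Saved in a script that ends with a call such as sync_kameleoon https://SITE_CODE.kameleoon.io/kameleoon.js /var/www/html/resources/scripts/kameleoon.js, this could be run from the same cron entry in place of the bare wget command.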

note

The domain for your Kameleoon scripts may vary from one project to another. Your projects may be hosted on either kameleoon.eu or kameleoon.io depending on their creation date. Make sure you use the domain displayed in your project in the Kameleoon App.

If you unify session data across subdomains, note that an additional static iFrame (https://www.customerdomain.com/path/to/kameleoon-iframe.html) must also be self-hosted. This is already the case in the standard (SaaS) setup for this implementation method, as described in the Unify session data across subdomains documentation.

Images self-hosting

In addition to application file self-hosting, images uploaded via the Kameleoon platform can also be self-hosted. If this option is chosen and an images URL is provided, the generated URLs for uploaded images will use your own server / CDN and not our main URL. The standard URL path for uploaded images is SITE_CODE.kameleoon.io/images/, and resources there are served by a Content Delivery Network (the same one as the application file). If the images URL path you specified differs from ours, for example https://server.mydomain.com/path/resources/images/, you will need to adjust your CDN configuration yourself to rewrite /path/resources/images/ to /images/. Different CDN providers have different mechanisms for this, so it is out of the scope of this article. If you don't actually need a different path, keep your URL path the same as ours: https://server.mydomain.com/images/.

Of course, for images self-hosting to work a synchronization mechanism must also be used. This is outside the scope of this article, as it's much more complex than synchronizing a single file as in the previous section. Several files have to be considered and the exact names and thus URLs of images uploaded cannot be known in advance.

We recommend using images self-hosting only via a CDN, which usually has its own built-in mechanism for replication. Writing your own could be time-consuming. The exact configuration depends on your CDN, but the general idea is to configure your CDN to serve resources from the SITE_CODE.kameleoon.io/images/ origin URL.

note

The Kameleoon domain may vary from one project to another. Your projects may be hosted on either kameleoon.eu or kameleoon.io depending on their creation date. Make sure you use the domain displayed in your project in the Kameleoon App.

Dedicated clusters for Data Storage

Using a separate cluster for data storage means that data collected for visitors on your website will no longer be stored with data from other Kameleoon users: it will be physically separated. It will reside on dedicated, separate servers. This offers the following advantages:

  • Security: physical separation offers another level of security compared to logical separation.

  • Performance: the servers are only used for your storage and operations, so speed will always be optimal, independently of the usage of the platform by other customers.

  • Access to raw data: when using a dedicated cluster, we authorize low-level access to the underlying databases (mostly ClickHouse). This means that your Data Scientists can run custom queries and obtain tailor-made results.

The setup of a dedicated cluster for data storage consists of installing several open-source database systems. We use four main technologies, listed below. Three of them (Kafka, HDFS, and ClickHouse) are mandatory for any setup. The fourth, Cassandra, depends on your use of Kameleoon, as does Elasticsearch, which is not covered in the list and table below. If you only have the A/B Testing module, they are not needed, but if you use the Personalization module, for example, then you would need Cassandra.

  1. Kafka (mandatory). All the data collection events are produced in Kafka topics, thus made available to several ETL applications all along the data pipeline.

  2. Hadoop File System (mandatory). All the data collection events are stored in HDFS. From this raw data, we can rebuild visits, which are then used in other scalable databases. As a result HDFS is seen as the main datastore / backup system / source of truth.

  3. ClickHouse (mandatory). ClickHouse is the OLAP engine we use to create all analytical reports on our platform. With knowledge of the data model we use, you can also run your own custom queries for advanced analysis and reports.

  4. Cassandra (required for personalization and / or cross-device history reconciliation). Cassandra is used for various tasks, for instance computation of the Machine Learning models or cross-device history reconciliation.

note

The actual setup and configuration of the servers is usually performed by experienced Kameleoon engineers. These operations can be performed either in your data center (servers operated and owned by you, the customer) or in our own data centers (Kameleoon manages the server provisioning and billing). Flexibility of deployment is one of Kameleoon's competitive advantages.

These are the server requirements for the dedicated data cluster option.

caution

Servers need to be physical, bare metal servers. Although theoretically possible, we have no experience yet in using virtualized servers.

| Component | Version | Minimal number of servers | Optimal number of servers | Recommended amount of RAM | Server storage type | Remarks |
| --- | --- | --- | --- | --- | --- | --- |
| Kafka | 2.3.1 | 2 | 2 | 32 GB | Spinning disks with large capacity (8 TB or more) | Confluent distribution - version 5.3.1 |
| Hadoop File System | 2.9.1 | 2 | 2 | 32 GB | Spinning disks with large capacity (8 TB or more) | Replication of data is crucial, so 2 servers needed |
| ClickHouse | 22.3.3 | 1 | 2 | 64 GB | SSD recommended | |
| Cassandra | 4.0.1 | 1 | 2 | 32 GB | SSD mandatory | |

We recommend using the latest version of the Rocky Linux distribution for all components.

note

For a customer with only the A/B testing module, this amounts to a minimal setup of 5 servers, and a recommended setup of 6 servers.

For a customer with the personalization module, this amounts to a minimal setup of 6 servers, and a recommended setup of 8 servers.

Full On-Premises model (separated back-office, data collection and storage cluster, and application file hosting)

To host the entirety of the Kameleoon platform, you need self-hosting of the application file, a dedicated data storage cluster, a dedicated data collection pipeline, and a separate back-office. In this scenario, all the components and functionality of the Kameleoon platform are hosted within your own IT ecosystem. This allows for custom security policies; for instance, it is quite common in this case to set up a VPN with access restricted to corporate workstations.

The Back-Office application itself runs on a Tomcat JEE server. It uses several other standalone Java applications that communicate through an instance of ActiveMQ. We use MySQL as the relational database for the back-office. nginx is also required as a high-performance HTTP server to collect data events sent by browsers (beacon HTTP calls).

Here are the exact server requirements for the dedicated data pipeline and back-office. In addition to that, you need to provide servers for the storage cluster detailed in the previous section, and either a CDN or hosting server (first section).

| Component | Version | Minimal number of servers | Optimal number of servers | Recommended amount of RAM | Server storage type | Remarks |
| --- | --- | --- | --- | --- | --- | --- |
| JDK / Tomcat / ActiveMQ | 1.8 / 8.0.47 / 5.14.5 | 1 | 1 | 32 GB | SSD recommended | Tomcat JEE server and standalone Java applications are collocated |
| MySQL | 8.0.21 | 1 | 1 | 32 GB | SSD recommended | |
| nginx | 1.20.1 | 1 | 2 | 32 GB | Spinning disks | A proprietary Java log parsing application will also be installed on these nodes |

We recommend using the latest version of the Rocky Linux distribution for all components. The Back-office application is provided as a WAR file, which has to be hosted on the Tomcat server. Other Java modules (standalone applications) are provided as JAR files.

note

Some additional collocation is theoretically possible. For instance, you could collocate the MySQL server with the Tomcat JEE server on a single server. For security and performance reasons, we do not recommend any collocation except the one mentioned in the first row of the previous table.

note

For a customer with only the A/B testing module, this amounts to a minimal setup of 9 servers, and a recommended setup of 11 servers.

For a customer with the personalization module, this amounts to a minimal setup of 10 servers, and a recommended setup of 13 servers.

These numbers are obtained from the sum of the two tables on this page, adding one extra server for the Kameleoon application file hosting (so if you use a CDN, you can subtract one from these numbers).
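
The totals above can be re-derived with a few lines of shell arithmetic. This is only a worked check, not a sizing tool: the per-component counts are copied from the two tables on this page, and the variable and function names are illustrative.

```shell
#!/bin/sh
# Re-derive the server totals from the two tables on this page.

# total_servers STORAGE PIPELINE HOSTING -> prints the sum
total_servers() {
    echo $(( $1 + $2 + $3 ))
}

# Storage cluster (first table): Kafka + HDFS + ClickHouse
storage_ab_min=$((2 + 2 + 1))               # 5
storage_ab_opt=$((2 + 2 + 2))               # 6
# Personalization adds Cassandra (1 server minimal, 2 optimal)
storage_perso_min=$((storage_ab_min + 1))   # 6
storage_perso_opt=$((storage_ab_opt + 2))   # 8

# Pipeline and back-office (second table): Java stack + MySQL + nginx
pipeline_min=$((1 + 1 + 1))                 # 3
pipeline_opt=$((1 + 1 + 2))                 # 4

# One server for application file hosting; set to 0 if you use a CDN
hosting=1

echo "A/B testing:     min $(total_servers $storage_ab_min $pipeline_min $hosting), recommended $(total_servers $storage_ab_opt $pipeline_opt $hosting)"
echo "Personalization: min $(total_servers $storage_perso_min $pipeline_min $hosting), recommended $(total_servers $storage_perso_opt $pipeline_opt $hosting)"
```

With hosting=1 this prints 9 and 11 for the A/B testing module and 10 and 13 for the personalization module, matching the figures in the note above.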