Try Serenytics for free - 14 Days
>> Back to tutorials menu

Serenytics data platform architecture

Overview

Serenytics is a cloud app to manipulate data and create dashboards. The data to display in a dashboard can be obtained:

  • From a data-warehouse provided by Serenytics (usually an AWS Redshift cluster). Data can be loaded in this data-warehouse using our internal ETL functions.
  • Directly from an external SQL database (i.e. a customer SQL database) or API (e.g. the Google Analytics API).
  • From a file loaded in Serenytics.

The architecture is composed of three main modules:

  • The Backend, responsible for providing the Serenytics REST API
  • The Frontend, a webapp to connect datasources, create dashboards, dataflows... It's an advanced client for our REST API.
  • The Python client, a library to simplify the use of our API in Python.

Backend architecture

Here is the schema of our backend architecture (click to enlarge):

Serenytics architecture

Our backend is written in Python. It uses a queue-worker architecture:

  • A web server receives the request sent to api.serenytics.com. For small requests (e.g. authentification, get my list of dashboards...), it answers itself. For larger requests (e.g. query data in a datasource, generate a PDF...), it just creates a task in the queue.
  • Workers compute tasks put in the queue.
Key-points about the web server:
  • The web server uses the Flask framework, served with Gunicorn.
  • Models are stored in a PostGreSQL database (we use JSON fields to store complex fields such as dashboard models).
  • It uses MRQ for queuing, a distributed task queue built on top of MongoDB, Redis and Gevent.
  • There is a cache system using a Redis database. For example, if a data query (e.g. "group sales, nb_items by country") has already been computed and is still valid, the web server will reply with this data instead of creating a task in the queue to compute the data query on the data-source (e.g. a SQL database).
Key-points about the workers:
  • The most common task is to query a datasource connected by the user (e.g. a SQL database). For such a request, the web server has queued a job with the query details as parameters in JSON format (e.g. {'groupby':['country'], metrics:['sales', 'nb_items']}). The first job of the worker is to transform this JSON in a query understandable by the queried datasource (e.g. in SQL if the datasource is a SQL database). And then, it actually runs the generated query (e.g. the SQL query), waits for it to be executed and returns the results. This is mostly based on the SQLAlchemy library.
  • One particular task is the Python script (i.e. the user has written Python code directly in Serenytics). For such a task, the worker executes the Python code in a dedicated Docker container.
  • Some tasks can be very short (a few milliseconds), and some others may last for dozens of minutes.
Here are some tricky parts of the whole process:
  • In the worker, translating a JSON to a SQL query is not trivial. Especially when the input JSON contains formulas defined on the datasource. And that would be nice if SQL databases spoke the same language, but they don't (PostGreSQL SQL is close to MySQL SQL, but it's not exactly identical).
  • Serenytics is multi-tenant, meaning that some resources are shared among users (such as workers and data-warehouses). The web server has a policy to avoid allocating all the resources to a single user.
  • Data queries are cancelable (e.g. if their result is not required anymore by the client). For example, if a user opens a dashboard tab and this triggers very long queries on the Redshift data-warehouse, but after a short time, he opens another tab, we cancel the queries for the first tab to run as soon as possible the queries from the new tab.
  • In the web server, the choice to read a result in the cache or to compute it depends on many parameters that could invalidate the cache.

Frontend architecture

The Frontend architecture is an AngularJS application. It is mostly a CRUD (create/read/update/delete) application on the objects managed by Serenytics:

  • Datasources
  • Dashboards
  • Tasks
Datasources and Tasks are simple models and their CRUDing is also simple. The dashboard model is a lot more complicated and thus displaying and editing a dashboard requires a more complex application.

Python client
The Python client is a simple wrapper around the Serenytics API. It is a lot simpler to use it rather than directly calling the Serenytics REST API. It provides high-level functions such as:
  • Send an email with a dashboard attached as a PDF.
  • Get data from a datasource in a Pandas dataframe.
  • Trigger another Serenytics script.
  • ...
Serenytics deployment
Our modules are dockerized and this makes deployment very easy. The containers to deploy are:
  • Backend web servers
  • Workers
  • Frontends (to serve it through a NGinx server)
This is scalable: we can add more backend servers and configure the load balancers to use them. And adding new workers is very straightforward. In practice, we use several queues and we adapt the capacity (i.e. nb of workers) to each queue depending on the type of tasks it processes. These containers require access to:
  • A PostGreSQL database for models.
  • One or several data-warehouses (AWS Redshift or a PostGreSQL).
  • A Redis (for queuing and cache engine).
  • A MongoDB (for queuing).
  • A file storage system (we use AWS S3 but it's also compatible with a local storage).
Mostly, we deploy on our multi-tenant AWS architecture. But we also deploy on our local machines for development (in this case, we also have Docker containers providing all the required services such as the PostGreSQL db, Redis, MongoDB...).
Conclusion

This article was intended to give an overview of the Serenytics architecture.

Feel free to contact us at contact@serenytics.com if you have any feedback. We'll be happy to discuss it with you.

Try Serenytics for free - 14 Days
>> Back to tutorials menu
×