Next Wednesday, a meetup about Elasticsearch will take place in Paris. My previous post covered a log management solution built on Elasticsearch.
The meetup will give an overview of how Elasticsearch's powerful real-time capabilities enrich Hadoop development. The session will show how to perform real-time searches on top of and within Hadoop, Hive, Pig or Cascading jobs to get better answers, fast.
We’ll cover architectural topics such as index scalability, data locality and partitioning between the Hadoop cluster and the search index, off- and on-premise storage (HDFS, S3, local file systems) and multi-tenancy.
Managing logs is not a complicated task with classic syslog systems (syslog-ng, rsyslog…). However, searching them quickly when you have several gigabytes of logs, in a scalable way and with a nice graphical interface, is another story.
Fortunately, tools that do this very well exist today. Here is the list of tools we’re going to use:
Elasticsearch: Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine.
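To give an idea of what searching logs in Elasticsearch looks like, here is a minimal sketch in Python that only builds a search request body in Elasticsearch's JSON query DSL; it does not talk to a server. It assumes log events are indexed with a full-text `message` field and an `@timestamp` date field (the Logstash convention), and the helper name `build_log_search` is hypothetical. The `filter` clause of a `bool` query assumes Elasticsearch 2.x or later.

```python
import json


def build_log_search(term: str, since: str = "now-15m") -> dict:
    """Build an Elasticsearch query DSL body that searches log lines.

    Hypothetical helper. Assumption: events carry a full-text `message`
    field and an `@timestamp` date field; adapt to your own mapping.
    """
    return {
        "query": {
            "bool": {
                # Full-text match on the log line itself (affects scoring).
                "must": [{"match": {"message": term}}],
                # Time window as a non-scoring filter; "now-15m" is date math.
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        },
        # Newest events first.
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": 50,
    }


if __name__ == "__main__":
    body = build_log_search("connection refused", since="now-1h")
    # This JSON would be sent as the body of POST /<index>/_search.
    print(json.dumps(body, indent=2))
```

Keeping the time range in a filter rather than in the scored part of the query is a common choice: the range only narrows the result set, while relevance comes from the text match.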
As you may know, I use and love MariaDB. A colleague recently introduced me to the benefits of mysqlnd, PHP’s built-in MySQL/MariaDB native driver (shame on me, I had never heard of it). Here is what the MySQL website says about it:
The mysqlnd library is highly optimized for and tightly integrated into PHP. The MySQL Client Library cannot offer the same optimizations because it is a general-purpose client library.
Next Monday, a meetup on Big Data will take place in Paris, featuring lessons learned at Twitter. It will be streamed live on YouTube.
The main talk will explain how Twitter scaled to millions of users:
Twitter is one of the most heavily trafficked sites on the internet, with over 240 million active users and serving over 500 million Tweets per day. Twitter, as a company and as a service has seen remarkable growth in its eight years of existence.
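To put the daily figure from the abstract into throughput terms, the short Python calculation below converts 500 million Tweets per day into an average per-second rate. It assumes a perfectly uniform load over 24 hours, which real traffic is not, so actual peaks are higher; the abstract gives no peak figure, and since it says "over 500 million", the result is a lower bound on the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(per_day: int) -> float:
    """Convert a daily volume into a uniform per-second average."""
    return per_day / SECONDS_PER_DAY


tweets_per_day = 500_000_000  # figure quoted in the talk abstract
rate = average_rate_per_second(tweets_per_day)
print(f"~{rate:,.0f} Tweets per second on average")  # prints "~5,787 Tweets per second on average"
```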
The Percona Replication Manager (PRM) is a framework built on Pacemaker, the Linux-HA cluster resource manager, that manages replication and provides automatic failover. This post covers the installation of the framework on a set of servers. PRM is made of 4 components: Corosync, Pacemaker, the mysql resource agent and MySQL itself.
It’s easy to set up, even more so if you already know Pacemaker, and it works like a charm. Concretely, it sets up one master and N slaves.
Vagrant is a fast way to create virtual machines, built on top of VirtualBox. I already covered it in a previous post.
The thing is, you can do much more with Vagrant by adding Puppet manifests or Chef recipes to your Vagrant configuration file. For those who rarely use either of these tools, deploying software on top of the OS can quickly turn into a nightmare.
I recently gave a training course on MariaDB/MySQL performance tuning and high availability. Here are the slides I made for it (in French).
Topics covered include master -> slave replication, master <-> master replication and Galera Cluster. You can download the PDF here or view it directly on SlideShare:
I hope this helps some of you better understand these advanced features.