Introduction to GlusterFS (File System) [Master-Slave] Backup Server and Installation on RHEL/CentOS and Fedora


Introduction to GlusterFS

We are living in a world where data is growing in an unpredictable way, and we need to store this data, whether structured or unstructured, in an efficient manner. Distributed computing systems offer a wide array of advantages over centralized computing systems: here data is stored in a distributed way, with several nodes acting as servers.


What is GlusterFS?

GlusterFS is a distributed file system designed to run in user space, i.e. as a File System in Userspace (FUSE). It is a software-based file system, which accounts for much of its flexibility.

Look at the following figure, which schematically represents the position of GlusterFS in a hierarchical model. By default, GlusterFS uses the TCP protocol.

GlusterFS is an open source, distributed file system capable of scaling to several petabytes (actually, 72 brontobytes!) and handling thousands of clients. GlusterFS clusters together storage building blocks over Infiniband RDMA or TCP/IP interconnect, aggregating disk and memory resources and managing data in a single global namespace. GlusterFS is based on a stackable user space design and can deliver exceptional performance for diverse workloads.

 

Figure above: GlusterFS – One Common Mount Point

GlusterFS supports standard clients running standard applications over any standard IP network. Figure 1, above, illustrates how users can access application data and files in a Global namespace using a variety of standard protocols.

No longer are users locked into costly, monolithic, legacy storage platforms. GlusterFS gives users the ability to deploy scale-out, virtualized storage – scaling from terabytes to petabytes in a centrally managed and commoditized pool of storage.

Attributes of GlusterFS


  • Scalability and Performance
  • High Availability
  • Global Namespace
  • Elastic Hash Algorithm
  • Elastic Volume Manager
  • Standards-based

Advantages of GlusterFS

  1. Innovation – It eliminates the metadata and can dramatically improve performance, which helps us to unify data and objects.
  2. Elasticity – Adapts to the growth and reduction of the size of the data.
  3. Scale Linearly – It scales to petabytes and beyond.
  4. Simplicity – It is easy to manage and, running in user space, independent of the kernel.

What makes Gluster outstanding among other distributed file systems?

  1. Scalable – Absence of a metadata server provides a faster file system.
  2. Affordable – It deploys on commodity hardware.
  3. Flexible – As I said earlier, GlusterFS is a software-only file system. Here data is stored on native file systems like ext4, xfs, etc.
  4. Open Source – Currently GlusterFS is maintained by Red Hat Inc, a billion-dollar open source company, as part of Red Hat Storage.

Storage concepts in GlusterFS

Storage concepts in GlusterFS

  1. Brick – Basically any directory that is meant to be shared among the trusted storage pool.
  2. Trusted Storage Pool – A collection of these shared files/directories, based on the designed protocol.
  3. Block Storage – Devices through which data is moved across systems in the form of blocks.
  4. Cluster – In Red Hat Storage, both cluster and trusted storage pool convey the same meaning: a collaboration of storage servers based on a defined protocol.
  5. Distributed File System – A file system in which data is spread over different nodes, and users can access a file without knowing its actual location; the user does not get the feel of remote access.
  6. FUSE – A loadable kernel module that allows users to create file systems above the kernel without involving any kernel code.
  7. glusterd – The GlusterFS management daemon, the backbone of the file system, which runs the whole time the servers are in an active state.
  8. POSIX – Portable Operating System Interface (POSIX) is the family of standards defined by the IEEE as a solution to compatibility between Unix variants, in the form of an Application Programming Interface (API).
  9. RAID – Redundant Array of Independent Disks (RAID) is a technology that gives increased storage reliability through redundancy.
  10. Subvolume – A brick after being processed by at least one translator.
  11. Translator – A translator is the piece of code which performs the basic actions initiated by the user from the mount point; it connects one or more subvolumes.
  12. Volume – A volume is a logical collection of bricks. All the operations are based on the different types of volumes created by the user.

Installation of GlusterFS in RHEL/CentOS and Fedora

In this article, we will be installing and configuring GlusterFS for the first time for high availability of storage. For this, we’re taking two servers to create volumes and replicate data between them.

Step 1: Have at least two nodes

  1. Install CentOS 6.5 (or any other OS) on two nodes.
  2. Set the hostnames to “server1.glusterfs.com” and “server2.glusterfs.com“.
  3. A working network connection.
  4. A storage disk on both nodes named “/glusterfsvolume“.

Step 2: Log in to both the Linux machines over SSH.


 

Configure SELinux and iptables

Open ‘/etc/sysconfig/selinux‘ and change SELinux to either “permissive” or “disabled” mode on both the servers. Save and close the file.

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

:wq!
Save the file and reboot the machine.

# init 6
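The same edit can also be scripted instead of done by hand in vi. The sketch below is a hypothetical demo that operates on a local copy of the file; on a real server you would target /etc/sysconfig/selinux as root.

```shell
# Demo copy standing in for /etc/sysconfig/selinux (an assumption for
# illustration; do not run sed against the real file blindly).
cfg=./selinux.demo
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$cfg"

# Rewrite the SELINUX= line so SELinux is disabled at the next boot.
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"

grep '^SELINUX=' "$cfg"
```

On both servers, the equivalent one-liner against the real file would be `sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/sysconfig/selinux`, followed by the reboot shown above.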

Next, flush iptables on both nodes, or else add rules allowing access from the other node.

# iptables -F


Here is something about iptables that I would love to share with you.

Note: Whenever I add iptables rules on the terminal, I just need to save them; on the other hand, if I make changes in the iptables configuration file, I need to restart iptables for the latest firewall rules to take effect.

Step 3: Set the hostnames of both the machines.


# vi /etc/hosts
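On each machine, add an entry for the other machine to /etc/hosts. Hypothetical entries (the IP addresses are assumptions for illustration) might look like this:

```
192.168.1.10   server1.glusterfs.com   server1
192.168.1.11   server2.glusterfs.com   server2
```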


Then save the file on both sides.

To make the hostnames effective, restart the machines on both sides; or, if you do not want to restart, you can simply set the hostname with the hostname command. Please follow the commands below.

# init 6 (To reboot the machines.)

# hostname server1.glusterfs.com

# hostname server2.glusterfs.com

Step 4: A working network connection

Both machines should be connected to each other over a working network.

Step 5: Enable EPEL and GlusterFS Repository

Before installing GlusterFS on both servers, we need to enable the EPEL and GlusterFS repositories in order to satisfy external dependencies. Use the following link to install and enable the EPEL repository on both systems.

Next, we need to enable the GlusterFS repository on both servers.

# yum install wget  -y

(Sometimes in a minimal installation the wget command is not present, so we need to install wget via yum.)

# cd /etc/yum.repos.d

# wget http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo


Step 6: Installing GlusterFS

Install the software on both servers.

# yum install glusterfs-server -y

Start the GlusterFS management daemon.

# service glusterd start

Now check the status of daemon.

# service glusterd status


Step 8: Configure the Trusted Pool

Run the following command on ‘Server1‘ (server1.glusterfs.com).

# gluster peer probe server2.glusterfs.com

Run the following command on ‘Server2‘ (server2.glusterfs.com) to check.

# gluster peer probe server1

Error which you will get sometimes:

[root@server1 ~]# gluster peer probe server2
peer probe: failed: Probe returned with unknown errno 107

If you get unknown errno 107 when you do a “gluster peer probe othermachine” then you have got a network problem.

Check the following:

  • check the output of iptables-save for any firewall rules that might be interfering, on both machines
  • make sure glusterd is running on both machines
  • make sure you can actually ping the other machine
  • both machines need to have a valid gateway set up

It is probably one of those.  In any case, it is a network problem.

After checking all these, I came to a resolution when I did this for the first time; engineers at a fresher or initial level may well face these kinds of issues. The most important thing is for the machines to have connectivity on the LAN or network. You also need to keep each server's hostname entry in the other's hosts file, so that they can recognize each other easily.
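Part of that check can be scripted. Below is a hypothetical helper (the function name, demo file, and IP address are assumptions, not from this article) that verifies a peer's hostname entry is present in a hosts-format file before you probe it; on a real server you would pass /etc/hosts.

```shell
# Hypothetical helper: confirm a peer hostname is listed in a hosts file
# before running `gluster peer probe`.
check_peer() {
    local hostsfile=$1 peer=$2
    if grep -qw "$peer" "$hostsfile"; then
        echo "$peer found in $hostsfile"
    else
        echo "$peer missing from $hostsfile" >&2
        return 1
    fi
}

# Demo against a sample hosts file with an assumed IP address:
printf '192.168.1.11 server2.glusterfs.com server2\n' > ./hosts.demo
check_peer ./hosts.demo server2.glusterfs.com
```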

Step 9: Set up a GlusterFS Volume

On both server1 and server2.

# mkdir /glusterfsvolume/kvit


Step 10: Create a volume on any single server and start the volume. Here, I’ve taken ‘Server1‘.

# gluster volume create kvit replica 2  server1.glusterfs.com:/glusterfsvolume/kvit server2.glusterfs.com:/glusterfsvolume/kvit

volume create: kvit: failed: The brick server1.glusterfs.com:/glusterfsvolume/kvit is being created in the root partition. It is recommended that you don’t use the system’s root partition for storage backend. Or use ‘force’ at the end of the command if you want to override this behavior.

Here, please have a look at the snapshot below.


Now follow the snapshot below and you will overcome the error. In the previous section, when I was trying to create the volume, it failed because, as the message clearly shows, the shared directory sits on the / (root) partition itself. So we need to run the command forcefully by putting ‘force‘ at the end.

Now, what I have done:

# gluster volume create kvit replica 2 server1.glusterfs.com:/glusterfsvolume/kvit server2.glusterfs.com:/glusterfsvolume/kvit force

volume create: kvit: success: please start the volume to access data


[root@server1 ~]# gluster volume start kvit
volume start: kvit: success


Next, confirm the status of volume.

[root@server1 ~]# gluster volume info
Volume Name: kvit
Type: Replicate
Volume ID: b06e14dc-7f8b-428e-88ff-a3597891a4c7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server1.glusterfs.com:/glusterfsvolume/kvit
Brick2: server2.glusterfs.com:/glusterfsvolume/kvit

Glusterfs logs

Note: If the volume does not start, the error messages are logged under ‘/var/log/glusterfs‘ on one or both of the servers.

[root@server1 ~]# tail -f /var/log/glusterfs/glustershd.log

[2015-05-12 12:13:02.072602] I [client-handshake.c:1413:select_server_supported_programs] 0-kvit-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)

[2015-05-12 12:13:02.077723] I [client-handshake.c:1200:client_setvolume_cbk] 0-kvit-client-0: Connected to kvit-client-0, attached to remote volume ‘/glusterfsvolume/kvit’.

[2015-05-12 12:13:02.077793] I [client-handshake.c:1210:client_setvolume_cbk] 0-kvit-client-0: Server and Client lk-version numbers are not same, reopening the fds

[2015-05-12 12:13:02.077891] I [MSGID: 108005] [afr-common.c:3669:afr_notify] 0-kvit-replicate-0: Subvolume ‘kvit-client-0’ came back up; going online.

[2015-05-12 12:13:02.078061] I [client-handshake.c:188:client_set_lk_version_cbk] 0-kvit-client-0: Server lk version = 1

[2015-05-12 12:13:02.120142] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-kvit-client-1: changing port to 49152 (from 0)

[2015-05-12 12:13:02.126013] I [client-handshake.c:1413:select_server_supported_programs] 0-kvit-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)

[2015-05-12 12:13:02.129557] I [client-handshake.c:1200:client_setvolume_cbk] 0-kvit-client-1: Connected to kvit-client-1, attached to remote volume ‘/glusterfsvolume/kvit’.

[2015-05-12 12:13:02.129627] I [client-handshake.c:1210:client_setvolume_cbk] 0-kvit-client-1: Server and Client lk-version numbers are not same, reopening the fds

[2015-05-12 12:13:02.136177] I [client-handshake.c:188:client_set_lk_version_cbk] 0-kvit-client-1: Server lk version = 1

Just have a look at the above logs; they show how the GlusterFS connectivity between the two Linux machines is established.



Figure above: Logs of GlusterFS at server1.glusterfs.com

Have a look at logs.

Mount and Verify GlusterFS Volume

Mount the volume to a directory under ‘/mnt‘.

# mount -t glusterfs server1.glusterfs.com:/kvit  /mnt

(/mnt is the mount point)


For permanent mounting, you can add an entry in the /etc/fstab file.
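As a sketch, such an /etc/fstab entry could look like the line below; the `_netdev` option tells the system to wait for the network before attempting the mount.

```
server1.glusterfs.com:/kvit  /mnt  glusterfs  defaults,_netdev  0 0
```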

Now you can test this by creating a directory on the server1 machine; you will see the same output on the server2 machine. Since this is master-slave, this storage technique will act as a backup of one machine on another.

The work will be done on the server1 machine, and the server2 machine will act as a backup for server1.

 


You can restore data from the slave to the master volume, whenever the master volume becomes faulty for reasons such as hardware failure.

That's all about this GlusterFS article; it is a very useful backup procedure. Stay tuned to linuxgateway.in for more such interesting Linux articles.

For any queries and questions, please mail us at linux@kvit.in or lalitvohra04@gmail.com. We will surely help you out.
