Installing a large Linux cluster, Part 3: Storage and shared file systems

Large Linux cluster storage backend

Graham White (gwhite@uk.ibm.com), Systems Management Specialist, IBM
Mandie Quartly (mandie_quartly@uk.ibm.com), IT Specialist, IBM

Summary:  Create a working Linux cluster from many separate pieces of hardware and software, including System x and IBM TotalStorage systems. Part 3 provides the first half of the instructions you need to set up the storage backend, including details on storage architecture, needed hardware, and the Storage Area Network.

Date:  04 May 2007
Level:  Advanced
Also available in:  Chinese, Russian

Introduction

This is the third in a series of articles that cover the installation and setup of a large Linux computer cluster. The purpose of the series is to bring together in one place up-to-date information from various sources in the public domain about the process required to create a working Linux cluster from many separate pieces of hardware and software. These articles are not intended to provide the basis for the complete design of a new large Linux cluster; refer to the relevant reference materials and Redbooks mentioned throughout for general architecture pointers.

This series is written for systems architects and systems engineers planning and implementing a Linux cluster using the IBM eServer Cluster 1350 framework (see Resources for more information about the framework). Some parts might also be relevant to cluster administrators for educational purposes and during normal cluster operation. Each part of the series refers to the same example installation.

Part 1 of the series provides detailed instructions for setting up the hardware for the cluster. Part 2 takes you through the next steps after hardware configuration: software installation using the IBM systems management software, Cluster Systems Management (CSM), and node installation.

This third part is the first of two articles that describe the storage backend of the cluster. Together, these two articles cover the storage hardware configuration and the installation and configuration of the IBM shared file system, General Parallel File System (GPFS). This third part takes you through the architecture of the storage system, hardware preparation, and details about setting up a Storage Area Network. The fourth and final part of the series provides details about CSM specifics related to the storage backend of our example cluster, notably performing node installation for the storage system, and GPFS cluster configuration.

Storage architecture

Before continuing, you will benefit from reviewing the General cluster architecture section in Part 1 of this series.

Figure 1 shows an overview of the storage configuration used for the example cluster described in this series. The configuration is explained in more detail throughout this article. This setup is based on GPFS version 2.3. It includes one large GPFS cluster split into two logical halves with a single large file system. The example design provides resilience in case of a disaster where, if one half of the storage backend is lost, the other can continue operation.


Figure 1. Storage architecture overview

Figure 1 shows four storage servers that manage the storage provided by two disk subsystems. In the top right-hand corner, you can see a tie-breaker server. The network connections and fiber channel connections are shown for reference. All are described in further detail in the following sections. The rest of the cluster is shown as a cloud and will not be addressed in this article. For more details about the rest of the cluster, see Part 1 and Part 2 of this series.

Nodes

The majority of the nodes within this GPFS cluster are running Red Hat Enterprise Linux 3. The example uses a server/client architecture, where a small subset of servers has visibility of the storage using a fiber channel. They act as network shared disk (NSD) servers to the rest of the cluster. This means that most of the members of the GPFS cluster access the storage over IP using the NSD servers. There are four NSD nodes (also known here as storage nodes) in total: two in each logical half of the GPFS cluster. These are grouped into pairs, where each pair manages one of the storage subsystems.

Tiebreaker

As each half of the cluster contains exactly the same number of nodes, should one half be lost, quorum becomes an issue. With GPFS, for the file system to remain available, a quorum of nodes needs to be available. Quorum is defined as quorum = (number of quorum nodes / 2) + 1.
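
For illustration, assume the four storage nodes and the tie-breaker are the designated quorum nodes; the actual quorum node assignment for this cluster is made during the GPFS configuration described in Part 4:

  quorum nodes = 4 storage nodes + 1 tie-breaker = 5
  quorum       = (5 / 2) + 1 = 2 + 1 = 3   (the division rounds down)

Losing one half of the cluster removes only two of the five quorum nodes, leaving three, so quorum is maintained. Without the tie-breaker, only two of four would remain and the file system would become unavailable.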

In a case such as this configuration, where the cluster is made of two identical halves, the GPFS file system becomes unavailable if either half is lost. To avoid this situation, the system employs a tie-breaker node. This node is physically located away from the main cluster. This means that should either half become unavailable, the other half can continue accessing the GPFS file system. This is also made possible by the use of three failure groups, which are further explained under Data replication. This means two copies of the data are available: one in each half of the cluster.

Infrastructure

As illustrated in Figure 1, each node is connected to two networks. The first of these is used for compute traffic and general cluster communication. The second network is dedicated to GPFS and is used for storage access over IP for those nodes that do not have a direct view of the Storage Area Network (SAN) storage system. This second network uses jumbo frames for performance. See the GPFS network tuning section in Part 4 of the series for more details on the storage network.
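
For example, on the Red Hat storage network interfaces, jumbo frames can be enabled by setting a 9000-byte MTU. The interface name and addresses below are illustrative, and the switch ports on the storage network must also be configured to accept this frame size:

  # /etc/sysconfig/network-scripts/ifcfg-eth1 -- dedicated GPFS/storage network
  DEVICE=eth1
  BOOTPROTO=static
  IPADDR=192.168.100.11
  NETMASK=255.255.255.0
  # Jumbo frames for the storage network
  MTU=9000
  ONBOOT=yes

  # Apply immediately without restarting the network (lasts until the interface is restarted)
  ifconfig eth1 mtu 9000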

Storage Area Network storage

The storage backend of this solution comprises two IBM TotalStorage DS4500 (formerly FAStT 900) disk subsystems, each with a number of fully populated EXP710 expansion disk drawers attached. Each DS4500 is configured into RAID 5 4+P arrays plus some hot spare disks.

Each DS4500 is owned by a pair of storage servers. The architecture splits the 4+P arrays between the two servers so that each server is the primary server for the first half of the arrays and the secondary server for the other half of the arrays. This way, should one of the servers fail, the other server can take over as primary for the disks from the failed server.

Data replication

This example has GPFS replicate the data and metadata on the GPFS file system. The storage is split into three failure groups. A failure group is a set of logical disks that share a common point of failure. (As seen from the operating system, here a disk corresponds to one LUN, which is one disk array on the DS4500.) The failure groups in this system, sketched as GPFS disk descriptors after the list below, are made up of the following:

  • One DS4500 system in failure group one
  • One DS4500 system in failure group two
  • A local disk belonging to the tie-breaker node
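
As an illustration of how these failure groups translate into GPFS disk definitions, the following disk descriptor file uses the GPFS 2.3 format DiskName:PrimaryNSDServer:BackupNSDServer:DiskUsage:FailureGroup. All device and server names are hypothetical; the real descriptor file for this cluster is built in Part 4:

  # One line per LUN seen by the storage servers
  # First DS4500 -> failure group 1
  sdb:stor01:stor02:dataAndMetadata:1
  sdc:stor02:stor01:dataAndMetadata:1
  # Second DS4500 -> failure group 2
  sdb:stor03:stor04:dataAndMetadata:2
  sdc:stor04:stor03:dataAndMetadata:2
  # Local disk on the tie-breaker node, holding only a file system descriptor -> failure group 3
  sdb:tiebreak01::descOnly:3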

When you create the GPFS file system, specify the number of copies of data and metadata as two. With the failure groups defined above, each half then contains one copy of the file system. The third failure group is required to solve disk quorum issues so that, should either half of the storage go offline, disk quorum is satisfied and the file system remains accessible.
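
A sketch of the matching file system creation command follows. The mount point, device name, and descriptor file name are illustrative, and the full GPFS cluster and file system configuration is covered in Part 4:

  # Default (-m/-r) and maximum (-M/-R) replicas of metadata and data both set to two
  mmcrfs /gpfs /dev/gpfs0 -F disk.desc -A yes -m 2 -M 2 -r 2 -R 2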

Hardware preparation

As mentioned, this cluster contains two IBM TotalStorage DS4500 devices, which form the storage backend of the solution. You can find more information about this hardware under Resources.

IBM couples each DS4500 system with IBM TotalStorage DS4000 EXP710 fiber channel (FC) storage expansion units. Each of these is a 14-bay, 2 Gbps rack-mountable FC enclosure. You can find more details about this hardware in the Resources section.

The following section covers in some detail the configuration of the DS4500 and EXP710 units within the example solution.

Order of powering on and off

Note that you need to power the SAN system on and off in a specific order so that all storage is discovered correctly. Power on in the following order:

  1. SAN switches (and allow them to fully initialize)
  2. EXP 710 drawers
  3. DS4500 (and allow it to fully initialize)
  4. Storage servers

Power off in the opposite order, as follows:

  1. Storage servers
  2. DS4500
  3. EXP 710
  4. SAN switches

Connections

Figure 2 shows the rear of a DS4500 unit. On the left-hand side are four mini-hub ports for host connectivity. In this article, these are referred to as slots 1 to 4, numbered from left to right, as shown in Figure 1. Slots 1 and 3 correspond to the top controller, which is controller A. Slots 2 and 4 correspond to the bottom controller, which is controller B. On the right-hand side are four mini-hub ports for expansion drawer (EXP710) connectivity.


Figure 2: Rear view of a DS4500

Cabling

Each DS4500 is cabled into two loops as shown in Figure 3.


Figure 3: Example cabling for a DS4500 and EXP drawers

Set EXP enclosure IDs

Each EXP 710 drawer must have a unique ID. These are set using the panel on the back of each enclosure.

Configure IP addresses for DS4500 controllers

Set the IP address of each controller using the serial port at the back of each enclosure. You can use HyperTerminal on Windows or minicom on Linux (a sample minicom invocation follows the list below). The example uses the following settings:

  • baud 38400
  • bits 8
  • parity no
  • stop bits 1
  • flow xon/xoff
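
On Linux, a minicom invocation along these lines matches those settings. The serial device name is an assumption for your management station; 8 data bits, no parity, and 1 stop bit are the minicom defaults, and software (xon/xoff) flow control can be confirmed from the minicom configuration menu (Ctrl-A O):

  # 38400 baud on the first serial port; adjust the device name to suit
  minicom -b 38400 -D /dev/ttyS0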

Make the connection by sending a break (Ctrl-Break using HyperTerminal), then hitting the space bar to set the speed. Then, send another break and use the escape key to enter the shell. The default password is infiniti.

Use the command netCfgShow to show the current IP settings of the controller, and use the command netCfgSet to set the desired IP address, subnet mask, and gateway.

Discover DS4500 from Storage Manager

After this point, the DS4500 is managed using the Storage Manager (SM) software. Use the latest version (9.1 or higher) with new hardware.

You can use Storage Manager to:

  • Configure arrays and logical drives
  • Assign logical drives to storage partitions
  • Replace and rebuild failed disk drives
  • Expand the size of arrays
  • Convert from one RAID level to another

You can also troubleshoot and perform management tasks, such as checking the status of the TotalStorage subsystem and updating the firmware of RAID controllers. See Resources for the latest version of Storage Manager for your hardware.

The SM client can be installed on a variety of operating systems. In the example described in this article, the SM client is installed on the management server. Discover the newly configured DS4500 from the SM client using the first button on the left, which has a wand on it. To perform operations on a DS4500 seen through this interface, double-click the computer name to open a new window.
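
If you prefer the command line, the SMcli utility that ships with Storage Manager can add a subsystem by its controller IP addresses; the addresses below are placeholders for the values you set earlier:

  # Add a DS4500 to the management domain by naming both controller addresses
  SMcli -A 192.168.50.101 192.168.50.102

  # List the subsystems Storage Manager now knows about
  SMcli -d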

General DS4500 controller configuration steps

First, rename the DS4500 by going to Storage Subsystem > Rename…, and enter a new name. Next, check that the controller clocks are synchronized by going to Storage Subsystem > Set Controller Clock. Now, set the system password by going to Storage Subsystem > Change > Password.

Update firmware for DS4500 and EXP 710 drawers

To check system firmware levels from the Storage Manager, go to Advanced > Maintenance > Download > Firmware. The current levels are listed at the top of this window. You can download newer versions onto the subsystem from here, but be sure to use the correct firmware for the model and to upgrade levels in the order specified in any notes that come with the firmware code. The firmware for the disks and the ESMs can also be checked from the Download menu.

Manual configuration versus scripted configuration

The following sections detail the manual setup of a DS4500. Follow these steps for the initial configuration of one of the DS4500s in this solution, then save the configuration of the first DS4500. This action produces a script that you can then use to reproduce the configuration on the same DS4500 should it be reset or replaced with new hardware.

You can replicate this script and edit it for use on the other DS4500 to allow easy and accurate reproduction of a similar setup. You need to change the fields containing the name for the DS4500, disk locations, array names, and mapping details for hosts (that is, the World Wide Port Names [WWPNs]). Note that these scripts leave the Access LUN in the host group definition; remove it manually on each DS4500.
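
A sketch of replaying such a saved script from the management server with SMcli follows; the controller addresses and file names are placeholders:

  # Run the edited configuration script against the second DS4500 and log the output
  SMcli 192.168.50.103 192.168.50.104 -f ds4500b_config.scr -o ds4500b_config.log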

Create hot spare disks

This example keeps a number of disks on each DS4500 as hot spares. These are added by right-clicking the disk to be assigned as a hot spare, choosing the manual option, and entering the password for the DS4500 (set in the General DS4500 controller configuration section).

Create disk arrays

  1. Right-click an unassigned disk to be added to the array, and choose Create Logical Drive.
  2. Click Next in the wizard that appears.
  3. Choose RAID level 5. The original drive is already selected.
  4. Add the four other drives to the array to make five in total.
  5. Click OK on the Array Success window to create a logical drive on this array.
  6. Choose the default option, where the whole of the LUN is used for one logical drive. The naming convention used for the logical drive name is _array. Under Advanced Parameters, choose Customize Settings.
  7. In I/O Characteristics type, use the default, which is File System, and choose the preferred slot so that the arrays alternate between A and B. In this example, odd-numbered arrays are on slot A and even-numbered arrays are on slot B.
  8. Choose Map Later to return to mapping at a later time.

You see a green cylinder with a clock next to it while you create this array. You can check your progress by right-clicking the logical drive name and choosing Properties.

Note that the steps beyond this point require that you have configured the SAN switches and installed and run the storage servers with the host bus adapters (HBAs) configured so that the WWPNs of the HBAs are seen at the SAN switches and, therefore, by the DS4500. See the SAN infrastructure section below and the HBA configuration section in Part 4 of the series for details about these steps.

Storage partitioning and disk mapping

Once LUNs are created, they need to be assigned to hosts. In this example, use storage partitioning. Define storage partitions by creating a logical-drive-to-LUN mapping. This grants a host or host group access to a particular logical drive. Perform these steps in order when defining storage partitioning. You will initially define the topology and then the actual storage partition:

  1. Define the host group.
  2. Define the hosts within this group.
  3. Define the host ports for each host.
  4. Define the storage partition.

As already described, in this setup there is only one host group per DS4500, containing the two storage nodes between which all disks on that DS4500 will be twin-tailed. All LUNs are assigned to this group, with the exception of the Access LUN, which must not be assigned to this group. The Access LUN is used for in-band management of the DS4500. However, it is not supported by Linux and must be removed from any host groups created.

Create a new host group by right-clicking the Default Group section and selecting Define New Host Group, and enter the host group name. Create a new host by right-clicking the host group you created and selecting Define Host Port. In the pull-down menu, select the WWPN corresponding to the HBA to be added. Note that for the WWPN to appear in this menu, you must have configured and zoned the host correctly in the SAN. Storage Manager will then see the port under Show All Host Port Information. Choose the Linux host type, and enter the host port name in the final box.

Repeat this step so that each host has both ports defined. Next, create the storage partition by right-clicking the newly created host group and selecting Define Storage Partition. This opens the Storage Partitioning wizard. Click Next to start the wizard. Select the host group you just created, and click Next. Choose the LUNs you previously defined to include them here. Note that you must not include the Access LUN here. Click Finish to finalize this selection.

SAN infrastructure

This section explains the steps to set up the SAN infrastructure in a cluster. The SAN switches used in the example configuration are IBM TotalStorage SAN Switch H16 switches (2005-H16). See Resources for more details about this hardware.

It covers in some detail the configuration of the SAN switches, referring specifically to commands and interfaces for H16 switches as examples.

Configure IP addresses and hostnames for H16 SAN switches

To perform the initial configuration of the IP addresses on the H16 SAN switches, connect using the serial cable that comes with the switch (black ends, not null modem) into the port at the back of the computer. Use these connection settings:

  • 9600 baud
  • 8 data bits
  • No parity
  • 1 stop bit
  • No flow control

Use the default login details: username admin and password password. Change the hostname and IP address using the command ipAddrSet. Verify the settings using the command ipAddrShow.
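
A minimal illustrative session over the serial connection is shown below. The prompts and default values are an approximation for this generation of switch firmware and vary between releases, and all addresses are placeholders:

  switch:admin> ipAddrSet
  Ethernet IP Address [192.168.74.102]: 192.168.50.21
  Ethernet Subnetmask [255.255.255.0]:
  Fibre Channel IP Address [none]:
  Fibre Channel Subnetmask [none]:
  Gateway Address [192.168.74.1]: 192.168.50.1
  switch:admin> ipAddrShow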

Once the IP addresses are configured, you can manage the SAN switches with the Web interface. Connect to a SAN switch using its IP address from a browser with a Java plug-in. To access the Admin interface, click the Admin button and enter the username and password. At this point, you can enter the new name of the switch into the box indicated and apply the changes.

The domain ID must be unique for every switch in a fabric. In this example, the switches are contained in their own fabric, but the IDs are changed in case of future merges. Note that the switch needs to be disabled before you can change the domain ID.
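
From the command line, the equivalent sequence looks roughly like the following; the configure command walks through the fabric parameters interactively, and only the domain ID needs to be changed:

  switch:admin> switchdisable
  switch:admin> configure
  (answer yes to the Fabric parameters section, enter the new unique Domain ID,
   and accept the defaults for all other settings)
  switch:admin> switchenable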

For future reference, once the network can access the switch, you can change the IP address of the SAN switch using the Admin interface from the Network Config tab. This is an alternative to using a serial connection.

SAN switch zoning

The example cluster uses the following zoning rules:

  • HBA0 (Qlogic fiber card in PCI slot 3) on all hosts zoned to see controller A (slots 1 and 3) of the DS4500
  • HBA1 (Qlogic fiber card in PCI slot 4) on all hosts zoned to see controller B (slots 2 and 4) of the DS4500

You set the zoning of the SAN switches using the Web interface on each switch as described in the previous section. The zoning page can be reached using the far right button in the group in the bottom left-hand corner of the window. To simplify the management of zoning, assign aliases to each WWPN to identify the device attached to the port.

Here is how to create the aliases and assign them to hosts. First, add an alias by clicking Create and entering the name of the alias. Then, choose a WWPN to assign to this newly created alias. You see three levels of detail at each port, as follows:

  1. The host WWN
  2. The WWPN
  3. Comments

Add the second level to the alias by choosing the second level and selecting Add member.

Once you create aliases, the next step is to create zones by combining groups of aliases. This configuration uses zones where each HBA on each host sees only one controller on the relevant DS4500. As explained in the previous section, in this example setup each DS4500 presents its disks to only two hosts. Each host uses a different connection to the controller to spread the load and maximize the throughput. This type of zoning is known as single HBA zoning. All hosts are isolated from each other at the SAN level. This zoning removes unnecessary PLOGI activity from host to host, as well as removing the risk of problems caused by a faulty HBA affecting others. As a result, the management of the switch becomes safer, because modifying each individual zone does not affect the other hosts. When you add a new host, create new zones as well, instead of adding the host to an existing zone.

The final step is to add the zones defined into a configuration that can be saved and then activated. It is useful to produce a switch report, which you can do by clicking the Admin button and then choosing Switch Report. This report contains, in HTML format, all the information you need to manually recreate the configuration of the switch.
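
The same aliases, zones, and configuration can also be built from the Fabric OS command line. The following sketch is illustrative only: the alias names, zone name, configuration name, and WWPNs are made up, but the pattern follows the single HBA zoning scheme described above:

  switch:admin> alicreate "stor01_hba0", "21:00:00:e0:8b:12:34:56"
  switch:admin> alicreate "ds4500a_ctlA_s1", "20:04:00:a0:b8:12:34:56"
  switch:admin> zonecreate "stor01_hba0_ctlA", "stor01_hba0; ds4500a_ctlA_s1"
  switch:admin> cfgcreate "cluster_cfg", "stor01_hba0_ctlA"
  switch:admin> cfgsave
  switch:admin> cfgenable "cluster_cfg"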

Saving the configuration to another server

Once the SAN switch is configured, the configuration can be uploaded to another server using FTP. You can later use this saved file to automatically reconfigure the switch if necessary. Here are the steps to save the configuration file to a server (an example session follows the list):

  1. Set up and start ftp on the server to receive the file.
  2. Log into the SAN switch as admin (default password is password) using telnet.
  3. Enter the configupload command.
  4. Enter the information required: IP address, account and password, and name and location of the file to be created.
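
An illustrative configupload session follows. The server address, account, and file name are placeholders, and the exact prompts vary slightly between firmware levels:

  switch:admin> configupload
  Server Name or IP Address [host]: 192.168.50.10
  User Name [user]: ftpuser
  File Name [config.txt]: /configs/sanswitch1_config.txt
  Password: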

Updating firmware

You can update the firmware by downloading it from an FTP server. Here are the steps to follow (an example session appears after the list):

  1. Set up the FTP server and uncompress the firmware tar package into the ftp directory.
  2. Use the firmwareshow command to check the current firmware level.
  3. Use the firmwaredownload command to start the download process.
  4. Enter the information required: IP address, account and password, and the directory currently holding the firmware followed by release.plist (for example, /pub/v4.4.0b/release.plist). Do not worry if the release.plist file does not appear to exist at this point. The switch downloads and installs the software and then reboots.
  5. Log in as admin and check the status of the update using the command firmwaredownloadstatus.
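
Put together, the sequence looks something like the following. The server address and account are placeholders, the firmware path is the example used above, and the prompts shown are an approximation that varies with the firmware level:

  switch:admin> firmwareshow
  switch:admin> firmwaredownload
  Server Name or IP Address: 192.168.50.10
  User Name: ftpuser
  File Name: /pub/v4.4.0b/release.plist
  Password:
  (the switch downloads the new code, installs it, and reboots)

  switch:admin> firmwaredownloadstatus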

Conclusion

This is only part of setting up the backend of your example cluster. The next steps involve using CSM to complete the setup of the storage backend, which includes performing node installation for the storage system and GPFS cluster configuration. The fourth and final part of this series covers those processes.


Resources

Learn

  • RSS feed for this series: Request notification for the upcoming articles in this series. (Find out more about RSS feeds of developerWorks content.)

  • Review the first two parts of this series:
    • Installing a large Linux cluster, Part 1: Introduction and hardware configuration.
    • Installing a large Linux cluster, Part 2: Management server configuration and node installation.

  • See the IBM TotalStorage DS4500 system reference materials:
    • IBM TotalStorage DS4500 Web page.
    • IBM DS4500 support page.

  • Check out the IBM TotalStorage DS4000 EXP710 fiber channel storage expansion unit reference materials:
    • General IBM EXP710 product page.
    • IBM EXP710 support page.

  • Find the IBM TotalStorage SAN Switch H16 switch reference materials at:
    • General IBM SAN Switch H16 product page.
    • IBM SAN Switch H16 support page.

  • Want more? The developerWorks IBM Systems zone hosts hundreds of informative articles and introductory, intermediate, and advanced tutorials.

  • Stay current with developerWorks technical events and webcasts.

Get products and technologies

  • Get the latest version of Storage Manager for your hardware from the DS4500 download page.

  • Build your next development project with IBM trial software for download directly from developerWorks.

Discuss

  • Exchange information with other developers on the IBM Systems forums and developerWorks blogs.

About the authors

Graham White is a systems management specialist in the Linux Integration Centre within Emerging Technology Services at the IBM Hursley Park office in the United Kingdom. He is a Red Hat Certified Engineer, and he specializes in a wide range of open-source, open-standard, and IBM technologies. Graham's areas of expertise include LAMP, Linux, security, clustering, and all IBM Systems hardware platforms. He received a BSc with honors in Computer Science with Management Science from Exeter University in 2000.

Mandie Quartly is an IT specialist with the IBM UK Global Technology Services team. Mandie performs a cross-brand role, with current experience in both Intel and POWER platform implementations as well as AIX and Linux (Red Hat and SUSE). She specializes in the IBM product General Parallel File System (GPFS). She received a PhD in astrophysics from the University of Leicester.