Installing a large Linux cluster, Part 1: Introduction and hardware configuration

Getting started installing a large Linux cluster

Graham White (gwhite@uk.ibm.com), Systems Management Specialist, IBM Software Group
Mandie Quartly (mandie_quartly@uk.ibm.com), IT Specialist, IBM

Summary:  Create a working Linux cluster from many separate pieces of hardware and software, including IBM System x and IBM TotalStorage systems. This part in this multipart series covers hardware configuration, including understanding architecture, planning logical network design, setting up terminal servers, and updating firmware.

Date:  06 Dec 2006
Level:  Advanced
Also available in:  Chinese Russian

Introduction to the large Linux cluster series

This is the first of multiple articles that cover the installation and setup of a large Linux computer cluster. The aim of the series is to bring together in one place up-to-date information from various places in the public domain on the process required to create a working Linux cluster from many separate pieces of hardware and software. These articles are not intended, however, to provide the basis for the complete design of a new large Linux cluster. Refer to the reference materials and Redbooks under Resources for general architecture pointers.

The first two parts of this series address the base installation of the cluster and include an overview of the hardware configuration and installation using the IBM systems management software, Cluster Systems Management (CSM). The first article introduces you to the topic and takes you through hardware configuration. The second article covers management server configuration and node installation. Subsequent parts of the series deal with the storage back end of the cluster. They cover the storage hardware configuration and the installation and configuration of the IBM shared file system, General Parallel File System (GPFS).

This series is intended for systems architects and systems engineers to use when they plan and implement a Linux cluster using the IBM eServer Cluster 1350 framework (see Resources). Some parts might also be relevant to cluster administrators for educational purposes and during normal cluster operation.

Part 1: General cluster architecture

A good design is critically important before you undertake any configuration steps. The design has two parts:

  • Physical design
    • Rack layout for each rack type (for example, management racks and compute racks)
    • Floor plan for how the racks will be laid out during both the installation and production use, if the two are different
    • Inter-rack connection diagrams for network, power, console access, and so on
    • Intra-rack cabling for storage, terminal servers and so on
  • Logical design
    • Network design including IP address ranges, subnet configuration, computer naming conventions, and so on
    • CSM configuration for custom script locations, hardware settings, and monitoring requirements
    • Operating system requirements, custom package lists, and system configuration options
    • Storage layout, including file system layout, partitioning, replication, and so on

The example cluster (see Figure 1) consists entirely of Intel or AMD-based IBM Systems computers with attached TotalStorage subsystems (see Resources for more information about these systems). For simplicity, copper gigabit Ethernet cable provides cluster interconnection. This cable provides good speed in most circumstances, with bandwidth increases available between racks using bonded/port-channeled/etherchannel insert-your-favourite-trunking-term-here links.

The network topology takes a star shape, with all racks connecting back to a main switch in the management rack. The example cluster uses three networks: one for management/data (the compute network), one for the clustered file system (the storage network), and one for administrative device management. The first two networks are normal IP networks. The compute network is used for most tasks, including inter-process communications (such as MPI) and cluster management. The storage network is used exclusively for clustered file system communication and access.


Figure 1. Cluster architecture diagram

Some additional design and layout details for the example cluster include:

  • Management server -- Management server function can reside on a single server or multiple servers. In a single server environment, the management server operates in standalone mode. You can also set up highly available management servers. You can use CSM high-availability (HA) software to "heartbeat" between two servers and manage dynamic failover between them if a failure condition occurs. Another possible method of introducing extra management servers is to use a replication setup if HA is not important in your environment. In this situation, you can back up the management server data to another live system, which you can bring online manually to take over management if necessary. In Figure 1, the management network connections are shown in red. The management server is the CSM server, which is used exclusively to control the cluster using CSM functions: taking care of system installation, monitoring, maintenance, and other tasks. In this cluster, there is one management server.

  • Storage servers and disks -- You can connect several storage servers to a disk-based backend using various mechanisms. Connecting storage to the cluster can be direct or through a storage area network (SAN) switch, either by fiber, copper, or a mixture of the two (see Figure 1). These servers provide shared storage access to the other servers within the cluster. If data backup is required, connect the backup device to the storage server using an extra copper or fiber link. For the example cluster, the storage back end is a single entity, providing shared file system access across the cluster. The next article in the series goes into detail about the setup, configuration, and implementation of the storage hardware and clustered file system.

  • User nodes -- Ideally, the compute nodes of a cluster should not accept external connections and should only be accessible to system administrators through the management server. System users can log in to user nodes (or login nodes) in order to run their workloads on the cluster. Each user node consists of an image with full editing capabilities, the required development libraries, compilers, and everything else required to produce a cluster-enabled application and retrieve results.

  • Scheduler nodes -- In order to run a workload on the cluster, users should submit their work to a scheduler node. A scheduler daemon, which runs on one or more scheduler nodes, uses a predetermined policy to run the workloads on the cluster. Like compute nodes, scheduler nodes should not accept external connections from users. The system administrator should manage them from the management server.

  • Compute nodes -- These nodes run the cluster workload, accepting jobs from the scheduler. The compute nodes are the most disposable part of the cluster. The system administrator can easily reinstall or reconfigure them using the management server.

  • External connections -- Example external connections are shown in green in Figure 1. These connections are considered to be outside of the cluster, and therefore, they are not described in this article.

Hardware configuration

After you assemble the racks and put them into place with all cabling completed, there is still a large amount of hardware configuration remaining. Specific cabling details of any particular cluster are not covered in this article. The hardware configuration steps required before cluster installation are described with some specific examples using the example cluster design outlined above.

Logical network design

One of the most commonly overlooked tasks when installing a cluster is the logical network design. Ideally, the logical design should be on paper before cluster implementation. Once you have the logical network design, use it to create a hosts file. In a small cluster, you can write out the hosts file manually if there are not many devices on the network. However, it is usually best to produce a naming convention and write a custom script to produce the file.

Ensure all the devices on the network are represented in the hosts file. Some examples include the following (with example names):

  • Management servers (mgmt001 - mgmtXXX)
  • Storage servers (stor001 - storXXX)
  • Compute nodes (node001 - nodeXXX)
  • Scheduler nodes (schd001 - schdXXX)
  • User nodes (user001 - userXXX)

This naming convention covers only the five types of computer systems in the network and only one network, which is not nearly good enough. There are also the storage network and compute networks to factor in, plus a device management network. So this file needs to be expanded. Each node requiring access to the clustered file system needs an address on the storage network. Each node requires two addresses on the compute network: one for the compute address and another for the Baseboard Management Controller (BMC), which is used for hardware monitoring and power control. Table 1 outlines a much more comprehensive naming convention with example IP address ranges.



Table 1. Host file naming convention
Device                   Compute            BMC                Storage            Device             External
                         192.168.0.0/24     192.168.0.0/24     192.168.1.0/24     192.168.2.0/24     ext n/w
Management server        mgmt001            mgmt001_d          mgmt001_s          mgmt001_m          mgmt001_e
Storage server           stor001            stor001_d          stor001_s          stor001_m          stor001_e
User nodes               user001            user001_d          user001_s          none               none
Scheduler nodes          schd001            schd001_d          schd001_s          none               none
Compute nodes            node001            node001_d          node001_s          none               none
Compute switches         none               none               none               gigb01a            none
Storage switches         none               none               none               gigb01b            none
Terminal servers         none               none               none               term001            none
Storage controller A/B   none               none               none               disk01a/b          none
LCM/KVM/RCM              none               none               none               cons001            none

When implemented, this scheme produces a hosts file like the example you can access under Downloads. This is a small example cluster consisting of sixteen compute nodes, one management server, one storage server, one user node, and one scheduler node in two racks with the relevant devices attached. While not representing a large cluster, this is sufficient for this example cluster, and you can easily extend it to represent far larger clusters if required.
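
As an illustration of the kind of custom script mentioned above, the following minimal sketch generates the compute, BMC, and storage entries for the compute nodes using the Table 1 convention. The node count, the cluster.com domain, and the +100 offset used for BMC addresses are assumptions for the sketch only; adapt them to your own design.

#!/bin/bash
# Sketch: generate hosts entries for compute nodes following Table 1.
# Assumptions: 16 nodes, domain cluster.com, BMC addresses offset by 100
# on the compute subnet, storage addresses mirrored on 192.168.1.0/24.
DOMAIN=cluster.com
NODES=16

for i in $(seq 1 "$NODES"); do
    name=$(printf "node%03d" "$i")
    echo "192.168.0.$i  $name.$DOMAIN  $name"            # compute address
    echo "192.168.0.$((i+100))  ${name}_d.$DOMAIN  ${name}_d"    # BMC address
    echo "192.168.1.$i  ${name}_s.$DOMAIN  ${name}_s"    # storage address
done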

Ethernet switches

There are two physical networks: one for compute traffic and one for storage. A standard 32 nodes per rack requires two 48-port switches in each rack, one for each network. In smaller clusters, the management rack also requires two of the same switches. For larger clusters, 48 ports might not be enough, so a larger central switch might be required.

Each switch for the two main networks (ignoring the device management network) requires a slightly different configuration because, as in the example, the Gigabit Ethernet interconnects use jumbo frames for the storage network and a standard frame size for the compute network. The device management network setup is usually very simple: a flat layer-2 network on a 10/100 switch is acceptable for device management purposes, so no further explanation is needed.
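
Once nodes are installed and their storage interfaces are configured for a 9000-byte MTU, a quick way to confirm that jumbo frames survive end to end is to send a ping that is too large to fragment. The host name below follows the example naming convention, and the 9000-byte MTU is an assumption; adjust the payload size if your network uses a different value.

# 8972 bytes of ICMP payload + 28 bytes of IP/ICMP headers = 9000 bytes,
# so this only succeeds if every hop on the storage path passes jumbo frames.
ping -M do -s 8972 -c 3 stor001_s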

Example A: Extreme Networks switch

Here are the configuration steps for an Extreme Networks Summit 400-48t 48-port Gigabit Ethernet switch.

First, connect to each switch using the serial console port with a straight serial cable (9600, 8-N-1, no flow control) with the default user ID admin and no password. (Just press the Enter key at the prompt.)

For all switches, follow these steps:

  1. Enter unconfig switch all -- Wipes any existing configuration, if required.
  2. Enter configure vlan mgmt ipaddress 192.168.2.XXX/24 -- Sets the management IP address.
  3. Enter configure snmp sysname gigbXXX.cluster.com -- Sets the switch name.
  4. Enter configure sntp-client primary server 192.168.2.XXX -- Sets the NTP server to the management server.
  5. Enter configure sntp-client update-interval 3600 -- Sets time synchronization to hourly.
  6. Enter configure timezone 0 -- Sets the time zone.
  7. Enter enable sntp-client -- Turns on NTP.
  8. Enter configure ports 1-4 preferred-medium copper -- Changes default preferred medium from fiber to copper on ports 1-4, if required.

Now, to configure jumbo frames on the storage network switches, follow these steps:

  1. Enter create vlan jumbo -- Creates the jumbo frames vlan.
  2. Enter configure "mgmt" delete ports 1-48 -- Removes ports from the mgmt vlan.
  3. Enter configure "jumbo" add ports 1-48 -- Adds ports to the jumbo vlan.
  4. Enter configure jumbo-frame size 9216 -- Sets the maximum transmission unit (MTU) size.
  5. Enter enable jumbo-frame ports 1-48 -- Turns on jumbo frame support.

To enable trunking on a 2-port link, use enable sharing 47 grouping 47-48 (group ports 47 and 48, with 47 as the primary).

To complete the configuration, follow these steps:

  1. Enter save configuration primary -- Writes switch configuration to flash in order to survive reboots.
  2. Enter use configuration primary -- Selects the primary configuration as the one the switch uses on the next reboot.

Example B: Force 10 Networks switch

Here are the configuration steps for a Force 10 Networks e600 multi-blade Gigabit Ethernet switch (with two 48-port blades) for routed networks where a central 48-port switch is not big enough.

Configure the chassis, line cards, and ports for an initial layer two configuration by doing the following:

  1. Connect to the switch using the serial console port with a straight serial cable (9600, 8-N-1, no flow control) with default no user ID or password required.
  2. Enter enable -- Enters super-user mode, no password required by default.
  3. Enter chassis chassis-mode TeraScale -- Initializes the switch to tera-scale mode.
  4. Reboot the switch when prompted. This will take a few minutes.
  5. After reboot, connect to the switch and enter super-user mode again by entering enable.
  6. Enter configure -- Enters configuration mode. The prompt looks like Force10(conf)#.
  7. Enter Interface Range GigabitEthernet 0/0 - 47 -- Configures line card 0 ports 0 through 47. The prompt looks like Force10(conf-if-range-ge0/0-47)#.
  8. Enter mtu 9252 -- Sets jumbo frames, if required.
  9. Enter no shutdown -- Allows the port to activate.
  10. Enter exit -- Goes back to configuration mode.
  11. Enter Interface Range GigabitEthernet 1/0 - 47 -- Configures line card 1 ports 0 through 47. The prompt looks like Force10(conf-if-range-ge1/0-47)#.
  12. Repeat steps 7-10 for each line card.

Configure the line cards and ports for layer 3 (VLAN routing) by doing the following:

  1. Connect to the switch and enter super-user configuration mode by typing enable.
  2. Enter int port channel 1 -- Configures port channel 1.
  3. Enter channel-member gig 0/46-47 -- Adds line card 0 ports 46 and 47 to the port channel.
  4. Enter no shutdown -- Allows the port channel to activate; this option overrides port configuration for inactive/active ports.
  5. Enter ip add 192.168.x.x/24 -- Sets the IP address for the port channel; this is the gateway for your subnet.
  6. Enter mtu 9252 -- Sets jumbo frames, if they are required.

Now, turn on the DHCP helper to forward DHCP broadcasts across subnet boundaries by doing the following:

  1. Enter int range po 1-X -- Applies configuration to all the port channels you have configured.
  2. Enter ip helper 192.168.0.253 -- Forwards DHCP to your management server IP address.

Next, configure the switch for remote management (using telnet or SSH) by doing the following:

  1. Enter interface managementethernet 0 -- Configures the management port from the configure prompt.
  2. Enter ip add 192.168.2.x/24 -- Sets an IP address on the device management network and connects the admin port to the device management switch.
  3. Set a user ID and password in order to allow remote connections.

Finally, save the switch configuration, by entering write mem.

After the switch configuration is complete, you can run a few sanity checks on your configuration. Plug in a device, such as a laptop, at various points on the network to check connectivity. Most switches have the capability to export their configuration. Consider making a backup copy of your running switch configuration once you have the network set up correctly.
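
As devices come onto the network, a simple reachability sweep against the hosts file built earlier can serve as one of these sanity checks. This sketch assumes the IPv4 entries live in /etc/hosts; adjust the file path and the awk filter to match your own setup.

#!/bin/bash
# Ping every host named in the hosts file once and report reachability.
awk '/^[0-9]/ {print $2}' /etc/hosts | while read -r host; do
    if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
        echo "OK   $host"
    else
        echo "FAIL $host"
    fi
done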

The two example switches are described because they are working, 100-percent non-blocking, and high-performance Gigabit Ethernet switches. Cisco Systems switches do not provide 100% non-blocking throughput, but can be used nonetheless.

Terminal servers

Terminal servers play an important role in large cluster installations that use earlier versions of CSM than CSM 1.4. Clusters using the early versions relied on terminal servers to gather MAC addresses for installation. With the compatibility of CSM and system UUIDs, terminal servers are not as important for the installation of a more modern IBM cluster. However, if you have slightly older hardware or software in a large cluster, terminal servers are still vital during system setup. Ensuring the correct setup of the terminal server itself can save a great deal of time later in the installation process. In addition to collecting MAC addresses, terminal servers can also be used to view terminals from a single point from POST and on into the operating system.

Ensure that the terminal server baud speed for each port matches that of the connecting computer. Most computers are set to a default of 9600 baud, so this might not be an issue. Also ensure the connection settings and flow control between the terminal server and each connecting system are the same. If the terminal server expects an authenticated connection, set this up in CSM or turn off authentication altogether.

Example C: MRV switch

Here is an example configuration for the MRV InReach LX series switch (see Resources for more information about this switch). Configure the MRV card by doing the following:

  1. Connect to the switch using the serial console port with a straight serial cable (9600, 8-N-1, no flow control).
  2. Log in at the prompt. The default username is InReach with password access.
  3. Enter enable -- Enters super user mode with default password system. You see a configuration screen the first time the device is configured. Otherwise, enter setup to get to the same screen.
  4. Enter and save the various network parameters as required.
  5. Enter config -- Enters configuration mode.
  6. Enter port async 1 48 -- Configures ports 1 through 48.
  7. Enter no authentication outbound -- Turns off internal authentication.
  8. Enter no authentication inbound -- Turns off external authentication.
  9. Enter no autobaud -- Fixes the baud rate.
  10. Enter access remote -- Allows remote connection.
  11. Enter flowcontrol cts -- Sets hardware flow control to CTS, which is the default on most IBM computers.
  12. Enter exit -- Goes back to configuration mode.
  13. Enter exit -- Goes back to default super user mode.
  14. Enter save config flash -- Saves the configuration and makes it persistent across reboots.

After this initial configuration, you should have little else to do. Again, make sure the settings you made here match the settings on the connecting computers. You should now be able to telnet to your terminal servers in order to manage them in the future. As with the Ethernet switches, you can view the running configuration in order to do some sanity checking of the configuration on the terminal servers, if required. For example, the command show port async all char returns detailed information about each port on the terminal server.

Firmware updates and setting BMC addresses

If it is appropriate, check and update the firmware across your entire cluster. Consider the following elements:

  • Computer BIOS
  • Baseboard Management Controller (BMC) firmware
  • Device firmware
  • Network adapter firmware

You can obtain IBM system updates on the IBM support Web site, and vendor-specific hardware updates are usually available directly from the vendors' Web sites (see Resources).

Updating firmware on IBM systems

Note: The following method of firmware update might not be supported in your area or for your hardware. You are advised to check with your local IBM representative before proceeding. This information is offered for example purposes only.

CSM code for remotely flashing firmware is still under development. Currently, if you need to flash many computers for BIOS, BMC, or other firmware updates, you are presented with a large problem. It is not reasonable to flash a large cluster with current methods, which involve writing a floppy disk or CD image and attending to each computer individually; an alternative is required. If you have no hardware power control (no BMC IP address is set), start by flashing the BMC firmware, which enables you to set the IP address at the same time. You only need to press all the power buttons once. For other firmware flashes, you can remotely power the systems on and off.

The following example is for IBM Systems 325 or 326 AMD processor-based systems. However, only small alterations are required to apply it to System x computers. The idea is to take a default firmware update image and modify it so that you can use it as a PXE boot image. Then you can boot a system over the network and have it unpack and flash the relevant firmware. Once the system is set to PXE boot, you only need to turn it on for the flash to take place.

Setting up a PXE boot server

A computer on the network running DHCP and TFTP servers is required. A CSM management node installed and running with CSM is a suitable candidate. However, if there are currently no installed computers on the network, use a laptop running Linux connected to the network. Make sure the PXE server is on the correct part of the network (in the same subnet), or that your switches are forwarding DHCP requests to the correct server across subnet boundaries. Then, complete the following steps:

  1. Bring up your PXE server with an IP address of 192.168.0.1.
  2. Install, configure, and start a simple DHCP server on the same computer. Here is a sample configuration:
    ddns-update-style ad-hoc;
    subnet 192.168.0.0 netmask 255.255.255.0 {
        range 192.168.0.2 192.168.0.254;
        filename "/pxelinux.0";
        next-server 192.168.0.1;
    }

  3. Install, configure, and start a TFTP server to run out of /tftpboot/. Install syslinux, which is provided as an RPM package for both SUSE and Red Hat Linux. (A shell sketch of steps 3 through 5 follows this list.)
  4. Copy the memdisk and pxelinux.0 files installed with the syslinux package into /tftpboot/.
  5. Create the directories /tftpboot/pxelinux.cfg/ to hold the configuration files and /tftpboot/firmware/ to hold the firmware images.
  6. Write a default PXE configuration containing entries for the firmware to upgrade to /tftpboot/pxelinux.cfg/default, such as the following:
    serial 0 9600
    default local
    #default bmc
    #default bios
    #default broadcom

    label local
        localboot 0
    label bmc
        kernel memdisk
        append initrd=firmware/bmc.img
    label bios
        kernel memdisk
        append initrd=firmware/bios.img
    label broadcom
        kernel memdisk
        append initrd=firmware/broadcom.img
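
For steps 3 through 5, the exact commands depend on your distribution. The following is a possible sequence for a Red Hat-style system; the package names and syslinux file locations are assumptions you should verify (for example, with rpm -ql syslinux).

    # Install the TFTP server and syslinux, then stage the PXE files.
    yum install -y tftp-server syslinux
    cp /usr/lib/syslinux/pxelinux.0 /usr/lib/syslinux/memdisk /tftpboot/
    mkdir -p /tftpboot/pxelinux.cfg /tftpboot/firmware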

For reference, when a computer receives a DHCP address during PXE, the configuration files in /tftpboot/pxelinux.cfg are searched in a specific order, with the first file found being the one used for the boot configuration for the requesting computer. The search order is determined by converting the requesting DHCP address into 8 hexadecimal digits and searching for the first matching filename in the configuration directory by expanding subnets -- by removing a digit right-to-left on each pass of the search.

As an example, consider a client computer getting the address 192.168.0.2 from the server during PXE boot. The first file search is for the hexadecimal version of this IP address /tftpboot/pxelinux.cfg/C0A80002. If this configuration file is not present, the next searched for is C0A8000, and so on. If no matches are found, the file named default is used. Therefore, putting the above PXE configuration in a file named default works for all computers, regardless of your DHCP configuration. However, for the example, writing the configuration to C0A800 (the 192.168.0.0/24 subnet) reduces the amount of searching.
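
If you want to precompute these file names rather than rely on the catch-all default, a small helper like the following (a sketch, not part of the original tooling) does the conversion.

# Convert a dotted-quad IP address into the 8-digit hexadecimal file
# name that pxelinux looks for first, for example 192.168.0.2 -> C0A80002.
ip_to_pxe_name() {
    printf '%02X%02X%02X%02X\n' $(echo "$1" | tr '.' ' ')
}

ip_to_pxe_name 192.168.0.2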

Updating the Baseboard Management Controller (BMC) firmware and setting an IP address

Note: The procedure described here is for the AMD-based cluster nodes. However, you can use a similar procedure for the Intel-based nodes. Intel BMC updates are provided with the bmc_cfg.exe program (instead of lancfg.exe) to set the BMC address. You can drive this using the terminal servers with a script such as the sample script available under Downloads. Also, for Intel-based computers, you can usually set the BMC address in the system BIOS.

After you set the BMC address on a node, you have remote power control, which makes life easier when configuring the cluster. However, this method of updating the BMC relies on network boot, so if your computers are not set to PXE boot in the BIOS yet, you can update the BIOS first and return to the BMC update afterwards.
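
Once the BMC addresses are set, CSM's rpower command gives you this power control; outside CSM, a generic IPMI call can do the same job. The interface type and the USERID/PASSW0RD credentials in this sketch are assumptions based on common IBM BMC defaults, so check your own BMC documentation.

# Query and control power over the LAN interface of a node's BMC,
# addressed by its _d name from the hosts file.
ipmitool -I lan -H node001_d -U USERID -P PASSW0RD chassis power status
ipmitool -I lan -H node001_d -U USERID -P PASSW0RD chassis power on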

Download the latest BMC firmware update DOS image and follow the instructions to create a floppy disk boot image. This image contains a program called lancfg.exe that allows you to set an IP address on the BMC. The usual process is to insert the floppy disk and boot from it in order to apply the update. However, first create a PXE boot image from the floppy disk on your PXE boot server computer with the following command:

dd if=/dev/fd0 of=/tftpboot/firmware/bmc.img bs=1024

Now you can edit the DOS image as needed. For the BMC update, no modifications are required to the base image itself, except to copy a DOS power-off program into the image. At a high level, you power on the computer, it PXE boots to flash the BMC firmware, and it leaves the computer running in the DOS image. Using a script, you can then set the BMC address through the terminal server and power the computer off. In this way you know all the computers powered on are either flashing their BMC firmware or waiting for the IP address to be set. Any computers that are powered off have completed the process. Download a suitable DOS-based power-off command, such as the atxoff.com utility. Once you have a power-off utility, copy it to the image as follows:

mount -o loop /tftpboot/firmware/bmc.img /mnt
cp /path/to/poweroff.exe /mnt
umount /mnt

Now ensure your PXE boot configuration file can send the correct image by changing the appropriate comment to set the default to bmc in the /tftpboot/pxelinux.cfg/default file previously created. After testing on a single node, boot all computers from the power-off state so the flash takes place across all the required nodes. When all the nodes have booted the PXE image, change the configuration back to local boot in order to minimize the chance of accidentally flashing a computer if one were to be rebooted.
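
One way to flip the served image without hand-editing, assuming the pxelinux configuration shown earlier where exactly one default line is uncommented at a time:

cd /tftpboot/pxelinux.cfg

# Serve the BMC flash image.
sed -i -e 's/^default local/#default local/' \
       -e 's/^#default bmc/default bmc/' default

# ...and return to local boot once the flashing is complete.
sed -i -e 's/^default bmc/#default bmc/' \
       -e 's/^#default local/default local/' default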

You can now call the lancfg program and operate it through the terminal server (assuming the BIOS settings export the terminal over serial with the same settings as configured on the terminal server). The BMC IP address can be set using lancfg in a Perl script, such as the unsupported sample script available under Downloads. For example, to set the BMC address of all computers in a node group called Rack1 with gateway address 192.168.10.254 and netmask 255.255.255.0, run the following from the PXE boot server computer:

perl set-bmc-address.pl -N Rack1 -g 192.168.10.254 -m 255.255.255.0            

You can customize this script based on your setup. When the script is completed, the computer turns off automatically after having its BMC IP address set, using the DOS power-off program you copied to the boot image.

Updating the BIOS

If you have the default BIOS settings applied on all computers, you can do this step before the BMC update above. Flashing the BIOS is a two-stage process, resulting in the factory default settings being applied if performed without changes. Therefore, you need to flash and also apply a new appropriate configuration with any required changes for your cluster. Download the latest BIOS update DOS image, and follow the instructions to create a floppy disk boot image.

You need a saved configuration for the appropriate BIOS level and settings you require. In order to do this, manually update one computer. Boot a computer with the floppy disk image (use a USB floppy drive if the computer does not have one). Apply the update according to the readme file, and wait for this to finish as normal. Reboot the computer and make all the changes to settings you require in the BIOS. Options to consider are turning Numlock off (if you don't have a number keypad on your keyboard), enabling the serial port, setting the console redirection through the serial port with the appropriate settings configured to match the terminal servers, and setting the boot order to ensure Network appears before Hard Disk. When the changes are complete, save them, and turn off the computer.

On another computer (such as the one you have set up for PXE booting), mount the floppy disk containing the BIOS update. Rename the autoexec.bat file to keep it as a backup on the floppy for later. This prevents the system from flashing the BIOS if this disk were booted again. Insert the disk back into the computer where the updated and configured BIOS options are set, and boot from your modified floppy disk image.

When the DOS prompt appears, ensure your current working directory appears on the a: drive. There is a program on the floppy called cmosram.exe that allows you to save the configuration of the BIOS to disk. Run this program to save the BIOS settings to floppy disk as follows:

cmosram /load:cmos.dat            

Once the settings are in the autoexec.bat file, you are ready to apply the update. As a sanity check, test the floppy image you have in a computer to check that the flash happens automatically and the correct settings are applied. You will also notice that the system remains on after flashing the BIOS. You can get the system to turn off automatically after the BIOS update in a similar way as described in the BMC update section, by using a DOS power-off utility and calling it from the autoexec.bat file.

Once you are satisfied with your modified BIOS update image, you can create a PXE boot image from the floppy disk with the following command:

dd if=/dev/fd0 of=/tftpboot/firmware/bios.img bs=1024            

Change the default PXE boot configuration file /tftpboot/pxelinux.cfg/default so it serves the BIOS image when the systems PXE boot. Now, when you power on a system connected to the network, it automatically flashes the BIOS without any user input, applies the correct BIOS settings, and powers off again. When all updates are complete, return the default PXE boot configuration to boot from local disk to avoid any accidents if a computer were to make a PXE request.

Updating the Broadcom firmware

After updating the BMC firmware and BIOS, updating the Broadcom firmware is a simple repeat of the same ideas. Follow these steps:

  1. Download the Broadcom firmware (see Resources), and follow the instructions to create a floppy disk boot image.
  2. Create a PXE boot image from the floppy disk using the following command: dd if=/dev/fd0 of=/tftpboot/firmware/broadcom.img bs=1024
  3. Loop mount the image file using the following command: mount -o loop /tftpboot/firmware/broadcom.img /mnt
  4. Copy a DOS-based power-off program into the image directory.
  5. Change the autoexec.bat file to automatically update the Broadcom firmware in unattended mode, and turn off the computer when it finishes. For example, for an IBM Systems 326, machine type 8848, the autoexec.bat might look like the following:
    @echo off
    call sramdrv.bat
    echo.
    echo Extracting files...
    call a:\bin.exe -d -o %ramdrv%\update >NULL
    copy a:\command.com %ramdrv%\command.com
    copy a:\atxoff.com %ramdrv%\atxoff.com
    set COMSPEC=%ramdrv%\command.com
    if exist NULL del NULL
    %ramdrv%
    cd \update
    call update.bat 8848
    cd
    atxoff

  6. Unmount the image.
  7. Check the default configuration /tftpboot/pxelinux.cfg/default to ensure the computer can boot the firmware update for the Broadcom adapter.
  8. Boot any computers that require the update.
  9. Return the configuration back to the local disk PXE configuration.

After you have updated the firmware across the cluster, you can continue hardware setup with the knowledge that fewer problems will arise later as a result of having the latest firmware code. However, you can repeat the process at any time should you need another update. Also, the principles behind this type of firmware update can be applied to any other type of firmware you might need to flash, as long as you can obtain a firmware update image to PXE boot.

Conclusion

That concludes the instructions for hardware configuration for a large Linux cluster. Subsequent articles in the Installing a large Linux cluster series contain the steps to set up the software side of the cluster, including management server configuration and node installation procedures in the next article.