How to gather controller logs on a PowerEdge 12G server from the Lifecycle Controller

Instructions:

Step 1: Insert a USB flash drive into the server.
Step 2: Reboot the server and press F2 during POST to enter the BIOS.
Step 3: In the BIOS, go to Device Settings > RAID Controller > Controller Management.
Step 4: Click Save Debug Log (make sure a USB key is plugged into the server).
Step 5: The next screen will show your USB device, where you can save the...
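As a side note, if the server is already booted into Linux, a similar controller debug (TTY) log can usually be dumped with MegaCli instead of going through the Lifecycle Controller. A minimal sketch, assuming MegaCli64 is installed at its default path (adjust the path and output file to taste):

    import subprocess

    # Assumption: MegaCli64 is installed at its usual location.
    MEGACLI = "/opt/MegaRAID/MegaCli/MegaCli64"

    with open("controller_debug.log", "w") as out:
        # Dump the controller firmware terminal (TTY) log for all adapters.
        subprocess.call([MEGACLI, "-FwTermLog", "-Dsply", "-aALL"], stdout=out)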

Read More

Script to gather storage and hardware logs from Dell CloudEdge and PowerEdge servers

The log-bot script uses ipmitool, lshw and MegaCli to grab system event logs, controller logs and various other hardware information. During execution the script checks for ipmitool (delloem patched) and installs it if it is not available. The script takes 4-8 minutes to complete. It has been successfully tested on various Dell CloudEdge and PowerEdge servers running different Linux distributions.

List of logs it gathers:

Chassis
- Fan RPMs
- FRU information
- Power consumption and power history
- Power supply
- DRAC info
- System event logs
- Temperature information

OS
- Basic Linux command output (df -h, dmesg, fdisk, free, fstab, lspci, etc.)
- Network information
- Bus information
- Logical information
- Driver and firmware information
- Tape drive information
- Disk and volume information
- Controller and vendor information
- Size and disk information
- Disk UUIDs
- Disk size and serial number
- BMC information
- GPGPU information

Storage
- Controller logs
- Partition information and other storage-related information

Note: Each bundle contains an execs.logs file listing every command the script executed. A rough sketch of how this works is shown after the steps below.

How to gather logs (for previous versions click here):

1. Download the script by clicking the link, or run:
   root@theprojectbot:~# wget http://theprojectbot.com/Program/log-bot_v2.tar.gz
2. Extract the file:
   root@theprojectbot:~# tar -xvf log-bot_v2.tar.gz
3. Navigate to the right directory:
   root@theprojectbot:~# cd bot
4. Run the script:
   root@theprojectbot:~# python...
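To give a rough idea of how such a wrapper works (a hypothetical sketch, not the actual log-bot code; the file names, paths and command list are assumptions), it boils down to running each tool, saving its output into the bundle, and recording the command line in execs.logs:

    import os
    import subprocess

    # Minimal sketch of a log-collection wrapper (not the actual log-bot script).
    # Assumptions: ipmitool, lshw and MegaCli64 are installed; output goes to ./logbundle.
    COMMANDS = [
        ("sel.log",        ["ipmitool", "sel", "list"]),     # system event log
        ("fru.log",        ["ipmitool", "fru", "print"]),    # FRU information
        ("sdr.log",        ["ipmitool", "sdr", "list"]),     # fans, temperatures, PSUs
        ("lshw.log",       ["lshw", "-short"]),              # bus / hardware summary
        ("df.log",         ["df", "-h"]),                    # disk and volume usage
        ("controller.log", ["/opt/MegaRAID/MegaCli/MegaCli64", "-AdpAllInfo", "-aALL"]),
    ]

    def collect(outdir="logbundle"):
        if not os.path.isdir(outdir):
            os.makedirs(outdir)
        executed = []
        for name, cmd in COMMANDS:
            executed.append(" ".join(cmd))
            with open(os.path.join(outdir, name), "w") as out:
                try:
                    subprocess.call(cmd, stdout=out, stderr=subprocess.STDOUT)
                except OSError:
                    out.write("command not found: %s\n" % cmd[0])
        # Record every command that was run, like the execs.logs file in the real bundle.
        with open(os.path.join(outdir, "execs.logs"), "w") as f:
            f.write("\n".join(executed) + "\n")

    if __name__ == "__main__":
        collect()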

Read More

What is a punctured RAID array?

What is a punctured stripe or a punctured RAID array, and how do you recover from it? To understand the concept of a punctured stripe, we first need to understand what a RAID array is and how information is stored on the disks in a RAID configuration. In the following post I am using RAID5 (with three drives) as an example and will try to explain how a puncture happens and how to get rid of it.

What is RAID5: In RAID5 the data is distributed, along with parity, across all the member disks. If one of the drives goes bad, the data can be rebuilt by recalculating parity across the remaining drives. More information on parity can be found at: http://www.dataclinic.co.uk/raid-parity-xor/ But if two drives go bad, there is no way to rebuild the data back to its original state. On most LSI*-based controllers, whenever one disk in a container (virtual disk) fails, the controller marks that virtual disk as degraded.

What causes a puncture? Several things can cause a puncture, but it usually starts with a failed drive. For instance, John is a busy system admin whose job is to monitor a Dell PE 1950 with a PERC 5/i controller installed (RAID5 with three disks). He does not bother to do anything unless there is an amber light reporting an error on the front LCD panel. One ugly Monday he came to work and saw the drive in slot 0 blinking amber. He called support and ordered a new drive. Once he received the new drive, he yanked the bad hard drive out and put the new one in. As soon as he put the new drive in, it started rebuilding, and in an hour or so all the drives were green again.

What did John do wrong? Most of us will say he didn't do anything wrong. So let's move forward.

After a couple of days John found that the drive in slot 1 was now blinking amber. Oh! Bummer. He called support again, got another drive, and went through the same process.

What did John do wrong this time? Hmm, let's say nothing, because multiple drive failures within a week of each other are possible. No big deal. ...
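As a quick illustration of the parity idea (a simplified sketch in Python, not controller code): each stripe's parity block is the XOR of its data blocks, so any single missing block can be recomputed from the rest, but once two blocks of the same stripe are unreadable the XOR has nothing left to work with, and that stripe is punctured.

    # Simplified RAID5 parity illustration (three-drive array, one byte per block).
    d0 = 0b10110010      # data block on drive 0
    d1 = 0b01101100      # data block on drive 1
    parity = d0 ^ d1     # parity block on drive 2 (XOR of the data blocks)

    # Drive 0 fails: its block can be rebuilt from the surviving data block and the parity.
    rebuilt_d0 = d1 ^ parity
    assert rebuilt_d0 == d0

    # If the block on drive 1 is also unreadable in the same stripe, there is nothing
    # left to XOR against, so that stripe cannot be reconstructed; that is the puncture.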

Read More