UNIX/Linux Tips [and Tricks?]

Mabuhay

Hello world! This is it. I've always wanted to blog. I don't want fame, just to let myself be heard. No! Just to express myself. So, I don't really care if anyone believes what I'm going to write here, or if anyone ever gets interested in reading it. My posts may be novel-like or one-liners, it doesn't matter. Still, I'm willing to listen to your views, as long as they justify mine... Well, enjoy your stay, and I hope you'll learn something new, because I just did and I'm sharing it with you. Welcome!

Monday, December 29, 2008

UPDATED: Veritas NetBackup Tutorial: some troubleshooting commands, etc.

I was supposed to let several days pass before making this follow-up, but I was thinking it could turn into weeks or even months before I find the time. So, before that happens, I'm publishing the sequel this early. I don't know, but at times, no, often, I easily forget the details of things, BUT it doesn't take long before I remember them anyway. Just a bit of focus and re-channeling of some energy and time and I'm back on track again. Ok, 'nuff with the gibberish.. moving forward...

Very early today, I laid down the basic config and executable files needed by NBU to run, at least in a UNIX environment. Next up are some basic troubleshooting commands that you might find useful for finding the error [code] generated by a failure, or for determining the culprit behind it.

And before we proceed any further, you may find one topic missing: installation. As of this writing, I don't have a background in NBU installation. My experience is mostly on the maintenance side. So, please spare me such questions. But just like any eager student, I'm always willing to learn and crave knowledge.. at times!

Back!

I'll try to group these according to scenarios when they're needed but no guarantees.

# bpadm = text-based user interface; much quicker than jnbSA [Java-based]; can be used for restoration

# vmoprcmd -d = list of tape drives that are busy with active jobs; status [TLD, TLD-DOWN, AVR, PEND-TLD]; tapes mounted on to drive

# vmoprcmd -hoststatus -h hostname = should get similar o/p as below

Host `hostname` is ACTIVE

# vmoprcmd -h hostname -dps = will report if drive is SHARED; status should be UP

# vmdareq -driveinfo = check which drive is RESERVED or AVAILABLE

# tpconfig -d = tape device files configured for the machine and its status; helpful in identifying EMM

# robtest = to run SCSI pass-through commands; scans drive within library to find empty/full slot IDs; unloading and/or moving tapes across

# bptestbpcd -host hostname [-client clientname] [ -verbose | -debug] = check communication between client and server; to find communication problem; can be executed either from Master or Client

# vmquery -m [-ev in version 3.4] mediaID = tape density, slot number, volume pool assigned, tape location [on/offsite], vault session ID for tape location if offsite; please take note of the result robot type, if it is set to TLD-Tape Library, the tape is in silo, otherwise, it'll show NONE- Not Robotic which is the opposite

# bpgetmedia -p Scratch [| wc -l]

EH0466 8 800 20

EH0901 8 800 20

...

# echo "s d" | tldtest -r /dev/rac/c10t6d0 | grep mediaID = checks for media availability in the robot; -r here refers to the device file/path of the robot [at least for HP-UX]; this shows the drive ["s d"] or ["s s"] for slot

# tpclean -C drivenumber = clean a drive

# vmcheckxxx -rt tld -rn robotnumber = lists the tapes [TLD] currently in the robot; will use current host if neither -h hostname nor -rh robothost is specified

# /usr/openv/netbackup/bin/admincmd/bpdbjobs -report | grep clientname | grep policyname | grep schedule = generates a report of jobs that are done - successful or not, queued, and/or active

# bperror -jobid jobnumber_from_bpdbjobs_o/p -U = provides details about a particular job that ran; usually includes the files/directories being backed up, media used, and error - if failed

# bperror -L -backstat -columns -hours_ago HH | awk '{if (/CLIENT/) ORS="\n" ; else ORS=" "} {print $0}' | grep -i clientname | awk '{print $1, $2, $4, $5, $6, $12, $14, $16, $17, $18, $19}' | grep -i policyname = this is an alternative way of getting the error code from logs
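The first awk in that pipeline is the part that usually puzzles people, so here is a minimal sketch of just that trick. It runs on made-up placeholder lines, not on real bperror output, and the function name join_until_client is mine, not an NBU command. Every line is printed followed by a space, except a line containing CLIENT, which is followed by a newline, so a record that wraps over several lines collapses into a single line ending at its CLIENT line.

```shell
# Join wrapped lines into one record per output line.
# Assumption: a record ends at the line that contains "CLIENT".
join_until_client() {
    awk '{ if (/CLIENT/) ORS = "\n"; else ORS = " " } { print $0 }'
}

# Usage with placeholder input (NOT real bperror output):
#   printf 'alpha\nbeta\nCLIENT one\ngamma\nCLIENT two\n' | join_until_client
```

Once each record sits on one line, the later grep and awk stages can filter by client and pick columns as usual.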

# bpdbjobs [-restart | -cancel | -cancel_all] jobID_number

# telnet clientname bpcd = to check if bpcd daemon is accepting connection; usually executed from Master

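# netstat -a | grep -i bpcd = should return bpcd with LISTEN status [with -n the port is printed as a number instead of the service name, so grep for the port number from /etc/services in that case]; see previous blog [and click the Google ads as well, he he]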

# bpclntcmd [-ip IP_of_Master | -hn hostname_of_Master] -pn = hostname resolution; resolve multiple host interfaces on clients

# bpplclients | grep -i clientname = lists the client - if present - that is assigned to the Master

# bppllist | grep -i policy_pattern = shows the list of policies matching the pattern

# bppllist -byclient clientname -U = lists the policy defined or being used by a particular client

# bppllist policy_name -L -U = a detailed display of properties of a particular policy

# bpdbm -ctime 1161224987 = this will convert it to common format of date and time

1161224987 = Thu Oct 19 10:29:47 2006
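If you are not on the Master and just want to sanity-check a ctime value, the same conversion can be sketched with date. This assumes GNU date (the -d @seconds form does not exist in every UNIX date), and epoch_to_date is a hypothetical helper name. The value above works out to 02:29:47 UTC, so the 10:29:47 printed by bpdbm matches a server clock 8 hours ahead of UTC.

```shell
# Convert epoch seconds to the "Thu Oct 19 10:29:47 2006" layout.
# Assumes GNU date; $2 is an optional POSIX TZ string (default UTC0).
epoch_to_date() {
    LC_ALL=C TZ=${2:-UTC0} date -d "@$1" '+%a %b %e %H:%M:%S %Y'
}

# Usage:
#   epoch_to_date 1161224987          # UTC
#   epoch_to_date 1161224987 PHT-8    # 8 hours east of UTC (POSIX TZ sign is inverted)
```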

# bpimagelist -client clientname -backupid clientname_numberFromANF = I'm not sure if ANF setup is true for all

# bpimagelist -client clientname -d MM/DD/YYYY HH:mm -e MM/DD/YYYY HH:mm -U [-m mediaID] [-L] = this is executed from the Master; checks for saved images within the date range specified; note that the images listed here may be limited [not sure 'though; it depends on the infra setup]

# bpmedialist -mcontents -m mediaID -U = list media contents

# bpmedia [-freeze | -unfreeze | -suspend] -ev mediaID

# bpexpdate -d 0 -ev mediaID -force = expire a tape

And lastly, you can run a manual backup as:

# echo "/var/opt" > /home/user01/nbfile

# bpbackup -i -c classname -s schedule -L /path/to/log/file -f /home/user01/nbfile

# bpbackup -p policyname -s schedule -L /path/to/log/file /usr/openv/netbackup/bp*

You might be interested in what tape device file names such as this one mean:

/dev/rmt/cXtYdZBESTnb

where:

BEST = operational capabilities required including the highest density/format and data compression, if supported

n = no rewind

b = Berkeley style; after the file is closed, the tape is not repositioned in any way
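To make the naming rule above concrete, here is a small sketch that reads the suffix of such a name and reports what it implies. The function name describe_tape_dev and the sample name /dev/rmt/c4t3d0BESTnb are hypothetical. The label "att" for a name without the b flag is my assumption, based on my recollection that HP-UX falls back to AT&T-style close behaviour, so please check mt(7) on your box.

```shell
# Describe the options encoded in a tape device file name.
# Only the BEST / n / b parts explained above are interpreted;
# the cXtYdZ part is passed through untouched.
describe_tape_dev() {
    name=${1##*/}                       # strip the directory part
    case $name in
        *BESTnb) rewind=no;  style=berkeley ;;
        *BESTn)  rewind=no;  style=att ;;      # "att" label is an assumption
        *BESTb)  rewind=yes; style=berkeley ;;
        *BEST)   rewind=yes; style=att ;;
        *) echo "unrecognised name: $name" >&2; return 1 ;;
    esac
    ids=${name%%BEST*}                  # what is left is cXtYdZ
    echo "$ids rewind=$rewind style=$style"
}

# Usage (hypothetical device name):
#   describe_tape_dev /dev/rmt/c4t3d0BESTnb
```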

Please note that this was added on 31 December, 2008 @ 17:53:35...

I almost forgot about ejecting a tape, which is as crucial as any other task. I'm not sure, but I know I read it somewhere: eject the tape on the robot first before doing it via `nbmenu`. Anyway - sorry, I accept the fact that I do forget things, often - here is the way to do it:

1. Access the robot via `robtest` [careful with this command, for it can do nasty things to your robot].
2. Before doing anything stupid, please make sure the tape is NOT being used by other backup jobs.
3. Select from the classification of tapes you want to move [TLD 0, etc.].
4. Execute the following:

"s d" - to check on the contents of the tape drives or identify the drives that have a tape (Contains Cartridge = yes, Barcode = XXXXX)
"s s" - check on the content of the library or identify the empty slots (re-inventory)
"m d# s#" - move the tape from drive to slot
"s d" - to verify that the drive is empty after move
"s s" - to verify that the slot has the tape

5. Quit.
6. Now, you can go to the NBU level - which is pretty straightforward - to eject the tape via textual user interface [nbmenu] or CLI:

vmchange -h Master -multi_eject -res -ml list_of_tapes_delimited_by_colon -rt robot_type -rn robot_number -rh MM -sc -verbose
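Since -ml wants the tapes as one colon-delimited string, a tiny helper saves some typing when the list is long. This is only a sketch: join_media_ids is a hypothetical name, and the media IDs in the usage line are the ones from the scratch listing earlier in this post.

```shell
# Join all arguments with ":" to build the value for vmchange -ml.
# Runs in a subshell so the caller's IFS is left untouched.
join_media_ids() {
    ( IFS=:; printf '%s\n' "$*" )
}

# Usage:
#   join_media_ids EH0466 EH0901
#   vmchange ... -ml "$(join_media_ids EH0466 EH0901)" ...
```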

So I guess, this is it. Hope it helps. If you find anything wrong with these tutorials or guides [meaning the previous ones are included], please do leave a message, and I'd be more than happy to check and learn from it. Parking..

Veritas NetBackup Tutorial: config, executable files

Previously on NetBackup [cool; sounds like Heroes]... we discussed an overview of how NBU works. Going forward, we'll check out the directories and files needed and/or configured for it to work.

Well, one of the most - IF not the most - important files is bp.conf, located in /usr/openv/netbackup. This file [which can be found across all setups] tells how a machine is configured: Master, Media Manager, and/or Client. Please also note that a machine can be configured as follows:

Master-MM-Client-in-one
MM-Client-in-one
Client

And here is a sample of a basic entry:

$ more /usr/openv/netbackup/bp.conf
SERVER=[name of Master]
SERVER=[name of Media Manager]
...[list of other servers]
EMMSERVER=[name of EMM; mostly Master - depending on setup]
CLIENT_NAME=[name of Client; depending on the role as defined above]
... [additional options follow]
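
When a machine has a long bp.conf, it helps to pull out just one kind of entry, e.g. all the SERVER lines. Below is a minimal sketch; bpconf_values is a hypothetical helper name, and it assumes plain KEY=value lines with optional spaces around the equals sign.

```shell
# Print every value of KEY found in a bp.conf-style file.
# Assumption: entries look like "KEY=value" or "KEY = value".
bpconf_values() {
    key=$1 file=$2
    awk -F'=' -v key="$key" '
        { k = $1; gsub(/[ \t]/, "", k) }                      # key without blanks
        k == key { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); print v }
    ' "$file"
}

# Usage:
#   bpconf_values SERVER /usr/openv/netbackup/bp.conf
#   bpconf_values CLIENT_NAME /usr/openv/netbackup/bp.conf
```

The first SERVER printed is normally the one to look at first when a client cannot reach its Master.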

Another config file you might want to check is /usr/openv/volmgr/vm.conf, which contains:

MM_SERVER_NAME=[name of MM]

In addition to these files, we also need to consider the native UNIX files and ensure that they're properly configured to allow NBU to run:

/etc/services - defines service names and corresponding ports
/etc/inetd.conf
/dev - device files directory which will identify the robots and tapes
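
A quick way to confirm that a NetBackup service is registered in /etc/services is to look up its port by name. The sketch below does that with awk; service_port is a hypothetical helper name. On the setups I have seen bpcd is registered as 13782/tcp, but treat that number as an assumption and trust your own file.

```shell
# Print the port number registered for a service name.
# $1 = service name, $2 = services file (defaults to /etc/services).
service_port() {
    awk -v name="$1" '$1 == name { split($2, a, "/"); print a[1]; exit }' \
        "${2:-/etc/services}"
}

# Usage:
#   service_port bpcd
```

An empty result means the entry is missing from the file.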

Forgive me, I've been busy with specifics and gravely forgot the parent of it all. Almost all of the files, config and commands alike, are located in /usr/openv for Unices or [install directory]\VERITAS for Windows.

So what we're talking about here are:

../netbackup - contains NBU, VolMgr binaries, NBU DB
../db/data - EMM & NB databases
../netbackup/db - NBU DB of class, schedules, images, etc.
../netbackup/logs - log files

Here are the daemons, in addition to what was listed before:

tldd - robotic daemon started with ltid; one on each MM server
tldcd - started with ltid; talks to the robot
bpbrm - backup and restore manager
bptm - tape manager
bpdm - disk manager
avrd - automatic volume recognition daemon; reads the label of a tape loaded in a drive

Next, we'll go to basic troubleshooting of common issues encountered.

Monday, November 17, 2008

Veritas NetBackup Tutorial: overview

It's been a very long time.. it always is. The hard part of this kind of blog is that you only get to write when something interesting pops up, or an error is encountered [and you get to solve it - right], or you have a new experience.

Well, I never really had the time to internalize this software when I was working with it a few months ago, in my previous job. Just last week [not true anymore; the time here is relative to when I was preparing for an interview - published December 28, 2008 @ 16:19.xx], our paths crossed again.

The sudden interest arose from a technical panel [or panel technical?] interview that I've been through. Well, I was satisfied with the result, for at least I was able to address all the issues that they threw at me, 3 of them. You can't help it: even though I was advised that it would be about Veritas NetBackup [NBU], it will always cross over with UNIX or Linux. Surprising even myself, I now have a deeper basic understanding of how NBU works than ever before [and I hope that it is right].

Please note that this so-called tutorial is aimed more at those who have worked with or are working with NetBackup yet do not understand how it works. And another thing: this is how I understand it to work, which is not necessarily true. But so far, no one has questioned it, so it must be true. Before it makes you giddy, let's proceed...

****** START ******
In order for a backup to run successfully in an NBU environment, we need to complete The Triangle - Master, Media Manager, and Client - well, in its simplest state. 'Though we also have another entity, the Enterprise Media Manager or EMM server, this can be integrated into the Master, depending on whether your infra needs it to be [I'm not going through the definitions of these, so please do your part and read other articles ;)].

Basically, all the schedules [and catalogs - valid images of a client, archiving, and restorations] are stored in the Master server. If I'm not mistaken, scheduling is handled by the nbpem daemon, which checks which backups are due for all clients, makes a list of the jobs due to run for each policy, and submits it to nbjm. nbjm, in turn, gets the necessary resources by coordinating with nbrb to start the backup. nbrb is responsible for allocating resources to a job with the help of nbemm, which holds the info about media and device configurations. The daemons discussed belong to what we call the Intelligent Resource Manager (IRM - resides in the Master), which works together with EMM to schedule and allocate resources for the job.

There are other daemons, but I'll not go through discussing each; however, here are some [take note of the version of your NBU, there might be some daemons that were replaced, so check out the docs]:

Master: bpcd, bpcompatd, bpdbm, ltid, nbnos, nbrd, nbsl, nbsvcmon, vmd, and pbx_exchange
Media Manager: bpcd, bpcompatd, ltid, nbnos, nbsl, nbsvcmon, vmd, pbx_exchange
Client: bpcd

Note that bpcd is the only daemon that runs on the Client.
Check using:
$ netstat -a | grep -i bpcd
tcp 0 0 *.bpcd *.* LISTEN
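
If you want to script that check, the filter below is a minimal sketch that reads netstat output on stdin and succeeds only when a bpcd listener is present. The name bpcd_listening is hypothetical, and matching the numeric port 13782 is an assumption about the default bpcd port, for the case where netstat prints numbers instead of service names.

```shell
# Exit 0 if stdin (netstat output) shows bpcd in LISTEN state.
# Matches the local address by service name or by the assumed
# default port 13782, with "." or ":" as the separator.
bpcd_listening() {
    grep -E '[.:](bpcd|13782)[[:space:]]' | grep -q 'LISTEN'
}

# Usage:
#   netstat -an | bpcd_listening && echo "bpcd is listening"
```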

Next up: IRM tells the Media Manager or MM [which handles the robot/tape silo] to assign a drive and gather data from the Client. So MM requests the backup image from the Client, which generates it and sends it back. Media Manager tells the Robotic Control [Robot Tape Library/Silo - I'm not sure if they're a single entity] to find a tape and mount it, after which MM sends the data to the tape.
****** END ******

That is the general picture of how a backup is initiated and done. Again, I'd like to stress that this is my understanding; I'm doing [when time permits] other reading and may modify this from time to time to correct the info. But I always welcome comments: corrections, etc., which will benefit - hopefully - us all.

Note: This is not all my idea. I've read other articles but forgot the URLs; thanks to whoever wrote and provided such materials.

Saturday, October 18, 2008

Filesystem extension - fsadm errno 2

Hi, this will be my first contribution to my friend's blog site. Well, this is just another urgent file system increase for another team's restoration task. Basically, what happened was that after I extended the VG and the LV and then tried to grow the file system with fsadm, an error occurred: "vxfs fsadm: cannot open /oracle/A6C/mirrlogA/lost+found/.fsadm - errno 2".

errno 2 means "No such file or directory": the lost+found directory was missing from the mount point. The issue was resolved by recreating lost+found and then re-running fsadm.

[root@dagz:/oracle/A6C/origlogB]
# lvextend -L 20000 /dev/vg_A6C_00/lv_mirrlogA
Logical volume "/dev/vg_A6C_00/lv_mirrlogA" has been successfully extended.
Volume Group configuration for /dev/vg_A6C_00 has been saved in /etc/lvmconf/vg_A6C_00.conf

[root@dagz:/oracle/A6C/origlogB]
# fsadm -b 20000m /oracle/A6C/mirrlogA
fsadm: /etc/default/fs is used for determining the file system type
vxfs fsadm: cannot open /oracle/A6C/mirrlogA/lost+found/.fsadm - errno 2

[root@dagz:/oracle/A6C/origlogB]
# cd /oracle/A6C/mirrlogA

[root@dagz:/oracle/A6C/mirrlogA]
# mklost+found
creating slots...
removing dummy files...
done
drwxr-xr-x 2 root sys 4096 Oct 15 10:59 /oracle/A6C/mirrlogA/lost+found

[root@dagz:/oracle/A6C/mirrlogA]
# fsadm -b 20000m /oracle/A6C/mirrlogA
fsadm: /etc/default/fs is used for determining the file system type
fsadm: /dev/vg_A6C_00/rlv_mirrlogA is currently 5120000 sectors - size will be increased

[root@dagz:/oracle/A6C/mirrlogA]
# bdf /oracle/A6C/mirrlogA
Filesystem kbytes used avail %used Mounted on
/dev/vg_A6C_00/lv_mirrlogA
20480000 4102261 15354200 21% /oracle/A6C/mirrlogA
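
To avoid tripping on the same error next time, the check can be done before calling fsadm. The sketch below is just that, with two hypothetical helpers: has_lost_found tests for the directory, and mb_to_kb converts the size given to fsadm -b (in MB) into the kbytes figure that bdf reports, so the result can be compared afterwards. Nothing here calls fsadm or mklost+found itself.

```shell
# Return 0 when the mount point has a lost+found directory.
has_lost_found() {
    [ -d "$1/lost+found" ]
}

# Convert megabytes (as passed to "fsadm -b <N>m") to kbytes (as shown by bdf).
mb_to_kb() {
    echo $(( $1 * 1024 ))
}

# Usage sketch:
#   mnt=/oracle/A6C/mirrlogA
#   has_lost_found "$mnt" || echo "run mklost+found in $mnt first"
#   mb_to_kb 20000      # compare with the kbytes column of bdf
```

For the 20000m used above, the conversion gives 20480000, which is the kbytes value in the bdf output.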