Saturday, May 30, 2009

Configuring a Solaris 10 system as a DNS Cache Server... BIND 9.3.5-P1

The following are the steps for configuring a Solaris 10 system as a DNS cache server. (Hostnames and IP addresses are for illustration only and are not real; the exception is the db.cache file, which holds the real root server information.)

1. Update the Solaris 10 DNS patch.
Patch-ID# 119783-08: SunOS 5.10; bind patch (Note: current patch release: 119783-10)

2. Create and check the following configuration files:
- Check the /etc/hosts file
- Create and edit the DNS configuration file; /etc/named.conf
- Create the named run directory; e.g. /var/opt/named
root@prambanan: # mkdir -p /var/opt/named
- Obtain a copy of the root hints file; /var/opt/named/db.cache
- Create and edit the reverse zone file; /var/opt/named/db.local
- Create and edit the local zone file; /var/opt/named/candi.hosts
root@prambanan: # cat /etc/hosts
#
127.0.0.1 localhost
123.152.163.70 prambanan prambanan.candi.com loghost
#

snip

root@prambanan: # cat /etc/named.conf

acl acl_post {
111.123.171.0/24;
111.124.153.64/28;
};
acl acl_precu {
124.111.44.0/22;
120.112.8.0/21;
};
options {
directory "/var/opt/named/";
recursion yes;
allow-recursion {
acl_post;acl_precu;
};
recursive-clients 300000;
forwarders {
124.123.111.222 port 53;
124.124.112.123 port 53;
};
forward first;
};
zone "." {
type hint;
file "db.cache";
};

zone "0.0.127.in-addr.arpa" {
type master;
file "db.local";
};
zone "candi.com" {
type slave;
file "candi.hosts";
masters {
111.222.123.124;
};
};
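Before starting the service it may help to validate the configuration and the local zone files. A small sketch, assuming the BIND utilities bundled with Solaris 10 live under /usr/sfw/sbin (adjust the path if BIND was installed elsewhere):
root@prambanan: # /usr/sfw/sbin/named-checkconf /etc/named.conf
root@prambanan: # /usr/sfw/sbin/named-checkzone 0.0.127.in-addr.arpa /var/opt/named/db.local
named-checkconf prints nothing when the syntax is clean; named-checkzone reports the serial it loaded or points at the offending line.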


root@prambanan: # cat /var/opt/named/db.cache
; This file holds the information on root name servers needed to
; initialize cache of Internet domain name servers
; (e.g. reference this file in the "cache . "
; configuration file of BIND domain name servers).
;
; This file is made available by InterNIC
; under anonymous FTP as
; file /domain/named.root
; on server FTP.INTERNIC.NET
; -OR- RS.INTERNIC.NET
;
; last update: Feb 04, 2008
; related version of root zone: 2008020400
;
; formerly NS.INTERNIC.NET
;
. 3600000 IN NS A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET. 3600000 A 198.41.0.4
A.ROOT-SERVERS.NET. 3600000 AAAA 2001:503:BA3E::2:30
;
; formerly NS1.ISI.EDU
;
. 3600000 NS B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET. 3600000 A 192.228.79.201
;
; formerly C.PSI.NET
;
. 3600000 NS C.ROOT-SERVERS.NET.
C.ROOT-SERVERS.NET. 3600000 A 192.33.4.12
;
; formerly TERP.UMD.EDU
;
. 3600000 NS D.ROOT-SERVERS.NET.
D.ROOT-SERVERS.NET. 3600000 A 128.8.10.90
;
; formerly NS.NASA.GOV
;
. 3600000 NS E.ROOT-SERVERS.NET.
E.ROOT-SERVERS.NET. 3600000 A 192.203.230.10
;
; formerly NS.ISC.ORG
;
. 3600000 NS F.ROOT-SERVERS.NET.
F.ROOT-SERVERS.NET. 3600000 A 192.5.5.241
F.ROOT-SERVERS.NET. 3600000 AAAA 2001:500:2f::f
;
; formerly NS.NIC.DDN.MIL
;
. 3600000 NS G.ROOT-SERVERS.NET.
G.ROOT-SERVERS.NET. 3600000 A 192.112.36.4
;
; formerly AOS.ARL.ARMY.MIL
;
. 3600000 NS H.ROOT-SERVERS.NET.
H.ROOT-SERVERS.NET. 3600000 A 128.63.2.53
H.ROOT-SERVERS.NET. 3600000 AAAA 2001:500:1::803f:235
;
; formerly NIC.NORDU.NET
;
. 3600000 NS I.ROOT-SERVERS.NET.
I.ROOT-SERVERS.NET. 3600000 A 192.36.148.17
;
; operated by VeriSign, Inc.
;
. 3600000 NS J.ROOT-SERVERS.NET.
J.ROOT-SERVERS.NET. 3600000 A 192.58.128.30
J.ROOT-SERVERS.NET. 3600000 AAAA 2001:503:C27::2:30
;
; operated by RIPE NCC
;
. 3600000 NS K.ROOT-SERVERS.NET.
K.ROOT-SERVERS.NET. 3600000 A 193.0.14.129
K.ROOT-SERVERS.NET. 3600000 AAAA 2001:7fd::1
;
; operated by ICANN
;
. 3600000 NS L.ROOT-SERVERS.NET.
L.ROOT-SERVERS.NET. 3600000 A 199.7.83.42
;
; operated by WIDE
;
. 3600000 NS M.ROOT-SERVERS.NET.
M.ROOT-SERVERS.NET. 3600000 A 202.12.27.33
M.ROOT-SERVERS.NET. 3600000 AAAA 2001:dc3::35
; End of File


root@prambanan: # cat /var/opt/named/db.local
$TTL 3600
0.0.127.in-addr.arpa. IN SOA prambanan.candi.com. ogut.candi.com. (
2008112300 ; serial
10800 ; refresh
3600 ; retry
604800 ; expire
86400 ) ; minimum
IN NS prambanan.candi.com.
1 IN PTR localhost.


root@prambanan: # cat /var/opt/named/candi.hosts
$ORIGIN .
$TTL 3600 ; 1 hour
candi.com IN SOA borobudur.candi.com. ogut.candi.com. (
20080805 ; serial
3600 ; refresh (1 hour)
600 ; retry (10 minutes)
86400 ; expire (1 day)
3600 ; minimum (1 hour)
)
NS prambanan.candi.com.
NS mendut.candi.com.
NS borobudur.candi.com.
A 120.152.171.183
MX 10 lorojonggrang.candi.com.
TXT "PT. Candi Nusantara"
TXT "Jakarta"
TXT "Jl. Raden Widjaya 101"
$ORIGIN candi.com.
prambanan A 123.152.163.70
mendut A 123.153.132.99
muntilan A 123.155.6.157
lorojonggrang A 123.155.6.150
muntilanmail CNAME muntilan
jawa A 124.155.19.18
sumatera A 123.112.161.211
kalimantan MX 10 lorojonggrang
sulawesi A 123.112.152.213
MX 10 lorojonggrang


3. Start the DNS Server
root@prambanan: # svcadm enable network/dns/server

4. Check /var/adm/messages file for a successful named (BIND) startup
root@prambanan: # tail /var/adm/messages
Jun 12 10:25:30 prambanan named[1916]: [ID 767358 daemon.notice] starting BIND 9.3.5-P1
Jun 12 10:25:30 prambanan named[1916]: [ID 767358 daemon.notice] command channel listening on 127.0.0.1#953
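As a quick check that recursion and caching actually work, run the same query twice from a client covered by one of the allow-recursion ACLs (a query from the server itself will only recurse if localhost is added to allow-recursion). A small sketch; the hostname queried is just an example, and dig ships with Solaris 10 under /usr/sfw/bin:
client# dig @123.152.163.70 www.sun.com A
client# dig @123.152.163.70 www.sun.com A
The second run should be answered from the cache: the query time drops sharply and the TTL in the answer counts down.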


After the server has been up and running for a long time, the cache may grow quite large and may need to be refreshed. Use the procedure below to refresh the cache by restarting BIND:
root@prambanan: # svcadm disable network/dns/server
root@prambanan: # svcadm enable network/dns/server
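If the rndc command channel is configured (the startup log above shows it listening on 127.0.0.1#953, which requires a matching rndc key), the cache can also be flushed without bouncing the service. This is only a suggestion, not part of the original procedure; on Solaris 10 the rndc binary is typically under /usr/sfw/sbin:
root@prambanan: # rndc flush
root@prambanan: # rndc status
rndc flush empties the entire cache; rndc status confirms the server is still answering.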

Friday, May 29, 2009

Installing Veritas 5.0 MP2 patches leaves a Solaris 10 system unbootable.... un-encapsulate

After installing the Veritas 5.0 MP2 patch set, a Solaris 10 system became unbootable. The boot process failed and the system dropped back to the ok prompt.
The following is the system console log:
{1} ok boot
Resetting...

Rebooting with command: boot
SunOS Release 5.10 Version Generic_118833-36 64-bit
Copyright 1983-2006 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
sp diff: name finddevice, nargs 1, nret 1,sp 0xf05d35b8 osp 0xf05d35a8
Hardware watchdog enabled
Unretrieved lom log history follows ...

4/29/09 2:39:04 PM Domain Reboot A: Initiating keyswitch: on, domain A.
4/29/09 2:39:04 PM Domain Reboot A: Initiating keyswitch: on, domain A.
Booting to milestone "milestone/single-user:default".
Hostname: sun1

syncing file systems... done
NOTICE: f_client_exit: Program terminated!
debugger entered.

{2} ok


We had just heard that it could be an issue with boot disk encapsulation. The decision, then, was to recover the server by booting from normal disk slices.
First, disable boot disk encapsulation (un-encapsulate) by modifying /etc/system and /etc/vfstab. Since the server is unbootable, boot from alternate media (cdrom):
ok> boot cdrom -s
root # fsck /dev/rdsk/c0t0d0s0
root # mount /dev/dsk/c0t0d0s0 /a
root # cd /a/etc
root # cp system system.ORG
root # cp vfstab vfstab.ORG


Modify /etc/system (in this case /a/etc/system): comment out the "rootdev" and "vxio" lines, as shown below.
* vxvm_START (do not remove)
forceload: drv/vxdmp
forceload: drv/vxio
forceload: drv/vxspec
***rootdev:/pseudo/vxio@0:0
***set vxio:vol_rootdev_is_volume=1
* vxvm_END (do not remove)


Modify /etc/vfstab (in this case /a/etc/vfstab) to use the normal disk slices.
root # cat /a/etc/vfstab
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
fd - /dev/fd fd - no -
/proc - /proc proc - no -
/dev/dsk/c1t0d0s1 - - swap - no -
/dev/dsk/c1t0d0s0 /dev/rdsk/c1t0d0s0 / ufs 1 no -
/dev/dsk/c1t0d0s3 /dev/rdsk/c1t0d0s3 /var ufs 2 no -
/dev/dsk/c1t0d0s4 /dev/rdsk/c1t0d0s4 /export/home ufs 2 yes -
swap - /tmp tmpfs - yes -
root #


Then reboot server.
root # umount /a
root # /etc/halt
ok> boot


With luck, the server is now bootable into multi-user mode. When everything is back to normal, proceed with boot disk encapsulation and re-attach the mirror.

There is a Sun Alert document (Doc ID: 250446) that describes an issue during an upgrade to Veritas 5.0 MP3. The impact, however, is very similar: the server is unbootable after patch installation.
Below is a copy of a portion of the document. Please refer to http://sunsolve.sun.com/search/document.do?assetkey=1-66-250446-1 for details.

Document ID: 250446
Title: During Installation of Veritas VxVM 5.0 Patches the System may Become Unbootable
Solution 250446 : During Installation of Veritas VxVM 5.0 Patches the System may Become Unbootable

During installation of Veritas VxVM 5.0 patches the system may become unbootable (see below for details). Rebooting after installing the patches mentioned below, the system may fail to boot to multiuser mode. This issue is described by Symantec at:
http://seer.entsupport.symantec.com/docs/315817.htm

This alert describes the problem and its workaround/resolution.
The following steps (cut and pasted from the document) recover the system and re-install the patch:

1. Boot the system from an alternate root. (mirror, net or CDROM)
2. Run format and look at all of the disks to determine which one was the original boot disk. You may need to mount some and see if VRTS patches are there as this is a good indicator that it was the boot disk.
3. Mount the original boot disk to /mnt
4. # bootadm update-archive -R /mnt
(updates the original boot archive, applicable for s10u6 and later)
5. # /etc/vx/bin/vxunroot
(to un-encapsulate root)
6. # reboot
(boot using normal slices)
7. # /etc/vx/bin/vxdg destroy rootdg
(destroy rootdg and all volumes associated with rootdg)
8. # /etc/vx/bin/vxdiskunsetup -C c#t#d#
(c#t#d# of rootdisk) (remove rootdisk from VxVM control)
9. # pkgrm VRTSvxvm
(in order to remove the patch from the system, as the patch cannot be backed out and patchinfo of 122058-11 is inconsistent.)
10. # reboot
11. # install VRTSvxvm and Run *vxinstall* to add/verify licenses and initialize Veritas
12. # vxdiskadm - encapsulate/mirror root and other partitions on the root disk.
13. # install patch 122058-11 and ignore the console prompt referenced above.
14. # reboot

There is a good related article on Sun BigAdmin (Enda O'Connor, April 2009) which states:
The recommended way prior to patching is to break the mirror, and then patch only one half of the mirror. If a problem occurred, it should be possible to boot from the other half of the mirror (the unpatched half). It is also strongly advised to use Solaris Volume Manager for mirroring root file systems, as opposed to using Veritas Volume Manager (VxVM) mirroring of root file systems. This is due to VxVM disk encapsulation, which, depending on the disk layout and free partitions on the root disk, might create a hard-to-manage disk layout that causes issues when trying to rescue such a layout. Also, there is the common misconception in the sys admin world that encapsulating a disk in VxVM equates to mirroring. This is a major mistake. Encapsulation is only the first step towards mirroring a root file system, whereby the currently installed root file system disk is given over to VxVM control and the data is preserved (encapsulated). You then need to mirror the encapsulated root disk using further VxVM commands.
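As an illustration of that "break the mirror first" advice with Solaris Volume Manager, assuming a root mirror d10 built from submirrors d11 and d12 (hypothetical metadevice names):
root@sun1: # metadetach d10 d12
(detach one submirror before patching; d11 stays attached and in use)
root@sun1: # metastat d10
(confirm only d11 remains under the mirror)
... install the patches on the running half ...
root@sun1: # metattach d10 d12
(re-attach and resync only after the patched half boots cleanly)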


Thursday, May 28, 2009

Manipulating Solaris device instance numbers.... /etc/path_to_inst

The /etc/path_to_inst file stores device tree (system device) information that is critical to the kernel. It registers each physical device path, its instance number, and its device instance name (device driver). When Solaris finds a device at a specific location on a system bus, an entry is added for the instance number and its device instance name. The device tree is persistent across reboots and even across configuration changes. This feature is very important in providing consistency across reboots and hardware changes.

To put it simply, /etc/path_to_inst holds device tree information: each physical device path, together with its instance number and device driver.

The following is a small portion of a Sun Fire V490 /etc/path_to_inst file:
"/node@2/pci@8,700000/network@2" 0 "ce"
"/node@2/pci@8,700000/network@3" 1 "ce"
"/node@2/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e01147a181,0" 0 "ssd"
"/node@2/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e011479831,0" 1 "ssd"
"/node@2/pci@9,700000/network@2" 2 "ce"
"/node@2/pci@9,600000/network@1" 3 "ce"
"/node@2/pseudo" 0 "pseudo"
"/node@2/scsi_vhci" 0 "scsi_vhci"
"/node@2/scsi_vhci/ssd@g60060e8004767a000000767a00000512" 16 "ssd"


We can decode these entries as follows (a quick cross-check of the mapping is sketched after the list):
- /pci@8,700000/network@2" 0 "ce"; Ethernet card in PCI slot 2, ce0
- /pci@8,700000/network@3" 1 "ce"; Ethernet card in PCI slot 3, ce1
- /pci@9,700000/network@2" 2 "ce"; On-board Ethernet, ce2
- /pci@9,600000/network@1" 3 "ce"; On-board Ethernet, ce3
- /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e01147a181,0" 0 "ssd"; disk slot 0, ssd0
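To cross-check how these physical paths map to logical device names, compare the /devices symlinks and grep the driver name in /etc/path_to_inst; a quick sketch:
root@sun1: # ls -l /dev/dsk/c*s2
(each cXtXdX name is a symlink into /devices, i.e. the physical path recorded in path_to_inst)
root@sun1: # grep '"ce"' /etc/path_to_inst
(lists every ce instance with its physical path)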


When dealing with a Solaris cluster installation, it is recommended that all nodes have uniform instance numbering. This simplifies installation and eases maintenance. Assume the cluster nodes are a Sun Fire E4900 and a Sun Fire E2900. The I/O instance numbers will not be exactly the same on the two servers, since they have different I/O boards (e.g. ce0 on the Sun Fire E4900 may be an Ethernet card installed in PCI slot 1, while the Sun Fire E2900 has it on-board). We can re-arrange the I/O instance numbers by manipulating the /etc/path_to_inst file instead of shuffling the hardware (I/O cards).
The illustration below re-arranges the ce instance numbers:
"/node@2/pci@8,700000/network@2" 3 "ce"
"/node@2/pci@8,700000/network@3" 2 "ce"
"/node@2/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e01147a181,0" 0 "ssd"
"/node@2/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e011479831,0" 1 "ssd"
"/node@2/pci@9,700000/network@2" 1 "ce"
"/node@2/pci@9,600000/network@1" 0 "ce"


After a reconfiguration boot (boot -r), the ce instance numbers will change as defined above.
- /pci@8,700000/network@2" 3 "ce"; Ethernet card in PCI slot 2, ce3
- /pci@8,700000/network@3" 2 "ce"; Ethernet card in PCI slot 3, ce2
- /pci@9,700000/network@2" 1 "ce"; Onboard Ethernet, ce1
- /pci@9,600000/network@1" 0 "ce"; Onboard Ethernet, ce0

(Compare with the previous instance numbers to see the differences.)

Likewise, we can re-arrange the disk controller (c) and/or disk (ssd) instance numbers as well.
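Because a corrupted /etc/path_to_inst can itself leave the system unbootable, it is worth backing the file up and forcing the reconfiguration boot in a controlled way; one careful sequence, as a sketch:
root@sun1: # cp -p /etc/path_to_inst /etc/path_to_inst.ORG
root@sun1: # vi /etc/path_to_inst
(re-arrange the instance numbers as illustrated above)
root@sun1: # touch /reconfigure
root@sun1: # init 6
The touch /reconfigure plus reboot is equivalent to boot -r from the ok prompt. If the edit goes wrong, boot from alternate media and restore the saved copy.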


Wednesday, May 27, 2009

Giving a disk a volume name or label with the format command.... volname

Some sites have servers with a large number of disks attached, whether physical disk drives or logical unit disks (LUNs) presented by a storage subsystem. It is always a challenge for the sysadmin to administer those disks. The questions: how do we easily identify the disks, which disks belong to which filesystem, which disks are always busy, which disk is part of a certain JBOD or storage subsystem, which disks have already been used, and so on? Furthermore, if those disks are all visible to many servers configured in a cluster, how can they be managed more easily?
Giving each disk a volume name or label can be the solution, and it is a great help.

Solaris bundles a utility for managing disks: the format command.
FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
quit


The volname menu item is the one used to label a disk with a volume name.

Below is a log of executing the command:
root@sun1: # format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t0d0
/pci@1c,600000/scsi@1/sd@0,0
1. c0t1d0
/pci@1c,600000/scsi@1/sd@1,0
2. c0t2d0
/pci@1c,600000/scsi@1/sd@2,0
3. c0t3d0
/pci@1c,600000/scsi@1/sd@3,0
4. c0t4d0
/pci@1c,600000/scsi@1/sd@4,0
5. c0t5d0
/pci@1c,600000/scsi@1/sd@5,0
6. c1t0d0
/pci@1c,600000/scsi@1,1/sd@0,0
7. c1t1d0
/pci@1c,600000/scsi@1,1/sd@1,0
8. c1t2d0
/pci@1c,600000/scsi@1,1/sd@2,0
9. c1t3d0
/pci@1c,600000/scsi@1,1/sd@3,0
10. c1t4d0
/pci@1c,600000/scsi@1,1/sd@4,0
11. c1t5d0
/pci@1c,600000/scsi@1,1/sd@5,0
12. c3t0d0
/pci@1f,700000/scsi@2/sd@0,0
13. c3t1d0
/pci@1f,700000/scsi@2/sd@1,0
14. c6t600A0B800019E29D0000153A437A9463d0
/scsi_vhci/ssd@g600a0b800019e29d0000153a437a9463
15. c6t600A0B800019E29D0000153C437A9511d0
/scsi_vhci/ssd@g600a0b800019e29d0000153c437a9511
16. c6t600A0B800019E29D0000153E437A95B7d0
/scsi_vhci/ssd@g600a0b800019e29d0000153e437a95b7
17. c6t600A0B800019E29D00001538437A93EBd0
/scsi_vhci/ssd@g600a0b800019e29d00001538437a93eb
18. c6t600A0B800019E29D00001540437A96A1d0
/scsi_vhci/ssd@g600a0b800019e29d00001540437a96a1
19. c6t600A0B800019E29D00001542437A97C7d0
/scsi_vhci/ssd@g600a0b800019e29d00001542437a97c7
20. c6t600A0B800019E29D00001544437A98F3d0
/scsi_vhci/ssd@g600a0b800019e29d00001544437a98f3
Specify disk (enter its number): 1
selecting c0t1d0:
[disk formatted]

FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
quit
format> volname
Enter 8-character volume name (remember quotes)[""]:"Boot-Mir"
Ready to label disk, continue? y

format> disk
AVAILABLE DISK SELECTIONS:
0. c0t0d0
/pci@1c,600000/scsi@1/sd@0,0
1. c0t1d0
Boot-Mir
/pci@1c,600000/scsi@1/sd@1,0
2. c0t2d0
/pci@1c,600000/scsi@1/sd@2,0
3. c0t3d0
/pci@1c,600000/scsi@1/sd@3,0
4. c0t4d0
/pci@1c,600000/scsi@1/sd@4,0
5. c0t5d0
/pci@1c,600000/scsi@1/sd@5,0
6. c1t0d0
/pci@1c,600000/scsi@1,1/sd@0,0
7. c1t1d0
/pci@1c,600000/scsi@1,1/sd@1,0
8. c1t2d0
/pci@1c,600000/scsi@1,1/sd@2,0
9. c1t3d0
/pci@1c,600000/scsi@1,1/sd@3,0
10. c1t4d0
/pci@1c,600000/scsi@1,1/sd@4,0
11. c1t5d0
/pci@1c,600000/scsi@1,1/sd@5,0
12. c3t0d0
/pci@1f,700000/scsi@2/sd@0,0
13. c3t1d0
/pci@1f,700000/scsi@2/sd@1,0
14. c6t600A0B800019E29D0000153A437A9463d0
/scsi_vhci/ssd@g600a0b800019e29d0000153a437a9463
15. c6t600A0B800019E29D0000153C437A9511d0
/scsi_vhci/ssd@g600a0b800019e29d0000153c437a9511
16. c6t600A0B800019E29D0000153E437A95B7d0
/scsi_vhci/ssd@g600a0b800019e29d0000153e437a95b7
17. c6t600A0B800019E29D00001538437A93EBd0
/scsi_vhci/ssd@g600a0b800019e29d00001538437a93eb
18. c6t600A0B800019E29D00001540437A96A1d0
/scsi_vhci/ssd@g600a0b800019e29d00001540437a96a1
19. c6t600A0B800019E29D00001542437A97C7d0
/scsi_vhci/ssd@g600a0b800019e29d00001542437a97c7
20. c6t600A0B800019E29D00001544437A98F3d0
/scsi_vhci/ssd@g600a0b800019e29d00001544437a98f3
Specify disk (enter its number): 1
selecting c0t1d0: Boot-Mir
[disk formatted]
format> quit

root@sun1: #
Now we have a perspective on disk c0t1d0 as a sub-mirror disk (Boot-Mir).

Note:
The volname menu item is a non-destructive command. It will not destroy data on a disk, even if the disk has partitions mounted. Nevertheless, running it during initial disk setup is a wise initiative.
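Once volume names are assigned, a quick way to review them all without stepping through the menus is to let format print its disk list and exit at end-of-file; a sketch:
root@sun1: # format < /dev/null
The AVAILABLE DISK SELECTIONS list comes out with each disk's volume name (such as Boot-Mir) alongside its device name, just as in the log above, and format quits when it reaches the "Specify disk" prompt.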


Tuesday, May 26, 2009

In Solaris, there is a system daemon which flushes dirty filesystem pages to disk.... fsflush

When a page is modified, the hardware sets the MMU modified (dirty) bit for that page. When the fsflush daemon runs, it scans the page cache looking for pages with the modified bit set and schedules those pages to be written to disk. Because cached data is regularly flushed from memory to disk, the amount of data lost if the machine crashes is limited.
fsflush is governed by two tunable parameters, tune_t_fsflushr and autoup. The flush rate (set by tune_t_fsflushr) is how often the daemon runs, while autoup determines the maximum age of a dirty page (in seconds). By default tune_t_fsflushr is set to 5 and autoup to 30, which means the daemon runs every 5 seconds and flushes 1/6 of memory each time, so that all dirty pages are covered within 30 seconds. We can make the flush rate smaller and increase the maximum age to spread out the flush load. Basically, the less frequent the flushing, the faster the computer, but the greater the risk of losing data; more flushing means a slower computer (more CPU cycles spent flushing) but less risk of losing data.

Below is a picture of the fsflush daemon on a busy system:
root@sun1: uptime
10:55am up 30 day(s), 21:36 7 users, load average: 21.62, 22.49, 23.37
root@sun1: # sysdef
snip
* Tunable Parameters
30 auto update time limit in seconds (NAUTOUP)
1 fsflush run rate (FSFLUSHR)
snip
root@sun1: ps -ef
UID PID PPID C STIME TTY TIME CMD
root 0 0 0 Mar 07 ? 0:01 sched
root 1 0 0 Mar 07 ? 36:03 /etc/init -
root 2 0 0 Mar 07 ? 0:00 pageout
root 3 0 0 Mar 07 ? 2470:17 fsflush
root 431 1 0 Mar 07 ? 1:33 /usr/lib/inet/in.mpathd -a
root 11 1 0 Mar 07 ? 0:00 /platform/SUNW,Sun-Fire-15000
root 130 1 0 Mar 07 ? 0:02 devfsadmd
root 719 1 0 Mar 07 ? 5:01 /usr/sbin/inetd -s -t
root 19907 1 0 Mar 08 ? 1:45 /usr/sbin/cron
root 785 1 0 Mar 07 ? 47:35 /usr/sbin/nscd
root 747 1 0 Mar 07 ? 2:51 /usr/sbin/syslogd
smmsp 992 1 0 Mar 07 ? 0:00 /usr/lib/sendmail -Ac -q15m
root 2534 1 0 Mar 07 ? 0:00 ./rasserv -d /opt/SUNWstade
root 2828 1 0 Mar 07 ? 32:43 /usr/local/sbin/snmpd
nobody 2799 2677 0 Mar 07 ? 0:08 /usr/apache/bin/httpd
root 2864 2853 0 Mar 07 ? 0:00 /usr/lib/saf/ttymon
oracle 2647 1 0 Mar 24 ? 224:32 ora_pmon_mspro
oracle 4053 4052 0 Mar 07 ? 498:07 /oracle/am1/bin/am1agent
oracle 2668 1 0 Mar 24 ? 64:23 ora_lgwr_mspro
oracle 2676 1 0 Mar 24 ? 50:18 ora_cjqa_mspro
oracle 2672 1 0 Mar 24 ? 24:27 ora_smon_mspro
oracle 2670 1 0 Mar 24 ? 114:32 ora_ckpt_mspro
oracle 28396 1 0 07:10:05 ? 12:48 oraclemspro (LOCAL=NO)
oracle 17848 1 0 07:52:10 ? 0:15 oraclemspro (LOCAL=NO)
oracle 3158 1 0 Mar 24 ? 136:51 ora_arc_mspro

snip

With the tunable parameters set to:
autoup=30
tune_t_fsflushr=1

This shows that fsflush dominated CPU usage. We may be able to reduce fsflush CPU usage by modifying /etc/system as below and rebooting the server afterward.
set autoup=900
set tune_t_fsflushr=1
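The values actually in effect can be checked on the running kernel, before and after the reboot; a sketch using mdb in read-only fashion, plus prstat to watch fsflush itself (PID 3, as in the ps output above):
root@sun1: # echo "autoup/D" | mdb -k
root@sun1: # echo "tune_t_fsflushr/D" | mdb -k
root@sun1: # prstat -p 3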



Monday, May 25, 2009

Patch removal when the system can only boot from alternate media (cdrom, network).... patchrm -R

When installing system or kernel related patches, it is strongly recommended that the work be carried out in single-user mode (run level S). It is also common for certain patches to have Special Install Instructions which must be followed. Neglecting those directions may render the system unbootable, either panicking in a loop or dropping to the ok prompt. However, in rare circumstances, even when the install directions have been followed, a patch installation (patchadd) can fail or end up partially installed, leaving the server unbootable.

The decision now is to uninstall (back out) the problematic patch. Since the server is unbootable, it needs to be booted from alternate media such as cdrom or network, and the patch is then removed with patchrm -R.

The following are the steps for removing a patch when the server cannot boot from its boot disk.
It is assumed that the server has the /, /var, and /usr filesystems on separate disk slices.
1. Boot from alternate media or the network.
ok> boot cdrom -s (or boot net -s)
2. Mount the root file system and any other required file systems.
# mount /dev/dsk/c0t0d0s0 /a
# mount /dev/dsk/c0t0d0s3 /a/var
# mount /dev/dsk/c0t0d0s4 /a/usr


If a filesystem cannot be mounted, it may be necessary to run fsck for an integrity check first.
# fsck /dev/rdsk/c0t0d0s0
# mount /dev/dsk/c0t0d0s0 /a


After all necessary file systems have been mounted, the next step is to uninstall or remove the patch:
# patchrm -R /a <patch-id>

Done. The server should now be bootable.

In a similar scenario, we can also install patches onto a server that has been booted from alternate media, using patchadd -R. This is how Solaris Live Upgrade and the Jumpstart Enterprise Toolkit (JET) apply such patches.
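For illustration, installing a patch onto the mounted boot environment from the alternate-media session might look like this (the patch location and ID are hypothetical):
# cp -r /cdrom/cdrom0/patches/119783-10 /a/var/tmp
# patchadd -R /a /a/var/tmp/119783-10
Note that the patch directory path is given as seen from the running (miniroot) environment, while -R /a tells patchadd which root image to modify.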

Note:
If the boot disk is mirrored (either by Solaris Volume Manager or Veritas Volume Manager), it is recommended to break the mirror and patch only one of the sub-mirror disks. If a problem occurs, it should then be possible to boot from the other sub-mirror (the unpatched sub-mirror disk).


Sunday, May 24, 2009

fork: Not enough space.. swap space limit exceeded.... tmpfs

IHAC (I have a customer) who, during normal business hours, got this message in a terminal window:
bash: fork: Not enough space
He experienced degraded system performance, and later on the system became unresponsive. Simple commands still responded, but took a long time to finish. The /var/adm/messages file, the "df -k" output, and the /etc/vfstab file held the clue:
root@sun1: # cat /var/adm/messages
(snip)
Feb 24 10:06:41 sun1 tmpfs: [ID 518458 kern.warning] WARNING: /tmp: File system full, swap space limit exceeded
Feb 24 10:06:46 sun1 last message repeated 264 times

(snip)
root@sun1: df -k
Filesystem kbytes used avail capacity Mounted on
/dev/vx/dsk/bootdg/rootvol 68392666 24266489 43442251 36% /
/proc 0 0 0 0% /proc
mnttab 0 0 0 0% /etc/mnttab
fd 0 0 0 0% /dev/fd
swap 69801872 184 69801688 1% /var/run
dmpfs 69801688 0 69801688 0% /dev/vx/dmp
dmpfs 69801688 0 69801688 0% /dev/vx/rdmp
swap 69813720 69801688 12032 100% /tmp

(snip)
root@sun1: # cat /etc/vfstab
(snip)
swap - /tmp tmpfs - yes -
(snip)

For some reason, the /tmp filesystem filled up. Since the system was unresponsive, the last resort was rebooting the server (with a hard reset, as it turned out). The /tmp filesystem, by default, is a memory-based file system (tmpfs) and is mounted without any size limitation. Furthermore, any user can consume the whole of /tmp, since it is publicly writable. This case could have been caused by an errant application or by someone dumping files into /tmp. To avoid it happening again, it is suggested to modify the /tmp entry in /etc/vfstab to limit it to an adequate size (e.g. 1024 MB):
root@sun1: cat /etc/vfstab
(snip)
swap - /tmp tmpfs - yes size=1024m
(snip)
Unfortunately, a system reboot is required to activate the change.
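Before resorting to the reboot (or as routine monitoring afterwards), the space hogs in /tmp can usually be identified with standard tools; a quick sketch:
root@sun1: # du -sk /tmp/* 2>/dev/null | sort -n | tail -10
(the ten largest entries in /tmp)
root@sun1: # find /tmp -size +20480 -ls 2>/dev/null
(files larger than about 10 MB; -size counts 512-byte blocks)
root@sun1: # swap -s
(how much swap, which backs tmpfs, is still available)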