Oracle® High Availability Architecture and Best Practices
10g Release 1 (10.1)

Part Number B10726-01

Contents

Title and Copyright Information

Send Us Your Comments

Preface

Audience
Organization
Related Documentation
Conventions
Documentation Accessibility

Part I Getting Started

1 Overview of High Availability

Introduction to High Availability
What is Availability?
Importance of Availability
Causes of Downtime
What Does This Book Contain?
Who Should Read This Book?

2 Determining Your High Availability Requirements

Why It Is Important to Determine High Availability Requirements
Analysis Framework for Determining High Availability Requirements
Business Impact Analysis
Cost of Downtime
Recovery Time Objective
Recovery Point Objective
Choosing a High Availability Architecture
HA Systems Capabilities
Business Performance, Budget and Growth Plans
High Availability Best Practices

Part II Oracle Database High Availability Features, Architectures, and Policies

3 Oracle Database High Availability Features

Oracle Real Application Clusters
Oracle Data Guard
Oracle Streams
Online Reorganization
Transportable Tablespaces
Automatic Storage Management
Flashback Technology
Oracle Flashback Query
Oracle Flashback Version Query
Oracle Flashback Transaction Query
Oracle Flashback Table
Oracle Flashback Drop
Oracle Flashback Database
Dynamic Reconfiguration
Oracle Fail Safe
Recovery Manager
Flash Recovery Area
Hardware Assisted Resilient Data (HARD) Initiative

4 High Availability Architectures

Oracle Database High Availability Architectures
"Database Only" Architecture
"RAC Only" Architecture
"Data Guard Only" Architecture
Maximum Availability Architecture
Streams Architecture
Choosing the Correct HA Architecture
Assessing Other Architectures

5 Operational Policies for High Availability

Introduction to Operational Policies for High Availability
Service Level Management for High Availability
Planning Capacity to Promote High Availability
Change Management for High Availability
Backup and Recovery Planning for High Availability
Disaster Recovery Planning
Planning Scheduled Outages
Staff Training for High Availability
Documentation as a Means of Maintaining High Availability
Physical Security Policies and Procedures for High Availability

Part III Configuring a Highly Available Oracle Environment

6 System and Network Configuration

Overview of System Configuration Recommendations
Recommendations for Configuring Storage
Ensure That All Hardware Components Are Fully Redundant and Fault-Tolerant
Use an Array That Can Be Serviced Online
Mirror and Stripe for Protection and Performance
Load-Balance Across All Physical Interfaces
Create Independent Storage Areas
Storage Recommendations for Specific HA Architectures
Define ASM Disk and Failure Groups Properly
Use HARD-Compliant Storage for the Greatest Protection Against Data Corruption
Storage Recommendation for RAC
Protect the Oracle Cluster Registry and Voting Disk From Media Failure
Recommendations for Configuring Server Hardware
Server Hardware Recommendations for All Architectures
Use Fewer, Faster, and Denser Components
Use Redundant Hardware Components
Use Systems That Can Detect and Isolate Failures
Protect the Boot Disk With a Backup Copy
Server Hardware Recommendations for RAC
Use a Supported Cluster System to Run RAC
Choose the Proper Cluster Interconnect
Server Hardware Recommendations for Data Guard
Use Identical Hardware for Every Machine at Both Sites
Recommendations for Configuring Server Software
Server Software Recommendations for All Architectures
Use the Same OS Version, Patch Level, Single Patches, and Driver Versions
Use an Operating System That is Fault-Tolerant to Hardware Failures
Configure Swap Partitions Appropriately
Set Operating System Parameters to Enable Future Growth
Use Logging or Journal File Systems
Mirror Disks That Contain Oracle and Application Software
Server Software Recommendations for RAC
Use Supported Clustering Software
Use Network Time Protocol (NTP) On All Cluster Nodes
Recommendations for Configuring the Network
Network Configuration Best Practices for All Architectures
Ensure That All Network Components Are Redundant
Use Load Balancers to Distribute Incoming Requests
Network Configuration Best Practices for RAC
Classify Network Interfaces Using the Oracle Interface Configuration Tool
Network Configuration Best Practices for Data Guard
Configure System TCP Parameters Appropriately
Use WAN Traffic Managers to Provide Site Failover Capabilities

7 Oracle Configuration Best Practices

Configuration Best Practices for the Database
Use Two Control Files
Set CONTROL_FILE_RECORD_KEEP_TIME Large Enough
Configure the Size of Redo Log Files and Groups Appropriately
Multiplex Online Redo Log Files
Enable ARCHIVELOG Mode
Enable Block Checksums
Enable Database Block Checking
Log Checkpoints to the Alert Log
Use Fast-Start Checkpointing to Control Instance Recovery Time
Capture Performance Statistics About Timing
Use Automatic Undo Management
Use Locally Managed Tablespaces
Use Automatic Segment Space Management
Use Temporary Tablespaces and Specify a Default Temporary Tablespace
Use Resumable Space Allocation
Use a Flash Recovery Area
Enable Flashback Database
Set Up and Follow Security Best Practices
Use the Database Resource Manager
Use a Server Parameter File
Configuration Best Practices for Real Application Clusters
Register All Instances with Remote Listeners
Do Not Set CLUSTER_INTERCONNECTS Unless Required for Scalability
Configuration Best Practices for Data Guard
Use a Simple, Robust Archiving Strategy and Configuration
Use Multiplexed Standby Redo Logs and Configure Size Appropriately
Enable FORCE LOGGING Mode
Use Real Time Apply
Configure the Database and Listener for Dynamic Service Registration
Tune the Network in a WAN Environment
Determine the Data Protection Mode
Determining the Protection Mode
Changing the Data Protection Mode
Conduct a Performance Assessment with the Proposed Network Configuration
Use a LAN or MAN for Maximum Availability or Maximum Protection Modes
Set SYNC=NOPARALLEL/PARALLEL Appropriately
Use ARCH for the Greatest Performance Throughput
Use the ASYNC Attribute with a 50 MB Buffer for Maximum Performance Mode
Evaluate SSH Port Forwarding with Compression
Set LOG_ARCHIVE_LOCAL_FIRST to TRUE
Provide Secure Transmission of Redo Data
Set DB_UNIQUE_NAME
Set LOG_ARCHIVE_CONFIG Correctly
Recommendations for the Physical Standby Database Only
Tune Media Recovery Performance
Recommendations for the Logical Standby Database Only
Use Supplemental Logging and Primary Key Constraints
Set the MAX_SERVERS Initialization Parameter
Increase the PARALLEL_MAX_SERVERS Initialization Parameter
Set the TRANSACTION_CONSISTENCY Initialization Parameter
Skip SQL Apply for Unnecessary Objects
Configuration Best Practices for MAA
Configure Multiple Standby Instances
Configure Connect-Time Failover for Network Service Descriptors
Recommendations for Backup and Recovery
Use Recovery Manager to Back Up Database Files
Understand When to Use Backups
Perform Regular Backups
Initial Data Guard Environment Set-Up
Recovering from Data Failures Using File or Block Media Recovery
Double Failure Resolution
Long-Term Backups
Use an RMAN Recovery Catalog
Use the Autobackup Feature for the Control File and SPFILE
Use Incrementally Updated Backups to Reduce Restoration Time
Enable Change Tracking to Reduce Backup Time
Create Database Backups on Disk in the Flash Recovery Area
Create Tape Backups from the Flash Recovery Area
Determine Retention Policy and Backup Frequency
Configure the Size of the Flash Recovery Area Properly
In a Data Guard Environment, Back Up to the Flash Recovery Area on All Sites
During Backups, Use the Target Database Control File as the RMAN Repository
Regularly Check Database Files for Corruption
Periodically Test Recovery Procedures
Back Up the OCR to Tape or Offsite
Recommendations for Fast Application Failover
Configure Connection Descriptors for All Possible Production Instances
Use RAC Availability Notifications and Events
Use Transparent Application Failover If RAC Notification Is Not Feasible
New Connections
Existing Connections
LOAD_BALANCE Parameter in the Connection Descriptor
FAILOVER Parameter in the Connection Descriptor
SERVICE_NAME Parameter in the Connection Descriptor
RETRIES Parameter in the Connection Descriptor
DELAY Parameter in the Connection Descriptor
Configure Services
Configure CRS for High Availability
Configure Service Callouts to Notify Middle-Tier Applications and Clients
Publish Standby or Nonproduction Services
Publish Production Services

Part IV Managing a Highly Available Oracle Environment

8 Using Oracle Enterprise Manager for Monitoring and Detection

Overview of Monitoring and Detection for High Availability
Using Enterprise Manager for System Monitoring
Set Up Default Notification Rules for Each System
Use Database Target Views to Monitor Health, Availability, and Performance
Use Event Notifications to React to Metric Changes
Use Events to Monitor Data Guard System Availability
Managing the HA Environment with Enterprise Manager
Check Enterprise Manager Policy Violations
Use Enterprise Manager to Manage Oracle Patches and Maintain System Baselines
Use Enterprise Manager to Manage Data Guard Targets
Highly Available Architectures for Enterprise Manager
Recommendations for an HA Architecture for Enterprise Manager
Protect the Repository and Processes As Well as the Configuration They Monitor
Place the Management Repository in a RAC Instance and Use Data Guard
Configure At Least Two Management Service Processes and Load Balance Them
Consider Hosting Enterprise Manager on the Same Hardware as an HA System
Monitor the Network Bandwidth Between Processes and Agents
Unscheduled Outages for Enterprise Manager
Additional Enterprise Manager Configuration
Configure a Separate Listener for Enterprise Manager
Install the Management Repository Into an Existing Database

9 Recovering from Outages

Recovery Steps for Unscheduled Outages
Recovery Steps for Unscheduled Outages on the Primary Site
Recovery Steps for Unscheduled Outages on the Secondary Site
Recovery Steps for Scheduled Outages
Recovery Steps for Scheduled Outages on the Primary Site
Recovery Steps for Scheduled Outages on the Secondary Site
Preparing for Scheduled Secondary Site Maintenance

10 Detailed Recovery Steps

Summary of Recovery Operations
Complete or Partial Site Failover
Complete Site Failover
Partial Site Failover: Middle-Tier Applications Connect to a Remote Database Server
Database Failover
When to Use Data Guard Failover
When Not to Use Data Guard Failover
Data Guard Failover Using SQL*Plus
Physical Standby Failover Using SQL*Plus
Logical Standby Failover Using SQL*Plus
Database Switchover
When to Use Data Guard Switchover
When Not to Use Data Guard Switchover
Data Guard Switchover Using SQL*Plus
Physical Standby Switchover Using SQL*Plus
Logical Standby Switchover Using SQL*Plus
RAC Recovery
RAC Recovery for Unscheduled Outages
Automatic Instance Recovery for Failed Instances
Single Node Failure in Real Application Clusters
Multiple Node Failures in Real Application Clusters
Automatic Service Relocation
RAC Recovery for Scheduled Outages
Disabling CRS-Managed Resources
Planned Service Relocation
Apply Instance Failover
Performing an Apply Instance Failover Using SQL*Plus
Step 1: Ensure That the Chosen Standby Instance is Mounted
Step 2: Verify Oracle Net Connection to the Chosen Standby Host
Step 3: Start Recovery on the Chosen Standby Instance
Step 4: Copy Archived Redo Logs to the New Apply Host
Step 5: Verify the New Configuration
Recovery Solutions for Data Failures
Detecting and Recovering From Datafile Block Corruption
Detecting Datafile Block Corruption
Recovering From Datafile Block Corruption
Determine the Extent of the Corruption Problem
Replace or Move Away From Faulty Hardware
Determine Which Objects Are Affected
Decide Which Recovery Method to Use
Recovering From Media Failure
Determine the Extent of the Media Failure
Replace or Move Away From Faulty Hardware
Decide Which Recovery Action to Take
Recovery Methods for Data Failures
Use RMAN Datafile Media Recovery
Use RMAN Block Media Recovery
Re-Create Objects Manually
Use Data Guard to Recover From Data Failure
Recovering from User Error with Flashback Technology
Resolving Row and Transaction Inconsistencies
Flashback Query
Flashback Version Query
Flashback Transaction Query
Example: Using Flashback Technology to Investigate Salary Discrepancy
Resolving Table Inconsistencies
Flashback Table
Flashback Drop
Resolving Database-Wide Inconsistencies
Flashback Database
Using Flashback Database to Repair a Dropped Tablespace
RAC Rolling Upgrade
Applying a Patch with opatch
Rolling Back a Patch with opatch
Using opatch to List Installed Software Components and Patches
Recommended Practices for RAC Rolling Upgrades
Upgrade with Logical Standby Database
Online Object Reorganization
Online Table Reorganization
Online Index Reorganization
Online Tablespace Reorganization

11 Restoring Fault Tolerance

Restoring Full Tolerance
Restoring Failed Nodes or Instances in a RAC Cluster
Recovering Service Availability
Considerations for Client Connections After Restoring a RAC Instance
Restoring the Standby Database After a Failover
Restoring a Physical Standby Database After a Failover
Step 1P: Retrieve STANDBY_BECAME_PRIMARY_SCN
Step 2P: Flash Back the Previous Production Database
Step 3P: Mount New Standby Database From Previous Production Database
Step 4P: Archive to New Standby Database From New Production Database
Step 5P: Start Managed Recovery
Step 6P: Restart MRP After It Encounters the End-of-Redo Marker
Restoring a Logical Standby Database After a Failover
Step 1L: Retrieve END_PRIMARY_SCN
Step 2L: Flash Back the Previous Production Database
Step 3L: Open New Logical Standby Database and Start SQL Apply
Restoring Fault Tolerance after Secondary Site or Clusterwide Scheduled Outage
Step 1: Start the Standby Database
Step 2: Start Recovery
Step 3: Verify Log Transport Services on Production Database
Step 4: Verify that Recovery is Progressing on Standby Database
Step 5: Restore Production Database Protection Mode
Restoring Fault Tolerance after a Standby Database Data Failure
Step 1: Fix the Cause of the Outage
Step 2: Restore the Backup of Affected Datafiles
Step 3: Restore Required Archived Redo Log Files
Step 4: Start the Standby Database
Step 5: Start Recovery or Apply
Step 6: Verify Log Transport Services On the Production Database
Step 7: Verify that Recovery or Apply Is Progressing On the Standby Database
Step 8: Restore Production Database Protection Mode
Restoring Fault Tolerance After the Production Database Has Opened Resetlogs
Scenario 1: SCN on Standby is Behind Resetlogs SCN on Production
Scenario 2: SCN on Standby is Ahead of Resetlogs SCN on Production
Restoring Fault Tolerance after Dual Failures

A Hardware Assisted Resilient Data (HARD) Initiative

Preventing Data Corruptions with HARD-Compliant Storage
Data Corruptions
Types of Data Corruption Addressed by HARD
Possible HARD Checks

B Database SPFILE and Oracle Net Configuration File Samples

SPFILE Samples
Oracle Net Configuration Files
SQLNET.ORA File Example for All Hosts Using Dynamic Instance Registration
LISTENER.ORA File Example for All Hosts Using Dynamic Instance Registration
TNSNAMES.ORA File Example for All Hosts Using Dynamic Instance Registration

Index