Contents
- Audience
- Organization
- Related Documentation
- Conventions
- Documentation Accessibility
Introduction to High Availability
What is Availability?
Importance of Availability
Causes of Downtime
What Does This Book Contain?
Who Should Read This Book?
- Why It Is Important to Determine High Availability Requirements
- Analysis Framework for Determining High Availability Requirements
- Business Impact Analysis
- Cost of Downtime
- Recovery Time Objective
- Recovery Point Objective
- Choosing a High Availability Architecture
- HA Systems Capabilities
- Business Performance, Budget and Growth Plans
- High Availability Best Practices
- Oracle Real Application Clusters
- Oracle Data Guard
- Oracle Streams
- Online Reorganization
- Transportable Tablespaces
- Automatic Storage Management
- Flashback Technology
- Oracle Flashback Query
- Oracle Flashback Version Query
- Oracle Flashback Transaction Query
- Oracle Flashback Table
- Oracle Flashback Drop
- Oracle Flashback Database
- Dynamic Reconfiguration
- Oracle Fail Safe
- Recovery Manager
- Flash Recovery Area
Hardware Assisted Resilient Data (HARD) Initiative
- Oracle Database High Availability Architectures
- "Database Only" Architecture
- "RAC Only" Architecture
- "Data Guard Only" Architecture
- Maximum Availability Architecture
- Streams Architecture
- Choosing the Correct HA Architecture
Assessing Other Architectures
- Introduction to Operational Policies for High Availability
- Service Level Management for High Availability
- Planning Capacity to Promote High Availability
- Change Management for High Availability
- Backup and Recovery Planning for High Availability
- Disaster Recovery Planning
- Planning Scheduled Outages
- Staff Training for High Availability
- Documentation as a Means of Maintaining High Availability
- Physical Security Policies and Procedures for High Availability
- Overview of System Configuration Recommendations
- Recommendations for Configuring Storage
- Ensure That All Hardware Components Are Fully Redundant and Fault-Tolerant
- Use an Array That Can Be Serviced Online
- Mirror and Stripe for Protection and Performance
- Load-Balance Across All Physical Interfaces
- Create Independent Storage Areas
- Storage Recommendations for Specific HA Architectures
- Define ASM Disk and Failure Groups Properly
- Use HARD-Compliant Storage for the Greatest Protection Against Data Corruption
- Storage Recommendation for RAC
- Protect the Oracle Cluster Registry and Voting Disk From Media Failure
- Recommendations for Configuring Server Hardware
- Server Hardware Recommendations for All Architectures
- Use Fewer, Faster, and Denser Components
- Use Redundant Hardware Components
- Use Systems That Can Detect and Isolate Failures
- Protect the Boot Disk With a Backup Copy
- Server Hardware Recommendations for RAC
- Use a Supported Cluster System to Run RAC
- Choose the Proper Cluster Interconnect
- Server Hardware Recommendations for Data Guard
- Use Identical Hardware for Every Machine at Both Sites
- Recommendations for Configuring Server Software
- Server Software Recommendations for All Architectures
- Use the Same OS Version, Patch Level, Single Patches, and Driver Versions
- Use an Operating System That is Fault-Tolerant to Hardware Failures
- Configure Swap Partitions Appropriately
- Set Operating System Parameters to Enable Future Growth
- Use Logging or Journal File Systems
- Mirror Disks That Contain Oracle and Application Software
- Server Software Recommendations for RAC
- Use Supported Clustering Software
- Use Network Time Protocol (NTP) On All Cluster Nodes
- Recommendations for Configuring the Network
- Network Configuration Best Practices for All Architectures
- Ensure That All Network Components Are Redundant
- Use Load Balancers to Distribute Incoming Requests
- Network Configuration Best Practices for RAC
- Classify Network Interfaces Using the Oracle Interface Configuration Tool
- Network Configuration Best Practices for Data Guard
- Configure System TCP Parameters Appropriately
- Use WAN Traffic Managers to Provide Site Failover Capabilities
- Configuration Best Practices for the Database
- Use Two Control Files
- Set CONTROL_FILE_RECORD_KEEP_TIME Large Enough
- Configure the Size of Redo Log Files and Groups Appropriately
- Multiplex Online Redo Log Files
- Enable ARCHIVELOG Mode
- Enable Block Checksums
- Enable Database Block Checking
- Log Checkpoints to the Alert Log
- Use Fast-Start Checkpointing to Control Instance Recovery Time
- Capture Performance Statistics About Timing
- Use Automatic Undo Management
- Use Locally Managed Tablespaces
- Use Automatic Segment Space Management
- Use Temporary Tablespaces and Specify a Default Temporary Tablespace
- Use Resumable Space Allocation
- Use a Flash Recovery Area
- Enable Flashback Database
- Set Up and Follow Security Best Practices
- Use the Database Resource Manager
- Use a Server Parameter File
Configuration Best Practices for Real Application Clusters
- Register All Instances with Remote Listeners
- Do Not Set CLUSTER_INTERCONNECTS Unless Required for Scalability
Configuration Best Practices for Data Guard
- Use a Simple, Robust Archiving Strategy and Configuration
- Use Multiplexed Standby Redo Logs and Configure Size Appropriately
- Enable FORCE LOGGING Mode
- Use Real Time Apply
- Configure the Database and Listener for Dynamic Service Registration
- Tune the Network in a WAN Environment
- Determine the Data Protection Mode
- Determining the Protection Mode
- Changing the Data Protection Mode
- Conduct a Performance Assessment with the Proposed Network Configuration
- Use a LAN or MAN for Maximum Availability or Maximum Protection Modes
- Set SYNC=NOPARALLEL/PARALLEL Appropriately
- Use ARCH for the Greatest Performance Throughput
- Use the ASYNC Attribute with a 50 MB Buffer for Maximum Performance Mode
Evaluate SSH Port Forwarding with Compression
Set LOG_ARCHIVE_LOCAL_FIRST to TRUE
Provide Secure Transmission of Redo Data
Set DB_UNIQUE_NAME
Set LOG_ARCHIVE_CONFIG Correctly
Recommendations for the Physical Standby Database Only
- Tune Media Recovery Performance
Recommendations for the Logical Standby Database Only
- Use Supplemental Logging and Primary Key Constraints
- Set the MAX_SERVERS Initialization Parameter
- Increase the PARALLEL_MAX_SERVERS Initialization Parameter
- Set the TRANSACTION_CONSISTENCY Initialization Parameter
- Skip SQL Apply for Unnecessary Objects
Configuration Best Practices for MAA
- Configure Multiple Standby Instances
- Configure Connect-Time Failover for Network Service Descriptors
Recommendations for Backup and Recovery
- Use Recovery Manager to Back Up Database Files
- Understand When to Use Backups
- Perform Regular Backups
- Initial Data Guard Environment Set-Up
- Recovering from Data Failures Using File or Block Media Recovery
- Double Failure Resolution
- Long-Term Backups
- Use an RMAN Recovery Catalog
- Use the Autobackup Feature for the Control File and SPFILE
- Use Incrementally Updated Backups to Reduce Restoration Time
- Enable Change Tracking to Reduce Backup Time
- Create Database Backups on Disk in the Flash Recovery Area
- Create Tape Backups from the Flash Recovery Area
- Determine Retention Policy and Backup Frequency
- Configure the Size of the Flash Recovery Area Properly
- In a Data Guard Environment, Back Up to the Flash Recovery Area on All Sites
- During Backups, Use the Target Database Control File as the RMAN Repository
- Regularly Check Database Files for Corruption
- Periodically Test Recovery Procedures
- Back Up the OCR to Tape or Offsite
Recommendations for Fast Application Failover
- Configure Connection Descriptors for All Possible Production Instances
- Use RAC Availability Notifications and Events
- Use Transparent Application Failover If RAC Notification Is Not Feasible
- New Connections
- Existing Connections
- LOAD_BALANCE Parameter in the Connection Descriptor
- FAILOVER Parameter in the Connection Descriptor
- SERVICE_NAME Parameter in the Connection Descriptor
- RETRIES Parameter in the Connection Descriptor
- DELAY Parameter in the Connection Descriptor
Configure Services
Configure CRS for High Availability
Configure Service Callouts to Notify Middle-Tier Applications and Clients
Publish Standby or Nonproduction Services
Publish Production Services
- Overview of Monitoring and Detection for High Availability
- Using Enterprise Manager for System Monitoring
- Set Up Default Notification Rules for Each System
- Use Database Target Views to Monitor Health, Availability, and Performance
- Use Event Notifications to React to Metric Changes
- Use Events to Monitor Data Guard System Availability
- Managing the HA Environment with Enterprise Manager
- Check Enterprise Manager Policy Violations
- Use Enterprise Manager to Manage Oracle Patches and Maintain System Baselines
- Use Enterprise Manager to Manage Data Guard Targets
- Highly Available Architectures for Enterprise Manager
- Recommendations for an HA Architecture for Enterprise Manager
- Protect the Repository and Processes As Well as the Configuration They Monitor
- Place the Management Repository in a RAC Instance and Use Data Guard
- Configure At Least Two Management Service Processes and Load Balance Them
- Consider Hosting Enterprise Manager on the Same Hardware as an HA System
- Monitor the Network Bandwidth Between Processes and Agents
- Unscheduled Outages for Enterprise Manager
- Additional Enterprise Manager Configuration
- Configure a Separate Listener for Enterprise Manager
- Install the Management Repository Into an Existing Database
- Recovery Steps for Unscheduled Outages
- Recovery Steps for Unscheduled Outages on the Primary Site
- Recovery Steps for Unscheduled Outages on the Secondary Site
- Recovery Steps for Scheduled Outages
- Recovery Steps for Scheduled Outages on the Primary Site
- Recovery Steps for Scheduled Outages on the Secondary Site
- Preparing for Scheduled Secondary Site Maintenance
- Summary of Recovery Operations
- Complete or Partial Site Failover
- Complete Site Failover
- Partial Site Failover: Middle-Tier Applications Connect to a Remote Database Server
- Database Failover
- When to Use Data Guard Failover
- When Not to Use Data Guard Failover
Data Guard Failover Using SQL*Plus
- Physical Standby Failover Using SQL*Plus
- Logical Standby Failover Using SQL*Plus
Database Switchover
- When to Use Data Guard Switchover
- When Not to Use Data Guard Switchover
- Data Guard Switchover Using SQL*Plus
- Physical Standby Switchover Using SQL*Plus
- Logical Standby Switchover Using SQL*Plus
RAC Recovery
- RAC Recovery for Unscheduled Outages
- Automatic Instance Recovery for Failed Instances
- Single Node Failure in Real Application Clusters
- Multiple Node Failures in Real Application Clusters
- Automatic Service Relocation
- RAC Recovery for Scheduled Outages
- Disabling CRS-Managed Resources
- Planned Service Relocation
Apply Instance Failover
- Performing an Apply Instance Failover Using SQL*Plus
- Step 1: Ensure That the Chosen Standby Instance is Mounted
- Step 2: Verify Oracle Net Connection to the Chosen Standby Host
- Step 3: Start Recovery on the Chosen Standby Instance
- Step 4: Copy Archived Redo Logs to the New Apply Host
- Step 5: Verify the New Configuration
Recovery Solutions for Data Failures
- Detecting and Recovering From Datafile Block Corruption
- Detecting Datafile Block Corruption
- Recovering From Datafile Block Corruption
- Determine the Extent of the Corruption Problem
- Replace or Move Away From Faulty Hardware
- Determine Which Objects Are Affected
- Decide Which Recovery Method to Use
- Recovering From Media Failure
- Determine the Extent of the Media Failure
- Replace or Move Away From Faulty Hardware
- Decide Which Recovery Action to Take
- Recovery Methods for Data Failures
- Use RMAN Datafile Media Recovery
- Use RMAN Block Media Recovery
- Re-Create Objects Manually
- Use Data Guard to Recover From Data Failure
Recovering from User Error with Flashback Technology
- Resolving Row and Transaction Inconsistencies
- Flashback Query
- Flashback Version Query
- Flashback Transaction Query
- Example: Using Flashback Technology to Investigate Salary Discrepancy
- Resolving Table Inconsistencies
- Flashback Table
- Flashback Drop
- Resolving Database-Wide Inconsistencies
- Flashback Database
- Using Flashback Database to Repair a Dropped Tablespace
RAC Rolling Upgrade
- Applying a Patch with opatch
- Rolling Back a Patch with opatch
- Using opatch to List Installed Software Components and Patches
- Recommended Practices for RAC Rolling Upgrades
Upgrade with Logical Standby Database
Online Object Reorganization
- Online Table Reorganization
- Online Index Reorganization
- Online Tablespace Reorganization
- Restoring Full Tolerance
- Restoring Failed Nodes or Instances in a RAC Cluster
- Recovering Service Availability
- Considerations for Client Connections After Restoring a RAC Instance
- Restoring the Standby Database After a Failover
- Restoring a Physical Standby Database After a Failover
- Step 1P: Retrieve STANDBY_BECAME_PRIMARY_SCN
- Step 2P: Flash Back the Previous Production Database
- Step 3P: Mount New Standby Database From Previous Production Database
- Step 4P: Archive to New Standby Database From New Production Database
- Step 5P: Start Managed Recovery
- Step 6P: Restart MRP After It Encounters the End-of-Redo Marker
- Restoring a Logical Standby Database After a Failover
- Step 1L: Retrieve END_PRIMARY_SCN
- Step 2L: Flash Back the Previous Production Database
- Step 3L: Open New Logical Standby Database and Start SQL Apply
- Restoring Fault Tolerance after Secondary Site or Clusterwide Scheduled Outage
- Step 1: Start the Standby Database
- Step 2: Start Recovery
- Step 3: Verify Log Transport Services on Production Database
- Step 4: Verify that Recovery is Progressing on Standby Database
- Step 5: Restore Production Database Protection Mode
- Restoring Fault Tolerance after a Standby Database Data Failure
- Step 1: Fix the Cause of the Outage
- Step 2: Restore the Backup of Affected Datafiles
- Step 3: Restore Required Archived Redo Log Files
- Step 4: Start the Standby Database
- Step 5: Start Recovery or Apply
- Step 6: Verify Log Transport Services On the Production Database
- Step 7: Verify that Recovery or Apply Is Progressing On the Standby Database
- Step 8: Restore Production Database Protection Mode
- Restoring Fault Tolerance After the Production Database Has Opened Resetlogs
- Scenario 1: SCN on Standby is Behind Resetlogs SCN on Production
- Scenario 2: SCN on Standby is Ahead of Resetlogs SCN on Production
- Restoring Fault Tolerance after Dual Failures
- Preventing Data Corruptions with HARD-Compliant Storage
- Data Corruptions
- Types of Data Corruption Addressed by HARD
- Possible HARD Checks
- SPFILE Samples
- Oracle Net Configuration Files
- SQLNET.ORA File Example for All Hosts Using Dynamic Instance Registration
- LISTENER.ORA File Example for All Hosts Using Dynamic Instance Registration
- TNSNAMES.ORA File Example for All Hosts Using Dynamic Instance Registration