Download HA_Synchronization_RevB
The PA is to be operated in HA mode with 3 HA links. Configuration had already begun … and nobody noticed that the HA link was down (completely or intermittently). This presumably left the configuration in an inconsistent state. We are trying to work our way toward the cause …
First, the output of
show high-availability state
admin@PA1(active-primary)>
Group 1:
Mode: Active-Active
Local Information:
Version: 1
Mode: Active-Active
State: active-primary (last 25 days)
Device Information:
Management IPv4 Address: 172.25.3.99/24
Management IPv6 Address:
Jumbo-Frames disabled; MTU 1500
HA1 Control Links Joint Configuration:
Encryption Enabled: no
Election Option Information:
Priority: 100
Preemptive: no
Version Compatibility:
Software Version: Match
Application Content Compatibility: Match
Threat Content Compatibility: Match
Anti-Virus Compatibility: Match
VPN Client Software Compatibility: Match
Global Protect Client Software Compatibility: Match
State Synchronization: synchronized; type: ethernet
Peer Information:
Connection status: up
Version: 1
Mode: Active-Active
State: active-secondary (last 6 minutes)
Last suspended state reason: User requested
Device Information:
Management IPv4 Address: 172.25.3.199/24
Management IPv6 Address:
Jumbo-Frames disabled; MTU 1500
Connection down; Reason: Never able to connect to peer
Connection up; Primary HA1 link
Election Option Information:
Priority: 200
Preemptive: no
Configuration Synchronization:
Enabled: yes
Running Configuration: not synchronized
Out-of-sync Reason: Failure to complete config sync
admin@PA2(active-secondary)>
Group 1:
Mode: Active-Active
Local Information:
Version: 1
Mode: Active-Active
State: active-secondary (last 5 minutes)
Device Information:
Management IPv4 Address: 172.25.3.199/24
Management IPv6 Address:
Jumbo-Frames disabled; MTU 1500
HA1 Control Links Joint Configuration:
Encryption Enabled: no
Election Option Information:
Priority: 200
Preemptive: no
Version Compatibility:
Software Version: Match
Application Content Compatibility: Match
Threat Content Compatibility: Match
Anti-Virus Compatibility: Match
VPN Client Software Compatibility: Match
Global Protect Client Software Compatibility: Match
State Synchronization: synchronized; type: ethernet
Peer Information:
Connection status: up
Version: 1
Mode: Active-Active
State: active-primary (last 6 minutes)
Device Information:
Management IPv4 Address: 172.25.3.99/24
Management IPv6 Address:
Jumbo-Frames disabled; MTU 1500
Connection down; Reason: Never able to connect to peer
Connection up; Primary HA1 link
Election Option Information:
Priority: 100
Preemptive: no
Configuration Synchronization:
Enabled: yes
Running Configuration: not synchronized
Out-of-sync Reason: Started with config out-of-sync
admin@PA2(active-secondary)>
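The line that matters in both outputs is "Running Configuration: not synchronized", and it is easy to overlook in such a long listing. As a minimal sketch, assuming the CLI output has been saved as plain text, a few lines of Python can pull out the sync status and the out-of-sync reason. The function name `config_sync_status` is made up for this example and is not part of any Palo Alto tool.

```python
def config_sync_status(ha_state_text):
    """Return (running_config_status, out_of_sync_reason) from the CLI text.

    Both values are None if the corresponding line is not present.
    """
    status = None
    reason = None
    for line in ha_state_text.splitlines():
        line = line.strip()
        if line.startswith("Running Configuration:"):
            status = line.split(":", 1)[1].strip()
        elif line.startswith("Out-of-sync Reason:"):
            reason = line.split(":", 1)[1].strip()
    return status, reason


# Excerpt from the PA1 output shown above.
sample = """\
Configuration Synchronization:
Enabled: yes
Running Configuration: not synchronized
Out-of-sync Reason: Failure to complete config sync
"""

status, reason = config_sync_status(sample)
if status != "synchronized":
    print(f"Config out of sync: {reason}")
```

Run against the PA2 output, the same function would report "Started with config out-of-sync" as the reason.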
The command
request high-availability sync-to-remote running-config
is supposed to transfer the configuration to the other peer. Let's try it …
admin@PA1(active-primary)> request high-availability sync-to-remote running-config
<Enter> Finish input
admin@PA1(active-primary)> request high-availability sync-to-remote running-config
Executing this command will overwrite the candidate configuration on
the peer and trigger a commit on the peer.
Do you want to continue(y/n)? (y or n)
Successfully synchronized running configuration with HA peer
It worked. Now we check whether the data is still there …
Everything appears to be OK, yet the data is still inconsistent. The cause was most likely the failure of the HA1 link, which among other things is also responsible for syncing the configuration:
HA1 Link Failure
If the HA1 Link fails and there is no HA1 Backup configured, configuration synchronization will fail and a split brain condition will be created. Split brain conditions occur when HA members can no longer communicate with each other to exchange HA monitoring information. Each HA member will assume the other member is in a non-functional state and take over as the Active (A/P) or Active-Primary (A/A). Split brain conditions can be prevented by configuring an HA1 Backup link and/or enabling Heartbeat Backup.
The HA control link, also known as the HA1 link, is used by the HA agent for the devices in HA to communicate with one another. The HA1 link is a layer3 link requiring an IP address. The HA agent uses TCP port 28769 for clear text communication, or SSH over TCP port 49969 if using encryption. This connection is used to send and receive hellos and HA state information, and configuration sync and management plane sync, such as routing and user-id information. Configuration changes to either unit are automatically synchronized to the other device over this link. The PA-4000 Series and PA-5000 Series firewalls have dedicated HA1 links. All other platforms require a revenue port to be configured as a HA1 link.
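The two port numbers quoted above are handy when checking whether anything between the peers (an ACL, an intermediate switch or firewall) blocks HA1 traffic. The following is only a sketch: the helper names are invented for this example, and whether a firewall accepts a TCP connection on its HA1 address from a third host at all is an assumption that has to be verified in the actual setup.

```python
import socket

HA1_CLEAR_TEXT_PORT = 28769  # per the quoted text: clear text HA1 communication
HA1_ENCRYPTED_PORT = 49969   # per the quoted text: SSH when encryption is enabled


def ha1_port(encryption_enabled):
    """Pick the HA1 TCP port that matches the 'Encryption Enabled' setting."""
    return HA1_ENCRYPTED_PORT if encryption_enabled else HA1_CLEAR_TEXT_PORT


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

The outputs above show "Encryption Enabled: no", so `ha1_port(False)` applies here. A call such as `tcp_reachable(peer_ha1_ip, ha1_port(False))` would then need the peer's HA1 address, which is not given in this post.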
Monitor Hold time: HA control link monitoring tracks the state of the HA1 link to see if the peer HA device is down. This will catch a power-cycle, a reboot, or a power down of the peer device. To ignore the flapping of a link that wouldn’t necessarily take the HA control connection down, a monitor hold down timer for the HA control link monitoring can be configured. The monitor hold down time is configured under the HA1 link. The default value is 3000ms.
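The idea behind the hold-down timer can be illustrated with a small simulation: a link only counts as down once it has stayed down for the full hold time, so short flaps are ignored. This is a sketch of the principle only, not of how PAN-OS implements it. The function name and the sample timestamps are made up, and only the 3000 ms default comes from the text above.

```python
def link_declared_down(samples, hold_ms=3000):
    """Decide whether the link is declared down.

    samples: list of (timestamp_ms, link_up) tuples in time order.
    Returns True once the link has been continuously down for hold_ms.
    """
    down_since = None
    for ts, up in samples:
        if up:
            down_since = None          # link recovered, reset the timer
        else:
            if down_since is None:
                down_since = ts        # start of a down period
            if ts - down_since >= hold_ms:
                return True
    return False


# A short flap: the link drops twice but recovers within the hold time.
flap = [(0, True), (500, False), (1500, True), (2000, False), (2500, True)]
# A real outage: the link stays down from 500 ms to 3500 ms.
outage = [(0, True), (500, False), (2000, False), (3500, False)]

print(link_declared_down(flap))    # False
print(link_declared_down(outage))  # True
```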
And one more recommendation:
Configuration changes, commits, and synchronization between HA members should be planned and overlapping changes and commits should be avoided whenever possible.
Because this went unnoticed, it is urgently necessary to be notified of such errors by email. Such an email would have helped earlier. To do this, configure a mail recipient together with an SMTP server and under
54141
Committing a critical High Availability (HA) group configuration was resulting in an email alert following commit: “SYSTEM ALERT: critical: HA Group 1: Running configuration not synchronized after retries”. A timeout on the HA peer while committing the HA synchronization caused the email alert to be generated.
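If the system log is also collected outside the firewall, the same alert text can be picked up by a small script. The following is a sketch under that assumption. It only builds the mail and does not send it. The function name, the subject line, and the addresses in the usage note are placeholders, and the marker string is taken from the alert quoted above.

```python
from email.message import EmailMessage

# Text fragment taken from the alert quoted above.
ALERT_MARKER = "Running configuration not synchronized"


def build_alert(log_line, sender, recipient):
    """Return an EmailMessage for a matching log line, otherwise None."""
    if ALERT_MARKER not in log_line:
        return None
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "HA config sync alert"
    msg.set_content(log_line)
    return msg
```

A call such as `build_alert(line, "fw@example.com", "admin@example.com")` returns a message that could then be handed to `smtplib.SMTP(...).send_message(msg)`. The SMTP server would be the same one configured for the firewall's own alerts.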
Here is how:
Further findings: the problem appears to be related to Active/Active HA. The answers are still pending. In any case, commit force on the CLI can serve as a workaround.
configure
commit force
….
to be continued :=)