In-Service Software Upgrade (ISSU) on QFX5100
Traditionally, upgrading a device requires downtime while the Routing Engine is upgraded.
You can avoid the downtime if you have redundant Routing Engines. This is possible on chassis-based platforms, where high availability features such as Graceful Routing Engine Switchover (GRES), NonStop Routing (NSR) and NonStop Bridging (NSB) allow a smoother upgrade with minimal or no service impact.
However, fixed-configuration platforms have no redundant Routing Engines. Until now.
The QFX5100 Series switches run Junos OS within a virtual machine (VM) on top of a Linux-based host OS. During an ISSU, Junos OS runs in two separate VMs as an active and standby pair, just as on any other platform with dual Routing Engines (REs). The VMs that represent the redundant REs move to the newer software version while the data plane keeps forwarding. ISSU is supported across all Layer 2 and Layer 3 protocols.
Consider the diagram below to understand the ISSU architecture:
Disclaimer:
I took the ISSU architecture diagram from the Data Center Switching (DCX) course.
During normal switch operation, Junos OS runs on only one VM (VM-A in our example). When ISSU is initiated, a second VM (VM-B in our example) is launched with the new version of the software. Once VM-B has launched, it synchronizes protocol states with VM-A. When the synchronization process is finished, VM-B takes over control and VM-A is shut down.
To support the ISSU operations on a QFX5100 Series switch, a number of connections and communications between the two VMs must exist. The passive connections become active when the graceful Routing Engine switchover (GRES) event associated with ISSU is complete.
The em0 and em1 interfaces are management ports. The em2 interface is used by the master Routing Engine (RE) to communicate with the host machine. The em3 interfaces are used for RE-to-RE communication during an ISSU operation.
There are a few requirements for ISSU to be possible:
- GRES must be configured
- NSB must be configured
- NSR must be configured
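Each of the three prerequisites corresponds to a Junos configuration statement. As a minimal sketch, the Python helper below prints them as "set" commands for configuration mode. The names issu_prerequisite_commands and ISSU_PREREQUISITES are hypothetical, for illustration only, and you should confirm the statements against the documentation for your Junos release before committing.

```python
# Junos configuration statements that enable the ISSU prerequisites.
# The statement paths are standard Junos; this helper itself is hypothetical.
ISSU_PREREQUISITES = {
    "GRES": "chassis redundancy graceful-switchover",
    "NSR": "routing-options nonstop-routing",
    "NSB": "protocols layer2-control nonstop-bridging",
}


def issu_prerequisite_commands() -> list[str]:
    """Return the 'set' commands that enable GRES, NSR and NSB, in that order."""
    return [f"set {path}" for path in ISSU_PREREQUISITES.values()]


if __name__ == "__main__":
    # Print the commands followed by a commit, ready for configuration mode.
    for command in issu_prerequisite_commands():
        print(command)
    print("commit")
```

Configuring only some of these statements reproduces the failed attempts shown in the logs further down.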
This is what happens during ISSU:
- Check that the prerequisites are met
- Spawn the backup RE (RE1) with the new software
- Synchronize RE1 with RE0
- Make RE1 the master RE with PFE control
- Rename RE1 to RE0
- Shut down the initial VM
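To make the sequence concrete, here is a simplified Python model of how the two VMs change roles during these steps. It is a hypothetical sketch, not how Junos implements ISSU. Protocol state synchronization is not modeled, the old VM is treated as a backup between switchover and shutdown for simplicity, and all prerequisite checks are collapsed into one step. On the switch, missing features are reported at different stages, as the logs below show.

```python
from dataclasses import dataclass


@dataclass
class VirtualMachine:
    """Hypothetical model of one Junos VM running on the Linux host."""
    name: str
    version: str
    slot: int           # Routing Engine slot the VM represents (RE0 or RE1)
    role: str           # "master", "backup" or "shutdown"
    pfe_control: bool = False


REQUIRED = {"GRES", "NSR", "NSB"}


def issu_sequence(configured: set[str], old: str, new: str) -> list[VirtualMachine]:
    """Walk through the ISSU steps and return the VMs still running at the end."""
    # Step 1: check that the prerequisites are met.
    missing = REQUIRED - configured
    if missing:
        raise RuntimeError(f"ISSU aborted, not configured: {sorted(missing)}")

    # Before ISSU only VM-A runs: it is RE0, the master, and controls the PFE.
    vm_a = VirtualMachine("VM-A", old, slot=0, role="master", pfe_control=True)
    # Step 2: spawn the backup RE (RE1) with the new software.
    vm_b = VirtualMachine("VM-B", new, slot=1, role="backup")
    # Step 3: synchronize RE1 with RE0 (protocol state replication is not modeled).
    # Step 4: make RE1 the master RE with PFE control.
    vm_a.role, vm_a.pfe_control = "backup", False  # simplification until shutdown
    vm_b.role, vm_b.pfe_control = "master", True
    # Step 5: rename RE1 to RE0.
    vm_b.slot = 0
    # Step 6: shut down the initial VM.
    vm_a.role = "shutdown"
    return [vm for vm in (vm_a, vm_b) if vm.role != "shutdown"]
```

Calling issu_sequence({"GRES", "NSR", "NSB"}, "old-release", "13.2X51-D36.1") returns only VM-B, now RE0 and master with PFE control ("old-release" is a placeholder). Passing an incomplete set raises RuntimeError instead.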
There are a few limitations and caveats, but they are subject to change in future releases, so check the current limitations at the time you perform ISSU.
If ISSU fails, the logs of the ISSU events are saved in /var/log/vjunos-log.tgz.
So let’s try to perform ISSU on a QFX5100 without any of the high availability/redundancy features:
{master:0}[edit]
root@QFX5100# run request system software in-service-upgrade /var/tmp/jinstall-qfx-5-13.2X51-D36.1-domestic-signed.tgz
warning: GRES not configured

{master:0}[edit]
root@QFX5100#
As you can see, the upgrade doesn’t even start when GRES is not configured. After it is configured, let’s give it another try:
{master:0}[edit]
root@QFX5100# run request system software in-service-upgrade /var/tmp/jinstall-qfx-5-13.2X51-D36.1-domestic-signed.tgz
Starting ISSU Fri Sep 11 12:04:08 2015
PRE ISSU CHECK:
---------------
PFE Status : Online
Member Id zero : Valid
VC not in mixed or fabric mode : Valid
Member is single node vc : Valid
BFD minimum-interval check done : Valid
GRES enabled : Valid
NSR not configured : Invalid
drop-all-tcp not configured : Valid
error: System not ready for ISSU.

{master:0}[edit]
root@QFX5100#
As you can see, now it’s complaining about NSR not being configured. After NSR is configured, let’s see what happens:
{master:0}[edit]
root@QFX5100# run request system software in-service-upgrade /var/tmp/jinstall-qfx-5-13.2X51-D36.1-domestic-signed.tgz
Starting ISSU Fri Sep 11 12:04:49 2015
PRE ISSU CHECK:
---------------
PFE Status : Online
Member Id zero : Valid
VC not in mixed or fabric mode : Valid
Member is single node vc : Valid
BFD minimum-interval check done : Valid
GRES enabled : Valid
NSR enabled : Valid
drop-all-tcp not configured : Valid
warning: Do NOT use /user during ISSU. Changes to /user during ISSU may get lost!
ISSU: Validating Image
error: 'Non Stop Bridging' not configured
error: aborting ISSU Fri Sep 11 12:04:50 2015
error: ISSU Aborted!
ISSU: IDLE

{master:0}[edit]
root@QFX5100#
This time NSB is missing. It is the last prerequisite that must be configured before ISSU can pass the mandatory checks and start.
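If you run ISSU from a script, these three failures can be detected from the command output. The sketch below is a hypothetical helper (issu_blockers is an illustrative name) that assumes the line formats seen in these logs: "item : status" for the pre-checks, and "error:" or "warning:" lines ending in "not configured" for missing features. Other Junos releases may word these lines differently.

```python
import re

# Pre-check lines look like "GRES enabled : Valid" or "NSR not configured : Invalid".
CHECK_LINE = re.compile(r"^\s*(?P<item>.+?)\s+:\s+(?P<status>\S.*?)\s*$")
# Missing features are reported as "warning: GRES not configured"
# or "error: 'Non Stop Bridging' not configured".
MISSING_LINE = re.compile(r"^(?:error|warning): '?(?P<feature>[^']+?)'? not configured$")


def issu_blockers(output: str) -> list[str]:
    """Return every pre-check marked Invalid and every feature reported as not configured."""
    blockers = []
    for line in output.splitlines():
        check = CHECK_LINE.match(line)
        if check:
            # A pre-check line: only Invalid results block the upgrade.
            if check.group("status") == "Invalid":
                blockers.append(check.group("item"))
            continue
        missing = MISSING_LINE.match(line.strip())
        if missing:
            blockers.append(missing.group("feature"))
    return blockers
```

Fed the three failed attempts above, it reports "GRES", "NSR not configured" and "Non Stop Bridging" respectively, and an empty list once all prerequisites are in place.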
This is a full ISSU log that contains the pre-checks and the operations during the ISSU:
{master:0}[edit]
root@QFX5100# run request system software in-service-upgrade /var/tmp/jinstall-qfx-5-13.2X51-D36.1-domestic-signed.tgz
Starting ISSU Fri Sep 11 12:05:30 2015
PRE ISSU CHECK:
---------------
PFE Status : Online
Member Id zero : Valid
VC not in mixed or fabric mode : Valid
Member is single node vc : Valid
BFD minimum-interval check done : Valid
GRES enabled : Valid
NSR enabled : Valid
drop-all-tcp not configured : Valid
warning: Do NOT use /user during ISSU. Changes to /user during ISSU may get lost!
ISSU: Validating Image
ISSU: Preparing Backup RE
Prepare for ISSU
ISSU: Backup RE Prepare Done
Extracting jinstall-qfx-5-13.2X51-D36.1-domestic ...
Install jinstall-qfx-5-13.2X51-D36.1-domestic completed
Spawning the backup RE
Spawn backup RE, index 1 successful
GRES in progress
GRES done in 0 seconds
Waiting for backup RE switchover ready
GRES operational
Copying home directories
Copying home directories successful
Initiating Chassis In-Service-Upgrade
Chassis ISSU Started
ISSU: Preparing Daemons
ISSU: Daemons Ready for ISSU
ISSU: Starting Upgrade for FRUs
ISSU: FPC Warm Booting
ISSU: FPC Warm Booted
ISSU: Preparing for Switchover
ISSU: Ready for Switchover
Checking In-Service-Upgrade status
  Item           Status                  Reason
  FPC 0          Online (ISSU)
Initiate em0 device handoff
Console and management sessions will be disconnected. Please login again.
ISSU Done ~Fri Sep 11 12:12:43 2015
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
em0: bus=0, device=3, func=0, Ethernet address f0:1c:2d:42:39:f8

QFX5100 (ttyd0)

login:
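The timestamps at the start and end of this log give the duration of the whole procedure. As a worked example, the short Python sketch below (the helper name issu_duration_seconds is hypothetical) computes it, with the leading "~" dropped from the second timestamp: 12:12:43 minus 12:05:30 is 7 minutes 13 seconds, or 433 seconds. This is how long the control-plane upgrade took, not a traffic outage.

```python
from datetime import datetime

# Timestamp format used in the ISSU log, e.g. "Fri Sep 11 12:05:30 2015".
LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def issu_duration_seconds(start: str, done: str) -> int:
    """Return the number of seconds between two ISSU log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    finished = datetime.strptime(done, LOG_TIME_FORMAT)
    return int((finished - started).total_seconds())


if __name__ == "__main__":
    seconds = issu_duration_seconds("Fri Sep 11 12:05:30 2015", "Fri Sep 11 12:12:43 2015")
    print(f"ISSU took {seconds // 60} min {seconds % 60} s")  # prints: ISSU took 7 min 13 s
```

The day and month names are parsed with the default (English/C) locale, which matches the log format shown here.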
As mentioned, during ISSU a second VM is spawned to provide Routing Engine redundancy.
You can see this by running "show chassis routing-engine" during the ISSU.
The output below was captured before the switchover, right after the backup VM (the future master RE) was spawned:
{master:0}
root@QFX5100> show chassis routing-engine | no-more
Routing Engine status:
  Slot 0:
    Current state                  Master
    Temperature                 35 degrees C / 95 degrees F
    CPU temperature             35 degrees C / 95 degrees F
    DRAM                      1953 MB
    Memory utilization          35 percent
    CPU utilization:
      User                       2 percent
      Background                 0 percent
      Kernel                     2 percent
      Interrupt                  0 percent
      Idle                      96 percent
    Model                          QFX Routing Engine
    Serial ID                      BUILTIN
    Uptime                         3 hours, 30 minutes, 1 second
    Last reboot reason             0x4000:VJUNOS reboot
    Load averages:                 1 minute   5 minute  15 minute
                                       0.18       0.24       0.12
Routing Engine status:
  Slot 1:
    Current state                  Backup
    Temperature                  0 degrees C / 32 degrees F
    CPU temperature              0 degrees C / 32 degrees F
    Memory utilization           0 percent
    CPU utilization:
      User                       0 percent
      Background                 0 percent
      Kernel                     0 percent
      Interrupt                  0 percent
      Idle                       0 percent
    Model                          QFX Routing Engine
    Serial ID                      BUILTIN
    Last reboot reason             0x4000:VJUNOS reboot
    Load averages:                 1 minute   5 minute  15 minute
                                       0.00       0.00       0.00

{master:0}
root@QFX5100>
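A script can use the same command to tell whether the second VM is present. The hypothetical parser below (routing_engine_states is an illustrative name) assumes the layout of this output, with a "Slot N:" line followed by a "Current state" line, and maps each slot to its state.

```python
import re

SLOT_LINE = re.compile(r"^\s*Slot (\d+):")
STATE_LINE = re.compile(r"^\s*Current state\s+(\S+)")


def routing_engine_states(output: str) -> dict[int, str]:
    """Map each RE slot in 'show chassis routing-engine' output to its current state."""
    states = {}
    slot = None
    for line in output.splitlines():
        slot_match = SLOT_LINE.match(line)
        if slot_match:
            # Remember which slot the following fields belong to.
            slot = int(slot_match.group(1))
            continue
        state_match = STATE_LINE.match(line)
        if state_match and slot is not None:
            states[slot] = state_match.group(1)
    return states
```

For output laid out like the capture above (one field per line, as the switch prints it), it returns {0: 'Master', 1: 'Backup'}; before and after ISSU only slot 0 should appear.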
That is how ISSU works on the QFX5100. You should now have a better understanding of how to upgrade a standalone QFX5100 with zero packet loss.
Paris ARAU
Thanks for the post.
Can I ask you a question?
I have a QFX5100-96S.
It seems that it has only one VM.
When I issue “show chassis routing-engine”, it shows only RE0.
Are there any prerequisites for using the second VM?
Regards
Hi,
The second VM will show only during ISSU.
You need to enable GRES/NSR/NSB.
Thanks,
Paris
Thank you for the reply.
I’m not a native English speaker.
The device has already been delivered to my customer, so I have no hands-on access now.
Please excuse me for asking again.
Below is how I understand it.
Let me know which one is correct,
or correct me if both are wrong.
1.
The device (QFX5100) runs only one VM before ISSU is issued.
So, it shows only one RE in the output of “show chassis hardware” and “show chassis routing-engine” during normal operation.
Only after ISSU is issued does the device create a second VM, and it then shows two REs not only during ISSU but also after ISSU.
2.
The device runs only one VM during normal operation.
It creates a second VM only for ISSU and deletes it after ISSU.
Thus, it shows only one RE in the output of “show chassis hardware” and “show chassis routing-engine” both before ISSU and after ISSU has completed.
It shows two VMs (REs) only during ISSU.
Which one is correct?
Kindly let me know.
Regards.
Hi,
The second VM will be created only during ISSU. Before and after ISSU, there will be only one VM.
Thanks,
Paris
Thanks a lot.
I really appreciate it.
Hi Arau,
I would like to know your point of view: how about upgrading in a spine-and-leaf topology without any traffic loss?
Thank you.