In-Service Software Upgrade (ISSU) on QFX5100

Traditionally, upgrading a device requires downtime while the Routing Engine is upgraded.

You can avoid the downtime if you have redundant Routing Engines. This is possible on chassis-based platforms, where high-availability features such as Graceful Routing Engine Switchover (GRES), NonStop Routing (NSR), and NonStop Bridging (NSB) allow a smoother upgrade with minimal or no service impact.

However, on a fixed platform, there are no redundant Routing Engines. Until now.

The QFX5100 Series switches run Junos OS within a virtual machine (VM) on top of a Linux-based host OS. During an ISSU, Junos OS runs in two separate VMs as an active/standby pair, just as on any dual Routing Engine platform. The VMs, which act as redundant REs, move to the newer software version while the data plane keeps operating. ISSU is supported across all Layer 2 and Layer 3 protocols.

Let’s consider the diagram below to understand the ISSU architecture:

 

[Figure: ISSU architecture]

Disclaimer:

I took the diagram of ISSU architecture from the Data Center Switching (DCX) course.

 

During normal switch operations, Junos OS runs on only one VM (VM-A in our example). When ISSU is initiated, a second VM (VM-B in our example) is launched with the new version of the software. Once VM-B has launched, it synchronizes protocol state with VM-A. When the synchronization process is finished, VM-B takes over control and VM-A is shut down.
To support ISSU operations on a QFX5100 Series switch, a number of connections must exist between the two VMs. The passive connections become active when the graceful Routing Engine switchover (GRES) event associated with ISSU is complete.
The em0 and em1 interfaces are management ports. The em2 interface is used by the master Routing Engine (RE) to communicate with the host machine. The em3 interfaces are used for RE-to-RE communication during an ISSU operation.

There are a few requirements for ISSU to be possible:

  • GRES must be configured
  • NSB must be configured
  • NSR must be configured
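
A minimal configuration sketch for these three prerequisites, assuming a standalone QFX5100 and the standard Junos statement hierarchy. The commit synchronize statement is not in the list above; it is included on the assumption that your release expects it alongside NSR, so check the documentation for your release before relying on it.

```
set chassis redundancy graceful-switchover
set routing-options nonstop-routing
set protocols layer2-control nonstop-bridging
set system commit synchronize
commit
```

After the commit, you could confirm that the statements are present from operational mode with show configuration | display set | match "graceful-switchover|nonstop".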

This is what happens during ISSU:

  • Check if prerequisites are met
  • Spawn the backup RE (RE1) with the new software
  • Synchronize RE1 with RE0
  • Make RE1 the master RE with PFE control
  • Rename RE1 to RE0
  • Shut down the initial VM

There are a few limitations and caveats, but they are subject to change in future releases, so check the current limitations at the time you perform ISSU.

If ISSU fails, the logs of the ISSU events are saved in /var/log/vjunos-log.tgz.
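
To check whether that archive exists and see what it contains, something like the following could be used. This is only a sketch: it assumes you have shell access and that the standard tar utility is available in the Junos shell. The names of the files inside the archive are not covered here.

```
root@QFX5100> file list /var/log/vjunos-log.tgz detail
root@QFX5100> start shell
% tar -tzf /var/log/vjunos-log.tgz
```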

So let’s try to perform ISSU on a QFX5100 without any of the high availability/redundancy features:

 

{master:0}[edit]
root@QFX5100# run request system software in-service-upgrade /var/tmp/jinstall-qfx-5-13.2X51-D36.1-domestic-signed.tgz
warning: GRES not configured

{master:0}[edit]
root@QFX5100#

 

As you can see, the upgrade doesn’t even start when GRES is not configured. After it is configured, let’s give it another try:

 

{master:0}[edit]
root@QFX5100# run request system software in-service-upgrade /var/tmp/jinstall-qfx-5-13.2X51-D36.1-domestic-signed.tgz
Starting ISSU Fri Sep 11 12:04:08 2015

 PRE ISSU CHECK:
 ---------------
 PFE Status                            : Online
 Member Id zero                        : Valid
 VC not in mixed or fabric mode        : Valid
 Member is single node vc              : Valid
 BFD minimum-interval check done       : Valid
 GRES enabled                          : Valid
 NSR not configured                    : Invalid
 drop-all-tcp not configured           : Valid

error: System not ready for ISSU.

{master:0}[edit]
root@QFX5100#

 

As you can see, it now complains that NSR is not configured. After NSR is configured, let’s see what happens:

 

{master:0}[edit]
root@QFX5100# run request system software in-service-upgrade /var/tmp/jinstall-qfx-5-13.2X51-D36.1-domestic-signed.tgz
Starting ISSU Fri Sep 11 12:04:49 2015

 PRE ISSU CHECK:
 ---------------
 PFE Status                            : Online
 Member Id zero                        : Valid
 VC not in mixed or fabric mode        : Valid
 Member is single node vc              : Valid
 BFD minimum-interval check done       : Valid
 GRES enabled                          : Valid
 NSR enabled                           : Valid
 drop-all-tcp not configured           : Valid

warning: Do NOT use /user during ISSU. Changes to /user during ISSU may get lost!
ISSU: Validating Image
error: 'Non Stop Bridging' not configured
error: aborting ISSU Fri Sep 11 12:04:50 2015
error: ISSU Aborted!
ISSU: IDLE

{master:0}[edit]
root@QFX5100#

 

The error shows that NSB is missing. This is the last prerequisite before ISSU can pass the mandatory checks and start.

With NSB configured, this is the full ISSU log, including the pre-checks and the operations performed during the ISSU:

 

{master:0}[edit]
root@QFX5100# run request system software in-service-upgrade /var/tmp/jinstall-qfx-5-13.2X51-D36.1-domestic-signed.tgz
Starting ISSU Fri Sep 11 12:05:30 2015

 PRE ISSU CHECK:
 ---------------
 PFE Status                            : Online
 Member Id zero                        : Valid
 VC not in mixed or fabric mode        : Valid
 Member is single node vc              : Valid
 BFD minimum-interval check done       : Valid
 GRES enabled                          : Valid
 NSR enabled                           : Valid
 drop-all-tcp not configured           : Valid

warning: Do NOT use /user during ISSU. Changes to /user during ISSU may get lost!
ISSU: Validating Image
ISSU: Preparing Backup RE
Prepare for ISSU
ISSU: Backup RE Prepare Done
Extracting jinstall-qfx-5-13.2X51-D36.1-domestic ...
Install jinstall-qfx-5-13.2X51-D36.1-domestic completed
Spawning the backup RE
Spawn backup RE, index 1 successful
GRES in progress
GRES done in 0 seconds
Waiting for backup RE switchover ready
GRES operational
Copying home directories
Copying home directories successful
Initiating Chassis In-Service-Upgrade
Chassis ISSU Started
ISSU: Preparing Daemons
ISSU: Daemons Ready for ISSU
ISSU: Starting Upgrade for FRUs
ISSU: FPC Warm Booting
ISSU: FPC Warm Booted
ISSU: Preparing for Switchover
ISSU: Ready for Switchover
Checking In-Service-Upgrade status
  Item           Status                  Reason
  FPC 0          Online (ISSU)
Initiate em0 device handoff
Console and management sessions will be disconnected. Please login again.
ISSU Done ~Fri Sep 11 12:12:43 2015
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
em0: bus=0, device=3, func=0, Ethernet address f0:1c:2d:42:39:f8


QFX5100 (ttyd0)

login:
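
Once you log back in, it is worth confirming that the switch is running the new release and that only one RE remains, since the initial VM is shut down at the end of the process. A minimal check using standard operational commands (output omitted here because it depends on your image):

```
root@QFX5100> show version
root@QFX5100> show chassis routing-engine
```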

 

As mentioned, during ISSU a second VM is spawned to provide Routing Engine redundancy.

You can see this by running “show chassis routing-engine” during the ISSU.

The output below was captured before the switchover, just after the backup VM (the future master RE) was spawned:

 

{master:0}
root@QFX5100> show chassis routing-engine | no-more
Routing Engine status:
  Slot 0:
    Current state                  Master
    Temperature                 35 degrees C / 95 degrees F
    CPU temperature             35 degrees C / 95 degrees F
    DRAM                      1953 MB
    Memory utilization          35 percent
    CPU utilization:
      User                       2 percent
      Background                 0 percent
      Kernel                     2 percent
      Interrupt                  0 percent
      Idle                      96 percent
    Model                          QFX Routing Engine
    Serial ID                      BUILTIN
    Uptime                         3 hours, 30 minutes, 1 second
    Last reboot reason             0x4000:VJUNOS reboot
    Load averages:                 1 minute   5 minute  15 minute
                                       0.18       0.24       0.12
Routing Engine status:
  Slot 1:
    Current state                  Backup
    Temperature                 0 degrees C / 32 degrees F
    CPU temperature             0 degrees C / 32 degrees F
    Memory utilization           0 percent
    CPU utilization:
      User                       0 percent
      Background                 0 percent
      Kernel                     0 percent
      Interrupt                  0 percent
      Idle                       0 percent
    Model                          QFX Routing Engine
    Serial ID                      BUILTIN
    Last reboot reason             0x4000:VJUNOS reboot
    Load averages:                 1 minute   5 minute  15 minute
                                       0.00       0.00       0.00

{master:0}
root@QFX5100>

 

And that is how ISSU works on the QFX5100. Now you should have a better understanding of how you can achieve zero packet loss during an upgrade on a standalone QFX5100.

 


Paris ARAU

Paris ARAU is a networking professional with a strong background in routing and switching technologies. He holds CCIE R&S and dual JNCIE (SP and ENT) certifications. His day-to-day work allows him to dive deeply into networking technologies. As part of his continuous training, he is focusing on software-defined networking and cloud computing.

Comments

This post currently has 5 responses

  • Thanks for the post.

    Can I ask you a question?
    I have a QFX5100-96S.
    It seems that it has only one VM.
    When I issue “show chassis routing-engine”, it shows only RE0.
    Is there any prerequisites to use second VM?

    Regards

      • Thank you for the reply.

        I’m not a native user of English.
        And the device was delivered to my customer, so I cannot have hands-on now.
        Please understand me for asking again.

        Below are how I understand.
        Let me know which one is correct
        or correct me if both of two are wrong.

        1.
        The device (QFX5100) runs only one VM before ISSU is issued.
        So, it shows only one RE in output of “show chassis hardware” and “show chassis routing-engine” at ordinary times.
        Only after ISSU is issued, the device creates second VM and it shows two REs not only during ISSU but also after ISSU.

        2.
        The device runs only one VM at ordinary times.
        It creates second VM only for ISSU and delete it after ISSU.
        Thus, it shows only one RE in output of “show chassis hardware” and “show chassis routing-engine” before ISSU and after complete ISSU totally.
        It shows two VMs(REs) only during ISSU.

        Which one is correct?
        Kindly let me know.

        Regards.
