

# Development of CMS L1 Tracking Trigger Vertical Slice System Demonstration

Ted Liu\*, Gregory Deptuch, Jim Hoff, Zhen Hu, Sergo Jindariani, Siddhartha Joshi, Jamieson Olsen, Luciano Ristori, Rafael Lopes De Sa, Nhan Tran, Jin-yuan Wu, Zijun Xu, Hang Yin  
Fermilab  
Kristian Hahn\*, Kevin Sung, Marco Trovato, Stanislava Sevova  
Northwestern University  
Jacobo Konigsberg\*, Darin Acosta, Ivan Furic, Souvik Das, Jia Fu Low  
University of Florida  
Ricardo Eusebi\*, Alexei Safonov, Keith Ulmer  
Texas A&M  
Eva Halkiadakis\*, Yuri Gershtein, Amit Lath, Robert Stone, Sunil Somalwar  
Rutgers University  
Richard Cavanaugh\*, Nikos Varelas, Leonard Apanasevich  
UIC

## Abstract

In order for CMS to fulfill its aggressive goal of pursuing a rich physics program in the high pile-up environment envisioned for the HL-LHC, the experiment must preserve its ability to identify signatures of events originating from interesting physics processes in real time. The ability to use tracker information at the Level-1 (L1) trigger provides a highly efficient handle for pile-up mitigation. In this proposal we request second-year USCMS funding for the R&D program aimed at developing a tracking trigger solution for the HL-LHC CMS upgrade. This R&D program will allow USCMS to continue to lead the development and construction of a Vertical Slice Demonstration System. It will comprise a full tracking trigger path, running at full speed with simulated high-luminosity data to measure trigger latency and efficiency, to study overall system performance and to identify appropriate solutions to possible bottlenecks. The success of this project will provide the needed proof of existence of a L1 silicon-based tracking trigger for the HL-LHC, and will allow the design of the upgraded tracker to be finalized. L1 track triggering will be among the highest-profile upgrades for CMS in Phase II, if not the highest. The project will directly and visibly impact the quality of physics results produced at the HL-LHC. The system we propose could provide the highest of rewards at a reasonable risk: we seek to bring modern technologies that are pervasive in other fields to bear on a problem that is familiar in hadron collider experiments. Given the scale of the real-time processing challenge we seek to address, the project's success will attract attention not only within HEP, but also from the growing number of high-performance computing consumers. 
With the investments already made in the project by USCMS and by outside sources, and considering the level of interest expressed by university groups, the CMS Tracker project, and within the larger collaboration, we are primed to make rapid progress on a decisive L1 tracking trigger demonstration system. Funding is requested to support engineering efforts in developing critical components of the system: system specification, the custom ATCA hardware used for data dispatching, the custom associative memory (AM) chip for pattern recognition, and the pattern recognition mezzanine (AMchip+FPGA based), as well as simulation software development. As explained in this proposal, adequate funding for FY2015 is crucial for the success of this R&D project in general, and for securing USCMS's unique leadership position in this area in particular.

---

<sup>\*</sup> Denotes PI/contact person for the corresponding institution. Overall project contact: Ted Liu, thliu@fnal.gov.

# Contents

- **1 R&D Overview**
  - 1.1 Physics Motivations
  - 1.2 R&D Overview
- **2 Technical Description and Deliverables**
  - 2.1 Introduction
  - 2.2 Track Trigger System Architecture
    - 2.2.1 Tracker geometry and Trigger Towers
    - 2.2.2 System Architecture
    - 2.2.3 Architecture Flexibility
  - 2.3 Vertical Slice Demonstrator System: Overview and Methodology
- **3 What has been achieved in CY2014 and Plan for CY2015**
- **4 Schedule, Milestones, and Resources**
  - 4.1 Overall Schedule
  - 4.2 Milestones
  - 4.3 Facilities, Equipment, and Other Resources
  - 4.4 Outlook
- **5 Budget Requests for FY2015**

# 1 R&D Overview

## 1.1 Physics Motivations

As outlined recently by the US P5 report as well as by the European Strategy reports and CERN management, the high luminosity LHC (HL-LHC) is one of the top priorities for the particle physics community. The CMS collaboration intends to pursue a rich physics program in the era of the HL-LHC. Precision measurements of the properties of the newly discovered Higgs boson will clearly be central to the CMS physics program. The relatively low transverse momentum of Higgs decay products will make Higgs identification a formidable task in the harsh collision environment of the HL-LHC.

Virtual corrections to the Higgs mass imply that its natural value should be very far from the electroweak scale. In the absence of fine-tuning, the low value observed for the Higgs mass strongly suggests the presence of new physics. A variety of new physics scenarios (composite Higgs, extra dimensions and supersymmetry, for example) naturally tame radiative Higgs mass corrections. It is therefore essential for CMS to be able to cover a large region of phase space for potential new particle production. This phase space, like the Higgs, is near the threshold of current CMS triggering capabilities. With the existing L1 system, the higher trigger rates anticipated at the HL-LHC will necessitate significantly higher trigger thresholds, drastically diminishing the discovery potential of CMS.

Perhaps the largest open question the HL-LHC can address involves the production of dark matter. Physics analyses at the LHC are capable of confirming or denying the existence of dark matter in regions of parameter space that direct and indirect detection experiments cannot reach. Signatures of dark matter in a proton collision environment typically involve missing transverse energy (MET), a quantity that is difficult to utilize in high pileup conditions. Recently, however, novel MET reconstruction techniques have demonstrated an encouraging degree of pileup insensitivity. These techniques rely heavily on tracking information, and can in principle be utilized for MET calculations performed in the trigger, as well as in the offline environment.

The physics program outlined above involves signatures with relatively low energy or with poorly resolved final state objects. Such objects require low L1 trigger thresholds that are robust against the high intensity and pileup conditions expected at the HL-LHC. The trigger thresholds used in current physics analyses cannot be maintained at the HL-LHC without incorporating tracking information into the earliest stage of the event selection process. Maintaining low trigger thresholds in order to preserve high efficiency for low-$p_T$ objects is the primary physics motivation of the work proposed here.

## 1.2 R&D Overview

While the discovery of the Higgs boson is a major achievement, many questions remain, including those regarding the precise mechanism of electroweak symmetry breaking and the nature of dark matter. In order to maximize the potential for discovery, CMS must preserve or improve its ability to identify, in real time, events with signatures consistent with the Higgs boson and new particle decays. This is a highly non-trivial task given the high pile-up conditions anticipated in the HL-LHC era.

At CMS, the only major detector not used in the present L1 trigger is the Silicon Tracker. It has become clear that the development of a L1 tracking trigger will be required for CMS to maintain physics acceptances for basic objects (leptons, photons, jets and MET) in the HL-LHC era. Without L1 tracking, the resolution of quantities traditionally used in L1 will seriously degrade due to pileup effects. Trigger thresholds would need to increase in response, which would lead to unacceptable losses in trigger efficiency. Most of the anticipated CMS physics program would slip out of reach.

Consequently, the design of the Phase-II CMS Tracker must allow for an effective implementation of the tracking trigger. Because the construction of the Phase-II Tracker will take many years, its design must be finalized soon (by the 2017 TDR). A silicon-based L1 tracking trigger has never been realized at this scale and thus it is imperative that its feasibility be demonstrated before the design of the Phase-II Tracker can be finalized. Silicon-based Level-2 tracking trigger systems based on associative memory were successfully implemented in the past [1] [2] and are being actively explored at present [3]. Experience with these systems will serve as useful input to the design of the CMS L1 tracking trigger. However, the higher occupancies anticipated at the HL-LHC and the low latencies required at L1 (a few $\mu$s for the track finding stage) present us with a formidable set of challenges that must be attacked with a well-organized R&D project. The participation of CMS institutions with strong expertise in modern high-speed electronics and pattern recognition technologies will be crucial for the success of this important R&D program.

A number of USCMS institutions led by Fermilab have established a strong generic R&D program in the area of silicon-based tracking triggers. This R&D program, funded mostly by non-CMS sources until last year, has so far yielded excellent prototype results and put USCMS in a unique position to develop a working solution for the CMS L1 track trigger. The long-term goal of this R&D effort is to develop these critical technologies to the point where we can ultimately propose them as a viable solution to the problems of HL-LHC L1 track triggering. Given that the Tracker Upgrade TDR is due in a few years, the progress made so far has positioned us well to take the next important step: establishing a Vertical Slice Demonstration System. We have proposed a new architecture and system demonstration design [4] to the CMS Phase-II Tracker community, and this has been well received. This system will comprise a full tracking trigger path and will be used with simulated high-luminosity data to measure trigger latency and efficiency, to study overall system performance and to identify appropriate solutions to possible bottlenecks. The proposed system is not intended to be final; rather, it serves as an existence proof. If a functional vertical slice can be demonstrated with today's technology, Moore's law and developments in the semiconductor industry guarantee that the tracking trigger will become less expensive and more performant as we progress toward the HL-LHC.

Processing each beam crossing implies finding and fitting thousands of tracks starting from a collection of "stubs" (hit pairs) from the front-end Tracker sensors. 40 million beam crossings per second must be processed with a maximum latency of the order of a few microseconds. The total raw computation power needed to solve this problem is huge, several orders of magnitude larger than what has ever been used for L1 triggering in the past. The problem obviously calls for massive parallelism. We choose a processing scheme in which data from different bunch crossings or from different regions of the detector in the same crossing are processed in parallel. For the purpose of regional multiplexing, we divide the detector into 48 angular regions ($6$ in $\eta$ times $8$ in $\phi$) that we call "towers". We assign multiple processing engines to each tower so that data from that tower and from different crossings may be processed in parallel. In such a parallel system, one significant problem is how to optimally distribute data among the various processors. Data from the same crossing, coming from different detector elements, must be assembled and delivered to the same processing unit for track reconstruction. Data from different crossings, coming from the same detector element, must be delivered to different processing units for optimal time multiplexing. The subdivision of the detector into geographical towers does not lead to an exactly corresponding subdivision of the track parameter space. Data coming from a given geographical tower may need to be delivered to multiple parameter space regions. This happens, in particular, when a stub comes from a detector element close to the border between geographical towers, due to the finite curvature of charged particles in the magnetic field and the finite size of the luminous region along the beam axis.
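The dispatching rule described above can be sketched in a few lines of Python. This is only a toy illustration, not the proposal's firmware: the number of engines per tower (`ENGINES_PER_TOWER = 10`), the simple modulo rotation, and all function names are assumptions introduced here; only the 6 × 8 tower grid comes from the text.

```python
# Toy model of time and regional multiplexing (illustrative only).
N_ETA, N_PHI = 6, 8        # tower grid from the text: 6 in eta x 8 in phi
ENGINES_PER_TOWER = 10     # assumed value, not specified in the text


def tower_index(eta_bin, phi_bin):
    """Flatten an (eta, phi) tower coordinate into a single index 0..47."""
    if not (0 <= eta_bin < N_ETA and 0 <= phi_bin < N_PHI):
        raise ValueError("tower coordinate out of range")
    return eta_bin * N_PHI + phi_bin


def engine_for(crossing, eta_bin, phi_bin):
    """Return the (tower, engine) pair that receives data for this crossing.

    Regional multiplexing: the tower is fixed by where the data come from.
    Time multiplexing: consecutive crossings rotate over the tower's engines.
    """
    return tower_index(eta_bin, phi_bin), crossing % ENGINES_PER_TOWER


# Data from the same crossing and tower always land on the same engine,
# while consecutive crossings from the same tower go to different engines.
same_crossing = engine_for(7, 2, 3) == engine_for(7, 2, 3)
next_crossing_differs = engine_for(7, 2, 3) != engine_for(8, 2, 3)
```

The sketch ignores stubs near tower borders, which may have to be delivered to more than one tower.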

In addition to the complex data dispatching challenge, there is the obvious challenge of finding and fitting hundreds of billions of tracks every second. This requires extremely fast pattern recognition algorithms. The Associative Memory [1] uses a massively parallel architecture to tackle the intrinsically complex combinatorics of track finding algorithms, avoiding the typical power law dependence of execution time on occupancy and solving the pattern recognition in times roughly proportional to the number of hits. This is of crucial importance given the large occupancy fluctuations typical of hadronic collisions. The design of an Associative Memory system capable of dealing with the complexity of HL-LHC collisions and with the short latency required by Level 1 triggering poses significant, as yet unsolved, technical challenges. For this reason, an aggressive R&D program has been launched at Fermilab to advance the state of the art in associative memory technology (the 3D VIPRAM [11] R&D is funded by the DOE CDRD program [12]).
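The idea behind associative-memory pattern recognition can be illustrated with a small software emulation. This is a sketch under stated assumptions, not a model of any actual AM chip: the class name, the coarse integer "addresses", and the dictionary index are all hypothetical. In hardware every stored pattern compares itself with each incoming hit simultaneously; here a per-layer lookup table stands in for that parallel comparison, so each hit is still handled once as it arrives.

```python
from collections import defaultdict


class ToyAssociativeMemory:
    """Software stand-in for an associative memory pattern bank."""

    def __init__(self, patterns):
        # Each pattern is a tuple holding one coarse hit address per layer.
        self.patterns = list(patterns)
        if not self.patterns:
            raise ValueError("pattern bank must not be empty")
        self.n_layers = len(self.patterns[0])
        # index[layer][address] -> ids of the patterns containing that address
        self.index = [defaultdict(list) for _ in range(self.n_layers)]
        for pid, pattern in enumerate(self.patterns):
            for layer, address in enumerate(pattern):
                self.index[layer][address].append(pid)

    def match(self, hits, min_layers=None):
        """Return ids of patterns (roads) matched on at least min_layers layers.

        hits is an iterable of (layer, address) pairs. Each hit is looked up
        once, so the work grows with the number of hits rather than with the
        number of hit combinations.
        """
        required = self.n_layers if min_layers is None else min_layers
        matched_layers = defaultdict(set)
        for layer, address in hits:
            for pid in self.index[layer].get(address, ()):
                matched_layers[pid].add(layer)
        return sorted(pid for pid, layers in matched_layers.items()
                      if len(layers) >= required)


bank = ToyAssociativeMemory([(1, 5, 9, 2), (1, 6, 9, 2), (3, 3, 3, 3)])
hits = [(0, 1), (1, 5), (2, 9), (3, 2), (1, 6), (0, 3)]
roads = bank.match(hits)
```

Here the first two patterns are matched on all four layers, while the third has a hit on only one layer and is rejected.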

Because the Associative Memory approach is thus far the only proven solution to hardware-based tracking triggers in a hadron collider environment (albeit only for Level-2 trigger applications), it is chosen as the baseline for the demonstration in what follows. The overall system architecture we are proposing, however, is open and flexible, and could be used to support a variety of pattern recognition engines. This will provide for testing and direct comparisons with possible alternative pattern recognition schemes, such as the tracklet-based approach described in a separate proposal.

For the reasons above, the design of the overall architecture is focused on the need for efficient dispatching of the data for time and regional multiplexing, and on providing a common, flexible framework for pattern recognition and track fitting. Efficient data dispatch for time and regional multiplexing requires high bandwidth, low latency, and flexible real-time communication between processing nodes. A completely interconnected, "full mesh" backplane is a natural fit to these criteria. A custom full-mesh-enabled ATCA board called Pulsar II has been designed at Fermilab with the goal of creating a scalable architecture abundant in flexible, non-blocking, high bandwidth board-to-board communication channels. The Pulsar II hardware will be the workhorse for the vertical slice demonstration. The architecture we are proposing for the CMS L1 tracking trigger demonstration permits high bandwidth inter-board communication. The full-mesh backplane is used to time-multiplex the high volume of incoming data in such a way that I/O demands are manageable at the board and chip level. The resulting architecture will provide an early technical demonstration using existing technology, and will allow for the exploration and comparison of various approaches to pattern recognition and track fitting.

The proposed system architecture is described in the following section. We describe an affordable demonstration system that can be designed and built within 2-3 years. We then define the Vertical Slice that we propose as the deliverable of this R&D project. Next, we describe the track finding approach we are pursuing, and discuss the advantages and challenges related to this approach. We then discuss the work that has been accomplished thus far and that which is required in the coming two years. We close with the funding request. This USCMS R&D project, if adequately funded, will allow our community to focus attention on the complex challenges of L1 tracking, to compare different possible solutions to the fundamental pattern recognition and track fitting problems in the HL-LHC era, and to gain the experience necessary to design the final system.

# 2 Technical Description and Deliverables

## 2.1 Introduction

For CMS at the HL-LHC, the bandwidth required to bring all data out of the massive outer silicon detector reaches 100 Tbps. Every 25 ns, all tracks with $p_T$ above 2 GeV/c from (on average) 140 interactions need to be fully reconstructed. Current estimates show that only a few microseconds will be available for L1 trigger processing. This includes the time needed for data dispatch to the trigger towers, pattern recognition, and track fitting. Because each of these processes must be accomplished within a very short time, communication between processing elements in different towers requires very high bandwidth and very low latency. Extremely fast and effective track fitting is also required. This is as high performance as computing gets. L1 tracking for the HL-LHC will require the most advanced real-time processing technology.

In the traditional SVT-like AM approach, pattern recognition is solved by the Associative Memory, while track fitting is done in FPGAs using a linear approximation of the dependence of the track parameters on the exact location of the hits within each road. Because roads are narrow, this linear approximation works very well and the track fitting stage becomes much simpler and faster [15]: it uses pre-calculated track parameters for hits in the center of the road and applies corrections that are linear in the exact position of the hits in each layer. Although roads are narrow, there is still a finite probability (especially with high occupancy at the HL-LHC) that multiple hits may fall within the same road for a given detector layer, requiring multiple fits with different hit combinations and leading to longer execution times. To reduce latency, the occurrence of multiple hits from the same detector layer in the same road must be minimized. Consequently, roads must be made as narrow as possible, which requires a larger number of patterns to be stored in the AM. This is why an aggressive R&D program focused on achieving higher AM densities is an important component of the effort needed to reach the unprecedented low latencies required for silicon-based tracking at the HL-LHC.
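The linearized fit and the cost of multiple hits per layer can be sketched as follows. This is a minimal illustration assuming each track parameter is evaluated as $p_i = p_i^{(0)} + \sum_j A_{ij}\,(x_j - x_j^{(0)})$, where $x^{(0)}$ is the road center and $p^{(0)}$, $A$ are pre-calculated per-road constants. The function names and the numerical constants are hypothetical, chosen only to make the example easy to follow.

```python
from itertools import product


def linear_fit(hits, road_center, p_center, coeffs):
    """Evaluate p = p_center + coeffs . (hits - road_center) for one road.

    hits and road_center hold one coordinate per layer; coeffs has one row
    of per-layer coefficients for each track parameter.
    """
    if len(hits) != len(road_center):
        raise ValueError("need exactly one hit per layer")
    if any(len(row) != len(hits) for row in coeffs):
        raise ValueError("each coefficient row needs one entry per layer")
    deltas = [h - c for h, c in zip(hits, road_center)]
    return [p0 + sum(a * d for a, d in zip(row, deltas))
            for p0, row in zip(p_center, coeffs)]


def fit_all_combinations(hits_per_layer, road_center, p_center, coeffs):
    """Fit every combination formed by taking one hit from each layer.

    The number of fits is the product of the per-layer hit multiplicities,
    which is why extra hits inside a road lengthen the execution time.
    """
    return [linear_fit(list(combo), road_center, p_center, coeffs)
            for combo in product(*hits_per_layer)]


# Hypothetical constants for a three-layer road with two track parameters.
road_center = [0.0, 0.0, 0.0]
p_center = [1.0, 0.5]
coeffs = [[1.0, 0.0, 0.0],
          [0.0, 2.0, 0.0]]

single = linear_fit([0.5, 0.25, 0.0], road_center, p_center, coeffs)
# Two hits in the middle layer double the number of fits for this road.
fits = fit_all_combinations([[0.5], [0.25, -0.25], [0.0]],
                            road_center, p_center, coeffs)
```

Because the fit is a fixed set of multiply-and-add operations, it maps naturally onto FPGA arithmetic resources; the combinatorial factor is the part that narrower roads are meant to suppress.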

The PRAM pattern density can be improved by optimizing the design in single-layer chips (2D), using custom cell designs with smaller feature size technology. An ongoing R&D effort at INFN aims to use 65 nm technology to improve the AM design for the ATLAS FTK application (a L2 tracking trigger). INFN now has in hand a 65 nm AM prototype, the AMchip05. Its successor, the INFN AM06, is expected to be submitted in Spring 2015. Neither the AMchip05 nor the AM06 has been designed or optimized for L1 track trigger applications; however, if available, these chips could be used for the initial testing of the CMS L1 tracking trigger demonstration system.

The ongoing R&D effort at Fermilab explores the use of both conventional 2D and emerging 3D technology in the design of a future generation of PRAM chip (based on the VIPRAM [11], [12] approach) specifically for the needs of the CMS L1 tracking trigger. The Fermilab VIPRAM R&D project has two goals. The first is to increase pattern density through the use of vertical integration and circuit and geometrical (layout) enhancements. This project will continue through FY15 for proof-of-principle of the 3D VIPRAM concept and is funded by DOE CDRD. The second is to increase speed and to improve the system interface, specifically with regard to Level 1 Tracking Trigger applications for CMS at the HL-LHC.

Establishing international collaborations within CMS to work on this project is essential. For the most recent status of tracking trigger activities within CMS, please see the presentations at the recent joint front-end, TK-DAQ and Track Finding meeting [8]; a summary of L1 track trigger progress is also available [9]. Both INFN/Italy and Lyon/France have joined us to work on the vertical slice demonstration using the associative memory approach, while others are interested in contributing or in exploring possible new track finding algorithms on the same hardware platform. For example, INFN Italy has been working together with the ATLAS FTK team to explore the possibility of using FTK associative memory chips for the CMS L1 tracking trigger demonstration. INFN is working closely with Fermilab to develop the pattern recognition mezzanine for the Pulsar II board to host the FTK AMchips (note that another mezzanine is being designed specifically for the associative memory chips being developed at Fermilab). Experience gained with the FTK AMchips will be useful for guiding the design of a dedicated CMS L1 associative memory chip at Fermilab.

Besides the hardware development, one of the main USCMS activities in the coming year (FY2015) will be an extensive simulation effort by physicists to establish technical specifications based on Phase 2 physics goals. Because AM operation relies on intrinsically massively parallel hardware, emulating the hardware performance in software is a clear challenge. The development of simulation tools for the Associative Memory approach in CMS has so far been led by the Lyon group, with significant recent progress towards making the machinery work. The AM Simulation Camp [10] at CERN in July 2014 attracted many people and groups, including many from USCMS. However, much of the work remains to be done. Fermilab/LPC/USCMS have been contributing and will be more involved in the simulation efforts in FY2015.

## 2.2 Track Trigger System Architecture

### 2.2.1 Tracker geometry and Trigger Towers

Multiple challenges must be faced at the different stages of the processing chain. First, data needs to be transferred out of the Tracker at the necessary speed. Stubs from thousands of silicon modules must then be formatted, organized into  $\eta - \phi$  trigger towers, duplicated and shared across tower boundaries as needed. Next, pattern recognition and track fitting must be performed. Finally, all reconstructed tracks must be processed to form an intelligent trigger decision. A coherent system design for a Level-1 track trigger must consider each of these aspects.

For simplicity, we assume that the FEDs are upstream, receive fibers from the modules, and pass the relevant stub data to the track trigger system. One could further consider an architecture in which the FED resides in the same ATCA shelf as the track trigger processing boards; however, the focus of this document is the Vertical Slice Demonstration System, not DAQ readout, so we do not attempt to specify FED details here. The FED interface will need to be defined for demonstration purposes; however, the actual FEDs do not need to be involved in the tracking trigger demonstration.

The found stubs are sent from the modules using a block synchronous data transfer scheme which tolerates random occupancy fluctuations while bounding latency. The current plan is to transfer the data from 8 consecutive beam crossings as one block. The front-end designers are finalizing the baseline data formats after much work investigating different format variants for robustness against rate fluctuations, ease of implementation, impact on power consumption, etc. While we take the 8-crossing scheme as our current working assumption, our strategy is to design the downstream components to be flexible enough to handle different possible formats.



Figure 1: Six sectors in $\eta$ (left). Note that the symmetry around $\eta = 0$ will provide for easier cable grouping. Eight sectors in $\phi$ (right).

Stubs from the 15K silicon modules must be delivered to the correct trigger towers. Detailed data sharing studies have been performed for the Barrel-Endcap (BE) Tracker geometry with different trigger tower partitions. Based on these studies, a $6$ (in $\eta$) $\times$ $8$ (in $\phi$) $= 48$ trigger tower partition has been chosen as the baseline configuration (see Figure 1). Detailed studies of data sharing have been performed assuming the default 48-tower partition with a minimum $p_T$ of 2 GeV and a track origin smearing of $\pm 7$ cm in $z$. Figure 2 shows the number of trigger towers that stubs from a given module must be delivered to under these conditions. When a stub is in the middle of a trigger tower, it has to be delivered to only one tower (its native trigger tower). When a stub is near a boundary in $\phi$ or $\eta$ (but not both), it has to be delivered to two towers. If a stub is near boundaries in both $\eta$ and $\phi$, it has to be delivered to four towers. Four is the maximum number of towers any stub must be delivered to.
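The one, two, or four destination rule can be written down compactly. The sketch below is a simplification, not the geometry study itself: it assumes the decision of whether a stub is "near" a boundary has already been made (in the real system it follows from the module position, the minimum $p_T$ and the $z$ smearing) and is passed in as the hypothetical flags `eta_side` and `phi_side`. It also assumes that $\phi$ wraps around while $\eta$ does not.

```python
N_ETA, N_PHI = 6, 8  # baseline partition from the text


def destination_towers(eta_bin, phi_bin, eta_side=0, phi_side=0):
    """List the (eta, phi) towers that must receive a stub.

    eta_side and phi_side are -1, 0 or +1: zero means the stub is well inside
    its native tower, otherwise they name the neighbouring tower it borders.
    """
    eta_bins = [eta_bin]
    # eta does not wrap: a stub at the outer eta edge has no tower beyond it.
    if eta_side and 0 <= eta_bin + eta_side < N_ETA:
        eta_bins.append(eta_bin + eta_side)
    phi_bins = [phi_bin]
    # phi is periodic, so the last phi sector borders the first one.
    if phi_side:
        phi_bins.append((phi_bin + phi_side) % N_PHI)
    return [(e, p) for e in eta_bins for p in phi_bins]


central = destination_towers(2, 3)                           # native tower only
edge = destination_towers(2, 3, eta_side=1)                  # two towers
corner = destination_towers(2, 7, eta_side=1, phi_side=1)    # four towers
```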

The subdivision of the tracker into 48 trigger towers is shown in Figure 2 (top), where the colored lines indicate all needed interconnections among the trigger towers. A unique feature of this arrangement is that any given trigger tower needs to be connected only with its immediate neighbors (at most eight) for stub sharing. Studies have shown that this feature is essentially independent of the minimum $p_T$ threshold and of the track origin smearing in $z$.
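The nearest-neighbor connectivity can be enumerated directly. The following sketch assumes a plain 6 × 8 grid that is periodic in $\phi$ and open in $\eta$; under that assumption, towers in the two outermost $\eta$ rows have five neighbors and all others have eight. The function name is hypothetical.

```python
N_ETA, N_PHI = 6, 8  # baseline partition from the text


def neighbours(eta_bin, phi_bin):
    """Return the towers adjacent to (eta_bin, phi_bin), diagonals included."""
    result = set()
    for d_eta in (-1, 0, 1):
        for d_phi in (-1, 0, 1):
            if d_eta == 0 and d_phi == 0:
                continue  # skip the tower itself
            eta = eta_bin + d_eta
            if 0 <= eta < N_ETA:  # no towers beyond the eta edges
                result.add((eta, (phi_bin + d_phi) % N_PHI))
    return sorted(result)


counts = {(e, p): len(neighbours(e, p))
          for e in range(N_ETA) for p in range(N_PHI)}
max_links = max(counts.values())
# Each tower-to-tower connection is counted once from each end.
total_connections = sum(counts.values()) // 2
```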



Figure 2: Top: Conceptual view of the proposed CMS Phase II L1 tracking trigger towers. The system is organized as 48 trigger towers ($6\ \eta \times 8\ \phi$). Each node in this diagram represents a trigger tower processor engine. Within each crate the full mesh backplane is used for time multiplexing of the incoming data, while the data sharing between towers is handled with inter-crate fiber links. Bottom: A simple system configuration that can be built with today's technology assumes one ATCA shelf per trigger tower (the actual system will likely be smaller in the future).

### 2.2.2 System Architecture

The tower processor platform must support a large number of fiber transceivers, which are used for receiving input links and sharing data between neighboring towers. A flexible, high bandwidth backplane is also required to quickly transfer data between boards. The boards should be large enough to support pattern recognition engines and fiber connections. Given these requirements, we propose a full mesh 14-slot ATCA shelf to support the tower processors. An ATCA shelf is typically an air-cooled 13U rack-mounted chassis consisting of 14 slots. The first two slots are reserved for Ethernet switch blades. Switch blades may include a fast CPU and are often used for controls and other system functions. The remaining 12 slots are used for processor or payload blades. In a full mesh ATCA backplane each pair of slots is directly connected with a multi-lane bidirectional serial channel capable of supporting sustained 40 Gbps data transfers. A modern "40G" full mesh ATCA shelf has a total aggregate bandwidth of over 7 Tbps, not including external I/O.
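The quoted aggregate figure can be reproduced with a short calculation. This sketch assumes that "aggregate" counts every slot pair among all 14 slots and both directions of each 40 Gbps channel; that counting convention is an assumption made here, not something the text specifies.

```python
from math import comb


def full_mesh_aggregate_tbps(slots, channel_gbps, bidirectional=True):
    """Aggregate backplane bandwidth of a full mesh, in Tbps."""
    channels = comb(slots, 2)            # one channel per pair of slots
    directions = 2 if bidirectional else 1
    return channels * channel_gbps * directions / 1000


channels_14 = comb(14, 2)                          # 91 slot pairs
aggregate = full_mesh_aggregate_tbps(14, 40)       # 91 * 40 * 2 Gbps = 7.28 Tbps
```

Counting only one direction per channel would give 3.64 Tbps, so the "over 7 Tbps" figure is consistent with counting both directions.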

For simplicity, we assume one ATCA shelf per trigger tower for the moment. Under this assumption, if the L1 Tracking Trigger system were built today, the full system would comprise 48 ATCA shelves, as shown in Figure 2. Our assumption is, of course, very conservative. The actual system will most likely be significantly smaller due to rapid progress in technology development. Note that connections between tower processor shelves are limited to the eight nearest neighbors, which can be easily achieved.

As shown in Figure 2, an additional shelf could act as a second stage processor. Each board in this shelf could receive tracks from complete events, allowing duplicate track removal to be implemented at this stage. With more boards to share data over the full mesh backplane, it is also possible to implement jet-related triggers, vertexing capabilities, track-based MET calculations, etc.

The processor blade is the Pulsar IIb, which is shown in Figure 4. The front board measures 8U × 280 mm and is designed around a single FPGA. This FPGA connects directly to the full mesh backplane fabric, mezzanine cards, and fiber transceivers located on a rear transition module (RTM). For the most part, communication channels are high speed serial point-to-point links and are directly supported by SERDES transceivers in the FPGA.

The fundamental processing element or engine is a pattern recognition mezzanine (PRM) card. The Pulsar II supports four mezzanine cards which conform to the FPGA Mezzanine Card (FMC) standard. These mezzanine cards may contain FPGAs, pattern recognition ASICs, fiber optic transceivers, or any other custom hardware. The PRM performs both track finding and fitting. Time-multiplexed data transfers into several parallel PRMs can reduce the bandwidth requirement to a manageable level. PRMs using different approaches to track finding and fitting may be tested and compared within the same overall high-level system architecture and data dispatching scheme. The first prototype is shown in Figure 4 next to the Pulsar IIb. It features four SFP+ pluggable serial transceivers (for standalone data receiving), a Kintex 7 FPGA, configuration flash memory, DDR3 memory, power supplies, local oscillators, and a test socket for custom ASIC chips (primarily aimed at testing pattern recognition associative memory devices). This prototype mezzanine has been used extensively in FY 2014 to test the protoVIPRAM chips. Future mezzanine card designs will feature larger, more powerful FPGAs and will support multiple PRAM ASICs. A new version of the mezzanine design is in progress.

### 2.2.3 Architecture Flexibility

The system architecture described above is scalable, flexible and will enable us to provide an early technical demonstration of the feasibility of a L1 tracking trigger for CMS. A major advantage of the full mesh backplane is that it effectively blurs the distinction between boards, thus enabling system architects to experiment with different shelf configurations. In the following sections we briefly illustrate two kinds of tower processor systems made possible by the flexibility of the full mesh architecture.

**N DIB and M PRB configuration ($N + M \leq 12$)** The most straightforward tower processor architecture consists of N data input boards (DIB), which receive input links and perform zero suppression. A DIB may be built using the generic ATCA processor blade (Figure 4) if the data is coming from FEDs or directly from the detector modules. It is also possible to use a generic ATCA carrier board and several FED AMC mezzanines, if the latter were to ultimately implement the DIB functionality (i.e., the ability to pass stubs to the L1 track trigger PRBs). After zero suppression, the N DIBs transfer the event data to M pattern recognition boards (PRB), which together carry $M \times 4$ pattern recognition mezzanine (PRM) cards. Data transfers from the DIBs to the PRMs are time multiplexed, so the bandwidth requirements can be significantly relaxed. For example, the bandwidth requirement for the fabric channels over the full mesh backplane can be reduced to 20 Gbps, assuming a worst-case scenario of 500 32-bit stubs per trigger tower per beam crossing (current studies show that 200 stubs are expected on average).

**DIB/PRB combo configuration** DIB and PRB functionalities could also be combined into a single blade design, which is a special case of the "N DIB and M PRB" configuration described above ( $N=0$ and $M=12$ ). A tower shelf would then consist of 10 Processor blades, one Gateway blade (for data sharing), and one Collector blade (for tracks found). These three different blade functionalities can be implemented in the same hardware, and the Pulsar IIb is designed to meet all the requirements. The 10 Processor blades will process events in a round-robin fashion by communicating over all available channels of the full-mesh backplane. By using the full mesh fabric more effectively we are able to decrease the channel bandwidth requirement from 20 Gbps down to 6 Gbps with no significant latency increase. Note that the Pulsar IIb has 20 Gbps fabric interface bandwidth capability, and therefore can meet the requirements in any of the system configurations described above.
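The round-robin processing described above can be sketched as a simple modulo assignment of events to Processor blades. The function names, the modulo rule, and the one-event-per-25-ns input rate are assumptions made for illustration. The proposal does not specify the dispatch logic.

```python
from collections import Counter

N_PROCESSOR_BLADES = 10  # Processor blades per tower shelf, from the text


def blade_for_event(event_number, n_blades=N_PROCESSOR_BLADES):
    """Hypothetical round-robin rule: event k goes to blade k mod n_blades."""
    return event_number % n_blades


def blade_period_ns(n_blades=N_PROCESSOR_BLADES, bx_spacing_ns=25.0):
    """Time between successive events sent to the same blade,
    ASSUMING one event arrives every bunch crossing."""
    return n_blades * bx_spacing_ns


# Each blade receives the same share of events.
load = Counter(blade_for_event(k) for k in range(1000))
print(load[0], blade_period_ns())
```

With ten blades, each blade sees a new event only every 250 ns under this assumption, which is the sense in which time multiplexing relaxes the per-blade input rate.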

### 2.3 Vertical Slice Demonstrator System: Overview and Methodology

The flexible architecture described above lends itself to an early technical demonstration of the system. The main goal of the demonstration system is to identify possible problems in the architecture design and, hopefully, find solutions. We would study, measure and optimize trigger latency and efficiencies at different stages of the system using the hardware prototypes that are being developed. Extensive simulation work is needed to guide the hardware implementation and to establish performance expectations that can be compared with actual measurements. The proposed Vertical Slice Demonstration System is shown in Figure 3.

The Data Source mimics the data flow out of the upgraded Phase II outer tracker. It will drive 300+ fibers (one per module) to the trigger tower under study exactly as if the data were coming from the real detector at high luminosity and full speed. Each fiber connection will transmit data at 3.25 Gbps payload bandwidth, in the same way that modules will in the Phase II Tracker. The data will be derived from simulation, appropriately formatted, stored in on-board memories, and then played back at full speed. The Pulsar IIb can be used for the Data Source stage, as each board has 40 optical interfaces on the RTM (all bidirectional). Eight Pulsar IIb boards can source 320 modules' worth of data.
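The board count quoted above follows from simple arithmetic, sketched here. The constants are taken from the text, while the helper names are hypothetical.

```python
import math

FIBERS_PER_BOARD = 40          # optical interfaces per Pulsar IIb RTM
PAYLOAD_GBPS_PER_FIBER = 3.25  # payload bandwidth per fiber


def boards_needed(n_fibers, per_board=FIBERS_PER_BOARD):
    """Smallest number of boards that can drive n_fibers."""
    return math.ceil(n_fibers / per_board)


def aggregate_payload_gbps(n_fibers, per_fiber=PAYLOAD_GBPS_PER_FIBER):
    """Total payload bandwidth played back into one trigger tower."""
    return n_fibers * per_fiber


print(boards_needed(300), aggregate_payload_gbps(320))
```

Eight boards are enough for any fiber count between 281 and 320, which covers the "300+" fibers of one trigger tower.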

This demonstration system will be implemented in stages: at mezzanine level, board level, crate level and multi crate level. These different stages would naturally proceed in sequence, from the bottom up. This way, we will have the opportunity to learn along the way about the performance of the different components of the system before having to decide exactly how the system will be cabled. A third crate, emulating three neighbor towers, will only be introduced in the late stage of the demonstration, when studies of system dynamics are undertaken.



Figure 3: Vertical slice test bench principle.

The traditional CDF SVT/FTK-style algorithm [15] can be used to benchmark the performance of the track fitting stage. Various experimental track fitting algorithms can also be implemented in the FPGA on the PRMs. Each can be studied and directly compared using the same vertical slice demonstration setup.

## 3 What has been achieved in CY2014 and Plan for CY2015

**Pulsar II: Pulsar IIb has been successfully tested with excellent results** Leveraging the experience we gained through designing, building and testing the Pulsar IIa board, we have successfully designed and tested the next-generation board, the Pulsar IIb (shown in Figure 4). Most of the hardware design work for the Pulsar II was done in FY2014 [5], including the successful design [6] [7] and testing of the Pulsar IIb and its related hardware. Details can be found in the most recent progress report on the Pulsar IIb [8].

The Pulsar IIb design replaces the two Kintex K325T devices with a single large Virtex-7 FPGA. The GTH transceiver count has increased to 80 channels, providing a significant bandwidth increase to the RTM, Fabric and Mezzanine cards. The Pulsar IIa design was originally motivated by ATLAS FTK needs, but the Pulsar IIb is designed to meet the challenging requirements of the CMS L1 tracking trigger demonstration. As such, the performance of the actual Pulsar IIb far exceeds the original FTK requirements. This includes the 80 GTH high speed (10 Gbps) bidirectional communication channels from the Virtex-7 FPGA (which required challenging layout work for a large ATCA board), the capability to interface with the CMS TTC/FMC card, the capability to distribute TTC clock signals over the backplane so that all Pulsar II based crates can run in sync with each other (important for L1 trigger applications), and compatibility with the CMS IPBus protocol. The Pulsar IIb, as designed, can be used as the workhorse of the Vertical Slice Demonstration system for the CMS L1 tracking trigger. The next milestone will be full crate level testing with the Pulsar IIb. Most of the Pulsar IIb design work is supported by non-CMS funds.



Figure 4: Top: The Pulsar IIb [5] [7] with its prototype mezzanine card. Bottom: Pulsar IIb crate.

**ATCA 40G high performance full-mesh backplane evaluation using Pulsar IIb** Few vendors worldwide can produce ATCA shelves with a 40G high performance full-mesh backplane. Because the Pulsar IIb fabric full-mesh performance is excellent (tested to 10 Gbps), two Pulsar IIbs are being used to evaluate the performance of 40G ATCA crates from different vendors. The vendors are very interested in our capability to test their backplanes, and they are offering us free ATCA shelves for short term (2-3 months) evaluation purposes. We have tested three so far, with another due to arrive soon. Our testing results show that not all 40G full-mesh backplanes are created equal, and we will pick the best one among them.

**Pulsar IIb new RTM, Mini-backplane, and IPMC card** A new RTM design has been finished and submitted recently. This version increases the channel count from 38 to 40 (all bidirectional) to be fully compatible with the Pulsar IIb, and has improved high speed signal routing and power regulation and distribution. This version allows eight Pulsar IIb boards to sink or source 320 optical links (or modules), the number of modules/fibers targeted for one trigger tower. This RTM revision work is supported by USCMS funding.

A new Mini-Backplane has been developed to loop back all fabric interface channels for high speed (10 Gbps) Pulsar IIb self testing. It also has Base Interface Ethernet ports brought out to RJ45 and SFP+, which enables single board testing on the bench top. The new mini-backplanes have been used extensively during Pulsar IIb testing, and have proven to be highly valuable. This mini-backplane revision work is supported by USCMS funding.

An IPMC (Intelligent Platform Management Controller) mezzanine card has been developed at Fermilab. An IPMC is required for all ATCA boards. The controller talks to the shelf manager to coordinate hot swap and e-keying, and to monitor various board sensors. The FNAL IPMC card has successfully powered up the Pulsar IIb and RTM, and it is now part of the Pulsar IIb. This IPMC work is supported by USCMS funding.



Figure 5: Top: The Pulsar IIb new RTM design, just submitted. Bottom Left: Pulsar IIb new mini-backplane; Bottom Right: Pulsar IIb FNAL IPMC card.

**Next versions of Pattern Recognition Mezzanine card design in progress** This is the core pattern recognition engine, and is being designed to host the protoVIPRAM-L1CMS chips for pattern recognition, and the latest Xilinx UltraScale FPGAs for high performance track fitting. This design is now in progress and will be one of the two major engineering efforts in FY2015 (the other being the protoVIPRAM-L1CMS chip design).

Track reconstruction typically consists of two steps: pattern recognition followed by track fitting. Pattern recognition involves choosing, among all the hits present in the detector, those hits that were potentially caused by the same particle. The Associative Memory (AM) approach [1] solves the combinatorial problem (due to high occupancy) inherent in this kind of pattern recognition task by employing a massively parallel architecture to simultaneously compare each detector hit to a large number of pre-calculated geometrical patterns. The AM solves the pattern recognition problem in essentially zero time, and passes only the hits of interest to the track fitting stage, thereby making the downstream task easier and faster.
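The matching step can be emulated in software as follows. This is a minimal sketch, not the hardware algorithm: the hardware compares every hit against all patterns simultaneously, whereas the loop below visits patterns one at a time. The coarse per-layer addresses, the `min_layers` threshold (the role played by the majority logic mentioned elsewhere in this document), and the function name are illustrative assumptions.

```python
def find_roads(patterns, hits_by_layer, min_layers):
    """Return the IDs of patterns ("roads") matched in at least min_layers layers.

    patterns      -- list of tuples, one coarse hit address per detector layer
    hits_by_layer -- list of sets, the addresses seen in each layer this event
    min_layers    -- hypothetical majority threshold
    """
    fired = []
    for road_id, pattern in enumerate(patterns):
        matched = sum(
            1 for layer, address in enumerate(pattern)
            if address in hits_by_layer[layer]
        )
        if matched >= min_layers:
            fired.append(road_id)
    return fired


# Toy pattern bank and event with four layers (made-up addresses).
patterns = [(1, 2, 3, 4), (1, 2, 9, 9), (5, 6, 7, 8)]
hits = [{1, 5}, {2}, {3, 7}, {4, 8}]
print(find_roads(patterns, hits, min_layers=3))
```

In this toy event the first pattern matches in all four layers and the third in three of four, so both fire at a threshold of three. The software cost grows with the pattern count, while the hardware cost does not.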

The pattern recognition stage produces a set of hits of interest. Track fitting involves extracting track parameters from the coordinates of these hits. As soon as hits have been loaded in the AM, found patterns (or fired roads) are ready to be output and processed (fit) with fast FPGAs. Because each pattern corresponds to a very narrow "road" through the detector, the usual helical fit can be considerably simplified by using a pre-calculated set of parameter values for the center of the road. Corrections are then applied as a linear function of the actual hit positions in each layer. The hits or stubs of interest within each road are combined to form tracks this way. Track helix parameters and $\chi^2$ can be extracted from the linear equations in the local silicon hit coordinates. It has been shown (by FTK and SVT) that very good performance (in terms of the resolution of the linear fit) can be achieved this way using modern FPGA DSPs. Note that the Xilinx UltraScale FPGAs are known for their enhanced DSP capacity, making them a suitable choice for the pattern recognition mezzanine card.
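The linearized fit can be sketched as below, assuming the common SVT/FTK-style form in which each track parameter is its road-center value plus a linear correction in the hit residuals, and $\chi^2$ is a sum of squared linear constraints. All constants (`C`, `A`, the road center, and the parameter values) are made-up numbers for illustration. Real constants would come from simulation or calibration.

```python
def linear_fit(hits, road_center, p_center, C, A):
    """Linearized track fit inside one road (illustrative sketch).

    hits        -- measured hit coordinate in each layer
    road_center -- hit coordinates of the road center in each layer
    p_center    -- pre-calculated track parameters at the road center
    C           -- correction matrix, one row per track parameter
    A           -- constraint matrix, one row per chi2 component
    """
    d = [h - c for h, c in zip(hits, road_center)]
    params = [
        p0 + sum(cij * dj for cij, dj in zip(row, d))
        for p0, row in zip(p_center, C)
    ]
    chi2 = sum(sum(akj * dj for akj, dj in zip(row, d)) ** 2 for row in A)
    return params, chi2


# Hypothetical three-layer road with two parameters and one constraint.
road_center = [10.0, 20.0, 30.0]
p_center = [1.0, 0.5]
C = [[0.1, 0.0, -0.1], [0.0, 0.2, 0.0]]
A = [[1.0, -2.0, 1.0]]  # vanishes when the three residuals are collinear

print(linear_fit([10.5, 20.0, 29.5], road_center, p_center, C, A))
print(linear_fit([10.0, 21.0, 30.0], road_center, p_center, C, A))
```

Each fit costs only a handful of multiply-accumulate operations per parameter, which is why the approach maps naturally onto FPGA DSP blocks.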

Both the pattern recognition and track fitting stages will be implemented on the pattern recognition mezzanine card, making it the core engine of the entire L1 tracking trigger system. The main challenge is likely to be the firmware, owing to the low latency requirement of the CMS L1 tracking trigger.



Figure 6: The pattern recognition mezzanine card conceptual design (compatible with Pulsar II). The new mezzanine card being designed is a double-width mezzanine.

**VIPRAM R&D: ProtoVIPRAM 2D chip has been successfully tested** The numerous advantages of an Associative Memory-based track trigger are well established. Its primary limitations lie in pattern density and in readout speed for Level 1 trigger applications. A secondary challenge is to minimize power consumption. Vertical Integration is an emerging technology which offers dramatic improvements in all these areas. The overall objective of the VIPRAM project at this point is to make steady progress towards a final solution. This requires a strategic approach to architecture and layout that permits near term solutions in classical VLSI technology and long term solutions in aggressive Vertical Integration.

From the beginning, our design methodology has been to develop concepts and circuitry in 2D to confirm functionality as economically as possible and then to translate, where necessary, those ideas into 3D. The first step taken by the VIPRAM Project was the development of a 2D prototype (protoVIPRAM1) in which the associative memory building blocks were laid out as if this was a 3D design. Room was left for as yet non-existent Through Silicon Vias and routing was performed to avoid these areas. The readout circuitry is deliberately simplified to allow direct performance studies of the CAM and Control cells. The protoVIPRAM1 was designed and fabricated in a 130nm Low Power CMOS process. The design was thoroughly simulated at all levels before submission and the teststand was fully ready before the chips arrived. In fact, we were able to correctly observe found patterns the day after the first prototype chip became available for testing.

The first prototypes of the protoVIPRAM verify the CAM and majority logic designs for future use in more application-specific chips. The testing results match the simulation studies and show that these building blocks are ready for 3D stacking. The results have been presented at the Front-end electronics workshop (FEE 2014). More detailed testing results of the prototype will be presented at TWEPP 2014. This work is supported by DOE CDRD funds.

**VIPRAM R&D: Next versions of ProtoVIPRAM design in progress** The VIPRAM approach has, from the beginning, attempted to increase pattern density and decrease power density through Vertical Integration. To mitigate issues implicit in adopting an emerging technology, a flexible architecture has been developed that can be implemented in either conventional or Vertically Integrated VLSI. This allows us to bring the system interface to maturity at an early stage while, at the same time, making steady progress towards the final VIPRAM solution. This is particularly important for Level 1 Tracking Trigger applications. The protoVIPRAM1 is the first step to developing the next generation AM chips for L1 applications. The next two steps will be performed in parallel. We now have two designs in progress: protoVIPRAM3D and protoVIPRAM-L1CMS.



Figure 7: The protoVIPRAM and its testing mezzanine card (compatible with Pulsar II). This prototype mezzanine has been used extensively in FY 2014 to test the protoVIPRAM chips.

The protoVIPRAM3D takes the circuitry designed in protoVIPRAM1 and vertically integrates it. The Control cells are moved onto a Control Tier and the CAM cells become a CAM Tier. Since the basic building blocks are already fully tested in 2D, what we will be testing with protoVIPRAM3D is the 3D design and process. The chip interface will be kept the same as the protoVIPRAM1, so that the testing can be done in the same way, allowing us to compare directly the 3D version performance with that of the 2D. This work is supported by DOE CDRD funds.

The protoVIPRAM-L1CMS for CMS, on the other hand, attempts to improve the data input and readout speed of the associative memory chips and bring the system-level interface to maturity using conventional 2D VLSI. The VIPRAM's tasks can be divided into two broad categories: 1) Pattern Recognition Associative Memory (PRAM), and 2) input/output and control (IOC). The former consists of CAM Cells, Majority Logic Cells, and pattern and critical signal distributions. This was the focus of protoVIPRAM1, a 2D implementation of the 3D-compatible cells necessary for the final design. The IOC consists of data input handling, slow control, road match capture, sparsification, and road output. During operation, silicon data is sent to the VIPRAM followed by a unique End-of-Event signal. At the arrival of the End-of-Event, the road match capture logic in the IOC snaps a picture of the state of the PRAM, freeing the PRAM to begin collecting data for the next event, if necessary. The captured road match snapshot is sparsified, placed in a FIFO, serialized and driven off-chip to the track fitting logic.
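The operating sequence just described (load hits, End-of-Event, capture, sparsify, FIFO readout) can be modelled behaviourally as below. This is a toy software model under stated assumptions, not the chip design: the class name, the method names, and the `min_layers` majority threshold are hypothetical, and serialization is omitted.

```python
from collections import deque


class VipramModel:
    """Toy behavioural model of the PRAM plus IOC data flow."""

    def __init__(self, patterns, min_layers):
        self.patterns = patterns      # one address per layer per pattern
        self.min_layers = min_layers  # hypothetical majority threshold
        self.n_layers = len(patterns[0])
        self.fifo = deque()           # sparsified road IDs awaiting readout
        self._reset_pram()

    def _reset_pram(self):
        # One match bit per (pattern, layer), standing in for the CAM cells.
        self.match_bits = [[False] * self.n_layers for _ in self.patterns]

    def load_hit(self, layer, address):
        # Hardware does this comparison in all patterns at once.
        for bits, pattern in zip(self.match_bits, self.patterns):
            if pattern[layer] == address:
                bits[layer] = True

    def end_of_event(self):
        # Capture the road-match state, then free the PRAM for the next event.
        snapshot = [sum(bits) >= self.min_layers for bits in self.match_bits]
        self._reset_pram()
        # Sparsify: queue only the IDs of roads that fired.
        for road_id, fired in enumerate(snapshot):
            if fired:
                self.fifo.append(road_id)

    def read_road(self):
        return self.fifo.popleft() if self.fifo else None


chip = VipramModel(patterns=[(1, 2, 3, 4), (5, 6, 7, 8)], min_layers=3)
for layer, address in [(0, 1), (1, 2), (2, 3), (0, 5)]:
    chip.load_hit(layer, address)
chip.end_of_event()
first = chip.read_road()
second = chip.read_road()
print(first, second)
```

Because the snapshot is taken before the match bits are cleared, hits for the next event can be loaded while the previous event's roads are still being read from the FIFO.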

Design in Vertical Integration is, in a sense, the logical partitioning of functionality into a third dimension. To make an architecture that can be either 2D or simple 3D or more aggressive 3D, the partitioning must also be adjustable so that its granularity can be changed to fit the desired implementation with the present available technology in a cost effective way. The PRAM structure, intrinsically, is adjustable in the 3rd dimension from the full road level down to the individual CAM level. The IOC, being logically separable from the PRAM, can be implemented on its own tier as well, leaving more space for a high performance system interface. This flexible architecture is fully compatible with our long-term goal of high-density 3D stacking while, at the same time, achieving our near-term need of a functional chip for a CMS Level 1 Tracking Trigger demonstration.

This design is dedicated for CMS Level 1 trigger applications. Several of the ideas introduced in the protoVIPRAM1, most notably the square layout of the CAM cells and the simplified readout architecture, will be used as stepping stones for increasing readout speed and flexibility. For example, Figure 8 (right) is the layout of the 8-layer pattern core of protoVIPRAM-L1CMS, which is the basic building block that includes all circuitry necessary to match layer addresses to pattern addresses and then to associate matched addresses to road matches. It is approximately 1600 transistors in an area of 70x70 microns. This pattern footprint size allows us to connect it to the readout architecture either by 3D methods or by classical bump bonding. The image shown is for 3D Direct Bond Interconnect (3DDBI). This is indicated by the array of small octagons across the pattern block. Note that configuring this for bump bonding does not require a new layout. Foundries permit extra wafers to be fabricated using metal redistribution (RDL) layers for bump bonding. In short, the same mask set can fulfill more than one objective. The protoVIPRAM-L1CMS design is supported by USCMS funding.

Figure 8: Left: We started with a 3D VIPRAM concept [11] a few years ago and followed with a generic R&D project [12] funded by DOE CDRD. Middle: The first step taken was a 2D prototype (protoVIPRAM1) in which the associative memory building blocks were laid out in 2D as if this was a 3D design. A four layer AM pattern design in 130 nm for protoVIPRAM1, with a layout footprint of 25 µm x 125 µm. Room was left in the middle for as yet non-existent Through Silicon Vias and routing was performed to avoid these areas. Right: One full eight layer AM pattern design in 130 nm for protoVIPRAM-L1CMS, with a footprint of 70 µm x 70 µm. Note there is no room left for TSVs in this case. One can then tile these pattern blocks on one tier uniformly, and have readout of fired roads on the second tier. The two tiers can be connected either via conventional bump bonding, or using the 3D DBI technology. The current plan is to have the design compatible with both approaches.

Both designs are in progress in CY2014 and will continue into CY2015, with chip submission expected in CY2015. Both designs will be on the same MPW run. To realize substantial savings in fabrication costs, protoVIPRAM-L1CMS will share wafer space with two additional chips being developed at Fermilab. One is our own protoVIPRAM3D, the other is the VIPIC (Vertically Integrated Photon Imaging Chip) designed to address the challenges of X-ray Correlation Spectroscopy (XCS). The VIPIC project is funded by BES. Figure 9 shows the 2:1:2 division (area ratio) of a reticle between the protoVIPRAM-L1CMS (HEP/CMS), protoVIPRAM3D (HEP) and VIPIC (BES) projects on the planned 3D run. The total cost of the MPW is \$450K, of which the protoVIPRAM-L1CMS share is \$180K and the protoVIPRAM3D share is \$90K. The protoVIPRAM3D work is supported by generic R&D funds so far. The processing cost following the wafer stage is \$80K for protoVIPRAM-L1CMS, bringing the total cost to \$260K. We are requesting USCMS funds to cover the cost for protoVIPRAM-L1CMS.
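The cost figures above are consistent with splitting the MPW cost in proportion to reticle area, as the short check below shows. Proportional splitting is an assumption that happens to reproduce the quoted shares. The VIPIC share is only implied by the ratio and is not quoted in the text.

```python
def split_cost(total_k, area_ratio):
    """Split a total cost (in $K) in proportion to the given area ratio."""
    ratio_sum = sum(area_ratio.values())
    return {name: total_k * part / ratio_sum for name, part in area_ratio.items()}


shares = split_cost(
    450, {"protoVIPRAM-L1CMS": 2, "protoVIPRAM3D": 1, "VIPIC": 2}
)
post_wafer_processing_k = 80  # quoted for protoVIPRAM-L1CMS
l1cms_total_k = shares["protoVIPRAM-L1CMS"] + post_wafer_processing_k

print(shares, l1cms_total_k)
```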



Figure 9: A drawing showing the division of a reticle between the VIPRAM (HEP) and VIPIC (BES) projects on the planned 3D run.

**Fermilab URA fellowships for VIPRAM R&D and engineering students** Last year, one URA fellowship was awarded to an SMU Electrical Engineering graduate student (from Prof. Ping Gui's group) for one year to work on the power and thermal analysis of the protoVIPRAM design. The student's work has laid the foundation for the power and thermal analysis of the chip. This year another URA fellowship was awarded to a Northwestern Electrical Engineering graduate student (from Prof. Seda Memik's group) to continue this work for one more year, this time with an emphasis on the protoVIPRAM3D power and thermal analysis. In addition, one visiting engineering student from BIST/India worked on the VIPRAM project for more than one year, carrying out extensive simulation and testing work. This work was supported by generic R&D funding. Over the past year, there have been three engineering master's theses on VIPRAM.

**Work done by the Northwestern group** The Northwestern group (led by K. Hahn) has made significant progress over the past year on key aspects of system integration and data formatting/transfer. Their integration efforts include the development of IPBus firmware and software for the Pulsar II based ATCA platform. IPBus is a flexible, scalable application layer protocol that will become the primary means of DAQ/trigger slow control in Phase-2. The group has developed a functioning implementation of IPBus firmware for both the Pulsar IIa and IIb, and has demonstrated control of Pulsar hardware with IPBus routed through a commercial ATCA switch.

The group's IPBus implementation is an important milestone for the AM-based Tracking Trigger project, and has facilitated the testing of AM prototypes at FNAL.

Northwestern is additionally developing and testing firmware that will integrate the Pulsar platform with the CMS/LHC TCDS system. The firmware enables the Pulsar to receive LHC signals and clock and to distribute these within the ATCA shelf. The team has successfully tested a basic version of the firmware with a  $\mu$ TCA GLIB and a legacy TTC system at CERN. Their effort is now focused on transitioning the firmware to the Pulsar hardware. A demonstration of TTC distribution over ATCA will be jointly conducted by Northwestern and FNAL at CERN in the near future.

With regard to data transfer, the Northwestern group is performing an extensive evaluation of the Aurora family of protocols from Xilinx. These protocols are an industry standard for low-latency, multi-gigabit serial communication. The team is assessing the suitability of Aurora for application in the L1 trigger environment by simultaneously characterizing link latency and signal integrity. Northwestern has shown that the Aurora 8b/10b protocol can be used to achieve $\sim 130$ ns latency for point-to-point Pulsar communication over the ATCA backplane (Figure 10). The team has recently embedded a MicroBlaze soft processor in their Aurora test-bench to enable the in-system collection of link quality statistics as a function of transceiver/protocol configuration.
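Two quick calculations help put these link numbers in context. The sketch below shows the payload fraction of an 8b/10b-encoded link (8 payload bits per 10 line bits) and the measured latency expressed in 25 ns bunch crossings. The 10 Gbps line rate is used only as an illustrative input. The line rate of the Aurora tests is not stated here.

```python
def payload_gbps(line_rate_gbps):
    """Payload rate of an 8b/10b link: 8 payload bits per 10 line bits."""
    return line_rate_gbps * 8 / 10


def latency_in_bx(latency_ns, bx_spacing_ns=25.0):
    """Express a latency in units of LHC bunch crossings (25 ns nominal)."""
    return latency_ns / bx_spacing_ns


print(payload_gbps(10.0))    # illustrative line rate, an assumption
print(latency_in_bx(130.0))  # the ~130 ns quoted for Pulsar-to-Pulsar links
```

A latency of about 130 ns therefore corresponds to roughly five bunch crossings per backplane hop.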



Figure 10: Left: Northwestern ATCA test-stand configured for Aurora link testing. Right: scope trace showing  $\sim 130$  ns latency for Aurora 8b/10b Pulsar-to-Pulsar communication over the ATCA backplane.

Northwestern has designed and proposed a data formatting scheme for the communication of stubs from the Tracker front-end to and within the Pulsar II ATCA-based trigger towers. The proposed format was recently endorsed by the Tracker DAQ, Electronics and Tracking Trigger groups. Northwestern (working closely with FNAL) is now developing a first firmware implementation of the corresponding data formatting pipeline for the Pulsar. Northwestern's work has been performed with Xilinx evaluation kits and two ATCA test-stands (one at the university and another at the Tracker Integration Facility at CERN) that the group has established. To date, all equipment, IP licensing and personnel support for Northwestern's work has been privately funded.

The Northwestern group requests \$70K to hire a firmware engineer to support their on-going integration and development activities. Hahn has a commitment from the University for an additional \$70K in matched funds for this purpose. The combined funds will be sufficient to support a qualified junior engineer for 1-1.5 years. The engineer will be based at Fermilab, which will facilitate the coordination and integration of his/her work with that of the overall project.

Northwestern also requests 12 months of COLA at \$1K per month for CERN-based postdoc Kevin Sung. Kevin built and is responsible for the Northwestern test-stand at CERN. This system is an important "beachhead" for the project at the lab. The test-stand is located in the TIF next to a legacy TTC system, and thus has naturally become the center of the project's TTC integration efforts. Soon, a new  $\mu$ TCA CMS TCDS crate will also be installed at the TIF, and the Northwestern test-stand will be employed for integration testing with that system. The CERN-based test-stand includes a  $\mu$ TCA shelf, which provides a means of interface testing and development with Phase-2 Tracker FED prototypes. As with the legacy systems, we anticipate that most of the hardware and operational expertise for the new TCDS and Tracker DAQ systems will be concentrated at CERN.

**AM Simulation Work by USCMS** Due to the intrinsically massively parallel nature of the AM operation, there is a clear challenge in using software-based simulation to emulate the hardware performance and to aid its design. Here we are attempting to simulate the hundreds-of-million-fold parallelism of the L1 tracking trigger system (a non-von Neumann machine) with commercial CPUs (von Neumann machines). Significant simulation preparation work has been done within CMS in FY2014 [10], and extensive simulation work is expected to be done in FY2015 in order to properly specify the demonstration system. Fermilab/LPC/USCMS have been active in this area and will become more heavily involved in the simulation efforts in FY2015. In particular, the Florida group (working closely with the Fermilab group through LPC) has a dedicated effort that attempts to significantly improve the performance of the simulation tool. Three new groups, TAMU, Rutgers, and UIC, have joined the project recently, and each plans to take part in the simulation effort.

The existing simulation tools have been developed by different groups at various times. While they do provide a framework to perform studies, there is considerable room for improvement, specifically with regard to the memory required to simulate the AM hardware and the time required to generate test tracks. Improved software would dramatically speed up how quickly the performance of the system can be studied and tested. This increase in efficiency would affect nearly all aspects of the project development. For this reason, a new software professional hire (with funding at 50% FTE) is requested by both the TAMU and Florida groups to strengthen the USCMS effort in this area. This would enable USCMS to drive the systematic evaluation of the performance requirements of the project. This work will optimize the number of reads required from the AM hardware, the data volumes and data links that must be handled, and the tracking performance requirements for physics output. The link between the hardware specifications of the tracker and AM and the physics performance requirements involves many tradeoffs that must be studied in a systematic way, with an understanding of the required trigger latencies and project cost.

## 4 Schedule, Milestones, and Resources

### 4.1 Overall Schedule

The schedule of the proposed program is driven by the current state of the R&D efforts, the CMS desire to perform the Vertical Slice Demonstration by 2017 and (of course) available funding. In order to meet the aggressive deadline, we have to finalize the design of the hardware in CY2014, with CY2015 dedicated mainly to firmware development, setting up the demonstration test-stand and performing the actual testing. This was the goal stated in our previous proposal, and so far we are largely on schedule to meet this aggressive goal for CY2014. This is only possible because we have already done so much tracking trigger R&D over the past few years (mostly with non-CMS funds).

The Pulsar IIb prototypes have undergone extensive testing over the past few months, and the results have been very promising. We expect to perform crate level testing of the Pulsar IIb in Fall 2015 to make sure the design is good enough for the demonstration. Based on the results of the testing so far, it is likely that there is no need for another round of prototyping (Pulsar IIc). Either way, we expect the design to be finalized, with production starting as early as late 2014, followed by testing of the production version in early 2015.

Similarly, the first prototype of the associative memory chip using conventional 2D technology has been successfully tested for the core functionalities together with the prototype mezzanine cards. As outlined above, this prototype chip was intended for testing all the important design blocks of the core functionalities of associative memory. For this purpose the design was kept simple, and for this version we intentionally did not include all features needed for L1 applications. Engineering work to adjust the 2D chip design to accommodate the higher pattern density, faster speed and sparsified readout needed for L1 applications has started, and takes into account the results of the protoVIPRAM1 testing. We expect protoVIPRAM-L1CMS to be submitted in spring 2015 with delivery in late 2015.

Design of the Pattern Recognition Mezzanine card is proceeding in parallel and in close communication with the protoVIPRAM-L1CMS development. Submission will likely take place at the end of 2014. Since production of the card is expected to take less time than for the ASIC, the new mezzanine prototype is expected to be available for testing in early 2015, prior to the arrival of the protoVIPRAM-L1CMS chips. Extensive and challenging firmware work will be done during much of CY2015. Due to the reduced (50%) FY2014 funding for engineering effort, the protoVIPRAM-L1CMS progress was slowed. As a result, the next version of the pattern recognition mezzanine card design also slowed, because the design requires the full specification of the protoVIPRAM-L1CMS chip. Both the protoVIPRAM-L1CMS and the pattern recognition mezzanine card designs are crucial for the L1 tracking trigger demonstration, and having adequate funding for these efforts in FY2015 is very important to the overall success of this project.

The data flow and data format within the Pulsar II based tracking trigger demonstration system have been specified in FY2014. The main work left for the Pulsar IIb for the vertical slice demonstration will be firmware implementation in FY2015. Much of CY2015 will be dedicated to the final tests of the production Pulsar II related hardware components, firmware implementation and initial integration. It is expected that most of the engineering effort will be spent on the firmware development for the Pulsar and PRM cards. Different versions of firmware will be needed for the Pulsar board to function as DIB and PRB, as well as to source data. Initial integration is expected to take place over Summer 2015, followed by initial crate-level testing and measurements of system performance parameters. Of course, the overall schedule of the demonstration system will depend on the schedule of the protoVIPRAM-L1CMS chips.

### 4.2 Milestones

- Pulsar-II/RTM design finalized: by end of CY2014
- Pulsar-II/RTM final design tested: by early CY2015
- Initial Pulsar-II firmware for DIB, PRB finished: early CY2015
- Pattern Recognition Mezzanine card design finished: end of CY2014
- Pattern Recognition Mezzanine card testing: early CY2015
- Demonstration system initial specification through simulation: early CY2015
- ProtoVIPRAM-L1CMS initial design dedicated for CMS L1 Track Trigger finished: early CY2015
- ProtoVIPRAM-L1CMS prototype available for testing: late CY2015
- Crate level test-stand setup: summer CY2015
- Initial system level integration: end of CY2015

### 4.3 Facilities, Equipment, and Other Resources

The proposed R&D would be carried out as a collaborative effort among Fermilab, Northwestern, University of Florida, TAMU, Rutgers, UIC, and Tezzaron Semiconductor [13]. Some of the physicists in this collaboration have been involved in the original design, building, commissioning, operation and upgrade of the CDF SVT system, as well as the design work of the FTK system. Others have a long history of involvement with the current CMS Tracker and CDF SVX detector. A few years ago, Fermilab also collaborated closely with INFN Pisa and Frascati in Italy on the 2D development of AMchip04 [16] in 65 nm. Fermilab contributed to the new Majority Logic design as well as the pattern readout algorithm using a Fisher Tree approach. The extensive experience developed with the associative memory approach within the collaboration will be important for carrying out this R&D project. In addition, this proposal will leverage unique areas of engineering expertise at Fermilab.

The Fermilab ASIC Design Group is a leader in 3D ASIC design and has expertise in the design of memory cells. The preliminary protoVIPRAM design work already done by the group will be the starting point for the design of a dedicated associative memory device for the CMS L1 tracking trigger demonstration, using conventional 130 nm technology to keep costs low.

The LPC at Fermilab has provided, and will continue to provide, an intellectual and physical focal point for this project. The groups involved have resident physicists at the LPC or are frequent visitors.

We believe that, with adequate funding in FY2015, we are reaching a critical mass of technical and scientific expertise within USCMS to move forward with the design and construction of the vertical slice demonstration for the CMS Level 1 tracking trigger. Over the past year we have attracted many collaborators from within and outside USCMS, and we continue to actively seek more collaborators to join the project.

## 4.4 Outlook

While this tracking trigger R&D proposal is fully dedicated to the CMS HL-LHC Phase II upgrade, it is useful to consider a perspective that extends beyond the LHC. Generally speaking, the ultimate physics reach of any hadron collider at a given center-of-mass energy is governed by its maximum instantaneous luminosity. Given the huge cost associated with any future higher energy hadron collider, it is crucial to push for higher luminosity (similar to HL-LHC or beyond) in order to maximize the new physics reach of the investment already made, before a new higher energy collider can be proposed or built. Because tracking information is the most effective means of high pile-up mitigation, a high performance, real time tracking trigger will be mandatory. From this perspective, the USCMS-led tracking trigger project is truly a pioneering effort: not only will it be crucial for the success of the CMS physics program in the HL-LHC era, it also lays the technological foundation for the future of the field. In some ways, the present situation is similar to that of CDF in the 1980s, when the silicon detector was first developed for a hadron collider despite huge technological challenges.

## 5 Budget Requests for FY2015

The fully loaded funding request for FY2015 is \$1294K. This consists of \$716K of labor and \$578K of M&S including travel to CERN. The FY2015 requested funds are for:

1. Pulsar II pattern recognition mezzanine (Fermilab engineer-III time, 1FTE): \$270,376.00 fully loaded
  - Pulsar II pattern recognition mezzanine hardware design, compatible with protoVIPRAM-L1CMS chip
  - Pulsar II pattern recognition mezzanine firmware (core engine)
2. The design of 2D protoVIPRAM-L1CMS (Fermilab engineer-III time, 1FTE): \$270,376 fully loaded
  - protoVIPRAM-L1CMS chip design dedicated for CMS L1 track trigger
3. ProtoVIPRAM-L1CMS submission: \$260,000.00
  - Shared cost of Fermilab dedicated MPW run in CY2015: 40% of \$450,000 total = \$180,000
  - Extra processing cost: \$80,000
4. Pulsar II ATCA related Hardware needs: \$178,000.00
  - One ATCA shelf for the core trigger tower at Fermilab teststand
  - One crate (twelve slots) worth of Pulsar II/RTMs and one HUB board
  - Ten Pattern Recognition Mezzanine card prototypes
5. Pulsar II firmware work (Northwestern junior engineer time, 1FTE): \$70,000.00 fully loaded
  - Pulsar II firmware and system integration
  - Northwestern University will match \$70,000.00
6. Travel for Fermilab Engineers
  - Fermilab engineers: four trips to CERN (one week per visit). \$14,892.00 including all overheads.
7. COLA request by Northwestern
  - 12 months of COLA at \$1K per month for CERN-based postdoc Kevin Sung.
  - Kevin Sung built and is responsible for the Northwestern Pulsar II test-stand at CERN.
8. Funding request for TAMU to work on AM simulation
  - A new software professional hire (50% FTE funding request): \$45,360 including all overheads
  - TAMU will match the cost for the other half.
  - Travel to attend critical meetings at CERN. Two trips requested, \$7,446.00 including all overheads
9. Funding request for Florida to work on AM simulation (figures provisional, pending final numbers from Florida).
  - A new software professional hire (50% FTE funding request): \$60,000 including all overheads
  - Florida will match the cost for the other half.
  - Travel to attend critical meetings at CERN. Two trips requested, \$7,446.00 including all overheads
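
The itemized request above can be tallied with a short script, which may be useful for re-checking the quoted totals as the provisional Florida figures are finalized. This is a minimal sketch: the amounts are copied from the list above, but the split of items into labor and M&S is an assumption made here (engineer and software-hire time as labor; chip submission, hardware, travel, and COLA as M&S), and the dictionary keys are illustrative labels only.

```python
# Line items as listed in Section 5 (USD, fully loaded where stated).
# ASSUMPTION: the labor / M&S grouping below is chosen for this sketch;
# the proposal does not label each item explicitly.
LABOR = {
    "PRM engineer (Fermilab, 1 FTE)": 270_376,
    "protoVIPRAM-L1CMS design engineer (Fermilab, 1 FTE)": 270_376,
    "Pulsar II firmware engineer (Northwestern, 1 FTE)": 70_000,
    "TAMU software hire (50% FTE)": 45_360,
    "Florida software hire (50% FTE, provisional)": 60_000,
}

MPW_RUN_TOTAL = 450_000      # total cost of the dedicated MPW run
MPW_SHARE_PERCENT = 40       # share charged to this project

M_AND_S = {
    "MPW run share": MPW_RUN_TOTAL * MPW_SHARE_PERCENT // 100,
    "Extra processing": 80_000,
    "Pulsar II ATCA hardware": 178_000,
    "Fermilab engineer travel (four trips)": 14_892,
    "Northwestern COLA (12 months at $1K)": 12 * 1_000,
    "TAMU travel (two trips)": 7_446,
    "Florida travel (two trips, provisional)": 7_446,
}


def total(items):
    """Sum the amounts of a group of line items."""
    return sum(items.values())


def summarize():
    """Return the labor, M&S, and overall totals in USD."""
    labor = total(LABOR)
    m_and_s = total(M_AND_S)
    return {"labor": labor, "m_and_s": m_and_s, "total": labor + m_and_s}


if __name__ == "__main__":
    for name, amount in summarize().items():
        print(f"{name:>8}: ${amount:,}")
```

Updating a single entry (for example, the provisional Florida hire) and re-running the script gives the revised totals to compare against the summary figures quoted at the start of this section.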

## References

- [1] M. Dell'Orso, L. Ristori - "VLSI Structure For Track Finding" - Nucl.Instr. and Meth. A278 (1989), 436; L. Ristori and G. Punzi, Ann. Rev. Nucl. Part. Sci. 60, 595-614 (2010).
- [2] J. Adelman *et al.* - "The Silicon Vertex Trigger upgrade at CDF" - Nucl.Instr. and Meth. A572 (2007), 361
- [3] - "FastTracKer(FTK) Technical Design Report" - CERN-LHCC-2013-007, ATLAS-TDR-021-2013, 2013
- [4] T. Liu *et al.* - "System Architecture for CMS L1 Track Trigger and work plan for Vertical Slice System Demonstration" - Draft proposal for discussion v1.9, Nov. 16, 2013:  
<http://hep.uchicago.edu/~thliu/PulsarII/TT_1.9.pdf>  
For a more detailed presentation and discussion, please see the most recent talks at the "L1 Track Finding meeting during the tracker Phase II days" on Nov. 19, 2013:  
<https://indico.cern.ch/conferenceDisplay.py?confId=278418>
- [5] J. Olsen, T. Liu and Y. Okumura - "A Full Mesh ATCA-based General Purpose Data Processing Board: Pulsar II" - FERMILAB-CONF-13-526-CMS-PPD (Nov. 22, 2013), presented at TWEPP 2013 and submitted to JINST:  
<http://hep.uchicago.edu/~thliu/PulsarII/PulsarII_twepp_paper.pdf>
- [6] Y. Okumura, J. Olsen, T. Liu and H. Yin - "Prototype Performance studies of a Full Mesh ATCA-based General Purpose Data Processing Board" - FERMILAB-CONF-13-527-CMS-PPD (Nov. 22, 2013), presented at IEEE NSS 2013:  
<http://hep.uchicago.edu/~thliu/PulsarII/PulsarII_NSS_paper.pdf>
- [7] Pulsar II web page.  
<http://www-ppd.fnal.gov/EEDOffice-w/Projects/atca/>
- [8] Phase 2 Tracker Week: Joint Front-end, TK-DAQ and L1 Track Finding meeting (July 24, 2014):  
<https://indico.cern.ch/event/325486/>
- [9] Phase 2 Tracker Week Plenary (July 25, 2014):  
<https://indico.cern.ch/event/324769/>
- [10] AM Simulation Camp:  
<https://indico.cern.ch/event/325924/registration/registrants>
- [11] T. Liu *et al.* - "A New Concept of Vertically Integrated Pattern Recognition Associative Memory (VIPRAM)" - FERMILAB-CONF-11-709-E, Proceedings of TIPP 2011 conference, Volume 37, 2012, Pages 1973:  
<http://www.sciencedirect.com/science/article/pii/S1875389212019165>  
or  
<http://lss.fnal.gov/archive/2011/conf/fermilab-conf-11-709-e.pdf>
- [12] T. Liu *et al.* - "Development of 3D Vertically Integrated Pattern Recognition Associative Memory (VIPRAM)" - FERMILAB-TM-2493-CMS-E-PPD-TD:  
<http://lss.fnal.gov/archive/test-tm/2000/fermilab-tm-2493-cms-e-ppd-td.pdf>
- [13] "Tezzaron Semiconductor": <http://www.tezzaron.com>
- [14] A. Annovi *et al.* - "A VLSI Processor for Fast Track Finding Based on Content Addressable Memories" - IEEE Transactions on Nuclear Science, vol. 53, no. 4 (2006), 1
- [15] A. Annovi *et al.* - "The GigaFitter: A next generation track fitter to enhance online tracking performances at CDF" - Nuclear Science Symposium Conference Record (NSS/MIC), 2009 IEEE (2009), 1143
- [16] A. Annovi *et al.* - "Variable Resolution Associative Memory for High Energy Physics" - Submitted to the Advancements in Nuclear Instrumentation, Measurement Methods and their Applications (ANIMMA), Ghent, Belgium, 6-9 June, 2011.