MMA Project Book, Correlator

MMA Project Book, Chapter 10

MMA CORRELATOR

John Webber
Ray Escoffier
Last revised 1999-08-24

Revision History:

1998-09-18: Added chapter number to section numbers. Placed specifications in table format. Added milestone summary.

1999-04-09: Revised milestone dates and made date format conform to adopted standard. Revised tables and some text to reflect adoption of digital FIR filter. Changed text to reflect architectural change in delay line implementation. Revised block diagram.

Summary

This section describes the proposed correlator for the MMA. The design described here is for a lag correlator with a system clock rate of 125 MHz. The goals of the design and development phase are to produce paper designs and some simulations of all major correlator elements, including the correlator chip, and to fabricate and test some prototype hardware.

Table 10.1 Correlator Specifications

Item	Specification
Number of antennas	36
Number of baseband inputs per antenna	8
Maximum sampling rate per baseband input	4 GHz
Sampling format	4 bit, 16 level
Correlation format	2 bit, 4 level
Maximum baseline delay range	30 km
Hardware cross-correlators per baseline	1024 lags + 1024 leads
Autocorrelators per antenna	1024
Product pairs possible for polarization	RR, RL, LR, LL (for circular, e.g.)

Although the specification is for 36 antennas, the design is tentatively intended to allow up to 40 antennas to be connected, permitting various tests to be performed without impacting operations.

Table 10.2 Principal milestones for correlator work during D&D Phase

Deliver test correlator to VLA site	2000-03-31
Preliminary Design Review	1999-08-02
Decision: FIR filter or analog BBC	1999-02-18
FIR filter Critical Design Review	1999-07-12
Prototype correlator Critical Design Review	2000-07-31
Deliver FIR filter for test interferometer	2000-12-01

10.1 System Block Diagram

The system architecture described has been chosen as the best tradeoff to produce high reliability, robust operating margins, a minimum number of integrated circuits, and a minimum number of cable interconnects (see MMA Memo 166). The performance of the proposed architecture permits high versatility in correlator operation (see MMA Memo 194).
The adoption of a digital FIR filter eliminates many potential sources of systematic error (see MMA Memo 204 and MMA Memo 248).

The correlator system envisioned for the MMA includes the samplers (digitizers), digital filters, mode selection, a delay line and data format conversion stage, cross-correlators and autocorrelators, long term accumulation, and initial digital computer processing. Depending on the mode of operation, the output of the correlator could be in either the lag or the frequency domain.

A simplified block diagram for the MMA correlator is given in Figure 10.1. This diagram presents a fairly conventional lag correlator except for the presence of the data format conversion stage.

Figure 10.1: simplified correlator block diagram

The analog outputs of the baseband system drive sampler inputs where 4-bit, 16-level sampling is performed at 4 GS/second. When less than 2 GHz bandwidth is desired, the samples are used as the input to the digital filter. The use of 4-bit quantization at the FIR filter input results in a negligibly small (~2%) loss of SNR; the output re-quantization to 2 bits provides suitable input to the correlator.

Logic in the mode selection block routes outputs from the digital filters into the data format conversion system. When fewer than 8 samplers per antenna are being used, this stage will assure high system efficiency by replicating active sampler outputs into unused memory areas and hence into otherwise unused correlators where additional lags can be generated. In this way, maximum performance will be obtained for the observational mode desired.

The digital filter stage will also do the sample decimation for observations in which sample rates less than 4 GS/second are needed. A 32-sample delay is required just before the digital filter in order to perform the finest resolution delay adjustment.

The bulk delay that adjusts the signals to the appropriate timing is provided on the memory cards in very efficient high density RAMs. For a 30 km delay range, 524,288 RAM bits per sampler output bit are required.
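The RAM depth quoted above can be checked with a short calculation. The sketch below assumes that the 30 km range is a free-space light-travel distance sampled at 4 GS/second, and that the depth is rounded up to the next power of two; both are assumptions made for illustration, and the function name is hypothetical.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum (m/s)


def bulk_delay_ram_bits(delay_range_m: float, sample_rate_hz: float) -> int:
    """RAM bits needed per sampler output bit to span the delay range.

    Assumption: the depth is rounded up to the next power of two,
    as is usual for RAM address ranges.
    """
    samples = delay_range_m / C_M_PER_S * sample_rate_hz
    return 1 << math.ceil(math.log2(samples))


# 30 km of delay at 4 GS/second is roughly 400,000 samples.
depth = bulk_delay_ram_bits(30_000.0, 4e9)
print(depth)
```

Under these assumptions the result agrees with the 524,288 bits (2^19) stated in the text.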

The data format conversion block seen in Figure 10.1 will take the 32 parallel outputs of each sampler and, using RAMs, both adjust delays and re-sort the samples. In this block, the 32 parallel outputs of a high speed sampler would be converted from each carrying every 32nd sample to each carrying short (about 1 msec) bursts of contiguous samples. If the N-wide parallel (2-bit) output of a high speed sampler (each output carrying every Nth sample) were to drive the correlators using a conventional architecture, an N-by-N matrix of correlators would be required to ensure that every sample is correlated with every other sample. For N = 32, this would mean a matrix of 1024 small correlators to correlate the output of every baseband input of every baseline.

By using the format conversion scheme, the 32-wide parallel output from a high speed sampler will be transformed into 32 parallel signals each carrying 1 millisecond time segments of contiguous samples that need drive only an N-by-1 array of correlators. This simplification in the correlator circuit requirements is obtained at the cost of an inefficiency of about 0.2% which results because the end bits in adjacent 1 msec time segments of samples will not get correlated with each other.
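The trade described above can be illustrated numerically. The following sketch compares the correlator count for the conventional and format-converted architectures and estimates the fraction of products lost at segment boundaries. It assumes 125,000 samples per segment (1 msec at the 125 MHz clock rate) and uses a hypothetical lag depth of 256, chosen only because it yields a loss near the quoted 0.2%; the function names are illustrative.

```python
def correlators_needed(n_parallel: int, format_conversion: bool) -> int:
    """Small correlators per baseband input per baseline."""
    if format_conversion:
        return n_parallel            # N-by-1 array
    return n_parallel * n_parallel   # conventional N-by-N matrix


def edge_loss_fraction(lag: int, segment_samples: int) -> float:
    """Fraction of products missing at a given lag.

    Within a segment of M contiguous samples only M - lag sample pairs
    exist at that lag, so lag / M of the products are lost.
    """
    return lag / segment_samples


SEGMENT_SAMPLES = 125_000  # assumption: 1 msec at 125 MHz
ASSUMED_MAX_LAG = 256      # assumption: illustrative lag depth

print(correlators_needed(32, format_conversion=False))
print(correlators_needed(32, format_conversion=True))
print(edge_loss_fraction(ASSUMED_MAX_LAG, SEGMENT_SAMPLES))
```

The loss grows linearly with lag, so shorter lags lose proportionally fewer products than the largest lag shown here.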

(Note that the conversion from a conventional N-by-N architecture to an N-by-1 architecture does not improve the spectral resolution performance of the correlator. The performance is set by the number of hardware correlators in the system. The conversion does, however, greatly simplify the system wiring, in that all N-by-N signals from two antennas do not have to be wired to closely spaced electronics. This simplifies the wiring matrix driving the cross correlators and reduces the number of I/O pins required by logic cards and integrated circuits.)

An additional benefit of the format conversion strategy is that it allows the system the same advantage as a recirculating correlator: when the bandwidth being processed is reduced by a factor of 2, the number of lags the system is capable of generating goes up by a factor of 2. This results in a factor of 4 increase in frequency resolution for a factor of 2 decrease in bandwidth.

Still another advantage of the format conversion (by far the most important in the MMA correlator) is that it allows a minimum cable interconnect complex between the station electronics and the correlators. It also eliminates any requirement to interconnect correlator arrays in low bandwidth modes. Since the number of data interfaces between these two stages in the MMA correlator surpasses that of any other astronomical correlator system by a factor of almost 100, this aspect of the system architecture is most important.

The cross correlator matrix of Figure 10.1 is used to correlate the sampler outputs of every antenna with those of every other antenna. At the intersection of any antenna X and another antenna Y in this matrix, there will be a correlator chip. This correlator will compute lag products for the XY baseline, while the correlator at the antenna Y and antenna X intersection computes the lead products for the same baseline. Autocorrelation products for each antenna are obtained from correlators on the matrix diagonal.
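The layout of the matrix can be sketched as a simple classification of each intersection. Which triangle of the matrix carries lags and which carries leads is an assumption made here for illustration; the text only states that the (X, Y) and (Y, X) intersections split the two roles. The function names are hypothetical.

```python
from collections import Counter


def intersection_role(x: int, y: int) -> str:
    """Role of the correlator at matrix position (x, y)."""
    if x == y:
        return "auto"                   # diagonal: autocorrelation
    return "lag" if x < y else "lead"   # assumption: upper triangle = lags


def role_counts(n_antennas: int) -> Counter:
    """Count intersections of each role in an n-by-n matrix."""
    return Counter(
        intersection_role(x, y)
        for x in range(n_antennas)
        for y in range(n_antennas)
    )


counts = role_counts(40)
print(counts["auto"], counts["lag"], counts["lead"])
```

For a 40-antenna matrix this gives 40 diagonal intersections and 780 intersections of each cross-correlation role, one lag and one lead intersection per baseline.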

In order to minimize further the station electronics to cross-multiplier cable interconnect, a very compact cross correlator matrix is essential. The proposed design for the MMA correlator places an entire 40 X 40 cross correlator matrix (handling a 1/32, 125 MHz data rate, slice of the decimated sampler outputs) for two baseband inputs of opposite polarization on a single printed circuit card. This PC card in addition is configured such that no signal drives more than one load. For the number of signals required on a 40 antenna system, this property permits an absolute minimum cable matrix since every signal out of the station electronics goes one and only one place, driving only a single load.

One disadvantage of the proposed architecture is that once the number of antennas for the array has been set, future expansion of the correlator beyond this number is not practical.

The proposed custom lag correlator chip has a dual 4-by-4 array of correlators (one for each of 2 polarizations). The chip can be programmed, via a microprocessor-supplied program word, for its position in the matrix and to select one of three correlator configurations:

1. four short correlators to compute the lags of all 4 polarization products (RR, RL, LR, and LL).
2. two longer correlators to compute just the lags for the two polarization components (RR and LL).
3. a single long correlator to compute lags for only one of the two baseband inputs.

The estimated size of this custom correlator chip is in the 750,000 gate range.

For observations in which fewer than 8 baseband inputs are being used, more lags can be produced by dedicating more than one correlator array to process the outputs of active baseband inputs. In this case, cards in the data format conversion stage will be used to form a virtual connection, the effect of which is to link two or more correlator arrays in series. The delayed input to the correlator chips that are to compute the higher level lags will be displaced in time the appropriate number of bits by offset RAM addressing in the data format conversion cards.

The long term accumulation block seen in Figure 10.1 integrates the correlator outputs for the desired duration. The correlator chips will produce a total of 52,428,800 lag results to be accumulated. The parallelism factor, 32, allows the reduction of this number to 1,638,400, which, when double buffered and spread across 32 long term accumulator cards, will require integration storage of 102,400 results per card.
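The accumulator storage figures follow from the quoted totals by simple arithmetic, sketched below. All inputs are taken from this chapter (the chip count of 12,800 appears in Section 10.3); the variable names are illustrative, and the reading that the factor of 32 arises from summing the 32 parallel time slices is an interpretation of the text.

```python
TOTAL_LAG_RESULTS = 52_428_800  # lag results produced by all correlator chips
PARALLELISM = 32                # parallel slices of each sampler output
ACCUMULATOR_CARDS = 32          # long term accumulator cards
BUFFERS = 2                     # double buffering
CORRELATOR_CHIPS = 12_800       # chip count quoted in Section 10.3

# Combining the 32 parallel slices reduces the number of distinct results.
distinct_results = TOTAL_LAG_RESULTS // PARALLELISM

# Double buffering doubles the storage, which is then spread over the cards.
results_per_card = distinct_results * BUFFERS // ACCUMULATOR_CARDS

# Implied number of lag results produced per correlator chip.
lags_per_chip = TOTAL_LAG_RESULTS // CORRELATOR_CHIPS

print(distinct_results, results_per_card, lags_per_chip)
```

The implied 4096 lag results per chip correspond to 256 per intersection of the 4-by-4 array, which is consistent with four polarization products of 64 lags each.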

The adoption of a digital FIR filter has a potential system-wide consequence: it makes more attractive the option of performing the digitization at the antenna and transmitting the data to the correlator over a digital rather than an analog fiber optic link. This is because, with analog filters, sampling at the antenna implies placing the analog filters at the antenna, with resulting stringent specifications on filter temperature stability which could be difficult to meet. The advantage of digitizing at the antenna is that the limited SNR and gain instability of an analog fiber optic link are eliminated. The disadvantages are possible shielding difficulties for the sampling clock and the (at present) high cost of digital transmission for 64 Gbit/sec data compared to the cost of two 8 GHz wide analog channels.

10.2 Performance

This section gives performance parameters for some typical operating modes of the MMA correlator. The MMA correlator will be programmable on a baseband by baseband basis and, hence, some baseband inputs may be processed in one mode while other baseband inputs are processed in other modes.

Bandwidths per baseband input range from a maximum of 2 GHz down in factor of 2 steps to 31.25 MHz. For 8 baseband inputs per antenna, this yields a maximum bandwidth per antenna of 16 GHz.

Sub-arrays will also be possible using the MMA correlator. The maximum number of sub-arrays for the MMA will not be determined by the correlator (that is, the MMA correlator will be able to support the maximum number of sub-arrays limited by other parts of the MMA).

There are assumed to be 8 samplers per antenna. The baseband inputs driving the samplers can consist of 4 dual polarization pairs or 8 independent inputs. For the case in which the baseband inputs come in polarization pairs, all 4 polarization cross-products may be computed. Each sampler is assumed to digitize at 4 GHz and hence to be driven by RF signals at most 2 GHz in bandwidth. The maximum bandwidth processed is thus 16 GHz split into 2 GHz pieces. Note that the analog baseband constraints of the planned MMA baseband processing system will impose limits as well.

The smallest division of lags in the projected correlator chip is 64 lags. Because of the architecture proposed, this will produce 64 lead and 64 lag channels and hence 64 spectral points per product. This smallest correlator division means that in the full-up configuration, all baseband inputs active at maximum bandwidth and all 4 polarization products being computed, 64 spectral points will be produced for every baseline, every spectrum. This gives a frequency resolution per spectral channel of 31.25 MHz.

Given the full-up performance as defined above, the number of lags that the proposed correlator can produce for a given experiment results from the following considerations:

1. If polarization cross-products are not required, a factor of 2 more lags (finer resolution) can be obtained. The particular configuration can be selected on a baseband pair by baseband pair basis.

2. If fewer than 8 baseband inputs are required, lags go up as 1 over the fraction of baseband inputs used (1/2 the baseband inputs, 2 times the lags).

3. If a lower bandwidth than 2 GHz per baseband input is required, lags go as 1 over the fraction of maximum bandwidth (1/4 the maximum bandwidth, 4 times the lags) until a factor of 32 is reached. After that, the number of lags stays constant. The particular configuration can be selected on a baseband by baseband basis.

Note that item 3 implies the characteristic described above that for each reduction by a factor of 2 in bandwidth, an increase of a factor of 4 in resolution is obtained (up to the factor of 32 limit after which the resolution improves by only 2 for each factor of 2 reduction in bandwidth).
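The three considerations above can be combined into a single expression for the number of spectral channels per product. The sketch below is one reading of those rules, starting from the 64-channel full-up case; the function names and argument conventions are hypothetical.

```python
BASE_CHANNELS = 64      # full-up configuration
MAX_INPUTS = 8          # baseband inputs per antenna
MAX_BW_MHZ = 2000.0     # maximum bandwidth per baseband input
MAX_BW_FACTOR = 32      # lag growth with reduced bandwidth stops here


def channels_per_product(n_inputs: int, bandwidth_mhz: float,
                         cross_pol: bool) -> int:
    """Spectral channels per product under rules 1 to 3."""
    pol_factor = 1 if cross_pol else 2                  # rule 1
    input_factor = MAX_INPUTS // n_inputs               # rule 2
    bw_factor = min(round(MAX_BW_MHZ / bandwidth_mhz),  # rule 3, capped
                    MAX_BW_FACTOR)
    return BASE_CHANNELS * pol_factor * input_factor * bw_factor


def channel_width_mhz(n_inputs: int, bandwidth_mhz: float,
                      cross_pol: bool) -> float:
    """Frequency resolution per spectral channel."""
    return bandwidth_mhz / channels_per_product(n_inputs, bandwidth_mhz,
                                                cross_pol)


# Halving the bandwidth doubles the channels, so resolution improves by 4.
wide = channel_width_mhz(8, 2000.0, True)    # 31.25 MHz
narrow = channel_width_mhz(8, 1000.0, True)  # 7.8125 MHz
print(wide, narrow, wide / narrow)
```

Below 62.5 MHz the bandwidth factor is capped, so a further halving of the bandwidth improves the resolution only by a factor of 2, as noted in the text.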

Table 10.3 below illustrates some of the possible modes. The first four columns relate to the correlator proper. The columns relating to velocity range and resolution assume 90% of the analog bandwidth will be usable. (See MMA Memo 194 for additional illustration of the MMA correlator performance.)

Table 10.3 Selected correlator modes

# of samplers	Bandwidth/Sampler	Cross-pol Products?	Channels/Product	Velocity range at 230 GHz (km/s)	Velocity resolution at 230 GHz (km/s)
8	2 GHz	Yes	64	9391	40.8
8	2 GHz	No	128	18783	20.4
8	1 GHz	No	256	9391	5.1
8	500 MHz	Yes	256	2348	2.5
8	250 MHz	No	1024	2348	0.32
4	2 GHz	Yes	128	4696	20.4
4	1 GHz	No	512	4696	2.5
4	500 MHz	Yes	512	1174	1.3
4	250 MHz	No	2048	1174	0.16
2	2 GHz	Yes	256	2348	10.2
2	1 GHz	No	1024	2348	1.3
2	500 MHz	Yes	1024	587	0.64
2	250 MHz	No	4096	587	0.08
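The velocity columns of Table 10.3 can be reproduced with the sketch below. It rests on assumptions inferred from the tabulated values rather than stated in the text: the speed of light is taken as 3.0e5 km/s, the 90% usable fraction is applied to the range but not to the resolution, and when cross-polarization products are formed the samplers are counted in pairs covering the same band. The function name is hypothetical.

```python
C_KM_S = 3.0e5  # assumption: rounded value that matches the tabulated numbers


def velocity_columns(n_samplers: int, bandwidth_mhz: float, cross_pol: bool,
                     channels: int, line_ghz: float = 230.0,
                     usable: float = 0.9) -> tuple[float, float]:
    """Return (velocity range, velocity resolution) in km/s."""
    # Assumption: polarization pairs observe the same band, so only half
    # of the samplers contribute distinct bandwidth.
    distinct = n_samplers // 2 if cross_pol else n_samplers
    total_bw_ghz = distinct * bandwidth_mhz / 1000.0 * usable
    v_range = C_KM_S * total_bw_ghz / line_ghz
    channel_ghz = bandwidth_mhz / channels / 1000.0
    v_resolution = C_KM_S * channel_ghz / line_ghz
    return v_range, v_resolution


first_row = velocity_columns(8, 2000.0, True, 64)
last_row = velocity_columns(2, 250.0, False, 4096)
print(round(first_row[0]), round(first_row[1], 1))
print(round(last_row[0]), round(last_row[1], 2))
```

With these assumptions the first and last rows of the table (9391 and 40.8 km/s, 587 and 0.08 km/s) are recovered after rounding.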

A specification for the output dump rate of the MMA correlator has not yet been set. However, the architecture described is inherently versatile with regard to dump rates. The short time segments of samples from the memory cards ensure that very short fundamental integrations are always made. Thus, dumps of the long term accumulations can be made at natural intervals that are multiples of 1/32 and 1 millisecond. This means that the correlator hardware need never be the limiting factor in obtaining high dump rates; the downstream processing and storage medium used with the system would set the dump rate limit. Since this part of the system can be changed as processing and storage technology improves, the system will be able to keep up with this improvement.

10.3 Size and Power Requirement Estimate

Table 10.4 Preliminary MMA correlator module and printed circuit card requirements

Item	# required	Size	Power
4 GS/s dual sampler	160	double width module	20 W
FIR filter card	320	6U euro card	80 W
Mode card	160	6U euro card	20 W
Memory card	320	6U euro card	80 W
Correlator card	128	9U euro card	300 W
Control card	32	6U euro card	40 W
Long term accumulator	32	6U euro card	60 W
TOTALS	1152		100 kW
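The totals row of Table 10.4 can be checked by summing the entries. The sketch below assumes that the power column gives the dissipation of a single module or card, an interpretation that is consistent with the stated total; the variable names are illustrative.

```python
# (item, number required, watts per unit), taken from Table 10.4
MODULES = [
    ("4 GS/s dual sampler", 160, 20),
    ("FIR filter card", 320, 80),
    ("Mode card", 160, 20),
    ("Memory card", 320, 80),
    ("Correlator card", 128, 300),
    ("Control card", 32, 40),
    ("Long term accumulator", 32, 60),
]

total_units = sum(count for _, count, _ in MODULES)
total_watts = sum(count * watts for _, count, watts in MODULES)

print(total_units, total_watts / 1000.0)
```

The sum is 1152 units and 99.2 kW, which rounds to the 100 kW quoted in the table. The correlator cards alone account for 38.4 kW of this.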

It is estimated that the station-dependent part of the system (sampler, filter, mode, and memory) will require 1/2 a rack per antenna, or 20 racks for 40 antennas. The remainder of the system, proportional to the number of antennas squared (correlator, control, and accumulator) will occupy 8 racks for 40 antennas. The grand total of racks is therefore about 28.

The power estimates given in Table 10.4 above are based on the experience gained in the development of the GBT spectrometer. The biggest unknown at this time is the dissipation to be expected in the custom correlator chip, 12,800 of which will be required in the system. The GBT correlator chip dissipates about 5 watts with a clock rate of 125 MHz. Such a high chip dissipation in the MMA correlator would mean both high system power requirements and lower reliability because of the difficulty in removing the heat from the system at the high altitude site.

By using low voltage chip technology, it is hoped that the custom correlator chip described in this document can be built with a power requirement of about 2 or 3 watts. The chip represents about a factor of 2 increase in the level of integration when compared to the GBT correlator chip (twice the number of transistors). By using a more modern process, with finer component features and low voltage technology, a smaller chip with lower power requirements should be possible. The smaller silicon size should also mean a higher yield in the manufacturing process.
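The effect of chip dissipation on the system power budget can be illustrated with the figures given in this section. The sketch below simply multiplies the chip count by a per-chip dissipation; the function name is hypothetical, and overheads such as power supply efficiency are ignored.

```python
CHIP_COUNT = 12_800  # custom correlator chips required in the system


def chip_power_kw(watts_per_chip: float) -> float:
    """Total dissipation of all correlator chips, in kW."""
    return CHIP_COUNT * watts_per_chip / 1000.0


gbt_like = chip_power_kw(5.0)       # at the GBT chip's dissipation
target_low = chip_power_kw(2.0)     # hoped-for range, lower end
target_high = chip_power_kw(3.0)    # hoped-for range, upper end

print(gbt_like, target_low, target_high)
```

At 5 watts per chip the chips alone would dissipate 64 kW, compared with roughly 26 to 38 kW at 2 to 3 watts. The 3 watt case equals the 38.4 kW allotted to the 128 correlator cards in Table 10.4.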