The Top K System is realized as an imbedded DRAM application. On a single die, 16Mbits of DRAM are combined with 64 bit-serial weighted L1-Norm processors, and a 64x64 network of sorting elements that can return up to the top 64 close matches to an input query. Example data vectors are encoded with 16-bit integers per dimension, and 8 bit weights per dimension. The distance measures are 32 bits, which can accumulate 256 dimensional data vectors without overflow. Data dimension of up to 16K is possible, with the distance measure saturating once overflow occurs, marking the distance as too dissimilar to be considered as a close match. The bit-serial processors and sorting elements are connected in a systolic pipelined array providing high-speed (100Mhz+) operation under simple local control signals from processor to processor. Only a single global clock signal is required to synchronize the chip operation. Data is retrieved 8 bits at time from each of 64 256Kbit Dram blocks simultaneously. The 8 bits from each DRAM block are fed serially into the distance processors, which requires that the DRAM cycle time be only one eight the cycle time of the processors. Multiple chips can be combined on a single board to scale up memory size, speed, and number of "Top K" matches to be retrieved. A board of 16 chips can provide a speed advantage of over 3 orders of magnitude compared to a high-speed workstation implementation.
A floorplan of the chip is illustrated below.
Aaron Lipman lipman@vlsi.eecs.harvard.edu