Tuesday, 25 February 2014

Paginated column fetching on Cassandra with Ruby

When a Cassandra column family has rows with an unknown number of columns and you need to fetch all of them, the default limit of 10000 columns per query might not be enough.
This snippet works around the limit using the 'gotime-cassandra_object' gem. It fetches the columns in pages of 10000 and returns them as an array of hashes, one hash per column.


 class MyCassandraObject < CassandraObject::Base

   def self.find(key)
     # number of columns to keep from each query
     page_size = 10000

     # array to be returned
     total_items = []

     # start the loop: fetch (page_size + 1) columns;
     # the extra column tells us whether another page exists
     page_items = MyCassandraObject.execute_cql("SELECT FIRST #{page_size + 1} * FROM column_family WHERE KEY = ?", key).fetch

     loop do
       if page_items.columns > page_size
         # successfully fetched (page_size + 1) columns: there are more columns

         # the name of the last column is the (inclusive) start of the next page
         next_offset = page_items.column_names.last

         # append results to the final array, except the last column
         page_items.column_names[0..-2].each do |column|
           total_items.push({column_name: column, column_value: page_items[column]})
         end

         # fetch another (page_size + 1) columns, starting at next_offset
         # (next_offset is interpolated as-is, so column names must not contain single quotes)
         page_items = MyCassandraObject.execute_cql("SELECT FIRST #{page_size + 1} '#{next_offset}'..'' FROM column_family WHERE KEY = ?", key).fetch
       else
         # fetched at most page_size columns: this is the last page

         # append all results to the final array, and exit
         page_items.column_names.each do |column|
           total_items.push({column_name: column, column_value: page_items[column]})
         end

         puts "Fetched row #{key}, found #{total_items.size} columns."
         return total_items
       end
     end
   end

 end
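
The "fetch one extra column" trick can be tried without a Cassandra cluster. The following is only a minimal sketch under stated assumptions: `ROW`, `fetch_slice` and `fetch_all_columns` are hypothetical names, and a sorted in-memory hash stands in for the column slice that the CQL query would return.

```ruby
# In-memory stand-in for one wide row: column name => value.
# (Hypothetical data; Cassandra would return columns in comparator order.)
ROW = (1..25).map { |i| [format("col%03d", i), "value #{i}"] }.to_h.freeze

# Stand-in for "SELECT FIRST limit 'start'..''": returns up to `limit`
# [name, value] pairs whose name is >= start, in sorted order.
def fetch_slice(row, start, limit)
  row.sort.select { |name, _value| name >= start }.first(limit)
end

# Collects every column by asking for (page_size + 1) columns at a time.
def fetch_all_columns(row, page_size)
  total_items = []
  start = "" # the empty string sorts before every column name

  loop do
    page = fetch_slice(row, start, page_size + 1)

    # If the extra column came back, there is more data: take it off this
    # page and remember it as the inclusive start of the next page.
    extra = page.size > page_size ? page.pop : nil

    page.each do |name, value|
      total_items << { column_name: name, column_value: value }
    end

    return total_items if extra.nil?
    start = extra.first
  end
end

items = fetch_all_columns(ROW, 10)
puts "found #{items.size} columns"
```

Because the extra column is never added to the results of the page that fetched it, the inclusive range start on the next query does not produce duplicates.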

