pulse00 edited this page Sep 11, 2012 · 3 revisions

An indexing API for PDT extensions based on Apache Lucene

Participants

Discussion

Google group thread.

Introduction

Most frameworks come with a variety of configuration files in various formats. Indexing those files to add support for framework-specific behavior is currently not possible with the PDT/DLTK infrastructure: the DLTK H2 indexer only treats IModelElements as indexable, and the .xml or .json files in a PHP project are not IModelElements.

To overcome this limitation, I propose the following API, based on Apache Lucene, for indexing non-source files in PDT projects.

Implementation

Extension point

The following extension point lets extenders register BuildParticipants, which receive the configuration files to index:

    <schema targetNamespace="com.dubture.indexing.core" xmlns="http://www.w3.org/2001/XMLSchema">
       <annotation>
          <appinfo>
             <meta.schema plugin="com.dubture.indexing.core" id="buildParticipant" name="Lucene Build Indexing Participant"/>
          </appinfo>
          <documentation>
             [Enter description of this extension point.]
          </documentation>
       </annotation>

       <element name="extension">
          <annotation>
             <appinfo>
                <meta.element />
             </appinfo>
          </annotation>
          <complexType>
             <sequence>
                <element ref="participant" minOccurs="1" maxOccurs="unbounded"/>
             </sequence>
             <attribute name="point" type="string" use="required">
                <annotation>
                   <documentation>
                      
                   </documentation>
                </annotation>
             </attribute>
             <attribute name="id" type="string">
                <annotation>
                   <documentation>
                      
                   </documentation>
                </annotation>
             </attribute>
             <attribute name="name" type="string">
                <annotation>
                   <documentation>
                      
                   </documentation>
                   <appinfo>
                      <meta.attribute translatable="true"/>
                   </appinfo>
                </annotation>
             </attribute>
          </complexType>
       </element>

       <element name="participant">
          <complexType>
             <attribute name="nature_id" type="string" use="required">
                <annotation>
                   <documentation>
                      
                   </documentation>
                </annotation>
             </attribute>
             <attribute name="file_extensions" type="string" use="required">
                <annotation>
                   <documentation>
                      Space separated list of file extensions registered for this visitor.
                   </documentation>
                </annotation>
             </attribute>
             <attribute name="visitor" type="string" use="required">
                <annotation>
                   <documentation>
                      The Implementation of the IndexingVisitor for the participant.
                   </documentation>
                   <appinfo>
                      <meta.attribute kind="java" basedOn=":com.dubture.indexing.core.index.IndexingVisitor"/>
                   </appinfo>
                </annotation>
             </attribute>
             <attribute name="name" type="string">
                <annotation>
                   <documentation>
                      The human readable name of the participant
                   </documentation>
                </annotation>
             </attribute>
          </complexType>
       </element>

    </schema>
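
The schema describes file_extensions only as a space-separated list. As a rough illustration, such a value could be parsed and matched against a file name as sketched here. FileExtensionMatcher is a hypothetical helper, not part of com.dubture.indexing.core, and the case-insensitive comparison is an assumption rather than documented behavior.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper; not part of the indexing plugin. */
final class FileExtensionMatcher {
    private final Set<String> extensions;

    /** @param fileExtensions the raw attribute value, e.g. "json xml" */
    FileExtensionMatcher(String fileExtensions) {
        // Split on any run of whitespace and drop empty tokens.
        this.extensions = Arrays.stream(fileExtensions.trim().split("\\s+"))
                .filter(s -> !s.isEmpty())
                .map(s -> s.toLowerCase(Locale.ROOT)) // assumption: case-insensitive
                .collect(Collectors.toSet());
    }

    /** Returns true if the file name ends in one of the registered extensions. */
    boolean matches(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return extensions.contains(ext);
    }
}
```

With the attribute value "json xml", this sketch would accept composer.json and services.xml and reject index.php.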

Each BuildParticipant supplies a visitor that implements the following interface:

    /**
     * IndexingVisitor interface for buildParticipant extension implementation.
     * 
     * @author Robert Gruendler <r.gruendler@gmail.com>
     * 
     */
    public interface IndexingVisitor {
      /**
       * Set the indexing requestor.
       * 
       * @param requestor
       * @return {@link IndexingVisitor}
       */
      IndexingVisitor setRequestor(IIndexingRequestor requestor);

      /**
       * Get the resource the visitor is operating on.
       * 
       * @return {@link IResource}
       */
      IResource getResource();

      /**
       * Sets the resource the visitor is operating on.
       * 
       * @param resource
       * @return {@link IndexingVisitor}
       */
      IndexingVisitor setResource(IResource resource);

      /**
       * The transformed POJO.
       * 
       * @param object
       *            can be safely cast to your implementation object.
       */
      void visit(Object object);

      /**
       * A resource is about to be deleted.
       * 
       * @param file
       *            the resource being deleted
       */
      void resourceDeleted(IFile file);
    }

An example implementation of the extension point looks like this (taken from the composer plugin):

    <extension point="com.dubture.indexing.core.buildParticipant">
       <participant
             file_extensions="json"
             name="ComposerBuildParticipant"
             nature_id="com.dubture.composer.core.composerNature"
             visitor="com.dubture.composer.core.visitor.ComposerVisitor">
       </participant>
    </extension>

An example implementation of the visit() method, which receives the object being indexed:

    @Override
    public void visit(Object objectToIndex)
    {
        // ... type checking etc. omitted for simplicity
        PackageInterface pHPPackage = (PackageInterface) objectToIndex;
        ReferenceInfo info = new ReferenceInfo("composer.json.reference", pHPPackage.getName());
        requestor.addReference(info);
    }

In this case, objectToIndex has already been deserialized from a JSON file into the correct Java type (PackageInterface). This is done by implementing the JsonIndexingVisitor interface (which extends IndexingVisitor):

    @Override
    public Gson getBuilder()
    {
        return new GsonBuilder()
                .registerTypeAdapter(License.class, new LicenseDeserializer())
                .setFieldNamingStrategy(new ComposerFieldNamingStrategy())
                .create();
        
    }

This is a simple API which enables extenders to easily index configuration files from frameworks.
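
The visit() example above leaves out its type checking. A minimal sketch of what that check could look like follows. PackageInterface, ReferenceInfo and IIndexingRequestor are declared here as simplified stand-ins so the snippet compiles on its own; the real types in the indexing and composer plugins may look different, and TypeCheckingVisitor is a hypothetical name.

```java
// Stand-in types (assumptions); the real ones live in the plugins.
interface PackageInterface {
    String getName();
}

final class ReferenceInfo {
    private final String type;
    private final String name;

    ReferenceInfo(String type, String name) {
        this.type = type;
        this.name = name;
    }

    String getType() { return type; }
    String getName() { return name; }
}

interface IIndexingRequestor {
    void addReference(ReferenceInfo info);
}

/** Hypothetical visitor showing only the guard logic inside visit(). */
final class TypeCheckingVisitor {
    private IIndexingRequestor requestor;

    TypeCheckingVisitor setRequestor(IIndexingRequestor requestor) {
        this.requestor = requestor;
        return this;
    }

    void visit(Object objectToIndex) {
        // Ignore objects of an unexpected type, and do nothing
        // if no requestor has been set yet.
        if (!(objectToIndex instanceof PackageInterface) || requestor == null) {
            return;
        }
        PackageInterface phpPackage = (PackageInterface) objectToIndex;
        String name = phpPackage.getName();
        if (name == null || name.isEmpty()) {
            return; // nothing useful to index
        }
        requestor.addReference(new ReferenceInfo("composer.json.reference", name));
    }
}
```

Silently skipping unexpected objects is one possible choice; logging them instead may be preferable while developing a new participant.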

Searching the index

The indexing feature provides a SearchEngine API to query the index for previously indexed references:

    /**
     * some internal method from an extender plugin to search for indexed items by path
     */ 
    public List<EclipsePHPPackage> getPackages(IPath path)
    {
        List<ReferenceInfo> references;
        List<EclipsePHPPackage> packages = new ArrayList<EclipsePHPPackage>();

        SearchEngine engine = SearchEngine.getInstance();
        references = engine.findReferences(path, ComposerVisitor.REFERENCE_ID);
        
        for (ReferenceInfo info : references) {
            // retrieve additional metadata from the indexed document
            String meta = info.getMetadata();
            PHPPackage json = gson.fromJson(meta, PHPPackage.class);
            EclipsePHPPackage pHPPackage = new EclipsePHPPackage(json);
            packages.add(pHPPackage);
        }
        
        return packages;
    }

Lucene index

The underlying index is implemented using Apache Lucene. Every reference to a domain object is indexed as a Lucene document with the following structure:

  • path: the absolute path to the item being indexed
  • filename: the name of the file where the item being indexed is found
  • type: the type of the reference (should be unique across all index implementors; use your Java namespace as a base)
  • referencename: the name of the reference
  • metadata: an optional string where users can store additional metadata to retrieve later on when querying the index
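
To make the five fields concrete, the following sketch models one indexed document as a plain Java object and answers a lookup by path and type with an in-memory list. It does not use the Lucene API. IndexedReference and InMemoryReferenceIndex are hypothetical names, and exact string matching on path and type is an assumption about how a query might behave.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of one indexed document. */
final class IndexedReference {
    final String path;          // absolute path to the indexed item
    final String filename;      // file in which the item was found
    final String type;          // unique reference type, e.g. based on a Java namespace
    final String referenceName; // name of the reference
    final String metadata;      // optional extra data, may be null

    IndexedReference(String path, String filename, String type,
                     String referenceName, String metadata) {
        this.path = path;
        this.filename = filename;
        this.type = type;
        this.referenceName = referenceName;
        this.metadata = metadata;
    }
}

/** Hypothetical stand-in for the index; a list instead of Lucene. */
final class InMemoryReferenceIndex {
    private final List<IndexedReference> documents = new ArrayList<>();

    void add(IndexedReference document) {
        documents.add(document);
    }

    /** Drops every document stored for a path, e.g. after a file is deleted. */
    void removeByPath(String path) {
        documents.removeIf(d -> d.path.equals(path));
    }

    /** Returns the documents whose path and type both match exactly. */
    List<IndexedReference> findReferences(String path, String type) {
        List<IndexedReference> hits = new ArrayList<>();
        for (IndexedReference d : documents) {
            if (d.path.equals(path) && d.type.equals(type)) {
                hits.add(d);
            }
        }
        return hits;
    }
}
```

Because the type field is part of every lookup, two plugins that index the same file do not see each other's references as long as their type identifiers differ.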