In contrast to a language unaware index a language aware index is designed with knowledge about the graph query language that will be used to query it. For this project this is the language of conjunctive path queries (CPQs). Information about the language can then be used to optimise certain aspects of the index. Most notably there will be paths that cannot be distinguished by any query in the query language. The index can capitalise on this opportunity for optimisation, for example by not storing all paths that cannot be distinguished by the query language anyway (see k-path bisimulation).
More formally, language-aware indexing is a general methodology which leverages language-specific structural filtering in index design. Given a query language $L$, the basic idea of $L$-aware path indexing is to partition the set of paths in a graph $G$ into $L$-equivalence classes and build an index on the set of equivalence classes. The paths in the same $L$ equivalence class cannot be distinguished by any query in $L$, i.e., for every query $q \in L$, either all paths in the class appear in the evaluation of $q$ on $G$, or none of the paths appear. The index is then built on top of the set of equivalence classes. When retrieving results from the index we can then only retrieve these equivalence classes for a given query instead of all paths that would belong to the equivalence class. Using this we can essentially process these paths together instead of individually. Hence, language-aware path indexes accelerate query evaluation for a specific language.