Speed up tag search #442

Open
alphaCTzo7G opened this issue Jun 4, 2018 · 27 comments

@alphaCTzo7G

Hi all,

I use Universal Ctags and gutentags to generate the tags files. However, when I use :CtrlPTag, it takes a lot of time to search through the tags.

If I use find in the ctrlp_user_command, will I be able to speed up the tag search?

Does CtrlP use the same engine for all searches (files, buffers, tags, etc.)?

@tacahiroy
Member

If :CtrlPTag is slow only the first time you run it, that's because CtrlP parses the tags file itself and caches its lines. CtrlP also generates a new tags file if the existing one is stale.
I think these two things make :CtrlPTag slow.
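To make "CtrlP parses the tags file by itself" concrete, here is a minimal Python sketch of what such parsing involves. It assumes the classic ctags layout (tab-separated name, file, and address, with optional extension fields after `;"`) and skips the `!_TAG_` pseudo-tag header lines. `parse_tags` is a hypothetical name. CtrlP itself is written in Vim script, and its actual parser may differ.

```python
from typing import Iterable, List, Tuple


# Hypothetical helper: a sketch of tags-file parsing, not CtrlP's implementation.
def parse_tags(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (name, file, address) for every real tag line.

    Assumes the classic layout: name<TAB>file<TAB>address;"<TAB>extension fields.
    """
    entries: List[Tuple[str, str, str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Pseudo-tags such as !_TAG_FILE_FORMAT carry metadata, not symbols.
        if not line or line.startswith("!_TAG_"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # malformed line: skipped rather than reported
        name, filename, address = fields[0], fields[1], fields[2]
        # Drop the ;" marker that separates the address from extension fields.
        if address.endswith(';"'):
            address = address[:-2]
        entries.append((name, filename, address))
    return entries
```

Running something like this over every line of a large tags file is one plausible reason the first invocation is slow, while later runs served from the cache are faster.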

Does ctrlp use the same engine for all searches? files/buffer/tags etc?
Yes, if by 'searches' you mean the narrowing-down process.

To speed up narrowing down items, you can use an external CtrlP matcher such as nixprime/cpsm or FelikZ/ctrlp-py-matcher, which are much faster than the built-in matcher.
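As a rough illustration of the narrowing-down step that all CtrlP modes share, here is a minimal Python sketch of a fuzzy matcher. An item survives if the typed characters appear in it in order, and shorter matched spans rank first. `narrow` and its `limit` parameter are hypothetical. This is not the algorithm used by CtrlP's built-in matcher, cpsm, or ctrlp-py-matcher.

```python
import re
from typing import List


# Hypothetical matcher: a sketch of fuzzy narrowing, not any real CtrlP matcher.
def narrow(items: List[str], query: str, limit: int = 20) -> List[str]:
    """Keep items containing the query's characters in order (case-insensitive)."""
    if not query:
        return items[:limit]
    # Build a lazy pattern such as "o.*?p.*?t" from the escaped query characters.
    pattern = re.compile(".*?".join(re.escape(c) for c in query), re.IGNORECASE)
    scored = []
    for item in items:
        m = pattern.search(item)
        if m:
            # Crude "tightness" score: length of the leftmost matched span.
            scored.append((m.end() - m.start(), item))
    scored.sort()
    return [item for _, item in scored[:limit]]
```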

@alphaCTzo7G
Author

Thank you very much. I will look into one of these.

@alphaCTzo7G
Author

alphaCTzo7G commented Jun 11, 2018

@tacahiroy I installed ctrlp-py-matcher because, from its description, cpsm seemed more oriented towards files.

It did speed up the search process quite a bit. However, I'm finding that CtrlP doesn't really find the appropriate tags using :CtrlPTag.

For example, in the current buffer, if I use Ctrl-], I can go to the correct location. However, when I use :CtrlPTag and type in the name of the function that I want to navigate to, it doesn't show anything.

Is there any other setting I need to enable for CtrlPTag to work properly?

Or do I need to increase the number of tags that CtrlP searches through? I only see around 20 tags displayed.

@alphaCTzo7G
Author

I opened an issue on ctrlp-py-matcher as well: FelikZ/ctrlp-py-matcher#43

@tacahiroy
Member

Can you give me more details about your setup, please?

  • ctags version
  • CtrlP-related settings
  • the tags file with which you have the issue

@alphaCTzo7G
Author

Sure. I installed Universal Ctags by compiling it from source, following the instructions here:

https://github.com/universal-ctags/ctags.git

Ctags:

Universal Ctags 0.0.0(4fe1a60), Copyright (C) 2015 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: Apr 13 2018, 20:54:21
  URL: https://ctags.io/
  Optional compiled features: +wildcards, +regex, +iconv, +option-directory, +xpath

let g:ctrlp_cache_dir = $HOME . '/.cache/ctrlp'
let g:ctrlp_clear_cache_on_exit = 0
let g:ctrlp_match_func = { 'match': 'pymatcher#PyMatch' }
let g:ctrlp_follow_symlinks = 2

I will generate a new tags file, because I can't share the current one.

@alphaCTzo7G
Author

It's a bit weird. I tried replicating it with the ctrlp repository, and the issue doesn't occur there: I can use :CtrlPTag to search through the tags with ctrlp-py-matcher.
tags.zip

However, when I try to use my own repository, I am still having the same problem.

To check whether this was just caused by errors in my previous tags file, I deleted it and generated a new one.

However, even with the new tags file, the problem still exists. I can't get CtrlPTag to work on my repository.

My repository, however, is much larger than ctrlp's repository (355 MB vs 2.1 MB).

Most of its size comes from a relatively large number of external Python libraries that I use (~185 MB), vs ~57 MB of actual code (+data).

Have you guys experienced this kind of issue with relatively large repositories?

@alphaCTzo7G
Author

Well, my initial hypothesis that this is related to the tags file and repository size is probably wrong. I tried it with the Linux kernel repository (3.7 GB) and a 500 MB tags file, and it was able to find specific keys from the repository.

@alphaCTzo7G
Author

It turns out that if I remove the env directory, CtrlPTag starts working again. Not sure why.

@alphaCTzo7G
Author

alphaCTzo7G commented Jun 12, 2018

OK, I have been able to reproduce it using another library. What seems to be important for reproducing this issue:

  1. the repository has to be a Python repository
  2. it has to have a relatively large code base
  3. it has to have an env directory.

Steps to reproduce:

  1. Get the library, create the virtualenv, and generate the tags:
mkdir ~/tmp
cd ~/tmp
git clone git@github.com:Theano/Theano.git
cd Theano
virtualenv env
source env/bin/activate
pip install -r requirement-rtd.txt
vim bin/theano_nose.py
  2. Try to navigate to the particular tag:
    Next, in Vim, type :CtrlPTag and then search for options. You can see that
    there is a function named options on line 240 of bin/theano_nose.py.

However, :CtrlPTag doesn't show that it exists.

Next, do the following:

cd ~/tmp/Theano
rm -rf env tags
vim bin/theano_nose.py

Next, in Vim, type :CtrlPTag and then search for options. This time you will be able
to find the options function.

P.S. I am using gutentags for indexing the tags, but most likely gutentags
doesn't have anything to do with it.

@tacahiroy
Member

Hmm, I can't reproduce the issue by following the steps above :/

Is 'options' actually in the tags file in both cases?
I just created a tags file simply by running ctags -R --exclude='*.min.js' --exclude='*.pyx' --exclude='*.pxd', and CtrlPTag worked as expected.

  • ctags: universal-ctags/ctags, compiled myself at 44a0e9791
  • ctrlp.vim: 306bc60
  • Vim: 8.1.42
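Whether a name such as options is really present in a given tags file can be checked without involving CtrlP at all. Here is a minimal Python sketch, assuming the usual tab-separated tags format. `tag_defined` is a hypothetical helper. It reads bytes rather than text, so an undecodable line elsewhere in the file cannot abort the check.

```python
# Hypothetical helper: a sketch for checking a tags file, independent of CtrlP.
def tag_defined(tags_path: str, name: str) -> bool:
    """Return True if some line in the tags file has `name` as its first field."""
    prefix = name.encode("utf-8") + b"\t"
    # Binary mode: no decoding happens, so invalid UTF-8 cannot raise here.
    with open(tags_path, "rb") as f:
        return any(line.startswith(prefix) for line in f)
```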

@alphaCTzo7G
Author

@tacahiroy, thanks for checking. I will follow your steps and see if it works.

@alphaCTzo7G
Author

alphaCTzo7G commented Jul 1, 2018

What version of ctrlp-py-matcher are you running?

To reduce this to the smallest possible problem, I am not loading anything other than the necessary files, so my .vim consists of the following:

.vim/
  autoload/
    pathogen.vim
  bundle/
    ctrlp.vim
    ctrlp-py-matcher/

This is the vimrc I am using now:

set nocompatible              " be iMproved, required

execute pathogen#infect()

filetype plugin indent on

syntax on

let g:ctrlp_match_func = { 'match': 'pymatcher#PyMatch' }

I am running

ctrlp version (same as yours): 306bc60

ctrlp-py-matcher version (latest version): cf63fd5

ctags version: 4fe1a60

Vim version: 8.0.1430

I removed gutentags from my Vim setup and ran ctags directly to generate the tags: ctags -R --exclude='*.min.js' --exclude='*.pyx' --exclude='*.pxd'.

Yes, options is present in both tags files.

@alphaCTzo7G
Author

alphaCTzo7G commented Jul 1, 2018

Here are the tags files generated with and without the env folder:

tags.tar.gz

@alphaCTzo7G
Author

I upgraded my ctags to your version (44a0e97), but the problem still persists.

The only difference I can see now is that you have a later Vim version. I will try that as well. In the meantime, if you have any idea what's going on, or how to debug this issue, let me know.

@alphaCTzo7G
Author

I updated my Vim to match yours, but the problem still persists. After doing a binary search through the files, I found that the Python libraries Pygments and Jinja2 are causing problems. There are probably more, but I'm not sure why this is so. It probably has something to do with the plugin having Python modules.
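The binary search through the files described above can be sketched in Python. This sketch assumes a hypothetical predicate `breaks(paths)` that regenerates the tags for just those paths and reports whether :CtrlPTag fails. It also assumes a single culprit. Since there are probably several (Pygments and Jinja2 at least), the search would need to be repeated after each culprit is found and removed.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


# Hypothetical helper: bisection over candidate libraries/directories.
def find_culprit(paths: Sequence[T], breaks: Callable[[Sequence[T]], bool]) -> T:
    """Bisect `paths` down to the one entry for which `breaks` is still True.

    Assumes breaks(paths) is True and exactly one entry causes the failure.
    """
    candidates = list(paths)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # Keep whichever half still reproduces the failure.
        candidates = first_half if breaks(first_half) else candidates[mid:]
    return candidates[0]
```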

@alphaCTzo7G
Author

I created a small test with a vimrc, ctrlp, ctrlp-py-matcher, pathogen, Jinja2, etc. here: https://github.com/alphaCTzo7G/test

Clone it with git clone https://github.com/alphaCTzo7G/test

and then run

cd test; rm tags; ctags -R; vim test.py

Then run :CtrlPTag and try to navigate to fun: fun is not there.

If you remove the Jinja2 library, it starts working. The problem doesn't seem to be with the env folder or Python libraries per se, but with specific libraries that are installed in the env folder.

I was able to reproduce this with both Windows and Linux virtual environments.

@Grueslayer

Please have a look at #421: Universal Ctags does not support "--extra" anymore. The corresponding pull request still isn't merged.

@alphaCTzo7G
Author

@Grueslayer, thanks for replying.

In this case, CtrlPTag works fine if I don't have specific libraries such as Jinja2 or Pygments in my Python virtualenv. I also looked at the patch for the issue that you referred to: 3acd9e8

Maybe I don't understand the implications of the patch, but I don't see anything related to --extra/--extras there either. Further, I am using gutentags to generate the tags file, and it works fine if I don't have specific Python libraries such as Jinja2 or Pygments.

I'm wondering why you think this is related to the --extra/--extras option.

I will try out your suggestion anyways to see if it resolves the issue..

@Grueslayer

This was my fault: I was thinking of CtrlPBufTag. It doesn't use the tags file; instead it calls ctags itself (and Universal Ctags no longer accepts --extra, so it fails). You are right that this doesn't matter for CtrlPTag itself. Sorry. Did you have a look at the tags file itself? Is anything different in the lines when the additional stuff is included?

@alphaCTzo7G
Copy link
Author

alphaCTzo7G commented Jul 10, 2018

@Grueslayer, actually there is not much difference. I tracked it down to one file (at least for the jinja2 library) which is causing CtrlPTag to fail: https://github.com/pallets/jinja/blob/master/jinja2/_identifier.py

If I have this file anywhere in my repository, CtrlPTag using ctrlp-py-matcher fails.

Here's the GitHub repository to try this with:

https://github.com/alphaCTzo7G/test

This repository contains the _identifier.py file which completely stops CtrlPTag from working when using ctrlp-py-matcher.

To replicate the problem:


  1. back up your Vim settings from your ~ directory into a backup folder: mkdir -p ~/tmp/vim; cp -r ~/.vim* ~/tmp/vim
  2. clone https://github.com/alphaCTzo7G/test into ~/tmp: cd ~/tmp; git clone https://github.com/alphaCTzo7G/test
  3. copy the .vimrc file and the .vim folder from https://github.com/alphaCTzo7G/test to your home folder: cd ~/tmp/test; cp -r .vimrc .vim ~
  4. if you already have ctags installed, run: cd ~/tmp/test; rm tags; ctags -R; vim test.py
  5. type :CtrlPTag; you will notice that there are a lot of tags in the CtrlPTag window
  6. next, search for fun by typing fun in the CtrlP window that shows up. The tag for fun doesn't show up, even though it's present in the tags file

Next, remove the _identifier.py file and repeat the process. This time you can find the tag for the function fun:

  1. delete _identifier.py from the test directory: rm ~/tmp/test/_identifier.py
  2. generate the tags and open the test.py file: cd ~/tmp/test; rm tags; ctags -R; vim test.py
  3. search for the tag using :CtrlPTag
  4. search for fun in the CtrlPTag window: suddenly fun is present.

@alphaCTzo7G
Author

The only difference I can find between the two tags files uploaded to the GitHub repository is this line:

212d211 < pattern _identifier.py /^pattern = '·̀-ͯ·҃-֑҇-ֽֿׁׂ-ًؚ-ٰٟۖ-ۜ۟-۪ۤۧۨ-ܑۭܰ-݊ަ-ް߫-߳ࠖ-/;" v

This line comes from the _identifier.py file.
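Comparing two tags files, as done above, can be scripted with Python's standard `difflib`. This is a minimal sketch, and `tags_diff` is a hypothetical name. Decoding with `errors="replace"` is a deliberate assumption: undecodable bytes then appear as U+FFFD instead of aborting the comparison.

```python
import difflib
from typing import List


# Hypothetical helper: a sketch for diffing two tags files.
def tags_diff(old_path: str, new_path: str) -> List[str]:
    """Return unified-diff lines between two tags files."""
    def read(path: str) -> List[str]:
        # errors="replace" keeps the diff going even on invalid UTF-8 bytes.
        with open(path, encoding="utf-8", errors="replace") as f:
            return f.readlines()

    return list(difflib.unified_diff(read(old_path), read(new_path),
                                     fromfile=old_path, tofile=new_path))
```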

@Grueslayer

Everything works fine with both of your tags files (and ones I generated myself) under macOS (Vim 8.0.1238 and 8.1.72) using your stripped-down setup. The line contains many UTF-8 encoded characters; maybe they're not handled correctly by your Vim/Python?

@alphaCTzo7G
Author

Hmm. I made a small correction to the instructions above for replicating the problem. When I have _identifier.py in my repository and run :CtrlPTag, some tags show up, but the tags related to the function fun in test.py don't.

They do show up when I don't have _identifier.py.

If neither you nor @tacahiroy has been able to replicate it, perhaps it's related to my system Python or Vim/shell. I am running a customized setup on an Ubuntu 16.04 VirtualBox VM, so it's possible that one of the customizations is messing this up.

I will run some tests to see whether UTF-8 and system settings are messing this up as well.

Thanks for checking.

@alphaCTzo7G
Author

I thought I would post an update here: the issue is with a particular function in Vim's Python interface, vim.eval, which is used by ctrlp-py-matcher. Certain files, such as _identifier.py in the jinja2 library, contain BOM fields. This apparently crashes vim.eval.

When I removed the BOM field from _identifier.py using the solution here: https://unix.stackexchange.com/a/381263/242983, and regenerated the tags, :CtrlPTag started working again with ctrlp-py-matcher.
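The linked answer works in the shell. The same idea in Python, as a minimal sketch: a UTF-8 BOM is the three bytes EF BB BF at the very start of a file (`codecs.BOM_UTF8`), so removing it means dropping that prefix when present. `strip_utf8_bom` is a hypothetical helper.

```python
import codecs


# Hypothetical helper: removes a leading UTF-8 BOM, leaving all other bytes alone.
def strip_utf8_bom(path: str) -> bool:
    """Remove a leading UTF-8 BOM from `path` in place; return True if one was found."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True
```

It only touches the first three bytes. Any other unusual characters in the file are left as they are.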

It seems that @ludovicchabant faced the same issue and had to modify https://github.com/ludovicchabant/vim-gutentags to handle it: https://ludovic.chabant.com/devblog/2017/02/25/aaa-gamedev-with-vim/

His modification of ctrlp-py-matcher is here: https://github.com/ludovicchabant/ctrlp-py-matcher/blob/2f6947480203b734b069e5d9f69ba440db6b4698/autoload/pymatcher.py#L22

I am not sure yet why you are able to get the tags even when _identifier.py and its BOM fields are present in your tags. Any ideas?

For now, I have removed the entire Python env folder, because my own Python files are not going to have these BOM fields, and I have the correct encoding set up.

Do you know of any alternative to vim.eval that would be able to convert a:items to the Python equivalent, even if a:items has BOM fields? https://github.com/FelikZ/ctrlp-py-matcher/blob/cf63fd546f1e80dd4db3db96afbeaad301d21f13/autoload/pymatcher.py#L7
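`vim.eval` only exists inside Vim's embedded Python, so the sketch below cannot replace it. It only illustrates the general technique the question is after, in plain Python: decoding raw bytes tolerantly, so that one bad byte degrades a single character instead of failing the whole list. Whether ctrlp-py-matcher can get at the raw bytes this way is an open assumption. `decode_items` is hypothetical, and this is not ludovicchabant's change.

```python
from typing import List


# Hypothetical helper: tolerant UTF-8 decoding, runnable outside Vim.
def decode_items(raw_items: List[bytes]) -> List[str]:
    """Decode each item as UTF-8 without letting one bad byte fail the batch.

    Invalid byte sequences become U+FFFD (the replacement character), so the
    affected tag stays searchable by its (usually ASCII) name.
    """
    return [item.decode("utf-8", errors="replace") for item in raw_items]

# Strict decoding, by contrast, raises on the first invalid byte:
# b"\xff".decode("utf-8") raises UnicodeDecodeError
```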

@Grueslayer

Grueslayer commented Jul 15, 2018 via email

@alphaCTzo7G
Author

alphaCTzo7G commented Jul 16, 2018

@Grueslayer .. Thanks for your reply.

What if the tags contain invalid characters? In my case, iconv shows the following:

iconv -f utf-8 -t ascii -o _identifier.py _identifier.py
iconv: illegal input sequence at position 65

Also, you are correct that there is probably no BOM in the file, at least in my case.

If I open the file in Vim and use :set bomb?, it shows nobomb.

It seems the invalid UTF-8 characters are tripping up vim.eval.
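One caveat about the iconv check above: converting to ASCII fails on any non-ASCII character, including perfectly valid UTF-8, so that error alone does not prove the file contains invalid UTF-8. Here is a minimal Python sketch that separates the two cases. `classify_encoding` is a hypothetical helper.

```python
# Hypothetical helper: distinguishes plain ASCII, valid UTF-8, and invalid UTF-8.
def classify_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or the offset of the first invalid UTF-8 byte."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass  # some byte is >= 0x80; check whether it is valid UTF-8
    try:
        data.decode("utf-8")
        return "utf-8"  # non-ASCII, but every sequence is valid UTF-8
    except UnicodeDecodeError as err:
        return f"invalid utf-8 at byte {err.start}"
```

For example, `classify_encoding(Path("_identifier.py").read_bytes())` (with `from pathlib import Path`) reports which of the three cases applies to that file.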

I have seen this happen with multiple libraries. Weirdly, it has happened mostly with Python libraries. Perhaps because Python is cross-platform, people pay less attention to encodings, leading to incorrect characters and thus this kind of failure.

Even if there is a single invalid character in the entire python library, vim.eval will fail and thus CtrlPTag will stop working.
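If a single undecodable line is enough to break the whole list, one workaround is to find (or drop) such lines before the tags reach the matcher. Here is a minimal Python sketch, assuming invalid UTF-8 really is the trigger, as hypothesized above. `split_tags_lines` is a hypothetical helper that returns the decodable lines plus the 1-based numbers of the lines it skipped.

```python
from typing import List, Tuple


# Hypothetical helper: assumes invalid UTF-8 is what breaks the matcher.
def split_tags_lines(raw: bytes) -> Tuple[List[str], List[int]]:
    """Split raw tags-file bytes into decodable lines and 1-based numbers of bad lines."""
    good: List[str] = []
    bad: List[int] = []
    for lineno, line in enumerate(raw.splitlines(), start=1):
        try:
            good.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            bad.append(lineno)  # record it instead of aborting the whole list
    return good, bad
```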

What would be the right approach?
