What php search engine would you use?

March 8th, 2006 by Ivo

I'm facing a challenge with a customer whose site, which we built, runs on a quite restrictive hosting environment. We used to use htdig to add search functionality, but the hosting provider doesn't allow execution of binaries.

I've taken a look at several search engine implementations, but so far, none match all requirements:

  • Zend Framework's Lucene search: requires PHP5, but we have to run it on PHP4
  • phpdig: uses exec() to perform the search, but we are running in safe mode
  • Google: layout not flexible enough
  • htdig: requires execution of binaries, which is not allowed
  • perlfect: requires execution of Perl, which is not allowed

Maybe isearch is an option but I cannot find a lot of info about it.

The requirements are simple, but restrictive:

  • Should be usable/integrable in a PHP4-based site
  • Should run with safe mode on
  • Layout should be customisable
  • Should not require execution of Perl or binaries, at least not for the search part (uploading an index that is created on a separate machine is somewhat acceptable, although not preferred)
  • Should be spider-based, i.e. not a search engine based on database queries
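
To make the "index created on a separate machine" option from the list above concrete, here is a minimal sketch of the idea: an inverted index is built offline from spidered text, serialised, and then only loaded and queried on the restricted host. It is written in Python purely for illustration and is not taken from any of the engines mentioned; the function names (`tokenize`, `build_index`, `search`) and the sample pages are hypothetical. On the PHP4 host, the search half would have to be ported to PHP, presumably using arrays and `unserialize()` instead of JSON.

```python
import json
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the sorted list of URLs whose text contains it.

    `pages` is a dict of URL -> plain text, as a spider might collect it.
    """
    index = {}
    for url, text in pages.items():
        for word in set(tokenize(text)):
            index.setdefault(word, []).append(url)
    return {word: sorted(urls) for word, urls in index.items()}

def search(index, query):
    """Return the URLs that contain every word of the query (AND search)."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)

# Hypothetical pages standing in for spidered content.
pages = {
    "/products.html": "Our products include search software",
    "/contact.html": "Contact us about software support",
}

# Offline step: build the index and serialise it so it could be uploaded as a file.
serialized = json.dumps(build_index(pages))

# On-host step: load the uploaded index and answer a query without running any binary.
index = json.loads(serialized)
print(search(index, "software search"))
```

The search step only reads a data file and does set intersections, so nothing in it needs exec() or an external binary; the trade-off is that the index has to be rebuilt and re-uploaded whenever the site content changes.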

Free Software/Open Source/Free as in Free Beer would be nice, but is not a strict requirement.

Does anybody have any suggestions? What are you guys using?

20 Responses to “What php search engine would you use?”

  1. March 08, 2006 at 3:15 pm, Marco said:

    Does the limitation on binaries extend to extensions? I’m only asking because if you can sneak in an extension that you can load dynamically, you could use Xapian (which powers BeebleX), which fulfills all the other requirements. I’m not sure if that’s going to fly with safe_mode on, though.

  2. March 08, 2006 at 3:43 pm, Markus Wolff said:

    You could have a look at:
    http://www.phpdig.net/index.php

    I currently don’t remember if it strictly requires a shell script to be run, but I think it does. However, as it’s written in PHP, it should be fairly easy to take that shell script and hack it a bit so that it can be invoked via HTTP request.

    But, reading your requirements I can’t help but think: Really man, if *those* are your limitations and you really *need* a search engine… for heaven’s sake pay the $0.02 extra per month and get a real hosting service!

  3. March 08, 2006 at 4:23 pm, Jeff Moore said:

    I know this isn’t always possible, but it seems like it would be easier and better in the long run to find more accommodating hosting.

  4. March 08, 2006 at 4:25 pm, Ivo Jansch said:

    Actually, the company I work for provides better hosting, but in this case, that’s not an option. Choice of hosting is not always a matter of price unfortunately :-(

  5. March 08, 2006 at 4:27 pm, Harry Fuecks said:

    Maybe there are some ideas you can get from Dokuwiki, which contains its own search engine:

    http://dev.splitbrain.org/view/darcs/dokuwiki/inc/search.php
    http://dev.splitbrain.org/view/darcs/dokuwiki/lib/exe/indexer.php

    This is Dokuwiki-specific, but perhaps some of the ideas can be re-used, in particular the approach to indexing, which uses a web bug. I tried to explain that a little here: http://www.sitepoint.com/blogs/2005/11/03/web-bugs-for-job-scheduling-hack-or-solution/

  6. March 08, 2006 at 4:51 pm, Ren said:

    You could port ZF Lucene to PHP4. I don’t think it’d be much of a struggle, though you would have to make sure the current limitations of the ZF implementation are OK.

  7. March 08, 2006 at 5:16 pm, Anonymous said:

    Take a look at the new http://www.php.net search:
    http://www.php.net/results.php?q=demo&l=en&p=all
    It uses Yahoo!’s web services API.

  8. March 08, 2006 at 6:50 pm, Rob Houweling said:

    I use ZOOM, which comes in several flavors, e.g. PHP. It works like a dream and the layout is flexible. Have a look: http://www.wrensoft.com/

    the best, in my experience

  9. March 08, 2006 at 9:28 pm, Maarten Manders said:

    Have you tried the google search webservice? http://www.google.com/apis/

  10. March 09, 2006 at 12:14 am, Clay Loveless said:

    I second the recommendation for Yahoo! Search API

    http://developer.yahoo.net/search/

    http://pear.php.net/manual/en/package.webservices.services-yahoo.php

  11. March 09, 2006 at 1:37 am, darin said:

    sphider

    http://www.cs.ioc.ee/~ando/sphider/

  12. March 09, 2006 at 3:23 pm, Balluche said:

    # Zend Framework’s: well, do you want to entirely reprogram your application with a full framework?
    # google: well, do you seriously want to wait until most of your content is indexed?

    The CMS SPIP (http://www.spip.net) offers an integrated search engine that works even with safe_mode and on shared web servers. The indexing process is spread over a few HTTP requests.

  13. March 09, 2006 at 3:28 pm, Matt Simpson said:

    I ran into this situation not too long ago and decided Sphider was right for me: http://cs.ioc.ee/~ando/sphider/

    It doesn’t take a lot to run and should fit quite nicely into your requirements. You can also add in things like PDF, DOC, PPT indexing, etc… if at any point your hosting provider allows you to run a few open source apps on their server.

    ~Matt Simpson

  14. March 09, 2006 at 5:22 pm, Sam Stevens said:

    Here’s an interesting standards compliant solution for smaller sites: http://www.gr0w.com/amos/growsearch/. Indexing is done on the fly, so it’s not a good solution for larger sites.

  15. March 09, 2006 at 5:32 pm, Leendert Brouwer said:

    I don’t know what your budget is over there, but you could try Google mini
    http://www.google.com/enterprise/mini/

    Anyway, if you want to host the search engine yourself, _and_ you want the search engine to use spidering, but you can’t run executables, how are you planning to initiate a spider application? That said, using PHP directly for live searching without spidering sounds icky to me (although I don’t know how much data we’re talking about).

  16. March 09, 2006 at 6:35 pm, Anonymous said:

    http://fenec.noplay.net

  17. March 09, 2006 at 11:26 pm, Codes said:

    isearch (http://www.isearchthenet.com/isearch/) is used for the DreamHost Knowledge Base Mirror (http://blog.dreamhosters.com/kbase/). Works pretty well.

  18. March 22, 2006 at 10:21 pm, mixa said:

    You can initiate the spider using cron on a Linux server… of course, only if you have that option.

  19. March 24, 2006 at 7:51 am, chris.is-a-geek.net said:

    Ivo Jansch of Achievo asked himself the same question I did: how do you implement a search engine in PHP for your site?
    Classically, search engines consist of two parts:

    An indexing component, which lets you add…

  20. November 14, 2006 at 1:02 am, DreamHost said:

    Hello!

    Why don’t you change hosting like Markus said? I usually promote DreamHost, but they also have some restrictions (you can’t use fopen, just cURL), so you’d better check before signing up. The first year will be almost free with the code we affiliates can give. Mine is MAXIMUMPROMO, and it’s a $97 discount, the maximum affiliates can give. I won’t earn a dime from your direct referral.
    More information is here: http://dreamhost97.wordpress.com/