What PHP search engine would you use?

I'm facing a challenge with a customer whose site, which we built, runs on a quite restrictive hosting environment. We used to use htdig to add search functionality, but the hosting provider doesn't allow execution of binaries.

I've taken a look at several search engine implementations, but so far, none match all requirements:

  • Zend Framework's Lucene search: requires PHP5, but we have to run it on PHP4
  • phpdig: uses exec() to perform the search, but we are running in safe mode
  • Google: layout not flexible enough
  • htdig: requires execution of binaries, which is not allowed
  • perlfect: requires execution of Perl, which is not allowed

Maybe isearch is an option, but I cannot find much information about it.

The requirements are simple, but restrictive:

  • Should be usable and easy to integrate in a PHP4-based site
  • Should run with safe mode on
  • Layout should be customisable
  • Should not require execution of Perl or binaries, at least not for the search part (uploading an index that is created on a separate machine is somewhat acceptable, although not preferred)
  • Should be spider based, so no search engine based on database queries
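
As a rough illustration of what such a pure-PHP search part could look like when the index is built elsewhere and uploaded, here is a minimal sketch. The index layout (a serialized array mapping each word to a list of URLs), the file name search-index.dat, and the function names are all assumptions made for this example and do not come from any of the engines listed above. It uses only built-in functions available since PHP 4.3 and never calls exec().

```php
<?php
// Hypothetical index layout: array('word' => array('/page1.html', '/page2.html', ...)),
// built on another machine, serialized, and uploaded as e.g. search-index.dat.

// Load the uploaded index; returns an empty array if the file is missing or unreadable
// or if its contents do not unserialize to an array.
function load_index($path)
{
    if (!is_readable($path)) {
        return array();
    }
    $index = @unserialize(file_get_contents($path));
    return is_array($index) ? $index : array();
}

// Split a query into lowercase alphanumeric words, dropping empty tokens.
function tokenize_query($query)
{
    return preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);
}

// Return the URLs that are listed under every query word (simple AND search).
function search_index($index, $query)
{
    $result = null;
    foreach (tokenize_query($query) as $word) {
        $urls = isset($index[$word]) ? $index[$word] : array();
        $result = ($result === null) ? $urls : array_intersect($result, $urls);
    }
    return ($result === null) ? array() : array_values($result);
}
```

With an index in which "php" is listed for /a.html and /b.html and "search" only for /b.html, search_index($index, 'PHP search') returns just /b.html. Ranking, stemming and result snippets are left out on purpose; the point is only that the lookup itself needs nothing beyond file access and array functions.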

Free Software/Open Source/Free as in Free Beer would be nice, but is not a strict requirement.

Does anybody have a suggestion? What are you guys using?


20 Responses to “What PHP search engine would you use?”

  1. Marco Says:

    Does the limitation on binaries extend to extensions? I’m only asking because if you can sneak in an extension that you can load dynamically, you could use Xapian (which powers BeebleX), which fulfills all the other requirements. I’m not sure if that’s going to fly with safe_mode on, though.

  2. Markus Wolff Says:

    You could have a look at:
    http://www.phpdig.net/index.php

    I currently don’t remember if it strictly requires a shell script to be run, but I think it does. However, as it’s written in PHP, it should be fairly easy to take that shell script and hack it a bit so that it can be invoked via HTTP request.

    But reading your requirements, I can’t help but think: Really man, if *those* are your limitations and you really *need* a search engine… for heaven’s sake pay the $0.02 extra per month and get a real hosting service!

  3. Jeff Moore Says:

    I know this isn’t always possible, but it seems like it would be easier and better in the long run to find more accommodating hosting.

  4. Ivo Jansch Says:

    Actually, the company I work for provides better hosting, but in this case, that’s not an option. Choice of hosting is not always a matter of price unfortunately :-(

  5. Harry Fuecks Says:

    Maybe there are some ideas you can get from Dokuwiki, which contains its own search engine:

    http://dev.splitbrain.org/view/darcs/dokuwiki/inc/search.php
    http://dev.splitbrain.org/view/darcs/dokuwiki/lib/exe/indexer.php

    This is Dokuwiki-specific, but perhaps some of the ideas can be reused, in particular the approach to indexing using a web bug. I tried to explain that a little here: http://www.sitepoint.com/blogs/2005/11/03/web-bugs-for-job-scheduling-hack-or-solution/

  6. Ren Says:

    You could port ZF Lucene to PHP4. I don’t think it’d be much of a struggle, though you would have to make sure the current limitations of the ZF implementation are OK.

  7. Anonymous Says:

    Take a look at the new http://www.php.net search:
    http://www.php.net/results.php?q=demo&l=en&p=all
    It uses Yahoo!’s web services API.

  8. Rob Houweling Says:

    I use ZOOM, which comes in several flavors, e.g. PHP. It works like a dream and the layout is flexible. Have a look: http://www.wrensoft.com/

    The best, in my experience.

  9. Maarten Manders Says:

    Have you tried the Google search web service? http://www.google.com/apis/

  10. Clay Loveless Says:

    I second the recommendation for Yahoo! Search API

    http://developer.yahoo.net/search/

    http://pear.php.net/manual/en/package.webservices.services-yahoo.php

  11. darin Says:

    sphider

    http://www.cs.ioc.ee/~ando/sphider/

  12. Balluche Says:

    # Zend Framework’s: well, do you want to rewrite your entire application on top of a full framework?
    # Google: well, do you seriously want to wait until most of your content is indexed?

    The CMS SPIP (http://www.spip.net) offers an integrated search engine that works even with safe_mode and on shared web servers. The indexing process is spread over a few HTTP requests.

  13. Matt Simpson Says:

    I ran into this situation not too long ago and decided Sphider was right for me: http://cs.ioc.ee/~ando/sphider/

    It doesn’t take a lot to run and should fit quite nicely into your requirements. You can also add in things like PDF, DOC, PPT indexing, etc… if at any point your hosting provider allows you to run a few open source apps on their server.

    ~Matt Simpson

  14. Sam Stevens Says:

    Here’s an interesting standards-compliant solution for smaller sites: http://www.gr0w.com/amos/growsearch/. Indexing is done on the fly, so it’s not a good solution for larger sites.

  15. Leendert Brouwer Says:

    I don’t know what your budget is over there, but you could try Google Mini:
    http://www.google.com/enterprise/mini/

    Anyway, if you want to host the search engine yourself, _and_ you want the search engine to use spidering, but you can’t run executables, how are you planning to initiate a spider application? That said, using PHP directly for live searching without spidering sounds icky to me (although I don’t know how much data we’re talking about).

  16. Anonymous Says:

    http://fenec.noplay.net

  17. Codes Says:

    isearch (http://www.isearchthenet.com/isearch/) is used for the DreamHost Knowledge Base Mirror (http://blog.dreamhosters.com/kbase/). Works pretty well.

  18. mixa Says:

    You can initiate it using cron on a Linux server… of course, only if you have that option.

  19. chris.is-a-geek.net Says:

    Ivo Jansch of Achievo asked himself the same question I did: how do you implement a search engine in PHP for your site?
    Classically, search engines come in two parts:

    An indexing component, which lets you add…

  20. DreamHost Says:

    Hello!

    Why don’t you change hosting like Markus said? I usually promote DreamHost, but they also have some restrictions (you can’t use fopen, just cURL), so you better check before signing up. The first year will be almost free with the code we affiliates can give. Mine is MAXIMUMPROMO , and it’s a $97 discount, the maximum affiliates can give. I won’t earn a dime from your direct referral.
    More information is here: http://dreamhost97.wordpress.com/
