Installation Guide

Requirements

DocMGR was written and tested under PHP 5.0.4 (Apache webserver 1.3.33) and PostgreSQL 8.0.3 on a Slackware box.

As far as the OS goes, any Linux or other Unix-based OS should do nicely. Because the JavaScript code used in DocMGR is fairly new, the client will need at least IE 5.5 SP2 or Netscape 7.0/Mozilla 1.0 for DocMGR to work correctly. On the server side, you will need at least PostgreSQL 7.4.0 and PHP 5.0.

DocMGR also requires some standard Unix programs, and the php binary, for indexing files. These are as follows: tr, cat, ps, and which. If DocMGR cannot find these in Apache's path, it will report an error when you first access the program. These programs are pretty standard, so I don't anticipate any problems.
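
The check described above can be approximated from the command line before you start. The following is a minimal sketch, not DocMGR's own detection code; the function name missing_tools is made up for illustration. It looks the programs up on the PATH of whoever runs it, which may differ from the PATH Apache actually uses.

```shell
#!/bin/sh
# Print the name of each program that is NOT found on PATH.
# With no arguments, checks the programs listed in the requirements above.
missing_tools() {
    [ "$#" -gt 0 ] || set -- tr cat ps which php
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report anything missing for the current user.
missing=$(missing_tools)
if [ -n "$missing" ]; then
    echo "Missing from PATH:" $missing
else
    echo "All required programs found."
fi
```

For a closer match, run it as the user Apache runs as (for example "nobody").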

Installation Procedure

Set up the files

  • Untar the DocMGR archive
  • Next, give the user your Apache process runs as write access to the files directory. By default, on most Linux distributions, Apache runs as the user "nobody", so just type "chown -R nobody /path/to/docmgr/files". If you are running Apache as a different user, replace "nobody" with that user.
  • DocMGR now allows you to move the "files/" directory to a location not accessible from the web. This allows the storage of data on a separate drive/partition, or just in a location out of apache's reach.

    If you leave the "files/" directory in the default location, or in any web-accessible location, be sure to perform the httpd.conf modification below. Otherwise, anyone will be able to download files stored in DocMGR! In your Apache httpd.conf file, add the following lines:

    <Files "*.docmgr">
      Order allow,deny
      Deny from all
    </Files>

    This prevents anyone from pointing their browser to your "files/data/" or "files/thumbnails/" directories and freely downloading files from the application.
  • Edit the config/config.php file. The file is commented and hopefully self-explanatory. Make sure you set any options in the REQUIRED section.
  • (No longer available in 0.54 or later)
    If you selected CRON_INDEX in the config file, you'll need to set up a corresponding cron entry. First, put the scripts/cronindex.php in a directory of your choice. Then, open the file in an editor and set the first line to the path of your php binary. You can find this out with a "which php" at the command line. Then set the $dirPath variable to the absolute path of your DocMGR directory. Lastly, add a cron job to index your files nightly (or at whatever frequency you choose).
    An example (every night at 4 a.m.):

    0 4 * * * /usr/local/bin/cronindex.php 1>/dev/null 2>/dev/null
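
Whether the httpd.conf change described above is needed depends on whether your "files/" directory ends up inside a directory Apache serves. The sketch below tests that for two existing directories. The function name is_inside and the paths in the usage comment are assumptions for illustration, and the check only covers a single document root, not aliases or virtual hosts.

```shell
#!/bin/sh
# Succeed (exit 0) if directory $1 is the same as, or inside, directory $2.
# Both paths are resolved to physical paths, so symlinks and ".." are handled.
# Returns 2 if either directory does not exist.
is_inside() {
    child=$(cd "$1" 2>/dev/null && pwd -P) || return 2
    parent=$(cd "$2" 2>/dev/null && pwd -P) || return 2
    case "$child/" in
        "${parent%/}/"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (hypothetical paths, adjust to your system):
#   if is_inside /path/to/docmgr/files /path/to/htdocs; then
#       echo "files/ is web-accessible: add the <Files> block to httpd.conf"
#   fi
```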

Setting up the database

Installing A Fresh Database

Create your database using the provided "docmgr.pgsql" file in the scripts directory. After logging into your database server, become the postgres user (or whatever user your PostgreSQL installation runs as). First, create a new database by running "/path/to/pgsql/bin/createdb docmgr" (or whatever you want it to be called). Next, run "/path/to/pgsql/bin/psql -d docmgr -f /path/to/docmgr.pgsql" to create your database structure.

Edit the config/config.php file to match your database settings. Also, be sure to set USE_OID if you are using a PostgreSQL version earlier than 8.1. Users of 8.1 or later can ignore that setting.
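
If you are unsure which side of the 8.1 cutoff you are on, the comparison can be scripted. This is a rough sketch only; the function name pg_older_than_8_1 is invented here, and the usage comment assumes psql is on your PATH and reports the client version, which may differ from the server's.

```shell
#!/bin/sh
# Succeed (exit 0) if the given PostgreSQL version is earlier than 8.1,
# i.e. the case where USE_OID should be set.
# Accepts versions such as "7.4.8", "8.0.3", "8.1" or "8.1.4".
# Returns 2 if no version number can be parsed.
pg_older_than_8_1() {
    case $1 in
        *.*) major=${1%%.*}; rest=${1#*.}; minor=${rest%%.*} ;;
        *)   major=$1; minor=0 ;;
    esac
    # Drop any non-numeric suffix such as "beta2".
    major=${major%%[!0-9]*}
    minor=${minor%%[!0-9]*}
    [ -n "$major" ] || return 2
    [ -n "$minor" ] || minor=0
    if [ "$major" -lt 8 ]; then return 0; fi
    if [ "$major" -eq 8 ] && [ "$minor" -lt 1 ]; then return 0; fi
    return 1
}

# Usage:
#   ver=$(psql --version | awk 'NR==1 {print $NF}')
#   if pg_older_than_8_1 "$ver"; then echo "set USE_OID in config/config.php"; fi
```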

If you are creating a fresh installation, you can now log in as explained below. For a complete Document Management System, I recommend you install the optional programs outlined in #Optional Features. However, before you start installing software, note that your server may already have some of the optional software installed. After logging in, check out Admin -> External Applications to see what optional features are already available. If you are upgrading, please peruse the #Optional Features section to see if there have been any changes from previous versions, then proceed to the "Upgrading Your Database" section.

First Time Login Username & Password

If this is a new installation, you can log in now with the username/password combo admin/admin. After you log in you may want to head over to Admin -> Database Admin -> External Applications to see if DocMGR is finding the external apps you installed (if any). If it can't find a particular program, it will tell you. It will also report which directories it is looking in for that program's binary files.

Upgrading Your Database

Back up your database first!
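
One way to take that backup is with pg_dump, which ships with PostgreSQL. The sketch below is only a suggestion, not part of DocMGR; the function name backup_db, the PG_DUMP override, and the file naming scheme are assumptions. Run it as a user with access to the database, such as the postgres user.

```shell
#!/bin/sh
# Dump database $1 as plain SQL to a dated file in directory $2
# (default: current directory) and print the file name.
# Set PG_DUMP to the full path of pg_dump if it is not on PATH.
backup_db() {
    db=$1
    dir=${2:-.}
    out="$dir/$db-$(date +%Y%m%d-%H%M%S).sql"
    if "${PG_DUMP:-pg_dump}" "$db" > "$out"; then
        printf '%s\n' "$out"
    else
        # Do not leave a partial or empty dump behind.
        rm -f "$out"
        echo "backup of $db failed" >&2
        return 1
    fi
}

# Usage (hypothetical backup directory):
#   backup_db docmgr /path/to/backups
```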

Upgrading from 0.5x

  • First, set up the config file in the new installation directory.
  • Next, copy your files/ directory from the old installation to the new installation.
  • Go to the doc/scripts/ directory and open the upgrade.php file. If you are running 0.50.x, uncomment "upgrade51 = 1" on line 16. If you are running 0.51 or 0.52, uncomment "upgrade53 = 1" on line 13, and 0.53 users need to uncomment "upgrade54 = 1". These lines basically cause the script to upgrade your database to the next version before continuing. Also, comment out line 18 so the script will run.
  • Run the script with "php upgrade.php". When the script finishes, your upgrade is complete.
  • If you use tsearch2, run the "reindex.php" file to correct an indexing bug regarding summaries of objects not being indexed.

Upgrading from 0.44-0.49.x

To upgrade from 0.44-0.49.x to 0.56, you must upgrade to 0.50 first. First cd to the scripts/upgrade50 directory and follow the directions below. After completing these steps, run the upgrade.php file in the scripts/ directory to complete the upgrade to 0.56.

  • First, create a new database by running "createdb <dbname>" on the command line as your postgresql user.
  • Next, in the scripts/upgrade50/upgrade.php file, set the database information for your old database and your newly created docmgr database.
  • Run the upgrade.php script from the scripts/upgrade50 directory by running "php upgrade.php". Make sure you run the script from within scripts/upgrade50.
  • Copy your data and thumbnails to the new installation. You will be copying the contents of the data/ directory in the old DocMGR installation to "files/data" in the new installation. Also, the contents of the thumbnails/ directory will be copied to "files/thumbnails" in the new installation.
  • Make "tmp" and "document" subdirectories in the "files" directory.
  • Run "chown -R <apacheuser> files/"
  • Follow the steps in #Upgrading from 0.5x.
  • When finished, reindex your files by running "php reindex.php".
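
The file-copying steps in the list above (copying data and thumbnails, creating "tmp" and "document", and the chown) can be sketched as a small shell function. This is an illustration under stated assumptions, not a script shipped with DocMGR; the name migrate_files and its arguments are made up. The chown is skipped unless you pass an Apache user, since it normally needs root.

```shell
#!/bin/sh
# Copy data/ and thumbnails/ from an old (0.44-0.49.x) install into files/
# of the new install, and create the tmp and document subdirectories.
# $1 = old install dir, $2 = new install dir, $3 = Apache user (optional).
migrate_files() {
    old=$1 new=$2 owner=${3:-}
    mkdir -p "$new/files/data" "$new/files/thumbnails" \
             "$new/files/tmp" "$new/files/document" || return 1
    # "dir/." copies the directory's contents, including dotfiles.
    cp -R "$old/data/." "$new/files/data/" || return 1
    cp -R "$old/thumbnails/." "$new/files/thumbnails/" || return 1
    if [ -n "$owner" ]; then
        chown -R "$owner" "$new/files" || return 1
    fi
}

# Usage (hypothetical paths):
#   migrate_files /path/to/old-docmgr /path/to/docmgr nobody
```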

The Scripts Directory

Once you're done with the install, you'll probably want to put the scripts/ directory somewhere not accessible via the web. But, hang on to it in case you want to upgrade to Tsearch2 or use the reindex or thumbnail scripts again.


Optional Features

The latest version of DocMGR has several additional features which require outside software to be installed. The features and their required software are listed below. I apologize for the large number of outside requirements, but this software is needed to allow DocMGR to function more as a complete Document Management System. DocMGR will automatically determine which programs you have installed and are available to Apache, and will configure itself accordingly. Again, these additional features are optional and can be completely disabled in the config.php file.

Image OCR

If you want the content of your images to be indexed so you can look up a scanned page (or whatever) by its content, then you will need this. This requires GOCR at http://jocr.sourceforge.net, ImageMagick at http://www.imagemagick.org, and LibTiff at http://www.remotesensing.org/libtiff/ to work properly. You probably already have ImageMagick and LibTiff installed on your system. Drop to a command prompt and type 'convert' for ImageMagick, and 'tiffinfo' for LibTiff, and see what you get. If you don't have them, you can download the packages from the URLs above. For DocMGR to enable OCR support, it needs to be able to find the gocr, mogrify, convert, tiffinfo, and tiffsplit binaries, so they must be in Apache's path.

PDF Indexing

This will allow your PDF file content to be indexed. I highly recommend this feature, especially if you use the PDF format for a majority of your scanned documentation. You have two choices here, xpdf and ghostscript. XPDF allows for faster indexing of PDFs. It will also autorotate your PDF pages for better OCRing, so I definitely recommend it over ghostscript. For now, though, DocMGR supports either. If you have both installed, it will favor xpdf over ghostscript. You can get XPDF from http://www.foolabs.com/xpdf/. DocMGR requires XPDF version 3.0 or later. If you use ghostscript, it must be version 6.52 or higher. If you do not have at least this version, you can download it from http://www.ghostscript.com.

To index PDFs using ghostscript, just make sure the gs binary is in Apache's path. To use xpdf, make sure the pdftotext, pdfimages, and pdftoppm binaries are in Apache's path. They are probably in /usr/X11R6/bin by default. If you want to index encapsulated PDFs as well (like the ones from a copier), you'll need to follow the above steps for OCR support as well.
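
To see which of the two would be picked up on a given machine, the preference described above (xpdf when all three of its binaries are present, otherwise ghostscript) can be mimicked in shell. This is a sketch of the described behavior, not DocMGR's actual detection code; the function names have_all and pdf_backend are invented, and the result reflects the PATH of the user running it rather than Apache's.

```shell
#!/bin/sh
# Succeed if every named program is found on PATH.
have_all() {
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || return 1
    done
}

# Print which PDF indexing backend would be preferred:
# "xpdf", "ghostscript", or "none".
pdf_backend() {
    if have_all pdftotext pdfimages pdftoppm; then
        echo xpdf
    elif have_all gs; then
        echo ghostscript
    else
        echo none
    fi
}

pdf_backend
```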

Note: Ghostscript requires zlib, libpng, and jpeg-6b to compile. You can download them from the "3rdparty" directory of the ghostscript ftp site. Just untar them in the top level directory of the ghostscript source, and rename the directories to zlib, libpng, and jpeg, respectively.

Thumbnail Support

Like Image OCR, thumbnail support requires imagemagick and libtiff to work properly. See above for the packages' home pages. Text file thumbnails require enscript, which is probably already on your machine.

Email Support

You can now email files to any email address. You may also receive subscription notifications via email. You can use these features if sendmail is installed and running on your system, or if IMAP support is compiled into PHP.

Anonymous Email Support

If email support is enabled, you may also send files to non-DocMGR users. These users are emailed a unique link and PIN which may be used to access the desired file. When sending the email, the sender designates the length of time the link is valid, and can choose to be notified via email or SMS when the file is viewed by the recipient.

URL Indexing

DocMGR can index any URLs you link to. This will be enabled if you have "wget" installed on your system.

Zip And Download Collection

You can download a zipped version of any collection. The utility "zip" must be installed on your system.

Keyword Searching

The keywords are stored in config/keywords.xml. If you follow the templates, the setup is relatively painless. The new setup lets the Administrator offer users predefined values to select from for a keyword. Currently, keywords may be entered using a text field or a dropdown select box; more options may be added later. Up to 6 different keywords are allowed for a file. A text keyword is basically a text box the user fills in with the keyword value. To set up a text keyword, use the template below:

   <keyword>
       <title>Invoice Number</title>
       <name>field1</name>
       <type>text</type>
   </keyword>

The title can be anything you want. The type must be set to text, and the name must be one of field1 through field6. The name tag must have a different value for every keyword! Dropdown keywords, by contrast, allow the user to select from a predefined set of values for a keyword, and then search by those values later. Simply use the template below. You can specify as many option tags as you wish.

   <keyword>
       <title>Purchase Method</title>
       <name>field2</name>
       <type>dropdown</type>
       <option>Web</option>
       <option>Catalog</option>
       <option>Store</option>
   </keyword>

Again, the title can be anything you want. The type must be set to dropdown, and the name must be one of field1 through field6. The name tag must have a different value for every keyword!

Both templates are also already in config/keywords.xml. Just be sure to remove the comment tags to enable them.
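
Because duplicate or out-of-range name values are easy to introduce by copy and paste, a quick check of the file may help. The sketch below is a line-based approximation, not a real XML parser: it assumes one <name>...</name> element per line, as in the templates above, and it does not skip entries that are still inside XML comments. The function names are made up for illustration.

```shell
#!/bin/sh
# Print every <name> value found in keywords file $1, one per line.
keyword_names() {
    sed -n 's/.*<name>\(.*\)<\/name>.*/\1/p' "$1"
}

# Print each name that appears more than once.
duplicate_keyword_names() {
    keyword_names "$1" | sort | uniq -d
}

# Print each name that is not one of field1 through field6.
invalid_keyword_names() {
    keyword_names "$1" | grep -v -x 'field[1-6]' | sort -u
}

# Usage:
#   duplicate_keyword_names config/keywords.xml
#   invalid_keyword_names config/keywords.xml
```

No output from either function means no problems were found by this check.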

Tsearch2 Full Text Indexing

For those of you running larger DocMGR installations, you'll want to use tsearch2 for your document indexing. It results in much faster searches of your documents and result ranking. Tsearch2 installation instructions for DocMGR are listed below.

All of the above features are optional, but I highly recommend them for a more efficient Document Management System.

ClamAV File Scanning

If ClamAV is installed on your system, DocMGR will use it to scan files for viruses at upload, import, view, checkin/checkout, and email. If a virus is found, the virus will be reported, and the action will be cancelled. DocMGR looks for the "clamscan" binary to enable this feature. ClamAV may be downloaded from http://www.clamav.net.

IconV support

DocMGR will convert the character encoding for certain file types to the encoding of your database before indexing. This allows for more accurate searches in non-English languages. DocMGR will use the iconv binary if available, followed by the PHP iconv function. If neither is found, no conversion will take place.

MS Word indexing/thumbnailing

By default, DocMGR has MS Word indexing support. However, the results may not always be accurate. With antiword installed, the Word document is converted to text before being indexed, and a thumbnail of the document is created. You may download antiword from http://www.winfield.demon.nl.

File Checksum Verification

Upon uploading/updating a file in the system, an MD5 checksum of the file is created and saved in the database. On any file retrieval, the current checksum is verified against the stored value in the database before the file may be viewed. Also, a "Digital Signature" file (checksum.md5) may be sent with an email so the recipient may verify the file upon retrieval. The sending of the checksum.md5 attachment may be disabled in config.php. During file viewing, if the checksums do not match, the user will be notified and file viewing will not be allowed.
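
A recipient can run the same kind of comparison by hand. The sketch below assumes the GNU md5sum utility is available (on some systems the equivalent tool is named md5) and compares a file against a 32-character hex digest. The function names are hypothetical, and the exact layout of the checksum.md5 attachment is not described here, so the digest is passed in directly.

```shell
#!/bin/sh
# Print the MD5 digest (32 hex characters) of file $1.
md5_of() {
    md5sum < "$1" | cut -d ' ' -f 1
}

# Succeed if file $1 currently has the MD5 digest $2.
verify_md5() {
    [ "$(md5_of "$1")" = "$2" ]
}

# Usage (hypothetical file name and digest variable):
#   if verify_md5 report.pdf "$expected_digest"; then
#       echo "checksum matches"
#   else
#       echo "checksum mismatch: file may be corrupt or altered"
#   fi
```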

Restrictive Delete

By setting this option in config.php, non-administrative users will not be able to remove files from the system. These users will still be able to upload/move files if their permissions are set accordingly.

File Revision Limit

This allows you to limit the number of past revisions to be kept for a file. This may be desired if disk space is limited on your server. In addition to this, you may set FILE_REVISION_REMOVE to allow a user to selectively remove past revisions of a file in the File's History module.


WebDAV Support

WebDAV works with Windows XP, Gnome 2.10 VFS, Konqueror, WebDrive, and the webdav.org cadaver client. I couldn't get davfs to compile on my machine, so I don't know if it works or not. DavFS is based on the libneon library, as is cadaver, so it should work. WebDAV support is still beta-level, though. Please read the BUGS file in the doc/webdav/ directory for a list of all known issues.

To set up, set the WEBDAV_PATH define in /path/to/docmgr/webdav/client.php to the absolute path of the DocMGR installation.

To set up an XP machine as a client, create a WebDAV folder in "My Network Places", and point it to "http://docmgr/webdav/client.php". That should be all it takes.

Multiple Language Support

DocMGR has the ability to support non-English languages. You may translate the lang/English.php file to the language of your choice, and drop your translation into the lang/ directory. DocMGR will find the new translation automatically and make it available to your users. See the Language page for more information.

Any language files you create may be submitted back to me to be posted on the DocMGR Language download page.


Tsearch2 Indexing Support

If you decide your document repository has outgrown DocMGR's simple indexing system, you may use tsearch2 for full text indexing. Your searches will be faster, and your search results will be ranked.

If you use tsearch2 and are running PostgreSQL version 7.4.x or earlier, I highly recommend you apply the regprocedure_update.sql patch available on the tsearch2 website. It allows for easier backup and restoration of your tsearch2-based database, without the complicated manual steps otherwise required. I have tested the patch, and it works just fine with DocMGR.

To add tsearch2 to your database, first complete the DocMGR installation and/or upgrade steps above. Then perform the following steps:

  • Enter your Postgresql source directory. Switch to the contrib/tsearch2 directory and issue a "make; make install". Restart Postgres and you're done with this step.
  • Next, we need to add tsearch2 to DocMGR. Do this by calling the scripts/docmgr-tsearch2.sql file in your DocMGR directory. You may accomplish this by the following:
    • su postgres
    • psql -d docmgr -f /path/to/docmgr/scripts/docmgr-tsearch2-XXX.sql. Use the appropriate file for your version of PostgreSQL. If you are using 8.0.1 or earlier, use docmgr-tsearch2-pre802.sql. Otherwise use docmgr-tsearch2.sql.
  • Next, tell DocMGR to use tsearch2 by setting the TSEARCH2_INDEX define to "1" (and make sure it's not commented out) in the config/config.php file.
  • By default, DocMGR uses tsearch2's "default" profile for indexing. Those of you using a non-English language will want to set "TSEARCH2_PROFILE" to "simple" in config/config.php. This will result in a larger index, but should work.
  • Tsearch2 also supports dictionary-based indexing using ispell. That's beyond the scope of this document, but foreign-language users may want to look into it. See the tsearch2 website referenced below for more details.
  • Finally, we need to reindex our documents. To accomplish this, cd to the scripts/ directory in DocMGR. Make sure the first line in reindex.php points to your php binary. Then, type "./reindex.php". The script will reindex all your files using tsearch2.
  • Note: This step is not needed in Postgresql versions 8.0 and later. Apply the regprocedure_update.sql patch to DocMGR. This will allow pg_dump & pg_restore to work properly for database dumps. This will apparently no longer be a problem in the next major release of Postgresql. Run "psql -d docmgr -f /path/to/docmgr/scripts/regprocedure_update.sql" to patch the DocMGR database.
  • If you want more info on tsearch2 and its abilities, the home page is http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/.

Utility Scripts

These scripts are located in the DocMGR scripts/ directory. They may be used for maintenance, upgrading, or other specialized tasks. A brief description of the available scripts is below.

createthumbs.php

Recreates thumbnails for all supported file types in the system.

docmgr-autoimport

This file imports documents in a specified directory. It will delete the documents in that directory when the import is finished. This is intended to be run as a cron job. You may set the directory to import to and the user to import the documents as at the top of the file.

docmgr-cronindex

(Not available in 0.54 and later) This file indexes pending documents in the background. Just set CRON_INDEX in the config.php file and add this file as a cron job.

docmgr-tsearch2-pre803.sql

For PostgreSQL 8.0.2 and earlier. This SQL script makes the DocMGR database tsearch2 ready.

docmgr-tsearch2-803.sql

For PostgreSQL 8.0.3 and later. This SQL script makes the DocMGR database tsearch2 ready.

docmgr.pgsql

The original DocMGR database creation SQL script.

regprocedure_update.sql

This file updates a tsearch2 enabled docmgr to allow for easier backups. Should only be required for 7.4.x versions of postgresql.

reindex.php

Reindexes all documents. Some upgrades may require this script to be run. Or, you'll need to run this if you transfer to/from tsearch2.

upgrade.php

Upgrades docmgr database from 0.5x to 0.54

upgrade50/upgrade.php

This file upgrades 0.44-0.49.x to 0.50. This must be run before running the 0.51 migration script if you are running 0.44-0.49.x. Run it from within the upgrade50/ directory.


