How to create BIND PSI MI 1.0

First, download the BIND XML files for the species of interest. You can get the files from the following FTP site:

    ftp://ftp.bind.ca/pub/BIND/data/datasets/taxon/xml

You will need to register first, but registration is free.

After you have downloaded and extracted the BIND data to your directory, you will need an XSLT processor. Most Linux systems include 'xsltproc', so you can use that. XSLT processors written in other languages exist for other operating systems; choose whichever one suits your setup. The rest of this document assumes a Linux operating system with xsltproc already installed.

Copy our XSLT script to the same folder as your BIND data. For efficiency, it is strongly recommended to split each BIND XML file so that no resulting file is larger than ~250 MB. You can do this manually or with the supplied Perl script (see "To split files" below).

For each file, run the following command in the shell, substituting the appropriate file names:

    $ xsltproc --timing XSLT_file BIND_XML_file > output_file

Alternatively, you can use the supplied Perl script to do this conversion automatically (see "Automating the Process" below).

The conversion can take some time depending on the size of the file. If it takes a long time, you probably need to split the BIND XML file into smaller files: the XSLT is most efficient with smaller inputs, so reducing the size of the BIND XML files is the main way to speed things up. The output file is created in PSI MI 1.0 format.

Automating the Process

Two Perl scripts automate the conversion, so first install Perl if you do not already have it. Then copy the scripts into your directory.

To split files:

Use the script 'split_files.pl'. First, generate a list of BIND XML file names.
To do this in a Linux environment, you can use:

    $ ls taxon* > filelist.txt

(This assumes that the names of your BIND XML files start with 'taxon'.)

Then feed the list to the script:

    $ perl split_files.pl < filelist.txt

The split files are created in the same directory. Their names indicate the source file and the part of the file (for example: taxon10090.1.xml.part3).

To convert files:

Once you have split the files into smaller ones, you can convert them automatically using the Perl script 'convert_given_files'. This script uses xsltproc, which must already be installed. Again, create a list of the split file names:

    $ ls *part* > splitted_files.txt

(This assumes that the names of the split files contain the string 'part'.)

Then type the following command:

    $ perl convert_given_files < splitted_files.txt

The created files will have an additional part*.xml string at the end of their names.

Troubleshooting

Sometimes split_files.pl might not work properly because of the structure of the source BIND XML file, and you might need to split the large files (> 300 MB) manually. Be careful when splitting files manually: do not corrupt the structure of the XML, otherwise the conversion will not work. You might need some prior knowledge of the BIND schema to split a file manually.

If you encounter a problem using convert_given_files.pl, record the error so that you can find the cause (most probably the split source file). You can either try to solve it manually or contact us to request help.

Since some of the records do not have protein-protein interaction information, some resulting files are zero bytes in size. Similarly, some files contain only a few lines and no information, which means that they cannot be used either. Make sure that the zero-byte files are ones that were converted properly, with no errors during conversion: check the output of the convert_given_files.pl command. If there are files with errors, correct them manually and re-run the script only for those files.
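The check for empty and near-empty output files described above could be sketched in the shell as follows. This is only a minimal sketch under stated assumptions: the converted files are assumed to match the pattern '*part*.xml' (based on the naming described above), the function name list_suspect_outputs is made up for this example, and the 1024-byte threshold for "only a few lines" is an arbitrary illustrative choice. None of this is part of the supplied scripts.

```shell
#!/bin/sh
# List converted files that are empty or suspiciously small.
# Assumptions (not from the supplied scripts): outputs match '*part*.xml'
# directly inside the given directory; "small" means under 1024 bytes.
list_suspect_outputs() {
    dir="${1:-.}"
    # Zero-byte files: either no protein-protein interactions or a failed run.
    find "$dir" -maxdepth 1 -type f -name '*part*.xml' -size 0 \
        | sort | sed 's/^/EMPTY /'
    # Non-empty files under 1024 bytes: probably a few lines with no data.
    find "$dir" -maxdepth 1 -type f -name '*part*.xml' -size -1024c ! -size 0 \
        | sort | sed 's/^/SMALL /'
}

# Check the directory given as the first argument (default: current directory).
list_suspect_outputs "${1:-.}"
```

Each file reported here still has to be compared against the messages printed by the conversion script, since the listing alone cannot tell a legitimately empty result from a failed conversion.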
At the time it was written (August 25, 2006), the XSLT worked well with the BIND XML files. BIND data has not changed for quite some time; however, if it does change, the XSLT or the supplied scripts might not work properly.

Assumptions for creating PSI MI 1.0 formatted files from BIND XML files

In some cases there was a loss of data, but we tried to maximize the data extracted from BIND.

- In a single record, both of the interactors should be proteins.
- Each entry is stored under the 'BIND-Submit/BIND-Submit_interactions/BIND-Interaction-set/BIND-Interaction-set_interactions/BIND-Interaction' tag.
- Each interaction has an interaction description.
- Experiment descriptions, such as the interaction detection method and publication information, are supplied within the data. If an experiment description cannot be found, only publication information is parsed. In that case there is no guarantee that publication information exists, and the interaction detection method cannot be parsed either.
- Experiment description ids are generated randomly.
- Protein interactor ids are generated randomly and are unique.
- Each interactor has xref information for reference purposes. If no related information is found, an appropriate string from the PSI MI controlled vocabulary is assigned.
- If the taxonomy id of an interactor is found, it is parsed. If not (as in some cases of artificial peptides), the interactor is marked as unidentified.
- Each interaction record has a BIND id, which is stored in an appropriate location.
- The XSLT was written so that the resulting file fits the PSI MI schema even if the source file has some errors.
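Because each entry is expected under a BIND-Interaction element, a rough count of those elements per source file can help explain why a given part produced little or no output. The following is a minimal sketch, not part of the supplied scripts: the function name count_bind_interactions is made up for this example, and it assumes the opening tag is written literally as <BIND-Interaction> with no attributes or namespace prefix. A non-zero count does not mean the records are protein-protein interactions.

```shell
#!/bin/sh
# Count opening <BIND-Interaction> tags in one file.
# Assumption: the tag appears literally, without attributes or a prefix.
count_bind_interactions() {
    # grep -o prints every match on its own line, so wc -l counts matches
    # even when several tags share one line; tr strips wc's padding.
    grep -o '<BIND-Interaction>' "$1" | wc -l | tr -d ' '
}

# Report a count for every file whose name starts with 'taxon'.
for f in taxon*; do
    [ -f "$f" ] || continue   # skip the unexpanded pattern when nothing matches
    printf '%s\t%s\n' "$f" "$(count_bind_interactions "$f")"
done
```

The exact match on the closing '>' keeps related element names such as BIND-Interaction-set out of the count.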