naaaa, i was wrong,
svn is accessible:
svn://www.crystallography.net/cod and also
http://www.crystallography.net/cif/<COD number>
so it is possible to play with CIF's =)
@johnewarren, anyway their search sucks =). Your approach (for Olex2) is more GUI-ish, mine is more console (can i say, e.g., 'strict'?).
alllllll'right, cause my girl is angry at me, i will reveal my crazy idea =)
i want to import full CIF files into a database (yes, with all distances, angles, etc., etc.)[1].
if we go deeper into the db details, the tables would look like this:
* table for header info: from data_ till first loop (loop_ _atom_site_label) and last lines about angle of diffraction, etc.
* tables for each big loop: 5 i guess.
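A minimal sketch of that layout, written in Python with the built-in sqlite3 module standing in for the real database engine so that it runs anywhere. The table and column names (cif_header, atom_site, cod_id, ...) and the inserted values are made up for illustration, not a final schema, and only one of the loop tables is shown.

```python
import sqlite3

# Hypothetical schema: one table for header items, one table per big loop.
SCHEMA = """
CREATE TABLE cif_header (
    cod_id INTEGER NOT NULL,   -- COD number of the entry
    tag    TEXT    NOT NULL,   -- e.g. _cell_length_a
    value  TEXT,
    PRIMARY KEY (cod_id, tag)
);
CREATE TABLE atom_site (       -- the loop_ _atom_site_label ... block
    cod_id  INTEGER NOT NULL,
    label   TEXT    NOT NULL,
    fract_x REAL,
    fract_y REAL,
    fract_z REAL,
    PRIMARY KEY (cod_id, label)
);
"""

def make_db():
    """Create an in-memory database holding the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

db = make_db()
# Made-up entry id and values, only to show the shape of the rows.
db.execute("INSERT INTO cif_header VALUES (?, ?, ?)",
           (1000000, "_cell_length_a", "7.123"))
db.execute("INSERT INTO atom_site VALUES (?, ?, ?, ?, ?)",
           (1000000, "O1", 0.1, 0.2, 0.3))
rows = db.execute("SELECT label FROM atom_site WHERE cod_id = ?",
                  (1000000,)).fetchall()
print(rows)
```

Keeping header items as tag/value rows means a new CIF tag does not need a schema change, while the loop tables get real typed columns that are easy to search.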
after that (not so simple, but not so hard either =)), a very cool search is needed. i'm thinking about a file format similar to the file that XP exports after ORTH (yes, XP is proprietary, from GM).
small example:
TITL a in P2(1)/c
CELL 1
SFAC C H N O
O1 4 0.00000 0.00000 0.00000
O2 4 0.58725 -2.08931 -1.90980
O3 4 3.34897 -2.80072 -2.93412
O4 4 4.79309 -1.67345 -0.71235
O5 4 4.72131 0.87606 1.01439
O6 4 1.83065 1.30171 1.24343
[lots of text]
LINK O1 C1 1
LINK O1 C2 1
LINK C2 H2A 1
LINK C2 H2B 1
LINK O2 C3 1
LINK C2 C3 1
END
so as you can see, it is simple and powerful. coordinates should not be necessary, but someone should still be able to define them. For unimportant coordinates the second variable comes in (oops, there can be coordinates like 20.0000, need to think about it). for the LINK command, i don't know what the 4th column means, but we definitely need a bond type (cause CCDC has it). several things are missing in this file: symmetry, for which i suggest the shelxl format, LATT + SYMM; it is long but more machine-ish than the CIF one. Another missing thing is temperature factors (Ueq), i don't know if we really need them =)
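To show how little code such a format needs, here is a rough Python reader for the example above. It rests on one assumption that the post does not state: that the second column of an atom line is the 1-based index into the SFAC list (so 4 means O for "SFAC C H N O"). The 4th LINK column is stored untouched because its meaning is unknown. The function name parse_orth is made up.

```python
def parse_orth(text):
    """Parse the XP-like export shown above into atoms and links.

    Returns (atoms, links):
      atoms: {name: (element, (x, y, z))}
      links: [(atom_a, atom_b, fourth_column_as_text)]
    Assumption: column 2 of an atom line indexes the SFAC element list.
    """
    sfac, atoms, links = [], {}, []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        key = parts[0].upper()
        if key == "SFAC":
            sfac = parts[1:]
        elif key == "LINK":
            # 4th column kept as-is; what it means is not known here
            links.append((parts[1], parts[2], parts[3]))
        elif key in ("TITL", "CELL", "END"):
            continue
        elif len(parts) == 5:
            name, idx = parts[0], int(parts[1])
            xyz = tuple(float(v) for v in parts[2:5])
            atoms[name] = (sfac[idx - 1], xyz)
    return atoms, links

# A shortened copy of the example; the elided middle part is left out.
SAMPLE = """TITL a in P2(1)/c
CELL 1
SFAC C H N O
O1 4 0.00000 0.00000 0.00000
O2 4 0.58725 -2.08931 -1.90980
LINK O1 C1 1
END
"""

atoms, links = parse_orth(SAMPLE)
print(atoms)
print(links)
```

A bond type or a LATT/SYMM block would just be more keywords in the same loop.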
about search: now i think it is really hard to specify a bond length between two atoms in a search, cause only coordinates are available. maybe we can somehow extend the LINK line, need to think about it. And we also need an X (any) atom type, so that someone can search for something like C-1-X-2-O, with bond lengths 1 and 2 known (CCDC will suck after that kind of search =))
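One possible way around the "only coordinates are available" problem, as a sketch: compute the length of every LINK pair from the coordinates once and keep it next to the link. The Python below assumes that the coordinates in the export file are Cartesian (which is what ORTH suggests); the function name link_lengths is made up.

```python
import math

def link_lengths(coords, links):
    """Return {(a, b): distance} for every link whose two atoms have coordinates.

    coords: {atom name: (x, y, z)} in the Cartesian frame of the export file.
    links:  iterable of (atom_a, atom_b) pairs.
    """
    out = {}
    for a, b in links:
        if a in coords and b in coords:
            out[(a, b)] = math.dist(coords[a], coords[b])
    return out

# Two atoms copied from the example above. The O1/O2 pair is NOT a LINK
# in that file; it is used only because both coordinates are visible.
coords = {
    "O1": (0.00000, 0.00000, 0.00000),
    "O2": (0.58725, -2.08931, -1.90980),
}
# ("O1", "C1") is skipped because C1 has no coordinates in this snippet.
lengths = link_lengths(coords, [("O1", "O2"), ("O1", "C1")])
print(lengths)
```

With the lengths stored, a bond-length search becomes a plain numeric comparison instead of geometry at query time.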
==========================
technical stuff:
database engine: PostgreSQL (in 2 words: it is better than MySQL)
scripts: 2 types: database populator and search (maybe we also need some db bot scripts)
database populator:
1. cif parser (obvious)
2. database writer
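To make the populator's first step concrete, here is a deliberately naive CIF splitter in Python. It is only meant to show the shape of the data the writer would receive (header items plus loops), not to replace a real parser: it ignores multi-line ';' text fields, values that wrap onto the next line, and comments after values. The function names and the sample entry are made up.

```python
import re

# A token is a quoted string or a run of non-blank characters.
TOKEN = re.compile(r"'[^']*'|\"[^\"]*\"|\S+")

def tokens(line):
    """Split a CIF line into values, dropping surrounding quotes."""
    return [t[1:-1] if t[0] in "'\"" else t for t in TOKEN.findall(line)]

def parse_cif(text):
    """Split one CIF data block into (header dict, list of loops)."""
    header, loops = {}, []
    current = None           # loop being filled: {"tags": [...], "rows": [...]}
    in_loop_tags = False     # True while reading the tag list after loop_
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("data_"):
            header["data_"] = line[5:]
            current = None
        elif line == "loop_":
            current = {"tags": [], "rows": []}
            loops.append(current)
            in_loop_tags = True
        elif line.startswith("_"):
            parts = tokens(line)
            if in_loop_tags and len(parts) == 1:
                current["tags"].append(parts[0])
            else:
                # a plain "tag value" item ends any open loop
                current, in_loop_tags = None, False
                header[parts[0]] = parts[1] if len(parts) > 1 else None
        elif current is not None:
            in_loop_tags = False
            current["rows"].append(tokens(line))
    return header, loops

# Made-up miniature entry, just to exercise the function.
SAMPLE = """data_1000000
_cell_length_a 7.123
_symmetry_space_group_name_H-M 'P 21/c'
loop_
_atom_site_label
_atom_site_fract_x
O1 0.1
C1 0.25
_diffrn_ambient_temperature 293
"""

header, loops = parse_cif(SAMPLE)
print(header)
print(loops)
```

The database writer would then take the header dict for the header table and each loop for its own table.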
search:
1. request parser
2. database searcher
3. output
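A toy version of the request parser and searcher, in Python for illustration. The request syntax is an assumption modelled on the C-1-X-2-O idea: elements and bond lengths alternate, separated by '-', and X matches any element. The tolerance value, the function names and the test molecule are all made up.

```python
def parse_request(query):
    """'C-1.34-X-1.40-O' -> (['C', 'X', 'O'], [1.34, 1.40])."""
    parts = query.split("-")
    if len(parts) < 3 or len(parts) % 2 == 0:
        raise ValueError("expected element-length-element-...")
    elements = parts[0::2]
    lengths = [float(p) for p in parts[1::2]]
    return elements, lengths

def search(elements, lengths, atom_elem, bonds, tol=0.05):
    """Return chains of atom names that match the request.

    atom_elem: {atom name: element}; bonds: {(a, b): length}.
    'X' matches any element; tol is an assumed length tolerance.
    """
    # make every bond usable in both directions
    nbrs = {}
    for (a, b), d in bonds.items():
        nbrs.setdefault(a, []).append((b, d))
        nbrs.setdefault(b, []).append((a, d))

    def fits(atom, elem):
        return elem == "X" or atom_elem[atom] == elem

    def extend(chain, i):
        # i is the index of the next bond length to satisfy
        if i == len(lengths):
            yield tuple(chain)
            return
        for nxt, d in nbrs.get(chain[-1], []):
            if (nxt not in chain and fits(nxt, elements[i + 1])
                    and abs(d - lengths[i]) <= tol):
                yield from extend(chain + [nxt], i + 1)

    hits = []
    for atom in atom_elem:
        if fits(atom, elements[0]):
            hits.extend(extend([atom], 0))
    return hits

# Made-up fragment: C1-N1-O1-C2
atom_elem = {"C1": "C", "N1": "N", "O1": "O", "C2": "C"}
bonds = {("C1", "N1"): 1.34, ("N1", "O1"): 1.40, ("C2", "O1"): 1.43}

elements, lengths = parse_request("C-1.34-X-1.40-O")
hits = search(elements, lengths, atom_elem, bonds)
print(hits)
```

In the real thing the searcher would run against the database rather than in-memory dicts, but the request shape would stay the same.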
db bots:
1. maybe some kind of validation in database
2. statistics (e.g. statistical data on interatomic bond lengths and angles) (we can then print this data and sell it as International Tables, he he he)
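The statistics bot could be as small as the following Python sketch, which groups bond lengths by element pair and reports count, mean and standard deviation. The input tuples are made-up numbers and the function name bond_stats is hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def bond_stats(bonds):
    """bonds: iterable of (element_a, element_b, length).

    Returns {(elem1, elem2): (count, mean, stdev)} with the pair sorted,
    so that C-O and O-C land in the same bucket. A bucket with a single
    value gets stdev 0.0.
    """
    groups = defaultdict(list)
    for a, b, d in bonds:
        groups[tuple(sorted((a, b)))].append(d)
    return {
        pair: (len(ds), mean(ds), stdev(ds) if len(ds) > 1 else 0.0)
        for pair, ds in groups.items()
    }

# Made-up lengths, only to exercise the function.
stats = bond_stats([("C", "O", 1.40), ("O", "C", 1.44), ("C", "C", 1.54)])
for pair, (n, avg, sd) in sorted(stats.items()):
    print(pair, n, round(avg, 3), round(sd, 3))
```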
===========================
for technical stuff i propose to use Perl.
database populator:
1. cif parser (obvious): STAR::Parser (http://pdb.sdsc.edu/STAR/index.html)
2. database writer (Perl DBI, it can work with any database engine, ok, almost any)
search:
1. request parser (some new Perl code)
2. database searcher (Perl DBI)
3. output (some CGI in Perl)
============================
Conclusions:
it is possible =) I want to hear what you think about it. I will also publish the table schemas that describe the db structure. Thank you for reading all that crap =)
Sasha.
PS: i don't like to write new code, i'd rather use someone else's

PS2: testing would be easy: we need to keep the same id (COD id) as they use there. Then we can randomly export entries from the db, compare them with their CIFs and write out all the differences.
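A sketch of that round-trip check in Python, assuming each entry can be reduced to a {tag: value} dict both from the db export and from the original CIF. The function names, the fixed seed and the sample values are made up.

```python
import random

def pick_entries(cod_ids, k, seed=None):
    """Pick k random COD ids to check (seed only for repeatable runs)."""
    rng = random.Random(seed)
    return rng.sample(list(cod_ids), k)

def diff_entry(from_db, from_cif):
    """Compare two {tag: value} dicts for one COD id.

    Returns a list of (tag, db_value, cif_value) for every tag that is
    missing on one side or has a different value.
    """
    diffs = []
    for tag in sorted(set(from_db) | set(from_cif)):
        a, b = from_db.get(tag), from_cif.get(tag)
        if a != b:
            diffs.append((tag, a, b))
    return diffs

# Made-up data: one value differs, one tag never reached the db.
db_side = {"_cell_length_a": "7.123", "_cell_length_b": "8.000"}
cif_side = {"_cell_length_a": "7.123", "_cell_length_b": "8.001",
            "_cell_length_c": "9.500"}
diffs = diff_entry(db_side, cif_side)
for tag, a, b in diffs:
    print(tag, "db:", a, "cif:", b)
```

Comparing parsed values rather than raw file text avoids false alarms from whitespace or tag order.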