Crystallography Open Database

Please use this forum to post questions on general small molecule X-ray crystallography from theory to practical.

Re: Crystallography Open Database

Postby pascalp » 24 Oct 2009, 10:00

Not so fast. I have seen many poorly modelled databases, and the key point of a database is integrity.

First point, read this: http://www.iucr.org/resources/cif/spec/ ... /cifsyntax
Everything in it should be possible to store, even if not everything is implemented at first.

For example:
29. Data names may not exceed 75 characters in length.
There is no point in creating a varchar column of 256 characters for them, and doing so can lead to problems later.
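As a minimal sketch of carrying that limit into the schema, assuming SQLite through Python's sqlite3 module (the table and column names are made up for illustration): SQLite does not enforce the length in VARCHAR(75) by itself, so the sketch adds an explicit CHECK constraint.

```python
import sqlite3


def make_data_name_table(conn):
    """Create a hypothetical table of CIF data names limited to 75 characters."""
    conn.execute(
        """
        CREATE TABLE data_name (
            id   INTEGER PRIMARY KEY,
            -- SQLite ignores the (75) in VARCHAR(75), hence the CHECK.
            name VARCHAR(75) NOT NULL UNIQUE CHECK (length(name) <= 75)
        )
        """
    )


conn = sqlite3.connect(":memory:")
make_data_name_table(conn)
conn.execute("INSERT INTO data_name (name) VALUES (?)", ("_publ_author_name",))
```

With this in place, inserting a name longer than 75 characters raises sqlite3.IntegrityError instead of being stored silently.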

_publ_author_name is a looped data item, so it can hold several values. It does not look very difficult to handle, but think about it.
You don't know the number of names, so this is a 1:N relationship, and you need two tables for it.
However, in that case the authors table is not normalised. Duplicate records will occur, since an author can be present in many cif files. It is really an N:M relationship, and you need 3 tables:
One table for the cif file
One table for the authors
One table for the linking between the two (2 columns, each defined as a foreign key to the primary key of one of the two tables above)
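The three tables could be sketched like this, again assuming SQLite through Python's sqlite3 module. All table, column and function names are hypothetical, and the author names in the usage lines are invented placeholders, not real data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cif_file (
    id       INTEGER PRIMARY KEY,
    filename TEXT NOT NULL UNIQUE
);
CREATE TABLE author (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- one row per distinct author
);
CREATE TABLE cif_author (              -- link table for the N:M relationship
    cif_id    INTEGER NOT NULL REFERENCES cif_file(id),
    author_id INTEGER NOT NULL REFERENCES author(id),
    PRIMARY KEY (cif_id, author_id)
);
"""


def open_authors_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_cif(conn, filename, author_names):
    """Insert one cif file and link it to its authors, reusing existing author rows."""
    cif_id = conn.execute(
        "INSERT INTO cif_file (filename) VALUES (?)", (filename,)
    ).lastrowid
    for name in author_names:
        conn.execute("INSERT OR IGNORE INTO author (name) VALUES (?)", (name,))
        author_id = conn.execute(
            "SELECT id FROM author WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO cif_author (cif_id, author_id) VALUES (?, ?)",
            (cif_id, author_id),
        )
    return cif_id


conn = open_authors_db()
add_cif(conn, "a.cif", ["Author, A.", "Author, B."])
add_cif(conn, "b.cif", ["Author, A."])
```

After the two calls the author table holds two rows and the link table three, so the shared author is stored only once.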

_publ_section_title
I could not find out whether this one can span multiple lines or not.
The ; ; construct allows multi-line values. The difference between varchar and text data is important for indexing purposes.

These are more or less trivial, just a column in the main cif table (_journal_name_full, _diffrn_measurement_method and _diffrn_radiation* could be normalised, i.e. moved to more tables):
Code:
_journal_issue                   9
_journal_name_full               'Acta Crystallographica, Section C'
_journal_page_first              1073
_journal_page_last               1074
_journal_volume                  56
_journal_year                    2000
_chemical_formula_moiety         '(C5 H16 N2 )[AlHP2 O8 ]'
_chemical_formula_sum            'C5 H17 Al N2 O8 P2'
_chemical_formula_weight         322.13
[...]
[...]
_audit_creation_method           SHELXL-97
_cell_angle_alpha                90.00
_cell_angle_beta                 95.1470(10)
_cell_angle_gamma                90.00
_cell_formula_units_Z            4
_cell_length_a                   7.8783(2)
_cell_length_b                   10.46890(10)
_cell_length_c                   16.0680(4)
_cell_measurement_reflns_used    5007
_cell_measurement_temperature    296(2)
_cell_measurement_theta_max      29.83
_cell_measurement_theta_min      2.32
_cell_volume                     1319.90(5)
_computing_cell_refinement       SMART
_computing_data_collection       'SMART (Siemens, 1996a)'
_computing_data_reduction        'SHELXTL96 (Siemens, 1996b)'
_computing_molecular_graphics    'DIAMOND (Bergerhoff, 1996)'
_computing_publication_material  SHELXTL
_computing_structure_refinement  'SHELXL93 (Sheldrick, 1993)'
_computing_structure_solution    'SHELXS86 (Sheldrick, 1990)'
_diffrn_ambient_temperature      296(2)
_diffrn_measurement_device       'Siemens SMART diffractometer'
_diffrn_measurement_method       '\w scans'
_diffrn_radiation_monochromator  graphite
_diffrn_radiation_source         'fine-focus sealed tube'
_diffrn_radiation_type           MoK\a
_diffrn_radiation_wavelength     .71073
_diffrn_reflns_av_R_equivalents  .0383
_diffrn_reflns_av_sigmaI/netI    .0532
_diffrn_reflns_limit_h_max       10
_diffrn_reflns_limit_h_min       -10
_diffrn_reflns_limit_k_max       13
_diffrn_reflns_limit_k_min       -14
_diffrn_reflns_limit_l_max       9
_diffrn_reflns_limit_l_min       -21
_diffrn_reflns_number            8939
_diffrn_reflns_theta_max         29.83
_diffrn_reflns_theta_min         2.32
_exptl_absorpt_coefficient_mu    .429
_exptl_absorpt_correction_T_max  .978
_exptl_absorpt_correction_T_min  .844
_exptl_absorpt_correction_type   semi-empirical
_exptl_absorpt_process_details   'SADABS (Sheldrick, 1996)'
_exptl_crystal_colour            colorless
_exptl_crystal_density_diffrn    1.621
_exptl_crystal_density_meas      'not measured'
_exptl_crystal_description       parallelepiped
_exptl_crystal_F_000             672
_exptl_crystal_size_max          .12
_exptl_crystal_size_mid          .06
_exptl_crystal_size_min          .05
_refine_diff_density_max         1.357
_refine_diff_density_min         -.604
_refine_ls_extinction_coef       .013(8)
_refine_ls_extinction_method     'SHELXL93 (Sheldrick, 1993)'
_refine_ls_goodness_of_fit_all   1.055
_refine_ls_goodness_of_fit_ref   1.080
_refine_ls_hydrogen_treatment    constr
_refine_ls_matrix_type           full
_refine_ls_number_parameters     167
_refine_ls_number_reflns         2521
_refine_ls_number_restraints     4
_refine_ls_restrained_S_all      1.370
_refine_ls_restrained_S_obs      1.096
_refine_ls_R_factor_all          .1073
_refine_ls_R_factor_gt           .0584
_refine_ls_shift/esd_mean        .000
_refine_ls_shift/su_max          <0.001
_refine_ls_structure_factor_coef Fsqd
_refine_ls_weighting_scheme
'calc w = 1/[\s^2^(Fo^2^)+(0.0573P)^2^+3.0698P] where P=(Fo^2^+2Fc^2^)/3'
_refine_ls_wR_factor_all         .2069
_refine_ls_wR_factor_ref         .1362
_reflns_number_gt                1901
_reflns_number_total             3421
_reflns_threshold_expression     I>2\s(I)


Remark: a value with a standard deviation, such as 7.8783(2), is tricky as it is not a number. It can't be stored as is in a numeric column.
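One possible way around this is to split such a string into two numbers, the value and its standard deviation, and store them in two columns. A minimal sketch follows, assuming the digits in parentheses apply to the last quoted decimal places of the value. The function name is hypothetical, and exponent notation and special values such as <0.001 are deliberately not handled.

```python
import re
from decimal import Decimal

# A plain decimal number, optionally followed by (digits).
SU_VALUE = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+))(?:\((\d+)\))?$")


def split_su(text):
    """Return (value, su) as Decimals; su is None when no parentheses are present."""
    match = SU_VALUE.match(text.strip())
    if match is None:
        raise ValueError(f"not a plain number with optional su: {text!r}")
    value = Decimal(match.group(1))
    if match.group(2) is None:
        return value, None
    # The su digits are scaled to the last decimal place of the value,
    # e.g. exponent -4 for 7.8783, so (2) becomes 0.0002.
    exponent = value.as_tuple().exponent
    su = Decimal(int(match.group(2))).scaleb(exponent)
    return value, su


value, su = split_su("7.8783(2)")  # value 7.8783, su 0.0002
```

Decimal is used rather than float so that the number of quoted digits, which decides how the standard deviation is scaled, is not lost.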

These ones:
_symmetry_cell_setting Monoclinic
_symmetry_space_group_name_H-M P2(1)/n
should be used with a dictionary: a cell_setting table and a space group name table. This enforces integrity and consistency.
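A rough sketch of such dictionary tables, assuming SQLite through Python's sqlite3 module. The table, column and function names are hypothetical, and only the two values quoted above are seeded; a real dictionary would list every allowed value.

```python
import sqlite3


def open_lookup_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce REFERENCES
    conn.executescript(
        """
        CREATE TABLE cell_setting (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE space_group (
            id      INTEGER PRIMARY KEY,
            name_hm TEXT NOT NULL UNIQUE
        );
        CREATE TABLE cif_file (
            id              INTEGER PRIMARY KEY,
            cell_setting_id INTEGER NOT NULL REFERENCES cell_setting(id),
            space_group_id  INTEGER NOT NULL REFERENCES space_group(id)
        );
        INSERT INTO cell_setting (name) VALUES ('Monoclinic');
        INSERT INTO space_group (name_hm) VALUES ('P2(1)/n');
        """
    )
    return conn


def add_cif(conn, cell_setting, space_group):
    """Insert a cif row, refusing values that are not in the dictionary tables."""
    setting = conn.execute(
        "SELECT id FROM cell_setting WHERE name = ?", (cell_setting,)
    ).fetchone()
    group = conn.execute(
        "SELECT id FROM space_group WHERE name_hm = ?", (space_group,)
    ).fetchone()
    if setting is None or group is None:
        raise LookupError(f"unknown value: {cell_setting!r} / {space_group!r}")
    return conn.execute(
        "INSERT INTO cif_file (cell_setting_id, space_group_id) VALUES (?, ?)",
        (setting[0], group[0]),
    ).lastrowid


conn = open_lookup_db()
add_cif(conn, "Monoclinic", "P2(1)/n")
```

A misspelled value such as 'Monoclinc' is rejected by the lookup, and a row written directly with an unknown id is rejected by the foreign key.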

Code:
_symmetry_equiv_pos_as_xyz
'x, y, z'
'-x+1/2, y+1/2, -z+1/2'
'-x, -y, -z'
'x-1/2, -y-1/2, z-1/2'

This is a 1:N relationship, possibly N:M. Is there a finite number of symmetry operators?
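To treat this as N:M, the operators would have to be shared between cif files, which needs a way to recognise the same operator when it is written differently. The sketch below is one possible approach under stated assumptions: it reduces each string to a canonical key, comparing translations modulo 1 (which is my assumption, not something the CIF specification dictates here). The function name is hypothetical, and only the simple x, y, z and fraction terms seen above are handled; anything else in the string is ignored.

```python
import re
from fractions import Fraction

# One term: an optional sign, then either a fraction/integer or an axis letter.
TERM = re.compile(r"([+-]?)(?:(\d+)(?:/(\d+))?|([xyz]))")


def parse_symop(text):
    """Turn 'x-1/2, -y-1/2, z-1/2' into a hashable key of three rows.

    Each row is (coefficient of x, of y, of z, translation modulo 1).
    """
    components = text.lower().replace(" ", "").split(",")
    if len(components) != 3:
        raise ValueError(f"expected 3 components: {text!r}")
    rows = []
    for component in components:
        coeffs = {"x": 0, "y": 0, "z": 0}
        shift = Fraction(0)
        for sign, num, den, axis in TERM.findall(component):
            factor = -1 if sign == "-" else 1
            if axis:
                coeffs[axis] += factor
            else:
                shift += factor * Fraction(int(num), int(den or 1))
        rows.append((coeffs["x"], coeffs["y"], coeffs["z"], shift % 1))
    return tuple(rows)


key = parse_symop("x-1/2, -y-1/2, z-1/2")
same = key == parse_symop("x+1/2,-y+1/2,z+1/2")  # True under the modulo-1 assumption
```

Such a key could serve as the unique column of an operator table, with a link table to the cif files as for the authors.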

RDBMSs are complicated, so it has to be done properly. Just to handle genealogical data, I am using about 10 tables for data and 25 relation tables for N:M relationships.