##########################################################################
Source code of Step 4 "A simple counting algorithm for final pattern recovery" in the paper "Population analysis of 3D genome structures reveals major factors governing the stability of regulatory communities"
##########################################################################

This code is written in C++ and built with the C++ Boost library, version 1.52.0 (http://www.boost.org/). Install the Boost library before compiling, then edit the following lines in the "Makefile" so that they point to your Boost installation:

INCLUDE= -I/your_boost_root_directory/include
LIBS= -L/your_boost_root_directory/lib

##########################################################################
Command Usage
##########################################################################

USAGE: recoverPattern <option> [patterns file] [pattern index] [homologous domain IDs file] [domain IDs file] [graph IDs file] [original graph (CIG) files path] [original graph (CIG) file suffix] [min_density] [min_freq] [output file]
  option:
      -h    (--help) print help message
      -o    (--outputIsomorphism) output an additional file containing the isomorphisms for all occurrences of the modules. This option generates a file with the extension ".occurs". Please always use this option, because the generated file gives the full details of the recovered pattern.

Below is the description of the input files:
[patterns file]: input file. It is the PATTERN file generated by the software "unweightedNetsTensor" in Step 3, "Tensor-based frequent dense subgraph identification algorithm". For its format, see the file "README_unweightedNetsTensor".
[pattern index]: input number. It is the 1-based index of a pattern in the [patterns file]. For example, if the pattern index is 10, this software recovers the pattern on the 10th line of the [patterns file].
[homologous domain IDs file]: input file. Each homologous domain (called a "bead") has a string ID. This file lists all homologous domain IDs, one per line.
[domain IDs file]: input file. Each domain (called a "chromosome region") has a string ID. This file lists all domain IDs. Because each domain has two copies, each line (one chromosome region) also contains the indexes of its two copies. The format is tab-delimited, as shown below:
    domain_ID    homologous_domain/copy_A_index    homologous_domain/copy_B_index
    ...    ...    ...
    ...    ...    ...
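
As an illustration of the [domain IDs file] layout above, here is a minimal C++ sketch that parses one line. The names "DomainRecord" and "parseDomainLine" are hypothetical and not part of recoverPattern; the sketch also assumes that the two copy indexes are integers.

```cpp
#include <sstream>
#include <string>

// Hypothetical record for one line of the [domain IDs file].
struct DomainRecord {
    std::string id;   // domain_ID (chromosome region)
    int copyA;        // index of homologous domain / copy A
    int copyB;        // index of homologous domain / copy B
};

// Parses "domain_ID<TAB>copy_A_index<TAB>copy_B_index".
// Returns false if the line does not have this shape.
bool parseDomainLine(const std::string& line, DomainRecord& rec) {
    std::istringstream in(line);
    // The ID is everything up to the first tab.
    if (!std::getline(in, rec.id, '\t') || rec.id.empty()) return false;
    // The two copy indexes follow (assumed to be integers).
    if (!(in >> rec.copyA >> rec.copyB)) return false;
    return true;
}
```

Calling parseDomainLine on every line of the file yields, for each chromosome region, the two bead indexes that correspond to its copies.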
[graph IDs file]: input file. It is a list of graph (chromosome interaction graph, or CIG) string IDs, one graph ID per line.
[original graph (CIG) files path]: the directory containing all input chromosome interaction graphs (CIGs). In this directory, each graph file is named "graph_ID.net". The format of each graph file is as follows:
    column 1: Homologous domain 1 index (starting from 1).
    column 2: Homologous domain 2 index.
    column 3: Weight of the edge between homologous domains 1 and 2. (It is 1 for unweighted networks.)
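
For readers who want to inspect a graph file themselves, the three-column layout above can be read with a sketch like the following. The names "Edge" and "readEdges" are hypothetical, and the sketch assumes that the columns are separated by whitespace (spaces or tabs), which this README does not state explicitly.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one line of a graph (CIG) file.
struct Edge {
    int u;          // column 1: homologous domain 1 index (1-based)
    int v;          // column 2: homologous domain 2 index
    double weight;  // column 3: edge weight (1 for unweighted networks)
};

// Reads all edges from a stream; lines that do not start with
// two integers and a number are skipped.
std::vector<Edge> readEdges(std::istream& in) {
    std::vector<Edge> edges;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Edge e;
        if (fields >> e.u >> e.v >> e.weight) edges.push_back(e);
    }
    return edges;
}
```

To read a file, open it with std::ifstream and pass the stream to readEdges.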
[original graph (CIG) file suffix]: the extension of the graph filenames; the default is ".net".
[min_density]: minimum subgraph density of a pattern
[min_freq]: minimum number of graphs in which a pattern/module must occur
[output file]: output file containing the recovered pattern for the specified contracted pattern. Please use only the output file with the extension ".occurs", which is generated when the option "--outputIsomorphism" is given. It is tab-delimited, with the following columns:
    column 1: number of domains in the recovered pattern
    column 2: number of graphs in which the recovered pattern occurs
    column 3: the subgraph information for the 1st graph in which the recovered pattern occurs. It is a triplet "[network_index]:[subgraph_density]:[a list of bead indexes]". Note that (1) a bead index is the line number of that bead in the input [homologous domain IDs file]; and (2) the list of bead indexes is formatted as, for example, [685,682,680,686,681,689,683,679,690].
    column 4: the subgraph information for the 2nd graph in which the recovered pattern occurs. It is a triplet in the same format.
    column 5: ...
    last column: the subgraph information for the last graph in which the recovered pattern occurs.
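
The triplet format used from column 3 onward can be parsed with a short sketch such as the one below. The names "Occurrence" and "parseOccurrence" are hypothetical, and the sketch assumes that the triplet contains no spaces and that the bead list is non-empty; the sample value in the usage note is made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for one "[network_index]:[subgraph_density]:[beads]" field.
struct Occurrence {
    int networkIndex;
    double density;
    std::vector<int> beads;  // bead indexes, in the order listed
};

// Parses a field such as "3:0.85:[685,682,680]".
// Returns false if the field does not match this shape.
bool parseOccurrence(const std::string& field, Occurrence& out) {
    std::istringstream in(field);
    char sep1 = 0, sep2 = 0, open = 0;
    if (!(in >> out.networkIndex >> sep1 >> out.density >> sep2 >> open)) return false;
    if (sep1 != ':' || sep2 != ':' || open != '[') return false;
    out.beads.clear();
    int bead;
    char delim;
    // Each bead index is followed by ',' or by the closing ']'.
    while (in >> bead >> delim) {
        out.beads.push_back(bead);
        if (delim == ']') return true;
        if (delim != ',') return false;
    }
    return false;  // the list was empty or never closed
}
```

To process a whole line of the ".occurs" file, split it on tabs, read the first two columns as integers, and pass each remaining field to parseOccurrence.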