Zhou Files

##########################################################################
Source code of Step 3 "Tensor-based frequent dense subgraph identification algorithm" in the paper "Population analysis of 3D genome structures reveals major factors governing the stability of regulatory communities"
##########################################################################

This source code is written by Standard ANSI C. It can be compiled in both Windows and Unix systems. It has been tested using Visual C++ IDE in Windows and gcc in Linux.

##########################################################################
Compilation and Building Source Code
##########################################################################

In Unix, you can type "make clean; make".

In Windows, you need to firstly change the following lines in the C file "UtilsOfTensor.h" (in the beginning of this file):

#define WINDOWS
//#define UNIX

Then use any C compiler tool of Windows to compile and build source code.

##########################################################################
Command Usage
##########################################################################

Please note that all paths provided for the parameters must include the path separator "/" in the end of path string. For example, "/home/user_name/graphs/".

Below are required parameters for running the program:

-tNUM or --nodeTotalNumber=NUM :total number of nodes in the graphs, REQUIRED
--minNode=NUM : minimum number of nodes for the frequent pattern
--minNet=NUM : minimum number of graphs or structures for the frequent pattern
--minDensity=NUM : minimum subgraph density for the frequent pattern
-DSTRING or --selectedDatasetsListFile=FILE :required(put in the graphs path)
-NSTRING or --networksPath=PATH : the directory of all the graph files
-RSTRING or --resultPath=PATH : the directory that will contain the output pattern file

Below is the full description of all parameters.

Tensor Computation of Multiple Networks for Frequent Dense Subgraphs Discovery
Tensor Method: Multi-Stage Convex Relaxation (MSCR)

Usage: unweightedNetsTensor [options] ...
Options:
-h or --help :show this message and exit
-tNUM or --nodeTotalNumber=NUM :total number of nodes in the graphs, REQUIRED
-gNUM or --minNode=NUM :minimum number of nodes, default(3)
-GNUM or --maxNode=NUM :maximum number of nodes, default(30)
-nNUM or --minNet=NUM :minimum number of graphs, default(3)
-dNUM or --minDensity=NUM :minimum density (0<d<=1) :default(0.6)
-DSTRING or --selectedDatasetsListFile=FILE :required(put in the graphs path)
-NSTRING or --networksPath=PATH : the directory of all the graph files. default(./datasets/)
-RSTRING or --resultPath=PATH : the directory that will contain the output pattern. default(./results)

-pNUM or --maxPattern=NUM :number of patterns to be discovered, default(unlimited: -1)
-iNUM or --nIteration=NUM :number of iterations, default(20)
-sNUM or --nStage=NUM :number of stages, default(20)
-SSTRING or --maskStrategy=STRING :EDGES_PATTERN,EDGES_ALLNETS or GENES (default: EDGES_PATTERN)
-OSTRING or --overlapPatternChoose=STRING :PATTERN_WITH_MORE_NETS,PATTERN_WITH_MORE_GENES, PATTERN_WITH_BOTH or PATTERN_WITH_NONZEROS_XY (default: PATTERN_WITH_BOTH)
-fSTRING or --prefixResultFilename=STRING :default()
-FSTRING or --suffixdataFile=STRING :default(.cnet)
-ENUM or --nEdgesLoad=NUM :max #edges loaded for each graph, default(1000000)
-r or --resume :resume the last run
-c or --succintOutput :output only PATTERN file -u or --loadUnweighted :load graphs in the unweighted manner.

-wSTRING or --rangeWeight=STRING : minimum and maximum weight of edges that can be loaded to memory, default (ALL)
could be like [0.5,0.8]
-eSTRING or --removeHeavyFrequentEdges=STRING : threshold of weight and frequency of the edge that will not be
counted in tensor analysis. If this option is not set, then do
NOT remove heavy-frequency-edges; otherwise remove them before
tensor analysis. It is given like [0.6,0.4], meaning remove
edges whose weight>=0.6, and frequency>=%40, default (NONE)
-INUM or --Init_xy=NUM : 1, use random values to initialize vectors x and y;
0, use ones to initialize x and y;
2, use random unit vector to initialize x and y;
default (0)
-PSTRING or --ParmsFile=STRING : parameters file. If not setting, use default parameters.
-oSTRING or --ResultFile=STRING : result filename(no path included). If not setting, use default filename(with params).
-xSTRING or --excludeNodesIndexesFile=STRING : filename(path must be included).
-X or --excludeEdgesOfDirectNeighborNodes : if two nodes (domains) are adjacent in genome (their indexes are of difference 1), remove the edge connecting them.
Algorithm predefined parameters, files and directories
Predefined 0 sets of parameters:
maxNode is 30
datasets Path - ./datasets
datasets filenames prefix and suffix - '' and '.cnet'
results Path - ./results

Please email mailto:XJZHOU@usc.edu or mailto:WEL@usc.edu
Further refers to http://zhoulab.usc.edu for help.

##########################################################################
Files format
##########################################################################

-------------------------------
Format of graphs IDs file: containing a list of graphs' IDs.

Each line contains a string (no space and tab) representing a graph ID. Note that the order of graphs in this file is served for graph indeces in PATTERN file. For example, a graph with index 21 indicates this graph's ID can be found at Line 21 of this graphs IDs file.

-------------------------------
Format of graph file: containing all edges and their weights of a graph.

This file is a tab separated file. Columns are separated by tab.

column 1: Node A index (start from 1).
column 2: Node B index.
column 3: Weight of the edge between Nodes A and B. (It's 1 for unweighted graphs)

-------------------------------
Format of PATTERN file: containing all representative frequent dense subgraphs found.

This file is a tab separated file. Columns are separated by tab.

column 1: Node indices (index starts from 1). These are not node IDs, but node index.
column 2: Number of nodes that this pattern contains.
column 3: density. The minimum density of the subgraph in the graphs it occur.
column 4: Number of graphs the subgraph occurs.
column 6: Subgraph Densities: The graphs (indices) this subgraph occurs are sorted in order of their indices (increasing order, start from 1 not 0). Each heaviness is listed as a triplet of values: [number of edges]:[graph index]:[density]. these triplets are separated by a space. The number of edges and heaviness are obviously duplicate information as density=number_of_edges/nodes_number_choose_2, but I included both. The number of graphs listed here will be equal to the number in column 4. The smallest density among all these graphs in this list will match the densityin column 3.