#include <iostream>
#include <cstring>
using namespace std;
class Base
{
char strVal[100];
public:
Base(){ strcpy(strVal,"");}
Base(char *val){ strcpy(strVal,val);}
~Base(){*strVal = '\0';}
friend istream& operator >>(istream &is,Base &obj);
friend ostream& operator <<(ostream &os,const Base &obj);
};
istream& operator >>(istream &is,Base &obj)
{
is>>obj.strVal;
return is;
}
ostream& operator <<(ostream &os,const Base &obj)
{
os<<obj.strVal;
return os;
}
void main()
{
Base b;
cin>>b;
cout<<"Printing the value\n";
cout<<b<<endl;
}
Tuesday, July 12, 2011
Tuesday, May 31, 2011
Handling Command Line arguments.. (with getopt)
getopt
is called in a loop. When getopt
returns -1
, indicating no more options are present, the loop terminates.switch
statement is used to dispatch on the return value from getopt
. In typical use, each case just sets a variable that is used later in the program.#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int
main (int argc, char **argv)
{
int aflag = 0;
int bflag = 0;
char *cvalue = NULL;
int index;
int c;
opterr = 0;
while ((c = getopt (argc, argv, "abc:")) != -1)
switch (c)
{
case 'a':
aflag = 1;
break;
case 'b':
bflag = 1;
break;
case 'c':
cvalue = optarg;
break;
case '?':
if (optopt == 'c')
fprintf (stderr, "Option -%c requires an argument.\n", optopt);
else if (isprint (optopt))
fprintf (stderr, "Unknown option `-%c'.\n", optopt);
else
fprintf (stderr,
"Unknown option character `\\x%x'.\n",
optopt);
return 1;
default:
abort ();
}
printf ("aflag = %d, bflag = %d, cvalue = %s\n",
aflag, bflag, cvalue);
for (index = optind; index < argc; index )
printf ("Non-option argument %s\n", argv[index]);
return 0;
}
Compile :
g++ testopt.cpp -o testopt
Run with various arguments:
% testopt
aflag = 0, bflag = 0, cvalue = (null)
% testopt -ab
aflag = 1, bflag = 1, cvalue = (null)
% testopt -ab -c hello
aflag = 1, bflag = 1, cvalue = hello
Tuesday, April 26, 2011
Taming WikiPedia Category Hierarchy
Taming WikiPedia Category Hierarchy
Wikipedia's category system can be thought of as consisting of overlapping trees. Any category may branch into subcategories, and it is possible for a category to be a subcategory of more than one parent. (A is said to be a parent category of B when B is a subcategory of A).
Some pointers on categorization:
- A category can be present in more than one parent category.
- Cycles are present in the wiki graph like A --> B --> C --> D --> A
- Every wikipedia article should belong to at-least 1 category.
- For any category there exist multiple paths from Root to that category.
- Wiki articles are placed under these overlapping trees.
[–] Articles
[+] Articles with resourced statements
[+] Featured articles
[+] Fundamental categories
[+] Good articles
[+] Lists
[+] Main topic classifications
[+] Spoken articles
[+] Timelines
- A snapshot of a wiki graph
Problem Statement
For a given category need to find its root parent category/categories among the following
categories(under Main Topic Classifications)
Main topic classifications
Our Approach
We approach the given problem considering following points
We need to remove cycles from the loop, otherwise it will become
an infinite loop
We need to avoid multiple paths to a category from any
root categories.
We need to remove sibling paths as they also increase
complexity and then too lead to the same parent.
Need to create a graph which points only downwards not vice
versa and without cycles (as given in the image)
Sub categories should fall under sub levels , so that this graph
can be visualized as a multi-root Tree.
We follow the graph colouring principles from Red-Black Tree and traverse our graph in
Breadth First manner with following rules:
- All processed nodes are coloured in Black.
- All leafs in process are coloured Grey.
- Edge is possible only from Black --> Grey node not vice versa.
- All the leafs of a level are coloured Black after processing.
Our Observation
Out of 1.3 million unique categories, we covered around 532,000
categories with 745,000 edges in our graph. The created wikigraph
has 16 level hierarchy. To persist this graph we are using OrientDB.
On a machine with 4 core - 16 GB ram the graph can be created
On a machine with 4 core - 16 GB ram the graph can be created
in T<25 mins. Now we can identify with root categories for a given
category by graph connectivity algorithms. OrientDb also provides
queries for the same, hence we are using the queries to find connection
of a category with the root nodes.
Subscribe to:
Posts (Atom)