Tuesday, July 12, 2011

Operator overloading << >>

#include <iostream>
#include <cstring>

using namespace std;

class Base
{
char strVal[100];

public:
Base(){ strcpy(strVal,"");}
Base(char *val){ strcpy(strVal,val);}

~Base(){*strVal = '\0';}

friend istream& operator >>(istream &is,Base &obj);
friend ostream& operator <<(ostream &os,const Base &obj);
};

istream& operator >>(istream &is,Base &obj)
{
is>>obj.strVal;
return is;
}

ostream& operator <<(ostream &os,const Base &obj)
{
os<<obj.strVal;
return os;
}
void main()
{
Base b;
cin>>b;
cout<<"Printing the value\n";
cout<<b<<endl;
}

Tuesday, May 31, 2011

Handling Command Line arguments.. (with getopt)

  • Normally, getopt is called in a loop. When getopt returns -1, indicating no more options are present, the loop terminates.
  • A switch statement is used to dispatch on the return value from getopt. In typical use, each case just sets a variable that is used later in the program.
  • A second loop is used to process the remaining non-option arguments.

  • #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main (int argc, char **argv)
    {
    int aflag = 0;
    int bflag = 0;
    char *cvalue = NULL;
    int index;
    int c;

    opterr = 0;

    while ((c = getopt (argc, argv, "abc:")) != -1)
    switch (c)
    {
    case 'a':
    aflag = 1;
    break;
    case 'b':
    bflag = 1;
    break;
    case 'c':
    cvalue = optarg;
    break;
    case '?':
    if (optopt == 'c')
    fprintf (stderr, "Option -%c requires an argument.\n", optopt);
    else if (isprint (optopt))
    fprintf (stderr, "Unknown option `-%c'.\n", optopt);
    else
    fprintf (stderr,
    "Unknown option character `\\x%x'.\n",
    optopt);
    return 1;
    default:
    abort ();
    }

    printf ("aflag = %d, bflag = %d, cvalue = %s\n",
    aflag, bflag, cvalue);

    for (index = optind; index < argc; index )
    printf ("Non-option argument %s\n", argv[index]);
    return 0;
    }


    Compile :
    g++ testopt.cpp -o testopt

    Run with various arguments:

    % testopt

    aflag = 0, bflag = 0, cvalue = (null)

    % testopt -ab

         aflag = 1, bflag = 1, cvalue = (null)      
         % testopt -ab -c hello
         aflag = 1, bflag = 1, cvalue = hello      

    Tuesday, April 26, 2011

    Taming WikiPedia Category Hierarchy


    Taming WikiPedia Category Hierarchy

    Wikipedia's category system can be thought of as consisting of overlapping trees. Any category may branch into subcategories, and it is possible for a category to be a subcategory of more than one parent. (A is said to be a parent category of B when B is a subcategory of A).

    Some pointers on categorization:
    • A category can be present in more than one parent category.
    • Cycles are present in the wiki graph like A --> B --> C --> D --> A
    • Every wikipedia article should belong to at-least 1 category.
    • For any category there exist multiple paths from Root to that category.
    • Wiki articles are placed under these overlapping trees.

    [] Articles
    [+] Articles with resourced statements
    [+] Featured articles
    [+] Fundamental categories
    [+] Good articles
    [+] Lists
    [+] Main topic classifications
    [+] Spoken articles
    [+] Timelines


    • A snapshot of a wiki graph













    Problem Statement
    For a given category need to find its root parent category/categories among the following
    categories(under Main Topic Classifications)

    Main topic classifications
    [+] Agriculture
    [+] Applied sciences
    [+] Arts
    [+] Belief
    [+] Business
    [+] Chronology
    [+] Culture
    [+] Education
    [+] Environment
    [+] Geography
    [+] Health
    [+] History
    [+] Humanities
    [+] Language
    [+] Law
    [+] Life
    [+] Mathematics
    [+] Nature
    [+] People
    [+] Politics
    [+] Science
    [+] Society
    [+] Technology


    Our Approach
    We approach the given problem considering following points

    We need to remove cycles from the loop, otherwise it will become
    an infinite loop
    We need to avoid multiple paths to a category from any
    root categories.
    We need to remove sibling paths as they also increase
    complexity and then too lead to the same parent.
    Need to create a graph which points only downwards not vice
    versa and without cycles (as given in the image)
    Sub categories should fall under sub levels , so that this graph
    can be visualized as a multi-root Tree.
























    We follow the graph colouring principles from Red-Black Tree and traverse our graph in
    Breadth First manner with following rules:











    • All processed nodes are coloured in Black.
    • All leafs in process are coloured Grey.
    • Edge is possible only from Black --> Grey node not vice versa.
    • All the leafs of a level are coloured Black after processing.

    Our Observation


    Out of 1.3 million unique categories, we covered around 532,000
    categories with 745,000 edges in our graph. The created wikigraph
    has 16 level hierarchy. To persist this graph we are using OrientDB.
    On a machine with 4 core - 16 GB ram the graph can be created
    in T<25 mins. Now we can identify with root categories for a given
    category by graph connectivity algorithms. OrientDb also provides
    queries for the same, hence we are using the queries to find connection
    of a category with the root nodes.