Thread:Machine blog posting analysis (2)

I plan to do only very basic analysis within the GSOC timeframe. I am also working out how many of the tasks from my ideas I would be able to complete in the time available for GSOC.

There was a project done at my lab some time back (http://cdeproject.iiit.ac.in/htap/). Although it is a bit different from what I plan to do, I can use the concepts learned there and apply them accordingly.

I’ll use an example to describe, in very broad terms, one of the relevant projects related to Twitter trends. We take a specific event and make a list of all tags related to it. Then, from a continually updated database of tweets, we query for the tweets in which at least some fixed percentage of the text matches the tags. I can do the same for blogs. I have a specific activity, and for each activity a list of tags can be written (I plan to do this manually to start with, but it can be automated as well). Then I just search for the tags in each blog post and see if it contains at least some of them. This is the most simplistic approach I could think of to achieve automatic detection. I do have some other approaches in mind. For example, I could perform outlier detection (http://www.eng.tau.ac.il/~bengal/outlier.pdf) on the blog posts, and then the instructor or peers would only have to revalidate the outliers.
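
To make the tag-matching idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not the code from the lab project: the function names, the example tags, the sample posts, and the 0.5 threshold are all made up, and a match is taken to mean "the post contains at least a fixed fraction of the activity's tags as whole words".

```python
import re


def tokenize(text):
    """Lowercase the text and return the set of word tokens it contains."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def tag_coverage(post_text, tags):
    """Return the fraction of tags that appear as whole words in the post."""
    if not tags:
        return 0.0
    words = tokenize(post_text)
    found = [tag for tag in tags if tag.lower() in words]
    return len(found) / len(tags)


def matches_activity(post_text, tags, min_coverage=0.5):
    """Decide whether a post matches an activity.

    min_coverage is a hypothetical threshold: the minimum fraction of
    the activity's tags that must occur in the post.
    """
    return tag_coverage(post_text, tags) >= min_coverage


# Hypothetical, manually written tag list for one activity.
activity_tags = ["wiki", "edit", "license", "commons"]

# Hypothetical blog posts keyed by an identifier.
posts = {
    "post_a": "Today I made my first wiki edit and read about the Creative Commons license.",
    "post_b": "My holiday photos from the beach.",
}

results = {name: matches_activity(text, activity_tags) for name, text in posts.items()}
print(results)
```

Posts that fall below the threshold would be the ones left for the instructor or peers to check by hand. Matching on whole words is a deliberate simplification: it misses plurals and other word forms, so stemming would be an obvious next step.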

Of course, if you are happy with the results of these, I would be glad to take on more rigorous machine learning tasks outside the GSOC timeframe too.