1. Introduction

Humans always need to learn new things. Learning is a process that has existed since humans began to live in communities. In the ancient world, knowledge was spread through a learning system based on memorization. Handwritten books came to help in the Middle Ages and improved the learning process. In the current age, learning has moved from printed books to digital books. However, knowledge gathering and its representation are still done manually. Nowadays, emerging technology is gradually changing the way the learning process works. The revolution of information and communication technologies (ICTs) has affected education, providing means to enhance both the teaching and learning processes. Technology-supported learning systems (TSLSs), such as intelligent tutoring systems (ITSs), adaptive hypermedia systems (AHSs), and, especially, learning management systems (LMSs) such as Moodle or Blackboard, are widely used in many academic institutions and are becoming essential for education.

The Domain Module is considered the core of any TSLS. It represents the knowledge about a subject matter to be communicated to the learner. An appropriate Domain Module, i.e., the pedagogical representation of the domain to be learned, is the basic requirement of a learning methodology. In e-learning, the Domain Module enables students to learn by themselves, while it helps an instructor guide students through the learning process.

Domain Module construction includes the following steps: selecting the domain topics to be learned, defining the pedagogical relationships among the topics and contents, and representing the learning sessions. The authors of textbooks follow similar steps while writing their documents, which are structured to facilitate comprehension and learning. Basically, authors choose a set of reference books that provide the main didactic resources (DRs), such as definitions, examples, and descriptions, for the subject.
Teachers follow the same common technique when scheduling their lectures. Applying artificial intelligence techniques in learning systems provides the means for the semiautomatic construction of Domain Modules from electronic textbooks, which may significantly reduce the development cost of the Domain Modules.

The proposed semi-automatic system, DOM-Sortze with minor modifications, is a framework for the semiautomatic generation of the Domain Module from an electronic textbook. The electronic textbook is submitted by the user of the system. DOM-Sortze aims to be domain independent, so no domain-specific knowledge is used except the processed electronic textbook, the topic provided by the system user, and the knowledge gathered from it.

2. Related Work

A semi-automatic learning system is a system used to create the Domain Module from electronic textbooks. The outcome of gathering the Learning Domain Ontology (LDO) and the Learning Objects (LOs) can be supervised using a concept-map-based tool for the supervision of the Domain Module authoring process. Natural language processing technology and heuristic reasoning are used to analyze the contents of the electronic textbook, and ontologies are used to generate and analyze the domain ontology from electronic textbooks. The pertinent knowledge obtained from the textbook is called the Domain Module. Technology-based Domain Module construction thus reduces the cost of generating the Domain Module, which is otherwise high and effort intensive.

In [1], M. A. Hearst suggested that a discovery pattern is required to be applicable to a wide range of text. It helps avoid the need for pre-encoded knowledge. The discovery pattern can be realized using the hyponym relation. Metadata [2] is required to develop a learning system. Metadata is abstract data about learning objects. Automatically generated data [2] is used by J. R. Anderson to develop a learning system.
Automatic generation of metadata is also required for digital libraries such as e-learning repositories, where users provide a great number of learning objects. As automatic generation of metadata is required, the domain ontology must also be generated automatically [3] or semiautomatically. Once generated, the ontology is used in e-learning-based education as the Domain Module. A. Zouaq and R. Nkambou state [3] that the lack of reuse of learning objects causes an increase in time complexity.

According to [4], reusing learning objects through retrieval helps lighten the workload of constructing new online courses. Learning objects can be reused to support learning on different platforms or in different environments. A system is presented that automatically builds learning objects from electronic texts using POS analysis, natural language processing techniques, ontologies, and heuristic reasoning.

The ErauzOnt framework [5] is used to generate learning objects directly from the electronic textbook. The learning objects are domain independent. The framework includes Pdf2Tree Builder, which calls the Pdf2XML Service to extract the information from the input file, and LO File Builder, which generates the final collection of the generated learning objects.

3. Domain Module Construction

The proposed system, running in a Hadoop environment, is a modified approach to semi-automatically generate the learning Domain Module from an e-book provided by the system user. DOM-Sortze, with modifications to the algorithm and an implementation on MapReduce infrastructure, is proposed to generate learning domains from electronic textbooks. DOM-Sortze aims to be domain independent, so no domain-specific knowledge is used other than the processed electronic textbook provided by the system user and the knowledge gathered from it.

Figure 1. Architecture Diagram

The proposed system processes e-books using NLP (Natural Language Processing), POS (Part of Speech) tagging, and image processing.
DOM-Sortze is built of two main applications, the Preprocessor and the LDO Builder, which carry out the tasks for building the Domain Module. The former carries out the textbook processing task and the latter facilitates the intervention of the Caption Module authors, either instructional designers or teachers, to supervise the results. The generated notes and short notes are presented to the user, which may help users determine whether or not the LO fits their requests. The Preprocessor and the LDO Builder are the two main modules of the proposed system.

3.1. Textbook Preprocessing

Textbook preprocessing processes the textbook and constructs a tree-like structure so that later tasks do not depend on the format of the e-book uploaded by the user. The tree-like structure includes chapters, their subtopics, and so on. The outcome of textbook preprocessing is used by both modules, i.e., for gathering the LDO and the LOs. The internal representation of the document is analyzed using Part Of Speech tagging.

3.2. LDO Builder Module

The LDO Builder module gathers the domain ontology from the contents extracted from the e-book provided by the system user. In previous work, this has been addressed in two ways, automatically or semiautomatically. The LDO describes a certain domain for learning purposes; it is an application ontology according to Guarino's classification [4]. Ontology learning relies on the assumption that there is semantic knowledge underlying syntactic structures. Text2Onto uses Hearst's patterns to gather taxonomic relationships and nested term-based methods to identify the set of candidate domain topics.

The LDO Builder module preprocesses the extracted electronic text by applying technologies such as NLP and POS tagging to generate the LDO. The LDO is gathered by the LDO Builder from the internal representations of the electronic textbook. The topics of the LDO are gathered from the outline of the textbook and from the whole document.
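As a rough illustration of how topics might be gathered from a textbook outline, the sketch below turns numbered outline entries into topics linked by partOf and next relations. The input format, the function name `outline_to_ldo`, and the rules that every subsection is partOf its parent and that consecutive siblings are linked by next are assumptions made for this example; they are not the DOM-Sortze heuristics.

```python
def outline_to_ldo(outline):
    """Build topics and relations from (number, title) outline entries.

    Assumption: a dotted section number such as "3.2.1" identifies its
    parent ("3.2"), and each child is treated as partOf that parent.
    """
    topics = {number: title for number, title in outline}
    relations = []
    last_child = {}  # parent number -> title of the most recent child seen
    for number, title in outline:
        parent = number.rpartition(".")[0]  # "" for top-level entries
        if parent in topics:
            relations.append((title, "partOf", topics[parent]))
        # Hypothetical sequential rule: consecutive siblings are linked by "next".
        if parent in last_child:
            relations.append((last_child[parent], "next", title))
        last_child[parent] = title
    return list(topics.values()), relations


outline = [
    ("3", "Domain Module Construction"),
    ("3.1", "Textbook Preprocessing"),
    ("3.2", "LDO Builder Module"),
    ("3.2.1", "Document Outline Analysis"),
]
topics, relations = outline_to_ldo(outline)
for relation in relations:
    print(relation)
```

In a real system the choice between isA and partOf, and between prerequisite and next, would come from heuristics with confidence values rather than from the fixed rules used here.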
Topics are identified across the whole document. The pedagogical relationships are also identified, following a pattern-recognition approach. A further processing stage of the LDO Builder is image processing. Image processing is applied to find any image contents in the document provided by the system user. Each image found in the document is processed to gather knowledge from it and to relate the image to the corresponding knowledge.

The LDO contains the main topics and the pedagogical relationships between them. Pedagogical relationships may be structural, i.e., isA and partOf, or sequential, i.e., prerequisite and next.

3.2.1. Document Outline Analysis. This analysis includes Basic Analysis and Heuristic Analysis.

Basic Analysis uses the homogenized internal representation of the outline to find important topics and the relationships among them. In this analysis, each unit index is considered a domain topic, and each subtopic is used to explain the relationship with the domain topic.

Heuristic Analysis is carried out in two steps. First, heuristics for structural relationships are used to find isA and partOf relationships among the domain topic and its subtopics. After that, heuristics for sequential relationships are used to find prerequisite and next relationships among the domain topic and its subtopics.

3.2.2. Document Body Analysis. This analysis is carried out by identifying new topics and the pedagogical relationships among these topics. Each method is described below.

New Topics Identification enhances the LDO gathered in the previous phase by identifying new topics and the relationships among them. As per the DOM-Sortze algorithm, Erauzterm [6] gathers one-word or multi-word terms, which determine the domain relatedness.

Relationship among Topics Identification finds pedagogical relationships among topics of the electronic textbook uploaded by the user, using a pattern-based approach that recognizes relations among topics based on the syntactic structures of the sentences in which the topics appear.

3.3. LO Module

The LO module is used to generate the notes and short notes from the knowledge gathered by the LDO Builder process. The notes and short notes include definitions, important points, examples, short points that are easy to remember, and images placed at appropriate locations next to the corresponding points. The LO module identifies and gathers LOs from electronic documents using Natural Language Processing (NLP) techniques, image processing, and ontologies. This framework aims to be applicable to any document in the form of an electronic textbook, no matter which domain it relates to. None of its components relies on implicit domain-specific knowledge [4].

As shown in Fig. 2, the DR Grammar, Discourse Markers, and the Learning Domain Ontology are used to gather DRs from the internal representation of the document. The LDO and the ALOCOM Ontology are used to generate LOs from the generated DRs. The generated LOs are then stored in the LOR for further use.

Figure 2. Generation of LOs Diagram

3.3.1. DRs Generation. Didactic Resources generation is carried out by identifying important items such as definitions, examples, theorems, important statements, and principles.

As shown in Fig. 3, the internal representation of the document generated during the previous stages is labeled to obtain the labeled internal representation of the document. The DR Grammar recognizes different patterns and syntactic structures found in the e-book by using different sets of rules and generates DRs. These generated DRs are enhanced by Discourse Markers, the Didactic Ontology, and the Learning Domain Ontology. They can also be enhanced by combining similar DRs and by adding previous fragments to each DR if it refers to previous DRs or statements.

Figure 3. Generation of DRs Diagram

Combination of Similar DRs is based on the similarity between two DRs. The similarity between two DRs is computed from the similarity between their topics and their resemblance to adjacent topics. The methods used to find the similarity between DRs return a value in the range of 0 to 1.
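The similarity-based combination of DRs can be sketched as follows. Jaccard overlap between word sets is used here only as a stand-in for the actual similarity measure, and `dr_similarity`, `combine_similar`, and the 0.5 cutoff are illustrative names and values, not taken from DOM-Sortze.

```python
def dr_similarity(a, b):
    """Return a score in [0, 1]: Jaccard overlap of lowercase word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def combine_similar(drs, threshold=0.5):
    """Group each DR with the preceding one when their similarity reaches the threshold."""
    groups = []
    for dr in drs:
        if groups and dr_similarity(groups[-1][-1], dr) >= threshold:
            groups[-1].append(dr)  # adjacent and similar: same group
        else:
            groups.append([dr])    # otherwise start a new group
    return groups


drs = [
    "an ontology is a formal specification",
    "an ontology is a shared formal specification",
    "metadata describes learning objects",
]
groups = combine_similar(drs)
print(len(groups), "groups")
```

Only adjacent DRs are compared in this sketch, which keeps the grouping linear in the number of DRs; comparing all pairs would be a different design choice.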
Two DRs are considered to be similar if the returned similarity is above a certain threshold value.

Cohesion Maintenance is based on Discourse Markers, i.e., words or expressions that connect words or sentences. Discourse Markers are divided into three types: single (e.g., "Therefore"), complex (e.g., "First", "Finally"), and references (e.g., "This", "That"). The method deals differently with each type of Discourse Marker. If a DR starts with "Therefore", "This", or "That", i.e., a single or reference marker, at most three necessary sentences are added. If a DR starts with "Finally", i.e., a complex marker, all the necessary statements back to the initial part are added.

3.3.2. DRs to LOs. Retrieving appropriate LOs from a large set of generated LOs or from the Learning Domain Repository is difficult, which hinders the use and reuse of LOs. The selection of LOs can be done using the metadata that describes them. However, for larger deployments, manual construction of metadata is not an option for finding suitable LOs. Metadata for each LO is generated automatically using SAmgI [7], an automatic metadata generator. The generated metadata can be enhanced using information extracted during the generation of the DRs.

The usability of an LO can be affected by its presentation format, such as PDF, DOC, or ODF. These formats are suitable for final presentation, but they make it difficult to reuse the contents of LOs. Therefore, in the proposed system, generated LOs are stored in a zip file that contains the XML representation of the LOs as well as the referenced images and resources.

The ALOCOM ontology represents a content model for LOs and their components, which include definitions, examples, theorems, important statements, principles, etc.

3.3.3. LO Storage. When LOs and their preview files are generated, they are added to the Learning Domain Repository to allow reuse of the contents of the LOs. The LOR can be queried to find suitable LOs. When an LO is added to the LOR, all of its components are stored in the LOR as well.

4. DOM-Sortze Architecture

Figure 4. Architecture of DOM-Sortze

Fig. 4 shows the architecture of DOM-Sortze. DOM-Sortze helps in building the Domain Module. It is flexible and platform independent due to its web-service-oriented architecture. DOM-Sortze consists of four main applications, i.e., Elkar-DOM, ErauzOnt, LDO Builder, and Preprocessor. LDO Builder, Preprocessor, and ErauzOnt process the document, and Elkar-DOM facilitates supervision by authors or instructors. All generated LOs are stored in the LO Preview Repository and the LO Repository. The LO Preview Repository stores the preview files for LOs, and the LO Repository stores all the resources and metadata related to LOs. The Lucene Index stores all the resources related to LOs.

The following services are used in DOM-Sortze:

SQI (Simple Query Interface) Services: help to query suitable LOs from a large LOR.
RD (Replication Determine) Services: determine whether topics or parts of topics have been processed before, to avoid repetition of generated data.
Pdf2XML Services: extract contents from a PDF document and generate an XML representation of the document along with the referenced images.
DIR (Document Internal Representation) Builder: provides the internal representation of the document with its outline.
NLP Analysis Services: return Part Of Speech information for text.
CG (Constraint Grammar) Services: help in grammar-based analysis.
HC (Heuristic Confidence) Services: determine the confidence of the heuristics used during analysis.
UKB Services: provide the similarity score between two DRs, which helps in combining two similar DRs.
SAmgI Services: help in automatic metadata generation.

5. Hadoop-based Modified DOM-Sortze Framework

The framework must fulfill the following requirements.

5.1. Automatic Construction of the Caption Module

Developing textbook contents, including study notes, is time and effort consuming, so the workload of the users while building Caption Modules should be lightened. Also, finding related images and placing them at appropriate locations is tedious work.
To achieve that goal, the framework should rely on automatic processes that minimize the need for user intervention. Rather than authoring the Caption Module from scratch, the users are expected to play a different role: they supervise the automatically gathered knowledge and adapt it to their preferences and needs. The DOM-Sortze framework consists of several applications that perform the acquisition of the different components of the Domain Module. Its modular design has facilitated an incremental development of the framework, addressing in each iteration different components that deal with particular aspects of the Domain Module construction.

5.2. Knowledge Reuse

The DOM-Sortze framework also reduces the cost of developing the Caption Modules through content reuse. Knowledge reuse has been one of the major concerns of this research area in recent years. In particular, standards and specifications that enable the development and distribution of reusable components have been defined. The developed framework should facilitate and promote content reuse, to which end it relies on these specifications and standards.

5.3. Collaborative Work

Lecturers and teachers usually cooperate while preparing and planning their lessons, so they should be able to supervise and develop the Caption Modules in a similar way. Therefore, the developed framework should facilitate the collaborative work of the users and enable them to work together, discuss the contents to be used, etc. The framework must provide an easy means to supervise the automatically elicited knowledge that facilitates the collaborative work of the Domain Module authors.

5.4. Domain Independence

The modified version of the proposed DOM-Sortze framework should not be limited to a particular domain, because that would limit its usefulness in different sectors of education. Although focusing on a single domain may result in more accurate techniques, those techniques are not applicable to other areas.
So a domain-independent approach is pursued and, thus, prior domain-specific information should not be used. The proposed modified version of the DOM-Sortze framework is therefore fully domain independent and hence useful in different fields of education.

6. Implementation Methodology

The proposed modified DOM-Sortze application, based on MapReduce infrastructure, first analyzes the e-book, such as a PDF file provided by the system user, and presents the file size and total number of pages. Then the LDO Builder module builds the remote document outline and the internal outline for the document. The user has to select the start and end page numbers, i.e., the part of the e-book from which notes are to be generated. According to the selected page numbers, the system extracts contents and images from the e-book. It then applies POS tagging to the extracted contents and develops hyponym relations between words. POS tagging is repeated for each individual statement. LDOs are generated from the POS tagging and hyponym relations. The image processing module gathers knowledge from images and binds the images to the appropriate points in the extracted contents. The system then generates LOs from the generated DRs and, finally, builds the Domain Module from the generated LOs. The generated Domain Module, with the available corresponding images, is presented to the user in the form of notes and short-note points. The user supervises the points, selects them as required, modifies a selected point if required, and finally adds the point to the final generated notes, i.e., the final Domain Module.
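The hyponym-relation step can be illustrated with a single Hearst-style pattern ("X such as A, B and C"). This is a minimal sketch under simplifying assumptions: it works on raw text with a regular expression rather than on POS-tagged sentences, it takes only the single word before "such as" as the hypernym, and it assumes the list runs to the end of the clause. `extract_hyponyms` is a hypothetical helper, not part of DOM-Sortze.

```python
import re

# One Hearst-style pattern: "<hypernym> such as <item>, <item> and <item>".
SUCH_AS = re.compile(r"(\w+),? such as ([^.;]+)")
# Separators between list items: commas (optionally followed by and/or), "and", "or".
SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract_hyponyms(sentence):
    """Return (hyponym, "isA", hypernym) triples found in one sentence."""
    triples = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        for item in SEPARATOR.split(match.group(2)):
            item = item.strip()
            if item:
                triples.append((item, "isA", hypernym))
    return triples


sentence = "The system accepts formats such as PDF, DOC and ODF."
for triple in extract_hyponyms(sentence):
    print(triple)
```

A POS-tagged input would let the pattern match whole noun phrases on both sides instead of single words, which is the main limitation of this regular-expression version.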