CSE3201 Information retrieval systems - Semester 1, 2007 unit guide

Semester 1, 2007

Chief Examiner

Dr. Maria Indrawan

Lecturers

Caulfield : Dr. Maria Indrawan

Outline

The unit covers the information management issues related to handling non-relational data, using XML and free-text documents as the examples. The XML component covers DTD, XML Schema, XSLT and XPath. The free-text retrieval component covers indexing techniques, the vector space model, evaluation of text retrieval performance and distributed text retrieval systems.

Objectives

Knowledge and Understanding
  • Students will understand the limitations of relational data management in supporting XML and text databases.
  • Students will understand the requirements for producing well-formed and valid XML documents.
  • Students will be able to translate user requirements into an XML Schema.
  • Students will be able to extract relevant information from XML documents.
  • Students will be able to identify the different components of information retrieval systems.
  • Students will be able to give a sample approach or technique for each component of an information retrieval system.
  • Students will be able to identify different approaches to distributed information retrieval systems, and will understand the benefits and limitations of each approach.

    Prerequisites

    Before attempting this unit you must have satisfactorily completed

    CSE2132 Database Systems or equivalent.

    You should have knowledge of

  • Modelling techniques, such as ER and relational modelling
  • File organisation
  • Data structures
  • A programming language

    Unit relationships

    CSE3201 is an elective unit in the Bachelor of Computing. The prerequisite unit (CSE2132 Database Systems, or equivalent) and the assumed knowledge are as listed under Prerequisites.

    Texts and software

    Required text(s)

    Dwight Peltzer, XML: Language Mechanics & Applications, Addison Wesley, 2004, ISBN 0-201-77168-3

    Textbook availability

    The textbook can be purchased from the Monash University bookshop.

    The library holds some copies, available on reserve and for general loan.

    Software requirements

    XML Writer 2.6, Wattle Software

    The software is available in all Faculty of IT student laboratories.

    A 30-day evaluation copy can be downloaded from www.xml.org.

    Hardware requirements

    N/A

    Recommended reading

    N/A

    Library access

    You may need to access the Monash library, either in person or remotely, to complete the unit satisfactorily. Be sure to obtain a copy of the Library Guide and, if necessary, the instructions for remote access from the library website.

    Study resources

    Study resources for CSE3201 are available on the unit website, which can be accessed through MUSO.

    Unit website

    http://muso.monash.edu.au/

    Structure and organisation

    Week  Topics                                                 Key Dates
    1     Introduction to XML, DTD
    2     XML Schema Part 1
    3     XML Schema Part 2
    4     XML design and Namespace
    5     XPath, XSLT Part 1
    6     XSLT Part 2
    -     Non-teaching week
    7     XSLT Part 3                                            Assignment 1 due, Monday 12 noon. The presentation will be conducted in the tutorials.
    8     Introduction to Information Retrieval
    9     Text Indexing
    10    Storage Structure and Retrieval Models
    11    Performance Measurement for Text Retrieval Systems     Assignment 2 due, Monday 12 noon
    12    Distributed Text Retrieval Systems/Search Engines
    13    Revision

    Timetable

    The timetable for on-campus classes for this unit can be viewed in Allocate+

    Assessment

    Assessment weighting

    Assessment for the unit consists of 2 assignments and a presentation with a total weighting of 50% and an examination with a weighting of 50%. Read this section VERY carefully.

     

    Component A:

    Weekly folio and attendance     hurdle
    Assignment 1 (XML Design)       20% (week 7)
    Assignment 2 (XML Programming)  20% (week 11)
    Presentation                    10% (week 7)

    Each week you will need to submit your group's progress on the assignment to the tutor. Each member of the group is also expected to attend the tutorial every week and so contribute to the development of the assignment.

    Assignments 1 and 2 will be conducted as group assignments, in groups of two people. The final mark for an individual member of the group may vary depending on that member's understanding of the submitted work. Individual interviews may be organised to clarify each member's participation.

    Component B:

    Exam 50%

     

    Assessment Policy

    To pass this unit you must:

    • obtain at least 50% of the total marks available in the unit, and
    • obtain at least 40% of the available marks in each of components A and B.

    If you fail to meet the 40% rule, the final mark that will be published is the mark of the component that failed to meet the 40% rule.

    For example, a final mark of 38 will be awarded to a student who receives 70% on all of the assignments and 38% on the exam.

    Your score for the unit will be calculated as follows.

    If both component A and component B are at least 40%:

    Final = 0.2 x assignment 1 + 0.1 x presentation + 0.2 x assignment 2 + 0.5 x final exam

    If component A or component B is less than 40%:

    Final = value of the component that is less than 40%
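
The calculation above can be sketched in a few lines of Python. This is an illustration only, not the official marking procedure. The function name final_mark is hypothetical, all inputs are percentages from 0 to 100, and two points are assumptions that the guide does not state: that the component A percentage is the weighted coursework rescaled to its 50% share, and that the lower value is reported if both components fall below 40%.

```python
def final_mark(assignment1, presentation, assignment2, exam):
    """Return the final unit mark (0-100) from four percentage marks (0-100)."""
    # Weights expressed in tenths (2, 1, 2, 5) so the arithmetic stays exact
    # for whole-number marks; they correspond to 0.2, 0.1, 0.2 and 0.5.
    coursework_tenths = 2 * assignment1 + 1 * presentation + 2 * assignment2

    # Assumption: component A is the coursework rescaled to a percentage
    # of the 50% of the unit that it contributes.
    component_a = coursework_tenths / 5
    # Component B is the exam percentage.
    component_b = exam

    # Hurdle: any component below 40% replaces the weighted final mark.
    failed = [c for c in (component_a, component_b) if c < 40]
    if failed:
        # Assumption: if both components fail, the lower one is reported.
        return min(failed)

    return (coursework_tenths + 5 * exam) / 10


# The guide's own example: 70% on all assignments and 38% on the exam.
print(final_mark(70, 70, 70, 38))
```

With the guide's example figures, the exam component is below the hurdle, so the function returns the exam percentage rather than the weighted total.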

    Assessment Requirements

    Assessment                    Due Date                                  Weighting
    XML Design                    Monday, 16th April 2007                   20%
    XML Programming               Monday, 14th May 2007                     20%
    Presentation                  Tutorial, week 7                          10%
    Exam (2 hours, closed book)   Exam period (S1/07) starts on 07/06/07    50%

    Assignment specifications will be made available on the CSE3201 unit website.

    Assignment Submission

    The XML Design and XML Programming assignments must be submitted in both printed and soft-copy form.

    All assignments must be submitted to the Caulfield School of IT assignment box, located on level 6 of H building, by 12 noon on the due date.

    Extensions and late submissions

    Late submission of assignments

    Assignments received after the due date will be subject to a penalty of 10% of the available mark per day. Submissions received more than one week after the due date will not normally be accepted.

    This policy is strict because comments or guidance will be given on assignments as they are returned, and sample solutions may also be published and distributed, after assignment marking or with the returned assignment. 
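
As a rough illustration of the late penalty described above, the following Python sketch applies 10% of the available mark per day late. The function name late_penalised_mark is hypothetical, and several details are assumptions rather than stated policy: part days are counted as whole days, the mark cannot fall below zero, and a submission more than seven days late is treated as not accepted (returned as None).

```python
import math


def late_penalised_mark(raw_mark, available_mark, days_late):
    """Return the mark after the late penalty, or None if not accepted."""
    if days_late <= 0:
        # Submitted on time: no penalty applies.
        return raw_mark
    if days_late > 7:
        # More than one week late: not normally accepted.
        return None

    # Assumption: any part of a day counts as a full day late.
    whole_days = math.ceil(days_late)
    # 10% of the available mark for each day late.
    penalty = available_mark * whole_days / 10
    # Assumption: the penalised mark is floored at zero.
    return max(0, raw_mark - penalty)


# A 16/20 submission handed in two days late loses 2 x 2 marks.
print(late_penalised_mark(16, 20, 2))
```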

    Extensions

    It is your responsibility to structure your study program around assignment deadlines, family, work and other commitments. Factors such as normal work pressures, vacations, etc. are seldom regarded as appropriate reasons for granting extensions. 

    Requests for extensions must be made to the unit leader (Dr. Maria Indrawan) by email at least two days before the due date. You will be asked to forward original medical certificates in cases of illness, and may be asked to provide other forms of documentation where necessary. A copy of the email or other written communication of an extension must be attached to the assignment submission.

    Grading of assessment

    Assignments, and the unit, will be marked and allocated a grade according to the following scale:

    Grade  Percentage/description
    HD     High Distinction - very high levels of achievement, demonstrated knowledge and understanding, skills in application and high standards of work encompassing all aspects of the tasks. In the 80+% range of marks for the assignment.
    D      Distinction - high levels of achievement, but not of the same standards. May have a weakness in one particular aspect, or overall standards may not be quite as high. In the 70-79% range.
    C      Credit - sound pass displaying good knowledge or application skills, but some weaknesses in the quality, range or demonstration of understanding. In the 60-69% range.
    P      Pass - acceptable standard, showing an adequate basic knowledge, understanding or skills, but with definite limitations on the extent of such understanding or application. Some parts may be incomplete. In the 50-59% range.
    N      Not satisfactory - failure to meet the basic requirements of the assessment. Below 50%.

    Assignment return

    We aim to make assignment results available to you within two weeks of receiving your submission.

    Feedback

    Feedback to you

    You will receive feedback on your work and progress in this unit. This feedback may be provided through your participation in tutorials and class discussions, as well as through your assignment submissions. It may come in the form of individual advice, marks and comments, or it may be provided as comment or reflection targeted at the group. It may be provided through personal interactions, such as interviews and on-line forums, or through other mechanisms such as on-line self-tests and publication of grade distributions.

    Feedback from you

    You will be asked to provide feedback to the Faculty through a Unit Evaluation survey at the end of the semester. You may also be asked to complete surveys to help teaching staff improve the unit and unit delivery. Your input to such surveys is very important to the faculty and the teaching staff in maintaining relevant and high quality learning experiences for our students.

    And if you are having problems

    It is essential that you take action immediately if you realise that you have a problem with your study. The semester is short, so we can help you best if you let us know as soon as problems arise. Regardless of whether the problem is related directly to your progress in the unit, if it is likely to interfere with your progress you should discuss it with your lecturer or a Community Service counsellor as soon as possible.

    Plagiarism and cheating

    Plagiarism and cheating are regarded as very serious offences. In cases where cheating has been confirmed, students have been severely penalised, from losing all marks for an assignment to facing disciplinary action at the Faculty level. While we would wish that all our students adhere to sound ethical conduct and honesty, I ask you to acquaint yourself with Student Rights and Responsibilities and the Faculty regulations that apply to students detected cheating, as these will be applied in all detected cases.

    In this University, cheating means seeking to obtain an unfair advantage in any examination or any other written or practical work to be submitted or completed by a student for assessment. It includes the use, or attempted use, of any means to gain an unfair advantage for any assessable work in the unit, where the means is contrary to the instructions for such work. 

    When you submit an individual assessment item, such as a program, a report, an essay, assignment or other piece of work, under your name you are understood to be stating that this is your own work. If a submission is identical with, or similar to, someone else's work, an assumption of cheating may arise. If you are planning on working with another student, it is acceptable to undertake research together, and discuss problems, but it is not acceptable to jointly develop or share solutions unless this is specified by your lecturer. 

    Intentionally providing students with your solutions to assignments is classified as "assisting to cheat" and students who do this may be subject to disciplinary action. You should take reasonable care that your solution is not accidentally or deliberately obtained by other students. For example, do not leave copies of your work in progress on the hard drives of shared computers, and do not show your work to other students. If you believe this may have happened, please be sure to contact your lecturer as soon as possible.

    Cheating also includes taking into an examination any material contrary to the regulations, including any bilingual dictionary, whether or not with the intention of using it to obtain an advantage.

    Plagiarism involves the false representation of another person's ideas or findings as your own, by either copying material or paraphrasing without citing sources. It is both professional and ethical to reference clearly the ideas and information that you have used from another writer. If the source is not identified, then you have plagiarised the work of the other author. Plagiarism is a form of dishonesty that is insulting to the reader and grossly unfair to your student colleagues.

    Communication

    Communication methods

    Students are encouraged to attend the helpdesk sessions for content-related queries. For clarification of the unit's administrative procedures, please post your query in the newsgroup.

    Notices

    Notices related to the unit during the semester will be placed on the Notices Newsgroup in the Unit Website. Check this regularly. Failure to read the Notices newsgroup is not regarded as grounds for special consideration.

    Consultation Times

    Please refer to the Helpdesk sessions schedule.

    If direct communication with your unit adviser/lecturer or tutor outside of consultation periods is needed you may contact the lecturer and/or tutors at:

    Dr Maria Indrawan
    Fax +61 3 990 31077

    Ms Flora Salim


    All email communication to you from your lecturer will occur through your Monash student email address. Please ensure that you read it regularly, or forward your email to your main address. Also check that your contact information registered with the University is up to date in My.Monash.

    Additional information

    Helpdesk Session

    TBA

    Last updated: Feb 9, 2007