Web Development for Research

Monday, March 11, 2024

After documentation

I spent the last couple of weeks learning how to document Python code as well as Angular and React projects. I figured this is best done when the project is in a nascent stage, so as to establish this as the baseline going forward.

The best documentation experience was with Angular. I used JSDoc to document the TypeScript code, but for Angular there is a package, Compodoc, that produces an elegant web document. This is the typical documentation for an Angular component:

/**
 * Generates a container for the video player
 *
 * @param {number} width The width of the container (optional)
 * @param {number} height The height of the container (optional)
 * @returns A container for the video and controls
 *
 * @example
 * Without any inputs, container adjusts to browser window
 * 
 *
 * @example
 * With only width, container has fixed width and
 * aspect ratio of 16:9
 * 
 *
 * @example
 * With only height, container has fixed height and
 * aspect ratio of 16:9
 * 
 *
 */

After documenting all components and utility functions, I created a very basic config file for the documentation and saved it in tsconfig.doc.json:

{
  "include": ["src/**/*.ts"],
  "exclude": ["src/test.ts", "src/**/*.spec.ts", "src/app/file-to-exclude.ts"]
}

Running compodoc produces an entire directory of HTML files that depicts the whole app graphically.

Very impressed to see documentation arranged in such a visually appealing manner.

In comparison, the React app was documented with JSDoc and another package, better-docs, which has a special addition for components. Though running jsdoc produces another directory of HTML files, the links are broken and navigation seems a bit jumpy.

For Python, I used an almost Markdown-like approach. Classes are documented as in this example:

    '''
    Base view for a course based on course URL

    Attributes
    -------------
    serializer_class : class
        CourseSerializer class
    user_model : class
        User class
    lookup_field : str
        The field in URL used to look up model instance

    Methods
    -------------
    get_queryset() : Base method for course list view
    get_object() : Returns course model instance
    '''

VS Code displays the documentation when you hover over a class. A function has documentation similar to the JS version:

'''
Create a new course - POST request

Parameters
--------------
request - dict

Raises
--------------
400 error
    Course title missing or not unique
    Course price missing for non-free course
403 error
    If course created by non-admin user

Returns
--------------
201 response with course data
'''

The next step is to find a package that will convert these docstrings into viewable HTML pages that can be committed into Git.
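Until such a package is chosen, the docstrings can already be read from Python itself. This is a minimal sketch using only the standard library inspect module. CourseBaseView here is a hypothetical stand-in class carrying a shortened version of the class docstring above, and the value of lookup_field is assumed for illustration:

```python
import inspect


class CourseBaseView:
    '''
    Base view for a course based on course URL

    Attributes
    -------------
    lookup_field : str
        The field in URL used to look up model instance
    '''
    lookup_field = 'slug'  # assumed value, for illustration only


# inspect.getdoc returns the docstring with the common leading
# indentation and the surrounding blank lines removed.
doc = inspect.getdoc(CourseBaseView)
first_line = doc.splitlines()[0]
print(first_line)
```

Any HTML generator would start from this same text, so keeping the docstrings consistent now should pay off whichever package is picked.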

Thursday, February 22, 2024

Continuing with video contents and getting started with the frontend

With some basic course and lecture end-points created and tested, the next step is the actual video contents of the course along with other downloadable resources like .pdf, .docx, .zip and other files. Since the backend is now around a quarter done, it is time to slowly get started with the frontend as well. Rather than start with the frontend for the app directly, I am starting with the video player.

Instead of using the regular video player, I have decided to build a custom video player for a few reasons. First, and quite obviously, I would like to have more control over the videos rather than what comes with a default video player. Second, a year or so back, I was working on another interesting project with a friend that involved building an interactive video player. This was a video player that would take user inputs and play certain parts of the video accordingly. The application for this is primarily in advertising and marketing as the user can choose to watch only what they are interested in watching. With respect to online courses, I would like to use this interactivity in asking students if they would like to review certain concepts necessary for the particular lecture they are about to watch. This is due to the fact that not all students can invest time regularly, and if they are watching videos after several days, they might need a refresher.

This project was called the Accordion player, a name that my friend came up with. I have started hosting this project at this GitHub link:

https://github.com/shivkiyer/accordion-player

I would definitely like to build the app in React, and would like to also use Angular as it is something I have used in the past. So, there are currently two sub-projects in this repo - one with React and the other with Angular.

Since the app is in the nascent stage, another thing which I would like to go back and work on is documentation. This time, with the Django backend, I have done my best to make the code as modular as possible. This implies classes and functions that are as small as possible. This has the advantage that the single docstring at the beginning of the function or class is usually good enough, and there don't have to be comments all over the code, which can become a little bothersome.

However, with both Python and JavaScript, it might be worthwhile to write proper docstrings which can then be used to create nice, understandable documentation of the code. So there might be a few commits only related to documentation over the next week.

Wednesday, February 7, 2024

Additions to the lecture API endpoints

With the basic CRUD for lecture endpoints done, I will need to plan some other details as well as the lecture contents and resources (or attachments). For now, a lecture must have a title which must be unique in a course, and an optional description. Another field that a lecture must have is whether a preview is enabled, and in that case anyone can view it for free. Also, if the lecture has video content, the duration of the video needs to be displayed even in the list view.

Next comes the sequence of lectures in a course. There has to be a sequence ID to indicate how the lectures are to appear, and an instructor can move lectures around and change the sequence. This sequence ID is also unique within a course, and must not be set directly by the instructor, but rather must be generated during the creation process. In the case of a single lecture creation, it might be fairly easy, as one only needs to find the largest sequence number in the course and give the new lecture the next one. In the case of bulk lecture creation, it might need some kind of counter as the contents of a spreadsheet might be read. Finally, in the lecture list view, the lectures should appear in ascending order of the sequence ID.
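The generation rule can be sketched in plain Python, without the Django model. The function names are hypothetical, and the assumption that numbering starts at 1 is mine:

```python
def next_sequence_id(existing_ids):
    """Return the sequence ID for one new lecture in a course."""
    # An empty course starts at 1 (assumption); otherwise continue
    # after the largest ID already in use.
    return max(existing_ids, default=0) + 1


def bulk_sequence_ids(existing_ids, count):
    """Return `count` consecutive sequence IDs for a bulk creation."""
    start = next_sequence_id(existing_ids)
    # A simple counter: each new row gets the next ID in turn.
    return list(range(start, start + count))


single = next_sequence_id([1, 2, 3])
bulk = bulk_sequence_ids([1, 2, 3], 3)
print(single, bulk)
```

In the real app the largest ID would come from a database query, and the lookup plus insert would need to happen together so that two simultaneous requests do not get the same number.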

There has been no difference between the Lecture List view and the Detail view. However, these two should be different - the list view should only display the title, description and time duration of a lecture. The detail view should provide everything - details of the video content, resources for download etc.

Some changes such as the detail view can wait until the other models such as videos and downloads are ready. But the sequence ID can be handled right away. Along with creating the sequence ID and retrieving lectures according to sequence ID, an API call for changing the sequence ID of a lecture must also be created. So, if lecture 15 in a course needs to be inserted before lecture 9, the sequence ID of lecture 15 will become 9, and the sequence IDs of lectures 9 to 14 will increase by 1. The reverse happens if a lecture needs to be moved down the list.
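The renumbering can be sketched on a plain dictionary before it is written against the database. The function move_lecture and the lecture names are hypothetical, used only to illustrate the shifting rule:

```python
def move_lecture(sequence, lecture, new_position):
    """
    Move `lecture` to `new_position` and renumber the lectures in between.

    `sequence` maps lecture name -> sequence ID. It is not modified;
    a new mapping is returned.
    """
    old_position = sequence[lecture]
    updated = dict(sequence)
    for name, seq_id in sequence.items():
        if name == lecture:
            updated[name] = new_position
        elif new_position <= seq_id < old_position:
            # Lecture moved up the list: IDs in between increase by 1.
            updated[name] = seq_id + 1
        elif old_position < seq_id <= new_position:
            # Lecture moved down the list: IDs in between decrease by 1.
            updated[name] = seq_id - 1
    return updated


course = {f'lecture-{i}': i for i in range(1, 16)}
moved = move_lecture(course, 'lecture-15', 9)
print(moved['lecture-15'], moved['lecture-9'], moved['lecture-14'])
```

Lectures outside the affected range keep their IDs, so only the rows between the old and new positions would need to be updated.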

Thursday, February 1, 2024

Starting with the lecture detail view

The list view where all the lectures of a course can be fetched is a more public view, as the list of lectures should be visible to the general public even without login and registration. The only constraint is that the course should be published. Unpublished courses should be visible only to admin and to instructors.

For this reason, the GET method handler in lecture view is:

def get(self, request, *args, **kwargs):
    try:
        self.authenticate(request)
    except Exception:
        # Anonymous users may still see the public list view
        pass
    if self.request.user is not None and self.request.user.is_staff:
        self.init_lecture()
    else:
        self.init_lecture(admin_only=False)
    if self.kwargs.get('id', None) is None:
        return self.list(request, *args, **kwargs)
    self.check_permissions(request)
    return self.retrieve(request, *args, **kwargs)

First I call the authenticate method, which will attach the user object to the request from the JWT in the header. But I call it in a way that it does not throw an exception. If the user is a staff user, I extract the course even if it is unpublished. Otherwise, only published courses will be extracted. This is done by modifying the init_lecture method to be:

def init_lecture(self, admin_only=True):
    '''
    Initialize lecture view
        - fetch course object
    '''
    course_slug = self.kwargs.get('slug', None)
    self.course = Course.objects.get_course_by_slug(
        course_slug,
        admin_only=admin_only
    )

Unless the method is called with admin_only=False, it will extract even unpublished courses. In the view method, this happens only when there are no credentials or when the user is not admin. If a course is unpublished, a normal user or anonymous user will get a 'Course not found' 404 error.
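The manager method itself is not shown here, so the following is only a sketch of the behaviour described, using an in-memory dictionary instead of the Course table. The is_published field name, the sample slugs and the CourseNotFound class are assumptions for illustration:

```python
class CourseNotFound(Exception):
    """Hypothetical stand-in for the 404 'Course not found' error."""


# Hypothetical in-memory data standing in for the Course table.
COURSES = {
    'python-basics': {'title': 'Python Basics', 'is_published': True},
    'django-draft': {'title': 'Django Draft', 'is_published': False},
}


def get_course_by_slug(slug, admin_only=True):
    """Return a course; hide unpublished ones when admin_only is False."""
    course = COURSES.get(slug)
    if course is None:
        raise CourseNotFound('Course not found')
    if not admin_only and not course['is_published']:
        # Same error as a missing course, so the existence of an
        # unpublished course is not revealed to normal users.
        raise CourseNotFound('Course not found')
    return course


print(get_course_by_slug('django-draft')['title'])
```

Returning the same error for missing and unpublished courses is a design choice: a normal user cannot tell the two cases apart.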

In the GET method handler, if an id is passed in the url, it will proceed to the detail view. Here, for now, I am only checking permissions in the sense that a user should be logged in. The method check_permissions is:

def check_permissions(self, request):
    if request.user is None:
        raise CustomAPIError(
            status_code=status.HTTP_403_FORBIDDEN,
            detail='Must be logged in to access a lecture'
        )

Later this will check for registration and payment. For example, if an instructor wishes to have a course completely paid, check_permissions will ensure that the user has paid for the course. My plan is to keep a certain percentage of the videos of a course free, and so this can check if a video can be watched for free or needs to be paid for. This of course also implies that the lectures need a sequence ID within a course, so that you can find out beyond which lecture a user will need to pay for the course.

The next view will be the PATCH and DELETE view. The lecture model needs a few changes and so does the serializer, as the detail view will need more details including the video content and also other resources like .pdf attachments etc.

Tuesday, January 30, 2024

Code refactor to simplify view classes

After having written a few view classes for endpoints, it is quite clear that there is way too much repetition. The reason for the repetition is the need to generate clear and different error messages from the backend so that the frontend will not have to do much. My first thought was that I would have to define a base view class with a wrapper method that would include these exception blocks, and then all other view classes would inherit this base class. But this seemed like a very basic requirement that a million other Django developers might have wanted, so it seemed unlikely that it was not already built into DRF. After a little bit of googling, the answer was in the APIView class in DRF.

The GenericAPIView has only a few methods such as get_queryset, get_serializer_class, get_object etc. What I was looking for was a base method similar to how the View class in Django has the dispatch method and a few others that are called under the hood. The GenericAPIView inherits the APIView class. The APIView class has a number of methods that get called under the hood, and one of them is the handle_exception method. The documentation says that any exception that is thrown by any handler method (get, post, patch etc) is passed to this method, which either returns the appropriate Response or re-raises the exception if it can't handle it.

To fully appreciate what is going on, one really has to read the source code of Django Rest Framework. This is the second time I found myself browsing DRF source code and, in my opinion, everyone should regularly do so, as reading the source code gives you an understanding of DRF that is far deeper than just reading the documentation.

The handle_exception method is in the views.py file inside the rest_framework folder of the GitHub repo:

def handle_exception(self, exc):
    """
    Handle any exception that occurs, by returning an appropriate response,
    or re-raising the error.
    """
    if isinstance(exc, (exceptions.NotAuthenticated,
                        exceptions.AuthenticationFailed)):
        # WWW-Authenticate header for 401 responses, else coerce to 403
        auth_header = self.get_authenticate_header(self.request)

        if auth_header:
            exc.auth_header = auth_header
        else:
            exc.status_code = status.HTTP_403_FORBIDDEN

    exception_handler = self.get_exception_handler()

    context = self.get_exception_handler_context()
    response = exception_handler(exc, context)

    if response is None:
        self.raise_uncaught_exception(exc)

    response.exception = True
    return response

There is special handling for errors related to a user not being authenticated or authentication having failed. But the actual exception handling is a bit dynamic, and the handler is returned by the get_exception_handler method:

def get_exception_handler(self):
    """
    Returns the exception handler that this view uses.
    """
    return self.settings.EXCEPTION_HANDLER

The exception handler to be used is defined in the settings, and this is probably in case you want to choose a custom exception handler. But the default EXCEPTION_HANDLER in the settings is merely the function defined in the same file:

def exception_handler(exc, context):
    """
    Returns the response that should be used for any given exception.

    By default we handle the REST framework `APIException`, and also
    Django's built-in `Http404` and `PermissionDenied` exceptions.

    Any unhandled exceptions may return `None`, which will cause a 500 error
    to be raised.
    """
    if isinstance(exc, Http404):
        exc = exceptions.NotFound(*(exc.args))
    elif isinstance(exc, PermissionDenied):
        exc = exceptions.PermissionDenied(*(exc.args))

    if isinstance(exc, exceptions.APIException):
        headers = {}
        if getattr(exc, 'auth_header', None):
            headers['WWW-Authenticate'] = exc.auth_header
        if getattr(exc, 'wait', None):
            headers['Retry-After'] = '%d' % exc.wait

        if isinstance(exc.detail, (list, dict)):
            data = exc.detail
        else:
            data = {'detail': exc.detail}

        set_rollback()
        return Response(data, status=exc.status_code, headers=headers)

    return None

There is special handling for 404 and permission denied (403) errors. But other than that, all it does is check whether the error is of type APIException.

class APIException(Exception):
    """
    Base class for REST framework exceptions.
    Subclasses should provide `.status_code` and `.default_detail` properties.
    """
    status_code = status.HTTP_500_INTERNAL_SERVER_ERROR
    default_detail = _('A server error occurred.')
    default_code = 'error'

    def __init__(self, detail=None, code=None):
        if detail is None:
            detail = self.default_detail
        if code is None:
            code = self.default_code

        self.detail = _get_error_details(detail, code)

    def __str__(self):
        return str(self.detail)

    def get_codes(self):
        """
        Return only the code part of the error details.

        Eg. {"name": ["required"]}
        """
        return _get_codes(self.detail)

    def get_full_details(self):
        """
        Return both the message & code parts of the error details.

        Eg. {"name": [{"message": "This field is required.", "code": "required"}]}
        """
        return _get_full_details(self.detail)

So APIException inherits Exception but defines status_code and detail. The exception_handler method merely extracts the detail attribute along with the status_code in the APIException object and returns Response with the data being a dictionary containing the detail message. So, this exception_handler in reality is doing what I have been trying to do manually with different except blocks.
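To see the whole flow in one place without installing Django, here is a deliberately simplified imitation. None of these classes are the real DRF ones: APIException, NotFound, exception_handler and dispatch below are cut-down stand-ins written for this sketch, and the response is just a (data, status) tuple:

```python
class APIException(Exception):
    """Simplified stand-in for rest_framework.exceptions.APIException."""
    status_code = 500
    default_detail = 'A server error occurred.'

    def __init__(self, detail=None):
        self.detail = self.default_detail if detail is None else detail


class NotFound(APIException):
    """Stand-in for a 404 error with its own status code."""
    status_code = 404
    default_detail = 'Not found.'


def exception_handler(exc):
    """Turn an APIException into (data, status); None for anything else."""
    if isinstance(exc, APIException):
        if isinstance(exc.detail, (list, dict)):
            data = exc.detail
        else:
            data = {'detail': exc.detail}
        return data, exc.status_code
    return None


def dispatch(handler):
    """Run a handler and funnel any error through exception_handler."""
    try:
        return handler()
    except Exception as exc:
        response = exception_handler(exc)
        if response is None:
            raise  # not an API error, so let it surface as a server error
        return response


def get_missing_course():
    raise NotFound('Course not found')


result = dispatch(get_missing_course)
print(result)
```

The handler method only raises, and the single try/except in dispatch produces the error response, which is the repetition the separate except blocks were reproducing by hand.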

So, the first step is to define a CustomAPIError that subclasses this APIException:

class CustomAPIError(APIException):
    def __init__(self, status_code, detail):
        self.status_code = status_code
        self.detail = detail

And all the other error classes like Http400Error etc can all be deleted. Where I was throwing these specific errors, I now throw the CustomAPIError. And the view classes can be much simpler, as once these errors are thrown, the handle_exception will call the exception_handler which will return the error Response. So, now the POST method handler for courses will be:

def post(self, request, *args, **kwargs):
    '''Create a new course - POST request'''
    self.authenticate(request)
    serializer = self.get_serializer(data=request.data)
    self.perform_create(serializer)
    return Response(
        serializer.data,
        status=status.HTTP_201_CREATED
    )

There is no need to enclose it in a try/except with different excepts throwing different errors - it is all taken care of by handle_exception. The only problem is that logging is now a bit broken. Fixing that will need the base class mentioned earlier, with a custom handle_exception method that logs the error.
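One possible shape for that base class is a small mixin that logs and then defers to the normal handle_exception. This is a sketch under assumptions, not the final code: the logger name and the class names are made up, and FakeAPIView is a stand-in so that the example runs without Django or DRF installed:

```python
import logging

logger = logging.getLogger('api.errors')  # logger name is an assumption


class LoggingExceptionMixin:
    """Log every exception, then defer to the next handle_exception."""

    def handle_exception(self, exc):
        logger.error('%s: %s', type(exc).__name__, exc)
        return super().handle_exception(exc)


class FakeAPIView:
    """Stand-in for DRF's APIView so the sketch runs on its own."""

    def handle_exception(self, exc):
        return {'detail': str(exc)}


class CourseView(LoggingExceptionMixin, FakeAPIView):
    """Hypothetical view; the mixin is listed first so it runs first."""


response = CourseView().handle_exception(ValueError('Course title missing'))
print(response)
```

In the real project the mixin would have to appear before GenericAPIView in the list of base classes, otherwise DRF's own handle_exception would be found first and nothing would be logged.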

Friday, January 26, 2024

Starting with the lecture app

A new app called lectures was created, with the following very basic model as a starting point:

class Lecture(models.Model):
    '''Lecture model'''

    course = models.ForeignKey(
        'courses.Course',
        models.SET_NULL,
        null=True
    )
    title = models.CharField(max_length=300)
    description = models.TextField(blank=True, null=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self):
        return self.title

Each lecture has to belong to a course and has to have a title, with the description being optional. For now, I have not created a ForeignKey pointing to User, though it might be useful to know which instructor created the lecture and which one updated it. But, again, since this app is mainly for small-time instructors (such as myself) to host their courses rather than to become a full-scale MOOC platform, there may not be that many instructors, and so just logs might be good enough to keep track of what's going on. The field can be added later without too much fuss.

All URLs for lectures will be with respect to courses, and this implies, the structure will be /api/courses/<course-slug>/lectures/<lecture-urls>. For this reason, I did not add a placeholder for lectures in the main urls.py file, but rather inside the urls.py file of the courses app. In the urls.py file of the lectures app, I started off with a basic create view. Using the learning from the views in courses, I created a base view that inherits UserAuthentication and will later implement a get_object which will determine if the user has the authority to view the lecture.

class LectureBaseView(GenericAPIView, UserAuthentication):
    '''Basic lecture view'''

    serializer_class = LectureSerializer
    user_model = User
    lookup_field = 'id'


class LectureView(LectureBaseView, CreateModelMixin):
    '''Basic lecture view'''

    def get(self, request, *args, **kwargs):
        return Response('TODO')

    def post(self, request, *args, **kwargs):
        course_slug = self.kwargs.get('slug', None)
        try:
            course_obj = Course.objects.get_course_by_slug(course_slug)
            self.authenticate(request)
            serializer = LectureSerializer(data=request.data)
            serializer.save(
                user=self.request.user,
                course=course_obj
            )
            return Response(serializer.data)
        except Http400Error as e:
            return Response(
                data=str(e),
                status=status.HTTP_400_BAD_REQUEST
            )
        except Http403Error as e:
            return Response(
                data=str(e),
                status=status.HTTP_403_FORBIDDEN
            )
        except Http404Error as e:
            return Response(
                data=str(e),
                status=status.HTTP_404_NOT_FOUND
            )
        except Exception:
            return Response(
                data=DEFAULT_ERROR_RESPONSE,
                status=status.HTTP_400_BAD_REQUEST
            )

Though I am inheriting the CreateModelMixin, I am not really using it, as I would like to throw custom exceptions rather than the ValidationErrors raised by rest_framework. Maybe I will remove it later. Other than that, it is the usual stuff: make sure the user is authenticated and is an instructor (which means admin). Once the course has been extracted from the slug and the user from the JWT, the lecture can be created from the LectureSerializer:

class LectureSerializer(serializers.ModelSerializer):
    '''Serializer for Lecture model'''

    def save(self, *args, **kwargs):
        if self.is_valid():
            return super().save(*args, **kwargs)
        else:
            raise Http400Error(extract_serializer_error(self.errors))

    def check_user_is_instructor(self, course, user):
        if user is None:
            raise Http403Error(
                'Must be logged in as an instructor to create lectures'
            )
        if not course.check_user_is_instructor(user):
            raise Http403Error(
                'Must be an instructor of the course to create lectures'
            )
        return True

    def create(self, validated_data):
        user = validated_data.get('user', None)
        course = validated_data.get('course', None)
        if self.check_user_is_instructor(course, user):
            del validated_data['user']
            del validated_data['course']
            return Lecture.objects.create(
                **validated_data,
                course=course
            )

    class Meta:
        model = Lecture
        fields = ['title', 'description']
        extra_kwargs = {
            'title': {
                'error_messages': {
                    'required': 'The title of a lecture is required',
                    'blank': 'The title of a lecture is required'
                }
            }
        }

The create method checks if the user is an instructor and only then creates the lecture, or else throws an exception.

I tried it out a bit through Postman and it works. A little more tweaking is required to ensure that two lectures with the same title do not exist in the same course. That will probably be a lecture manager method. Tests will make the code better.

Wednesday, January 24, 2024

Completing tests for basic course and user apps

Following the code refactor, I wrote tests for the three apps so far - user_auth, courses and registration. The tests cover almost all the non-trivial logic written so far, and so now the time has come to move on to the next part - course lectures.

A course can be created by an admin user and other admins can be added as instructors. Maybe, an option can be created for a teaching assistant at a later stage, in case an instructor does not want to add other co-instructors as admins but rather just other users as helpers. This I will figure out later along with the logic for deleting courses.

A course will have many lectures, so this is a simple case of a foreign key. The lectures can be video or audio or just text. Rather than having the content directly in the lectures, it might be better to create another table for that so that the content can be given metadata such as text translations for other languages or just merely audio in different languages.

In terms of a lecture, it should have a title, a number in a sequence, an optional description, content and optional attachments. Basic operations should be CRUD, except bulk delete should not be allowed in case a malicious instructor tries to delete a course. It should also be possible to change the sequence of lectures in a course. Of course, only an instructor should be able to perform any of these actions.

In terms of student registration, currently any student can register for any course, paid or free. The payment processing will come later. My plan is to allow any student to watch around 40% of a course for free and only then ask for payment for those courses which are paid. The reasoning - after watching 40% of the course, a student can judge whether the course is truly relevant and useful, and this way there will be no possibility of refunds later. The way to do this would be to update the CourseStudentRegistration model to include a paid field which can initially be False and will turn to True only when payment is processed. This can be used to check for authorization on whether a student can watch a particular video - if the lecture number is greater than the free quota, they will be asked to pay.
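The free quota check could look roughly like this. It is only a sketch: the function names are hypothetical, the lecture number is taken to be the sequence ID within the course, and rounding the 40% down to a whole lecture is my assumption:

```python
def free_lecture_limit(total_lectures, free_percent=40):
    """Highest sequence ID that can be watched without paying."""
    # Integer arithmetic rounds down (assumption): 40% of 12 lectures is 4.
    return total_lectures * free_percent // 100


def can_watch(sequence_id, total_lectures, course_is_free, paid):
    """Decide whether a registered student may watch a lecture."""
    if course_is_free or paid:
        return True
    return sequence_id <= free_lecture_limit(total_lectures)


# A paid course with 20 lectures and a student who has not paid yet.
print(can_watch(8, 20, course_is_free=False, paid=False))
print(can_watch(9, 20, course_is_free=False, paid=False))
```

Here the paid argument corresponds to the paid field planned for CourseStudentRegistration, and the check would sit inside check_permissions once sequence IDs exist.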