Version 9 (modified by os, 14 years ago)
OpenSubtitles v2 draft specification
Introduction
When programming in Python, follow the PEP 8 coding practices and other good basic conventions.
Subtitles section
We are trying to avoid duplicate subtitles as much as possible, so in an ideal world there would be only one subtitle for many releases. We try to approach this by using SubLib. The system will store only one subtitle for each version of a movie: [Matrix], [Matrix - Extended Cut], [Matrix - Directors Cut] and so on. This is the ideal situation, and the current world is not ideal.
We know there are many different rips, so in an ideal world there would just be a set of rules describing how to change the original subtitles to fit a given movie version. For example:
- Take the [Matrix] subtitles with id 123456
- Change the frame rate from 25 FPS to 23.976 FPS
- Add 3 seconds to the beginning
- Cut the subtitles at 1 hour 25 minutes 45 seconds 78 milliseconds and make 2 files
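The rule list above could be modelled as small functions over a list of timed cues. The following is a minimal sketch, assuming times are kept in milliseconds; `Cue`, `change_fps`, `shift` and `cut` are hypothetical names chosen for illustration, not SubLib's API, and re-basing the second file to start at 0 is also an assumption.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Cue:
    start_ms: int  # when the line appears, in milliseconds
    end_ms: int    # when the line disappears, in milliseconds
    text: str


def change_fps(cues: List[Cue], src_fps: float, dst_fps: float) -> List[Cue]:
    """Re-time cues for a rip that plays the same frames at another rate.

    Frame n is shown at n / src_fps in the source and n / dst_fps in the
    target, so every timestamp is scaled by src_fps / dst_fps.
    """
    factor = src_fps / dst_fps
    return [replace(c, start_ms=round(c.start_ms * factor),
                    end_ms=round(c.end_ms * factor)) for c in cues]


def shift(cues: List[Cue], offset_ms: int) -> List[Cue]:
    """Move every cue by offset_ms (positive values delay the subtitles)."""
    return [replace(c, start_ms=c.start_ms + offset_ms,
                    end_ms=c.end_ms + offset_ms) for c in cues]


def cut(cues: List[Cue], at_ms: int) -> Tuple[List[Cue], List[Cue]]:
    """Split into two files at at_ms; the second file is re-based to start at 0.

    A cue that starts before the cut stays entirely in the first file.
    """
    first = [c for c in cues if c.start_ms < at_ms]
    second = [replace(c, start_ms=c.start_ms - at_ms, end_ms=c.end_ms - at_ms)
              for c in cues if c.start_ms >= at_ms]
    return first, second


# The example rules, applied in order to a tiny hypothetical master subtitle.
master = [Cue(1_000, 2_000, "Hello"), Cue(5_200_000, 5_201_000, "Goodbye")]
retimed = shift(change_fps(master, 25.0, 23.976), 3_000)
cut_at = ((1 * 60 + 25) * 60 + 45) * 1000 + 78  # 1 h 25 min 45 s 78 ms
cd1, cd2 = cut(retimed, cut_at)
```

The order of the rules matters: shifting before rescaling would stretch the 3-second offset along with the timestamps.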
With these rules we can represent any version, and hopefully cover any need for a movie. One big advantage of this system is wiki-style subtitle editing: when you watch a movie and find a bad translation or a typo, you log in to the site and edit the subtitles online (or using a program that implements our API). These changes will then apply to all versions.
So the system will hold one master subtitle for each version of a movie (original version, director's cut). That is the starting point, and the database will store rules for how to "change" it for different movie rips. That is the theory: the master subtitle is in the database, and by analyzing the timestamps of other uploaded subtitles we derive the rules for re-timing it to other movie rips.
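If the cues of an uploaded subtitle can be paired with cues of the master (for example by line order or matching text), a frame-rate change plus an offset is a linear map, t_rip = scale * t_master + offset, which ordinary least squares can recover. A sketch under that pairing assumption; `fit_retiming` is a hypothetical helper, and a rip containing a cut would need a piecewise fit that this does not attempt.

```python
from typing import Sequence, Tuple


def fit_retiming(master_ms: Sequence[float], rip_ms: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of rip_ms ~ scale * master_ms + offset.

    Position i must refer to the same cue in both subtitles.
    scale approximates src_fps / dst_fps; offset is the shift in ms.
    """
    n = len(master_ms)
    if n < 2 or n != len(rip_ms):
        raise ValueError("need at least two paired timestamps")
    mean_m = sum(master_ms) / n
    mean_r = sum(rip_ms) / n
    var_m = sum((m - mean_m) ** 2 for m in master_ms)
    if var_m == 0:
        raise ValueError("master timestamps must not all be equal")
    cov = sum((m - mean_m) * (r - mean_r) for m, r in zip(master_ms, rip_ms))
    scale = cov / var_m
    offset = mean_r - scale * mean_m
    return scale, offset


# Synthetic check: a rip converted from 25 to 23.976 FPS and 3 s later than the master.
master_starts = [1_000, 60_000, 120_000, 3_000_000]
rip_starts = [t * 25 / 23.976 + 3_000 for t in master_starts]
scale, offset = fit_retiming(master_starts, rip_starts)
```

With real uploads the residuals of the fit are also useful: large ones suggest the pairing is wrong or the rip is cut differently.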
Wiki editing - users should be able to edit and translate subtitles online, with a versioning system. All changes will be tracked, so in the end the system will know how many changes each user has made.
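A minimal sketch of such tracking, assuming each edit replaces the text of one cue: every edit becomes a numbered revision, and the per-user count falls out of the history. `Revision` and `VersionedSubtitle` are hypothetical names; a real implementation would persist revisions in the database.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Revision:
    number: int      # 1, 2, 3, ... in the order edits were made
    user: str
    cue_index: int   # which cue was changed
    old_text: str    # kept so the edit can be reviewed or reverted
    new_text: str


@dataclass
class VersionedSubtitle:
    cues: List[str]                                  # current text of each cue
    revisions: List[Revision] = field(default_factory=list)

    def edit(self, user: str, cue_index: int, new_text: str) -> Revision:
        """Change one cue and record who changed it and what it was before."""
        rev = Revision(len(self.revisions) + 1, user, cue_index,
                       self.cues[cue_index], new_text)
        self.cues[cue_index] = new_text
        self.revisions.append(rev)
        return rev

    def changes_per_user(self) -> Counter:
        """How many edits each user has made to this subtitle."""
        return Counter(rev.user for rev in self.revisions)


sub = VersionedSubtitle(["Helo", "Goodby"])
sub.edit("alice", 0, "Hello")
sub.edit("bob", 1, "Goodbye")
sub.edit("alice", 1, "Good-bye")
```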
Subtitles export - subtitles are stored in the system as metadata, so users can choose any subtitle format they want.
Realtime re-timing, cutting - SubLib supports re-timing, cutting and "moving" subtitles, so this should also be available online and via the API.
Store the encoding with each subtitle, using charset detection. For language detection use TextCat (Python implementation), langdet or the Google Translate Python library.
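The sketch below is not statistical charset detection (libraries such as chardet do that); it is a simpler stdlib-only heuristic, assuming the uploader states the subtitle language: try strict UTF-8 first, otherwise fall back to the legacy code page usual for that language. The `LEGACY_CODEPAGE` table and `decode_subtitle` are hypothetical. Either way, the chosen encoding is returned so it can be stored with the subtitle.

```python
from typing import Tuple

# Hypothetical table: the legacy Windows code page usually used for a language.
LEGACY_CODEPAGE = {"cs": "cp1250", "pl": "cp1250", "ru": "cp1251", "en": "cp1252"}


def decode_subtitle(raw: bytes, language: str) -> Tuple[str, str]:
    """Return (text, encoding) so the encoding can be stored next to the subtitle.

    Strict UTF-8 is tried first because legacy 8-bit text with accented
    characters rarely happens to be valid UTF-8. Otherwise fall back to the
    code page typical for the subtitle's language.
    """
    try:
        return raw.decode("utf-8-sig"), "utf-8"  # also strips a UTF-8 BOM
    except UnicodeDecodeError:
        pass
    encoding = LEGACY_CODEPAGE.get(language, "cp1252")
    # errors="replace" keeps the upload usable if the guess is slightly off
    return raw.decode(encoding, errors="replace"), encoding
```

Single-byte code pages accept almost any byte sequence, so the fallback cannot tell, say, cp1250 from cp1251 on its own; that is exactly where a real detector or the stated language has to decide.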
Movie section
Support more than one website for movie information. Currently only an imdb.com Python wrapper is implemented, which is not bad, but IMDb does not provide any official API access to its database. That is why Python wrappers for sites like TheMovieDB.org and TheTVDB.com need to be implemented. So support 3 sites, with imdb.com used last, when the other 2 fail (?)
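The "imdb.com as last resort" idea is a simple fallback chain. A sketch, assuming each site wrapper can be reduced to a function that takes a title and returns a dict or None; `lookup_movie` and the three stub providers are hypothetical stand-ins for the real wrappers.

```python
from typing import Any, Callable, Dict, Optional, Sequence

MovieInfo = Dict[str, Any]
# A provider looks up a title and returns movie info, or None when it has no match.
Provider = Callable[[str], Optional[MovieInfo]]


def lookup_movie(title: str, providers: Sequence[Provider]) -> Optional[MovieInfo]:
    """Ask each provider in order and return the first result found."""
    for provider in providers:
        try:
            info = provider(title)
        except Exception:
            # Any failure (timeout, changed HTML, quota) just means: try the next site.
            continue
        if info is not None:
            return info
    return None


# Hypothetical stand-ins for the TheMovieDB.org, TheTVDB.com and imdb.com wrappers.
def tmdb_lookup(title: str) -> Optional[MovieInfo]:
    raise OSError("TheMovieDB.org timed out")


def tvdb_lookup(title: str) -> Optional[MovieInfo]:
    return None  # not a TV series, so no match


def imdb_lookup(title: str) -> Optional[MovieInfo]:
    return {"title": title, "source": "imdb.com"}


print(lookup_movie("The Matrix", [tmdb_lookup, tvdb_lookup, imdb_lookup]))
```

Keeping the order in a plain list makes it easy to change the priority of the sites later without touching the lookup logic.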
Movie hashing - there is some need for a stronger hash, which requires research into how to do it properly, because the current implementation (CRC64) is weak and could lead to collisions in the future (so far there have been no collisions). Ideally, the system should be coded to support more kinds of hashes. I think it is a wrong idea to put information such as movie length, FPS or dimensions into the hash. The hash should depend only on the file, for example the first and last 128 kB (SHA-1) and the file size, hashed together (SHA-1).
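One possible reading of this proposal, as a sketch: hash the first and last 128 kB, then hash that digest together with the file size. The exact combination (hex digest, a ":" separator, decimal size) is an assumption, as is the name `file_hash`; the `algorithm` parameter reflects the wish to support more kinds of hashes.

```python
import hashlib
import os

CHUNK = 128 * 1024  # 128 kB read from each end of the file


def file_hash(path: str, algorithm: str = "sha1") -> str:
    """Hash that depends only on the file: its first and last 128 kB plus its size.

    Step 1 hashes the two chunks, step 2 hashes that digest together with the
    file size. `algorithm` can be any name accepted by hashlib.new.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(CHUNK)
        f.seek(max(size - CHUNK, 0))  # files under 256 kB: the chunks overlap
        tail = f.read(CHUNK)
    content_digest = hashlib.new(algorithm, head + tail).hexdigest()
    combined = f"{content_digest}:{size}".encode("ascii")
    return hashlib.new(algorithm, combined).hexdigest()
```

A change in the middle of the file does not alter this hash. That is the trade-off of the design: only 256 kB has to be read even from a multi-gigabyte rip.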
Media information - this information needs to be saved in the database, but the problem is that programs may implement it differently. The most important item is FPS.
User section
Registration - as simple as possible: UserName, Email, Password; possibly login using social sites, OpenID and so on (rpxnow.com).
Groups and permissions - similar to the current version of OpenSubtitles. There are permissions, groups have one or more permissions, and a user can belong to one or more groups.
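A user's effective permissions are then the union of the permissions of all their groups. A minimal sketch of that model; `Group`, `User` and the permission strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Group:
    name: str
    permissions: FrozenSet[str]  # a group grants one or more permissions


@dataclass
class User:
    name: str
    groups: Set[Group] = field(default_factory=set)  # one or more groups

    def permissions(self) -> FrozenSet[str]:
        """Union of the permissions of every group the user belongs to."""
        return frozenset().union(*(group.permissions for group in self.groups))

    def has_perm(self, permission: str) -> bool:
        return permission in self.permissions()


uploaders = Group("uploaders", frozenset({"subtitle.upload"}))
moderators = Group("moderators", frozenset({"subtitle.upload", "subtitle.delete"}))
alice = User("alice", {uploaders})
bob = User("bob", {uploaders, moderators})
print(alice.has_perm("subtitle.delete"), bob.has_perm("subtitle.delete"))
```

Django, listed as the framework in the software specification, ships a comparable model in `django.contrib.auth` (users, groups, permissions and `User.has_perm`), so in practice it may be reused rather than written by hand.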
Translator Groups
There should be some _good_ support for subtitle translators and their groups. This needs more research into how to do it properly.
Website Translation
The current system of website translation is OK.
API Access
Only registered user agents will have API access, using their key. The API should be provided through different standards such as XML-RPC (current), REST, JSON... A good example of an API is the one at TheMovieDB. API versioning is a must.
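Keying handlers by (version, method) is one simple way to keep old API versions working while new ones evolve, and a transport-neutral core lets XML-RPC, REST and JSON front ends share the same logic. A sketch, assuming a hypothetical in-memory key registry; `Api`, `ApiError`, the `search` method and the key values are illustrative names only.

```python
from typing import Any, Callable, Dict, Tuple

Params = Dict[str, Any]
Handler = Callable[[Params], Params]


class ApiError(Exception):
    """Raised for rejected or unknown API calls."""


class Api:
    """Transport-neutral core: each protocol front end translates requests into `call`."""

    def __init__(self, registered_keys: Dict[str, str]) -> None:
        self._keys = registered_keys  # API key -> registered user agent
        self._handlers: Dict[Tuple[int, str], Handler] = {}

    def register(self, version: int, method: str, handler: Handler) -> None:
        self._handlers[(version, method)] = handler

    def call(self, api_key: str, user_agent: str, version: int,
             method: str, params: Params) -> Params:
        # Only registered user agents, presenting their own key, get access.
        if self._keys.get(api_key) != user_agent:
            raise ApiError("unregistered user agent or wrong API key")
        handler = self._handlers.get((version, method))
        if handler is None:
            raise ApiError(f"method {method!r} does not exist in API v{version}")
        return handler(params)


api = Api({"k-123": "ExamplePlayer v1"})
# Version 2 changes the response shape; version 1 clients keep the old one.
api.register(1, "search", lambda p: {"ids": [123456]})
api.register(2, "search", lambda p: {"results": [{"id": 123456, "query": p["query"]}]})
print(api.call("k-123", "ExamplePlayer v1", 2, "search", {"query": "Matrix"}))
```

For the XML-RPC side, Python's standard `xmlrpc.server` module could expose such a function; a REST/JSON front end would map URLs and request bodies onto the same call.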
Caching
Cache everything that needs to be cached in memcached :)
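The usual pattern behind this kind of caching is cache-aside: look the key up first and only query the database on a miss. A sketch under assumptions: `cached`, `DictCache` and the `subtitle:<id>` key scheme are hypothetical, and `DictCache` only imitates a memcached client so the example runs without a server. Python memcached clients expose a similar get/set pair, though argument names and value serialization differ between libraries.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class DictCache:
    """In-process stand-in with the get/set shape of a memcached client."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at and time.monotonic() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: int = 0) -> None:
        """Store value; ttl is in seconds, 0 means no expiry (as in memcached)."""
        expires_at = time.monotonic() + ttl if ttl else 0.0
        self._data[key] = (expires_at, value)


def cached(cache: Any, key: str, ttl: int, compute: Callable[[], Any]) -> Any:
    """Cache-aside: return the cached value, or compute, store and return it.

    None is treated as a miss, so functions returning None are never cached.
    """
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, ttl)
    return value


db_queries = []


def load_subtitle() -> Dict[str, Any]:
    db_queries.append("subtitle:123456")  # stands in for an expensive database query
    return {"id": 123456, "movie": "Matrix"}


cache = DictCache()
cached(cache, "subtitle:123456", 300, load_subtitle)
cached(cache, "subtitle:123456", 300, load_subtitle)
print(len(db_queries))
```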
Software specification
Lighttpd as the HTTP server, running FastCGI; PostgreSQL as the database server; Python as the programming language; Django as the framework, with:
- Django-Sphinx
- Web Services
Memcached for memory caching; Sphinx search for full-text search